We are looking to (if possible) improve resilver times.
NAS SPEC
Dual Intel 2667V3
512G RAM
Booting off 2x 146GB SAS
1x SAS 2308 (LSI)
1x NFS Mount Point
1x LAGG0, dual 40G
***********************************************
USAGE SPEC
30TB of 566TB
SNAPSHOT 2h held for 30d
No ZIL
No SLOG
No Sync
64k
Each vdev is 6 drives in RAIDZ2
Tuning we run:
zfs set sync=disabled STORAGE1-Z6
zfs set checksum=fletcher4 STORAGE1-Z6
zfs set primarycache=all STORAGE1-Z6
zfs set logbias=latency STORAGE1-Z6
zfs set recordsize=64k STORAGE1-Z6
zfs set atime=off STORAGE1-Z6
zfs set dedup=off STORAGE1-Z6
sysctl vfs.zfs.vdev.async_read_max_active=64
sysctl vfs.zfs.vdev.async_read_min_active=32
sysctl vfs.zfs.vdev.async_write_max_active=64
sysctl vfs.zfs.vdev.async_write_min_active=32
sysctl vfs.zfs.vdev.sync_read_max_active=64
sysctl vfs.zfs.vdev.sync_read_min_active=32
sysctl vfs.zfs.vdev.sync_write_max_active=64
sysctl vfs.zfs.vdev.sync_write_min_active=32
sysctl vfs.zfs.top_maxinflight=1024
sysctl vfs.zfs.resilver_min_time_ms=3000
sysctl vfs.zfs.vdev.scrub_max_active=64
sysctl vfs.zfs.vdev.scrub_min_active=24
************************************************
HBA SPEC
4x LSI 2308
************************************************
16 Oracle DE24 Enclosures
Controller type : SAS2308_2
BIOS version : 7.39.02.00
Firmware version : 20.00.07.00
Channel description : 1 Serial Attached SCSI
Initiator ID : 0
Maximum physical devices : 1023
Concurrent commands supported : 10240
Slot : 5
Segment : 0
Bus : 3
Device : 0
Function : 0
RAID Support : No
************************************************
DRIVE SPEC
322 4TB HGST SAS 12G HDD 7.2K
************************************************
NIC SPEC
Chelsio Dual 40G
***********************************************
SWITCH SPEC
Quanta 10/40G LY2
**********************************************
NAS SPEC
TRUENAS CORE 12.0-U8-1
512GB RAM
Use case is NFS-attached storage for a 32-node cluster running VMware. The VMware nodes are all connected to the storage at 10G.
**********************************************
The issue is that when a drive fails we locate it, offline it, remove it, replace it, run SMART checks on the new drive, online it, and then resilver. The resilver literally takes days to run. TrueNAS has 30TB out of 566TB in use. We are concerned (maybe we should not be) that it will take 2-5 days at 30TB. What resilver time should we expect at 80% capacity?
pool: STORAGE1-Z6
state: ONLINE
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Mon Oct 24 13:50:39 2022
245T scanned at 18.7G/s, 233T issued at 17.8G/s, 247T total
95.1G resilvered, 94.24% done, 00:13:37 to go
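As a sanity check on the estimate in that output, here is a minimal Python sketch of the arithmetic, assuming the remaining time is simply (total - issued) divided by the issued rate as printed, and that T and G are binary multiples (TiB, GiB). The function names are ours, not part of any ZFS tool.

```python
# Rough check of the "to go" figure from the zpool status output above.
# Assumption: remaining time = (total - issued) / issued rate, using the
# rounded values as printed, with T = TiB and G = GiB.
TIB = 1024 ** 4
GIB = 1024 ** 3

def remaining_seconds(total_tib, issued_tib, rate_gib_s):
    """Seconds left if the printed issue rate holds for the rest of the scan."""
    remaining_bytes = (total_tib - issued_tib) * TIB
    return remaining_bytes / (rate_gib_s * GIB)

def fmt_hms(seconds):
    """Format a duration in seconds as HH:MM:SS."""
    s = int(round(seconds))
    return f"{s // 3600:02d}:{s % 3600 // 60:02d}:{s % 60:02d}"

eta = remaining_seconds(total_tib=247, issued_tib=233, rate_gib_s=17.8)
percent_done = 233 / 247 * 100

print(fmt_hms(eta))            # close to the 00:13:37 that zpool reported
print(f"{percent_done:.2f}%")  # close to the reported 94.24%
```

With the rounded figures this comes out at roughly 13 minutes, in line with what zpool printed, so the estimate itself is consistent with the scan numbers shown.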
*********************************************
We are seeking guidance and a calculation to set proper expectations. For reference, a reboot takes 1.5 hours.