Resilver Performance - 52 vDEV 312 4TB Drives - I am looking for guidance on time frame and speed. I have read the other threads to no avail.

Syptec

Dabbler
Joined
Aug 3, 2018
Messages
42
We are looking to (if possible) improve resilver times.

NAS SPEC
Dual Intel Xeon E5-2667 v3
512GB RAM
Booting off 2x 146GB SAS
1x SAS 2308 (LSI)
1x NFS Mount Point
1x lagg0, dual 40G
***********************************************
USAGE SPEC
30TB used of 566TB
Snapshots every 2h, held for 30d

No ZIL
No SLOG
No Sync
64k recordsize

Each vDEV is 6 drives, RAIDZ2

Tuning we run:
zfs set sync=disabled STORAGE1-Z6
zfs set checksum=fletcher4 STORAGE1-Z6
zfs set primarycache=all STORAGE1-Z6
zfs set logbias=latency STORAGE1-Z6
zfs set recordsize=64k STORAGE1-Z6
zfs set atime=off STORAGE1-Z6
zfs set dedup=off STORAGE1-Z6

sysctl vfs.zfs.vdev.async_read_max_active=64
sysctl vfs.zfs.vdev.async_read_min_active=32
sysctl vfs.zfs.vdev.async_write_max_active=64
sysctl vfs.zfs.vdev.async_write_min_active=32

sysctl vfs.zfs.vdev.sync_read_max_active=64
sysctl vfs.zfs.vdev.sync_read_min_active=32
sysctl vfs.zfs.vdev.sync_write_max_active=64
sysctl vfs.zfs.vdev.sync_write_min_active=32

sysctl vfs.zfs.top_maxinflight=1024
sysctl vfs.zfs.resilver_min_time_ms=3000

sysctl vfs.zfs.vdev.scrub_max_active=64
sysctl vfs.zfs.vdev.scrub_min_active=24
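
Side note for anyone copying these: sysctl values set at the shell do not survive a reboot on TrueNAS CORE; to keep them they need to be added under System > Tunables. A quick way to confirm what is actually in effect (same sysctl names as above):

sysctl vfs.zfs.resilver_min_time_ms vfs.zfs.top_maxinflight
sysctl vfs.zfs.vdev.scrub_min_active vfs.zfs.vdev.scrub_max_active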

************************************************
HBA SPEC
4x LSI 2308
************************************************
16 Oracle DE24 Enclosures

Controller type : SAS2308_2
BIOS version : 7.39.02.00
Firmware version : 20.00.07.00
Channel description : 1 Serial Attached SCSI
Initiator ID : 0
Maximum physical devices : 1023
Concurrent commands supported : 10240
Slot : 5
Segment : 0
Bus : 3
Device : 0
Function : 0
RAID Support : No
************************************************
DRIVE SPEC
322 4TB HGST SAS 12G HDD 7.2K
************************************************
NIC SPEC
Chelsio Dual 40G
***********************************************
SWITCH SPEC
Quanta 10/40G LY2
**********************************************
NAS SPEC

TRUENAS CORE 12.0-U8-1
512GB RAM

Use case is NFS-attached storage for a 32-node cluster running VMware. The VMware nodes are all connected to the storage at 10G.
**********************************************

Issue is that when a drive fails we locate, offline, remove, replace, SMART test, online, and then resilver. The resilver literally takes days to run. TrueNAS has 30TB out of 566TB in use. We are concerned (maybe we should not be) that it will take 2-5 days at 30TB. What would the expected time be at 80% capacity?
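
For reference, the CLI shape of that procedure looks roughly like this (hypothetical device names; the GUI's pool status page does the equivalent):

zpool offline STORAGE1-Z6 da123        # take the failed disk out of service
smartctl -t long /dev/da200            # long SMART test on the replacement before trusting it
zpool replace STORAGE1-Z6 da123 da200  # kicks off the resilver onto the new disk
zpool status STORAGE1-Z6               # watch progress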

pool: STORAGE1-Z6
state: ONLINE
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Mon Oct 24 13:50:39 2022
245T scanned at 18.7G/s, 233T issued at 17.8G/s, 247T total
95.1G resilvered, 94.24% done, 00:13:37 to go
*********************************************

Seeking guidance and calculation to set proper expectation. Reboot takes 1.5hrs.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
This is for VMware block storage?

ZFS scrub and resilver are similar operations. The system traverses the metadata tree, reading (or, for resilvers, fixing) all blocks. Even though all the data needed to recover is held on the other disks in the vdev, ZFS is not designed to take advantage of that possible optimization, which means it is walking the entire pool. On the upside, it can handle resilvering multiple disks in separate vdevs simultaneously, but that's not any major help to you.

Because this is traversal of all the allocated blocks, as fragmentation increases, the time to traverse will increase (more seeks). Managing ZFS block size can play into optimizing this; larger block sizes may result in reduced seeks.
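
As a rough back-of-envelope sketch (round numbers only, assuming the 52 x 6-wide RAIDZ2 layout described in the first post):

30 TB used / 312 disks ≈ ~0.1 TB to rewrite onto the replacement disk, which lines up with the "95.1G resilvered" in the status output above
80% of 566 TB ≈ ~450 TB used / 312 disks ≈ ~1.5 TB to rewrite per replaced disk

The rewrite is the small part; the traversal of every allocated block is what eats the days, and at 80% occupancy there are roughly 15x as many blocks (and more fragmentation) to walk, so expect the time to grow considerably more than linearly with occupancy.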
 

Syptec

Dabbler
Joined
Aug 3, 2018
Messages
42
NFS Mounted storage. VMware ESXi 6.7, fully patched.

What benefits, if any, do I gain from TRIM features in ZFS?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
NFS Mounted storage.

This is a nonanswer to the question being asked.

"NFS mounted storage" for VMware is typically used for block storage (i.e. to hold virtual machines and their VMDK files). However, it can also be used for access to log files, ISO catalogs, backup images, and other files.

ZFS has two fundamental storage modes, mirrors and RAIDZ. For mirrors, more raw disk space is consumed, but you get better characteristics for random I/O applications such as database storage or VM block storage. This includes applications where you may overwrite blocks of data within the stored files. It is very good at parallel IOPS such as you might get from a cluster of hypervisors accessing it. RAIDZ, on the other hand, consumes less raw disk space, and is really optimized towards access by a single consumer (or at best a small number of simultaneous consumers). It is not particularly good at overwrites, nor is it particularly good at parallel IOPS, although your particular case of having 52 vdevs would mitigate this.
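
For illustration only (hypothetical device names, not a suggestion to destroy and rebuild this particular pool), the two layouts look like this at pool creation time:

zpool create tank raidz2 da0 da1 da2 da3 da4 da5 raidz2 da6 da7 da8 da9 da10 da11   # six-wide RAIDZ2 vdevs, as in this thread
zpool create tank mirror da0 da1 mirror da2 da3 mirror da4 da5 mirror da6 da7       # two-way mirrors, the usual recommendation for VM block storage

Same number of drives either way; what changes is how many vdevs, and therefore how much parallel random I/O, the pool gets out of them.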

Now, the thing is, NFS can provide access to both mirror and RAIDZ backed storage, but the performance is controlled by the underlying design of the vdevs that make up the pool.

And here's where things sound a little confused.

You said,

What would the expected time be at 80% capacity?

80% capacity for block storage purposes would eventually become catastrophically slow, especially for metadata traversal operations such as resilvers, unless some other factor mitigated this. This is discussed in greater depth in the following article


and also in numerous other forum posts throughout the years. You ideally want to keep occupancy rates much lower, perhaps 40-50%; see the Delphix steady-state graph to understand why.
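
To see where the pool currently sits on occupancy and fragmentation, something like this will show it (pool name taken from the first post):

zpool list -o name,size,allocated,free,capacity,fragmentation STORAGE1-Z6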
 

Syptec

Dabbler
Joined
Aug 3, 2018
Messages
42
80% capacity for block storage purposes would eventually become catastrophically slow, especially for metadata traversal operations such as resilvers, unless some other factor mitigated this.

80% is the 100% in ZFS; I assume that is a given.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
80% is about 150% for block storage. 50% is the 100% in ZFS block storage. This is explained in the linked articles. Might want to reset your assumptions accordingly.
 

Syptec

Dabbler
Joined
Aug 3, 2018
Messages
42
I think the thread has veered off course a bit. I could be wrong, but I don't feel that I am.

The pool is RAIDZ2.
The vDEVs are 6 drives each.
The HDDs are HGST 4TB 7200 RPM 12G.

The mount is from VMware via NFS.

The storage is 30TB used out of 566TB.

The question that has been overlooked, with all the info provided, is: can the resilver time frame be improved?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
The question that has been overlooked, with all the info provided, is: can the resilver time frame be improved?

Overlooked? If you don't understand how and why something works the way it does, you are unlikely to figure out how to improve the time frame.
 

Syptec

Dabbler
Joined
Aug 3, 2018
Messages
42
Overlooked? If you don't understand how and why something works the way it does, you are unlikely to figure out how to improve the time frame.
It is understood that paying for support is easier....
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
It is understood that paying for support is easier....

Okay, but all the freeloaders here are generally hobbyists or cheapskates. And before someone takes offense at "freeloaders" and reports me for being rude/whatever, please remember that I too fall into this bucket. It is therefore generally useful and interesting to discuss the elements of the problem.
 

MrGuvernment

Patron
Joined
Jun 15, 2017
Messages
268
Any reason you have not updated to 13.0-U3 yet?

Take this all with a grain of salt given my limited TrueNAS experience, but it is also informed by many years of enterprise storage usage.

The problem with the speed is that when you resilver, it is scanning the entire 566TB of space to rebuild. But this is throwing me off; you mention:

Each vDEV is 6 drives, RAIDZ2

Is each vDev 6 drives = 24TB, and you have 53 or 54 vDevs to make up your single pool for the total 566TB?

@jgreco was noting that, for this application, mirrored pools may have been better (I believe). A mirror also only has to read the data that was written, rather than going block by block across the entire pool to make sure all data is covered (think RAID 1/10 vs RAID 5; parity RAID is poor for rebuilds and performance).

If the above is true and you built one massive 566TB pool, is there any reason you did it this way versus breaking it down into smaller pools as needed?

In the virtual world and the backup world, restoring smaller chunks is easier than restoring one massive chunk, if you are doing storage-level backups versus, say, Avamar VM-based snapshots.
 

MrGuvernment

Patron
Joined
Jun 15, 2017
Messages
268
Also, I just came across this myself as I work to design out my TrueNAS system:

2) You need to use mirrors for performance.

ZFS generally does not do well with block storage on RAIDZ. RAIDZ is optimized towards variable length ZFS blocks. Unlike "normal" RAID, RAIDZ computes parity and stores it with the ZFS block, and on a RAIDZ3 where you store a single 4K sector, you get three parity sectors stored with it (4x space amplification)! While there are optimizations you can do to make it suck less, the fact is that a RAIDZ vdev tends to adopt the IOPS characteristics of the slowest component member. This is partly because of what Avi calls "seek binding", because multiple disks have to participate in a single operation because the data is spread across the disks. Your ten drive RAIDZ2 vdev may end up about as fast as a single drive, which is fine for archival storage, but not good for storing lots of active VM's on.

By way of comparison, a two-way mirror vdev can be servicing two different operations (clients reading) simultaneously, a three-way mirror vdev can even be servicing three different operations. There is massive parallelism available with mirrors.
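
To put rough numbers on that for the hardware in this thread (ballpark figures only, assuming something like 150-200 random IOPS per 7.2K SAS drive):

52 six-wide RAIDZ2 vdevs at ~150-200 IOPS per vdev ≈ 8,000-10,000 random IOPS for the pool
156 two-way mirrors at ~150-200 IOPS per vdev ≈ 23,000-31,000 write IOPS, and up to double that for reads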

& linked from there:
 