Some differences between RAIDZ and mirrors, and why we use mirrors for block storage

ZFS is a complicated, powerful system. Unfortunately, it isn't actually magic, and there's a lot of opportunity for disappointment if you don't understand what's going on.

RAIDZ (including Z2 and Z3) is good for storing large sequential files. ZFS will allocate long, contiguous stretches of disk for large blocks of data, compress them, and store the parity efficiently. RAIDZ makes very good use of the raw disk space available when you use it in this fashion. However, RAIDZ is not good at storing small blocks of data. To illustrate what I mean, consider the case where you wish to store an 8K block of data on RAIDZ3. To store it, you write the 8K of data plus three additional parity blocks, so most of what you've consumed is parity... not efficient. Further, from an IOPS perspective, a RAIDZ vdev tends to exhibit the IOPS behaviour of a single component disk (and the slowest one, at that).
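To put rough numbers on that, here's a minimal sketch of the simplified small-block allocation model described above: data sectors, plus parity sectors per stripe, with the whole allocation padded to a multiple of (parity + 1) as RAIDZ does to avoid unusable gaps. The vdev width and ashift here are assumptions for illustration, and real allocations also depend on compression.

[CODE]
import math

def raidz_alloc_sectors(block_bytes, width, parity, ashift=12):
    """Approximate sectors consumed by one block on a RAIDZ vdev.

    Simplified model, ignoring compression: data sectors, plus
    'parity' sectors per stripe, rounded up to a multiple of
    (parity + 1). Illustrative, not exact.
    """
    sector = 1 << ashift                        # 4 KiB sectors at ashift=12
    data = math.ceil(block_bytes / sector)      # data sectors needed
    stripes = math.ceil(data / (width - parity))
    total = data + parity * stripes             # add parity sectors
    pad = parity + 1                            # RAIDZ padding multiple
    return math.ceil(total / pad) * pad

# An 8K block on an 11-wide RAIDZ3 vdev: 2 data + 3 parity sectors,
# padded to 8 -- only a quarter of the space allocated holds data.
print(raidz_alloc_sectors(8192, width=11, parity=3))    # -> 8

# A 128K block on the same vdev fares much better: 32 data sectors
# out of 44 allocated, roughly 73% space efficiency.
print(raidz_alloc_sectors(131072, width=11, parity=3))  # -> 44
[/CODE]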

So we see a lot of people coming into the forums trying to store VM data on a 12-disk-wide RAIDZ2 and wondering why their 30 TB array sucks for performance. It's exhibiting the speed of a single disk.

The solution to this is mirrors. Mirrors don't make as efficient use of the raw disk space (because you only end up with 1/2 or 1/3 of the space), but in return for the greater resource commitment, you get much better performance. First, mirrors do not consume a variable amount of space for parity. Second, you're likely to have more vdevs. That 12-drive system we were just talking about becomes 4 three-way mirrors or 6 two-way mirrors, which is 4x or 6x the number of vdevs. This translates directly into greatly enhanced performance!
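A rough back-of-the-envelope on what that vdev count means for write IOPS. The ~150 IOPS per disk is an illustrative 7200 RPM ballpark assumption, not a spec:

[CODE]
# Rule-of-thumb write IOPS for 12 disks at ~150 IOPS apiece.
# Figures are illustrative assumptions, not measurements.
DISKS, IOPS_PER_DISK = 12, 150

layouts = {
    "1 x 12-wide RAIDZ2": 1,   # one vdev ~= one disk's worth of IOPS
    "4 x 3-way mirrors":  4,   # four vdevs
    "6 x 2-way mirrors":  6,   # six vdevs
}
for name, vdevs in layouts.items():
    print(f"{name:20s} ~{vdevs * IOPS_PER_DISK:4d} write IOPS")

# Mirrors do even better on reads, since any disk in a mirror
# can service a read independently.
[/CODE]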

Another substantial performance enhancement with ZFS is to maintain low pool occupancy rates.

For RAIDZ-style file storage, it's commonly thought that performance will suffer once you pass the 80% mark, but this isn't quite right. It's the combination of fragmentation and occupancy that causes performance to suffer.

For mirrors, this is also true, but because the data being stored is often VM disk files or database files, it becomes more complicated. Because ZFS is a copy-on-write filesystem, rewriting a block in a VM disk file causes a new block to be allocated somewhere else, and leaves a hole where the old block was once that block is freed (after any snapshots referencing it are released, etc).

When writing new data, ZFS likes to allocate contiguous regions of disk to write its transaction groups. An interesting side effect of this is that if you are rewriting VM disk blocks 1, 5000, 22222, and 876543, these may actually be written as sequentially allocated blocks when ZFS dumps that transaction group to disk. A normal disk array would have to do four seeks to perform those writes, but ZFS *may* be able to write them sequentially. Taken to its logical conclusion, when ZFS has massive amounts of free space to work with, it can potentially be five or ten times faster at performing writes than a conventional disk array. The downside? ZFS will suffer if it lacks that free space.
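Here's a toy sketch of that write-coalescing idea. This is not how the real ZFS allocator works internally; it's just a minimal model showing that, with ample free space, scattered logical rewrites can land as one contiguous physical run:

[CODE]
# Toy copy-on-write allocator. Purely illustrative -- a minimal
# model of the behaviour described above, not ZFS's actual code.

class ToyCowPool:
    def __init__(self, total_blocks):
        self.next_free = 0      # frontier of contiguous free space
        self.block_map = {}     # logical block -> physical block
        self.total = total_blocks

    def rewrite(self, logical_blocks):
        """Flush one transaction group, allocating contiguously.

        Each rewrite gets a fresh physical block; the old copy
        becomes a hole to be freed later.
        """
        start = self.next_free
        for lb in logical_blocks:
            self.block_map[lb] = self.next_free
            self.next_free += 1
        return list(range(start, self.next_free))

pool = ToyCowPool(total_blocks=1_000_000)
# Scattered logical rewrites, as in the example above...
print(pool.rewrite([1, 5000, 22222, 876543]))
# -> [0, 1, 2, 3]: one sequential write instead of four seeks
[/CODE]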

If you want really fast VM writes, keep your occupancy rates low. As low as 10-25% if possible. Going past 50% may eventually lead to very poor performance as fragmentation grows with age and rewrites.
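In practice, that means provisioning substantially more raw disk than the data you plan to store. A quick back-of-the-envelope using the occupancy targets above (the 3 TB of VM data is a hypothetical figure):

[CODE]
# Hypothetical sizing example for the occupancy targets above.
vm_data_tb  = 3.0    # block/VM data you actually need to store
occupancy   = 0.25   # target pool occupancy (10-25% for fast writes)
mirror_ways = 2      # 2-way mirrors: usable space is half of raw

usable_tb = vm_data_tb / occupancy     # pool size needed
raw_tb    = usable_tb * mirror_ways    # raw disk to provision
print(f"{usable_tb:.0f} TB usable pool, {raw_tb:.0f} TB raw disk")
# -> 12 TB usable pool, 24 TB raw disk for 3 TB of VM data
[/CODE]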

None of this helps with reads, of course, which over time become highly fragmented. ZFS typically mitigates this with gobs of ARC and L2ARC, which allow it to serve up the most frequently accessed data from the cache.