Write Distribution with Mirrors

zenon1823

Explorer
Joined
Nov 13, 2018
Messages
66
Just when I think I have ZFS mostly figured out, something makes me question my understanding. I have 2 pools: the first has 4 3TB drives as 2 mirrored vdevs, and the second is 4 2TB drives in RAIDZ1. As I understand from the countless threads I've read, when using mirrored vdevs, files being read or written are distributed across vdevs. This is why you get better IOPS and random file access times, especially with multiple users or VM workloads. And obviously when a file is written to a RAIDZ1 vdev, the write is striped across all 4 drives along with its parity.

But today I copied a large ISO to a datastore on the mirrored pool, and I happened to notice that all 4 drives reported write activity. Why would this be, since it was just a single file write? I thought mirrored pools didn't stripe files across vdevs the way RAID10 does.

Pool Details:
PriData Pool.JPG

Reported Activity:
PriData Pool Activity.JPG
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Why would you think that? As the free space on each vdev shrinks, the other one becomes more attractive to write to. ZFS will tend to write to the device with more free space available....
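
To make the free-space point concrete, here is a toy model only, not the actual ZFS allocator: each block of a file goes to a vdev chosen with probability proportional to that vdev's free space. The function names, the 1 MiB block size, and the free-space figures are all assumptions made up for illustration. Even with this crude model, a single large file ends up on both mirror vdevs, which is why all 4 drives show write activity.

```python
import random

MIB = 1024 ** 2
GIB = 1024 ** 3


def pick_vdev(free_bytes, chunk, rng):
    """Pick a vdev index, weighted by free space (toy model, not real ZFS)."""
    # Only vdevs that can still hold this chunk are candidates.
    weights = [f if f >= chunk else 0 for f in free_bytes]
    if sum(weights) == 0:
        raise ValueError("no vdev has room for this block")
    return rng.choices(range(len(free_bytes)), weights=weights, k=1)[0]


def write_file(free_bytes, file_size, block_size, rng):
    """Write a file block by block; return bytes written to each vdev.

    free_bytes is updated in place as space is consumed.
    """
    written = [0] * len(free_bytes)
    remaining = file_size
    while remaining > 0:
        chunk = min(block_size, remaining)
        i = pick_vdev(free_bytes, chunk, rng)
        free_bytes[i] -= chunk
        written[i] += chunk
        remaining -= chunk
    return written


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical state: mirror vdev 0 has less free space than mirror vdev 1.
    free = [1000 * GIB, 2000 * GIB]
    written = write_file(free, 4 * GIB, MIB, rng)
    for i, w in enumerate(written):
        print(f"mirror vdev {i}: {w / MIB:.0f} MiB of the ISO")
```

Both vdevs receive part of the file, with the emptier one receiving more. Each mirror vdev then writes its share to both of its member drives.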
 

zenon1823

Explorer
Joined
Nov 13, 2018
Messages
66
I swear I've read that mirrored pools are not like RAID10, in that data is not striped across vdevs the way it is in a traditional RAID10 architecture, and that this is one of the reasons why ZFS allows the addition of mirrors to expand capacity. But I just spent the last couple of hours re-reading my old threads and a mixture of resources and guides that I thought indicated that, and I have no idea why I concluded it (although there are some hints at it). So this might have just been a collective mishmash of thoughts.

So, accepting that I just came to a wrong conclusion and made a very dumb post, I am curious about why RAIDZ is referenced as being better suited for large contiguous files and streaming. I had thought it was because the files spanned multiple drives, which was ideal for large files, while mirrors were ideal for small files, random access, and high parallel demand because they can read and write a bunch of files in parallel.

While it seems to hold true that small files do better on mirrors, if mirrors are indeed spreading data across multiple vdevs, then logically large files should be just as good on mirrors as on RAIDZ, or better. But the resource below indicates that for single or small workloads on big files, RAIDZ is best suited:

"We have a group of video editors ... Streaming speeds will be very important as high-resolution video files can have gigantic bitrates. The more editors we have, the more performance we’ll need. If we only have a small handful of editors, we can probably get away with several RAIDZ2 vdevs, but as you add more editors, IOPS will become increasingly important to support all their simultaneous IO work ... and a set of mirrored vdevs will make more sense."
----> https://www.ixsystems.com/community/resources/picking-a-zfs-pool-layout-to-optimize-performance.101/

I absolutely get mirrors outperforming RAIDZ with a lot of editors or high IOPS, but when storing and streaming very few large files (backups, ISOs, etc.), is RAIDZ preferred then? Mirrors still seem to have the advantage.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
For large file workloads, RAIDZ is much more space efficient, and it turns out that the access patterns for many of those workloads do not involve multiple simultaneous accesses. That's just a practical observation and of course there are cases where that isn't true.
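
As a rough illustration of the space-efficiency point, the sketch below compares the two pools described at the top of the thread. It is simplified arithmetic under stated assumptions: it ignores metadata, RAIDZ padding and allocation overhead, and the TB versus TiB difference, and the function names are made up for this example.

```python
def mirror_usable_tb(n_vdevs, drive_tb):
    """Usable space of a pool of mirror vdevs: one drive's worth per vdev."""
    return n_vdevs * drive_tb


def raidz_usable_tb(n_drives, drive_tb, parity):
    """Usable space of one RAIDZ vdev: data drives only (simplified)."""
    return (n_drives - parity) * drive_tb


if __name__ == "__main__":
    # Pool 1: 4 x 3TB drives as 2 two-way mirror vdevs.
    mirror_raw = 4 * 3
    mirror_usable = mirror_usable_tb(n_vdevs=2, drive_tb=3)

    # Pool 2: 4 x 2TB drives in a single RAIDZ1 vdev.
    raidz_raw = 4 * 2
    raidz_usable = raidz_usable_tb(n_drives=4, drive_tb=2, parity=1)

    print(f"mirrors: {mirror_usable} of {mirror_raw} TB raw "
          f"({mirror_usable / mirror_raw:.0%})")
    print(f"RAIDZ1:  {raidz_usable} of {raidz_raw} TB raw "
          f"({raidz_usable / raidz_raw:.0%})")
```

Under these assumptions the mirrored pool yields 6 TB from 12 TB raw (50%), while the RAIDZ1 pool yields 6 TB from 8 TB raw (75%).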

Where multiple simultaneous accesses of data are important, especially writes, RAIDZ starts to suffer because it isn't as good at handling them. It's still worth trying to see if you can make it work through the use of L2ARC etc.

Large files, little or no concurrent access, lower cost to implement, RAIDZ is the winner.

Small files, lots of concurrency, mirrors will tend to be the better choice.
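
For a back-of-the-envelope view of the concurrency side, a commonly cited rule of thumb is that each vdev, whether mirror or RAIDZ, delivers roughly the random write IOPS of a single member disk, while a mirror vdev can serve different random reads from each of its disks. The sketch below applies that rule to the two pools in this thread. The per-disk figure of 100 IOPS is an assumed placeholder rather than a measurement, the function names are hypothetical, and real results depend heavily on caching, record size, and workload.

```python
def pool_random_write_iops(n_vdevs, per_disk_iops):
    """Rule of thumb: each vdev contributes about one disk's random write IOPS."""
    return n_vdevs * per_disk_iops


def mirror_pool_random_read_iops(n_vdevs, disks_per_mirror, per_disk_iops):
    """Rule of thumb: every disk in a mirror can serve a different random read."""
    return n_vdevs * disks_per_mirror * per_disk_iops


if __name__ == "__main__":
    PER_DISK_IOPS = 100  # assumed placeholder for a spinning disk

    # 2 two-way mirror vdevs versus 1 RAIDZ1 vdev, as in the original post.
    print("mirror pool write:", pool_random_write_iops(2, PER_DISK_IOPS))
    print("mirror pool read: ", mirror_pool_random_read_iops(2, 2, PER_DISK_IOPS))
    print("RAIDZ1 pool write:", pool_random_write_iops(1, PER_DISK_IOPS))
```

Sequential streaming of a few large files is a different case: it is limited more by aggregate disk throughput than by IOPS, which is where a wide RAIDZ vdev holds up well.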
 