Different IOPS for drives in one zvol (raidz)

Status
Not open for further replies.

appoli

Dabbler
Joined
Mar 22, 2017
Messages
44
Hi All,

I have a question regarding the IOPS I see for the 3 drives that compose the primary zvol where my data is stored on my FreeNAS rig. Specifically, two of the drives show the same “OPS” values while the third always shows higher numbers (by a significant percentage). All three drives are essentially the same (Western Digital RE 3TB disks); however, the drive showing higher OPS is a slightly different SKU. I’m not 100% sure what the difference is, but more is explained below.

I was wondering if this is normal or if this is because there is a performance difference between the drives, and if so, would the drive with higher OPS be of higher or lower performance (personal curiosity).
Also this is based on the widget that is shown on the FreeNAS Corral dashboard.

A little background - I was looking to upgrade my home NAS as the one I had been using was getting long in the tooth and didn’t have any redundancy, so I wanted to get a new NAS before that one failed (have no fear - the NAS was and is used as a backup of data, not the primary source). Not wanting to spend a lot of money, but still wanting the ability to run some basic applications off of the NAS like I had been doing, I spent some time searching for deals and after a long while found a WD DX4000 Sentinel for a very good price. However, this one only came with two 3TB drives, and if I was going to go for redundancy I wanted at least RAID5 (I know now that having only a single parity disk with platter drives over 2TB probably isn’t the best idea due to resilvering times, but I didn’t want to break the bank, and the Sentinel runs Windows so I liked the idea of faster and allegedly more resilient RE drives).

When the Sentinel was first released there were only a few RE drive SKUs that it would work with, but WD has since released a patch allowing all drives to be compatible. Since I couldn’t find the ‘allowed’ RE drives, at least for less than what I paid for the whole unit, I bought an RE drive that wasn’t one of the few allowed SKUs, figuring the patch would let it be used and it was still an RE drive with the same specs as far as I could tell.
Fast forward some time: the drives worked, but the install of the WD ISO for some reason didn’t install the necessary AFP plugins, so Time Machine wouldn’t work unless I used the iSCSI method... which I tried and it worked fine for me, but initiators cost money and other users didn’t want to deal with the hassle of using one. Also, the Sentinel was very slow even with the RAM doubled, running even one extra application taxed the system a lot, and the fan didn’t cycle very well so it was loud. So that led me to FreeNAS and all the wonderful possibilities... unfortunately I learned about FreeNAS/switched right as Corral came out, have become enamored with Docker, and it works OK for now.

If I had started with FreeNAS from the get-go I probably would have gotten Red drives or some HGST drives, but seeing as I had about $600 in RE drives that are ‘supposed’ to be more resilient and aren’t from the same batch, I figured I’d re-use them as my main pool.

The drive that has higher OPS as per the widget is the ‘unapproved’ drive (WD3000FYYG I believe is the full SKU; the FYYG part is the important bit). The other two drives have basically the same OPS and I believe their SKU is WD3000F9MZ... not 100% sure, but I believe that is it.

So, long story short (I work in metadata management in an architectural role, so I tend to be very verbose in an effort to be as precise as possible and provide proper context - my apologies): is this normal? And if so, I would appreciate some background as to why, out of curiosity.
If it’s because of different performance characteristics, what factors would you think are the cause, and which drive performs better?
One of the bits that made me really curious is that read/write operations seem to start and end at the same time for all the disks (granted, I don’t know the refresh rate of the widget, and I’m sure there is more to it than that as well). Since a raidz vdev generally has the performance of its slowest disk, I figured that if the disks’ performance were unequal, the OPS changes wouldn’t happen in unison - when the faster disks were done, their OPS would drop while the slower one kept working to catch up. Or maybe in scenarios where the full bandwidth of the disk isn’t being used there are more OPS because that disk is slower? (This feels wrong, since the peak OPS of that one disk seems higher by the same consistent margin when there is high load/traffic.)

Anyway, thank you very much for any possible answers!

And some hardware/software info if it’s needed:

FreeNAS Corral 10.4 (latest build I believe)
Pool is running with LZ4 compression & no dedupe
Supermicro X11SSL-F
Intel i3-6100 (Skylake)
32 GB ECC Micron/Crucial RAM at 2133 MT/s over 2 UDIMMs
1 Supermicro 16 GB SATA DOM for the OS
3 x 3TB WD RE ‘Datacenter’ drives at 7200 rpm via SATA (1 RAIDZ1 vdev)
1 NIC (i210) used for the OS, and I also have the IPMI NIC in use
Recently added an Oracle Sun F20 storage accelerator for use as an SLOG. All 4 FMODs are striped and used as a 96 GB ZIL. Nothing else is attached to that HBA.

I use the NAS for Time Machine backups and a couple of Windows backups.
In addition I have Emby, Transmission, Nextcloud, and MySQL for Nextcloud running via Docker.


PS -
I was also wondering if it might be worthwhile to use some of the FMODs from the F20 as an L2ARC, since there isn’t a whole lot of traffic to the machine except when downloading/watching media and 96 GB for the SLOG is way more than I need... and I thought having maybe 24 GB, or 48 GB (which would be two FMODs striped), as an L2ARC might be helpful for queuing up large video files when they are being viewed?

Based on the widget it seems like my metadata hit ratio is usually quite high (once again, this is from the widget... no arcstat in FN10) and the metadata and data misses go up when a movie is being moved around... I also thought it might help with transcoding speed when multiple streams are being viewed?
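
In case it helps with an answer, this is roughly how I imagine the widget’s numbers could be cross-checked from the shell - just a sketch reading the standard FreeBSD ARC counters via sysctl, and I’m assuming those kstat names are exposed in Corral the same way they are in stock FreeBSD, since arcstat itself isn’t there:

Code:
# Rough sketch, not verified on Corral: read the cumulative ARC counters the
# kernel keeps since boot and turn them into hit ratios.
import subprocess

def kstat(name):
    out = subprocess.check_output(
        ["sysctl", "-n", "kstat.zfs.misc.arcstats." + name])
    return int(out.decode().strip())

hits, misses = kstat("hits"), kstat("misses")
meta_hits = kstat("demand_metadata_hits")
meta_misses = kstat("demand_metadata_misses")

print("overall ARC hit ratio: %.1f%%" % (100.0 * hits / (hits + misses)))
print("demand metadata ratio: %.1f%%" % (100.0 * meta_hits / (meta_hits + meta_misses)))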

I haven’t used LAGG yet since one IP can’t use the aggregated bandwidth, and it seems like when two separate video files are being watched it’s the hardware, specifically the drives, that is the bottleneck.

Cheers!
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
I am not specifically familiar with these drives, so I had to do a Google search for more info, and it looks as if the one that is posting higher numbers only has 32 MB of cache, where the other two both have 64 MB of cache. This difference would, theoretically, mean that there would be twice as many transfers between the system and the drive to write the same amount of data.
I am also not familiar with the widget in Corral because I did not use it very long before they announced they were abandoning it. I had installed it on a test rig that I subsequently re-purposed.
The OPS number that the widget is reporting probably relates to the number of operations required to transfer the data.
If you are using RAID-Z1, the amount of data written to each disk would be the same, but the number of operations to write that data would be higher on the disk that has half as much cache memory. The system transfers data until that cache is full and then must wait for the drive to say it is ready for more. The slow point in the process is the spinning disk.
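Just to put rough numbers on that idea (purely illustrative - the cache sizes are the ones under discussion, not confirmed specs, and real drives overlap filling and flushing):

Code:
# Back-of-the-envelope only: if the on-drive buffer had to be completely
# filled and flushed for each pass, half the buffer means twice the cycles
# for the same amount of data written to the disk.
data_mb = 1024  # hypothetical 1 GiB landing on each member disk

for cache_mb in (64, 32):
    cycles = data_mb / float(cache_mb)
    print("%d MB cache -> %.0f fill/flush cycles for %d MB written"
          % (cache_mb, cycles, data_mb))
# 64 MB -> 16 cycles, 32 MB -> 32 cycles: twice as many transfers
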
This is all theoretical and I could be completely wrong. No guarantee, no refund.
I hope this clears it up for you, a little?
 

appoli

Dabbler
Joined
Mar 22, 2017
Messages
44
Hi Chris,

Thanks for the prompt response!

Would you be able to link me to where you saw that the drive has 32 MB of cache? Looking up WD3000FYYZ, the spec sheet I find says it has 64 MB...

However, if it does indeed have less cache then that makes total sense. Writing the same amount of data would require unloading the cache twice as many times, which could result in more (double?) the number of write operations - if that is how the OPS widget determines its output.
That would be annoying if it did have less cache since I was trying to buy a drive that was as similar as possible on paper...


Regardless, would one disk having 32 MB of cache while the other two have 64 MB yield much of a performance difference with ZFS?
My understanding is that ZFS tries to have as much control over how data is stored on a drive as possible. So while the cache would need to be cleared more often, would 32 MB more of high-speed buffer space equate to much of a performance boost? Assuming the read/write performance was otherwise the same.

I feel like I read somewhere that for pools where the ZIL is stored in the pool (and even when an SLOG is present, if it’s still stored on the pool), ZFS can gain some efficiency because it controls the disk so that the ZIL data is always written in the same place... meaning that writing/reading the ZIL doesn’t require a physical seek to find it.
I know that ZFS doesn’t know where all the information is stored on a disk (thus the metadata misses from the ARC), but if the data that needs to be written is held in a high-speed location (e.g. RAM, or a SLOG on a PCIe lane) and streamed to the disks, with ZFS controlling as much of the disk operation as possible and disk speed being the bottleneck, I get the feeling the decrease in cache size wouldn’t be a huge detriment... probably just however long it takes the logic board on the hard drive to fill the cache, read the cache, then dump the cache?

This segues nicely for me, since I’ve been wondering if a disk with a larger cache has much benefit.

Thanks again!
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
WD3000FYYZ is not the same model you gave in your original post.
The WD3000F9MZ model does appear to match the specs of the FYYZ, but the F9MZ drive is not an 'Re', it is an 'Se' drive, and I am not finding a good resource on what the differences are.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
It is my understanding that the reason disk manufacturers make the cache larger is to improve performance, but since ZFS also uses the system memory as a cache, I would guess the amount you would notice this is minimal. Disks with a large cache are generally most desirable in workstations, where the user is directly accessing the disk and the speed of the operation directly influences their user experience.
 

appoli

Dabbler
Joined
Mar 22, 2017
Messages
44
Ahh sorry, honestly I’m not in front of my machine so it’s all being done from memory!!

This is related to the Corral widget as well, but my SLOG device usually reads a really small number of OPS too. Usually less than 1 (and that’s across 2 or more devices, so it can look like less than half an op each). I wonder how that’s calculated...
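
My guess - and it is only a guess about how the widget works - is that it just averages a cumulative operation counter over its refresh window, something like:

Code:
# Pure speculation about the widget's math: a handful of log writes divided
# by a longer refresh window comes out as a fraction of an op per second,
# and smaller still if it is then split across the striped FMODs.
writes_in_window = 3   # hypothetical sync writes during the window
window_seconds = 10    # hypothetical refresh interval
devices = 4            # the four striped FMODs in the SLOG

print("pool-wide:  %.2f ops/s" % (writes_in_window / float(window_seconds)))
print("per device: %.3f ops/s" % (writes_in_window / float(window_seconds * devices)))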
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
You really shouldn't need a SLOG with the hardware you have. Did you read the build guide before you started?
I am not saying take it out, but it isn't doing you any good.
 

appoli

Dabbler
Joined
Mar 22, 2017
Messages
44
I did indeed read the guides and whatnot before I put the system together, and originally didn’t have one - it was an addition about a week ago because I found the card for $20, and it also adds 8 SAS lanes (although they are gen 1, so the max size SATA drive I could add would be 2TB... I think I can go larger with a SAS drive though?), which was of interest to me because I only have 2 SATA ports left, and well, it’s $20...

I know that when using CIFS and the like no synchronous writes are being done, but it does get used when my VM is moving around large bits of data (specifically logging videos and transcoding them).
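
(For reference, this is roughly how I would check which datasets can actually push synchronous writes through the SLOG - just a quick sketch, with "tank" standing in for the real pool name:)

Code:
# List the sync setting per dataset. Only writes the client marks synchronous
# (sync=standard) or all writes (sync=always) go through the ZIL/SLOG;
# sync=disabled bypasses it entirely.
import subprocess

out = subprocess.check_output(
    ["zfs", "get", "-r", "-H", "-o", "name,value", "sync", "tank"]).decode()

for line in out.splitlines():
    name, value = line.split("\t")
    print("%-40s sync=%s" % (name, value))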

I did see that there are posts saying an SLOG is not needed/detrimental with a “minimal” amount of RAM/when RAM isn’t maxed out, but I couldn’t find anything about how a fast SLOG would be detrimental (meaning an SLOG that isn’t just a regular SSD hooked up via SATA).

Pardon my noob-ness, but would you either be able to link me to something that would explain how it would be of detriment for my hardware or give a quick breakdown?

My thought process was that since there are instances where the LAN NIC is being saturated and the machine itself (namely the VM) is processing/writing a decent amount of data - in those scenarios (which don’t happen all that often, probably only a few times a day) the SLOG could be helpful?

As an example it seemed as though when some larger Nextcloud writes/syncs were taking place they were happening a little quicker....but I’m definitely all ears.
When it comes to hardware I’m no pro!

Edit:
A little more about the Nextcloud bit - the data transfers with Nextcloud/MySQL have always been below the max network speed, so my assumption was that the bottleneck was somewhere internal. Like I said, $20 for a ‘fun’ experiment didn’t sound like a big deal to me and it did feel like it may have helped...
but who knows, that may have been at the expense of something else
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
I am not saying that it is causing any harm, and it may actually be giving you a little boost here and there. It is just one of those rules of thumb, and your installation being small, it probably won't get much utilization. I can't fault your logic, though. You have it, why not use it?
 

appoli

Dabbler
Joined
Mar 22, 2017
Messages
44
Ahh, thank god! I had seen posts about an SLOG being detrimental with not enough RAM or some other resources.

Not sure about your build, but if you have a PCIe slot and no SLOG, give the F20 a try. There are a bunch refurbished on eBay for 20-30 bucks (they use 4 flash modules consisting of SLC NAND and RAM along with a supercapacitor, and have the extra SAS lanes from the LSI HBA on the card. It seems to have been made specifically with Sun systems running ZFS in mind... it was $3500 in 2010/2011. They’re on the F80 now I believe... I wanted the F20 because of the SAS lanes & SLC NAND).

And once again thank you so much for all your help Chris! You have gone above and beyond for a stranger
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194

appoli

Dabbler
Joined
Mar 22, 2017
Messages
44

Hi Eric,

Yes, I’m very well aware - I stuck around because of my use of Docker (for the money spent on the rig, being able to use VMs is a must, and I got everything configured/set up a month or so after it was released and really don’t want to go to a non-Docker-enabled system).

I’m not sure what commands/scripts are included in Corral (e.g. arcstat isn’t), so I didn’t dig too deep into figuring out how else to get stats on disk IOPS (I had been considering collectd, but my time is at a premium), and if I’m going to spend time it would be to set up Docker on an FN11 implementation. I have been weighing the time it would take me against when FN11.1 would be released (and then having to set up the Dockers again) and ended up deciding to try to wait 11.1 out.

I know how to turn a computer on, can build one pretty easily, and know how to troubleshoot, but it has been a long time since I’ve had to get ‘technical’ and go into a CLI... and back then it was only with Windows. I also have a pfSense router with port forwarding and a reverse proxy running, so re-doing everything would take time. I’m sure there are plenty of others (maybe the vast majority on the forum?) who would probably be able to breeze through all that, but for me it would require sitting down and spending a night or maybe two... it doesn’t sound like much, but I just have so much other stuff on my plate at the moment... and when I unwind I really like using Emby to kick it and chill! (Hell, if I’m at a friend’s and we are feeling lazy, we’ll pull out a phone, instruct FreeNAS to get whatever we want to watch, and will be enjoying it shortly after!)

If Corral calculates OPS much differently than any other version then apologies for asking a question based on an irrelevant methodology :(
I’ve just been curious about the OPS difference for so long...finally asked! (First post as well).

Also, that drive gets a little warmer than the other two (3-5°C), toeing the line at 40°C now that it’s summer, so I thought maybe the two could be related... or it could just be fan/drive placement. But I see so much about 40°C being the equivalent of crossing the Rubicon or something, so...
I did set up the fan duty cycle script a couple of days ago to prevent that, but I had to unplug the machine to put the CPU fan on the peripheral header and now I can’t get the Docker host back up no matter what, so I might end up switching anyway...

Thanks for the heads-up and the great work, and I’m really, really excited about FN11. I just wish I had done a bit more homework before pulling the trigger on FN10 over 9 (I just really, really like shiny things!)
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
You don't actually quote IOPS figures, and the specific workload is somewhat vaguely defined, so it's hard to say much. However:
  • Even within batches, a 10% (5% each side) spread of drive performance is not crazy
  • Averaged over time, the IOPS figures have to be nearly identical for all disks in the pool, otherwise something's really wrong. This is where the graphs, or raw numbers like the sketch below, would be useful.
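
If you want raw per-disk numbers to compare against the widget, something along these lines would do it - a rough sketch, assuming the pool members show up as ada0 through ada2 (check camcontrol devlist for the real names) and using the stock FreeBSD iostat:

Code:
# Sample per-disk read+write operations per second over one interval,
# independently of the dashboard widget.
import subprocess

DISKS = ["ada0", "ada1", "ada2"]   # adjust to your actual device names
WAIT = 10                          # seconds per sample

out = subprocess.check_output(
    ["iostat", "-x", "-d", "-w", str(WAIT), "-c", "2"] + DISKS).decode()

# The first report is the since-boot average; keep only the last one per disk.
ops = {}
for line in out.splitlines():
    fields = line.split()
    if fields and fields[0] in DISKS:
        ops[fields[0]] = float(fields[1]) + float(fields[2])  # r/s + w/s

for disk in DISKS:
    print("%s: %.1f ops/s" % (disk, ops.get(disk, 0.0)))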
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
At work, I have a bunch of 16-bay servers that are identical except that some have HGST drives, some have Seagate Constellation drives, and some have Western Digital Re drives. It is totally true that there is individual variation, drive to drive, but on average the Western Digital Re drives run hotter than the others. Just my observation.
I went to WD and looked up the spec sheets on these two drives, and they are very similar but there are differences in the details. I also think that the Se version is a newer design than the Re model. Here are the links to the PDF documents:
http://products.wdc.com/library/SpecSheet/ENG/2879-800066.pdf
http://products.wdc.com/library/SpecSheet/ENG/2879-800042.pdf
I noticed that the Re has a lower sustained transfer rate than the Se, but you might find other reasons for the difference when you go through the specifications in detail.
Personally I prefer Seagate drives because of the level of detail they give in the SMART data.
 