Help troubleshooting iSCSI bursty write performance.

GeorgePatches

Dabbler
Joined
Dec 10, 2019
Messages
39
Well, in this case I did this to myself when I upgraded from the perfectly fine 1.5TB 5.9K drives to these 2TB 7.2K drives. Seemed like a good idea at the time; how could they possibly be worse than the 10-year-old drives I was already using? What I can't figure out is why they get bursty - why don't they just slow down?
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Unfortunately it's pervaded all the way down to even the smallest drives; that one is a 2-head/1-platter solution. Helps them cut costs, and "most people won't notice the difference!"

Sorry I didn't pick up on that earlier; it would've saved you a lot of troubleshooting. Give those two a happy life in a non-COW filesystem and find a couple of fresh drives to replace them.
 

GeorgePatches

Dabbler
Joined
Dec 10, 2019
Messages
39
Unfortunately it's pervaded all the way down to even the smallest drives; that one is a 2-head/1-platter solution. Helps them cut costs, and "most people won't notice the difference!"

Sorry I didn't pick up on that earlier; it would've saved you a lot of troubleshooting. Give those two a happy life in a non-COW filesystem and find a couple of fresh drives to replace them.
So question then. If I get enough vdevs could I make these work? Or is SMR plus COW just bad?
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
So question then. If I get enough vdevs could I make these work? Or is SMR plus COW just bad?
It's a matter of there being enough "idle time" in your pool to not overwhelm the "cache bands" of your drive. I'm not certain how large the cache zone on that specific drive is (and I'm positive Seagate will never tell me), so it's hard to say exactly how many vdevs would be needed.
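If you want to watch that cache-band exhaustion happen, here's a rough sketch of mine (not anything built into TrueNAS) that streams sequential writes into a file on the pool and prints throughput every few seconds. The path and sizes are placeholders, so point it somewhere expendable on the pool backed by those drives; on an SMR drive you'd expect it to start fast and then fall off a cliff once the CMR cache region fills.

import os, time

# Rough sketch - stream sequential writes and log throughput per interval
# so you can watch for the point where the write rate falls off a cliff.
TEST_FILE = "/mnt/tank/smr-test/testfile"   # hypothetical path on the pool
BLOCK = 1024 * 1024                         # 1 MiB per write
TOTAL_BLOCKS = 64 * 1024                    # 64 GiB total
REPORT_EVERY = 5.0                          # seconds between reports

buf = os.urandom(BLOCK)                     # incompressible data
written = 0
window = 0
last = time.monotonic()

with open(TEST_FILE, "wb", buffering=0) as f:
    for _ in range(TOTAL_BLOCKS):
        f.write(buf)
        written += BLOCK
        window += BLOCK
        now = time.monotonic()
        if now - last >= REPORT_EVERY:
            rate = window / (now - last) / (1024 * 1024)
            print(f"{written / 2**30:6.1f} GiB written, {rate:7.1f} MiB/s")
            window = 0
            last = now

Keep in mind this still goes through ZFS and the ARC rather than raw to the disk, so treat the numbers as a trend rather than an exact drive speed.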

Drive-managed SMR plus COW turns bad quickly because of a COW filesystem's tendency to target free space - it marks the earlier blocks as dirty in its own metadata, but there's no equivalent to an SSD's TRIM operation to go back and free up the space on the drive. So when you eventually use up all of the LBAs on the drive, it suddenly gets pressed into action when you rewrite a few blocks in a 256MB SMR zone - it now has to read the entire zone into cache, mix the new data in, and shingle it back out to an SMR zone on the platter.

With DM-SMR, none of this process is exposed to the controller/host OS/filesystem, so they just see a drive that's suddenly really slow to respond to a request to write to a given LBA. Things that cause heavy fragmentation (e.g. block storage on COW) make this condition worse. Large SMB copies can be OK; if there's enough space being written and deleted at once, the operations are done in bigger chunks and carry less seek overhead.
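To put a rough number on that zone rewrite penalty, here's a quick back-of-the-envelope sketch in Python - the 256MB zone size is the figure above, and the 16KiB record size is just a hypothetical zvol/iSCSI block size, so swap in your own:

# Back-of-the-envelope numbers for the read-modify-write penalty above.
ZONE = 256 * 1024 * 1024      # shingled zone size from the post, in bytes
RECORD = 16 * 1024            # hypothetical size of the block being rewritten

moved = ZONE * 2              # whole zone read into cache, then shingled back out
amplification = moved / RECORD

print(f"Changing {RECORD // 1024} KiB makes the drive move "
      f"{moved / 2**20:.0f} MiB internally (~{amplification:,.0f}x amplification).")

Worst case, a single small rewrite drags roughly half a gigabyte through the drive's internal cache, which is exactly the kind of stall that shows up as burstiness on the host side.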

Host-Aware and Host-Managed SMR might be better down the road, but there's nothing in ZFS to handle them properly yet. For now, avoiding SMR entirely is the safe approach.
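For what it's worth, here's a quick sketch for checking how a drive reports its zone model on a Linux-based system (e.g. TrueNAS SCALE). Bear in mind that drive-managed SMR deliberately presents itself as a conventional drive, so it will show up as "none" here; only host-aware/host-managed drives identify themselves this way, and DM-SMR still has to be caught from the spec sheet or model number.

from pathlib import Path

# Each block device reports its zone model via sysfs on reasonably recent
# Linux kernels: "none", "host-aware", or "host-managed".
for dev in sorted(Path("/sys/block").iterdir()):
    zoned = dev / "queue" / "zoned"
    if zoned.exists():
        print(f"{dev.name}: {zoned.read_text().strip()}")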
 