SOLVED iSCSI drops connection


jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
The new OpenZFS throttle doesn't try to "benchmark your vdevs" to figure out how fast your pool is - you get a set of knobs with the dirty_data tunables, and some relatively sane defaults. With low write pressure, the number of async I/Os (bulk dirty data flushes) queued to the vdevs themselves is low; as the amount of outstanding dirty data in the system grows, more I/O is queued up until you hit the point at which the throttle starts to apply, and then the delay scales up rapidly towards a maximum value (currently 100ms). This behavior is consistent regardless of pool, boot time, or data ingested - there's no "learning" or "benchmarking" aspect. If you want it to behave differently (throttle sooner or later, more or less aggressively, or allow for slower or faster vdevs), you need to fiddle with those knobs.

Yet it still behaves as though it does. Emergent second-order effect, perhaps. I don't really care. I came to ZFS years ago for VM block storage and found it largely unusable. I spent a lot of time dredging through stuff to understand what was going on, at a time when this wasn't a widely-understood issue, and arrived at some practical fixes, having wasted far too much time in the process.

I spent less time with the new write throttle, and yet it still proved that it isn't everything it's claimed to be. It is *better* than it was. It can still be caught off-guard though. I don't care to keep debugging. I have better things to be doing.

Part of the problem may be that I have an unrealistic (ha) expectation that systems should be waiting around for workloads to be dumped on them. I have hypervisors with 40Gbps aggregate network links that average less than 100Mbps of steady production traffic. But when stuff needs to happen, it can and does. I may be positioned somewhat better than average to suddenly be throwing big loads at things.
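
As a rough illustration of the curve described in the quoted post above, here's a back-of-the-envelope Python sketch of how the per-write delay ramps as dirty data approaches the cap. The hyperbolic shape and the default knob values (zfs_delay_min_dirty_percent = 60, zfs_delay_scale = 500,000 ns) are from my recollection of the comments in OpenZFS's dmu_tx.c, so treat the exact numbers as assumptions rather than gospel; the 100ms ceiling is the one mentioned above.

```python
# Sketch of the OpenZFS write-throttle delay curve (not the real implementation).
# Assumed defaults: zfs_delay_min_dirty_percent = 60, zfs_delay_scale = 500,000 ns,
# and the 100 ms per-operation ceiling mentioned above.

DIRTY_DATA_MAX = 4 * 1024**3          # bytes; assumed 4 GiB for illustration
DELAY_MIN_DIRTY_PERCENT = 60          # throttle starts at 60% of dirty_data_max
DELAY_SCALE_NS = 500_000              # steepness knob (nanoseconds)
MAX_DELAY_NS = 100_000_000            # 100 ms ceiling per the discussion above

def throttle_delay_ns(dirty_bytes: int) -> int:
    """Approximate per-write delay for a given amount of outstanding dirty data."""
    threshold = DIRTY_DATA_MAX * DELAY_MIN_DIRTY_PERCENT // 100
    if dirty_bytes <= threshold:
        return 0                      # below the knee: no throttling at all
    if dirty_bytes >= DIRTY_DATA_MAX:
        return MAX_DELAY_NS           # pegged at the ceiling
    # Hyperbolic ramp: small at first, climbing steeply as dirty approaches max.
    delay = DELAY_SCALE_NS * (dirty_bytes - threshold) / (DIRTY_DATA_MAX - dirty_bytes)
    return min(int(delay), MAX_DELAY_NS)

if __name__ == "__main__":
    for pct in (50, 60, 70, 80, 90, 95, 99, 100):
        d = throttle_delay_ns(DIRTY_DATA_MAX * pct // 100)
        print(f"{pct:3d}% dirty -> {d / 1e6:7.3f} ms per write")
```

At the 100ms ceiling, a writer issuing one outstanding operation at a time is held to roughly ten operations per second, which is where the "10 IOPS" figure later in the thread comes from.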
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
I'm not saying there aren't still issues; I'm just saying that you can't resolve them the same way you used to be able to. You no longer modify write_limit_shift, which acted as a divisor against RAM to set the txg max size - you're now directly adjusting vfs.zfs.dirty_data_sync to allow more data to collect in RAM, fiddling with the dirty_percent values to find the point where you get optimal bandwidth without adding more latency, and so on.
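
To put numbers on the before/after, here's a rough Python sketch of the two sizing models. The defaults it assumes - write_limit_shift of 3 (RAM/8) on the old throttle, and dirty_data_max of 10% of RAM capped at 4GiB plus a 64MiB dirty_data_sync on the new one - are from memory, so treat them as illustrative rather than authoritative.

```python
# Rough comparison of the old and new write-throttle sizing models.
# Values marked "assumed" are recollections of stock defaults, not gospel.

RAM_BYTES = 64 * 1024**3                  # example machine: 64 GiB of RAM

# Old model: txg max size derived from RAM via a shift (divisor).
WRITE_LIMIT_SHIFT = 3                     # assumed old default: RAM >> 3 == RAM / 8
old_txg_max = RAM_BYTES >> WRITE_LIMIT_SHIFT

# New model: absolute byte limits you tune directly.
DIRTY_DATA_MAX_MAX = 4 * 1024**3          # assumed ceiling on dirty_data_max (4 GiB)
dirty_data_max = min(RAM_BYTES // 10, DIRTY_DATA_MAX_MAX)   # assumed 10% of RAM, capped
dirty_data_sync = 64 * 1024**2            # assumed default: force a txg sync at 64 MiB
delay_threshold = dirty_data_max * 60 // 100   # throttling kicks in around 60% dirty

gib = 1024**3
print(f"old txg cap          : {old_txg_max / gib:.1f} GiB (RAM / {1 << WRITE_LIMIT_SHIFT})")
print(f"new dirty_data_max   : {dirty_data_max / gib:.1f} GiB")
print(f"new dirty_data_sync  : {dirty_data_sync / 1024**2:.0f} MiB")
print(f"delay starts around  : {delay_threshold / gib:.1f} GiB dirty")
```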

I have hypervisors with 40Gbps aggregate network links that average less than 100Mbps of steady production traffic. But when stuff needs to happen, it can and does. I may be positioned somewhat better than average to suddenly be throwing big loads at things.

Fully utilizing a 40Gbps link for writes is a seriously daunting task - if it's a bursty workload and you have at least a little foreknowledge of its size, you can try to absorb it all as dirty data in RAM and spool it out to the vdevs later, but otherwise you're talking about a 5GB/s ingest rate - and sustaining that necessitates a lot of horsepower on the back end; multiple NVMe drives, most likely. Your reads, though, are probably deliciously fast.
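
For a sense of scale, here's the arithmetic behind that 5GB/s figure, plus a very rough guess at the back end needed to sustain it - the per-drive throughput below is a made-up placeholder, not a benchmark:

```python
# Back-of-the-envelope arithmetic for sustaining a full 40 Gbps of incoming writes.
# The per-drive figure is a hypothetical placeholder, not a measured result.

LINK_GBPS = 40
ingest_bytes_per_sec = LINK_GBPS / 8 * 1e9        # ~5 GB/s of ingest at line rate

DRIVE_WRITE_BYTES_PER_SEC = 1.5e9                 # assumed sustained write rate per NVMe drive
drives_needed = ingest_bytes_per_sec / DRIVE_WRITE_BYTES_PER_SEC

print(f"ingest rate   : {ingest_bytes_per_sec / 1e9:.1f} GB/s")
print(f"drives needed : ~{drives_needed:.1f} (before any parity/mirroring overhead)")
```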
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Fully utilizing a 40Gbps link for writes is a seriously daunting task - if it's a bursty workload and you have at least a little foreknowledge of its size, you can try to absorb it all as dirty data in RAM and spool it out to the vdevs later, but otherwise you're talking about a 5GB/s ingest rate - and sustaining that necessitates a lot of horsepower on the back end; multiple NVMe drives, most likely. Your reads, though, are probably deliciously fast.

The point is more that it isn't unusual for there to be the potential for a huge amount of sudden ${stuff} going on, and as it stands, ZFS can still get in a bad spot pretty easily. FreeNAS has always gotten off easily in this regard because so many people were being limited by 1Gbps ethernet, which acts as a natural defense against some of these issues.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
The point is more that it isn't unusual for there to be the potential for a huge amount of sudden ${stuff} going on, and as it stands, ZFS can still get in a bad spot pretty easily. FreeNAS has always gotten off easily in this regard because so many people were being limited by 1Gbps ethernet, which acts as a natural defense against some of these issues.

Blowing out a write cache certainly isn't something that's exclusive to ZFS; it just happens to be one of the more common places to see it happen, because FreeNAS is so popular. ;)
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Blowing out a write cache certainly isn't something that's exclusive to ZFS; it just happens to be one of the more common places to see it happen, because FreeNAS is so popular. ;)

Flip side: blowing out write caches doesn't typically freeze all I/O and cause your storage array to drop offline.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Flip side: blowing out write caches doesn't typically freeze all I/O and cause your storage array to drop offline.

Admission: Ye Olde Write Throttle could do that.

Counterpoint: The new one will eventually throttle your pool down to around 10 IOPS if you're running right up against the dirty_data_max value (that's the 100ms maximum delay applied to every write), but I've yet to see that happen in real-world use.

Sidebar: If you know that you're going to have a huge burst of write activity, the array should be prepared for it - whether that means a sufficiently large write cache (or not-a-write-cache SLOG) or being built from devices of sufficient quantity and quality for the I/O profile (rough sizing sketch below).

Conclusion: I made a new thread in Off-Topic so we can quit hijacking this one. ;)
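
One last sketch on the Sidebar point above: a rough way to estimate how much dirty-data headroom it takes to soak up a burst that arrives faster than the pool can drain it. Every number here is an illustrative placeholder.

```python
# Illustrative sizing for absorbing a known write burst as dirty data in RAM
# while the pool drains it more slowly. All figures are placeholders.

burst_rate = 5.0e9        # bytes/sec arriving (e.g. a full 40 Gbps link)
drain_rate = 2.0e9        # bytes/sec the vdevs can actually sustain (assumed)
burst_seconds = 3.0       # how long the burst lasts

# Dirty data accumulates at (arrival - drain) for the duration of the burst.
peak_dirty = max(burst_rate - drain_rate, 0) * burst_seconds

print(f"peak dirty data: {peak_dirty / 1024**3:.1f} GiB")
print("dirty_data_max (and the RAM behind it) needs to comfortably exceed this,")
print("or the throttle will start delaying writes before the burst is over.")
```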
 