Writing large files to a raidz pool pauses or causes errors

phatfish

Dabbler
Joined
Sep 20, 2015
Messages
16
Dear all,

I have had this setup running for a few years now: a mirror pool hosting VHDs for a number of VMs, and a raidz pool for a file share of mainly static, read-only files.

FreeNAS is on 11.2-U7; the hardware is an Intel-based Supermicro X10SDV-4C board, 32GB RAM, and WD Red 3TB HDDs all connected to the onboard SATA ports. There is also an SLOG device for the mirror pool, an Intel SSD connected to an HBA card.

The mirror pool (2x 3TB) performs fine; it maxes out my gigabit network for both read and write, and the VMs all run normally.

The raidz pool (4x 3TB) is fine as well, until you write a larger file, e.g. to a Samba share. After around 4-5GB the transfer pauses for 10 to 30 seconds, then starts again, and depending on the size of the file it can pause and resume several more times until it finishes. While it is moving, the transfer maxes out my gigabit network at 90-110MB/s.

freenas-copy.jpg

The dips here are where the transfer speed goes to 0.

The issue also occurs for a local file copy: if I copy a ~8GB file from the mirror pool to the raidz pool I can see it copy at 150-200MB/s for the first ~4-5GB, then it pauses completely for 10-30 seconds before writing again. Same behaviour as writing over SMB from a Windows box. I used rsync locally with --progress to see what was happening.

Copying over FTP just times out the transfer when the pause happens, and recently the pause has also started intermittently causing a 0x8007003b "network error" on Windows. I did upgrade from 11.2-U3 to 11.2-U7 in the last week, so maybe something changed there that makes Samba/Windows more sensitive to this issue.

The dataset on the raidz pool that I write to the most does have de-duplication enabled, but if I copy to a dataset without dedupe I get the same behaviour. The strange thing is that copying the exact same file a second time works without any pauses or slowdown at all.

Does anyone have advice on what might cause this behaviour? It feels like a cache is being filled, and when it has to commit to disk there is so much data to flush that writes stall for a long time. Maybe some settings can be tuned, but none of the logs I checked showed any errors, so it's difficult to know where to start.
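For what it's worth, the only obviously related knob I have found so far is the ZFS "dirty data" limit; on FreeBSD it can be checked with sysctl (I'm not sure yet whether it is actually involved):

Code:
# maximum amount of uncommitted ("dirty") ZFS data held in RAM, in bytes
sysctl vfs.zfs.dirty_data_max

# the same limit expressed as a percentage of physical memory
sysctl vfs.zfs.dirty_data_max_percent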

Thanks in advance.
 

phatfish

Dabbler
Joined
Sep 20, 2015
Messages
16
After some digging I found this article that describes the issue I'm seeing: http://dtrace.org/blogs/ahl/2014/08/31/openzfs-tuning/ (the "Write delay" section at the end was the most helpful).

Making a couple of the sysctl changes described in the article has helped:

vfs.zfs.delay_min_dirty_percent from 60 to 40
vfs.zfs.vdev.async_write_active_max_dirty_percent from 60 to 40
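For anyone wanting to try the same thing, they can be changed at runtime from the shell, and made persistent by adding them as sysctl-type Tunables in the FreeNAS GUI (System > Tunables):

Code:
# start throttling incoming writes at a lower fraction of vfs.zfs.dirty_data_max
sysctl vfs.zfs.delay_min_dirty_percent=40

# let the async write queues ramp up to full depth at a lower dirty-data level
sysctl vfs.zfs.vdev.async_write_active_max_dirty_percent=40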

The stall/pause when copying a ~4GB+ file no longer causes Windows to raise the 0x8007003b "network error". Having write operations over SMB error out was not good, so that's an improvement. However, I still have big pauses when the "dirty data" buffer gets flushed. I'd love to get more consistent write speed, if only to make sure other protocols like FTP don't have problems.

If anyone has any other ideas for tuning, or why I seem to hit such severe performance issues when the buffer is flushed, please let me know. My raidz pool is 4 disks in raidz1; I checked the SMART test history and all the drives seem fine, and with gstat running they all look to be performing the same, so I don't think there is a bad disk.



Some more info for those interested:

I ran the dtrace captures mentioned at the end of the article, and although I'm not really sure how to interpret them, they seem to show a lot of delayed/high-latency writes. That improved slightly after the sysctl change.

dirty_percentage 60 (the values below are in microseconds, I believe)
Code:
sudo dtrace -n delay-mintime'{ @ = quantize(arg2); }'
           value  ------------- Distribution ------------- count
               4 |                                         0
               8 |                                         1
              16 |                                         0
              32 |                                         0
              64 |                                         1
             128 |                                         2
             256 |                                         3
             512 |                                         8
            1024 |                                         14
            2048 |                                         27
            4096 |                                         56
            8192 |                                         110
           16384 |@                                        213
           32768 |@                                        401
           65536 |@@                                       712
          131072 |@@@                                      1145
          262144 |@@@@                                     1587
          524288 |@@@@@@@@                                 3373
         1048576 |@@@@@@@@                                 3204
         2097152 |@@@@@@                                   2331
         4194304 |@@@@                                     1452
         8388608 |@@                                       819
        16777216 |@                                        436
        33554432 |@                                        226
        67108864 |@                                        238
       134217728 |                                         0

dirty_percentage 40
Code:
sudo dtrace -n delay-mintime'{ @ = quantize(arg2); }'
           value  ------------- Distribution ------------- count
               8 |                                         0
              16 |                                         1
              32 |                                         2
              64 |                                         4
             128 |                                         8
             256 |                                         16
             512 |                                         32
            1024 |                                         63
            2048 |                                         127
            4096 |                                         251
            8192 |@                                        487
           16384 |@                                        925
           32768 |@@                                       1693
           65536 |@@@@                                     2864
          131072 |@@@@@                                    4244
          262144 |@@@@@@@                                  5233
          524288 |@@@@@@@                                  5154
         1048576 |@@@@@                                    4080
         2097152 |@@@                                      2689
         4194304 |@@                                       1574
         8388608 |@                                        855
        16777216 |@                                        444
        33554432 |                                         228
        67108864 |                                         277
       134217728 |                                         0


dirty_percentage 60 (I think the delay/no-delay ratio rather than the total count is significant here)
Code:
sudo dtrace -n fbt::dsl_pool_need_dirty_delay:return'{ @[args[1] == 0 ? "no delay" : "delay"] = count(); }'
  no delay                                                      16213
  delay                                                         16394

dirty_percentage 40
Code:
sudo dtrace -n fbt::dsl_pool_need_dirty_delay:return'{ @[args[1] == 0 ? "no delay" : "delay"] = count(); }'
  delay                                                         31335
  no delay                                                      31432


Related links.
https://youtu.be/EjGqVdCOIhM?t=1907
https://www.delphix.com/blog/delphix-engineering/zfs-fundamentals-write-throttle
https://www.delphix.com/blog/delphix-engineering/openzfs-write-throttle
https://github.com/zfsonlinux/zfs/wiki/ZFS-Transaction-Delay
https://github.com/zfsonlinux/zfs/wiki/ZFS-on-Linux-Module-Parameters
 

Dan Tudora

Patron
Joined
Jul 6, 2017
Messages
276
Hello,
It looks like you have a problem with one of your hard drives. While making a large transfer, run iostat and gstat to see the raw throughput from the HDD subsystem. If you see anything in red (in gstat), I think you have something "bad" at the HDD level.
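For example, something like this from the shell while the transfer is running (device names will differ on your system):

Code:
# live per-disk throughput and %busy, physical providers only
gstat -p

# extended per-device statistics, refreshed every second
iostat -x -w 1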
Good luck, and happy new year!
 

phatfish

Dabbler
Joined
Sep 20, 2015
Messages
16
Hey Dan, thanks for the reply. I did think about a hardware issue, but I couldn't see anything that looked obviously wrong. Under gstat I can see the drives are loaded heavily (%busy going red), but if writes are being pushed near their maximum I suppose that would be expected?

The drives in the raidz are ada4/5/6/7; ada2/3 are the mirror and ada1 is the mirror's SLOG.

I attached a GIF screen capture of gstat while a ~4.5GB file is being written. Hopefully opening it in a new tab will play it.
 

Attachments

  • mf6BhD8Lmk.gif (1.3 MB)

phatfish

Dabbler
Joined
Sep 20, 2015
Messages
16
Sorry for resurrecting this thread. I haven't copied much new data to this pool since the beginning of the year, but I took another look as I needed to start adding data again.

The issue was deduplication. I'm not sure how I discounted it completely in my first post; I might have been testing against the wrong dataset by mistake. After turning it off, the problem is gone.

So, another warning against using deduplication: even with seemingly enough RAM and CPU to support it, it can cause unexpected performance issues.
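For anyone else hitting this, these are the sorts of commands involved (the pool and dataset names below are just placeholders for my own):

Code:
# see which datasets actually have dedup enabled
zfs get -r dedup tank

# how much space dedup was actually saving, plus a dedup table summary
zpool list -o name,dedupratio tank
zpool status -D tank

# turn it off; this only affects newly written data, existing blocks stay
# in the dedup table until they are rewritten or deleted
zfs set dedup=off tank/share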
 

phatfish

Dabbler
Joined
Sep 20, 2015
Messages
16
Hi Dan,

Ah, that's interesting; it looks like there are some dedupe improvements being worked on that should help performance. For my data there was very little gain from dedupe anyway, so turning it off seemed the best option for now.

To be clear, there were no performance issues apart from when writing larger files (about 4GB+) over my gigabit LAN. Downloading from the internet was fine, for example.
 