We have recently upgraded to v9.3.
The system is a Dual E5430, 32GB RAM, 256GB L2ARC, LSI 9211-8i, 12x WD20EARX.
During this test I am using a single Samsung 850 Pro SSD connected to the LSI 9211-8i, to rule out slow HDDs as the cause of the speed issues.
I have set sync=disabled to try and optimise NFS write speed.
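For reference, the property was set with something along these lines ("tank/nfs" is a placeholder for the actual dataset exported over NFS):

```shell
# Disable synchronous write semantics on the NFS-exported dataset
# ("tank/nfs" is a placeholder for the real pool/dataset name).
zfs set sync=disabled tank/nfs

# Confirm the property took effect
zfs get sync tank/nfs
```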
When writing over NFS from Linux, using fio to simulate random reads/writes, I see the following behaviour: performance is good, but throughput drops to 0KB/s at times while ZFS appears to flush the data to disk, and then resumes.
This produces latencies on the order of 8 seconds (not milliseconds) at times.
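I have not kept the exact job file, but the fio run was along these lines (the 4k block size and 1G-per-job size are inferred from the output below; the path is a placeholder for the NFS mount):

```shell
# Approximate fio invocation: 4 jobs of mixed 4k random read/write
# against the NFS mount. "/mnt/nfs" stands in for the real mount point.
fio --name=random-readwrite --directory=/mnt/nfs \
    --rw=randrw --bs=4k --size=1G --numjobs=4 \
    --ioengine=libaio --direct=1
```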
Below is a small extract of the fio output.
When the test first starts it runs at around 27MB/s of writes while zpool iostat shows 0MB/s. Then, after roughly 5-10 seconds, zpool iostat spikes to 200-400MB/s, which is around the time the NFS throughput drops to 0KB/s and the latency occurs.
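For reference, I am watching the pool activity with something like this ("tank" is a placeholder for the real pool name):

```shell
# Print pool-wide I/O statistics every second; "tank" is a
# placeholder for the actual pool name.
zpool iostat tank 1
```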
Is there any reason it blocks I/O completely? I have done a lot of reading about the ZFS write throttle and the tunables related to it, but I cannot get it to behave. I don't mind sacrificing speed for consistent latency.
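On 9.3 the write throttle is driven by the dirty-data tunables, so the sort of sysctls I have been experimenting with look like this (values are illustrative examples, not recommendations):

```shell
# Example OpenZFS write-throttle sysctls on FreeBSD-based 9.3;
# the values shown here are illustrative, not tested recommendations.

# Cap how much dirty (unwritten) data ZFS will buffer, in bytes,
# so each flush is smaller
sysctl vfs.zfs.dirty_data_max=1073741824

# Start delaying writers at a lower dirty-data level, smoothing
# the transition instead of stalling all at once
sysctl vfs.zfs.delay_min_dirty_percent=40

# Commit transaction groups more frequently (seconds)
sysctl vfs.zfs.txg.timeout=5
```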
Jobs: 4 (f=4): [mmmm] [3.7% done] [31404KB/32224KB/0KB /s] [7851/8056/0 iops]
Jobs: 4 (f=4): [mmmm] [5.3% done] [31480KB/33208KB/0KB /s] [7870/8302/0 iops]
Jobs: 4 (f=4): [mmmm] [6.8% done] [30977KB/30701KB/0KB /s] [7744/7675/0 iops]
Jobs: 4 (f=4): [mmmm] [8.1% done] [27724KB/26984KB/0KB /s] [6931/6746/0 iops]
Jobs: 4 (f=4): [mmmm] [9.3% done] [28616KB/27900KB/0KB /s] [7154/6975/0 iops]
Jobs: 4 (f=4): [mmmm] [10.7% done] [27184KB/27872KB/0KB /s] [6796/6968/0 iops]
Jobs: 4 (f=4): [mmmm] [12.0% done] [28288KB/28732KB/0KB /s] [7072/7183/0 iops]
Jobs: 4 (f=4): [mmmm] [13.5% done] [33672KB/34484KB/0KB /s] [8418/8621/0 iops]
Jobs: 4 (f=4): [mmmm] [15.1% done] [32600KB/32704KB/0KB /s] [8150/8176/0 iops]
Jobs: 4 (f=4): [mmmm] [16.4% done] [31476KB/30424KB/0KB /s] [7869/7606/0 iops]
Jobs: 4 (f=4): [mmmm] [17.8% done] [29768KB/29420KB/0KB /s] [7442/7355/0 iops]
Jobs: 4 (f=4): [mmmm] [19.2% done] [32288KB/32280KB/0KB /s] [8072/8070/0 iops]
Jobs: 4 (f=4): [mmmm] [19.2% done] [1124KB/1120KB/0KB /s] [281/280/0 iops]
Jobs: 4 (f=4): [mmmm] [19.0% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta 01m:08s]
Jobs: 4 (f=4): [mmmm] [19.3% done] [10836KB/10652KB/0KB /s] [2709/2663/0 iops]
Jobs: 4 (f=4): [mmmm] [20.7% done] [26028KB/26228KB/0KB /s] [6507/6557/0 iops]
Jobs: 4 (f=4): [mmmm] [21.9% done] [29872KB/30468KB/0KB /s] [7468/7617/0 iops]
Jobs: 4 (f=4): [mmmm] [23.4% done] [31564KB/31676KB/0KB /s] [7891/7919/0 iops]
Jobs: 4 (f=4): [mmmm] [24.8% done] [31040KB/30268KB/0KB /s] [7760/7567/0 iops]
Jobs: 4 (f=4): [mmmm] [26.2% done] [30408KB/30292KB/0KB /s] [7602/7573/0 iops]
Jobs: 4 (f=4): [mmmm] [27.6% done] [32504KB/32856KB/0KB /s] [8126/8214/0 iops]
Jobs: 4 (f=4): [mmmm] [29.1% done] [30764KB/31092KB/0KB /s] [7691/7773/0 iops]
Jobs: 4 (f=4): [mmmm] [30.7% done] [31288KB/31304KB/0KB /s] [7822/7826/0 iops]
Jobs: 4 (f=4): [mmmm] [32.0% done] [31580KB/31412KB/0KB /s] [7895/7853/0 iops]
Jobs: 4 (f=4): [mmmm] [33.7% done] [31096KB/30440KB/0KB /s] [7774/7610/0 iops]
Jobs: 4 (f=4): [mmmm] [35.4% done] [31540KB/30516KB/0KB /s] [7885/7629/0 iops]
Jobs: 4 (f=4): [mmmm] [35.4% done] [12536KB/12124KB/0KB /s] [3134/3031/0 iops]
Jobs: 4 (f=4): [mmmm] [36.0% done] [17272KB/17252KB/0KB /s] [4318/4313/0 iops]
Jobs: 4 (f=4): [mmmm] [37.4% done] [29872KB/29816KB/0KB /s] [7468/7454/0 iops]
Jobs: 4 (f=4): [mmmm] [38.8% done] [31060KB/30644KB/0KB /s] [7765/7661/0 iops]
Jobs: 4 (f=4): [mmmm] [40.6% done] [31196KB/30200KB/0KB /s] [7799/7550/0 iops]
Jobs: 4 (f=4): [mmmm] [41.2% done] [8276KB/8032KB/0KB /s] [2069/2008/0 iops]
Jobs: 4 (f=4): [mmmm] [41.0% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta 00m:59s]
Jobs: 4 (f=4): [mmmm] [41.7% done] [2584KB/2384KB/0KB /s] [646/596/0 iops]
Here is the summary output showing the latency:
random-readwrite: (groupid=0, jobs=1): err= 0: pid=2001: Fri Jan 9 13:52:48 2015
read : io=524704KB, bw=4972.4KB/s, iops=1243, runt=105524msec
slat (usec): min=247, max=9841.7K, avg=788.99, stdev=44548.90
clat (usec): min=22, max=9890.9K, avg=12179.98, stdev=166754.67
lat (usec): min=716, max=9891.4K, avg=12969.34, stdev=172636.71
write: io=523872KB, bw=4964.5KB/s, iops=1241, runt=105524msec
slat (usec): min=5, max=51600, avg= 9.68, stdev=142.78
clat (usec): min=3, max=9889.8K, avg=12764.23, stdev=184508.29
lat (usec): min=9, max=9889.8K, avg=12774.06, stdev=184511.08
Any help or pointing in the right direction would be greatly appreciated.