NVMe speeds get slower with each additional SSD added to a striped pool

aseba

Cadet
Joined
Mar 12, 2024
Messages
1
I’ve recently started experimenting with TrueNAS SCALE. I recently got a pair of R730XDs with dual Xeon E5-2680 v4s, 128GB of RAM and a 10G interface, and I’ve set it up with 10x 4TB SAS drives and 4x 2TB Samsung 980 Pro NVMe SSDs. The NVMe drives are all in the same x16 slot on one CPU via an ASUS Hyper M.2 x16 PCIe adapter. Later, I tried adding an additional 950 Pro in another slot with a StarTech adapter, which made no difference. I'm still experimenting at this stage and have the option of destroying pools as needed.

I have tried all sorts of things: different machines (including a more modern Intel i5 machine), TrueNAS SCALE and CORE, different pool configurations with log and metadata NVMe devices, sync on/off, and even different NUMA settings in the BIOS and removing one CPU. I even tried booting straight into Windows and reformatting the SSDs to make sure there was nothing wrong with them, as they were second-hand; I got the full speed out of them as expected, with about 70MB/s 4K random. The bottom line is that I'm seeing the same thing I describe below regardless of what I try; the different configurations only make minor differences to the results. I also did some basic testing with SAS SSDs and seem to be getting the same pattern (good local speeds, poor in a VM), although I didn't run extensive tests. I was not seeing the results I would expect over SMB either, but I didn't test that extensively.

The problem I'm seeing is that when I run fio directly on TrueNAS, I get the decent results I would expect, and they broadly scale with each SSD I add to a striped pool (although I think ARC might be distorting the results a bit). However, as soon as I run the tests inside a Windows or Debian VM on TrueNAS, the results are significantly worse, especially the 4K random reads/writes. VMs on my XCP-ng machine over NFS or iSCSI show pretty much the same story, just a little slower than VMs running directly on TrueNAS. What makes this even more puzzling is that the VM results get progressively slower with each NVMe drive added to the striped pool, while the speeds scale as expected when I run the tests directly in the TrueNAS console. The 4K random results inside VMs are so bad that, depending on the configuration, they are often slower than the HDD pool.
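For what it's worth, my thinking for reducing ARC's influence on the local tests is to point fio at a throwaway dataset with data caching and compression turned off, something like this (the pool/dataset names are just placeholders):

Code:
# scratch dataset for fio runs ("tank" and "fiotest" are examples)
zfs create tank/fiotest
zfs set primarycache=metadata tank/fiotest   # keep data blocks out of ARC
zfs set compression=off tank/fiotest         # stop compressible test data from inflating results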

A summary of the results is below, followed by the detailed output. The VMs were set to 4 cores / 8 threads, 8GB RAM and 40GB storage, all running Windows 10 in this case. I also tried 4x SSDs but haven't included those results here; they were about the same as 5x SSDs in a VM. CrystalDiskMark shows similar results.

5x 950 Pros in a striped pool:
  • Sequential directly on TrueNAS: IOPS=143k, read/write =17.4GiB/s
  • 4K random directly on TrueNAS: IOPS=55.5k, read/write=217MiB/s
  • Sequential on TrueNAS VM: IOPS=3345, read/write=418MiB/s
  • 4K random on TrueNAS VM: IOPS=3863, read/write=15.1MiB/s

2x 950 Pros in a striped pool:
  • Sequential directly on TrueNAS: IOPS=128k, read/write=15.6GiB/s
  • 4K random directly on TrueNAS: IOPS=33.5k, read/write=131MiB/s
  • Sequential on TrueNAS VM: IOPS=3357, read/write=420MiB/s
  • 4K random on TrueNAS VM: IOPS=5250, read/write=20.5MiB/s

1x 950 Pro in a striped pool:
  • Sequential directly on TrueNAS: IOPS=141k, read/write=17.2GiB/s
  • 4K random directly on TrueNAS: IOPS=18.8k, read/write=73.5MiB/s
  • Sequential on TrueNAS VM: IOPS=2699, read/write=337MiB/s
  • 4K random on TrueNAS VM: IOPS=4039, read/write=15.8MiB/s
Does anyone have any ideas about what's going on? Thanks in advance.
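
For reference, this is roughly the fio invocation behind the tests below. The block size, ioengine, iodepth, job count and 120s runtime match the output; the 10GiB file size comes from the "Laying out IO file" lines, and the exact paths and read/write mix are from memory, so treat it as a sketch rather than the exact command:

Code:
# TEST1: 128K sequential mixed read/write, 16 jobs, 120 seconds
# (ioengine=psync when run directly on TrueNAS, windowsaio inside the Windows VMs)
fio --name=TEST1 --rw=rw --bs=128k --ioengine=psync --iodepth=16 \
    --numjobs=16 --size=10G --runtime=120 --time_based --group_reporting

# TEST2: 4K random mixed read/write with the same settings
fio --name=TEST2 --rw=randrw --bs=4k --ioengine=psync --iodepth=16 \
    --numjobs=16 --size=10G --runtime=120 --time_based --group_reporting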

Sequential read/writes directly on the TrueNAS machine with 5 950 Pros in a striped pool:

Code:
TEST1: (g=0): rw=rw, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=psync, iodepth=16
...
fio-3.33
Starting 16 processes


TEST1: (groupid=0, jobs=16): err= 0: pid=45764: Tue Mar 12 05:35:58 2024
  read: IOPS=143k, BW=17.4GiB/s (18.7GB/s)(2088GiB/120002msec)
    clat (usec): min=11, max=48179, avg=42.15, stdev=124.83
     lat (usec): min=11, max=48179, avg=42.24, stdev=124.87
    clat percentiles (usec):
     |  1.00th=[   18],  5.00th=[   20], 10.00th=[   22], 20.00th=[   29],
     | 30.00th=[   31], 40.00th=[   33], 50.00th=[   34], 60.00th=[   37],
     | 70.00th=[   40], 80.00th=[   47], 90.00th=[   62], 95.00th=[   75],
     | 99.00th=[  115], 99.50th=[  155], 99.90th=[  775], 99.95th=[ 1795],
     | 99.99th=[ 6063]
   bw (  MiB/s): min=11714, max=22875, per=100.00%, avg=17839.89, stdev=138.08, samples=3824
   iops        : min=93711, max=183002, avg=142717.95, stdev=1104.69, samples=3824
  write: IOPS=143k, BW=17.4GiB/s (18.7GB/s)(2091GiB/120002msec); 0 zone resets
    clat (usec): min=16, max=47302, avg=66.19, stdev=223.30
     lat (usec): min=17, max=47304, avg=68.12, stdev=224.26
    clat percentiles (usec):
     |  1.00th=[   27],  5.00th=[   30], 10.00th=[   34], 20.00th=[   45],
     | 30.00th=[   49], 40.00th=[   51], 50.00th=[   54], 60.00th=[   57],
     | 70.00th=[   61], 80.00th=[   68], 90.00th=[   81], 95.00th=[   94],
     | 99.00th=[  167], 99.50th=[  334], 99.90th=[ 2900], 99.95th=[ 4883],
     | 99.99th=[ 9634]
   bw (  MiB/s): min=11794, max=22655, per=100.00%, avg=17863.61, stdev=137.32, samples=3824
   iops        : min=94353, max=181239, avg=142906.89, stdev=1098.56, samples=3824
  lat (usec)   : 20=3.09%, 50=56.14%, 100=38.05%, 250=2.27%, 500=0.18%
  lat (usec)   : 750=0.09%, 1000=0.04%
  lat (msec)   : 2=0.06%, 4=0.04%, 10=0.04%, 20=0.01%, 50=0.01%
  cpu          : usr=4.62%, sys=79.71%, ctx=4556658, majf=0, minf=230
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=17107481,17130014,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16


Run status group 0 (all jobs):
   READ: bw=17.4GiB/s (18.7GB/s), 17.4GiB/s-17.4GiB/s (18.7GB/s-18.7GB/s), io=2088GiB (2242GB), run=120002-120002msec
  WRITE: bw=17.4GiB/s (18.7GB/s), 17.4GiB/s-17.4GiB/s (18.7GB/s-18.7GB/s), io=2091GiB (2245GB), run=120002-120002msec


4K random read/writes directly on the TrueNAS machine with 5 950 Pros in a striped pool:

Code:
TEST2: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=16
...
fio-3.33
Starting 16 processes

TEST2: (groupid=0, jobs=16): err= 0: pid=46671: Tue Mar 12 05:43:29 2024
  read: IOPS=55.5k, BW=217MiB/s (227MB/s)(25.4GiB/120003msec)
    clat (usec): min=4, max=97353, avg=124.09, stdev=540.79
     lat (usec): min=4, max=97353, avg=124.26, stdev=540.92
    clat percentiles (usec):
     |  1.00th=[    9],  5.00th=[   11], 10.00th=[   14], 20.00th=[   65],
     | 30.00th=[   83], 40.00th=[   93], 50.00th=[  110], 60.00th=[  122],
     | 70.00th=[  133], 80.00th=[  143], 90.00th=[  157], 95.00th=[  172],
     | 99.00th=[  330], 99.50th=[  742], 99.90th=[ 6587], 99.95th=[10421],
     | 99.99th=[23200]
   bw (  KiB/s): min=168637, max=292693, per=100.00%, avg=222224.57, stdev=1291.53, samples=3792
   iops        : min=42154, max=73172, avg=55552.02, stdev=322.89, samples=3792
  write: IOPS=55.5k, BW=217MiB/s (227MB/s)(25.4GiB/120003msec); 0 zone resets
    clat (usec): min=7, max=118160, avg=159.20, stdev=705.23
     lat (usec): min=7, max=118160, avg=159.45, stdev=705.60
    clat percentiles (usec):
     |  1.00th=[   18],  5.00th=[   26], 10.00th=[   34], 20.00th=[   88],
     | 30.00th=[  101], 40.00th=[  114], 50.00th=[  129], 60.00th=[  141],
     | 70.00th=[  151], 80.00th=[  163], 90.00th=[  180], 95.00th=[  198],
     | 99.00th=[  498], 99.50th=[ 1303], 99.90th=[ 9634], 99.95th=[15139],
     | 99.99th=[28967]
   bw (  KiB/s): min=169729, max=292712, per=100.00%, avg=222307.32, stdev=1274.05, samples=3792
   iops        : min=42427, max=73177, avg=55572.69, stdev=318.52, samples=3792
  lat (usec)   : 10=1.57%, 20=7.51%, 50=6.37%, 100=21.61%, 250=61.11%
  lat (usec)   : 500=0.99%, 750=0.24%, 1000=0.12%
  lat (msec)   : 2=0.16%, 4=0.10%, 10=0.16%, 20=0.05%, 50=0.02%
  lat (msec)   : 100=0.01%, 250=0.01%
  cpu          : usr=2.30%, sys=74.67%, ctx=751732, majf=0, minf=216
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=6659898,6662249,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
   READ: bw=217MiB/s (227MB/s), 217MiB/s-217MiB/s (227MB/s-227MB/s), io=25.4GiB (27.3GB), run=120003-120003msec
  WRITE: bw=217MiB/s (227MB/s), 217MiB/s-217MiB/s (227MB/s-227MB/s), io=25.4GiB (27.3GB), run=120003-120003msec


Sequential read/writes on the TrueNAS VM with 5 950 Pros in a striped pool:

Code:
fio: this platform does not support process shared mutexes, forcing use of threads. Use the 'thread' option to get rid of this warning.
TEST1: (g=0): rw=rw, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=windowsaio, iodepth=16
...
fio-3.36
Starting 16 threads

TEST1: (groupid=0, jobs=16): err= 0: pid=10124: Tue Mar 12 13:06:01 2024
  read: IOPS=3345, BW=418MiB/s (438MB/s)(49.1GiB/120282msec)
    slat (usec): min=21, max=292003, avg=696.36, stdev=2382.48
    clat (usec): min=8, max=990862, avg=32903.22, stdev=49249.11
     lat (usec): min=718, max=990915, avg=33599.57, stdev=49305.61
    clat percentiles (msec):
     |  1.00th=[    3],  5.00th=[    7], 10.00th=[   13], 20.00th=[   19],
     | 30.00th=[   23], 40.00th=[   26], 50.00th=[   28], 60.00th=[   31],
     | 70.00th=[   34], 80.00th=[   37], 90.00th=[   44], 95.00th=[   54],
     | 99.00th=[  157], 99.50th=[  422], 99.90th=[  751], 99.95th=[  810],
     | 99.99th=[  885]
   bw (  KiB/s): min=22687, max=876841, per=100.00%, avg=438356.76, stdev=9949.95, samples=3406
   iops        : min=  165, max= 6845, avg=3416.84, stdev=77.77, samples=3406
  write: IOPS=3348, BW=419MiB/s (439MB/s)(49.2GiB/120282msec); 0 zone resets
    slat (usec): min=23, max=357536, avg=753.64, stdev=2546.04
    clat (usec): min=8, max=988956, avg=33047.42, stdev=50548.15
     lat (usec): min=684, max=989023, avg=33801.06, stdev=50609.67
    clat percentiles (msec):
     |  1.00th=[    3],  5.00th=[    7], 10.00th=[   13], 20.00th=[   19],
     | 30.00th=[   23], 40.00th=[   26], 50.00th=[   28], 60.00th=[   31],
     | 70.00th=[   34], 80.00th=[   37], 90.00th=[   44], 95.00th=[   54],
     | 99.00th=[  161], 99.50th=[  443], 99.90th=[  760], 99.95th=[  818],
     | 99.99th=[  902]
   bw (  KiB/s): min=21230, max=880824, per=100.00%, avg=438635.19, stdev=10066.44, samples=3407
   iops        : min=  152, max= 6875, avg=3418.81, stdev=78.67, samples=3407
  lat (usec)   : 10=0.01%, 20=0.04%, 50=0.03%, 100=0.01%, 250=0.02%
  lat (usec)   : 500=0.02%, 750=0.06%, 1000=0.12%
  lat (msec)   : 2=0.67%, 4=1.66%, 10=5.14%, 20=14.99%, 50=71.02%
  lat (msec)   : 100=4.82%, 250=0.55%, 500=0.43%, 750=0.31%, 1000=0.10%
  cpu          : usr=0.83%, sys=24.37%, ctx=0, majf=0, minf=0
  IO depths    : 1=0.7%, 2=4.2%, 4=12.1%, 8=51.5%, 16=31.5%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=95.5%, 8=1.6%, 16=2.9%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=402375,402731,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
   READ: bw=418MiB/s (438MB/s), 418MiB/s-418MiB/s (438MB/s-438MB/s), io=49.1GiB (52.7GB), run=120282-120282msec
  WRITE: bw=419MiB/s (439MB/s), 419MiB/s-419MiB/s (439MB/s-439MB/s), io=49.2GiB (52.8GB), run=120282-120282msec


4K random read/writes on the TrueNAS VM with 5 950 Pros in a striped pool:

Code:
fio: this platform does not support process shared mutexes, forcing use of threads. Use the 'thread' option to get rid of this warning.
TEST2: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=windowsaio, iodepth=16
...
fio-3.36
Starting 16 threads
TEST2: Laying out IO file (1 file / 10240MiB)

TEST2: (groupid=0, jobs=16): err= 0: pid=7844: Tue Mar 12 13:01:33 2024
  read: IOPS=3863, BW=15.1MiB/s (15.8MB/s)(1812MiB/120029msec)
    slat (usec): min=15, max=497910, avg=621.90, stdev=2489.72
    clat (usec): min=8, max=576572, avg=28111.12, stdev=18191.45
     lat (usec): min=499, max=758814, avg=28733.02, stdev=18409.45
    clat percentiles (usec):
     |  1.00th=[  1778],  5.00th=[  6325], 10.00th=[ 11731], 20.00th=[ 18220],
     | 30.00th=[ 21627], 40.00th=[ 24249], 50.00th=[ 26346], 60.00th=[ 28705],
     | 70.00th=[ 31327], 80.00th=[ 34866], 90.00th=[ 41157], 95.00th=[ 50594],
     | 99.00th=[ 95945], 99.50th=[123208], 99.90th=[217056], 99.95th=[238027],
     | 99.99th=[358613]
   bw (  KiB/s): min= 1688, max=31733, per=100.00%, avg=15687.77, stdev=277.73, samples=3376
   iops        : min=  411, max= 7928, avg=3916.06, stdev=69.45, samples=3376
  write: IOPS=3869, BW=15.1MiB/s (15.8MB/s)(1814MiB/120029msec); 0 zone resets
    slat (usec): min=16, max=260892, avg=660.51, stdev=2416.31
    clat (usec): min=8, max=582509, avg=28115.29, stdev=18108.58
     lat (usec): min=536, max=587199, avg=28775.80, stdev=18308.27
    clat percentiles (usec):
     |  1.00th=[  1827],  5.00th=[  6259], 10.00th=[ 11600], 20.00th=[ 18220],
     | 30.00th=[ 21627], 40.00th=[ 24249], 50.00th=[ 26346], 60.00th=[ 28705],
     | 70.00th=[ 31327], 80.00th=[ 34866], 90.00th=[ 41157], 95.00th=[ 50594],
     | 99.00th=[ 96994], 99.50th=[123208], 99.90th=[214959], 99.95th=[238027],
     | 99.99th=[346031]
   bw (  KiB/s): min= 1937, max=31491, per=100.00%, avg=15707.51, stdev=279.46, samples=3376
   iops        : min=  474, max= 7867, avg=3921.02, stdev=69.88, samples=3376
  lat (usec)   : 10=0.01%, 20=0.05%, 50=0.03%, 100=0.02%, 250=0.02%
  lat (usec)   : 500=0.03%, 750=0.09%, 1000=0.15%
  lat (msec)   : 2=0.78%, 4=1.77%, 10=5.45%, 20=16.11%, 50=70.37%
  lat (msec)   : 100=4.23%, 250=0.86%, 500=0.03%, 750=0.01%
  cpu          : usr=0.36%, sys=25.36%, ctx=0, majf=0, minf=0
  IO depths    : 1=0.7%, 2=4.1%, 4=11.8%, 8=50.8%, 16=32.6%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=95.7%, 8=1.6%, 16=2.8%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=463766,464396,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
   READ: bw=15.1MiB/s (15.8MB/s), 15.1MiB/s-15.1MiB/s (15.8MB/s-15.8MB/s), io=1812MiB (1900MB), run=120029-120029msec
  WRITE: bw=15.1MiB/s (15.8MB/s), 15.1MiB/s-15.1MiB/s (15.8MB/s-15.8MB/s), io=1814MiB (1902MB), run=120029-120029msec


Sequential read/writes directly on the TrueNAS machine with 2 950 Pros in a striped pool:

Code:
TEST1: (g=0): rw=rw, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=psync, iodepth=16
...
fio-3.33
Starting 16 processes

TEST1: (groupid=0, jobs=16): err= 0: pid=56321: Tue Mar 12 06:49:35 2024
  read: IOPS=128k, BW=15.6GiB/s (16.8GB/s)(1876GiB/120001msec)
    clat (usec): min=12, max=165727, avg=45.22, stdev=237.56
     lat (usec): min=12, max=165727, avg=45.32, stdev=237.85
    clat percentiles (usec):
     |  1.00th=[   18],  5.00th=[   21], 10.00th=[   25], 20.00th=[   31],
     | 30.00th=[   33], 40.00th=[   35], 50.00th=[   37], 60.00th=[   39],
     | 70.00th=[   43], 80.00th=[   50], 90.00th=[   61], 95.00th=[   71],
     | 99.00th=[  103], 99.50th=[  133], 99.90th=[ 1004], 99.95th=[ 2737],
     | 99.99th=[ 8094]
   bw (  MiB/s): min= 5321, max=21514, per=100.00%, avg=16027.59, stdev=195.48, samples=3808
   iops        : min=42563, max=172112, avg=128217.88, stdev=1563.90, samples=3808
  write: IOPS=128k, BW=15.7GiB/s (16.8GB/s)(1879GiB/120001msec); 0 zone resets
    clat (usec): min=16, max=203211, avg=75.30, stdev=493.24
     lat (usec): min=17, max=203212, avg=77.41, stdev=494.64
    clat percentiles (usec):
     |  1.00th=[   28],  5.00th=[   32], 10.00th=[   39], 20.00th=[   49],
     | 30.00th=[   52], 40.00th=[   55], 50.00th=[   58], 60.00th=[   62],
     | 70.00th=[   68], 80.00th=[   75], 90.00th=[   88], 95.00th=[  102],
     | 99.00th=[  190], 99.50th=[  330], 99.90th=[ 3752], 99.95th=[ 6128],
     | 99.99th=[15664]
   bw (  MiB/s): min= 5300, max=21475, per=100.00%, avg=16047.66, stdev=195.45, samples=3808
   iops        : min=42399, max=171800, avg=128378.31, stdev=1563.63, samples=3808
  lat (usec)   : 20=2.00%, 50=50.78%, 100=43.98%, 250=2.78%, 500=0.18%
  lat (usec)   : 750=0.06%, 1000=0.04%
  lat (msec)   : 2=0.06%, 4=0.05%, 10=0.05%, 20=0.01%, 50=0.01%
  lat (msec)   : 100=0.01%, 250=0.01%
  cpu          : usr=4.58%, sys=77.07%, ctx=3011763, majf=0, minf=197
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=15371656,15390728,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
   READ: bw=15.6GiB/s (16.8GB/s), 15.6GiB/s-15.6GiB/s (16.8GB/s-16.8GB/s), io=1876GiB (2015GB), run=120001-120001msec
  WRITE: bw=15.7GiB/s (16.8GB/s), 15.7GiB/s-15.7GiB/s (16.8GB/s-16.8GB/s), io=1879GiB (2017GB), run=120001-120001msec


4K random read/writes directly on the TrueNAS machine with 2 950 Pros in a striped pool:

Code:
TEST2: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=16
...
fio-3.33
Starting 16 processes

TEST2: (groupid=0, jobs=16): err= 0: pid=57090: Tue Mar 12 06:54:18 2024
  read: IOPS=33.4k, BW=131MiB/s (137MB/s)(15.3GiB/120003msec)
    clat (usec): min=3, max=252363, avg=78.31, stdev=215.79
     lat (usec): min=3, max=252364, avg=78.52, stdev=215.83
    clat percentiles (usec):
     |  1.00th=[    7],  5.00th=[    9], 10.00th=[   10], 20.00th=[   13],
     | 30.00th=[   16], 40.00th=[   58], 50.00th=[   69], 60.00th=[   83],
     | 70.00th=[   96], 80.00th=[  115], 90.00th=[  135], 95.00th=[  157],
     | 99.00th=[  502], 99.50th=[  783], 99.90th=[ 1582], 99.95th=[ 2147],
     | 99.99th=[ 5145]
   bw (  KiB/s): min=89700, max=292632, per=100.00%, avg=133878.05, stdev=1165.55, samples=3824
   iops        : min=22421, max=73158, avg=33468.25, stdev=291.37, samples=3824
  write: IOPS=33.5k, BW=131MiB/s (137MB/s)(15.3GiB/120003msec); 0 zone resets
    clat (usec): min=9, max=160626, avg=394.45, stdev=514.54
     lat (usec): min=9, max=160626, avg=394.79, stdev=514.66
    clat percentiles (usec):
     |  1.00th=[   73],  5.00th=[   86], 10.00th=[   94], 20.00th=[  108],
     | 30.00th=[  127], 40.00th=[  145], 50.00th=[  192], 60.00th=[  449],
     | 70.00th=[  586], 80.00th=[  685], 90.00th=[  816], 95.00th=[  988],
     | 99.00th=[ 1450], 99.50th=[ 1680], 99.90th=[ 3261], 99.95th=[ 4555],
     | 99.99th=[10290]
   bw (  KiB/s): min=89657, max=294240, per=100.00%, avg=133986.48, stdev=1124.83, samples=3824
   iops        : min=22412, max=73560, avg=33495.38, stdev=281.19, samples=3824
  lat (usec)   : 4=0.01%, 10=5.19%, 20=12.39%, 50=1.43%, 100=24.37%
  lat (usec)   : 250=32.30%, 500=5.38%, 750=11.58%, 1000=4.81%
  lat (msec)   : 2=2.36%, 4=0.15%, 10=0.03%, 20=0.01%, 50=0.01%
  lat (msec)   : 100=0.01%, 250=0.01%, 500=0.01%
  cpu          : usr=1.68%, sys=31.80%, ctx=2338914, majf=0, minf=181
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=4012910,4016169,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
   READ: bw=131MiB/s (137MB/s), 131MiB/s-131MiB/s (137MB/s-137MB/s), io=15.3GiB (16.4GB), run=120003-120003msec
  WRITE: bw=131MiB/s (137MB/s), 131MiB/s-131MiB/s (137MB/s-137MB/s), io=15.3GiB (16.5GB), run=120003-120003msec


Sequential read/writes on the TrueNAS VM with 2 950 Pros in a striped pool:

Code:
fio: this platform does not support process shared mutexes, forcing use of threads. Use the 'thread' option to get rid of this warning.
TEST1: (g=0): rw=rw, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=windowsaio, iodepth=16
...
fio-3.36
Starting 16 threads
TEST1: Laying out IO file (1 file / 10240MiB)


TEST1: (groupid=0, jobs=16): err= 0: pid=2376: Tue Mar 12 13:35:11 2024
  read: IOPS=3357, BW=420MiB/s (440MB/s)(49.2GiB/120030msec)
    slat (usec): min=18, max=322144, avg=653.28, stdev=2365.20
    clat (usec): min=8, max=1724.6k, avg=32934.21, stdev=39760.16
     lat (usec): min=698, max=1726.9k, avg=33587.49, stdev=39818.23
    clat percentiles (msec):
     |  1.00th=[    3],  5.00th=[    9], 10.00th=[   15], 20.00th=[   19],
     | 30.00th=[   22], 40.00th=[   25], 50.00th=[   28], 60.00th=[   31],
     | 70.00th=[   34], 80.00th=[   39], 90.00th=[   47], 95.00th=[   59],
     | 99.00th=[  192], 99.50th=[  338], 99.90th=[  558], 99.95th=[  684],
     | 99.99th=[  709]
   bw (  KiB/s): min=27753, max=870634, per=100.00%, avg=435861.28, stdev=9775.60, samples=3412
   iops        : min=  205, max= 6794, avg=3397.54, stdev=76.38, samples=3412
  write: IOPS=3358, BW=420MiB/s (440MB/s)(49.2GiB/120030msec); 0 zone resets
    slat (usec): min=20, max=468709, avg=708.75, stdev=2639.21
    clat (usec): min=8, max=1723.5k, avg=33003.02, stdev=40186.58
     lat (usec): min=885, max=1723.9k, avg=33711.76, stdev=40277.37
    clat percentiles (msec):
     |  1.00th=[    3],  5.00th=[    9], 10.00th=[   15], 20.00th=[   20],
     | 30.00th=[   22], 40.00th=[   25], 50.00th=[   28], 60.00th=[   31],
     | 70.00th=[   34], 80.00th=[   39], 90.00th=[   47], 95.00th=[   59],
     | 99.00th=[  188], 99.50th=[  334], 99.90th=[  567], 99.95th=[  684],
     | 99.99th=[  709]
   bw (  KiB/s): min=21215, max=857038, per=100.00%, avg=435423.31, stdev=9778.41, samples=3416
   iops        : min=  152, max= 6686, avg=3394.03, stdev=76.39, samples=3416
  lat (usec)   : 10=0.01%, 20=0.04%, 50=0.02%, 100=0.01%, 250=0.02%
  lat (usec)   : 500=0.02%, 750=0.04%, 1000=0.09%
  lat (msec)   : 2=0.53%, 4=1.30%, 10=3.99%, 20=17.50%, 50=68.68%
  lat (msec)   : 100=5.97%, 250=1.09%, 500=0.54%, 750=0.16%, 1000=0.01%
  lat (msec)   : 2000=0.01%
  cpu          : usr=0.68%, sys=22.81%, ctx=0, majf=0, minf=0
  IO depths    : 1=0.6%, 2=3.5%, 4=10.6%, 8=51.2%, 16=34.1%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=95.8%, 8=1.7%, 16=2.5%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=402964,403173,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16


Run status group 0 (all jobs):
   READ: bw=420MiB/s (440MB/s), 420MiB/s-420MiB/s (440MB/s-440MB/s), io=49.2GiB (52.8GB), run=120030-120030msec
  WRITE: bw=420MiB/s (440MB/s), 420MiB/s-420MiB/s (440MB/s-440MB/s), io=49.2GiB (52.8GB), run=120030-120030msec


4K random read/writes on the TrueNAS VM with 2 950 Pros in a striped pool:

Code:
fio: this platform does not support process shared mutexes, forcing use of threads. Use the 'thread' option to get rid of this warning.
TEST2: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=windowsaio, iodepth=16
...
fio-3.36
Starting 16 threads


TEST2: (groupid=0, jobs=16): err= 0: pid=5760: Tue Mar 12 13:39:27 2024
  read: IOPS=5250, BW=20.5MiB/s (21.5MB/s)(2462MiB/120029msec)
    slat (usec): min=13, max=283116, avg=463.59, stdev=1603.01
    clat (usec): min=8, max=616759, avg=20852.01, stdev=14970.77
     lat (usec): min=537, max=616817, avg=21315.59, stdev=15031.39
    clat percentiles (usec):
     |  1.00th=[  1827],  5.00th=[  6718], 10.00th=[ 11207], 20.00th=[ 14091],
     | 30.00th=[ 15664], 40.00th=[ 16909], 50.00th=[ 18482], 60.00th=[ 20055],
     | 70.00th=[ 22152], 80.00th=[ 25297], 90.00th=[ 31327], 95.00th=[ 39060],
     | 99.00th=[ 69731], 99.50th=[ 90702], 99.90th=[181404], 99.95th=[212861],
     | 99.99th=[549454]
   bw (  KiB/s): min= 8345, max=30826, per=100.00%, avg=21141.38, stdev=262.99, samples=3488
   iops        : min= 2079, max= 7701, avg=5279.75, stdev=65.77, samples=3488
  write: IOPS=5246, BW=20.5MiB/s (21.5MB/s)(2460MiB/120029msec); 0 zone resets
    slat (usec): min=14, max=186865, avg=484.84, stdev=1595.62
    clat (usec): min=8, max=1041.6k, avg=20885.40, stdev=15668.26
     lat (usec): min=543, max=1041.7k, avg=21370.24, stdev=15716.90
    clat percentiles (usec):
     |  1.00th=[  1827],  5.00th=[  6652], 10.00th=[ 11207], 20.00th=[ 14091],
     | 30.00th=[ 15664], 40.00th=[ 17171], 50.00th=[ 18482], 60.00th=[ 20055],
     | 70.00th=[ 22152], 80.00th=[ 25297], 90.00th=[ 31327], 95.00th=[ 39060],
     | 99.00th=[ 69731], 99.50th=[ 91751], 99.90th=[187696], 99.95th=[214959],
     | 99.99th=[557843]
   bw (  KiB/s): min= 8483, max=30993, per=100.00%, avg=21131.70, stdev=265.38, samples=3488
   iops        : min= 2113, max= 7744, avg=5277.35, stdev=66.37, samples=3488
  lat (usec)   : 10=0.01%, 20=0.05%, 50=0.03%, 100=0.02%, 250=0.02%
  lat (usec)   : 500=0.03%, 750=0.09%, 1000=0.16%
  lat (msec)   : 2=0.72%, 4=1.62%, 10=5.35%, 20=51.31%, 50=38.18%
  lat (msec)   : 100=2.01%, 250=0.37%, 500=0.01%, 750=0.02%, 2000=0.01%
  cpu          : usr=0.52%, sys=25.52%, ctx=0, majf=0, minf=0
  IO depths    : 1=0.7%, 2=3.9%, 4=11.2%, 8=50.6%, 16=33.6%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=95.8%, 8=1.6%, 16=2.6%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=630257,629791,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16


Run status group 0 (all jobs):
   READ: bw=20.5MiB/s (21.5MB/s), 20.5MiB/s-20.5MiB/s (21.5MB/s-21.5MB/s), io=2462MiB (2582MB), run=120029-120029msec
  WRITE: bw=20.5MiB/s (21.5MB/s), 20.5MiB/s-20.5MiB/s (21.5MB/s-21.5MB/s), io=2460MiB (2580MB), run=120029-120029msec



Sequential read/writes directly on the TrueNAS machine with 1 950 Pro in a striped pool:

Code:
TEST1: (g=0): rw=rw, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=psync, iodepth=16
...
fio-3.33
Starting 16 processes

TEST1: (groupid=0, jobs=16): err= 0: pid=58791: Tue Mar 12 07:16:02 2024
  read: IOPS=141k, BW=17.2GiB/s (18.5GB/s)(2066GiB/120001msec)
    clat (usec): min=12, max=62061, avg=41.78, stdev=111.94
     lat (usec): min=12, max=62061, avg=41.87, stdev=112.16
    clat percentiles (usec):
     |  1.00th=[   18],  5.00th=[   21], 10.00th=[   24], 20.00th=[   30],
     | 30.00th=[   32], 40.00th=[   35], 50.00th=[   36], 60.00th=[   38],
     | 70.00th=[   41], 80.00th=[   47], 90.00th=[   59], 95.00th=[   70],
     | 99.00th=[  102], 99.50th=[  126], 99.90th=[  685], 99.95th=[ 1516],
     | 99.99th=[ 5080]
   bw (  MiB/s): min= 4423, max=21999, per=100.00%, avg=17649.07, stdev=160.80, samples=3824
   iops        : min=35384, max=175996, avg=141189.41, stdev=1286.41, samples=3824
  write: IOPS=141k, BW=17.2GiB/s (18.5GB/s)(2068GiB/120001msec); 0 zone resets
    clat (usec): min=16, max=80748, avg=67.57, stdev=177.01
     lat (usec): min=17, max=80750, avg=69.57, stdev=177.77
    clat percentiles (usec):
     |  1.00th=[   28],  5.00th=[   33], 10.00th=[   37], 20.00th=[   47],
     | 30.00th=[   51], 40.00th=[   55], 50.00th=[   58], 60.00th=[   61],
     | 70.00th=[   66], 80.00th=[   73], 90.00th=[   85], 95.00th=[   98],
     | 99.00th=[  215], 99.50th=[  334], 99.90th=[ 1811], 99.95th=[ 3621],
     | 99.99th=[ 7635]
   bw (  MiB/s): min= 4468, max=22128, per=100.00%, avg=17671.67, stdev=160.42, samples=3824
   iops        : min=35748, max=177031, avg=141370.12, stdev=1283.35, samples=3824
  lat (usec)   : 20=2.12%, 50=52.80%, 100=42.29%, 250=2.31%, 500=0.28%
  lat (usec)   : 750=0.06%, 1000=0.03%
  lat (msec)   : 2=0.05%, 4=0.04%, 10=0.03%, 20=0.01%, 50=0.01%
  lat (msec)   : 100=0.01%
  cpu          : usr=4.99%, sys=81.64%, ctx=4553346, majf=0, minf=192
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=16922801,16944218,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
   READ: bw=17.2GiB/s (18.5GB/s), 17.2GiB/s-17.2GiB/s (18.5GB/s-18.5GB/s), io=2066GiB (2218GB), run=120001-120001msec
  WRITE: bw=17.2GiB/s (18.5GB/s), 17.2GiB/s-17.2GiB/s (18.5GB/s-18.5GB/s), io=2068GiB (2221GB), run=120001-120001msec



4K random read/writes directly on the TrueNAS machine with 1 950 Pro in a striped pool:

Code:
TEST2: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=16
...
fio-3.33
Starting 16 processes
TEST2: Laying out IO file (1 file / 10240MiB)

TEST2: (groupid=0, jobs=16): err= 0: pid=55651: Tue Mar 12 06:39:13 2024
  read: IOPS=18.8k, BW=73.3MiB/s (76.9MB/s)(8801MiB/120002msec)
    clat (usec): min=3, max=74248, avg=50.86, stdev=85.63
     lat (usec): min=3, max=74249, avg=51.06, stdev=85.65
    clat percentiles (usec):
     |  1.00th=[    7],  5.00th=[    9], 10.00th=[   10], 20.00th=[   13],
     | 30.00th=[   17], 40.00th=[   55], 50.00th=[   60], 60.00th=[   65],
     | 70.00th=[   70], 80.00th=[   76], 90.00th=[   85], 95.00th=[   94],
     | 99.00th=[  118], 99.50th=[  128], 99.90th=[  202], 99.95th=[  379],
     | 99.99th=[ 1057]
   bw (  KiB/s): min=26408, max=278600, per=100.00%, avg=75218.74, stdev=1951.08, samples=3824
   iops        : min= 6602, max=69650, avg=18804.69, stdev=487.77, samples=3824
  write: IOPS=18.8k, BW=73.5MiB/s (77.0MB/s)(8814MiB/120002msec); 0 zone resets
    clat (usec): min=12, max=89860, avg=795.16, stdev=518.19
     lat (usec): min=12, max=89861, avg=795.49, stdev=518.20
    clat percentiles (usec):
     |  1.00th=[   72],  5.00th=[   82], 10.00th=[   94], 20.00th=[  118],
     | 30.00th=[  660], 40.00th=[  775], 50.00th=[  906], 60.00th=[ 1045],
     | 70.00th=[ 1123], 80.00th=[ 1188], 90.00th=[ 1270], 95.00th=[ 1336],
     | 99.00th=[ 1500], 99.50th=[ 1614], 99.90th=[ 2409], 99.95th=[ 2573],
     | 99.99th=[ 6652]
   bw (  KiB/s): min=26888, max=279472, per=100.00%, avg=75335.30, stdev=1934.47, samples=3824
   iops        : min= 6722, max=69868, avg=18833.82, stdev=483.62, samples=3824
  lat (usec)   : 4=0.01%, 10=5.49%, 20=10.96%, 50=2.00%, 100=36.35%
  lat (usec)   : 250=7.68%, 500=0.29%, 750=5.97%, 1000=9.02%
  lat (msec)   : 2=22.10%, 4=0.12%, 10=0.01%, 20=0.01%, 50=0.01%
  lat (msec)   : 100=0.01%
  cpu          : usr=1.03%, sys=14.96%, ctx=1710296, majf=0, minf=177
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=2253090,2256434,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
   READ: bw=73.3MiB/s (76.9MB/s), 73.3MiB/s-73.3MiB/s (76.9MB/s-76.9MB/s), io=8801MiB (9229MB), run=120002-120002msec
  WRITE: bw=73.5MiB/s (77.0MB/s), 73.5MiB/s-73.5MiB/s (77.0MB/s-77.0MB/s), io=8814MiB (9242MB), run=120002-120002msec


Sequential read/writes on the TrueNAS VM with 1 950 Pro in a striped pool:

Code:
fio: this platform does not support process shared mutexes, forcing use of threads. Use the 'thread' option to get rid of this warning.
TEST1: (g=0): rw=rw, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=windowsaio, iodepth=16
...
fio-3.36
Starting 16 threads
TEST1: Laying out IO file (1 file / 10240MiB)

TEST1: (groupid=0, jobs=16): err= 0: pid=6804: Tue Mar 12 14:08:05 2024
  read: IOPS=2699, BW=337MiB/s (354MB/s)(39.5GiB/120031msec)
    slat (usec): min=21, max=467036, avg=787.44, stdev=3280.67
    clat (usec): min=10, max=1063.5k, avg=41271.29, stdev=49260.97
     lat (usec): min=1023, max=1064.3k, avg=42058.73, stdev=49405.57
    clat percentiles (msec):
     |  1.00th=[    5],  5.00th=[   15], 10.00th=[   21], 20.00th=[   27],
     | 30.00th=[   31], 40.00th=[   33], 50.00th=[   36], 60.00th=[   39],
     | 70.00th=[   41], 80.00th=[   46], 90.00th=[   54], 95.00th=[   67],
     | 99.00th=[  247], 99.50th=[  418], 99.90th=[  709], 99.95th=[  760],
     | 99.99th=[  802]
   bw (  KiB/s): min=35877, max=671636, per=100.00%, avg=351115.44, stdev=7048.00, samples=3287
   iops        : min=  270, max= 5240, avg=2735.28, stdev=55.07, samples=3287
  write: IOPS=2702, BW=338MiB/s (354MB/s)(39.6GiB/120031msec); 0 zone resets
    slat (usec): min=22, max=381750, avg=842.06, stdev=3071.75
    clat (usec): min=10, max=1064.4k, avg=41242.06, stdev=48887.46
     lat (usec): min=1030, max=1133.2k, avg=42084.12, stdev=49007.67
    clat percentiles (msec):
     |  1.00th=[    5],  5.00th=[   15], 10.00th=[   21], 20.00th=[   27],
     | 30.00th=[   30], 40.00th=[   33], 50.00th=[   35], 60.00th=[   39],
     | 70.00th=[   41], 80.00th=[   46], 90.00th=[   54], 95.00th=[   67],
     | 99.00th=[  241], 99.50th=[  414], 99.90th=[  701], 99.95th=[  726],
     | 99.99th=[  802]
   bw (  KiB/s): min=22888, max=658965, per=100.00%, avg=351238.96, stdev=7062.85, samples=3289
   iops        : min=  167, max= 5139, avg=2736.23, stdev=55.18, samples=3289
  lat (usec)   : 20=0.01%, 50=0.02%, 100=0.01%, 250=0.01%, 500=0.01%
  lat (usec)   : 750=0.01%, 1000=0.02%
  lat (msec)   : 2=0.22%, 4=0.61%, 10=2.03%, 20=6.41%, 50=77.70%
  lat (msec)   : 100=10.70%, 250=1.25%, 500=0.61%, 750=0.33%, 1000=0.05%
  lat (msec)   : 2000=0.01%
  cpu          : usr=0.68%, sys=21.61%, ctx=0, majf=0, minf=0
  IO depths    : 1=0.8%, 2=3.5%, 4=10.8%, 8=51.2%, 16=33.7%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=95.7%, 8=1.7%, 16=2.6%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=323964,324418,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16


Run status group 0 (all jobs):
   READ: bw=337MiB/s (354MB/s), 337MiB/s-337MiB/s (354MB/s-354MB/s), io=39.5GiB (42.5GB), run=120031-120031msec
  WRITE: bw=338MiB/s (354MB/s), 338MiB/s-338MiB/s (354MB/s-354MB/s), io=39.6GiB (42.5GB), run=120031-120031msec


4K random read/writes on the TrueNAS VM with 1 950 Pro in a striped pool:

Code:
fio: this platform does not support process shared mutexes, forcing use of threads. Use the 'thread' option to get rid of this warning.
TEST2: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=windowsaio, iodepth=16
...
fio-3.36
Starting 16 threads

TEST2: (groupid=0, jobs=16): err= 0: pid=7692: Tue Mar 12 14:10:46 2024
  read: IOPS=4039, BW=15.8MiB/s (16.5MB/s)(1894MiB/120039msec)
    slat (usec): min=14, max=329113, avg=643.13, stdev=2176.09
    clat (usec): min=7, max=1097.7k, avg=26716.86, stdev=18917.75
     lat (usec): min=457, max=1097.8k, avg=27359.98, stdev=19004.67
    clat percentiles (usec):
     |  1.00th=[  1418],  5.00th=[  4621], 10.00th=[  8455], 20.00th=[ 15139],
     | 30.00th=[ 19792], 40.00th=[ 23200], 50.00th=[ 25822], 60.00th=[ 28443],
     | 70.00th=[ 31065], 80.00th=[ 34866], 90.00th=[ 41157], 95.00th=[ 49546],
     | 99.00th=[ 82314], 99.50th=[103285], 99.90th=[183501], 99.95th=[299893],
     | 99.99th=[557843]
   bw (  KiB/s): min= 3443, max=31594, per=100.00%, avg=16255.47, stdev=308.83, samples=3456
   iops        : min=  853, max= 7895, avg=4058.17, stdev=77.22, samples=3456
  write: IOPS=4044, BW=15.8MiB/s (16.6MB/s)(1896MiB/120039msec); 0 zone resets
    slat (usec): min=17, max=259285, avg=670.02, stdev=2211.03
    clat (usec): min=8, max=701230, avg=26650.11, stdev=17879.70
     lat (usec): min=560, max=701597, avg=27320.12, stdev=17971.32
    clat percentiles (usec):
     |  1.00th=[  1450],  5.00th=[  4686], 10.00th=[  8586], 20.00th=[ 15270],
     | 30.00th=[ 19792], 40.00th=[ 23200], 50.00th=[ 25822], 60.00th=[ 28443],
     | 70.00th=[ 31065], 80.00th=[ 34866], 90.00th=[ 41157], 95.00th=[ 49546],
     | 99.00th=[ 82314], 99.50th=[101188], 99.90th=[168821], 99.95th=[235930],
     | 99.99th=[549454]
   bw (  KiB/s): min= 3040, max=31964, per=100.00%, avg=16279.61, stdev=311.77, samples=3456
   iops        : min=  751, max= 7986, avg=4064.16, stdev=77.96, samples=3456
  lat (usec)   : 10=0.01%, 20=0.08%, 50=0.04%, 100=0.02%, 250=0.03%
  lat (usec)   : 500=0.04%, 750=0.13%, 1000=0.23%
  lat (msec)   : 2=1.08%, 4=2.52%, 10=7.73%, 20=18.44%, 50=64.83%
  lat (msec)   : 100=4.30%, 250=0.48%, 500=0.02%, 750=0.03%, 2000=0.01%
  cpu          : usr=0.57%, sys=27.96%, ctx=0, majf=0, minf=0
  IO depths    : 1=0.7%, 2=4.7%, 4=12.9%, 8=51.2%, 16=30.6%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=95.6%, 8=1.5%, 16=3.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=484878,485447,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
   READ: bw=15.8MiB/s (16.5MB/s), 15.8MiB/s-15.8MiB/s (16.5MB/s-16.5MB/s), io=1894MiB (1986MB), run=120039-120039msec
  WRITE: bw=15.8MiB/s (16.6MB/s), 15.8MiB/s-15.8MiB/s (16.6MB/s-16.6MB/s), io=1896MiB (1988MB), run=120039-120039msec
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
Does anyone have any ideas about what's going on?
You're describing problems related to sync writes... not something that's a massive surprise.

Setting sync=disabled on one ZVOL/VM is one way to prove that.
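
Something along these lines, substituting your actual pool/zvol path:

Code:
# check the current sync setting on the zvol backing the VM (the path is just an example)
zfs get sync tank/vms/win10-boot
# disable sync writes for the test...
zfs set sync=disabled tank/vms/win10-boot
# ...and put it back to the default afterwards
zfs set sync=standard tank/vms/win10-boot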

You're also likely running into single thread limits with the IOPS-heavy tests... how many jobs are you using with fio?

For 4K, it seems you haven't tuned the system for smaller blocks, so no surprises there... if your workload is going to be skewed toward small blocks, consider reducing the recordsize on the ZVOL(s) down from the default 128K.
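
As a rough sketch (names are placeholders; note that for zvols the equivalent knob is volblocksize, which can only be set when the zvol is created):

Code:
# for a filesystem dataset holding VM images or test files (name is an example)
zfs set recordsize=16K tank/vm-storage
# zvols use volblocksize instead of recordsize, and it is fixed at creation time,
# so a smaller block size means creating a new zvol and migrating the disk onto it
zfs create -V 40G -o volblocksize=16K tank/vms/win10-new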
 