Troubleshooting low disk write speed


Ender117
I was running badblocks as part of the burn-in process and found the estimated time to be excessively long (~14 days), but I decided to let it finish anyway. Unfortunately, at >90% of the last pass it was cut off by a power problem.
Code:
RAC0182: The iDRAC firmware was rebooted with the following reason: ac.
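(For context, a destructive badblocks burn-in pass is typically invoked along these lines; the device name, block size, and flags below are illustrative rather than my exact command line:)

Code:
# Destructive write test: four patterns, 4K blocks, progress shown.
# WARNING: -w overwrites everything on the target disk.
badblocks -ws -b 4096 /dev/da4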
Anyway, I was able to find the reason why badblocks runs so slowly:
Code:
root@freenas:~ # dd if=/dev/zero of=/dev/da4 bs=1M
^C5111+0 records in
5110+0 records out
5358223360 bytes transferred in 369.763618 secs (14490943 bytes/sec)
root@freenas:~ # dd if=/dev/da4 of=/dev/null bs=1M
^C125744+0 records in
125744+0 records out
131852140544 bytes transferred in 670.443633 secs (196664021 bytes/sec)

As you can see, the read speed is decent but the write speed is abnormally slow. The thing is, if I boot into an Ubuntu live image on the same hardware, everything looks fine:
Code:
ubuntu@ubuntu:~$ sudo dd if=/dev/zero of=/dev/sde bs=1M count=10K
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 63.386 s, 169 MB/s
ubuntu@ubuntu:~$ sudo dd if=/dev/sde of=/dev/null bs=1M count=10K
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB, 10 GiB) copied, 55.7659 s, 193 MB/s
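(A side note on comparing these numbers: on Linux, writes to /dev/sde go through the page cache by default, while FreeBSD's /dev/da4 is an unbuffered character device. A more apples-to-apples Linux run would bypass the cache, roughly like this:)

Code:
# oflag=direct bypasses the Linux page cache so the result reflects the disk itself
sudo dd if=/dev/zero of=/dev/sde bs=1M count=10K oflag=direct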

So maybe it is a compatibility issue with FreeBSD? Where should I look next?

My hardware:
Dell R620 as head unit:
Dual E5-2690 v2
128GB RAM
H710P
LSI 9207-8e
Dell network card with 2x X540 and 2x i350
1x Intel S3500 80GB as boot drive

NetApp DS4243 as DAS:
24x 3.5" bays
Swapped in the HB-SBB2-E601-COMP IO module because of the price of MiniSAS-to-QSFP cables
12x 4TB HGST NL-SAS drives (HUS726040AL5210)
 

Ender117
I am almost sure this is a FreeBSD problem. I booted a FreeBSD 11.1 live CD on a completely different set of hardware (i7-8700K, 16GB RAM, 9211-8i) and directly hooked up (via breakout cable) another SAS HDD and a SATA SSD. The HDD gets ~18MB/s write and the SSD 112MB/s; the HDD result is much slower than it should be.
 

Ender117
OK, I am thinking this may be related to either the disk cache or something called command queuing: https://lists.freebsd.org/pipermail/freebsd-scsi/2014-September/006487.html

Especially this:

Code:
Disabling caches globally heavily affects performance, and in most cases
is overkill. It means that _every_ request will go to the media before
the operation complete. For disks without command queuing (like legacy
ATA) that usually meant _one_ I/O request per platter revolution. Do you
want disk doing 120 IOPS peak? If you write huge file in 128K chunks,
you will get limited by 120/s * 128K = 15MB/s! Command queuing (NCQ for
SATA) significantly improved this situation since OS can now send more
operations down to the disk to give it more flexibility, in significant
part compensating disabled cache. But number of simultaneously active
tags in disk, HBA or application can be limited, creating delays.
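Spelling out that arithmetic for a 7200 RPM drive: 7200 rev/min / 60 = 120 rev/s, and with the write cache off and no queuing each 128K write costs roughly one revolution, so the ceiling is about 120/s * 128 KiB ~ 15 MB/s, which is almost exactly the write speed I am seeing.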



Now when I do the dd test, gstat gives me:

Code:
dT: 1.064s  w: 1.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    1    111      0      0    0.0    111  14195    8.9   98.9| da3



The write IOPS top out at ~112 during the write test, which matches the description above. Now I need a way to verify/change the on-disk cache and command queuing settings...
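(If I am reading the camcontrol(8) man page right, the cache side can be inspected via the SCSI caching mode page, 0x08, which contains the WCE bit; something like this should dump it read-only, with da3 just being my example device:)

Code:
# Display the caching control mode page; the WCE field shows whether
# the drive's volatile write cache is enabled.
camcontrol modepage da3 -m 8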


Update: I think the tags are set to the maximum, but maybe somehow dd only uses one queue slot (I am assuming the L(q) in gstat means outstanding commands)?

Code:
root@freenas:~ # camcontrol tags /dev/da3 -v
(pass3:mps0:0:9:0): dev_openings  255
(pass3:mps0:0:9:0): dev_active      0
(pass3:mps0:0:9:0): allocated       0
(pass3:mps0:0:9:0): queued          0
(pass3:mps0:0:9:0): held            0
(pass3:mps0:0:9:0): mintags         2
(pass3:mps0:0:9:0): maxtags       255



I also tried running 3 dd processes at the same time with tmux, and was able to push IOPS to ~500 and throughput to ~50MB/s. It may be either dd or something in FreeBSD limiting the queue depth to 1 per command.
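(The parallel test was roughly the following; I actually used three tmux panes, so backgrounding the writers as below is just a compact way to show it, and the oseek offsets are only there so the streams don't overlap:)

Code:
# Three concurrent sequential writers against the same raw device,
# each starting 2GiB apart (offsets are illustrative).
dd if=/dev/zero of=/dev/da3 bs=1M count=1K oseek=0 &
dd if=/dev/zero of=/dev/da3 bs=1M count=1K oseek=2048 &
dd if=/dev/zero of=/dev/da3 bs=1M count=1K oseek=4096 &
wait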
 

Ender117
I think I found one way of solving it: the on-disk write cache was disabled on those drives. Running

Code:
camcontrol mode /dev/da4 -m 0x08 -e


and changing WCE to 1 to enable the write cache brought the speed back to the expected level:

Code:
root@freenas:~ # dd if=/dev/zero of=/dev/da4 bs=1M count=1K
1024+0 records in
1024+0 records out
1073741824 bytes transferred in 5.725325 secs (187542510 bytes/sec)
root@freenas:~ # dd if=/dev/da4 of=/dev/null bs=1M count=1K
1024+0 records in
1024+0 records out
1073741824 bytes transferred in 5.690740 secs (188682301 bytes/sec)
root@freenas:~ # dd if=/dev/zero of=/dev/da3 bs=1M count=1K
1024+0 records in
1024+0 records out
1073741824 bytes transferred in 73.990152 secs (14511956 bytes/sec)
root@freenas:~ # dd if=/dev/da3 of=/dev/null bs=1M count=1K
1024+0 records in
1024+0 records out
1073741824 bytes transferred in 5.365380 secs (200124085 bytes/sec)



Note that the write cache was enabled only on da4; da3 was left disabled.


Now the question is: is this the right/safe solution? I would think the on-disk cache was disabled for a good reason, like preventing data loss on power outages. I don't think these drives have supercapacitors to flush the cache in an emergency. Maybe I should turn it back off and explore the command queuing approach instead?
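(If I do back the change out, the revert should just be the same interactive mode-page edit with WCE set back to 0. In the meantime, a quick way to confirm the current state of each drive would be something like this, assuming WCE is the field name camcontrol prints for the caching page:)

Code:
camcontrol modepage da4 -m 8 | grep WCE
camcontrol modepage da3 -m 8 | grep WCE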
 

Ender117
Apparently, something in FreeBSD caps the maximum I/O size at 128K:
Code:
root@freenas:~ # dd if=/dev/zero of=/dev/da3 bs=32K count=1K
1024+0 records in
1024+0 records out
33554432 bytes transferred in 8.636776 secs (3885064 bytes/sec)
root@freenas:~ # dd if=/dev/zero of=/dev/da3 bs=64K count=1K
1024+0 records in
1024+0 records out
67108864 bytes transferred in 8.969837 secs (7481614 bytes/sec)
root@freenas:~ # dd if=/dev/zero of=/dev/da3 bs=128K count=1K
1024+0 records in
1024+0 records out
134217728 bytes transferred in 9.347974 secs (14357949 bytes/sec)
root@freenas:~ # dd if=/dev/zero of=/dev/da3 bs=256K count=1K
1024+0 records in
1024+0 records out
268435456 bytes transferred in 18.637970 secs (14402612 bytes/sec)
root@freenas:~ # dd if=/dev/zero of=/dev/da3 bs=512K count=1K
1024+0 records in
1024+0 records out
536870912 bytes transferred in 37.292697 secs (14396141 bytes/sec)



Note that throughput scales well with bs up to 128K. If it weren't for the 128K limit, then even with WCE=0 the throughput at bs=1M should be about 110 w/s * 1M = 110MB/s, not up to spec, but not as miserable as 15MB/s.
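In other words, with WCE=0 and an effective queue depth of 1, each request costs roughly one revolution (~110-120 per second), so throughput works out to about min(bs, 128K) * 110/s: roughly 3.5 MB/s at 32K, 7 MB/s at 64K, 14 MB/s at 128K, and flat beyond that because anything larger gets split into 128K transfers, which matches the numbers above.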
 

Chris Moore

It sounds to me like you may have found something that should be discussed with the developers.
Here is where that is done: https://redmine.ixsystems.com/projects/freenas
Create your account at the link above, then post your issue here:
https://redmine.ixsystems.com/projects/freenas/issues/new
Be sure to update this thread with a link to the ticket so we can find out the answers.

Ender117

Hi, thanks for chiming in. I do believe the 128K maximum I/O size (MAXPHYS) is the FreeBSD default, set at compile time: http://freebsd.1045724.x6.nabble.com/Time-to-increase-MAXPHYS-td6189400.html
So I feel that unless FreeBSD changes it, it won't make its way into FreeNAS. Of course iXsystems may decide differently, but from what I have seen, projects like FreeNAS are usually hesitant to change the base OS kernel.
That said, I also found out that ZFS has its own way of mitigating the problem by queuing up commands, so this should mostly affect badblocks and the like:
https://forums.freenas.org/index.php?threads/on-disk-cache-and-zfs-performance.70267/
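(For anyone curious, the per-vdev queue depths ZFS uses to keep the disks busy appear to be exposed as sysctls on FreeNAS; the exact names below are from my understanding of the OpenZFS write throttle and may differ between versions:)

Code:
sysctl vfs.zfs.vdev.sync_write_max_active vfs.zfs.vdev.async_write_max_active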
 