Slow SSD Pool NFS Speed

websmith

Dabbler
Joined
Sep 20, 2018
Messages
38
Hi,

I have a weird problem.

I have upgraded my FreeNAS server from a hard drive pool to an SSD-only pool.

I only use the pool for my VMs via ESXi.

FreeNAS server: SuperMicro X10SRi-F with a Xeon E5-1620 v3, 128GB DDR4 ECC RAM, and a Mellanox ConnectX-3 running Ethernet, connected to a Mellanox 40/56Gbit switch.
ESXi server: dual Xeon E5-2650L v3, 256GB RAM, and a Mellanox ConnectX-3, also connected to the Mellanox switch.

The SSDs are attached to an Avago 9400 SAS3 controller, since I wanted to add a few NVMe drives as well.

The SSDs are all 960GB Intel DC D3-S4510, which have read/write speeds of around 400-500 MB/s according to Intel ARK.

I have partitioned an Intel Optane 900P 280GB, with one partition for a SLOG and the remainder for L2ARC.

My pool is laid out as follows:

Code:
  pool: vms
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:15:23 with 0 errors on Sun Sep  1 00:15:23 2019
config:

        NAME                                            STATE     READ WRITE CKSUM
        vms                                             ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            gptid/b209b095-77b2-11e9-a7ed-a0369f09f4e8  ONLINE       0     0     0
            gptid/b27d8dae-77b2-11e9-a7ed-a0369f09f4e8  ONLINE       0     0     0
          mirror-1                                      ONLINE       0     0     0
            gptid/b331bac9-77b2-11e9-a7ed-a0369f09f4e8  ONLINE       0     0     0
            gptid/b39c11c0-77b2-11e9-a7ed-a0369f09f4e8  ONLINE       0     0     0
          mirror-2                                      ONLINE       0     0     0
            gptid/b40ecb77-77b2-11e9-a7ed-a0369f09f4e8  ONLINE       0     0     0
            gptid/b466218f-77b2-11e9-a7ed-a0369f09f4e8  ONLINE       0     0     0
          mirror-4                                      ONLINE       0     0     0
            gptid/14f69fa3-d94b-11e9-b4eb-00110a6c4808  ONLINE       0     0     0
            gptid/15306c6f-d94b-11e9-b4eb-00110a6c4808  ONLINE       0     0     0
        logs
          gptid/fe943ab6-a541-11e9-b39f-00110a6c4808    ONLINE       0     0     0
        cache
          gptid/0c30960d-a542-11e9-b39f-00110a6c4808    ONLINE       0     0     0

errors: No known data errors


ZFS properties
Code:
root@vmnas:/mnt/vms/esxi # zfs get all vms
NAME  PROPERTY                 VALUE                    SOURCE
vms   type                     filesystem               -
vms   creation                 Thu May 16 10:17 2019    -
vms   used                     529G                     -
vms   available                2.84T                    -
vms   referenced               96K                      -
vms   compressratio            1.97x                    -
vms   mounted                  yes                      -
vms   quota                    none                     local
vms   reservation              none                     local
vms   recordsize               128K                     local
vms   mountpoint               /mnt/vms                 local
vms   sharenfs                 off                      default
vms   checksum                 on                       default
vms   compression              lz4                      local
vms   atime                    off                      local
vms   devices                  on                       default
vms   exec                     on                       default
vms   setuid                   on                       default
vms   readonly                 off                      default
vms   jailed                   off                      default
vms   snapdir                  hidden                   default
vms   aclmode                  passthrough              local
vms   aclinherit               passthrough              local
vms   canmount                 on                       default
vms   xattr                    off                      temporary
vms   copies                   1                        default
vms   version                  5                        -
vms   utf8only                 off                      -
vms   normalization            none                     -
vms   casesensitivity          sensitive                -
vms   vscan                    off                      default
vms   nbmand                   off                      default
vms   sharesmb                 off                      default
vms   refquota                 none                     local
vms   refreservation           none                     local
vms   primarycache             all                      default
vms   secondarycache           all                      default
vms   usedbysnapshots          0                        -
vms   usedbydataset            96K                      -
vms   usedbychildren           529G                     -
vms   usedbyrefreservation     0                        -
vms   logbias                  latency                  default
vms   dedup                    off                      default
vms   mlslabel                                          -
vms   sync                     standard                 local
vms   refcompressratio         1.00x                    -
vms   written                  96K                      -
vms   logicalused              956G                     -
vms   logicalreferenced        13.5K                    -
vms   volmode                  default                  default
vms   filesystem_limit         none                     default
vms   snapshot_limit           none                     default
vms   filesystem_count         none                     default
vms   snapshot_count           none                     default
vms   redundant_metadata       all                      default
vms   org.freenas:description                           local



Tunables
Code:
root@vmnas:/mnt/vms/esxi # zfs get all vms
NAME  PROPERTY                 VALUE                    SOURCE
vms   type                     filesystem               -
vms   creation                 Thu May 16 10:17 2019    -
vms   used                     529G                     -
vms   available                2.84T                    -
vms   referenced               96K                      -
vms   compressratio            1.97x                    -
vms   mounted                  yes                      -
vms   quota                    none                     local
vms   reservation              none                     local
vms   recordsize               128K                     local
vms   mountpoint               /mnt/vms                 local
vms   sharenfs                 off                      default
vms   checksum                 on                       default
vms   compression              lz4                      local
vms   atime                    off                      local
vms   devices                  on                       default
vms   exec                     on                       default
vms   setuid                   on                       default
vms   readonly                 off                      default
vms   jailed                   off                      default
vms   snapdir                  hidden                   default
vms   aclmode                  passthrough              local
vms   aclinherit               passthrough              local
vms   canmount                 on                       default
vms   xattr                    off                      temporary
vms   copies                   1                        default
vms   version                  5                        -
vms   utf8only                 off                      -
vms   normalization            none                     -
vms   casesensitivity          sensitive                -
vms   vscan                    off                      default
vms   nbmand                   off                      default
vms   sharesmb                 off                      default
vms   refquota                 none                     local
vms   refreservation           none                     local
vms   primarycache             all                      default
vms   secondarycache           all                      default
vms   usedbysnapshots          0                        -
vms   usedbydataset            96K                      -
vms   usedbychildren           529G                     -
vms   usedbyrefreservation     0                        -
vms   logbias                  latency                  default
vms   dedup                    off                      default
vms   mlslabel                                          -
vms   sync                     always                 local
vms   refcompressratio         1.00x                    -
vms   written                  96K                      -
vms   logicalused              956G                     -
vms   logicalreferenced        13.5K                    -
vms   volmode                  default                  default
vms   filesystem_limit         none                     default
vms   snapshot_limit           none                     default
vms   filesystem_count         none                     default
vms   snapshot_count           none                     default
vms   redundant_metadata       all                      default
vms   org.freenas:description                           local



I have recently upgraded to a 40Gbit/s network.

Network speed between ESXi and FreeNAS:
Code:
root@vmnas:/mnt/vms/esxi # iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 4.00 MByte (default)
------------------------------------------------------------
[  4] local 10.10.10.201 port 5001 connected with 10.10.10.182 port 32904
[  5] local 10.10.10.201 port 5001 connected with 10.10.10.182 port 60258
[  6] local 10.10.10.201 port 5001 connected with 10.10.10.182 port 35163
[  7] local 10.10.10.201 port 5001 connected with 10.10.10.182 port 42069
[  8] local 10.10.10.201 port 5001 connected with 10.10.10.182 port 47079
[  9] local 10.10.10.201 port 5001 connected with 10.10.10.182 port 35295
[ 10] local 10.10.10.201 port 5001 connected with 10.10.10.182 port 42450
[ 11] local 10.10.10.201 port 5001 connected with 10.10.10.182 port 31983
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec  4.88 GBytes  4.19 Gbits/sec
[ 10]  0.0-10.0 sec  4.36 GBytes  3.74 Gbits/sec
[  9]  0.0-10.0 sec  4.53 GBytes  3.89 Gbits/sec
[  5]  0.0-10.0 sec  3.36 GBytes  2.87 Gbits/sec
[  6]  0.0-10.0 sec  4.11 GBytes  3.51 Gbits/sec
[  8]  0.0-10.1 sec  5.11 GBytes  4.35 Gbits/sec
[ 11]  0.0-10.1 sec  4.07 GBytes  3.45 Gbits/sec
[  7]  0.0-10.3 sec  4.07 GBytes  3.39 Gbits/sec
[SUM]  0.0-10.3 sec  34.5 GBytes  28.7 Gbits/sec


That's not 40Gbit/s, but hopefully good enough to handle the speed of my pool.
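
For reference, this kind of output comes from a multi-stream client run; an invocation along these lines (illustrative only, with 8 parallel streams to match the 8 connections above, run from the client side at 10.10.10.182) reproduces it:

Code:
# 8 parallel TCP streams for 10 seconds against the FreeNAS box.
iperf -c 10.10.10.201 -P 8 -t 10
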

When I write locally using sync writes I get:
Code:
root@vmnas:/mnt/vms/esxi # dd if=/dev/zero of=test2.bin bs=1M count=16k conv=sync
16384+0 records in
16384+0 records out
17179869184 bytes transferred in 29.987023 secs (572910125 bytes/sec)

if I remove the SLOG I get:
Code:
root@vmnas:/mnt/vms/esxi # dd if=/dev/zero of=test~32.bin bs=1M count=16k conv=sync
16384+0 records in
16384+0 records out
17179869184 bytes transferred in 5.910511 secs (2906664182 bytes/sec)

Which is more in the ballpark of what I expected for 4 SSD mirrors, considering that I am writing all zeros to a compressed pool.

But when doing it via NFS it's just damn slow in comparison (373 MB/sec):
Code:
[root@nasexsi:/vmfs/volumes/c871d057-89ed8108] time dd if=/dev/zero of=test4.bin bs=1M count=16k
16384+0 records in
16384+0 records out
real    0m 46.05s
user    0m 24.61s
sys     0m 0.00s
[root@nasexsi:/vmfs/volumes/c871d057-89ed8108]
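
For reference, that works out to 17,179,869,184 bytes / 46.05 s ≈ 373 MB/s, which is where the figure above comes from.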


Turning off compression on the pool and testing locally again, the rate drops to almost the speed of a single disk:

Code:
root@vmnas:/mnt/vms/esxi # dd if=/dev/zero of=test5.bin bs=1M count=16k conv=sync
16384+0 records in
16384+0 records out
17179869184 bytes transferred in 35.156742 secs (488664989 bytes/sec)


I am a bit baffled that my SSD-only pool performs so poorly, relatively speaking, compared to my old spinner pool, which with just 6 disks could give me 350MB/s via NFS no problem.

Why is my SSD pool so slow with sync forced on, even locally?

And why is the NFS speed even slower? Is the limit in the NFS implementation of either ESXi or FreeNAS?

When I write via NFS with a single dd, I never see the network utilization on the FreeNAS box go above 4Gbit/s, which fits with the approximate speed I get.

It's not that I need to write heaps of data very fast; I am aware that my pool is great for VMs (lots of IOPS, low latency, etc.), but it's sad that a single thread cannot push data any faster when needed.

Any good ideas for what to try to make it run faster?

Or do I just have to be happy with what I have?

Thanks in advance

Bjørn
 

websmith

Dabbler
Joined
Sep 20, 2018
Messages
38
My dmesg regarding the card:

mpr0: <Avago Technologies (LSI) SAS3416> port 0xd000-0xd0ff mem 0xf8500000-0xf85fffff,0xf8400000-0xf84fffff,0xfb800000-0xfb8fffff irq 32 at device 0.0 on pci3
mpr0: Firmware: 04.00.04.00, Driver: 18.03.00.00-fbsd
 

websmith

Dabbler
Joined
Sep 20, 2018
Messages
38
No,
not at all. I have kind of resigned and am considering ditching SATA SSDs entirely and going to NVMe, or possibly trying a different OS with ZFS to see if that makes any difference, just to rule out FreeBSD as the issue.
 

colmconn

Contributor
Joined
Jul 28, 2015
Messages
174
Have you tried experimenting with the various combinations of sync settings on the pool and NFS? I suspect that may be an issue, but you need to experiment to determine if this is the case.
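
For example, something along these lines (dataset name taken from the zfs output earlier in the thread; re-run both the local dd and the NFS dd from ESXi after each change):

Code:
zfs get sync vms                 # confirm the value currently in effect
zfs set sync=standard vms        # ...run the tests...
zfs set sync=always vms          # ...run the tests again...
zfs set sync=disabled vms        # comparison only; this bypasses the ZIL entirely
zfs set sync=standard vms        # restore whichever setting you want to keep
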
 

websmith

Dabbler
Joined
Sep 20, 2018
Messages
38
Hi,

Yes, it does not matter if I set sync=standard or sync=always; of course, disabled is a no-go.

Same result; it seems like the NFS daemon on FreeBSD is simply not tuned for 10 Gigabit.
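
(For anyone wondering where that tuning lives: on stock FreeBSD the nfsd thread count is the -n flag in nfs_server_flags in /etc/rc.conf, and FreeNAS exposes the same thing as the "Number of servers" field under Services > NFS. The value below is purely illustrative.)

Code:
# Stock FreeBSD example; on FreeNAS set this through the GUI so it persists.
# /etc/rc.conf
nfs_server_flags="-t -u -n 32"
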
 

websmith

Dabbler
Joined
Sep 20, 2018
Messages
38
Update: I have bought 2x P4510 2TB that I will try in a mirror.

I have also tested my pool via iSCSI and I get 2GB/s sequential reads, 3.2GB/s sequential writes at queue depth 32 when writing all zeroes, and 2.3GB/s writes with random data. So iSCSI on FreeNAS is happy to take my data over the network; it's just NFS from ESXi that is a bastard. I have made a "patch" for that in the NFSD for FreeNAS and opened a pull request, which I hope they will take in, since whether or not you run with my code changes is fully configurable via a sysctl flag.
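
For reference, a roughly equivalent sequential test with fio from a Linux guest on the datastore (an illustrative command, not what produced the CrystalDiskMark screenshot linked below; the path and size are placeholders) would be:

Code:
# 1MiB sequential writes at queue depth 32 against a file on the datastore.
fio --name=seqwrite-qd32 --rw=write --bs=1M --iodepth=32 \
    --ioengine=libaio --direct=1 --size=16g \
    --filename=/path/on/datastore/fio-test.bin
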

https://github.com/freenas/os/pull/208

If you would like the patch in, please say something in the pull request :)

For a picture of the CrystalDiskMark results: https://gyazo.com/1fe101a862cbfb8139553cc6668e50ab
 
Last edited:

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,702
I have partitioned an Intel Optane 900P 280GB, with one partition for a SLOG and the remainder for L2ARC.
This is clearly against recommendations and may be contributing to your below-expectation results. Use that drive only for SLOG.
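
If you do go that route, the L2ARC portion can be dropped without rebuilding anything; roughly (using the cache device's gptid from the zpool status above, so double-check it on your own system):

Code:
# Cache vdevs can be removed from a live pool.
zpool remove vms gptid/0c30960d-a542-11e9-b39f-00110a6c4808
zpool status vms    # verify the cache device is gone and the log device remains
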

disabling sync is not recommended
Disabling sync when sync is specifically requested/expected is not good... This is how data gets lost.
They clearly state their reason for not supporting it: breaking something else that relates to data integrity (a core value of their product) isn't the correct alternative to fixing the root cause of the problem (which isn't in their code).

I understand your frustration at not having your expensive hardware go as fast as you were hoping it would, but getting snarky at your FreeNAS vendor (keeping in mind the part of the title that is "Free") isn't likely to bring anyone joy.

I don't expect you will find a lot of support in the forum for your idea. If Microsoft released a "feature" in Windows that would randomly lose user files in an attempt to write things faster but was only invoked with a registry key put in place by the user, what do you think the reaction would be?
 

websmith

Dabbler
Joined
Sep 20, 2018
Messages
38
This is clearly against recommendations and may be contributing
I am well aware that it is against recommendations, but it is not the reason for the bad results. The Optane is probably the fastest NVMe out there, or at least it was when I bought it. And when I am testing, it is doing nothing else, so there is no other impact on the SLOG, since no reads are being done.

If Microsoft released a "feature" in Windows that would randomly lose user files in an attempt to write things faster but was only invoked with a registry key put in place by the user, what do you think the reaction would be?

My patch will not cause anything like what you just wrote.


Microsoft already has a feature like this:
https://gyazo.com/76b5ecfabd1c8f4fb6f68d3706da3bd5

And it's clearly marked as "dangerous" and should not be used unless you have a UPS, and it's not hidden away in the registry; it's right there in the face of users poking around.

Just like my patch: nobody should enable the sysctl flag unless they have a UPS.

And to be honest, I don't really care if I get a "lot" of support in this forum. I am guessing that most people here use FreeNAS as a NAS for storing movies/pictures/files, not for serving virtual machines to ESXi via NFS, which should probably be done via a SAN instead. But that does not mean my patch would not bring something good for those who know the risks and want to use NFS for its ease of use compared to iSCSI. Those people would probably love my patch, since it would make NFS on FreeNAS useful from ESXi, which it really isn't now because FreeNAS inherits FreeBSD's NFS implementation.
 
Last edited:

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,702
So I picked a bad example. My point remains.

https://devblogs.microsoft.com/oldnewthing/20130416-00/?p=4643

Note that a UPS doesn't give the protection that setting says it requires... you're covered if power to your UPS is cut, but what if the power supply itself fails? A whole bunch of data that was recorded as written to the disks is suddenly not there.
 

websmith

Dabbler
Joined
Sep 20, 2018
Messages
38
I am well aware that having a UPS is no guarantee against data loss.

But this patch is not for normal people. This is for people that know the risks.

I have dual PSUs and a UPS, which makes me pretty safe from power loss.

I can still lose data if FreeBSD itself crashes; no amount of sync writes can protect me from that.

Or if a bomb is dropped on my house, or the Earth explodes because someone decided to build a hyperspace freeway and had to remove the Earth.

Disabling sync is for people who have done the risk-versus-reward calculation and decided that the reward outweighs the risk.

It is not for you or me to say it is bad.

I want to be able to do it just for writes coming from NFS, without setting sync=disabled on the entire dataset. Most users on this forum would never need this, because they don't use ESXi and are not serving VM files to ESXi via NFS exclusively.

I don't use my FreeNAS for anything else, and I have weighed the risks against the rewards and decided that the tiny risk I expose myself to, given dual PSUs and a UPS, is not greater than the risk of FreeNAS itself crashing and losing the last n seconds of writes, since they only exist in memory.

If the writes have reached my disks but no flush command has been sent, the writes are safe, since all my disks have built-in capacitors, so I don't lose data in this case.

I take hourly snapshots on FreeNAS, and I back up my VMs every night to different disks.

So all in all I think I am pretty safe, and I would like to decide for myself whether sync writes should be respected via NFS; it should not be you or the FreeNAS developers deciding for me.

For normal people just using FreeNAS to store files that they don't have backed up anywhere else, with no dual PSUs, no UPS, and no capacitor-backed disks, then yes, it is probably not the safest idea. But whether it's a bad one is up to them to decide. If they don't care that they might lose a little data when the power goes, big whoop-de-do; they already risk that when their Windows machine dies. FreeNAS will survive, the pool will survive, so when the system comes back up the service continues without any issues.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,702
It is not for you or me to say it is bad.
iX can say that it's bad for FreeNAS, though... which is what they did.

I'm all for people understanding the risks that they are taking and electing to accept tradeoffs for cost, speed, integrity or whatever.

People often read only what they want to see, so when the option says "tick here to go faster", the fine print that comes after it with a warning about the risks will only be considered after data has been lost. I generally seek to draw attention to those caveats in advance and as loudly as I can.

It's clear you have put a lot of thought and effort into this to build something that's suited to your particular needs, budget and risk profile. Congratulations for that.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,110
I still have a feeling there is something amiss here with the sync/benchmark settings.

In your initial post the vms dataset is shown as having sync set to both "standard" and "always": under "ZFS properties" it says "standard" and under "Tunables" it says "always". So I'm not 100% clear what the sync setting was when these two tests were performed:

When I write locally using sync writes I get:
Code:
root@vmnas:/mnt/vms/esxi # dd if=/dev/zero of=test2.bin bs=1M count=16k conv=sync
16384+0 records in
16384+0 records out
17179869184 bytes transferred in 29.987023 secs (572910125 bytes/sec)


if I remove the SLOG I get:
Code:
root@vmnas:/mnt/vms/esxi # dd if=/dev/zero of=test~32.bin bs=1M count=16k conv=sync
16384+0 records in
16384+0 records out
17179869184 bytes transferred in 5.910511 secs (2906664182 bytes/sec)

Can you help clarify this? Running zfs get sync vms right before running the benchmark will confirm it.

Tests using /dev/zero on a pool with compression enabled also won't tell you much, since the zeros will be compressed down to effectively nothing. Could you disable compression when running these tests?
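
Something like this, for example (the file path is just an example; remember to turn lz4 back on afterwards):

Code:
zfs get sync,compression vms                      # record what's currently in effect
zfs set compression=off vms
dd if=/dev/zero of=/mnt/vms/esxi/test-nocomp.bin bs=1M count=16k
zfs set compression=lz4 vms                       # restore the original setting
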
 