Poor write performance

Status
Not open for further replies.

ShaneJ

Cadet
Joined
Aug 8, 2013
Messages
9
Hi guys,

I have a FreeNAS build that I have been using for some time now for VMware VM storage without any problem. However, after trialling another storage product, I'm starting to suspect that the NFS write performance I'm seeing from my FreeNAS isn't as good as it could or should be.

I have done some basic tests to try to identify the cause; however, I am a total noob and could do with a few pointers. :|

First, this is my hardware:

Supermicro X9DRi-F
2x E5-2620 CPU
128GB RAM (8x Hynix 16GB ECC Reg. DDR3 1600MHz)
2x 60GB Intel SSD 520s
LSI 9211-4i
Chelsio T420-CR NIC
LSI 9207-8e HBA attached to a 48 bay JBOD configured for multipathing


1st Volume:

8x 2TB Hitachi Ultrastar HUS723020ALS640, 4x 2-disk mirrors
2x 240GB Intel S3500, striped cache (L2ARC)
2x 100GB Intel S3700, mirrored log (SLOG)

2nd Volume (an experiment):

5x 480GB Seagate 600 Pro ST480FP0021, 2x 2-disk mirrors + spare


To gather performance stats I have so far just been using Microsoft's SQLIO with the parameters below, in a VMware VM running on a SuperMicro blade connected directly to the FreeNAS via 10Gb fibre.


2 threads writing for 120 secs to file e:\testfile.dat using 8KB random IOs
using specified size: 20480 MB for file: e:\testfile.dat


sqlio -kW -t8 -s120 -o8 -fsequential -b8 -BH -LS


Here are the results:

First two runs were with sync=standard

Datastore IOs/sec: MBs/sec:
1st Volume 357.1 2.78
1st Volume 364.63 2.84

Datastore IOs/sec: MBs/sec:
2nd Volume 85.37 0.66
2nd Volume 85.32 0.66


Second two runs were with sync=disabled

Datastore IOs/sec: MBs/sec:
1st Volume 1675.79 13.09
1st Volume 1642.71 12.83

Datastore IOs/sec: MBs/sec:
2nd Volume 7143.41 55.8
2nd Volume 7792.43 60.87



I also performed a dd locally on both volumes:

1st Volume:

[root@FreeNAS] /mnt/01/ds01# dd if=/dev/zero of=ddfile bs=4k count=2000000
2000000+0 records in
2000000+0 records out
8192000000 bytes transferred in 24.427025 secs (335366258 bytes/sec)

2nd Volume:

[root@FreeNAS] /mnt/02/ds01# dd if=/dev/zero of=ddfile bs=4k count=2000000
2000000+0 records in
2000000+0 records out
8192000000 bytes transferred in 26.817128 secs (305476411 bytes/sec)

Unless I am mistaken, the dd results show the volumes are capable of about 319 and 291 megabytes per second, so the performance I'm seeing over NFS looks a little poor?
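(I realise /dev/zero is highly compressible and FreeNAS enables lz4 by default, so those dd numbers may flatter the disks. A rough sketch of a re-test with incompressible data, assuming my pool/dataset is named 01/ds01 to match the mountpoint above:

# confirm whether compression is enabled on the dataset (name assumed from the mountpoint)
zfs get compression 01/ds01
# write ~8GB of data that cannot be compressed away; /dev/random on FreeBSD does not block,
# but it can be CPU-limited, so treat the result as a lower bound on pool throughput
dd if=/dev/random of=/mnt/01/ds01/ddrand bs=1m count=8000

The same two commands pointed at 02/ds01 would cover the SSD volume.)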

While writing this post I thought about running the above SQLIO test on the same volumes over CIFS rather than NFS. Here are the results:

Datastore IOs/sec: MBs/sec:
1st Volume 2850.83 22.27
1st Volume 2709.79 21.17

Datastore IOs/sec: MBs/sec:
2nd Volume 15737.16 122.94
2nd Volume 15830.84 123.67

Even over CIFS the write performance to the spinning disks is terrible, which made me start wondering if I had network issues. But when I ran the same test on the SSD volume, the results showed that the network appears to be fine.

So at this point I am lost and I'm not sure how to proceed with the investigation.

Ideally what I am looking for is to get NFS performance in line with what the final test results above show. If this means swapping out the Intel S3700s for something better, so be it, but I don't believe this alone will give me what I am looking for.

I would appreciate advice or pointers.

Thank you
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
First, if you read the forums you know I advise against extensive benchmarking because it's pointless. You've unknowingly proved yet again that benchmarking ZFS is almost impossible without being an expert.

In your case you are doing raw benchmarking that couldn't possibly take into account the L2ARC. It takes time for data to be cached in the L2ARC (which will make benchmark numbers appear to be nearly the same as without the L2ARC). To boot, benchmarking completely ignores the fact that your "working data size" has a tremendous impact on how much RAM and L2ARC you should have.

So to me, your numbers prove nothing of value. NFS has been and will always be slow for VMs. If you want to run VMs, iSCSI has major advantages over NFS. ESXi extensions via VAAI are just one small piece of a larger puzzle. Some of the VAAI extensions cannot be substituted with hardware regardless of the depth of your pockets. So iSCSI is definitely a better idea. There are quite a few other reasons, but I won't go into them because I really don't plan to get into a deep discussion this late at night. But long story short... yes, NFS is slow. Try iSCSI. And give up trying to benchmark your system.
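For what it's worth, the ZFS side of an iSCSI setup is just a zvol. A rough sketch only; the pool name tank and zvol name are placeholders, and the extent/target themselves get configured in the FreeNAS GUI:

# create a zvol to back an iSCSI extent; size and volblocksize here are examples only
zfs create -V 500G -o volblocksize=16k tank/esxi-zvol
# then point a device extent at /dev/zvol/tank/esxi-zvol in the GUI and rescan storage from ESXi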

I don't know what "other product" you might be referring to. But I can tell you that ZFS' data integrity protection requires some pretty significant hardware while other products that don't use ZFS see major performance advantages just because they don't use ZFS.
 

Mlovelace

Guru
Joined
Aug 19, 2014
Messages
1,111
First off, you don't want to use NFS datastores for MS SQL as you have to create thin-provisioned disks. MS SQL should only be run on a thick-provisioned, eager-zeroed disk, which means you have to use iSCSI datastores. That could also be why you're seeing the results with SQLIO, IDK. Test your network gear to make sure you don't have a flaky optic or a bad switch before moving on to the server/config.
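Something like iperf will rule the network in or out quickly. A sketch, assuming iperf is available on both the FreeNAS box and the test VM (the address is a placeholder):

# on the FreeNAS box
iperf -s
# on the test VM
iperf -c 192.168.1.10 -t 30 -P 4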
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
First off, you don't want to use NFS datastores for MS SQL as you have to create thin-provisioned disks. MS SQL should only be run on a thick-provisioned, eager-zeroed disk, which means you have to use iSCSI datastores. That could also be why you're seeing the results with SQLIO, IDK. Test your network gear to make sure you don't have a flaky optic or a bad switch before moving on to the server/config.

Actually, things are nastier than even that. A thick-provisioned disk adds zero value because:

1. ZFS is a copy-on-write filesystem, so the desire to have the data contiguous is meaningless for ZFS. You'll be fragmenting your data with every write no matter what. This is called "ZFS protects your data at all costs and it's your job to outmaneuver ZFS' performance penalty with more hardware".
2. The default setup has lz4 compression enabled, so your 100-bajillion-TB thick-provisioned disk takes up like 50KB.
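You can see the effect on any eager-zeroed VMDK sitting on a compressed dataset. A sketch with placeholder paths and a dataset name assumed from the mountpoints above:

# apparent file size vs. space actually allocated; zeroed blocks compress to almost nothing
ls -lh /mnt/01/ds01/somevm/somevm-flat.vmdk
du -h /mnt/01/ds01/somevm/somevm-flat.vmdk
zfs get compression,compressratio 01/ds01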
 

ShaneJ

Cadet
Joined
Aug 8, 2013
Messages
9
Thanks for your reply.

I have been reading the forums for a long time, though I don't specifically follow all your posts so I was unfamiliar with your dislike of benchmarking.


Despite your opinion, I am looking for something that can be used to confirm whether a storage platform is in fact performing well or not, whether that be seat-of-the-pants "feel" or an application such as SQLIO. In my case both currently suggest poor performance.

Unfortunately I don't have the knowledge or understanding to know why the L2ARC would be having an impact on my write performance. I haven't re-read the documentation this morning to refresh my memory, but I thought the L2ARC affected reads?

From reading lots of threads on these forums I do understand that NFS is slow, but what is not clear is what constitutes slow and what doesn't.

So why is NFS slow? Hundreds of posts on these forums state (in simple terms) that it's due to all writes having to be sync writes. Some of the same threads also suggest setting sync=disabled to diagnose slow writes and see if performance increases. In my case performance did increase a little, but it still appears pretty poor in my opinion.
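For completeness, the toggle used for those runs was along these lines (dataset name assumed from my mountpoint, and sync was set back to standard afterwards):

# diagnosis only: this disables the sync-write safety that NFS relies on
zfs set sync=disabled 01/ds01
# ...run the benchmark, then restore the default...
zfs set sync=standard 01/ds01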

I haven't done any testing using iSCSI, but as per my original post I have performed tests over CIFS and the performance is still poor. If I had seen a great improvement in writes over CIFS, I probably would have just accepted that yes, NFS is slow, and moved on.

As per my CIFS tests above, I am happy with the writes to the SSD volume; however, the writes to the spinning-disk volume are poor. I know the SSDs should be faster, but I suspect something may be wrong with the configuration of the spinning-disk volume.

So, forgetting that I even mentioned NFS for a moment, can you suggest a way to test write performance to my primary volume?

Thank you


 

ShaneJ

Cadet
Joined
Aug 8, 2013
Messages
9
Thank you for your reply.

I assume your statements regarding SQL were brought on by the fact that I am using SQLIO for performance testing? SQLIO has nothing to do with SQL; it's just a silly name they went with.

Write performance to my SSD volume appears to be OK; I would just like to see similar performance from my primary volume.

Network gear is perfectly fine.

Thanks


 

Mlovelace

Guru
Joined
Aug 19, 2014
Messages
1,111
My mistake, I figured you were going to host a SQL database there because of the SQLIO tool, rather than the more popular ATTO or CrystalDiskMark IO tools. NFS on a ZFS file system is going to suffer, but that's been stated many times.

Sorry for my misunderstanding.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
There's a *bunch* of reasons why NFS performs slowly. Some are design factors that cannot be worked around, and others are just due to improper configuration. Unfortunately it takes an expert to even try to narrow down the exact problem (even I have difficulty with it).

The shorter solution is to use iSCSI. As a general rule it works better when you don't need to share the files themselves. Things like ESXi datastores, SQL databases, and other workloads that aren't limited to file-level access can see significant improvements when comparing iSCSI with NFS on the exact same hardware.

iSCSI for ESXi is pretty much the way to go for a bunch of other reasons, like multipath IO and VAAI support. The only "good" reason to go with NFS for ESXi is that you can do raw VM directory copies from the CLI on the FreeNAS server itself, which is far faster than trying to do it over the network.
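A sketch of what that local copy looks like (paths are placeholders):

# copy a VM's directory on the FreeNAS box itself instead of going through the ESXi datastore browser
cp -a /mnt/01/ds01/template-vm /mnt/01/ds01/new-vm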
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Going to second cyberjock's suggestion for a trial on iSCSI.

Random thoughts as well:
  • Have you checked the disks to see if one is reallocating sectors like mad or otherwise throwing SMART errors? (A quick smartctl loop is sketched after this list.)
  • Are the S3700s underprovisioned to a reasonable size? I know the 100GBs come factory underprovisioned, but an SLOG should never need to be that big anyways.
  • Speaking of SLOG and transaction sizes, is it possible you're just overwhelming the pool disks? You've got 10GbE and a decently fast SLOG, so it's possible. I wouldn't expect it to be as poor as it is, though.
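A quick way to eyeball the SMART question (device names are placeholders; with multipath you may need to point smartctl at the underlying da devices rather than the multipath nodes):

# dump SMART/health data for each member disk; look for grown defects, pending sectors or error counters
for d in da0 da1 da2 da3 da4 da5 da6 da7; do
  echo "=== $d ==="
  smartctl -a /dev/$d
done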
 
L

Guest
I would like to understand the slow NFS performance also. On every other platform, including FreeBSD, you will see near-identical performance for iSCSI and NFS with a ZIL.
 
L

Guest
These numbers blow my mind:

Second two runs were with sync=disabled

Datastore IOs/sec: MBs/sec:
1st Volume 1675.79 13.09
1st Volume 1642.71 12.83

Datastore IOs/sec: MBs/sec:
2nd Volume 7143.41 55.8
2nd Volume 7792.43 60.87

It seems to be the exact opposite of what I would expect. I would expect the pool with more vdevs to have better performance.

Can you run sysctl -a | grep ashift?
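If sysctl doesn't expose it, the pool's ashift can also be read with zdb; a sketch, assuming FreeNAS's usual cache file location:

# ashift is recorded per vdev in the cached pool configuration
zdb -U /data/zfs/zpool.cache | grep ashift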
 