OK, so NFS performance sucks


ZFS Noob

Contributor
Joined
Nov 27, 2013
Messages
129
Now I need to figure out why. I'm running a server with 72 GB of RAM, 4 SAS drives in a RAID10-style config (two mirrored vdevs), and an Intel 320 SSD, under-provisioned, as a log device.

When I set sync=disabled I get 14,394 IOPS on random writes, versus 50 with sync=standard. Reads are 1,117 vs. 88. (This is using fio, which I'm new to, but it means I don't need to install a Windows VM to benchmark.)
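For the record, the run looked roughly like the sketch below. The mount point and sizes are placeholders, and since I'm new to fio, treat the exact flags as my best guess at a sync-heavy 4k random-write test rather than the literal command:

Code:
# 4k random writes with an fsync after every write, against the NFS-mounted datastore
# (directory, size, and runtime are examples; point it at your own mount)
fio --name=randwrite-sync --directory=/mnt/nfs-datastore \
    --ioengine=sync --rw=randwrite --bs=4k --size=1g \
    --fsync=1 --runtime=60 --time_based --group_reporting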

In trying to diagnose this I ran zpool iostat and got the following output:

Code:
                                               capacity     operations    bandwidth
pool                                         alloc   free   read  write   read  write
-------------------------------------------  -----  -----  -----  -----  -----  -----
nas1mirror                                    215G  3.41T      0    250      0  7.14M
  mirror                                      108G  1.71T      0     69      0  3.20M
    gptid/1b5a485d-6129-11e3-a041-0026b95bb8bd   -      -      0     52      0  3.20M
    gptid/1bce1e51-6129-11e3-a041-0026b95bb8bd   -      -      0     51      0  3.20M
  mirror                                      108G  1.71T      0     79      0  3.15M
    gptid/1c420e8a-6129-11e3-a041-0026b95bb8bd   -      -      0     54      0  3.16M
    gptid/1cb736f5-6129-11e3-a041-0026b95bb8bd   -      -      0     54      0  3.16M
logs                                             -      -      -      -      -      -
  gpt/slog                                   12.1M  1.97G      0    101      0   813K
-------------------------------------------  -----  -----  -----  -----  -----  -----
        
Which shows the 320 doing ~101 writes per second (I think), versus the 8,000 that Intel advertises.
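(For anyone reproducing this: the table came from something like the command below. The -v flag breaks the numbers out per vdev, and the trailing number is the sampling interval in seconds; the interval here is just an example.)

Code:
zpool iostat -v nas1mirror 5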

It's late and I'm new at this, but I'm not sure where to check first. Any suggestions?
 

ZFS Noob

Contributor
Joined
Nov 27, 2013
Messages
129
Maybe this helps:

[attached image: slog.gif]


Putting the ZIL on an Intel 320 SSD should support more than 100 writes/second, right?

Any idea how the system determines the write capacity / %busy numbers reported for individual drives?
 

dlavigne

Guest
Which version of FreeNAS?

If this is a testing system, there were some commits this past weekend that may increase NFS performance by up to 40%. I'm not sure if they are in the ALPHA builds yet, but they should be by tomorrow.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
How are you performing this test and getting these numbers? Right now my guess is your test is failing to test what you think it's testing. :P
 

ZFS Noob

Contributor
Joined
Nov 27, 2013
Messages
129
I'm running FreeNAS-9.2.0-RELEASE-x64; the previous version was working fine, but I liked the idea of only getting e-mails when things broke.

The numbers looked weird to me, with writes > reads (in hindsight, I'm probably getting the flu, so that's part of it).

But there's serious slowness with the VM I'm running on that datastore, obvious enough that it hardly needs numbers. Let me wake up and I'll see if I can gather better data.

(Now if I can remember what that command was that gave a realtime look at disk traffic in post #2...)
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
gstat
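Something along these lines gives the live view; the one-second refresh is just an example, and you can add a -f regex to narrow it to specific providers:

Code:
# refresh every second; add -f '<regex>' to filter to specific disks/partitions
gstat -I 1s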
 

ZFS Noob

Contributor
Joined
Nov 27, 2013
Messages
129
Aaaand while I was watching the server, it randomly rebooted (!!!). Now that it's back up it looks like it's working, but I'm getting no traffic across it, and XenServer is giving weird errors when I try to migrate my test VMs back off of it.

So something else may be going on as well.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Yeah.. that's usually what us pros call "a bad thing". ;)
 

ZFS Noob

Contributor
Joined
Nov 27, 2013
Messages
129
Still no more info, but another look at the problem. First pair: sync=disabled. Second pair: sync=standard.

Code:
dd if=/dev/zero of=testfile bs=1024 count=50000
50000+0 records in
50000+0 records out
51200000 bytes (51 MB) copied, 0.341646 s, 150 MB/s
# dd if=testfile of=/dev/zero bs=1024 count=50000
50000+0 records in
50000+0 records out
51200000 bytes (51 MB) copied, 0.154043 s, 332 MB/s
# dd if=/dev/zero of=testfile bs=1024 count=50000
50000+0 records in
50000+0 records out
51200000 bytes (51 MB) copied, 9.99458 s, 5.1 MB/s
# dd if=testfile of=/dev/zero bs=1024 count=50000
50000+0 records in
50000+0 records out
51200000 bytes (51 MB) copied, 0.151925 s, 337 MB/s
So at least the ARC is working as expected. ;)
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
And fail...

1. local dd writes are NOT sync writes. I surely hope this isn't what you were trying to test or you are seriously lost. I'm talking "you showed up to a baseball game in a football jersey" lost.
2. 50MB written to the pool is not a test at all. Write double however much RAM you have, or 50 GB, whichever is greater (something like the sketch after this list)... that's a write speed test.
3. Remind me what your results are supposed to show? LOL...
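Something like the sketch below is closer to a real write test for a box with 72GB of RAM. The path is hypothetical, so point it at your actual pool, and keep in mind that if compression is enabled on that dataset, a /dev/zero test mostly measures how fast zeros compress:

Code:
# ~150GB of sequential writes, roughly double the 72GB of RAM
# (output path is an example; bs=1m is the FreeBSD dd spelling)
dd if=/dev/zero of=/mnt/nas1mirror/testfile bs=1m count=150000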
 

ZFS Noob

Contributor
Joined
Nov 27, 2013
Messages
129
Those results were run on a VM whose filesystem was shared via NFS.

What would you like me to run instead?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Yeah, but just because you used NFS does not mean they are sync writes either.

There are also caveats for sync writes: sync writes that are over 64 kbytes, for example, are actually written to the pool immediately. I think you can make dd writes sync with a particular parameter, but I'm not sure how that works with the whole 64-kbyte limit and such. Quite honestly, I think trying to benchmark your way to figuring out what is better is kind of an exercise in futility. Your best bet is to just follow the best practices, or pay someone to build your system for your needs (if you even know what your needs are), or at least pay a consultant to provide you with a recommended build and settings to use.

As far as I know there is no particularly good benchmark for sync writes. The real test is to put it in production and see how it does. Part of that logic is also related to pool history: pools that are closer to full and more fragmented perform more slowly, and there's no benchmark that takes that kind of thing into account. There are plenty of things where you can't run a test that conclusively proves something is better or worse. Sometimes you just have to "know your sh*t".
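For what it's worth, GNU dd (which is what the Linux VM in the earlier numbers appears to be running, going by the output format) does have flags for this; a rough sketch, mirroring the earlier test:

Code:
# every write is issued with O_SYNC, so each block must reach stable storage
dd if=/dev/zero of=testfile bs=1024 count=50000 oflag=sync
# or: buffer normally, but fsync the file once before dd exits
dd if=/dev/zero of=testfile bs=1024 count=50000 conv=fsync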
 

ZFS Noob

Contributor
Joined
Nov 27, 2013
Messages
129
We disagree re: sync writes. Setting sync=disabled gives a 30x increase in write performance over NFS, and with that being the only change, it seems like a pretty strong indicator that sync writes are the issue.

Regardless, this isn't about figuring out which is better. It's simply about trying to understand why there's a 30x difference in NFS performance when a reasonable SSD is set up as a SLOG. I was hoping for guidance on better tests, like determining what write speed the SSD is actually capable of so I can rule it out as the problem, but instead we're arguing.
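For the "what can the SSD actually sustain" question, the sort of test I have in mind is something like the fio sketch below, run locally on the FreeNAS box against a scratch dataset (not against the in-use slog partition). Because every write is followed by an fsync, the sync path through the slog is what gets exercised. Paths and sizes are placeholders, and this assumes fio can be installed on the box or run from a jail:

Code:
# small sequential writes, fsync after each one; with sync=standard these
# should hit the slog SSD first (directory and sizes are examples only)
fio --name=slog-stress --directory=/mnt/nas1mirror/scratch \
    --ioengine=sync --rw=write --bs=4k --size=1g \
    --fsync=1 --runtime=60 --time_based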

Maybe it's me. Flu sucks.

I'll keep digging on this end. Slowly, but I'll try and produce useful data that I can use to determine the source of the issue.

Thanks for your reply.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I'm not arguing with you. I'm trying to explain why I don't think your tests really reflect reality or are objective enough to be useful. I also don't think that your initial test showing 100 writes means anything at all (but you do, since that's what started this thread). To me it means only that 100 writes were made at that moment. That could mean anything from "that's what the SSD could do" (which is what I think you think it means) to "that's all the pool wanted to write" (which is what I think it means). The pool doesn't write everything it gets to the ZIL while waiting for the next transaction to come. Some writes will go to the ZIL, some will go to the actual spinning media.

This is where you have to look at the forest despite the trees. The whole sync writes problem revolves around a single issue. You have sync writes that must be written right now. So we have some amount of I/O that is low latency writes. We also have pool reads as well as non-sync writes. All 3 of these add I/O to your pool. You must have enough I/O from your pool to keep the sync writes at a low latency.

Possible solutions are to increase I/O performance or decrease the quantity of I/O that must come from your spinning pool media. Adding more RAM obviously helps the latter. Adding the ZIL and the L2ARC also helps with the latter by not requiring your spinning media to do as much work as stuff gets cached. But to help with increasing I/O performance the solution is to change from the RAIDZ2 many people want to use to many mirrored vdevs.
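(A rough illustration of what "many mirrored vdevs" looks like at pool-creation time, with made-up device names; random I/O scales with the number of vdevs:)

Code:
# three striped two-way mirrors: each additional mirror vdev adds IOPS
zpool create tank mirror da0 da1 mirror da2 da3 mirror da4 da5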

sync=disabled does more than just increase sync write performance. When I do local data moves with sync=disabled, my pool speeds up like a rocket, and my writes are clearly not sync writes. Normally you have a ZIL in RAM; sync=disabled completely disables that, so the actual behavior of the pool changes. Data writes aren't committed until the next transaction group, which also means fewer metadata writes, which speeds up the pool. Remember that 64-kbyte limit I told you about before? It effectively goes away completely, and it's as if every write is stored in RAM (without the slight performance penalty of the SSD transactions). If you have a respectable amount of RAM (I have 32GB) then you can see a massive explosion in performance like you've never seen, even with your typical non-sync writes.

In short, sync=disabled doesn't definitively prove that sync writes are your problem. sync=standard with your standard ESXi NFS sync writes basically means "real data storage is always persistent," while sync=disabled means all writes are only "persistent as soon as the disks can manage." Naturally, that gives performance improvements you cannot ever match with an L2ARC or ZIL, which you are taking to mean something. It really just means that your pool could perform much faster if you decide to ignore the persistence of your data. As soon as you decide that your data's persistence is important (aka sync=standard or always), you pay a penalty. That penalty is basically "my pool will always be slower, but I can mitigate some of it with more RAM, a ZIL, and an L2ARC."
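(The knob itself is just a per-dataset property; the dataset name below is made up:)

Code:
zfs set sync=standard tank/vmstore   # honor application sync requests (the default)
zfs set sync=disabled tank/vmstore   # faster, but un-flushed writes can be lost on power failure
zfs get sync tank/vmstore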

This stuff is far more complex than I think you realize, and you are interpreting the numbers incorrectly in my opinion.
 

ZFS Noob

Contributor
Joined
Nov 27, 2013
Messages
129
You're probably right. Diagnosed with H1N1 today, wife warned me to "stay away from the computer because [you] make mistakes." She's right.

I'll try and reread your replies later and maybe they'll sink in better.
 