NFS tweak / patch: Speeding up FreeBSD's NFS on ZFS for ESX

Status
Not open for further replies.

ibmg

Cadet
Joined
Feb 2, 2012
Messages
7
I found the following post

http://christopher-technicalmusings.blogspot.se/2011/06/speeding-up-freebsds-nfs-on-zfs-for-esx.html

I feel the results speak for themselves, but in case they don't - we're looking at an increase in IOPS and MB/sec, and a decrease in access time when we use the modified NFS server code. For this particular test, we're looking at nearly a doubling in performance. Other tests show closer to a 10% increase in speed, but that's still a welcome improvement.

These results will be apparent whether you're using the old NFS server (v2 and v3 only) or the new NFS server (v2/v3/v4) that became the default in FreeBSD 9 about a month ago.

I've used this hack for over 6 months now on my SANs without any issue or corruption, on both 8.1 and various 9-Current builds, so I believe it's fairly safe to use.

I'm too lazy to make a proper patch, but manually editing the source is very easy:

- The file is /usr/src/sys/fs/nfsserver/nfs_nfsdport.c
- Go to line 704 and you'll see code like this (Edit: now line 727 in FreeBSD-9.0-RC3):

if (stable == NFSWRITE_UNSTABLE)
ioflags = IO_NODELOCKED;
else
ioflags = (IO_SYNC | IO_NODELOCKED);
uiop->uio_resid = retlen;
uiop->uio_rw = UIO_WRITE;

- Change the code to look like this. We're commenting out the logic that decides whether to make this an IO_SYNC write.

// if (stable == NFSWRITE_UNSTABLE)
ioflags = IO_NODELOCKED;
// else
// ioflags = (IO_SYNC | IO_NODELOCKED);
uiop->uio_resid = retlen;
uiop->uio_rw = UIO_WRITE;

- Recompile your kernel, install it, and reboot. You're now free from NFS O_SYNC writes under ESX.
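For convenience, the manual edit above can also be written as a unified diff. This is a sketch I've put together from the before/after snippets in the post, not a patch from the original author; the hunk line numbers are approximate and will drift between FreeBSD versions, as the post itself notes:

```diff
--- sys/fs/nfsserver/nfs_nfsdport.c.orig
+++ sys/fs/nfsserver/nfs_nfsdport.c
@@ -704,6 +704,3 @@
-	if (stable == NFSWRITE_UNSTABLE)
-		ioflags = IO_NODELOCKED;
-	else
-		ioflags = (IO_SYNC | IO_NODELOCKED);
+	ioflags = IO_NODELOCKED;
 	uiop->uio_resid = retlen;
 	uiop->uio_rw = UIO_WRITE;
```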


If you are running the older NFS server (which is the default for 8.2 or older), the file to modify is /usr/src/sys/nfsserver/nfs_serv.c - Go to line 1162 and comment out these lines as shown in this example:

// if (stable == NFSV3WRITE_UNSTABLE)
ioflags = IO_NODELOCKED;
// else if (stable == NFSV3WRITE_DATASYNC)
// ioflags = (IO_SYNC | IO_NODELOCKED);
// else
// ioflags = (IO_METASYNC | IO_SYNC | IO_NODELOCKED);

Is it possible to apply this patch to FreeNAS?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Yes, it is, but you'll have to rebuild FreeNAS to do it.

You're aware that this isn't really a safe thing to do for the health of your VM's, right?
 

ibmg

Cadet
Joined
Feb 2, 2012
Messages
7
I don't intend to turn off the ZIL.

This is only about the option / possibility of letting FreeNAS talk to ESX free of O_SYNC.

ESX uses a NFSv3 client, and when it connects to the server, it always asks for a sync connection. It doesn't matter what you set your server to, it will be forced by the O_SYNC command from ESX to sync all writes.

By itself, this isn't a bad thing, but when you add ZFS to the equation, we now have an unnecessary NFS sync due to ZFS's ZIL. It's best to leave ZFS alone, and let it write to disk when it's ready, instead of instructing it to flush the ZIL all the time. Once ZFS has it, you can forget about it (assuming you haven't turned off the ZIL).

To me this sounds like a general fix for the poor NFS performance, and it should not break anything.

I would love to see an option in FreeNAS to turn off NFS O_SYNC.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
ZFS and NFS are doing the correct thing as it stands. This results in poor performance with ESXi if you don't have a fast SSD for ZIL. This is a side effect of ESXi asking for everything to be written sync, because ESXi has no idea whether or not what it is writing is important to your VM's integrity.

Short-circuiting the sync loop seems like an obvious "fix", except that what you're actually doing is undoing all the careful things that have been engineered to protect data integrity in case of a failure of some sort. It'll even seem to work just fine, up until something crashes, panics, or loses power unexpectedly, at which point your VM image is not being protected properly.
 

ibmg

Cadet
Joined
Feb 2, 2012
Messages
7
Why should I use a fast SSD if I have 256GB of memory and 12 cores in my FreeNAS box?
Then the ZIL should be handled "in memory" and not be slowed down by any sync mismatch.

On Linux it does work, and I get a stable 500-600 MByte/s to my storage (10GbE) over NFS with 128 threads and some tweaks like the noop scheduler, using the H710 RAID controller in my R720xd with 900GB SAS drives (16 of them) running RAID 10.

The same hardware with an HBA in IT mode, running FreeNAS, will slow down after a while to less than 10 MByte/s - with RAID 10 (in ZFS) as well.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Because when I come along and yank the power cord out of your server, if your ZIL is "in memory," you're going to lose up to 32GB of writes that NFS has supposedly committed to disk, and when your filer comes back up online, the disks are going to be in an inconsistent state with what the still-running VM's expect and have cached in memory. Happy corruption nightmare to you.

Your Linux works faster because it doesn't value the consistency of your VM data. You can get that on FreeNAS too. Just do "zfs set sync=disabled yourdataset" and ZFS will be rocket fast. Of course, it will also cease to value your data's consistency and when you crash or lose power, you will introduce problems into your VM environment, but since Linux is already doing that and you seem to find this acceptable, then ... as they say, it's your data.
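For anyone who does decide to make that trade-off anyway, the knob jgreco names is a per-dataset property, so it can be flipped (and reverted) without patching the kernel at all. A sketch, where tank/vmstore is a placeholder dataset name of my choosing:

```sh
# UNSAFE trade-off described above: NFS will acknowledge writes
# before they are on stable storage (tank/vmstore is a placeholder).
zfs set sync=disabled tank/vmstore

# Verify the current setting.
zfs get sync tank/vmstore

# Revert to the safe default.
zfs set sync=standard tank/vmstore
```

Unlike the source patch, this affects only the chosen dataset and survives upgrades, but it carries exactly the corruption risk described in this post.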

Most of the people here are using FreeNAS and ZFS because we value our data. ZFS has a focus on being able to keep your data safe, with checksums and multiple protection strategies. It is inherently slower than a lightweight filesystem that omits all these features. We're more likely to spend some time thinking about how to store our data safely, with speed as a secondary goal, rather than just figuring out how to get NFS to blast transactions as fast as possible with little consideration given to what happens when something goes wrong.

But when something goes desperately wrong and your VM's corrupt because you've told ZFS to disable the ZIL, please be aware that while you may get some mild sympathy from those of us here, you'll also be deafened by the chorus of "we told you so." ZFS provides a mechanism to make this work correctly. Only you can choose to make use of it.
 