Revisit ZIL on Ramdisk

Status
Not open for further replies.

TasMot

Dabbler
Joined
Sep 15, 2011
Messages
14
I am using my FreeNAS 8.3.0 box as a backup target for my ESXi boxes. I have 6 WD 2TB drives in a raidz2 pool. The problem is that when the backups run at night, the NAS is the bottleneck. Since these are actually Windows backups (a mix of 2003, 2008, XP, and 7), the writes are fairly large. They are mostly incremental backups, but there is a full backup once a week. The backups start at 11:00pm and are sometimes still running when I log on at 8:00am. My thought is that since so much writing is coming from ESXi (which does the synchronous write thing), a ZIL device would help. I've read a bunch of posts about using a ZIL and it seems that an enterprise SLC SSD is desirable due to the massive number of writes that will occur. Those are way out of the budget (expensive to start with, and 2 in a mirror are needed). My next thought is that I could fairly cheaply add some additional RAM and create a ramdisk to use. I KNOW, volatile due to power outages. I think I have that solved. I have the FreeNAS box on a UPS. The UPS only needs to last 30 seconds until the standby generator automatically turns on.

With the power problem solved, will this ramdisk work as expected and reduce the time needed for the backups? If so, how do I create the ramdisk at startup and then point the raidz2 pool at it? Also, the 6 2TB disks give me a usable 8TB of storage. What size should I make the ramdisk? I have 8GB of RAM in the machine. It will go up to 32GB. Thanks for your help. tom
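For anyone checking the 8TB figure, here is a rough sketch of the arithmetic in Python. The function name is made up for illustration, and the result ignores ZFS metadata overhead and TB vs TiB differences, so treat it as an estimate only.

```python
def raidz_usable_tb(disks: int, disk_tb: float, parity: int) -> float:
    """Rough usable capacity: data disks times disk size.

    Ignores ZFS metadata overhead and TB/TiB rounding (a simplification).
    """
    if disks <= parity:
        raise ValueError("need more disks than parity devices")
    return (disks - parity) * disk_tb


# raidz2 keeps two disks' worth of parity, so 6 x 2TB leaves 4 data disks.
print(raidz_usable_tb(disks=6, disk_tb=2.0, parity=2))  # 8.0
```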
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Our HP MicroServer N36L, which is kind of my %!+&#box for trying things because it only houses short term backups, recently got itself a nice Intel 320 40GB SSD, the MLC one with supercap or array of caps or something like that. I had been letting ESXi do NFS with sync=disabled (again, backups) and it was inconsistently averaging ~300-400Mbit/sec. It's pretty much hitting a wall now at around 200Mbit/sec with the SLOG ZIL device.
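To put those rates in terms of a backup window, here is a small Python sketch that converts Mbit/sec to MB/sec and estimates transfer time. The 500 GB backup size is purely hypothetical (no backup size is given in this thread), and the helper names are made up for illustration.

```python
def mbit_to_mbyte_per_s(mbit_per_s: float) -> float:
    """Convert a network rate in Mbit/s to MB/s (8 bits per byte)."""
    return mbit_per_s / 8.0


def transfer_hours(size_gb: float, mbit_per_s: float) -> float:
    """Hours needed to move size_gb at a sustained rate, using 1 GB = 1000 MB."""
    seconds = size_gb * 1000.0 / mbit_to_mbyte_per_s(mbit_per_s)
    return seconds / 3600.0


BACKUP_GB = 500.0  # hypothetical nightly backup size, not from the thread

for label, rate in [
    ("sync=disabled, low end", 300.0),
    ("sync=disabled, high end", 400.0),
    ("with SLOG device", 200.0),
]:
    mb_s = mbit_to_mbyte_per_s(rate)
    hours = transfer_hours(BACKUP_GB, rate)
    print(f"{label}: {mb_s:.1f} MB/s, about {hours:.1f} h for {BACKUP_GB:.0f} GB")
```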

So ZFSv28 doesn't totally shatter if you lose a SLOG ZIL - you can use a non-mirrored ZIL and the right things should happen even if it fails, unless maybe it fails catastrophically at the same moment that power fails.

And while enterprise SLC SSD is great, it is freaking expensive. A 40GB MLC with power loss protection is $100ish. Underprovision it to 2 or 4GB (we did 2GB) and add it to the pool and let wear leveling work some magic.
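As a rough sanity check on why 2GB is enough, here is a back-of-the-envelope sizing sketch in Python. The log device only has to hold sync writes that have not yet been committed to the pool in a transaction group, so its size is bounded by how fast data can arrive. The 1 Gbit/sec link, the 5 second transaction group interval, and the "two transaction groups in flight" factor are all assumptions for illustration, not figures from this thread; plug in your own numbers.

```python
def slog_size_gb(link_gbit_per_s: float,
                 txg_seconds: float,
                 txgs_in_flight: int = 2) -> float:
    """Upper-bound estimate of log space needed, in GB (1 GB = 1e9 bytes).

    Assumes every incoming byte is a sync write arriving at full link speed,
    which is a worst case.
    """
    bytes_per_s = link_gbit_per_s * 1e9 / 8
    return bytes_per_s * txg_seconds * txgs_in_flight / 1e9


# Assumed: one gigabit link, 5 s between transaction groups, 2 groups in flight.
print(slog_size_gb(link_gbit_per_s=1.0, txg_seconds=5.0))  # 1.25
```

Under those assumptions the worst case is a bit over 1GB, which is in line with underprovisioning to 2 or 4GB.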

UPS's are often just as fallible as other components. I've seen them drop load numerous times. For sites where we're providing power protection, we typically use a RATS with a pair of UPS's, or even a pair of RATS with three UPS's, engineered to feed from separate circuits, with one PDU on each RATS feeding loads - so stuff with redundant power supplies actually needs multiple failures to cause power loss.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
A polite "bump the RAM to 32 GB" and see what happens first would be a nice answer without a RANT on your part.

My impression is that you'd be best off looking at what I suggested, and/or if you are exclusively doing backups AND you've got confidence in your backup power arrangements, then you might look at using sync=disabled - after reading every bit of pro and con on the topic. More memory is helpful for reads, not so much for writes.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
You know, I went to eBay and searched for "ssd slc" and I saw SSDs by Intel and other reliable manufacturers for less than $100. So I'm not sure where you're getting the idea that SLC drives are outrageously expensive, but they're not. Even an 8GB SSD is so big you'll never use that much space for a ZIL. In fact, if you had just 1GB of space, that's still a lot of space.

Yes, I can read. I didn't "absorb" information on ZIL, I "read" it. But it appears that your definition of "expensive SLC SSD" could use some work. And while we're on the topic of reading, did you ever read how the ZIL works and how what you are attempting to do won't work because of all of the reasons I listed above? Yeah, I didn't think so...

jgreco beat me to it. I was thinking that using a ZIL on a ramdisk is pretty much just as risky as sync=disabled, but I'm not a particular fan of doing things like that before you've ruled out other viable (and simple) options such as adding more RAM.

Edit: and I could be wrong, but I believe that if you were to set sync=disabled the performance you'd get is exactly what you'd get if you were using a ZIL. So you can technically see how a ZIL might help you before you even drop the dough, but my money's on more RAM first.

The problem is that you cannot just go out and get any SLC SSD. In order to properly implement POSIX semantics, which is the whole point of the ZIL, you need to be able to assure that data is committed to stable storage even if written immediately prior to a power loss. And ZFS doesn't really give a damn about SLC vs MLC; that's just a matter of designer/builder preference. It is important to understand that devices with high write IOPS may use large write caches to give the appearance of better performance; this translates to more data lost if power blinks and there is no protection. Newer MLC devices like the Intel 320 have a strategy that uses an array of traditional capacitors to give the device sufficient power to complete in-flight writes.

What you'll find is that suitable SLC SSDs such as Intel third generation (Intel 313, etc) are either rather expensive or in short supply. For the purposes of outfitting an NFS target for backups, I got fed up and ordered a 320, largely because they were readily available and I felt that any risk of MLC crapping out under ridiculous ESXi write loads was mitigated by underprovisioning and Intel's warranty. These things are getting cheaper, faster, and better anyways.

As a side note to cyberjock, no, sync=disabled gives you the write speed that the pool would be capable of without worrying about any of the pesky sync stuff. With the SLOG ZIL device, the data that is to be written sync is written to the transaction group as with sync=disabled, BUT the SLOG device has to be given a separate write that must be completed before the client can be sent an acknowledgement.

And "write completed" from ZFS's view really means acknowledged by the SSD controller, not necessarily actually stored in flash, but probably in the SSD's RAM write cache... which is why the whole supercapacitor thing is so important, because the SSD does need to finish storing it in flash.
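To make the difference concrete, here is a toy latency model in Python of the acknowledgement path described above. It is only a sketch: the function names are invented, the latency and block-size numbers are hypothetical, and it models a single client that waits for each acknowledgement before sending the next write, which is a simplification of what ESXi actually does.

```python
def ack_latency_ms(sync_disabled: bool,
                   slog_write_ms: float,
                   network_ms: float) -> float:
    """Time the client waits for one write to be acknowledged.

    With sync=disabled the write is acknowledged once it is in the in-memory
    transaction group; with a SLOG the separate log write must finish first.
    """
    if sync_disabled:
        return network_ms
    return network_ms + slog_write_ms


def serial_throughput_mb_s(block_kb: float, latency_ms: float) -> float:
    """Throughput of a client with one outstanding write at a time."""
    return (block_kb / 1000.0) / (latency_ms / 1000.0)


# Hypothetical numbers: 64 KB writes, 1 ms network round trip, 1.5 ms log write.
for label, disabled in [("sync=disabled", True), ("SLOG device", False)]:
    latency = ack_latency_ms(disabled, slog_write_ms=1.5, network_ms=1.0)
    rate = serial_throughput_mb_s(64.0, latency)
    print(f"{label}: {latency:.1f} ms per write, about {rate:.1f} MB/s")
```

The point of the model is only that the extra log write sits in the acknowledgement path, so a SLOG can never be faster than sync=disabled for the same pool.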
 