SSDs, ARC and ZIL

Status
Not open for further replies.

hoboville

Dabbler
Joined
May 4, 2013
Messages
14
Hello, I've recently been contemplating an SSD setup for L2ARC and possibly ZIL. IIRC, it is recommended that ZIL should have SSDs in RAID1 so you don't lose the log. I've read about people partitioning a pair of SSDs with one set of mirrored and one set of striped partitions for ZIL and L2ARC, respectively. It sounds enticing.

My questions then: is RAID0 safe for an L2ARC SSD? Do SSDs handling L2ARC or ZIL reduce the need for more RAM?

Thanks in advance!
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Sigh.. this crap is really complex, so here goes:

1. ZILs should be mirrored for reliability.
2. L2ARCs should be striped for performance. (although they don't have to be striped, especially if you are using a fast SSD)
3. You shouldn't be doing HW RAID for anything with ZFS, so saying RAID0 and RAID1 is totally wrong because there is no RAID0 or RAID1 with ZFS. There is "striped" and "mirrored". Use the proper terminology or you'll be flogged to death here. ;)

L2ARCs do and don't reduce the need for more RAM at the same time.

An L2ARC allows you to use RAM as an index to the L2ARC drive. Each entry uses 380 bytes of RAM and the L2ARC block size varies, so you will use RAM just to index the L2ARC. Now, assuming your L2ARC is able to perform acceptably, the theory is that it can extend the reach of your RAM by up to 5-fold. Keyword: up to. It may only be 2-fold, and it may be 10-fold. That whole "varies in size" thing changes the math based on your workload and how the data is physically laid out on the pool.

Since your L2ARC uses RAM, you have to have more RAM than you would otherwise need. The common ratio is about 5:1, so your L2ARC shouldn't exceed 5x your ARC. Typically, your ARC is about 50-80% of your RAM if you have 16GB+, but you still need RAM for ZFS to do its normal tasks. Because of this, if you want an L2ARC you generally want it to be fairly big, which means you are buying lots of RAM to start with (hence the comment in the manual and my noobie guide to max out your RAM before adding an L2ARC).

With all that in mind, for almost 100% of users the rule of thumb is: if you don't have 64GB of RAM, you shouldn't be using an L2ARC at all, and an L2ARC smaller than about 200GB usually isn't worth having. Obviously, for most people a small L2ARC like 32GB or 64GB doesn't make sense. So the people that buy 8GB of RAM and think they can get more speed by adding a 128GB L2ARC have no clue what they are doing. Some of them have lost their data because of it starving ZFS, and I don't feel bad for them at all. In short, unless you plan to go big, an L2ARC isn't a solution.
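
To put rough numbers on that (these are illustrative figures of my own, not measurements from any particular box):

# index cost: a 200 GB L2ARC holding ~64 KB records at ~380 bytes of header RAM each
echo $(( 200 * 1024 * 1024 / 64 * 380 / 1024 / 1024 )) MB   # prints ~1187 MB of ARC eaten by the index
# the 5:1 rule of thumb: 64 GB of RAM with ~80% of it as ARC caps the useful L2ARC around
echo $(( 64 * 80 / 100 * 5 )) GB                            # prints ~255 GB

Smaller average record sizes push that index cost up fast, which is exactly the "varies" problem described above.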

Hopefully now you see why L2ARCs aren't a solution for someone that doesn't want to buy enough RAM. You have to buy more than enough RAM so you can take advantage of L2ARCs.

So you have to buy more RAM than you might otherwise need in order to store the L2ARC index, and assuming your L2ARC performs acceptably it can result in a net gain. There is no guarantee, though; it depends on how your server is used.

A ZIL is an "all-write" device. Unless your system crashes during a write, your ZIL will never be read from. Reads from the ZIL only occur on bootup after an unclean shutdown, and that is to commit any data in the ZIL to the pool. Otherwise the ZIL is nothing more than a copy of the stuff that "needs to be written to the pool" but is already cached in RAM. So a ZIL does nothing to change how much RAM you need. Everything in your ZIL should always also be in RAM, the exception being booting up a system after an unplanned shutdown.

So, after all that stuff I've tried to simplify down, there are 2 things to take away from this:

1. A ZIL will not change your RAM requirements.
2. An L2ARC will increase your RAM requirements, but at the potential benefit of an explosion in random read performance.
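
If it helps to see it concretely, here is a minimal sketch of what points 1 and 2 look like at the command line, assuming a pool named "tank" and GPT labels I made up for this example (on FreeNAS you'd normally do this through the GUI, but the underlying commands are the same):

# hypothetical pool and labels; adjust to your own layout
zpool add tank log mirror gpt/slog0 gpt/slog1     # mirrored SLOG (the "ZIL drive")
zpool add tank cache gpt/l2arc0 gpt/l2arc1        # cache devices are always striped, never mirrored
zpool status tank                                 # confirm the new log and cache vdevs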
 

KTrain

Dabbler
Joined
Dec 29, 2013
Messages
36
Hmmm, more good information regarding this SSD thing I keep reading about. I still don't feel like I get it, but things are starting to make more sense.

Thanks again cyberjock.
 

gzartman

Contributor
Joined
Nov 1, 2013
Messages
105
Hello, I've recently been contemplating an SSD setup for L2ARC and possibly ZIL. IIRC, it is recommended that ZIL should have SSDs in RAID1 so you don't lose the log. I've read about people partitioning a pair of SSDs with one set of mirrored and one set of striped partitions for ZIL and L2ARC, respectively. It sounds enticing.

My questions then: is RAID0 safe for an L2ARC SSD? Do SSDs handling L2ARC or ZIL reduce the need for more RAM?

Thanks in advance!


I've done a TON of research on this and asked a wide variety of people these types of questions. Interestingly, the more people you ask the more the answers vary. No one seems to agree on this topic (just my opinion).

The one bit of consensus I could find is that just about any pool can benefit from ZIL. Therefore, that's what I did. I added an entire Samsung 840 pro to ZIL. If you are worried about the SSD failing, add a mirrored vdev. Some folks will tell you to partition the SSD and give only 8-10G to ZIL. IMO, it's better to give the ZIL more so that you don't wear out the one spot on your SSD you give to ZIL (remember SSDs wear out). If you give it more, then the drive should move the used areas around a bit so you don't wear the drive out so quickly.

Don't partition the drive and use part of it for ZIL and part of it for L2ARC. You'd be defeating the purpose of the SSD, which is to speed things up. If you're going to use one, don't cut its bandwidth in half (or whatever) by splitting it up.

L2ARC seems to be up in the air a bit and based on your needs. As CyberJock points out, you do lose some DRAM to support the L2ARC. In my situation, I'm going to continue to test and monitor my setup and make the decision later.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
The one bit of consensus I could find is that just about any pool can benefit from ZIL. Therefore, that's what I did. I added an entire Samsung 840 pro to ZIL. If you are worried about the SSD failing, add a mirrored vdev. Some folks will tell you to partition the SSD and give only 8-10G to ZIL. IMO, it's better to give the ZIL more so that you don't wear out the one spot on your SSD you give to ZIL (remember SSDs wear out). If you give it more, then the drive should move the used areas around a bit so you don't wear the drive out so quickly.

That red area is totally wrong. Anyone who knows anything about how SSDs work knows that SSDs reallocate their data all over the place, abstracting the entire drive. That's how wear leveling has worked since it existed. You are better off doing a small partition so the SSD will use the unpartitioned space for allocations, thereby increasing performance AND longevity. Sorry, but you are wrong. This is one of many situations with ZFS where it is important to know how deep the rabbit hole goes and be willing to dig down as deep as you can.
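
For what it's worth, the small-partition approach is easy to sketch with gpart on FreeBSD. This is just an illustration with a made-up device name, partition size, and pool name, not a recommendation for any particular drive:

# hypothetical device ada3, assumed blank; "tank" is a placeholder pool name
gpart create -s gpt ada3
gpart add -t freebsd-zfs -s 16G -l slog0 ada3   # small SLOG partition
# deliberately leave the rest of the drive unpartitioned as extra spare area
zpool add tank log gpt/slog0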

You are right, there's no consensus, because to truly answer a question with a solid "yes" or "no" you have to know a lot about your server: your current hardware, your current configuration, your use case, etc. It's not something that can ever be handed out as a solid answer because it varies widely. It's no different than asking what kind of gasoline goes in your car. Many people use unleaded, some must use higher octane, still others are diesel, and still others must use leaded gas. So you have to know for your situation. To get all this stuff right your options are (1) learn all about this stuff for yourself and do it yourself or (2) pay someone to do it for you.
 

gzartman

Contributor
Joined
Nov 1, 2013
Messages
105
That red area is totally wrong. Anyone who knows anything about how SSDs work knows that SSDs reallocate their data all over the place, abstracting the entire drive. That's how wear leveling has worked since it existed. You are better off doing a small partition so the SSD will use the unpartitioned space for allocations, thereby increasing performance AND longevity. Sorry, but you are wrong. This is one of many situations with ZFS where it is important to know how deep the rabbit hole goes and be willing to dig down as deep as you can.

Point taken. I just had this exact conversation with someone else who said just the opposite of this, which is why I yanked apart my previous ZIL config and threw an entire SSD at it. I'll look at this further.

...or (2) pay someone to do it for you.

Yes, and then when it breaks and they aren't around, you are screwed. Better to either just buy a big expensive black box full of disks that comes with a nice support agreement, or learn how to do it yourself so you can fix it when it breaks.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Yes, and then when it breaks and they aren't around, you are screwed. Better to either just buy a big expensive black box full of disks that comes with a nice support agreement, or learn how to do it yourself so you can fix it when it breaks.

Not necessarily. I've had one or two people that have wanted to build a system and paid me for an hour or two of my time. They PayPal me some money, then we talk on the phone or Skype and they tell me about all the stuff that I ask, then I give them a recommended build. They order the parts and assemble it themselves, then they call me up and I do some of the initial setup. I make sure to set up the VERY important stuff for them, as I don't want them losing data without having warning that things are wrong, etc. At this point you don't have to worry about much else, as you've got the 2 biggest problems for most users solved: right-sizing the server/buying components that are compatible and will provide the performance you need, and the initial configuration of FreeNAS for its intended function. Takes about 3-4 hours of my time beginning to end, and I explain in some detail the decisions I make and why I make them. And if something happens in a year and they come back asking me to help rebuild their ZIL because it failed, it'll literally take me 10 minutes to help them. Why wouldn't I, when I just spent a few minutes writing this out for a complete stranger?

Not too many people actually come to me asking for help. And to be honest, I bet 5-10% of the users around here would save massive amounts of time (in research, and STILL getting the wrong components) and money (since they had to buy 2 or 3 extra components because they got it wrong). And still, there are some users that could save more than $2000, as they buy a whole server, get the WHOLE thing wrong, then still have to buy another one!
 

gzartman

Contributor
Joined
Nov 1, 2013
Messages
105
Nothing wrong with that, nothing at all. I'm sure you are right that many people could save a lot by getting some help up front so they are at least pointed in the right direction.

I've just had a bad experience going too far down that path with my business. I relied too much on one person, who set it all up. Yes, you do save a LOT of money over Windows gear and licenses when you go the GPL (or similar) software route for your fileserver/email/web/pbx, etc. But what happens when it breaks and YOU (the pro) aren't around? You can't call any of the local IT companies because none of them will touch a *nix box. They will just say they need to replace everything with Windows gear -- I know because I've tried.

I'm still a 100% GPL fan, but I want to know how everything works and be able to fix much of it myself if I have to. This is how I'm building my secondary location rack. I may not be able to fix all of it (e.g., if I lost my zpool, I'd need help from someone like you to recover it). However, I can fix a lot of it AND I know how it all fits together because I built most of it.

My rack at my main office is another story. If that thing breaks, I'm down for the count unless my *nix guy is available. Sometimes he is and sometimes he isn't. One time, we were down over a week because I lost my primary RAID volume on my Promise Vtrak box and part of that volume wasn't backed up properly (due to desktop drives in the array -- HA!). He was around, but he has a day job, so he could only help so much. Needless to say, my business partners don't trust that rack as far as I could toss it. I'm slowly working to replace all of it with ZFS and Joyent KVM virtualization. I'll know how it all works and I'll know how to troubleshoot it and fix most of it.

Just saying: Unless you are a total bonehead, there is real value in learning this stuff and not just blindly putting it in -- unless you want to hire someone like you full time to manage it. ;)
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Just saying: Unless you are a total bonehead, there is real value in learning this stuff and not just blindly putting it in -- unless you want to hire someone like you full time to manage it. ;)

Truer words could never be spoken. I am currently unemployed ;)
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
You're employed, sir. You are the official Forum Guard Dog/Admin of FreeNAS.
 

gzartman

Contributor
Joined
Nov 1, 2013
Messages
105
That red area is totally wrong. Anyone who knows anything about how SSDs work knows that SSDs reallocate their data all over the place, abstracting the entire drive. That's how wear leveling has worked since it existed. You are better off doing a small partition so the SSD will use the unpartitioned space for allocations, thereby increasing performance AND longevity. Sorry, but you are wrong.

This is one of many situations with ZFS where it is important to know how deep the rabbit hole goes and be willing to dig down as deep as you can.

Just a followup. I talked with an illumos ZFS developer tonight and SSD wear is not abstracted. If it were abstracted, commands like dd would make a mess of things (not my words).

It is better to give the ZIL much more than ZFS needs in order to extend the life of the SSD. ZFS will only use about 30 seconds' worth of write cache no matter how big you make the ZIL vdev.

This is how the illumos ZFS code works. Maybe the ZFS code in FreeBSD is different, I don't know.
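
For rough scale (my numbers, purely illustrative): if the sync writes arrive over a single gigabit link at roughly 115 MB/s, even 30 seconds' worth is only a few GB, which is why the small-partition advice keeps coming up.

# back-of-envelope only: one gigabit link of sync writes for 30 seconds
echo $(( 115 * 30 )) MB   # prints ~3450 MB, so only a few GB of a big SLOG is ever actually in use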

Sent from my XT1060 using Tapatalk
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Reality check bro... OpenSolaris/Illumos is NOT FreeBSD. So his info is possibly accurate for OpenSolaris/Illumos, but NOT for FreeBSD. Quite frankly, we're tired of people running to Sun, OpenSolaris/Illumos, and Linux on ZFS for answers. Some of those OSes require you to use whole, unpartitioned disks for performance. There's a reason we do NOT follow that advice in FreeBSD: here it doesn't matter! In OpenSolaris you would be absolutely stupid to partition out ZFS because the caching system only works on unpartitioned disks, last time I checked.

Another reality check: programmers are not great at hardware implementation. They know their software inside and out, but they aren't exactly well known for hardware implementation.

It's like this. The only thing that is for certain is that a bigger ZIL doesn't help performance. So either a smaller ZIL doesn't break anything, or a smaller ZIL doesn't help anything. So does it even matter who is right or wrong?

You want more fun? Some SSDs do even weirder crap than just the stuff I'm talking about. Feel free to read up on how garbage collection works on your own SSD and make your own decision.

If you read Intel's own papers on their SSDs, they talk about under-partitioning drives to increase drive lifespan regardless of your OS and regardless of the file system on your SSD. I don't remember the exact number, but leaving just 2% of your drive unpartitioned increases lifespan by decades. So please tell me that your Illumos guy knows more than the people that invented the darn things! I've deliberately under-partitioned every SSD I own by 2-5GB to take advantage of that extra lifespan, and I've been doing it since 2009 when Intel published their paper on extending SSD lifespans. Not surprisingly, my drives ALL say 97% or higher despite being in systems that have been up 24x7 for more than 3 years straight. And I don't do anything to minimize writes on them. In fact, I just replaced my laptop's 160GB SSD with a 240GB because it's too small, and it says 99% lifespan remaining! And that laptop has been on probably 80% of the time since I bought it in February 2010. In fact, just a few weeks ago I was talking to someone in IRC about my drive's estimated life remaining, and the Intel SSD Toolbox estimates drive failure in 170 years or something completely absurd!
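
If anyone wants to check their own drives, smartctl will show the wear attributes. A quick sketch with a made-up device name; keep in mind the attribute names vary by vendor (Intel drives report Media_Wearout_Indicator, others call it something else):

# list SMART attributes and pick out the wear/life related ones
smartctl -A /dev/ada0 | grep -i -e wear -e life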

Edit: Here's a short presentation on SSDs. In particular, it discusses how wear-leveling algorithms abstract the LBA-to-PBA mapping, resulting in it being difficult to forensically analyze disks.

I'm trying desperately to find that original article from a few years ago talking about leaving disk space unpartitioned. I'm 99% sure it was an Intel powerpoint presentation...
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Here are some cool discussions on the topic. Keep in mind that FreeNAS has no TRIM support as far as I know.

Here's Intel's discussion! I found it because I just wanted to prove my point! http://www.matrix44.net/cms/wp-content/uploads/2011/07/intel_over_provisioning.pdf (I've attached the PDF because it appears to be hard to find, as it's been removed from Intel's cache servers).

The important stuff:

Increasing the spare area allocation of an SSD will result in performance and endurance gains. Increasing this area, also known as “over-provisioning”, is similar to the concept HDD users term “short stroking” the drive. Enlarging the spare area increases the available “ready to be written” resource pool which decreases write amplification. Since there is less background data movement required, performance and endurance increases.

Once the SSD is in a clean state, reduce the usable capacity (which increases the spare area) using one of two methods:
• Issue the SET MAX ADDRESS command (part of the ATA specification) to set the maximum address the operating system (OS) can see. HDPARM* and HDAT2* are third-party industry tools that can be used to issue this command.
• Define a partition that is less than the maximum available capacity of the SSD. This option can be found in the OS drive configuration tools.
Both SET MAX ADDRESS and partitioning will reduce the user addressable space and allow the SSD to use the remaining space as part of the “ready to be written” resource pool.

AFAIK every Sandforce controller has used the partition boundaries for over-provisioning drives, and the default is enabled. Whether your drive manufacturer chose to disable the feature is a different story (and they don't generally talk about what they do). Some unscrupulous companies *cough* OCZ *cough* are known to enable every feature that gives them maximum performance (even stuff that Sandforce tells them not to do because it can cause data loss) because their whole end-game was to have the fastest SSD around. Not surprisingly, they also had the fastest-failing SSDs around.

The problem I've always had with Sandforce is that they offer up tons of "tunables", but there's no way for the end-user to objectively look at the settings and see whether CompanyA is choosing conservative settings to protect data while CompanyB is choosing absurdly dangerous settings. I've always felt that this is one way OCZ was able to fool the market for so long. If they are selling the same controller as other brands, then the uninformed masses will look at OCZ and BrandA and buy OCZ because it's cheaper. Add to that the fact that OCZ was well known to use lower-endurance memory, and things get even worse for the uninformed masses. Keep in mind that the vast majority of OCZ failures were NOT related to worn-out memory but were due to firmware issues (some studies put the share of drives that failed from being worn out at less than 2%!). If OCZ had kept their firmware problems from being problems, it's quite possible they'd be doing very well financially right now. Instead they are defunct.

So I stand by what I said. And asking an illumos guy for help is pointless, as this isn't a software problem. It's a hardware problem that is dealt with entirely on the SSD controller. This isn't some secret sauce that only applies to ZFS ZILs. To the ZFS programmer, the abstraction of the LBA-to-PBA layer is invisible and unknowable. You, as the end-user, are not capable of running any kind of tool to obtain any information on the LBA-to-PBA translation. It is also one of several reasons why "defragging" SSDs is pointless: the LBA-to-PBA layer hides your fragmentation (or lack thereof), and you are incapable of knowing the difference! Defragmenting your SSD may correct LBA fragmentation, but that PBA translation completely buggers up any chance of you actually "defragmenting" files. In fact, there's talk of a new file system that is better designed for handling fragmentation at the LBA level, as all file systems that currently exist are somewhat inefficient with fragmented files, but it is obvious that the solution to the problem is not to "defragment" your files, because that will unnecessarily wear out your SSD. Cool huh?
 

Attachments

  • intel_over_provisioning.pdf
    190.8 KB

gzartman

Contributor
Joined
Nov 1, 2013
Messages
105
Reality check bro... OpenSolaris/Illumos is NOT FreeBSD. So his info is possibly accurate for OpenSolaris/Illumos, but NOT for FreeBSD. Quite frankly, we're tired of people running to Sun, OpenSolaris/Illumos, and Linux on ZFS for answers. Some of those OSes require you to use whole, unpartitioned disks for performance. There's a reason we do NOT follow that advice in FreeBSD: here it doesn't matter! In OpenSolaris you would be absolutely stupid to partition out ZFS because the caching system only works on unpartitioned disks, last time I checked.

I don't need a reality check. I know the difference between FreeBSD and Solaris/Illumos. I also know that most of the ZFS coding happens in illumos and FreeBSD essentially ports it over. They write the majority of the code, so it makes the most sense to ask them the hard questions.

Quite frankly, there seems to be a consistent theme here of disagreeing with the facts the illumos guys state, even when they provide proof that they are correct. I personally put a lot of stock in their advice given the vast VPS, cloud storage, and managed systems services they run. They talk the talk and walk it daily.

While they are doing all of that over there, most of us over here are trying to build a low-cost storage bank for our video games, movie libraries, and the wife's photos and music library with desktop-grade hardware. How many actual ZFS developers for FreeBSD do we have here?

ZFS caching most definitely works with partitions on illumos. PM me and I'll give you creds to see just this on my zpool.

Another reality check: programmers are not great at hardware implementation. They know their software inside and out, but they aren't exactly well known for hardware implementation.

With respect to ZFS, says who??? All the ZFS devs I've met are quite good at it.

I'd like to see some solid technical paper or data proving your point, beyond just ranting and saying "I'm right and you are wrong".
 

Dusan

Guru
Joined
Jan 29, 2013
Messages
1,165

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
OK, but someone a few months back on 9.1.0 was talking about how TRIM didn't work. He provided code and some command that I ended up googling. I'll have to see if I can find that thread now. It definitely got me interested. One thing I'm confused about is TRIM on the ZIL. Somewhere someone had talked about TRIM on the ZIL and said that there was none. That is, the code excluded the ZIL for whatever reason. Being that I'm not a programmer, I can't read the code and figure it out for myself. I do have the book though, and I have started on C, but I'm a long way from reading code fluently.
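
If you want to poke at the TRIM status yourself, the counters and tunables show up in sysctl on the FreeBSD-based builds I've looked at (a quick sketch; the exact names may differ between FreeNAS versions):

# dump anything TRIM-related the kernel exposes
sysctl -a | grep -i trim
# on the builds I've seen, the interesting ones are the kstat.zfs.misc.zio_trim.*
# counters and the vfs.zfs.trim.* tunables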
 

no_connection

Patron
Joined
Dec 15, 2013
Messages
480
Is this info about under-sizing a partition available in the manual?
A small high-performance NAS with SSDs would be sweet. The cost, though...
It is new to me, so thanks for the info! Might save me a drive or two when I finally get an SSD.
 

daimi

Dabbler
Joined
Nov 30, 2013
Messages
26
How to "over-provisioning" the SSD for ZIL? Please give sample on the commands that I should type in SSH shell to the FreeNAS.

Any suggestion whether Intel SSD S3500 (6Gb/s) or 320 series (3Gb/s) should be used for ZIL. Both SSDs support "Enhanced Power Loss Data Protection".

Or I simply buy an UPS, in order for me to buy any SSD without "Enhanced Power Loss Data Protection" feature?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
You should have a UPS as well as an SSD with enhanced power loss data protection.
 

Enchanted14

Dabbler
Joined
Oct 4, 2014
Messages
13
This thread is a bit old, but I could not resist commenting. I have been working with solid state drives since the Gen2 devices entered the scene back in 2008. I am very pleased to see that ZFS now natively supports TRIM, as I have been reluctant to format an SSD with ZFS out of concern that TRIM might not be supported, so that is great news! Thank you Dusan!

A comment about over-provisioning: Gen3 and later drives have unallocated space for over-provisioning built in from the manufacturer, typically 10 percent, which conventional wisdom holds is an appropriate amount for the wear-leveling algorithms to perform their functions. So manually setting aside space as the end user is not necessary. Just plug that SSD in, configure it for your intended use, and let her rip!

The discussion concerning partitioning of an SSD used as a ZIL is typical of discussions about solid state drives. Given that ZFS does support TRIM and current generations of solid state drives have built-in over-provisioning, the partitioning argument loses merit. With TRIM active and 10 percent over-provisioning by default, a current-generation SSD used as a non-partitioned ZIL should work flawlessly. Thus I am going to do just that: I will be ordering a 60GB SSD today and will provision the entire drive as a ZIL on my FreeNAS box immediately upon getting the SSD in hand. That will provide a safety valve for my system in case my UPS fails to function as intended, and should save the day, as it were!
 