Understanding how to scale out RAIDz

Status
Not open for further replies.

dwchan69

Contributor
Joined
Nov 20, 2013
Messages
141
I am trying to understand FreeNAS RAID behavior and limitations. Is it possible to add a drive or change your RAID configuration without having your data destroyed in the process? For example:

Going from a single drive, to a mirror, to RAIDZ1
or
What if you start a RAIDZ1 with 3 drives and then add another drive afterward? Does it redistribute the data evenly across the now 4 drives?

Thanks
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Have you checked out the stickies? Namely the noobie guide sticky?
 

david kennedy

Explorer
Joined
Dec 19, 2013
Messages
98
I am trying to understand FreeNAS RAID behavior and limitations. Is it possible to add a drive or change your RAID configuration without having your data destroyed in the process? For example:

Going from a single drive, to a mirror, to RAIDZ1
or
What if you start a RAIDZ1 with 3 drives and then add another drive afterward? Does it redistribute the data evenly across the now 4 drives?

Thanks



Short answer: NO, it does not work this way.

Pools are made out of vdevs. You can expand a pool by adding vdevs, but NOT individual drives.

DO NOT MIX REDUNDANCY LEVELS within a pool.

A common problem is adding a single drive to a RAIDZ pool. If that drive fails, the pool is gone.
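To make that concrete, here is a rough command-level sketch of the supported way to grow a pool versus the single-drive mistake; the pool and disk names (tank, da4-da7) are made up for illustration:

    # Supported: grow the pool by adding a whole new RAIDZ1 vdev (three more disks).
    zpool add tank raidz1 da4 da5 da6

    # The mistake: adding a bare disk stripes it with the existing RAIDZ vdev.
    # zpool warns about the mismatched replication level and needs -f to do it.
    # zpool add -f tank da7    <- don't; if da7 dies, the whole pool is gone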
 

dwchan69

Contributor
Joined
Nov 20, 2013
Messages
141
Sorry for my wrong usage of the terminology here. It sounds like I can't just convert a pool that is currently a mirror over to RAIDZ1 without destroying the data. So I may be better off starting out with RAIDZ1, since that gives me the option to add additional vdevs later without losing data. However, what do you mean by "DO NOT MIX REDUNDANCY LEVELS within a pool"? In addition, why would a single drive failure within a RAIDZ pool destroy the entire pool? Doesn't RAIDZ work like RAID5 and survive a single drive failure? Thanks again
 

david kennedy

Explorer
Joined
Dec 19, 2013
Messages
98
Sorry for my wrong usage of the terminology here. It sounds like I can't just convert a pool that is currently a mirror over to RAIDZ1 without destroying the data. So I may be better off starting out with RAIDZ1, since that gives me the option to add additional vdevs later without losing data. However, what do you mean by "DO NOT MIX REDUNDANCY LEVELS within a pool"? In addition, why would a single drive failure within a RAIDZ pool destroy the entire pool? Doesn't RAIDZ work like RAID5 and survive a single drive failure? Thanks again


There are numerous posts here from people who have created a RAIDZ pool and then added a single drive to the pool. This does not expand the RAIDZ vdev; it creates a stripe between the RAIDZ vdev and the new drive,
so you get something like (d1, d2, d3, d4, d5) / d6.

So the first set (d1-d5) might be RAIDZ, but it's striped with d6. Since it is a stripe, if that single drive dies, the pool is gone.
ZFS doesn't allow you to remove disks from pools, so if you do this you need to destroy the pool and recreate it to fix it.

What I mean by not mixing redundancy is twofold: don't stripe your pool with a single drive, and don't create a RAIDZ(1/2) pool and then add mirrors to it. Try to create pools of vdevs with the same redundancy level.

I currently have a RAIDZ2 pool and a RAIDZ1 pool; many think RAIDZ1 is "dead", so you might want to read up on this and determine what is suitable for your needs.
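If you suspect a pool has already ended up in that state, zpool status shows the vdev layout; a rough sketch with made-up names:

    # Inspect the vdev layout (pool and device names are purely illustrative):
    zpool status tank
    # A clean RAIDZ1 pool shows a single raidz1-0 vdev containing d1-d5.
    # The broken layout above instead shows raidz1-0 (d1-d5) plus a lone disk d6
    # at the same level - that lone disk is a single, unprotected top-level vdev.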
 
Joined
Jul 13, 2013
Messages
286
For a small server (where disk costs aren't a big issue) I find mirrors much more flexible than RAIDZ. For a large server (where disk costs dominate), I can't afford 100% redundancy and have to use RAIDZ[123]. I'm going to ramble a bit about how I see the tradeoffs over the life of a server, in hopes that helps some people fill in their mental maps with more detail.

I find that cheap servers are limited by drive bay space (especially if you want hot-swap) plus motherboard controller provision (6 controllers on the two motherboards I've used for ZFS stuff). (One of my servers has a second controller card, so it can handle 14 drives.) (I haven't yet learned about backplane extenders and various ways of getting more drives off of advanced controllers, and should some day; but so far paying for 6 current-tech drives is pushing my limits).

Anyway, let me explain the flexibility that using mirrors gives me, for a small household server (currently running 3 mirror vdevs plus a hot spare disk, 3.63T; this is the same server, with different disks, that I started with in 2006, and it's a Solaris box, not FreeNAS).

When you plan your zpool, if you intend to keep it running a long time (as opposed to replacing it and copying the data over somewhere else), some advance planning helps a lot.

The basic limitation that constrains this is that you can NOT ever take a vdev out of a zpool. Once you've created a vdev in your zpool, you're stuck with that vdev forever.

You can add a new vdev at any time. However, once you've done so, you're stuck with it forever.

Mirrors are very flexible; you can attach an additional drive to a mirror at any time, and that new drive will get the data copied to it. When the copy completes, you then have a three-way mirror. You can also detach a drive from a mirror at any time. This reduces your redundancy, so you don't want to do it casually, but it's possible.
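In ZFS command terms that looks roughly like this (pool and disk names are illustrative):

    # Attach a third disk to an existing two-way mirror; once the resilver
    # completes you have a three-way mirror.
    zpool attach tank da0 da2    # da0 is an existing member, da2 is the new disk

    # Later, drop back to a two-way mirror by detaching one member.
    zpool detach tank da1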

With a mirror vdev, you can increase its size without ever reducing the redundancy below the starting value. If you start out with a two-way mirror, here's how you increase it later:

  1. Physically install a new, larger, drive.
  2. Attach that drive to the mirror vdev, and wait for it to "resilver". You now have a three-way mirror.
  3. Detach one of the original, small drives from the vdev. You now have a two-way mirror again.
  4. Physically remove the detached original drive.
  5. Physically install a second new, larger, drive.
  6. Attach that drive to the mirror vdev, and wait for it to resilver. You now have a three-way mirror.
  7. Detach the one remaining original, small, drive. You now have a two-way mirror again. And, if the auto-expand property is on, it's bigger (the size of the new big drives).
  8. Physically remove the detached original drive.
Note that, to do this, you need to be able to add one additional drive to your server temporarily. So you can't do this if all of your drive bays are already full.
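As a rough command sketch of steps 1-8 (disk names are made up; old1/old2 are the original small disks, big1/big2 the new large ones):

    zpool set autoexpand=on tank    # let the vdev grow once all members are large
    zpool attach tank old1 big1     # steps 1-2: three-way mirror, wait for resilver
    zpool detach tank old1          # steps 3-4: back to a two-way mirror
    zpool attach tank old2 big2     # steps 5-6: three-way mirror again, resilver
    zpool detach tank old2          # steps 7-8: two-way mirror of the two big disks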

RAIDZ[123] vdevs will also auto-expand if you replace all their disks (one at a time, waiting for them to resilver) with larger disks. The difference, though, is that you can't temporarily increase the redundancy on a RAIDZ vdev. You can replace one disk with a bigger disk, wait for it to resilver, and then repeat -- but during that resilver (and a resilver can take a LONG time with 4TB disk drives!) the redundancy is down by one. And, of course, the load of the resilver is in addition to any other load, so the remaining disks are being worked harder than usual, and hence are more likely than usual to fail, just when the redundancy is reduced. So it's considerably more risky.
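The RAIDZ equivalent is a disk-at-a-time replace, again with illustrative names; redundancy is reduced while each resilver runs:

    zpool set autoexpand=on tank
    # Repeat for each member disk, waiting for the resilver to finish each time:
    zpool replace tank old_disk1 new_disk1
    zpool status tank    # watch the resilver; redundancy is down by one until it completes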

I've built one mirror-based zpool and one RAIDZ-based zpool, and I think each was the right choice for the budget and storage needs I built them for. I hope my rambling on at length has helped somebody understand the constraints and tradeoffs in these choices over the lifespan of their server! And that those of you who find it uninteresting have stopped reading before now!

(As an example of the ridiculous extremes mirrors can be pushed to, somebody at Sun once built a 48-way mirror on one of their X4500 "Thumper" fileserver boxes. Worked fine.)
 

david kennedy

Explorer
Joined
Dec 19, 2013
Messages
98
For what it is worth, I've got an X4500 in my basement.
It is impressive that such an old piece of equipment takes new 3TB drives without any issues.

It's nice, as everything in it is 'clean' and contained in a single unit with all those disks: no cables, expanders, or such.
Just put the disk into the caddy and slide it in.
 
Joined
Jul 13, 2013
Messages
286
For what it is worth, I've got an X4500 in my basement.
It is impressive that such an old piece of equipment takes new 3TB drives without any issues.

It was designed at Kealia, a startup aimed at building a large streaming video server (80,000 independent high-definition video streams). They also designed compute servers and that fileserver, since they didn't like anything on the market enough. Sun bought them and took the servers into their product line, and even tried to make a go of the video server, though it never sold into the markets it was designed for (it was the "Sun Streaming Video Server" when Sun had it).

Ah, this looks like the press release on Sun's acquisition of Kealia.

(I worked in that part of Sun from 2005 to 2008.)
 

dwchan69

Contributor
Joined
Nov 20, 2013
Messages
141
I am using a Dell MD1220, which is a JBOD with 24 2.5" SAS/SATA drive bays. I am using an LSI 9207-8e as my SAS controller. The current plan (given I have some of the drives already) is to use the following drives in the JBOD.

The plan is to run FreeNAS off an 8GB USB flash drive; the server has 32GB of memory with dual-socket, 4-core CPUs.

1 - OCZ Deneva 2 R Series 400GB eMLC SATA SSD (for my write cache)
1 - Crucial M500 480GB MLC SATA SSD (for my read cache) - yet to purchase
17 - 15K SAS 146GB disks (these will be used for my server workload)
2 to 4 - HGST 7.2K 1TB SATA disks (these will be in a separate pool for tier 2-3, cold data) - yet to purchase

Plus I have like 6 3.5" SATA drives on the server head.

I was going to RAIDZ the 15K SAS drives, perhaps with a hot spare (not sure if FreeNAS does hot spares or not), and do something similar with the 7.2K drives. The key here is having the ability to expand my pool capacity without blowing away the entire pool, as I would lose data. Any feedback or thoughts would be appreciated.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
First, you want 2 disks for a ZIL. Second, OCZ is probably the worst possible brand you could buy for an item that has a high necessity of working without a problem.

Third, L2ARC is going to hurt performance unless you get MUCH more RAM. For 480GB, you are looking at 96GB of RAM minimum before you should use an L2ARC of that size. Of course, you will probably do what most people do: add it anyway, then be astonished when things go horribly wrong. Your L2ARC should never be more than 5x your ARC size, and you'd be lucky to even have a 25GB ARC on that system. There are other rules of thumb for why you need 64GB of RAM minimum. I'm really not sure what you are trying to accomplish, but just throwing hardware at it because you can is a good way to lose data.
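For what it's worth, the arithmetic behind that warning, plus a quick way to see your actual ARC size (the sysctl name assumes a FreeBSD-based FreeNAS box):

    # Thumb rule from the post above: L2ARC should be no more than ~5x your ARC.
    #   480 GB L2ARC / 5 = 96 GB of ARC (i.e. RAM) as a practical floor.
    # Current ARC size, in bytes:
    sysctl kstat.zfs.misc.arcstats.size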
 

dwchan69

Contributor
Joined
Nov 20, 2013
Messages
141
So for a system that has 32GB of memory, what is a good recommended size for my L2ARC? Based on mainstream market options and price, would 128GB be a better choice? Is it possible to install a 256GB SSD and partition out only half of it, so there is room to grow if I decide to increase my system memory from 32 to 64GB?
If you had a choice between a Samsung 840 EVO, Crucial M500, or SanDisk Ultra Plus, which one would you get?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
A good size is 0GB. History has shown that using an L2ARC with less than 64GB of RAM works against you. See, the L2ARC uses ARC space for its index. So if your ARC isn't big enough to handle its new load (the L2ARC index), then things go slower. This is a common mistake, and it's why people that want very large servers like yours are better off doing consultation work with me or buying a TrueNAS from iXsystems, where they will make sure you won't make mistakes like the one you are making. ;)

My guess is your motherboard maxes out at 32GB of RAM so you are looking at replacing your RAM, CPU, and motherboard to go past 32GB of RAM. Quite an expensive mistake if you bought that hardware *just* for FreeNAS.
 

dwchan69

Contributor
Joined
Nov 20, 2013
Messages
141
The server I have is a bit older, but it has the ability to grow. The motherboard maxes out at 128GB, so an upgrade is possible ;) I understand that it is always better to tap the ARC first, but I can't imagine there's no benefit from an L2ARC unless your physical memory is 64GB or above. Is it possible to install a 256GB SSD and partition out only half of it, so there is room to grow if I decide to increase my system memory from 32 to 64GB?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
You can try to carve out your partition, but that's not supported on FreeNAS. As for the RAM thing, this is precisely why the manual says "maximum motherboard RAM" before adding L2ARC. L2ARC is not a joke and it doesn't give guaranteed performance improvements. This isn't like Windows where the solution is to throw more/faster hardware at it. People make this mistake almost daily here, and it's laughable to wonder how much of the IT market these people are propping up with purchases that end up collecting dust on a shelf because they didn't do their homework.

Like I said, 64GB of RAM should be your minimum before considering an L2ARC of any size. There have probably been 100 users in the last year that have been examples of this. And you know, every single one tries to argue with me, tries to dismiss me as if I don't have a clue, etc. People make the same mistake with ZILs too. We've had people lose their pools over very lame mistakes I've tried to warn them of. But I'm sure you'll do whatever you want anyway, and a forum post is not a good avenue to go into extreme detail on this subject. So you can take my word for it, do a few months' worth of homework, or just do it and hope it doesn't backfire.

Just keep in mind that if you show up later with complaints, we're going to publicly flog you, and probably not answer your question, if you start doing things you shouldn't do. Doing things you shouldn't do always causes other problems you probably aren't aware of, and the next thing you know it's an avalanche of problems you were unprepared to understand or deal with.
 

dwchan69

Contributor
Joined
Nov 20, 2013
Messages
141
I totally understand the risk, and this is why I am reaching out to the forum for guidance, but the final decision and risk are on me ;)
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
And we're here to offer that guidance.

My suggestion is that for the cost of that 400GB OCZ SSD you could buy a huge amount of RAM, which will net you more performance in probably 90% of use cases.

You did mention putting "server workload" on here as well - are they VMs? What protocol are you going to use (NFS vs iSCSI) and are you aware of the significant performance loss when you start looking at sync writes?
 

dwchan69

Contributor
Joined
Nov 20, 2013
Messages
141
The OCZ SSD was more or less a freebie, so no harm there. The plan is to leverage FreeNAS with 2 main pools:

1 - Pool with SSD and 15K SAS for VM (infrastructure and some desktops) load
2 - Pool with 7.2K SATA as mass file share volumes

I have not gotten to the protocol level yet, but from a network infrastructure perspective, I can either LAG multiple GbE links, go with 10GbE copper, or start with LAGged GbE and migrate to 10GbE at a later time, as I have multiple GbE NICs and 2 10GbE ports on the FreeNAS box already. What is the latest verdict with FreeNAS around NFS vs iSCSI?
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
There are a lot of interdependent things in the architecture there, but I'll run down some quick highlights since I've got to head out for a bit:

- Don't LAGG iSCSI, use MPIO.
- You've already got 10GbE; might as well use it.
- VMs + Parity RAID = Nightmare. Use mirror vdevs for ESXi VMs if you want responsive servers.
- NFS or iSCSI are both fine; what will kill you is the sync writes (standard on NFS, needs to be set manually on iSCSI) unless you have an SLOG - put the SSD there instead of as L2ARC (see the sketch below).
- More RAM. 32GB sounds like a lot until you start putting a bunch of VMs on it. If you can go to 64GB, do it; then you might consider L2ARC down the road.
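A rough sketch of the SLOG/sync point above; the pool, zvol, and device names are made up for illustration:

    # Use the SSD as a dedicated SLOG device rather than L2ARC.
    zpool add tank log da8

    # NFS datastores already issue sync writes; for iSCSI, force them on the zvol:
    zfs set sync=always tank/vmware-zvol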
 

dwchan69

Contributor
Joined
Nov 20, 2013
Messages
141
A few quick comments.
1. What do you mean by VMs + Parity RAID?
2. Regarding your comment "Use mirror vdevs for ESXi VMs if you want responsive servers", do you mean I should take my 16+ SAS drives and make 8 sets of mirrors instead of a single RAIDZ1?
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Parity RAID would be any of the RAIDZ types, or in a conventional system, the RAID5/6 levels. They're a poor match for the random I/O that you get from a hypervisor.

Definitely configure it as 8 mirror vdevs for 16 drives total. Even if you weren't using it for VM storage, you shouldn't put a single RAIDZ vdev across that many drives. You've got small disks, which means the risk of RAIDZ1's "death" doesn't apply to you as much, but it's still a poor design decision.
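A rough sketch of that layout at pool-creation time; disk names are illustrative and the pattern simply repeats across all 16 drives:

    # Eight two-way mirror vdevs striped together into a single pool:
    zpool create vmpool \
      mirror da0 da1   mirror da2 da3   mirror da4 da5   mirror da6 da7 \
      mirror da8 da9   mirror da10 da11 mirror da12 da13 mirror da14 da15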
 