Pool Expansion over SAS cable

TidalWave

Explorer
Joined
Mar 6, 2019
Messages
51
Hi Guys,

We have a 15-drive raidz2 FreeNAS box sharing out to some video editors over SMB. Everything has been great, but they are running out of storage space.

So I installed an expansion chassis with 20x 10TB drives via a 12G SAS cable. Wow that’s a big jump.

I’m thinking of expanding the pool across both chassis so that we don’t have to recreate the user permissions and folder structures all over again. What do you guys think?

Should I expand the current pool or create a new pool?

My setup is two Supermicro chassis. The head unit is 15 drives in raidz2 with 128 GB of RAM. The expansion chassis is just a bare chassis with 20x 10TB HDDs and a SAS card.

I’ve connected the two chassis via a 12Gb mini-SAS cable.

I’m thinking of making two raidz3 groups on the expansion chassis and then adding them to the current pool.

If something goes wrong, is there a way to revert to the previous setup, basically to remove the SAS chassis and put it back the way it was?

So I guess I have two questions. My first question is: should I expand the current pool across the two chassis, or should I make a new pool?

Then my second question is: if I decide to expand the current pool, and let’s say the performance gets worse (which I doubt), is there a way to remove the 20x 10TB drives from the expanded pool? Or would I be stuck with my decision?

Thanks for any advice
 
Last edited:

blanchet

Guru
Joined
Apr 17, 2018
Messages
516
Key points
  • If you expand the pool, it is impossible to revert without destroying the pool.
  • For optimal performance, it is recommended to have homogeneous vdevs in the pool.
  • IOPS increases with the number of vdevs in the pool.
If I were you, I would:
  • create a new pool (for example tank2) on the expansion chassis
  • use ZFS replication to replicate the datasets between the old and the new pool
  • export the old pool (i.e. disconnect it)
  • export the new pool (i.e. disconnect it)
  • import the new pool under the old pool's name
  • then operate from the new pool for a few weeks/months
  • when you are out of space again, destroy the old pool to expand the new one
Advantages
  • It gives you the opportunity to change the vdev structure.
  • If something goes wrong at the very beginning, you can still import the old pool.
  • No need to redo user permissions: ZFS replication is an exact copy.
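For reference, a rough command-line sketch of that sequence (pool and snapshot names are just examples; the GUI replication tasks can do the send/receive part for you):

  # take a recursive snapshot of the old pool (names are examples)
  zfs snapshot -r tank@migrate
  # send everything (datasets, snapshots, properties) to the new pool
  zfs send -R tank@migrate | zfs recv -F tank2
  # once the copy is verified, swap the pools
  zpool export tank
  zpool export tank2
  zpool import tank2 tank    # the new pool now answers to the old name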
 

blueether

Patron
Joined
Aug 6, 2018
Messages
259
1/ Don't mix different vdev types (e.g. raidz1 and raidz2) in the same pool
2/ No, you can't roll the pool back once done
3/ Don't use raidz1 for large drives...

What @blanchet wrote is more or less what I was going to add to this post.
 

TidalWave

Explorer
Joined
Mar 6, 2019
Messages
51
@blueether do you mean don’t mix raidz2 and raidz1? Also, why is raidz1 bad for large drives vs small drives? I need to be able to convince my boss if I am to change the vdev layout.

@blanchet thanks so much for this. Do you know how I would go about using ZFS replication to copy between pools? I’ve never done it.

So I think I can get 1.2 GB/s out of the transfer between the two chassis. Does that sound about right? It would take about 30 hours to transfer 130 TB, or is my math wrong?
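Rough check of my math, assuming a sustained 1.2 GB/s: 130 TB is about 130,000 GB, and 130,000 GB ÷ 1.2 GB/s ≈ 108,000 seconds ≈ 30 hours, so roughly 30 hours if the link and the drives can keep that rate up the whole time.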
 
Last edited:

artlessknave

Wizard
Joined
Oct 29, 2016
Messages
1,506
@blueether do you mean don’t mix raidz2 and raidz1?
Yes. Don't mix vdev types in the same pool, including stripe/mirror/raidz1/2/3.
Also, don't use raidz1 with drives over 2-4TB unless you either:
A) do not care about the data, or
B) really know what you are doing AND have a backup (you should have a backup anyway).
For a 15-drive vdev I would have recommended raidz3, since all your eggs are in the same basket.
6x raidz2/3 with 7 drives (42 total) [420 raw/270 usable @ 10TB raidz2]
5x raidz2/3 with 9 drives (45 total) [450 raw/330 usable @ 10TB raidz2]
4x raidz3 with 11 drives (44 total) [440 raw/309 usable @ 10TB raidz3]
3x raidz3 with 15 drives (45 total) [450 raw/316 usable @ 10TB raidz3]
some of these leave extra slots you can use for hot-spares and replacements.
You can start your new pool in the disk shelf with 1-3 groups, replicate your data over, and then cannibalize the old array to add more vdevs, expanding the pool with same-size vdevs.
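For example, a rough sketch of one of those layouts from the command line (device names are placeholders, and on FreeNAS you would normally let the GUI build the pool for you):

  # hypothetical device names: start the new pool with two 11-disk raidz3 vdevs
  zpool create tank2 \
    raidz3 da0 da1 da2 da3 da4 da5 da6 da7 da8 da9 da10 \
    raidz3 da11 da12 da13 da14 da15 da16 da17 da18 da19 da20 da21
  # later, after cannibalizing the old chassis, grow it with a matching vdev
  zpool add tank2 raidz3 da22 da23 da24 da25 da26 da27 da28 da29 da30 da31 da32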
if you make a pool that spans multiple chassis, be aware that if something interrupts connectivity to the slave chassis, the pool will have major issues

You can replicate between pools on the same server with the GUI by using localhost as the target, or at the command line. The GUI does not yet have resumable replication, so if there is any risk of being interrupted you might want to use the command line with a resumable stream (if your server is online reliably, that shouldn't be a concern).
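A minimal sketch of a resumable command-line copy on the same box (dataset names are examples; -s on the receive side stores a resume token if the stream is interrupted):

  # receive with -s so an interrupted stream can be resumed later
  zfs snapshot tank/video@copy1
  zfs send tank/video@copy1 | zfs recv -s tank2/video
  # if the transfer dies, read the resume token from the target dataset...
  zfs get -H -o value receive_resume_token tank2/video
  # ...and restart the stream from where it stopped
  zfs send -t <token-from-above> | zfs recv -s tank2/video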
 

blueether

Patron
Joined
Aug 6, 2018
Messages
259
We have a 15-drive raidz2 FreeNAS box sharing out to some video editors over SMB.
Is this a single vdev? And what size are the original drives? This may affect the pool layout decisions a little.
 

TidalWave

Explorer
Joined
Mar 6, 2019
Messages
51
Is this a single vdev? And what size are the original drives? This may affect the pool layout decisions a little.
It is one storage space, currently named Tank, with one 15-drive raidz2 vdev of 10TB HDDs. I guess it’s one vdev.

So the other chassis has 30 drives in it. My boss wants to make 5 vdevs (I think that’s what they are called) of raidz1 each.
 

artlessknave

Wizard
Joined
Oct 29, 2016
Messages
1,506
of raidz1 each.
I'm not sure what part of "Do not use raidz1" your boss isn't understanding, but... do not use raidz1 for large drives. Statistically, the entire array is likely to die when you try to resilver a failed drive.
If you are not sure enough about what a vdev is, that tells me you are not knowledgeable enough to fully understand the risks and how/when you can use raidz1, and so you really probably should not do so, particularly if you do not have a solid backup in place.
 

TidalWave

Explorer
Joined
Mar 6, 2019
Messages
51
I expanded the pool to stripe across both chassis: two separate raidz2 vdevs, 15 drives in each vdev, within one pool.

In my math a SAS 12 connection gives 1500 MB/s throughput, is that about correct? EDIT: looks like SAS 12 is 4800 MB/s because it’s multi-lane x4.

Which means that will be the maximum bandwidth to the expansion chassis?

This NAS is connected via 4x 10GbE LAGged together, so theoretically 4000 MB/s.

So is my performance throttled by the SAS 12 cable? EDIT: no, I don’t think so.
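Working the numbers: SAS uses 8b/10b encoding, so 12 Gbit/s per lane is about 1.2 GB/s, and the mini-SAS cable carries 4 lanes, so roughly 4.8 GB/s to the shelf. The 4x 10GbE LAG is at most about 4 x 1.25 GB/s = 5 GB/s on the wire (call it ~4 GB/s in practice), so the two links are in the same ballpark and neither should be the first bottleneck for spinning disks.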


Oh, and another thing: after I expanded the pool, it instantly finished. There was no redistribution or initialization or scrubbing or resilvering or anything. How does FreeNAS expand the pool without data redistribution?

I appreciate all you guys’ feedback so far; it’s been so informative.
 
Last edited:

artlessknave

Wizard
Joined
Oct 29, 2016
Messages
1,506
How does FreeNAS expand the pool without data redistribution?
The drives are added to the pool and new data is striped across all vdevs. Existing data doesn't change until it is written to, at which point copy-on-write applies and it is all rewritten like a new file across all vdevs.
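You can watch this happen with the per-vdev views (pool name is whatever yours is called):

  # per-vdev capacity/allocation; the new vdev will show mostly free space
  zpool list -v tank
  # per-vdev I/O while writing; new writes should land on every vdev
  zpool iostat -v tank 5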
I assume you mean SAS3 (12Gb/s); "SAS 12" isn't an existing designation.
wouldn’t allow me to mix raidz1 and raidz2.
Ah, that's right, I forgot they implemented that in the GUI so people don't do exactly what you tried to do, which we told you... not to do :/
20xHDD raidz2
Not what I would recommend, but it's your data. I do hope you have a backup though...
So is my performance throttled by the SAS 12 cable
Unlikely. SAS3 is generally overkill if you aren't using SSDs.
 

TidalWave

Explorer
Joined
Mar 6, 2019
Messages
51
Is there any way to assign a hot spare to a specific vdev? I have two chassis striped together across a SAS link, using two different vdevs going to one pool called tank. I have one hot spare in the top chassis and one hot spare in the bottom chassis. I would like to assign them separately, so that if a drive dies in the top vdev the top hot spare will take over.

And vice versa for the bottom chassis: I want to assign a hot spare to the bottom vdev. This way the hot spares don’t stripe across two different chassis, which I feel may limit performance a little bit.
 

TidalWave

Explorer
Joined
Mar 6, 2019
Messages
51
The drives are added to the pool and new data is striped across all vdevs. Existing data doesn't change until it is written to, at which point copy-on-write applies and it is all rewritten like a new file across all vdevs.


If I run a scrub after installing the new vdev, will it move data around into both vdevs?
 

artlessknave

Wizard
Joined
Oct 29, 2016
Messages
1,506
If I run a scrub after installing the new vdev, will it move data around into both vdevs?
No, scrub doesn't write data unless it finds bad data. Currently the only way I know of to balance data is to move it or change it; this may change in FreeNAS 12 with some of the features that are planned. If this is really important to you, move some big stuff off and then put it back. Otherwise ZFS will balance new data and you can just leave it alone to do its job; data redundancy is maintained at the vdev level, so as long as you have all vdevs the data is available, and it doesn't matter which vdev actually has it.
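One hedged way to do the "move it off and put it back" trick without leaving the pool is a local send/receive into a temporary dataset (dataset names are examples; stop the shares and make sure you have a backup first):

  # rewrite a dataset so its blocks spread across all current vdevs
  zfs snapshot tank/projects@rebalance
  zfs send tank/projects@rebalance | zfs recv tank/projects_new
  # after verifying the copy, swap the names
  zfs destroy -r tank/projects
  zfs rename tank/projects_new tank/projects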
Is there any way to assign a hot spare to a specific vdev?
I don't believe so. You'll need to do warm spares (manually replace a failed drive with the existing spare drive).
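A warm-spare swap is just a manual replace when a disk fails (device names are placeholders):

  # find the failed disk
  zpool status tank
  # replace it with the idle spare sitting in the same chassis
  zpool replace tank da7 da19
  # then watch the resilver progress
  zpool status tank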
 

blueether

Patron
Joined
Aug 6, 2018
Messages
259
a couple of points:
20 wide vdevs are not recommended; have a read of the ZFS primer https://www.ixsystems.com/community/resources/introduction-to-zfs.111/
I think that you/your client will feel that performance might be poor with only 2 vdevs; the pool will only perform at about the speed of two disks. If you had rearranged the pool layout to:
  • Pool (Name: Tank)
    • Chassis 1 (20 x 10TB)
      • 10 x 10TB raidz2
      • 10 x 10TB raidz2
    • Chassis 2 (30 x 10TB)
      • 10 x 10TB raidz2
      • 10 x 10TB raidz2
      • 10 x 10TB raidz2
You should have over 2x the throughput
 

artlessknave

Wizard
Joined
Oct 29, 2016
Messages
1,506
20 wide vdevs are not recommended
Frankly, from what I have seen so far, this person may not be... skilled enough to be the solo provider of this service (FreeNAS admin). They state themselves that they are not confident they can follow recommendations that we might think are relatively simple. They may want to consider... finding someone who is.
They can continue as is, with the risks already described, but they should know that if they run into trouble while doing something outside the recommendations, *they* will be on the hook, and they may have trouble getting help here after being told they probably shouldn't do it.
That said, if it does what they want, whatever; it's their data.
 

TidalWave

Explorer
Joined
Mar 6, 2019
Messages
51
a couple of points:
20 wide vdevs are not recommended; have a read of the ZFS primer https://www.ixsystems.com/community/resources/introduction-to-zfs.111/
I think that you/your client will feel that performance might be poor with only 2 vdevs; the pool will only perform at about the speed of two disks. If you had rearranged the pool layout to:
  • Pool (Name: Tank)
    • Chassis 1 (20 x 10TB)
      • 10 x 10TB raidz2
      • 10 x 10TB raidz2
    • Chassis 2 (30 x 10TB)
      • 10 x 10TB raidz2
      • 10 x 10TB raidz2
      • 10 x 10TB raidz2
You should have over 2x the throughput

When I read this article https://calomel.org/zfs_raid_speed_capacity.html it shows the speeds of 24-drive-wide vdevs being quite strong, certainly not near the one-drive theory represented in the link you posted. There is a ton of information out there, so I'm trying to sift through it all.

I was under the impression that the more drives spinning, the faster the performance, but after looking further into it, it seems that might not be the case. And while a 10-drive raidz2 vdev seems to be the sweet spot, I do not see our pool performing at anything like single-disk speed.
 

TidalWave

Explorer
Joined
Mar 6, 2019
Messages
51
Frankly, from what I have seen so far, this person may not be... skilled enough to be the solo provider of this service (FreeNAS admin). They state themselves that they are not confident they can follow recommendations that we might think are relatively simple. They may want to consider... finding someone who is.
They can continue as is, with the risks already described, but they should know that if they run into trouble while doing something outside the recommendations, *they* will be on the hook, and they may have trouble getting help here after being told they probably shouldn't do it.
That said, if it does what they want, whatever; it's their data.

@artlessknave I do appreciate your advice; I'm sorry you feel that way.
 

artlessknave

Wizard
Joined
Oct 29, 2016
Messages
1,506
One of the reasons really large raidz2 vdevs are not recommended is that you now have 18 compounded points of failure, and if 3 drives fail at once in any vdev (which, with 20 drives in each vdev, has an increased chance of occurring) you will lose the entire array. This is why I recommended at least raidz3 for 10-15 drives, and you are WAY over that. Ultimately, it's your funeral; since you touched it, if anything goes wrong you will be the scapegoat.
 