RAIDZ expansion, it's happening ... someday!

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
I read recently about alpha code (or was that pre-alpha?). Anyway, there is an implementation of the RAID-Zx expansion code that does not yet allow newly written data to take advantage of the new stripe width. There is talk about how, if possible, existing data could be re-striped to the new full width (the mythical Block Pointer Rewrite, but in a different form?). This test code is not for production, and is not even meant to be permanent (any pool using it would be incompatible with later releases).

So it is not dead, just taking time due to other things of higher priority. (Like the FP/SIMD save/restore for RAID-Zx calculations.)
 

Yorick

Wizard
Joined
Nov 4, 2018
Messages
1,912
I think re-striping old data with a send/receive is a perfectly fine thing to do, at the very least for the time being.
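
(For reference, a minimal sketch of that kind of rewrite; the pool and dataset names here are made up for illustration. Anything received via zfs receive is written fresh, so it is laid out like any other new write.)

Code:
zfs snapshot -r tank/data@restripe
zfs send -R tank/data@restripe | zfs receive tank/data_new   # data is rewritten on receive
zfs rename tank/data tank/data_old
zfs rename tank/data_new tank/data
zfs destroy -r tank/data_old   # only once you've verified the copy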

I'd like to ask a very basic question: That alpha code says that "more testing" is desired. How, concretely, could I get in on that? I'm happy to buy an HBA and a bunch of disks so I have a test bed. I can figure out building zfs from source on Linux with the alpha code integrated, though it'll take me a minute. What I don't know is where to go from there. Is there a specific list of things to test?

Basically: If I were to build a test bed, how can I best help to get this code from alpha to beta? And, if this forum doesn't have that answer, where would I find it?
 

djjaeger82

Dabbler
Joined
Sep 12, 2019
Messages
16
I'd also really like to see this feature implemented sooner rather than later. If there's anything we can do to help test and give feedback please reach out!
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
It looks like you'll actually want to build this one: https://github.com/ahrens/zfs/tree/raidz until it gets accepted into the master branch.

ahrens is asking for help with a few bits of the code and with testing.
 

Yorick

Wizard
Joined
Nov 4, 2018
Messages
1,912
Oh, that raidz branch is the one I'm trying to build. I've just not been successful yet.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
Well, I seem to have finished a compile (a few warnings, but exit status 0) using Ubuntu Server 18.04.4 LTS and just following the instructions directly including the in-tree build commands.

Now figuring out how to install it...

Sure enough, make install works.

Then you need:
Code:
sudo ldconfig
sudo /sbin/modprobe zfs

to load the needed kernel modules properly, and it's off to the races...
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
Now it seems more of a problem that the master branch doesn't have the code in it, and the branch that does (a081ad5009) can't be cloned... maybe that's my ignorance of how to do that exactly, but simple attempts like git clone -b a081ad5009 --single-branch https://github.com/ahrens/zfs.git fail, stating that the branch can't be found, despite this:

(screenshot attachment)
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
OK, so this was it: cd into the cloned zfs directory, then:
Code:
git reset --hard a081ad5009
git checkout a081ad5009

Both of the commands mentioned an ambiguous name issue, but it seems to have worked; I just expanded a pool from 4 to 5 disks.

EDIT: Or did I... the expand started to work with the countdown, but then claimed it only works for mirrors and stopped.
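
(Side note for anyone else hitting that clone error: a081ad5009 looks like a commit hash rather than a branch name, which would explain why git clone -b couldn't find it. A minimal way to get a working tree at that exact commit, using the repo URL from earlier in the thread, is:)

Code:
git clone https://github.com/ahrens/zfs.git
cd zfs
git checkout a081ad5009   # detached HEAD at that specific commit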
 

Yorick

Wizard
Joined
Nov 4, 2018
Messages
1,912
I didn't reset or use hex identifiers; I just checked out raidz, and have since rebased raidz against upstream zfs-0.8-release. So far so good; the initial test seemed fine. It'll take me a while to understand the test script and prep to run it, and then we'll see whether it's successful.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
So I started again from scratch with:
Code:
sudo git clone -b raidz https://github.com/ahrens/zfs.git
sudo chown -R user:root zfs
cd zfs
git checkout raidz                 # redundant, as the clone did that already
sh autogen.sh
./configure
make -s -j$(nproc)
sudo make install
sudo reboot
sudo ldconfig
sudo /sbin/modprobe zfs
sudo zpool create -fd -o ashift=9 storage raidz2 /dev/sdb /dev/sdc /dev/sdd /dev/sde
zpool status                       # shows the pool was created
sudo zfs set mountpoint=/mnt/zfs1 storage    # makes sure we can use the pool at /mnt/zfs1
sudo zfs mount -a                  # just to make sure the mount worked
cd /mnt/zfs1
sudo dd if=/dev/zero of=./zeros.bin bs=1024k count=6000    # put 6GB of data in the pool, should give the expansion some work to do for at least a minute
sudo zpool attach -f storage raidz2-0 /dev/sdf    # does not work, gives the message about only working on top-level devices and mirrors
sudo zpool destroy storage
zpool status                       # shows the pool was destroyed
sudo zpool create -fd -o ashift=9 storage raidz2 /dev/sdb /dev/sdc /dev/sdd /dev/sde
zpool status                       # shows the pool was created
sudo zfs set mountpoint=/mnt/zfs1 storage    # makes sure we can use the pool at /mnt/zfs1
sudo zfs mount -a                  # just to make sure the mount worked
cd /mnt/zfs1
sudo dd if=/dev/zero of=./zeros.bin bs=1024k count=6000    # put 6GB of data in the pool, should give the expansion some work to do for at least a minute
sudo zpool attach -f storage raidz1-0 /dev/sdf    # does not work, same problem


Rebase is something new to me, so I'm not sure how to go about that.
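
(In case it helps, here's a rough sketch of the kind of rebase Yorick describes. The remote name and upstream URL are assumptions; the zfs-0.8-release branch name comes from his post, and you'd resolve any conflicts by hand as they come up.)

Code:
git remote add upstream https://github.com/zfsonlinux/zfs.git   # assumed upstream repo
git fetch upstream
git checkout raidz
git rebase upstream/zfs-0.8-release   # replay the raidz commits on top of the 0.8 release branch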
 

Yorick

Wizard
Joined
Nov 4, 2018
Messages
1,912
Matt has time "later in the week" to take a look. I think I got as far as I will, until he has time to review. I gave him a PR against upstream 0.8. That's a LOT of changes, and while his test script succeeded, I want to make sure I didn't introduce an error during the rebase somewhere.

If you want to play around with it, my rebased raidz branch is (edit) utterly broken when I fsck'd it during push. Bad news, I clearly don't know how to properly handle git. Good news, I still have a working rebased raidz branch locally, and just as soon as I receive a git clue-by-four, I'll push that again and update this post.

After building, I did a "sudo make install", then "sudo ldconfig". A "sudo zfs/scripts/zfs.sh" will load the kernel modules, until next reboot.

"sudo zfs/scripts/raidz_expand_test.sh" runs the test script, which assumes that /dev/sdb is available to create a pool on. It then creates six files inside that pool which it uses to do the expand tests.

I'll just emphasize this for anyone else who wants to test this raidz expansion code, help with documentation, or contribute to the effort in some other way: All you need is ONE empty disk for testing. I used an old 120GB SSD that was sitting around feeling useless.

I'll probably get six 2TB disks and do full-fat testing on actual drives eventually, just for some performance testing. But that's not required to play with it. If you have any unused SAS/SATA device at all (USB is probably asking for trouble), you can jump in.

Note that the testing sretalla and I are doing is on Ubuntu Server 18.04 and Ubuntu Desktop 18.04. "Any supported Linux" should work though. The rebased code supports Linux kernels up to 5.4.

Rebase is a bit of a pain, and I am not sure I used the best / recommended way to go about it. I had to do some manual changes. You are probably better off using my already rebased version if you want to test, instead of duplicating the rebase effort. Once Matt comes back with either "this is okay" or "you need to make a couple changes", I'll feel better about it.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
If you have any unused SAS/SATA device at all (USB is probably asking for trouble), you can jump in.
Or a virtual environment... I'm using virtual disks as I don't care about data integrity for this testing. This way I can create as many as needed (I'm just using 8GB virtual disks for now).
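
(A related low-effort option if you don't have a VM handy: ZFS will also take plain files as vdevs, so you can build a throwaway pool out of sparse files. This is just a sketch with arbitrary names and sizes, not what sretalla is doing.)

Code:
for i in 1 2 3 4; do truncate -s 8G /var/tmp/testdisk$i.img; done   # four sparse 8GB backing files
sudo zpool create -f testpool raidz2 /var/tmp/testdisk1.img /var/tmp/testdisk2.img /var/tmp/testdisk3.img /var/tmp/testdisk4.img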
 

Yorick

Wizard
Joined
Nov 4, 2018
Messages
1,912
So, for those following along:

Matt's raidz branch will build against kernel 5.0, so for example Ubuntu Desktop 18.04.3, but not Desktop 18.04.4. Ubuntu Server 18.04.4 uses kernel 4.15, so that might be a safer build target.

Testing on a VM is totally a thing. There are some rough edges around make install and how to load kernel modules; please get in touch if you are keen to do some testing.

Matt is interested in bringing his code back in line with where ZFS master is now. It's just that some of the data structures he uses have changed in the meantime, so it'll take his effort to adapt that. I can do basic carpentry on the code; I'm not going to rearchitect anything. Patience on that one.

The best bet for contributing, then, is to use the code as-is on an older kernel, work on the things Matt asked for in his initial PR, and then move those efforts over once his code has been brought back into line with ZFS master.

Those on this thread, what do you want to do to move this effort along? Is there something that seems like a thing you’d like to do or test?
 

Yorick

Wizard
Joined
Nov 4, 2018
Messages
1,912
Note: I have a merge against master. It compiles, Matt's tests complete, and the zfs-tests suite breaks in the same places it did before the merge, so it looks as if the merge was successful. The PR is in; you can see this code at https://github.com/thorsteneb/zfs (it's the raidz branch you want). I recommend always forking from Matt if you're testing raidz expansion; you can merge in my commits if they haven't been accepted yet by the time you fork.
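
(A rough sketch of that workflow, with the remote name made up; the URLs and branch names are the ones from this post and earlier in the thread:)

Code:
git clone https://github.com/ahrens/zfs.git
cd zfs
git checkout raidz
git remote add thorsteneb https://github.com/thorsteneb/zfs.git
git fetch thorsteneb
git merge thorsteneb/raidz   # pull in the merged-against-master commits if they haven't been accepted upstream yet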
 

Keven

Contributor
Joined
Aug 10, 2016
Messages
114
Hi guys, I just started to read about this new feature and noticed it has been 3 years since it was announced. So are we months or years away from an official release in FreeNAS?
 

Yorick

Wizard
Joined
Nov 4, 2018
Messages
1,912
Years
 

zizzithefox

Dabbler
Joined
Dec 18, 2017
Messages
41
I'd also really like to see this feature implemented sooner rather than later. If there's anything we can do to help test and give feedback please reach out!

By the way, I thought I was missing this feature in FreeNAS when I started using it back in 2012. QNAP had that possibility; UNRAID (which is not RAID anyway) and traditional enterprise RAID systems did too. It seemed a little odd not to be able to do that because it was so common. But at the time I was messing with 1-2TB drives (max 4TB). With QNAP it took something like 9 hours to expand the array.

Now that the standard is 4TB drives, with prices scaling up to 8TB and 20TB monsters available, I don't care anymore; I'm actually scared of a feature like this, just as I'm not really comfortable going beyond 8TB drives. Maybe 12TB with RAID-Z3 (Z2 is still feasible with a SAS 10^16 URE rate)? Just maybe. I would bet the time to complete a rebuild with just 4TB drives is starting to be insane?
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
I would bet the time to complete a rebuild with just 4TB drives is starting to be insane?
Do the reading; it doesn't work the way you think it does. Allan Jude has a good YouTube video from a year or two back where he explains in detail how all existing data in the pool keeps its original n-wide redundancy/stripe while all newly written data gets n+1.
 

zizzithefox

Dabbler
Joined
Dec 18, 2017
Messages
41
Do the reading; it doesn't work the way you think it does. Allan Jude has a good YouTube video from a year or two back where he explains in detail how all existing data in the pool keeps its original n-wide redundancy/stripe while all newly written data gets n+1.
I checked, and it actually works exactly as I thought: by sequentially reflowing (i.e. rewriting) all the data, because that's the obvious thing to do; with ZFS it's actually easier to do than with traditional RAID 5 or 6.
The objection stands: this is so much worse than resilvering. I still would not do it with bigger drives.
 