RAIDZ expansion, it's happening ... someday!

Ericloewe · Oct 27, 2017

Stux said:
Offline BPR ;)

It's easier, but the time spent offline is prohibitive to anybody with the cash to finance it.

Stux · Oct 27, 2017

Ericloewe said:
It's easier, but the time spent offline is prohibitive to anybody with the cash to finance it.

Yes. Same reason offline defrag doesn’t exist. Offline anything is anathema to the ZFS ideology, and rightfully so really

Stux · Oct 27, 2017

Ericloewe said:
Well, BPR that works in quasi-linear time is the holy grail.

The real problem with BPR is it touches all parts of the stack and implementing it in a maintainable way is hard.

Ericloewe · Oct 28, 2017

By the way: Links to the videos:

Day 1 (no shakycam here)

Day 2 (videos are in shakycam) - also includes slides for both days and maybe Day 1 videos at a later date.

Chris Moore · Oct 28, 2017

Ericloewe said:
I really don't understand it. The GUI throws warnings at people and they just ignore them. Where have they been desensitized to warning messages?

Windows is constantly throwing warnings that are meaningless and it has caused many people to entirely ignore the warning and just click through without reading.

danb35 · Oct 28, 2017

Chris Moore said:
just click through without reading.

You can't simply click through the warning in this case. To do what users keep doing, you have to go into the volume manager, try to add the disk to your pool, get the warning message, switch into manual setup, use a different set of steps to try again to add the disk to your pool, and then you succeed in hosing your pool. But nonetheless, it continues to happen. But none of the people who have done it have been willing to explain their thought processes, so that maybe something could be done in the GUI to reduce the incidence of such incidents.

Jailer · Oct 28, 2017

danb35 said:
maybe something could be done in the GUI to reduce the incidence of such incidents.

No matter what you do you still won't be able to make people actually read the manual which is where the problem lies.

Chris Moore · Oct 28, 2017

danb35 said:
nonetheless, it continues to happen. But none of the people who have done it have been willing to explain their thought processes, so that maybe something could be done in the GUI to reduce the incidence of such incidents.

Just remove the manual feature from the GUI. If they want to do a manual change, make them learn how to use the command line.
That will slow them down.

danb35 · Oct 28, 2017

Jailer said:
No matter what you do you still won't be able to make people actually read the manual which is where the problem lies.

Sure, if they'd only RTFM, they wouldn't have these problems. But when people keep thinking that the way to replace a disk goes through the Volume Manager, to the extent that they'll work to circumvent any warnings or seat belts in their way, maybe there's a problem with the GUI as well.

Chris Moore · Oct 28, 2017

danb35 said:
maybe there's a problem with the GUI as well.

It is difficult to find the way to replace a disk. It might be worth considering an additional page in the GUI that makes it more 'point and click' simple for the people who are not going to read the manual, because some people just have a philosophical problem with reading directions and will never do it.
My dad was like that. I can't tell you the number of things he built wrong when I was a kid.

Arwen · Oct 28, 2017

Think about this method of RAID-Zx expansion, it's more or less one way. You can add a disk
to a RAID-Zx, but you can't remove the disk later using a similar method, (it's the way they
re-stripe the data). Once you have newly striped data on the expanded RAID-Zx, that vDev
now requires all the columns / disks to be present to maintain its level of redundancy.

It's somewhat the same reason you can't remove a column / disk from a RAID-Zx without the
holy grail, I mean block pointer re-write. As long as one ZFS block spans the entire width of
the RAID-Zx, it needs to remain on that many disks to maintain redundancy.

So the best we can hope for, is full vDev removal to shrink a pool. Not the single disk method
that is coming up. And certainly not one with indirect pointers, (like that single disk method).

My current guess for practical pool shrink / full vDev removal would involve something like this;

ZFS pool must have more than 1 vDev, (duh).
The free storage on the vDev(s) to remain must be larger than the amount of data used in the vDev to be removed.
Freeze all new writes to the vDev to be removed.
Any newly freed data on that vDev to be removed, is not made available.
Allow the SysAdmin to get a list of affected Datasets or Zvols, and potentially affected files in the dataset(s)
As time and I/O bandwidth permit, manually copy the affected data, (which forces it to be written to non-frozen vDevs)
When the entire frozen vDev is completely empty, allow removal.

Annoying, yet it would be 100% on-line and it will work.
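
The first two preconditions in the list above boil down to a little arithmetic. Below is a minimal Python sketch of that check only. It does not talk to ZFS at all: `Vdev` and `can_remove` are hypothetical names made up for illustration, and the size and usage numbers are assumed to come from wherever you read your pool statistics.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Vdev:
    """Hypothetical stand-in for one top-level vDev in a pool."""
    name: str
    size: int  # total capacity, in any consistent unit
    used: int  # allocated space, in the same unit

    @property
    def free(self) -> int:
        return self.size - self.used


def can_remove(pool: list[Vdev], target: str) -> bool:
    """Check the two preconditions from the list above.

    1. The pool must have more than one vDev.
    2. The free space on the vDevs that remain must be larger than
       the data used on the vDev to be removed.
    """
    if len(pool) < 2:
        return False
    victim = next((v for v in pool if v.name == target), None)
    if victim is None:
        return False
    remaining_free = sum(v.free for v in pool if v.name != target)
    return remaining_free > victim.used
```

A real implementation would also want some headroom, since a pool that is nearly full after the move is not much use either.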

Chris Moore · Oct 28, 2017

Arwen said:
Allow the SysAdmin to get a list of affected Datasets or Zvols, and potentially affected files in the dataset(s)

As time and I/O bandwidth permit, manually copy the affected data, (which forces it to be written to non-frozen vDevs)

I think the idea would be to automate this process. If it had to be done manually, forget it. I just tell management they need to buy a new server.

BigDave · Oct 28, 2017

Chris Moore said:
Just remove the manual feature from the GUI. If they want to do a manual change, make them learn how to use the command line.
That will slow them down.

To make the majority of users suffer the loss of a feature because some
lazy person chooses to take chances with an Enterprise OS without
paying attention is not gonna help those people at all.

If you make it idiot proof, someone will make a better idiot...

Chris Moore · Oct 28, 2017

BigDave said:
If you make it idiot proof, someone will make a better idiot...

Sadly, that is true.

Arwen · Oct 28, 2017

Chris Moore said:
I think the idea would be to automate this process. If it had to be done manually, forget it.
...

Practically, it can be semi-automated to some degree.

Simple example. Let's say you have a dataset full of training videos that need to be available 14 hours a day. They change at random intervals, (the SysAdmin has no control over when.) Each evening a cron job kicks off and copies some of the videos to an alternate name, then erases the original and renames the copy. The script keeps track of which files need to be copied. Over time, they are all copied and you get an E-Mail. You check current status and if all is good, you remove the vDev.
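
A rough Python sketch of what such a nightly job could look like, assuming the hypothetical "frozen vDev" behavior described earlier so that freshly written copies land on the other vDevs. The function names, the temporary suffix, and the batch limit are all invented for illustration. It uses `os.replace`, which does the "erase the original and rename the copy" step in one go.

```python
from __future__ import annotations

import os
import shutil


def rewrite_file(path: str) -> None:
    """Copy a file to an alternate name, then rename the copy over the original."""
    tmp = path + ".rewrite-tmp"  # hypothetical temporary name
    shutil.copy2(path, tmp)      # the copy is written as new blocks
    os.replace(tmp, path)        # replaces the original with the copy


def rewrite_batch(pending: list[str], limit: int) -> list[str]:
    """Rewrite up to `limit` files and return the ones still left to do."""
    for path in pending[:limit]:
        # Files may have been deleted since the list was built; skip those.
        if os.path.exists(path):
            rewrite_file(path)
    return pending[limit:]
```

The cron job would persist the returned list between runs and send the E-Mail once it comes back empty.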

Naturally this simplified method does not work for datasets that have snapshots, clones, and constant active data. Yet, even in some of those cases, active data is re-written. Like backups. If you have a 6 week retention, you just have to wait 6 weeks and the data will move itself.

But, you are right that it's manual to select what method to free up the vDev to be removed. Then more manual effort to make the move happen.

Chris Moore said:
...
I just tell management they need to buy a new server.

That is probably the correct answer for most business cases.

Jailer · Oct 28, 2017

Chris Moore said:
It is difficult to find the way to replace a disk.

Maybe someone should submit a resource with the instructions and screenshots showing how to replace a drive. ~~Dumb~~ Break it right down to every step involved.

danb35 · Oct 28, 2017

Jailer said:
Maybe someone should submit a resource with the instructions and screenshots showing how to replace a drive. ~~Dumb~~ Break it right down to every step involved.

You mean like this one?

Jailer · Oct 28, 2017

Needs more pictures. Reeeeeally dumb it down.

Then again it would probably get read just as much as the manual does......

danb35 · Oct 28, 2017

Jailer said:
Needs more pictures

OK, how about this?

Jailer · Oct 28, 2017

Yup that's what I was talking about. But I'm sure none of it will help anyway since threads like this one will still continue to show up.
