RAIDZ expansion, it's happening ... someday!

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Offline BPR ;)
It's easier, but the time spent offline is prohibitive to anybody with the cash to finance it.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
It's easier, but the time spent offline is prohibitive to anybody with the cash to finance it.

Yes. Same reason offline defrag doesn’t exist. Offline anything is anathema to the ZFS ideology, and rightfully so, really.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Well, BPR that works in quasi-linear time is the holy grail.

The real problem with BPR is it touches all parts of the stack and implementing it in a maintainable way is hard.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
I really don't understand it. The GUI throws warnings at people and they just ignore them. Where have they been desensitized to warning messages?
Windows is constantly throwing warnings that are meaningless and it has caused many people to entirely ignore the warning and just click through without reading.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
just click through without reading.
You can't simply click through the warning in this case. To do what users keep doing, you have to go into the volume manager, try to add the disk to your pool, get the warning message, switch into manual setup, use a different set of steps to try again to add the disk to your pool, and then you succeed in hosing your pool. But nonetheless, it continues to happen. But none of the people who have done it have been willing to explain their thought processes, so that maybe something could be done in the GUI to reduce the incidence of such incidents.
 

Jailer

Not strong, but bad
Joined
Sep 12, 2014
Messages
4,977
maybe something could be done in the GUI to reduce the incidence of such incidents.
No matter what you do you still won't be able to make people actually read the manual which is where the problem lies.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
nonetheless, it continues to happen. But none of the people who have done it have been willing to explain their thought processes, so that maybe something could be done in the GUI to reduce the incidence of such incidents.
Just remove the manual feature from the GUI. If they want to do a manual change, make them learn how to use the command line.
That will slow them down.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
No matter what you do you still won't be able to make people actually read the manual which is where the problem lies.

Sure, if they'd only RTFM, they wouldn't have these problems. But when people keep thinking that the way to replace a disk goes through the Volume Manager, to the extent that they'll work to circumvent any warnings or seat belts in their way, maybe there's a problem with the GUI as well.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
maybe there's a problem with the GUI as well.
It is difficult to find the way to replace a disk. It might be worth developing an additional page of the GUI that makes it more 'point and click' simple for the people who are not going to read the manual, because some people just have a philosophical problem with reading directions and will never do it.
My dad was like that. I can't tell you the number of things he built wrong when I was a kid.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Think about this method of RAID-Zx expansion: it's more or less one way. You can add a disk
to a RAID-Zx, but you can't remove the disk later using a similar method (it's the way they
re-stripe the data). Once you have newly striped data on the expanded RAID-Zx, that vDev
now requires all the columns / disks to be present to maintain its level of redundancy.

It's somewhat the same reason you can't remove a column / disk from a RAID-Zx without the
holy grail, I mean block pointer re-write. As long as one ZFS block spans the entire width of
the RAID-Zx, it needs to remain on that many disks to maintain redundancy.

So the best we can hope for is full vDev removal to shrink a pool. Not the single disk method
that is coming up. And certainly not one with indirect pointers (like that single disk method).

My current guess for practical pool shrink / full vDev removal would involve something like this:
  1. ZFS pool must have more than 1 vDev, (duh).
  2. The free storage on the vDev(s) that remain must be larger than the amount of data used on the vDev to be removed.
  3. Freeze all new writes to the vDev to be removed.
  4. Any newly freed data on that vDev to be removed, is not made available.
  5. Allow the SysAdmin to get a list of affected Datasets or Zvols, and potentially affected files in the dataset(s)
  6. As time and I/O bandwidth permit, manually copy the affected data, (which forces it to be written to non-frozen vDevs)
  7. When the entire frozen vDev is completely empty, allow removal.
Annoying, yet it would be 100% on-line and it will work.
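
Steps 1 and 2 above are just a capacity check, so here is a rough sketch of it in Python. Everything in it is hypothetical: the `VdevUsage` record, the `can_evacuate` name and the byte counts are made up for illustration and are not part of any ZFS tool. It also ignores real-world details like slop space and RAID-Zx parity overhead.

```python
from dataclasses import dataclass


@dataclass
class VdevUsage:
    """Hypothetical per-vDev usage record (not a real ZFS structure)."""
    name: str
    size_bytes: int
    allocated_bytes: int

    @property
    def free_bytes(self) -> int:
        return self.size_bytes - self.allocated_bytes


def can_evacuate(vdevs: list[VdevUsage], target_name: str) -> bool:
    """Return True if the target vDev's data would fit on the other vDevs."""
    # Step 1: the pool must have more than one vDev.
    if len(vdevs) < 2:
        return False

    targets = [v for v in vdevs if v.name == target_name]
    if len(targets) != 1:
        raise ValueError(f"expected exactly one vDev named {target_name!r}")
    target = targets[0]

    # Step 2: free space on the vDevs that stay must exceed the data
    # currently allocated on the vDev to be removed.
    remaining_free = sum(v.free_bytes for v in vdevs if v.name != target_name)
    return remaining_free > target.allocated_bytes
```

For example, with two vDevs of 100 units each, one holding 40 and the other 50, either one could be evacuated. If the first held 70 instead, the second could not be emptied onto it.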
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
  • Allow the SysAdmin to get a list of affected Datasets or Zvols, and potentially affected files in the dataset(s)
  • As time and I/O bandwidth permit, manually copy the affected data, (which forces it to be written to non-frozen vDevs)
I think the idea would be to automate this process. If it had to be done manually, forget it. I just tell management they need to buy a new server.
 

BigDave

FreeNAS Enthusiast
Joined
Oct 6, 2013
Messages
2,479
Just remove the manual feature from the GUI. If they want to do a manual change, make them learn how to use the command line.
That will slow them down.
To make the majority of users suffer the loss of a feature because some
lazy person chooses to take chances with an Enterprise OS without
paying attention is not gonna help those people at all.

If you make it idiot proof, someone will make a better idiot...
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
I think the idea would be to automate this process. If it had to be done manually, forget it.
...
Practically, it can be semi-automated to some degree.

Simple example. Let's say you have a dataset full of training videos that need to be available 14 hours a day. They change at random intervals (the SysAdmin has no control over when). Each evening a cron job kicks off and copies some of the videos to an alternate name, then erases the original and renames the copy. The script keeps track of which files still need to be copied. Over time, they are all copied and you get an E-Mail. You check current status and if all is good, you remove the vDev.
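
A minimal sketch of what that nightly script might look like, in Python. The names `rewrite_file`, `run_batch` and the `.rewrite-tmp` suffix are my own inventions, and the pending list is assumed to be a plain text file with one path per line. It also assumes that a plain copy really allocates new blocks, and it leaves out the E-Mail step.

```python
import os
import shutil
from pathlib import Path


def rewrite_file(path: Path) -> None:
    """Copy a file to an alternate name, then rename the copy over the original."""
    tmp = path.with_name(path.name + ".rewrite-tmp")
    shutil.copy2(path, tmp)  # the copy is written out as new data
    os.replace(tmp, path)    # erase-and-rename in one atomic step


def run_batch(pending_list: Path, max_files: int) -> int:
    """Rewrite up to max_files entries from the pending list.

    Returns the number of files still waiting to be rewritten.
    """
    pending = [line for line in pending_list.read_text().splitlines() if line]
    batch, rest = pending[:max_files], pending[max_files:]

    for name in batch:
        candidate = Path(name)
        # A file may have been deleted or replaced since the list was built;
        # in that case its old data is already gone, so skip it.
        if candidate.is_file():
            rewrite_file(candidate)

    # Keep track of what is left for the next evening's run.
    pending_list.write_text("".join(f"{name}\n" for name in rest))
    return len(rest)
```

The cron job would call `run_batch` once per evening with whatever batch size the I/O budget allows. When it returns 0, everything on the list has been rewritten.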

Naturally this simplified method does not work for datasets that have snapshots, clones, and constant active data. Yet, even in some of those cases, active data is re-written. Like backups. If you have a 6 week retention, you just have to wait 6 weeks and the data will move itself.

But, you are right that it's manual to select what method to free up the vDev to be removed. Then more manual effort to make the move happen.
...
I just tell management they need to buy a new server.
That is probably the correct answer for most business cases.
 

Jailer

Not strong, but bad
Joined
Sep 12, 2014
Messages
4,977
It is difficult to find the way to replace a disk.
Maybe someone should submit a resource with the instructions and screenshots showing how to replace a drive. Dumb Break it right down to every step involved.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Maybe someone should submit a resource with the instructions and screenshots showing how to replace a drive. Dumb Break it right down to every step involved.
You mean like this one?
 

Jailer

Not strong, but bad
Joined
Sep 12, 2014
Messages
4,977
Needs more pictures. Reeeeeally dumb it down.

Then again it would probably get read just as much as the manual does......
 

Jailer

Not strong, but bad
Joined
Sep 12, 2014
Messages
4,977
Yup that's what I was talking about. But I'm sure none of it will help anyway since threads like this one will still continue to show up.
 