Replaced a dead drive, FreeNAS "assumed" replacement needed to be a stripe

Status
Not open for further replies.

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
So I've done a bit more testing, and the warning is there in 9.2.1.3 as well. I don't know how many versions back it goes; I'm not going to test them all. You cannot click through it or say "OK, do it anyway"; the only way to end up in this configuration is by using manual mode.

@coloradogeek, not only did you (or whoever was working on your server) fail to RTFM about how to replace a disk (and it gives clear, click-by-click instructions), you also ignored a big red warning message and intentionally changed into the manual mode of the volume manager to allow you to mess up your pool this way. Yes, FreeNAS will allow you to shoot yourself in the foot if you try hard enough (and it's true that, in some cases, "hard enough" isn't very hard at all). But to say it "assumed" you wanted to add the disk as a stripe is simply incorrect--you told it this, explicitly, at least twice before it happened.
 

William Grzybowski

Wizard
iXsystems
Joined
May 27, 2011
Messages
1,754
It makes no sense that replacing a drive would cause this.

I believe this is a zpool parse error. Can someone with the problem paste the output of the command "zpool status"?
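
For anyone who wants to check their own output before pasting it, here is a rough sketch of what to look for. It scans the config section of "zpool status" text for a top-level vdev that is a bare disk. The sample text is made up for illustration (it is not output from anyone in this thread), the function name is hypothetical, and the parser assumes the usual two-space indentation per nesting level.

```python
# Illustrative input only: pool and device names are invented for this sketch.
SAMPLE = """\
  pool: tank
 state: ONLINE
config:

\tNAME        STATE     READ WRITE CKSUM
\ttank        ONLINE       0     0     0
\t  raidz1-0  ONLINE       0     0     0
\t    ada0    ONLINE       0     0     0
\t    ada1    ONLINE       0     0     0
\t    ada2    ONLINE       0     0     0
\t  ada3      ONLINE       0     0     0

errors: No known data errors
"""

# Names that mark a grouping vdev rather than a bare disk.
GROUP_PREFIXES = ("mirror", "raidz", "spare", "replacing")


def single_disk_vdevs(status_text):
    """Return the names of top-level data vdevs that are bare disks."""
    rows = []
    in_config = False
    for line in status_text.splitlines():
        if line.strip().startswith("config:"):
            in_config = True
            continue
        if not in_config:
            continue
        if line.startswith("errors:"):
            break
        body = line.lstrip("\t")
        if not body.strip() or body.strip().startswith("NAME"):
            continue  # skip blank lines and the column header
        indent = len(body) - len(body.lstrip(" "))
        rows.append((indent, body.split()[0]))
    if not rows:
        return []
    pool_indent = rows[0][0]  # the first row is the pool itself
    found = []
    for indent, name in rows[1:]:
        if indent <= pool_indent:
            break  # reached a logs/cache/spares section
        if indent == pool_indent + 2 and not name.startswith(GROUP_PREFIXES):
            found.append(name)
    return found


print(single_disk_vdevs(SAMPLE))
```

A bare disk sitting at the same indentation level as a raidz or mirror entry is the "stripe" configuration this thread is about.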
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I feel like if people keep incorrectly doing this type of thing then the FreeNAS GUI might want to add a warning/extra confirmation when attempting to add a single disk vdev to a pool. I don't see the harm in an extra check and it would have already saved 2 people from having to redo their pool this week alone which seems worth it.

Yeah, because responsibility can always be fixed in software, can't it? The problem isn't that he tried to add another disk. The problem is he wasn't even in the right *part* of the WebGUI. To use an analogy, he was going to Control Panel trying to find his Word document and then was surprised when he went into admin tools and deleted a partition.

Well duh.. you weren't in the right place to begin with!

So no, probably wouldn't have saved him.

Edit: Oh gee.. read the rest of the thread and it *didn't* save him. LOL.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
There really are at least two unrelated issues going on here. The first is that (you're assuming--quite reasonably, but nonetheless assuming) the user went about replacing the failed disk in completely the wrong way. Maybe that's partially a UI issue(*); it's definitely at least mostly a PEBCAK/RTFM issue. I don't know if there's any way a FreeNAS bug could have caused this configuration--I don't think so, but I'm certainly not in a position to know it's impossible. I know you know a lot about FreeNAS and ZFS, but I doubt even you know the codebase intimately enough to say out of hand that it's impossible. Highly unlikely to be sure, but perhaps not impossible. Now, if @coloradogeek would clarify precisely what he did and did not do, that would help answer this question--but that hasn't happened yet.

* Just because the process is documented doesn't necessarily mean that the UI can't stand to be improved. There are a number of places in the FreeNAS GUI that appear to do the same thing, but in fact are slightly different, and could probably be improved. I don't really have an opinion as to whether this is one of them, but two users in less than a week trying to use the Volume Manager to replace a disk would tend to suggest that the GUI design could be improved in that regard.

The second issue is that the GUI allows users to irreversibly destroy their pools' redundancy. In a GUI that aims to pad the sharp edges of FreeBSD and ZFS, users should be warned obnoxiously before doing something like that--and they are, at least sometimes (I can't possibly have tested every possible configuration, but a warning is definitely implemented). I think the warning could be improved a bit (as I noted in the bug), but it's certainly there, and the big bold red warning text should at least get the user to look into what it means, if they don't already know. Now, presuming @coloradogeek did what you (and I) think he did, that warning didn't stop him--he went to the ZFS volume manager, selected his existing volume to extend, selected the disk, clicked Extend Volume, got the warning, ignored it, switched to manual mode, selected the volume to extend, selected his disk, and clicked the action button there. The warning should be there, and it should be robust (i.e., it should come up any time a user tries to add a single-disk vdev to a redundant zpool), even if it doesn't always keep the user from shooting himself in the foot.
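
As a rough illustration of what "robust" could mean here, this is a minimal sketch of such a check. It is not FreeNAS code: the Vdev class, the extend_warning function, and the disk names are all hypothetical, and the model only captures the one rule described above (warn whenever a non-redundant vdev is about to be added to a pool whose existing vdevs are all redundant).

```python
from dataclasses import dataclass

# Vdev types that tolerate the loss of at least one member disk.
REDUNDANT_TYPES = {"mirror", "raidz1", "raidz2", "raidz3"}


@dataclass(frozen=True)
class Vdev:
    """Hypothetical description of one top-level vdev."""
    kind: str     # "disk" for a bare single disk, else "mirror", "raidz1", ...
    disks: tuple  # member disk names

    @property
    def redundant(self):
        return self.kind in REDUNDANT_TYPES and len(self.disks) >= 2


def extend_warning(pool_vdevs, new_vdev):
    """Return a warning string if adding new_vdev would make a fully
    redundant pool depend on a non-redundant vdev, otherwise None."""
    pool_is_redundant = bool(pool_vdevs) and all(v.redundant for v in pool_vdevs)
    if pool_is_redundant and not new_vdev.redundant:
        return (
            "Adding %s as a %s vdev stripes it with the existing vdevs; "
            "if it fails, the whole pool is lost."
            % (", ".join(new_vdev.disks), new_vdev.kind)
        )
    return None


pool = [Vdev("raidz1", ("ada0", "ada1", "ada2"))]
print(extend_warning(pool, Vdev("disk", ("ada3",))))           # warning text
print(extend_warning(pool, Vdev("mirror", ("ada3", "ada4"))))  # None
```

The point of putting the check on the operation itself, rather than on one screen, is that it would fire in both the automatic and the manual mode of the volume manager.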
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
danb35,

I know because I've done TeamViewer sessions with more than a dozen people. They always do the same thing. They offline the disk and then try to replace the disk by adding a disk to the vdev. We've literally had dozens of people replace disks on all the major FreeNAS versions. So which story is more likely?

1. Some random guy had a random problem that hasn't hit anyone except him.
2. He made a mistake that has been documented and proven to have been done by several dozen other people.

So which is it?

I know where I stand.

And now you're trying to change the subject to argue the WebGUI. Yeah, we got the manual, it tells you how to do it. If you can't read the manual then no amount of perfection in the WebGUI is going to solve your ability to follow simple directions.

Yes, the GUI gives you complete power. People *have* *wanted* to add single disks before. And you can fix the problem.. add a second disk to the vdev and make it a mirror. So no, "irreversibly destroy their pool's redundancy" is VERY incorrect.
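
To make the distinction concrete, here is a small sketch of the three underlying zpool operations involved: the one that causes the problem ("add"), the one a disk replacement should use ("replace"), and the repair mentioned above ("attach"). The subcommands are real zpool subcommands, but the pool and device names are placeholders, the helper functions are hypothetical, and nothing is executed. On FreeNAS the GUI normally drives these operations, so treat this as an explanation rather than a procedure.

```python
def zpool_add(pool, new_disk):
    # Adds new_disk as its own top-level vdev, striped with the existing
    # vdevs. This is the configuration the thread is complaining about.
    return ["zpool", "add", pool, new_disk]


def zpool_replace(pool, old_disk, new_disk):
    # Swaps a failed member of an existing vdev for a new disk and
    # resilvers onto it. This is what replacing a dead drive maps to.
    return ["zpool", "replace", pool, old_disk, new_disk]


def zpool_attach(pool, existing_disk, new_disk):
    # Attaches new_disk to existing_disk, turning a single-disk vdev
    # into a two-way mirror.
    return ["zpool", "attach", pool, existing_disk, new_disk]


# Placeholder names: "tank" is the pool, "ada*" are disks.
for argv in (
    zpool_add("tank", "ada3"),
    zpool_replace("tank", "ada1", "ada3"),
    zpool_attach("tank", "ada3", "ada4"),
):
    print(" ".join(argv))
```

Note that attaching a mirror disk restores redundancy for the stray vdev, but it does not remove that vdev from the pool.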

The bottom line.. you can't fix stupid. No amount of coding can ever fix stupid. If you can't handle some responsibility then don't argue with me that you want my software to be dumbed down to your level.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
I agree it's much more likely that he simply did the wrong thing, and I think if you read my post more closely you'll see that I said as much. I just recognize that I'm not omniscient and thus can't say with 100% certainty that that's what he did.

Discussing possible changes to the WebGUI is not changing the subject; it's integral to this issue. First, the warning. It's a basic principle of UI design (any UI, really, and especially a GUI) to warn the user before doing something irreversible(*), and especially if that something is likely to be irreversibly bad. There is a warning. It looks like @William Grzybowski might suspect that it's not implemented as comprehensively as it could be, and I think the wording could be improved a little bit, but it's there. It certainly should be enough to put users on notice.

* Adding vdevs is irreversible, and there's no way in the GUI to add that mirror disk. I can't count the number of times you've told people here not to use the CLI to make changes to the pool, but even leaving that aside, the error is not reversible or repairable through the GUI.

Second, the UI for doing the disk replacement. We've seen two users on this forum, this week, who appear to have made exactly the same mistake in attempting to replace a disk. I remember others in the recent past, and you say you've worked personally with over a dozen more. Your obvious conclusion is, "people are idiots and won't RTFM," and there's no doubt a lot of truth to that. My conclusion is that if the error is that common, the developers should seriously consider whether the design could be improved. Bad design does not cease to be bad design just because it's well documented.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
danb35 said:
Discussing possible changes to the WebGUI is not changing the subject; it's integral to this issue. First, the warning. It's a basic principle of UI design (any UI, really, and especially a GUI) to warn the user before doing something irreversible(*), and especially if that something is likely to be irreversibly bad. There is a warning. It looks like @William Grzybowski might suspect that it's not implemented as comprehensively as it could be, and I think the wording could be improved a little bit, but it's there. It certainly should be enough to put users on notice.

The warning was in red text and made it pretty clear you were doing something that would or could remove redundancy. What more than RED text do you want?

William and I already discussed this thread on IRC and I think he understands the problem and that it's not a parsing error. He was thinking that you were telling the WebGUI to do something and it was doing something else. This clearly isn't the case or we'd have had tons of complaints about this problem. This is a very localized problem that a small subset of users run into because they didn't want to RTFM and instead started doing things "hoping to get the right outcome", even going so far as to ignore red text that, in my opinion, is very clear.

danb35 said:
* Adding vdevs is irreversible, and there's no way in the GUI to add that mirror disk. I can't count the number of times you've told people here not to use the CLI to make changes to the pool, but even leaving that aside, the error is not reversible or repairable through the GUI.

Yep, and that will be fixed in 9.3 from what I understand.

danb35 said:
Second, the UI for doing the disk replacement. We've seen two users on this forum, this week, who appear to have made exactly the same mistake in attempting to replace a disk. I remember others in the recent past, and you say you've worked personally with over a dozen more. Your obvious conclusion is, "people are idiots and won't RTFM," and there's no doubt a lot of truth to that. My conclusion is that if the error is that common, the developers should seriously consider whether the design could be improved. Bad design does not cease to be bad design just because it's well documented.

Yes, and warnings with red text are basically the best we can do. Anything else will overly hinder your ability to "do what you want". If you were around pre-9.x you have used the old "legacy" volume manager. It let you do anything you wanted. The "new" volume manager limits control to a major extent to ensure people don't do things that are often (keyword: often) stupid. The problem: they screwed over a major subset of power users that often want to do those "stupid" things. Now the legacy volume manager is back because there's no easy solution to the problem.

At some point you can't stop a user from doing things that are stupid and you can only put in some red text warnings and hope that they actually choose to obey them.

Regardless, some people get *really* stupid and go to the CLI for everything. You realize there's been talk by some people of completely banishing the ability to log in as a user or using terminal because "so many people have lost data because they used the CLI"? You have to draw the line somewhere, and anything except the absolutely most conservative choice will probably screw over far more people than it helps.

Sorry, but at this point I'm going to exit the conversation. I have better things to do than argue the merits of our WebGUI, which have been discussed internally and I tried to convey in this thread.

Good luck to all, RTFM and listen to red text and red warnings. They are there for a reason!

I've so far heard nothing but "our WebGUI needs work" yet the only recommendation that's been provided that is actually logical is "add a warning". Then the truth comes out.. the warning is there. When you can provide some actual constructive feedback on what in particular needs to be added then feel free to comment. But just arguing that "the WebGUI needs improvement" is unproductive.
 