Workaround/Semi-Fix for Mountroot Issues with 9.3!

Status
Not open for further replies.

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
Esteemed Community:

Many of you with bootable USB drives have reported difficulty getting FreeNAS 9.3 to boot, even on recommended hardware: the process dies at the infamous "mountroot" prompt. We have now had this report on at least three different recommended SuperMicro models (X10SLL, X10SLM, and X9DRD-7LN4F). I was hit by it myself when a friend asked me for a new FreeNAS build on bare metal (X10SLL), so we put in some effort to track it down. In the bug reports I had called this a "fix", but (after Jordan yelled at me for the hubris of calling it a "fix") we're going to downgrade it to "workaround" until we get more clarity on what's going on and the senseis at iXsystems can investigate further.


But the good news is: many/most of you using recommended hardware getting the mountroot error on boot should now be able to boot, for the foreseeable future, using the following guide. Your hardware is fine; the problem has more to do with the peculiarities of the booting sequence, and the fact that you are using the USB bus to boot.

NOTE: Even if you're not using recommended/server grade hardware, there's a good chance this will work. This problem is one that will probably be ubiquitous across many motherboard models and chipsets. In any case, you can't hurt anything by trying this, if you're having a mountroot issue. Go ahead and try, and if it works, reply with your motherboard make and model so that others might search and find the discussion.

As is usual for me, I'll explain both what to do, and try to give some insight as to what the hell is going on while I am doing it. A disclaimer though: we are talking about parts of the process that I, personally, do not have a very good grip on, so I may say some horsecrap in my attempts to explain what's going on. Even so, the workaround process I outline below should solve your problem.

So here we go.

First of all, if I understand Cyberjock correctly, booting is a multi-stage process. First, the BIOS recognizes your boot devices and reads the sectors specifically designated to get the initial stages of the operating system going. GRUB (which can hand off to a variety of operating systems) then takes over, and at this point your boot USB device is re-recognized in the context of a FreeBSD device (like /dev/da0).

Here's the problem. Apparently, for whatever reason, there is a small lag between when GRUB takes over and when your thumb drive is *actually* accessible in a /dev/da0 context. If access is attempted before the device is available, THAT is when you drop to mountroot, and then you're screwed.

So here is what you do:

Step 1: GRUB loads.
grubloading.jpg


Step 2: When you get to the GRUB screen (this is the screen that pauses for 5 seconds with the FreeNAS 9.3 option highlighted), IMMEDIATELY HIT THE ESCAPE KEY to stop GRUB.

grubloading2.jpg


Step 3: Having done this, hit "e" to edit. You will now see a list of what we call loader tunables, which override default behaviors in the loader, and you will be in a rudimentary text editor similar to "nano" (if you are familiar with it).

readytoedit.jpg


Step 4: You will need to add your OWN tunable here, manually. That tunable is as follows:

set kFreeBSD.kern.cam.boot_delay="50000"

This is shown below. Note: do not scroll down too far. You want this line to be inside the "Normal Bootup" entry in this file. It doesn't really matter where within that entry you put it, but do not go so far that you end up outside of it.

loader_set.jpg


This number 50000 is almost certainly much, much more than you actually need, but we think it would be impossible to need *more* than this, so let's just start with this number.

EDIT: A user posted later in this thread that 50000 was NOT enough on his ASRock board with some (presumably cheapo) USB thumb drive. He had to go as high as 500000!


Step 5: At this time, press F10 to BOOT. For this one boot (and this one boot only), your tunable will be active. When prompted, select "normal boot". The boot will now pause for 50000 milliseconds (i.e., 50 seconds), which is obviously ample (obscenely ample) time to let the USB bus do whatever it has to do.
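To put the numbers in this thread into wall-clock terms, here is a minimal Python sketch of the millisecond arithmetic. It only assumes, as stated above, that the value is in milliseconds; the function names are mine and purely illustrative.

```python
def ms_to_seconds(delay_ms):
    """Convert a boot_delay value (milliseconds) to seconds."""
    return delay_ms / 1000.0


def format_delay(delay_ms):
    """Render a millisecond delay as whole minutes and seconds."""
    total_seconds = delay_ms // 1000
    minutes, seconds = divmod(total_seconds, 60)
    return f"{delay_ms} ms = {minutes} min {seconds} s"


if __name__ == "__main__":
    # The two values reported in this thread.
    for value in (50000, 500000):
        print(format_delay(value))
```

So the 500000 that one user needed works out to more than eight minutes of waiting on every boot, which is a good reason to trim the value once you know the machine boots.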

final.jpg


And you should boot now!!!! woo hoo!


Step 6: NOW, once you're booted, the first thing you're going to want to do is to go into your FreeNAS GUI, and go to system->tunables and add a tunable, with category "loader". The name of the loader tunable is exactly as above minus the kFreeBSD, in other words, kern.cam.boot_delay, and the value for the loader tunable will be 50000. Hit Save/OK (whatever), and now this should be a persistent setting. (In fact, you should actually see it, if you were to reboot and start the GRUB process and go into the edit screen).
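The renaming between the GRUB line and the GUI tunable is easy to get wrong by hand, so here is a small Python sketch of it. It only does the string manipulation described above (drop the kFreeBSD. prefix, keep the value). The dictionary keys are hypothetical labels of my own, not the actual FreeNAS GUI field names or API.

```python
GRUB_PREFIX = "kFreeBSD."


def parse_grub_set(line):
    """Split a GRUB 'set name="value"' line into (name, value)."""
    body = line.strip()
    if not body.startswith("set "):
        raise ValueError("expected a line starting with 'set '")
    name, sep, value = body[len("set "):].partition("=")
    if not sep:
        raise ValueError("expected name=value after 'set'")
    return name.strip(), value.strip().strip('"')


def gui_tunable(line):
    """Map the GRUB line to what you would type into the GUI.

    The keys below are illustrative labels, not real FreeNAS field names.
    """
    name, value = parse_grub_set(line)
    if name.startswith(GRUB_PREFIX):
        name = name[len(GRUB_PREFIX):]
    return {"variable": name, "value": value, "type": "loader"}


if __name__ == "__main__":
    print(gui_tunable('set kFreeBSD.kern.cam.boot_delay="50000"'))
```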

Play with the number (e.g., lower it by 10000 at a time and reboot). More than likely, 10000 will not be enough, but 20000 or 30000 will be.
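The step-down procedure can be written out as a short Python sketch. The boots_ok callback is hypothetical: in real life it is you, setting the tunable, rebooting, and noting whether you landed at mountroot. The sketch just makes the stopping rule explicit: keep the last value that booted.

```python
def lowest_working_delay(boots_ok, start_ms=50000, step_ms=10000):
    """Lower the delay one step at a time and return the last value that booted.

    boots_ok(delay_ms) is a stand-in for a manual reboot test.
    Returns None if even the starting value does not boot.
    """
    if not boots_ok(start_ms):
        return None
    best = start_ms
    candidate = start_ms - step_ms
    while candidate > 0 and boots_ok(candidate):
        best = candidate
        candidate -= step_ms
    return best


if __name__ == "__main__":
    # Pretend this machine needs at least 20000 ms (an invented threshold).
    print(lowest_working_delay(lambda delay_ms: delay_ms >= 20000))
```

Since the lag may vary from boot to boot, it is probably wise to leave some headroom above the lowest value that happened to work once.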

You're all set! At least until such time as the devs determine if this is actually the appropriate long-term fix, or the devs figure something else out.

Please, if this has fixed your problem, let us know right here. Thanks to Cyberjock for working jointly on this with me.
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
Thank you! This saved me just now!
Awesome! Can you report the motherboard, and whether or not you decided to tweak the delay setting, and to what?
 

enemy85

Guru
Joined
Jun 10, 2011
Messages
757
Not yet on 9.3 but great job!
 

Uncle_Steve

Cadet
Joined
Nov 22, 2013
Messages
9
Awesome! Can you report the motherboard, and whether or not you decided to tweak the delay setting, and to what?
This was a Dell Poweredge T310 -- so a Dell proprietary motherboard. (I notice a theme here; all my systems are old. Let's say I'm frugal.)

I did tweak the delay settings, but I didn't experiment looking for a minimum. I just used 50000. I plan to do some tests, but not on Christmas Eve. ;-)
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
That's fine. We have a lot of users with that. I'm sure they'll appreciate your confirmation.
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
Thanks! The fix worked!
Excellent! Keep the reports coming in. Also, if you guys remember, leave the motherboard make and model for easy searching for those that come after you.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,970
Great posting and great job!

I hope this "workaround" makes it into the next bug fix upgrade until the developers deem this a real fix. Not sure why it wouldn't be a real fix, to be honest; you're just taking into consideration possibly slow hardware such as the USB flash drive. I only use good, fast USB flash drives, but now I want to actually install a slow USB drive to see if it fails on my hardware.

I'm positive this fix will help a lot more than just folks with hardware on the "recommended" list.
 

BoogaBooga

Cadet
Joined
Sep 7, 2013
Messages
8
I am not sure it's the USB drive that is the problem. From the sounds of it, certain USB controllers take longer to come up than others.
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
Indeed. It's the controller's relationship to the whole process, not the thumb drive itself, as far as I know.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
Hahahahahahahahahahaha

FreeBSD has always needed a delay to allow the probing of (at first) SCSI and more recently a large variety of devices; we've been spanked before for having this value too small in order to make the NAS boot quickly. Previously there have been some attempts to fix this through SCSI_DELAY (in the kernel config file as option SCSI_DELAY), but boot_delay is a similar type of thing. See for example:

https://bugs.freenas.org/issues/1364

In my opinion Jordan's being a bit rude if he scolded you over this. There's probably a BETTER fix and Jordan's certainly welcome to pursue it, but if the problem is one of device probing needing a longer time for device registration to occur, you've basically rediscovered FreeBSD's historical primary strategy for handling this: wait longer and pray it happens within the bigger window. I've not seen a case where 15000 is too small but then again most of our stuff is virtualized these days and on virtual machines we actually take the OS defaults.
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
For the record, the scsi_delay thing is what we were going to go after next, had we not had success with boot_delay.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
It's a similar effect just happening at a different point.

So anyways, again, feel good about figuring it out.
 

Delltek

Cadet
Joined
Dec 30, 2014
Messages
1
Worked for me. Awesome, thank you very much.
--------------------------------------------------------------
Mobo: GIGABYTE GA-A55M-DS2 FM1
 

Eli Anthony

Cadet
Joined
Dec 29, 2014
Messages
1
This fix did not work for me. But, I may have gotten around it.

My setup:
Mobo: Biostar A68N-5000
FreeNAS 9.3 installed onto a SanDisk Ultra Fit USB 3.0 16 GB flash drive

I got the "mountroot" issue with and without trying the fix mentioned above--with the flash drive plugged into a USB 3.0 port. I tried putting it into a USB 2.0 port, and I booted just fine.
 

jkh

Guest
The work-around is now in the current update (FreeNAS-9.3-STABLE-201412301712). It's still a work-around in the sense that all we're doing is waiting without knowing why, or what specifically we are waiting for.

What we don't know, and would still very much like to know for example, is what the USB device / controller returns when in the intermediate state that causes the mountroot> prompt to appear. Clearly, at some point in the mountroot() code, as we're descending through the filesystem and then device driver layers, the device with the root filesystem on it is returning *some* error code or otherwise failing to appear, and knowing the specifics would be at least the road to a real fix, since some devices may appear very quickly and others may wait even longer than the chosen delay interval. If there was some way to detect a device going "Wait, wait, I'm still getting dressed! Don't start the car!" then we could poll on such a device and not penalize the default boot path for all devices, which is what is now happening.
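The difference between an unconditional delay and polling can be illustrated with a userland Python sketch. This is not the kernel's mountroot code, and it assumes something the paragraph above says is still unknown: that there is a cheap "is it ready yet?" check. Here that check is simply whether a path exists, which may not match what the USB controller actually does. All names are illustrative.

```python
import os
import time


def wait_for_device(path, timeout_s, poll_interval_s=0.1, exists=os.path.exists):
    """Poll until `path` appears or `timeout_s` elapses.

    Returns True as soon as the device shows up, so fast devices pay
    almost nothing; returns False if the deadline passes first.
    `exists` is injectable so the readiness check can be swapped out.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_s)


if __name__ == "__main__":
    # Current directory always exists, so this returns immediately.
    print(wait_for_device(os.curdir, timeout_s=50.0))
    # A path that never appears: gives up after the timeout.
    print(wait_for_device("/nonexistent/device", timeout_s=0.3))
```

With a fixed delay every boot waits the full interval; with polling only the slow devices do, and the timeout just becomes an upper bound. The hard part, as noted, is knowing what to poll on.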

This is why just pulling a delay value out of our asses and waiting for that period of time unconditionally is a hack and not really a fix at all, as anyone with a background in CS could tell you. Worse, it just masks the problem, perhaps to have it appear another day in some even more subtle and hard to debug fashion, which is why I "scolded" Kyle for calling it a fix vs calling it what it was: A work-around, not at all a fix.

Even worse, now that we've added that delay, future generations will probably never touch it again because "it's what's needed to make things work", even long after the last device which actually needed that delay is long dead. This is how things become enshrined into lore, to the detriment of overall software quality. I'm not happy about us having to do this at all. :(
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I can understand your dissatisfaction with this. Now that we have some specific hardware that exhibits the behavior, maybe we should buy what DrKK had used and try repeating the process.

I agree that I don't like the answer. In my old career we called this 'tribal knowledge' and mentioning that we did something because of 'tribal knowledge' was one of the fastest ways to get fired. Either we know what causes "that one thing" or we need to do an investigation and find out. But if we aren't doing either we are screwed up.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
Has anyone actually done a verbose boot on a platform that exhibits this issue and recorded it to see if anything obvious sticks out, or am I the only one who maintains a serial console infrastructure anymore... (grumble grumble)
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
Also conspicuously missing is a description of what USB thumb drives have been seen in relation to this issue.

I suppose since I've got some X10 gear on order and the resources to investigate something like this, I'm the natural victim to do additional research, not sure I'll have the time but I'll see what I can do. Sigh.
 