Fails to boot - Can't import boot-pool

brando56894

Wizard
Joined
Feb 15, 2014
Messages
1,537
Edit:
https://jira.ixsystems.com/browse/NAS-108200

Solutions
Add rootdelay=10 to the end of GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub.d/truenas.cfg, then regenerate the GRUB config with the update-grub command.

or

Add ZFS_INITRD_POST_MODPROBE_SLEEP='10' or ZFS_AUTOIMPORT_TIMEOUT='10' to /etc/default/zfs, then regenerate the initrd with update-initramfs -k all -u

They don't last beyond an upgrade though.
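
Because the GRUB change has to be redone after every upgrade, it may help to script it. The following is only a minimal sketch, not an official TrueNAS procedure: the helper name add_rootdelay is made up for illustration, and it assumes the file holds a single-line, double-quoted GRUB_CMDLINE_LINUX_DEFAULT="..." assignment.

```shell
#!/bin/sh
# add_rootdelay FILE [SECONDS]
# Hypothetical helper: appends rootdelay=SECONDS inside the quoted value of
# GRUB_CMDLINE_LINUX_DEFAULT, unless a rootdelay= option is already present.
# Assumption: the assignment sits on one line and uses double quotes.
add_rootdelay() {
    file=$1
    delay=${2:-10}
    # Nothing to do if the line already carries a rootdelay option.
    if grep -q '^GRUB_CMDLINE_LINUX_DEFAULT=.*rootdelay=' "$file"; then
        return 0
    fi
    scratch=$(mktemp) || return 1
    # Insert " rootdelay=N" just before the closing double quote, then copy
    # the result back so the original file keeps its owner and permissions.
    sed "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 rootdelay=${delay}\"/" \
        "$file" > "$scratch" && cat "$scratch" > "$file"
    rc=$?
    rm -f "$scratch"
    return $rc
}

# Usage (as root), followed by regenerating the GRUB config:
#   add_rootdelay /etc/default/grub.d/truenas.cfg 10 && update-grub
```

Running it a second time leaves the file unchanged, so it should be safe to call after each upgrade.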
----

OK, this one confuses me... I installed SCALE on my home server so I could help with testing. It installed just fine, but when I boot the installed system it fails with something about a SATA error or timeout that scrolls by too quickly for me to read. It initializes the other devices connected to my HBA and then hangs 16 seconds into the boot process, reporting that two SATA links are down. Pressing Enter drops me to the initramfs shell.

I've been using the same HBA on Arch Linux for years, as well as on other distros, so it should have no issues. I did move things around to shoehorn my GTX 1070 back in there, so I thought maybe a cable had come loose, but I booted into Arch and everything works as expected.

HBA: Broadcom / LSI SAS2116 PCI-Express Fusion-MPT SAS-2 [Meteor] (rev 02)
Arch Kernel: 5.4.64-1-lts also 5.8.8.arch1-1
Motherboard: Asrock Rack X399D8A-2T
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,176
Are you using the latest firmware, P20.00.07?
 

groveraus

Cadet
Joined
Aug 23, 2020
Messages
5
I think there are some quirks that creep into one build and get rectified in the next. As an example, I was humming along nicely, having installed over a month ago and applying a new nightly every couple of days, and then the last one wouldn't respond after a reboot. I attached a screen and keyboard and rebooted again; it was having trouble with GRUB, UEFI and then some of the modules. Frustrated, I waited a day or two, booted the last known healthy entry from the list and then applied the latest nightly from a new download. All works fine, humming away nicely again. Considering the stage of development the product is in, I can live with that: slightly annoying but liveable. Obviously I'd be really concerned if this was the behaviour in a production release.
 

brando56894

Yeah things definitely get broken in each build.

Are you using the latest firmware, P20.00.07?

yep

[bran@server ~]$ sudo systool -a -v -c scsi_host | egrep "Class Device|model|version|proc_name|info|fwrev"
Class Device = "host0"
Class Device path = "/sys/devices/pci0000:00/0000:00:03.1/0000:0c:00.0/host0/scsi_host/host0"
proc_name = "mpt2sas"
version_bios = "07.23.01.00"
version_fw = "20.00.07.00"
version_mpi = "200.23"
version_nvdata_default= "14010006h"
version_nvdata_persistent= "14010006h"
version_product = "LSISAS2116"
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,691
We agree with you... the NIGHTLY train is a different DEVELOPER experience from the USER release experience.
 

brando56894

I recorded the boot messages so I could actually read them, and it looks like the real issue is that it can't import the boot pool: it says the pool can't be found. My first install was on a flash drive and the second on a SATA SSD.

Command: /sbin/zpool import -N 'boot-pool'
Message: cannot import 'boot-pool': no such pool available

Yet when I boot back into Arch and run zpool import -a -d /dev/disk/by-id, it imports the pool without issue. Plain zpool import -N 'boot-pool' didn't work there, though.

[bran@server ~]$ zpool status
  pool: boot-pool
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(5) for details.
  scan: none requested
config:

        NAME                            STATE     READ WRITE CKSUM
        boot-pool                       ONLINE       0     0     0
          wwn-0x5002538d403f22e9-part3  ONLINE       0     0     0

errors: No known data errors
 

Aphid

Cadet
Joined
Oct 9, 2015
Messages
2
I am having the same issue. Running 20.10-ALPHA, installed onto a single SATA SSD. The boot-pool fails to mount.
 

Aphid

It should be noted that after this error I can type **zpool import -N 'boot-pool'** at the prompt and then **exit**, and TrueNAS starts up correctly. That is manually typing in the exact command that had just failed...
 

brando56894

Interesting, I just tried the same and it continued booting as well. Thanks.

Edit: the underlying issue still persists, though. An attempted update errors out at 95% with the following message:
Error:[EFAULT] Command ['chroot', '/tmp/tmpnfxnrsp4', 'update-grub'] failed with exit code 1: /usr/sbin/grub-probe: error: failed to get canonical path of `/dev/disk/by-id/wwn-0x5002538d403f22e9-part3'.
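
The "failed to get canonical path" part of that error suggests grub-probe could not resolve the by-id name to a real device node. One way to check whether a name resolves in a given environment is sketched below. The helper name canonical_or_fail is hypothetical, and the idea that the updater's chroot is missing the /dev/disk/by-id links is only a guess, not something confirmed in this thread.

```shell
#!/bin/sh
# canonical_or_fail PATH
# Hypothetical helper: prints the canonical (symlink-free) form of PATH, or
# writes a message to stderr and returns 1 if PATH does not resolve to an
# existing file or device node.
canonical_or_fail() {
    # -e follows symlinks, so a dangling link counts as "does not exist".
    if [ ! -e "$1" ]; then
        echo "cannot resolve: $1" >&2
        return 1
    fi
    readlink -f "$1"
}

# Usage, with the device name from the error message:
#   canonical_or_fail /dev/disk/by-id/wwn-0x5002538d403f22e9-part3
```

Comparing the result on the running system with the result inside a chroot would show whether the link is visible in both places.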
 

brando56894

So I did a fresh install with the latest ISO: same result. To see more clearly what it was hanging on, I disconnected my HBA so it wouldn't initialize my 16 drives and the console would be easier to read... but to my surprise it booted up without issue! I reconnected the HBA and got the same issue as originally stated: it can't import the boot pool.

After a few more boot comparisons with and without the HBA connected, and digging through dmesg, this is the only relevant info I can find relating to the hangs when the HBA is connected:
Code:
[Wed Oct 28 18:48:23 2020] ata4: SATA link down (SStatus 0 SControl 300)
[Wed Oct 28 18:48:24 2020] ata5: SATA link down (SStatus 0 SControl 300)
[Wed Oct 28 18:48:24 2020] ata6: SATA link down (SStatus 0 SControl 300)


I have no such errors in Arch, so maybe it's something specific to the Debian/TrueNAS kernel?
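
To make the with/without-HBA comparison repeatable, the ports reporting a down link can be pulled out of a saved kernel log. This is just a small sketch: the helper name sata_links_down and the log file name are made up. Keep in mind that a SATA port with nothing plugged into it also reports "SATA link down", so these lines are not necessarily errors on their own.

```shell
#!/bin/sh
# sata_links_down FILE
# Hypothetical helper: prints the ataN port names that report
# "SATA link down" in a saved kernel log, one per line, without repeats.
sata_links_down() {
    grep -o 'ata[0-9][0-9]*: SATA link down' "$1" \
        | cut -d: -f1 \
        | sort -u
}

# Usage:
#   dmesg -T > boot-with-hba.log
#   sata_links_down boot-with-hba.log
```

Running it against one log taken with the HBA connected and one without should show whether the set of down ports actually differs between the two boots.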

Edit: bug filed since I'm out of ideas and it's most likely a kernel config issue: https://jira.ixsystems.com/browse/NAS-108200
 

Attachments

  • arch-kernel-config.txt (248.1 KB)
  • truenas-kernel-config.txt (228.2 KB)

cig_in_mouth

Cadet
Joined
Dec 24, 2020
Messages
3
Did you get any update? I have the same issue on different hardware: I have an onboard Intel SCU port. I tried all flavors, i.e. the nightly and the other two, but nothing changed: the install succeeds, but booting drops me to initramfs.
 

cig_in_mouth

One more thing: if I import boot-pool and then press Ctrl+D, the boot continues, but the monitor says "invalid format". I don't know why.
 

beagle

Explorer
Joined
Jun 15, 2020
Messages
90
Same problem running 20.12-ALPHA on my Dell R420 with an H310 controller in HBA mode, but I'm booting off an HDD in a USB enclosure.

After importing the boot-pool and typing 'exit' the system continues to load.
 

ctsoccer13

Cadet
Joined
Feb 3, 2021
Messages
2
Hitting the same issue on my R720. I put the H710 into IT mode and I'm booting off an SSD in the enclosure. Importing the boot-pool manually works, and everything boots and runs fine.

Strangely, I have a friend with a very similar setup to mine and he never hit the issue. The only difference between our setups is that I'm booting in UEFI mode while he's booting in BIOS.
 

shadofall

Contributor
Joined
Jun 2, 2020
Messages
100
That's interesting. I wonder if there is an issue with the H310 cards and Debian. Adding my own little story for context: I have an H310 that was in my SCALE box. It didn't drive my boot-pool, so no issues there, but it was throwing a bus degraded error and running at 4x. (Is that a thing if it's not running on a Dell motherboard?) Beyond that it was working fine. I swapped it with the generic non-branded LSI card (same chipset) from my OMV box, which cleared the 4x issue and runs at 8x. I booted my OMV box with the H310 to see if it had the 4x issue there too. It did. But also my primary data pool didn't import/mount; I had to do it manually. I rebooted a couple of times, with the same issue each time.
 

bay_wolf

Cadet
Joined
Feb 9, 2021
Messages
1
I tried reinstalling in BIOS mode and still have to manually import the pool at boot.
 

brando56894

After many months I've finally found the solution....and it was one I already knew because I also had to do it in Arch Linux months ago :confused:

The problem is that Linux boots up too quickly and doesn't give the HBA time to initialize. I'm not entirely sure how this affects the boot pool, since mine is connected directly to the motherboard and not the HBA. My theory is that Linux resets the SATA drives while attempting to initialize the HBA, and this also blocks Linux from accessing the boot-pool, so the import fails with the above error. By the time we type the command to import the pool by hand, the links have finished resetting, which is why it then works.
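
If that theory holds, the import only fails because the device node isn't there yet, so waiting for the node to appear should do the same job as a fixed delay. The sketch below illustrates the idea only. It is not what the TrueNAS initramfs actually does, the helper name wait_for_path is made up, and the device name in the usage comment is the one from the zpool status output earlier in this thread.

```shell
#!/bin/sh
# wait_for_path PATH [TIMEOUT_SECONDS]
# Hypothetical helper: polls once per second until PATH exists. Returns 0 as
# soon as it does, or 1 after TIMEOUT_SECONDS (10 by default) have passed.
wait_for_path() {
    target=$1
    limit=${2:-10}
    waited=0
    while [ ! -e "$target" ]; do
        [ "$waited" -ge "$limit" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Usage at the initramfs prompt:
#   wait_for_path /dev/disk/by-id/wwn-0x5002538d403f22e9-part3 10 \
#       && zpool import -N 'boot-pool'
```

Unlike a fixed sleep, this returns as soon as the device shows up, so it only costs the full timeout when the device never appears.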

The fix is to delay booting by 10 seconds, either via GRUB or by rebuilding the initrd, but that doesn't survive upgrades :(

Add rootdelay=10 to the end of GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub.d/truenas.cfg, then regenerate the GRUB config with the update-grub command.

or

Add ZFS_INITRD_POST_MODPROBE_SLEEP='10' or ZFS_AUTOIMPORT_TIMEOUT='10' to /etc/default/zfs, then regenerate the initrd with update-initramfs -k all -u
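
Since the /etc/default/zfs change also has to be re-applied after an upgrade, here is a minimal sketch of how it could be scripted. The helper name set_default is made up for illustration, and it assumes the file is a plain list of KEY='value' lines and that the value contains no "/" or "&" characters (true for a number of seconds).

```shell
#!/bin/sh
# set_default FILE KEY VALUE
# Hypothetical helper: makes sure FILE contains the line KEY='VALUE'.
# An existing uncommented KEY= line is rewritten; otherwise the line is
# appended. Commented-out lines are left alone.
set_default() {
    file=$1
    key=$2
    value=$3
    if grep -q "^${key}=" "$file"; then
        scratch=$(mktemp) || return 1
        # Rewrite the existing assignment, then copy the result back so the
        # original file keeps its owner and permissions.
        sed "s/^${key}=.*/${key}='${value}'/" "$file" > "$scratch" \
            && cat "$scratch" > "$file"
        rc=$?
        rm -f "$scratch"
        return $rc
    fi
    printf "%s='%s'\n" "$key" "$value" >> "$file"
}

# Usage (as root), followed by rebuilding the initrd:
#   set_default /etc/default/zfs ZFS_INITRD_POST_MODPROBE_SLEEP 10 \
#       && update-initramfs -k all -u
```

Calling it again with the same arguments leaves a single assignment in the file, so it can be run after every upgrade.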
 