No longer able to boot: "you need to load the kernel first"

Joined
Mar 9, 2024
Messages
1
Hello! I'm not sure where to post my question, as I am not running TrueNAS. I was "referred" here from this YouTube video discussing the benefits and awesomeness of zfs.

If I'm in the wrong place, or if this shouldn't even be posted, I sincerely apologize. I figured you all are experts in zfs, at least compared to the folks over on the Ubuntu/Linux Mint forums.

I'm experiencing issues with my new Linux Mint 21.3 install (based on Ubuntu 22.04) on a new desktop computer. I'm not new to Linux; I've been using it for almost 20 years, with Ubuntu as my primary desktop for the majority of that time. However, I've always used ext3/4. When setting up Linux Mint, I saw the option to use zfs, did some research into it, and found the benefits worth the uncertainty/unfamiliarity.

The OS ran fine for a few days. I attached a second hard drive to act as a mirror of the first, using the instructions here. I was having screen-related issues and was advised to install a BIOS update. After doing so, I was unable to boot:

Code:
error: file `/BOOT/ubuntu_udd128@/vmlinuz-6.5.0.21-generic' not found.
error: you need to load the kernel first.


After a few days of unsuccessful troubleshooting, I decided to zfs send the snapshots and reinstall LM. Wanting to recreate the issue, I repatched my BIOS, but my install survived. So I applied my snapshots and continued to rebuild my desktop. A few days later, after a reboot, I encountered the same error.
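Roughly, the send/receive round trip looked like this (pool, snapshot, and target names here are illustrative, not my exact ones; the run wrapper only records and prints each command, so this sketch is safe to paste):

```shell
# Dry-run sketch of the snapshot + send/receive round trip described above.
# 'run' only records and prints each command instead of executing it;
# drop the wrapper to run the commands for real.
CMDS=""
run() { CMDS="$CMDS$*;"; echo "+ $*"; }

# Recursive snapshots of both pools first
run zfs snapshot -r rpool@backup
run zfs snapshot -r bpool@backup

# Full replication streams to files on the external backup disk
run "zfs send -R rpool@backup > /mnt/backup/rpool.zfs"
run "zfs send -R bpool@backup > /mnt/backup/bpool.zfs"

# After reinstalling, receive them back (-F rolls the target back first)
run "zfs receive -F rpool < /mnt/backup/rpool.zfs"
```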

During my troubleshooting in GRUB, it seems that GRUB is unable to load files from the EFI partition. I do have Secure Boot enabled, and I've found that my motherboard (make and model below) makes it awfully difficult to disable. During the first round of troubleshooting, I tried disabling Secure Boot, to no avail.

I've tried running Boot Repair to no avail.

I've reverted bpool to the previous day's snapshot, to no avail.

I am fully prepared to reinstall again if needed. However, I'm convinced that the issue is with zfs; I've never had these boot issues before on ext4. When I drop to the GRUB command line, I get this message when trying to access the filesystem:

Code:
grub: Secure Boot forbids loading module from (hd3,gpt1)/grub/x86_64-efi/efs2.mod
...
(long list of files)


I am able to boot to a LiveUSB and mount both bpool and rpool without any issues. However, I did notice something odd. When inspecting the partition in Disks, partition 3 of /dev/nvme0n1 is showing as "Contents: unknown" whereas the mirrored drive /dev/nvme1n1 is showing as "Contents: Unknown (zfs_member 5000)" with the GUI properly labeling the second drive as "bpool".

When inspecting both partitions in parted, partition 3 of /dev/nvme0n1 is showing as File System "fat32" with a label of "EFI", whereas /dev/nvme1n1 is showing as File System "zfs" with a label of "bpool".
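In case it helps, this is the sort of non-destructive inspection I can run from the LiveUSB to compare the two partitions (device paths as in the lsblk output below; shown as a dry run that only prints the commands):

```shell
# Non-destructive checks on the suspect bpool partitions, runnable from a LiveUSB.
# zdb -l prints any ZFS vdev labels present on a device; wipefs -n only LISTS
# filesystem signatures and never erases anything.
checks() {
    for dev in /dev/nvme0n1p3 /dev/nvme1n1p3; do
        echo "== $dev =="
        echo "sudo zdb -l $dev"      # dry run: remove the surrounding echo to execute
        echo "sudo wipefs -n $dev"
    done
}
checks
```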

I'm convinced the issue here is either with how I set up mirroring or snapshots. After searching for this issue, it seems that zfs is stable enough to run a daily OS so I'm sure it's something I'm doing wrong. However, I am hesitant to reinstall LM a third time only to have this happen again.

I'm hoping those of you who are more familiar with zfs will be able to save me from a reinstall and having this happen again. Is there a way for me to rebuild bpool/boot partition without having to reinstall the entire OS?

My desktop specs:
  • Motherboard make and model: MSI PRO B760-VC WIFI
  • CPU make and model: Intel® Core™ i7-13700F
  • RAM quantity: 32 GB
  • Hard drives, quantity, model numbers, and RAID configuration, including boot drives:
    • Code:
      NAME        FSTYPE            LABEL                           MOUNTPOINT   SIZE MODEL
      loop0       squashfs                                          /rofs        2.4G
      sda                                                                        1.8T WDC WD20EZBX-00AYRA0
      ├─sda1      linux_raid_member starbuck:viper                               1.8T
      └─sda9                                                                       8M
      sdb                                                                        3.6T WDC WD40EZAX-00C8UB0
      └─sdb1      ext4              BACKUP                                       3.6T
      sdc         iso9660           Linux Mint 21.3 Cinnamon 64-bit              7.2G STORE N GO
      ├─sdc1      iso9660           Linux Mint 21.3 Cinnamon 64-bit /cdrom       2.9G
      ├─sdc2      vfat                                                           4.9M
      └─sdc3      ext4              writable                        /var/log     4.4G
      nvme1n1                                                                  953.9G KXG50ZNV1T02 TOSHIBA
      ├─nvme1n1p1 vfat                                                           512M
      ├─nvme1n1p2 swap                                                             2G
      ├─nvme1n1p3 zfs_member        bpool                                          2G
      └─nvme1n1p4 zfs_member        rpool                                        927G
      nvme0n1                                                                  931.5G MSI M450 1TB
      ├─nvme0n1p1 vfat                                                           512M
      ├─nvme0n1p2 swap                                                             2G
      ├─nvme0n1p3                                                                  2G
      └─nvme0n1p4 zfs_member        rpool                                        927G
      
    • Code:
      mint@mint:~$ zpool list
      NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
      bpool  1.88G   648M  1.24G        -         -     3%    33%  1.00x    ONLINE  /bpool
      rpool   920G  19.2G   901G        -         -     0%     2%  1.00x    ONLINE  /rpool
      
      mint@mint:~$ zpool status
        pool: bpool
       state: ONLINE
        scan: resilvered 15.7M in 00:00:00 with 0 errors on Fri Mar  8 02:21:23 2024
      config:
      
          NAME           STATE     READ WRITE CKSUM
          bpool          ONLINE       0     0     0
            mirror-0     ONLINE       0     0     0
              nvme0n1p3  ONLINE       0     0     0
              nvme1n1p3  ONLINE       0     0     0
      
      errors: No known data errors
      
        pool: rpool
       state: ONLINE
        scan: resilvered 10.5G in 00:00:13 with 0 errors on Sun Mar  3 00:59:56 2024
      config:
      
          NAME           STATE     READ WRITE CKSUM
          rpool          ONLINE       0     0     0
            mirror-0     ONLINE       0     0     0
              nvme0n1p4  ONLINE       0     0     0
              nvme1n1p4  ONLINE       0     0     0
      
      errors: No known data errors
      
      
    • Code:
      mint@mint:~$ zfs list
      NAME                                               USED  AVAIL     REFER  MOUNTPOINT
      bpool                                              648M  1.12G       96K  /bpool/boot
      bpool/BOOT                                         643M  1.12G       96K  none
      bpool/BOOT/ubuntu_uddl28                           643M  1.12G      139M  /bpool/boot
      rpool                                             19.2G   872G       96K  /rpool
      rpool/ROOT                                        14.9G   872G       96K  none
      rpool/ROOT/ubuntu_uddl28                          14.9G   872G     9.16G  /rpool
      rpool/ROOT/ubuntu_uddl28/srv                        96K   872G       96K  /rpool/srv
      rpool/ROOT/ubuntu_uddl28/usr                      1.09M   872G       96K  /rpool/usr
      rpool/ROOT/ubuntu_uddl28/usr/local                   1M   872G      504K  /rpool/usr/local
      rpool/ROOT/ubuntu_uddl28/var                      3.43G   872G       96K  /rpool/var
      rpool/ROOT/ubuntu_uddl28/var/games                  96K   872G       96K  /rpool/var/games
      rpool/ROOT/ubuntu_uddl28/var/lib                  3.40G   872G     3.25G  /rpool/var/lib
      rpool/ROOT/ubuntu_uddl28/var/lib/AccountsService   100K   872G      100K  /rpool/var/lib/AccountsService
      rpool/ROOT/ubuntu_uddl28/var/lib/NetworkManager    156K   872G      156K  /rpool/var/lib/NetworkManager
      rpool/ROOT/ubuntu_uddl28/var/lib/apt              96.1M   872G     96.1M  /rpool/var/lib/apt
      rpool/ROOT/ubuntu_uddl28/var/lib/dpkg             58.7M   872G     58.7M  /rpool/var/lib/dpkg
      rpool/ROOT/ubuntu_uddl28/var/log                  27.3M   872G     27.3M  /rpool/var/log
      rpool/ROOT/ubuntu_uddl28/var/mail                   96K   872G       96K  /rpool/var/mail
      rpool/ROOT/ubuntu_uddl28/var/snap                   96K   872G       96K  /rpool/var/snap
      rpool/ROOT/ubuntu_uddl28/var/spool                 132K   872G      132K  /rpool/var/spool
      rpool/ROOT/ubuntu_uddl28/var/www                    96K   872G       96K  /rpool/var/www
      rpool/USERDATA                                    4.27G   872G       96K  /rpool
      rpool/USERDATA/user_xxx111                        4.27G   872G     1.91G  /rpool/home/user
      rpool/USERDATA/root_tpy969                         292K   872G      196K  /rpool/root
      
      
  • Hard disk controllers: Not sure, chipset is Intel B760
  • Network cards: Realtek RTL8125B
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
TrueNAS works quite differently from a standard Linux installation, so I doubt you will find much help here, sorry. Also, you posted in the TrueNAS CORE forum. TN CORE is based on FreeBSD; there is no Linux in that product. TN SCALE is based on Linux, but the boot mechanism is quite locked down for the intended appliance use.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Forget that GRUB exists and just use ZFS Boot Menu. GRUB is nothing short of a dumpster fire.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Yes, it's really just a Linux distro packaged as a bootable EFI executable that mounts ZFS pools and loads the real kernel and environment from them. It sounds absurdly hacky, but works pretty well. Plus, no need to deal with GRUB, which is a huge plus.
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
Why is the Linux boot process so convoluted in the first place? I mean initramfs? Really? Why? Support 2 or 3 filesystems, load the kernel from one, switch to protected mode, jump to kernel ...

Is that really due to the fact that there are too many file systems to support in a small boot loader binary? Well, ext4, xfs, zfs - everything else is irrelevant, isn't it?
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
I mean initramfs? Really? Why? Support 2 or 3 filesystems, load the kernel from one, switch to protected mode, jump to kernel ...
It's stupider than that. Whatever the bootloader, someone else is already copying the kernel and whatever bits into memory. That someone already implements whatever filesystem is in use - UEFI implements FAT32, GRUB implements whatever. So what on Earth is the initramfs doing?

But I have spent quite some time thinking about this, recently. Partly due to recent troubles with GRUB (that led to the desperation move of printing out instructions for manually booting a system at work from GRUB's terminal, because the tooling to generate GRUB configs is completely and hopelessly broken), partly from following along what's happening at Oxide (where they boot an Illumos kernel with the minimal set of modules directly from the EEPROM that would normally hold the system firmware - 32 MB and it's like half empty, despite them having the kernel, PCI, NVMe and ZFS, plus whatever other bits they need).

Obviously, a minimal kernel image needs to be loaded by someone into RAM. UEFI can easily do that. So the question becomes "what must the kernel carry along to successfully load the rest and boot the system?". For whatever reason, Linux's answers to this seem to have converged around "half the userland", and I won't pretend to understand why. From my point of view, it's as simple as:
  1. UEFI loads kernel image into RAM and jumps into it
  2. Kernel takes over the hardware as per ACPI
    1. It doesn't need to do anything non-storage at this point, other than reset the PCIe roots to whatever it needs
    2. Storage drivers obviously need to be part of this kernel image - AHCI, NVMe, SCSI, USB Mass Storage, eMMC, (LSI HBA drivers, ...) ...
  3. Kernel reads whatever to decide what to boot
    1. With ZFS, this is as simple - in most cases - as importing all pools minimally and checking datasets for whatever magic property we want to define. No static compilation of config files into the image or other hacks necessary.
    2. I don't care enough about other filesystems to spare them any thought in this process.
  4. Kernel mounts the root filesystem it found/chose/was told to use
  5. Kernel loads the userland and runs init
  6. Userland proceeds to do nasty, unspeakable userland things
Why would we want an initramfs in the middle of all this? The only good reason I can think of is if we're building something akin to ZBM, where we need a relatively fancy UI before we've loaded anything from disk.
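The "magic property" check in step 3.1 could be as small as this sketch. (The property name org.example:bootme is made up, and the here-doc stands in for real `zfs get -H -o name,value` output, since I'm illustrating the selection logic, not a finished bootloader.)

```shell
# Sketch of step 3.1: pick the boot dataset by a user property.
# 'org.example:bootme' is an invented property name; the here-doc simulates
# the tab-separated output of:  zfs get -H -o name,value org.example:bootme
zfs_get_output() {
cat <<'EOF'
rpool/ROOT/ubuntu_uddl28	on
rpool/ROOT/old_env	off
bpool/BOOT/ubuntu_uddl28	-
EOF
}

# Keep the first dataset whose property value is "on".
bootds=$(zfs_get_output | awk '$2 == "on" { print $1; exit }')
echo "boot dataset: $bootds"
```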
 
Joined
Oct 22, 2019
Messages
3,641
I had always figured that the initrd/initramfs is a "kitchen sink" solution, so that one can boot with any PC, regardless of configuration or hardware? Unlike an embedded device, the "Wild West" of consumer PCs requires a vast bundle of drivers and files to "cover everyone out there", at the cost of efficiency.

I could be way off.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
To some extent, but whatever really needs to be present is in the kernel anyway, in Linux and most Unix variants. An initramfs only makes sense to me if you want a userland - but why would you want a userland during early boot? To manually recover from boot failures? That doesn't even make much sense to me; when's the last time that worked? Systems tend to fail to boot in recoverable ways because either the bootloader is misconfigured [probably because the whole process is so convoluted] or something happened after the root filesystem was mounted and the real userland was already available.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Oooh, no one mentioned that Linux likes to boot in 16-bit extreme legacy mode. Has to do with 16-bit real mode using segmented addressing... insane. And Linus wants it that way for compatibility!

I always thought that x86_64 should boot into 64-bit mode. But what do I know... (maybe UEFI changed the behavior).

On the subject of InitRD for TrueNAS, it is likely that ZFS is not built into the kernel. So at a minimum, the InitRD would have to include ZFS support.

One thing I noticed about RHEL is that if you have a separate "/boot", it needs to be 1/2 a GigaByte! (Aka 500 MegaBytes!) Downright silly. I remember the days of a 50 MByte "/boot", and that covered room for a dozen kernels. (Before I started with ZFS on my Linux home computers, I built EXT234 into the kernel, thus never needed an InitRD.)
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
that x86_64 should boot in to 64 bit mode. But what do I know... (maybe UEFI changed the behavior).
It does; UEFI runs in protected mode at least - in fact, other than some old Macs, all UEFI systems run in long mode/64-bit mode.
There is no reason to reset the CPU back to real mode, nor do I think that's viable from long mode. Of course, that doesn't mean Linux is being sane, but I'd be surprised if it didn't start in long mode.

In fact, there's a proposal from Intel to drop everything but long mode (protected mode 32-bit code would still run inside long mode, as it does now), which wouldn't require major changes to the boot process as far as an OS is concerned, though obviously the system firmware would be very different.
 

RetroG

Dabbler
Joined
Dec 2, 2023
Messages
16
Since it was touched on... why isn't an alternative to GRUB provided on TrueNAS SCALE? It's really a heaping pile of $$$$ that gives no end of misery across the handful of servers I've dealt with. It should be noted these issues have come and gone, but boy is it frustrating when an update brings you a new and exciting way for a boot to fail, requiring manual intervention.

the misery I've had the joy of encountering (specifically on TrueNAS+GRUB, anyways..):

GRUB painfully slowly reading something (the header?) off each disk in a 60-drive JBOD (for about 5-10 minutes) before claiming it's out of RAM (on a server with 512 GB of RAM...), forcing you to turn off the JBOD while booting the server to GRUB.

General Protection Faults when 4Kn disks were combined with a specific HBA - meanwhile Win/FreeBSD never had issues with such a setup, nor does Linux once booted - requiring you to hot-swap disks while booting.

GRUB leaving the server on a black screen and resetting the machine if you let the auto countdown continue, but working fine if you manually hit enter, and also working fine if you let the countdown continue after chainloading via rEFInd.
 
Last edited:

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Speculation: It's the default upstream, to the extent that there is an upstream. The devil you know and all that.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Speculation: It's the default upstream, to the extent that there is an upstream
...and it's not much of an extent. The upstream distro is Debian, but SCALE is heavily customized (Ubuntu does root-on-zfs, but Debian doesn't, for one significant example).
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
I remember when I had to migrate away from legacy Grub 0.x to the current pile of excrement. Yes, there are useful things in the new Grub. But most of it is a waste of my time.

My biggest complaint is that the new Grub wants to be everything to everyone, and does it poorly for everyone:
  1. I don't want Grub to scan my kernels and determine what and how to boot when making a new boot configuration
  2. For home computers running Linux, I don't need grub to scan the disks at boot time to find the bootable disks.
  3. I want zero graphics, zero sound, zero fluff
  4. The kernel boot time messages are mandatory for me. Any Unix SA wants them scrolling on the screen.
  5. Yes, odd boot devices & features may be desirable for some people. But, can we build a Grub without them? Or have a run time configuration option to disable them?
  6. That Grub shell thingy used on boot failure is at times next to useless.
In the end, I had to create custom, MANUAL configuration entries for my Grub boot entries. The "automatic" mode, (which automatically makes the wrong choices 100% of the time, at least for me), is useless. So I don't use it. It is trivial to copy 6 lines and modify them for a new Grub entry, (like a new kernel or InitRD). Yes, about once a year I make a typo and that Grub entry won't boot. But, hey, if I let Grub do the job, it makes 100% wrong entries, 100% of the time... Wow, I have mathematically proven Grub is 200% bad!!!


Gee, I did not remember I had so much hatred stored up on Grub...
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
I had to create custom, MANUAL configuration entries for my Grub boot entries.
The problem I ran into a few years ago, when I was trying to use Grub to boot CORE on my parents' MicroServer Gen8, was that the grub.cfg file is almost completely undocumented, the expectation apparently being that the only way you'd make any changes to it would be by way of an automated tool. If you never need to manually edit the file, what's the point of documenting it?
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Well, I actually modify a file in "/etc/grub.d", then run;
grub-mkconfig -o /boot/grub/grub.cfg

I also remove the execute permissions on these 2 for any Grub update;
chmod a-x /etc/grub.d/20_linux_xen
chmod a-x /etc/grub.d/30_os-prober
so that they don't attempt to create their own worthless entries, (which I don't need with my manual entries).

My manual entries are trivial;
/etc/grub.d/05_custom
Code:
#!/bin/sh
exec tail -n +3 $0
# This file provides an easy way to add custom menu entries.  Simply type the
# menu entries you want to add after this comment.  Be careful not to change
# the 'exec tail' line above.
load_video
terminal_output console
####################
#
menuentry 'Desktop linux-6.1.31-gentoo.2b     20240228 ABE & Portage' {
     linux  /linux-6.1.31-gentoo.2 root=ZFS=rpool/root/20240228 rootfstype=zfs ro console=tty0
     initrd /linux-6.1.31-gentoo.2b.img
}
####################
#
menuentry 'Desktop linux-6.1.31-gentoo.2b     20240214 InitRD' {
     linux  /linux-6.1.31-gentoo.2 root=ZFS=rpool/root/20240214 rootfstype=zfs ro console=tty0
     initrd /linux-6.1.31-gentoo.2b.img
}
####################
#
menuentry 'Desktop linux-6.1.31-gentoo.2a     20240214 ABE & Portage' {
     linux  /linux-6.1.31-gentoo.2 root=ZFS=rpool/root/20240214 rootfstype=zfs ro console=tty0
     initrd /linux-6.1.31-gentoo.2a.img
}
...
 