TrueNAS Scale - replacing boot pool drive, EFAULT on grub-install

TN22 (Dabbler; joined Sep 12, 2022; 13 messages)
I'm new to TrueNAS, but not Linux/Debian. I've been running a home-grown Deb 11 iSCSI NAS box for a while, in support of three Xen hypervisors. I need to expand my storage, and decided to look at TN.

I've built a custom lower-power box based on an Asrock J4105 motherboard with 16 GB of memory. It has two ZFS-mirrored 24GB Intel 313 SSDs for the boot pool (attached via the onboard SATA ports), plus six 500GB Samsung 860 SSDs configured in a RAIDZ2 pool and two mirrored 1TB hard drives connected to an 8-port LSI controller with the 20.00.07 IT firmware.

Before I started using it for real, I wanted to put things through their paces to get a feel for how the processes worked. As part of that, I pulled one of the boot pool SSDs (sdb) to simulate a failure. When I replaced it with a different, wiped, identical disk and kicked off the drive replacement, it seemed to progress but ended with this error message in the GUI:

Code:
[EFAULT] Command grub-install --target=i386-pc /dev/sdb failed (code 1): Installing for i386-pc platform. grub-install: error: failed to get canonical path of `/dev/replacing-0'.


Error: Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 176, in call_method
    result = await self.middleware._call(message['method'], serviceobj, methodobj, params, app=self)
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1293, in _call
    return await methodobj(*prepared_call.args)
  File "/usr/lib/python3/dist-packages/middlewared/schema.py", line 1272, in nf
    return await func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/middlewared/schema.py", line 1140, in nf
    res = await f(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/middlewared/plugins/boot.py", line 234, in replace
    await self.middleware.call('boot.install_loader', dev)
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1344, in call
    return await self._call(
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1293, in _call
    return await methodobj(*prepared_call.args)
  File "/usr/lib/python3/dist-packages/middlewared/plugins/boot_/boot_loader_linux.py", line 16, in install_loader
    await run('grub-install', '--target=i386-pc', f'/dev/{dev}')
  File "/usr/lib/python3/dist-packages/middlewared/utils/__init__.py", line 64, in run
    cp.check_returncode()
  File "/usr/lib/python3.9/subprocess.py", line 460, in check_returncode
    raise CalledProcessError(self.returncode, self.args, self.stdout,
subprocess.CalledProcessError: Command '('grub-install', '--target=i386-pc', '/dev/sdb')' returned non-zero exit status 1.



I checked from the console with zpool status, which reported that the pool was online and that the rebuild (resilver) had completed in 36 seconds. I rebooted the server an hour or so later just to check, and all seemed fine.
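For anyone wanting to script that console check, here is a minimal Python sketch. It rests on an assumption that is not confirmed anywhere in this thread: that grub-install failed because it ran while the pool still contained the transient replacing-0 vdev named in the error. The helper names (has_replacing_vdev, boot_pool_status) are made up for illustration, and the pool name boot-pool is assumed to be the SCALE default.

```python
import re
import subprocess

# Transient vdev names such as "replacing-0" show up in `zpool status`
# output while a disk replacement has not yet finished.
REPLACING_RE = re.compile(r"^\s*replacing-\d+\s", re.MULTILINE)


def has_replacing_vdev(status_text: str) -> bool:
    """Return True if the status text still lists a replacing-N vdev."""
    return REPLACING_RE.search(status_text) is not None


def boot_pool_status(pool: str = "boot-pool") -> str:
    """Return the output of `zpool status <pool>`.

    The default pool name is an assumption; adjust it if yours differs.
    """
    result = subprocess.run(
        ["zpool", "status", pool],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

On the NAS itself, has_replacing_vdev(boot_pool_status()) returning False would suggest the replacement has finished, at which point re-running grub-install against the new disk might succeed. I have not verified that this avoids the error.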

I then repeated the process with the other boot pool SSD (sda) and got the same error. This time, when I rebooted the server, it didn't find any bootable volumes.

I was able to bring the server back online by doing an "upgrade" via the USB media that I'd created (to the same version as installed), which retained all of my configuration and reinstalled grub.

Is this a known issue? If so, what's the "proper" way to recover from it?
 