I'm new to TrueNAS, but not Linux/Debian. I've been running a home-grown Deb 11 iSCSI NAS box for a while, in support of three Xen hypervisors. I need to expand my storage, and decided to look at TN.
I've built a custom low-power box based on an ASRock J4105 motherboard with 16 GB of memory. It has two ZFS-mirrored 24 GB Intel 313 SSDs for the boot pool (attached via the onboard SATA ports), plus six 500 GB Samsung 860 SSDs configured as a RAIDZ2 pool and two mirrored 1 TB hard drives, all connected to an 8-port LSI controller with the 20.00.07 IT firmware.
Before I started using it for real, I wanted to put it through its paces to get a feel for how the processes worked. As part of that, I pulled one of the boot pool SSDs (sdb) to simulate a failure. When I replaced it with a different, wiped, identical disk and kicked off the drive replacement, it seemed to progress but ended with an error message in the GUI:
Code:
[EFAULT] Command grub-install --target=i386-pc /dev/sdb failed (code 1): Installing for i386-pc platform.
grub-install: error: failed to get canonical path of `/dev/replacing-0'.
Error: Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 176, in call_method
    result = await self.middleware._call(message['method'], serviceobj, methodobj, params, app=self)
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1293, in _call
    return await methodobj(*prepared_call.args)
  File "/usr/lib/python3/dist-packages/middlewared/schema.py", line 1272, in nf
    return await func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/middlewared/schema.py", line 1140, in nf
    res = await f(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/middlewared/plugins/boot.py", line 234, in replace
    await self.middleware.call('boot.install_loader', dev)
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1344, in call
    return await self._call(
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1293, in _call
    return await methodobj(*prepared_call.args)
  File "/usr/lib/python3/dist-packages/middlewared/plugins/boot_/boot_loader_linux.py", line 16, in install_loader
    await run('grub-install', '--target=i386-pc', f'/dev/{dev}')
  File "/usr/lib/python3/dist-packages/middlewared/utils/__init__.py", line 64, in run
    cp.check_returncode()
  File "/usr/lib/python3.9/subprocess.py", line 460, in check_returncode
    raise CalledProcessError(self.returncode, self.args, self.stdout,
subprocess.CalledProcessError: Command '('grub-install', '--target=i386-pc', '/dev/sdb')' returned non-zero exit status 1.
I checked on the console with zpool status, and it reported that the pool was online and the resilver had completed in 36 seconds. I rebooted the server an hour or so later just to check, and all seemed fine.
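Judging by the traceback, grub-install ran while the temporary replacing-0 vdev still existed, so my guess (not verified) is that the failed step could simply be retried by hand once the resilver has finished and the real device node is back:

```shell
# Confirm the resilver is done and the temporary "replacing-0" vdev is gone
# ("boot-pool" is the pool name on my system)
zpool status boot-pool

# Retry the exact loader install the middleware attempted, per the traceback
grub-install --target=i386-pc /dev/sdb
```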
I then repeated the process with the other boot pool SSD (sda), and hit the same error. This time, when I rebooted the server, it didn't find any bootable volumes.
I was able to bring the server back online by doing an "upgrade" via the USB media that I'd created (to the same version as installed), which retained all of my configuration and reinstalled GRUB.
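The installer "upgrade" path worked for me, but I'd have thought a manual fix from the booted system (or a chroot from rescue media) would look roughly like this. This is a sketch based on standard Debian GRUB tooling, not something I've verified on TrueNAS:

```shell
# Sketch only: reinstall the BIOS boot loader on each boot pool member,
# assuming /boot/grub is reachable from the running (or chrooted) system.
grub-install --target=i386-pc /dev/sda
grub-install --target=i386-pc /dev/sdb

# Regenerate grub.cfg if needed (standard Debian helper)
update-grub
```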
Is this a known issue? If so, what's the "proper" way to recover from this issue?