Hi Mark,
For now, we'll need to dig into the command line a little bit. This is easiest done over SSH, which you can set up by going to System Settings -> Services. Click the pencil beside SSH and enable "Log in as Admin with Password". Save the settings, return to the previous screen, and toggle the Running status to On. From there you can use any SSH-compatible client (e.g. PuTTY under Windows) to connect to TrueNAS as admin/yourpasswordhere
I've simulated a failure on my side here to make it easier to follow along. To be safe, I'm assuming your system has no hot-swap ability, so the drive swap happens with the system powered off.
Start with
sudo zpool status boot-pool
Code:
admin@scale01[~]$ sudo zpool status boot-pool
[sudo] password for admin:
pool: boot-pool
state: DEGRADED
status: One or more devices has been removed by the administrator.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Online the device using zpool online' or replace the device with
'zpool replace'.
scan: resilvered 6.87G in 00:02:24 with 0 errors on Tue Jul 4 15:29:54 2023
config:
    NAME          STATE     READ WRITE CKSUM
    boot-pool     DEGRADED     0     0     0
      mirror-0    DEGRADED     0     0     0
        sda3      ONLINE       0     0     0
        sde3      REMOVED      0     0     0
errors: No known data errors
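If the status output is long, a small filter can pick out the rows that need attention. This is only a sketch: the function name non_online_devices is made up for illustration, and it assumes the usual layout where the device table starts at the NAME header and ends at the "errors:" line. Note that it also prints the pool and mirror rows when they are DEGRADED, not just the individual device.

```shell
#!/bin/sh
# Sketch: print every row of the device table whose STATE is not ONLINE.
# Reads `zpool status` text on stdin. The function name is hypothetical.
non_online_devices() {
    awk '
        # The device table starts after the header row beginning with NAME.
        $1 == "NAME" { in_table = 1; next }
        # The table ends at the "errors:" line.
        $1 == "errors:" { in_table = 0 }
        # Inside the table, print name and state for anything not ONLINE.
        in_table && NF >= 5 && $2 != "ONLINE" { print $1, $2 }
    '
}

# Usage on the TrueNAS shell:
#   sudo zpool status boot-pool | non_online_devices
```

With the example output above, this would list boot-pool, mirror-0 and sde3, and leave out the healthy sda3.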
In my example, the "healthy" drive is sda3 and the other, sde3, is failed/removed. Look at the SMART output of your "healthy" drive to find the serial number (NB: grep is case-sensitive, so if "Serial" gives no results, try lowercase "serial" or use grep -i)
sudo smartctl -a /dev/sda3 | grep Serial
Code:
admin@scale01[~]$ sudo smartctl -a /dev/sda3 | grep Serial
Serial number: 6000c295f0d47a00989c983cd95b1d25
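To sidestep the upper/lowercase question, the match can be made case-insensitive. A minimal sketch, assuming the serial appears on a line starting with "Serial number" followed by a colon, as in the output above; the helper name serial_from_smart is hypothetical.

```shell
#!/bin/sh
# Sketch: pull the serial number out of smartctl text supplied on stdin.
# The function name is hypothetical.
serial_from_smart() {
    # grep -i matches "Serial Number:" and "Serial number:" alike;
    # head keeps the first match; sed strips everything up to the colon.
    grep -i '^serial number' | head -n 1 | sed 's/^[^:]*:[[:space:]]*//'
}

# Usage on the TrueNAS shell (run once per boot device and note each result):
#   sudo smartctl -a /dev/sda3 | serial_from_smart
```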
Since your device is "still alive, but faulted" you can also look at the same value for your FAULTED device. Make a note of these numbers for later, when you're physically removing the failed device.
Now, let's logically remove the failed device from the boot pool with
sudo zpool detach boot-pool FAULTED_DEVICE_ID
Code:
admin@scale01[~]$ sudo zpool detach boot-pool sde3
admin@scale01[~]$ sudo zpool status -v boot-pool
pool: boot-pool
state: ONLINE
status: Some supported and requested features are not enabled on the pool.
The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
the pool may no longer be accessible by software that does not support
the features. See zpool-features(7) for details.
scan: resilvered 6.87G in 00:02:24 with 0 errors on Tue Jul 4 15:29:54 2023
config:
    NAME          STATE     READ WRITE CKSUM
    boot-pool     ONLINE       0     0     0
      sda3        ONLINE       0     0     0
errors: No known data errors
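Before shutting down, it may be worth confirming that the pool really reports ONLINE after the detach. The sketch below just reads the "state:" line from the status text; the function name pool_state is hypothetical.

```shell
#!/bin/sh
# Sketch: print the pool state from `zpool status` text supplied on stdin.
# The function name is hypothetical.
pool_state() {
    # The first line whose first field is "state:" carries the pool state.
    awk '$1 == "state:" { print $2; exit }'
}

# Usage on the TrueNAS shell:
#   sudo zpool status boot-pool | pool_state
# Only proceed to the shutdown if this prints ONLINE.
```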
From here, shut down TrueNAS from the UI, locate your boot devices, and remove the one with the serial number matching the FAULTED device.
Once that's done, you can install the new blank unit, boot up, and ATTACH from within the boot pool status page.