Hi all, I'm experiencing some issues booting my TrueNAS instance.
The problems first manifested yesterday evening, when I noticed that I was unable to access the TrueNAS dashboard via browser, or connect to the server via SSH. This was a little odd, as the containerised applications on the instance were still operational, and I had only just uploaded a file via SFTP to Plex's media dataset. On the assumption that something had gone wrong with the native applications, I went to reboot the instance. Rebooting through the server shell reported an error (which sadly I did not make note of), at which point the shell would continue to take typed input, but was no longer responsive to any commands. Perhaps foolishly, I restarted the instance at a hardware level.
Since this reboot, I have been unable to get the OS fully operational. When the system boots, it progresses through the boot sequence for ~16 seconds, culminating with "ZFS: Loaded module...", at which point no further progress is made. After a few attempts to resolve the issue, I left the machine running overnight, but it had made no further progress by the morning.
I'll recount some of the attempts I've made to further diagnose the problem, but I'll admit I'm something of a novice when it comes to debugging issues this early in the boot sequence, so forgive me if some of these attempts come across as naive.
I have two mirrored USB boot drives (yes, I am aware this is no longer a recommended configuration, but I have yet to make the time to migrate away), and between them I have access to OS versions 22.12.2, 22.12.0, and 22.04.0, all of which exhibit the same behaviour. I have also removed each boot drive in turn, and tried them in alternate USB ports, with no observable change in behaviour.
My NAS currently consists of a single data pool, spread across two virtual devices, each consisting of three physical drives in a RAIDZ1 configuration. I attempted to boot with one physical drive disconnected at a time, working through each drive in turn, to rule out a catastrophic drive failure as the cause, but again, this appeared to have no effect.
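For anyone wanting to sanity-check the layout, here is a rough Python sketch of what two 3-drive RAIDZ1 vdevs imply for capacity and for pulling drives. The class and function names are my own illustration, not anything from TrueNAS or ZFS, and the capacity figure is nominal (it ignores ZFS metadata, padding overhead, and the TB/TiB difference).

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RaidzVdev:
    """Hypothetical model of one RAIDZ vdev (illustrative only)."""
    drives: int        # number of physical drives in the vdev
    drive_tb: float    # nominal size of each drive, in TB
    parity: int = 1    # RAIDZ1 keeps one drive's worth of parity

    def usable_tb(self) -> float:
        # Nominal usable space: data drives times drive size.
        return (self.drives - self.parity) * self.drive_tb

    def survives(self, missing: int) -> bool:
        # The vdev stays readable while missing drives <= parity level.
        return missing <= self.parity


def pool_usable_tb(vdevs: Sequence[RaidzVdev]) -> float:
    # A pool stripes across its vdevs, so nominal capacities add up.
    return sum(v.usable_tb() for v in vdevs)


def pool_survives(vdevs: Sequence[RaidzVdev], missing: Sequence[int]) -> bool:
    # Losing any single vdev loses the pool, so every vdev must survive.
    return all(v.survives(m) for v, m in zip(vdevs, missing))


# The layout described above: 3x 4TB and 3x 6TB, both RAIDZ1.
POOL = [RaidzVdev(drives=3, drive_tb=4.0), RaidzVdev(drives=3, drive_tb=6.0)]

if __name__ == "__main__":
    print("nominal usable TB:", pool_usable_tb(POOL))
    print("one drive pulled from first vdev:", pool_survives(POOL, [1, 0]))
    print("two drives pulled from first vdev:", pool_survives(POOL, [2, 0]))
```

The point of the sketch is only that disconnecting a single drive at a time should, in principle, leave the pool degraded but still importable, which is why I expected that test to be safe.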
My current (hopeful) working theory is that I happened to try to access the dashboard while some maintenance activity - presumably a scrub - was running and starving the host's other services of resources, and that I then interrupted it with my gung-ho restart of the machine. If so, the boot sequence is stalling because that maintenance task is attempting to complete while the pool is being mounted, and it will take longer than the eight hours or so that the machine was left overnight. While there are no logs to indicate that this is the case, there does appear to be drive activity - unfortunately my case does not have an LED indicator for this, but there is certainly the sound of activity.
If that is not the case, I'm at a loss for what to do next. I could attempt to reinstall the OS, but, given that I have tried to revert to earlier versions of the OS, and no user changes were made prior to the initial loss of access to the dashboard, I am pessimistic that it would make a difference.
Below, I've attached a screenshot of an example of the shell logs at the point where the boot sequence stalls. Naturally, there is more logged before this, but as this is occurring so early in the boot sequence I'm unclear how I would share these short of just recording a video.
The device specs are as follows:
CPU: Intel Pentium G4560
Motherboard: Supermicro X11SSL-F
Memory: 2x 8GB Crucial DDR4 Server Memory, PC4-17000 (2133); 2x Kingston Server Premier 8GB (1x 8GB) 2400MHz DDR4
Boot drives: 2x 16GB SanDisk Ultra Fit USB Flash Drive
Data drives: 3x WD Red 4TB in RAIDZ1 configuration; 3x WD Red 6TB in RAIDZ1 configuration
Any advice would be appreciated, thanks.