Slow boot after 12.0-U8 upgrade - "Root mount waiting for: CAM"

Dave Grabowski

Dabbler
Joined
Aug 1, 2015
Messages
11
System: HP 8300 SFF desktop PC, i5-3470, 24GB RAM. Was running TrueNAS Core 12.0-U7; the problem appeared during the upgrade to 12.0-U8 and has persisted since.

Boot volume is 16GB SATA SSD, connected to onboard SATA port (other three SATA ports are unused).

Pool is 3x4TB as a mirror with hot spare. Controller is LSI 9212-4i in IT mode.

Has been humming along for over three years at a remote location, with periodic updates since 11.2. System is a backup, so I usually push updates to it first.

System was taking a long time to come back after installing the update, so I remoted in via Intel AMT to see the desktop.

Screen is full and keeps blasting out more of

Code:
Root mount waiting for: CAM


And occasionally, messages like this

Code:
(aprobe0:ata2:0:1:0): INQUIRY. CDB: 12 00 00 00 24 00
(aprobe0:ata2:0:1:0): CAM status: Command timeout
(aprobe0:ata2:0:1:0): Error 5, Retries exhausted
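
For anyone decoding the log: the CDB bytes on the first line are a SCSI INQUIRY request. Below is a minimal Python sketch of the decode, assuming the standard 6-byte INQUIRY layout (opcode, EVPD bit, page code, 2-byte allocation length, control). The function name `decode_inquiry_cdb` is purely illustrative and not part of any TrueNAS or FreeBSD tool.

```python
def decode_inquiry_cdb(cdb_hex: str) -> dict:
    """Decode a 6-byte SCSI INQUIRY CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)  # fromhex() accepts spaces between bytes
    if len(cdb) != 6 or cdb[0] != 0x12:
        raise ValueError("not a 6-byte INQUIRY CDB")
    return {
        "opcode": cdb[0],                             # 0x12 = INQUIRY
        "evpd": bool(cdb[1] & 0x01),                  # False = standard inquiry data
        "page_code": cdb[2],                          # only meaningful when EVPD is set
        "allocation_length": (cdb[3] << 8) | cdb[4],  # bytes the host will accept
        "control": cdb[5],
    }


# CDB copied from the console output above.
info = decode_inquiry_cdb("12 00 00 00 24 00")
print(info)  # allocation_length is 0x0024 = 36 bytes
```

So this is the plain 36-byte identification query, which suggests the timeout happens while the kernel is still asking "is anything there?" at that address, before any data is read from the media.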


But eventually, the system shut down and restarted (presumably, it was doing its update housekeeping?). Another long wait with more errors as above, and it came up and seems fine.

For kicks, I set the boot back to 12.0-U7 and restarted, and the problem still remains. Which makes me think that this is a hardware issue that either coincidentally manifested during the upgrade, or got triggered by the upgrade (some new driver poked the bear). Failing boot drive? (SMART stats look fine).

I'm back on 12.0-U8. Problem persists. Reboots take about ten minutes; in the past, they only took about one minute.

Ideas?
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
CAM errors usually indicate a failing drive. In your case, that's at ATA port 2, which from your description above is your boot drive.

Since your system eventually boots, back up your configuration when it comes all the way up, and reinstall to a new boot drive. After the reinstallation, reimport your saved configuration, and you'll be back in business.
 

Dave Grabowski

Samuel Tai said:
CAM errors usually indicate a failing drive. In your case, that's at ATA port 2, which from your description above is your boot drive.

Since your system eventually boots, back up your configuration when it comes all the way up, and reinstall to a new boot drive. After the reinstallation, reimport your saved configuration, and you'll be back in business.

While I agree that this points to the drive (or cable), and I remember lots of CAM errors back in the days when I used USB for boot, I'm hesitant to accept that as the answer. Here's why:

The problems only occur on boot. System has been running since a restart about 12 hours ago and is fine. I ran a scrub on the boot pool (which certainly exercises the media) and - no errors.

It's notable (but perhaps not at all significant) that one of the digits in the messages changes about halfway through the delay during boot - goes from "0" to "1". I'm not familiar enough with what's going on here to understand what it means.

Code:
(aprobe0:ata2:0:0:0): INQUIRY. CDB: 12 00 00 00 24 00
(aprobe0:ata2:0:0:0): CAM status: Command timeout
(aprobe0:ata2:0:0:0): Error 5, Retries exhausted
(aprobe0:ata2:0:1:0): INQUIRY. CDB: 12 00 00 00 24 00
(aprobe0:ata2:0:1:0): CAM status: Command timeout
(aprobe0:ata2:0:1:0): Retrying command, 0 more tries remain
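
On the digit that changes: as far as I understand FreeBSD's CAM naming, the parenthesised prefix reads (peripheral:controller:bus:target:LUN), so the field going from 0 to 1 would be the target ID on ata2. Here is a rough Python sketch that splits the lines above into those fields and counts timeouts per target. The field order is an assumption based on that convention, and the regex and `parse_cam_line` are illustrative only, not an official parser.

```python
import re
from collections import Counter

# Assumed layout: (periph+unit:sim+unit:bus:target:lun): message
PATH_RE = re.compile(
    r"^\((?P<periph>[a-z]+)(?P<unit>\d+):"
    r"(?P<sim>[a-z]+)(?P<sim_unit>\d+):"
    r"(?P<bus>\d+):(?P<target>\d+):(?P<lun>\d+)\): (?P<msg>.*)$"
)


def parse_cam_line(line):
    """Return the path fields and message of a CAM console line, or None."""
    m = PATH_RE.match(line.strip())
    if m is None:
        return None
    fields = m.groupdict()
    for key in ("unit", "sim_unit", "bus", "target", "lun"):
        fields[key] = int(fields[key])
    return fields


# Console lines copied from the post above.
LOG = """\
(aprobe0:ata2:0:0:0): INQUIRY. CDB: 12 00 00 00 24 00
(aprobe0:ata2:0:0:0): CAM status: Command timeout
(aprobe0:ata2:0:0:0): Error 5, Retries exhausted
(aprobe0:ata2:0:1:0): INQUIRY. CDB: 12 00 00 00 24 00
(aprobe0:ata2:0:1:0): CAM status: Command timeout
(aprobe0:ata2:0:1:0): Retrying command, 0 more tries remain
"""

events = [e for e in map(parse_cam_line, LOG.splitlines()) if e]

# Count timeouts per (controller channel, target).
timeouts = Counter(
    (f"{e['sim']}{e['sim_unit']}", e["target"])
    for e in events
    if "Command timeout" in e["msg"]
)
for (channel, target), count in sorted(timeouts.items()):
    print(f"{channel} target {target}: {count} timeout(s)")
```

If that reading is right, the probe first gives up on target 0 of ata2 and then moves on to target 1 of the same channel, which would explain the delay roughly doubling. Checking which channel the boot SSD actually attaches to on a running system (for example with `camcontrol devlist`) would show whether ata2 is really the boot drive's port.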



At the next opportunity, I'm planning to virtualize this system (replace the boot drive with a drive running ESX, create a new VM running TrueNAS, and pass-through the controller for the pool to the new VM) - but this isn't scheduled for three months. It's possible that the system won't restart between now and then.
 

Samuel Tai

It could also be the SATA port on the motherboard or the SATA cable going south. Try moving your boot drive to another port and swapping the SATA cable.
 

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,919
Going forward, I would recommend always rebooting before an update. This applies to all servers, not only TrueNAS. It lets you be certain that any error condition that shows up afterwards was caused by the update itself.
 

Dave Grabowski

Samuel Tai said:
It could also be the SATA port on the motherboard or the SATA cable going south. Try moving your boot drive to another port and swapping the SATA cable.

Next time I'm at that server's location, I'm replacing the boot drive. (See above: it's going to be virtualized under ESX.)

But going back to the fact that the problem (delay and errors) seems to happen only during boot: that makes me think it's not the drive, the cable, or the controller.
 

Dave Grabowski

ChrisRJ said:
Going forward, I would recommend always rebooting before an update. This applies to all servers, not only TrueNAS. It lets you be certain that any error condition that shows up afterwards was caused by the update itself.

In the 25+ years that I've managed enterprise servers, I've never done that.

Snapshots, on the other hand... those definitely happen (everything else I have is virtualized, so this is trivial). And to an extent, that's what happens with TrueNAS updates, since the older version is preserved. Although now it has me wondering whether something in the upgraded config file has caused this.
 

Dave Grabowski

Thinking that the issue was hardware-related on my remote system, I decided to upgrade my local system (similar configuration, but with a larger pool and more memory).

SAME THING happens upon restart.

If the problem were failing hardware, I think it's EXTREMELY unlikely that the exact same thing would happen on two different systems, even ones with similar hardware.

Which brings me back to... this may be a bug. I'll create a ticket unless someone has any other ideas...

https://jira.ixsystems.com/browse/NAS-114679
 