TrueNAS SCALE Cobia - ZFS device fault for pool

boggie1688

Explorer
Joined
Jul 9, 2015
Messages
58
System:
AMD 3950x
Asus PRO WS X570 Ace
Kingston 128GB ECC DDR4 3200
Asrock M2 VGA Siliconmotion SM750
Intel x520-DA2
LSI 9286-8e IT Mode
TrueNAS-SCALE-23.10.0.1

I upgraded to Cobia this morning, and on the first boot I received several ZFS errors. One of my SSD pools suddenly showed a checksum error, but I cleared it with zpool clear.

Another pool keeps receiving this error at every reboot. Here is an example:
ZFS has detected that a device was removed.

impact: Fault tolerance of the pool may be compromised.
eid: 16
class: statechange
state: UNAVAIL
host: truenas
time: 2023-11-13 12:48:35-0800
vpath: /dev/sdh2
vguid: 0x01F442C640867E6C
pool: Tank (0xDD2FC8057A5BB7E3)

On the storage dashboard, there are no errors.
[Attachment: storage dashboard.png]


Interestingly, I don't have a /dev/sdh2. I have a /dev/sdh.

[Attachment: drive.png]
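One way to double-check whether a partition such as sdh2 exists, independent of what the dashboard shows, is to look at the kernel's block device list. The following is only a sketch: it assumes a Linux system where `/sys/class/block` has one entry per disk and per partition, and `list_partitions` is a hypothetical helper name, not a TrueNAS tool. It only handles sdX-style names.

```python
import os
import re


def list_partitions(disk, sys_block="/sys/class/block"):
    """Return the partition names the kernel exposes for one sdX-style disk.

    `disk` is a bare name such as "sdh" (no /dev/ prefix).
    `sys_block` is a parameter so the function can be tried on a test directory.
    """
    # Match "sdh1", "sdh2", ... but not a different disk such as "sdha".
    pattern = re.compile(rf"^{re.escape(disk)}\d+$")
    return sorted(name for name in os.listdir(sys_block) if pattern.match(name))


if __name__ == "__main__":
    # Assumption: run on the NAS itself; "sdh" is the disk named in the alert.
    print(list_partitions("sdh"))
```

If the list includes sdh2, the vpath in the alert refers to a partition that does exist, even though the disk view only shows the whole disk.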


The error can seemingly hit any drive in the pool, and the path always shows up as /dev/sdX2, always ending in 2.
ZFS has detected that a device was removed.

impact: Fault tolerance of the pool may be compromised.
eid: 16
class: statechange
state: UNAVAIL
host: truenas
time: 2023-11-13 12:11:24-0800
vpath: /dev/sdc2
vguid: 0x01F442C640867E6C
pool: Tank (0xDD2FC8057A5BB7E3)

ZFS has detected that a device was removed.

impact: Fault tolerance of the pool may be compromised.
eid: 16
class: statechange
state: UNAVAIL
host: truenas
time: 2023-11-13 12:06:10-0800
vpath: /dev/sdb2
vguid: 0xB7632EEC617848FE
pool: Tank (0xDD2FC8057A5BB7E3)

ZFS has detected that a device was removed.

impact: Fault tolerance of the pool may be compromised.
eid: 7
class: statechange
state: UNAVAIL
host: truenas
time: 2023-11-13 11:50:31-0800
vpath: /dev/sdj2
vguid: 0x01F442C640867E6C
pool: Tank (0xDD2FC8057A5BB7E3)
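To compare the alerts above, it can help to group them by vguid rather than by vpath. The sketch below parses the pasted notification text and collects the device paths reported for each vdev GUID. The function names and the trimmed `SAMPLE` text are illustrative only; the vpath and vguid values are copied from the alerts in this post. The decimal conversion is included on the assumption that `zpool status -g` prints vdev GUIDs in decimal, which is worth verifying on the system itself.

```python
from collections import defaultdict


def parse_zed_events(text):
    """Split pasted notification text into one dict per event."""
    events = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("ZFS has detected"):
            # Each alert starts with this summary line.
            current = {"summary": line}
            events.append(current)
        elif current is not None and ":" in line:
            # Split on the first colon only, so timestamps stay intact.
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return events


def paths_by_vguid(events):
    """Map each vdev GUID to the device paths it was reported under."""
    grouped = defaultdict(list)
    for ev in events:
        if "vguid" in ev and "vpath" in ev:
            grouped[ev["vguid"]].append(ev["vpath"])
    return dict(grouped)


# Trimmed copy of the four alerts above (only the fields used here).
SAMPLE = """
ZFS has detected that a device was removed.
vpath: /dev/sdh2
vguid: 0x01F442C640867E6C
ZFS has detected that a device was removed.
vpath: /dev/sdc2
vguid: 0x01F442C640867E6C
ZFS has detected that a device was removed.
vpath: /dev/sdb2
vguid: 0xB7632EEC617848FE
ZFS has detected that a device was removed.
vpath: /dev/sdj2
vguid: 0x01F442C640867E6C
"""

if __name__ == "__main__":
    for vguid, paths in paths_by_vguid(parse_zed_events(SAMPLE)).items():
        # int(..., 16) accepts the 0x prefix and gives the decimal form.
        print(vguid, int(vguid, 16), paths)
```

Grouped this way, three of the four alerts carry the same vguid under different sdX names (sdh2, sdc2, sdj2). That would be consistent with one vdev being reported each time while the sdX letters are reassigned between boots, rather than with several different drives failing.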

This never happened on Bluefin, and it started immediately after the upgrade. I get these errors emailed to me on every reboot, sometimes for one drive and sometimes for two. There is never an error on the dashboard.

I've also disconnected the pool and reimported it, and the same thing still happens on reboot.
 

boggie1688

One additional thing.

Rolling back to the previous boot environment (TrueNAS SCALE Bluefin) clears all the issues, but I lose my apps. When I return to the Cobia boot environment, the errors return immediately.

Finally, the random checksum errors I am receiving on my SSD pool also seem to happen only with Cobia. They can show up on any of the four drives, but not on every reboot. After I clear the error, it might take several reboots before it pops up again.
 