Unhealthy zpool prevents any use of pool for VMs

Love4Storage

Dabbler
Joined
Nov 6, 2020
Messages
35
I have a TrueNAS box running TrueNAS Core 13 that hosts a pool for my VMs.

My pool keeps going unhealthy, which prevents my hosts from writing to the datastore.
I checked the status of my pool and it shows that one of the drives has some errors.

I've run a zpool scrub, and the estimated duration went from 10 hours to indefinite.
I reinstalled TrueNAS knowing that the drives were clean and healthy (SAS SSDs at 100% life), and I'm still getting the same error: an unhealthy pool.

This is the result of my zpool status:

zpool status -v
  pool: SPOOL
 state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
config:

        NAME                                            STATE    READ WRITE CKSUM
        SPOOL                                           ONLINE      0     0     0
          mirror-0                                      ONLINE      0     0     0
            gptid/00fe96f2-6049-11ed-a845-9c7bef2e9111  ONLINE      0     0     0
            gptid/0100b343-6049-11ed-a845-9c7bef2e9111  ONLINE      0     0     0
          mirror-1                                      ONLINE      0     0     0
            gptid/0a4fcc14-6049-11ed-a845-9c7bef2e9111  ONLINE      0     0     0
            gptid/0a51f9f8-6049-11ed-a845-9c7bef2e9111  ONLINE      0     0     0
          mirror-2                                      ONLINE      0     0     0
            gptid/1e3d06e4-6049-11ed-a845-9c7bef2e9111  ONLINE      3 13.7K     0
            gptid/1e3fb48d-6049-11ed-a845-9c7bef2e9111  ONLINE      0     0     0
          mirror-3                                      ONLINE      0     0     0
            gptid/2a9e5fe5-6049-11ed-a845-9c7bef2e9111  ONLINE      0     0     0
            gptid/2aa155e1-6049-11ed-a845-9c7bef2e9111  ONLINE      0     0     0

errors: No known data errors
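
For readers who want to spot the affected device without scanning the table by eye, here is a minimal Python sketch that picks out rows with non-zero READ/WRITE/CKSUM counters from saved `zpool status` text. The function names, the shortened sample, and the treatment of the `K` suffix are assumptions for illustration: the multiplier is left as a parameter (defaulting to 1024) because the abbreviated counter is only approximate either way.

```python
# Shortened sample: a few rows copied from the status output in this post.
SAMPLE = """\
NAME STATE READ WRITE CKSUM
SPOOL ONLINE 0 0 0
mirror-2 ONLINE 0 0 0
gptid/1e3d06e4-6049-11ed-a845-9c7bef2e9111 ONLINE 3 13.7K 0
gptid/1e3fb48d-6049-11ed-a845-9c7bef2e9111 ONLINE 0 0 0
"""

SUFFIX_POWER = {"K": 1, "M": 2, "G": 3}


def parse_count(text, base=1024):
    """Convert a counter such as '3' or '13.7K' to an approximate integer.

    Assumption: `base` is the multiplier behind the K/M/G suffix.
    """
    text = text.strip()
    if text and text[-1].upper() in SUFFIX_POWER:
        power = SUFFIX_POWER[text[-1].upper()]
        return round(float(text[:-1]) * base ** power)
    return int(text)


def devices_with_errors(status_text, base=1024):
    """Map device name -> (read, write, cksum) for rows with any non-zero counter."""
    flagged = {}
    for line in status_text.splitlines():
        fields = line.split()
        if len(fields) < 5 or fields[0] == "NAME":
            continue
        name, _state, read, write, cksum = fields[:5]
        try:
            counts = tuple(parse_count(v, base) for v in (read, write, cksum))
        except ValueError:
            continue  # not a device row, e.g. wrapped status/action text
        if any(counts):
            flagged[name] = counts
    return flagged


if __name__ == "__main__":
    for name, (read, write, cksum) in devices_with_errors(SAMPLE).items():
        print(f"{name}: read={read} write~{write} cksum={cksum}")
```

With the sample above, only the first disk of mirror-2 is reported; the pool and mirror rows show zeros, so they are skipped.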
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
Love4Storage said: "I reinstalled TrueNAS knowing that the drives were clean and healthy"
No, they are not. The first disk in mirror-2 experienced roughly 13,700 write errors.
 

Love4Storage

I am aware of this. I was curious whether anyone knows what I can do other than a scrub.
I ran a scrub for a few hours yesterday and it was crawling along at a few MB/s, and this is with SAS3 SSDs.

Thanks
 

Love4Storage

Yesterday the error was reported on a different drive. Both of my HBAs are brand new, so I do not think the HBA is the cause.
 

Patrick M. Hausen

Love4Storage said: "I am aware of this. I was curious as to whether anyone knows what I can do other than a scrub."
Replace the faulty drive.
Love4Storage said: "The error found was for a different drive yesterday. The dual HBA's I have are both brand new and do not think it's the HBA."
If you can rule out the drives and the HBA for certain, check the cabling and/or the backplane.
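
The reasoning behind looking past the drive: when errors show up on a different drive from one day to the next, a component shared by several drives becomes a more plausible suspect than any single disk. A minimal Python sketch of that rule of thumb, assuming you note which devices showed non-zero counters at each check (the function name and the device labels are hypothetical, and this is a heuristic, not a diagnosis):

```python
def suspect_shared_path(snapshots):
    """Return True when errors were seen on more than one distinct device.

    `snapshots` is a list of sets; each set holds the names of devices that
    showed non-zero READ/WRITE/CKSUM counters at one point in time.
    """
    affected = set().union(*snapshots)
    return len(affected) > 1


if __name__ == "__main__":
    # Hypothetical labels standing in for the gptid names in zpool status.
    yesterday = {"disk-A"}
    today = {"disk-B"}
    if suspect_shared_path([yesterday, today]):
        print("Errors moved between drives: check HBA, cabling, backplane, power.")
    else:
        print("Errors stay on one drive: that drive is the first suspect.")
```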
 

Love4Storage

The drives just finished a long SMART test, and it found nothing wrong with them.
I replaced the HBA, but drives still seem to drop.

- Cabling --> Decent quality Intel cables, never had problems with them before. Will replace and test
- Backplane --> Need to check this, never had problems with them before

Alternatively, could one of the following be the problem:

1. The power supply
2. The power cables that attach to the backplane
3. The chassis (or server) that TrueNAS is installed on
4. TrueNAS

Thanks!
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Love4Storage said: "My pool keeps going unhealthy and prevents writing to the datastore by my hosts."
Unless your pool is going into an offline state, a degraded pool should still be able to export resources (ZVOL/NFS) for consumption.
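
To make that distinction concrete, here is a small Python sketch that reads the `state:` line from saved `zpool status` text and reports whether the pool would normally be expected to keep serving I/O. The set of "serving" states and the function names are assumptions for illustration; any other state (for example FAULTED, UNAVAIL, or SUSPENDED) is treated as not serving.

```python
import re

# Assumption: pool states in which ZFS is normally still able to serve I/O.
SERVING_STATES = {"ONLINE", "DEGRADED"}


def pool_state(status_text):
    """Return the value of the first 'state:' line, or None if it is absent."""
    match = re.search(r"^\s*state:\s*(\S+)", status_text, re.MULTILINE)
    return match.group(1) if match else None


def should_still_serve(status_text):
    """True if the reported pool state is one that normally still serves I/O."""
    return pool_state(status_text) in SERVING_STATES


if __name__ == "__main__":
    sample = "pool: SPOOL\nstate: ONLINE\n"  # taken from the output in this thread
    print(pool_state(sample), should_still_serve(sample))
```

If the state stays ONLINE or DEGRADED while the hosts still cannot write, the cause is more likely on the path between hosts and storage, or in devices dropping out, than in the pool state alone.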

Can you elaborate on the details of the server and HBA(s) in play? Details on the SSDs may also be relevant as there may be a firmware update required.
 