Lost 3 Disks / replaced; now all disks show degraded or faulted

urbansystems

Dabbler
Joined
Oct 22, 2019
Messages
10
Hi,

I somehow lost 3 disks at once. Yikes. Nothing on this storage is mission critical, and the pool is raidz3. I assume the faulted disks caused some data to be lost. Now most disks are showing up degraded or faulted with checksum errors. Is my only option to pave it and create a new pool? That isn't the end of the world, but I would like confirmation.

root@bs01:~ # zpool status
  pool: backup
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 1012K in 05:52:27 with 67 errors on Tue Jan 9 13:40:21 2024
config:

        NAME                                            STATE     READ WRITE CKSUM
        backup                                          DEGRADED     0     0     0
          raidz3-0                                      DEGRADED     0     0     0
            gptid/af8898d2-094e-11ee-ab21-b4969106efe8  FAULTED  1.62K     0     0  too many errors
            gptid/af6ef04d-094e-11ee-ab21-b4969106efe8  DEGRADED     0     0   258  too many errors
            gptid/afa8dc0c-094e-11ee-ab21-b4969106efe8  DEGRADED     0     0   258  too many errors
            gptid/afbf14fd-094e-11ee-ab21-b4969106efe8  DEGRADED     0     0   258  too many errors
            gptid/afa2ee5b-094e-11ee-ab21-b4969106efe8  DEGRADED     0     0   258  too many errors
            gptid/af988e3b-094e-11ee-ab21-b4969106efe8  DEGRADED     0     0   258  too many errors
            gptid/01313c18-aa50-11ee-9cfd-b4969106efe8  DEGRADED    10     0   258  too many errors
            gptid/afa26c1e-094e-11ee-ab21-b4969106efe8  DEGRADED     0     0   258  too many errors
            gptid/af907cc0-094e-11ee-ab21-b4969106efe8  FAULTED    149     0   126  too many errors
            gptid/af8a99ca-094e-11ee-ab21-b4969106efe8  DEGRADED     0     0   258  too many errors
            gptid/af6bb0ea-094e-11ee-ab21-b4969106efe8  DEGRADED     0     0   258  too many errors
            gptid/51490ca5-aa50-11ee-9cfd-b4969106efe8  DEGRADED     0     0   258  too many errors
            gptid/7b899614-ab1e-11ee-9cfd-b4969106efe8  DEGRADED     0     0   258  too many errors
            gptid/af7a7fe2-094e-11ee-ab21-b4969106efe8  DEGRADED     0     0   258  too many errors
            gptid/afa31173-094e-11ee-ab21-b4969106efe8  DEGRADED     0     0   258  too many errors
            gptid/afce6da7-094e-11ee-ab21-b4969106efe8  DEGRADED     0     0   258  too many errors
            gptid/b091b88e-094e-11ee-ab21-b4969106efe8  DEGRADED     0     0   258  too many errors
            gptid/b0b860da-094e-11ee-ab21-b4969106efe8  DEGRADED     0     0   258  too many errors
            gptid/b091dd92-094e-11ee-ab21-b4969106efe8  DEGRADED     0     0   258  too many errors
            gptid/b09e4d68-094e-11ee-ab21-b4969106efe8  DEGRADED     0     0   258  too many errors
            gptid/b0b4e5b0-094e-11ee-ab21-b4969106efe8  DEGRADED     0     0   258  too many errors
            gptid/b10588c8-094e-11ee-ab21-b4969106efe8  DEGRADED     0     0   258  too many errors
            gptid/b1026b40-094e-11ee-ab21-b4969106efe8  DEGRADED     0     0   258  too many errors

errors: 7 data errors, use '-v' for a list

  pool: freenas-boot
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: scrub repaired 0B in 00:01:34 with 0 errors on Thu Jan 4 03:46:34 2024
config:
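
The status output says to use '-v' for the list of affected files; if I am reading it right, that would be:

    zpool status -v backup

which should name the 7 files with data errors, so at least I know exactly what was hit.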
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
You have two faulted disks with read errors and the rest with checksum errors... probably some kind of controller issue or a bad reaction to the faulted disks.

Try powering off, removing the bad drives (and replacing them if you can), then powering back on to see if it improves.
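
If you end up doing the replacement from the CLI instead of the GUI, the sequence per disk is roughly the following (the da5 device name is only a placeholder for whatever the new disk shows up as; the gptid is the first faulted disk from your output):

    smartctl -a /dev/da5                                              # sanity-check the replacement disk first
    zpool replace backup gptid/af8898d2-094e-11ee-ab21-b4969106efe8 da5
    zpool clear backup                                                # reset the error counters on the surviving disks
    zpool scrub backup                                                # then re-check zpool status -v when it finishes

On TrueNAS the safer path is the pool Status page in the GUI, which handles the partitioning and gptid labels for you.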
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
First, hardware list please.

Certain disk models are less usable with ZFS. Some LSI type HBAs need good cooling. Perhaps your server has grown dust bunnies large enough to block air flow & cooling.

Further (if I am reading the count right), you have 23 disks in a single RAID-Zx stripe. In general, 10 to 12 disks is considered the reasonable maximum. This can cause delays in response time, especially with certain disk models. Worse, resilver times are going to be impacted when you do need to replace disks.
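
If it helps, the basics can be pulled from a TrueNAS CORE shell with something along these lines (device names will differ on your system, and sas2flash only applies to SAS2-generation LSI cards; use sas3flash for SAS3):

    sysctl hw.model hw.physmem      # CPU model and installed RAM
    camcontrol devlist              # disks and the controller(s) they sit behind
    sas2flash -list                 # LSI HBA model and firmware, if one is installed
    smartctl -a /dev/da0            # per-disk model, power-on hours and error counters; repeat for each daX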
 

urbansystems

Dabbler
Joined
Oct 22, 2019
Messages
10
First, hardware list please.

Certain disk models are less usable with ZFS. Some LSI type HBAs need good cooling. Perhaps your server has grown dust bunnies large enough to block air flow & cooling.

Further (if I am reading the count right), you have 23 disks in a single RAID-Zx stripe. In general, 10 to 12 disks is considered the reasonable maximum. This can cause delays in response time, especially with certain disk models. Worse, resilver times are going to be impacted when you do need to replace disks.
What raid is recommended with this many disks?
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
What raid is recommended with this many disks?
ZFS does not work that way. You can have multiple RAID-Zx vDevs (virtual devices) in one pool, and use something like this:

11 disks in a RAID-Z2
12 disks in a RAID-Z2

You lose 1 more disk to parity compared to a 23-disk RAID-Z3, but the pool will be more responsive. Data will be striped across the 2 RAID-Z2 vDevs. Plus, on a disk failure, only the affected RAID-Z2 vDev needs to be read to re-create the failed disk. Right now, with the 23-disk RAID-Z3, a rebuild likely has to read 19 data disks plus 1 parity disk to re-create a failed disk.

Note that you can have as many vDevs in a ZFS pool as you want, either to increase storage, improve performance, or both.
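
As a rough sketch of what that layout looks like at the command line (the daX names are placeholders; on TrueNAS you would normally build it through the GUI pool wizard rather than by hand):

    zpool create backup \
        raidz2 da0 da1 da2 da3 da4 da5 da6 da7 da8 da9 da10 \
        raidz2 da11 da12 da13 da14 da15 da16 da17 da18 da19 da20 da21 da22

ZFS stripes writes across both RAID-Z2 vDevs, and a later 'zpool add backup raidz2 ...' can grow the pool with another vDev if you add more disks.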
 