Rebuild failure with strange response

mauzilla

Dabbler
Joined
Jul 7, 2022
Messages
17
I have a 6 x 4TB disk RAIDZ2 pool.

A couple of weeks ago one drive failed. I took it out and sent it to the supplier, only to have another drive start giving unrecoverable errors a couple of days later. We replaced the original drive and started an import, which ran a scrub that completed 100%. However, upon looking at the logs I get:

Pool RAIDZ2-32TB-VMBACKUPS state is DEGRADED: One or more devices has experienced an error resulting in data corruption. Applications may be affected.
The following devices are not healthy:
  • Disk ATA WDC WD40EFRX-68N WD-WCC7K6NVUZTJ is DEGRADED
  • Disk ATA WDC WD40EFRX-68N WD-WCC7K5NZTY39 is DEGRADED
  • Disk ATA WDC WD40EFRX-68N WD-WCC7K5FR9KY4 is DEGRADED
  • Disk ATA WDC WD40EFRX-68N WD-WCC7K0EHKP5E is DEGRADED
  • Disk ATA HGST HUS726T4TAL V1GTLZ6H is DEGRADED


I am assuming that the rebuild failed, likely because the 2nd drive was giving unrecoverable errors, but what I don't understand is why all of the other drives are shown as degraded. I ran a SMART test on the 1st one and it came back without any faults, so I'm not sure what to do.

It does appear that I had some data loss, as a large amount of data is missing. Fortunately this is a backup of a backup, so a lesson learnt (just not sure what lesson yet :smile: )

What are my next steps? Does this simply mean the entire pool is degraded beyond repair? Again, I cannot imagine all 6 drives failing (and no, it's an IT-mode HBA, so not a RAID controller)

2023-03-16 18:01:26 (Africa/Johannesburg)
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
What are my next steps?
Restore from a backup once you are confident that you have identified and corrected the issue.

Does this simply mean the entire pool is degraded beyond repair? Again, I cannot imagine all 6 drives failing (and no, it's an IT-mode HBA, so not a RAID controller)
The issue is likely with the HBA, so please provide its hardware model, as well as its BIOS version, especially if it has been flashed.
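
To illustrate why a shared component such as the HBA is the usual suspect here, the alert from the first post can be tallied. This is only a rough sketch: the alert lines are copied from the post, the pool size of 6 comes from the post, and the helper name `parse_alert` is made up for this example. It does not diagnose anything by itself.

```python
import re

# Alert lines copied from the first post in this thread.
ALERT = """\
Disk ATA WDC WD40EFRX-68N WD-WCC7K6NVUZTJ is DEGRADED
Disk ATA WDC WD40EFRX-68N WD-WCC7K5NZTY39 is DEGRADED
Disk ATA WDC WD40EFRX-68N WD-WCC7K5FR9KY4 is DEGRADED
Disk ATA WDC WD40EFRX-68N WD-WCC7K0EHKP5E is DEGRADED
Disk ATA HGST HUS726T4TAL V1GTLZ6H is DEGRADED
"""

TOTAL_DISKS = 6  # "6 x 4TB" according to the first post

# model = everything up to the last token before "is", serial = that last token
LINE_RE = re.compile(
    r"Disk\s+ATA\s+(?P<model>.+)\s+(?P<serial>\S+)\s+is\s+(?P<state>[A-Z]+)$"
)


def parse_alert(text):
    """Return a list of (model, serial, state) tuples found in the alert text."""
    disks = []
    for line in text.splitlines():
        match = LINE_RE.search(line.strip())
        if match:
            disks.append((match["model"].strip(), match["serial"], match["state"]))
    return disks


disks = parse_alert(ALERT)
vendors = {model.split()[0] for model, _, _ in disks}
print(f"{len(disks)} of {TOTAL_DISKS} disks flagged, vendors: {sorted(vendors)}")
```

Five of six disks from two different vendors being flagged at the same time is more consistent with something they all share (HBA, cabling, backplane, power) than with five independent drive failures, though it is not proof.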

Make sure to run long SMART tests on all your drives as a standard troubleshooting procedure.
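
A minimal sketch of how the long tests could be started on every drive in one go, assuming `smartctl` (smartmontools) is installed and the script runs with sufficient privileges. The device paths below are hypothetical placeholders, not taken from this thread; substitute the names your system actually uses. By default the sketch only prints the commands.

```python
import subprocess

# Hypothetical device paths for illustration only; replace with your own.
DEVICES = ["/dev/sda", "/dev/sdb", "/dev/sdc", "/dev/sdd", "/dev/sde", "/dev/sdf"]


def long_test_command(device):
    """Build the smartctl command that starts a long (extended) self-test."""
    return ["smartctl", "-t", "long", device]


def report_command(device):
    """Build the smartctl command that prints all SMART info, including the self-test log."""
    return ["smartctl", "-a", device]


def start_long_tests(devices, dry_run=True):
    """Start a long self-test on each device; with dry_run=True only print the commands."""
    commands = [long_test_command(device) for device in devices]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=False: keep going even if one drive refuses the test
            subprocess.run(cmd, check=False)
    return commands
```

Calling `start_long_tests(DEVICES)` prints the commands, and `start_long_tests(DEVICES, dry_run=False)` actually issues them. The long test runs on the drive itself in the background and can take many hours on a 4TB disk, so the results have to be read afterwards with the command from `report_command`.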
 