Zpool degraded after drive replacement

Status
Not open for further replies.
Joined
Oct 28, 2012
Messages
3
I had a drive report SMART errors and replaced it. After resilvering, my array still shows as degraded. I have a raidz2 config consisting of 6 drives. After the resilvering completed I rebooted, and it started the process again. I think I might have another bad drive. Is it safe to replace it after this resilvering process completes?

Here is my zpool status.

http://pastebin.com/CVJ9Ntqb

Here are my drive smartctl reports. /dev/ada4 seems hosed, but that drive isn't in the zpool.

http://pastebin.com/AzQ7bpqE

How should I proceed to ensure I don't lose my data?
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
You appear to have a lot of problems on that pool. What's your hardware, what version of FreeNAS are you running, and how are your drives connected?
I note that most of your drives are running too hot, especially ada0, ada6, and ada7. Temps shouldn't exceed 40 deg C. I also don't see that any SMART tests have ever been run on the first few drives, or that any long SMART tests have been run on any of the drives. Your short test schedule of every 12 hours is rather more aggressive than necessary. I don't know if any of these are related to the problems you're having with your pool, but none of them are helping.
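For anyone wanting to check drive temperatures against that 40 deg C figure, here is a minimal sketch in Python. It assumes smartmontools is installed, that the script runs as root, and that the drive reports an attribute named Temperature_Celsius in the usual `smartctl -A` table (some drives use a different attribute name, which this sketch does not handle). The sample row and the function names are made up for illustration and are not taken from the pastebin above.

```python
import subprocess

TEMP_LIMIT_C = 40  # limit suggested in the post above

# Hypothetical sample row in the usual `smartctl -A` layout (not from the thread).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
194 Temperature_Celsius     0x0022   056   045   000    Old_age   Always       -       44 (0 18 0 0 0)
"""


def parse_temperature(smartctl_output):
    """Return the raw Temperature_Celsius value from `smartctl -A` text, or None."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns; the raw value is the 10th.
        if len(fields) >= 10 and fields[1] == "Temperature_Celsius":
            return int(fields[9])
    return None


def read_temperature(device):
    """Run `smartctl -A` on a device such as /dev/ada0 and parse its temperature."""
    result = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True, check=False
    )
    return parse_temperature(result.stdout)


def too_hot(temp_c, limit=TEMP_LIMIT_C):
    """True when a temperature was read and it exceeds the limit."""
    return temp_c is not None and temp_c > limit


if __name__ == "__main__":
    temp = parse_temperature(SAMPLE)
    print(temp, "too hot" if too_hot(temp) else "ok")
```

On a live system you would call `read_temperature("/dev/ada0")` for each drive instead of parsing the sample. A long self-test can be started separately with `smartctl -t long /dev/ada0`.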
 
Joined
Oct 28, 2012
Messages
3
Thanks, I'm looking into the heating issues. I have a SilverStone DS380 NAS case. I don't know why the drives are so hot inside of it. I didn't notice this when I built it over a year ago.

But my question is about recovering the data. Do I need to delete all the corrupt files?

edited for the right case model
 
Joined
Oct 28, 2012
Messages
3
Sorry, I forgot my model and info.

FreeNAS 9.3
ASRock C2550D4I (Avoton)
16 GB UDIMM
5x Seagate 3 TB
1x WD Red 3 TB
2x Seagate 2 TB (about to be pulled)
SilverStone DS380 case

I just ordered some Noctua fans to replace the stock ones. I don't think I've ever had heating issues, and this build is over a year old.
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
Since all the checksum errors are top-level, and not drive-level, I would guess you have a defective CPU, mobo, or RAM. Your checksums do not match the data at all. Therefore, ZFS can't apply parity to repair it, because it doesn't know what 'fixed' looks like.

Run a memtest immediately.

As to recovery, in the common scenario, no, those files are gone (or at least chunks of them). In a better-case scenario, you might be able to turn off checksums, and ZFS might be willing to give you the glitchy data anyway. If that doesn't work, manual ($$$) recovery might be able to recover the glitchy data. Whether that is even worth the effort depends on how damaged the data actually is.
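To see for yourself whether checksum errors sit at the top level or on individual drives, the CKSUM column of `zpool status` can be read per row. Below is a rough Python sketch under some assumptions: a single raidz-style vdev, where the pool row is the least indented, the vdev is indented two spaces further, and the drives two more. The sample text is invented for illustration and is not the output from the pastebin. The sketch only reports where non-zero counters appear; it does not diagnose the cause.

```python
# Hypothetical `zpool status` excerpt (invented values, not from the thread).
SAMPLE = """\
  pool: tank
 state: DEGRADED
config:

\tNAME          STATE     READ WRITE CKSUM
\ttank          DEGRADED     0     0   152
\t  raidz2-0    DEGRADED     0     0   608
\t    ada0p2    ONLINE       0     0     0
\t    ada1p2    ONLINE       0     0     0

errors: 12 data errors, use '-v' for a list
"""


def parse_cksum_table(status_text):
    """Return (depth, name, cksum) rows from the device table of `zpool status`."""
    rows = []
    base_indent = None
    in_table = False
    for raw in status_text.splitlines():
        line = raw.expandtabs()
        fields = line.split()
        if not in_table:
            # The device table starts right after the NAME ... CKSUM header row.
            in_table = fields[:1] == ["NAME"] and fields[-1:] == ["CKSUM"]
            continue
        if not fields:
            break  # a blank line ends the table
        if len(fields) < 5:
            continue  # skip rows that lack the READ/WRITE/CKSUM counters
        indent = len(line) - len(line.lstrip())
        if base_indent is None:
            base_indent = indent  # the first row is the pool itself
        depth = (indent - base_indent) // 2
        # Keep the counter as text because large values may be abbreviated.
        rows.append((depth, fields[0], fields[4]))
    return rows


def summarize(rows):
    """Split names with non-zero CKSUM into top-level (pool, vdev) and drive-level."""
    top = [name for depth, name, cksum in rows if depth <= 1 and cksum != "0"]
    drives = [name for depth, name, cksum in rows if depth >= 2 and cksum != "0"]
    return top, drives


if __name__ == "__main__":
    top, drives = summarize(parse_cksum_table(SAMPLE))
    print("top-level:", top)
    print("drive-level:", drives)
```

In the invented sample, only the pool and the raidz2 vdev show non-zero counters while every drive shows 0, which is the pattern described above.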
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
On top of the overheating issues, unfortunately you have several ST3000DM001, which are known for high failure rates.
 

jrsneto

Cadet
Joined
Dec 25, 2013
Messages
2
On top of the overheating issues, unfortunately you have several ST3000DM001, which are known for high failure rates.
In my FreeNAS server I have 6 ST4000DM001 in a pool. I can keep them just a few degrees below 40 in a Nanoxia tower case. Are they known for high failure rates too?

Sent from my LG-H815 using Tapatalk
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
You can see the difference if you do a web search for "ST3000DM001 failure", then "ST4000DM001 failure". You'll see similar results if you replace "failure" with "problems". Even searching for ST3000DM001 in these forums will be informative.
 