Zpool degraded after drive replacement

thisisnotdave · Feb 10, 2016

I had a drive report SMART errors and replaced it. After resilvering, my array still shows as degraded. I have a RAIDZ2 config consisting of 6 drives. After the resilvering completed I rebooted, and it started the process again. I think I might have another bad drive. Is it safe to replace it after this resilvering process completes?

Here is my zpool status.

http://pastebin.com/CVJ9Ntqb

Here are my drives' smartctl reports. /dev/ada4 seems hosed, but that drive isn't in the zpool.

http://pastebin.com/AzQ7bpqE

How should I proceed to ensure I don't lose my data?

danb35 · Feb 10, 2016

You appear to have a lot of problems on that pool. What's your hardware, what version of FreeNAS are you running, and how are your drives connected?
I note that most of your drives are running too hot, especially ada0, ada6, and ada7. Temps shouldn't exceed 40 deg C. I also don't see that any SMART tests have ever been run on the first few drives, or that any long SMART tests have been run on any of the drives. Your short test schedule of every 12 hours is rather more aggressive than necessary. I don't know if any of these are related to the problems you're having with your pool, but none of them are helping.
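
For anyone who wants to keep an eye on the temperature point above, here is a minimal Python sketch that pulls SMART attribute 194 (Temperature_Celsius) out of `smartctl -A` output and compares it with the 40 deg C figure mentioned in this post. The function names, the `TEMP_LIMIT_C` constant, and the sample row are assumptions made up for illustration; the sample is not taken from the pastebin. Some drives report temperature under a different attribute, so treat this as a starting point only.

```python
import re
import subprocess

TEMP_LIMIT_C = 40  # threshold quoted in this thread, not a vendor spec


def parse_temperature(smart_attributes: str):
    """Return the raw value of SMART attribute 194 in deg C, or None if absent."""
    for line in smart_attributes.splitlines():
        fields = line.split()
        # Attribute rows: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW...
        if len(fields) >= 10 and fields[0] == "194":
            match = re.match(r"\d+", fields[9])
            if match:
                return int(match.group())
    return None


def read_temperature(device: str):
    """Run `smartctl -A` on a device (usually needs root) and parse the result."""
    result = subprocess.run(
        ["smartctl", "-A", device],
        capture_output=True, text=True, check=False,
    )
    return parse_temperature(result.stdout)


def is_too_hot(temp_c, limit=TEMP_LIMIT_C):
    """True only when a temperature was found and it exceeds the limit."""
    return temp_c is not None and temp_c > limit


# Made-up sample row for illustration only.
SAMPLE = """ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
194 Temperature_Celsius     0x0022   046   058   000    Old_age   Always       -       46 (0 18 0 0 0)
"""
```

On a live system you would call something like `is_too_hot(read_temperature("/dev/ada0"))` for each disk in the pool.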

thisisnotdave · Feb 10, 2016

Thanks, I'm looking into the heating issues. I have a SilverStone DS380 NAS case. I don't know why the drives are so hot inside of it. I didn't notice this when I built it over a year ago.

But my question is about recovering the data. Do I need to delete all the corrupt files?

edited for the right case model

thisisnotdave · Feb 10, 2016

Sorry, forgot my model and info.

FreeNAS 9.3
ASRock Avoton C2550D4I
16GB UDIMM
5x Seagate 3TB
1x WD Red 3TB
2x Seagate 2TB (about to be pulled)
SilverStone DS380 case

I just ordered some Noctua fans to replace the stock ones. I don't think I've ever had heating issues, and this build is over a year old.

rs225 · Feb 10, 2016

Since all the checksum errors are top-level, and not drive-level, I would guess you have a defective CPU, mobo, or RAM. Your checksums do not match the data, at all. Therefore, it can't apply parity to repair because it doesn't know what 'fixed' looks like.

Run a memtest immediately.

As to recovery, in the common scenario, no, those files are gone (or at least chunks of them). In a better-case scenario, you might be able to turn off checksums and ZFS might be willing to give you the glitchy data anyway. If that doesn't work, manual ($$$) recovery might be able to recover the glitchy data; however, whether that is even worth the effort depends on how damaged the data actually is.
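
To make the top-level vs. drive-level distinction above easier to check, here is a rough Python sketch that reads the config table from `zpool status` output and lists which rows have a non-zero CKSUM count, split into pool/vdev rows and disk rows. The function names and the sample text are assumptions for illustration (the sample is invented, not the pastebin from this thread), and the depth rule assumes a raidz-style layout where disks sit two levels below the pool name.

```python
def parse_config(status_text: str):
    """Return (depth, name, state, cksum) for each row of the config table."""
    rows = []
    in_table = False
    base_indent = None
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:5] == ["NAME", "STATE", "READ", "WRITE", "CKSUM"]:
            in_table = True
            continue
        if not in_table:
            continue
        if not fields:
            break  # a blank line ends the table
        if len(fields) < 5:
            continue  # section labels such as "logs" or "spares"
        indent = len(line) - len(line.lstrip())
        if base_indent is None:
            base_indent = indent  # the pool row sets the reference indent
        depth = (indent - base_indent) // 2  # each nesting level adds 2 spaces
        rows.append((depth, fields[0], fields[1], fields[4]))
    return rows


def split_errors(rows):
    """Names with non-zero CKSUM: (pool/vdev level, disk level).

    Assumes disks are at depth 2, as in a raidz pool; counts are kept
    as strings because zpool may abbreviate large values.
    """
    top = [name for depth, name, _, cksum in rows if depth <= 1 and cksum != "0"]
    disks = [name for depth, name, _, cksum in rows if depth >= 2 and cksum != "0"]
    return top, disks


# Invented sample for illustration only.
SAMPLE = """\
  pool: tank
 state: DEGRADED
config:

        NAME        STATE     READ WRITE CKSUM
        tank        DEGRADED     0     0    12
          raidz2-0  DEGRADED     0     0    24
            ada0    ONLINE       0     0     0
            ada1    ONLINE       0     0     0

"""
```

With this sample, `split_errors(parse_config(SAMPLE))` reports errors on the pool and vdev rows but on none of the disks, which is the pattern described above.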

Robert Trevellyan · Feb 10, 2016

On top of the overheating issues, unfortunately you have several ST3000DM001, which are known for high failure rates.

jrsneto · Feb 12, 2016

Robert Trevellyan said:
On top of the overheating issues, unfortunately you have several ST3000DM001, which are known for high failure rates.

In my FreeNAS server I have 6 ST4000DM001 in a pool. I can keep them just a few degrees below 40 in a Nanoxia tower case. Are they known for high failure rates too?

Bidule0hm · Feb 12, 2016

jrsneto said:
Are they known for high failure rates too?

No ;)

Robert Trevellyan · Feb 12, 2016

You can see the difference if you do a web search for "ST3000DM001 failure", then "ST4000DM001 failure". You'll see similar results if you replace "failure" with "problems". Even searching for ST3000DM001 in these forums will be informative.

Mirfster · Feb 12, 2016

So it looks like this board supports both ECC and non-ECC RAM... Are you using ECC or non-ECC RAM?
