HDD Degraded Status, how to replace?

Status
Not open for further replies.

ThePayaner

Cadet
Joined
Apr 21, 2016
Messages
2
Hello, I have a pool of 2 HDDs in a ZFS mirror (RAID 1), and I'm confused about the following:
1. Which HDD is really damaged?
2. Or maybe both are damaged?
3. Which one should I replace first?

Under 'Volume status' I see ada1p2 with Status: Degraded (ADA1), and there's no Offline button. Does this mean it's offline already?

upload_2016-4-22_10-3-20.png


Then in the Alert System I see that ada0 is Offline, so is this the same disk?

upload_2016-4-22_10-27-23.png


[root@freenas ~]# zpool status Storage
  pool: Storage
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P (Nothing found here on a Degraded HDD Status)
  scan: scrub repaired 0 in 4h9m with 0 errors on Thu Apr 21 04:09:38 2016
config:

        NAME                                            STATE     READ WRITE CKSUM
        Storage                                         DEGRADED     0     0     0
          mirror-0                                      DEGRADED     0     0     0
            gptid/97385fc7-7a56-11e4-88f8-f04da2f92210  ONLINE       0     0     0
            gptid/97dbc60b-7a56-11e4-88f8-f04da2f92210  DEGRADED     0     0 1.29M  too many errors

errors: No known data errors


I plan to add a new HDD without unplugging the current 2 in the mirror, and then hit Replace on the Degraded one, but I don't know if I first need to do something in the CLI to really take it offline. Is that correct?

I haven't been able to find documentation specifically about the per-disk statuses, only about the status of the volume itself.

I greatly appreciate your time and support.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
ThePayaner said: "Then in the Alert System I see that ada0 is Offline, so is this the same disk?"

That alert isn't saying that ada0 is offline; it's saying that ada0 has offline sectors. Completely different things. ada0 has SMART errors, and that many bad sectors is cause to replace it immediately. However, ada0 isn't the drive causing problems for your pool; that's ada1. Please post the output of 'smartctl -x /dev/ada1' in code tags.
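
For anyone reading along: the "offline sectors" wording in an alert like that typically corresponds to SMART attribute 198 (Offline_Uncorrectable); attributes 197 (Current_Pending_Sector) and 5 (Reallocated_Sector_Ct) are the other sector-health counters usually checked alongside it. Below is a minimal sketch for pulling just those three out of 'smartctl -A' text. It assumes the usual attribute table layout (ID in column 1, name in column 2, raw value in column 10), and the function name sector_counters is made up for illustration.

```shell
# Hypothetical helper: print sector-health counters from `smartctl -A` text on stdin.
# Assumes the standard attribute table: ID in field 1, name in field 2,
# raw value in field 10. Rows with fewer than 10 fields are ignored.
sector_counters() {
  awk 'NF >= 10 && ($1 == 5 || $1 == 197 || $1 == 198) { print $2 "=" $10 }'
}
```

Assumed usage, from an sh session on the NAS: smartctl -A /dev/ada0 | sector_counters. Any non-zero raw value there is worth a closer look.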
 

Sakuru

Guru
Joined
Nov 20, 2015
Messages
527
Just to make sure you know where to find the code tags button :)
 

Attachment: Code_Tags.png

ThePayaner

Cadet
Joined
Apr 21, 2016
Messages
2
Sure, here's the output (I added images because the text doesn't line up the same way here):
Thanks
upload_2016-4-22_11-55-28.png

upload_2016-4-22_11-55-35.png

upload_2016-4-22_11-55-49.png

upload_2016-4-22_11-55-55.png
 

Sakuru

Guru
Joined
Nov 20, 2015
Messages
527
Don't worry about the formatting, it looks good enough in code tags.

I see that it failed its long test at 8299 power-on hours, and it currently has 9006 hours. Everything else looks pretty good, so I would suggest checking all of your cables. I also recommend scheduling SMART tests; they help alert you to problems like this sooner. Make sure you have your Email section filled out so the alerts actually reach you.

Also, please post the full specs of your FreeNAS box.
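
To put the two power-on-hour figures above in calendar terms, here is a small arithmetic sketch. The helper name hours_to_days is hypothetical, and the result only maps to real days if the box runs 24x7, which is an assumption.

```shell
# Hypothetical helper: whole days between two Power_On_Hours readings.
# Only meaningful as calendar time if the system runs 24x7 (an assumption).
hours_to_days() {
  echo $(( ($2 - $1) / 24 ))
}

hours_to_days 8299 9006   # prints 29
```

So the failed long test was run roughly a month before the output was captured.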
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
ThePayaner said: "Sure, here's the output (I added images because text doesn't arrange the same here)"

It really isn't that hard. SSH in using a decent client (PuTTY is pretty much the standard on Windows, or I've heard Bitvise recommended recently as well), then paste the text inside code tags. But be that as it may...
  • The drive's too hot. I doubt that's directly causing your problems, but it isn't a good thing for longevity. Drive temps should not exceed 40 C; this one has hit 45 C this power cycle, and 49 C in its lifetime.
  • You don't have scheduled SMART tests set up on the drive. It's only seen three in its lifetime, only one of those was a long test (about a month ago, assuming you run the system 24x7), and it failed that. The web GUI should have been squawking at you about that, I'd think.
  • The drive is also reporting bad sectors, which is no doubt why it failed the SMART test.
Right now, ada1 is flagged DEGRADED (too many checksum errors) rather than offline, but either way I'd recommend you replace it first (like immediately). Once the replacement finishes resilvering, you probably want to replace ada0 as well. Get those temps down, and make sure you have regular SMART tests scheduled (short test every 1-5 days, long test every 1-4 weeks).
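
To confirm which pool member is unhealthy before the swap, and to re-check once resilvering finishes, a filter over 'zpool status' output can help. This is only a sketch: the function name unhealthy_members is made up, and it assumes the config table layout shown earlier in the thread (NAME, STATE, READ, WRITE, CKSUM columns, followed by an "errors:" line).

```shell
# Hypothetical helper: list rows of the `zpool status` config table that are
# not ONLINE or carry non-zero READ/WRITE/CKSUM counters. Reads text on stdin.
unhealthy_members() {
  awk '
    $1 == "NAME" && $2 == "STATE" { in_table = 1; next }  # header row starts the table
    /^errors:/                    { in_table = 0 }        # the errors: line ends it
    in_table && NF >= 5 && ($2 != "ONLINE" || $3 != "0" || $4 != "0" || $5 != "0") {
      print $1, $2, "cksum=" $5
    }
  '
}
```

Assumed usage: zpool status Storage | unhealthy_members. On the output posted above this would list the pool, mirror-0, and the gptid/97dbc60b... member; 'glabel status' on FreeBSD shows which adaX partition a given gptid label belongs to. These are POSIX sh functions, so start sh first if your login shell is csh.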
 