Drive unavailable, pool degraded

Status
Not open for further replies.

qwertymodo

Contributor
Joined
Apr 7, 2014
Messages
144
After my motherboard failed, I got the RMA board back and put everything together, only to receive a bunch of unrecoverable CRC errors on one drive. I had tested all of the drives on another mobo while waiting for the RMA; all of them tested as working, and I didn't see any of these CRC errors. Further research turned up reports that the errors could be due to a bad cable, so I replaced it and the CRC errors went away.

However, now when I try to mount my pool, it comes up as degraded, and that drive is shown as "unavailable". The drive shows up properly as a /dev/adaX device, but it almost looks like it got assigned a new gptid or something. My thought at this point is to wipe the drive, re-add it, and resilver the pool, but I can't figure out which drive the pool is referring to. (Yes, I know which drive had the CRC errors and which one I replaced the cables on, but I'm not about to assume that it's the same drive and risk trashing my redundancy.)

Here's the output of zpool status:

[root@freenas] ~# zpool status
  pool: vdev0
 state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-2Q
  scan: resilvered 1017G in 5h57m with 0 errors on Sun Feb 15 23:58:31 2015
config:

        NAME                                                STATE     READ WRITE CKSUM
        vdev0                                               DEGRADED     0     0     0
          raidz2-0                                          DEGRADED     0     0     0
            gptid/622056f7-f020-11e3-8131-d05099192e39.eli  ONLINE       0     0     0
            10515161252                                     UNAVAIL      0     0     0  was /dev/gptid/a342c0a9-b57f-11e4-be96-d05099192e39.eli
            gptid/632723e6-f020-11e3-8131-d05099192e39.eli  ONLINE       0     0     0
            gptid/638d914a-f020-11e3-8131-d05099192e39.eli  ONLINE       0     0     0
            gptid/63f6fd98-f020-11e3-8131-d05099192e39.eli  ONLINE       0     0     0
            gptid/645dccff-f020-11e3-8131-d05099192e39.eli  ONLINE       0     0     0

errors: No known data errors

I've also attached the output of smartctl -a for all drives, which doesn't seem to indicate any obvious failures. So the question now is, how do I figure out which /dev/adaX device corresponds to the gptid that is unavailable? Once I know the /dev/adaX name, from there I can get the serial number out of the list disks dialog, and confirm it from there (actually, even that is redundant, once I know which /dev/adaX device it is, then I know which one to wipe in the web gui).

Obviously, if this doesn't work and more errors show up then I'll replace the drive, but at this point, my gut still says the drive is fine and something weird just happened on the software side as a result of the bad cable/CRC errors.
 

Attachments

  • smart.txt
    41.3 KB · Views: 241

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
The safer method would be to map the IDs of the other drives; the one that doesn't match is the one you're looking for :)

Look at the useful scripts link in my sig if you don't want to do this by hand ;)
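
For anyone who wants to see the elimination idea spelled out, here is a minimal sketch in Python. It is not the script from the sig link. It assumes you have saved the output of `glabel status` (the FreeBSD command that lists each label and the partition behind it) and of `zpool status` as text. The adaX assignments and the all-zero gptid in the sample are invented for illustration; only the five ONLINE gptids come from the output posted above. The function names are hypothetical too.

```python
import re

# Hypothetical `glabel status` sample. The adaX assignments and the
# all-zero gptid are invented; the other gptids are the ONLINE members
# from the zpool status output in this thread.
GLABEL_SAMPLE = """\
                                      Name  Status  Components
gptid/622056f7-f020-11e3-8131-d05099192e39     N/A  ada0p2
gptid/00000000-0000-0000-0000-000000000000     N/A  ada1p2
gptid/632723e6-f020-11e3-8131-d05099192e39     N/A  ada2p2
gptid/638d914a-f020-11e3-8131-d05099192e39     N/A  ada3p2
gptid/63f6fd98-f020-11e3-8131-d05099192e39     N/A  ada4p2
gptid/645dccff-f020-11e3-8131-d05099192e39     N/A  ada5p2
"""

ZPOOL_SAMPLE = """\
gptid/622056f7-f020-11e3-8131-d05099192e39.eli ONLINE 0 0 0
10515161252 UNAVAIL 0 0 0 was /dev/gptid/a342c0a9-b57f-11e4-be96-d05099192e39.eli
gptid/632723e6-f020-11e3-8131-d05099192e39.eli ONLINE 0 0 0
gptid/638d914a-f020-11e3-8131-d05099192e39.eli ONLINE 0 0 0
gptid/63f6fd98-f020-11e3-8131-d05099192e39.eli ONLINE 0 0 0
gptid/645dccff-f020-11e3-8131-d05099192e39.eli ONLINE 0 0 0
"""


def online_labels(zpool_text):
    """Return the gptid labels of ONLINE pool members, without '.eli'."""
    labels = set()
    for line in zpool_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].startswith("gptid/") and fields[1] == "ONLINE":
            name = fields[0]
            if name.endswith(".eli"):
                name = name[: -len(".eli")]
            labels.add(name)
    return labels


def label_to_disk(glabel_text):
    """Map each gptid label to its disk, e.g. 'ada0p2' becomes 'ada0'."""
    mapping = {}
    for line in glabel_text.splitlines():
        fields = line.split()
        # Data rows have three columns: Name, Status, Components.
        if len(fields) == 3 and fields[0].startswith("gptid/"):
            mapping[fields[0]] = re.sub(r"p\d+$", "", fields[2])
    return mapping


def unmatched_disks(glabel_text, zpool_text):
    """Disks that carry gptid labels but back no ONLINE pool member."""
    online = online_labels(zpool_text)
    mapping = label_to_disk(glabel_text)
    in_pool = {disk for label, disk in mapping.items() if label in online}
    return sorted(set(mapping.values()) - in_pool)


print(unmatched_disks(GLABEL_SAMPLE, ZPOOL_SAMPLE))  # prints ['ada1']
```

The comparison is done per disk rather than per label, so a disk whose swap partition has its own gptid is not reported as long as one of its partitions backs an ONLINE member. Any other GPT-labelled disk that is not in the pool (a boot device, for example) will also show up in the result, so check the serial number before wiping anything.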
 

qwertymodo

Contributor
Joined
Apr 7, 2014
Messages
144
Ok, I think that got me the information I needed. Wish me luck and hold my Mountain Dew, I'm going in!
 

qwertymodo

Contributor
Joined
Apr 7, 2014
Messages
144
Well, looks like I was totally overthinking this whole thing. Didn't need to wipe anything, or even manually identify the drive. I just went to volume status, clicked on the unavailable drive and hit replace, and the lone drive was the only option. Clicked OK and we're on to resilvering. Yaaay.
 