Replacing Degraded Disk - Issues

Status: Not open for further replies.

Peter Friedrich
Dabbler
Joined: Mar 11, 2016
Messages: 13

Guys,

I had a red alert: ada4 was failing. I shut down my system, ordered a new 500 GB HDD (the same size as the old one), attached it, restarted, and started resilvering onto ada7.
The resilver completed with errors (66, I believe). I was not able to detach ada4 after resilvering; the option is just not there. Trying to detach it from the CLI:

[root@pienas01] ~# zpool detach PIENAS01VOL01 gptid/e561534e-f8ef-11e5-a0f1-0015176aa817
cannot detach gptid/e561534e-f8ef-11e5-a0f1-0015176aa817: no valid replicas
[root@pienas01] ~#

Scrubbing the volume (PIENAS01VOL01) did not help. So I stopped the collectd service and deleted all the affected files (they were all RRD files in 'PIENAS01VOL01/.system/rrd-0dc2ca1e7fa9464d8c4d7c4fd81f6855:/localhost/'), then scrubbed again and had 0 errors. But I still cannot detach the failing drive ada4 (gptid/e561534e-f8ef-11e5-a0f1-0015176aa817).
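A minimal sketch of the "find and delete the affected files" step, assuming a POSIX shell and awk. The helper `permanent_error_files` is a hypothetical name (not part of FreeNAS or ZFS); it prints the entries that `zpool status -v` lists under "Permanent errors" so they can be reviewed before anything is deleted or restored. The exact header text it matches is an assumption based on the usual `zpool status -v` layout, so check it against your own output first.

```shell
#!/bin/sh
# Hypothetical helper: read `zpool status -v` output on stdin and print each
# entry listed under "Permanent errors", one per line, without indentation.
# Entries can be absolute paths, dataset:path pairs, or <metadata>:<0x...>.
permanent_error_files() {
    awk '
        # A new "pool:" header (note the space after the colon) ends any listing.
        /^[ \t]*pool: / { listing = 0; next }
        # Assumed header line that precedes the list of damaged files.
        /^errors: Permanent errors have been detected/ { listing = 1; next }
        # Inside the listing, print non-blank lines with leading blanks removed.
        listing && NF > 0 { sub(/^[ \t]+/, ""); print }
    '
}

# Example use (review the list before deleting anything):
#   zpool status -v PIENAS01VOL01 | permanent_error_files
# After removing or restoring those files, reset the counters and re-check:
#   zpool clear PIENAS01VOL01
#   zpool scrub PIENAS01VOL01
```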

Any suggestions on what I can do to get ada7 to replace ada4 so I can disconnect ada4?

Current status after another resilvering:

[root@pienas01] ~# zpool status
  pool: PIENAS01VOL01
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub repaired 153G in 4h44m with 88 errors on Tue Jun 28 10:49:54 2016
config:

        NAME                                            STATE     READ WRITE CKSUM
        PIENAS01VOL01                                   DEGRADED     6     0   522
          gptid/2ea2dc21-f219-11e5-9632-000129a2e2d2    ONLINE       0     0     0
          gptid/2f7be694-f219-11e5-9632-000129a2e2d2    ONLINE       0     0     0
          gptid/3073e5b5-f219-11e5-9632-000129a2e2d2    ONLINE       0     0     0
          gptid/dd369694-1854-11e6-b479-0015176aa817    ONLINE       0     0     0
          gptid/fd5fe6d1-f7ab-11e5-94de-0015176aa817    ONLINE       0     0     0
          replacing-5                                   DEGRADED     6     0 1.02K
            gptid/e561534e-f8ef-11e5-a0f1-0015176aa817  DEGRADED     6     0 1.02K  too many errors
            16811129386278411136                        UNAVAIL      0     0     0  was /dev/gptid/52ab2899-3ba5-11e6-84d5-0015176aa817
            13171746776257900660                        OFFLINE      0     0   352  was /dev/gptid/6baee858-3bbe-11e6-be55-0015176aa817
            2645496781613445355                         OFFLINE      0     0   352  was /dev/ada7
            gptid/13e4f014-3cb2-11e6-b2e3-0015176aa817  ONLINE       0     0   352

errors: 66 data errors, use '-v' for a list

  pool: freenas-boot
 state: ONLINE
  scan: none requested
config:

        NAME                                          STATE     READ WRITE CKSUM
        freenas-boot                                  ONLINE       0     0     0
          gptid/31388efd-3ba1-11e6-9562-0015176aa817  ONLINE       0     0     0

errors: No known data errors

[root@pienas01] ~#
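The `replacing-5` vdev above still carries three leftover children (shown only by a numeric GUID, UNAVAIL or OFFLINE), apparently from earlier replace attempts. As a sketch, assuming POSIX sh and awk, the hypothetical helper `stale_replacing_children` pulls those GUIDs out of `zpool status` output; `zpool detach` accepts a GUID in place of a device name. Whether ZFS permits a particular detach depends on the pool state (in this thread, detaching the failing original was refused with "no valid replicas"), so treat this as an illustration, not a guaranteed fix.

```shell
#!/bin/sh
# Hypothetical helper (not part of FreeNAS or ZFS): read `zpool status`
# output on stdin and print the GUIDs of children under any "replacing-N"
# vdev that appear only as a numeric GUID and are UNAVAIL or OFFLINE.
stale_replacing_children() {
    awk '
        # 1-based column of the first non-blank character (0 for blank lines).
        function indent(s) { return match(s, /[^ \t]/) }
        {
            ind = indent($0)
            if ($1 ~ /^replacing-[0-9]+$/) { in_rep = 1; rep_ind = ind; next }
            if (in_rep) {
                # A blank line or a line at the same or lower depth ends the vdev.
                if (NF == 0 || ind <= rep_ind) { in_rep = 0; next }
                if ($1 ~ /^[0-9]+$/ && ($2 == "UNAVAIL" || $2 == "OFFLINE"))
                    print $1
            }
        }
    '
}

# Example use (each detach may still be refused, depending on pool state):
#   zpool status PIENAS01VOL01 | stale_replacing_children |
#       while read -r guid; do zpool detach PIENAS01VOL01 "$guid"; done
```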

Peter Friedrich

I just upgraded to the newest version (FreeNAS-9.10-STABLE-201606270534 (dd17351)) and rebooted. Resilver started automatically:

[root@pienas01] ~# zpool status
  pool: PIENAS01VOL01
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Tue Jun 28 17:13:14 2016
        160G scanned out of 2.46T at 424M/s, 1h34m to go
        18.9G resilvered, 6.37% done
config:

        NAME                      STATE     READ WRITE CKSUM
        PIENAS01VOL01             DEGRADED     0     0     0
          ada0p2                  ONLINE       0     0     0
          ada3p2                  ONLINE       0     0     0
          ada6p2                  ONLINE       0     0     0
          ada2p2                  ONLINE       0     0     0
          ada5p2                  ONLINE       0     0     0
          replacing-5             DEGRADED     0     0     0
            ada4p2                ONLINE       0     0     0
            16811129386278411136  UNAVAIL      0     0     0  was /dev/gptid/52ab2899-3ba5-11e6-84d5-0015176aa817
            13171746776257900660  OFFLINE      0     0     0  was /dev/gptid/6baee858-3bbe-11e6-be55-0015176aa817
            2645496781613445355   OFFLINE      0     0     0  was /dev/ada7
            ada7p2                ONLINE       0     0     0  (resilvering)

errors: 66 data errors, use '-v' for a list

  pool: freenas-boot
 state: ONLINE
  scan: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          ada1p2      ONLINE       0     0     0

errors: No known data errors
[root@pienas01] ~#
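For reference, the "to go" figure in that scan line is essentially the unscanned remainder divided by the current scan rate. A hedged sketch of that arithmetic in POSIX shell follows; the function name `resilver_eta` is hypothetical, and ZFS's own calculation may differ in detail. Taking 2.46T as roughly 2519 GiB, (2519 - 160) GiB * 1024 / 424 MiB/s is about 5697 s, which lines up with the 1h34m shown.

```shell
#!/bin/sh
# Hypothetical helper showing the arithmetic behind the "to go" estimate:
# time left = (total - scanned) / scan rate, using integer math only.
# Arguments: total size in GiB, amount scanned in GiB, scan rate in MiB/s.
resilver_eta() {
    total_gib=$1
    scanned_gib=$2
    rate_mibs=$3
    # Remaining data in MiB divided by MiB/s gives seconds.
    secs=$(( (total_gib - scanned_gib) * 1024 / rate_mibs ))
    # Print in the same "XhYm" style that zpool status uses.
    printf '%dh%dm\n' $((secs / 3600)) $((secs % 3600 / 60))
}

# 2.46T is roughly 2519 GiB, so this prints 1h34m for the status above.
resilver_eta 2519 160 424
```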


I am sure, though, that the resilver will finish with errors, and ada4p2 will not be automatically detached and replaced by ada7p2. I will know in a few hours, but what should I do if I get the same result again? How can I successfully resilver that disk and get the broken one detached? Of course, with the broken disk disconnected the system cannot import the pool at boot, so that does not work either (I gave it a try, thought maybe I'd get lucky... lol).

Any help, suggestions or advice really appreciated ;)

Robert Trevellyan
Pony Wrangler
Joined: May 16, 2014
Messages: 3,778

> I still cannot detach the failing drive

Are you absolutely 100% sure you're attempting to replace the correct drive? In other words, did you figure out the serial number of the failing drive?
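One way to answer that on FreeNAS/FreeBSD, sketched under assumptions: `glabel status` lists each gptid label next to the provider behind it (Name, Status, Components columns), and `smartctl -i` on that disk prints its serial number, which can be matched against the label on the physical drive. The helpers `gptid_provider` and `disk_of` are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read `glabel status` output on stdin and print the
# provider (e.g. ada4p2) behind the given gptid. Assumes the usual
# three-column layout: Name, Status, Components.
gptid_provider() {
    awk -v want="gptid/$1" '$1 == want { print $3 }'
}

# Strip a trailing GPT partition suffix to get the whole disk (ada4p2 -> ada4).
disk_of() {
    printf '%s\n' "$1" | sed 's/p[0-9][0-9]*$//'
}

# Example use:
#   prov=$(glabel status | gptid_provider e561534e-f8ef-11e5-a0f1-0015176aa817)
#   smartctl -i "/dev/$(disk_of "$prov")" | grep 'Serial Number'
```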
> Any help, suggestions or advice really appreciated

Well ... you should consider rebuilding your pool with some redundancy.