Stuck in replacing, need help

Status
Not open for further replies.

doron

Cadet
Joined
Apr 27, 2012
Messages
3
Hi folks,
We have a FreeNAS 8.0.3 server stuck in a funny state that we seem unable to sort out.
It's a raidz1, 5x 2TB HDDs.
The short version of the story is that one disk (ada3) failed. We pulled it out and replaced it with another 2TB drive, and issued replace from the GUI.
It started resilvering, then died with "too many errors", and the pool stayed "degraded" and "replacing". We then tried to scrub the pool. The scrub failed about 3 times; on the fourth attempt it ran for 16 hours and completed resilvering with no errors(!). Now, the state is:
Code:
[SSH@NewNAS] /> zpool status nas
  pool: nas
 state: DEGRADED
 scrub: none requested
config:

        NAME                        STATE     READ WRITE CKSUM
        nas                         DEGRADED     0     0     0
          raidz1                    DEGRADED     0     0     0
            ada0p2                  ONLINE       0     0     0
            ada1p2                  ONLINE       0     0     0
            ada2p2                  ONLINE       0     0     0
            replacing               DEGRADED     0     0     0
              11620499617465712447  UNAVAIL      0     0     0  was /dev/ada3p2/old
              ada3p2                ONLINE       0     0     0
            ada5p2                  ONLINE       0     0     0

errors: No known data errors
[SSH@NewNAS] />


Note the state of ada3p2 above. Nothing that we do (e.g. clear) takes the pool out of DEGRADED into ONLINE.

What to do now? How to get rid of the "replacing" and get our pool back to ONLINE?

Thanks!
 

doron

Thanks for the quick response!
In fact, we have RTFM; however, the docs seem to indicate that "replacing" will be cleared once resilvering is done, and only then do I need to "detach". I'm kinda wary of doing a detach while the pool is still in this "replacing" state. Is the detach safe? When you say you've seen it not work, what actually happened in those cases?
 

ProtoSD

MVP
Joined
Jul 1, 2011
Messages
3,348
What do you get when you do "zpool status -v"?

If it doesn't say it's still resilvering or scrubbing, then detaching /dev/ada3p2/old is what you need to do next.
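
For reference, the check-then-detach step described above could be sketched roughly as follows. This is only a sketch: `suggest_detach` is a made-up helper name, it assumes the old disk shows up as the UNAVAIL member of the "replacing" entry (as in the status output earlier in the thread), and it assumes `zpool detach` accepts the numeric ID shown there. It only prints the command; it does not run it.

```shell
#!/bin/sh
# Hypothetical helper (not part of FreeNAS or ZFS): read `zpool status`
# output on stdin and print the detach command for the stale half of a
# "replacing" vdev. It never runs zpool itself.
suggest_detach() {
    pool="$1"
    awk -v pool="$pool" '
        # Any resilver or scrub still running means: do not detach yet.
        /(resilver|scrub) in progress/ { busy = 1 }
        # A "replacing" vdev has two children; inspect the next two lines.
        $1 == "replacing" { left = 2; next }
        left > 0 { left--; if ($2 == "UNAVAIL") old = $1 "" }
        END {
            if (busy)      { print "wait: resilver or scrub still running"; exit }
            if (old == "") { print "nothing to detach"; exit }
            print "zpool detach " pool " " old
        }'
}

# Assumed usage on the NAS itself:
#   zpool status nas | suggest_detach nas
```

With the status output posted above, this should print a `zpool detach nas ...` line naming the long numeric ID of the old ada3p2, which can then be reviewed and run by hand.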

In the cases where it hasn't worked, people freaked out and started issuing random commands they didn't understand, so they were never able to remove the OLD device and the pool stayed in the DEGRADED state. They eventually gave up and recreated their pool, or got pissed and switched to some other OS.

I'm about to head off to bed, so I hope you're able to get it going ok!
 

doron

ProtoSD said: I'm about to head off to bed, so I hope you're able to get it going ok!

Hope you slept well... we sure are going to. Detach did the trick (a "doh" moment), and the pool is now online. We will still try to figure out whether any file has in fact been corrupted (multiple cases of i/o errors and checksum errors during the resilvering hint that there might still be some damage lurking somewhere), but it looks way better now. (I was reluctant to "just do it" cuz the ZFS manual I was looking at stated that when you are in "replacing", after a few(...) "zpool status" it should finish replacing and be back online. I suppose that manual was a bit off.)
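
One possible way to do that check, sketched under assumptions: run a fresh scrub, wait for it to finish, then look at the "errors:" line of `zpool status -v`. `pool_reports_clean` is a made-up helper name, and a clean result only means ZFS has not detected any damage, not that every file is proven intact.

```shell
#!/bin/sh
# Hypothetical helper: read `zpool status -v` output on stdin and succeed
# (exit status 0) only if the pool reports no known data errors.
pool_reports_clean() {
    grep -q '^errors: No known data errors$'
}

# Assumed usage on the NAS, once a fresh scrub has finished:
#   zpool scrub nas
#   zpool status -v nas | pool_reports_clean && echo "no known data errors"
```

If the check fails, the `zpool status -v` output itself is the place to look, since the -v flag is what makes it list the affected files.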

Thanks for the quick and effective help!!
 