Disk died after update: coincidence or not?

Toydoll (Dabbler, joined Sep 17, 2015, 33 messages)
Hi

I just updated FreeNAS from 11.2-U2 to 11.2-U8, and after the reboot I was greeted by a failed hard drive. A failed HDD is of course nothing unusual, and I realise that booting is more taxing on the drives than normal operation, so I guess it's not that strange for one to die after an update.

But...

That particular drive died without any warning (no odd sounds, no SMART warnings, etc.), and it's only a little more than a year old, so it's far too early for it to give up. On top of that, in my experience updating FreeNAS is... not always a smooth process.

So while this feels like a very dumb question, I guess it never hurts to ask: is there any chance that the update itself did something wonky to my machine that makes it believe the disk has failed?
Is there anything I can do to try to resurrect the drive, maybe a command that forces FreeNAS to do a thorough search and repair?

Thanks, and have a nice day.


Edit:
To clarify, I run RAIDZ2, so there is no (big) risk of losing data yet. The drives in that vdev are all 6 TB WD Reds.

Toydoll
So... 28 minutes later...

I opened the UI again to see which drive was broken, so I could check whether something was physically wrong (loose cable, etc.), and now everything works fine.

Not sure what to think of that.

JaimieV (Guru, joined Oct 12, 2012, 742 messages)
Toydoll said:
Is there any chance that the actual update did some wonky stuff with my machine which makes it believe my disk is failed?

Basically no.

You've experienced a transient error, but without a fair bit of rummaging you may never know whether it was hardware or software. It may have been nothing more exciting than the drive being a touch slow to respond on the reboot. If you want to rummage, look through /var/log/messages before you next reboot, and check whether that drive appeared in the boot-time disk list. You may find CAM errors or other messages referring to it; search the log for its device name (adaXX, daXX, or whatever it is).
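If you want to do that rummaging without reading the whole log by eye, a small Python sketch like this can filter it. It assumes a Python 3 interpreter is available on the box; the helper names (`disk_log_lines`, `scan_file`) and the keyword heuristics used to tag CAM errors and boot-time probe lines are my own guesses, not an official log format.

```python
import re
from typing import Iterable, List, Tuple


def disk_log_lines(lines: Iterable[str], device: str) -> List[Tuple[str, str]]:
    """Return (kind, line) pairs for log lines that mention `device`.

    kind is "cam" for lines that look like CAM error/retry reports,
    "attach" for lines that look like the boot-time device probe,
    and "other" for any remaining mention. These rules are heuristics.
    """
    # Match the bare device (ada3) and its partitions (ada3p2),
    # but not a different disk that shares the prefix (ada30).
    pattern = re.compile(r"\b" + re.escape(device) + r"(?:p\d+)?\b")
    hits = []
    for line in lines:
        if not pattern.search(line):
            continue
        if "CAM status" in line or "Retrying command" in line or "Error" in line:
            kind = "cam"
        elif " at " in line and "scbus" in line:
            kind = "attach"
        else:
            kind = "other"
        hits.append((kind, line.rstrip("\n")))
    return hits


def scan_file(path: str, device: str) -> None:
    """Print every tagged line about `device` from the log file at `path`."""
    with open(path, errors="replace") as fh:
        for kind, line in disk_log_lines(fh, device):
            print(f"[{kind}] {line}")
```

For example, `scan_file("/var/log/messages", "ada3")`, where "ada3" is a placeholder for whichever device name the suspect disk has. If the disk never shows up with an "attach" line from the boot, it wasn't detected at all on that boot.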

Personally I'd power it down and reseat all the SATA and power cables after taking that look!

Toydoll
I've already rebooted it, mainly to see whether it still worked, which it did. I also checked all the (physical) connections, and they all seemed to be as they should be.

But it's scrubbing right now, and I was sent an email stating that "One or more devices has experienced an unrecoverable error".

Code:
status: One or more devices has experienced an unrecoverable error.  An
    attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
    using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: scrub in progress since Sat Apr  4 06:00:02 2020
    24.1T scanned at 902M/s, 22.9T issued at 856M/s, 26.1T total
    8K repaired, 87.74% done, 0 days 01:05:19 to go
config:
    NAME                                            STATE     READ WRITE CKSUM
    head                                            ONLINE       0     0     0
      raidz2-0                                      ONLINE       0     0     0
        gptid/64cc2c30-5d2f-11e9-9b78-ac1f6b9743f0  ONLINE       0     0     0
        gptid/68574735-5d2f-11e9-9b78-ac1f6b9743f0  ONLINE       0     0     0
        gptid/6d135cf5-5d2f-11e9-9b78-ac1f6b9743f0  ONLINE       0     0     0
        gptid/71bb875f-5d2f-11e9-9b78-ac1f6b9743f0  ONLINE       0     0     0
        gptid/76845582-5d2f-11e9-9b78-ac1f6b9743f0  ONLINE       0     0     0
        gptid/7b3efb96-5d2f-11e9-9b78-ac1f6b9743f0  ONLINE       0     0     0
      raidz2-1                                      ONLINE       0     0     0
        gptid/3ed9b1a9-5d74-11e9-86e5-ac1f6b9743f0  ONLINE       0     0     0
        gptid/425b0efc-5d74-11e9-86e5-ac1f6b9743f0  ONLINE       0     0     0
        gptid/4748f2c2-5d74-11e9-86e5-ac1f6b9743f0  ONLINE       0     0     0
        gptid/4afce277-5d74-11e9-86e5-ac1f6b9743f0  ONLINE       0     0     0
        gptid/4f50b128-5d74-11e9-86e5-ac1f6b9743f0  ONLINE       0     0     0
        gptid/53199a62-5d74-11e9-86e5-ac1f6b9743f0  ONLINE       0     0     2


I see there are a bunch of forum threads about exactly this, so I'll dive into them later today. But to my inexperienced eyes it does look like one disk is acting up.
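Spotting the odd disk out in a long config table is easy to automate. This is a minimal Python sketch (the function name `devices_with_errors` is hypothetical) that returns every row whose READ, WRITE or CKSUM counter is not zero. It assumes the column layout shown in the zpool status output above, and treats abbreviated counters such as "1.2K" as simply nonzero.

```python
import re
from typing import List, Tuple

# One config row: NAME STATE READ WRITE CKSUM, optionally followed by a note
# such as "(repairing)". The header row is skipped because "STATE" is not a
# recognised state word.
ROW = re.compile(
    r"^\s*(\S+)\s+(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)"
    r"\s+(\S+)\s+(\S+)\s+(\S+)(?:\s+.*)?$"
)


def devices_with_errors(status_text: str) -> List[Tuple[str, str, str, str, str]]:
    """Return (name, state, read, write, cksum) for rows with any nonzero counter."""
    flagged = []
    for line in status_text.splitlines():
        m = ROW.match(line)
        if m is None:
            continue  # blank line, header, scan progress, etc.
        name, state, read, write, cksum = m.groups()
        # Counters are kept as strings so abbreviations like "1.2K" still count.
        if (read, write, cksum) != ("0", "0", "0"):
            flagged.append((name, state, read, write, cksum))
    return flagged
```

Fed the output above, it should flag only gptid/53199a62-5d74-11e9-86e5-ac1f6b9743f0 with a CKSUM count of 2. On FreeBSD, `glabel status` shows which adaX partition each gptid label belongs to, which gives you the device name for smartctl.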

What I don't understand, though, is that it says a device has experienced an "unrecoverable error", but zpool status says "8K repaired". Does that mean it was repaired or not? Or only partly repaired? Or does it not work like that at all?

JaimieV
How did it look after the scrub completed?

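The "unrecoverable error" refers to a problem on one HDD, which returned 8K of data that failed its checksum (that's the CKSUM count of 2 on the last disk in raidz2-1).
The pool's data was repaired: that 8K was reconstructed from the redundancy (parity) on the other disks in the vdev, which is why the message says "Applications are unaffected".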
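To make the "detected by checksum, rebuilt from redundancy" idea concrete, here is a toy Python model. It is deliberately simplified and is not how ZFS is actually coded: it uses a single XOR parity block, whereas RAID-Z2 keeps two parity columns, and it uses SHA-256 as a stand-in for ZFS's own checksum (fletcher4 by default). The function names are hypothetical.

```python
import hashlib
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-sized blocks together byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def read_with_repair(data: List[bytes], parity: bytes, expected: bytes) -> bytes:
    """Return the stripe's data, rebuilding one bad block if the checksum fails."""
    joined = b"".join(data)
    if hashlib.sha256(joined).digest() == expected:
        return joined  # checksum matches: nothing to repair
    # Checksum mismatch: assume each block in turn is the bad one, rebuild it
    # from parity plus the other blocks, and keep the version whose checksum
    # matches (roughly the trial-and-error reconstruction RAID-Z performs).
    for i in range(len(data)):
        others = data[:i] + data[i + 1:]
        rebuilt = xor_blocks(others + [parity])
        candidate = b"".join(data[:i] + [rebuilt] + data[i + 1:])
        if hashlib.sha256(candidate).digest() == expected:
            return candidate
    raise OSError("unrecoverable: more damage than one parity block can fix")
```

The point of the checksum is that ZFS knows *which* copy is wrong, so it can return good data to applications and rewrite the bad block, while still counting the event against the disk that returned it.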
The HDD that experienced the error may or may not be working correctly now. Usually FreeNAS will email you to say how broken a disk is, and will show it under Alerts in the UI. It may show up as READ/WRITE/CKSUM errors in zpool status, or you may have to go poking at it from the command line with smartctl -a /dev/ada1 (or whichever device it is).
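Once you know the device name, a few SMART attributes in the `smartctl -a` output are worth a look. The sketch below pulls their raw values out of the attribute table. Which attributes to watch, and treating any nonzero raw value as worth a second look, are my assumptions rather than a rule, and the helper names are hypothetical.

```python
from typing import Dict

# Attributes commonly checked on spinning disks (assumed watch list).
WATCH = {
    "Reallocated_Sector_Ct": "sectors already remapped",
    "Current_Pending_Sector": "sectors waiting to be remapped",
    "Offline_Uncorrectable": "sectors unreadable in offline scan",
    "UDMA_CRC_Error_Count": "link CRC errors (often cable/connector)",
}


def smart_raw_values(smartctl_output: str) -> Dict[str, int]:
    """Pull RAW_VALUE for the watched attributes out of `smartctl -a` text."""
    values = {}
    for line in smartctl_output.splitlines():
        parts = line.split()
        # Attribute rows: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW...
        if len(parts) < 10 or not parts[0].isdigit() or not parts[2].startswith("0x"):
            continue
        name, raw = parts[1], parts[9]
        if name in WATCH and raw.isdigit():
            values[name] = int(raw)
    return values


def report(smartctl_output: str) -> None:
    """Print one line per watched attribute, marking nonzero ones."""
    for name, raw in smart_raw_values(smartctl_output).items():
        flag = "CHECK" if raw > 0 else "ok"
        print(f"{flag:5} {name} = {raw} ({WATCH[name]})")
```

Save the output first, e.g. `smartctl -a /dev/ada1 > ada1.txt`, then call `report(open("ada1.txt").read())`. A growing UDMA_CRC_Error_Count in particular tends to point at the cable or connector rather than the platters.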

By the way, when I said "reseat all the power and SATA connections", I meant pull them out and push them back in, not just eyeball them. Environmental degradation can cause cable and socket issues, which physically unplugging and reseating can often fix.