Zpool Degraded - Please Help A Newcomer

molko · May 4, 2014

Hello,

We had a power cut yesterday, and I have a feeling it has taken one of my hard drives down, which is a bit unfortunate. I have 4 x 3TB drives in a raidz1-0 configuration.

When I initially start my machine up, I can run 'zpool status' and everything looks good.

Code:

[root@freenas] ~# zpool status
  pool: laurelstore
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
  see: http://illumos.org/msg/ZFS-8000-9P
  scan: resilvered 3.58M in 0h0m with 0 errors on Mon May  5 06:33:08 2014
config:
 
        NAME                                            STATE    READ WRITE CKSUM
        laurelstore                                    ONLINE      0    0    0
          raidz1-0                                      ONLINE      0    0    0
            gptid/99cd04be-6ba5-11e3-a258-10604b92b178  ONLINE      0    0    0
            gptid/9aadfe71-6ba5-11e3-a258-10604b92b178  ONLINE      0    0    5
            gptid/9b8c3302-6ba5-11e3-a258-10604b92b178  ONLINE      0    0    0
            gptid/9c688c02-6ba5-11e3-a258-10604b92b178  ONLINE      0    0    0
        logs
          gptid/9cc5a6e1-6ba5-11e3-a258-10604b92b178    ONLINE      0    0    0
 
errors: No known data errors

Then, after 3-4 minutes, one device shows as failed:

Code:

[root@freenas] ~# zpool status
  pool: laurelstore
 state: DEGRADED
status: One or more devices has been removed by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: resilvered 3.58M in 0h0m with 0 errors on Mon May  5 06:33:08 2014
config:

        NAME                                            STATE     READ WRITE CKSUM
        laurelstore                                     DEGRADED     0     0     0
          raidz1-0                                      DEGRADED     0     0     0
            gptid/99cd04be-6ba5-11e3-a258-10604b92b178  ONLINE       0     0     0
            16291702972151024056                        REMOVED      0     0     0  was /dev/gptid/9aadfe71-6ba5-11e3-a258-10604b92b178
            gptid/9b8c3302-6ba5-11e3-a258-10604b92b178  ONLINE       0     0     0
            gptid/9c688c02-6ba5-11e3-a258-10604b92b178  ONLINE       0     0     0
        logs
          gptid/9cc5a6e1-6ba5-11e3-a258-10604b92b178    ONLINE       0     0     0

errors: No known data errors

I'd like to initially try to use the drive again. I'm not totally convinced it has failed, TBH. I have only had the drives for around 4 months and I would have expected greater reliability than that.

I have tried 'zpool online' and 'zpool replace', with no joy:

Code:

[root@freenas] ~# zpool online laurelstore 16291702972151024056                                  
warning: device '16291702972151024056' onlined, but remains in faulted state
use 'zpool replace' to replace devices that are no longer present
 
[root@freenas] ~# zpool replace laurelstore 16291702972151024056 9aadfe71-6ba5-11e3-a258-10604b92b178
cannot open '9aadfe71-6ba5-11e3-a258-10604b92b178': no such GEOM provider
must be a full path or shorthand device name

Am I doing something wrong? Or is it likely that the disk actually is screwed?

Really appreciate any sort of help.

Thanks

molko · May 4, 2014

When I try to perform the same action through the GUI, I can't replace it with anything. Please see the attached screenshot.

Thanks

warri · May 4, 2014

This definitely looks like a failing device. Can you post the output of smartctl -q noserial -a /dev/ada1?
I wouldn't take my chances with an unreliable drive, especially not in a RaidZ1 pool, which can only survive the loss of one disk.
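
The warning signs are already in the first status output: the pool reports ONLINE, but one member has 5 checksum errors. As a rough sketch (the function name flag_vdevs is made up for illustration, and it assumes the five-column NAME/STATE/READ/WRITE/CKSUM layout shown above), something like this filters 'zpool status' down to the rows worth a second look:

```shell
# Hypothetical helper: print config rows whose state is not ONLINE
# or whose READ/WRITE/CKSUM counters are not all zero.
# Assumes the column layout of the 'zpool status' output quoted above.
flag_vdevs() {
  awk '
    # Only rows with a known device state in column 2 and all five columns present
    NF >= 5 && $2 ~ /^(ONLINE|DEGRADED|FAULTED|OFFLINE|REMOVED|UNAVAIL)$/ {
      if ($2 != "ONLINE" || $3 + $4 + $5 > 0)
        print $1, $2, $3, $4, $5
    }'
}

# Usage (needs a live pool, so not run here):
#   zpool status laurelstore | flag_vdevs
```

Fed the first output above, it should print only the gptid/9aadfe71-6ba5-11e3-a258-10604b92b178 row with its 5 checksum errors. Fed the second, it should print the pool, the raidz1-0 vdev, and the REMOVED device.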

If it's only 4 months old, chances are that you can RMA it if a SMART test fails. Btw, it is not uncommon for drives to die within the first hours of operation. That's why many people do "burn-ins" and stress tests before trusting a new HDD.
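
On the 'no such GEOM provider' error from your replace attempt: zpool resolves device names relative to /dev, so a bare UUID is looked up as /dev/9aadfe71-..., which does not exist. The name would need the gptid/ prefix, as in the status output. Here is a minimal sketch, with gptid_provider being a made-up helper name, that normalizes the id and checks the provider exists before anything is replaced. If the disk has dropped off the bus, the /dev/gptid/ entry will be missing no matter how the name is written, and whether a replace onto the same disk then succeeds is a separate question.

```shell
# Hypothetical helper: turn a GPT id (bare, gptid/..., or /dev/gptid/...)
# into the shorthand device name relative to /dev.
gptid_provider() {
  name=${1#/dev/}      # drop a leading /dev/ if present
  name=${name#gptid/}  # drop a leading gptid/ if present
  printf 'gptid/%s\n' "$name"
}

# Usage (needs the real pool, so not run here):
#   dev=$(gptid_provider 9aadfe71-6ba5-11e3-a258-10604b92b178)
#   if [ -e "/dev/$dev" ]; then
#     zpool replace laurelstore 16291702972151024056 "$dev"
#   else
#     echo "/dev/$dev is missing; the disk is not visible to the system"
#   fi
```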

9C1 Newbee · May 5, 2014

I'd go to the store as soon as possible and grab another 3TB drive to replace the failed one. I would RMA the failed one and keep the replacement they send back as a spare.

I have 2 spares myself.

molko · May 5, 2014

Hi,

Really appreciate the responses, guys. Thanks.

For now I have powered down the server. There is too much content on there and I don't want to risk any further HDD failures.

I have ordered 2 new WD Reds, and hopefully I can pick them up tomorrow. As suggested, I'll RMA the failed HDD. I've just checked my receipts and it is actually 6 months old!

Thanks again

m
