Possible two disk failure on RAIDZ1? Next steps?

tomf84

Dabbler
Joined
Apr 4, 2013
Messages
10
Hi

Trying to work out if I am just the victim of bad luck here - and what actions I should take to dig myself back out.

I have a RAIDZ1 pool of 4x 1TB SATA drives. A few days ago, I got some READ errors on ada1 (6659063e-3dd8-11e1-b2b4-6cf049d3b305), so I was getting ready to offline it and power down the box to replace the disk.

Powered it down this morning, booted it up this evening (untouched) to find "zpool status" now giving me this:

  pool: tank0
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
  scan: scrub canceled on Thu Apr  4 20:15:05 2013
config:

        NAME                                            STATE     READ WRITE CKSUM
        tank0                                           DEGRADED     0     0     0
          raidz1-0                                      DEGRADED     0     0     0
            gptid/65e94683-3dd8-11e1-b2b4-6cf049d3b305  ONLINE       0     0     0
            gptid/6659063e-3dd8-11e1-b2b4-6cf049d3b305  ONLINE       0     0     0
            gptid/66cac68e-3dd8-11e1-b2b4-6cf049d3b305  ONLINE       0     0     0
            1765056975075706550                         UNAVAIL      0     0     0  was /dev/gptid/6759ec40-3dd8-11e1-b2b4-6cf049d3b305


(I started a scrub then cancelled it straight away.)

Before all of this, glabel status showed the following. Now the line with ada3 is missing.

gptid/65e94683-3dd8-11e1-b2b4-6cf049d3b305 N/A ada0p2
gptid/6659063e-3dd8-11e1-b2b4-6cf049d3b305 N/A ada1p2
gptid/66cac68e-3dd8-11e1-b2b4-6cf049d3b305 N/A ada2p2
gptid/6759ec40-3dd8-11e1-b2b4-6cf049d3b305 N/A ada3p2


ada3 appears in /dev but ada3p2 does not - unlike all of the others.


[root@bunker] ~# ls /dev/ada*
/dev/ada0 /dev/ada0p1 /dev/ada0p2 /dev/ada1 /dev/ada1p1 /dev/ada1p2 /dev/ada2 /dev/ada2p1 /dev/ada2p2 /dev/ada3


I can run smartctl -a against ada3, and to my dismay I see errors:

ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 100 100 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0007 067 067 011 Pre-fail Always - 10680
4 Start_Stop_Count 0x0032 099 099 000 Old_age Always - 533
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 100 100 051 Pre-fail Always - 0
8 Seek_Time_Performance 0x0025 100 100 015 Pre-fail Offline - 10061
9 Power_On_Hours 0x0032 095 095 000 Old_age Always - 23230
10 Spin_Retry_Count 0x0033 100 100 051 Pre-fail Always - 0
11 Calibration_Retry_Count 0x0012 100 100 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 322
13 Read_Soft_Error_Rate 0x000e 100 100 000 Old_age Always - 0
183 Runtime_Bad_Block 0x0032 100 100 000 Old_age Always - 0
184 End-to-End_Error 0x0033 025 025 000 Pre-fail Always - 75
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
188 Command_Timeout 0x0032 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 083 037 000 Old_age Always - 17 (Min/Max 10/17)
194 Temperature_Celsius 0x0022 074 036 000 Old_age Always - 26 (4 227 26 10 0)
195 Hardware_ECC_Recovered 0x001a 100 100 000 Old_age Always - 55577
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 100 100 000 Old_age Always - 5
200 Multi_Zone_Error_Rate 0x000a 100 100 000 Old_age Always - 1
201 Soft_Read_Error_Rate 0x000a 253 253 000 Old_age Always - 0


A couple of days ago this read as follows (for the same drive):

ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 100 100 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0007 069 069 011 Pre-fail Always - 9950
4 Start_Stop_Count 0x0032 099 099 000 Old_age Always - 532
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 100 100 051 Pre-fail Always - 0
8 Seek_Time_Performance 0x0025 100 100 015 Pre-fail Offline - 10061
9 Power_On_Hours 0x0032 095 095 000 Old_age Always - 23229
10 Spin_Retry_Count 0x0033 100 100 051 Pre-fail Always - 0
11 Calibration_Retry_Count 0x0012 100 100 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 321
13 Read_Soft_Error_Rate 0x000e 100 100 000 Old_age Always - 0
183 Runtime_Bad_Block 0x0032 100 100 000 Old_age Always - 0
184 End-to-End_Error 0x0033 100 100 000 Pre-fail Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
188 Command_Timeout 0x0032 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 069 037 000 Old_age Always - 31 (Min/Max 30/43)
194 Temperature_Celsius 0x0022 067 036 000 Old_age Always - 33 (4 227 43 30 0)
195 Hardware_ECC_Recovered 0x001a 100 100 000 Old_age Always - 238100568
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 100 100 000 Old_age Always - 1
200 Multi_Zone_Error_Rate 0x000a 100 100 000 Old_age Always - 0
201 Soft_Read_Error_Rate 0x000a 100 100 000 Old_age Always - 0


Have I just encountered the worst enemy of RAIDZ1 - failure of two disks?

The whole pool is backed up to an external 3TB drive (about 2.5TB of data), and the backup was verified on the external drive with a ZFS scrub. So all is not absolutely lost if my pool is toast.

However I'd really like to recover this pool - partly as a learning exercise, and partly because surely I can't be THAT unlucky and I'm just not seeing something obvious. Right?

Just really don't know what my next move should be.

Cheers....
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Can you post the output of gpart list, camcontrol devlist, and zpool status?

Put them in CODE tags so they are easier to read.

You do appear to be the "lucky" person that sees RAIDZ1 fail you. Really, it shouldn't be much of a surprise to see multiple disks fail in quick succession. If you buy them all new, they are most likely all from the same batch. They then see almost identical wear and tear over time, so why wouldn't multiple drives fail at nearly the same time?

The Google white paper on hard drive lifespans found that if one disk fails in a machine, the chances of a second disk failing go up by 48%!

Now you know why I never recommend RAIDZ1 or mirrors. If you have one disk fail, that second failure could come along a lot faster than you think. Not to mention that most people don't hover over their servers every day, so if a disk starts getting flaky, how long will it take you to notice? Hopefully less time than it takes to see a second or third fail. :)

Also, I always recommend that people who don't do backups (some of my friends) keep a spare hard drive on the shelf, so when a disk fails they have a replacement they can drop in immediately.
 

titan_rw

Guru
Joined
Sep 1, 2012
Messages
586
Out of 8 drives, I had to RMA 3 of them at one point. 3 at the same time, not one at a time.

So yes, multiple drive failures are not uncommon.
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
Just really don't know what my next move should be.
Safest thing would be to make a block copy of the troublesome disks to new replacement disks first.

Some long SMART tests would be useful for these drives with the full output in [code][/code] tags.
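
Something along these lines from the shell, as a rough sketch (substitute your actual device names; ada1 and ada3 here are just the two suspects from your post):

Code:
# start an offline long self-test on each suspect disk
smartctl -t long /dev/ada1
smartctl -t long /dev/ada3

# a long test on a 1TB drive takes a couple of hours; check progress and results with
smartctl -l selftest /dev/ada1
smartctl -a /dev/ada3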

Also, I always recommend that people who don't do backups (some of my friends) keep a spare hard drive on the shelf, so when a disk fails they have a replacement they can drop in immediately.
I'd argue this applies to everyone regardless.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I'd argue this applies to everyone regardless.

I totally agree, but too many people aren't building their systems properly (by my definition of properly) because of cost, and are then upset when they lose everything. So while I can understand people cutting corners (I'm a disabled vet myself, so I'm not exactly filthy rich), you have to balance the cost of your hardware against the value of your data. :)
 

tomf84

Dabbler
Joined
Apr 4, 2013
Messages
10
Just booted it all up again. ada3 appears to have made a reappearance - but the boot was very slow, with the log reporting device timeouts.

Code:

[root@bunker] ~# zpool status
  pool: tank0
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
  scan: scrub canceled on Thu Apr  4 20:15:05 2013
config:

        NAME                                            STATE     READ WRITE CKSUM
        tank0                                           ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            gptid/65e94683-3dd8-11e1-b2b4-6cf049d3b305  ONLINE       0     0     0
            gptid/6659063e-3dd8-11e1-b2b4-6cf049d3b305  ONLINE       0     0     0
            gptid/66cac68e-3dd8-11e1-b2b4-6cf049d3b305  ONLINE       0     0     0
            gptid/6759ec40-3dd8-11e1-b2b4-6cf049d3b305  ONLINE       4 3.09K     4

errors: No known data errors


Here's the relevant output from gpart list

Code:
Geom name: ada0
modified: false
state: OK
fwheads: 16
fwsectors: 63
last: 1953525134
first: 34
entries: 128
scheme: GPT
Providers:
1. Name: ada0p1
   Mediasize: 2147483648 (2.0G)
   Sectorsize: 512
   Stripesize: 0
   Stripeoffset: 65536
   Mode: r1w1e0
   rawuuid: 65d206bd-3dd8-11e1-b2b4-6cf049d3b305
   rawtype: 516e7cb5-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 2147483648
   offset: 65536
   type: freebsd-swap
   index: 1
   end: 4194431
   start: 128
2. Name: ada0p2
   Mediasize: 998057319936 (929G)
   Sectorsize: 512
   Stripesize: 0
   Stripeoffset: 2147549184
   Mode: r1w1e2
   rawuuid: 65e94683-3dd8-11e1-b2b4-6cf049d3b305
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 998057319936
   offset: 2147549184
   type: freebsd-zfs
   index: 2
   end: 1953525134
   start: 4194432
Consumers:
1. Name: ada0
   Mediasize: 1000204886016 (931G)
   Sectorsize: 512
   Mode: r2w2e4

Geom name: ada1
modified: false
state: OK
fwheads: 16
fwsectors: 63
last: 1953525134
first: 34
entries: 128
scheme: GPT
Providers:
1. Name: ada1p1
   Mediasize: 2147483648 (2.0G)
   Sectorsize: 512
   Stripesize: 0
   Stripeoffset: 65536
   Mode: r1w1e0
   rawuuid: 66431ec9-3dd8-11e1-b2b4-6cf049d3b305
   rawtype: 516e7cb5-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 2147483648
   offset: 65536
   type: freebsd-swap
   index: 1
   end: 4194431
   start: 128
2. Name: ada1p2
   Mediasize: 998057319936 (929G)
   Sectorsize: 512
   Stripesize: 0
   Stripeoffset: 2147549184
   Mode: r1w1e2
   rawuuid: 6659063e-3dd8-11e1-b2b4-6cf049d3b305
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 998057319936
   offset: 2147549184
   type: freebsd-zfs
   index: 2
   end: 1953525134
   start: 4194432
Consumers:
1. Name: ada1
   Mediasize: 1000204886016 (931G)
   Sectorsize: 512
   Mode: r2w2e4

Geom name: ada2
modified: false
state: OK
fwheads: 16
fwsectors: 63
last: 1953525134
first: 34
entries: 128
scheme: GPT
Providers:
1. Name: ada2p1
   Mediasize: 2147483648 (2.0G)
   Sectorsize: 512
   Stripesize: 0
   Stripeoffset: 65536
   Mode: r1w1e0
   rawuuid: 66b4d5c7-3dd8-11e1-b2b4-6cf049d3b305
   rawtype: 516e7cb5-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 2147483648
   offset: 65536
   type: freebsd-swap
   index: 1
   end: 4194431
   start: 128
2. Name: ada2p2
   Mediasize: 998057319936 (929G)
   Sectorsize: 512
   Stripesize: 0
   Stripeoffset: 2147549184
   Mode: r1w1e2
   rawuuid: 66cac68e-3dd8-11e1-b2b4-6cf049d3b305
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 998057319936
   offset: 2147549184
   type: freebsd-zfs
   index: 2
   end: 1953525134
   start: 4194432
Consumers:
1. Name: ada2
   Mediasize: 1000204886016 (931G)
   Sectorsize: 512
   Mode: r2w2e4

Geom name: ada3
Providers:
1. Name: ada3p1
   Mediasize: 2147483648 (2.0G)
   Sectorsize: 512
   Stripesize: 0
   Stripeoffset: 65536
   Mode: r1w1e0
2. Name: ada3p2
   Mediasize: 998057319936 (929G)
   Sectorsize: 512
   Stripesize: 0
   Stripeoffset: 2147549184
   Mode: r1w1e2
Consumers:
1. Name: ada3
   Mediasize: 1000204886016 (931G)
   Sectorsize: 512
   Mode: r2w2e4




Here's the output from camcontrol devlist

Code:
<Hitachi HDS721010CLA332 JP4OA3EA>  at scbus0 target 0 lun 0 (pass0,ada0)
<Hitachi HDS721010CLA332 JP4OA3MA>  at scbus1 target 0 lun 0 (pass1,ada1)
<Hitachi HDS721010CLA332 JP4OA3MA>  at scbus3 target 0 lun 0 (pass2,ada2)
<SAMSUNG HD103UJ 1AA01113>         at scbus4 target 0 lun 0 (pass3,ada3)
<SanDisk Cruzer 7.01>              at scbus7 target 0 lun 0 (pass4,da0)


Current thinking

- offline ada3
- power down
- replace that Samsung drive with the spare I have on the shelf
- power up
- zpool replace, wait for resilver
- scrub?

then
- power down
- wait for newly purchased drive to arrive (Tuesday) - the Samsung is out of warranty
- power up
- offline ada1
- power down
- replace the Deskstar (HZ241ENU) with the newly purchased drive
- power up
- zpool replace, wait for resilver (commands sketched below)
- scrub
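
For my own reference, a rough sketch of the commands I think are behind those steps (gptid/NEWDISK is just a placeholder for whatever the new partition's label turns out to be; I'd probably drive this through the FreeNAS GUI anyway):

Code:
# take the failing member (ada3) out of the vdev
zpool offline tank0 gptid/6759ec40-3dd8-11e1-b2b4-6cf049d3b305

# after swapping and partitioning the new disk, tell ZFS to rebuild onto it
zpool replace tank0 gptid/6759ec40-3dd8-11e1-b2b4-6cf049d3b305 gptid/NEWDISK

# once the resilver completes, verify the pool
zpool scrub tank0
zpool status -v tank0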

Longer term

- Upgrade remaining drives to 2TB drives (the one I bought is a 2TB)
- Back up pool to two external 3TB drives
- Destroy pool
- Create new pool 4x 2TB RAIDZ2
- Restore backup
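
Command-wise, I'm picturing the rebuild going roughly like this (only a sketch; "backup0" and the gptid/DISKn labels are placeholders for the real external pool and the new partitions):

Code:
# refresh the backup on the external pool with a snapshot + replication send
zfs snapshot -r tank0@migrate
zfs send -R tank0@migrate | zfs receive -F backup0/tank0

# destroy the old pool and recreate it as RAIDZ2 across the four 2TB disks
zpool destroy tank0
zpool create tank0 raidz2 gptid/DISK1 gptid/DISK2 gptid/DISK3 gptid/DISK4

# restore everything from the backup
zfs send -R backup0/tank0@migrate | zfs receive -F tank0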

Does this seem reasonable?

Cheers and have a great weekend.
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
Current thinking

- offline ada3
- power down
- replace that Samsung drive with the spare I have on the shelf
No. Wait until Tuesday, when you have the replacement disk, unless you are hoping to restore from backup. I would make a block copy of ada1 using ddrescue to the spare. Then attempt a block copy of ada3, which still has serious issues, using ddrescue to the new disk. I'm suggesting this order in case the copy from ada1 isn't completely clean and you are able to pull enough off ada3 to still provide redundancy for any missing bits.
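
Roughly along these lines (a sketch only; double-check which device is the source and which is the target before running anything, and substitute your real device names for the spare/new disk, which I'm calling ada4 and ada5 here):

Code:
# copy ada1 to the spare, keeping a map file so the copy can be resumed
ddrescue -f /dev/ada1 /dev/ada4 /root/ada1.map

# then ada3 to the new disk; -r3 retries the bad areas a few extra times
ddrescue -f -r3 /dev/ada3 /dev/ada5 /root/ada3.map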

Furthermore, for any shutdown/replace I actually wouldn't offline the old device first. Removing it and having it be UNAVAIL simplifies things if you need to put the original back in for any reason. Now if I was hot-swapping, I would always offline the drive first.

Longer term
Looks good to me.
 

tomf84

Dabbler
Joined
Apr 4, 2013
Messages
10
Now ddrescue looks like a very cool tool, and one I wasn't familiar with at all, so thanks for that. Going to kick off the block copy of ada1 to the spare shortly.
 

tomf84

Dabbler
Joined
Apr 4, 2013
Messages
10
Little update. Got a copy of ada1 using ddrescue onto the spare disk. No errors reported.

Now I've just got to wait for the new disk to arrive and attempt the ddrescue copy of ada3 onto that.

Guessing here, and I won't attempt to boot back into FreeNAS until I get confirmation, but swapping out the physical drives should be transparent to ZFS, right? I.e. when I get the second bad drive ddrescue'd to the second new disk, these disks just take the place of the two bad ones in the array? Therefore just replacing the physical medium with good hardware, containing as much of the original data as possible, and letting ZFS verify the correctness of the data via a scrub, is good enough? (No resilvering?)

Cheers
Tom
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
Guessing here, and I won't attempt to boot back into FreeNAS until I get confirmation, but swapping out the physical drives should be transparent to ZFS, right? I.e. when I get the second bad drive ddrescue'd to the second new disk, these disks just take the place of the two bad ones in the array? Therefore just replacing the physical medium with good hardware, containing as much of the original data as possible, and letting ZFS verify the correctness of the data via a scrub, is good enough? (No resilvering?)
Exactly correct. Normally one would just replace the drive and resilver, but your situation is a bit different: two drives are having issues and you only have a single-parity array. I'd wait until Tuesday before booting anything up, ada3 in particular. Once the copy is done, a scrub will verify all the data and repair what it can, if necessary.
 

Z300M

Guru
Joined
Sep 9, 2011
Messages
882
Really, it shouldn't be much of a surprise to see multiple disks fail in quick succession. If you buy them all new, they are most likely all from the same batch. They then see almost identical wear and tear over time, so why wouldn't multiple drives fail at nearly the same time?

The Google white paper on hard drive lifespans found that if one disk fails in a machine, the chances of a second disk failing go up by 48%!

I bought four 2TB Seagate drives at the same time 18 months or so ago and had to RMA one of them about 9 months ago. There has been no sign of a problem with the other three, or with the identical replacement I bought when I discovered that the replacement Seagate would be sending was a different (later) model with a far less satisfactory user rating on NewEgg.

I set this NAS machine up with four 2TB drives in RAIDZ1 (plus a hot spare that I added later) and now wish that I had sprung for the extra drive in the first place and used RAIDZ2. If I am not mistaken, the only way to accomplish this now would be to buy another set of drives either as permanent storage or as temporary storage from which I can copy back to a recreated RAIDZ2 version of the present pool. If I had a separate backup machine, things would be much simpler, but this NAS box is my backup machine for several other computers running a couple of different operating systems.
 

tomf84

Dabbler
Joined
Apr 4, 2013
Messages
10
Hi

Just wanted to come back and thank everyone, specifically paleoN, for the help and advice.

The ddrescue of both drives followed by scrub worked perfectly and everything is back to normal.

Cheers
Tom
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
The ddrescue of both drives followed by scrub worked perfectly and everything is back to normal.
Excellent and thanks for reporting the outcome.
 

tomf84

Dabbler
Joined
Apr 4, 2013
Messages
10
As a follow-up, I now suspect that one or both of these "failed" drives were actually completely fine.

This is on the basis of the drive dropout problem I describe in this thread.

I have now migrated to Ubuntu Server with ZFS-on-Linux, and am suffering zero issues after several weeks of use.

Thanks for talking me through this issue.
 