Drive Failure - Not rebuilding?

Status
Not open for further replies.

TurboNewb

Cadet
Joined
Jan 20, 2015
Messages
8
FreeNAS 9.2.1.8

Had a failure of da9 in a pool. Used the GUI to replace the disk with a newly inserted drive that was indicated to be da12. The system said please wait, but the box actually rebooted.

When the system came back up, the da# ordering of the disks appears to have changed. Based on the serial number, the new drive, which was da12, is now da4, and the supposed bad drive is now da9. The new drive doesn't appear to be in the pool based on the zpool status output below, nor does the array look to be resilvering.

The data does appear to still be there in the shares, but performance is beyond abysmal (1-2k transfer rates, 30 min to copy 60 MB).

I wanted to ask what the next step should be so I don't send this all crashing down, as we want at least a 90% recovery rate on this data. A few corrupt files after a rebuild are fine, as they are just photos.

Code:
[root@freenas ~]# zpool status
pool: CamsSegate
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: http://illumos.org/msg/ZFS-8000-9P
scan: none requested
config:

NAME STATE READ WRITE CKSUM
CamsSegate ONLINE 0 0 0
raidz1-0 ONLINE 0 0 0
gptid/16ef361b-9a82-11e4-8ace-003048356c44 ONLINE 0 0 0
gptid/1755f106-9a82-11e4-8ace-003048356c44 ONLINE 0 0 0
gptid/17b96800-9a82-11e4-8ace-003048356c44 ONLINE 0 0 0
gptid/181e89ec-9a82-11e4-8ace-003048356c44 ONLINE 0 0 0
gptid/188337b1-9a82-11e4-8ace-003048356c44 ONLINE 0 0 0
gptid/18e78791-9a82-11e4-8ace-003048356c44 ONLINE 0 0 4

errors: No known data errors
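For anyone who wants to pick out the member with non-zero counters from output like the above without eyeballing it, here is a minimal Python sketch. The sample text is copied from the config section of the output in this post; the function name `devices_with_errors` and the regular expression are my own illustration, not part of any ZFS tooling. It assumes plain integer counters, so rows where a counter is printed in an abbreviated form would not match.

```python
import re

# Config rows copied from the zpool status output in this post.
SAMPLE = """\
NAME STATE READ WRITE CKSUM
CamsSegate ONLINE 0 0 0
raidz1-0 ONLINE 0 0 0
gptid/16ef361b-9a82-11e4-8ace-003048356c44 ONLINE 0 0 0
gptid/1755f106-9a82-11e4-8ace-003048356c44 ONLINE 0 0 0
gptid/17b96800-9a82-11e4-8ace-003048356c44 ONLINE 0 0 0
gptid/181e89ec-9a82-11e4-8ace-003048356c44 ONLINE 0 0 0
gptid/188337b1-9a82-11e4-8ace-003048356c44 ONLINE 0 0 0
gptid/18e78791-9a82-11e4-8ace-003048356c44 ONLINE 0 0 4
"""

# name, state, then three integer counters (READ, WRITE, CKSUM).
ROW = re.compile(r"^\s*(\S+)\s+([A-Z]+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$")


def devices_with_errors(status_text):
    """Return (name, read, write, cksum) for every row with a non-zero counter."""
    flagged = []
    for line in status_text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # the header row and free-text lines do not match
        name, _state, read, write, cksum = match.groups()
        counts = (int(read), int(write), int(cksum))
        if any(counts):
            flagged.append((name, *counts))
    return flagged


for name, read, write, cksum in devices_with_errors(SAMPLE):
    print(f"{name}: read={read} write={write} cksum={cksum}")
```

On the sample above this flags only the last gptid entry, the one with 4 checksum errors.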



System:
16Bay Super Micro dual 54XX Xeon
32Gb ECC
2X LSI cards in IT mode firmware V20
12 3Gb SATA HDD


Thanks!!
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
2X LSI cards in IT mode firmware V20

First of all, downgrade to P16. We've seen very nasty results with P20, despite them supposedly being identical in IT mode.

Next, give us the SMART output for the drive with the four errors. Pastebin, please. Note that I'm assuming all those drives are in the same vdev (all with the same indent) - if they're not, you have a big problem.

We'll see what to do next at that point.
 

TurboNewb

Cadet
Joined
Jan 20, 2015
Messages
8
Thank you Ericloewe!

I flashed to the P16 firmware and the alert status changed back to green. All drives in the volume show zeros across the board for read, write, and checksum errors, with an online status. I went in and ran a SMART short self-test on every drive in the system (smartctl -t short /dev/daX) and viewed the results (smartctl -a /dev/daX). All tests completed without error, with values well above the thresholds!
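If it helps anyone repeat this across many drives, here is a rough Python sketch that builds the same two smartctl invocations for a list of devices. The device range da0 through da12 is an assumption about this box, and the helper names `smart_commands` and `run_all` are made up for illustration. By default it only prints the commands (a dry run) rather than executing anything.

```python
import subprocess


def smart_commands(devices, test_type="short"):
    """Build argv lists for starting a self-test and for reading the results."""
    start = [["smartctl", "-t", test_type, f"/dev/{dev}"] for dev in devices]
    report = [["smartctl", "-a", f"/dev/{dev}"] for dev in devices]
    return start, report


def run_all(commands, dry_run=True):
    """Print each command, or execute it when dry_run is False."""
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            # smartctl's exit status is a bitmask, so a non-zero value is
            # not treated as a hard failure here.
            subprocess.run(argv, check=False)


# Assumed device names for this system: da0 through da12.
devices = [f"da{i}" for i in range(13)]
start, report = smart_commands(devices)
run_all(start)  # dry run: prints the commands only
```

The self-tests run in the background on the drives, so the `report` commands are meant to be run separately once the tests have had time to finish.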

Did a quick copy off a CIFS share and performance was back to normal.

At this point I think we are good unless you can think of any other tests I should run.

Thanks for the help!

In case anyone needs it, the P16 firmware is available here, as it is not available under the normal section of LSI's site: http://www.lsi.com/downloads/Public...P16_IR_IT_Firmware_BIOS_for_MSDOS_Windows.zip

Thanks to diehard for that link.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Green status is always the case after a reboot (for ZFS errors at least), but the lack of problems in SMART is encouraging, as is the performance improvement.

@cyberjock - assuming this isn't a hidden gremlin waiting to strike again, what do you make of this? Are the LSI guys living in an alternate fantasy world? P20 seems crazy incompatible with the FreeBSD driver, despite the supposed lack of changes.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
@cyberjock - assuming this isn't a hidden gremlin waiting to strike again, what do you make of this? Are the LSI guys living in an alternate fantasy world? P20 seems crazy incompatible with the FreeBSD driver, despite the supposed lack of changes.

Not sure. The lead FreeNAS dev is investigating. He says he asked the LSI guys about this and they are swearing they've heard nothing about problems, but you and I have both seen it enough times to know it's not a random coincidence. *Something* is not right with the P20 firmware and our driver. Whether another piece is required to trigger it (such as some broken firmware in certain disks), I don't know.

@TurboNewb
Can you post the output of #camcontrol devlist please?
 

TurboNewb

Cadet
Joined
Jan 20, 2015
Messages
8
Can you post the output of #camcontrol devlist please?

Sure thing. By the way cyberjock, thank you for all of your help around here and the guide, it has been a great resource!


It was one of the Seagate 3TB drives that had been identified as failed.

Output:
Code:
<ATA HGST HDN724030AL A5E0>  at scbus2 target 0 lun 0 (da0,pass0)
<ATA WDC WD30EFRX-68E 0A82>  at scbus2 target 1 lun 0 (da1,pass1)
<ATA WDC WD30EFRX-68E 0A82>  at scbus2 target 2 lun 0 (da2,pass2)
<ATA WDC WD30EFRX-68E 0A82>  at scbus2 target 3 lun 0 (da3,pass3)
<ATA WDC WD30EFRX-68E 0A82>  at scbus2 target 4 lun 0 (da4,pass4)
<ATA ST3000VN000-1HJ1 SC60>  at scbus3 target 0 lun 0 (da5,pass5)
<ATA ST3000VN000-1HJ1 SC60>  at scbus3 target 1 lun 0 (da6,pass6)
<ATA WDC WD30EFRX-68E 0A82>  at scbus3 target 2 lun 0 (da7,pass7)
<ATA WDC WD30EFRX-68E 0A82>  at scbus3 target 3 lun 0 (da8,pass8)
<ATA ST3000VN000-1HJ1 SC60>  at scbus3 target 4 lun 0 (da9,pass9)
<ATA ST3000VN000-1HJ1 SC60>  at scbus3 target 5 lun 0 (da10,pass10)
<ATA ST3000VN000-1HJ1 SC60>  at scbus3 target 6 lun 0 (da11,pass11)
<ATA ST3000VN000-1HJ1 SC60>  at scbus3 target 7 lun 0 (da12,pass12)
<MATSHITA DVD-ROM SR-8178 PZ16>  at scbus4 target 1 lun 0 (cd0,pass13)
<Lexar USB Flash Drive 1100>  at scbus6 target 0 lun 0 (da13,pass14) 
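As a convenience for matching da numbers to drive models after a renumbering like the one described earlier in the thread, here is a small Python sketch that parses the devlist output above. The regular expression and the `parse_devlist` helper are my own illustration, and I am assuming the ST3000VN000 model string identifies the Seagate drives in this box. The sample text is copied from the output above.

```python
import re

# Copied from the camcontrol devlist output above.
DEVLIST = """\
<ATA HGST HDN724030AL A5E0>  at scbus2 target 0 lun 0 (da0,pass0)
<ATA WDC WD30EFRX-68E 0A82>  at scbus2 target 1 lun 0 (da1,pass1)
<ATA WDC WD30EFRX-68E 0A82>  at scbus2 target 2 lun 0 (da2,pass2)
<ATA WDC WD30EFRX-68E 0A82>  at scbus2 target 3 lun 0 (da3,pass3)
<ATA WDC WD30EFRX-68E 0A82>  at scbus2 target 4 lun 0 (da4,pass4)
<ATA ST3000VN000-1HJ1 SC60>  at scbus3 target 0 lun 0 (da5,pass5)
<ATA ST3000VN000-1HJ1 SC60>  at scbus3 target 1 lun 0 (da6,pass6)
<ATA WDC WD30EFRX-68E 0A82>  at scbus3 target 2 lun 0 (da7,pass7)
<ATA WDC WD30EFRX-68E 0A82>  at scbus3 target 3 lun 0 (da8,pass8)
<ATA ST3000VN000-1HJ1 SC60>  at scbus3 target 4 lun 0 (da9,pass9)
<ATA ST3000VN000-1HJ1 SC60>  at scbus3 target 5 lun 0 (da10,pass10)
<ATA ST3000VN000-1HJ1 SC60>  at scbus3 target 6 lun 0 (da11,pass11)
<ATA ST3000VN000-1HJ1 SC60>  at scbus3 target 7 lun 0 (da12,pass12)
<MATSHITA DVD-ROM SR-8178 PZ16>  at scbus4 target 1 lun 0 (cd0,pass13)
<Lexar USB Flash Drive 1100>  at scbus6 target 0 lun 0 (da13,pass14)
"""

LINE = re.compile(
    r"^<(?P<ident>[^>]*)>\s+at (?P<bus>scbus\d+) "
    r"target (?P<target>\d+) lun (?P<lun>\d+) \((?P<names>[^)]*)\)"
)


def parse_devlist(text):
    """Turn devlist lines into dicts with device name, ident string, bus, target."""
    rows = []
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue
        names = match.group("names").split(",")
        # Take the first name that is not the passN passthrough device.
        dev = next((n for n in names if not n.startswith("pass")), names[0])
        rows.append({
            "dev": dev,
            "ident": match.group("ident"),
            "bus": match.group("bus"),
            "target": int(match.group("target")),
        })
    return rows


rows = parse_devlist(DEVLIST)
seagates = [row["dev"] for row in rows if "ST3000VN000" in row["ident"]]
print(seagates)
```

This only maps device names to model strings. Telling identical models apart still needs the serial numbers, which this output does not include.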
 