How can I tell which disk is failing?

Status
Not open for further replies.

paulhuynh81
Dabbler | Joined: Jul 1, 2014 | Messages: 13

[root@IX-NAS01] ~# zpool status -v
  pool: VOLUME01
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 18h55m with 29 errors on Sun Jan 31 18:55:07 2016
config:

        NAME                                            STATE     READ WRITE CKSUM
        VOLUME01                                        DEGRADED     0     0     0
          raidz2-0                                      DEGRADED     0     0     0
            gptid/24a118b5-360a-11e5-9eb4-0015172fbae8  ONLINE       0     0     0
            gptid/2537113f-360a-11e5-9eb4-0015172fbae8  ONLINE       0     0     0
            gptid/25bad772-360a-11e5-9eb4-0015172fbae8  ONLINE       0     0     0
            gptid/2657ab57-360a-11e5-9eb4-0015172fbae8  ONLINE       0     0     0
            gptid/26dfa415-360a-11e5-9eb4-0015172fbae8  ONLINE       0     0     0
            gptid/2779340c-360a-11e5-9eb4-0015172fbae8  ONLINE       0     0     0
            gptid/5719d818-bf8c-11e5-ab4a-0015172fbae8  ONLINE       0     0     0
            gptid/2897c2c0-360a-11e5-9eb4-0015172fbae8  ONLINE       0     0     0
            gptid/29192059-360a-11e5-9eb4-0015172fbae8  ONLINE       0     0     0
            gptid/299afe01-360a-11e5-9eb4-0015172fbae8  ONLINE       0     0     0
            gptid/2a20b17a-360a-11e5-9eb4-0015172fbae8  ONLINE       0     0     0
            gptid/2aa527f1-360a-11e5-9eb4-0015172fbae8  FAULTED      0    27     0  too many errors
            gptid/016a257d-b643-11e5-9435-0015172fbae8  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        VOLUME01:<0x27d4>

  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0h0m with 0 errors on Fri Jan 15 03:45:46 2016
config:

        NAME                                          STATE     READ WRITE CKSUM
        freenas-boot                                  ONLINE       0     0     0
          gptid/f6c7e84d-833c-11e5-84dd-0015172fbae8  ONLINE       0     0     0

errors: No known data errors
[root@IX-NAS01] ~#
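To answer the title question from the output above: the failing member is the leaf device whose STATE is not ONLINE or whose READ/WRITE/CKSUM counters are non-zero. A minimal sketch that picks those rows out of `zpool status` text, assuming the default column layout shown here and that leaf devices are `gptid/...` labels (pool and vdev rows such as `VOLUME01` and `raidz2-0` are skipped because they contain no "/"). The helper name `suspect_devices` is hypothetical.

```python
from __future__ import annotations

import re

# Assumed layout of a `zpool status` device row:
# NAME STATE READ WRITE CKSUM [optional note such as "too many errors"]
DEVICE_ROW = re.compile(
    r"^\s*(?P<name>\S+)\s+(?P<state>[A-Z]+)\s+(?P<read>\d+)\s+"
    r"(?P<write>\d+)\s+(?P<cksum>\d+)(?:\s+(?P<note>.+))?$"
)

# Excerpt of the OP's output (most ONLINE rows omitted).
SAMPLE_STATUS = """\
        NAME                                            STATE     READ WRITE CKSUM
        VOLUME01                                        DEGRADED     0     0     0
          raidz2-0                                      DEGRADED     0     0     0
            gptid/2a20b17a-360a-11e5-9eb4-0015172fbae8  ONLINE       0     0     0
            gptid/2aa527f1-360a-11e5-9eb4-0015172fbae8  FAULTED      0    27     0  too many errors
            gptid/016a257d-b643-11e5-9435-0015172fbae8  ONLINE       0     0     0
"""


def suspect_devices(status_text: str) -> list[dict]:
    """Return leaf devices that are not ONLINE or have non-zero error counters.

    Assumption: leaf devices are gptid/... labels, as on this FreeNAS box.
    """
    suspects = []
    for line in status_text.splitlines():
        m = DEVICE_ROW.match(line)
        if m is None or "/" not in m.group("name"):
            continue  # header row, pool row, or vdev row
        counts = {k: int(m.group(k)) for k in ("read", "write", "cksum")}
        if m.group("state") != "ONLINE" or any(counts.values()):
            suspects.append({
                "name": m.group("name"),
                "state": m.group("state"),
                **counts,
                "note": m.group("note") or "",
            })
    return suspects


if __name__ == "__main__":
    for dev in suspect_devices(SAMPLE_STATUS):
        print(dev["name"], dev["state"], "write errors:", dev["write"], dev["note"])
```

On this output it singles out `gptid/2aa527f1-360a-11e5-9eb4-0015172fbae8` (FAULTED, 27 write errors). That gptid still has to be mapped to a device node (daN) before you can find the physical drive.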
 

Bidule0hm
Server Electronics Sorcerer | Joined: Aug 5, 2013 | Messages: 3,710

You can use my script to identify drives; look at the "Useful Scripts" link in my signature if you want it ;)

Edit: wow, how can a RAID-Z2 corrupt data when only one drive is failing? Pinging @jgreco and @cyberjock.
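This is not Bidule0hm's script (its contents aren't shown in the thread); it is a minimal sketch of the same idea, assuming FreeBSD's `glabel status`, which lists each gptid label next to the partition that carries it. The sample output and the device names `da10p2`, `da11p2`, `da13p1` are hypothetical. Once you have the disk name, `smartctl -i /dev/da11` prints its serial number, which you can match against the sticker on the physical drive.

```python
from __future__ import annotations

import re

# Hypothetical `glabel status` output; real device names on the OP's box will differ.
SAMPLE_GLABEL = """\
                                      Name  Status  Components
gptid/2a20b17a-360a-11e5-9eb4-0015172fbae8     N/A  da10p2
gptid/2aa527f1-360a-11e5-9eb4-0015172fbae8     N/A  da11p2
gptid/f6c7e84d-833c-11e5-84dd-0015172fbae8     N/A  da13p1
"""


def gptid_map(glabel_text: str) -> dict[str, str]:
    """Map each gptid label to the partition that carries it (e.g. 'da11p2')."""
    mapping = {}
    for line in glabel_text.splitlines():
        fields = line.split()
        # Expected columns: Name, Status, Components. The header row is skipped
        # because its first field is "Name", not a gptid.
        if len(fields) == 3 and fields[0].startswith("gptid/"):
            label, _status, partition = fields
            mapping[label] = partition
    return mapping


def disk_of(partition: str) -> str:
    """Strip the GPT partition suffix: 'da11p2' -> 'da11', 'ada0p1' -> 'ada0'."""
    return re.sub(r"p\d+$", "", partition)


if __name__ == "__main__":
    faulted = "gptid/2aa527f1-360a-11e5-9eb4-0015172fbae8"
    disk = disk_of(gptid_map(SAMPLE_GLABEL)[faulted])
    print(f"{faulted} is on /dev/{disk}; next: smartctl -i /dev/{disk}")
```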
 

Mirfster
Doesn't know what he's talking about | Joined: Oct 2, 2015 | Messages: 3,215

I'm wondering whether the system lacks ECC RAM. Also, it looks like the same thing happened to the OP about a month ago:

"Please help which disk is bad"

@joeschmuck mentioned that he noticed an SSHD (hybrid) drive in the SMART results. I have not looked closely at the SMART results, but I wonder whether there is more than one SSHD and whether that has any bearing as well.

OP, can you please provide your FreeNAS system specs?

Stellardata
Cadet | Joined: Nov 8, 2015 | Messages: 2

Yep, just pull the failed disk out and insert a new one.

Heck, you could even try removing the failed disk, reinserting the same disk, and seeing whether it's detected and starts rebuilding (resilvering).
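If you try reseating the same disk, a cautious order is: take the member offline, pull and reinsert it, bring it back online so ZFS can resilver what changed while it was out, then clear its error counters. A minimal dry-run sketch that only prints the `zpool offline`/`online`/`clear`/`status` commands for the FAULTED gptid from the output above; the ordering is my assumption of a sensible sequence, not a FreeNAS procedure. Swapping in a brand-new disk is normally done from the FreeNAS web GUI so the new disk is partitioned the way FreeNAS expects; this sketch covers only the reseat experiment.

```python
from __future__ import annotations

import shlex

POOL = "VOLUME01"
FAULTED = "gptid/2aa527f1-360a-11e5-9eb4-0015172fbae8"  # from the status output


def reseat_plan(pool: str, member: str) -> list[list[str]]:
    """Dry-run command list for pulling and reinserting the same disk."""
    return [
        ["zpool", "offline", pool, member],  # before pulling the disk
        # ...physically remove and reinsert the same disk here...
        ["zpool", "online", pool, member],   # bring it back; ZFS resilvers what it missed
        ["zpool", "clear", pool, member],    # reset its READ/WRITE/CKSUM counters
        ["zpool", "status", "-v", pool],     # watch the "scan:" line for the resilver
    ]


if __name__ == "__main__":
    for argv in reseat_plan(POOL, FAULTED):
        print(shlex.join(argv))  # printed only; nothing is executed
```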
 

paulhuynh81
Dabbler | Joined: Jul 1, 2014 | Messages: 13

Quoting Mirfster: "OP, can you please provide your FreeNAS system specs?"
The system is dual Xeon with 16 GB of ECC RAM on a Supermicro motherboard. I rebuilt an EMC Data Domain box and turned it into a FreeNAS server.

As for the disks, all of them are:
Model Family: Seagate Desktop SSHD
Device Model: ST2000DX001-1CM164
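Since the SMART results for these Seagate SSHDs came up earlier in the thread, note that the overall PASSED verdict compares normalized values against thresholds, so raw counters such as pending or reallocated sectors can be non-zero while a drive still passes. A minimal sketch, assuming the ten-column attribute table printed by `smartctl -A /dev/daN` (RAW_VALUE last). The watched attribute IDs are a common rule of thumb, not a complete health test, and the sample values are made up rather than taken from the OP's drives.

```python
from __future__ import annotations

import re

# Attributes whose raw counts commonly point at a failing disk, or (199) at a
# bad cable/backplane. A rule of thumb, not a complete health test.
WATCHED = {5, 187, 197, 198, 199}

# Made-up `smartctl -A` excerpt for illustration only.
SAMPLE_ATTRS = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
"""


def worrying_attributes(smart_text: str) -> list[tuple[int, str, int]]:
    """Return (id, name, raw) for watched attributes with a non-zero raw value."""
    hits = []
    for line in smart_text.splitlines():
        fields = line.split()
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # header or non-table line
        attr_id, name = int(fields[0]), fields[1]
        # RAW_VALUE is the 10th column; keep only its leading integer, since
        # some drives append extra text to the raw value.
        match = re.match(r"\d+", fields[9])
        if attr_id in WATCHED and match and int(match.group()) > 0:
            hits.append((attr_id, name, int(match.group())))
    return hits


if __name__ == "__main__":
    for attr_id, name, raw in worrying_attributes(SAMPLE_ATTRS):
        print(f"attribute {attr_id} ({name}): raw value {raw}")
```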
 

paulhuynh81
Dabbler | Joined: Jul 1, 2014 | Messages: 13

All the disks seem fine according to the SMART checks.
How can I tell it to scan and repair the zpool?
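A scrub is started with `zpool scrub VOLUME01`; it already repairs whatever it can from the RAID-Z2 parity, and the outcome lands on the `scan:` line of `zpool status`. The OP's last scrub reads "repaired 0 ... with 29 errors", so it repaired nothing and still reported errors. A minimal sketch that pulls the numbers out of that line, assuming the wording shown in this thread; newer ZFS versions format the amount and duration differently (e.g. `0B`, `00:00:01`), which the loose `\S+` patterns still accept. The helper name `parse_scrub_line` is hypothetical.

```python
from __future__ import annotations

import re

# Assumed wording: "scrub repaired <amount> in <duration> with <n> errors on <date>"
SCRUB_DONE = re.compile(
    r"scrub repaired (?P<repaired>\S+) in (?P<duration>\S+) "
    r"with (?P<errors>\d+) errors on (?P<finished>.+?)\s*$"
)


def parse_scrub_line(scan_line: str) -> dict | None:
    """Parse a finished-scrub 'scan:' line; return None for anything else
    (scrub or resilver in progress, never scrubbed, etc.)."""
    m = SCRUB_DONE.search(scan_line)
    if m is None:
        return None
    result = m.groupdict()
    result["errors"] = int(result["errors"])
    return result


if __name__ == "__main__":
    line = "scan: scrub repaired 0 in 18h55m with 29 errors on Sun Jan 31 18:55:07 2016"
    info = parse_scrub_line(line)
    if info and info["errors"]:
        print(f"last scrub reported {info['errors']} errors, repaired {info['repaired']}")
```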
 

paulhuynh81
Dabbler | Joined: Jul 1, 2014 | Messages: 13

Also, how can I tell which file is bad?

errors: Permanent errors have been detected in the following files:

        VOLUME01:<0x27d4>
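When ZFS can resolve a damaged object to a file, `zpool status -v` prints its path; an entry of the form `dataset:<0xNNNN>` means it could only report the dataset and the object number, which, as far as I know, happens for objects without a path (metadata, or a file that has since been deleted). A minimal sketch that separates the two kinds of entries and converts the hex object number to decimal; the helper name and the file path in the example are hypothetical.

```python
from __future__ import annotations

import re

# "VOLUME01:<0x27d4>" -> dataset "VOLUME01", object 0x27d4
OBJECT_ENTRY = re.compile(r"^(?P<dataset>[^:\s]+):<0x(?P<obj>[0-9a-fA-F]+)>$")


def classify_entries(entries: list[str]) -> tuple[list[str], list[tuple[str, int]]]:
    """Split error-list entries into file paths and (dataset, object number) pairs."""
    paths, objects = [], []
    for raw in entries:
        entry = raw.strip()
        if not entry:
            continue
        m = OBJECT_ENTRY.match(entry)
        if m:
            objects.append((m.group("dataset"), int(m.group("obj"), 16)))
        else:
            paths.append(entry)  # ZFS could name the file: restore it from backup
    return paths, objects


if __name__ == "__main__":
    sample = ["VOLUME01:<0x27d4>", "/mnt/VOLUME01/share/report.xls"]  # 2nd entry hypothetical
    paths, objects = classify_entries(sample)
    print("files:", paths)
    print("unnamed objects:", objects)  # [('VOLUME01', 10196)]
```

For the OP, the only entry is the unnamed object 0x27d4 (10196 in decimal) in the VOLUME01 dataset, so there is no single file to restore.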
 

Bidule0hm
Server Electronics Sorcerer | Joined: Aug 5, 2013 | Messages: 3,710

Sounds like a metadata error, which is not a good sign for the pool's life expectancy. If you don't have backups, you should make one now, just in case ;)

If I'm right, there are only two ways to clear the error: create another pool, copy the data to it, and destroy the first pool; or copy the data elsewhere, clear the error, and copy the data back to the pool.
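The first option (new pool, copy, destroy the old one) can be done with ZFS replication. A minimal dry-run sketch that only prints the commands, assuming a second pool named `NEWPOOL` and a snapshot name `migrate` (both hypothetical). `zfs receive -d` drops the source pool name, so `VOLUME01/share` would land as `NEWPOOL/share`. `zfs send` may stop with an I/O error when it reaches the damaged object, in which case a file-level copy (e.g. rsync) is the fallback; verify the copy before running the final destroy.

```python
from __future__ import annotations

import shlex


def migration_plan(src_pool: str, dst_pool: str, snap: str = "migrate") -> list[str]:
    """Shell commands for: snapshot, replicate to a new pool, then destroy the old one."""
    snapshot = f"{src_pool}@{snap}"
    send = shlex.join(["zfs", "send", "-R", snapshot])               # whole dataset tree
    receive = shlex.join(["zfs", "receive", "-F", "-d", dst_pool])   # -d drops the source pool name
    return [
        shlex.join(["zfs", "snapshot", "-r", snapshot]),
        f"{send} | {receive}",
        # Only after checking the copy on dst_pool and having a backup:
        shlex.join(["zpool", "destroy", src_pool]),
    ]


if __name__ == "__main__":
    for cmd in migration_plan("VOLUME01", "NEWPOOL"):  # NEWPOOL is hypothetical
        print(cmd)  # printed for review; nothing is executed
```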
 