Device FAULTED in scrub but OK in SMARTCTL report

Status
Not open for further replies.

IanWorthington

Contributor
Joined
Sep 13, 2013
Messages
144
I'm astonished though to see so many CKSUM errors without even one reallocated or pending sector error. Does that make any sense?
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Quote:
I'm astonished though to see so many CKSUM errors without even one reallocated or pending sector error. Does that make any sense?

Sure, if it was an interface issue. Stuff that got past the CRC and ended up written.
 

IanWorthington

Quote:
Sure, if it was an interface issue. Stuff that got past the CRC and ended up written.

Can that explain how the error count has jumped from

Code:
gptid/4d50f8ba-da2a-11e3-90c3-002590878c66  FAULTED     70   187     0  too many errors


on the scrub last week to:

Code:
gptid/4d50f8ba-da2a-11e3-90c3-002590878c66  DEGRADED     0     0 12.0K  too many errors


this week?

Would you agree with ethereal that I should pull the disk from the array, run badblocks, reinsert and resilver? Can I rely on the SMART results that the disk itself is NOT actually failing?
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
Quote:
Can I rely on the SMART results that the disk itself is NOT actually failing?

You can't prove a negative. SMART attributes and SMART tests can only report that they didn't find a problem, not that there is no problem present.
 

Ericloewe

Quote:
You can't prove a negative.

There's several you can. /pedantry

Quote:
SMART attributes and SMART tests can only report that they didn't find a problem, not that there is no problem present.

Right, they're not infallible, but they're generally informative. If not the attributes, then the test results themselves.

Quote:
Would you agree with ethereal that I should pull the disk from the array, run badblocks, reinsert and resilver? Can I rely on the SMART results that the disk itself is NOT actually failing?

Certainly worth a try before labeling the disk as dead. Burn it in again and see how things go.
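
For anyone following along, here is a rough way to pull just the SMART counters discussed above out of `smartctl -A`. It is only a sketch: it assumes an ATA/SATA disk with the usual attribute table (ID in column 1, raw value in column 10), and `smart_counters` is a made-up helper name. Attributes 5, 197 and 198 (reallocated, pending and offline-uncorrectable sectors) point at the disk surface, while 199 (UDMA CRC errors) points at the cable or interface.

```shell
# Hypothetical helper (not part of smartmontools): reads `smartctl -A` output
# on stdin and prints "ID NAME RAW_VALUE" for the sector and CRC counters only.
# Assumes the standard ATA attribute table layout.
smart_counters() {
  awk '$1 ~ /^(5|197|198|199)$/ { print $1, $2, $10 }'
}

# Usage (needs root; /dev/da6 is the device from this thread):
#   sudo smartctl -A /dev/da6 | smart_counters
```

All-zero raw values here don't prove the disk is healthy, as noted above, but a rising 199 with clean 5/197/198 would fit the interface theory.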
 

IanWorthington

Hmmm...

The GUI (9.3) wouldn't allow me to OFFLINE the disk, had to do a:

Code:
sudo zpool offline VOLUME1 /dev/gptid/4d50f8ba-da2a-11e3-90c3-002590878c66


But when I come to do a
Code:
badblocks -wsv /dev/da6


I get:

Code:
badblocks: Operation not permitted while trying to open /dev/da6


I can find this mentioned elsewhere on the forum but can't find the solution.

Any suggestions please?

UPDATE: Ignore this. I needed to do a

Code:
sudo sysctl kern.geom.debugflags=0x10


first.
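
In case it helps anyone else: the pool lists the disk by gptid, but badblocks wants the raw device. One way to look that up on FreeBSD/FreeNAS is `glabel status`. This is only a sketch; `gptid_to_dev` is a made-up helper name, and it assumes the usual three-column output (Name, Status, Components).

```shell
# Hypothetical helper: reads `glabel status` output on stdin and prints the
# component (a partition on the underlying disk) behind the given gptid label.
# Assumes the columns are Name, Status, Components, in that order.
gptid_to_dev() {
  awk -v id="$1" '$1 == id { print $3 }'
}

# Usage:
#   glabel status | gptid_to_dev gptid/4d50f8ba-da2a-11e3-90c3-002590878c66
```

The result is a partition name; the disk device is the part before the partition suffix. Double-check it before running a destructive `badblocks -w` against anything.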
 

IanWorthington

Initial indications are that the HDD is ok. I'm going to allow badblocks to complete its four passes to be sure though.

I feel I need to at least try to explain those 12.0K CKSUM errors though. If SMARTCTL is not showing any increase in the CRC error count since I replaced the cable I don't see where they came from.

Code:
 %  sudo badblocks -wsv /dev/da6
Password:
Checking for bad blocks in read-write mode
From block 0 to 3907018583
Testing with pattern 0xaa: set_o_direct: Inappropriate ioctl for device
done
Reading and comparing: done
Testing with pattern 0x55:  18.70% done, 17:10:32 elapsed. (0/0/0 errors)
 

Ericloewe

Quote:
I feel I need to at least try to explain those 12.0K CKSUM errors though. If SMARTCTL is not showing any increase in the CRC error count since I replaced the cable I don't see where they came from.

They would have been present before the cable was replaced, and then the scrub detected them.
 

IanWorthington

Quote:
They would have been present before the cable was replaced, and then the scrub detected them.

They weren't detected by the scrub a week previously though:

Code:
  scan: scrub repaired 0 in 15h22m with 0 errors on Mon May  9 17:22:16 2016
 
        gptid/4d50f8ba-da2a-11e3-90c3-002590878c66  FAULTED     70   187     0  too many errors
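
To compare one scrub against the next, it can help to pull out only the devices with non-zero counters. A minimal sketch, assuming the usual `zpool status` column layout (NAME STATE READ WRITE CKSUM); `zpool_errors` is a hypothetical helper name. READ and WRITE count I/O requests that failed outright, while CKSUM counts blocks that were read back without an I/O error but didn't match their checksum.

```shell
# Hypothetical helper: reads `zpool status` output on stdin and prints
# "NAME STATE READ WRITE CKSUM" for every device whose counters are not all 0.
# NF >= 5 skips short lines such as "state: ONLINE".
zpool_errors() {
  awk 'NF >= 5 && $2 ~ /^(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)$/ && ($3 != "0" || $4 != "0" || $5 != "0") { print $1, $2, $3, $4, $5 }'
}

# Usage:
#   zpool status VOLUME1 | zpool_errors
```

Saving that output after each scrub makes a jump like 70/187/0 to 0/0/12.0K easy to spot.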
 

IanWorthington

Badblocks found nothing bad, resilvered ok with 0 errors.

Or do I need to scrub to be sure of 0 errors?

Code:
%  sudo badblocks -wsv /dev/da6
Password:
Checking for bad blocks in read-write mode
From block 0 to 3907018583
Testing with pattern 0xaa: set_o_direct: Inappropriate ioctl for device
done
Reading and comparing: done
Testing with pattern 0x55: done
Reading and comparing: done
Testing with pattern 0xff: done
Reading and comparing: done
Testing with pattern 0x00: done
Reading and comparing: done
Pass completed, 0 bad blocks found. (0/0/0 errors)


Code:
 % zpool status VOLUME1
  pool: VOLUME1
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: resilvered 3.41T in 33h44m with 0 errors on Wed May 18 23:53:14 2016
config:

        NAME                                            STATE     READ WRITE CKSUM
        VOLUME1                                         ONLINE       0     0     0
          raidz3-0                                      ONLINE       0     0     0
            gptid/444302b9-da2a-11e3-90c3-002590878c66  ONLINE       0     0     0
            gptid/44acaf47-da2a-11e3-90c3-002590878c66  ONLINE       0     0     0
            gptid/458b61fe-da2a-11e3-90c3-002590878c66  ONLINE       0     0     0
            gptid/45f04d30-da2a-11e3-90c3-002590878c66  ONLINE       0     0     0
            gptid/46dd2963-da2a-11e3-90c3-002590878c66  ONLINE       0     0     0
            gptid/47cdf0aa-da2a-11e3-90c3-002590878c66  ONLINE       0     0     0
            gptid/48565317-da2a-11e3-90c3-002590878c66  ONLINE       0     0     0
            gptid/48cd4928-da2a-11e3-90c3-002590878c66  ONLINE       0     0     0
            gptid/4a58c8ac-da2a-11e3-90c3-002590878c66  ONLINE       0     0     0
            gptid/4c508137-da2a-11e3-90c3-002590878c66  ONLINE       0     0     0
            gptid/cf48b9d2-1c62-11e6-be40-002590878c66  ONLINE       0     0     0

errors: No known data errors
 

ethereal

Guru
Joined
Sep 10, 2012
Messages
762
You've run badblocks and a resilver; personally, that's enough for me.

But because the problem first presented during/after a scrub, you may want to run one.
In the end it's up to what you think is best.
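
For reference, a scrub can be started from the shell and its result read from the `scan:` line of `zpool status`. The following is a sketch rather than a recipe: `scan_summary` is a made-up helper that pulls that one line out, and the pool name is the one from this thread.

```shell
# Hypothetical helper: reads `zpool status` output on stdin and prints the
# text of the "scan:" line (the last scrub or resilver result).
scan_summary() {
  awk '$1 == "scan:" { sub(/^[ \t]*scan:[ \t]*/, ""); print }'
}

# Usage (the scrub runs in the background; re-check status until it is done):
#   sudo zpool scrub VOLUME1
#   zpool status VOLUME1 | scan_summary
```

A finished scrub that reports 0 errors, with all READ/WRITE/CKSUM counters still at 0, is about as much confirmation as the pool can give.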
 