Possible failure in near future?

Status
Not open for further replies.

Don1919

Cadet
Joined
Apr 4, 2014
Messages
4
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: http://illumos.org/msg/ZFS-8000-9P
scan: resilvered 5.64M in 0h0m with 0 errors on Fri Apr 4 21:10:21 2014
config:

NAME                                            STATE   READ  WRITE  CKSUM
RaidZ1                                          ONLINE     0      0      0
  raidz1-0                                      ONLINE     0      0      0
    gptid/299dce8e-02e3-11e2-9dbf-50e549deb030  ONLINE     0      0      0  block size: 512B configured, 4096B native
    gptid/2a43be61-02e3-11e2-9dbf-50e549deb030  ONLINE     0      0      0  block size: 512B configured, 4096B native
    gptid/2ae81dc9-02e3-11e2-9dbf-50e549deb030  ONLINE     0      0     39  block size: 512B configured, 4096B native

errors: No known data errors


(All 3 drives are Seagate 2TB drives, model ST2000DL003)


As of last night I actually had a degraded error, and this drive, which is now online but has 39 checksum errors, wasn't even running or being detected.

I recently (today, 4/4/14) ran SeaTools on the drive that stopped showing up in FreeNAS. Upon booting it beeped 3 or 4 times, turned off, then rebooted and ran fine for the 8 hours it took to run both general scans (short and long), both of which came back with no errors whatsoever.

Upon placing it back into my NAS I get the status output posted above. I've already purchased a new drive, but now that this one seems to be up and running with no issues, I'm just wondering if it's worth replacing?

Is anyone able to shed some light on this for me? This is the first time I've ever had a failed drive in my NAS.

Edit: Started a scrub, and it looks like it's up to 377 checksum errors now. I'm assuming it's time to replace the drive. However, the scrub says 6 hours remaining at the moment.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Replace your SATA cable first. I'm not saying the drive is good, just that this is the first thing to do. If you could post the output of 'smartctl -a /dev/adaX' (where X is that drive's number), that could help.
 

Don1919

Cadet
Joined
Apr 4, 2014
Messages
4
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: http://illumos.org/msg/ZFS-8000-9P
scan: scrub repaired 68.9M in 4h50m with 0 errors on Sat Apr 5 02:35:42 2014

config:

NAME                                            STATE   READ  WRITE  CKSUM
RaidZ1                                          ONLINE     0      0      0
  raidz1-0                                      ONLINE     0      0      0
    gptid/299dce8e-02e3-11e2-9dbf-50e549deb030  ONLINE     0      0      0  block size: 512B configured, 4096B native
    gptid/2a43be61-02e3-11e2-9dbf-50e549deb030  ONLINE     0      0      0  block size: 512B configured, 4096B native
    gptid/2ae81dc9-02e3-11e2-9dbf-50e549deb030  ONLINE     0      0  18.4K  block size: 512B configured, 4096B native


This is after the scrub, 18.4k now.
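For anyone who wants to track these counters between scrubs rather than eyeballing them, here is a minimal Python sketch that pulls the READ/WRITE/CKSUM columns out of pasted 'zpool status' text. The function names are made up for illustration, and treating the "K" suffix as a multiple of 1000 is an assumption, so abbreviated values such as 18.4K are only approximate.

```python
import re

# Suffix multipliers. zpool abbreviates large counters (e.g. "18.4K").
# Treating K/M/G as powers of 1000 is an assumption of this sketch.
_SUFFIX = {"K": 1_000, "M": 1_000_000, "G": 1_000_000_000}

# Device states that can appear in the STATE column of the config table.
_STATES = {"ONLINE", "DEGRADED", "FAULTED", "OFFLINE", "UNAVAIL", "REMOVED"}


def parse_count(text):
    """Convert a counter such as '39' or '18.4K' to an approximate integer."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([KMG]?)", text)
    if match is None:
        raise ValueError(f"unrecognised counter: {text!r}")
    number, suffix = match.groups()
    return round(float(number) * _SUFFIX.get(suffix, 1))


def devices_with_errors(status_text):
    """Return {device: (read, write, cksum)} for rows with a nonzero counter."""
    flagged = {}
    for line in status_text.splitlines():
        fields = line.split()
        # A device row looks like: NAME STATE READ WRITE CKSUM [notes...]
        if len(fields) < 5 or fields[1] not in _STATES:
            continue
        try:
            counts = tuple(parse_count(f) for f in fields[2:5])
        except ValueError:
            continue  # not a counter row after all
        if any(counts):
            flagged[fields[0]] = counts
    return flagged


if __name__ == "__main__":
    sample = """
    NAME STATE READ WRITE CKSUM
    RaidZ1 ONLINE 0 0 0
    raidz1-0 ONLINE 0 0 0
    gptid/2a43be61-02e3-11e2-9dbf-50e549deb030 ONLINE 0 0 0
    gptid/2ae81dc9-02e3-11e2-9dbf-50e549deb030 ONLINE 0 0 18.4K
    """
    for device, counts in devices_with_errors(sample).items():
        print(device, counts)
```

Only the third disk is reported for the sample above, which matches what the table shows: the errors are confined to one device.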




Here's the output from the command you listed.


Edit: I had to upload a photo. The output looked fine when pasted in but was completely messed up after posting.
 

Attachments

  • output.jpg (240.4 KB)

Don1919

Cadet
Joined
Apr 4, 2014
Messages
4

joelmusicman

Patron
Joined
Feb 20, 2014
Messages
249
Don1919 said: "I'll keep digging but I cannot say if it is or isn't, but since it was just standard desktop memory I'd assume it wasn't ECC."

That's usually a pretty safe assumption.

You may be too late since you just did a scrub, but you should definitely think about backups for your important media if you haven't already.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
ECC RAM has nothing to do with this type of problem; otherwise I would have mentioned it, although having ECC RAM is a very good thing when dealing with ZFS. Lack of ECC RAM could result in data corruption during a scrub, not in drives going offline or reporting checksum errors. Also, running a scrub does no harm provided your RAM is fine, but that is the risk you take by not using ECC RAM. Again, RAM is not the issue in this thread.

Don't let all those high values in that SMART report concern you too much; they can be that high under normal conditions for many drives. If you have nonzero raw values in IDs 5 (Reallocated_Sector_Ct), 196 (Reallocated_Event_Count), or 197 (Current_Pending_Sector), then you are looking at a hard drive failure. I would replace the SATA cable first (I prefer locking cables). If you have another open SATA port you could plug the drive in there as well, in case it's your motherboard connector.
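As a rough illustration of the check described above, this Python sketch scans the attribute table from 'smartctl -A' (or the attribute section of 'smartctl -a') and reports nonzero raw values for IDs 5, 196, and 197. The function name and the sample rows are invented for the example and are not the poster's actual output, which was only attached as an image. It assumes the usual ten-column attribute layout with RAW_VALUE as the last column.

```python
# Attribute IDs singled out in the post above as signs of a failing drive.
WATCHED_IDS = {5, 196, 197}


def failing_attributes(smart_text):
    """Return {id: (name, raw_value)} for watched attributes with raw > 0."""
    found = {}
    for line in smart_text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns:
        # ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id = int(fields[0])
        raw = fields[9]
        if attr_id in WATCHED_IDS and raw.isdigit() and int(raw) > 0:
            found[attr_id] = (fields[1], int(raw))
    return found


if __name__ == "__main__":
    # Made-up sample rows, for illustration only.
    sample = """
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   117   099   006    Pre-fail  Always       -       123456789
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
"""
    print(failing_attributes(sample))
```

In the sample, the large Raw_Read_Error_Rate value is ignored on purpose, since high raw numbers there can be normal for many drives, while the pending-sector count is flagged.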
 

joelmusicman

Patron
Joined
Feb 20, 2014
Messages
249
Swap the cable and then do short and long SMART tests before deciding that the HDD is bad...
 

Don1919

Cadet
Joined
Apr 4, 2014
Messages
4
The cable has been swapped, and I will run another short and long test to be sure. However, when I ran them before, it was with a different cable, and they came back with zero errors.

Outside of this, any other suggestions? Or is it pretty much a waiting game to see if it actually fails?

Edit: After rebooting with the new cable, the checksum error count shows 0. Is that normal?
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Just wait to see what happens. The problem you had is indicative of a SATA cable connection issue.
 