Worrying-looking errors!!!!

Status
Not open for further replies.

b0redom

Dabbler
Joined
Sep 2, 2011
Messages
40
I've just spotted this in my logs:

Sep 6 20:33:32 freenas smartd[1275]: Device: /dev/ada1, 1 Currently unreadable (pending) sectors
Sep 6 20:33:33 freenas smartd[1275]: Device: /dev/ada2, 1 Currently unreadable (pending) sectors
Sep 6 20:33:33 freenas smartd[1275]: Device: /dev/ada2, 1 Offline uncorrectable sectors
Sep 6 20:33:33 freenas smartd[1275]: Device: /dev/ada3, 1 Currently unreadable (pending) sectors
Sep 6 20:37:00 freenas root: ZFS: checksum mismatch, zpool=storage path=/dev/gpt/ada2 offset=907635373568 size=512
Sep 6 20:37:00 freenas root: ZFS: checksum mismatch, zpool=storage path=/dev/gpt/ada2 offset=907128027136 size=512
Sep 6 20:37:00 freenas root: ZFS: checksum mismatch, zpool=storage path=/dev/gpt/ada2 offset=907128020992 size=512

This is all running off FreeNAS 8.0 release.

I've been seeing the smartd stuff throwing out errors for a while, but I just assumed that it was complaining because SMART hasn't been implemented. The checksum mismatch looks VERY worrying though. Is the drive on the way out?
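To see which disk the messages point at, the log lines can be tallied per device. This is a minimal sketch using only the lines quoted above. It assumes the GPT label `gpt/ada2` refers to the disk `ada2`, which the thread implies but does not confirm, and the regular expressions are written for these exact message shapes only.

```python
import re
from collections import Counter

# Sample lines copied from the log excerpt in the post.
LOG = """\
Sep 6 20:33:32 freenas smartd[1275]: Device: /dev/ada1, 1 Currently unreadable (pending) sectors
Sep 6 20:33:33 freenas smartd[1275]: Device: /dev/ada2, 1 Currently unreadable (pending) sectors
Sep 6 20:33:33 freenas smartd[1275]: Device: /dev/ada2, 1 Offline uncorrectable sectors
Sep 6 20:33:33 freenas smartd[1275]: Device: /dev/ada3, 1 Currently unreadable (pending) sectors
Sep 6 20:37:00 freenas root: ZFS: checksum mismatch, zpool=storage path=/dev/gpt/ada2 offset=907635373568 size=512
Sep 6 20:37:00 freenas root: ZFS: checksum mismatch, zpool=storage path=/dev/gpt/ada2 offset=907128027136 size=512
Sep 6 20:37:00 freenas root: ZFS: checksum mismatch, zpool=storage path=/dev/gpt/ada2 offset=907128020992 size=512
"""

# Patterns match only the two message shapes seen in the excerpt.
SMART_RE = re.compile(r"smartd\[\d+\]: Device: /dev/(\w+), (\d+) (.+)$")
ZFS_RE = re.compile(r"ZFS: checksum mismatch, zpool=\S+ path=/dev/gpt/(\w+) ")


def tally(log_text):
    """Count smartd and ZFS checksum messages per disk name (e.g. 'ada2')."""
    smart = Counter()
    zfs = Counter()
    for line in log_text.splitlines():
        m = SMART_RE.search(line)
        if m:
            smart[m.group(1)] += 1
            continue
        m = ZFS_RE.search(line)
        if m:
            # Assumption: the label under /dev/gpt/ names the same disk.
            zfs[m.group(1)] += 1
    return smart, zfs


smart_counts, zfs_counts = tally(LOG)
for disk in sorted(set(smart_counts) | set(zfs_counts)):
    print(disk, "smartd:", smart_counts[disk], "zfs checksum:", zfs_counts[disk])
```

In this excerpt only ada2 appears in both the smartd and the ZFS messages.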
 

pallfreeman

Dabbler
Joined
Sep 1, 2011
Messages
38
Depends if it's an old drive or a new one. If it's old, it's probably dying. If it's new, it's more likely still burning in.
 

b0redom

Dabbler
Joined
Sep 2, 2011
Messages
40
How old is old? I guess this one is about 12 months or so. I guess I should swap it out and resilver. It seems to be constantly throwing up errors about ZFS checksums.

Any idea what the smartd errors are?

b0redom
 

satman80

Cadet
Joined
Aug 1, 2011
Messages
8
Burning in? I haven't posted much, if ever, but really? It just sounds to me like a bad drive, bro. I don't care for Windoze, but just for kicks you could reformat it in Windows and run some scans. There are also diagnostic tool CDs you could burn to check the disk. Before turning it into trash I would do a little more testing on it. New, old or midway, it doesn't matter: when a drive goes, it goes. Or it's the software not liking a drive that has been used in something else before, or something weird.

Respectfully,
Satman
 

pallfreeman

Dabbler
Joined
Sep 1, 2011
Messages
38
Yeah, burning in, but really.

One of the differences between so-called "Enterprise Grade" drives and those that we mere mortals get to use is that the expensive ones are subjected to much more testing, and for longer, than the cheap ones. It costs the manufacturer to do this, which is one reason why the drives cost more.

As you say, the thing to do is run scans on it. Forget Windoze, just dd random data onto the drive and read it back a few times. Or use something like PowerMax, which has a Burn In option to repeatedly write 0s. Random data verified on readback is better.
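The write-random-data-and-verify idea can be sketched in Python. This is an illustration under stated assumptions, not a replacement for dd or a vendor tool: the function names are made up for this example, and it runs against a scratch file rather than a raw disk. Pointing it at a device node would destroy everything on that device. Note also that a read straight after a write may be served from the OS cache rather than the platters, so a real test would need to work around that.

```python
import hashlib
import os
import tempfile

CHUNK = 1024 * 1024  # 1 MiB per write; an arbitrary choice


def write_random(path, total_bytes):
    """Fill `path` with random data; return the SHA-256 of what was written."""
    digest = hashlib.sha256()
    remaining = total_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            block = os.urandom(min(CHUNK, remaining))
            f.write(block)
            digest.update(block)
            remaining -= len(block)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to the device
    return digest.hexdigest()


def read_digest(path):
    """Read `path` back in chunks and return its SHA-256."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(CHUNK), b""):
            digest.update(block)
    return digest.hexdigest()


def burn_in(path, total_bytes, passes=3):
    """Return True if every pass reads back exactly what was written."""
    for n in range(1, passes + 1):
        ok = write_random(path, total_bytes) == read_digest(path)
        print(f"pass {n}: {'ok' if ok else 'MISMATCH'}")
        if not ok:
            return False
    return True


# Demo on a temporary scratch file (hypothetical name), not a real disk.
with tempfile.TemporaryDirectory() as tmp:
    scratch = os.path.join(tmp, "scratch.bin")
    result = burn_in(scratch, 4 * CHUNK + 123, passes=2)
```

Comparing digests rather than keeping the data in memory keeps the memory use flat regardless of how much is written.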

A slightly less intrusive way of doing this is to fill your zpool and scrub it several times, but that might not test every single block on the drive.
 

pallfreeman

Dabbler
Joined
Sep 1, 2011
Messages
38
"Any idea what the smartd errors are?"

They're telling you that you have disk errors. :)

I'd be a little scared that it was giving me errors on other drives too.
 

satman80

Cadet
Joined
Aug 1, 2011
Messages
8
Ahh, I see what you meant now. I have to do exactly what you're saying today, as I am getting GPT Pri errors. Sigh...
 

b0redom

Dabbler
Joined
Sep 2, 2011
Messages
40
Thanks for the feedback. I've had the SMART errors ever since I installed FreeNAS (iirc), so I think they may be red herrings. I've just dropped on to the box itself and done:

[root@freenas /var/log]# zpool status
pool: storage
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: http://www.sun.com/msg/ZFS-8000-9P
scrub: none requested
config:

NAME STATE READ WRITE CKSUM
storage ONLINE 0 0 0
raidz1 ONLINE 0 0 0
gpt/ada0 ONLINE 0 0 0
gpt/ada1 ONLINE 0 0 0
gpt/ada2 ONLINE 0 0 83.4K
gpt/ada3 ONLINE 0 0 0
gpt/ada4 ONLINE 0 0 0

83000 errors!!!! Time to replace the disk methinks!
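Before pulling a disk it helps to read the device name straight out of the table instead of trusting memory. A minimal sketch that scans the config rows quoted above and reports any row with a non-zero counter. The counters are kept as the raw strings that `zpool status` printed (such as `83.4K`) rather than converted, and the function name is made up for this example.

```python
# Config table copied from the zpool status output in the post.
STATUS = """\
NAME STATE READ WRITE CKSUM
storage ONLINE 0 0 0
raidz1 ONLINE 0 0 0
gpt/ada0 ONLINE 0 0 0
gpt/ada1 ONLINE 0 0 0
gpt/ada2 ONLINE 0 0 83.4K
gpt/ada3 ONLINE 0 0 0
gpt/ada4 ONLINE 0 0 0
"""


def devices_with_errors(status_text):
    """Return {name: {column: raw_value}} for rows with any non-zero counter."""
    flagged = {}
    for line in status_text.splitlines():
        fields = line.split()
        # Skip the header and anything that is not a 5-column config row.
        if len(fields) < 5 or fields[0] == "NAME":
            continue
        name, _state, read, write, cksum = fields[:5]
        counters = {"READ": read, "WRITE": write, "CKSUM": cksum}
        bad = {col: val for col, val in counters.items() if val != "0"}
        if bad:
            flagged[name] = bad
    return flagged


flagged = devices_with_errors(STATUS)
for name, bad in flagged.items():
    print(name, bad)
```

For this table the only flagged row is `gpt/ada2`.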

Thanks again.....

b0redom
 

pallfreeman

Dabbler
Joined
Sep 1, 2011
Messages
38
"83000 errors!!!! Time to replace the disk methinks!"

Sure, but hang on to the old one. You might find it's perfectly OK after you've re-written it several times, to give the drive's firmware more chance to find the marginal areas on the disk and remap them.
 

Bohs Hansen

Guest
Not relevant for your NAS, but before you throw out the disk:

I remember back in my DOS days I saved most of a drive by marking a sector range as "do not read". My BIOS at the time had that function, but I'm sure that must be doable today as well with some software tool.
With a check tool I found the section that was damaged (in my case it was a damaged read head) and marked it. In the end I still had a working 600 MB HDD (it used to be 850 MB). Of course new HDDs cost a lot back then compared to today, but it would be a shame to throw it away if it could be used somewhere else.
 

b0redom

Dabbler
Joined
Sep 2, 2011
Messages
40
What's worrying is those other Current_Pending_Sector errors. I'll see if they resurface once I've taken FreeNAS down to replace the failing disk. Fortunately I have a bunch of spares.

Cheers....

b0redom
 

b0redom

Dabbler
Joined
Sep 2, 2011
Messages
40
Well, what a PITA!

New disk installed, I dropped to the shell and did:

zpool replace <pool> <knackered disk> <new disk>

and am now at:

[root@freenas /var/log]# zpool status
pool: storage
state: DEGRADED
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: http://www.sun.com/msg/ZFS-8000-8A
scrub: resilver in progress for 3h47m, 32.13% done, 8h1m to go
config:

NAME STATE READ WRITE CKSUM
storage DEGRADED 0 0 306K
raidz1 DEGRADED 0 0 678K
gpt/ada0 ONLINE 0 0 0 6K resilvered
gpt/ada1 ONLINE 0 0 0 22K resilvered
gpt/ada2 ONLINE 0 0 40.5K 46.7M resilvered
replacing DEGRADED 0 0 4
6378489141270903367 UNAVAIL 0 0 0 was /dev/gpt/ada3
ada3 ONLINE 0 0 0 275G resilvered
gpt/ada4 ONLINE 0 0 0 30.5K resilvered

errors: 312908 data errors, use '-v' for a list

WAAAAAAAHHHH!
 

pallfreeman

Dabbler
Joined
Sep 1, 2011
Messages
38
Ouch.

Is there anything in dmesg relating to I/O errors?

And did you scrub the pool before replacing the drive? I don't think anyone ever told me to do that, it's something I learned from similar painful experiences. ZFS is great at dealing with half-broken drives, but you need to give it a chance.
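The scrub-before-replace advice can be written down as an ordered plan. This is a minimal sketch, assuming the pool name `storage` and the failing device `gpt/ada2` from the earlier output. `NEW_DISK` is a placeholder and `replacement_plan` is a made-up helper. It only builds and prints the commands and does not run anything. The scrub has to finish (check with `zpool status`) before the replace step, and that wait is not modelled here.

```python
import shlex


def replacement_plan(pool, old_device, new_device):
    """Return the commands in order as argument lists; nothing is executed."""
    return [
        ["zpool", "scrub", pool],                           # let ZFS repair what it can first
        ["zpool", "status", "-v", pool],                    # confirm which device has the errors
        ["zpool", "replace", pool, old_device, new_device],
        ["zpool", "status", "-v", pool],                    # watch the resilver
    ]


# NEW_DISK is a placeholder, not a real device name.
plan = replacement_plan("storage", "gpt/ada2", "NEW_DISK")
for cmd in plan:
    print(shlex.join(cmd))
```

Printing the plan first gives a chance to compare the device in the replace command against the one `zpool status` actually flagged.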

Ah, serendipity. I've been writing a blurb over the last few days about just this sort of problem. :)
 

ProtoSD

MVP
Joined
Jul 1, 2011
Messages
3,348
Was that zpool status and errors: 312908 data errors AFTER it had finished resilvering?

Was /dev/gpt/ada3 the device you replaced?

If so, did you do "zpool detach tank1 /dev/gpt/ada3" (replace tank1 with your pool name), and then do another zpool status -v?

You might be pleasantly surprised....
 

b0redom

Dabbler
Joined
Sep 2, 2011
Messages
40
The dmesg filled up /var which wasn't a great start!

I didn't bother to scrub the zpool before replacing the drive.
I just:

Powered down
Swapped out the drive
Booted up
Did a zpool replace storage 6378489141270903367 ada3
 

b0redom

Dabbler
Joined
Sep 2, 2011
Messages
40
It's still resilvering - zpool status tells me that it should be done in about 6.5 hours, at which point I guess I'll need to replace ada2 as well!
 

b0redom

Dabbler
Joined
Sep 2, 2011
Messages
40
Sweary sweary rude words!

Since I cleared down /var from 105% full I've not had a single new error. Still 3h56m to do. PLEASE someone tell me that FreeNAS/ZFS doesn't log AND write out fix/resilvering stuff to /var!
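A full /var is easy to miss until something breaks, so a small check can help. A minimal sketch using Python's standard `shutil.disk_usage`. The 90% threshold and the function names are arbitrary choices for this example, and the figure is used space over total space, so it will not necessarily match what `df` prints (which can exceed 100% when reserved space is consumed).

```python
import os
import shutil


def percent_used(path):
    """Return used space on the filesystem holding `path`, as a percentage of total."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0  # some pseudo-filesystems report a zero size
    return 100.0 * usage.used / usage.total


def is_nearly_full(path, threshold=90.0):
    """True when the filesystem is at or above `threshold` percent used."""
    return percent_used(path) >= threshold


# Fall back to the current directory on systems without /var.
target = "/var" if os.path.isdir("/var") else "."
print(f"{target}: {percent_used(target):.1f}% used, nearly full: {is_nearly_full(target)}")
```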
 

b0redom

Dabbler
Joined
Sep 2, 2011
Messages
40
Oh my god! I seem to have lost a whole subdirectory structure of about 2TB of ripped TV shows. Sure I can rerip them, but really FFS!

I am rescrubbing the pool now, but what is the correct way of replacing a drive in FreeNAS? I'm getting to the point where I'm considering just binning the lot and starting again.

Looks like a lot of my films may have gone the way of the dodo too :(
 

pallfreeman

Dabbler
Joined
Sep 1, 2011
Messages
38
Um, I'm getting a little confused here, but it looks to me as if you replaced ada3. But ada2 was the drive with the errors in your original post...
 