A very noob question about checksum errors

Status
Not open for further replies.

Raycaster

Dabbler
Joined
Aug 6, 2015
Messages
11
I apologize; I have tried to read every relevant thread but am still scratching my head.

I have a month-old FreeNAS 9.3 system running and am slowly figuring things out. I'm not able to update Plex without uninstalling and reinstalling it, but that's another story.

My problem is that one of my volumes (2T + 2T + 3T, giving up 1T, no problem) is now showing checksum errors.

Scrub

Status: Completed

Errors: 0 Repaired: 396K Date: Sun Aug 16 02:26:22 2015

Name      Read  Write  Checksum  Status
raidz1-0
ada4p2    0     0      4         ONLINE
ada3p2    0     0      4         ONLINE
ada5p2    0     0      8         ONLINE

My other volume of three 4T drives is error free.

This is a simple home setup for videos, photos, etc., so I'm not losing any sleep, but I would like to fix the problem if possible.

Can I simply shut down FreeNAS and check the drives separately on another machine? Or is it odd that all 3 drives are showing checksum errors, which might point to motherboard or cable problems?

Thank you for any help and keep in mind that I will probably just toast the drives and replace instead of tearing my hair out...
 

Nick2253

Wizard
Joined
Apr 21, 2014
Messages
1,633
There could be a lot of reasons why this is happening.

First off, please post your system specs and volume configuration so we know what we're dealing with.
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
Indeed. The fact that all drives in the vdev are checksumming makes me also want to see the detailed specs. Mobo, RAM, CPU, etc.
 

Raycaster

Dabbler
Joined
Aug 6, 2015
Messages
11
Thank you very much for the replies.

Build FreeNAS-9.3-STABLE-201506292332
Platform AMD A6-6400K APU with Radeon(tm) HD Graphics
Memory 15530MB
System Time Mon Aug 17 20:29:09 EDT 2015
Uptime 8:29PM up 4 days, 5:29, 0 users
Load Average 0.20, 0.10, 0.02
 

Attachments

  • NasScreenShot1.jpg
  • NasScreenShot2.jpg

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
OK, not recommended hardware, but OK, could be a lot worse.

Could you show us, in "code" brackets (if you don't know what that is, use pastebin.com please), the results of the following:

smartctl -x -qnoserial /dev/ada3
smartctl -x -qnoserial /dev/ada4
smartctl -x -qnoserial /dev/ada5
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
Also, while you're at it, it looks like you have an interesting set of devices hooked into the USB bus. Can we see a "camcontrol devlist"?
 

Raycaster

Dabbler
Joined
Aug 6, 2015
Messages
11

Raycaster

Dabbler
Joined
Aug 6, 2015
Messages
11
[root@freenas ~]# camcontrol devlist
<WDC WD40EFRX-68WT0N0 82.00A82> at scbus0 target 0 lun 0 (pass0,ada0)
<WDC WD40EFRX-68WT0N0 82.00A82> at scbus1 target 0 lun 0 (pass1,ada1)
<WDC WD40EFRX-68WT0N0 82.00A82> at scbus2 target 0 lun 0 (pass2,ada2)
<WDC WD2001FASS-00U0B0 01.00101> at scbus3 target 0 lun 0 (pass3,ada3)
<WDC WD20EFRX-68EUZN0 80.00A80> at scbus4 target 0 lun 0 (pass4,ada4)
<ST3000DM001-9YN166 CC9C> at scbus5 target 0 lun 0 (pass5,ada5)
<ST500LX003-1AC15G SM12> at scbus7 target 0 lun 0 (pass6,ada6)
<ADATA USB Flash Drive 1100> at scbus9 target 0 lun 0 (pass7,da0)
<Generic STORAGE DEVICE-A 9727> at scbus10 target 0 lun 0 (pass8,da1)
<Generic STORAGE DEVICE-A 9727> at scbus10 target 0 lun 1 (pass9,da2)
<Generic STORAGE DEVICE-A 9727> at scbus10 target 0 lun 2 (pass10,da3)
<Generic STORAGE DEVICE-A 9727> at scbus10 target 0 lun 3 (pass11,da4)
[root@freenas ~]#
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
OK, well, all of those pastebins are screwed up. I can't see the full smartctl output. :) OK, we're going to have to figure something out. Do you know how to start up the SSH service and sign in with a client like Bitvise or PuTTY? If not, then if worst comes to worst, maybe one of us can set up a TeamViewer session with you and show you how it's done. You seem earnest enough. If you *DO* know how to do that yourself, you will find it is much easier to cut and paste from those terminals.


Also, your camcontrol output has 4 mystery devices there at the end... what's the deal, do you have some kind of a card reader or something?
 

Raycaster

Dabbler
Joined
Aug 6, 2015
Messages
11
Yes I have the motherboard attached to a 5 1/4" bay card reader, didn't think it would make a difference either way.

I am not ignoring your reply; I'm just muddling through the SSH stuff, which I thought I should know anyway.
Can't you just send the console output to a file, like in DOS? E.g. smartctl -x -qnoserial /dev/ada3 > ada3info.txt

Anyways I'll be back soon.

BTW, thanks again, really enjoying FreeNAS. (Almost don't miss my WHSv1)...
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
Yes, you can do that ;)
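For reference, the redirect asked about above can be applied to all three pool members in one go. This is only a sketch under a few assumptions: it needs a POSIX sh (root's login shell on FreeNAS may be csh, so save it to a file and run it with sh), and the names SMARTCTL, OUTDIR and save_smart_reports are made up for illustration and are not FreeNAS settings.

```shell
#!/bin/sh
# Save the full SMART report of each listed disk to its own text file.
# SMARTCTL and OUTDIR are illustrative variables, not FreeNAS settings.
SMARTCTL="${SMARTCTL:-smartctl}"
OUTDIR="${OUTDIR:-.}"

save_smart_reports() {
    for disk in "$@"; do
        # -x prints all SMART and device information;
        # -q noserial keeps the serial number out of the report.
        "$SMARTCTL" -x -q noserial "/dev/$disk" > "$OUTDIR/${disk}info.txt" 2>&1
    done
}

# Example (run as root on the NAS):
#   save_smart_reports ada3 ada4 ada5
```

The resulting adaNinfo.txt files can then be copied off the box and pasted into a post in full.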
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
Raycaster said: Yes I have the motherboard attached to a 5 1/4" bay card reader, didn't think it would make a difference either way.
It doesn't make a difference. I was academically curious what it was.
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
OK, I'll take a crack at this.

Good news up front: None of these drives appears to be self-destructing as we speak.

Bad news: There's nothing good about this pool. For example:
  • You have mixed three drives of completely different types into one vdev. While not "illegal" per se, it's certainly a shibboleth. :)
  • One of them is a Seagate Barracuda. :)
  • Like a patient who only brushes their teeth the week before they go to the dentist, some/all of these drives have only recently had a short S.M.A.R.T. test regimen put on them---looks like they spent most of their life completely untested and unmaintained. Also, I don't see any evidence of proper long S.M.A.R.T. tests.
  • There are unusual "Load Cycle" counts on these drives. One has 4000, one has 24000, and one (God help him) has 205000. All of these numbers are wrong for properly maintained NAS drives. I'm not sure, actually, how one gets a WD Black up to 205000---usually load cycles that high require taking a green drive and misusing it in an external enclosure. So it looks like you repurposed a hodge-podge of drives you had laying around that had previous lives as something else for your NAS.
  • Some of the drives have lifetime high temps that are too high. Respectively, 45C, 54C, and 66C. The 45C is borderline. The other two numbers are not borderline at all. That means that two of the three drives have spent part of their existence at a temperature that, if my drives ever reached it, I would throw them in the garbage can.
  • The WD Black has over 40000 hours on it. That is way past its useful, reliable, life, and this drive needs a nursing home by any standard. If it were a person, it'd be 81 years old.
So, my diagnosis is that this is a hodge-podge pool of drives that have not been generally well-cared for, one of which has already lived longer than it should have. (The WD Red looks to be the best cared for of the three, but it still has had a much, much worse time in life than any drive I have ever owned). Thus, I would say, you are experiencing checksum errors because (one or more of) your drives have generally seen much better days, and one of them is elderly.

The correct procedure is to get a proper vdev of new, well-cared for, NAS drives, and to take these guys out of service. But I assume you don't want to do that. So:

What I would do if I were you is manually start a long S.M.A.R.T. test on each drive:

smartctl -t long /dev/adaN

where N is the drive number. These tests take several hours to complete. Then I would rerun the smartctl -x commands to see if anything new has been revealed by the long test.

If not, then I would maintain the pool/drives as best as I could, and I would consider my pool to be a ticking timebomb, and have backups ready at all times.
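The long-test routine described above could be scripted roughly as follows. Treat it as a sketch: it assumes a POSIX sh rather than csh, the function names and the SMARTCTL variable are invented for this example, and the device names are the three from this thread.

```shell
#!/bin/sh
# SMARTCTL is an illustrative variable so the binary can be swapped out.
SMARTCTL="${SMARTCTL:-smartctl}"

# Kick off a long (extended) self-test on every disk given as an argument.
start_long_tests() {
    for disk in "$@"; do
        "$SMARTCTL" -t long "/dev/$disk"
    done
}

# Some hours later: print the self-test log of each disk.
show_selftest_logs() {
    for disk in "$@"; do
        echo "== $disk =="
        "$SMARTCTL" -l selftest "/dev/$disk"
    done
}

# Example (run as root on the NAS):
#   start_long_tests ada3 ada4 ada5
#   show_selftest_logs ada3 ada4 ada5
```

The self-test log only shows whether each test completed and where it stopped, so the full smartctl -x output is still worth rereading afterwards.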
 

Raycaster

Dabbler
Joined
Aug 6, 2015
Messages
11
DrKK and friends, your helpful response to my original question overwhelms me. Can't thank you enough for the time taken for the professional responses.

You are 100% right about the 3 drives as they are probably from my original WHS v1 system from years past.

I will run a long smart test on the 3 drives while I search for money under the couch cushions...

After reading your excellent response, I'm thinking of adding 2 more WD Red 4T drives to the other volume (currently 3 drives in raidz1) and switching to raidz2.

BTW, I had to research "shibboleth".
 

Nick2253

Wizard
Joined
Apr 21, 2014
Messages
1,633
Just so you know, you can't live convert your existing 3 drive RAIDZ1 to a 5 drive RAIDZ2. You'll have to move your data off the drives, and recreate the vdev.

Though it's not required, it's recommended that you get (N+P) disks, where N=2,4,8 data disks and P=parity disks (1,2,3). In that case, 6 drives (4+2) is ideal for RAIDZ2. Though I know that formula would tell you that 5 disks is good for RAIDZ1, please don't do that. Since RAIDZ1 is basically dead on such large disks (http://www.zdnet.com/article/why-raid-5-stops-working-in-2009/), a 5 disk RAIDZ2 is much, much better than a 5 disk RAIDZ1, but a 6 disk RAIDZ2 is better still.
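To put rough numbers on the N+P layouts, here is a back-of-the-envelope sketch. It assumes that a RAIDZ vdev sizes every member to its smallest disk and it ignores ZFS metadata and padding overhead, so real usable space will be somewhat lower. The function name raidz_usable is made up for this example.

```shell
#!/bin/sh
# raidz_usable DISKS PARITY SMALLEST_DISK_TB
# Prints the raw data capacity in TB: (disks - parity) * smallest disk.
raidz_usable() {
    disks=$1; parity=$2; size=$3
    if [ "$disks" -le "$parity" ]; then
        echo "need more disks than parity disks" >&2
        return 1
    fi
    echo $(( (disks - parity) * size ))
}

raidz_usable 3 1 4   # current 3 x 4T RAIDZ1
raidz_usable 5 2 4   # proposed 5 x 4T RAIDZ2
raidz_usable 6 2 4   # suggested 6 x 4T RAIDZ2
raidz_usable 3 1 2   # the mixed 2T + 2T + 3T RAIDZ1
```

This prints 8, 12, 16 and 4. The last figure is why 1T of the 3T drive goes unused in the mixed vdev.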

In the future, periodic short and long SMART tests are a must. Cyberjock has a great post about scheduling these: https://forums.freenas.org/index.php?threads/scrub-and-smart-testing-schedules.20108/

One thing to keep in mind with SMART is a "failed test" will tell you if a drive has failed, not if it's failing. You need to actually read the SMART data to understand if it's failing. Here's a quick-n-dirty SMART primer of some things you want to look for that might indicate a problem: http://hetmanrecovery.com/recovery_news/smart-parameters-and-early-signs-of-a-failing-hard-disk.htm

For NAS drives, you also want to look at:
  • 193 Load_Cycle_Count - This tells you how many load cycles the drive has. In normal NAS operation, this should be very low. The drives in my NAS, which I've had for close to two years, show counts of less than 50.
  • Current Temperature and Lifetime Min/Max Temperature - Hard drives are sensitive to temperature. Excessively high temperature means premature wear on the drive. Peak temperature matters less than sustained temperature, but you do not want to get too hot in any case. Drives are also sensitive to temperatures that are too cold. The ideal range for most hard drives is 30-40C. Sustained temperature in excess of 45C is very bad. Peak temperature should never exceed 60C, but, honestly, if you ever reach 45C in your NAS, that's a huge red flag. If you want to learn more, this landmark Google Study is a great read with lots of good charts: https://static.googleusercontent.com/media/research.google.com/en//archive/disk_failures.pdf
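The attributes above, plus the power-on hours mentioned earlier in the thread, can be pulled out of a SMART report with a small filter. This is a sketch, not a vetted tool: it assumes the classic ten-column table printed by smartctl -A (attribute name in field 2, raw value in field 10; the -x output is laid out differently), it assumes the drive reports attribute names exactly as matched here, and it only sees the current temperature, not the lifetime maximum. The function name smart_summary and the 45C default are illustrative.

```shell
#!/bin/sh
# Reads `smartctl -A` output on stdin and prints a few values of interest.
# Optional argument: temperature limit in degrees C (defaults to 45).
smart_summary() {
    awk -v max_temp="${1:-45}" '
        # 8760 hours in a (non-leap) year
        $2 == "Power_On_Hours"      { printf "power_on_years=%.1f\n", $10 / 8760 }
        $2 == "Load_Cycle_Count"    { print "load_cycles=" $10 }
        $2 == "Temperature_Celsius" {
            print "temp_c=" $10
            if ($10 + 0 > max_temp + 0)
                print "WARNING: above " max_temp "C"
        }
    '
}

# Example (run as root on the NAS):
#   smartctl -A /dev/ada3 | smart_summary 45
```

Some drives use different attribute names or pack extra data into the raw value, so check the filter against the full report before trusting it.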
Let us know if you need anything else!
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
I agree with all you said, except that the power-of-two rule is useless for performance if you use compression (though it is still valid for overhead). You likely do use compression, so don't bother too much with that. RAID-Z2 will be far, far better than RAID-Z1 even with 5 drives ;)
 

Raycaster

Dabbler
Joined
Aug 6, 2015
Messages
11
I knew about needing to start from scratch with the RAIDZ2 conversion. (changed mind a few times initially with hard drives)

You answered my next 2 topics right on schedule: SMART results/maintenance + RAIDZ2 details. Thx.

Unfortunately, when putting my FreeNAS system together I jumped on the hodgepodge bandwagon that many sites encourage.
Looking back, I would definitely have gone the extra mile and set it up with the suggested hardware.

The one thing I never mentioned is that whenever I venture down to visit the FreeNAS machine in the basement, I notice the screen is usually filled with errors and I have to hit return to bring up the menu. After I upgrade the drives, I'll come back and bug you if the same situation happens.
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
Raycaster said: The one thing I never mentioned is that whenever I venture down to visit the freenas machine in the basement I notice the screen is usually filled with errors and I have to hit return to bring up the menu.
Mine is the same. Instead of venturing into the basement, just enable "Show console messages in the footer" and click on the text box. The content is the same as what gets sent to the console. You should find that it's mostly informational messages that can be safely ignored. Anything important should show up in your inbox if you have notification emails working properly - which is essential to proper care and feeding of a FreeNAS box.
 