Email when RAID array degrades?

Status
Not open for further replies.

Lancsrick

Dabbler
Joined
Sep 2, 2012
Messages
11
I've done a bit of searching on this and found the threads on regular status emails, but that's not quite what I'm after. I also have a question on support for hardware RAID.

Having recently had an issue with my RAID1 array going degraded (one drive developed SMART errors), I was a bit worried since I only found out about it by chance. Is there a way to make sure FreeNAS (9.1) emails me as soon as any issue with RAID performance occurs?

My intention is to use a JMB363 controller on an Atom motherboard, as I understand FreeNAS supports that hardware controller to maintain a RAID array. If not, then I'll use the ZFS software mirroring.

Any help on either question would be greatly appreciated!

Cheers.
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
If not, then I'll use the ZFS software mirroring.
ZFS mirror is the better choice. Gmirror would be superior as well.

Having recently had an issue with my RAID1 array going degraded (one drive developed SMART errors), I was a bit worried since I only found out about it by chance.
Don't use the RAID controller; let FreeNAS access the drives directly instead. Then simply configure smartd + email and you will be emailed about any SMART errors that any of the drives develop.
 

Lancsrick

Dabbler
Joined
Sep 2, 2012
Messages
11
Great, thanks paleoN.

Reading up on configuring smartd, it seems like it's only looking at temperatures? (Services>SMART). I've no doubt I'm missing something here, but if you could point me in the right direction I'd be really grateful.

Cheers!
 

Lancsrick

Dabbler
Joined
Sep 2, 2012
Messages
11
Ouch, software ZFS mirror has dropped my transfer speeds from 65MB/s to around 25MB/s. Atom D510 showing its limitations! Hopefully reading won't be as slow.
 

Dusan

Guru
Joined
Jan 29, 2013
Messages
1,165
Reading up on configuring smartd, it seems like it's only looking at temperatures? (Services>SMART). I've no doubt I'm missing something here, but if you could point me in the right direction I'd be really grateful.
To make sure that smartd also warns you about other problems with the disk, add -a under S.M.A.R.T. extra options on the View Disks screen.
http://smartmontools.sourceforge.net/man/smartd.conf.5.html:
-a
Equivalent to turning on all of the following Directives: '-H' to check the SMART health status, '-f' to report failures of Usage (rather than Prefail) Attributes, '-t' to track changes in both Prefailure and Usage Attributes, '-l error' to report increases in the number of ATA errors, '-l selftest' to report increases in the number of Self-Test Log errors, '-l selfteststs' to report changes of Self-Test execution status, '-C 197' to report nonzero values of the current pending sector count, and '-U 198' to report nonzero values of the offline pending sector count.
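As a rough illustration of the man page text quoted above, here is a minimal Python sketch of what -a stands for. The directive list is copied from that quote; the names A_EXPANSION and expand_directives are made up for this example and are not part of smartmontools.

```python
# Directives that the quoted smartd.conf man page lists as the meaning of '-a'.
A_EXPANSION = [
    "-H",              # check the SMART health status
    "-f",              # report failures of Usage Attributes
    "-t",              # track changes in Prefailure and Usage Attributes
    "-l error",        # report increases in the number of ATA errors
    "-l selftest",     # report increases in Self-Test Log errors
    "-l selfteststs",  # report changes of Self-Test execution status
    "-C 197",          # report nonzero current pending sector count
    "-U 198",          # report nonzero offline pending sector count
]


def expand_directives(directives):
    """Return a copy of `directives` with every '-a' replaced by its expansion."""
    expanded = []
    for directive in directives:
        if directive == "-a":
            expanded.extend(A_EXPANSION)
        else:
            expanded.append(directive)
    return expanded


print(expand_directives(["-a"]))
```

So typing -a into the extra options field is shorthand for asking smartd to run all eight of these checks.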
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I haven't needed to add -a to get non-zero current pending sector count or non-zero values of offline pending sector count. How did you come to this conclusion?
 

Dusan

Guru
Joined
Jan 29, 2013
Messages
1,165
By reading the smartd source code. The documentation states "Note that -a is the default for ATA devices. If none of these other Directives is given, then -a is assumed.". However, the source code shows that enabling any temperature monitoring will also prevent the default -a. See: http://sourceforge.net/apps/trac/smartmontools/browser/trunk/smartmontools/smartd.cpp#L4178. The if statement applies the default -a only if all the cfg options are zero, including tempdiff, tempinfo and tempcrit.
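To make the condition described above concrete, here is a simplified Python model of it. This is not the actual smartd code: only tempdiff, tempinfo and tempcrit are names taken from the post, while DiskConfig, default_a_applies and the other field names are hypothetical stand-ins for "the other cfg options".

```python
from dataclasses import asdict, dataclass


@dataclass
class DiskConfig:
    # Hypothetical stand-ins for the non-temperature monitoring options.
    health_check: bool = False
    error_log: bool = False
    selftest_log: bool = False
    # Temperature options named in the post; 0 means "not set".
    tempdiff: int = 0
    tempinfo: int = 0
    tempcrit: int = 0


def default_a_applies(cfg):
    """The default -a is assumed only when every option is still zero/False."""
    return not any(asdict(cfg).values())


untouched = DiskConfig()
temperature_only = DiskConfig(tempcrit=40)

print(default_a_applies(untouched))         # nothing configured
print(default_a_applies(temperature_only))  # a temperature check is enabled
```

Under this model, a disk with only a critical temperature set no longer gets the implicit -a, which is why adding -a explicitly is the cautious choice.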
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Ah, you also submitted https://bugs.freenas.org/issues/2537

I'm very confused by this whole thing, because I do have temperature monitoring enabled and, in the last 10 days on 9.1 (and then 9.1.1), I have gotten high-temperature alarms as well as current pending sector and offline uncorrectable warnings. I know because my server texts me, and before I did my resilvering last week, every time I rebooted the server I'd immediately get an email because both current pending sector and offline uncorrectable were non-zero. Also, during the hottest part of the day, my server sometimes sends me a text because it hits the 37C threshold I've assigned to it.

I did just reply to that bug report too so you can see my info.
 

Dusan

Guru
Joined
Jan 29, 2013
Messages
1,165
Interesting. When setting up FreeNAS I checked the source code and discovered what I mentioned above. I did not have a failing disk to test it with, so to be safe I added -a to all my disks. It definitely can't hurt.
 

Dusan

Guru
Joined
Jan 29, 2013
Messages
1,165
Ah, so that's you on bugs.freenas :). I'll continue the discussion there.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Yes. The tricked out name is because iXsystems decided to let everyone see the full names instead of usernames and I think it is BS to do that without fair warning. And they haven't decided to fix it despite the issue being on their log for 2 weeks as critical. So "Bite Off" is me.
 

Lancsrick

Dabbler
Joined
Sep 2, 2012
Messages
11
Thanks for that. So...

1) Add "-a" as an extra option on the View Disks screen
2) Enable smartd email from the Services > SMART menu
3) Do I need to put in a scheduled check too?

Thanks for all the help guys!
 

Dusan

Guru
Joined
Jan 29, 2013
Messages
1,165
If someone wants to follow the technical details continue here: https://bugs.freenas.org/issues/2537
I think I now understand why cyberjock got some of the warning emails, but I still believe that without -a smartd is not checking everything it is capable of checking (that is, if you enable any of the temperature checks).
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
1. The "-a" isn't necessary per my own testing. But there's some confusion as to why mine works. It shouldn't based on the information available, but it does work and I verified that -a is not part of my smartd.conf file.

2. Yes.

3. You can. If you do, make sure that the long test is never ever performed at the same time as a scrub or resilver. It's like "crossing the streams" in Ghostbusters. Bad stuff has happened to people in the past and the reason why is not understood. I do scrubs on the 1st and 15th and long tests on the 7th and 21st. If you choose to do scrubs on a particular day of the week you MUST follow the same method for long tests. Short tests can be done anytime. But keep in mind that only 20 or so test results will be stored on the disks, so if you do a short test every hour you won't have much of a history. I don't do short tests, but I wouldn't recommend you do them any more frequently than daily.
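A small Python sketch of why the two schedules should use the same method. It uses the dates from the post (scrubs on the 1st and 15th, long tests on the 7th and 21st); the Sunday scrub, the year 2013 and the helper name days_matching are assumptions made purely for illustration.

```python
from datetime import date, timedelta


def days_matching(year, days_of_month=(), weekdays=()):
    """Return the set of dates in `year` picked by day-of-month or by weekday."""
    selected = set()
    current = date(year, 1, 1)
    while current.year == year:
        if current.day in days_of_month or current.weekday() in weekdays:
            selected.add(current)
        current += timedelta(days=1)
    return selected


# Same method for both jobs: day-of-month schedules never share a date.
scrubs_by_date = days_matching(2013, days_of_month=(1, 15))
long_tests = days_matching(2013, days_of_month=(7, 21))
same_method_overlap = scrubs_by_date & long_tests

# Mixed methods: a scrub every Sunday (weekday 6 in Python) versus
# long tests on fixed dates. Sooner or later a 7th or 21st is a Sunday.
SUNDAY = 6
scrubs_by_weekday = days_matching(2013, weekdays=(SUNDAY,))
mixed_method_overlap = sorted(scrubs_by_weekday & long_tests)

print(len(same_method_overlap))
print(mixed_method_overlap)
```

With both jobs on fixed dates the overlap is empty, while mixing a weekday scrub with fixed-date long tests produces several days in the year where both would run.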
 

Lancsrick

Dabbler
Joined
Sep 2, 2012
Messages
11
Cheers. Presumably if I don't do (3) then it will never actually check via smartd to know if there's a problem?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
You will still have the regular SMART monitoring, but you won't have the tests. In all honesty, there are a lot of mixed views, opinions, and facts on how useful the tests are. Personally, with every disk I've had fail in the last 5 years, I first identified the failing disk through SMART monitoring and not through any test result. It really is a personal choice. And if you choose not to run any SMART tests at all and just use SMART monitoring, you'd probably be safe.

Far more important things to get "right" are going with ECC RAM, using hardware that allows SMART monitoring, and using a mirror, RAIDZ2 or RAIDZ3 vdev.
 

Lancsrick

Dabbler
Joined
Sep 2, 2012
Messages
11
Thanks, I've done points 1 & 2 then. I don't want all the fancy diagnostics, it's just that my last RAID1 array (hardware) went down to one disk and I only found out by chance! Made me realise that redundancy without notification is a bit like a boat without a sail!
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
Ouch, software ZFS mirror has dropped my transfer speeds from 65MB/s to around 25MB/s. Atom D510 showing its limitations!
ZFS mirror doesn't have overhead like that. You either need to do some CIFS? tuning or there is some other underlying issue.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
What I would give for a crash course in git stuff... Being that I have roughly 24 hours a day of complete boredom it would be awesome if I could actually merge stuff back into FreeNAS. I tried to figure it out myself, but after about 10 hours I'm left with a cubic butt-ton of questions, acronyms I'm not familiar with, and not much understanding.
 