Multiple drives all failing

Status
Not open for further replies.

pcmofo

Explorer
Joined
Mar 2, 2012
Messages
98
Recently I've been transferring about 1.5TB of data to my NAS, which is about 40% full. It has 8x 2TB Seagate drives that are all 2.5 years old (just outside of their 2-year warranty). I had not had an issue with the drives until earlier this week, when my NAS sent a SMART email saying the smartd daemon was unable to open ada5, s/n 5YD7XXX. I happened to have a spare 2TB WD Red drive sitting around, so I went through the resilvering process with no issues in just a few hours. Everything seemed fine and I resumed my transfer. I woke up to find two more "failures": first ada1, "device not capable of smart check", and then "ada1, unable to open device", followed by "ada4 unable to open device".

So now my RAIDz2 is in serious danger of losing data and I shut everything down. I popped the first drive that went bad into my Mac and had no trouble formatting and writing data to it... All 3 drives that are reporting issues are from the 5YD7xxx series of serial numbers. All drives were bought at the same time from Amazon. I am also unable to run any SMART commands against the drives, as they typically go offline or become unavailable when connected to the RAID.

Not really sure what I should do next here, as another drive could easily fail. As far as I can guess, either something else is going wrong with the system to make it think the drives are failing, or 3 drives just decided to die in the last 48 hours. They have been running 24/7 for 2.5 years connected to a UPS. No strange noises or clicking etc. Temps are barely warm.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I don't know what you should do either. Running RAID with ZFS is just asking the storage gods to smite you.

There is a possibility you are incredibly unlucky with disks, or your configuration is somehow responsible. To be blunt, the fact that you admit to using RAID makes me wonder what other "thumbrules" you ignored when setting up your server. There's a lot of things that you can do that will have dire consequences later. Using RAID with ZFS is one of those 'initially works great until it ultimately snacks on your pool'.

Can you post your hardware specs and FreeNAS version since you left that stuff out?
 

Knowltey

Patron
Joined
Jul 21, 2013
Messages
430
I think he may just be saying RAID as in short for the RAIDZ2 he was running rather than meaning that he is running it through a hardware raid controller.

Are you running this RAIDZ2 as a software RAID and letting ZFS have direct access to all of your disks, or are you passing them through a hardware RAID Controller of some sort?

If you're doing the latter, that is a strongly discouraged configuration and should be avoided at all costs. If you are running your zpool through a hardware RAID controller, then I would highly advise you to start getting that data off of there as soon as you can (unless you have a suitable backup) and then recreate the zpool properly using software RAID.

If you are already using software RAID I'd just advise to keep going with the resilvering process, although if you have a suitable backup that you can restore from (hint: you should) it may be easier for you to recreate the RAID with the new disks and just restore the data to a fresh zpool.
 

pcmofo

Explorer
Joined
Mar 2, 2012
Messages
98
I am running a standard software RAIDZ2: a single vdev with 8x 2TB disks. Nothing out of the ordinary here. It's been working fine for nearly 3 years. The only other thing I did recently was upgrade my zpool, which was at v28, to the latest version (no version number shown).
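
To put rough numbers on how exposed a pool like this is, here is a minimal sketch of the RAIDZ2 arithmetic. It relies only on the fact that RAIDZ2 keeps two disks' worth of parity per vdev. The function names are hypothetical, and the capacity figure is a back-of-the-envelope estimate that ignores ZFS metadata, padding, and TB-vs-TiB differences.

```python
# Sketch only: hypothetical helper names, rough arithmetic.

RAIDZ2_PARITY = 2  # RAIDZ2 stores two disks' worth of parity per vdev


def remaining_redundancy(parity, unavailable):
    """How many more disks the vdev can lose before data is lost.

    Zero means the next failure loses data; a negative value means the
    vdev has already lost more disks than its parity level covers.
    """
    return parity - unavailable


def approx_usable_tb(disks, disk_tb, parity):
    """Rough usable capacity: number of data disks times disk size."""
    return (disks - parity) * disk_tb


if __name__ == "__main__":
    # Single vdev of 8x 2TB disks, as described in the post above.
    print("approx usable TB:", approx_usable_tb(8, 2.0, RAIDZ2_PARITY))
    # ada1 and ada4 both unavailable after ada5 was already replaced.
    print("failures still tolerated:", remaining_redundancy(RAIDZ2_PARITY, 2))
```

With two members of the vdev unavailable at once, the sketch reports no failures left to spare, which is why shutting down before a third disk dropped was a reasonable call.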
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
Is it possible that as a result of all that disk activity, you're having a thermal problem? i.e., your thermal/ventilation condition was dodgy to begin with, but since you weren't doing much, it wasn't faulting. But now, with all the transfers, it heats up, and you start having problems? Whenever we have had people dropping disks and pools all at once under load, it's often been a thermal problem.
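
If the drives stay online long enough to answer `smartctl -A`, the thermal theory can be checked from the attribute table rather than by touch. The sketch below is an assumption about how one might do that: `SAMPLE` is made-up text in the column layout `smartctl -A` prints (it is not output from any drive in this thread), and `temperatures` is a hypothetical helper that reads the raw value of attributes 190 and 194, which many drives use for temperature.

```python
# SAMPLE is illustrative only: invented values in the usual `smartctl -A` layout.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
190 Airflow_Temperature_Cel 0x0022   067   055   045    Old_age   Always       -       33 (Min/Max 28/45)
194 Temperature_Celsius     0x0022   033   045   000    Old_age   Always       -       33 (0 18 0 0 0)
"""

# Attribute IDs commonly used for drive temperature (vendor dependent).
TEMPERATURE_IDS = {"190", "194"}


def temperatures(smartctl_output):
    """Return {attribute_name: degrees_celsius} for temperature attributes."""
    temps = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # The raw value starts at the tenth whitespace-separated column.
        if len(fields) >= 10 and fields[0] in TEMPERATURE_IDS and fields[9].isdigit():
            temps[fields[1]] = int(fields[9])
    return temps


if __name__ == "__main__":
    for name, celsius in temperatures(SAMPLE).items():
        print(name, celsius, "C")
```

Logging these values during a long transfer would show whether the disks heat up under load before they drop.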
 

pcmofo

Explorer
Joined
Mar 2, 2012
Messages
98
Heat could be an issue. I will be removing the server from the closet it is in for any further testing, and I'll bring my IR thermometer along with me. I'm wondering if leaving my server off overnight will force it to recheck the SMART status and possibly change its mind. As soon as the drives go offline I have no way to see what errors they are actually experiencing (that I know of). So short of turning on the NAS and throwing in brand new drives, I have no idea what the best plan of action would be.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Heat could be an issue. I will be removing the server from the closet it is in for any further testing,

/facepalm
 

pcmofo

Explorer
Joined
Mar 2, 2012
Messages
98
/facepalm
The NAS is in a closet with a few switches and other network gear in the finished basement of an air-conditioned house. It has never had any temp issues; in fact, everything is cool or cold to the touch when running. It was also built to be cool and quiet.
nas1.jpg
nas2.jpg


Currently I have removed the server from the closet, powered down the drives, and I am running memtest. Following that, I plan on booting to Ubuntu and seeing if I can manually read the SMART data from the drives outside of the FreeNAS OS. From there we will see what can be done.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
You can't get the SMART data from FreeNAS itself?
 

pcmofo

Explorer
Joined
Mar 2, 2012
Messages
98
You can't get the SMART data from FreeNAS itself?
Negative. If I ask for the SMART data using smartctl -a /dev/ada1, for example, it asks me to specify the type with -d, so I add -d ata, and it responds that it can't find the device. It has done this for all 3 drives now.
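
For reference, the attempts described above can be written down as a small script. This is only a sketch: `smartctl_commands` and `try_smartctl` are hypothetical helper names, and the list of `-d` types to try (`auto`, `ata`, `sat`) is an assumption about what is worth testing, not a known fix for these drives. Running the commands needs smartmontools installed and root privileges.

```python
import shutil
import subprocess
from typing import List

# Assumption: device types worth trying. "ata" is the one already tried above.
DEVICE_TYPES = ("auto", "ata", "sat")


def smartctl_commands(device: str) -> List[List[str]]:
    """Build one `smartctl -a -d TYPE DEVICE` command per type, without running it."""
    return [["smartctl", "-a", "-d", dtype, device] for dtype in DEVICE_TYPES]


def try_smartctl(device: str) -> None:
    """Run each candidate command and report its exit status."""
    if shutil.which("smartctl") is None:
        print("smartctl not found on PATH")
        return
    for cmd in smartctl_commands(device):
        result = subprocess.run(cmd, capture_output=True, text=True)
        print(" ".join(cmd), "-> exit status", result.returncode)


if __name__ == "__main__":
    # Only prints the commands; call try_smartctl("/dev/ada1") to execute them.
    for cmd in smartctl_commands("/dev/ada1"):
        print(" ".join(cmd))
```

If every variant reports that the device cannot be found, the disk has most likely dropped off the controller entirely, which points away from a smartctl option problem.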

The emails I got for the 2nd and 3rd drives that failed were, in order,

"ada1 not capable of SMART self-check" followed shortly by
"ada1 failed to read SMART attribute data" (Same as first drive that failed ada5) And then
"ada1 Read SMART self-test Log failed", and finally
"ada1 unable to open device"

A few hours later I got an immediate "ada4 unable to open device"
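
To make the order of events easier to follow, here is a minimal sketch that groups the alerts listed above by device. The message texts are copied from this post; the `(device, message)` tuple layout and the helper names are assumptions made for illustration, not anything smartd produces in that form.

```python
# Alert texts as quoted in the post above, in the order they arrived.
ALERTS = [
    ("ada1", "not capable of SMART self-check"),
    ("ada1", "failed to read SMART attribute data"),
    ("ada1", "Read SMART self-test Log failed"),
    ("ada1", "unable to open device"),
    ("ada4", "unable to open device"),
]


def group_by_device(alerts):
    """Map each device to its alert messages, preserving arrival order."""
    grouped = {}
    for device, message in alerts:
        grouped.setdefault(device, []).append(message)
    return grouped


def dropped_devices(alerts):
    """Devices whose most recent alert is 'unable to open device'."""
    return [
        device
        for device, messages in group_by_device(alerts).items()
        if messages[-1] == "unable to open device"
    ]


if __name__ == "__main__":
    for device, messages in group_by_device(ALERTS).items():
        print(device, "->", " / ".join(messages))
    print("no longer reachable:", ", ".join(dropped_devices(ALERTS)))
```

Seen this way, ada1 degraded through several SMART read failures before disappearing, while ada4 went straight to being unreachable.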
 

wlee

Cadet
Joined
Aug 11, 2014
Messages
6
Did you run SeaTools on the failed Seagate drive? What's your drive model? Desktop HDDs aren't designed to run 24/7.

 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Ugh. Then you have bigger problems, because not being able to monitor SMART from FreeNAS is basically playing Russian roulette. SMART is your primary indicator of a failing disk with FreeNAS. Without it, your first indicator is going to be a degraded pool, and that's already a nasty place to be.
 

pcmofo

Explorer
Joined
Mar 2, 2012
Messages
98
The drives are Seagate Barracuda Green drives, model ST2000DL003. I did not run any tools on the first drive that went. I did reformat it and I was able to load 1.5TB onto it with no issues. When I built the NAS 2.5 years ago, HDDs were still pretty pricey. If I have to replace them all gradually with WD Red drives then that's what I have to do, I guess, but the first drive that I pulled appears to be fine. Can/should I still run SeaTools on that drive or any of the other drives? What would I be looking for?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
You also still haven't posted all of your hardware details like I asked above.
 

pcmofo

Explorer
Joined
Mar 2, 2012
Messages
98
The NAS is off currently. But here are the specs I have from last week before the drives started failing.
NASspecs.jpg
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
That's still not what I need. Motherboard model, what controller your hard drives are connected to, model of hard drives, RAM models, that stuff. I can see your PSU model from the picture and you've already provided your hard drive info.
 

Whattteva

Wizard
Joined
Mar 5, 2013
Messages
1,824
That's still not what I need. Motherboard model, what controller your hard drives are connected to, model of hard drives, RAM models, that stuff. I can see your PSU model from the picture and you've already provided your hard drive info.
Lol, sounds like someone didn't follow the hardware suggestions and is resisting your request for specs!
From the pic above, mobo looks like some Asus board.
 

pcmofo

Explorer
Joined
Mar 2, 2012
Messages
98
Sorry about that. All my parts came from newegg.

CORSAIR XMS3 8GB (2 x 4GB) 240-Pin DDR3 SDRAM DDR3 1333 (PC3 10600) Desktop Memory Model CMX8GX3M2A1333C9
http://www.newegg.com/Product/Product.aspx?Item=N82E16820145315

Intel Core i3-2100 Sandy Bridge Dual-Core 3.1GHz LGA 1155 65W Desktop Processor Intel HD Graphics 2000 BX80623I32100
http://www.newegg.com/Product/Product.aspx?Item=N82E16819115078

ASUS P8Z68-V PRO/GEN3 LGA 1155 Intel Z68 HDMI SATA 6Gb/s USB 3.0 ATX Intel Motherboard with UEFI BIOS
http://www.newegg.com/Product/Product.aspx?Item=N82E16813131790
 

wlee

Cadet
Joined
Aug 11, 2014
Messages
6
SeaTools lets you check SMART and run various tests. I don't think the tests are destructive, except the advanced one, which will prompt you first.

BTW, is the basement humid? If so, that's not good for drives.

 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Ah. Desktop components. Yeah, we don't use or recommend desktop components because they often don't work quite as well as you'd like. Not surprisingly, you are having unexplainable problems.
 