Replacing a failed disk: dead drive not showing anywhere


Zarzob

Dabbler
Joined
Jun 16, 2014
Messages
13
One of the hard drives in my array (4x 3TB, which makes a 9TB ZFS volume) has failed. It clicks when I power it on, which sounds like the disk trying to spin up and failing, so it doesn't get detected in the BIOS.

I've tried to follow the Replacing a Failed Drive tutorial on the FreeNAS site, but I don't think it's been updated for 9.2.1.5, or the GUI acts differently when one of the disks is offline. It says to set the drive status to offline, or if that option isn't available, to click the "Replace" button. The GUI doesn't list the fourth drive anywhere, so I can't replace it or do anything else with it.
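For reference, I gather the command-line equivalent of that tutorial procedure would be something like the following (the gptid values are placeholders; as I understand it, the GUI's Replace button also handles partitioning the new disk):

Code:
# Take the failed member offline, then swap in the replacement:
zpool offline Stuff gptid/<failed-disk-gptid>
zpool replace Stuff gptid/<failed-disk-gptid> gptid/<new-disk-gptid>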

Here is a screenshot of what it looks like in the "View Volumes" section of the GUI. I can't do anything other than detach volume "Stuff". If I click "View Disks", it only shows the three disks that are connected, as per the screenshot. I already have the replacement drive, but I don't want to mess something up, so I wanted to ask you guys first.

Do you guys know what to do to replace a failed hard drive? Can I get my volume back? I'm just not sure where to start. I've searched the forums, but I can't find anyone with the same issue, or I don't know exactly what to search for. This is just a personal file server, so it's not *too* much of an issue if I lose the whole volume, but obviously I'd be upset if I couldn't get it back up and running.

When I run zpool import, volume "Stuff" only lists the online drives and not the offline one.

Code:
[root@Zarzob ~]# zpool import
   pool: Stuff
     id: 14668934408475674742
  state: UNAVAIL
 status: One or more devices are missing from the system.
 action: The pool cannot be imported. Attach the missing
        devices and try again.
   see: http://illumos.org/msg/ZFS-8000-6X
 config:

        Stuff                                           UNAVAIL  missing device
          raidz1-0                                      ONLINE
            gptid/61c974cd-d646-11e1-8764-000129ff7024  ONLINE
            gptid/6245f1a0-d646-11e1-8764-000129ff7024  ONLINE
            gptid/62bba6ae-d646-11e1-8764-000129ff7024  ONLINE

        Additional devices are known to be part of this pool, though their
        exact configuration cannot be determined.


As a follow-up question, once I get the array up and running again, is there a way to increase the life of my disks? I'm running 4x Seagate Barracuda 3TB (ST3000DM001) drives, so I'm not using WD Greens or anything, but this disk only lasted about 1.5 years, so I'm curious whether there's anything I can do to keep the next one from dying so quickly.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
No, you have bigger problems. It's UNAVAIL because your pool has incomplete metadata.

Where did this pool come from? Was it created in FreeNAS?

What are your server's hardware specs?

Don't try to do ANYTHING yet; you might lose your data if you start experimenting willy-nilly.
 

Zarzob

Dabbler
Joined
Jun 16, 2014
Messages
13
Welp. That doesn't sound like good news. I'm running a Sabertooth 990FX motherboard and an AMD FX-4100 CPU with 4 GB of RAM (recently upgraded to 16 GB).

I have a feeling it's because I've upgraded FreeNAS a couple of times since creating the volume. I originally made the volume about 1.5 years ago, on what I believe was 8.x, and I've upgraded a couple of times since, partly because I found you could add the Plex plugin. After each upgrade I used the auto-import volume button. Not sure if this adds anything to the troubleshooting.

EDIT: To clarify, I didn't "upgrade" it as such. Each time I upgraded, I wrote the new FreeNAS version to a different USB drive, so it was a whole new operating system. I have since formatted and reused the USB drive that I originally created the volume with.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
What exactly was your pool configuration? Was it a four-disk RAIDZ1? Or was it a three-disk RAIDZ1, to which you later added a disk? And if the latter, was the disk you added the one that failed?
 

Zarzob

Dabbler
Joined
Jun 16, 2014
Messages
13
To be honest, I can't remember whether I started with a three-disk array or four. I know I bought the fourth disk at a different time, but I can't remember if I had set up the volume before adding it. If I did start with a three-disk array, the fourth disk would have been added about a week after the volume was created, and it has been running that way for about 1.5 years now.

Come to think of it, judging by your question and the zpool import output, it does look like it started as a three-disk array to which I later added a fourth. Does this make the problem worse?
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
If you started with a three-disk array and then added the fourth disk later, to the best of my knowledge, your pool is unrecoverably lost. I wouldn't do anything drastic (like reformatting all your drives) until someone else confirms this, but I believe it to be the case. Here's why:

When you added the fourth disk, you striped that single disk with your existing array; there is no way in ZFS to add a single disk to an existing array and have redundancy for that new disk. Your pool configuration was one three-disk RAIDZ1 array (what I'll call disks 1, 2, and 3) striped with a single disk (disk 4). This left you with something akin to a RAID 0. If disk 1, 2, or 3 had failed, your RAIDZ array would have provided redundancy. However, there was no redundancy for disk 4. Since that disk failed (which it sounds like it did) and had no redundancy (which it didn't), your pool is destroyed. The rest of the disks aren't physically damaged and can be reused, but there's no way to recover your data.
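To illustrate, a healthy pool with that layout would have looked roughly like this in zpool status (the disk names are placeholders, not your actual gptids). Note that the fourth disk sits beside the RAIDZ1 as its own top-level vdev:

Code:
        Stuff         ONLINE
          raidz1-0    ONLINE
            disk1     ONLINE
            disk2     ONLINE
            disk3     ONLINE
          disk4       ONLINE   <-- second top-level vdev, no redundancy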

cyberjock's signature has a link to his guide, which explains how this process works in more detail.
 

Zarzob

Dabbler
Joined
Jun 16, 2014
Messages
13
Oh wow, that sucks. So does that mean I can't recover anything from the disks at all? Here I was thinking I had redundancy for all my drives :( I guess it's not paid software, and I didn't fully research it, so I can't really complain. Your explanation was very detailed and easy to understand though.

Would data recovery help at all, or do I need the full drive's worth of data to get anything out of it (i.e. if anything is corrupt, would that make it all useless anyway)? I suppose that's quite expensive for non-commercial purposes.

Hopefully somebody can give me some hope. Does anybody know of any options I might have?

If I do have to reformat all the drives and start over, what should I do to set up warnings for when these types of things happen? Also, 1.5 years seems like a really short time for a non-"green" drive to fail (Seagate Barracuda ST3000DM001). Is there anything I can do to prolong the life of the replacement drive and the other drives in the array?
 

solarisguy

Guru
Joined
Apr 4, 2014
Messages
1,125
danb35 has explained the fundamentals very nicely. If it makes you feel better, consider that most people who buy two-disk NAS appliances run them in a RAID 0 configuration.

Let me add that if you are thinking about paying for recovery, there is a good chance that the surface of your drive is intact or mostly intact, and that the failed drive can be reassembled into working condition with new components****. If you google it, the key phrase is "hard drive recovery", not "data recovery". It is still an expensive route, but recovery companies often give an estimate after receiving the failed drive, and you can then decide whether to proceed.

You are playing the odds, but the fact that the drive is not seen by the BIOS seems to indicate a failure of the electronics rather than a mechanical failure.

**** In almost all cases, only long enough for all the data to be copied to a new drive (well..., it takes many hours to transfer 3TB of data...).
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
I believe cyberjock has had some success recovering data from failed pools. Your problem had nothing to do with its being free software; if you'd set up the same configuration with iXsystems' paid software, you'd have experienced the same result. I'll give a bit more background so you can avoid making the same mistake again.

Zpools are referred to by FreeNAS as "volumes". A zpool consists of one or more vdevs, and each vdev consists of one or more block devices (usually disks, but they can be files as well). FreeNAS, by design, mostly hides the vdev layer, which is not always a good thing.

A vdev can contain one or more drives. If it contains a single drive, you can add another drive as a mirror. If it contains two or more mirrored disks, you can add further mirrors; for example, you can convert a 2-way mirror into a 3-way mirror (three disks, each holding a complete copy of the data). However, you can't add or remove disks from a RAIDZ vdev. The only way to increase the capacity of a RAIDZ vdev is to replace all the disks with larger ones, one at a time. You also can't change the RAIDZ level; for example, you can't change a RAIDZ1 to a RAIDZ2.
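From the command line, those operations look roughly like this (pool and device names are hypothetical; on FreeNAS you would normally do all of this through the GUI so the gptid labels stay consistent):

Code:
# Turn a single-disk vdev into a mirror by attaching a second disk:
zpool attach tank ada1 ada2

# Grow a RAIDZ vdev: replace each disk with a larger one, waiting for
# the resilver to finish before touching the next disk. With autoexpand
# enabled, the extra capacity appears once the last disk is replaced:
zpool set autoexpand=on tank
zpool replace tank ada1 ada5
zpool status tank   # wait for the resilver to complete, then repeat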

A zpool consists of one or more vdevs. If it contains more than one, data is striped across the vdevs. Once a vdev has been added to a zpool, it can never be removed, and if any vdev fails, the entire zpool fails. As a result, there's simply no safe way to add a single disk to an existing pool. You could put that disk in a new pool, giving you two separate storage pools, but that really isn't the ZFS way of doing things.
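And for the record, the mistake itself is a one-liner that ZFS actively resists (names again hypothetical):

Code:
# Refused by default, because the new vdev's redundancy doesn't match
# the existing RAIDZ1:
zpool add tank ada4
# Forcing it creates exactly the striped, unprotected layout described
# above, and it cannot be undone:
zpool add -f tank ada4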

As to the drive failure, the factor you have the most control over is heat: if the disks are getting too hot, their lifespan will be reduced. You'll know if they're getting too hot by making sure the SMART service is enabled, making sure it can email you alerts, and setting a sane maximum temperature (40 deg C is frequently recommended here). If the drives are running hot, do something about it: increase airflow, move them apart, put the server in a cooler environment, something. This is the only thing I can think of that will actually prolong your disks' lives. Run SMART diagnostic tests regularly; a short test daily and a long test weekly is a common recommendation. Schedule ZFS scrubs regularly: every few weeks, maybe once a month. The SMART tests and scrubs won't prolong the life of your disks, but they may alert you to impending failure.
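For reference, the command-line versions of those checks look like this (device and pool names are examples; FreeNAS can schedule all of them from the GUI):

Code:
# Current temperature and overall SMART health for one drive:
smartctl -a /dev/ada0 | grep -i temperature
smartctl -H /dev/ada0

# Self-tests (short daily / long weekly, per the recommendation above):
smartctl -t short /dev/ada0
smartctl -t long /dev/ada0

# Start a scrub of the pool:
zpool scrub Stuff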
 

Zarzob

Dabbler
Joined
Jun 16, 2014
Messages
13
Oh yes, I understand that it could have been the same with other software. I just didn't get any warning that this drive wouldn't have any redundancy when I set it up, and I thought other software (whether free or paid) might have warned me. Sorry, I didn't mean to put FreeNAS down in any way; it was just my ignorance.

Thanks for the explanation. It looks like when I get an array running again, I'll have to be more careful setting it up, and set up the warning systems.

As for hard drive recovery, should I try cyberjock, or will any local recovery place do essentially the same thing? I live in New Zealand, so I'd have to factor in postage time and cost.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
I just thought of something: your Barracuda is a 7200 rpm disk, right? Those tend to use more power and run hotter than slower-spinning disks, which may have contributed to its demise. I can't advise on recovery services; I've never used them.
 

solarisguy

Guru
Joined
Apr 4, 2014
Messages
1,125
@Zarzob, I think the postage might be a small fraction of the cost... And they do have express rates too.

An advanced local recovery place might be able to replace the electronics board for 150-200% of the cost of a new drive, and you may luck out.

Recovery from the remaining three drives, if doable at all, would give you only those files that did not change after the fourth drive was added 1.5 years ago. That is a consequence of the RAID 0 style striping and the ZFS design.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Ok, so here's my guess based on what I read:

You added a drive to the pool after creation. You did exactly what I tell people not to do in my guide; it's slides 28 and 29. You've been the unlucky person this week to make that mistake, and you may lose all of your data for it. If the disk has broken electronics but is still mechanically sound, there's a chance of data recovery. You need to call around to various companies (google for "hard drive recovery") and ask about recovering a disk's physical media. If they can do it, they should mail you back a working drive with the partition table and ZFS data intact. You put that in your server and the pool will come right online. Expect to pay $1000 to $3000 for this service. And be VERY cautious about who you go with; do your homework. Many companies will take your money and rip you off, so don't go with the cheapest just because it's the cheapest.

There are some companies that sell just the electronics boards. But be warned that many makes and models link the physical formatting on the platters to the firmware on the disk's controller. In those cases you can NOT simply replace the controller. If you go this route, there's a risk of breaking the drive or even erasing the data you are trying to recover. So proceed at your own risk.

There are NO ZFS recovery tools. I said this on slide 30 of my presentation. So if you cannot get your broken disk working, there is basically a 0% chance of seeing your data again. If you want to contact a data recovery service for ZFS recovery, expect to pay anywhere from $20,000 up to whatever they want to quote. ZFS is an enterprise-class file system/volume manager, and they know businesses will pay (and for some, they might actually do it). You aren't going to talk them down, and you probably wouldn't get much data back anyway, since approximately 1/4 of your data is on the broken disk.

I do offer data recovery services, but they won't do you a bit of good in this case. Your pool is UNAVAIL, so anything I'd do would take hundreds of hours and yield only fragments of data. That's not my gig; I don't take recovery jobs that I don't think have a reasonable chance of success. Sorry, but you don't fall into the category where I'd be willing to take on your pool's problems.

Good luck.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Oh yes, I understand that it could have been the same with other software. I just didn't get any warning that this drive wouldn't have any redundancy when I set it up, and I thought other software (whether free or paid) might have warned me. Sorry, I didn't mean to put FreeNAS down in any way; it was just my ignorance.

The expectation is that the administrator fully understands what he/she is doing and all of the potential consequences. You can't protect people from everything, and there are plenty of cases where you might WANT a RAID 0.
 

Zarzob

Dabbler
Joined
Jun 16, 2014
Messages
13
I'm not sure I'd be willing to pay $1000 to $3000 to recover mostly personal storage (thankfully this is not a commercial environment), considering the drive is worth a tenth of that, but I'll ask around just in case. It's quite clear that I did not do my homework when I started using FreeNAS. I know a fair bit about IT, but this is one field I had no idea about, as I had never set up any type of RAID array before. I'll be sure to read your slides before I set up the next array, or get my current array up and running again; obviously I need to revise how it is set up.

Thanks for the detailed response.

The expectation is that the administrator fully understands what he/she is doing and all of the potential consequences. You can't protect people from everything, and there are plenty of cases where you might WANT a RAID 0.

Yes, I understand that file servers, particularly of this size, are generally intended for commercial purposes. I didn't put too much thought into it because it's just for personal storage. On the bright side, next time I'll have a lot more knowledge! :)
 

solarisguy

Guru
Joined
Apr 4, 2014
Messages
1,125
@Zarzob, file storage servers of your size are now commonplace in households in some wealthy countries. Leaving downloaders aside, people place their digital media in online libraries (200 Blu-rays need around 6TB), or they record shows from satellite or cable (and keep them, because they can), etc.

The above is replaceable. On the other hand, over a weekend outing a family can easily take a thousand multi-megapixel pictures and record tens of gigabytes of HD video. Most of it will never be watched; very few have the time to trim it down and prepare a best-of, so everything is kept. And there are irreplaceable, priceless gems in there: pictures of those who have passed away, first baby steps, graduation memories, you name it! A $30k price tag makes people cry, and suddenly $1-3k seems small in comparison..., just for personal storage...
 

Zarzob

Dabbler
Joined
Jun 16, 2014
Messages
13
You have a very good point there; I was just thinking the same thing. I'm running low on money right now, and to be honest, if I can't even remember what most of the irreplaceable data on the drive is, then I'll likely never miss it. I've been advised that it would be between $450 and $750 + GST (NZD) to recover because it's non-commercial (if they see a ZFS configuration, will they change their mind?), and while I think that's pretty cheap, if it leans towards the more expensive end I'll be reluctant to go for it. I think I'm leaning towards letting it all go and starting over. After all, if I don't pay for data recovery, I can use that money to buy parts for a more robust server.
 

solarisguy

Guru
Joined
Apr 4, 2014
Messages
1,125
It is your data. We cannot estimate its value.

I imagine a recovery company might always look at your drive contents (insert any valid conspiracy theory here). However, for the purpose of the recovery, the copy is made sector by sector and is filesystem-agnostic (at least, some white papers I have read claim so).

Why would they look? They might just want to stay up to date with current usage patterns, the tools being used to manage hard drives, filesystems, filesystem versions, etc. Consequently, if they see ZFS, ext4 or btrfs signatures, they might consider investing in developing recovery tools for those filesystems.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
They could easily see your partition table (and it will hint that it's ZFS). But they wouldn't be able to glean much from your drive without serious forensic analysis. If you had compression enabled, that's a hindrance; the fact that the disk was its own vdev is a hindrance; the varying block sizes are also a hindrance. If you have files in plain text and compression wasn't used, they may be able to read those. But when it comes to getting large quantities of data, they'll be very limited in what they can actually pull off the drive.
 

Zarzob

Dabbler
Joined
Jun 16, 2014
Messages
13
Going to be reading through your guide over the next few days to make better decisions for the future. Thanks for taking the time to create such a detailed guide! And of course thanks for all the help in this thread.
 