Please help! Started up FreeNAS, suddenly volume storage (ZFS) status unknown?!

Status
Not open for further replies.

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
HHawk said:
And I redid it once more to make sure, but same thing:
Let's see if this gives us anything different:
Code:
time zpool import -fFX storage
A -T includes a -X. Depending on the logic in -X we might see a different result. Also, I'll address some of your other questions a bit later.
 

HHawk

Contributor
Joined
Jun 8, 2011
Messages
176
paleoN said:
Let's see if this gives us anything different:
Code:
time zpool import -fFX storage
A -T includes a -X. Depending on the logic in -X we might see a different result. Also, I'll address some of your other questions a bit later.

Running now. Guess this will indeed take a few hours.
No problem, thanks...
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
With the -T command we don't have to zero out the uberblocks. We can just tell it up to which txg to use. This achieves the same end.

Your labels are consistent. Or they are consistent now at least.

HHawk said:
Another guess: wouldn't booting up with an Oracle Solaris 11.1 Live Media for x86 (USB/CD) and running the commands there do anything?
Probably not, but it couldn't hurt to ask...
Possibly. I was going to suggest it if these other commands fail. The zdb -e -bcsvL in particular I was thinking of, but I'd try importing first. There's one more -T import I'd like you to try first if the current import fails though.
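The -T rewind described above can be planned as a list of imports, newest transaction group first, so that the least recent data is thrown away first. This is a minimal sketch, not something posted in the thread: `rewind_plan` is a hypothetical helper that only prints commands, the candidate txg numbers are assumed to come from your own uberblock dumps, and it assumes your `zpool` accepts `-o readonly=on` on import so a trial import does not write to the pool.

```shell
#!/bin/sh
# rewind_plan POOL TXG...
# Hypothetical helper: prints (does not run) one rewind import per
# candidate txg, highest txg first, so the least data is discarded first.
rewind_plan() {
    pool="$1"
    shift
    printf '%s\n' "$@" | sort -rn | while read -r txg; do
        # -T imports the pool as of the given transaction group;
        # readonly=on (assumed supported) avoids writes during the attempt.
        echo "zpool import -o readonly=on -T $txg $pool"
    done
}

# Example: the txg paleoN suggested plus an older, hypothetical candidate.
rewind_plan storage 735242 735100
```

Each printed line is meant to be run by hand, one at a time, letting it finish before trying the next.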
 

ProtoSD

MVP
Joined
Jul 1, 2011
Messages
3,348
Hhawk,

After looking at some additional info that PaleoN shared with me, I think the method I was hoping to use is clearly more complicated than I planned for. I still think it can be done, but it would involve more time than I have and I'm in a lot of pain at the moment and it's making things really unpleasant for me.

I was hoping to generate a list with all the file names as well as a list of the pointers to all the parts of each file, and ultimately a way to go to each of those pointers and assemble the pieces of your files. I feel bad because I hate to see you lose all of your pictures and files. I'll see how things go and I can probably still get a list of your files if I can have a little more time.
 

HHawk

Contributor
Joined
Jun 8, 2011
Messages
176
No worries ProtoSD.

Well finally the command was done. It's taking longer and longer every time (or it feels like it at least):

Code:
root@mfsbsd:/root # time zpool import -fFX storage
cannot import 'storage': one or more devices is currently unavailable
0.000u 2882.773s 7:37:16.48 10.5%       111+2652k 835+0io 0pf+0w
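The csh `time` line packs several numbers together: user CPU seconds, system CPU seconds, elapsed wall-clock time, and CPU share. A small worked sketch (the `cpu_share` helper is hypothetical, not from the thread) recomputes the share from the first three fields: 2882.773 s of system time over 7:37:16.48, i.e. 27,436.48 s elapsed, is about 10.5%, which suggests the import spent most of those hours waiting on the disks rather than computing.

```shell
#!/bin/sh
# cpu_share USER_SECS SYS_SECS ELAPSED
# ELAPSED is csh-style h:mm:ss.xx (or m:ss.xx); prints seconds and CPU %.
cpu_share() {
    echo "$1 $2 $3" | awk '{
        n = split($3, t, ":")
        secs = 0
        # Fold each colon-separated field into a running total of seconds.
        for (i = 1; i <= n; i++) secs = secs * 60 + t[i]
        printf "elapsed=%.2fs cpu=%.1f%%\n", secs, 100 * ($1 + $2) / secs
    }'
}

cpu_share 0.000 2882.773 7:37:16.48
# prints: elapsed=27436.48s cpu=10.5%
```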
 

HHawk

Contributor
Joined
Jun 8, 2011
Messages
176
Well, I will be off for a few hours. Since I won't be here anyway, I will run the same command once more.
 

HHawk

Contributor
Joined
Jun 8, 2011
Messages
176
Well, I am back. The last command is still running, though.

However, I needed to turn off the laptop, so I cannot see the outcome (probably the same as above again). How do I check the result?

It is still busy though, I can see it by doing:

Code:
 gstat -I1s -f 'a?da[0-9]+$'


And what should I do after this step? And what commands?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
If you ran it over SSH, I don't think you can see the result now. I always run important commands locally.
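One common way to keep a multi-hour command and its result safe from a dropped SSH session is to detach it with `nohup` and send its output, plus the exit status, to a log file. This is a minimal sketch, not something used in the thread: `run_logged` and the log path are hypothetical, and it assumes a POSIX `sh` (root's login shell on mfsBSD appears to be csh, so start `sh` first).

```shell
#!/bin/sh
# run_logged LOGFILE COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND in the background, immune to hangups,
# writing its stdout/stderr and final exit status to LOGFILE.
run_logged() {
    log="$1"
    shift
    nohup sh -c '"$@"; echo "exit status: $?"' sh "$@" > "$log" 2>&1 &
    # Print the PID so the job can be checked later, e.g. with `ps -p PID`.
    echo "$!"
}
```

For example, `run_logged /root/import.log zpool import -fFX storage` returns immediately; after reconnecting, `cat /root/import.log` shows the command's output and, once it has finished, its exit status.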
 

purduephotog

Explorer
Joined
Jan 14, 2013
Messages
73
I've been following this thread with both fear and trepidation. I've deployed RAIDs and lost them before, and I've always been able to recover from pretty much every failure.

I've never seen anything like this, and I was just about to deploy my own using nearly the same config (although I have been playing around with drives, etc., since I can). Today I ordered Backblaze and will run it in parallel with this solution. I will hold off deploying this at work for the labs until I can understand what went wrong... which might be a bit of too little too late.

Good luck and thank you for keeping this saga going. Recovery is critical... and seeing how it is done is just as important.
 

HHawk

Contributor
Joined
Jun 8, 2011
Messages
176
Seems I am never lucky. I downloaded the stupid live media CD for Oracle Solaris 11 and burned it, but every single time it reboots after the message "probing devices".
Bleh, I am really getting sick and tired of this crap. I have been spending a whole week on it already and nothing works. The only thing I have managed so far is decreasing the life expectancy of my hard drives. Which obviously does not matter, because after this experience I am never, ever, ever going to use crapzfs again.

I might as well have used RAID 0 with 6 disks; if that crashed I would have exactly the same result as now: nothing, nada...
I really do not understand how ZFS can be considered "safe" if I cannot even restore anything.

Since I am still nowhere and my wife is already asking me questions daily, I will tell her tomorrow that our wedding pictures are gone (and the rest as well). Probably I will be in a world of trouble, but this is also going nowhere.
With almost every other file system I have ever worked with, some part was always restorable.

And now, since nobody has responded anymore, I wanted to give Oracle Solaris a go, but apparently that doesn't work either. Nice...
 

HHawk

Contributor
Joined
Jun 8, 2011
Messages
176
Sorry for my previous post, but I didn't mean any offence, especially to all the people who provided help so far. Currently running a few older commands.

Maybe I will run:

Code:
zpool import -R /mnt -FX storage


Not much to lose now anyways... Right?
 

purduephotog

Explorer
Joined
Jan 14, 2013
Messages
73
I've had some bad luck with Solaris 11 and hardware restrictions. In theory, shouldn't you be able to move the card and drives to any computer?
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
HHawk said:
Seems I am never lucky. I downloaded the stupid live media cd for Oracle Solaris 11 and burned it. But every single time it reboots after the message "probing devices".
Some other illumos/Solaris-based distros to check out are listed under The illumos Family. I'd lean toward Nexenta and SmartOS, without knowing anything about them though.

HHawk said:
Not much to lose now anyways... Right?
It's likely possible to make things worse, but essentially you're right. What OS are you running this under? Make sure to let it finish before trying anything else.

This may be a stupid suggestion, and if I understood more about ZFS I might not make it, but I'd try it if it were my pool. Read the first post in the thread "Second drive failed while replacing another" (thread 11825): he was also seeing an I/O error on import. Keep in mind he literally had I/O errors, which you don't seem to have. Perhaps there is a bug in zpool import where, if it hits an error in that spot, it doesn't retry on a redundant disk. My suggestion:
  • Shut down and disconnect da0.
  • Boot up & run:
    Code:
    zpool import
    
    zpool import storage
  • Shut down, connect da0, and disconnect da1.
  • Boot up & run:
    Code:
    zpool import
    
    zpool import storage
  • Shut down, connect da1, and disconnect da2.
  • Boot up & run:
    Code:
    zpool import
    
    zpool import storage
  • Etc... One disk at a time for each disk.
Look for zpool import saying anything different, besides the rotating missing disk, and for any other error messages on the actual import. I would then repeat the procedure with every combination of 2 disconnected disks.
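paleoN's procedure is easy to lose track of across many reboots, so it can help to write the plan out first. This is a minimal sketch that only prints the plan, assuming six disks named da0 to da5 (the thread names da0 to da2 explicitly; the rest are assumed). With six disks there are 6 single-disk tests and C(6,2) = 15 two-disk tests, 21 in total; in principle a RAIDZ2 vdev should still be importable with any two members missing.

```shell
#!/bin/sh
# disconnect_plan DISK...
# Prints each test as "disconnect: <disks>": every single disk first,
# then every unordered pair (RAIDZ2 tolerates two missing members).
disconnect_plan() {
    # One disk at a time.
    for d in "$@"; do
        echo "disconnect: $d"
    done
    # Every unordered pair: pair each disk with the disks after it.
    while [ "$#" -gt 1 ]; do
        first="$1"
        shift
        for d in "$@"; do
            echo "disconnect: $first $d"
        done
    done
}

# After each boot with the listed disks unplugged, run:
#   zpool import
#   zpool import storage
disconnect_plan da0 da1 da2 da3 da4 da5
```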

The other -T import I want you to try is:
Code:
zpool import -T 735242 storage

time zdb -e -bcsvL storage
Both of those you can run with vfs.zfs.recover being 0, the default.

HHawk said:
Since I am still nowhere and my wife is already asking me questions daily, I will tell her tomorrow that our wedding pictures are gone (and the rest as well). Probably I will be in a world of trouble, but this is also going nowhere.
As far as I'm concerned, you could have told your wife the truth as you see it last week. In fact, you probably should tell her now. Optionally, you may want to tell her that ProtoSD and I both honestly believe at least some of your data is still accessible. However, it's quite complex and will take some time to put something together.



purduephotog said:
I will hold off deploying this at work for the labs until I can understand what went wrong...
Feel free to let the rest of us know. I sure as hell don't.
 

ProtoSD

MVP
Joined
Jul 1, 2011
Messages
3,348
It still bugs the crap out of me that it thinks there's a device missing. If that could be identified it might add some light to the situation.

Like I said last night, I think it's still possible to scrape stuff off, but I can barely spend 5 minutes before the pain is so bad I want to puke, so I don't want to make any promises I can't keep. It depends on how quickly you want to give up and reuse your disks. You've been a real sport about being patient AND listening to our advice, including trying a new controller. If you hadn't shown the cooperation you did, I don't think either of us would have spent the time we have trying to help. All of us feel bad about your loss. I'm not giving up on the script; I know there will be another chance to use it, but I just can't guess how long I need to have something ready to test.
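On FreeBSD, `glabel status` lists each `gptid/...` label next to the partition that carries it, so a label that zpool expects but that is absent from `glabel status` belongs to a disk the system cannot currently see. A minimal sketch, not from the thread: the hypothetical `gptid_to_provider` filter assumes `glabel status` prints Name, Status, and Components columns, which is the usual layout but worth checking on your system.

```shell
#!/bin/sh
# gptid_to_provider GPTID
# Reads `glabel status` output on stdin and prints the provider
# (e.g. da4p2) carrying label gptid/GPTID; prints nothing if absent.
gptid_to_provider() {
    awk -v want="gptid/$1" '$1 == want { print $3 }'
}

# Usage on the server (substitute the UUID part of the label you want):
#   glabel status | gptid_to_provider <uuid>
```

An empty result means no currently attached partition carries that label.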
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I'm with paleoN that you should try doing an import with each disk missing. That's what I was trying to recommend at the end of the troubleshooting steps I provided, but other people had a few ideas. Since you have a RAIDZ2, I'm curious what would happen if you removed 2 disks and tried a regular import command. If something is very, very wrong, it will tell you that you have insufficient replicas. It's very bizarre that the imports seem to think you are missing a disk (or does it think you are missing so many disks that redundancy can't even recover?). As I've said before, the most common causes of data loss based on forum posts are:

1. No UPS and a loss of power (or write cache enabled on a hardware RAID controller with no BBU and a loss of power)
2. RAM went bad
3. Improper shutdown/freezing/kernel panics (kinda similar to #1)
4. User error (for instance, pulling the wrong hard drive and losing all redundancy plus 1 more disk, or adding a single disk to a RAIDZ2 and having the single disk fail).

You don't appear to be #2-4, and I don't have a solid answer to provide for #1. Unfortunately, there is no evidence either way about whether the server was up, except that your router logs say it was down (how accurate those were... I don't know). I'm not sure exactly what your logs said about it being up or down, or how you came to that determination, but I do know that DHCP leases are only renewed every so often (it depends on your config). A typical lease time is 24 hours, so if the server was up the night before, the lease wouldn't have been near expiring, your router wouldn't consider it "offline", and the DHCP lease would still be valid.
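To put numbers on the lease argument: under RFC 2131's default timers, a DHCP client starts renewing at T1 = 50% of the lease time and rebinding at T2 = 87.5%, so a powered-on machine with a 24-hour lease refreshes it roughly every 12 hours and a lease entry alone says little about whether the box was up at a given moment. A small arithmetic sketch (the 24-hour lease is the "typical" figure above; your router's setting may differ, and some clients or servers override these defaults):

```shell
#!/bin/sh
# dhcp_timers LEASE_SECONDS
# Prints the RFC 2131 default renewal (T1) and rebinding (T2) times.
dhcp_timers() {
    lease="$1"
    t1=$((lease / 2))          # T1 = 0.5 * lease
    t2=$((lease * 7 / 8))      # T2 = 0.875 * lease
    echo "renew after ${t1}s, rebind after ${t2}s, expires after ${lease}s"
}

dhcp_timers 86400   # 24-hour lease
# prints: renew after 43200s, rebind after 75600s, expires after 86400s
```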

Not to discredit your investigation, but I still have a gut feeling that it was actually up during your loss of power. I find it far more likely that you lost power and that is the cause for the problems than a random hardware failure that we can't seem to find or a bug that nobody has found. Just the fact that you lost power on the same day you started problems is, in my opinion, a little more than coincidence.

So here's what I would do HHawk:

#1- Whatever server you build next, be it FreeNAS, Windows, Linux, whatever... BUY A UPS! Every server OS manual highly recommends a UPS. This goes back to before I was born and just makes sense. Any contract you do with any company to provide a server will always include a UPS. There's a reason for that.
#2- protoSD has some stuff going on, so I wouldn't be logging into the forums every morning hoping he posted his tool to help you.
#3- Start doing backups of whatever data is ultra-important to you. At this point I have to point out that if I told you that for $1500 I could guarantee you that your data would be completely recovered you'd probably start saving your money right now. Unfortunately, for less than $1500 you could have probably built a whole second FreeNAS server and have a good solid backup. This issue would have been as simple as a restore-from-backup scenario.
#4- Talk to your wife. Be honest with her. Tell her all the stuff we've gone through and tell her there is still a chance to get your pictures and such back. The two of you should make a choice at this point:
-Set the 6 drives you have aside and wait for further guidance from ProtoSD's potential recovery tool. Keep in mind that whatever storage space you buy, you definitely won't need as much, since you have very little "history" in the computing world. (This is what I would do. Obviously this means buying new drives, though.)
-Set up your server again how you want it, regardless of what OS and file system you choose, and reuse the 6 disks.

I'm thinking(and hoping) your wife will probably opt for setting the drives aside. I think that's a smart choice and completely reasonable. Think of it like this; if you just make a new zpool you will kick yourself for the rest of your life when you realize how much stuff you lost. It's far from fun and something you'll never feel you recovered from. I lost everything I had when I was 20 years old. I had only 1 copy and I was trying to make a backup to 3 CDs. When my friend took the drive home to make my backup...he dropped it. It fell off the kitchen counter to the floor and the biznatch never worked again. The drive had the click of death. Everything was gone. :( At that time data recovery was a 5-figure cost and I didn't have that kind of money.

DO BACKUPS! Your data is only worth as much as you are willing to pay to ensure a good backup. Even a very low-powered Atom system with ZFS snapshot/replication can be a life saver. If I needed to build a backup server, I'd get one of those Atoms that supports 8GB of RAM and do ZFS snapshots every week. They're relatively cheap to build, very low-powered, and they can be shoved in the corner (or under a bed) until they are no longer useful.
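A weekly snapshot-and-replicate cycle like the one described above comes down to `zfs snapshot` followed by `zfs send | zfs receive`. The sketch below only prints the commands; `backup_cmds`, the host `backuphost`, the target dataset `backup/storage`, and the snapshot names are all hypothetical, and it assumes an initial full send has already been received so each week only sends the difference.

```shell
#!/bin/sh
# backup_cmds POOL PREV_SNAP NEW_SNAP HOST TARGET
# Prints (does not run) one weekly cycle: take a recursive snapshot, then
# send an incremental replication stream from PREV_SNAP to NEW_SNAP.
backup_cmds() {
    pool="$1" prev="$2" new="$3" host="$4" target="$5"
    echo "zfs snapshot -r ${pool}@${new}"
    # -R includes child datasets and properties; -i sends only changes
    # since PREV_SNAP. receive -d maps POOL/child onto TARGET/child, and
    # -F rolls TARGET back to its latest snapshot before receiving.
    echo "zfs send -R -i ${pool}@${prev} ${pool}@${new} | ssh ${host} zfs receive -d -F ${target}"
}

backup_cmds storage weekly-1 weekly-2 backuphost backup/storage
```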

When it comes to a single fundamental truth about IT admins...

There are 2 kinds of people in this world:
- Those that do backups.
- Those that haven't ever lost irreplaceable data.
 

HHawk

Contributor
Joined
Jun 8, 2011
Messages
176
I am back. I tried a few other distros, with no luck at all: same problems, or they didn't work at all.

Now I am running / trying the commands with 1 disk disconnected (at a time).

The first thing I noticed just now, which I didn't have before, was the following.

Please note that the part below is the same as or similar to before:

Code:
root@mfsbsd:/root # zpool import
   pool: storage
     id: 17472259698871586545
  state: DEGRADED
 status: One or more devices are missing from the system.
 action: The pool can be imported despite missing or damaged devices.  The
        fault tolerance of the pool may be compromised if imported.
   see: http://illumos.org/msg/ZFS-8000-2Q
 config:

        storage                                         DEGRADED
          raidz2-0                                      DEGRADED
            gptid/19177fb9-25fa-11e2-9ab0-00151736994a  ONLINE
            gptid/19b5ec3a-25fa-11e2-9ab0-00151736994a  ONLINE
            gptid/3dc2f956-3de6-11e2-8af1-00151736994a  ONLINE
            gptid/1aefa3e9-25fa-11e2-9ab0-00151736994a  ONLINE
            5393521929904432319                         UNAVAIL  cannot open
            gptid/1c2d6a74-25fa-11e2-9ab0-00151736994a  ONLINE
root@mfsbsd:/root # zpool import storage
cannot import 'storage': I/O error
        Destroy and re-create the pool from
        a backup source.


However what is different is this:

Code:
root@mfsbsd:/root # zpool import -T 735242 storage
Pool storage returned to its state as of Fri Mar 15 02:03:31 2013.
root@mfsbsd:/root # 


I never received this message before: Pool storage returned to its state as of Fri Mar 15 02:03:31 2013.

And if I do zpool status, I get this now:

Code:
root@mfsbsd:/root # zpool status
  pool: storage
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 3h50m with 0 errors on Sun Feb 24 03:13:57 2013
config:

        NAME                                            STATE     READ WRITE CKSUM
        storage                                         DEGRADED     0     0     2
          raidz2-0                                      DEGRADED     0     0     4
            gptid/19177fb9-25fa-11e2-9ab0-00151736994a  ONLINE       0     0     0
            gptid/19b5ec3a-25fa-11e2-9ab0-00151736994a  ONLINE       0     0     0
            gptid/3dc2f956-3de6-11e2-8af1-00151736994a  ONLINE       0     0     0
            gptid/1aefa3e9-25fa-11e2-9ab0-00151736994a  ONLINE       0     0     0
            5393521929904432319                         UNAVAIL      0     0     0  was /dev/gptid/1b8f2b64-25fa-11e2-9ab0-00151736994a
            gptid/1c2d6a74-25fa-11e2-9ab0-00151736994a  ONLINE       0     0     0

errors: 1 data errors, use '-v' for a list


Using -v results in this:

Code:
errors: Permanent errors have been detected in the following files:

        /rw/storage/Jail/plugins/var/log/messages


Well, since it's something different from before, I am getting a little hope again now...

What should I do next? I will wait for advice.
 

HHawk

Contributor
Joined
Jun 8, 2011
Messages
176
WTF!!

I can see my storage again!!!

PLEASE ADVISE WHAT NOW?!

(sorry about the caps, I am a little excited)


//update

I am now browsing the files through SSH Secure File Transfer tool
 

HHawk

Contributor
Joined
Jun 8, 2011
Messages
176
Apparently most / everything is now still intact...!

Great... I am currently very happy, however before I do something (stupid), what should I do now...? Can the ZFS be repaired and get it back working again...? So I don't even have to lose my FreeNAS settings etc...?

Of course it would be best to back it up now, but I do not have the spare space or hard disks to transfer everything. But since I can access it now, that means it can also be fixed, right?

Please provide me with some solutions or what I should do next... Thanks...
 

purduephotog

Explorer
Joined
Jan 14, 2013
Messages
73
HHawk said:
WTF!!

I can see my storage again!!!

PLEASE ADVISE WHAT NOW?!

(sorry about the caps, I am a little excited)

//update

I am now browsing the files through SSH Secure File Transfer tool

I'd say back them up to a USB drive, personally. :smile: Edit: I missed your post about not having any additional drives.

Personally, I'd find the first electronics store that is open, buy the biggest drives I could afford, and start copying like crazy. Barring that, I'd call any and all of my friends and bum whatever external USB drives they had available. I'd explain that it is a matter of marital harmony...
 

HHawk

Contributor
Joined
Jun 8, 2011
Messages
176
Yeah, well, it's Sunday here and nothing is open. And none of my friends are computer "geeks" or anything... So that's kind of hard.

And now I am scared of a power outage. I will back up the most important stuff first, though. However, since everything is accessible, I am still wondering what needs to be done to get it working properly again, as before, with all the current settings.
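For copying the most important folders off first, plain `cp -Rp` in priority order is enough, and it keeps going past unreadable files instead of abandoning the whole copy. A minimal sketch, not from the thread: `copy_in_order` is hypothetical, and the example paths (the pool's data under /mnt/storage, a USB disk mounted at /mnt/usb) are assumptions to adjust to your own mounts.

```shell
#!/bin/sh
# copy_in_order DEST DIR...
# Copies each DIR into DEST in the order given (most important first),
# preserving modes and times. A failure in one DIR is reported and the
# next DIR is still attempted; returns 1 if any copy failed.
copy_in_order() {
    dest="$1"
    shift
    status=0
    for dir in "$@"; do
        if cp -Rp "$dir" "$dest/"; then
            echo "copied: $dir"
        else
            echo "FAILED (possibly partial): $dir" >&2
            status=1
        fi
    done
    return "$status"
}

# Example (hypothetical paths):
#   copy_in_order /mnt/usb /mnt/storage/Photos /mnt/storage/Documents
```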
 