Please help! Started up FreeNAS, suddenly volume storage (ZFS) status unknown?!

Status
Not open for further replies.

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
For me, leaving a drive out of the pool is definitely not an option.
The more drives the merrier.

I'm no expert but wouldn't it be easier to rip apart your new water cooled desktop than tell your wife you lost the wedding pictures?
Damn +1. Inconvenient or not just eliminating the controller is worth it.
 

ProtoSD

MVP
Joined
Jul 1, 2011
Messages
3,348
Hhawk,

I'm playing around with a *possible* method for retrieving files. It's complicated and I'm not sure if it is practical or reliable.

Do you have any idea how full your pool was?
Did you have any datasets?
Did you have any snapshots?

I don't want to give you any false hope, but IF this was possible, it would mean you'd need enough space to copy the recovered files to, and it could take a VERY long time (probably a week, possibly more). I'm almost afraid to even mention this because I don't completely understand it, and it will probably take me as long to figure it out as it will to copy stuff off if it's even possible.

I might be able to post a script here before I sleep that can make a list of files from your pool, but it's VERY VERY slow. It would help to know how your disk was structured (datasets, how many top level directories, approximate size of all your data, lots of big files?, primary types of files). It's been a while since I've done any coding from scratch, so that may be another problem.

Like I said, I'm not sure I can do it. I think it's possible with enough time and if your pool isn't corrupt. There are a LOT of IFs, but if a new controller doesn't help, this might be one more thing to try.
 

HHawk

Contributor
Joined
Jun 8, 2011
Messages
176
I'm no expert but wouldn't it be easier to rip apart your new water cooled desktop than tell your wife you lost the wedding pictures? Most new higher end motherboards have 6 SATA ports, so why not yank out the hard drives from your desktop and stick in the FreeNAS drives along with the USB stick?

No it would not be easier, because not only would it mean that I would have to take out the radiator, fans, reservoir, tubing, etc., but it would also mean I would have to run the desktop PC without CPU cooling, because I had to remove the watercooling. And I don't have an (air) CPU cooler. Intel 3930k processors come without a default CPU cooler.

My best bet would be to get some kind of cheap SATA controller. Like 2x Vantec UGT-ST310R (with Silicon Image Sil3114 SATA controller chip) which has 4 SATA internal ports. However, this could take several days. Another solution would be borrowing one from my work, however I think we only have PCI-X controller cards here, which are incompatible with my system (only 2x PCI slots and the rest are PCIe slots).

And I still doubt it's the onboard SATA controller. The motherboard is a Gigabyte GA-880GA-UD3H (rev 3.1) with AMD SB850 Southbridge. The SB also handles the SATA connections. And in the past few years (16+) I have never seen a broken onboard SATA controller (and no, I am not saying it's impossible, just stating I never saw it happen, only DOA). Also, if it had problems, and since the SATA controller is part of the SB chipset, I would have noticed problems earlier since the SB handles more than only the SATA controller.

Inconvenient or not just eliminating the controller is worth it.

Anyways, I was thinking; you guys mentioned I need to test another controller, but what if I switch to RAID mode from the BIOS? It would be using the RAID controller chip instead, right? And not the regular SATA controller.
I don't know if this would work. But it would use a different connection towards the harddisks.

What do you think?




Hhawk,

I'm playing around with a *possible* method for retrieving files. It's complicated and I'm not sure if it is practical or reliable.

Do you have any idea how full your pool was?
Did you have any datasets?
Did you have any snapshots?

I don't want to give you any false hope, but IF this was possible, it would mean you'd need enough space to copy the recovered files to, and it could take a VERY long time (probably a week, possibly more). I'm almost afraid to even mention this because I don't completely understand it, and it will probably take me as long to figure it out as it will to copy stuff off if it's even possible.

I might be able to post a script here before I sleep that can make a list of files from your pool, but it's VERY VERY slow. It would help to know how your disk was structured (datasets, how many top level directories, approximate size of all your data, lots of big files?, primary types of files). It's been a while since I've done any coding from scratch, so that may be another problem.

Like I said, I'm not sure I can do it. I think it's possible with enough time and if your pool isn't corrupt. There are a LOT of IFs, but if a new controller doesn't help, this might be one more thing to try.

Okay so I am gonna jog my memory.

I was running raidz2 with 6 disks. I don't remember the total space available, but it would have been around 7 TB, right?
I think I used around 2.5 TB space with a maximum of 3 TB (probably less, but for sure not over 3 TB).
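As a rough check of that estimate, the usable space of a raidz2 vdev is approximately (number of disks minus 2) times the size of one disk. The sketch below assumes 2 TB drives, which is a guess since the drive size is not stated in the thread, and it ignores metadata and reserved space, so real figures will be somewhat lower. The function name is made up for illustration.

```shell
#!/bin/sh
# Rough usable capacity of a RAID-Z vdev: (disks - parity) * disk_size.
# Ignores metadata, padding and reserved space, so real numbers are lower.
raidz_usable_tb() {
    disks=$1; parity=$2; size_tb=$3
    echo $(( (disks - parity) * size_tb ))
}

# ASSUMPTION: 2 TB drives; the thread does not state the drive size.
usable=$(raidz_usable_tb 6 2 2)
echo "approx. usable: ${usable} TB"

# Convert decimal TB to binary TiB, which is what most tools report.
echo "$usable" | awk '{ printf "approx. %.2f TiB\n", $1 * 1e12 / (1024 ^ 4) }'
```

With those assumed numbers the decimal figure is 8 TB, which most tools would show as a bit over 7 TiB, so "around 7 TB" is plausible.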

I don't think I have any datasets or snapshots at all. Because apparently it wasn't important as mentioned here?

So in regards to your questions, I think it would be safe to say I have enough space to copy the recovered files to, right? If the above calculations were right. I would have at least 4 TB of free space left.

Top level directories; I had the basic FreeNAS and plugin /jail directories, along with 4 or 5 other directories. One of them was called _Downloads where all movies, series and regular Sabnzbd downloads are located. Other top level directories were ISO's, Multimedia and maybe 2 or 3 others. If I recall everything correctly.

Approximate size all your data; as mentioned above around 2.5 TB with a maximum of 3 TB not more (100% sure on this).

Lots of big files; I think the most files would have been big. My guess would be 60% over 1.2 GB. 20% over 4 GB and the rest (20% or less) would be smaller (as in MP3's, photo's, documents).

Primary types of files; .mkv, .iso, .mp3, .jpg (most used). Others would be a bunch of .txt, .doc, .rar (r00, etc) and maybe even some .exe files.

I don't know how much data you were expecting, but probably more than 2.5 TB / 3 TB, right? So maybe we are only speaking of several days?

I hope this answers your questions. If you have any other questions, let me know.

In the meantime I am going to try to find a controller for testing, however that could also take some time.
 

HHawk

Contributor
Joined
Jun 8, 2011
Messages
176
If I cannot find a suitable SATA controller card at my work, wouldn't this be an option:

2x SATA 150 RAID PCI card, 4-port

Specifications:
Code:
4 x internal Serial ATA connectors, 2 x external eSATA connectors (selectable)
Chipset: Sil 3114
Compliant with SATA, revision 1.0
Stand-alone PCI to Serial ATA host controller
Serial ATA generation 1, data transfer rate up to 150MB/sec
Supports the following Raid levels: 0 = Disk Striping, 1 = Disk Mirroring, 5= Parity Raid, 1+0 = Combination of disk striping and disk mirroring
Four independent Serial ATA channels, each of them supporting one connection of a Serial ATA device at the same time
Supports hard drives larger than 137GB
Compliant with PCI IDE Controller specifications, revision 2.3, 32-bit, 33/66Mhz


I did some searching and it appears that the chipset (Sil 3114) is supported in FreeNAS 7 and 8, but I do not know if 2 of these cards would work in my PC?
In that case I have a cheap solution to test if the problem is the onboard controller or not. And not only cheap, I can have it within 1 working day as well...

//update

Well I searched a bit more, but apparently only 1 card is being detected if 2 similar cards are installed?
I found the information here.
 

ProtoSD

MVP
Joined
Jul 1, 2011
Messages
3,348
My best bet would be to get some kind of cheap SATA controller. Like 2x Vantec UGT-ST310R (with Silicon Image Sil3114 SATA controller chip) which has 4 SATA internal ports.

I think using anything that doesn't include all 6 disks is a bad idea.

And I still doubt it's the onboard SATA controller.

You're probably right, but unless you are ready to give up on your data it's better to be certain. It could be some bug with the firmware, or something that just doesn't happen unless it's under load. It seems like the import goes on for some time and then you get some I/O error. Maybe another controller will handle things just differently enough to make it work. I don't know how it works with the computer stores where you live, and I don't think any of us really like doing this, but buying a controller to test this theory and then returning it before the end of the return period is a possible option.

Anyways, I was thinking; you guys mentioned I need to test another controller, but what if I switch to RAID mode from the BIOS? It would be using the RAID controller chip instead, right? And not the regular SATA controller.
I don't know if this would work. But it would use a different connection towards the harddisks.

What do you think?

No, this won't allow ZFS to access the disks the proper way.





So in regards to your questions, I think it would be safe to say I have enough space to copy the recovered files to, right? If the above calculations were right. I would have at least 4 TB of free space left.

You won't be able to copy any data to the disks that we are reading from, and like I said, I think it's a bad idea to try to recover anything without all of the disks belonging to the pool connected at the same time.

The reason I asked about the file types/sizes, was so I could get an idea of the type of files to test the method I'm trying to use. The bigger files may not be so easy if there are multiple links to follow to gather all of the pieces. I really don't know how long it will take, but just parsing the directory and files is slow. To really do things efficiently would probably mean looking at the zdb source code and turning it into a completely different program. Being that you're almost at the point where there are no other options, I figure you won't mind trying it, but I'm not a professional developer and it could take me a lot longer than you want to wait and there's no guarantee I can do it.

I have an appointment too damn early this morning to have my heater repaired and I'm running out of time to post the script to only list the directories/files. As you probably noticed, the output from zdb can be pretty long and tricky to parse when I'm not sure myself what I need to collect ;)

Having lost stuff in the past, I realize having a list of the stuff can be helpful even though it can make the pain of knowing what you lost worse....
 

HHawk

Contributor
Joined
Jun 8, 2011
Messages
176
I think using anything that doesn't include all 6 disks is a bad idea.

That's why I said 2x (two times) that card. But I really do not know if both cards will be recognised from within FreeNAS / FreeBSD...

Anyways, what about this controller: IBM Express ServeRAID M1015 SAS/SATA Controller
It has enough ports and is affordable?


You're probably right, but unless you are ready to give up on your data it's better to be certain. It could be some bug with the firmware, or something that just doesn't happen unless it's under load. It seems like the import goes on for some time and then you get some I/O error. Maybe another controller will handle things just differently enough to make it work. I don't know how it works with the computer stores where you live, and I don't think any of us really like doing this, but buying a controller to test this theory and then returning it before the end of the return period is a possible option.

Well you convinced me, I am going with the IBM Express ServeRAID M1015 SAS/SATA Controller.
...I can always send it back afterwards.


No, this won't allow ZFS to access the disks the proper way.

Okay.


You won't be able to copy any data to the disks that we are reading from, and like I said, I think it's a bad idea to try to recover anything without all of the disks belonging to the pool connected at the same time.

The reason I asked about the file types/sizes, was so I could get an idea of the type of files to test the method I'm trying to use. The bigger files may not be so easy if there are multiple links to follow to gather all of the pieces. I really don't know how long it will take, but just parsing the directory and files is slow. To really do things efficiently would probably mean looking at the zdb source code and turning it into a completely different program. Being that you're almost at the point where there are no other options, I figure you won't mind trying it, but I'm not a professional developer and it could take me a lot longer than you want to wait and there's no guarantee I can do it.

I have an appointment too damn early this morning to have my heater repaired and I'm running out of time to post the script to only list the directories/files. As you probably noticed, the output from zdb can be pretty long and tricky to parse when I'm not sure myself what I need to collect ;)

Having lost stuff in the past, I realize having a list of the stuff can be helpful even though it can make the pain of knowing what you lost worse....

So in general I will "always" need an extra controller (like the IBM Express ServeRAID M1015 SAS/SATA Controller), right?
Otherwise how will I be able to "recover" all files to another harddisk. I guess I will need two (2) harddisks of 2 TB at least, right?

No worries. I will order the controller (IBM Express ServeRAID M1015 SAS/SATA Controller) first. And test it as soon as it arrives (in 2 or 3 working days).
 

HHawk

Contributor
Joined
Jun 8, 2011
Messages
176
Well I just placed an order for the following:

1x IBM Express ServeRAID M1015 SAS/SATA Controller
2x CBL-SFF8087OCF-10M (SFF-8087) Serial ATA breakout cable, forward

Should arrive within a few days (hopefully within 2 working days).


//sidenote

When I receive the SAS/SATA controller card, which commands do I run?
Also I have a question; does it matter to which SATA connector I connect the harddisk to? Or doesn't this matter at all?

I compiled a list of commands I might want / have to run as soon as I have the new controller card.

Code:
zpool import



Code:
zpool import storage



Code:
mount -uw /
followed by:
zpool import storage



Code:
zpool import -nfF storage



Code:
zpool import -fF storage



Code:
zpool import -X -fF storage



Code:
zpool import -f -R /mnt -o rdonly=on storage



All commands will be run from mfsBSD as recommended.


And if that doesn't do anything or give back any positive results I will do the following:

Reboot mfsBSD
Hit <2>
And enter the following:
Code:
set vfs.zfs.recover=1
set vfs.zfs.debug=1
set aok=1
boot -s


And then run the following command:
Code:
zdb -e -bcsvL storage


If that doesn't do anything I will try (from the same prompt) the following:

Code:
zpool import -R /mnt storage


If that fails, I will try:

Code:
zpool import -FR /mnt storage


And if that fails also, I will try:

Code:
zpool import -XFR /mnt storage


The (last) above information was found on Freenas forums here.


And if that fails also, I always could try the following with a Solaris 11 live CD:

Code:
1. Boot to Solaris 11 live CD. 
2. Change to the root user. 
3. Change the name of the live image to match the hostname and domain of the failed server environment. 
4. Set zfs:zfs_recover=1
5. Set aok=1
6. Added the same configuration to the /etc/system file. 
7. Verified the pool was online by typing zpool import {Pool was online}
8. Just for grins, try various combinations of zpool import {fFXR} /mnt but all kernel panic'ed. 
9. Ran the following command as specified by this website: http://sigtar.com/2009/10/19/opensol...-kernel-panic/

 -----> zdb -e -bcsvL <poolname>

NOTE: This error-ed out with an out of memory error after some time had expired. Initially 15 minutes. But I ran this same command at least 3-4 times more and each time, the out of memory error occurred later. The last run was around an hour before experiencing the error. 

10. Retried the import command again {again for grins}. The command used: zpool import -rfFXR /mnt <poolname>. After a few minutes, it mounted. There are a few virtual environments which appear to have corrupted data but compared to what I retrieved, it paled in comparison. 
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
I compiled a list of commands I might want / have to run as soon as I have the new controller card.
We are waiting until the controller card shows up, yes? One of my thoughts was a surge might have damaged the motherboard during the recent power outage.

In fact you might want to consider buying 6 more disks and doing a block copy of the current disks to them, creating a safe working copy.
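A block copy like that could be sketched with dd, one command per disk. The snippet below only prints the commands so they can be checked first. The device names are assumptions (da0..da5 as the pool disks, da6..da11 as the new empty disks) and must be confirmed before anything is run, because swapping source and target would overwrite the pool disks.

```shell
#!/bin/sh
# Print (do not run) one dd command per source/target pair.
# ASSUMPTION: da0..da5 are the pool disks and da6..da11 the new, empty disks.
# Confirm the real names with `camcontrol devlist` before running anything.
clone_cmd() {
    src=$1; dst=$2
    # noerror,sync keeps going on read errors and pads the unreadable blocks
    echo "dd if=/dev/$src of=/dev/$dst bs=1m conv=noerror,sync"
}

i=0
while [ "$i" -lt 6 ]; do
    clone_cmd "da$i" "da$((i + 6))"
    i=$((i + 1))
done
```

Printing first and copying the lines by hand is deliberate: a typo in `of=` is not recoverable.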

First, flash the controller to IT mode.

Also I have a question; does it matter to which SATA connector I connect the harddisk to? Or doesn't this matter at all?
No it doesn't matter.

All commands will be run from mfsBSD as recommended.
Yes, from mfsBSD. I suggest this exact sequence:
  1. Boot mfsBSD
  2. Hit <2> and enter the following:
    Code:
    set vfs.zfs.recover=1
    set vfs.zfs.debug=1
    boot
    
  3. Add some swap first.
    Code:
    swapctl -a /dev/ada0p1 /dev/ada1p1 /dev/ada2p1
  4. Collect all of the uberblock information, this time using this modified command:
    Code:
    zdb -luuu /dev/adaXp2
    Replacing the X for each disk, save and copy off as before.
  5. Code:
    zpool import
    If anything looks different then stop and report back.
  6. Code:
    zpool import -o rdonly=on storage
    
    zpool import storage
  7. Code:
    zpool import -nfF storage
  8. Code:
    time zpool import -fF storage
  9. Code:
    time zpool import -fFX storage
  10. Yes, twice if the above fails. The time of the runs can be used as a very poor indicator to see if it errors out in a different place.
    Code:
    time zpool import -fFX storage
I would suggest reporting back at that point regardless. Be sure to make careful note of any and all error messages. As in write them down or take pictures or something.
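One way to keep a record of every message while working through the import steps is a small wrapper that logs each command and its output and stops at the first success. This is only a sketch: the log path, the DRY_RUN switch and the function name are made up for illustration, it is written for sh rather than the csh root shell, and the `-n` check from step 7 is left out because it does not import anything.

```shell
#!/bin/sh
# Sketch: try each import in order, log all output, stop at the first success.
# ASSUMPTIONS: LOG path and DRY_RUN switch are illustrative, not from the thread.
LOG=${LOG:-/tmp/import-attempts.log}
DRY_RUN=${DRY_RUN:-1}          # 1 = only print what would be run

attempt() {
    echo "### $(date '+%F %T') $*" >> "$LOG"
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
        return 1               # report "not imported" so every step is listed
    fi
    out=$("$@" 2>&1); status=$?
    printf '%s\n' "$out" | tee -a "$LOG"
    return "$status"
}

attempt zpool import -o rdonly=on storage ||
attempt zpool import storage ||
attempt zpool import -fF storage ||
attempt zpool import -fFX storage ||
echo "nothing imported (or dry run); log is in $LOG"
```

Run it once as is to review the list, then with `DRY_RUN=0` to execute. On a memory-based system like mfsBSD the log disappears at reboot, so copy it off first.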

If all that fails, a few runs of the following would likely be next:
Code:
time zdb -e -bcsvL storage
Record the exact error messages.

Comments on the procedure/ordering are welcome. I don't see much difference with normal zpool import vs one with vfs.zfs.recover=1 which would be the next step anyway, yes?
 

HHawk

Contributor
Joined
Jun 8, 2011
Messages
176
Yes I am currently awaiting the controller, but it seems it's a slow company, but I am hoping I will receive it before the weekend.
...and yes, I already got the necessary files for flashing it to IT mode. ;)
 

HHawk

Contributor
Joined
Jun 8, 2011
Messages
176
Small update; I expect the controller to arrive today and will install it later this evening or tomorrow. So hopefully *crosses fingers* it will fix this mess. However I still doubt it's the onboard controller, but I will be very happy if I am wrong about this.

There is also some bad news; the wife was asking why my server was down the past few days, she needed some pictures... I had to lie about it and told her some excuse about needing to upgrade it before I could put it back online. Didn't want to be the bringer of bad news just yet. :(
 

ProtoSD

MVP
Joined
Jul 1, 2011
Messages
3,348
Your patience and willingness to let us help and not freak out and do stuff without checking will make your chances of recovery a lot better.

PaleoN is pretty sure he can make it work, and I'm still working on my script also. It's proving to be a little challenging because my programming skills are rusty and I'm trying some stuff I haven't done before. I've also got some personal stuff I'm having trouble with that isn't helping. If PaleoN's ideas don't work, I still think my method has a good chance, its just going a little slowly. :(
 

HHawk

Contributor
Joined
Jun 8, 2011
Messages
176
Your patience and willingness to let us help and not freak out and do stuff without checking will make your chances of recovery a lot better.

PaleoN is pretty sure he can make it work, and I'm still working on my script also. It's proving to be a little challenging because my programming skills are rusty and I'm trying some stuff I haven't done before. I've also got some personal stuff I'm having trouble with that isn't helping. If PaleoN's ideas don't work, I still think my method has a good chance, its just going a little slowly. :(

Well so far I am not complaining about the amount of help! You guys already earned my gratitude so far.
Normally I do things more quickly, HOWEVER that is only with things I have knowledge about and FreeBSD / FreeNAS / ZFS are not such things. And since it's a fair amount of (important) data, it's better to be safe, rather than sorry...

Well at least you can program. The only thing I can program is the alarm on my clock. Haha.
Man, it sucks when you are having trouble with personal stuff. Hopefully it will get resolved soon for you...

//sidenote;

The card has arrived, I am gonna let it warm up a bit (cold outside, so package and contents are not at room temperature) and then I will put it in my PC at work and flash the card.
 

HHawk

Contributor
Joined
Jun 8, 2011
Messages
176
Installed it... And at first everything seems the same... Same errors (I/O error. Destroy and re-create...blablabla).

I did encounter one problem:

Code:
root@mfsbsd:/root # swapctl -a /dev/ada0p1 /dev/ada1p1 /dev/ada2p1
swapctl: /dev/ada0p1: No such file or directory
swapctl: /dev/ada1p1: No such file or directory
swapctl: /dev/ada2p1: No such file or directory


Because of this I skipped the following:
Code:
zdb -luuu /dev/adaXp2


Then I did the following:
Code:
root@mfsbsd:/root # zpool import
   pool: storage
     id: 17472259698871586545
  state: ONLINE
 action: The pool can be imported using its name or numeric identifier.
 config:

        storage                                         ONLINE
          raidz2-0                                      ONLINE
            gptid/19177fb9-25fa-11e2-9ab0-00151736994a  ONLINE
            gptid/19b5ec3a-25fa-11e2-9ab0-00151736994a  ONLINE
            gptid/3dc2f956-3de6-11e2-8af1-00151736994a  ONLINE
            gptid/1aefa3e9-25fa-11e2-9ab0-00151736994a  ONLINE
            gptid/1b8f2b64-25fa-11e2-9ab0-00151736994a  ONLINE
            gptid/1c2d6a74-25fa-11e2-9ab0-00151736994a  ONLINE
root@mfsbsd:/root # zpool import 17472259698871586545
cannot import 'storage': I/O error
        Destroy and re-create the pool from
        a backup source.
root@mfsbsd:/root # zpool import -o rdonly=on storage
cannot import 'storage': I/O error
        Destroy and re-create the pool from
        a backup source.
root@mfsbsd:/root # zpool import storage
cannot import 'storage': I/O error
        Destroy and re-create the pool from
        a backup source.
root@mfsbsd:/root # zpool import -nfF storage
(this resulted in nothing, also happened before)


Then with the following command:

Code:
root@mfsbsd:/root # time zpool import -fF storage


Something was different compared to before!
Because this is what I got back:

Code:
cannot import 'storage': I/O error
        Destroy and re-create the pool from
        a backup source.
0.000u 0.103s 0:01.44 6.9%      114+2524k 321+0io 0pf+0w


The top is similar, HOWEVER the bottom part (in bold here) is something new?!

I re-ran that command and it was different:

Code:
root@mfsbsd:/root # time zpool import -fF storage
cannot import 'storage': I/O error
        Destroy and re-create the pool from
        a backup source.
0.000u 0.102s 0:01.42 7.0%      79+1882k 321+0io 0pf+0w


The bottom part changed again.

So since this is different, I will wait. Hopefully it will mean something positive...? *crossing fingers*

//update

Bah... I feel stupid, I now noticed there was the command "time" in front of it, which I hadn't used before. So false alarm. Bah... :(

Anyways now running:

Code:
time zdb -e -bcsvL storage


So this will take a while...
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
So since this is different, I will wait. Hopefully it will mean something positive...? *crossing fingers*
No. That's the output from the time command.
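For reference, that summary line comes from the csh/tcsh built-in `time`, whose default format lists user CPU seconds, system CPU seconds, elapsed wall-clock time, CPU percentage, average memory use, I/O operations, and page faults plus swaps. If the elapsed times of several runs are to be compared, a small helper can pull out the third field and convert it to seconds. This is a sketch with a made-up function name, written for sh and awk.

```shell
#!/bin/sh
# Extract the elapsed wall-clock time (third field) from a csh `time` summary
# line and convert [h:]m:ss.cc to seconds.
elapsed_seconds() {
    echo "$1" | awk '{
        n = split($3, t, ":")
        s = 0
        for (i = 1; i <= n; i++) s = s * 60 + t[i]
        printf "%.2f\n", s
    }'
}

# The two summary lines posted above:
elapsed_seconds "0.000u 0.103s 0:01.44 6.9%      114+2524k 321+0io 0pf+0w"
elapsed_seconds "0.000u 0.102s 0:01.42 7.0%      79+1882k 321+0io 0pf+0w"
```

Both runs come out at roughly a second and a half, which suggests the import fails at about the same point each time.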

I did encounter one problem:
The disks are likely daX now. Run an updated:
Code:
camcontrol devlist

glabel status
Replace adaX with daX for each disk.
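With the renamed devices, the per-disk label dump can be generated in a loop instead of typed six times. The sketch below only prints the commands. It assumes the disks show up as da0..da5 with ZFS on partition 2 and uses /tmp as a hypothetical output directory; check the real names against the `camcontrol devlist` and `glabel status` output first.

```shell
#!/bin/sh
# Print the label/uberblock dump command for each pool disk.
# ASSUMPTIONS: disks are da0..da5, ZFS lives on partition 2, and /tmp is the
# output directory; none of these are confirmed in the thread.
PREFIX=${PREFIX:-da}

label_cmds() {
    i=0
    while [ "$i" -lt "$1" ]; do
        echo "zdb -luuu /dev/${PREFIX}${i}p2 > /tmp/${PREFIX}${i}p2-labels.txt"
        i=$((i + 1))
    done
}

label_cmds 6
# After reviewing the list, it can be executed with:
#   label_cmds 6 | sh
```

Setting `PREFIX=ada` produces the same list for disks on the onboard controller.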

Also, run these before trying the -X import:
Code:
zdb -e -uuuv storage

zdb -e -h storage
 

ProtoSD

MVP
Joined
Jul 1, 2011
Messages
3,348
Code:
root@mfsbsd:/root # swapctl -a /dev/ada0p1 /dev/ada1p1 /dev/ada2p1
swapctl: /dev/ada0p1: No such file or directory
swapctl: /dev/ada1p1: No such file or directory
swapctl: /dev/ada2p1: No such file or directory


Because of this I skipped the following:
Code:
zdb -luuu /dev/adaXp2

You can still do the above command. If you look closely, it has "p2" and not "p1" like the command above it.

I'm just waking up, I'll wait for PaleoN because he has some other things he wants to try.
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
Anyways now running:

Code:
time zdb -e -bcsvL storage


So this will take a while...
Not in the proper order, but just let it run now.
 