Please help, broken RAIDZ1 pool after MB failure

Status
Not open for further replies.

matt502san

Cadet
Joined
Sep 10, 2016
Messages
7
Please help, I've made a right mess of my RAIDZ1 pool. I'm sure it was my noob status that broke it, by running premature commands without fully understanding what I was doing, but if you could help me recover my data that would be great. I hate asking and don't want to waste anybody's time, but I've spent hours reading through info and I don't want to break it any more than I already have. I did spend time reading through what I thought was relevant info before I did anything, but I'm just making it worse and worse.


The story.
The power supply blew. It took my motherboard with it, and I suspect one HDD as well.
I installed all the HDDs into another machine. The OS is on USB, so I moved that to the new machine as well.
Everything came online OK, but the RAID was degraded because one HDD was unavailable.
I couldn't work out which drive had failed, so I sequentially pulled SATA cables to try to identify the faulty drive. Obviously, this made the whole RAID unavailable. The research I was reading at the time told me to offline/detach the pool from the GUI, then import it to get the pool back online. It's been broken ever since.
I thought the RAID info was stored on the drives, so I didn't think unplugging extra drives would do permanent damage.
I can't online the drives because I detached the pool, and I can't import the pool because the pool is unavailable...
Here's some info from commands I've run; I hope it's helpful.


System Information
Hostname freenas.local
Build FreeNAS-9.2.1.7-RELEASE-x64 (fdbe9a0)
Platform Intel(R) Core(TM) i7 CPU 920 @ 2.67GHz
Memory 8168MB
5x 4TB HDDs in RAIDZ1
Code:
[root@freenas] ~# zpool import
   pool: Storage
     id: 7343851656798266284
  state: UNAVAIL
status: One or more devices are missing from the system.
action: The pool cannot be imported. Attach the missing
        devices and try again.
   see: http://illumos.org/msg/ZFS-8000-3C
config:

        Storage                                         UNAVAIL  insufficient replicas
          raidz1-0                                      UNAVAIL  insufficient replicas
            13782428409787983682                        UNAVAIL  cannot open
            gptid/36753082-72ba-11e6-b25b-00241dc22a4a  ONLINE
            3287537329476448460                         UNAVAIL  cannot open
            6661899145104330531                         UNAVAIL  cannot open
            2968293278834859767                         UNAVAIL  cannot open

[root@freenas] ~# camcontrol devlist
<HGST HDN724040ALE640 0953>        at scbus0 target 0 lun 0 (ada0,pass0)
<HGST HDN724040ALE640 MJAOA5E0>    at scbus0 target 1 lun 0 (ada1,pass1)
<Port Multiplier 14580322 000e>    at scbus0 target 15 lun 0 (pass2,pmp0)
<HGST HDN724040ALE640 0953>        at scbus1 target 0 lun 0 (ada2,pass3)
<Port Multiplier 14580322 000e>    at scbus1 target 15 lun 0 (pass4,pmp1)
<HGST HDN724040ALE640 MJAOA5E0>    at scbus7 target 0 lun 0 (ada3,pass5)
<Verbatim STORE N GO PMAP>         at scbus9 target 0 lun 0 (pass6,da0)

[root@freenas] ~# glabel status
                                      Name  Status  Components
gptid/36753082-72ba-11e6-b25b-00241dc22a4a     N/A  ada1p2
                             ufs/FreeNASs3     N/A  da0s3
                             ufs/FreeNASs4     N/A  da0s4
                            ufs/FreeNASs1a     N/A  da0s1a


[root@freenas] ~# gpart show
=>        34  7814034988  ada1  GPT  (3.7T)
          34          94        - free -  (47k)
         128     4194304     1  freebsd-swap  (2.0G)
     4194432  7809840584     2  freebsd-zfs  (3.7T)
  7814035016           6        - free -  (3.0k)

=>      63  15133185  da0  MBR  (7.2G)
        63   1930257    1  freebsd  [active]  (942M)
   1930320        63       - free -  (31k)
   1930383   1930257    2  freebsd  (942M)
   3860640      3024    3  freebsd  (1.5M)
   3863664     41328    4  freebsd  (20M)
   3904992  11228256       - free -  (5.4G)

=>      0  1930257  da0s1  BSD  (942M)
        0       16         - free -  (8.0k)
       16  1930241      1  !0  (942M)


[root@freenas] ~# zpool import -D
no pools available to import



Thanks for your time.
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630

jkh

Guest
OK, that's pretty bad... Nonetheless, your plight has at least attracted the attention of various onlookers who show up for plane crashes, oil refinery fires, and various natural disasters over a certain scale, so let's see what, if anything, we can do here.

<Port Multiplier 14580322 000e> at scbus0 target 15 lun 0 (pass2,pmp0)
OK, what's that? When you ripped the drives out of the previous system, what kind of system did you plug them back into? Same type of controller? Does the new motherboard also support GPT devices or is it some terribly old thing you found in a dumpster, crying for its lost bairns, and maybe it's BIOS-all-the-way?

It is weird that the labels are all so horked, which is why I ask. From this picture, only one 4TB drive is showing up, along with what appears to be your USB stick. You had 4x 4TB drives to start with? What did you replace the failed drive with?
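The question of which disks still carry their labels can be checked from the shell. A read-only sketch that lists the `adaN` disks reported by `camcontrol devlist` and asks `gpart show` about each one, assuming `gpart show <disk>` exits non-zero when it finds no partition table on that disk. The script name `check-disks.sh` and its `--run` argument are hypothetical.

```shell
#!/bin/sh
# Sketch: list the adaN disks from `camcontrol devlist` and report whether
# gpart finds a partition table on each. Read-only: nothing is written.

# Print the adaN device names found in `camcontrol devlist` output on stdin,
# e.g. "(ada0,pass0)" -> "ada0". Port multipliers, cd0 and da0 are skipped.
disks_from_devlist() {
    sed -n 's/.*[(,]\(ada[0-9][0-9]*\)[,)].*/\1/p'
}

# For each disk name given, report whether `gpart show` recognises it.
check_partitions() {
    for d in "$@"; do
        if gpart show "$d" >/dev/null 2>&1; then
            echo "$d: partition table found"
        else
            echo "$d: no partition table found"
        fi
    done
}

# Only touch the real tools when explicitly asked: sh check-disks.sh --run
if [ "${1:-}" = "--run" ]; then
    # Word splitting of the disk list is intentional here.
    check_partitions $(camcontrol devlist | disks_from_devlist)
fi
```

On the devlist above this would check ada0 to ada3; a disk reported as having no partition table is one whose GPT (and therefore its gptid label) FreeBSD cannot currently see.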
 

matt502san

Cadet
Joined
Sep 10, 2016
Messages
7
Originally the drives were plugged into an LSI9211-IT RAID card on a GIGABYTE GA-MA785G-UD3H motherboard.

The replacement machine is a Gigabyte GA-EX58-EXTREME motherboard, and I connected the drives directly to the motherboard's SATA ports. When I first connected the drives, 4 out of the 5 came online OK and I had access to all my data, so I suspect the motherboard is fine with the drives.
My problems started when I unplugged other drives trying to establish which one was faulty.

Originally I had 5x 4TB drives in a RAIDZ; I haven't installed an extra drive to replace the faulty one yet (I never made it that far... lol).
So the only things currently connected to this new setup are the 5x 4TB HDDs and my USB stick with the FreeNAS OS; no other devices are connected.


Thanks for taking the time to try and help jkh.
Cheers.
matt
 
Joined
Apr 9, 2015
Messages
1,258
As far as having "sequentially pulled SATA cables": did you do this while it was powered on, or did you power down, pull one, then boot back up to see if it was the correct one? If you did this while everything was powered up, please power everything down and go stand in the corner for three hours and think about what you have done. Even with AHCI hardware, it's probably best to power down and back up if the system is not designed for hot swap, as you can bump another cable and cause issues.

When your timeout is over, power back up after everything is connected. See if things come back up to what you had before you started pulling things. If so, go into "Storage" and click "View Disks". You can then see the serial number of the offending drive, which should be printed on the drive's label, but before removing it make sure to mark it as offline -- http://doc.freenas.org/9.3/freenas_storage.html#replacing-a-failed-drive

If you can't get things to come back up, you may want to boot a previous environment and see if that lets everything show up. Just plug a monitor and keyboard into the FreeNAS box and watch the screens; you will see one where you can select an earlier version, in case something in the current one is corrupted. You could also try a fresh install on another USB drive and see what comes up.
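Since "View Disks" only lists serial numbers, pairing each serial with the drive's SMART overall-health verdict may help point at the failing one. A read-only sketch, assuming `smartctl` (smartmontools) is available, as it is on FreeNAS, and that disk names such as `ada0` are taken from `camcontrol devlist`. The script name `disk-report.sh` is hypothetical, and a PASSED verdict does not prove a drive is healthy, so this only narrows things down.

```shell
#!/bin/sh
# Sketch: print disk name, serial number and SMART overall-health verdict
# so a failing drive can be matched to the serial printed on its label.
# Read-only: `smartctl -i` and `smartctl -H` only query the drive.

# Serial number from `smartctl -i` output on stdin.
serial_of() {
    awk -F': *' '/^Serial Number:/ { print $2; exit }'
}

# Overall-health verdict (e.g. PASSED) from `smartctl -H` output on stdin.
health_of() {
    awk -F': *' '/overall-health/ { print $2; exit }'
}

# One tab-separated line per disk; "unknown" if smartctl reported nothing.
report_disks() {
    for d in "$@"; do
        serial=$(smartctl -i "/dev/$d" | serial_of)
        health=$(smartctl -H "/dev/$d" | health_of)
        printf '%s\t%s\t%s\n' "$d" "${serial:-unknown}" "${health:-unknown}"
    done
}

# Disk names are passed as arguments: sh disk-report.sh ada0 ada1 ada2 ada3
if [ "$#" -gt 0 ]; then
    report_disks "$@"
fi
```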
 

Mirfster

Doesn't know what he's talking about
Joined
Oct 2, 2015
Messages
3,215
Can you not simply use that LSI 9211 in the new system?
 

matt502san

Cadet
Joined
Sep 10, 2016
Messages
7
When I went into "Storage" then "View Disks", it showed me the serial numbers of all the drives but didn't indicate which one was faulty.

I did try a raid card in the new system but for some reason it hangs on boot after configuring the controller.
 

Mirfster

Doesn't know what he's talking about
Joined
Oct 2, 2015
Messages
3,215
I did try a raid card in the new system but for some reason it hangs on boot after configuring the controller.
Hold on a second here, tell us exactly what you mean by this? When you "configured" this Raid Card, did you have it do anything with the disks, like "Initialize" them?
 

matt502san

Cadet
Joined
Sep 10, 2016
Messages
7
No, I did nothing to configure the disks; the RAID card is JBOD only. I meant that the RAID card's configuration screen comes up on boot, thinks for a while, then hangs on a screen saying something like "MPT boot ROM successfully installed". So I just removed the RAID card and connected the drives straight to the motherboard's SATA ports.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Try to flash the card again, this time removing the boot ROM, which is known to cause trouble.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Alternatively, some BIOSes allow you to skip/disable "option ROMs".
 

matt502san

Cadet
Joined
Sep 10, 2016
Messages
7
OK, I'll try a fresh install on another USB stick. I couldn't see anything on boot about loading an earlier environment, though.
I did find the serial numbers in the GUI before I started pulling things, but it didn't indicate which one was faulty; it just showed them all. I even took a picture of it at the time...
I'll go and stand back in my corner now. Lol.

Cheers
 

Attachments

  • 20160904_160952.jpg

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710

matt502san

Cadet
Joined
Sep 10, 2016
Messages
7
So I've done a fresh install and it appears to be the same. I definitely had my pool working in a degraded state (down one drive) when I first booted the disks on the new hardware, so I'm confident I don't have any BIOS issues and that it should work without the original RAID card. My big mistake was pulling the drives while the system was online... I get it, I'm an ID10T. I genuinely thought that after reconnecting and rebooting, it would just read the RAID info from each drive and start working again. Hard lesson.
So am I screwed?
Is there no way to force the drives back online? Nothing's been written to them, so the info should all be there.

Code:
[root@freenas] ~# zpool import
   pool: Storage
     id: 7343851656798266284
  state: UNAVAIL
status: The pool was last accessed by another system.
action: The pool cannot be imported due to damaged devices or data.
   see: http://illumos.org/msg/ZFS-8000-EY
config:

        Storage                                         UNAVAIL  insufficient replicas
          raidz1-0                                      UNAVAIL  insufficient replicas
            13782428409787983682                        UNAVAIL  cannot open
            gptid/36753082-72ba-11e6-b25b-00241dc22a4a  ONLINE
            3287537329476448460                         UNAVAIL  cannot open
            6661899145104330531                         UNAVAIL  cannot open
            2968293278834859767                         UNAVAIL  cannot open

[root@freenas] ~# camcontrol devlist
<HGST HDN724040ALE640 0953>        at scbus0 target 0 lun 0 (ada0,pass0)
<HGST HDN724040ALE640 MJAOA5E0>    at scbus0 target 1 lun 0 (ada1,pass1)
<Port Multiplier 14580322 000e>    at scbus0 target 15 lun 0 (pass2,pmp0)
<HGST HDN724040ALE640 0953>        at scbus1 target 0 lun 0 (ada2,pass3)
<Port Multiplier 14580322 000e>    at scbus1 target 15 lun 0 (pass4,pmp1)
<TSSTcorp CDDVDW SH-S223B SB01>    at scbus4 target 0 lun 0 (cd0,pass5)
<HGST HDN724040ALE640 MJAOA5E0>    at scbus6 target 0 lun 0 (ada3,pass6)
<Verbatim STORE N GO 5.00>         at scbus8 target 0 lun 0 (pass7,da0)



[root@freenas] ~# glabel status
                                      Name  Status  Components
gptid/36590af5-72ba-11e6-b25b-00241dc22a4a     N/A  ada1p1
gptid/36753082-72ba-11e6-b25b-00241dc22a4a     N/A  ada1p2
                             ufs/FreeNASs3     N/A  da0s3
                             ufs/FreeNASs4     N/A  da0s4
                            ufs/FreeNASs1a     N/A  da0s1a
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
What is the theory for this problem? Did he pull drives on a degraded pool, and then the labels got out of sync? On some systems this doesn't happen because ZFS immediately detects it and freezes the pool, but maybe in this case it did happen?

If so, then the only workaround would be TXG rollback (which probably doesn't work with mismatched labels), or manual label editing to achieve a rollback.
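If the labels really are out of sync, one read-only way to look is to compare the transaction group (txg) that each member device's ZFS labels record. A sketch, assuming `zdb -l <device>` prints lines of the form `txg: <number>`, as it does on FreeBSD/FreeNAS. The script name `label-txgs.sh` is hypothetical, and the device path in the usage line is only an example taken from this thread.

```shell
#!/bin/sh
# Sketch: print the distinct txg values found in the ZFS labels of each
# given device, to see whether pool members disagree. Read-only.

# Distinct txg values from `zdb -l` output on stdin. Matching the first
# field exactly skips keys such as "create_txg:".
txgs_of() {
    awk '$1 == "txg:" { print $2 }' | sort -u
}

# One line per device: "<device>: <txg> [<txg> ...]".
compare_label_txgs() {
    for dev in "$@"; do
        txgs=$(zdb -l "$dev" 2>/dev/null | txgs_of | paste -sd ' ' -)
        printf '%s: %s\n' "$dev" "${txgs:-no readable labels}"
    done
}

# Example: sh label-txgs.sh /dev/gptid/36753082-72ba-11e6-b25b-00241dc22a4a
if [ "$#" -gt 0 ]; then
    compare_label_txgs "$@"
fi
```

Devices whose GPT is no longer visible have no device path to pass here, so this can only compare members that FreeBSD still sees.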
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
You could always try zpool import -F -n Storage. The -F flag would normally attempt to force the import, including rolling back recent TXGs to the point where the pool could be mounted. The -n flag instructs it to simulate doing the -F and report if it will work, but doesn't actually do anything to the pool data.

Edit: If the output says it will work, you can run it again without the -n flag to import the pool. Once it finishes, do zpool export Storage, and import the pool through the GUI.
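The sequence described above can be sketched as a small script, assuming the pool name `Storage` from this thread. The script name and its `--dry-run` argument are hypothetical; only the dry run runs automatically, and the real import/export is left as a function to call by hand after reading the report. Treating an empty dry-run report as "not confirmed" is a cautious choice of this sketch, not documented zpool behaviour.

```shell
#!/bin/sh
# Sketch of the rewind-import sequence: dry run first, real import and
# export only when called explicitly.

# Step 1: `zpool import -F -n` reports whether a rewind would make the pool
# importable, without changing anything. Print the report; fail if empty.
rewind_dry_run() {
    report=$(zpool import -F -n "$1" 2>&1)
    if [ -z "$report" ]; then
        echo "dry run for $1 printed nothing; not treating it as a go-ahead" >&2
        return 1
    fi
    printf '%s\n' "$report"
}

# Step 2 (run by hand after reading the report): real rewind import, then
# export so the pool can be imported again through the GUI.
rewind_import_and_export() {
    zpool import -F "$1" && zpool export "$1"
}

# Example: sh rewind.sh --dry-run Storage
if [ "${1:-}" = "--dry-run" ]; then
    rewind_dry_run "${2:-Storage}"
fi
```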
 

matt502san

Cadet
Joined
Sep 10, 2016
Messages
7
I tried zpool import -F -n Storage but it gave no output.
So I tried it without the -n:
Code:
[root@freenas] ~# zpool import -F -n Storage
[root@freenas] ~# zpool import -F -n Storage
[root@freenas] ~# zpool import -F Storage
cannot import 'Storage': no such pool or dataset
        Destroy and re-create the pool from
        a backup source.
[root@freenas] ~#


I found this comment in some other info I was reading through on a similar issue. Is there any reason this would work for the import/export process where FreeNAS doesn't?

"Download an OmniOS LiveCD, boot it into a shell and do an import/export there. It will likely fix this. "

I also have a record of the gptid numbers for the 4 drives that were working before I broke them. I don't understand enough about them, but could this help?
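A recorded list of gptid labels can at least be compared with what `glabel status` shows now. A read-only sketch, assuming the record has been typed into a plain text file with one full label per line (for example `gptid/36753082-72ba-11e6-b25b-00241dc22a4a`); the script name `check-labels.sh` and the file name `recorded-gptids.txt` are hypothetical. "Missing" here only means FreeBSD does not currently see that GPT partition; it says nothing by itself about the data.

```shell
#!/bin/sh
# Sketch: compare recorded gptid labels against `glabel status` output.
# Read-only.

# Usage inside the script: glabel status | check_recorded_labels FILE
# Reads `glabel status` output on stdin and the recorded labels from FILE.
check_recorded_labels() {
    # First column of every line after the "Name Status Components" header.
    present=$(awk 'NR > 1 { print $1 }')
    while IFS= read -r label || [ -n "$label" ]; do
        [ -n "$label" ] || continue
        if printf '%s\n' "$present" | grep -qxF -e "$label"; then
            echo "present: $label"
        else
            echo "missing: $label"
        fi
    done < "$1"
}

# Example: sh check-labels.sh recorded-gptids.txt
if [ "$#" -gt 0 ]; then
    glabel status | check_recorded_labels "$1"
fi
```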

Thanks
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Do you have a backup?
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
I would try 'zpool import -f -F -n Storage'

I've always seen it with both the -f (force import) and the -F (permit rewind). I don't think it will work though.

No harm in trying the OmniOS live CD. If it works, it would be because there is an automated method to detect and dodge whatever your problem is. I also doubt it will work though, since I am only aware of the -F, -m, -T options being added to zpool import, and they are all in FreeNAS.

Anything more complicated is generally beyond automating.
 