GEOM Errors on zpool, can it be recovered?

jlacroix

Cadet
Joined
Feb 22, 2019
Messages
2
Hello everyone,

My main storage volume (ZFS) can't be mounted, and I'm hoping it can be recovered. I do have backups, but before I wipe the disks and restore (which will take days), I want to make one last attempt at recovering the pool.

My FreeNAS server was installed in a PowerEdge T30 with four SATA disks attached directly to the motherboard. Hardware RAID was disabled, so ZFS handled disk management. That server was short on resources and was choking, so I decided to replace it, which ended up being cheaper than upgrading the existing one.

I purchased a server from UNIX Surplus after telling them that it was intended to run FreeNAS. When it arrived, I moved my existing disks from the T30 into the new server, and FreeNAS wouldn't detect the drives at all. The RAID card in the new server was an Adaptec card. The support person walked me through creating a JBOD, promising me that doing so does not wipe data off the disks and simply lets the BIOS detect the disks so I can bypass hardware RAID. FreeNAS still wouldn't see the drives.

They have since replaced the card with an LSI card. FreeNAS now sees the drives but still won't activate the pool. At boot-up I see these errors:

Code:
root@storage:~ # dmesg | grep -iE "ata|ahci|geom"
ahci0: <Intel Cougar Point AHCI SATA controller> port 0xf050-0xf057,0xf040-0xf043,0xf030-0xf037,0xf020-0xf023,0xf000-0xf01f mem 0xdfb02000-0xdfb027ff irq 19 at device 31.2 on pci0
ahci0: AHCI v1.30 with 6 6Gbps ports, Port Multiplier not supported
ahcich0: <AHCI channel> at channel 0 on ahci0
ahcich1: <AHCI channel> at channel 1 on ahci0
ahcich2: <AHCI channel> at channel 2 on ahci0
ahcich3: <AHCI channel> at channel 3 on ahci0
ahcich4: <AHCI channel> at channel 4 on ahci0
ahcich5: <AHCI channel> at channel 5 on ahci0
mps0: SAS Address for SATA device = 3f553c1d7a856ba5
mps0: SAS Address for SATA device = 3f5337157a886581
mps0: SAS Address for SATA device = 3e494f1490876b83
mps0: SAS Address for SATA device = 3e494f049ca07883
mps0: SAS Address from SATA device = 3f553c1d7a856ba5
mps0: SAS Address from SATA device = 3f5337157a886581
mps0: SAS Address from SATA device = 3e494f1490876b83
mps0: SAS Address from SATA device = 3e494f049ca07883
cd0 at ahcich5 bus 0 scbus6 target 0 lun 0
cd0: 150.000MB/s transfers (SATA 1.x, UDMA5, ATAPI 12bytes, PIO 8192bytes)
da2: <ATA TOSHIBA DT01ACA2 ABB0> Fixed Direct Access SPC-4 SCSI device
da0: <ATA TOSHIBA DT01ACA2 ABB0> Fixed Direct Access SPC-4 SCSI device
da3: <ATA TOSHIBA DT01ACA2 ABB0> Fixed Direct Access SPC-4 SCSI device
da1: <ATA TOSHIBA DT01ACA2 ABB0> Fixed Direct Access SPC-4 SCSI device
GEOM: da0: corrupt or invalid GPT detected.
GEOM: da0: GPT rejected -- may not be recoverable.
GEOM: da1: corrupt or invalid GPT detected.
GEOM: da1: GPT rejected -- may not be recoverable.
GEOM: da2: corrupt or invalid GPT detected.
GEOM: da2: GPT rejected -- may not be recoverable.
GEOM: da3: corrupt or invalid GPT detected.
GEOM: da3: GPT rejected -- may not be recoverable.


I've been Googling for a few hours, found the following command and ran it:

Code:
root@storage:~ # gpart recover da0
gpart: arg0 'da0': Invalid argument


The RAID card I have now is this one:

Code:
root@storage:~ # sas2flash -listall
LSI Corporation SAS2 Flash Utility
Version 16.00.00.00 (2013.03.01)
Copyright (c) 2008-2013 LSI Corporation. All rights reserved

    Adapter Selected is a LSI SAS: SAS2004(B2)   

Num   Ctlr            FW Ver        NVDATA        x86-BIOS         PCI Addr
----------------------------------------------------------------------------

0  SAS2004(B2)     20.00.07.00    14.01.00.06    07.39.02.00     00:01:00:00

    Finished Processing Commands Successfully.
    Exiting SAS2Flash.


So basically, I'm thinking the Adaptec card overwrote something important at the beginning of the drives when it saved metadata. I'm wondering if there's anything I can do in order to recover before I give up and wipe them and start over from a backup.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Wow, sorry for the trouble with the new system. When you have any questions, always ask here first because lots of people in the computer industry don't know what ZFS is or how to deal with it.
So basically, I'm thinking the Adaptec card overwrote something important at the beginning of the drives when it saved metadata.
You are correct in thinking that the Adaptec card has overwritten important data. I can't say for certain that the pool is unrecoverable, but I think recovery is unlikely.
Hopefully the other community members will have some time to offer advice.
 

jlacroix

Cadet
Joined
Feb 22, 2019
Messages
2
I took another look at the server from the Web GUI, and it shows the volume as "locked." I don't recall ever encrypting it, and I know for sure that it has never asked me for a passphrase. I tried to unlock it via the GUI with the key that I found on my FreeNAS installation at /data/geli/<key>.key, but that didn't work.

So I'm not sure if:

1.) FreeNAS thinks that the volume is locked, because it can't access the disks
2.) The volume is actually encrypted

I tried to run:

Code:
geli attach -k /data/geli/<key>.key /dev/da0


But that just gave me:

Code:
geli: Cannot read metadata from /dev/da1: Invalid argument.


I looked for recovery information in /var/backups but that directory is empty. If the volume was truly encrypted, wouldn't it have saved backups of the disk structure there?
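One way to tell the two cases apart, sketched here under assumptions: GELI keeps its metadata in the last sector of the provider it encrypts, and that metadata starts with the magic string GEOM::ELI. If FreeNAS put GELI on a partition rather than on the whole disk (an assumption on my part, which would also explain why attaching the raw disk fails), the magic would sit a little before the end of the disk, so the sketch scans a window of trailing sectors instead of only the very last one. The function name geli_magic_near_end and the window size are illustrative, and 512-byte sectors are assumed.

```shell
#!/bin/sh
# Sketch: look for the GELI metadata magic "GEOM::ELI" in the last sectors
# of a disk or disk image. Read-only: dd only reads from the target.
# Assumptions (not confirmed in this thread): 512-byte logical sectors, and
# GELI living on a partition that ends shortly before the end of the disk.

geli_magic_near_end() {
    # $1 = device or image file
    # $2 = total size in 512-byte sectors
    # $3 = number of trailing sectors to scan
    start=$(( $2 - $3 ))
    if [ "$start" -lt 0 ]; then
        start=0
    fi
    # grep -a treats the binary stream as text; -q stops at the first hit.
    if dd if="$1" bs=512 skip="$start" count="$3" 2>/dev/null |
        LC_ALL=C grep -a -q 'GEOM::ELI'; then
        echo "found"
    else
        echo "not found"
    fi
}

# Example (run as root; the drives here report 3907029168 sectors):
#   geli_magic_near_end /dev/da0 3907029168 4096
```

A "found" result would point toward the volume really being encrypted, while "not found" only says the magic is not in the scanned window, not that the pool is unencrypted.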

For fun, I tried booting the server with an Ubuntu live USB just to see what Linux shows (since that's what I'm more familiar with) and fdisk info for one of the drives shows:

Code:
Disk /dev/sda: 1.8 TiB, 2000398934016 bytes, 3907029168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: dos
Disk identifier: 0x00000000

Device     Boot Start        End    Sectors  Size Id Type
/dev/sda1           1 3907029167 3907029167  1.8T ee GPT


So it still has a partition showing up on the disk, but I'm not sure why it says GPT.
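For context, type ee is the protective MBR entry that a GPT disk carries in sector 0, so this output only shows that sector 0 is intact. fdisk reporting the disklabel as dos suggests it could not read a valid GPT behind it. The GPT header itself lives in sector 1 (with a backup copy in the last sector) and begins with the signature "EFI PART". Below is a minimal sketch of a read-only check for that signature, assuming 512-byte logical sectors as reported above; the function name gpt_sig_at is illustrative.

```shell
#!/bin/sh
# Sketch: report whether the GPT header signature "EFI PART" is present at
# a given LBA of a disk or disk image. Read-only: dd only reads the target.
# Assumes 512-byte logical sectors, as the fdisk output reports.

gpt_sig_at() {
    # $1 = device or image file
    # $2 = LBA (sector number) to inspect
    # Read the first 8 bytes of that sector; NUL bytes are dropped so the
    # shell can hold the result in a variable.
    sig=$(dd if="$1" bs=512 skip="$2" count=1 2>/dev/null | head -c 8 | tr -d '\0')
    if [ "$sig" = "EFI PART" ]; then
        echo "present"
    else
        echo "missing"
    fi
}

# Example (run as root): primary header at LBA 1, backup header at the last
# LBA, which is 3907029167 for a disk of 3907029168 sectors.
#   gpt_sig_at /dev/sda 1
#   gpt_sig_at /dev/sda 3907029167
```

If the primary signature is missing but the backup is present, that would be consistent with something having overwritten the start of the disk while leaving the end alone.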

I tried to assemble the zpool with Ubuntu's tools, but they can't detect any RAID and report no zpools at all. FreeNAS itself doesn't see any pools from the command line either, though the Web GUI does show the pool (as locked).

I'll keep trying a bit longer before giving up...
 