[SOLVED] GPT table is corrupt or invalid error on bootup

Status
Not open for further replies.

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Yesterday morning I was playing around with testing the AES-NI benchmark thread and I panicked the machine during a scrub (scrubs are automated for the 1st and 15th at 0330). The issue may have occurred before now, but I only noticed it when rebooting after the panic.

On bootup (and in the dmesg output) I see:
Code:
GEOM: da7: the primary GPT table is corrupt or invalid.
GEOM: da7: using the secondary instead -- recovery strongly advised.


I've spent the last 8 hours or so reading what others have done. I read every thread and nobody seems to have a solution that they said worked, aside from zeroing out the disk (or at least the GPT tables) and re-adding it to the array, or using alternate utilities such as Parted Magic's 'gpt fdisk' command.

In the spirit of fixing this issue (and learning a little something) without resorting to some other boot CD or wiping the drive: how do I fix this?

System specs:

FreeNAS-8.3.1-RELEASE-x64 (r13452)
E5606 with 20GB of RAM
ZFS v28 running 18x2TB on RAIDZ3

I've never had any problems with any of my disks, and reviewing the SMART data for the drive shows nothing to indicate anything is going wrong.

Here are the outputs that other people with the same issue were commonly asked to post...

Code:
# gpart show da7

=>        34  3907029101  da7  GPT  (1.8T) [CORRUPT]
          34          94       - free -  (47k)
         128     4194304    1  freebsd-swap  (2.0G)
     4194432  3902834703    2  freebsd-zfs  (1.8T)

# gpart list da7

Geom name: da7
modified: false
state: CORRUPT
fwheads: 255
fwsectors: 63
last: 3907029134
first: 34
entries: 128
scheme: GPT
Providers:
1. Name: da7p1
   Mediasize: 2147483648 (2.0G)
   Sectorsize: 512
   Stripesize: 0
   Stripeoffset: 65536
   Mode: r1w1e1
   rawuuid: 762670b2-4a95-11e2-bca4-0015171496ae
   rawtype: 516e7cb5-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 2147483648
   offset: 65536
   type: freebsd-swap
   index: 1
   end: 4194431
   start: 128
2. Name: da7p2
   Mediasize: 1998251367936 (1.8T)
   Sectorsize: 512
   Stripesize: 0
   Stripeoffset: 2147549184
   Mode: r1w1e2
   rawuuid: 763790a1-4a95-11e2-bca4-0015171496ae
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 1998251367936
   offset: 2147549184
   type: freebsd-zfs
   index: 2
   end: 3907029134
   start: 4194432
Consumers:
1. Name: da7
   Mediasize: 2000398934016 (1.8T)
   Sectorsize: 512
   Mode: r2w2e5

# zpool status
  pool: tank
 state: ONLINE
  scan: scrub repaired 0 in 17h23m with 0 errors on Mon Apr  1 21:23:24 2013
config:

	NAME                                            STATE     READ WRITE CKSUM
	tank                                            ONLINE       0     0     0
	  raidz3-0                                      ONLINE       0     0     0
	    gptid/6fbb91d5-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
	    gptid/70448fd2-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
	    gptid/70c0c7b3-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
	    gptid/713de0d5-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
	    gptid/71e3eea1-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
	    gptid/728458d2-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
	    gptid/7326aebc-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
	    gptid/73c64f27-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
	    gptid/7468c69a-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
	    gptid/75045f96-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
	    gptid/75a0096a-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
	    gptid/763790a1-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
	    gptid/76d701fa-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
	    gptid/77759c5c-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
	    gptid/78190bd3-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
	    gptid/78bb9173-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
	    gptid/795a7052-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0
	    gptid/79fbc7b0-4a95-11e2-bca4-0015171496ae  ONLINE       0     0     0

errors: No known data errors


Several people mentioned using gpart recover /dev/da7 (some wrote it as gpart recovery), but everyone who tried it said it didn't work. Some places even mention setting sysctl kern.geom.debugflags=0x10 before running the other commands. In their defense, though, they seemed to have other issues that may have prevented the command from fixing everything anyway. It seems this issue was more widespread with FreeNAS 0.7 and on USB sticks.
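For reference, the exact incantations those threads give look like this (copied here for completeness; I haven't run either yet, and I don't know if the debugflags step is still needed on 8.3.1):
Code:
sysctl kern.geom.debugflags=0x10    # only mentioned in some of the older threads
gpart recover /dev/da7              # the command everyone points at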

So before I try either of those commands, I'm curious whether those are still the recommended ways to repair the issue or if I've misunderstood something. The threads I found were sometimes quite old (2011 or older). I'm just thinking someone should validate the correct command to run for this error.

Some places even say this is an issue with FreeBSD and ZFS and should be ignored. But considering one disk has this issue and the rest don't, I'm thinking this is something that should be fixed.

Any input from the FreeBSD wizards?
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Just curious, the AES testing we did, are our machines at risk now?

Hope you find a fix for yourself as well.
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
So before I try either of those commands, I'm curious whether those are still the recommended ways to repair the issue or if I've misunderstood something. The threads I found were sometimes quite old (2011 or older).
I'd double check the kern.geom.debugflags setting as that value has changed with kernel versions. I don't recall what the correct current value is. I would try to fix it without it first anyway. Try a plain gpart recover /dev/da7 first. Then offline da7 and repeat. Then disable swap on da7 (don't forget it's encrypted now) and repeat.
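Roughly this, with the gptid taken from your zpool status above and the swap device name only a guess (check swapinfo for the real one):
Code:
# 1) plain recover with everything online
gpart recover /dev/da7

# 2) if that doesn't take, offline just that member and retry
zpool offline tank gptid/763790a1-4a95-11e2-bca4-0015171496ae
gpart recover /dev/da7
zpool online tank gptid/763790a1-4a95-11e2-bca4-0015171496ae

# 3) if it still fails, also drop the (geli-encrypted) swap on that disk and retry
swapoff /dev/da7p1.eli
gpart recover /dev/da7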

Some places even say this is an issue with FreeBSD and ZFS and should be ignored. But considering one disk has this issue and the rest don't, I'm thinking this is something that should be fixed.
They just did their setup wrong.

Just curious, the AES testing we did, are our machines at risk now?
Not unless you had a panic during testing.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Just thought I'd clarify: I'm not using encryption, and I plan to roll back to 8.3.0 since hard drive serial numbers don't appear in the GUI. That's a pretty big deal in my opinion, because it helps validate which serial numbers are on which hard drives when you have one you know is failing. We've had plenty of people pull the wrong "da3", and now there's no easy way to verify that if the drive keeps offlining itself (since the GUI doesn't seem to update serials the instant a drive goes offline).

No, your machines aren't at risk. I tried to run the commands I have at http://forums.freenas.org/showthread.php?12157-Encryption-performance-benchmarks and that's what crashed my system. I don't think all of geli is installed, probably because of the limited space on the USB stick. Running those commands on FreeNAS will panic the system. However, running them from a FreeBSD live CD works fine. I do have a warning in that thread that those commands will panic FreeNAS.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
I agree with you that the serial numbers in the GUI are important, but in the ticket I submitted reporting the problem, the answer was that they doubt they will produce a new -p1 version just because of that problem. I honestly don't see why that couldn't easily be built and issued.

Thanks for the reply, my pool is running fine but I had to ask.
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
I agree with you that the serial numbers in the GUI are important, but in the ticket I submitted reporting the problem, the answer was that they doubt they will produce a new -p1 version just because of that problem. I honestly don't see why that couldn't easily be built and issued.
Actually, they are doing a -p1 with the serial number fix, and several other fixes as well.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Actually, they are doing a -p1 with the serial number fix, and several other fixes as well.

That's good to hear!

I've been helping a friend with his pfSense box all day, so I haven't tested the fix yet. Hopefully tonight!
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Actually, they are doing a -p1 with the serial number fix, and several other fixes as well.
That is great news, as I hadn't heard they were actually going to do the -p1 for sure. Any idea of an expected date? (CyberJock, sorry to take over your thread with that question.)
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
So I was planning to pull all the drives except the one with the bad GPT table, just to prevent any errors (I'm very conservative about messing with partition tables). I still unplug all the hard drives except the one I plan to install an OS on when doing OS installations.

I SSHed into my box and I did:
Code:
[root@zuul] ~# gpart list da7
Geom name: da7
modified: false
state: CORRUPT
fwheads: 255
fwsectors: 63
last: 3907029134
first: 34
entries: 128
scheme: GPT
Providers:
1. Name: da7p1
   Mediasize: 2147483648 (2.0G)
   Sectorsize: 512
   Stripesize: 0
   Stripeoffset: 65536
   Mode: r1w1e1
   rawuuid: 762670b2-4a95-11e2-bca4-0015171496ae
   rawtype: 516e7cb5-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 2147483648
   offset: 65536
   type: freebsd-swap
   index: 1
   end: 4194431
   start: 128
2. Name: da7p2
   Mediasize: 1998251367936 (1.8T)
   Sectorsize: 512
   Stripesize: 0
   Stripeoffset: 2147549184
   Mode: r1w1e2
   rawuuid: 763790a1-4a95-11e2-bca4-0015171496ae
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 1998251367936
   offset: 2147549184
   type: freebsd-zfs
   index: 2
   end: 3907029134
   start: 4194432
Consumers:
1. Name: da7
   Mediasize: 2000398934016 (1.8T)
   Sectorsize: 512
   Mode: r2w2e5

[root@zuul] ~# gpart recover /dev/da7
da7 recovered
[root@zuul] ~# gpart list da7
Geom name: da7
modified: false
state: OK
fwheads: 255
fwsectors: 63
last: 3907029134
first: 34
entries: 128
scheme: GPT
Providers:
1. Name: da7p1
   Mediasize: 2147483648 (2.0G)
   Sectorsize: 512
   Stripesize: 0
   Stripeoffset: 65536
   Mode: r1w1e1
   rawuuid: 762670b2-4a95-11e2-bca4-0015171496ae
   rawtype: 516e7cb5-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 2147483648
   offset: 65536
   type: freebsd-swap
   index: 1
   end: 4194431
   start: 128
2. Name: da7p2
   Mediasize: 1998251367936 (1.8T)
   Sectorsize: 512
   Stripesize: 0
   Stripeoffset: 2147549184
   Mode: r1w1e2
   rawuuid: 763790a1-4a95-11e2-bca4-0015171496ae
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 1998251367936
   offset: 2147549184
   type: freebsd-zfs
   index: 2
   end: 3907029134
   start: 4194432
Consumers:
1. Name: da7
   Mediasize: 2000398934016 (1.8T)
   Sectorsize: 512
   Mode: r2w2e5


I did a reboot and all is well. So I guess the correct command (at least for my situation) was gpart recover /dev/da7 as root. Hopefully someone will find this useful in the future. I didn't have to do anything except run this command. No sysctl parameters or anything.

Now that I've seen this issue, I'm inclined to add a check for it to my modified nightly emails so I can make sure I don't run into it again. It would have been a PITA if I had been forced to recreate the partition table and resilver the drive.

Edit: I'm wondering if there would be any downside to running a "gpart recover" as part of a cron job. Maybe I'm just too OCD about this, since it was potentially a one-time oopsie. I just have to wonder if the issue was actually caused by my panic or if it had been there for a while.
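If I do automate anything, it would probably be a report-only check in the nightly script rather than an automatic recover. Something like this untested sketch:
Code:
#!/bin/sh
# flag any disk whose partition table GEOM reports as CORRUPT
for disk in $(sysctl -n kern.disks); do
    gpart show "${disk}" 2>/dev/null | grep -q CORRUPT && \
        echo "WARNING: GPT on ${disk} is marked CORRUPT -- consider gpart recover"
done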
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
I did a reboot and all is well. So I guess the correct command (at least for my situation) was gpart recover /dev/da7 as root.
This was with the pool imported and /dev/da7 online or with only /dev/da7 in the system?

Edit: I'm wondering if there would be any downside to running a "gpart recover" as part of a cron job.
Given that it touches the partition info I would always prefer manual intervention myself.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
This was with the pool imported and /dev/da7 online or with only /dev/da7 in the system?

I ran that command with the server up and running, all drives installed, zpool mounted and shared, serving files to my HTPC. After the table was fixed I figured I'd reboot just to make sure something wasn't going to fubar it again on bootup. No error message and all is still well.

It's a little scary to me that the partition tables aren't locked. I thought that FreeNAS had locked them unless you did some kind of trickery to make sure you weren't playing with things you shouldn't. I guess I shouldn't be "too" surprised. After all, they aren't protected at all in Windows or Linux.
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
It's a little scary to me that the partition tables aren't locked. I thought that FreeNAS had locked them unless you did some kind of trickery to make sure you weren't playing with things you shouldn't.
It does for some simple cases at least. Try stomping on the partition table via dd. I assume gpart recover is blessed in that regard.
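i.e. something like this, which I'd expect to get refused while the providers are open, unless the kern.geom.debugflags foot-shooting bit is set (obviously don't point it at a disk you care about unless you're deliberately testing the protection):
Code:
# expected to fail with the pool imported, since da7's partitions hold the provider open
dd if=/dev/zero of=/dev/da7 bs=512 count=1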
 

ProtoSD

MVP
Joined
Jul 1, 2011
Messages
3,348
It does for some simple cases at least. Try stomping on the partition table via dd. I assume gpart recover is blessed in that regard.

Probably not, think about how the disks get wiped from the GUI.
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
Probably not, think about how the disks get wiped from the GUI.
Actually, it calls __gpt_unlabeldisk which does a swapoff, geli detach and gpart destroy -F. Followed by a gpart create -s gpt and another destroy. Then it runs the dd command.
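Written out by hand it would be roughly this; I'm paraphrasing the middleware script from memory, so the device names and the dd size are placeholders:
Code:
swapoff /dev/da7p1.eli                       # stop using the encrypted swap on that disk
geli detach da7p1.eli                        # detach the geli swap provider
gpart destroy -F da7                         # force-destroy the existing GPT
gpart create -s gpt da7                      # create a fresh GPT ...
gpart destroy da7                            # ... and destroy it again
dd if=/dev/zero of=/dev/da7 bs=1m count=1    # then the dd pass (exact size in the script may differ)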
 

ProtoSD

MVP
Joined
Jul 1, 2011
Messages
3,348
Actually, it calls __gpt_unlabeldisk which does a swapoff, geli detach and gpart destroy -F. Followed by a gpart create -s gpt and another destroy. Then it runs the dd command.

I guess they've made some changes, it used to use dd for wiping the gpt stuff.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Hmm... is it possible to "backup" the partition table? Maybe I could back them up to a thumbdrive in case the table gets corrupted.
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
I guess they've made some changes, it used to use dd for wiping the gpt stuff.
It still does, or maybe that's more for the ZFS metadata? The disk_wipe function calls the above first, which means the dd commands run after the partition table has been destroyed by gpart.

Hmm... is it possible to "backup" the partition table? Maybe I could back them up to a thumbdrive in case the table gets corrupted.
One way: man gpart. But GPT already includes a backup partition table on-disk, which is how you recovered. In your case, if you had three disks with both GPT tables corrupt, or their first sectors corrupt, you'd likely have a much more significant problem anyway.
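If your gpart has the backup/restore verbs (check gpart(8) on 8.3.1), it's about as simple as this, with the path only as an example:
Code:
gpart backup da7 > /mnt/tank/da7.gpt     # dump the partition table to a small text file
# ... and to lay the same layout back down on a blank disk later:
gpart restore da7 < /mnt/tank/da7.gpt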
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Actually, I'd need 4 to be in bad shape since I'm running RAIDZ3. Although with 3 disks bad I'd be sweating a little. :P
 

heek002

Cadet
Joined
Oct 2, 2012
Messages
1
Yesterday morning I was playing around with testing the AES-NI benchmark thread and I panicked the machine during a scrub ... Any input from the FreeBSD wizards?
Thanks, the gpart recover worked for me! Happy.
 