SOLVED GPT Corrupt or invalid

mellman

Dabbler
Joined
Sep 16, 2016
Messages
38
Hi All,

There's going to be a lot of content here, but I'd appreciate anyone who can provide some assistance. At this point I've been down for about two months. I've been running off my backup, which has been fine, but I'm ready to get back to using my primary hardware.

Here is my Previous Post, where I had a plethora of errors. I think I've now narrowed it down to a bad IOM and/or a bad QSFP cable/connector on my array. I also made a Reddit Post and got some ideas for troubleshooting. At this point I'm still stuck on these GPT corruption issues. I've seen people hit these with disk failures, dd the disk to another disk, and that seems to work things out. What I'd like to know is whether I can simply repair the GPTs and get the pool back online.
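From what I've read, for the disks that only complain about the secondary table ("recovery suggested"), gpart should be able to rebuild the backup copy from the intact primary. A minimal example, using da35 from my boot messages below:

Code:
# Rebuild the backup GPT from the intact primary (see gpart(8))
gpart recover da35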

System Specs:
Dell R720xd
2x E5-2670
384GB RAM
LSI SAS-9207-8e
NetApp DS4246 (2x)
FreeNAS 11.1-U7 AND 11.2-U7

At this moment I'm running in a VM, but the same errors are exhibited when running bare metal. The pool was built and has always been run on bare metal. I was going to build my new system as a VM and it was when I had FreeNAS loaded up, that I saw all vdevs were back online (I'd been struggling with only 2 of them showing up, replacing a cable seemed to to resolve this)

The errors I'm getting during boot are things along these lines:


Code:
GEOM: da35: the secondary GPT table is corrupt or invalid.
GEOM: da35: using the primary only -- recovery suggested.

GEOM: multipath/disk1: corrupt or invalid GPT detected.
GEOM: multipath/disk1: GPT rejected -- may not be recoverable.

root@freenas[/var/log]# gpart recover /dev/da25
gpart: arg0 'da25': Invalid argument

Using gpart list/status/show, none of the disks in vdev2 (raidz3-1) show up. The disks do show up in camcontrol devlist (and in sysctl kern.disks), but both their primary and secondary GPTs seem to be completely gone.
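To double-check that at the raw-device level, here's roughly how I've been inspecting the GPT signature ("EFI PART") by hand; da25 is just one example member:

Code:
# Primary GPT header is at LBA 1; a healthy one starts with "EFI PART"
dd if=/dev/da25 bs=512 skip=1 count=1 2>/dev/null | hexdump -C | head -2

# Backup header is in the very last LBA; diskinfo's 4th field is the sector count
SECTORS=$(diskinfo da25 | awk '{print $4}')
dd if=/dev/da25 bs=512 skip=$((SECTORS - 1)) count=1 2>/dev/null | hexdump -C | head -2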

Results of zpool import:

Code:
root@freenas[/var/log]# zpool import
   pool: tank
     id: 907423887126384953
  state: UNAVAIL
status: The pool was last accessed by another system.
action: The pool cannot be imported due to damaged devices or data.
   see: http://illumos.org/msg/ZFS-8000-EY
config:

        tank                                            UNAVAIL  insufficient replicas
          raidz3-0                                      DEGRADED
            gptid/e6f85169-5136-11e8-b376-bc305bf48148  ONLINE
            731087118901044337                          UNAVAIL  cannot open
            gptid/181f4f1a-5137-11e8-b376-bc305bf48148  ONLINE
            gptid/24da255f-5137-11e8-b376-bc305bf48148  ONLINE
            gptid/325a8e49-5137-11e8-b376-bc305bf48148  ONLINE
            gptid/4a4df2c1-5137-11e8-b376-bc305bf48148  ONLINE
            gptid/6a52ee8d-5137-11e8-b376-bc305bf48148  ONLINE
            gptid/7bc6a747-5137-11e8-b376-bc305bf48148  ONLINE
            gptid/8d306d6b-5137-11e8-b376-bc305bf48148  ONLINE
            gptid/a82f0ce2-5137-11e8-b376-bc305bf48148  ONLINE
            1014468244676781008                         UNAVAIL  cannot open
            gptid/db7bb774-5137-11e8-b376-bc305bf48148  ONLINE
          raidz3-1                                      UNAVAIL  insufficient replicas
            8358001936465582577                         UNAVAIL  cannot open
            13168953014485996713                        UNAVAIL  cannot open
            16674343073354040627                        UNAVAIL  cannot open
            5916167911648451258                         UNAVAIL  cannot open
            10228142690658451320                        UNAVAIL  cannot open
            17309881899809254171                        UNAVAIL  cannot open
            6521923195479449726                         UNAVAIL  cannot open
            16254298423903963076                        UNAVAIL  cannot open
            12519560841692417252                        UNAVAIL  cannot open
            16658654953139949791                        UNAVAIL  cannot open
            11783566184643368972                        UNAVAIL  cannot open
            17155630495564972158                        UNAVAIL  cannot open
          raidz3-2                                      ONLINE
            gptid/bc7a19f9-c4cb-11e8-b538-bc305bf48148  ONLINE
            gptid/bd626022-c4cb-11e8-b538-bc305bf48148  ONLINE
            gptid/bef543fe-c4cb-11e8-b538-bc305bf48148  ONLINE
            gptid/c0e17dc6-c4cb-11e8-b538-bc305bf48148  ONLINE
            gptid/c27fc9e2-c4cb-11e8-b538-bc305bf48148  ONLINE
            gptid/c365fdbb-c4cb-11e8-b538-bc305bf48148  ONLINE
            gptid/c46b5d52-c4cb-11e8-b538-bc305bf48148  ONLINE
            gptid/c60d3684-c4cb-11e8-b538-bc305bf48148  ONLINE
            gptid/c7a6bc30-c4cb-11e8-b538-bc305bf48148  ONLINE
            gptid/c9416b43-c4cb-11e8-b538-bc305bf48148  ONLINE
            gptid/ca343c33-c4cb-11e8-b538-bc305bf48148  ONLINE
            gptid/cb3f2f2c-c4cb-11e8-b538-bc305bf48148  ONLINE


I've seen several ideas:
1) dd the disk to a new disk (https://forums.freebsd.org/threads/zfs-geom-mfid1-corrupt-or-invalid-gpt-detected.2139/).
2) Wipe the first and last part of the disk: dd if=/dev/zero of=/dev/label/spare1 bs=512 count=1 (see the sketch just below; that one-liner only touches the first sector).
3) Give up, start fresh, and restore from backup.
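For completeness, a version of option 2 that zeroes both GPT copies might look like this (a sketch using da25 as an example; it's destructive, so only on a disk whose table you've written off):

Code:
# Primary GPT: protective MBR + header + 32 sectors of entries
# (with the default 128 entries) = the first 34 LBAs
dd if=/dev/zero of=/dev/da25 bs=512 count=34

# Backup GPT: the last 33 LBAs (entries + header)
SECTORS=$(diskinfo da25 | awk '{print $4}')
dd if=/dev/zero of=/dev/da25 bs=512 seek=$((SECTORS - 33)) count=33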

I tried option 2, and it did not seem to make any difference. I feel like there has to be a way to rebuild the GPT without dd'ing an entire disk; we're talking twelve 2TB disks here. I do have the space, though: I could dd each disk to a file, wipe the disk, and dd it back. Would that work?
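Mechanically, the round trip I have in mind would be something like this (a sketch; /mnt/backup is a made-up path, and a bit-for-bit restore brings the bad GPT back with it, so it only helps if a repair step happens in between):

Code:
# Image the whole disk to a file (a full 2TB read, so this is slow)
dd if=/dev/da25 of=/mnt/backup/da25.img bs=1m conv=noerror,sync

# ...wipe or repair the disk here...

# Write the image back
dd if=/mnt/backup/da25.img of=/dev/da25 bs=1m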

Is this something that is recoverable or should I just let it go and move on?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
This has the feeling of trying to build a jetliner at 30,000 feet.

Since you have a backup and this isn't a critical recovery -- I'd stop, go back, and validate your hardware. ZFS (and FreeBSD) cannot run on marginal hardware.

Typically you want to be able to see error-free operation for several weeks' worth of burn-in. Without that, you're likely to keep fighting symptoms.

Two maybe-obvious questions:

Is your HBA flashed to the correct IT firmware (20.00.07.00)?

Is the multipath stuff (always a little twitchy) actually showing up correctly? There are some discussions of this elsewhere on the forums. Things get crazy quickly when this hasn't been set up correctly.

These would be likely places where apparent or actual disk corruption could come into play.
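(For reference, the flashed version shows up in the mps(4) probe messages at boot, and sas2flash, which ships with FreeNAS, can read it off the card; something along these lines:)

Code:
# Driver/firmware versions appear in the mps(4) probe messages
dmesg | grep -i mps0

# sas2flash can also query the controller directly
sas2flash -listall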
 

mellman

Dabbler
Joined
Sep 16, 2016
Messages
38
This has the feeling of trying to build a jetliner at 30,000 feet.

Since you have a backup and this isn't a critical recovery -- I'd stop, go back, and validate your hardware. ZFS (and FreeBSD) cannot run on marginal hardware.

Typically you want to be able to see error-free operation for several weeks' worth of burn-in. Without that, you're likely to keep fighting symptoms.

Two maybe-obvious questions:

Is your HBA flashed to the correct IT firmware (20.00.07.00)?

Is the multipath stuff (always a little twitchy) actually showing up correctly? There are some discussions of this elsewhere on the forums. Things get crazy quickly when this hasn't been set up correctly.

These would be likely places where apparent or actual disk corruption could come into play.

This hardware has run FreeNAS for several years now. Weekly scrubs never showed any cause for concern.

The HBA is flashed to the correct firmware. In addition, I've tried a few other HBAs (an HP H220 and a NetApp X2065A QSFP HBA; some people reported issues with the NetApp one, so I removed it from the equation, even though it had run for over 2.5 years for me without issue, up until this most recent corruption).

The multipath stuff I honestly don't know much about, but the multipath section of the GUI shows every disk in that vdev as degraded. Here is the status:

Code:
root@freenas[/var/log]# gmultipath list
Geom name: disk6
Type: AUTOMATIC
Mode: Active/Passive
UUID: 6e0f7705-2c17-11ea-b674-000c29550ac4
State: OPTIMAL
Providers:
1. Name: multipath/disk6
   Mediasize: 3000592981504 (2.7T)
   Sectorsize: 512
   Mode: r0w0e0
   State: OPTIMAL
Consumers:
1. Name: da37
   Mediasize: 3000592982016 (2.7T)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
2. Name: da1
   Mediasize: 3000592982016 (2.7T)
   Sectorsize: 512
   Mode: r1w1e1
   State: PASSIVE

Geom name: disk12
Type: AUTOMATIC
Mode: Active/Passive
UUID: dc793e2c-edea-11e9-a7be-bc305bf48148
State: DEGRADED
Providers:
1. Name: multipath/disk12
   Mediasize: 2000398933504 (1.8T)
   Sectorsize: 512
   Mode: r0w0e0
   State: DEGRADED
Consumers:
1. Name: da36
   Mediasize: 2000398934016 (1.8T)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE

Geom name: disk11
Type: AUTOMATIC
Mode: Active/Passive
UUID: dc646199-edea-11e9-a7be-bc305bf48148
State: DEGRADED
Providers:
1. Name: multipath/disk11
   Mediasize: 2000398933504 (1.8T)
   Sectorsize: 512
   Mode: r0w0e0
   State: DEGRADED
Consumers:
1. Name: da35
   Mediasize: 2000398934016 (1.8T)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE

Geom name: disk10
Type: AUTOMATIC
Mode: Active/Passive
UUID: dc56ccaf-edea-11e9-a7be-bc305bf48148
State: DEGRADED
Providers:
1. Name: multipath/disk10
   Mediasize: 2000398933504 (1.8T)
   Sectorsize: 512
   Mode: r0w0e0
   State: DEGRADED
Consumers:
1. Name: da34
   Mediasize: 2000398934016 (1.8T)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE

Geom name: disk9
Type: AUTOMATIC
Mode: Active/Passive
UUID: dc463d7b-edea-11e9-a7be-bc305bf48148
State: DEGRADED
Providers:
1. Name: multipath/disk9
   Mediasize: 2000398933504 (1.8T)
   Sectorsize: 512
   Mode: r0w0e0
   State: DEGRADED
Consumers:
1. Name: da33
   Mediasize: 2000398934016 (1.8T)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE

Geom name: disk8
Type: AUTOMATIC
Mode: Active/Passive
UUID: dc353a4c-edea-11e9-a7be-bc305bf48148
State: DEGRADED
Providers:
1. Name: multipath/disk8
   Mediasize: 2000398933504 (1.8T)
   Sectorsize: 512
   Mode: r0w0e0
   State: DEGRADED
Consumers:
1. Name: da32
   Mediasize: 2000398934016 (1.8T)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE

Geom name: disk7
Type: AUTOMATIC
Mode: Active/Passive
UUID: dc250fd7-edea-11e9-a7be-bc305bf48148
State: DEGRADED
Providers:
1. Name: multipath/disk7
   Mediasize: 2000398933504 (1.8T)
   Sectorsize: 512
   Mode: r0w0e0
   State: DEGRADED
Consumers:
1. Name: da31
   Mediasize: 2000398934016 (1.8T)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE

Geom name: disk5
Type: AUTOMATIC
Mode: Active/Passive
UUID: dbe921de-edea-11e9-a7be-bc305bf48148
State: DEGRADED
Providers:
1. Name: multipath/disk5
   Mediasize: 2000398933504 (1.8T)
   Sectorsize: 512
   Mode: r0w0e0
   State: DEGRADED
Consumers:
1. Name: da29
   Mediasize: 2000398934016 (1.8T)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE

Geom name: disk4
Type: AUTOMATIC
Mode: Active/Passive
UUID: dbd7e553-edea-11e9-a7be-bc305bf48148
State: DEGRADED
Providers:
1. Name: multipath/disk4
   Mediasize: 2000398933504 (1.8T)
   Sectorsize: 512
   Mode: r0w0e0
   State: DEGRADED
Consumers:
1. Name: da28
   Mediasize: 2000398934016 (1.8T)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE

Geom name: disk3
Type: AUTOMATIC
Mode: Active/Passive
UUID: dbc689db-edea-11e9-a7be-bc305bf48148
State: DEGRADED
Providers:
1. Name: multipath/disk3
   Mediasize: 2000398933504 (1.8T)
   Sectorsize: 512
   Mode: r0w0e0
   State: DEGRADED
Consumers:
1. Name: da27
   Mediasize: 2000398934016 (1.8T)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE

Geom name: disk2
Type: AUTOMATIC
Mode: Active/Passive
UUID: dbb4fcfb-edea-11e9-a7be-bc305bf48148
State: DEGRADED
Providers:
1. Name: multipath/disk2
   Mediasize: 2000398933504 (1.8T)
   Sectorsize: 512
   Mode: r0w0e0
   State: DEGRADED
Consumers:
1. Name: da26
   Mediasize: 2000398934016 (1.8T)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE

Geom name: disk1
Type: AUTOMATIC
Mode: Active/Passive
UUID: db8a26e7-edea-11e9-a7be-bc305bf48148
State: DEGRADED
Providers:
1. Name: multipath/disk1
   Mediasize: 2000398933504 (1.8T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r0w0e0
   State: DEGRADED
Consumers:
1. Name: da25
   Mediasize: 2000398934016 (1.8T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r1w1e1
   State: ACTIVE


How do I fix multipath errors?
 

mellman

Dabbler
Joined
Sep 16, 2016
Messages
38
So this was fixed. I don't know how someone else would come across the same situation I had, but...

Somehow FreeNAS decided to start multipathing my disks, and it was assigning multiple physical disks to a single multipath "disk". For example, da21 and da33 would both be assigned to the multipath 'disk1'. This is why things weren't showing up under zpool import (even though the disks were all there in sysctl, camcontrol, and the GUI).

After reading more about multipathing, and deciding that it was NOT something I'd set up when I built the pool, I decided these must somehow have been created in error.

I ran gmultipath destroy <disk#> on all the multipath disks, then did a zpool import -f and was able to reimport my pool. Two failed disks on two separate vdevs, so no big deal. The only thing I can't get back is my jails, but they weren't critical anyway (and therefore weren't backed up).
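Concretely, the sequence was roughly this (disk names taken from my gmultipath list output above; adjust to whatever yours are called):

Code:
# Tear down each stray multipath geom (names from `gmultipath list`)
for d in disk1 disk2 disk3 disk4 disk5 disk6 disk7 disk8 disk9 disk10 disk11 disk12; do
    gmultipath destroy $d
done

# Then force the import, since the pool was last accessed by "another system"
zpool import -f tank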
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Well, I'm happy to hear that this turned out to be explained by what I'm guessing was broken multipathing. It had me basically stumped as to what to suggest next. :smile:
 

mellman

Dabbler
Joined
Sep 16, 2016
Messages
38
Well, I'm happy to hear that this turned out to be explained by what I'm guessing was broken multipathing. It had me basically stumped as to what to suggest next. :)

Yeah, I'm just confused as to why multipathing was introduced to begin with. I'm going to be doing additional testing on the hardware to make sure I've identified the bad piece of gear.
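Probably starting with long SMART self-tests on every disk; something like this, with da25 as one example:

Code:
# Kick off a long (full-surface) SMART self-test
smartctl -t long /dev/da25

# Check the results once it finishes (several hours on a 2TB disk)
smartctl -a /dev/da25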
 

Rand

Guru
Joined
Dec 30, 2013
Messages
906
Thank you.
This helped me fix a pool (no backup, but not critical data ;)) that I moved from an -A type to a dual-expander type backplane.
It worked fine initially, but after a couple of boots the multipath disks got corrupted for whatever reason; destroying the mp disks worked just fine.
 

MrTVirus

Dabbler
Joined
Dec 18, 2018
Messages
10
Sorry to necro an old thread, but this saved my behind and 300TB of data. I moved across the country and had everything packed really nicely, but when I fired it all back up I was met with the same error as the OP and quickly lost my cool. Furious googling for a solution left me here, and after removing all the multipathing stuff and doing a force import I was back in business and the data is scrubbing.
 