gptid lost?

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
Hello!

I'm a little bit out of the groove when it comes to FreeNAS, as I resurrected the box from more or less a year of hibernation, only to be greeted by a degraded volume.

I do remember that a drive acted up several times (particularly when moving or cleaning around the chassis). Presumably a bad SATA connection, as SMART would rack up errors in "199 UDMA_CRC_Error_Count". I haven't worried much, as the degraded volume would normally be fixed by wiggling the SATA connector into place, rebooting, and running a safety scrub that would come back with 0 repairs.
Code:
199 UDMA_CRC_Error_Count    0x0032   200   001   000    Old_age   Always       -       1452
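
(For reference, that attribute can be pulled with plain smartctl, something along these lines, da6 being whatever the suspect disk enumerates as on your system:)
Code:
smartctl -A /dev/da6 | grep UDMA_CRC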

This time, however, more system confusion than normal has ensued. Here is a rough timeline:
Shut down the system, wiggled the SATA cable, rebooted.
- GUI showed 6/7 drives present. Pool as degraded.
- CLI showed 6/7 drives present.

Shut down the system, REPLACED the SATA cable, rebooted.

- GUI shows 7/7 drives present. Pool as "healthy" in the Storage->Volumes menu. The traffic light shows the pool as degraded, IIRC.
- CLI shows 7 drives, yet one with a deviating gptid.

Rebooting:
- GUI shows 7/7 drives present. Pool as "healthy" in the Storage->Volumes menu. The traffic light menu: "The volume wd60efrx state is ONLINE: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected."
- CLI:
Code:
 zpool status
  pool: freenas-boot
state: ONLINE
  scan: scrub repaired 0 in 0 days 00:00:16 with 0 errors on Fri Mar  8 03:45:17 2019
config:

        NAME        STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            ada0p2  ONLINE       0     0     0
            ada1p2  ONLINE       0     0     0

errors: No known data errors

  pool: wd60efrx
state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: resilvered 4.36G in 0 days 00:01:10 with 0 errors on Wed Jul 24 14:08:01 2019
config:

        NAME                                            STATE     READ WRITE CKSUM
        wd60efrx                                        ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/14ef1fa6-e0a4-11e5-b134-0cc47ab3208c  ONLINE       0     0     0
            gptid/15c495ba-e0a4-11e5-b134-0cc47ab3208c  ONLINE       0     0     0
            gptid/16990bee-e0a4-11e5-b134-0cc47ab3208c  ONLINE       0     0     0
            gptid/1769399b-e0a4-11e5-b134-0cc47ab3208c  ONLINE       0     0     0
            gptid/18479def-e0a4-11e5-b134-0cc47ab3208c  ONLINE       0     0     0
            gptid/1911207e-e0a4-11e5-b134-0cc47ab3208c  ONLINE       0     0     0
            da6p2                                       ONLINE       0     0     1
        logs
          gptid/ae50a861-e463-11e8-8177-000c298edbf1    ONLINE       0     0     0

errors: No known data errors
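
For the record, the 'clear' route that the status message points at would be just the command below. I have not run it at this point, and as far as I understand it only resets the error counters; it would not do anything about the gptid.
Code:
zpool clear wd60efrx da6p2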



Reflections:
At this point I recognize that some sort of resilvering has begun (did it finish?).
The expected data change from last night's resurrection work could very well correspond to the 4.36G added, as I did not take notice of the FreeNAS situation immediately.

I'm slightly confused by the system stating "healthy" and finding all drives, yet having lost the gptid on one drive in the pool. Hm?

Question:
What is the best course of action to fix the broken gptid, and normalize the system?

When searching the forums for similar, though seemingly more severe, problems, I find threads dating back to 2015 saying the drive should be offlined, removed from the pool, wiped, then resilvered. Is that necessary for my problem too?

Specs:
FreeNAS 11.1-U6
X11-SSL i3-6100
48GB
9201-16i (20.04.00)
7x WD60EFRX Raidz2
SLOG 120GB SM863


Cheers!
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080

I do remember that a drive acted up several times (particularly when moving or cleaning around the chassis). Presumably a bad SATA connection, as SMART would rack up errors in "199 UDMA_CRC_Error_Count". I haven't worried much, as the degraded volume would normally be fixed by wiggling the SATA connector into place, rebooting, and running a safety scrub that would come back with 0 repairs.
Am I correct that your data is still accessible?
At this point I recognize that some sort of resilvering has begun (did it finish?).
Yes, the system completed an automatic resilver. That is the kind of thing that it does when a drive is taken out of a hot-swap bay and put back in before any significant changes happen to the pool.
I'm slightly confused by the system stating "healthy" and finding all drives, yet having lost the gptid on one drive in the pool. Hm?

Question:
What is the best course of action to fix the broken gptid, and normalize the system?
I don't see how that happened. It is a bit unusual. I would take the drive out of the FreeNAS and delete all the partitions using another computer, then put it back in the FreeNAS and initiate a regular resilver through the GUI. That will correct the gptid situation and ensure that the data is checked for accuracy.
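If that other computer happens to run Linux, clearing the partition table is roughly the one-liner below (sdX is just a placeholder, triple-check which disk it is before running anything destructive):
Code:
# wipes the GPT and protective MBR on the whole disk
sgdisk --zap-all /dev/sdX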
When searching the forums for similar, though seemingly more severe, problems, I find threads dating back to 2015 saying the drive should be offlined, removed from the pool, wiped, then resilvered. Is that necessary for my problem too?
It is a good safety precaution. I don't think it needs to be offlined first; I never do that. I have hot-swap bays on my systems and I just pull the drives out. If you don't have hot-swap bays, you can shut the system down, take the drive out, and bring it back up. FreeNAS will complain that you have a missing drive, but you have RAIDZ2, so you still have redundancy.
I might even go to the extra step of putting the suspect drive in a Windows system and giving it a full format to check it for bad sectors.

Do you have your FreeNAS set to run SMART tests periodically?
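Even if they are scheduled, it would not hurt to kick off a long test on the suspect drive by hand and check the result once it finishes, something along these lines:
Code:
smartctl -t long /dev/da6     # start a long self-test in the background
smartctl -l selftest /dev/da6 # view the self-test log when it is done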
FreeNAS 11.1-U6
You could upgrade to U7 because there are some good fixes there. I am not a fan of the new GUI in the 11.2 series of updates.
 

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
Long time. You gotta stop in more often.
Heartwarming. Thank you!

Am I correct that your data is still accessible?
Correct.
Do you have your FreeNAS set to run SMART tests periodically?
Yes, I follow the advice put forward by cyberjock ages ago.
You could upgrade to U7 because there are some good fixes there.
Good to know.
At this point I am hesitant to update before I have sorted out the gptid quirk. Intuitively, it is a liability I'd rather not have.
I would take the drive out of the FreeNAS and delete all the partitions using another computer, then put it back in the FreeNAS and initiate a regular resilver through the GUI. That will correct the gptid situation and ensure that the data is checked for accuracy.
I've pursued a middle ground, that is, to first run a scrub to secure integrity to the best of the pool's ability before messing around further. Turns out this repaired quite a few more hiccups than the resilvering process indicated!
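(The scrub itself was started the usual way, either with the Scrub button in the GUI or from the CLI, roughly:)
Code:
zpool scrub wd60efrx

Status after it finished: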
Code:
zpool status
  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:00:18 with 0 errors on Thu Jul 25 03:45:19 2019
config:

        NAME        STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            ada0p2  ONLINE       0     0     0
            ada1p2  ONLINE       0     0     0

errors: No known data errors

  pool: wd60efrx
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: scrub repaired 848K in 0 days 11:32:10 with 0 errors on Thu Jul 25 05:02:36 2019
config:

        NAME                                            STATE     READ WRITE CKSUM
        wd60efrx                                        ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/14ef1fa6-e0a4-11e5-b134-0cc47ab3208c  ONLINE       0     0     0
            gptid/15c495ba-e0a4-11e5-b134-0cc47ab3208c  ONLINE       0     0     0
            gptid/16990bee-e0a4-11e5-b134-0cc47ab3208c  ONLINE       0     0     0
            gptid/1769399b-e0a4-11e5-b134-0cc47ab3208c  ONLINE       0     0     0
            gptid/18479def-e0a4-11e5-b134-0cc47ab3208c  ONLINE       0     0     0
            gptid/1911207e-e0a4-11e5-b134-0cc47ab3208c  ONLINE       0     0     0
            da6p2                                       ONLINE       0     0    33
        logs
          gptid/ae50a861-e463-11e8-8177-000c298edbf1    ONLINE       0     0     0

errors: No known data errors


At this point I am content with the precautions taken, prior to attempting the remove/re-add of the drive to fix the gptid.

Out of curiosity I checked that the uuid is still ...there, under the hood, somewhere:

Code:
:~# gpart list

Geom name: da6
modified: false
state: OK
fwheads: 255
fwsectors: 63
last: 11721045134
first: 34
entries: 128
scheme: GPT
Providers:
1. Name: da6p1
   Mediasize: 2147483648 (2.0G)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r1w1e1
   rawuuid: 19de182c-e0a4-11e5-b134-0cc47ab3208c
   rawtype: 516e7cb5-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 2147483648
   offset: 65536
   type: freebsd-swap
   index: 1
   end: 4194431
   start: 128
2. Name: da6p2
   Mediasize: 5999027556352 (5.5T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r1w1e1
   rawuuid: 19ed623f-e0a4-11e5-b134-0cc47ab3208c
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 5999027556352
   offset: 2147549184
   type: freebsd-zfs
   index: 2
   end: 11721045127
   start: 4194432
Consumers:
1. Name: da6
   Mediasize: 6001175126016 (5.5T)
   Sectorsize: 512
   Stripesize: 4096
   Stripeoffset: 0
   Mode: r2w2e4

The rawuuid 19ed623f-e0a4-11e5-b134-0cc47ab3208c is still the same (it matches an old reference screenshot).
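Another way to peek at the same thing is glabel, which lists the label-to-provider mappings the kernel currently exposes. My guess, and it is only a guess, is that the gptid label for da6p2 is simply not being tasted while ZFS holds the raw partition, which would explain why zpool status falls back to da6p2:
Code:
glabel status | grep da6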

I guess the 'forum way' to solve this discrepancy between 'under the hood' and the GUI is to avoid fiddling too much under the hood and do the 'offline-remove-wipe-re-add' method via the GUI.
However, my interest is really piqued as to whether there are ways to solve this "beautifully" rather than the grunt GUI fix...

According to the manual, the 'quick wipe' option will erase the partition information on the disk. Sufficient to make the drive appear as unknown to FreeNAS, if I am correct.

I'll report back whenever I get around to executing the move.
 

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
Reporting back as planned.

Confirmed procedure to fix the problem:
The drive was offlined, removed from the pool, and wiped; da6 was then "replaced" with the wiped drive and resilvering started.
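My understanding is that the GUI steps boil down to roughly the commands below under the hood (the GUI also repartitions the disk, so the gptid in the replace command is just a placeholder for whatever it assigned):
Code:
zpool offline wd60efrx da6p2
# quick wipe of the disk, then "Replace" in the volume status view,
# which repartitions and effectively runs:
zpool replace wd60efrx da6p2 gptid/<new-rawuuid>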
Code:
zpool status
confirms a new gptid has been assigned.
Code:
scan: resilvered 4.04T in 0 days 16:27:10 with 0 errors
Everything looks normal again.

cheers!
 