Panic When Resilvering after Replacing Disk

Status
Not open for further replies.

bollar

Patron
Joined
Oct 28, 2012
Messages
411
I am in the process of replacing the drives in one vdev of a 4 x RAIDZ2 pool. The first two drives went fine. On the third drive, roughly 20 minutes into the resilver, the system panics, reboots, and starts the resilver over from the beginning. I have tried a fresh destination drive as well as removing the original disk. I have not been able to get the resilver to complete, and the pool is currently degraded.

I've never seen a panic on FreeBSD before and I don't know where to start to diagnose the problem. Any advice?
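A reasonable first pass, assuming the stock FreeBSD/FreeNAS dump locations (an assumption about your install -- adjust the paths), is to look for a saved crash dump and for the panic string in the message buffer:

```shell
#!/bin/sh
# Look for kernel crash dumps saved by savecore after the reboot.
# /var/crash is the stock FreeBSD dump directory; /data/crash is the
# usual FreeNAS location (assumption -- adjust for your install).
found=0
for d in /var/crash /data/crash; do
    if [ -d "$d" ]; then
        echo "checking $d:"
        ls -l "$d"
        found=1
    fi
done
[ "$found" -eq 1 ] || echo "no dump directory found"
# The panic string often survives in the kernel message buffer after reboot
dmesg 2>/dev/null | grep -i panic || echo "no panic string in dmesg"
```

If a vmcore shows up in the dump directory, a backtrace from it is the kind of thing the devs will want attached to a bug report.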



System:

FreeNAS 11.2-RELEASE
Platform: Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz, 16 Cores @ 2.2GHz x 2
Supermicro X9DRD
LSI SAS2308 PCI-Express Fusion-MPT SAS-2
Chelsio S320e Dual-Port 10GbE
CPU VM Support: Full
Memory: 64GB
Chassis: Chambro 40700 4U-48 bay / SAS-2 Backplane
HDD: 6TB x 16, 4TB x 5, 3TB x 3, 2TB x 10, 1.5TB x 12
SSD: 120GB x 2 (mirrored boot), 18.64GB x 1 (SLOG)
 

bollar

Patron
Joined
Oct 28, 2012
Messages
411
With the original drive removed and no replacement target inserted, a resilver completed. I decided to put one of the target drives back in to see what would happen this time. That reintroduced the panic.

Code:
  pool: tank
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
    the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-2Q
  scan: resilvered 0 in 0 days 13:01:44 with 0 errors on Wed Dec 26 05:52:37 2018
config:

    NAME                                            STATE     READ WRITE CKSUM
    bollar                                          DEGRADED     0     0     0
      raidz2-0                                      ONLINE       0     0     0
        gptid/ec7bb72e-0aba-11e7-8136-0025909434fc  ONLINE       0     0     0
        gptid/eba940e2-0aba-11e7-8136-0025909434fc  ONLINE       0     0     0
        gptid/ed468c77-0aba-11e7-8136-0025909434fc  ONLINE       0     0     0
        gptid/4053ba3b-aed7-11e7-a9dd-0025909434fc  ONLINE       0     0     0
        gptid/976c6aaf-b2cc-11e7-b3a1-0025909434fc  ONLINE       0     0     0
        gptid/2f754939-b05c-11e7-b3a1-0025909434fc  ONLINE       0     0     0
        gptid/298ebf65-341e-11e8-a3f9-00074307578b  ONLINE       0     0     0
        gptid/f27c190c-0aba-11e7-8136-0025909434fc  ONLINE       0     0     0
      raidz2-1                                      ONLINE       0     0     0
        gptid/f42213d8-0aba-11e7-8136-0025909434fc  ONLINE       0     0     0
        gptid/f57c6efa-0aba-11e7-8136-0025909434fc  ONLINE       0     0     0
        gptid/f66364a1-0aba-11e7-8136-0025909434fc  ONLINE       0     0     0
        gptid/f71b89a9-0aba-11e7-8136-0025909434fc  ONLINE       0     0     0
        gptid/f849bcf2-0aba-11e7-8136-0025909434fc  ONLINE       0     0     0
        gptid/f9a941fd-0aba-11e7-8136-0025909434fc  ONLINE       0     0     0
        gptid/faf2b1e5-0aba-11e7-8136-0025909434fc  ONLINE       0     0     0
        gptid/fc168dba-0aba-11e7-8136-0025909434fc  ONLINE       0     0     0
      raidz2-3                                      ONLINE       0     0     0
        gptid/643db24f-15b8-11e8-98af-00074307578b  ONLINE       0     0     0
        gptid/6593ff26-15b8-11e8-98af-00074307578b  ONLINE       0     0     0
        gptid/6747ea3f-15b8-11e8-98af-00074307578b  ONLINE       0     0     0
        gptid/68f766f3-15b8-11e8-98af-00074307578b  ONLINE       0     0     0
        gptid/6b4e6f1e-15b8-11e8-98af-00074307578b  ONLINE       0     0     0
        gptid/f3547a81-1996-11e8-86d7-00074307578b  ONLINE       0     0     0
        gptid/26c0673d-6c0d-11e8-b265-00074307578b  ONLINE       0     0     0
        gptid/cc712a62-6c5c-11e8-b265-00074307578b  ONLINE       0     0     0
      raidz2-4                                      DEGRADED     0     0     0
        2437984594064647901                         UNAVAIL      0     0     0  was /dev/gptid/eaadcad5-0882-11e9-942e-00074307578b
        gptid/b4c921fe-0737-11e9-b757-00074307578b  ONLINE       0     0     0
        gptid/8a111399-3393-11e8-a3f9-00074307578b  ONLINE       0     0     0
        gptid/bc85928e-6e8c-11e8-b265-00074307578b  ONLINE       0     0     0
        gptid/8d98777d-3393-11e8-a3f9-00074307578b  ONLINE       0     0     0
        gptid/a112faa5-d146-11e8-800b-00074307578b  ONLINE       0     0     0
        gptid/b6878f33-06e7-11e9-b757-00074307578b  ONLINE       0     0     0
        gptid/930e7916-3393-11e8-a3f9-00074307578b  ONLINE       0     0     0
    logs
      gptid/fc62586e-0aba-11e7-8136-0025909434fc    ONLINE       0     0     0

errors: No known data errors

 

bollar

Patron
Joined
Oct 28, 2012
Messages
411
Still sitting with a clean resilver but a degraded array, and considering my options. A restore would take a long time (weeks?), so that's a last resort. I wonder if there's something else I should look at (memory allocation? tunables?), or whether to try a live CD of a different OS with ZFS support, like OmniOS.

Or just leave the array "as-is" until another two drives fail in that VDEV and restore at that point.
 
Joined
Jan 18, 2017
Messages
525
Probably want to post a bug report about this so the devs can take a look at the problem.
 

bollar

Patron
Joined
Oct 28, 2012
Messages
411
Probably want to post a bug report about this so the devs can take a look at the problem.
Yeah. I'd like to be able to give them pertinent info/logs, but I'm not sure what those would be.
 
Joined
Jan 18, 2017
Messages
525

Apollo

Wizard
Joined
Jun 13, 2013
Messages
1,458
The panic isn't a good thing.
In order to resolve your Degraded state, you need to remove the "Unavailable" drive from the pool. As long as the drive remains unavailable, you will never be able to get rid of the Degraded state.

Once removed, you can run a scrub if you like.
 

bollar

Patron
Joined
Oct 28, 2012
Messages
411
The panic isn't a good thing.
In order to resolve your Degraded state, you need to remove the "Unavailable" drive from the pool. As long as the drive remains unavailable, you will never be able to get rid of the Degraded state.

Once removed, you can run a scrub if you like.
Thanks -- apparently not an option here. The device name has changed since my previous post, but this is the unavailable drive from above.
Code:
root@storage:~ # zpool detach tank 13182715825251778798
cannot detach 13182715825251778798: only applicable to mirror and replacing vdevs
root@storage:~ #
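That matches the error: `detach` only applies to mirror and replacing vdevs. For a raidz member the per-device operation is `zpool offline`. A sketch, using an echo-only wrapper so the commands can be reviewed before running them against a real pool:

```shell
#!/bin/sh
# Echo-only wrapper: prints each command instead of executing it.
# Swap the body for  run() { "$@"; }  when ready to run for real.
run() { echo "+ $*"; }

# 'zpool detach' is only for mirror/replacing vdevs (as the error says);
# 'zpool offline' works per device, using the id from the output above.
run zpool offline tank 13182715825251778798
run zpool status tank
```

Offlining won't clear the Degraded state by itself, but it stops the pool from touching the device.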
 

bollar

Patron
Joined
Oct 28, 2012
Messages
411

Apollo

Wizard
Joined
Jun 13, 2013
Messages
1,458
I think you have to use the name with the gptid such as this one:

/dev/gptid/eaadcad5-0882-11e9-942e-00074307578b
 

bollar

Patron
Joined
Oct 28, 2012
Messages
411
Through the bug report, I was advised that another drive in the vdev was timing out, and that was causing the panic. I took it offline and was able to complete the resilver. I'm now in the process of replacing that drive.
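For anyone hitting the same thing later, the sequence that worked can be sketched roughly like this. The gptids below are placeholders, not the real device names from this pool, and the commands are echoed rather than executed so they can be sanity-checked first:

```shell
#!/bin/sh
# Echo-only wrapper: prints each command instead of executing it.
# Swap the body for  run() { "$@"; }  to actually run the commands.
run() { echo "+ $*"; }

# 1. Take the timing-out member offline so it stops panicking the box
#    (gptid/TIMING-OUT-DISK is a placeholder, not a real device)
run zpool offline tank gptid/TIMING-OUT-DISK
# 2. Let the in-flight resilver finish, watching progress as it goes
run zpool status tank
# 3. Then replace the offlined disk with a fresh one in turn
#    (gptid/NEW-DISK is likewise a placeholder)
run zpool replace tank gptid/TIMING-OUT-DISK gptid/NEW-DISK
```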
 