Hotspare not available after failed drive replaced and zPool resilvered

ondro727

Cadet
Joined
Oct 3, 2021
Messages
2
I am running TrueNAS Core in a VM (on an AMD-based Supermicro server with SAS HBAs passed through) connected to a WD Data102 JBOD. There are 2 zPools. A drive (HDD disk78p2) failed in one of them recently, and the hot spare (disk98p2) kicked in and was used as a temporary drive. I then physically replaced the failed drive disk78p2 (hotplug) and initiated the replacement operation via the GUI. I am pretty sure I marked the failed drive to be replaced, not the spare (but I wouldn't bet my life on it). The resilver was triggered and finished without errors. However, after the resilver finished, the hot spare disk98p2 did not become "available"; it is still reported as "in use" in the vDev (and is unavailable as a hot spare for this zPool). What is worse/strange is that the newly installed replacement drive disk78p2 is also reported as "online", but under the "spares" section...

Original zPool:
4x RAIDZ6 of 8x HDD
2x cache SSD
1x log MIRROR of 2x SSD
1x spare HDD

Current status (see the attached screenshot):
3x RAIDZ6 of 8x HDD
1x RAIDZ6 of 9x HDD (2 of the HDDs are in the "spare" section: disk98p2 and disk78p2)
2x cache SSD
1x log MIRROR of 2x SSD
1x spare HDD (disk98p2 unavailable)

I tried to follow the replacement procedure: remove the failed drive, insert the new drive, hit "replace" on the failed drive in the GUI and point it to the new drive, then wait for the resilver to finish. The outcome is quite different from what I expected. The zPool is reported as "ONLINE" (which is fine), but the underlying structure is incorrect. Any idea what happened? And more importantly, any idea how to resolve this situation so I can get my hot spare disk98p2 back without degrading the zPool? Any help would be appreciated. [Note: All 102 drive slots in the WD Data102 are filled, so adding even one more drive is not an option...]
 

Attachments

  • TrueNAS_Data102.JPG
    63.5 KB · Views: 182

ondro727

Cadet
Joined
Oct 3, 2021
Messages
2
Okay, finally resolved the issue. It was my own fault: I got fooled by the fact that detaching the spare from the vDev after the resilver was not possible via the GUI (it resulted in an error), even though detaching was the logical step and was even stated in several posts (and the Oracle docs). In the end I tried the same via the shell (zpool detach zPool2 <diskID>) and all is back to normal. As for the two disks reported under the "spares" section of the mentioned vDev, I guess the idea is that both the spare and the replaced disk are shown in this section, so one can see which spare stands in for which replaced disk. Not very straightforward in my opinion, but hey, new day (night actually), new experience. Reporting this in case anyone else ends up in such a situation.
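
For anyone who wants to check from the shell before detaching, here is a minimal sketch of how the stuck spare could be spotted. It assumes the usual `zpool status` layout, in which a pool-level `spares` section lists each hot spare with a state such as `INUSE` or `AVAIL` and is followed by a blank line. The helper name `inuse_spares` is made up for this example and is not part of ZFS or TrueNAS.

```shell
#!/bin/sh
# Hypothetical helper (not a ZFS command): reads `zpool status` output on
# stdin and prints the name of every hot spare still reported as INUSE.
# Assumes the pool-level "spares" section ends at the next blank line.
inuse_spares() {
    awk '
        $1 == "spares" && NF == 1 { in_spares = 1; next }  # section header
        in_spares && NF == 0      { in_spares = 0 }        # blank line ends it
        in_spares && $2 == "INUSE" { print $1 }            # spare still in use
    '
}

# Usage on the affected system (pool name taken from the post above):
#   zpool status zPool2 | inuse_spares
# If the spare is still listed after the resilver has completed, detach it:
#   zpool detach zPool2 <diskID>
```

The function only reads text, so it is safe to run at any time. The detach itself is left as a manual step, so you can confirm the resilver has really finished and that the ID you pass is the spare, not the new drive.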
 