Core 13.0-U4 Disk Replacement Error | Member Disk Not Selectable

Joined
Apr 2, 2023
Messages
8
Hello all!

A bit of background: I'm currently running TrueNAS Core 13.0-U4 on a Lenovo ThinkServer TS440 (Xeon E3-1225 v3, 32GB memory, 10GbE Intel NIC). Right now I run a RAIDZ2 pool of 4 x 2TB HDDs, but I'm in the process of upgrading those to 4 x 12TB HDDs. The new drives are Seagate Ironwolf, which are CMR, so there shouldn't be any issues there.

I started the upgrade by following the documentation here: https://www.truenas.com/docs/core/coretutorials/storage/disks/diskreplace/

My plan was to replace the old 2TB disks one at a time, and let the pool auto-expand after they were all replaced. The first disk replacement worked flawlessly, and I have the first new 12TB drive already resilvered and happily chugging along.

When I follow the same process with the second disk, something weird happens. I can offline the disk, but once I swap the hardware, no member disks are available to select in the GUI drop-down menu. Two little dashes appear where ada0 was listed before (and where I expect to see ada1 this time). I followed exactly the same steps as before, but now get a different result.

I've tried swapping other disks instead, restarting the middleware, rebooting the system, and hard-refreshing the GUI in case it was a caching issue, but nothing helps.

Does anyone have any ideas? I am stumped.
 

Attachments

  • No_member_disk.png

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
First, check dmesg to make sure the new disks are being detected.
Second, as a safety mechanism, the UI may not offer a drive as an eligible replacement if it already contains partitions. Run gpart show ada1 to see whether there are existing partitions. If there are, try wiping the new ada1 in Storage->Disks.
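A minimal POSIX sh sketch of the dmesg check, assuming dmesg prints one "adaN: ..." message per line as it does on FreeBSD-based TrueNAS Core. disk_dmesg_summary is a hypothetical helper, not a TrueNAS command; it only filters text, so on the NAS it would be fed as dmesg | disk_dmesg_summary ada1. If it prints nothing, the current message buffer has no model or capacity line for that device name.

```shell
#!/bin/sh
# Hypothetical helper (not part of TrueNAS): print the model and capacity
# lines the kernel logged for one disk, reading dmesg-style text on stdin.
#   $1 = device name, e.g. ada1
disk_dmesg_summary() {
    # Match "adaN: <model> ..." and "adaN: 1907729MB (...)" lines only.
    # "adaN: 300.000MB/s transfers" does not match because of the dot.
    grep -E "^$1: (<|[0-9]+MB )"
}
```

Filtering on the model and capacity lines makes it easy to tell at a glance whether ada1 is the old 2TB WD disk or the new 12TB drive.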
 
Joined
Apr 2, 2023
Messages
8
Thanks for the reply!

Here is the relevant section I get from dmesg:

ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada0: <ST12000VN0008-2YS101 SC60> ACS-4 ATA SATA 3.x device
ada0: Serial Number ZR70R8HA
ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 11444224MB (23437770752 512 byte sectors)
ses0: pass0,ada0 in 'Slot 00', SATA Slot: scbus0 target 0
ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
ada1: <WDC WD2003FYYS-18W0B0 01.01D02> ATA8-ACS SATA 2.x device
ada1: Serial Number WD-WMAY04744310
ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 1907729MB (3907029168 512 byte sectors)
ses0: pass1,ada1 in 'Slot 01', SATA Slot: scbus1 target 0
ada2 at ahcich2 bus 0 scbus2 target 0 lun 0
ada2: <WDC WD2003FYYS-18W0B0 01.01D02> ATA8-ACS SATA 2.x device
ada2: Serial Number WD-WMAY04715400
ada2: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada2: Command Queueing enabled
ada2: 1907729MB (3907029168 512 byte sectors)
ses0: pass2,ada2 in 'Slot 02', SATA Slot: scbus2 target 0
ada3 at ahcich3 bus 0 scbus3 target 0 lun 0
ada3: <WDC WD2003FYYS-18W0B0 01.01D02> ATA8-ACS SATA 2.x device
ada3: Serial Number WD-WMAY04714924
ada3: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada3: Command Queueing enabled
ada3: 1907729MB (3907029168 512 byte sectors)
ses0: pass3,ada3 in 'Slot 03', SATA Slot: scbus3 target 0
ada4 at ahcich4 bus 0 scbus4 target 0 lun 0
ada4: <KingSpec KSD-SA25.7-016MJ 1.094.33> ACS-2 ATA SATA
ada4: Serial Number 985083001018
ada4: 600.000MB/s transfers (SATA 3.x, UDMA5, PIO 512bytes)
ada4: Command Queueing enabled
ada4: 15272MB (31277232 512 byte sectors)
ses0: pass4,ada4 in 'Slot 04', SATA Slot: scbus4 target 0

----------
zpool status -v gives me the following:
root@FreeNAS:~ # zpool status -v
  pool: FreeNAS
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: resilvered 28.2M in 00:00:07 with 0 errors on Sun Apr 2 14:04:36 2023
config:

        NAME                                            STATE     READ WRITE CKSUM
        FreeNAS                                         DEGRADED     0     0     0
          raidz2-0                                      DEGRADED     0     0     0
            gptid/0c339b45-d0af-11ed-ba31-6c0b840a737d  ONLINE       0     0     0
            gptid/b34876ec-58b2-11e8-a5fd-6c0b840a737d  OFFLINE      0     0     0
            gptid/b3fe6108-58b2-11e8-a5fd-6c0b840a737d  ONLINE       0     0     0
            gptid/b4ac6825-58b2-11e8-a5fd-6c0b840a737d  ONLINE       0     0     0

errors: No known data errors

  pool: freenas-boot
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: scrub repaired 0B in 00:01:00 with 0 errors on Fri Mar 31 03:46:00 2023

----------
gpart show ada1 doesn't find anything at ada1.

The new disk appears to be listed by gptid in the GUI under Pool Status, but it doesn't show up in the GUI under Disks.

I also ran glabel status both before and after replacing the old disk; the results are attached.
 

Attachments

  • Disks.png
  • Glabel_status_after.png
  • Glabel_Status_Before.png
  • Gpart_Show_ADA1.png
  • Pool_Status.png

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
You've got a physical layer problem, possibly a broken ada1 port on your motherboard.
 

Matt_G

Explorer
Joined
Jan 24, 2016
Messages
65
Something that I am noticing:
In your first post/picture, it says you are trying to replace a disk with a gptid beginning with b4ac6825.
But that disk is ada3 and is online.
The offline disk (ada1) has a gptid beginning with b34876ec.
What the heck is up with that?
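Matching a gptid from the pool status to an adaN device is what glabel status is for. A minimal POSIX sh sketch, assuming the usual three-column glabel status layout (Name, Status, Components); gptid_to_dev is a hypothetical helper name, not a TrueNAS command. On the NAS it would be used as glabel status | gptid_to_dev b34876ec.

```shell
#!/bin/sh
# Hypothetical helper (not part of TrueNAS): print the component
# (e.g. adaNpM) that provides a gptid label, reading `glabel status`
# output on stdin.
#   $1 = full gptid or any leading part of it, e.g. b34876ec
gptid_to_dev() {
    # Column 1 is the label name, column 3 the partition providing it.
    # The header line ("Name Status Components") never matches.
    awk -v want="gptid/$1" 'index($1, want) == 1 { print $3 }'
}
```

glabel only lists providers that are currently attached, so a disk that has already been pulled from its bay won't show up at all; run it before swapping hardware.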
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
The new disk appears to be listed by gptid in the GUI under Pool Status
That could only be the case if it had a GPT partition table, which it shouldn't. But the dmesg output shows one 12TB disk on ada0 (which would presumably be the one 12TB disk you've already resilvered into the pool), ada1-3 being 2TB WD disks, and ada4 presumably being your boot device.

What's the output of camcontrol devlist? And is the offline disk still connected?
 
Joined
Apr 2, 2023
Messages
8
That could only be the case if it had a GPT partition table, which it shouldn't. But the dmesg output shows one 12TB disk on ada0 (which would presumably be the one 12TB disk you've already resilvered into the pool), ada1-3 being 2TB WD disks, and ada4 presumably being your boot device.

What's the output of camcontrol devlist? And is the offline disk still connected?
Thanks for taking a look, I appreciate it.

You're correct, the 12TB disk is the new Ironwolf that I got up and running first try, the other 2TB disks are the legacy drives I'm replacing, and the remaining drive is the boot drive.

The offline disk is no longer connected, I swapped it out for a second new 12TB Ironwolf.

Here is what camcontrol devlist returns in my current configuration with that new 12TB Ironwolf swapped in for ada1. Let me know if you want to see it with the legacy drive installed:

root@FreeNAS:~ # camcontrol devlist
<ST12000VN0008-2YS101 SC60>          at scbus0 target 0 lun 0 (ada0,pass0)
<WDC WD2003FYYS-18W0B0 01.01D02>     at scbus2 target 0 lun 0 (ada2,pass2)
<WDC WD2003FYYS-18W0B0 01.01D02>     at scbus3 target 0 lun 0 (pass3,ada3)
<KingSpec KSD-SA25.7-016MJ 1.094.33> at scbus4 target 0 lun 0 (ada4,pass4)
<AHCI SGPIO Enclosure 2.00 0001>     at scbus5 target 0 lun 0 (ses0,pass5)
 
Joined
Apr 2, 2023
Messages
8
Something that I am noticing:
In your first post/picture, it says you are trying to replace a disk with a gptid beginning with b4ac6825.
But that disk is ada3 and is online.
The offline disk (ada1) has a gptid beginning with b34876ec.
What the heck is up with that?

I've tried the offline/replace procedure on several different disks, all with the same result (no member disk option to choose in the GUI). That first screenshot was from an attempt with a different disk. Sorry for the confusion.
 
Joined
Apr 2, 2023
Messages
8
You've got a physical layer problem, possibly a broken ada1 port on your motherboard.
That could be, though if that's the case I don't think it's just ada1 that is broken since I'm getting the same results after attempting to swap ada1, ada2, or ada3 (no member disk option available in the GUI). I'm running the factory Lenovo backplane and hot swap caddies if that helps.
 
Joined
Apr 2, 2023
Messages
8
I went for a long bike ride and tried to think about possible solutions.

The only thing I can identify that changed between the first (successful) swap and the subsequent (unsuccessful) attempts is the resilver. Is it possible something went wrong with resilvering?

Maybe I have something misconfigured with the factory Lenovo backplane somehow?

That's all I've got.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Here is what camcontrol devlist returns in my current configuration with that new 12TB Ironwolf swapped in for ada1. Let me know if you want to see it with the legacy drive installed:
So the new disk isn't being seen at all. DOA disk, bad SATA cable, bad SATA port, bad power cable to the disk? Could be any of these, but some of them are easy to test (swap cables, use a different SATA port, etc.).
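The check above, whether the swapped-in disk shows up in camcontrol devlist at all, can be scripted so it's quick to repeat after each cable, port, or disk swap. A minimal POSIX sh sketch, assuming camcontrol devlist prints one device per line ending in a parenthesized peripheral list such as (ada0,pass0) or (pass3,ada3); list_ada_disks and disk_present are hypothetical helpers, not TrueNAS commands. On the NAS: camcontrol devlist | disk_present ada1 || echo "ada1 not detected".

```shell
#!/bin/sh
# Hypothetical helper: read `camcontrol devlist` output on stdin and
# print every adaN device name found, one per line.
list_ada_disks() {
    # Keep only the last "(...)" group on each line, split it on commas
    # (order varies: "(ada0,pass0)" vs "(pass3,ada3)"), keep adaN names.
    sed -n 's/.*(\([^)]*\))[[:space:]]*$/\1/p' \
        | tr ',' '\n' \
        | grep '^ada[0-9][0-9]*$'
}

# Hypothetical helper: succeed only if device $1 (e.g. ada1) appears
# in the devlist output on stdin.
disk_present() {
    list_ada_disks | grep -qx "$1"
}
```

Splitting the peripheral list on commas, instead of assuming the adaN name comes first, handles lines like the (pass3,ada3) entry in the output posted above.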
 
Joined
Apr 2, 2023
Messages
8
So the new disk isn't being seen at all. DOA disk, bad SATA cable, bad SATA port, bad power cable to the disk? Could be any of these, but some of them are easy to test (swap cables, use a different SATA port, etc.).
I think you're right, and Samuel Tai too. Seems like a hardware issue.

I'll try experimenting with swapping things around next and report back later with the results.
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
I'd say it's likely you have one or more of these issues:
  • Failed backplane (this actually happened to me)
  • Power supply isn't supplying enough juice to the newer disks
  • Do you have an HBA or a RAID controller feeding the backplane? A RAID controller isn't recommended.
 
Joined
Apr 2, 2023
Messages
8
I tried testing several components today, and the verdict is...

All 3 of the remaining new 12TB HDDs are DOA. I got "lucky" and pulled the only working one out of the box first, which is why the swap worked on the first try but not for any others. A scenario that I thought was so unlikely I didn't even consider it. Thank you all for pointing me in the right direction.

Someone at UPS must have really kicked the *#*$ out of that box on the way to me. I'll RMA the drives and have new ones soon, and then I should be back in business.
 