SOLVED Resilvering a (seemingly) healthy pool

sfanla

Dabbler
Joined
Jul 25, 2023
Messages
10
Hello all!

I have a RAIDZ2 pool made of 3 vdevs, each 8 drives wide, that I've been upgrading to larger drives by replacing one drive at a time. The first 12 drives went without a problem. However, since drive 13, TrueNAS keeps resilvering and seems to get exponentially slower as it progresses. My replacements normally take roughly 5.5h to resilver (about 3h per used TB per drive), but this resilver (of what, I don't know, since all drives show as healthy with no errors) has not completed after some 12 hours and is still at 86% (it was at 84% about 3 hours ago).

I'll do my best to include all pertinent information about my system below:
I am running ECC RAM, with Proxmox on bare metal and TrueNAS SCALE as a VM, using PCIe passthrough for the LSI 3008 SAS card (a 16i that I flashed with the custom firmware suggested in the TrueNAS help documentation some months ago).

OS Version: TrueNAS-SCALE-22.12.3.3
Product: Standard PC (Q35 + ICH9, 2009)
Model: QEMU Virtual CPU version 2.5+
Memory: 61 GiB

The drives initially in the system were Micron 5100 2TB SSDs. The good replacements were 14 SATA 8TB CMR drives (2 Seagate, 12 WD Red "Plus" that pre-date the Plus name). The new drive I put in when this all started is a SAS HGST OF23678 8TB drive. I did try replacing the SAS drive with a second one from my 15. It resilvers the SAS drive normally for the replacement, and then this strange never-ending resilver starts again.


Edit: adding a screenshot of my storage dashboard. The 2 warnings are due to unequal drive sizes, since I'm in the process of upgrading.

1699471567720.png


Thanks!
 

sfanla

Dabbler
Joined
Jul 25, 2023
Messages
10
Update: it seems to have magically jumped back to normal speeds (or close to) and did 10% over the last hour. So strange.

*edit* But then it did 0.3% over the following 20 minutes, which is again crazy slow. I have no clue what's causing this.
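
To put the swings into comparable numbers, here is a minimal sketch that converts the progress figures quoted in this thread into percentage points per hour. The figures come straight from the posts above (84% to 86% in about 3 hours, 10% in one hour, 0.3% in 20 minutes); the variable and function names are just illustrative.

```python
# Progress observations quoted in this thread:
# label -> (percentage points gained, hours elapsed)
observations = {
    "stalled": (2.0, 3.0),        # 84% -> 86% over about 3 hours
    "recovered": (10.0, 1.0),     # 10% over one hour
    "slow_again": (0.3, 20 / 60), # 0.3% over 20 minutes
}


def percent_per_hour(gained, hours):
    """Average resilver progress in percentage points per hour."""
    return gained / hours


rates = {
    label: percent_per_hour(gained, hours)
    for label, (gained, hours) in observations.items()
}

for label, rate in rates.items():
    print(f"{label}: {rate:.2f} %/h")
```

Even the "slow again" stretch works out slightly faster than the original stall, but both are roughly a tenth of the rate seen during the recovered hour.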
 

sfanla

Dabbler
Joined
Jul 25, 2023
Messages
10
Alright, as soon as it finished, it restarted. Well that helps, now I can look at all the endless loop threads!
 

sfanla

Dabbler
Joined
Jul 25, 2023
Messages
10
Conclusion for those who may stumble upon this someday: zpool status -v saved my butt; I didn't know about that command. It showed me the list of pending resilvers. It turns out I had just moved disks around too much and it had 3 more drives to run through, and resilvers are sequential, so that explained the resilver loop. The speed problem on the SAS drives, though, remains a mystery.
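
For anyone who wants to check for queued resilvers without reading the whole status by eye, here is a minimal sketch that scans zpool status -v text for device lines tagged as resilvering or waiting. It assumes the per-device markers "(resilvering)" and "(awaiting resilver)" that recent OpenZFS releases print; the sample text and the device names in it are made up for illustration and are not the output from this pool.

```python
import subprocess

# Per-device markers assumed to appear at the end of a device line
# in `zpool status` output on recent OpenZFS releases.
MARKERS = ("(resilvering)", "(awaiting resilver)")


def resilver_devices(status_text):
    """Return (device, marker) pairs for device lines ending in a resilver marker."""
    found = []
    for line in status_text.splitlines():
        stripped = line.strip()
        for marker in MARKERS:
            if stripped.endswith(marker):
                found.append((stripped.split()[0], marker))
    return found


def read_status(pool):
    """Run `zpool status -v <pool>` and return its text (needs ZFS installed)."""
    result = subprocess.run(
        ["zpool", "status", "-v", pool],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Hypothetical excerpt of the config section, for illustration only.
SAMPLE = """
        NAME             STATE     READ WRITE CKSUM
        tank             DEGRADED     0     0     0
          raidz2-0       DEGRADED     0     0     0
            replacing-3  DEGRADED     0     0     0
              sdx        ONLINE       0     0     0
              sdy        ONLINE       0     0     0  (resilvering)
            sdz          ONLINE       0     0     0  (awaiting resilver)
"""

pending = resilver_devices(SAMPLE)
for device, marker in pending:
    print(device, marker)
```

On a live system, resilver_devices(read_status("tank")) would do the same against real output, with "tank" replaced by the actual pool name.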
 