Problem expanding pool after drive upgrade

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
I don't honestly know which versions of ZFS are on CORE and SCALE; I could look tomorrow. If they are the same version, moving to CORE just to make the VDEV work might be worth it, then go back to SCALE. And SCALE did get a minor update today, 23.10.1.1. The changelog didn't mention anything about ZFS changes.
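For anyone who wants to check rather than wait, the version is one command away on either platform (this assumes OpenZFS 0.8 or later, where the version subcommand exists):
Code:
zfs version
zpool version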
 

tvsjr

Guru
Joined
Aug 29, 2015
Messages
959
Unfortunately, versions went away in favor of feature flags... so far, short of turning up a current Core (which I might do in a bit) and creating a sample pool, I haven't found a good reference on what's supported in Core vs Scale. I'm a little hesitant to try jumping to Core - something is obviously a touch wonky somewhere, and I don't want to upset the balance and wind up with the vdev, and subsequently the pool, going tango uniform on me.
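If I do spin up a Core box, the comparison would presumably be something like the following on each side, then diffing the output (Tier3 is the pool in question):
Code:
# feature flags supported by the running OpenZFS build
zpool upgrade -v

# feature flags currently enabled/active on the pool
zpool get all Tier3 | grep feature@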

I pulled the zpool history and found that the creation date for this pool in its current form was Christmas Eve 2015:
Code:
2015-12-24.04:24:06 zpool create -o cachefile=/data/zfs/zpool.cache -o failmode=continue -o autoexpand=on -O compression=lz4 -O aclmode=passthrough -O aclinherit=passthrough -f -m /Tier3 -o altroot=/mnt
Tier3 raidz2 /dev/gptid/26d2563c-a9f6-11e5-b3e9-002590869c3c /dev/gptid/277b82e6-a9f6-11e5-b3e9-002590869c3c /dev/gptid/28279512-a9f6-11e5-b3e9-002590869c3c /dev/gptid/28d8c104-a9f6-11e5-b3e9-002590869c3c /dev/gptid/298897cc-a9f6-11e5-b3e9-002590869c3c /dev/gptid/2a3c8e32-a9f6-11e5-b3e9-002590869c3c spare /dev/gptid/2af81b3c-a9f6-11e5-b3e9-002590869c3c
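For completeness, these are the sort of checks that should show whether ZFS sees the extra space after the swaps (the device name in the last line is just a placeholder for the replaced disk's entry in zpool status):
Code:
# EXPANDSZ column shows unclaimed capacity per vdev member
zpool list -v Tier3

# autoexpand was set at creation; expandsize is the pool-wide figure
zpool get autoexpand,expandsize Tier3

# manual nudge if autoexpand never fires, per replaced disk
zpool online -e Tier3 <replaced-disk-id-from-zpool-status>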


Unless the iX gurus come up with something, I think I'm just going to say that more than 8 years is a pretty damn good life for a pool and go ahead and rebuild with the new drives. It'll let me clean out some cruft at the same time.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Unless the iX gurus come up with something, I think I'm just going to say that more than 8 years is a pretty damn good life for a pool and go ahead and rebuild with the new drives. It'll let me clean out some cruft at the same time.
I can understand that. It would be nice to know if there is a solution to your issue; however, you have lived with it long enough, so it's time to get on with life.
 

tvsjr

Guru
Joined
Aug 29, 2015
Messages
959
I can understand that. It would be nice to know if there is a solution to your issue; however, you have lived with it long enough, so it's time to get on with life.
Yep. I haven't written it off quite yet. The new drives supposedly arrive end of day tomorrow. Then it will be 9-10 days to run badblocks and a long SMART test. The bug report was filed with details and a debug and was very quickly assigned, where others seem to languish, so maybe the iX gurus will have an answer. If nothing else, maybe it will prevent someone else from having the same issue I am.
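(For anyone curious about the burn-in, it's roughly the usual recipe below. It is destructive, so only run it on empty drives; /dev/sdX is a placeholder.)
Code:
# four-pattern destructive write/read test; 4k blocks keep the
# block count under badblocks' 32-bit limit on large drives
badblocks -b 4096 -wsv /dev/sdX

# then an extended SMART self-test, checking results afterwards
smartctl -t long /dev/sdX
smartctl -a /dev/sdX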
 

tvsjr

Guru
Joined
Aug 29, 2015
Messages
959
Because I'm an impatient fsck, I did a fast badblocks run... no errors, so I pressed on. Built a new pool of the 18TBs, replicated the data, exported and imported to rename, confirmed everything working, then built a second vdev with the 20TB drives. Dealt with a little Active Directory issue (unrelated), but now all is well. And I should have enough free space for at least a year or two:
[attached screenshot: new pool capacity]
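(The CLI shape of that shuffle, if anyone wants it, is roughly the sketch below. I'm glossing over details, the temporary pool name is made up, and on TrueNAS you'd normally drive the copy as a replication task in the UI.)
Code:
# snapshot everything and replicate to the pool built on the 18TBs
zfs snapshot -r Tier3@migrate
zfs send -R Tier3@migrate | zfs recv -F NewTier3

# retire the old pool, then rename the new one on import
zpool export Tier3
zpool export NewTier3
zpool import NewTier3 Tier3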
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
And I should have enough free space for at least a year or two:
LOL, you think! ~138TB usable capacity, that is a lot. I'm happy with ~8TB of storage. Of course, my use case is different than yours.

Glad you have everything up and running again.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
I think there's a bug in SCALE where the partition created on the replacement disk is the same size as on the old disk--I'm encountering the same issue. Ticket here:
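A quick way to spot an undersized replacement is to compare the data partition against the raw disk (device names below are placeholders):
Code:
# raw disk and partition sizes in bytes
lsblk -b -o NAME,SIZE,TYPE /dev/sdX

# GPT view with start/end sectors for each partition
sgdisk -p /dev/sdX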
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
I think there's a bug in SCALE where the partition created on the replacement disk is the same size as on the old disk--I'm encountering the same issue. Ticket here:
That is good to know about, but it sucks to have the bug in the first place.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
sucks to have the bug in the first place.
Agreed. There was a bug in 23.10.0 that was supposed to be fixed in 23.10.1, but apparently it wasn't.
 

tvsjr

Guru
Joined
Aug 29, 2015
Messages
959
Agreed. There was a bug in 23.10.0 that was supposed to be fixed in 23.10.1, but apparently it wasn't.
Wonder if I can forward my second Newegg bill to iX for reimbursement?
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Wonder if I can forward my second Newegg bill to iX for reimbursement?
You can, and they would probably enjoy the humor it brings to a dark, gloomy day.
 

tvsjr

Guru
Joined
Aug 29, 2015
Messages
959
You can, and they would probably enjoy the humor it brings to a dark, gloomy day.
Lol I figured as much. I can't complain... TN (and FN) have served me very well for well over a decade, after I realized that Synology wasn't really all that.

The bug has been acknowledged and assigned highest priority for a fix. Looks like it's slated for 24.04/24.10. Alexander noticed something I didn't... after the resize, the partition start gets changed from sector 4096 to sector 2048. No wonder the drive goes missing. I guess it's exceedingly lucky that the middleware apparently does the resize serially and not in parallel - if it had done that to all 6 drives, it would have been a really bad day, especially on a pool with more than one vdev.
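(The start-sector shift is easy to spot if anyone wants to check their own disks; the partition number and device below are just examples.)
Code:
# per-partition detail, including "First sector"
sgdisk -i 1 /dev/sdX

# or the whole table in sectors
parted -s /dev/sdX unit s print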
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Annoying bug this one. It's one of those that takes forever to convince you that it is a real bug.
Not TrueNAS, but a similar vibe: the other day I spent an afternoon trying to migrate a few SATA boot SSDs to NVMe, but the partitions were always getting messed up, with warnings that they'd exceeded the available LBAs, even though the new disks were larger. Turns out, a while before this, I'd formatted the new SSDs to present 4k blocks instead of 512-byte blocks, since they'd be using ashift=12 anyway. By the time I got around to using the disks, I'd forgotten all about it.
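(If anyone wants to check their own NVMe namespaces, it looks roughly like this with nvme-cli. The right format index varies by model, and reformatting erases the drive.)
Code:
# list supported LBA formats; "(in use)" marks the current one
nvme id-ns -H /dev/nvme0n1

# destructive: switch the namespace to LBA format index 1 (often 4K)
nvme format /dev/nvme0n1 --lbaf=1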
 
Top