ZPool rolled back to first snapshot after power outage

Chiaki

Explorer
Joined
Apr 4, 2016
Messages
51
So after a power outage one of my zpools seems to have rolled back to its earliest snapshot. I can't find any reason for this in the logs. All other ZPools are fine and show the current file state.

Any recommendations on what I could try to fix this? I do have backups, but I want to understand what happened here.

I'm running TrueNAS Scale Angelfish; the zpool in question is a RAIDZ2 of five SATA SSDs.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
So after a power outage one of my zpools seems to have rolled back to its earliest snapshot.
That doesn't sound right, especially because pools don't have snapshots; datasets do. Can you provide more specifics on your datasets, used space, etc.?
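If it helps, something along these lines gives a per-dataset overview from the shell (the pool name `satassd` below is just a placeholder; substitute your own):

```
# Recursive overview of every dataset in the pool: size, mountpoint and mount state
zfs list -r -o name,used,available,refer,mountpoint,mounted satassd

# Breakdown of where the space actually sits (data vs. snapshots vs. children)
zfs list -r -o space satassd
```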
 

Chiaki

Explorer
Joined
Apr 4, 2016
Messages
51
Sorry for confusing the nomenclature.
All of the datasets in this specific zpool seem to have rolled back to 25th September 2022, which is when I set up this particular TrueNAS Scale server.

TrueNAS Scale runs on bare metal. I have one NVMe SSD where the virtual machines live, five SATA SSDs in RAIDZ2 used for SMB shares (this is the pool that seems to have been rolled back), and a single M.2 SATA SSD for the OS.

The NVMe datasets (virtual machines) get periodically replicated to the SATA SSD zpool. In this case I can see a discrepancy: 176 GiB used on the NVMe versus 145 GiB used on the SATA SSD zpool.

One thing that is weird: I have a dataset on the SATA SSD zpool showing about 4 GiB used space, but when I navigate there via the shell I can't see any files.

There are snapshots for the root dataset, but if I try to create a clone it fails with `[EFAULT] Failed to clone snapshot: failed to create mountpoint: Operation not permitted`. No snapshots exist for the child datasets, even though there should be; new snapshots from the periodic snapshot tasks have already been generated.

A scrub is currently running and is at 50%. No errors found yet.

EDIT: One of the datasets even shows 1.55 TiB of usage, but when I navigate there via the shell it also doesn't show any files.
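For reference, the snapshot situation can be checked from the shell roughly like this (the pool name is a placeholder):

```
# List every snapshot in the pool, sorted by creation time,
# to see which datasets still have any and how old they are
zfs list -r -t snapshot -o name,creation,used -s creation satassd
```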
 
Last edited:

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
I have a dataset on the SATA SSD zpool showing about 4 GiB used space, but when I navigate there via the shell I can't see any files.
Sounds to me like a dataset/directory clash.

Destroy the empty dataset and you'll probably see your data re-appear. (If there are children, you'll need to rename it instead: handle the conflicting directory by copying over its content, then delete the directory and rename the dataset back.)
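Roughly, the clash can be confirmed like this before touching anything (dataset and path names below are placeholders):

```
# Is the dataset actually mounted, and where should it be?
zfs get mounted,mountpoint satassd/share

# If "mounted" is "no" but this path exists and contains files, those files live
# in a plain directory on the parent dataset, not in the dataset itself
ls -la /mnt/satassd/share
```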
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
EDIT: One of the datasets even shows 1.55 TiB of usage, but when I navigate there via the shell it also doesn't show any files.
The chief suspect in situations like that is a directory with the same name as a dataset.

Fake edit: ninja'd, but be careful:
Destroy the empty dataset and you'll probably see your data re-appear.
Don't destroy any datasets that have some used space (more than a handful of kB), as that's not recoverable.
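A sanity check along these lines before destroying anything (the dataset name is a placeholder):

```
# Only a dataset whose "used" is essentially empty (a few KiB of metadata) is a
# candidate for being destroyed as part of resolving a name clash
zfs get used,usedbydataset,usedbysnapshots satassd/suspect-dataset
```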
 

Chiaki

Explorer
Joined
Apr 4, 2016
Messages
51
Sounds to me like a dataset/directory clash.

Destroy the empty dataset and you'll probably see your data re-appear.

The chief suspect in situations like that is a directory with the same name as a dataset.

Fake edit: ninja'd, but be careful:

Don't destroy any datasets that have some used space (more than a handful of kB), as that's not recoverable.

Does that also explain why I have datasets with files in them that were last modified on 25th September 2022?
What should I do about those then? What if some new data was written there already?

And when I destroy the dataset, could I break anything else by doing so? After destroying it, do I need to recreate it and all of the share and data protection settings?
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Well, the safest option is to rename the datasets, e.g. add "-dset" to the end of the name, to eliminate the collisions. If it goes well, you'll then have everything mounted and can sort through everything yourself.
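Something along these lines, assuming a dataset satassd/share colliding with a directory of the same name (all names are placeholders):

```
# Rename the dataset out of the way of the colliding directory
zfs rename satassd/share satassd/share-dset

# The dataset should now (re)mount under its new name; compare its contents
# with whatever is sitting in the old directory
ls -la /mnt/satassd/share-dset /mnt/satassd/share

# Once the stray directory has been merged or cleared out, the dataset can be renamed back:
# rm -r /mnt/satassd/share && zfs rename satassd/share-dset satassd/share
```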
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
What should I do about those then? What if some new data was written there already?

And when I destroy the dataset, could I break anything else by doing so? After destroying it, do I need to recreate it and all of the share and data protection settings?
I would only destroy completely empty datasets.

You're going to have to work through it to figure out exactly what mess you have there.
 

Chiaki

Explorer
Joined
Apr 4, 2016
Messages
51
Wait a second. After you mentioned the dataset/directory mismatch I tried `zfs mount -a`. It spat out something about read-only for the ix-applications folders but ran through. Then I checked `zfs mount` and saw everything mounted (I should have checked this earlier to see what wasn't).
Then I checked the folder again and now all the files are there.

What happened?
 

Chiaki

Explorer
Joined
Apr 4, 2016
Messages
51
Scrub ran through, nothing repaired, no errors found.

I'd really like to understand what deleted all of the child dataset snapshots and what made the directories show content that looked like the earliest snapshot.

If anyone has an idea of whether and how I could find something about this in the logs, that would be greatly appreciated.

EDIT: Interesting. I have now deleted all snapshots to see if anything would act up, and I'm left with 56 snapshots from ix-applications that can't be deleted because they have "dependent clones", even though I'm trying to delete all snapshots. Is that normal?
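(For reference, something like this should show which clones are holding those snapshots; the pool name is a placeholder. The ix-applications datasets on SCALE apparently use clones internally for container image layers, so leftovers there may be expected.)

```
# Show which clones depend on each remaining snapshot under ix-applications
zfs list -r -t snapshot -o name,clones satassd/ix-applications

# Or for one specific snapshot (placeholder names):
# zfs get clones satassd/ix-applications/some-dataset@some-snapshot
```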
 

Chiaki

Explorer
Joined
Apr 4, 2016
Messages
51
@Ericloewe Now when I restart TrueNAS, the ZFS datasets are not mounted to their respective paths on boot. This happened either due to the power outage or due to the update to 22.02.4.

`zfs get mounted` clearly shows that some of the datasets are not mounted; I need to mount them manually after each restart for some reason. But only most, not all, of the datasets are affected.

Do you have a tip on how to fix that?
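(For reference, this is roughly the kind of check I mean; the pool name is a placeholder.)

```
# Compare mount-related properties across the pool after a reboot; anything with
# canmount=off, mountpoint=none/legacy, or mounted=no is suspect
zfs get -r -t filesystem canmount,mountpoint,mounted satassd
```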
 
Last edited:

Chiaki

Explorer
Joined
Apr 4, 2016
Messages
51
I would only destroy completely empty datasets.

You're going to have to work through it to figure out exactly what mess you have there.
I think I have it the other way round here.

The datasets contain the CURRENT data, while unmounted folders with the same names now exist in the root dataset (containing very old data, for some reason...).

I need to move these folders that are not mounted out of the way, but I have no permission to actually write in the root dataset...

Huh, I am not allowed to create new datasets. It failed with `failed to create mountpoint: Operation not permitted`.

Looks like I had the same issue as here: https://www.truenas.com/community/threads/dataset-permissions.99794/
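If it is the same issue, the usual suspect is the immutable attribute that SCALE sets on mountpoint directories; it can be inspected and, as a workaround, temporarily cleared roughly like this (using my pool's mountpoint as the example path, and the middleware may set the flag again):

```
# Show filesystem attributes on the pool's mountpoint directory; an 'i' means immutable
lsattr -d /mnt/satassd

# Temporarily clear the immutable flag so files/directories can be moved or created there
# (only a workaround for the cleanup; TrueNAS may reapply the flag)
chattr -i /mnt/satassd
```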
 
Last edited:

Chiaki

Explorer
Joined
Apr 4, 2016
Messages
51
Ok so I have two remaining problems:

1) The immutable flag is still being set on the /mnt/satassd "root" dataset mountpoint folder, while the mountpoint folder of the other "root" dataset doesn't get it. Is this an issue?

2) The problematic datasets still don't mount on boot. I already renamed the "old" folders that weren't actually used for mounting and created new directories with the dataset names, but mounting on reboot still doesn't work. I have to call `zfs mount -a` manually, and then it works. Where can I find a list of the ZFS mountpoints that should be mounted on boot, so I can fix this?
 

Chiaki

Explorer
Joined
Apr 4, 2016
Messages
51
So I'm pretty sure this is a permissions issue, because the datasets in question have specific permission settings that all the other datasets, which mount properly on boot, do not have; those others are still "vanilla".

Any idea what kind of permission setting could stop TrueNAS Scale from mounting the datasets on boot properly?
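(In case it helps with diagnosing: this is roughly how I'd compare a working dataset against a problematic one; the dataset names are placeholders.)

```
# Compare ACL/permission-related ZFS properties between a dataset that mounts on boot
# and one that doesn't
zfs get aclmode,acltype,xattr,readonly,canmount,mountpoint satassd/works satassd/broken

# And compare the attributes of the mountpoint directories themselves
lsattr -d /mnt/satassd/works /mnt/satassd/broken
```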
 