How to prevent unneeded snapshots?

Status
Not open for further replies.

toolforger

Explorer
Joined
Aug 8, 2017
Messages
60
Hi all,

I have a NAS box used by multiple users for archiving their data.
Each user gets their own volume.

I want operation to be as hands-free as possible.
Ideally, it would be enough to just upload the files to an SMB share or rsync them over, and the box would create a snapshot once the upload is done.

I don't know if this ideal is attainable at all (I doubt it; neither SMB nor rsync has a straightforward way to signal the end of an upload to the target system, it's just a series of file updates).
I considered using a really short snapshot frequency, but that gives a ton of snapshots, most of which don't contain any value. Also, if you take a snapshot every five minutes or so, you'll likely end up with a snapshot that contains just half of an upload; individual files will be consistent sure enough (at least that's my understanding of how ZFS works), but if the snapshot happens while a set of files is being uploaded, some files may be included in that particular snapshot while those not yet written will not be.
Next I thought about triggering snapshots manually through the web console, but that's too many clicks. Worse, users will perceive this as an environment where they could damage things, which will make them feel insecure, and that's never a good thing.
Currently I'm considering shell scripting: set up a cron job, let the script call ZFS tools to detect whether a volume has changed (does such a tool exist?), create a snapshot if the conditions are right (here I know what to do), and tell FreeNAS that a new snapshot exists (what command would that be?). I'm not sure whether this is a good idea though, so any feedback is appreciated.
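
Roughly, I'm picturing something like this for the cron script (untested sketch; the dataset name and snapshot prefix are placeholders, and I'm assuming ZFS's 'written' property is a reasonable way to detect changes since the last snapshot):

Code:
#!/bin/sh
# Sketch: snapshot a dataset only if it has changed since its newest snapshot.
DATASET="data/user_1"   # placeholder

# 'written' reports the bytes written to the dataset since its most recent snapshot.
WRITTEN=$(zfs get -H -p -o value written "$DATASET")

if [ "$WRITTEN" -gt 0 ]; then
    zfs snapshot "${DATASET}@manual-$(date +%Y%m%d-%H%M%S)"
fi

As far as I can tell, snapshots created on the command line show up in the FreeNAS web UI on their own, since the UI lists whatever ZFS reports, so the "tell FreeNAS" part may not need anything at all.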
Are there better/simpler approaches?

Regards,
Jo
 

garm

Wizard
Joined
Aug 19, 2017
Messages
1,556
When it comes to snapshots, all that matters is their age. If you make a change and then take three periodic snapshots, all of them will contain the change. Remove or damage a file and you can go into the latest snapshot and retrieve it. If you delete any two of the three snapshots, you can still retrieve the damaged file from the one remaining. Having too many snapshots can be an issue, though. I tend to deploy a schedule similar to Apple's Time Machine: a snapshot every 15 minutes that is removed after 24 hours, and one every hour that is kept for 30 days.

When it comes to backups, I don't rely on periodic snapshots; those are mainly for IBM errors or for when some software craps its pants (Word, I'm looking at you). For backup I have a script run by cron that, simplified, takes a snapshot, clones it, and mounts the clone into a jail. The jail then compresses, encrypts, and sends the data off to cold storage in peace and quiet (cold storage is not ZFS) while the production system carries on. The snapshot is deleted when the clone is destroyed.
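
Simplified even further, the flow is roughly this (rough sketch; the dataset, snapshot naming, clone, jail name, mountpoint, and the compress/encrypt/ship pipeline are all placeholders that depend on your setup):

Code:
#!/bin/sh
# Sketch: snapshot -> clone -> let a jail ship the data -> clean up.
DATASET="data/user_1"                        # placeholder
SNAP="${DATASET}@backup-$(date +%Y%m%d)"     # placeholder naming
CLONE="data/backup_work"                     # placeholder
JAIL="backupjail"                            # placeholder

zfs snapshot "$SNAP"
# Mount the clone inside the jail's root so the jail sees it as /work.
zfs clone -o "mountpoint=/mnt/data/jails/${JAIL}/work" "$SNAP" "$CLONE"

# Inside the jail: compress, encrypt, and ship off to cold storage (details vary).
jexec "$JAIL" sh -c 'tar cf - /work | gzip | gpg -e -r backup-key | ssh coldstore "cat > archive.tar.gz.gpg"'

# The clone must go first; the snapshot can only be destroyed once its clone is gone.
zfs destroy "$CLONE"
zfs destroy "$SNAP"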

Don't, in any scenario, let users into the web UI; it runs as root and yeah, they can do real damage there.

You can of course run a script that detects changes to a dataset, but that will trigger on the first new block, so it won't "solve" anything.

One solution is to take frequent short-lived snapshots alongside infrequent long-lived ones and prune out the snapshots that have zero changed blocks. Finding a specific change gets easier, but you still won't have a guaranteed complete transfer.
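
The pruning part can be scripted fairly easily (sketch; the dataset name and the @auto- prefix are placeholders, and it treats a snapshot whose 'used' is zero as empty, which is a common heuristic rather than a strict guarantee):

Code:
#!/bin/sh
# Sketch: destroy periodic snapshots that reference no unique blocks.
DATASET="data/user_1"   # placeholder

zfs list -H -p -t snapshot -o name,used -r "$DATASET" |
while read -r SNAP USED; do
    case "$SNAP" in
        *@auto-*)   # only touch the short-lived periodic snapshots
            [ "$USED" -eq 0 ] && zfs destroy "$SNAP"
            ;;
    esac
done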

The only real way of ensuring that each snapshot contains a complete transfer is to pull the archive from the user's machine. Then you can easily script the snapshot to be taken when the pull is complete. This limits you to running rsync or something equivalent, but that might not be a bad thing. Users should not have write access to backups due to ransomware anyway. Depending on what the content is, I would also recommend you take a look at a version control system like git. It only really works on text-based files, but there are things like git-annex to work around that.
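
For the pull-based variant, the wrapper really is just "rsync, then snapshot on success" (sketch; host, paths, and dataset name are placeholders):

Code:
#!/bin/sh
# Sketch: pull a user's archive, then snapshot only if the transfer succeeded.
SRC="user1@client-pc:archive/"   # placeholder
DST="/mnt/data/user_1/"          # placeholder
DATASET="data/user_1"            # placeholder

if rsync -a --delete "$SRC" "$DST"; then
    zfs snapshot "${DATASET}@pull-$(date +%Y%m%d-%H%M%S)"
fi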

Now to the other issue: you have a volume set up per user?! That seems an awfully inefficient use of hard drives. Why on earth do you not have one pool for all storage and individual datasets for users? That way all the storage benefits all the users and there is no duplication of redundancy.
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
They never mentioned backups (though that's a logical assumption). @toolforger, what are you trying to do? It sounds like you need an rsync wrapper that will use the FreeNAS API to trigger snapshots. I don't know how that would be implemented, and it would likely be an extreme security risk.

I work in an environment with tens of thousands of users all over the state. People accidentally delete files or move them to black holes all the time. We take 15-minute snapshots for something like 6 hours, then hourly for 48 hours, daily for 14 days, and so on. Many of these files are shared across large teams, too. The upshot is that 98% of the files are Word and Excel, and they get opened as temp copies with locks placed on the original. After hundreds of restores I have never had an issue with partial files. Generally, if you damage a file mid-upload, you deserve to lose the file. ;)
 

toolforger

Explorer
Joined
Aug 8, 2017
Messages
60
Don't, in any scenario, let users into the web UI; it runs as root and yeah, they can do real damage there.

Good to get that confirmed.

One solution is to take frequent short-lived snapshots alongside infrequent long-lived ones and prune out the snapshots that have zero changed blocks.

Okay, that covers at least one scenario: a user periodically shoves a bunch of files onto the SMB share, manually. The user will then see batches of snapshots, and can be told that all but the last snapshot in a batch will contain partial uploads.
The partial-upload snapshots are still noise and it would still be nice to get rid of them, but at least the zero-change snapshots are gone.

Finding a specific change gets easier, but you still won’t have a guaranteed complete transfer.

No, but at least the users can identify the finished transfer.

Users who do a lot of small incremental changes will be less happy, but you can't have everything.
Plus it remains to be seen whether they will actually use FreeNAS in that fashion.

The only real way of ensuring that each snapshot contains a complete transfer is to pull the archive from the user's machine.

Some of the machines run Windows, where exclusive file locks are somewhat commonplace. Which is just as well, because such a file is likely to be half-changed anyway.
With Linux machines, I'll get half-changed files with no questions asked, which isn't much better.

So what I'd like is for users to explicitly trigger an upload. Windows users will have to be educated to close all applications that hold exclusive locks (they will be told it's a backup, though it's really more of a Time Machine-like operation, and the real backup eventually goes to cold storage).

Then you can easily script the snapshot to be taken when the pull is complete. This limits you to running rsync or something equivalent, but that might not be a bad thing.

Pretty much my thinking, yes.

Users should not have write access to backups due to ransomware anyway.

100% spot on, I think.
FreeNAS will get read access to the client machines, and client machines will get read access to FreeNAS to access those historic snapshots.
This leaves me exposed to password-sniffing attacks, but at least ransomware won't be able to wipe historic snapshots willy-nilly.

Depending on what the content is, I would also recommend you take a look at a version control system like git. It only really works on text-based files, but there are things like git-annex to work around that.

Well, yeah, git would work, too, but the tooling isn't exactly friendly for non-techies (which some users are).

Now to the other issue: you have a volume set up per user?! That seems an awfully inefficient use of hard drives. Why on earth do you not have one pool for all storage and individual datasets for users? That way all the storage benefits all the users and there is no duplication of redundancy.

"View volumes" gives me this structure:

Code:
data
  data
	jails
	jails_2 (probably an effect of Freenas importing the same disks twice)
	user_1
	user_2
	...


I've been thinking that jails, jails_2, and user_# are the volumes, but maybe the top-level "data" item is the volume and what I've been identifying are already the datasets. (FreeNAS unfortunately doesn't tell me which is which.)
 

anmnz

Patron
Joined
Feb 17, 2018
Messages
286
"View volumes" gives me this structure:

Code:
data
  data
	jails
	jails_2 (probably an effect of Freenas importing the same disks twice)
	user_1
	user_2
	...


I've been thinking that jails, jails_2, and user_# are the volumes, but maybe the top-level "data" item is the volume and what I've been identifying are already the datasets.

The first "data" is the ZFS pool, which the FreeNAS UI unfortunately calls a "volume". (IIRC @Ericloewe has recently said this is being changed.)

The second "data" is the pool's top-level dataset. Usually a pool has a top-level dataset with the same name as the pool, in which all its other datasets are nested.

The other entries are datasets nested inside the "data" dataset.
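
If it helps, the same hierarchy is easier to see from a shell (illustrative output; sizes omitted, and the mountpoints assume FreeNAS defaults):

Code:
zfs list -r -o name,mountpoint data
NAME          MOUNTPOINT
data          /mnt/data           <- the pool's top-level dataset
data/jails    /mnt/data/jails     <- nested dataset
data/jails_2  /mnt/data/jails_2   <- nested dataset
data/user_1   /mnt/data/user_1    <- nested dataset
data/user_2   /mnt/data/user_2    <- nested dataset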
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Usually a pool has a top-level dataset with the same name as the pool
Not usually, always. The top-level dataset is named after the pool. Or vice versa, really; you can't have one without the other.
 