Periodic snapshot backups to file

Status
Not open for further replies.

AaronLS

Dabbler
Joined
Jul 9, 2013
Messages
18
I know I'll probably have to use a cron job to do this, but basically I want to send incremental snapshots to files instead of to another ZFS filesystem. This way I can push the snapshot files over some agnostic protocol to any remote system, without the requirement that the backup location be a ZFS filesystem. It will allow me a lot of flexibility in how backups are stored.

To clarify what I mean by "to file": you can pipe a send to gzip, for example:
zfs send tank/fs@snap1 | gzip > /mnt/backupvolume/pool/backup_full_snap1.gz

zfs send -i snap1 tank/fs@snap2 | gzip > /mnt/backupvolume/pool/backup_incremental_snap2.gz

After the initial backup, I can create additional incremental backups with -i. These *.gz files would accumulate in a separate local volume dedicated as a staging area for backups, from which another job or an external machine accessing via share can grab the files and push them to remote system or something like Amazon glacier. Alternatively the destination could be a mounted external drive, perhaps one of a pair that I rotate out and always have one offsite.

For example, let's say snap2 was previously the most recent snapshot that was backed up:
zfs send tank/fs@snap1 | gzip > /mnt/backupvolume/pool/backup_full_snap1.gz
zfs send -i snap1 tank/fs@snap2 | gzip > /mnt/backupvolume/pool/backup_incremental_snap2.gz
Since that time, three more snapshots have been generated by a periodic snapshot task: snap3, snap4, snap5.
The next time my cron job runs, it can look at the backup files and see that *_*_snap2.gz was the last snapshot backed up.
Now the challenging part (at least for me): identifying via script what the most recent snapshot is. I need to programmatically determine that snap5 is the most recent snapshot for that pool so that my script can build the incremental command:
zfs send -i snap2 tank/fs@snap5 | gzip > /mnt/backupvolume/pool/backup_incremental_snap5.gz
A well-structured filename, or better yet an accompanying metadata file (txt/xml/json), can be used to determine which snapshot was the last one backed up (snap2). The challenge is identifying the name of the most recent snapshot (snap5) that will be the incremental target.
Identifying the name of the most recent snapshot for a given filesystem was a bit of a challenge:
zfs list -H -t snapshot -o name -S creation -d1 TestOne | head -1
I'll have to learn some scripting, but I think I'm on my way.
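A minimal sketch of such a cron script, assuming the filename convention from the examples above (the dataset name, paths, and the RUN_BACKUP guard are my own placeholders):

```shell
#!/bin/sh
# Hypothetical incremental-backup script; dataset and paths are examples.
DATASET="tank/fs"
DEST="/mnt/backupvolume/pool"

# Parse the snapshot name out of a backup filename, e.g.
# backup_incremental_snap2.gz -> snap2
snap_from_filename() {
    basename "$1" .gz | sed 's/^backup_[a-z]*_//'
}

# Most recently backed-up snapshot: newest *.gz in the staging area
# (assumes file mtimes reflect backup order).
last_backed_up() {
    ls -t "$DEST"/backup_*.gz 2>/dev/null | head -1
}

# Most recent snapshot of the dataset, by creation time.
latest_snapshot() {
    zfs list -H -t snapshot -o name -S creation -d1 "$DATASET" \
        | head -1 | cut -d@ -f2
}

if [ -n "${RUN_BACKUP:-}" ]; then
    FROM=$(snap_from_filename "$(last_backed_up)")
    TO=$(latest_snapshot)
    if [ -n "$FROM" ] && [ "$FROM" != "$TO" ]; then
        zfs send -i "$FROM" "$DATASET@$TO" | gzip \
            > "$DEST/backup_incremental_$TO.gz"
    fi
fi
```

The helpers are plain string/listing logic, so they can be tested without a pool; only the guarded block at the bottom actually touches ZFS.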
So if you had two external HDs, always keeping one offsite and rotating: drive A might have increments up to snap5; you disconnect it, take it offsite, and bring in drive B the next day. Drive B only has backups up to snap2. So the cron job will run, see that drive B's most recent backup is snap2 and that the filesystem's most recent snapshot is snap5 (or maybe snap7 if more snaps have occurred in the day), and generate an increment file between those two snapshots.
I know you can put a ZFS filesystem right on the drive, but using files provides some flexibility. As I mentioned, I could point the destination of the script at a staging location and some other process can pick the files up. I can connect my external hard drive to other non-ZFS/non-UNIX systems and copy the backup files elsewhere. Files are more intuitive to work with, and they minimize the risk of doing something catastrophic during a restore that blows away the ZFS filesystem, leaving you without a backup or having to settle for a less ideal backup. I'm also not sure what it would look like if you streamed snapshots to a ZFS filesystem on a USB hard drive and then connected that drive to another system, or whether it could connect at all. If you are in a catastrophic situation where a restore from backup is needed, you probably want the comfort of first making a copy of your restore media before actually doing the restoration.
Questions:
1) Does ZFS include checksums and/or redundancy in the send stream? I.e., later when restoring a stream:
1.a) Can it verify the stream's integrity (check whether the data matches the checksum)?
1.b) Can it repair corruption? If not, then I need to generate par2 files along with my backups to guard against small amounts of corruption.
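If repair turns out not to be possible, the par2 approach could look like this (a sketch, assuming the par2 utility, e.g. from the archivers/par2cmdline port, is installed; the filename is from the examples above):

```shell
# Create ~10% recovery data alongside the backup file:
par2 create -r10 /mnt/backupvolume/pool/backup_incremental_snap5.gz

# Later, check the file against the recovery data:
par2 verify /mnt/backupvolume/pool/backup_incremental_snap5.gz.par2

# If verification reports damage, attempt a repair:
par2 repair /mnt/backupvolume/pool/backup_incremental_snap5.gz.par2
```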
2) Once I get this working, how do I ensure my cron setup is included with Settings -> Save Config? I assume the cron job will be restored from the config but the script file will not? Or is there a specific folder where I can drop my script and have Save Config include it? I imagine I need to manually restore my script file in the event of a system restore, but it would be nice if that could be automated.
3) In FreeBSD, if a script or process needs to write data, like logs or settings, where is the appropriate directory for this? I'm thinking in terms of systems where the location the processes are executed from is not writable by most users, and there is some other designated location for them to write data.
Notes:
- A metadata file in the staging/destination directory of the *.gz files will be more reliable than embedding metadata in the filename, but more complicated to script (at least for me).
- Will also need original creation dates, or items ordered in some way, to ensure restoration happens in the same order regardless of names.
-- Test whether ZFS will error on receiving incremental snapshots in the wrong order.
- A metadata file would allow the *.gz to be deleted after being staged, such as when it is uploaded to Glacier. The metadata file would remain, and the cron job could still see what the last backup was.
-- Beware file read/write contention with processes on the staging system.
 

Dusan

Guru
Joined
Jan 29, 2013
Messages
1,165
1: As far as I know (but I didn't test it), ZFS is able to detect corruption in the stream -- the ZFS receive will fail (but it will not correct the data). However, to be 100% sure, you can test it yourself: save an uncompressed stream, modify a random byte, and try to ZFS receive it.
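That test could be sketched like this (the pool/dataset names and the byte offset are arbitrary examples; expect the receive to abort with a checksum error if detection works):

```shell
# Take a snapshot and save the raw (uncompressed) send stream to a file:
zfs snapshot tank/fs@streamtest
zfs send tank/fs@streamtest > /tmp/stream.bin

# Corrupt a single byte somewhere in the middle of the stream:
printf 'X' | dd of=/tmp/stream.bin bs=1 count=1 seek=100000 conv=notrunc

# Try to receive the damaged stream into a scratch dataset;
# if ZFS checksums the stream, this should fail rather than
# silently restoring bad data:
zfs receive tank/streamtest_restore < /tmp/stream.bin
```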
2: As far as I know, you cannot include a script in the config DB. I keep my scripts in the ZFS pool. However, as I spin down the disks and I do not want cron to wake them up, I have cp -r /mnt/tank/scripts /tmp set as a post-init command. I then point cron to execute the scripts from /tmp/scripts.
3: You can use /tmp (or /var/tmp) for temporary data; configs are usually in /etc and logs in /var/log (http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/dirstructure.html). However, FreeNAS clears (/tmp, /var/log) or recreates (/etc) those directories on startup. Also, /etc and /var/log are usually only writable by root/wheel. If you need logging from a script, you can use the logger command. Running logger "Your message" will log the message via syslogd (in FreeNAS, it will appear in /var/log/messages) (http://www.freebsd.org/cgi/man.cgi?query=logger).
 

Dusan

Guru
Joined
Jan 29, 2013
Messages
1,165
To keep track of which snapshots you have already processed, which one is the latest, and so on, you can store the data directly in the snapshots as ZFS properties. ZFS allows you to create your own properties for datasets & snapshots, in addition to the "system" properties such as compression, quota, etc. You can use zfs set & zfs get to set/read the data. You can also use zfs snapshot -o to set a property when creating a snapshot (http://www.freebsd.org/cgi/man.cgi?query=zfs).
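For example (the backup:state property name here is my own invention -- user property names just need to contain a colon; dataset and snapshot names are placeholders):

```shell
# Create a snapshot with a custom property already set:
zfs snapshot -o backup:state=NEW tank/fs@snap6

# Update the property later, then read it back:
zfs set backup:state=LATEST tank/fs@snap6
zfs get -H -o value backup:state tank/fs@snap6

# Find the snapshot currently marked LATEST:
zfs list -H -d1 -t snapshot -o name,backup:state tank/fs | \
    awk '$2 == "LATEST" { print $1 }'
```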
You can take a look at /usr/local/www/freenasUI/tools/autosnap.py and /usr/local/www/freenasUI/tools/autorepl.py to see how the FreeNAS automatic snapshot/replication tasks do it. The scripts create/use the freenas:state property. This comment is taken from autorepl.py:

Code:
# DESIGN NOTES
#
# A snapshot transists its state in its lifetime this way:
#   NEW:    A newly created snapshot by autosnap
#   LATEST: A snapshot marked to be the latest one
#   -:      The replication system no longer cares this.
#
 

Dusan

Guru
Joined
Jan 29, 2013
Messages
1,165
Another bit of information you may find useful.
FreeNAS uses /data to store information that should persist across reboots. The config DB (freenas-v1.db), the reporting graph history (rrd_dir.tar.bz2), the disk encryption keys (geli/*.key), ... are all stored there. However, be careful if you decide to use this directory: the partition that's mounted as /data can only hold 19MB of data. Do not run out of free space there or bad things will happen.
 

vsespb

Cadet
Joined
Sep 6, 2013
Messages
1
With this tool, https://github.com/vsespb/mt-aws-glacier, you can send snapshots directly to Amazon Glacier (i.e. it uploads from STDIN, without intermediate files, with multithreading/multipart functionality available). It will also manage metadata for you (Glacier metadata and a local metadata cache, a tab-delimited file). The tool is tested under *BSD.
 

rstark

Cadet
Joined
Apr 11, 2016
Messages
6
AaronLS and Dusan, thank you both! I've been looking to do something very similar as far as creating automatic snaps and replicating off-site. Scripting an intelligent method of detecting the most recent snap was really throwing me, but I think you've both given me some info to work from.

Thanks!!! :D
 