AaronLS (Dabbler)
Joined: Jul 9, 2013 · Messages: 18
I know I'll probably have to use a cron job to do this, but basically I want to send incremental snapshots to files, instead of to another ZFS filesystem. This way I can push the snapshot files over any agnostic protocol to any remote system, without requiring that the backup location be a ZFS filesystem. That will allow me a lot of flexibility in how backups are stored.
To clarify by "to file", you can pipe a send to gzip for example:
zfs send tank/fs@snap1 | gzip > /mnt/backupvolume/pool/backup_full_snap1.gz
zfs send -i snap1 tank/fs@snap2 | gzip > /mnt/backupvolume/pool/backup_incremental_snap2.gz
After the initial backup, I can create additional incremental backups with -i. These *.gz files would accumulate on a separate local volume dedicated as a staging area for backups, from which another job, or an external machine accessing it via a share, can grab the files and push them to a remote system or something like Amazon Glacier. Alternatively, the destination could be a mounted external drive, perhaps one of a pair that I rotate out so that one is always offsite.
For example, let's say snap2 was previously the most recent snapshot that was backed up:
zfs send tank/fs@snap1 | gzip > /mnt/backupvolume/pool/backup_full_snap1.gz
zfs send -i snap1 tank/fs@snap2 | gzip > /mnt/backupvolume/pool/backup_incremental_snap2.gz
Since that time, three more snapshots have been generated by a periodic snapshot task: snap3, snap4, and snap5.
The next time my cron job runs, it can look at the backup files and see that *_*_snap2.gz was the last snapshot backed up.
Now the challenging part (at least for me): identifying via script what the most recent snapshot is. I need to programmatically determine that snap5 is the most recent snapshot for that filesystem so that my script can build the incremental command:
zfs send -i snap2 tank/fs@snap5 | gzip > /mnt/backupvolume/pool/backup_incremental_snap5.gz
The well-structured filename, or better yet an accompanying metadata file (txt/xml/json), can be used to determine which snapshot was the last backed up (snap2). The challenge is identifying the name of the most recent snapshot (snap5), which will be the incremental target.
Identifying the name of the most recent snapshot for a given filesystem was a bit of a challenge:
zfs list -H -t snapshot -o name -S creation -d1 TestOne | head -1
I'll have to learn some scripting, but I think I'm on my way.
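Stitching the pieces above together, here's a rough sketch of what I'm picturing for the cron script. The staging path, the filesystem name, and the backup_<type>_<snapname>.gz filename convention are all just my own assumptions:

```shell
#!/bin/sh
# Sketch of the incremental-backup cron job. Paths and the filename
# convention (backup_<type>_<snapname>.gz) are assumptions of mine.

STAGING=/mnt/backupvolume/pool
FS=tank/fs

# The snapshot name is encoded after the last underscore of the filename,
# e.g. backup_incremental_snap2.gz -> snap2
snap_from_file() {
  base=${1##*/}      # strip directory
  base=${base%.gz}   # strip extension
  printf '%s\n' "${base##*_}"
}

main() {
  # Most recently modified backup file tells us the last snapshot sent.
  last_file=$(ls -t "$STAGING"/backup_*_*.gz 2>/dev/null | head -1)
  last_snap=$(snap_from_file "$last_file")

  # Newest snapshot of this dataset (-d1 limits depth to the dataset
  # itself, -S creation sorts newest first).
  newest=$(zfs list -H -t snapshot -o name -S creation -d1 "$FS" | head -1)
  newest_snap=${newest##*@}   # "tank/fs@snap5" -> "snap5"

  # Nothing new to send.
  [ "$newest_snap" = "$last_snap" ] && return 0

  zfs send -i "$last_snap" "$FS@$newest_snap" \
    | gzip > "$STAGING/backup_incremental_${newest_snap}.gz"
}

# Only run when zfs is actually available.
if command -v zfs >/dev/null; then
  main
fi
```

The filename parsing and the @-suffix extraction are just POSIX parameter expansion, so they should work in plain /bin/sh without any extra tools.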
So if you had two external HDs, always keeping one offsite and rotating: drive A might have increments up to snap5 when you disconnect it, take it offsite, and bring in drive B the next day. Drive B only has backups up to snap2. So the cron job will run, see that drive B's most recent backup is snap2 and that the filesystem's most recent snapshot is snap5 (or maybe snap7, if more snapshots have occurred that day), and generate an increment file between those two snapshots.
I know you can put a ZFS filesystem right on the drive, but using files provides some flexibility. As I mentioned, I could point the destination of the script to a staging location and some other process could pick the files up. I can connect my external hard drive to other non-ZFS/non-Unix systems and copy the backup files elsewhere. Files are more intuitive to work with, and they minimize the risk that one might do something catastrophic during a restore that blows away the ZFS filesystem, leaving you without a backup or having to choose a less ideal backup. I'm not even sure, if you stream snapshots to a ZFS filesystem on a USB hard drive and then connect that drive to another system, what that would look like, or whether it could connect at all. If you are in a catastrophic situation where a restore from backup is needed, you probably want the comfort of first making a copy of your restore media before actually doing the restoration.
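For the restore itself, I'm picturing something like replaying the full stream first and then each incremental in order. The tank/fs_restored dataset name here is just a placeholder of mine:

```shell
# Restore sketch: pipe each .gz stream back into zfs receive, oldest
# first (full stream, then incrementals in creation order). The
# destination dataset name is a placeholder.
restore_all() {
  dest=$1; shift
  for f in "$@"; do
    gunzip -c "$f" | zfs receive "$dest" || return 1
  done
}

# Usage (order matters: full, then incrementals):
# restore_all tank/fs_restored \
#   backup_full_snap1.gz \
#   backup_incremental_snap2.gz \
#   backup_incremental_snap5.gz
```

Getting the order right is exactly why I want the creation date in the metadata, per my notes below.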
Questions:
1) Does ZFS include checksums and/or redundancy in the send stream? I.e., later when restoring a stream:
1.a) Can it verify the stream's integrity (detect data not matching its checksum)?
1.b) Can it repair corruption? If not, then I need to generate par2 files along with my backups to guard against small amounts of corruption.
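If it turns out ZFS can't repair a corrupted stream, something like this par2 wrapper is what I have in mind. The -r10 redundancy level (roughly 10% parity data) is an arbitrary pick of mine, and par2 has to be installed separately:

```shell
# Sketch: wrap each backup file in ~10% parity data so small amounts
# of corruption can be repaired later. -r10 is an arbitrary choice.
parity_create() {
  # creates <file>.par2 plus recovery volumes alongside the backup file
  par2 create -r10 "$1"
}

parity_check() {
  # verify first; only attempt a repair if verification fails
  par2 verify "$1.par2" || par2 repair "$1.par2"
}

# Example:
# parity_create /mnt/backupvolume/pool/backup_incremental_snap2.gz
# parity_check  /mnt/backupvolume/pool/backup_incremental_snap2.gz
```

The .par2 files are just more ordinary files, so they travel over the same agnostic protocols as the .gz streams.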
2) Once I get this working, how do I ensure my cron setup is included with Settings->Save Config? I assume the cron job will be restored from the config, but the script file will not? Or is there a specific folder where I can drop my script and have Save Config include it? I imagine I'd need to manually restore my script file in the event of a system restore, but it would be nice if that could be automated.
3) In FreeBSD, if a script or process needs to write data, like logs or settings, where is the appropriate directory for this? I'm thinking in terms of systems where the location the processes execute from is not writable by most users, and there is some other designated location for them to write data.
Notes:
- A metadata file in the staging/destination area of the *.gz files will be more reliable than embedding metadata in the filename, but more complicated to script (at least for me).
- Will also need the original creation date, or items ordered in some way, to ensure restoration happens in the same order regardless of names.
-- Test whether ZFS will error on incorrectly ordered incremental snapshot receives.
- A metadata file would allow the *.gz to be deleted after being staged, such as when uploaded to Glacier. The metadata file would remain and the cron job could still see what the last backup was.
-- Beware of file read/write contention with processes on the staging system.
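For the metadata-file idea, a sketch of a sidecar writer. The JSON field names here are just made up by me, not any established format:

```shell
# Sketch: write a JSON sidecar next to each backup so the .gz can be
# deleted after upload while the "last snapshot sent" record remains.
# Field names are my own invention, not an established format.
write_meta() {
  # $1 = sidecar path, $2 = filesystem, $3 = snapshot, $4 = base snapshot
  cat > "$1" <<EOF
{
  "filesystem": "$2",
  "snapshot": "$3",
  "incremental_from": "$4",
  "created": "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
}
EOF
}

# Example:
# write_meta /mnt/backupvolume/pool/backup_incremental_snap5.json \
#   tank/fs snap5 snap2
```

The cron job could then read the newest sidecar instead of globbing *.gz filenames, which sidesteps the problem of the .gz files having already been deleted after upload.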