BetYourBottom
Contributor
Joined: Nov 26, 2016
Messages: 141
I've been thinking about this quite a bit lately. It seems most backup solutions take forever to scan your files for changes, upload everything every time, or don't allow easy versioning (Rclone doesn't seem to have any versioning; Duplicati takes a long time to compile and send files for larger datasets).
I was wondering if it might be possible to send ZFS snapshots directly to a cloud backup. Since snapshots are virtually instant, all the change detection is handled by ZFS itself, so you don't need lengthy checksumming or inaccurate timestamps. Snapshots can be diffed using tools built into ZFS by default, so filtering out only the changed data is handled for free as well, and versioning just comes with the territory when dealing with snapshots. So it seems like if there were a way to directly upload snapshots, that would drastically simplify backups.
Based on all the reading I've been doing lately on ZFS and RClone, I came up with a command that I think should work (I haven't been able to test it yet; I'm moving drives and will use the freed ones later for a test pool). I'd like some feedback on whether something like this could work as simply as I think it may.
zfs send pool@longestlife | gzip | gsplit --bytes=10M --filter='rclone rcat remote:path/to/$FILE'
The first command sends a full snapshot to stdout, with longestlife referring to the periodic snapshot that expires the latest (I would like to integrate this with periodic snapshotting) to create a baseline. gzip then compresses the data stream produced by zfs send and feeds it into gsplit; gsplit chops the stream into 10MB chunks and hands each one to rclone to upload to a path on your remote. (Note the single quotes around the --filter argument: $FILE has to reach gsplit unexpanded, since gsplit sets it to each chunk's name.)
My current concern is that this might either swamp your RAM by doing everything through stdout, and/or spin up thousands of rclone instances at once, causing other performance issues. I'm not an expert on how these commands behave over pipes, but in the ideal case there would be some blocking, so that only so many rclone instances are open at a time and no giant RAM cache gets created.
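To make that concrete, here's roughly the script I have in mind for the baseline upload. Untested, and tank, longestlife, and the remote path are all placeholders. As far as I can tell from the coreutils docs, split runs the --filter command once per chunk and waits for it to finish before starting the next, which would give exactly the blocking behaviour I'm hoping for, but I want to verify that.

#!/bin/bash
# Untested sketch of the baseline upload -- POOL, SNAP, and REMOTE are
# placeholders for my pool, the long-lived snapshot, and the rclone remote.
set -o pipefail

POOL=tank
SNAP=longestlife
REMOTE=remote:zfs-backup/$POOL/$SNAP

# Full send, compressed, chopped into 10M chunks. gsplit exports each
# chunk's name (xaa, xab, ...) as $FILE, so $FILE is single-quoted to
# keep this shell from expanding it before gsplit does.
zfs send "$POOL@$SNAP" | gzip |
  gsplit --bytes=10M --filter="rclone rcat $REMOTE/"'$FILE' -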
If this works out, then for newer versions the only things that would need to change are the remote directory and the zfs send command, which would become something like
zfs send -i pool@longestlife pool@latestsnap
to send a diff of the two snapshots.
To do a full restore, I would think something like this would work:
rclone cat remote:path/to/dir | zcat | zfs receive pool/dataset
Basically the reverse of the previous command, and it would need to be run on each of the uploaded snapshots as well, starting from the baseline.
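Sketched out as a script (untested; the target dataset and the incremental snapshot names are made up, and I'm assuming rclone cat returns a directory's files in name order so the chunks reassemble correctly -- if it doesn't, each chunk would have to be catted individually in sorted order):

#!/bin/bash
# Untested restore sketch -- remote path, target dataset, and snapshot
# names are placeholders.
set -o pipefail

REMOTE=remote:zfs-backup/tank
TARGET=tank/restored

# Baseline first: concatenate the chunks, decompress, receive.
rclone cat "$REMOTE/longestlife" | zcat | zfs receive "$TARGET"

# Then replay each uploaded incremental on top, oldest first.
for snap in daily1 daily2; do    # hypothetical snapshot names
  rclone cat "$REMOTE/$snap" | zcat | zfs receive "$TARGET"
done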
Issues I can see with this are:
- It might be difficult to rebase the backup onto a newer snapshot. Say you have a snapshot that lasts for 1 year: this backup solution would work until it expires, but at that point the only way I can see it working would be to do a fresh full upload based on the newest 1-year snapshot.
- I don't know how a partial restore would work. I'm not familiar enough with zfs send and zfs receive to know how partial restores work with them, let alone how one would work through this upload scheme.
- I don't know how rclone deals with errors during transfer, how much it confirms file integrity, and how an error would propagate back to stop the backup should an issue arise (one idea for at least catching pipeline failures is sketched below).
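On that last point, the one mitigation I know of is bash's pipefail option: the pipeline's exit status then reflects a failure in any stage rather than just the last one, so the script could at least abort loudly instead of silently recording a broken backup. What I still don't know is how gsplit reacts when its filter command (rclone here) fails mid-run; that needs testing. A sketch:

#!/bin/bash
# With pipefail, a failure in zfs send, gzip, or gsplit makes the whole
# pipeline return non-zero instead of only reporting the last stage.
set -o pipefail

if ! zfs send tank@longestlife | gzip |
     gsplit --bytes=10M --filter='rclone rcat remote:zfs-backup/tank/longestlife/$FILE' -
then
  echo "backup pipeline failed, aborting" >&2
  exit 1
fi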