Looking for Testers/Feedback on backup software for ZFS


someone1

Dabbler
Joined
Jun 17, 2013
Messages
37
Hello all,

Let me preface this by saying that I have not found a piece of software that does what I am trying to do.

I am halfway through writing version 1 of a backup utility for ZFS systems. The idea was inspired by a file-level backup solution I currently use, duplicity, for my NAS share sitting on top of FreeNAS. I wanted a way to leverage the built-in ZFS snapshot capabilities to do a block-level backup of my share instead of a file-level backup, which is prone to issues (ACLs, static copies of data, high memory usage for large incremental backups, slowness, etc.).

This backup software is designed for the secure, long-term storage of ZFS snapshots on remote storage. Backup jobs are resilient to network failures and can be stopped and resumed. It works by splitting the ZFS send stream (the format of which is committed and can be received on future versions of ZFS, per the man page) into chunks and then optionally compressing, encrypting, and signing each chunk before uploading it to your remote storage location(s) of choice. For version 1 I plan on supporting Google Cloud Storage and AWS S3.

Backup chunks are validated using SHA256 and CRC32C checksums (along with the many integrity checks built into compression algorithms, SSL/TLS transport protocols, and the ZFS stream format itself). The compression algorithm built into the software is a parallel gzip compressor, but I have written support for third-party compressors, as long as the binary is available on the host system and is compatible with the gzip command line (e.g. xz, bzip2, lzma, etc.). I use the PGP algorithm for encryption/signing if required.

The result is similar to piping the zfs send command to a compression binary and then to a PGP binary such as gpg. The benefit of this software is that it manages the entire process for you: it splits the stream into chunks for parallel upload (and the ability to resume a backup after a failure or cancellation), uploads to multiple storage locations, and tracks the entire process with useful information and checksums for later review and retrieval.

I can provide more information to those who are interested. There are NO dependencies required to run this software (written in Go). It can be compiled to any target supported by the Go compiler which includes FreeBSD, Linux, and Solaris. All you need is a single file to execute and run the software. Please PM me if you are interested and tell me a little bit about your use-case.

I'd like to gauge community interest and see if anyone is willing to help test the software with me. I will post the source code up on GitHub once it has some basic functionality coded out and tested, and I welcome the help of other developers who find the project interesting/useful. As of right now, I have coded the backup (send) half of the software. What remains is the restore (receive) half of the software. I have many plans to make this a robust piece of software and include a custom backup scheduler for advanced snapshot/backup configurations.

On a side note: it wouldn't be difficult to add BTRFS support, as its send/receive mechanisms are similar to the ones in ZFS.
 

m0nkey_

MVP
Joined
Oct 27, 2015
Messages
2,739
This does sound like a very cool project. I'm certainly interested to follow its progress. I hope you continue to post updates on it! :)
 

Valdhor

Explorer
Joined
Feb 29, 2016
Messages
70
I'm liking the sound of this project. Keep up the good work.

Is there any way to make it generic? E.g., say I wanted to back up to local storage, like a tape backup library?
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
This looks pretty interesting. Can you send me some info?
 

someone1

Dabbler
Joined
Jun 17, 2013
Messages
37
Is there any way to make it generic? E.g., say I wanted to back up to local storage, like a tape backup library?

Yes, the backend storage interface is designed so that any backend implementing the basic operations (put/get/list/delete) can easily be added. It should be fairly trivial to add support for copying to a local filesystem. Local filesystem and Azure support might be included in version 1, since adding backends isn't difficult.
 

JustinClift

Patron
Joined
Apr 24, 2016
Messages
287
@someone1 How's this going? I'd be interested in trying this out with Minio. It's a popular S3 compatible server people run themselves (eg locally on a LAN), also written in Golang.
 

manfromafar

Dabbler
Joined
Mar 2, 2015
Messages
13
Damn it, we just need AWS or GCP or X.cloudprovider to support ZFS send/receive. Stuck with tarsnap for now.
 

someone1

Dabbler
Joined
Jun 17, 2013
Messages
37
My apologies, I did not get updates on any posts here. I've been using this internally, traveling right now but will post a proper update once I return.

Thanks.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Sounds to me like it doesn't support incremental snapshots though... right?

I'm interested in a backup script which can reliably transfer the replication stream across a flaky network to another ZFS system, and then repeat the trick for the next incremental.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Sounds to me like it doesn't support incremental snapshots though... right?

I'm interested in a backup script which can reliably transfer the replication stream across a flaky network to another ZFS system, and then repeat the trick for the next incremental.
Sure it does: it has the option to do zfs send/recv, which is incremental by nature.
 

sef

Guest
Sounds to me like it doesn't support incremental snapshots though... right?

I'm interested in a backup script which can reliably transfer the replication stream across a flaky network to another ZFS system, and then repeat the trick for the next incremental.
Mine does, and it sounds like @someone1's does as well.
 

someone1

Dabbler
Joined
Jun 17, 2013
Messages
37
Hi all, quick update:

I didn't make this public because I wanted to clean it up a bit but never got around to it. It's still not perfect (I've never "open sourced" anything before) but you can find the code here: https://github.com/someone1/zfsbackup-go

@sef - I think my project is a bit further along than yours though you have a few things that I don't (e.g. SSH targets) - maybe we can have a discussion on whether or not it makes sense to combine forces?

Some highlights on my version vs the work @sef has done:
  • Restore implemented
  • Multiple destinations when backing up
  • Coded in Go vs. Python (no external dependencies, compile-time correctness, actual concurrency/parallel processing)
If anyone wants to give this a spin, I've uploaded a compiled binary for FreeBSD on my Google Drive here: https://drive.google.com/file/d/0B0ffdf8VkAL6dVAwWWR2XzZ4QnM/view?usp=sharing

Let me know what you all think!
 

sef

Guest
Without being able to see the source code, I can't tell, and I'm not going to run random software as root. Sorry.

The primary goal of my project is to back up my own systems, and secondarily to replace the current replication code in FreeNAS (which is a mess, having grown very organically over a relatively long period of time). As a result, the only dependencies it has are for things already in FreeNAS -- and while being in Python means dealing with bolted-on threads, it also means it can easily integrate with the rest of the system, and be maintained by the bulk of the developers.

I haven't done Google Drive primarily because that's not something that's already incorporated into FreeNAS.
 

someone1

Dabbler
Joined
Jun 17, 2013
Messages
37
Are you unable to see the source code on GitHub? It's public...

I understand and I think our projects overlap a lot - with the exception that mine isn't meant specifically for FreeNAS and is much more portable. I set out with the same goals as you when I wrote this a few months back.

I'm sure you know this, but with the GIL in Python 2, threading is tricky and hardly ever beneficial; most projects use subprocesses to achieve multi-core utilization. I've witnessed backup scripts in Python slowing systems, hogging memory, and taking much longer than necessary. There are plenty of pitfalls in Python when handling large amounts of data (I've personally contributed patches to the duplicity project, and seen others, addressing these issues).

I think there's a benefit if a single project is focused on and built out, but I can understand if you're hesitant to abandon yours. I don't think it'd be particularly difficult to integrate this into a Python environment; I think it's akin to how snapshots are done, by calling an executable on the system. I'd be happy to put in development effort to make this easier/more robust.

Let me know if you have a change of heart - best of luck!
 

sef

Guest
Ah, no, I clicked on the wrong link there and forgot about the other one :).
 