Deduplicated streams?

Status
Not open for further replies.

ajohnson

Dabbler
Joined
Feb 25, 2013
Messages
18
Does FreeNAS support deduplicated stream sending and receiving (a stream sent with zfs send -D and received with zfs recv, also with incremental snapshots)? Anybody out there actually using it in the wild with deduped datasets with success?

I've already tried, but ran into some issues with the following error:
"cannot receive incremental stream: invalid backup stream"

I haven't spent a lot of time debugging it yet, but before I delve into it I wanted to see if -D is even supported.
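For context, the commands I'm running look roughly like this (pool, dataset, and host names here are placeholders, not my real setup):

```shell
# Initial full send of the first snapshot, deduplicated (-D):
zfs send -D tank/data@snap1 | ssh pull-host zfs recv backup/data

# Later, an incremental (-i) follow-up, also deduplicated:
zfs send -D -i tank/data@snap1 tank/data@snap2 | ssh pull-host zfs recv backup/data
```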

Thanks!
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
dedup doesn't apply to zfs send and receive. Do not confuse deduplicated with incremental. They are not the same. :)

Your error is that you are trying to send an incremental stream that doesn't have a common base snapshot that is replicated at the destination.
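To illustrate with hypothetical pool and snapshot names, an incremental send only works if its base snapshot already exists on the destination:

```shell
# First check what snapshots the receiving side already has:
ssh pull-host zfs list -t snapshot -r backup/data

# This incremental will fail unless backup/data@base exists there,
# i.e. @base was replicated to the destination previously:
zfs send -i tank/data@base tank/data@new | ssh pull-host zfs recv backup/data
```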
 

ajohnson

Dabbler
Joined
Feb 25, 2013
Messages
18
cyberjock said:
dedup doesn't apply to zfs send and receive. Do not confuse deduplicated with incremental. They are not the same. :)

Oh, I think I was a bit unclear. I'm referring to deduplicated streams. Like mentioned in this post by Dusan:
http://forums.freenas.org/index.php?threads/lz4-compression-and-replication.17890/#post-96919

Dusan said:
... Dedup is a different beast. The stream is normally always "non-deduped", even when you are replicating between two deduplicated pools. There is a zfs send option (-D) to generate a deduplicated stream ...

Running "zfs -h" on FreeNAS shows that -D is an accepted flag:
Code:
send [-DnPpRv] [-[iI] snapshot] <snapshot>
send [-i snapshot|bookmark] <filesystem|volume|snapshot>


But ultimately I'm not sure of its level of support at the kernel level.

Here's a similar issue -- this is zfsonlinux, mind you, so it probably doesn't really apply here.
https://github.com/zfsonlinux/zfs/issues/2210

My incremental replications work fine so long as -D isn't used.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
LOL. What's funny is the exact thread you linked is the one I figured you had read based on your first post. :D

Yes, you are correct, but I think you are looking for an Apple to solve the problem when the problem needs an Orange.

Your error, "cannot receive incremental stream: invalid backup stream", isn't what I said above. Not sure where I got that idea; I hadn't had my coffee yet today, so I wasn't fully awake.

The error basically means that the sending side was speaking a language that the receiving side didn't understand. I think you have to do -D on both sides to enable it. -D has its own limitations, as it builds a dedup table for just that one replication task.

Now here's where things get interesting. Incremental snapshots should only include the changes since the previous snapshot, so they shouldn't normally be a huge amount of data, and they typically won't dedup particularly well. You can usually get better throughput by compressing the stream over SSH (a feature in 9.2.1.7) and by disabling encryption over SSH (another 9.2.1.7 feature).

I can only assume you are trying to find ways to speed up the replication, and those two are definitely good ones. The compression alone can be pretty impressive: you can choose something like gzip-9 if you have limited bandwidth (like replicating from home to a friend's house through your local ISP) and want to maximize your payload. Dedup doesn't help much unless you have a large number of identical blocks. Human behavior makes that less likely, because you generally don't end up with multiple copies of the same data except over months and years. If you are ending up with that kind of duplication, you should consider using dedup in production just for the space savings.
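A rough sketch of both tricks (host names are made up, and cipher availability depends on your SSH build):

```shell
# Compress the stream in transit; gzip -9 trades CPU for bandwidth,
# which is usually the right trade on a capped or slow link:
zfs send -i tank/data@old tank/data@new | gzip -9 | \
    ssh pull-host "gunzip | zfs recv backup/data"

# Alternatively, let SSH compress (-C) and pick a cheaper cipher
# to cut encryption overhead on fast local links:
zfs send -i tank/data@old tank/data@new | \
    ssh -C -c aes128-ctr pull-host zfs recv backup/data
```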

Anyway, if I'm not mistaken you must use the -D parameter on both the sending and receiving side.

Hope this helps!
 

ajohnson

Dabbler
Joined
Feb 25, 2013
Messages
18
Thanks cyberjock.

Concerning zfs recv, I don't see a -D option (9.2.1.7)! I was wondering about this too, as I expected to see one, but it's not in the command output...
Code:
# zfs recv
missing snapshot argument
usage:
        receive|recv [-vnFu] <filesystem|volume|snapshot>
        receive|recv [-vnFu] [-d | -e] <filesystem>

For the property list, run: zfs set|get

For the delegated permission list, run: zfs allow|unallow


I did realize that my remote pull server has a different (older) version of FreeNAS than my master, so I'm fixing that right now and re-running replication. Hopefully that resolves things.

Btw, I wouldn't normally care too much about deduping the replication stream, but in this case I've got a particular set of constraints:

a) A dataset with a usage pattern that generates a lot of duplicate data, every single day. Mind you, by "a lot" I mean only 50GB-100GB to start, and maybe 500MB per day, so based on my reading and examination of zdb output I'm not terribly worried about the DDT chewing through a ton of RAM here like I would be with a multi-TB setup. I get a 15x ratio on this dataset with dedup, beyond what compression alone achieves.
Actually, the fact that I'm using dedup on the dataset is somewhat irrelevant to the main question, since it's completely separate from deduped replication streams, but I already typed this so I'll leave it here :)
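For anyone curious how I sized the DDT: zdb can simulate dedup on a pool without changing anything ("tank" is a placeholder pool name):

```shell
# Simulated dedup histogram for a pool (read-only, no changes made);
# the summary at the end shows the would-be dedup ratio:
zdb -S tank

# On a pool where dedup is already enabled, print actual DDT statistics:
zdb -DD tank
```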

b) As you alluded to, I have an ISP that caps my monthly transfer and starts complaining when I exceed it. This is where deduped streams come in. I replicate offsite nightly, and if I can use a deduped stream with this dataset in particular, I can potentially transfer, say, 50MB nightly instead of 500MB (not exact, but you get the idea -- sometimes the disparity is much bigger, sometimes less). Across a month of nightly transfers, that can make a pretty big difference.

Since the duplicate data is on the file level, I could tackle this on the application side and set things up to work with hard links via rsync or the like. I've done this many times before, but man it'd be nice if it was transparent and I didn't have to deal with that.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
So I'd update the pull server to the same version as your push server, then try doing this all again. If it still throws you the invalid stream error, I'd do a zfs recv -v so you can, hopefully, see what in particular is wrong.

It's possible that deduplicated streams aren't supported on FreeNAS, but I tend to think that's unlikely. If both sides are on 9.2.1.7 and you still can't get it to work, I'd put a ticket in at bugs.freenas.org. But considering that -D isn't listed as a zfs recv option, it's at least possible that deduped streams aren't supported.
 

ajohnson

Dabbler
Joined
Feb 25, 2013
Messages
18
Just an update on this for others that may encounter it down the road - I still got the same errors after upgrading my PULL server.
I'm going to file a bug report on this. I should probably work on getting some reproducible steps, though. It may be some time before I'm able to do that.
 