No dedup or compression during replication?

Status
Not open for further replies.

dniq

Explorer
Joined: Aug 29, 2012
Messages: 74
It seems that "zfs send" is not merely sending raw blocks of data, but actually sends the contents of the filesystem, as if they were read as files. So if I have a dataset with dedup and compression enabled that occupies 800G of physical disk space but actually holds over 6T of files, 6T is what "zfs send" will send to the receiving side on the first run, which sucks :(

Is there any reason it doesn't just replicate the blocks as they are on the physical disks, unmolested (compressed and deduped)?
 

danb35

Hall of Famer
Joined: Aug 16, 2011
Messages: 15,504
Short answer: because that's the way ZFS works. As far as compression goes, SSH supports a compressed link, so that should save some bandwidth if you're running over a network connection, but as far as deduplication is concerned, no joy.
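To see why dedup savings don't survive a logical stream, here is a toy model (not ZFS code): split data into fixed-size blocks, count every block as a logical stream would, and count identical blocks only once as a dedup table would. The 128K block size and the function name are assumptions for illustration.

```python
import hashlib
import os
from typing import Tuple

BLOCK_SIZE = 128 * 1024  # assumed block size for this toy model (ZFS's default recordsize is 128K)


def logical_vs_deduped(data: bytes, block_size: int = BLOCK_SIZE) -> Tuple[int, int]:
    """Return (bytes in a logical stream, bytes if identical blocks were stored once)."""
    seen = set()
    logical = 0
    unique = 0
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        logical += len(block)  # a logical stream carries every copy of every block
        digest = hashlib.sha256(block).digest()
        if digest not in seen:  # a dedup table keeps only the first copy
            seen.add(digest)
            unique += len(block)
    return logical, unique


if __name__ == "__main__":
    block = os.urandom(BLOCK_SIZE)
    data = block * 8  # eight identical blocks
    print(logical_vs_deduped(data))  # (1048576, 131072)
```

Eight identical blocks cost one block's worth of space under dedup, but a stream of the file contents still carries all eight.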

ZFS's design allows you to replicate to a different RAID level, different volume size, different compression settings, different deduplication settings, different encryption capabilities, etc. If it were a direct, byte-for-byte copy, none of that would be possible.
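A minimal sketch of that design, as a toy model rather than ZFS internals: the sender emits each record's logical contents, and the receiver writes them under its own compression setting. zlib stands in for lz4 here, and `store`, `send` and `physical_size` are hypothetical helpers.

```python
import zlib
from typing import List, Tuple

Record = Tuple[bool, bytes]  # (is_compressed, bytes as stored "on disk")


def store(records: List[bytes], compression: bool) -> List[Record]:
    """Write logical records into a toy dataset, compressing only if the setting is on."""
    return [(True, zlib.compress(r)) if compression else (False, r) for r in records]


def send(dataset: List[Record]) -> List[bytes]:
    """Toy sender: emit the logical (decompressed) contents of each record."""
    return [zlib.decompress(data) if is_compressed else data for is_compressed, data in dataset]


def physical_size(dataset: List[Record]) -> int:
    """Bytes the toy dataset occupies on disk."""
    return sum(len(data) for _, data in dataset)


if __name__ == "__main__":
    records = [b"log line\n" * 10_000 for _ in range(4)]
    src = store(records, compression=True)  # source dataset with compression on
    stream = send(src)                      # what travels between the two sides
    dst = store(stream, compression=False)  # destination dataset with compression off
    print(physical_size(src), sum(len(r) for r in stream), physical_size(dst))
```

The stream size equals the logical size regardless of how the source stored the data, and the destination's footprint depends only on its own setting.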
 

DrKK

FreeNAS Generalissimo
Joined: Oct 15, 2013
Messages: 3,630
Sir, "zfs send" and "zfs receive" don't copy files. They send and receive SNAPSHOTS. zfs send turns a snapshot into a stream of data without knowing or caring about anything beneath its abstraction layer, as far as I know. Thinking of them as copying files (or even as knowing what files are) is not right. If your goal is to back up FILES, and not snapshots of datasets, you might consider cpio or some other solution. When you copy files at the file abstraction level, you will not push or inherit the dataset properties, and in your case, that means only the 800GB instead of the 6TB or whatever you said.

If your originating dataset has compression and dedup on it, and your only goal is to replicate the *FILES* without the ZFS abstractions, then do not use zfs send/receive.
 

DrKK

FreeNAS Generalissimo
Joined: Oct 15, 2013
Messages: 3,630
danb35 said:
ZFS's design allows you to replicate to a different RAID level, different volume size, different compression settings, different deduplication settings, different encryption capabilities, etc. If it were a direct, byte-for-byte copy, none of that would be possible.

I am not sure what you mean by this. A zfs "sent" dataset has the properties of the original dataset. You could put that dataset on a different POOL, but the dataset is the dataset.
 

danb35

Hall of Famer
Joined: Aug 16, 2011
Messages: 15,504
I don't believe this is correct, so I tried a simple experiment. I created two datasets: test1 with compression off, and test2 with lz4 compression enabled. I then copied ~37GB of various data into test2, created a snapshot of test2, and replicated that snapshot into test1 (zfs send -v tank/test2@snap1 | zfs recv -F tank/test1). I also just redirected the zfs send stream into a standard file with zfs send -v tank/test2@snap1 > /mnt/tank/samplesnapshot.

Results: according to the storage screen on my server, test1 has a compression of 1.00x, and test2 has compression of 1.37x. du -sh test1 reports 37G, while du -sh test2 reports 27G. The samplesnapshot file is 37 GB in size. Based on these results, it appears that a dataset, when replicated, adopts the properties of the destination dataset, at least with respect to compression (I didn't try deduplication). It also appears that the zfs send stream is in an uncompressed format.
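These figures hang together: the compression ratio ZFS reports is, roughly, logical size divided by space actually used, so 37G of logical data in 27G on disk is about 1.37x, while the uncompressed copy and the send-stream file both come to the full logical 37G. A quick check of that arithmetic, using the GB figures from the post (`compress_ratio` is an illustrative helper, not a ZFS tool):

```python
def compress_ratio(logical_gb: float, physical_gb: float) -> float:
    """Logical size divided by space used, rounded to two places like the storage screen."""
    return round(logical_gb / physical_gb, 2)


print(compress_ratio(37, 27))  # test2 (lz4): 1.37
print(compress_ratio(37, 37))  # test1 (compression off): 1.0
```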
 

DrKK

FreeNAS Generalissimo
Joined: Oct 15, 2013
Messages: 3,630
ok
 

noprobs

Explorer
Joined: Aug 12, 2012
Messages: 53
In 9.2.x a replication stream is indeed sent uncompressed (the dataset's compression setting is irrelevant to this). In 9.3 you have the option to compress the replication stream; the options are lz4, pigz and plzip. Please download and try the nightly (for testing purposes only).
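In 9.3 the compressor is an external program placed in the replication pipe; the sketch below only illustrates the idea of compressing a stream incrementally, in Python, with the standard library's gzip format standing in for pigz (whose output is gzip-compatible). The chunk size and the `compress_stream` helper are assumptions, not part of FreeNAS.

```python
import gzip
import io
import zlib
from typing import BinaryIO

CHUNK = 1 << 20  # 1 MiB reads; an arbitrary choice for this sketch


def compress_stream(src: BinaryIO, dst: BinaryIO, level: int = 6) -> int:
    """Gzip-compress src into dst chunk by chunk; return the number of bytes written."""
    comp = zlib.compressobj(level, zlib.DEFLATED, 31)  # wbits=31 selects the gzip container
    written = 0
    while True:
        chunk = src.read(CHUNK)
        if not chunk:
            break
        out = comp.compress(chunk)  # may return b"" while zlib buffers input
        dst.write(out)
        written += len(out)
    tail = comp.flush()  # emit remaining data and the gzip trailer
    dst.write(tail)
    return written + len(tail)


if __name__ == "__main__":
    raw = b"uncompressed replication stream " * 100_000
    buf = io.BytesIO()
    n = compress_stream(io.BytesIO(raw), buf)
    assert gzip.decompress(buf.getvalue()) == raw  # round-trips like a gzip file
    print(len(raw), n)
```

Because the data is processed in chunks, memory use stays flat no matter how large the snapshot stream is, which is what a filter sitting in a pipe needs.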
 

cyberjock

Inactive Account
Joined: Mar 25, 2012
Messages: 19,526
You can compress the stream via SSH's compression (if enabled in the WebGUI).
 

noprobs

Explorer
Joined: Aug 12, 2012
Messages: 53
Cyberjock, I am not sure users can set this in the GUI (it can just be enabled in the sshd config), but please correct me.

FYI, 9.3 also compresses before the throttle. In the replication command, ssh comes after the throttle, so with SSH compression you tend to use less than the allocated bandwidth.
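The effect of that ordering can be sketched with idealized arithmetic, assuming a constant compression ratio and that the throttle is the only bottleneck (both simplifications; the numbers below are hypothetical and `transfer` is an illustrative helper):

```python
from typing import Tuple


def transfer(raw_mb: float, ratio: float, limit_mb_s: float, compress_first: bool) -> Tuple[float, float]:
    """Return (seconds, average MB/s on the wire) for a throttled, compressed transfer.

    compress_first=True:  compress -> throttle -> ssh (stream compression before the throttle)
    compress_first=False: throttle -> ssh with compression (SSH compression after the throttle)
    """
    wire_mb = raw_mb / ratio
    if compress_first:
        seconds = wire_mb / limit_mb_s  # the throttle meters compressed bytes
    else:
        seconds = raw_mb / limit_mb_s   # the throttle meters uncompressed bytes
    return seconds, wire_mb / seconds


print(transfer(1000, 2, 10, compress_first=True))   # (50.0, 10.0)
print(transfer(1000, 2, 10, compress_first=False))  # (100.0, 5.0)
```

With a 2:1 ratio and a 10 MB/s limit, compressing first fills the allocation and finishes in half the time; compressing after the throttle leaves the link half idle.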
 

cyberjock

Inactive Account
Joined: Mar 25, 2012
Messages: 19,526
Go to SSH options and there's a "Compress connections" button....
 

noprobs

Explorer
Joined: Aug 12, 2012
Messages: 53
This compression option refers to whether the SSH daemon accepts compressed connections. The replication stream command does not currently compress. Happy to share the results of my testing if you're interested.
 