No BLOCK-LEVEL deduplication?!

Sergej31231

Dabbler
Joined
Apr 26, 2017
Messages
14
Good evening,

I am trying to set up a test backup storage with FreeNAS. If FreeNAS works well, I would like to buy certified hardware. But unfortunately I got stuck on deduplication. Either I configured something wrong, or the ZFS deduplication feature is NOT block-level.

Please look at this:
Code:
drwxr-xr-x  2 root  wheel  uarch 6 Apr 27 05:57 ./
drwxr-xr-x  3 root  wheel  uarch 3 Apr 27 05:51 ../
-rw-r--r--  1 root  wheel  uarch 4 Apr 27 05:55 1
-rw-r--r--  1 root  wheel  uarch 4 Apr 27 05:56 2
-rw-r--r--  1 root  wheel  uarch 5 Apr 27 05:57 3
-rw-r--r--  1 root  wheel  uarch 4 Apr 27 05:56 4
[root@srv /ibm/dataset]# cat *
123
123
1234
123
...

[root@srv /ibm/dataset]# zpool list
NAME           SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
freenas-boot   136G   653M   135G         -      -     0%  1.00x  ONLINE  -
ibm           3.97T   492K  3.97T         -     0%     0%  3.00x  ONLINE  -

So I have 4 files, and they are all nearly identical; only one file has an additional character. The deduplication ratio should have been much higher than 3.00x, I think somewhere above 3.90. I did everything the right way: I did not change the files, I copied them, etc.

Since I do not know much about block sizes, I repeated this experiment with BIG files, with the same poor ratio.

But experiments with copying identical files, without changing their content, worked as expected.

I am pretty sure that deduplication is file-level. So why does everyone say that FreeNAS ZFS deduplication is block-level? File-level deduplication makes no sense for me, because I have no identical files (Veeam backup files), but I have a lot of identical blocks!
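
For reference, a minimal sketch of how the relevant settings and the dedup table could be checked from the shell (assuming the dataset is ibm/dataset, as in the listing above):
Code:
# Verify that dedup is really enabled on the dataset, and see the
# recordsize (block size) and compression settings that apply to it
zfs get dedup,compression,recordsize,checksum ibm/dataset

# Pool-wide dedup ratio and dedup-table statistics
zpool get dedupratio ibm
zdb -DD ibm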

Sorry for my bad English --> I am a Russian from Germany :D

I'm looking forward to your reply

Thank you &
Good evening :)
 

Linkman

Patron
Joined
Feb 19, 2015
Messages
219
"ZFS provides block-level deduplication because this is the finest granularity that makes sense for a general-purpose storage system. Block-level dedup also maps naturally to ZFS's 256-bit block checksums, which provide unique block signatures for all blocks in a storage pool as long as the checksum function is cryptographically strong (e.g. SHA256)."
-- https://blogs.oracle.com/bonwick/entry/zfs_dedup
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
File-level deduplication makes no sense for me, because I have no identical files (Veeam backup files), but I have a lot of identical blocks!
Even then, it makes little sense. "Potentially unbounded amounts of RAM" being needed to import pools is a big catch of ZFS deduplication.

as long as the checksum function is cryptographically strong (e.g. SHA256)
Since you mention SHA256, some information:
When dedup is enabled, the dataset's hashing function is switched to SHA256. FreeNAS 11 includes support for two new hashes, both of which are faster. SHA512 is a simple upgrade and is faster on 64-bit systems. Skein is even faster.
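
For illustration, a minimal sketch of how those hashes could be selected per dataset from the shell (pool/dataset is only a placeholder; the FreeNAS GUI normally manages these properties):
Code:
# Sketch only: switch the checksum, and the dedup hash, to Skein
zfs set checksum=skein pool/dataset
zfs set dedup=skein,verify pool/dataset

# Confirm what is actually in effect
zfs get checksum,dedup pool/dataset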
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
The important point is that it is a "ZFS block", which has to be a 100% match, not any random chunk of a file that might happen to match.
 

wblock

Documentation Engineer
Joined
Nov 14, 2014
Messages
1,506
What are you trying to do? For many scenarios, dedup is not very useful. For VM images, using snapshots can result in as good or better dedup ratio but without the memory overhead. And compression can be a better win in the first place. It all depends on what you are doing.
 

Sergej31231

Dabbler
Joined
Apr 26, 2017
Messages
14
"ZFS provides block-level deduplication because this is the finest granularity that makes sense for a general-purpose storage system. Block-level dedup also maps naturally to ZFS's 256-bit block checksums, which provide unique block signatures for all blocks in a storage pool as long as the checksum function is cryptographically strong (e.g. SHA256)."
-- https://blogs.oracle.com/bonwick/entry/zfs_dedup

Yes, I know that, but if you look at my example above, you can see that it does not seem to work.
I assume I have done something wrong; is there a manual that helps to set up a properly block-level-deduplicated SMB share on FreeNAS?
I didn't find _exactly_ the right guide online.
 

Sergej31231

Dabbler
Joined
Apr 26, 2017
Messages
14
Even then, it makes little sense. "Potentially unbounded amounts of RAM" being needed to import pools is a big catch of ZFS deduplication.

I know that. Because of deduplication alone, I use a server with about 128 GB of RAM. I also said that if FreeNAS (dedup) works fine, I would like to get certified hardware (I need an alternative to the EMC).
Thx


The important point is that it is a "ZFS block", which has to be a 100% match, not any random chunk of a file that might happen to match.

Yes, I know that. Imagine I have 3 Veeam Windows Server 2012 R2 backups. There are of course a lot of identical ZFS blocks (I have 3x Windows).
BUT NO DEDUP. The ratio stayed at 1.00x - do you have an idea what is wrong?
Thank you
 

Sergej31231

Dabbler
Joined
Apr 26, 2017
Messages
14
What are you trying to do? For many scenarios, dedup is not very useful. For VM images, using snapshots can result in as good or better dedup ratio but without the memory overhead. And compression can be a better win in the first place. It all depends on what you are doing.

Because I have a lot of full backups, I want to deduplicate them (resulting in nearly incremental backups). In my case, deduplication would be very useful if it worked...

The next problem is that Samba drops the connection after a few GB of transferred data (CPU and RAM show no load).
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Because I have a lot of full backups, I want to deduplicate them (resulting in nearly incremental backups). In my case, deduplication would be very useful if it worked...
...
My FreeNAS is used mostly for backups. Originally I had a layout like this, with only a single backup dataset, pool/backups:
Code:
pool/backups/HOST1/
pool/backups/HOST1.OLDDATE1/
pool/backups/HOST1.OLDDATE2/
pool/backups/HOST2/
pool/backups/HOST2.OLDDATE1/

I basically renamed the backup directory after every backup so I could have a history. That scheme was what I used with my old Infrant ReadyNAS 1000S, (no ZFS). Each backup took about the same amount of time and destination disk space, (unless I had added or removed a lot of files).

Today, I use ZFS snapshots as below. This makes the Rsync backups faster AND saves backup space on my FreeNAS. My history is there, just not mounted unless I need it. Now each host I back up has its own sub-dataset in pool/backups, for example pool/backups/HOST1. And as many snapshots as I have backup instances.
Code:
pool/backups/HOST1/
pool/backups/HOST1@OLDDATE1
pool/backups/HOST1@OLDDATE2
pool/backups/HOST2/
pool/backups/HOST2@OLDDATE1
pool/backups/HOST2@OLDDATE2

Sometimes it takes time to re-configure your methodology to accommodate new technology like ZFS.

You can even convert individual directories to snapshots if you plan very well (which I did).
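
For illustration, a minimal sketch of one such backup cycle (HOST1, pool/backups and the date are just the placeholders from the layout above):
Code:
# Rsync into the host's own dataset, then snapshot it so the history
# is kept without storing another full copy
rsync -a --delete root@HOST1:/ /mnt/pool/backups/HOST1/
zfs snapshot pool/backups/HOST1@2017-04-27

# Older backups stay reachable read-only through the snapshot directory, e.g.
# /mnt/pool/backups/HOST1/.zfs/snapshot/2017-04-27/  (visibility depends on the snapdir property)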
 

Sergej31231

Dabbler
Joined
Apr 26, 2017
Messages
14
Today, I use ZFS snapshots as below. This makes the Rsync backups faster AND saves backup space on my FreeNAS. My history is there, just not mounted unless I need it.
...

Yes, thank you for your reply, but in my special case I must not use incremental or differential backups.
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
These are three different blocks:
Code:
123
1234
4123
There will be 0% dedup.
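
To see block-level dedup actually kick in, the test data has to repeat in whole ZFS records. A minimal sketch (tank/test is a placeholder dataset with dedup=on, compression=off and the default 128K recordsize, on an otherwise empty pool):
Code:
# One file of random data, then two identical copies of it
dd if=/dev/random of=/mnt/tank/test/block.bin bs=128k count=1024
cp /mnt/tank/test/block.bin /mnt/tank/test/copy1.bin
cp /mnt/tank/test/block.bin /mnt/tank/test/copy2.bin

# Every 128K record now has three references -> ratio close to 3.00x
zpool get dedupratio tank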
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Yes, thank you for your reply, but in my special case I must not use incremental or differential backups.
They are not quite incremental or differential backups. Basically, I have Rsync and ZFS snapshots perform the de-duplication. (Not to mention ZFS compression.) In fact, last time I checked, my backup dataset on my FreeNAS was only using 59 GB. That's for 3 Linux clients, over 3 years, with roughly monthly backups. I keep thinking that's not right, but it works.

That said, only you can determine what will work for you.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Your 3 byte files are not sufficiently large to test ZFS dedup like this.
 

Sergej31231

Dabbler
Joined
Apr 26, 2017
Messages
14
Your 3 byte files are not sufficiently large to test ZFS dedup like this.

I also generated 112 MiB text files (used a for-loop to fill the files), then changed the first oder last. At first it got to 1.99, then 1.5, and stayed at 1.55 :(
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
I also generated 112 MiB text files (used a for-loop to fill the files), then changed the first oder last. At first it got to 1.99, then 1.5, and stayed at 1.55 :(

Not sure what you mean by oder
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
So, if you duplicate your test file, 3 times... what happens then? Do you have compression enabled on your dataset?

Once you've duplicated your file 3 times... what happens if you truncate the file?

What happens if you then edit the end of the file?
 

Sergej31231

Dabbler
Joined
Apr 26, 2017
Messages
14
So, if you duplicate your test file, 3 times... what happens then? Do you have compression enabled on your dataset?

Once you've duplicated your file 3 times... what happens if you truncate the file?

What happens if you then edit the end of the file?

Thank you for your reply

Compression is disabled
The file must then be un-deduplicated. (Could I try changing the file and then copying it again onto the deduplicating machine to avoid the problem you mean? Is that what you mean?)
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
The way it's supposed to work is that individual blocks have checksums. Each block with the same checksum is only stored on disk once.

Block sizes in ZFS are, I think, 128 KB -> 1 MB depending on dataset/zpool configuration.

Thus, duplicating a file shouldn't use any extra space. Also, truncating part of the end of one of those files shouldn't use more than 1 extra block of space, and modifying one block in one of those files, again, shouldn't result in more than 1 extra block of used space.

Of course, used space will go up, but available space will not go down (by much).

Other than that, I have no experience with dedup and can't tell you how good/bad the dedup ratio numbers are.

And then those blocks get compressed, leading to variable block sizes.
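
A minimal sketch of that behaviour (tank/test and big.bin are placeholders; dedup=on, compression=off and the default 128K recordsize assumed):
Code:
zfs get recordsize tank/test                  # the block size dedup works on

cp big.bin big-copy.bin                       # identical records: barely any extra space
truncate -s -64k big-copy.bin                 # shortens the file; only the last record now differs
dd if=/dev/random of=big-copy.bin bs=128k count=1 conv=notrunc   # rewrites just the first record in place

zpool get dedupratio tank                     # the ratio should only drop slightly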
 