No BLOCK-LEVEL deduplication?!

Sergej31231

Dabbler
Joined
Apr 26, 2017
Messages
14
Good evening,

I am trying to set up a test backup storage with FreeNAS. If FreeNAS works well, I would like to buy certified hardware. But unfortunately I got stuck on deduplication. Either I configured something wrong, or the ZFS deduplication feature is NOT block-level.

Please look at this:
Code:
drwxr-xr-x  2 root  wheel  uarch 6 Apr 27 05:57 ./
drwxr-xr-x  3 root  wheel  uarch 3 Apr 27 05:51 ../
-rw-r--r--  1 root  wheel  uarch 4 Apr 27 05:55 1
-rw-r--r--  1 root  wheel  uarch 4 Apr 27 05:56 2
-rw-r--r--  1 root  wheel  uarch 5 Apr 27 05:57 3
-rw-r--r--  1 root  wheel  uarch 4 Apr 27 05:56 4
[root@srv /ibm/dataset]# cat *
123
123
1234
123
...

[root@srv /ibm/dataset]# zpool list
NAME           SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
freenas-boot   136G   653M   135G         -      -     0%  1.00x  ONLINE  -
ibm           3.97T   492K  3.97T         -     0%     0%  3.00x  ONLINE  -

So I have 4 files, and they are all nearly identical; only one file has an additional character. The deduplication ratio should have been much higher than 3.00x, I think somewhere above 3.90. I did everything the right way: I did not change the files, I copied them, etc.

Since I do not know much about block sizes, I repeated this experiment with BIG files, with the same poor ratio.

But experiments with copying identical files, without changing their content, worked as expected.

I am pretty sure that deduplication is file-level. So why does everyone say that FreeNAS ZFS deduplication is block-level? File-level deduplication makes no sense for me, because I have no identical files (Veeam backup files), but I have a lot of identical blocks!
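
For reference, a minimal sketch of how the relevant settings and the dedup table could be checked from the shell (assuming the dataset is ibm/dataset, as in the listing above):
Code:
# Verify that dedup is really enabled on the dataset, and see the
# recordsize (block size) and compression settings that apply to it
zfs get dedup,compression,recordsize,checksum ibm/dataset

# Pool-wide dedup ratio and dedup-table statistics
zpool get dedupratio ibm
zdb -DD ibm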

Sorry for my bad English --> I am a Russian from Germany :D

I'm looking forward to your reply

Thank you &
Good evening :)
 

Linkman

Patron
Joined
Feb 19, 2015
Messages
219
"ZFS provides block-level deduplication because this is the finest granularity that makes sense for a general-purpose storage system. Block-level dedup also maps naturally to ZFS's 256-bit block checksums, which provide unique block signatures for all blocks in a storage pool as long as the checksum function is cryptographically strong (e.g. SHA256)."
-- https://blogs.oracle.com/bonwick/entry/zfs_dedup
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
File-level deduplication makes no sense for me, because I have no identical files (Veeam backup files), but I have a lot of identical blocks!
Even then, it makes little sense. "Potentially unbounded amounts of RAM" being needed to import pools is a big catch of ZFS deduplication.

as long as the checksum function is cryptographically strong (e.g. SHA256)
Since you mention SHA256, some information:
When dedup is enabled, the dataset's hashing function is switched to SHA256. FreeNAS 11 includes support for two new hashes, both of which are faster. SHA512 is a simple upgrade and is faster on 64-bit systems. Skein is even faster.
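
For illustration, a minimal sketch of how those hashes could be selected per dataset from the shell (pool/dataset is only a placeholder; the FreeNAS GUI normally manages these properties):
Code:
# Sketch only: switch the checksum, and the dedup hash, to Skein
zfs set checksum=skein pool/dataset
zfs set dedup=skein,verify pool/dataset

# Confirm what is actually in effect
zfs get checksum,dedup pool/dataset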
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
The important point is that it is a "ZFS block", which has to be a 100% match, not any random chunk of a file that might happen to match.
 

wblock

Documentation Engineer
Joined
Nov 14, 2014
Messages
1,506
What are you trying to do? For many scenarios, dedup is not very useful. For VM images, using snapshots can result in as good or better dedup ratio but without the memory overhead. And compression can be a better win in the first place. It all depends on what you are doing.
 

Sergej31231

Dabbler
Joined
Apr 26, 2017
Messages
14
"ZFS provides block-level deduplication because this is the finest granularity that makes sense for a general-purpose storage system. Block-level dedup also maps naturally to ZFS's 256-bit block checksums, which provide unique block signatures for all blocks in a storage pool as long as the checksum function is cryptographically strong (e.g. SHA256)."
-- https://blogs.oracle.com/bonwick/entry/zfs_dedup

Yes, I know that, but if you look at my example above, you can see that it does not seem to work.
I assume I have done something wrong; is there a manual that helps to set up a properly block-level-deduplicated SMB share on FreeNAS?
I didn't find _exactly_ the right guide online.
 

Sergej31231

Dabbler
Joined
Apr 26, 2017
Messages
14
Even then, it makes little sense. "Potentially unbounded amounts of RAM" being needed to import pools is a big catch of ZFS deduplication.

I know that. Because of deduplication alone, I use a server with about 128 GB of RAM. I also said that if FreeNAS (dedup) works fine, I would like to get certified hardware (I need an alternative to the EMC).
Thx


The important point is that it is a "ZFS block", which has to be a 100% match, not any random chunk of a file that might happen to match.

Yes, I know that. Imagine I have 3 Veeam Windows Server 2012 R2 backups. There are of course a lot of identical ZFS blocks (I have 3x Windows).
BUT NO DEDUP. The ratio stayed at 1.00x - do you have an idea what is wrong?
Thank you
 

Sergej31231

Dabbler
Joined
Apr 26, 2017
Messages
14
What are you trying to do? For many scenarios, dedup is not very useful. For VM images, using snapshots can result in as good or better dedup ratio but without the memory overhead. And compression can be a better win in the first place. It all depends on what you are doing.

Because I have a lot of full backups, I want to deduplicate them (resulting in nearly incremental backups). In my case, deduplication would be very useful if it worked...

The next problem is that Samba drops the connection after a few GB of transferred data (CPU and RAM show no load).
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Because I have a lot of full backups, I want to deduplicate them (resulting in nearly incremental backups). In my case, deduplication would be very useful if it worked...
...
My FreeNAS is used mostly for backups. Originally I had a layout like this, with only a single backup dataset, pool/backups:
Code:
pool/backups/HOST1/
pool/backups/HOST1.OLDDATE1/
pool/backups/HOST1.OLDDATE2/
pool/backups/HOST2/
pool/backups/HOST2.OLDDATE1/

I basically renamed the backup directory after every backup so I could have a history. That scheme was what I used with my old Infrant ReadyNAS 1000S, (no ZFS). Each backup took about the same amount of time and destination disk space, (unless I had added or removed a lot of files).

Today, I use ZFS snapshots as below. This makes the Rsync backups faster AND saves backup space on my FreeNAS. My history is there, just not mounted unless I need it. Now each host I back up has its own sub-dataset in pool/backups, for example pool/backups/HOST1. And as many snapshots as I have backup instances.
Code:
pool/backups/HOST1/
pool/backups/HOST1@OLDDATE1
pool/backups/HOST1@OLDDATE2
pool/backups/HOST2/
pool/backups/HOST2@OLDDATE1
pool/backups/HOST2@OLDDATE2

Sometimes it takes time to re-configure your methodology to accommodate new technology like ZFS.

You can even convert individual directories to snapshots if you plan very well (which I did).
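
For illustration, a minimal sketch of one such backup cycle (HOST1, pool/backups and the date are just the placeholders from the layout above):
Code:
# Rsync into the host's own dataset, then snapshot it so the history
# is kept without storing another full copy
rsync -a --delete root@HOST1:/ /mnt/pool/backups/HOST1/
zfs snapshot pool/backups/HOST1@2017-04-27

# Older backups stay reachable read-only through the snapshot directory, e.g.
# /mnt/pool/backups/HOST1/.zfs/snapshot/2017-04-27/  (visibility depends on the snapdir property)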
 

Sergej31231

Dabbler
Joined
Apr 26, 2017
Messages
14
Today, I use ZFS snapshots as below. This makes the Rsync backups faster AND saves backup space on my FreeNAS. My history is there, just not mounted unless I need it.
...

Yes, thank you for your reply, but in my special case I must not use incremental or differential backups.
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
These are three different blocks:
Code:
123
1234
4123
There will be 0% dedup.
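
To see block-level dedup actually kick in, the test data has to repeat in whole ZFS records. A minimal sketch (tank/test is a placeholder dataset with dedup=on, compression=off and the default 128K recordsize, on an otherwise empty pool):
Code:
# One file of random data, then two identical copies of it
dd if=/dev/random of=/mnt/tank/test/block.bin bs=128k count=1024
cp /mnt/tank/test/block.bin /mnt/tank/test/copy1.bin
cp /mnt/tank/test/block.bin /mnt/tank/test/copy2.bin

# Every 128K record now has three references -> ratio close to 3.00x
zpool get dedupratio tank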
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Yes, thank you for your reply, but in my special case I must not use incremental or differential backups.
They are not quite incremental or differential backups. Basically, I have Rsync and ZFS snapshots perform the de-duplication. (Not to mention ZFS compression.) In fact, last time I checked, my backup dataset on my FreeNAS was only using 59 GB. That's for 3 Linux clients, over 3 years, with roughly monthly backups. I keep thinking that's not right, but it works.

That said, only you can determine what will work for you.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Your 3 byte files are not sufficiently large to test ZFS dedup like this.
 

Sergej31231

Dabbler
Joined
Apr 26, 2017
Messages
14
Your 3 byte files are not sufficiently large to test ZFS dedup like this.

I also generated 112 MiB text files (used a for-loop to fill the files), then changed the first oder last. At first it got to 1.99, then 1.5, and stayed at 1.55 :(
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
I also generated 112 MiB text files (used a for-loop to fill the files), then changed the first oder last. At first it got to 1.99, then 1.5, and stayed at 1.55 :(

Not sure what you mean by oder
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
So, if you duplicate your test file, 3 times... what happens then? Do you have compression enabled on your dataset?

Once you've duplicated your file 3 times... what happens if you truncate the file?

What happens if you then edit the end of the file?
 

Sergej31231

Dabbler
Joined
Apr 26, 2017
Messages
14
So, if you duplicate your test file, 3 times... what happens then? Do you have compression enabled on your dataset?

Once you've duplicated your file 3 times... what happens if you truncate the file?

What happens if you then edit the end of the file?

Thank you for your reply

Compression is disabled
The file must then be un-deduplicated. (Could I try changing the file and then copying it again onto the deduplicating machine to avoid the problem you mean? Is that what you mean?)
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
The way it's supposed to work is that individual blocks have checksums. Each block with the same checksum is only stored on disk once.

Block sizes in ZFS are, I think, 128 KB -> 1 MB depending on dataset/zpool configuration.

Thus, duplicating a file shouldn't use any extra space. Also, truncating part of the end of one of those files shouldn't use more than 1 extra block of space, and modifying one block in one of those files, again, shouldn't result in more than 1 extra block of used space.

Of course, used space will go up, but available space will not go down (by much).

Other than that, I have no experience with dedup and can't tell you how good/bad the dedup ratio numbers are.

And then those blocks get compressed, leading to variable block sizes.
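
A minimal sketch of that behaviour (tank/test and big.bin are placeholders; dedup=on, compression=off and the default 128K recordsize assumed):
Code:
zfs get recordsize tank/test                  # the block size dedup works on

cp big.bin big-copy.bin                       # identical records: barely any extra space
truncate -s -64k big-copy.bin                 # shortens the file; only the last record now differs
dd if=/dev/random of=big-copy.bin bs=128k count=1 conv=notrunc   # rewrites just the first record in place

zpool get dedupratio tank                     # the ratio should only drop slightly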
 