how to check effectiveness of dedup?


Joe Gruher

Cadet
Joined
Jul 16, 2013
Messages
5
I am aware of the risks with dedup. I just want to run an experiment where I configure a number of clients with almost-identical OS installs and boot them over iSCSI. In theory, dedup should let them boot faster if their data is highly dedup-able, since I should have to do less seeking on disk and get more cache hits in memory. I have 32GB of RAM and only about 650GB allocated in my dedup'd array, so I should have plenty of memory.

Looking on disk, I see all the targets shown at their full capacity. Much of this should deduplicate away, both as empty space and as duplicate blocks across similar OS installs. How can I check how much is actually being deduplicated (i.e., the actual capacity consumed on disk after dedup versus the allocated capacity)?

Thanks!

[root@freenas-joeboot] /mnt/joeraid/dedup# ls -l
total 557206475
drwxr-xr-x 2 root wheel 38 Jul 16 05:12 ./
drwxr-xr-x 3 root wheel 3 May 9 04:57 ../
-rw-r--r-- 1 root wheel 16106127360 Jul 16 03:24 ki01
-rw-r--r-- 1 root wheel 37580963840 Jul 16 05:48 ki01a
-rw-r--r-- 1 root wheel 16106127360 Jul 16 03:33 ki02
-rw-r--r-- 1 root wheel 37580963840 Jul 16 05:47 ki02a
-rw-r--r-- 1 root wheel 16106127360 Jul 16 03:33 ki03
-rw-r--r-- 1 root wheel 37580963840 Jul 16 05:47 ki03a
-rw-r--r-- 1 root wheel 16106127360 Jul 16 05:11 ki04
-rw-r--r-- 1 root wheel 16106127360 Jul 16 05:11 ki05
-rw-r--r-- 1 root wheel 16106127360 Jul 16 05:11 ki06
-rw-r--r-- 1 root wheel 16106127360 Jun 12 08:20 ki07
-rw-r--r-- 1 root wheel 37580963840 Jul 16 05:06 ki07a
-rw-r--r-- 1 root wheel 16106127360 Jul 16 05:43 ki08
-rw-r--r-- 1 root wheel 37580963840 Jul 16 05:07 ki08a
-rw-r--r-- 1 root wheel 21474836480 Jul 16 05:47 ki09
-rw-r--r-- 1 root wheel 37580963840 Jul 16 05:07 ki09a
-rw-r--r-- 1 root wheel 16106127360 Jun 13 02:00 ki10
-rw-r--r-- 1 root wheel 37580963840 Jul 16 05:07 ki10a
-rw-r--r-- 1 root wheel 16106127360 Jul 16 05:48 ki11
-rw-r--r-- 1 root wheel 37580963840 Jul 16 05:07 ki11a
-rw-r--r-- 1 root wheel 16106127360 Jul 16 05:47 ki12
-rw-r--r-- 1 root wheel 37580963840 Jul 16 05:08 ki12a
-rw-r--r-- 1 root wheel 16106127360 Jul 11 04:52 ki13
-rw-r--r-- 1 root wheel 37580963840 Jul 16 05:12 ki13a
-rw-r--r-- 1 root wheel 16106127360 Jul 16 05:48 ki14
-rw-r--r-- 1 root wheel 37580963840 Jul 16 05:13 ki14a
-rw-r--r-- 1 root wheel 16106127360 Jul 16 05:47 ki15
-rw-r--r-- 1 root wheel 37580963840 Jul 16 05:16 ki15a
-rw-r--r-- 1 root wheel 16106127360 Jul 16 05:11 vi01
-rw-r--r-- 1 root wheel 16106127360 Jul 16 05:11 vi02
-rw-r--r-- 1 root wheel 16106127360 Jul 16 05:11 vi03
-rw-r--r-- 1 root wheel 64424509440 Jun 27 03:51 vi04vm
-rw-r--r-- 1 root wheel 64424509440 Jun 27 03:51 vi05vm
-rw-r--r-- 1 root wheel 64424509440 Jun 27 03:51 vi06vm
-rw-r--r-- 1 root wheel 64424509440 Jun 27 03:51 vi07vm
-rw-r--r-- 1 root wheel 64424509440 Jun 27 03:51 vi08vm
-rw-r--r-- 1 root wheel 64424509440 Jun 27 03:51 vi09vm
[root@freenas-joeboot] /mnt/joeraid/dedup#
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
What's the output of zpool list?

That will tell you how much space dedup has saved you.
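
For example, something like this (a sketch; "yourpool" is a placeholder for your pool name, and the exact column layout varies by ZFS version):

Code:
# Pool-wide summary; the DEDUP column shows the overall dedup ratio.
zpool list yourpool

# Or query just the ratio via the standard dedupratio pool property.
zpool get dedupratio yourpool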
 

Joe Gruher

Cadet
Joined
Jul 16, 2013
Messages
5
Thanks! 11.55x currently for the pool.

Is there any way to check it other than at the pool level? Unfortunately I have a bunch of other targets in the same folder.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
You could go by dataset, but I'm thinking you want file level.

Some more info can be obtained from "zfs list" and "zpool list".

Some fun reading is https://blogs.oracle.com/scottdickson/entry/sillyt_zfs_dedup_experiment
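
If you want more detail than the ratio alone, zdb can dump the dedup table (DDT) statistics (a sketch; zdb's output format, and whether it needs extra options to find the pool, varies by platform and ZFS version):

Code:
# Summary of the DDT: unique vs. duplicated entries, on-disk and in-core
# size per entry, and the resulting dedup ratio.
zdb -D yourpool

# Add a second -D for a histogram of blocks by reference count.
zdb -DD yourpool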

I think you know this, but I'll say it just in case you don't....

You didn't say how big your zpool is, but keep in mind that dedup eats RAM for breakfast, lunch, and dinner. If you end up in a situation where you run out of RAM to hold the dedup table, you will be locked out of your zpool until you have more RAM. 5GB of RAM per TB seems to be the recommendation for FreeNAS, but I've seen values from 3GB per TB up to 10GB per TB.

A small handful of people in the forums have used dedup, and they always come here because they didn't have enough RAM for their zpool size. Because of this I consider the "average poster" who thinks dedup is awesome and uses it to be borderline irresponsible. Generally the cost of buying enough RAM is far higher than just buying enough hard disk space. It sounds like you have a good plan with 32GB of RAM and only 650GB of data. Hopefully your zpool isn't bigger than 5-6TB. :)
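
As a back-of-the-envelope sketch (the ~320 bytes per unique block is a commonly quoted estimate, not an exact figure, and it assumes the default 128k block size):

Code:
# Rough DDT sizing for ~650GB of data at the default 128k recordsize:
#   650 GB / 128 KB          ~= 5.3 million blocks (worst case, no dupes)
#   5.3 million x ~320 bytes ~= 1.7 GB of RAM for the dedup table
# Comfortably under 32GB, which is why this particular experiment is plausible.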
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Here's the warning from the FreeNAS developers regarding dedup...


ZFS v28 includes deduplication, which can be enabled at the dataset level. The more data you write to a deduplicated volume the more memory it requires, and there is no upper bound on this. When the system starts storing the dedup tables on disk because they no longer fit in RAM, performance craters. There is no way to undedup data once it is deduplicated; simply switching dedup off has NO EFFECT on the existing data. Furthermore, importing an unclean pool can require between 3-5GB of RAM per TB of deduped data, and if the system doesn't have the needed RAM it will panic, with the only solution being adding more RAM or recreating the pool. Think carefully before enabling dedup! Then, after thinking about it, use compression instead.
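
For reference, compression is also a per-dataset property and is a one-line change (a sketch; lz4 is only available on ZFS versions that support it, otherwise compression=on uses the default algorithm):

Code:
# Enable compression on the dataset instead of (or alongside) dedup.
zfs set compression=lz4 joeraid/dedup

# See how much it actually saves.
zfs get compressratio joeraid/dedup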
 

Joe Gruher

Cadet
Joined
Jul 16, 2013
Messages
5
Thanks for the warning. It is the amount of data in the pool which is relevant to memory consumption, right? Or is it the raw size of the pool, even if only a small amount is consumed?

I have four 3TB disks which I put in a RAID10 (/mnt/joeraid). That volume does NOT have dedup enabled. Then I created a sub-volume (/mnt/joeraid/dedup) which DOES have dedup enabled. The idea is that I would put the boot targets for my clients in the dedup'd area, which will only require a relatively small amount of capacity, well under 1TB, and I have 32GB of RAM. Ideally the boot targets will benefit heavily from dedup (I actually installed one client and just copied the target file to create each additional target), resulting in better weathering of "boot storms". Then I'll attach a second iSCSI target from the non-dedup area to provide more capacity to each client. The idea is to dedup the boot area so I can boot many clients simultaneously, such as a whole rack of 45 servers, but then, aside from boot, use the non-dedup area to maximize performance and keep the memory requirements under control.

Since dedup is a sub-volume of joeraid it does not really have its own capacity; it can use up to whatever is available in joeraid. I'm just being careful not to put too much data in it. Hopefully that is viable.
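
For anyone wanting to reproduce this layout, it boils down to something like the following (a sketch; the quota value is just one illustration of "being careful", not something actually set here):

Code:
# Dedup is a per-dataset property, so only the boot-target dataset pays for it.
zfs create -o dedup=on joeraid/dedup

# Optionally cap how much of the pool the deduped dataset may consume.
zfs set quota=1T joeraid/dedup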

It is worth noting that this is just a lab experiment; nothing here is intended for a production environment.
 

Joe Gruher

Cadet
Joined
Jul 16, 2013
Messages
5
Just for reference.

[root@freenas-joeboot] /mnt/joeraid# zpool list
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
joeraid 5.44T 46.8G 5.39T 0% 11.53x ONLINE /mnt
[root@freenas-joeboot] /mnt/joeraid# zfs list
NAME USED AVAIL REFER MOUNTPOINT
joeraid 532G 5.30T 1.93M /mnt/joeraid
joeraid/dedup 531G 5.30T 531G /mnt/joeraid/dedup
[root@freenas-joeboot] /mnt/joeraid#
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Thanks for the warning. It is the amount of data in the pool which is relevant to memory consumption, right? Or is it the raw size of the pool, even if only a small amount is consumed?

It's data consumed, in blocks. The more "blocks" of data you have the larger the dedup table will be.

I have four 3TB disks which I put in a RAID10 (/mnt/joeraid). That volume does NOT have dedup enabled. Then I created a sub-volume (/mnt/joeraid/dedup) which DOES have dedup enabled. The idea is that I would put the boot targets for my clients in the dedup'd area, which will only require a relatively small amount of capacity, well under 1TB, and I have 32GB of RAM. Ideally the boot targets will benefit heavily from dedup (I actually installed one client and just copied the target file to create each additional target), resulting in better weathering of "boot storms". Then I'll attach a second iSCSI target from the non-dedup area to provide more capacity to each client. The idea is to dedup the boot area so I can boot many clients simultaneously, such as a whole rack of 45 servers, but then, aside from boot, use the non-dedup area to maximize performance and keep the memory requirements under control.

Very nice! It sounds like you have a plan that may work out for you.

Since dedup is a sub-volume of joeraid it does not really have its own capacity; it can use up to whatever is available in joeraid. I'm just being careful not to put too much data in it. Hopefully that is viable.

It's totally viable. Just keep a lookout for disk usage. In the event that you run out of RAM I'm not sure if the zpool will be unmountable or just the dataset. I try not to make assumptions but I'm thinking the entire zpool would be lost.

I know that someone else tried to do something like what you are doing, and dedup quickly became irrelevant because the iSCSI devices won't align all of their files to the same block boundaries. If the same new file is created on all of the iSCSI devices but isn't aligned to the same block boundaries on each of them, those identical files won't actually take advantage of dedup. As temp files are created and deleted on the iSCSI devices, the blocks drift apart across devices, since the machines won't lay everything out the same way. Keep in mind that deleted files still have their data on the iSCSI device, and when you overwrite less than a full 128k block, those blocks become unique forever. Example:

Pretend ZFS uses a block size of 8 characters. A temp file is created with exactly 8 characters on 2 different iSCSI devices. We'll assume both are aligned to a block boundary.

tempfile1 = aaaaaaaa

That file is then deleted, and at some later time tempfile2 is created. On the first device it happens to occupy the block where tempfile1 was, but it is only 6 characters.

tempfile2 = bbbbbb

But the other iSCSI device didn't save it to the exact same place. It just so happens that the block tempfile2 was saved to there previously contained cccccccc.

ZFS now sees the blocks differently: on the first device the old tempfile1 block reads bbbbbbaa, while on the second device that block still reads aaaaaaaa and the block tempfile2 landed in reads bbbbbbcc. Nothing matches across the two devices anymore, so none of it dedups.


As time progresses there will be fewer and fewer blocks that "match" for dedup. If you defragment your iSCSI device it kills your dedup VERY quickly (not to mention that defragging is pointless anyway since you are using ZFS). These are just a few of the reasons why dedup doesn't work 'as well' as people think in the long term. Windows' allocation granularity is 4k while ZFS uses 128k blocks, and that mismatch will tend to ruin dedup over time. I believe the largest cluster size Windows can use is 32k or 64k, but even then you have the example I provided above, which still undermines dedup.

If you read the link I provided above, he gives a fairly simple explanation of how block-level dedup works, as well as how it can fall flat on its face. With "live" systems, the blocks in each iSCSI device become more and more unique the more you use the system, so eventually you'll end up with very little savings from dedup. How much writing is actually being done to your iSCSI devices will dictate how fast the dedup ratio drops.

You can change the ZFS block size to make the granularity smaller, but I'm not sure if you can do that per dataset (I'm not big on datasets myself). Also keep in mind that every time you halve the block size (it works in powers of 2) you double the number of blocks a given quantity of data will use (and hence the RAM needed for the dedup table also doubles).
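
For what it's worth, the block size is the per-dataset recordsize property (a sketch; note that changing it only affects blocks written after the change, so existing data keeps its old block size):

Code:
# Check the current block size (the default is 128K).
zfs get recordsize joeraid/dedup

# Halve the granularity down to 16K for newly written data; each halving
# roughly doubles the block count, and with it the DDT's RAM footprint.
zfs set recordsize=16K joeraid/dedup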

It is worth noting that this is just a lab experiment; nothing here is intended for a production environment.

Not trying to scare you off. I'm somewhat excited that you are trying it out in a non-irresponsible manner, and I'd love to hear back on how things are going for you in a month or two, especially if you keep a log of what kinds of things you did on the iSCSI devices and how the dedup ratio changed over time.
 

Joe Gruher

Cadet
Joined
Jul 16, 2013
Messages
5
Interesting, good points, thanks for the input. I may actually have a better-than-average case for dedup, as I'm ultimately hoping to boot a cloud/datacenter-geared OS that comes up on the clients and then generally runs from memory. Aside from some logging to the non-dedup area, the OS files should be pretty static, so maybe I'll be able to maintain a reasonable level of dedup. We'll have to see how well that actually works out in practice.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Definitely post back in a few months and let us all know how it's going. You might be the first case that has even a remote chance of having an actual valid use for dedup.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Definitely post back in a few months and let us all know how it's going. You might be the first case that has even a remote chance of having an actual valid use for dedup.

Wow, harsh man. You sayin' I ain't makin' good use of it?

Code:
zpool list
NAME      SIZE  ALLOC  FREE    CAP  DEDUP  HEALTH  ALTROOT
xxxxxxxx  7.25T  2.41T  4.84T    33%  5.87x  ONLINE  /mnt


That's the poor little N36L that I've been punishing with NFS backups from ESXi. It used to run more like 60% full for three sets of backup images, and I wanted to keep more sets without buying new disks just yet. It had 16GB of RAM, so it was a little shy, but with a 60GB SSD for L2ARC it runs dedup just fine.
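
For anyone wondering how the L2ARC fits in, a cache device is simply added to the pool (a sketch; the pool and device names are placeholders, not the ones from this box):

Code:
# Attach an SSD as an L2ARC cache device; dedup table entries that no
# longer fit in RAM can then be read from the SSD instead of spinning disks.
zpool add yourpool cache ada2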

Now the thing is, it went from 50-60% utilization down to 33%, BUT that doesn't account for the fact that I also bumped it up from three sets to ten sets of backup images. Would be fun to boost it up to 30. Maybe.

For the situation described by the original poster, though, I find myself wondering if cloning wouldn't have been a better solution. Take advantage of ZFS COW without the hazards of the DDT.
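
A rough sketch of that approach (the names below are made up for illustration, not taken from the OP's setup): install the OS once onto a master zvol, snapshot it, and stamp out one clone per client. Clones share blocks copy-on-write from the start, with no dedup table to feed.

Code:
# One master boot image, installed once.
zfs create -V 15G joeraid/boot-master

# Freeze it, then create one clone per client; each clone only consumes
# space for the blocks it later diverges on.
zfs snapshot joeraid/boot-master@golden
zfs clone joeraid/boot-master@golden joeraid/boot-ki01
zfs clone joeraid/boot-master@golden joeraid/boot-ki02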
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I didn't even know you had a dedup system... That makes 2 people.
 