Dedup Tunables question

Status
Not open for further replies.

DScape

Cadet
Joined
Jan 6, 2014
Messages
2
We are trying to use hardware we already have and do not want to spend more on storage. It may seem strange, but we plan to put an HP ProLiant DL385 G7 in front of a Pillar Axiom 600 for deduplication. The specs of the HP: 2 x AMD Opteron 6272 (32 cores total) and 176GB of memory. The SAN has 12TB of storage that will be connected via Fibre Channel, or possibly iSCSI if needed. My question is: what tunables should I set for deduplication so that it does not have to read the dedup table from disk? From what I have read, ZFS will only use 25% of the ARC for dedup metadata. Also, is the dedup table written to disk anyway, in case of a reboot? Thanks.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
The dedup table is cached by the ZFS ARC more or less the same as any other metadata. The system will read DDT data from disk as the ARC warms up and dedup work has to be done. Ideally you want it to remain in ARC (or L2ARC) if possible, to minimize extra pool I/O.
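
If you want a rough idea of how big the table is and how much of it is actually resident, zdb will tell you. This is just a sketch; "yourpool" is a placeholder for your own pool name, and zdb against a live pool only gives an approximation.

Code:
zdb -D yourpool                        # entry counts and per-entry "in core" size
zdb -Dv yourpool                       # adds the full DDT histogram
sysctl kstat.zfs.misc.arcstats.size    # total ARC in use right now
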
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I don't know a lot about dedup since for most people it's a waste. So here goes:

I wouldn't change any tunables, at least not unless you think you need to. I'd give your server a few days of regular use to see what happens.

Now for your first question: let's assume you give it a few days. What is going to make you think you need to change your tunables? Typically you change tunables because of a problem, either performance or reliability. Unless you actually have one of those problems, you shouldn't be changing things just to change them. Serious thought went into the defaults, and they shouldn't be changed just because you think you know better.

The next question is "Why do you think dedup is going to help you?" If you think dedup is going to shrink down VMs, you are somewhat right and somewhat wrong: it won't work as well as you hope, but it will help somewhat. So what kind of files are we talking about where you think dedup will help? Block-level dedup is a tricky thing, so it won't be too helpful for some workloads. Generally, people are unimpressed with the dedup ratio they get for virtual machine files.

Some tunables you might want to google (there's a quick way to peek at their current values right after this list):

kstat.zfs.misc.arcstats.size
vfs.zfs.arc_meta_limit (I think your 25% limit lives here, but it also depends on other things, so don't just google this one and call it good)
vfs.zfs.arc_meta_used
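
To peek at the current values from the console, something like this works (read-only; nothing here changes any settings):

Code:
sysctl kstat.zfs.misc.arcstats.size    # total ARC currently in use
sysctl vfs.zfs.arc_meta_limit          # ceiling for metadata (DDT included) in ARC
sysctl vfs.zfs.arc_meta_used           # metadata actually cached right now

If you ever did decide to raise vfs.zfs.arc_meta_limit, I believe it goes in as a loader tunable (the FreeNAS Tunables page, or /boot/loader.conf on plain FreeBSD), but as above, don't touch it unless the numbers show you need to.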

One thing I've been unable to verify is whether the DDT and regular metadata compete for RAM. So even assuming you could simply raise arc_meta_limit to something like 90% of your RAM (that would break other things and isn't realistic, but bear with me), your metadata and the DDT would still compete, and whichever is in more use gets its way. Additionally, that limit is just that: a limit. It does not force that much RAM to be used for metadata/DDT; it only sets a ceiling, and your system may use considerably less. One ZFS engineer's advice was, roughly, to have enough RAM that 100% of your DDT fits within the default 25% metadata limit, without tweaking anything. For you, that works out to about 240GB of RAM. Not exactly an easy thing to do in your shoes, I imagine. This may presumably be why you are asking these questions. ;)
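
For the curious, the back-of-the-envelope version of that 240GB figure, using the oft-quoted rule of thumb of roughly 5GB of DDT per TB of deduped data (the real number depends entirely on block size and how unique the data is):

Code:
12 TB of pool data * ~5 GB of DDT per TB   = ~60 GB of dedup table
DDT needs to fit in the ~25% metadata slice -> ~60 GB * 4 = ~240 GB of RAM
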

Most people who have played with dedup see real-world dedup ratios of 1.05 to 1.25, and rarely go over that. Usually compression is considered to be a better option, although it's not always the best option either.
 

DScape

Cadet
Joined
Jan 6, 2014
Messages
2
Thanks for the responses! I am using both LZ4 compression and dedup. My plan is to use this for daily backups of my Hyper-V boxes. The backup software that comes with Windows Server (wbadmin) will only properly do a full backup, not incremental or differential. Since they are full backups of VHDs, the dedup ratio is very high; even the compression ratio gets close to 1.5. Is there any way to change the block size dedup works on? Since I am doing backups, I would theoretically get a good ratio with a larger block size, right?
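
From what I can find so far, the closest thing seems to be the dataset recordsize, since dedup matches whole records. Something like the sketch below (the dataset name is made up, I believe 128K is the maximum on the ZFS version FreeNAS ships, and it only affects data written after the change):

Code:
zfs get recordsize tank/hyperv-backups        # check the current value
zfs set recordsize=128K tank/hyperv-backups   # only affects newly written data
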
 

titan_rw

Guru
Joined
Sep 1, 2012
Messages
586
Most people who have played with dedup see real-world dedup ratios of 1.05 to 1.25, and rarely go over that. Usually compression is considered to be a better option, although it's not always the best option either.

Here's my dedupe stats:

Code:
dedup = 18.50, compress = 1.67, copies = 1.03, dedup * compress / copies = 30.00


Dedupe is doing far better than compression here, but both together make for even more space savings.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
titan_rw.. what are you deduping to get an 18.5 ratio? Is that the ratio from zpool list?

Also, you can't do dedup * compression / copies to get a value that is meaningful.

ZFS is pipelined. That is, each write to the pool is compressed, checksummed, and then deduped, in that order. What compresses well may not dedupe, and what dedupes may not compress. So your math equation is meaningless (I think). Otherwise you would be assuming that a given amount of data would be compressed by about 40%, then deduped down by your exact ratio, then copied by your exact ratio. That's just not the reality of it.

Think about this: if you wrote a 100TB file of zeros to your pool, your dedup and compression ratios would both look amazingly high. And since you multiply the values together, the combined number looks artificially high. ;)
 

titan_rw

Guru
Joined
Sep 1, 2012
Messages
586
From what dscape mentions in Post #4, it seems like I'm doing something similar.

I have an ESXi box with local storage. I run a backup script locally on that box that mounts an NFS share (FreeNAS), snapshots each VM in turn, and copies the (now static) .vmdk / .vmx files to the NFS share. It then deletes the VM snapshot, returning things to normal.

These are, by definition, full backups every day. But with dedupe I save all the duplicated space, so each 'full' backup only really uses disk space for what's changed in the .vmdk image.
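
Roughly, the script boils down to something like this per VM. This is only a sketch of the flow, not the actual script; the VM id and paths are made up and error handling is left out:

Code:
#!/bin/sh
# rough outline of the nightly flow (sketch only; id and paths are made up)
VMID=42                                  # from: vim-cmd vmsvc/getallvms
SRC=/vmfs/volumes/datastore1/myvm
DST=/vmfs/volumes/nfs-backups/myvm/$(date +%Y-%m-%d)

mkdir -p "$DST"
# snapshot without memory, not quiesced -> the base vmdk stops changing
vim-cmd vmsvc/snapshot.create $VMID backup "nightly" 0 0
cp "$SRC"/*.vmx "$DST"/
cp "$SRC"/myvm.vmdk "$SRC"/myvm-flat.vmdk "$DST"/
# drop the snapshot again, merging the delta back into the base disk
vim-cmd vmsvc/snapshot.removeall $VMID
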

I tried lz4 vs gzip-9 and settled on gzip-9. lz4 was only slightly (15% or so) faster, and was network bound. With gzip-9 the backup is actually CPU bound, but just barely; it's only a bit slower than lz4.
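
For anyone wanting to try the same comparison, it's just a per-dataset property. The dataset name below is made up, and existing data stays compressed however it was originally written:

Code:
zfs set compression=lz4 nas2pool/vmbackups      # fast, light on CPU
zfs set compression=gzip-9 nas2pool/vmbackups   # best ratio, heaviest on CPU
zfs get compressratio nas2pool/vmbackups        # see what it's actually achieving
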

As to whether the values are meaningful or not, I have no idea. That's the output from "zdb -D", though. Here's the raw output if it matters:

Code:
root@nas2 ~ # zpool list
NAME      SIZE  ALLOC  FREE    CAP  DEDUP  HEALTH  ALTROOT
nas2pool  32.5T  9.36T  23.1T    28%  18.49x  ONLINE  /mnt
 
root@nas2 ~ # zdb -Dv nas2pool
DDT-sha256-zap-duplicate: 579601 entries, size 1309 on disk, 211 in core
DDT-sha256-zap-unique: 206626 entries, size 1658 on disk, 267 in core
 
DDT histogram (aggregated over all DDTs):
 
bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1     202K   25.2G   3.76G   4.60G     202K   25.2G   3.76G   4.60G
     2    56.0K   7.00G   1.76G   1.99G     130K   16.2G   4.06G   4.59G
     4    39.7K   4.97G   1.58G   1.70G     202K   25.2G   8.10G   8.75G
     8    28.3K   3.54G   1017M   1.09G     312K   39.1G   10.5G   11.6G
    16     438K   54.8G   34.9G   35.8G    9.60M   1.20T    781G    799G
    32    3.65K    467M    198M    210M     167K   20.8G   8.82G   9.38G
    64      256   31.9M   12.2M   13.1M    18.2K   2.27G    864M    930M
   128        6    768K     24K   48.0K      908    114M   3.55M   7.09M
   256        3    384K     12K   24.0K    1.13K    145M   4.52M   9.04M
   512        1    128K      4K   7.99K      736     92M   2.88M   5.74M
    4K        1    128K      4K   7.99K    7.32K    938M   29.3M   58.5M
   16K        1    128K      4K   7.99K    16.9K   2.11G   67.4M    135M
 Total     768K   96.0G   43.2G   45.4G    10.6M   1.33T    817G    839G
 
dedup = 18.50, compress = 1.67, copies = 1.03, dedup * compress / copies = 30.00


If I'm calculating things right, the DDTs should be using about (768,000 * 320 bytes) of ARC. That's only about 234 MB. I would have had the pool space to do this without dedupe, but I wanted to see how it worked. It's a backup NAS, so I'm not terribly worried about having dedupe RAM issues; I totally accept that I might lose the pool due to insufficient RAM. This is kind of my 'experiment' box. It's only got 12 gigs of RAM, which is already a bit low for the pool size, plus the little bit of dedupe I'm doing. Also note that dedupe is only enabled on the dataset I use for VM backups. The rest of the pool would dedupe extremely poorly.
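
For what it's worth, if I'm reading the "in core" numbers in the zdb header lines as bytes per entry, they give a slightly more precise figure than the 320-bytes-per-entry rule of thumb (same numbers as in the output above, just multiplied out):

Code:
579,601 duplicate entries * 211 bytes in core  = ~122 MB
206,626 unique entries    * 267 bytes in core  =  ~55 MB
                                         total = ~177 MB actually held in ARC
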

Anyway, I'm not trying to derail the OP's thread; I just wanted to point out that it's quite possible to have dedupe ratios well in excess of 2.0.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Ok.. I see what you are doing. Very unique way of taking advantage of dedup!

Your math looks correct to me, titan_rw. Any chance you could PM me that script? I'm interested in finding a more automated way to back up my Linux VMs...

Don't ever run a defrag on your VMs.. lol
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
It definitely is possible to have high ratios. I had our little N36L flying real high (16GB RAM plus an SSD L2ARC for a 4TB usable pool) doing VM backups, but it started locking up when the ratio was... I wanna say around 30? It had trouble finding a comfort point again after that, so I turned off dedup and went back to keeping only enough space for 3 days of backups.
 

titan_rw

Guru
Joined
Sep 1, 2012
Messages
586
As far as the script goes, it's simply "ghettovcb.sh" or something. Google will definitely find it.

Yeah, if I defragged the VMs, the next daily backup would dedupe very badly, but the daily backups after that would start to dedupe well again. I currently have 25 days of backups, with a limit of 30, so approximately a month after a defrag the old non-defragged vmdks would get deleted and the average dedupe ratio would go back up.

Some of the VMs are on SSD, so there's no point in defragging them. The ones that aren't on SSD don't have a lot of rewrite disk I/O, so they don't really need defragging much.
 