Why are my DDT entries so big?

virtualdxs

Dabbler
Joined
Nov 19, 2018
Messages
34
I decided to try out dedup on a dataset that should dedup decently now and even better over time. As of right now, the dataset is tiny - <5 GB. However, the DDT seems to be using about 16.5 GIGAbytes of RAM, according to this output:

Code:
dedup: DDT entries 99455, size 1174002 on disk, 165897 in core

bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1    65.3K   3.15G   2.56G   2.88G    65.3K   3.15G   2.56G   2.88G
     2    31.5K   1.40G   1.13G   1.24G    63.6K   2.81G   2.26G   2.49G
     4      234    911K    284K   2.20M    1.13K   4.52M   1.29M   10.8M
     8       41    148K     24K    375K      415   1.95M    256K   3.70M
    16       14    184K   25.5K    146K      279   3.11M    503K   2.86M
    32        1     512     512   9.14K       38     19K     19K    347K
Total    97.1K   4.55G   3.69G   4.13G     131K   5.97G   4.82G   5.39G


99455 entries times 165897 bytes per entry is 16,499,286,135 bytes, which divided by 1000^3 is about 16.5 gigabytes of memory used. Looking at other examples on the web, the in-core size per entry is usually in the 3-digit range. Any ideas why this would be the case?
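For reference, here's a quick sanity check of that arithmetic (a minimal sketch in Python; the ~320 bytes/entry figure is an assumption based on the commonly quoted in-core DDT entry size, not something from my output):

Code:
# Sanity check of the DDT memory math above.
entries = 99455            # from "DDT entries 99455"
bytes_per_entry = 165897   # from "165897 in core"

in_core_bytes = entries * bytes_per_entry
print(f"Reported in-core DDT size: {in_core_bytes / 1000**3:.1f} GB")  # ~16.5 GB

# Assumption: a typical in-core DDT entry is roughly 320 bytes.
expected_bytes = entries * 320
print(f"Expected at ~320 B/entry:  {expected_bytes / 1000**2:.1f} MB")  # ~31.8 MB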
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
I think 3x is what is expected for memory usage.
 

virtualdxs

Dabbler
Joined
Nov 19, 2018
Messages
34
I think 3x is what is expected for memory usage.
I know for a fact that that's not true. If it were, dedup would be less than useless because you'd have to buy 15 terabytes of RAM to serve a 5 TB dataset (as an example).
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
I know for a fact that that's not true. If it were, dedup would be less than useless because you'd have to buy 15 terabytes of RAM to serve a 5 TB dataset (as an example).
Correct, you would need that much RAM. This is exactly why dedup isn't something that is worthwhile or actually used very much.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Correct, you would need that much RAM.
No, you wouldn't. Deduplication is RAM-hungry, but not so RAM-hungry that it routinely needs 3x the amount of RAM as the amount of storage being deduplicated.
 

virtualdxs

Dabbler
Joined
Nov 19, 2018
Messages
34
Correct, you would need that much RAM. This is exactly why dedup isn't something that is worthwhile or actually used very much.
No. That's not how this works. If that were true, NOBODY would use dedup, as it would be orders of magnitude cheaper to buy extra storage.

The rule of thumb for ZFS dedup is 5 GB of RAM per TB of deduplicated data.
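As a rough illustration of that rule of thumb (a sketch only; the 5 GB/TB figure is the usual guideline, and the function name is just for this example):

Code:
# Estimate the RAM needed to hold the DDT in core, using the 5 GB-per-TB rule of thumb.
def dedup_ram_estimate_gb(dataset_tb, gb_per_tb=5):
    return dataset_tb * gb_per_tb

print(dedup_ram_estimate_gb(5))  # 5 TB dataset -> about 25 GB of RAM, nowhere near 15 TB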

I've found a likely cause of my issue, but I need more information to take care of it, so I'm opening another thread to ask about that. I'll report back here if my suspicions are correct.
 