Why are my DDT entries so big?

virtualdxs

Dabbler
Joined
Nov 19, 2018
Messages
34
I decided to try out dedup on a dataset that should dedup decently now and even better over time. As of right now, the dataset is tiny - <5 GB. However, the DDT seems to be using about 16.5 GIGAbytes of RAM, according to this output:

Code:
dedup: DDT entries 99455, size 1174002 on disk, 165897 in core

bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1    65.3K   3.15G   2.56G   2.88G    65.3K   3.15G   2.56G   2.88G
     2    31.5K   1.40G   1.13G   1.24G    63.6K   2.81G   2.26G   2.49G
     4      234    911K    284K   2.20M    1.13K   4.52M   1.29M   10.8M
     8       41    148K     24K    375K      415   1.95M    256K   3.70M
    16       14    184K   25.5K    146K      279   3.11M    503K   2.86M
    32        1     512     512   9.14K       38     19K     19K    347K
Total    97.1K   4.55G   3.69G   4.13G     131K   5.97G   4.82G   5.39G


99455 entries times 165897 bytes per entry is 16,499,286,135 bytes, which divided by 1000^3 is about 16.5 gigabytes of memory used. Looking at other examples on the web, the in-core size per entry is usually in the 3-digit range. Any ideas why this would be the case?
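For reference, here's a quick sanity check of that arithmetic (a minimal sketch in Python; the ~320 bytes/entry figure is an assumption based on the commonly quoted in-core DDT entry size, not something from my output):

Code:
# Sanity check of the DDT memory math above.
entries = 99455            # from "DDT entries 99455"
bytes_per_entry = 165897   # from "165897 in core"

in_core_bytes = entries * bytes_per_entry
print(f"Reported in-core DDT size: {in_core_bytes / 1000**3:.1f} GB")  # ~16.5 GB

# Assumption: a typical in-core DDT entry is roughly 320 bytes.
expected_bytes = entries * 320
print(f"Expected at ~320 B/entry:  {expected_bytes / 1000**2:.1f} MB")  # ~31.8 MB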
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
I think 3x is what is expected for memory usage.
 

virtualdxs

Dabbler
Joined
Nov 19, 2018
Messages
34
I think 3x is what is expected for memory usage.
I know for a fact that that's not true. If it were, dedup would be less than useless because you'd have to buy 15 terabytes of RAM to serve a 5 TB dataset (as an example).
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
I know for a fact that that's not true. If it were, dedup would be less than useless because you'd have to buy 15 terabytes of RAM to serve a 5 TB dataset (as an example).
Correct, you would need that much RAM. This is exactly why dedup isn't something that is worthwhile or actually used very much.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Correct, you would need that much RAM.
No, you wouldn't. Deduplication is RAM-hungry, but not so RAM-hungry that it routinely needs 3x the amount of RAM as the amount of storage being deduplicated.
 

virtualdxs

Dabbler
Joined
Nov 19, 2018
Messages
34
Correct, you would need that much RAM. This is exactly why dedup isn't something that is worthwhile or actually used very much.
No. That's not how this works. If that were true, NOBODY would use dedup, as it would be orders of magnitude cheaper to buy extra storage.

The rule of thumb for ZFS dedup is 5 GB of RAM per TB of deduplicated data.
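As a rough illustration of that rule of thumb (a sketch only; the 5 GB/TB figure is the usual guideline, and the function name is just for this example):

Code:
# Estimate the RAM needed to hold the DDT in core, using the 5 GB-per-TB rule of thumb.
def dedup_ram_estimate_gb(dataset_tb, gb_per_tb=5):
    return dataset_tb * gb_per_tb

print(dedup_ram_estimate_gb(5))  # 5 TB dataset -> about 25 GB of RAM, nowhere near 15 TB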

I've found a likely cause of my issue, but I need more information to take care of it, so I'm opening another thread to ask about that. I'll report back here if my suspicions are correct.
 