OpenZFS "Fast Dedup" Project now in Public Review

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
The collaborative project between Klara and iXsystems on "Fast Dedup" has been completed and presented as a series of pull requests (PRs) to the OpenZFS GitHub, ready for public review.

We're targeting this "Fast Dedup" functionality to release, hopefully alongside RAIDZ expansion, with TrueNAS SCALE 24.10 later in 2024.

Read more in the blog announcement:
https://www.ixsystems.com/blog/fast...-gift-to-the-openzfs-and-truenas-communities/

For more technical detail, dig into the discussion on the OpenZFS GitHub:
https://github.com/openzfs/zfs/discussions/15896
 

probain

Patron
Joined
Feb 25, 2023
Messages
211
Really looking forward to seeing reviews and conclusions once people who know far more about ZFS than I do have done said testing. :)
 

QuidNYC

Cadet
Joined
Jan 10, 2015
Messages
3
With this announcement, are you signaling a lengthy / indefinite / permanent divergence between the ZFS features that will be available in SCALE versus CORE, or is there an expected timetable for the latter to receive that implementation as well?
 

Volts

Patron
Joined
May 3, 2021
Messages
210
Amazing and congrats. Nice to see that development funding model working, too.

Are there any early comparisons - performance, efficiency, etc?
 


Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Funny how on Monday the folks at Oxide were lamenting that dedup just isn't very good on ZFS. Excellent timing on the drop.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
With this announcement, are you signaling a lengthy / indefinite / permanent divergence between the ZFS features that will be available in SCALE versus CORE, or is there an expected timetable for the latter to receive that implementation as well?
Since this work is going upstream to OpenZFS directly, it may land in a future CORE update once OpenZFS 2.3 is included there (though that won't be CORE 13.3), but it's going to land in SCALE first. Because fast dedup is enabled as an "extension" of the existing dedup capabilities, setting dedup=on on a given dataset/zvol should automatically enable this feature once present, assuming that you've upgraded the pool, as it requires a new set of on-disk data types.

Are there any early comparisons - performance, efficiency, etc?
FDT-log (Fast Dedup Table Logging) is where you should see the biggest performance gain. Basically, current dedup requires all updates to the DDT to be made in the same transaction group as the data itself. With the FDT-log update, the necessary updates are appended to both an in-memory and an on-disk journal, and methods similar to the existing OpenZFS write throttle are used to flush chunks of the FDT-log out to the permanent on-disk tables in batches (governed by a number of different tunables that you can play with).
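To make the batching idea concrete, here is a toy Python model, not OpenZFS code: the class name, the `flush_batch` parameter and the SHA-256 keys are all hypothetical stand-ins. Writes only append an update to a journal, lookups have to check both the permanent table and the pending journal, and a separate step folds a bounded batch of journal entries into the table. Updates are modelled as simple refcount increments to keep it short; the real log format isn't described in this post.

```python
import hashlib
from collections import Counter


class LoggedDedupTable:
    """Toy model (hypothetical, not OpenZFS code): refcount updates are
    journaled first, then folded into the permanent table in batches."""

    def __init__(self, flush_batch: int) -> None:
        self.table: dict[str, int] = {}   # stands in for the permanent on-disk DDT
        self.log: list[str] = []          # stands in for the in-memory/on-disk journal
        self.flush_batch = flush_batch    # hypothetical tunable: entries per flush

    def write_block(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self.log.append(key)              # cheap append instead of a table update
        return key

    def refcount(self, key: str) -> int:
        # Lookups must consult both the permanent table and the pending journal.
        return self.table.get(key, 0) + self.log.count(key)

    def flush_some(self) -> int:
        # Fold at most flush_batch journaled updates into the permanent table.
        batch, self.log = self.log[: self.flush_batch], self.log[self.flush_batch :]
        for key, n in Counter(batch).items():
            self.table[key] = self.table.get(key, 0) + n
        return len(batch)
```

The point of the design is that the table update is decoupled from the write itself, so its cost can be spread across later flushes instead of being paid in the same transaction group as the data.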

There's also a "DDT overflow abort": if you exceed the (tunable) thresholds for maximum DDT size, it will revert to standard writes. Add to that a DDT prefetch option to "pre-heat" the tables into RAM and a reduced memory/disk footprint using a "flat" object... there are lots of goodies in here.
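The overflow behaviour can be sketched the same way. This is a hypothetical toy model: `max_entries` stands in for the tunable size thresholds, and it assumes that blocks already tracked in the table can still be deduplicated after the limit is reached, with only new unique blocks falling back to standard writes.

```python
import hashlib


class QuotaDedupTable:
    """Toy model (hypothetical): once the table holds max_entries unique
    blocks, new unique blocks bypass dedup and are written normally."""

    def __init__(self, max_entries: int) -> None:
        self.max_entries = max_entries       # hypothetical stand-in for the size threshold
        self.refcounts: dict[str, int] = {}  # checksum -> reference count
        self.plain_writes = 0                # blocks written without a table entry

    def write_block(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        if key in self.refcounts:
            self.refcounts[key] += 1         # duplicate of a tracked block: dedup it
            return "dedup"
        if len(self.refcounts) >= self.max_entries:
            self.plain_writes += 1           # table full: fall back to a standard write
            return "plain"
        self.refcounts[key] = 1              # new unique block, room left in the table
        return "new"
```

With `max_entries=2`, writing `x, y, z, x, z` gives `new, new, plain, dedup, plain`: the table stops growing, but writes never fail because of it.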
 
Joined
Jun 15, 2022
Messages
674
RAIDZ expansion in CORE would be great.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
RAIDZ expansion in CORE would be great.
While that's also hopefully landing in OpenZFS 2.3 alongside FDT, it's likely not going to be supported via UI or heavily tested in CORE.
 
Joined
Jun 15, 2022
Messages
674
While that's also hopefully landing in OpenZFS 2.3 alongside FDT, it's likely not going to be supported via UI or heavily tested in CORE.
I understand and accept the reasoning that's likely driving those decisions. On SOHO installs (generally forum members) it would be useful.

(to be clear, I'm only constructively providing food for thought)
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
It's not a feature I plan to use in my home system, nor one I think should ever be used with ZFS, so I'm not upset about it not being on CORE.
 

Redcoat

MVP
Joined
Feb 18, 2014
Messages
2,925
It's not a feature I plan to use in my home system, nor one I think should ever be used with ZFS, so I'm not upset about it not being on CORE.
Which one - fast dedup or RAIDZ expansion?
 

Redcoat

MVP
Joined
Feb 18, 2014
Messages
2,925
Thx - I definitely don't plan on using dedup...
 
Joined
Jun 15, 2022
Messages
674
On mail servers and other storage systems I can see the use for dedup (gaming rigs too, though that's kind of a poor fit in my estimation). Fast Dedup is welcome even if I'm not currently using it.

RAID-Z expansion, on the other hand, has wide applicability for many forum members (probably best discussed outside this thread).

Thank you for the updates on Fast Dedup.
 

probain

Patron
Joined
Feb 25, 2023
Messages
211
How does dedup handle recursion into child datasets and zvols? Is each dataset and zvol isolated from the others, or does the data inside them get deduped if matching blocks have already been found in a parent or somewhere else in the pool?

For example, say I have two zvols set up the same way with the same OS installation. Would they be deduped against each other for the blocks they share? If so, how would this apply if they were located in different datasets across the pool (all of which have dedup enabled, of course)?
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
The DDT is per pool, so dedup applies globally to all datasets that use it.
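A toy Python model of a per-pool table may help with the two-zvol question. It is a sketch under stated assumptions, not how OpenZFS actually stores the DDT: the class, the SHA-256 keys and the dataset names are hypothetical, and it assumes only writes from datasets with dedup enabled consult the shared table. Two zvols holding the same OS blocks in different parts of the pool end up sharing one physical copy of each block.

```python
import hashlib


class Pool:
    """Toy model (hypothetical): one dedup table shared by every dataset
    and zvol in the pool that has dedup enabled."""

    def __init__(self) -> None:
        self.ddt: dict[str, int] = {}    # checksum -> reference count, pool-wide
        self.dedup_on: set[str] = set()  # datasets/zvols with dedup enabled
        self.physical_blocks = 0         # blocks actually allocated

    def write(self, dataset: str, data: bytes) -> None:
        if dataset not in self.dedup_on:
            self.physical_blocks += 1    # dedup off: always allocate, no table entry
            return
        key = hashlib.sha256(data).hexdigest()
        if key not in self.ddt:
            self.physical_blocks += 1    # first copy anywhere in the pool
        self.ddt[key] = self.ddt.get(key, 0) + 1


# Hypothetical zvols in unrelated parts of the dataset tree.
pool = Pool()
pool.dedup_on |= {"tank/vms/os1", "tank/other/os2"}
os_image = [b"kernel", b"libc", b"shell"]
for zvol in ("tank/vms/os1", "tank/other/os2"):
    for block in os_image:
        pool.write(zvol, block)
print(pool.physical_blocks)  # prints 3
```

Where the zvols sit in the hierarchy doesn't matter in this model; only whether the writing dataset has dedup enabled does.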
 

Philip Robar

Contributor
Joined
Jun 10, 2014
Messages
116
It's not a feature I plan to use in my home system, nor one I think should ever be used with ZFS, so I'm not upset about it not being on CORE.
Why?
 
