Game-changer for ZFS 2.2? Intelligent compression that saves CPU *and* space?

Joined
Oct 22, 2019
Messages
3,641
Apparently, an upcoming feature for ZFS is a complete re-imagining of how inline compression will be used.


Here is an example to summarize the gist of it:

  1. You configure a dataset to use ZSTD-9 compression.
  2. Upon writing a new record, ZFS first attempts to compress the entire record with LZ4 (ultra fast).
    • If the LZ4 result shrinks the record by at least 12.5%, ZFS compresses and saves it with your chosen level (i.e., ZSTD-9).
    • If the LZ4 result does not shrink the record by at least 12.5%, it then...
  3. ...attempts to compress the entire record with ZSTD-1 (very fast).
    • If the ZSTD-1 result shrinks the record by at least 12.5%, ZFS compresses and saves it with your chosen level (i.e., ZSTD-9).
    • If the ZSTD-1 result does not shrink the record by at least 12.5%, it then...
  4. ...abandons compression and saves the record uncompressed.
* The above process works the same for any ZSTD level above ZSTD-1 (e.g., ZSTD-3, -9, -19).
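To make this concrete, here's a minimal sketch of the decision flow as I understand it. It's purely illustrative pseudocode, not the actual OpenZFS implementation; the helper names and the explicit 12.5% threshold are just stand-ins.

Code:
MIN_SAVINGS = 0.125  # the "must shrink by at least 12.5%" rule

def worth_storing(compressed_len, original_len, min_savings=MIN_SAVINGS):
    """True if compression saved at least the required fraction of the record."""
    return compressed_len <= original_len * (1 - min_savings)

def write_record(record, zstd_level, lz4_compress, zstd_compress):
    """Illustrative early-abort flow: LZ4 and ZSTD-1 act as cheap compressibility probes."""
    original_len = len(record)

    # Pass 1: ultra-fast LZ4 probe over the whole record.
    if worth_storing(len(lz4_compress(record)), original_len):
        return zstd_compress(record, level=zstd_level)  # compress with the desired level

    # Pass 2: LZ4 said "incompressible" -- double-check with very fast ZSTD-1.
    if worth_storing(len(zstd_compress(record, level=1)), original_len):
        return zstd_compress(record, level=zstd_level)  # false negative caught

    # Both probes failed: store the record uncompressed.
    return record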



Essentially, the idea is:

"If the record shows compressibility with LZ4, then let's go ahead and compress it with the desired ZSTD level, since we know we'll get space savings without wasting CPU for nothing. LZ4 is so ultra fast that it works as a heuristic when run on an entire record."

"If the record supposedly lacks compressibility based on our first attempt with LZ4, that could be a false negative. Let's try again with ZSTD-1. It's still very fast, and it might reveal that the entire record is indeed compressible! If ZSTD-1 reveals this, then let's go ahead and compress it with the desired ZSTD level."

"If the record supposedly lacks compressibility with LZ4 and even with ZSTD-1? Forget it! It's not worth wasting more time and CPU trying to squeeze it with the desired compression level. Let's just write it as an uncompressed record."


zfs-compression-flowchart.png

Flowchart illustration of this process
"Desired ZSTD level" can be ZSTD-3, ZSTD-9, ZSTD-19, etc.




While all of this sounds great, I'm still unable to figure out if this is planned for ZFS 2.2. (I can't find any reference to this except for a thread with a Google employee who explains the logic behind this new feature.)



If this feature makes it into ZFS, it means you can set ZSTD compression levels at 3 and higher (e.g., ZSTD-3, -9, -19) without worrying about wasted CPU cycles trying to compress incompressible data with slower compression methods. The only "wasted" CPU will be the time spent testing the record with LZ4 (and ZSTD-1, to rule out false negatives), which is not a huge cost since both are very fast.


Thoughts on this? Did I interpret it incorrectly?

Will this have to be enabled as a "pool feature", or will it automatically "just work" without any additional configuration? (Important for TrueNAS users.)

Does anyone know if this is going to make it into ZFS 2.2? I'm having difficulty finding more information.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
I have seen that, but I don't remember the status and I can't quickly find it again. ZFS 2.2 seems aggressive for this feature, given the moving parts involved.
Will this have to be enabled as a "pool feature", or will it automatically "just work" without any additional configuration? (Important for TrueNAS users.)
This is a tricky topic. The results should be read-only compatible, but the settings side might be awkward, unless this behavior turns into the standard for zstd.
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
How's that a "complete re-imagining"?
If I remember correctly, ZFS already works this way, except that it directly attempts a test compression with the chosen algorithm and carries on if it gets at least 1/8 savings (divide by 8 = binary shift >> 3). So the improvement here is doing the test with faster algorithms, at the risk of doing more work and/or missing savings that are barely above 1/8 with the advanced algorithm but not with the faster, less effective lz4 or zstd-1.
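For reference, that 1/8 check boils down to something along these lines (a simplified sketch, not the actual OpenZFS source; the names are illustrative):

Code:
def keep_compressed(lsize, psize):
    """Store the compressed version only if it saves at least 1/8 of the logical size."""
    # lsize: logical (uncompressed) record size; psize: physical (compressed) size.
    return psize <= lsize - (lsize >> 3)  # ">> 3" divides by 8, i.e. a 12.5% savings threshold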
 
Joined
Oct 22, 2019
Messages
3,641
unless this behavior turns into the standard for zstd.
This is what I'm gathering, since I recall reading an article in the FreeBSD Journal in 2021 on how they plan to declutter and streamline ZSTD inline compression for ZFS. (It's too granular and confusing right now, considering how many "levels" there are.)


I like this proposal, since you can more comfortably select a higher ZSTD compression level without the added "buyer's remorse". You needn't worry if the data turns out to be incompressible, since it will take only slightly longer to write records (just the time for the LZ4 and ZSTD-1 heuristic passes). If you don't notice any decreased performance when currently using LZ4, you're very unlikely to notice an extra ZSTD-1 pass. (Especially if using "async" writes.) The payoff is that when it "catches" compressible data, you'll squeeze out even more space savings with very little additional cost. (A "cost" so small, it's not even worth worrying about.)


So instead of deciding "What ZSTD level should I choose for each dataset?" you can simplify your choices down to:
  • LZ4 for datasets with incompressible data
  • Some high ZSTD level for datasets that are likely to contain compressible data.

Just pick the highest level you're comfortable with (such as ZSTD-9) and enjoy. No need to get granular.


UPDATE: I added a flowchart to the original post.
 
Joined
Oct 22, 2019
Messages
3,641
If I remember correctly, ZFS already works this way
Only for LZ4, not for ZSTD.

This new method doesn't just introduce "early abort" to inline ZSTD compression (which ZFS currently lacks); the logic behind the process also differs from LZ4's early-abort feature.

EDIT: On top of this is the proposal that you don't even select a ZSTD level; an algorithm decides which compression level to use when saving the record. The rationale for this is that ZSTD decompresses data at the same speed, regardless of the compression level used to shrink the data. (So, theoretically, you choose "ZSTD", and the software decides on its own whether to compress a record with ZSTD-3 or ZSTD-9 or ZSTD-19, etc. No user intervention or upfront decision involved.)

* I'm still trying to pinpoint when this new method will be available; it's hard to find information about it online.
 
Joined
Oct 22, 2019
Messages
3,641
ZFS 2.2 seems aggressive for this feature, given the moving parts involved.
Looks like it will be featured in OpenZFS 2.2. :cool:

Here's the commit. (Already available in the 2.2 release candidate.)

Now I just need to find the status on "universal ZSTD compression level", which I read about in the FreeBSD Journal back in 2021.

Between these two compression mechanisms, inline compression for ZFS will be on a level beyond what other filesystems offer.


With BLAKE3 hashing, block cloning, a smarter ARC (especially with regard to metadata), and a new implementation of inline ZSTD compression, OpenZFS 2.2 is looking better and better.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Adaptive ZSTD is definitely huge as it addresses a major shortcoming of ZSTD - the lack of early-abort.

With an incompressible file, LZ4 basically gives up immediately; even the fastest ZSTD runs at about a third of the speed as it naively tries to squeeze bits from a stone.

Code:
Compressor name         Compress. Decompress. Compr. size  Ratio 
lz4 1.9.3                2199 MB/s  3451 MB/s   276960629 100.30
zstd 1.5.5 -1             881 MB/s  3431 MB/s   275942581  99.93


With a file where LZ4 tries to compress it and fails, but ZSTD succeeds - an algorithmic win - this pans out well.

Code:
Compressor name         Compress. Decompress. Compr. size  Ratio
lz4 1.9.3                 407 MB/s  2977 MB/s     8390195  99.01
zstd 1.5.5 -1             274 MB/s   297 MB/s     6771474  79.91
zstd 1.5.5 -3            46.7 MB/s   213 MB/s     6085291  71.81
zstd 1.5.5 -5            18.7 MB/s   195 MB/s     5556373  65.57
zstd 1.5.5 -7            14.7 MB/s   183 MB/s     5433980  64.12
zstd 1.5.5 -9            10.1 MB/s   154 MB/s     5361356  63.27


However, do note the compression speeds on the higher ZSTD levels. If you blindly set your dataset to ZSTD-9 in the hopes that adaptive ZSTD will spare you pain, you're in for a bit of a bumpy ride if it decides something does meet the criteria for the final step.

ZSTD decompresses data at the same speed, regardless of the compression level used to shrink the data.

It doesn't decrease at the same "falls off a cliff" rate as compression, no, but it's still impacted.

(There's a longer post/blog article coming on this stuff. Compression is fun!)
 
Joined
Oct 22, 2019
Messages
3,641
It doesn't decrease at the same "falls off a cliff" rate as compression, no, but it's still impacted.
Dang it, Mr. Badger. Way to crush my hopes and dreams. :frown: ("Honey badger don't care!")

i-cant-hear-you.jpg





But then what is Allan Jude referring to in this article? He makes it sound like "Regardless of compression level used, decompression performance is unaffected." The implication is that "We should use the highest ZSTD compression level possible if we can sneak it in before any noticeable speed impact. Because, once it's written as a compressed record, it's 'mission accomplished'. The user will not notice any performance penalty when they read/decompress the records."
Allan Jude said:
One of the main advantages of Zstd is that the decompression speed is independent of the compression level. For data that is written once but read many times, Zstd allows the use of the highest compression levels without a performance penalty.

(...)

Aside from continuing to optimize Zstd for ZFS, the next obvious evolution is to remove the need for the user to decide what Zstd level is best (there are 40 options to choose from after all). Instead, we envision a user simply setting compress=zstd-auto and ZFS dynamically adapts in some sensible way.

(...)

In ZFS, this would likely be modelled on the amount of “dirty” data (data waiting to be compressed and written to disk). When new data is written to ZFS, it will be compressed with the maximum compression level. If the rate of incoming writes is too high for ZFS to keep up with the requested level of compression, which results in the amount of dirty data steadily increasing, the compression level would lower incrementally, ideally settling on the maximum level that does not limit throughput. As always, the ZFS philosophy is to make sensible use of system resources while minimizing the need for adjustment and tweaking by the user.



With a file where LZ4 tries to compress it and fails, but ZSTD succeeds - an algorithmic win - this pans out well.
However, do note the compression speeds on the higher ZSTD levels. If you blindly set your dataset to ZSTD-9 in the hopes that adaptive ZSTD will spare you pain, you're in for a bit of a bumpy ride if it decides something does meet the criteria for the final step.
The way rincebrain explained it, the ZSTD-1 pass (to rule out a false negative from the LZ4 pass) is still "worth the cost" in time spent.

A pass of LZ4 + ZSTD-1 that ends up with a non-compressed record is still faster than skipping the first two passes and trying to squeeze every record with your desired ZSTD level.

So given the two options, Option B is superior:
  • Option A: Compress every record with your desired ZSTD level
  • Option B: Spend a small bit of extra time with LZ4 + ZSTD-1 before committing to the decision of non-compressed or compressed with ZSTD-n.
The above applies to a dataset with mostly incompressible data. However, even for a dataset with highly compressible data, you apparently waste only a negligible amount of time on the "redundant" LZ4 (+ ZSTD-1) passes.
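As a rough back-of-envelope illustration (using the single-threaded lzbench throughputs HoneyBadger posted above; the ZSTD-9 figure comes from the second table, so this mixes workloads, but it shows the order of magnitude):

Code:
# Rough per-record cost estimate for 1 MiB of incompressible data, treating the
# quoted MB/s as 10^6 bytes per second. Illustrative only, not a benchmark.
MIB = 1024 * 1024

def seconds(size_bytes, mb_per_s):
    return size_bytes / (mb_per_s * 1_000_000)

lz4_probe   = seconds(MIB, 2199)  # ~0.5 ms  (LZ4 pass)
zstd1_probe = seconds(MIB, 881)   # ~1.2 ms  (ZSTD-1 pass)
zstd9_full  = seconds(MIB, 10.1)  # ~104 ms  (blindly running ZSTD-9 anyway)

print(f"probe cost (LZ4 + ZSTD-1): {(lz4_probe + zstd1_probe) * 1000:.2f} ms")
print(f"ZSTD-9 on the same data:   {zstd9_full * 1000:.2f} ms")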



I can't help but see this as a "win-win". If I want to save space by using ZSTD-9, the only "cost" I may see from this new method (being introduced with OpenZFS 2.2) is the time spent on the first passes.

Without this feature, setting ZSTD-9 compression will outright waste more time and CPU trying to squeeze every record to be written, which means I'd have to "reconsider" and possibly use ZSTD-2 or ZSTD-3 to find a balance between speed and space savings. With ZFS 2.2, however, you supposedly don't need to find this sweet spot for your dataset, since the first passes eliminate the wasted CPU cycles on incompressible data.



(There's a longer post/blog article coming on this stuff. Compression is fun!)
Will Christmas arrive early? :grin: Looking forward to reading this!
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
But then what is Allan Jude referring to in this article? He makes it sound like "regardless of compression level used, decompression performance is unaffected."

It's definitely "affected" but nowhere near the same degree that compression speed is. Take a look at the graph shown later in the paper when comparing algorithms against the FreeBSD source code:

[Graph from the paper: decompression speed across algorithms and levels, measured against the FreeBSD source code]


Decompression speed actually drops from ZSTD-1 to -3 and -5 before accelerating again, and actually improving relative to -1 at -9 and beyond.

The proposed path towards zstd-auto is also really neat. It leverages the "--adapt" option from command-line zstd, which varies the compression level based on things like output speed; integrating it into ZFS and keying off factors like the outstanding dirty data in a txg buffer could let it squeeze out space where it can while reverting to a faster level under load.
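A rough sketch of the kind of feedback loop I mean (entirely hypothetical - the real --adapt logic and any future ZFS integration key off their own internal metrics, and the names and thresholds below are invented for illustration):

Code:
# Hypothetical adaptive-level loop: lower the ZSTD level when the dirty-data
# backlog grows (compression can't keep up), raise it again when it drains.
MIN_LEVEL, MAX_LEVEL = 1, 19

def adapt_level(current_level, dirty_bytes, dirty_high_water, dirty_low_water):
    if dirty_bytes > dirty_high_water and current_level > MIN_LEVEL:
        return current_level - 1  # falling behind: trade ratio for throughput
    if dirty_bytes < dirty_low_water and current_level < MAX_LEVEL:
        return current_level + 1  # keeping up comfortably: spend CPU on a better ratio
    return current_level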
 
Joined
Oct 22, 2019
Messages
3,641
Decompression speed actually drops from ZSTD-1 to -3 and -5 before accelerating again, and actually improving relative to -1 at -9 and beyond.
The "flaw" I see with that benchmark graph is that it doesn't take into account raw decompression speeds. (You can clearly see that "no compression" has the worst performance.)

Because source code is highly compressible, less data needs to be pulled from storage before it lands in RAM to be decompressed and read.

That's why it's hard to compare the decompression speeds of ZSTD-5 and ZSTD-15 using that graph, when what it's really comparing is "how fast can we pull this off the disk to decompress/read it in RAM?"

There's no way that Off or LZ4 is "slower" than ZSTD-15 in terms of pure CPU decompression work. (What the graph shows is that less data had to be read into RAM.)

It does, however, reveal that for data which is written once and read many times, the higher ZSTD levels not only save space but can also boost read performance. (There's less to read from the drives.)
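A simple toy model (made-up numbers, not a benchmark) of why that happens: when reads are disk-bound, the effective logical throughput is roughly the disk speed divided by the compression ratio, capped by how fast the CPU can decompress.

Code:
def effective_read_mbps(disk_mbps, decompress_mbps, ratio):
    """Toy model. ratio = compressed_size / original_size (e.g. 0.60 for 40% savings).
    Logical throughput is limited either by the disk (reading compressed bytes)
    or by the CPU (decompressing them), whichever is slower."""
    disk_limited = disk_mbps / ratio  # fewer physical bytes per logical byte read
    return min(disk_limited, decompress_mbps)

# Illustrative numbers only:
print(effective_read_mbps(disk_mbps=250, decompress_mbps=1500, ratio=0.60))  # ~416.7 "logical" MB/s
print(effective_read_mbps(disk_mbps=250, decompress_mbps=1500, ratio=1.00))  # 250.0 (no compression)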



I suppose that to truly see the benefits of "auto" ZSTD or "early abort" ZSTD, you need to work with a mixed bag of incompressible and compressible data. (Using software source code as your "sample data" grants an unfair advantage to the higher compression levels.)

Because at the end of the day, we want to spare our CPU (and time) from being wasted on pointless compression attempts during writes. (While still keeping the door open to squish records as much as possible, with the highest level possible, since reading the data will not suffer any true penalty at the higher ZSTD levels.)


Another way to phrase the above:
If you could snap your fingers like a magical genie, you would choose to have all your compressible records saved with ZSTD-19. (It happens in an instant. Magic wand!) You'd gain the maximum possible space savings, and your reads would be just as fast as if you had chosen ZSTD-3 with your "magic wand". But because no such magic wand exists, we need to be smart about the compression/writes, and not really worry about the decompression/reads.

Yet another way to phrase it:
A record reduced in size by exactly 12.5% with ZSTD-3 will be read and decompressed at the same speed as the same record reduced in size by exactly 12.5% with ZSTD-19. So if you eliminate the worry about a performance hit during the compression stage, then it's moot to consider the decompression/read performance, since with ZSTD, it's practically equal across all levels from 1 through 19. (That's what the author seems to imply.)

I think that makes sense?
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
You edited another 200% of post content in there after my response, now I have to go back and reply to everything. :grin:

Option A: Compress every record with your desired ZSTD level
Option B: Spend a small bit of extra time with LZ4 + ZSTD-1 before committing to the decision of non-compressed or compressed with ZSTD-n.

My comment wasn't about the A:B comparison between zstd with/without this enhancement as much as it was "don't take this feature as carte blanche to set ZSTD-ArbitraryHighNumber poolwide" as your memetic response summarized nicely. :wink: It's definitely a huge, huge advantage especially when you have incompressible data - you get all the wins where it does apply (even if you only choose a low level of ZSTD) and basically use LZ4 as a quick "detect incompressible data and abort" step.

The "flaw" I see with that benchmark graph is that it doesn't take into account raw decompression speeds. (You can clearly see that "no compression" has the worst performance.)
...
I suppose that to truly see the benefits of "auto" ZSTD or "early abort" ZSTD, you need to work with a mixed bag of incompressible and compressible data. (Using software source code as your "sample data" grants an unfair advantage to the higher compression levels.)

Good thing there's a dedicated Badger who's interested in this stuff, working with a benchmark that operates in-memory and uses a variety of data sources. As a quick preview of some results and insight, here is a graph of the zstd algorithm decompression speeds (normalized to the speed of zstd-1) across 12 different files.

zstd_decompress_normalized.png


Most of the test files have a significant drop-off after ZSTD-1 and then plateau, but text-heavy workloads exhibit the reverse (faster decompression at higher levels) and the more incompressible files (esp #12) continue to fall off as compression level is turned up.

Edit: And in response to your meme, I give you the results with ZSTD-19 added in line with the codeblocks above:

Code:
Compressor name         Compress. Decompress. Compr. size  Ratio
lz4 1.9.3                 407 MB/s  2977 MB/s     8390195  99.01
zstd 1.5.5 -1             274 MB/s   297 MB/s     6771474  79.91
zstd 1.5.5 -3            46.7 MB/s   213 MB/s     6085291  71.81
zstd 1.5.5 -5            18.7 MB/s   195 MB/s     5556373  65.57
zstd 1.5.5 -7            14.7 MB/s   183 MB/s     5433980  64.12
zstd 1.5.5 -9            10.1 MB/s   154 MB/s     5361356  63.27
zstd 1.5.5 -19           1.42 MB/s   133 MB/s     5155749  60.84


It'll be faster than chiseling it onto stone tablets, but the latter is more resilient to power outages. :wink:
 
Joined
Oct 22, 2019
Messages
3,641
You edited another 200% of post content in there after my response, now I have to go back and reply to everything. :grin:
I do that a lot. My brain realizes something (or sees a typo, a poorly-worded sentence, or missing important context) after-the-fact.

(You'll notice I edited one of your quotes to read "Winnie is right about everything! I'm going to Paypal him $500 USD to show my appreciation.")

...

...

...

Made you look?
 
Joined
Oct 22, 2019
Messages
3,641
Excitedly awaiting the blog post about compression. :smile:



As a quick preview of some results and insight, here is a graph of the zstd algorithm decompression speeds (normalized to the speed of zstd-1) across 12 different files.
Most of the test files have a significant drop-off after ZSTD-1 and then plateau, but text-heavy workloads exhibit the reverse (faster decompression at higher levels) and the more incompressible files (esp #12) continue to fall off as compression level is turned up.
This is where I'm more confused now than before I started this thread.

I thought one of the main selling points for ZSTD was: "ZSTD level only affects compression performance, not decompression performance."

Yet your benchmarks reveal there is a non-trivial (semi-significant?) difference, which follows a similar pattern to other compressors (e.g., gzip, 7z, zip), at least for around half of the files you tested. Though, like you said earlier, it's not as drastic as with those other compressors.

So I'm wondering what was implied with this:
Allan Jude said:
One of the main advantages of Zstd is that the decompression speed is independent of the compression level. For data that is written once but read many times, Zstd allows the use of the highest compression levels without a performance penalty.



And in response to your meme, I give you the results with ZSTD-19
Fine, fine. Can I negotiate for ZSTD-18? Let's find a compromise...
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
(You'll notice I edited one of your quotes to read "Winnie is right about everything! I'm going to Paypal him $500 USD to show my appreciation.")
Hey, that's wire fraud! Please do it subtly enough that I don't have to deal with it as a moderation issue.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Excitedly awaiting the blog post about compression. :smile:

I'll try not to make you wait until Christmas. :wink:

This is where I'm more confused now than before I started this thread.

I thought one of the main selling points for ZSTD was: "ZSTD level only affects compression performance, not decompression performance."

Yet your benchmarks reveal there is a non-trivial (semi-significant?) difference, which follows a similar pattern to other compressors (e.g., gzip, 7z, zip), at least for around half of the files you tested. Though, like you said earlier, it's not as drastic as with those other compressors.

So I'm wondering what was implied with this:

The GitHub page on ZSTD states: "Decompression speed is preserved and remains roughly the same at all settings, a property shared by most LZ compression algorithms, such as zlib or lzma," so I'm guessing the word "roughly" is doing some heavy lifting there.

Fine, fine. Can I negotiate for ZSTD-18? Let's find a compromise...
1696968605980.png
 
Joined
Oct 22, 2019
Messages
3,641
Goodie, goodie! OpenZFS 2.2 released today:

 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
Goodie, goodie! OpenZFS 2.2 released today:

With or without the feature?
 
Joined
Oct 22, 2019
Messages
3,641
How soon will this roll out to TrueNAS SCALE? Or has it already?
It's already available in Cobia. :smile: However, if you're happy on Bluefin in the meantime, and prefer a more conservative approach, you might want to wait before upgrading to Cobia. (Check out some threads in these forums to gauge how comfortable you are in upgrading. Some users have bumped into issues that they did not experience with Bluefin.)

My "tread carefully" message has nothing to do with any particular feature, just the general upgrade experience of Bluefin to Cobia.
 