Trying to understand file compression ratios

Status
Not open for further replies.

dpearcefl

Contributor
Joined
Aug 4, 2015
Messages
145
I was asked to compare how certain types of files are stored on a FreeNAS 9.3.1 box and see how well they compress. I copied about 101 GB of Linux snapshot files into two different volumes: one with no compression and no dedup, the other with LZ4 compression only.

The "Plain" volume shows 101 GB.
The "LZ4" volume shows 9.1 GB used.

I'm trying to understand why "zdb -b LZ4" and the GUI show a compression ratio of 2.41x when the space used dropped from 101 GB to 9.1 GB. How is it doing its math?

Thanks.



[blah@nas-test1 /mnt]$ sudo zdb -b LZ4

Traversing all blocks to verify nothing leaked ...

loading space map for vdev 0 of 1, metaslab 66 of 116 ...
7.86G completed (1055MB/s) estimated time remaining: 0hr 00min 01sec
No leaks (block sum matches space maps exactly)

bp count: 173868
ganged count: 0
bp logical: 22589351936 avg: 129922
bp physical: 9362349568 avg: 53847 compression: 2.41
bp allocated: 9672335360 avg: 55630 compression: 2.34
bp deduped: 0 ref>1: 0 deduplication: 1.00
SPA allocated: 9672335360 used: 0.97%

additional, non-pointer bps of type 0: 24
Dittoed blocks on same vdev: 1714
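
Here's the arithmetic I think it might be doing (plain Python, numbers copied from the output above, and I'm only guessing that the "compression" column is logical bytes divided by physical bytes), but that still doesn't explain the 101 GB vs 9.1 GB gap:
Code:
# Plain Python arithmetic with the numbers from the zdb output above.
# Guessing that zdb's "compression" column is logical bytes / physical bytes.
bp_logical   = 22589351936   # bytes the data would occupy uncompressed
bp_physical  = 9362349568    # bytes LZ4 actually produced
bp_allocated = 9672335360    # bytes allocated on disk (includes overhead)

print(round(bp_logical / bp_physical, 2))    # 2.41 -> matches "bp physical ... compression: 2.41"
print(round(bp_logical / bp_allocated, 2))   # 2.34 -> matches "bp allocated ... compression: 2.34"
print(round(bp_allocated / 2**30, 2))        # ~9.01 GiB -> roughly the 9.1 GB the LZ4 volume shows used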
 

Sakuru

Guru
Joined
Nov 20, 2015
Messages
527
What does "zfs get compressratio" show?
 

dpearcefl

Contributor
Joined
Aug 4, 2015
Messages
145
NAME PROPERTY VALUE SOURCE
LZ4 compressratio 2.41x -
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
Are there large empty blocks in the test data?

I ask because when writing to a dataset with any flavor of compression enabled, ZFS doesn't bother allocating blocks that are entirely composed of zeros. In this case, the unallocated space doesn't show up in the compression ratio. I discovered this recently in the course of experimenting with ddrescue to an NFS share, where the source drives were (edit: mostly) blank.
Code:
[root@poweredge] /mnt/pool0/storage/ddrescue/sandbox# ls -al
total 113289941
drwxr-xr-x  2 root     wheel               5 Nov 27 16:20 ./
drwxr-xr-x  4 windows  windows             4 Nov 20 17:35 ../
-rw-r--r--  1 root     wheel    250000000000 Nov 29 17:51 hd253gj
-rw-r--r--  1 root     wheel     55763075072 Nov 21 21:31 wd1200jd
-rw-r--r--  1 root     wheel    320072933376 Nov 28 02:20 wd3200bekt
[root@poweredge] /mnt/pool0/storage/ddrescue/sandbox# du -h *
512B    hd253gj
512B    wd1200jd
108G    wd3200bekt
[root@poweredge] /mnt/pool0/storage/ddrescue/sandbox# zfs get compressratio pool0/storage/ddrescue/sandbox

NAME                            PROPERTY       VALUE  SOURCE

pool0/storage/ddrescue/sandbox  compressratio  1.13x  -
[root@poweredge] /mnt/pool0/storage/ddrescue/sandbox#
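
If you want to reproduce the effect, here's a rough sketch (Python, with a hypothetical dataset path; exact numbers will depend on recordsize and on when the transaction group syncs): write one file of zeros and one of random data to an LZ4 dataset, then compare the apparent size with the blocks actually allocated.
Code:
# Rough sketch, not a definitive test: the dataset path is hypothetical and
# is assumed to have compression=lz4 set.
import os
import time

base = "/mnt/pool0/lz4test"   # hypothetical dataset with compression=lz4
size = 256 * 1024 * 1024      # 256 MiB per test file

with open(os.path.join(base, "zeros.bin"), "wb") as f:
    f.write(b"\x00" * size)               # all zeros: never becomes real blocks
with open(os.path.join(base, "random.bin"), "wb") as f:
    f.write(os.urandom(size))             # incompressible data

time.sleep(10)                            # give the transaction group time to sync

for name in ("zeros.bin", "random.bin"):
    st = os.stat(os.path.join(base, name))
    print(name, "apparent:", st.st_size, "allocated:", st.st_blocks * 512)

# Expect zeros.bin to allocate almost nothing while random.bin stays ~256 MiB;
# only the blocks that were actually written count toward compressratio.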
 

dpearcefl

Contributor
Joined
Aug 4, 2015
Messages
145
The data I am using for this test is Linux snapshot data. It does decrease in size considerably when I deduplicate it, though not as much as with LZ4 compression. So large empty regions are certainly possible. ZFS is very smart indeed.

Thanks for the clue.
 

dpearcefl

Contributor
Joined
Aug 4, 2015
Messages
145
No doubt there's a lot of empty space. I was just unaware that ZFS does not store "empty" space, and that it doesn't figure into the compression ratio.

Moral of the story: "Compression ratio" in the GUI does not tell the whole story.
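
To put rough numbers on it (assuming the gap between the ~101 GB I copied and zdb's "bp logical" is mostly those never-written zero blocks):
Code:
# Rough reconciliation of the numbers in this thread (Python, decimal GB).
copied      = 101 * 10**9       # ~101 GB of snapshot files copied in
bp_logical  = 22589351936       # what ZFS kept as logical (non-zero) data
bp_physical = 9362349568        # what LZ4 actually wrote

print(round((copied - bp_logical) / 10**9, 1))   # ~78.4 GB of zeros never allocated
print(round(copied / bp_physical, 1))            # ~10.8x apparent end-to-end saving
print(round(bp_logical / bp_physical, 2))        # 2.41x, the only part compressratio reports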
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
I was just unaware ZFS does not store "empty" space.
Same here! When I saw my results, I dug around and found this.

EDIT

One of the consequences is that if you read from empty space, the disks are idle but the CPU isn't.
 