Compress an existing data set?


CAlbertson

Dabbler
I was really surprised that this worked. Or did it work?

I made a ZFS volume, then a dataset. Next I exported the dataset over AFP and started writing data to it.

Then I thought, why not compress it and save space? So I used the GUI's "edit dataset", selected "lzjb", and clicked "edit dataset". It worked, I think.

Questions:

1) How does this work? Does some process start and go back and compress data that is already on the disk?

2) If I wanted to verify the data are compressed, say using the terminal, how can I find the compression ratio?
 

ProtoSD

MVP
2) If I wanted to verify the data are compressed, say using the terminal, how can I find the compression ratio?

Create another dataset without compression and copy some of the files from the compressed dataset and compare the sizes.
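A minimal sketch of that comparison from the shell, assuming a pool named "tank" and dataset/folder names that are only examples (adjust to your own layout):
Code:
# create a dataset with compression off, copy the same files into it,
# then compare on-disk usage of the two copies
zfs create -o compression=off tank/nocomp
cp -R /mnt/tank/mydata/somefolder /mnt/tank/nocomp/
du -sh /mnt/tank/mydata/somefolder /mnt/tank/nocomp/somefolder

du reports blocks actually used on disk, so the copy on the compressed dataset should come out smaller if the data compresses at all.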
 

cyberjock

Inactive Account
You can determine the ratio (I think) with zfs get all zpoolname.

Enabling/disabling compression only affects new files. Previous files (technically data blocks) that already exist are NOT compressed or decompressed. If you want to compress them after enabling compression, you'd have to move the files off the zpool, then back on.
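A quick way to see that from the shell (the dataset name here is only an example):
Code:
# enabling compression only affects blocks written from this point on
zfs set compression=lzjb tank/mydata
zfs get compression,compressratio tank/mydata

The compressratio will sit near 1.00x until new data is written (or existing files are rewritten).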

Depending on a lot of things, compression can be a performance killer for no significant benefit. Depending on the file type and compression type, you can end up with a compression ratio smaller than 1.0x.
 

CAlbertson

Dabbler
You can determine the ratio (I think) with zfs get all zpoolname.

Enabling/disabling compression only affects new files......

So this explains why nothing dramatic happened when I enabled compression; I expected 30 minutes of disk activity. I don't see a performance problem either, no difference really. My Intel Atom CPU stays at about 70% idle in the "top" display.

This dataset is holding an Apple Time Machine backup, basically an incremental backup of an entire computer. Much of the data is not compressible, but a Mac OS X system disk has over a million small files and they all compress. I'm not sure about Nikon NEF files; I have 350GB of those. Overall I think it might be about a 25% reduction, maybe. The backup to FreeNAS has been running for about 12 hours and is not yet done. Apple's Time Machine is slow and will almost halt if the computer is in use, so as not to "hog" resources.
 

fracai

Guru
You can determine the ratio (I think) with zfs get all zpoolname.
You can use:
Code:
zfs get compressratio pool

I usually like to filter out the datasets that don't see any benefit from compression, and the snapshots, with:
Code:
zfs get -r compressratio pool | egrep -v '(@|1\.00x)'
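The -r recurses through every child dataset; the egrep then drops snapshot lines (they contain "@") and anything still reporting a 1.00x ratio.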


Enabling/Disabling only enables/disables compression for new files. Previous files(technically data blocks) that already exist are NOT compressed or decompressed. If you want to compress them after enabling compression you'd have to move the files off the zpool, then back on.
Wouldn't it be enough to make a copy of the file next to the original, delete the original, and rename the copy?
Code:
cp file file.compressed && rm file && mv file.compressed file

Unless you had dedup turned on, but we know where that discussion leads.
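Extending that idea to a whole directory tree might look something like this; purely a sketch, the path is an example, and it doesn't handle open files, hard links, or sparse files:
Code:
# rewrite every regular file in place so its blocks are written again
# (and therefore compressed under the current setting)
find /mnt/tank/mydata -type f -exec sh -c 'cp -p "$0" "$0.tmp" && mv "$0.tmp" "$0"' {} \;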

Depending on a lot of things, compression can be a performance killer for no significant benefit. Depending on the file type and compression type, you can end up with a compression ratio smaller than 1.0x.
Have you done / seen testing on this?
The articles that I've read indicate that LZJB should pretty much always be on, and that depending on what data you're storing, gzip-1 or gzip-9 is appropriate. I'd especially be interested in situations where the ratio drops below 1. You'd think that ZFS would be intelligent enough not to store data compressed if compressing it increases its size, though maybe the compression process doesn't allow for that.

https://blogs.oracle.com/observatory/entry/zfs_compression_a_win_win
http://web.archive.org/web/20090201...008/10/13/zfs-mysqlinnodb-compression-update/
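For what it's worth, a sketch of that per-dataset approach (pool and dataset names are just examples):
Code:
# pick the algorithm per dataset based on what lives there
zfs set compression=lzjb tank/media        # already-compressed media: fast, little gain
zfs set compression=gzip-9 tank/documents  # text-heavy data: slower writes, better ratio
zfs get -r compression,compressratio tank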
 

cyberjock

Inactive Account
Yes, you could copy the file to another file, delete the original, then rename the copy.

Check this out, I just found it: http://denisy.dyndns.org/lzo_vs_lzjb/ He shows a chart where the lzjb file size came out to about 112% of the original. I experimented with compression a little just to see how good or bad it was, but since I know for a fact I'd see very little benefit, I saw no point in using it. To me it adds an unnecessary level of complexity I don't want to deal with. I'm not sure about the Time Machine backups Apple uses, but I know Acronis and O&O DiskImage both include their own compression algorithms, and the files usually don't get any smaller. In my case, if my systems were being backed up with either of those, I'd be crazy to try to add another layer of compression. I know one person in the forum enabled compression on his Atom and he couldn't even get 2MB/sec transfer rates with it. LOL!

When it comes to data recovery, compression can, in some situations, make recovery impossible. For me, all the reasons above are why I didn't use it. I have yet to see someone come into the forum to discuss compression or dedup and actually have a genuinely valid use case for it. For most people it seems to be "I want to get every byte of storage space out of my drives that I can because I'm cheap".

Of course, at the end of the day it's totally your prerogative how you want to run your server. That's the great thing about owning your own setup. I'll never use it, and I know I'd be stupid to ever use it in my case, but your mileage may vary. And don't be surprised if your server suddenly can't transfer data as fast as you want.

I remember reading somewhere that Sun's recommendation was to use it only if your files were guaranteed to compress to at least 2:1, but I can't find the link now.
 

fracai

Guru
That link also explains that the data isn't stored compressed if doing so would mean more data to write out. And it reiterates the point that compressed data should yield fewer blocks to read, and thus better I/O.

I'm also not sure why wanting to use every byte of storage space is a bad thing or an invalid use-case. What's wrong with using your resources? I certainly don't mind being called "thrifty".

And if the Sun recommendation is indeed 2:1, gzip-9 has me covered for the datasets where that's enabled. For my others I'm fine with a bit of latency on writing to check if the data is compressible.
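A quick sanity check for that on a FreeBSD-based box like FreeNAS is to compare a directory's apparent size with what it actually occupies on disk (the path is just an example):
Code:
# -A reports apparent (logical) size; plain du reports blocks actually used
du -Ash /mnt/tank/documents
du -sh /mnt/tank/documents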
 