Change and measure compression?

Status
Not open for further replies.

John M. Długosz

Contributor
Joined
Sep 22, 2013
Messages
160
It appears that I can change the Compression option of a dataset after it has been created. Does it change all the data to the new compression, or apply only to new writes?

How can I tell how well the compression is working? Looking at files, I expect I'll see only the "inside" size as compression is transparent.
 

Dusan

Guru
Joined
Jan 29, 2013
Messages
1,165
It appears that I can change the Compression option of a dataset after it has been created. Does it change all the data to the new compression, or apply only to new writes?
It only changes it for new writes.
How can I tell how well the compression is working? Looking at files, I expect I'll see only the "inside" size as compression is transparent.
You can check the compressratio and refcompressratio ZFS properties: zfs get compressratio,refcompressratio [dataset]
Find the description of the properties here: http://www.freebsd.org/cgi/man.cgi?query=zfs&manpath=FreeBSD+9.2-RELEASE
You can also use the logicalused and logicalreferenced properties: zfs get used,logicalused,referenced,logicalreferenced [dataset]
Check the description of the properties in the man page.
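As a minimal sketch of checking those properties (the dataset name `tank/timemachine` is an assumption; substitute your own):

```shell
# Admin sketch; skips gracefully where the zfs tool is unavailable.
command -v zfs >/dev/null 2>&1 || exit 0

# Ratio of uncompressed to stored size, for all data in the dataset
# and for its own (referenced) data respectively
zfs get compressratio,refcompressratio tank/timemachine

# Logical (pre-compression) sizes next to what is actually stored
zfs get used,logicalused,referenced,logicalreferenced tank/timemachine
```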
 

John M. Długosz

Contributor
Joined
Sep 22, 2013
Messages
160
So that lets me be selective about compression, even though there is no interface to change the compression of specific files.

I can crank up the compression for the initial TimeMachine image (which looks like it has lots of runs of repeated bytes in many of the segments) and then set it to a more performant value afterwards for subsequent updates.
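That plan would look something like the following (dataset name assumed; note that data already on disk keeps its old compression until it is rewritten):

```shell
# Admin sketch; skips gracefully where the zfs tool is unavailable.
command -v zfs >/dev/null 2>&1 || exit 0

# Heavy compression while the initial backup image is written
zfs set compression=gzip-9 tank/timemachine

# ... run the initial Time Machine backup here ...

# Then switch to a lighter, faster algorithm for subsequent updates;
# blocks already on disk stay gzip-9 until they are rewritten
zfs set compression=lz4 tank/timemachine
```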
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Time Machine doesn't do its own compression? I find that so hard to believe...

Edit: I'll be damned.. it doesn't do compression!

What I'd do is make a dataset for your time machine backups and turn on compression for it.
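A sketch of that, with assumed pool/dataset names:

```shell
# Admin sketch; skips gracefully where the zfs tool is unavailable.
command -v zfs >/dev/null 2>&1 || exit 0

# Dedicated dataset for Time Machine backups, with compression on
zfs create -o compression=lz4 tank/timemachine
zfs get compression tank/timemachine
```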
 

fracai

Guru
Joined
Aug 22, 2012
Messages
1,212
I can crank up the compression for the initial TimeMachine image (which looks like it has lots of runs of repeated bytes in many of the segments) and then set it to a more performant value afterwards for subsequent updates.
I don't see the point of this.

Presumably, you're running TimeMachine over the network (rather than iSCSI) so your backup is stored in a compressed sparsebundle. So you may not actually see any benefit from compression at all.

Let's say there is a benefit though. TimeMachine completes the initial backup to a dataset with high compression. Now you drop the compression in order to achieve some speed benefit. As TM starts pruning old backups, the bands inside the sparsebundle will need to be altered and those highly compressed blocks will be rewritten with the lower compression. If you get lucky, there may be significant numbers of bands that are never rewritten because the data is static in the source. I'm skeptical that this would be at all frequent. So, the likely case is that you'd eventually end up with a dataset that is entirely populated by the lower compression.

Just pick the compression level that you think is appropriate and stick with it. (I have lz4 on my TimeMachine datasets and the ratio is at 1; it's already compressed)

My email archive uses gzip-9, as do my jails (the ports tree ratio is over 3) and syslog (ratio is 11.68!). Pretty much everything else started on lzjb and is now on lz4.
 

John M. Długosz

Contributor
Joined
Sep 22, 2013
Messages
160
I don't see the point of this.

Presumably, you're running TimeMachine over the network (rather than iSCSI) so your backup is stored in a compressed sparsebundle. So you may not actually see any benefit from compression at all.

The sparsebundle is not compressed. The band files are all the same size, and I saw that some of them contain runs of 512 zeros or other fill patterns.

I'd rather use the NAS's cpu power to do the compression, and not rewrite entire band files when one sector changes. So I enabled compression for the dataset rather than redoing the sparsebundle creation with compression enabled.

I'm showing 1.14× after I recopied the band files after enabling compression. I don't know if all the existing snapshots are counted in the average or not.
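The recopy step can be sketched as below; here BANDS_DIR is a throwaway demo directory standing in for the sparsebundle's real bands directory, since rewriting each band file is what makes ZFS store it under the current compression setting:

```shell
# Demo directory standing in for e.g. .../MyMac.sparsebundle/bands
BANDS_DIR=$(mktemp -d)
printf 'band data' > "$BANDS_DIR/0"   # stand-in band file

# Copy each band aside and move it back, so every band is rewritten
# (and therefore recompressed) under the dataset's current setting
for band in "$BANDS_DIR"/*; do
    cp -p "$band" "$band.tmp" && mv "$band.tmp" "$band"
done
```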
 

fracai

Guru
Joined
Aug 22, 2012
Messages
1,212
Ah, ya know, I bet you're not using encryption on your sparsebundle. Sorry for the confusion.
 

John M. Długosz

Contributor
Joined
Sep 22, 2013
Messages
160
As TM starts pruning old backups, the bands inside the sparsebundle will need to be altered and those highly compressed blocks will be rewritten with the lower compression. If you get lucky, there may be significant numbers of bands that are never rewritten because the data is static in the source.

File number 1, for example, has a tiny clump of information near the end of a run of 0x4806000 zeros. After that it becomes all FFs for a while, then back to zeros.

That bugs me more than the total savings do. If it's just the unused sectors that are highly compressed, overwriting them will only affect the compression of the ZFS blocks actually rewritten, and the zeros are not changing.

Ragged sectors at the end of files appear to offer lots of consecutive zeros. I think I'll just leave compression on high until I find that it hurts performance.
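As a rough stand-alone illustration of why those zero runs are nearly free to store (using gzip here in place of ZFS's transparent compression):

```shell
# 1 MiB of zeros, like an unused region of a band file
ZEROS=$(mktemp)
dd if=/dev/zero of="$ZEROS" bs=1024 count=1024 2>/dev/null

gzip -9 -c "$ZEROS" > "$ZEROS.gz"
ORIG=$(wc -c < "$ZEROS")
PACKED=$(wc -c < "$ZEROS.gz")
echo "zeros: $ORIG bytes raw, $PACKED bytes compressed"
```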
 

John M. Długosz

Contributor
Joined
Sep 22, 2013
Messages
160
Ah, ya know, I bet you're not using encryption on your sparsebundle. Sorry for the confusion.
You mean compression, not encryption?
I think the only way a compressed sparsebundle could reasonably support random-access updates is to rewrite the entire band file. For NAS use we want to override the defaults and make the bands even bigger, so it's not a good mix. In any case, why should the laptop have to do the work when the NAS can offload it and compress locally?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Encryption before compression is always stupid anyway, because encryption deliberately produces pseudo-random data, so you never get much compression out of it.
 
