On disk compression in jails...

Status: Not open for further replies.

DrKK

FreeNAS Generalissimo
So, I'm in the netherland of FreeBSD knowledge, which makes me about 100x smarter than a noob and about 100x dumber than a sensei.

So I am hoping Dusan or Jordan can tell me something about this.

I decided to make a ZFS dataset in which my jails would be stored, and I decided to compress the dataset (using Lempel-Ziv or whatever the 'recommended' one is), since it's a portsjail and I'll be doing a lot of (what I assume is) compressible stuff. Now, whether or not that's a logical idea (I'm sure it's not, and one of you will chastise me), I still notice something interesting.
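For reference, this is roughly the command-line equivalent of what I set up (the dataset name is mine, and I'm assuming lz4 is the 'recommended' compression the GUI offers):
Code:
[root@freenas] ~# zfs create -o compression=lz4 drkk/jails
[root@freenas] ~# zfs get compression drkk/jails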

The jails dataset (currently containing just one portjail) appears to have 2.9GB of files in it, as:
Code:
[root@freenas] /mnt/drkk# du -Ahs jails
2.9G    jails


But then when I look at the COMPRESSED size:
Code:
[root@freenas] /mnt/drkk# du -hs jails
4.1G    jails


What the hell? Why is it *BIGGER*? "OK", I think to myself, I must have the -A "apparent size" flag backwards in my head. So I do a sanity check on my presently 192kB long nginx-access.log file:

Code:
[root@freenas] /mnt/drkk/syslog/log# du -hs nginx-access.log
 27k    nginx-access.log
[root@freenas] /mnt/drkk/syslog/log# du -Ahs nginx-access.log
193k    nginx-access.log


WHOOPS! Guess not: looks like I *DO* get the compressed, actual-on-disk size when I do *NOT* use the -A flag, and the decompressed (apparent) size WITH the -A flag, just as I thought originally. So what the hell is going on?

So, this means my "compressed jails" are taking up *FAR* more room than the sizes of their constituent files, *OR* I am stupid about something in FreeBSD and need to be enlightened, if one of you guys would kindly do so?

Thanks.
 

DrKK

FreeNAS Generalissimo
Or... this is one of those ZFS-isms that Cyberjock was telling me about, the kind that makes things like "df" and other familiar Unix-isms unwise to use.
 

fracai

Guru
Try: zfs get compressratio dell/jails/<jail name>

Or: zfs get compressratio | grep -v 1.00x | grep -v @

That second one will show you everything that sees a detectable benefit, excluding snapshots.
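The output looks something like this (the names and ratios here are just made-up examples to show the format):
Code:
[root@freenas] ~# zfs get compressratio | grep -v 1.00x | grep -v @
NAME         PROPERTY       VALUE  SOURCE
dell/jails   compressratio  1.64x  -
dell/syslog  compressratio  9.02x  -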
 

fracai

Guru
My syslog dataset is around 9x and my jails are all around 1.5-3x.
 

Dusan

Guru
So, this means my "compressed jails" are taking up *FAR* more room than the sizes of their constituent files, *OR* I am stupid about something in FreeBSD and need to be enlightened, if one of you guys would kindly do so?
Yes, the portsjail really takes more pool capacity than the size of the files, but the compression is not the root cause. If you run du (-A) on a fresh compressed jail you'll see that it consumes less space than the files it contains (in line with what you would normally expect). However, run portsnap fetch extract in the jail and you suddenly see the "anomaly". Here is the result for the jails directory on one of my VMs:
Code:
[root@freenas] /mnt/tank# du -hs jails/
2.4G    jails/
[root@freenas] /mnt/tank# du -Ahs jails/
1.8G    jails/

By the way, these numbers are not exactly right; let's ask ZFS directly:
Code:
[root@freenas] /mnt/tank# zfs get used,lused tank/jails
NAME        PROPERTY     VALUE  SOURCE
tank/jails  used         1.90G  -
tank/jails  logicalused  1.30G  -

The used property tells you how much of the pool capacity the dataset consumes (similar to du -hs), while logicalused tries to give you the "real" size of the data (similar to du -Ahs). The numbers are different, but the difference is easy to explain: warden (the jail system in FreeNAS) uses snapshots & clones to save disk space. When you create a jail, warden downloads the jail template (it gets its own dataset), snapshots it, and then creates the jail by cloning that snapshot. This way the FreeBSD files only need to be stored once instead of being copied for every jail. However, du is not aware of this (it's similar to having hardlinks and running du with the -l option). The jail template is ~0.5G, so that explains the difference nicely -- du counts the files twice, once in the template and once in the jail. You can see the clone relationship directly; a quick sketch is below.
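(Dataset and jail names below are from my pool; adjust them to your own layout.)
Code:
[root@freenas] /mnt/tank# zfs list -r -o name,origin,used tank/jails
[root@freenas] /mnt/tank# zfs get origin tank/jails/portsjail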
However, the original question is still here: even zfs says that the files use more pool capacity than their total size. The reason is the filesystem overhead -- metadata (dnode, indirect blocks, ...), 4k alignment, blocks smaller than the recordsize, and so on. The overhead is relatively small when you store larger files but really stands out with tiny files, and the ports tree contains thousands of tiny files.
Let's try an extreme case, a one-byte file:
Code:
[root@freenas] /mnt/tank# echo -n 0 > zero
[root@freenas] /mnt/tank# du -Ah zero
512B    zero
[root@freenas] /mnt/tank# du -h zero
6.0k    zero

The zfs command won't tell us anything about individual files, so let's use zdb:
Code:
[root@freenas] /mnt/tank/test# zdb -ddddd tank/test 8
Dataset tank/test [ZPL], ID 175, cr_txg 31043, 197K, 8 objects, rootbp DVA[0]=<0:155912000:2000> DVA[1]=<0:393e04000:2000> [L0 DMU objset] fletcher4 uncompressed LE contiguous unique double size=800L/800P birth=31047L/31047P fill=8 cksum=8f7f2d4a5:acfecd712dd:838d2a4d1ae72:4a51c15866ccc03
 
    Object  lvl  iblk  dblk  dsize  lsize  %full  type
        8    1    16K    512  5.50K    512  100.00  ZFS plain file
                                        168  bonus  System attributes
        dnode flags: USED_BYTES USERUSED_ACCOUNTED
        dnode maxblkid: 0
        path    /zero
        uid    0
        gid    0
        atime  Mon Jan 13 18:08:57 2014
        mtime  Mon Jan 13 18:08:57 2014
        ctime  Mon Jan 13 18:08:57 2014
        crtime  Mon Jan 13 18:08:57 2014
        gen    31047
        mode    100644
        size    1
        parent  4
        links  1
        pflags  40800000004
Indirect blocks:
              0 L0 0:1558fa000:2000 200L/200P F=1 B=31047/31047
 
                segment [0000000000000000, 0000000000000200) size  512

The dsize is the on-disk size of the file. If you wonder about the 512-byte difference from what du outputs, here's the explanation (I'm currently reading the unfortunately outdated ZFS On-Disk Specification draft, experimenting with zdb, and referencing cddl/contrib/opensolaris/cmd/zdb/zdb.c and sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu.c & dnode.c :)).
zdb gets the dsize here:
https://github.com/trueos/trueos/bl...trib/opensolaris/cmd/zdb/zdb.c?source=c#L1683 (it's called asize in the code)
https://github.com/trueos/trueos/bl...olaris/uts/common/fs/zfs/dmu.c?source=c#L1716
Checking du.c shows that it uses the fts_ functions to traverse the file tree; fts_read returns the stat information for each file. du -Ahs gives you st_size (rounded up to 512-byte blocks), while du -hs gives you st_blocks (multiplied by the block size). The stat zips through the various kernel layers and finally gets the st_blocks here: https://github.com/trueos/trueos/bl...olaris/uts/common/fs/zfs/dmu.c?source=c#L1775 (it's basically the same as dsize, but it adds one 512-byte block for the dnode)
(For completeness: in our case du would still show 6k even without the +1, as the FreeNAS shell environment sets BLOCKSIZE=K, so 5.5k already rounds up to six 1k blocks. :))
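If you want to see the two stat fields du is reading, stat(1) can print them directly. A quick sketch using the zero file from above; I'd expect 12 blocks here, since 12 × 512 = 6k, matching du -h:
Code:
[root@freenas] /mnt/tank# stat -f "st_size=%z st_blocks=%b" zero
st_size=1 st_blocks=12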

Bonus:
For comparison, let's fill the filesystem with 10000 one-byte files (echo > file_i) to see how large the overhead really is (a sketch of the loops I used follows the results):
Code:
[root@freenas] /mnt/tank# du -hs test
60M    test
[root@freenas] /mnt/tank# du -Ahs test
4.9M    test

And now do the same with 10000 6k zero files (dd if=/dev/zero of=file_i bs=6k count=1):
Code:
[root@freenas] /mnt/tank# du -hs new
6.2M    new
[root@freenas] /mnt/tank# du -Ahs new
58M    new
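
In case anyone wants to reproduce this, the loops are nothing fancy; roughly the following in sh (run each one in its own dataset, test and new, so the files don't clash):
Code:
i=1
while [ $i -le 10000 ]; do
        echo > file_$i          # one-byte files (just a newline)
        i=$((i + 1))
done

i=1
while [ $i -le 10000 ]; do
        dd if=/dev/zero of=file_$i bs=6k count=1 2> /dev/null   # 6k files full of zeros
        i=$((i + 1))
done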

Let's take a look at one of the 6k files:
Code:
[root@freenas] /mnt/tank/new# dd if=/dev/zero of=6k bs=6k count=1
1+0 records in
1+0 records out
6144 bytes transferred in 0.000197 secs (31187817 bytes/sec)
[root@freenas] /mnt/tank/new# du -h 6k
512B    6k
[root@freenas] /mnt/tank/new# du -Ah 6k
6.0k    6k
[root@freenas] /mnt/tank/new# zdb -ddddd tank/new 10008
Dataset tank/new [ZPL], ID 161, cr_txg 30915, 4.82M, 10008 objects, rootbp DVA[0]=<0:15573a000:2000> DVA[1]=<0:393c32000:2000> [L0 DMU objset] fletcher4 uncompressed LE contiguous unique double size=800L/800P birth=31018L/31018P fill=10008 cksum=c2f305ae0:e9fd1dcff9d:aca6de83f2e9d:5d20a7f041ed1e7
 
    Object  lvl  iblk  dblk  dsize  lsize  %full  type
    10008    1    16K  6.00K      0  6.00K    0.00  ZFS plain file
                                        168  bonus  System attributes
        dnode flags: USED_BYTES USERUSED_ACCOUNTED
        dnode maxblkid: 0
        path    /6k
        uid    0
        gid    0
        atime  Mon Jan 13 18:06:52 2014
        mtime  Mon Jan 13 18:06:52 2014
        ctime  Mon Jan 13 18:06:52 2014
        crtime  Mon Jan 13 18:06:52 2014
        gen    31018
        mode    100644
        size    6144
        parent  4
        links  1
        pflags  40800000004
Indirect blocks:
 
                segment [0000000000000000, 0000000000001800) size 6.00K

Interesting, right? ;)
 

fracai

Guru
Yes, the portsjail really takes more pool capacity than the size of the files, but the compression is not the root cause. If you run du (-A) on a fresh compressed jail you'll see that it consumes less space than the files it contains (in line with what you would normally expect). However, run portsnap fetch extract in the jail and you suddenly see the "anomaly".
I'm not sure I follow you here. Are you saying the ports tree exposes this issue because it contains very many small files?
 

Dusan

Guru
I'm not sure I follow you here. Are you saying the ports tree exposes this issue because it contains very many small files?
Correct. The ports tree I have here contains 120726 files with a total "apparent size" of 340MB -- the average file size is just ~3kB.
But this is not a ZFS-specific issue. Every filesystem has some overhead (metadata, minimum block size, ...) and it's always most visible with small files. A few tens of kB of overhead on a several-megabyte file is a small fraction, but 6kB to store a tiny file, multiplied by a hundred thousand tiny files, makes a noticeable difference.
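Back-of-the-envelope, using the ~6kB per tiny file from the 10000-file experiment above (a rough assumption, since not every ports file is that small): 120726 files × ~6kB ≈ 700MB of pool capacity to hold ~340MB of actual data.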
 

fracai

Guru
Correct. The ports tree I have here contains 120726 files with a total "apparent size" of 340MB -- the average file size is just ~3kB.
Let's see then: I currently have five jails, but they all share a ports tree dataset. I'm probably only saving a GB or so, but it makes keeping the tree up to date a lot easier.
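For anyone curious how the sharing works: the shared ports dataset just gets mounted into each jail with nullfs (I believe the jail storage option in the FreeNAS GUI does essentially this for you). A rough sketch; the paths are only examples, adjust them to your own layout:
Code:
[root@freenas] ~# mount -t nullfs /mnt/dell/ports /mnt/dell/jails/<jail name>/usr/ports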
 

cyberjock

Inactive Account
And keep this in perspective too. Unless you had tens or hundreds(?) of millions of very small files, you aren't likely to even notice the difference. We're talking about 1GB of disk space for the ports tree while 4000GB drives are the norm and people often have pools exceeding 10TB. Are you really going to sit here and tell me you noticed that missing 0.1% (or less) of disk space and went looking for it? Hell no, you went running commands and comparing numbers.
 