gzip vs lz4 compression (reducing space for server backup)

NASbox

Guru
Joined
May 8, 2012
Messages
650
I thought the community might benefit from this, so I decided to write it up and post it. I'd be interested in comments from those with a lot of ZFS experience in a wide variety of use cases and data types.

I decided to do a bit of investigation on the possible reduction of disk usage by moving away from the common tar.gz files to using FreeNAS native compression and ZFS snapshots. Furthermore I was interested to see if gzip could be used to improve disk usage without an unreasonable resource penalty. The bottom line is that I was able to reduce the amount of space used by almost 16% without a significant increase in resource usage!

I started out with a number of tar.gz backups from my web server. The content is very "texty": lots of PHP, config files, email directories, etc. Not much in the way of video files or other data that won't compress well.

After seeing the issues that people were having with gzip-9, I wondered if a "less aggressive" gzip level might yield good results with much less overhead. After a short search I found the following article:

Relative compression speed/time
https://catchchallenger.first-world...rk:_Gzip_vs_Bzip2_vs_LZMA_vs_XZ_vs_LZ4_vs_LZO

The TL;DR version:
  • gzip 7, 8, and 9 use an inordinate amount of resources for a minimal improvement in results.
  • The study shows the effects of various compression methods/levels on a 445M tar file (the Linux 3.3 kernel).
    (Test system: Intel Core i5-750 CPU @ 2.67GHz, 8GB DDR3 RAM, tmpfs as a RAM disk)
Key Results:
Code:
Method   Compressed Size     Percent   Time
gzip 5   102328357 (98M)     22.0%     14s
gzip 7   100128597 (96M)     21.5%     21s
gzip 9    99740486 (96M)     21.4%     33s

lz4      165844264 (159M)    35.6%     1.3s
Given that gzip has a lot of history, and that level 6 is the default chosen by the gzip authors (and what ZFS's plain compression=gzip uses), I would assume it is likely an "optimum choice", and therefore I chose gzip-6 for my tests.
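If you want to sanity-check that trade-off against your own data before committing to a level, a quick loop like this works (a sketch; sample.tar is a hypothetical stand-in for a representative file of your own):
Code:
#!/bin/sh
# Compare size and wall-clock time across a few gzip levels.
# "sample.tar" is a placeholder for one of your own files.
for lvl in 1 5 6 9
do
	/usr/bin/time gzip -c -${lvl} sample.tar > sample.${lvl}.gz
	ls -l sample.${lvl}.gz
done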

The files to be consolidated were a series of archives consuming a total of 194GB of space:
Code:
#>ls -h -la -D"-" /mnt/SCRATCH/server/*tar.gz
-rw-r--r--  1 root  backup   2.1G - /mnt/SCRATCH/server/backup_20150701_020843EDT.tar.gz
-rw-r--r--  1 root  backup   2.1G - /mnt/SCRATCH/server/backup_20150717_015015EDT.tar.gz
-rw-r--r--  1 root  backup   2.1G - /mnt/SCRATCH/server/backup_20150718_042735EDT.tar.gz
-rw-r--r--  1 root  backup   2.1G - /mnt/SCRATCH/server/backup_20150803_025824EDT.tar.gz
-rw-r--r--  1 root  backup   1.9G - /mnt/SCRATCH/server/backup_20150824_025647EDT.tar.gz
-rw-r--r--  1 root  backup   1.9G - /mnt/SCRATCH/server/backup_20150824_041728EDT.tar.gz
-rw-r--r--  1 root  backup   2.1G - /mnt/SCRATCH/server/backup_20150915_021148EDT.tar.gz
-rw-r--r--  1 root  backup   2.1G - /mnt/SCRATCH/server/backup_20150917_011601EDT.tar.gz
-rw-r--r--  1 root  backup   2.2G - /mnt/SCRATCH/server/backup_20151001_002204EDT.tar.gz
-rw-r--r--  1 root  backup   2.2G - /mnt/SCRATCH/server/backup_20151004_050332EDT.tar.gz
-rw-r--r--  1 root  backup   2.3G - /mnt/SCRATCH/server/backup_20151010_014052EDT.tar.gz
-rw-r--r--  1 root  backup   2.3G - /mnt/SCRATCH/server/backup_20151019_014804EDT.tar.gz
-rw-r--r--  1 root  backup   2.3G - /mnt/SCRATCH/server/backup_20151021_015640EDT.tar.gz
-rw-r--r--  1 root  backup   2.2G - /mnt/SCRATCH/server/backup_20151027_003628EDT.tar.gz
-rw-r--r--  1 root  backup   2.3G - /mnt/SCRATCH/server/backup_20151112_153705EST.tar.gz
-rw-r--r--  1 root  backup   2.2G - /mnt/SCRATCH/server/backup_20151129_042944EST.tar.gz
-rw-r--r--  1 root  backup   2.3G - /mnt/SCRATCH/server/backup_20151203_053039EST.tar.gz
-rw-r--r--  1 root  backup   2.2G - /mnt/SCRATCH/server/backup_20151205_061641EST.tar.gz
-rw-r--r--  1 root  backup   2.6G - /mnt/SCRATCH/server/backup_20151210_061210EST.tar.gz
-rw-r--r--  1 root  backup   2.8G - /mnt/SCRATCH/server/backup_20151211_080515EST.tar.gz
-rw-r--r--  1 root  backup   2.8G - /mnt/SCRATCH/server/backup_20151216_093211EST.tar.gz
-rw-r--r--  1 root  backup   2.7G - /mnt/SCRATCH/server/backup_20160105_051741EST.tar.gz
-rw-r--r--  1 root  backup   2.9G - /mnt/SCRATCH/server/backup_20160125_133459EST.tar.gz
-rw-r--r--  1 root  backup   2.9G - /mnt/SCRATCH/server/backup_20160126_042431EST.tar.gz
-rw-r--r--  1 root  backup   2.8G - /mnt/SCRATCH/server/backup_20160128_022857EST.tar.gz
-rw-r--r--  1 root  backup   2.8G - /mnt/SCRATCH/server/backup_20160204_063953EST.tar.gz
-rw-r--r--  1 root  backup   2.9G - /mnt/SCRATCH/server/backup_20160211_062954EST.tar.gz
-rw-r--r--  1 root  backup   2.8G - /mnt/SCRATCH/server/backup_20160216_020526EST.tar.gz
-rw-r--r--  1 root  backup   2.8G - /mnt/SCRATCH/server/backup_20160219_063138EST.tar.gz
-rw-r--r--  1 root  backup   2.9G - /mnt/SCRATCH/server/backup_20160302_040443EST.tar.gz
-rw-r--r--  1 root  backup   2.9G - /mnt/SCRATCH/server/backup_20160312_124850EST.tar.gz
-rw-r--r--  1 root  backup   2.9G - /mnt/SCRATCH/server/backup_20160319_152232EDT.tar.gz
-rw-r--r--  1 root  backup   2.8G - /mnt/SCRATCH/server/backup_20160322_033301EDT.tar.gz
-rw-r--r--  1 root  backup   3.1G - /mnt/SCRATCH/server/backup_20160403_173406EDT.tar.gz
-rw-r--r--  1 root  backup   3.0G - /mnt/SCRATCH/server/backup_20160414_034630EDT.tar.gz
-rw-r--r--  1 root  backup   3.0G - /mnt/SCRATCH/server/backup_20160422_014517EDT.tar.gz
-rw-r--r--  1 root  backup   3.2G - /mnt/SCRATCH/server/backup_20160506_212455EDT.tar.gz
-rw-r--r--  1 root  backup   2.6G - /mnt/SCRATCH/server/backup_20160509_125245EDT.tar.gz
-rw-r--r--  1 root  backup   2.5G - /mnt/SCRATCH/server/backup_20160516_034750EDT.tar.gz
-rw-r--r--  1 root  backup   2.7G - /mnt/SCRATCH/server/backup_20160517_052322EDT.tar.gz
-rw-r--r--  1 root  backup   2.8G - /mnt/SCRATCH/server/backup_20160602_014259EDT.tar.gz
-rw-r--r--  1 root  backup   1.7G - /mnt/SCRATCH/server/backup_20160608_035514EDT.tar.gz
-rw-r--r--  1 root  backup   1.8G - /mnt/SCRATCH/server/backup_20160613_041130EDT.tar.gz
-rw-r--r--  1 root  backup   1.8G - /mnt/SCRATCH/server/backup_20160702_013513EDT.tar.gz
-rw-r--r--  1 root  backup   1.6G - /mnt/SCRATCH/server/backup_20160706_145855EDT.tar.gz
-rw-r--r--  1 root  backup   1.8G - /mnt/SCRATCH/server/backup_20160716_014828EDT.tar.gz
-rw-r--r--  1 root  backup   1.3G - /mnt/SCRATCH/server/backup_20160819_044527EDT.tar.gz
-rw-r--r--  1 root  backup   2.4G - /mnt/SCRATCH/server/backup_20160819_051142EDT.tar.gz
-rw-r--r--  1 root  backup   2.4G - /mnt/SCRATCH/server/backup_20160826_025527EDT.tar.gz
-rw-r--r--  1 root  backup   2.5G - /mnt/SCRATCH/server/backup_20160831_112308EDT.tar.gz
-rw-r--r--  1 root  backup   2.4G - /mnt/SCRATCH/server/backup_20160907_025404EDT.tar.gz
-rw-r--r--  1 root  backup   2.6G - /mnt/SCRATCH/server/backup_20160912_191813EDT.tar.gz
-rw-r--r--  1 root  backup   2.4G - /mnt/SCRATCH/server/backup_20160923_112439EDT.tar.gz
-rw-r--r--  1 root  backup   2.5G - /mnt/SCRATCH/server/backup_20161102_031420EDT.tar.gz
-rw-r--r--  1 root  backup   2.5G - /mnt/SCRATCH/server/backup_20161105_031518EDT.tar.gz
-rw-r--r--  1 root  backup   2.6G - /mnt/SCRATCH/server/backup_20161119_002552EST.tar.gz
-rw-r--r--  1 root  backup   2.8G - /mnt/SCRATCH/server/backup_20161124_150813EST.tar.gz
-rw-r--r--  1 root  backup   2.7G - /mnt/SCRATCH/server/backup_20161126_001339EST.tar.gz
-rw-r--r--  1 root  backup   2.5G - /mnt/SCRATCH/server/backup_20161203_023437EST.tar.gz
-rw-r--r--  1 root  backup   2.5G - /mnt/SCRATCH/server/backup_20161206_043904EST.tar.gz
-rw-r--r--  1 root  backup   2.5G - /mnt/SCRATCH/server/backup_20161220_041650EST.tar.gz
-rw-r--r--  1 root  backup   2.5G - /mnt/SCRATCH/server/backup_20161222_131624EST.tar.gz
-rw-r--r--  1 root  backup   2.8G - /mnt/SCRATCH/server/backup_20170228_101149EST.tar.gz
-rw-r--r--  1 root  backup   2.9G - /mnt/SCRATCH/server/backup_20170303_234458EST.tar.gz
-rw-r--r--  1 root  backup   3.3G - /mnt/SCRATCH/server/backup_20170323_015542EDT.tar.gz
-rw-r--r--  1 root  backup   3.5G - /mnt/SCRATCH/server/backup_20170411_180401EDT.tar.gz
-rw-r--r--  1 root  backup   3.2G - /mnt/SCRATCH/server/backup_20170418_021028EDT.tar.gz
-rw-r--r--  1 root  backup   3.5G - /mnt/SCRATCH/server/backup_20170502_052248EDT.tar.gz
-rw-r--r--  1 root  backup   3.2G - /mnt/SCRATCH/server/backup_20170506_003952EDT.tar.gz
-rw-r--r--  1 root  backup   3.2G - /mnt/SCRATCH/server/backup_20170512_110522EDT.tar.gz
-rw-r--r--  1 root  backup   3.2G - /mnt/SCRATCH/server/backup_20170513_010705EDT.tar.gz
-rw-r--r--  1 root  backup   3.1G - /mnt/SCRATCH/server/backup_20170707_024354EDT.tar.gz
-rw-r--r--  1 root  backup   3.3G - /mnt/SCRATCH/server/backup_20170819_095909EDT.tar.gz
-rw-r--r--  1 root  backup   3.3G - /mnt/SCRATCH/server/backup_20170824_010133EDT.tar.gz
-rw-r--r--  1 root  backup   3.3G - /mnt/SCRATCH/server/backup_20171110_014414EST.tar.gz
	 194G total
203542006 total
I then used the following script to extract each tar in turn (keeping any files already on disk that were newer than the archived copies), taking a snapshot after each one to preserve that state:
Code:
# Extract each archive in turn, keeping newer files already on disk,
# then snapshot the dataset to preserve that state.
for t in /mnt/SCRATCH/server/*.tar.gz
do
   echo "$t"
   snapname=$(basename "$t" | cut -d'.' -f1)
   tar --keep-newer-files -xzvf "$t" -C /mnt/TANK/tar
   zfs snapshot "TANK/tar@$snapname"
done
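One detail worth calling out: the ZFS compression property only applies to data written after it is set, so it has to be in place on the target dataset before the extraction runs. A minimal sketch, using the dataset from the script above:
Code:
# For the lz4 run (the FreeNAS default):
zfs set compression=lz4 TANK/tar
# For the gzip-6 run:
zfs set compression=gzip-6 TANK/tar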
The following hardware was used:
FreeNAS-11.1-RELEASE
GA-Z77X-UD5H Rev 1.1, i7-3770 CPU @ 3.40GHz ~ 32GB (DDR3)
LSI 9211-8i 8-port PCI-E SATA HBA
RAIDZ2: WD60EFRX x 8 [30TB] (Result dataset)
RAIDZ HDS724040ALE640 x 2 (Archive files were stored here)

Results with LZ4
Code:
real	29m55.899s
user	11m58.223s
sys	 7m41.620s

#>zfs list  -o referenced,used,usedbydataset,usedbysnapshots,usedbychildren,logicalreferenced,logicalused,compressratio TANK/tar
REFER   USED  USEDDS  USEDSNAP  USEDCHILD  LREFER  LUSED  RATIO
14.9G  27.6G   14.9G	 12.7G		  0   16.9G  38.1G  1.76x

#>zfs list -t snap -r TANK/tar
NAME										USED  AVAIL  REFER  MOUNTPOINT
TANK/tar@server/backup_20150701_020843EDT   150M	  -  3.19G  -
TANK/tar@server/backup_20150717_015015EDT   121M	  -  3.46G  -
TANK/tar@server/backup_20150718_042735EDT   121M	  -  3.48G  -
TANK/tar@server/backup_20150803_025824EDT   126M	  -  3.63G  -
TANK/tar@server/backup_20150824_025647EDT  34.6M	  -  3.78G  -
TANK/tar@server/backup_20150824_041728EDT  34.2M	  -  3.78G  -
TANK/tar@server/backup_20150915_021148EDT   129M	  -  4.04G  -
TANK/tar@server/backup_20150917_011601EDT   130M	  -  4.12G  -
TANK/tar@server/backup_20151001_002204EDT   132M	  -  4.29G  -
TANK/tar@server/backup_20151004_050332EDT   132M	  -  4.30G  -
TANK/tar@server/backup_20151010_014052EDT   139M	  -  4.46G  -
TANK/tar@server/backup_20151019_014804EDT   137M	  -  4.58G  -
TANK/tar@server/backup_20151021_015640EDT   136M	  -  4.59G  -
TANK/tar@server/backup_20151027_003628EDT   142M	  -  4.63G  -
TANK/tar@server/backup_20151112_153705EST   141M	  -  4.76G  -
TANK/tar@server/backup_20151129_042944EST   132M	  -  4.83G  -
TANK/tar@server/backup_20151203_053039EST   127M	  -  4.94G  -
TANK/tar@server/backup_20151205_061641EST   128M	  -  4.97G  -
TANK/tar@server/backup_20151210_061210EST   147M	  -  5.77G  -
TANK/tar@server/backup_20151211_080515EST   134M	  -  6.28G  -
TANK/tar@server/backup_20151216_093211EST   142M	  -  6.31G  -
TANK/tar@server/backup_20160105_051741EST   146M	  -  6.39G  -
TANK/tar@server/backup_20160125_133459EST  50.6M	  -  6.70G  -
TANK/tar@server/backup_20160126_042431EST  44.9M	  -  6.71G  -
TANK/tar@server/backup_20160128_022857EST   128M	  -  6.71G  -
TANK/tar@server/backup_20160204_063953EST   138M	  -  6.76G  -
TANK/tar@server/backup_20160211_062954EST   131M	  -  7.05G  -
TANK/tar@server/backup_20160216_020526EST   129M	  -  7.06G  -
TANK/tar@server/backup_20160219_063138EST   141M	  -  7.10G  -
TANK/tar@server/backup_20160302_040443EST   135M	  -  7.19G  -
TANK/tar@server/backup_20160312_124850EST   133M	  -  7.22G  -
TANK/tar@server/backup_20160319_152232EDT   131M	  -  7.24G  -
TANK/tar@server/backup_20160322_033301EDT   198M	  -  7.31G  -
TANK/tar@server/backup_20160403_173406EDT   158M	  -  7.60G  -
TANK/tar@server/backup_20160414_034630EDT   177M	  -  7.69G  -
TANK/tar@server/backup_20160422_014517EDT   178M	  -  7.72G  -
TANK/tar@server/backup_20160506_212455EDT   143M	  -  8.00G  -
TANK/tar@server/backup_20160509_125245EDT   142M	  -  8.17G  -
TANK/tar@server/backup_20160516_034750EDT   141M	  -  8.23G  -
TANK/tar@server/backup_20160517_052322EDT   145M	  -  8.43G  -
TANK/tar@server/backup_20160602_014259EDT   150M	  -  8.64G  -
TANK/tar@server/backup_20160608_035514EDT   147M	  -  8.66G  -
TANK/tar@server/backup_20160613_041130EDT   147M	  -  8.69G  -
TANK/tar@server/backup_20160702_013513EDT   147M	  -  8.75G  -
TANK/tar@server/backup_20160706_145855EDT   152M	  -  8.80G  -
TANK/tar@server/backup_20160716_014828EDT  27.0M	  -  9.03G  -
TANK/tar@server/backup_20160819_044527EDT  26.7M	  -  9.03G  -
TANK/tar@server/backup_20160819_051142EDT   175M	  -  9.89G  -
TANK/tar@server/backup_20160826_025527EDT   167M	  -  9.92G  -
TANK/tar@server/backup_20160831_112308EDT   165M	  -  10.0G  -
TANK/tar@server/backup_20160907_025404EDT   169M	  -  10.1G  -
TANK/tar@server/backup_20160912_191813EDT   161M	  -  10.2G  -
TANK/tar@server/backup_20160923_112439EDT   203M	  -  10.5G  -
TANK/tar@server/backup_20161102_031420EDT   164M	  -  10.7G  -
TANK/tar@server/backup_20161105_031518EDT   155M	  -  10.7G  -
TANK/tar@server/backup_20161119_002552EST   182M	  -  11.1G  -
TANK/tar@server/backup_20161124_150813EST   163M	  -  11.3G  -
TANK/tar@server/backup_20161126_001339EST   161M	  -  11.3G  -
TANK/tar@server/backup_20161203_023437EST   160M	  -  11.4G  -
TANK/tar@server/backup_20161206_043904EST   161M	  -  11.4G  -
TANK/tar@server/backup_20161220_041650EST   163M	  -  11.5G  -
TANK/tar@server/backup_20161222_131624EST   167M	  -  11.5G  -
TANK/tar@server/backup_20170228_101149EST   176M	  -  12.0G  -
TANK/tar@server/backup_20170303_234458EST   176M	  -  12.0G  -
TANK/tar@server/backup_20170323_015542EDT   226M	  -  12.6G  -
TANK/tar@server/backup_20170411_180401EDT   175M	  -  12.8G  -
TANK/tar@server/backup_20170418_021028EDT   172M	  -  12.8G  -
TANK/tar@server/backup_20170502_052248EDT   173M	  -  13.2G  -
TANK/tar@server/backup_20170506_003952EDT   172M	  -  13.2G  -
TANK/tar@server/backup_20170512_110522EDT   159M	  -  13.3G  -
TANK/tar@server/backup_20170513_010705EDT   157M	  -  13.3G  -
TANK/tar@server/backup_20170707_024354EDT   217M	  -  13.8G  -
TANK/tar@server/backup_20170819_095909EDT   188M	  -  14.2G  -
TANK/tar@server/backup_20170824_010133EDT   169M	  -  14.2G  -
TANK/tar@server/backup_20171110_014414EST	  0	  -  14.9G  -
=====================================================================

Results with gzip 6
Code:
real	30m57.370s
user	12m31.919s
sys	 7m59.718s

#>zfs list  -o referenced,used,usedbydataset,usedbysnapshots,usedbychildren,logicalreferenced,logicalused,compressratio TANK/tar
REFER   USED  USEDDS  USEDSNAP  USEDCHILD  LREFER  LUSED  RATIO
13.4G  23.8G   13.4G	 10.4G		  0   16.9G  38.1G  2.14x

#>zfs list -t all -r TANK/tar
NAME										USED  AVAIL  REFER  MOUNTPOINT
TANK/tar								   23.8G  21.5T  13.4G  /mnt/TANK/tar
TANK/tar@server/backup_20150701_020843EDT   117M	  -  2.71G  -
TANK/tar@server/backup_20150717_015015EDT  96.8M	  -  2.92G  -
TANK/tar@server/backup_20150718_042735EDT  96.8M	  -  2.95G  -
TANK/tar@server/backup_20150803_025824EDT   101M	  -  3.07G  -
TANK/tar@server/backup_20150824_025647EDT  33.1M	  -  3.20G  -
TANK/tar@server/backup_20150824_041728EDT  32.8M	  -  3.20G  -
TANK/tar@server/backup_20150915_021148EDT   103M	  -  3.43G  -
TANK/tar@server/backup_20150917_011601EDT   105M	  -  3.51G  -
TANK/tar@server/backup_20151001_002204EDT   107M	  -  3.65G  -
TANK/tar@server/backup_20151004_050332EDT   106M	  -  3.66G  -
TANK/tar@server/backup_20151010_014052EDT   113M	  -  3.81G  -
TANK/tar@server/backup_20151019_014804EDT   110M	  -  3.92G  -
TANK/tar@server/backup_20151021_015640EDT   110M	  -  3.93G  -
TANK/tar@server/backup_20151027_003628EDT   115M	  -  3.96G  -
TANK/tar@server/backup_20151112_153705EST   115M	  -  4.08G  -
TANK/tar@server/backup_20151129_042944EST   105M	  -  4.14G  -
TANK/tar@server/backup_20151203_053039EST   101M	  -  4.24G  -
TANK/tar@server/backup_20151205_061641EST   101M	  -  4.27G  -
TANK/tar@server/backup_20151210_061210EST   118M	  -  5.04G  -
TANK/tar@server/backup_20151211_080515EST   107M	  -  5.54G  -
TANK/tar@server/backup_20151216_093211EST   115M	  -  5.58G  -
TANK/tar@server/backup_20160105_051741EST   119M	  -  5.65G  -
TANK/tar@server/backup_20160125_133459EST  48.3M	  -  5.94G  -
TANK/tar@server/backup_20160126_042431EST  42.6M	  -  5.95G  -
TANK/tar@server/backup_20160128_022857EST   103M	  -  5.95G  -
TANK/tar@server/backup_20160204_063953EST   112M	  -  6.00G  -
TANK/tar@server/backup_20160211_062954EST   105M	  -  6.26G  -
TANK/tar@server/backup_20160216_020526EST   104M	  -  6.27G  -
TANK/tar@server/backup_20160219_063138EST   114M	  -  6.31G  -
TANK/tar@server/backup_20160302_040443EST   108M	  -  6.39G  -
TANK/tar@server/backup_20160312_124850EST   108M	  -  6.42G  -
TANK/tar@server/backup_20160319_152232EDT   105M	  -  6.44G  -
TANK/tar@server/backup_20160322_033301EDT   163M	  -  6.50G  -
TANK/tar@server/backup_20160403_173406EDT   129M	  -  6.78G  -
TANK/tar@server/backup_20160414_034630EDT   140M	  -  6.86G  -
TANK/tar@server/backup_20160422_014517EDT   141M	  -  6.88G  -
TANK/tar@server/backup_20160506_212455EDT   116M	  -  7.14G  -
TANK/tar@server/backup_20160509_125245EDT   115M	  -  7.29G  -
TANK/tar@server/backup_20160516_034750EDT   114M	  -  7.34G  -
TANK/tar@server/backup_20160517_052322EDT   118M	  -  7.54G  -
TANK/tar@server/backup_20160602_014259EDT   123M	  -  7.72G  -
TANK/tar@server/backup_20160608_035514EDT   120M	  -  7.74G  -
TANK/tar@server/backup_20160613_041130EDT   120M	  -  7.76G  -
TANK/tar@server/backup_20160702_013513EDT   119M	  -  7.82G  -
TANK/tar@server/backup_20160706_145855EDT   124M	  -  7.86G  -
TANK/tar@server/backup_20160716_014828EDT  27.0M	  -  8.09G  -
TANK/tar@server/backup_20160819_044527EDT  26.7M	  -  8.09G  -
TANK/tar@server/backup_20160819_051142EDT   145M	  -  8.89G  -
TANK/tar@server/backup_20160826_025527EDT   137M	  -  8.92G  -
TANK/tar@server/backup_20160831_112308EDT   136M	  -  9.02G  -
TANK/tar@server/backup_20160907_025404EDT   140M	  -  9.07G  -
TANK/tar@server/backup_20160912_191813EDT   132M	  -  9.22G  -
TANK/tar@server/backup_20160923_112439EDT   163M	  -  9.43G  -
TANK/tar@server/backup_20161102_031420EDT   134M	  -  9.58G  -
TANK/tar@server/backup_20161105_031518EDT   126M	  -  9.59G  -
TANK/tar@server/backup_20161119_002552EST   151M	  -  9.97G  -
TANK/tar@server/backup_20161124_150813EST   133M	  -  10.1G  -
TANK/tar@server/backup_20161126_001339EST   132M	  -  10.2G  -
TANK/tar@server/backup_20161203_023437EST   131M	  -  10.3G  -
TANK/tar@server/backup_20161206_043904EST   131M	  -  10.3G  -
TANK/tar@server/backup_20161220_041650EST   134M	  -  10.4G  -
TANK/tar@server/backup_20161222_131624EST   137M	  -  10.4G  -
TANK/tar@server/backup_20170228_101149EST   144M	  -  10.8G  -
TANK/tar@server/backup_20170303_234458EST   145M	  -  10.8G  -
TANK/tar@server/backup_20170323_015542EDT   185M	  -  11.3G  -
TANK/tar@server/backup_20170411_180401EDT   143M	  -  11.5G  -
TANK/tar@server/backup_20170418_021028EDT   141M	  -  11.5G  -
TANK/tar@server/backup_20170502_052248EDT   141M	  -  11.9G  -
TANK/tar@server/backup_20170506_003952EDT   140M	  -  11.9G  -
TANK/tar@server/backup_20170512_110522EDT   130M	  -  12.0G  -
TANK/tar@server/backup_20170513_010705EDT   128M	  -  12.0G  -
TANK/tar@server/backup_20170707_024354EDT   176M	  -  12.4G  -
TANK/tar@server/backup_20170819_095909EDT   156M	  -  12.7G  -
TANK/tar@server/backup_20170824_010133EDT   137M	  -  12.8G  -
TANK/tar@server/backup_20171110_014414EST	  0	  -  13.4G  -
=====================================================================

Unless I'm missing something, ZFS really changes the paradigm for backups when there are only relatively small deltas between backups. 194GB of tar.gz archives reduces to 23.8GB with gzip-6 (or 27.6GB with the default lz4).

Snapshots are a lot easier to selectively recover files from, and they also take a lot less space. Gzip also significantly improves space usage when you know that your data is highly compressible (i.e. text). Clearly, if the data were media files or other incompressible data, gzip wouldn't be a good choice, but creating datasets optimized to the data they hold can save a lot of space!
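For example, pulling a single file out of an old state is just a copy from the hidden .zfs directory. A sketch (the snapshot and file names here are made up for illustration):
Code:
# Every snapshot is browsable read-only under .zfs/snapshot
ls /mnt/TANK/tar/.zfs/snapshot/
# Restore one file from a hypothetical snapshot
cp /mnt/TANK/tar/.zfs/snapshot/backup_20160414_034630EDT/www/config.php /tmp/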

As a noob I sure wish I had known this sooner! Comments?
 

bigphil

Patron
Joined
Jan 30, 2014
Messages
486
Gzip vs lz4 has been beaten to death, but you know what would be cool: if FreeNAS supported the other lz4 compressor, lz4hc. It would be a lot faster than gzip with similar or better space savings. To my knowledge it isn't available in FreeNAS, but it would be cool to use it and be able to set the compression level to a user-configurable value (0-12).
 

diskdiddler

Wizard
Joined
Jul 9, 2014
Messages
2,377
The problem is the type of data you're writing. I split my server up with two very media-heavy datasets (no compression) and a data-based one with lz4. However, my server has a truly weeny CPU (FreeNAS is amazing; it's still chugging along on it), so I'd be very reluctant to move to a higher-compression setting until I upgrade my server.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
A 194GB of tar.gz archives reduces down to 23.8GB with gzip-6 (or 27.6GB with the default lz4).
This really doesn't make sense. A .tar.gz file is (or should be) already gzip'ed (that's what the .gz stands for). Neither lz4 nor gzip (at any level) should result in significant space savings on this data--unless you actually have just tar files that aren't gzip'ed (and are therefore misnamed).

Edit: But it's easy to figure this out: What's the output of file /mnt/SCRATCH/server/backup_20160414_034630EDT.tar.gz?
 
Joined
Jul 3, 2015
Messages
926
ZSTD will apparently give us gzip compression ratios at lz4 performance. Coming soon...
 

bigphil

Patron
Joined
Jan 30, 2014
Messages
486
Notes that Zstandard (http://facebook.github.io/zstd/) is on the Roadmap for this year.
That's great news! What I'd be interested in is whether there are plans for any others. I've read about "LZ4 fast 8", and it seems to offer a compression ratio near LZ4's default (1.8 vs 2.1) while increasing compression throughput by a third! The latest version of lz4 (1.8.1.2) also offers dictionary compression like Zstandard.
 

NASbox

Guru
Joined
May 8, 2012
Messages
650
Thanks all for the comments... I'll add:
The problem is the type of data you're writing. I split my server up with 2 very very media heavy datasets (no compression) and a data based one with lz4, however my server has a truly weeny CPU (FreeNAS is amazing it's still chugging on it) so I'd be very reluctant to move to a higher compression set until I upgrade my server.
I don't know what hardware you're running or what data you have in your pool, but the point was that for my setup, with my data, I got significant savings WITHOUT a lot of extra overhead. If you've got the space on your pool, create a dataset with gzip-6 and copy a bunch of stuff over to test it.
Notes that Zstandard (http://facebook.github.io/zstd/) is on the Roadmap for this year.
Very exciting; thanks (& @Johnny Fartpants, @bigphil) for mentioning that. I had not heard of Zstandard or "LZ4 fast 8" before.
This really doesn't make sense. A .tar.gz file is (or should be) already gzip'ed (that's what the .gz stands for). Neither lz4 nor gzip (at any level) should result in significant space savings on this data--unless you actually have just tar files that aren't gzip'ed (and are therefore misnamed).

Edit: But it's easy to figure this out: What's the output of file /mnt/SCRATCH/server/backup_20160414_034630EDT.tar.gz?
Read the article again... you missed the point. (I mentioned that I untarred/ungzipped each file one by one and created a snapshot in between each file.)

I was able to take 194GB of gzip files and archive the same data using only 23.8GB of disk space by using the native file system with gzip compression and snapshots rather than tar.gz archive files!

This is a totally different paradigm because ZFS is a very different file system from the common file systems most people have been using like EXT3, EXT4, (ex)FAT(32), NTFS etc. Since most of the data is the same, it's only necessary to store the delta, and then compress it. The result is much easier to work with and much, much smaller!

---
Final thoughts/points to ponder

The memory footprint appears to be much lighter for gzip than any of the others. I don't know how significant that is for the average user.

I also don't know what would happen if the data were all different each day (such as temperatures, power flows, etc.), for example if you were archiving huge data files out of some industrial process where the data was compressible text but unique. Savings would have to come at the block level rather than the file level, since there wouldn't be any identical files. (Multiple WordPress installations are more or less identical except for a few unique config files.)

Depending on the nature of the data, savings might be possible at the block level rather than the file level. For example, consider a set of temperatures taken every second where the temperatures remain constant: the only thing that changes is the timestamp. Is this a case where deduplication would do better than compression?
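One way to answer that without the considerable RAM cost of actually enabling dedup is ZFS's built-in simulation. A sketch against the pool used above:
Code:
# Read-only: simulates deduplication on the pool and prints a DDT
# histogram with the projected dedup ratio; changes no settings.
zdb -S TANK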

I wish I had thought of this sooner (or seen it in this forum). I was rsyncing my remote server to create a local copy, and then keeping gzips just in case the rsync corrupted my local copy (either connectivity issues or corrupted data at the source). I rewrote my backup to make a named snapshot instead of using tar. It runs a lot faster, saves a lot of space, and gives me the ability to easily find/restore a file from any of the snapshots. Faster, easier, better, cheaper... it doesn't get much better than that!
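For anyone curious, the rotation reduces to something like this minimal sketch (the remote host and paths are hypothetical, not my exact script):
Code:
#!/bin/sh
# Pull the remote server into the dataset, then freeze that state
# with a dated snapshot instead of writing another tar.gz.
rsync -a --delete webserver:/var/www/ /mnt/TANK/tar/www/
zfs snapshot TANK/tar@backup_$(date +%Y%m%d_%H%M%S)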

Thanks again for the comments/info.
 

diskdiddler

Wizard
Joined
Jul 9, 2014
Messages
2,377

toadman

Guru
Joined
Jun 4, 2013
Messages
619
How would we switch an existing dataset to this? Or just make a new one and move data across?

You can change compression (for new data written) on a dataset with zfs set compression=gzip-6 pool/dataset (or gzip-N for gzip level N).
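For example, using the dataset from the tests above, and then confirming the property took effect:
Code:
zfs set compression=gzip-6 TANK/tar
zfs get compression,compressratio TANK/tar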
 

NASbox

Guru
Joined
May 8, 2012
Messages
650
You can change compression (for new data written) on a dataset with zfs set compression=gzip-6 pool/dataset. (or gzip-x for gzip level x)
What is the best way to update existing data to the new compression scheme?
 

toadman

Guru
Joined
Jun 4, 2013
Messages
619
What is the best way to update existing data to the new compression scheme?

I think you have to copy/move it to a dataset with the desired compression setting (which includes the existing dataset, assuming you changed its compression to the desired level), i.e. you could copy/move it to a new folder inside the existing dataset (with the compression level changed) or to a new dataset with the desired compression level. It just needs to be read and rewritten.
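A sketch of the in-place variant (hypothetical paths; note this needs enough free space for a second copy, and the old blocks stay referenced by any existing snapshots):
Code:
# After changing the property on the dataset:
zfs set compression=gzip-6 TANK/data
# Rewrite the files so the new setting applies to their blocks
cp -a /mnt/TANK/data/stuff /mnt/TANK/data/stuff.new
rm -r /mnt/TANK/data/stuff
mv /mnt/TANK/data/stuff.new /mnt/TANK/data/stuff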
 