Hi All
TL;DR
I'm hoping someone can tell me whether I've found a bug or am doing something wrong. I have two directory trees with almost exactly the same data, and I don't seem to be getting any space savings from deduplication. Only 2 of the roughly 4,700 files in each directory tree differ. All other files have exactly the same content and directory stat info (owner/group/size/date/time).
EDIT: Test was run with gzip compression and again with no compression.
BACKGROUND/TEST METHOD
I am trying to set up a system for backing up multiple WordPress sites from a public web host. Since the bulk of the data is WordPress code, which is very compressible, and the majority of the content is common across sites, I would expect deduplication to save considerable space, but it doesn't appear to be working.
Tests run on BUILD: FreeNAS-11.1-RELEASE
I created a dataset TANK/SITEBACKUPS which has dedup enabled and uses gzip compression (gzip does a much better job than the default lz4 compression, and in my application the extra execution time is not a problem; saving disk space is the priority).
Code:
TANK/SITEBACKUPS  compression  gzip  local
TANK/SITEBACKUPS  dedup        on    local
I have a script that backs up the WP site and the database. To test deduplication under ideal conditions, I made the script back up the same site/database twice, to:
/SITEBACKUPS/DUMMY
/SITEBACKUPS/DUMMY2
Here is the disk usage after both backups have been run:
du -h -d 1 .
238M ./DUMMY2
239M ./DUMMY
477M .
The two copies are identical except for two files. The four differing files (two per copy) total about 1500KB. The differences are:
#>diff -rq DUMMY DUMMY2 | less
Files DUMMY/.BACKUPDATA/sqlbackup.sql and DUMMY2/.BACKUPDATA/sqlbackup.sql differ
Files DUMMY/.BACKUPDATA/sync_copylog.txt and DUMMY2/.BACKUPDATA/sync_copylog.txt differ
The only difference in sqlbackup.sql is a single line with the backup time. The rsync log file sync_copylog.txt is about 560K and contains many differences because ...
To determine data usage, I used a script which relies on the following zfs list command:
zfs list -t all -o name,used,avail,refer,creation,usedds,usedsnap,compression,compressratio,refcompressratio,lused -r "$DATASET"
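One thing the listing above does not show is any pool-level figure. As far as I understand ZFS accounting, deduplication savings are tracked per pool rather than per dataset, so they may not appear in `zfs list` or `du` at all. A minimal sketch for checking the pool instead, assuming the pool is named TANK (taken from the dataset path); `show_pool_dedup` is a made-up helper name:

```shell
#!/bin/sh
# Sketch: print pool-level allocation and dedup ratio.
# POOL defaults to TANK, an assumption based on the dataset name TANK/SITEBACKUPS.
POOL="${POOL:-TANK}"

show_pool_dedup() {
    pool="$1"
    if ! command -v zpool >/dev/null 2>&1; then
        echo "zpool not found; run this on the FreeNAS host" >&2
        return 1
    fi
    # allocated = space actually allocated in the pool; dedupratio = pool-wide ratio.
    zpool list -o name,size,allocated,free,dedupratio "$pool"
    # The same ratio, queried as a single pool property.
    zpool get dedupratio "$pool"
}

# Do not abort the calling script if the pool or the zpool command is missing.
show_pool_dedup "$POOL" || true
```

If dedup is working, I would expect the ratio to move well above 1.00x after the second backup, and the allocated figure to grow far less than the dataset's USED column does.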
TEST RESULTS
Empty dataset before backup to ./DUMMY and ./DUMMY2 :
Code:
Initial Empty Dataset
-------------------------------------------------------------------------------------------------------------------
Recent Snapshots:
NAME              USED  AVAIL  REFER  CREATION             USEDDS  USEDSNAP  COMPRESS  RATIO  REFRATIO  LUSED
TANK/SITEBACKUPS  205K  14.5T  205K   Mon Apr 9 4:13 2018  205K    0         gzip      1.00x  1.00x     40.5K
-------------------------------------------------------------------------------------------------------------------
After first backup to ./DUMMY:
Code:
After backup to DUMMY
-------------------------------------------------------------------------------------------------------------------
Recent Snapshots:
NAME              USED  AVAIL  REFER  CREATION             USEDDS  USEDSNAP  COMPRESS  RATIO  REFRATIO  LUSED
TANK/SITEBACKUPS  239M  14.5T  239M   Mon Apr 9 4:13 2018  239M    0         gzip      1.25x  1.25x     259M
-------------------------------------------------------------------------------------------------------------------
After second backup, to ./DUMMY2:
Code:
After second backup of same site to DUMMY2
-------------------------------------------------------------------------------------------------------------------
Recent Snapshots:
NAME              USED  AVAIL  REFER  CREATION             USEDDS  USEDSNAP  COMPRESS  RATIO  REFRATIO  LUSED
TANK/SITEBACKUPS  478M  14.5T  478M   Mon Apr 9 4:13 2018  478M    0         gzip      1.25x  1.25x     517M
-------------------------------------------------------------------------------------------------------------------
Two copies appear to take double the space, with no savings.
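To put a number on "no savings", here is a rough back-of-the-envelope sketch using the USED figures from the listings above. It assumes that dedup savings would be visible in the dataset's USED column at all, and that the differing files in the second copy take about 1M (the four differing files total roughly 1500KB across both copies); both are assumptions, not measurements.

```shell
#!/bin/sh
# USED values reported by zfs list, in MiB (rounded as shown in the listings).
used_one_copy=239
used_two_copies=478

# Assumption: data unique to the second copy is about 1 MiB
# (half of the ~1500KB total for the four differing files, rounded up).
unique_second_copy=1

# With ideal dedup, the second copy should add only its unique data.
expected=$((used_one_copy + unique_second_copy))
excess=$((used_two_copies - expected))

echo "expected with ideal dedup: ${expected}M"
echo "reported by zfs list:      ${used_two_copies}M"
echo "unexplained difference:    ${excess}M"
```

On these numbers the dataset reports about 238M more than an ideally deduplicated second copy would need, which is essentially the full size of one copy.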
EDIT:
Just in case deduplication and compression are mutually exclusive, I used the GUI to turn compression off, deleted DUMMY and DUMMY2 with rm -rf, and then recreated them with mkdir.
New Dataset Properties
Code:
TANK/SITEBACKUPS  compression  off  local
TANK/SITEBACKUPS  dedup        on   local
The dataset was edited from the FreeNAS GUI, and then DUMMY and DUMMY2 were deleted and recreated with rm -rf/mkdir.
Empty Dataset before running backup
Code:
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Recent Snapshots:
NAME              USED  AVAIL  REFER  CREATION             USEDDS  USEDSNAP  COMPRESS  RATIO  REFRATIO  LUSED
TANK/SITEBACKUPS  222K  14.5T  222K   Mon Apr 9 4:13 2018  222K    0         off       1.00x  1.00x     44.5K
----------------------------------------------------------------------------------------------------------------------------------------------------------------
#>du -h
512B  ./DUMMY2
512B  ./DUMMY
1.5K  .
After First Backup Run to DUMMY2
Code:
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Recent Snapshots:
NAME              USED  AVAIL  REFER  CREATION             USEDDS  USEDSNAP  COMPRESS  RATIO  REFRATIO  LUSED
TANK/SITEBACKUPS  293M  14.5T  293M   Mon Apr 9 4:13 2018  293M    0         off       1.00x  1.00x     259M
----------------------------------------------------------------------------------------------------------------------------------------------------------------
#>du -h -d1
292M  ./DUMMY2
512B  ./DUMMY
292M  .
After Second Backup Run to DUMMY
Code:
----------------------------------------------------------------------------------------------------------------------------------------------------------------
Recent Snapshots:
NAME              USED  AVAIL  REFER  CREATION             USEDDS  USEDSNAP  COMPRESS  RATIO  REFRATIO  LUSED
TANK/SITEBACKUPS  586M  14.5T  586M   Mon Apr 9 4:13 2018  586M    0         off       1.00x  1.00x     517M
----------------------------------------------------------------------------------------------------------------------------------------------------------------
#>du -h -d1
292M  ./DUMMY2
292M  ./DUMMY
585M  .
Directory Differences (Same as first run - almost identical)
Code:
#>diff -rq DUMMY DUMMY2 | less
Files DUMMY/.BACKUPDATA/sqlbackup.sql and DUMMY2/.BACKUPDATA/sqlbackup.sql differ
Files DUMMY/.BACKUPDATA/sync_copylog.txt and DUMMY2/.BACKUPDATA/sync_copylog.txt differ
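For repeated test runs, the comparison above could be reduced to a single count instead of paging through the output. A minimal sketch; `count_differing` is a hypothetical helper name, not part of my backup script:

```shell
#!/bin/sh
# count_differing DIR_A DIR_B
# Prints the number of lines `diff -rq` reports, i.e. files that differ
# plus files that exist in only one of the two trees.
count_differing() {
    # diff exits non-zero when the trees differ, so only its output is used.
    # tr strips the leading spaces that BSD wc adds to its count.
    diff -rq "$1" "$2" 2>/dev/null | wc -l | tr -d ' '
}
```

Run from inside the dataset, `count_differing DUMMY DUMMY2` should print 2 for the trees shown above.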