Advice request: re-packing mixed backup data (folders & archives)

GolDenis

Dabbler
Joined
Oct 25, 2019
Messages
38
Hi guys.
I have finally, nearly, finished moving from my old Synology to a homebuilt FreeNAS.
The question is:
I have about 3-4 TB of old backups of an accounting database (for the most part a huge number of small .dbf files, somewhere over 600,000 files in total).
It happened that the backups were made by two or three different backup systems, so the result is a mix of uncompressed folders and the same folders compressed with 7-Zip/RAR/ZIP.

How would you recommend re-packing all this data with maximum compression, replacing identical files with hardlinks?
The obvious solution, I think, is to decompress all the folders to get one fully uncompressed directory tree, then pack it again with something that can substitute hardlinks for identical files (by the way, what tool would that be?). But that means hours of routine work. Are there smarter solutions?
Is there a FreeNAS-compatible compressor that can re-pack folders/archives and substitute identical files with hardlinks in one go?
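I don't know of a one-shot tool for this, but the hardlink-substitution half can be scripted. A minimal sketch under stated assumptions (the function name `dedupe_hardlinks` is mine, not an existing utility; it assumes GNU `md5sum` output, so on FreeBSD/FreeNAS you would adapt it to `md5 -r`, whose output layout differs slightly; paths containing newlines are not handled):

```shell
# Sketch, not a polished tool: replace files with identical content
# under a directory with hardlinks to a single copy.
dedupe_hardlinks() {
    find "${1:-.}" -type f -exec md5sum {} + | sort |
    awk '{
        file = substr($0, 35)              # md5sum: 32-char hash + 2 spaces, then path
        if ($1 == prev_hash)
            print prev_file "\t" file      # duplicate: emit keep<TAB>dup
        else {
            prev_hash = $1
            prev_file = file               # first file of a new hash group
        }
    }' |
    while IFS="$(printf '\t')" read -r keep dup; do
        ln -f -- "$keep" "$dup"            # same content -> same inode
    done
}
```

Run it against a copy of the data first; after a pass, `du` should drop while the directory tree looks unchanged.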
 

GolDenis

Dabbler
Joined
Oct 25, 2019
Messages
38
Did you decide upon a solution for this?

I've been thinking about it a lot, but there's no final decision yet.
I've decided to leave it as it is for a while, because I found another problem: heat.
I have a small case (Fractal Design Node 804) for that many HDDs, so I need a serious upgrade of the cooling system, especially if I decide to re-pack my 5 TB of backups; otherwise I have a chance of frying my HDDs.
But if anything happens, I'll let you know. At a minimum, I'd appreciate any criticism/advice on the matter.
 

GolDenis

Dabbler
Joined
Oct 25, 2019
Messages
38
So far, I've stopped at this solution:
advzip
I will dig in this direction. It looks like it can re-pack .zip files and unzipped folders. I need to test it after the cooling upgrade.
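For what it's worth, as far as I know advzip (from the AdvanceCOMP suite) only re-encodes the deflate streams inside existing .zip files; it does not build archives from plain folders, so it may only cover half of the mix. A typical invocation would be something like this (untested sketch):

```shell
# Hedged sketch: recompress every .zip under the current directory in place.
# advzip -z re-encodes the existing deflate data; -4 selects the slowest,
# strongest (zopfli) setting.
find . -type f -name '*.zip' -exec advzip -z -4 {} +
```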
 

Jessep

Patron
Joined
Aug 19, 2018
Messages
379
Why not use the built-in compression options? Set GZIP9 before you copy all the legacy data, then change to LZ4 for the least performance impact. All data previously compressed with GZIP will stay that way.

9.2.10.2. Compression
When selecting a compression type, balancing performance with the amount of disk space saved by compression is recommended. Compression is transparent to the client and applications as ZFS automatically compresses data as it is written to a compressed dataset or zvol and automatically decompresses that data as it is read. These compression algorithms are supported:

  • LZ4: default and recommended compression method as it allows compressed datasets to operate at near real-time speed. This algorithm only compresses files that will benefit from compression.
  • GZIP: levels 1, 6, and 9 where gzip fastest (level 1) gives the least compression and gzip maximum (level 9) provides the best compression but is discouraged due to its performance impact.
  • ZLE: fast but simple algorithm which eliminates runs of zeroes.
If OFF is selected as the Compression level when creating a dataset or zvol, compression will not be used on that dataset/zvol. This is not recommended as using LZ4 has a negligible performance impact and allows for more storage capacity.
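Jessep's suggestion as commands, a sketch assuming a pool named `tank` with a dataset `backups` (both names are placeholders):

```shell
# Assumed names: pool "tank", dataset "backups".
zfs set compression=gzip-9 tank/backups    # heavy compression for the bulk copy
# ...copy the legacy data in...
zfs set compression=lz4 tank/backups       # lighter algorithm for future writes
zfs get compressratio tank/backups         # see how much was actually saved
```

The compression property only affects newly written blocks, which is why the order (set gzip-9, copy, then switch to lz4) matters.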
 

GolDenis

Dabbler
Joined
Oct 25, 2019
Messages
38
My dear colleagues...
Give me a few weeks (I'm in Thailand, so I have to order almost everything from abroad). I'll tune my cooling fans and run a few experiments with real data.
I'll be glad to share the results so we can discuss them.
The problem I see now is that I'm not sure the compressors will understand that we have many identical files across the different zipped and unzipped folders.
To the file system it just looks like a mix of folders with some plain files and some zip files, and I'm sure (for now) that ZFS will compress the unzipped folders and store the already-zipped ones without further gain.
My idea is to find a way to "feed" a compressor (ZFS or an external one) the whole folder array as if it were unzipped (without manual unzipping, of course).
I hope the compressor would then find the identical files in all folders and re-pack them with the best result.
That's my goal.
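If the manual part is only the unzipping, that at least can be scripted. A minimal sketch (the `unpack_all` name is mine; it uses Python's built-in `zipfile` command-line interface so no extra archiver is needed, and for .7z/.rar you would substitute something like `7z x -y -o"$dest" "$archive"`):

```shell
# Sketch: extract every .zip under a directory into a folder of the same
# name, so everything ends up as plain files that a dedup step (or ZFS)
# can see.
unpack_all() {
    find "${1:-.}" -type f -name '*.zip' | while read -r archive; do
        dest="${archive%.zip}"              # "backup.zip" -> "backup"
        mkdir -p "$dest"
        python3 -m zipfile -e "$archive" "$dest"
        # rm -- "$archive"                  # only after verifying the extraction
    done
}
```

The `rm` is commented out on purpose; delete the originals only once you trust the extracted copies.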
 

GolDenis

Dabbler
Joined
Oct 25, 2019
Messages
38
As I promised: while my fan order is still stuck somewhere outside Thailand, I tuned the cooling system myself and got a chance to run a first re-compression test.

A reminder: I'm trying to find the best way to re-pack a mix of zipped and unzipped files/folders into a single archive, with maximum compression and without manual routine.
The folders have different structures but share a lot of common files, because they are backups of the same PC.
So:
Take a 7z archive and an unzipped folder.
That folder is also included inside the 7z archive.
Try to re-pack both with 7-Zip and see what I get.
[screenshot: 2019-11-28_002751.jpg]

We will talk about the folder "Galaxy 2016-08-06 23;00;11 (Full)", 7.9 GB,
and the archive "Alenks wrk 2019-01-30 16;24;45 (Full).7z", 13.2 GB.
The archive includes that folder as:
[screenshot: 2019-11-28_002108.jpg]

The sizes differ because two years passed between the two snapshots, but believe me, there are many identical files between the old unzipped folder and the 7z archive:
[screenshot: 2019-11-28_005947.jpg]

The sub-folder and the archive have 11,480 files in common.
Great! Let's try to re-pack.
1st try: simply re-pack the unzipped folder and the archive together with 7-Zip -> 18,196,311,225 bytes.
Nonsense! It looks like 7-Zip doesn't take the common files into account.
2nd try: extract the archive into a normal (unzipped) folder first, then zip the two unzipped folders together -> 17,268,956,135 bytes.
Not a big difference!

[screenshot: 1574882341710.png]
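A possible explanation for why the second try barely helped (my reading, untested): the .7z format is solid by default, but LZMA can only match repeats that fall within its dictionary, typically tens of MB, which is far less than the gigabytes separating the duplicate files in the stream. A much larger dictionary might change the picture, at the cost of a lot of RAM. A sketch (folder names are placeholders):

```shell
# Hedged sketch: solid 7z archive with an oversized LZMA2 dictionary.
# -mx=9    maximum compression preset
# -ms=on   solid archive (one continuous stream across all files)
# -md=512m 512 MB dictionary, so repeats up to ~512 MB apart can be
#          matched; compressing needs roughly 10x that much RAM
7z a -t7z -mx=9 -ms=on -md=512m combined.7z folder1 folder2
```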

So, a short summary: 7-Zip can't exploit identical files in different sub-folders.
I need to look for tools that find common files across sub-folders and replace them with hard/soft links.
By the way, in that case the idea of unzipping everything and putting the uncompressed data on ZFS datasets with a high compression level makes sense.
I hope ZFS can recognize identical files in different sub-folders.
But I definitely don't have enough knowledge to test it that way yet.
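On the "hope ZFS can recognize equal files" question: ordinary ZFS compression works per record inside each file and will not merge duplicates across files, but ZFS does have block-level deduplication. A sketch for trying it on a scratch dataset (the pool name `tank` is a placeholder); note that dedup keeps its table in RAM and is commonly advised against on small-memory systems:

```shell
# Assumed pool name: tank.  Try dedup on a throwaway dataset first.
zfs create -o dedup=on -o compression=gzip-9 tank/dedup-test
# ...copy a sample of the backups in, then check the pool-wide ratio:
zpool get dedupratio tank
```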
 

Attachments

  • 2019-11-28_002424.jpg (25.2 KB)