File size different on external HDD vs FreeNAS

Status
Not open for further replies.

eddi1984

Dabbler
Joined
Apr 3, 2015
Messages
42
Hi,

I am planning to upgrade my hardware (case and motherboard).
I posted a question about the hardware upgrade in this forum, and the conclusion was that it will be an easy swap that needs no reconfiguration or new setup of FreeNAS.

However, before I start, I want to make sure that I have a backup, JUST IN CASE.

So I connected an external HDD (2TB) to FreeNAS and formatted it as UFS2. Then I used a simple 'cp -iprv dir1 dir2' to copy the files. All worked nicely; however, FreeNAS tells me that the source folder (a dataset in this case) is ~28GB (with gzip and a compression ratio of 1.58), but once the copy finished, the size on the destination disk was 119GB. I had anticipated about 45GB (28GB x 1.58), not 119GB.

What am I missing, and how can I tell what the actual size of the data will be once it is on the destination drive?

Thanks.
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
With compression enabled on a ZFS dataset, any completely empty blocks (all zero) go unallocated. This is not true of UFS, as far as I know, and might explain what you're seeing.
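
One way to check this is to compare a file's apparent size (its length in bytes) with the space actually allocated for it. The following is a minimal sketch using only wc and du -k; size_report is a hypothetical helper name, not a FreeNAS or FreeBSD command.

```shell
#!/bin/sh
# size_report is a hypothetical helper, not a FreeNAS or FreeBSD command.
# It prints a file's apparent size next to the space allocated for it on disk.
size_report() {
    file=$1
    # Apparent size: the number of bytes a reader sees, zero-filled regions included.
    apparent=$(wc -c < "$file" | tr -d ' ')
    # Allocated size: the blocks actually in use, reported by du in KiB.
    allocated_kib=$(du -k "$file" | cut -f1)
    echo "apparent_bytes=$apparent allocated_kib=$allocated_kib"
}
```

Running size_report on the same file on the ZFS dataset and on the UFS copy should report the same apparent size on both. A much smaller allocated size on ZFS would point to compression, unallocated zero blocks, or both.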
 

eddi1984

Dabbler
Joined
Apr 3, 2015
Messages
42
Hi,

What you are saying makes sense, but it's 2+ times the size that FreeNAS is showing.

Is there a way to remedy this issue, like a manual "trim" with dd?

Thanks.


 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
> Is there a way to remedy this issue
If my speculation is correct, then what you're seeing on UFS is your actual data size, and what you see on ZFS is a much more efficient allocation of resources. In other words, there isn't a remedy and trimming doesn't make any sense.
 

eddi1984

Dabbler
Joined
Apr 3, 2015
Messages
42
Hi,

thanks for the reply.

If ZFS can really compress the files to roughly a third of their size, then that is great news and one more reason to run ZFS (FreeNAS). The bummer is that I should have gotten a 4TB HDD and not a 2TB one ... oh well, I will find a use for it.

Thanks.
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
My explanation is only valid if there are large empty blocks in the files. Is that what you have?

Is there any reason why you're using UFS and not ZFS on the external drive?
 

eddi1984

Dabbler
Joined
Apr 3, 2015
Messages
42
I am not sure how to see if I have large empty blocks. Is there a way to find out?

The reason I went with UFS is that I only have one disk (and the means to connect only one disk) for backup. UFS seemed a better choice because, if I have corrupt data on ZFS, I am not getting the file back. With UFS, I can read the file and maybe even attempt a repair if needed.

There is only a very small probability that I will need the backup anyway. It is just in case. I understand that ZFS with mirrored disks as backup is far superior to UFS, but I needed something quick and dirty. In the event that I need to use the backup, I can boot an Ubuntu live USB, mount the drive and just read the files I need. ZFS will not be that easy using Ubuntu. I would have to use FreeBSD Live, and I am not familiar with it.
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
> I am not sure how to see if I have large empty blocks. Is there a way to find out?
You could hex dump the files and see if there are large sections of all zeros.
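
As a rough alternative to reading a hex dump by eye, the sketch below counts how many bytes of a file are zero. zero_report is a hypothetical helper name. It counts every zero byte, not only long runs, so a high count is a hint rather than proof of large empty blocks.

```shell
#!/bin/sh
# zero_report is a hypothetical helper name used only for illustration.
# It reports how many bytes of a file are NUL (zero) bytes.
zero_report() {
    file=$1
    total=$(wc -c < "$file" | tr -d ' ')
    # Delete every NUL byte and count what is left; LC_ALL=C keeps tr byte-oriented.
    nonzero=$(LC_ALL=C tr -d '\000' < "$file" | wc -c | tr -d ' ')
    echo "total=$total zero=$((total - nonzero))"
}
```

To look at the runs themselves, hexdump -C prints a single * in place of consecutive identical lines, so long stretches of zeros stand out in its output.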
> UFS seemed a better choice because, if I have corrupt data on ZFS, I am not getting the file back.
Fair enough.
> I understand that ZFS with mirrored disks as backup is far superior to UFS
I'm sure you understand that redundancy, whether from a mirror or from RAIDZ, is not a substitute for backup. The only data you don't have to back up is the data you don't mind losing.
 

eddi1984

Dabbler
Joined
Apr 3, 2015
Messages
42
> Do you have any databases? Or files that came from databases?
Actually, when I look at the folders that had the biggest size discrepancy between ZFS and UFS, all of them had databases inside. In particular, the one folder that showed 27GB on ZFS (compressed) and 119GB on UFS had quite a few databases (it's a backup destination for a remote server).
The folders that do not contain any DBs show almost no size difference.

How can this be explained? How does ZFS treat DB files differently from regular files such as text files, pictures, etc.?
 

solarisguy

Guru
Joined
Apr 4, 2014
Messages
1,125
You were not aware of two things, and many IT people do not know them either.

First, you need to learn about sparse files, e.g. from http://en.wikipedia.org/wiki/sparse_file

The second piece is knowing which utilities copy sparse files properly. As far as I understand it, the cp command included in FreeNAS (FreeBSD) has the traditional behaviour of the original Unix cp command: it converts sparse files into files with actual NULs written where the sparse file had empty space. When used with the -S option, the tar command would recreate sparse files, cf. http://www.freebsd.org/cgi/man.cgi?query=tar

P.S.
GNU cp has a nonstandard option --sparse
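
A minimal sketch of the tar approach, assuming FreeBSD's tar (bsdtar), where -S applies on extraction: one tar writes the archive to a pipe and a second tar extracts it in the destination. copy_tree_sparse is a hypothetical helper name. On GNU tar the sparse option is normally given on the creating side instead, so treat the flag placement as an assumption to check against your tar's man page.

```shell
#!/bin/sh
# copy_tree_sparse is a hypothetical helper name; src and dst are directories.
# It pipes an archive of src into a second tar that extracts it inside dst.
copy_tree_sparse() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1
    # First tar:  -c create, -f - write the archive to stdout, "." means all of src.
    # Second tar: -x extract, -p preserve permissions,
    #             -S write runs of zeros as holes (bsdtar behaviour).
    ( cd "$src" && tar -cf - . ) | ( cd "$dst" && tar -xpSf - )
}
```

Usage would look like copy_tree_sparse /path/to/dataset /path/to/backup, where both paths are placeholders.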
 

eddi1984

Dabbler
Joined
Apr 3, 2015
Messages
42
> First, you need to learn about sparse files e.g. from http://en.wikipedia.org/wiki/sparse_file
Interesting, I did not know that. Learned something new ...

I actually did not use cp to copy the files over. I ended up using rsync. That decision proved to be the better one for two reasons.
First, after I had done the initial sync of the data, I needed to sync some folders again because they had changed.
Second, as I have now learned about sparse files, cp does not preserve them, but rsync can ...

I will know for next time. Thanks for the info!!!
 

eddi1984

Dabbler
Joined
Apr 3, 2015
Messages
42
I did not initially.

However, for testing's sake, I deleted the folder from the backup destination and ran rsync again, this time with --sparse. It ended up saving me ~5GB.


 

eddi1984

Dabbler
Joined
Apr 3, 2015
Messages
42
> Could you retest with tar, please? :)
Sure I can; however, I have never used tar before, so I am not familiar with it.

I use rsync like this:
rsync -avcPS --delete source/ destination/
(-P is optional, I don't use it all the time; -S is the short form of --sparse)

If you could provide a "tar" command that does the same as, or something similar to, the rsync command, that would be great.
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
Aside from any sparse file effect, databases tend to be repetitive and thus highly compressible.
 

eddi1984

Dabbler
Joined
Apr 3, 2015
Messages
42
Hi,

I ended up doing some testing with different ways to back up my data. My goal was to find the storage cost of the different methods of copying the files over.

I selected one of my important folders. Before I go any further, I have to elaborate on the content of the folder.
There are 29878 files in total, broken down by type as follows:
*.CAB - 615
xls/xlsx - 2013
doc/docx - 843
pdf - 1539
frm - 18723
others - 6145

This folder is a backup folder for Excel/Word and PDF files. It is also a backup for a common accounting software package (using MSQL DB).

Here are the results:
Main Storage
- FreeNAS (ZFS) - 20GB (lz4, compression ratio 3.39)

Backup Storage (UFS)
- rsync (with --sparse and --exclude for common files like *.log) - 59GB
- rsync (just -av; neither --sparse nor --exclude) - 73GB
- tar (with -cvjf, bzip2 archive) - 15GB
- tar (with -cvf, regular uncompressed archive) - 73GB
- tar (tar -cf - . | ( cd DESTINATION; tar -xpSf - ), as shown above) - 60GB
- cp (with -pvr) - 73GB

From Windows (as shared CIFS folder) - 73GB

So it looks like the actual raw, uncompressed data is ~73GB. I guess the savings depend on what type of data the folder/dataset contains. In this case, I would go with either rsync with the --sparse option or tar to a bzip2 archive. Both have their advantages and disadvantages.
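
To put the results above in proportion, the short sketch below expresses each size as a percentage of the ~73GB raw size. pct_of_raw and the entry labels are hypothetical names; the sizes are the ones listed in the results. Note that the 59GB rsync figure also reflects --exclude, not only --sparse.

```shell
#!/bin/sh
# pct_of_raw is a hypothetical helper: it prints a size as a percentage of the raw size.
pct_of_raw() {
    awk -v size="$1" -v raw="$2" 'BEGIN { printf "%.1f%%\n", size / raw * 100 }'
}

# Sizes in GB from the results above; 73 is the uncompressed baseline.
for entry in "zfs-lz4:20" "rsync-sparse-exclude:59" "tar-sparse:60" "tar-bzip2:15"; do
    name=${entry%%:*}    # text before the colon
    size=${entry##*:}    # number after the colon
    echo "$name $(pct_of_raw "$size" 73)"
done
```

This prints 27.4% for ZFS with lz4, 80.8% for rsync with --sparse and --exclude, 82.2% for the sparse tar copy, and 20.5% for the bzip2 archive.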

For my purposes, I settled on rsync (with --sparse and --exclude), simply because I have the space, and until I actually replace the hardware in the NAS I can keep updating the backup destination with rsync, which is super fast at doing that.

One more thing: ZFS WITH lz4 IS THE BEST!!!! ;)

I hope this will help someone who is thinking about doing backups.

Cheers.
 

solarisguy

Guru
Joined
Apr 4, 2014
Messages
1,125
Unprotected compressed archives are not good; that is, I would not recommend them. I would trade space for safety.

Thank you very much for getting back to us with the test results!
 