SOLVED - Is deduplication of a small dataset worth it?

Status
Not open for further replies.

monotok

Dabbler
Joined
Dec 27, 2016
Messages
10
Hi All,

My system details.

FreeNAS-9.10.2-U3 (e1497f269)

Platform Intel(R) Xeon(R) CPU E3-1220L V2 @ 2.30GHz

Memory 16320MB
HP MicroServer Gen8.

I have a main machine that does regular backups via iSCSI to FreeNAS. I have split the storage into "General", "Documents" and "Photos", so there are three iSCSI shares. This works great and performance maxes out my gigabit LAN.

I also have a server running VMs, and one of those is a Nextcloud instance. Its main data directory lives on FreeNAS and is mounted via NFS, and I want to mount the Photos to Nextcloud via SMB/NFS shares as well.

Obviously I can't share the iSCSI extents, as they are exclusive to that backup machine. I could easily just create a new dataset to share via NFS/SMB and copy the photos data from the iSCSI volume to it. We are only talking about 140GB (I only have 1 TB remaining), but I would rather not just duplicate the data on FreeNAS.

Would it be a good idea to create a small 150GB dataset with deduplication on and then copy the photos to that? I know compression is recommended instead, but I only get a compression ratio of 1.03 because, as far as I know, images are already compressed and don't compress much further. Looking at the manual, it states 5GB of RAM per 1TB of deduplicated storage, so with only 140GB that would be about 700MB of RAM.
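(For reference, the 1.03 figure is what ZFS reports for the existing data, and as far as I understand the manual's RAM estimate can be sanity-checked with zdb's simulated dedup run before enabling anything; "tank" and the dataset name below are just placeholders for my own:)

Code:
# compression ratio actually achieved on the existing photos data
zfs get compressratio tank/photos

# walk the pool and simulate dedup without changing anything,
# printing a block histogram and the projected dedup ratio
zdb -S tank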

Good idea or just copy the data and buy more disks when needed?

Thanks in advance!
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
I think I'm missing something. How would deduplicating a new dataset containing a copy of the photos save space? Are you imagining that it will deduplicate the new dataset with the existing iSCSI storage?
I only have 1 TB remaining
What is the pool's CAP % (from zpool list)?
buy more disks when needed?
Probably this.
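Something like the following will show it, with your pool name in place of the placeholder "tank":

Code:
# CAP (and the pool-wide DEDUP ratio) are in the default columns
zpool list tank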
 

monotok

Dabbler
Joined
Dec 27, 2016
Messages
10
Yes, I think I'm missing exactly how deduplication works. I assumed it would deduplicate across the entire pool regardless of dataset.

I gather from your reply that it works only within the same dataset, so if I created a dataset and put multiple copies of the same file in it, they would only be stored once?

 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
What is the goal here? Let's say you turn on deduplication (which I advise against), and let's say you have 2000 photos that are duplicates (it could be 1 photo duplicated 2000 times, 500 photos duplicated 4 times, or 1000 photos each duplicated once; it doesn't matter) and dedup takes action. Your used space should shrink, but you will still have links from all the locations where the data was listed, so it doesn't change the directory structure.
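(If you did flip it on anyway, the saving would only really show up as the pool-wide dedup ratio, something like this with a placeholder pool name:)

Code:
# read-only property reporting how much dedup is actually saving
zpool get dedupratio tank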

A better solution, if you are just trying to consolidate your photos and remove all the duplicates, is to run something like Auslogics Duplicate File Finder, which works well for me. The downside is that it cannot be configured to look at networked drives, so you would need to copy all your data to your local machine's drive and run it there. It's a bit of a downer, but the good thing is that if you accidentally delete something, you haven't destroyed the original yet. Then I'd copy the files back to a different directory on the NAS and reorganize them as desired. Lastly, delete the original file set.

Like I said, it's not the best solution, but it's a good one, and the download is not full of spam or malware. It's an option.
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
I gather from your reply that it works only within the same dataset, so if I created a dataset and put multiple copies of the same file in it, they would only be stored once?
First, deduplication works at the level of blocks, not files.
Second, existing data will not be deduplicated if it was there before deduplication was enabled.
Third, yes, it only works within a deduplicated dataset and its children.
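In other words, something along these lines would deduplicate only the blocks written into that dataset (and its children) after it is created; "tank/photos" is a placeholder name:

Code:
# create a new dataset with dedup enabled from the start,
# so everything copied into it goes through the dedup table
zfs create -o dedup=on tank/photos

# confirm the property; existing data elsewhere in the pool is untouched
zfs get dedup tank/photos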
 

monotok

Dabbler
Joined
Dec 27, 2016
Messages
10
@Robert Trevellyan , Thanks for the clarification on deduplication.

I don't think deduplication will do what I wanted. What would the main use case for it be?

@joeschmuck My goal is to store my photos in one dataset so I am not duplicating them, then make them accessible via different shares from that one dataset. As they are currently in a zvol shared using iSCSI for backup, I think the best way would be either to create a new dataset and share them again, or to back up via an NFS share instead of iSCSI.
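(Roughly what I have in mind, with placeholder names: create a plain dataset on FreeNAS, share it out, and copy the photos across from the machine that currently has the iSCSI volume mounted:)

Code:
# on FreeNAS: a plain dataset for the photos, to be shared via NFS/SMB
zfs create tank/photos-share

# on the backup machine, which has the iSCSI volume at /mnt/photos
# and the new NFS share mounted at /mnt/nas-photos
rsync -a /mnt/photos/ /mnt/nas-photos/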

Thanks!
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
What would the main use case for it be?
At the risk of stating the obvious, it's aimed at use cases where a lot of duplicate data is expected. One example might be a company backing up lots of Windows workstations via disk image, where a significant portion of the data in each image would be the OS.

You might consider starting over from a higher level. Why are you backing up via iSCSI?
 

monotok

Dabbler
Joined
Dec 27, 2016
Messages
10
Yeah, so most people won't have a need for it. I think iSCSI gives slightly better performance (it is probably negligible), and it is pretty easy in Linux to mount both iSCSI and NFS.
I might switch over as it will give better flexibility, and just keep iSCSI for the ESXi datastore.
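(The NFS side on the Nextcloud VM would just be a normal Linux mount; the hostname and paths below are placeholders:)

Code:
# one-off mount on the Nextcloud VM
mount -t nfs freenas.local:/mnt/tank/photos-share /mnt/photos

# or the equivalent /etc/fstab line for a permanent mount:
# freenas.local:/mnt/tank/photos-share  /mnt/photos  nfs  defaults  0  0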
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
Dedup is unlikely to work in the example given. Just accept the wasted space or convert the iSCSI data to NFS/SMB file data and have everything use that.
 
Joined
Dec 2, 2015
Messages
730
I don't think deduplication will do what I wanted. What would the main use case for it be?
It was created by RAM companies as a way to increase RAM sales. It serves no useful purpose in the vast majority of use cases.
 

monotok

Dabbler
Joined
Dec 27, 2016
Messages
10
I will add them to another dataset and buy more disks when needed :) I'll mark this as solved. Thanks all!
 