SOLVED - Is deduplication of a small dataset worth it?

Status
Not open for further replies.

monotok

Dabbler
Joined
Dec 27, 2016
Messages
10
Hi All,

My system details.

FreeNAS-9.10.2-U3 (e1497f269)

Platform Intel(R) Xeon(R) CPU E3-1220L V2 @ 2.30GHz

Memory 16320MB
HP MicroServer Gen8.

I have a main machine that does regular backups via iSCSI to FreeNAS. I have split the storage into "General", "Documents" and "Photos", so there are three iSCSI shares. This works great and performance maxes out my gigabit LAN.

I also have a server running VMs, and one of those is a Nextcloud instance. Its main data directory lives on FreeNAS and is mounted via NFS, and I want to mount the Photos to Nextcloud via SMB/NFS shares as well.

Obviously I can't share the iSCSI extents, as they are exclusive to that backup machine. I could easily just create a new dataset to share via NFS/SMB and copy the photos data from the iSCSI volume to it. We are only talking about 140GB (I only have 1 TB remaining), but I would rather not just duplicate the data on FreeNAS.

Would it be a good idea to create a small 150GB dataset with deduplication on and then copy the photos to that? I know compression is recommended instead, but I only get a compression ratio of 1.03 because, as far as I know, images are already compressed and don't compress much further. Looking at the manual, it states 5GB of RAM per 1TB of deduplicated storage, so with only 140GB that would be about 700MB of RAM.
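(For reference, the 1.03 figure is what ZFS reports for the existing data, and as far as I understand the manual's RAM estimate can be sanity-checked with zdb's simulated dedup run before enabling anything; "tank" and the dataset name below are just placeholders for my own:)

Code:
# compression ratio actually achieved on the existing photos data
zfs get compressratio tank/photos

# walk the pool and simulate dedup without changing anything,
# printing a block histogram and the projected dedup ratio
zdb -S tank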

Good idea or just copy the data and buy more disks when needed?

Thanks in advance!
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
I think I'm missing something. How would deduplicating a new dataset containing a copy of the photos save space? Are you imagining that it will deduplicate the new dataset with the existing iSCSI storage?
I only have 1 TB remaining
What is the pool's CAP % (from zpool list)?
buy more disks when needed?
Probably this.
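Something like the following will show it, with your pool name in place of the placeholder "tank":

Code:
# CAP (and the pool-wide DEDUP ratio) are in the default columns
zpool list tank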
 

monotok

Dabbler
Joined
Dec 27, 2016
Messages
10
Yes, I think I'm missing exactly how deduplication works. I assumed it would deduplicate across the entire pool regardless of dataset.

I gather from your reply that it works only within the same dataset, so if I created a dataset and put multiple copies of the same file in it, they would only be stored once?

 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
What is the goal here? Let's say you turn on deduplication (which I advise against), and let's say you have 2000 photos that are duplicates (it could be 1 photo duplicated 2000 times, 500 photos duplicated 4 times, or 1000 photos each duplicated once; it doesn't matter) and dedup takes action. Your used space should shrink, but you will still have links from all the locations where the data was listed, so it doesn't change the directory structure.
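(If you did flip it on anyway, the saving would only really show up as the pool-wide dedup ratio, something like this with a placeholder pool name:)

Code:
# read-only property reporting how much dedup is actually saving
zpool get dedupratio tank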

A better solution, if you are just trying to consolidate your photos and remove all the duplicates, is to run something like Auslogics Duplicate File Finder, which works well for me. The downside is that it cannot be configured to look at networked drives, so you would need to copy all your data to your local machine's drive and run it there. It's a bit of a downer, but the good thing is that if you accidentally delete something, you haven't destroyed the original yet. Then I'd copy the files back to a different directory on the NAS and reorganize them as desired. Lastly, delete the original file set.

Like I said, it's not the best solution, but it's a good one, and the download is not full of spam or malware. It's an option.
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
I gather from your reply that it works only within the same dataset, so if I created a dataset and put multiple copies of the same file in it, they would only be stored once?
First, deduplication works at the level of blocks, not files.
Second, existing data will not be deduplicated if it was there before deduplication was enabled.
Third, yes, it only works within a deduplicated dataset and its children.
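In other words, something along these lines would deduplicate only the blocks written into that dataset (and its children) after it is created; "tank/photos" is a placeholder name:

Code:
# create a new dataset with dedup enabled from the start,
# so everything copied into it goes through the dedup table
zfs create -o dedup=on tank/photos

# confirm the property; existing data elsewhere in the pool is untouched
zfs get dedup tank/photos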
 

monotok

Dabbler
Joined
Dec 27, 2016
Messages
10
@Robert Trevellyan , Thanks for the clarification on deduplication.

I don't think deduplication will do what I wanted. What would the main use case for it be?

@joeschmuck My goal is to store my photos in one dataset so I am not duplicating them, then make them accessible via different shares from that one dataset. As they are currently in a zvol shared using iSCSI for backup, I think the best way would be either to create a new dataset and share them again, or to back up via an NFS share instead of iSCSI.
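(Roughly what I have in mind, with placeholder names: create a plain dataset on FreeNAS, share it out, and copy the photos across from the machine that currently has the iSCSI volume mounted:)

Code:
# on FreeNAS: a plain dataset for the photos, to be shared via NFS/SMB
zfs create tank/photos-share

# on the backup machine, which has the iSCSI volume at /mnt/photos
# and the new NFS share mounted at /mnt/nas-photos
rsync -a /mnt/photos/ /mnt/nas-photos/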

Thanks!
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
What would the main use case for it be?
At the risk of stating the obvious, it's aimed at use cases where a lot of duplicate data is expected. One example might be a company backing up lots of Windows workstations via disk image, where a significant portion of the data in each image would be the OS.

You might consider starting over from a higher level. Why are you backing up via iSCSI?
 

monotok

Dabbler
Joined
Dec 27, 2016
Messages
10
Yeah, so most people won't have a need for it. I think iSCSI gives slightly better performance (it is probably negligible), and it is pretty easy in Linux to mount both iSCSI and NFS.
I might switch over as it will give better flexibility, and just keep iSCSI for the ESXi datastore.
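(The NFS side on the Nextcloud VM would just be a normal Linux mount; the hostname and paths below are placeholders:)

Code:
# one-off mount on the Nextcloud VM
mount -t nfs freenas.local:/mnt/tank/photos-share /mnt/photos

# or the equivalent /etc/fstab line for a permanent mount:
# freenas.local:/mnt/tank/photos-share  /mnt/photos  nfs  defaults  0  0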
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
Dedup is unlikely to work in the example given. Just accept the wasted space or convert the iSCSI data to NFS/SMB file data and have everything use that.
 
Joined
Dec 2, 2015
Messages
730
I don't think deduplication will do what I wanted. What would the main use case for it be?
It was created by RAM companies as a way to increase RAM sales. It serves no useful purpose in the vast majority of use cases.
 

monotok

Dabbler
Joined
Dec 27, 2016
Messages
10
I will add them to another dataset and buy more disks when needed :) I'll mark this as solved. Thanks all!
 