Deduplication special devices?

ThisTruenasUser

Dabbler
Joined
Apr 19, 2023
Messages
44
Hi all
First, my primary reason for using TrueNAS is to serve multiple iSCSI drives for games, with deduplication.
TrueNAS is running as a virtual machine in Proxmox.
The pool is a RAIDZ1 of 5 x 4TB drives.
2 x 16GB Intel Optane drives serve as the deduplication devices. They were cheap.
There are 2 x 12TB sparse drives (zvols).
To get the dedup data down to a manageable level, the zvols use 1MB block sizes.
They are shared through iSCSI and formatted with 1MB block sizes, which is not an issue for a big Steam library.

By changing the block size, the dedup data, and I assume the I/O on the Optane drives, was reduced by 64x, so nearly 2 orders of magnitude. The dedup data definitely was.
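
A rough sketch of where a 64x figure can come from, assuming the dedup table holds one entry per unique block. The 16 KiB starting block size is an assumption (it is simply the size that a 64x reduction to 1 MiB implies), and the 3 TiB of unique data is a made-up figure for illustration.

```python
import math

KIB = 1024
MIB = 1024 * KIB
TIB = 1024 ** 4


def ddt_entries(unique_bytes: int, block_size: int) -> int:
    """Number of dedup table entries, assuming one entry per unique block."""
    return -(-unique_bytes // block_size)  # ceiling division


unique_data = 3 * TIB    # hypothetical amount of unique data
small_block = 16 * KIB   # assumed original block size (not stated in the post)
large_block = 1 * MIB    # block size the zvols use now

entries_small = ddt_entries(unique_data, small_block)
entries_large = ddt_entries(unique_data, large_block)
reduction = entries_small / entries_large
orders_of_magnitude = math.log10(reduction)

print(f"entries at 16 KiB: {entries_small:,}")
print(f"entries at 1 MiB:  {entries_large:,}")
print(f"reduction: {reduction:.0f}x ({orders_of_magnitude:.2f} orders of magnitude)")
```

The ratio depends only on the two block sizes, not on how much data is stored, so the same factor applies as the pool fills up.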

So using dedup for something intensive with small block sizes would be demanding, I guess, but in my case it does not seem to be.

So now I am looking at swapping out the Optane drives for 2 x 256GB standard NVMe drives I already have.
They are ADATA SX8200 256GB. What is important is that they have a small RAM cache.

So other data does now use deduplication: maybe 400GB of files of various sizes in a standard Samba share. It is mainly static data.

The initial recopy could be slow, as the NVMe drives would be hit hard. That is about the only time they would be.
Sometimes I sync data to it, around once a week.


I am NOT spending cash on expensive high-end Optane drives for dedup. It is not worth it & I would be better off buying more actual storage.
If buying more storage, I would have no need for TrueNAS, as file sharing can just be set up on Proxmox.


I have read the guide on special dedup devices here already.


So in this context, are 2 standard NVMe drives, as described, suitable for deduplication?
 

NickF

Guru
Joined
Jun 12, 2014
Messages
763
Most folks here will shy you away from ZFS deduplication. That should speak volumes. There are big changes to dedupe that we will eventually see, but it's one of those things that's done when it's done... (notice the date)

There is not a lot of data I have seen to advise you either way. Dedupe special devices are less talked about than even metadata special devices. For your use case the data is fungible. The only real cost is seeding the data to the ZVOL. If nothing else, I would be interested in seeing what you come up with and maybe a comparison of small optane vs nvme. Please don't hesitate to share with the class.
 

ThisTruenasUser

I am not an expert on this. This was some dry stuff. The video seems to be for IT experts.

I need to know how much swapping out the drives would cause performance issues. Where are benchmarks & such?
He talked about having the dedup data in memory. I am using drives for it.

I am not interested in being an experiment, just to lose data or something similar.

To reiterate about your comment:
"Most folks here will shy you away from ZFS deduplication. That should speak volumes."

Deduplication is the ONLY reason I use TrueNAS. Standard network shares can be done through other means if not using it.
My motivation is to save money on storage.
 

NickF

ThisTruenasUser said: "I need to know how much swapping out the drives would cause performance issues. Where are benchmarks & such?"
There aren't any. That's my point. You are basically in uncharted waters. Very few folks use ZFS deduplication. Far fewer still use special allocation devices for deduplication.

Your data is not irreplaceable. Try it and let us know. Or don't.
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
ThisTruenasUser said: "I am NOT spending cash on expensive high-end Optane drives for dedup. It is not worth it & I would be better off buying more actual storage.
If buying more storage, I would have no need for TrueNAS, as file sharing can just be set up on Proxmox."
The time and energy you're putting into making it work with a not-too-expensive dedup vdev should be an indication that you'd be better off spending more on plain storage…

Come to think of it, if your use case is to store multiple copies of the same data with minor variations and you know when and where changes will occur (updates, explicit configuration changes…), using clones could be an alternative solution.
Create a dataset for each game. Load version 1.0 of a game into a dataset. Make a snapshot. Clone the snapshot and promote the clone. Mount the clone and update the game in the clone to v.1.1. Now you can choose to play v.1.0 or v.1.1 by mounting the corresponding dataset, but ZFS only stores one full copy of v.1.0 and the specific blocks which were changed in the update to v.1.1.
No dedup involved.
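
The snapshot, clone and promote steps above could be sketched roughly as follows. This is a dry run that only prints the `zfs` commands rather than running them; the pool, dataset and snapshot names are hypothetical placeholders, not anything from this thread.

```python
import shlex


def clone_update_commands(dataset: str, snapshot: str, clone: str) -> list[list[str]]:
    """Build the zfs commands for the snapshot -> clone -> promote workflow."""
    snap = f"{dataset}@{snapshot}"
    return [
        ["zfs", "snapshot", snap],      # freeze the current version
        ["zfs", "clone", snap, clone],  # writable copy that shares unchanged blocks
        ["zfs", "promote", clone],      # make the clone independent of its origin
    ]


# Hypothetical names, for illustration only.
commands = clone_update_commands(
    dataset="tank/games/examplegame",
    snapshot="v1.0",
    clone="tank/games/examplegame-v1.1",
)

for cmd in commands:
    print(shlex.join(cmd))  # print only; nothing is executed
```

After these steps the game would be updated inside the clone, and only the blocks changed by the update take extra space.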
 

ThisTruenasUser

My present situation means I have time to set this up, but I can ill afford to buy any more large-capacity storage. If I do, it will not be for a while.
I have a biggish Steam library (paid for already & there are many free games). Oh, and many Epic games for free, which take a lot of storage.

This little project has cost me 2 x 16GB Optane drives & the cheapest 500GB NVMe as a cache drive.

The rest of the drives, network cards & switches I have already.

I have set up TrueNAS SCALE already with deduplication, with all the data copied over.
Changing over to a clone setup now may work, but it means more time & effort.
I may try it out on an old machine to test things out.

Another potential issue is that I intend to create more sparse drives for use with Linux. They will use the ext4 format. So I would create a clone & reformat it, then copy the same, or similar, data over. Would that increase the overall usage on the zpool?
If it turns out I wasted cash on the 16GB Optane drives, I will not be very upset. They are not exactly expensive.


Did you mean create a dataset for each game, or one for all games?
On the main gaming machine, the 12TB iSCSI drive is mounted and I use PrimoCache software as a caching solution, which uses the 500GB drive.
It needs to be mounted only once, & the cache will be accessed for the most frequent data. If not, it would break/nullify the cache.

The other iSCSI drive is attached to a Windows 10 virtual machine. It just updates games, typically overnight.
The 'magic' happens when all downloads come from a lancache server at up to 2.5 gigabit, then the game updates are written to the cache. Both of these are very fast. In the background the updates are slowly written to the iSCSI drive.

The main issue for me was trying to get the dedup data down. The problem is that there is a 128k block size limit when creating a zvol in the GUI, even though ZFS can apparently use 1MB. If that were an option in the GUI, deduplication might actually be used by more people. Also, TrueNAS SCALE lacks a lot of features for looking at stats about pools & datasets, so the only option was to use the command line.
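
For anyone following along, a command-line zvol creation of the kind described might look roughly like the sketch below. It only prints the command and does not run it. The pool and zvol names are hypothetical, and whether a 1M `volblocksize` is accepted depends on the ZFS version and pool features, so treat this as an assumption rather than the exact command used here.

```python
import shlex


def sparse_zvol_command(name: str, size: str, volblocksize: str) -> list[str]:
    """Build a zfs command that creates a sparse zvol with a given block size."""
    return [
        "zfs", "create",
        "-s",                                  # sparse: no space reservation
        "-V", size,                            # create a volume of this size
        "-o", f"volblocksize={volblocksize}",  # block size of the zvol
        name,
    ]


# Hypothetical pool/zvol name, for illustration only.
zvol_cmd = sparse_zvol_command("tank/games-windows", "12T", "1M")
print(shlex.join(zvol_cmd))  # print only; nothing is executed
```

Note that `volblocksize` can only be chosen at creation time, so an existing zvol would have to be recreated and the data copied over again.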
 

NickF

ThisTruenasUser said: "My present situation means I have time to set this up, but I can ill afford to buy any more large-capacity storage. If I do, it will not be for a while.
I have a biggish Steam library (paid for already & there are many free games). Oh, and many Epic games for free, which take a lot of storage.

This little project has cost me 2 x 16GB Optane drives & the cheapest 500GB NVMe as a cache drive.

The rest of the drives, network cards & switches I have already."
Just because you can do something, doesn't mean that you necessarily should.

ThisTruenasUser said: "I have set up TrueNAS SCALE already with deduplication, with all the data copied over.
Changing over to a clone setup now may work, but it means more time & effort.
I may try it out on an old machine to test things out."
They are just video game files, I am not sure what the deal is here? How much data overlap do you actually have? How many other systems are accessing your library? The amount of time and energy coupled with the cost of 3 SSDs could have easily bought you another, larger, SSD to put on whatever other system you have gaming. Maybe adding that old system into the mix and selling it would yield a net zero sum of money. If you value your time at $25 an hour, how much money have you spent trying to do this?

If this exercise is for learning, knowledge and fun then I think my responses would be different. But you've clearly stated you are trying to solve a problem, and the solution you are proposing doesn't make a whole lot of sense from a technical standpoint. There's not a whole lot of data to even give you to point you in the right direction.
 

ThisTruenasUser

Not an exercise for fun, mainly to keep my sanity at the moment.
I got it working, so why change it around?
I have kept basic docs with the messy command line stuff I need to do.


So it seems a goodbye from me to this forum, for a while at least.
It is beginning to get unpleasant.

There are guides on how to do things similar to what I have done.
Nothing I found mentioned much on cloning.

Bye
 

NickF

Not trying to drive you away here. It's just that you are looking for an answer to a question that we don't have, and you are trying to solve a problem that we don't fully understand. All we've said here is that the path you've chosen isn't necessarily the best path.
 

Whattteva

Wizard
Joined
Mar 5, 2013
Messages
1,824
ThisTruenasUser said: "My present situation means I have time to set this up, but I can ill afford to buy any more large-capacity storage. If I do, it will not be for a while.
I have a biggish Steam library (paid for already & there are many free games). Oh, and many Epic games for free, which take a lot of storage."
If your objective is to save money, dedup will probably end up costing you more money honestly. Bulk HDD storage is ultra cheap compared to the extra RAM/CPU power you'd have to invest due to the significant performance overhead that turning on dedup will incur.
 

ThisTruenasUser

So if I were not using any dedup devices, it would require 3.02 GB of RAM instead, as I understand how it works. That is presently for the 3.22TB of unique data.
I may have actually wasted all that cash on those two 16GB Optane drives. Look on eBay, they are not exactly expensive.
When I fill up the iSCSI drive, who knows, I may need up to 12TB of use.
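
As a back-of-the-envelope check on those numbers, here is a small sketch that scales the dedup data linearly from 3.22TB to 12TB of unique data. Linear scaling is an assumption (it holds only if the average block size stays about the same), and the 16 GB of usable dedup space assumes the two Optane drives are mirrored, which is not stated above.

```python
def projected_ddt_gb(ddt_now_gb: float, unique_now_tb: float, unique_target_tb: float) -> float:
    """Scale the dedup data linearly with the amount of unique data."""
    return ddt_now_gb / unique_now_tb * unique_target_tb


ddt_now_gb = 3.02        # dedup data today, from the figures above
unique_now_tb = 3.22     # unique data today
unique_target_tb = 12.0  # unique data if the iSCSI drive fills up
device_gb = 16.0         # assumption: mirrored 16GB pair, so 16 GB usable

projected_gb = projected_ddt_gb(ddt_now_gb, unique_now_tb, unique_target_tb)
fits = projected_gb <= device_gb

print(f"projected dedup data at {unique_target_tb:.0f}TB unique: {projected_gb:.2f} GB")
print(f"fits on a {device_gb:.0f} GB dedup device: {fits}")
```

Under these assumptions the projection comes out a little over 11 GB, so it would still fit, though with less headroom than today.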

I could have just bought a few 12TB drives instead: 12TB for Linux, 12TB for Windows & one for parity in RAIDZ1.
It seems cloning is similar to dedup, but I am skeptical whether it is suitable for me.
If not, I would have needed maybe 5 drives, 3 on the gaming machine and 3 on the NAS, with no cloning or dedup.
It seems an absurd amount to spend on gaming storage.

My big cash items are going on PC gaming related equipment.

Oh, and the 3600X CPU, which was once used for gaming, seems to work just fine.
Multiple copies of games are what I want for 'backup' & to speed up downloads through a lancache server.

I may go spend cash on cheap 256GB SATA drives to replace the NVMe drives I mentioned.



Why so much hostility to deduplication?
I did stumble on this a while back.
I found it odd but kind of useful.


At about 2 1/2 minutes in, it says some odd things about deduplication comments.
Now I kind of see why.

There were some very helpful people on the forum who helped me.
Sadly this was needed due to issues in the GUI. Specifically, these were a 128k limit on zvol block sizes and the limited reporting available through it. That meant no choice but to use command line stuff instead.

With hindsight, I would have just set it up in Proxmox or an Ubuntu Server virtual machine.
It is done now and working as I hoped.
 