Question about dedup

Bytales

Dabbler
Joined
Dec 17, 2018
Messages
31
I need to create a pool which has dedup enabled. The manual states that there is approximately a need of 5GB of RAM per 1TB of deduplicated storage.
So if I have 50GB that I need to save 20 times on the dedup pool, does that count as 50 x 20 = 1TB, or does it count as 50GB for these calculation purposes?

Also, say the system needs 20GB of RAM extra for dedup purposes. Plus the 16GB minimum RAM, I'd need, let's say, 36GB of RAM.
What happens with the dedup table once I shut down FreeNAS?
The dedup tables are 20GB big, but my FreeNAS OS drive is only 16GB big.
Do I also need a comparably big OS install drive where the dedup table is dumped before a shutdown?

Because that would mean I would need to create a 36GB install partition as well, not only give it more RAM.

Please advise.
 

dlavigne

Guest
What use case do you have that requires dedup? Typically there are other ways to achieve dedup results without the performance hit.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
I need to create a pool which has dedup enabled.

Let's talk about "need" vs "want" here. ;) What workload do you believe "needs" dedup?

The manual states that there is approximately a need of 5GB of RAM per 1TB of deduplicated storage.
So if I have 50GB that I need to save 20 times on the dedup pool, does that count as 50 x 20 = 1TB, or does it count as 50GB for these calculation purposes?

It counts as 50GB x 20; the calculation is based on the pre-dedup numbers.
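As a rough worked example using the manual's 5GB-per-TB rule of thumb (and assuming the default block size is left alone): 50GB x 20 copies = 1TB of pre-dedup data, so you would budget roughly 1TB x 5GB/TB = 5GB of extra RAM for the dedup table.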

Also, say the system needs 20GB of RAM extra for dedup purposes. Plus the 16GB minimum RAM, I'd need, let's say, 36GB of RAM.
What happens with the dedup table once I shut down FreeNAS?
The dedup tables are 20GB big, but my FreeNAS OS drive is only 16GB big.
Do I also need a comparably big OS install drive where the dedup table is dumped before a shutdown?

Because that would mean I would need to create a 36GB install partition as well, not only give it more RAM.

Please advise.

The dedup tables (DDTs) are stored on your pool, not the OS drive, so don't worry about the boot device size.

Dedup is extremely RAM-sensitive and has potentially major performance implications; it's almost never worth the cost unless you're actually able to get that 20:1 ratio you're aiming for.
 

Bytales

Dabbler
Joined
Dec 17, 2018
Messages
31
Well, I need to store an ever-increasing amount of data, multiple times. The data is basically the same bit for bit, but I just need to have it multiple times. This data is increasing at a rate of about 1GB per day. The multiple copies are the same, bit for bit. As of writing this post, the data is 76GB big.
So theoretically, there should be a dedup factor of 100%, meaning if I need to store it 20 times, I'll save 19 times the space.
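In concrete numbers (assuming the copies really are block-for-block identical, which is the premise here): 76GB x 20 copies is about 1.5TB of logical data held in roughly 76GB of physical space, i.e. about a 20:1 dedup ratio.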

I just ordered 128GB of RAM; I currently have 64GB in my system. I'm running a FreeNAS VM.
I could just as well use ZFS under Linux, but as I already have FreeNAS here, I'm thinking it should work like this.

The FreeNAS VM is now working with the bare minimum of 16GB of RAM, and there is only a single 10TB disk in its care. The dedupable data will be stored on a single 1TB SSD.

I was thinking of going with 64GB of RAM for FreeNAS. I think it should be enough, but I could give it as much as 90+ GB of RAM if needed. I can theoretically install 2TB of RAM on my motherboard, but that is very expensive. I'm lucky the RAM prices just dropped; I paid 180 euro for one 16GB stick back in the day and bought four of them, and now I paid 160 euro for one 32GB stick and ordered four more.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
One way to avoid the de-dup penalty is to use snapshots. For example, my backup scheme uses something like this:

/backups/HOST
/backups/HOST@date1
/backups/HOST@date2
/backups/HOST@date3

I then use rsync to make a backup of my OS onto the top path above. Then, when completed, I snapshot it with the date. Thus, anything that did not change is not backed up again. It's still present; the snapshot and the current copy just share the file. Yet if I need data from one of the older backups, I simply mount that snapshot and get the data.
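A minimal sketch of that scheme (the host, pool, and dataset names here are placeholders, not my actual layout):

rsync -aH --delete root@HOST:/ /mnt/pool/backups/HOST/
zfs snapshot pool/backups/HOST@2019-01-15

Anything unchanged is shared between the snapshot and the live copy, and an older backup stays reachable read-only under /mnt/pool/backups/HOST/.zfs/snapshot/2019-01-15/.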

In general, don't use de-dup, because once you do, it can be a monster to undo. You basically have to copy your data off to another location and destroy the pool or dataset that used de-dup.
 

Bytales

Dabbler
Joined
Dec 17, 2018
Messages
31
It won't be a problem from this point of view, since my dedup data is basically multiple copies of the same data, so I only need to back up one of the multiple copies.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I need to create a pool which has dedup enabled. The manual states that there is approximately a need of 5GB of RAM per 1TB of deduplicated storage.
So if I have 50GB that I need to save 20 times on the dedup pool, does that count as 50 x 20 = 1TB, or does it count as 50GB for these calculation purposes?

Also, say the system needs 20GB of RAM extra for dedup purposes. Plus the 16GB minimum RAM, I'd need, let's say, 36GB of RAM.
What happens with the dedup table once I shut down FreeNAS?
The dedup tables are 20GB big, but my FreeNAS OS drive is only 16GB big.
Do I also need a comparably big OS install drive where the dedup table is dumped before a shutdown?

Because that would mean I would need to create a 36GB install partition as well, not only give it more RAM.

Please advise.

If you have a 1TB pool and enable dedup, expect to need at least 5GB of extra RAM. If you have a 10TB pool and enable dedup, expect to need at least 50GB of extra RAM. However, please note that the RAM-to-disk ratio is a function of the average block size in use. The number suggested by the manual applies if you don't mess with the defaults. If you set a smaller block size, the RAM requirements skyrocket quickly.
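To make the block-size dependence concrete (the roughly 320 bytes per dedup-table entry used here is the commonly quoted ballpark, not an exact figure): 1TB of data at an average 64K block size is about 16 million blocks, or on the order of 5GB of dedup table; the same 1TB at 8K blocks is about 128 million blocks, or on the order of 40GB.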

You should consider dedup memory requirements to be ON TOP OF any existing memory requirements.

The dedup table is stored in the pool. When accessed, it is loaded into ARC as metadata. It is eligible to be evicted to L2ARC. However, it works poorly if evicted to L2ARC. Therefore you should be generous with RAM so that it has a better chance of remaining in ARC.
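If you want to see how big the table actually is on a pool that already has dedup enabled, something like this should report it (the pool name is a placeholder); the output includes the number of DDT entries and their approximate on-disk and in-core sizes:

zpool status -D tank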

When you turn off your NAS, the ARC vanishes because it is in RAM. The L2ARC is rendered useless because the L2ARC pointers are stored in the ARC.

When you turn on your NAS, the dedup table will be fetched from the pool on demand and stored in ARC like other metadata. This means post-reboot write performance is somewhat worse until the ARC warms up with the dedup data. This sucks. Try to avoid reboots when using dedup.

Your boot device size has nothing to do with dedup.

In general, the community feels that dedup is a poor strategy, and you are better off with compression, snapshots, higher level deduplication such as that provided by many data backup products, etc.

I disagree somewhat -- I think dedup has a specific valid role and is useful in some cases. If you have a datastore where you are storing backup images, for example, with relatively small deltas, you can store a HUGE number of full backups using dedup. But you need a lot of memory to do this successfully; then again, it's 2019 and 256GB of used DDR3 is under $500.

The problem is that what people are THINKING is "oh, I have 500 Windows desktops (pc000-pc499) and they all have the common Windows files." You would think that this would dedup great, but it doesn't. Modern OSes tend to be nondeterministic when installing, and do not install the exact same blocks in the exact same places. If you could dedup 512-byte blocks this wouldn't be a problem, but ZFS simply can't do that and have it be practical and performant. So you need to look at much larger blocks. 1MB blocks dedup well, but the problem is that if you have 500 desktops, the contents of a given 1MB window on those disk images will generally result in 500 different layouts. So you still get 500 different blocks and 0% dedup.

What *does* dedup well is when you backup pc203 one day, then back pc203 up the next day and it writes 99% of the same blocks, with only a 1% delta. ZFS will do smashingly well on that and you will save all that duplicated space. But pc203 and pc167 are not likely to share much overlap, except for the all-zeroes block. So pc167's blocks will generally not overlap much with pc203, and just dedup against other images of pc167.

The big variables here are that you can play with blocksize (to increase the odds of overlap) and the amount of ARC reserved for metadata to optimize for your use case. The thing that really drives memory consumption is the number of unique disk blocks ZFS has to track. The more there are, the more memory it takes. The sad reality is that dedup doesn't work as well as most people would like.
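If you want to sanity-check how well your data would actually dedup before committing, zdb can simulate dedup against an existing pool and print the projected table size and ratio (the pool name is a placeholder; the scan can take a long time and a fair amount of RAM on a large pool):

zdb -S tank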
 

Bytales

Dabbler
Joined
Dec 17, 2018
Messages
31
First of all, I'm not planning on deduping VMs or OSes, but a blockchain data file. Basically I need to store the same blockchain multiple times because I'm running multiple nodes inside Docker containers. Seeing as it is the same blockchain for each and every node, it is logical to assume it is mostly the same data that is being created and stored.

My FreeNAS VM has 16GB of RAM, and I was planning on giving it up to 64GB of RAM. However, you say I need to take into consideration only 1TB of data, since the SSD is only 1TB in size? I thought I had to take into consideration the undeduped data size.

Another thing popping into my mind: since I do the Docker container thing in Ubuntu, I could mount the SSD drive in Ubuntu as a ZFS file system with dedup activated, but I would need to do it through the CLI, which would mean a bit of research, learning all the quirks.
FreeNAS would have been much more elegant.
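For what it's worth, the CLI side under Ubuntu would probably be something along these lines (a sketch only; zfsutils-linux is the stock Ubuntu package, while the pool, dataset, and device names are placeholders):

sudo apt install zfsutils-linux
sudo zpool create fastpool /dev/disk/by-id/ata-EXAMPLE-SSD
sudo zfs create -o dedup=on -o compression=lz4 fastpool/chaindata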

My containers plus Linux need about 70GB of RAM, and FreeNAS needs 5GB of extra RAM on top of the 16 it has; let's say I was generous and gave it 32GB of RAM.

If I did everything from Linux, I could have given Ubuntu all of the RAM (approximately 144GB) and be done with it.
Are there any advantages to doing this from FreeNAS instead of from Ubuntu directly?
 

Bytales

Dabbler
Joined
Dec 17, 2018
Messages
31
I also heard that ARC metadata can be at most 25% of total RAM, so if I need 50GB for metadata, I would need 200GB of RAM as a minimum.

I am still figuring out how much RAM FreeNAS would need just for deduping 1TB of SSD. Is 32GB enough, or should I give it more? Apart from this 1TB SSD, I only have another pool with a single 10TB HDD (yeah, I know, no redundancy).
 

Arubial1229

Dabbler
Joined
Jul 3, 2017
Messages
22
Deduplication + FreeNAS VM = bad time
 

Bytales

Dabbler
Joined
Dec 17, 2018
Messages
31
Deduplication + FreeNAS VM = bad time
Yeah, that's what one would think. But I've had the FreeNAS VM running without any problems whatsoever, with a single 10TB HDD (one pool, one vdev), given not as a native drive.
I'm sure it would work and that there would be no problems.
I'm picking up the memory and SSD today, and going to do the setup today. Will keep you guys posted to see what results I get.
 

Arubial1229

Dabbler
Joined
Jul 3, 2017
Messages
22
Yeah, that's what one would think. But I've had the FreeNAS VM running without any problems whatsoever, with a single 10TB HDD (one pool, one vdev), given not as a native drive.
I'm sure it would work and that there would be no problems.
I'm picking up the memory and SSD today, and going to do the setup today. Will keep you guys posted to see what results I get.

It will work until it doesn't, and then your pool will be gone. Good luck.
 

garm

Wizard
Joined
Aug 19, 2017
Messages
1,556
I'm a mechanical engineer, and in materials science there are two ways to deal with load. You have rigid materials that will take a load until they snap without warning, and you have plastic materials that will stretch or buckle before ultimately breaking.

What you have built will run fine until it doesn't, and you will have no warning before it's all gone.
 

Bytales

Dabbler
Joined
Dec 17, 2018
Messages
31
Fortunately this isn't materials science, nor is it biology. If it didn't work, it wouldn't have worked from the beginning. It either works or it doesn't, and it does. It is either 0 or 1; there isn't anything in between. This isn't a complex organism that has redundancies and compensatory mechanisms built in, which at some point become decompensated.
I'm running server-grade hardware, with a server-grade motherboard and CPU, ECC memory, and a server-grade HDD.
It works as expected. I don't know why it shouldn't, to be honest. People saying we shouldn't use FreeNAS in a VM, I'm sorry to say it, but it sounds more like superstition to me.
If it didn't work, it shouldn't have installed. But it did, and it works. I'm only seeing facts. Nothing more.
 

Bytales

Dabbler
Joined
Dec 17, 2018
Messages
31
The question now is: do you trust the words of the "FreeNAS Guru" or the words of the "Newbie"?
 

Arubial1229

Dabbler
Joined
Jul 3, 2017
Messages
22
Garm and I pretty much said the same thing. FreeNAS VMs are known to work fine for a while and then just magically blow up one day. Like I said, good luck with yours, I mean that.
 

Bytales

Dabbler
Joined
Dec 17, 2018
Messages
31
Having heard the warning, I do have a backup of the important things that I cannot allow myself to lose if shit does hit the fan, so I'm good to go. If I had a way to have a native FreeNAS machine, I would, but I only have a single PC. I'm running two Windows VMs, an Ubuntu VM, and FreeNAS under ESXi, all working together.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Garm and I pretty much said the same thing. FreeNAS VMs are known to work fine for a while and then just magically blow up one day. Like I said, good luck with yours, I mean that.

Well, you know, I've been warning people not to run FreeNAS as a VM for production use for years now. On the other hand, I've been running *our* stuff here with FreeNAS as a VM for many years. There's a reasonably safe path to doing so, it's just that people start to get a little cocky about following a recipe for success. Or better yet think that this simply doesn't apply to them, because someone on Teh YuuTuubz said otherwise.

Fortunately this isn't materials science, nor is it biology. If it didn't work, it wouldn't have worked from the beginning. It either works or it doesn't, and it does. It is either 0 or 1; there isn't anything in between. This isn't a complex organism that has redundancies and compensatory mechanisms built in,

Well, actually, it is an INCREDIBLY complex feat: one very complex entire operating system with FreeNAS on top of that, designed to run on bare metal, and then all of that abstracted and running virtually on another very complex system designed to create VMs, the illusion of real machines. There's a HUGE amount of room for things to go sideways, including frequently seen FreeNAS-specific examples ranging from "RDM doesn't work right" to "my MSI/MSI-X stuff isn't working correctly and everything hung."

There's an eff-ton of things between "it totally works perfectly 100% of the time" and "it failed catastrophically in the first second." I don't know why you think it is binary. We've seen people discover that their mainboard's PCI passthrough support is dodgy, because it seems to work for a day and then goes to hell for a few moments.

It works as expected. I don't know why it shouldn't, to be honest. People saying we shouldn't use FreeNAS in a VM, I'm sorry to say it, but it sounds more like superstition to me.

Sure. Because it's not like I didn't get super-tired of seeing people roll through here having done really dumb things to virtualize their FreeNAS and having it blow up on them. Which is why I wrote that warning article. And also why I wrote an article that described a way one could get it right.

If everything lines up perfectly, including hardware and software support, and you take a rational strategy, virtualization can work.

If it didn't work, it shouldn't have installed. But it did, and it works. I'm only seeing facts. Nothing more.

So if I throw you out the door of an airplane without a parachute, you should be able to land safely ... because the door let you through?

Seems a silly argument, but I'm fine for you showing me how that works out for you. :smile:
 

garm

Wizard
Joined
Aug 19, 2017
Messages
1,556
without any problems whatsoever, with a single 10TB HDD (one pool, one vdev), given not as a native drive
That's an oxymoron right there: indirect disk access and ZFS. The internet marinates in the tears wept for data needlessly lost to running ZFS on a virtual device.

Then add the dedup adventures and whatever else you are doing... I foresee tears, but they won't be mine. Best of luck to you.
 