Hello. I am not sure if this is the right place for this thread, but I couldn't find a better category.
I am a beginner with both TrueNAS and ZFS. Like a lot of people before me, I am toying with the idea of de-duplication. And like a lot of people before me, I am being advised from all sides to stay away from it.
However, I fail to understand one thing. While learning about the topic, I stumbled upon an interesting Reddit comment in which the author argues that de-duplication does not require crazy hardware if it is configured correctly; specifically, that the default record size should be increased. From my (limited) knowledge, the logic seems sound (my explanation follows). If it is true, I keep wondering why this isn't recommended more often on this forum.
I tried to explain it to myself as follows (please take this with a grain of salt; I am not presenting it as correct information, and I fully expect to be corrected if I am wrong):
The dedup table takes 320 B for every block it has to track. For a block size of 128 KiB, that works out to 2.5 GiB per TiB of data.
The problem is that this only holds if every block is exactly 128 KiB (and each one is unique).
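To make the arithmetic explicit, here is a rough back-of-the-envelope sketch in Python. The 320 B per entry is just the figure quoted above; the real in-core DDT entry size varies between ZFS versions, so treat it as an approximation:

    # Rough DDT RAM estimate: one table entry per unique block, ~320 B each (approximation).
    DDT_ENTRY_BYTES = 320
    KiB, GiB, TiB = 1024, 1024**3, 1024**4

    def ddt_ram_bytes(data_bytes, block_bytes):
        # Assumes every block is full-sized and unique (the simple case described above).
        return (data_bytes // block_bytes) * DDT_ENTRY_BYTES

    print(ddt_ram_bytes(1 * TiB, 128 * KiB) / GiB)  # -> 2.5 GiB of DDT per TiB of data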
What causes it to increase:
If the sizes of all files were, in theory, divisible by the block size, there would be no increase. In the real world, that is hardly the case. Because ZFS only sets the maximum record size and leaves the actual size variable, there will be a lot of smaller records, and each of them has to be tracked with the same fixed overhead as a full record. This increases the dedup table size by a percentage that depends heavily on how large the files on your system are. With a lot of small files there will be a lot of small records, and the table will grow quickly; with big files it will grow only marginally (just the one record holding the last bytes that did not fill a full block).
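As a rough illustration of that small-file effect, the sketch below just counts at least one record per file and rounds the tail up; it is a simplification of how ZFS actually allocates records, but it shows why file size matters so much:

    import math

    KiB, GiB = 1024, 1024**3
    RECORDSIZE = 128 * KiB

    def records_for_file(file_bytes, recordsize=RECORDSIZE):
        # At least one record per file; a partial tail still costs a full DDT entry.
        return max(1, math.ceil(file_bytes / recordsize))

    # 10 GiB stored as one big file vs. as 4 KiB files:
    print(records_for_file(10 * GiB))                           # 81,920 entries
    print((10 * GiB // (4 * KiB)) * records_for_file(4 * KiB))  # 2,621,440 entries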
What causes it to decrease:
The table only has to store a new entry for each unique record. If there are a lot of duplicates (as there should be on a dedup-enabled system), matching records are recognized as duplicates of already existing ones, no new entries are saved, and the table does not grow. Say our data exists on the system exactly twice; that would cut the RAM requirement in half.
The "rule of thumb(s)" found on the internet recommend from 1 - 5 GiB of RAM per TiB of data. This would align with my understanding of the topic that I mentioned above. 1 GiB would be for use cases where the dedup ratio is 2.5x, whereas 5 GiB tries to be on the safe side by taking small files into account.
All of this I can follow.
But now let's say we increase the record size from 128 KiB to 1 MiB.
This has both downsides and upsides.
Downsides:
De-duplication no longer works as efficiently as it could in terms of saved space. Fewer duplicates will be found, because even a change of one bit means the whole record has to be stored separately. Say I have two identical giant text files of 1 MiB each. Because they are duplicates, they take only 1 MiB of storage. Now I change one character in one of the files. With a 128 KiB block size, only 1/8 of the file would need to be stored separately, bringing the total storage used to about 1.125 MiB. With a 1 MiB record size, however, the whole record has to be copied, and no space is saved; the files now take 2 MiB.
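The same example in numbers, using a simplified model that just counts unique blocks (real ZFS allocation has more nuance than this):

    KiB, MiB = 1024, 1024**2

    def stored_bytes(file_bytes, block_bytes, changed_blocks_in_copy):
        # Two copies of a file: shared blocks are stored once, changed blocks twice.
        blocks = file_bytes // block_bytes
        return (blocks + changed_blocks_in_copy) * block_bytes

    # Two 1 MiB files, one character changed in the second copy (touching one block):
    print(stored_bytes(1 * MiB, 128 * KiB, 1) / MiB)  # 1.125 MiB stored in total
    print(stored_bytes(1 * MiB, 1 * MiB, 1) / MiB)    # 2.0 MiB stored in total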
But...
Upsides:
The dedup table takes a fixed 320 B for every block, no matter whether the block is 128 KiB or 1 MiB. This means that, for perfectly filled blocks, the dedup table overhead also decreases 8 times! Say I have a 4 MiB file: with a 128 KiB record size the table contains 32 entries just for this file, while with a 1 MiB record size it only has to store 4.
Plugging that into the formula, the RAM required drops to only 320 MiB per TiB! That no longer sounds so unachievable.
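Again just back-of-the-envelope, with the same 320 B-per-entry assumption as before:

    KiB, MiB, TiB = 1024, 1024**2, 1024**4
    DDT_ENTRY_BYTES = 320

    for recordsize in (128 * KiB, 1 * MiB):
        entries = TiB // recordsize  # full, unique blocks per TiB of data
        print(recordsize // KiB, "KiB records:", entries * DDT_ENTRY_BYTES / MiB, "MiB of DDT per TiB")
    # 128 KiB records: 2560.0 MiB of DDT per TiB
    # 1024 KiB records: 320.0 MiB of DDT per TiB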
Both the small-files problem and the reduction from duplicates still apply here, of course. Small files are especially problematic for my calculation: with such a large record size, far more files count as "small" by definition, and they could increase the table size significantly. Still, it should be a lot less than with the default record size, provided your data does not consist only of small files.
Another upside is 8x fewer hashes for the CPU. The chunks it has to hash are bigger, mind you, but there is no longer the per-record overhead of setting up each record and especially writing it out.
My question is:
Is there some hidden problem that I didn't account for in my calculations? Why is nobody talking about this? Sure, you save less space. But if your duplicate data is mostly large, would that really be a problem? The RAM requirements become a lot more feasible this way, and RAM was always the main (if not the only) argument against dedup.
I get that in some use cases the duplicate records that need to be accounted for are only small chunks of otherwise different files. There, increasing the record size would make dedup practically useless, as it would be unable to find the similarities and would not save any space.
Also, as I understand it, not every dataset is suited to such a big record size. Database files, or anything else that is frequently rewritten in small pieces, would suffer in performance, because the storage always has to read/write the whole 1 MiB record.
But in a use case where the duplicates share large chunks of data, or are even exact copies of the same files (say, assets for a program, manual copy-backups, ISOs, or identical video files and photos), increasing the block size would not lower de-duplication efficiency, and would in fact improve performance because of all the resources saved: the CPU would not have to work as hard, with fewer table lookups and fewer hashes to compute. Not to mention, of course, the massive decrease in RAM usage, which would in turn improve performance, since that RAM could be used by ZFS instead.
I think that for home users, for example, the second use case is a lot more fitting, and it could be handled by reasonably equipped systems (which cannot be said for the default record size, as everyone rightly points out).
Thank you for reading this and even more so if you decide to explain or discuss it with me below.