Actual space usage in pool statistics

ubergosu

Cadet
Joined
Feb 26, 2022
Messages
9
Hello!

How does the used space shown for datasets in the "Storage" tab relate to actual drive usage when using compression and deduplication?

For example, in the first pool, main, the first dataset miran-replica is shown to use 3.78 Tb with compression ratio 3.85 and deduplication. Does it mean that if we move its contents to an uncompressed dataset with deduplication, it will use 3.78*4.07 ≈ 15.4 Tb?

How can we see how much space deduplication saves? For example, if we try to copy this data with rsync to another server, will it use 15.4 Tb of raw traffic, or even more due to deduplication?

Also, is it true that when we have to transfer a replica of this dataset (for example to another TrueNAS pool, local or remote), we have to transfer 3.78 Tb of raw traffic, and that it will actually use 3.78 Tb on the target drive?


[Attachment: screenshot of the Storage tab showing pool and dataset usage]
 

Nick2253

Wizard
Joined
Apr 21, 2014
Messages
1,633
First off (and this is admittedly a pet peeve), you are using units incorrectly. A Tb is a terabit, which is different from a TB (a terabyte) or a TiB (a tebibyte). A terabyte is equal to 8 terabits. A terabyte is also equal to 1000 gigabytes. A tebibyte is equal to 1024 gibibytes.

How does the used space shown for datasets in the "Storage" tab relate to actual drive usage when using compression and deduplication?
Used space is the actual space on the disk.
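If you want to see the logical size next to the on-disk size, you can compare a couple of dataset properties from the shell; a quick sketch (dataset name taken from your screenshot):

zfs get used,logicalused,compressratio main/miran-replica   # used = space charged to the dataset on disk, logicalused = size before compression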

How can we see how much space deduplication saves? For example, if we try to copy this data with rsync to another server, will it use 15.4 Tb of raw traffic, or even more due to deduplication?

The compression ratio is just compression. There is a separate Dedup ratio.

Deduplication in ZFS can appear to work somewhat counter-intuitively. ZFS will deduplicate blocks across an entire pool, and does not limit itself to just the dataset that it is in. Put another way, marking a dataset as "Dedup On" means that, for files in that dataset, ZFS is allowed to use blocks that already exist anywhere else in the pool to "complete" those files and prevent duplicated blocks.

As such, Dedup ratio is reported on a pool-by-pool basis. You can get it from the command line by running: zpool get all <pool name>
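Or just the one property; for example (pool name and the value here are only illustrative):

zpool get dedupratio main
NAME  PROPERTY    VALUE  SOURCE
main  dedupratio  1.23x  -

zpool list also shows the same figure in its DEDUP column.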

For example, in the first pool, main, the first dataset miran-replica is shown to use 3.78 Tb with compression ratio 3.85 and deduplication. Does it mean that if we move its contents to an uncompressed dataset with deduplication, it will use 3.78*4.07 ≈ 15.4 Tb?
Yes, kinda. Again, deduplication is kinda weird.

Let's imagine that you have two identical files. We copy these files into two datasets with deduplication. When we copy the first file onto the first dataset, that dataset will report 100% of the file's size. When we copy the second file onto the second dataset, that file will deduplicate with the first file, and that dataset will report 0 bytes used.

However, copying either dataset out will use the same amount of data.
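If you want to see that behaviour for yourself, here's a rough sketch from the shell (pool name, dataset names and mountpoints are made up):

zfs create -o dedup=on tank/ds1
zfs create -o dedup=on tank/ds2
cp bigfile /mnt/tank/ds1/
cp bigfile /mnt/tank/ds2/
zfs list -o name,used tank/ds1 tank/ds2    # per-dataset accounting
zpool get dedupratio tank                  # the pool-wide ratio is where the savings show up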

Also, is it true that when we have to transfer a replica of this dataset (for example to another TrueNAS pool, local or remote), we have to transfer 3.78 Tb of raw traffic, and that it will actually use 3.78 Tb on the target drive?
That is only true if the exact same duplicated blocks are available on the destination pool.
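If you want an estimate of the replication stream size before actually sending anything, a dry-run send will print one; roughly (snapshot name is just an example):

zfs send -nv main/miran-replica@manual-2022-04-01    # -n = dry run, -v = print the estimated stream size
zfs send -nvc main/miran-replica@manual-2022-04-01   # -c keeps blocks compressed as they are on disk, so the estimate is closer to the on-disk size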
 

ubergosu

Cadet
Joined
Feb 26, 2022
Messages
9
Thank you for your reply.

A Tb is a terabit, which is different from a TB (a terabyte) or a TiB (a tebibyte). A terabyte is equal to 8 terabits. A terabyte is also equal to 1000 gigabytes. A tebibyte is equal to 1024 gibibytes.
My mistake, of course I meant TiB and GiB (*bytes); it's a bad habit of mine. As for powers of 10 vs powers of 2, I believe this may be a difference in regional standards and habits. In the early 90s, when personal computers became common in my country, everybody used the prefixes as powers of 2 (and it was a kind of newbie mark when someone used megas and gigas as if they were standard SI units). There were even governmental recommendations about using k as 1024, and so on. But it's really annoying, and it would be much easier for everyone if powers of 2 were never used.

Used space is the actual space on the disk.
Nice.

ZFS will deduplicate blocks across an entire pool, and does not limit itself to just the dataset that it is in.
Thanks, that was not clear to me. So even if dedup is off for the pool, but there are datasets on it with dedup on, will they share common blocks? And if the pool has dedup on, are all its datasets actually deduped?

Let's imagine that you have two identical files. We copy these files into two datasets with deduplication. ... that dataset will report 0 bytes used.
That's clear, thanks. Exactly like I understood it.

That is only true if the exact same duplicated blocks are available on the destination pool.
So replication of a dataset (not the entire pool) to another pool may lead to bigger or smaller allocated space, depending on the actual blocks in both pools? This leads to a conclusion: we can't get an exact estimate of how much space we would need on the target pool. If the target pool already has most of the blocks, the actually allocated space for the replica will be smaller. If the source dataset used blocks from outside itself and those blocks are not replicated, its replica will be bigger on the target device than the pool statistics showed for the source dataset. Am I correct here?

Apart from theoretical interest, I have a practical task. We have 3 TrueNASes in separate datacenters that we use as storage for our backups (let's say they are 3 Tb each). We are setting up an off-site server which should hold replicas of all 3 servers. How large should its storage be? Now I see that if we replicate all the pools, the target server should have all 3 replicas in one pool for dedup to run across all three replicas. But the resulting storage should not be less than 9 Tb, in case dedup between the datasets turns out not to be effective.

And another question related to replication, since it involves snapshots: do snapshots also share deduplicated blocks with their parent dataset or pool?
 

Nick2253

Wizard
Joined
Apr 21, 2014
Messages
1,633
So even if dedup is off for the pool, but there are datasets on it with dedup on, will they share common blocks? And if the pool has dedup on, are all its datasets actually deduped?
Yes and yes.
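To see it per dataset: dedup is just an inheritable dataset property, while the ratio is tracked per pool (names here are only examples):

zfs set dedup=on tank/backups     # children of tank/backups inherit it
zfs get -r dedup tank             # shows on/off/inherited for every dataset
zpool get dedupratio tank         # the ratio itself is still reported per pool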

So replication of a dataset (not the entire pool) to another pool may lead to bigger or smaller allocated space, depending on the actual blocks in both pools? This leads to a conclusion: we can't get an exact estimate of how much space we would need on the target pool. If the target pool already has most of the blocks, the actually allocated space for the replica will be smaller. If the source dataset used blocks from outside itself and those blocks are not replicated, its replica will be bigger on the target device than the pool statistics showed for the source dataset. Am I correct here?
Yup.

Apart from theoretical interest, I have a practical task. We have 3 TrueNASes in separate datacenters that we use as storage for our backups (let's say they are 3 Tb each). We are setting up an off-site server which should hold replicas of all 3 servers. How large should its storage be? Now I see that if we replicate all the pools, the target server should have all 3 replicas in one pool for dedup to run across all three replicas. But the resulting storage should not be less than 9 Tb, in case dedup between the datasets turns out not to be effective.
There's that pesky Tb again :tongue:. (Again, it's a pet peeve; don't take it personally)

It's not clear that your off-site server should only have one pool. It may be smarter, from a data segregation perspective, to maintain three separate pools on that server. That really depends on your use case, but it's what I would suggest in these kinds of applications. However, if you are limited to only one set of disks, that kind of makes the decision for you.

9TB would, for this example, be your safe lower limit. However, you may well be able to lower that if you can characterize your data well enough to be confident that your dedup ratio between servers would be high. For example, if you're doing full file-system backups of Windows machines, you can be pretty dang confident that the data is highly redundant between servers.

Unfortunately, I know of no easy way to calculate this, other than just testing it.
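The closest thing to a shortcut I can think of is zdb, which can simulate dedup over the data already in a pool and print the ratio it would get. It only looks at one pool at a time (so it won't tell you how well three servers dedup against each other), and it can take a long time and a fair amount of RAM:

zdb -S main    # simulated dedup table histogram and ratio for the pool's existing data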

And another question related to replication, since it involves snapshots: do snapshots also share deduplicated blocks with their parent dataset or pool?
Yes. A good way to think of snapshots is as just pointers to a point-in-time state of the dataset. When ZFS writes updated data, it always performs a copy-on-write: make a copy of the existing data, merge it with the new data, and then write that modified data to disk, leaving the original data untouched. Then, it can mark the original data as unneeded (or overwritable). However, if there's a snapshot, then that snapshot continues to point to that original data, so that original data is left alone, and not marked as overwritable.
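You can see how much space the snapshots themselves are pinning per dataset with the space-accounting properties; for example:

zfs get usedbysnapshots,usedbydataset main/miran-replica   # space held only by snapshots vs. the live data
zfs list -t snapshot -r -o name,used,referenced main/miran-replica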

A consequence of this behavior is that snapshots and deduplication can cause sudden and unexpected ballooning in your dataset. For example, imagine that I have a dozen Windows clients that back up to a server with deduplication. Great, now I'm basically using one client worth of data to store a dozen clients. But if each client gets a snapshot at a different point in time (aka, with different files), then deduplication is useless on those snapshots, and suddenly I'm dealing with 12x that amount of data. Definitely not insurmountable, but something to be mindful of.

A good solution to this problem is to use a fast server to store the most recent week or so of data (maximizing the space saving from small snapshots and deduplication, and therefore saving money on fast storage), and then use a large and slow server to store older backups (maximizing the cost efficiency of huge storage arrays).
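If it helps to picture it, the hand-off between the fast box and the big slow box can be plain send/receive: a full copy first, then only increments (host, pool and snapshot names below are invented):

zfs send fast/backups@weekly-1 | ssh slowbox zfs receive -u big/backups
zfs send -I @weekly-1 fast/backups@weekly-2 | ssh slowbox zfs receive -u big/backups   # -I sends all intermediate snapshots between the two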
 

ubergosu

Cadet
Joined
Feb 26, 2022
Messages
9
Thank you very much for your detailed reply!

It may be smarter, from a data segregation perspective, to maintain three separate pools on that server. That really depends on your use case, but it's what I would suggest in these kinds of applications. However, if you are limited to only one set of disks, that kind of makes the decision for you.

That's my case - I'm rather limited. It is a kind of experiment: we're exploring offsite backups and are not sure how it will look in the end. So I'm not ready to spend two or three times the money on constructing a proper array (mirror or zN pools) until we are sure we can get replication to keep up with new data (the offsite server has a good uplink, but the onsite servers have limited disk performance and constant write load, so the full initial replication can take days). So we'll consider having multiple pools after the proof-of-concept solution works.

9TB would, for this example, be your safe lower limit.

Nice, I came to the same conclusion. Sorry again about the lowercase b :).

For example, if you're doing full file-system backups of Windows machines, you can be pretty dang confident that the data is highly redundant between servers.

Unfortunately, they're Linux (still, 2-3 Gb on each VM is common, yes). These are mostly backups of databases, so most of the occupied space is user data. They compress rather well (3-4 times). We're using deduplication because the user data changes slowly and we take daily diff backups and weekly full backups of the VMs.

I didn't explicitly say this before, but we use fast local SSD drives in the servers with hypervisors where the VMs are running, and these servers do backups to TrueNAS (they export VMs and their diff snapshots to NFS shares on TrueNAS). The VMs do not run from TrueNAS. So on TrueNAS we will have, for each VM, e.g. 2 full backups (last week and the week before it) and 14 diffs. Because the data changes slowly and is mostly being added, not rewritten, the full backups would not have dramatic differences, but the diffs will likely be different from each other. So we expect that deduplication will save space approximately equal to the size of 1 full backup minus 14 diffs. For example, zpool get all shows dedupratio 1.68x on one share, so my calculation is probably correct.

I have a dozen Windows clients that back up to a server with deduplication. Great, now I'm basically using one client worth of data to store a dozen clients. But if each client gets a snapshot at a different point in time (aka, with different files), then deduplication is useless on those snapshots, and suddenly I'm dealing with 12x that amount of data.

You mean that the difference in data between snapshots will be multiplied by 12 (the total size of files/blocks that have been changed/written in the VMs), not that the total size of a VM will be multiplied?

A good solution to this problem is to use a fast server to store the most recent week or so of data

As I've said before, we use SSDs in the servers which run the VMs, and the VMs are backed up to spinning disks on the TrueNASes as VM images. These TrueNASes replicate to the off-site TrueNAS. I appreciate your advice that we should have fast storage on the first TrueNAS to run VMs from it and then replicate to a slower and larger second TrueNAS, but it would require us to change our fault tolerance model and to establish a more advanced network and hardware configuration. We'll consider it after some time, but we aren't ready for it now.
 