"Usable capacity" is wrong. It's not slop.

Sawtaytoes

Patron
Joined
Jul 9, 2022
Messages
221
I noticed some size discrepancies that I'd like to figure out regarding ZFS. From what I'm seeing, it's eating terabytes (tebibytes) of disk space.

I have 2 x 2TB drives. That's 1.82 TiB per drive. If I mirror them, I'd expect to get 1.82 TiB of usable capacity:
[screenshot]


I understand that there's 2GB of swap space reserved, but that's a negligible 0.002TB.
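Here's the arithmetic, as a quick Python sketch (the only assumption is that the 2GB swap comes off the raw drive size):
Code:
# TB -> TiB conversion plus the swap deduction
TB  = 10**12   # drives are sold in decimal terabytes
TiB = 2**40    # TrueNAS reports binary tebibytes
GB  = 10**9

drive = 2 * TB
print(drive / TiB)              # ~1.82 TiB per drive
print((drive - 2 * GB) / TiB)   # ~1.82 TiB -- swap barely moves the number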

We can verify that with the OpenZFS calculator (https://www.truenas.com/community/resources/openzfs-capacity-calculator.185/):

[screenshot]


It says because of slop (whatever that is), I should only get 1.76 TiB of usable capacity, with about 1.4 TiB usable if I avoid going over 80% capacity.
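For reference, here's roughly where that 1.76 TiB seems to come from: a 1/32 "slop" reservation (spa_slop_shift=5 by default). This is my own sketch; the 128 MiB floor and the 128 GiB cap in newer OpenZFS are from memory, so verify against your version's spa_get_slop_space().
Code:
MiB, GiB, TiB = 2**20, 2**30, 2**40

def slop_bytes(pool_bytes, slop_shift=5):
    # 1/32 of the pool, clamped -- the exact floor/cap are assumptions, see above
    return min(max(pool_bytes >> slop_shift, 128 * MiB), 128 * GiB)

mirror = int(1.82 * TiB)
usable = mirror - slop_bytes(mirror)
print(usable / TiB)        # ~1.76 TiB, matching the calculator
print(0.8 * usable / TiB)  # ~1.41 TiB if you stop at 80%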

But what do I actually see?

[screenshot]

1.32 TiB of usable capacity. WHAT? That's 73% of the total capacity. What's going on here?

TrueNAS will start emailing me when I get close to 80%, but I'm already down 27%! Meaning, by the time I get the 80% warning, I'm at only 58% of the raw drive capacity.

I saw an article about all the space taken from a zpool, including slop space taking up 3.125%. It's by the same guy who made that calculator:

[screenshot]


Even with the right numbers, none of this adds up.

It's worse with my dRAID zpools. I'm losing ~10 TiB for every 100 TiB (after applying the dRAID calculation to my drives' TiB capacity); that's nearly 10%!

Even zpool get size doesn't report the sizes I'd expect:
Code:
# zpool get size
NAME          PROPERTY  VALUE  SOURCE
Bunnies       size      329T   -
TrueNAS-Apps  size      1.36T  -
Wolves        size      511T   -
boot-pool     size      55G    -

All these numbers are off. Wolves is made up of 60 x 9.1 TiB drives (546 TiB), but with its dRAID2:5d:15c:1s config, it should be 364 TiB. I can see losing a few TiB to rounding, but the available space is 332 TiB (a loss of 32 TiB)!
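For reference, the 364 TiB figure is just this back-of-the-envelope dRAID math (it ignores slop, metadata, and per-block allocation rounding):
Code:
# 60 x 9.1 TiB drives laid out as 4 x draid2:5d:15c:1s
def draid_usable_tib(drive_tib, parity, data, children, spares, vdevs):
    # distributed spares come off the top; of the rest, data/(data+parity) is usable
    return vdevs * (children - spares) * drive_tib * data / (data + parity)

print(draid_usable_tib(9.1, parity=2, data=5, children=15, spares=1, vdevs=4))  # ~364 TiB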

[screenshot]


TrueNAS will error at 80% usage, but I've already lost 9% off the top. So that leaves me with only 266 TiB at 80%, where I should've had 291 TiB. That's still a 25 TiB difference, not counting the ~10 TiB of slop.

What's going on here? Where'd all my drive capacity go?
 

nemesis1782

Contributor
Joined
Mar 2, 2021
Messages
105
I was going to post that it looks good on my system. However, it looks like I also have a ~9% loss, both on RAIDZ1 pools consisting of a single vdev and on pools consisting of single-disk vdevs.

Before all hell breaks loose: the single-disk vdevs are by design, and I understand and accept the risk since it's ephemeral data.
 

Sawtaytoes

Patron
Joined
Jul 9, 2022
Messages
221
I read these two related threads:
  1. https://www.truenas.com/community/threads/zfs-overhead-question.54534/
  2. https://www.truenas.com/community/threads/misaligned-pools-and-lost-space.40288/
Not one person explains this issue. They're talking about how files take up more space, but that has nothing to do with the usable capacity. I'll explain why.

ashift
ashift might affect the total size of a vdev.

I gained 160GiB by changing from 512-byte to 4KiB sectors on each of my 10TB HDDs (from 8.91TiB to 9.1TiB).

If that's true, wouldn't using an ashift of 12 actually get you more capacity?

Or maybe it's that a 1kB file takes up 4kB. In that case, it would only affect file sizes, not usable capacity.
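To illustrate that last point, here's a trivial sketch of sector rounding; it changes how much a small file consumes once written, not the capacity of an empty pool:
Code:
import math

def allocated(file_bytes, ashift):
    # every file occupies whole sectors, so small files round up
    sector = 1 << ashift
    return math.ceil(file_bytes / sector) * sector

print(allocated(1024, 9))   # 1024 bytes with 512-byte sectors (ashift=9)
print(allocated(1024, 12))  # 4096 bytes with 4KiB sectors (ashift=12)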

recordsize
recordsize has nothing to do with total capacity. It's related to how much space each file takes up relative to metadata, but the capacity of your zpool doesn't change whether or not you add metadata vdevs.

Also, each dataset, and even each file, can have different recordsizes. You can even change the recordsize after creating a dataset, and it won't change your usable capacity.

Slop space
As the first poster in those threads found, he's losing 1TiB that's unaccounted for by slop space. I'm losing 20TiB and 30TiB in my pools.

It gets worse as your zpool grows, so something's taking away a percentage that no one seems to know about. It's not slop space.

RAID-Z 2^n+p
Sure, RAID-Z has some sort of 2^n+p equation that might affect the total capacity, but so what? The usable capacity is still missing even with a mirror, so RAID-Z has nothing to do with it.
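For anyone curious where the oft-quoted 3.125% RAID-Z overhead comes from, here's a rough Python paraphrase of the allocation rounding (patterned, from memory, on the asize calculation in OpenZFS's RAID-Z code). Note that it's padding applied to written blocks, which is exactly why it can't explain an empty mirror's missing capacity:
Code:
import math

def raidz_alloc_bytes(psize, ashift, nchildren, nparity):
    sector = 1 << ashift
    data = math.ceil(psize / sector)                           # data sectors
    parity = nparity * math.ceil(data / (nchildren - nparity)) # parity sectors
    total = data + parity
    total = math.ceil(total / (nparity + 1)) * (nparity + 1)   # pad to a (p+1) multiple
    return total * sector

# 4-wide RAIDZ2, ashift=12, 128KiB record:
alloc = raidz_alloc_bytes(128 * 1024, ashift=12, nchildren=4, nparity=2)
ideal = 128 * 1024 * 2          # data plus an equal amount of parity
print(alloc / ideal - 1)        # 0.03125 -> the 3.125% figure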
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
This topic has been discussed quite a bit. I'm fairly certain there was a solid answer but maybe that was just in my mind.
It says because of slop (whatever that is)
Slop is the unused bytes in a block. Example (do not take this completely literally): if you have some data that consumes 515 bytes and data is written in 512-byte blocks, the location where you write the data will hold the first 512 bytes, then you still have to write 3 more bytes to reach 515 bytes total. Hopefully you are following me so far. These two 512-byte blocks now consume 1024 bytes in total, yet the data is only 515 bytes. You cannot use the 509 bytes left over for anything; they become slop. This is just an example to demonstrate what slop is and how it can impact the total capacity. Think about how this can affect 4K blocks; it can add up quickly with Advanced Format drives, hence ashift has a role here.

If you can picture the physical storage as blocks, how the bytes are organized, and how many blocks are needed to store the data, you should be able to visualize what slop is.

Enjoy your conversation about ZFS, Zpool, capacities, and slop.
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
I'm fairly certain there was a solid answer but maybe that was just in my mind.
Can confirm this, but don't actually recall where.
 

Sawtaytoes

Patron
Joined
Jul 9, 2022
Messages
221
This topic has been discussed quite a bit. I'm fairly certain there was a solid answer but maybe that was just in my mind.

Slop is the unused bytes in a block. Example (do not take this completely literally): if you have some data that consumes 515 bytes and data is written in 512-byte blocks, the location where you write the data will hold the first 512 bytes, then you still have to write 3 more bytes to reach 515 bytes total. Hopefully you are following me so far. These two 512-byte blocks now consume 1024 bytes in total, yet the data is only 515 bytes. You cannot use the 509 bytes left over for anything; they become slop. This is just an example to demonstrate what slop is and how it can impact the total capacity. Think about how this can affect 4K blocks; it can add up quickly with Advanced Format drives, hence ashift has a role here.

If you can picture the physical storage as blocks, how the bytes are organized, and how many blocks are needed to store the data, you should be able to visualize what slop is.

Enjoy your conversation about ZFS, Zpool, capacities, and slop.
If this explanation of slop is correct, I understand how ashift can negatively affect the number of small files you can write, but it shouldn't affect the usable capacity before you have any files written.

And it can't just be RAID-Z or dRAID, because mirrors have the same issue. I'm sure stripes do too, but I haven't tested.

But then why would stripes and mirrors be affected? You haven't written any data to the mirror, so why would the usable capacity be any lower?

Here's a simple mirror example (the calculator says it should be 3.64TiB, the full size of the drive):
[screenshots]


But in actuality, it has less space (96.4%), and then you're only supposed to use up to 80% of that smaller number, which means you can really use only 77.1%:

[screenshot]

The way you described slop, it's not possible for this number to drop before any data is written. With RAID-Z, you could at least calculate it, but mirrors and stripes should be unaffected.

I understand 2GiB is partitioned off, but that number is negligible when we're losing ~23GiB.
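For what it's worth, my own arithmetic with just the 2GiB swap and an assumed 1/32 slop reservation gets close to, but not exactly onto, the 96.4% and 77.1% above:
Code:
GiB, TiB = 2**30, 2**40

raw = 3.64 * TiB
after_swap = raw - 2 * GiB
after_slop = after_swap * (1 - 1 / 32)   # assuming the default 1/32 slop reservation
print(after_slop / raw)                  # ~0.968 vs. the 96.4% reported
print(0.8 * after_slop / raw)            # ~0.775 vs. the 77.1% reported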

I created the same zpool with a single 3.64TiB drive:
[screenshot]

Same usable capacity as the mirror. What's going on here?
 

Sawtaytoes

Patron
Joined
Jul 9, 2022
Messages
221
The new calculator is completely different from the old one, but it includes dRAID values now:

[screenshots]


What am I missing? Where did my 25TiB go?
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
What am I missing?
About 25TiB.

Too soon? I was trying to make a funny.

Let us know when you are as bald as I am and have just accepted the values reported. You are taking the values from the calculators too literally. These are estimates. Also, the 80% rule is there to keep the system fast, but once you hit 90%, write behavior changes and it all slows down considerably. It's easy to go from 80% to 90%, so it is nice to have a warning and a 10% buffer to address storage concerns.

Keep searching, hopefully you will find the answer that satisfies your curiosity.
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
ZFS keeps a certain amount of space reserved in order to keep the pool from completely stopping even at 100% capacity. There is also space reserved for the ZIL, iirc.
But I would guess it's not as big as 25TiB.


Maybe I found the thread.
 

nemesis1782

Contributor
Joined
Mar 2, 2021
Messages
105
I don't really care about the lost storage space, tbh. When I started with ZFS I knew it would come at a high cost.

I do care that no one seems to be able to give an actual answer to many questions, that there is no proper/usable documentation (as far as I know), that the GUI doesn't inform the user about where the storage actually goes, and worst of all that you can do things in the GUI that break things, which could easily be avoided.

I personally thought it had to do with the way ZFS works, and that the loss breaks down as follows:
- I remember ZFS needing quite a bit of space for housekeeping: keeping references to the files, managing versions, hashes and the like. Forgive my brief and inaccurate statement here. From what I read, this takes up ~3%.
- A pool on FreeBSD seems to keep ~1.5% reserved, and on Linux ~3%, to make sure you never fill it to 100%. Otherwise, bad times.
- There is apparently a RAIDZ overhead; I found a few mentions of it. Apparently a 4-disk RAIDZ2 with ashift=12 (4K sectors) has a 3.125% overhead, which can be reduced to 0.19% by setting a 1MiB recordsize. I still need to verify this, so take it with a grain of salt. I'm also not sure whether that would be counted against the usable space. SirMaster (https://www.reddit.com/r/zfs/comments/4sqp5i/raidz_pool_has_less_space_than_expected/) posted a spreadsheet detailing ZFS overheads (https://docs.google.com/spreadsheet...J-Dc4ZcwUdt6fkCjpnXxAEFlyA/edit#gid=804965548), and here is another from the same thread, from Matt Ahrens (https://docs.google.com/spreadsheet...jHv6CGVElrPqTA0w_ZY/edit?pli=1#gid=1224630924).

In total, we get to around 9% on a Linux system, if what I wrote is correct. I'm sorry for providing partial information without much proof. I'll see if I can find the time to dig deeper, and I'll follow this thread closely as I am quite curious.
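Adding those rough numbers up (all three values are the unverified estimates above):
Code:
housekeeping = 0.03     # ~3% metadata / housekeeping (unverified, see above)
reservation  = 0.03     # ~3% pool reservation on Linux (unverified, see above)
raidz_pad    = 0.03125  # the 3.125% RAIDZ2 allocation-padding figure (unverified)
print(housekeeping + reservation + raidz_pad)  # ~0.091, i.e. "around 9%"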

Update: Davvo has updated his previous post; please see the thread he linked instead of what I wrote. I left the content for posterity.
 

nemesis1782

Contributor
Joined
Mar 2, 2021
Messages
105
Let us know when you are as bald as I am and have just accepted the values reported. You are taking the values from the calculators too literally. These are estimates. Also, the 80% rule is there to keep the system fast, but once you hit 90%, write behavior changes and it all slows down considerably. It's easy to go from 80% to 90%, so it is nice to have a warning and a 10% buffer to address storage concerns.
I'm sorry to be the bearer of bad news. Not everyone goes bald... I know as a balding middle aged man I feel your pain.

If you want I have a father in law who has a full head of hair. Beating him up might make us feel better :P
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
My father had a full head of hair, grandfather didn't. Mine is virtually all gone as well.
Every engineer goes bald, those who don't aren't true engineers!!1!11
I wish I wasn't an engineer now :grin:
 

Sawtaytoes

Patron
Joined
Jul 9, 2022
Messages
221
ZFS keeps a certain amount of space reserved in order to keep the pool from completely stopping even at 100% capacity. There is also space reserved for the ZIL, iirc.
But I would guess it's not as big as 25TiB.


Maybe I found the thread.
I thought I already responded to those points. It can't be what's in that thread because those things affect filesizes, not capacity.

Post in thread '"Usable capacity" is wrong, and it's not slop' https://www.truenas.com/community/t...-is-wrong-and-its-not-slop.114860/post-797461

I don't really care about the lost storage space, tbh. When I started with ZFS I knew it would come at a high cost.

I do care that no one seems to be able to give an actual answer to many questions, that there is no proper/usable documentation (as far as I know), that the GUI doesn't inform the user about where the storage actually goes, and worst of all that you can do things in the GUI that break things, which could easily be avoided.

I personally thought it had to do with the way ZFS works, and that the loss breaks down as follows:
- I remember ZFS needing quite a bit of space for housekeeping: keeping references to the files, managing versions, hashes and the like. Forgive my brief and inaccurate statement here. From what I read, this takes up ~3%.
- A pool on FreeBSD seems to keep ~1.5% reserved, and on Linux ~3%, to make sure you never fill it to 100%. Otherwise, bad times.
- There is apparently a RAIDZ overhead; I found a few mentions of it. Apparently a 4-disk RAIDZ2 with ashift=12 (4K sectors) has a 3.125% overhead, which can be reduced to 0.19% by setting a 1MiB recordsize. I still need to verify this, so take it with a grain of salt. I'm also not sure whether that would be counted against the usable space. SirMaster (https://www.reddit.com/r/zfs/comments/4sqp5i/raidz_pool_has_less_space_than_expected/) posted a spreadsheet detailing ZFS overheads (https://docs.google.com/spreadsheet...J-Dc4ZcwUdt6fkCjpnXxAEFlyA/edit#gid=804965548), and here is another from the same thread, from Matt Ahrens (https://docs.google.com/spreadsheet...jHv6CGVElrPqTA0w_ZY/edit?pli=1#gid=1224630924).

In total, we get to around 9% on a Linux system, if what I wrote is correct. I'm sorry for providing partial information without much proof. I'll see if I can find the time to dig deeper, and I'll follow this thread closely as I am quite curious.

Update: Davvo has updated his previous post; please see the thread he linked instead of what I wrote. I left the content for posterity.
This is the only explanation that makes more sense, but it's still not enough because, like you said, there's no source.
 

nemesis1782

Contributor
Joined
Mar 2, 2021
Messages
105
It can't be what's in that thread because those things affect filesizes, not capacity.
I'm not sure about that. I'll see if I have some time to dive in, because I'd like to know as well. No promises though ;)
 