Compression and ESXi datastores?


NightNetworks

Explorer
So I was playing around with lz4 compression and iSCSI and discovered that a 20GB virtual machine only consumes 9.5GB when using lz4. Great!!! Not so fast...

As we are all aware, the compression happens on the FreeNAS box and is transparent to the end users/devices. So my question is: can I over-allocate an iSCSI file extent on FreeNAS?

Let's say, for the sake of conversation, that I have a 500GB hard drive and I create a 500GB file extent. I then place a 500GB virtual disk on that file extent. To ESXi that drive appears full; however, back on the FreeNAS box it's really only consuming 250GB of the file extent. The only way I could claim that space and use it in ESXi is if I could then extend the 500GB file extent to 750GB, getting around the fact that ESXi is not aware of the compression happening on the back end. But this would only work if FreeNAS supports over-allocating the file extent.
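For reference, ZFS itself will show the gap between what the initiator wrote and what was physically stored. A minimal sketch, assuming a pool named "tank" with the extent living in a dataset "tank/vmds" (both names are hypothetical):

# "logicalused" = data as written by clients; "used" = space consumed after lz4
zfs get used,logicalused,compressratio tank/vmds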

Thanks!
 
Joined
Oct 2, 2014
Messages
925
Don't fill an iSCSI share over 50%. If you have a 500GB extent, you shouldn't fill it past 250GB; if you have a 1TB extent, don't fill it past 500GB. This is well documented in the FreeNAS manual.
 

NightNetworks

Explorer
Don't fill an iSCSI share over 50%. If you have a 500GB extent, you shouldn't fill it past 250GB; if you have a 1TB extent, don't fill it past 500GB. This is well documented in the FreeNAS manual.

FYI:
The numbers above were just for example purposes, to keep things simple.

My understanding was that the pool/volume itself should not exceed 50% usage, but your post suggests that the extent should not exceed 50%. So, going back to my original question: can I over-allocate the extent and negate the whole 50% limit? For example, if I am using a 500GB hard drive, could I just create two 500GB file extents and only use 250GB on each? That would avoid the 50% usage on each extent and let me use all of the space on the drive. However, I think the 50% limit applies to the whole drive/volume.

Also, can someone please help me understand the fragmentation issue? Half of the information out there suggests it is pretty much the same as disk fragmentation on a Windows 8 PC, except there is no real way of correcting it like there is on a regular PC. The other half suggests it is something completely different and nowhere near the same as the disk fragmentation you have on a Windows PC.

Thanks
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
No, the "50% rule" applies to pools/volumes, not extents. If you make a bunch of extents filled to <50%, but the sum total of that usage results in your pool going over 50%, you'll still have a performance reduction.

So in your example, you can make two sparse zvols of 500GB each, but you'll want to stop filling them when their *compressed* size is roughly equal to 250GB combined. Using your rough 2:1 compression example, that means you can put 250GB on each. So you might as well just fill a single extent to 500GB and reduce the risk of running out of space on your volume.
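For reference, a sparse (over-allocatable) zvol can be created and watched from the shell. A minimal sketch, assuming a pool named "tank" (pool and zvol names are hypothetical):

# -s makes the zvol sparse: no refreservation, so the promised 500G
# can exceed what the pool can actually hold
zfs create -s -V 500G tank/vmstore
# "used" is the physically allocated (compressed) size; watch it, not volsize
zfs get volsize,used,refreservation,compressratio tank/vmstore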

Half of the information out there suggests it is pretty much the same as disk fragmentation on a Windows 8 PC, except there is no real way of correcting it like there is on a regular PC.

I'd say this is the closer analogy. There's no "zfs-defrag", but ZFS can soft-correct fragmentation if it has enough free space for copy-on-write to allocate contiguous blocks. Eventually the "old data" is invalidated in big enough chunks that it leaves contiguous holes.
 

mav@

iXsystems
Over-allocation is a supported feature of FreeNAS, but it should be used with care. By default, FreeNAS tries to keep the user from creating zvols bigger than 80% of the available pool space, since that can easily be a user mistake. But for a qualified user there is a checkbox to force the wanted size, so you can create a sparse extent even bigger than the whole pool, if you know what you are doing.

For its part, FreeNAS provides available-space threshold warnings at both the extent and pool levels, which will start bugging your VM servers every five minutes through VAAI once you reach the configured levels. But it is your duty to monitor those events.

As an additional method of strict protection, you can create a non-sparse zvol/dataset with a space reservation set to some portion (20-50%) of your pool capacity. After that, nothing else can occupy that space on the pool, yet it won't cause pool fragmentation itself, since there is no real data in this zvol/dataset, only a reservation. If you try to use that space, ZFS will return a write error, which is delivered to the VM host and handled there as the VAAI Stun primitive, freezing VMs until you resolve the situation. This method can coexist perfectly with the available-space thresholds, so VM servers will also be notified before the situation gets critical. For example, you can configure your NAS to start sending threshold warnings when 50% is reached and to return a hard error (Stun) at 80%, if that is ever reached.
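A minimal sketch of the reservation trick described above, assuming a pool named "tank" (the dataset name and size are hypothetical):

# set aside space that nothing else on the pool can consume
zfs create -o reservation=100G tank/headroom
# if the pool ever fills and VMs stun, release the headroom to recover
zfs set reservation=none tank/headroom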
 

NightNetworks

Explorer
I'd say this is the closer analogy. There's no "zfs-defrag", but ZFS can soft-correct fragmentation if it has enough free space for copy-on-write to allocate contiguous blocks. Eventually the "old data" is invalidated in big enough chunks that it leaves contiguous holes.

So would it be safe to say that the fragmentation itself is the same in FreeNAS as it is on a Windows PC, and the only real difference is how that fragmentation is then dealt with? And if that is true, then would it not also be true that using SSDs would resolve the fragmentation issue, since SSDs are not defragmented even on a Windows PC? Nor would FreeNAS even have the ability to write contiguous blocks of data, since the SSD's wear-leveling algorithms would override anything FreeNAS tried to do.

The conclusion being that if I were to use an SSD for my iSCSI file extents, fragmentation would theoretically never be a problem. Correct?

Thanks
 

NightNetworks

Explorer
Over-allocation is a supported feature of FreeNAS, but it should be used with care. By default, FreeNAS tries to keep the user from creating zvols bigger than 80% of the available pool space, since that can easily be a user mistake. But for a qualified user there is a checkbox to force the wanted size, so you can create a sparse extent even bigger than the whole pool, if you know what you are doing.

For its part, FreeNAS provides available-space threshold warnings at both the extent and pool levels, which will start bugging your VM servers every five minutes through VAAI once you reach the configured levels. But it is your duty to monitor those events.

As an additional method of strict protection, you can create a non-sparse zvol/dataset with a space reservation set to some portion (20-50%) of your pool capacity. After that, nothing else can occupy that space on the pool, yet it won't cause pool fragmentation itself, since there is no real data in this zvol/dataset, only a reservation. If you try to use that space, ZFS will return a write error, which is delivered to the VM host and handled there as the VAAI Stun primitive, freezing VMs until you resolve the situation. This method can coexist perfectly with the available-space thresholds, so VM servers will also be notified before the situation gets critical. For example, you can configure your NAS to start sending threshold warnings when 50% is reached and to return a hard error (Stun) at 80%, if that is ever reached.

So basically what you're saying is that the risk with over-allocation (fragmentation aside) is that ESXi would not really be aware of the true amount of disk space and may attempt to write data to space that is not really there? In other words, it's OK to use over-allocation, but you need to properly monitor and manage it from ESXi, the VMs, and FreeNAS. Correct?

Also, you mention that "zvols" can be over-allocated. I am not using zvols; I decided to create datasets and use file extents instead of block devices, as I found I got better performance with my setup using this method. So the over-allocation would have to take place at the file extent, not the dataset? Is that possible?

Thanks
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
In theory the impact of fragmentation is significantly reduced on all-flash pools, but in theory, communism works. Your mileage may vary.

There's a line at which ZFS switches to an aggressive "use the free space more efficiently" allocation mode, which will tank performance, and I can't recall if it's 80% or 90% under FreeNAS.
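For reference, both numbers can be checked from the shell. A minimal sketch, assuming a pool named "tank" (the FRAG column needs a reasonably recent pool, e.g. FreeNAS 9.3's spacemap_histogram feature):

# CAP = percent of pool capacity in use; FRAG = free-space fragmentation
zpool list -o name,size,alloc,free,cap,frag tank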

Edit: What version of FreeNAS are you running? Zvols should be faster as of 9.3
 

NightNetworks

Explorer
In theory the impact of fragmentation is significantly reduced on all-flash pools, but in theory, communism works. Your mileage may vary.

There's a line at which ZFS switches to an aggressive "use the free space more efficiently" allocation mode, which will tank performance, and I can't recall if it's 80% or 90% under FreeNAS.

First, thank you so much for the information... really a big help!

I get what you are saying... 80% seems to be the number thrown around a lot in this forum; however, I have read in a number of places outside this site that it has since been raised to 95%. I would say 80% is the safe choice; if you want to risk it, split the difference and go 90%... lol, idk.

FYI:
My current setup has iSCSI running off of SSDs.

Thank you again, very helpful.
 

mav@

iXsystems
So basically what you're saying is that the risk with over-allocation (fragmentation aside) is that ESXi would not really be aware of the true amount of disk space and may attempt to write data to space that is not really there? In other words, it's OK to use over-allocation, but you need to properly monitor and manage it from ESXi, the VMs, and FreeNAS. Correct?
Yes, and Yes.

Also, you mention that "zvols" can be over-allocated. I am not using zvols; I decided to create datasets and use file extents instead of block devices, as I found I got better performance with my setup using this method. So the over-allocation would have to take place at the file extent, not the dataset? Is that possible?
Theoretically you can do over-allocation with file-based extents, but the problem is that they don't support UNMAP right now. That means that when you one day delete some unneeded VMs, it won't get you your pool space back. With zvols it works much better -- VMware still requires you to run the UNMAP process manually from the command line, but it does finally give you your space back. I would not even start to think about serious over-allocation without UNMAP support.
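For reference, the manual UNMAP pass on ESXi 5.5/6.x looks roughly like this, run from an SSH session on the host (the datastore name is hypothetical):

# asks VMFS to issue UNMAP for blocks freed on the datastore
esxcli storage vmfs unmap -l MyDatastore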
 

NightNetworks

Explorer
Yes, and Yes.


Theoretically you can do over-allocation with file-based extents, but the problem is that they don't support UNMAP right now. That means that when you one day delete some unneeded VMs, it won't get you your pool space back. With zvols it works much better -- VMware still requires you to run the UNMAP process manually from the command line, but it does finally give you your space back. I would not even start to think about serious over-allocation without UNMAP support.

Thank you so much for the information; it really is very helpful!

Wait WHAT? ...mind blown...lol

Well, that changes things a bit... So there is no way of getting the space back, even after deleting all of the VMs? What if you removed the whole extent? I am assuming that would give you the space back, correct? And just to confirm: this is only an issue if I start over-allocating resources; otherwise I am safe?
 

mav@

iXsystems
The extent itself does not take space; it is the backing file or zvol that takes it. If you delete the file or zvol, you'll definitely get the space back. But since a VM datastore usually stores multiple VMs at the same time, I would not bet on that scenario. As I said, for zvols there is UNMAP support, which with some effort will let you return freed space without deleting everything and starting from scratch.
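A minimal sketch of checking per-zvol allocation, and of the wholesale reclaim being described (names are hypothetical; destroy is irreversible):

# VOLSIZE is the promised size; USED is what is physically allocated
zfs list -r -t volume -o name,volsize,used tank
# destroying the backing zvol returns all of its space to the pool
zfs destroy tank/vmstore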
 

mav@

iXsystems
Without over-allocation you may be safe from pool overflow. But without UNMAP you will quite likely end up with pool usage higher than needed, which, as mentioned above, may increase fragmentation, etc.
 

NightNetworks

Explorer
So I just checked... I have a 100GB file extent on my ESXi server.

The ESXi server is reporting that 85.34GB of the 100GB is in use, while FreeNAS is reporting that the dataset where the file extent is stored is using 80.1GB. I'm not sure why the dataset is using less than ESXi reports; I would assume it is due to block size or something, as I do NOT have compression turned on for this dataset. Looking at these numbers, it would appear that the scenario you were describing has not started to play out yet. I am going to keep going and see what happens, just for my own curiosity.
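For reference, you can compare the extent file's logical size against what ZFS actually allocated for it. A minimal sketch (paths and names are hypothetical):

ls -lh /mnt/tank/vmds/extent0       # logical size, what ESXi sees
du -h /mnt/tank/vmds/extent0        # blocks actually allocated by ZFS
zfs get used,logicalused tank/vmds  # same comparison at the dataset level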

Thanks though for the heads up!
 

cyberjock

Inactive Account
So basically what you're saying is that the risk with over-allocation (fragmentation aside) is that ESXi would not really be aware of the true amount of disk space and may attempt to write data to space that is not really there? In other words, it's OK to use over-allocation, but you need to properly monitor and manage it from ESXi, the VMs, and FreeNAS. Correct?

You do NOT want to over-allocate... ever. The other thing to keep in mind is that the smaller your blocks are (which can be affected by high fragmentation), the more metadata you'll have to store, which means even more space used by ZFS for tracking the zvol's physical locations. So you'll end up in a feedback loop that will not end well for you or your data, and the only fix is to destroy the zpool and recreate it to undo the massive metadata you'll have accumulated.

Also, you mention that "zvols" can be over-allocated. I am not using zvols; I decided to create datasets and use file extents instead of block devices, as I found I got better performance with my setup using this method. So the over-allocation would have to take place at the file extent, not the dataset? Is that possible?

Thanks

Honestly, I read that and about spit my Coke onto my laptop. There is no way that a file-based extent is faster than a zvol. The difference in performance is staggering, and if you are going to argue that file-based is faster, then your testing conditions must not have been equal. A zvol is something like "up to 240% faster" than file-based.

There are a bunch of other reasons to go with zvols, but just trust us... you want zvols over file-based extents. The opposite was true on pre-9.3 builds, but on 9.3+ you WANT zvols. There is nothing to be gained by going file-based... except that you can copy the extent off the server as a file if you ever need to get off of ZFS.

Oh, and you without a doubt want lz4 compression.

Have you read any of the material in the forums about this? It's almost like you want to do the opposite of everything we call "best practices" for VMs.
 

NightNetworks

Explorer
You do NOT want to over-allocate... ever. The other thing to keep in mind is that the smaller your blocks are (which can be affected by high fragmentation), the more metadata you'll have to store, which means even more space used by ZFS for tracking the zvol's physical locations. So you'll end up in a feedback loop that will not end well for you or your data, and the only fix is to destroy the zpool and recreate it to undo the massive metadata you'll have accumulated.

Honestly, I read that and about spit my Coke onto my laptop. There is no way that a file-based extent is faster than a zvol. The difference in performance is staggering, and if you are going to argue that file-based is faster, then your testing conditions must not have been equal. A zvol is something like "up to 240% faster" than file-based.

There are a bunch of other reasons to go with zvols, but just trust us... you want zvols over file-based extents. The opposite was true on pre-9.3 builds, but on 9.3+ you WANT zvols. There is nothing to be gained by going file-based... except that you can copy the extent off the server as a file if you ever need to get off of ZFS.

Oh, and you without a doubt want lz4 compression.

Have you read any of the material in the forums about this? It's almost like you want to do the opposite of everything we call "best practices" for VMs.


So I am not really sure what to say/ask...

It's not that I am purposely doing the "opposite of what we all call best practices for VMs." If that were the case, I would have just done it and not posted any questions or cared about any of the responses in this thread, including yours. As you are aware, there are a lot of posts on this forum, and some information contradicts other information, as in this very thread: for example, you are saying never to over-allocate resources, but mav@ is saying that it is OK. So I am just trying to gather some information and determine what is best.

Over-allocation:
Are you saying not to over-allocate because of the block-size issue and the amount of metadata that would then be generated? Or are there additional reasons for not doing this?

ZVOL performance:
It would appear that zvol performance is better (or at least should be better), so I will take your advice and re-evaluate that choice. Thanks for the input.

Compression:
What kind of performance impact will the compression have? There seems to be a lot of debate surrounding the real-world performance impact of LZ4 compression; real-time compression with NO performance impact just seems too good to be true?

Thank you in advance!
 

cyberjock

Inactive Account
mav didn't say it was OK; he said it was supported. There are cases where you have free space that the WebGUI might not be able to allocate, so you can override the setting. Likewise, sparse files are dangerous, but there is an option to override. I've seen quite a few users lose everything because they used sparse files and didn't watch them until, one day, all of their VMs went offline. One company had to run a two-day chkdsk and lost something like 80% of their files as a result of file system corruption in the iSCSI extent. Whoops!

I do not recommend over-allocating because it will almost always end badly. You'll get complacent after setting the server up, and a year later you'll be looking at a broken FreeNAS system, not entirely sure what is wrong.

Zvols are king, by a long shot. I have no idea how you came to the conclusion that file-based is better, but I can assure you that is not the case. ;)

LZ4 is a free lunch, assuming you aren't doing something incredibly silly like trying to use a Celeron CPU. Even an old P4 can compress at 500MB/sec and decompress at something like 1.2GB/sec. LZ4 was designed to be a free lunch: it doesn't provide great compression, but some is better than none. That's why LZ4 is the default. ;)
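For reference, enabling LZ4 and checking what it buys you is a one-liner each. A minimal sketch (the dataset name is hypothetical):

# applies only to blocks written after the change; existing data is untouched
zfs set compression=lz4 tank/vmstore
zfs get compression,compressratio tank/vmstore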
 

NightNetworks

Explorer
mav didn't say it was OK; he said it was supported. There are cases where you have free space that the WebGUI might not be able to allocate, so you can override the setting. Likewise, sparse files are dangerous, but there is an option to override. I've seen quite a few users lose everything because they used sparse files and didn't watch them until, one day, all of their VMs went offline. One company had to run a two-day chkdsk and lost something like 80% of their files as a result of file system corruption in the iSCSI extent. Whoops!

I do not recommend over-allocating because it will almost always end badly. You'll get complacent after setting the server up, and a year later you'll be looking at a broken FreeNAS system, not entirely sure what is wrong.

Zvols are king, by a long shot. I have no idea how you came to the conclusion that file-based is better, but I can assure you that is not the case. ;)

LZ4 is a free lunch, assuming you aren't doing something incredibly silly like trying to use a Celeron CPU. Even an old P4 can compress at 500MB/sec and decompress at something like 1.2GB/sec. LZ4 was designed to be a free lunch: it doesn't provide great compression, but some is better than none. That's why LZ4 is the default. ;)

OK, everything you're saying sounds reasonable.

I have already created my zvol and moved everything over to it... then destroyed the file extent. Thanks (it is better)!

I think I may have goofed =( I am using the following CPU: an Intel Celeron G1840 (http://ark.intel.com/products/80800/Intel-Celeron-Processor-G1840-2M-Cache-2_80-GHz). Did I screw up? Please keep in mind that this is a home setup, and I am not running any jails off of the NAS, nor do I plan to (not sure if that makes a difference). I tried out the compression last night and was still able to fully saturate the 1Gbit LAN connection...

Thank you so much!
 