Slow storage/iSCSI performance due to "cbb" devices being 100% utilized

Amit.I

Cadet
Joined
Nov 4, 2019
Messages
6
Hi,
We have been running FreeNAS 11.2-RELEASE-U1 for quite a while.
We have an ESXi host running VMs on a LUN located on the FreeNAS, connected over iSCSI.
Lately, it seems like we have reached some bottleneck in our storage system, causing our VMware VMs to run very slowly.

We tried to identify the root cause of the slowness, but we are still not sure what the problem is.
CPU usage is around 20%
RAM usage is 7GB out of 64GB
The network interface carries ~0.7Gb/s (while the connection to the ESXi is 10Gb/s)
And "iostat -x 3" shows all the SSDs in the pool at ~50% utilization

The one suspicious thing we did find was in iostat, but it wasn't the SSDs.
There are some "cbb" devices, and some of them (in our case cbb2 and cbb3) are at ~100% utilization:
[Screenshot: iostat output showing cbb2 and cbb3 at ~100% busy]

We suspect this is the issue, but have no idea what these devices are.

Sadly, the internet didn't help us much with understanding what these devices are, and they don't even appear in /dev/.
We did manage to work out that there is one "cbb" per iSCSI extent, but we have no idea what that means or how we can improve it.

A debug file of the FreeNAS machine is attached; as far as I understand, it should also contain the hardware information.
Help anyone? :)
 

Attachments

  • debug-cloud-20191112190153.tgz
    1.2 MB

MikeyG

Patron
Joined
Dec 8, 2017
Messages
442
From what I can tell, cbb appears to represent the iSCSI extent. On my setup, I can get above 90% busy on it even when the disks are mostly idle - probably because the 4GB of data I'm using to test is in the ARC. For me, the kind of data being accessed matters: if I test with CrystalDiskMark on a VM, the random-write test (4K Q1T1) keeps cbb %b at around 2%, while the sequential test pushes it above 90%.
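
If you want to confirm which extent each cbb device corresponds to, I believe you can list the CTL LUNs and line them up against your extents - something like this (going from memory, so treat it as a pointer rather than gospel):

ctladm devlist -v

That should print each LUN along with the zvol or file backing it, which you can compare to the cbb numbering in iostat.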

My CPU ends up between 60-80% for the streaming tests, but only about 5% for the random tests. However, I am able to get about 18Gb/s during these tests to my VMs over iSCSI.

What kind of CPU do you have, and what workload is being performed? The %b seems to correlate with CPU and how much data it can process over the connection.

I assume you've also done some iperf testing to validate that your 10Gb connection to your VMs works as expected?
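
If not, something along these lines is a quick sanity check (assuming iperf3 is available on both ends; swap in your own FreeNAS IP):

iperf3 -s                          (on the FreeNAS box)
iperf3 -c <freenas_ip> -P 4 -t 30  (from a VM or test machine)

The -P 4 runs parallel streams, which usually gives a more realistic picture of what a 10Gb link can actually push.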
 

Amit.I

Cadet
Joined
Nov 4, 2019
Messages
6
Hello mgittelman, and thanks for your fast reply!
iperf between a machine on the ESXi and the FreeNAS server shows 8.31 Gbits/sec, so I believe the network connection is healthy.
The CPU is: Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz
And it doesn't seem to be working hard - the load average is ~4.5, the CPU has 20 cores, and the workload seems to be balanced across the cores (no core is running close to 100%).

Any idea?
 

MikeyG

Patron
Joined
Dec 8, 2017
Messages
442
What are those VMs doing? How many of them are there?

If it were me, I'd shut down/pause all the VMs and test one of them doing something specific (I'm using CrystalDiskMark on Windows). But then, I'd be doing that in a lab, not a prod environment.

You could also set up an iSCSI share to a Windows machine directly and test the performance there, especially if you happen to have one on 10GbE. For that matter, if you can, I'd test SMB transfers of large files for comparison.

From your iostat info, I'm assuming you are correct that the SSDs are not overloaded. I usually use gstat to check the busy time of the drives as well (it has pretty colors that make it easy to see). Just to be sure, though, it would be good to know the pool configuration for the SSDs, how many of them there are, and what type they are. Also, this is a reach, but what is the block size on the zvol? I know different sizes can perform differently.
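
For reference, these are the sorts of commands I'd use to pull that information (the pool and zvol names here are just placeholders):

gstat -p
zpool status tank
zfs get volblocksize tank/your-zvol

gstat -p limits the output to physical providers, which keeps the busy column readable on a system with lots of zvols.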
 

Amit.I

Cadet
Joined
Nov 4, 2019
Messages
6
Most of the VMs are Linux machines running some data analysis software, and there are a few hundred of them.
I am afraid we cannot stop all the VMs as this is a production environment, and I don't have any equivalent equipment to set up a lab.

I thought it might be good to map a new disk located on an NFS datastore and compare the performance between the two. I believe this may help us tell whether the problem is related to the iSCSI configuration. What do you think?

The pool consists of two RAIDZ2 vdevs, each with 12 SSDs (the disks are Crucial MX300 2TB).
There are several zvols inside; most of them have a volblocksize of 64K (there is one at 32K, but it's an old one).
By the way, the extents' logical block size is configured to the default, which is 512.
 

MikeyG

Patron
Joined
Dec 8, 2017
Messages
442
I'm the only one who seems to be responding, so you are stuck with my opinion for now, but I'm not convinced running that many VMs on RAIDZ2, even on SSDs, is a good idea. Usually you'd go with mirrors, especially with the MX300, which is not exactly a high-end model. If you are doing data analysis, is all of that data pulled from disk, or are we talking about number crunching? It helps to know the particular disk load of the software you are running.

A block size of 64K is also a little large - mine are all set to 16K, and I think that was the default. This will affect random access speeds the most, which is what really matters for VMs.
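
Keep in mind that volblocksize can only be set when the zvol is created, so the way to test a smaller block size would be to make a new zvol and point a test extent at it - something along these lines (the pool/zvol name and size are just placeholders):

zfs create -s -V 200G -o volblocksize=16K tank/test-16k

Then attach a new iSCSI extent to that zvol and run the same benchmark against it.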

What do disk speed tests within FreeNAS tell you? Knowing how your pool/zvols are configured, I'd skip troubleshooting iSCSI vs NFS performance issues for now.

Your CPU, while not overloaded because of the number of cores, might still get bottlenecked if iSCSI isn't multi-threaded enough, since its clock speed isn't exactly high. I don't know the particulars of how iSCSI works, or whether it spawns multiple processes per connection to spread the load across cores.

Do you have any spare drive bays and/or SSDs where you can set up a test pool that's not under load?

I'm also surprised that RAM usage is 7GB for hundreds of VMs. What is the actual ARC size/usage?
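
If you're not sure, the ARC counters are exposed through sysctl on FreeBSD/FreeNAS - something like this should show the current size and the target maximum:

sysctl kstat.zfs.misc.arcstats.size kstat.zfs.misc.arcstats.c_max

The Reporting graphs in the FreeNAS UI show the same numbers over time, which can be handier for spotting trends.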
 

Amit.I

Cadet
Joined
Nov 4, 2019
Messages
6
Okay, after rereading what I wrote, I realize it might sound much worse than it is. Only a few tens of our VMs really work hard, and the data they analyze is received over the network. They use the disk only to save the results of the analysis.

The block size of 64K is the FreeNAS default - I rechecked that. If the block size were the problem, I would expect iostat to show the disks close to 100% busy, don't you think?

A local disk check on the FreeNAS with "dd if=/dev/zero of=./output bs=5M count=1024" shows about 10-14Gb/s, while the same test on a VM located on the same pool shows 0.272Gb/s. So I find it hard to believe the problem is in the disks.

I don't have spare bays or free drive slots to set up a more sterile environment, but I believe we can test most things on this one.

The used RAM is around 7-8GB, and there is an additional 50GB cached, which is the ARC (as also shown in the FreeNAS UI).

I have got more to tell, but I will do that in another comment so it will be easier to read :)
 

Amit.I

Cadet
Joined
Nov 4, 2019
Messages
6
So we ran a series of tests on the pool today, and they made me think maybe the problem is not necessarily in the FreeNAS server itself, but in the ESXi servers.

We ran the dd command (with conv=fsync this time) and measured the write speed in several ways (the exact command is shown below the list):
1. On a VM's disk located on an iSCSI datastore: ~0.27Gb/s
2. On a VM's disk located on an NFS datastore: ~0.27Gb/s
3. On a VM's *mount* of the same NFS location: ~2.64Gb/s
4. On a VM's disk connected directly to FreeNAS over iSCSI (without a datastore): ~4.4Gb/s
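
For reference, the command on each target was along these lines (same parameters as the local test earlier, with the output path adjusted per test):

dd if=/dev/zero of=./output bs=5M count=1024 conv=fsync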

So maybe the problem is in some ESXi parameters that make its communication with the FreeNAS inefficient.
Do you have any ideas what it could be?

BTW, I didn't mention it earlier: we are running 9 ESXi 6.5.0 hosts that mostly use the same datastores, all on the same FreeNAS server over iSCSI.
 

MikeyG

Patron
Joined
Dec 8, 2017
Messages
442
Is conv=fsync the equivalent of forcing sync all? Not familiar with that. What are the sync settings on the zvol?

Your tests do seem to point to a datastore configuration issue in VMware. Is the network/VLAN path for those datastores configured exactly the same as for the iSCSI tests? It's certainly possible to route your datastore traffic over different networks/vswitches than the network the VM actually sees the FreeNAS box over. I could see a situation where a vmkernel port is attached to the wrong VLAN and everything is sent through a less ideal path. Your earlier iperf tests wouldn't necessarily account for that, as they'd be using the network the VM itself is on.
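
A quick way to check that from an ESXi host (going from memory, so double-check the syntax) is to list which vmkernel ports the iSCSI adapter is bound to and then ping the FreeNAS target through a specific vmkernel interface:

esxcli iscsi networkportal list
vmkping -I vmk1 <freenas_ip>

Swap vmk1 for whichever vmkernel port your storage traffic is supposed to use; if the target is only reachable through an interface on the wrong network, that would explain the slow path.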

Was anything customized when creating those datastores? Maybe create a new zvol/datastore and put a VM on it to test that?

4.4Gb/s (I assume that is bits, not bytes?) to an iSCSI mount isn't that bad, especially for RAIDZ2 with disks in use by other VMs, so it would seem it's not an issue with the FreeNAS box and its capabilities.
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
Using esxtop, look at the KAVG, DAVG, and GAVG numbers. This will help identify where the latency is coming from.
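
Roughly how to read it (key bindings from memory, so double-check against the docs):

esxtop          then press d for the adapter view, or u for the disk device view

DAVG/cmd is latency coming from the storage side, KAVG/cmd is latency added inside the ESXi kernel (usually queuing), and GAVG/cmd is roughly what the guest sees (DAVG + KAVG). High DAVG points at the FreeNAS/network path; high KAVG points at the host or its queue depths.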
 

Amit.I

Cadet
Joined
Nov 4, 2019
Messages
6
Just updating with the developments of the last few days.
Three days ago, out of nowhere, our FreeNAS server got totally stuck and became unresponsive.
Login attempts failed, as the shell just hung after the password prompt on both SSH and the iDRAC console.
We had no option but to force-reboot the server, and it came up normally as if nothing had happened.
There was no clue about the problem in the logs, and the reporting graphs were unavailable because of a bug in the installed version of FreeNAS.

We decided to make lemonade out of the lemons and upgraded the FreeNAS to 11.2-U6.
Since then, we have not encountered the performance problem, though that is probably because it will take a few days until the environment is back to the same VM load as before.

Meanwhile, we are starting to upgrade our ESXi servers to 6.7U3, in the hope that it will solve the performance problem we saw in the connection to the storage.

I will update here again in a few days, after we finish all the upgrades and get back to full usage of the environment, and let you know whether there was an improvement.

Many thanks for your help!
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
Yeah, your performance sounds about right depending on access patterns. Perhaps a little on the slower side of what I would expect, but you're not using striped mirrors, you're using RAIDZ.

Look at my last post about esxtop.
 