High latency with ESXi iSCSI datastores


Jimmy Tran

Dabbler
Joined
Dec 27, 2015
Messages
33
FreeNAS 6 x 4TB raidz2 with two zVols, no SLOG
3 x ESXi mounting zvols via iSCSI


I recently added 3 new VMs into production, and immediately afterward I noticed ALL my VMs starting to crawl. I checked the reports in FreeNAS and everything checked out. I went into vCenter to take a look at the datastore and saw latency ranging from 25 ms to 500 ms.

I used to have my VMs set up on a 4 x 4TB mirrored pool and it worked really well, other than running out of space due to my carelessness with the number of VMs on that datastore and snapshots running too often. I have now gone back to this setup and carefully planned out which VMs will live on this datastore. Only time will tell how it holds up.

Question: Everything was running fine on raidz2 until I brought up 3 more VMs. Once I shut those VMs off, the latency was still there. Is that expected behavior? Would a reboot have fixed the issue? Is there any way to view the latency ESXi sees from within FreeNAS?
 

Mirfster

Doesn't know what he's talking about
Joined
Oct 2, 2015
Messages
3,215
FreeNAS 6 x 4TB raidz2 with two zVols, no SLOG
3 x ESXi mounting zvols via iSCSI
  1. How full is your pool?
  2. How much RAM do you have? (is it still the 4 x 16GB DDR4 ECC REG DIMM = 64 GB)
  3. RaidZ2 is not the recommended model for iSCSI (you were correct when you were originally using mirrored vdevs)
  4. Read: "Why iSCSI often requires more resources for the same result"
  5. This thread may shed some light as well: "Network Services Drop"
  6. You may benefit from a L2ARC (would wait to see what the others say)
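
For point 1, a minimal sketch of how pool occupancy and fragmentation can be checked from the FreeNAS shell ("tank" is a placeholder pool name; substitute your own):

  # Pool-level occupancy, fragmentation and health:
  zpool list -o name,size,allocated,free,capacity,fragmentation,health tank
  # Per-dataset/zvol breakdown, including space held by snapshots:
  zfs list -r -o name,used,avail,refer,usedbysnapshots tank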
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
With RAIDZ2, you are getting the IOPS of the single slowest drive.
I would expect the latency to return to normal after shutting them down, unless you've drastically increased the used space on your iSCSI volume. That could have a lingering impact and would not be fixed by a reboot.
You can see per-drive stats with 'zpool iostat -v tank' and watch ZIL performance with 'zilstat'. I don't know of anything specifically for latency, though.
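
A minimal sketch of how those might be run from the FreeNAS console ("tank" is a placeholder pool name; gstat is a stock FreeBSD tool that also shows per-disk service times):

  # Per-vdev / per-disk throughput and IOPS, refreshed every 5 seconds:
  zpool iostat -v tank 5
  # ZIL (sync write) activity, sampled every 5 seconds:
  zilstat 5
  # Per-disk busy % and per-operation service times (ms/r, ms/w):
  gstat -a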
 

Jimmy Tran

Dabbler
Joined
Dec 27, 2015
Messages
33
  1. How full is your pool?
  2. How much RAM do you have? (is it still the 4 x 16GB DDR4 ECC REG DIMM = 64 GB)
  3. RaidZ2 is not the recommended model for iSCSI (you were correct when you were originally using mirrored vdevs)
  4. Read: "Why iSCSI often requires more resources for the same result"
  5. This thread may shed some light as well: "Network Services Drop"
  6. You may benefit from a L2ARC (would wait to see what the others say)

1. The total pool size is below. I can never understand how to read this. If I'm reading it correctly, out of 10 TiB I'm using 45% after adding up the usage of replica-iscsi and vmdk-isci (what is the term for those two zvols within the raidz2-prod pool?). Interestingly enough, I have moved all data off VMDK-iSCSI but its usage still shows 35%. Why?

[Screenshot: FreeNAS volume list showing pool and zvol usage]


2. I still have 64 GB of RAM and am considering 128 GB.
3. I learned a valuable lesson: benchmark before putting it into production, and plan disk usage better.
4. I wish this had been available before. I did post a question there regarding VAAI. I hope it gets answered, because I may have to buy a SLOG and use NFS.
5. I don't think this one is the culprit. As I'm still running vMotion to move the remaining VMs off the RaidZ2, I'm getting latency alerts from the reads.
6. I would probably upgrade to 128 or 256 GB of RAM before I do this, unless I would benefit more from an L2ARC. (Also will wait to see what others say.)

With RAIDZ2, you are getting the IOPS of the single slowest drive.
I would expect the latency to return to normal after shutting them down, unless you've drastically increased the used space on your iSCSI volume. That could have a lingering impact and would not be fixed by a reboot.
You can see per-drive stats with 'zpool iostat -v tank' and watch ZIL performance with 'zilstat'. I don't know of anything specifically for latency, though.

Latency is even occurring when no VMs are powered on, just running vMotion.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Latency is the symptom of too many writes on a too-full pool that's too fragmented.

[Graph: ZFS write performance vs. pool occupancy, from Delphix]


So what you want is a lot more free space and a lot more vdevs and maybe fewer writes, and your latency issues will disappear. At 45% pool usage you should consider your current storage full-full-FULL and the fact that you're getting latency is just evidence of that.

Revert to mirrors, throw more raw disk space at it, maybe add the RAM, and call it a lesson learned. :smile:
 

Jimmy Tran

Dabbler
Joined
Dec 27, 2015
Messages
33
Latency is the symptom of too many writes on a too-full pool that's too fragmented.

So what you want is a lot more free space and a lot more vdevs and maybe fewer writes, and your latency issues will disappear. At 45% pool usage you should consider your current storage full-full-FULL and the fact that you're getting latency is just evidence of that.

Revert to mirrors, throw more raw disk space at it, maybe add the RAM, and call it a lesson learned. :)

That is exactly what I did. Data is being migrated to a QNAP NAS so I can destroy the RaidZ2 and build a 4-disk mirror. Above, you said more vdevs. Do you mean making my mirror with more than 4 disks, or making 2 mirrors?

Clarification: Is the 50% threshold for the pool or per vdev?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
That is exactly what I did. Data is being migrated to a QNAP NAS so I can destroy the RaidZ2 and build a 4-disk mirror. Above, you said more vdevs. Do you mean making my mirror with more than 4 disks, or making 2 mirrors?

Make three mirror vdevs, or buy two more disks and go so far as 4 mirror vdevs if you can.

Clarification: Is the 50% threshold for the pool or per vdev?

For the pool, but in the long run they're approximately the same thing, since ZFS will try to balance things out over the available vdevs. If you add a fresh vdev to a very full pool, what'll happen is that almost all writes will be sent to that new vdev, since it looks both "very empty" and "very fast". That's a little deceptive, because over time, all that contiguous free space on the new vdev will become fragmented too. In the end, your choices will resemble the graph above. If you give ZFS a lot of free space to work with, it'll be sweet and fast at writes, even though fragmentation might be totally through the roof. But you have to remember that when I say "a lot of free space", I'm not talking 10% or 20% of your total space. I'm talking 50, 75, even 90% of your total space free. So if you need 8TB of actual VM storage, you need to plan to provide a MINIMUM of 16TB of usable pool space, and since that needs to be mirrored, that means a MINIMUM of 32TB of raw disk space.
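
Put as a tiny shell calculation, just to make the arithmetic explicit (the 8TB figure is the example from the paragraph above, not a recommendation):

  NEEDED_TB=8                   # VM data you actually plan to store
  USABLE_TB=$((NEEDED_TB * 2))  # keep the pool at most ~50% full  -> 16 TB of pool space
  RAW_TB=$((USABLE_TB * 2))     # 2-way mirrors halve raw capacity -> 32 TB of raw disk
  echo "roughly ${RAW_TB} TB of raw disk needed"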

Don't tell me how it sucks. I'm just relaying the facts. Our VM server here is built for 52TB of raw disk space (26 x 2TB) in three way mirrors to deliver 7TB of usable space. 128GB of RAM plus 768GB of L2ARC. It's pleasant, and it's a lot cheaper than the commercial alternatives, but no doubt it's a bit pricey.
 

Jimmy Tran

Dabbler
Joined
Dec 27, 2015
Messages
33
Make three mirror vdevs, or buy two more disks and go so far as 4 mirror vdevs if you can.
If you add a fresh vdev to a very full pool, what'll happen is that almost all writes will be sent to that new vdev, since it looks both "very empty" and "very fast". That's a little deceptive, because over time, all that contiguous free space on the new vdev will become fragmented too.

Would ZFS start writing data to the new vdev even if that vdev is not presented to ESXi via iSCSI?

I think I understand you. You are recommending I take my 12 bays and go with 3 vdevs that are striped mirrors? That would give me 24TB of usable space out of my 48TB raw.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
A "virtual device" (vdev) is just a component device of the storage that makes up your pool. Your 6-drive RAIDZ2 is a single vdev. You could add another 6-drive RAIDZ2 to your pool, for a total of 12 drives, in 2 vdevs.

If you add a vdev to your pool, ZFS will use it as it sees fit. ZFS will typically do something similar to striping data across all available vdevs, making things faster for you. You have no control over that, or what things get allocated out of that vdev when writing to the pool. (You CAN have multiple pools. There are ups and downs to that).
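
As a hedged sketch only (device names are made up, and adding a vdev is permanent, so double-check before running anything like this), extending a pool with another vdev looks roughly like:

  # Show the current vdev layout of the pool:
  zpool status tank
  # Add a second 6-disk RAIDZ2 vdev (example device names):
  zpool add tank raidz2 da6 da7 da8 da9 da10 da11
  # Or, on a mirror-based pool, grow it one mirror pair at a time:
  zpool add tank mirror da6 da7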

The zvols that you create on a pool will write data to all component vdevs of the pool. The algorithm isn't actually striping, so "striped mirrors" is a poor description of what it is actually doing. If you have mirror vdevs, the vdevs are indeed mirrors, but the action across multiple devices is only loosely "striping" in the case where everything else is roughly equal. If you take a full pool with one vdev and add a second vdev to it, approximately 100% of the writes to the pool will hit vdev #2 until things become more balanced, at which point things will start to slowly get maybe 10% written to vdev #1 and 90% to vdev #2, and then as it gets balanced, you slowly return to a roughly 50/50 situation. But RAID0-style striping doesn't exist in ZFS. It stores an entire block on one vdev, period, end of story. Any use of language such as "striped vdevs" should be taken to mean "intelligently balanced across vdevs depending on various characteristics including write performance and space availability."

I'm unclear on what your actual disk layout is, but I can tell you that you're best off with mirrors, and the more, the better. Defining the word "usable" to mean space that we actually intend to use and can use safely without worrying about ZFS performance tanking horribly, there's no way to create a mirror array out of 48TB of raw disk that winds you up with 24TB usable. You can mirror 48TB of raw space into a pool that reports 24TB of available space, which delivers you about 6-12TB of usable space, depending on where on the above graph you wish to end up.
 

Jimmy Tran

Dabbler
Joined
Dec 27, 2015
Messages
33
OK, now I am thoroughly confused. I know what a vdev is and I know what a pool is, but I'm missing something here.

This is what I have:
[Screenshot: FreeNAS volume list showing three volumes with different layouts]


What do you recommend I do for the best performance?
1. What is the best way to set up mirrors? Do I destroy everything and start making 2-disk mirrors? Then I would show 6 vdevs, right? Then ZFS will handle distributing the data across them.
2. Do I have 3 pools here, or do I have 3 vdevs in one pool?

Help? All my data is almost moved off and I'll be ready to recreate this once I understand what I need to do.
 

Rand

Guru
Joined
Dec 30, 2013
Messages
906
- I'd say 3 pools, as you have created three items with specific layouts (Z2/Mirror/Stripe according to the graphic). ;)

- To get a clean start, destroying and re-creating will be best.
And what @jgreco said was:
Running 6 mirror vdevs for your 12 disks will provide the best performance. That's 48 TB raw, 24 TB <available> space.
Make sure to keep utilization under 50% of total space (i.e. around 12 TB <usable> space) to keep the performance up.
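
A rough sketch of what that layout would look like at creation time (device names are placeholders, and in FreeNAS you would normally build this through the Volume Manager GUI rather than the shell; this also destroys anything on those disks):

  # 12 disks arranged as 6 two-way mirror vdevs:
  zpool create tank \
    mirror da0 da1  mirror da2 da3  mirror da4 da5 \
    mirror da6 da7  mirror da8 da9  mirror da10 da11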
 

Jimmy Tran

Dabbler
Joined
Dec 27, 2015
Messages
33
- I'd say 3 pools, as you have created three items with specific layouts (Z2/Mirror/Stripe according to the graphic). ;)

- To get a clean start, destroying and re-creating will be best.
And what @jgreco said was:
Running 6 mirror vdevs for your 12 disks will provide the best performance. That's 48 TB raw, 24 TB <available> space.
Make sure to keep utilization under 50% of total space (i.e. around 12 TB <usable> space) to keep the performance up.

OK, thanks for the clarification. So I have 3 pools; got it. Now everything makes sense.

If I use all 12 disks in mirrors, in one pool, I would have 24 TB available but need to stay under 50% of that (12 TB) for usable space.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Yes. Didn't realize you had multiple pools. That's ... unusual.
 

Jimmy Tran

Dabbler
Joined
Dec 27, 2015
Messages
33
Yes. Didn't realize you had multiple pools. That's ... unusual.
I like to keep my core production VMs on a different pool than Veeam replicas and other VMs that don't need fast storage.

If I get a nice PCI SLOG, would I benefit if I stay with iSCSI? What if I switched over to NFS after a SLOG is installed?
 

mav@

iXsystems
iXsystems
Joined
Sep 29, 2011
Messages
1,428
VMware's iSCSI initiator doesn't generate synchronous writes, so it won't use a SLOG by default. At the same time, if your data are very critical, you may force SLOG use by setting sync=always, which will give an effect close to VMware's NFS default. The opposite is also possible -- if your data are not very critical, you may set sync=disabled for specific datasets and use VMware's NFS without a SLOG.
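
For example, a minimal sketch of how those settings might be applied (pool and zvol/dataset names are placeholders):

  # Force sync writes (and therefore SLOG use) for an iSCSI zvol:
  zfs set sync=always tank/iscsi-zvol
  # Or, if the data is not critical, disable sync on an NFS dataset used without a SLOG:
  zfs set sync=disabled tank/nfs-vms
  # Verify the current setting:
  zfs get sync tank/iscsi-zvol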
 

mav@

iXsystems
iXsystems
Joined
Sep 29, 2011
Messages
1,428
I like to keep my core production VMs on a different pool than Veeam replicas and other VMs that don't need fast storage.

By splitting your disks into several pools, you are splitting the performance into pieces. That may be fine if you want to isolate different loads, but it also means that you will almost certainly use only a small piece of the performance you would get if you had one big striped pool.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
By splitting your disks into several pools, you are splitting the performance into pieces. That may be fine if you want to isolate different loads, but it also means that you will almost certainly use only a small piece of the performance you would get if you had one big striped pool.

That depends. In a VM environment, you may be running backups for a majority of the day, in which case, you're both gaining space (RAIDZ2 is more efficient spacewise) and isolating performance-killing behaviours to a different pool.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
If I get a nice PCI SLOG, would I benefit if I stay with iSCSI? What if I switched over to NFS after a SLOG is installed?

If you are already doing asynchronous writes, there is no case where switching to a SLOG and sync writes would be faster.
 

Jimmy Tran

Dabbler
Joined
Dec 27, 2015
Messages
33
Thanks for the feedback, guys. Let's see how my pool performs now!
 

Jimmy Tran

Dabbler
Joined
Dec 27, 2015
Messages
33
Looks like I now have latency on my 4-disk mirror. It seems to happen only during my backups. Two of the hosts came back with alerts about latency, shown below. I have monitoring software where the thresholds are set, so those can be adjusted as needed, but here are my questions, because these numbers don't really mean much to me. I don't have any benchmarks to go off of.

1. Would throwing more disks at this mirror solve this issue?
2. Would changing it to faster disks help?
3. Are the numbers below something to be concerned about?

Host 1
Read Latency: 44.0 msec
Write Latency: 12.0 msec
Disk Commands: 4871
Aborted Disk Commands: 0
Data Transferred: 6.25 MB
Data Received: 69.42 MB
Bus Resets: 0
Device Latency: 39.0 msec
Kernel Latency: 0.0 msec
Queue Latency: 0.0 msec
Total Latency: 40.0 msec

Host2
Read Latency: 0.0 msec
Write Latency: 2.0 msec
Disk Commands: 977
Aborted Disk Commands: 0
Data Transferred: 600.59 KB
Data Received: 182.79 MB
Bus Resets: 0
Device Latency: 0.0 msec
Kernel Latency: 40.0 msec
Queue Latency: 0.0 msec
Total Latency: 40.0 msec
 