High latency with ESXi iSCSI datastores


jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
That seems like a small sample size, only a few thousand disk commands.

The 44ms read latency is about the only thing that I think might be worth worrying about. ZFS is a CoW filesystem, and writes to the VM virtual disk tend to cause fragmentation. A VM backup strategy that pulls a linear read of the entire vmdk will end up doing a lot of seeking on a well-established VM (one that's had lots of random writes everywhere). Some backup software even tries that in parallel.

In ZFS we try to mitigate the resulting fragmentation (and the extra seek/read operations) by adding ARC and then L2ARC. Complicating this in your case might be the multiple pools, but in general the way we "fix" fragmentation is through caching, so that's probably part of the answer, especially since the main problem I see is read latency. So if and when you decide you have a problem, you can consider trying to fix it through caching by adding a 256GB SSD for L2ARC, or 64GB more RAM, or both.

More disks may also help, if the pool is being driven to unreasonably busy levels during backups. The problem is that your typical hard disk just isn't really capable of a ton of IOPS, so adding more of something that might be way too slow means you've got a vaguely less-slow thing, which isn't necessarily a fix. This requires some analysis and contemplation on your part.
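
If you want some hard numbers before spending money, a rough way to watch the pool during a backup window (substitute your actual pool name for "tank"; the names here are just placeholders):

% zpool list -v tank      # the FRAG column gives a rough per-vdev fragmentation figure
% zpool iostat -v tank 5  # per-vdev ops and bandwidth, refreshed every 5 seconds
% gstat -p                # per-disk %busy; disks pinned near 100% during backups are the bottleneck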
 

Jimmy Tran

Dabbler
Joined
Dec 27, 2015
Messages
33
Ok, so RAM and L2ARC are both possibilities. What recommendations do you have for L2ARC? Hopefully it isn't as expensive as the SLOG ones...

How do we fix fragmentation? Periodically migrate the VM's?

Does NFS have this fragmentation issue?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
First, yes, anything that causes random block-style write/rewrite access to a ZFS system will fragment over time. This has nothing to do with NFS or iSCSI; a database stored on a local ZFS filesystem will exhibit it just as readily. We just tend to think about iSCSI and NFS because these are commonly used with hypervisors for VM disk storage.

The good news is that you can literally go and get the cheapest SSD's that you can find and they're almost certainly faster than your HDD pool. You may not want to actually do THAT, but the special magic characteristic that you want for L2ARC is the random seek speed.

Two SSD's are better than one. You'll want to make some tweaks to the L2ARC eviction process; I still suggest the numbers from this post:

https://forums.freenas.org/index.ph...ed-from-raid-iscsi-and-afp.40102/#post-250945
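
(For reference, those tweaks are sysctl tunables controlling how aggressively the L2ARC gets fed; the names below are the relevant knobs, but take the actual values from the linked post rather than from me. On FreeNAS they go under System -> Tunables.

vfs.zfs.l2arc_write_max    # bytes per feed interval the L2ARC feed thread writes during normal operation
vfs.zfs.l2arc_write_boost  # higher feed rate allowed before the ARC has warmed up after boot
vfs.zfs.l2arc_noprefetch   # setting 0 allows prefetched/streaming data to be cached in L2ARC
)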

Be aware that the sizing of ARC to L2ARC is a tricky thing. We normally recommend 1:4 as a safe starting point, so if you have 64GB, add a 256GB L2ARC. But the trick here is that you MIGHT be able to go farther than 1:4, such as 1:8. So add a single 256GB SSD today and then let it run a week under load. If the ARC isn't stressed, you can add another. Or add more RAM. Or both.
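
Adding it from the shell is a one-liner if the GUI gives you trouble (pool and device names here are placeholders; check what the SSD shows up as with "camcontrol devlist" first):

% zpool add tank cache ada4
% zpool remove tank ada4   # cache devices can also be removed again at any time, harmlessly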

For SATA use, the inexpensive Intel 535 or Samsung 850 EVO units are probably fine. Do be aware that you may burn through their TBW ratings faster than the warranty says, but losing an L2ARC device isn't fatal, and if you manage to kill it in three years, a replacement will be twice as hardy at a quarter of today's prices.

If you really want to hit the turbo button, the NVMe based Samsung 950 Pro is really nice. It is capable of about 3x the speed of a 6Gbps SATA SSD on linear reads (1.6GBytes/sec) and the price isn't bad. The Intel 750 is even better (2.3GBytes/sec).

Migrating VM's back and forth may or may not reduce fragmentation. If you do enough of it often enough, yes, it reduces fragmentation, but boy what a pain. I tell ya, the VM filer here has about 7TB of VM storage on it, and it seems easier to me just to throw 1TB of L2ARC at it and let it cache everything it is making good use of, and let it hit the pool for the rest.
 

Jimmy Tran

Dabbler
Joined
Dec 27, 2015
Messages
33
I'm looking to get the Intel 750. Do you recommend getting that or upping the RAM to 128GB or both?

All I see is a 400GB model. Is that still recommended if I have 64GB or 128GB of RAM?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
The conventional wisdom around here has been a ratio of about 1:4 or 1:5 ARC:L2ARC if you "don't know". That's still a good idea, but the realities of practical system design don't always give you just what you want.

The 400GB will likely be okay at 64GB, but there's a small chance that, if it stresses the ARC, you might have to detach it and create a smaller partition for the L2ARC (~250-300GB).
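
If you do end up there, the shuffle is roughly this (assuming the 750 shows up as nvd0; use whatever name "zpool status" reports for the cache device, and adjust the size to taste):

% zpool remove tank nvd0                 # cache devices can be pulled at any time
% gpart destroy -F nvd0                  # wipe any existing partition table
% gpart create -s gpt nvd0
% gpart add -t freebsd-zfs -s 280G nvd0
% zpool add tank cache nvd0p1            # hand just the partition to the pool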

The VM filer here is 128GB RAM and has 768GB L2ARC right now that's full and consuming 9GB of ARC space with headers. That seems equitable to me. :smile:
 

mav@

iXsystems
iXsystems
Joined
Sep 29, 2011
Messages
1,428
With FreeNAS 9.10 planned for release very soon, memory use by L2ARC headers should drop by almost half. That should make L2ARC size limitations even less strict. It doesn't mean that a huge L2ARC will be useful, but at least it should be harmless.
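
(If you're curious how much ARC memory the headers cost on your current build, this kstat should show it in bytes:

% sysctl kstat.zfs.misc.arcstats.l2_hdr_size
)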
 

Jimmy Tran

Dabbler
Joined
Dec 27, 2015
Messages
33
Ok, so I got my Intel 750. After doing some research on how to add an L2ARC, it looks like I can only apply it to a single pool. So now I have to decide on a few things.

1. I will probably move all my lab, low-priority VM's to a NAS so I can keep my usage on my disks low. I have another NAS lying around, so I can free up space on my FreeNAS.
2. With my 12 disks, do I go 2 vdevs, with 6 drives mirrored in each?
3. Or do I go with 6 vdevs, with 2 drives mirrored in each?
4. I was reading another post by @jgreco and he mentioned "Ten two-way mirrors. Wait six months and then add a third drive to each mirror. Three hot spares." I thought you can't add additional disks to an existing vdev?
5. Can someone send me a link on how to properly set up an L2ARC? Can I do it from the GUI (Volume Manager, selecting Cache for the drive type)? It gives me an error about associating it with a pool, so I don't know how to get past that.
 

mav@

iXsystems
iXsystems
Joined
Sep 29, 2011
Messages
1,428
2. With my 12 disks, do I go 2 vdevs, with 6 drives mirrored in each?
It is theoretically possible, but you hardly want 6x redundancy for your data. :)
3. Or do I go with 6 vdevs, with 2 drives mirrored in each?
Yes, this is the way to go.
4. I was reading another post by @jgreco and he mentioned "Ten two-way mirrors. Wait six months and then add a third drive to each mirror. Three hot spares." I thought you can't add additional disks to an existing vdev?
You can't add a disk to a RAIDZ vdev to increase its size, but you can freely add/remove disks to mirrors to change the level of redundancy (rough commands are sketched at the end of this post).
5. Can someone send me a link on how to properly set up an L2ARC? Can I do it from the GUI (Volume Manager, selecting Cache for the drive type)? It gives me an error about associating it with a pool, so I don't know how to get past that.
The GUI should do it. It would be good if you quoted the error exactly.
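
For reference, the mirror juggling from the command line looks roughly like this (pool and disk names are placeholders; take the existing device name from "zpool status"):

% zpool status tank            # note the name of a disk already in the target mirror
% zpool attach tank ada1 ada6  # add ada6 to the mirror containing ada1 (2-way becomes 3-way)
% zpool detach tank ada6       # drop it back to a 2-way mirror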
 

Jimmy Tran

Dabbler
Joined
Dec 27, 2015
Messages
33
It is theoretically possible, but you hardly want 6x redundancy for your data. :)

So maybe I'm saying it wrong. What I did was go to the Volume Manager, and in the volume layout I went 3 across for a mirror, then I dragged the slider down one row to go 3 across and 2 down. Is that one vdev with a 6-disk mirror, or two vdevs each with a 3-disk mirror? Please help me correct my terminology so I can provide the community with the proper setup :)

How do I freely add and remove disks to mirrors? I don't see how to do that from the GUI.
 

Attachments: upload_2016-3-20_10-5-16.png (Volume Manager screenshot)

mav@

iXsystems
iXsystems
Joined
Sep 29, 2011
Messages
1,428
So what I did was go to the Volume Manager, and in the volume layout I went 3 across for a mirror, then I dragged the slider down one row to go 3 across and 2 down. Is that one vdev with a 6-disk mirror, or two vdevs each with a 3-disk mirror?

There is an estimated capacity indicator; the layout that shows the bigger capacity is the right one. :) IIRC, moving right increases redundancy, while moving down increases the number of vdevs.

How do I freely add and remove disks to mirrors? I don't see how to do that from the GUI.

Hmm. I don't see it in the GUI right now, but ZFS definitely allows that from the command line.
 

Mlovelace

Guru
Joined
Aug 19, 2014
Messages
1,111
IIRC, moving right increases redundancy, while moving down increases the number of vdevs.
You're correct. Dragging the slider to the right sets how many drives are in the vdev; dragging the slider down sets the number of vdevs in the pool. You specify the redundancy with the drop-down menu, but FreeNAS tries to suggest the redundancy when you move the slider.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Hmm. I don't see it in the GUI right now, but ZFS definitely allows that from the command line.
It's been scheduled "for the next major release" for a couple of years now. We've been telling people that they have to export the pool, import it with the CLI, do the operations, export it via the CLI and import it in the GUI again.
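
So the workaround looks roughly like this (pool name is a placeholder; the attach/detach commands are sketched a few posts up):

# detach/export the volume in the GUI first, without wiping the disks, then:
% zpool import tank
% zpool attach tank ada1 ada6   # or whatever mirror operation you needed
% zpool export tank
# then import the volume again through the GUI so the middleware picks it up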
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
It's been scheduled "for the next major release" for a couple of years now. We've been telling people that they have to export the pool, import it with the CLI, do the operations, export it via the CLI and import it in the GUI again.

Yeah, it's kind of funny, I keep hearing how TrueNAS is used by so many places for VM datastores, but it's missing what would be considered basic functionality by most storage admins. Sigh.
 

Jimmy Tran

Dabbler
Joined
Dec 27, 2015
Messages
33
The cache drive is now in place. I left my two pools alone; the other pool doesn't have a need for caching. I added another mirror, for a total of 3 mirrored vdevs.

I guess the next step is to let the cache build and use arcstat to monitor the hits? Anything else I should monitor?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
it looks like I can only apply it to a single pool.

It *is* possible to split an L2ARC into two partitions and apply one to each pool, but you're probably better off avoiding that if you can. You should have a "performance" pool of mirrors and a different pool for RAIDZ, but since you should already be expecting crap performance from the RAIDZ pool compared to the mirror one, you're probably better off just making the mirror pool do what it does as well as it can.
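
For completeness, the split would look something like this (pool and device names are placeholders), but again, keeping it simple is probably the better call:

% gpart create -s gpt nvd0
% gpart add -t freebsd-zfs -s 200G nvd0   # partition for the mirror pool
% gpart add -t freebsd-zfs nvd0           # the remainder for the RAIDZ pool
% zpool add mirrorpool cache nvd0p1
% zpool add raidzpool cache nvd0p2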
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I guess the next step is to let the cache build and use arcstat to monitor the hits?

Oh, also, arcstat puts out sorta-crap by default. You can run something like

% arcstat.py -f time,hits,miss,read,hit%,l2hits,l2miss,l2hit%,l2bytes 1

or if you run a wider tty on ssh,

% arcstat.py -f time,hits,miss,read,hit%,arcsz,l2hits,l2miss,l2hit%,l2bytes,l2size 1

Adjust to suit.
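
You can also watch the cache device itself warming up and serving reads (again, substitute your own pool name):

% zpool iostat -v tank 5   # the cache section at the bottom shows reads served from and writes going to the L2ARC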
 

Jimmy Tran

Dabbler
Joined
Dec 27, 2015
Messages
33
So after 2 weeks this is what my ARC looks like. Any suggestions on how I can tweak this?

upload_2016-4-2_9-49-27.png
 

sfcredfox

Patron
Joined
Aug 26, 2014
Messages
340
Jimmy Tran,

Did you do any more work on your ARC performance since this? My system characteristics are similar, and my L2ARC performance is also terrible from what I can tell. I'm trying to research everything in the forum before I create a post about it.
 