[SOLVED] iSCSI - Performance and Reliability Headaches

hutchers

Cadet
Joined
Jun 19, 2019
Messages
8
Hello,

As part of one of the projects I’m working on, I've been tasked with testing out the viability of using FreeNAS to host our VMs.

I came across a decommissioned Dell unit that looked like a decent test bed.

FN1 Rig Specification:
Dell PowerEdge T620 Chassis
2 x Intel Xeon E5-2659v2 2.4GHz 12-core
64GB DDR3 1333MHz ECC Memory
4 x 15K 600GB SAS Drives
10Gtek LSI 9211-8i HBA Card
Dual Port Intel 10GbE NIC - single interface going into a core switch.
120GB Toshiba SSD for FreeNAS 11.2 boot.


Single pool of disks configured in RAIDZ-1, 1.54TB
Single zvol of 1.0TB (under the 80% recommendation).
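For reference, the rough maths behind those figures (a quick sketch only - it ignores ZFS/partition overhead and the TB-vs-TiB difference, which is why the reported 1.54TB comes out lower than the raw calculation):

disks = 4
disk_tb = 0.6                                 # 600GB SAS drives
raidz1_usable_tb = (disks - 1) * disk_tb      # one disk's worth of capacity goes to parity
print(f"theoretical RAIDZ1 usable: ~{raidz1_usable_tb:.1f} TB")   # ~1.8TB raw vs the 1.54TB FreeNAS reports

pool_tb = 1.54
zvol_tb = 1.0
print(f"zvol occupancy: {zvol_tb / pool_tb:.0%}")                 # ~65%, under the 80% figure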

I've set up a virtual host that is also connected to the core switch - also running on a 10GbE card.
I have the iSCSI mapped and the storage is showing correctly, F:iData_FN1
Hyper-V Manager is installed on this host - I have several virtual disks that I want to run from this FN1 unit.

I've copied several test files (around 6-10GB in size) over from local storage on this host to FN1 - the performance looked incredibly quick at first, but at the end of the transfers the copy would hang and the web interface would bomb out.

Today I attempted to copy a 340GB virtual hard disk over - the transfer rate started at around 1.3gbps for a few seconds, then dropped to 400mbps, then down to 0-7/8mbps. Again, the FN web interface fell over during the transfer and took a while to recover after I cancelled the copy.

We also tested creating an SMB share with 4 SATA disks (RAIDZ-1) and attempted the same transfer - the performance was much slower but was more consistent.

Are there any special requirements for running iSCSI? We plan to have the VMs running in a failover cluster with clustered storage to several FN units if we can get around this issue.

Thanks in advance to anyone who has any ideas - let me know if you need any additional info.

Neil :)
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Single pool of disks configured in RAIDZ-1, 1.54tb
Single zvol 1.0tb (under the 80% recommendation).
There is part of the problem. The system you are testing with is massively under-provisioned.
Please review these resources:

Why iSCSI often requires more resources for the same result (block storage)
https://www.ixsystems.com/community...res-more-resources-for-the-same-result.28178/

Some differences between RAIDZ and mirrors, and why we use mirrors for block storage (iSCSI)
https://www.ixsystems.com/community...and-why-we-use-mirrors-for-block-storage.112/

Again, the FN web interface fell over during transfer and took a while to recover after cancelling off the transfer.
Exactly which update of FreeNAS 11.2 are you using?
Are there any special requirements for running iSCSI? We plan to have the VMs running in a failover cluster with clustered storage to several FN units if we can get around this issue.
Yes, there are a massive number of special considerations for the use you describe.
For that type of installation, you will likely need to work with iXsystems and obtain TrueNAS systems, as TrueNAS systems can be configured with redundant head units to provide full fault tolerance. Here is a video that demonstrates that:
https://www.youtube.com/watch?v=GG_NvKuh530
 

hutchers

Cadet
Joined
Jun 19, 2019
Messages
8
Good morning Chris,

Thank you for your reply and links.

There is part of the problem. The system you are testing with is massively under-provisioned.
In order for me to establish where we're going wrong, could you please clarify what you mean by under-provisioned?
Do you mean in terms of the amount of storage we've configured for this single test? This wouldn't be the final solution.
Would the same apply if opting to use SMB?

Exactly which update of FreeNAS 11.2 are you using?
11.2-U4.1

It's worth mentioning that we currently run the majority of our VMs in a clustered environment off a Netgear ReadyData 5200 (block level). We have a second unit not currently in use that we'd like to put the FN OS on, but we also have budget to purchase some new equipment, possibly from iX.

UPDATE: I have removed the iSCSI zvol, created an SMB share on the same disks and re-tried the above copy. It is currently transferring at between 30-48MB/s.

Cheers
 
Last edited:

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
In order for me to establish where we're going wrong, could you please clarify what you mean by under-provisioned?

As built, your system is ripe for failure. @Chris Moore has linked to some resources on that. I guess we don't actually have a resource that is optimized towards explaining the fragmentation and pool occupancy issues you can run into with ZFS. You can find me discussing this often - try "jgreco fragmentation delphix" or something like that in the forum search.

Specific suggestions (I suggest doing all of these):

- add lots more disks
- destroy RAIDZ and replace with mirror vdevs
- add RAM or L2ARC (or better yet both) if you can. 64GB is the bare minimum I'd suggest for any serious iSCSI.
- the recommendation for block storage is to remain somewhere in the 10-50% utilization range to maximize long-term pool performance; 80% will be catastrophic in the long run even if it seems to work in the short term (see the rough numbers after this list)
- replace 600GB disks with much bigger disks (this gets back to bullet point 4)
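To put some rough numbers on the occupancy point (a back-of-the-envelope sketch only; the 1.54TB figure comes from your post, everything else is simple arithmetic):

pool_usable_tb = 1.54                 # the current 4 x 600GB RAIDZ1
zvol_tb = 1.0

print(f"current occupancy: {zvol_tb / pool_usable_tb:.0%}")       # ~65%

# usable capacity you'd want for the same 1.0TB of block storage at saner occupancy levels
for target in (0.50, 0.25):
    print(f"at {target:.0%} occupancy you'd want ~{zvol_tb / target:.1f} TB usable")

That is why "add lots more disks" and "much bigger disks" are on the list.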

ZFS uses computer science trickery to deliver high performance. You give ZFS lots more resources than a conventional RAID SAN, and it can make your HDD-based storage seem a lot closer to SSD performance. That same CS trickery sabotages you if you give it minimal resources.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
In order for me to establish where we're going wrong, could you please clarify what you mean by under-provisioned?
I think @jgreco gave you some great guidance and I am not saying anything to contradict it, but to clarify what I was saying and to add a little more information: I was primarily looking at the relatively small amount of RAM and the single RAIDZ1 vdev.

To ensure we are talking the same language, you might want to review this terminology guide:

Terminology and Abbreviations Primer
https://www.ixsystems.com/community/threads/terminology-and-abbreviations-primer.28174/

Also, for a brief overview of ZFS, you might want to look at this introductory guide:

Slideshow explaining VDev, zpool, ZIL and L2ARC
https://www.ixsystems.com/community...ning-vdev-zpool-zil-and-l2arc-for-noobs.7775/

To summarize (but please read the guides): you can think of each vdev in a ZFS pool as having the performance characteristics of a single constituent disk. So, if you create a pool that only has one vdev, which is what you did, you only have roughly the IO capability of one of the disks in that vdev. The exact details vary depending on the nature of the vdev.
The IO to your pool was initially fast because of the way ZFS uses RAM to cache IO, but slowed to the speed of the disk subsystem once the RAM was full and the cache had to be flushed. The GUI was probably not able to respond because the system load was very high.
In very general terms, more vdevs equates to more IO capacity.
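As a very crude illustration of that last point (the 200 IOPS per disk here is just a ballpark assumption for 15K spindles; real behaviour depends heavily on caching, recordsize and sync writes):

def pool_random_iops(vdevs, per_disk_iops=200):
    # for random IO, each vdev behaves roughly like a single disk,
    # so the pool scales with the number of vdevs, not the number of disks
    return vdevs * per_disk_iops

print(pool_random_iops(vdevs=1))   # your single RAIDZ1 vdev: ~200 IOPS
print(pool_random_iops(vdevs=3))   # three mirror vdevs: ~600 IOPS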
 

hutchers

Cadet
Joined
Jun 19, 2019
Messages
8
Thanks both of you for taking the time to come back to me with more info - it's appreciated.

jgreco, we have replaced the 600GB SAS drives with 6 x 2TB Seagate SSHD drives - as per your recommendation, I will mirror these into three vdevs.
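Here's the rough maths for that layout (device names are just placeholders, and the capacity figures ignore TB/TiB and ZFS overhead):

disks = [f"da{i}" for i in range(6)]                         # placeholder device names for the 6 SSHDs
mirrors = [disks[i:i + 2] for i in range(0, len(disks), 2)]  # three 2-way mirror vdevs
usable_tb = len(mirrors) * 2.0                               # one 2TB disk's worth per mirror

print(mirrors)        # [['da0', 'da1'], ['da2', 'da3'], ['da4', 'da5']]
print(f"~{usable_tb:.0f} TB usable, ~{usable_tb * 0.5:.0f} TB if we keep occupancy around 50%")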

Taking a step back from the block-level route for just a moment - how would you envision such a setup if we went down the SMB route?

The IO to your pool was initially fast because of the way ZFS uses RAM to cache IO, but slowed to the speed of the disk subsystem once the RAM was full and the cache had to be flushed. The GUI was probably not able to respond because the system load was very high.
In very general terms, more vdevs equates to more IO capacity.

This makes perfect sense given what we've been seeing.

I'm off to do some light reading.
 

hutchers

Cadet
Joined
Jun 19, 2019
Messages
8
So, just to finish up on this post: it appears the speed issues I've been having were a combination of insufficient disk provisioning, as previously suggested, as well as a dodgy switch.

Thanks for all the suggestions, links and info.
 