FreeNAS + ESXi - Performance configuration questions


hcITGuy

Cadet
Joined
Feb 24, 2018
Messages
3
Greetings, all.

I am a first-time poster and years-long lurker...

My FreeNAS environment is intended to host family media services as well as various work-related projects with which I am continually engaged. I am, for example, thrilled that I can get a RancherOS node + RancherUI spun up in minutes without tons of setup.

Here's the setup right now:

FreeNAS Box:
Dell R710
2 x X5670 CPU
144GB DDR3 ECC
Redundant 870W PSUs
2 x 128GB SSD (currently mirrored for boot only, since I can't sub-partition them in FreeNAS without the FreeNAS kid gloves wrecking my config; I would love to use these as L2ARC and the NVMe drive as SLOG instead; see the sketch after this list)
1 x 128GB Corsair MP500 NVMe drive - L2ARC
1 x 6Gb SAS HBA (4 ports) for future external enclosure
1 x QLogic QLE8152 CNA for future 10Gb network
1 x QLogic QLE2562 (probably replaced with a 2564 soon) for FC connectivity to servers (yes, yes... I know...)
6 x 2TB Enterprise SATA disks
Supported by a Liebert UPS capable of running the box for a theoretical ~60 minutes.
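
For what it's worth, the cache/log layout I'd like is something I could do from the console rather than the GUI. A rough sketch of what I have in mind (pool and device names below are placeholders, not my actual ones):

```
# Placeholder names: pool "tank", NVMe drive nvd0, spare partitions on the two
# boot SSDs as ada0p4/ada1p4. The FreeNAS installer normally claims the whole
# boot disk, which is the "kid gloves" problem mentioned above.

# The NVMe drive would first have to be released from its current L2ARC role:
# zpool remove tank nvd0

# Use the NVMe drive as a dedicated log (SLOG) device:
zpool add tank log nvd0

# Use partitions on the two SSDs as L2ARC cache devices:
zpool add tank cache ada0p4 ada1p4

# Confirm the resulting layout:
zpool status tank
```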

ESXi Hosts:
1 x R610 - 2 Procs, 144GB Memory, local mirrored boot disks, 4Gb FC card currently connected with one port.
1 x R710 - 2 Procs, 144GB Memory, local mirrored boot disks, 4Gb FC card currently connected with one port.
Neither of these is battery-backed.

Before anyone asks, I've spent very little on this setup; it's all been EOL gear either donated or picked up for a song... except for the solid state drives, of course. It's also not particularly loud in a house with an energetic Great Dane and a 4-year-old little girl who thinks she's constantly "on stage" or "on an adventure". :)

One interesting tidbit: I recently rebuilt this box because the PCIe SSD I had been using (not an NVMe card... an SLC NAND card with an ATA controller on it, ugh) completely shit the bed and took the box with it. I lost a little data but was able to bounce back.

I'm posting because I can't seem to crack the code to get (what I think are) appropriate performance figures out of the box. I am perfectly fine with upwards of 100GB of memory being devoted to write caching, with the NVMe drive being thrashed into oblivion and needing to be replaced in a year or less, and with accepting the risk of in-flight data loss in the event of a hardware failure. I may even start backing this box up to my Azure or GCP accounts.
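
As an aside, my understanding (which may well be wrong) is that ZFS write buffering is governed by the dirty-data limits rather than the ARC size, so I've been checking those from the console like this:

```
# Show the current async write ("dirty data") limits; by default these are a
# percentage of RAM with a hard ceiling, so the box will not use anything
# close to 100GB for buffering writes unless these are raised:
sysctl vfs.zfs.dirty_data_max
sysctl vfs.zfs.dirty_data_max_max
sysctl vfs.zfs.dirty_data_max_percent
```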

Is anyone willing to help me work through these configuration challenges to try and squeeze some more performance out of this gear?

I am not sure which benchmarks or metrics are most relevant here, so I will start with some basic storage-guy test stuff... Here are the charts from a 4K random write, LBA-aligned, 128-queue-depth Iometer session running on a Windows 10 VM. The VM lives on a datastore backed by a zvol presented over 4Gb Fibre Channel. I started the test about 20 minutes ago; before that, I was running a similar test with a 50/50 read/write ratio. I am getting a very underwhelming 200-400 IOPS at latencies that are just... awful: 2-200ms.
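
If it's easier to reproduce than my Iometer profile, I believe this diskspd run is roughly equivalent (diskspd is just an alternative tool; the flags below are my approximation of the same 4K random write, QD128 workload):

```
:: Roughly equivalent diskspd run inside the Windows 10 VM:
:: 20GB test file, 4K blocks, random, 100% writes, 128 outstanding IOs,
:: one thread, 5 minutes, software/hardware caching disabled, latency stats.
diskspd.exe -c20G -b4K -r -w100 -o128 -t1 -d300 -Sh -L C:\iotest.dat
```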

I have the pool defaulted to `sync: disabled` and `compression: LZ4`. The FC volume is not overriding that.
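
For reference, this is how I'm verifying those properties from the console (pool and zvol names below are placeholders for mine):

```
# Pool-level defaults and anything overriding them further down the tree:
zfs get -r -t filesystem,volume sync,compression tank

# The zvol backing the FC-presented datastore specifically:
zfs get sync,compression,volblocksize tank/esxi-zvol
```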

I've also included screenshots of the tunables page and my init scripts. Note that the autotuner is not currently enabled; I have overridden the values in a couple of spots. (Whoops, I'll put those in a reply.)


Thanks in advance!


[Screenshots attached: tunables page and init scripts]
 

bigphil

Patron
Joined
Jan 30, 2014
Messages
486
First off... welcome to the forum! Second... why in the world would you set arc_max to 16GB when you have 144GB on your FreeNAS box??? By having this set, you're severely limiting the potential of your system. Remove all of the autotune garbage and that init script setting arc_max, then reboot. Also, what is your pool layout? The output from "zpool status" from the FreeNAS console would help us.
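
Something along these lines from the console will show both (pool name is whatever yours is):

```
# What arc_max is actually set to right now (bytes), and what the ARC is doing:
sysctl vfs.zfs.arc_max
sysctl kstat.zfs.misc.arcstats.size
sysctl kstat.zfs.misc.arcstats.c_max

# Pool layout, health, and capacity:
zpool status
zpool list -v
```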

Other suggestions: once you fix your arc_max issue, I'd remove the L2ARC drive and re-purpose it to something else. It doesn't look like your MP500 NVMe drive has power loss protection, so it's not a recommended SLOG device. You stated you're OK with the very slim possibility of data loss in case of hardware issues, so I'd just run sync=disabled on your ESXi datasets if that is an acceptable risk to you.
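
Roughly like this, once the arc_max tunable is gone (pool, device, and dataset names are placeholders for yours):

```
# Drop the NVMe L2ARC device from the pool so it can be re-purposed:
zpool remove tank nvd0

# If the risk is acceptable, disable sync writes on the dataset/zvol backing
# the ESXi datastores:
zfs set sync=disabled tank/esxi
```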
 

tvsjr

Guru
Joined
Aug 29, 2015
Messages
959
Slowing down and working through your problem methodically would be a *huge* help. Start local on the box... with compression disabled, what sort of performance do you see? Then take the next step: have you confirmed you can actually get rated speed through your Fibre Channel links? Step through the problem one piece at a time.
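
For example, something like this directly on the box (the dataset name is just an example):

```
# Scratch dataset with compression off so zeros aren't compressed away,
# and sync disabled to match the intended pool settings:
zfs create -o compression=off -o sync=disabled tank/bench

# Local sequential write baseline:
dd if=/dev/zero of=/mnt/tank/bench/testfile bs=1M count=20000

# Local sequential read baseline (note: with 144GB of RAM the file may be
# served from ARC, so use a file larger than RAM for an honest read number):
dd if=/mnt/tank/bench/testfile of=/dev/null bs=1M

# Clean up afterwards:
zfs destroy tank/bench
```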
 

hcITGuy

Cadet
Joined
Feb 24, 2018
Messages
3
I removed the limit on the ARC size (I wasn't aware this had any effect on write caching...) and discovered that the poor FC performance was due to being single-pathed. I have worked with FC storage for many years and have never seen performance as abysmally bad as I did before I added another path and configured the Round Robin PSP (probably because I've always worked with switched fabrics with many paths, rather than a DAS setup like this). Now I can get 10-15K IOPS (50/50 random read/write) at 4K, and if I run the block size up to 64-128K I see around 400-500MB/s of throughput, which is really all I could possibly ask for from decade-old 4Gb FC cards installed in decade-old server hardware with only 6 backing spindles.
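
For anyone who lands here with a similar direct-attached FC setup, the multipath change amounted to something like this on each ESXi host (the naa identifier below is a placeholder for the actual device):

```
# List FC devices and their current path selection policy:
esxcli storage nmp device list

# Switch the FreeNAS-backed device to Round Robin:
esxcli storage nmp device set --device naa.xxxxxxxxxxxxxxxx --psp VMW_PSP_RR

# Optionally rotate paths every I/O instead of every 1000 I/Os:
esxcli storage nmp psp roundrobin deviceconfig set --device naa.xxxxxxxxxxxxxxxx --type iops --iops 1
```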

Thanks for reading all this and being a sanity check for me. :D
 