Slow 4K single-thread I/O on TrueNAS SCALE

rynotg12

Cadet
Joined
Apr 27, 2023
Messages
5
I'm working on implementing an all-NVMe SAN using a Dell R750 and six NVMe drives in a two-drives-per-vdev (mirror) config, each drive pair sharing a processor. I chose a thin-provisioned pool and have attached the target to an MPIO initiator using two separate ConnectX-4 25GbE NICs, one directly connected to the R750, the other through an FS switch.

My thin-provisioned pool was set up this evening--I've seen some file systems that require a "build" time, even with mirroring; perhaps that's what's going on??

My sequential 1M read/write speeds seem better than expected--2050 MB/s read, 3100 MB/s write--but I cannot get random single-thread 4K reads/writes past 8 MB/s on either end!

Is this a reflection of the limits of iSCSI connections?? If so, my dreams of a SAN with three failover cluster hosts attached will have to go out the window. Please help!
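
For anyone trying to reproduce the number outside of CrystalDiskMark, a rough fio equivalent of a single-thread, queue-depth-1 4K random read looks something like this (the file path is just a placeholder for wherever your pool or zvol is mounted):

    # sketch: single-thread, QD1 random 4K read against a test file on the pool
    fio --name=randread4k --filename=/mnt/tank/fio.test --size=4G \
        --rw=randread --bs=4k --iodepth=1 --numjobs=1 \
        --direct=1 --ioengine=libaio --runtime=60 --time_based --group_reporting

At queue depth 1 the result is dominated by per-I/O round-trip latency rather than bandwidth, which is why the 4K numbers look tiny next to the 1M sequential runs, especially once an iSCSI network hop is added to every request.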
 

Attachments

  • F12FA0D1-0B88-4539-964A-D0E6F29C1267.jpeg

dl9

Cadet
Joined
Jan 11, 2023
Messages
7
I think I have a similar situation with my Dell T550. I have TrueNAS CORE virtualized on Proxmox.

I have 4 NVMe drives (Intel P5500) in a RAIDZ1, and when I share over SMB to a Windows computer I get around 5 MB/s random 4K. That's abysmal. I tried iSCSI and got similar performance.

I created a RAIDZ1 on Proxmox itself, made a virtual drive, and passed it through to Windows; it performs much better, though still not perfect: 40 MB/s random 4K.

Finally, I tried Samba in a container on Proxmox and got the same performance as TrueNAS: 5 MB/s.

Either it's a limitation of SMB or of Dell. I'm at my wit's end.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I have 4 NVMe drives (Intel P5500) in a RAIDZ1, and when I share over SMB to a Windows computer I get around 5 MB/s random 4K. That's abysmal. I tried iSCSI and got similar performance.

Did you use PCIe passthru to hand the drives off to TrueNAS? Also, RAIDZ1 and iSCSI isn't going to be a good idea.
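
For block/iSCSI workloads the usual advice is striped mirrors rather than RAIDZ; a sketch of that kind of layout (pool name and device names are placeholders for your own):

    # sketch: two-way mirrors striped together instead of a single RAIDZ1 vdev
    zpool create tank \
        mirror /dev/nvme0n1 /dev/nvme1n1 \
        mirror /dev/nvme2n1 /dev/nvme3n1

Small random I/O scales with the number of vdevs, and a RAIDZ vdev behaves roughly like a single drive for small blocks, so mirrors tend to do much better for this kind of traffic.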
 

dl9

Cadet
Joined
Jan 11, 2023
Messages
7
Passed the drives with PCIe passthrough. The drives are not behind any controller.

iSCSI was not the goal. I was just testing to compare to SMB. I need SMB.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Passed the drives with PCIe passthrough. The drives are not behind any controller.

What happens when you do it on the bare metal? Did you follow the virtualization guide?


Basically the thing is, when you virtualize, you're taking a lot of potential capacity away from the NAS, and this, in combination with other factors, can significantly hurt performance. I see people coming in here all the time not following the virtualization guide, using Proxmox, doing other weird things, and then problems show up. A NAS is a stressy, high-I/O virtual machine, so you need to do things like make sure your NAS is getting sufficient timeslices, perhaps even dedicate CPU cores to it, and make sure the hypervisor has no reason to starve the VM. The people who go offroading with Proxmox sometimes find it works substantially suckier than ESXi, which is one reason why I still refuse to endorse Proxmox. Try using SR-IOV with a virtual function for the network; don't use the Proxmox bridging crap. The T550 is a fairly nice-looking workstation, so I'm guessing oversubscription isn't your issue.
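
As a rough, untested sketch of what that looks like on the Proxmox CLI (the VM ID, sizes, and PCI addresses are placeholders; check your own hardware before copying anything):

    # sketch: give the NAS VM fixed resources and pass devices through directly
    qm set 100 --cores 8 --cpu host --numa 1 --memory 32768 --balloon 0
    qm set 100 --hostpci0 0000:3b:00.0    # NVMe drive passed through, one hostpciN per device
    qm set 100 --hostpci1 0000:5e:02.0    # SR-IOV virtual function of the NIC instead of a bridge

Disabling the balloon device matters because ZFS wants its ARC memory to stay put rather than be reclaimed by the hypervisor.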
 

dl9

Cadet
Joined
Jan 11, 2023
Messages
7
What happens when you do it on the bare metal? Did you follow the virtualization guide?

I'm not able to try TrueNAS on bare metal, as I'm past the point of no return on that option, but I did ZFS on bare-metal Proxmox and it performs as expected: 40 MB/s random 4K. Almost 10x better than TrueNAS (virtualized).

I think you're onto something with the Proxmox networking. It could be a limitation of SMB over Proxmox's bridged networks. I will investigate SR-IOV for my networking.

Thank you very much for your suggestion!
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Thank you very much for your suggestion!

Feel free to report back. I do most of the virtualization support and it is handy to understand what you've encountered.

Here's an intelligent guess:

The Linux bridging (which is IIRC what Proxmox uses) combined with the I/O to your PCI devices may be placing too much stress on the system for good performance. You're getting lots of interrupts per second, and this is kinda split between the VM (experiencing storage-related I/O) and the Proxmox host (experiencing network-related I/O and having to process that over the bridge for the VM). Proxmox is a poor choice for virtualization. The VMware ESXi folks have a highly optimized and incredibly efficient software vSwitch design; by comparison, I believe the Proxmox solution uses standard Linux kernel drivers for the ethernet card and then standard Linux bridging. It's kind of the difference between a highly tuned Indy car and your average street-legal sedan.
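
One quick way to test that theory is to watch interrupt and context-switch rates on the Proxmox host while the 4K benchmark runs; something along these lines (the device name patterns are guesses for NVMe and Mellanox interrupt lines):

    # sketch: system-wide interrupts (in) and context switches (cs), once per second
    vmstat 1
    # sketch: which devices those interrupts are landing on
    watch -n 1 'grep -E "nvme|mlx" /proc/interrupts'

If the in/cs columns jump sharply only while the test is running, the interrupt-load explanation gets a lot more plausible.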

In most cases this isn't a problem for Proxmox, because many VMs are basic, pedestrian tasks like web servers, SQL servers, mail servers, etc.; things that do not generate a ton of network I/O. It's fine for them to have some minor latency in there, passing packets around from the Linux kernel to the Linux bridge to the VM guest interface, incurring some context switches, etc. But that is unlikely to result in high-performance networking, especially if there are also other simultaneous pressures on the host system. My untested *suspicion* is that you would get a significant boost by using PCIe virtual functions, with an ethernet chipset like the Intel X710. A pseudo-interface is presented within the VM so that the hypervisor is only responsible for routing an interrupt to the VM, not actually handling the data related to it. The VM itself talks directly to the ethernet silicon and processes packets right from the wire. This offers you the opportunity to get much closer to bare-metal performance levels. This is not possible on all ethernet chipsets; check your tech docs (Intel ARK for Intel chipsets, etc.).
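
The mechanics of creating virtual functions vary by driver, but on an SR-IOV-capable card it typically comes down to something like this on the host (interface name and VF count are placeholders, and the IOMMU has to be enabled in the BIOS and on the kernel command line):

    # sketch: create 4 virtual functions on an SR-IOV capable NIC
    echo 4 > /sys/class/net/eno1/device/sriov_numvfs
    # the new VFs appear as their own PCI devices, ready to be passed to a VM
    lspci | grep -i 'virtual function'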

I'm trying to give you as much as I reasonably can here in the hopes that you can run with this.
 

rynotg12

Cadet
Joined
Apr 27, 2023
Messages
5
My setup was bare metal as well--I think it may have to do with the iSCSI initiator drivers that Windows/Hyper-V Server uses. I've also tried StarWind SAN and Microsoft Hyper-V Server 2019 and get the same result...

I may try an eval version of Server 2022 just to verify.
 

rynotg12

Cadet
Joined
Apr 27, 2023
Messages
5
Does anyone have a tuning guide for TrueNAS SCALE to increase 4K I/O? Again, I'm using 4-6 NVMe drives on a Dell R750, split into mirrors across the NUMA nodes (i.e., a drive on CPU1 mirrored to a drive on CPU2).
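
There isn't a SCALE-specific magic switch for this, but the usual knobs for an iSCSI zvol are the volblocksize and the sync behaviour; a sketch of the sort of thing people experiment with (pool and zvol names are placeholders, and sync=disabled trades data safety for speed, so treat it strictly as a test):

    # sketch: sparse zvol with a 16K block size for an iSCSI extent
    zfs create -s -V 2T -o volblocksize=16K tank/hyperv-lun0
    # compare latency with and without synchronous writes (testing only)
    zfs set sync=disabled tank/hyperv-lun0
    zfs set sync=standard tank/hyperv-lun0

If the sync=disabled run is dramatically faster, the bottleneck is sync-write handling rather than the network, which points toward the write workload or an SLOG rather than iSCSI itself.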
 

rynotg12

Cadet
Joined
Apr 27, 2023
Messages
5
Sorry for the vague question--I'm trying to increase single-thread throughput, specifically for 4K-sized I/O from adjacent iSCSI-connected Hyper-V hosts. Random I/O will make up the majority of the workload in my three-host Hyper-V environment. We have two Windows file server VMs, two Windows database VMs, and two Windows terminal server VMs, along with misc utility Linux VMs where latency isn't as important. Sequential throughput has been pretty amazing, but I'm concerned that users are going to complain about sluggish response when making SQL queries or opening small files.

My particular test uses a bare-metal TrueNAS SCALE install on a Dell R750 with two mirrored pairs of 7.6 TB Samsung NVMe drives for the main data pool, along with a pair of 900 GB Dell NVMe metadata drives (the host I bought came with those two drives, along with the SATA BOSS drives that run the OS).

The R750 is then connected via two pairs of ConnectX-4 25GbE cards, directly connected (bypassing a switch) to a Dell R740xd with a ConnectX-4 25GbE adapter and two Dell R630s with ConnectX-4 adapters. Each Hyper-V host is put on its own subnet for its main iSCSI connection (even though I've experimented with bridging these onto one subnet, and it seems to work!).
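
If you're running jumbo frames on the iSCSI subnets, it's worth sanity-checking that the MTU really is consistent end to end, since a mismatch causes fragmentation or drops; from the TrueNAS side, something along these lines (the address is a placeholder for one of the Hyper-V initiator IPs):

    # sketch: verify a 9000-byte MTU path without fragmentation (8972 = 9000 minus IP/ICMP headers)
    ping -M do -s 8972 -c 5 192.0.2.10

If that fails while ordinary pings work, something along the path is still at 1500.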

MPIO is then achieved using a ConnectX-4 100GbE adapter on the TrueNAS bare-metal host going into an FS S5860, with the Hyper-V hosts connected on an additional, off-production subnet. I've connected each Hyper-V host to the iSCSI target twice, using a least-queue-depth MPIO policy.

I use PERC H750 and H730 adapters on the Hyper-V hosts with 3.8 TB Dell SAS SSDs in a mirrored array for local data storage--this achieves strong random 4K reads and writes during testing, perhaps due to controller caching, but it's so much faster than what I can get out of my NVMe iSCSI array at the moment!

I'm $15k into this build, with two hosts purchased, numerous network cards, a new switch, and two months of testing. It looks like I'll have to keep my VMs tied to physical hosts instead of creating the cluster I was hoping for. Local storage appears to remain the strongest performer here, especially where SQL is concerned!

For the file servers, I'm going with my R750 TrueNAS box, but SQL will most likely stay on the local SAS RAID controllers.

Thanks y'all for your input, I do appreciate it!

Ryan


Screenshot 2023-05-02 212852.png
 

rynotg12

Cadet
Joined
Apr 27, 2023
Messages
5
Just a followup for y'all--

I created a real-world test of my TrueNAS setup using actual 4 MB and 4 KB text files. I then copied these files to and from a local mirrored SAS RAID array--the results were very encouraging!

I found that copying the 4 KB files within the same SAS array actually yielded SLOWER results than copying the files from the SAS array to the TrueNAS iSCSI target. Having compiled the results at this point, I'm encouraged to say that CrystalDiskMark may be misrepresenting real-world file access and transfer capability, at least where my servers are concerned.
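
For anyone wanting to repeat this kind of test, the small-file corpus is easy to generate from a shell on the TrueNAS side (paths are placeholders; the copy itself can then be timed from Windows with a drag-and-drop or robocopy):

    # sketch: generate 1,000 random 4 KB files as a copy-test corpus
    mkdir -p /mnt/tank/copytest
    for i in $(seq 1 1000); do
        dd if=/dev/urandom of=/mnt/tank/copytest/file_$i.bin bs=4k count=1 status=none
    done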

I've also benchmarked Windows file-sharing speed compared to iSCSI--I'll use these same tests to verify that our file servers are equal or better once we fully implement the new cluster/SAN topology.

Ryan

Screenshot 2023-05-04 123638.jpg

sas to sas.jpg
sas to iscsi.jpg
 