iSCSI performance, slow read speed.

volff

Dabbler
Joined
Jul 21, 2022
Messages
10
Hi all.
Please help me solve a problem with low sequential read speed: it is two times lower than the write speed.
Installed TrueNAS Scale:
OS Version: TrueNAS-SCALE-22.02.4
Product: ProLiant XL420 Gen9 (HP Apollo 4200 Gen9)
CPU: Intel(R) Xeon(R) CPU E5-2643 v3 @ 3.40GHz
Memory: 1 TiB
Disks:
pool 1: 2x 7.68TB PCIe NVMe Intel, mirror
pool 2: 16x 3.84TB SATA Intel/Toshiba, RAID-Z2
pool 3: 16x 3.84TB SATA Intel/Toshiba, RAID-Z2

fio test results on the server itself:
Read:
TrueNAS fio read.jpg

Write:
TrueNAS fio write.jpg

The write result is low because the pool was 99% busy; please ignore the write numbers for now.
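For reference, the local test is a plain sequential read with fio; the exact parameters are in the screenshots above, but the command was roughly of this form (the file path and sizes here are only illustrative, not the exact command):

  fio --name=seqread --filename=/mnt/pool2/fio-test.bin --rw=read --bs=128k \
      --ioengine=libaio --iodepth=32 --size=100G --runtime=60 --group_reporting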

ESXi and TrueNAS Connection Diagram

TrueNAS (4x10Gb, LACP, 4 IP addresses) <=> Cisco Nexus 5672UP <=> ESXi 6.7 (2x10Gb, no LACP, 2 IP addresses)
LACP hash: IP+MAC
MTU: 9000
The switch does not register any packet loss.
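The MTU can be verified end to end with a do-not-fragment ping of full size, for example (8972 = 9000 minus the IP and ICMP headers; addresses are placeholders):

  ping -M do -s 8972 <truenas-ip>        # from a Linux host / TrueNAS SCALE
  vmkping -d -s 8972 <truenas-ip>        # from the ESXi host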

Link speed 10Gbit. iperf3
TrueNAS => ESXi
tcp speed test TR to ESXi.jpg


ESXi => TrueNAS

tcp speed test ESXi to Tr.jpg
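The iperf3 runs were plain TCP tests of roughly this form (addresses and duration are illustrative):

  iperf3 -s                              # on the receiving side
  iperf3 -c <receiver-ip> -t 30          # on the sending side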


When testing the disk speed from a CentOS 7 Linux virtual machine, I get the following results:
Only this one virtual machine is running on ESXi. No storage policies are used.
iSCSI: No multipath
Read:
Linux test Speed read.jpg


Write:

Linux test Speed write.jpg



Here you can see that the read speed is almost 2 times lower than the write speed.

If I use multipathing (4 IPs x 2 IPs = 8 paths), the read speed increases to 1600-1700 MB/s and the write speed to 2200 MB/s,
while the read speed of each individual stream is no more than 200-220 MB/s.
The switch does not register any interface congestion.

I suspect that iSCSI-scst + ZFS needs to be tuned.
Please help me figure it out.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Please help me solve a problem with low sequential read speed: it is two times lower than the write speed.

The solution is to make sure you've got your entire working set in ARC+L2ARC. See point #11 in


Writes go fast because writes always go to the ZFS write cache in main memory, and this generally happens as fast as network+CPU+memory allows.

Reads must necessarily read from somewhere, and if you actually have to go out and pull data off of pool devices, that is necessarily going to be slow because ZFS cannot predict what the next block is that is going to be required when using iSCSI. It cannot read ahead when it has no idea what needs to be prefetched. Your only option to get fast read speeds is to make sure your working set (the data frequently accessed) is cached in ARC, so you need gobs of RAM to "fix" this problem.

Additionally, you've shot yourself in the foot by using SCALE unnecessarily. Linux memory management sucks and it has trouble optimizing the ARC; by default, you may only be using half your memory for ARC. CORE is a much better choice for iSCSI service.
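For reference, on SCALE you can see what ARC is actually allowed to use, and raise the cap if you insist on staying there, roughly like this (the 900 GiB figure is only an example, and you would want to persist any change with an init script):

  arc_summary | grep -A3 "ARC size"                              # current ARC size and target
  cat /sys/module/zfs/parameters/zfs_arc_max                     # 0 = default, which on Linux is about half of RAM
  echo 966367641600 > /sys/module/zfs/parameters/zfs_arc_max     # example: cap ARC at ~900 GiB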

Related reading material:


I suspect that iSCSI-scst + ZFS needs to be tuned.
Please help me figure it out.

It's actually an exercise in tuning your expectations. RAM is inherently much faster than flash. Writes are staged to RAM, and this can be managed within a small(ish) amount of RAM. However, to make all reads equally fast, you would need to cache a huge percentage of the pool in RAM.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
ESXi and TrueNAS Connection Diagram

TrueNAS (4x10Gb, LACP, 4 IP addresses) <=> Cisco Nexus 5672UP <=> ESXi 6.7 (2x10Gb, no LACP, 2 IP addresses)
LACP hash: IP+MAC
MTU: 9000
The switch does not register any packet loss.

iSCSI: No multipath

Can I have some clarity around this point? You would generally want to use iSCSI MPIO for multipathing, not LACP.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Can I have some clarity around this point? You would generally want to use iSCSI MPIO for multipathing, not LACP.

Yeah, I didn't have the strength to get into a battle over proper IPv4 network design this morning, since the user had already implied that this was probably a bad design.

See

 

volff

Dabbler
Joined
Jul 21, 2022
Messages
10
Additionally, you've shot yourself in the foot by using SCALE unnecessarily.
Thanks for the quick response.
Using TrueNAS SCALE was a forced measure.
Unfortunately, the p840ar + SCSI Extender controller driver does not work correctly in TrueNAS CORE.
The Server HP Apollo 4200 Gen9 uses a 16 channel p840ar disk controller that is connected to an internal SCSI Extender.
If you start the server with empty disk slots and then insert disks, TrueNAS CORE will not see them. TrueNAS SCALE handles this procedure correctly.
I installed an LSI 9300 IT-mode controller instead of the HP p840ar controller. The result was the same: if you insert a new disk, the disk is not recognized until you reboot TrueNAS CORE.
TrueNAS core does not work correctly with HP's built-in SCSI Extender.

Also, TrueNAS SCALE showed the expected results when tested on this hardware.
In an fio test reading from all 16 SSD drives directly and simultaneously, without a file system (16-channel controller), TrueNAS SCALE showed a speed of 7.6 GB/s, which is the limit of the PCI-Express 3.0 bus.
To my regret, TrueNAS CORE did not come anywhere close to those results on this hardware.
I understand that TrueNAS CORE is a "tried and tested" product, but in my testing it has significant issues with drivers for this HP hardware.
Historically, this is the only hardware we have, and TrueNAS SCALE gives us hope of being able to use it.

We also got poor performance with Intel PCIe SSD 4500-series 8TB drives on TrueNAS CORE: a two-disk mirror,
default configuration, read speed of 300 MB/s. I have not been able to find a solution to this problem on the internet or on the forums.

TrueNAS SCALE showed good results with the same Intel PCIe SSD 4500-series 8TB disks.
 

volff

Dabbler
Joined
Jul 21, 2022
Messages
10
Can I have some clarity around this point? You would generally want to use iSCSI MPIO for multipathing, not LACP.

The iSCSI MPIO + LACP network design does work, and traffic is distributed evenly across the physical interfaces, as the Cisco Nexus statistics show.
But I agree with you, because it is a pile of technologies stacked on top of each other. In the future I will redesign the network to use 4 subnets and give up LACP (a rough sketch of the planned layout is below).

But even now, the current network design is not the bottleneck.
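Something like this is what I have in mind (addresses are only an example, not the final plan):

  TrueNAS: 10.10.1.1/24, 10.10.2.1/24, 10.10.3.1/24, 10.10.4.1/24   # one subnet per 10Gb port, no lagg
  ESXi:    10.10.1.2/24 (vmk1), 10.10.2.2/24 (vmk2)                 # one VMkernel port per 10Gb NIC
  iSCSI:   software iSCSI port binding on the vmk ports, Round Robin (VMW_PSP_RR) path policy on the LUN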

What worries me the most is:
- Why is the read speed only 600 MB/s when using a single 10Gbit link? The link is otherwise idle, and iperf shows
that there are no problems on it and that it delivers the full 10Gbit.
When using MPIO, the storage can deliver ~1700 MB/s, so the disk subsystem + ARC can supply data at more than 1.2 GB/s,
and therefore a single 10Gbit link could be filled 100%. But that does not happen. esxtop on ESXi shows the latency:
DAVG/cmd = ~30 ms, which is a lot
KAVG/cmd = less than 1 ms
- Why, in a local sequential read test with the cache off, does fio show ~1600 MB/s while zpool iostat shows ~300 MB/s?
How is this possible?
I just rebooted the server, so the cache is empty and disabled. I run a simple read test and get an inexplicable result.

cfg.jpg

zpool iostat.jpg


fio test.jpg
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Thanks for the quick response.
Using TrueNAS SCALE was a forced measure.
Unfortunately, the p840ar + SCSI Extender controller driver does not work correctly in TrueNAS CORE.

It also doesn't work correctly on SCALE. Please do not use a RAID controller on a ZFS system. See


The Server HP Apollo 4200 Gen9 uses a 16 channel p840ar disk controller that is connected to an internal SCSI Extender.

It is not a "SCSI Extender". It is an "SAS Expander".

I installed an LSI 9300 IT-mode controller instead of the HP p840ar controller. The result was the same: if you insert a new disk, the disk is not recognized until you reboot TrueNAS CORE.

This is typical misbehaviour when using a RAID controller. RAID controllers generally do not allow the host OS to know that there has been a SAS topology change; the RAID card is busy trying to create virtual disk abstractions for use as virtual RAID volumes, and there is no mechanism to report this back to the host OS.

TrueNAS core does not work correctly with HP's built-in SCSI Extender.

The HP SAS expander has been used successfully in the past with LSI HBA's, so it is likely you are mistaken.

Also, TrueNAS SCALE showed the expected results when tested on this hardware.
In an fio test reading from all 16 SSD drives directly and simultaneously, without a file system (16-channel controller), TrueNAS SCALE showed a speed of 7.6 GB/s, which is the limit of the PCI-Express 3.0 bus.
To my regret, TrueNAS CORE did not come anywhere close to those results on this hardware.

Yes, once again, please refer to the above article on not using RAID cards. This is not supported, isn't supposed to work, and will likely end up harming your pool and your data down the road in unexpected ways. The FreeBSD driver for the HP controller is the two decades old CISS driver, which is total crap. While the Linux driver might perform better, it is equally crap as it also misses lots of the finer points needed for ZFS to work correctly. DO NOT USE CISS BASED RAID CONTROLLERS.

I understand that TrueNAS CORE is a "tried and tested" product, but in my testing it has significant issues with drivers for this HP hardware.
Historically, this is the only hardware we have, and TrueNAS SCALE gives us hope of being able to use it.

Well, then, let me dash those hopes, brutally, on the rocky shores of a distant uninhabited island. This should not be used and is not expected to work on either Scale or Core. Being partially successful in getting it to work should not be taken as encouragement, no matter how tempting it is to be optimistic about it. The "problems" you think you found with Core are simply there because you failed to follow the recommended hardware guide for TrueNAS. Scale will have other problems, more subtle problems. Don't do it.
 

volff

Dabbler
Joined
Jul 21, 2022
Messages
10
This is typical misbehaviour when using a RAID controller. RAID controllers generally do not allow the host OS to know that there has been a SAS topology change; the RAID card is busy trying to create virtual disk abstractions for use as virtual RAID volumes, and there is no mechanism to report this back to the host OS.
The HP SAS expander has been used successfully in the past with LSI HBA's, so it is likely you are mistaken.

But with an LSI SAS 9300-16i (PCI-E x8, 16-port SAS/SATA 12Gb/s) + the HP SAS expander, the behavior in TrueNAS CORE is exactly the same.

I tested on two HP Apollo 4200 Gen9 servers. Instead of the built-in p840ar, I used an LSI SAS 9300-16i (this is a pure HBA adapter).
On one server, two HP SAS expanders with LFF disks (2x12 LFF).
On the second server, two HP SAS expanders with SFF disks (2x24 SFF).

I installed TrueNAS CORE on these servers and did a simple online drive removal and installation test.
I did not set up pools, I just watched how the system detects disks. The result is the same, LSI SAS 9300-16i did not detect newly inserted disks.
I don't think I was wrong in what I saw. This looks like a fact.
 

volff

Dabbler
Joined
Jul 21, 2022
Messages
10
It also doesn't work correctly on SCALE. Please do not use a RAID controller on a ZFS system. See
Scale will have other problems, more subtle problems. Don't do it.
Thank you very much for the link to the article. I read it.
I consider these risks.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
The result is the same, LSI SAS 9300-16i did not detect newly inserted disks

Did you request a camcontrol rescan? In theory you shouldn't need to, but, you know, computers.

Also, were you running the correct firmware on the 9300? TrueNAS does require a very specific firmware version. The major phase of the driver (the first octet in the version number) HAS to match the firmware on the card, and if the minor numbers do not match what is documented, it is very likely that there will be some broken functionality, possibly minor.
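For the record, on CORE the sort of checks I mean look roughly like this (sas3flash is only there if the Broadcom utility is installed):

  camcontrol rescan all                  # ask CAM to rescan the buses for newly inserted drives
  dmesg | grep mpr0                      # the mpr driver logs the card's firmware and driver versions at attach
  sas3flash -listall                     # Broadcom tool, lists the 9300's firmware version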

The result is the same, LSI SAS 9300-16i did not detect newly inserted disks.
I don't think I was wrong in what I saw. This looks like a fact.

Well that's not a fact, because it's known to work. Perhaps it didn't work through your HP SAS Expander, which would be eyebrow raising but not impossible. In such case, replace the SAS Expander with one known to work. Just another reason to trash talk HP and their crappy gear, heh.
 

volff

Dabbler
Joined
Jul 21, 2022
Messages
10
Also, were you running the correct firmware on the 9300? TrueNAS does require a very specific firmware version.
I checked the firmware version when I tested two months ago (I only looked at it and did not write it down), and it did not arouse any suspicion.
I'll be sure to write it down on my next test.

The p840ar controller in HBA mode does return data to the "camcontrol devlist" and "smartctl" commands, which gives a little hope that it is working in HBA mode.
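For example (the device name is just for illustration):

  camcontrol devlist                     # the drives behind the controller are listed
  smartctl -a /dev/da0                   # and SMART data is returned for them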

1669116687757.png


1669116742074.png



I understand that you (reasonably) disapprove of using RAID controllers in HBA mode, but still:
(We will not touch on the topic of data safety here. I accept the risks, and that problem is solved by other technologies.)

Even this hardware (with the controller in HBA mode) is capable of reading data from the disks at no less than 2 GB/s.
The response time of the disks is almost always under 1 ms, and according to zpool iostat the disks are basically idle.

I suggest getting back to the topic of iSCSI-scst performance.
Why can't 1 iSCSI stream fill the entire 10Gbit bandwidth?
Why is the storage response time, from ESXi's point of view, ~30 ms on a single iSCSI thread?
This is a lot given an idle 10Gbit link and a disk response time on the storage of less than 1 ms (~200 µs) according to zpool iostat.
Let me remind you that the read speed is 600 MB/s, with a storage response time of 30 ms as seen by ESXi.
When using MPIO with 8 threads, the response time seen by ESXi is ~15-19 ms, while the read speed increases to 1700 MB/s.
Can you help in solving this problem?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
The p840ar controller in HBA mode does return data to the "camcontrol devlist" and "smartctl" commands, which gives a little hope that it is working in HBA mode.

It does not HAVE an "HBA mode". It is using the CISS driver, which is not acceptable and not reliable.

I understand that you (reasonably) disapprove of using RAID controllers in HBA mode, but still:

It does not HAVE an "HBA mode". I have a bunch of crappy CISS RAID gear in inventory here. It works kind-of okay for a bare metal machine that is running FFS on FreeBSD or EXT3 on Linux. It is NOT safe for ZFS.

Even this hardware (with the controller in HBA mode) is capable of reading data from the disks at no less than 2 GB/s.

With CISS? On FreeBSD? Color me doubtful. It was never a high performance driver.

I suggest getting back to the topic of iSCSI-scst performance.
Why can't 1 iSCSI stream fill the entire 10Gbit bandwidth?

Because now you're talking Linux, and that's not recommended for iSCSI because it's known to work poorly due to the Linux implementation. iXsystems funded the development of integrated kernel iSCSI target mode on FreeBSD in order to make the performance there EXCELLENT, so if you follow all the recommendations and read everything that mav has ever written on the topic, you might very well get an iSCSI stream tuned to be able to fill 10Gbps.

This is a lot given an idle 10Gbit link and a disk response time on the storage of less than 1 ms (~200 µs) according to zpool iostat.

That's a fat sack of crap. Your typical HDD is capable of, at best, maybe 300 seeks per second. 250 seeks per second is commonly used in the industry as an optimistic value. That means disk response time of about 4ms, which is much more realistic and much closer to manufacturer posted specifications. So I'm going to call out your numbers as crazy.

Can you help in solving this problem?

I could, but you are being very resistant to the information that's being offered, so I really don't have any motivation here to try to help you if you're just going to argue every point. I lean little-l libertarian, which means that I respect your right to be incorrect, and I will not and know I cannot force you to listen to the things I'm saying. However, I think you do yourself a disservice by failing to listen to someone who has taken the time and effort to write thousands of responses on these kinds of topics over many years.
 

volff

Dabbler
Joined
Jul 21, 2022
Messages
10
Thank you for your patience and answers to my questions.
"In order to ask the right question, you need to know half the answer"
I find your last post very helpful. Thanks again.

I plan to put the LSI 9300 back into the server, check the firmware, and run the tests again.
 

volff

Dabbler
Joined
Jul 21, 2022
Messages
10
Hello. I installed the LSI 9300 controller. The firmware is the correct version, 16.00.12.00.
1670350873862.jpeg

Installed TrueNAS CORE 13
Created a pool of 8 SSD drives.

Turned off the cache.
1670350967566.jpeg
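(What I mean by "turned off the cache" is the dataset property, along these lines; the pool/dataset name here is just an example:)

  zfs set primarycache=none pool1/test   # do not cache data or metadata for this dataset in ARC
  zfs get primarycache pool1/test        # verify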

Ran a sequential read test.
1670350983854.jpeg


Ran zpool iostat -vly.
I read the numbers and they do not add up. Please explain, or point me to material that explains,
why the bandwidth is so low when the IOPS match the IOPS reported by fio.
How are these numbers calculated?
10.5k (IOPS) x 128 KiB (bs) = 1,344,000 KiB/s ≈ 1,312 MiB/s
but zpool iostat shows only 209 MiB/s

1670351512067.png

1670351386273.jpeg
 

Attachments

  • 1670351312864.jpeg

kae8

Cadet
Joined
May 9, 2023
Messages
2
Hi!

Were you able to solve the problem? I have a similar situation with reads over iSCSI on CORE.
 