Hello all,
I have a strange performance "problem" that concerns iperf3 only. Actual file transfer speeds are not affected, so I'm mostly just wondering what may be causing this.
When I test network performance with iperf3, running the server on my desktop and connecting to it from TrueNAS as the client, I get about 6.5 Gbit/s with a single stream. With two parallel streams (-P 2), I pretty much max out my 10GbE link:
Code:
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 192.168.1.137, port 20051
[ 5] local 192.168.1.152 port 5201 connected to 192.168.1.137 port 53204
[ ID] Interval Transfer Bandwidth
[ 5] 0.00-1.00 sec 657 MBytes 5.52 Gbits/sec
[ 5] 1.00-2.00 sec 749 MBytes 6.29 Gbits/sec
[ 5] 2.00-3.00 sec 778 MBytes 6.53 Gbits/sec
[ 5] 3.00-4.00 sec 768 MBytes 6.44 Gbits/sec
[ 5] 4.00-5.00 sec 778 MBytes 6.53 Gbits/sec
[ 5] 5.00-6.00 sec 776 MBytes 6.51 Gbits/sec
[ 5] 6.00-7.00 sec 780 MBytes 6.55 Gbits/sec
[ 5] 7.00-8.00 sec 781 MBytes 6.55 Gbits/sec
[ 5] 8.00-9.00 sec 781 MBytes 6.55 Gbits/sec
[ 5] 9.00-10.00 sec 780 MBytes 6.55 Gbits/sec
[ 5] 10.00-10.13 sec 98.5 MBytes 6.46 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth
[ 5] 0.00-10.13 sec 0.00 Bytes 0.00 bits/sec sender
[ 5] 0.00-10.13 sec 7.55 GBytes 6.40 Gbits/sec receiver
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 192.168.1.137, port 10707
[ 5] local 192.168.1.152 port 5201 connected to 192.168.1.137 port 37980
[ 7] local 192.168.1.152 port 5201 connected to 192.168.1.137 port 64771
[ ID] Interval Transfer Bandwidth
[ 5] 0.00-1.00 sec 504 MBytes 4.22 Gbits/sec
[ 7] 0.00-1.00 sec 502 MBytes 4.21 Gbits/sec
[SUM] 0.00-1.00 sec 1005 MBytes 8.43 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 1.00-2.00 sec 578 MBytes 4.85 Gbits/sec
[ 7] 1.00-2.00 sec 578 MBytes 4.85 Gbits/sec
[SUM] 1.00-2.00 sec 1.13 GBytes 9.69 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 2.00-3.00 sec 579 MBytes 4.85 Gbits/sec
[ 7] 2.00-3.00 sec 579 MBytes 4.85 Gbits/sec
[SUM] 2.00-3.00 sec 1.13 GBytes 9.71 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 3.00-4.00 sec 581 MBytes 4.88 Gbits/sec
[ 7] 3.00-4.00 sec 580 MBytes 4.87 Gbits/sec
[SUM] 3.00-4.00 sec 1.13 GBytes 9.74 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 4.00-5.00 sec 578 MBytes 4.85 Gbits/sec
[ 7] 4.00-5.00 sec 577 MBytes 4.84 Gbits/sec
[SUM] 4.00-5.00 sec 1.13 GBytes 9.69 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 5.00-6.00 sec 569 MBytes 4.77 Gbits/sec
[ 7] 5.00-6.00 sec 569 MBytes 4.77 Gbits/sec
[SUM] 5.00-6.00 sec 1.11 GBytes 9.54 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 6.00-7.00 sec 580 MBytes 4.86 Gbits/sec
[ 7] 6.00-7.00 sec 581 MBytes 4.87 Gbits/sec
[SUM] 6.00-7.00 sec 1.13 GBytes 9.74 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 7.00-8.00 sec 577 MBytes 4.84 Gbits/sec
[ 7] 7.00-8.00 sec 577 MBytes 4.84 Gbits/sec
[SUM] 7.00-8.00 sec 1.13 GBytes 9.69 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 8.00-9.00 sec 572 MBytes 4.80 Gbits/sec
[ 7] 8.00-9.00 sec 572 MBytes 4.80 Gbits/sec
[SUM] 8.00-9.00 sec 1.12 GBytes 9.60 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 9.00-10.00 sec 573 MBytes 4.81 Gbits/sec
[ 7] 9.00-10.00 sec 574 MBytes 4.82 Gbits/sec
[SUM] 9.00-10.00 sec 1.12 GBytes 9.63 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 5] 10.00-10.13 sec 73.5 MBytes 4.78 Gbits/sec
[ 7] 10.00-10.13 sec 73.6 MBytes 4.79 Gbits/sec
[SUM] 10.00-10.13 sec 147 MBytes 9.57 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth
[ 5] 0.00-10.13 sec 0.00 Bytes 0.00 bits/sec sender
[ 5] 0.00-10.13 sec 5.63 GBytes 4.77 Gbits/sec receiver
[ 7] 0.00-10.13 sec 0.00 Bytes 0.00 bits/sec sender
[ 7] 0.00-10.13 sec 5.63 GBytes 4.77 Gbits/sec receiver
[SUM] 0.00-10.13 sec 0.00 Bytes 0.00 bits/sec sender
[SUM] 0.00-10.13 sec 11.3 GBytes 9.55 Gbits/sec receiver
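A quick way to sanity-check the numbers in these reports is to recompute the Bandwidth column from the Transfer column. The sketch below assumes, consistently with the logs above, that iperf3 prints byte counts with binary prefixes (1 MByte = 2^20 bytes) and rates with decimal prefixes (1 Gbit/s = 10^9 bit/s); the helper name is illustrative only.

```python
# Sketch: recompute an iperf3 Bandwidth value from its Transfer value.
# Assumption (consistent with the logs above): byte counts use binary
# prefixes, rates use decimal prefixes.

BYTE_UNITS = {"Bytes": 1, "KBytes": 2**10, "MBytes": 2**20, "GBytes": 2**30}


def gbits_per_sec(amount: float, unit: str, seconds: float) -> float:
    """Convert an iperf3 'Transfer' value over an interval into Gbit/s."""
    total_bits = amount * BYTE_UNITS[unit] * 8
    return total_bits / seconds / 1e9


if __name__ == "__main__":
    # Single-stream receiver summary: 7.55 GBytes in 10.13 s
    print(f"{gbits_per_sec(7.55, 'GBytes', 10.13):.2f} Gbit/s")
    # One interval of one stream in the two-stream run: 578 MBytes in 1.00 s
    print(f"{gbits_per_sec(578, 'MBytes', 1.00):.2f} Gbit/s")
```

Both results match the reported 6.40 and 4.85 Gbit/s, so the reports are internally consistent.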
However, when I go in the opposite direction, running the server on TrueNAS and the client on my desktop, I get abysmal results:
Code:
Connecting to host 192.168.1.137, port 5201
[ 4] local 192.168.1.152 port 49284 connected to 192.168.1.137 port 5201
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-1.01 sec 33.5 MBytes 279 Mbits/sec
[ 4] 1.01-2.00 sec 35.4 MBytes 299 Mbits/sec
[ 4] 2.00-3.00 sec 33.1 MBytes 277 Mbits/sec
[ 4] 3.00-4.01 sec 40.1 MBytes 336 Mbits/sec
[ 4] 4.01-5.00 sec 30.9 MBytes 261 Mbits/sec
[ 4] 5.00-6.01 sec 31.6 MBytes 262 Mbits/sec
[ 4] 6.01-7.01 sec 31.5 MBytes 264 Mbits/sec
[ 4] 7.01-8.01 sec 30.0 MBytes 252 Mbits/sec
[ 4] 8.01-9.01 sec 31.4 MBytes 262 Mbits/sec
[ 4] 9.01-10.01 sec 28.0 MBytes 236 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-10.01 sec 326 MBytes 273 Mbits/sec sender
[ 4] 0.00-10.01 sec 325 MBytes 273 Mbits/sec receiver

iperf Done.
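Note that an iperf3 client sends by default, so this slow run measures desktop → TrueNAS, while the fast run above measured TrueNAS → desktop (the desktop-side server shows 0.00 Bytes as sender). The `-R` flag makes the server send instead, so a single client can test both directions. The author's exact commands are not in the post; the helper below is a hypothetical sketch that only builds the argument lists.

```python
# Sketch: build iperf3 client command lines for both data directions.
# The helper name and defaults are illustrative, not the author's commands.
import shlex


def iperf3_client_cmd(host: str, streams: int = 1, reverse: bool = False,
                      seconds: int = 10) -> list[str]:
    """Return argv for an iperf3 client run.

    Without -R the client sends and the server receives;
    with -R the server sends and the client receives.
    """
    cmd = ["iperf3", "-c", host, "-t", str(seconds), "-P", str(streams)]
    if reverse:
        cmd.append("-R")
    return cmd


if __name__ == "__main__":
    truenas = "192.168.1.137"  # server address from the post
    for rev in (False, True):
        print(shlex.join(iperf3_client_cmd(truenas, reverse=rev)))
```

Running both variants from the desktop would show whether the slowdown follows the direction of the data or the machine acting as server.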
If I increase the number of parallel streams (eight in the run below), performance improves:
Code:
Connecting to host 192.168.1.137, port 5201
[ 4] local 192.168.1.152 port 49358 connected to 192.168.1.137 port 5201
[ 6] local 192.168.1.152 port 49359 connected to 192.168.1.137 port 5201
[ 8] local 192.168.1.152 port 49360 connected to 192.168.1.137 port 5201
[ 10] local 192.168.1.152 port 49361 connected to 192.168.1.137 port 5201
[ 12] local 192.168.1.152 port 49362 connected to 192.168.1.137 port 5201
[ 14] local 192.168.1.152 port 49363 connected to 192.168.1.137 port 5201
[ 16] local 192.168.1.152 port 49364 connected to 192.168.1.137 port 5201
[ 18] local 192.168.1.152 port 49365 connected to 192.168.1.137 port 5201
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-1.01 sec 43.1 MBytes 358 Mbits/sec
[ 6] 0.00-1.01 sec 51.1 MBytes 425 Mbits/sec
[ 8] 0.00-1.01 sec 51.1 MBytes 425 Mbits/sec
[ 10] 0.00-1.01 sec 48.1 MBytes 400 Mbits/sec
[ 12] 0.00-1.01 sec 59.6 MBytes 496 Mbits/sec
[ 14] 0.00-1.01 sec 73.2 MBytes 609 Mbits/sec
[ 16] 0.00-1.01 sec 60.5 MBytes 503 Mbits/sec
[ 18] 0.00-1.01 sec 66.2 MBytes 551 Mbits/sec
[SUM] 0.00-1.01 sec 453 MBytes 3.77 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 4] 1.01-2.00 sec 58.6 MBytes 496 Mbits/sec
[ 6] 1.01-2.00 sec 57.4 MBytes 486 Mbits/sec
[ 8] 1.01-2.00 sec 59.4 MBytes 503 Mbits/sec
[ 10] 1.01-2.00 sec 73.5 MBytes 622 Mbits/sec
[ 12] 1.01-2.00 sec 69.8 MBytes 591 Mbits/sec
[ 14] 1.01-2.00 sec 62.8 MBytes 531 Mbits/sec
[ 16] 1.01-2.00 sec 59.6 MBytes 505 Mbits/sec
[ 18] 1.01-2.00 sec 75.5 MBytes 639 Mbits/sec
[SUM] 1.01-2.00 sec 516 MBytes 4.37 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 4] 2.00-3.00 sec 63.6 MBytes 534 Mbits/sec
[ 6] 2.00-3.00 sec 56.1 MBytes 471 Mbits/sec
[ 8] 2.00-3.00 sec 52.0 MBytes 436 Mbits/sec
[ 10] 2.00-3.00 sec 68.4 MBytes 574 Mbits/sec
[ 12] 2.00-3.00 sec 83.5 MBytes 701 Mbits/sec
[ 14] 2.00-3.00 sec 84.6 MBytes 710 Mbits/sec
[ 16] 2.00-3.00 sec 61.6 MBytes 517 Mbits/sec
[ 18] 2.00-3.00 sec 59.4 MBytes 498 Mbits/sec
[SUM] 2.00-3.00 sec 529 MBytes 4.44 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 4] 3.00-4.00 sec 45.0 MBytes 377 Mbits/sec
[ 6] 3.00-4.00 sec 60.5 MBytes 507 Mbits/sec
[ 8] 3.00-4.00 sec 66.6 MBytes 559 Mbits/sec
[ 10] 3.00-4.00 sec 62.9 MBytes 527 Mbits/sec
[ 12] 3.00-4.00 sec 81.6 MBytes 685 Mbits/sec
[ 14] 3.00-4.00 sec 68.9 MBytes 578 Mbits/sec
[ 16] 3.00-4.00 sec 65.1 MBytes 546 Mbits/sec
[ 18] 3.00-4.00 sec 51.0 MBytes 428 Mbits/sec
[SUM] 3.00-4.00 sec 502 MBytes 4.21 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 4] 4.00-5.01 sec 61.0 MBytes 509 Mbits/sec
[ 6] 4.00-5.01 sec 58.2 MBytes 486 Mbits/sec
[ 8] 4.00-5.01 sec 77.5 MBytes 646 Mbits/sec
[ 10] 4.00-5.01 sec 53.5 MBytes 446 Mbits/sec
[ 12] 4.00-5.01 sec 80.6 MBytes 672 Mbits/sec
[ 14] 4.00-5.01 sec 76.0 MBytes 634 Mbits/sec
[ 16] 4.00-5.01 sec 70.5 MBytes 588 Mbits/sec
[ 18] 4.00-5.01 sec 71.2 MBytes 594 Mbits/sec
[SUM] 4.00-5.01 sec 549 MBytes 4.57 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 4] 5.01-6.00 sec 53.1 MBytes 448 Mbits/sec
[ 6] 5.01-6.00 sec 65.5 MBytes 553 Mbits/sec
[ 8] 5.01-6.00 sec 66.6 MBytes 562 Mbits/sec
[ 10] 5.01-6.00 sec 69.6 MBytes 588 Mbits/sec
[ 12] 5.01-6.00 sec 67.5 MBytes 570 Mbits/sec
[ 14] 5.01-6.00 sec 71.8 MBytes 606 Mbits/sec
[ 16] 5.01-6.00 sec 77.2 MBytes 652 Mbits/sec
[ 18] 5.01-6.00 sec 72.6 MBytes 613 Mbits/sec
[SUM] 5.01-6.00 sec 544 MBytes 4.59 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 4] 6.00-7.00 sec 48.6 MBytes 408 Mbits/sec
[ 6] 6.00-7.00 sec 62.5 MBytes 524 Mbits/sec
[ 8] 6.00-7.00 sec 84.9 MBytes 712 Mbits/sec
[ 10] 6.00-7.00 sec 67.4 MBytes 565 Mbits/sec
[ 12] 6.00-7.00 sec 87.9 MBytes 737 Mbits/sec
[ 14] 6.00-7.00 sec 72.8 MBytes 610 Mbits/sec
[ 16] 6.00-7.00 sec 67.1 MBytes 563 Mbits/sec
[ 18] 6.00-7.00 sec 67.5 MBytes 566 Mbits/sec
[SUM] 6.00-7.00 sec 559 MBytes 4.69 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 4] 7.00-8.00 sec 42.2 MBytes 353 Mbits/sec
[ 6] 7.00-8.00 sec 44.8 MBytes 374 Mbits/sec
[ 8] 7.00-8.00 sec 44.9 MBytes 375 Mbits/sec
[ 10] 7.00-8.00 sec 51.0 MBytes 427 Mbits/sec
[ 12] 7.00-8.00 sec 61.1 MBytes 511 Mbits/sec
[ 14] 7.00-8.00 sec 51.5 MBytes 431 Mbits/sec
[ 16] 7.00-8.00 sec 44.9 MBytes 375 Mbits/sec
[ 18] 7.00-8.00 sec 48.5 MBytes 406 Mbits/sec
[SUM] 7.00-8.00 sec 389 MBytes 3.25 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 4] 8.00-9.00 sec 46.9 MBytes 394 Mbits/sec
[ 6] 8.00-9.00 sec 59.8 MBytes 502 Mbits/sec
[ 8] 8.00-9.00 sec 50.4 MBytes 423 Mbits/sec
[ 10] 8.00-9.00 sec 50.5 MBytes 424 Mbits/sec
[ 12] 8.00-9.00 sec 56.5 MBytes 475 Mbits/sec
[ 14] 8.00-9.00 sec 68.6 MBytes 577 Mbits/sec
[ 16] 8.00-9.00 sec 69.1 MBytes 581 Mbits/sec
[ 18] 8.00-9.00 sec 68.9 MBytes 579 Mbits/sec
[SUM] 8.00-9.00 sec 471 MBytes 3.95 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ 4] 9.00-10.01 sec 54.1 MBytes 451 Mbits/sec
[ 6] 9.00-10.01 sec 51.9 MBytes 432 Mbits/sec
[ 8] 9.00-10.01 sec 51.8 MBytes 431 Mbits/sec
[ 10] 9.00-10.01 sec 63.4 MBytes 528 Mbits/sec
[ 12] 9.00-10.01 sec 73.0 MBytes 609 Mbits/sec
[ 14] 9.00-10.01 sec 74.8 MBytes 623 Mbits/sec
[ 16] 9.00-10.01 sec 67.9 MBytes 566 Mbits/sec
[ 18] 9.00-10.01 sec 60.6 MBytes 505 Mbits/sec
[SUM] 9.00-10.01 sec 497 MBytes 4.15 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth
[ 4] 0.00-10.01 sec 516 MBytes 433 Mbits/sec sender
[ 4] 0.00-10.01 sec 516 MBytes 433 Mbits/sec receiver
[ 6] 0.00-10.01 sec 568 MBytes 476 Mbits/sec sender
[ 6] 0.00-10.01 sec 568 MBytes 476 Mbits/sec receiver
[ 8] 0.00-10.01 sec 605 MBytes 507 Mbits/sec sender
[ 8] 0.00-10.01 sec 605 MBytes 507 Mbits/sec receiver
[ 10] 0.00-10.01 sec 608 MBytes 510 Mbits/sec sender
[ 10] 0.00-10.01 sec 608 MBytes 510 Mbits/sec receiver
[ 12] 0.00-10.01 sec 721 MBytes 604 Mbits/sec sender
[ 12] 0.00-10.01 sec 721 MBytes 604 Mbits/sec receiver
[ 14] 0.00-10.01 sec 705 MBytes 591 Mbits/sec sender
[ 14] 0.00-10.01 sec 705 MBytes 591 Mbits/sec receiver
[ 16] 0.00-10.01 sec 644 MBytes 540 Mbits/sec sender
[ 16] 0.00-10.01 sec 643 MBytes 539 Mbits/sec receiver
[ 18] 0.00-10.01 sec 642 MBytes 538 Mbits/sec sender
[ 18] 0.00-10.01 sec 641 MBytes 538 Mbits/sec receiver
[SUM] 0.00-10.01 sec 4.89 GBytes 4.20 Gbits/sec sender
[SUM] 0.00-10.01 sec 4.89 GBytes 4.20 Gbits/sec receiver

iperf Done.
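To compare runs with different stream counts, it can help to reduce each summary to a total and a per-stream mean. This sketch scrapes the plain-text sender lines in the layout shown above; it is an illustration only, and for scripted testing iperf3's `-J` (JSON output) option would be more robust than parsing text.

```python
# Sketch: total and per-stream mean from iperf3 text "sender" summary lines.
# The regex targets the plain-text layout shown above; [SUM] lines are skipped.
import re

SUMMARY_LINE = re.compile(
    r"^\[\s*(\d+)\]\s+[\d.]+-[\d.]+\s+sec\s+[\d.]+\s+\w+\s+"
    r"([\d.]+)\s+([KMG]?)bits/sec\s+sender\s*$"
)
RATE_SCALE = {"": 1.0, "K": 1e3, "M": 1e6, "G": 1e9}

# Sender lines copied from the eight-stream summary above.
SAMPLE = """\
[ 4] 0.00-10.01 sec 516 MBytes 433 Mbits/sec sender
[ 6] 0.00-10.01 sec 568 MBytes 476 Mbits/sec sender
[ 8] 0.00-10.01 sec 605 MBytes 507 Mbits/sec sender
[ 10] 0.00-10.01 sec 608 MBytes 510 Mbits/sec sender
[ 12] 0.00-10.01 sec 721 MBytes 604 Mbits/sec sender
[ 14] 0.00-10.01 sec 705 MBytes 591 Mbits/sec sender
[ 16] 0.00-10.01 sec 644 MBytes 540 Mbits/sec sender
[ 18] 0.00-10.01 sec 642 MBytes 538 Mbits/sec sender
[SUM] 0.00-10.01 sec 4.89 GBytes 4.20 Gbits/sec sender
"""


def sender_rates(log: str) -> dict[int, float]:
    """Map stream ID -> sender rate in bits/s."""
    rates: dict[int, float] = {}
    for line in log.splitlines():
        m = SUMMARY_LINE.match(line.strip())
        if m:
            rates[int(m.group(1))] = float(m.group(2)) * RATE_SCALE[m.group(3)]
    return rates


if __name__ == "__main__":
    rates = sender_rates(SAMPLE)
    total = sum(rates.values())
    print(f"{len(rates)} streams, total {total / 1e9:.2f} Gbit/s, "
          f"mean {total / len(rates) / 1e6:.0f} Mbit/s per stream")
```

For this run each stream averages roughly 525 Mbit/s, about twice the 273 Mbit/s of the single-stream test, and the total matches the 4.20 Gbit/s [SUM] line.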
... however, I need up to 32 parallel streams to saturate the 10GbE link. (That log is rather large, so I didn't include it in this post.)
One might think the issue is that my desktop PC is far more powerful, but CPU usage on TrueNAS is around 1% when running the test with one stream and around 30% with 32 streams. So the CPU is not the issue.
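One caveat when reading aggregate CPU figures: iperf3 releases before 3.16 run all streams in a single thread, and the version shipped with TrueNAS 12 likely predates that, so one saturated core is what would matter. The sketch below, assuming the Ryzen 2200GE's 4 logical CPUs and that the percentages quoted are totals across all of them, converts those percentages into core-equivalents. The 1% single-stream figure is far below one core, which supports the conclusion above for the single-stream case.

```python
# Sketch: relate an aggregate CPU percentage to whole cores.
# Assumption: 4 logical CPUs (Ryzen 2200GE: 4 cores, no SMT) and the
# quoted percentages are totals across all logical CPUs.


def core_equivalents(total_percent: float, logical_cpus: int) -> float:
    """How many fully busy cores a total CPU percentage corresponds to."""
    return total_percent / 100 * logical_cpus


def one_core_share(logical_cpus: int) -> float:
    """Total CPU percentage shown when exactly one core is at 100%."""
    return 100 / logical_cpus


if __name__ == "__main__":
    cpus = 4
    print(f"one saturated core shows as {one_core_share(cpus):.0f}% total")
    for pct in (1, 30):
        print(f"{pct}% total = {core_equivalents(pct, cpus):.2f} cores busy")
```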
The odd thing is that only this specific iperf3 test is affected. In benchmarks I can hit up to 750 MB/s, roughly 6.1 Gbit/s as shown in Task Manager, and the network interface statistics on the TrueNAS side confirm this speed. In actual file transfers I still get ~500 MB/s when copying from my raidz1 pool to the desktop's NVMe drive, which is very close to what the physical drives can deliver once the iSCSI overhead is accounted for.
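Converting those disk figures to line rates shows how they compare with the 10GbE link. Tools differ on whether "MB" means 10^6 or 2^20 bytes, so this sketch computes both; it ignores iSCSI, TCP/IP and Ethernet overhead, and the function name is illustrative.

```python
# Sketch: convert a disk/file-copy rate in MB/s into Gbit/s on the wire.
# Protocol overhead (iSCSI, TCP/IP, Ethernet framing) is ignored.


def mb_per_s_to_gbit(rate: float, binary: bool = False) -> float:
    """rate in MB/s (decimal) or MiB/s (binary=True) -> Gbit/s."""
    bytes_per_mb = 2**20 if binary else 10**6
    return rate * bytes_per_mb * 8 / 1e9


if __name__ == "__main__":
    for rate in (750, 500):
        dec = mb_per_s_to_gbit(rate)
        binr = mb_per_s_to_gbit(rate, binary=True)
        print(f"{rate} MB/s = {dec:.2f} Gbit/s (decimal) or {binr:.2f} Gbit/s (binary)")
```

750 MB/s lands between 6.0 and 6.3 Gbit/s, consistent with the roughly 6.1 Gbit/s in Task Manager, and ~500 MB/s is about 4 Gbit/s, well under the link's capacity.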
So everything basically works as intended, and there is no real problem with actually using the server. I can saturate 10GbE perfectly fine.
But I'm still curious: why is single-stream performance in iperf3 so low when the server runs on the TrueNAS side?
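One generic relation worth keeping in mind, offered as a hypothesis to test rather than a diagnosis: a single TCP stream cannot exceed its window divided by the round-trip time, so a per-stream cap that is overcome by adding streams can point at window or buffer sizing on one side. The RTT values below are placeholders, not measurements; a real RTT (for example from ping between the two machines) would replace them. Comparing the implied window with the socket buffer set via iperf3's `-w` option is one way to check the idea.

```python
# Sketch of the bandwidth-delay relation for a single TCP stream:
#     throughput <= window / RTT   =>   window >= throughput * RTT
# The RTTs below are hypothetical placeholders, not measured values.


def window_needed(rate_bps: float, rtt_s: float) -> float:
    """Bytes in flight needed to sustain rate_bps over a path with RTT rtt_s."""
    return rate_bps * rtt_s / 8


def rate_limit(window_bytes: float, rtt_s: float) -> float:
    """Upper bound on one stream's rate (bits/s) for a given window."""
    return window_bytes * 8 / rtt_s


if __name__ == "__main__":
    for rtt_ms in (0.1, 0.5, 1.0):  # hypothetical RTTs
        rtt = rtt_ms / 1000
        full = window_needed(10e9, rtt) / 1024       # window for 10 Gbit/s
        seen = window_needed(273e6, rtt) / 1024      # window implied by 273 Mbit/s
        print(f"RTT {rtt_ms} ms: 10 Gbit/s needs {full:.0f} KiB; "
              f"273 Mbit/s implies {seen:.1f} KiB in flight")
```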
Server is TrueNAS-12.0-U8.1 listening on 192.168.1.137.
Hardware:
ASRock B450M Pro4 with a Ryzen 2200GE
16GB DDR4 3200MHz
Intel X540-T2
4x WD Red Plus 4TB in a raidz1 pool with 3 zvols; record size and block size are set to 64k everywhere
Every zvol is shared via iSCSI to my desktop, where each one is formatted as NTFS with a 64k allocation unit (cluster) size. Setting everything to 64k vastly improves performance, but that's a different story.
The server was previously built on a GA-Z77X-DS3H with an Intel i5-3570K and 16GB DDR3 (everything else the same), and the behavior was exactly the same. Admittedly, the two CPUs are roughly equal in performance.
Client is Windows 10 on 192.168.1.152
ASRock B550M Steel Legend with Ryzen 5600G
16GB DDR4 3600MHz
Intel X550-T2
Samsung 970 NVMe and Samsung 860 SATA SSDs
The two machines are connected directly through their Intel 10GbE cards with a Cat6A S/FTP cable. Both cards are configured with interrupt moderation disabled, a 9000-byte MTU (jumbo frames), etc., as needed to get the best performance out of them.
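With interrupt moderation disabled, interrupts are not coalesced, so the NIC can approach one interrupt per packet, and the packet rate gives a rough sense of that load. This sketch estimates packets per second at a given rate and MTU. It is an order-of-magnitude figure only: it ignores Ethernet framing and IP/TCP headers, and the function name is illustrative.

```python
# Sketch: rough packets per second needed to carry a given bit rate.
# Simplification: every packet is MTU-sized; framing and headers are ignored.


def packets_per_second(rate_bps: float, mtu_bytes: int) -> float:
    """Approximate packet rate for rate_bps with mtu_bytes per packet."""
    return rate_bps / (mtu_bytes * 8)


if __name__ == "__main__":
    for mtu in (1500, 9000):
        pps = packets_per_second(10e9, mtu)
        print(f"MTU {mtu}: ~{pps:,.0f} packets/s at 10 Gbit/s")
```

Jumbo frames cut the packet rate by about a factor of six compared with a 1500-byte MTU, which is why they usually matter more when interrupt moderation is off.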