10Gb setup (Direct Attach) speed between 1.88 Gbits/sec and 2.5 Gbits/sec

Status
Not open for further replies.

Pliqui

Dabbler
Joined
Apr 24, 2018
Messages
25
I recently finished my new build, and I had some trouble with a pair of QLogic QLE8152 NICs: ESXi 6.5 recognized them, but the NICs were not able to provide a link. As a replacement I ordered a NIC based on the Intel 82599ES chipset, model X520-DA2 (10Gtek brand), and it got recognized immediately by ESXi, but since Wednesday 06/20 I haven't been able to figure out why I'm getting slow speeds.

VMware:

CPU: Intel(R) Xeon(R) CPU W3550 @ 3.07GHz
Mobo: Lenovo Workstation S20
RAM: 24GB non-ECC (maxed out)
RAID controller: IBM ServeRAID M5015 (cross-flashed to LSI 9260-8i)
HDD1: 1 x Samsung 850 EVO 250GB SSD (boot device for VMs and boot disks for some VMs)
HDD2: 4 x HGST Deskstar NAS 3.5" 4TB 7200 RPM 128MB in RAID5 (1 datastore)
NIC: 1 x 1GbE (management network)
10GbE NIC: 10Gtek X520-DA2 (vmnic1 and vmnic2)

[attached screenshot]


FreeNAS:

CPU: Intel Xeon E3-1230 V6 Kaby Lake 3.5 GHz
Mobo: Supermicro MBD-X11SSL-CF Micro ATX
RAM: 4 x Crucial CT16G4WFD824A 16GB DDR4 ECC unbuffered CL17 (64GB, maxed out)
Boot: 1 x 128GB SSD
HDD1: 4 x HGST Deskstar NAS 3.5" 8TB 7200 RPM 128MB in a striped mirror
SSD1: 1 x Intel SSD DC S3110 128GB for SLOG
SSD2: 4 x 128GB SSD stripe (for testing purposes)
10GbE NIC1: NetApp Chelsio dual-port SFP+ 10GbE PCIe 111-00603+A0 CC2-S320E-SR 100-1082-00
Code:
[root@freenas ~]# ifconfig cxgb0
cxgb0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
		options=6c07bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
		ether 00:07:43:08:42:f4
		hwaddr 00:07:43:08:42:f4
		inet 10.0.0.2 netmask 0xffffff00 broadcast 10.0.0.255
		nd6 options=9<PERFORMNUD,IFDISABLED>
		media: Ethernet 10Gbase-Twinax <full-duplex>
		status: active


Code:
[root@freenas ~]# ifconfig cxgb1
cxgb1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
		options=6c07bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,LRO,VLAN_HWTSO,LINKSTATE,RXCSUM_IPV6,TXCSUM_IPV6>
		ether 00:07:43:08:42:f5
		hwaddr 00:07:43:08:42:f5
		inet 10.0.1.2 netmask 0xffffff00 broadcast 10.0.1.255
		nd6 options=9<PERFORMNUD,IFDISABLED>
		media: Ethernet 10Gbase-Twinax <full-duplex>
		status: active


The cables are 2 x 10Gtek SFP-H10GB-CU1M (Cisco-compatible) 10GBASE-CU passive Twinax direct-attach copper cables, 1 meter.

Zpool List

Code:
[root@freenas ~]# zpool list -v
NAME									 SIZE  ALLOC   FREE  EXPANDSZ   FRAG	CAP  DEDUP  HEALTH  ALTROOT
Test_SSD								 468G   223G   245G		 -	 2%	47%  1.00x  ONLINE  /mnt
  gptid/a850b6bf-77bf-11e8-b7b0-ac1f6b83f450   117G  55.3G  61.7G		 -	 2%	47%
  gptid/a88ea704-77bf-11e8-b7b0-ac1f6b83f450   117G  55.1G  61.9G		 -	 3%	47%
  gptid/a8d40f00-77bf-11e8-b7b0-ac1f6b83f450   117G  47.7G  69.3G		 -	 3%	40%
  gptid/a9496891-77bf-11e8-b7b0-ac1f6b83f450   117G  64.8G  52.2G		 -	 3%	55%
VMWARE								  14.5T  10.3T  4.25T		 -	 5%	70%  1.00x  ONLINE  /mnt
  mirror								7.25T  5.11T  2.14T		 -	 6%	70%
	gptid/e2a521ff-7687-11e8-99bb-ac1f6b83f450	  -	  -	  -		 -	  -	  -
	gptid/e3224228-7687-11e8-99bb-ac1f6b83f450	  -	  -	  -		 -	  -	  -
  mirror								7.25T  5.14T  2.11T		 -	 5%	70%
	gptid/e3a8244e-7687-11e8-99bb-ac1f6b83f450	  -	  -	  -		 -	  -	  -
	gptid/e41f26ad-7687-11e8-99bb-ac1f6b83f450	  -	  -	  -		 -	  -	  -
log										 -	  -	  -		 -	  -	  -
  gptid/e459c976-7687-11e8-99bb-ac1f6b83f450   119G   384K   119G		 -	 0%	 0%
freenas-boot							 118G  1.11G   117G		 -	  -	 0%  1.00x  ONLINE  -
  ada1p2								 118G  1.11G   117G		 -	  -	 0%


When I got the build done, the first thing I did was perform some IO tests on the VMWARE zpool, using sync=standard and sync=always (I created 2 zvols).

This was run inside a Debian 9 VM.
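For reference, sync was toggled on the backing zvols between runs, along these lines (a minimal sketch; the zvol name is illustrative, I didn't post the real ones):
Code:
# Check the current sync setting on the zvol backing the iSCSI extent
zfs get sync VMWARE/zvol1

# One run with the default, one with every write forced synchronous
zfs set sync=standard VMWARE/zvol1
zfs set sync=always VMWARE/zvol1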

Sync = Standard
Code:
fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=random_read_write.fio --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75

fio-2.16
Starting 1 process
test: Laying out IO file(s) (1 file(s) / 4096MB)
Jobs: 1 (f=1): [m(1)] [100.0% done] [154.8MB/53280KB/0KB /s] [39.7K/13.4K/0 iops] [eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=1267: Wed Jun 20 21:13:30 2018
  read : io=3070.4MB, bw=80247KB/s, iops=20061, runt= 39179msec
  write: io=1025.8MB, bw=26808KB/s, iops=6702, runt= 39179msec
  cpu		  : usr=10.66%, sys=32.24%, ctx=22131, majf=0, minf=9
  IO depths	: 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
	 submit	: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
	 complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
	 issued	: total=r=785996/w=262580/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
	 latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: io=3070.4MB, aggrb=80246KB/s, minb=80246KB/s, maxb=80246KB/s, mint=39179msec, maxt=39179msec
  WRITE: io=1025.8MB, aggrb=26808KB/s, minb=26808KB/s, maxb=26808KB/s, mint=39179msec, maxt=39179msec

Disk stats (read/write):
  sdb: ios=785006/262268, merge=0/61, ticks=1190340/889084, in_queue=2079088, util=99.98%


Sync = Always
Code:
fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=test --bs=4k --iodepth=64 --size=4G --readwrite=randrw --rwmixread=75

fio-2.16
Starting 1 process
test: Laying out IO file(s) (1 file(s) / 4096MB)
Jobs: 1 (f=1): [m(1)] [100.0% done] [47356KB/16180KB/0KB /s] [11.9K/4045/0 iops] [eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=1538: Wed Jun 20 21:25:05 2018
  read : io=3070.4MB, bw=79202KB/s, iops=19800, runt= 39696msec
  write: io=1025.8MB, bw=26459KB/s, iops=6614, runt= 39696msec
  cpu		  : usr=10.61%, sys=38.78%, ctx=19718, majf=0, minf=9
  IO depths	: 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
	 submit	: 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
	 complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
	 issued	: total=r=785996/w=262580/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
	 latency   : target=0, window=0, percentile=100.00%, depth=64

Run status group 0 (all jobs):
   READ: io=3070.4MB, aggrb=79201KB/s, minb=79201KB/s, maxb=79201KB/s, mint=39696msec, maxt=39696msec
  WRITE: io=1025.8MB, aggrb=26459KB/s, minb=26459KB/s, maxb=26459KB/s, mint=39696msec, maxt=39696msec

Disk stats (read/write):
  sdb: ios=782880/261557, merge=1/9, ticks=1080076/835584, in_queue=1915504, util=99.97%


When I transferred one virtual machine to the new iSCSI datastore it was slow, and so began the trip down the rabbit hole.

The network is configured as follows (at this point everything is set up vanilla):
Code:
cxgb0	iSCSI-0	Active	10.0.0.2 --> iSCSI vDS --> iSCSI-PG1 --> vmkernel1 (10.0.0.3) --> vmnic1

cxgb1	iSCSI-0	Active	10.0.1.2 --> iSCSI vDS --> iSCSI-PG2 --> vmkernel2 (10.0.1.3) --> vmnic2
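
As a sanity check on those bindings from the ESXi side, this is the sort of thing I look at (standard esxcli commands, nothing custom assumed):
Code:
# vmkernel NICs and their IPs (vmkernel1/vmkernel2 above)
esxcli network ip interface ipv4 get

# Network portals bound to the software iSCSI adapter
esxcli iscsi networkportal list

# Physical NIC link state and negotiated speed
esxcli network nic list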


Running iperf from ESXi to FreeNAS:

Code:
[root@esxi01:/usr/lib/vmware/vsan/bin] ./iperf3.copy -c 10.0.1.2
Connecting to host 10.0.1.2, port 5201
[  4] local 10.0.1.3 port 11121 connected to 10.0.1.2 port 5201
iperf3: getsockopt - Function not implemented
[ ID] Interval		   Transfer	 Bandwidth	   Retr  Cwnd
[  4]   0.00-1.00   sec   410 MBytes  3.44 Gbits/sec  8626536   0.00 Bytes
iperf3: getsockopt - Function not implemented
[  4]   1.00-2.00   sec   152 MBytes  1.27 Gbits/sec	0   0.00 Bytes
iperf3: getsockopt - Function not implemented
[  4]   2.00-3.00   sec   276 MBytes  2.32 Gbits/sec	0   0.00 Bytes
iperf3: getsockopt - Function not implemented
[  4]   3.00-4.00   sec   187 MBytes  1.57 Gbits/sec	0   0.00 Bytes
iperf3: getsockopt - Function not implemented
[  4]   4.00-5.00   sec   232 MBytes  1.95 Gbits/sec	0   0.00 Bytes
iperf3: getsockopt - Function not implemented
[  4]   5.00-6.00   sec   229 MBytes  1.92 Gbits/sec	0   0.00 Bytes
iperf3: getsockopt - Function not implemented
[  4]   6.00-7.00   sec   183 MBytes  1.54 Gbits/sec	0   0.00 Bytes
iperf3: getsockopt - Function not implemented
[  4]   7.00-8.00   sec   175 MBytes  1.47 Gbits/sec	0   0.00 Bytes
iperf3: getsockopt - Function not implemented
[  4]   8.00-9.00   sec   219 MBytes  1.83 Gbits/sec	0   0.00 Bytes
iperf3: getsockopt - Function not implemented
[  4]   9.00-10.00  sec   182 MBytes  1.53 Gbits/sec  4286340760   0.00 Bytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval		   Transfer	 Bandwidth	   Retr
[  4]   0.00-10.00  sec  2.19 GBytes  1.88 Gbits/sec	0			 sender
[  4]   0.00-10.00  sec  2.19 GBytes  1.88 Gbits/sec				  receiver


Right now, with the host in maintenance mode and under no load, I get this:
Code:
[root@esxi01:/usr/lib/vmware/vsan/bin] ./iperf3 -c 10.0.1.2
Connecting to host 10.0.1.2, port 5201
[  4] local 10.0.1.3 port 11095 connected to 10.0.1.2 port 5201
iperf3: getsockopt - Function not implemented
[ ID] Interval		   Transfer	 Bandwidth	   Retr  Cwnd
[  4]   0.00-1.00   sec   725 MBytes  6.09 Gbits/sec  8626536   0.00 Bytes
iperf3: getsockopt - Function not implemented
[  4]   1.00-2.00   sec   726 MBytes  6.09 Gbits/sec	0   0.00 Bytes
iperf3: getsockopt - Function not implemented
[  4]   2.00-3.00   sec   725 MBytes  6.08 Gbits/sec	0   0.00 Bytes
iperf3: getsockopt - Function not implemented
[  4]   3.00-4.00   sec   728 MBytes  6.11 Gbits/sec	0   0.00 Bytes
iperf3: getsockopt - Function not implemented
[  4]   4.00-5.00   sec   732 MBytes  6.14 Gbits/sec	0   0.00 Bytes
iperf3: getsockopt - Function not implemented
[  4]   5.00-6.00   sec   726 MBytes  6.09 Gbits/sec	0   0.00 Bytes
iperf3: getsockopt - Function not implemented
[  4]   6.00-7.00   sec   730 MBytes  6.12 Gbits/sec	0   0.00 Bytes
iperf3: getsockopt - Function not implemented
[  4]   7.00-8.00   sec   732 MBytes  6.14 Gbits/sec	0   0.00 Bytes
iperf3: getsockopt - Function not implemented
[  4]   8.00-9.00   sec   731 MBytes  6.13 Gbits/sec	0   0.00 Bytes
iperf3: getsockopt - Function not implemented
[  4]   9.00-10.00  sec   724 MBytes  6.07 Gbits/sec  4286340760   0.00 Bytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval		   Transfer	 Bandwidth	   Retr
[  4]   0.00-10.00  sec  7.11 GBytes  6.11 Gbits/sec	0			 sender
[  4]   0.00-10.00  sec  7.11 GBytes  6.11 Gbits/sec				  receiver


But the second I fire up a VM, no matter its size, I get between ~1.8 and ~2.5 Gbits/sec.

The max transfer rate is around 300-400 MB/s:

  • From the RAID5 to a datastore created on top of the 4 x 128GB SSD stripe, with sync=disabled

[attached screenshot: transfer speed]

  • From the RAID5 to a datastore on top of the 4 x 8TB striped mirror, with sync=standard
[attached screenshot: transfer speed]


Transferring a 2TB VM took 7h 22min and an 8TB VM took 20h 45min, but moving the 2TB VM back took only 2h 58min.
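For scale, those times work out to roughly these sustained rates (taking TB as decimal):
Code:
2 TB in  7h 22m (26,520 s) ≈  75 MB/s
8 TB in 20h 45m (74,700 s) ≈ 107 MB/s
2 TB in  2h 58m (10,680 s) ≈ 187 MB/s  (the return trip)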

And then I transferred 9 VMs to 3 different datastores (at this point all were sync=always): one shared for general VMs, one exclusive for Veeam, and one for the file server. I powered them all on and ran a Veeam backup job... the whole ESXi host became unresponsive and I got these errors:
Code:
2018-06-24T03:14:03.310Z cpu7:76124)WARNING: netschedHClk: NetSchedHClkWatchdogSysWorld:4571: vmnic1 : scheduler(0x4304780af120)/device(0x430250c61080) 1/0 lock up [stopped=0]:
2018-06-24T03:14:03.310Z cpu7:76124)WARNING: netschedHClk: NetSchedHClkWatchdogSysWorld:4578: detected at 97938998 while last xmit at 97933998 and 36524 bytes in flight [window 86460 bytes]
2018-06-24T03:14:03.310Z cpu7:76124)WARNING: netschedHClk: NetSchedHClkWatchdogSysWorld:4583: and last enqueued/dequeued at 97933997/97933997 [stress 0]
2018-06-24T03:14:03.310Z cpu7:76124)WARNING: netschedHClk: NetSchedHClkWatchdogSysWorld:4586: with 478 pkts inflight
2018-06-24T03:14:03.488Z cpu0:65934)WARNING: LinNet: netdev_watchdog:3688: NETDEV WATCHDOG: vmnic1: transmit timed out
2018-06-24T03:14:03.488Z cpu0:65934)<6>ixgbe 0000:02:00.0: vmnic1: Fake Tx hang detected with timeout of 160 seconds
2018-06-24T03:14:04.490Z cpu2:65930)WARNING: LinNet: netdev_watchdog:3688: NETDEV WATCHDOG: vmnic1: transmit timed out
2018-06-24T03:14:04.490Z cpu2:65930)<6>ixgbe 0000:02:00.0: vmnic1: Fake Tx hang detected with timeout of 160 seconds
2018-06-24T03:14:05.491Z cpu0:65940)WARNING: LinNet: netdev_watchdog:3688: NETDEV WATCHDOG: vmnic1: transmit timed out
2018-06-24T03:14:05.491Z cpu0:65940)<6>ixgbe 0000:02:00.0: vmnic1: Fake Tx hang detected with timeout of 160 seconds
2018-06-24T03:14:06.493Z cpu0:65943)WARNING: LinNet: netdev_watchdog:3688: NETDEV WATCHDOG: vmnic1: transmit timed out
2018-06-24T03:14:06.493Z cpu0:65943)<6>ixgbe 0000:02:00.0: vmnic1: Fake Tx hang detected with timeout of 160 seconds
2018-06-24T03:14:07.495Z cpu2:65934)WARNING: LinNet: netdev_watchdog:3688: NETDEV WATCHDOG: vmnic1: transmit timed out
2018-06-24T03:14:07.495Z cpu2:65934)<6>ixgbe 0000:02:00.0: vmnic1: Fake Tx hang detected with timeout of 160 seconds
2018-06-24T03:14:08.497Z cpu2:65930)WARNING: LinNet: netdev_watchdog:3688: NETDEV WATCHDOG: vmnic1: transmit timed out
2018-06-24T03:14:08.497Z cpu2:65930)<6>ixgbe 0000:02:00.0: vmnic1: Fake Tx hang detected with timeout of 160 seconds
2018-06-24T03:14:09.310Z cpu6:106438)VSCSI: 2891: handle 8230(vscsi1:1):Reset [Retries: 40/0] from (vmm0:obiwan.abocor.com)
2018-06-24T03:14:09.310Z cpu0:76124)WARNING: netschedHClk: NetSchedHClkWatchdogSysWorld:4571: vmnic1 : scheduler(0x4304780af120)/device(0x430250c61080) 1/1 lock up [stopped=0]:
2018-06-24T03:14:09.310Z cpu0:76124)WARNING: netschedHClk: NetSchedHClkWatchdogSysWorld:4578: detected at 97944998 while last xmit at 97938998 and 36524 bytes in flight [window 86460 bytes]
2018-06-24T03:14:09.310Z cpu0:76124)WARNING: netschedHClk: NetSchedHClkWatchdogSysWorld:4583: and last enqueued/dequeued at 97938998/97938998 [stress 0]
2018-06-24T03:14:09.310Z cpu0:76124)WARNING: netschedHClk: NetSchedHClkWatchdogSysWorld:4586: with 478 pkts inflight
2018-06-24T03:14:09.310Z cpu0:76124)WARNING: netschedHClk: NetSchedHClkWatchdogSysWorld:4602: vmnic1: packets completion seems stuck, issuing reset
2018-06-24T03:14:09.310Z cpu0:65596)<6>ixgbe 0000:02:00.0: vmnic1: Fake Tx hang detected with timeout of 160 seconds
2018-06-24T03:14:09.499Z cpu0:65940)WARNING: LinNet: netdev_watchdog:3688: NETDEV WATCHDOG: vmnic1: transmit timed out
2018-06-24T03:14:09.499Z cpu0:65940)<6>ixgbe 0000:02:00.0: vmnic1: Fake Tx hang detected with timeout of 160 seconds
2018-06-24T03:14:10.109Z cpu7:103063)HBX: 2958: 'FN_TMP01': HB at offset 3473408 - Waiting for timed out HB:
2018-06-24T03:14:10.109Z cpu7:103063)  [HB state abcdef02 offset 3473408 gen 19 stampUS 96653456148 uuid 5b2d8e09-3d404882-af51-001b21bdc930 jrnl <FB 7> drv 24.81 lockImpl 4 ip 192.168.0.10]
2018-06-24T03:14:10.500Z cpu2:65943)WARNING: LinNet: netdev_watchdog:3688: NETDEV WATCHDOG: vmnic1: transmit timed out
2018-06-24T03:14:10.500Z cpu2:65943)<6>ixgbe 0000:02:00.0: vmnic1: Fake Tx hang detected with timeout of 160 seconds
2018-06-24T03:14:10.704Z cpu0:66306)HBX: 2958: 'FN_TMP01': HB at offset 3473408 - Waiting for timed out HB:
2018-06-24T03:14:10.704Z cpu0:66306)  [HB state abcdef02 offset 3473408 gen 19 stampUS 96653456148 uuid 5b2d8e09-3d404882-af51-001b21bdc930 jrnl <FB 7> drv 24.81 lockImpl 4 ip 192.168.0.10]
2018-06-24T03:14:11.120Z cpu0:67666)HBX: 2958: 'FN_TMP01': HB at offset 3473408 - Waiting for timed out HB:
2018-06-24T03:14:11.120Z cpu0:67666)  [HB state abcdef02 offset 3473408 gen 19 stampUS 96653456148 uuid 5b2d8e09-3d404882-af51-001b21bdc930 jrnl <FB 7> drv 24.81 lockImpl 4 ip 192.168.0.10]
2018-06-24T03:14:11.502Z cpu6:65934)WARNING: LinNet: netdev_watchdog:3688: NETDEV WATCHDOG: vmnic1: transmit timed out
2018-06-24T03:14:11.502Z cpu6:65934)<6>ixgbe 0000:02:00.0: vmnic1: Fake Tx hang detected with timeout of 160 seconds
2018-06-24T03:14:12.503Z cpu2:65930)WARNING: LinNet: netdev_watchdog:3688: NETDEV WATCHDOG: vmnic1: transmit timed out
2018-06-24T03:14:12.504Z cpu2:65930)<6>ixgbe 0000:02:00.0: vmnic1: Fake Tx hang detected with timeout of 160 seconds
2018-06-24T03:14:13.506Z cpu0:65940)WARNING: LinNet: netdev_watchdog:3688: NETDEV WATCHDOG: vmnic1: transmit timed out
2018-06-24T03:14:13.506Z cpu0:65940)<6>ixgbe 0000:02:00.0: vmnic1: Fake Tx hang detected with timeout of 160 seconds
2018-06-24T03:14:14.506Z cpu3:65943)WARNING: LinNet: netdev_watchdog:3688: NETDEV WATCHDOG: vmnic1: transmit timed out
2018-06-24T03:14:14.506Z cpu3:65943)<6>ixgbe 0000:02:00.0: vmnic1: Fake Tx hang detected with timeout of 160 seconds


I understand that I only have 4 disks in a striped mirror (two vdevs) and I will not saturate 10Gb, but that should still be faster than my RAID5 and the striped SSDs. But not even iperf??

What I have tried:

  • Tunables from the 10Gb primer and several other posts
  • Jumbo frames (I got less performance with them enabled)
  • Disabling delayed ACK in ESXi
  • Updating ESXi 6.5 to U2
  • Setting all devices to the Round-Robin policy with 1 IOPS (sketched below)
  • Emailing 10Gtek support
  • Both vDS and standard switches
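For reference, the Round-Robin/1-IOPS change was applied per device along these lines (standard esxcli commands; the naa ID is a placeholder, not my actual LUN):
Code:
# Find the iSCSI LUN's naa ID and its current path selection policy
esxcli storage nmp device list

# Set the path selection policy to Round Robin
esxcli storage nmp device set --device naa.xxxxxxxxxxxxxxxx --psp VMW_PSP_RR

# Switch paths after every single I/O instead of the default 1000
esxcli storage nmp psp roundrobin deviceconfig set --device naa.xxxxxxxxxxxxxxxx --type iops --iops 1
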
Links consulted:
And a ton more on this forum and elsewhere.

  1. Might this be a hardware issue (NICs or cables)?
  2. Is there any tuning that I'm missing? When I tried tunables it got worse.
  3. Could the CPU in the ESXi box be a bottleneck for 10Gb?
  4. Could my choice of NICs (both the NetApp Chelsio and the 10Gtek) be wrong? Perhaps another model?
  5. Does anyone have a similar setup in terms of 10Gb connections who can shed some light?
Thanks,

EDIT 1: Change CMD tags to CODE tags
EDIT 2: Disk Model
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
HDD1: 4 x HGST 8TB striped mirror
I am not sure exactly what model, since you didn't say, but I took a guess and looked up the data sheet for:
HUH7210xxALE60y / HUH7210xxALN60y at https://www.hgst.com/sites/default/files/resources/Ultrastar-He10-DS.pdf
According to that, the drives might have a transfer rate of between 225 and 249 MB/s, and you have two vdevs in your pool, so you should expect no more than 249 MB/s times the number of vdevs for throughput. That would be around 500 MB/s, and it is a mechanical limitation of the disks. If you are seeing more than that, count yourself lucky. Now, HGST lists that as a 'Sustained Transfer Rate', but those are probably peak speeds; every drive I have ever worked with slowed down rapidly from the 'peak' the manufacturer listed, if it could hit that number at all. You should expect around 65% of that speed, based on what I have seen.
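Worked out with those data-sheet numbers (the 65% is my rule of thumb, not a spec):
Code:
249 MB/s × 2 vdevs ≈ 500 MB/s theoretical peak
500 MB/s × 0.65    ≈ 325 MB/s realistic sustained ≈ 2.6 Gbit/s

Which lands right around the ~2.5 Gbits/sec ceiling you are reporting.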

Jumbo frames (I got less performance with them enabled)
I wish I could make that right there a sticky so everyone could see it. I (and others) have been saying that the time of jumbo frames was 15 to 20 years ago; they are not applicable with modern hardware.

But the second I fire up a VM, no matter its size, I get between ~1.8 and ~2.5 Gbits/sec.
This is probably a speed burst due to caching and not indicative of any sustained transfer rate.

I hope that this helps and that others can give you more insight into your configuration.
 

Pliqui

Dabbler
Joined
Apr 24, 2018
Messages
25
Thanks for your reply.

I agree with you about the math for the HGST disks, and I plan to expand in the future, but transferring a VM from the local RAID5 to FreeNAS is slower than the reverse.

But I think the main issue right now is the 10Gb connection. Shouldn't an iperf test show something like ~9.9 Gbits/sec? (It's a disk-independent test.)

I can't get that with either iperf2 (before upgrading ESXi 6.5 to U2) or iperf3.
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
Can you measure the surface temps of the NIC in the ESXi host? Also monitor CPU usage during network testing.
Also, I'm not an iperf3 user, but if you can force multiple connections, that may show some difference.

As for the iSCSI, what do you have set as the PSP?
 

Pliqui

Dabbler
Joined
Apr 24, 2018
Messages
25
Can you measure the surface temps of the NIC in the ESXi host? Also monitor CPU usage during network testing.
Also, I'm not an iperf3 user, but if you can force multiple connections, that may show some difference.

As for the iSCSI, what do you have set as the PSP?
Hi KD, I don't have a measurement device at the moment, but the case is open on both sides right now.

I was doing some iperf3 runs again while transferring a VM. In ESXi 6.5 U2 the bundled version of iperf got upgraded.

The results are all over the board, but still topping out at the same max: from FreeNAS to ESXi it is always ~5.0-5.5 Gbits/sec, while from ESXi to FreeNAS it fluctuates between ~1.8 and ~2.5 Gbits/sec and only at random times reaches ~5.5 Gbits/sec.

[attached screenshot: iperf3 results]


I've tried different iperf commands before with different permutations, and the results were around the same.

For example, pulling these from history:
Code:
-P 2 -w 128k -fg
-P 1 -i 1 -t 10 -d
-d -w 512k -P 4 -t 90 -i 2
-m -i1 -fg
-P 2 -t 60
-P 4 -w 4096k
-P 8 -w 4096k
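
Each of those flag sets was run against an iperf server on the FreeNAS box, along these lines (a sketch; note some of those flags are iperf2-only):
Code:
# On FreeNAS (server side):
iperf3 -s

# On ESXi (client side), e.g. four parallel streams with a 4M window:
/usr/lib/vmware/vsan/bin/iperf3 -c 10.0.1.2 -P 4 -w 4096k -t 30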


And CPU usage was OK.

ESXi
[attached screenshot: ESXi CPU usage]


FreeNAS:

[attached screenshot: FreeNAS CPU usage]


Thanks
 