SOLVED Transmit slow with Chelsio T420-CR (BUG??)

DaveY

Contributor
Joined
Dec 1, 2014
Messages
141
I'm at a complete loss here. I recently installed a Chelsio T420-CR to boost NFS performance, and I've noticed that the transmit speed degrades over time, dropping from 6-9Gbps to 60-400Mbps.

Current setup:
FreeNAS 11.1U6 / 32GB RAM / Chelsio T420-CR
Jumbo Frames enabled / Direct connect to workstation (i.e. no switches)
TSO, LRO Enabled

WORKSTATION
Ubuntu 18.04 / 32GB RAM / Intel X520-SR
TSO, LRO Disabled due to known driver bug

Performance:
After a clean reboot of both FreeNAS and workstation, iperf results:
FN -> Workstation ~6Gbps
Workstation -> FN ~9.88Gbps

24 hours later, noticed dramatic pauses and slowdown with NFS
FN -> Workstation ~200-400Mbps
Workstation -> FN ~9.88Gbps

FreeNAS still receives packets at near line speed, but transmission speed dropped significantly. At first I thought maybe the problem was on workstation end, so I rebooted the workstation, but the transfer numbers didn't change.

Rebooting FreeNAS brought performance back to 6Gbps / 9.88Gbps. I've made various changes to rule out hardware issues: disabling jumbo frames, disabling TSO and LRO on the Chelsio, changing TCP window sizes, changing the TCP congestion algorithm from the default newreno to htcp, etc. Nothing made a difference. I've tuned the network parameters based on the various 10GbE tuning guides out there (including the ones posted in this forum).

fn-10gb-tuning.png


Anyone with experience using these Chelsio's? Any idea why FN performance would change so much in a matter of 24 hours?
 

DaveY

Something is seriously wrong with this 10 Gbps setup. Just ran iperf again with FreeNAS as the client sending to my workstation: 13.1 Mbits/sec!!! Going the other direction (FreeNAS as server), I still get near wire speed. The same test on the gigabit interface also gets wire speed in both directions, so the problem appears to be limited to the transmit side of the Chelsio card.

iperf -c 192.168.1.10
------------------------------------------------------------
Client connecting to 192.168.1.10, TCP port 5001
TCP window size: 4.00 MByte (default)
------------------------------------------------------------
[ 3] local 192.168.1.1 port 24983 connected with 192.168.1.10 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.1 sec 15.8 MBytes 13.1 Mbits/sec

I tried swapping fibre cable as well to rule it out. Nothing fixes it except a reboot. I thought Chelsio was the recommended brand from ixSystems??!
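
For anyone logging these tests over time, here is a minimal sketch of pulling the bandwidth figure out of an iperf summary line like the one above and comparing it with the nominal 10GbE line rate. The function name `parse_iperf_bandwidth` and the regular expression are illustrative assumptions, not part of iperf; it only handles the `bits/sec` summary format shown in this post.

```python
import re

# Multipliers for the unit prefixes iperf prints before "bits/sec".
_UNIT = {"": 1, "K": 1e3, "M": 1e6, "G": 1e9}


def parse_iperf_bandwidth(line):
    """Return the bandwidth in bits/sec from an iperf summary line, or None.

    Hypothetical helper: it looks for "<number> <prefix>bits/sec" and
    ignores everything else on the line.
    """
    match = re.search(r"([\d.]+)\s+([KMG]?)bits/sec", line)
    if match is None:
        return None
    return float(match.group(1)) * _UNIT[match.group(2)]


# Summary line copied from the iperf run in this post.
line = "[ 3] 0.0-10.1 sec 15.8 MBytes 13.1 Mbits/sec"
bps = parse_iperf_bandwidth(line)

link_bps = 10e9  # nominal 10GbE line rate
print(f"{bps / 1e6:.1f} Mbit/s is {100 * bps / link_bps:.2f}% of line rate")
```

Running something like this against saved iperf output after each test makes it easier to see when the transmit side starts to fall off.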
 

DaveY

No, still getting inconsistent speed (10Mbps - 1.2Gbps, with the average around 400Mbps). The only time I ever see full speed on transmit is after a reboot. I'm getting a few new SFPs, among other things, to rule out hardware, but this being a production box, it's been slow going. Any experience with these Chelsios or 10GbE adapters in general?
 

dlavigne

Guest
We've had a few reports on them, but nothing conclusive. If your tests don't work out, it's prob worth reporting at bugs.freenas.org for a dev to take a look at.
 

Mlovelace

Guru
Joined
Aug 19, 2014
Messages
1,111
Did you "tune" these tunables to your system or just copy what looked okay? They aren't a bad start, but you're loading the htcp kernel module without selecting it over newreno, which makes loading it pointless. 4M recvspace/sendspace is a good start, but why use a 64K increment? Just bump the buffer to 4M from the start. Also, the 16M upper limit might need to go up a bit. I have the T520-CR in one of my systems, so it's not apples to apples, but take a look at some of these initial values and keep testing.

StartingTunables.JPG
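
The points in this reply can be sketched as a small check over a set of tunables. This is only an illustration: the screenshots are not reproduced here, so the values below are the ones mentioned in the text (4M initial buffers, 64K increment, 16M upper limit, newreno with htcp loaded), and mapping them to these particular FreeBSD tunable names is an assumption. `review_tunables` is a hypothetical helper, not a FreeNAS tool.

```python
def review_tunables(tunables):
    """Return a list of remarks about a dict of FreeBSD TCP tunables."""
    notes = []

    # Loading cc_htcp only makes H-TCP available; it is not used until
    # net.inet.tcp.cc.algorithm is set to htcp.
    loaded = tunables.get("cc_htcp_load") == "YES"
    if loaded and tunables.get("net.inet.tcp.cc.algorithm") != "htcp":
        notes.append("htcp is loaded but not selected as the congestion algorithm")

    # Flag an auto-tuning increment that is smaller than the initial buffer,
    # which is what the reply questions (64K steps on a 4M starting buffer).
    for side in ("send", "recv"):
        space = tunables.get(f"net.inet.tcp.{side}space", 0)
        inc = tunables.get(f"net.inet.tcp.{side}buf_inc", 0)
        if 0 < inc < space:
            notes.append(f"{side}buf_inc ({inc}) is smaller than {side}space ({space})")

    return notes


# ASSUMED values, taken from the numbers quoted in the thread.
current = {
    "cc_htcp_load": "YES",
    "net.inet.tcp.cc.algorithm": "newreno",
    "net.inet.tcp.sendspace": 4 * 1024 * 1024,
    "net.inet.tcp.recvspace": 4 * 1024 * 1024,
    "net.inet.tcp.sendbuf_inc": 64 * 1024,
    "net.inet.tcp.recvbuf_inc": 64 * 1024,
    "net.inet.tcp.sendbuf_max": 16 * 1024 * 1024,
    "net.inet.tcp.recvbuf_max": 16 * 1024 * 1024,
}

for note in review_tunables(current):
    print(note)
```

The check deliberately says nothing about what the "right" values are; as the reply says, these are starting points to test against your own system.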
 

DaveY

Thanks for the feedback, Mlovelace. I copied these tunables as a starting point. As mentioned in my post, I did play with the values: 2M up to 16M buffer, etc. I left the default TCP algorithm as newreno, but loaded the htcp module so I could switch to it for testing. Neither one made a difference.

Agree on the 64K slow ramp-up. The fibre cables are certainly reliable enough to just use 4M, but I still doubt it'll make much difference. The numbers I'm seeing are WAYYY too far off. 400Mbps on a 10GbE link is 4% of max. It's hard to blame buffer size for numbers that low, especially since I'm getting wire speed on my gigabit interfaces with the same settings. That's partly why I suspect it may be a driver issue: my gigabit NICs use igb, while the Chelsios use cxgbe. The only odd part is why rebooting FN fixes the problem temporarily.
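
The arithmetic behind that argument can be sketched as follows. The 4% figure comes straight from the numbers in this post; the round-trip time is an ASSUMED value for a direct-connect fibre link (it was not measured in this thread), so treat the bandwidth-delay product as a rough order of magnitude only.

```python
def bdp_bytes(bandwidth_bps, rtt_seconds):
    """Bandwidth-delay product: bytes that must be in flight to fill the link."""
    return bandwidth_bps * rtt_seconds / 8


link_bps = 10e9        # nominal 10GbE line rate
observed_bps = 400e6   # transmit rate reported in the post

utilisation = 100 * observed_bps / link_bps
print(f"utilisation: {utilisation:.0f}% of line rate")

rtt = 0.0002  # ASSUMPTION: 0.2 ms round trip on a direct link, not measured
needed = bdp_bytes(link_bps, rtt)
buffer_bytes = 4 * 1024 * 1024  # the 4M sendspace/recvspace discussed above

print(f"buffer needed to fill the link: ~{needed / 1024:.0f} KiB")
print(f"configured buffer is ~{buffer_bytes / needed:.0f}x that size")
```

Under that assumed round-trip time, a 4M buffer is well over what the link needs, which is consistent with the doubt that buffer tuning could explain a drop to a few percent of line rate.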

Appreciate the suggestions though. It's always good to get feedback and ideas as I can often miss the obvious.
 

Mlovelace

You could try an older version of FreeNAS to test whether it is in fact a driver issue. Maybe install 9.10 or 11.0 and run your iperf tests again.
 

DaveY

Thanks for all the feedback, guys. I finally fixed the slow transmit speed. Putting an update here in case someone runs into the same issue I did.

Turns out both SFPs I purchased from an eBay vendor were either lemons, or the brand the seller said was compatible was not really compatible.

The broken SFPs were branded as JDSU 10G SFPs, and the seller claims they are OEM parts for Chelsio SFPs. He also claims they were brand new. After installing the new Chelsio-branded SFPs I purchased from another vendor, the transmit speed now runs at a solid 8Gbps. That's still not the 9+Gbps I'm seeing on receive, but it's much better than the 200Mbps of the JDSU ones. So at this point I'm not sure whether JDSU SFPs are just not compatible with T420s or the seller just sold me worn-out used parts.

Either way, my suggestion is pay a few extra bucks and get ones that have the Chelsio sticker on them and save yourself hours of troubleshooting.

Still can't explain how rebooting FN fixes a hardware problem temporarily.
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
I had to buy Chelsio-specific transceivers for my T520 to get it to work.
 