10GbE Interface Input Drops/LACP Fail

I have an Intel 10GbE lagg running LACP to a pair of Junipers running MC-LAG, with VLANs and an MTU of 9000. Everything is seemingly working correctly: read performance over NFS from a Linux client running on top of ESX seems reasonable enough for the disk config (5x 1TB mirrors at 350+ MB/s). NFS write speed is slower and obviously held back by the cheap SLOG device that is currently in use.

But under sustained, saturated writes, the underlying interfaces start showing input drops, which, in addition to causing packet loss, also triggers LACP to fail over to the standby interface. "netstat -m" shows the count of denied requests for 9k jumbo clusters increasing.

Any pointers for fixing this?

[root@yard] ~# netstat -I ix1 5
            input          (ix1)           output
   packets  errs idrops      bytes    packets  errs      bytes colls
     12764     0      0   18994594      12766     0    1224124     0
    290022     0   3343  440256076     292205     0   22358078     0
    254550     0     27  382015870     257558     0   11222406     0
    113403     0      0  170311760     114983     0   16606922     0
     42613     0      0   63744840      42625     0    3787048     0
    339145     0   2671  514292488     341234     0   26096711     0
         2     0     27      10252          5     0        640     0
    292022     0      5  439514838     296153     0   21959828     0
     69310     0      0  103193882      69318     0    5533882     0
    340498     0   1520  512331556     342288     0   26349982     0
         7     0     25      17678         10     0       1666     0
    296737     0      7  446358184     300921     0   22367250     0
     69474     0      0  103995858      69488     0    5586580     0
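In case it helps anyone following along, this is roughly how I've been watching the jumbo-cluster side of it. These are stock FreeBSD commands; the exact wording of the netstat -m lines may differ a bit between releases:

# configured limits for 9k jumbo clusters and for mbuf clusters overall
sysctl kern.ipc.nmbjumbo9 kern.ipc.nmbclusters

# usage and denied-request counters; the 9k jumbo lines are the interesting ones here
netstat -m | grep -i jumbo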


The platform is a Supermicro X8DTL w/ 2x E5620 and 24GB of RAM, with autotuning enabled.
 
The issues go away with jumbo frames disabled; sequential write speed from the same Linux client then runs around 110MB/s.

Actually, make that: the problems go away when autotuning is disabled and all sysctl variables are returned to defaults.
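For completeness, a quick spot-check that the usual autotune targets really are back at stock values. This assumes autotune only touched the common socket-buffer sysctls; your list of tunables may have been different:

# socket buffer limits that FreeNAS autotune commonly raises on 10GbE boxes
sysctl kern.ipc.maxsockbuf \
       net.inet.tcp.sendbuf_max net.inet.tcp.recvbuf_max \
       net.inet.tcp.sendspace net.inet.tcp.recvspace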
 

jgreco

Resident Grinch
Kelsey, got your e-mail, but the connectivity here sucks badly enough that I'm not going to compose a reply via ssh/elm (1200ms ping with 20% packet loss, hah). I got crunched into building some gear to run out east and am now sitting at the train station. ;-)

Is that an X520? X540? I seem to recall both show as ix.

First off, jumbo frames suck. I'm pretty convinced there's enough driver badness in the different code paths used for buffer allocation, etc., that pretty much no stupid issue surprises me anymore when it comes to jumbo.

Second, try setting hw.ix.enable_aim to zero. This may or may not help. I have a suspicion it may.
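Something like this, as a rough sketch (whether it's writable at runtime depends on the driver version; on FreeNAS the clean way is to add it as a loader tunable in the GUI so it persists):

# try it live first, if the driver allows runtime changes
sysctl hw.ix.enable_aim=0

# otherwise set it as a boot-time tunable
echo 'hw.ix.enable_aim="0"' >> /boot/loader.conf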

Third, while I like the Intel cards for 1G, the go-to cards for FreeNAS for 10G are probably the Chelsio; the T420-CR is available relatively inexpensively on eBay and works very well. I've used the X520 with no issues, but perhaps I wasn't stressing it in the same way you were.

I doubt very much that disabling autotune and returning all the sysctls to defaults has actually fixed the problem. More likely it has reduced pressure on the system to the point that whatever issue you're hitting no longer afflicts you. To me that's just hanging the knife a few feet above your head and hoping the rope holds.

Suggestion: turn off jumbo, set enable_aim to zero, re-run autotune, see if it's fixed. If not, you may find your time better spent acquiring a T420-CR or equivalent.
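Rough sketch of the jumbo rollback for a quick test, assuming the lagg is lagg0 with ix0/ix1 as members (the names are a guess; on FreeNAS you'd normally set this through the interface options in the GUI, and the switch ports and ESX side have to agree or you'll just trade one problem for another):

# drop the members and the lagg back to standard frames; expect the links to bounce
ifconfig ix0 mtu 1500
ifconfig ix1 mtu 1500
ifconfig lagg0 mtu 1500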
 
Joe, the NICs are X520s we had around. For some reason it took me a while to find the 10GbE/jumbo-frames threads on the forums, but we generally agree with that conclusion. But... we already have jumbo frames enabled on this network for the Tintris and ESX hosts per their recommendations, and it is most convenient to keep them enabled since it is already in production...

I went ahead and ordered RAM to bump the host from 24GB to 64GB (24GB is too small anyway) and will see if I can break it again. There's a chance I was still using autotune variables from when the host had even less RAM. And I agree it would be nice if autotune actually re-tuned the host at every boot rather than just once.
 

jgreco

Resident Grinch
Yeah, an existing dedicated storage net is problematic... I am told the Chelsio stuff works well with jumbo, but I haven't tried it since I got tired of trying to un-break storage nets every time something changed, had a fit one day, and transitioned it all to 1500.

The bump to 64G may place more stress on the system due to larger transaction groups. Typically you'll discover that you can eat a lot of data VERY quickly, until you're committing one txg to disk while you've managed to fill another one, and everything hangs until that first one clears. Lather, rinse, repeat.

Current ZFS is much better than old ZFS about this sort of thing, but it still helps if you lower vfs.zfs.txg.timeout from 5 down to 1. You lose some write speed but you gain a LOT in overall responsiveness when under severe write pressure.
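A minimal sketch of that change; try it at runtime first, and on FreeNAS persist it as a sysctl tunable in the GUI rather than editing files by hand:

# commit transaction groups every second instead of every five
sysctl vfs.zfs.txg.timeout=1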

Keep me posted, but I'm outta the office until Xmas eve.