
Jumbo frames notes

jgreco

Resident Grinch
Moderator
Joined
May 29, 2011
Messages
11,784
Thanks
3,041
#1
We got into a little back-and-forth on jumbo frames on the 10 gig thread. I don't know if anyone cares about this but I thought I'd throw some info out there.

IP network packets have a Maximum Transmission Unit (MTU), which on a conventional Ethernet interface is 1500 bytes. Ethernet is not the only transport available for IP, but it is fairly common. IP endpoints are expected to be able to ascertain the smallest MTU supported by the intermediate networks, these days typically using Path MTU Discovery (PMTUD).

In the early days of Ethernet, we moved fairly rapidly from 10Mbps to 100Mbps and beyond. This was great for throughput but had the horrifying side effect of increasing the number of packets per second gear was expected to cope with. Especially on a hub network, this was ... problematic. Further, CPU and interrupt load on the endpoints were rapidly increasing as speeds increased. Someone recognized that putting more data in a single packet was a fix. Thus was born ... Jumbo frames!

Skipping a bunch of history, though, I want to point out: in moving from 10Mbps to 10Gbps Ethernet, we have increased the network throughput by a factor of 1000. Jumbo typically increases the frame size from 1500 to something probably not larger than 9216, which is only about 6 times larger. Jumbo sizes have not kept pace with throughput increases, and therefore potential packets per second continue to increase with throughput.
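To put rough numbers on that, here's a back-of-the-envelope sketch. The 38 bytes of per-frame overhead is the standard on-wire figure (preamble + Ethernet header + FCS + inter-frame gap); everything else is just arithmetic:

```python
# Back-of-the-envelope: peak Ethernet frame rate at a given link speed and MTU.
# Per-frame on-wire overhead: 8 preamble + 14 header + 4 FCS + 12 interframe gap.
OVERHEAD = 38

def max_frames_per_second(link_bps, mtu):
    """Upper bound on frames/second sending back-to-back maximum-size frames."""
    bits_per_frame = (mtu + OVERHEAD) * 8
    return link_bps / bits_per_frame

print(round(max_frames_per_second(10e6, 1500)))   # 10 Mbps, 1500 MTU: ~813 pps
print(round(max_frames_per_second(10e9, 1500)))   # 10 Gbps, 1500 MTU: ~812,744 pps
print(round(max_frames_per_second(10e9, 9000)))   # 10 Gbps, 9000 MTU: ~138,305 pps
```

Even with jumbo enabled, the move from 10Mbps to 10Gbps takes the worst-case packet rate from under a thousand per second to well over a hundred thousand.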

Today, jumbo primarily works to slightly increase the data-to-overhead ratio of an Ethernet network, moving network efficiency from about 95% (1500 MTU) to around 99% (9000 MTU). The reduction in CPU and interrupt processing has largely been nullified by the introduction of large segment offload (LSO) and large receive offload (LRO), and hub-oriented technologies have virtually disappeared. Modern switch silicon may be capable of handling a gigapacket per second or more, all at wire speed.
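Those efficiency figures are easy to reproduce. A sketch, assuming 38 bytes of on-wire Ethernet framing overhead and 40 bytes of IPv4 + TCP headers with no options:

```python
# Rough goodput efficiency of TCP over Ethernet for a given MTU.
FRAME_OVERHEAD = 38    # preamble + Ethernet header + FCS + inter-frame gap
IP_TCP_HEADERS = 40    # 20-byte IPv4 header + 20-byte TCP header, no options

def tcp_efficiency(mtu):
    """Fraction of on-wire bits that are actual TCP payload."""
    payload = mtu - IP_TCP_HEADERS
    on_wire = mtu + FRAME_OVERHEAD
    return payload / on_wire

print(f"{tcp_efficiency(1500):.1%}")   # 94.9%
print(f"{tcp_efficiency(9000):.1%}")   # 99.1%
```

So the best case for jumbo, all other things being equal, is on the order of a 4-5% improvement in goodput.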

Layer 2

Layer 2 refers to the Data Link Layer. For Ethernet, you can think of this as the network formed by attaching all the member devices to a switch. All of these devices must agree on certain parameters, and rather importantly, all of them must agree on and be able to support the MTU the administrator wishes to use. If you have two devices on the network, one of which is capable of and configured for a 9000 MTU while the other cannot handle frames that large, everything will appear to work until a TCP or UDP packet larger than 1500 bytes is sent to the limited device, at which point the packet is silently lost.

Jumbo frames must be configured identically on all devices attached to a jumbo network, or erratic and unpredictable behaviour may result.

Layer 3

Layer 3 refers to the Network Layer. For Ethernet, these are routed networks. Please note that what many end users refer to as a "router" is actually a NAT gateway and not a router at all. A router is a device that performs routing, the act of selecting the best path for a packet and moving it on to its next hop.

Routers are what make the Internet possible. They connect together discrete networks, both within an organization and externally. Routers are the natural boundary point of a network's MTU, and therefore the point at which a packet might encounter a different MTU. A router may be expected to fragment packets, or to signal the sender via ICMP, when forwarding from a network with one MTU to another. I'll come back to this in a bit.

Modern jumbo and issues

A lot of modern equipment supports jumbo, but when I say that, I really mean "supports." It is a technical feature on a checklist, and sometimes the reality is that it doesn't work as well as you'd expect. Or at all.

One of the problems with jumbo is that on much i386/amd64 hardware, jumbo is treated differently from standard traffic. In some of the FreeBSD drivers, for example, buffers for jumbo traffic are allocated/handled differently (in some cases using fixed parameters) than standard traffic. This can lead to different behaviours for a host's connection to the network depending on whether or not jumbo has been enabled.

Another is that you might decide to design a storage network as jumbo, which is actually a fairly sensible thing to do. However, if you then need to introduce a device that doesn't support jumbo, or supports jumbo but with a smaller MTU than you've selected, you're hosed.

The real problem is dealing with the layer 3 issue, however. If you have a NAS such as FreeNAS, it probably expects to be able to access your DNS server, your mail server, etc. If those are not on the jumbo network, and your router cannot properly cope with being the MTU gatekeeper, you can run into mysterious issues such as the test mail working but crisis notification mails not working. That's very bad.

So in recent years, I've seen a tendency for network administrators to ditch jumbo. It adds a lot of complexity and potential issues to a network for a modest boost in performance.

Here at grinch central, the move towards layer 3 switching revealed some unhappy issues with network MTU handling by layer 3 switches: they assume a chassis-wide MTU. This appears to be an ugly issue with a lot of layer 3 switches.

Conclusion

Well, my conclusion, anyways: Screw jumbo.

Jumbo might be fine for a totally dedicated storage network without any external routing requirements, but even there it is quite possible to run into issues with how FreeBSD handles jumbo traffic differently.

Jumbo is a technology that served a valid purpose at one point, but hardware offload and wire speed routing have significantly eaten away at the advantages it once offered. It will offer you a modest performance boost if you can get it configured properly, but that time might be better spent designing your network topology differently.
 
Joined
Mar 25, 2012
Messages
19,151
Thanks
1,857
#2
I just want to emphasize one thing in this post...

Jumbo frames must be configured identically on all devices attached to a jumbo network, or erratic and unpredictable behaviour may result
Do NOT assume that if you set every desktop, network printer, and your router to 9000 that they are all identically configured. Different network hardware will treat that value differently. To an Intel NIC, an MTU of 9000 is NOT the same as an MTU of 9000 on a Realtek NIC. Network printers and routers often follow different conventions still. ESXi on my server (with an Intel NIC), if set to 9000, only provides a data packet that is something like 8700 bytes. The highest value you can set ESXi to is 9000. So in my case this means that I cannot set the rest of my devices on my LAN to 9000 and expect trouble-free networking.

The conclusion basically sums up everything about jumbo frames nicely.. screw jumbo frames.

They had a purpose 10 years ago, and there are no doubt people who will swear up, down, and sideways that they matter a great deal. There are also very specific situations in which you can see some benefit from an MTU specifically crafted for a particular workload (for example, if you are doing SQL transactions over a LAN and every transaction is precisely a given number of bytes). The people who know the bigger MTU matters for their workload will have gobs and gobs of data to support their claim.

For those who want to claim that an iperf test can confirm or deny a benefit from jumbo frames: that is not the whole story, and that assessment is incorrect.

And for anyone that asks, I stick with the default MTU for a reason... it works.
 

jgreco

Resident Grinch
Moderator
Joined
May 29, 2011
Messages
11,784
Thanks
3,041
#3
Do NOT assume that if you set every desktop, network printer, and your router to 9000 that they are all identically configured. Different network hardware will treat that value differently. To Intel, an MTU of 9000 is NOT the same as a Realtek NIC with an MTU of 9000. Network printers and routers often have different standards still. ESXi on my server (with an Intel NIC), if set to 9000 only provides a data packet that is something like 8700 bytes. The highest value you can set ESXi to is 9000. So in my case this means that I cannot set the rest of my devices on my LAN to 9000 and expect trouble-free networking.
Well, MTU does have a well-defined meaning, and if it isn't the same on your Intel and Realtek NIC, either the hardware or the driver is broken. What I think you are describing is a vendor's broken implementation, which falls under the umbrella I labelled "supports." Usually network appliances like printers and other gadgets won't support jumbo, or if they do, it'll be virtually untested. I certainly agree that the result resembles what you describe. Testing for correct jumbo configuration includes verifying that each endpoint is both transmitting and receiving jumbo frames correctly. Because the IP stack will fragment packets transparently, you can't just run ping with a 9K packet size and trust that a response is evidence of correctness; you actually have to check by inspecting the packets. And that has to be a complete test of all the nodes on the network, or at least all the nodes with different hardware.
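As a concrete illustration of why a naive ping proves so little: to exercise a full 9000-byte IP packet you have to subtract the IP and ICMP headers from the size you ask for, and set the don't-fragment bit so the stack can't quietly split the packet. A sketch of the arithmetic, assuming IPv4 with no options:

```python
# Payload size for a ping that produces an IP packet exactly `mtu` bytes long.
IP_HEADER = 20     # IPv4 header, no options
ICMP_HEADER = 8    # ICMP echo request header

def ping_payload_for_mtu(mtu):
    """ICMP echo data size that fills the given MTU with no fragmentation."""
    return mtu - IP_HEADER - ICMP_HEADER

print(ping_payload_for_mtu(9000))   # 8972
print(ping_payload_for_mtu(1500))   # 1472

# With don't-fragment set, a reply means the full-size frame actually made it:
#   FreeBSD: ping -D -s 8972 <host>
#   Linux:   ping -M do -s 8972 <host>
```

Even then, a successful reply only covers that one pair of endpoints in one direction, which is why the full qualification pass above is still needed.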

In practice, though, on a real modern network, jumbo gets even more complex, because most nontrivial networks are using things like VLANs, which also rob frames of a few bytes (an 802.1Q tag adds 4). This gets real fun because, at least years ago, different hardware dealt with this differently: the Intel stuff would just transparently handle it IIRC, while other drivers might or might not, meaning you might need to configure a vlan trunk for a slightly larger MTU than you might expect.
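The tag arithmetic itself is simple; what varies is whether the driver accounts for it on your behalf. A sketch of what the trunk interface has to carry, assuming standard 802.1Q tagging:

```python
# MTU a physical trunk interface must support to carry tagged VLAN frames
# whose own MTU is `vlan_mtu`. Each 802.1Q tag adds 4 bytes.
DOT1Q_TAG = 4

def trunk_mtu(vlan_mtu, tag_levels=1):
    """MTU required on the parent interface for the given tagging depth."""
    return vlan_mtu + DOT1Q_TAG * tag_levels

print(trunk_mtu(9000))      # 9004 — single-tagged jumbo VLAN
print(trunk_mtu(9000, 2))   # 9008 — stacked (QinQ) tagging
```

If the driver silently absorbs the tag, configuring the parent for 9000 works; if it doesn't, the parent needs 9004 or tagged jumbo frames get dropped.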

For those that want to claim that an iperf test can confirm or deny a benefit with jumbo frames
Well, there's no doubt that there's a benefit with jumbo frames. It's roughly 5% faster. The question is whether it is worth the effort. Because while the jumbo guy is doing all the testing needed to qualify his network as jumbo-compatible, I find it more effective to just add a second LACP link and get up to 100% faster. Plus, at the same time, I'll be able to add devices to my network in the future without having to qualify them for jumbo as well.

To put this all in a little perspective: the switching core here used to fully support jumbo, and we had a few networks where that was used. On the FreeBSD based routers, though, this made life complex because the primary links to the switching core were LACP 2xGbE on which vlans were delivered. So the em interfaces had to be configured for jumbo, then the LACP lagg interface also had to be configured for it, and then the individual vlans were set according to whatever the attached network was. But we also needed so many ports that we had to use the onboard 10/100 fxp's for some stuff and they acted as a backup trunk, and they passed the vlan tags differently, so they needed a different mtu at the fxp level... and then of course at the routing control plane layer, OSPF also had to be configured per-IP-interface. Lots of parts all need to be working exactly right. And it did.

You'd THINK that upgrading would not be difficult, but you'd be wrong. The new modern layer 3 switchgear is amazing stuff, but it copes with MTU size differences by simply ignoring the issue and hoping that things like PMTUD "figure it out" (which may in fact kinda work). Some networking stuff, however, actually requires MTUs to be configured correctly. Consider OSPF: the OSPF implementation on the switch expects you to use "ip ospf mtu-ignore", but that's pretty idiotic, because you're really opening yourself up to random ExStart hangs: when the switch sends out a jumbo-sized OSPF packet, a host that thinks it is on a non-jumbo network receives and drops it, and the adjacency basically never establishes. FAIL.

So when we did the upgrade, it became a question of, do we continue to do routing on the FreeBSD routers (which I do love, and handle the MTU issues correctly, but which have limited performance) or do I offload most of the routing onto the layer 3 switches (which don't handle the MTU issue but are capable of routing a gigapacket per second/terabit per second)?

Do I continue beating my head trying to make jumbo work for a minimal gain that meant more to us ten years ago? Or do I take the pragmatic route and go back to plain standard ethernet and take advantage of a different set of benefits?

So that's a little more insight into what I summarized in a single sentence in the original post:

Here at grinch central, the move towards layer 3 switching revealed some unhappy issues with network MTU handling by layer 3 switches: they assume a chassis-wide MTU.
But there's a lot of annoyance and pent-up jumbo rage built into that. Jumbo works best on something like a small dedicated private storage network on its own switchgear, like what you might find attaching a NAS to several ESXi hosts with the same type of network hardware. But it's like the dark side of the force: "If once you start down the dark path, forever will it dominate your destiny, consume you it will..." - My brother-in-green, Yoda

My conclusion: Screw jumbo.

That isn't the right answer for everyone of course, and of course in some environments it will be easier to make it work correctly. But do consider whether the modest gain is worth the headaches.
 

Dave Genton

FreeNAS Experienced
Joined
Feb 27, 2014
Messages
133
Thanks
3
#4
For the most part, I agree. As a Network Engineer for Cisco well beyond 20 years now, I have lived through everything from an MTU setting saving the CPU on a WAN router to today, where I must daily defend its merits (or not) in a new data center design being proposed. On pre-existing networks the hassle is huge; then again, I have done an MTU migration as a consultant for one of the three largest financial institutions on the planet. Talk about a headache, one nearly a full year in the making...

That said, having the opportunity to design and build greenfield datacenters with storage in both Fibre Channel and iSCSI, I will always tend to follow best practices, which include the use of jumbo frames for transport. I always ensure that iSCSI networks are built without default gateways, so the customer has no way of routing off the segment after I leave, even if they wanted to, and I fully explain during knowledge transfer that these are purpose-built, isolated networks. In most cases this would be for booting servers over iSCSI. Basically, this replicates practices laid down by decades of Fibre Channel use by the big storage vendors, despite everyone thinking that since it's IP, since it's Ethernet, "I can just use my existing switches and infrastructure," etc. You can, but you cannot expect it to behave or perform like Fibre Channel networks do.

With Cisco UCS hitting the street, I have gotten the opportunity to partner with VMware and build some serious infrastructures for virtualization and storage platforms. Great, interesting stuff, and jumbo frames certainly have their place, but it's a well-designed and isolated place, I will give you that :)
 