Static Routes (only set at boot?)

jeremybox · Feb 18, 2022

I am using a system on version TrueNAS-12.0-U7.
I have configured static routes via the UI. These routes do not seem to populate the routing table if there is a network connectivity loss.

After a power loss to the switch:

I am able to manually add routes and restore connection, but this is troublesome. Have I configured something incorrectly, or is this something to report as a bug?

jgreco · Feb 18, 2022

Are you by any chance letting the NAS get its IP address via DHCP?

If so, then this is operating correctly, although perhaps unexpectedly (to you). When an IP interface is reconfigured, in many cases, IP routes via that interface are cleared by the kernel.

I would consider it a UI bug if it lets you add static routes out a DHCP interface.

jeremybox · Feb 18, 2022

I am indeed using DHCP for this interface. (The same routes are also being handed out through DHCP options, but the device seems to be ignoring them.) As a personal learning opportunity: is there something inherent to DHCP that would conflict with also having static routes defined via the UI, or is it just bad practice because the routes may no longer be valid or reachable if the DHCP scope changes?

jgreco · Feb 18, 2022

So what's happening here is this. Time sequence of events.

Your NAS boots.

Your NAS gets its em0 IP via DHCP, let's say 192.168.1.10 with gateway 192.168.1.1. This causes dhclient to start running.

The middleware boots, and begins configuring the system.

The middleware sees a static route instruction, so it adds a static route for 172.16.0.0/24 out via 192.168.1.1. This goes out via the currently connected interface em0.

Time passes on your happy network.

You flap the network interface. Switch crash, ethernet unplugged, who cares what.

dhclient sees the em0 link go away.

dhclient sees the em0 link return. Since this means it could be on another network, it is mandatory to re-DHCP.

dhclient reinstantiates the IP address.

The kernel clears the static route because the interface IP configuration was updated.

Now you have an unhappy network.

It isn't DHCP itself that's a problem. It's that DHCP reconfigures the interface and that the kernel treats interface reconfigurations as an implicit instruction to remove the old routes, because they may no longer be reachable under the new configuration.
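The sequence above can be modeled in a few lines of Python. This is a toy model under stated assumptions, not FreeBSD kernel or middleware code: the `Host` class and its method names are hypothetical, and the only behaviour it encodes is the one described above, namely that reconfiguring an interface drops the routes that went out via that interface, and nothing re-adds them afterwards.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Toy model of a host's routing table (hypothetical, not kernel code)."""
    address: dict = field(default_factory=dict)  # interface -> IP address
    routes: dict = field(default_factory=dict)   # destination -> (gateway, interface)

    def configure_interface(self, ifname, ip):
        # Assumed behaviour, as described in the post: (re)configuring an
        # interface drops every route that went out via that interface.
        self.routes = {dst: hop for dst, hop in self.routes.items()
                       if hop[1] != ifname}
        self.address[ifname] = ip

    def add_static_route(self, dst, gateway, ifname):
        self.routes[dst] = (gateway, ifname)


nas = Host()
nas.configure_interface("em0", "192.168.1.10")               # boot: dhclient gets a lease
nas.add_static_route("172.16.0.0/24", "192.168.1.1", "em0")  # middleware adds the route once
before_flap = dict(nas.routes)

nas.configure_interface("em0", "192.168.1.10")               # link flap: dhclient re-DHCPs
after_flap = dict(nas.routes)

print(before_flap)  # the static route is present
print(after_flap)   # empty: nothing re-added the route
```

Even though the address handed out after the flap is identical, the route is gone, because the one-time `add_static_route` call at boot is never repeated.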

jeremybox · Feb 18, 2022

Gotcha, and the middleware isn't aware of or watching for changes so it wouldn't re-insert them at that time.

Should I pursue the fact that static routes offered via DHCP are not being accepted, or is it expected that those are ignored? This is my only FreeBSD device on the network, but Linux devices seem to respect the routes sent via DHCP, and I don't know enough to tell whether that's just a FreeBSD difference or something not behaving properly.

jgreco · Feb 18, 2022

Unclear. I know that dhclient can receive static routes. Which kind are you sending? RFC-compliant ones or the Microsoft ones?

These correspond to server configs of

option rfc3442-classless-static-routes code 121 = array of integer 8;
option ms-classless-static-routes code 249 = array of integer 8;
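Both declarations define the option as a raw array of bytes, so the server admin has to encode each route by hand. As a minimal sketch of the RFC 3442 encoding (prefix length, then only the significant octets of the destination, then the four gateway octets), here is a small Python helper. The function name `encode_classless_routes` is made up for illustration, the helper assumes IPv4 input, and the route is the example one from earlier in the thread, not anyone's real configuration.

```python
import ipaddress


def encode_classless_routes(routes):
    """Encode (destination CIDR, gateway) pairs as RFC 3442 option bytes."""
    out = []
    for cidr, gateway in routes:
        net = ipaddress.ip_network(cidr)
        # Only the octets covered by the prefix are sent: /24 -> 3, /0 -> 0.
        significant = (net.prefixlen + 7) // 8
        out.append(net.prefixlen)
        out.extend(net.network_address.packed[:significant])
        out.extend(ipaddress.ip_address(gateway).packed)
    return out


encoded = encode_classless_routes([("172.16.0.0/24", "192.168.1.1")])
print(", ".join(str(b) for b in encoded))  # 24, 172, 16, 0, 192, 168, 1, 1
```

The printed list is the form that goes on the right-hand side of the option in the server config, e.g. `option rfc3442-classless-static-routes 24, 172, 16, 0, 192, 168, 1, 1;`. Note that RFC 3442 tells clients to ignore the plain routers option when option 121 is present, so a default route usually needs to be included in the list as well.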

jeremybox · Feb 19, 2022

I believe I've found the root cause, and I don't know if this is by design, or something I should put in a request/bug report for.

In my default TrueNAS DHCP client configuration, the following is set:

root@nas[~]# cat /etc/dhclient.conf.orig

supersede routers 10.0.0.1;

request subnet-mask, broadcast-address, time-offset,
        domain-name, domain-name-servers, domain-search, host-name,
        interface-mtu;

When I modify it as such:

root@nas[~]# cat /etc/dhclient.conf

supersede routers 10.0.0.1;

request subnet-mask, broadcast-address, time-offset,
        domain-name, domain-name-servers, domain-search, host-name,
        interface-mtu, classless-static-route;

Everything works as desired, and the static routes are pulled from DHCP. It seems that the default configuration never requests these values. Seeing the 'supersede routers 10.0.0.1;' line, I suspect part of my network configuration is being pulled in from elsewhere, as I never touched this file by hand before and don't otherwise understand why it would know anything about 10.0.0.1.
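A quick way to check whether a given dhclient.conf asks the server for a particular option is to parse its request statement. The following Python is a rough sketch only: `requested_options` is a hypothetical helper, it ignores comments and per-interface blocks, and the option name is taken as written in this post.

```python
import re


def requested_options(conf_text):
    """Return the option names listed in `request` statements of a dhclient.conf."""
    options = []
    # A request statement runs from the keyword to the next semicolon
    # and may span several lines.
    for body in re.findall(r"^\s*request\s+([^;]*);", conf_text, flags=re.MULTILINE):
        options.extend(name.strip() for name in body.split(",") if name.strip())
    return options


default_conf = """
supersede routers 10.0.0.1;

request subnet-mask, broadcast-address, time-offset,
        domain-name, domain-name-servers, domain-search, host-name,
        interface-mtu;
"""

modified_conf = default_conf.replace(
    "interface-mtu;", "interface-mtu, classless-static-route;")

print("classless-static-route" in requested_options(default_conf))   # False
print("classless-static-route" in requested_options(modified_conf))  # True
```

If the option is not in the request list, the client never asks for it, which matches the behaviour seen here: the server offers the routes, and the client ignores them.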

jgreco · Feb 19, 2022

The 10.0.0.1 is presumably your manually specified default gateway setting. In a full DHCP environment, it isn't there. It is using a request stanza to groom the DHCP answer for your mixed DHCP/static environment, and it would then be correct to avoid accepting classless static routes in such a case. So this all looks correct.

Go into Network -> Global Configuration -> Default Gateway and clear out the "10.0.0.1" that you have set in IPv4 Default Gateway. Save it. Your change to dhclient.conf will be lost, but also the defaults (a null file) will be restored.

jgreco · Feb 19, 2022

Oh you probably also have to remove any OTHER statically specified routes too.

jeremybox · Feb 19, 2022

After removing all static routes set in the UI, as well as the gateway which was also set through the UI, I can confirm that the /etc/dhclient.conf file was blank.

I then removed all routes from the routing table, brought the interface down and back up with ifconfig, and finally ran dhclient on the interface.

All routes set in DHCP option 121 were properly pulled and inserted into the routing table, and everything is now working as I expect and intend.
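For anyone who wants to sanity-check what the server is actually sending, the raw option 121 bytes can be decoded back into routes. This is a minimal Python sketch of the RFC 3442 decoding, not what dhclient itself runs: `decode_classless_routes` is a hypothetical helper, and the input bytes are an illustrative example rather than a capture from my network.

```python
import ipaddress


def decode_classless_routes(data):
    """Decode RFC 3442 option bytes into (destination CIDR, gateway) pairs."""
    routes = []
    i = 0
    while i < len(data):
        prefixlen = data[i]
        if prefixlen > 32:
            raise ValueError(f"invalid prefix length {prefixlen}")
        # Number of destination octets actually present in the option.
        significant = (prefixlen + 7) // 8
        end = i + 1 + significant + 4
        if end > len(data):
            raise ValueError("truncated option data")
        # Pad the destination back out to four octets.
        dest = bytes(data[i + 1:i + 1 + significant]) + bytes(4 - significant)
        gateway = bytes(data[i + 1 + significant:end])
        routes.append((f"{ipaddress.IPv4Address(dest)}/{prefixlen}",
                       str(ipaddress.IPv4Address(gateway))))
        i = end
    return routes


print(decode_classless_routes([24, 172, 16, 0, 192, 168, 1, 1]))
# [('172.16.0.0/24', '192.168.1.1')]
```

The decoded pairs can then be compared against the routing table to confirm that every route in the option was installed.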

Thank you so much for stepping through this with me. It seems that DHCP configuration with TrueNAS needs to be all or nothing, and this isn't a bug, simply a design choice: a user is expected either to be all in on DHCP or to set values statically.

If anyone in the future comes across this, TL;DR is: go back and remove any and all manual networking settings that you might have inserted and forgotten if you intend to use DHCP for your system, or there will be unexpected results.

jgreco · Feb 19, 2022

I would say it isn't a "design choice"; this implies that the developers at iX would have had some reasonable path to making this work. It seems clear that someone has probably run across this in the past, and the solution that would work for this is the one you encountered.

A lifetime of professional infrastructure engineering and sysadmin experience tells me that DHCP for servers and infrastructure devices is ill-advised, and I'm passing that insight on to you. Relying on DHCP to deliver static routes is dicey, and likely makes your networks more fragile.

guemi · Feb 19, 2022

jgreco said:
A lifetime of professional infrastructure engineering and sysadmin experience tells me that DHCP for servers and infrastructure devices is ill-advised, and I'm passing that insight on to you. Relying on DHCP to deliver static routes is dicey, and likely makes your networks more fragile.

No. Servers and infrastructure devices should definitely NOT have statically assigned addresses.
That's extremely poor network design.

It doesn't scale.
It takes 10x the time when you need to do maintenance on the network. Not to mention when you want to change for example the gateway or the DNS servers.

Your lifetime of professional engineering should've taught you that devices get their addresses from DHCP and report back to the DNS server what they got, and that DNS should be the go-to for contacting devices across the network.

jgreco · Feb 19, 2022

guemi said:
Your lifetime of professional engineering should've taught you that devices get their addresses from DHCP and report back to the DNS server what they got, and that DNS should be the go-to for contacting devices across the network.

My lifetime of professional engineering tells me that having your network all go offline when something bad happens to the DHCP server is a recipe for disaster. Seen it happen. Been paid to fix it.

You can use DHCP to bootstrap stuff. For example, our ESXi installer that runs in a bunch of places bootstraps via DHCP/PXE, gets a mapping for the MAC which identifies the IP, and from there everything is driven via the ESXi jumpstart and scripting. That brings up all the stateful configs and does a persistent installation. DHCP is nice for that kind of bootstrap. But I'd never be an idiot and let an ESXi host run DHCP in production. When a lease goes away or changes, you'd lose all your connectivity and storage. Reeks of amateur hour.

When you do infrastructure at scale, you need to plan for failures, and what happens when failures occur.

When you get arrogant about it, like Facebook did late last year, you run into a trainwreck of unintended consequences.

What is BGP, and what role did it play in Facebook’s massive outage: The tech that runs (and sometimes breaks) the internet. (www.theverge.com)

Simple BGP edit caused a network-wide failure. They couldn't get into their remote gear because of a lack of planning for such an eventuality, and then it turned out that they couldn't even get into their buildings, because the access controllers were network-based and were nonfunctional.

guemi said:
It takes 10x the time when you need to do maintenance on the network

Good engineering isn't about planning how to make things EASY. It takes time to do things right. If it's important, people want it done right.

guemi said:
when you want to change for example the gateway or the DNS servers.

I can't help it if you're a crappy network planner. We're going on nearly 30 years here at AS14536 with the same DNS server IPs, so I know it's possible to plan correctly.

guemi said:
Servers and infrastructure devices should definitely NOT have statically assigned addresses.
That's extremely poor network design.

I pity you the day you need to do a cold start and run into ordering dependency failures.

My measure of success is when the equipment can do a cold start and instantiate itself correctly without human intervention. But then, I'm designing for and often managing gear that's at least hundreds, often thousands of miles away.

It might be different if I was running a thousand ESXi hosts all under the same roof and on a single management network with a NOC staff on hand. But really only the big hosting operations are like that these days.

Either way, the comment stands. You don't want to use DHCP for NAS because you're creating a dependency that will shatter your network if something bad happens to the DHCP server.

guemi · Feb 20, 2022

jgreco said:
My lifetime of professional engineering tells me that having your network all go offline when something bad happens to the DHCP server is a recipe for disaster. Seen it happen. Been paid to fix it.
But I'd never be an idiot and let an ESXi host run DHCP in production. When a lease goes away or changes, you'd lose all your connectivity and storage. Reeks of amateur hour.

Someone who thinks you can't have DHCP redundancy, and doesn't know you can tell DHCP clients to keep using the last good address if they do not get an ACK, shouldn't be giving out advice.

I'm sorry but you're outdated.

Patrick M. Hausen · Feb 20, 2022

guemi said:
It takes 10x the time when you need to do maintenance on the network. Not to mention when you want to change for example the gateway or the DNS servers.

Ever heard of something called "Ansible"?

jgreco · Feb 20, 2022

guemi said:
Someone who thinks you can't have DHCP redundancy, and doesn't know you can tell DHCP clients to keep using the last good address if they do not get an ACK, shouldn't be giving out advice.

I'm sorry but you're outdated.

DHCP redundancy doesn't help when the hyperconverged infrastructure at the data center you're trying to cold start cannot get its DHCP addresses: both the storage systems and the hypervisors are pending IP assignments, and the DHCP servers aren't up because they live on the storage that the hypervisors are trying to mount. "Duh."

It's also worth noting that the assumptions you have built in to your "solution" include the broken concept that a device stores "the last good address"; even my junior techs know that's not universally the case. Consider the obvious example of ESXi.

guemi said:
I'm sorry but you're outdated.

I have a word for you as well, but I'm not allowed to use it in polite company.

jgreco · Feb 20, 2022

Patrick M. Hausen said:
Ever heard of something called "Ansible"?

Yes, Infrastructure as Code systems are a much healthier way to manage this; relying on DHCP is a poor man's substitute for doing the real work and having a system properly deployed and provisioned in a reproducible manner using jumpstart/API/Ansible/whatever style management. Some of us write our own. DHCP? Puhhhhleaze.

I liked the bit where I did a deep dive into the behaviours of dhclient as they interact with FreeBSD/TrueNAS, and then he tried to make it like I didn't know jack about using DHCP. Perhaps all the more galling because I have an Xmas card signed by Paul Vixie a few feet away on my wall of cards, and he and I spent a decade as cagemates in Ashburn until his company got sold off to DomainTools.
