VLAN stopped working after adding new VMs

endzyme

Cadet
Joined
Dec 27, 2023
Messages
5
I am running TrueNAS-13.0-U6.1.

I have a problem where my VMs attached to bridges on a VLAN no longer receive traffic. My configuration is as follows:

VM(vm1) <=> NIC(e1000) <=> bridge400 <=> vlan400 <=> re0(the host Ethernet card)
&
VM(vm2) <=> NIC(e1000) <=> bridge98 <=> vlan98 <=> re0(the host Ethernet card)

The NICs in the VM used to get their IPs via DHCP and communicate with my router fine.

My other VMs, which have their NICs attached directly to re0, work fine, and if I add a NIC attached to re0 to "vm1" and "vm2", they also work with DHCP.

I have tried diagnosing with tcpdump on both my router and the truenas machine. When inspecting from the router, I do see frames requesting IPs (DHCP requests and responses) coming into and out of the router.

On the truenas machine I see frames on re0 leaving the box, tagged with the correct vlan, and I also see the router responses to DHCP requests which assign the expected IPs for the boxes in question. When inspecting the vnet interface (which is created when I start up the VMs) I only see the requests and none of the responses.

I should also mention that the bridge98 interface has an IP statically assigned in the subnet for that VLAN and cannot ping the router. Oddly enough, I do see the right Ethernet MAC address in the ARP table, but for some reason the TrueNAS machine doesn't acknowledge the ping responses.

I've also removed all NICs from the VMs which are not working and re-added them. Separately, I removed the VLANs and bridges from the TrueNAS machine, leaving only re0, rebooted, and then re-added everything, including new NICs attached to the newly created bridges. It still exhibited the same tcpdump behavior.

To be honest I haven't looked at all the '-e' options from tcpdump to verify all the mac addresses make sense. I have an inkling that the problem exists on the truenas server itself, something blocking it from passing frames from the bridge to the vnet.

My next hardware step is to connect the server directly to the router which houses the gateway in the vlan to eliminate any oddities from switches, but I don't think it's that because I see the response frames getting to re0 on the truenas server.

I'm not too sure how BSD handles Ethernet layer 2 forwarding or how vnets work. I want to confirm how the vnet is configured, and I want to set up a simpler test to see if the OS is behaving normally.

Are there any next steps people could suggest?
 

endzyme

After looking at the ifconfig output more closely, I noticed that bridge400 has both vlan400 and the vnet of the expected VM as members. I also noticed that I was incorrect above about seeing the DHCP response on the bridge and the vlan interfaces. I see all the *requests* on all relevant interfaces (the vnet, bridge, vlan and re0), but the response only gets to re0 and never shows up in tcpdumps of vlan400, bridge400 or the VM vnet.

I also noticed something else: the MAC address on the return response does not exactly match the MAC of the vnet. The VM NIC and the vnet differ only in the first octet (vm-nic: 00:aa:bb:cc:dd:ee; vnet: fe:aa:bb:cc:dd:ee); the rest of the address is identical. I assume this is some internal magic for vnets. I'm not sure what a working setup should look like.
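For anyone wanting to check the same thing on their own box, here is a minimal sketch of that comparison. The helper names are made up for illustration, and the two addresses are the placeholder values quoted above, not real ones. The only outside fact it leans on is that bit 0x02 of the first octet marks a MAC as locally administered; it says nothing about who sets that bit on the vnet.

```python
def parse_mac(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    octets = bytes(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return octets


def is_locally_administered(mac):
    """True when the U/L bit (0x02 of the first octet) is set."""
    return bool(parse_mac(mac)[0] & 0x02)


def differs_only_in_first_octet(mac_a, mac_b):
    """True when the last five octets match but the first one does not."""
    a, b = parse_mac(mac_a), parse_mac(mac_b)
    return a[0] != b[0] and a[1:] == b[1:]


vm_nic = "00:aa:bb:cc:dd:ee"  # placeholder for the NIC inside the VM
vnet = "fe:aa:bb:cc:dd:ee"    # placeholder for the host-side vnet

print(differs_only_in_first_octet(vm_nic, vnet))  # True
print(is_locally_administered(vm_nic))            # False
print(is_locally_administered(vnet))              # True
```

So the two addresses belong to different endpoints (guest NIC versus host-side vnet), and a reply addressed to the guest NIC's MAC is not by itself a mismatch.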

I will post my ifconfig and tcpdumps when I'm back at the machine.
 

endzyme

Below is my ifconfig and examples of the DHCP request / responses.
Code:
re0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=82099<RXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC,LINKSTATE>
        ether 74:56:3c:4a:66:7a
        inet x.x.x.x netmask 0xffffff00 broadcast x.x.x.x
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        nd6 options=9<PERFORMNUD,IFDISABLED>
...
vlan400: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=80001<RXCSUM,LINKSTATE>
        ether 74:56:3c:4a:66:7a
        groups: vlan
        vlan: 400 vlanproto: 802.1q vlanpcp: 0 parent interface: re0
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
        nd6 options=9<PERFORMNUD,IFDISABLED>
bridge400: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        ether 58:9c:fc:10:ff:9b
        id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
        maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
        root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
        member: vnet2 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 9 priority 128 path cost 2000000
        member: vlan400 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 7 priority 128 path cost 20000
        groups: bridge
        nd6 options=9<PERFORMNUD,IFDISABLED>
vnet2: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=80000<LINKSTATE>
        ether fe:a0:98:2f:b7:35
        hwaddr 58:9c:fc:00:6e:09
        groups: tap
        media: Ethernet autoselect
        status: active
        nd6 options=9<PERFORMNUD,IFDISABLED>
        Opened by PID 1937
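To double-check the bridge membership without reading the whole dump by eye, a small parser along these lines could help. It is only a sketch: the regexes assume the FreeBSD ifconfig layout shown above, `summarize` is a made-up helper name, and the excerpt is trimmed to the lines the parser looks at.

```python
import re

# Trimmed copy of the vlan400 and bridge400 stanzas from the output above.
IFCONFIG_EXCERPT = """\
vlan400: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        vlan: 400 vlanproto: 802.1q vlanpcp: 0 parent interface: re0
bridge400: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        member: vnet2 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
        member: vlan400 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
"""

HEADER = re.compile(r"^(\S+): flags=")            # start of an interface stanza
MEMBER = re.compile(r"^\s+member: (\S+)")         # bridge member line
PARENT = re.compile(r"^\s+vlan: (\d+) .*parent interface: (\S+)")


def summarize(ifconfig_text):
    """Return (bridge -> member list, vlan interface -> (tag, parent))."""
    bridges, vlans, current = {}, {}, None
    for line in ifconfig_text.splitlines():
        if (m := HEADER.match(line)):
            current = m.group(1)
        elif current and (m := MEMBER.match(line)):
            bridges.setdefault(current, []).append(m.group(1))
        elif current and (m := PARENT.match(line)):
            vlans[current] = (int(m.group(1)), m.group(2))
    return bridges, vlans


bridges, vlans = summarize(IFCONFIG_EXCERPT)
print(bridges)  # {'bridge400': ['vnet2', 'vlan400']}
print(vlans)    # {'vlan400': (400, 're0')}
```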


Command: tcpdump -nnnn -vvvv -e -i vnet2
Code:
16:24:44.168246 00:a0:98:2f:b7:35 > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 331: (tos 0xc0, ttl 64, id 0, offset 0, flags [none], proto UDP (17), length 317)
    0.0.0.0.68 > 255.255.255.255.67: [udp sum ok] BOOTP/DHCP, Request from 00:a0:98:2f:b7:35, length 289, xid 0x42927d9e, secs 20955, Flags [none] (0x0000)
          Client-Ethernet-Address 00:a0:98:2f:b7:35
          Vendor-rfc1048 Extensions
            Magic Cookie 0x63825363
            DHCP-Message Option 53, length 1: Discover
            Client-ID Option 61, length 19: hardware-type 255, c2:72:f6:09:00:02:00:00:ab:11:b8:fc:e0:e8:b6:8b:dc:c4
            Parameter-Request Option 55, length 11:
              Subnet-Mask, Default-Gateway, Domain-Name-Server, Hostname
              Domain-Name, MTU, Static-Route, NTP
              Option 119, Option 120, Classless-Static-Route
            MSZ Option 57, length 2: 576
            Hostname Option 12, length 5: "media"
            END Option 255, length 0


Command: tcpdump -nnnn -vvvv -e -i bridge400
Code:
16:24:44.168271 00:a0:98:2f:b7:35 > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 331: (tos 0xc0, ttl 64, id 0, offset 0, flags [none], proto UDP (17), length 317)
    0.0.0.0.68 > 255.255.255.255.67: [udp sum ok] BOOTP/DHCP, Request from 00:a0:98:2f:b7:35, length 289, xid 0x42927d9e, secs 20955, Flags [none] (0x0000)
          Client-Ethernet-Address 00:a0:98:2f:b7:35
          Vendor-rfc1048 Extensions
            Magic Cookie 0x63825363
            DHCP-Message Option 53, length 1: Discover
            Client-ID Option 61, length 19: hardware-type 255, c2:72:f6:09:00:02:00:00:ab:11:b8:fc:e0:e8:b6:8b:dc:c4
            Parameter-Request Option 55, length 11:
              Subnet-Mask, Default-Gateway, Domain-Name-Server, Hostname
              Domain-Name, MTU, Static-Route, NTP
              Option 119, Option 120, Classless-Static-Route
            MSZ Option 57, length 2: 576
            Hostname Option 12, length 5: "media"
            END Option 255, length 0


Command: tcpdump -nnnn -vvvv -e -i vlan400
Code:
16:24:44.168259 00:a0:98:2f:b7:35 > ff:ff:ff:ff:ff:ff, ethertype IPv4 (0x0800), length 331: (tos 0xc0, ttl 64, id 0, offset 0, flags [none], proto UDP (17), length 317)
    0.0.0.0.68 > 255.255.255.255.67: [udp sum ok] BOOTP/DHCP, Request from 00:a0:98:2f:b7:35, length 289, xid 0x42927d9e, secs 20955, Flags [none] (0x0000)
          Client-Ethernet-Address 00:a0:98:2f:b7:35
          Vendor-rfc1048 Extensions
            Magic Cookie 0x63825363
            DHCP-Message Option 53, length 1: Discover
            Client-ID Option 61, length 19: hardware-type 255, c2:72:f6:09:00:02:00:00:ab:11:b8:fc:e0:e8:b6:8b:dc:c4
            Parameter-Request Option 55, length 11:
              Subnet-Mask, Default-Gateway, Domain-Name-Server, Hostname
              Domain-Name, MTU, Static-Route, NTP
              Option 119, Option 120, Classless-Static-Route
            MSZ Option 57, length 2: 576
            Hostname Option 12, length 5: "media"
            END Option 255, length 0


Command: tcpdump -nnnn -vvvv -e -i re0 vlan 400 (filters to only frames tagged with VLAN 400)
Code:
16:24:44.168265 00:a0:98:2f:b7:35 > ff:ff:ff:ff:ff:ff, ethertype 802.1Q (0x8100), length 335: vlan 400, p 0, ethertype IPv4, (tos 0xc0, ttl 64, id 0, offset 0, flags [none], proto UDP (17), length 317)
    0.0.0.0.68 > 255.255.255.255.67: [udp sum ok] BOOTP/DHCP, Request from 00:a0:98:2f:b7:35, length 289, xid 0x42927d9e, secs 20955, Flags [none] (0x0000)
          Client-Ethernet-Address 00:a0:98:2f:b7:35
          Vendor-rfc1048 Extensions
            Magic Cookie 0x63825363
            DHCP-Message Option 53, length 1: Discover
            Client-ID Option 61, length 19: hardware-type 255, c2:72:f6:09:00:02:00:00:ab:11:b8:fc:e0:e8:b6:8b:dc:c4
            Parameter-Request Option 55, length 11:
              Subnet-Mask, Default-Gateway, Domain-Name-Server, Hostname
              Domain-Name, MTU, Static-Route, NTP
              Option 119, Option 120, Classless-Static-Route
            MSZ Option 57, length 2: 576
            Hostname Option 12, length 5: "media"
            END Option 255, length 0
16:24:44.169366 78:8a:20:b9:3b:ab > 00:a0:98:2f:b7:35, ethertype 802.1Q (0x8100), length 346: vlan 400, p 0, ethertype IPv4, (tos 0x10, ttl 128, id 0, offset 0, flags [none], proto UDP (17), length 328)
    172.18.0.1.67 > 172.18.0.5.68: [udp sum ok] BOOTP/DHCP, Reply, length 300, xid 0x42927d9e, secs 20955, Flags [none] (0x0000)
          Your-IP 172.18.0.5
          Client-Ethernet-Address 00:a0:98:2f:b7:35
          Vendor-rfc1048 Extensions
            Magic Cookie 0x63825363
            DHCP-Message Option 53, length 1: Offer
            Server-ID Option 54, length 4: 172.18.0.1
            Lease-Time Option 51, length 4: 28800
            Subnet-Mask Option 1, length 4: 255.255.255.0
            Default-Gateway Option 3, length 4: 172.18.0.1
            Domain-Name-Server Option 6, length 8: 1.1.1.1,1.0.0.1
            END Option 255, length 0
            PAD Option 0, length 0, occurs 22
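The four captures above boil down to one question: for a given DHCP transaction ID, which interfaces saw the request but never the reply? A rough sketch of that check is below. The regex assumes the `-vvvv` summary line format shown in the captures, the function names are made up, and the sample lines are copied from the output above with everything but the BOOTP summary line dropped.

```python
import re

# Matches e.g. "BOOTP/DHCP, Request from ..., length 289, xid 0x42927d9e"
# and "BOOTP/DHCP, Reply, length 300, xid 0x42927d9e".
BOOTP_LINE = re.compile(r"BOOTP/DHCP, (Request|Reply)\b.*?\bxid (0x[0-9a-f]+)")


def dhcp_messages(capture_text):
    """Return the set of (kind, xid) pairs found in one capture."""
    found = set()
    for line in capture_text.splitlines():
        match = BOOTP_LINE.search(line)
        if match:
            found.add((match.group(1), match.group(2)))
    return found


def interfaces_missing_reply(captures, xid):
    """Interfaces that saw the request for `xid` but never its reply."""
    missing = []
    for name, text in captures.items():
        seen = dhcp_messages(text)
        if ("Request", xid) in seen and ("Reply", xid) not in seen:
            missing.append(name)
    return sorted(missing)


REQUEST = ("0.0.0.0.68 > 255.255.255.255.67: [udp sum ok] BOOTP/DHCP, "
           "Request from 00:a0:98:2f:b7:35, length 289, xid 0x42927d9e, "
           "secs 20955, Flags [none] (0x0000)")
REPLY = ("172.18.0.1.67 > 172.18.0.5.68: [udp sum ok] BOOTP/DHCP, "
         "Reply, length 300, xid 0x42927d9e, secs 20955, Flags [none] (0x0000)")

captures = {
    "vnet2": REQUEST,
    "bridge400": REQUEST,
    "vlan400": REQUEST,
    "re0": REQUEST + "\n" + REPLY,
}

print(interfaces_missing_reply(captures, "0x42927d9e"))
# ['bridge400', 'vlan400', 'vnet2']
```

That matches what the dumps show: the offer reaches re0 with the VLAN 400 tag and goes no further.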
 

endzyme

Ok, interesting turn of events. I ended up shutting down all my VMs and removing all their NICs from hardware configurations. Then I restarted the truenas server. I noticed when it started back up again, there was no more "bridge0" device listed in ifconfig. This was interesting because I originally thought this was a necessary device for vnets to be associated to VMs on the main (untagged) re0.

When I added an address to the bridge400, ping worked again.

I then removed the address from bridge400 and added a NIC attached to bridge400 to my VM. DHCP works now for that VM (shocking).

It broke again after adding a VM attached to the re0 device. I noticed that the system automatically creates a "bridge0" whose members are re0 and the vnets of the VMs attached to re0. This is definitely what breaks all traffic to the VMs associated with bridge/vlan devices.
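The layout that triggers the breakage can be stated as a simple rule: the parent interface is a bridge member itself while one of its VLAN interfaces is a member of some other bridge. Here is a minimal sketch of a check for that. It only flags the layout and does not explain why it breaks; the function name and `vnet0` are hypothetical, since I never captured the actual member list of bridge0.

```python
def shared_parent_conflicts(bridges, vlan_parents):
    """Flag parents that are bridged directly while their VLANs are bridged too.

    bridges:      bridge name -> list of member interface names
    vlan_parents: VLAN interface name -> parent interface name
    Returns a sorted list of (parent, parent_bridge, vlan, vlan_bridge).
    """
    member_of = {member: bridge
                 for bridge, members in bridges.items()
                 for member in members}
    conflicts = []
    for vlan, parent in vlan_parents.items():
        if vlan in member_of and parent in member_of:
            conflicts.append((parent, member_of[parent], vlan, member_of[vlan]))
    return sorted(conflicts)


vlan_parents = {"vlan400": "re0"}

# Layout that worked: only the VLAN interface is bridged.
working = {"bridge400": ["vnet2", "vlan400"]}

# Layout after adding a VM on re0: bridge0 appears with re0 as a member.
# "vnet0" is a hypothetical name for that VM's vnet.
broken = {"bridge400": ["vnet2", "vlan400"], "bridge0": ["re0", "vnet0"]}

print(shared_parent_conflicts(working, vlan_parents))  # []
print(shared_parent_conflicts(broken, vlan_parents))
# [('re0', 'bridge0', 'vlan400', 'bridge400')]
```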

As a workaround, I'll have to make a new VLAN for all my VMs, so that no VM has a NIC associated with re0. That way no VM triggers the automatic creation of the bridge0 interface, which breaks traffic for the VLAN-bound bridges.

It is still unclear to me why bridge0 breaks all traffic to other bridges, but maybe someone out there knows something about this.

As a heads up this used to work before I upgraded to U6.1 (from U6).
 

endzyme

Oops - looks like when I was running U6, I only had VMs which were on VLANs, none associated directly with re0.
 