SOLVED Intermittent Network Issues on new build

adrianwi

Guru
Joined
Oct 15, 2013
Messages
1,231
I'm not really sure where to start with this, because the problems are intermittent: sometimes I can access the services and other times I can't. I'm struggling to even pin down exactly what or where the problem is, and while I may have developed some skills setting up stuff on FreeNAS, my skills for diagnosing problems are running out fast :(

I've just built the new system in my signature and migrated everything across, and initially everything appeared to work fine. In summary, I have 9 jails running a variety of services, some of which (WordPress, Nextcloud, Emby, Calibre and LimeSurvey) are exposed externally through a jail running NGINX as a reverse proxy and generating SSL certificates using Let's Encrypt (this jail is ssl-proxy). They all have certificates in the format service.domain.co.uk and only connect via https on port 443. I've also built a few Ubuntu VMs, 2 of which are running ONLYOFFICE Document Server and Docker, with a number of containers exposed externally in a similar way to the jails (another WordPress for testing, Mattermost, Collabora and other things I might be playing with), so always with an SSL certificate and reverse-proxying from 443 to the IP:port in the container.

I noticed the issue with the VMs first, and initially thought it only affected the VMs and not the jails. I'm now not so sure. I never have problems accessing the same services using the IP addresses inside my network, so I think the problem is somewhere between my router and the services. The A records are set up exactly as they were on the old machine, pointing to the same fixed IP address, so I don't think the issue is getting to my door. When I get the 502/504 error, the certificate icon is showing in the browser, which makes me think the browser is talking to the ssl-proxy jail, which presents the certificate. So the problem seems to be between the ssl-proxy jail and the other jails and VMs. The only things I've changed in the NGINX conf files are some of the IP addresses, and I've added an example below (for ONLYOFFICE Document Server, which is in an Ubuntu 18.04 VM with nothing else, and the one I first noticed the issue with). The files are not all the same and are specific to each service, but again they used to work all of the time. They are still working now, some of the time, which again makes me think the configuration isn't the problem.

Code:
server {
    listen 443 ssl;
    ssl on;
    server_name onlyedit.domain.co.uk;
    ssl_certificate /usr/local/etc/letsencrypt/live/onlyedit.domain.co.uk/fullchain.pem;
    ssl_certificate_key /usr/local/etc/letsencrypt/live/onlyedit.domain.co.uk/privkey.pem;

    add_header Referrer-Policy "no-referrer";
    add_header X-Frame-Options "ALLOW-FROM https://nextcloud.domain.co.uk/" always;
    add_header X-XSS-Protection "1; mode=block";
    add_header X-Robots-Tag none;
    add_header X-Download-Options noopen;
    add_header X-Permitted-Cross-Domain-Policies none;

    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains; preload;" always;
#     add_header X-Frame-Options SAMEORIGIN;
    add_header X-Content-Type-Options nosniff;

    location /favicon.ico {
        return 204;
        access_log off;
        log_not_found off;
    }

    location / {
        proxy_pass https://192.168.168.41;
        proxy_redirect off;
        proxy_read_timeout 1800;
        proxy_connect_timeout 1800;
        proxy_set_header Host $http_host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Host $server_name;

    }

    location /.well-known {
        root /usr/local/www;
    }
}
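
The reasoning above (a certificate is presented, yet a 502/504 comes back) points at the hop from the proxy to the upstream. A minimal sketch of how that could be checked, assuming Python 3 is available in the ssl-proxy jail: probe the upstream address directly and the public name repeatedly, then compare the tallies. The helper names `probe` and `tally` are hypothetical, and TLS verification is switched off for the bare-IP upstream on the assumption that its certificate will not match an IP address.

```python
import ssl
import urllib.error
import urllib.request
from collections import Counter


def probe(url, timeout=5.0, verify_tls=True):
    """Return the HTTP status as a string, or the exception class name on failure."""
    ctx = None
    if url.startswith("https://"):
        ctx = ssl.create_default_context()
        if not verify_tls:
            # Assumption: the upstream is addressed by bare IP, so its
            # certificate cannot be validated against a hostname.
            ctx.check_hostname = False
            ctx.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
            return str(resp.status)
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses (including 502/504) arrive as HTTPError.
        return str(exc.code)
    except Exception as exc:
        # Connection refused, timeouts, TLS failures and so on.
        return type(exc).__name__


def tally(outcomes):
    """Count how often each outcome (status code or error name) occurred."""
    return Counter(outcomes)


if __name__ == "__main__":
    import time

    # Addresses taken from the config above; adjust to the service under test.
    direct = "https://192.168.168.41/"
    proxied = "https://onlyedit.domain.co.uk/"
    results = {"direct": [], "proxied": []}
    for _ in range(20):
        results["direct"].append(probe(direct, verify_tls=False))
        results["proxied"].append(probe(proxied))
        time.sleep(3)
    for name, outcomes in results.items():
        print(name, dict(tally(outcomes)))
```

If the direct probes fail intermittently as well when run from inside the jail, the NGINX configuration is probably not at fault and the problem is more likely at the network level.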


I'm just at a bit of a loss as to what I need to check next and how in order to find the problem. If any of the above makes sense (which I accept it might not!) can you offer any advice or point me in the direction of something to read and try. My google searches on this have been less than helpful so far.

Thanks in advance.

Adrian
 

adrianwi

One of my New Year's resolutions is to properly understand how my network works! I've spent quite a lot of time trying to diagnose this now, and am pretty sure the problem lies in the networking between the jail and the VM. This is my ifconfig:

Code:
ifconfig
igb0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=2400b9<RXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,VLAN_HWTSO,RXCSUM_IPV6>
        ether ac:1f:6b:98:3b:9c
        hwaddr ac:1f:6b:98:3b:9c
        inet 192.168.168.14 netmask 0xffffff00 broadcast 192.168.168.255
        inet6 fe80::ae1f:6bff:fe98:3b9c%igb0 prefixlen 64 scopeid 0x1
        nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
igb1: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=6403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO,RXCSUM_IPV6,TXCSUM_IPV6>
        ether ac:1f:6b:98:3b:9d
        hwaddr ac:1f:6b:98:3b:9d
        nd6 options=9<PERFORMNUD,IFDISABLED>
        media: Ethernet autoselect
        status: no carrier
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
        inet 127.0.0.1 netmask 0xff000000
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        groups: lo
bridge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        description: iohyve-bridge-igb0
        ether 02:11:e2:37:39:00
        nd6 options=1<PERFORMNUD>
        groups: bridge
        id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
        maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
        root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
        member: tap0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 5 priority 128 path cost 2000000
        member: vnet0:21 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 11 priority 128 path cost 2000
        member: vnet0:20 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 14 priority 128 path cost 2000
        member: vnet0:19 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 13 priority 128 path cost 2000
        member: vnet0:16 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 10 priority 128 path cost 2000
        member: vnet0:15 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 9 priority 128 path cost 2000
        member: vnet0:14 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 8 priority 128 path cost 2000
        member: vnet0:13 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 7 priority 128 path cost 2000
        member: vnet0:12 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 6 priority 128 path cost 2000
        member: igb0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 1 priority 128 path cost 2000000
vnet0:12: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        description: associated with jail: calibre as nic: epair0b
        options=8<VLAN_MTU>
        ether 02:ff:60:66:c6:3f
        hwaddr 02:a8:d0:00:06:0a
        nd6 options=1<PERFORMNUD>
        media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
        status: active
        groups: epair
vnet0:13: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        description: associated with jail: dns as nic: epair0b
        options=8<VLAN_MTU>
        ether 02:ff:60:fd:33:60
        hwaddr 02:a8:d0:00:07:0a
        nd6 options=1<PERFORMNUD>
        media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
        status: active
        groups: epair
vnet0:14: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        description: associated with jail: embyms as nic: epair0b
        options=8<VLAN_MTU>
        ether 02:ff:60:0d:d6:f3
        hwaddr 02:a8:d0:00:08:0a
        nd6 options=1<PERFORMNUD>
        media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
        status: active
        groups: epair
vnet0:15: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        description: associated with jail: homeassistant as nic: epair0b
        options=8<VLAN_MTU>
        ether 02:ff:60:b3:40:32
        hwaddr 02:a8:d0:00:09:0a
        nd6 options=1<PERFORMNUD>
        media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
        status: active
        groups: epair
vnet0:16: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        description: associated with jail: nextcloud as nic: epair0b
        options=8<VLAN_MTU>
        ether 02:ff:60:ba:b5:81
        hwaddr 02:a8:d0:00:0a:0a
        nd6 options=1<PERFORMNUD>
        media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
        status: active
        groups: epair
vnet0:19: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        description: associated with jail: vpn as nic: epair0b
        options=8<VLAN_MTU>
        ether 02:ff:60:49:46:d1
        hwaddr 02:a8:d0:00:0d:0a
        nd6 options=1<PERFORMNUD>
        media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
        status: active
        groups: epair
vnet0:20: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        description: associated with jail: wordpress as nic: epair0b
        options=8<VLAN_MTU>
        ether 02:ff:60:ed:53:08
        hwaddr 02:a8:d0:00:0e:0a
        nd6 options=1<PERFORMNUD>
        media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
        status: active
        groups: epair
vnet0:21: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        description: associated with jail: ssl-proxy as nic: epair0b
        options=8<VLAN_MTU>
        ether 02:ff:60:68:78:43
        hwaddr 02:a8:d0:00:0b:0a
        nd6 options=1<PERFORMNUD>
        media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
        status: active
        groups: epair
bridge1: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        ether 02:11:e2:37:39:01
        nd6 options=1<PERFORMNUD>
        groups: bridge
        id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
        maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
        root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
        member: igb1 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 2 priority 128 path cost 55
tap0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        description: Attached to onlyoffice
        options=80000<LINKSTATE>
        ether 00:bd:5f:50:04:00
        hwaddr 00:bd:5f:50:04:00
        nd6 options=1<PERFORMNUD>
        media: Ethernet autoselect
        status: active
        groups: tap
        Opened by PID 98623


So I think there is some problem with vnet0:21 (the ssl-proxy jail) talking to tap0 (the Ubuntu VM), but I'm not sure what :(

Can anyone see anything obvious? And any recommendations for good networking books relevant to a FreeNAS set-up would be much appreciated.

Thanks!
 

adrianwi

Haha! I think I might have figured this out, and will probably be investing in Networking for Dummies over the festive period.

I'd switched off all of the jails running on my old FreeNAS box after transferring everything across, as I intended to use the same IP addresses for them when I switched them on on the new FreeNAS box.

I was starting to prepare the old one to replicate data back from the new one and noticed that two jails were running. It turns out I'd forgotten to delete 2 Cron Tasks: one that checks Calibre is running and another that runs the Certbot renew script. These were the two jails running, so the Cron Tasks must have restarted them. Unsurprisingly, the network didn't like having 2 jails running on the same IP, one directing to the old IP addresses for the VMs and one to the new IP addresses!

Cron Tasks removed and old jails stopped, and everything appears to be working as it should!
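
For anyone hitting something similar, a duplicate IP like this can be spotted from the ARP table: the same address shows up with different MAC addresses at different times. Below is a rough sketch, assuming lines in the format printed by FreeBSD's `arp -an` and collected over a few minutes from another machine on the LAN. The function name `find_ip_conflicts` and the sample addresses are made up for illustration.

```python
import re
from collections import defaultdict

# Matches FreeBSD-style `arp -an` lines such as:
# ? (192.168.168.30) at 02:00:00:00:00:0a on igb0 expires in 1200 seconds [ethernet]
# Entries shown as "(incomplete)" have no MAC and are skipped.
ARP_LINE = re.compile(
    r"\((?P<ip>\d+\.\d+\.\d+\.\d+)\) at (?P<mac>[0-9a-fA-F]{2}(?::[0-9a-fA-F]{2}){5})"
)


def find_ip_conflicts(arp_lines):
    """Return {ip: [macs]} for every IP seen with more than one MAC address."""
    seen = defaultdict(set)
    for line in arp_lines:
        match = ARP_LINE.search(line)
        if match:
            seen[match.group("ip")].add(match.group("mac").lower())
    return {ip: sorted(macs) for ip, macs in seen.items() if len(macs) > 1}


if __name__ == "__main__":
    # Hypothetical samples taken a few minutes apart; IPs and MACs are invented.
    samples = [
        "? (192.168.168.30) at 02:00:00:00:00:0a on igb0 expires in 1200 seconds [ethernet]",
        "? (192.168.168.41) at 02:00:00:00:00:41 on igb0 expires in 900 seconds [ethernet]",
        "? (192.168.168.30) at 02:00:00:00:00:0b on igb0 expires in 1180 seconds [ethernet]",
    ]
    for ip, macs in find_ip_conflicts(samples).items():
        print(f"{ip} answered from multiple MACs: {', '.join(macs)}")
```

An IP reported here is only a hint, not proof: a MAC can also change legitimately, for example when a jail or VM is recreated.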

Merry Christmas and a Happy New Year everyone!!!
 