Troubleshooting why network interfaces reset

stevewest15

Cadet
Joined
Nov 24, 2019
Messages
3
Hello,

Any ideas on how I can troubleshoot why two NICs reset and a third NIC (cxl0) stopped working, which took the NFS mounts offline on a remote XCP-ng server? The FreeNAS box has the following 5 NICs:

2020-04-29_1328.png


I tried looking at /var/log/messages, but there is no mention of the network interfaces:

# grep "Apr 29" /var/log/messages
Apr 29 00:00:00 freenas25 syslog-ng[6165]: Configuration reload request received, reloading configuration;
Apr 29 00:00:00 freenas25 syslog-ng[6165]: Configuration reload finished;

I searched dmesg and found mentions of bge1 and bge3, but not cxl0:

# tail /var/log/dmesg.today
mfi0: 16400 (638071191s/0x0020/WARN) - Patrol Read can't be started, as PDs are either not ONLINE, or are in a VD with an active process, or are in an excluded VD
mfi0: 16401 (638675991s/0x0020/WARN) - Patrol Read can't be started, as PDs are either not ONLINE, or are in a VD with an active process, or are in an excluded VD
mfi0: 16402 (639280791s/0x0020/WARN) - Patrol Read can't be started, as PDs are either not ONLINE, or are in a VD with an active process, or are in an excluded VD
bge1: link state changed to DOWN
bge3: link state changed to DOWN
bge1: link state changed to UP
bge3: link state changed to UP

mfi0: 16403 (639885591s/0x0020/WARN) - Patrol Read can't be started, as PDs are either not ONLINE, or are in a VD with an active process, or are in an excluded VD
mfi0: 16404 (640490391s/0x0020/WARN) - Patrol Read can't be started, as PDs are either not ONLINE, or are in a VD with an active process, or are in an excluded VD
mfi0: 16405 (641095191s/0x0020/WARN) - Patrol Read can't be started, as PDs are either not ONLINE, or are in a VD with an active process, or are in an excluded VD
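
Since `tail` only shows the last few lines, a filter on the link-state messages may be a quicker way to see every interface that flapped. The following is a minimal sketch, not a definitive method: `link_events` is a hypothetical helper name, and it assumes cxl0 logs its transitions in the same "link state changed to UP/DOWN" format that bge1 and bge3 use above.

```shell
# Hypothetical helper: print link-state transitions for the named interfaces.
# Usage: link_events FILE IFACE...
# Example: link_events /var/log/dmesg.today bge1 bge3 cxl0
link_events() {
    file=$1
    shift
    # Join the interface names into an alternation, e.g. "bge1|bge3|cxl0".
    pattern=$(printf '%s|' "$@")
    pattern=${pattern%|}
    # Assumption: every driver logs "<iface>: link state changed to UP|DOWN".
    grep -E "^(${pattern}): link state changed to (UP|DOWN)" "$file"
}
```

Because dmesg lines carry no wall-clock timestamps here, the output only gives the order of events, not when they happened.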

On the XCP-ng server, which is directly connected via 10GbE to cxl0, the kernel log shows the following:

Apr 29 08:03:31 XenServer70 kernel: [7539679.420431] nfs: server 10.10.1.5 not responding, timed out
Apr 29 08:03:34 XenServer70 kernel: [7539681.724438] nfs: server 10.10.1.5 not responding, timed out
Apr 29 08:03:38 XenServer70 kernel: [7539686.076346] nfs: server 10.10.1.5 not responding, timed out
Apr 29 08:03:38 XenServer70 kernel: [7539686.076384] nfs: server 10.10.1.5 not responding, timed out
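
To line these timeouts up with events on the FreeNAS side, it may help to reduce them to a single time window. This is a rough sketch under stated assumptions: `nfs_timeout_window` is a hypothetical helper, the log file path is whatever file the lines above came from, and the lines are assumed to start with the usual syslog "Mon DD HH:MM:SS" timestamp.

```shell
# Hypothetical helper: print the first and last timestamp at which the
# client logged "not responding" for a given NFS server.
# Usage: nfs_timeout_window LOGFILE SERVER_IP
nfs_timeout_window() {
    file=$1
    server=$2
    # -F matches the string literally, so the dots in the IP are not wildcards.
    grep -F "nfs: server ${server} not responding" "$file" |
        awk 'NR == 1 { first = $1 " " $2 " " $3 }
             { last = $1 " " $2 " " $3 }
             END { if (NR) print first " -> " last }'
}
```

It prints nothing if the log contains no timeouts for that server.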

The server 10.10.1.5 is the FreeNAS NIC cxl0, as you can see from the image above.

I'm confused as to why dmesg shows bge1 and bge3 but does not mention cxl0, which is the interface that actually had the issue. Any recommendations on how to proceed?

Thank you,

SW
 

stevewest15

Actually, dmesg.today does show cxl0 going down, but it is not clear why. See the attached dmesg-today.txt for the full output.
 

Attachments

  • dmesg-today.txt