Hi - I'm really hoping someone may be able to shed some light on a network issue I'm facing with my FreeNAS-9.10.2-U2 (e1497f2) server. The server is a Dell machine with two NVIDIA MCP55 network interfaces, and it serves as the datastore for my ESXi infrastructure.
The cards are configured as follows:
nfe0 - 192.168.0.21 - Management Network
nfe1 - 192.168.10.21 - Data Network (NFS)
The issue I am facing is that during large data transfers the nfe1 interface stops responding. The server can still ping its own address, 192.168.10.21 (I assume that is handled internally and never reaches the wire), but it cannot ping any other machine on the network. Likewise, no other machine on the 192.168.10.x network can ping 192.168.10.21.
If I restart the netif service, nfe1 comes back to life. What's interesting is that if I run tcpdump -i nfe1 during the crash / drop, I can still see incoming traffic such as ARP requests and NFS requests, but as far as the ESXi hosts, vCenter and every other machine on the network are concerned, the interface is down.
At this point I can only assume that the NIC / Driver is crashing. The net result is that I've lost VMs on some occasions.
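As a stopgap, the manual restart could in principle be automated. The following is only a minimal sketch, not something I have running: it assumes that `service netif restart nfe1` is enough to recover the interface, that FreeBSD's `ping -c`/`-t` options are available, and that there is some always-on peer on the data network to probe. The function name `check_and_recover` and the variables `IFACE`, `PING_CMD` and `RESTART_CMD` are made up for illustration.

```shell
#!/bin/sh
# Hypothetical watchdog sketch: restart the data interface when a peer stops answering.
# Assumptions (not from the original post): the variable names, the probe peer,
# and that restarting netif for one interface is a sufficient recovery step.

IFACE="${IFACE:-nfe1}"                               # interface to recover
PING_CMD="${PING_CMD:-ping -c 3 -t 5}"               # FreeBSD ping: 3 probes, give up after 5 s
RESTART_CMD="${RESTART_CMD:-service netif restart}"  # same restart I currently do by hand

# Usage: check_and_recover <peer-ip>
# Prints "ok" if the peer answers; otherwise logs and restarts the interface.
check_and_recover() {
    peer="$1"
    if $PING_CMD "$peer" >/dev/null 2>&1; then
        echo "ok"
    else
        echo "restarting $IFACE"
        $RESTART_CMD "$IFACE"
    fi
}
```

Calling `check_and_recover <peer-ip>` from cron every minute or so, with the address of one of the ESXi hosts on the 192.168.10.x network, would only shorten the outage; it does nothing about whatever is causing the hang in the first place.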
Has anyone seen this issue before, or does anyone know of a way to limit the traffic on the card to prevent the crash? I've read in multiple places that these cards should be using the forcedeth driver?
This is the dmesg output:
nfe0: <NVIDIA nForce MCP55 Networking Adapter> port 0x3088-0x308f mem 0xc8045000-0xc8045fff,0xc8041800-0xc80418ff,0xc8041400-0xc804140f at device 8.0 on pci0
miibus0: <MII bus> on nfe0
e1000phy0: <Marvell 88E1116 Gigabit PHY> PHY 1 on miibus0
e1000phy0: none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
nfe0: Using defaults for TSO: 65518/35/2048
nfe0: Ethernet address: 6c:f0:49:4e:43:62
nfe1: <NVIDIA nForce MCP55 Networking Adapter> port 0x3090-0x3097 mem 0xc8047000-0xc8047fff,0xc8046000-0xc80460ff,0xc8041c00-0xc8041c0f at device 9.0 on pci0
miibus1: <MII bus> on nfe1
e1000phy1: <Marvell 88E1116 Gigabit PHY> PHY 2 on miibus1
e1000phy1: none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
nfe1: Using defaults for TSO: 65518/35/2048
nfe1: Ethernet address: 6c:f0:49:4e:43:63
I'd really appreciate any assistance, as this causes a lot of grief each time the interface *appears* to crash or drop from the network.
Any ideas?
Thanks for your time :)