Network interface keeps dropping

Status
Not open for further replies.

Ramenator

Dabbler
Joined
Oct 13, 2013
Messages
18
Hey guys, I've encountered another issue with my FreeNAS 9.x machine and was in need of some help and guidance. I've had a few issues since upgrading from 8.x to 9.x and had everything ironed out for a few months, until recently of course.

The issue seems to be my network card interface flapping. Essentially, I open a few of the web GUIs via a browser (about 5 GUIs including FreeNAS) and the connection dies. After some troubleshooting I determined it was the network interface dropping. I let a ping run until I get a reply back and the FreeNAS machine is reachable again. I may or may not be able to access the GUI again though as the connection will drop once I try and connect. It's really hit-or-miss.

I'm not sure what to pinpoint as the culprit yet. I'm on vacation as I work through this, so I haven't spent much time getting my hands dirty. I thought I'd ask for some feedback before getting home in a few days and digging into it. I don't believe it's a cabling issue, since it happens when the FreeNAS box is under some load. I haven't quite ruled out my switches, but they're lower on the list (a couple of unmanaged gigabit switches).

I've included some information below that may assist in this endeavour. Cheers!

Network card: Intel Gigabit NIC (82541PI controller)

Code:
Build    FreeNAS-9.2.1.5-RELEASE-x64 (80c1d35)
Platform    AMD Phenom(tm) II X4 965 Processor
Memory    11752MB
System Time    Sat Jul 05 20:00:14 EDT 2014
Uptime    8:00PM up 3 mins, 0 users
Load Average    0.86, 0.76, 0.35
 
# ifconfig
re0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8209b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC,LINKSTATE>
        ether 6c:f0:49:e3:b7:78
        nd6 options=9<PERFORMNUD,IFDISABLED>
        media: Ethernet autoselect (10baseT/UTP <half-duplex>)
        status: no carrier
em0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=2098<VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC>
        ether 90:e2:ba:3c:07:2c
        inet 192.168.2.120 netmask 0xffffff00 broadcast 192.168.2.255
        inet 192.168.2.200 netmask 0xffffff00 broadcast 192.168.2.255
        nd6 options=9<PERFORMNUD,IFDISABLED>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
ipfw0: flags=8801<UP,SIMPLEX,MULTICAST> metric 0 mtu 65536
        nd6 options=9<PERFORMNUD,IFDISABLED>
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0xb
        inet 127.0.0.1 netmask 0xff000000
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
bridge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        ether 02:42:84:e7:a9:00
        nd6 options=1<PERFORMNUD>
        id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
        maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
        root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
        member: epair3a flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 16 priority 128 path cost 2000
        member: epair2a flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 15 priority 128 path cost 2000
        member: epair1a flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 14 priority 128 path cost 2000
        member: epair0a flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 13 priority 128 path cost 2000
        member: em0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 6 priority 128 path cost 20000
epair0a: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8<VLAN_MTU>
        ether 02:09:b9:00:0d:0a
        nd6 options=1<PERFORMNUD>
        media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
        status: active
epair1a: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8<VLAN_MTU>
        ether 02:88:bc:00:0e:0a
        nd6 options=1<PERFORMNUD>
        media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
        status: active
epair2a: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8<VLAN_MTU>
        ether 02:f1:61:00:0f:0a
        nd6 options=1<PERFORMNUD>
        media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
        status: active
epair3a: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8<VLAN_MTU>
        ether 02:40:90:00:10:0a
        nd6 options=1<PERFORMNUD>
        media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
        status: active
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
First thing that comes to mind is a flaky network cable. In the iX lab we've learned that Cat 5e isn't as good for Gigabit as people would like to think, and if you have Cat 6 you should definitely go that route instead. If you are looking for cheap networking cables, I'd recommend monoprice.com.
 

Ramenator

Dabbler
Joined
Oct 13, 2013
Messages
18
a6bFIdz.png

Here is a rough diagram that includes my main components: my PC, the FreeNAS server, switches, and the gateway.

This is weird. I don't suspect the cabling. I just replaced CABLE-1 and CABLE-2 in that diagram and no change. I can re-create the issue once I've started a traffic session with the box. I.e., I can run a ping test that will get replies but once I open up all the web GUIs or start an SSH session, the pings die. I don't suspect CABLE-4 as my PC-1 Internet connection would also fail.
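
The ping test described above can be made easier to compare against other events (opening the GUIs, starting SSH) by collapsing the results into outage windows. This is only a rough Python sketch: the function name `outage_windows` and the sample data are made up for illustration, and it assumes you have already recorded one (timestamp, replied) pair per ping by whatever means you prefer.

```python
from typing import Iterable, List, Tuple


def outage_windows(samples: Iterable[Tuple[float, bool]]) -> List[Tuple[float, float]]:
    """Collapse (timestamp, replied) ping samples into (start, end) outages.

    An outage starts at the first missed reply and ends at the next
    successful reply. An outage still open when the data runs out is
    closed at the last timestamp seen.
    """
    windows = []
    start = None
    last_ts = None
    for ts, replied in samples:
        last_ts = ts
        if not replied and start is None:
            start = ts  # first missed reply opens a window
        elif replied and start is not None:
            windows.append((start, ts))  # first reply closes it
            start = None
    if start is not None:
        windows.append((start, last_ts))
    return windows


if __name__ == "__main__":
    # Hypothetical one-second samples: True = reply, False = timeout.
    demo = [(0, True), (1, True), (2, False), (3, False), (4, True), (5, False)]
    for begin, end in outage_windows(demo):
        print(f"unreachable from t={begin}s to t={end}s")
```

If the windows consistently begin right after a new session is opened, that points toward load on the path rather than a cable that fails at random.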

What do you get from netstat -i -I em0? Also, netstat -s?

Here you go.

Code:
[root@Ramenator_NAS] ~# netstat -i -I em0
Name    Mtu Network      Address              Ipkts Ierrs Idrop    Opkts Oerrs  Coll
em0    1500 <Link#6>      90:e2:ba:3c:07:2c  2474253  6153    0  2466667    12    0
em0    1500 192.168.2.0  192.168.2.120        75325    -    -  1222711    -    -
em0    1500 192.168.2.0  192.168.2.200      1012790    -    -        0    -    -
[root@Ramenator_NAS] ~# netstat -s
tcp:
        204283 packets sent
                143247 data packets (179400874 bytes)
                4098 data packets (5620603 bytes) retransmitted
                4 data packets unnecessarily retransmitted
                0 resends initiated by MTU discovery
                46004 ack-only packets (0 delayed)
                0 URG only packets
                0 window probe packets
                3 window update packets
                10931 control packets
        106207 packets received
                62422 acks (for 179347837 bytes)
                12956 duplicate acks
                0 acks for unsent data
                38161 packets (9373186 bytes) received in-sequence
                112 completely duplicate packets (5583 bytes)
                1 old duplicate packet
                8 packets with some dup. data (1348 bytes duped)
                321 out-of-order packets (345836 bytes)
                0 packets (0 bytes) of data after window
                0 window probes
                529 window update packets
                474 packets received after close
                0 discarded for bad checksums
                0 discarded for bad header offset fields
                0 discarded because packet too short
                0 discarded due to memory problems
        5245 connection requests
        2494 connection accepts
        0 bad connection attempts
        0 listen queue overflows
        19 ignored RSTs in the windows
        5532 connections established (including accepts)
        10723 connections closed (including 688 drops)
                1566 connections updated cached RTT on close
                1596 connections updated cached RTT variance on close
                36 connections updated cached ssthresh on close
        354 embryonic connections dropped
        43684 segments updated rtt (of 46181 attempts)
        3911 retransmit timeouts
                231 connections dropped by rexmit timeout
        0 persist timeouts
                0 connections dropped by persist timeout
        0 Connections (fin_wait_2) dropped because of timeout
        1 keepalive timeout
                0 keepalive probes sent
                1 connection dropped by keepalive
        27728 correct ACK header predictions
        23083 correct data packet header predictions
        2496 syncache entries added
                19 retransmitted
                11 dupsyn
                0 dropped
                2494 completed
                0 bucket overflow
                0 cache overflow
                0 reset
                3 stale
                0 aborted
                0 badack
                0 unreach
                0 zone failures
        2496 cookies sent
        1 cookie received
        223 hostcache entries added
                0 bucket overflow
        1512 SACK recovery episodes
        2368 segment rexmits in SACK recovery episodes
        3392141 byte rexmits in SACK recovery episodes
        12094 SACK options (SACK blocks) received
        205 SACK options (SACK blocks) sent
        0 SACK scoreboard overflow
        0 packets with ECN CE bit set
        0 packets with ECN ECT(0) bit set
        0 packets with ECN ECT(1) bit set
        0 successful ECN handshakes
        0 times ECN reduced the congestion window
udp:
        1219900 datagrams received
        0 with incomplete header
        0 with bad data length field
        0 with bad checksum
        12529 with no checksum
        1202 dropped due to no socket
        242800 broadcast/multicast datagrams undelivered
        392 dropped due to full socket buffers
        0 not for hashed pcb
        975506 delivered
        1040662 datagrams output
        0 times multicast source filter matched
ip:
        1381485 total packets received
        0 bad header checksums
        0 with size smaller than minimum
        0 with data size < data length
        0 with ip length > max ip packet size
        0 with header length < data size
        0 with data length < header length
        0 with bad options
        0 with incorrect version number
        16732 fragments received
        3867 fragments dropped (dup or out of space)
        3853 fragments dropped after timeout
        4499 packets reassembled ok
        1325484 packets for this host
        5853 packets for unknown/unsupported protocol
        0 packets forwarded (0 packets fast forwarded)
        37915 packets not forwardable
        104 packets received for unknown multicast group
        0 redirects sent
        1248032 packets sent from this host
        0 packets sent with fabricated ip header
        0 output packets dropped due to no bufs, etc.
        0 output packets discarded due to no route
        0 output datagrams fragmented
        0 fragments created
        0 datagrams that can't be fragmented
        0 tunneling packets that can't find gif
        0 datagrams with bad address in header
icmp:
        3 calls to icmp_error
        0 errors not generated in response to an icmp message
        Output histogram:
                echo reply: 580
                destination unreachable: 3
        0 messages with bad code fields
        0 messages less than the minimum length
        0 messages with bad checksum
        0 messages with bad length
        0 multicast echo requests ignored
        0 multicast timestamp requests ignored
        Input histogram:
                destination unreachable: 1557
                echo: 580
                time exceeded: 10
        580 message responses generated
        0 invalid return addresses
        0 no return routes
        ICMP address mask responses are disabled
igmp:
        4286 messages received
        0 messages received with too few bytes
        0 messages received with wrong TTL
        0 messages received with bad checksum
        1702 V1/V2 membership queries received
        0 V3 membership queries received
        0 membership queries received with invalid field(s)
        1292 general queries received
        410 group queries received
        0 group-source queries received
        0 group-source queries dropped
        2584 membership reports received
        0 membership reports received with invalid field(s)
        0 membership reports received for groups to which we belong
        0 V3 reports received without Router Alert
        766 membership reports sent
carp:
        0 packets received (IPv4)
        0 packets received (IPv6)
                0 packets discarded for wrong TTL
                0 packets shorter than header
                0 discarded for bad checksums
                0 discarded packets with a bad version
                0 discarded because packet too short
                0 discarded for bad authentication
                0 discarded for bad vhid
                0 discarded because of a bad address list
        0 packets sent (IPv4)
        0 packets sent (IPv6)
                0 send failed due to mbuf memory error
arp:
        61 ARP requests sent
        4713 ARP replies sent
        35158 ARP requests received
        6 ARP replies received
        65801 ARP packets received
        54 total packets dropped due to no ARP entry
        45 ARP entrys timed out
        0 Duplicate IPs seen
ip6:
        1241 total packets received
        0 with size smaller than minimum
        0 with data size < data length
        0 with bad options
        0 with incorrect version number
        0 fragments received
        0 fragments dropped (dup or out of space)
        0 fragments dropped after timeout
        0 fragments that exceeded limit
        0 packets reassembled ok
        1241 packets for this host
        0 packets forwarded
        0 packets not forwardable
        0 redirects sent
        4 packets sent from this host
        0 packets sent with fabricated ip header
        0 output packets dropped due to no bufs, etc.
        275 output packets discarded due to no route
        0 output datagrams fragmented
        0 fragments created
        0 datagrams that can't be fragmented
        0 packets that violated scope rules
        38 multicast packets which we don't join
        Input histogram:
                UDP: 1203
                ICMP6: 38
        Mbuf statistics:
                2 one mbuf
                two or more mbuf:
                        lo0= 2
                        bridge0= 1237
                1237 one ext mbuf
                0 two or more ext mbuf
        0 packets whose headers are not contiguous
        0 tunneling packets that can't find gif
        0 packets discarded because of too many headers
        4 failures of source address selection
        source addresses on a non-outgoing I/F
                4 addresses scope=f
        Source addresses selection rule applied:
                4 same address
icmp6:
        0 calls to icmp6_error
        0 errors not generated in response to an icmp6 message
        0 errors not generated because of rate limitation
        0 messages with bad code fields
        0 messages < minimum length
        0 bad checksums
        0 messages with bad length
        Histogram of error messages to be generated:
                0 no route
                0 administratively prohibited
                0 beyond scope
                0 address unreachable
                0 port unreachable
                0 packet too big
                0 time exceed transit
                0 time exceed reassembly
                0 erroneous header field
                0 unrecognized next header
                0 unrecognized option
                0 redirect
                0 unknown
        0 message responses generated
        0 messages with too many ND options
        0 messages with bad ND options
        0 bad neighbor solicitation messages
        0 bad neighbor advertisement messages
        0 bad router solicitation messages
        0 bad router advertisement messages
        0 bad redirect messages
        0 path MTU changes
rip6:
        0 messages received
        0 checksum calculations on inbound
        0 messages with bad checksum
        0 messages dropped due to no socket
        0 multicast messages dropped due to no socket
        0 messages dropped due to full socket buffers
        0 delivered
        0 datagrams output
 

solarisguy

Guru
Joined
Apr 4, 2014
Messages
1,125
Did you try a different port on the switch in the basement or a temporary direct cable from FreeNAS to your gateway?

Can you test using a laptop connected to the switch in the basement, so you would see results instantly? You can try to borrow a switch from upstairs for testing...
 

Ramenator

Dabbler
Joined
Oct 13, 2013
Messages
18
Trying that now. Just realized that I don't even need the switch down there anymore after upgrading to FreeNAS 9.x and migrating all my services from a second server to FreeNAS.

I'll be happy if this turns out to be the cause. I would have eventually gotten to testing that out (using logic, yo) but I'm mush after this travel/vacation.

Will keep you posted!
 

c32767a

Patron
Joined
Dec 13, 2012
Messages
371
According to that output, you've got a few input errors. It's hard to tell exactly what the error rate is since I don't know when the counters were last reset, but there's definitely some garbage coming in.

I'll be curious to hear how things work when you eliminate the switch.

By the way, when you say "interface flapping", do you actually see up and down status messages in the messages file?
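
To put rough numbers on the input errors, the counters from the netstat output posted earlier can be turned into percentages. This is a minimal Python sketch, not a diagnosis: the helper name `rate_percent` is made up for illustration, and the counters are cumulative since an unknown reset time, so the result is an average rather than the rate during a drop.

```python
def rate_percent(errors: int, total: int) -> float:
    """Return errors as a percentage of total (0.0 if total is zero)."""
    return 100.0 * errors / total if total else 0.0


# Counters copied from the netstat output earlier in this thread.
IPKTS, IERRS = 2474253, 6153  # em0 link-level input packets / input errors
OPKTS, OERRS = 2466667, 12    # em0 link-level output packets / output errors
TCP_DATA_PKTS = 143247        # "data packets" sent
TCP_REXMIT_PKTS = 4098        # "data packets ... retransmitted"

input_error_pct = rate_percent(IERRS, IPKTS)
output_error_pct = rate_percent(OERRS, OPKTS)
tcp_rexmit_pct = rate_percent(TCP_REXMIT_PKTS, TCP_DATA_PKTS)

if __name__ == "__main__":
    print(f"em0 input errors:    {input_error_pct:.3f}% of input packets")
    print(f"em0 output errors:   {output_error_pct:.4f}% of output packets")
    print(f"TCP retransmissions: {tcp_rexmit_pct:.2f}% relative to data packets")
```

With these counters, input errors come to roughly 0.25% of received packets and retransmissions to a little under 3% relative to data packets sent, while output errors are negligible. Re-running the same arithmetic on two netstat snapshots taken a few minutes apart would show whether the errors cluster around the drops.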
 

Ramenator

Dabbler
Joined
Oct 13, 2013
Messages
18
Turns out to be the switch after all. Yeah, the interface would flap with the em0 interface going down/up a few times sporadically.

I've been using it for a few hours now without the switch and it's been good. Relief.
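
For anyone wanting to confirm flapping from the log rather than by watching pings, FreeBSD normally logs a "link state changed to UP/DOWN" kernel message for each transition. Below is a small Python sketch that counts those transitions for one interface. The function name `link_transitions` and the sample log lines are invented for illustration (they are not taken from this machine), and the log location, usually /var/log/messages, is an assumption about the system's syslog setup.

```python
import re
from typing import Iterable, List

# Matches kernel messages such as "em0: link state changed to DOWN".
LINK_RE = re.compile(r"(?P<ifname>\w+): link state changed to (?P<state>UP|DOWN)")


def link_transitions(lines: Iterable[str], ifname: str = "em0") -> List[str]:
    """Return the sequence of UP/DOWN states logged for one interface."""
    states = []
    for line in lines:
        match = LINK_RE.search(line)
        if match and match.group("ifname") == ifname:
            states.append(match.group("state"))
    return states


if __name__ == "__main__":
    # Hypothetical sample lines; on a real system, read /var/log/messages.
    sample = [
        "Jul  7 08:12:01 nas kernel: em0: link state changed to DOWN",
        "Jul  7 08:12:04 nas kernel: em0: link state changed to UP",
        "Jul  7 08:12:05 nas kernel: re0: link state changed to DOWN",
    ]
    states = link_transitions(sample)
    print(f"em0 transitions: {states} ({states.count('DOWN')} drop(s))")
```

If the pings die but no DOWN events are logged, the link itself is staying up and the problem is more likely above the physical layer.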
 

Ramenator

Dabbler
Joined
Oct 13, 2013
Messages
18
Hm, the plot thickens. The issue has re-surfaced after being directly connected. It worked fine from Sunday night until this morning.

Edit: The two most promising leads I've come across are updating the Intel NIC driver (https://downloadcenter.intel.com/Detail_Desc.aspx?agr=Y&DwnldID=17509&ProdId=3023&lang=eng via https://forums.freebsd.org/viewtopic.php?&t=21528) and a suggestion from a friend of mine, who is well-versed in Unix, that MSI may not be working properly.

What do you guys think? Appreciate the support as always!
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
MSI is your motherboard brand?
 

solarisguy

Guru
Joined
Apr 4, 2014
Messages
1,125
I think that the software (driver) problem would have manifested earlier.

Does your motherboard have more than one Ethernet NIC?

Can you try a different port on the router?
 

Ramenator

Dabbler
Joined
Oct 13, 2013
Messages
18
This Intel NIC is a recent addition; I installed it a few months ago. It's a PCI NIC. I have one NIC on my mobo that I could revert to, but it's obviously non-Intel.
 

solarisguy

Guru
Joined
Apr 4, 2014
Messages
1,125
If the Intel NIC has failed, it would be failing if taken to another (test) system too.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I wouldn't use PCI anymore. It's a bottleneck for Gb LAN.
 

Whattteva

Wizard
Joined
Mar 5, 2013
Messages
1,824
I'm assuming you're talking about old-school PCI and not PCI Express... but what boards these days even have those old PCI slots anymore?
If you're referring to PCI Express... there is no chance in hell a 1 Gbps NIC can saturate even one lane of PCIe 2.0, let alone two lanes.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Unfortunately (fortunately for those with obscure, old hardware), PCI is still stupidly common. I get the feeling it's mostly used as filler to not have an empty slot. Of course, it might as well be empty for the vast majority of users...
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Yes, I was talking about PCI. PCIe is much faster and even a 1x PCIe slot at v1.0 would provide almost enough bandwidth to do dual Gb.
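
The bandwidth comparison can be checked with back-of-the-envelope arithmetic. This Python sketch uses the nominal figures for conventional 32-bit/33 MHz PCI and for PCIe 1.0/2.0 lanes with 8b/10b encoding. The function names are made up for illustration, and the results are theoretical peaks before protocol overhead. The slot type on this particular board is assumed, not confirmed in the thread.

```python
def pci_classic_gbps(width_bits: int = 32, clock_mhz: float = 33.33) -> float:
    """Peak throughput of a conventional PCI bus in Gbit/s.

    The bus is shared by every device on it and is half duplex, so this
    figure covers traffic in both directions combined.
    """
    return width_bits * clock_mhz / 1000.0


def pcie_lane_gbps(gt_per_s: float, encoding_efficiency: float = 0.8) -> float:
    """Peak throughput of one PCIe lane, per direction, in Gbit/s.

    PCIe 1.0 and 2.0 use 8b/10b encoding, hence the 0.8 efficiency.
    """
    return gt_per_s * encoding_efficiency


GIGABIT_FULL_DUPLEX_GBPS = 2.0  # 1 Gbit/s in each direction

if __name__ == "__main__":
    print(f"PCI 32-bit/33 MHz (shared, both directions): {pci_classic_gbps():.2f} Gbit/s")
    print(f"PCIe 1.0 x1 (per direction): {pcie_lane_gbps(2.5):.1f} Gbit/s")
    print(f"PCIe 2.0 x1 (per direction): {pcie_lane_gbps(5.0):.1f} Gbit/s")
    print(f"Gigabit Ethernet, full duplex total: {GIGABIT_FULL_DUPLEX_GBPS:.1f} Gbit/s")
```

Conventional PCI tops out near 1.07 Gbit/s for the whole bus, which is less than one gigabit port needs when sending and receiving at line rate at the same time. A single PCIe 1.0 lane offers 2 Gbit/s in each direction before overhead, which is why it comes close to covering two gigabit ports.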
 

Ramenator

Dabbler
Joined
Oct 13, 2013
Messages
18
I'm going to revert back to the on-board NIC as it never gave me this issue. I'm wondering if you guys think it might be the Intel driver causing this problem?
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Doubtful, given that there would be a large number of complaints by now...
 