Weird networking problems after adding an NFS share (LAGG related)

Status
Not open for further replies.

steven6282

Dabbler
Joined
Jul 22, 2014
Messages
39
First off, let me state my configuration:
4 x 1Gb NICs aggregated with LACP
32GB ECC Memory
6 x 4TB WD Red drives configured as a RaidZ2 pool with encryption enabled
C2550 CPU

I started seeing problems immediately after I added an NFS share. I suspect the share is only the trigger, though: enabling it probably does something with the networking configuration, and that is what causes the real problem to show up.

I just recently built this system, got everything configured on it, and have been running transfers back and forth to it for the past 3 days to be sure everything was working right. Everything seemed good, so I started configuring a few final things before putting it to use, and immediately after configuring an NFS share things went downhill.

When I initially boot the system everything is OK, but my storage is encrypted, so the pool is locked. As soon as I unlock the pool the problems start happening again. I suspect the Plex Media Server jail starting up is what kicks them off, since it does something with the networking at that point.

The problems are
1) The menu on the left side takes a long time to load. After unlocking the pool it disappears, and if I then refresh the page it takes 10+ minutes to load again.
2) Lots of other areas are very slow to load as well. Plugins, for example, will eventually load if I leave the page sitting long enough, but it takes a lot longer than it should (and longer than it used to).
3) All CIFS shares are extremely slow to connect from the file browser. Once the connection is made, though, copying a file off a Samba share maxes out the single 1Gb NIC on my desktop (~115 MB/s transfer speed).
4) I can't ping anything from shell, even though I can still access the server through the web gui, which is odd =/
5) This is probably completely unrelated, but once after a reboot when I tried unlocking my pool it didn't unlock and a message on the server said it couldn't find the pool... 2nd try unlocked the pool... That has me concerned slightly lol, I'm hoping that was simply something wasn't fully initialized yet when I tried unlocking it.

There are no alerts, and I can't find anything relevant in any of the logs under /var/log.
I've checked the Reports and everything shows very low utilization: 30GB of memory free, system load average of 0.16, CPU idle average of 97.72%, and low network utilization.
I ran a SMART scan and everything came back Healthy.

I suspect I've narrowed this down to some kind of networking issue, because if I go to network settings, interfaces, my LAGG interface shows Media Status as Down. If I edit it and click OK without changing any settings, which forces it to restart, it comes back up and everything starts behaving again for a little while. I think it behaves until something else touches the network configuration, and then the problems return (possibly explaining why I first saw this after enabling an NFS share?).
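For anyone wanting to check the lagg from the shell rather than the web GUI, here is a minimal sketch of what to look for in `ifconfig lagg0` output on FreeBSD. The sample text is illustrative only: the interface names (`igb0` to `igb3`) and flag values are assumptions, not output captured from this system. Under LACP, a healthy member port normally shows all three of `ACTIVE`, `COLLECTING`, and `DISTRIBUTING`.

```python
import re

# Illustrative sample in the shape FreeBSD prints for `ifconfig lagg0`.
# Interface names and flag values are assumptions, not real output
# from the machine discussed in this thread.
SAMPLE = """\
lagg0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    media: Ethernet autoselect
    status: active
    laggproto lacp lagghash l2,l3,l4
    laggport: igb0 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
    laggport: igb1 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
    laggport: igb2 flags=1c<ACTIVE,COLLECTING,DISTRIBUTING>
    laggport: igb3 flags=0<>
"""

# The three state flags a fully negotiated LACP member port carries.
REQUIRED = {"ACTIVE", "COLLECTING", "DISTRIBUTING"}

PORT_RE = re.compile(r"laggport:\s+(\S+)\s+flags=[0-9a-fA-F]+<([^>]*)>")


def unhealthy_ports(ifconfig_text):
    """Return the names of laggports missing any of the required flags."""
    bad = []
    for match in PORT_RE.finditer(ifconfig_text):
        name = match.group(1)
        flags = {f for f in match.group(2).split(",") if f}
        if not REQUIRED <= flags:
            bad.append(name)
    return bad


print(unhealthy_ports(SAMPLE))
```

Running the same check before and after the problem appears would show whether individual member ports drop out of the aggregate even while the lagg itself still reports Active.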

I've tested each port individually without Link Aggregation set up, and all of them work fine. My switch has the LACP group configured for the correct ports and shows all of them working at 0% utilization most of the time. The switch's CPU is 5% utilized and its memory is 45% utilized.

Anyone have any ideas?

EDIT:
Just a couple more tidbits:
I just had the problem repeat itself without the LAGG showing Media Status Down (it still showed Active), but it corrected itself again after I did the same thing: edit the lagg interface and click OK to force a network restart.

There was also a message that came across on the terminal after this: Failed to generate navtree for app freenasUI.sharing: 'NoneType' object has no attribute 'lower'. It then proceeded to duplicate all the items in the left menu many times over, and the web UI stopped responding. I closed that tab and reloaded the page in a new tab, though, and it was OK again.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
It sounds like your networking hardware doesn't do LACP/LAGG properly. I don't know if this means your NICs, the drivers for the NICs, your switch settings, etc.

Sorry, but that's all I can really provide. Your symptoms are classic for LACP/LAGG not being set up properly or for a hardware problem.
 

steven6282

Dabbler
Joined
Jul 22, 2014
Messages
39
Thanks for the response; however, that unfortunately is definitely not correct.

I've been running LACP for a long time, and currently have it running correctly on 4 other devices. Everything in my network that matters for it fully supports it. Most of my networking equipment is enterprise-level stuff.

It even works fine on this FreeNAS box until I start something else that tries to do something with the network. I can monitor the ports on the switch and see that the LACP configuration is indeed working and balancing the load as it should be. And also just to verify, I was able to copy to 3 different PCs containing 1Gb NICs simultaneously while sustaining > 100 MB/s speeds on each one. That would be impossible without Link Aggregation working.
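The arithmetic behind that last claim, as a quick sketch. The only assumption beyond the figures above is the usual conversion of 1 Gb/s to 125 MB/s (8 bits per byte, ignoring protocol overhead).

```python
import math

# A 1 Gb/s link carries at most 1000 / 8 = 125 MB/s before protocol overhead.
LINK_MBIT_PER_S = 1000
link_ceiling_mb_s = LINK_MBIT_PER_S / 8

# Three simultaneous copies, each sustaining at least 100 MB/s (figures from above).
streams = 3
per_stream_mb_s = 100
aggregate_mb_s = streams * per_stream_mb_s

# Minimum number of 1 Gb/s links needed to carry that aggregate.
links_needed = math.ceil(aggregate_mb_s / link_ceiling_mb_s)

print(f"single-link ceiling: {link_ceiling_mb_s} MB/s")
print(f"aggregate observed:  {aggregate_mb_s} MB/s")
print(f"links needed:        {links_needed}")
```

Because LACP assigns each flow to one member link, a single client still tops out near one link's worth of bandwidth, which matches the ~115 MB/s I see from my desktop alone.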
 

steven6282

Dabbler
Joined
Jul 22, 2014
Messages
39
Oh, sorry, I missed replying to the other part of what you said last night, as I was tired and heading to bed at the time lol:

As for the drivers for the NICs, well those are in FreeBSD / FreeNAS, so if it's a problem with them then it is something that should probably be corrected in the release at some point. If it were a driver issue I could look into compiling my own if I could find some newer versions for it (but I've never worked with drivers in FreeBSD so not sure if this is something that is even possible with what I have available, would have to look into it). This could be a likely possibility since under any other situation other than activating something that does something with the network configuration, everything works as it should, and I know the rest of my related network hardware fully supports LACP.

I got an email this morning about someone else posting, but I don't see the post now, so I'm not sure if they decided to delete it or something. In response to one thing they said: no, I don't really need 4 NIC link aggregation for my use case. I would most likely be fine with a single NIC for most situations and dual NIC aggregation for the few cases where I am doing enough simultaneous transfers to use it. But it's one of those cases where I have the hardware, so why not use it =P
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Not sure who would have replied. I don't see any deleted posts so it could only have been a forum admin as they can delete threads and posts permanently.
 

steven6282

Dabbler
Joined
Jul 22, 2014
Messages
39
Not sure, the email says it was a user named Scareh. Based on their profile, I don't believe they are an admin.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Scareh is not.
 

steven6282

Dabbler
Joined
Jul 22, 2014
Messages
39
I was just looking into it a little, and it might be quicker to ask: what is the easiest method for determining whether it is truly a driver issue? Having not worked with FreeBSD much directly, I'm just not experienced enough with it.

I did do another quick test, though: I loaded up Ubuntu Server with the ifenslave package, set up an 802.3ad mode bond there, and everything seemed to be working correctly. I haven't extensively tested it yet.
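On the Linux side, the bonding driver exposes its state under /proc/net/bonding/. Below is a minimal sketch of reading that report; the sample text is an abbreviated illustration, and the bond and interface names (`bond0`, `eth0`, `eth1`) are assumptions rather than output from my test install. In 802.3ad mode, member ports that negotiated with the same partner should share one Aggregator ID.

```python
# Abbreviated, illustrative sample in the shape of /proc/net/bonding/bond0.
# Interface names and IDs are assumptions, not captured output.
SAMPLE = """\
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
MII Status: up

Slave Interface: eth0
MII Status: up
Aggregator ID: 1

Slave Interface: eth1
MII Status: up
Aggregator ID: 2
"""


def bond_summary(text):
    """Return (bonding mode, {slave name: {field: value}}) from the report text."""
    mode = None
    slaves = {}
    current = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or a line without a "key: value" shape
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            mode = value
        elif key == "Slave Interface":
            current = value
            slaves[current] = {}
        elif current is not None and key in ("MII Status", "Aggregator ID"):
            slaves[current][key] = value
    return mode, slaves


def aggregator_ids(slaves):
    """Return the set of distinct Aggregator IDs across all member ports."""
    return {info.get("Aggregator ID") for info in slaves.values()}


mode, slaves = bond_summary(SAMPLE)
print(mode)
print(slaves)
print("single aggregator:", len(aggregator_ids(slaves)) == 1)
```

To try it against a live system, the text can be read with `open("/proc/net/bonding/bond0").read()`, assuming the bond is named `bond0`.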
 

Scareh

Contributor
Joined
Jul 31, 2012
Messages
182
Well, that's awkward :/
I hit a quadruple post yesterday in this topic, with some "unexpected error" when I was posting a reply... And now it looks like my post disappeared xD

So yeah, I asked if the OP needed the 4 NICs; if not, I'd advise using no LACP on both sides (switch + FreeNAS) to see if the problem persists. Just to rule out anything hardware related.
Other than that, I asked to see the command/output the OP used to determine that LACP was working correctly on the switch side, and which switch he was using.
 

steven6282

Dabbler
Joined
Jul 22, 2014
Messages
39
As I said before, no, I don't really need 4 NICs; it's just a matter of having the hardware, so why not use it :)

Honestly, it's been a while since I've mucked around in the CLI for my switch because there normally isn't a need, and apparently the recent firmware update I did on it changed a lot of the command structures, so I'm going to have to look up documentation on the new commands.

However, the basic way I determined it was working is that once I added the ports to the LACP config and started up FreeNAS, the switch automatically detected the group and created a LAG for those 4 ports in the LAG table. Usually, if the device doesn't support LACP or there is a configuration error somewhere, it won't auto-detect and create the LAG.

I did try a couple of things tonight, and I'll see if they make a difference or not. I updated my switch to a firmware that was released in June, and as part of the firmware update rebooted the switch for the first time in 462 days. Maybe if I'm lucky that will solve the problems lol. If not, I'll figure out the new CLI command structure and post some more detailed information from it.
 