Experiencing regular drops in write performance on CIFS and NFS Shares

Status
Not open for further replies.

vanoert

Cadet
Joined
Sep 10, 2016
Messages
4
Dear FreeNAS community,

Although this is my first post, I have been reading these forums for some weeks and have already found some great tips that I couldn't have done without. Thank you for this.

I sincerely apologize if this thread is posted to the wrong forum and would be glad if you could move it to the correct one.

Unfortunately I have run into a puzzling problem with my setup and, after days of googling and the like, have not found a solution or even anyone with a similar problem.

I think it is best to begin by describing the environment and use case for this setup. The NAS has replaced an ill-conceived Windows server for SMB/CIFS purposes in our office. Our business is manufacturing large-scale print works, and we mainly use the CIFS share as a central drop-off for customer print files after they have been checked for production readiness. There is an offsite backup (AWS cloud) for everything, so this is not a high-risk environment; the NAS is just more convenient to work with. We do not have production on site, so everything is moved to the AWS cloud anyway. On site we have around 5 people pushing and pulling files with SSD-equipped workstations, so we need fast read and write speeds, but only for single large files and not even continuously. We need this NAS to sustain around 80 MB/s for both writes and reads.

The hardware is mostly what was on hand, but should be reasonably well suited to our needs.

Hardware/Configuration
HP Microserver Gen8 (Intel Celeron G1610T, 2x 2.3 GHz)
12 GB of DDR3 ECC
Onboard Storage controller (disabled, just using the ports)
4x 3 TB SATA HDDs (I don't have the exact models at hand, but should be 2x WD Green and 2x Toshiba)
RAID-Z2 (around 3.5 TB used)
Network: 1000Mbit all the way

Problem:
When pushing traffic over CIFS, after reaching about 20GB in transfer (large files) at around 80-90MB/s, write speed drops really low, around 1-1.3MB/s.

Observations:

- when pinging the NAS, response times go up from 1 ms (at the regular transfer speed of 80 MB/s) into the low three digits (at 1 MB/s)
- CPU usage is relatively low, even at regular speed
- the 8 GB of automatically configured swap space has never seen use (maybe there is something wrong with the config?)
- Another, possibly unrelated problem: the NAS is connected with both GbE ports and uses one for TX and one for RX. I can't pull either cable, because then it won't respond to ping on any IP. Maybe I misconfigured something?

Solutions found so far:
a) restarting NAS
b) restarting CIFS
c) entering CIFS config, changing nothing and pressing ok (which probably restarts CIFS anyway)
d) pulling one of the network cables and plugging it back in
e) (my favorite): after transferring at 1MB/s for around 10 minutes, the problem fixes itself
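
To put the numbers above in perspective, here is a rough back-of-the-envelope calculation. It is only a sketch: it assumes decimal units (1 GB = 1000 MB), takes 85 MB/s as the midpoint of the reported 80-90 MB/s, and uses 1 MB/s for the stalled phase. The function names are made up for illustration.

```python
# Back-of-the-envelope numbers for the stall described in this post.
# Assumption: decimal units (1 GB = 1000 MB); the post does not say which.

def seconds_to_transfer(size_mb: float, rate_mb_s: float) -> float:
    """Time needed to move size_mb at a constant rate."""
    return size_mb / rate_mb_s


def mb_moved(duration_s: float, rate_mb_s: float) -> float:
    """Amount of data moved in duration_s at a constant rate."""
    return duration_s * rate_mb_s


FAST_RATE = 85.0      # MB/s, midpoint of the reported 80-90 MB/s
SLOW_RATE = 1.0       # MB/s, lower end of the reported stall speed
TRIGGER_MB = 20_000   # the stall appears after roughly 20 GB
STALL_S = 10 * 60     # the stall clears after roughly 10 minutes

time_to_trigger = seconds_to_transfer(TRIGGER_MB, FAST_RATE)
stalled_mb = mb_moved(STALL_S, SLOW_RATE)
missed_mb = mb_moved(STALL_S, FAST_RATE)

print(f"time until the stall: {time_to_trigger / 60:.1f} min")
print(f"data moved during the stall: {stalled_mb:.0f} MB")
print(f"data that would have moved at full speed: {missed_mb / 1000:.0f} GB")
```

In other words, the stall shows up after roughly four minutes of sustained writing, and during the ten slow minutes only about 600 MB gets through instead of around 51 GB.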

Things tested:

- other clients
- disabling autotune
- changing everything on the network link except the LAN adapter (because the low speed could have indicated autonegotiation down to 10 Mbit)
- adding a separate ZIL device (SLOG) on an SSD (I know, there are reasons not to, but the SSD was unused and I wanted to see how it would affect things)
- Setting up NFS shares and clients to see if the problem persists (it does)
- various things I surely forgot

Does anybody have an idea? Thank you!
 

pirateghost

Unintelligible Geek
Joined
Feb 29, 2012
Messages
4,219
Let's start with the obvious here. This isn't a bug, but a misconfigured server.

The first thing that pops out at me is your bad network interface configuration on the server.

You are doing more harm than good running the dual NICs, as they aren't actually doing what you think they're doing. You don't have a "one TX, one RX" setup. If they're both on the same subnet, get rid of one of them. If you're using LACP LAGG, make sure your switch actually supports it. If you don't have many clients, LACP isn't actually doing you any good.

Start with testing your network with iperf and verify that your network isn't an issue.
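
The idea behind such a test can be sketched in a few lines of Python: push a known amount of data over a plain TCP connection, bypassing the disks and Samba entirely, and time it. This is not iperf and not a replacement for it. The function names (`run_sink`, `measure_throughput`, `loopback_demo`) are hypothetical, and the demo runs over loopback only so that it is self-contained.

```python
import socket
import threading
import time

CHUNK = 64 * 1024  # bytes per send/recv call


def run_sink(server_sock: socket.socket) -> None:
    """Accept one connection and discard everything it sends."""
    conn, _addr = server_sock.accept()
    with conn:
        while conn.recv(CHUNK):
            pass  # throw the data away; only the transfer rate matters


def measure_throughput(host: str, port: int, total_bytes: int) -> float:
    """Send at least total_bytes to host:port and return the rate in MB/s."""
    payload = b"\0" * CHUNK
    sent = 0
    start = time.perf_counter()
    with socket.create_connection((host, port)) as sock:
        while sent < total_bytes:
            sock.sendall(payload)
            sent += len(payload)
        sock.shutdown(socket.SHUT_WR)  # signal end of data to the sink
        sock.recv(1)                   # returns once the sink has closed its side
    elapsed = time.perf_counter() - start
    return sent / elapsed / 1e6


def loopback_demo(total_bytes: int = 50 * 1024 * 1024) -> float:
    """Run sink and sender on 127.0.0.1 (exercises the code, not the network)."""
    with socket.create_server(("127.0.0.1", 0)) as server:
        port = server.getsockname()[1]
        sink = threading.Thread(target=run_sink, args=(server,), daemon=True)
        sink.start()
        rate = measure_throughput("127.0.0.1", port, total_bytes)
        sink.join()
    return rate


if __name__ == "__main__":
    print(f"loopback: {loopback_demo():.0f} MB/s")
```

On real hardware the sink would listen on the NAS and `measure_throughput` would be called from a workstation with the NAS address; that wiring is left out here. If the raw TCP rate also collapses during a stall, the problem is in the network path and not in the pool or the share.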
 

vanoert

Cadet
Joined
Sep 10, 2016
Messages
4
Thanks for the input.

pirateghost: There was no plan to use link aggregation, just failover. Ethernet is not configured in any non-default way; one NIC doing RX and one doing TX is just my observation. I actually tried to pull one of the connections today, but the system wouldn't respond to pings on the other one after that, even after a reboot. Since the system is headless and I didn't want to bring a display to the server room to fix it, I left both connections plugged in. I will fix that tomorrow and report back.

brando: is 12GB really a problem for a small workload like this?
 

Mirfster

Doesn't know what he's talking about
Joined
Oct 2, 2015
Messages
3,215
vanoert said: "When pushing traffic over CIFS, after reaching about 20GB in transfer (large files) at around 80-90MB/s, write speed drops really low, around 1-1.3MB/s."
vanoert said: "12 GB of DDR3 ECC"
Well, the ARC is pretty much swamped at this stage, I would think. Considering that FreeNAS' minimum requirement is 8GB of RAM, and let's say that 4GB is "in use" just for the OS alone, that leaves 8GB.

CIFS (Samba) is single-threaded per client connection, so there may also be sluggishness with the CPU. Have you been actively monitoring ARC and CPU during these transfers? If so, what is the "Hit Ratio"?

For the record, a single RAID-Z2 vdev delivers roughly the IOPS of one disk. More vdevs in a volume/pool provide more IOPS.

Lastly on your CIFS Share are you using "atime" or have you done any settings to improve it?
 

pirateghost

Unintelligible Geek
Joined
Feb 29, 2012
Messages
4,219
vanoert said: "Thanks for the input.

"pirateghost: There was no plan for using link aggregation, just for failover. Ethernet is not configured in any way that is not default, one NIC doing RX and one doing TX is just my observation. I actually tried to pull one of the connections today, but the system wouldn't respond to pings on the other one after that, even after a reboot. Since the system is headless and I didn't want to bring a display to the server room to fix it, I left both connections on. I will fix that tomorrow and report back.

"brando: is 12GB really a problem for a small workload like this?"
Having 2 NICs configured on the same subnet doesn't provide you with ANY benefits. You don't have any failover capabilities at all. You would need to use LAGG in failover mode to get failover capabilities.
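
Whether two interfaces really sit on the same subnet can be checked mechanically. The following is a minimal sketch using Python's standard `ipaddress` module. The interface names and addresses are hypothetical (the thread never states the real ones), and `same_subnet_pairs` is a made-up helper, not part of FreeNAS.

```python
import ipaddress
from itertools import combinations
from typing import Dict, List, Tuple


def same_subnet_pairs(interfaces: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return pairs of interface names whose configured networks overlap."""
    # ip_interface("192.168.1.10/24").network yields 192.168.1.0/24
    networks = {
        name: ipaddress.ip_interface(cidr).network
        for name, cidr in interfaces.items()
    }
    return [
        (a, b)
        for a, b in combinations(networks, 2)
        if networks[a].overlaps(networks[b])
    ]


# Hypothetical example: two NICs left on defaults in the same LAN.
nics = {"bge0": "192.168.1.10/24", "bge1": "192.168.1.11/24"}

for first, second in same_subnet_pairs(nics):
    print(f"{first} and {second} are on the same subnet")
```

Any pair reported here is a candidate for either removing one interface or combining both into a LAGG, as described above.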
 

brando56894

Wizard
Joined
Feb 15, 2014
Messages
1,537
I wouldn't say it's low, but I've always read that 16 GB is a good baseline. You want room for an ARC to grow but you also need RAM available for FreeNAS itself and for ZFS.

 

vanoert

Cadet
Joined
Sep 10, 2016
Messages
4
Mirfster said: "Well the ARC is pretty much swamped at this stage I would think. Considering that FreeNAS' minimum requirement is 8GB of RAM and let's say that 4GB is 'in-use' just for the OS alone, that leaves 8GB."
That's fair. As mentioned, I'm quite inexperienced with this and just trying to make this work. Unfortunately the board is limited to 2 slots of at most 8 GB each, but if 4 GB more will solve the problem, I'll buy another 8 GB stick.
Mirfster said: "CIFS is a Single-Threaded Protocol, so there may also be sluggishness with the CPU. Have you been actively monitoring ARC and CPU during these transfers? If so, what is the 'Hit Ratio'?

"For the record a single RaidZ2 vDev operates at basically the speed of one disk. More vDevs in a Volume/Pool provides more IOPs."
The CPU may very well be a bottleneck. Looking at the graphs it most probably is, since at full write speed 50% load is reached (dual-core), but I can work with this if it is consistent.
As for monitoring, of course I have, but I'm still a bit new to this. I'll attach some screenshots from tests I did today. You will see the issue occurring shortly after 14:10 and fixing itself without further input at around 14:20. I did some reboots for testing purposes afterwards. During the 1 MB/s period, disk activity is also very low, which leads me to believe this might not be an ARC issue?
Mirfster said: "Lastly on your CIFS Share are you using 'atime' or have you done any settings to improve it?"
I have, mostly because my dear colleagues tend to run Windows Explorer searches in CIFS shares containing 15k+ folders, and the initial config was very slow at that. After some googling, this is the config now:
Code:
veto files = /Thumbs.db/Temporary Items/.DS_Store/.AppleDB/.TemporaryItems/.AppleDouble/.bin/.AppleDesktop/Network Trash Folder/.Spotlight/.Trashes/.fseventd/
delete veto files = yes
hide dot files = yes
ea support = no
store dos attributes = no
map archive = no
map hidden = no
map readonly = no
map system = no

About atime: sorry, I don't know what that is.
pirateghost said: "Having 2 NICs configured on the same subnet doesn't provide you with ANY benefits. You don't have any failover capabilities at all. You would need to use LAGG in failover mode to get failover capabilities."
Fair enough. I'll get a monitor over there tomorrow and fix the LAN setup. Thank you!
 

Attachments

  • 2016-09-10 21_06_17-nas - FreeNAS-9.10.1 (d989edd).png (30.8 KB)
  • 2016-09-10 21_06_49-nas - FreeNAS-9.10.1 (d989edd).png (32.2 KB)
  • 2016-09-10 21_07_16-nas - FreeNAS-9.10.1 (d989edd).png (22.6 KB)
  • 2016-09-10 21_07_49-nas - FreeNAS-9.10.1 (d989edd).png (53.3 KB)

vanoert

Cadet
Joined
Sep 10, 2016
Messages
4
So... I fixed the LAN connection today, and after some tests this seems to have resolved the issue. Writes are now sustained at around 95 MB/s with no drops, and reads saturate the GbE link.

Thank you for pointing me in the right direction!
 