Low NFS write throughput, desperate for help

Status
Not open for further replies.

eraser

Contributor
Joined
Jan 4, 2013
Messages
147
I mean.. how hard is it to read this message from the manual and just agree with it...

...Instead of mixing ZFS RAID with hardware RAID, it is recommended that you place your hardware RAID controller in JBOD mode and let ZFS handle the RAID...

That's copy/paste straight from the manual. How hard is this?

I thought that's effectively what I suggested? (present each physical disk as a separate LUN to FreeNAS and do any RAID configuration on the FreeNAS server itself.)
 

eraser

Contributor
Joined
Jan 4, 2013
Messages
147
...Because frankly, I'm convinced that either I'm being trolled, people are completely and utterly incompetent, or somehow that statement isn't as clear cut as it sounds like it is to me.

No you are not being trolled (at least not by me). I am simply trying to help troubleshoot an interesting (to me) issue with NFS write performance being less than expected.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
No you are not being trolled (at least not by me). I am simply trying to help troubleshoot an interesting (to me) issue with NFS write performance being less than expected.

Yeah, but you're basically making an assumption that somehow the problem is in the storage system.

I've suggested more detailed subsystem testing no less than three times in this thread. EonStor is pretty crappy but it isn't as crappy as what's being reported. You'll never get great performance with it but it shouldn't be total suck. You could be here all day long suggesting trying this or that while not finding the actual problem. Testing complex systems is nasty that way.

You need to start doing subsystem testing and validating that the problem isn't something else. The iperf results are very suspicious. DRILL DOWN AND TEST/DEBUG THE FSCKING NETWORK FURTHER. Like, until you're getting consistent 900Mbps++.
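
Something along these lines would separate the network from the disks; the hostnames and pool mount point here are just placeholders:
Code:
 # network only, both directions
 iperf -s                       # on the FreeNAS box
 iperf -c freenas -i 10 -t 60   # from the client
 iperf -c client -i 10 -t 60    # reverse direction, run from the FreeNAS box

 # disk only, no network involved; run locally on FreeNAS
 # (zeros compress, so this is only meaningful with compression off)
 dd if=/dev/zero of=/mnt/tank/testfile bs=2048k count=10k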
 

eraser

Contributor
Joined
Jan 4, 2013
Messages
147
Yeah, but you're basically making an assumption that somehow the problem is in the storage system.

Oops, I never meant to imply that. Last I heard we were waiting for

1) mosquitou to re-test NFS performance with an Ubuntu client instead of a CentOS client.
2) mosquitou to send us a copy of his dmesg output from his FreeNAS box.

While waiting for the above I spent some time reading up on what EonStor systems are and found out about their NAS offering. But that was just while waiting for the above. Sorry if I wasn't clear. Admittedly I was drawn into replying to cyberjock's rants about RAID for a few posts while waiting, but I never meant to imply that the SAN needs reconfiguring at this point in the testing cycle.
 

mosquitou

Dabbler
Joined
Mar 4, 2014
Messages
11
Sorry for my late post; I've had quite a tight schedule on other tasks. Yes, jgreco, I have kept your comment in mind since you first mentioned it (it's just that I had no access to the office during the weekend). More iperf test results are as follows:
(1) Test with iperf
From CentOS-VM to FreeNAS:
CentOS-VM client:
[root@localhost ORCHID]# iperf -c 10.77.24.16 -i 10 -t 60
------------------------------------------------------------
Client connecting to 10.77.24.16, TCP port 5001
TCP window size: 23.2 KByte (default)
------------------------------------------------------------
[ 3] local 10.77.24.17 port 54561 connected with 10.77.24.16 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 1.10 GBytes 942 Mbits/sec
[ 3] 10.0-20.0 sec 1.05 GBytes 899 Mbits/sec
[ 3] 20.0-30.0 sec 1011 MBytes 848 Mbits/sec
[ 3] 30.0-40.0 sec 1.05 GBytes 905 Mbits/sec
[ 3] 40.0-50.0 sec 1.04 GBytes 895 Mbits/sec
[ 3] 50.0-60.0 sec 1018 MBytes 854 Mbits/sec
[ 3] 0.0-60.0 sec 6.22 GBytes 890 Mbits/sec

FreeNAS server:
[root@freenas ~]# iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 64.0 KByte (default)
------------------------------------------------------------
[ 7] local 10.77.24.16 port 5001 connected with 10.77.24.17 port 54561
[ ID] Interval Transfer Bandwidth
[ 7] 0.0-60.0 sec 6.22 GBytes 890 Mbits/sec

From FreeNAS to CentOS-VM:
FreeNAS client:
[root@freenas ~]# iperf -c 10.77.24.17 -i 10 -t 60
------------------------------------------------------------
Client connecting to 10.77.24.17, TCP port 5001
TCP window size: 32.5 KByte (default)
------------------------------------------------------------
[ 6] local 10.77.24.16 port 19453 connected with 10.77.24.17 port 5001
[ ID] Interval Transfer Bandwidth
[ 6] 0.0-10.0 sec 1015 MBytes 852 Mbits/sec
[ 6] 10.0-20.0 sec 1004 MBytes 842 Mbits/sec
[ 6] 20.0-30.0 sec 1007 MBytes 844 Mbits/sec
[ 6] 30.0-40.0 sec 994 MBytes 833 Mbits/sec
[ 6] 40.0-50.0 sec 1.00 GBytes 861 Mbits/sec
[ 6] 50.0-60.0 sec 1.06 GBytes 906 Mbits/sec
[ 6] 0.0-60.0 sec 5.98 GBytes 856 Mbits/sec

CentOS-VM server:
[root@localhost ORCHID]# iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[ 4] local 10.77.24.17 port 5001 connected with 10.77.24.16 port 19453
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-60.0 sec 5.98 GBytes 856 Mbits/sec

(2) Another test with iperf
Another test from CentOS-VM to the FreeNAS server, this time sending a bigfile created with dd from /dev/zero:
From CentOS-VM client:
[root@localhost ~]# iperf -c 10.77.24.16 -f m -M -m -n 1000000000 -F bigfile
------------------------------------------------------------
Client connecting to 10.77.24.16, TCP port 5001
TCP window size: 0.02 MByte (default)
------------------------------------------------------------
[ 4] local 10.77.24.17 port 54562 connected with 10.77.24.16 port 5001
[ ID] Interval Transfer Bandwidth
[ 4] 0.0- 8.6 sec 954 MBytes 934 Mbits/sec

From FreeNAS server:
[root@freenas ~]# iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 64.0 KByte (default)
------------------------------------------------------------
[ 7] local 10.77.24.16 port 5001 connected with 10.77.24.17 port 54562
[ ID] Interval Transfer Bandwidth
[ 7] 0.0- 8.6 sec 954 MBytes 931 Mbits/sec

(3) netcat
From CentOS-VM client:
[root@localhost ~]# dd if=/dev/zero bs=128k count=10k | nc -v 10.77.24.16 2222
Connection to 10.77.24.16 2222 port [tcp/EtherNet/IP-1] succeeded!
10240+0 records in
10240+0 records out
1342177280 bytes (1.3 GB) copied, 13.6917 s, 98.0 MB/s

From FreeNAS server:
[root@freenas ~]# nc -v -l 2222 > /dev/null
Connection from 10.77.24.17 50821 received!

So yes, I admit it's not consistently 900Mbps++, but I'm not that far off anyway...
 

mosquitou

Dabbler
Joined
Mar 4, 2014
Messages
11
Oops, I never meant to imply that. Last I heard we were waiting for

1) mosquitou to re-test NFS performance with an Ubuntu client instead of a CentOS client.
2) mosquitou to send us a copy of his dmesg output from his FreeNAS box.

While waiting for the above I spent some time reading up on what EonStor systems are and found out about their NAS offering. But that was just while waiting for the above. Sorry if I wasn't clear. Admittedly I was drawn into replying to cyberjock's rants about RAID for a few posts while waiting, but I never meant to imply that the SAN needs reconfiguring at this point in the testing cycle.


Yes, eraser, thanks for the reminder. The dmesg.today output is attached.

I'm planning to run two more tests:

(1) Use an Ubuntu client, as eraser recommended.
(2) Get rid of the RAID controller and use the JBOD directly, as cyberjock recommended.
 

Attachments

  • dmesg.today.txt
    27.7 KB · Views: 408

mosquitou

Dabbler
Joined
Mar 4, 2014
Messages
11
More results; no time to prepare them nicely, so just a quick debrief here:

(1) Restored zfs set sync=standard on all zpools; write speed is still 20MB/s (from both the VM and the bare-metal client).

(2) Got rid of the RAID controller; the setup is now an HP DL380 G8 (FreeNAS) sitting on top of a JBOD (EonStor J2000R). Write speed is still 20MB/s. However, running this from the bare-metal CentOS client:
Code:
 dd if=/dev/zero bs=2048k count=10k | nc -v 192.168.1.100 2222 
and correspondingly on the server
Code:
 nc -v -l 2222 > /mnt/ZFS-Stripe/testfile 
it gives 100+MB/s write speed. Context: ZFS-Stripe is on 1 disk, with no atime, no dedup, no compression.
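
For reference, a purely local write to the same dataset, taking the network out of the picture entirely, would look roughly like this (testfile2 is just an illustrative name):
Code:
 # run directly on the FreeNAS box; measures pool write speed only
 dd if=/dev/zero of=/mnt/ZFS-Stripe/testfile2 bs=2048k count=10k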
 

mosquitou

Dabbler
Joined
Mar 4, 2014
Messages
11
(3) Instead of NFS, I also tried iSCSI over ZFS, CIFS over ZFS with a Windows client, and NFS over UFS; all test results were no greater than 20MB/s.
 

eraser

Contributor
Joined
Jan 4, 2013
Messages
147
You may get better/higher iperf numbers if you add "-w 128K" (or "-w 256K") to your iperf command line.
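
For example (reusing the addresses from your earlier tests, so purely illustrative):
Code:
 iperf -c 10.77.24.16 -w 256K -i 10 -t 60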

Can you include the output of running 'ifconfig' and 'netstat -i' on your FreeNAS system?

So something is up with your network stack somewhere. I assume that NFS/CIFS reads from your FreeNAS server are still fast and writes to it are slow. What is the difference between the slow protocols (NFS, iSCSI, CIFS) and the fast ones (nc, iperf)? Hmm...

Looks like driver support for the BCM5719 network card is fairly new in FreeBSD (it looks like there were some bugs in the bge driver that were first fixed in FreeBSD 9.2). Are there any interesting messages in your syslog -- specifically anything to do with the "bge" driver? Do you have an Intel network card you could temporarily install in your FreeNAS server to see if it makes a difference? Alternatively, you could play around with disabling any offloading features on your bge interfaces using the "ifconfig" command to see if that makes a difference.
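
Something like the following, assuming the interface is bge0 (adjust to whatever "ifconfig" shows; these changes do not persist across reboots):
Code:
 ifconfig bge0 -txcsum -rxcsum -tso -lro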

Your original diagram also shows that you have set up a lagg group of two interfaces. Perhaps the LAG load-balancing algorithm is buggy on either the FreeNAS (bge driver) side or the switch side. Can you temporarily get rid of the lagg group to rule that out as the problem? (Although I see that your "bare metal CentOS" server is not connected over a lagg, so that should already rule it out.)
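
From the console that would be roughly the following, assuming the group is named lagg0 (the FreeNAS network GUI is the cleaner way to do it, and console changes don't persist across reboots):
Code:
 ifconfig lagg0 destroy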
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Yeah, all I'm really getting is this itchy "network is goofed somehow" feeling. eraser's suggestions seem sensible.
 

mosquitou

Dabbler
Joined
Mar 4, 2014
Messages
11
Hi guys, I managed to locate the problem. What I did was upgrade FreeNAS to the latest version and remove Link Aggregation. I suspect the latter is what fixed my problem: even though Link Aggregation itself worked fine in the iperf and nc tests, it seems to introduce strange behavior in FreeNAS...

Now with zfs sync=disabled I get 110MB/s write throughput and with zfs sync=standard I get 28MB/s.
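
For reference, the sync setting is toggled with the usual zfs property commands; ZFS-Stripe here is just the pool from my earlier test:
Code:
 zfs set sync=disabled ZFS-Stripe
 zfs set sync=standard ZFS-Stripe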

As a matter of fact, eraser and jgreco did point out the network issue, and even link aggregation specifically; I just realized it too late...

Thanks all for your effort, it was nice exchanging with you.
 

eraser

Contributor
Joined
Jan 4, 2013
Messages
147
That is great that you were able to figure things out! Thank you for sharing the solution.
 

aufalien

Patron
Joined
Jul 25, 2013
Messages
374
If you don't mind me suggesting, I'd look into what's wrong with your ZIL when sync=standard yields that much of a difference in performance.

UPDATE: I reread your post; you have no ZIL, correct?

I'd add one and re-enable sync; it's very important to data integrity. Apologies if it was mentioned already.
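
A rough sketch of what I mean, with the log device name as a placeholder (it should be a fast SSD):
Code:
 zpool add ZFS-Stripe log da10
 zfs set sync=standard ZFS-Stripe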
 