HeloJunkie
Patron
Joined: Oct 15, 2014
Messages: 300
System A:
Supermicro X10SRH-CLN4F Motherboard
1 x Intel Xeon E5-2640 V3 8 Core 2.66GHz
4 x 16GB PC4-17000 DDR4 2133MHz Registered ECC
12 x 4TB HGST HDN724040AL 7200RPM NAS SATA Hard Drives
2 x 6 Drive RAIDZ2 VDEVs
LSI3008 SAS Controller - Flashed to IT Mode (Firmware Version 12.00.02.00)
LSI SAS3x28 SAS Expander
LSI9211-8i SAS Controller - Flashed to IT Mode (Firmware Version 20.00.02.00)
(connects to external JBOD enclosure)
Dual 920 Watt Platinum Power Supplies
16GB USB Thumb Drive for booting
Chelsio T580-SO-CR Dual 40GbE NIC (replication connection to backup FreeNAS server)
Chelsio T520-SO-CR Dual 10GbE NIC (data connection to Plex server & media management server)
FreeNAS-11.0-U1 (aa82cc58d)
System B:
EMC Isilon Supermicro X8DT6-A Motherboard
2 x Intel Xeon E5603 4 Core Processors
96GB (12 x 8GB) DDR3 PC3-10600 (1333MHz) Registered ECC Memory
2 x SanDisk 8GB SATADOM Boot Drives (Mirrored)
LSI SAS3081E-R w/Expander
36 x 3TB 7200RPM Hitachi HDS72303 Hard Drives
4 x 9 Drive RAIDZ2 VDEVs
Dual 1200 Watt Gold Power Supplies
Chelsio T580-SO-CR Dual 40GbE NIC (replication connection to primary FreeNAS server)
Chelsio T520-SO-CR Dual 10GbE NIC (data connection to Plex server & media management server)
APC Smart-UPS RT 3000
FreeNAS-11.0-U1 (aa82cc58d)
I recently upgraded both of my FreeNAS servers from Corral to 11.0-U1. My network uses 10GbE and 40GbE connections via Chelsio T5-series cards, all in x8 or x16 slots on the motherboards listed above. My goal is rapid replication of the backup data back to my primary machine (System A) in the event of a failure there; we are talking about ~40TB of live data. Both systems are in the same rack, connected via twinax with no switches in the mix on the backend.
After the upgrade to 11, I wanted to retest my network and make sure everything was still working as it should.
Upon deployment, I ran the following tests:
1) System B (Backup System) - Local copy of data using dd to a new, uncompressed dataset. This exercised both reads and writes to verify that the system's configuration would support the overall read and write speeds I wanted.
Code:
root@plexnasii:/mnt/vol1/test # zfs get all vol1/test
NAME       PROPERTY                      VALUE                  SOURCE
vol1/test  type                          filesystem             -
vol1/test  creation                      Sun Jul 16 11:44 2017  -
vol1/test  used                          637G                   -
vol1/test  available                     71.6T                  -
vol1/test  referenced                    637G                   -
vol1/test  compressratio                 1.00x                  -
vol1/test  mounted                       yes                    -
vol1/test  quota                         none                   default
vol1/test  reservation                   none                   default
vol1/test  recordsize                    128K                   default
vol1/test  mountpoint                    /mnt/vol1/test         default
vol1/test  sharenfs                      off                    default
vol1/test  checksum                      on                     default
vol1/test  compression                   off                    local
vol1/test  atime                         on                     default
vol1/test  devices                       on                     default
vol1/test  exec                          on                     default
vol1/test  setuid                        on                     default
vol1/test  readonly                      off                    default
vol1/test  jailed                        off                    default
vol1/test  snapdir                       hidden                 default
vol1/test  aclmode                       passthrough            inherited from vol1
vol1/test  aclinherit                    passthrough            inherited from vol1
vol1/test  canmount                      on                     default
vol1/test  xattr                         off                    temporary
vol1/test  copies                        1                      default
vol1/test  version                       5                      -
vol1/test  utf8only                      off                    -
vol1/test  normalization                 none                   -
vol1/test  casesensitivity               sensitive              -
vol1/test  vscan                         off                    default
vol1/test  nbmand                        off                    default
vol1/test  sharesmb                      off                    default
vol1/test  refquota                      none                   default
vol1/test  refreservation                none                   default
vol1/test  primarycache                  all                    default
vol1/test  secondarycache                all                    default
vol1/test  usedbysnapshots               0                      -
vol1/test  usedbydataset                 637G                   -
vol1/test  usedbychildren                0                      -
vol1/test  usedbyrefreservation          0                      -
vol1/test  logbias                       latency                default
vol1/test  dedup                         off                    default
vol1/test  mlslabel                                             -
vol1/test  sync                          standard               default
vol1/test  refcompressratio              1.00x                  -
vol1/test  written                       637G                   -
vol1/test  logicalused                   637G                   -
vol1/test  logicalreferenced             637G                   -
vol1/test  volmode                       default                default
vol1/test  filesystem_limit              none                   default
vol1/test  snapshot_limit                none                   default
vol1/test  filesystem_count              none                   default
vol1/test  snapshot_count                none                   default
vol1/test  redundant_metadata            all                    default
vol1/test  org.freenas:description                              local
vol1/test  org.freenas:permissions_type  PERM                   inherited from vol1
Code:
[PLEXNAS-II LOCAL WRITE TEST]
root@plexnasii:/mnt/vol1/test # dd if=/dev/zero of=testfile bs=10M count=50000
50000+0 records in
50000+0 records out
524288000000 bytes transferred in 558.926291 secs (938027086 bytes/sec)
7.5Gbits/sec

[PLEXNAS-II LOCAL READ TEST]
root@plexnasii:/mnt/vol1/test # dd if=testfile of=/dev/null bs=10M count=50000
50000+0 records in
50000+0 records out
524288000000 bytes transferred in 506.344610 secs (1035437111 bytes/sec)
8.28Gbits/sec
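For anyone checking my math: dd reports bytes/sec, and the Gbits/sec figures above are just that number times 8, divided by 1e9 (decimal units). A quick helper for the conversion:

```shell
# Convert dd's reported bytes/sec to Gbits/sec (decimal giga: 1e9)
to_gbits() {
    awk -v b="$1" 'BEGIN { printf "%.2f Gbits/sec\n", b * 8 / 1e9 }'
}

to_gbits 938027086    # write test -> 7.50 Gbits/sec
to_gbits 1035437111   # read test  -> 8.28 Gbits/sec
```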
I ran the exact same tests on System A (primary) also to a new, uncompressed dataset:
Code:
[PLEXNAS LOCAL WRITE TEST]
root@plexnas:/mnt/vol1/test # dd if=/dev/zero of=testfile bs=10M count=50000
50000+0 records in
50000+0 records out
524288000000 bytes transferred in 621.702142 secs (843310589 bytes/sec)
6.746Gbits/sec

[PLEXNAS LOCAL READ TEST]
root@plexnas:/mnt/vol1/test # dd if=testfile of=/dev/null bs=10M count=50000
50000+0 records in
50000+0 records out
524288000000 bytes transferred in 631.273400 secs (830524460 bytes/sec)
6.644Gbits/sec
My next step was to test raw connectivity with iperf. Results were 16Gbits/sec in both directions on the primary 40GbE card. Since this was well below the 40GbE target, I engaged Chelsio to determine the reason for the lower performance on these cards. Chelsio support has been able to reproduce the problem in their lab and is working on a root cause. For reference, my Chelsio 10GbE cards show 9.9Gbits/sec between the same machines.
Because iperf does not test actual data throughput between the systems, I created an NFS mount point on one of the systems and transferred test data (in my case, about 500GB of movies) with cp from System A to System B. In actual usage I was seeing about 4Gbits/sec, across both my 40GbE connection and my test 10GbE connection on the same machine. At this point I assume I have hit some sort of physical limit on copying data from one system to the other, based on the systems' ability to a) read the data off the drives in System A, b) cp the data across the NFS link to System B, and c) write that data to the drives on System B. This copy also went to the uncompressed test dataset on System B, and back again. I also used dd from one system to the other with nearly the same results.
I would have expected to see something close to the read and write speeds of the slower box (in this case plexnas, at around 6+Gbits/sec), but that didn't happen, and I realize synthetic testing is not real-world data transfer. Still, 4Gbits/sec was acceptable at this point.
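For scale, 500GB at 4Gbits/sec works out to roughly 1000 seconds, or about 17 minutes, per copy (decimal units throughout):

```shell
# Time to move a payload at a given line rate
transfer_secs() {
    # $1 = payload in GB (decimal), $2 = rate in Gbit/s
    awk -v gb="$1" -v rate="$2" 'BEGIN { printf "%.0f seconds\n", gb * 8 / rate }'
}

transfer_secs 500 4    # -> 1000 seconds, i.e. roughly 17 minutes
```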
I swapped the Chelsio cards out for 10GbE Intel cards I had been using for another project, ran the same test as above, and was astounded that the 4Gbits/sec dropped to about 1.2Gbits/sec with everything else exactly the same. Needless to say, I think the Chelsio cards outperform the Intel cards at this point. Since I was not going to use the Intel cards, I did not bother to delve into why they performed the way they did.
With all of my testing complete (actually 4 or 5 runs in each direction), I set up replication with no compression (since I am moving video files, compression did not seem useful), no kB/s limit, and no encryption on the connection. I received a big warning when I selected this, but since I am on my local network, security was not a concern.
The replication started and my jaw about hit the floor: 800Mbits/sec. I deleted the replication task and recreated it, thinking I had done something wrong. Same result, same speed: 800Mbits/sec. I deleted the replication and tried it with compression; no better (not that I expected it to be). CPU and system load on both boxes are low, the sending box more so than the receiving box. My drives are not doing anywhere near the work they were doing during the cp test.
On Corral, with the exact same hardware, drives, and network configuration, I was seeing 2.5Gbits/sec during replication between my two Corral servers. (See graphs and discussion here.) I thought that was too slow given my network and hardware performance testing results, but going from 2.5Gbits/sec down to 800Mbits/sec is another shock.
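To put the difference in perspective, here is how long a full copy of my ~40TB would take at each of the rates I've seen (decimal units, 86400 seconds per day):

```shell
# Full-copy time for ~40TB (40e12 bytes) at a given rate in Gbit/s
full_copy_days() {
    # $1 = rate in Gbit/s (use 0.8 for 800Mbits/sec)
    awk -v rate="$1" 'BEGIN { printf "%.1f days\n", 40e12 * 8 / (rate * 1e9) / 86400 }'
}

full_copy_days 0.8   # current replication speed -> 4.6 days
full_copy_days 2.5   # Corral replication speed  -> 1.5 days
full_copy_days 4     # cp-over-NFS speed         -> 0.9 days
```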
I am looking for any ideas, thoughts, or suggestions on why I am seeing such dismal replication performance.
Thanks for any insight!