Storage Performance Degradation

Status
Not open for further replies.

rob78

Dabbler
Joined
Jun 2, 2014
Messages
28
Some questions / observations

X10SAE:
* Can't I just disable the audio card in the BIOS?
* It has 6 SATA6Gb/s ports on a single controller.
* Full ATX
* Intel® C226 Express PCH
* LAN: Intel® i217LM

X10SLL-F, X10SLM-F, and X10SLM+-F:
* uATX instead of ATX, is there better heat management on a full ATX instead of the micro stuff?
* None of them have enough SATA6Gb/s ports. They are all 4 + 2, which means they are on different controllers. Not sure how that will affect the 6-disk pool?
* Intel® C222 Express PCH
* LAN: Intel® i210AT
 

solarisguy

Guru
Joined
Apr 4, 2014
Messages
1,125
C226 has one controller and 6 SATA III ports.

C222 and C224 also have only one SATA controller, but it operates only some of the ports as SATA III, the other ones being operated at SATA II speed.

It is likely that when using only hard drives, and not SSDs, there would not be a speed difference.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
You could just disable audio, but the whole board is designed for workstations, not servers.

You won't benefit from the board being full ATX instead of microATX - the only advantage would be expansion slots, and all the boards I recommended expose all the available PCI-e lanes.

As for the different LAN controllers, the i217 is actually just a PHY for the network controller included in Intel's chipsets (including the C22x/H87/Z87/etc controllers), while the i210 is a dedicated controller that handles everything. The i210 is probably better, but the i217 is good enough.

Forget about SATA 6Gb/s. Even a 10k RPM VelociRaptor can only barely saturate SATA 1.5Gb/s. All SATA ports on those boards are driven by the C22x and are known to work well.
 

rob78

Dabbler
Joined
Jun 2, 2014
Messages
28
Going to go with X10SLM-F only because you get a $15.00 Newegg gift card and it only costs $5.00 more than the X10SLL-F. I don't see any need for 8 SAS2 ports... I don't think I'll ever need that except maybe 5 years down the road and at that point I'd rather buy all new hardware.
 

hammong

Dabbler
Joined
Mar 18, 2014
Messages
22
Ericloewe - I don't own any X10 series boards with socket 1150, but I do have two X9 boards with Socket 2011 processors, and they use the older 82579LM and 82574L to provide the two on-board GigE connections. There's a solid 4% improvement in performance with the 82574L (the discrete controller+PHY) vs. the integrated chipset controller + 82579LM PHY. I wonder if there's a similar difference on the X10 boards. For me, I went from 880 Mbps using rsync between the two boxes on the primary LAN ports to 925 Mbps on the secondary LAN ports.

My point boils down to this - if you get a server/workstation board with dual GigE connections, be sure to benchmark both ports and see if one is measurably faster than the other. Could be the discrete NIC is faster than the chipset controller + PHY.

FWIW my Intel PRO/1000 PT dual port adapters beat BOTH the 82579LM and 82574L integrated NICs, and the PRO/1000 PT is a 9-year-old PCIe 1.0 design.
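A minimal sketch of how such a port-to-port comparison could be summarised. The run lists are hypothetical samples centred on the 880 and 925 Mbps figures quoted above; taking the median of several runs is a choice made here to damp outliers, not something described in the post.

```python
from statistics import median


def compare_ports(runs_a_mbps, runs_b_mbps):
    """Return (median_a, median_b, relative gain of b over a)."""
    med_a = median(runs_a_mbps)
    med_b = median(runs_b_mbps)
    return med_a, med_b, (med_b - med_a) / med_a


if __name__ == "__main__":
    # Hypothetical repeated measurements in Mbps, not real data.
    primary = [878, 880, 883]
    secondary = [921, 925, 928]
    med_a, med_b, gain = compare_ports(primary, secondary)
    print(f"primary {med_a} Mbps, secondary {med_b} Mbps, gain {gain:.1%}")
```

With these sample values the gain comes out at roughly 5%, in the same ballpark as the improvement described above.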

Greg
 

rob78

Dabbler
Joined
Jun 2, 2014
Messages
28
This problem is back and is far worse than before. I am unable to copy large files to the NAS anymore... The connection is fine at about 90 MB/s then slowly dies.

dies.jpg


How should I go about diagnosing the problem? CPU & Network usage is not an issue during the transfer. The ZFS Volume is healthy with 6.3TB free.
 

rob78

Dabbler
Joined
Jun 2, 2014
Messages
28
So I performed a scrub and I'm able to copy files again. However I can only write to the server at 75MB/sec but I can read at 110MB/sec.

After some testing it turns out that the server can transmit at 110MB/sec through my network but can only receive at 75MB/sec. I have verified transmitting and receiving between other PCs on the same network at 110MB/sec both up and down. Originally I was told that the NIC wasn't recommended, so I changed motherboards, which you can see if you read this entire thread. I've tested both NICs on the motherboard with the same results.

The motherboard I'm currently using is: SUPERMICRO MBD-X10SLM-F-O uATX
Onboard NICs: 1xi210AT, 1xi217LM
UP: 110MB/sec DOWN: 75MB/sec

A separate network card that I bought and put into the server:
NIC: Intel EXPI9301CTBLK
UP: 110MB/sec DOWN: 95MB/sec

Is it impossible to get 110MB/sec DOWN on the server?
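For context, a rough sketch of where the gigabit ceiling sits and how the figures above compare with it. It assumes a standard 1500-byte MTU, TCP over IPv4 with no header options, and decimal megabytes; real transfers (SMB in particular) carry extra protocol overhead, so treat the ceiling as an upper bound.

```python
LINE_RATE_MB_PER_S = 1_000_000_000 / 8 / 1e6   # 1 Gb/s expressed as 125 MB/s

MTU = 1500                      # assumed standard Ethernet MTU
IP_TCP_HEADERS = 20 + 20        # IPv4 + TCP headers, no options (assumption)
# Preamble/SFD (8) + Ethernet header (14) + FCS (4) + inter-frame gap (12)
ETHERNET_OVERHEAD = 8 + 14 + 4 + 12


def max_tcp_payload_mb_per_s() -> float:
    """Theoretical TCP payload rate on gigabit Ethernet in MB/s."""
    payload = MTU - IP_TCP_HEADERS
    on_wire = MTU + ETHERNET_OVERHEAD
    return LINE_RATE_MB_PER_S * payload / on_wire


def fraction_of_ceiling(measured_mb_per_s: float) -> float:
    """Measured rate as a fraction of the theoretical ceiling."""
    return measured_mb_per_s / max_tcp_payload_mb_per_s()


if __name__ == "__main__":
    print(f"ceiling: {max_tcp_payload_mb_per_s():.1f} MB/s")
    for rate in (110, 95, 75):   # figures reported in this post
        print(f"{rate} MB/s is {fraction_of_ceiling(rate):.0%} of the ceiling")
```

On these assumptions 110MB/sec is already close to what gigabit can carry, so the 75MB/sec and 95MB/sec receive figures are the ones that stand out.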

Thanks for your help!
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I can do over 400MB/sec each way on my server. I'm on 10Gb Intel LAN. But even on my Intel LAN I can do 100MB/sec both ways.

So you have a bottleneck somewhere or something just isn't quite set up in the most ideal fashion.

Keep in mind the NIC on your desktop that you are using for testing can be, and sometimes is, your bottleneck. So if it's Realtek you kind of know where to go... ;)
 

rob78

Dabbler
Joined
Jun 2, 2014
Messages
28
I've tested this PC to two other PCs on the same network. I get 110MB/s up and down. They are on the same switch as the server.

This PC is using an Intel 82579V NIC.

Server using Intel EXPI9301CTBLK:
Data from Server -> PC: 110MB/sec
Data from PC -> Server: 95MB/sec

Server using onboard Intel 1xi210AT:
Data from Server -> PC: 110MB/sec
Data from PC -> Server: 75MB/sec

I find it unlikely that there's a bottleneck that affects only data being sent to the server. I can send from this PC to other PCs on my network at the full 110MB/sec. Also the PCI-E NIC seems to transfer 20MB/s faster than the on board ones.

I honestly believe there is an issue on the server. The bandwidth to the server fluctuates dramatically; however, receiving from the server is always fast and reliable.


SERVER -> PC:

fromserver.jpg



PC -> SERVER:

toserver.jpg


Thanks again for all your help!
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
rob78,

Usually when you have those kinds of fluctuations there's one of two possibilities:

1. You didn't control for factors before running the test (for example, didn't shut down the sharing services before doing the test).
2. You have a hardware problem with your network or a misconfigured setting. (If you have "green" switches, aren't using quality Cat6, or are using Realtek NICs in your desktop, those are common problems.)

Unfortunately I can't help much more than that. You're going to have to try removing different aspects of your network and try to rule out the problem. :(
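One way to take the disks and sharing services out of the picture is a memory-to-memory network test, which is what a tool such as iperf does. The sketch below shows the idea with plain Python sockets over loopback; the function names are made up for illustration, and to test a real link the sink and the sender would have to run on different machines.

```python
import socket
import threading
import time

CHUNK = 64 * 1024


def _sink(server_sock, result):
    """Accept one connection and count bytes received until the peer closes."""
    conn, _ = server_sock.accept()
    total = 0
    with conn:
        while True:
            data = conn.recv(CHUNK)
            if not data:
                break
            total += len(data)
    result["bytes"] = total


def measure_throughput(total_bytes=32 * 1024 * 1024, host="127.0.0.1"):
    """Send at least total_bytes over TCP to a local sink.

    Returns (bytes_received, rate in decimal MB/s).
    """
    result = {}
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind((host, 0))          # port 0: let the OS pick a free port
        server.listen(1)
        port = server.getsockname()[1]
        receiver = threading.Thread(target=_sink, args=(server, result))
        receiver.start()

        payload = b"\0" * CHUNK         # data comes from memory, not disk
        sent = 0
        start = time.perf_counter()
        with socket.create_connection((host, port)) as client:
            while sent < total_bytes:
                client.sendall(payload)
                sent += len(payload)
        receiver.join()                 # wait until everything has arrived
        elapsed = time.perf_counter() - start
    return result["bytes"], sent / elapsed / 1e6


if __name__ == "__main__":
    received, rate = measure_throughput()
    print(f"received {received} bytes at {rate:.0f} MB/s")
```

If a test like this runs at full speed in both directions while file copies do not, the network is probably not the component to blame.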

Good luck.
 

rob78

Dabbler
Joined
Jun 2, 2014
Messages
28
Yah, I'm stumped. I turned off every service except for SSH and turned off all jails. I shut down every PC on the network besides this one and the server. In my post I mentioned that I was using an Intel NIC in my PC and the Server. I also mentioned that I can transfer to other PCs through the same switch at 110MB/s up/down. I also switched out the wire to the server using one of the wires that was confirmed at 110MB/sec up/down.

I have eliminated every possibility that I can. Going to start looking for issues with FreeBSD and these NICs. I'll post again if I find anything.
 

rob78

Dabbler
Joined
Jun 2, 2014
Messages
28
Could it be promiscuous mode?

[root@nas] ~# ifconfig
em0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=42098<VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC,VLAN_HWTSO>
        ether 68:05:ca:24:f8:35
        inet 192.168.0.3 netmask 0xffffff00 broadcast 192.168.0.255
        nd6 options=9<PERFORMNUD,IFDISABLED>
        media: Ethernet autoselect (1000baseT <full-duplex>)
        status: active
em1: flags=8c02<BROADCAST,OACTIVE,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=4019b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,VLAN_HWTSO>
        ether 0c:c4:7a:03:f3:79
        nd6 options=9<PERFORMNUD,IFDISABLED>
        media: Ethernet autoselect
        status: no carrier
igb0: flags=8c02<BROADCAST,OACTIVE,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=403bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,TSO6,VLAN_HWTSO>
        ether 0c:c4:7a:03:f3:78
        nd6 options=9<PERFORMNUD,IFDISABLED>
        media: Ethernet autoselect
        status: no carrier
ipfw0: flags=8801<UP,SIMPLEX,MULTICAST> metric 0 mtu 65536
        nd6 options=9<PERFORMNUD,IFDISABLED>
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
        options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
        inet6 ::1 prefixlen 128
        inet6 fe80::1%lo0 prefixlen 64 scopeid 0x7
        inet 127.0.0.1 netmask 0xff000000
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
bridge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        ether 02:fe:4a:c8:9c:00
        nd6 options=1<PERFORMNUD>
        id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
        maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
        root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
        member: epair2a flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 11 priority 128 path cost 2000
        member: epair1a flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 10 priority 128 path cost 2000
        member: epair0a flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 9 priority 128 path cost 2000
        member: em0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
                ifmaxaddr 0 port 1 priority 128 path cost 20000
epair0a: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8<VLAN_MTU>
        ether 02:21:1a:00:09:0a
        nd6 options=1<PERFORMNUD>
        media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
        status: active
epair1a: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8<VLAN_MTU>
        ether 02:c4:73:00:0a:0a
        nd6 options=1<PERFORMNUD>
        media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
        status: active
epair2a: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8<VLAN_MTU>
        ether 02:c0:73:00:0b:0a
        nd6 options=1<PERFORMNUD>
        media: Ethernet 10Gbase-T (10Gbase-T <full-duplex>)
        status: active
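A small sketch for picking out which interfaces report PROMISC in output like the above. The sample text is a trimmed copy of header lines from this listing. As far as I know, FreeBSD puts every member of a bridge into promiscuous mode, so PROMISC on em0 and the epair interfaces would be expected with jails attached to bridge0; treat that explanation as an assumption rather than a diagnosis.

```python
import re

# Matches interface header lines such as "em0: flags=8943<UP,...> metric 0".
# Bridge "member: em0 flags=143<...>" lines do not match, because the name
# must be followed directly by ": flags=".
IFACE_LINE = re.compile(
    r"^(?P<name>[A-Za-z0-9_.]+): flags=(?P<hex>[0-9a-fA-F]+)<(?P<names>[^>]*)>"
)


def promiscuous_interfaces(ifconfig_output: str) -> list:
    """Return names of interfaces whose flag list includes PROMISC."""
    found = []
    for line in ifconfig_output.splitlines():
        match = IFACE_LINE.match(line)
        if match and "PROMISC" in match.group("names").split(","):
            found.append(match.group("name"))
    return found


SAMPLE = """\
em0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
em1: flags=8c02<BROADCAST,OACTIVE,SIMPLEX,MULTICAST> metric 0 mtu 1500
bridge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        member: em0 flags=143<LEARNING,DISCOVER,AUTOEDGE,AUTOPTP>
epair0a: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
"""

if __name__ == "__main__":
    print(promiscuous_interfaces(SAMPLE))
```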
 

rob78

Dabbler
Joined
Jun 2, 2014
Messages
28
Since the last post on this thread, I have replaced two hard drives on my NAS. What seems to happen is that one drive can't keep up with the other drives anymore. These drives are only months old. I'm sure it's possible that they are just defective, but I just replaced a drive two weeks ago and the same thing is happening with the replacement. I'm asking whether something other than a bad drive could be going on, because I don't want to continue to throw money at this problem.

Here's what happened:
  1. When writing a file to the NAS it transfers at 95MB/sec through my LAN. Then it will slowly drop to 0MB/sec and eventually the transfer will time out and fail. When I run gstat while the file is being transferred, 1 of the drives in the array is showing 100% busy while the other 5 have 0% busy. After replacing / resilvering the drive, gstat shows normal %busy across all drives during a transfer and it succeeds perfectly fine with a straight 95MB/sec line in Windows. Problem fixed, bad drive, right?
  2. A different drive does the same thing, so I replace / resilver thinking crap, another bad drive. System works fine again.
  3. Two weeks later the drive I just replaced is doing it again. Do I buy ANOTHER drive or could there be something else going on here?
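The gstat pattern described above (one disk pinned at 100% busy while its peers sit idle) can be flagged mechanically. This is a minimal sketch that takes %busy readings already collected per disk; the thresholds and the sample values are assumptions for illustration, and it does not parse gstat output itself.

```python
from statistics import median


def saturated_outliers(busy_by_disk, busy_threshold=90.0, idle_threshold=20.0):
    """Return disks that are nearly saturated while the rest are nearly idle.

    busy_by_disk maps a device name to its %busy reading. Both thresholds
    are arbitrary illustrative values.
    """
    suspects = []
    for disk, busy in busy_by_disk.items():
        others = [value for name, value in busy_by_disk.items() if name != disk]
        if others and busy >= busy_threshold and median(others) <= idle_threshold:
            suspects.append(disk)
    return suspects


if __name__ == "__main__":
    # Hypothetical readings resembling the symptom described above.
    stalled = {"ada0": 0, "ada1": 100, "ada2": 0, "ada3": 0, "ada4": 0, "ada5": 0}
    # Hypothetical readings for an evenly loaded pool.
    healthy = {"ada0": 55, "ada1": 60, "ada2": 58, "ada3": 57, "ada4": 61, "ada5": 59}
    print(saturated_outliers(stalled))   # the lone busy disk
    print(saturated_outliers(healthy))   # nothing flagged
```

A disk flagged this way is only a suspect; the same pattern could come from a cable, a port, or software, which is why swapping those before buying another drive is worthwhile.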

Please help and I didn't mean to necro this thread but it contains all of the information about my NAS and the previous steps I've taken to solve this problem, and it is actually the original problem of the thread happening again and again.
 

hammong

Dabbler
Joined
Mar 18, 2014
Messages
22
It does seem strange that your DT01ACA300 drives keep failing, especially on the same port number.

Did you replace the SATA cable going to the device? That would be the easiest first diagnostic step, especially considering you have had failure on the same drive/port more than once.

Is there a spare SATA port that you can switch to? It would help to rule out a port/cabling issue that's masking itself as a bad drive.

Toshiba doesn't go down in my book as a reliable consumer hard drive, and multiplying the risk by having six or more of them in an array makes me even more nervous. I'd give strong consideration to switching it out for a Hitachi or WD if problems continue and you determine the disk itself is bad. A NAS-rated drive would at least have TLER. Upgrading to an enterprise drive will give you 10x the resilience against unrecoverable read errors: that Toshiba has a BER of 1 in 10^14 bits, while enterprise drives are 1 in 10^15 or, rarely, 1 in 10^16 bits.
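To give a feel for what those error rates mean, here is a rough calculation of the chance of hitting at least one unrecoverable read error while reading a whole drive once. It assumes a 3 TB drive and treats the quoted rate as an independent per-bit probability, which is a simplification: spec sheets state the rate as an upper bound, so real drives usually do better.

```python
import math


def p_at_least_one_ure(bytes_read: float, errors_per_bit: float) -> float:
    """Probability of one or more unrecoverable read errors over bytes_read.

    Computes 1 - (1 - p)^bits using log1p/expm1 to stay accurate for tiny p.
    """
    bits = bytes_read * 8
    return -math.expm1(bits * math.log1p(-errors_per_bit))


if __name__ == "__main__":
    drive_bytes = 3e12   # assumed 3 TB drive, read end to end once
    for exponent in (14, 15, 16):
        p = p_at_least_one_ure(drive_bytes, 10.0 ** -exponent)
        print(f"1 in 10^{exponent}: {p:.2%} chance of at least one URE")
```

Under these assumptions a full read at 1 in 10^14 has roughly a one-in-five chance of an error, and each step up in the rating cuts that by about a factor of ten.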

Greg
 

rob78

Dabbler
Joined
Jun 2, 2014
Messages
28
Actually, the first drive I replaced was ada5, the second drive was ada1. ada1 is bad again so technically it was two different ports. I already tried replacing the cable before I replaced ada1 the first time. The drive can still be read perfectly and I have no problem reading anything from the pool. Scrub succeeds with 0 errors. Smart short test doesn't report any issues on the drive either.

Should I replace the drive with a better brand even though all of the drives are the exact same brand/model? That way, slowly over time, I will replace all of the shitty Toshibas with better ones? Or should I buy 6 new drives and replace them all, resilvering one at a time... ??
 

hammong

Dabbler
Joined
Mar 18, 2014
Messages
22
If it were me, I'd probably replace them as they have trouble. If top performance is your primary consideration, then replace them all with enterprise drives. For a 95 MB/sec network you can limp by with the Toshibas until they flake out. =)
 

rob78

Dabbler
Joined
Jun 2, 2014
Messages
28
Okay, there was definitely something else wrong. I upgraded from 9.2.1.5 to 9.3 and POOF, problem is gone. I might have wasted money on replacing those first 2 drives. Guess I'll have backups ready in case these actually die. Any ideas on what could have been the issue with my old FreeNAS version?
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Test all drives thoroughly with SMART, plus badblocks on the replaced drives.
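As a sketch of what that testing could look like, the helper below only builds the command lines and prints them; it runs nothing. The device name is a placeholder, the 4096-byte block size is an assumption often used for large disks, and the function names are made up for illustration. Note that badblocks -w overwrites the whole disk, so it is only suitable for a drive that is not part of a pool.

```python
def smart_commands(device: str) -> list:
    """Commands to start a long SMART self-test and to view the full report."""
    return [
        ["smartctl", "-t", "long", device],   # start extended self-test
        ["smartctl", "-a", device],           # show attributes and test log
    ]


def badblocks_command(device: str, destructive: bool = False) -> list:
    """badblocks invocation: -w is the destructive write test,
    -n the non-destructive read-write test."""
    mode = "-w" if destructive else "-n"
    # -b 4096: block size (assumed), -s: show progress, -v: verbose
    return ["badblocks", "-b", "4096", "-s", "-v", mode, device]


if __name__ == "__main__":
    device = "/dev/ada1"   # placeholder device name
    for argv in smart_commands(device) + [badblocks_command(device)]:
        print(" ".join(argv))
```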
 