FreeNAS as a storage pool for ESXi 5.5 (networking crashing)


ghostdogg

Cadet
Joined
Apr 4, 2014
Messages
8
I am using FreeNAS 9.3 on a storage server. I also have another server, with no local storage, running VMware ESXi 5.5; it connects to the FreeNAS box over NFS for its VM storage pool. When I start a VM, I lose all network access to the FreeNAS server within a few minutes. The server itself is still running, based on the logs; I bring it back online with a hardware reset. Here is the last entry in /var/log/messages before networking goes down.

Code:
Jan 28 19:45:52 serenitynas mountd[2028]: mount request succeeded from 10.0.0.101 for /mnt/internalStorage/vm/aurora


EDIT: Here are the logs from the ESXi server.
Code:
2015-01-28T21:25:28.456Z [3A3E2B70 verbose 'Hostsvc.FSVolumeProvider'] RefreshOneNasVolume called on 10.0.0.102:/mnt/internalStorage/vm/aurora
2015-01-28T21:25:28.457Z [3A3E2B70 verbose 'Hostsvc.FSVolumeProvider'] RefreshOneNasVolume: calling ProcessVmfs on 10.0.0.102:/mnt/internalStorage/vm/aurora
2015-01-28T21:25:28.458Z [3A3E2B70 verbose 'Hostsvc.Datastore'] NotifyIfAccessibleChanged -- notify that datastore 10.0.0.102:/mnt/internalStorage/vm/aurora at path /vmfs/volumes/f8ecb19a-8339786c now has accessibility of false due to AllPathsDown_Start
2015-01-28T21:25:28.458Z [FFA1FB70 verbose 'Vmsvc.vm:/vmfs/volumes/f8ecb19a-8339786c/Ubuntu 14.04 Log Server/Ubuntu 14.04 Log Server.vmx'] Got DSSYS change: [N11HostdCommon18DatastoreSystemMsgE:0x3b30fc58]UPDATE-NOW-DISCONNECTED, 10.0.0.102:/mnt/internalStorage/vm/aurora, /vmfs/volumes/f8ecb19a-8339786c;
2015-01-28T21:25:28.459Z [FFA1FB70 warning 'Vmsvc.vm:/vmfs/volumes/f8ecb19a-8339786c/Ubuntu 14.04 Log Server/Ubuntu 14.04 Log Server.vmx'] UpdateStorageAccessibilityStatusInt: The datastore 10.0.0.102:/mnt/internalStorage/vm/aurora is not accessible
2015-01-28T21:25:28.459Z [FFA1FB70 verbose 'Vmsvc.vm:/vmfs/volumes/f8ecb19a-8339786c/Ubuntu Server/Ubuntu Server.vmx'] Got DSSYS change: [N11HostdCommon18DatastoreSystemMsgE:0x3b30fc58]UPDATE-NOW-DISCONNECTED, 10.0.0.102:/mnt/internalStorage/vm/aurora, /vmfs/volumes/f8ecb19a-8339786c;
2015-01-28T21:25:28.459Z [FFA1FB70 warning 'Vmsvc.vm:/vmfs/volumes/f8ecb19a-8339786c/Ubuntu Server/Ubuntu Server.vmx'] UpdateStorageAccessibilityStatusInt: The datastore 10.0.0.102:/mnt/internalStorage/vm/aurora is not accessible
2015-01-28T21:25:28.459Z [FFA1FB70 verbose 'Vmsvc.vm:/vmfs/volumes/f8ecb19a-8339786c/HTPC - Ubuntu Server 14.10/HTPC - Ubuntu Server 14.10.vmx'] Got DSSYS change: [N11HostdCommon18DatastoreSystemMsgE:0x3b30fc58]UPDATE-NOW-DISCONNECTED, 10.0.0.102:/mnt/internalStorage/vm/aurora, /vmfs/volumes/f8ecb19a-8339786c;
2015-01-28T21:25:28.459Z [FFA1FB70 warning 'Vmsvc.vm:/vmfs/volumes/f8ecb19a-8339786c/HTPC - Ubuntu Server 14.10/HTPC - Ubuntu Server 14.10.vmx'] UpdateStorageAccessibilityStatusInt: The datastore 10.0.0.102:/mnt/internalStorage/vm/aurora is not accessible
-->          value = "10.0.0.102",
2015-01-28T21:27:15.560Z [FF9BC920 info 'Vimsvc.ha-eventmgr'] Event 119 : Lost connection to server 10.0.0.102 mount point /mnt/internalStorage/vm/aurora mounted as f8ecb19a-8339786c-0000-000000000000 (serenitynas aurora).
2015-01-28T21:27:21.665Z [FFAA1B70 warning 'Vmsvc.vm:/vmfs/volumes/f8ecb19a-8339786c/Ubuntu 14.04 Log Server/Ubuntu 14.04 Log Server.vmx'] UpdateStorageAccessibilityStatusInt: The datastore 10.0.0.102:/mnt/internalStorage/vm/aurora is not accessible
2015-01-28T21:27:48.457Z [3B680B70 verbose 'Hostsvc.FSVolumeProvider'] RefreshOneNasVolume called on 10.0.0.102:/mnt/internalStorage/vm/aurora
2015-01-28T21:27:48.458Z [3B680B70 verbose 'Hostsvc.FSVolumeProvider'] RefreshOneNasVolume: calling ProcessVmfs on 10.0.0.102:/mnt/internalStorage/vm/aurora
2015-01-28T21:27:48.459Z [3B680B70 verbose 'Hostsvc.Datastore'] NotifyIfAccessibleChanged -- notify that datastore 10.0.0.102:/mnt/internalStorage/vm/aurora at path /vmfs/volumes/f8ecb19a-8339786c now has accessibility of false due to AllPathsDown_Timeout
2015-01-28T21:27:48.459Z [3A3E2B70 verbose 'Vmsvc.vm:/vmfs/volumes/f8ecb19a-8339786c/Ubuntu 14.04 Log Server/Ubuntu 14.04 Log Server.vmx'] Got DSSYS change: [N11HostdCommon18DatastoreSystemMsgE:0x3ad2a6c8]UPDATE-NOW-DISCONNECTED, 10.0.0.102:/mnt/internalStorage/vm/aurora, /vmfs/volumes/f8ecb19a-8339786c;
2015-01-28T21:27:48.459Z [3A3E2B70 warning 'Vmsvc.vm:/vmfs/volumes/f8ecb19a-8339786c/Ubuntu 14.04 Log Server/Ubuntu 14.04 Log Server.vmx'] UpdateStorageAccessibilityStatusInt: The datastore 10.0.0.102:/mnt/internalStorage/vm/aurora is not accessible
2015-01-28T21:27:48.459Z [3A3E2B70 verbose 'Vmsvc.vm:/vmfs/volumes/f8ecb19a-8339786c/Ubuntu Server/Ubuntu Server.vmx'] Got DSSYS change: [N11HostdCommon18DatastoreSystemMsgE:0x3ad2a6c8]UPDATE-NOW-DISCONNECTED, 10.0.0.102:/mnt/internalStorage/vm/aurora, /vmfs/volumes/f8ecb19a-8339786c;
2015-01-28T21:27:48.459Z [3A3E2B70 warning 'Vmsvc.vm:/vmfs/volumes/f8ecb19a-8339786c/Ubuntu Server/Ubuntu Server.vmx'] UpdateStorageAccessibilityStatusInt: The datastore 10.0.0.102:/mnt/internalStorage/vm/aurora is not accessible
2015-01-28T21:27:48.459Z [3A3E2B70 verbose 'Vmsvc.vm:/vmfs/volumes/f8ecb19a-8339786c/HTPC - Ubuntu Server 14.10/HTPC - Ubuntu Server 14.10.vmx'] Got DSSYS change: [N11HostdCommon18DatastoreSystemMsgE:0x3ad2a6c8]UPDATE-NOW-DISCONNECTED, 10.0.0.102:/mnt/internalStorage/vm/aurora, /vmfs/volumes/f8ecb19a-8339786c;
2015-01-28T21:27:48.459Z [3A3E2B70 warning 'Vmsvc.vm:/vmfs/volumes/f8ecb19a-8339786c/HTPC - Ubuntu Server 14.10/HTPC - Ubuntu Server 14.10.vmx'] UpdateStorageAccessibilityStatusInt: The datastore 10.0.0.102:/mnt/internalStorage/vm/aurora is not accessible
 
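EDIT 2: For reference, here is roughly what I plan to check from the local console the next time the network drops, to confirm whether it is only the NIC that has wedged. This is just a rough checklist, assuming the interface really is re0:
Code:
# run from the physical console, since SSH and the web GUI will be unreachable
ifconfig re0                      # look for "status: active" vs "no carrier"
netstat -I re0                    # input/output error counters on re0
ping -c 3 10.0.0.101              # can the ESXi host still be reached?
tail -n 50 /var/log/messages      # any re0 or watchdog errors logged?
# see whether bouncing the interface recovers it without a full hardware reset
ifconfig re0 down && ifconfig re0 up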

zambanini

Patron
Joined
Sep 11, 2013
Messages
479
Without hardware specs and more information about your setup (pool config, networking, etc.), nobody can help you.
 

gpsguy

Active Member
Joined
Jan 22, 2012
Messages
4,472
I thought the great zambanini was a mind reader! ;-)


Sent from my phone
 

zambanini

Patron
Joined
Sep 11, 2013
Messages
479
I use the magic crystal ball only for paying customers.
Forum users get treated as laboratory rats, so I have less work. :p
 

ghostdogg

Cadet
Joined
Apr 4, 2014
Messages
8
FreeNAS Server:
AMD A4-5300 Trinity Dual-Core 3.4GHz
5 x SAMSUNG Spinpoint F3 ST1000DM005/HD103SJ 1TB
ASRock FM2A75M-DGS FM2 AMD A75
CORSAIR Vengeance 8GB (2 x 4GB) 240-Pin DDR3 SDRAM

It is connected to the ESXi server through a gigabit switch. I am now using iSCSI and FreeNAS seems more stable, but it still becomes unresponsive over the network if I start three VMs. Looking at the Reporting section, there is a very large bandwidth spike just before the connectivity issues happen.

See graphs below.
[Attached graphs: serenitynas1.png through serenitynas4.png]
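If it helps, I can also capture the interface counters live while the VMs start, to see what the spike looks like at the packet level (again assuming the interface is re0):
Code:
# live per-second traffic for re0 while the VMs spin up
netstat -w 1 -I re0
# mbuf cluster usage - exhaustion here can also wedge a NIC
netstat -m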
 

ghostdogg

Cadet
Joined
Apr 4, 2014
Messages
8
Should this be posted elsewhere, or is more information needed to troubleshoot this problem? This is a pretty serious problem right now, and I don't like having to perform hardware resets on my FreeNAS box daily.
 

rogerh

Guru
Joined
Apr 18, 2014
Messages
1,111
While you're waiting for someone knowledgeable, is the re0 device a Realtek one? Someone is going to suggest you get an Intel NIC and see if it works better, so it might as well be me.
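If you're not sure which chip it is, something along these lines from the FreeNAS shell should show it (assuming the interface really is re0):
Code:
# PCI vendor/device strings for the controller behind re0
pciconf -lv | grep -A 4 re0
# what the kernel identified at boot
dmesg | grep re0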
 

ghostdogg

Cadet
Joined
Apr 4, 2014
Messages
8
Yes, it is a Realtek 8111E. It was rock solid when it was running ESXi, but now that it is only running FreeNAS it's unstable.
 

gpsguy

Active Member
Joined
Jan 22, 2012
Messages
4,472
This section is fine - a number of users, like myself, read most of the messages that are posted in English.

That being said, your "server" wasn't built/configured for optimal use as an iSCSI target.

You've got a desktop mobo (I didn't know it included a dehumidifier), no ECC RAM, insufficient memory, probably a pool that isn't configured for best performance... and a Realtek NIC.

You never told us how you configured your pool - I'll take a guess and say it's probably RAIDZ1. For best performance with iSCSI, striped mirrors are preferred.
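Purely as an illustration (the da* disk names are made up, and on FreeNAS you would build the pool through the Volume Manager rather than at the command line), the difference in layout is roughly:
Code:
# one RAIDZ1 vdev across five disks - good capacity, but random I/O of roughly a single disk
zpool create tank raidz1 da0 da1 da2 da3 da4

# two striped mirrors (four disks, fifth as a hot spare) - less capacity, far better IOPS for iSCSI
zpool create tank mirror da0 da1 mirror da2 da3 spare da4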


Sent from my phone
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Ok.. so the answer seems obvious.. why are we having this discussion? Get an Intel NIC. :p

Even if it isn't your NIC, nobody here is going to buy it until you buy an Intel anyway because we *know* Realteks are total crap and horribly unreliable. The only way you can even rule out the NIC is to get an Intel NIC.

gpsguy just summed it up nicely while I was writing my post. You need to stop and go back to the drawing board on this one. What you want to do, the hardware you are trying to use, and what you expect the ultimate outcome to be do not jibe. Your expectations are FAR too high and you've made far too many fatal flaws with your design to get the outcome you want.
 

ghostdogg

Cadet
Joined
Apr 4, 2014
Messages
8
The box itself isn't crashing; it's the NIC that is going down.

Yes, the pool is RAIDZ1. I am not looking for optimal performance, because the virtual disks don't see heavy use. The VMs are media servers with shares on the FreeNAS box mapped to them; they serve the media, sometimes transcoding, etc.

From the wiki:

FreeNAS Recommended Minimum Hardware
  • 64-bit x86 Processor
  • 8GB RAM
  • FreeNAS should be installed on a USB flash drive. The recommended minimum is 8GB. 16GB provides more room for boot environments if you plan to test many configurations or follow a fast-moving update line.
  • At least one additional disk for storage.
  • Wired network port (wireless not supported)
  • Computer on the same network with a modern web browser (for management)

I meet all of these requirements: 8GB of RAM, a 64-bit processor, FreeNAS installed on a flash drive, five 1TB disks in RAIDZ1, a wired gigabit Ethernet port, and access from the same wired network.
 

ghostdogg

Cadet
Joined
Apr 4, 2014
Messages
8
Ok.. so the answer seems obvious.. why are we having this discussion? Get an Intel NIC. :p

Even if it isn't your NIC, nobody here is going to buy it until you buy an Intel anyway because we *know* Realteks are total crap and horribly unreliable. The only way you can even rule out the NIC is to get an Intel NIC.

gpsguy just summed it up nicely while I was writing my post. You need to stop and go back to the drawing board on this one. What you want to do, the hardware you are trying to use, and what you expect the ultimate outcome to be do not jibe. Your expectations are FAR too high and you've made far too many fatal flaws with your design to get the outcome you want.

OK, I have an old Intel controller lying around, though it's only 100BASE-T. I will give that a shot.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
At just 12MB/sec of throughput you're probably going to have timeouts and other problems, so you're not really gaining much.
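(For reference: 100BASE-T tops out at 100 Mbit/s, i.e. 100 ÷ 8 = 12.5 MB/s of raw line rate before TCP/iSCSI overhead, versus roughly 110-118 MB/s usable on gigabit.)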

Besides that, your limited RAM is going to cause timeouts and other problems. Oh, and the fact that you are running a single RAIDZ1 vdev is potentially going to cause timeouts and other problems...

See why I said you need to go back to the drawing board? The whole thing is not suited for the workload you intend to run. So whatever problem you have, even if it isn't actually caused by your hardware, is going to be chalked up to the hardware until you can prove otherwise. Why are we so quick to make that assumption? Because it's been true 99 times out of 100, so unless you're willing to at least use reasonable hardware for the workload, nobody is going to volunteer their time to help troubleshoot a problem whose answer is more than likely going to be "get more powerful hardware" in the end.
 

gpsguy

Active Member
Joined
Jan 22, 2012
Messages
4,472
This time around, cyberjock beat me to a response. Ditto what he said.

Another question/suggestion revolves around your storage networking. With just one NIC, you're using it for both LAN and iSCSI traffic. Normally, with ESXi, one would break the traffic out across different subnets/VLANs - for example, separate NICs/subnets for LAN and iSCSI, and possibly a third for vMotion traffic (if applicable).

So, we know you have a Realcrap NIC on your FreeNAS server; what are the hardware details of the ESXi server? Are you using a Realcrap NIC there too? If you don't have an Intel in it, a cheap test would be to buy two Intel Pro/1000 CTs and put one in each machine. Put these NICs on a different subnet, give them static addresses, and run a cable directly from one machine to the other. Then change your iSCSI configuration to use these NICs.

This doesn't address your other configuration issues, but if it's just the network interface failing for the iSCSI traffic, you should still be able to access the FreeNAS webGUI using the management address on your LAN.
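On the ESXi side you can set that up in the vSphere Client, but from the ESXi shell it looks roughly like this (the vmnic number and the 10.10.10.x addresses are only examples; on the FreeNAS side you'd give the Intel NIC 10.10.10.1/24 via Network > Interfaces so it persists across reboots):
Code:
# new vSwitch on the spare NIC, with a port group dedicated to iSCSI
esxcli network vswitch standard add --vswitch-name=vSwitch1
esxcli network vswitch standard uplink add --vswitch-name=vSwitch1 --uplink-name=vmnic1
esxcli network vswitch standard portgroup add --vswitch-name=vSwitch1 --portgroup-name=iSCSI
# vmkernel port with a static address on the storage-only subnet
esxcli network ip interface add --interface-name=vmk1 --portgroup-name=iSCSI
esxcli network ip interface ipv4 set --interface-name=vmk1 --type=static --ipv4=10.10.10.2 --netmask=255.255.255.0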

8GB of RAM is the bare minimum for running FreeNAS as a simple fileserver with a few TB of storage. Once you start adding multiple 2TB+ hard disks, running plugins/jails, using iSCSI, and other features, the requirements start to rise.

I'll close by quoting cyberjock "Your expectations are FAR too high and you've made far too many fatal flaws with your design to get the outcome you want."
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
So, we know you have a Realcrap NIC

I have a trademark on that: I'm gonna beat the Realcrap outta you.

No, but seriously, there's a bunch of comments in the driver that describe how frustrating it is to write a driver for low-quality hardware that isn't well-documented.

And the graphs clearly show the Realtek is crapping out.

As for VM storage, the absolute minimum configuration should be 16GB of RAM, and I'll note that even 32GB is tight. I picked 64GB for the VM storage filer I'm working with, and that's suitable for moderate traffic. If you aren't liberally throwing resources at VM storage on ZFS, it's going to frelling suck.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Ar-Tee-Ell-Eight-One-One-One-E. That thing has been plaguing motherboards since the time they started including GbE controllers.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Ar-Tee-Ell-Eight-One-One-One-E. That thing has been plaguing motherboards since the time they started including GbE controllers.

Uhhhhhhhhhhhh. "since the time they started including GbE controllers"?

Code:
 * The RealTek 8139 PCI NIC redefines the meaning of 'low end.' This is
 * probably the worst PCI ethernet controller ever made, with the possible
 * exception of the FEAST chip made by SMC. The 8139 supports bus-master
 * DMA, but it has a terrible interface that nullifies any performance
 * gains that bus-master DMA usually offers.
 *
 * For transmission, the chip offers a series of four TX descriptor
 * registers. Each transmit frame must be in a contiguous buffer, aligned
 * on a longword (32-bit) boundary. This means we almost always have to
 * do mbuf copies in order to transmit a frame, except in the unlikely
 * case where a) the packet fits into a single mbuf, and b) the packet
 * is 32-bit aligned within the mbuf's data area. The presence of only
 * four descriptor registers means that we can never have more than four
 * packets queued for transmission at any one time.
 *
 * Reception is not much better. The driver has to allocate a single large
 * buffer area (up to 64K in size) into which the chip will DMA received
 * frames. Because we don't know where within this region received packets
 * will begin or end, we have no choice but to copy data from the buffer
 * area into mbufs in order to pass the packets up to the higher protocol
 * levels.
 *
 * It's impossible given this rotten design to really achieve decent
 * performance at 100Mbps, unless you happen to have a 400Mhz PII or
 * some equally overmuscled CPU to drive it.
 *
 * On the bright side, the 8139 does have a built-in PHY, although
 * rather than using an MDIO serial interface like most other NICs, the
 * PHY registers are directly accessible through the 8139's register
 * space. The 8139 supports autonegotiation, as well as a 64-bit multicast
 * filter.


That's from a copy of if_rl.c from 1999, at which point it was for 10/100 interfaces and was already pretty old. I am too lazy to determine when the commentary was added to the file but am happy to take it as evidence that Realtek sucked long before the RTL8111E.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Uhhhhhhhhhhhh. "since the time they started including GbE controllers"?

I did say RTL8111E, not 8139. Bad hardware will always exist, but the RTL8111E's longevity is rather annoying.

But yeah, that driver's source is legendary.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I did say RTL8111E, not 8139. Bad hardware will always exist, but the RTL8111E's longevity is rather annoying.

Meaningless and pointless overqualification... it's easier to just say Realtek sucks.
 