Intermittent Network Drops with vSphere 4

Status
Not open for further replies.
Joined
May 31, 2016
Messages
5
I have a single vSphere 4 host (HP server with 12 cores and 90GiB RAM) connected to FreeNAS 9.10 (NFS) on a Dell server with 12 cores and 64GiB RAM. There are six disks of 2 TiB each, plus the thumb drive that holds the FN system. I run 4 10/100 NICs, physical and virtual, and the virtual machine network and the management network are on the same IP block (that could be an issue, but I don't know how to get around it). The 4 NICs should be able to carry the 100MiB traffic, though. (FYI, at the moment there are 13 VMs on the FN, all but 2 of them Win 7 machines; the other two are 1 Linux VM for vulnerability testing and 1 Server 2008 VM for the vCenter.)

The problem is that the FN randomly loses communication with the VMware host, with no consistent reason that I can see. I have to restart the VMware management network, and then all is well for anywhere from a couple of minutes to a couple of hours.

This is such a small system that, although I have a vCenter server, I am not doing anything exotic enough for it to really matter. I have looked for IP conflicts (all of my IPs are static) and found none. I have combed the Internet, including here, without luck. So apparently I am the only one in the world with this problem, but it is a real irritant, especially when building VMs.

Thoughts, anyone?
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
Which versions of FreeNAS and VMware are you using? And what VMware NIC driver are you using?
 
Joined
May 31, 2016
Messages
5
As in the original post: FreeNAS 9.10 (NFS) and vSphere 4 Enterprise. The NIC driver is whatever VMware supplies... do you have a better suggestion? Driver issues make sense, since they are often intermittent...
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
Yes, sir - you mentioned FreeNAS 9.10, but there are at least two versions of that release in the STABLE train, and there are the nightly trains as well, though I doubt you're using those. Sometimes the specific version matters, as it can help narrow down bug fixes and such things.

vSphere 4 -- isn't that an old release, associated with ESXi v4? If so, your problem may simply be some basic incompatibility between the (newer) FreeNAS and (older) VMware software, though that wouldn't be my first guess. As you mentioned, drivers may be the root cause.

If you'll post your system's specifics as detailed in the forum rules (FreeNAS release version, motherboard model, network cards if any, etc.), some of the old hands will be more inclined to help and will have a better chance at troubleshooting your problem.
 
Joined
May 31, 2016
Messages
5
You'll have to excuse me... I was unaware of the level of detail necessary. I (wrongly) assumed that someone here would have had this problem and could ask for specifics, rather than have me shotgun by describing the entire system in intimate detail. However, stupid me... here you are:

Upgraded to vSphere 5.1 Enterprise (yes, that is ESXi 5.1): no change in the symptoms.

FN Server: Dell PowerEdge R710, 12.24 GiB RAM, 12 cores, and 4 of whatever NICs are on the system board (and no, I have no idea what the system board is other than what Dell puts in that server), running 12 TiB of storage configured as RAID 0, with the O/S on a thumb drive. All appears to be working correctly, but I really don't know what I'm looking for beyond the obvious to say that it isn't.

VMware Server: HP ProLiant DL360 G7 with 144 GiB RAM and 12 cores, and 4 NICs (Broadcom NetXtreme II BCM5709 1000Base-T). All 4 NICs are set to full, and all four are teamed. Same lack of knowledge about the system board beyond what HP puts in the server. 6 TiB of on-server (local) storage.

FN version: FreeNAS-9.10-STABLE-201605240427 (64fcd8e)

Frankly, I cannot think of anything else. If this is not enough, just ask for specifics and I'll do my best to answer; if not, I'll just have to keep stumbling ahead as best I can...

Personally, I am quite the novice with FN, beyond the fact that we tried it in our Center for Advanced Computing at Norwich University (VT), where I was director. It failed miserably: the volume of traffic and number of VMs was quite high (over 550 computer/forensic lab VMs, with around 300 live with high I/O at any time), and the number of VMware hosts (11) seemed beyond it. I had a crackerjack team of admins and they could not make it work efficiently, so they eventually replaced it with MS Server 2012 configured as a NAS with 100 TiB of storage, on a house-built system with a SuperMicro system board. Because we designed it from the ground up, I knew what was on it. My own lab, though, is all commercial gear (except my malware lab, which is house-built and has run vSphere 5.1 flawlessly for two years). However, my lab system is minuscule by comparison, and FN seemed like a very good choice. The hardware certainly is robust enough.
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
Which RAID/hard drive controller do you have on the Dell? And how many hard drives of which model and capacity?

The Dell R710 model can come with several different controllers. The ideal device would be the PERC H200, as it is basically an LSI 9211 HBA (Host Bus Adapter) that can be flashed to the correct IT mode and used as a simple HBA with FreeNAS.

When you use non-ZFS terminology ('RAID 0') it makes me wonder -- do you have your Dell R710's disks configured as a RAID 0 array? With this RAID array passed to FreeNAS as a 'single' disk? If so, that's not a supported setup; FreeNAS/ZFS needs direct control of the disks and using a RAID array is a bad idea.

For use as a VMware datastore, it's generally best to use a pool comprised of mirrored vdevs. This gives the most IOPS and eliminates parity overhead. Not knowing how many disks you have, I can't give an example for your particular case. But, suppose you have 12 x 1 TiB drives: you could create a pool of 6 vdevs, each vdev comprising a pair of 1 TiB drives. This would give you 6 TiB of storage (less overhead).
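A minimal Python sketch of the capacity arithmetic above, assuming two-way mirrors and ignoring ZFS metadata and reserved space. The helper name `mirror_pool_layout` is hypothetical, not part of FreeNAS or ZFS, and the second case (6 x 2 TiB) is only illustrative.

```python
# Sketch (hypothetical helper, not a FreeNAS/ZFS API): usable capacity of a pool
# built only from two-way mirror vdevs. ZFS metadata and reserved space are ignored.


def mirror_pool_layout(num_disks: int, disk_size_tib: float) -> tuple:
    """Return (number of 2-way mirror vdevs, approximate usable TiB)."""
    if num_disks < 2 or num_disks % 2:
        raise ValueError("two-way mirrors need an even number of disks (at least 2)")
    vdevs = num_disks // 2
    # Each mirror vdev keeps two copies of its data, so it contributes one
    # disk's worth of space. Rule of thumb: random IOPS scale with the number
    # of vdevs, which is why several small mirrors suit VM datastores.
    usable_tib = vdevs * disk_size_tib
    return vdevs, usable_tib


if __name__ == "__main__":
    # 12 x 1 TiB is the example in the text; 6 x 2 TiB is an illustrative second case.
    for disks, size in [(12, 1.0), (6, 2.0)]:
        vdevs, usable = mirror_pool_layout(disks, size)
        print(f"{disks} x {size:g} TiB -> {vdevs} mirror vdevs, ~{usable:g} TiB usable")
```

Half the raw space is the price of mirroring; a RAIDZ2 vdev would keep more space but behaves like a single vdev for random I/O.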

Can you show us a screenshot of your Volume Status? From the FreeNAS GUI interface, navigate to Storage, select your pool, then click the rightmost of the three buttons at the bottom of the screen ('Volume Status'). Alternatively, you can start a shell and run 'zpool status', displaying the results here (preferably with CODE tags). Either way, this will tell us at a glance how you have your disks configured.

For example, below is the zpool status of my main system, which is configured as a RAIDZ2 pool of 7 x 2TiB disks:
Code:
[root@boomer] /mnt/tank/atm# zpool status
  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0h0m with 0 errors on Tue May 31 03:45:15 2016
config:

        NAME            STATE     READ WRITE CKSUM
        freenas-boot    ONLINE       0     0     0
          mirror-0      ONLINE       0     0     0
            da0p2       ONLINE       0     0     0
            da1p2       ONLINE       0     0     0

errors: No known data errors

  pool: tank
 state: ONLINE
  scan: scrub repaired 0 in 2h19m with 0 errors on Thu May 26 03:19:45 2016
config:

        NAME                                            STATE     READ WRITE CKSUM
        tank                                            ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/d6259956-1e83-11e5-9dfa-000c29a778f1  ONLINE       0     0     0
            gptid/d7a77b85-1e83-11e5-9dfa-000c29a778f1  ONLINE       0     0     0
            gptid/d93713ca-1e83-11e5-9dfa-000c29a778f1  ONLINE       0     0     0
            gptid/dad11601-1e83-11e5-9dfa-000c29a778f1  ONLINE       0     0     0
            gptid/dc54ef62-1e83-11e5-9dfa-000c29a778f1  ONLINE       0     0     0
            gptid/ddddd9a6-1e83-11e5-9dfa-000c29a778f1  ONLINE       0     0     0
            gptid/df6b2800-1e83-11e5-9dfa-000c29a778f1  ONLINE       0     0     0
        logs
          gptid/8776ff62-f37b-11e5-acd8-000c29a778f1    ONLINE       0     0     0

errors: No known data errors
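The vdev names in `zpool status` output (`mirror-0`, `raidz2-0`) are what reveal the layout at a glance. As a rough sketch, assuming Python 3 and output formatted like the example above, the hypothetical helper `summarize_layout` lists the mirror/raidz vdevs in each pool's data section. An empty list means no redundancy vdev was found: the data disks are striped, or the pool sits on a single device such as a hardware RAID volume. The sample text is trimmed, and its disk names (`da2p2` etc.) and the `scratch` pool are made up for illustration.

```python
import re

# Top-level redundancy vdevs as printed by `zpool status`, e.g. "mirror-0", "raidz2-0".
REDUNDANT_VDEV = re.compile(r"^(mirror|raidz[123]?)-\d+$")
# Common auxiliary-device section headers; vdevs listed under them are not data vdevs.
AUX_SECTIONS = {"logs", "cache", "spares"}


def summarize_layout(status_text: str) -> dict:
    """Map each pool name to the redundancy vdevs found in its data section."""
    layout = {}
    pool = None
    in_aux = False
    for line in status_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "pool:" and len(fields) > 1:
            pool, in_aux = fields[1], False  # a new pool starts a new data section
            layout[pool] = []
        elif pool is None:
            continue
        elif fields[0] in AUX_SECTIONS:
            in_aux = True
        elif not in_aux and REDUNDANT_VDEV.match(fields[0]):
            layout[pool].append(fields[0])
    return layout


# Trimmed, illustrative sample; device names and the "scratch" pool are hypothetical.
SAMPLE = """\
  pool: freenas-boot
 state: ONLINE
config:
        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            da0p2     ONLINE       0     0     0
            da1p2     ONLINE       0     0     0

  pool: tank
 state: ONLINE
config:
        NAME          STATE     READ WRITE CKSUM
        tank          ONLINE       0     0     0
          raidz2-0    ONLINE       0     0     0
            da2p2     ONLINE       0     0     0
            da3p2     ONLINE       0     0     0
        logs
          da4p2       ONLINE       0     0     0

  pool: scratch
 state: ONLINE
config:
        NAME          STATE     READ WRITE CKSUM
        scratch       ONLINE       0     0     0
          da5p2       ONLINE       0     0     0
"""

if __name__ == "__main__":
    print(summarize_layout(SAMPLE))
    # {'freenas-boot': ['mirror-0'], 'tank': ['raidz2-0'], 'scratch': []}
```

If a Python 3.7+ interpreter is available on the box, `subprocess.run(["zpool", "status"], capture_output=True, text=True).stdout` could supply the text; otherwise, saved `zpool status` output can be pasted in on another machine.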
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
All of this is great, but can you also describe your switch and how you configured this network teaming? Most people don't do this correctly, don't test it, and then ask questions like 'my network keeps dropping out'.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Personally, I am quite the novice with FN beyond the fact that we tried it in our Center for Advanced Computing at Norwich University (VT) where I was director... it failed miserably because the volume of traffic and number of VMs was quite high (over 550 computer/forensic lab VMs - with around 300 live with high I/O at any time) and the number of VMWare hosts (11) seemed beyond it...

Yeah, that kind of scale will be problematic unless you've actually got someone who knows what they're doing and you're willing to throw resources at the problem. Wouldn't be surprised if something like that required a system with a pair of E5-2643 v3's and a terabyte of RAM and 72 drives in mirrors to make it work well...
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
Interesting... and I'm a little surprised that Windows Server 2012 was usable where FreeNAS was not!

But, returning to @Dr.Stephenson's immediate problem: my suggestions are to troubleshoot his network setup and ensure he has a sensible pool design. As of now, we don't know enough about his system to proceed.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680

I'm not. At all. The widespread familiarity with Windows tends to generate lots of "certified professionals" who should be able to crank out a platform that will perform at an adequate (probably never good or great) level of performance.

Unfamiliarity with ZFS is likely to be very dangerous at large scale, though. ZFS typically requires a lot more resources to do the same tricks, but when you've given it those resources, it'll blow away the other solutions.
 
Joined
May 31, 2016
Messages
5
There are 6 x 2TiB drives in the Dell. I did set them up as RAID 0, a single volume. I am beginning to be convinced that I need to wipe this and start over; as I have made changes I have perhaps exacerbated the problem. I'm going to lose everything on that device (VMs, etc.), but that is a small price to pay. I need this online and 35 VMs built by the end of the weekend, so starting over may be the shortest path...

That said, any recommendations on the setup? I cannot answer the drive controller question at the moment; I will need to dig a bit for that...
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
We can't make anything other than speculative recommendations without knowing what equipment you have. That said:

1> If your R710 has a PERC H200 disk controller, it can be made into a good VM datastore with FreeNAS: you will get the best results setting up a pool of 3 mirror vdevs, where each vdev consists of 2 of your 2TiB drives. This will give you a total usable space of ~6TiB, less overhead. If the controller isn't already flashed to IT mode, you will need to flash it.

2> If your R710 is equipped with the PERC H700 or PERC 6/i disk controller, you will want to consider an alternative to FreeNAS, as those controllers are not a good fit with it.

3> If your R710 has some other disk controller, we'll have to know what it is before we can recommend a course of action.
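The three cases above amount to a small decision table. A hedged Python sketch, where `controller_advice` is a hypothetical helper that only encodes the advice in this post; it does not detect the controller, which you still have to identify yourself.

```python
def controller_advice(controller: str) -> str:
    """Return this post's suggested course of action for an R710 disk controller."""
    # Normalize names such as "PERC H200" or "perc 6/i" to "H200" / "6/I".
    model = controller.strip().upper()
    if model.startswith("PERC "):
        model = model[len("PERC "):].strip()
    if model == "H200":
        return "FreeNAS OK: flash to IT mode if needed, then build 3 mirror vdevs of 2 disks"
    if model in ("H700", "6/I"):
        return "Consider an alternative to FreeNAS"
    return "Unknown controller: identify it before choosing a course of action"


if __name__ == "__main__":
    # "SomeOtherController" is a placeholder for any model not covered above.
    for name in ["PERC H200", "PERC 6/i", "SomeOtherController"]:
        print(f"{name}: {controller_advice(name)}")
```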
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
You will want to figure out what controller you have, because if it is the wrong one then you need to get the correct controller. After that you need to set up your disks in mirrors: 3 vdevs with 2 disks each, for a total of 6TB. But you should only use part of that space to store VMs, so you really only get about 3TB of space (minus overhead and the TB=>TiB conversion). From this alone, I think you should evaluate whether you even have enough storage space. You also mention using NFS as your protocol; to use that with a VM workflow you will probably want a SLOG SSD device to handle the sync writes over NFS.

After you have all that figured out, you can start looking at networking. For 35 VMs you will probably need a 10Gb network to make everything happy, and probably more vdevs once you decide the performance isn't what you want it to be.
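A rough Python sketch of this space estimate. It assumes the drives are 2 TB in decimal units as marketed (hence the TB=>TiB conversion) and a planning target of about half the pool for VM storage (`max_fill=0.5`, an assumption chosen to match the ~3TB figure; keeping block-storage pools well below full is commonly advised for ZFS to limit fragmentation). ZFS metadata and reserved space are still ignored, and the helper name is hypothetical.

```python
TB = 10**12   # decimal terabyte, as drives are marketed
TIB = 2**40   # binary tebibyte, as sizes are typically reported by the OS


def vm_space_tib(num_disks: int, disk_tb: float, max_fill: float = 0.5) -> tuple:
    """Return (mirror pool size in TiB, space to plan on for VMs in TiB)."""
    if num_disks < 2 or num_disks % 2:
        raise ValueError("two-way mirrors need an even number of disks")
    # Two-way mirrors: usable space is one disk per vdev.
    pool_tib = (num_disks // 2) * disk_tb * TB / TIB
    return pool_tib, pool_tib * max_fill


if __name__ == "__main__":
    pool, vms = vm_space_tib(6, 2.0)
    print(f"pool ~{pool:.2f} TiB, plan on ~{vms:.2f} TiB for VMs")
    # pool ~5.46 TiB, plan on ~2.73 TiB for VMs
```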
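Back-of-the-envelope Python arithmetic for the networking point, under deliberately crude assumptions: all 35 VMs busy at once, sharing the raw line rate of a single link evenly, with no protocol overhead (NIC teaming may not spread a single NFS datastore's traffic across uplinks, so one link is the cautious case). Real workloads are burstier, so treat the numbers as an order-of-magnitude comparison only. The function name is hypothetical.

```python
def per_vm_mb_s(link_gbit: float, busy_vms: int) -> float:
    """Even share of one link's raw line rate per busy VM, in MB/s (decimal)."""
    link_mb_s = link_gbit * 1000 / 8   # 1 Gbit/s = 125 MB/s before protocol overhead
    return link_mb_s / busy_vms


if __name__ == "__main__":
    for gbit in (1, 10):
        print(f"{gbit} Gb link, 35 busy VMs: ~{per_vm_mb_s(gbit, 35):.1f} MB/s each")
    # 1 Gb link, 35 busy VMs: ~3.6 MB/s each
    # 10 Gb link, 35 busy VMs: ~35.7 MB/s each
```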
 
Joined
May 31, 2016
Messages
5
Thanks all for your help... I am going to abandon FreeNAS due to the drive controller in the Dell (6/i). I'll need to find something else because time is getting short. Last question: any suggestions for a replacement that won't dislike the controller?
 