FreeNAS 11.1 U6 rebooting randomly when transferring via iSCSI and VMware ESXi


titanve

Explorer
Joined
Sep 12, 2018
Messages
52
Hello everyone,

My FreeNAS is rebooting randomly when using iSCSI to back up my VMs with vSphere Data Protection. I added the device to the ESXi host without trouble, and I have two RJ45 crossover cables connecting the FreeNAS to the ESXi host. I wonder if I'm missing some network configuration. I set the MTU to 9000 as suggested in a tutorial; is that OK?

I'm attaching some images so you can see my setup and its behavior while doing the backup. The FreeNAS runs on an HP ML350 G6 server with 24 GB of RAM. Both NICs are gigabit.

As you can see in the pictures, sometimes the link on both NICs goes down.

I tested the RAM, CPU, and boot device and they are OK. My RAID controller (configured as RAID10) is the HP Smart Array P410i.

Network description:
FreeNAS <-----> ESXi
10.0.0.1/24 <-----> 10.0.0.2/24 (both MTU 9000)
10.0.1.1/24 <-----> 10.0.1.2/24 (both MTU 9000)

Thanks for your help

IMAGES:
IMAG1454.jpg

IMAG1446.jpg
IMAG1462.jpg
 

dlavigne

Guest
Replace the Realtek NIC with an Intel one. Realtek NICs are known to be unable to handle load.
 

titanve

Explorer
Joined
Sep 12, 2018
Messages
52
Replace the Realtek NIC with an Intel one. Realtek NICs are known to be unable to handle load.
Where can I see if one of the NICs is a Realtek?
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
My RAID controller (configured as RAID10) is the HP Smart Array P410i.

ZFS should not be used with a RAID controller. This is a major issue and may require backing up and restoring your pool to correct. Please post the output of zpool status -v inside of [code][/code] tags to start with.

Where can I see if one of the NICs is a Realtek?

The use of the re driver in your screenshot shows that there is a Realtek NIC being used somewhere, which is even more confusing because the ML350 G6's onboard NIC (NC326i) is a Broadcom, not a Realtek. Have you installed an additional network card beyond the onboard ones?
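
If you want to check from the FreeNAS shell, the driver prefix in the interface name already gives the vendor away, and pciconf will show the full vendor and model strings; a quick sketch:

Code:
# List interface names; the driver prefix identifies the vendor:
#   re = Realtek, bge = Broadcom, em/igb = Intel
ifconfig -l
# Full vendor/model strings for every PCI device, NICs included
pciconf -lv | more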

I set the MTU to 9000 as suggested in a tutorial; is that OK?

In a network with stability issues (e.g. flaky cables, weak NICs), jumbo frames can hurt you more than they help. You'll send fewer of them, but each one that is lost hurts more.
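
If you do keep jumbo frames for now, at least confirm that a full 9000-byte frame survives each link end to end; a quick sketch with don't-fragment pings, using the addresses from your first post:

Code:
# From the FreeNAS shell: 8972 bytes of payload + 8 ICMP + 20 IP = 9000, DF bit set
ping -D -s 8972 10.0.0.2
# From the ESXi shell, the same test in the other direction
vmkping -d -s 8972 10.0.0.1

If either test fails, the MTU isn't consistent end to end.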
 

titanve

Explorer
Joined
Sep 12, 2018
Messages
52
ZFS should not be used with a RAID controller. This is a major issue and may require backing up and restoring your pool to correct. Please post the output of zpool status -v inside of [code][/code] tags to start with.
zpool status -v result:
Code:
[root@freenas ~]# zpool status -v
  pool: HDDLABB
 state: ONLINE
  scan: none requested
config:

        NAME                                          STATE     READ WRITE CKSUM
        HDDLABB                                       ONLINE       0     0     0
          gptid/31e18360-a6f1-11e8-91b6-0025b3adf25c  ONLINE       0     0     0

errors: No known data errors

  pool: freenas-boot
 state: ONLINE
  scan: resilvered 1.65G in 0 days 00:12:27 with 0 errors on Tue Sep 11 10:30:19 2018
config:

        NAME            STATE     READ WRITE CKSUM
        freenas-boot    ONLINE       0     0     0
          mirror-0      ONLINE       0     0     0
            da1p2       ONLINE       0     0     0
            da2p2       ONLINE       0     0     0

errors: No known data errors
[root@freenas ~]#

Can you guide me through changing from my RAID10 to plain ZFS? I have 8 HDDs.


The use of the re driver in your screenshot shows that there is a Realtek NIC being used somewhere, which is even more confusing because the ML350 G6's onboard NIC (NC326i) is a Broadcom, not a Realtek. Have you installed an additional network card beyond the onboard ones?
Yes, I added another NIC (re0) in order to have two iSCSI connections to my ESXi host. What I'm going to do is move the management network to re0 and use the other onboard port for iSCSI. I'll report back when I make the change.




In a network with stability issues (e.g. flaky cables, weak NICs), jumbo frames can hurt you more than they help. You'll send fewer of them, but each one that is lost hurts more.
What is the suggested MTU then?
 

titanve

Explorer
Joined
Sep 12, 2018
Messages
52
Hello,

I changed my NIC configuration, so now I access the web UI via re0 and both Broadcom NICs are used for iSCSI.

I should mention that I originally had a single USB drive as my boot/installation media. It crashed, so I bought two new USB drives, reinstalled FreeNAS 11.1 on one of them, and then set up the mirror under System -> Boot so the system can still boot if one of them fails.

I'm still getting some errors, and as you can see, even on the ESXi side I get a timeout error (despite that timeout, the backup is being transferred by the vSphere Data Protection appliance without any other trouble, but it has been stuck at 92% for a while). What do you suggest?

I still want to know whether I should keep MTU = 9000 on both Broadcom NICs.

Why don't I see any network activity for the two iSCSI NICs on the Reporting -> Network tab?

Please check these images:
Captura de pantalla 2018-09-12 a la(s) 12.05.37 p. m..png

IMAG1469.jpg

IMAG1470.jpg

IMAG1471.jpg

Captura de pantalla 2018-09-12 a la(s) 12.14.40.jpg
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
zpool status -v result:
(trimmed for brevity)
Can you guide me through changing from my RAID10 to plain ZFS? I have 8 HDDs.

You have created a single RAID10 vdisk on your hardware RAID controller. The only way you will be able to create a proper pool from this is to back up your data, destroy the pool and virtual disk entirely, add a proper HBA (Host Bus Adapter) and not a RAID card to your system, create a new pool from mirror vdevs inside of FreeNAS, and restore the data from backup.
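
For illustration only, the target layout at the command line would look roughly like the sketch below; the device names da0 through da7 are hypothetical, and on FreeNAS you would actually build this through the Volume Manager in the web UI so the middleware manages the pool:

Code:
# Four two-way mirror vdevs striped together: the ZFS equivalent of RAID10
# (hypothetical device names; use the FreeNAS web UI to do this for real)
zpool create HDDLABB mirror da0 da1 mirror da2 da3 mirror da4 da5 mirror da6 da7

Striped mirrors give you roughly the same usable space as your current RAID10, but ZFS gets to see, checksum, and repair the individual disks.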

The LSI 9207-8i is the current 6Gbps favorite; you may also find this rebranded as other OEM controllers, or the older LSI 9211-8i which can also be found rebranded.

I am assuming you are using 8x 2.5" SFF drives with the default cage in the ML350 G6 - thankfully this uses standard SFF-8087 connectors on the back of the cage and should allow connections to another non-HP HBA installed in another PCIe slot. You may require longer cables to physically reach the slot though.

Yes, I added another NIC (re0) in order to have two iSCSI connections to my ESXi host. What I'm going to do is move the management network to re0 and use the other onboard port for iSCSI. I'll report back when I make the change.

The Broadcom driver set isn't quite as good as Intel but it's still miles ahead of Realtek. This should help somewhat.

What is the suggested MTU then?

The default of 1500.

I would suggest checking, testing, and if necessary replacing your network cables between your FreeNAS and ESXi machines.
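
A couple of quick checks you can run from the FreeNAS shell while a backup is running (a sketch; I'm assuming bge0/bge1 are now your two iSCSI interfaces and the ESXi addresses from your first post are unchanged):

Code:
# Error counters on each iSCSI link; climbing Ierrs/Oerrs usually points at cabling
netstat -i -I bge0
netstat -i -I bge1
# Sustained full-size pings (1472 + 28 = 1500 bytes once you're back at MTU 1500);
# any loss at all on a direct crossover cable is suspect
ping -c 1000 -s 1472 10.0.0.2
ping -c 1000 -s 1472 10.0.1.2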
 

titanve

Explorer
Joined
Sep 12, 2018
Messages
52
You have created a single RAID10 vdisk on your hardware RAID controller. The only way you will be able to create a proper pool from this is to back up your data, destroy the pool and virtual disk entirely, add a proper HBA (Host Bus Adapter) and not a RAID card to your system, create a new pool from mirror vdevs inside of FreeNAS, and restore the data from backup.

The LSI 9207-8i is the current 6Gbps favorite; you may also find this rebranded as other OEM controllers, or the older LSI 9211-8i which can also be found rebranded.

I am assuming you are using 8x 2.5" SFF drives with the default cage in the ML350 G6 - thankfully this uses standard SFF-8087 connectors on the back of the cage and should allow connections to another non-HP HBA installed in another PCIe slot. You may require longer cables to physically reach the slot though.



The Broadcom driver set isn't quite as good as Intel but it's still miles ahead of Realtek. This should help somewhat.



The default of 1500.

I would suggest checking, testing, and if necessary replacing your network cables between your FreeNAS and ESXi machines.

I'm using 8 x 3.5" drives (2 TB each). So I understand I shouldn't keep using my RAID10 (right now I have an 8 TB volume) and should buy the LSI 9207-8i.

I'm going to change all MTUs to 1500.

I made the crossover network cables myself; I'll test them.

Thank you
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
We're outside the purview of the "Networking" forum now, but I assume you are using 6x drives in the lower hot-swap cage, and the optional 2x 3.5" expansion cage up top? Make sure that the only cables going to the board are the 2x SFF-8087 mini-SAS cables, and you don't need to find a way to connect the upper cage directly.

If the cables are faulty or marginal, this will cause the dropouts - but the likely cause of your system crashing is the RAID card overheating. As I recall, the P400/P410 series liked to run very hot.
 

titanve

Explorer
Joined
Sep 12, 2018
Messages
52
We're outside the purview of the "Networking" forum now, but I assume you are using 6x drives in the lower hot-swap cage, and the optional 2x 3.5" expansion cage up top? Make sure that the only cables going to the board are the 2x SFF-8087 mini-SAS cables, and you don't need to find a way to connect the upper cage directly.

If the cables are faulty or marginal, this will cause the dropouts - but the likely cause of your system crashing is the RAID card overheating. As I recall, the P400/P410 series liked to run very hot.

This is my RAID setup on the server:
IMAG1472.jpg

IMAG1473.jpg

IMAG1474.jpg

IMAG1476.jpg

IMAG1477.jpg

IMAG1478.jpg
 

titanve

Explorer
Joined
Sep 12, 2018
Messages
52
I'm seeing some regular communication between the ESXi host and the FreeNAS:
upload_2018-9-12_14-46-19.png
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
If they're all working now that means you've got the necessary Y-cable to wire it up, but you'll need to determine if that Y-cable will be long enough to reach the SAS port on a card installed in the PCIe bay, or if you'll have to use some manner of extension.

362097.bmp
 

titanve

Explorer
Joined
Sep 12, 2018
Messages
52
If they're all working now that means you've got the necessary Y-cable to wire it up, but you'll need to determine if that Y-cable will be long enough to reach the SAS port on a card installed in the PCIe bay, or if you'll have to use some manner of extension.

362097.bmp

Ok, I understand. I don't think the cables will reach a card in the PCIe bay.

What I understand is that if I buy the LSI card, it won't come with any SAS cables attached the way I have them on the motherboard right now. Am I understanding right?

Thanks
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Ok, I understand. I don't think the cables will reach a card in the PCIe bay.

What I understand is that if I buy the LSI card, it won't come with any SAS cables attached the way I have them on the motherboard right now. Am I understanding right?

Thanks
Correct, it will come as a bare card.

The problem is that your server's drive cages require that specific Y-cable because of the bay-to-port mappings on the two cages. Lower bays 1-4 go through the first port, bays 5-6 through the second port, and then your upper two trays 7-8 have a third SFF-8087 port. The custom HP Y-cable splits those four lanes into two physical plugs of two lanes each.

If the Y-cable won't reach, you'd have to find a way to extend it - which isn't commonly done, and might introduce more potential issues from the coupler.

In your case, since you are using SATA drives, you could use a SATA-to-SFF-8087 "reverse breakout cable" that would let you connect two of your motherboard's SATA ports to the SFF-8087 port of the upper bays, and then just get two longer SFF-8087 cables to handle the lower six bays.
 

titanve

Explorer
Joined
Sep 12, 2018
Messages
52
Correct, it will come as a bare card.

The problem is that your server's drive cages require that specific Y-cable because of the bay-to-port mappings on the two cages. Lower bays 1-4 go through the first port, bays 5-6 through the second port, and then your upper two trays 7-8 have a third SFF-8087 port. The custom HP Y-cable splits those four lanes into two physical plugs of two lanes each.

If the Y-cable won't reach, you'd have to find a way to extend it - which isn't commonly done, and might introduce more potential issues from the coupler.

In your case, since you are using SATA drives, you could use a SATA-to-SFF-8087 "reverse breakout cable" that would let you connect two of your motherboard's SATA ports to the SFF-8087 port of the upper bays, and then just get two longer SFF-8087 cables to handle the lower six bays.

OK, I understand. So apart from the PCIe card and the cables, do I need to buy anything else?
 

titanve

Explorer
Joined
Sep 12, 2018
Messages
52
Here you can see the traffic, because I'm running a backup as a test. The thing is, I only see traffic through bge0 and not much through bge1. Is that normal? Shouldn't the traffic be going over both iSCSI NICs?
Captura de pantalla 2018-09-12 a la(s) 3.55.19 p. m..png
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
"Normal" as in the defaults, but not "optimal."

Check your ESXi path selection policy; it's probably VMW_PSP_MRU for "Most Recently Used". You want to switch to VMW_PSP_RR (Round Robin), which will cycle between all available paths:

https://kb.vmware.com/s/article/2000552

And then change your RR IOPS limit from the default 1000 to a lower number (1 works, if you're wondering):

https://kb.vmware.com/s/article/2069356
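
From the ESXi shell the change looks roughly like this; the naa.* identifier is a placeholder for your FreeNAS LUN's device ID, and the KB articles above walk through the same steps:

Code:
# Find the device ID (naa.*) of the FreeNAS iSCSI LUN
esxcli storage nmp device list
# Switch its path selection policy to Round Robin
esxcli storage nmp device set --device naa.XXXXXXXXXXXXXXXX --psp VMW_PSP_RR
# Lower the IOPS limit so I/O alternates between the two paths more often
esxcli storage nmp psp roundrobin deviceconfig set --device naa.XXXXXXXXXXXXXXXX --type iops --iops 1

After that you should see roughly even traffic on bge0 and bge1 during a backup.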
 
