Failover with one FreeNAS box and 2 Switches

mpyusko

Dabbler
Joined
Jul 5, 2019
Messages
49
Here is my Network Diagram....
1562342882410.png


Recently I experienced a switch failure and half the VMs lost their storage. The reason is that the NAS has 2 NICs. One was originally configured with a public IP and the other with a private IP. One NIC was plugged into A and the other was plugged into B. Depending on how the SR (Storage Repository) was mapped, it lost its connection to the NAS (NFS via public IP :rolleyes:, NFS via private IP, iSCSI via private IP). So I replaced switch A and, with each NIC still plugged into opposite switches, configured the NICs
1562343283238.png

and bound both IPs to lagg0. Now both SR mappings work, but periodically I am getting disk errors displayed by my Linux VMs. I haven't made any changes to the switches to configure them for LAGG or whatnot. My goal is to suffer a switch failure and not interrupt the VMs (technically there are several Xen hypervisors in the pool). I realize this turns the FreeNAS box into a single point of failure, but that will be rectified shortly. Also, all but 1 pool currently maps to the private IP. That will be fixed shortly too, so they all map privately. These are managed 24-port switches, so configuring them is possible. Switch A is still factory default. Switch B may have some port configurations on it; I am still looking into it.

Can you please give me some advice on how to rectify this?

Thanks.
FreeNAS-9.10.2-U6
 

proto

Patron
Joined
Sep 28, 2015
Messages
269
Can you please give me some advice on how to rectify this?

In my humble opinion, so take it "as is": you should fix that single point of failure represented by that NAS first while considering networking redundancy.
We manage a similar setup, except that the hypervisors are VMware and we have an iSCSI SAN as the backend for mission-critical applications. The switches are redundant, and so is the SAN: it has a redundant controller, so the risk of "losing" the connection with the hosts is reduced***.
You might consider TrueNAS, which seems to have that functionality.

Networking: you really don't need/want LAGG/LACP on iSCSI channels. It's bad practice. I would configure dedicated iSCSI links, each with its own IP and NIC. For all other purposes (NAS management, NFS, etc.) I would buy another dual-port NIC.
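To illustrate the idea of dedicated iSCSI links instead of a LAGG, here is a toy Python model of how a multipath initiator might behave: each path has its own portal address, and a request is simply retried on the next path that is still up. This is only a sketch, not FreeNAS or XenServer configuration; the portal addresses, the `send_with_multipath` helper and the `AllPathsDown` exception are hypothetical names made up for the example.

```python
from typing import Callable, List


class AllPathsDown(Exception):
    """Raised when no storage path is usable (hypothetical helper type)."""


def send_with_multipath(
    portals: List[str],
    path_is_up: Callable[[str], bool],
    request: str,
) -> str:
    """Send a request over the first portal whose path is currently up.

    Each portal stands for one dedicated NIC/IP pair on the NAS, so losing
    one switch only removes one entry from the usable list.
    """
    for portal in portals:
        if path_is_up(portal):
            return f"{request} via {portal}"
    raise AllPathsDown(f"no usable path for {request!r}")


# Hypothetical portal addresses: one per dedicated iSCSI NIC, one per switch.
portals = ["10.0.1.10", "10.0.2.10"]

# Simulate switch A failing: only the path through switch B stays up.
still_up = {"10.0.2.10"}
result = send_with_multipath(portals, lambda p: p in still_up, "READ block 42")
print(result)
```

The point of the design is that redundancy lives in the initiator (it knows about both paths), so the switches need no special configuration.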

You might want a similar setup:

Screenshot 2019-07-07 08.57.57.png

Where A is iSCSI
B other protocols
C and D redundant links on Hosts


*** except when DC guys play with both power lines at the same time...
 

mpyusko

Dabbler
Joined
Jul 5, 2019
Messages
49
Yeah, so we actually had an outage in our DC because someone (who has probably since been..... handled) managed to fry multiple power conditioners at the same time. So not going to rule that one out.

So as I said, some of these shares are mapped via NFS and not iSCSI. I guess what I would like to see is more combined bandwidth when all links are active, but reduced bandwidth and no interruption if any link fails.

To toss another "just found out" into the mix.... two ports on the FW are bridged, each going to a switch. When the firewall goes down (to reboot for a firmware update) the link between the two switches is broken, and this too is causing an issue with some VMs.
 

proto

Patron
Joined
Sep 28, 2015
Messages
269
So as I said, some of these shares are mapped via NFS and not iSCSI. I guess what I would like to see is more combined bandwidth when all links are active, but reduced bandwidth and no interruption if any link fails.

In any case, I think you would do well to consider a solution like the one in my figure: a minimum of two links per switch, "crossed" for redundancy. Better still, have 4, that is, a total of 8 NICs on the NAS, if redundant.
I'll tell you more: before buying an iSCSI SAN we "tried" a NAS (NFS). The network configuration was similar, but some complications made us opt for the SAN.

I do not think that having only one redundant link is sufficient to guarantee the service. I also see that in your first post the lagg was set to "load balance". I won't go much further because I could be talking nonsense, but I don't think it's the best configuration in this case: if I remember correctly, when one link "falls" the other takes over all the requests, but there is a short period during which virtual machines are likely to lose their connection to storage.
It would probably be better to use "failover".
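For what it's worth, the difference between the two modes can be sketched as a toy Python model. It is a simplification of the behaviour described for the FreeBSD lagg "failover" and "loadbalance" protocols, not the real driver code; the `Port` and `Lagg` classes and the `igb0`/`igb1` interface names are assumptions made up for the example.

```python
import zlib
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Port:
    name: str
    link_up: bool = True


@dataclass
class Lagg:
    ports: List[Port]  # the first port is treated as the master

    def active_ports(self) -> List[Port]:
        return [p for p in self.ports if p.link_up]

    def pick_failover(self) -> Optional[Port]:
        """Failover: all traffic uses the first port whose link is up."""
        active = self.active_ports()
        return active[0] if active else None

    def pick_loadbalance(self, flow: str) -> Optional[Port]:
        """Load balance: a hash of the flow picks one of the active ports."""
        active = self.active_ports()
        if not active:
            return None
        return active[zlib.crc32(flow.encode()) % len(active)]


lagg = Lagg([Port("igb0"), Port("igb1")])

# With both links up, failover always transmits on the master port,
# while load balancing may use either port depending on the flow.
print(lagg.pick_failover().name)
print(lagg.pick_loadbalance("host1->nas:nfs").name)

# Simulate the switch behind igb0 failing.
lagg.ports[0].link_up = False
print(lagg.pick_failover().name)
```

In this model, load balancing sends frames with the same source out through both switches while both links are up, which is the kind of thing unconfigured, independent switches may not handle well. Failover keeps the traffic on one switch at a time.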

Apart from that, the best thing would be to ask for a quote for a redundant SAN or NAS including network configuration.

When the firewall goes down

Unfortunately, this single firewall clearly introduces another pitfall.
It too should be redundant and properly configured, but it would be better to work that out with whoever manages the firewall and security.
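
To make the single-point-of-failure reasoning concrete, here is a small Python sketch that removes one device at a time from a topology and checks whether a host can still reach the NAS. Both topologies are hypothetical simplifications, not the actual network from the diagram; the node names and the `build_links`, `reachable` and `single_points_of_failure` helpers are assumptions for the example.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple


def build_links(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build an undirected adjacency map from cable pairs."""
    links: Dict[str, Set[str]] = {}
    for a, b in pairs:
        links.setdefault(a, set()).add(b)
        links.setdefault(b, set()).add(a)
    return links


def reachable(links: Dict[str, Set[str]], src: str, dst: str,
              failed: Iterable[str] = ()) -> bool:
    """Breadth-first search from src to dst, skipping failed devices."""
    down = set(failed)
    if src in down or dst in down:
        return False
    seen = {src}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for neighbour in links.get(node, ()):
            if neighbour not in down and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return False


def single_points_of_failure(links: Dict[str, Set[str]],
                             src: str, dst: str) -> List[str]:
    """Devices whose failure alone cuts src off from dst."""
    return sorted(
        node for node in links
        if node not in (src, dst) and not reachable(links, src, dst, [node])
    )


# Hypothetical "before": the host is cabled only to switch B, the NAS address
# is reachable only through switch A, and the firewall bridge joins the two.
before = build_links([
    ("host", "switchB"), ("switchB", "firewall"),
    ("firewall", "switchA"), ("switchA", "nas"),
])

# Hypothetical "after": host and NAS are both cabled to both switches.
after = build_links([
    ("host", "switchA"), ("host", "switchB"),
    ("nas", "switchA"), ("nas", "switchB"),
    ("switchA", "firewall"), ("switchB", "firewall"),
])

print(single_points_of_failure(before, "host", "nas"))
print(single_points_of_failure(after, "host", "nas"))
```

The graph only models cabling. It does not capture which path the lagg or the initiator actually uses, so an empty result is a necessary condition for surviving a failure, not a guarantee.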
 