Replication Fails over VPN

dbs_64

Dabbler
Joined
Dec 22, 2017
Messages
13
Hello, I'm working on a project to set up a backup server at my parents' house, so that I can send the data on my home server to it over a VPN. I have been able to replicate the data successfully with both servers on my home network, but replication fails over the VPN.

The VPN is being implemented using two pfSense routers in a peer-to-peer connection. I am able to remotely connect to and manage the backup server over the VPN connection. I can also confirm that the home server can ssh to the remote server using the command ssh -i /data/ssh/replication 10.1.1.1.

When I set up replication on the home server, the task runs for a few minutes without ever getting past 0%, and then I get an error in the alerts stating "Replication Pool1/Backup -> 10.1.1.1:ONE_8TB/Home_Backup failed: Failed: Pool1/Backup (auto-20181222.2200-1w)". "netdata" shows no activity on either server to indicate that anything is happening. When I try to do this manually via CLI using the command zfs send Pool1/Backup@auto-20181222.2200-1w | ssh -i /data/ssh/replication 10.1.1.1 zfs recv -F ONE_8TB/Home_Backup, it connects for about 30 seconds and then gives me the error:
Connection to 10.1.1.1 closed by remote host.
warning: cannot send 'Pool1/Backup@auto-20181228.2200-1w': signal received


Something else worth noting is that the Web GUI for the remote server will suddenly and randomly log out, sometimes while I'm in the middle of something. I'm not sure if it has something to do with the VPN connection between the pfSense routers. I don't see any indication of instability in the VPN tunnel, and currently the "remote" router is connected to my home router until I can confirm this works and deploy it at my parents' house.


Both servers are running FreeNAS-11.2-RELEASE.
 
Joined
Dec 29, 2014
Messages
1,135
Does pfSense have the ability to clear the DF (do not fragment) bit on VPN traffic? It could also be that pfSense isn't transporting packets that are too big to fit in the tunnel without being fragmented into multiple packets. That happens because you lose 40-60 bytes of payload space to IPsec. You might need to bring the MTU down for the replication traffic. I don't know if that can be done with replication traffic in FreeNAS, as I haven't tried it, but it is an issue I have seen more than a few times with VPN connections.
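
To put rough numbers on the overhead argument above, here is a minimal Python sketch that works out how much TCP payload fits in one unfragmented packet once tunnel overhead is subtracted. The 40-60 byte range is taken from the post. The function names, the 1500-byte link MTU, and the assumption of 20-byte IPv4 and TCP headers with no options are illustrative assumptions, not measurements from this setup.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def tunnel_mtu(link_mtu: int, overhead: int) -> int:
    """Largest inner packet that still fits in one outer packet of link_mtu bytes."""
    if overhead < 0 or overhead >= link_mtu:
        raise ValueError("overhead must be between 0 and link_mtu")
    return link_mtu - overhead


def max_tcp_payload(link_mtu: int, overhead: int) -> int:
    """TCP payload per packet (MSS) that avoids fragmentation inside the tunnel."""
    return tunnel_mtu(link_mtu, overhead) - IPV4_HEADER - TCP_HEADER


if __name__ == "__main__":
    # Overhead values quoted in the post; the real figure depends on the VPN settings.
    for overhead in (40, 60):
        print(overhead, tunnel_mtu(1500, overhead), max_tcp_payload(1500, overhead))
```

If full-size packets are being dropped, small transfers such as an ssh login can still work while bulk transfers stall, which is at least consistent with the symptoms described.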
 

Holt Andrei Tiberiu

Contributor
Joined
Jan 13, 2016
Messages
129
It does have the DF option, under System, in the Firewall & NAT settings.

Also set NAT Reflection mode for port forwards to Pure NAT.

Also, did you use OpenVPN or IPsec?
Randomly disconnecting from the web GUI can be because you have some NAT misconfiguration and packets are not being routed correctly. This is also why your synchronization is not working.
 

dbs_64

Dabbler
Joined
Dec 22, 2017
Messages
13
On both pfSense routers, I enabled the DF option and set NAT reflection to Pure NAT.

I also made sure all devices were set to use the standard 1500 byte MTU. Only the LAN interface on the home pfSense router was set to use 9000 byte MTUs, but since the "managed" switch that everything on my network connects to has no MTU option, I don't think that made any difference.

Lastly, I went through the "Advanced" and "Interface" settings on both routers and made sure they matched where applicable, since I had followed some guides on YouTube when first setting up my home pfSense router, and I'm fairly certain that I had tuned that system pretty well.

Both replication through the Web GUI and CLI failed just as they did before. Also, the Web GUI on my backup server is still logging me out randomly. It logged out while I was typing this, but my home server is still logged in.

Last night, I also tried file sharing over the VPN. I tried copying a folder to the SMB share on the backup server. Simply connecting to it was very slow, and the file transfer, just like replication, never got past 0%. Eventually it timed out. However, looking at the shared folder, I did see that some data had copied over, but it was just one file, and it was likely incomplete. Some data is getting through, but it's very slow and eventually times out. I just tried it again, and the results were the same.

My VPN is OpenVPN, set up as a peer-to-peer connection.

I'm wondering if the fact that my "remote" router and server are hanging off my home network (again, I'm testing this here until it's working, as my parents' house is 30+ minutes away) is causing an issue with routing, as my "remote" router likely has routes to my home network through both its WAN and VPN interfaces. Perhaps the remote router doesn't always know what to do with the reply traffic from the backup server. I suppose I could take the hardware over to my parents' house and test it there.
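
The suspicion above, that the "remote" router can reach the home network through two interfaces while it sits on the test bench, can be sketched in a few lines of Python. This is only an illustration: the interface names and the routing table are hypothetical, and the subnets (10.0.0.0/16 for home, 10.1.0.0/16 for remote) are assumed from the addressing scheme described later in the thread, not read from either router.

```python
import ipaddress
from collections import defaultdict


def ambiguous_routes(routes):
    """Return {network: [interfaces]} for prefixes reachable via more than one interface."""
    by_network = defaultdict(list)
    for cidr, interface in routes:
        by_network[ipaddress.ip_network(cidr)].append(interface)
    return {net: ifaces for net, ifaces in by_network.items() if len(ifaces) > 1}


# Hypothetical view of the "remote" router while its WAN port is plugged into the home LAN.
test_bench_routes = [
    ("10.0.0.0/16", "wan"),   # home network, directly connected on the WAN side
    ("10.0.0.0/16", "ovpn"),  # the same home network, reached through the tunnel
    ("10.1.0.0/16", "lan"),   # remote LAN behind this router
]

if __name__ == "__main__":
    for network, interfaces in ambiguous_routes(test_bench_routes).items():
        print(network, "is reachable via", ", ".join(interfaces))
```

Once the router is deployed at the other site, its WAN side would no longer sit inside the home subnet, so this particular overlap should disappear.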

One last thing. I had issues in the past where I couldn't access the backup server's Web GUI over the VPN. It would get as far as the "Connecting to NAS" screen, but the logon page never came up. I tried taking the hardware (server and router) to my parents’ house and found that I could access my home server remotely, but when I came back home, I still could not access the backup server sitting at my parents’ house, though I could access the Web GUI of the remote pfSense router. I then had the random idea to set up HTTPS on both servers. Once I did that, I could suddenly access the backup server's Web GUI over the VPN, though at this point everything was back at my place on my home network. I have no idea why this worked. Also, in case this made any difference, I imported the CA certs from each server into the other. So, the home server has the backup server’s CA cert and vice versa.
 

Holt Andrei Tiberiu

Contributor
Joined
Jan 13, 2016
Messages
129
In Firewall > Rules > OpenVPN, did you add an any-to-any pass rule? On both servers?

Also, I saw that you used a /16 mask. That means 65,534 IPs. I would not do that. It will be difficult to trace and debug, and for home use it is overkill. Rather use /24 networks, e.g. 10.1.0.1-10.1.0.254.
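
The host counts mentioned above can be checked with Python's standard `ipaddress` module. This is just a small sketch; the helper names are made up for illustration, and the 10.1.0.0 prefixes are the ones used as examples in this thread.

```python
import ipaddress


def usable_hosts(cidr: str) -> int:
    """Number of assignable host addresses in an IPv4 network."""
    network = ipaddress.ip_network(cidr)
    # Subtract the network and broadcast addresses (applies to prefixes shorter than /31).
    return network.num_addresses - 2


def host_range(cidr: str):
    """First and last assignable host address, as strings."""
    hosts = list(ipaddress.ip_network(cidr).hosts())
    return str(hosts[0]), str(hosts[-1])


if __name__ == "__main__":
    print(usable_hosts("10.1.0.0/16"))  # size of the /16 discussed above
    print(usable_hosts("10.1.0.0/24"))  # size of the suggested /24
    print(host_range("10.1.0.0/24"))
```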

For the VPN tunnel network, use /24.
https://www.youtube.com/watch?v=seScJty_VL8


Also, for the VPN, go with peer-to-peer shared key. You will have to redo the setup on both; it is more complicated at first look, but more transparent.

https://www.netgate.com/docs/pfsense/book/openvpn/site-to-site-example-configuration-ssl-tls.html
 

dbs_64

Dabbler
Joined
Dec 22, 2017
Messages
13
In Firewall > Rules > OpenVPN, did you add an any-to-any pass rule? On both servers?
I thought this might have been the issue, but I forgot that I already had a rule on the server router that was created from the OpenVPN wizard back when I tried a Remote Access VPN with the backup FreeNAS as the client. Nothing changed when I switched to Peer-to-Peer using a second pfSense router, but I had (most of) the hardware lying around, and the second pfSense router will be a vast improvement over my parents' current, all-in-one wireless router, so I'll keep it.

Here's the screenshot of the OpenVPN firewall rule on the server side. All I did was change the description just now. https://imgur.com/UT86nFZ

Also, I saw that you used a /16 mask. That means 65,534 IPs. I would not do that. It will be difficult to trace and debug, and for home use it is overkill. Rather use /24 networks, e.g. 10.1.0.1-10.1.0.254.
I realize using /16 networks is unconventional, but hear me out. The first octet is simply "10". The second octet represents the network ("0" for home network, "1" for remote network, "10" for VPN network). The third octet represents the kind of device ("0" for network equipment, "1" for servers, "2" for PCs, etc.). The fourth octet is the individual client (10.0.1.1 is FreeNAS, 10.0.1.2 is Plex, etc.).

As far as I know, there is no real disadvantage to using a /16 network. Even a /24 is overkill for my needs. I like to keep everything organized using static assignments, and I know what the device is simply by its IP address. If there are arguments against using /16, I'm open to hear them.
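
The octet scheme described above is simple enough to express as a lookup. The following Python sketch decodes an address according to that scheme; the table contents come from the post, while the function name and the "unknown" fallback are illustrative assumptions. It only applies to the /16 networks, since the VPN tunnel network is a /24 and does not use the third octet this way.

```python
import ipaddress

# Second octet: which network (values taken from the post).
NETWORKS = {0: "home", 1: "remote", 10: "vpn"}
# Third octet: kind of device (the post lists these three and "etc.").
DEVICE_KINDS = {0: "network equipment", 1: "servers", 2: "PCs"}


def describe(address: str):
    """Decode a 10.x.y.z address into (network, device kind, client number)."""
    first, network, kind, client = ipaddress.IPv4Address(address).packed
    if first != 10:
        raise ValueError("scheme only covers addresses in 10.0.0.0/8")
    return (
        NETWORKS.get(network, "unknown"),
        DEVICE_KINDS.get(kind, "unknown"),
        client,
    )


if __name__ == "__main__":
    print(describe("10.0.1.1"))  # the home FreeNAS server in the post
    print(describe("10.1.1.1"))  # the replication target used in this thread
```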

I did use /24 for the VPN network, since there's no need for complex organization there. I watched the video, and I already did everything he did, save for using better crypto algorithms.

Also, for the VPN, go with peer-to-peer shared key. You will have to redo the setup on both; it is more complicated at first look, but more transparent.

https://www.netgate.com/docs/pfsense/book/openvpn/site-to-site-example-configuration-ssl-tls.html
Did you mean use SSL/TLS? I'm already using shared key. I might change to SSL/TLS in the future, but I can't say I see the need at this time. Once I get the VPN fully working, I'll explore this option. Then again, I might try it later when I get the chance. Like I previously mentioned, using HTTPS shouldn't have made any difference, but once I did, I was suddenly able to access the Web GUI of the remote server over the VPN.
 

Holt Andrei Tiberiu

Contributor
Joined
Jan 13, 2016
Messages
129
I still would not use /16 (mask: 255.255.0.0).
We do not even use a /16 mask at clients which have more than 300 network devices; we use VLANs.

Exactly because now it is a mess to debug.
And yes, I meant SSL/TLS with a certificate for every client.
It might look more complex, but it works perfectly.

In the Client Specific Overrides tab in pfSense you specify the client's subnet(s); in the server you specify your own subnet(s) and all the remote subnet(s).

If you go to Diagnostics > Routes, you can see which interface each IP is routed on.
Check whether the remote subnet at your parents' house is routed through your OpenVPN interface.
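
What that check amounts to is a longest-prefix-match lookup: of all routes that contain the destination, the most specific one decides the interface. Here is a minimal Python sketch of that lookup. The routing table is hypothetical (the interface names and the 10.10.0.0/24 tunnel network are assumptions for illustration), so compare it against what the routes screen actually shows rather than treating it as the real table.

```python
import ipaddress


def lookup(routes, destination: str):
    """Return the interface of the most specific route containing destination, or None."""
    dest = ipaddress.ip_address(destination)
    matches = []
    for cidr, interface in routes:
        network = ipaddress.ip_network(cidr)
        if dest in network:
            matches.append((network.prefixlen, interface))
    if not matches:
        return None
    # Longest prefix wins.
    return max(matches)[1]


# Hypothetical table for the home router.
home_routes = [
    ("0.0.0.0/0", "wan"),      # default route
    ("10.0.0.0/16", "lan"),    # home LAN
    ("10.1.0.0/16", "ovpn"),   # remote LAN, expected to go through the tunnel
    ("10.10.0.0/24", "ovpn"),  # tunnel network (assumed prefix)
]

if __name__ == "__main__":
    print(lookup(home_routes, "10.1.1.1"))  # the backup server's address in this thread
```

If the backup server's address resolved to anything other than the OpenVPN interface here, replication traffic would be leaving by the wrong path.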

Also, you should delete any settings made by the OpenVPN wizard.
And do a reboot. If you change routing settings, some of them take time to be applied.
 

Holt Andrei Tiberiu

Contributor
Joined
Jan 13, 2016
Messages
129
I am not saying that /16 will not work, only that it is not best practice.
The first thing I learned at Cisco was to always keep things as simple as you can.
Now, can you post a screenshot of your routes screen?
And if you want to keep things organised, buy an L2 managed switch and make VLANs, each with its own subnet:
10 - pc - 192.168.10.0/24
20 - servers - 192.168.20.0/24
30 - wifi - 192.168.30.0/24
40 - guest - 192.168.40.0/24

This way, if you have a broadcast storm on one VLAN, it will not take your whole network down, only that specific VLAN.
pfSense knows how to manage VLANs really well.
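
The VLAN plan above follows a pattern where the VLAN ID doubles as the third octet of a 192.168.x.0/24 subnet. A short Python sketch of that mapping follows; the VLAN IDs and names are the ones listed in the post, while the helper names are invented for illustration.

```python
import ipaddress

# VLAN ID -> purpose, as listed in the post.
VLANS = {10: "pc", 20: "servers", 30: "wifi", 40: "guest"}


def vlan_subnet(vlan_id: int):
    """Subnet for a VLAN, using the VLAN ID as the third octet."""
    if not 0 <= vlan_id <= 255:
        raise ValueError("this scheme needs the VLAN ID to fit in one octet")
    return ipaddress.ip_network(f"192.168.{vlan_id}.0/24")


def vlan_for(address: str):
    """Return (vlan_id, name) for an address, or None if it is in no listed VLAN."""
    addr = ipaddress.ip_address(address)
    for vlan_id, name in VLANS.items():
        if addr in vlan_subnet(vlan_id):
            return vlan_id, name
    return None


if __name__ == "__main__":
    for vlan_id, name in VLANS.items():
        print(vlan_id, name, vlan_subnet(vlan_id))
```

Like the octet scheme earlier in the thread, this keeps a device's role readable from its address, but each group gets its own broadcast domain.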
 