[SOLVED] Remote replication always fails

Status: Not open for further replies.

SRA (Dabbler, joined Jun 16, 2017, 11 messages)
Hi guys,

I am trying to set up remote replication of a 2 TB ZFS zvol from ServerA (FreeNAS 11-U1) to ServerB (FreeNAS 11-U1), and it always fails.
I used the semi-automatic setup to create the replication task. It starts replicating but then fails.
Following the documentation for troubleshooting remote replication, I can SSH into ServerB from ServerA, but when I send the snapshot manually it always fails after a while with:

Code:
packet_write_wait: Connection to 10.187.18.2 port 22: Broken pipe
warning: cannot send 'SSDVolume/VmDatastore1@auto-20170705.1443-5h': signal received
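For reference, a manual test of this kind pipes zfs send into zfs receive over SSH. The sketch below wraps that pipeline in a hypothetical helper, manual_send; it assumes the target dataset does not exist yet on ServerB, and the target name in the example line (BackupPool/VmDatastore1) is a placeholder, not something from this thread.

```shell
#!/bin/sh
# Hypothetical helper for a manual replication test, run on ServerA.
# $1: source snapshot, $2: SSH destination (user@host), $3: target dataset on ServerB.
# Assumes the target dataset does not exist yet, so zfs receive can create it
# from a full stream without needing -F.
manual_send() {
  snapshot=$1
  dest=$2
  target=$3
  # -v writes progress to stderr, so the terminal shows how much was sent
  # before the connection dropped; the stream itself goes through the pipe.
  zfs send -v "$snapshot" | ssh "$dest" zfs receive "$target"
}

# Example (BackupPool/VmDatastore1 is a placeholder target):
# manual_send SSDVolume/VmDatastore1@auto-20170705.1443-5h root@10.187.18.2 BackupPool/VmDatastore1
```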


This is what I see when it runs automatically:
Code:
storage1 /autorepl.py: [tools.autorepl:150] Replication result: packet_write_wait: Connection to 10.187.18.2 port 22: Broken pipe
Failed to write to stdout: Broken pipe


I tried replicating a small, empty zvol, and that succeeds.

To check how fast ZFS can read the snapshot, I sent it to /dev/null; it reads at roughly 450 MB per second:

Code:
root@storage1:/var/log # zfs send -v SSDVolume/VmDatastore1@auto-20170705.1443-5h | cat > /dev/null
full send of SSDVolume/VmDatastore1@auto-20170705.1443-5h estimated size is 261G
total estimated size is 261G
TIME        SENT   SNAPSHOT
16:25:58    463M   SSDVolume/VmDatastore1@auto-20170705.1443-5h
16:25:59    848M   SSDVolume/VmDatastore1@auto-20170705.1443-5h
16:26:00   1.32G   SSDVolume/VmDatastore1@auto-20170705.1443-5h
16:26:01   1.78G   SSDVolume/VmDatastore1@auto-20170705.1443-5h
16:26:02   2.18G   SSDVolume/VmDatastore1@auto-20170705.1443-5h
16:26:03   2.52G   SSDVolume/VmDatastore1@auto-20170705.1443-5h
16:26:04   2.93G   SSDVolume/VmDatastore1@auto-20170705.1443-5h
16:26:05   3.38G   SSDVolume/VmDatastore1@auto-20170705.1443-5h
16:26:06   3.79G   SSDVolume/VmDatastore1@auto-20170705.1443-5h
16:26:07   4.20G   SSDVolume/VmDatastore1@auto-20170705.1443-5h
16:26:08   4.72G   SSDVolume/VmDatastore1@auto-20170705.1443-5h
16:26:09   5.21G   SSDVolume/VmDatastore1@auto-20170705.1443-5h
16:26:10   5.65G   SSDVolume/VmDatastore1@auto-20170705.1443-5h
16:26:11   6.00G   SSDVolume/VmDatastore1@auto-20170705.1443-5h
16:26:12   6.42G   SSDVolume/VmDatastore1@auto-20170705.1443-5h
16:26:13   6.83G   SSDVolume/VmDatastore1@auto-20170705.1443-5h
16:26:14   7.24G   SSDVolume/VmDatastore1@auto-20170705.1443-5h
16:26:15   7.63G   SSDVolume/VmDatastore1@auto-20170705.1443-5h
16:26:16   8.05G   SSDVolume/VmDatastore1@auto-20170705.1443-5h
16:26:17   8.43G   SSDVolume/VmDatastore1@auto-20170705.1443-5h
16:26:18   8.84G   SSDVolume/VmDatastore1@auto-20170705.1443-5h
16:26:19   9.24G   SSDVolume/VmDatastore1@auto-20170705.1443-5h
16:26:20   9.63G   SSDVolume/VmDatastore1@auto-20170705.1443-5h
16:26:21   10.0G   SSDVolume/VmDatastore1@auto-20170705.1443-5h
16:26:22   10.4G   SSDVolume/VmDatastore1@auto-20170705.1443-5h
16:26:23   10.8G   SSDVolume/VmDatastore1@auto-20170705.1443-5h
16:26:24   11.2G   SSDVolume/VmDatastore1@auto-20170705.1443-5h
16:26:25   11.8G   SSDVolume/VmDatastore1@auto-20170705.1443-5h
16:26:26   12.3G   SSDVolume/VmDatastore1@auto-20170705.1443-5h
16:26:27   12.7G   SSDVolume/VmDatastore1@auto-20170705.1443-5h
16:26:28   13.1G   SSDVolume/VmDatastore1@auto-20170705.1443-5h
16:26:29   13.5G   SSDVolume/VmDatastore1@auto-20170705.1443-5h
16:26:30   13.9G   SSDVolume/VmDatastore1@auto-20170705.1443-5h
16:26:31   14.4G   SSDVolume/VmDatastore1@auto-20170705.1443-5h
16:26:32   14.8G   SSDVolume/VmDatastore1@auto-20170705.1443-5h
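The progress lines above can be reduced to an average rate instead of eyeballing the per-second deltas. The sketch below uses a hypothetical helper, send_rate, that reads only the first and last progress lines. It assumes the TIME/SENT/SNAPSHOT layout shown, sizes printed with binary K/M/G/T suffixes (as zfs does), and a run that does not cross midnight.

```shell
#!/bin/sh
# Hypothetical helper: estimate the average rate from `zfs send -v` progress
# lines on stdin ("HH:MM:SS  SIZE  SNAPSHOT"). Header lines are skipped.
send_rate() {
  awk '
    # Convert a zfs size string (e.g. 463M, 14.8G) to MiB.
    function mib(s,   u, v) {
      u = substr(s, length(s), 1)
      if (u ~ /[0-9]/) return s / 1048576   # plain byte count
      v = substr(s, 1, length(s) - 1) + 0
      if (u == "B") return v / 1048576
      if (u == "K") return v / 1024
      if (u == "G") return v * 1024
      if (u == "T") return v * 1048576
      return v                              # "M"
    }
    # Convert HH:MM:SS to seconds since midnight.
    function secs(t,   p) { split(t, p, ":"); return p[1] * 3600 + p[2] * 60 + p[3] }
    $1 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ {
      if (n++ == 0) { t0 = secs($1); s0 = mib($2) }   # first progress line
      t1 = secs($1); s1 = mib($2)                     # latest progress line
    }
    END {
      if (t1 > t0) printf "%.0f MiB/s over %d s\n", (s1 - s0) / (t1 - t0), t1 - t0
    }
  '
}

# Example: progress goes to stderr, so route it into the pipe and discard the stream:
# zfs send -v SSDVolume/VmDatastore1@auto-20170705.1443-5h 2>&1 >/dev/null | send_rate
```

For the log above, the first (16:25:58, 463M) and last (16:26:32, 14.8G) lines give 432 MiB/s over 34 s, about 450 MB/s in decimal units. Because zfs rounds the displayed sizes, treat this as an estimate.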


Can anyone help me fix this?
Thanks
 

nojohnny101 (Wizard, joined Dec 3, 2015, 1,478 messages)
I have seen this error before, and it was due to improper routing rules. Are you using a DNS service, or do you have static IPs on both ends?

What routing rules do you have set up? It is a bit strange that you can SSH in but replication fails.
 

SRA

I use DHCP and DNS. Routing is fine: both servers can talk to each other, and both are in the same IP range.
Replication works for small zvols, so I don't think it's a routing problem.

How did you solve your issue?
 
dlavigne (Guest)
If you're getting broken pipe errors, the connection is timing out. Try adding ClientAliveInterval to the Extra Options field under Services -> SSH. If that doesn't fix it, you may also need to experiment with ClientAliveCountMax or TCPKeepAlive. See https://www.freebsd.org/cgi/man.cgi?query=sshd_config for explanations of these settings.
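As an illustration, the Extra Options field could hold lines like the ones below. The values are assumptions, not tuned recommendations: with them, sshd sends a keepalive request through the encrypted channel whenever it has received nothing from the client for 60 seconds, and disconnects after 3 unanswered requests. TCPKeepAlive yes is already the sshd default and is listed only for completeness.

```
# Illustrative values only, not tuned for this setup.
ClientAliveInterval 60
ClientAliveCountMax 3
TCPKeepAlive yes
```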
 

SRA
Finally found the solution: it was jumbo frames.
I have a LACP aggregation set up with two 10GbE links, and I had enabled jumbo frames by adding mtu 9000 to each link on both servers. Apparently something on the path wasn't happy with that.
Removing that setting fixed it.
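The thread doesn't say which device on the path mishandled the 9000-byte frames. A common way to test a jumbo-frame path end to end is a ping with the Don't Fragment bit set and a payload sized to fill the MTU: if it fails while an ordinary ping succeeds, some hop is not passing full-size frames. The sketch below assumes FreeBSD's ping (where -D sets Don't Fragment) and IPv4; the helper names icmp_payload_for_mtu and check_path_mtu are hypothetical.

```shell
#!/bin/sh
# Sketch: test whether full-size (jumbo) frames really make it end to end.
# Assumptions: FreeBSD/FreeNAS ping, where -D sets the Don't Fragment bit
# (Linux ping uses "-M do" instead), and IPv4 without header options.
# Helper names are hypothetical; nothing here changes interface settings.

# Largest ICMP echo payload that fits in one IPv4 packet of the given MTU:
# the MTU minus the 20-byte IPv4 header and the 8-byte ICMP header.
icmp_payload_for_mtu() {
  echo $(( $1 - 20 - 8 ))
}

# Send three unfragmentable pings sized to fill the given MTU (default 9000).
# If these fail while a plain "ping host" works, some NIC, lagg member or
# switch port on the path is not passing frames that large.
check_path_mtu() {
  host=$1
  mtu=${2:-9000}
  ping -D -c 3 -s "$(icmp_payload_for_mtu "$mtu")" "$host"
}

# Example: check_path_mtu 10.187.18.2 9000
```

For an MTU of 9000 the payload works out to 8972 bytes; for the standard 1500 it is 1472.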
 