Replication fails - operation timed out

MartinHerrman

Dabbler
Joined
Jan 5, 2018
Messages
18
All,

I have two FreeNAS boxes that replicate a small number (<5) of datasets over an internet connection. Both boxes are running the latest 11.2 and have 16GB RAM and a low-power Atom CPU. As they are used to store/secure some family pictures and stuff, the load is very low.

Connecting from one box to the other via SSH works well and is fast. However, the replication fails in both directions:

Replication pool1/<label> -> <domainname>:NAS_pool/<label> failed: Failed: ssh: connect to host <domainname> port 22: Operation timed out

This used to work fine, and I can't find a related bug in the issue tracker. What should be my next step?

BTW: I have increased the timeout value from 7 to 60 seconds: https://redmine.ixsystems.com/issues/24961
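
To separate a plain TCP connect timeout on port 22 from anything specific to replication, a small probe with an explicit timeout may help. This is only a minimal sketch, assuming Python 3 is available on the box; `can_connect` is a hypothetical helper name, and the 7 second default simply mirrors the original timeout value mentioned above.

```python
import socket


def can_connect(host, port=22, timeout=7.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the name and applies the timeout to the connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Timeouts, refused connections and DNS failures are all OSError subclasses.
        return False
```

Calling it repeatedly from both sides around the time the alerts fire could show whether the timeouts are intermittent rather than permanent.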
 

dlavigne

Guest
Anything else in /var/log/messages or /var/log/auth.log when it fails?
 

MartinHerrman

Hi, thanks for the quick reply!

First box, messages is quite empty, but auth contains a lot of these:

Dec 11 20:25:26 freenas sshd[54364]: Accepted publickey for root from <remoteip> port 19632 ssh2: RSA SHA256:<key>
Dec 11 20:25:26 freenas sshd[54364]: Received disconnect from <remoteip> port 19632:11: disconnected by user
Dec 11 20:25:26 freenas sshd[54364]: Disconnected from user root <remoteip> port 19632

Second box, messages contains things like this every several hours:

Dec 11 20:23:03 freenas kernel: igb0: link state changed to DOWN
Dec 11 20:23:03 freenas kernel: igb0: link state changed to DOWN
Dec 11 20:23:10 freenas kernel: igb0: link state changed to UP
Dec 11 20:23:10 freenas kernel: igb0: link state changed to UP
Dec 11 20:23:27 freenas kernel: igb0: link state changed to DOWN
Dec 11 20:23:27 freenas kernel: igb0: link state changed to DOWN
Dec 11 20:23:31 freenas kernel: igb0: link state changed to UP
Dec 11 20:23:31 freenas kernel: igb0: link state changed to UP
Dec 11 20:23:50 freenas dhclient: New IP Address (igb0): 192.168.178.17
Dec 11 20:23:50 freenas dhclient: New Subnet Mask (igb0): 255.255.255.0
Dec 11 20:23:50 freenas dhclient: New Broadcast Address (igb0): 192.168.178.255
Dec 11 20:23:50 freenas dhclient: New Routers (igb0): 192.168.178.1

And auth contains messages similar to those on box 1:

Dec 11 21:34:09 freenas sshd[10994]: Accepted publickey for root from <remoteip> port 29131 ssh2: RSA SHA256:<key>
Dec 11 21:34:09 freenas sshd[10994]: Received disconnect from <remoteip> port 29131:11: disconnected by user
Dec 11 21:34:09 freenas sshd[10994]: Disconnected from user root <remoteip> port 29131
 
dlavigne

On the system with the flapping igb0:

Which FreeNAS version?
What MTU?
Any possibility that interface is going bad?
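
To get a feel for how often the interface is flapping, the link-state lines in /var/log/messages can be tallied. A minimal sketch, assuming Python 3 and the kernel message format shown in the log excerpt above; `count_link_changes` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches e.g. "... kernel: igb0: link state changed to DOWN"
LINK_RE = re.compile(r"kernel: (\w+): link state changed to (UP|DOWN)")


def count_link_changes(lines):
    """Count link transitions per (interface, state) in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = LINK_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts
```

Passing it an open file handle for /var/log/messages gives the totals per interface. Note that the excerpt above logs each transition twice, so the raw counts may be double the number of real events.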
 

MartinHerrman

Sorry for the late reply, but Christmas time is a busy time as well :smile:

I discovered that the network cable to the FreeNAS server only had 4 wires and ran at 100 Mbit/s. All devices are capable of 1 Gbit/s. I replaced the cable yesterday with a new Cat6 cable, and since then the problem seems to have disappeared!

Thanks for the help!
 

MartinHerrman

Hm.

I received another alert an hour ago, telling me that 2 replication tasks failed because of connect to host / operation timed out.

But when I log in to the web interface, the replication tasks show status 'up to date' and there are no new alerts visible in the top right corner.

I'm running 11.2-RELEASE. I checked all the logs with 'grep <targethost> *' and found the same errors over and over again, but nothing that helps much. However, they showed that a connect timeout of 7 seconds was still being used, even though I had increased it to 60. autorepl.py appears to define this timeout three times, and I had only increased one of them. I have now changed the other two as well and will see what happens next.
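
For anyone wanting to check that no occurrence was missed, every timeout value in the script can be listed in one go. A minimal sketch, assuming Python 3 and assuming the script passes the value in ssh's `-o ConnectTimeout=N` form; `find_connect_timeouts` is a hypothetical helper name, and the script's contents have to be read in by the caller.

```python
import re

# ssh option form assumed here: "-o ConnectTimeout=<seconds>"
TIMEOUT_RE = re.compile(r"ConnectTimeout=(\d+)")


def find_connect_timeouts(text):
    """Return (line_number, seconds) for every ConnectTimeout=N found in `text`."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in TIMEOUT_RE.finditer(line):
            hits.append((number, int(match.group(1))))
    return hits
```

If any entry in the result still shows 7 after editing, that occurrence was missed.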

Still, it is weird that the web interface doesn't show any issue.
 