Replication stopped running

Status
Not open for further replies.

elangley

Contributor
Joined
Jun 4, 2012
Messages
109
Hello All,

FreeNAS 9.2.0 with 8GB RAM

A replication job was running fine and now has stopped replicating. The snapshots happen once a day and replication is configured to run at any time of day.

I upgraded to 9.2.1.4 and this caused snapshots to not display so I immediately switched back to 9.2.0. Since then the snapshots run but replication does not.

I don't see a way to "force" replication to run.

What can I look for?

~eric
 
dlavigne

Guest
Does checking the "Initialize remote side for once" fix it? Note that this option will delete replicated data on the remote side in order to "unstick" a stuck replication.
 

elangley

Contributor
Joined
Jun 4, 2012
Messages
109
That would be a worst case scenario as it is 500+ GB of data that was sent over the WAN...
 

elangley

Contributor
Joined
Jun 4, 2012
Messages
109
mp,

I am actually running 9.2.0. I briefly upgraded to 9.2.1.4, received the error noted in #4836 and immediately switched back to 9.2.0, got my snapshots back but replication stopped working.

Of course I can upgrade to 9.2.1.4.1 and hopefully that will allow replication to restart...
 

elangley

Contributor
Joined
Jun 4, 2012
Messages
109
Okay, after upgrading to 9.2.1.4 the ZFS Replication job status was initially "Waiting".

I modified the Periodic Snapshot Tasks schedule to cause a snapshot to be taken at 9:45 AM EDT.

The ZFS Replication job status then changed to an error caused by a Public Key mismatch (a result of rolling back to 9.2.0).

I updated the Public Key on the Pull site and now the replication status is "Sending". Unfortunately all of the Reporting statistics are blank, so I cannot see throughput on the Push side. Not sure what is up with that.

So it appears the root problem all along (after the downgrade) was the Public Key issue. This is something to note for future downgrades.

Having the extra status information is a big help in this version of replication!

There is so much more that can be accomplished with replication. Looking forward to the future of FreeNAS...

~eric
 

mpfusion

Contributor
Joined
Jan 6, 2014
Messages
198
Of course I can upgrade to 9.2.1.4.1 and hopefully that will allow replication to restart...


I'm not saying you should upgrade, I just wanted to mention that it fixes the “missing snapshots” bug. It's up to you to decide if you jump from 9.2.0 to 9.2.1.4.1. I'm not too positive that replication will automagically restart after the update.

At some point we also had replication issues. (As dlavigne already suggested) the solution was to check “Initialize remote side”. I totally agree that it's a problem because all data needs to be retransferred. Since we didn't want to wait six months, I had the server sent in, connected it to the local gigabit network. Then the initial replication took about four days. Then I sent the server back to the remote location. Not a perfect solution, but it worked for us.

Anyway, you can still check if 9.2.1.4.1 fixes the issue, maybe you're lucky.
 

elangley

Contributor
Joined
Jun 4, 2012
Messages
109
Unfortunately the replication job failed, the status is now "Could not create directory '/root/.ssh'. Permission denied (publickey, password)."

I have already copied the public key from PUSH into PULL. The PUSH site obviously already has the PULL site's remote hostkey.

Is there something else I should be looking for?

~eric
 

mpfusion

Contributor
Joined
Jan 6, 2014
Messages
198
Unfortunately the replication job failed, the status is now "Could not create directory '/root/.ssh'. Permission denied (publickey, password)."

Try to log into PULL from PUSH manually:

Code:
ssh -i /data/ssh/replication <pull_host>


You can also add “-v” to get some debug output. It looks as if the keys aren't set up correctly.
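
For anyone following along, here is a minimal sketch of that manual login check wrapped in a small shell function. It assumes the key path /data/ssh/replication from the command above; the function name check_key_login and the KEY variable are made up for illustration. BatchMode=yes is a standard OpenSSH option that stops ssh from prompting for a password, so a failure here points at the key setup (or possibly at a host key that has not been accepted yet) rather than at a mistyped password.

```shell
#!/bin/sh
# Sketch only: test key-based login from PUSH to PULL without a password fallback.
# KEY defaults to the replication key path quoted in this thread.
KEY="${KEY:-/data/ssh/replication}"

check_key_login() {
    host="$1"
    if [ -z "$host" ]; then
        echo "usage: check_key_login <pull_host>" >&2
        return 2
    fi
    if [ ! -r "$KEY" ]; then
        echo "key file not readable: $KEY" >&2
        return 3
    fi
    # -v prints debug output; BatchMode=yes makes ssh fail instead of
    # prompting, so exit status 0 means the key alone was accepted.
    ssh -v -i "$KEY" -o BatchMode=yes "$host" true
}
```

Usage would be something like `check_key_login <pull_host>` from the PUSH shell, then reading the debug output for the point where authentication fails.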
 

elangley

Contributor
Joined
Jun 4, 2012
Messages
109
I tried to log in manually and received:
"Could not create directory '/root/.ssh'
The authenticity....
ECDSA key fingerprint is: .....
Are you sure you want to continue."

I have installed the Public Key on the PULL host and the Remote hostkey is downloaded automatically into the replication job.

What else would need to be done?

~eric
 

David Ritter

Cadet
Joined
Mar 14, 2014
Messages
8
I seem to be having a similar issue with the newest versions of FreeNAS (9.2.1.4 and 9.2.1.5).

I have set up the replication keys correctly and can successfully initiate an ssh connection from PUSH to PULL.

When I manually ssh from PUSH to PULL I get this message.

Could not create directory '/root/.ssh'.

But the ssh connection seems to be working: the shell looks like I am connected to PULL, and I can run shell commands on PULL.

My replication jobs fail and I see this message in the System Alerts on PUSH.

CRITICAL: Replication dallas-zfs -> 10.120.1.76 failed: Could not create directory '/root/.ssh'. Succeeded

Something is not getting the correct permissions. I can probably figure this out, but it has not been an issue I have had to deal with until the last few 9.2.1.x updates.
 

David Ritter

Cadet
Joined
Mar 14, 2014
Messages
8
I don't know if it matters, but after comparing my new FreeNAS setup with a working older FreeNAS setup, it seems that the /root/.ssh directory does not exist on PUSH. I have never needed to create these directories in the past to get replication working.

Is it possible that some setup script that creates these .ssh directories is broken?
 

David Ritter

Cadet
Joined
Mar 14, 2014
Messages
8
Success! I went back through my replication setup and set everything up again from scratch.

My particular replication issue now seems to be fixed.
 

Idiotzoo

Explorer
Joined
Mar 11, 2013
Messages
55
I have the same problem :(

Two boxes which have happily been replicating for a few months have stopped playing after updating to 9.2.1.5.

I've tried deleting and reconfiguring all keys and replication tasks, no joy. I get the "could not create directory '/root/.ssh'". If I try ssh from push to pull authenticating with the key, it works. If I throw in the verbose switch everything looks fine, but I do see "could not create directory '/root/.ssh'"

I've tried manually replicating using zfs send and that did appear to work, although it also gave the error.

I've mucked about with this so much that the GUI is completely confused and now lists snapshots that don't exist, so a reboot is necessary... However, there does appear to be a fundamental problem that's throwing this error.
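
For reference, a rough sketch of what a manual incremental replication like the one mentioned above might look like. This only prints the command instead of running it. The function name build_send_cmd is made up, and the dataset, snapshot, and host names in the usage line are hypothetical placeholders; the key path is the one quoted elsewhere in this thread. Note that zfs receive -F rolls the target back to its most recent snapshot before receiving, so anything changed on the PULL side since then would be discarded.

```shell
#!/bin/sh
# Sketch only: print (do not run) a manual incremental zfs send/receive command.
KEY="${KEY:-/data/ssh/replication}"

build_send_cmd() {
    # $1 = previous snapshot, $2 = new snapshot,
    # $3 = PULL host,         $4 = target dataset on PULL
    if [ "$#" -ne 4 ]; then
        echo "usage: build_send_cmd <old_snap> <new_snap> <pull_host> <target_dataset>" >&2
        return 2
    fi
    # zfs send -i sends only the changes between the two snapshots.
    echo "zfs send -i $1 $2 | ssh -i $KEY $3 zfs receive -F $4"
}
```

For example, `build_send_cmd tank/data@auto-1 tank/data@auto-2 pull_host backup/data` prints a pipeline you could review before running it by hand.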
 

Idiotzoo

Explorer
Joined
Mar 11, 2013
Messages
55
OK. After reboots and further mucking about, it seems replication is working just fine once more. However, the error is still present, which is causing FreeNAS to issue an Alert. Anyone know how this might be fixed?
 

elangley

Contributor
Joined
Jun 4, 2012
Messages
109
OK. After reboots and further mucking about, it seems replication is working just fine once more. However, the error is still present, which is causing FreeNAS to issue an Alert. Anyone know how this might be fixed?


I am seeing the same thing. Replication works but still receiving the error, running 9.2.1.4.1 on the PUSH side and 9.2.0 on the PULL side.

It would be interesting to know what coding changes caused this. Replication was definitely modified in 9.2.1.4...
~eric
 

hunter

Explorer
Joined
Nov 24, 2013
Messages
94
I am having the same problem as Idiotzoo, and took the same steps to correct it, on two machines both running 9.2.1.5 that were replicating snapshots fine on an earlier version of FreeNAS. I also tried "Initialize remote side for once". I still get the "could not create directory '/root/.ssh'" message even though the snapshots seemed to replicate fine...
 

hunter

Explorer
Joined
Nov 24, 2013
Messages
94
OK, the message appears to be referring to /root/.ssh on the FreeNAS used for PUSH. The following steps from another thread, done in PUSH's shell, appear to have fixed the problem:

Code:
mount -uw /            # remount the root filesystem read-write
mkdir -p /root/.ssh/   # create the missing directory
chmod 700 /root/.ssh   # restrict it to root only
mount -ur /            # remount the root filesystem read-only

And if it worked you should be able to run

Code:
ssh -i /data/ssh/replication ip_of_PULL

and NOT be asked to sign in with a password.
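
To double-check the result of the steps above, something like the following sketch could be used. The function name check_ssh_dir is made up for illustration. It parses the output of ls -ld rather than calling stat, since stat takes different flags on FreeBSD and Linux.

```shell
#!/bin/sh
# Sketch only: report whether an .ssh directory exists with owner-only permissions.
check_ssh_dir() {
    dir="${1:-/root/.ssh}"
    if [ ! -d "$dir" ]; then
        echo "missing"
        return 1
    fi
    # The first 10 characters of `ls -ld` are the file type and permission bits.
    perms=$(ls -ld "$dir" | cut -c1-10)
    if [ "$perms" = "drwx------" ]; then
        echo "ok"
    else
        echo "wrong-mode $perms"
        return 1
    fi
}
```

Running `check_ssh_dir` with no argument checks /root/.ssh; "missing" would match the situation David Ritter described earlier in the thread.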
 