SOLVED Cannot setup ZFS replication - "Permission denied (publickey,password)"

Status
Not open for further replies.

Woolbox

Dabbler
Joined
Sep 15, 2014
Messages
13
First off: Thanks for the huge amount of work put into FreeNAS and the support forums. I seem to have avoided a lot of common pitfalls by reading thread after thread on the forums. Thanks!

Setup
I have two nearly identical FreeNAS rigs, same hardware but at two different locations. One is used a lot and has 16 GB (ECC) RAM; the other is more lightly used and has 8 GB ECC RAM. Both have three disks set up in RAID-Z1. I've used the same names and general settings for volumes and datasets.

Problem
Whereas replicating several datasets from A to B was very easy to set up, I simply cannot get replication of any dataset from B to A to work. Using the same recipe, my replication task returns the status:

"No ECDSA host key is known for <FreeNAS-A> and you have requested strict checking. Host key verification failed. Error 33 : Write error : cannot write compressed block"

By the way, autorepl.py is also run with "StrictHostKeyChecking=yes" when replicating from A to B (by default, I assume), and I copied the public key shown by the "View public key" button on rig B rather than using SSH Key Scan.

If I delete and recreate the replication task to use SSH Key Scan, instead of copying/pasting the public key, the replication task returns:

"Permission denied (publickey,password). Error 33 : Write error : cannot write compressed block"

I have verified, with PuTTY, that I can SSH from B to A, and that I actually end up on my FreeNAS rig.

The permissions on my "zfsbackup" dataset, the target for replication, are owner user: root, owner group: wheel, owner: RWE, group: RWE, other: RE.

I don't understand why B-to-A complains about the ECDSA host key, when A-to-B never did. The common error message (33, write error, cannot write compressed block) makes me suspect that the ECDSA error does not indicate the real problem.

What am I missing? - Any help greatly appreciated!
 

dlavigne

Guest
Which versions (build strings) are these systems running? There were some ECDSA changes in the 9.3 series; making sure both are running the same SU might fix it.
 

Woolbox

Dabbler
Joined
Sep 15, 2014
Messages
13
Thanks for your reply!

It makes sense - replication A-to-B was set up while on version 8.x, whereas I've been on 9.3 while trying to get replication B-to-A to work.

I have always ensured the boxes are running the same version, and have applied 2-3 updates (9.3-STABLE branch) while troubleshooting this issue, to see if that resolved it.

Current build (both boxes) is FreeNAS-9.3-STABLE-201503170439.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Easy fix....

1. Go to box A's CLI (local or SSH, it doesn't matter)
2. Do "ssh root@boxB" from the CLI and accept the key.
3. Go to box B's CLI (local or SSH)
4. Do "ssh root@boxA" from the CLI and accept the key.
5. Reboot both boxes and try replication again. (The reboot isn't strictly required, but it confirms the accepted keys persist across reboots.)
 

Woolbox

Dabbler
Joined
Sep 15, 2014
Messages
13
Thanks for your reply, cyberjock.

Unfortunately, it did not help.

I was prompted to accept the remote key when connecting from B-to-A but not from A-to-B, which is consistent with the replication problem.

I accepted the key and rebooted both boxes. I no longer get the ECDSA error, but instead I get the following error, regardless of whether I use the public key copied from the GUI or SSH Key Scan:

"Permission denied (publickey,password). Error 33 : Write error : cannot write compressed block"
 

Woolbox

Dabbler
Joined
Sep 15, 2014
Messages
13
Hi dlavigne,

If you mean turning off compression: no, unfortunately. I still get:

"Permission denied (publickey,password)" (though no longer "Error 33 : Write error : cannot write compressed block")

Also, I'm not replicating to an older version of FreeNAS, and replication the other way (which works) is using lz4 compression :)
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I tried to respond to this thread, but I ended up posting my comments to another thread.

It looks like SSH is rejecting your password AND key, so you should probably see if you are even connecting to the right box. I've tried to SSH to the wrong box and wondered why my key and/or password didn't work, until I realized I was going to the wrong IP.

Other than that, I don't have any advice. The public key AND password are being rejected, so something is "not quite right".
 

dlavigne

Guest
Yeah, if cyberjock's suggestion doesn't work, it may be time to submit a bug report to bugs.freenas.org. If you do, post the issue number here.
 

Woolbox

Dabbler
Joined
Sep 15, 2014
Messages
13
It looks like SSH is rejecting your password AND key, so you should probably see if you are even connecting to the right box. I've tried to SSH to the wrong box and wondered why my key and/or password didn't work, until I realized I was going to the wrong IP.

I suspected that as well, but my firewall at site A only allows SSH (NATted to my FreeNAS box) from the public IP address of site B, which to my knowledge should rule out a MITM scenario.

Also, I'm able to log on via SSH to box A from box B, using the same FQDN used for replication. I get the FreeNAS greeting banner and can ls my volume and datasets.

I will file a bug report tomorrow and hope someone can decipher what is going on :)

Thanks!
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Nothing personal, but this really sounds like a user configuration issue.
 

Woolbox

Dabbler
Joined
Sep 15, 2014
Messages
13
I don't disagree, cyberjock.

And though I've spent about a year with FreeNAS, reading the advice given by you and other dedicated people in the community and studying the manual and your excellent guide, I'm kind of stuck and don't know where to go from here. BSD is not my main expertise ;).

If anyone has any ideas on things to try to fix this, or things to test in order to reveal what's wrong, I'd be happy to hear it.

Meanwhile, I'll do some trial-and-error myself and not submit a new bug, at least until I'm more convinced that it's actually a bug :)
 

Woolbox

Dabbler
Joined
Sep 15, 2014
Messages
13
Hi again,

I spent a couple of hours this evening digging further into this.

The log window at the bottom of the GUI shows which parameters are used when SSH is called to do replication. I copied the command line of the latest replication attempt on both boxes, added -vvv for triple-verbose output, ran the command manually from a shell and compared the debug output.

The main difference between the two seems to be that box B finds known hosts in /root/.ssh/known_hosts, which doesn't exist on box A. I assume this is a result of cyberjock's suggestion.

The other main difference is the number of keys found in /etc/ssh/ssh_known_hosts, so I downloaded this file from both boxes and studied them.

On both boxes, ssh_known_hosts contained several duplicate keys. Furthermore, on box B, from which I cannot set up replication, the file contains keys both for itself and for box A. On box A, only keys for box B appear.
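The check I used can be sketched like this, run here against a sample file with fake keys for illustration (on FreeNAS the real file is /etc/ssh/ssh_known_hosts):

```shell
# Sample known_hosts file with a deliberate duplicate; the key strings
# below are fakes, not real host keys.
f=/tmp/ssh_known_hosts.sample
printf '%s\n' \
  'pull.local ecdsa-sha2-nistp256 AAAAfakekey1' \
  'pull.local ecdsa-sha2-nistp256 AAAAfakekey1' \
  'push.local ssh-rsa AAAAfakekey2' > "$f"

# Show lines that occur more than once
sort "$f" | uniq -d

# Write a copy that keeps only the first occurrence of each line
awk '!seen[$0]++' "$f" > "$f.dedup"
wc -l < "$f.dedup"
```

The awk one-liner preserves the original line order, unlike `sort -u`, which matters if you want to diff the result against the original file.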

I tried to remove the duplicate entries and box B's own keys from the file and rebooted, but the file is automatically replaced with the previous version. I'm guessing this is because of how FreeNAS persists its configuration.

How do I apply my cleaned up file, or clear the ssh_known_hosts file altogether, from the FreeNAS config?

(Or am I on a wild goose chase here?)

Thanks!
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
You need the various known_hosts and key files to be correct. Replication has no chance of working correctly until you can ssh from one box to another using the key and without getting warnings.

Ditch replication and focus on getting the ssh subsystem configured correctly for the task. Make sure the configuration persists through a reboot of both hosts. Once ssh works after that, then move on to replication.
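For example, a minimal non-interactive check from the pushing box, assuming the replication key is at /data/ssh/replication (the path autorepl.py uses) and with "PULL.local" as a placeholder host. BatchMode=yes makes ssh fail outright instead of falling back to a password prompt, so a broken key setup shows up immediately in the exit status:

```shell
# BatchMode=yes: never prompt for a password; fail if the key is rejected.
# Host name is a placeholder for the real destination.
ssh -i /data/ssh/replication \
    -o BatchMode=yes \
    -o StrictHostKeyChecking=yes \
    -o ConnectTimeout=7 \
    root@PULL.local true
echo "ssh exit status: $?"
```

Exit status 0 means key authentication works end to end; anything else means the ssh layer still needs fixing before replication can.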
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Keep in mind that if you edit the files yourself, those changes won't be in the config file. So if you do an upgrade or have to do a fresh install and restore from your config file, you'll be right back where you are now. The correct answer is to set up the keys (and passwords) in the WebGUI.
 

Woolbox

Dabbler
Joined
Sep 15, 2014
Messages
13
Hi again - hope you're still there.

So, not knowing if any garbage had been carried along from a couple of years' worth of upgrades, I pulled the trigger on my FreeNAS config yesterday - downloaded and installed the latest version of FreeNAS on two blank USB sticks, imported my pools on both boxes and tried setting up replication again.

Unfortunately, the issue is completely reproducible even on a fresh install - to recap:
  1. Simply copying the public key from PULL and pasting it into a new replication job on PUSH, as per the documentation, causes the replication job to return "No ECDSA host key is known".
  2. Creating a new replication task and using SSH Key Scan, rather than pasting the public key, causes the replication job to return "Permission denied (publickey,password)".
  3. Running "ssh -vv -i /data/ssh/replication hostname_or_ip" manually from the shell (of either box), as suggested in section 8.3.3 Troubleshooting Replication of the documentation, prompts for root's password instead of authenticating with the public key.
When running the above command, the output ends with (remote FQDN replaced with "PULL.local"):

debug1: Server host key: ECDSA 4f:4e:b9:6f:43:99:7e:43:4d:c1:e0:be:bc:28:4b:3a
debug1: Host 'PULL.local' is known and matches the ECDSA host key.
debug1: Found key in /etc/ssh/ssh_known_hosts:4
debug1: ssh_ecdsa_verify: signature correct
debug2: kex_derive_keys
debug2: set_newkeys: mode 1
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug2: set_newkeys: mode 0
debug1: SSH2_MSG_NEWKEYS received
debug1: Roaming not allowed by server
debug1: SSH2_MSG_SERVICE_REQUEST sent
debug2: service_accept: ssh-userauth
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug2: key: /data/ssh/replication (0x802830100), explicit
debug1: Authentications that can continue: publickey,password
debug1: Next authentication method: publickey
debug1: Offering RSA public key: /data/ssh/replication
debug2: we sent a publickey packet, wait for reply
debug1: Authentications that can continue: publickey,password
debug2: we did not send a packet, disable method
debug1: Next authentication method: password
root@PULL.local's password:

For the fun of it, I've tried with SSH configured both to allow and disallow "Login as Root with password". I also tried copying the replication public key on PULL and assigning it explicitly to the root user. No difference.

Any help would be greatly appreciated! :)
 

Woolbox

Dabbler
Joined
Sep 15, 2014
Messages
13
Nothing personal, but this really sounds like a user configuration issue.

As mentioned above, I can reproduce the problem starting with two fresh installs and factory configs. The *only* legacy items are the actual pools, and I haven't even reconstructed half my configuration yet.

Still, to rule out any weird artifacts from the pool that might interfere, I created two simple VMs on my VMware lab server. I had to ignore the minimum recommended memory of 8 GB, as I could only spare 4 GB for each VM, which I hope is acceptable for the purpose of this test. I provided each with an 8 GB disk for the FreeNAS install and a single 10 GB disk for the pool. I also set hpet0.present to false. I then proceeded to install FreeNAS-9.3-STABLE-201503270027.

Keeping changes to the factory configuration to a minimum, this is the *complete* list of changes I made on push and pull respectively:
  1. System/Hostname = push.mydomain.local and pull.mydomain.local respectively
  2. System/Timezone = Europe/Copenhagen (while I suppose not strictly necessary, helped relate to timestamps in logs)
  3. System/Show console messages in footer = Enabled
  4. Created a new simple ZFS volume, vol1, using the single 10 GB vDisk, which also seems to create a dataset with the same name underneath. Default permissions
  5. Created a dataset, pushdataset1 and pulldataset1 respectively, underneath the vol1 dataset. Unix type, case sensitive, default permissions
  6. Copied a few random files to pushdataset1 so it wasn't completely empty
  7. Started the SSH service on both boxes and allowed Root password login
  8. Created a snapshot task for pushdataset1 (5 minute interval, keep for 1 hour, run 24x7)
  9. Waited for a bunch of snapshots to be taken, then created a ZFS replication task for the snapshot task. Destination vol1/pulldataset1, recurse snapshots, initialize, run 24x7, get remote public key via SSH Key Scan
*** Replication fails with "Permission denied (publickey,password)"!
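Taking the GUI out of the loop, the manual equivalent of what the replication task attempts would be roughly the following, run from the PUSH shell (snapshot name made up by me), presumably failing at the same ssh authentication step:

```shell
# Manual zfs send/receive over ssh, mirroring what autorepl.py does.
# Snapshot name and host are placeholders. Careful: -F on receive rolls
# back the destination dataset, so only use it on a dataset you can
# afford to overwrite.
zfs snapshot vol1/pushdataset1@manualtest
zfs send vol1/pushdataset1@manualtest | \
  ssh -i /data/ssh/replication root@pull.mydomain.local \
  zfs receive -F vol1/pulldataset1
```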

I would be surprised if this were an actual bug, reproducible on a minimally modified config, that no one else has encountered. On the other hand, I completely fail to spot the cock-up on my part in the steps described above.

Any clues as to log files to check, config files to tweak etc.?

I've taken snapshots of the actual VMs at various stages in the process, so I can revert to a state before trying to connect to the other box for the very first time, in case anyone can suggest things to try out.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
The logs would be /var/log/messages.

Can you post a debug file from your system after the replication fails?
 
L

Guest
I kept running into the same problem until I did a key scan on the sending system. I am now trying to figure out how not to force ECDSA. In key scan it picks up the ECDSA key too.
 

Woolbox

Dabbler
Joined
Sep 15, 2014
Messages
13
Hi again.

Hope some bright minds are still on the line. Took a while before I could find the time to dive into this again.

I tried connecting via SSH from PUSH to PULL from the shell using verbose switches, but got the same response (as expected). I ran:

Code:
/usr/bin/ssh -v -i /data/ssh/replication -o StrictHostKeyChecking=yes -o ConnectTimeout=7 -p 22 PULL.domain.local


All I see is that it tries publickey, then moves on to password auth, with no useful details:

debug1: Authentications that can continue: publickey,password
debug1: Next authentication method: publickey
debug1: Offering RSA public key: /data/ssh/replication
debug1: Authentications that can continue: publickey,password
debug1: Next authentication method: password

Moving on, I added LogLevel DEBUG to Extra Options in the SSHD configuration on PULL to get more details. This produced the following in /var/log/debug.log:

May 9 00:46:48 pull sshd[67371]: debug1: trying public key file /root/.ssh/authorized_keys
May 9 00:46:48 pull sshd[67371]: debug1: Could not open authorized keys '/root/.ssh/authorized_keys': No such file or directory
May 9 00:46:48 pull sshd[67371]: debug1: trying public key file /root/.ssh/authorized_keys2
May 9 00:46:48 pull sshd[67371]: debug1: Could not open authorized keys '/root/.ssh/authorized_keys2': No such file or directory

Sure enough:

Code:
[root@pull]ls /root/.ssh/
./  ../


I have no idea why this would be empty. I have searched the sources for references to authorized_keys. The most interesting hit to me is setup-ssh-keys.sh, which seems to generate key pairs, ~/.ssh/authorized_keys and ~/.ssh/authorized_keys2.

I'm not sure if this is only run at install time, or how to otherwise fix the missing files in ~/.ssh/. As I understand it, changes made to the file system that aren't reflected in the FreeNAS config will not survive a reboot and/or upgrade.
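If the theory is right, recreating the files by hand would look roughly like this. I'm sketching it against a scratch directory instead of /root, with a placeholder key line, and keeping in mind that such manual changes may not survive a reboot or upgrade:

```shell
# Recreate ~/.ssh with the strict permissions sshd requires, then
# append the replication public key. $home is a scratch directory for
# illustration; on the real box it would be /root, and the key line
# would be the one shown by "View public key" on PUSH.
home=/tmp/rootdemo
mkdir -p "$home/.ssh"
chmod 700 "$home/.ssh"
echo 'ssh-rsa AAAA...placeholderkey replication@push' >> "$home/.ssh/authorized_keys"
chmod 600 "$home/.ssh/authorized_keys"
ls -la "$home/.ssh"
```

The 700/600 permissions matter: sshd silently ignores authorized_keys files it considers too permissive, which produces exactly the "Permission denied (publickey,password)" symptom.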

Any input greatly appreciated!
 