9.10.1-U2 replication issue

Status
Not open for further replies.
Joined
Jun 20, 2016
Messages
22
So I upgraded to 9.10.1-U1 and started to see strange replication behavior. After that, the U2 update came out with fixes, so I upgraded again. Most of the issues were fixed, but one remained from U1. At least replication is running fine now.

I'll refer to the two servers as PUSH and PULL, as in the manual.
On PULL I'm noticing an extra dataset being created. The dataset remains read-only, and it does not exist on the PUSH server.
How can I check why the PUSH server is deciding to create it?
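For what it's worth, this is roughly what I've been checking so far (dataset names are from my setup; /var/log/debug.log is where the replication messages show up):

Code:
# On PULL: show the unexpected dataset and its properties
zfs list -r backups/storage
zfs get readonly,mountpoint backups/storage/storage

# On PUSH: watch what the replication script is doing
tail -f /var/log/debug.log | grep autorepl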

Some logs from the PUSH server. If I delete the dataset on PULL, it gets created again. This part of the log is from after the deletion.

Code:
Oct 6 17:11:00 storage2 autosnap.py: [tools.autosnap:66] Popen()ing: /sbin/zfs list -t snapshot -H -o name
Oct 6 17:11:00 storage2 autorepl.py: [tools.autorepl:233] Autosnap replication started
Oct 6 17:11:00 storage2 autorepl.py: [tools.autorepl:234] temp log file: /tmp/repl-10752
Oct 6 17:11:00 storage2 autorepl.py: [tools.autorepl:305] Checking dataset storage
Oct 6 17:11:00 storage2 autorepl.py: [common.pipesubr:66] Popen()ing: /sbin/zfs list -H -t snapshot -p -o name,creation -r "storage"
Oct 6 17:11:00 storage2 autorepl.py: [tools.autorepl:334] ds = storage, remotefs = backups/storage
Oct 6 17:11:01 storage2 autorepl.py: [common.pipesubr:66] Popen()ing: /usr/bin/ssh -c arcfour256,arcfour128,blowfish-cbc,aes128-ctr,aes192-ctr,aes256-ctr -i /data/ssh/replication -o BatchMode=yes -o StrictHostKeyChecking=yes -o ConnectTimeout=7 -p 22 10.10.0.7 "zfs list -H -t snapshot -p -o name,creation -r 'backups/storage'"
Oct 6 17:11:02 storage2 autorepl.py: [tools.autorepl:621] Autosnap replication finished

And here is the next run:
Code:
Oct 6 17:12:00 storage2 autosnap.py: [tools.autosnap:66] Popen()ing: /sbin/zfs list -t snapshot -H -o name
Oct 6 17:12:01 storage2 autorepl.py: [tools.autorepl:233] Autosnap replication started
Oct 6 17:12:01 storage2 autorepl.py: [tools.autorepl:234] temp log file: /tmp/repl-11016
Oct 6 17:12:01 storage2 autorepl.py: [tools.autorepl:305] Checking dataset storage
Oct 6 17:12:01 storage2 autorepl.py: [common.pipesubr:66] Popen()ing: /sbin/zfs list -H -t snapshot -p -o name,creation -r "storage"
Oct 6 17:12:01 storage2 autorepl.py: [tools.autorepl:334] ds = storage, remotefs = backups/storage
Oct 6 17:12:01 storage2 autorepl.py: [tools.autorepl:342] Unable to create remote dataset backups/storage: cannot create 'backups/storage/storage': dataset already exists
Oct 6 17:12:01 storage2 autorepl.py: [common.pipesubr:66] Popen()ing: /usr/bin/ssh -c arcfour256,arcfour128,blowfish-cbc,aes128-ctr,aes192-ctr,aes256-ctr -i /data/ssh/replication -o BatchMode=yes -o StrictHostKeyChecking=yes -o ConnectTimeout=7 -p 22 10.10.0.7 "zfs list -H -t snapshot -p -o name,creation -r 'backups/storage'"
Oct 6 17:12:01 storage2 autorepl.py: [tools.autorepl:621] Autosnap replication finished
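If it helps, the two listing commands the script runs can be repeated by hand to compare what each side thinks it has (the IP address and key path are taken straight from the log lines above):

Code:
# On PUSH: local snapshot list for the source dataset
/sbin/zfs list -H -t snapshot -p -o name,creation -r "storage"

# On PUSH: the same listing on PULL, via the replication SSH key
/usr/bin/ssh -i /data/ssh/replication -o BatchMode=yes -p 22 10.10.0.7 \
    "zfs list -H -t snapshot -p -o name,creation -r 'backups/storage'"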
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
You should create a bug report.
 

MrToddsFriends

Documentation Browser
Joined
Jan 12, 2015
Messages
1,338
How is this replication task configured? More specifically: What are your settings in the following fields in the GUI?

Remote ZFS Volume/Dataset:
Recursively replicate child dataset's snapshots:
 
Joined
Jun 20, 2016
Messages
22
Remote ZFS Volume/Dataset: backups/storage
Recursively replicate child dataset's snapshots: checked

As far as I can see, there are already two similar bug reports in the tracking system.
 

MrToddsFriends

Documentation Browser
Joined
Jan 12, 2015
Messages
1,338
I"m not sure if I'm assessing your configuration correctly, but when comparing my replication jobs (and messages in /var/log/debug.log) to yours I'm tempted to say that backups (instead of backups/storage) would be the correct setting for 'Remote ZFS Volume/Dataset'.

Configured this way, I see
Code:
Oct  7 12:32:06 blunzn autorepl.py: [tools.autorepl:342] Unable to create remote dataset volrep: cannot create 'volrep/common': dataset already exists

messages in debug.log but no extra datasets in 9.10.1-U2.
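If you want to check for the same message on your side, grepping the debug log is enough (the path is the standard FreeNAS one):

Code:
grep 'Unable to create remote dataset' /var/log/debug.log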

BUT: I may be totally wrong as I don't have any recursive replication jobs on my system.
 
Joined
Jun 20, 2016
Messages
22
No, it's OK as it is now.
My structure on pull now is
backups/storage/dataset1
backups/storage/dataset2
backups/storage/dataset3
Which is what I want.

As a mounted folder I have /mnt/backups/storage/datasets

The wrong dataset being created is backups/storage/storage. The rest of them are replicating fine, as they were before U1 and U2.
 

rafadavidc

Dabbler
Joined
Mar 26, 2016
Messages
38
Yeah, I'm having this issue as well for most of my datasets. I posted my email bodies to the bug that Vladimir posted to the tracker.
 

rafadavidc

Dabbler
Joined
Mar 26, 2016
Messages
38
I ended up deleting redundant replication tasks.

Snapshot vol2, recursive
Snapshot vol2/MainSet, recursive
Snapshot vol2/MediaSet, recursive

The collisions were because the recursive vol2 task had already made some snapshots that the MainSet and MediaSet recursive tasks were also trying to make. Deleting the vol2 task and making sure I had a recursive task for each dataset in the volume took care of it.
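If anyone wants to check for the same kind of overlap, listing the periodic snapshots on one of the nested datasets shows whether more than one task is hitting it (the dataset name is from my layout; FreeNAS prefixes its periodic snapshots with auto-):

Code:
# Periodic snapshots on the nested dataset, oldest first
zfs list -t snapshot -o name,creation -s creation | grep 'vol2/MainSet@auto-'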
 
Joined
Jun 20, 2016
Messages
22
Do you mean mails from push or pull?
The error about replication is seen in debug.log on push and it's not included in the mails.

But I see something else on the pull server:
statvfs(/mnt/backups/storage/storage) failed: No such file or directory

This can be observed in /var/log/messages. And it's true: if I try to do an ls, the directory does not exist. BUT if I do zfs list, it's there:
backups/storage/storage 28.2K 3.14T 28.2K /mnt/backups/storage/storage

I also see it in the web interface and I can delete it, but it gets recreated.
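In case it's useful, this is how I check and remove the phantom dataset from the shell; destroying it is only reasonable here because the dataset is empty, and it gets recreated on the next replication run anyway:

Code:
# The dataset shows up in zfs list but is not actually mounted
zfs get mounted,mountpoint backups/storage/storage

# Remove it by hand (-r in case anything was created underneath it)
zfs destroy -r backups/storage/storage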
 
Joined
Jun 20, 2016
Messages
22
This issue is finally gone with the 9.10.2 update.
Now each minute all I see is
Code:
autosnap.py: [ws4py:360] Closing message received (1000) ''
 