Periodic snapshots / replication fails


Niels Erik

Hi

I have a problem with periodic snapshots / replication.
I have two nearly identical systems, 'PUSH' and 'PULL', each with a Supermicro A1SAI board (Atom C2xxx), 16 GB ECC RAM, two SATA WD 3 TB disks in a mirror, and the OS on an 8 GB Kingston USB drive.

The 'PUSH' system is the older of the two; it was originally set up with FreeNAS 9.2.1 and later upgraded to 9.3 Stable.
One month ago I started looking into periodic snapshots / replication on the 'PUSH' system, but I could not get it to generate automatic snapshots. I tried different intervals, volumes/datasets, with/without the system dataset, recursive/non-recursive.
Nothing was generated.

When I got the 'PULL' system up and running with 9.3 Stable, periodic snapshots worked right away.
I then did a fresh install of 'PUSH' by booting from FreeNAS-9.3-STABLE-201506292332.iso, reconfigured it from scratch, and imported the volume on the mirrored disks.
I created an automatic snapshot task (from the GUI) with an interval of one day, keep for 2 weeks, on the volume (the complete pool), with the system dataset included, and recursive.
It worked right away.

I then configured replication between PUSH and PULL. This also worked for a few days, until I left the PULL system shut down (5 Aug).
Now the 'PUSH' system has stopped generating automatic snapshots, and I can't get it to start again. I have rebooted the system; that did not change anything.
I have tried changing the settings of the existing periodic snapshot task; that did not change anything.
I have tried configuring additional snapshot tasks with different settings; that did not change anything either.
There are no alerts.
The disks seem fine (PUSH):

[root@tango] zpool status
  pool: FirstVolume
 state: ONLINE
  scan: scrub repaired 0 in 6h37m with 0 errors on Sun Aug 2 15:37:26 2015
config:

        NAME                                            STATE     READ WRITE CKSUM
        FirstVolume                                     ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            gptid/c8d232f9-2b80-11e4-87ae-002590f10208  ONLINE       0     0     0
            gptid/71c37d79-3219-11e4-88d2-002590f10208  ONLINE       0     0     0



I have created a manual snapshot with a systematic name like the automatically generated ones (I don't know if this is a bad idea). This did not change anything.
[root@tango] zfs snap FirstVolume@auto-20150807.1938-2w
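(Side note: the periodic task is recursive, so a closer equivalent would presumably have been a recursive manual snapshot; something like the line below, with a made-up timestamp since the name above is already taken:)

[root@tango] zfs snap -r FirstVolume@auto-20150807.2000-2w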

List of snapshots on the volume (the last one is the manually generated one):
[root@tango] zfs list -t snapshot | grep 'FirstVolume@'
FirstVolume@auto-20150801.1431-2w 168K - 1.58T -
FirstVolume@auto-20150802.1434-2w 152K - 1.58T -
FirstVolume@auto-20150803.1434-2w 4.96M - 1.58T -
FirstVolume@auto-20150804.1434-2w 128K - 1.58T -
FirstVolume@auto-20150807.1938-2w 80K - 1.58T -


I tried to send the manually generated snapshot with the following command (I don't know if this is a bad idea either):

[root@tango] zfs send FirstVolume@auto-20150807.1938-2w | ssh -i /data/ss/replication 10.0.0.230 zfs receive BackupVolume/Tango@auto-20150807.1938-2w
cannot receive new filesystem stream: destination 'BackupVolume/Tango' exists
must specify -F to overwrite it
warning: cannot send 'FirstVolume@auto-20150807.1938-2w': Broken pipe
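(A note for myself: as I understand it, the error above is what zfs receive reports when a full, non-incremental stream is sent to a dataset that already exists. If that is right, an incremental send from the newest snapshot present on both sides should work instead; a sketch, reusing the key and host from my command above and assuming FirstVolume@auto-20150804.1434-2w is that common snapshot:)

[root@tango] zfs send -i FirstVolume@auto-20150804.1434-2w FirstVolume@auto-20150807.1938-2w | ssh -i /data/ss/replication 10.0.0.230 zfs receive BackupVolume/Tango

(Possibly with -F on the receive side if the destination has been modified since that snapshot.)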



[root@tango] bzcat debug.log.0.bz2 | grep 'autorepl' | less
[Last snapshot generated]
Aug 4 14:34:09 tango autorepl.py: [tools.autorepl:117] Autosnap replication started
Aug 4 14:34:09 tango autorepl.py: [tools.autorepl:118] temp log file: /tmp/repl-12209
Aug 4 14:34:09 tango autorepl.py: [tools.autorepl:194] Checking dataset FirstVolume
Aug 4 14:34:09 tango autorepl.py: [common.pipesubr:70] Popen()ing: /sbin/zfs list -Ht snapshot -o name,freenas:state -r -d 1 "FirstVolume"
Aug 4 14:34:09 tango autorepl.py: [tools.autorepl:223] Snapshot: FirstVolume@auto-20150804.1434-2w State: NEW
Aug 4 14:34:09 tango autorepl.py: [tools.autorepl:230] Snapshot FirstVolume@auto-20150804.1434-2w added to wanted list
Aug 4 14:34:09 tango autorepl.py: [tools.autorepl:223] Snapshot: FirstVolume@auto-20150803.1434-2w State: LATEST
Aug 4 14:34:09 tango autorepl.py: [tools.autorepl:227] Snapshot FirstVolume@auto-20150803.1434-2w is the recorded latest snapshot
Aug 4 14:34:09 tango autorepl.py: [tools.autorepl:260] Found matching latest snapshot FirstVolume@auto-20150803.1434-2w remotely
Aug 4 14:34:11 tango autorepl.py: [tools.autorepl:382] Replication result: Succeeded
Aug 4 14:34:12 tango autorepl.py: [tools.autorepl:446] Autosnap replication finished


[Lots of one-minute repetitions deleted ...]
Aug 5 09:00:03 tango autosnap.py: [tools.autosnap:70] Popen()ing: /sbin/zfs list -t snapshot -H
Aug 5 09:00:07 tango autorepl.py: [tools.autorepl:117] Autosnap replication started
Aug 5 09:00:07 tango autorepl.py: [tools.autorepl:118] temp log file: /tmp/repl-74897
Aug 5 09:00:07 tango autorepl.py: [tools.autorepl:194] Checking dataset FirstVolume
Aug 5 09:00:07 tango autorepl.py: [tools.autorepl:223] Snapshot: FirstVolume@auto-20150804.1434-2w State: LATEST
Aug 5 09:00:07 tango autorepl.py: [tools.autorepl:227] Snapshot FirstVolume@auto-20150804.1434-2w is the recorded latest snapshot
Aug 5 09:00:07 tango autorepl.py: [tools.autorepl:446] Autosnap replication finished



[Last autorepl.py log entries generated:]
Aug 5 13:22:07 tango autorepl.py: [tools.autorepl:117] Autosnap replication started
Aug 5 13:22:07 tango autorepl.py: [tools.autorepl:118] temp log file: /tmp/repl-89486
Aug 5 13:22:07 tango autorepl.py: [tools.autorepl:194] Checking dataset FirstVolume
Aug 5 13:22:07 tango autorepl.py: [tools.autorepl:223] Snapshot: FirstVolume@auto-20150804.1434-2w State: LATEST
Aug 5 13:22:07 tango autorepl.py: [tools.autorepl:227] Snapshot FirstVolume@auto-20150804.1434-2w is the recorded latest snapshot
Aug 5 13:22:07 tango autorepl.py: [tools.autorepl:446] Autosnap replication finished


autorepl.py seems to have stopped at Aug 5 13:22.
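To check whether the snapshot side (autosnap.py) is still being scheduled at all, I suppose the same kind of grep can be run against the same log:

[root@tango] bzcat debug.log.0.bz2 | grep 'autosnap' | tail -20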


Questions:
1) Are the automatic snapshots known to be fragile?

2) I assumed that snapshots and replication are decoupled, and that PUSH would continue to generate snapshots even if the communication to PULL is gone.
I would have expected the replication to catch up once the connection is restored.
But this seems to imply that this is not the case:
https://bugs.pcbsd.org/issues/8414

3) What happens on PUSH if I manually generate a snapshot of the same volume/dataset with the same naming as the automatic ones?
- Does it treat it as one of the automatically generated ones?
- Does it completely ignore it?
- Or does it confuse the system? (See the command sketch after this list.)

4) What happens on PULL if I do a zfs send of a manually generated snapshot of the same volume/dataset to the location of an automatic one?
- How can it tell the difference?

5) What can I do to get it running again, besides reinstalling the system on PUSH and deleting the backup history on PULL?
If I have to do that on a weekly basis, the whole idea behind snapshot replication is gone.
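Regarding question 3: from the autorepl.py log above it looks like the replication state is tracked in a ZFS user property called freenas:state (NEW / LATEST). If that is right, my manual snapshot can be compared against the automatic ones with something like:

[root@tango] zfs get -H -o name,value freenas:state FirstVolume@auto-20150804.1434-2w FirstVolume@auto-20150807.1938-2w

If the property is simply missing on the manual one, that would presumably be how the system tells them apart.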
 

Niels Erik

Hi again
I copied this line from the crontab and executed it:
/usr/local/bin/python /usr/local/www/freenasUI/tools/autosnap.py

This generated a snapshot and pushed it to the backup server (PULL).
Now I just have to wait and see whether it runs automatically the next time.
If it does, I will mark the thread as solved.
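For reference, the line lives in a crontab; something along these lines should locate it on a 9.3 install (the exact crontab location may differ):

[root@tango] grep autosnap /etc/crontab /var/cron/tabs/* 2>/dev/null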

/Niels
 
Status
Not open for further replies.