SOLVED FreeNAS ZFS incremental replication

Joined
Jun 13, 2017
Messages
7
Hi! First of all, I apologize in advance if my question sounds silly, but I've been banging my head against the wall for too long.
I'm using two FreeNAS 9.10.U3 boxes, let's call them A and B, to replicate a couple of datasets. A and B are connected via a 1 Gbps link, just across the room.

I've set up the automatic periodic snapshots (every 15 minutes, keep for 1 hour) and the replication tasks, which start correctly and sync up (message "up to date").
With both boxes up and running, I expected each subsequent replication run (driven by the auto-snapshot) to transmit data incrementally from A to B.
Since I change very little data, say 1 GB at most, I expected a correspondingly short replication time.
What actually happens, and what bothers me, is that every run is a full sync, no matter what, even for a dataset that hasn't changed at all.

My config data, in brief:
SNAPSHOTS:
- auto dataset snapshot every 15 min, keep for 1h (a replication run lasts 8h+)

REPLICATION:
- Recursively replicate child dataset's snapshots: OFF
- Delete stale snapshots on remote system: OFF

Thanks for your time and help.
Regards, Angelo
 

nojohnny101

Wizard
Joined
Dec 3, 2015
Messages
1,478
What actually happens, and what bothers me, is that every run is a full sync, no matter what, even for a dataset that hasn't changed at all.
How are you determining that this is happening? Snapshots and replication don't work like that by nature.

Explain more about which dataset you are snapshotting and where you are replicating to on the remote machine.
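
One way to check from the shell (a sketch; poolname/dataset is a placeholder, since you haven't named the dataset yet) is to list the snapshots on both boxes, because an incremental run is only possible when both systems share a snapshot with the same name:

# Run on each box and compare the two lists
zfs list -t snapshot -r poolname/dataset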
 
Joined
Jun 13, 2017
Messages
7
Thanks for the prompt reply; let's see if you can spot my error.
Let's start with the setup: there's a dataset (tank/clone) on machine A which I would like to keep replicated on machine B (tankbig/clone).
The replication process successfully copies the tank/clone dataset to the tankbig zpool, automatically creating the tankbig/clone dataset on B.

The snapshots on A fire every 15 minutes. Right after a replication finishes successfully (after 4h+), it waits a couple of minutes, then starts again from 0% and takes another 4h+.
This happens even when only small changes have been made inside the dataset, say, editing a couple of Word documents.
I've attached screenshots of the snapshot and replication tasks on A.
 

Attachments

  • replication.png
  • snapshot.png

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
dataset tank/clone is 922 GiB, slightly compressed by lz4 (ratio 1.09x).
Are there any child datasets within tank/clone? If so, shouldn't you enable the 'Recursively replicate child dataset's snapshots' checkbox?
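
A quick way to check from the shell, if you like:

# Any output lines below tank/clone itself indicate child datasets
zfs list -r -o name tank/clone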

FWIW, I run very similar replication tasks between two FreeNAS servers with both the 'Recursively replicate child dataset's snapshots' and 'Delete stale snapshots on remote system' options enabled, with no problems.
 
Joined
Jun 13, 2017
Messages
7
There aren't any child datasets; "clone" is a child dataset of the pool tank. I will try your suggested options anyway.
Will let you know soon.
Thanks
 

Attachments

  • pool-dataset.png

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
There aren't any child datasets; "clone" is a child dataset of the pool tank. I will try your suggested options anyway.
Will let you know soon.
Thanks
Ah! Then it shouldn't matter.

The only other thing I can think of is to verify that the date and time of your two FreeNAS servers are synchronized.
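
For example (run on each box and compare; both commands are available on FreeNAS 9.x):

# Wall-clock time on this box
date
# Which NTP peers this box is actually syncing against
ntpq -p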
 
Joined
Jun 13, 2017
Messages
7
Nice idea, but sadly they're reporting the same date/time and syncing with the same NTP servers (the ones provided by the FreeNAS install).
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
Nice idea, but sadly they're reporting the same date/time and syncing with the same NTP servers (the ones provided by the FreeNAS install).
Ah, well... it was worth a shot! :)

I (vaguely) remember reading an old bug report about this possibly being a problem.

Suggestion: Work with a smaller dataset to save time while you're debugging the replication setup. You might also consider upgrading to FreeNAS 9.10.2-U5.
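
For instance, you could carve out a small throwaway dataset just for testing (the names here are made up):

# On A: a tiny dataset with a little data in it
zfs create tank/repltest
touch /mnt/tank/repltest/testfile
zfs snapshot tank/repltest@base

Then point a 15-minute snapshot task and a replication task at tank/repltest and watch whether the second and later runs finish quickly.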

Good luck!
 

artlessknave

Wizard
Joined
Oct 29, 2016
Messages
1,506
I think I might see a hint of the problem:

- auto dataset snapshot every 15 min, keep for 1h (a replication run lasts 8h+)

(I notice that your screencaps say 1 week? Is this the same snapshot?)

Why are you only keeping snapshots for 1h? That means your network has to transfer 100% of the replica in under one hour.

ZFS incremental replication works by comparing one snapshot to another and sending the differences.

For incremental replication you must have a snapshot that exists identically on both replication servers. If you are keeping snapshots for only one hour, I would guess you are erasing all replicated snapshots every hour. I think the in-progress replication completes successfully, the snapshot gets deleted once it's no longer in use, and... the next run replicates the entire stream again because you have no snapshots in common anymore.
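
You can check this directly (a sketch using the dataset names from this thread; the IP is the one from my example further down):

# On A: snapshots of the source, oldest to newest
zfs list -t snapshot -o name -s creation -r tank/clone
# The replica on B, queried over ssh; if no snapshot name appears
# in both lists, the next run has to be a full send
ssh root@192.168.1.200 zfs list -t snapshot -o name -s creation -r tankbig/clone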

Incremental backups usually work with tiered copies, i.e.:
1 snap per week, kept for 1 month
1 snap per day, kept for 1 week
1 snap per hour, kept for 1 day

You could probably make a manual snap, then let it replicate every hour and delete the manual one, but if something messes up the replication rotation for an hour you would lose the whole replica sync and have to start from scratch. A rough sketch of that manual approach follows below.
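
Something like this by hand (the snapshot names are made up):

# One-time full copy to establish a shared base snapshot
zfs snapshot tank/clone@manual-base
zfs send tank/clone@manual-base | ssh root@192.168.1.200 'zfs receive -F tankbig/clone'
# Later runs send only the difference from the shared base
zfs snapshot tank/clone@manual-next
zfs send -i tank/clone@manual-base tank/clone@manual-next | ssh root@192.168.1.200 'zfs receive -F tankbig/clone'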

Snapshots generally have no performance impact (unless you get above something like 5,000, depending on hardware) and only use space for the changes since the last snap, so I'm not sure why you would want to gimp a ZFS feature so much.
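
You can see this for yourself; in this listing, USED is only the space unique to each snapshot:

zfs list -t snapshot -o name,used,referenced -r tank/clone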

FWIW, at the command line the replication would look something like this (syntax may be a bit off):
zfs send -I tank/clone@auto-20170317.0903-2m tank/clone@auto-20170318.0945-2w | ssh root@192.168.1.200 'zfs receive -F tankbig/clone'
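(For reference: -I sends all the intermediate snapshots between the two named ones, while lowercase -i would send only the final difference; -F on the receive side rolls the replica back to the last common snapshot before applying the stream, which is what lets repeated runs succeed.)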
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
Why are you only keeping snapshots for 1h? That means your network has to transfer 100% of the replica in under one hour.
The same idea occurred to me... but the OP posted an image of his snapshot settings, which show that he's keeping them for a week.
 

artlessknave

Wizard
Joined
Oct 29, 2016
Messages
1,506
The same idea occurred to me... but the OP posted an image of his snapshot settings, which show that he's keeping them for a week.

Yeah, I noticed that after writing it all up, lol. I crammed a note in about it. :/

I think I made the same mistake myself while learning how it all worked.

I also had two dataset replicas nuking each other because I gave them the same location, like a total noob.
 
Joined
Jun 13, 2017
Messages
7
Hi, it seems artlessknave was right!
I've enabled the 'Recursively replicate child dataset's snapshots' and 'Delete stale snapshots on remote system' options on the replication task.
Now everything works correctly! Thank you all for your time and precious help!
 

artlessknave

Wizard
Joined
Oct 29, 2016
Messages
1,506
cool
 