Replication taking a long time...

dpearcefl · Mar 6, 2018

We have two FreeNAS Mini XL units that replicate to each other over a 1 Gb/sec link. The biggest dataset takes a long time to replicate. Here is a list of the snapshots:

Code:

# zfs list -rt all Storage/ImageBackups
NAME                                         USED  AVAIL  REFER  MOUNTPOINT
Storage/ImageBackups                        20.1T  6.84T  17.8T  /mnt/Storage/ImageBackups
Storage/ImageBackups@auto-20180219.0700-1d   139M      -  14.6T  -
Storage/ImageBackups@auto-20180226.0700-1d  92.7G      -  15.1T  -
Storage/ImageBackups@auto-20180305.0700-1d  1.44M      -  17.8T  -

When a replication of a snapshot takes place, it can take literally days. So is it replicating just the snapshot's 'used' (changed) data, or all of it? For the oldest snapshot, is it sending 139M or 14.6T?

I suspect the snapshot is way larger than the 'used' column. How do I find the actual size of a single snapshot?

Thanks.

leenux_tux · Mar 6, 2018

What type of zfs send are you doing? Full or incremental? A full send transfers the whole file system; an incremental send transfers only what has changed between the previous snapshot and the current one.

dpearcefl · Mar 6, 2018

Not sure if this is the best/only way to do this:

Code:

# du -A -hd 1 Storage/ImageBackups/.zfs/snapshot/
 18T	Storage/ImageBackups/.zfs/snapshot/auto-20180305.0700-1d
 15T	Storage/ImageBackups/.zfs/snapshot/auto-20180226.0700-1d
 15T	Storage/ImageBackups/.zfs/snapshot/auto-20180219.0700-1d
 48T	Storage/ImageBackups/.zfs/snapshot/

So of a 20.1T dataset with weekly snapshots, 15-18T changes from week to week?

dpearcefl · Mar 6, 2018

Incrementals.

MatthewSteinhoff · Mar 6, 2018

In the simplest terms, 'USED' shows how much data has changed between snapshots and 'REFER' shows how much data that snapshot can restore.

The most recent snapshot, auto-20180305.0700-1d, is only 1.44M different than the previous snapshot but can restore 17.8T of data. Only 1.44M of data crossed the network to create that snapshot.

While the first replicated snapshot requires all the data to cross the network, subsequent snapshots only replicate changed data. The first time replication runs, it can take a lot of time. Subsequent snapshots should be quick.

Snapshot replication is an all-or-nothing deal. If a snapshot transfer fails, it cannot be continued and must be restarted from the beginning. In most cases, that isn't a big deal because snapshots are relatively small and transfer efficiently. For that first snapshot, however, which may take hours if not days, an interruption is a serious problem because it has to start over.

How much data do you think changed between 20180226 and 20180305? Does 1.44M (the size of a 3.5-inch floppy disk?) sound right? If so, replication is working like a champ.

Cheers,
Matt

leenux_tux · Mar 6, 2018

Looking at the numbers you have listed for your snapshots, if you are doing (for example) an incremental between the two latest snapshots....

auto-20180226.0700-1d and auto-20180305.0700-1d

The copy should be virtually instant.

Do you mind if I ask what command you are using? Or are you using the GUI?

dpearcefl · Mar 6, 2018

I'm using the GUI and trying to find the actual command line, but I can't find the FreeNAS logs as they are not in /var/log.

This data changes a lot, so I'm way more likely to believe 18T than 1.4M.

bigphil · Mar 6, 2018

To see how much space is being taken up by snapshots, run this command: zfs list -t all -r -o space,refer Storage/ImageBackups

dpearcefl · Mar 6, 2018

Code:

# zfs list -t all -r -o space,refer Storage/ImageBackups
NAME                                        AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD  REFER
Storage/ImageBackups                        6.93T  20.0T     2.20T   17.8T              0          0  17.8T
Storage/ImageBackups@auto-20180219.0700-1d      -  2.20T         -       -              -          -  14.6T
Storage/ImageBackups@auto-20180305.0700-1d      -  46.4M         -       -              -          -  17.8T

Even that sounds low. So the "used" column in the very first post is not bytes?

bigphil · Mar 6, 2018

You deleted the snapshot named "Storage/ImageBackups@auto-20180226.0700-1d". That is why the 20180219 snapshot now shows more used space: data it previously shared with the deleted snapshot is now unique to it.

dpearcefl said:
So the "used" column in the very first post is not bytes?

I'm not sure I understand the question... The used column should be self-explanatory. The unit suffix on each value says what it is, i.e. M, G, T, etc.
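
As a rough sketch of how to look at per-snapshot change more directly: ZFS also has a `written` property, which reports how much data was written to the dataset since the previous snapshot. USED on a snapshot, by contrast, counts only the space unique to that one snapshot. The helper name `snapshot_deltas` is hypothetical and only for illustration.

```shell
# Hypothetical helper: list USED, REFER and WRITTEN for every snapshot of a dataset.
# WRITTEN is the data written between the previous snapshot and this one;
# USED is only the space that would be freed if this snapshot were destroyed.
snapshot_deltas() {
    dataset="$1"
    zfs list -r -t snapshot -o name,used,refer,written "$dataset"
}

# Example (run on the FreeNAS box itself):
#   snapshot_deltas Storage/ImageBackups
```

If WRITTEN is in the terabytes while USED is in the megabytes, the week-to-week change is large even though little of it is unique to any single snapshot.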

dpearcefl · Mar 6, 2018

"Used" of 1.4M or 46.4M can't be bytes. This data changes way too often for either of these to be the amount of space used by the snapshot. And the replication takes days of 500Mb/sec traffic. Thanks.
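
For a sense of scale, here is a back-of-the-envelope sketch of how much data a sustained link rate moves in one day. It assumes the 500 Mb/sec figure is megabits per second and uses decimal terabytes (`zfs list` prints binary units, so the numbers are only roughly comparable). The function name `tb_per_day` is made up for this example.

```shell
# Hypothetical helper: terabytes (decimal) moved in 24 hours at a given rate in megabits/sec.
tb_per_day() {
    awk -v mbps="$1" 'BEGIN {
        bytes_per_sec = mbps * 1000000 / 8      # megabits/sec -> bytes/sec
        printf "%.2f\n", bytes_per_sec * 86400 / 1e12
    }'
}

tb_per_day 500   # prints 5.40
```

So several days at that rate is on the order of the whole 15-18T dataset, not a few megabytes.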

bigphil · Mar 6, 2018

Yes, it's bytes. The USED value for a snapshot in the output of "zfs list" is just confusing to most people. Here is some reading for you: link1, link2.
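
To answer the original "how big is the snapshot really" question without guessing from USED, one option is a dry-run send, which prints the estimated stream size and transfers nothing. This is a minimal sketch: `-n` (dry run), `-v` (verbose) and `-i` (incremental from an earlier snapshot) are standard `zfs send` options, but the helper name `estimate_incremental` is hypothetical, and the flags the FreeNAS GUI actually passes during replication may differ.

```shell
# Hypothetical helper: estimate the size of an incremental stream between two
# snapshots of the same dataset. Nothing is sent because of -n.
estimate_incremental() {
    dataset="$1"
    from_snap="$2"
    to_snap="$3"
    zfs send -n -v -i "${dataset}@${from_snap}" "${dataset}@${to_snap}"
}

# Example (run on the sending FreeNAS box):
#   estimate_incremental Storage/ImageBackups auto-20180219.0700-1d auto-20180305.0700-1d
```

The estimate reflects everything that changed between the two snapshots, which is what actually crosses the network, so it can be far larger than either snapshot's USED value.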
