Replication target volume freezes on new snapshots


dniq
Hello all!

I have set up replication between two FreeNAS servers, hoping I could use the replication target server in read-only mode (it's read by a bunch of web servers). Alas, no luck: the whole replica volume freezes for several seconds every time a new snapshot comes in from the origin server.

Is that normal behavior? Can the target really not be used for any purpose other than as a backup?
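
In case anyone wants to reproduce this: I've been lining the freeze times up against snapshot arrivals on the target with something like this (the pool/dataset name is a placeholder for whatever you replicate):

Code:
# list the replica's snapshots with creation times, oldest first,
# so freezes can be matched against snapshot arrivals
zfs list -t snapshot -o name,creation -s creation -r tank/dataset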
 

dniq
One odd thing I noticed: the ARC seems to get completely reset every time a new snapshot is added on the target. Normally it's about 20G in size, but while replication is running it drops down to about 200-300 megabytes... Really odd.
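
Here's roughly how I'm watching it, if anyone wants to check theirs. On FreeBSD/FreeNAS the live ARC size is exposed as a sysctl, in bytes:

Code:
#!/bin/sh
# print the ARC size once a second while replication runs
while :; do
    sysctl -n kstat.zfs.misc.arcstats.size
    sleep 1
done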
 

dniq
So, it looks like if the target volume is in use (even read-only), it freezes whenever a new snapshot is received (or somewhere around that point: before, during, or after the snapshot arrives from the origin FreeNAS server). If there are no open files on the volume, the process takes milliseconds. If many files are open (the volume is shared via NFS and read by multiple servers), it freezes for up to 25-30 seconds every time a new snapshot is received... which makes the whole idea completely useless :(

Looks like ZFS replication is only useful for backing volumes up to a target that's not being used. Back to square one for me, then, on figuring out how to most efficiently replicate a volume with millions of files in hundreds of thousands of subdirectories from one data center to another quickly (rsync takes about 3-4 hours)...
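
For the record, this is roughly how I measured the freezes from one of the NFS clients (the mount point is a placeholder):

Code:
#!/bin/sh
# crude stall probe: time a directory listing on the NFS mount once a second
while :; do
    t0=$(date +%s)
    ls /mnt/replica > /dev/null
    echo "$(date '+%T') listing took $(( $(date +%s) - t0 ))s"
    sleep 1
done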
 

cyberjock
Well, why haven't you posted your hardware? Complaining about something with no information to go on isn't going to help solve your problem...
 

dniq
cyberjock said:
Well, why haven't you posted your hardware? Complaining about something with no information to go on isn't going to help solve your problem...

Dell R720xd server with 8x3TB SAS disks, 2x8-core CPUs, and 32GB RAM (1600MHz). The pool is made of four 2-disk mirrors. Autotune was on at one point; I then added a few TCP parameters to sysctl to increase the size of the TCP window.

The replication is done for only one dataset, not the whole volume. The dataset is shared via NFS and mounted on 20 servers from each of the origin and target FreeNAS servers.

When a new snapshot is received on the target, the entire volume locks up for up to 30 seconds whenever there's any read activity on the dataset.
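
If it helps, the pool layout is essentially this (device names are placeholders for the actual SAS disks):

Code:
# four 2-disk mirror vdevs striped together
zpool create tank \
    mirror da0 da1 \
    mirror da2 da3 \
    mirror da4 da5 \
    mirror da6 da7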
 

cyberjock
Unrelated - I'd recommend autotune be disabled with 32GB of RAM. Note that you'll have to delete the autotune entries from tunables/sysctls and reboot for them to actually go away.

How big is this dataset and how full is it?
 

dniq
Autotune only changed the arc_max setting and some IP-related ones - it doesn't seem to have done anything bad.

The dataset has no quota on it. The whole volume is 10T, and the dataset has about 130G used. With the other datasets on the volume, about 300G of the 10T is used.

I also have another FreeNAS running in VMware, with 8G of RAM and a 256G disk - same problem there as well. Once the replica is in use, "zfs receive" locks up the entire volume :( I tried both with the -F option and without - it makes no difference (I thought maybe the extra step of rolling back to the last snapshot might be making it worse).
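
For reference, what I'm testing is essentially the incremental step that replication performs - roughly this shape, with placeholder snapshot/dataset/host names:

Code:
# incremental send of the delta between two snapshots; -F makes the
# receiving side roll back to the last common snapshot before applying
zfs send -i tank/data@snap1 tank/data@snap2 | \
    ssh target-host zfs receive -F tank/data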
 

cyberjock
The arc_max will be overly limiting for the amount of system RAM. Saying "it doesn't seem like it's done anything bad" doesn't mean it's done anything good either. In your case, you are probably using less RAM with autotune enabled than you would without it.

So I'll say it again: for your system I'd delete the tunables/sysctls from autotune. Amazing that you're going to tell me it hasn't done anything bad... yet you are having problems and I'm not.

If you are rolling back snapshots, that will definitely tie up the server for a period of time.
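
That's this kind of operation, for what it's worth (names are placeholders):

Code:
# roll the dataset back to a snapshot; -r also destroys any
# snapshots newer than the one you roll back to
zfs rollback -r tank/data@snap1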
 

dniq
vfs.zfs.arc_max = 22949403125 (~22G of arc_max; the autotune default minimum)

At any rate, without this setting the ARC cache ends up the same size, and it makes no difference to the problem I'm experiencing :(

On the VM (with 8GB of RAM), arc_max = 4432639518.

As for the sysctls - I have changed them all (the defaults and the autotune ones make ssh extremely slow, about 700KB/s; with my changes it goes up to 10MB/s).
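
To give an idea, the changes are along these lines - the standard FreeBSD TCP buffer sysctls; the values here are illustrative rather than exactly what I'm running:

Code:
# allow larger TCP windows on the high-latency NY-to-CA link
net.inet.tcp.sendbuf_max=16777216
net.inet.tcp.recvbuf_max=16777216
net.inet.tcp.sendspace=262144
net.inet.tcp.recvspace=262144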

I have been unable to find any info on anyone using the replica while it's being replicated to. All I could find is people using it for backups :(

Are you actively using a replica? I mean, I have two completely different FreeNAS setups (the Dell server with FreeNAS running on bare metal off SD cards, and the VM under VMware ESXi 4.1), both of which have the same problem...
 

cyberjock
Just for comparison, I have 20GB of RAM in my server, and my vfs.zfs.arc_max is 19717812224. Your arc_max should encompass more than 90% of your system RAM unless you are running other services that need large quantities of RAM. In your case, capping it at ~23GB out of 32GB wastes a lot of RAM, and I can't imagine you are running any services on FreeNAS that need 9GB. Other than that, I've said all I'm going to say. I think your arc_max is very low for 32GB of RAM.

ZFS replication was meant to be used for backups only. I've never heard of someone trying to use the replica itself, because it won't replicate back to the original server. I know people go to the replica to grab an old file they realize they've deleted, but I don't think anyone actually tries to use a replica while it's replicating. Not to mention things might get ugly if you start editing the replica while it's replicating, or if you later need to push it back to the primary server; I'm not really sure whether that would work properly. I've always treated it as a backup storage location to grab a file you accidentally deleted or edited, not something you'd actually use beyond that.

When I set up replication servers for other people, I do offer to give them access via a share. But I always make the shares read-only as a conservative choice, so they don't do anything to break things.
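
On top of the share-level setting, you can also mark the dataset itself read-only on the ZFS side (dataset name is a placeholder):

Code:
zfs set readonly=on tank/replica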
 

dniq
Obviously the replica is being used in read-only mode. I've had NetApp filers before, and replication there was completely transparent - I could use the replica in read-only mode without ever noticing any slowdowns. But those filers are way past their lifetime, so I thought FreeNAS could be a good substitute...

I was using rsync on FreeNAS before I decided to give replication a shot, and rsync over nearly 20 million files/subdirectories can eat up quite a big chunk of RAM. But rsync takes nearly 4 hours to complete, and I need no more than 5 minutes of latency between content changing on the origin server and the change being replicated to the target. I had written a Perl script that was called every time changes were made on the origin server and copied them over to the target.

I guess I'm going to see whether I can use FAM to detect changes automatically and replicate them to the remote site - the push step itself would look something like the sketch below - but I don't expect it to be very efficient: there are literally hundreds of thousands of subdirectories, up to 5 levels deep, and I'm not sure watching each of them with FAM will work well...
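
Something along these lines, where $1 is the changed path relative to the dataset root and the hosts/paths are placeholders:

Code:
#!/bin/sh
# push a single changed path instead of walking the whole tree;
# -R (--relative) with the /./ marker keeps the path below it
# intact on the target side
rsync -aR "/mnt/tank/content/./$1" target-ca:/mnt/tank/content/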
 

cyberjock
Why not just point people at the original server? It seems like a lot of work to keep 2 servers in sync, and I can't think of a situation where that would be beneficial.
 

dniq
Because the origin is in New York and the target is in California, and they're serving content to web servers via NFS. I can't have web servers in CA mounting an NFS server in NY.
 