ZFS Replication problem

Status
Not open for further replies.

tom__w

Explorer
Joined
Mar 26, 2013
Messages
87
I have a primary FreeNAS system (ver 9.1.0 Release) set to take hourly ZFS snapshots. I then replicate to another backup box running the same version. The process has been working flawlessly for months and months.

I just decided to upgrade my backup box from a desktop to something more substantial: a Dell 2950 server. Unfortunately, I didn't find out until after the purchase that I couldn't set up the 2950's disks as JBOD, so I am using the PERC controller in a RAID 5 configuration instead. I know, I know: ZFS is the reason not to use hardware RAID. But that's not the problem (or maybe it is).

I installed 9.2.1 Release on this new (used) box and added a ZFS replication task on my primary box to start replicating to it, the same way I did for my other backup box. I followed the instructions at http://doc.freenas.org/index.php/Replication_Tasks (with "Initialize remote side" enabled in my settings).

Once the task was created, I saw the pool on the remote (pull) side get deleted, and then nothing happened for 6 hours. Then I got the following email:

Hello,
The system was unable to replicate snapshot ZFS1 to 192.168.254.10
======================
WARNING: could not send ZFS1@auto-20140211.1300-4d: does not exist
16+1 records in
0+1 records out
8424 bytes transferred in 3.564324 secs (2363 bytes/sec)
16+1 records in
0+1 records out
8424 bytes transferred in 3.564435 secs (2363 bytes/sec)
cannot receive: failed to read from stream

Three minutes later I got this email:

Hello,
The system was unable to replicate snapshot ZFS1 to 192.168.254.10
======================
cannot receive incremental stream: most recent snapshot of ZFS2 does not match incremental source
dd: stdout: Broken pipe
2048+0 records in
0+0 records out
0 bytes transferred in 1.632210 secs (0 bytes/sec)
dd: stdout: Broken pipe
4083+26 records in
1+0 records out
1048576 bytes transferred in 1.633266 secs (642012 bytes/sec)
warning: cannot send 'ZFS1@auto-20140215.1900-4d': Broken pipe
cannot send 'ZFS1': I/O error

I was thinking of reloading the new box with the same software version as the primary, since I don't know what else could be the problem.

Thoughts?

Tom

PS - Replication to my original backup box is still working fine, even after this failure.
 

tom__w

Question: is it possible that the snapshot took so long to replicate that it caused a problem? The reason I ask is that I can now see, on the remote box, the beginning of the original snapshot from the push box, and the disk space used there is about half of what it should be.
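One way to check whether a slow initial copy can outlast its snapshot: the names in the error emails (e.g. `ZFS1@auto-20140211.1300-4d`) appear to follow FreeNAS's auto-snapshot pattern, where the trailing `-4d` is the retention lifetime. The sketch below assumes that naming scheme and handles only hour/day/week suffixes; `snapshot_expiry` and `outlives_transfer` are hypothetical helpers, not part of FreeNAS.

```python
import re
from datetime import datetime, timedelta

# Assumed naming scheme: <dataset>@auto-YYYYMMDD.HHMM-<N><unit>, where the
# suffix is the retention lifetime (e.g. "-4d" = four days). Months and years
# are deliberately not handled because their length varies.
_NAME_RE = re.compile(r"@auto-(\d{8}\.\d{4})-(\d+)([hdw])$")
_UNITS = {"h": "hours", "d": "days", "w": "weeks"}


def snapshot_expiry(name: str) -> datetime:
    """Return the time at which an auto snapshot is expected to be pruned."""
    m = _NAME_RE.search(name)
    if m is None:
        raise ValueError(f"unrecognised snapshot name: {name!r}")
    taken = datetime.strptime(m.group(1), "%Y%m%d.%H%M")
    lifetime = timedelta(**{_UNITS[m.group(3)]: int(m.group(2))})
    return taken + lifetime


def outlives_transfer(name: str, start: datetime, est_duration: timedelta) -> bool:
    """True if the snapshot should still exist when the transfer finishes."""
    return start + est_duration < snapshot_expiry(name)
```

For the snapshot in the first email, `snapshot_expiry("ZFS1@auto-20140211.1300-4d")` gives 2014-02-15 13:00, so a transfer still running at that point would be working from a snapshot that the retention policy is about to remove.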
 

tom__w

Yes, I think so. I purchased an M1015 and flashed it to LSI IT mode. The card works well, but it didn't fix my problem.

There are two problems from the messages above:

Broken pipe: I figured out that this happens when the connection between the boxes is broken (for whatever reason) during the transfer. SSH apparently doesn't recover from that, so the whole transfer fails.
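If dropped connections are the cause, one workaround is simply to re-run the transfer when it fails. The sketch below is a hypothetical wrapper, not something FreeNAS provides. It assumes that a plain `zfs send` stream, as used here, cannot continue where it stopped, so every retry starts the stream over.

```python
import subprocess
import sys
import time
from typing import Sequence


def run_with_retries(cmd: Sequence[str], attempts: int = 3, delay_s: float = 60.0) -> bool:
    """Run cmd, re-running it from the start each time it exits non-zero.

    Returns True as soon as one run succeeds, False if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return True
        print(f"attempt {attempt}/{attempts} failed (exit {result.returncode})", file=sys.stderr)
        if attempt < attempts:
            time.sleep(delay_s)  # give the link a moment before starting over
    return False


# Hypothetical usage (snapshot names OLD/NEW are placeholders, not real ones):
#   pipeline = ("zfs send -i ZFS1@auto-OLD ZFS1@auto-NEW | "
#               "ssh -o ServerAliveInterval=30 192.168.254.10 zfs receive ZFS2")
#   run_with_retries(["sh", "-c", pipeline])
# A plain sh pipeline reports the exit status of its last command (zfs receive).
```

The OpenSSH `ServerAliveInterval` option in the usage comment only makes a dead link fail faster. It does not resume anything.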

The second problem (listed first above) happened because the snapshot task deleted the snapshot that the replication process was expecting to use as its incremental source.
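The rule behind the "does not match incremental source" message is that `zfs receive` accepts an incremental stream only if its base is the destination's most recent snapshot, and `zfs send -i` needs that same base to still exist on the source. The hypothetical helper below applies that rule to two lists of snapshot names (oldest first). It is a sketch for reasoning about the failure, not how the FreeNAS replication script chooses its base.

```python
from typing import Optional, Sequence


def incremental_base(source: Sequence[str], dest: Sequence[str]) -> Optional[str]:
    """Return the snapshot to pass to `zfs send -i`, or None if there is none.

    Both lists hold snapshot names (the part after '@'), oldest first. The base
    must be the newest snapshot on the destination and must still exist on the
    source. None means an incremental cannot be sent as things stand.
    """
    if not dest:
        return None  # nothing received yet: only a full send is possible
    newest_on_dest = dest[-1]
    return newest_on_dest if newest_on_dest in set(source) else None
```

If the retention policy has already pruned the destination's newest snapshot from the source, the function returns None. This matches the situation described above.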

I found that setting up a new replication works best outside the hours when snapshots are being taken.
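To pick those hours, one approach is to treat the snapshot schedule as a daily begin/end window and list the hours that fall outside it. The helpers and the example 09:00 to 18:00 window below are hypothetical. Substitute whatever begin and end times your own snapshot task uses.

```python
from datetime import time
from typing import List


def in_window(t: time, begin: time, end: time) -> bool:
    """True if t falls in [begin, end); also handles windows that wrap past midnight."""
    if begin <= end:
        return begin <= t < end
    return t >= begin or t < end


def hours_outside(begin: time, end: time) -> List[int]:
    """Whole hours of the day whose start lies outside the snapshot window."""
    return [h for h in range(24) if not in_window(time(h), begin, end)]
```

With a 09:00 to 18:00 window, `hours_outside(time(9), time(18))` lists 0 through 8 and 18 through 23. These are the candidate hours for starting a long initial replication.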

I also recently received some error messages suggesting that the boot loader thinks the NIC drivers are missing. I ordered a dual-port Intel NIC to replace the onboard Broadcom NICs, because I was concerned that this error might also break the SSH connection.

Tom

 