ZFS replication issue originally posted in '4 noobs'

Status: Not open for further replies.

tom__w

Explorer
Joined: Mar 26, 2013
Messages: 87
No response there, so I suppose I should have posted here:

I have a primary FreeNAS system (version 9.1.0-RELEASE) set to take hourly ZFS snapshots, which I replicate to a backup box running the same version. The process has been working flawlessly for months and months.

I decided to upgrade my backup box from a desktop to something more substantial: a Dell 2950 server. Unfortunately, I didn't find out until after the purchase that I couldn't set up the 2950's disks as JBOD, so I am using the Perc controller in a RAID 5 configuration instead. I know, I know: with ZFS you're not supposed to use hardware RAID. But that's not the problem (or maybe it is).

I installed 9.2.1-RELEASE on this new (used) box and added a ZFS replication task on my primary box to start replicating to it (the same way I did for my other backup box). I followed the instructions here: http://doc.freenas.org/index.php/Replication_Tasks (with "Initialize remote side" checked in the settings).

Once the task was created, I saw the pool on the remote (pull) side get deleted, and then nothing happened for 6 hours. Then I got the following email:

Hello,
The system was unable to replicate snapshot ZFS1 to 192.168.254.10
======================
WARNING: could not send ZFS1@auto-20140211.1300-4d: does not exist
16+1 records in
0+1 records out
8424 bytes transferred in 3.564324 secs (2363 bytes/sec)
16+1 records in
0+1 records out
8424 bytes transferred in 3.564435 secs (2363 bytes/sec)
cannot receive: failed to read from stream

Three minutes later I got this email:
Hello,
The system was unable to replicate snapshot ZFS1 to 192.168.254.10
======================
cannot receive incremental stream: most recent snapshot of ZFS2 does not
match incremental source
dd: stdout: Broken pipe
2048+0 records in
0+0 records out
0 bytes transferred in 1.632210 secs (0 bytes/sec)
dd: stdout: Broken pipe
4083+26 records in
1+0 records out
1048576 bytes transferred in 1.633266 secs (642012 bytes/sec)
warning: cannot send 'ZFS1@auto-20140215.1900-4d': Broken pipe
cannot send 'ZFS1': I/O error
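For readers following the second error ("most recent snapshot of ZFS2 does not match incremental source"): an incremental stream (`zfs send -i base target`) can only be received if the destination's newest snapshot is the same snapshot as `base`. The sketch below models that rule to show which cases call for an incremental send, a full re-send, or cleanup on the destination. It is a simplified illustration, not how FreeNAS's replication script decides: real ZFS compares snapshot GUIDs rather than names, and `zfs receive -F` can force a rollback, which this sketch ignores. The function names are hypothetical.

```python
from typing import Optional, Sequence

# Simplifying assumption: snapshot names stand in for the snapshot GUIDs
# that ZFS actually compares. Lists are ordered oldest -> newest.


def latest_common_snapshot(source: Sequence[str], dest: Sequence[str]) -> Optional[str]:
    """Return the newest source snapshot that also exists on the destination."""
    on_dest = set(dest)
    for snap in reversed(source):
        if snap in on_dest:
            return snap
    return None


def can_receive_incremental(base: str, dest: Sequence[str]) -> bool:
    """An incremental stream from `base` applies only if `base` is the
    destination's most recent snapshot (forced rollbacks are ignored)."""
    return len(dest) > 0 and dest[-1] == base


def replication_plan(source: Sequence[str], dest: Sequence[str]) -> str:
    """Describe what the next replication run would need to do."""
    if not source:
        return "nothing to send"
    if not dest:
        return f"full send of {source[-1]}"
    base = latest_common_snapshot(source, dest)
    if base is None:
        return "no common snapshot: full re-send required"
    if not can_receive_incremental(base, dest):
        return f"destination has snapshots newer than {base} that the source lacks"
    if base == source[-1]:
        return "up to date"
    return f"incremental from {base} to {source[-1]}"


if __name__ == "__main__":
    src = ["auto-20140215.1700-4d", "auto-20140215.1800-4d", "auto-20140215.1900-4d"]
    # Destination is one snapshot behind: a normal incremental works.
    print(replication_plan(src, src[:2]))
    # prints: incremental from auto-20140215.1800-4d to auto-20140215.1900-4d
```

In the "destination has snapshots newer than ..." case, `can_receive_incremental` is False for the common base. That is roughly the situation the second email reports.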

Not knowing what else could be the problem, I was going to reinstall the new box with the same software version as the other two.

Thoughts?

Tom

PS - Replication to my original backup box is still working fine, even after this failure.


Sent from my SPH-L720 using Tapatalk
 

cyberjock

Inactive Account
Joined: Mar 25, 2012
Messages: 19,526
You should stop and walk away from the Perc right now. Using hardware RAID is going to mask so many disk problems and raise so many unanswered questions that it's not even worth your time to keep trying to solve the problem.

My first guess is that you have some kind of hardware problem, perhaps with one or more disks. Since the disks are hidden behind the RAID, you have no way of running SMART tests or SMART monitoring to prove or disprove my theory. Go get yourself a compatible controller (an IBM M1015 is highly recommended) and use that.
 

tom__w

Explorer
Joined: Mar 26, 2013
Messages: 87
I understand. So you are saying that nothing I am doing is inherently wrong.

I am curious, would you say:

Perc could be causing the problem
Or
Perc is simply complicating the issue

I am not sure how I can add the controller you mentioned. Doesn't Dell use a proprietary backplane and cabling?

 

cyberjock

Inactive Account
Joined: Mar 25, 2012
Messages: 19,526
> I understand. So you are saying that nothing I am doing is inherently wrong.

No, you are doing something inherently wrong by putting ZFS on a hardware RAID array. The manual makes it clear that you should never put ZFS pools on hardware RAID. You broke that cardinal rule by using the Perc.


> I am curious, would you say:
>
> Perc could be causing the problem
> Or
> Perc is simply complicating the issue

Either one is possible. There's no way to know until you rule out the Perc.

> I am not sure how I can add the controller you mentioned. Doesn't Dell use a proprietary backplane and cabling?

Not sure. I don't buy Dell stuff, and I can't really validate with certainty that your hardware does or doesn't have proprietary requirements. You'll have to find that out for yourself.
 

tom__w

Explorer
Joined: Mar 26, 2013
Messages: 87
Unfortunately, RAID wasn't my choice; it was a "let's see if it works" experiment. 'Nuff said, I will replace the Perc.

Now (of course) I am having issues with the replication task to my original backup FreeNAS box. Errors on the primary say that the snapshot couldn't be deleted because the dataset is busy. It must have been caused by a reboot of the backup box mid-replication...

Would it be safe to delete ALL snapshots from both the primary and backup boxes and start fresh? This is a production box and I am very afraid of deleting data.
 

cyberjock

Inactive Account
Joined: Mar 25, 2012
Messages: 19,526
If you don't need the data that is in those snapshots then it is safe to delete them.
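Before deleting everything on a production box, it can help to see exactly which snapshots would go. The sketch below is a hypothetical helper, not a FreeNAS feature. It takes the text printed by `zfs list -H -t snapshot -o name`, keeps only the automatic snapshots of one dataset, and builds `zfs destroy` commands for review without running anything. The `auto-YYYYMMDD.HHMM-<retention>` pattern is an assumption inferred from snapshot names in the error emails above (e.g. `auto-20140211.1300-4d`). Manually named snapshots and child datasets are deliberately left out.

```python
import re
import shlex
from typing import List

# Assumed naming for FreeNAS periodic snapshots, inferred from names in the
# error emails such as "auto-20140211.1300-4d": auto-YYYYMMDD.HHMM-<retention>.
AUTO_SNAP = re.compile(r"^auto-\d{8}\.\d{4}-\w+$")


def auto_snapshots(zfs_list_output: str, dataset: str) -> List[str]:
    """Select automatic snapshots of exactly `dataset` (child datasets are
    skipped) from the output of `zfs list -H -t snapshot -o name`."""
    picked = []
    for line in zfs_list_output.splitlines():
        name = line.strip()
        if "@" not in name:
            continue
        ds, snap = name.split("@", 1)
        if ds == dataset and AUTO_SNAP.match(snap):
            picked.append(name)
    return picked


def destroy_commands(snapshots: List[str]) -> List[str]:
    """Build, but do not run, one `zfs destroy` command per snapshot."""
    return [f"zfs destroy {shlex.quote(s)}" for s in snapshots]


if __name__ == "__main__":
    # Hypothetical listing; on a real box this text would come from
    # running `zfs list -H -t snapshot -o name`.
    listing = "\n".join([
        "ZFS1@auto-20140211.1300-4d",
        "ZFS1@auto-20140211.1400-4d",
        "ZFS1@manual-before-upgrade",
        "ZFS1/media@auto-20140211.1300-4d",
    ])
    for cmd in destroy_commands(auto_snapshots(listing, "ZFS1")):
        print(cmd)
```

Keep in mind that removing the snapshots on both sides also removes the common base snapshot. The next replication run will then have to send the whole dataset again from scratch.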
 

tom__w

Explorer
Joined: Mar 26, 2013
Messages: 87
Can I just select all and delete? Is it as process-intensive as the manual implies?
 

cyberjock

Inactive Account
Joined: Mar 25, 2012
Messages: 19,526
It will be as intensive as the amount of data it has to delete. It might take a few seconds, it might take a few minutes. I'd do it when nobody plans to use the server for about 20 minutes, as a buffer in case it goes unresponsive while it deletes the snapshots. ;)
And then just wait it out. If you reboot the server because it's taking too long, it'll have to start over before the pool will mount.
 