Large data transfer dies after running for a while

Status
Not open for further replies.

jonlake

Cadet
Joined
Feb 19, 2015
Messages
1
We had been running OpenSolaris with ZFS for some time. The data totals around 8 TB across a few different volumes. We recently bought two new servers (Supermicro storage servers with 16 GB of RAM and WD Red drives): one to keep on site (New Server B) and one to keep off-site (New Server A).

We ran our own rsync script over SSH to do the initial migration from our old server to one of the new servers, and this went fine. We are currently up and running on Server A.

We are now trying to move the data from new Server A to new Server B. I set up rsync through the web client. Once it started dying after running for 10+ hours, we switched to ZFS replication jobs. The same thing happens: the transfer dies after a while (10+ hours) for no apparent reason. It has died at different times.

The logs on both ends state that the other server isn't reachable. This usually happens overnight. I will try to get the actual log statements.

When I get in the next morning, both servers are online and have been up the whole time, and I see no reason why they stopped communicating. I haven't found any way to get more verbose logging than this. I have checked the switch logs, and there is no indication of ports bouncing up/down.

Server A is running on a single 1 Gb port. Server B is running over a LAGG (4x 1 Gb).

Any ideas on how to figure out why either rsync or ZFS replication is dying?
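
Since the existing logs only say that the other server isn't reachable, one way to narrow this down might be to record exactly when reachability changes and compare those timestamps with the switch and job logs. Below is a minimal sketch of that idea, assuming Python is available on one of the servers (or on a third machine on the same network) and that the peer accepts TCP connections on the SSH port. The function names, the default port, the polling interval, and the log file name are all illustrative assumptions, not anything taken from the servers' own tooling.

```python
import socket
import time
from datetime import datetime


def is_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


def describe_transition(was_up, is_up):
    """Return a label when the state changes, or None when it is unchanged."""
    if was_up == is_up:
        return None
    return "UP -> DOWN" if was_up else "DOWN -> UP"


def monitor(host, port=22, interval=10.0, logfile="reachability.log"):
    """Poll host:port forever and append a timestamped line on each change.

    The initial state is assumed to be "up", so a peer that is already
    unreachable at startup is logged as a change on the first poll.
    """
    was_up = True
    while True:
        is_up = is_reachable(host, port)
        change = describe_transition(was_up, is_up)
        if change:
            with open(logfile, "a") as f:
                f.write(f"{datetime.now().isoformat()} {host}:{port} {change}\n")
        was_up = is_up
        time.sleep(interval)
```

Calling `monitor("server-b.example.lan")` (a hypothetical hostname) from Server A before starting the overnight job would leave a file whose timestamps show whether the connection actually drops, and for how long. Running the same check from a third machine against both servers could help tell a problem on one host or its LAGG apart from a problem in the path between them.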
 