PhiloEpisteme · Guru · Joined: Oct 18, 2018 · Messages: 969
Hi folks,
Source Machine
Backplane: SAS846EL2
FreeNAS Release: FreeNAS-11.2-RELEASE-U3
Board: SuperMicro X11SSM-F
Processor: Intel Core i3-7100 @ 3.90GHz, 2 Cores, 4 Threads
Memory: 2x16GB Crucial CT16G4WFD8266 DDR4-2666 ECC UDIMM
NIC: Chelsio T520-CR
HBA: LSI/Broadcom SAS9207-8i, 6Gbps SAS PCI-E 3.0 HBA, Firmware 20.00.07.00
tank: 1x RAIDZ2, 6x 3TB vdev
vault (encrypted): 1x RAIDZ2, 6x 2TB vdev
Boot pool: mirrored 2 x 64GB SuperMicro SATA DOM SSD-DM064-SMCMVN1
Target Machine
FreeNAS Release: FreeNAS-11.2-RELEASE-U6
Board: SuperMicro X9SCM-F
Processor: Intel Xeon E3-1220 @ 3.10 GHz, 4 Cores, 4 Threads
Memory: 1x8GB Crucial CT102472BD160B DDR3 SDRAM-1600 ECC UDIMM DDR3L
HBA: LSI/Broadcom SAS9210-8i, 6Gbps SAS PCI-E 2.0 HBA, Firmware 20.00.07.00
Expander: IBM 46M0997 16-Port SAS2, Firmware 634A
Backplane: Supermicro BPN-SAS-836TQ
NIC: Chelsio T520-CR
tank1: 1x mirrored, 2x 8TB vdev
tank2: 1x mirrored, 2x 8TB vdev
vault1: 1x mirrored, 2x 8TB vdev
vault2: 1x mirrored, 2x 8TB vdev
Boot pool: 1 mirror vdev, 2 x 64GB TOSHIBA-TR200 SSD
Network:
The two machines are connected via fibre directly between the two 10G NICs.
For about a year now my backup strategy has been as follows: every month or so I swap tank1 and vault1 out for tank2 and vault2 in my backup server. At the time of the swap I also disable the replication tasks on the source machine for the two pools being swapped out and enable the two new ones. There are four replication tasks on the source machine, one for each target pool on the target machine.
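For context, the pool swap itself boils down to exporting the outgoing pools and importing the incoming ones on the backup server (in FreeNAS this is normally done via the GUI's Detach/Import Volume actions rather than the raw CLI; the commands below are just a sketch of the equivalent steps):

```shell
# On the backup (target) machine, before physically pulling the drives:
zpool export tank1
zpool export vault1
# ...swap in the other drive pair, then bring those pools online:
zpool import tank2
zpool import vault2
```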
Recently I had to rebuild some of my target replication pools. I rebuilt tank1 and vault1 completely. Replication was normal to vault1 and only took a few hours from an empty pool to replication completed.
Replication for tank1, however, was impossibly slow. Rather than reporting a percent complete when I click the replication icon in the top right, as happened with vault1, tank1 only shows "sending"; the same is true in the replication task menu. After several hours only a few hundred megabytes had been transferred.
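One sanity check along these lines is to rule out the 10G link itself with iperf3 (it may need to be installed; the target IP below is the one that appears in my autorepl log further down):

```shell
# On the target machine:
iperf3 -s

# On the source machine, run a 10-second throughput test against the target:
iperf3 -c 192.168.13.149 -t 10
```

If the link tests at close to 10 Gbit/s, the bottleneck is somewhere in the replication pipeline rather than the network.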
I have verified that I am able to SSH between the machines using the key in /data/ssh/replication.

The replication task is configured with "Recursively Replicate Child Dataset Snapshots" and "Delete Stale Snapshots on Remote System". The begin time is 00:00:00 and the end time is 23:59:00. I am not using a dedicated user.
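A non-interactive check of the key (the same way autorepl invokes ssh, judging by the debug log below: port 23, BatchMode) would look roughly like this; a prompt or failure here points at the SSH setup rather than ZFS:

```shell
# On the source machine; IP and port taken from the autorepl log lines.
ssh -i /data/ssh/replication -o BatchMode=yes -p 23 192.168.13.149 "zfs list -H -o name | head"
```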
Snapshots are configured as recursive snapshots, also from 00:00:00 to 23:59:00, taken every 15 minutes and kept for up to 3 days. Only a single snapshot task is configured for this pool, and it is configured on the top-most dataset.
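At 15-minute intervals with recursion over every child dataset, the snapshot count can grow large (roughly 96 per dataset per day), and the legacy replicator iterates snapshots one incremental send at a time. A quick way to gauge the backlog on the source machine:

```shell
# Count snapshots per pool; a very high count on tank relative to vault
# would explain why tank1 crawls while vault1 finished in hours.
zfs list -H -t snapshot -o name -r tank | wc -l
zfs list -H -t snapshot -o name -r vault | wc -l
```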
I've tried debugging this myself and come up empty-handed. On the sending machine I moved the system dataset off of vault and onto the boot drive; that had no effect.
I have also tried deleting and recreating the replication task, with no luck.
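To separate GUI/middleware problems from the transport itself, a manual send can be timed outside the replication framework. This is only a sketch mirroring the pipeline autorepl logs below (port 23, lz4c compression, receive with -F -d); SNAP stands in for a real snapshot name from `zfs list -t snapshot`:

```shell
# 1. Local read speed, no network involved:
zfs send tank@SNAP > /dev/null

# 2. Full manual pipeline to the target, as autorepl runs it:
zfs send tank@SNAP | /usr/local/bin/lz4c | \
  ssh -i /data/ssh/replication -p 23 192.168.13.149 \
  "/usr/bin/env lz4c -d | zfs receive -F -d tank1"
```

If step 1 is fast but step 2 crawls, the problem is in SSH/compression/receive on the wire; if step 1 is already slow, it is on the sending pool.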
I don't see anything immediately relevant in /var/log/messages except the following, which is likely related to moving the system dataset:

Code:
Oct 15 00:00:00 NAS newsyslog[74345]: logfile turned over due to size>200K
Oct 15 00:00:00 NAS syslog-ng[19121]: Configuration reload request received, reloading configuration;
Oct 15 00:00:00 NAS syslog-ng[19121]: Configuration reload finished;
Oct 15 03:40:55 NAS syslog-ng[19121]: syslog-ng shutting down; version='3.14.1'
Oct 15 03:46:34 NAS syslogd: kernel boot file is /boot/kernel/kernel
Oct 15 03:47:23 NAS /autosnap.py: [tools.autosnap:535] Failed to create snapshot 'vault@auto-20191015.0347-1w':
cannot create snapshot 'vault/.system/cores': dataset does not exist
cannot create snapshot 'vault/.system/cores@auto-20191015.0347-1w': dataset does not exist
no snapshots were created
In /var/log/debug.log I see the following:

Code:
Oct 15 03:47:21 NAS /autorepl.py: [tools.autorepl:131] Sending zfs snapshot: /sbin/zfs send -V -p -i vault/iocage/jails/phabricator/root@auto-20191014.1325-1w vault/iocage/jails/phabricator/root@auto-20191014.1330-1w | /usr/local/bin/lz4c | /usr/local/bin/pipewatcher $$ | /usr/local/bin/ssh -i /data/ssh/replication -o BatchMode=yes -o StrictHostKeyChecking=yes -o ConnectTimeout=7 -p 23 192.168.13.149 "/usr/bin/env lz4c -d | /sbin/zfs receive -F -d 'vault2' && echo Succeeded"
Oct 15 03:47:23 NAS /autosnap.py: [tools.autosnap:614] Autorepl running, skip destroying snapshots
Oct 15 03:47:23 NAS /autorepl.py: [tools.autorepl:150] Replication result: Succeeded
Oct 15 03:47:23 NAS /autorepl.py: [tools.autorepl:131] Sending zfs snapshot: /sbin/zfs send -V -p -i vault/iocage/jails/phabricator/root@auto-20191014.1330-1w vault/iocage/jails/phabricator/root@auto-20191014.1335-1w | /usr/local/bin/lz4c | /usr/local/bin/pipewatcher $$ | /usr/local/bin/ssh -i /data/ssh/replication -o BatchMode=yes -o StrictHostKeyChecking=yes -o ConnectTimeout=7 -p 23 192.168.13.149 "/usr/bin/env lz4c -d | /sbin/zfs receive -F -d 'vault2' && echo Succeeded"
Oct 15 03:47:24 NAS /autorepl.py: [tools.autorepl:172] Checking if process 20038 is still alive
Oct 15 03:47:24 NAS /autorepl.py: [tools.autorepl:176] Process 20038 still working, quitting
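The last two lines suggest autorepl is deferring to an earlier replication run (process 20038) that is still alive. A quick way to see what that process is actually doing, and whether a send is wedged:

```shell
# On the source machine: list any replication-related processes still running.
ps auxww | grep -E 'autorepl|zfs send|pipewatcher|lz4c' | grep -v grep
```

If an old `zfs send` or `pipewatcher` pipeline is hung there from a previous task run, each new replication pass will quit immediately, which would match the near-zero transfer rate.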
I'm sure I've misconfigured something, but I'm having a hard time identifying exactly what. I've seen posts where folks had similar issues, but either their fix didn't work for me or their situation was different (such as bugs in earlier versions of FreeNAS).
Let me know if there is other information I can provide to help troubleshoot; otherwise I'm hoping someone can point me in the right direction.