I have three replications, pulled by my backup NAS from the primary NAS. Two of them, the big ones, work fine.
The third is of my iocage tree, and it has been failing since late last week; the timing roughly coincides with installing 11.3-U5. I have been getting emails saying:
* Replication "DataPool/iocage - DataPoolBackup/iocage" failed: failed to read from stream
Command failed with code 1..
All the settings are identical to the other two replications apart from the source and destination datasets.
I deleted and re-created the replication task, templated off one of the existing ones, changing only the source and destination datasets; everything else is identical. It identifies 17 snapshots to transfer, then fails again with code 1. So that's fun, I guess.
I also deleted and re-created the destination dataset, then remade the replication: same results.
But today, some excitement. Overnight I got this email instead:
New alerts:
* Replication "DataPool/iocage - DataPoolBackup/iocage" failed: checksum mismatch or incomplete stream.
Partially received snapshot is saved.
A resuming stream can be generated on the sending system by running:
zfs send -t 1-11552caf99-f0-789c636064000310a500c4ec50360710e72765a5269740f80cd8e4d3d28a534b18e00024cf86249f5459925acc802a8facbf243fbd34338581e19de1ca08568657950e48f29c60f9bcc4dc54060697c492c480fcfc1cfdccfce4c4f45487c4d2927c5d23032303430303433d430b03035da35ca8f9dc0c08ff24e7e71614a51617e76723dc0000ed82205d
Command failed with code 1..
Now every time I run the replication I get the same message with the same resume token. Something's stuck.
I can't use that suggested command line as-is: I would need to construct the rest of the transport details myself, piping through SSH or netcat manually, and I don't even know where to find the SSH key file it would need, since it isn't in the usual places. Buried in the config DB somewhere, perhaps? So, thanks for the message, but it's practically useless. Can it be improved?
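For reference, since this is a pull replication, my best guess at the full incantation (run on the backup NAS; the hostname, user, key path, and the -s flag are all my assumptions, not anything the alert tells me) would be something like:

ssh -i /path/to/replication_key root@primary-nas zfs send -t <resume-token-from-the-alert> | zfs recv -s DataPoolBackup/iocage

where -s saves the resume state again if it dies partway. But without knowing where the GUI keeps its key, I can't actually test that.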
That aside, there's this saved partial snapshot. How do I clean it out? zfs list -t snapshot doesn't show it. Does it go away after a timeout? A reboot? An export/import? Only when I delete the dataset? Am I stuck with it forever, eating space?
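(Partly answering my own question while poking around: if I'm reading the zfs man page right, the partial state shows up as a property on the destination rather than as a snapshot, and can be thrown away on the receiving side:

zfs get -r receive_resume_token DataPoolBackup/iocage   # shows the saved token, if any
zfs receive -A DataPoolBackup/iocage                    # aborts the partial receive and frees its space

I haven't run the -A yet, in case someone wants me to gather evidence from it first.)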
Any idea where to look for clues about the 'code 1' failure?
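My guess is the replication engine's own log; on 11.3 I believe that's zettarepl, so I was going to start with these (paths are from memory, so treat them as assumptions):

tail -n 100 /var/log/zettarepl.log
tail -n 100 /var/log/middlewared.log

but I'd welcome confirmation that those are the right places.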
/Edit to add: The larger, working replications are 1 TB and 20 TB; the failing one is only 300 GB or so.