Rsync push task causes kernel panic on FreeBSD destination machine

BlueNGray

Dabbler
Joined
Nov 27, 2019
Messages
12
I'm getting kernel panics on the destination machine when running my scheduled daily rsync push task. Source machine is FreeNAS 11.3-U1. Destination machine is FreeBSD-12.1. The problem is quite repeatable. I suspect it happens on the same file every time, but determining which file is causing the problem has been elusive.

To determine which file or files are causing the issue, I'd like to run the rsync command from a command line, adding -v to identify which file is being transferred when the panic occurs. Then, I can pursue the underlying cause of the panic on the receiving end. But to do that, I need to know the options being passed on the command line. I tried starting the task, then using 'ps' in a terminal session to capture the command line, but the command line is so long that it gets truncated in the output from ps.

So I'd either like to know how to determine what FreeNAS is going to issue as a command, or at least have a way to make the log file more verbose to identify what files are being transferred.
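
One way to approach the first option is to run the transfer by hand with -v and copy the output to a file on the source, so the last file name printed before the destination goes down survives the crash. This is only a minimal sketch, not the command FreeNAS actually issues: the function name, the RSYNC override variable, and the -a flag are assumptions for illustration, and the task's real options should be substituted once they are known.

```shell
#!/bin/sh
# Hypothetical helper: run rsync verbosely and keep a copy of its output
# in a local log file so it survives a crash of the destination.
# RSYNC can be overridden (for example with "echo") to preview the command.
RSYNC="${RSYNC:-rsync}"

run_verbose_rsync() {
    src=$1    # source directory (placeholder)
    dest=$2   # destination, e.g. user@host:/path (placeholder)
    log=$3    # local log file on the source machine
    # -a is a stand-in for the task's real options; -v names each file.
    "$RSYNC" -av "$src" "$dest" 2>&1 | tee "$log"
}
```

Called as `run_verbose_rsync /mnt/tank/data/ backup@dest:/backup/data/ /tmp/rsync-manual.log` (all three arguments are made-up placeholders), the tail of the log after a crash should show roughly where the transfer stopped.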
 

BlueNGray

So I added -v to the command line in the rsync task setup in an attempt to get more info in the log file. I might need -vv. I'm seeing plenty of 'skipping non-regular file' messages, and quite a few 'rsync: recv_generator: failed to stat "XXX": Unknown error: 122 (122)' messages. The last two messages before the summary were:

Code:
rsync: recv_generator: failed to stat "<remote path file>.gz": Unknown error: 122 (122)
WARNING: <local path file>.db failed verification -- update discarded (will try again).


My suspicion is that the first of these messages preceded the crash on the destination machine, and the second reflects a failure to determine the status of <local path file>.db on the destination machine.
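
With many of these messages in the log, it can help to reduce them to a list of the affected paths. A rough sketch, assuming the log lines have exactly the 'failed to stat' format quoted above; the function name is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read an rsync log on stdin and print each unique
# path named in a 'recv_generator: failed to stat "<path>"' message.
failed_stat_paths() {
    sed -n 's/.*recv_generator: failed to stat "\(.*\)": .*/\1/p' | sort -u
}
```

It would be used as `failed_stat_paths < rsync.log`, where rsync.log stands for wherever the task output was saved. If the destination pool is ZFS, comparing the result with the files listed by `zpool status -v` on the destination may show whether the same paths are already flagged with data errors.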

Here is the fault on the destination machine:

Code:
Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 02
fault virtual address    = 0xfffff80fa929a000
fault code        = supervisor write data, page not present
instruction pointer    = 0x20:0xffffffff810a3bd6
stack pointer            = 0x28:0xfffffe00a08408f0
frame pointer            = 0x28:0xfffffe00a08408f0
code segment        = base 0x0, limit 0xfffff, type 0x1b
            = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags    = interrupt enabled, resume, IOPL = 0
current process        = 0 (zio_write_issue_0)
trap number        = 12
panic: page fault
cpuid = 1
time = 1583934767
KDB: stack backtrace:
#0 0xffffffff80c1d2b7 at kdb_backtrace+0x67
#1 0xffffffff80bd05ed at vpanic+0x19d
#2 0xffffffff80bd0443 at panic+0x43
#3 0xffffffff810a7dcc at trap_fatal+0x39c
#4 0xffffffff810a7e19 at trap_pfault+0x49
#5 0xffffffff810a740f at trap+0x29f
#6 0xffffffff81081a2c at calltrap+0x8
#7 0xffffffff8266e5ad at abd_copy_from_buf_off+0x9d
#8 0xffffffff8272a2d2 at zio_ready+0x112
#9 0xffffffff82726b7c at zio_execute+0xac
#10 0xffffffff80c2fa94 at taskqueue_run_locked+0x154
#11 0xffffffff80c30dc8 at taskqueue_thread_loop+0x98
#12 0xffffffff80b90c43 at fork_exit+0x83
#13 0xffffffff81082a6e at fork_trampoline+0xe
Uptime: 17m6s


I'm still looking for a way to run the rsync from the command line. I can't find a log file that lists the entire rsync command line, and 'ps -au' truncates the end of the command line.
 

BlueNGray

Thanks for the tip. I quite likely was chasing the wrong problem. There was corrupted data on the destination machine. I guess I'm a bit concerned, though, that this caused a kernel panic.

The destination machine has a striped pool, so ZFS had no redundancy with which to repair the files that had corrupted data. Not sure how the corruption happened in the first place, but the affected files were all in the data I was rsyncing from the source machine, so I deleted them and I'm running a scrub now.

I'll update when the scrub and re-rsync have finished.

Thanks again.
 

BlueNGray

The scrub completed successfully, and zpool status now reports no errors.
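
For a scripted version of that check, the summary line of zpool status can be tested directly. A small sketch, assuming the usual 'errors: No known data errors' summary line; the function name and the pool name in the comment are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if `zpool status` output read on stdin
# contains the summary line reporting no known data errors.
pool_is_clean() {
    grep -q '^errors: No known data errors'
}

# Usage on the destination (pool name "tank" is a placeholder):
#   zpool status tank | pool_is_clean && echo "no known data errors"
```

A clean result only says that the scrub found nothing wrong in the pool's data at that moment.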

I re-ran the rsync task and the destination crashed again; the kernel panic is very similar. However, the FreeNAS source machine apparently hasn't detected the crash. The GUI reports that the task is still running. I attempted to start it again, and the GUI reports that the new rsync is "waiting".

I see only two ways to cancel the running rsync: A) kill the processes from the command line, but I'm a bit hesitant to go this route, since I don't know whether I can identify all the associated processes, given that the task appears to be controlled by middleware, or B) reboot.

I guess I'll bite the bullet and reboot and try again.
 

BlueNGray

After reboot, I ran the rsync again. Destination machine crashed the same way, but this time FreeNAS detected the crash and reported the failure.

Looking at the log, it seems to have been in the middle of transferring a bunch of files varying in size from about 5 to 200 megabytes. I thought there might have been an issue with files being too big, but that's apparently not it. Other, larger files transferred just fine.

I briefly considered memory issues, but the destination has 32 GB of RAM.

I haven't run this by the FreeBSD forum yet. Maybe I should have gone there first, but I wanted to run the rsync manually on the FreeNAS side first, and I didn't (and still don't) know how to duplicate the command line that the rsync task uses to start it.

Looks like it might be time to find out if there might be any suggestions from the FreeBSD crew.
 

BlueNGray

ps -axww prints the un-truncated command line that starts the rsync task.
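
For reference, a small sketch of how that might be narrowed to just the rsync processes. On FreeBSD's ps, -a and -x select all processes including those without a controlling terminal, and -ww removes the width limit on output. The function name is made up, and the -o column list is an optional addition not taken from the original command.

```shell
#!/bin/sh
# Hypothetical helper: keep only the rsync lines from `ps` output on stdin.
# The [r] bracket stops grep from matching its own command line, and
# "|| true" keeps the exit status at 0 when no rsync is running.
filter_rsync() {
    grep '[r]sync' || true
}

# Usage on the FreeNAS source while the task is running:
#   ps -axww -o pid,command | filter_rsync
```

The printed line can then be copied, with -v added, to rerun the same transfer by hand.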
 

BlueNGray

Mystery solved: finally tracked the problem down to a bad memory stick. :(
 