Sean Hafeez (Dabbler, joined Jun 23, 2014, 31 messages)
I have opened a Jira ticket for this: NAS-113624
I'm posting here to ask whether anyone else has seen this, or has managed to make this work with a similar setup.
Local (sending) system: Mini-2.0 running TrueNAS-12.0-U6.1
Remote (receiving) system: Mini-3.0-XL+ running TrueNAS-SCALE-22.02-RC.1-2
Sending a snapshot smaller than 1T works fine. Sending this one, which is 1.91T, fails at 1.01T.
Remote:
mbuffer -I 9090 | zfs receive data/barrel-old/music@migrate
Local:
zfs snapshot main/music@migrate
zfs send -v main/music@migrate | mbuffer -O barrel2:9090
On the local side, once it hangs, this line repeats about 1000 times:
in @ 0.0 kiB/s, out @ 0.0 kiB/s, 1031 GiB total, buffer 100% full
07:45:36 1.01T main/music@migrate
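Both figures in that output point at the same spot in the stream: mbuffer counts in GiB, and `zfs send -v` prints 1024-based units, so 1031 GiB is about 1.01 TiB. With input and output both at 0.0 kiB/s and the buffer full, the receiver seems to have stopped reading rather than the network dropping. A quick shell sketch of the arithmetic, assuming 1 TiB = 1024 GiB (the helper names `gib_to_tib` and `percent_of` are made up for illustration):

```shell
#!/bin/sh
# Convert a GiB figure (as printed by mbuffer) to TiB,
# assuming 1 TiB = 1024 GiB.
gib_to_tib() {
    awk -v gib="$1" 'BEGIN { printf "%.2f\n", gib / 1024 }'
}

# Express one size as a rounded percentage of another.
percent_of() {
    awk -v part="$1" -v whole="$2" 'BEGIN { printf "%.0f\n", part / whole * 100 }'
}

gib_to_tib 1031       # mbuffer's "1031 GiB total" expressed in TiB
percent_of 1.01 1.91  # share of the 1.91T snapshot sent before the stall
```

This prints 1.01 and 53, so the stall happens a little past the halfway point of the stream.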
I can still stop the local side with Ctrl-C.
On the remote, this appears on the console:
VERIFY3(drrw->drr_logical_size > dn->dn_datablksz) failed (118784 > 131072)
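Reading that message: on failure, VERIFY3(a > b) prints the actual values of both operands, so the incoming write record's logical size was 118784 bytes (116 KiB) while the destination dnode's data block size was 131072 bytes (128 KiB). The receiver expected the record to be larger than the block size, and it was smaller. A small shell sketch of that check using the two logged values (`to_kib` and `check_record` are hypothetical helpers, not ZFS code):

```shell
#!/bin/sh
# Operands of the failed check from the console message:
#   VERIFY3(drrw->drr_logical_size > dn->dn_datablksz) failed (118784 > 131072)

# Print a byte count in KiB (integer division; exact for these values).
to_kib() {
    echo "$(( $1 / 1024 ))"
}

# Report whether the receiver's check (logical size > data block size) holds.
check_record() {
    if [ "$1" -gt "$2" ]; then
        echo "assertion holds"
    else
        echo "assertion fails"
    fi
}

logical_size=118784   # drr_logical_size of the incoming WRITE record
datablksz=131072      # dn_datablksz of the destination dnode

echo "record: $(to_kib "$logical_size") KiB, block: $(to_kib "$datablksz") KiB"
check_record "$logical_size" "$datablksz"
```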
This zfs receive cannot be killed: it ignores kill -9 and every other signal. I have to issue a reboot over SSH, and even that hangs for 3-4 minutes before the machine actually restarts.
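The hung-task report in the kernel log shows receive_writer (PID 56507) in state D, meaning uninterruptible sleep. Tasks in that state do not handle signals, not even SIGKILL, which fits kill -9 having no effect; the userspace zfs receive is presumably stuck waiting on that kernel thread. A minimal shell sketch for listing such tasks, assuming procps-style `ps -eo pid,stat,comm` output on the SCALE box (`d_state_tasks` is a hypothetical helper name):

```shell
#!/bin/sh
# List tasks in uninterruptible sleep (STAT column starting with "D").
# Expects "PID STAT COMMAND" lines on stdin, header first,
# e.g. from: ps -eo pid,stat,comm
d_state_tasks() {
    awk 'NR > 1 && $2 ~ /^D/ { print $1, $3 }'
}

# Example usage on a live system (not run here):
#   ps -eo pid,stat,comm | d_state_tasks
```

On a live system, pipe `ps -eo pid,stat,comm` into it; as root, `cat /proc/<pid>/stack` usually shows where a listed task is stuck in the kernel.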
Manually changing the mbuffer memory/buffer sizes makes no difference.
Here is the crash:
[10099.095091] VERIFY3(drrw->drr_logical_size > dn->dn_datablksz) failed (118784 > 131072)
[10099.095155] PANIC at dmu_recv.c:1906:flush_write_batch_impl()
[10099.095190] Showing stack for process 56507
[10099.095218] CPU: 6 PID: 56507 Comm: receive_writer Tainted: P OE 5.10.70+truenas #1
[10099.095270] Hardware name: iXsystems TRUENAS-MINI-3.0-XL+/A2SDi-H-TF, BIOS 1.3.V1 06/08/2020
[10099.095318] Call Trace:
[10099.095345] dump_stack+0x6b/0x83
[10099.095380] spl_panic+0xd4/0xfc [spl]
[10099.095520] ? list_head+0x9/0x20 [zfs]
[10099.095662] ? txg_list_add+0x99/0xd0 [zfs]
[10099.095791] ? dsl_dir_dirty+0x34/0x80 [zfs]
[10099.095918] ? dsl_dir_willuse_space+0xab/0x110 [zfs]
[10099.095951] ? _cond_resched+0x16/0x40
[10099.096077] ? dsl_pool_dirty_space+0x83/0xc0 [zfs]
[10099.096201] ? flush_write_batch_impl+0x23c/0x550 [zfs]
[10099.096326] flush_write_batch_impl+0x452/0x550 [zfs]
[10099.096449] ? dmu_free_long_range_impl+0x38/0x460 [zfs]
[10099.096580] flush_write_batch+0x37/0xb0 [zfs]
[10099.096700] receive_process_record+0x87/0x2b0 [zfs]
[10099.096824] receive_writer_thread+0xb3/0x1c0 [zfs]
[10099.096948] ? receive_process_record+0x2b0/0x2b0 [zfs]
[10099.096989] thread_generic_wrapper+0x78/0xb0 [spl]
[10099.097028] ? IS_ERR+0x10/0x10 [spl]
[10099.097053] kthread+0x11b/0x140
[10099.098145] ? __kthread_bind_mask+0x60/0x60
[10099.099240] ret_from_fork+0x22/0x30
[10271.696313] INFO: task txg_quiesce:11778 blocked for more than 120 seconds.
[10271.699427] Tainted: P OE 5.10.70+truenas #1
[10271.702631] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[10271.705920] task:txg_quiesce state:D stack: 0 pid:11778 ppid: 2 flags:0x00004000
[10271.709285] Call Trace:
[10271.712633] __schedule+0x282/0x870
[10271.715970] schedule+0x46/0xb0
[10271.719357] cv_wait_common+0x14e/0x290 [spl]
[10271.722755] ? add_wait_queue_exclusive+0x70/0x70
[10271.726517] txg_quiesce+0x1d1/0x2d0 [zfs]
[10271.730317] txg_quiesce_thread+0xe6/0x230 [zfs]
[10271.734118] ? txg_quiesce+0x2d0/0x2d0 [zfs]
[10271.737629] thread_generic_wrapper+0x78/0xb0 [spl]
[10271.741140] ? IS_ERR+0x10/0x10 [spl]
[10271.744666] kthread+0x11b/0x140
[10271.747733] ? __kthread_bind_mask+0x60/0x60
[10271.750849] ret_from_fork+0x22/0x30
[10271.754016] INFO: task receive_writer:56507 blocked for more than 120 seconds.
[10271.757251] Tainted: P OE 5.10.70+truenas #1
[10271.759925] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[10271.762652] task:receive_writer state:D stack: 0 pid:56507 ppid: 2 flags:0x00004000
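The log reads in two stages: receive_writer (PID 56507) hits the VERIFY failure in flush_write_batch_impl(), and about 170 seconds later both txg_quiesce and receive_writer are reported as blocked in state D. That ordering is consistent with the stuck writer holding up the pool's transaction group, which could also explain why the reboot stalls for minutes; this is my reading of the log, not a confirmed root cause. A small shell sketch that pulls just those lines out of a kernel log, assuming default `dmesg` output with `[seconds]` timestamps (`summarize_zfs_panic` is a hypothetical name):

```shell
#!/bin/sh
# Keep only the failed VERIFY, the PANIC location and any hung-task
# reports from a kernel log on stdin, then strip the "[seconds] " prefix.
summarize_zfs_panic() {
    grep -E 'VERIFY3|PANIC at|INFO: task .* blocked' | sed 's/^\[[^]]*] //'
}

# Example usage on the receiving system (not run here):
#   dmesg | summarize_zfs_panic
```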