Replication Between Pools Causes Corruption

Joined
Oct 22, 2019
Messages
3,641
Can you import this pool on a different computer or try with CORE?
 

yottabit

Contributor
Joined
Apr 15, 2012
Messages
192
I can boot an Ubuntu Live ISO and import that way.

I am not sure if I can import the pool on CORE/FreeBSD since I have upgraded the pool. IIRC, SCALE is one minor version ahead of CORE on OpenZFS (2.1 vs. 2.0):
Code:
$ cat /sys/module/zfs/version  # TrueNAS-SCALE-22.02-RC.2
2.1.1-1
[yottabit@nas1 ~]$ sudo zpool get version vol1
NAME  PROPERTY  VALUE    SOURCE
vol1  version   -        default
[yottabit@nas1 ~]$ sudo zpool get all vol1 | grep 'feature@'
vol1  feature@async_destroy                   enabled                                 local
vol1  feature@empty_bpobj                     active                                  local
vol1  feature@lz4_compress                    active                                  local
vol1  feature@multi_vdev_crash_dump           enabled                                 local
vol1  feature@spacemap_histogram              active                                  local
vol1  feature@enabled_txg                     active                                  local
vol1  feature@hole_birth                      active                                  local
vol1  feature@extensible_dataset              active                                  local
vol1  feature@embedded_data                   active                                  local
vol1  feature@bookmarks                       enabled                                 local
vol1  feature@filesystem_limits               enabled                                 local
vol1  feature@large_blocks                    enabled                                 local
vol1  feature@large_dnode                     active                                  local
vol1  feature@sha512                          enabled                                 local
vol1  feature@skein                           enabled                                 local
vol1  feature@edonr                           enabled                                 local
vol1  feature@userobj_accounting              active                                  local
vol1  feature@encryption                      active                                  local
vol1  feature@project_quota                   active                                  local
vol1  feature@device_removal                  enabled                                 local
vol1  feature@obsolete_counts                 enabled                                 local
vol1  feature@zpool_checkpoint                enabled                                 local
vol1  feature@spacemap_v2                     active                                  local
vol1  feature@allocation_classes              enabled                                 local
vol1  feature@resilver_defer                  enabled                                 local
vol1  feature@bookmark_v2                     enabled                                 local
vol1  feature@redaction_bookmarks             enabled                                 local
vol1  feature@redacted_datasets               enabled                                 local
vol1  feature@bookmark_written                enabled                                 local
vol1  feature@log_spacemap                    active                                  local
vol1  feature@livelist                        active                                  local
vol1  feature@device_rebuild                  enabled                                 local
vol1  feature@zstd_compress                   active                                  local
vol1  feature@draid                           enabled                                 local
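
A side note on the feature list above: as far as I understand OpenZFS feature flags, only features in the active state can prevent an import on an older release, while features that are merely enabled should not. Here is a minimal sketch for pulling out just the active ones, assuming the four-column zpool get all layout shown above. The function name active_features is made up for illustration.

```shell
# List pool features whose value is "active" from `zpool get all` output.
# Reads the NAME PROPERTY VALUE SOURCE listing on stdin and prints one
# feature name per line (without the feature@ prefix).
active_features() {
    awk '$2 ~ /^feature@/ && $3 == "active" { sub(/^feature@/, "", $2); print $2 }'
}

# Usage (pool name taken from the post, not run here):
#   sudo zpool get all vol1 | active_features
```

Comparing that list against the feature list supported by the target release would be the next step. This sketch does not do that part.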
 
Joined
Oct 22, 2019
Messages
3,641
That works as long as Ubuntu's repository has ZFS 2.1 or higher. I know you can also do something similar with Manjaro's live ISO: simply install the zfs package that pulls in the module for the currently running kernel. You might need to follow it with a manual modprobe zfs.

For example, if running kernel 5.15:
pamac install linux515-zfs
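
To avoid guessing the package name by hand, here is a small sketch that derives it from the running kernel. It assumes the linuxXY-zfs naming pattern implied by the example above holds for other kernel series too. The helper name manjaro_zfs_pkg is hypothetical.

```shell
# Hypothetical helper: derive the Manjaro ZFS module package name from a
# kernel release string, assuming the linuxXY-zfs naming used in the post.
manjaro_zfs_pkg() {
    release=${1:-$(uname -r)}     # e.g. 5.15.2-2-MANJARO
    major=${release%%.*}          # everything before the first dot
    rest=${release#*.}            # everything after the first dot
    minor=${rest%%[!0-9]*}        # leading digits of the remainder
    printf 'linux%s%s-zfs\n' "$major" "$minor"
}

# Usage on the live ISO (not run here):
#   sudo pamac install "$(manjaro_zfs_pkg)"
#   sudo modprobe zfs
#   cat /sys/module/zfs/version
```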
 

yottabit

Contributor
Joined
Apr 15, 2012
Messages
192
Fetching the latest Ubuntu live ISO to my PiKVM now. If this in-progress replication does not exhibit the corruption, I will replicate again using a zstd destination, show the corruption follows, and then boot Ubuntu, import the pools, and see if it presents the same.
 
Joined
Oct 22, 2019
Messages
3,641
Looks like the latest version on Ubuntu Impish is ZFS 2.0.6. :confused:
 
Joined
Oct 22, 2019
Messages
3,641
Ugh. Well I guess this may be more difficult than I had hoped. I'll grab the CORE ISO too, just in case.
Manjaro has ZFS 2.1.1 in its repositories.

Boot into the live ISO, check the kernel version with uname -r, use pamac to grab linux515-zfs (assuming it's kernel 5.15), and then modprobe zfs.

See my earlier post.

 

yottabit

Contributor
Joined
Apr 15, 2012
Messages
192
Fine, fine. I fetched that ISO too. LOL. Replication is about 90% finished... waiting with bated breath...
 

yottabit

Contributor
Joined
Apr 15, 2012
Messages
192
Confirmed: replicating from the intermediate pool's dataset (zstd) to a destination pool dataset (lz4) still results in corruption.

In fact, the file is corrupted in exactly the same way:
Code:
$ grep ' ./.cshrc' /mnt/ssd_scratch/md5sum_orig.txt  # Original dataset.
eb4ea27bddb78d8caf944b9c0f76ccae  ./.cshrc
$ grep ' ./.cshrc' /mnt/ssd_scratch/md5sum_enc.txt  # First destination dataset where corruption was first observed.
16da5f6ca9fb073a578996c538b3167c  ./.cshrc
$ md5sum /mnt/vol1/jacob.mcdonald_migrated/.cshrc  # New destination dataset with exactly the same corruption.
16da5f6ca9fb073a578996c538b3167c  /mnt/vol1/jacob.mcdonald_migrated/.cshrc


The original destination dataset was zstd. The new destination dataset is lz4.

Next I will replicate again, from the original dataset (lz4) to a new destination dataset (lz4). This will further narrow the cause to either reading from zstd or writing with encryption.
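
For checking more than one file at a time, the two md5sum manifests used above can be compared directly. This is a rough sketch, assuming both manifests were written by md5sum from the dataset root so that the relative paths line up. The function name manifest_mismatches is made up.

```shell
# Print every path whose hash differs between two md5sum manifests.
# Each manifest line is "HASH  PATH" as written by md5sum.
manifest_mismatches() {
    awk '
        { hash = $1; path = substr($0, length($1) + 3) }    # path may hold spaces
        NR == FNR { orig[path] = hash; next }                # first file: remember
        (path in orig) && orig[path] != hash { print path }  # second file: compare
    ' "$1" "$2"
}

# Usage with the manifest files named in the post (not run here):
#   manifest_mismatches /mnt/ssd_scratch/md5sum_orig.txt /mnt/ssd_scratch/md5sum_enc.txt
```

Files present in only one manifest are ignored here, so a separate check would be needed to catch missing files.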

Here is the hexdump of the original (good) .cshrc file we've been using as an example:
Code:
[yottabit@nas1 ~]$ hexdump /mnt/vol1/jacob.mcdonald/.zfs/snapshot/migration/.cshrc
0000000 2023 4624 6572 4265 4453 0a24 0a23 2023
0000010 632e 6873 6372 2d20 6320 6873 7220 7365
0000020 756f 6372 2065 6373 6972 7470 202c 6572
0000030 6461 6120 2074 6562 6967 6e6e 6e69 2067
0000040 666f 6520 6578 7563 6974 6e6f 6220 2079
0000050 6165 6863 7320 6568 6c6c 230a 230a 7320
0000060 6565 6120 736c 206f 7363 2868 2931 202c
0000070 6e65 6976 6f72 286e 2937 0a2e 2023 6f6d
0000080 6572 6520 6178 706d 656c 2073 7661 6961
0000090 616c 6c62 2065 7461 2f20 7375 2f72 6873
00000a0 7261 2f65 7865 6d61 6c70 7365 632f 6873
00000b0 0a2f 0a23 610a 696c 7361 6820 0909 6968
00000c0 7473 726f 2079 3532 610a 696c 7361 6a20
00000d0 0909 6f6a 7362 2d20 0a6c 6c61 6169 2073
00000e0 616c 6c09 2073 612d 0a46 6c61 6169 2073
00000f0 666c 6c09 2073 462d 0a41 6c61 6169 2073
0000100 6c6c 6c09 2073 6c2d 4641 0a0a 2023 6854
0000110 7365 2065 7261 2065 6f6e 6d72 6c61 796c
0000120 7320 7465 7420 7268 756f 6867 2f20 7465
0000130 2f63 6f6c 6967 2e6e 6f63 666e 202e 5920
0000140 756f 6d20 7961 6f20 6576 7272 6469 2065
0000150 6874 6d65 6820 7265 0a65 2023 6669 7720
0000160 6e61 6574 2e64 230a 7320 7465 7020 7461
0000170 2068 203d 2f28 6273 6e69 2f20 6962 206e
0000180 752f 7273 732f 6962 206e 752f 7273 622f
0000190 6e69 2f20 7375 2f72 6f6c 6163 2f6c 6273
00001a0 6e69 2f20 7375 2f72 6f6c 6163 2f6c 6962
00001b0 206e 4824 4d4f 2f45 6962 296e 230a 7320
00001c0 7465 6e65 0976 4c42 434f 534b 5a49 0945
00001d0 0a4b 2023 2041 6972 6867 6574 756f 2073
00001e0 6d75 7361 0a6b 2023 6d75 7361 206b 3232
00001f0 0a0a 6573 6574 766e 4509 4944 4f54 0952
0000200 6976 730a 7465 6e65 0976 4150 4547 0952
0000210 6f6d 6572 0a0a 6669 2820 3f24 7270 6d6f
0000220 7470 2029 6874 6e65 090a 2023 6e41 6920
0000230 746e 7265 6361 6974 6576 7320 6568 6c6c
0000240 2d20 202d 6573 2074 6f73 656d 7320 7574
0000250 6666 7520 0a70 7309 7465 7020 6f72 706d
0000260 2074 203d 2522 404e 6d25 253a 207e 2325
0000270 2220 090a 6573 2074 7270 6d6f 7470 6863
0000280 7261 2073 203d 2522 2223 0a0a 7309 7465
0000290 6620 6c69 6365 090a 6573 2074 6968 7473
00002a0 726f 2079 203d 3031 3030 090a 6573 2074
00002b0 6173 6576 6968 7473 3d20 2820 3031 3030
00002c0 6d20 7265 6567 0a29 7309 7465 6120 7475
00002d0 6c6f 7369 2074 203d 6d61 6962 7567 756f
00002e0 0a73 2309 5520 6573 6820 7369 6f74 7972
00002f0 7420 206f 6961 2064 7865 6170 736e 6f69
0000300 0a6e 7309 7465 6120 7475 656f 7078 6e61
0000310 0a64 7309 7465 6120 7475 726f 6865 7361
0000320 0a68 7309 7465 6d20 6961 206c 203d 2f28
0000330 6176 2f72 616d 6c69 242f 5355 5245 0a29
0000340 6909 2066 2028 3f24 6374 6873 2920 7420
0000350 6568 0a6e 0909 6962 646e 656b 2079 5e22
0000360 2257 6220 6361 776b 7261 2d64 6564 656c
0000370 6574 772d 726f 0a64 0909 6962 646e 656b
0000380 2079 6b2d 7520 2070 6968 7473 726f 2d79
0000390 6573 7261 6863 622d 6361 776b 7261 0a64
00003a0 0909 6962 646e 656b 2079 6b2d 6420 776f
00003b0 206e 6968 7473 726f 2d79 6573 7261 6863
00003c0 662d 726f 6177 6472 090a 6e65 6964 0a66
00003d0 650a 646e 6669 0a0a 6669 2820 2420 743f
00003e0 7363 2068 2029 6874 6e65 200a 2020 2020
00003f0 2020 6220 6e69 6b64 7965 2220 575e 2022
0000400 6162 6b63 6177 6472 642d 6c65 7465 2d65
0000410 6f77 6472 200a 2020 2020 2020 6220 6e69
0000420 6b64 7965 2d20 206b 7075 6820 7369 6f74
0000430 7972 732d 6165 6372 2d68 6162 6b63 6177
0000440 6472 200a 2020 2020 2020 6220 6e69 6b64
0000450 7965 2d20 206b 6f64 6e77 6820 7369 6f74
0000460 7972 732d 6165 6372 2d68 6f66 7772 7261
0000470 0a64 2020 2020 2020 2020 6962 646e 656b
0000480 2079 5c27 5b65 2748 2020 2020 6562 6967
0000490 6e6e 6e69 2d67 666f 6c2d 6e69 2065 2020
00004a0 2020 2320 6820 6d6f 0a65 2020 2020 2020
00004b0 2020 6962 646e 656b 2079 5c27 5b65 2746
00004c0 2020 2020 6e65 2d64 666f 6c2d 6e69 2065
00004d0 2020 2020 2020 2020 2020 2020 2020 2320
00004e0 6520 646e 200a 2020 2020 2020 6220 6e69
00004f0 6b64 7965 2720 655c 335b 277e 2020 6420
0000500 6c65 7465 2d65 6863 7261 2020 2020 2020
0000510 2020 2020 2020 2320 6420 6c65 7465 0a65
0000520 2020 2020 2020 2020 6962 646e 656b 2079
0000530 5c27 5b65 3b31 4335 2027 6f66 7772 7261
0000540 2d64 6f77 6472 2020 2020 2020 2020 2023
0000550 7463 6c72 7220 6769 7468 200a 2020 2020
0000560 2020 6220 6e69 6b64 7965 2720 655c 315b
0000570 353b 2744 6220 6361 776b 7261 2d64 6f77
0000580 6472 2020 2020 2320 6320 7274 206c 656c
0000590 7466 200a 2020 2020 2020 6220 6e69 6b64
00005a0 7965 2720 655c 315b 277e 2020 6220 6765
00005b0 6e69 696e 676e 6f2d 2d66 696c 656e 2020
00005c0 2020 2023 6f68 656d 200a 2020 2020 2020
00005d0 6220 6e69 6b64 7965 2720 655c 345b 277e
00005e0 2020 6520 646e 6f2d 2d66 696c 656e 2020
00005f0 2020 2020 2020 2020 2020 2020 2023 6e65
0000600 0a64 2020 2020 6e65 6964 0a66 000a 
000060d


And here is the hexdump of the corrupted .cshrc file:
Code:
[yottabit@nas1 ~]$ hexdump /mnt/vol1/jacob.mcdonald_migrated/.zfs/snapshot/migration/.cshrc
0000000 0000 ea03 4bf1 2023 4624 6572 4265 4453
0000010 0a24 0a23 2023 632e 6873 6372 2d20 6320
0000020 6873 7220 7365 756f 6372 2065 6373 6972
0000030 7470 202c 6572 6461 6120 2074 6562 6967
0000040 6e6e 6e69 2067 666f 6520 6578 7563 6974
0000050 6e6f 6220 2079 6165 6863 7320 6568 6c6c
0000060 004f 7380 6565 6120 736c 4f6f f000 281b
0000070 2931 202c 6e65 6976 6f72 286e 2937 0a2e
0000080 2023 6f6d 6572 6520 6178 706d 656c 2073
0000090 7661 6961 616c 6c62 6365 b400 752f 7273
00000a0 732f 6168 6572 212f f300 2f0d 7363 2f68
00000b0 230a 0a0a 6c61 6169 2073 0968 6809 7369
00000c0 6f74 7972 3220 1435 a300 096a 6a09 626f
00000d0 2073 6c2d 0011 6c94 0961 736c 2d20 4661
00000e0 0010 6611 0010 4624 1041 1100 106c f000
00000f0 6c4f 4641 0a0a 2023 6854 7365 2065 7261
0000100 2065 6f6e 6d72 6c61 796c 7320 7465 7420
0000110 7268 756f 6867 2f20 7465 2f63 6f6c 6967
0000120 2e6e 6f63 666e 202e 5920 756f 6d20 7961
0000130 6f20 6576 7272 6469 2065 6874 6d65 6820
0000140 7265 0a65 2023 6669 7720 6e61 6574 eb64
0000150 0000 0048 00f1 6170 6874 3d20 2820 732f
0000160 6962 206e 052f 0100 00e7 0a05 0500 0009
0000170 6c57 636f 6c61 0019 1002 0000 001f 2450
0000180 4f48 454d 003c 2912 0056 16f2 6e65 0976
0000190 4c42 434f 534b 5a49 0945 0a4b 2023 2041
00001a0 6972 6867 6574 756f 2073 6d75 7361 0a6b
00001b0 0823 5300 3220 0a32 330a 9400 4445 5449
00001c0 524f 7609 1169 6000 4150 4547 0952 0192
00001d0 00f0 0a0a 6669 2820 3f24 7270 6d6f 7470
00001e0 d429 f200 6e04 090a 2023 6e41 6920 746e
00001f0 7265 6361 6974 6576 01e6 2031 2d2d 0123
0000200 00f1 6f73 656d 7320 7574 6666 7520 0a70
0000210 ee09 0100 003f 02f8 3d20 2220 4e25 2540
0000220 3a6d 7e25 2520 2023 1d22 5100 6863 7261
0000230 2273 3200 2223 190a 5200 6966 656c 0b63
0000240 0400 01de 3d62 3120 3030 1430 4000 6173
0000250 6576 0018 4700 0001 0016 2072 656d 6772
0000260 2965 001d 6152 7475 6c6f 001d 6190 626d
0000270 6769 6f75 7375 00b9 5540 6573 3520 0000
0000280 022b 01f6 6f74 6120 6469 6520 7078 6e61
0000290 6973 6e6f 003a 1301 1600 1064 6200 6572
00002a0 6168 6873 0010 6d40 6961 736c 5000 762f
00002b0 7261 0d2f 9000 242f 5355 5245 0a29 2b09
00002c0 4001 2420 743f 0330 2b04 f701 0913 6962
00002d0 646e 656b 2079 5e22 2257 6220 6361 776b
00002e0 7261 2d64 6564 656c 6574 772d 726f 2464
00002f0 5400 6b2d 7520 9f70 8400 732d 6165 6372
0000300 2d68 0034 280a 4c00 6f64 6e77 002a 6630
0000310 726f 005d 0a93 6509 646e 6669 0a0a 0007
0000320 970f 0000 2013 0001 9d0f 0f00 2903 0500
0000330 002a a30f 0b00 2e0f 0000 a90f 0900 2f0c
0000340 6000 5c27 5b65 2748 0016 5605 8004 6f2d
0000350 2d66 696c 656e 0015 2090 2320 6820 6d6f
0000360 0a65 000b 0200 0400 0095 3800 1100 3846
0000370 3a00 6e65 3264 0600 0002 234f 6520 736e
0000380 0300 3320 3c7e 0300 0193 8800 0602 0035
0000390 3801 0200 001a 760f 0200 3163 353b 2743
00003a0 7920 0101 01cf 3401 0100 0037 6342 7274
00003b0 7f6c 0f03 003b 2605 2744 0212 0b00 0802
00003c0 0039 6c3f 6665 0038 0104 00ae 210f 0201
00003d0 1f0f 0801 3411 0036 1f0f 0601 1d06 0301
00003e0 0235 001f 0001 dbff 0050 0000 0000 0000
00003f0 0000 0000 0000 0000 0000 0000 0000 0000
*
0000600 0000 0000 0000 0000 0000 0000 0000 
000060d


After the next replication test (original dataset to new pool, lz4 -> lz4+enc), I will try booting the other distros and mounting the pool, to see if the file still reads corrupted.

Edit: I haven't examined the difference extensively, but it looks like the first bytes from the original file are offset by +6 bytes in the corrupted file (they follow the leading bytes 00 00 03 ea f1 4b).

Edit 2: Replication has started again. In an hour or so I will have time to create a new dataset with just the single .cshrc file, replicate it around a bit, and see if the corruption can be reproduced quickly on a single file.

Edit 3: This replication is going to take a lot longer because it's to and from the same pool. I don't have enough spare capacity on the scratch pool without deleting the original intermediate dataset, and it's probably good to keep that one for a bit until I further narrow down the cause of the corruption. I definitely won't have results until tomorrow (CST), but I'll try what I described in Edit 2 in an hour or so.
 

yottabit

Contributor
Joined
Apr 15, 2012
Messages
192
I created four new datasets covering the combinations of lz4 or zstd compression, encrypted or unencrypted. Next I copied the example .cshrc file from the original snapshot source to each dataset, and then ran them all through md5sum:
Code:
$ md5sum ~/.zfs/snapshot/migration/.cshrc; find . -name .cshrc -exec md5sum '{}' \;
eb4ea27bddb78d8caf944b9c0f76ccae  /mnt/vol1/jacob.mcdonald/.zfs/snapshot/migration/.cshrc
eb4ea27bddb78d8caf944b9c0f76ccae  ./test_lz4_enc/.cshrc
eb4ea27bddb78d8caf944b9c0f76ccae  ./test_zstd_enc/.cshrc
eb4ea27bddb78d8caf944b9c0f76ccae  ./test_lz4/.cshrc
eb4ea27bddb78d8caf944b9c0f76ccae  ./test_zstd/.cshrc


No corruption. Next I followed the same sequence as the replication, but only by copying the file: original -> zstd/unenc -> zstd/enc. Results:
Code:
$ md5sum ~/.zfs/snapshot/migration/.cshrc; find . -name .cshrc -exec md5sum '{}' \;
eb4ea27bddb78d8caf944b9c0f76ccae  /mnt/vol1/jacob.mcdonald/.zfs/snapshot/migration/.cshrc
eb4ea27bddb78d8caf944b9c0f76ccae  ./test_zstd_enc/.cshrc
eb4ea27bddb78d8caf944b9c0f76ccae  ./test_zstd/.cshrc


Ok then, this must be a problem with zfs send and/or zfs recv. Next up: perform the same sequence of replications as before, but with the new small dataset containing the single file.
  1. cp .cshrc to the lz4 dataset that will be used as the source of the replication (test_lz4)
  2. Verify md5sum matches from original dataset to the new source dataset (test_lz4)
  3. Replicate dataset test_lz4 -> test_zstd
  4. Verify md5sum matches all
  5. Replicate dataset test_zstd -> test_zstd_enc
  6. Verify md5sum matches all
  7. Replicate dataset test_zstd -> test_lz4_enc
  8. Verify md5sum matches all
Bummer, they all match:
Code:
$ md5sum ~/.zfs/snapshot/migration/.cshrc; find . -name .cshrc -exec md5sum '{}' \;
eb4ea27bddb78d8caf944b9c0f76ccae  /mnt/vol1/jacob.mcdonald/.zfs/snapshot/migration/.cshrc
eb4ea27bddb78d8caf944b9c0f76ccae  ./test_lz4_enc/.cshrc
eb4ea27bddb78d8caf944b9c0f76ccae  ./test_zstd_enc/.cshrc
eb4ea27bddb78d8caf944b9c0f76ccae  ./test_lz4/.cshrc
eb4ea27bddb78d8caf944b9c0f76ccae  ./test_zstd/.cshrc
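
The "verify md5sum matches all" steps can be scripted so a mismatch is hard to miss. This is a small sketch, not what was actually run above. The helper name all_match is made up, and it assumes md5sum is available as in the commands shown.

```shell
# Succeed only if every listed file has the same MD5 digest as the first one.
# Prints a MISMATCH line for each file that differs from the reference.
all_match() {
    ref=$(md5sum < "$1" | cut -d' ' -f1)    # digest of the reference file
    shift
    status=0
    for f in "$@"; do
        sum=$(md5sum < "$f" | cut -d' ' -f1)
        if [ "$sum" != "$ref" ]; then
            printf 'MISMATCH %s\n' "$f"
            status=1
        fi
    done
    return "$status"
}

# Usage with the paths from the post (not run here):
#   all_match ~/.zfs/snapshot/migration/.cshrc ./test_*/.cshrc && echo "all good"
```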


Going to do it again the same way, but this time using the same pools as the original replication instead of my ssd_scratch pool:
  1. cp .cshrc to the lz4 dataset that will be used as the source of the replication (test_lz4)
  2. Verify md5sum matches from original dataset to the new source dataset (test_lz4)
  3. Replicate dataset test_lz4 -> test_zstd_intermediate
  4. Verify md5sum matches all
  5. Replicate dataset test_zstd_intermediate -> test_zstd_enc_dest
  6. Verify md5sum matches all
Unfortunately this failed to show the corruption:
Code:
$ md5sum ~/.zfs/snapshot/migration/.cshrc /mnt/big_scratch/migration/test_zstd_intermediate/.zfs/snapshot/migration/.cshrc /mnt/vol1/test_zstd_enc_dest/.zfs/snapshot/migration/.cshrc
eb4ea27bddb78d8caf944b9c0f76ccae  /mnt/vol1/jacob.mcdonald/.zfs/snapshot/migration/.cshrc
eb4ea27bddb78d8caf944b9c0f76ccae  /mnt/big_scratch/migration/test_zstd_intermediate/.zfs/snapshot/migration/.cshrc
eb4ea27bddb78d8caf944b9c0f76ccae  /mnt/vol1/test_zstd_enc_dest/.zfs/snapshot/migration/.cshrc


No easy repro here, I guess. Let's see what happens on the big data replication overnight. See you tomorrow!
 

AlexGG

Contributor
Joined
Dec 13, 2018
Messages
171
here is the hexdump of the corrupted .cshrc file:

Looks LZ4-ish.

Would you please run:

Code:
hexdump --canonical /mnt/vol1/jacob.mcdonald_migrated/.zfs/snapshot/migration/.cshrc


By the way, I think one of the requirements for a repro is changing LZ4 to ZSTD on the original dataset.
 
Joined
Oct 22, 2019
Messages
3,641
I can't reproduce this in CORE (ZFS 2.0.7).

So either something is going on with SCALE (ZFS 2.1.x) or solely ZFS 2.1.x?

I'll try to reproduce this on Linux with ZFS 2.1.1 if I have the chance.
 
Joined
Oct 22, 2019
Messages
3,641
By the way, I think one of the requirements for a repro is changing LZ4 to ZSTD on the original dataset.
Therefore, ZFS reads data from the disk, verifies the checksum (which is correct), and then instead of decompressing, returns compressed data as is.

This might in fact be it, good catch. :cool:

So "something" (SCALE? upstream ZFS?) has a bug that is triggered when a dataset's compression property is changed from LZ4 to ZSTD: the newly written record's pointer does not get the correct flag saying that it is indeed compressed with ZSTD.

This is why I think it would be interesting to see how the dataset in question would work in a non-TrueNAS environment, such as under Linux with the latest version of ZFS (2.1.1). If this same dataset, unchanged, untouched, still reads a compressed record as if it was "plain", then it's definitely a nasty upstream bug.
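
A quick way to test the "compressed data returned as is" theory on other files: as far as I know, ZFS stores LZ4 blocks with a 4-byte big-endian length in front of the LZ4 stream, which would fit the 00 00 03 ea at the start of the corrupted dump. The sketch below is only a heuristic built on that assumption, and the function names are made up.

```shell
# Print the first four bytes of a file as a big-endian unsigned integer.
be32_header() {
    set -- $(od -An -tu1 -N4 "$1")    # four decimal byte values
    [ $# -eq 4 ] || return 1          # file shorter than four bytes
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Heuristic: the header is plausible if the length it claims, plus the four
# header bytes, fits inside the file. Plain text rarely passes, because its
# first four bytes decode to a very large number.
looks_like_raw_lz4_block() {
    n=$(be32_header "$1") || return 1
    size=$(( $(wc -c < "$1") ))
    [ "$n" -gt 0 ] && [ $((n + 4)) -le "$size" ]
}

# Usage (path from the post, not run here):
#   be32_header /mnt/vol1/jacob.mcdonald_migrated/.zfs/snapshot/migration/.cshrc
```

For the corrupted file, the first four bytes 00 00 03 ea decode to 1002, comfortably below the 0x60d (1549) byte file size, while the good file's first four bytes ("# $F") decode to a number far larger than the file.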
 
Joined
Oct 22, 2019
Messages
3,641
Can't seem to reproduce this on Manjaro Linux with ZFS 2.1.1, unless I'm not following the original steps exactly as @yottabit did.

Replicating from LZ4 to ZSTD, even later changing the original dataset's compression property from LZ4 to ZSTD, the text files always yield the same md5 hash and open properly.
 

yottabit

Contributor
Joined
Apr 15, 2012
Messages
192
Ah, good call on sourcing the repro from a dataset changed from lz4 to zstd. I ignored that because I know the data has not been copied or recompressed since the compression property change, so at rest it's still lz4. But you make a good point. One difference, though, is that the corruption happens after the data has (supposedly) been successfully recompressed to zstd in the new intermediate dataset and then replicated back to the old pool with either an lz4 or zstd destination. Still, I will try to repro with the single-file dataset again using what I believe best matches the history of my original pool. The sequence will be as follows:
  1. Create lz4 dataset
  2. Copy .cshrc file to the new lz4 dataset
  3. Change new dataset from lz4 to zstd
  4. Snapshot new dataset
  5. Migrate dataset, keeping its properties
  6. Migrate dataset, keeping its properties, but adding encryption
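
The six steps above could be written out roughly as the command plan below. It only prints the commands and does not run them. Several things are assumptions on my part: that datasets mount under /mnt, that step 6 sends from the intermediate copy as in the original migration, and that the encryption options are passed on receive (the key file path is a placeholder, so check zfs-receive(8) for your version). The function name repro_plan and the dataset names in the usage line are hypothetical.

```shell
# Print (not run) the zfs commands for the six-step repro sequence.
# Arguments: source dataset, intermediate dataset, destination dataset, file.
repro_plan() {
    src=$1; mid=$2; dst=$3; file=$4
    # Placeholder key file path; keylocation=prompt is avoided because
    # stdin is already carrying the send stream.
    enc_opts="-o encryption=on -o keyformat=passphrase -o keylocation=file:///path/to/key"
    cat <<EOF
zfs create -o compression=lz4 $src
cp $file /mnt/$src/
zfs set compression=zstd $src
zfs snapshot $src@migration
zfs send -p $src@migration | zfs recv $mid
zfs send -p $mid@migration | zfs recv $enc_opts $dst
EOF
}

# Usage (dataset names are made up):
#   repro_plan vol1/test_src big_scratch/test_mid vol1/test_dst ~/.zfs/snapshot/migration/.cshrc
```

Printing the plan first makes it easy to review the exact order, since the order of the compression change relative to the copy and the snapshot is the part under suspicion.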
Here is the revised hexdump output.

Original file:
Code:
[yottabit@nas1 ~]$ hexdump --canonical /mnt/vol1/jacob.mcdonald/.zfs/snapshot/migration/.cshrc
00000000  23 20 24 46 72 65 65 42  53 44 24 0a 23 0a 23 20  |# $FreeBSD$.#.# |
00000010  2e 63 73 68 72 63 20 2d  20 63 73 68 20 72 65 73  |.cshrc - csh res|
00000020  6f 75 72 63 65 20 73 63  72 69 70 74 2c 20 72 65  |ource script, re|
00000030  61 64 20 61 74 20 62 65  67 69 6e 6e 69 6e 67 20  |ad at beginning |
00000040  6f 66 20 65 78 65 63 75  74 69 6f 6e 20 62 79 20  |of execution by |
00000050  65 61 63 68 20 73 68 65  6c 6c 0a 23 0a 23 20 73  |each shell.#.# s|
00000060  65 65 20 61 6c 73 6f 20  63 73 68 28 31 29 2c 20  |ee also csh(1), |
00000070  65 6e 76 69 72 6f 6e 28  37 29 2e 0a 23 20 6d 6f  |environ(7)..# mo|
00000080  72 65 20 65 78 61 6d 70  6c 65 73 20 61 76 61 69  |re examples avai|
00000090  6c 61 62 6c 65 20 61 74  20 2f 75 73 72 2f 73 68  |lable at /usr/sh|
000000a0  61 72 65 2f 65 78 61 6d  70 6c 65 73 2f 63 73 68  |are/examples/csh|
000000b0  2f 0a 23 0a 0a 61 6c 69  61 73 20 68 09 09 68 69  |/.#..alias h..hi|
000000c0  73 74 6f 72 79 20 32 35  0a 61 6c 69 61 73 20 6a  |story 25.alias j|
000000d0  09 09 6a 6f 62 73 20 2d  6c 0a 61 6c 69 61 73 20  |..jobs -l.alias |
000000e0  6c 61 09 6c 73 20 2d 61  46 0a 61 6c 69 61 73 20  |la.ls -aF.alias |
000000f0  6c 66 09 6c 73 20 2d 46  41 0a 61 6c 69 61 73 20  |lf.ls -FA.alias |
00000100  6c 6c 09 6c 73 20 2d 6c  41 46 0a 0a 23 20 54 68  |ll.ls -lAF..# Th|
00000110  65 73 65 20 61 72 65 20  6e 6f 72 6d 61 6c 6c 79  |ese are normally|
00000120  20 73 65 74 20 74 68 72  6f 75 67 68 20 2f 65 74  | set through /et|
00000130  63 2f 6c 6f 67 69 6e 2e  63 6f 6e 66 2e 20 20 59  |c/login.conf.  Y|
00000140  6f 75 20 6d 61 79 20 6f  76 65 72 72 69 64 65 20  |ou may override |
00000150  74 68 65 6d 20 68 65 72  65 0a 23 20 69 66 20 77  |them here.# if w|
00000160  61 6e 74 65 64 2e 0a 23  20 73 65 74 20 70 61 74  |anted..# set pat|
00000170  68 20 3d 20 28 2f 73 62  69 6e 20 2f 62 69 6e 20  |h = (/sbin /bin |
00000180  2f 75 73 72 2f 73 62 69  6e 20 2f 75 73 72 2f 62  |/usr/sbin /usr/b|
00000190  69 6e 20 2f 75 73 72 2f  6c 6f 63 61 6c 2f 73 62  |in /usr/local/sb|
000001a0  69 6e 20 2f 75 73 72 2f  6c 6f 63 61 6c 2f 62 69  |in /usr/local/bi|
000001b0  6e 20 24 48 4f 4d 45 2f  62 69 6e 29 0a 23 20 73  |n $HOME/bin).# s|
000001c0  65 74 65 6e 76 09 42 4c  4f 43 4b 53 49 5a 45 09  |etenv.BLOCKSIZE.|
000001d0  4b 0a 23 20 41 20 72 69  67 68 74 65 6f 75 73 20  |K.# A righteous |
000001e0  75 6d 61 73 6b 0a 23 20  75 6d 61 73 6b 20 32 32  |umask.# umask 22|
000001f0  0a 0a 73 65 74 65 6e 76  09 45 44 49 54 4f 52 09  |..setenv.EDITOR.|
00000200  76 69 0a 73 65 74 65 6e  76 09 50 41 47 45 52 09  |vi.setenv.PAGER.|
00000210  6d 6f 72 65 0a 0a 69 66  20 28 24 3f 70 72 6f 6d  |more..if ($?prom|
00000220  70 74 29 20 74 68 65 6e  0a 09 23 20 41 6e 20 69  |pt) then..# An i|
00000230  6e 74 65 72 61 63 74 69  76 65 20 73 68 65 6c 6c  |nteractive shell|
00000240  20 2d 2d 20 73 65 74 20  73 6f 6d 65 20 73 74 75  | -- set some stu|
00000250  66 66 20 75 70 0a 09 73  65 74 20 70 72 6f 6d 70  |ff up..set promp|
00000260  74 20 3d 20 22 25 4e 40  25 6d 3a 25 7e 20 25 23  |t = "%N@%m:%~ %#|
00000270  20 22 0a 09 73 65 74 20  70 72 6f 6d 70 74 63 68  | "..set promptch|
00000280  61 72 73 20 3d 20 22 25  23 22 0a 0a 09 73 65 74  |ars = "%#"...set|
00000290  20 66 69 6c 65 63 0a 09  73 65 74 20 68 69 73 74  | filec..set hist|
000002a0  6f 72 79 20 3d 20 31 30  30 30 0a 09 73 65 74 20  |ory = 1000..set |
000002b0  73 61 76 65 68 69 73 74  20 3d 20 28 31 30 30 30  |savehist = (1000|
000002c0  20 6d 65 72 67 65 29 0a  09 73 65 74 20 61 75 74  | merge)..set aut|
000002d0  6f 6c 69 73 74 20 3d 20  61 6d 62 69 67 75 6f 75  |olist = ambiguou|
000002e0  73 0a 09 23 20 55 73 65  20 68 69 73 74 6f 72 79  |s..# Use history|
000002f0  20 74 6f 20 61 69 64 20  65 78 70 61 6e 73 69 6f  | to aid expansio|
00000300  6e 0a 09 73 65 74 20 61  75 74 6f 65 78 70 61 6e  |n..set autoexpan|
00000310  64 0a 09 73 65 74 20 61  75 74 6f 72 65 68 61 73  |d..set autorehas|
00000320  68 0a 09 73 65 74 20 6d  61 69 6c 20 3d 20 28 2f  |h..set mail = (/|
00000330  76 61 72 2f 6d 61 69 6c  2f 24 55 53 45 52 29 0a  |var/mail/$USER).|
00000340  09 69 66 20 28 20 24 3f  74 63 73 68 20 29 20 74  |.if ( $?tcsh ) t|
00000350  68 65 6e 0a 09 09 62 69  6e 64 6b 65 79 20 22 5e  |hen...bindkey "^|
00000360  57 22 20 62 61 63 6b 77  61 72 64 2d 64 65 6c 65  |W" backward-dele|
00000370  74 65 2d 77 6f 72 64 0a  09 09 62 69 6e 64 6b 65  |te-word...bindke|
00000380  79 20 2d 6b 20 75 70 20  68 69 73 74 6f 72 79 2d  |y -k up history-|
00000390  73 65 61 72 63 68 2d 62  61 63 6b 77 61 72 64 0a  |search-backward.|
000003a0  09 09 62 69 6e 64 6b 65  79 20 2d 6b 20 64 6f 77  |..bindkey -k dow|
000003b0  6e 20 68 69 73 74 6f 72  79 2d 73 65 61 72 63 68  |n history-search|
000003c0  2d 66 6f 72 77 61 72 64  0a 09 65 6e 64 69 66 0a  |-forward..endif.|
000003d0  0a 65 6e 64 69 66 0a 0a  69 66 20 28 20 24 3f 74  |.endif..if ( $?t|
000003e0  63 73 68 20 29 20 74 68  65 6e 0a 20 20 20 20 20  |csh ) then.     |
000003f0  20 20 20 62 69 6e 64 6b  65 79 20 22 5e 57 22 20  |   bindkey "^W" |
00000400  62 61 63 6b 77 61 72 64  2d 64 65 6c 65 74 65 2d  |backward-delete-|
00000410  77 6f 72 64 0a 20 20 20  20 20 20 20 20 62 69 6e  |word.        bin|
00000420  64 6b 65 79 20 2d 6b 20  75 70 20 68 69 73 74 6f  |dkey -k up histo|
00000430  72 79 2d 73 65 61 72 63  68 2d 62 61 63 6b 77 61  |ry-search-backwa|
00000440  72 64 0a 20 20 20 20 20  20 20 20 62 69 6e 64 6b  |rd.        bindk|
00000450  65 79 20 2d 6b 20 64 6f  77 6e 20 68 69 73 74 6f  |ey -k down histo|
00000460  72 79 2d 73 65 61 72 63  68 2d 66 6f 72 77 61 72  |ry-search-forwar|
00000470  64 0a 20 20 20 20 20 20  20 20 62 69 6e 64 6b 65  |d.        bindke|
00000480  79 20 27 5c 65 5b 48 27  20 20 20 20 62 65 67 69  |y '\e[H'    begi|
00000490  6e 6e 69 6e 67 2d 6f 66  2d 6c 69 6e 65 20 20 20  |nning-of-line   |
000004a0  20 20 20 23 20 68 6f 6d  65 0a 20 20 20 20 20 20  |   # home.      |
000004b0  20 20 62 69 6e 64 6b 65  79 20 27 5c 65 5b 46 27  |  bindkey '\e[F'|
000004c0  20 20 20 20 65 6e 64 2d  6f 66 2d 6c 69 6e 65 20  |    end-of-line |
000004d0  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 23  |               #|
000004e0  20 65 6e 64 0a 20 20 20  20 20 20 20 20 62 69 6e  | end.        bin|
000004f0  64 6b 65 79 20 27 5c 65  5b 33 7e 27 20 20 20 64  |dkey '\e[3~'   d|
00000500  65 6c 65 74 65 2d 63 68  61 72 20 20 20 20 20 20  |elete-char      |
00000510  20 20 20 20 20 20 20 23  20 64 65 6c 65 74 65 0a  |       # delete.|
00000520  20 20 20 20 20 20 20 20  62 69 6e 64 6b 65 79 20  |        bindkey |
00000530  27 5c 65 5b 31 3b 35 43  27 20 66 6f 72 77 61 72  |'\e[1;5C' forwar|
00000540  64 2d 77 6f 72 64 20 20  20 20 20 20 20 20 23 20  |d-word        # |
00000550  63 74 72 6c 20 72 69 67  68 74 0a 20 20 20 20 20  |ctrl right.     |
00000560  20 20 20 62 69 6e 64 6b  65 79 20 27 5c 65 5b 31  |   bindkey '\e[1|
00000570  3b 35 44 27 20 62 61 63  6b 77 61 72 64 2d 77 6f  |;5D' backward-wo|
00000580  72 64 20 20 20 20 20 23  20 63 74 72 6c 20 6c 65  |rd     # ctrl le|
00000590  66 74 0a 20 20 20 20 20  20 20 20 62 69 6e 64 6b  |ft.        bindk|
000005a0  65 79 20 27 5c 65 5b 31  7e 27 20 20 20 62 65 67  |ey '\e[1~'   beg|
000005b0  69 6e 6e 69 6e 67 2d 6f  66 2d 6c 69 6e 65 20 20  |inning-of-line  |
000005c0  20 20 23 20 68 6f 6d 65  0a 20 20 20 20 20 20 20  |  # home.       |
000005d0  20 62 69 6e 64 6b 65 79  20 27 5c 65 5b 34 7e 27  | bindkey '\e[4~'|
000005e0  20 20 20 65 6e 64 2d 6f  66 2d 6c 69 6e 65 20 20  |   end-of-line  |
000005f0  20 20 20 20 20 20 20 20  20 20 20 20 23 20 65 6e  |            # en|
00000600  64 0a 20 20 20 20 65 6e  64 69 66 0a 0a           |d.    endif..|
0000060d

Corrupted file:
Code:
[yottabit@nas1 ~]$ hexdump --canonical /mnt/vol1/jacob.mcdonald_migrated/.zfs/snapshot/migration/.cshrc
00000000  00 00 03 ea f1 4b 23 20  24 46 72 65 65 42 53 44  |.....K# $FreeBSD|
00000010  24 0a 23 0a 23 20 2e 63  73 68 72 63 20 2d 20 63  |$.#.# .cshrc - c|
00000020  73 68 20 72 65 73 6f 75  72 63 65 20 73 63 72 69  |sh resource scri|
00000030  70 74 2c 20 72 65 61 64  20 61 74 20 62 65 67 69  |pt, read at begi|
00000040  6e 6e 69 6e 67 20 6f 66  20 65 78 65 63 75 74 69  |nning of executi|
00000050  6f 6e 20 62 79 20 65 61  63 68 20 73 68 65 6c 6c  |on by each shell|
00000060  4f 00 80 73 65 65 20 61  6c 73 6f 4f 00 f0 1b 28  |O..see alsoO...(|
00000070  31 29 2c 20 65 6e 76 69  72 6f 6e 28 37 29 2e 0a  |1), environ(7)..|
00000080  23 20 6d 6f 72 65 20 65  78 61 6d 70 6c 65 73 20  |# more examples |
00000090  61 76 61 69 6c 61 62 6c  65 63 00 b4 2f 75 73 72  |availablec../usr|
000000a0  2f 73 68 61 72 65 2f 21  00 f3 0d 2f 63 73 68 2f  |/share/!.../csh/|
000000b0  0a 23 0a 0a 61 6c 69 61  73 20 68 09 09 68 69 73  |.#..alias h..his|
000000c0  74 6f 72 79 20 32 35 14  00 a3 6a 09 09 6a 6f 62  |tory 25...j..job|
000000d0  73 20 2d 6c 11 00 94 6c  61 09 6c 73 20 2d 61 46  |s -l...la.ls -aF|
000000e0  10 00 11 66 10 00 24 46  41 10 00 11 6c 10 00 f0  |...f..$FA...l...|
000000f0  4f 6c 41 46 0a 0a 23 20  54 68 65 73 65 20 61 72  |OlAF..# These ar|
00000100  65 20 6e 6f 72 6d 61 6c  6c 79 20 73 65 74 20 74  |e normally set t|
00000110  68 72 6f 75 67 68 20 2f  65 74 63 2f 6c 6f 67 69  |hrough /etc/logi|
00000120  6e 2e 63 6f 6e 66 2e 20  20 59 6f 75 20 6d 61 79  |n.conf.  You may|
00000130  20 6f 76 65 72 72 69 64  65 20 74 68 65 6d 20 68  | override them h|
00000140  65 72 65 0a 23 20 69 66  20 77 61 6e 74 65 64 eb  |ere.# if wanted.|
00000150  00 00 48 00 f1 00 70 61  74 68 20 3d 20 28 2f 73  |..H...path = (/s|
00000160  62 69 6e 20 2f 05 00 01  e7 00 05 0a 00 05 09 00  |bin /...........|
00000170  57 6c 6f 63 61 6c 19 00  02 10 00 00 1f 00 50 24  |Wlocal........P$|
00000180  48 4f 4d 45 3c 00 12 29  56 00 f2 16 65 6e 76 09  |HOME<..)V...env.|
00000190  42 4c 4f 43 4b 53 49 5a  45 09 4b 0a 23 20 41 20  |BLOCKSIZE.K.# A |
000001a0  72 69 67 68 74 65 6f 75  73 20 75 6d 61 73 6b 0a  |righteous umask.|
000001b0  23 08 00 53 20 32 32 0a  0a 33 00 94 45 44 49 54  |#..S 22..3..EDIT|
000001c0  4f 52 09 76 69 11 00 60  50 41 47 45 52 09 92 01  |OR.vi..`PAGER...|
000001d0  f0 00 0a 0a 69 66 20 28  24 3f 70 72 6f 6d 70 74  |....if ($?prompt|
000001e0  29 d4 00 f2 04 6e 0a 09  23 20 41 6e 20 69 6e 74  |)....n..# An int|
000001f0  65 72 61 63 74 69 76 65  e6 01 31 20 2d 2d 23 01  |eractive..1 --#.|
00000200  f1 00 73 6f 6d 65 20 73  74 75 66 66 20 75 70 0a  |..some stuff up.|
00000210  09 ee 00 01 3f 00 f8 02  20 3d 20 22 25 4e 40 25  |....?... = "%N@%|
00000220  6d 3a 25 7e 20 25 23 20  22 1d 00 51 63 68 61 72  |m:%~ %# "..Qchar|
00000230  73 22 00 32 23 22 0a 19  00 52 66 69 6c 65 63 0b  |s".2#"...Rfilec.|
00000240  00 04 de 01 62 3d 20 31  30 30 30 14 00 40 73 61  |....b= 1000..@sa|
00000250  76 65 18 00 00 47 01 00  16 00 72 20 6d 65 72 67  |ve...G....r merg|
00000260  65 29 1d 00 52 61 75 74  6f 6c 1d 00 90 61 6d 62  |e)..Rautol...amb|
00000270  69 67 75 6f 75 73 b9 00  40 55 73 65 20 35 00 00  |iguous..@Use 5..|
00000280  2b 02 f6 01 74 6f 20 61  69 64 20 65 78 70 61 6e  |+...to aid expan|
00000290  73 69 6f 6e 3a 00 01 13  00 16 64 10 00 62 72 65  |sion:.....d..bre|
000002a0  68 61 73 68 10 00 40 6d  61 69 6c 73 00 50 2f 76  |hash..@mails.P/v|
000002b0  61 72 2f 0d 00 90 2f 24  55 53 45 52 29 0a 09 2b  |ar/.../$USER)..+|
000002c0  01 40 20 24 3f 74 30 03  04 2b 01 f7 13 09 62 69  |.@ $?t0..+....bi|
000002d0  6e 64 6b 65 79 20 22 5e  57 22 20 62 61 63 6b 77  |ndkey "^W" backw|
000002e0  61 72 64 2d 64 65 6c 65  74 65 2d 77 6f 72 64 24  |ard-delete-word$|
000002f0  00 54 2d 6b 20 75 70 9f  00 84 2d 73 65 61 72 63  |.T-k up...-searc|
00000300  68 2d 34 00 0a 28 00 4c  64 6f 77 6e 2a 00 30 66  |h-4..(.Ldown*.0f|
00000310  6f 72 5d 00 93 0a 09 65  6e 64 69 66 0a 0a 07 00  |or]....endif....|
00000320  0f 97 00 00 13 20 01 00  0f 9d 00 0f 03 29 00 05  |..... .......)..|
00000330  2a 00 0f a3 00 0b 0f 2e  00 00 0f a9 00 09 0c 2f  |*............../|
00000340  00 60 27 5c 65 5b 48 27  16 00 05 56 04 80 2d 6f  |.`'\e[H'...V..-o|
00000350  66 2d 6c 69 6e 65 15 00  90 20 20 23 20 68 6f 6d  |f-line...  # hom|
00000360  65 0a 0b 00 00 02 00 04  95 00 00 38 00 11 46 38  |e..........8..F8|
00000370  00 3a 65 6e 64 32 00 06  02 00 4f 23 20 65 6e 73  |.:end2....O# ens|
00000380  00 03 20 33 7e 3c 00 03  93 01 00 88 02 06 35 00  |.. 3~<........5.|
00000390  01 38 00 02 1a 00 0f 76  00 02 63 31 3b 35 43 27  |.8.....v..c1;5C'|
000003a0  20 79 01 01 cf 01 01 34  00 01 37 00 42 63 74 72  | y.....4..7.Bctr|
000003b0  6c 7f 03 0f 3b 00 05 26  44 27 12 02 00 0b 02 08  |l...;..&D'......|
000003c0  39 00 3f 6c 65 66 38 00  04 01 ae 00 0f 21 01 02  |9.?lef8......!..|
000003d0  0f 1f 01 08 11 34 36 00  0f 1f 01 06 06 1d 01 03  |.....46.........|
000003e0  35 02 1f 00 01 00 ff db  50 00 00 00 00 00 00 00  |5.......P.......|
000003f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000600  00 00 00 00 00 00 00 00  00 00 00 00 00           |.............|
0000060d


My overnight replication from comment #29 has finished. This was replicating from the original dataset, set to lz4, to a new encrypted dataset on the same pool set to lz4. The corruption was not reproduced!
Code:
[yottabit@nas1 ~]$ md5sum /mnt/vol1/jacob.mcdonald/.zfs/snapshot/migration/.cshrc
eb4ea27bddb78d8caf944b9c0f76ccae  /mnt/vol1/jacob.mcdonald/.zfs/snapshot/migration/.cshrc
[yottabit@nas1 ~]$ md5sum /mnt/vol1/jacob.mcdonald_migrated/.zfs/snapshot/migration/.cshrc
16da5f6ca9fb073a578996c538b3167c  /mnt/vol1/jacob.mcdonald_migrated/.zfs/snapshot/migration/.cshrc
[yottabit@nas1 ~]$ md5sum /mnt/vol1/jacob.mcdonald_migrated_lz4_lz4/.zfs/snapshot/migration/.cshrc 
eb4ea27bddb78d8caf944b9c0f76ccae  /mnt/vol1/jacob.mcdonald_migrated_lz4_lz4/.zfs/snapshot/migration/.cshrc


So I think this narrows it down to one remaining possibility, unless I missed something:
  • Reading from the intermediate dataset, which has been recompressed to zstd (undetected zstd read corruption)
It is possible that changing the original dataset back to lz4 hid another problem, even though the migration snapshot was taken while the dataset was set to zstd (with nearly all of the files still lz4 on disk). I will change the original dataset back to zstd and try the small single-file dataset replication test again.
 
Joined
Oct 22, 2019
Messages
3,641
I'm having no luck on my end. :confused:

Can you give me a step-by-step to reproduce this, and I'll try it again on my Manjaro Linux computer?

Everything and every combo I tried yields the same hash, no corruption.
 

yottabit

Contributor
Joined
Apr 15, 2012
Messages
192
Here's the first sequence I tried, which caused the corruption:
  1. Old pool, consisting of 2x Z1 vdevs of different ages, originally created around FreeNAS 9, dataset is lz4 compression for most of its life, dozen or so snapshots, ~2 TiB data, ~4 TiB dataset size (with snapshots)
  2. Changed the dataset compression property from lz4 to zstd O(weeks) ago, have not written much data since (and the specific `.cshrc` file we have been discussing here as an example has an mtime of 2019-10-03)
  3. New pool, consisting of a single disk, several years old but only recently put into service, created with TrueNAS 12 SCALE, dataset is zstd compression
  4. Snapshot the source dataset
  5. Replicate source dataset (latest snapshot only) to the intermediate dataset on the new pool; target properties were zstd compression and no encryption
  6. Replicate intermediate dataset to the destination dataset on the old pool; target properties were zstd compression and password encryption
  7. Observe the intermediate dataset returns the same md5sum as the source dataset
  8. Observe the destination dataset returns a different md5sum from the intermediate dataset
  9. In all, 0.9% of the files were corrupted, determined by running an md5sum check against the destination dataset; the corruption seems to occur mostly on specific types of files (`.cshrc`, Rylo `*.properties`, `*.ico`, etc.)
This dataset contains sensitive PII, so I'm unwilling to share it as-is. But I could perhaps delete the PII from a good replicated version and share that if I can still reproduce the problem, or maybe I can delete the PII from the original dataset and share the snapshot if I'm still able to reproduce the problem. I don't know how much, if any, of the history of this particular dataset is triggering the bug (it seems unlikely that the source dataset is the culprit at all, and more likely the intermediate zstd dataset).
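For anyone wanting to repeat the check in step 9, here is a minimal Python sketch of an md5 comparison between two mounted dataset trees. The function names and the idea of walking the snapshot directories are my own assumptions, not the exact commands used above; it only reports files whose hashes differ or that are missing on the destination.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex md5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(source_root, dest_root):
    """Compare every regular file under source_root with its counterpart
    under dest_root. Returns (mismatched_relative_paths, files_checked)."""
    source_root = Path(source_root)
    dest_root = Path(dest_root)
    mismatched = []
    checked = 0
    for src in sorted(source_root.rglob("*")):
        # Skip directories and symlinks; only regular files are hashed.
        if src.is_symlink() or not src.is_file():
            continue
        checked += 1
        rel = src.relative_to(source_root)
        dst = dest_root / rel
        if not dst.is_file() or md5_of(src) != md5_of(dst):
            mismatched.append(rel.as_posix())
    return mismatched, checked
```

Pointing `source_root` and `dest_root` at the two `.zfs/snapshot/migration` directories would give the list of affected files, and `len(mismatched) / checked` gives the corrupted fraction.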
 
Joined
Oct 22, 2019
Messages
3,641
Old pool, consisting of 2x Z1 vdevs of different ages, originally created around FreeNAS 9, dataset is lz4 compression for most of its life, dozen or so snapshots, ~2 TiB data, ~4 TiB dataset size (with snapshots)
This is something that stands out. Very old zpool from the FreeNAS 9 era, replicating into TrueNAS SCALE with ZFS 2.1, and then back again to a different dataset on the old pool.

Replicate intermediate dataset to the destination dataset on the old pool; target properties were zstd compression and password encryption
I don't believe ZSTD was a supported compression property back then? Did you upgrade the old pool's features prior to these replications?
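One way to answer the feature question is to look for `feature@zstd_compress` in `zpool get all` output like the listing earlier in the thread. Below is a rough Python sketch that parses that text; the function names are hypothetical, and the zstd line in the usage example is made up for illustration, not taken from the pool in question.

```python
def parse_pool_features(zpool_get_output):
    """Map feature name -> state from 'zpool get all <pool>' text.

    Expects whitespace-separated columns: NAME PROPERTY VALUE SOURCE.
    """
    features = {}
    for line in zpool_get_output.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1].startswith("feature@"):
            name = fields[1][len("feature@"):]
            features[name] = fields[2]
    return features


def zstd_state(zpool_get_output):
    """Return the state of zstd_compress, or 'missing' if it is not listed."""
    return parse_pool_features(zpool_get_output).get("zstd_compress", "missing")
```

For example, feeding it `sudo zpool get all vol1` output would return `missing` on a pool whose listing has no zstd feature line, and `enabled` or `active` once the pool has been upgraded.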
 

AlexGG

Contributor
Joined
Dec 13, 2018
Messages
171
The corrupted file is in fact LZ4-compressed data as is.

During replication, it does change the compression mode in the block pointer as requested but does not actually recompress the data. This way, when you copy to the LZ4-compressed dataset, there is no problem as no recompression is needed. I wonder if the problem persists if the target dataset uses LZJB compression.
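This can be checked by hand against the hex dump, since the bytes follow the LZ4 block layout: a token byte, literal bytes, then a 2-byte little-endian match offset. Here is a minimal Python sketch of a raw LZ4 block decoder, for illustration only. The function name is hypothetical, and it does not handle any size header that ZFS may place in front of the block, so treat that part as an assumption to verify before feeding it a real dump.

```python
def lz4_block_decode(src):
    """Decode a raw LZ4 block (no frame header, no length prefix)."""
    out = bytearray()
    i = 0
    n = len(src)
    while i < n:
        token = src[i]
        i += 1

        # High nibble: literal length, extended by 255-valued bytes.
        lit_len = token >> 4
        if lit_len == 15:
            while True:
                extra = src[i]
                i += 1
                lit_len += extra
                if extra != 255:
                    break
        out += src[i:i + lit_len]
        i += lit_len

        # The final sequence carries literals only.
        if i >= n:
            break

        # 2-byte little-endian offset back into the output so far.
        offset = src[i] | (src[i + 1] << 8)
        i += 2
        if offset == 0 or offset > len(out):
            raise ValueError("invalid match offset")

        # Low nibble: match length minus 4, extended the same way.
        match_len = token & 0x0F
        if match_len == 15:
            while True:
                extra = src[i]
                i += 1
                match_len += extra
                if extra != 255:
                    break
        match_len += 4

        # Copy byte by byte so overlapping matches repeat correctly.
        start = len(out) - offset
        for k in range(match_len):
            out.append(out[start + k])
    return bytes(out)
```

In the dump above, the byte `80` followed by `see also` and `4f 00` appears to read as exactly such a sequence: 8 literals, then a 4-byte match at offset 79.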
 
Joined
Oct 22, 2019
Messages
3,641
The corrupted file is in fact LZ4-compressed data as is.

During replication, it does change the compression mode in the block pointer as requested but does not actually recompress the data. This way, when you copy to the LZ4-compressed dataset, there is no problem as no recompression is needed. I wonder if the problem persists if the target dataset uses LZJB compression.
I can't figure out what's causing this for @yottabit.

Even when I try to reproduce the above, everything works as expected, regardless of whether I go from LZ4 to ZSTD or vice versa, or even change the compression property of the destination dataset after the fact.

My only guesses are that it could be something to do with:
  • the original pool being very old (from the FreeNAS 9 days)
  • upgrading a pool's features across many ZFS iterations, which can unearth these quirks
  • or, finally, a bug within SCALE itself and/or SCALE + ZFS 2.1.x.

My tests to attempt to reproduce the bug were on:
  • Manjaro Linux, fully up-to-date system
  • ZFS 2.1.1
  • Newly created pools (with the default of enabling all features)
 