Files growing (a lot) during replication - what am I doing wrong?

mythos

Cadet
Joined: Jul 7, 2020
Messages: 1
Hello everyone. I'm seeing some strange behavior in my FreeNAS setup and was hoping that someone could help me to understand what is going on.

Background:
I currently have a FreeNAS setup consisting of a pool of six 12TB drives configured as RAIDZ2. Underneath the top-level dataset, I have a number of other datasets organized mostly by media type (photos, music, video, backups, and so on).

I need to grow this pool and am in the process of doing so. I have six more 12TB drives ready to go, and my plan was to
  1. Replicate all of my data to a completely different pool
  2. Destroy the original pool
  3. Add the new drives to the system
  4. Recreate the original pool with the new drives added in
  5. Replicate from the backup pool to the newly created pool.
To execute this plan, I need enough temporary storage to hold all of my original data. For that, I managed to get hold of 16 older 3TB drives, which I augmented with two 4TB drives I had lying around. I am aware that mixing drive sizes is not ideal, but I was hoping that I could just treat the 4TB drives as 3TB drives for the duration of this temporary process. The fact that there would be wasted space on the 4TB drives does not bother me.
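For rough sizing, assuming RAIDZ2 on both sides and treating the 4TB drives as if they were 3TB: the original pool gives roughly (6 − 2) × 12TB = 48TB of data capacity, and the temporary 18-drive pool roughly (18 − 2) × 3TB = 48TB, so on paper the temporary pool should just barely hold everything.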

So, I set up these 18 drives (also as RAIDZ2), created snapshots of the original pool, and started the replication process.
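Roughly speaking, the replication amounts to something like this on the command line (just a sketch — "migration-1" is a placeholder snapshot name, and the pool names are the ones that show up in the zdb dumps below):

Code:
# recursive snapshot of everything under the original pool (snapshot name is a placeholder)
zfs snapshot -r Goliath@migration-1

# send the whole dataset tree and receive it into the backup pool
zfs send -R Goliath@migration-1 | zfs recv -F GoliathBackup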

The Problem:
Replication is getting close to finishing, but it is going to fail. Why? Because most (but not all) of the replicated data seems to have grown to roughly 150% of its original size, and I just don't know why.

Drilling down a bit, I took a look at my "Photos" data and focused on one specific file object using zdb. I turned on a bunch of -ddddd's and dumped the stats for one of the files before and after replication. This is where things start to get confusing. On the surface, everything looks the same: the block size, the structure of the file, and the logical and physical sizes of each block all appear identical. The only things that seem to differ are
  • the physical locations of each of the blocks on disk (this makes sense to me)
  • the overall `dsize` of the file (this is the strange part)
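For anyone who wants to poke at this themselves, the dumps were produced with something along these lines (a sketch only — the exact flags may have differed slightly, but object 12406 is the file in question):

Code:
# dump full on-disk details for a single object in each dataset
zdb -ddddd Goliath/photos 12406
zdb -ddddd GoliathBackup/photos 12406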
Here is the start of the zdb dump for each version of the file. I've also attached the full dumps to the thread in case anyone is interested.

File in original pool
Code:
Dataset Goliath/photos [ZPL], ID 113, cr_txg 2204, 27.1G, 4726 objects, rootbp DVA[0]=<0:36c1192a9000:3000> DVA[1]=<0:ca0acf6f000:3000> [L0 DMU objset] fletcher4 lz4 LE contiguous unique double size=800L/200P birth=597612L/597612P fill=4726 cksum=11a86feea6:64aff8ab68f:12c2edd2d32da:26d6e0c097824f

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
     12406    2    32K    16K  2.86M  2.83M  100.00  ZFS plain file (K=inherit) (Z=inherit)
                                        168   bonus  System attributes
    dnode flags: USED_BYTES USERUSED_ACCOUNTED 
    dnode maxblkid: 180
    path    <redacted>
    uid     1000
    gid     1000
    atime    Wed Jan  9 23:56:10 2019
    mtime    Sat May 15 13:08:22 2004
    ctime    Wed Jan  9 23:56:10 2019
    crtime    Sat May 15 13:08:22 2004
    gen    3402
    mode    100660
    size    2959582
    parent    10911
    links    1
    pflags    40800000004
Indirect blocks:
               0 L1  0:103a6d2e000:6000 0:d8004185000:6000 8000L/1e00P F=181 B=3402/3402
               0  L0 0:183844a7000:6000 4000L/4000P F=1 B=3402/3402
            4000  L0 0:183844b9000:6000 4000L/4000P F=1 B=3402/3402
            8000  L0 0:183844bf000:6000 4000L/4000P F=1 B=3402/3402
<snip>
          2cc000  L0 0:183850dd000:6000 4000L/4000P F=1 B=3402/3402
          2d0000  L0 0:183850e3000:6000 4000L/4000P F=1 B=3402/3402

        segment [0000000000000000, 00000000002d4000) size 2.83M


Replicated file in backup pool
Code:
Dataset GoliathBackup/photos [ZPL], ID 492, cr_txg 3139, 36.1G, 4726 objects, rootbp DVA[0]=<0:25d26af74000:3000> DVA[1]=<0:94337b3f000:3000> [L0 DMU objset] fletcher4 lz4 LE contiguous unique double size=800L/200P birth=3168L/3168P fill=4726 cksum=dbae30877:49c3d186b91:d391e2f87026:1ad0922f3402e4

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
     12406    2    32K    16K  3.81M  2.83M  100.00  ZFS plain file (K=inherit) (Z=inherit)
                                        168   bonus  System attributes
    dnode flags: USED_BYTES USERUSED_ACCOUNTED 
    dnode maxblkid: 180
    path    <redacted>
    uid     1000
    gid     1000
    atime    Wed Jan  9 23:56:10 2019
    mtime    Sat May 15 13:08:22 2004
    ctime    Wed Jan  9 23:56:10 2019
    crtime    Sat May 15 13:08:22 2004
    gen    3402
    mode    100660
    size    2959582
    parent    10911
    links    1
    pflags    40800000004
Indirect blocks:
               0 L1  0:255a89f8f000:6000 0:a0391560000:6000 8000L/1e00P F=181 B=3168/3168
               0  L0 0:25d226f4c000:6000 4000L/4000P F=1 B=3168/3168
            4000  L0 0:25d226f6a000:6000 4000L/4000P F=1 B=3168/3168
            8000  L0 0:25d226f76000:6000 4000L/4000P F=1 B=3168/3168
<snip>
          2cc000  L0 0:25d26af23000:6000 4000L/4000P F=1 B=3168/3168
          2d0000  L0 0:25d26af29000:6000 4000L/4000P F=1 B=3168/3168

        segment [0000000000000000, 00000000002d4000) size 2.83M


Before replication, the logical size of the file is 2.83MB while the on-disk size is 2.86MB, so the on-disk size is ~101% of the logical size.
After replication, the logical size of the file is still 2.83MB, but the on-disk size is now 3.81MB, so the on-disk size is ~134.6% of the logical size.

The recordsize of the top-level datasets for the original and backup pools is 128KB, while the recordsize for the "photos" dataset is 16KB. If I go into the "edit settings" for the "photos" dataset on the backup pool, it warns me that 64KB is the optimal recordsize. I assume the 16KB value was replicated from the original dataset (which also shows 16KB, but gives no warning).
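In case it matters, this is how I'm reading the recordsize from the shell (standard zfs property, so this should work as-is):

Code:
# compare the recordsize property on the source and replicated datasets
zfs get recordsize Goliath/photos GoliathBackup/photos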

So, I'm pretty confused. I understand that there can be wasted space if a file only uses part of the underlying record/block size, but the dumps here seem to imply that the block structure of the two copies is functionally identical, as is the recordsize of the dataset.

Does anyone have any idea what I am doing wrong here? Why does this file take up so much extra space after replication to the pool made up of a larger number of smaller drives? Is there something I can do to prevent this from happening? I'm not too concerned with maximizing performance during the backup-and-restore phase of this plan; I just want to make sure I don't lose anything, and that the rebuilt original pool performs well once it's done.
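If it helps anyone diagnose this, I can also post the output of something like the following, which should show logical vs. allocated space for every dataset on both pools (logicalused is a standard property, so I expect this to work as written):

Code:
# compare allocated space against logical (pre-overhead) space per dataset
zfs list -r -o name,used,logicalused,recordsize Goliath GoliathBackup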

Thanks in advance for any help!
 

Attachments

  • after_replication.txt (13.4 KB)
  • before_replication.txt (13.2 KB)