Replication broken due to orphaned system dataset

NomsPlease · Dec 14, 2023

I have been trying to find out why my replication is refusing to work. I had this set up previously, but only as a test scenario that replicated a single dataset in the pool. I am now setting it to replicate the whole pool, excluding the dataset called Media, which looks like it should be trivial.

My system dataset blocked me from selecting the whole pool, giving the error below.

Code:

Active side: cannot unmount '/var/db/system/netdata-ae32c386e13840b2bf9c0083275e7941': pool or dataset is busy.

After getting these errors, I did the most logical thing and moved my system dataset to my boot SSDs. They have plenty of space, so really, it's better there and out of the way of the other datasets. The move is done, and the main pool, BigPool, is now free of the .system datasets. (Other pool dataset names are redacted.)

Code:

boot-pool                                                                        25.6G  21.9G    96K  none
boot-pool/.system                                                                1.61G  21.9G  1.20G  legacy
boot-pool/.system/configs-f1f5036a6e4448d09a9ddb3c45165866                       6.88M  21.9G  6.88M  legacy
boot-pool/.system/cores                                                            96K  1024M    96K  legacy
boot-pool/.system/ctdb_shared_vol                                                  96K  21.9G    96K  legacy
boot-pool/.system/glusterd                                                        104K  21.9G   104K  legacy
boot-pool/.system/netdata-f1f5036a6e4448d09a9ddb3c45165866                        358M  21.9G   358M  legacy
boot-pool/.system/rrd-f1f5036a6e4448d09a9ddb3c45165866                           52.6M  21.9G  52.6M  legacy
boot-pool/.system/samba4                                                          284K  21.9G   284K  legacy
boot-pool/.system/services                                                         96K  21.9G    96K  legacy
boot-pool/.system/webui                                                            96K  21.9G    96K  legacy
boot-pool/ROOT                                                                   24.0G  21.9G    96K  none
boot-pool/ROOT/22.12.3.3                                                         6.09G  21.9G  6.08G  legacy
boot-pool/ROOT/22.12.4.2                                                         6.07G  21.9G  6.07G  legacy
boot-pool/ROOT/23.10.0                                                           5.94G  21.9G  5.94G  legacy
boot-pool/ROOT/23.10.0.1                                                         5.90G  21.9G  5.90G  legacy
boot-pool/ROOT/Initial-Install                                                      8K  21.9G  2.65G  /
boot-pool/grub                                                                   8.22M  21.9G  8.22M  legacy

Code:

BigPool                                                                          70.0T  46.1T   232K  /mnt/BigPool
BigPool/3*****                                                              47.1M  46.1T  47.1M  /mnt/BigPool/3*****
BigPool/A*****                                                                65.6G  46.1T   151K  /mnt/BigPool/A*****

When setting up my replication task, I select the entire pool and set the Exclude Child Dataset option to ignore the unwanted datasets. After saving and trying to run it, I got the same error. So I figured I would see if this directory exists and unmount it. Well, the directory doesn't exist, the dataset that would mount there doesn't exist either, and the task still seems stuck on it. I added BigPool/.system to the excluded child datasets, which again didn't fix it.

Code:

root@truenas[~]# ls -lash /var/db/system/netdata-ae32c386e13840b2bf9c0083275e7941
ls: cannot access '/var/db/system/netdata-ae32c386e13840b2bf9c0083275e7941': No such file or directory
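
For anyone following along, here is a minimal sketch of two further checks beyond `ls`: whether any `.system` dataset is still listed under the pool, and whether anything is still mounted under /var/db/system. The function names `filter_system_datasets` and `filter_system_mounts` are made up for illustration, and the pool name BigPool is taken from this thread. This is only a diagnostic aid under those assumptions, not a confirmed fix for the replication error.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of TrueNAS or ZFS).

# Print dataset names that contain a ".system" path component.
# Expects tab-separated `zfs list -H -o name,mountpoint` output on stdin.
filter_system_datasets() {
    awk -F '\t' '$1 ~ /(^|\/)\.system(\/|$)/ { print $1 }'
}

# Print mount entries whose mount point is /var/db/system or below it.
# Expects /proc/mounts style lines (device, mount point, ...) on stdin.
filter_system_mounts() {
    awk '$2 == "/var/db/system" || index($2, "/var/db/system/") == 1 {
        print $1 " on " $2
    }'
}

# On the NAS itself (assumes the pool name BigPool):
#   zfs list -H -r -t filesystem -o name,mountpoint BigPool | filter_system_datasets
#   filter_system_mounts < /proc/mounts
```

If the first command prints nothing, the pool no longer lists a `.system` dataset. The second shows which datasets are actually mounted at the system dataset paths, which may help narrow down where the stale netdata path in the error comes from.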

I tried to remake the task entirely, not reusing the previous task in case it somehow got that dataset stuck in it. This made no difference and resulted in the same error again. I'm unsure how replication keeps seeing this dataset that does not exist and how this error keeps getting thrown for a non-existent directory.

The only way I could get the task to run and bypass the dataset error was to select every dataset individually instead of the pool; that worked. I still want to replicate the entire pool, though, because when I add new datasets I would rather they be included by default and require me to exclude them manually.

Screenshot 2023-12-14 at 11.41.13 AM.png

Could anyone point me in a direction to resolve this? Any input would be appreciated.

morganL · Dec 14, 2023

We need to start with the version of SCALE you are using. Bluefin or Cobia?

We tend to recommend replicating individual datasets rather than whole pools with exclusions. It's cleaner, simpler, and better tested.

If necessary, a top-level dataset can be used to hold the child datasets that need replication (advice for users planning their setup; not so easy if your system is already set up).

NomsPlease · Dec 15, 2023

Right, the version would be essential: I am running the current version of Cobia, TrueNAS-SCALE-23.10.0.1.

I see how selecting specific datasets would lessen the chance of accidentally replicating something undesired, but I cannot trust myself to update my replication tasks every time I add a dataset. Regardless, replication should not get stuck on something that does not exist, which is what is happening here. I can replicate individual datasets, but selecting the pool always brings this .system dataset back into play.
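
If per-dataset replication ends up being the workaround, a minimal sketch like the one below could print the pool's direct children minus the excluded ones, so the list can be compared against the datasets selected in the task. The function name `list_children_except` is hypothetical, and it assumes dataset names without spaces; BigPool and Media are the names from this thread.

```shell
#!/bin/sh
# Hypothetical helper: list a pool's direct child datasets except excluded ones.
# Expects `zfs list -H -d 1 -o name POOL` output on stdin.
# $1 = pool name, remaining arguments = child names to skip (e.g. Media).
list_children_except() {
    pool=$1
    shift
    excludes=" $* "
    while IFS= read -r name; do
        [ "$name" = "$pool" ] && continue    # skip the pool root itself
        child=${name#"$pool"/}               # strip the "POOL/" prefix
        case "$excludes" in
            *" $child "*) continue ;;        # skip excluded children
        esac
        printf '%s\n' "$name"
    done
}

# On the NAS itself (assumes the pool name BigPool):
#   zfs list -H -d 1 -o name BigPool | list_children_except BigPool Media
```

This only reports what exists; it does not touch the replication task, so any new dataset would still have to be added to the task by hand.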

NomsPlease · Dec 18, 2023

Bump if anyone has any ideas.
