ZFS Dataset in Subfolder of another Dataset?

BetYourBottom

Contributor
Joined
Nov 26, 2016
Messages
141
Is it possible to add a ZFS dataset into an already existing subfolder of another dataset? I'd prefer not to have to create an entire set of nested datasets just to pad out the depth, and on top of that, migrating all the already-arranged data into those padding datasets seems like it'd be a pain.

Side Question: Is there an easy way to set up a folder filled with mounted mirrors of other directories? I need to share a few directories from different areas to one user, but there doesn't seem to be an easy way to carve those specific directories out into a single share.
 

BetYourBottom

Contributor
Joined
Nov 26, 2016
Messages
141
So far I have not found a method to create a dataset within a subdirectory of another dataset; however, I have found the answer to my side question.

On FreeBSD, to mount a copy of a folder/directory in another place, you use mount_nullfs. If you want it to be relatively permanent, you can add the complete mount command under Tasks > Init/Shutdown Scripts as a command that runs at Pre Init (probably the best time, since that's right after the filesystems are mounted).
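For example, a minimal sketch (both paths are hypothetical; the first is the existing directory, the second is where its mirror appears):

mount_nullfs /mnt/tank/media/photos /mnt/tank/shares/photos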

My guess for the original question is that there isn't a way on FreeNAS since datasets are meant to be configured exclusively through the GUI and there isn't a method exposed to do what I asked.

The workaround: combine my side question with my unfortunate discovery regarding my main question.
Create the dataset somewhere convenient, separate from where you want it, then use mount_nullfs to mount a mirror of the dataset in the final location you want.

It isn't exactly how I'd like it, and I'm not certain how it'll interact with ACLs and other settings, but it seems like the only method if you want to sort a subset of some data in a way that doesn't disturb the rest.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
ZFS allows creating a dataset under another dataset, and the mount point can be anything. I don't know whether the FreeNAS GUI supports all the options, though.

But ZFS supports this:
mypool -> /mypool
mypool/dataset1 -> /home
mypool/dataset1/dataset2 -> /whatever
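At the command line, that layout might be created like this (names are hypothetical; note how the mount points don't have to mirror the dataset hierarchy):

zfs create -o mountpoint=/home mypool/dataset1
zfs create -o mountpoint=/whatever mypool/dataset1/dataset2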
 

BetYourBottom

Contributor
Joined
Nov 26, 2016
Messages
141
Yeah, the GUI doesn't seem to support arbitrary mount points. I saw another thread that didn't match what I was asking here, but some people there said that going outside the GUI, at least for mount points and dataset locations, is a bad idea because of how FreeNAS wraps ZFS. So I'm unsure what to do other than the workaround I posted.
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Theoretically, this is possible, but not recommended, as datasets are intended to be nested directly. However, you could set a subfolder mountpoint on dataset creation via zfs create -o mountpoint=/mnt/path/to/subfolder <name of dataset>. More in line with the ZFS way is to create the intermediate datasets automatically via zfs create -p root-dataset/path/to/dataset.
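For example (pool, dataset, and path names are hypothetical):

zfs create -o mountpoint=/mnt/tank/media/photos tank/photos   # mount a dataset inside an existing directory tree
zfs create -p tank/media/photos                               # create tank/media and tank/media/photos in one step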
 

BetYourBottom

Contributor
Joined
Nov 26, 2016
Messages
141
Theoretically, this is possible, but not recommended, as datasets are intended to be nested directly. However, you could set a subfolder mountpoint on dataset creation via zfs create -o mountpoint=/mnt/path/to/subfolder <name of dataset>. More in line with the ZFS way is to create the intermediate datasets automatically via zfs create -p root-dataset/path/to/dataset.
Which way do you think would be more appropriate/maintainable: my nullfs mount at boot, or just stacking all the necessary datasets?

I can see the nullfs way being a bit strange, and I could see issues arising with recursive permissions depending on how they're applied.

On the other hand, stacking the datasets seems like it'll be at least a mild pain: move operations in and out of the affected directories will take longer, because a move across dataset boundaries is really a copy followed by a delete. That's the primary reason I've avoided using datasets as my primary sorting mechanism; anything being ingested gets an extra write when I find a final location for it.

Though I will say that the extra copy would help eliminate the fragmentation I've seen with multithreaded downloads. I've seen files that copy off the NAS at 40-70 MB/s on average (with spikes from 14 to 110 MB/s), yet once copied back to the server they transfer at 480-580 MB/s in both directions.
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Neither, as yet. I don't understand your use case. What are you attempting to do?
 

BetYourBottom

Contributor
Joined
Nov 26, 2016
Messages
141
Neither, as yet. I don't understand your use case. What are you attempting to do?
I have a set of related data that's all categorized together a few sub-categories down. One subset of that data looks like it's best stored uncompressed at the file level and will be highly compressible at the filesystem level. On top of that, it'll likely be almost entirely reads with little to no writes. So the best way to store it seems to be gzip at maximum compression: it minimizes disk space, reads seem roughly equivalent to lz4, and the penalty of writes hitting the CPU hard is mostly mitigated by the workload being read-heavy.

I'm not sure whether I'd change the record size; I have to look more into how the data will be accessed for that. I do know that I want it readable via Samba shares.
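A minimal sketch of that setup (the dataset name and record size are assumptions):

zfs create -o compression=gzip-9 tank/archive   # gzip at maximum level for read-mostly, compressible data
zfs set recordsize=1M tank/archive              # optional: larger records suit large sequential reads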

EDIT: On further thought, maybe the copy-vs-move concern is a bit overblown; looking at the data more, most of it is accessed directly rather than moved out and back in locally. Though I will say the initial transfer from directories to datasets will take quite a long time, since I don't believe it can be done in place.
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
OK, this is simpler than I originally thought. Create a single new dataset with the compression settings you like for sharing via SMB, then move the relevant directory structure wholesale into the new dataset. Finally, you'll need to set permissions and ACLs recursively within the new dataset.

To accomplish the move, you can do something like the procedure described at https://docstore.mik.ua/orelly/unix3/upt/ch10_13.htm:

cd /mnt/path/to/orig-folder                              # start from the source directory
tar -cf - . | tar -xvf - -C /mnt/path/to/dest-dataset    # stream-copy the tree into the new dataset
rm -rf /mnt/path/to/orig-folder                          # delete the original tree
 

BetYourBottom

Contributor
Joined
Nov 26, 2016
Messages
141
How does tarring the data help with the move versus something like rsync? Also, is there a convenient method to confirm the checksums of the newly written data, to verify the move was handled properly?
Also, I'll have to break this up into several separate transfers, because the depth of the directory tree means I have to transfer other folders into each level of it.
Finally, after confirming the transfer was 100% successful and no data was altered, is there a convenient/safe way to excise this data from my snapshots, so I don't have multiple terabytes of unnecessarily duplicated data sitting in limbo?
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Tar doesn't consume a socket like rsync does, and is potentially faster. (On the other hand, rsync can run multithreaded.) Try them both. To verify a transfer, you could run tar -cf - . | md5 -q - in both the source and destination trees and check that you get the same hash.
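Concretely, that verification might look like this (paths hypothetical; the hashes will only match if the trees are byte-identical and tar walks them in the same order):

(cd /mnt/path/to/orig-folder && tar -cf - . | md5 -q -)
(cd /mnt/path/to/dest-dataset && tar -cf - . | md5 -q -)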

As for snapshots, those are deltas, so the storage consumed is less than you think. You can run zfs get usedbysnapshots <dataset> to see how much is actually consumed by all snapshots of that dataset. As snapshots expire and are deleted, this issue will go away over time.
 

BetYourBottom

Contributor
Joined
Nov 26, 2016
Messages
141
Tar doesn't consume a socket like rsync does, and is potentially faster. (On the other hand, rsync can run multithreaded.) Try them both. To verify a transfer, you could run tar -cf - . | md5 -q - in both the source and destination trees and check that you get the same hash.

As for snapshots, those are deltas, so the storage consumed is less than you think. You can run zfs get usedbysnapshots <dataset> to see how much is actually consumed by all snapshots of that dataset. As snapshots expire and are deleted, this issue will go away over time.
I understand that snapshots are deltas, but by essentially deleting several TBs of data and adding several TBs to a new dataset, the delta will basically be the entire transfer. Also, the expiry of the snapshots will take quite long, since I keep some snapshots for at least a year.
 

RubenKelevra

Cadet
Joined
Feb 11, 2023
Messages
1
Neither, as yet. I don't understand your use case. What are you attempting to do?
Dataset children are a feature for setting different ZFS options on parts of a dataset: different quotas, access rights, etc.

You can easily move datasets in and out of other datasets via `zfs rename`. Exception: you can't move them out of an encrypted parent dataset.
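For example (dataset names hypothetical):

zfs rename mypool/dataset1/dataset2 mypool/dataset2   # move dataset2 out from under dataset1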

Different mount points are supported, so the logical parent/child connection doesn't have to be represented by the folder structure.

The main advantage of dataset children is that snapshots can be taken of a dataset and all its children atomically, and reverted in the same manner. That's pretty helpful if you have child datasets, e.g., to set different compression settings, but want to send the parent dataset with all children as a package to the backup storage.
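A sketch of that pattern (pool, snapshot, and host names are hypothetical):

zfs snapshot -r mypool/dataset1@backup   # atomic snapshot of the dataset and all its children
zfs send -R mypool/dataset1@backup | ssh backuphost zfs receive -d backuppool   # replicate the whole tree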

Hope this helps :)
 