Datasets: questions

pts2800

Cadet
Joined
Jul 20, 2022
Messages
3
Hello TrueNAS community! I just finished setting up my first TrueNAS box, and I have some best practice questions about datasets:

More for my own understanding, it seems like datasets are the actual folders, and the shares just allow you to share that data, am I correct in this thinking?

For my real question, I'm looking to create a media dataset for Movies, Shows, Photos and Music. Since they will all more than likely have the same permissions, would it make more sense to have a single dataset called Media and then have nested datasets for each type of media? Or would best practice be to create a separate dataset for each media type?

I read that it's good practice to have a separate dataset for each share. I am just unsure if a nested dataset is also best practice, or if this would be a good use case of nested datasets.
 
Joined
Oct 22, 2019
Messages
3,641
More for my own understanding, it seems like datasets are the actual folders, and the shares just allow you to share that data, am I correct in this thinking?
Datasets are not "folders". They are distinct filesystems.

The way they are presented, you will see them as "folders" down your hierarchy, using the default mountpoints (such as /mnt/mypool/movies).
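One way to see this for yourself: because each mounted dataset is its own filesystem, the OS reports a different device ID (`st_dev`) for a dataset's mountpoint than for the directory it sits in. Here is a minimal Python sketch that lists such boundaries under a path. The pool path `/mnt/mypool` is a hypothetical example, and walking a large media tree this way can take a while.

```python
import os

def filesystem_boundaries(root):
    """Return directories under root whose filesystem (device ID) differs
    from that of their parent directory, e.g. child ZFS datasets."""
    boundaries = []
    for dirpath, dirnames, _filenames in os.walk(root):
        parent_dev = os.stat(dirpath).st_dev
        for name in dirnames:
            path = os.path.join(dirpath, name)
            try:
                # lstat: don't follow symlinks that merely point elsewhere
                if os.lstat(path).st_dev != parent_dev:
                    boundaries.append(path)
            except OSError:
                continue  # entry vanished or is unreadable; skip it
    return sorted(boundaries)

if __name__ == "__main__":
    # Hypothetical pool path; each child dataset should be listed here,
    # while plain folders inside a dataset should not.
    for path in filesystem_boundaries("/mnt/mypool"):
        print(path)
```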


Or would best practice be to create a separate dataset for each media type?
This depends on what you plan to do going forward. If you want to manage dataset properties, as well as snapshots and replications separately, then you can organize them into separate datasets.


If you believe a single dataset to consolidate all your multimedia works best for you (and you just need a place to keep your media without much granularity), then a single dataset can serve this purpose.
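To make the trade-off concrete, here is a sketch that only builds (does not run) the `zfs` command lines for a nested layout. The pool and dataset names (`mypool`, `Media`, `Movies`, ...) are hypothetical, and on TrueNAS you would normally create datasets and snapshot tasks through the web UI; the commands just illustrate what happens underneath. With child datasets, Photos can be snapshotted on its own; with one `Media` dataset, a snapshot covers everything in it.

```python
# Builds zfs command lines only; nothing is executed.
MEDIA_TYPES = ["Movies", "Shows", "Photos", "Music"]

def nested_layout(pool, parent, children):
    """Commands for a parent dataset plus one child dataset per media type."""
    cmds = [["zfs", "create", f"{pool}/{parent}"]]
    for child in children:
        cmds.append(["zfs", "create", f"{pool}/{parent}/{child}"])
    return cmds

def snapshot_cmd(dataset, snapname, recursive=False):
    """Snapshot one dataset; with recursive=True (-r), its descendants too."""
    cmd = ["zfs", "snapshot"]
    if recursive:
        cmd.append("-r")
    cmd.append(f"{dataset}@{snapname}")
    return cmd

if __name__ == "__main__":
    for cmd in nested_layout("mypool", "Media", MEDIA_TYPES):
        print(" ".join(cmd))
    # Nested layout: photos can be snapshotted (and replicated) on their own.
    print(" ".join(snapshot_cmd("mypool/Media/Photos", "daily")))
    # Single "Media" dataset: one snapshot necessarily covers every media type.
    print(" ".join(snapshot_cmd("mypool/Media", "daily")))
```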


It also depends on your modification/deletion habits.

For example, if you deal with large movie files and delete movies often, snapshots can retain a lot of "space", since they hold onto the records of previously deleted movies. Meanwhile, let's say that you replicate a separate dataset containing only music. That "music" dataset won't grow larger than it should be, since the "movies" dataset's snapshots are not tethered to it. Hence, replications that you send to a "backup pool" will be much more efficient, as will the space consumed on the backup pool.
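A toy model of the effect described above, with made-up sizes in GiB. It treats whole files as the unit that snapshots hold onto (real ZFS accounting works on blocks and is more nuanced), but it shows why deleting a movie doesn't free its space while a snapshot still references it, and why a separate music dataset stays unaffected.

```python
def used_space(live_files, snapshots):
    """Toy model: space used = every distinct file referenced by the live
    dataset or by any of its snapshots. A file's identity is (name, size)."""
    referenced = set(live_files.items())
    for snap in snapshots:
        referenced |= set(snap.items())
    return sum(size for _name, size in referenced)

# Separate "movies" dataset: snapshot, then delete one movie and add another.
movies = {"movie_a.mkv": 40, "movie_b.mkv": 35}
movie_snaps = [dict(movies)]           # the snapshot freezes the current files
del movies["movie_a.mkv"]
movies["movie_c.mkv"] = 30
print(used_space(movies, movie_snaps))  # 105: movie_a is still held by the snapshot
print(used_space(movies, []))           # 65 once that snapshot is destroyed

# Separate "music" dataset: its snapshots never reference any movie data.
music = {"album.flac": 1}
print(used_space(music, [dict(music)]))  # 1
```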
 

pts2800

Datasets are not "folders". They are distinct filesystems.
I guess I've got some more research to do :)

If you believe a single dataset to consolidate all your multimedia works best for you (and you just need a place to keep your media without much granularity), then a single dataset can serve this purpose.

I'm not planning on replicating or taking snapshots of any of the Media data, with the exception of photos, but those will be backed up off-site. As most of the media is replaceable, I'd rather not waste space on my array for backups. So it sounds like a single dataset might be the way to go for me.

Thank you for taking your time to respond!
 

Whattteva

Wizard
Joined
Mar 5, 2013
Messages
1,824
More for my own understanding, it seems like datasets are the actual folders, and the shares just allow you to share that data, am I correct in this thinking?
Copying my reply from another similar thread about datasets:
I find datasets are a concept that is not well explained by many guides, including even the official docs. A dataset is a lot more than just a folder/dir. It's its own distinct filesystem. Think of it as a completely separate partition that happens to be really flexible (no hard capacity limits unless you set quotas). So moving a file from one dataset to another is not an instant operation: rather than a simple pointer update, it actually has to copy the file and then delete the original. Moreover, snapshots are taken at the dataset level, so if you want to structure your backups so you don't have one giant dataset to snapshot (which will take forever to initially send to another dataset), multiple datasets are a good way to do it. It's also not advisable to mix SMB ACLs and UNIX permissions in the same dataset. I'd advise keeping them separate to avoid a lot of permissions headaches.
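The copy-then-delete behaviour shows up in any program: a plain rename only works within one filesystem, and across datasets the OS refuses it with `EXDEV`, so the data has to be copied and the original removed. Python's `shutil.move` already does this fallback internally; the sketch below spells it out for a single file. The function name `move_file` and the paths in the usage note are hypothetical.

```python
import errno
import os
import shutil

def move_file(src, dst):
    """Move a single file and report how it was done."""
    try:
        os.rename(src, dst)  # same filesystem: only a directory entry changes
        return "rename"
    except OSError as exc:
        if exc.errno != errno.EXDEV:  # EXDEV = cross-device (other dataset)
            raise
    # Different filesystem: every byte is copied, then the original deleted.
    shutil.copy2(src, dst)
    os.unlink(src)
    return "copy+delete"
```

For example, if `/mnt/mypool/Downloads` and `/mnt/mypool/Media/Movies` are separate datasets, `move_file("/mnt/mypool/Downloads/film.mkv", "/mnt/mypool/Media/Movies/film.mkv")` would be expected to return `"copy+delete"`, and take time proportional to the file size.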


For my real question, I'm looking to create a media dataset for Movies, Shows, Photos and Music. Since they will all more than likely have the same permissions, would it make more sense to have a single dataset called Media and then have nested datasets for each type of media? Or would best practice be to create a separate dataset for each media type?
It really depends on how you intend to use them, and even on what kind of share you want to use (SMB vs NFS), because NFS mounts do NOT cross filesystem boundaries. So if you plan to use just one "universal root" NFS share and expect to see all your files in all the datasets, you'll find that it doesn't work, and you have to mount each dataset separately. SMB shares, however, do cross filesystem boundaries. Also, refer to my comments about snapshots above.
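A small sketch of the practical consequence for NFS: given the path you'd like to use as the "root" share and the mountpoints of the datasets on the pool, list every dataset at or below that path, since each one has to be mounted separately. The mountpoints, the host name `nas`, and the client directory `/media` are hypothetical placeholders.

```python
import posixpath

def nfs_mounts_needed(share_root, dataset_mountpoints):
    """Return every dataset mountpoint at or below share_root. Each needs its
    own NFS mount, because mounting share_root alone won't expose the
    contents of child datasets."""
    root = posixpath.normpath(share_root)
    needed = []
    for mp in dataset_mountpoints:
        mp = posixpath.normpath(mp)
        if mp == root or mp.startswith(root + "/"):
            needed.append(mp)
    return sorted(needed)

if __name__ == "__main__":
    datasets = ["/mnt/mypool", "/mnt/mypool/Media",
                "/mnt/mypool/Media/Movies", "/mnt/mypool/Media/Music"]
    for mp in nfs_mounts_needed("/mnt/mypool/Media", datasets):
        # Hypothetical client-side mount command for each dataset.
        local = "/media/" + posixpath.relpath(mp, "/mnt/mypool")
        print(f"mount -t nfs nas:{mp} {local}")
```

With SMB, a single share on `/mnt/mypool/Media` would show the child datasets' contents without extra steps.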

I read that it's good practice to have a separate dataset for each share. I am just unsure if a nested dataset is also best practice, or if this would be a good use case of nested datasets.
Whether they are nested or not really makes no difference in my opinion. And they can be easily moved (renamed) to different mountpoints (folders) if need be.
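For example, moving a child dataset out from under `Media` is a single `zfs rename` within the pool, and when the dataset uses the inherited (default) mountpoint, its path follows the new name. A sketch assuming TrueNAS's `/mnt` prefix for pool mountpoints and hypothetical dataset names; any shares pointing at the old path would need to be updated afterwards.

```python
def rename_cmd(old, new):
    """zfs command that renames (moves) a dataset within the same pool."""
    return ["zfs", "rename", old, new]

def default_mountpoint(dataset, prefix="/mnt"):
    """Mountpoint a dataset gets when it inherits the default layout
    (assumes pools are mounted under /mnt, as on TrueNAS)."""
    return f"{prefix}/{dataset}"

if __name__ == "__main__":
    old, new = "mypool/Media/Movies", "mypool/Movies"
    print(" ".join(rename_cmd(old, new)))
    print(default_mountpoint(old), "->", default_mountpoint(new))
```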
 