Fastest way to get list of all files in a dataset

tony95

Contributor
Joined
Jan 2, 2021
Messages
117
From Windows, I have a program that connects over SMB and queries *.*, and it takes a few minutes to complete. I don't believe it's the size of the information returned, because if I run the query again it returns in a few seconds. But after a few minutes it is slow again. Is there any way I can speed this up permanently? Is SMB the problem?
 

tony95

Contributor
Joined
Jan 2, 2021
Messages
117
Looks like navigating to the dataset and doing
Code:
ls -lR > myfile.txt

will output all the data quickly
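For reference, a sketch of a more database-friendly listing, assuming TrueNAS CORE's FreeBSD userland (BSD stat) and a placeholder output file name:
Code:
# Emits "path|size-in-bytes|mtime-epoch" per file; listing.psv is a placeholder name
find . -type f -exec stat -f '%N|%z|%m' {} + > listing.psv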
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
From Windows, I have a program that connects over SMB and queries *.*, and it takes a few minutes to complete. I don't believe it's the size of the information returned, because if I run the query again it returns in a few seconds. But after a few minutes it is slow again. Is there any way I can speed this up permanently? Is SMB the problem?
I believe the faster result is due to the metadata being cached, then evicted after a few minutes.

You could try fiddling with some tunables to increase either the metadata longevity or the metadata cache size.
 

tony95

Contributor
Joined
Jan 2, 2021
Messages
117
I SSH'd into the server and ran
Code:
find "$PWD" -name "*.mp4" > listmp4.txt

So far it is near instant, but I don't know whether it is actually faster or just cached.
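A rough way to tell cache from raw speed (a sketch, assuming TrueNAS CORE's FreeBSD sysctl names): compare the ARC hit/miss counters before and after the run.
Code:
# ARC hit/miss counters; a run that barely moves "misses" was served from cache
sysctl kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses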
 

tony95

Contributor
Joined
Jan 2, 2021
Messages
117
I rebooted the server and yeah, it was just cached. Might have been a little faster than pulling a complete list from SMB. What do you suggest for preserving the metadata?
 
Joined
Oct 22, 2019
Messages
3,641
There are actually two different caches at play:
  • ZFS ARC (on the server's RAM itself)
  • SMB cache (on the client's RAM)

Both affect performance.

I know with Linux clients, you can override the default SMB cache behavior from cache=strict to cache=loose, which yields a notable performance difference.

Not sure how it's done on Windows. Probably some obscure registry key.

If you're the only user accessing the SMB share, there's no risk in using cache=loose.
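For reference, a minimal sketch of mounting with the looser cache mode from a Linux client (the server, share, and user names are placeholders):
Code:
# cache=loose relaxes SMB cache coherency; only safe if no other client writes to the share
mount -t cifs //truenas/tank /mnt/tank -o username=tony,cache=loose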


EDIT: Either way, it's best to retrieve a directory tree locally on the server itself, rather than through SMB.
 
Joined
Oct 22, 2019
Messages
3,641
I rebooted the server and yeah, it was just cached. Might have been a little faster than pulling a complete list from SMB. What do you suggest for preserving the metadata?
Here is a thread about this issue, and how to prevent metadata from being aggressively evicted from the ARC:


However, starting with OpenZFS 2.2.0+, this will no longer apply:
 

tony95

Contributor
Joined
Jan 2, 2021
Messages
117
Here is a thread about this issue, and how to prevent metadata from being aggressively evicted from the ARC:


However, starting with OpenZFS 2.2.0+, this will no longer apply:
Thanks. I got up this morning and pulled a file list over SSH, and it was still near instant, so maybe it will only be slow on the first run. I'm not sure, but that is the first time I have seen the file list return quickly after not being accessed for several hours. It's not a huge deal; I just pull a list every few days to keep a SQL Server database up to date with what is on my NAS. If it continues to take a few minutes to get going, I can live with it, but having it a little more "on demand" would be nice.
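One hypothetical way to make it more "on demand" (a sketch; the paths are placeholders): a cron entry on the server that regenerates the list overnight, so the import always reads a fresh local file.
Code:
# Rebuild the mp4 list at 03:00 daily
0 3 * * * find /mnt/tank/media -name "*.mp4" > /mnt/tank/media/listmp4.txt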
 

tony95

Contributor
Joined
Jan 2, 2021
Messages
117
Here is a thread about this issue, and how to prevent metadata from being aggressively evicted from the ARC:


However, starting with OpenZFS 2.2.0+, this will no longer apply:
I updated the vfs.zfs.arc.meta_min parameter and rebooted. Before, my cache was wiped out if I copied a large file; I just copied 100 files and the file list was still returned in about 1 second. SMB is taking about 9 seconds, but it retrieves everything regardless of extension. The new files were also in the list, so it is accurate. Thanks, this does seem to help a lot.
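For reference, a sketch of what setting that looks like from the shell on CORE (the 16 GiB value is a placeholder; on TrueNAS it is normally added as a sysctl tunable in the web UI so it persists across reboots):
Code:
# Reserve ~16 GiB of ARC for metadata so it is not evicted first (placeholder value)
sysctl vfs.zfs.arc.meta_min=17179869184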
 
Joined
Oct 22, 2019
Messages
3,641
Thanks, this does seem to help a lot.
Glad to hear!

Just keep in mind:
  • The tunable will not be available in OpenZFS 2.2+ (which TrueNAS Core will eventually inherit).
    • You'll have to play it by ear with ZFS 2.2.x. If things work well? Leave it as is. If not? You can adjust the new tunable (seen in the updated thread).
  • The local client's SMB cache is untethered from the server's ZFS ARC. A slowdown over SMB does not necessarily mean anything was evicted from the ZFS ARC on your TrueNAS server.
  • There's always going to be overhead when accessing anything (even a directory tree) over a network share protocol.
 

tony95

Contributor
Joined
Jan 2, 2021
Messages
117
Glad to hear!

Just keep in mind:
  • The tunable will not be available in OpenZFS 2.2+ (which TrueNAS Core will eventually inherit).
    • You'll have to play it by ear with ZFS 2.2.x. If things work well? Leave it as is. If not? You can adjust the new tunable (seen in the updated thread).
  • The local client's SMB cache is untethered from the server's ZFS ARC. A slowdown over SMB does not necessarily mean anything was evicted from the ZFS ARC on your TrueNAS server.
  • There's always going to be overhead when accessing anything (even a directory tree) over a network share protocol.
Update: so far even SMB has greatly improved. Several days and many files later, accessing the file list is much faster. It takes about 30 seconds to get a full list if I have not requested it in a day or so, and this is a 50 TB dataset that is more than 90% full; before, it was about 2 minutes. I only do this a few times each week, probably less if I'm not collecting junk, which I always am. General performance of the pool also seems to have improved. That may be subjective, but it makes sense: if the file locations are cached, everything opens faster, not just my huge queries.
 