SMBD process goes to 100% and copy speed collapses: ARC request demand metadata issue?

Nicolas_Studiokgb

Contributor
Joined
Aug 7, 2020
Messages
130
Hello @anodos
Thanks for your update. I'll update my system in the morning and let you know ASAP how it works.
And yes, I've read about Samba 4.18. I'm sure it's a lot of work to update. Let's hope the path won't be too long.
Thanks again
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,554
It's not really about lots of work. I've already completed it. It's about risk-tolerance on the stable / enterprise train. This is the sort of change we save for major releases.
 

Nicolas_Studiokgb

Yes, and I completely understand that. I can do some testing on my test setup for you if needed; let me know.
Thanks
 

Nicolas_Studiokgb

Should I add fruit:streamname_optimization = true to my share's aux parameters after I've done the update, or is this optimization automatically activated system-wide?
 

Nicolas_Studiokgb

You'll have to enable it. That parameter is still not default in 13.0-stable.
Ok I'll do that.
Maybe just a dumb question, but which preset is best to use when creating an SMB share for Macs? Is selecting no preset and enabling or disabling options the same as selecting an existing preset? Also, is there a way to store a preset with everything set up as I need it, options and auxiliary parameters? Thanks again
 

Nicolas_Studiokgb


I spotted an inefficiency in file / dir deletion in case of streams. Assuming that our config hasn't been hacked in such a way that streams are being written to files rather than xattrs the above should significantly improve perf on deletion.

NOTES:
0. update file is based on 13-stable MASTER (not through QE). It is provided for validation purposes _only_.
1. this is a tarball and should be copied as-is to NAS
2. update may be installed via freenas-update <path to tar>
3. new boot environment should be visible in beadm list output with an "R" next to it.
4. if (3), then reboot NAS.

This will only improve deletion perf in cases where streams are present on files. For general metadata perf improvements (reduction of opens, shared mode lock contention, etc) we'd have to upgrade from Samba 4.15 to 4.18, which is a bridge too far for 13.0 U release.
Hello @anodos

I installed the update this morning and ran some tests. (I left everything as it was yesterday, with fruit:streamname_optimization = true added as an aux parameter at share level. I've checked that the NAS booted into the custom Samba.)

What I can say with my sample (107 GB, 16k items):
Copy is the same speed, about 5 minutes.
Delete is now 5 minutes, but the smbd process is back to 90-100% CPU (instead of 60% on U4 with streamname optimization, before the custom Samba).

 

awasb

Patron
Joined
Jan 11, 2021
Messages
415
It speeds things up considerably over here, too. But I don't see any CPU "spikes". (Maybe because the pool's i/o is not really high class.)
 

anodos

In terms of what's deliverable for U5 and Samba 4.15, this is probably the limit. Any deeper changes would probably be a greater risk than bringing in Samba 4.18.
 

awasb

That's what I guessed. Meanwhile: Those "finder precalculates file lists and deletes one by one"-hours are completely gone. So, (for now) it's a substantial improvement. Thank you!
 

Nicolas_Studiokgb

FWIW, the higher CPU utilization during deletion in the patched version probably is indicative of doing less pointless disk IO and so you're back to being CPU-bound on delete speed.
With under 150 GB of data I'm mainly in RAM ARC (256 GB); there is almost no disk activity during those tests. And I do the deletion just after the copy, so everything is in ARC. What surprises me is that it was only 60% yesterday. If I reboot into U4 I will be able to rerun the test to confirm this behaviour.
Meanwhile, things have clearly sped up over these last two days.
Besides, is there a way to multithread these smbd processes? 31 cores sleep while one is overloaded.
Thanks again
 

awasb

Only via multichannel for multiple NICs/sessions. AFAIK.
 

Nicolas_Studiokgb

@anodos
Another question: after patching, should I delete and recreate the SMB shares, or is that not needed?
 

Nicolas_Studiokgb

No changes to shares are needed.
Hello @anodos

I've done some more testing, modifying 2 parameters on the Mac side (Monterey).
The data set is 107 GB, 16k items (audio files, .wav).
The Mac is rebooted between each settings modification and is the only client in the setup.
TrueNAS is v13 with the Samba port update @anodos sent a few days ago, and fruit:streamname_optimization = true is added in aux parameters at share level. (The rest of the settings are as before: sync disabled on pool and dataset, no compression, no atime, strict sync = no in the SMB service.)

The test is: copy to server // open a folder with about 2k/8k items inside // delete (everything done from Finder on the Mac).
The figure (XXM) indicates the level of ARC request demand metadata during the operation.

Tested with and without the use of DS Store files on network volumes:
defaults write com.apple.desktopservices DSDontWriteNetworkStores -bool TRUE (or FALSE)

and with and without dir_cache_off=Yes in nsmb.conf.

The result is very interesting, as TrueNAS behaves completely differently depending on which is "on" or "off".

To simplify:

"DS Store disabled" means Finder will not use DS Store files on the network volume. "Enabled" means the opposite.
"With dir cache off" means dir_cache_off=Yes is added to nsmb.conf. "Without" means the opposite.
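
For anyone wanting to reproduce the four combinations, the two client-side toggles can be sketched as below. This is only a sketch under assumptions: the [default] section header and the /etc/nsmb.conf location are not stated above, and NSMB_CONF is a hypothetical variable that points at a scratch file by default so nothing on the system gets overwritten by accident.

```shell
#!/bin/sh
# Sketch of the two macOS client-side toggles used in the tests.
# NSMB_CONF is a hypothetical variable: it defaults to a scratch file.
# The system-wide file is assumed to be /etc/nsmb.conf (needs root to edit).
NSMB_CONF="${NSMB_CONF:-./nsmb.conf.example}"

# 1. "DS Store disabled": stop Finder writing DS Store files on network
#    volumes. Use "-bool FALSE" to go back to "DS Store enabled".
#    Skipped when the macOS "defaults" tool is not available.
if command -v defaults >/dev/null 2>&1; then
    defaults write com.apple.desktopservices DSDontWriteNetworkStores -bool TRUE
fi

# 2. "With dir cache off": disable the SMB client's directory cache.
#    Note: this overwrites whatever NSMB_CONF points at.
cat > "$NSMB_CONF" <<'EOF'
[default]
dir_cache_off=yes
EOF

cat "$NSMB_CONF"
```

As in the tests above, the Mac should be rebooted after changing either setting before measuring anything.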

1 : DS Store disabled + with dir cache off =
Copy 6min30s (<10M) // Open Folder: minutes to show content, so barely usable (50M / smbd 100%) // Delete 1min30s (<20M) !! Fastest ever

2 : DS Store enabled + with dir cache off =
Copy 5min (<10M) // Open Folder: unusable, minutes to show content, and ARC metadata sticks at 50M for 30 minutes!!! smbd stays at 100% even after closing the Finder windows and doing nothing. During this time you can't do anything on the server from the Mac, and the Mac user doesn't know that the server is crawling // Delete (after waiting for the ARC metadata activity to finish) 1min30s (<20M) Fastest ever.

3 : DS Store enabled + without dir cache off = (the default Apple configuration for these two settings)
Copy 6min (<10M) // Open Folder: a second to show content, best browsing experience // Delete 6min (40M)

4 : DS Store disabled + without dir cache off =
Copy 5min (<10M) // Open Folder: not the best experience but acceptable, a few seconds to show content // Delete 6min (35M)

Voilà, all these results are reproducible.

So we can see that there are clearly some differences in performance depending on how the Mac is configured. It comes down to how the Mac handles and uses its DS Store files, and whether it can use local caching on the Mac side while using network shares.
We can also see that we can achieve very fast delete performance over SMB, but in that case the browsing experience is a nightmare.
And that's my question: why? Are there some more adjustments possible on the TrueNAS side or the Mac side, in the same way that streamname_optimization and the last update helped?

Let me know if anything
Nicolas
 

anodos

Directory listing is impacted by two things mainly:

1) If this is via Finder, macOS will often list the dir contents at least once, and then iterate the list of files in the directory and open each one separately, list its xattrs, and possibly read them. This changes what should be something done in linear time (like Windows does) to something quadratic or worse. The solution in this case is to design your storage so that you have fewer files per directory. I.e. the only way to make O(n^2) (or worse) closer to O(n) is to reduce n.

2) There is a fundamental design issue in Samba 4.15 that impacts certain types of workloads like this and that is not fully addressed until Samba 4.18.
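
To put rough numbers on point 1, here is a toy model. It is an illustration only, not a measurement: it counts abstract "operations", assumes the per-file work grows in proportion to the directory size (which is what makes the total quadratic), and the function names are made up for this sketch.

```shell
#!/bin/sh
# Toy model only: counts abstract "operations", not measured SMB requests.

# Linear listing: one operation per directory entry.
linear_ops() {
    echo "$1"
}

# Quadratic listing: for each of the files, work proportional to the
# number of files in the same directory.
quadratic_ops() {
    echo $(( $1 * $1 ))
}

# The same files spread evenly over several directories, each of which
# is listed quadratically on its own.
split_ops() {
    total=$1
    dirs=$2
    per_dir=$(( total / dirs ))
    echo $(( dirs * per_dir * per_dir ))
}

for n in 2000 8000; do
    echo "n=$n linear=$(linear_ops "$n") quadratic=$(quadratic_ops "$n") split_over_8_dirs=$(split_ops "$n" 8)"
done
```

In this model, spreading 8000 files over 8 directories cuts the quadratic cost by a factor of 8, which is the "reduce n" argument in numbers.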
 

Nicolas_Studiokgb

Hello
Every test I run is via Finder, as I want to reproduce real-world usage: my users will use Finder.
The problem is that I can't control the number of files per folder, as it is the session output of Protools (project file + all the audio files used by the session). The typical number of files is around 2k/10k.

We see from the tests that using or not using the SMB local cache completely changes the browsing experience. This might mean Finder performs all its actions through the local cache instead of the network.

We also see that it impacts deletion: it's 5 times faster, and metadata usage on the server is halved, if no local cache is used on the Mac.

Network DS Store usage clearly improves browsing and drastically reduces metadata requests on the server. Browsing with DS Stores spikes low, about 5/10M, but without DS Stores it's up to 40/50M for a few seconds (or ages if there is no local cache).

I hope I'll be able to beta test a version with 4.18 to see if it speeds things up in my workflow.

Is there a beta train for Core?

Thanks
 

anodos

Samba 4.18 isn't even in nightlies for Core yet (and won't be until we branch out for 13.1). I'd have to build a custom ISO for you, which would distract from the work that can be put into 13.0-U5 to help out. FWIW, I have another round of improvements that appear to reliably give a 20% speedup in my test cases for a "plain" Windows-style directory listing (which is part of what Macs do). Will try to get that into U5 before code freeze.
 