SMBD process goes to 100% and copy speed collapses: ARC requests demand_metadata issue?

Nicolas_Studiokgb

Contributor
Joined
Aug 7, 2020
Messages
130
Today I've copied at least 2TB and still no issue... I don't understand why it works some days and not others...
The only thing that really overloads the system is a "delete" from the Finder.
 

awasb

Patron
Joined
Jan 11, 2021
Messages
415
On my TrueNAS Core test system, the following auxiliary parameters are set by default:

Code:
min protocol = SMB2
vfs objects = fruit streams_xattr
fruit:metadata = stream
fruit:model = MacSamba
fruit:posix_rename = yes
fruit:veto_appledouble = no
fruit:wipe_intentionally_left_blank_rfork = yes
fruit:delete_empty_adfiles = yes


According to


they won't influence general availability (but some of them enhance certain performance/compatibility aspects).

I'd remove the SMB strict sync setting, paste the above parameters, restart smbd, set the test dataset (-> edit global options) to sync=disabled and atime=off, and start benchmarking on the client side.

Edit: Disabling sync and atime gives you maximum filesystem performance. The downside is that without sync, any writes not yet committed to disk will be toast if something (kernel panic, power failure etc.) goes wrong. But since you are testing on a pure stripe pool just for performance figures ... no harm done.
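
For reference, a rough command-line equivalent of those dataset settings might look like the sketch below. The dataset name tank/test is only a placeholder, and the run wrapper is a helper made up for this example: it prints the commands and only executes them when APPLY=1 is set.

```shell
#!/bin/sh
# DATASET is a placeholder; replace it with the real name of the test dataset.
DATASET="${DATASET:-tank/test}"

# Hypothetical helper: print the command by default, run it only if APPLY=1.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

run zfs set sync=disabled "$DATASET"   # unsafe on power loss, test pools only
run zfs set atime=off "$DATASET"       # stop updating access times on reads
run zfs get sync,atime "$DATASET"      # verify what is actually in effect
```

Running it once without APPLY=1 shows exactly what would be changed before anything touches the pool.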
 

Nicolas_Studiokgb

Contributor
Joined
Aug 7, 2020
Messages
130
OK, thanks.
I'll try that.
On the dataset, sync is already "disabled" and atime is already "off".
For info, here are the share settings:
1679161970481.png
 

awasb

Patron
Joined
Jan 11, 2021
Messages
415
Again. The share seems pretty default.

If you already disabled sync on the dataset, then strict sync = no had no (additional) effect, AFAIK. Reset the auxiliary service parameters to the "default" values (the ones that seem to be standard, that is) and test again.

Just out of curiosity:

Are you using a switch? If so: What exact model?
What 10Gb NIC do you use on the client, exactly?
 

Nicolas_Studiokgb

Contributor
Joined
Aug 7, 2020
Messages
130
I've launched a 670GB copy (the whole set of data I have for testing). I have a reference copy time from previous days, and the copy time is the same as before.
In terms of ARC Requests demand_metadata, it's approx. 5M more than in previous tests for the same set of files (when I compare the graphs below).
I tried the "delete" from Finder and it goes up to 60M demand_metadata requests, with SMBD processes at 100%... (see last graph)

HDD stats are below 1%, so the problem is not in the SAS array.

Question about SMBD processes: can there be more than one or two SMBD processes? Would that speed things up?

It's a direct connection for testing purposes. I don't have 10Gb switches available for now.
It's 10GbE (so it's not as good as fiber, but the Mac mini only offers this).
The server has a dual-port QLogic SFP+ card with a 10GbE module (FS.com).
Flow control is on on both sides, MTU 1500.
For copies or benchmarks from the Mac with large files, it reaches full 10GbE speed (I have a 100GB folder with five 20GB QuickTime files).
A Blackmagic speed test also runs at full speed.
I get the same copy speeds between two Mac minis, so that is not what worries me. The issue is when the metadata demand overloads the server during certain copies or file operations like delete.


Previous settings
1679163662517.png


New settings

1679163713866.png


During Delete

1679164398646.png
 

awasb

Patron
Joined
Jan 11, 2021
Messages
415
The server NIC is OK (= it's FreeBSD compatible). I checked that first when I read your signature. But the client NIC's chipset ... is it an internal NIC? (Just curiosity.)

Meanwhile: Please test with another macOS file manager like this one ...


The free version will be fine for the test.
 

Nicolas_Studiokgb

Contributor
Joined
Aug 7, 2020
Messages
130
Yes, it's the 10GbE NIC from Apple (when you buy the Mac mini you can opt for 10GbE instead of 1GbE). It's an AQC107-AFW (Aquantia/Marvell).
I have to wait for the delete process to complete (37k out of 55k), then I'll try.
Yes, the Finder can also be a cause of slowdown in the way it manages file operations over SMB.
Thank you for your time :)
 

Nicolas_Studiokgb

Contributor
Joined
Aug 7, 2020
Messages
130
So with Finder, the delete of 670GB / 55k files took 1 hour and 2 minutes (roughly 3x the time it took to copy those files: 23 minutes)
1679167742970.png


I've done some tests with Commander One (an alternative file manager for macOS).
Copying a 180GB, 8300-item folder takes about 8 minutes with Commander One and 4 minutes with Finder, so Finder copies twice as fast as Commander One.
Delete takes about 3 minutes for both Commander One and Finder, so no difference in this case.

Very interesting test:
a folder of 16k files, 107GB (.wav audio files)

Delete from the server with Finder: 19 minutes, with ARC Requests demand_metadata between 50M and 60M
Delete from the server with Commander One: 5 minutes, with ARC Requests demand_metadata mostly under 15M
Such a difference for the same bunch of files. See the graph below.

Our problem is in ARC Requests demand_metadata and how it behaves depending on which client sends the commands.
As soon as it goes above 50M, the server really struggles whatever it is trying to do (copy, delete...).

1679171249540.png


We really need to find a solution to this :)
 

awasb

Patron
Joined
Jan 11, 2021
Messages
415
I hesitate to make a final judgement here because there seem to have been some recent changes to the ARC code that affect the balance between MRU and MFU blocks (and the corresponding metadata).


I'm not sure yet if the "effects" this is showing on my test system (ARC melts down overnight from 110 GB to 2GB and grows again) are intentional. It may be that this is related to the problem you observed.


This is also the reason that keeps me from updating the production systems to U4.
 

Nicolas_Studiokgb

Contributor
Joined
Aug 7, 2020
Messages
130
Hello
I'm already testing again this morning.
Just to test, I tried disabling the Apple SMB2/3 extensions, but performance became so bad that I put them back without hesitation.

I saw those posts, you're right. Maybe there is also an issue in ARC management. I upgraded to U4 almost right after installation, so I don't have real tests on the previous version.

I also have some questions about the Samba version, currently 4.15.13.
Some posts mention that with all the recent security fixes, performance has been really affected, and that the new version 4.18 will bring us back to 4.12 performance.


So I will certainly wipe the pool, change the system drives and do a TrueNAS 12.0 install (the version with Samba 4.12)
and see if there is any difference in performance.
 

Nicolas_Studiokgb

Contributor
Joined
Aug 7, 2020
Messages
130
Hi guys
To go further with the tests, I erased the pool and reinstalled a TrueNAS V12 version.
So I'm testing V12 U8.1 to compare (Samba v4.13.17) (I don't know where to find older versions).
I've reproduced the same pool, user and ACL config as in V13U4.
No special settings except the Apple SMB2/3 extensions enabled in the SMB service and these settings in the share:

1679231704699.png


180GB, 8300 items: copy 5 minutes (almost the same as V13) // delete 2 minutes, so WAY faster than V13U4.
We also see that for the same data (and metadata), ARC Requests demand_metadata is around 10M for copy instead of around 15M in V13, and delete doesn't go above 45M instead of 65M.

1679230936966.png


Now a test with the full set of data, 670GB 55k items:

Copy: 24 minutes, so almost the same as V13
Delete: 45 minutes, so 17 minutes better than V13 but still not very fast. Same issue with SMBD processes overloading cores.

Is it normal that these processes overload the CPU? There are 31 threads sleeping alongside and only one overloading... (it moves randomly from one core to another)

In the screenshot, on the left is the end of the copy process, where we see less than 15/20M in demand_metadata; after that, the delete process goes up to 50M and decreases as it completes.
HDD stats are idle during the whole process, no overload on the HDD side.
1679235522468.png


I'm launching the same test between the 2 Mac minis (12.6.3) to see what happens.
 

Nicolas_Studiokgb

Contributor
Joined
Aug 7, 2020
Messages
130
Some more tests between 2 Mac minis (12.6.3).

Following up on the previous posts:

Delete of the same set of files (670GB, 55k items) on one Mac from another Mac took 4 minutes!!!
To summarize :
V13u4 : 1h02
V12u8.1 : 45minutes
Mac to mac : 4 minutes

So there is clearly an issue there.
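
To put the three delete times from the summary above on the same scale, here is a quick back-of-the-envelope sketch that converts them to files deleted per second. It only uses the figures already quoted (55k items; 62, 45 and 4 minutes); files_per_sec is just a helper name made up for this example.

```shell
#!/bin/sh
# Files deleted per second = item count / (elapsed minutes * 60).
files_per_sec() {
    # $1 = number of items, $2 = elapsed minutes
    awk -v n="$1" -v m="$2" 'BEGIN { printf "%.1f\n", n / (m * 60) }'
}

# Figures reported above for the 670GB / 55k-item test set.
echo "V13U4:      $(files_per_sec 55000 62) files/s"
echo "V12U8.1:    $(files_per_sec 55000 45) files/s"
echo "Mac to Mac: $(files_per_sec 55000 4) files/s"
```

Seen this way, the gap is in per-file operations rather than raw bandwidth, which fits with large-file copies running at full speed.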

Besides that, some strange results on copies between Macs:
iperf3 shows nearly full 10GbE, but the copy of the 670GB took more than 1h30 (24 minutes to TrueNAS).

iperf3 between macs.png

I set
[default]
signing_required=no
in /etc/nsmb.conf

No difference in copy time, and signing still appears to be ON... Is there any way to get rid of it? Is it only possible in recent versions?

statshares shows signing is still on between the Macs, and SMB is 3.0.2:
Statshares between mac.png


With Truenas : Signing is not mentioned and SMB is 3.1.1

Statshares Mac to Truenas.png


Well... after one week of testing, still no clue what the problem is or how to resolve it.
If anybody at iXsystems has an idea... let me know
 

awasb

Patron
Joined
Jan 11, 2021
Messages
415
If I were you, I'd try sysctl vfs.zfs.arc.meta_prune=0 on the TrueNAS server (won't survive reboots) and use a different client (OS), preferably a Linux smbclient. I still doubt that Apple's implementation is the reference.
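
Before changing that tunable, it may be worth noting its current value so it can be put back by hand. The sketch below is read-only and assumes a FreeBSD-based TrueNAS Core shell; the OID is the one named above, and show_prune_setting is a helper name made up for this example.

```shell
#!/bin/sh
# Assumes a FreeBSD-based TrueNAS Core shell; the OID is the one named above.
OID="vfs.zfs.arc.meta_prune"

# Hypothetical helper: read-only, prints the current value plus the
# commands that would change and restore it. Nothing is modified here.
show_prune_setting() {
    if old=$(sysctl -n "$OID" 2>/dev/null); then
        echo "current: $OID=$old"
        echo "change:  sysctl $OID=0"
        echo "restore: sysctl $OID=$old"
    else
        echo "$OID is not available on this system"
    fi
}

show_prune_setting
```

Since the setting does not survive a reboot anyway, rebooting is the other way back to the default.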
 

Nicolas_Studiokgb

Contributor
Joined
Aug 7, 2020
Messages
130
Hello
I have to find a 10GbE machine to install Linux or Windows on... I only have Mac minis for the moment. Also, my company (audio post) only uses Macs, and that unfortunately won't change anytime soon... :(

What does this command line do?

Thanks
 

awasb

Patron
Joined
Jan 11, 2021
Messages
415
It prevents the (meta)data in ARC from getting pruned.

A 10 year old laptop would be enough to issue a delete via smbclient. 1GbE will be enough as well. It's more about "watching things done differently" and not "done with better/faster hardware".
 

Nicolas_Studiokgb

Contributor
Joined
Aug 7, 2020
Messages
130
Hi

So, I've copied the data (670GB, 55k items) back to the server
and deleted this folder from my Windows 10 laptop (Dell XPS 13, i7/32GB RAM), 1GbE Ethernet
24 minutes to copy
18 minutes to delete

To be noted (I don't know if it makes any difference): I use the same share with Apple extensions for both Mac and PC.

I haven't tried the sysctl command yet.

1679332958743.png
 

awasb

Patron
Joined
Jan 11, 2021
Messages
415
Well … Windows with AntiVirus/Defender etc. and via Explorer is not the best alternative to smbclient. But alas … I'll try myself. Just need a bunch of test files. Maybe the Linux kernel, LaTeX sources and the portage tree.
 

Nicolas_Studiokgb

Contributor
Joined
Aug 7, 2020
Messages
130
The strange thing is that after the delete from the PC, when I wanted to upload the files again from the Mac, the speed was much slower and the copy crashed after 1/3 of the files with an error 1407 (Finder)...

The previous test was done with V12, from yesterday's setup.
I'll go back to V13 and check the sysctl line you talked about.

Regarding Windows, there was no antivirus (except Defender, you're right).

I'll try the delete from an Ubuntu machine we have (quite old, but let's see how it performs).
 

awasb

Patron
Joined
Jan 11, 2021
Messages
415
server side:

Code:
root@truenas # cd /mnt/data/prod/Test
root@truenas /mnt/data/prod/Test # find . -type f | wc -l

  253015
root@truenas /mnt/data/prod/Test # ls -lahR /mnt > /dev/null &
root@truenas /mnt/data/prod/Test #


client side:

Code:
user@devuan # smbclient \\\\truenas\\Test
smb: \Test\> dir
 .
 ..
 linux-6.2.7
 ports
 texlive-20220321-source

             80731918 blocks of size 1024. 60360582 available

smb: \Test\> deltree *


Exited with NT_STATUS_ACCESS_DENIED after 11 min while deleting/trying to delete some remote file. The stats (the deltree command was issued at 19:25:00):

smbclient.png
 

awasb

Patron
Joined
Jan 11, 2021
Messages
415
The same via Finder:

finder.png


Hit the button at 19:42:00 (after unpacking the deleted files again):

finder_dashboard.png

After 7 min, 3,900 files (out of 253,015) are done. When deltree gave the finger salute, more than 50% were done. So ... while not a die-hard objective scientific test, I would still say that Finder adds to SMB's inherent slowness.
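
As a rough illustration of how far apart the two runs are, the sketch below extrapolates each one linearly to the full 253,015 files. This assumes a constant deletion rate, which neither run will hold exactly, and it takes "more than 50%" as exactly half, so the smbclient figure is an upper bound. projected_minutes is a helper name made up for this example.

```shell
#!/bin/sh
# Linear extrapolation: projected total = elapsed minutes * total / done.
# Assumes a constant deletion rate, so treat the results as rough estimates.
projected_minutes() {
    # $1 = total files, $2 = files done so far, $3 = elapsed minutes
    awk -v t="$1" -v d="$2" -v m="$3" 'BEGIN { printf "%.0f\n", m * t / d }'
}

# Finder: 3,900 of 253,015 files after 7 minutes.
echo "Finder, projected total:    $(projected_minutes 253015 3900 7) min"

# smbclient: more than half done after 11 minutes; half is taken here as
# 126,508 files, which makes this an upper bound on the projected total.
echo "smbclient, projected total: $(projected_minutes 253015 126508 11) min"
```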
 