SMBD process goes to 100% and copy speed collapses: ARC demand metadata request issue?

Nicolas_Studiokgb

Contributor
Joined
Aug 7, 2020
Messages
130
Hello
I'm following up with my test setup for TrueNAS 13 U4

Server:
Supermicro X10DRH-CLN4 with LSI 3008 SAS 12Gb/s adapter in IT mode
Dual E5-2620 v4
RAM: 256GB
System on mirrored SATA SSDs (mirrored pool in TrueNAS)
Storage: 12 x 18TB Ultrastar SAS 12Gb/s 7200rpm HDD
Test pool: stripe of 12 HDDs (I wanted the fastest possible pool to test performance; I'll move to RAID-Z later). Record size 128K (default?)
Network: 10Gb Ethernet
SMB: Apple SMB2/3 protocol extensions enabled

Client:
Mac mini i7, 6 cores
RAM: 64GB
10Gb Ethernet
OS: macOS Monterey 12.6.3
Direct client-to-server 10Gb Ethernet connection, default MTU 1500

SMB 3.1.1 negotiated
strict sync = no

I have a big issue copying large folders of files (180GB for 8300 items, 99% .wav sound files).
Normally this copy takes around 8 minutes, but for an unknown reason it randomly takes around 27 minutes to complete.

The copy starts normally, and after approx. 100GB the speed collapses to around 100mbs and the smbd processes go to 100% on random CPU cores for 20 minutes, before everything goes back to normal and the copy finishes. The server fans ramp up accordingly. ARC demand requests seem overloaded, but why? The disks are mostly idle.

It's a simple macOS Finder copy (drag and drop from one window to the other). I do nothing on the server and nothing on the Mac.
I'm really concerned about this one, as it's a random issue that does not appear every time I copy the same set of files (I would say it appears 1 time out of 5).

Here are the screenshots.

Capture d’écran 2023-03-15 200745.png
Capture d’écran 2023-03-15 200821.png
Capture d’écran 2023-03-15 201628.png


Capture d’écran 2023-03-15 200845.png
Capture d’écran 2023-03-15 200926.png

Capture d’écran 2023-03-15 203001.png


Capture d’écran 2023-03-15 203211.png


Capture d’écran 2023-03-15 203133.png


Capture d’écran 2023-03-15 203243.png


Capture d’écran 2023-03-15 203401.png

Capture d’écran 2023-03-15 203422.png


Capture d’écran 2023-03-15 203438.png


Thanks
Nicolas
 

Nicolas_Studiokgb

Contributor
Joined
Aug 7, 2020
Messages
130
Any ideas?
Is that the same issue?

Is there any solution for the macOS Finder?

Moving such folders is a very common scenario for our users, and we never had this kind of issue on our current Windows server, either with Acronis File Connect (ExtremeZ-IP) or over SMB.
 

Nicolas_Studiokgb

Contributor
Joined
Aug 7, 2020
Messages
130
Hello
Another batch of tests today, and it's not getting better. I'm beginning to think we have a bug here... Any ideas?

The test is four folder copies, one after the other, on the same setup as before (see signature), from the Mac mini to the TrueNAS SMB share.
The dataset is clean and newly created today.
1- 180GB folder with audio files in a varied subfolder hierarchy (8300 items, 99% .wav sound files)
2- 107GB folder with one subfolder and 16150 .wav audio files
3- 100GB folder with 4 QuickTime files, 25GB each
4- A second copy of the same 180GB folder from step 1, into a different folder on the server (so nothing erased, no file replaced)

It is the same reproducible issue as yesterday: the more you copy, the worse it gets.
I can't really explain the speed difference between the first two copies, as both are a bunch of audio files. But the bigger problem is that the first 180GB copy takes about 5 minutes and the second 180GB copy (same source files) takes 24 minutes, with smbd processes going to 100% on random cores and ARC demand metadata requests going sky high. The HDDs (12 x 7200rpm SAS 12Gb/s stripe pool) seem to be completely idle during the last part.

Capture d’écran 2023-03-16 200938.png


Capture d’écran 2023-03-16 201050.png



Capture d’écran 2023-03-16 201140.png


Please help :'(
 

Nicolas_Studiokgb

Contributor
Joined
Aug 7, 2020
Messages
130
Hello Guys :)

So today, with no change in configuration, I started a 1.2 TB copy from the Mac mini to the server (the same folders of files as usual, added together) and there was no issue...
ARC demand_metadata requests didn't exceed 20M, instead of 60M yesterday...

Why? I have no idea...
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
It looks like the demand_metadata requests are getting hit (your purple "miss" line remains pegged at near-zero) so it's not likely to be a case of metadata being pushed out of ARC and hammering your disks with random I/O.
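
To put a number on the hit/miss picture rather than reading it off the graph, a rough sketch like the following could be used. The helper name `hit_ratio` is made up for illustration, and the sysctl counter names in the comment are an assumption about what TrueNAS Core 13 exposes; check them on your system before relying on them.

```shell
#!/bin/sh
# hit_ratio HITS MISSES -> percentage of requests that were served from ARC.
# Hypothetical helper: the counter values are supplied by the caller.
hit_ratio() {
    awk -v h="$1" -v m="$2" 'BEGIN {
        total = h + m
        if (total == 0) { print "n/a"; exit }
        printf "%.2f\n", 100 * h / total
    }'
}

# Assumed usage on the TrueNAS Core shell (counter names not verified here):
#   hits=$(sysctl -n kstat.zfs.misc.arcstats.demand_metadata_hits)
#   misses=$(sysctl -n kstat.zfs.misc.arcstats.demand_metadata_misses)
#   hit_ratio "$hits" "$misses"
```

A ratio close to 100 during the slow phase would be consistent with the graph: the requests are being answered from RAM, not from the disks.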

Does this also happen if you copy using Terminal and the "cp" command? Notably - try copying multiple times, without opening Finder or ls'ing the directory (avoid macOS trying to enumerate the metadata) and see if it falls over again/eventually.
 

Nicolas_Studiokgb

Contributor
Joined
Aug 7, 2020
Messages
130
Hello

Thanks for your reply
I'll run a test using cp and let you know.
But isn't it strange that for the same set of files there is such a difference in metadata requests (60M vs 20M)?
Thanks
 

Nicolas_Studiokgb

Contributor
Joined
Aug 7, 2020
Messages
130
I'm trying to get the cp command to work... I only get "No such file or directory", even though I copied and pasted the source and destination pathnames.
Do you know how to specify the path of a network share on macOS? If I copy the path from "Get Info", it doesn't work.
Thanks
 

Nicolas_Studiokgb

Contributor
Joined
Aug 7, 2020
Messages
130
I've tried this command
cp -R '/Users/nicolas/Documents/Test Data/FILMFILES' 'smb://truenas._smb._tcp.local/FILM3/folder/'
with and without the quotes ', and also with truenas.local as the server name.
Sorry, I'm not a command-line expert.
 

Nicolas_Studiokgb

Contributor
Joined
Aug 7, 2020
Messages
130
Found it ;)
The destination is /Volumes/sharename
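
For anyone repeating the test, here is a minimal sketch of how several timed copies could be run back to back from Terminal. The function name `repeat_copy` is made up for illustration, and the example paths (including the assumption that the share is mounted as /Volumes/FILM3) are guesses based on the command above, not a verified setup.

```shell
#!/bin/sh
# repeat_copy SRC DEST_PARENT RUNS
# Copies SRC into DEST_PARENT/run1, run2, ... and prints the seconds per run.
# Hypothetical helper: every run goes to a fresh folder, nothing is replaced.
repeat_copy() {
    src=$1; dest=$2; runs=$3
    i=1
    while [ "$i" -le "$runs" ]; do
        mkdir -p "$dest/run$i" || return 1
        start=$(date +%s)
        cp -R "$src" "$dest/run$i/" || return 1
        end=$(date +%s)
        echo "run $i: $((end - start)) s"
        i=$((i + 1))
    done
}

# Example (paths are assumptions, adjust to your share name):
#   repeat_copy '/Users/nicolas/Documents/Test Data/FILMFILES' /Volumes/FILM3/folder 3
```

Printing the duration of each run makes it easy to see at which repetition the copy starts to slow down.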
 

Nicolas_Studiokgb

Contributor
Joined
Aug 7, 2020
Messages
130
So I've done a copy via Terminal with cp.
Here are the reporting screenshots. It seems to be the same behaviour as with Finder.
The first copy of this folder is always around 15M/20M max.
I'm launching more copies to see what happens when the ZFS cache completely fills the RAM (around 228GB).
1679081496405.png


1679081536403.png
 

Nicolas_Studiokgb

Contributor
Joined
Aug 7, 2020
Messages
130
Ram usage during copy
1679082219164.png
 

Nicolas_Studiokgb

Contributor
Joined
Aug 7, 2020
Messages
130
For an unknown reason, the 10GbE connection between the Mac mini and the server dropped in the middle of a 670GB copy (after approx. 275GB). The server share got disconnected and the copy stopped (timeout messages in the terminal).

So now I am deleting all the files of this aborted copy from the server (approx. 31000 files for 275GB) via the macOS Finder: select the folder and delete.
As soon as I click "delete", ARC demand metadata jumps to around 45M and the smbd processes hit 100% on random CPU cores for the whole delete process, which lasts for ages (as when ARC hit 60M during the copy yesterday).
The "Services" memory goes up from 18GB to around 197GB (!!!) by the end of the delete process.
And the most unbelievable part:
The copy of the approx. 31000 files / 275GB took 11 minutes.
The delete of the same files took 30 minutes. Nearly 3 times longer to delete than to copy...
There is almost no network traffic during the delete.
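
As a rough worked calculation of what these figures mean per second (a sketch only, using the approximate numbers above and decimal units; the helper names are made up for illustration):

```shell
#!/bin/sh
# mb_per_s GIGABYTES MINUTES -> average throughput in MB/s (1 GB = 1000 MB)
mb_per_s() {
    awk -v gb="$1" -v mins="$2" 'BEGIN { printf "%.0f\n", gb * 1000 / (mins * 60) }'
}

# files_per_s FILES MINUTES -> average number of files handled per second
files_per_s() {
    awk -v n="$1" -v mins="$2" 'BEGIN { printf "%.1f\n", n / (mins * 60) }'
}

mb_per_s 275 11        # copy throughput, prints 417
files_per_s 31000 11   # files copied per second, prints 47.0
files_per_s 31000 30   # files deleted per second, prints 17.2
```

So the delete handles roughly 17 files per second while the copy managed about 47 files per second, even though the delete moves almost no data.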

1679084825630.png



1679085283758.png


Let me know
 

awasb

Patron
Joined
Jan 11, 2021
Messages
415
What does iperf show?
 

Nicolas_Studiokgb

Contributor
Joined
Aug 7, 2020
Messages
130
What does iperf show?
Between the Mac mini and the server?
I haven't tried (I don't know how to use it, but I may give it a try).
If I copy 100GB of large files (QuickTime movies, 20GB each), it goes at 9.5Gb/s.
Blackmagic speed test shows around 1000MB/s read and write, so approximately full 10GbE speed.
 

Nicolas_Studiokgb

Contributor
Joined
Aug 7, 2020
Messages
130
I can't get iperf installed on the Mac... nightmare
 

Nicolas_Studiokgb

Contributor
Joined
Aug 7, 2020
Messages
130
Got it...
------------------------------------------------------------
Client connecting to 10.77.0.1, TCP port 5001
TCP window size: 129 KByte (default)
------------------------------------------------------------
[ 4] local 10.77.0.2 port 49272 connected with 10.77.0.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.0 sec 11.0 GBytes 9.41 Gbits/sec
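
To compare this with the Blackmagic figure, the iperf result can be converted to megabytes per second. This is only a sketch: the helper name is made up, and the invocation in the comments assumes iperf 2 (which matches the port 5001 shown in the output).

```shell
#!/bin/sh
# Assumed iperf 2 invocation that gives output like the above:
#   on the server:  iperf -s
#   on the Mac:     iperf -c 10.77.0.1

# gbit_to_mbyte GBITS_PER_S -> MB/s (decimal units, 8 bits per byte)
gbit_to_mbyte() {
    awk -v g="$1" 'BEGIN { printf "%.0f\n", g * 1000 / 8 }'
}

gbit_to_mbyte 9.41   # prints 1176
```

That is about 1176 MB/s of raw TCP throughput, so the network link itself does not look like the bottleneck.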
 

awasb

Patron
Joined
Jan 11, 2021
Messages
415
Even if it is quite unlikely: does the system also "stall" if you delete the folder with the 8300 files locally on the server? (E.g. via ssh rm -fr $BIGFOLDER ... but be careful with rm -fr: one typo will be enough to erase a lot more than you had intended to.)
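
A slightly more defensive way to run that local delete could look like the sketch below. The function name `timed_rm` and the example path are hypothetical; the guard only catches the most obvious mistakes and is not a substitute for double-checking the path.

```shell
#!/bin/sh
# timed_rm DIR: delete DIR recursively and print how long it took.
# Hypothetical helper: refuses empty or obviously dangerous arguments.
timed_rm() {
    dir=$1
    case "$dir" in
        ""|"/"|"."|"..") echo "refusing to delete '$dir'" >&2; return 1 ;;
    esac
    [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
    start=$(date +%s)
    rm -rf -- "$dir" || return 1
    end=$(date +%s)
    echo "deleted $dir in $((end - start)) s"
}

# Example (the dataset path is a placeholder, not taken from this thread):
#   timed_rm /mnt/POOLNAME/DATASET/BIGFOLDER
```

If the local delete finishes in seconds while the Finder delete takes 30 minutes, the pool itself is probably not the slow part.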

Apart from that:

It looks to me as if your system could benefit from a decent SLOG SSD (Optane). But before buying that, I would take a look at the smbd log.
 

Nicolas_Studiokgb

Contributor
Joined
Aug 7, 2020
Messages
130
Hello, thanks for your reply
I'll try to do it locally (I have to figure out how it works).
And I'll try to find the SMB log.
 

Nicolas_Studiokgb

Contributor
Joined
Aug 7, 2020
Messages
130
Regarding the SLOG, I was wondering whether 256GB of RAM would be enough instead of a SLOG, but I don't know.
 

awasb

Patron
Joined
Jan 11, 2021
Messages
415
Not with Macs as clients. Believe me, I went through this. It changed everything (with rsync-backups via NFS plus TimeMachine-Backups, since both required sync-writes). But it's too early for that. First you need to find out, whether your hardware is functioning at the "system level" before you optimize applications.

Local Access for starters (the output and terminal width is far from being nice, though):

TNC_shell.png


Follow the smbd log on the TrueNAS Core server:

Code:
tail -f /var/log/samba4/log.smbd


With the terminal window open and this command running, start a file copy via Finder to the SMB share, or start deleting. To view the entire log:

Code:
less /var/log/samba4/log.smbd
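
If the log is long, it may help to look only at what was written during the test. A minimal sketch, with made-up helper names, that remembers the log length before the test and prints only the newer lines afterwards (it assumes the log is not rotated in between):

```shell
#!/bin/sh
# log_mark FILE: print the current number of lines in FILE.
log_mark() {
    wc -l < "$1" | tr -d ' '
}

# log_since FILE MARK: print only the lines added after MARK.
log_since() {
    tail -n "+$(( $2 + 1 ))" "$1"
}

# Example on the TrueNAS Core shell:
#   mark=$(log_mark /var/log/samba4/log.smbd)
#   ... run the Finder copy or delete ...
#   log_since /var/log/samba4/log.smbd "$mark"
```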
 