Poor SMB performance & high CPU load

kiwiana

Cadet
Joined
Jul 2, 2023
Messages
8
Hi, I'm a noob to TrueNAS and just set it up. So far I'm pretty disappointed in the performance, but there's a good chance it's a misconfiguration on my side, hopefully.

I did trawl the forum and other websites for info quite a bit and found similar threads, but the solutions haven't helped me so far. At this point it feels like I'm just wasting time because I'm not exactly sure what I'm looking for. Maybe there's something obviously wrong to an experienced eye...

I think my setup is fairly straightforward:

- Supermicro X10SDV-TP8F (XEON D-1518)
- Proxmox 8.0.3 Host (on SSD)
- VM for TrueNAS Scale (latest version) with 4 cores and 32GB RAM
- Broadcom SAS3008 Controller HBA (IT mode), pass-through to VM
- 4x 8TB Seagate IronWolf in a VDEV: 1 x RAIDZ1 | 4 wide | 7.28 TiB

- Main Dataset with Media sub dataset and SMB share
- Compression turned off
- Dedupe off

I'm not running apps or VMs within TrueNAS, it's just storage. My issues are copy and write speeds over SMB (I haven't tried anything else):

a) TrueNAS SMB share mounted on Synology DS214+ (DSM7)
Copying large video files (3+ GB) to TrueNAS starts at the expected 70-100 MB/s but drops to about a third after approx. 2-3GB and stays low, while CPU shoots up to 100%.

b) Copying the same file back from TrueNAS to Synology only does 10-15MB/s from the beginning!

c) Copying PC > Synology via SMB steady 80-100 MB/s

d) Copying Synology > PC via SMB steady 80-100 MB/s

So, it's not the PC and not the Synology I would say, it's something on the TrueNAS system. Even though the Xeon D-1518 is a bit older, it should easily match my old DS214+ Synology which doesn't have any speed issues.

I've attached some screenshots. Help would be much appreciated.
 

Attachments

  • HBA-Controller.png (35.6 KB)
  • ProxmoxVME.png (18.9 KB)
  • TrueNASCPU.png (44.9 KB)
  • TrueNASdashboard.png (302.4 KB)
  • TrueNASdatasets.png (14.9 KB)
  • CopyPC2TrueNASshare.png (10 KB)
  • CopyTrueNASshareToPC.png (6.6 KB)

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Moderator note: moved to an appropriate subforum

I recommend reviewing the forum virtualization guide, and especially focus on the "start on bare metal" advice. Your choice of a relatively underpowered D-1518 platform along with Proxmox, which isn't recommended, means you are fighting several simultaneous battles. It is not only possible to under-resource your NAS VM, but also to over-resource it, and having a bare-metal comparison is very helpful for judging how much CPU your workload should actually consume. Your workload may be swamping the D-1518, especially with features such as compression, which can be taxing on low-end CPUs (the D-1518 is a 2.2GHz part); you're also using virtio for reasons you haven't explained, and your ARC performance as shown looks pretty crappy. Why the heck are you using SCALE, anyways? If you are just doing pure storage, SCALE is a horrible platform to be on. The Linux stuff is a bit dicey, especially its memory interactions with ZFS.
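
If you want to rule out the obvious dataset-level suspects first, it's worth checking from the TrueNAS shell what the share's dataset is actually set to, and whether ARC is getting any hits while a copy runs. A minimal sketch; tank/media is just a placeholder for your own pool/dataset name, and arcstat may be named arcstat.py on some builds:

Code:
# show the properties that matter most for SMB write speed
zfs get compression,sync,recordsize,dedup tank/media
# print ARC hit/miss stats once a second while a copy is running
arcstat 1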

So, it's not the PC and not the Synology I would say, it's something on the TrueNAS system. Even though the Xeon D-1518 is a bit older, it should easily match my old DS214+ Synology which doesn't have any speed issues.

The Synology is not comparable to a ZFS system. I believe the Syno typically does Btrfs or ext4, neither of which is particularly stressy. ZFS does checksumming, compression, and RAIDZ parity computations on the CPU, and Samba is quite CPU-demanding, so it's expected not to perform particularly well on your low-clock-speed CPU. It's why I typically like to encourage people to use something fast, like an E5-1650 v3, which offers a lot more elbow room.
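
If you want to see whether Samba itself is the bottleneck, keep in mind that smbd handles each client connection in one process that is largely single-threaded, so a single transfer can peg one core while the dashboard average still looks moderate. A quick way to watch that on SCALE (Linux) while a copy runs; just a sketch for observation, not a tuning step:

Code:
# show per-thread CPU usage of all smbd processes (Linux/SCALE)
top -H -p "$(pgrep -d, smbd)"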
 

kiwiana

Cadet
Joined
Jul 2, 2023
Messages
8
Thanks for your reply, though it's not too encouraging for running a power-efficient NAS :-(.
Regarding TrueNAS SCALE being a horrible platform for pure storage, do you have other suggestions? Otherwise it's probably back to another Synology DiskStation.

I'll check the links, thanks.
 

LarsR

Guru
Joined
Oct 23, 2020
Messages
719
Instead of SCALE, try CORE. CORE is the more mature, stable, and performance-tuned version of TrueNAS, and it is the base for iX enterprise customers.
SCALE is in its first release cycle and currently has to fight some ZFS-on-Linux limitations regarding ARC (the 50% limit). CORE is the battle-proven veteran for pure storage needs. If you don't need the apps system or the scale-out functionality, give CORE a try.
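
For what it's worth, you can see that cap directly on a SCALE box. A minimal check on Linux/OpenZFS (a value of 0 means the built-in default, which on Linux works out to roughly half of RAM):

Code:
# configured ARC ceiling in bytes; 0 = OpenZFS default (~50% of RAM on Linux)
cat /sys/module/zfs/parameters/zfs_arc_max
# what ARC is actually using right now
grep -E '^(size|c_max)' /proc/spl/kstat/zfs/arcstats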
 

Whattteva

Wizard
Joined
Mar 5, 2013
Messages
1,824
Yeah, I would second @LarsR on this point, especially since you are already virtualizing anyway. I personally don't see the point of running SCALE on top of a VM host, since the whole selling point of SCALE is really the Apps section, and that is rendered moot when you are better off just spinning up another VM that does exactly that.

I myself run CORE on Proxmox (specs in my signature). And for other services, I just have another vanilla FreeBSD VM with a bunch of jails managed by BastilleBSD. SCALE, in my opinion, is horrible for apps & services because of the high idle CPU utilization of the k3s process.
 

kiwiana

Cadet
Joined
Jul 2, 2023
Messages
8
First tests are promising. Same VM hardware and pool settings, but running TrueNAS CORE 13.0-U5.1.

Copying via SMB from Win11 to TrueNAS (3x 3GB files) runs at a continuous 95-98 MB/s, CPU max. 60% (4 cores), RAM max. 33% used.

Copy back from TrueNAS to Win11, continuously ~105 MB/s, less CPU usage.

I'll get my 10GbE adapter for my PC later this week and see how much throughput I get. Is there a benchmark for roughly how much 4x 8TB IronWolf drives can deliver in a RAIDZ (read/write)?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
There's probably no "benchmark"; speeds will vary based on the number of devices you have, and are also highly dependent on CPU (SMB likes core speed, compression affects speed, etc.), and of course ZFS is a copy-on-write filesystem, which throws a very ugly wrench into any attempt to "benchmark" in a consistent manner.
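
That said, if you want a rough ceiling for the pool itself, independent of SMB and the network, a quick-and-dirty local test from the TrueNAS shell gets you in the ballpark. A sketch only, and it assumes compression is off on the dataset (otherwise the zeros compress away and the number is meaningless); /mnt/tank/media stands in for your dataset path:

Code:
# sequential write test: ~40GB of zeros, deliberately larger than the 32GB of RAM
dd if=/dev/zero of=/mnt/tank/media/ddtest.bin bs=1M count=40960 status=progress
# sequential read back; the file exceeds RAM so ARC can't serve it all
dd if=/mnt/tank/media/ddtest.bin of=/dev/null bs=1M status=progress
rm /mnt/tank/media/ddtest.bin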
 

kiwiana

Cadet
Joined
Jul 2, 2023
Messages
8
OK, I'll report back what I get once the NIC is here. I'm still hoping for around 300 MB/s when running a backup on a single client... I have compression turned off because most data is already compressed anyway.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Disabling compression should help a good bit. Even if it cannot compress data, ZFS will try the configured compression to see if a block can be compressed, and this turns out to be wasted effort on already compressed data.
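
For reference, compression is a per-dataset property, so it can stay on elsewhere and be switched off just for the media share. A minimal sketch, with tank/media standing in for the actual dataset; the change only applies to blocks written after it is made:

Code:
zfs get compression tank/media       # show the current setting and where it is inherited from
zfs set compression=off tank/media   # new writes are stored uncompressed
zfs set compression=lz4 tank/media   # ...or switch it back on later for testing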
 

smcclos

Dabbler
Joined
Jan 22, 2021
Messages
43
I think my setup is fairly straightforward:

- Supermicro X10SDV-TP8F (XEON D-1518)
- Proxmox 8.0.3 Host (on SSD)
- VM for TrueNAS Scale (latest version) with 4 cores and 32GB RAM
- Broadcom SAS3008 Controller HBA (IT mode), pass-through to VM
- 4x 8TB Seagate IronWolf in a VDEV: 1 x RAIDZ1 | 4 wide | 7.28 TiB


c) Copying PC > Synology via SMB steady 80-100 MB/s

d) Copying Synology > PC via SMB steady 80-100 MB/s

So, it's not the PC and not the Synology I would say, it's something on the TrueNAS system. Even though the Xeon D-1518 is a bit older, it should easily match my old DS214+ Synology which doesn't have any speed issues.

I've attached some screenshots. Help would be much appreciated.

Looking at your config, don't forget that your TrueNAS VM is connected to the outside world through one network adapter, and that it is going through a physical 1GbE Ethernet adapter.

Try doing some data transfers from another VM on that host and you should see higher speeds.
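
It can also help to test the raw network path separately from the disks, so you know which one you're actually measuring. A sketch with iperf3, assuming it's available on both ends (it's included in TrueNAS; the client may need it installed), and with 192.168.1.10 as a placeholder for the NAS address:

Code:
# on the TrueNAS box (or the VM under test)
iperf3 -s
# on the client, run a 30-second throughput test against the NAS
iperf3 -c 192.168.1.10 -t 30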

I too am experimenting with virtual TrueNAS, and on the disk side here is my physical vs. virtual disk configuration:

Physical: 4 cores | 32GB RAM | 4 x 4TB in 1 VDEV Z1 | 480GB SSD SLOG | 480GB SSD L2ARC
Virtual: 8 cores | 32GB RAM | 6 x 10TB in 2 VDEV Z1

And I am seeing very comparable performance on a 1.2 GB dataset copy. Both are giving me about 8 MB/s when writing from my wireless laptop, and about 25 MB/s when using a virtual machine on the same host as the virtual TrueNAS.

I am not really worried about actual real-world numbers, just whether my virtual is as fast as my physical.
 

smcclos

Dabbler
Joined
Jan 22, 2021
Messages
43
OK, I'll report back what I get once the NIC is here. I'm still hoping for around 300 MB/s when running a backup on a single client... I have compression turned off because most data is already compressed anyway.
I think you will see those numbers. I did an ESXi datastore copy from a local datastore (SATA disk) to both my physical and virtual TrueNAS, and I'm seeing about ~550 MB/s for physical and ~250 MB/s for virtual.

Physical: 4 cores | 32GB RAM | 4 x 4TB in 1 VDEV Z1 | 480GB SSD SLOG | 480GB SSD L2ARC
Virtual: 8 cores | 32GB RAM | 6 x 10TB in 2 VDEV Z1
 

kiwiana

Cadet
Joined
Jul 2, 2023
Messages
8
The NIC is in and running. I copied ~40GB of videos from the NAS to the Windows SSD at an average of 315 MB/s. The other way, writing to the NAS, surprisingly gave me 371 MB/s on average. I don't know why, because my Samsung 990 Pro on Windows should really be quick enough either way. Anyway, I'm happy with the results so far. So, smcclos, you think the VM slows down the transfer rate by 50% (in your case)?
I thought my RAIDZ setup is close to what the HDDs can actually write combined. I'd probably need more drives to saturate the 10GbE network connection.
 

smcclos

Dabbler
Joined
Jan 22, 2021
Messages
43
The NIC is in and running. I copied ~40GB of videos from the NAS to the Windows SSD at an average of 315 MB/s. The other way, writing to the NAS, surprisingly gave me 371 MB/s on average. I don't know why, because my Samsung 990 Pro on Windows should really be quick enough either way. Anyway, I'm happy with the results so far. So, smcclos, you think the VM slows down the transfer rate by 50% (in your case)?
I thought my RAIDZ setup is close to what the HDDs can actually write combined. I'd probably need more drives to saturate the 10GbE network connection.
No, I think the absence of the SLOG is a factor, and possibly the actual HDDs. My physical setup has 4TB Toshibas and the virtual has 10TB HGSTs; I don't have the actual numbers, but those Toshibas were fast.

My ultimate goal is for the virtual to have the same SLOG and L2ARC as the physical. One last test I have not tried, but maybe will: I have another pool of 3x 3TB in my physical server, and I'm curious how that will perform with the same test.
 

Volts

Patron
Joined
May 3, 2021
Messages
210
Disabling compression should help a good bit. Even if it cannot compress data, ZFS will try the configured compression to see if a block can be compressed, and this turns out to be wasted effort on already compressed data.

I'm surprised by this comment. LZ4 has a remarkably low penalty for already-compressed data. I've never seen anything suggesting otherwise for LZ4 or ZFS. Is there some additional interaction with the described config, or with Samba?
 

kiwiana

Cadet
Joined
Jul 2, 2023
Messages
8
I've not tried switching LZ4 back on; I just turned off everything that could impact performance. I might switch it on later for testing, but most of my data is already compressed in some way... (backups, videos, JPGs).
 

Volts

Patron
Joined
May 3, 2021
Messages
210
I'm not encouraging you to enable compression for a backup/truly incompressible dataset.
I'm just curious about the suggestion that disabling compression might be helpful.
I would expect LZ4 to be not-harmful at very worst, and slightly-to-significantly better in almost all cases.
 

kiwiana

Cadet
Joined
Jul 2, 2023
Messages
8
Has anyone tried running a SLOG on a Samsung 980 Pro? I've mostly seen enterprise SSDs being used... (not in my budget currently).
I'm also unsure which size is required for my setup. The docs mention 16GB with overprovisioning.
How about a Samsung 980 PRO 256GB, using 64GB and leaving the rest for overprovisioning?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I'm surprised by this comment. LZ4 has a remarkably low penalty for already-compressed data. I've never seen anything suggesting otherwise for LZ4 or ZFS. Is there some additional interaction with the described config, or with Samba?

Well, then, you have an opportunity to learn something new today. As I said,

Disabling compression should help a good bit. Even if it cannot compress data, ZFS will try the configured compression to see if a block can be compressed, and this turns out to be wasted effort on already compressed data.

ZFS will attempt to compress the block with the configured compression. Even if that is LZ4, it still tries to compress the data. Upon discovering that the block is not usefully compressible (i.e., it doesn't come out shorter in sectors when compressed), ZFS will store the block verbatim. The point is that disabling compression avoids the attempt at compression entirely, which can make a difference especially on high-speed networks, but also usually makes a difference on other heavy-write workloads. The fact that LZ4 is fast does not compare favorably with simply omitting compression. See the following example.

Code:
# /usr/bin/time lz4 < /boot/kernel/kernel > /dev/null
        0.15 real         0.11 user         0.03 sys
# /usr/bin/time cat < /boot/kernel/kernel > /dev/null
        0.06 real         0.00 user         0.06 sys
#


That's a 41MB FreeBSD kernel. LZ4 compression takes an extra 0.09 seconds.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Has anyone tried running a SLOG on a Samsung 980 Pro? I've mostly seen enterprise SSDs being used... (not in my budget currently).
I'm also unsure which size is required for my setup. The docs mention 16GB with overprovisioning.
How about a Samsung 980 PRO 256GB, using 64GB and leaving the rest for overprovisioning?

Do you know what the requirements for a SLOG device are? Don't bother with the 980, you'll just burn it out for no value received.
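
Short version for anyone following along: a SLOG only ever absorbs sync writes, and what it needs is power-loss protection and very low latency on small sync writes, far more than capacity; consumer drives like the 980 have neither the PLP nor the endurance for the job. The ~16GB figure in the docs falls out of rough arithmetic along these lines (a sketch, assuming the default ~5 second transaction group interval):

Code:
# rough SLOG sizing sketch, not a rule
# 10GbE is ~1.25 GB/s of incoming sync writes at most
# ~2 transaction groups in flight x 5 s x 1.25 GB/s = ~12.5 GB, hence "about 16GB"
# anything beyond that is only useful as overprovisioning headroom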

 