10 gigabit network slow speeds

russotron

Cadet
Joined
Nov 4, 2017
Messages
8
Hello everybody,

I have a little problem and need some advice on where to look in order to fix it.

My current setup consists of:
1. PC: Ryzen 3600 + 32 GB RAM + 970 PRO SSD, with the onboard 10 Gbps Aquantia NIC on an X470 Taichi Ultimate
2. MicroServer Gen8 (16 GB, E3-1265L v2) with 2x 1 TB WD Red and 1x 256 GB Samsung 850 Pro. It has an SFP+ Mellanox ConnectX-2 (MNPA19-XTR) with a MikroTik S+RJ10 module

They are connected directly with static IPs (PC >> Cat6 cable >> Cat6 wall socket >> Cat6 cable in the wall >> Cat6a keystone patch panel >> Cat6 patch cable).

[Screenshot: 10gb_1.png]


The problem is that I am not getting anywhere close to full speed when copying files over Samba. You can see some peaks around 8-9 Gbps, but usually it sits around 3.5 Gbps, and I am not sure what is causing it. First I thought it might be a cable/socket/patch panel issue, but I ran iperf and it looks fine:

[Screenshot: 10gb_2.png]
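For reference, a test along those lines can be reproduced with iperf3 on both ends (a sketch only; 10.0.0.2 stands in for the server's static IP, and the screenshot may have been taken with different options):

    # On the FreeNAS side
    iperf3 -s

    # On the Windows PC: one stream for 30 seconds against the server IP (placeholder address)
    iperf3 -c 10.0.0.2 -t 30

    # Four parallel streams, then the reverse direction (server -> client)
    iperf3 -c 10.0.0.2 -t 30 -P 4
    iperf3 -c 10.0.0.2 -t 30 -R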


Then I tried jumbo frames, so I set the MTU to 9014 because that was the setting available on the Aquantia. This only helped iperf a little:

[Screenshot: 10gb_3.png]
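If it's useful, jumbo frames can be verified end to end with a don't-fragment ping. The 9014 value in the Aquantia driver typically includes the 14-byte Ethernet header, so the IP MTU is 9000, and the largest ICMP payload that fits is 9000 - 20 (IP) - 8 (ICMP) = 8972 bytes. A sketch, assuming the Mellanox side is also set to a matching MTU and that 10.0.0.2 / 10.0.0.1 are placeholder addresses:

    rem From the Windows PC: -f = don't fragment, -l = payload size in bytes
    ping -f -l 8972 10.0.0.2

    # From the FreeNAS shell: -D = don't fragment, -s = payload size in bytes
    ping -D -s 8972 10.0.0.1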


So currently I think it might be a Samba problem, but I am open to any suggestions.
BTW, this is my testing server; the "production" one is the same Gen8 but with 4x 3 TB in RAID, and the reason I want to go 10 Gb is that I plan to switch to 4x 1 TB SSDs. I don't need to, but I want to :)

Thanks for your time
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
You're probably not going to get a perfect 10 Gbps out of a single Samba connection. Samba is notoriously single-threaded. Try several copies in parallel, preferably from several different PCs.
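If it helps, one way to approximate parallel SMB traffic from a single Windows client is robocopy's multithreaded mode (a sketch; the paths are placeholders, and note that all of these threads still go through the same smbd process on the server, so copies from separate machines remain the better test):

    rem 8 parallel file-copy threads to the NAS (share path is hypothetical)
    robocopy C:\testdata \\freenas\share\testdata /E /MT:8

    rem Same thing in the other direction to exercise reads
    robocopy \\freenas\share\testdata C:\testdata_back /E /MT:8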

There will also be a point beyond which the NAS unit simply can't go any faster. A ZFS system with only 16GB of RAM on a MicroServer with an E3-1265L v2 is at a significant handicap. I'm pretty sure the 1265L will only turbo to 3.5GHz if a single core is active, and since both Samba and ZFS are using cores, you are probably effectively running on a 2.5GHz CPU that goes a little faster sometimes. This will definitely hurt Samba performance.

For best Samba performance, you really need high-clock-speed CPUs. It isn't clear that this is actually what's hurting you (you could watch the output of top(1) while running tests to get a better sense), but it definitely isn't helping.
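If you want to try that, a rough sketch with stock FreeBSD top flags (nothing here is specific to this particular box) would be:

    # -S: include system processes, -H: show individual threads,
    # -P: per-CPU states, -s 1: refresh every second
    top -SHP -s 1

    # While a copy is running, watch the smbd entries: a single smbd
    # pinned near 100% of one core points at a per-connection CPU limit.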
 

Kcaj

Contributor
Joined
Jan 2, 2020
Messages
100
Is it possible your RAM / L2ARC is saturated and the physical disks are too slow?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
And maybe I got fixated on the wrong thing there. What's your pool? Two drives in a mirror? If you're writing to that pool, the fastest you're ever going to get is maybe about 150-200 MBytes/sec (roughly 1.2-1.6 Gbps) sustained, because that's all the drives can sustain.
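For what it's worth, the layout and the live per-device throughput can be checked from the shell with the standard zpool tools ("tank" below is a placeholder for the actual pool name):

    # Show the vdev layout (mirror vs. striped single disks vs. raidz)
    zpool status tank

    # Capacity and layout summary per vdev
    zpool list -v tank

    # Live per-device bandwidth while a copy runs, refreshed every second
    zpool iostat -v tank 1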
 

russotron

Cadet
Joined
Nov 4, 2017
Messages
8
I was very surprised to see 90-95% RAM usage after I started the copy.

Regarding the pools: it's just a simple single-disk SSD pool, and the SSD should be able to sustain 500 MB/s; it is a PRO drive, not an EVO with all that turbo-cache magic. The 2x 1 TB HDDs are in RAID; I will check whether it's RAID0 or whether I made a mistake and used RAID1.

If Samba is the problem here, is there another way, something else I could use instead? I need the data to be visible from Win10 and from Emby/Plex/Transmission/Sonarr etc.
 

russotron

Cadet
Joined
Nov 4, 2017
Messages
8
BTW, in the other Gen8 I have an E3-1270 v2, whose clock speeds are a little higher, so I can try with that one and post results here as well.
 

Kcaj

Contributor
Joined
Jan 2, 2020
Messages
100
I was very surprised to see 90-95% RAM usage after I started the copy.

You might need to read up on how ZFS works; start here. From the link:
ZFS actively leverages system RAM to improve performance....

...the SSD should be able to sustain 500 MB/s...

Run some benchmarks to see what your SSD is actually capable of.
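For instance, a quick local check from the shell might look like this sketch (device name and dataset names are placeholders; the dd numbers only mean something if compression is off on the target dataset, since /dev/zero compresses away to nothing):

    # Raw sequential read rate of the SSD itself
    diskinfo -tv /dev/ada0

    # Rough sequential write test against the pool
    zfs set compression=off ssdpool/bench
    dd if=/dev/zero of=/mnt/ssdpool/bench/test.bin bs=1M count=8192

    # Rough sequential read test; use a file larger than RAM (or reboot first)
    # so it isn't served straight back out of ARC
    dd if=/mnt/ssdpool/bench/test.bin of=/dev/null bs=1M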

...RAID0 or whether I made a mistake and used RAID1...

I hope you have put your RAID controller into AHCI mode, allowing ZFS direct access to the disks.
Then you should be talking about RAIDZ, mirrors, or single disks.
 

russotron

Cadet
Joined
Nov 4, 2017
Messages
8
I found that my 2x 1 TB HDDs were mirrored and not striped, so I changed that, and the write speeds increased to 270-300 MB/s. Strangely enough, the read speed from that share was between 150 and 270 MB/s, usually around 210.

For the reads and writes to the SSD, I am starting to think it is not on SATA 6Gb but only SATA 3Gb, which maxes out right around 270 MB/s.
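One way to confirm the negotiated link speed from the shell is to look at the kernel's attach messages for the drive (ada0 is a placeholder for whatever device the 850 Pro shows up as):

    # The attach line shows the negotiated rate, e.g.
    # "300.000MB/s transfers (SATA 2.x ...)" vs. "600.000MB/s transfers (SATA 3.x ...)"
    dmesg | grep -i ada0

    # camcontrol also reports what the drive and port negotiated/support
    camcontrol identify ada0 | grep -i sata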
 

russotron

Cadet
Joined
Nov 4, 2017
Messages
8
Oh, and watching top: with just one copy running, Samba was at 35-45%; with two copies (one to the SSD and one to the HDD) it was at 65% and sometimes jumped to 125%. At that point, I think, the speeds slowed down to basically zero.
 

russotron

Cadet
Joined
Nov 4, 2017
Messages
8
[Screenshot: sataIII.png]


Very strange that the WD drives are on SATA III, while the only drive that could really benefit from SATA III is on SATA II.
 

russotron

Cadet
Joined
Nov 4, 2017
Messages
8
I checked my other server with 4x 3 TB HDDs and it was the same, so it looks like slots 1 and 2 are SATA III and slots 3 and 4 are SATA II. I switched the disks around, and now I am getting write speeds to the SSD stable at 500 MB/s, but reads fluctuate around ~450 MB/s.

I also tried copying at the same time from the PC to the server HDD and from the server SSD to the PC:
[Screenshot: switched_slots.png]


So I would say I am finally quite satisfied with the results. I will have to think about the switch to 4x 1 TB SSDs.

Thank you for your suggestions; I will check the ZFS PDF mentioned above.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
In order to take the most advantage of multiple drives for speed, you need concurrent operations or a really aggressive read-ahead strategy. ZFS does not "stripe" between vdevs because it's designed for concurrent operations on large servers, not a single user.

Read speeds of around 200MBytes/sec for your current configuration are fine, everything's working as it should.

As far as I can remember, read speeds from FreeNAS are always noticeably slower than write speeds, at least with SMB, which is what I've always used.

This should generally be true for any data not in ARC. Writes are happening to RAM. Reads have to go out to the pool to be read.

https://www.ixsystems.com/community/threads/performance-issues.73411/#post-565532
 
Joined
May 10, 2017
Messages
838
This should generally be true for any data not in ARC. Writes are happening to RAM. Reads have to go out to the pool to be read.

No, I'm talking about sustained speed for large writes, much more than can fit in RAM. For example, my current FreeNAS pool (using older disks) can sustain writes of around 350 MB/s, while reads are around 200 MB/s.

In the past I've used many different pools, including SSD-only pools, on very different hardware, and it was always the same. These numbers are from memory so they might not be 100% accurate, but they will be close:

8 HDD RAIDZ2 pool - writes around 600MB/s, reads around 400MB/s
8 SSD RAIDZ2 pool - writes around 900MB/s, reads around 600MB/s
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
No, I'm talking about sustained speed for large writes, much more than can fit in RAM. For example, my current FreeNAS pool (using older disks) can sustain writes of around 350 MB/s, while reads are around 200 MB/s.

In the past I've used many different pools, including SSD-only pools, and it was always the same. These numbers are from memory so they might not be 100% accurate, but they will be close:

8 HDD RAIDZ2 pool - writes around 600MB/s, reads around 400MB/s
8 SSD RAIDZ2 pool - writes around 900MB/s, reads around 600MB/s

I included a link. Please review the link. I'm *also* talking about large writes. The speed comes from writing to RAM. This allows ZFS to stage the writes as fast as it can before pushing them out to the pool at what is hopefully the maximum speed possible for the pool. For reads, if the data isn't already in ARC, then ZFS has to pull from the pool, which causes a delay.
 
Joined
May 10, 2017
Messages
838
I'm sorry, but that doesn't make sense to me. Large writes should be limited by device speed: FreeNAS can't sustain writes faster than RAM can be emptied to the actual devices, no matter how much staging is done. Large reads should be limited by device read speed, and device read speed is usually similar to write speed (it can even be faster for SSDs), so I would expect similar sustained speeds for reads and writes, as I get when using other NAS OSes.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Ok. Fine.

So you're writing a terabyte to your NAS that has 16GB of RAM. By default, 1/8th RAM is used for transaction groups. Now this is actually complicated by the write throttle and newer developments, but just for the sake of explanation, we're going to say that ZFS has 2GB of RAM available to it for the purposes we are discussing. We're also going to ignore one txg state, quiescing, which isn't hugely relevant here. This leaves us with up to 1GB for the open txg and 1GB for the syncing txg.

You write 1GB into the open txg. This happens at full RAM speed, as long as ZFS is able to allocate space, which means you need that free space metadata available in the ARC. But basically as long as ZFS can figure out where it intends to put the data in the pool, you are in a very tight loop of "read-data-from-client"->"write-data-to-RAM" until the open transaction group is full.

Now, as long as there isn't already a syncing transaction group in progress, this open transaction group switches to "syncing" status. Now ZFS has a list of things to write to the pool, and it is a large list. It can be very aggressive about it, and write to all component devices simultaneously. So if you have eight component devices that can each handle 150MBytes/sec, ZFS could conceivably be writing out to the pool at 1200MBytes/sec, assuming all the writes are for contiguous LBAs.

Now here's the thing. WHILE THAT IS HAPPENING, ZFS has opened a new txg and is continuing to accept data from the client at full RAM speed into the new open-state txg.

Default time period for a txg is five seconds, and a 10Gbps link can be dumping 1250MBytes/sec at you, so you could potentially gather up to 6250MBytes within a 5 second txg group. So if your pool can maintain that, you get massive speed. If your pool cannot maintain that, ZFS locks out an open txg that wants to transition to syncing, until the currently syncing txg is completed.

So your misunderstanding here is that you need to recognize that in the case of writes, all your disks can get hit simultaneously because ZFS has cached up a bunch of traffic. It isn't writing a hundred blocks to HDD 0, finishing, then moving on to a hundred blocks to HDD 1, finishing, then writing a hundred blocks to HDD 2, then HDD 3, etc. If it was, then you'd be right, it'd be limited to the speed of a single device. But it's flooding all of them simultaneously. It has a massive crapton of data in the transaction group and no reason not to. So you're limited to device speed TIMES the number of devices. And that's the theoretical limit. In practice it isn't quite that good.

Once you get THAT, then you see that reads are a problem. If ZFS has to go out to the pool to retrieve the data, there's no way reads from the pool can compete with the speed of writes to RAM. ZFS can (and does) try to analyze what's going on to see if it should be doing read-ahead, but it doesn't have a crystal ball.
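For anyone who wants to poke at the numbers behind that, the relevant knobs are visible from the shell (read-only here; exact defaults vary by release, so treat the output as illustrative):

    # How long an open txg gathers writes before syncing (seconds)
    sysctl vfs.zfs.txg.timeout

    # Cap on dirty (not-yet-synced) data held in RAM
    sysctl vfs.zfs.dirty_data_max

    # Back-of-the-envelope: 10Gbps is about 1250 MBytes/sec, so a 5-second
    # txg window can gather up to roughly 6250 MBytes if RAM and the pool keep up.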
 
Joined
May 10, 2017
Messages
838
So you're limited to device speed TIMES the number of devices. And that's the theoretical limit. In practice it isn't quite that good.

Yes, I understand that. I should have said writes are limited by the total device speed of the pool, so once RAM needs to be flushed, speed is limited by total pool write speed; but so should reads, since reads are also done from all devices in the pool. At the time, I tested the same pool with ZFS and btrfs, and while the write speed for both was about the same, limited by the devices used in the pool, the read speed with btrfs was very close to the write speed (a little faster, IIRC), while it was noticeably slower with ZFS, on the same hardware, with the same pool config (or as similar as you can get, e.g. raidz1 vs RAID5), and with no RAM cache involved since the transfer was much larger than server RAM.

I'll add that this is more likely a ZFS thing, not FreeNAS, and it will likely depend on pool width/config. Someday, when I have some time, I'll do some tests with various pool sizes and configurations. Sorry @russotron for the hijack.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
No, you're still not getting it.

For the writes, there's a GIGABYTE of fully concurrent data sitting there waiting to be written. ZFS can write all of that in parallel to all the drives ALL AT ONCE.

When you are READING, this isn't the case. You ask ZFS for some data. ZFS either has it in cache (yay, speed of RAM) or it doesn't. Because most of the time it doesn't, it then has to go to a disk and read it. Because ZFS does not read ahead an infinite amount, it only reads a certain amount, and during that read the client's request is stalled. This might well only be something on the order of a megabyte. So you read a megabyte and return it to the client, and when you get to the end of that transmit-to-client and the client wants more, another read call is issued, which sends the system back out to disk; repeat. This obviously has a lot more latency in it.
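A quick way to see the read side scale with concurrency is to run several independent sequential reads at once (a sketch; the file names are placeholders and should be large, distinct files that aren't already sitting in ARC):

    # Two sequential reads in parallel; combined throughput should beat one stream
    dd if=/mnt/tank/bigfile1 of=/dev/null bs=1M &
    dd if=/mnt/tank/bigfile2 of=/dev/null bs=1M &
    wait

    # Single stream for comparison
    dd if=/mnt/tank/bigfile3 of=/dev/null bs=1M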
 