Yet Another ZFS Tuning Thread


cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Just as an example of ZFS's caching:

Someone in the forum was complaining because they had set their hard drives to go to sleep after 30 minutes. They use their server to stream movies, and partway through a movie playback would freeze for 30 seconds or so because the file had only been cached up to that point and the drives had gone to sleep due to inactivity.

I can't remember the command, but if you watch the read rates from your zpool when you start reading a very large file, you'll see 200MB/sec+ for the first few seconds. That's ZFS reading ahead on the file.
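If you want to watch it yourself, something like this should do it (the pool name here is just a placeholder):
Code:
# Report per-second read/write throughput for the pool while the file is read
zpool iostat tank 1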

Edit: Sorry about sounding aggressive in my previous posts. Your complaint about slow network performance is EXACTLY what we senior members complain about all the time. People don't read, or they ignore, the manual, the FAQ, and my guide, and then they come posting in the forum about their low performance. Well gee, if we say in as many places as we can that you need something for maximum performance and you ignore it, what else should we do? Tell you AGAIN? If you haven't listened the other 3 times, why would we bother spending time to tell you a 4th time? Yes, some people STILL insist that Realtek cards are great, etc. One of the forum admins posted a link to a Realtek quad-port NIC and said "Why.. WHY would someone do this to themselves?"

Honestly, if I had known you were complaining about network performance in post #1, I might never have responded, because you should have known better. You only got additional responses from me because I was already discussing it with you and I didn't feel like giving you an "RTFM" answer and leaving you high and dry.
 

TheSmoker

Patron
Joined
Sep 19, 2012
Messages
225
Thanks for taking the time. That's something I am always grateful for, since time is something you don't get back.

Managed to create the memory disk:
Code:
/dev/md3a      19G    8.0k    18G     0%    /mnt/tank3

The problem is that I cannot point the Samba share at it. The error that I get from the GUI is "The path must reside within a volume mount point".

Also, for the record, I do not want to increase the RTL performance; I merely want that crappy RTL performance to be matched by the disk/ZFS subsystem. That's all. I am very aware that the RTL network card cannot do more, I just want to get as close as possible to the performance I get from it in iperf, minus the overhead, of course.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Well, your disk can already do over 200MB/sec. That's not possible with Gb LAN. The closest you're going to get is an Intel NIC. You should be able to get 100MB/sec+ with Intel NICs.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Can you create a symbolic link to /mnt/whateveryourzfsvolis/RAM or something like that?
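Something like this is what I had in mind (the paths and md unit are just placeholders; no idea offhand whether the GUI is happy with a symlinked path):
Code:
# Create and mount the RAM disk outside the pool, then link it into the
# dataset that's already shared via CIFS
mdmfs -s 16384m md5 /mnt/ramdisk
ln -s /mnt/ramdisk /mnt/tank/RAM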

Also, can you post the commands you used? I'd like to try this on my 2 FreeNAS servers ;) Maybe we can do some interesting data gathering.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
There's NO way it's filling 10G of RAM with a write cache and then flushing it to disk. In fact, if the write cache were over 1GB I'd be shocked. There's a calculation somewhere in the forum that will tell you roughly what your write cache size is. I want to say it was 6x the estimated performance of your zpool, but it also had a time limit.

CAUTION: This is largely what bug #1531 is all about. Prepare to be shocked. He said his system had 32GB. By default, vfs.zfs.write_limit_max is sized to 1/8th of the system's memory ... that's 4GB. That's no guarantee that 4GB will be used, because ZFS tries to feel its way toward the right limit, but on an idle system where writes suddenly start, it'll probably shoot out to 4GB and then sit in txg flush for quite some time. I had it stalling for something like 3 minutes at a time.

You don't want to be making this buffer larger than what can be flushed out to disk in a small number of seconds. Or it HURTS.
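Roughly, for the curious, the check and the arithmetic look like this (the sysctl is the one named above; exact behavior varies by release):
Code:
# Show the current transaction-group write limit, in bytes
sysctl vfs.zfs.write_limit_max
# Rule of thumb from above: default is RAM/8, so 32GB of RAM gives
#   34359738368 / 8 = 4294967296 bytes (4GB)
# If the pool flushes at ~250MB/sec, a 4GB txg needs ~16 seconds to sync,
# which is where the multi-second (or worse) stalls come from.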
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Yeah, I think that's what he's trying to do, hoping that he'll see a performance increase from it. But the real problem is his Realtek NIC. His zpool can smoke Gb speeds, so the limitation is almost certainly his LAN card. This is another poster child of someone expecting more from Realtek.
 

TheSmoker

Patron
Joined
Sep 19, 2012
Messages
225
Managed to create the ramdisk and shared it via CIFS. Prepare to be amazed:
Writes: 85MBps
Reads: 86MBps
[attached screenshot: test read-write.jpg]

The first one is writing to FreeNAS, the second is reading from FreeNAS as part of the verify process. Those numbers come quite close to the abysmal performance of the RTL network card & drivers. Can I manage to match it with ZFS reads/writes?

Where is the bad juju? :)
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Managed to create the ramdisk and shared it via CIFS. Prepare to be amazed:
Writes: 85MBps
Reads: 86MBps

Where is the bad juju? :)

Your network card. Can you provide the commands you used? I have a setup I can test this on right now with 2 Intel NICs. :D

Edit: If you do a DD of the RAM drive what do you get? LOL

The speeds you got are pretty much exactly what I thought you'd get. I get better speeds with my zpool on a server with 12GB of RAM. :)
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Just as an example of ZFS's caching:

Someone in the forum was complaining because they had set their hard drives to go to sleep after 30 minutes. They use their server to stream movies, and partway through a movie playback would freeze for 30 seconds or so because the file had only been cached up to that point and the drives had gone to sleep due to inactivity.

I can't remember the command, but if you watch the read rates from your zpool when you start reading a very large file, you'll see 200MB/sec+ for the first few seconds. That's ZFS reading ahead on the file.

I'm not aware of ANYTHING in ZFS that will do what you describe, at least not without an outside push. The DMU read-ahead is limited to 256 blocks. Something has to be asking ZFS for the data.

My guess is that what you're seeing is a userland app that has opened the file, and has maybe done a MADV_WILLNEED on a large chunk of the file, teasing the VM system into paging in a large hunk of data. That would be hard to differentiate from ZFS doing it, and it would have the result you mention.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Can we get you to run iperf, please? It's really hard to draw good conclusions about the big picture without knowing a little bit about the details. By establishing what each subsystem is capable of, it becomes easier to understand what is going on.
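For reference, a minimal run is just this (the server address below is only an example):
Code:
# On the FreeNAS box: start an iperf server
iperf -s
# On the Windows client: 60-second TCP test with a 64K window
iperf -c 192.168.1.254 -w 64K -t 60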
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I'm not aware of ANYTHING in ZFS that will do what you describe, at least not without an outside push. The DMU read-ahead is limited to 256 blocks. Something has to be asking ZFS for the data.

My guess is that what you're seeing is a userland app that has opened the file, and has maybe done a MADV_WILLNEED on a large chunk of the file, teasing the VM system into paging in a large hunk of data. That would be hard to differentiate from ZFS doing it, and it would have the result you mention.

I see it every time I try to stream a movie to my HTPC. It reads a boatload of data for a few seconds.
 

TheSmoker

Patron
Joined
Sep 19, 2012
Messages
225
Your network card. Can you provide the commands you used? I have a setup I can test this on right now with 2 Intel NICs. :D

Edit: If you do a DD of the RAM drive what do you get? LOL

I've used TeraCopy to copy approximately 14G of data to and from FreeNAS, mostly large video files. My local host is a Windows 7 machine ALSO with a shitty RTL card. lol.
During the copy process the CPU usage of Samba topped out at 28% of a single core.

The dd for ramdrive:
Code:
[root@jukebox] /mnt/tank1/test# dd if=/dev/zero of=/mnt/tank1/test/test13G.dd bs=1m count=13000
13000+0 records in
13000+0 records out
13631488000 bytes transferred in 15.068816 secs (904615734 bytes/sec)
 

TheSmoker

Patron
Joined
Sep 19, 2012
Messages
225
Can we get you to run iperf, please? It's really hard to draw good conclusions about the big picture without knowing a little bit about the details. By establishing what each subsystem is capable of, it becomes easier to understand what is going on.

Iperf test:
Code:
C:\Users\PLEX\Downloads>iperf.exe -c 192.168.1.254 -w 64K -t 60
------------------------------------------------------------
Client connecting to 192.168.1.254, TCP port 5001
TCP window size: 64.0 KByte
------------------------------------------------------------
[156] local 192.168.1.103 port 57696 connected with 192.168.1.254 port 5001
[ ID] Interval       Transfer     Bandwidth
[156]  0.0-60.0 sec  5.88 GBytes   842 Mbits/sec


dd on hard drives:
Code:
[root@jukebox] /mnt/tank0# dd if=/dev/zero of=/mnt/tank0/test48G.dd bs=1m count=48000
48000+0 records in
48000+0 records out
50331648000 bytes transferred in 197.302184 secs (255099295 bytes/sec)


EDIT: Please keep in mind that the zpool is NOT empty!
 

TheSmoker

Patron
Joined
Sep 19, 2012
Messages
225
Your network card. Can you provide the commands you used? I have a setup I can test this on right now with 2 Intel NICs. :D

Edit: If you do a DD of the RAM drive what do you get? LOL

The speeds you got are pretty much exactly what I thought you'd get. I get better speeds with my zpool on a server with 12GB of RAM. :)

The problem is that I get those speeds with a RAM drive, not with ZFS, although ZFS in practice can sustain twice the speed of a gigabit link ...
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Yes, but those ramdrive speeds are pathetic compared to my actual ZFS performance on my zpool with far less powerful hardware. The only things I have that are superior to your system are 8 drives instead of 6 and an Intel NIC. The NIC makes a bigger difference than people think, because crappy NICs like Realtek also require the CPU to handle the packet checksums (which adds latency getting data from the CPU to the network card).

I'd really appreciate it if you could post the commands you used to create the RAM drive so I can do some speed tests on my spare FreeNAS server to my Windows box.
 

TheSmoker

Patron
Joined
Sep 19, 2012
Messages
225
Yes, but those ramdrive speeds are pathetic compared to my actual ZFS performance on my zpool with far less powerful hardware. The only things I have that are superior to your system are 8 drives instead of 6 and an Intel NIC. The NIC makes a bigger difference than people think, because crappy NICs like Realtek also require the CPU to handle the packet checksums (which adds latency getting data from the CPU to the network card).

I'd really appreciate it if you could post the commands you used to create the RAM drive so I can do some speed tests on my spare FreeNAS server to my Windows box.

For creating the ramdrive:
Code:
mdmfs -s 16384m md5 /mnt/tank3/
Afterwards I unmounted it and mounted it under an already shared directory, just because I was too lazy to add another CIFS share.
Copied files to and from it using TeraCopy. http://codesector.com/teracopy
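For completeness, tearing it down afterwards should just be an unmount plus detaching the md device (mount point and unit number taken from the command above):
Code:
# Unmount the memory-backed filesystem, then release the md5 device
umount /mnt/tank3
mdconfig -d -u 5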

Why do you say that the ramdrive speed is pathetic? I see it does more than 900MB per second while eating up one core of the CPU. BTW, dd is not multithreaded. ;)
CPU usage:
Code:
12253 root          1 114    0  7744K  1824K CPU3    3   0:16 81.49% dd
dd for ramdisk:
Code:
[root@jukebox] /mnt/tank1/test# dd if=/dev/zero of=/mnt/tank1/test/test13G.dd bs=1m count=14000
14000+0 records in
14000+0 records out
14680064000 bytes transferred in 15.693831 secs (935403474 bytes/sec)


Also, copying to and from the ramdisk maxes out the network card, which is quite OK with me. The question remains: why does that not happen when I do transfers to and from ZFS, since the rest of the subsystems can handle transfers of around 85MBps/850Mbps?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Why do you say that the ramdrive speed is pathetic? I see it does more than 900MB per second while eating up one core of the CPU. BTW, dd is not multithreaded. ;)
CPU usage:
Code:
12253 root          1 114    0  7744K  1824K CPU3    3   0:16 81.49% dd
dd for ramdisk:
Code:
[root@jukebox] /mnt/tank1/test# dd if=/dev/zero of=/mnt/tank1/test/test13G.dd bs=1m count=14000
14000+0 records in
14000+0 records out
14680064000 bytes transferred in 15.693831 secs (935403474 bytes/sec)


Also, copying to and from the ramdisk maxes out the network card, which is quite OK with me. The question remains: why does that not happen when I do transfers to and from ZFS, since the rest of the subsystems can handle transfers of around 85MBps/850Mbps?

85MBps is NOT equal to 850Mbps. 125MBps = 1000Mbps. https://www.google.com/search?q=MB+...s=org.mozilla:en-US:official&client=firefox-a

So in reality, at 85MB/sec you are getting about 68% of your maximum network performance (680Mb/sec). That's TERRIBLE compared to what you SHOULD be getting. Most everyone that complains about their network speeds with a Realtek, if they aren't CPU or RAM bound, is at 55-65% of Gb speeds (68MB/sec to 81MB/sec). You fall slightly above that range, but still not high enough to rule out the NIC as the cause. Not to mention I've seen multiple threads where people argue that their Realtek should do better.. blah blah blah. Ultimately, if they buy an Intel NIC, the problem goes away. That's why I'm saying you need a better network card. Also, read below for more proof.

Also, since you are using a RAMdrive, you are taking the entire hard drive subsystem out of the equation. Your only limitations are how fast you can access the RAM and the "overhead" from the file system; your speeds are limited only by your RAM speed and latency. In effect, you've created the closest thing you can to an infinitely fast disk for your computer, so anything less than your maximum speed through your NIC means you have a bottleneck somewhere. Since your RAMdrive is "infinitely" fast, anything less than about 120MB/sec means something is wrong. So we're not talking about optimizing a RAM cache (we used a RAM drive, for gosh sake!), needing more RAM (we already used RAM!), or needing a faster CPU (that's not a bottleneck for this situation either), nothing of the sort. Something is limiting your ability to receive and dump information to your network. Hint: I've mentioned it multiple times in this thread.

With all of this in mind, that is why I say your copy speeds to the RAMdrive over the network are terrible. You should be able to hit 100MB/sec without breaking a sweat. In fact, you should (and I am expecting to) hit at least 125MB/sec. I'm hoping to get a screenshot of 130+MB/sec.

Overall, despite your "infinitely fast" drive, you still couldn't hit 100MB/sec. I do, with everything inferior (by a large margin, too!) except that I have 2 more drives and use Intel cards. I don't tune beyond Autotune, my CPU is circa 2008, and I have only 12GB of RAM. But... I have Intel NICs. I bet I could post links to 10 threads in 15 minutes from people who saw a performance jump JUST by upgrading to Intel NICs. I've put the comments about Intel NICs in my guide because "it works". There may be some kind of tweak you can do to increase performance, but it will certainly be network tweaks and not protocol, hard drive, cache, etc. Your NIC (or network infrastructure) is your limitation, period. You proved it with your test to the RAMdrive.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I see it every time I try to stream a movie to my HTPC. It reads a boatload of data for a few seconds.

Right, but so what? Is that ZFS doing it for you, or is it your userland app that's sending data to your HTPC? Because I know when I write code that's expecting to read big long files, I make every effort to write the code such that I/O requests are minimized where possible. If the app is aware that it is going to need a lot more sequential data out of the file, it tells the operating system, and if the OS has free resources, you may see what appears to be substantial read-ahead. That doesn't mean that it is ZFS reading ahead. It could be the VM system, it could be the application.

It doesn't even have to be a suggestion by the userland program. A rude userland program can basically force lots of "read ahead" in a unified buffer cache operating system by memory mapping the file in question and then spawning a thread to run forward through the file; doing this every few seconds would virtually guarantee that the pages would be in-core by the time they were needed, and would provide a lot of responsiveness if you were to call for fast forwarding, etc.

So here's my challenge to you. Go to your ZFS filesystem. Locate a multi-hundred-meg file. Do a "dd if=thatfile.foo of=/dev/null bs=1048576 count=1" while also doing "zpool iostat 1" in another window.

The DMU prefetch only fetches up to 256 records, which, if I've figured it right, is 32MB. I'll bet that when you ran that command you saw some sort of read happen that was less than 32MB.
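Spelled out, with a placeholder filename:
Code:
# Terminal 1: watch pool throughput once per second
zpool iostat 1
# Terminal 2: read a single 1MB block from a large, not-yet-cached file
dd if=/mnt/tank/somebigfile.mkv of=/dev/null bs=1048576 count=1
# If ZFS's own prefetch were responsible, the read burst in terminal 1
# should stay under ~32MB (256 records); anything much bigger is being
# requested by whatever application has the file open.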
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Right, but so what? Is that ZFS doing it for you, or is it your userland app that's sending data to your HTPC? Because I know when I write code that's expecting to read big long files, I make every effort to write the code such that I/O requests are minimized where possible. If the app is aware that it is going to need a lot more sequential data out of the file, it tells the operating system, and if the OS has free resources, you may see what appears to be substantial read-ahead. That doesn't mean that it is ZFS reading ahead. It could be the VM system, it could be the application.

It doesn't even have to be a suggestion by the userland program. A rude userland program can basically force lots of "read ahead" in a unified buffer cache operating system by memory mapping the file in question and then spawning a thread to run forward through the file; doing this every few seconds would virtually guarantee that the pages would be in-core by the time they were needed, and would provide a lot of responsiveness if you were to call for fast forwarding, etc.

So here's my challenge to you. Go to your ZFS filesystem. Locate a multi-hundred-meg file. Do a "dd if=thatfile.foo of=/dev/null bs=1048576 count=1" while also doing "zpool iostat 1" in another window.

The DMU prefetch only fetches up to 256 records, which, if I've figured it right, is 32MB. I'll bet that when you ran that command you saw some sort of read happen that was less than 32MB.

I'm pretty sure that's ZFS, because less than 25MB is sent through the network before the file starts playing, but more than 800MB is read from the server before the file starts. Usually there's a 2-3 second delay between when the VLC window opens and when you see the video and hear audio. It COULD be that VLC needed certain info from certain parts of the file, yielding the high read rates and low utilization of the data.

I did the dd you requested and it read 17.4MB. I ran it a second time and got 0, which I assume means the data is in the cache?

Back to messing around with my network configuration!

So.. drumroll please! I did a copy from/to RAMdrive on FreeNAS. Here's what I got!
[attached screenshot: Untitled.png]

133MB/sec! I'd LOVE to have those speeds all the time :D It fluctuates between 128 and 134MB/sec. Not too shabby.

So then I changed the FreeNAS server to the onboard Realtek and I got 91.3MB/sec.

Then with both my desktop and FreeNAS server using the onboard Realteks I got 87.5MB/sec.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
The DMU prefetch only fetches up to 256 records, which, if I've figured it right, is 32MB. I'll bet that when you ran that command you saw some sort of read happen that was less than 32MB.

Is there a way to change that?
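Edit: poking around, it looks like the prefetch depth is exposed under vfs.zfs.zfetch on FreeBSD's ZFS of this vintage (tunable names assumed; they vary by release, and raising them is untested here):
Code:
# List the prefetch-related knobs
sysctl vfs.zfs.zfetch
# block_cap is the per-stream prefetch limit in records (256 by default),
# which is where the ~32MB figure comes from. It could in principle be
# raised via a loader tunable in /boot/loader.conf, e.g.:
#   vfs.zfs.zfetch.block_cap=512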
 