Shouldn't this run faster?

Status
Not open for further replies.

Adam Tyler

Explorer
Joined
Oct 19, 2015
Messages
67
Hello everyone! I am troubleshooting a FreeNAS performance issue and was hoping someone out there could provide insight.

First, here is my single zPool setup:

Code:
  pool: zPool01
 state: ONLINE
  scan: none requested
config:

        NAME                                            STATE     READ WRITE CKSUM
        zPool01                                         ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            gptid/9144804f-2d70-11e8-8f45-003048b3d5b8  ONLINE       0     0     0
            gptid/926ecc51-2d70-11e8-8f45-003048b3d5b8  ONLINE       0     0     0
          mirror-1                                      ONLINE       0     0     0
            gptid/3336bb1e-2d74-11e8-8f45-003048b3d5b8  ONLINE       0     0     0
            gptid/34124ad5-2d74-11e8-8f45-003048b3d5b8  ONLINE       0     0     0
          mirror-2                                      ONLINE       0     0     0
            gptid/6f4654c3-2d74-11e8-8f45-003048b3d5b8  ONLINE       0     0     0
            gptid/701c8b6f-2d74-11e8-8f45-003048b3d5b8  ONLINE       0     0     0
          mirror-3                                      ONLINE       0     0     0
            gptid/7adbb06f-2d74-11e8-8f45-003048b3d5b8  ONLINE       0     0     0
            gptid/7bbe348a-2d74-11e8-8f45-003048b3d5b8  ONLINE       0     0     0

errors: No known data errors


Each drive is a WD 2 TB Red, model WD20EFRX. Rough math, I am expecting each mirror to provide a sustained transfer speed of about 60 MB/s. That depends on the workload, I realize, but it's a rough guide. Let me know if I am misinformed so far.

So, at 60 MB/s per mirror across four mirrors, I should be able to push about 240 MB/s total, or 1.92 Gigabit.
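
Just to show my back-of-the-envelope math (the ~60 MB/s per mirror figure is my own assumption, not a measured number):

Code:
# 4 mirror vdevs x ~60 MB/s each
echo "4 * 60" | bc                   # 240 MB/s aggregate
echo "scale=2; 240 * 8 / 1000" | bc  # ~1.92 gigabit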

When I do an internal speed test using the following commands, I see some interesting results:

Code:
root@TYL-NAS01:/mnt/zPool01 # dd if=/dev/zero of=testfile bs=1024 count=3000000
3000000+0 records in
3000000+0 records out
3072000000 bytes transferred in 16.906389 secs (181706453 bytes/sec)

root@TYL-NAS01:/mnt/zPool01 # dd if=testfile of=/dev/zero bs=1024 count=300000
300000+0 records in
300000+0 records out
307200000 bytes transferred in 6.937014 secs (44284184 bytes/sec)

-rw-r--r--  1 root  wheel   2.9G Mar 26 11:04 testfile


It seems these commands wrote and read back a 3 GB test file and came up with 1.45 Gigabit write and 0.35 Gigabit read. Huh? Shouldn't read be much faster than write? Am I not interpreting these numbers correctly?

The FreeNAS server has 32 GB RAM and a 2nd-gen Core i7 processor, and is using a new LSI SAS controller (9207-8i), if memory serves.

If I take this a step further and use a VM running on this zPool, connected via iSCSI to FreeNAS with round-robin (multipath I/O) load balancing across two 1 Gb NICs, I get about 103 MB/s write and up to 140 MB/s read. That doesn't seem consistent with the local FreeNAS results, and it's nowhere near the 1.92 Gb the drives should* be capable of. That isn't even considering their ability to burst from cache.

When I attempt to perform a "storage vMotion" job between a "different" storage device and this FreeNAS box, take a look at the throughput metrics on the NIC links.

This output shows reading from the other storage device and writing to FreeNAS.
[Screenshot: ESXi NIC throughput (vmnic RX/TX) during the transfer]


As you can see from the screenshot, vmnic2 and vmnic3 are receiving just over 860 Mbps (RX) from the "other" storage device and writing it (TX) to the FreeNAS storage device over vmnic6 and vmnic7. That's two completely separate 2 Gb links, one per storage device, perfectly load balanced. Only, each link is performing at about half of what it should...
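
For reference, this is roughly how the multipathing can be checked (and the round-robin switching interval adjusted) from the ESXi shell. The naa identifier below is just a placeholder, and iops=1 is only a commonly suggested tweak, not necessarily what I am running:

Code:
# list iSCSI devices and their current path selection policy
esxcli storage nmp device list

# optionally switch paths after every I/O instead of the default 1000
# (naa.xxxxxxxx is a placeholder for the real device identifier)
esxcli storage nmp psp roundrobin deviceconfig set --device=naa.xxxxxxxx --type=iops --iops=1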

-Adam
 

millst

Contributor
Joined
Feb 2, 2015
Messages
141
Read the sticky here on benchmarks/performance (top of Storage section) and try your tests again.

-tm
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Shouldn't read be much faster than write
Not if you're writing to a buffer in RAM. Besides, zeros compress really well.

Benchmarking ZFS is really hard. Free space and fragmentation make a huge difference.

Also, you're writing into /dev/zero, which is a very weird thing to do. The bit bucket is at /dev/null; /dev/zero is a source of zeros.
 

Adam Tyler

Explorer
Joined
Oct 19, 2015
Messages
67
Not if you're writing to a buffer in RAM. Besides, zeros compress really well.

Benchmarking ZFS is really hard. Free space and fragmentation make a huge difference.

Also, you're writing into /dev/zero, which is a very weird thing to do. The bit bucket is at /dev/null; /dev/zero is a source of zeros.

I just pulled that command from the web. So you are saying that the commands should look like this?


Code:
root@TYL-NAS01:/mnt/zPool01 # dd if=/dev/null of=testfile bs=1024 count=3000000

root@TYL-NAS01:/mnt/zPool01 # dd if=testfile of=/dev/null bs=1024 count=3000000
 

Adam Tyler

Explorer
Joined
Oct 19, 2015
Messages
67
Not if you're writing to a buffer in RAM. Besides, zeros compress really well.

I assume you mean that using SLOG would be the way to go... I realize an enterprise SSD or NVMe device would be ideal, but this seemed really slow for the native drives... Here is the free space info on the pool. I have 32 GB RAM in the server, which should be more than enough compared to what I've read about sizing.
Code:
              capacity     operations    bandwidth
pool          alloc   free   read  write   read  write
------------  -----  -----  -----  -----  -----  -----
freenas-boot  1.11G   118G      0      0  2.55K    123
zPool01       1.56T  5.69T    417    821  5.40M  10.9M
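
If it helps, I can also watch per-vdev activity while a test is running; something like:

Code:
# report per-vdev throughput for zPool01 once per second (Ctrl-C to stop)
zpool iostat -v zPool01 1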

 

Adam Tyler

Explorer
Joined
Oct 19, 2015
Messages
67
Benchmarking ZFS is really hard. Free space and fragmentation make a huge difference.

"wiping your bum with a hook for a hand is really hard"... Lol... So would a way to make sure the pool is in its healthiest state be to run a "scrub" first? Is that the only required maintenance task to reduce fragmentation? I think free space is probably fine on this pool: 2 TB used, 5.69 TB free.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
I just pulled that command from the web. So you are saying that the commands should look like this?
The second one is fine, but the first one is wrong: you're not supposed to read from /dev/null, you read from /dev/zero.

I assume you mean that using SLOG would be the way to go...
No, not at all.

but this seemed really slow for the native drives
That's because it is really slow.

So would a way to make sure the pool is in its healthiest state
It's empty enough for performance not to suck. So, there's a problem elsewhere. Could be a failing drive slowing everything down... What's the output of smartctl -a /dev/adaX for all drives?
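
Something like this will dump them all into one file (assuming the disks show up as ada0 through ada7 - check with camcontrol devlist and adjust):

Code:
# Bourne-shell loop (run "sh" first if your login shell is csh)
# output file name is arbitrary
for d in ada0 ada1 ada2 ada3 ada4 ada5 ada6 ada7; do
    echo "===== $d ====="
    smartctl -a /dev/$d
done > /tmp/smart_all.txt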
 

Adam Tyler

Explorer
Joined
Oct 19, 2015
Messages
67
It's empty enough for performance not to suck. So, there's a problem elsewhere. Could be a failing drive slowing everything down... What's the output of smartctl -a /dev/adaX for all drives?

Exactly what I was thinking. Results are in the attached txt file; I couldn't paste it all into the thread.
 

Attachments

  • SMART.txt
    42.2 KB · Views: 474

Adam Tyler

Explorer
Joined
Oct 19, 2015
Messages
67
The second one is fine, but the first one is wrong: you're not supposed to read from /dev/null, you read from /dev/zero.

Okay, so the commands should look like this?

Code:
dd if=/dev/zero of=testfile bs=1024 count=3000000
dd if=testfile of=/dev/null bs=1024 count=3000000


They seemed to run and work. Here are the results:
3072000000 bytes transferred in 16.677444 secs (184200884 bytes/sec) or 1.473607072 Gigabit
3072000000 bytes transferred in 68.607682 secs (44776327 bytes/sec) or 0.358 Gigabit

Again, what is with the read performance? That is the second value, correct? The first command writes the file and the second reads it back...?
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
The drives don't report any errors, but that doesn't mean much because they've never had a single SMART test run on them. Run a long test on all drives using smartctl -t long /dev/adaX and wait a few hours for them to finish.
 

Adam Tyler

Explorer
Joined
Oct 19, 2015
Messages
67
The drives don't report any errors, but that doesn't mean much because they've never had a single SMART test run on them. Run a long test on all drives using smartctl -t long /dev/adaX and wait a few hours for them to finish.

Thanks for your help; I am still pretty new to this. Will the long test hammer performance while it is running? Does it make sense to run a scrub too? The pool was just created, so I'm not sure if that makes sense or not.

What is the easiest way to queue each smartctl -t long /dev/adaX command? Do I need to keep the SSH session live while it is running, or is it something I can just kick off and check status on later? Is the command to check status simply smartctl -a /dev/adaX?

-Adam
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
does it make sense to run a scrub too?
Not simultaneously.
What is the easiest way to queue each smartctl -t long /dev/adaX command? Do I need to keep the SSH session live while it is running, or is it something I can just kick off and check status on later? Is the command to check status simply smartctl -a /dev/adaX?
The disks run the test internally, you just have to issue it. -a will report the current status of the active test, if any.
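
For example (same assumption as before that the disks are ada0 through ada7, Bourne-shell syntax):

Code:
# start a long self-test on every disk; the drives run these internally,
# so the SSH session does not need to stay open
for d in ada0 ada1 ada2 ada3 ada4 ada5 ada6 ada7; do
    smartctl -t long /dev/$d
done

# later, check progress/results on any one disk
smartctl -a /dev/ada0 | grep -A 1 "Self-test execution status"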
 

Adam Tyler

Explorer
Joined
Oct 19, 2015
Messages
67
Not simultaneously.

The disks run the test internally, you just have to issue it. -a will report the current status of the active test, if any.

Got it. Looks like the disks remain "online" and available to the pool while they are running the test, too. I was worried because of the "offline" language the test used, so I ran it on only one disk in each mirror as a precaution. Doesn't look like that was necessary.

Code:
Self-test execution status:	  ( 249) Self-test routine in progress...
										90% of test remaining.
 

wblock

Documentation Engineer
Joined
Nov 14, 2014
Messages
1,506
Okay, so the commands should look like this?
Code:
dd if=/dev/zero of=testfile bs=1024 count=3000000
dd if=testfile of=/dev/null bs=1024 count=3000000

No, these will be very slow due to the tiny block size. bs in this case amounts to buffer size. The classic analogy is draining a well with a teaspoon instead of a bucket. Use at least a 64K buffer. Many people use 1M, which is fine. And you can enter those values that way, bs=64k or bs=1m.
 

Adam Tyler

Explorer
Joined
Oct 19, 2015
Messages
67
No, these will be very slow due to the tiny block size. bs in this case amounts to buffer size. The classic analogy is draining a well with a teaspoon instead of a bucket. Use at least a 64K buffer. Many people use 1M, which is fine. And you can enter those values that way, bs=64k or bs=1m.

Wow. That yielded some seriously different results.

Creating and reading a 3 GB file:
Code:
dd if=/dev/zero of=testfile bs=1m count=3000
dd if=testfile of=/dev/null bs=1m count=3000

3145728000 bytes transferred in 0.967769 secs (3250496374 bytes/sec) or 26 gigabit
3145728000 bytes transferred in 0.420830 secs (7475057679 bytes/sec) or 59 gigabit


That seems odd. Let's try a 20 GB file to see if it equalizes some.
Code:
dd if=/dev/zero of=testfile bs=1m count=20000

20971520000 bytes transferred in 4.515265 secs (4644582270 bytes/sec) or 37 gigabit
dd if=testfile of=/dev/null bs=1m count=20000

20971520000 bytes transferred in 2.827612 secs (7416688466 bytes/sec) or 59 gigabit



Hmm... These numbers seem too high now. Now what?
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
You need to turn off compression on the dataset you're writing on for the numbers to be meaningful.
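
Something along these lines, assuming you're writing straight to the pool's root dataset (substitute the actual dataset name if not):

Code:
# see what compression is set to and how well it has been compressing
zfs get compression,compressratio zPool01

# turn it off for the benchmark, then put lz4 back afterwards
zfs set compression=off zPool01
zfs set compression=lz4 zPool01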
 

Adam Tyler

Explorer
Joined
Oct 19, 2015
Messages
67
You need to turn off compression on the dataset you're writing on for the numbers to be meaningful.

Interesting. You'd think having compression enabled would slow things down.

You are saying that lz4 needs to be turned off here?
[Screenshot: dataset options showing Compression level set to lz4]


Here are the results after disabling lz4 on the dataset:
Code:
dd if=/dev/zero of=testfile bs=1m count=15000
15728640000 bytes transferred in 28.206669 secs (557621326 bytes/sec) or 4.4 gigabit

dd if=testfile of=/dev/null bs=1m count=15000
15728640000 bytes transferred in 3.990380 secs (3941639589 bytes/sec) or 31.5 gigabit



This seems a bit more in line. At least the write is slower than the read, which is what you'd expect. The read still seems higher than possible. Anything else I am missing?

If this does prove, to some extent at least, that the array is performing as expected internally, what would be the best way to troubleshoot why the 2 Gb of possible iSCSI bandwidth is only delivering about 800 Mbps?
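
For instance, would it make sense to rule out the raw network first with something like iperf3, assuming it is available on both ends? Roughly:

Code:
# on the FreeNAS box (server side)
iperf3 -s

# on the client side (a VM, for example): 30-second test with two parallel streams
# <freenas-ip> is a placeholder for the server's address
iperf3 -c <freenas-ip> -P 2 -t 30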
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
"wiping your bum with a hook for a hand is really hard"
My daughters love that show.
So would a way to make sure the pool is in its healthiest state be to run a "scrub" first? Is that the only required maintenance task to reduce fragmentation?
There is no defrag for ZFS. Scrub just checks for errors.
Interesting. You'd think having compression enabled would slow things down.
No, because the CPU handles the compression and it is much faster than the drives.
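
For example (using the pool name from above):

Code:
# a scrub verifies checksums and repairs bad blocks from the mirror copies; it is not a defrag
zpool scrub zPool01
zpool status zPool01

# free-space fragmentation can be viewed, but not "defragged" in place
zpool list -o name,size,allocated,free,fragmentation zPool01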
 