Shouldn't this run faster?

Status
Not open for further replies.

Adam Tyler

Explorer
Joined
Oct 19, 2015
Messages
67
Hello everyone! I am troubleshooting a FreeNAS performance issue and was hoping someone out there could provide insight.

First, here is my single zPool setup:

Code:
  pool: zPool01
 state: ONLINE
  scan: none requested
config:

        NAME                                            STATE     READ WRITE CKSUM
        zPool01                                         ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            gptid/9144804f-2d70-11e8-8f45-003048b3d5b8  ONLINE       0     0     0
            gptid/926ecc51-2d70-11e8-8f45-003048b3d5b8  ONLINE       0     0     0
          mirror-1                                      ONLINE       0     0     0
            gptid/3336bb1e-2d74-11e8-8f45-003048b3d5b8  ONLINE       0     0     0
            gptid/34124ad5-2d74-11e8-8f45-003048b3d5b8  ONLINE       0     0     0
          mirror-2                                      ONLINE       0     0     0
            gptid/6f4654c3-2d74-11e8-8f45-003048b3d5b8  ONLINE       0     0     0
            gptid/701c8b6f-2d74-11e8-8f45-003048b3d5b8  ONLINE       0     0     0
          mirror-3                                      ONLINE       0     0     0
            gptid/7adbb06f-2d74-11e8-8f45-003048b3d5b8  ONLINE       0     0     0
            gptid/7bbe348a-2d74-11e8-8f45-003048b3d5b8  ONLINE       0     0     0

errors: No known data errors


Each drive is a WD 2 TB Red, model WD20EFRX. Rough math, I am expecting each mirror to provide a sustained transfer speed of about 60 MB/s. That depends on the workload, I realize, but it's a rough guide. Let me know if I am misinformed so far.

So, at 60 MB/s per mirror across four mirrors, I should be able to push about 240 MB/s total, or 1.92 Gigabit.
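
Just to show my back-of-the-envelope math (the ~60 MB/s per mirror figure is my own assumption, not a measured number):

Code:
# 4 mirror vdevs x ~60 MB/s each
echo "4 * 60" | bc                   # 240 MB/s aggregate
echo "scale=2; 240 * 8 / 1000" | bc  # ~1.92 gigabit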

When I do an internal speed test using the following commands, I see some interesting results:

Code:
root@TYL-NAS01:/mnt/zPool01 # dd if=/dev/zero of=testfile bs=1024 count=3000000
3000000+0 records in
3000000+0 records out
3072000000 bytes transferred in 16.906389 secs (181706453 bytes/sec)

root@TYL-NAS01:/mnt/zPool01 # dd if=testfile of=/dev/zero bs=1024 count=300000
300000+0 records in
300000+0 records out
307200000 bytes transferred in 6.937014 secs (44284184 bytes/sec)

-rw-r--r--  1 root  wheel   2.9G Mar 26 11:04 testfile


It seems these commands wrote and read back a 3 GB test file and came up with 1.45 Gigabit write and 0.35 Gigabit read. Huh? Shouldn't read be much faster than write? Am I not interpreting these numbers correctly?

The FreeNAS server has 32 GB RAM and a 2nd-gen Core i7 processor, and is using a new LSI SAS controller (9207-8i), if memory serves.

If I take this a step further and use a VM running on this zPool, connected via iSCSI to FreeNAS with round-robin (multipath I/O) load balancing across two 1 Gb NICs, I get about 103 MB/s write and up to 140 MB/s read. That doesn't seem consistent with the local FreeNAS results, and it's nowhere near the 1.92 Gb the drives should* be capable of. That isn't even considering their ability to burst from cache.

When I attempt to perform a "storage vMotion" job between a "different" storage device and this FreeNAS box, take a look at the throughput metrics on the NIC links.

This output shows reading from the other storage device and writing to FreeNAS.
[Screenshot: ESXi NIC throughput (vmnic RX/TX) during the transfer]


As you can see from the screenshot, vmnic2 and vmnic3 are receiving just over 860 Mbps (RX) from the "other" storage device and writing it (TX) to the FreeNAS storage device over vmnic6 and vmnic7. That's two completely separate 2 Gb links, one per storage device, perfectly load balanced. Only, each link is performing at about half of what it should...
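
For reference, this is roughly how the multipathing can be checked (and the round-robin switching interval adjusted) from the ESXi shell. The naa identifier below is just a placeholder, and iops=1 is only a commonly suggested tweak, not necessarily what I am running:

Code:
# list iSCSI devices and their current path selection policy
esxcli storage nmp device list

# optionally switch paths after every I/O instead of the default 1000
# (naa.xxxxxxxx is a placeholder for the real device identifier)
esxcli storage nmp psp roundrobin deviceconfig set --device=naa.xxxxxxxx --type=iops --iops=1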

-Adam
 

millst

Contributor
Joined
Feb 2, 2015
Messages
141
Read the sticky here on benchmarks/performance (top of Storage section) and try your tests again.

-tm
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Shouldn't read be much faster than write
Not if you're writing to a buffer in RAM. Besides, zeros compress really well.

Benchmarking ZFS is really hard. Free space and fragmentation make a huge difference.

Also, you're writing into /dev/zero, which is a very weird thing to do. The bit bucket is at /dev/null; /dev/zero is a source of zeros.
 

Adam Tyler

Explorer
Joined
Oct 19, 2015
Messages
67
Not if you're writing to a buffer in RAM. Besides, zeros compress really well.

Benchmarking ZFS is really hard. Free space and fragmentation make a huge difference.

Also, you're writing into /dev/zero, which is a very weird thing to do. The bit bucket is at /dev/null; /dev/zero is a source of zeros.

I just pulled that command from the web. So you are saying that the commands should look like this?


Code:
root@TYL-NAS01:/mnt/zPool01 # dd if=/dev/null of=testfile bs=1024 count=3000000

root@TYL-NAS01:/mnt/zPool01 # dd if=testfile of=/dev/null bs=1024 count=3000000
 

Adam Tyler

Explorer
Joined
Oct 19, 2015
Messages
67
Not if you're writing to a buffer in RAM. Besides, zeros compress really well.

I assume you mean that using SLOG would be the way to go... I realize an enterprise SSD or NVMe device would be ideal, but this seemed really slow for the native drives... Here is the free space info on the pool. I have 32 GB RAM in the server, which should be more than enough compared to what I've read about sizing.
Code:
              capacity     operations    bandwidth
pool          alloc   free   read  write   read  write
------------  -----  -----  -----  -----  -----  -----
freenas-boot  1.11G   118G      0      0  2.55K    123
zPool01       1.56T  5.69T    417    821  5.40M  10.9M
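
If it helps, I can also watch per-vdev activity while a test is running; something like:

Code:
# report per-vdev throughput for zPool01 once per second (Ctrl-C to stop)
zpool iostat -v zPool01 1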

 

Adam Tyler

Explorer
Joined
Oct 19, 2015
Messages
67
Benchmarking ZFS is really hard. Free space and fragmentation make a huge difference.

"wiping your bum with a hook for a hand is really hard"... Lol... So would a way to make sure the pool is in its healthiest state be to run a "scrub" first? Is that the only required maintenance task to reduce fragmentation? I think free space is probably fine on this pool: 2 TB used, 5.69 TB free.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
I just pulled that command from the web. So you are saying that the commands should look like this?
The second one is fine, but the first one is wrong: you're not supposed to read from /dev/null, you read from /dev/zero.

I assume you mean that using SLOG would be the way to go...
No, not at all.

but this seemed really slow for the native drives
That's because it is really slow.

So would a way to make sure the pool is in its healthiest state
It's empty enough for performance not to suck. So, there's a problem elsewhere. Could be a failing drive slowing everything down... What's the output of smartctl -a /dev/adaX for all drives?
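
Something like this will dump them all into one file (assuming the disks show up as ada0 through ada7 - check with camcontrol devlist and adjust):

Code:
# Bourne-shell loop (run "sh" first if your login shell is csh)
# output file name is arbitrary
for d in ada0 ada1 ada2 ada3 ada4 ada5 ada6 ada7; do
    echo "===== $d ====="
    smartctl -a /dev/$d
done > /tmp/smart_all.txt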
 

Adam Tyler

Explorer
Joined
Oct 19, 2015
Messages
67
It's empty enough for performance not to suck. So, there's a problem elsewhere. Could be a failing drive slowing everything down... What's the output of smartctl -a /dev/adaX for all drives?

Exactly what I was thinking. Results are in the attached txt file; I couldn't paste it all into the thread.
 

Attachments

  • SMART.txt
    42.2 KB · Views: 474

Adam Tyler

Explorer
Joined
Oct 19, 2015
Messages
67
The second one is fine, but the first one is wrong: you're not supposed to read from /dev/null, you read from /dev/zero.

Okay, so the commands should look like this?

Code:
dd if=/dev/zero of=testfile bs=1024 count=3000000
dd if=testfile of=/dev/null bs=1024 count=3000000


They seemed to run and work. Here are the results:
3072000000 bytes transferred in 16.677444 secs (184200884 bytes/sec) or 1.473607072 Gigabit
3072000000 bytes transferred in 68.607682 secs (44776327 bytes/sec) or 0.358 Gigabit

Again, what is with the read performance? That is the second value, correct? The first command writes the file and the second reads it back...?
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
The drives don't report any errors, but that doesn't mean much because they've never had a single SMART test run on them. Run a long test on all drives using smartctl -t long /dev/adaX and wait a few hours for them to finish.
 

Adam Tyler

Explorer
Joined
Oct 19, 2015
Messages
67
The drives don't report any errors, but that doesn't mean much because they've never had a single SMART test run on them. Run a long test on all drives using smartctl -t long /dev/adaX and wait a few hours for them to finish.

Thanks for your help; I am still pretty new to this. Will the long test hammer performance while it is running? Does it make sense to run a scrub too? The pool was just created, so I'm not sure if that makes sense or not.

What is the easiest way to queue each smartctl -t long /dev/adaX command? Do I need to keep the SSH session live while it is running, or is it something I can just kick off and check status on later? Is the command to check status simply smartctl -a /dev/adaX?

-Adam
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
does it make sense to run a scrub too?
Not simultaneously.
What is the easiest way to queue each smartctl -t long /dev/adaX command? Do I need to keep the SSH session live while it is running, or is it something I can just kick off and check status on later? Is the command to check status simply smartctl -a /dev/adaX?
The disks run the test internally, you just have to issue it. -a will report the current status of the active test, if any.
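
For example (same assumption as before that the disks are ada0 through ada7, Bourne-shell syntax):

Code:
# start a long self-test on every disk; the drives run these internally,
# so the SSH session does not need to stay open
for d in ada0 ada1 ada2 ada3 ada4 ada5 ada6 ada7; do
    smartctl -t long /dev/$d
done

# later, check progress/results on any one disk
smartctl -a /dev/ada0 | grep -A 1 "Self-test execution status"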
 

Adam Tyler

Explorer
Joined
Oct 19, 2015
Messages
67
Not simultaneously.

The disks run the test internally, you just have to issue it. -a will report the current status of the active test, if any.

Got it. Looks like the disks remain "online" and available to the pool while they are running the test, too. I was worried because of the "offline" language the test used, so I ran it on only one disk in each mirror as a precaution. Doesn't look like that was necessary.

Code:
Self-test execution status:	  ( 249) Self-test routine in progress...
										90% of test remaining.
 

wblock

Documentation Engineer
Joined
Nov 14, 2014
Messages
1,506
Okay, so the commands should look like this?
Code:
dd if=/dev/zero of=testfile bs=1024 count=3000000
dd if=testfile of=/dev/null bs=1024 count=3000000

No, these will be very slow due to the tiny block size. bs in this case amounts to buffer size. The classic analogy is draining a well with a teaspoon instead of a bucket. Use at least a 64K buffer. Many people use 1M, which is fine. And you can enter those values that way, bs=64k or bs=1m.
 

Adam Tyler

Explorer
Joined
Oct 19, 2015
Messages
67
No, these will be very slow due to the tiny block size. bs in this case amounts to buffer size. The classic analogy is draining a well with a teaspoon instead of a bucket. Use at least a 64K buffer. Many people use 1M, which is fine. And you can enter those values that way, bs=64k or bs=1m.

Wow. That yielded some seriously different results.

Creating and reading a 3 GB file:
Code:
dd if=/dev/zero of=testfile bs=1m count=3000
dd if=testfile of=/dev/null bs=1m count=3000

3145728000 bytes transferred in 0.967769 secs (3250496374 bytes/sec) or 26 gigabit
3145728000 bytes transferred in 0.420830 secs (7475057679 bytes/sec) or 59 gigabit


That seems odd. Let's try a 20 GB file to see if it equalizes some.
Code:
dd if=/dev/zero of=testfile bs=1m count=20000

20971520000 bytes transferred in 4.515265 secs (4644582270 bytes/sec) or 37 gigabit
dd if=testfile of=/dev/null bs=1m count=20000

20971520000 bytes transferred in 2.827612 secs (7416688466 bytes/sec) or 59 gigabit



Hmm... These numbers seem too high now. Now what?
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
You need to turn off compression on the dataset you're writing on for the numbers to be meaningful.
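
Something along these lines, assuming you're writing straight to the pool's root dataset (substitute the actual dataset name if not):

Code:
# see what compression is set to and how well it has been compressing
zfs get compression,compressratio zPool01

# turn it off for the benchmark, then put lz4 back afterwards
zfs set compression=off zPool01
zfs set compression=lz4 zPool01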
 

Adam Tyler

Explorer
Joined
Oct 19, 2015
Messages
67
You need to turn off compression on the dataset you're writing on for the numbers to be meaningful.

Interesting. You'd think having compression enabled would slow things down.

You are saying that lz4 needs to be turned off here?
[Screenshot: dataset options showing Compression level set to lz4]


Here are the results after disabling lz4 on the dataset:
Code:
dd if=/dev/zero of=testfile bs=1m count=15000
15728640000 bytes transferred in 28.206669 secs (557621326 bytes/sec) or 4.4 gigabit

dd if=testfile of=/dev/null bs=1m count=15000
15728640000 bytes transferred in 3.990380 secs (3941639589 bytes/sec) or 31.5 gigabit



This seems a bit more in line. At least the write is slower than the read, which is what you'd expect. The read still seems higher than possible. Anything else I am missing?

If this does prove, to some extent at least, that the array is performing as expected internally, what would be the best way to troubleshoot why the 2 Gb of possible iSCSI bandwidth is only delivering about 800 Mbps?
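
For instance, would it make sense to rule out the raw network first with something like iperf3, assuming it is available on both ends? Roughly:

Code:
# on the FreeNAS box (server side)
iperf3 -s

# on the client side (a VM, for example): 30-second test with two parallel streams
# <freenas-ip> is a placeholder for the server's address
iperf3 -c <freenas-ip> -P 2 -t 30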
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
"wiping your bum with a hook for a hand is really hard"
My daughters love that show.
So would a way to make sure the pool is in its healthiest state be to run a "scrub" first? Is that the only required maintenance task to reduce fragmentation?
There is no defrag for ZFS. Scrub just checks for errors.
Interesting. You'd think having compression enabled would slow things down.
No, because the CPU handles the compression and it is much faster than the drives.
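
For example (using the pool name from above):

Code:
# a scrub verifies checksums and repairs bad blocks from the mirror copies; it is not a defrag
zpool scrub zPool01
zpool status zPool01

# free-space fragmentation can be viewed, but not "defragged" in place
zpool list -o name,size,allocated,free,fragmentation zPool01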
 