Slow disk in burn-in tests with solnet array test script


MrJK

Cadet

I have some strange results from burn-in tests of 4x WD Red 1TB drives using this test script ( https://forums.freenas.org/index.php?resources/solnet-array-test.1/ ).

The box is an HP ProLiant N40L with 16GB of ECC RAM, running FreeNAS 9.10. http://m.hp.com/h20195/v2/GetPDF.aspx/c04111079.pdf

There was nothing scary in the SMART tests or badblocks, so I moved on to the solnet v2 test script. I ran the same test a few times, becoming increasingly frustrated, before deciding to be more scientific: recording and posting my results. These are the results of the parallel seek-stress test from three runs, moving the disks to different bays (cables) between each test.
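
(For reference, the SMART and badblocks checks were roughly along these lines; the device name and exact flags below are illustrative rather than my exact invocations.)

Code:
# Long SMART self-test per disk, then review the results once it finishes
smartctl -t long /dev/ada0
smartctl -a /dev/ada0

# Destructive badblocks write/read pass -- only safe on empty disks
badblocks -ws -b 4096 /dev/ada0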

Test 1 shows disk C50 on ada0 is the slowest, and Y75 on ada2 is the next slowest. The fastest is DKA on ada3.

For Test 2, I moved DKA to the slowest bay, C50 to the fastest bay, and swapped the two middle ones. Test 2 shows C50 on ada3 is again the slowest disk by far. It must be a bad disk, right?

Let's do one more before I return it...

For Test 3, I reordered the disks again, and now C50 is the fastest disk.

This range of results is much wider than I had expected. If it's not the disk, what else might cause one (or two) disks to underperform the others by such a wide margin? Is this to be expected in tests like this?

Thanks for any suggestions.

Code:
TEST 1
======
ada0 WD-WCC4J4CKNC50
ada1 WD-WCC4J5KJAL7Z
ada2 WD-WCC4J5KJAY75
ada3 WD-WCC4J5KJADKA

Awaiting completion: initial parallel seek-stress array read
Fri Jul 1 06:08:08 PDT 2016
Completed: initial parallel seek-stress array read

Disk's average time is 29487 seconds per disk

Disk   Bytes Transferred Seconds %ofAvg
------- ----------------- ------- ------
ada0       1000204886016   39078   133 --SLOW--
ada1       1000204886016   23656     80 ++FAST++
ada2       1000204886016   35930   122 --SLOW--
ada3       1000204886016   19285     65 ++FAST++


TEST 2
======
[root@freenas] ~# camcontrol identify ada0 | grep 'serial'
serial number         WD-WCC4J5KJADKA
[root@freenas] ~# camcontrol identify ada1 | grep 'serial'
serial number         WD-WCC4J5KJAY75
[root@freenas] ~# camcontrol identify ada2 | grep 'serial'
serial number         WD-WCC4J5KJAL7Z
[root@freenas] ~# camcontrol identify ada3 | grep 'serial'
serial number         WD-WCC4J4CKNC50

Awaiting completion: initial parallel seek-stress array read
Sat Jul 2 08:35:36 PDT 2016
Completed: initial parallel seek-stress array read

Disk's average time is 28306 seconds per disk

Disk   Bytes Transferred Seconds %ofAvg
------- ----------------- ------- ------
ada0       1000204886016   22063     78 ++FAST++
ada1       1000204886016   19573     69 ++FAST++
ada2       1000204886016   22131     78 ++FAST++
ada3       1000204886016   49458   175 --SLOW--


TEST 3
======
[root@freenas] ~# camcontrol identify ada0 | grep 'serial'
serial number         WD-WCC4J5KJAL7Z
[root@freenas] ~# camcontrol identify ada1 | grep 'serial'
serial number         WD-WCC4J4CKNC50
[root@freenas] ~# camcontrol identify ada2 | grep 'serial'
serial number         WD-WCC4J5KJADKA
[root@freenas] ~# camcontrol identify ada3 | grep 'serial'
serial number         WD-WCC4J5KJAY75

Awaiting completion: initial parallel seek-stress array read
Tue Jul 5 06:10:24 PDT 2016
Completed: initial parallel seek-stress array read

Disk's average time is 26778 seconds per disk

Disk   Bytes Transferred Seconds %ofAvg
------- ----------------- ------- ------
ada0       1000204886016   36073   135 --SLOW--
ada1       1000204886016   19194     72 ++FAST++
ada2       1000204886016   30929   116 --SLOW--
ada3       1000204886016   20916     78 ++FAST++

 

Binary Buddha

Contributor

Are they in an active pool that's being written to by another process? I'm no expert, but I'd say it's ZFS doing its thing and you're undermining it by dd'ing the individual disks. I don't think ZFS likes to play nice with direct disk-access tools like the one the script uses. You don't actually interact with the drives directly in reality. That's why you make ZFS volumes with pools and let ZFS figure out the rest.

https://blogs.oracle.com/bonwick/entry/zfs_block_allocation

And remember, ZFS is memory-based, and CPU-based depending on whether you're using encryption and the level of compression. It stores all the data in memory and then writes it to disk when it can or needs to, generally speaking. More memory means faster access. If you need more than the memory can provide, that's where the ZIL and L2ARC come in. ZIL/SLOG is the "write cache" and L2ARC is the read cache. Both are usually on SSDs. ZIL usually writes the SLOG to the volume via copy-on-write. And depending on your redundancy setup, it's writing that data twice or more, which may impact performance. That's where setting up a SLOG device comes in; it reduces the amount of writes "at that time" that would have gone to the volume.

http://www.zfsbuild.com/2010/04/15/explanation-of-arc-and-l2arc/
http://www.freenas.org/blog/zfs-zil-and-slog-demystified/
 

MrJK

Cadet

Thanks for the reply.

Just to clarify: the burn-in was on a clean system immediately after a fresh install of FreeNAS. There were no ZFS volumes, no data, and no other processes should have been touching the disks.
 

Binary Buddha

Contributor

Thanks for the reply.

Just to clarify: the burn-in was on a clean system immediately after a fresh install of FreeNAS. There were no ZFS volumes, no data, and no other processes should have been touching the disks.

OK. So it's probably that you're subverting the ZFS process that I mentioned above. Aside from adding drives to or removing them from the pool, you shouldn't really be interacting with drives directly. With ZFS, read and write speeds depend on RAM and then disk speed. It's not like UFS, EXT, Reiser, or MD devices, where the speed depends on the controllers and drive RPMs because you're interacting with them at that level. Instead, you work with it more like it's LVM2 with pimped-out spinner rims and NOS in the trunk.
 

Nick2253

Wizard

I ran the same test a few times, becoming increasingly frustrated, before deciding to be more scientific: recording and posting my results. These are the results of the parallel seek-stress test from three runs, moving the disks to different bays (cables) between each test.

The most important question here is: what are you trying to measure? If you're trying to measure disk speed, then I would not perform the test from FreeNAS, but rather from something like PartedMagic that is designed to be completely hands-off with your drives. The irregularity of your tests indicates that there may be some activity in the background that is throwing off your results.
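
If you want to rule that out on the FreeNAS box itself, something like this (just a sketch) run alongside the test would show whether anything else is touching the disks:

Code:
# Per-device I/O statistics, refreshed every second; any load on
# ada0-ada3 that isn't the test itself is background activity
gstat -I 1s

# Which processes are doing I/O right now
top -m io -o total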

If you want to measure the pool speed, then that test is inappropriate, because it bypasses the filesystem. You can still use dd, but you'll need to write to or read from a file on the pool, not the devices themselves.
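
For example, something along these lines (assuming a pool named tank mounted at /mnt/tank; the size is arbitrary):

Code:
# Write test through the filesystem rather than against the raw devices
dd if=/dev/zero of=/mnt/tank/ddtest bs=1m count=20000

# Read it back
dd if=/mnt/tank/ddtest of=/dev/null bs=1m

# Clean up
rm /mnt/tank/ddtest

Keep in mind that compression can inflate the write number if the source is all zeros, and an immediate read-back will mostly hit the ARC, so a test file larger than RAM gives a more honest read figure.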

OK. So it's probably that you're subverting the ZFS process that I mentioned above. Aside from adding drives to or removing them from the pool, you shouldn't really be interacting with drives directly. With ZFS, read and write speeds depend on RAM and then disk speed. It's not like UFS, EXT, Reiser, or MD devices, where the speed depends on the controllers and drive RPMs because you're interacting with them at that level. Instead, you work with it more like it's LVM2 with pimped-out spinner rims and NOS in the trunk.
dd doesn't "subvert" anything related to ZFS. dd as used doesn't even care what the file system of the drive is. It's reading byte-for-byte directly from the device, completely bypassing the filesystem abstraction. In general, there's no problem with directly interacting with disks in this manner; you just have to understand what you're measuring.
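
In rough terms, a parallel read pass like that boils down to something like this for each drive (a simplification of what the script actually runs; the seek-stress pass layers extra seeks on top):

Code:
# Raw sequential read of each whole device in parallel,
# bypassing any filesystem entirely
dd if=/dev/ada0 of=/dev/null bs=1m &
dd if=/dev/ada1 of=/dev/null bs=1m &
dd if=/dev/ada2 of=/dev/null bs=1m &
dd if=/dev/ada3 of=/dev/null bs=1m &
wait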

Before I go any further, I have to comment on your ridiculous claims that ZFS is somehow completely different from any other filesystem in terms of disk speed. ZFS read and write speed is absolutely dependent on drive and controller speed. The distinction is that ZFS is sensitive to memory in a way that other file systems are not, such that insufficient memory can indeed slow down a ZFS pool in a way that would not slow down a single disk, but that is a far cry from claiming that ZFS read and write speeds are more dependent on RAM than disk speed. In other words, if you put SSDs in a machine without changing the memory, you can bet your britches that your pool is going to get faster.

If you need more than the memory can provide, that's where the ZIL and L2ARC come in. ZIL/SLOG is the "write cache" and L2ARC is the read cache. Both are usually on SSDs. ZIL usually writes the SLOG to the volume via copy-on-write. And depending on your redundancy setup, it's writing that data twice or more, which may impact performance. That's where setting up a SLOG device comes in; it reduces the amount of writes "at that time" that would have gone to the volume.

I would strongly suggest you re-read your sources, because your understanding of the ZIL/SLOG is warped at best, and wrong at worst. First off, a ZIL is not a write cache, in any sense of the word. It is an intent log. It is not about speed, but about data integrity. Every synchronous write to the pool is stored in the write cache (in memory), and first written to the ZIL. The writes are subsequently written from the write cache directly to disk as a transaction group (there are no writes directly from the ZIL to the pool, unless the machine loses power or other nastiness happens). If you add a SLOG, all you've done is move the ZIL to a separate device.

Beyond the obvious benefit of reducing the number of writes to your pool (instead of one or more writes to the ZIL, along with a corresponding transaction group write, there's now only the transaction group write), the higher speed of the SLOG means that the latency between a synchronous write request and the confirmation of the write to the client is much reduced. Also, the higher speed of the SLOG means that more data can be received by the SLOG from the client (compared to the same time with an on-pool ZIL), which means your write cache contains more data, which in turn means that each transaction group write actually includes more data. But I'll reiterate: every bit of synchronous data is written to both the ZIL and the pool. Adding a SLOG does not change that whatsoever. It just changes which underlying device receives the ZIL.
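
To make "it just changes which underlying device receives the ZIL" concrete: adding a SLOG is a single vdev operation (the pool and partition names below are made up):

Code:
# Attach a dedicated log (SLOG) device to an existing pool
zpool add tank log gpt/slog0

# Or, safer, a mirrored SLOG
zpool add tank log mirror gpt/slog0 gpt/slog1

# The log vdev then shows up under "logs" in the pool layout
zpool status tank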

Also, the need for a SLOG has nothing to do with the amount of memory you have in your system; instead, it has everything to do with your workload, and the speed of your pool. If you have few to no synchronous writes, or your synchronous writes don't need to be fast, then you probably don't need a SLOG. If you have a very fast pool capable of high I/O (i.e. an SSD pool), a SLOG is likely unnecessary, because your pool is fast enough to handle the ZIL. On the other hand, if you have a slow pool with lots of synchronous writes, it doesn't matter if you have multiple TB of memory: every synchronous write is written to the ZIL, and if your pool is slow, synchronous writes will be slow, and you'll benefit from a SLOG.
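
If you want to see how much synchronous writes matter for your workload, one rough way is to toggle the sync property on a test dataset and compare throughput (the dataset name below is hypothetical):

Code:
# Show the current setting (standard = honour sync requests from clients)
zfs get sync tank/testds

# Force every write to be synchronous (worst case for the ZIL path)
zfs set sync=always tank/testds

# Ignore sync requests entirely -- testing only; risks data loss on power failure
zfs set sync=disabled tank/testds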

Lastly, there's no "usually" when it comes to copy-on-write. Every write done in ZFS is copy-on-write, no matter where it comes from.
 

MrJK

Cadet

The most important question here is: what are you trying to measure? If you're trying to measure disk speed, then I would not perform the test from FreeNAS, but rather from something like PartedMagic that is designed to be completely hands-off with your drives. The irregularity of your tests indicates that there may be some activity in the background that is throwing off your results.

This is indeed the pertinent question. I was testing that the drives were OK to use and didn't need an RMA. I'm now happy that they are all OK, but I just wondered about the cause of the irregularities. I had assumed that there wouldn't be much going on in the background to disturb a fresh FreeNAS system with no data or pools, but perhaps I should have run that test script on a fresh FreeBSD install instead.

Thanks for your time.
 

Stux

MVP

I'm getting similar results when testing 8 new drives. One of the drives is 10-20% slower. Moving the drive amongst bays moves the slowness.

I will do some more testing, but I'm considering RMAing the slow drives, as I don't want one or two slow drives slowing down the whole array.
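
One rough way to do that testing (device names illustrative) is to time each drive on its own, so bus and controller contention is out of the picture:

Code:
# Read the first ~20 GB of each drive one at a time and compare
# the MB/s figure that dd reports for each
for d in ada0 ada1 ada2 ada3; do
    echo ${d}
    dd if=/dev/${d} of=/dev/null bs=1m count=20000
done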
 

Binary Buddha

Contributor

Nick2253 no can engrish

That's what I said... Granted, I was using general, non-specific statements meant not to confuse the crap out of someone who's new to ZFS. The links I posted, which were obviously intended to be read, stated the specifics that you did. Because they, like us, will eventually learn the specifics over time, and this isn't the Matrix. Yes, I know ZIL/SLOG isn't a "write cache".

But let's not make this about us, and both agree that the script being used is appropriate for the environment.
 

Nick2253

Wizard

That's what I said... Granted, I was using general, non-specific statements meant not to confuse the crap out of someone who's new to ZFS. The links I posted, which were obviously intended to be read, stated the specifics that you did. Because they, like us, will eventually learn the specifics over time, and this isn't the Matrix. Yes, I know ZIL/SLOG isn't a "write cache".
But it's not at all what you said. I have no problem with general statements, but not if those general statements are blatantly wrong or misleading. And that will confuse the crap out of someone who doesn't have a background in ZFS. I mean, if you know that a SLOG is not a write cache, why the heck did you say that it was a write cache!? More than that, you made definitive, inaccurate claims about how ZFS works. Accurate statements wouldn't have required any more effort or upped the difficulty of comprehension.

If you want to post links for further reading, that's great. Just don't paraphrase inaccurately.

Just so you know, this isn't about "us". It's about a responsibility that we have to pass on accurate information to those who seek help on these forums.
 