solnet-array-test

Back in the late '90s, I was managing a bunch of large whitebox storage servers. For the largest of these, I had the pleasure of building and deploying a massive storage server: 8 shelves of 9 drives each, Seagate ST173404LW 73GB drives, a whopping 5TB... (*grin*)

Part of the problem was burning in these systems, so I devised some shell scripty stuff that the hardware techs could use. I've become convinced that a variation on this would be helpful in the FreeNAS community, so I'm playing with a stripped-down version that (at the time of this writing) does some basic disk read testing. It is suitable for testing and burn-in use.

I've included just two main passes: a parallel read pass, and a parallel read pass with multiple accesses per disk. The script will do some rudimentary performance analysis and point out possible issues. It needs more work, but here it is anyway. This script is expected to be safe to run on a live pool, even though that's not a good idea for performance testing purposes. As with anything you download onto your machine, you are expected to verify its safety to your own satisfaction. Note that the only things that touch the disks are "dd" commands, and they're all structured as "dd if=/dev/${disk}", i.e., the disks are only ever read.
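For the skeptical, each read pass boils down to something like the following (a simplified sketch; the real script wraps this in timing and bookkeeping, and the block size shown here is just an assumption for illustration):

Code:
# the disk is strictly the *input* to dd; the output is discarded
dd if=/dev/${disk} of=/dev/null bs=1048576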

Link to the original version of the script. (Note that Xenforo breaks the ftp: link. Please do the obvious fix and change the http://ftp// bit to ftp:// )

Link to the current SCALE-compatible version (12/2022)

To run it, download it onto a FreeNAS box and execute it as root.
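For example, assuming you saved the script as solnet-array-test.sh (the filename here is illustrative):

Code:
# run as root; the script itself only issues read-only dd commands
sh solnet-array-test.sh

It will then give you a simple menu: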

Code:
sol.net disk array test v3

1) Use all disks (from camcontrol)
2) Use selected disks (from camcontrol|grep)
3) Specify disks
4) Show camcontrol list

Option:


You probably want to look at the disk list (option 4), then pick your target disks with option 2 and an appropriate pattern. For Seagate ST4000DM000 drives, for example, you could select on "ST4000".
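Under the hood, option 2 is just grepping the camcontrol device list, so you can preview what a pattern will match with roughly the equivalent manual command:

Code:
# list CAM-attached devices and filter by model string
camcontrol devlist | grep ST4000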

The test will run a variety of things and report status. It takes a while; be patient. It will never terminate on its own, as it is intended as a burn-in aid, but you do want to let it do its thing for at least a pass or two to get an idea of how your system performs.
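Since a pass over large disks can run for many hours or days, consider running it somewhere an SSH disconnect can't kill it. A sketch, assuming tmux is available on your build:

Code:
# start a detachable session and run the test inside it
tmux new -s burnin
sh solnet-array-test.sh
# detach with Ctrl-b d; reattach later with: tmux attach -t burnin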

It is best to do this while the system is not busy, and preferably before a pool is up and running. That said, it should be safe to use even on a busy filer. I've picked on a busy filer here to give an example of how this looks. Note that da14 is a spare drive, and you'll notice that all the other drives test much slower (because they're in use). Also note that my numbers here come from a testing mode that doesn't have the script read the entire disk; real results would look a bit different and take forever.

Code:
sol.net disk array test v3

1) Use all disks (from camcontrol)
2) Use selected disks (from camcontrol|grep)
3) Specify disks
4) Show camcontrol list

Option: 2

Enter grep match pattern (e.g. ST150176): ST4

Selected disks: da3 da4 da5 da6 da7 da8 da9 da10 da11 da12 da13 da14
<ATA ST4000DM000-1F21 CC52>  at scbus3 target 44 lun 0 (da3,pass5)
<ATA ST4000DM000-1F21 CC52>  at scbus3 target 45 lun 0 (da4,pass6)
<ATA ST4000DM000-1F21 CC52>  at scbus3 target 46 lun 0 (da5,pass7)
<ATA ST4000DM000-1F21 CC51>  at scbus3 target 47 lun 0 (da6,pass8)
<ATA ST4000DM000-1F21 CC51>  at scbus3 target 48 lun 0 (da7,pass9)
<ATA ST4000DM000-1F21 CC51>  at scbus3 target 49 lun 0 (da8,pass10)
<ATA ST4000DM000-1F21 CC52>  at scbus3 target 50 lun 0 (da9,pass11)
<ATA ST4000DM000-1F21 CC51>  at scbus3 target 51 lun 0 (da10,pass12)
<ATA ST4000DM000-1F21 CC52>  at scbus3 target 52 lun 0 (da11,pass13)
<ATA ST4000DM000-1F21 CC52>  at scbus3 target 53 lun 0 (da12,pass14)
<ATA ST4000DM000-1F21 CC52>  at scbus3 target 54 lun 0 (da13,pass15)
<ATA ST4000DM000-1F21 CC52>  at scbus3 target 55 lun 0 (da14,pass16)
Is this correct? (y/N): y
Performing initial serial array read (baseline speeds)
Tue Oct 21 08:21:23 CDT 2014
Tue Oct 21 08:26:47 CDT 2014
Completed: initial serial array read (baseline speeds)

Array's average speed is 97.6883 MB/sec per disk

Disk    Disk Size  MB/sec %ofAvg
------- ---------- ------ ------
da3      3815447MB     98    100
da4      3815447MB     90     92
da5      3815447MB     98    100
da6      3815447MB     97     99
da7      3815447MB     95     97
da8      3815447MB     82     84 --SLOW--
da9      3815447MB     87     89 --SLOW--
da10     3815447MB     84     86 --SLOW--
da11     3815447MB     97     99
da12     3815447MB     92     94
da13     3815447MB    102    104
da14     3815447MB    151    155 ++FAST++

Performing initial parallel array read
Tue Oct 21 08:26:47 CDT 2014
The disk da3 appears to be 3815447 MB.
Disk is reading at about 74 MB/sec
This suggests that this pass may take around 860 minutes

                   Serial Parall % of
Disk    Disk Size  MB/sec MB/sec Serial
------- ---------- ------ ------ ------
da3      3815447MB     98     86     88 --SLOW--
da4      3815447MB     90     74     82 --SLOW--
da5      3815447MB     98     82     84 --SLOW--
da6      3815447MB     97     91     95
da7      3815447MB     95     72     76 --SLOW--
da8      3815447MB     82     80     97
da9      3815447MB     87     84     96
da10     3815447MB     84    111    133 ++FAST++
da11     3815447MB     97    120    124 ++FAST++
da12     3815447MB     92    116    126 ++FAST++
da13     3815447MB    102    123    121 ++FAST++
da14     3815447MB    151    144     95

Awaiting completion: initial parallel array read
Tue Oct 21 08:39:32 CDT 2014
Completed: initial parallel array read

Disk's average time is 741 seconds per disk

Disk    Bytes Transferred Seconds %ofAvg
------- ----------------- ------- ------
da3          104857600000     743    100
da4          104857600000     764    103
da5          104857600000     752    101
da6          104857600000     737     99
da7          104857600000     748    101
da8          104857600000     754    102
da9          104857600000     738    100
da10         104857600000     762    103
da11         104857600000     748    101
da12         104857600000     756    102
da13         104857600000     740    100
da14         104857600000     653     88 ++FAST++

Performing initial parallel seek-stress array read
Tue Oct 21 08:39:32 CDT 2014
The disk da3 appears to be 3815447 MB.
Disk is reading at about 58 MB/sec
This suggests that this pass may take around 1093 minutes

                   Serial Parall % of
Disk    Disk Size  MB/sec MB/sec Serial
------- ---------- ------ ------ ------
da3      3815447MB     98     52     53
da4      3815447MB     90     48     53
da5      3815447MB     98     50     51
da6      3815447MB     97     50     52
da7      3815447MB     95     48     50
da8      3815447MB     82     48     59
da9      3815447MB     87     54     62
da10     3815447MB     84     47     56
da11     3815447MB     97     49     50
da12     3815447MB     92     50     55
da13     3815447MB    102     49     48
da14     3815447MB    151     52     34

Awaiting completion: initial parallel seek-stress array read
Author: jgreco
Latest updates

I'm not sure how this Resource "update" thingy works. I guess we'll find out.

Merry Christmas to all you believer-in-Santa-Claus folks, Happy Holidays to everyone else, and...

Latest reviews

Thanks for your tool, @jgreco! It helped me pinpoint a bottleneck in my setup – a single connector between the HBA and a 24-port backplane (BPN-SAS3-846EL2). The issue was resolved by connecting two SFF-8643 cables from the SAS3008/9300-8i running firmware 16.00.12.00 to the primary and secondary expander ports on the same backplane. This resolved the "--SLOW--" indication in the parallel array read test upon retesting. For reference, this 24-port backplane has 16x 20TB Seagate Exos X20 ST20000NM007D-3DJ103 SATA 6Gb/s Standard Model FastFormat (512e/4Kn) disks attached to it.
jgreco:
The parallel stress test is actually one of the main goals of the tool, and is meant to bring your awareness to bottleneck issues that might otherwise have gone unnoticed. Funny that all these years later, with completely different bus topologies, we still have these problems...
Great tool; it definitely gives quite some level of confidence.
Just be ready: it might take quite some time to complete.
As a reference: a 10x12TB set of disks on a Xeon system ran for ~7 days before the test completed.
But then you have a good understanding of the setup.
Thanks
Thanks for this great tool.
Best read-only array test