
solnet-array-test Discussion Thread

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
As highlighted in earlier posts, you can't view and download this script in an FTP app.

Only half true; it cannot be *seen* in an FTP app, because the directory permissions are -wx (no read bit, so no listing). Your FTP app ought to be able to download it just fine (because of the x).
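
A quick illustration of the permission semantics (hypothetical directory and file names, not the actual layout on ftp.sol.net):

Code:
mkdir /tmp/dropbox
echo "test" > /tmp/dropbox/known-name.txt
chmod 0333 /tmp/dropbox           # -wx: write and execute, no read bit
ls /tmp/dropbox                   # fails: no read bit means no listing
cat /tmp/dropbox/known-name.txt   # works: the x bit allows traversal by exact name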

wget and fetch do not work on windows.

Also only half true:

Code:
C:\Users\jgreco\Downloads>wget ftp://ftp.sol.net/incoming/solnet-array-test-v3.sh
--2023-11-19 06:38:44--  ftp://ftp.sol.net/incoming/solnet-array-test-v3.sh
           => ‘solnet-array-test-v3.sh’
Resolving ftp.sol.net (ftp.sol.net)... 206.55.64.92
Connecting to ftp.sol.net (ftp.sol.net)|206.55.64.92|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /incoming ... done.
==> SIZE solnet-array-test-v3.sh ... 14846
==> PASV ... done.    ==> RETR solnet-array-test-v3.sh ... done.
Length: 14846 (14K) (unauthoritative)

100%[==============================================================================>] 14,846      --.-K/s   in 0s

2023-11-19 06:38:45 (37.5 MB/s) - ‘solnet-array-test-v3.sh’ saved [14846]


wget works just fine; go install Cygwin. I believe I also saw a version of fetch for Cygwin at one point. Windows also eventually shipped an FTP client of its own, over in \windows\system32\ftp.exe.
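
For the curious, a rough sketch of the same download with that built-in client (session abbreviated; note that ftp.exe speaks only active-mode FTP, which some firewalls block):

Code:
C:\Users\jgreco\Downloads>ftp ftp.sol.net
User (ftp.sol.net:(none)): anonymous
ftp> binary
ftp> cd incoming
ftp> get solnet-array-test-v3.sh
ftp> bye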

You still get bonus points for the PowerShell thing. :smile:
 

logan893

Dabbler
Joined
Dec 31, 2015
Messages
44
@jgreco Great script, thanks! I've been using it on several new systems, for quick benchmarking and for longer-duration stress testing of new and old drives alike.

Regarding the seek-heavy test, the suggested time estimate is for a single pass of "dd", while the test actually runs 6 in parallel for each drive. Scaling the estimate up by at least a factor of 6 would be prudent. Something simple like making the first argument a multiplication factor works:

Code:
approximatepasstime() {
    # multiplication factor as first argument, followed by one or more disks
    disksize=`getdisksize "${2}"`
    if [ "${disksize}" -gt 0 ]; then
        echo "The disk ${1} appears to be ${disksize} MB.        "
        speed=`samplediskspeed "${2}"`
        echo "${disksize} ${speed} ${1}" | awk '{if ($2 != 0) {speed=$2} else {speed=1}; printf "Disk is reading at about %0.0f MB/sec        \nThis suggests that this pass may take around %0.0f minutes\n", speed, $1 / speed / 60 * $3}'
    else
        echo "Unable to determine disk ${2} size from dmesg file (not fatal but odd!)"
    fi
}

# parallel read
approximatepasstime 1 ${disklist}
# seek heavy read
approximatepasstime 6 ${disklist}


Additionally, the estimates disregard the somewhat unpredictable way read speeds drop over the span of a drive. A quick read sample near the end of the drive could be used to interpolate the average over a full pass; see the sketch below. The estimate also only looks at the first drive, while there may be larger and/or slower drives in the array. I understand if you don't feel the need to dump a lot of code in here, though, and many (most?) arrays will be quite homogeneous, so it's not too difficult to extrapolate yourself from the data provided.
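
A hedged sketch of such an end-of-disk sample (the 1 GB sample size is an arbitrary choice; getdisksize is the script's existing helper, and da0 is just an example device):

Code:
disk=da0
disksize=`getdisksize "${disk}"`     # size in MB, via the script's helper
skip=$((disksize - 1024))            # start reading ~1 GB before the end
end_speed=`dd if=/dev/${disk} of=/dev/null bs=1m skip=${skip} count=1024 2>&1 | awk -F'[()]' '/bytes\/sec/ {print $2}'`
echo "End-of-disk read rate: ${end_speed}"
# Averaging this with the start-of-disk sample gives a rough whole-pass figure.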

(and there's a typo: "thhis")
Code:
This next test attempts to read all devices while forcing seeks.
This is primarily a stress test of your hard disks.  It does thhis
by running several simultaneous dd sessions on each disk.


Performing initial parallel seek-stress array read
Wed Jan  3 17:25:52 CET 2024
The disk da0 appears to be 9537536 MB.
Disk is reading at about 217 MB/sec
This suggests that this pass may take around 733 minutes


                   Serial Parall % of
Disk    Disk Size  MB/sec MB/sec Serial
------- ---------- ------ ------ ------
da0      9537536MB    243    216     89
da1     15259648MB    211    166     79
da2     15259648MB    218    168     77
da3      9537536MB    244    217     89


Awaiting completion: initial parallel seek-stress array read



If anyone is interested in a progress snapshot from at least one instance of "dd" per drive: the first dd started for each drive has its stderr captured, and a non-interrupting kill signal can be sent to all "dd" processes to trigger the output below.

Use kill -INFO on TrueNAS CORE (BSD) or kill -USR1 on TrueNAS SCALE (Linux).

Code:
root@truenas[~]# kill -INFO $(pgrep ^dd$)
root@truenas[~]# cat /tmp/sat.da0.err /tmp/sat.da1.err /tmp/sat.da2.err /tmp/sat.da3.err
5598014+0 records in
5598014+0 records out
5869943128064 bytes transferred in 143745.645171 secs (40835624 bytes/sec)
4842212+0 records in
4842212+0 records out
5077427290112 bytes transferred in 143745.678217 secs (35322295 bytes/sec)
4761574+0 records in
4761574+0 records out
4992872218624 bytes transferred in 143745.704238 secs (34734062 bytes/sec)
5496479+0 records in
5496479+0 records out
5763475963904 bytes transferred in 143745.888619 secs (40094893 bytes/sec)
root@truenas[~]#
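
On SCALE, the equivalent would presumably be (untested sketch; same capture files, whatever the Linux device names turn out to be):

Code:
root@truenas[~]# kill -USR1 $(pgrep '^dd$')
root@truenas[~]# cat /tmp/sat.*.err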
 

bent98

Contributor
Joined
Jul 21, 2017
Messages
171
I am new to this script and a noob when it comes to Unix. I ran the script and selected 1 pass rather than burn-in, to test it out. It should be done by now, as it estimated 19 hours for all disks. I pulled the cover off my server and all hard drives seem to be idle (meaning the heads are not moving). How do I know when the test is over? As per my shell prompt in the screenshot, it's still showing "Awaiting completion".


[Screenshot: shell still at "Awaiting completion"]
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
Possibly your shell timed out. Don't use the web UI shell. Use SSH and screen or tmux to keep your process running.
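
A minimal sketch (the session name is an arbitrary choice):

Code:
tmux new -s arraytest            # start a named session over SSH
./solnet-array-test-v3.sh        # run the test inside it
# detach with Ctrl-b d; the test keeps running if SSH drops
tmux attach -t arraytest         # reattach later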
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
Log in again via SSH and reconnect to the tmux/screen session you left. The results of your last run are gone - unless that script writes a result log file, which I don't know.
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222

nasenbaer

Cadet
Joined
Feb 21, 2024
Messages
9
Thanks for the awesome script. I read the whole thread and just want to make sure: is it normal that the HDDs get a lot slower after some time? I guess it's because different parts of the disk are being read, which are slower to get to? Or do I have an overheating problem with my HBA (the card is around body temperature, meaning not too hot to touch)?
All four HDDs show the same graph/speed, down from 280 MB/s to 170 MB/s. Disk temperature never moved from around 35 °C.

[Screenshot: read speed of all four disks falling from ~280 MB/s to ~170 MB/s]
 

dxun

Explorer
Joined
Jan 24, 2016
Messages
52
I read the whole thread and just want to make sure: is it normal that the HDDs get a lot slower after some time?

It certainly appears so, and it is expected: platters spin at constant RPM while outer tracks hold more sectors per revolution, so sequential throughput naturally falls toward the inner (later) part of the disk. Here's a Seagate Exos X22 20 TB drive that I've been stress testing for a week now.

[Attachment: read-speed graph for the Exos X22]


It's been doing this seek-stress read operation for a full week now.

[Attachment: screenshot of the seek-stress read still in progress]
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
So how do I find out the results?
Use redirection:
tmux './solnet-array-test-v3.sh | tee test_result.txt'
(quoting the pipeline so that tee runs inside the tmux session)
You can then follow the progress in your SSH session, or detach and let the test run.
If you want to come back:
tmux attach
Or just read the log-file-in-progress:
more test_result.txt
Expect the test to take much more time than the estimate, though.

(You can even launch the session from the web shell…)
 