HDD Spindown Timer

HDD Spindown Timer 2.2.0

MikeyG

Patron
Joined
Dec 8, 2017
Messages
442
One more note: this error only occurs when the drives are already in standby. If I wake them up between timeout periods, they simply spin down again with no error, so my guess is that something your script executes at the end of the timeout on drives in standby is causing the error. Is it possible you are querying data that is unavailable while drives are in standby?

Code:
root@nas:/mnt/Main/Files/Software/Scripts # ./spindown_timer.sh -m -t 60 -p 15 -v -i da16 -i da17 -i da18 -i da19
Monitoring drives with a timeout of 60 seconds: da16 da17 da18 da19
I/O check sample period: 15 sec
2019-10-18 15:34:28 Drive timeouts: [da16]=60 [da17]=60 [da18]=60 [da19]=60
2019-10-18 15:34:43 Drive timeouts: [da16]=45 [da17]=45 [da18]=45 [da19]=45
2019-10-18 15:34:58 Drive timeouts: [da16]=30 [da17]=30 [da18]=30 [da19]=30
2019-10-18 15:35:13 Drive timeouts: [da16]=15 [da17]=15 [da18]=15 [da19]=15
2019-10-18 15:35:29 Spun down idle drive: da16
2019-10-18 15:35:30 Spun down idle drive: da17
2019-10-18 15:35:31 Spun down idle drive: da18
2019-10-18 15:35:31 Spun down idle drive: da19
2019-10-18 15:35:31 Drive timeouts: [da16]=60 [da17]=60 [da18]=60 [da19]=60
2019-10-18 15:35:46 Drive timeouts: [da16]=45 [da17]=45 [da18]=45 [da19]=45
2019-10-18 15:36:01 Drive timeouts: [da16]=30 [da17]=30 [da18]=30 [da19]=30
2019-10-18 15:36:17 Drive timeouts: [da16]=15 [da17]=15 [da18]=15 [da19]=15
2019-10-18 15:36:32 Spun down idle drive: da16
2019-10-18 15:36:33 Spun down idle drive: da17
2019-10-18 15:36:34 Spun down idle drive: da18
2019-10-18 15:36:34 Spun down idle drive: da19
2019-10-18 15:36:34 Drive timeouts: [da16]=60 [da17]=60 [da18]=60 [da19]=60
 

ngandrass

Explorer
Joined
Jun 26, 2019
Messages
68
If I'm understanding "iostat call" correctly, you basically run iostat for the polling period (by default 600 seconds) and then see if any drive activity happened during that 600-second period to determine whether the drive is idle. As soon as that polling period ends, you get the results, update the idle status, spin down if needed, and then immediately start another polling period (or what I'm thinking of as a sampling period).
Yes, that's exactly how the script works :)
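For anyone reading along, a minimal sketch of that loop for a single drive (a simplified illustration only, with placeholder names like POLL_TIME and DRIVE; the actual script handles multiple drives and per-drive timeouts):

Code:
# Simplified single-drive sketch, not the actual script logic.
POLL_TIME=600   # polling period in seconds (placeholder value)
DRIVE=ada0      # placeholder device name

# Request two iostat reports: the first covers the time since boot,
# the second covers exactly the polling window. With -z, devices
# that saw no I/O are omitted from a report.
SECOND_REPORT=$(iostat -x -z -d -w "$POLL_TIME" -c 2 "$DRIVE" | tail -n +4)

if ! echo "$SECOND_REPORT" | grep -q "$DRIVE"; then
    echo "$DRIVE was idle during the last $POLL_TIME seconds"
fi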

I've included the following from running camcontrol on one of the disks. Results are the same whether the drive is active or in standby.
[...]
The errors are fairly easy to trigger by specifying a short timeout and a short polling period. As soon as the timeout counter runs down and the script polls, I get the error.
Okay, so the camcontrol identify call correctly detects your drives as ATA drives. The script itself must also recognize your drives as ATA, since the error messages show a problem while issuing the epc subcommand, which is only issued for ATA drives. It is used in line 172 to determine whether the drive is currently spinning or already spun down.

This, however, doesn't explain why you are getting SCSI errors while issuing ATA commands... Maybe there is something weird going on with the SAS controller? I did a quick search, and most of the results I found pointed towards hardware problems that some people were able to fix by reseating the SATA cables.

Furthermore, the error seems to happen on only some of your drives (2 errors for 4 drives?). The SCSI command that appears to be issued is, according to the SCSI specification, a general INQUIRY (see 3.6) requesting just an overview of the available VPD pages (see 5.4.1), but nothing that requests information about the current power condition of the drive.

I must admit that I'm a bit stumped at this point and not quite sure how I can help you :/

One more note: this error only occurs when the drives are already in standby. If I wake them up between timeout periods, they simply spin down again with no error, so my guess is that something your script executes at the end of the timeout on drives in standby is causing the error. Is it possible you are querying data that is unavailable while drives are in standby?
After each polling period the following commands are issued:
  • camcontrol identify $DRIVE (all drives)
  • camcontrol epc $DRIVE -c status -P (ATA drives only)
  • camcontrol standby $DRIVE (ATA drives only, only if command above indicates that the drive is NOT already spun down)
The identify and epc subcommands are executed regardless of the drive's power condition. Therefore it must be a problem with one of those.

Have you tried manually spinning down the drives through camcontrol standby $DRIVE and reading their power condition while spun down? It might even be a good idea to try this on multiple drives, since the error seems to happen on only some of them.
Code:
$ camcontrol standby $DRIVE
$ camcontrol identify $DRIVE
$ camcontrol epc $DRIVE -c status -P

My current guess would be that your drives don't like the camcontrol identify call while spun down.
 

MikeyG

Patron
Joined
Dec 8, 2017
Messages
442
I've tried issuing those commands to all drives while spun down and cannot manually replicate the issue.

I was, however, able to replicate the error by running the following in a script:

Code:
camcontrol identify da16
camcontrol epc da16 -c status -P
camcontrol standby da16

camcontrol identify da17
camcontrol epc da17 -c status -P
camcontrol standby da17


It also appears that camcontrol identify sometimes wakes the drives up. If I put one into standby, I can verify it's in standby, but running identify wakes it (though not always).

Changing my script to the following, with a 5-second sleep, prevents the drives from waking up and prevents any errors:

Code:
camcontrol identify da16
sleep 5
camcontrol epc da16 -c status -P
camcontrol standby da16

camcontrol identify da17
sleep 5
camcontrol epc da17 -c status -P
camcontrol standby da17
 

MikeyG

Patron
Joined
Dec 8, 2017
Messages
442
Changing line 162 to be:

Code:
if [[ -z $(sleep 3; camcontrol epc $1 -c status -P | grep 'Standby') ]]; then echo 1; else echo 0; fi


with sleep 3 seems to have fixed it.
 

ngandrass

Explorer
Joined
Jun 26, 2019
Messages
68
I was, however, able to replicate the error by running the following in a script:
[...]
It also appears that camcontrol identify sometimes wakes the drives up. If I put one into standby, I can verify it's in standby, but running identify wakes it (though not always).
[...]
Changing line 162 to be:
Code:
if [[ -z $(sleep 3; camcontrol epc $1 -c status -P | grep 'Standby') ]]; then echo 1; else echo 0; fi

with sleep 3 seems to have fixed it.

That's some odd behavior... Thanks for the tests!

I wouldn't add a sleep instruction at this point, since it introduces a delay of at least 3 * N_DRIVES seconds during which I/O would be ignored. Furthermore, that wouldn't fix the wake problem you reported. However, it should be possible to do all camcontrol identify calls once during script startup and thereby remove them completely from in-between polling periods. I'll write a patch in the next few days once I find some time for it and get back to you here :)
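The rough shape of what I have in mind, as a sketch (names like DRIVES and DRIVE_IS_ATA are made up for illustration, not taken from the actual patch):

Code:
# Rough sketch of the planned change, not the actual patch.
declare -A DRIVE_IS_ATA

# Classify every drive exactly once at startup...
for drive in "${DRIVES[@]}"; do
    if camcontrol identify "$drive" > /dev/null 2>&1; then
        DRIVE_IS_ATA[$drive]=1   # ATA drive: check status via epc
    else
        DRIVE_IS_ATA[$drive]=0   # SCSI drive: check status via modepage
    fi
done

# ...so that the polling loop only issues the epc/modepage status
# check and never touches camcontrol identify again.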
 

ngandrass

Explorer
Joined
Jun 26, 2019
Messages
68
@mgittelman I updated the script to do the ATA/SCSI detection through camcontrol identify once at startup, but for the moment without a delay. This completely removes all camcontrol identify calls from in-between the polling periods and leaves only the camcontrol epc / camcontrol modepage calls to detect whether a drive is already spun down. Can you check if this fixes your problem before I create a release? Thanks!

Updated script: https://github.com/ngandrass/freena...6f82c6af11d0f3670acb36244d3/spindown_timer.sh
Changes: https://github.com/ngandrass/freenas-spindown-timer/commit/3d4804f75efd66f82c6af11d0f3670acb36244d3
 

MikeyG

Patron
Joined
Dec 8, 2017
Messages
442
@ngandrass I've been testing your updated script for the last 10 minutes and it looks good - I am unable to reproduce the error. Thank you for making those changes and for creating this resource!
 

ngandrass

Explorer
Joined
Jun 26, 2019
Messages
68
@ngandrass I've been testing your updated script for the last 10 minutes and it looks good - I am unable to reproduce the error. Thank you for making those changes and for creating this resource!
Perfect! I also tested the script for 24 hours on my system without problems. Therefore I'm releasing the new version now.

Thanks again for testing :)
 

loblawbob

Cadet
Joined
Oct 26, 2019
Messages
2
I'm seeing three entries when checking with ps -aux:
Code:
root       4013   0.0  0.0   7840   3716 v0  I+   21:01    0:00.00 bash /usr/spindown_timer.sh -m -t 5400 -p 600 -i ada1 -i ada3 -i ada4
root       4028   0.0  0.0   7840   3728 v0  I+   21:01    0:00.00 bash /usr/spindown_timer.sh -m -t 5400 -p 600 -i ada1 -i ada3 -i ada4
root       4029   0.0  0.0   7840   3716 v0  I+   21:01    0:00.00 bash /usr/spindown_timer.sh -m -t 5400 -p 600 -i ada1 -i ada3 -i ada4


Is this correct, or did I make a mistake during configuration? I'm pretty new to FreeNAS.

Also, does it make a difference which value is chosen for the selected disks in "Adv. Power Management" in "Storage - Disks"?

Thank you for making this script!
 

ngandrass

Explorer
Joined
Jun 26, 2019
Messages
68
I'm seeing three entries when checking with ps -aux
[...]
Is this correct, or did I make a mistake during configuration? I'm pretty new to FreeNAS.

This is perfectly fine. The behavior is based on the way the script retrieves information about the disk I/O :)
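To illustrate the effect with a contrived standalone example (not code from the spindown script): bash forks a child shell for a command substitution that contains more than a single command, and every forked child shows up in ps under the same script name. Since the script spends its time waiting inside such substitutions while sampling I/O, multiple entries appear:

Code:
#!/usr/bin/env bash
# demo.sh - contrived example, not part of the spindown script.
# While the command substitution below runs, ps shows a second
# "bash demo.sh" entry for the forked child shell.
RESULT=$(sleep 60; echo "done")
echo "$RESULT"

Run it and check ps aux | grep demo.sh from another terminal to see the duplicate entries.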

Also, does it make a difference which value is chosen for the selected disks in "Adv. Power Management" in "Storage - Disks"?
I'd stick to values of at least 128 ("Minimum power usage without Standby (no spindown)") to prevent FreeNAS from interfering with the script's operation.
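If you want to check or set the APM level from a shell, camcontrol also has an apm subcommand. A small sketch (ada1 is just an example device; note that a level set this way may not persist across reboots the way the GUI setting does):

Code:
# ada1 is an example device name.
camcontrol apm ada1 -l 128                            # set APM level 128
camcontrol identify ada1 | grep -i "power management" # verify the level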
 

MikeyG

Patron
Joined
Dec 8, 2017
Messages
442
@ngandrass Just a feature request for when you get around to it: it would be cool if your script could determine which drives are SSDs and which are regular spinners, and then only try to spin down the spinners (if it doesn't already do something like that). Specifying them manually with the options you provide is a good workaround, but I have one system that is mostly SSDs plus a couple of HDDs, and I swap the HDDs from time to time, which changes the command I'd have to run, if that makes sense.
 

ngandrass

Explorer
Joined
Jun 26, 2019
Messages
68
@ngandrass Just a feature request for when you get around to it: it would be cool if your script could determine which drives are SSDs and which are regular spinners, and then only try to spin down the spinners (if it doesn't already do something like that). Specifying them manually with the options you provide is a good workaround, but I have one system that is mostly SSDs plus a couple of HDDs, and I swap the HDDs from time to time, which changes the command I'd have to run, if that makes sense.

I threw an SSD into my FreeNAS box and did a quick test. First, I verified that camcontrol identifies the disk as an SSD by checking the camcontrol identify output for "media RPM non-rotating". I then ran an instance of the spindown script to check whether it throws errors when operating on non-rotating disks.

The result was that the script handled the SSD like any other HDD, without any errors. It should therefore just work out of the box for you, I guess? Or am I missing something here?
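Should automatic filtering ever become necessary, the detection itself would be straightforward. A sketch based on the check described above (not part of the current script):

Code:
# Sketch only, not part of the current script.
# SSDs report "media RPM    non-rotating" in the camcontrol
# identify output, while spinners report an actual RPM value.
is_ssd() {
    camcontrol identify "$1" 2>/dev/null | grep -q "non-rotating"
}

for drive in da16 da17; do   # example device names
    if is_ssd "$drive"; then
        echo "$drive appears to be an SSD, skipping spindown"
    fi
done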

Note: If you change drives you have to restart the script anyway since the device identifiers change.
 

MikeyG

Patron
Joined
Dec 8, 2017
Messages
442
Thanks @ngandrass. No actual errors, so I'm sure you are right and it all works fine. I was just concerned that there might be an issue with trying to spin down SSDs, but I guess it doesn't matter.
 

GreaseMonkey88

Dabbler
Joined
Dec 8, 2019
Messages
27
Great script - it fills a big gap in FreeNAS! Thanks for your effort!

Is the current script compatible with FreeNAS 11.3? I think the update to 11.3 somehow deleted my script (it was in the root user's home folder).
 

ngandrass

Explorer
Joined
Jun 26, 2019
Messages
68
Great script - it fills a big gap in FreeNAS! Thanks for your effort!

Is the current script compatible with FreeNAS 11.3? I think the update to 11.3 somehow deleted my script (it was in the root user's home folder).

Thanks! The script should be compatible with FreeNAS 11.3, as I found no deal-breakers in the release notes. However, I haven't found time to update my test system yet to properly confirm it. I'll try to find some time at the end of next week and get back to you afterwards :)

Regarding the deleted script: that sounds strange... Maybe /root gets cleared during updates? I'd suggest placing the script on the same drive your system dataset resides on.
 

ngandrass

Explorer
Joined
Jun 26, 2019
Messages
68
Is the current script compatible with FreeNAS 11.3? I think the update to 11.3 somehow deleted my script (it was in the root user's home folder).

The script should be compatible with FreeNAS 11.3, as I found no deal-breakers in the release notes. However, I haven't found time to update my test system yet to properly confirm it. I'll try to find some time at the end of next week and get back to you afterwards :)


I upgraded my test instance to FreeNAS 11.3 and was able to confirm that the script works as expected. Moreover, I wasn't able to observe any problems with deleted script files; after the upgrade, everything worked out of the box for me. If you are able to reproduce the problem, please let me know.
 