Fan Scripts for Supermicro Boards Using PID Logic

Fan Scripts for Supermicro Boards Using PID Logic 2020-08-20, previous one was missing a file

spinpid2.sh has a few minor changes. Again, if your current version is working, no need to change.
  1. Changed the timing of the mismatch test (which checks whether fan RPMs are way off from the current duty) so it now happens right after fan data are read. This avoids false mismatch findings caused by conditions changing during the CPU cycles.
  2. Rearranged some code so it's easier to maintain.
  3. Edited some comments so they make more sense (at least to me).
  4. The previous configuration file, from 2020-06-17, is still compatible with this update, so no need to update the config. The new config has some minor changes:
    1. hopefully improved the explanation of the HOW_DUTY setting.
    2. changed default location of log files to the parent directory of the one where spinpid2 resides
There are no changes to the single-zone script or config.
This is a minor update with no changes to the actual fan control logic. If the fan scripts are working fine, you can safely ignore this. There are some changes to make the scripts compatible with running in a virtual machine, and with SAS drives. Changes:
  1. Moved the IPMITOOL definition to the config file for spinpid.sh and spinpid2.sh (the other scripts don't use a config file). If you are running in a VM, this makes it easier to add your IP, user, and password and have them survive updates. This is the only real change to the configs. You don't need to redo the whole config; you can just copy the IPMITOOL definition from the new config to your existing one.
  2. Made compatible with SAS drives, at least those with Hitachi/HGST brand.
  3. Added more exclusions to device list so we are just reading spinning drives.
  4. For reading CPU temp, we now use the best method available. If sysctl is available, we use that: read all the cores and use the hottest (this is what we did unconditionally before). Otherwise we use the ipmitool sensor reading, which may be the only method available in a VM.
  5. Fixed formatting with a leading 0 in spinpid2.sh.
  6. Added conditional calls to optional user-defined functions after DRIVES_check_adjust() and CPU_check_adjust() (spinpid2.sh only). See barbierimc's post for example.
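The CPU temperature fallback described in item 4 could look roughly like the sketch below. This is not the code from spinpid2.sh: the function name cpu_temp, the sensor name "CPU Temp", the assumed layout of the ipmitool output ("Sensor Reading : 45 ..."), and the remote-access example in the comment are all illustrative assumptions.

```shell
# IPMITOOL would normally be defined in the config file. For a VM it might be
# something like (placeholders, not real values):
#   IPMITOOL="ipmitool -I lanplus -H <BMC IP> -U <user> -P <password>"
IPMITOOL=${IPMITOOL:-ipmitool}

# Print the CPU temperature in whole degrees C.
cpu_temp() {
    # Preferred: per-core temperatures from sysctl, e.g.
    #   dev.cpu.0.temperature: 41.0C
    ct_temps=$(sysctl -a 2>/dev/null | grep -E '^dev\.cpu\.[0-9]+\.temperature:')
    if [ -n "$ct_temps" ]; then
        # Strip everything but the integer part, then keep the hottest core.
        printf '%s\n' "$ct_temps" | sed 's/.*: //; s/\..*//' | sort -n | tail -n 1
    else
        # Fallback (e.g. in a VM): ask the BMC. Sensor name is an assumption.
        $IPMITOOL sensor get "CPU Temp" | awk '/Sensor Reading/ { print $4 }'
    fi
}
```

Because sysctl is tried first, a bare-metal install behaves as before, and only systems where the per-core values are missing fall through to the BMC reading.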
Changes for version 2019-11-01 since 2019-10-04. All scripts changed a bit. Here are bigger changes:

A. The two fan control scripts are now set up for configuration files, which should go in the same directory as the script. This way it is easier to install new script versions; you won't have to fiddle with settings unless there are new ones.
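As a rough illustration of how a script can pick up a config file from its own directory, here is a minimal sketch. The helper name load_config and the naming rule (script name with .sh replaced by .config) are assumptions for the example, not necessarily what the scripts do.

```shell
# Source "<script dir>/<script name>.config", e.g. spinpid2.sh -> spinpid2.config.
# Returns non-zero if the config file is missing or unreadable.
load_config() {
    lc_script=$1
    lc_config="$(dirname "$lc_script")/$(basename "$lc_script" .sh).config"
    if [ ! -r "$lc_config" ]; then
        echo "Missing config file: $lc_config" >&2
        return 1
    fi
    # Settings in the config become ordinary shell variables in the script.
    . "$lc_config"
}

# Typical use near the top of a script:
#   load_config "$0" || exit 1
```

Keeping the settings in a separate file is what lets you drop in a new script version without re-entering them.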

B. In spinpid2.sh, added a setting to tell the script how to determine what the duty cycles (%) are. The options are to assume they are whatever the script last set (as recent versions have done) or to read them from the board. Caveats and tips are in the config file above the setting.

C. In spinpid2.sh, there is a completely new and more elaborate approach to the whole BMC reset thing. I've tested as much as I can on my one-zone board, but will be looking for some feedback.
  1. First, each major cycle there is a mismatch test to see if either zone is way off the setting, either too high or too low.
  2. If mismatch, it will try to force the offending zone back into line. Then it reads fan data again, and repeats the mismatch test.
  3. If there is still a mismatch, reset the BMC, wait for it to come back (2 minutes), force-set fans again, read fan data again, and repeat mismatch test.
  4. If still a mismatch, go through one more time (force set fans, read fan data, repeat mismatch test) then give up and move on with the script.

If things are really screwed up and this can't fix them, the BMC will reset again on the next major cycle (5 minutes or so). I don't see any point in killing the script in that case, as I don't think that will help. As always, you need to have your fan settings correct based on running spintest.sh; they are what the script uses to decide whether fan speeds are appropriate or a BMC reset is needed.
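The escalation in steps 1 to 4 can be sketched as follows. This is an illustration under stated assumptions, not the spinpid2.sh code: all function and variable names are hypothetical, the hooks are empty placeholders that a real script would fill with ipmitool calls, and the mismatch rule (expected RPM is duty percent of maximum RPM, with a tolerance of 15% of maximum) is a guess at a reasonable test.

```shell
MAX_RPM=${MAX_RPM:-1500}     # from spintest.sh results (example value)
BMC_WAIT=${BMC_WAIT:-120}    # seconds to wait for the BMC after a reset
FAN_DUTY=${FAN_DUTY:-50}     # duty (%) the script believes is set
FAN_RPM=${FAN_RPM:-750}      # RPM last read from the board

# True (0) if RPM is further from the expected value than the tolerance.
# Args: duty% rpm max_rpm [tolerance%]
is_mismatch() {
    mm_expected=$(( $1 * $3 / 100 ))
    mm_diff=$(( $2 - mm_expected ))
    if [ "$mm_diff" -lt 0 ]; then mm_diff=$(( 0 - mm_diff )); fi
    [ "$mm_diff" -gt $(( ${4:-15} * $3 / 100 )) ]
}

# Placeholder hooks; real versions would call ipmitool.
read_fan_data()   { :; }   # would refresh FAN_RPM (and FAN_DUTY if read)
force_set_fans()  { :; }   # would re-send the duty to the offending zone
reset_bmc()       { :; }   # would issue a BMC cold reset
fans_mismatched() { is_mismatch "$FAN_DUTY" "$FAN_RPM" "$MAX_RPM"; }

# Returns 0 once fans match the setting, 1 if we give up for this cycle.
correct_mismatch() {
    read_fan_data
    fans_mismatched || return 0      # step 1: nothing wrong
    force_set_fans; read_fan_data
    fans_mismatched || return 0      # step 2: forcing the duty fixed it
    reset_bmc; sleep "$BMC_WAIT"
    force_set_fans; read_fan_data
    fans_mismatched || return 0      # step 3: BMC reset fixed it
    force_set_fans; read_fan_data
    fans_mismatched || return 0      # step 4: one last try
    return 1                         # give up; the script carries on
}
```

Returning a status instead of exiting matches the point above: a failed correction is simply retried on the next major cycle.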

I hope this is the last change for a while, but that depends on feedback.
Discovered that BMC reset on my machine takes 105 seconds, so lengthened the wait after reset from 60 to 120 sec.
In both scripts:
  1. removed Ki and I (integral) term
  2. code tweaks to marginally improve efficiency
  3. added option to output to log only vs. log + console
  4. revised tuning advice at end of scripts to make it easier
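With the integral term removed, the correction reduces to proportional plus derivative. A minimal sketch, assuming the error is temperature minus setpoint and the derivative is taken as the change in error between cycles; the function name pd_correction and the gains in the usage comment are hypothetical, not the scripts' actual code or defaults.

```shell
# Print the duty correction (whole percent) for the current cycle.
# Args: Kp Kd current_error previous_error
pd_correction() {
    awk -v kp="$1" -v kd="$2" -v err="$3" -v prev="$4" \
        'BEGIN { printf "%.0f\n", kp * err + kd * (err - prev) }'
}

# Example use inside a control loop (values are illustrative only):
#   new_duty=$(( duty + $(pd_correction 4 40 0.5 0.2) ))
```

A positive result means the temperature is above the setpoint or rising, so the duty goes up; a negative result lets the fans slow down.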
In spinpid2.sh (dual-zone script):
  1. because some boards don't report correct fan duty, the script will now try to read it the first time and make needed adjustment, then assume the duty remains as it was set and not read it further
  2. added a setting to control whether interim CPU data are logged
  3. changed BMC reset code to avoid cycling resets that a few people reported.
Note that I don't have a way to test the dual zone script. Please report any issues. Also if you have previous versions, don't forget to copy over your settings.

See discussion for more details.
A couple of slight improvements to the code for getting CPU temps. Thanks to research by @bestboy, it can now handle any number of cores. The actual temperature reading might also be slightly more efficient.
I had earlier switched to the best solution recommended by @Stux for reading CPU temperature in spincheck.sh. Now, with refined, complete bash code suggested by @bestboy, the apparently more efficient method has been incorporated into all 3 of the scripts that read temperature.

If you want the gory details, instead of reading CPU temp from the IPMI, we are now reading it from sysctl. We use the hottest of up to 10 cores as CPU temperature. I used awk instead of cut to get the actual numbers from the sysctl output, though I'm not sure that is much of an improvement.
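To make the awk versus cut comparison concrete, here is a small sketch of both ways of pulling the hottest core out of sysctl-style lines such as "dev.cpu.0.temperature: 41.0C". The function names are hypothetical and neither pipeline is copied from the scripts; both read the lines from standard input, so they handle however many cores are present.

```shell
# cut-based: take the value field, drop the decimals, sort, keep the largest.
hottest_cut() {
    cut -d' ' -f2 | cut -d. -f1 | sort -n | tail -n 1
}

# awk-based: one process converts each value to a number and tracks the max.
# "$2 + 0" relies on awk using the leading numeric part of "41.0C".
hottest_awk() {
    awk '{ t = $2 + 0; if (t > max) max = t } END { printf "%d\n", max }'
}

# Example use on FreeBSD (pattern is an assumption about the sysctl names):
#   sysctl -a | grep -E '^dev\.cpu\.[0-9]+\.temperature:' | hottest_awk
```

The awk form replaces four processes with one, which is where any efficiency gain would come from, though at a few reads per minute the difference is unlikely to matter.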

Thanks to Stux and bestboy!
Changed the command for reading CPU temperature. The new command, suggested by @Stux, is probably somewhat more efficient.
In spinpid2.sh, the reading of duty cycles from the motherboard is now commented out (disabled) by default. Some boards report incorrect duty, and this causes the script to go bonkers. I have no idea how widespread this fault is, but there is really no harm in not asking the board for duty cycles - the script assumes the duty is what it sets. There is also some minor code cleanup.

spinpid.sh (the single-zone version) has some improvements in logic that make it work better. It holds drive temps better. Also after a CPU-intensive task, the fans can come down quickly as the CPU cools until the drives need cooling. I finally got it perfect. Also some minor cleanup.
The single-zone script had an issue that caused fan control based on drive temperatures to be inaccurate if the system went through a long period of high CPU use. That's fixed here; the other scripts are not updated.