PID fan controller Perl script

dak180

Patron
Joined
Nov 22, 2017
Messages
310
I am planning to use the script to control the fans of an ASRock Rack X470D4U2-2T, which requires some changes to the logic. In particular, the ASRock board does not know anything about fan modes ("full", "optimal", etc.). The duty cycle can be set for each fan between 1% and 100% (in practice, 30% is the lowest setting at which the fans still spin reliably), so I guess this is comparable to only having the Supermicro "optimal" mode. Also, it simply has FAN1 through FAN6, with the CPU fan connected to FAN1, i.e., there are no explicit zones.
You may want to check out my Fan Control Tool: it was written for ASRock Rack boards from the outset, is quite configurable, and will take advantage of the additional temperature sensor header should you choose to use it.
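On the ASRock board described above there are no modes to switch, so whatever the controller computes has to be kept inside the range the board accepts, with 30% as the practical floor. A minimal Perl sketch of that clamping step; the clamp_duty name and the constants are illustrative and not taken from either script:

```perl
use strict;
use warnings;

# Illustrative limits: the board accepts 1..100, but 30% is reported
# above as the lowest setting at which the fans spin reliably.
my $MIN_DUTY = 30;
my $MAX_DUTY = 100;

# Round a controller output to a whole percent and clamp it to the safe range.
sub clamp_duty {
    my ($duty) = @_;
    $duty = int($duty + 0.5);
    return $MIN_DUTY if $duty < $MIN_DUTY;
    return $MAX_DUTY if $duty > $MAX_DUTY;
    return $duty;
}

print join(' ', map { clamp_duty($_) } (5, 42.6, 130)), "\n";   # prints "30 43 100"
```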

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703

treefrob

Cadet
Joined
Sep 25, 2018
Messages
9
You may want to check out my Fan Control Tool: it was written for ASRock Rack boards from the outset, is quite configurable, and will take advantage of the additional temperature sensor header should you choose to use it.
I will look at it. Unfortunately, bash is the Wrong Language for such endeavors :/

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
what sensor values exactly?
As you can see in the chart a few posts back, there are the temperature sensors attached to the Commander PRO (temperatures 0 through 3 in the sample output from the command below) and the disk temperatures, which the script gathers as part of calculating the fan speeds to set.

Code:
root@nas:/mnt/scripts # ./OpenCorsairLink.elf.new --device 0 --fan channel=0,mode=0
Dev=0, CorsairLink Device Found: Commander PRO!

Vendor: Corsair
Product: Commander PRO
Firmware: V0.5.180
Temperature 0: 35.58 C
Temperature 1: 38.00 C
Temperature 2: 54.00 C
Temperature 3: 38.00 C
Output 12v: 12.11 V
Output 5v:  5.01 V
Output 3.3v:  3.38 V
Fan 0:    Mode: 4-Pin
    PWM: 0%
    RPM: 0
Fan 1:    Mode: 4-Pin
    PWM: 85%
    RPM: 1772
Fan 2:    Mode: 4-Pin
    PWM: 63%
    RPM: 1898
Fan 3:    Mode: 4-Pin
    PWM: 63%
    RPM: 1904
Fan 4:    Mode: 4-Pin
    PWM: 63%
    RPM: 1895
Fan 5:    Mode: 4-Pin
    PWM: 100%
    RPM: 0
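The temperature and fan readings in output like the above can be pulled out with a few regular expressions. A minimal Perl sketch, assuming the exact line layout shown in this sample (other OpenCorsairLink versions may format their output differently); parse_corsair is an illustrative name:

```perl
use strict;
use warnings;

# Parse OpenCorsairLink output in the format shown above into
# { sensor => degrees C } and { fan => { PWM => %, RPM => rpm } }.
sub parse_corsair {
    my ($text) = @_;
    my (%temp, %fan, $current_fan);
    for my $line (split /\n/, $text) {
        if ($line =~ /^Temperature (\d+):\s+([\d.]+) C/) {
            $temp{$1} = $2;
        } elsif ($line =~ /^Fan (\d+):/) {
            $current_fan = $1;                      # following lines belong to this fan
        } elsif (defined $current_fan && $line =~ /^\s+(PWM|RPM):\s+(\d+)/) {
            $fan{$current_fan}{$1} = $2;
        }
    }
    return (\%temp, \%fan);
}

my $sample = <<'END';
Temperature 0: 35.58 C
Temperature 2: 54.00 C
Fan 1:    Mode: 4-Pin
    PWM: 85%
    RPM: 1772
END
my ($t, $f) = parse_corsair($sample);
printf "T2=%s C, fan 1 at %s%% PWM, %s RPM\n", $t->{2}, $f->{1}{PWM}, $f->{1}{RPM};
```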


I uploaded influx.pl to my repo; it shows how I currently read the fan speeds and temperatures into Influx. I run it as a separate cron job, but as you can see in the chart, that sometimes produces nonsensical values, I think because it reads the values too close in time to the PID fan script.
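If the bad values really do come from both scripts querying the controller at the same moment, one possible mitigation (not something either script currently does) is to have both take an exclusive lock around their sensor reads. A minimal Perl sketch using flock; the lock file path and the with_sensor_lock name are hypothetical:

```perl
use strict;
use warnings;
use Fcntl qw(:flock);

# Hypothetical lock file that both the PID script and influx.pl would use.
my $LOCK_FILE = '/tmp/fan_sensors.lock';

# Run $code while holding an exclusive lock, so the two scripts never
# query the controller at the same time. flock blocks until the lock is free.
sub with_sensor_lock {
    my ($code) = @_;
    open(my $fh, '>>', $LOCK_FILE) or die "cannot open $LOCK_FILE: $!\n";
    flock($fh, LOCK_EX) or die "cannot lock $LOCK_FILE: $!\n";
    my @result = $code->();
    flock($fh, LOCK_UN);
    close($fh);
    return @result;
}

my ($reading) = with_sensor_lock(sub { return 'sensor read goes here' });
print "$reading\n";
```

The lock only helps if every process that talks to the controller goes through it.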

Mark Levitt

Explorer
Joined
May 21, 2017
Messages
56
In case anyone is interested, I forked a script that was originally written for Linux and modified it to run in FreeNAS. It's not particularly elegant, but it works for me.

It uses SMART to read the drive temperatures and ipmitool to set the fan speed.

It's written in Python 3 for an ASRock Rack motherboard.
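That script is Python, but to keep this thread's examples in one language, here is a Perl sketch of the SMART half: pulling a drive temperature out of smartctl -A output. It assumes an ATA drive that reports attribute 194 (Temperature_Celsius) with the temperature as the first raw value; SAS and NVMe drives report temperature in a different format. The sample line is made up for illustration:

```perl
use strict;
use warnings;

# Return the temperature (deg C) from `smartctl -A` output, or undef if
# attribute 194 is not present. Assumes the ATA attribute table layout.
sub drive_temp {
    my ($smart_output) = @_;
    for my $line (split /\n/, $smart_output) {
        my @f = split ' ', $line;
        next unless @f >= 10 && $f[0] eq '194';
        return $f[9] + 0;    # RAW_VALUE column; a "(Min/Max ...)" suffix lands in later fields
    }
    return undef;
}

# On a real system (the device name is only an example):
#   my $temp = drive_temp(scalar qx(smartctl -A /dev/ada0));
my $sample = "194 Temperature_Celsius 0x0022 100 100 000 Old_age Always - 33 (Min/Max 20/45)\n";
print drive_temp($sample), " C\n";
```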


dak180

Patron
Joined
Nov 22, 2017
Messages
310
I will look at it. Unfortunately, bash is the Wrong Language for such endeavors :/
Eh, I would say that bash can work fine for something like this, but most people do not write that sort of bash, which is its own issue.

treefrob

Cadet
Joined
Sep 25, 2018
Messages
9
I seem to have gotten my adapted and cleaned-up script working. Except in one place (read_config()), "use strict;" is enforced, and a syntax check reports no errors. The ASRock does not know about fan zones or fan modes (or is it profiles?); it only allows all the fans to be set with a single command. For that reason I introduced another global variable (sigh), @asrock_current_fan_duty_cycle_values, which holds the current values for all fans. I added checks for the "script mode" (SuperMicro or ASRock) to the various functions that set or get fan attributes using ipmitool. For example, the function that gets the SuperMicro "fan mode" is a no-op if the script mode is set to ASRock.
If/when I find time I might do more clean-up, or possibly rewrite parts of the script.
The newest version (5-JUN-2020) of the script is attached to this message.
-rob
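The single-command constraint can be handled by keeping the whole list of duty cycles and re-sending all of it whenever one fan changes, which is the role of the global array described above. A Perl sketch of that idea; the function names are illustrative, and since the raw NetFn/command bytes for the X470D4U2-2T are not given in this thread, the prefix is a placeholder the caller must supply:

```perl
use strict;
use warnings;

# Current duty cycle of FAN1..FAN6 (index 0 = FAN1), mirroring the idea
# of @asrock_current_fan_duty_cycle_values; the starting value is arbitrary.
my @duty = (50) x 6;

# Change one fan, then return the complete list, because the board only
# accepts all fan values in a single command.
sub set_fan_duty {
    my ($duty_ref, $fan_index, $value) = @_;
    die "fan index out of range\n" if $fan_index < 0 || $fan_index > $#$duty_ref;
    $duty_ref->[$fan_index] = $value;
    return @$duty_ref;
}

# Format the values as hex bytes for `ipmitool raw`. The NetFn/command
# prefix is board-specific and not known here, so the caller passes it in.
sub raw_args {
    my ($prefix_ref, @values) = @_;
    return (@$prefix_ref, map { sprintf '0x%02x', $_ } @values);
}

my @all = set_fan_duty(\@duty, 0, 30);    # FAN1 (CPU fan) to 30%
print join(' ', 'ipmitool', 'raw', raw_args(['<netfn>', '<cmd>'], @all)), "\n";
```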

Attachments

  • PID_fan_control.pl.gz (14.7 KB)

treefrob

Cadet
Joined
Sep 25, 2018
Messages
9
I have already fixed a few bugs in the script. It (still) seems to work for me. I should mention that I added dependencies on two Perl modules, Proc::Daemon and IPC::Run. If people find this particularly onerous, I can probably get rid of the dependencies.
I forked Kevin's repo as well and committed all my changes to the new repo:


There is definitely more work to be done. An incomplete list:
  • get rid of all the chaff (functions no longer used)
  • revamp the config file format:
    • use standard config-file format ("key = value")
    • possibly use different "profiles" with one marked active as a substitute for zillions of different .ini files
  • variable-name cleanup -- use standard Perl naming convention
  • use hashes for passing groups of parameters back and forth instead of individual values

-rob
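The first two config-file items on that list (plain "key = value" lines, plus named profiles with one marked active) could look something like the following. This is a minimal Perl sketch of one possible format, not what the script reads today; the section syntax, the "active" key and the hd_target_temp key are all hypothetical:

```perl
use strict;
use warnings;

# Read "key = value" lines with [profile] sections; the top-level
# "active" key picks which profile's settings are returned.
# Note: "#" starts a comment anywhere on a line, so values cannot contain it.
sub read_profile_config {
    my ($text) = @_;
    my (%top, %profiles, $section);
    for my $line (split /\n/, $text) {
        $line =~ s/#.*//;              # strip comments
        $line =~ s/^\s+|\s+$//g;       # trim whitespace
        next if $line eq '';
        if ($line =~ /^\[(\w+)\]$/) {
            $section = $1;
            $profiles{$section} ||= {};
        } elsif ($line =~ /^(\w+)\s*=\s*(.*)$/) {
            if (defined $section) { $profiles{$section}{$1} = $2 }
            else                  { $top{$1} = $2 }
        } else {
            die "cannot parse config line: $line\n";
        }
    }
    my $active = $top{active} // die "no active profile set\n";
    return $profiles{$active} // die "unknown profile '$active'\n";
}

my $cfg = read_profile_config(<<'END');
active = quiet        # which profile to use
[quiet]
hd_target_temp = 40
[loud]
hd_target_temp = 38
END
print "hd_target_temp = $cfg->{hd_target_temp}\n";
```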

msbxa

Contributor
Joined
Sep 21, 2014
Messages
151
I just tried the fan script (screen ./PID_fan_control.pl) on a Supermicro X10SL7-F (firmware upgraded to 3.88), but it seems I am getting some errors. I also did a manual test: setting zones 1 and 0 to a 50% duty cycle with ipmitool raw 0x30 0x70 0x66 0x01 1 50 (and the equivalent for zone 0) worked. Here is the output with Debug set to 1:

2020-06-16 14:16:04: CPU Temp: 34.0 <= 35, CPU Fan going low.
2020-06-16 14:16:04: temperature error = -2.5
2020-06-16 14:16:04: PID control new duty cycle is 30%
2020-06-16 14:16:15: CPU Fan speed: No reading
2020-06-16 14:16:28: CPU Fan speed: No reading
2020-06-16 14:16:40: CPU Fan speed: No reading
2020-06-16 14:16:52: CPU Fan speed: No reading
2020-06-16 14:17:04: CPU Fan speed: No reading
2020-06-16 14:17:15: CPU Fan speed: No reading
2020-06-16 14:17:26: CPU Fan speed: No reading
2020-06-16 14:17:38: CPU Fan speed: No reading
2020-06-16 14:17:50: CPU Fan speed: No reading
2020-06-16 14:18:02: CPU Fan speed: No reading
2020-06-16 14:18:13: CPU Fan speed: No reading
2020-06-16 14:18:24: CPU Fan speed: No reading
2020-06-16 14:18:24: Fan speeds are unreadable after 120 seconds, rebooting BMC
2020-06-16 14:18:24: Resetting BMC
No data available
Get Device ID command failed
No data available
No data available
No valid response received
Error obtaining SDR info: Invalid command
Unable to open SDR for reading
2020-06-16 14:19:21: temperature error = -4
2020-06-16 14:19:34: CPU Fan speed: No reading
2020-06-16 14:19:45: CPU Fan speed: No reading
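The log above shows the script's recovery path: after 120 seconds without a fan reading, it resets the BMC. The timing part of that logic can be sketched as a small Perl closure. This illustrates the idea only, it is not the script's own code, the make_fan_watchdog name is hypothetical, and the actual reset command is left out:

```perl
use strict;
use warnings;

# Hypothetical watchdog: remember when a fan RPM was last readable and
# return 1 when it has been unreadable for at least $timeout seconds.
sub make_fan_watchdog {
    my ($timeout, $now) = @_;
    my $last_ok = $now;
    return sub {
        my ($rpm, $t) = @_;        # $rpm is undef for "No reading"
        if (defined $rpm) {
            $last_ok = $t;
            return 0;
        }
        if ($t - $last_ok >= $timeout) {
            $last_ok = $t;         # restart the timer so the BMC is not reset on every loop
            return 1;
        }
        return 0;
    };
}

my $check = make_fan_watchdog(120, 0);
for my $t (0, 12, 60, 121) {
    my $rpm = $t == 0 ? 1400 : undef;
    print "t=$t reset=", $check->($rpm, $t), "\n";
}
```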

root@PRODNAS[/mnt/msbv01/scripts]# ipmitool sensor | grep FAN
FAN1 | 2000.000 | RPM | ok | 300.000 | 500.000 | 700.000 | 25300.000 | 25400.000 | 25500.000
FAN2 | na | | na | na | na | na | na | na | na
FAN3 | na | | na | na | na | na | na | na | na
FAN4 | 1400.000 | RPM | ok | 300.000 | 500.000 | 700.000 | 25300.000 | 25400.000 | 25500.000
FANA | 1800.000 | RPM | ok | 300.000 | 500.000 | 700.000 | 25300.000 | 25400.000 | 25500.000
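Since FAN2 and FAN3 report "na" here, anything that picks a header to monitor has to cope with missing readings. A Perl sketch that turns ipmitool sensor output in the pipe-separated format above into a name-to-RPM map, with undef for "na"; parse_fan_sensors is an illustrative name:

```perl
use strict;
use warnings;

# Parse `ipmitool sensor` lines into { FANx => RPM }, using undef for
# sensors that report "na". Non-fan sensors are skipped.
sub parse_fan_sensors {
    my ($text) = @_;
    my %rpm;
    for my $line (split /\n/, $text) {
        my @f = map { s/^\s+|\s+$//gr } split /\|/, $line;   # trimmed fields
        next unless @f >= 2 && $f[0] =~ /^FAN/;
        $rpm{$f[0]} = $f[1] eq 'na' ? undef : $f[1] + 0;
    }
    return \%rpm;
}

my $out = <<'END';
FAN1 | 2000.000 | RPM | ok | 300.000 | 500.000
FAN2 | na | | na | na | na
END
my $rpm = parse_fan_sensors($out);
for my $fan (sort keys %$rpm) {
    printf "%s: %s\n", $fan, defined $rpm->{$fan} ? $rpm->{$fan} : 'no reading';
}
```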

Can someone help me figure out where to start?

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
Can someone help me figure out where to start?
In the config section, did you set the fan headers correctly?

Seems like you should see something like this:
Code:
## FAN ZONES
# Your CPU/case fans should probably be connected to the main fan sockets, which are in fan zone zero
# Your HD fans should be connected to FANA which is in Zone 1
# You could switch the CPU/HD fans around, as long as you change the zones and fan header configurations.
#
# 0 = FAN1..5
# 1 = FANA..FANC
$cpu_fan_zone = 0;
$hd_fan_zone  = 1;


## FAN HEADERS
## these are the fan headers which are used to verify the fan zone is high. FAN1+ are all in Zone 0, FANA is Zone 1.
## cpu_fan_header should be in the cpu_fan_zone
## hd_fan_header should be in the hd_fan_zone
$cpu_fan_header = "FAN1";                 # used for printing to standard output for debugging   
$hd_fan_header  = "FANA";                 # used for printing to standard output for debugging   
@hd_fan_list = ("FANA");  # used for logging to file  


You need to re-read that section as it seems you have a fan connected on FAN4, which isn't how the script instructions recommend doing things. You may be able to work something out by reversing the cpu and hd fan zone allocations in the first section, which would look like this:

Code:
## FAN ZONES
# Your CPU/case fans should probably be connected to the main fan sockets, which are in fan zone zero
# Your HD fans should be connected to FANA which is in Zone 1
# You could switch the CPU/HD fans around, as long as you change the zones and fan header configurations.
#
# 0 = FAN1..5
# 1 = FANA..FANC
$cpu_fan_zone = 1;
$hd_fan_zone  = 0;


## FAN HEADERS
## these are the fan headers which are used to verify the fan zone is high. FAN1+ are all in Zone 0, FANA is Zone 1.
## cpu_fan_header should be in the cpu_fan_zone
## hd_fan_header should be in the hd_fan_zone
$cpu_fan_header = "FANA";                 # used for printing to standard output for debugging   
$hd_fan_header  = "FAN1";                 # used for printing to standard output for debugging   
@hd_fan_list = ("FAN1", "FAN4");  # used for logging to file  
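Whichever zone assignment is used, it helps to confirm by hand that each zone number drives the headers you expect, using the same raw command msbxa tested earlier in the thread. A small Perl helper that builds that command; the helper name is illustrative, the 0 or 1 zone limit simply follows the config comments above, and 0x30 0x70 0x66 0x01 is only known from this thread to work on that X10 board:

```perl
use strict;
use warnings;

# Build the `ipmitool raw` argument list for setting a zone's duty cycle
# (0x30 0x70 0x66 0x01 <zone> <duty>), as tested on the X10SL7-F.
sub zone_duty_cmd {
    my ($zone, $duty) = @_;
    die "zone must be 0 or 1\n" unless defined $zone && $zone =~ /^[01]$/;
    die "duty must be 0..100\n" unless defined $duty && $duty =~ /^\d+$/ && $duty <= 100;
    return ('ipmitool', 'raw', '0x30', '0x70', '0x66', '0x01', $zone, $duty);
}

my @cmd = zone_duty_cmd(1, 50);
print "@cmd\n";    # prints "ipmitool raw 0x30 0x70 0x66 0x01 1 50"
# To actually apply it: system(@cmd) == 0 or warn "ipmitool failed: $?\n";
```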

msbxa

Contributor
Joined
Sep 21, 2014
Messages
151
The following is true for my system: FAN1 is connected to the CPU fan and FAN4 to the rear fan (both zone 0), while FANA is connected to the HDD fans (2 fans with splitters, zone 1). I don't see any conflict here.
# 0 = FAN1..5
# 1 = FANA
$cpu_fan_zone = 0;
$hd_fan_zone = 1;

I think I found the zone problem: I had to change these lines in the script, which pointed to the wrong zone in the original settings.
$cpu_fan_header = "FAN1"; # used for printing to standard output for debugging
$hd_fan_header = "FANA"; # used for printing to standard output for debugging
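A header that is not in the zone it is supposed to verify, as found here, can be caught at startup. A Perl sketch of such a check, assuming the zone layout given in the config comments quoted earlier (FAN1..5 in zone 0, FANA..FANC in zone 1); the function names are illustrative and not part of the script:

```perl
use strict;
use warnings;

# Zone layout from the config comments: FAN1..5 = zone 0, FANA..FANC = zone 1.
sub zone_of_header {
    my ($header) = @_;
    return 0 if $header =~ /^FAN[1-5]$/;
    return 1 if $header =~ /^FAN[A-C]$/;
    return undef;
}

# Return a list of problems; empty if each header sits in its configured zone.
sub check_fan_config {
    my (%cfg) = @_;
    my @problems;
    for my $role (qw(cpu hd)) {
        my $header = $cfg{"${role}_fan_header"} // '';
        my $zone   = $cfg{"${role}_fan_zone"};
        my $actual = zone_of_header($header);
        if (!defined $actual) {
            push @problems, "$role: unknown header $header";
        } elsif ($actual != $zone) {
            push @problems, "$role: $header is in zone $actual, not zone $zone";
        }
    }
    return @problems;
}

my @problems = check_fan_config(
    cpu_fan_zone => 0, cpu_fan_header => 'FAN1',
    hd_fan_zone  => 1, hd_fan_header  => 'FAN4',   # deliberate mistake
);
print "$_\n" for @problems;
```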

But now I am getting the following messages:
2020-06-16 18:51:22: CPU Temp: 32.0 <= 35, CPU Fan going low.
2020-06-16 18:51:22: temperature error = -6
2020-06-16 18:51:22: PID control new duty cycle is 30%
2020-06-16 18:51:35: CPU fan speed should be low, but 2000 > 1600.
2020-06-16 18:51:35: HD fan speed should be low, but 1700 > 1440.
2020-06-16 18:51:52: CPU fan speed should be low, but 2000 > 1600.
2020-06-16 18:51:52: HD fan speed should be low, but 1800 > 1440.
2020-06-16 18:51:57: Resetting BMC
2020-06-16 18:52:09: CPU fan speed should be low, but 2000 > 1600.
2020-06-16 18:52:09: HD fan speed should be low, but 1800 > 1440.
2020-06-16 18:52:25: CPU fan speed should be low, but 1900 > 1600.
2020-06-16 18:52:25: HD fan speed should be low, but 1700 > 1440.

Edit: I've done threshold tuning on all fans (FAN1, FAN4 and FANA), and now I get the following messages. I hope this looks OK?

2020-06-16 19:52:25: CPU Temp: 37.0 dropped below 45, CPU Fan going med.
2020-06-16 19:52:25: temperature error = -3
2020-06-16 19:52:25: PID control new duty cycle is 30%
2020-06-16 19:52:28: CPU Temp: 35.0 <= 35, CPU Fan going low.
2020-06-16 19:55:25: temperature error = -2
2020-06-16 19:55:25: PID control new duty cycle is 38%
2020-06-16 19:58:25: temperature error = -1.5
2020-06-16 19:58:25: PID control new duty cycle is 34%
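For readers wondering where "temperature error" and "new duty cycle" come from: a PID controller turns the difference between the measured and target temperature into a duty cycle. The sketch below is a generic positional PID in Perl, not the script's exact formula; the gains, setpoint, base duty and 30..100 clamp are illustrative values, and the 180 s interval matches the spacing of the log entries above:

```perl
use strict;
use warnings;

# Generic positional PID controller (illustrative parameters).
sub make_pid {
    my (%p) = @_;
    my ($integral, $prev_error) = (0, undef);
    return sub {
        my ($temp, $dt) = @_;
        my $error = $temp - $p{setpoint};              # positive when too hot
        $integral += $error * $dt;
        my $derivative = defined $prev_error ? ($error - $prev_error) / $dt : 0;
        $prev_error = $error;
        my $duty = $p{base} + $p{kp} * $error + $p{ki} * $integral + $p{kd} * $derivative;
        $duty = 30  if $duty < 30;                     # practical floor
        $duty = 100 if $duty > 100;
        return ($error, int($duty + 0.5));
    };
}

my $pid = make_pid(setpoint => 36, base => 50, kp => 4, ki => 0.01, kd => 20);
for my $temp (33.5, 34, 34.5) {
    my ($err, $duty) = $pid->($temp, 180);             # 180 s between updates
    print "temperature error = $err, new duty cycle $duty%\n";
}
```

Because the integral keeps accumulating while the output is clamped, a real controller usually adds some anti-windup, for example by not integrating while the output is saturated.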

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
Edit: I've done threshold tuning on all fans (FAN1, FAN4 and FANA), and now I get the following messages. I hope this looks OK?

2020-06-16 19:52:25: CPU Temp: 37.0 dropped below 45, CPU Fan going med.
2020-06-16 19:52:25: temperature error = -3
2020-06-16 19:52:25: PID control new duty cycle is 30%
2020-06-16 19:52:28: CPU Temp: 35.0 <= 35, CPU Fan going low.
2020-06-16 19:55:25: temperature error = -2
2020-06-16 19:55:25: PID control new duty cycle is 38%
2020-06-16 19:58:25: temperature error = -1.5
2020-06-16 19:58:25: PID control new duty cycle is 34%
Looks like it could be normal, if you hear the fans spinning up and down to go with those changes (maybe the changes are too minor; you could start a scrub to see what happens, then use zpool scrub -s to stop it).

msbxa

Contributor
Joined
Sep 21, 2014
Messages
151
Looks like it could be normal, if you hear the fans spinning up and down to go with those changes (maybe the changes are too minor; you could start a scrub to see what happens, then use zpool scrub -s to stop it).
No spinning up and down of the fans, but I decided to run the script for a day and read the debug log later. Thanks.

msbxa

Contributor
Joined
Sep 21, 2014
Messages
151
I keep seeing "temperature error" messages without any "CPU Temp" or "PID control new duty cycle" lines. What am I missing this time?
temperature error = -2.5
temperature error = -2
temperature error = -1.5
temperature error = -1.5

Soloam

Contributor
Joined
Feb 14, 2014
Messages
196
Hello, one question: what would happen if the script is running, the system is idle so the fans go down, and then the script dies for some reason? If the system then goes to full load, the fans will still be at low speed. That would make my temperatures rise out of control, correct?

Thank You

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
Hello, one question: what would happen if the script is running, the system is idle so the fans go down, and then the script dies for some reason? If the system then goes to full load, the fans will still be at low speed. That would make my temperatures rise out of control, correct?

Thank You
After running this script (or a variant of it) for a couple of years, I have not seen that issue. The only related problem I have had is when my fan controller loses its USB connection to the system (passed through to an ESXi VM) and gets stuck; then disk temperature warnings start arriving by email from my FreeNAS thresholds (set under Storage > Disks), so I can react and restart the system (rather inconvenient, but better than overheating disks). I have never encountered that on my bare-metal FreeNAS system, and the script itself has been rock solid on both systems.
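Neither post describes this, but a common safeguard against the scenario in the question is to have the script force the fans to 100% whenever it exits or is killed. A minimal Perl sketch, assuming a set_all_fans_full routine that the real script would have to supply (here it only records the call). END blocks run on normal exit and after die, and the signal handlers turn INT/TERM/HUP into a normal exit; nothing runs on SIGKILL, a crash of the Perl interpreter, or a hung script, so the FreeNAS disk temperature alerts mentioned above remain the last line of defense:

```perl
use strict;
use warnings;

my @failsafe_log;

# Hypothetical stand-in for whatever the script uses to set every fan
# to 100%, e.g. `ipmitool raw 0x30 0x70 0x66 0x01 <zone> 100` for each
# zone on the Supermicro X10 board discussed in this thread.
sub set_all_fans_full {
    push @failsafe_log, 'fans set to 100%';
}

# Turn common termination signals into a normal exit so END still runs.
$SIG{$_} = sub { exit 1 } for qw(INT TERM HUP);

END {
    set_all_fans_full();
    print "$_\n" for @failsafe_log;
}
```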

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
I keep seeing "temperature error" messages without any "CPU Temp" or "PID control new duty cycle" lines. What am I missing this time?
temperature error = -2.5
temperature error = -2
temperature error = -1.5
temperature error = -1.5
I was hoping somebody with one of those boards would jump in and assist...

Are you only looking at the debug log, or do you also have the regular log? What is it showing?