Multi_Report Critical Temp 17*C

Paul5

Contributor
Joined
Jun 17, 2013
Messages
117
I installed Multi_Report, and in my first tests of sending emails I keep getting the same critical error for HDD temp at 17*C. This also occurred when the temp was at 23*C. Oh, the 17*C is after a cold boot and the 23*C was after running for a while.

temp.jpg


SMART overall-health self-assessment test result: PASSED

ID# ATTRIBUTE_NAME          FLAG   VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f 200   200   051    Pre-fail Always  -           37
  3 Spin_Up_Time            0x0027 179   175   021    Pre-fail Always  -           6041
  4 Start_Stop_Count        0x0032 085   085   000    Old_age  Always  -           15147
  5 Reallocated_Sector_Ct   0x0033 200   200   140    Pre-fail Always  -           0
  7 Seek_Error_Rate         0x002e 200   200   000    Old_age  Always  -           0
  9 Power_On_Hours          0x0032 079   079   000    Old_age  Always  -           16012
 10 Spin_Retry_Count        0x0032 100   100   000    Old_age  Always  -           0
 11 Calibration_Retry_Count 0x0032 100   100   000    Old_age  Always  -           0
 12 Power_Cycle_Count       0x0032 093   093   000    Old_age  Always  -           7322
192 Power-Off_Retract_Count 0x0032 200   200   000    Old_age  Always  -           199
193 Load_Cycle_Count        0x0032 196   196   000    Old_age  Always  -           14947
194 Temperature_Celsius     0x0022 133   103   000    Old_age  Always  -           17
196 Reallocated_Event_Count 0x0032 200   200   000    Old_age  Always  -           0
197 Current_Pending_Sector  0x0032 200   200   000    Old_age  Always  -           0
198 Offline_Uncorrectable   0x0030 100   253   000    Old_age  Offline -           0
199 UDMA_CRC_Error_Count    0x0032 200   200   000    Old_age  Always  -           0
200 Multi_Zone_Error_Rate   0x0008 200   200   000    Old_age  Offline -           0
Any ideas how to rectify this?
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
Are you running that system outside in the winter?

17 degrees is so low as to indicate the drive isn't spinning... is it set to spin down?
 

sretalla

I would consider that disk not healthy though, due to the RAW_READ_ERROR_RATE > 0.

Plan to replace it.
 

Paul5

Are you running that system outside in the winter?

17 degrees is so low as to indicate the drive isn't spinning... is it set to spin down?
It's actually warmer outside than inside my house. Fact. It is also set to sleep after 10 minutes.

sretalla, either you misunderstood or I explained it badly. The problem is the 'Critical' (red) error for the HDD temperature.
The script notes:
###### Temperature Settings
HDDtempWarn=45 # HDD Drive Warning Temp (in C) when a WARNING message will be used.
HDDtempCrit=50 # HDD Drive Critical Temp (in C) when a CRITICAL message will be used.

So I checked under Reports and this is what I get. No HDDs are recording temperatures; see below.
diskada0.jpg


So Multi_Report is obviously using SMART data, but TN is not. In any case, why is it reporting critical when even SMART does not record a max temp and shows a current temp of 21*C? A faulty HDD that hit 50*C would be running hotter, don't you think, unless it was summer with an ambient temp of at least 30*C?

Also, since you're here, is there any drawback to using 'Force HDD Standby' as in the image above?
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
As @Davvo said (and thanks Davvo), if you would like me to examine your drive data, just run the script and use the -dump email switch. Example: multi_report.sh -dump email and it will send me an email with all your data. The only personal data will be your email address which I do not share, and many here can attest to that.

Here is what I see based on what information you provided:
1. Your drives are all cold. The three you listed range from 17C to 21C, so I believe the drives are reporting the correct temperature. I can't tell you why your drives are so cold; that is 62F, which is cold for me, but I like my house at 74F.
2. Your Warning message is due to the drive being 17C, however I don't recall having a lower limit other than -60C to rule out bogus temperatures.
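
To make the expected behaviour concrete, here is a minimal sketch of a threshold check. It assumes plain integer comparisons against the HDDtempWarn=45 and HDDtempCrit=50 settings quoted earlier, plus the -60C sanity floor mentioned in point 2. The function name classify_temp and the variable TempFloor are made up for illustration; this is not the actual Multi_Report code.

```shell
#!/bin/sh
# Hypothetical sketch, not taken from multi_report.sh.
HDDtempWarn=45    # warning threshold in C (from the quoted config)
HDDtempCrit=50    # critical threshold in C (from the quoted config)
TempFloor=-60     # assumed floor used to reject bogus readings

# Print a status word for a whole-number temperature in C.
classify_temp() {
    temp=$1
    if [ "$temp" -lt "$TempFloor" ]; then
        echo "INVALID"
    elif [ "$temp" -ge "$HDDtempCrit" ]; then
        echo "CRITICAL"
    elif [ "$temp" -ge "$HDDtempWarn" ]; then
        echo "WARNING"
    else
        echo "OK"
    fi
}

classify_temp 17
classify_temp 23
```

Under those assumptions both 17C and 23C come out as OK, which is why a Critical flag at those temperatures points to a problem elsewhere.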

It looks like I have a problem in the script so if you would send me the dump data, I could figure it out and fix the error. That is why I provided the -dump email option. By the way, this is the first time I've seen this problem, it intrigues me.

EDIT: I also just noticed that your drive in question does not list an RPM, that is odd as well. It should list as 5400 for that exact model number.
 

joeschmuck

I tried a similar drive and tried manipulating the temperature data, but was unable to reproduce the error mentioned above. See screen capture below.
@Paul5 please send me a dump of the requested data so I can figure out what is going on.

Screenshot 2023-07-11 155905.jpg
 

sretalla

is there any draw back to using 'Force HDD Standby' as in the image above.
Since the system is telling you to do that, maybe it's OK. I am with a bunch of the other forum members here who think that spinning down drives isn't a good idea in total, so can't comment more than that.
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
When you spin down a HDD you stop its platters' motion and park the heads; in order to read or write data the platters need to spin and the heads need to be in position. Starting and stopping, starting and stopping, starting and stopping... dozens or more times a day for a few years, is your drive gonna like it? One side is sure that current-day drives are built to withstand such stress, the other doesn't want to put unnecessary strain on the drives.

This should be a brief explanation of the spindown debate.

As a side note, a study found that temps that low are more harmful to the drives than operating around 40-50 C.
 

joeschmuck

Here are my thoughts about your setup for sleeping the drives... (from the information which is viewable above)

Drive ada0: Spins down on average once every 1 hour and 5 minutes.
Drive ada1: Spins down on average once every 1 hour and 12 minutes.
Drive ada2: Spins down on average once every 35 minutes.

My values are approximate and are an assumption that these drives have only been used for the NAS, not in any other machine. These values are also just to represent how frequently these drives appear to park. If I have my calculations off by a few minutes and that becomes your main focus, you are missing the bigger picture.
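
For anyone who wants to repeat the estimate, here is a rough sketch of the arithmetic, assuming the drive has only ever lived in this machine: divide Power_On_Hours by Start_Stop_Count. The function name is made up for illustration; the sample numbers are attributes 9 and 4 from the SMART table posted at the top of the thread.

```shell
#!/bin/sh
# Hypothetical helper: average minutes of power-on time per start/stop cycle.
# Arguments: Power_On_Hours raw value, Start_Stop_Count raw value.
avg_minutes_per_cycle() {
    hours=$1
    cycles=$2
    # Integer arithmetic, so the result is rounded down to whole minutes.
    echo $(( hours * 60 / cycles ))
}

avg_minutes_per_cycle 16012 15147
```

With those figures it works out to roughly an hour per cycle, in line with the approximate values above.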

If these are accurate assumptions then this is definitely not something most of the experienced forum members recommend. There are some thoughts that a certain number of start/stop cycles is okay, but I'm not sure you will find many, if any, that would support this with the high values your drives have.

For my system I have 41764 hours on three drives, and the Start/Stop count is 227. My drives had a 3 year warranty which expired 2 years, 9 months, and 10 days ago and they are running strong with no errors. I did have a fourth drive which had the same type of values about 3.5 months ago when it started to show some errors. I replaced it before it became a real problem, so I got about 2.5 years of life beyond the 3 year warranty. I was a happy camper. The majority of my Stop/Start count is from the initial setup, then from moving the NAS and taking it to the garage to blow out the dust periodically, or from unplugging the NAS when a bad electrical storm passes through.

So my advice is, do not spin your drives down or if you need to spin your drives down, set the activity interval for 120 minutes to prevent excessive start/stop cycles, and monitor the stop/start count to ensure your changes are effective.
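
One way to keep an eye on the count is to pull the raw value out of the smartctl attribute table and compare it from day to day. This is only a sketch: the helper name is made up, and it assumes the ten-column layout shown in the SMART output at the top of the thread.

```shell
#!/bin/sh
# Hypothetical helper: read "smartctl -A" output on stdin and print the
# raw Start_Stop_Count value (the 10th column in the layout shown above).
start_stop_count() {
    awk '$2 == "Start_Stop_Count" { print $10 }'
}

# Usage on a live system (the device name is an assumption):
#   smartctl -A /dev/ada0 | start_stop_count
```

If the number stops climbing quickly after changing the standby setting, the change is doing its job.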

I hope this makes sense and it's just advice. No one will give you a hard time if your path is to keep things the way they are. We just want to give you the best advice we can and you can make the decision on how you want to run your system.
 

joeschmuck

One side is sure that current-day drives are built to withstand such stress, the other doesn't want to put unnecessary strain on the drives.
I agree with you for the most part.
Laptop spinning rust drives have low mass platters to start spinning and laptop drives are designed to perform thousands of start/stop cycles.

3.5" drives are generally not nearly as robust as laptop drives when it comes to handling the repeated surge of current. The 3.5" set of platters, especially for high capacity drives, has a lot of mass and it takes a lot of current to spin those up. This repeated high current draw tends to overheat or just over stress the motor drive circuits and then you get the famous 'tick tick tick' of the drive head wanting to load but the drive isn't spinning fast enough or at a consistent speed, or just as bad, the drive doesn't spin up at all. Keep one thing in mind, the drive manufacturers only warranty a drive for 3 to 5 years (there are a few exceptions), so they build them to at least last for that period of time, then they expect the drive to fail so you can buy a new drive. They want to make money and if they designed a drive that would last 20 years, they would never make lots of money. Remember that. So if you take it easy on the drive start/stop cycle, especially if it's a large capacity drive, it will likely last significantly longer than the warranty period.

Time to take this over the edge and share some knowledge: If you purchase several of the same model drive (generally they need to be from the same few lots of drives) and you have an electronics failure, you could swap the electronics between two of them and recover the drive. If the drive motor fails, you could (if you are skilled enough, and it's not difficult if you take your time and are careful) transfer the platters from a bad drive to a good drive. I've done it at least 3 times that I recall, but I've done all kinds of stuff over the decades. I used to recalibrate the hard drive head alignment by hand using a simple O'Scope; this was before the days of voice coil head alignment. Yea, I just wanted to go off topic for a few minutes, it's been a very long day.

Wow, I rambled on. Sorry about that.
 

Paul5

Sorry for the delay, I don't seem to be getting email updates.
and it will send me an email with all your data.
First I am not a programmer or anything so 99% of the time I don't know what I'm looking at, but I am privacy paranoid (read on).

Now, without going into detail, things like 'send me all your data' scare me. I've seen this statement a few times, in the manual as well I think. Yes, I know you state it's just the email, but in your script something else grabbed my attention at line #904:

### Clean up TrueNAS Configuration files
if test -e "/tmp/${Config_Name}.db"; then rm "/tmp/${Config_Name}.db"; fi
if test -e /tmp/config_backup.md5; then rm /tmp/config_backup.md5; fi
if test -e /tmp/config_backup.sha256; then rm /tmp/config_backup.sha256; fi
if test -e "/tmp/${Config_Name}.zip"; then rm "/tmp/$Config_Name.zip"; fi
if test -e "/tmp/freenas-v1.db"; then rm "/tmp/freenas-v1.db"; fi
if test -e "/tmp/freenas-v1.md5"; then rm "/tmp/freenas-v1.md5"; fi
if test -e "/tmp/freenas-v1.sha256"; then rm "/tmp/freenas-v1.sha256"; fi
if test -e "/tmp/pwenc_secret"; then rm "/tmp/pwenc_secret"; fi
if test -e "/tmp/pwenc_secret.md5"; then rm "/tmp/pwenc_secret.md5"; fi
if test -e "/tmp/pwenc_secret.sha256"; then rm "/tmp/pwenc_secret.sha256"; fi
### Clean up complete!

As I said, I have no idea what I'm looking at, but I see files being moved around, then I see things like Config_Name.db being accessed and Config_Name.zip. Why zip the config file, or even access it? Then there are things like pwenc_secret, which my brain interprets as access to passwords or something similar.

Yes, this may all be benign and harmless, or others would have said something, but the original 'send me all your data' scared me. So I changed all the email addresses I could find in your script and sent myself a -dump.

Can anybody else confirm that it is benign information?

(You should see my paranoia when TN devs ask for a debug data dump of TN, let alone removing all personal information.)


When you spin down a HDD you stop its platters' motion and park the heads; in order to read or write data the platters need to spin and the heads need to be in position. Starting and stopping, starting and stopping, starting and stopping... dozens or more times a day for a few years, is your drive gonna like it? One side is sure that current-day drives are built to withstand such stress, the other doesn't want to put unnecessary strain on the drives.

This should be a brief explanation of the spindown debate.
Thanks to all for the spin down debate and I will/may change my ways.

First, my TN box is started on demand, usually once or twice a day if that, and it is also set to auto shutdown when no one is using it. Currently, of the 4 drives, two are rarely accessed, possibly once or twice a week other than whatever maintenance TN does. Of the other two, one is shared and the other is Media, which is the most used. All are currently set to sleep after 10 minutes.

I guess I will remove sleep from Media or set it to 1 hour; that way TN will shut down before it sleeps, or if I'm doing a clone to another disk it can sleep. I'll have to think about this one. I just don't like having things running if not being used.

For the other two rarely accessed disks I'm not too sure. Personally I see it as 50/50: spinning up can cause damage, but constantly running can wear the bearings out quicker, as well as keeping the platter under constant centrifugal force.
 

joeschmuck

We need to clean up any leftover files from the previous run before we run the script again; that is the purpose of the lines of code you looked at. Notice it's in the /tmp/ file space. This is like a RAM disk: when you reboot the computer it all goes away. It is volatile and has nothing to do with the TrueNAS files themselves. So we can do this another way if you desire and I get the same result.

1. Start a conversation with me.
2. Run the script with -dump and this will send only you the dump data, the same stuff I would receive.
3. Examine the files yourself.
4. In the conversation with me, attach all the files. At a minimum right now I need the output.html and all the .json and the drive .txt files and multi_report_config.txt (you may edit out your email address in this file if you desire). This dump action should not produce any .zip files, and I do not need them, nor could I even use them. The secret password file is encrypted by TrueNAS and is there for your password recovery efforts, but if you do a search on the forum for 'pwenc_secret' then you will be able to independently verify it's an encrypted file.
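
For step 3, one simple way to examine the files is to search them for strings you would rather not share before attaching anything. This is just a sketch: the helper name, the example address and the folder path are all made up, and it assumes you have saved the emailed attachments into a single directory.

```shell
#!/bin/sh
# Hypothetical helper: list every file under a directory that contains
# a given literal string (for example your email address).
# Arguments: string to search for, directory to search.
files_containing() {
    grep -rlF -- "$1" "$2"
}

# Usage (both values are placeholders):
#   files_containing "me@example.com" /path/to/saved/attachments
```

Any file it lists can then be opened and edited, or simply left out of the conversation.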

As for trust, well you ran the script and have no idea exactly what it does so you are running it on faith that I am not hacking into your machine.

But if you do not desire to send me the information I need to isolate and fix the problem, I do understand. I had my identity stolen many years ago and it really sucked restoring it. It was a bad several years. However, the only way I can fix the issue is to use the -dump data to recreate it (as best as possible) in order to troubleshoot and fix it. This script has been out for several years and I created the original one about a decade ago. So the ball is in your court.
 

Paul5

Sent, I think. I have had a bit of trouble with MS. TN says mail.send failed but MS says sent? Tell me if you didn't get it.
 

joeschmuck

Got it and I just sent you a response. Sorry it was kind of long, just trying to cover all the bases in as few emails as possible. We can post here the final resolution, assuming we can make it work properly.
 

joeschmuck

An Update: The user was running version 2.4, so I asked him to update to the current version 2.4.3, which corrected the temp issue. I actually suspect the multi_report_config.txt file was corrupt. But things are working now.
 

Davvo

An Update: The user was running version 2.4, so I asked him to update to the current version 2.4.3, which corrected the temp issue. I actually suspect the multi_report_config.txt file was corrupt. But things are working now.
As always, thank you joe. Really appreciate what you are doing.
 