SMART error (Temperature) detected on host

Status
Not open for further replies.

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
I don't think it's going to help much. They're low-RPM drives in a cold part of a cold house, in a well-ventilated case. They're going to be well under 30 no matter what I do, short of filling the case up with more shizzle.

Plus, the SuperMicro X10SLM board I'm running isn't exactly equipped with the most useful BIOS when it comes to fan speed. You can pick "hurricane", "lots of i/o", or "optimal", or something like that. I have already chosen the one with the lowest RPMs.

But whatever, I'm not going to worry about it too much.
 

MtK

Patron
Joined
Jun 22, 2013
Messages
471
going back to my issue :)
the Fractal Define Mini has 2 fans, one above the PSU and one on the front (next to the drive cage).

does it sound possible that even when the motherboard is set to OPTIMAL, the fans are spinning at 1200++ RPM?
(even on boot, and when tested without drives)
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Possible, yes.

Likely? Only you can really determine that. I have fans that range from 1000 to 3000 RPM, and I have fans that range from 2000 to 9000 RPM. So that's not a question that is easily answered. You'd have to figure out for yourself if that sounds reasonable for your hardware. :)
 

MtK

Patron
Joined
Jun 22, 2013
Messages
471
thanks.

I took another Fractal front case fan (120mm) from another case I have, so now I have 2 fans in the front (where the drive cages are) and one in the back.
drive temperature is still over 40 (around 44-45)...
 

MtK

Patron
Joined
Jun 22, 2013
Messages
471
OK sorry, something is confusing me.

# smartctl -a /dev/ada0
Code:
194 Temperature_Celsius    0x0022  117  101  000    Old_age  Always      -      35


while:
# smartctl -l /dev/ada0
Code:
118    2013-12-02 16:30    44  *************************

(16:30 is the current local time)

this looks similar for all 6 drives...
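
To put the two readings side by side, here is a minimal Python sketch that pulls the temperature out of each of the two lines quoted above. The function names are made up for illustration, and the column layout is assumed to match the output shown in this post; other drives or smartctl versions may format these rows differently.

```python
import re
from datetime import datetime

# One row of the SCT temperature history table, as quoted above:
# index, timestamp, temperature in Celsius, then a bar of asterisks.
SCT_ROW = re.compile(r"^\s*(\d+)\s+(\d{4}-\d{2}-\d{2} \d{2}:\d{2})\s+(\d+)\s+\*+")


def parse_sct_row(line):
    """Return (index, timestamp, temp_c) for an SCT history row, or None."""
    m = SCT_ROW.match(line)
    if m is None:
        return None
    stamp = datetime.strptime(m.group(2), "%Y-%m-%d %H:%M")
    return int(m.group(1)), stamp, int(m.group(3))


def attribute_194(line):
    """Return the raw value of a Temperature_Celsius attribute row, or None."""
    fields = line.split()
    # Assumed layout: ID, name, flag, value, worst, thresh, type, updated,
    # when_failed, raw value (10 columns, raw value last).
    if len(fields) >= 10 and fields[0] == "194":
        return int(fields[9])
    return None


sct = parse_sct_row("118    2013-12-02 16:30    44  *************************")
now = attribute_194(
    "194 Temperature_Celsius    0x0022  117  101  000    Old_age  Always      -      35"
)
print(f"SCT history: {sct[2]} C, attribute 194: {now} C, gap: {sct[2] - now} C")
```
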
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Ok, what's your server's time? Something tells me your server's time is off. :)
 

MtK

Patron
Joined
Jun 22, 2013
Messages
471
#date
Code:
Mon Dec  2 16:37:59 IST 2013
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
And what's the actual smartctl command you ran? smartctl -l /dev/ada0 won't actually work. ;)

Does the -l output differ from smartctl -x /dev/ada0?
 

MtK

Patron
Joined
Jun 22, 2013
Messages
471
sorry the command was:
Code:
smartctl -l scttemp /dev/ada0
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Ok, was googling this and here's what I think is going on...

Some disks have strange behavior with the scttemp list. In particular, if a disk's temperature changes it won't be logged until the entry is complete. So if a disk changes from 27 to 26 degrees and sits there for 4 days you won't see the entry for the 26 degrees until it changes again. So for those 4 days it'll show 27 degrees in the scttemp and 26 degrees as the actual parameter.

Some disks only update that table when they do temperature calibration. The frequency varies widely by disk model and brand.

Some disks also seem to only log certain entries in the scttemp. Apparently there are some situations where entries shouldn't be logged as they add no value.

Some disks record the time the temperature changed with the prior temp, so it appears to be lying to you if you actually collect the data yourself and look at the logs.

Did I confuse you yet? I'm confused too!

I just checked my disks, the system clock, and the actual attributes. I have 2 models of disks (all WD though) and none of my disks have matching temperatures as reported by scttemp and the attributes.

So I think the proper way to use this data is to use scttemp for historical data, but the actual attribute for "right here right now". Even using scttemp for historical data is apparently not necessarily reliable and it is recommended you gather the data yourself via the attribute and a running script if you wish to use it to analyze historical temps.
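
For the "gather the data yourself" route, something like the following Python sketch could work. It is only an illustration, not a tested monitoring tool: the file name `drive_temps.csv`, the five-minute interval, and the `/dev/ada0` to `/dev/ada5` device names in the usage comment are assumptions, and it expects attribute 194 to appear in the same ten-column layout as the smartctl output quoted in this thread.

```python
import csv
import subprocess
import time
from datetime import datetime


def temp_from_output(text):
    """Return the raw value of attribute 194 from `smartctl -A` output, or None."""
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows have ten columns; the raw value is the tenth.
        if len(fields) >= 10 and fields[0] == "194":
            return int(fields[9])
    return None


def read_temp(device):
    """Ask smartctl for the attribute table of one device and extract the temperature."""
    result = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True
    )
    return temp_from_output(result.stdout)


def log_forever(devices, path="drive_temps.csv", interval_s=300):
    """Append one timestamped row per device to a CSV file, then sleep and repeat."""
    while True:
        with open(path, "a", newline="") as f:
            writer = csv.writer(f)
            for dev in devices:
                stamp = datetime.now().isoformat(timespec="seconds")
                writer.writerow([stamp, dev, read_temp(dev)])
        time.sleep(interval_s)


# Usage (hypothetical device names, run as root):
#   log_forever([f"/dev/ada{i}" for i in range(6)])
```

Because the timestamps come from the system clock rather than the drive, the resulting log does not depend on when the disk decides to update its own history table.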
 

MtK

Patron
Joined
Jun 22, 2013
Messages
471
so I guess it is safe to assume the actual temp is below 40c... THANKS!
 

MtK

Patron
Joined
Jun 22, 2013
Messages
471
apparently not.
I'm still getting a warning about temperature being above 40 on boot...
 

Knowltey

Patron
Joined
Jul 21, 2013
Messages
430
You're absolutely right, and Cyberjock and I were already discussing this at length weeks ago about my drive temps being "too low"!!!

But meh, what am I going to do? Put a heater in there? And in any case, the failure curve for "low" temperature is when the drive is first put into service. Low temperatures don't seem to matter once the drive has been in service for a certain number of hours.

Whereas, high temperatures are always bad, at any time, in the drive's life.

Mine have a min of 11c and a max of 31c. Will this cause issues with the drives, or is it not a big enough problem to really be a concern?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
MtK said:
apparently not.
I'm still getting a warning about temperature being above 40 on boot...

Keep in mind as you start using your disks, temperature will go up. Especially if you do a scrub.

Are you sure you are checking the right disk?
 

MtK

Patron
Joined
Jun 22, 2013
Messages
471
cyberjock said:
Keep in mind as you start using your disks, temperature will go up. Especially if you do a scrub.

yes, of course, but this happens even on the SMART boot test, not necessarily under stress.

cyberjock said:
Are you sure you are checking the right disk?

yes. well, the problem is with all 6 drives, not with the USB stick :)
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Hmm.. I don't know what to say. You're the only one with that particular condition.. :(

It makes me wonder what is different about your configuration compared to everyone else's.
 

MtK

Patron
Joined
Jun 22, 2013
Messages
471
cyberjock said:
Hmm.. I don't know what to say. You're the only one with that particular condition.. :(

It makes me wonder what is wrong with your configuration over everyone else's

6 failing drives...? :)
 

MtK

Patron
Joined
Jun 22, 2013
Messages
471
UPDATE:
I added one more Stock Fractal Define Fan (from another case I have).
so now there are 2 front intake + 1 top/rear exhaust + 1 bottom exhaust.
apparently this makes the airflow a lot (!) better:
Code:
# smartctl -a /dev/ada0
smartctl 6.1 2013-03-16 r3800 [FreeBSD 9.1-STABLE amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
 
=== START OF INFORMATION SECTION ===
Model Family:     Western Digital RE Serial ATA
Device Model:     WDC WD2500YS-23SHB0
Serial Number:    WD-WCANY3785393
LU WWN Device Id: 5 0014ee 1ab39da37
Firmware Version: 20.06C04
User Capacity:    250,059,350,016 bytes [250 GB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA/ATAPI-7 (minor revision not indicated)
Local Time is:    Fri Dec  6 14:51:55 2013 IST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
 
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
 
General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                ( 8280) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  97) minutes.
Conveyance self-test routine
recommended polling time:        (   6) minutes.
SCT capabilities:              (0x103f) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.
 
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0003   190   187   021    Pre-fail  Always       -       5466
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       93
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   200   200   051    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   092   092   000    Old_age   Always       -       6183
 10 Spin_Retry_Count        0x0013   100   253   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0013   100   253   051    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       90
194 Temperature_Celsius     0x0022   120   098   000    Old_age   Always       -       30
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0009   200   200   051    Pre-fail  Offline      -       0
 
SMART Error Log Version: 1
No Errors Logged
 
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      6169         -
# 2  Extended offline    Completed without error       00%      6137         -
 
SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
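
For checking a dump like this against the 40 degree warning level without reading it by eye, a small Python sketch along these lines could help. The function name and the `WARN_AT_C` constant are made up for illustration, and the parser assumes the ten-column attribute table shown above; it is not a general smartctl parser.

```python
def parse_attributes(text):
    """Map attribute name -> normalized and raw values from a smartctl attribute table."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID and have at least ten columns.
        if len(fields) >= 10 and fields[0].isdigit():
            attrs[fields[1]] = {
                "id": int(fields[0]),
                "value": int(fields[3]),
                "worst": int(fields[4]),
                "thresh": int(fields[5]),
                "raw": fields[9],
            }
    return attrs


WARN_AT_C = 40  # the warning level discussed in this thread; an assumption, adjust to taste

# Two rows copied from the output above, plus the header line.
sample = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   092   092   000    Old_age   Always       -       6183
194 Temperature_Celsius     0x0022   120   098   000    Old_age   Always       -       30
"""

attrs = parse_attributes(sample)
temp = int(attrs["Temperature_Celsius"]["raw"])
print("too warm" if temp > WARN_AT_C else "ok", temp)
```
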


the only thing I need to solve now is the noise :)
 

Knowltey

Patron
Joined
Jul 21, 2013
Messages
430
MtK said:
the only thing I need to solve now is the noise :)

You'll get used to it. In my living room I have an Antec 900 with 4 120s and a 180 (iirc) running at full tilt, as well as my NAS in an Antec 300 with 3 120mms and a 140 running at full tilt with its heatsink fan set to full go, and then my webserver with 2 120s going full. I don't even notice it anymore, but most people are like "Holy shit, how can you stand it in here?"
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I know a few years ago my server sat in my computer room. One summer it was particularly hot in Washington State (no A/C, as it's only "hot" for like 2 weeks of the year). The temp alarm started going off, so I shut the server down. It was so quiet in that room it was disorienting. I was so used to the noise.
 