CPU overheated - how to assess the damage?

Status
Not open for further replies.

Ceetan

Contributor
Joined
Apr 29, 2016
Messages
139
Last night my CPU overheated, in the sense that it reached its critical temperature threshold. My current hypothesis is that it did so due to faulty use of mprime (Prime95), so I've made an attempt to correct the issue. I guess we will see tonight whether it worked (see logs below). This happened for about an hour, I think. My question now is: how can I assess the damage it might have done to the CPU and the motherboard socket?

Bonus question: I ran the torture test at one point and the CPU didn't do that well. Is that something to be expected, given its type, or should I contact the retailer about it? Alternatively, assuming I didn't do it quite correctly, how do you go about testing it?


Sent from my iPhone using Tapatalk
 

Ceetan

Contributor
Joined
Apr 29, 2016
Messages
139
Logs
Code:


Event Type Timestamp Sensor Type Sensor Event Type
511 System Event 2016/12/29 02:21:56 Thu Processor CPU Temp De-assertion: defineq.Processor| Event = Processor Automatically Throttled  BusFF(DevFnFF)
512 System Event 2016/12/29 02:21:56 Thu Temperature CPU Temp De-assertion: Upper Critical - going high
510 System Event 2016/12/29 02:07:36 Thu Temperature CPU Temp Assertion: Upper Critical - going high
509 System Event 2016/12/29 02:07:16 Thu Processor CPU Temp Assertion: defineq.Processor| Event = Processor Automatically Throttled  BusFF(DevFnFF)
508 System Event 2016/12/29 02:07:14 Thu Processor CPU Temp De-assertion: defineq.Processor| Event = Processor Automatically Throttled  BusFF(DevFnFF)
507 System Event 2016/12/29 02:07:04 Thu Processor CPU Temp Assertion: defineq.Processor| Event = Processor Automatically Throttled  BusFF(DevFnFF)
506 System Event 2016/12/29 02:07:00 Thu Processor CPU Temp De-assertion: defineq.Processor| Event = Processor Automatically Throttled  BusFF(DevFnFF)
505 System Event 2016/12/29 02:06:59 Thu Processor CPU Temp Assertion: defineq.Processor| Event = Processor Automatically Throttled  BusFF(DevFnFF)
503 System Event 2016/12/28 02:13:34 Wed Processor CPU Temp De-assertion: defineq.Processor| Event = Processor Automatically Throttled  BusFF(DevFnFF)
504 System Event 2016/12/28 02:13:34 Wed Temperature CPU Temp De-assertion: Upper Critical - going high
502 System Event 2016/12/28 02:07:28 Wed Temperature CPU Temp Assertion: Upper Critical - going high
501 System Event 2016/12/28 02:07:07 Wed Processor CPU Temp Assertion: defineq.Processor| Event = Processor Automatically Throttled  BusFF(DevFnFF)
500 System Event 2016/12/28 02:07:06 Wed Processor CPU Temp De-assertion: defineq.Processor| Event = Processor Automatically Throttled  BusFF(DevFnFF)
499 System Event 2016/12/28 02:07:03 Wed Processor CPU Temp Assertion: defineq.Processor| Event = Processor Automatically Throttled  BusFF(DevFnFF)
498 System Event 2016/12/28 02:07:02 Wed Processor CPU Temp De-assertion: defineq.Processor| Event = Processor Automatically Throttled  BusFF(DevFnFF)
497 System Event 2016/12/28 02:06:59 Wed Processor CPU Temp Assertion: defineq.Processor| Event = Processor Automatically Throttled  BusFF(DevFnFF)
496 System Event 2016/12/28 02:06:57 Wed Processor CPU Temp De-assertion: defineq.Processor| Event = Processor Automatically Throttled  BusFF(DevFnFF)
495 System Event 2016/12/28 02:06:55 Wed Processor CPU Temp Assertion: defineq.Processor| Event = Processor Automatically Throttled  BusFF(DevFnFF)
490 System Event 2016/12/27 23:02:50 Tue Processor CPU Temp De-assertion: defineq.Processor| Event = Processor Automatically Throttled  BusFF(DevFnFF)
489 System Event 2016/12/27 23:02:49 Tue Processor CPU Temp Assertion: defineq.Processor| Event = Processor Automatically Throttled  BusFF(DevFnFF)
488 System Event 2016/12/27 23:02:47 Tue Processor CPU Temp De-assertion: defineq.Processor| Event = Processor Automatically Throttled  BusFF(DevFnFF)
487 System Event 2016/12/27 23:02:46 Tue Processor CPU Temp Assertion: defineq.Processor| Event = Processor Automatically Throttled  BusFF(DevFnFF)
486 System Event 2016/12/27 23:02:33 Tue Processor CPU Temp De-assertion: defineq.Processor| Event = Processor Automatically Throttled  BusFF(DevFnFF)
485 System Event 2016/12/27 23:02:32 Tue Processor CPU Temp Assertion: defineq.Processor| Event = Processor Automatically Throttled  BusFF(DevFnFF)
484 System Event 2016/12/24 21:47:24 Sat Processor CPU Temp De-assertion: defineq.Processor| Event = Processor Automatically Throttled  BusFF(DevFnFF)
483 System Event 2016/12/24 21:38:23 Sat Processor CPU Temp Assertion: defineq.Processor| Event = Processor Automatically Throttled  BusFF(DevFnFF)
481 System Event 2016/12/24 21:38:21 Sat Processor CPU Temp De-assertion: defineq.Processor| Event = Processor Automatically Throttled  BusFF(DevFnFF)
482 System Event 2016/12/24 21:38:21 Sat Temperature CPU Temp De-assertion: Upper Critical - going high
480 System Event 2016/12/24 21:37:22 Sat Temperature CPU Temp Assertion: Upper Critical - going high
479 System Event 2016/12/24 21:37:19 Sat Processor CPU Temp Assertion: defineq.Processor| Event = Processor Automatically Throttled  BusFF(DevFnFF)
478 System Event 2016/12/24 21:20:31 Sat Temperature CPU Temp Assertion: Upper Critical - going high
477 System Event 2016/12/24 21:20:29 Sat Processor CPU Temp Assertion: defineq.Processor| Event = Processor Automatically Throttled  BusFF(DevFnFF)
476 System Event 2016/12/24 21:20:23 Sat Fan FAN1 Assertion: Upper Critical - going high
474 System Event 2016/12/24 20:49:08 Sat Processor CPU Temp De-assertion: defineq.Processor| Event = Processor Automatically Throttled  BusFF(DevFnFF)
475 System Event 2016/12/24 20:49:08 Sat Temperature CPU Temp De-assertion: Upper Critical - going high
473 System Event 2016/12/24 20:48:43 Sat Temperature CPU Temp Assertion: Upper Critical - going high
472 System Event 2016/12/24 20:48:40 Sat Processor CPU Temp Assertion: defineq.Processor| Event = Processor Automatically Throttled  BusFF(DevFnFF)
471 System Event 2016/12/24 20:48:34 Sat Fan FAN1 Assertion: Upper Critical - going high
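
For anyone wanting to check how long the CPU actually sat above the upper critical threshold, the assertion/de-assertion pairs in the log above can be paired up. This is only a rough sketch: it assumes the row layout shown in this log (other BMC firmware may print events differently), and the names `ROW`, `SAMPLE` and `critical_episodes` are made up for illustration.

```python
import re
from datetime import datetime

# A few rows copied from the log above (sensor type "Temperature", sensor "CPU Temp").
SAMPLE = """\
512 System Event 2016/12/29 02:21:56 Thu Temperature CPU Temp De-assertion: Upper Critical - going high
510 System Event 2016/12/29 02:07:36 Thu Temperature CPU Temp Assertion: Upper Critical - going high
504 System Event 2016/12/28 02:13:34 Wed Temperature CPU Temp De-assertion: Upper Critical - going high
502 System Event 2016/12/28 02:07:28 Wed Temperature CPU Temp Assertion: Upper Critical - going high
"""

# Assumed row layout: ID, "System Event", timestamp, weekday, sensor type, sensor, event.
ROW = re.compile(
    r"^(?P<id>\d+) System Event (?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \w{3} "
    r"Temperature CPU Temp (?P<kind>Assertion|De-assertion): Upper Critical"
)


def critical_episodes(log_text):
    """Return (start, end, seconds) for each Upper Critical episode, oldest first."""
    events = []
    for line in log_text.splitlines():
        m = ROW.match(line.strip())
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y/%m/%d %H:%M:%S")
            events.append((ts, int(m.group("id")), m.group("kind")))
    events.sort()  # chronological order, ties broken by event ID

    episodes, start = [], None
    for ts, _event_id, kind in events:
        if kind == "Assertion":
            if start is None:  # keep the earliest assertion of an episode
                start = ts
        elif start is not None:
            episodes.append((start, ts, int((ts - start).total_seconds())))
            start = None
    return episodes


if __name__ == "__main__":
    for start, end, secs in critical_episodes(SAMPLE):
        print(f"{start} -> {end}: {secs} s above the upper critical threshold")
```

Rows that do not match the pattern (throttle and fan events) are simply skipped, and an assertion with no later de-assertion is left out rather than guessed at.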
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
I'd expect there to be no damage. Your CPU started throttling to cool itself down.

You need to improve your CPU/case cooling.

Sounds like correct mprime usage, although I would've been monitoring the temperatures if I were running an mprime torture test.
 

Ceetan

Contributor
Joined
Apr 29, 2016
Messages
139
That may well be. I don't know what to do about that, though. I wasn't very clear on the details. The throttling that happened around 23:00, in both cases, was due to intended mprime usage, and I did keep a close watch on the temperature at that time. The stuff that happened around 2:00 AM, however, was not. I don't even know whether mprime was on at that time (I certainly didn't intend for it to be running, and in fact I thought I had removed it from the system). What scares me is that I have no clue what stressed the CPU at that time. For all I know, it should have been idle. (I was running the fan control script in a tmux session at the time.)

Which of the three torture tests do you recommend?
 

tvsjr

Guru
Joined
Aug 29, 2015
Messages
959
With modern processors, it would take EXTREME abuse (basically intentional abuse) to get them to damage themselves from overheating. Improve your cooling (substantially) and carry on.
 

Ceetan

Contributor
Joined
Apr 29, 2016
Messages
139
I guess that's what I will have to do. I am a bit surprised, though. I was led to believe the Intel Core i3-6100 was quite easy to cool, and was told the stock cooler would likely be enough. When I run the torture test on all 4 threads simultaneously, it reaches 95-100 degrees within seconds. I get the feeling something is off, but I cannot quite figure out what.

@nojohnny101 or @Kevin Horton, does either of you have an idea of what could be going on, since you have the same case?
 

tvsjr

Guru
Joined
Aug 29, 2015
Messages
959
The stock cooler should be up to the task, assuming it's clean and the fan is running properly. I would suspect poor thermal coupling between the CPU and the heatsink... either no or too much thermal grease or improper torque on the bolts/clips holding the heatsink down. What connector do you have the CPU fan plugged into?

There are rare instances where either the bottom of the cooler or the top of the CPU isn't perfectly flat, causing issues. As I said, this is quite a rare condition, so I wouldn't jump to this conclusion until you've tested absolutely everything else.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
it reaches 95-100 degrees within seconds.
It's normal that it heats up very quickly. That's Prime95 for you.

However, the system should keep it at 90 degrees maximum. Investigate the possibility of a poorly-seated cooler.
 

nojohnny101

Wizard
Joined
Dec 3, 2015
Messages
1,478
That would be my first guess as well, poor or improper seating between the heatsink and the CPU itself.
 
Joined
Dec 2, 2015
Messages
730
Grasping at straws for possible explanations:

1. Is it possible that the BMC had locked up and left the CPU fan and/or the chassis fans on that side of the case at low rpm? I've had more than one occurrence where the BMC locked up on me. Usually the fans go to max rpm, but I've had at least one occurrence where they ended up at idle rpm. Any script that controls fans must have some sort of bulletproof check for a locked-up BMC, and automatically reset the BMC if necessary.

2. What type and number of fans do you have on the motherboard side of the chassis pushing air in, and how many pulling air out?
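
The locked-up-BMC check from point 1 could look roughly like the sketch below. It assumes `ipmitool` is installed and can talk to the local BMC; the function names `run_ipmitool` and `ensure_bmc_alive` are hypothetical and not taken from any existing fan script.

```python
import subprocess


def run_ipmitool(args, timeout=10):
    """Run ipmitool with the given arguments; return True on a clean exit."""
    try:
        result = subprocess.run(
            ["ipmitool", *args], capture_output=True, timeout=timeout
        )
    except (subprocess.TimeoutExpired, OSError):
        return False  # a hang or a missing binary counts as "not responding"
    return result.returncode == 0


def ensure_bmc_alive(run=run_ipmitool, retries=3):
    """Return 'ok' if the BMC answers a fan sensor query, 'reset' if a cold
    reset was issued, or 'failed' if even the reset command did not go through."""
    for _ in range(retries):
        if run(["sdr", "type", "Fan"]):
            return "ok"
    return "reset" if run(["bmc", "reset", "cold"]) else "failed"
```

A fan script would presumably call `ensure_bmc_alive()` at the top of every control loop iteration and, after a `'reset'`, wait for the BMC to come back before re-applying its fan settings. The timeout matters, since a hung BMC can leave the query blocking rather than failing.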

Good luck figuring this out.

Kevin
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Since everyone is putting in their two cents...

1) Never run a CPU stress test for a long period of time; it's not needed and can only cause bad things to happen. Run it for 20 minutes, and if your temps are fine then you are done. I figure that if you don't see anything within 20 minutes then you should be good. Also, you can damage your CPU by running it at these extremes for long periods of time. Is your CPU damaged? I don't know, but there's not much you can do about it now. Also, running your CPU that hard puts extra stress on the motherboard voltage regulators, which again is not a good thing. I'm curious what guide you were following.

2) As others have said, you have a cooling problem and you already know you need to figure that out.

Good luck and hopefully you can resolve your situation in a short amount of time.

HAPPY NEW YEAR!
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
I must disagree. If it does not survive several hours of torture testing, it has no business being on the market. It'd be a figurative time bomb, just waiting for some script kiddy to destroy a few by running Prime95.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Several hours is not the same as letting it run overnight. I do understand that a properly cooled CPU, even at 100% utilization, should be perfectly fine, but letting a CPU sit at its thermal shutdown/throttle limit for a long period of time is excessive. Most motherboards monitor the CPU temperature and can shut down the system or throttle it before the maximum temperature is reached, for example if the CPU fan were to fail.

I guess you could think of it this way... If you didn't attach a heatsink to the CPU and ran the system, would the CPU survive a Prime95 test for 24 or more hours? Basically what I'm hearing is that the thermal limit would be hit, the CPU would throttle, and all would be good in life. Of course this would be an extreme, but I'm just trying to make a point. If you don't like the no-heatsink example, then let's just say you add a 1" cube of copper to the CPU as a passive heatsink. It's a crappy heatsink, but it will sink the initial heat generated; it will just fail to dissipate heat well.

We have disagreed before and will again, and I'm fine with that. I learn at the same time, and you might be able to convince me that I'm wrong. I'm good with that, but I need some sort of proof. Maybe Intel has a paper on its thermal throttling being able to protect a CPU from a total meltdown; that would be a good read.

Happy New Year!

EDIT: I digress, modern Intel CPUs "should" be able to survive, but I have no idea how long.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
I see what you mean. Yeah, keeping it running after it's been established that it's throttling at 95-100 degrees Celsius is not a good idea.

My point was just that the processor must pass a long torture test *assuming proper cooling*.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
I do agree that given proper cooling, a CPU should be able to pass a long torture test. For me 20 minutes is long enough to ensure your system is cooling properly, in a closed up case. I'm not sure what the benefit is of running it longer unless the test is to find out if the CPU might die from infant mortality.

2 Hour 43 Minutes until 2017!

Edit: BTW, if my CPU ever got to 85C, I'd stop it and fix the cooling. I have never seen a CPU get above 85C, but I have heard of it. My i7 in my main computer gets up to 70C when crunching numbers at 100% utilization (running BOINC) and I still want it to be lower in temp but I know 70C is actually good. I actually don't run BOINC at 100% often, typically it runs at 30% or 70% depending on how I feel.
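
The "20 minutes, and stop at 85C" approach can be automated instead of watched by hand. The following is a minimal sketch, not a tested tool: `run_guarded` is a hypothetical name, and `read_temp_c` is whatever function you supply to read the CPU temperature on your system. The 85 degree limit and 20 minute cap are just the figures from this post.

```python
import subprocess
import time


def run_guarded(cmd, read_temp_c, limit_c=85.0, max_seconds=20 * 60, poll_seconds=5.0):
    """Run cmd, stopping it early if read_temp_c() reaches limit_c.

    Returns 'overheated', 'time limit' or 'finished'.
    """
    proc = subprocess.Popen(cmd)
    deadline = time.monotonic() + max_seconds
    try:
        while proc.poll() is None:
            if read_temp_c() >= limit_c:
                return "overheated"
            if time.monotonic() >= deadline:
                return "time limit"
            time.sleep(poll_seconds)
        return "finished"
    finally:
        # Make sure the stress test never outlives the guard.
        if proc.poll() is None:
            proc.terminate()
            proc.wait()
```

For a torture test, `cmd` would be something like `["mprime", "-t"]` (assuming mprime is on the PATH and that `-t` starts the torture test in your version). The `finally` block is there so the stress process is also stopped if the temperature reader itself raises an error.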
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
When testing thermals you should set your BMC to maximum cooling, i.e. 100% fans. Then run your torture test. The temps *should* stabilize below a critical temp, say 95C. Look up the max junction temp for your CPU.

If they do not, you have inadequate cooling.

Only once your cooling is adequate at 100% should you consider further optimizing with custom fan controllers.

Sometimes it's necessary to run an extended test because you need to ensure that the chassis is fully heat soaked, and in my case, also that the system was capable of lasting across the weekend without a/c while under load.
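
One way to turn "the temps should stabilize below a critical temp" into a concrete check is sketched here. It assumes you have logged temperature readings at a fixed interval while the test runs. The function name, the window size and the 1 degree drift tolerance are arbitrary illustrative choices, and the limit should come from your CPU's documented maximum junction temperature.

```python
def has_stabilized(samples_c, limit_c, window=10, max_drift_c=1.0):
    """True if the last `window` readings all stay below limit_c and
    differ from each other by at most max_drift_c degrees."""
    if len(samples_c) < window:
        return False  # not enough data yet to call it stable
    recent = samples_c[-window:]
    return max(recent) < limit_c and (max(recent) - min(recent)) <= max_drift_c


# Example: a run that climbs and then levels off around 78 C.
readings = [50.0, 60.0, 70.0] + [78.0, 78.5] * 5
print(has_stabilized(readings, limit_c=95.0))
```

For the heat-soak case mentioned above, a much longer window makes sense, since the chassis can take a long time to reach equilibrium even after the CPU reading looks flat.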
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
also that the system was capable of lasting across the weekend without a/c while under load.
I can see that in a critical business system (critical being the operative word), but for home use, which is what I typically think of, I prefer to have the system shut down for a thermal event.
 

Ceetan

Contributor
Joined
Apr 29, 2016
Messages
139
Repositioning the CPU cooler seems to have solved the issue. The CPU temperature is also much less jumpy now.


 