Memory Error on FreeNAS Mini

Status
Not open for further replies.

Constantin

Vampire Pig
Joined
May 19, 2017
Messages
1,829
Good catch. Looks like the BMC fan controller is as useless in your system as it is in mine. I upgraded my main case fan (the 80mm in the back) to a Noctua model and set its output to 80% permanently because the BMC doesn't track / react to HD temperatures.

However, the BMC should be able to deal with CPU temperature issues. Isn't there a small thermal probe that is used for that purpose?

Your best course of action is likely to get a replacement FreeNAS mini motherboard. Extracting the motherboard is not trivial but now you have experience, right? :mad:
 

Redcoat

MVP
Joined
Feb 18, 2014
Messages
2,925
Thanks for the moral support. Yes, I've had the MB out three times all told now - the first two without sliding out the tray because I didn't know it was there (before @wblock wrote his tutorial). Even then the longest part was tagging/recording cable locations - 30 minutes out and in probably. 20 minutes taking the tray out I would say.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
So if the CPU temps are going crazy then I must ask the stupid questions, so don't take offence.

1) When you got the replacement motherboard, was the CPU and heatsink already mounted? I suspect it was.
2) You have verified that all the case fans are running properly, pushing air across the CPU? This means that the case must be closed to work properly.
3) Was it working fine before the BMC firmware upgrade? If you had no previous issues then I would expect the BMC firmware caused the issue.
4) How is the cable management in the case? Is it all done well to ensure there is a lot of airflow over the CPU heatsink?

Here is a link and a long 5 page thread about this motherboard CPU overheating. It boils down to air flow.
 

Redcoat

MVP
Joined
Feb 18, 2014
Messages
2,925
@joeschmuck, no way would I take offense at anything you say or ask, especially not in this context where you are filling in the holes in my knowledge. You've asked questions I am asking myself. And I'm happy to answer them for the reference of others.
Yes the CPU/Heatsink was mounted - the CPU is integral with this board.
There is one case fan (and one tiny one in the PS) - it's where it always was, never been removed and is pushing air to the best of its ability. Yes, case has always been in place and all factory vents kept clean.
The first couple of warning shots in FreeNAS predated the recent BMC upgrade - the version of the BMC in the warranty-replacement board was different from that of the original and I had that running since May of this year. I didn't notice anything amiss until the FreeNAS warnings, but I was not looking at the sensor readings so I could have missed it. Updating the BMC wiped the logs (which I should have anticipated but didn't). I do recall that there had been some CPU temp excursions.
Cable management I would say isn't too bad - and it is as iXsystems supplied it. There's no significant obstruction.
Yes, I agree it all boils down to air flow.
With respect to the steps forward, I want to talk to iXsystems first as this board may be considered to be in the three-year extended warranty period.
Thanks again for your help and support.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
I'm scratching my head. If basically nothing has changed AND the system worked fine when you first got it then I must suspect the motherboard to be faulty. I recommend contacting iXsystems to see what they can do, maybe there is a BIOS setting that needs to be adjusted.

I wish I could offer something else to do but I'm fresh out of ideas. The only good thing to come from this is knowing it's not a RAM issue and you are not chasing your tail on that one.

Did I read this correctly, you are on motherboard #3 ? Maybe a full refund is in order and you go build yourself your own system. Cyber Monday is almost here. Keep the hard drives, refund for the computer, buy a new computer, maybe the HP ML10 Gen 9, and call it a day. Or build one from scratch, that is always fun. If you cannot resolve this issue with iXsystems then I'd install a fan to force air across the heatsink. You just don't have much space to work with so being creative may be required. I have a lot of ideas however they require cutting into the case. Also since I don't have the system in my hands, it's a bit difficult to give accurate advice on the next step.

I sincerely wish you the best of luck on this one.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
It could be a bad batch of thermal paste that dries out after a bit. Just a possibility I'm throwing in the ring.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
It could be a bad batch of thermal paste that dries out after a bit. Just a possibility I'm throwing in the ring.
I was thinking that too but from what I could find out this is the ceramic cement type thermal compound. This stuff can be removed and replaced however it can be risky if you are not very patient, and even then you could break something. I would never recommend anyone try this unless they had no other option. Also, I don't know if this material does thermally break down over time. This stuff is used on GPUs all the time.

Presently I think air flow needs to be verified and maybe another option is to place the large vent fan on the back of the case on high speed, direct connection to 12VDC so you know it's running at full speed, then check everything again, but not the CPU stress test. Even a passive cooling CPU can only do so much.
 

Redcoat

MVP
Joined
Feb 18, 2014
Messages
2,925
It could be a bad batch of thermal paste that dries out after a bit. Just a possibility I'm throwing in the ring.
Thanks @Ericloewe - I've been reviewing the history as best I can because I'm sure "something changed".

@joeschmuck - no, I'm on mobo 2 (one removal from case was for the recent memory reseat exercise).

I looked at the performance of my system back in 2016 when the thread you referenced was initiated - and found nothing untoward with mine, so didn't focus on it thereafter. My impression now is that either the new board I received was/is deficient in some way, or there has been some degradation of either the hardware/firmware or operating conditions. My home office has A/C running 24/7/365 for two servers and a desktop which are never shutdown. Cases and filters are kept clean.

I have all of the options and remedies you mentioned in mind, including moving to another, more inherently robust, hardware/bmc grouping. This experience has also firmed my resolve on a 3d printer purchase to allow elegant cooling duct production.

Thanks, both.
 

Constantin

Vampire Pig
Joined
May 19, 2017
Messages
1,829
FWIW, I am so happy with the fan control in the Mini XL that I'm upgrading to a H440. The H440 has three 120mm fans for the 11 HDD spots up front and another exhaust fan in the rear. It will be interesting to see how well that keeps the CPU and the HDDs cool, especially since the HDD temperatures seem to have little correlation to those of the CPU.

If the answer is negative then I might take it a step further and fit a CPU cooler to the stock CPU fan connector on the motherboard and add a separate fan controller to cool the HDDs independently from the BMC. I'm still looking for a good controller, may have to roll my own. Oh well!
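For anyone else thinking of rolling their own, the core of an HDD-driven fan controller is just a mapping from the hottest drive temperature to a fan duty cycle. Here is a minimal sketch of that mapping only. The function name, the 30C/40C ramp points and the 30% floor are all my assumptions for illustration, not anything the BMC or iXsystems uses, and reading the drive temperatures and actually setting the fan speed are left out because they depend on the board.

```python
def fan_duty(hdd_temps_c, low_c=30.0, high_c=40.0, min_duty=30, max_duty=100):
    """Map the hottest drive temperature (Celsius) to a fan duty cycle in percent.

    All thresholds are hypothetical defaults chosen for illustration.
    """
    if not hdd_temps_c:
        # No readings available: fail safe by running the fan flat out.
        return max_duty

    hottest = max(hdd_temps_c)
    if hottest <= low_c:
        return min_duty
    if hottest >= high_c:
        return max_duty

    # Linear ramp between the two thresholds.
    fraction = (hottest - low_c) / (high_c - low_c)
    return round(min_duty + fraction * (max_duty - min_duty))


if __name__ == "__main__":
    # Example: four drives, hottest at 35C, lands halfway up the ramp.
    print(fan_duty([31, 33, 35, 32]))
```

Driving the duty from the hottest drive rather than the average is deliberate: one badly placed drive can cook while the average still looks fine.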
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
My home office has A/C running 24/7/365 for two servers and a desktop which are never shutdown. Cases and filters are kept clean.
This is clearly not a problem with the ambient air supply but either poor air flow, bad thermal compound, or bad motherboard.

This experience has also firmed my resolve on a 3d printer purchase to allow elegant cooling duct production.
I was looking at one of these things 2 days ago. While it does give me a drool factor, I will not be purchasing one. I just don't have the need for it. One of my employees is planning to buy a 3D printer so I'll crack the whip and beg him to create some stuff on my behalf, and of course pay him for it, or let him have Christmas off. :rolleyes:

So when I was looking at photos of the FreeNAS case I was also looking at good ways to remove heat from your CPU. Here is what I came up with and I think it would actually work very well, assuming this is an air flow issue only. On the rear of the case is a large fan so what you want to do is create a shroud that covers this fan completely and out of the bottom portion a tunnel leads to the heatsink. When you get to the heatsink the tunnel opens up to surround the heatsink. This means that the RAM sticks are on the outside of this tunnel. The tunnel will drop down only as deep as it needs to go without interfering with any components when the motherboard is removed by sliding it out on the tray. So let's say you need more air flow for the motherboard and hard drives, well you could have a predetermined cutout at the top of the fan shroud to allow one third of the air flow to pass through. In my mind I would open up 60% of the air shroud and then build a few small covers or one sliding cover in order to adjust that air flow. Hey, if you have a 3D printer then you could do a lot of different things.

Also, if you are stuck with the FreeNAS case then I'd highly recommend one modification that will help air flow greatly. Cut out all the honeycomb metal on the case fan hole and replace it with a wire fan cover. This works great, but also file/grind the edge smooth. I like using a Dremel tool for this kind of work. I'd do that for the power supply exhaust as well. But if you are considering shipping the system back to iXsystems, don't modify the case.

My 6TB drives arrived an hour ago, time to crack open the box and see what I got and burn them in.
 

Redcoat

MVP
Joined
Feb 18, 2014
Messages
2,925
An update for anyone interested:
Today I cut holes in the front of the chassis and mounted a couple of side-by-side 40mm fans directly in front of the open end of the fins of the CPU passive heat-sink. The two fans are pulling ambient air in through the case front grillwork and discharging it in a manner that hopefully will cause some flow between the heat-sink fins to increase the convective element of heat transfer from the CPU. I also installed a rear case fan of greater spec capacity than that of the original. I set the bios to run these 3 fans at max speed constantly.
A cpu stress test has been running now for about 60 minutes - situation is much improved, CPU temperature not running away, so far highest is 86C, one degree above the Upper Critical, but stable. Now I am hopeful that I can checkout the CPU then move on to the memory, and find out what ails this machine.

I'm still very concerned that I don't have an answer as to why this high temp problem seems to have appeared only recently.
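Since I wasn't watching the sensor readings before, here is a rough sketch of how a reading like the 86C above could be checked against the Upper Critical threshold automatically. It assumes the ten-column, pipe-separated layout that `ipmitool sensor` prints (name, reading, unit, status, then the six thresholds with upper critical in the ninth column). The sample text is made up for illustration and is not output from my board, and the function names are my own.

```python
# Hypothetical sample in the style of `ipmitool sensor` output.
SAMPLE = """\
CPU Temp         | 86.000     | degrees C  | cr    | 0.000     | 0.000     | 0.000     | 80.000    | 85.000    | 90.000
System Temp      | 38.000     | degrees C  | ok    | -9.000    | -7.000    | -5.000    | 80.000    | 85.000    | 90.000
FAN1             | na         |            | na    | na        | na        | na        | na        | na        | na
"""


def parse_temperatures(text):
    """Return a list of dicts for every temperature sensor with a reading."""
    readings = []
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) != 10:
            continue  # not a sensor row in the assumed layout
        name, value, unit = fields[0], fields[1], fields[2]
        upper_critical = fields[8]
        if unit != "degrees C" or value == "na":
            continue  # skip fans, voltages and sensors with no reading
        readings.append({
            "name": name,
            "value": float(value),
            "upper_critical": None if upper_critical == "na" else float(upper_critical),
        })
    return readings


def over_upper_critical(readings):
    """Names of sensors whose reading exceeds their upper critical threshold."""
    return [
        r["name"]
        for r in readings
        if r["upper_critical"] is not None and r["value"] > r["upper_critical"]
    ]


if __name__ == "__main__":
    print(over_upper_critical(parse_temperatures(SAMPLE)))
```

Run from cron against the real command output and appended to a file, something like this would also keep a temperature history that survives a BMC firmware update.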
 

Redcoat

MVP
Joined
Feb 18, 2014
Messages
2,925
I opened it up again and added another fan - an 80 mm, this time horizontally directly above the heatsink, pulling air upwards. It's not connected to the heatsink, but actually sitting on the top edges of the memory sticks with a small rubber rod in each mounting hole that drops down between the stick in each pair. It's also running at 100% from the bios.

Stress test is running again - I have not seen a temp over 65C yet - and then only for a moment. This looks like a significant improvement in the situation. More later.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
I don't have the dimensions of the heatsink, but could you take an 80mm fan and, using two or four screws (realizing it may not be square to the heatsink), mount it directly to the heatsink? I don't know if the RAM would allow this. If you needed some height clearance then a tube similar to a straw could be used to provide the extra distance. I am definitely an out of the box thinker.

EDIT: Also, you don't need to run the CPU stress test for that long, just run it long enough to know that you made a difference or not. Also, has the system been more stable since installing the fans?
 

Redcoat

MVP
Joined
Feb 18, 2014
Messages
2,925
System completed one run of memtest86+ (free) with no errors and no temperature excursion reported. Running a second now. On "stability" - apart from the IPMI temperature reports, the only evidence of bad behavior previously was the reports in FreeNAS of the memory errors. So, yes, more stable.

After breakfast I'll get the Pro edition of MemTest86 so that I can run ECC error injection.

Ram sticks too close together either side of heatsink to fit any fan that I had in my parts collection. Before I drive 20 miles to Microcenter I'm going to try to make a rectangle-to-square "conical" transition from stiff card with one end to "fit" to the heatsink, the other to the fan, as a test. I used to know how to lay that kind of piece out on metal for bending many moons ago...
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Before I drive 20 miles to Microcenter I'm going to try to make a rectangle-to-square "conical" transition from stiff card with one end to "fit" to the heatsink, the other to the fan, as a test. I used to know how to lay that kind of piece out on metal for bending many moons ago...
You could buy a 3D Printer :D
 

Redcoat

MVP
Joined
Feb 18, 2014
Messages
2,925
It's backordered - coming February or March - so a cardboard prototype is it for now.

Memtest86+ with ECC injection running now - no errors so far.
 

Redcoat

MVP
Joined
Feb 18, 2014
Messages
2,925
Memtest crashed on Pass 2/4 with 5 errors:

[Attached screenshot: upload_2017-11-28_15-57-55.png]

Following @joeschmuck's "rules" kindly stated earlier in this thread, we have DIMM1, Bank 0, likely in the second blue slot. So, for the next step I'm going to pull all the memory and put the suspected stick in the first priority slot, run dmidecode again to check the serial number, and run memtest just on this stick for starters.
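For the serial number check, a small sketch of pulling the slot-to-serial mapping out of `dmidecode -t memory` style output, so the stick can be tracked as it moves between slots. It assumes the usual "Memory Device" blocks with `Locator:`, `Size:` and `Serial Number:` fields. The sample text and serial numbers below are invented for illustration, not copied from my system, and the function names are my own.

```python
# Hypothetical sample in the style of `dmidecode -t memory` output.
SAMPLE = """\
Handle 0x0029, DMI type 17, 40 bytes
Memory Device
    Size: 8192 MB
    Locator: DIMM1
    Bank Locator: BANK 0
    Serial Number: AAAA0001

Handle 0x002B, DMI type 17, 40 bytes
Memory Device
    Size: No Module Installed
    Locator: DIMM2
    Bank Locator: BANK 1
    Serial Number: Not Specified
"""


def parse_memory_devices(text):
    """Split the output into one dict of key/value fields per Memory Device."""
    devices = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "Memory Device":
            current = {}
            devices.append(current)
        elif not line:
            current = None  # a blank line ends the current block
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return devices


def serial_by_locator(devices):
    """Map slot locator to serial number, skipping empty slots."""
    return {
        d["Locator"]: d.get("Serial Number")
        for d in devices
        if "Locator" in d and d.get("Size") != "No Module Installed"
    }


if __name__ == "__main__":
    print(serial_by_locator(parse_memory_devices(SAMPLE)))
```

Saving the mapping before and after moving the stick makes it easy to confirm that the serial number follows the module and not the slot.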
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Did you get a good deal on the price of the MemTest86 Pro? I should have looked for it during Cyber Monday. I'd pay $20 for it but as rare as I would use this type of tester, it's hard to justify. Well it's not that hard to justify if it were a life long license including upgrades.

Please keep us up to date on the testing.
 

Redcoat

MVP
Joined
Feb 18, 2014
Messages
2,925
MemTest86 Pro: $35 with updates for 6 months. But it's the only way to get error injection for ECC.
Looks like your memory mapping was right on the money - thanks once again.
Just starting memtest again with the suspect stick only...
 
