
FYI, Intel C2000 family of processors: System Fault may lead to dead system.

eldo

FreeNAS Aware
Joined
Dec 18, 2014
Messages
99
Looks like the Avoton family, used in the FreeNAS Mini, many home builds (mine included), and other products, may have a potentially fatal failure mode due to the clock generator on the chip.
Not sure if it's been reported or made known to y'all here on the forum, but wanted to share if it hasn't.


http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/atom-c2000-family-spec-update.pdf

The processor can fail to produce a clock signal required to drive the whole device


System May Experience Inability to Boot or May Cease Operation
Problem: The SoC LPC_CLKOUT0 and/or LPC_CLKOUT1 signals (Low Pin Count bus clock outputs) may stop functioning.
Implication: If the LPC clock(s) stop functioning the system will no longer be able to boot.
Workaround: A platform level change has been identified and may be implemented as a workaround for this erratum.
 

m0nkey_

FreeNAS Expert
Joined
Oct 27, 2015
Messages
2,736
You may want to open a bug report with iXsystems.
 

Ericloewe

Not-very-passive-but-aggressive
Moderator
Joined
Feb 15, 2014
Messages
16,076
That's somewhat worrying, but I haven't seen any mysterious issues that would trace back to this, so it's either age-related or a very very weird edge case.
 

MrToddsFriends

Documentation Browser
Joined
Jan 12, 2015
Messages
1,339
Workaround: A platform level change has been identified and may be implemented as a workaround for this erratum.
Can someone elaborate in more detail what this remark in the family spec update actually means?
 

Ericloewe

Not-very-passive-but-aggressive
Moderator
Joined
Feb 15, 2014
Messages
16,076
Can someone elaborate in more detail what this remark in the family spec update actually means?
It means, in essence, "New motherboard designs can work around this problem".

I speculate that it involves adding an external timer instead of relying on the internal one.
 

Stux

FreeNAS Wizard
Joined
Jun 2, 2016
Messages
4,166
It'd be interesting if this accounts for the other early-death issues, i.e. not the BMC ones...

This was actually mentioned as a drag on Intel's financials this year!
 

Adrian

FreeNAS Aware
Joined
Jun 29, 2011
Messages
85
Apparently the problem bites after around 18 months. Unknown if it is all or "just" some CPUs failing.
 

Ericloewe

Not-very-passive-but-aggressive
Moderator
Joined
Feb 15, 2014
Messages
16,076
Definitely not a widespread issue, then. Nearly all C2000 units seem to make it at least to two years, even with ASRock's crap BMC firmware.

Edit two years later: As it turns out, most units seem to crap out around 2.5-3 years and it definitely is widespread.
 

brando56894

FreeNAS Guru
Joined
Feb 15, 2014
Messages
1,466
According to this article on The Register, a bunch of Cisco's network products have been affected by this. So it looks like Intel is at fault and not ASRock, since it affects numerous models.

We asked Intel to provide specific details about when it began and stopped shipping Intel Atom C2000 processors with faulty clock outputs. Intel declined to comment. The official errata says the B0 stepping of C2xxx Atoms are vulnerable to failure, and these parts began shipping in 2013. The specific SKUs are: C2308, C2338, C2350, C2358, C2508, C2518, C2530, C2538, C2550, C2558, C2718, C2730, C2738, C2750, and C2758.
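For anyone who wants to check a box quickly, here is a minimal sketch that compares a CPU model string against the SKU list quoted above. The function name `affected_sku` and the sample model strings are my own assumptions for illustration; on FreeNAS you would paste in whatever `sysctl -n hw.model` prints. A match only means the SKU is on the list. It does not confirm the stepping or that a given chip will actually fail.

```python
import re

# SKUs named in the quoted article (B0 stepping of the C2000 family).
AFFECTED_SKUS = {
    "C2308", "C2338", "C2350", "C2358", "C2508", "C2518", "C2530", "C2538",
    "C2550", "C2558", "C2718", "C2730", "C2738", "C2750", "C2758",
}


def affected_sku(model_string):
    """Return the listed SKU found in model_string, or None if there is none."""
    # Look for standalone tokens shaped like C2xxx, ignoring case.
    for token in re.findall(r"\bC2\d{3}\b", model_string.upper()):
        if token in AFFECTED_SKUS:
            return token
    return None


# Example model strings (assumed format, not taken from the thread).
print(affected_sku("Intel(R) Atom(TM) CPU  C2750  @ 2.40GHz"))
print(affected_sku("Intel(R) Xeon(R) CPU E3-1230 v3 @ 3.30GHz"))
```

The lookup is deliberately a plain set membership test, so the list is easy to adjust if Intel's errata document names different parts later.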
 

Ericloewe

Not-very-passive-but-aggressive
Moderator
Joined
Feb 15, 2014
Messages
16,076
So it looks like Intel is at fault and not ASRock, since it affects numerous models.
ASRock is very much at fault for the majority of their failures. Their BMC firmware is stupid and the idiot who programmed it should be relegated to web development, where his lack of skill and dubious code are features and not bugs.

However, there have been a few further failures. We haven't seen C2000 boards dying at the rate the article suggests, with everything being seemingly within the norm for failure rates, but this could explain some of those failures.

I'll note that Intel has sold truckloads of C2000 CPUs to big cloud companies and truckloads more to the small server/medium router market. Even a small increase in failure rate could cost them a lot of cash, while still being barely perceptible to end users like us, who deal with a handful of units at most.
 

Ericloewe

Not-very-passive-but-aggressive
Moderator
Joined
Feb 15, 2014
Messages
16,076
Some more bits of information on servethehome.com:
https://www.servethehome.com/intel-atom-c2000-series-bug-quiet/
Hmm, I'd really like to see the new revision boards Supermicro is supposedly shipping. I'd bet they're adding an external clock.

There seems to be widespread effort to replace affected parts, so clearly something is up - and it suggests that it's bigger than just a slight increase in failure rates.

It's possible, however, that much of the effort focuses on the C2xx8 parts, which are specced for a seven-year life. If Intel has spare 22nm capacity at the moment, it makes sense to accelerate replacement of parts that are likely to fail, making use of sunk fixed costs before fabs are converted to newer processes.
 

Stux

FreeNAS Wizard
Joined
Jun 2, 2016
Messages
4,166
Another good reason to *not* recommend Avoton systems.
 

Ericloewe

Not-very-passive-but-aggressive
Moderator
Joined
Feb 15, 2014
Messages
16,076
Don't worry, I'm an equal opportunity offender. The only truly sane programmers use bare C. Unfortunately for the rest of us, C every day is like only eating/drinking soylent - it's bland, it's boring and occasionally makes you feel bad.
 
Joined
Dec 28, 2016
Messages
1
FYI here's the response from ASRock Support, I have a C2550D4i:
Thank you for reaching out to us with your concern! Your Server board is using Intel Avoton CPU which is also listed in Intel’s errata announcement. However, even though Intel is aware of the unexpected failure with their C2000 family line CPUs, but they didn’t guarantee that similar failure will affect all units in C2000 family line.



We provide 3-year manufacture warranty; if you ever experienced a sudden No POST with your board, please contact us. We can run a simple test to find the telltale sign. If the issue is confirmed that it may be related to the CPU, we will help you replace the board under our warranty policy.



Hope you understand. Thank you!
I think it should save them a bunch of e-mails if I post it here.
 

anodos

Belly-button Lint Extraordinaire
iXsystems
Joined
Mar 6, 2014
Messages
5,646
Don't worry, I'm an equal opportunity offender. The only truly sane programmers use bare C. Unfortunately for the rest of us, C every day is like only eating/drinking soylent - it's bland, it's boring and occasionally makes you feel bad.
Bah! Visual Basic is the best! It makes programming so easy and accessible!

Why this morning, I was working out how to make a VB app drive my car. For some reason, it keeps running over kittens and leaving smoking pentagrams in the parking lot.
 

brando56894

FreeNAS Guru
Joined
Feb 15, 2014
Messages
1,466
Unfortunately for the rest of us, C every day is like only eating/drinking soylent - it's bland, it's boring and occasionally makes you feel bad.
Now I'm offended!! I was getting tired of the plain 2.0 so I'm happy that they came out with a few different flavors. Cacao is amazing, it tastes like chocolate Muscle Milk! Nectar (the red and white bottles, lower right) is just odd. Coffiest is good but tastes like cheap coffee, the convenience factor is what keeps me drinking it.


:D

I love The Oatmeal hahaha
 