System locks up after 5-15 minutes, acpi=off prevents lock up

smelting

Cadet
Joined
May 12, 2016
Messages
2
I have been trying to get a new system up and running, with no luck so far. I tried fresh installs of the latest releases of both Cobia and Bluefin, with the same results.

After about 5-15 minutes of uptime the system locks up completely and requires a hard reset. I ran memtest and the memory seems fine. I tried running with only 1 stick of RAM at a time, testing each of my DIMMs in turn: same results. I tried removing the GPU: same results. The same hardware runs Windows 10 without issue. On Windows, I ran a CPU and GPU stress test along with a network transfer onto one of the hard drives attached to the HBA card for a couple of hours without issue.

The only thing that stops the lockups seems to be setting "acpi=off" at startup. The downside is losing hyperthreading and GPU passthrough support.

The system also seems to be stable with "pnpacpi=off noapic nolapic", but then only 1 core is present...
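For anyone comparing notes, here is a minimal sketch in Python of a check for which of these workaround parameters a given kernel command line contains. The parameter names are the ones mentioned above. The helper name `find_workaround_params`, the short descriptions, and the sample command line are my own illustrative assumptions, not something taken from the installer or the distro.

```python
# Sketch: flag the ACPI/APIC workaround parameters discussed in this thread.
# The helper name, notes, and sample command line are hypothetical.

WORKAROUND_PARAMS = {
    "acpi=off": "disables ACPI entirely",
    "pnpacpi=off": "disables the ACPI plug-and-play driver",
    "noapic": "tells the kernel not to use any I/O APICs",
    "nolapic": "tells the kernel not to use the local APIC",
}


def find_workaround_params(cmdline: str) -> dict[str, str]:
    """Return the workaround parameters present in a kernel command line."""
    # Match whole whitespace-separated tokens, so that "pnpacpi=off"
    # is not mistaken for "acpi=off".
    tokens = cmdline.split()
    return {p: note for p, note in WORKAROUND_PARAMS.items() if p in tokens}


# Hypothetical command line, for illustration only.
sample = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 pnpacpi=off noapic nolapic"
for param, note in find_workaround_params(sample).items():
    print(f"{param}: {note}")
```

On a running Linux system the real command line can be read from /proc/cmdline and passed to the same function, which is a quick way to confirm that a parameter typed at the boot menu actually reached the kernel.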

Hardware:
Motherboard: ASRock E3C242D4U
CPU: Intel Xeon E-2236
RAM: 4x 16GB DDR4 ECC 2400MHz
Hard drives: 8x 16TB MDD drives in raidz2 (HBA card), 500GB Samsung 870 EVO (SATA controller on motherboard)
Controller: LSI 9305-24i, latest IT firmware
GPU: Titan Xp


I have tried multiple BIOS versions; all seem to give the same results.
 

smelting

Cadet
Joined
May 12, 2016
Messages
2
As an update to this, it seems to be my motherboard/BIOS that is the issue. As a general warning, the BIOS for the ASRock E3C242D4U is broken (I am unsure whether their other Coffee Lake C242 boards are also broken for Linux distros). I reached out to ASRock support and we were not able to come to a solution. They provided a beta version that did not fix the issue (the board shipped with the P2.20 BIOS; I tried 2.1, 2.43, and 2.45, and none of them worked).

The solution was to get a different motherboard. I would recommend the Supermicro MBD-X11SCL-F-O for Coffee Lake CPUs; it has the same form factor and basic features as the E3C242D4U. I have been running the MBD-X11SCL-F-O for a short period and the ACPI issues are not present.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
smelting said: "As a general warning, the bios for the asrock E3C242D4U is broken"
Sadly, you could say the same about most systems. System firmware is a dumpster fire.
 