SOLVED SAS9211-8i HBA gets removed after some minutes: mpt2sas_cm0 failed

nasenbaer

Cadet
Joined
Feb 21, 2024
Messages
9
Hi,

I'm in the middle of installing some new hardware, including a new LSI HBA: https://www.amazon.de/-/en/LSI-2008-8E-LSI9200-8E/dp/B01M9GRAUM
I can see it in the console (using sas2flash -list):
LSI Corporation SAS2 Flash Utility
Version 20.00.00.00 (2014.09.18)
Copyright (c) 2008-2014 LSI Corporation. All rights reserved

Adapter Selected is a LSI SAS: SAS2008(B2)

Controller Number : 0
Controller : SAS2008(B2)
PCI Address : 00:06:00:00
SAS Address : 56c92bf-0-0003-90f0
NVDATA Version (Default) : 14.01.00.08
NVDATA Version (Persistent) : 14.01.00.08
Firmware Product ID : 0x2213 (IT)
Firmware Version : 20.00.06.00
NVDATA Vendor : LSI
NVDATA Product ID : SAS9211-8i
BIOS Version : 07.39.00.00
UEFI BSD Version : 07.27.01.01
FCODE Version : N/A
Board Name : SAS9211-8i
Board Assembly : SAS2008IT
Board Tracer Number : SAS2008IT90F0

Finished Processing Commands Successfully.
Exiting SAS2Flash.

The output above is from a different PC running Windows, since that PC does not drop the card after a few minutes. Everything except the addresses is the same on both PCs :)

I searched the forum and saw that some people had problems with BIOS Version : 07.39.02.00 on their HP systems. Since I also have an HP system,
I already downgraded the BIOS from 07.39.02.00 to 07.39.00.00.

However, TrueNAS (or the HP PC) still drops the card after some time, and then I can't access my HDDs.
To test it, I removed everything from the PC (i.e. all cables and HDDs attached to the HBA), so only the bare minimum is still connected: one NVMe disk for TrueNAS and the HBA card itself.

Again, I can see the card in the console for around 5 minutes, then the card gets removed:
Sorry for the picture:


mp2sas_failed2.jpg
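
In case a photo of the screen is too hard to read, here is a minimal Python sketch for pulling the relevant lines out of the kernel log as text. It assumes the driver prefixes its messages with mpt2sas_cm0 (as in the thread title) and that dmesg is readable by the current user (usually root). The names filter_hba_lines and read_kernel_log are made up for illustration.

```python
import subprocess


def filter_hba_lines(log_text, needle="mpt2sas_cm0"):
    """Return the log lines that mention the HBA driver instance."""
    return [line for line in log_text.splitlines() if needle in line]


def read_kernel_log():
    """Read the kernel ring buffer via dmesg (typically needs root)."""
    result = subprocess.run(
        ["dmesg"], capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    # Print only the lines that belong to the HBA driver instance.
    for line in filter_hba_lines(read_kernel_log()):
        print(line)
```

Run as root, this should print only the HBA-related messages, which are easier to paste into a post than a screenshot.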
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
Firmware Version : 20.00.06.00
You certainly want to get that to 20.00.07.00 at some point.

You show that it's being removed from an IOMMU group on that screen... is it a virtual instance of TrueNAS we're talking about here? Have you reserved the device for PCI passthrough?
 

nasenbaer

Cadet
Joined
Feb 21, 2024
Messages
9
Thanks for your answer.

No virtualization. It's a bare-metal install of TrueNAS (TrueNAS-SCALE-23.10.1.3), so no PCIe passthrough, bifurcation fiddling or anything else. I installed TrueNAS via USB directly onto the NVMe disk.

I can put the card back into the Windows PC and flash it to 20.00.07.00 if that helps. I am a bit doubtful though (I am aware that TrueNAS itself doesn't use the card's BIOS and that only the firmware matters).
 

nasenbaer

Cadet
Joined
Feb 21, 2024
Messages
9
Flashed the firmware to 20.00.07.00: the card still got kicked out after 2 - 5 minutes with the same error.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
How hot is it getting? Do you have fans pushing air across it?
 

nasenbaer

Cadet
Joined
Feb 21, 2024
Messages
9
Well yes, that is a point that might be concerning: after flashing the card and removing it from the Windows PC, it was quite warm, bordering on hot.

There are fans near it, but I will put an external fan on it to test. I also think the heatsink is poorly attached, so the card may overheat even with nothing connected and right after a reboot...
 

nasenbaer

Cadet
Joined
Feb 21, 2024
Messages
9
Thanks for your idea regarding overheating. I just snatched a big fan from my living room and blasted the card with it. It ran easily for 30 minutes, then I rebooted and let it run for another 10 minutes, again without any problem.

Then I removed the fan and the card was gone after 2 minutes.

Surely a normal card at idle should not overheat this easily?! Do you think it makes sense to ask for a new one? Since this card is new, I can easily ask for a replacement. However, if all cards need some serious cooling, I am in trouble...
Right now the CPU has a medium-sized cooler and there is one 12 cm fan exhausting air from the case. I can try to add a small fan directly onto the card's heatsink, yet I think that if this is really necessary, the card should come with an active fan attached.

To be honest, I kind of mistrust this card :(
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
For sure there is a minimum airflow expectation for those cards which would be available in the spec sheet somewhere.

You may need to adjust the airflow in your case to stay within that spec, but as you say, it may also be a questionable card, so maybe just not put together well (thermal paste/glue not the right thickness or spread).
 

PhilD13

Patron
Joined
Sep 18, 2020
Messages
203
HBA cards like this are meant for servers and need high airflow across their heat sinks for proper cooling. Commercially made server chassis from Supermicro, Dell, etc. generally have good front-to-back airflow and air tunnels over the processors, because space constraints often make server heat sinks too small to cool adequately with low or passive airflow. Some accessory cards in some slots will still suffer from dead air space and poor airflow, depending on the server configuration. I have a card in an outer slot of one server that gets poor airflow at the normal auto fan setting because of the chassis configuration, so I attached a fan to the chassis near the card to blow right on the card's heat sink. Problem solved.
 

nasenbaer

Cadet
Joined
Feb 21, 2024
Messages
9
Yeah, I get it now. Guess I got blinded by people writing that they used their mini desktop as a mini NAS. Also, this card (and others) are advertised as an easy way to add more drives to your normal PC. Well, that clearly didn't work.
Guess Synology and other brands are not using any LSI chips internally, since their cooling isn't the best either?

Right now I am trying to fit an 80 mm fan next to the card to see if this is enough. It has been running for 20+ minutes so far, without any load though. I'm still a bit peeved that the card might crash any minute, with no way to monitor its temperature and no warning. Feels like kind of a bad build for a NAS....
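
Since I can't read the card's temperature, the best I can think of is a small watchdog that at least tells me when the card vanishes. Below is a rough Python sketch, assuming Linux lists each PCI device as an entry under /sys/bus/pci/devices. The address 0000:06:00.0 is only a placeholder (lspci -D should show the real one), and the names device_present and watch are made up for illustration.

```python
import os
import time
from datetime import datetime

SYSFS_PCI = "/sys/bus/pci/devices"  # standard Linux sysfs location


def device_present(pci_addr, sysfs_root=SYSFS_PCI):
    """True if the kernel currently lists a PCI device at pci_addr."""
    return os.path.isdir(os.path.join(sysfs_root, pci_addr))


def watch(pci_addr, interval_s=10.0, sysfs_root=SYSFS_PCI, max_checks=None):
    """Poll until the device disappears; return the time it was first missing.

    Returns None if max_checks polls pass with the device still present.
    """
    checks = 0
    while max_checks is None or checks < max_checks:
        if not device_present(pci_addr, sysfs_root):
            return datetime.now()
        checks += 1
        time.sleep(interval_s)
    return None


if __name__ == "__main__":
    # Placeholder address (assumption): replace with the HBA's real one.
    gone_at = watch("0000:06:00.0")
    print(f"HBA no longer listed on the PCI bus at {gone_at}")
```

This won't prevent the crash, and it only notices the removal after the fact, but it would at least give me a timestamp to compare against fan changes.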
 

nasenbaer

Cadet
Joined
Feb 21, 2024
Messages
9
Well the card got removed after 2 - 3 hours. I am giving up. This time the card didn't get warmer than body temp since I put a nice cooler on it.
Guess it might be smart to order another card and hope the replacement works.
 

nasenbaer

Cadet
Joined
Feb 21, 2024
Messages
9
Got a "new" card. I replaced the HBA with a used Dell H200e and it finally seems to work. Running some stress tests from you guys now.
The PC also boots a lot faster now. With the old card it kept hanging for quite some time at the boot logo before TrueNAS was able to boot.

Guess I can do a happy dance now! Thanks for your help!
 