SOLVED Errors during disk burnin

Status
Not open for further replies.

dak180

Patron
Joined
Nov 22, 2017
Messages
310
I got the following messages on the console while running the disk burn-in (CPU and RAM passed with flying colors) with FreeNAS 11.1-U1 installed:

Code:
Jan 21 10:33:19 freenas mps0: IOC Fault 0x40002622, Resetting
Jan 21 10:33:19 freenas mps0: Reinitializing controller,
Jan 21 10:33:19 freenas mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
Jan 21 10:33:19 freenas mps0: IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>
Jan 21 10:33:19 freenas mps0: mps_reinit finished sc 0xfffffe0000e9f000 post 4 free 3
Jan 21 10:33:20 freenas mps0: SAS Address for SATA device = d2635a47eecce085
Jan 21 10:33:20 freenas mps0: SAS Address from SATA device = d2635a47eecce085
Jan 21 10:33:20 freenas mps0: SAS Address for SATA device = d2635a47eed8bb75
Jan 21 10:33:20 freenas mps0: SAS Address from SATA device = d2635a47eed8bb75
Jan 21 10:33:21 freenas mps0: SAS Address for SATA device = d2635a47eebddc6e
Jan 21 10:33:21 freenas mps0: SAS Address from SATA device = d2635a47eebddc6e
Jan 21 10:33:21 freenas mps0: SAS Address for SATA device = d2635a47eebcb988
Jan 21 10:33:21 freenas mps0: SAS Address from SATA device = d2635a47eebcb988
Jan 21 10:33:21 freenas mps0: SAS Address for SATA device = d2635a47eee1b897
Jan 21 10:33:21 freenas mps0: SAS Address from SATA device = d2635a47eee1b897
Jan 21 10:33:21 freenas mps0: SAS Address for SATA device = d2635a47eed4cf75
Jan 21 10:33:21 freenas mps0: SAS Address from SATA device = d2635a47eed4cf75
Jan 21 10:33:21 freenas mps0: SAS Address for SATA device = d2675b4900c0da6e
Jan 21 10:33:21 freenas mps0: SAS Address from SATA device = d2675b4900c0da6e
Jan 21 10:33:21 freenas mps0: SAS Address for SATA device = d2635a47eeccd170
Jan 21 10:33:21 freenas mps0: SAS Address from SATA device = d2635a47eeccd170


Any ideas on causes and/or solutions?
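
For anyone triaging a similar log, here is a minimal Python sketch that pulls the controller fault events, and the drive addresses announced again after the reset, out of syslog-style lines. The regular expressions are inferred only from the excerpt above, and the names (`FAULT_RE`, `ADDR_RE`, `summarize`, `SAMPLE`) are made up for illustration; other versions of the mps driver may word these messages differently.

```python
import re

# Matches lines such as:
#   Jan 21 10:33:19 freenas mps0: IOC Fault 0x40002622, Resetting
FAULT_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]{8})\s+\S+\s+(?P<dev>mps\d+): "
    r"IOC Fault (?P<code>0x[0-9a-fA-F]+)"
)

# Matches the "SAS Address from SATA device = ..." lines logged after a reset.
ADDR_RE = re.compile(
    r"(?P<dev>mps\d+): SAS Address from SATA device = (?P<addr>[0-9a-fA-F]+)"
)

# Three lines copied from the log above, used as sample input.
SAMPLE = """\
Jan 21 10:33:19 freenas mps0: IOC Fault 0x40002622, Resetting
Jan 21 10:33:20 freenas mps0: SAS Address from SATA device = d2635a47eecce085
Jan 21 10:33:20 freenas mps0: SAS Address from SATA device = d2635a47eed8bb75
"""


def summarize(lines):
    """Return (faults, addresses) found in an iterable of log lines.

    faults is a list of (timestamp, controller, fault_code) tuples in log order;
    addresses is the set of distinct SAS addresses seen.
    """
    faults = []
    addresses = set()
    for line in lines:
        fault = FAULT_RE.search(line)
        if fault:
            faults.append(
                (fault.group("stamp"), fault.group("dev"), fault.group("code"))
            )
            continue
        addr = ADDR_RE.search(line)
        if addr:
            addresses.add(addr.group("addr"))
    return faults, addresses


if __name__ == "__main__":
    found_faults, found_addresses = summarize(SAMPLE.splitlines())
    print(found_faults)
    print(sorted(found_addresses))
```

Comparing the address set reported after each reset against the drives expected on the card is a quick way to see whether any drive dropped off.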
 

Redcoat

MVP
Joined
Feb 18, 2014
Messages
2,925

dak180

Patron
Joined
Nov 22, 2017
Messages
310

Redcoat

MVP
Joined
Feb 18, 2014
Messages
2,925
Did you look at #21? Essentially the same card (assuming you have flashed to IT mode) and running the latest firmware. It's the card complaining under load.

I searched and didn't see anything else that jumped out at me.
 

dak180

Patron
Joined
Nov 22, 2017
Messages
310
It's the card complaining under load.
I was under the impression that once a SMART test is started, everything happens on the drive itself, which should mean that the load on the card is as close to zero as it can get. If this is true, should I be thinking hardware fault in the card?
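
That matches how SMART self-tests work: the host only issues the command that starts the test and later reads the drive's self-test log, while the test itself runs in the drive's firmware. A minimal Python sketch of those two host-side steps follows, assuming smartmontools' `smartctl` is installed and the drive appears as `/dev/da0` (a placeholder device name); the helper names are hypothetical.

```python
import shutil
import subprocess

# Self-test types mentioned in this thread; smartctl accepts these after -t.
VALID_TESTS = ("short", "conveyance", "long")


def selftest_command(device, kind):
    """Build the smartctl argument list that starts a self-test on device."""
    if kind not in VALID_TESTS:
        raise ValueError(f"unknown self-test type: {kind!r}")
    return ["smartctl", "-t", kind, device]


def selftest_log_command(device):
    """Build the smartctl argument list that reads the drive's self-test log."""
    return ["smartctl", "-l", "selftest", device]


def run(cmd):
    """Run a prepared smartctl command and return its text output.

    Assumes smartctl is on PATH and the caller has permission to query the
    device (usually root).
    """
    if shutil.which(cmd[0]) is None:
        raise RuntimeError(f"{cmd[0]} not found on PATH")
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout


if __name__ == "__main__":
    device = "/dev/da0"  # placeholder; substitute the real device node
    print(run(selftest_command(device, "short")))
    # Later, once the drive has had time to finish:
    print(run(selftest_log_command(device)))
```

Both commands still travel through the HBA, but between them the controller carries essentially no traffic for the test.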
 

Redcoat

MVP
Joined
Feb 18, 2014
Messages
2,925
Yes, that sounds reasonable. Do you have the latest firmware (20.00.07.00 I believe) and the card in IT mode?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Yes, that sounds reasonable. Do you have the latest firmware (20.00.07.00 I believe) and the card in IT mode?

It shows that in the messages, so, presumably, yes.

Make sure there is sufficient airflow over the card, and that the card is not overheating. The LSI RAID products, both HBA and high end, are intended for use within a server with fairly strong airflow, and it is easy to inadvertently overlook proper airflow over the HBA controller.
 

Redcoat

MVP
Joined
Feb 18, 2014
Messages
2,925
It shows that in the messages, so, presumably, yes.

Yes, dammit - and I saw it BEFORE I asked (I had been reading the bug reports in between) ....
 

dak180

Patron
Joined
Nov 22, 2017
Messages
310
Make sure there is sufficient airflow over the card, and that the card is not overheating. The LSI RAID products, both HBA and high end, are intended for use within a server with fairly strong airflow, and it is easy to inadvertently overlook proper airflow over the HBA controller.
That was my biggest potential concern when I got the card, but when I touch the heatsink it feels warm, not burning hot.

Do you have the latest firmware (20.00.07.00 I believe) and the card in IT mode?
Yes, I have flashed it to IT mode.
 

Redcoat

MVP
Joined
Feb 18, 2014
Messages
2,925
Could the heatsink compound have dried out?
 

dak180

Patron
Joined
Nov 22, 2017
Messages
310
Could the heatsink compound have dried out?
Anything is possible but I only bought, received and installed the card in the last 10 days (it has only been out of its packaging for about 4 days); I would tend to rate this as an unlikely scenario.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
I'd suspect the cooling as well if you are able to communicate with the hard drives at all. However, it would be nice to know what all of your hardware is, since you are troubleshooting hardware. For instance, what motherboard is this card plugged into? Some don't like the Dell PERC H310. Did you have to cover two pins on the edge connector to make this card work?
 

dak180

Patron
Joined
Nov 22, 2017
Messages
310

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
For a full hardware list see my other thread; the motherboard is the ASRock Rack Motherboard E3C236D4U and I did not have to cover any pins.
Naw, I'm not going to jump back and forth between threads to troubleshoot a hardware problem. How about using the "copy & paste" function to copy your hardware specs into this thread? When looking at a build thread, who knows what the final outcome is going to be.
 

dak180

Patron
Joined
Nov 22, 2017
Messages
310

wblock

Documentation Engineer
Joined
Nov 14, 2014
Messages
1,506
My (limited) experience is that the H310 runs appreciably cooler than the H200. I still added a small fan to the heatsink.
 

dak180

Patron
Joined
Nov 22, 2017
Messages
310
I'd suspect the cooling as well if you are able to communicate with the hard drives at all. However, it would be nice to know what all of your hardware is, since you are troubleshooting hardware.
Given my hardware list, and that I am still well within the return period for the card, would returning it and getting another be a good next step?

My (limited) experience is that the H310 runs appreciably cooler than the H200. I still added a small fan to the heatsink.
What did you use for that? I looked into this a bit but did not find any good guides on the subject.
 

wblock

Documentation Engineer
Joined
Nov 14, 2014
Messages
1,506
It was just a 12V fan, wired to a Molex connector. 40mm? I don't recall. Find the right size screws, and they will fit between the fins of the heatsink.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
My next advice is to start taking your system apart if the cooling doesn't fix it.

Remove the PERC card and connect the drives directly to the motherboard. You appear to have a few SSDs also; those may need to come out and you may need to boot from USB. Then test the hard drives that had failed. I suspect all things will go well. Then place your PERC back in, but do not connect any hard drives to it, and retest. Lastly, if all goes well, hook up the SSDs to it and see how things work out. You may be able to isolate the problem to the PERC. But I doubt I'd replace it with the same board; I'd want a different HBA because it's possible the same problems would come back.
 

dak180

Patron
Joined
Nov 22, 2017
Messages
310
My next advice is to start taking your system apart if the cooling doesn't fix it.

Remove the PERC card and connect the drives directly to the motherboard. You appear to have a few SSDs also; those may need to come out and you may need to boot from USB. Then test the hard drives that had failed. I suspect all things will go well. Then place your PERC back in, but do not connect any hard drives to it, and retest. Lastly, if all goes well, hook up the SSDs to it and see how things work out. You may be able to isolate the problem to the PERC. But I doubt I'd replace it with the same board; I'd want a different HBA because it's possible the same problems would come back.
First, to clarify: the card was having issues while the drives were running SMART tests (not badblocks), but there were no actual failures from the drives themselves; they completed the short, conveyance and long tests without issue.

To go over in detail what I have already done: I am currently booting from one of the USB drives; the SSDs were already hooked directly to the motherboard and have already passed their burn-in testing (I have since disconnected them); and I have hooked up all the HDs I have SATA cables for (6 of them) and restarted their burn-in testing. They are currently in the last badblocks pass with no errors or issues so far. While this has been going on, the card has been in the computer, not connected to any drives, and has not had any issues. There are 4 temperature sensors I have access to (CPU, PCH, MB, card side); they range from 22-36C, with the average not exceeding 27C (the PCH is running at 36C; all the others are in the 22-25C range).

Given this information, and assuming the HDs all pass their burn-in tests, would you recommend additional testing, returning the card and getting a replacement (or a different card, and if so, any suggestions as to which one?), additional cooling for the current card, or something else?
 