Intermittent Drive Disappearances.

Status
Not open for further replies.

Finch!

Cadet
Joined
Oct 24, 2012
Messages
6
Hi,

I have a FreeNAS-9.3-STABLE-201512121950 system running the following hardware:

Chassis: Supermicro SC846E16-R1200B
Backplane: Supermicro BPN-SAS2-846EL
HBA: LSI 9211-4i flashed to IT P20 firmware and 07.39 BIOS.
Drives: 24x Seagate Constellation 3TB
Memory: 128GB Hynix PC-12800 ECC DDR3
CPU: Xeon E5 2630
Motherboard: Supermicro X9DRi-F 1.11a.

I have recently started encountering what I can best describe as the strange disappearance of all drives attached to the backplane - it seems as though the HBA occasionally loses contact with all drives simultaneously. Over several occurrences, there has been no obvious common factor: system and drive temperatures have been both low and moderate, system load has been both low and moderate, the system has not been subject to any sudden movement, and power is filtered by an APC UPS with around 8 hours of capacity.

I've reseated cables, replaced the cable from the HBA to the backplane, reversed its direction, blown air on the contacts, moved the HBA to other slots on the board, and replaced the HBA with an identical but more recently produced card, and nothing has changed. The drives do not appear to get any power when the system boots, even though the HBA receives power and the HBA configuration utility works; the utility just can't see any attached devices.

Getting the drives back has been frustrating, mostly because I can't figure out how or why they come back as suddenly as they disappear.

After the drives disappeared recently I spent some time messing with things - swapping the HBA for a spare, swapping the cable for a spare, reseating power cables and memory and whatever else - to no effect. I returned everything to the state it was in before the failure and attempted to power up the system for about the 25th time, and was once again rewarded with no drives. After a few hours' sleep I returned to the office the next morning and the first thing I did was power on the server - and the drive lights turned on and the drives powered up and everything was working just fine, as though the system hadn't had any problems.

This kind of spontaneous recovery seems to be the standard pattern for the failures I'm seeing.

The drives disappeared again while I was away for the weekend, and I'm now going through the getting-them-back exercise.

FreeNAS isn't providing much in the way of useful logs. When the drives disappear, I can't log in to the server via the web interface, SSH, or console. I've just created a remote syslog server to try to catch some useful logging, but I need the HBA to see the drives first!
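
A remote syslog catcher of this kind can be sketched in a few lines of Python. This is only an illustration under assumptions, not the setup described in this post: the listen address, port, log file name, and the names parse_priority and SyslogHandler are all made up for the example. It listens for UDP syslog datagrams, splits off the leading <PRI> value, and appends each message to a file.

```python
import re
import socketserver

# Assumed values for illustration only; adjust to the local network.
LISTEN_HOST = "0.0.0.0"
LISTEN_PORT = 514            # standard syslog UDP port; binding usually needs root
LOG_PATH = "freenas-remote.log"

# Classic syslog messages start with "<PRI>", where PRI = facility * 8 + severity.
PRI_RE = re.compile(r"^<(\d{1,3})>(.*)$", re.DOTALL)


def parse_priority(line):
    """Split a raw syslog line into (facility, severity, rest of message)."""
    match = PRI_RE.match(line)
    if match is None:
        # No priority header: pass the text through untouched.
        return None, None, line
    pri = int(match.group(1))
    return pri // 8, pri % 8, match.group(2)


class SyslogHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # For UDP servers, self.request is a (data, socket) pair.
        data = self.request[0]
        text = data.decode("utf-8", errors="replace").strip()
        facility, severity, message = parse_priority(text)
        with open(LOG_PATH, "a", encoding="utf-8") as log:
            log.write(
                f"{self.client_address[0]} fac={facility} sev={severity} {message}\n"
            )


if __name__ == "__main__":
    with socketserver.UDPServer((LISTEN_HOST, LISTEN_PORT), SyslogHandler) as server:
        server.serve_forever()
```

Severity values 0 to 3 (emergency through error) are the ones worth watching for around the time the drives drop out.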

Can anyone suggest any possible causes of this problem? It's intermittent, irregular, and I have no idea what causes the drives to disappear or what causes them to reappear. It's driving me nuts.

Thanks,

--Finch!
 
Joined
Apr 9, 2015
Messages
1,258
If you have a breakout cable available, I would try bypassing the backplane to see if that solves the problem. My first guess is that the backplane is having an issue of some sort related to data or power.

I don't remember the post, but there was something strangely like this where the FreeNAS OS was on a single USB stick that was encountering errors, so maybe make a backup of the config and try a new FreeNAS boot drive first, just in case. Losing a drive in the pool should not render the UI unreachable, so this could be it.
 

Finch!

Cadet
Joined
Oct 24, 2012
Messages
6
Hi,

Thanks a bunch for the reply. No progress as of yet...

I have a breakout cable on the way. It won't be ideal as it won't allow me to do much more than find out if the HBA can see four drives, but it's a good start and might indicate problems with the backplane. My supplier should have it to me some time this week or early next.
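
Once a system with the breakout cable attached does boot, one way to check how many disks the HBA sees is to count the da devices in the output of `camcontrol devlist` on FreeBSD/FreeNAS. The sketch below is an assumption-laden illustration: the function name disks_from_devlist is invented, and the sample lines are made up to resemble devlist output rather than copied from this system. Note that a USB boot stick normally attaches as a da device too, so it would be included in the count.

```python
import re

# Matches the trailing peripheral group of a devlist line, e.g. "(pass0,da0)".
PERIPH_RE = re.compile(r"\(([^)]*)\)\s*$")


def disks_from_devlist(output):
    """Return the da* device names found in `camcontrol devlist` text."""
    disks = []
    for line in output.splitlines():
        match = PERIPH_RE.search(line)
        if match is None:
            continue
        for name in match.group(1).split(","):
            name = name.strip()
            if re.fullmatch(r"da\d+", name):
                disks.append(name)
    return disks


# Made-up sample resembling devlist output: two disks and one enclosure device.
SAMPLE = """\
<ATA ST3000NM0033-9ZM SN04>   at scbus0 target 8 lun 0 (pass0,da0)
<ATA ST3000NM0033-9ZM SN04>   at scbus0 target 9 lun 0 (da1,pass1)
<LSI CORP SAS2X36 0717>       at scbus0 target 32 lun 0 (ses0,pass24)
"""

if __name__ == "__main__":
    print(disks_from_devlist(SAMPLE))
```

On a live system the text would come from running `camcontrol devlist` and passing its output to the function; with a four-lane breakout cable, four data disks plus any USB boot device would be the expected result.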

My system runs from a single USB stick, but I'm reluctant to investigate FreeNAS or the system boot device as the culprit because the drives don't appear to get power. Usually, when I power up the system, the power and activity LEDs on each drive light up for a fraction of a second as soon as the system gets power. At the moment, this doesn't happen - I know within a few seconds of power-on whether the drives will get power and spin up.

As of now, I've replaced the SFF-8087 cable and swapped the HBA for one running the IR firmware. Nothing. I've changed back to the HBA with the IT firmware and I've re-seated power cables, but still nothing. I expect it will "magically" start working some time today, but I wish I knew why.

Interesting point regarding the UI. I agree, it should not be unreachable. In my case, though, I'm not losing a single drive in the pool but the entire pool at once, and I have a bunch of scheduled tasks - snapshots and replication - that might be causing the system load to increase. We'll see!
 

Finch!

Cadet
Joined
Oct 24, 2012
Messages
6
My breakout cable arrived today and it indicates a problem with the backplane. I could order a replacement backplane on eBay or go through the return process with my supplier... argh. I'll see what they think!
 
Joined
Apr 9, 2015
Messages
1,258
That sucks but at least you got it figured out.
 

Finch!

Cadet
Joined
Oct 24, 2012
Messages
6
Hopefully it IS the backplane and this is figured out. I'll keep everyone posted :)
 