How to find a failed drive if you have an LSI controller

Status
Not open for further replies.

lloyd

Dabbler
Joined
Dec 27, 2013
Messages
16
I found this great answer on Server Fault explaining how to find a failed drive using the sas2ircu utility.

http://serverfault.com/a/644165

First, find the index of your HBA card. Mine is 0.
Code:
 /tmp# sas2ircu list
LSI Corporation SAS2 IR Configuration Utility.
Version 20.00.00.00 (2014.09.18)
Copyright (c) 2008-2014 LSI Corporation. All rights reserved.


         Adapter      Vendor  Device                       SubSys  SubSys
Index    Type          ID      ID    Pci Address          Ven ID  Dev ID
-----  ------------  ------  ------  -----------------    ------  ------
   0     SAS2008     1000h    72h   00h:02h:00h:00h      1000h   3020h
SAS2IRCU: Utility Completed Successfully.


Next, display the details of your disks. I matched my bad one by the serial number reported by smartctl -a /dev/da11.
Code:
 /tmp# sas2ircu 0 display | grep -B9 -A3 Z300DBGT
Device is a Hard disk
  Enclosure #                             : 2
  Slot #                                  : 11
  SAS Address                             : 5003048-0-01c5-5877
  State                                   : Ready (RDY)
  Size (in MB)/(in sectors)               : 3815447/7814037167
  Manufacturer                            : ATA    
  Model Number                            : ST4000DM000-1F21
  Firmware Revision                       : CC52
  Serial No                               : Z300DBGT
  GUID                                    : N/A
  Protocol                                : SATA
  Drive Type                              : SATA_HDD
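
The grep above leaves you reading the enclosure and slot off the screen. As a rough sketch, the same lookup can print just the enclosure:slot pair for a given serial. The function name slot_for_serial is made up for illustration, and it assumes the display output is laid out as shown above, with "Enclosure #" and "Slot #" appearing before "Serial No" in each device block.

```shell
# Hypothetical helper: print "enclosure:slot" for the drive whose
# "Serial No" field equals $1. Reads `sas2ircu <index> display` output on stdin.
slot_for_serial() {
    awk -v want="$1" '
        /Enclosure #/ { enc = $NF }     # remember the most recent enclosure
        /Slot #/      { slot = $NF }    # and the most recent slot
        /Serial No/ && $NF == want {    # serial matches: report where it lives
            print enc ":" slot
            found = 1
            exit
        }
        END { exit !found }             # non-zero status if nothing matched
    '
}

# Usage (on the box with the HBA):
#   sas2ircu 0 display | slot_for_serial Z300DBGT
```

The non-zero exit status when no drive matches makes it harder to end up with an empty slot argument in a later command.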


Then it was just a matter of turning on the blinking red locate light, using the enclosure and slot numbers from above.
Code:
 /tmp#  sas2ircu 0 locate 2:11 ON


Once you have replaced the drive, turn the light off again.
Code:
 /tmp#  sas2ircu 0 locate 2:11 OFF
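
Since a typo in the enclosure:slot pair lights up the wrong bay, it may help to build the locate command with a small checker and read it before running it. This is only a sketch: locate_cmd is a made-up name, and it only accepts the ON/OFF spelling used in the commands above.

```shell
# Hypothetical helper: validate the arguments and print the sas2ircu
# locate command instead of running it, so it can be reviewed first.
# $1 = controller index, $2 = enclosure:slot, $3 = ON or OFF
locate_cmd() {
    case "$3" in
        ON|OFF) ;;
        *) echo "state must be ON or OFF" >&2; return 1 ;;
    esac
    case "$2" in
        # reject anything that is not exactly <digits>:<digits>
        *[!0-9:]*|*:*:*|:*|*:) echo "expected enclosure:slot, e.g. 2:11" >&2; return 1 ;;
        *:*) ;;
        *) echo "expected enclosure:slot, e.g. 2:11" >&2; return 1 ;;
    esac
    echo "sas2ircu $1 locate $2 $3"
}

# Usage: review the printed command, then pipe it to sh to run it.
#   locate_cmd 0 2:11 ON
#   locate_cmd 0 2:11 ON | sh
```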


Very handy, because I didn't write any serial numbers down when I built the server.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
What hardware did this work with?

I see it was an SAS 2008, but what backplane are you using?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
It'll usually work with the BE16/BE26's, etc. I still prefer to mark drives.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I've known two or three people who thought this worked properly, until it somehow misidentified a disk and they pulled the wrong one from a production system. So while this is a "should work" type of situation, I definitely wouldn't trust it with my data. ;)
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
It's easy to make mistakes. This is why I so heartily believe in belt and suspenders. You don't just trust one thing. Trust but verify. Saved my ass so many times.... for drives, I like slot number PLUS seeing proof on the activity LED (either positive or negative proof is fine) PLUS serial number where possible.
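
One way to script the serial-number part of that cross-check is sketched below: before pulling the drive in a given slot, confirm that the serial the controller reports for that slot is the same one smartctl reports for the failing device. The function names are made up for illustration. It assumes smartctl prints a "Serial Number:" line (as it does for ATA drives) and that the sas2ircu display output is laid out as in the first post.

```shell
# Hypothetical helper: serial number from `smartctl -a <device>` output on stdin.
smart_serial() {
    awk -F': *' '/^Serial Number:/ { print $2; exit }'
}

# Hypothetical helper: serial the controller reports for enclosure:slot ($1).
# Reads `sas2ircu <index> display` output on stdin; prints nothing if no match.
serial_for_slot() {
    awk -v enc="${1%%:*}" -v slot="${1##*:}" '
        /Enclosure #/ { e = $NF }
        /Slot #/      { s = $NF }
        /Serial No/ && e == enc && s == slot { print $NF; exit }
    '
}

# Usage: only trust the slot if both tools agree on the serial.
#   a=$(smartctl -a /dev/da11 | smart_serial)
#   b=$(sas2ircu 0 display | serial_for_slot 2:11)
#   [ -n "$a" ] && [ "$a" = "$b" ] && echo "2:11 holds $a" || echo "MISMATCH"
```

This does not replace watching the LED or reading the sticker on the drive; it only catches the case where the two tools disagree.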
 

pclausen

Patron
Joined
Apr 19, 2015
Messages
267
So I downloaded this utility and it worked great on 2 of my 3 expanders (all Supermicro 24 port SAS2 in 846 chassis), but when I did the detail view on the 3rd one, my entire FreeNAS server did a hard reset!

It happened as soon as I ran the display command against my index 2 controller. Here's a snippet of the frozen PuTTY session:

[Attachment: sas2ircu.PNG]


Now, I was taxing the system heavily at the time, being in the middle of copying 20TB from one datastore to another, and with 32GB of RAM I'm probably pushing it with 50 drives, so maybe that was the reason.

I'll cautiously try the command again once the copying is done and I have done a fresh reboot.

The utility did a great job of letting me validate my spreadsheet where I track serial numbers to bays, but if it is what caused the hard reset, I'll be staying away from this utility in the future.
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
pclausen said:
Now, I was taxing the system heavily at the time, being in the middle of copying 20TB from one datastore to another, and with 32GB of RAM I'm probably pushing it with 50 drives, so maybe that was the reason.

I'd say that it shouldn't have happened whatever load you've put on it.
 

pclausen

Patron
Joined
Apr 19, 2015
Messages
267
I just realized I was using the "P20" version of sas2ircu. I have downloaded and installed the "P16" version to match the firmware version on my LSI controllers.

I'll give this version a spin once my datastore copy is completed.
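
For anyone wanting to check which release of the utility they have, here is a small sketch that reads the banner sas2ircu prints and reports the phase. It assumes the "P" number is simply the first component of the version string, so the "Version 20.00.00.00" banner shown in the first post would be P20. The function name sas2ircu_phase is made up for illustration.

```shell
# Hypothetical helper: derive the "Pxx" phase from the sas2ircu banner.
# Reads any sas2ircu output that starts with the banner (e.g. `sas2ircu list`)
# on stdin.
sas2ircu_phase() {
    awk '/^Version [0-9]/ {
        split($2, v, ".")        # "20.00.00.00" -> v[1] = "20"
        print "P" (v[1] + 0)     # +0 drops any leading zero
        exit
    }'
}

# Usage:
#   sas2ircu list | sas2ircu_phase
```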
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
pclausen said:
I just realized I was using the "P20" version of sas2ircu. I have downloaded and installed the "P16" version to match the firmware version on my LSI controllers.

I'll give this version a spin once my datastore copy is completed.

I thought about that, but didn't bring it up since it's far from certain that the tool only works properly with P20 firmware/drivers. I would want an external tool to work with all previous versions, but perhaps I'm giving LSI more credit than is due.
 

pclausen

Patron
Joined
Apr 19, 2015
Messages
267
I'm happy to report that the P16 version hasn't caused any hard resets of the entire server. Not sure if what happened before was a fluke or if there was an issue running the P20 version.
 

Scharbag

Guru
Joined
Feb 1, 2012
Messages
620
Just tested this on a SM847 with a 9211-8i and LSI SAS2 expanders. Worked as expected and matched the serial # stickers on the drive carts perfectly.

Thanks for the info!!

Cheers,
 