Unable To Boot FreeNAS -- IBM M1015 Controller Issue?

Status
Not open for further replies.

BobCochran

Contributor
Joined
Aug 5, 2011
Messages
184
Hi!

Last November, my FreeNAS server suddenly died one day. I was using SAS adapters as listed in my signature block, running LSI SAS BIOS P13. I've flashed these to P16 in IT mode, but after the adapters detect attached hard drive devices, my FreeNAS implementation doesn't boot -- that is, the system does not appear to use the USB flash drive to boot from. Instead the system attempts to PXE boot. I am not sure why the "system" suddenly cannot boot from FreeNAS after the SAS controller finds all the hard drives. Everything had worked fine for a couple of years. In fact it worked so well that I simply forgot it was there until it crashed.

Since then I've done the following:

* Purchased an Intel RES2CV360 SAS expander.
* Purchased and installed new cables for connection from the expander to my hard drive devices.
* Removed one of the two IBM M1015 controller cards from my system and connected the remaining one to my SAS expander, again with brand new cables.
* Installed a brand new power supply on the computer system.

I've had many periods of not doing a thing to rescue the system, partly due to life events and mostly from ignorance of what the problem might be.

It is in my budget to purchase a new SAS/RAID controller card if that is what the real issue is -- if the cards have somehow gone dead. I really need to get the FreeNAS system up and running.

Thanks for any help that can be provided.

Bob
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
First things first... Have you verified that the system BIOS is configured to boot from the USB stick?
 

BobCochran

Contributor
Joined
Aug 5, 2011
Messages
184
Yes, I've checked that. The USB drive containing FreeNAS is definitely first in the boot order. The issue is that after the M1015 controller card displays a list of all the devices it found on the HBA channels, and is finished, the system does not boot to FreeNAS. I've tried using CTRL-C to get in the adapter configuration utility, then pressing the <enter> key to get a display of settings, then changing the "boot support" item to:

* Disabled: this appears to disable the HBA's operation, but it does allow FreeNAS to boot. It does me no good; FreeNAS can't find my hard drives.
* BIOS & OS: causes the system to go to PXE boot after the adapter is initialized.
* OS Only: apparently disables the adapter, so I shut down the system as FreeNAS was booting by pressing and holding the power button.

In the photo below, there are two disk packs of 9 hard drives in the foreground, and in the system unit are more drives. The device sitting on the pink bubble wrap is an Intel RES2CV360 expander. The motherboard and processor details are in my signature block.

overall_system_view_small.jpg


This photo shows 2 of my "spare" HBAs. Neither seems to work properly with respect to booting to FreeNAS, either. The top adapter is flashed to BIOS P16 and the bottom adapter remains at P13.

spare_m1015_hba_cards_small.jpg


Thanks

Bob
 

maglin

Patron
Joined
Jun 20, 2015
Messages
299
Reset your BIOS to optimal defaults. Don't enable PXE boot, or disable it if it's enabled. Then change the boot order. Don't change anything else, and see if it boots. Ironically, a cheap SSD would probably take care of this problem; just boot from that.


 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
The PXE (network boot) option is usually the very last BIOS boot option defined in a system's boot order. The fact that your system gets to it implies that none of the preceding boot selections are working. So my guess is that your USB stick has failed. I suggest you get a new Sandisk Cruzer Fit 16GB USB 2.0 flash drive (or a cheap/small SSD as @maglin suggests) and re-install FreeNAS, restoring your settings from a saved configuration file. Hopefully you have a copy of the configuration file handy?

One of the basic tenets of system troubleshooting is not changing more than one thing at a time. Doing so makes it nearly impossible to isolate and identify a problem. You've added an expander, swapped HBA cards, swapped cables, installed a new power supply, etc., without really knowing why the system failed in the first place. No harm done, I suppose, but it would be better to get the system working before adding new devices.
 

BobCochran

Contributor
Joined
Aug 5, 2011
Messages
184
I was going to ask how to extract the config file from my current USB flash drive containing FreeNAS. I have backup copies of the config file on my other computer, but I can't remember whether the newest backup of the config is before or after the last addition of a hard drive to my disk array. I think I can mount the flash drive on AltLinux and copy the config file to another disk if I knew the name of the config.

I think there is a misunderstanding on one point though. I am not saying that FreeNAS does not boot at all. It does indeed boot...if the HBA card doesn't display a list of hard drives on the screen. I previously did clone my FreeNAS flash drive to a new flash drive. Nevertheless, I'll replace that.

You are correct, I made all those changes without really knowing why the system failed in the first place. To add to my list of sins, I even flashed two of the HBA cards to P16 from P13. I give no excuses for this, especially since I'm a software developer and should know better.

Thanks

Bob

 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
My mistake: I thought your system wouldn't boot, and the fact that it does means your USB drive is good and doesn't need to be replaced.

Not sure which version of FreeNAS you're running: for 9.3.x and 9.10 the config file is /data/freenas-v1.db
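If you do get the stick's FreeNAS filesystem mounted read-only on another machine, something like the following could take a timestamped copy of the config. This is only a sketch: the function name and the mount point are made up for illustration, and it assumes the file really lives at data/freenas-v1.db under that mount point. The FreeNAS config is an SQLite database, so the script checks the standard SQLite file header before copying, which should catch a wrong or corrupted file.

```python
import shutil
import time
from pathlib import Path

SQLITE_MAGIC = b"SQLite format 3\x00"  # first 16 bytes of any SQLite 3 file


def backup_freenas_config(mount_point, dest_dir):
    """Copy data/freenas-v1.db from a mounted FreeNAS boot device.

    mount_point: where the FreeNAS filesystem is mounted (hypothetical path).
    dest_dir: directory that receives the timestamped copy.
    Returns the path of the copy.
    """
    src = Path(mount_point) / "data" / "freenas-v1.db"
    if not src.is_file():
        raise FileNotFoundError(f"no config database at {src}")

    # Refuse to back up something that is not an SQLite file.
    with src.open("rb") as fh:
        if fh.read(16) != SQLITE_MAGIC:
            raise ValueError(f"{src} does not look like an SQLite database")

    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = dest_dir / f"freenas-v1-{stamp}.db"
    shutil.copy2(src, dest)  # copy2 also preserves the modification time
    return dest
```

For example, backup_freenas_config("/mnt/freenas", "freenas-backups") would drop a dated copy into freenas-backups, with both paths being placeholders. The preserved modification time may help you work out whether a given copy is older or newer than your last pool change.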

Odd boot problem you're having... I'd be tempted to flash the HBA cards without the BIOS option ROM code and see if that helps. You don't need the option ROM for IT mode. Also, I'd check to make sure the two cards have unique SAS IDs.
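On the unique SAS ID point: a SAS address is a 64-bit value, normally written as 16 hex digits, and tools don't all print it with the same separators. Here's a rough sketch for comparing addresses you've written down from each card. The function names are mine and the addresses in the example are placeholders, not real ones; it doesn't talk to the cards at all.

```python
from collections import Counter


def normalize_sas_address(text):
    """Strip separators and case so differently formatted addresses compare equal."""
    cleaned = text.strip().lower()
    if cleaned.startswith("0x"):
        cleaned = cleaned[2:]
    for sep in ("-", ":", " "):
        cleaned = cleaned.replace(sep, "")
    # A SAS address is 64 bits, i.e. exactly 16 hexadecimal digits.
    if len(cleaned) != 16 or any(c not in "0123456789abcdef" for c in cleaned):
        raise ValueError("not a 16-digit hexadecimal SAS address: %r" % text)
    return cleaned


def duplicate_addresses(addresses):
    """Return the normalized addresses that appear more than once, sorted."""
    counts = Counter(normalize_sas_address(a) for a in addresses)
    return sorted(addr for addr, n in counts.items() if n > 1)


if __name__ == "__main__":
    # Placeholder values: replace with the addresses read from your own cards.
    cards = ["5000000-0-8000-0001", "5000000080000001", "5000000080000002"]
    print(duplicate_addresses(cards))
```

An empty result means every card has its own address. Anything listed is shared by at least two cards and would need changing on one of them.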
 

BobCochran

Contributor
Joined
Aug 5, 2011
Messages
184
Thank you very much for your help. I'm running FreeNAS 9.2.0 as of February 2014. I'll look for a /data/freenas-v1.db file and copy it.

I will also try re-flashing one of the HBAs without the "option ROM". I'll go over the files I downloaded from Avago's website ... one of them has the characters "rom" in the filename, I think. That would be the one to skip?

Bob
 

melloa

Wizard
Joined
May 22, 2016
Messages
1,749
My mistake: I thought your system wouldn't boot, and the fact that it does means your USB drive is good and doesn't need to be replaced.

Funny enough, I was (am) on the same wavelength as you.

When @BobCochran says, and I quote:

"system" suddenly cannot boot from FreeNAS after the SAS controller finds all the hard drives.


His FN does see the HDs.

and continues with:

Instead the system attempts to PXE boot.

It leads to the conclusion that all boot devices failed and it is trying the last option on its list.

I do agree that making multiple changes is not a good troubleshooting methodology. I usually try to strip the system down to the bare minimum and then add components back one at a time until I find the faulty one. For instance, disconnect everything and boot from the boot device alone to ensure that it is working ...
 

BobCochran

Contributor
Joined
Aug 5, 2011
Messages
184
Thanks for the suggestions. First I tried @maglin's advice to disable PXE boot in my SuperMicro motherboard's BIOS. However, it seems as if PXE booting can't be disabled, at least not when you are setting the boot order. I was still able to confirm that the USB flash drive housing FreeNAS is the first boot device on that motherboard.

Next, I wanted to change the "boot support" option of the HBA's "Adapter Properties" to "BIOS & OS". It was previously set to "OS Only". The "OS Only" setting does cause FreeNAS to boot off the USB. But it does not seem able to find my hard disk drives. It gives a lot of "sense errors".

So, I went into the HBA's adapter properties to make the changes. Here are some screen shots that might help.

first_screen_in_config_utility_small.jpg


Then I pressed the [enter] key to get the adapter properties:

adapter_properties_small.jpg


I don't know if the "SAS9210-8i" shown in the red box is correct. The sticker on the adapter card seems to say SAS9220-8i. However, I wanted to change the "Boot support" to "BIOS & OS":

adapter_properties_boot_support_small.jpg


...I did that, then before saving that change I moved the cursor down to "SAS Topology" and pressed the [enter] key. It took many minutes to get the list that is shown next:

sas_topology_small.jpg


After that, I pressed ESC to exit, and saved my settings, and rebooted.

This brought me to this screen which seems normal...but...it does not show the hard drive devices. Instead, it displays a new "initializing" screen:

search_for_devices_small.jpg


It stays essentially stuck on the screen given below. I had thought it would briefly show a list of the hard drives being found, then the screen would go blank and the PXE boot screen would show up. Well, at least something different is happening. I wonder if I toasted the HBA somehow?


stuck_on_initializing_small.jpg


Tomorrow I will remove the HBA per @melloa and see if FreeNAS boots. I expect it will and will tell me it cannot find the storage pools it is expecting.

Thanks

Bob
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
...it does not seem able to find my hard disk drives. It gives a lot of "sense errors".

...I moved the cursor down to "SAS Topology" and pressed the [enter] key. It took many minutes to get the list that is shown next...


This brought me to this screen which seems normal...but...it does not show the hard drive devices. Instead, it displays a new "initializing" screen:

It stays essentially stuck on the screen given below. I had thought it would briefly show a list of the hard drives being found, then the screen would go blank and the PXE boot screen would show up. Well, at least something different is happening. I wonder if I toasted the HBA somehow?
Interesting. You're having sense errors; it takes 'many minutes' for the HBA option ROM BIOS to recognize and load the disks; and at boot time the system times out before the HBA recognizes the disks.

You may have one or more bad disks. In fact, you have at least two of the notorious Seagate ST3000DM001 drives, which are famous for failing spectacularly. I suggest you rig the system to boot Ubuntu from a USB flash drive and test each drive separately with the smartmontools:
Code:
1. Create a bootable Ubuntu USB stick (I recommend version 14.04).
2. Install the smartmontools on the USB stick (sudo apt-get install smartmontools).
3. Disconnect all drives from the HBA.
4. Attach a single drive to the HBA, boot Ubuntu, and run the smartctl tests on it.
5. Repeat step 4 for all of your drives.
This will be tedious, I know... but you have a major problem that needs addressing.
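To take some of the tedium out of reading the results, here is a rough sketch that runs smartctl -A on one device and flags a few attributes that commonly go non-zero on a failing disk. Treat it as a starting point under assumptions: the helper names and the list of watched attributes are my own choices, the parser assumes the usual ten-column ATA attribute table, and it needs to run as root. It only reads attributes. The self-tests themselves are started separately with smartctl -t short or smartctl -t long, and the results show up in smartctl -a.

```python
import subprocess

# Attributes that often point at a failing disk when their raw value is non-zero.
WATCHED = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
)


def parse_attributes(smartctl_output):
    """Return {attribute_name: raw_value_text} from `smartctl -A` style output."""
    attrs = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns:
        # ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0].isdigit():
            attrs[fields[1]] = " ".join(fields[9:])
    return attrs


def suspicious(attrs):
    """Return the watched attributes whose raw value is a non-zero integer."""
    bad = {}
    for name in WATCHED:
        raw = attrs.get(name, "0").split()[0]
        if raw.isdigit() and int(raw) > 0:
            bad[name] = int(raw)
    return bad


def check_drive(device):
    """Run `smartctl -A` on one device (e.g. /dev/sda) and report suspicious attributes."""
    proc = subprocess.Popen(
        ["smartctl", "-A", device],
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    output, _ = proc.communicate()
    return suspicious(parse_attributes(output))
```

Calling check_drive("/dev/sda") with a single drive attached returns an empty dict when none of the watched counters has moved. The device name is an example, so check which name the drive actually gets. A clean result here is not proof of a healthy disk, so the long self-test is still worth running.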

Good luck!
 

BobCochran

Contributor
Joined
Aug 5, 2011
Messages
184
Thanks a ton for taking the time to point out that the 3 TB Seagate drives may have failed. And thank you for suggesting the test protocol. I'm going to follow your suggestions and do exactly as you say. Hopefully further testing will help me here.

Bob
 

melloa

Wizard
Joined
May 22, 2016
Messages
1,749
You may have one or more bad disks. In fact, you have at least two of the notorious Seagate ST3000DM001 drives, which are famous for failing spectacularly. I suggest you rig the system to boot Ubuntu from a USB flash drive and test each drive separately with the smartmontools:

Question: If the FN boot device is OK, would it still boot if all the HDs failed?
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478

melloa

Wizard
Joined
May 22, 2016
Messages
1,749
I don't know...

That's what's intriguing me. When you install a new FN, you create the pools later, so it boots without any HDs being used. If one HD fails, it boots, the alarm starts flashing, and you can replace the failed HD. In @BobCochran's case, it could be more than just the pool. Let's see if anyone can solve this riddle, or if @BobCochran can isolate the fault with step-by-step hardware troubleshooting.
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
Yes, @BobCochran's problems are baffling... it's as if the HBA 'gives up' and isn't even seen by FreeNAS. :)

I'm hoping that simplifying his hardware testing will reveal either a bad HBA or one or more bad drives. Creep, crawl, walk, run...
 

BobCochran

Contributor
Joined
Aug 5, 2011
Messages
184
Thanks a ton! I think Alt Linux comes with smartmontools, and I'm more familiar with that than Ubuntu. I'm going to follow your advice and use smartmontools on each hard drive. I also have a new USB flash drive on order that will arrive tomorrow, and I'll put Ubuntu on that. I will also learn how to use smartmontools themselves -- there is probably a tutorial somewhere for how to test a hard disk with them. I want to get to the bottom of this. If it is just a matter of one or two failed drives, then I should be able to replace them with bare drives and have the entire array resilvered.

Thanks so much for the help and advice.

Bob
 

melloa

Wizard
Joined
May 22, 2016
Messages
1,749
@BobCochran that's the point I'm trying to make ... in a perfect scenario, you would be able to (1) create a new boot, (2) import your config and (3) import your pool. If there are one or two bad HDs in your pool (depending on the raid option used) you should be able to replace them (assuming you will install the same/compatible version of your old FN).

FN has been very good to me when HDs failed. It displayed a warning, and I replaced and resilvered the failed HD -- no problem.

If you don't have a saved config, you still can create a new boot, import your pool, change bad HDs, and reconfigure all your shares, jails, etc.

Wish you luck!
 