Unable To Boot FreeNAS -- IBM M1015 Controller Issue?

Status
Not open for further replies.

BobCochran

Contributor
Joined
Aug 5, 2011
Messages
184
I am practicing with smartctl and will start testing the drives on my system soon. In fact I'm testing on a well-used Toshiba DT01ACA300 drive which is not part of my FreeNAS system. The "remaining lifetime" column says 654 hours. Should I take that number literally and expect the drive to fail after 654 hours of operation?

If I see a low "remaining lifetime" value on other drives, is that a warning to replace the drives as soon as possible?

Thanks a ton

Bob
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I ignore the remaining lifetime field as it's a guesstimate (at best). Instead look at the disk parameters and see if anything is out of place.
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
I'm pretty sure there's a good SMART tutorial here on the forum, but I can't find it...

Anyway, here are the SMART attributes I watch:

1 Raw_Read_Error_Rate
5 Reallocated_Sector_Ct
7 Seek_Error_Rate
9 Power_On_Hours
183 Runtime_Bad_Block
193 Load_Cycle_Count
194 Temperature_Celsius
196 Reallocated_Event_Count
197 Current_Pending_Sector
198 Offline_Uncorrectable
199 UDMA_CRC_Error_Count
200 Multi_Zone_Error_Rate

SMART attributes can vary by manufacturer, so it's always a good idea to research your particular hardware.
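
As a rough illustration of watching the attributes listed above, here is a minimal Python sketch that picks those IDs out of the attribute table printed by `smartctl -A`. The function name `watched_attributes` and the sample rows are made up for illustration; the sample is not output from any drive in this thread, and column layouts can differ between drives and smartmontools versions.

```python
import re

# Attribute IDs from the list above.
WATCHED_IDS = {1, 5, 7, 9, 183, 193, 194, 196, 197, 198, 199, 200}

# One row of the attribute table: ID, name, flag, value, worst, thresh,
# type, updated, when_failed, raw value (the raw value may contain spaces).
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+(0x[0-9a-fA-F]+)\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"(\S+)\s+(\S+)\s+(\S+)\s+(.+)$"
)

def watched_attributes(smartctl_output):
    """Return {id: (name, raw_value)} for watched attributes found in the text."""
    found = {}
    for line in smartctl_output.splitlines():
        m = ROW.match(line)
        if m and int(m.group(1)) in WATCHED_IDS:
            found[int(m.group(1))] = (m.group(2), m.group(10).strip())
    return found

# Made-up sample rows, for illustration only.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   090   090   000    Old_age   Always       -       8760
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       120
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
"""

if __name__ == "__main__":
    for attr_id, (name, raw) in sorted(watched_attributes(SAMPLE).items()):
        print(attr_id, name, raw)
```

In practice the text would come from running `smartctl -A` against the drive in question rather than from a hard-coded sample.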
 

BobCochran

Contributor
Joined
Aug 5, 2011
Messages
184

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778

BobCochran

Contributor
Joined
Aug 5, 2011
Messages
184
Thank you very much, @Robert Trevellyan! I'll study the Wikipedia and Backblaze articles.

Bob
 

BobCochran

Contributor
Joined
Aug 5, 2011
Messages
184
Hi! I have tested all my hard drives following the procedure @Spearfoot specified, except I used the smartmontools from the very latest rescue build of Alt Linux. One of the Seagate ST3000DM001 3 TB drives is belly-up dead. Three other drives (all Seagate 2 TB units) passed the smartctl "short" test, but there are errors reported in the output of `smartctl -a /dev/sda`. I know I definitely need to replace the failed 3 TB drive, and I'll get an order out in a few minutes. But what should I do about the other 3 hard drives that passed testing, but do have errors reported? Should these be replaced? I want to do the prudent thing here.

I want to express my deepest thanks to @Spearfoot for spelling out a test procedure for me. Until now I have been totally unaware of smartmontools (I've heard about them but never inquired or Googled about them), and after using them on about 18 hard drives now, I'm wondering how I ever managed without them. Thanks, Spearfoot!

Bob
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
what should I do about the other 3 hard drives that passed testing, but do have errors reported?
Depends on various factors, including the type of error. I'd consider things like how recently the errors occurred (the logs should show the 'power on hours' when each error was recorded) and whether they were clustered or spread out. For example, I'd consider a cluster of errors recorded a significant time ago less troubling than repeated errors over the life of the drive. You could post the smartctl output here for members to comment on.
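
The reasoning above can be sketched roughly in Python. This is only an illustration: `summarize_errors` is a hypothetical helper, and the `recent_window` and `cluster_span` cutoffs are arbitrary placeholders rather than any standard. The error hours would be read by hand from the error log section of the smartctl output.

```python
def summarize_errors(error_hours, current_hours, recent_window=500, cluster_span=24):
    """Summarize when logged errors occurred relative to the drive's current age.

    error_hours: power-on hours at which each error was logged.
    current_hours: the drive's current power-on hours.
    recent_window, cluster_span: hypothetical cutoffs in hours, not standards.
    """
    if not error_hours:
        return {"count": 0, "recent": False, "clustered": False}
    hours = sorted(error_hours)
    return {
        "count": len(hours),
        # Did the latest error happen within the recent window?
        "recent": current_hours - hours[-1] <= recent_window,
        # Did all errors land within a short span of each other?
        "clustered": hours[-1] - hours[0] <= cluster_span,
    }
```

A result that is clustered and not recent matches the less troubling case described above; errors that are spread out and recent would deserve a closer look.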
 

Spearfoot

He of the long foot
Moderator
Joined
May 13, 2015
Messages
2,478
Aw, shucks! You're very welcome, @BobCochran! :smile:
 

BobCochran

Contributor
Joined
Aug 5, 2011
Messages
184
Hi!

I replaced the 3 TB hard drive yesterday. All the other drives had tested okay (some, as noted above, passed but have errors). Now the HBA quickly displays a device list of all the hard drives on my monitor, but FreeNAS still does not boot. Instead the system attempts to PXE boot, even though the FreeNAS USB stick is listed first in the system boot order.

I am stumped about why this is happening. Could I have flashed totally the wrong firmware for the HBA? Perhaps I need to flash the firmware for the LSI 9240-8i instead of the 9220-8i or the 9210-8i?

Thanks

Bob

 

melloa

Wizard
Joined
May 22, 2016
Messages
1,749
@BobCochran, as I've posted before, here's what I'd do: "I usually try to isolate to the bare minimum and then add parts back one at a time until I find the faulty one."
Get the box booting first without anything connected to it...
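
The add-one-at-a-time approach can be sketched in Python as below. This is a sketch under assumptions: `find_breaking_parts` and the `boots` callable are hypothetical, and `boots` simply stands in for a manual boot attempt with the given parts connected.

```python
def find_breaking_parts(parts, boots):
    """Add parts one at a time; return (connected, suspects).

    parts: ordered list of parts to add (drives, expander, and so on).
    boots: callable taking the list of connected parts and returning True
           if the system boots. It stands in for a manual boot attempt.
    """
    connected = []
    suspects = []
    # Start from the bare minimum: nothing extra attached.
    if not boots(connected):
        raise RuntimeError("bare system does not boot; fix that first")
    for part in parts:
        if boots(connected + [part]):
            connected.append(part)
        else:
            # Leave the part out, note it, and keep going with the rest.
            suspects.append(part)
    return connected, suspects
```

Note that this only flags parts that break the boot on their own at the moment they are added; a problem caused by the total number of parts (for example a power limit) shows up as every later part being flagged.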
 

BobCochran

Contributor
Joined
Aug 5, 2011
Messages
184
Thank you! That will be my next step. I'll report results in the next couple of days.

Bob
 

melloa

Wizard
Joined
May 22, 2016
Messages
1,749

BobCochran

Contributor
Joined
Aug 5, 2011
Messages
184
Any luck?

Hi! I had something hit my eye recently in a yard accident and I've been "down" with an eye infection. I'm just starting to feel well enough to work with the FreeNAS array again. Either later tonight or tomorrow night I'll follow the method you suggest and report results.

Thanks a ton

Bob
 

BobCochran

Contributor
Joined
Aug 5, 2011
Messages
184
Hi! I am sorry I have not followed up on this more promptly. I did follow the method I believe melloa was suggesting to me and it was very helpful. Unfortunately I still don't have the jackpot answer to how to get FreeNAS to boot on my system. What I did was this. I disconnected the Intel RES2CV360 SAS expander entirely, including removing power from it. Then I connected one of the hard drives among my devices to just the HBA (it is running firmware P13) and tried to boot FreeNAS. I took photos of the devices that the HBA is able to see. If FreeNAS booted, I would take a photo of which storage pools it was able to discover.

One at a time, I connected the drives to my HBA. I have 14 total hard drives in my setup. One at a time, I attempted to boot FreeNAS. Each time, FreeNAS would boot. It also found one of my storage pools -- one of the small ones based on a UFS drive.

However, using the cables I have, I could only add 8 hard drives to the HBA: 2 ports X 4 hard drives for each port. When I wanted to connect a 9th hard drive, I switched over to connecting the Intel SAS expander to the HBA, and connecting 9 hard drives to the expander rather than the HBA. With 9 hard drives connected...FreeNAS would still boot.

With the 10th hard drive, FreeNAS would not boot. The Intel PXE boot agent would start up instead, and the system would attempt to PXE boot. I took this drive out of the system and tested it again with smartctl. It failed the self test. It was a Seagate 3 TB drive. I replaced the drive with a brand new and bare drive of the same size and brand. FreeNAS booted once I did this.

I connected the 11th hard drive and FreeNAS booted, but still could not find 2 storage pools.

I connected the 12th hard drive and FreeNAS again booted.

Starting with hard drive #13 FreeNAS refused to boot. It also failed to boot when I connected drive #14, the final drive in the setup, by itself. Nor would FreeNAS boot when drives #13 and 14 were connected together. In each case, the Intel PXE boot agent would start up instead, and the system would attempt to PXE boot. These drives tested okay with smartctl.

This caused me to look more closely at the HBA adapter SAS topology list, and also at the hard drive list that my SuperMicro motherboard's BIOS is reporting.

Apparently, it is possible, in the HBA properties, to select a hard drive as a preferred boot device. I don't know how this HBA property works or what the intent is. I just want to report it in case this is the problem. I notice this display after drilling into the SAS Topology screens:

select_device_as_boot_device.jpg


Well I now wonder if the system is pointed to entirely the wrong boot device? I also noticed that my motherboard's BIOS will list hard drives that are connected to the SAS expander. I see this screen in the system BIOS "Boot" menu.

list_of_hard_disk_drives_from_system_bios.jpg


So I wonder if disabling the above list of devices will allow FreeNAS to boot with all 14 hard drives connected to the SAS expander?

Thanks a lot for the help and advice.

Bob
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
You don't *need* to have any HDs connected to your box for FreeNAS to boot.

Can you boot without the HBA and any HDs installed?

If you can't... fix that first.

If you can... that's interesting :)
 

melloa

Wizard
Joined
May 22, 2016
Messages
1,749
@BobCochran - Glad you are back and OK.

It seems the system boots with up to 12 HDs connected, so something about adding the 13th is probably causing the system to become unstable and stop working. It could be related to the controller, either the card itself or a cable. As the order of the HDs won't affect boot, have you swapped HD #12 with #13? Have you tried another card, or that card in another box? I'm also not sure why your BIOS would show the HDs connected to the SAS expander. I'd have thought they wouldn't show up, but I'm a complete dummy on those cards and have only used them a couple of times to test and learn how to flash them.
Again, my thinking is that, since you can boot as you add HDs, FN plays no part in the system not booting (it boots without HDs :) )

Hopefully someone can provide more helpful thoughts here...
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Could it be the psu?
 

melloa

Wizard
Joined
May 22, 2016
Messages
1,749
It could, if the combined draw of the powered HDs, motherboard, card, etc. exceeds the PSU's maximum capacity. That could also happen if the PSU is going bad and can't keep up with the demand.
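
A back-of-the-envelope check of that idea might look like the Python sketch below. Everything numeric here is a placeholder assumption, not a measurement of the system in this thread: the per-drive spin-up current, the load from the rest of the system, and the PSU's 12 V capacity should all come from the drive datasheets and the PSU label. Only the drive count of 14 comes from the thread.

```python
def spinup_headroom_watts(n_drives, spinup_amps_12v, other_load_watts, psu_12v_watts):
    """Return the 12 V headroom in watts if all drives spin up at once.

    A negative result means the assumed load exceeds the assumed capacity.
    """
    drive_watts = n_drives * spinup_amps_12v * 12.0
    return psu_12v_watts - (drive_watts + other_load_watts)

if __name__ == "__main__":
    # Placeholder figures only: 2.0 A per drive at spin-up, 150 W for
    # everything else, 480 W available on the 12 V rail.
    for n in (12, 14):
        print(n, "drives:", spinup_headroom_watts(n, 2.0, 150.0, 480.0), "W headroom")
```

With these made-up figures the margin turns negative somewhere between 12 and 14 drives, which only shows how a drive-count threshold could arise from power limits; it says nothing about whether that is what is happening here.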
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
It can also be an issue with how the power is distributed.
 