Failing to boot with Samsung SAS SSDs in place

dev_dull

Dabbler
Joined
Dec 15, 2017
Messages
12
Hey All,
I've got 9 Samsung SAS SSDs (p/n: MZ6ER200HAGM-00003), and whenever I have *any* of them plugged in at boot, I get the messages below. I hot-plugged the drives and created a z2 pool, but the issue persists. Any insight into what's going on here? I've got three LSI SAS2008 cards connected to my 20-drive backplane, and an existing z2 pool of 10 disks (where feature updates are pending) on a fully up-to-date FreeNAS-11.3-RELEASE system.

Code:
gptzfsboot: error 1 lba 1
gptzfsboot: error 1 lba 1080
error 1 lba 1080
failed to clear pad2 area of primary vdev
failed to read pad2 area of primary vdev
gptzfsboot: error 1 lba 192999656
gptzfsboot: error 1 lba 192999656
gptzfsboot: error 1 lba 132149728
gptzfsboot: error 1 lba 132149728
gptzfsboot: error 1 lba 21013120
gptzfsboot: error 1 lba 21013120
ZFS: i/o error - all block copies unavailable
ZFS: can't read MOS of pool freenas-boot
gptzfsboot: failed to mount default pool freenas-boot

FreeBSD/x86 boot
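
To make the messages above easier to read, here is a minimal sketch that pulls the distinct failing LBAs out of the log and converts them to byte offsets. It assumes 512-byte sectors on the boot device, which is not confirmed anywhere in this thread; `failed_reads` and `SECTOR_BYTES` are names made up for this illustration.

```python
import re

# Assumption: the boot device uses 512-byte logical sectors. Adjust if yours differ.
SECTOR_BYTES = 512

# Matches both "gptzfsboot: error 1 lba 1080" and the bare "error 1 lba 1080" form.
ERROR_LINE = re.compile(r"^(?:gptzfsboot: )?error (\d+) lba (\d+)$")

BOOT_LOG = """\
gptzfsboot: error 1 lba 1
gptzfsboot: error 1 lba 1080
error 1 lba 1080
gptzfsboot: error 1 lba 192999656
gptzfsboot: error 1 lba 192999656
gptzfsboot: error 1 lba 132149728
gptzfsboot: error 1 lba 132149728
gptzfsboot: error 1 lba 21013120
gptzfsboot: error 1 lba 21013120
"""


def failed_reads(log_text, sector_bytes=SECTOR_BYTES):
    """Return (lba, byte_offset, times_seen) for each distinct failing LBA, sorted by LBA."""
    counts = {}
    for line in log_text.splitlines():
        match = ERROR_LINE.match(line.strip())
        if match:
            lba = int(match.group(2))
            counts[lba] = counts.get(lba, 0) + 1
    return [(lba, lba * sector_bytes, seen) for lba, seen in sorted(counts.items())]


if __name__ == "__main__":
    for lba, offset, seen in failed_reads(BOOT_LOG):
        print(f"lba {lba:>10}  offset {offset / 1e9:8.3f} GB  seen {seen}x")
```

Run against the messages above, this lists five distinct LBAs. Under the 512-byte assumption the highest one sits roughly 98.8 GB into the device, so the failing reads are spread across the disk rather than clustered at the start.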
 

Attachments

  • IMG_20200224_200527.jpg

dev_dull

I've been digging some more into this and I haven't found a confirmed solution, but a couple of similar posts suggest a re-install is a likely fix here. Anyone seen these symptoms who can confirm?
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
The searching I did suggests this is related to FreeBSD querying all drives at boot and throwing a minor fit if one of them doesn't have a valid partition table. If your SAS2008s still have their option ROMs, check that they aren't flagging the new drives as bootable devices. Are your cards in IT mode?

Failing that, hotplug the SM1625s and secure-erase them to nuke any remnants of a partition table, then see if it boots again?
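
For anyone who wants to see what is actually on a drive before erasing it, here is a rough sketch that looks for the two standard markers: the `0x55 0xAA` boot signature at the end of sector 0 (MBR or protective MBR) and the ASCII string `EFI PART` at the start of sector 1 (GPT header). It assumes 512-byte sectors; `partition_table_status` is a made-up name, and reading a raw device normally needs root. It only inspects signatures and does not validate the tables themselves.

```python
import sys

SECTOR_BYTES = 512  # assumption: 512-byte logical sectors


def partition_table_status(path, sector_bytes=SECTOR_BYTES):
    """Report whether the device or image at `path` carries MBR and GPT signatures."""
    with open(path, "rb") as disk:
        lba0 = disk.read(sector_bytes)   # sector 0: MBR or protective MBR
        disk.seek(sector_bytes)
        lba1 = disk.read(sector_bytes)   # sector 1: primary GPT header
    return {
        # The MBR boot signature lives in the last two bytes of the first 512 bytes.
        "mbr_signature": len(lba0) >= 512 and lba0[510:512] == b"\x55\xaa",
        # A GPT header starts with the 8-byte ASCII signature "EFI PART".
        "gpt_header": lba1[:8] == b"EFI PART",
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python3 check_table.py /dev/da0   (device name is only an example)
    print(partition_table_status(sys.argv[1]))
```

A drive that reports `False` for both has no recognizable partition table at the usual locations. A stale table left over from previous use would show up as `True` here.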
 

dev_dull

So, it's none of those things, but asking about IT mode reminded me that one of the three HBAs was extremely difficult to flash into IT mode. No joke, it took two machines, three utilities, and three OSes to wipe out the IR firmware and get the IT firmware onto it, while the other two cards were a breeze.

Long story short, I confirmed that the HBA in question had not been in use until I added the new drives, I swapped some cables around so I knew which drive bay went to which HBA, and tested booting adding one drive at a time. Every time I added a drive to the HBA in question, my boot issues reappeared.

So, looks like I'll be buying another SAS2008.

For anyone else who runs into this issue in the future, check the "Boot Support" setting (as recommended by HoneyBadger above) in the Avago Config Utility (Ctrl+C at the prompt during boot). I suspect this setting is part of my problem, but the utility refuses to load for me, presumably because of the bad card.
 

dev_dull

I forgot to add that I'll update this thread if the replacement HBA fixes the issue.

I don't seem to have permission to edit my initial post, but I also wanted to add that FreeBSD does begin to boot (the text-based spinner blips on the screen for a second) before spitting out the above messages.
 

dev_dull

Flashed and dropped in the replacement HBA this afternoon. This one flashed over into IT mode without much effort. I'm still seeing precisely the same issue, however.
 

dev_dull

I tried a few more troubleshooting steps here including booting from a thumb drive into Linux as a sanity test. Linux found all disks without issue, which makes me feel like I'm narrowing in on a FreeBSD/TrueNAS issue here.
 

HoneyBadger

Sorry for the radio silence here.

Is Linux able to write partition tables to the disks and use them? Just trying to narrow down whether it's an issue with the drives or the drivers.

If this is the first time you've used the SSDs, have you made sure they aren't using 520-byte sectors or a T10 protection mode?
 

dev_dull

Hey! Thanks for getting back. Here's what I've dug up or tested based on your suggestion:
- In Linux, I created new GPT+ext4 partitions on the Samsung disks, mounted them, and made test files on all the disks without issue.
- In Linux, I removed the ext4 partitions, created a raidz2 using all of the Samsung disks, and used dd to create files between 100MB and 1GB.
- I pulled back as much info on one of the disks as I could think of (`smartctl`, `hwinfo --block`, and `fdisk -ll`; let me know if I've missed the right command) and found no mention of the disk's support for T10, DIX, or DIF.
- All drives have 512B sector sizes.
- I was going to re-install and restore the backed-up config yesterday, but I got a "can't load 'kernel'" message. To validate that there wasn't anything wrong with my thumb drive, I shut down, pulled the Samsung drives again, then booted cleanly into the installer (pausing and shutting back down at the installer splash/prompt).
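
One more place to look for the sector size and T10 protection questions is Linux sysfs. The sketch below is a rough illustration, not a definitive check: it reads the block sizes from `/sys/block/sdX/queue/` and the `protection_type` attribute that the SCSI disk driver exposes under `/sys/block/sdX/device/scsi_disk/`. As I understand it, a `protection_type` of `0` means the disk is not formatted with protection information. The function names and the "needs attention" rule are my own assumptions.

```python
from pathlib import Path


def _read(path):
    """Return the stripped contents of a sysfs attribute, or None if unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def describe_disks(sys_root="/sys"):
    """Collect block sizes and T10 protection type for every sd* disk under sys_root."""
    report = {}
    for disk in sorted((Path(sys_root) / "block").glob("sd*")):
        info = {
            "logical_block_size": _read(disk / "queue" / "logical_block_size"),
            "physical_block_size": _read(disk / "queue" / "physical_block_size"),
            "protection_type": None,
        }
        # The attribute sits one level down, in a directory named after the
        # SCSI address (host:channel:target:lun).
        for attr in (disk / "device" / "scsi_disk").glob("*/protection_type"):
            info["protection_type"] = _read(attr)
        report[disk.name] = info
    return report


def needs_attention(info):
    """Assumed rule: flag unusual sector sizes or a non-zero protection type."""
    odd_sector = info["logical_block_size"] not in ("512", "4096")
    protected = info["protection_type"] not in (None, "0")
    return odd_sector or protected


if __name__ == "__main__":
    for name, info in describe_disks().items():
        flag = "CHECK" if needs_attention(info) else "ok"
        print(f"{name}: {info} [{flag}]")
```

Note that a drive formatted with 520-byte sectors may not be exposed by the Linux sd driver with a usable size at all, so treat a clean result here as supporting evidence rather than proof.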
 

dev_dull

Just for kicks, I tried to boot to the TrueNAS Core 12.0 nightly build image and got the same "can't load 'kernel'" message -_-
 

dev_dull

Okay, now that I have a spare, matching HBA, I put it into a spare machine I had lying around, plugged in four of the SAS drives, and booted into the FreeNAS installer without issue. With that info, I went back to the original system and began playing around with different combinations of drives plugged into different HBAs. I found that I can have any combination of 3 drives installed, but if I plug in any more than that, FreeNAS fails to boot as outlined in my initial post.
 

dev_dull

I haven't filed a Jira bug. If you point me to the right place, I'd be glad to do so.

fwiw, I think something is happening after grub (or I assume it's grub) but before the boot splash.
 

dev_dull

Following up on an old thread for anyone else running into the issue described above. While I still don't know the exact cause, it appears to have been a FreeBSD-based issue, as upgrading to the Linux-based TrueNAS SCALE has resolved it for me.
 