Boot array goes offline and system hangs

gmckeown

Cadet
Joined
Jan 10, 2020
Messages
4
I have a strange issue where my boot drives (ada0 & ada1) are going offline with errors. Screenshot attached.

System:
- Boot drives: 2 x Kingston DC-500R SSD (new drives, less than 1000 hours powered on)
- Supermicro X9DRi-F
- 24 Port chassis
- 24 x 1TB Seagate Exos SAS: 3 x 7-drive RAIDZ2 vdevs with 3 hot spares
- Avago 6GB SAS HBA
- 32GB Samsung ECC RAM

FreeNAS 11.2-U7

Has anyone seen this type of thing?

This is a purpose-built system with enterprise grade hardware. No desktop components.
 

Attachments

  • Screenshot at 2020-01-10 12-12-50.png

artlessknave

Wizard
Joined
Oct 29, 2016
Messages
1,506
are you using the onboard SATA, or the avago, for the boot drives?

"Avago 6GB SAS HBA" and "24 Port chassis" are very non-specific, so it's difficult to figure out how you have everything connected (does the SAS card have 24 ports? does the chassis have an expander? are the boot drives among the 24 drives, or are they internal or in a misc hotswap bay?).

there is always the chance you have 2 dud drives, but if both drives are doing the same thing, I would suspect the cables/controllers first. I'm going to assume you're using onboard SATA for boot and the Avago for the data drives; within this context:
this is an older board, and has up to 3(?) SATA controllers; enterprise-grade merely means less likely to fail, so it's entirely possible that the controller is bad, or the cables are bad. you'll need to test out alternatives; it looks like there are 2 sets of SATA connectors and 1 SFF-8087 connector on the board, so try switching between them. also, the SATA controller(s) mention RAID; are they in HBA/direct mode?
 

gmckeown

Cadet
Joined
Jan 10, 2020
Messages
4
Thanks so much for the reply.

This is an older system, yes. It is used for secondary backups. I understand enterprise-grade merely means less likely to fail - I mentioned it so it's clear that I am not using desktop-grade components. I tested all the usual suspects (controllers, cables, RAM) and have not been able to find the cause.

Here is a bit more information on the setup.

1. The SSDs are connected to the on-board SATA controller in direct mode. The next thing I will do is connect them to a separate HBA and eliminate the on-board controller from the mix.
2. The 24 platter drives are connected via an Avago 4-port SAS HBA (SAS 2108) and a Supermicro SuperChassis 846 with a 24-port SAS expander.
3. Memory was tested prior to deployment, with no errors.
4. I tested 4 sets of SSDs (all Kingston DC-500R) with the same results, so the chance that I have 8 bad SSDs bought in October 2019 is pretty slim. It's possible, but not likely.

If I log in to the BMC and hard reboot, the system comes back up with no issues for 2-3 days, then the freeze happens again.

Maybe the Kingston drives are not compatible. I have quite a few of them in production on Linux systems with no issues. This is the first FreeNAS system I am deploying, so I am not as familiar with BSD as I am with Linux in terms of hardware issues. I have not tried using USB sticks yet, and I am not a fan of using USB, but I'll try that and see what shakes out.
 

artlessknave

Wizard
Joined
Oct 29, 2016
Messages
1,506
hmm. something that I have noticed with ZFS is that the way it handles disks sometimes catches irregularities that other systems just do not find. i have a few disks that ZFS reports as returning bad data but an hp410 thinks they are just fine.
as long as the drives show up they should be compatible. the drivers really only concern the controller, not the drives themselves, since the interface between controller and drive is standard (SATA/SAS).
 

gmckeown

Cadet
Joined
Jan 10, 2020
Messages
4
I put in a SAS2108 card and connected the drives. No dice. Same issue. I don't have any more time to mess with it, so it's back to Debian and hardware RAID for now. I really wanted to start using ZFS, but it will have to wait. I will do some testing and burn-in on some newer hardware and see where that goes. Thanks again for the assistance.
 

artlessknave

Wizard
Joined
Oct 29, 2016
Messages
1,506
uh. Debian should have ZFS though. ZFS is not FreeNAS-exclusive; FreeNAS is just a storage appliance that primarily uses ZFS.
 

gmckeown

Cadet
Joined
Jan 10, 2020
Messages
4
I understand. I am not a novice by any stretch. I just don't have enough confidence in FreeNAS yet to trust it in full production. I know there are tons of installations out there that work. It's not ZFS that is the issue; it's a FreeNAS compatibility issue with this specific install. I just installed Proxmox with ZFS on the same system and there have been no problems yet. Time will tell.
 