Confused, freenas slowly falling apart?

hme

Cadet
Joined
Aug 9, 2012
Messages
8
Hi,

I'm running a fairly new FreeNAS server and having a ton of problems with it.
It's a fairly new Supermicro server with 10 Seagate SAS drives: an 8-drive RAIDZ2 plus 2 spares. It's been running OK for a couple of months.

Last week it kept crashing: no web interface, no SSH, no NFS. There were messages about not being able to read SMART values from /dev/da2.
I forced a power cycle, then noticed all kinds of SCSI bus errors on the console. I couldn't believe this was caused by one faulty drive, so I replaced the HBA (Avago MPT SAS3).

That didn't help, so I powered down again and pulled the da2 drive. That helped: no more error messages on startup. Soon I got the resilvering message, so all good, I thought.

Just now I got the message "state is DEGRADED: One or more devices could not be opened. Sufficient replicas exist for the pool to continue functioning in a degraded state."
When looking at the pool I see one of the spares is UNAVAIL.

[Attached screenshot of the pool status: 1603403873227.png]


My second spare is still there. But what is the best approach? Offline and replace da7, but where? In the spares section at the bottom, or in the middle?
And shouldn't it just resilver with the other spare?

Hope someone can help me, thanks.
A novice FreeNAS/TrueNAS user
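
For anyone landing here with a similar DEGRADED pool: before deciding what to offline or replace, it helps to list exactly which rows in the `zpool status` config section are not healthy. Below is a minimal Python sketch of that idea. It only parses text, it does not run or change anything on the pool. The sample text is invented and simplified for illustration (it is not the output from the system in this thread), and the function name `find_problem_devices` and the pool name `tank` are hypothetical.

```python
# States that normally need no attention. AVAIL and INUSE are the states
# reported for hot spares in the "spares" section.
HEALTHY_STATES = {"ONLINE", "AVAIL", "INUSE"}

# States that can appear in the STATE column of `zpool status`.
KNOWN_STATES = HEALTHY_STATES | {
    "DEGRADED", "FAULTED", "OFFLINE", "UNAVAIL", "REMOVED",
}

# Invented, simplified sample for illustration only.
SAMPLE_STATUS = """\
  pool: tank
 state: DEGRADED
config:

        NAME        STATE     READ WRITE CKSUM
        tank        DEGRADED     0     0     0
          raidz2-0  DEGRADED     0     0     0
            da0     ONLINE       0     0     0
            da1     ONLINE       0     0     0
            da3     UNAVAIL      0     0     0
        spares
          da8       INUSE     currently in use
          da9       AVAIL

errors: No known data errors
"""


def find_problem_devices(status_text):
    """Return (name, state) pairs for config rows that are not healthy."""
    problems = []
    in_config = False
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("config:"):
            in_config = True
            continue
        if stripped.startswith("errors:"):
            in_config = False
        if not in_config or not stripped:
            continue
        fields = stripped.split()
        # Skip the header row and section labels such as "spares".
        if len(fields) < 2 or fields[1] not in KNOWN_STATES:
            continue
        if fields[1] not in HEALTHY_STATES:
            problems.append((fields[0], fields[1]))
    return problems


if __name__ == "__main__":
    for name, state in find_problem_devices(SAMPLE_STATUS):
        print(f"{name}: {state}")
```

On a real system you would feed it the text from `zpool status` for your pool instead of the sample. The pool and vdev rows show up as DEGRADED alongside the actual failed member, which is expected: the leaf device is the one to look at.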
 

Redcoat

MVP
Joined
Feb 18, 2014
Messages
2,925
Welcome to the forums (first post in 8 years - you had a good run so far...).

It'll help you get help if you post the information requested in "Forum Rules" on the masthead. Right now there's not much to go on. Please include a description of the pool structure.

Did you burn in this "fairly new" system?
 

hme

Cadet
Joined
Aug 9, 2012
Messages
8
Yes, it has been running for a few months without problems. I'm also running an identical system at another site without problems. I think you get a good idea of the pool structure from the screenshot. Nothing special really: a RAIDZ2 of 8 SAS drives with 2 spares, and a whole lot of memory (128 GB). I've always used RAID cards, so I'm new to ZFS management, and right now I'm puzzled about what to do next. Clearly the system is waiting for some action, but I'm not really sure what.
 

Redcoat

MVP
Joined
Feb 18, 2014
Messages
2,925
Hardware description, please.
 

hme

Cadet
Joined
Aug 9, 2012
Messages
8
Here goes:

10BAY E5 X10 Intel 6Core E5-2603V4 1.7GHz 85W CPU
8 x Certified 16GB DDR4 2666MHz ECC REG
Redundant 700W Platinum Level Power Supply
10 x 2.5" Seagate SAS 1800GB 10K ST1800MM0129
Intel i350-T2 Dual 1GbE adapter
Supermicro 32GB SATA DOM
Supermicro AOC-S3216L-L16IT controller
 

hme

Cadet
Joined
Aug 9, 2012
Messages
8
For the record. This turned out to be a system board problem. After replacing that everything went back to normal.
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
hme said: "For the record. This turned out to be a system board problem. After replacing that everything went back to normal."

Sounds like a nasty problem... Ideally these types of issues could be detected, but we would first need to diagnose what actually went wrong and which symptoms could reliably be used to detect them.
 

hme

Cadet
Joined
Aug 9, 2012
Messages
8
Very nasty indeed. I understand that could be valuable info, but even my hardware supplier couldn't find anything in the IPMI logs etc., apart from one or two watchdog occurrences. We've been using Supermicro for over 17 years and never saw this before. But here we are, running for over a week without even one tiny extra line on the console.
 