Odd behaviour - troubleshooting hardware issues

ZodiacUHD

Patron
Joined
Aug 28, 2015
Messages
226
Hello,

For the past couple of weeks I've been noticing erratic behaviour from my TrueNAS machine. I'll list the symptoms in order of appearance; hopefully some of you can point me in the right direction.

  1. One of my drives fails. I wasn't particularly surprised by this: the drive was 6 years old and I'd say it was about time. The only thing I found interesting was that if I rebooted the system, the drive was reported as working properly, at least for a few hours or days, before being marked as failing again.
  2. After I replaced that drive, another one got marked as failing and had to be swapped. Again I thought it might just be my luck. The drive was as old as the first one, so I simply went ahead and swapped it. Unfortunately, only a few days after the resilver finished, the replacement was marked as failing as well. That's when I started scratching my head. I decided to swap the positions of 2 drives inside my system to see if it would make any difference (and it did).
  3. No matter which drive I connected to that port, it would eventually be marked as failing. Since I had spare SATA ports, I moved the drive to a different one and everything seemed to be working OK. I started running scrubs and long SMART tests to see if anything popped up.
  4. After a SMART test, the pool was marked "unhealthy". I ran more SMART tests and scrubs; still unhealthy.
  5. I decided to reboot the machine and it hung on boot while reading from the USB drive that holds my OS. After a hard reset it booted just fine.
  6. This takes us to the present day: the system is running fine, but I have 4 SMART tests that I started 2 days ago stuck at 99%, and this message:
Code:
Jun 12 05:58:01 freenas APEI Corrected Memory Error:
Jun 12 05:58:01 freenas Node: 0
Jun 12 05:58:01 freenas Device: 1
Jun 12 05:58:01 freenas Memory Error Type: 2
Jun 12 05:58:01 freenas Flags: 0x1
Jun 12 05:58:01 freenas FRU Text: CorrectedErr
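
To see whether corrected errors like the one above keep hitting the same module, the log lines can be grouped into events and tallied per Node/Device pair. The following is only a minimal sketch: it assumes every event starts with an "APEI Corrected Memory Error" line followed by "Key: value" lines in the syslog layout shown above, and the function names are made up for illustration. Which physical DIMM a given Node/Device pair corresponds to depends on the board, and the script cannot tell you that.

```python
import re
from collections import Counter

# Sample input: the exact lines posted above.
LOG = """\
Jun 12 05:58:01 freenas APEI Corrected Memory Error:
Jun 12 05:58:01 freenas Node: 0
Jun 12 05:58:01 freenas Device: 1
Jun 12 05:58:01 freenas Memory Error Type: 2
Jun 12 05:58:01 freenas Flags: 0x1
Jun 12 05:58:01 freenas FRU Text: CorrectedErr
"""

# Assumed layout: "<Mon> <day> <hh:mm:ss> <host> <message>"
LINE = re.compile(r"^(\w{3}\s+\d+\s+[\d:]+)\s+(\S+)\s+(.*)$")


def parse_apei_events(text):
    """Group log lines into one dict per corrected memory error."""
    events = []
    current = None
    for raw in text.splitlines():
        match = LINE.match(raw)
        if not match:
            continue
        stamp, _host, message = match.groups()
        if message.startswith("APEI Corrected Memory Error"):
            # A new event begins; later "Key: value" lines belong to it.
            current = {"time": stamp}
            events.append(current)
        elif current is not None and ":" in message:
            key, _, value = message.partition(":")
            current[key.strip()] = value.strip()
    return events


def count_by_device(events):
    """Count events per (Node, Device) pair."""
    return Counter((e.get("Node"), e.get("Device")) for e in events)


events = parse_apei_events(LOG)
counts = count_by_device(events)
for (node, device), n in counts.items():
    print(f"node {node}, device {device}: {n} corrected error(s)")
```

If the same pair accumulates errors over days of logs, that module (or its slot) is the first thing to test; a single isolated event says much less.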

Screenshot (15).png

My system specs are as follows:
-E3C224D4I-14S
-32 GB ECC DDR3 Kingston memory
-4x3TB WD RED+2x4TB WD RED (1 pool)
-Intel(R) Xeon(R) CPU E3-1231 v3

Let me know if you have any idea what is happening, and whether I can upload anything else to help.

Cheers,

Mike
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
What are the exact models of your Reds? If they’re WD40EFAX, then you have SMR drives, which can explain the behavior you’re seeing.

 

Alecmascot

Guru
Joined
Mar 18, 2014
Messages
1,177

ZodiacUHD

Patron
Joined
Aug 28, 2015
Messages
226
The 4 TB drives are: WD40EFZX
The 3TB drives are: WD30EFRX

This looks like a RAM error

That is also what I initially thought after some googling. I'm not sure how to approach it...
 

Alecmascot

Guru
Joined
Mar 18, 2014
Messages
1,177
The 4 TB drives are: WD40EFZX
The 3TB drives are: WD30EFRX
The drives are CMR then.
If you are using a thumb drive for booting and it has hung on boot, then I should replace it and reload your saved config and see what happens.
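
The SMR/CMR check discussed above boils down to a model lookup. Here is a minimal sketch of it. The table contains only the three models named in this thread and is not a complete list of WD Reds; the helper names, and the assumption that a reported model string may carry a vendor prefix and a dash-separated suffix, are illustrative only.

```python
# Recording technology for the models mentioned in this thread only.
RECORDING = {
    "WD40EFAX": "SMR",
    "WD40EFZX": "CMR",
    "WD30EFRX": "CMR",
}


def base_model(model):
    """Reduce a reported model string to its base model number.

    Assumes strings shaped like "WDC WD40EFZX-XXXXXXX" (the suffix here is
    a placeholder, not a real one): drop the vendor prefix, then the suffix.
    """
    parts = model.upper().split()
    if not parts:
        return ""
    return parts[-1].split("-")[0]


def recording_type(model):
    """Return "SMR", "CMR", or "unknown" for models not in the table."""
    return RECORDING.get(base_model(model), "unknown")


for name in ("WD40EFZX", "WD30EFRX", "WD40EFAX"):
    print(name, recording_type(name))
```

Anything that comes back as "unknown" needs to be checked against the manufacturer's own documentation rather than guessed.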
 

ZodiacUHD

Patron
Joined
Aug 28, 2015
Messages
226
The drives are CMR then.
If you are using a thumb drive for booting and it has hung on boot, then I should replace it and reload your saved config and see what happens.

I will try that for sure, I've already ordered a new one.
 

ZodiacUHD

Patron
Joined
Aug 28, 2015
Messages
226
New alert:

* Pool Media state is DEGRADED: One or more devices is currently being resilvered. The pool will continue to function, possibly in a degraded state.
The following devices are not healthy:
• Disk 16837724245428996327 is DEGRADED
• Disk 8865920302035276106 is DEGRADED
• Disk 2439674705461220852 is DEGRADED
• Disk 10007795424645275763 is DEGRADED

OK, I just got this email, which leads me to think my SAS controller is a likely suspect in all of this mess. Is there a recommended card I could use instead? I'm afraid my motherboard is out of warranty...
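
The reasoning behind suspecting a shared component can be made explicit by pulling the disk IDs out of the alert and comparing them with the pool size. This is a rough sketch only: the regular expression assumes the alert wording shown above, the pool size of 6 comes from the specs in the first post (4x 3TB + 2x 4TB), and the "more than half" threshold is an arbitrary assumption, not a rule.

```python
import re

# The alert text as received by email.
ALERT = (
    "Pool Media state is DEGRADED: One or more devices is currently being "
    "resilvered. The pool will continue to function, possibly in a degraded "
    "state. The following devices are not healthy: "
    "• Disk 16837724245428996327 is DEGRADED "
    "• Disk 8865920302035276106 is DEGRADED "
    "• Disk 2439674705461220852 is DEGRADED "
    "• Disk 10007795424645275763 is DEGRADED"
)

# Assumed wording: "Disk <numeric id> is <STATE>"
DISK_RE = re.compile(r"Disk (\d+) is ([A-Z]+)")

POOL_SIZE = 6  # 4x 3TB + 2x 4TB, from the specs in the first post


def unhealthy_disks(alert_text):
    """Return a list of (disk_id, state) pairs found in the alert."""
    return DISK_RE.findall(alert_text)


disks = unhealthy_disks(ALERT)
share = len(disks) / POOL_SIZE
print(f"{len(disks)} of {POOL_SIZE} disks flagged ({share:.0%})")

# Arbitrary threshold (assumption): when most of the pool is flagged at the
# same time, a shared part such as the controller, cabling, power, or memory
# is a more plausible culprit than that many drives failing independently.
if share > 0.5:
    print("Look at shared hardware before replacing more drives.")
```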
 

Redcoat

MVP
Joined
Feb 18, 2014
Messages
2,925

It looks like a good choice. I don't know anything about the seller, of course, but that is one of the commonly used devices that many of us on the forums here buy used and install with good results.
 