Dying Drive(s) unsure what action to take

Status
Not open for further replies.

DeTrimmy

Dabbler
Joined
Jan 23, 2016
Messages
10
Hello all,

System Specs:
Base - Dell T20
Processor - Xeon E3-1225 v3
RAM - 12 GB ECC
HDD - 4 x 2 TB WD Red
RAIDZ2 arrangement

I received an email from my FreeNAS box today:
"CRITICAL: Oct. 3, 2018, 5:33 p.m. - The volume home_pool state is DEGRADED: One or more devices has experienced an error resulting in data corruption. Applications may be affected"

Does this mean I have two dying drives, since I'm running RAIDZ2? Most of my important data is backed up to the cloud and DVD.

I ran zpool status (it doesn't look good):

Code:
root@freenas:~ # zpool status
  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:00:31 with 0 errors on Mon Sep 10 03:45:32 2018
config:

		NAME										  STATE	 READ WRITE CKSUM
		freenas-boot								  ONLINE	   0	 0	 0
		  gptid/c0d84f35-d767-11e5-926c-989096b05488  ONLINE	   0	 0	 0

errors: No known data errors

  pool: home_pool
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
		corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
		entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 0 days 00:10:20 with 1 errors on Wed Oct  3 17:38:37 2018
config:

		NAME											STATE	 READ WRITE CKSUM
		home_pool									   DEGRADED	 0	 0   120
		  raidz2-0									  DEGRADED	 0	 0   240
			gptid/b2d81d5b-c6ff-11e8-a617-989096b05488  DEGRADED	 0	 0	 1  too many errors
			gptid/b3e04b4f-c6ff-11e8-a617-989096b05488  DEGRADED	 0	 0	 0  too many errors
			gptid/b4ea68e2-c6ff-11e8-a617-989096b05488  DEGRADED	 0	 0	 0  too many errors
			gptid/b5f5da06-c6ff-11e8-a617-989096b05488  DEGRADED	 0	 0	 1  too many errors

errors: 1 data errors, use '-v' for a list
root@freenas:~ #



Could I have some advice on what to do next, besides throwing the whole thing out of a very high window? :)

Thanks in advance
 

DeTrimmy

Dabbler
Joined
Jan 23, 2016
Messages
10
I ran MemTest and have isolated the faulty stick of RAM.

My question now is how has this faulty stick affected my dataset?
 

dlavigne

Guest
Where are you at with this? As in did you isolate a disk issue, still have zpool errors, or recreate the pool?
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
DeTrimmy said:
My question now is how has this faulty stick affected my dataset?

It appears that it has resulted in at least one data error. I'd suggest removing the bad stick, booting the system, and running a scrub. Once that scrub finishes, you'll have a better idea of the impact (if any) to your pool.
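To compare the pool before and after the scrub (started with `zpool scrub home_pool` and checked with `zpool status -v home_pool`), a small script can tally the counters. The following is only a minimal Python sketch, not a FreeNAS tool: `summarize`, `ROW`, and `SAMPLE` are hypothetical names, the sample text is trimmed from the output posted above, and it assumes the counters print as plain integers (zpool may abbreviate large counts, which this would miss).

```python
import re

# Sample trimmed from the `zpool status` output posted earlier in this thread
# (whitespace simplified; this is an illustration, not live output).
SAMPLE = """\
  pool: home_pool
 state: DEGRADED
  scan: scrub repaired 0 in 0 days 00:10:20 with 1 errors on Wed Oct  3 17:38:37 2018
config:

    NAME                                            STATE     READ WRITE CKSUM
    home_pool                                       DEGRADED     0     0   120
      raidz2-0                                      DEGRADED     0     0   240
        gptid/b2d81d5b-c6ff-11e8-a617-989096b05488  DEGRADED     0     0     1  too many errors
        gptid/b3e04b4f-c6ff-11e8-a617-989096b05488  DEGRADED     0     0     0  too many errors
        gptid/b4ea68e2-c6ff-11e8-a617-989096b05488  DEGRADED     0     0     0  too many errors
        gptid/b5f5da06-c6ff-11e8-a617-989096b05488  DEGRADED     0     0     1  too many errors

errors: 1 data errors, use '-v' for a list
"""

# One config row: name, state, then the READ / WRITE / CKSUM counters.
# Assumption: counters are plain integers.
ROW = re.compile(r"^\s*(\S+)\s+([A-Z]+)\s+(\d+)\s+(\d+)\s+(\d+)")
ERRORS = re.compile(r"^errors:\s+(\d+)\s+data errors")


def summarize(status_text):
    """Hypothetical helper: return (per-device counters, data-error count)."""
    devices = {}
    data_errors = 0  # stays 0 for "errors: No known data errors"
    for line in status_text.splitlines():
        row = ROW.match(line)
        if row:
            name, state, read, write, cksum = row.groups()
            devices[name] = {
                "state": state,
                "read": int(read),
                "write": int(write),
                "cksum": int(cksum),
            }
            continue
        err = ERRORS.match(line)
        if err:
            data_errors = int(err.group(1))
    return devices, data_errors


devices, data_errors = summarize(SAMPLE)
for name, counters in devices.items():
    print(name, counters["state"], "cksum =", counters["cksum"])
print("data errors:", data_errors)
```

Running it on the output saved before and after the scrub makes it easy to see whether the counters are still climbing. Checksum errors that show up on every disk in the vdev at once, as here, often point to something the disks share (RAM, controller, cabling) rather than four drives failing together.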
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
DeTrimmy said:
I ran MemTest and have isolated the faulty stick of RAM.

My question now is how has this faulty stick affected my dataset?

Why didn't your system halt when an ECC error was triggered? Can you see any errors logged in the system monitor?

Definitely re-scrub... your issues may go away.
 

toadman

Guru
Joined
Jun 4, 2013
Messages
619
Stux said:
Why didn't your system halt when an ECC error was triggered?

Great question!

It would have had to be a double-bit error to halt, right? A single-bit error would have been corrected and the system would keep going (though it would probably report an error), no? I'm not sure how FreeBSD actually handles a machine check exception (MCE).
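The single-versus-double distinction can be illustrated with a toy SECDED (single error correct, double error detect) code. This is a minimal Python sketch over 4 data bits using an extended Hamming code; ECC DIMMs typically use a much wider code (64 data bits plus 8 check bits), and what the OS does on an uncorrectable error is platform-specific, so nothing here describes FreeBSD's actual behaviour. `encode` and `decode` are hypothetical names.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit extended Hamming (SECDED) codeword.

    Index 0 holds the overall parity bit; indexes 1..7 form Hamming(7,4)
    with check bits at positions 1, 2 and 4.
    """
    data = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = data
    code[1] = code[3] ^ code[5] ^ code[7]   # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]   # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]   # covers positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2             # makes the total parity even
    return code


def decode(code):
    """Return (status, nibble); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos                 # XOR of the positions of set bits
    parity_odd = sum(code) % 2 == 1

    if syndrome == 0 and not parity_odd:
        status = "ok"
    elif parity_odd:
        # An odd number of flips: treat as a single-bit error and repair it.
        # A syndrome of 0 here means the overall parity bit (index 0) flipped.
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even parity: two bits flipped, cannot repair.
        return "uncorrectable", None

    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble


word = encode(0b1011)

one_flip = list(word)
one_flip[5] ^= 1
print(decode(one_flip))    # single-bit error: data is recovered

two_flips = list(word)
two_flips[5] ^= 1
two_flips[6] ^= 1
print(decode(two_flips))   # double-bit error: detected but not recoverable
```

In this model a single flipped bit is repaired silently (and is the kind of event that would be logged), while two flipped bits can only be flagged, which is the case where a halt would be expected.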
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
A single corrected error wouldn't cause a problem, but it should be logged. Have you checked your IPMI?
 

DeTrimmy

Dabbler
Joined
Jan 23, 2016
Messages
10
Sorry for the delayed response; I have been out of the country. After some further investigation, it turned out the RAM needed to be reseated. I have no idea how it came loose. Anyhow, I'm surprised that, with ECC memory, the system didn't halt when the issues occurred.

Long story short: no data loss, the memory is reseated, and everything appears to be fine.

Thanks for all your help.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
How did you get to a memory error when the original post was about the disks?

 