Help with infinite resilver

Status
Not open for further replies.

Todd Nine

Dabbler
Joined
Nov 16, 2013
Messages
37
Hi all,
I have a drive /dev/ada2 that has 7 bad sectors on it. I've attempted to use the dd + scrub method to overwrite the sectors. In every case dd fails with an I/O error.

I then removed this drive, wiped it, and re-added it to my array. My resilver process seems to be stuck in an infinite loop. It takes about four hours to complete, then just starts over. Aside from the sector errors, SMART doesn't report any other errors on the drive. I'm assuming I need to replace this disk. Is this a correct assumption?

Any help would be greatly appreciated. I'm struggling to determine if my drive is actually faulty or just has a couple of bad sectors.

Thanks,
Todd
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
FreeNAS version? Hardware? Output of zpool status? smartctl -x /dev/adaX for all drives? [CODE][/CODE] tags, please.
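
For anyone gathering the same information, a minimal sketch of collecting the per-drive SMART reports could look like the following. The function name `collect_smart` and the `SMARTCTL` and `OUTDIR` variables are made up for illustration and are not part of FreeNAS; the device names in the example are placeholders, so substitute your own.

```shell
# Sketch only: collect_smart, SMARTCTL and OUTDIR are illustrative names.
SMARTCTL="${SMARTCTL:-smartctl}"   # override for a dry run, e.g. SMARTCTL=echo
OUTDIR="${OUTDIR:-.}"              # directory the per-disk reports are written to

# Write the extended SMART report of each named disk to OUTDIR/<disk>.txt.
collect_smart() {
  for disk in "$@"; do
    # smartctl can exit non-zero when it flags a problem, so keep looping.
    "$SMARTCTL" -x "/dev/$disk" > "$OUTDIR/$disk.txt" 2>&1 || true
  done
}

# Example (placeholder device names):
#   collect_smart ada1 ada2 ada3
```

Writing one file per drive keeps the long reports out of the post body, so they can be attached or pasted inside [CODE][/CODE] tags.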
 

Todd Nine

Dabbler
Joined
Nov 16, 2013
Messages
37
FreeNAS Version: FreeNAS-9.10.2-U2 (e1497f2)
Hardware:
AMD Athlon(tm) II X2 215 Processor
3809MB RAM
4 x 3TB Western Digital Caviar Green 3 TB SATA III 64 MB Cache

zpool status:

NOTE: I rebooted this morning.

Code:
[root@freenas] ~# zpool status -v
  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0h4m with 0 errors on Tue Mar 21 03:49:05 2017
config:

	NAME          STATE     READ WRITE CKSUM
	freenas-boot  ONLINE       0     0     0
	  da0p2       ONLINE       0     0     0

errors: No known data errors

  pool: internal
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Tue Apr 18 08:32:40 2017
	1.79T scanned out of 2.94T at 193M/s, 1h44m to go
	458G resilvered, 60.83% done
config:

	NAME                                            STATE     READ WRITE CKSUM
	internal                                        ONLINE       0     0     0
	  raidz1-0                                      ONLINE       0     0     0
	    gptid/26f9791f-a046-11e4-bfe9-6805ca2ded8a  ONLINE       0     0     0
	    gptid/27e1a66f-a046-11e4-bfe9-6805ca2ded8a  ONLINE       0     0     0
	    gptid/19ea6310-2388-11e7-9230-6805ca2ded8a  ONLINE       0     0     0  (resilvering)
	    gptid/29c86639-a046-11e4-bfe9-6805ca2ded8a  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

	internal/media:<0x2149>



Attached are the ADA logs. They're pretty large.

I noticed that ada2 reports "Device Error Count: 1011 (device log contains only the most recent 24 errors)".
 

Attachments

  • ada1.txt
    7.7 KB · Views: 205
  • ada2.txt
    17.8 KB · Views: 223
  • ada3.txt
    10.4 KB · Views: 180
  • ada4.txt
    277 bytes · Views: 242

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194

Todd Nine

Dabbler
Joined
Nov 16, 2013
Messages
37
I'm only using it as a Time Machine backup for 3 laptops, so it sees very low traffic. Does it really take 8 GB for so little use?

Low RAM doesn't seem to explain the issue with /dev/ada2, given that none of my other devices are experiencing failures.

I understand your point that low RAM would greatly affect performance and throughput, but is that actually the cause of my issue? It seems unlikely to me. Wouldn't I simply take a performance hit and use more swap?
 

indy

Patron
Joined
Dec 28, 2013
Messages
287
Low RAM affects more than performance; it can lead to all kinds of erratic behavior.
Just as an AMD CPU can.
Plus, RAID-Z1 has a good chance to poop itself if it encounters an unrecoverable error during a resilver. ZFS depends on redundancy to fall back on during any operation.
Since you did not mention it, I am guessing you are not using ECC RAM?

Your setup strays very far from any recommendation, which means errors are more likely to occur and are harder to diagnose.
I would consider destroying the pool.
If you are lucky you can get a backup out of the pool if needed.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194

Todd Nine

Dabbler
Joined
Nov 16, 2013
Messages
37
Thanks all. Sounds like it's time for me to get a new chassis, board, and RAM!
 

Vito Reiter

Wise in the Ways of Science
Joined
Jan 18, 2017
Messages
232
Well, on top of the recommended 8GB of RAM, you're supposed to have an extra 1GB for every TB of storage you add. Given your processor, you probably don't have ECC RAM, and a RAM error could also be causing a bunch of the issues you're experiencing. I mean, not trying to be rude, but you're asking 16 cars to fit side by side on a 4-lane road, if you can understand that. Also, you're trying to replace a drive that's spitting out errors with itself. I don't know, to me none of this makes sense, because if I saw a drive spewing errors I would replace it with a known-good or new drive. Really, if you look at most of the best-practices threads, you'll see that if you don't meet the minimum requirements, no one is going to be able to help you.
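
As a rough illustration of that rule of thumb applied to the hardware in this thread, here is a small sketch. It assumes the rule is counted against raw capacity (4 x 3 TB) rather than usable space, and `suggested_ram_gb` is a made-up helper name.

```shell
# Sketch of the rule of thumb quoted above: 8 GB base plus 1 GB per TB of storage.
# suggested_ram_gb is a hypothetical helper; raw (not usable) capacity is assumed.
suggested_ram_gb() {
  disks=$1      # number of disks in the pool
  tb_each=$2    # size of each disk in TB
  echo $(( 8 + disks * tb_each ))
}

suggested_ram_gb 4 3   # prints 20
```

By that count the box would want about 20 GB, against the 3809MB reported earlier in the thread.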

With ZFS there are a lot of quirks that make almost no sense, like the 8GB minimum and the 11-disk maximum for vdevs, but those limitations are there for a reason. Hope you can figure it out, man. 8GB of RAM isn't too expensive and will save you a huge headache.
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
Load Cycle Count on ada2 looks like 1M+. WDIDLE3 should be applied.

I would guess that if the zpool status numbers don't show an increase in drive errors for ada2, then the resilver is restarting due to the metadata corruption. If no resilver can complete, then the next step would be to back up and rebuild the pool.
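
One way to check whether those numbers are moving is to snapshot them before and after a resilver pass. A minimal sketch, assuming the pool members show up with gptid/ labels as in the zpool status output posted earlier; `vdev_errors` is a made-up helper name, and on FreeBSD `glabel status` can be used to map a gptid back to its adaX device.

```shell
# Hypothetical helper: read `zpool status` text on stdin and print
# "device READ WRITE CKSUM" for each gptid-labelled pool member.
vdev_errors() {
  awk '$1 ~ /^gptid\// { print $1, $3, $4, $5 }'
}

# Usage (pool name taken from this thread):
#   zpool status internal | vdev_errors > before.txt
#   ...wait for the resilver to restart...
#   zpool status internal | vdev_errors > after.txt
#   diff before.txt after.txt   # no output means the counters did not change
```

If the counters for the resilvering member stay at zero across passes, that points away from fresh drive errors as the trigger.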
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
It should be okay.
 