Help with infinite resilver

Todd Nine · Apr 18, 2017

Hi all,
I have a drive /dev/ada2 that has 7 bad sectors on it. I've attempted to use the dd + scrub method to overwrite the sectors. In every case dd fails with an I/O error.

I then removed this drive, wiped it, and re-added it to my array. My resilver process seems to be stuck in an infinite loop: it takes about four hours to complete, then just starts over. Aside from the sector errors, SMART doesn't report any other errors on the drive. I'm assuming I need to replace this disk. Is this a correct assumption?

Any help would be greatly appreciated. I'm struggling to determine if my drive is actually faulty or just has a couple of bad sectors.

Thanks,
Todd

Ericloewe · Apr 18, 2017

FreeNAS version? Hardware? Output of zpool status? smartctl -x /dev/adaX for all drives? [CODE][/CODE] tags, please.

Todd Nine · Apr 18, 2017

FreeNAS Version: FreeNAS-9.10.2-U2 (e1497f2)
Hardware:
AMD Athlon(tm) II X2 215 Processor
3809MB RAM
4 x 3TB Western Digital Caviar Green 3 TB SATA III 64 MB Cache

zpool status:

NOTE: I rebooted this morning.

Code:

[root@freenas] ~# zpool status -v
  pool: freenas-boot
state: ONLINE
  scan: scrub repaired 0 in 0h4m with 0 errors on Tue Mar 21 03:49:05 2017
config:

	NAME          STATE     READ WRITE CKSUM
	freenas-boot  ONLINE       0     0     0
	  da0p2       ONLINE       0     0     0

errors: No known data errors

  pool: internal
state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Tue Apr 18 08:32:40 2017
		1.79T scanned out of 2.94T at 193M/s, 1h44m to go
		458G resilvered, 60.83% done
config:

	NAME                                            STATE     READ WRITE CKSUM
	internal                                        ONLINE       0     0     0
	  raidz1-0                                      ONLINE       0     0     0
	    gptid/26f9791f-a046-11e4-bfe9-6805ca2ded8a  ONLINE       0     0     0
	    gptid/27e1a66f-a046-11e4-bfe9-6805ca2ded8a  ONLINE       0     0     0
	    gptid/19ea6310-2388-11e7-9230-6805ca2ded8a  ONLINE       0     0     0  (resilvering)
	    gptid/29c86639-a046-11e4-bfe9-6805ca2ded8a  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

		internal/media:<0x2149>

Attached are the ADA logs. They're pretty large.

I noticed that the log for ada2 reports "Device Error Count: 1011 (device log contains only the most recent 24 errors)".
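
As a sanity check on the scan line in the output above, the "1h44m to go" figure can be reproduced from the other numbers. This is only a minimal sketch: it assumes zpool's T and M suffixes are binary units (powers of 1024), and the function name `resilver_eta` is made up for illustration. It estimates a single pass and says nothing about why the resilver restarts.

```python
def resilver_eta(scanned_tib: float, total_tib: float, rate_mib_s: float) -> str:
    """Estimate time left in one resilver pass, formatted like zpool's 'XhYYm'.

    Assumes T means TiB and M means MiB (binary units).
    """
    remaining_mib = (total_tib - scanned_tib) * 1024 * 1024
    minutes = int(remaining_mib / rate_mib_s // 60)
    return f"{minutes // 60}h{minutes % 60:02d}m"


# Values from the post: 1.79T scanned out of 2.94T at 193M/s
print(resilver_eta(1.79, 2.94, 193))  # prints 1h44m
```

The result matches what zpool printed, so the reported rate and estimate are at least internally consistent for this pass.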

Ericloewe · Apr 18, 2017

Todd Nine said:
3809MB RAM

That's less than half the minimum amount of RAM.

Todd Nine · Apr 18, 2017

I'm only using it as a time machine backup for 3 laptops, so it sees very low traffic. Does it really take 8 GB for such little use?

Low RAM doesn't seem to explain the issue with /dev/ada2, given that none of my other devices are experiencing failures.

I understand your point that low RAM would greatly affect performance and throughput, but is that actually the cause of my issue? It seems unlikely to me. Wouldn't I simply take a performance hit and use more swap?

indy · Apr 18, 2017

Low RAM affects more than performance; it can lead to any kind of erratic behavior.
Just as an AMD CPU can.
Plus, RAIDZ1 has a good chance to poop itself if it encounters an unrecoverable error during a resilver. ZFS depends on redundancy to fall back on during any operation.
Since you did not mention it, I am guessing you are not using ECC RAM?

Your setup strays very far from any recommendation, which means errors are more likely to occur and are harder to diagnose.
I would consider destroying the pool.
If you are lucky you can get a backup out of the pool if needed.

Ericloewe · Apr 18, 2017

Todd Nine said:
Does it really take 8 GB for such little use?

They're called minimum requirements. Believe me, I'd like them to be lower, but they are what they are.

Todd Nine said:
is that actually the cause of my issue?

ZFS going haywire is a quintessential low RAM problem.

Todd Nine · Apr 18, 2017

Thanks all. Sounds like it's time for me to get a new chassis, board, and RAM!

Vito Reiter · Apr 18, 2017

Well, on top of the recommended 8GB of RAM, you're supposed to have an extra 1GB for every TB of storage you add. Given your processor, you probably don't have ECC RAM, and a RAM error could also be causing a bunch of the issues you're experiencing. I mean, not trying to be rude, but you're asking 16 cars to fit side by side on a 4-lane road, if you can understand that. Also, you're trying to replace a drive that's spitting out errors with itself. I don't know, to me none of this makes sense, because if I saw a drive spewing errors I would replace it with a known-good or new drive. Really, if you look at most of the best-practices threads, you'll see that if you don't meet the minimum requirements, no one is going to be able to help you.

With ZFS there are a lot of quirks that seem to make almost no sense, like the 8GB minimum and the 11-disk maximum per vdev, but those limitations are there for a reason. Hope you can figure it out, man. 8GB of RAM isn't too expensive, and will save you a huge headache.
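
To put numbers on the rule of thumb mentioned in this post (8GB base plus 1GB per TB of storage), here is a rough sketch applied to the system described earlier in the thread. It assumes the per-TB part is counted against raw disk capacity, which the post does not specify, and `suggested_ram_gb` is a made-up helper name. Treat the result as a guideline figure, not a hard limit.

```python
def suggested_ram_gb(disk_count: int, disk_tb: float,
                     base_gb: float = 8.0, gb_per_tb: float = 1.0) -> float:
    """Rule of thumb from the thread: base RAM plus extra RAM per TB of storage.

    Assumption: storage is counted as raw capacity (disk_count * disk_tb).
    """
    return base_gb + gb_per_tb * disk_count * disk_tb


installed_gb = 3809 / 1024           # 3809MB, as reported in the thread
target_gb = suggested_ram_gb(4, 3)   # 4 x 3TB drives

print(f"installed: {installed_gb:.1f} GB, rule of thumb: {target_gb:.0f} GB")
```

By this reading the box has under 4 GB against a suggested 20 GB, and it is also below the 8GB base on its own.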

rs225 · Apr 18, 2017

Load Cycle Count on ada2 looks to be over 1M. WDIDLE3 should be applied.

I would guess that if the zpool status numbers don't show an increase in drive errors for ada2, then the resilver is restarting due to the metadata corruption. If no resilver can complete, then the next step would be to back up and rebuild the pool.
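
One way to check whether the zpool status numbers are increasing is to save the output at two points in time and compare the READ/WRITE/CKSUM columns per device. The following is a minimal sketch, not an official tool: the function names are hypothetical, it assumes the counters are printed as plain integers, and the second snapshot below is invented purely to show the comparison.

```python
STATES = {"ONLINE", "DEGRADED", "FAULTED", "OFFLINE", "UNAVAIL", "REMOVED"}


def parse_error_counters(status_text):
    """Map device name -> (read, write, cksum) from zpool status config rows."""
    counters = {}
    for line in status_text.splitlines():
        parts = line.split()
        # A config row looks like: NAME STATE READ WRITE CKSUM [note]
        if (len(parts) >= 5 and parts[1] in STATES
                and all(p.isdigit() for p in parts[2:5])):
            counters[parts[0]] = tuple(int(p) for p in parts[2:5])
    return counters


def devices_with_new_errors(before, after):
    """Return devices whose read, write, or checksum count went up."""
    return [name for name, counts in after.items()
            if any(a > b for a, b in zip(counts, before.get(name, (0, 0, 0))))]


# Two config rows taken from the zpool status output posted in this thread.
first = """
NAME                                        STATE   READ WRITE CKSUM
gptid/19ea6310-2388-11e7-9230-6805ca2ded8a  ONLINE  0    0     0  (resilvering)
gptid/29c86639-a046-11e4-bfe9-6805ca2ded8a  ONLINE  0    0     0
"""

# Hypothetical later snapshot: the READ count of 3 is invented for illustration.
later = """
NAME                                        STATE   READ WRITE CKSUM
gptid/19ea6310-2388-11e7-9230-6805ca2ded8a  ONLINE  3    0     0  (resilvering)
gptid/29c86639-a046-11e4-bfe9-6805ca2ded8a  ONLINE  0    0     0
"""

print(devices_with_new_errors(parse_error_counters(first),
                              parse_error_counters(later)))
```

If the list stays empty across resilver passes, that would be consistent with the guess above that the restarts come from the metadata corruption rather than from new drive errors.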

Todd Nine · Apr 18, 2017

A follow-up: I'm looking at getting a refurbished model to stay on a lower budget, specifically this system from Newegg:

https://www.supermicro.com/products/system/1u/6016/sys-6016t-ntf.cfm

It seems to use primarily Intel components. Do you guys see anything in that hardware list that would not be compatible with FreeNAS?

Ericloewe · Apr 18, 2017

It should be okay.
