ZFS error - recommendation on moving to FreeNAS 11

Status
Not open for further replies.

usergiven

Dabbler
Joined
Jul 15, 2015
Messages
49
I'm not sure where to begin but I need help if anyone is willing.

Hardware - Supermicro X10SL7 with Xeon E3-1246 v3, 32 GB of Crucial ECC DDR3 RAM, 2x4TB WD Blue (the new Green) in a mirror
Backup box - Supermicro X10SLL with Intel Pentium 3220, 16 GB of Crucial ECC DDR3 RAM, 2x2TB WD Green mirror plus a 2x1TB WD Green mirror in the same pool

History: I ran FN 9.10 and then jumped on the FN 10 wagon when it was released. When all hell broke loose, I decided to roll up my sleeves and try ZFS on Linux, through a couple of buildups and breakdowns over the last couple of months: Ubuntu 16.04 and Proxmox, to be exact.

My goal is to install FN 11, but I'm not sure of the right play here to make sure my data stays intact. I have two backups. One is about 2 weeks old, on an external USB drive, a straight copy of the data (Grsync) over an SMB share. The other is 1 week old, made via zfs send/receive replication to my backup box.

Earlier this week (on Proxmox), I received a SMART alert stating that one of my drives had 2 current pending and 2 offline uncorrectable sectors. I ordered a replacement 4TB WD drive. A couple of failed short/long SMART tests later, I decided to swap the drive out and see if I could RMA it to WD. I did the ZFS replace deal, it resilvered, and everything was OK. This morning, I received a checksum error on the drive that I used as a replacement. I started a pool scrub on both my active box and the backup box. The active box fixed 2.3K checksum errors by the time it finished. The backup box finished with no errors.

At this point I believed I had fixed the data, but now I'm about 1/3 of the way through another scrub and it's coming back with lots more checksum errors. I know I'm technically not using FreeNAS right now, but I want to get there, and to be honest I'm scared for my data.
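
For keeping track of which device is collecting checksum errors from one scrub to the next, something like the following could help. It is only a rough sketch: it assumes the usual NAME / STATE / READ / WRITE / CKSUM column layout of `zpool status`, and the sample text, pool name and device names are made up for illustration. The helper names `parse_count` and `checksum_errors` are hypothetical too.

```python
import re

# Illustrative, simplified sample; pool and device names are invented.
SAMPLE = """\
  pool: tank
 state: ONLINE
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sda     ONLINE       0     0     0
            sdb     ONLINE       0     0  2.3K

errors: No known data errors
"""

# Counters may be abbreviated (e.g. "2.3K"), so expand the suffix.
_SUFFIX = {"K": 10**3, "M": 10**6, "G": 10**9}

# An indented row: name, upper-case state, then three counters.
_ROW = re.compile(r"^\s+(\S+)\s+([A-Z]+)\s+(\S+)\s+(\S+)\s+(\S+)")


def parse_count(text):
    """Convert a counter such as '0', '17' or '2.3K' to an int."""
    if text[-1] in _SUFFIX:
        return round(float(text[:-1]) * _SUFFIX[text[-1]])
    return int(text)


def checksum_errors(status_text):
    """Return {device_name: cksum_count} for rows with a non-zero CKSUM."""
    result = {}
    for line in status_text.splitlines():
        match = _ROW.match(line)
        if not match or match.group(1) == "NAME":
            continue  # not a device row, or the column header
        name, _state, _read, _write, cksum = match.groups()
        try:
            count = parse_count(cksum)
        except ValueError:
            continue  # counter was not a number; skip the row
        if count:
            result[name] = count
    return result


if __name__ == "__main__":
    print(checksum_errors(SAMPLE))
```

In practice the text would come from running `zpool status` on the box and saving the output after each scrub. If the count keeps climbing on the same device after a cable or port change, that points at the drive rather than the cabling.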

Question: how do I (should I even) install FN11 in this situation? I see a few different routes, but I'm not sure which would actually solve the checksum issue. Should I destroy the current pool and recreate it using my backups?
 

usergiven

Dabbler
Joined
Jul 15, 2015
Messages
49
Thanks for the reply! Do you happen to have an idea what I should do to fix the issue? I'm honestly at a loss; even just being able to tell if my backups are safe would be victory enough.
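
One rough way to check a plain file copy (such as an external USB backup) against the live data is to hash every file on both sides and list what differs. The sketch below assumes both trees are mounted locally. The function names and any paths passed in are hypothetical. A mismatch only says the two copies differ (the file may simply have been edited after the backup was taken); it does not say which side is the good one.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_trees(source_root, backup_root):
    """Compare every file under source_root with its copy in backup_root.

    Returns (missing, different): relative paths absent from the backup,
    and relative paths whose contents do not match.
    """
    source_root, backup_root = Path(source_root), Path(backup_root)
    missing, different = [], []
    for src in sorted(source_root.rglob("*")):
        if not src.is_file():
            continue  # skip directories and special files
        rel = src.relative_to(source_root)
        dst = backup_root / rel
        if not dst.is_file():
            missing.append(rel.as_posix())
        elif file_digest(src) != file_digest(dst):
            different.append(rel.as_posix())
    return missing, different
```

Calling `compare_trees("/mnt/pool/data", "/mnt/usb/data")` (both paths are placeholders) returns two lists to go through by hand. This only covers the file-level copy; a replicated ZFS dataset is better checked with a scrub on the receiving pool.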
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Refresh your backup.
 

Cellobita

Contributor
Joined
Jul 15, 2011
Messages
107
Checksum errors are sometimes caused by bad cables.

I've had similar issues over the years, caused by faulty SATA cables or connectors. Another possibility is bad RAM, but since you use ECC memory, this is highly unlikely.
 

m0nkey_

MVP
Joined
Oct 27, 2015
Messages
2,739
Cellobita said: "Another possibility is bad RAM, but since you use ECC memory, this is highly unlikely."
While unlikely, ECC can and does go bad. It may be worth running a memtest for 24 hours to see if RAM might be a factor.
 

usergiven

Dabbler
Joined
Jul 15, 2015
Messages
49
Thanks for all your help! The first thing I tried before ordering a replacement drive was changing SATA cables and relocating the drives to other SATA ports on the motherboard; that gave me the same issues.

Update: The first 4TB drive (the one I will RMA) failed short and long tests even after I zeroed it out using WD's Lifeguard utilities. The second 4TB drive (the original other side of the mirror), the one I replaced it with in the pool and which was giving me checksum errors, passed all SMART tests.

I'm currently running off my backup box, which seems to be working just fine; I can replace a week's worth of files. I will run a memtest tonight and provide an update. I upgraded to FN11 on the backup box and things are smoother than on 9.10!
 

usergiven

Dabbler
Joined
Jul 15, 2015
Messages
49
After 23 hours and change, no memtest errors.
 