ZFS status is UNKNOWN after upgrade to v28

Status
Not open for further replies.

sampledi

Dabbler
Joined
Nov 23, 2012
Messages
13
Hi, I wanted to post this thread earlier but I was away.
System:
FreeNAS-8.3.0-RELEASE-p1-x64 (r12825)
Intel(R) Core(TM)2 Duo CPU E6750 @ 2.66GHz
8GB RAM
4 port Intel server adapter with link aggregation
4x2TB RAID 0 (I know, I know, I have a backup, and backup of that backup :) )
also additional 1x3TB and 640GB drives for some files.
AFP / Netatalk sharing

I was running FreeNAS with ZFS v15 for a couple of months without any issues. The same day I updated to v28, I started getting "status UNKNOWN" warnings.
- According to the web GUI, the volume status is "healthy"
- zpool status reports errors
- When I run a scrub, some of the errors are repaired, but then new ones appear
- If I replace the problematic files, other files show permanent errors
- Most of the files with errors are video files, and they are not actually damaged (perhaps the checksum is somehow changed?).
- Just in case, I replaced the PSU and RAM; same thing again
- HDDs tested

I know that this looks like a hardware failure, but then why did everything start right after the upgrade to v28? Also, I believe there is at least one more thread with a similar issue.

Thank you in advance.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I'm sure it's just a random coincidence that you upgraded and then started having issues. More than likely you have a failing disk. If it is a failing disk, you can expect to keep getting corrupted files until you replace it.
 

Stephens

Patron
Joined
Jun 19, 2012
Messages
496
sampledi said: "I know that this looks like a hardware failure, but then why did everything start right after the upgrade to v28? Also, I believe there is at least one more thread with a similar issue."

I don't know the answer to your question, but it seems you're insinuating that the upgrade to v28 is behind the errors. I find it far more likely that, if anything, v28 exposed errors rather than caused them. Even more likely, it's just a coincidence, as noobsauce80 said. Either way, you can shut down the NAS, boot Ultimate Boot CD or something similar, and run a SMART long test on the drives. That will have absolutely nothing to do with ZFS or FreeNAS. Then you can see what shape your drives are in and know what you should do. If it's a high-availability server (you can't shut it down), you can run a SMART long test from the shell instead. I was just trying to remove FreeNAS from the equation to ease your mind about FreeNAS.
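
For the shell route, here is a rough Python sketch of the commands involved. It assumes smartmontools' `smartctl` is installed; `/dev/ada0` is only a placeholder device name (not taken from this thread), and the helper names are made up for illustration. By default it just prints the commands rather than running them.

```python
import shutil
import subprocess


def smart_long_test_commands(device):
    """Return the smartctl invocations for a long self-test on one drive."""
    return [
        ["smartctl", "-t", "long", device],      # start the extended (long) self-test
        ["smartctl", "-l", "selftest", device],  # later: show the self-test log
        ["smartctl", "-A", device],              # later: show the SMART attribute table
    ]


def start_long_test(device):
    """Start the long test if smartctl is on PATH; return True if it was started."""
    if shutil.which("smartctl") is None:
        return False
    start_cmd = smart_long_test_commands(device)[0]
    return subprocess.run(start_cmd).returncode == 0


if __name__ == "__main__":
    # Placeholder device; substitute each real drive in turn.
    for cmd in smart_long_test_commands("/dev/ada0"):
        print(" ".join(cmd))
```

The long test runs in the background on the drive itself, so the log and attributes are only worth reading once it has finished.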
 

sampledi

Dabbler
Joined
Nov 23, 2012
Messages
13
Thank you guys.
Stephens said: "I was just trying to remove FreeNAS from the equation to ease your mind about FreeNAS."
I'm sure this is not a FreeNAS issue, because I tried booting SmartOS and importing the ZFS volumes, and I still get the same errors. v28 has definitely exposed something new, and it is interesting that I'm getting errors on all drives (RAID and single drives). I will run a SMART test ASAP, but it is hard to believe that 3 or more drives are failing at the same time (the NAS is behind a UPS, and there were no power failures or other unexpected situations). Is it possible that bad RAM can cause behaviour like this (random corruption in files that were written much earlier)?
 

Stephens

Patron
Joined
Jun 19, 2012
Messages
496
sampledi said: "Is it possible that bad RAM can cause behaviour like this (random corruption on files that are written much earlier)?"

Sure. But given that you've replaced the RAM (and I assume you're using spec'd RAM and aren't overclocking), I'd be checking cables.
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
sampledi said: "Is it possible that bad RAM can cause behaviour like this (random corruption on files that are written much earlier)?"
Yes. If some of the RAM is bad then the data or checksum will be 'changed' and consequently won't match.

See jgreco's post (post 48046) for additional possibilities.
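
To illustrate the mismatch, here is a minimal Python sketch of the idea. It is not ZFS code: SHA-256 stands in for whatever checksum the pool actually uses, and the block contents and helper names are invented for the example. A single bit flipped in memory, in either the data or the stored checksum, is enough to make verification fail even though the bytes on disk are fine.

```python
import hashlib


def checksum(block: bytes) -> bytes:
    """Checksum of a data block (SHA-256 used here as a stand-in)."""
    return hashlib.sha256(block).digest()


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted, simulating a bad RAM cell."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


def verify(block: bytes, stored: bytes) -> bool:
    """True if the block still matches the checksum recorded when it was written."""
    return checksum(block) == stored


on_disk = b"video frame data " * 64   # made-up block contents
stored = checksum(on_disk)            # recorded at write time

# Clean read: data and checksum agree.
print("clean read ok:", verify(on_disk, stored))

# Data bit flipped in RAM after the read: reported as a checksum error.
print("flipped data ok:", verify(flip_bit(on_disk, 5), stored))

# Checksum bit flipped in RAM instead: same symptom, data untouched.
print("flipped checksum ok:", verify(on_disk, flip_bit(stored, 0)))
```

In both failing cases the copy on disk is unchanged, which would be consistent with files that report errors yet still play normally.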
 

sampledi

Dabbler
Joined
Nov 23, 2012
Messages
13
Thank you all.
Long disk tests and memtest are next. I have just re-checked the cables and haven't found any problems there.
 

sampledi

Dabbler
Joined
Nov 23, 2012
Messages
13
Just to close this thread - it was bad RAM. Actually, there was one bad stick in each of the two 4x2GB sets!? v28 exposed those errors, and I'm glad that it is fixed.
Now I'm an even bigger fan of ZFS and FreeNAS :)
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I highly recommend you do a scrub after you get your system up and running with good RAM. :) Hopefully any errors found can be fixed :D
 

sampledi

Dabbler
Joined
Nov 23, 2012
Messages
13
Yes, I already did that once the good RAM was installed, and all errors were fixed.
It is interesting that this same system was running Debian / XFS / Netatalk before FreeNAS. Of course, there were no warnings about file corruption, yet I found some damaged frames in QuickTime files.
It is great to know that ZFS really works as advertised :)
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
You were pretty lucky. It sure looks like you caught it early. Your diligence in getting the problem fixed, rather than ignoring what may have looked like a bug in FreeNAS, is probably why you still have your zpool today. Someone else on the forum experienced total zpool corruption from bad RAM and lost everything. Unfortunately he had no backups. :(
 