Data corruption error message > RAIDZ2 > would a RAIDZ2 scrub not have corrected it?

HillTopsGM

Cadet
Joined
Jan 27, 2020
Messages
4
Hello Everyone.
I've had FreeNAS set up for at least 5 years (if not more).
I set it up and let it go . . . never really had to touch it, so I am very much a novice.

Here is what I have:

Build FreeNAS-8.3.1-RELEASE-p2-x64 (r12686+b770da6_dirty)
Platform AMD E-350 Processor
Memory 7772MB
4 x 2TB Hitachi HDDs set up in RAIDZ2

Generally I don't leave the NAS running ALL the time (it may be off for a couple of weeks before I turn it on to access something).
When I do turn it on, I usually leave it on so it can do a Scrub if it has been off for a while.

Well, for the first time ever, the post-scrub report emailed to me showed an error:

Removing stale files from /var/preserve:

Cleaning out old system announcements:

Backup passwd and group files:

Verifying group file syntax:
/etc/group is fine

Backing up package db directory:

Disk status:
Filesystem Size Used Avail Capacity Mounted on
/dev/ufs/FreeNASs2a 926M 381M 470M 45% /
devfs 1.0k 1.0k 0B 100% /dev
/dev/md0 4.6M 3.2M 981k 77% /etc
/dev/md1 823k 2.5k 755k 0% /mnt
/dev/md2 149M 16M 121M 12% /var
/dev/ufs/FreeNASs4 19M 1.9M 16M 10% /data
bin1 3.5T 3.2T 286G 92% /mnt/bin1
volume1 129G 943M 128G 1% /mnt/volume1
volume1/admin 1.8T 1.7T 128G 93% /mnt/volume1/admin
volume1/user2 128G 77k 128G 0% /mnt/volume1/user2

Last dump(s) done (Dump '>' file systems):

Checking status of zfs pools:
NAME SIZE ALLOC FREE CAP DEDUP HEALTH ALTROOT
bin1 7.25T 6.56T 707G 90% 1.00x ONLINE /mnt
volume1 1.81T 1.66T 158G 91% 1.00x ONLINE /mnt

pool: bin1
state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption.
Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: http://www.sun.com/msg/ZFS-8000-8A
scan: scrub in progress since Sun Jan 26 22:09:56 2020
4.61T scanned out of 6.56T at 277M/s, 2h3m to go
0 repaired, 70.25% done
config:

NAME STATE READ WRITE CKSUM
bin1 ONLINE 0 0 6
raidz2-0 ONLINE 0 0 12
gptid/74271bb8-5554-11e2-9a32-f46d04d98f51 ONLINE 0 0 12
gptid/74d1a118-5554-11e2-9a32-f46d04d98f51 ONLINE 0 0 11
gptid/75c1c7f5-5554-11e2-9a32-f46d04d98f51 ONLINE 0 0 19
gptid/766b7256-5554-11e2-9a32-f46d04d98f51 ONLINE 0 0 12

errors: 6 data errors, use '-v' for a list

Checking status of ATA raid partitions:

Checking status of gmirror(8) devices:

Checking status of graid3(8) devices:

Checking status of gstripe(8) devices:

Network interface status:
Name Mtu Network Address Ipkts Ierrs Idrop Opkts Oerrs Coll
usbus 0 <Link#1> 0 0 0 0 0 0
usbus 0 <Link#2> 0 0 0 0 0 0
usbus 0 <Link#3> 0 0 0 0 0 0
usbus 0 <Link#4> 0 0 0 0 0 0
usbus 0 <Link#5> 0 0 0 0 0 0
re0 1500 <Link#6> f4:6d:04:d9:8f:51 195965454 0 0 245839538 0 0
re0 1500 192.168.1.0 192.168.1.7 195442833 - - 245838294 - -
usbus 0 <Link#7> 0 0 0 0 0 0
usbus 0 <Link#8> 0 0 0 0 0 0
lo0 16384 <Link#9> 42768 0 0 42768 0 0
lo0 16384 fe80::1%lo0 fe80::1 0 - - 0 - -
lo0 16384 localhost ::1 2 - - 2 - -
lo0 16384 your-net localhost 42766 - - 42766 - -

Security check:
(output mailed separately)

Checking status of 3ware RAID controllers:
Alarms (most recent first):
No new alarms.

-- End of daily output --

I wasn't sure what that meant exactly.
I logged in to the admin section via a browser; everything SEEMED to be OK there, and I could access all my files over the network normally, so I opened the Shell and ran zpool status -v

Here is the output for that:

pool: bin1
state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: http://www.sun.com/msg/ZFS-8000-8A
scan: scrub repaired 0 in 6h56m with 3 errors on Mon Jan 27 05:06:02 2020
config:

NAME STATE READ WRITE CKSUM
bin1 ONLINE 0 0 6
raidz2-0 ONLINE 0 0 12
gptid/74271bb8-5554-11e2-9a32-f46d04d98f51 ONLINE 0 0 12
gptid/74d1a118-5554-11e2-9a32-f46d04d98f51 ONLINE 0 0 11
gptid/75c1c7f5-5554-11e2-9a32-f46d04d98f51 ONLINE 0 0 19
gptid/766b7256-5554-11e2-9a32-f46d04d98f51 ONLINE 0 0 12

errors: Permanent errors have been detected in the following files:

/mnt/bin1/001_My_Operational_VMs/abc.vdi
/mnt/bin1/Styles/def.vdi
/mnt/bin1/Videos/xyz.mp4

pool: volume1
state: ONLINE
scan: scrub repaired 0 in 5h41m with 0 errors on Sun Jan 26 05:41:41 2020
config:

NAME STATE READ WRITE CKSUM
volume1 ONLINE 0 0 0
gptid/004b94fb-0755-11e1-be94-f46d04d98f51 ONLINE 0 0 0

errors: No known data errors
[root@freenas ~]#

That made me feel a little better as I have those files backed up elsewhere.
So far the experience has me wondering about what to expect from FreeNAS.
I realise that there have been SIGNIFICANT updates to the OS (I hope to get around to upgrading sometime soon).

For now, here are my questions:

1. I was under the impression that with a RAIDZ2 setup, a scrub should have detected an issue with one of the copies and then fixed the one that was corrupted. Did both copies get corrupted here?

2. In the emailed report it was suggested that there were 6 corrupted files, yet zpool status -v showed 3 . . . is that because both copies that were created in the raidz2 pool were corrupted?

3. I was under the impression that with RAIDZ2 setups, the chances of corrupted files were supposed to be very low (almost non-existent) because of the scrubs and file checking. Could there be something else going on here?


Here are a few images from my admin GUI (not sure if this helps):

Attached: bin1 Alert Message.png, bin1 Screenshot.png, bin1 Volume Status.png


Thanks for your consideration.

HillTopsGM

Cadet
Joined
Jan 27, 2020
Messages
4
Just found the System "Specifics" . . . not sure if this helps:

Platform ASUS E35 M1-M Pro - AMD E-350 Processor
Memory 3678MB (Patriot Gamer 2 Series DDR3 4GB PC3-12800)
2 x Hitachi 2TB Deskstar 7K3000 SATA III Internal Drive w/ 64MB Cache
OS Version FreeBSD 8.2-RELEASE-p3

HillTopsGM

Cadet
Joined
Jan 27, 2020
Messages
4
Apologies . . . not sure why I can't edit that last post (feeling kinda stupid) . . . THIS is what I have going on in my FreeNAS setup:

Platform ASUS E35 M1-M Pro - AMD E-350 Processor
Memory 7774MB (Patriot Gamer 2 Series DDR3 2 x 4GB PC3-12800)
2 x Hitachi 2TB Deskstar 7K3000 SATA III Internal Drives w/ 64MB Cache & 2 x Seagate 2TB Barracuda SATA III w/ 64MB Cache
in a RAIDZ2
Build FreeNAS-8.3.1-RELEASE-p2-x64 (r12686+b770da6_dirty)


Again, not sure why I can't edit my posts. Am I missing something?
Thanks again for the help.

G8One2

Patron
Joined
Jan 2, 2017
Messages
248
Looks like that's from not using ECC RAM. This is exactly why it's pushed to use it. Your data is likely no good. Desktop gaming hardware is not recommended for FreeNAS. Read the following - Hardware recommendations (read this first) | iXsystems Community


HillTopsGM

Cadet
Joined
Jan 27, 2020
Messages
4
Thanks for the reply G8One2.
Is it likely that only the listed files are bad, or could other files have been corrupted without my knowing it yet?

Any comment on question 2?

Thanks for the reply!

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
In the emailed report it was suggested that there were 6 corrupted files, yet zpool status -v showed 3 . . . is that because both copies that were created in the raidz2 pool were corrupted?
It said there were six data errors; each file could have more than one error. But I think you have a fundamental misunderstanding of RAIDZn--it isn't accomplished by mirroring, where the system just stores multiple copies of all the data. Rather, it's done by storing parity--data from which ZFS can compute what the missing data would have been. Your pool configuration stores lots of parity--as much parity as data--but it's still computed parity, not a simple copy of the data.

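To make the difference concrete, here's a toy single-parity sketch in plain sh (purely illustrative; real RAIDZ2 keeps two parity blocks per stripe and uses more involved math, and the variable names here are made up):

# Toy illustration: parity is computed from the data, not a copy of it.
D1=$((0xA5)); D2=$((0x3C))            # two "data" blocks
P=$(( D1 ^ D2 ))                      # XOR parity stored on another disk
echo "recomputed D1: $(( P ^ D2 ))"   # prints 165 (0xA5): D1 rebuilt from parity alone

But if corrupt blocks coincide across enough disks in the same stripe, there is nothing intact left to recompute from, and ZFS reports a permanent error instead.
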
The big problem you have is that your pool is showing checksum errors on all four disks, and apparently some of them have coincided to the point where some of your data is corrupted (ZFS can do a lot, but it isn't magic). This could be an issue with your RAM as previously suggested, or it could be with the disks themselves, your motherboard, cables, or pretty much anywhere else in the chain. But my concern is that the problem will grow. This would be a good time to make sure you have a complete backup of what's on that pool.
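
If it helps, here's a rough sketch of the next steps I'd take (device names like ada0 are just examples; substitute your actual disks):

smartctl -a /dev/ada0     # repeat for each disk; watch for reallocated/pending sectors
zpool status -v bin1      # note the affected files
# after restoring the three listed files from your backup:
zpool clear bin1          # reset the error counters
zpool scrub bin1          # re-scrub and see whether new checksum errors appear

If a freshly cleared pool immediately racks up new checksum errors on the next scrub, that points at hardware (RAM, cables, controller, or the disks themselves) rather than a one-off event.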