Pool data error count reported differently for root and regular user

jakubjb

Dabbler
Joined
Feb 9, 2017
Messages
29
Hi!
I found something that I thought was impossible: "zpool status pool0" reports a different permanent data error count for root than for a regular user (the one used by the monitoring system). By the way, this is FreeNAS-11.2-U5.

Here's what root sees:
Code:
  pool: pool0
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: resilvered 2.05T in 0 days 06:37:19 with 57 errors on Thu Jul 11 21:50:21 2019
config:

        NAME                                            STATE     READ WRITE CKSUM
        pool0                                           DEGRADED     0     0 6.43K
[..]
errors: 1 data errors, use '-v' for a list

Resilvering ended with 57 errors, but the last line shows only 1 permanent data error.

And here's what user zabbix-test sees:
Code:
  
  pool: pool0
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: resilvered 2.05T in 0 days 06:37:19 with 57 errors on Thu Jul 11 21:50:21 2019
config:

        NAME                                            STATE     READ WRITE CKSUM
        pool0                                           DEGRADED     0     0 6.43K
[...]
errors: 59 data errors, use '-v' for a list

The same 57 errors after resilvering, but 59 permanent data errors reported.
Apart from the different reporting for different users, why do the numbers after resilvering and at the end of the output differ?
My googling for this issue has been unsuccessful so far.
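
Since the two numbers come from different lines of the output, a monitoring check could read them separately. A minimal sketch, assuming the status text has the same shape as the output pasted above; `parse_counts` and the two sample strings are hypothetical names for illustration, not part of FreeNAS or Zabbix:

```python
import re

# Summary line, e.g. "errors: 59 data errors, use '-v' for a list".
# A healthy pool prints "errors: No known data errors", which will not match.
ERRORS_RE = re.compile(r"^errors:\s+(\d+)\s+data errors", re.MULTILINE)
# Error count inside the scan line, e.g. "... with 57 errors on ...".
SCAN_RE = re.compile(r"^\s*scan:.*\bwith\s+(\d+)\s+errors\b", re.MULTILINE)


def parse_counts(status_text):
    """Return (scan_errors, data_errors); either is None if not found."""
    scan = SCAN_RE.search(status_text)
    errors = ERRORS_RE.search(status_text)
    return (
        int(scan.group(1)) if scan else None,
        int(errors.group(1)) if errors else None,
    )


# Shortened copies of the two outputs from this post.
root_view = """\
  scan: resilvered 2.05T in 0 days 06:37:19 with 57 errors on Thu Jul 11 21:50:21 2019
errors: 1 data errors, use '-v' for a list
"""
user_view = """\
  scan: resilvered 2.05T in 0 days 06:37:19 with 57 errors on Thu Jul 11 21:50:21 2019
errors: 59 data errors, use '-v' for a list
"""

print(parse_counts(root_view))  # (57, 1)
print(parse_counts(user_view))  # (57, 59)
```

Keeping the scan count and the summary count as separate values makes a mismatch like this one visible to the monitoring system instead of hiding it behind a single number.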
 

jakubjb

By the way, both showed the same number, 243 permanent data errors, before one of the 50 disks (5x 10-disk RAID-Z2) faulted and was replaced.
At that time, the reporting was consistent for root and zabbix-test.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,176
What's actually listed with the -v flag?
 

jakubjb

root:
Code:
errors: Permanent errors have been detected in the following files:

        pool0/Zvols/zvol-2-a:<0x1>

zabbix-test:
Code:
errors: List of errors unavailable (insufficient privileges)
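
For a monitoring user this means the list itself cannot be relied on. A minimal sketch of how a script might tell the two cases apart, assuming single-pool output shaped like the two samples above, with the errors section last; `read_error_list` is a hypothetical helper name:

```python
def read_error_list(verbose_text):
    """Return the listed error entries, or None if the list was withheld.

    Assumes the output of `zpool status -v` for a single pool, with the
    "errors:" section at the end, as in the samples in this thread.
    """
    lines = verbose_text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("errors: List of errors unavailable"):
            return None  # e.g. "(insufficient privileges)"
        if line.startswith("errors: Permanent errors"):
            # Every non-empty line after the header is one entry.
            return [entry.strip() for entry in lines[i + 1:] if entry.strip()]
    return []  # no error section recognised


root_output = """\
errors: Permanent errors have been detected in the following files:

        pool0/Zvols/zvol-2-a:<0x1>
"""
user_output = "errors: List of errors unavailable (insufficient privileges)\n"

print(read_error_list(root_output))  # ['pool0/Zvols/zvol-2-a:<0x1>']
print(read_error_list(user_output))  # None
```

Returning None rather than an empty list keeps "not allowed to see the list" distinct from "nothing listed".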
 

Ericloewe

It's showing metadata corruption. That probably caused the discrepancy, somehow.
 

jakubjb

Thanks for this information, but could you explain more or point me to some documentation? How do you know it's metadata corruption? Is it based on the difference between the error counts, on the user's output, or both? It seems to be an assumption (please don't take that as an offensive statement, it's just an attempt to learn :smile: ).
The pool only says: restore the file (in this case a 40 TiB volume served via iSCSI).
How would you proceed with these? My experience with ZFS so far does not cover metadata corruption. Thanks in advance.
 

Ericloewe

How do you know it's metadata corruption?
The error is listed as an opaque pointer instead of a file.

How would you proceed with these?
Unfortunately, it's best to start over. The error seems to be in a specific dataset, so you might be able to delete that one and restore it from backups.
 

jakubjb

Life, what can you do ;-)
Just to be clear, by "opaque pointer instead of a file" you mean this part:
Code:
pool0/Zvols/zvol-2-a:<0x1>

I always thought that ZFS would not show specific files on a volume (zvol), as it's a block device. Or do you mean exactly that: the whole expression "pool0...0x1>" is the pointer? In that case, it's clear... somehow.
Anyway, thank you very much for your help. I'll proceed with the volume/zvol restoration.
 

Ericloewe

If it's a zvol, I'm actually not sure what it would show. The 0x1 part is still suggestive of metadata, though.
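
To look at such entries in a script, the dataset name and the hex number can be split apart. A minimal sketch, assuming entries of the form shown in this thread; `parse_entry` is a hypothetical helper, and whether a given number really points at metadata is only the suggestion made above, not something this code can confirm:

```python
import re

# Matches "<dataset>:<0xNN>", e.g. "pool0/Zvols/zvol-2-a:<0x1>".
ENTRY_RE = re.compile(r"^(?P<dataset>.+):<(?P<obj>0x[0-9a-fA-F]+)>$")


def parse_entry(entry):
    """Split an error entry into (dataset, number).

    Returns None when the entry does not end in a "<0x...>" reference,
    which is assumed here to mean it was printed as a regular path.
    """
    match = ENTRY_RE.match(entry.strip())
    if match is None:
        return None
    return match.group("dataset"), int(match.group("obj"), 16)


print(parse_entry("pool0/Zvols/zvol-2-a:<0x1>"))  # ('pool0/Zvols/zvol-2-a', 1)
```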
 

jakubjb

To complete the story: an automatic scrub was triggered yesterday.
Code:
scan: scrub repaired 0 in 0 days 08:38:32 with 227 errors on Tue Jul 16 06:38:37 2019

And now the error count is consistent between users. ZFS keeps me learning :smile:
At the same time, Supermicro keeps confusing me, as it's their chassis and all the guts.
And the pool still finds cksum errors and sometimes data errors on this one zvol (volume, not dataset).
Thanks @Ericloewe.
 