zpool status issue

afassl

Cadet
Joined: Jun 2, 2022
Messages: 3
Hello,

After reading through the various threads, I wasn't able to find a solution to my issue.
The zpool is marked as degraded due to one file:

# zpool status -v
  pool: boot-pool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:03 with 0 errors on Sat May 28 03:45:03 2022
config:

        NAME        STATE     READ WRITE CKSUM
        boot-pool   ONLINE       0     0     0
          ada0p2    ONLINE       0     0     0

errors: No known data errors

  pool: raid5_lsi
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 13:11:48 with 1 errors on Thu Jun 2 01:29:23 2022
config:

        NAME                                          STATE     READ WRITE CKSUM
        raid5_lsi                                     DEGRADED     0     0     0
          gptid/7b19a7bf-79af-11e6-acb6-bcee7bd993c9  DEGRADED     0     0     2  too many errors

errors: Permanent errors have been detected in the following files:

        raid5_lsi/.system/syslog-7f4d67ae16c94917b949456bb9f364ad:/log/daemon.log


Running

zpool scrub raid5_lsi

didn't solve the issue. The RAID set itself is fine.

And I can't find the file itself:

raid5_lsi/.system 3.3T 785M 3.3T 0% /var/db/system
raid5_lsi/.system/cores 1.0G 19K 1.0G 0% /var/db/system/cores
raid5_lsi/.system/samba4 3.3T 91K 3.3T 0% /var/db/system/samba4
raid5_lsi/.system/syslog-342111fac902458e8468ba3ccee40d16 3.3T 1.0M 3.3T 0% /var/db/system/syslog-342111fac902458e8468ba3ccee40d16
raid5_lsi/.system/rrd-342111fac902458e8468ba3ccee40d16 3.3T 11M 3.3T 0% /var/db/system/rrd-342111fac902458e8468ba3ccee40d16
raid5_lsi/.system/configs-342111fac902458e8468ba3ccee40d16 3.3T 3.4M 3.3T 0% /var/db/system/configs-342111fac902458e8468ba3ccee40d16
raid5_lsi/.system/webui 3.3T 24K 3.3T 0% /var/db/system/webui
raid5_lsi/.system/services 3.3T 24K 3.3T 0% /var/db/system/services

root@truenas[/mnt/raid5_lsi]# cd /var/db/system
root@truenas[/var/db/system]# ls
configs-342111fac902458e8468ba3ccee40d16 rrd-342111fac902458e8468ba3ccee40d16 syslog-7f4d67ae16c94917b949456bb9f364ad
configs-7f4d67ae16c94917b949456bb9f364ad rrd-7f4d67ae16c94917b949456bb9f364ad update
cores samba4 webui
nfs-stablerestart services
nfs-stablerestart.bak syslog-342111fac902458e8468ba3ccee40d16
root@truenas[/var/db/system]# ls -l
total 17
drwxr-xr-x 3 root wheel 3 Mar 24 03:45 configs-342111fac902458e8468ba3ccee40d16
drwxr-xr-x 2 root wheel 2 Sep 13 2016 configs-7f4d67ae16c94917b949456bb9f364ad
drwxrwxr-x 2 root wheel 2 Jan 7 2009 cores
-rw-r--r-- 1 root wheel 24 Sep 13 2016 nfs-stablerestart
-rw-r--r-- 1 root wheel 24 Sep 28 2021 nfs-stablerestart.bak
drwxr-xr-x 3 root wheel 4 Jan 7 2009 rrd-342111fac902458e8468ba3ccee40d16
drwxr-xr-x 2 root wheel 2 Sep 13 2016 rrd-7f4d67ae16c94917b949456bb9f364ad
drwxr-xr-x 4 root wheel 14 Jan 7 2009 samba4
drwxr-xr-x 2 root wheel 2 Mar 22 08:26 services
drwxr-xr-x 3 root wheel 3 Mar 22 08:26 syslog-342111fac902458e8468ba3ccee40d16
drwxr-xr-x 2 root wheel 2 Sep 13 2016 syslog-7f4d67ae16c94917b949456bb9f364ad
drwxr-xr-x 2 root wheel 10 Apr 16 04:47 update
drwxr-xr-x 2 root wheel 2 Mar 22 08:26 webui
root@truenas[/var/db/system]# cd syslog-7f4d67ae16c94917b949456bb9f364ad
root@truenas[...slog-7f4d67ae16c94917b949456bb9f364ad]# ls
root@truenas[...slog-7f4d67ae16c94917b949456bb9f364ad]# ls -a
. ..
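
A quick way to see why that directory is empty is to ask ZFS whether the old syslog dataset is mounted at all (a sketch, using the dataset name from the zpool status output above):

zfs get mountpoint,mounted raid5_lsi/.system/syslog-7f4d67ae16c94917b949456bb9f364ad

If this reports mountpoint=legacy and mounted=no, the empty directory is expected: the syslog-7f4d... directory under /var/db/system is just a stub, and the dataset's contents are only visible once it is mounted somewhere.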

Any idea how to resolve this issue?

Thanks for any hints/pointers.

Best regards

Andreas
 

afassl

Cadet
Joined: Jun 2, 2022
Messages: 3
Hi,

Yes, this is an LSI RAID controller:

<<<megaraid_pdisks>>>
Enclosure Device ID: 252
Slot Number: 0
Enclosure position: N/A
Device Id: 5
Predictive Failure Count: 0
Raw Size: 3.638 TB [0x1d1c0beb0 Sectors]
Firmware state: Online, Spun Up
Inquiry Data: PK1338P4HUENDBHGST HDN724040ALE640 MJAOA5E0
Enclosure Device ID: 252
Slot Number: 1
Enclosure position: N/A
Device Id: 6
Predictive Failure Count: 0
Raw Size: 3.638 TB [0x1d1c0beb0 Sectors]
Firmware state: Online, Spun Up
Inquiry Data: PK1338P4HTY2ZBHGST HDN724040ALE640 MJAOA5E0
Enclosure Device ID: 252
Slot Number: 2
Enclosure position: N/A
Device Id: 4
Predictive Failure Count: 0
Raw Size: 3.638 TB [0x1d1c0beb0 Sectors]
Firmware state: Online, Spun Up
Inquiry Data: PK1334PEHKT3VSHGST HDN724040ALE640 MJAOA5E0
Enclosure Device ID: 252
Slot Number: 3
Enclosure position: N/A
Device Id: 7
Predictive Failure Count: 0
Raw Size: 3.638 TB [0x1d1c0beb0 Sectors]
Firmware state: Online, Spun Up
Inquiry Data: PK1338P4HTY1HBHGST HDN724040ALE640 MJAOA5E0
<<<megaraid_ldisks>>>
Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Size : 10.914 TB
Sector Size : 512
Parity Size : 3.637 TB
State : Optimal
Strip Size : 256 KB
Number Of Drives : 4
 

Ericloewe

Server Wrangler
Moderator
Joined: Feb 15, 2014
Messages: 20,194
So your RAID controller is - unsurprisingly - allowing data to be corrupted (or actively corrupting it in the first place). Just one of the many reasons why you should not use hardware RAID with ZFS.
 

afassl

Cadet
Joined: Jun 2, 2022
Messages: 3
Hm, I'm not sure this is really an issue with the HW RAID.
This is the "corrupted" file:

raid5_lsi/.system/syslog-7f4d67ae16c94917b949456bb9f364ad:/log/daemon.log

But it is on a filesystem that isn't mounted; the syslogs are here:

raid5_lsi/.system/syslog-342111fac902458e8468ba3ccee40d16 3.3T 1.0M 3.3T 0% /var/db/system/syslog-342111fac902458e8468ba3ccee40d16

Is it possible to mount the old dataset again?

This is what I found in the pool history (I have no documentation; the former system admin died a couple of months ago, and someone else tried things afterwards, also without documenting anything):

root@truenas[~]# zpool history | grep syslog
This one was created recently:

2022-03-22.08:26:37 zfs create -o mountpoint=legacy -o readonly=off boot-pool/.system/syslog-342111fac902458e8468ba3ccee40d16
And this looks like the initial one. Can it be mounted?

2016-09-13.05:42:09 zfs create -o mountpoint=legacy raid5_lsi/.system/syslog-7f4d67ae16c94917b949456bb9f364ad
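
Since that dataset was created with mountpoint=legacy, it is never mounted automatically; mounting it by hand on FreeBSD would look roughly like this (a sketch; /mnt/old_syslog is just an example path, and read-only is used to be safe):

mkdir -p /mnt/old_syslog
mount -t zfs -o ro raid5_lsi/.system/syslog-7f4d67ae16c94917b949456bb9f364ad /mnt/old_syslog
ls -la /mnt/old_syslog/log
umount /mnt/old_syslog

Even when mounted, reading the damaged log/daemon.log may still fail with an I/O error, since the corrupted block is in that file's data.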
 

Ericloewe

Server Wrangler
Moderator
Joined: Feb 15, 2014
Messages: 20,194
Hm, I'm not sure this is really an issue with the HW RAID.
Why not? Do you have hashes made independently of ZFS that lead you to believe otherwise? Bugs in ZFS that lead to data corruption are unlikely, but not impossible.

This is the "corrupted" file:

raid5_lsi/.system/syslog-7f4d67ae16c94917b949456bb9f364ad:/log/daemon.log

But it is on a filesystem that isn't mounted; the syslogs are here:

raid5_lsi/.system/syslog-342111fac902458e8468ba3ccee40d16 3.3T 1.0M 3.3T 0% /var/db/system/syslog-342111fac902458e8468ba3ccee40d16

Is it possible to mount the old dataset again?

This is what I found in the pool history (I have no documentation; the former system admin died a couple of months ago, and someone else tried things afterwards, also without documenting anything):

root@truenas[~]# zpool history | grep syslog
This one was created recently:

2022-03-22.08:26:37 zfs create -o mountpoint=legacy -o readonly=off boot-pool/.system/syslog-342111fac902458e8468ba3ccee40d16
And this looks like the initial one. Can it be mounted?

2016-09-13.05:42:09 zfs create -o mountpoint=legacy raid5_lsi/.system/syslog-7f4d67ae16c94917b949456bb9f364ad
Well, one of my systems also has a second, mystery syslog dataset. I'm not entirely sure what that's all about, but it could be cruft from an older version of FreeNAS. In any case, the immediate issue is easy to fix, assuming the pool doesn't have additional damage: delete the offending dataset (raid5_lsi/.system/syslog-7f4d67ae16c94917b949456bb9f364ad) and the error should be resolved.
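
In concrete terms, that would be something along these lines (a sketch; double-check beforehand that nothing you still need lives in that dataset):

zfs destroy raid5_lsi/.system/syslog-7f4d67ae16c94917b949456bb9f364ad
zpool clear raid5_lsi
zpool scrub raid5_lsi

The persistent-error list is carried over from previous scrubs, so the pool may only show healthy again after the clear and a fresh scrub complete (occasionally it takes a second scrub for the entry to age out).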

That would then leave the matter of hardware RAID, which still needs to be solved. Here are some relevant Resources to get you started on this:

 

Arwen

MVP
Joined: May 17, 2014
Messages: 3,611
@afassl - To be clear, without its own redundancy (which the underlying RAID-5 does not provide to ZFS), ZFS scrubs cannot repair any data. Thus, your only option is to delete the file and potentially restore it from backup. (But that file does not seem to be important at present.)

There are exceptions. ZFS by default keeps 2 copies of metadata and 3 copies of critical metadata. I have personally seen my media server report a checksum error on a non-redundant pool (for my media, which has multiple backups) but list no file. It took me a while to figure out that it must have lost a metadata block and was able to correct it automatically.

Lastly, even on a non-redundant ZFS pool there are options for data redundancy. Specifically, "copies=2" writes 2 copies of the data on any dataset with that option set. (A somewhat poor man's mirroring.)
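
For example (a sketch; "somedataset" is a placeholder name):

zfs set copies=2 raid5_lsi/somedataset
zfs get copies raid5_lsi/somedataset

Keep in mind that copies=2 only applies to data written after the property is set, roughly doubles the space that dataset uses, and does not help if the whole underlying RAID volume fails.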
 