zpool status issue

afassl

Cadet
Joined: Jun 2, 2022
Messages: 3
Hello,

After reading through the various threads, I wasn't able to find a solution to my issue.
The zpool is marked as degraded due to one file:

# zpool status -v
  pool: boot-pool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:03 with 0 errors on Sat May 28 03:45:03 2022
config:

        NAME        STATE     READ WRITE CKSUM
        boot-pool   ONLINE       0     0     0
          ada0p2    ONLINE       0     0     0

errors: No known data errors

  pool: raid5_lsi
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 13:11:48 with 1 errors on Thu Jun 2 01:29:23 2022
config:

        NAME                                          STATE     READ WRITE CKSUM
        raid5_lsi                                     DEGRADED     0     0     0
          gptid/7b19a7bf-79af-11e6-acb6-bcee7bd993c9  DEGRADED     0     0     2  too many errors

errors: Permanent errors have been detected in the following files:

        raid5_lsi/.system/syslog-7f4d67ae16c94917b949456bb9f364ad:/log/daemon.log


Running

zpool scrub raid5_lsi

didn't solve the issue. The RAID set itself is fine.

And I can't find the file itself:

raid5_lsi/.system 3.3T 785M 3.3T 0% /var/db/system
raid5_lsi/.system/cores 1.0G 19K 1.0G 0% /var/db/system/cores
raid5_lsi/.system/samba4 3.3T 91K 3.3T 0% /var/db/system/samba4
raid5_lsi/.system/syslog-342111fac902458e8468ba3ccee40d16 3.3T 1.0M 3.3T 0% /var/db/system/syslog-342111fac902458e8468ba3ccee40d16
raid5_lsi/.system/rrd-342111fac902458e8468ba3ccee40d16 3.3T 11M 3.3T 0% /var/db/system/rrd-342111fac902458e8468ba3ccee40d16
raid5_lsi/.system/configs-342111fac902458e8468ba3ccee40d16 3.3T 3.4M 3.3T 0% /var/db/system/configs-342111fac902458e8468ba3ccee40d16
raid5_lsi/.system/webui 3.3T 24K 3.3T 0% /var/db/system/webui
raid5_lsi/.system/services 3.3T 24K 3.3T 0% /var/db/system/services

root@truenas[/mnt/raid5_lsi]# cd /var/db/system
root@truenas[/var/db/system]# ls
configs-342111fac902458e8468ba3ccee40d16 rrd-342111fac902458e8468ba3ccee40d16 syslog-7f4d67ae16c94917b949456bb9f364ad
configs-7f4d67ae16c94917b949456bb9f364ad rrd-7f4d67ae16c94917b949456bb9f364ad update
cores samba4 webui
nfs-stablerestart services
nfs-stablerestart.bak syslog-342111fac902458e8468ba3ccee40d16
root@truenas[/var/db/system]# ls -l
total 17
drwxr-xr-x 3 root wheel 3 Mar 24 03:45 configs-342111fac902458e8468ba3ccee40d16
drwxr-xr-x 2 root wheel 2 Sep 13 2016 configs-7f4d67ae16c94917b949456bb9f364ad
drwxrwxr-x 2 root wheel 2 Jan 7 2009 cores
-rw-r--r-- 1 root wheel 24 Sep 13 2016 nfs-stablerestart
-rw-r--r-- 1 root wheel 24 Sep 28 2021 nfs-stablerestart.bak
drwxr-xr-x 3 root wheel 4 Jan 7 2009 rrd-342111fac902458e8468ba3ccee40d16
drwxr-xr-x 2 root wheel 2 Sep 13 2016 rrd-7f4d67ae16c94917b949456bb9f364ad
drwxr-xr-x 4 root wheel 14 Jan 7 2009 samba4
drwxr-xr-x 2 root wheel 2 Mar 22 08:26 services
drwxr-xr-x 3 root wheel 3 Mar 22 08:26 syslog-342111fac902458e8468ba3ccee40d16
drwxr-xr-x 2 root wheel 2 Sep 13 2016 syslog-7f4d67ae16c94917b949456bb9f364ad
drwxr-xr-x 2 root wheel 10 Apr 16 04:47 update
drwxr-xr-x 2 root wheel 2 Mar 22 08:26 webui
root@truenas[/var/db/system]# cd syslog-7f4d67ae16c94917b949456bb9f364ad
root@truenas[...slog-7f4d67ae16c94917b949456bb9f364ad]# ls
root@truenas[...slog-7f4d67ae16c94917b949456bb9f364ad]# ls -a
. ..
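
A quick way to see why that directory is empty is to ask ZFS whether the old syslog dataset is mounted at all (a sketch, using the dataset name from the zpool status output above):

zfs get mountpoint,mounted raid5_lsi/.system/syslog-7f4d67ae16c94917b949456bb9f364ad

If this reports mountpoint=legacy and mounted=no, the empty directory is expected: the syslog-7f4d... directory under /var/db/system is just a stub, and the dataset's contents are only visible once it is mounted somewhere.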

Any idea how to resolve this issue?

Thanks for any hints/pointers.

Best regards

Andreas
 

afassl

Cadet
Joined: Jun 2, 2022
Messages: 3
Hi,

Yes, this is an LSI RAID controller:

<<<megaraid_pdisks>>>
Enclosure Device ID: 252
Slot Number: 0
Enclosure position: N/A
Device Id: 5
Predictive Failure Count: 0
Raw Size: 3.638 TB [0x1d1c0beb0 Sectors]
Firmware state: Online, Spun Up
Inquiry Data: PK1338P4HUENDBHGST HDN724040ALE640 MJAOA5E0
Enclosure Device ID: 252
Slot Number: 1
Enclosure position: N/A
Device Id: 6
Predictive Failure Count: 0
Raw Size: 3.638 TB [0x1d1c0beb0 Sectors]
Firmware state: Online, Spun Up
Inquiry Data: PK1338P4HTY2ZBHGST HDN724040ALE640 MJAOA5E0
Enclosure Device ID: 252
Slot Number: 2
Enclosure position: N/A
Device Id: 4
Predictive Failure Count: 0
Raw Size: 3.638 TB [0x1d1c0beb0 Sectors]
Firmware state: Online, Spun Up
Inquiry Data: PK1334PEHKT3VSHGST HDN724040ALE640 MJAOA5E0
Enclosure Device ID: 252
Slot Number: 3
Enclosure position: N/A
Device Id: 7
Predictive Failure Count: 0
Raw Size: 3.638 TB [0x1d1c0beb0 Sectors]
Firmware state: Online, Spun Up
Inquiry Data: PK1338P4HTY1HBHGST HDN724040ALE640 MJAOA5E0
<<<megaraid_ldisks>>>
Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Size : 10.914 TB
Sector Size : 512
Parity Size : 3.637 TB
State : Optimal
Strip Size : 256 KB
Number Of Drives : 4
 

Ericloewe

Server Wrangler
Moderator
Joined: Feb 15, 2014
Messages: 20,194
So your RAID controller is - unsurprisingly - allowing data to be corrupted (or actively corrupting it in the first place). Just one of the many reasons why you should not use hardware RAID with ZFS.
 

afassl

Cadet
Joined: Jun 2, 2022
Messages: 3
Hm, I'm not sure this is really an issue with the HW RAID.
This is the "corrupted" file:

raid5_lsi/.system/syslog-7f4d67ae16c94917b949456bb9f364ad:/log/daemon.log

But it is on a filesystem that isn't mounted; the syslogs are here:

raid5_lsi/.system/syslog-342111fac902458e8468ba3ccee40d16 3.3T 1.0M 3.3T 0% /var/db/system/syslog-342111fac902458e8468ba3ccee40d16

Is it possible to mount the old dataset again?

This is what I found in the pool history (I have no documentation; the former system admin died a couple of months ago, and someone else tried things afterwards, also without documenting anything):

root@truenas[~]# zpool history | grep syslog
This one was created recently:

2022-03-22.08:26:37 zfs create -o mountpoint=legacy -o readonly=off boot-pool/.system/syslog-342111fac902458e8468ba3ccee40d16
And this looks like the initial one. Can it be mounted?

2016-09-13.05:42:09 zfs create -o mountpoint=legacy raid5_lsi/.system/syslog-7f4d67ae16c94917b949456bb9f364ad
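
Since that dataset was created with mountpoint=legacy, it is never mounted automatically; mounting it by hand on FreeBSD would look roughly like this (a sketch; /mnt/old_syslog is just an example path, and read-only is used to be safe):

mkdir -p /mnt/old_syslog
mount -t zfs -o ro raid5_lsi/.system/syslog-7f4d67ae16c94917b949456bb9f364ad /mnt/old_syslog
ls -la /mnt/old_syslog/log
umount /mnt/old_syslog

Even when mounted, reading the damaged log/daemon.log may still fail with an I/O error, since the corrupted block is in that file's data.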
 

Ericloewe

Server Wrangler
Moderator
Joined: Feb 15, 2014
Messages: 20,194
Hm, I'm not sure this is really an issue with the HW RAID.
Why not? Do you have hashes made independently of ZFS that lead you to believe otherwise? Bugs in ZFS that lead to data corruption are unlikely, but not impossible.

This is the "corrupted" file:

raid5_lsi/.system/syslog-7f4d67ae16c94917b949456bb9f364ad:/log/daemon.log

But it is on a filesystem that isn't mounted; the syslogs are here:

raid5_lsi/.system/syslog-342111fac902458e8468ba3ccee40d16 3.3T 1.0M 3.3T 0% /var/db/system/syslog-342111fac902458e8468ba3ccee40d16

Is it possible to mount the old dataset again?

This is what I found in the pool history (I have no documentation; the former system admin died a couple of months ago, and someone else tried things afterwards, also without documenting anything):

root@truenas[~]# zpool history | grep syslog
This one was created recently:

2022-03-22.08:26:37 zfs create -o mountpoint=legacy -o readonly=off boot-pool/.system/syslog-342111fac902458e8468ba3ccee40d16
And this looks like the initial one. Can it be mounted?

2016-09-13.05:42:09 zfs create -o mountpoint=legacy raid5_lsi/.system/syslog-7f4d67ae16c94917b949456bb9f364ad
Well, one of my systems also has a second, mystery syslog dataset. I'm not entirely sure what that's all about, but it could be cruft from an older version of FreeNAS. In any case, the immediate issue is easy to fix, assuming the pool doesn't have additional damage: delete the offending dataset (raid5_lsi/.system/syslog-7f4d67ae16c94917b949456bb9f364ad) and the error should be resolved.
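
In concrete terms, that would be something along these lines (a sketch; double-check beforehand that nothing you still need lives in that dataset):

zfs destroy raid5_lsi/.system/syslog-7f4d67ae16c94917b949456bb9f364ad
zpool clear raid5_lsi
zpool scrub raid5_lsi

The persistent-error list is carried over from previous scrubs, so the pool may only show healthy again after the clear and a fresh scrub complete (occasionally it takes a second scrub for the entry to age out).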

That would then leave the matter of hardware RAID, which still needs to be solved. Here are some relevant Resources to get you started on this:

 

Arwen

MVP
Joined: May 17, 2014
Messages: 3,611
@afassl - To be clear, without its own redundancy (which the underlying RAID-5 does not provide to ZFS), ZFS scrubs cannot repair any data. Thus, your only option is to delete the file and potentially restore it from backup. (But that file does not seem to be important at present.)

There are exceptions. ZFS by default keeps 2 copies of metadata and 3 copies of critical metadata. I have personally seen my media server report a checksum error on a non-redundant pool (for my media, which has multiple backups) but list no file. It took me a while to figure out that it must have lost a metadata block and was able to correct it automatically.

Lastly, even on a non-redundant ZFS pool there are options for data redundancy. Specifically, "copies=2" writes 2 copies of the data on any dataset with that option set. (A somewhat poor man's mirroring.)
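
For example (a sketch; "somedataset" is a placeholder name):

zfs set copies=2 raid5_lsi/somedataset
zfs get copies raid5_lsi/somedataset

Keep in mind that copies=2 only applies to data written after the property is set, roughly doubles the space that dataset uses, and does not help if the whole underlying RAID volume fails.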
 