Unrecoverable errors but no known data errors on single disk

TheColin21

Dabbler
Joined
Jul 2, 2023
Messages
19
Hi, I am currently doing something very risky with my data, and I am aware of it.
I upgraded my NAS from 2 to 3 disks and am currently moving my data from my old mirror zpool (which I removed a disk from) to my new raidz1 zpool (which is currently missing a disk too).

I am aware that I currently have no redundancy whatsoever. The data on the set is replaceable.
I did a scrub right before the migration (which finished with no errors) to decrease the risk, but I am prepared to lose this data.

The source pool now went from "no redundancy but healthy" to "degraded" as there were unrecoverable errors.

According to zpool status -vx there were 3 read and 15 checksum errors, but there are no known data errors.

SMART is healthy for the disk.

Since I have no redundancy to repair the errors, I do not understand how so many errors could fail to cause any data errors.

Do I have corrupt files or not?
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,694
I doubt you have corrupt files; checksums protect against that. The worst case is missing pieces of files, which should be reported as data errors.

The read errors can result in a retry, which then succeeds.

Checksum errors can be on metadata or data. If on metadata, there is generally a second copy stored.

As for the checksum errors: if they are on data, maybe there is a retry to read the data? I don't know. ZFS does a lot of things to recover from errors. You are pushing into uncharted territory with this config.
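
To make the metadata case concrete, here is a toy model in Python. It is not ZFS code, and the names (Block, Counters, read_block) are made up for illustration. It only shows how a checksum mismatch on one copy can be counted as an error while the read still succeeds from a second copy, so that no data error is recorded.

```python
import hashlib
from dataclasses import dataclass


def checksum(payload: bytes) -> str:
    # Stand-in for whatever checksum algorithm the pool uses.
    return hashlib.sha256(payload).hexdigest()


@dataclass
class Block:
    expected: str  # checksum recorded when the block was written
    copies: list   # on-disk copies (hypothetical: metadata gets two)


@dataclass
class Counters:
    cksum: int = 0        # copies that failed verification
    data_errors: int = 0  # blocks with no valid copy left


def read_block(block: Block, counters: Counters):
    """Return the first copy that verifies, counting every bad copy."""
    for copy in block.copies:
        if checksum(copy) == block.expected:
            return copy
        counters.cksum += 1
    # Every copy failed: only now is the block actually lost.
    counters.data_errors += 1
    return None


good = b"directory entry"
metadata = Block(checksum(good), [b"directory entrX", good])
counters = Counters()
assert read_block(metadata, counters) == good
print(counters)  # Counters(cksum=1, data_errors=0)
```

In this sketch a block with only a single copy would end up in data_errors as soon as that copy fails, which is why file data on a pool without redundancy is the part at risk.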
 

TheColin21

Dabbler
Joined
Jul 2, 2023
Messages
19
Thanks for your answer.
Just to be sure, here is the output of zpool status -v:

Code:
  pool: hdd_raid
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
  scan: scrub repaired 0B in 22:01:37 with 0 errors on Fri Oct  6 21:16:13 2023
config:

        NAME                                    STATE     READ WRITE CKSUM
        hdd_raid                                DEGRADED     0     0     0
          3a0aa053-41b7-4433-bf5d-cf384657ff81  DEGRADED     3     0    15  too many errors

errors: No known data errors


I just hope that this indeed means my data is still intact and that ZFS was able to repair it by retrying reads, using CRC and so on.
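
For anyone who wants to watch these counters from a script rather than by eye, here is a minimal Python sketch that pulls the per-device READ/WRITE/CKSUM values out of text shaped like the output above. The function name device_errors is my own, and the sketch assumes the counters are plain integers, so abbreviated values would be skipped.

```python
import re

# name, state, then three integer counters; trailing notes are ignored
ROW = re.compile(r"^\s*(\S+)\s+([A-Z]+)\s+(\d+)\s+(\d+)\s+(\d+)")


def device_errors(status_text: str) -> dict:
    """Return {device: (read, write, cksum)} for rows with nonzero counters."""
    errors = {}
    in_config = False
    for line in status_text.splitlines():
        # The config table starts right after the NAME/STATE header row.
        if line.split()[:2] == ["NAME", "STATE"]:
            in_config = True
            continue
        if not in_config:
            continue
        match = ROW.match(line)
        if match:
            name, _state, read, write, cksum = match.groups()
            counts = (int(read), int(write), int(cksum))
            if any(counts):
                errors[name] = counts
    return errors


SAMPLE = """\
        NAME                                    STATE     READ WRITE CKSUM
        hdd_raid                                DEGRADED     0     0     0
          3a0aa053-41b7-4433-bf5d-cf384657ff81  DEGRADED     3     0    15  too many errors

errors: No known data errors
"""

print(device_errors(SAMPLE))
```

With the sample above, only the single disk is reported, since the pool-level row has all counters at zero.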
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
What it's telling you is that the errors were not found in sectors on disk that contained any of your files.

If you care about that data, you need a copy of it on another healthy disk/pool ASAP.
 

TheColin21

Dabbler
Joined
Jul 2, 2023
Messages
19
Thanks a lot for that clarification. I am currently replicating all data on that disk.
It's just a weird error, considering that the disk is only a few weeks old, a scrub yesterday showed no errors, and there are no recorded SMART errors either.
 

TheColin21

Dabbler
Joined
Jul 2, 2023
Messages
19
Just FYI: The replication and the resilvering of the new pool with the old drive afterwards both finished without data errors. Thanks for your help.
 