How to repair permanent errors that are just hex code pointers?

phx7

Cadet
Joined
Aug 7, 2022
Messages
3
A couple of weeks ago I built a dedicated NAS with TrueNAS installed. After spending the night moving all my files over, I ran a pool scrub and it returned hundreds of "permanent errors". This was caused by a mix of bad memory sticks and improperly seated SATA cables. I've since fixed both issues and re-copied all the bad files. I also ran SMART tests on all my drives multiple times and got 0 errors.
After replacing all the corrupted files I was left with just 1 entry that looked like this: <pool_name>:/<dataset_name>/
I thought maybe something in the dataset's metadata was corrupted, so I created a new dataset, moved all the files over, and then deleted the old dataset. Now the error reads like this:

Code:
        NAME                                            STATE     READ WRITE CKSUM
        pool_name123                                    DEGRADED     0     0     0
          raidz1-0                                      DEGRADED     0     0     0
            gptid/6c495656-0dd2-11ed-b1cc-d85ed3d9ed1f  ONLINE       0     0 30.1K
            gptid/6c53b063-0dd2-11ed-b1cc-d85ed3d9ed1f  ONLINE       0     0 30.1K
            gptid/6c3dfdab-0dd2-11ed-b1cc-d85ed3d9ed1f  DEGRADED     0     0 30.1K  too many errors
        cache
          gptid/6b22f26d-0dd2-11ed-b1cc-d85ed3d9ed1f    ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        <0xffffffffffffffff>:<0x9b4>


This is where I'm stuck; I can't think of anything else to try to get rid of this error. I did a bunch of searching around and tried the "start a scrub, then stop it immediately" method, to no avail. I also tried exporting and re-importing the pool and renaming it, reinstalling my jails/plugins, and running multiple clears and scrubs.
The only thing left I can think of is to back up the files, delete the pool entirely, and start fresh. Does anyone have any other ideas I can try? Thanks!

In the end, though, the goal is just to get rid of the annoying "DEGRADED" label; the drives themselves are clearly OK.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
That's metadata.

To repair it, you'll need to re-create the pool.

In order for metadata to become corrupt, you need several copies to have issues, so you might want to think about how reliable your disks/cables are.
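
For readers trying to tell this kind of entry apart from an ordinary file path, here is a minimal Python sketch. It assumes, as I understand `zpool status -v` output, that an entry of the form `<0x...>:<0x...>` is a dataset ID and object ID pair that ZFS could not resolve to a name. The function name `classify_error_entry` and the labels "unresolved" and "named" are made up for illustration and are not part of any ZFS tool.

```python
import re

# Matches entries such as "<0xffffffffffffffff>:<0x9b4>".
# Assumption: the first hex value is a dataset ID and the second an object ID.
HEX_ENTRY = re.compile(r"^<0x([0-9a-fA-F]+)>:<0x([0-9a-fA-F]+)>$")


def classify_error_entry(entry):
    """Classify one line from the 'Permanent errors' list of zpool status -v."""
    entry = entry.strip()
    match = HEX_ENTRY.match(entry)
    if match:
        # No path could be printed, so there is no file to delete or re-copy.
        return {
            "kind": "unresolved",
            "dataset_id": int(match.group(1), 16),
            "object_id": int(match.group(2), 16),
        }
    # Anything else is treated as a name or path that can be replaced by hand.
    return {"kind": "named", "path": entry}


if __name__ == "__main__":
    print(classify_error_entry("        <0xffffffffffffffff>:<0x9b4>"))
    print(classify_error_entry("pool_name123:/some_dataset/"))
```

An "unresolved" result is the case discussed in this thread: there is nothing left to re-copy, which is why replacing files no longer helps.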
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
zpool clear?
But as @sretalla says, you have a permanent issue, and zpool clear may (or will) only hide the problem, not fix it.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
@phx7 Please list your hardware.

Certain configurations are known to cause issues, and may appear to work for a while until they don't work, causing weird problems like you have shown.
 

phx7

Cadet
Joined
Aug 7, 2022
Messages
3
hardware: (all bought new)
ASUS PRIME B450M-A II
Ryzen 5 3600
G.SKILL 32G DDR4 3200
OS: Kingston A400 120GB
Cache: WD Red 500GB M.2
Storage: 3x WD Red Plus 8TB
PSU: EVGA 550W

here is a full timeline of what happened so far:
- finished the build and installed TrueNAS; all hardware components were recognized in the BIOS, so I started copying all my files over (mostly media files)
- during the move I noticed the NAS was periodically restarting; after some investigation it turned out to be a bad memory stick that would crash under load. I replaced the sticks afterwards
- kept moving files and ran pool scrubs, and noticed a large number of files were marked as "permanently corrupted". At first I thought this was caused by the bad RAM sticks, so I just replaced the files while moving even more files over.
- started noticing that even some of the new files were marked as corrupted by pool scrubs, did a bunch of research and determined it could be caused by poorly seated SATA cables, so I reseated all of them.
- moved even more files and finished manually replacing all the corrupted files (talking about hundreds of video files)
- ended up with just 1 error left; the top post describes the methods I've tried so far to fix it.
- have since done multiple scrubs with no additional errors popping up.
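
For anyone repeating the scrub-and-check loop in the timeline above, here is a rough Python sketch that scans saved `zpool status` text for vdev rows that are not ONLINE or that have nonzero READ/WRITE/CKSUM counters. The function names `vdevs_with_errors` and `pool_looks_clean` are hypothetical, the list of states is an assumption and may not be exhaustive, and counters are compared as text because abbreviated values such as `30.1K` are only approximate.

```python
# Assumption: these are the vdev states that can appear in the STATE column.
KNOWN_STATES = {"ONLINE", "DEGRADED", "FAULTED", "OFFLINE", "UNAVAIL", "REMOVED"}


def vdevs_with_errors(status_text):
    """Return vdev rows whose state is not ONLINE or whose counters are nonzero."""
    flagged = []
    for line in status_text.splitlines():
        parts = line.split()
        # A vdev row has at least five fields: NAME STATE READ WRITE CKSUM.
        # The header row is skipped because "STATE" is not a known state.
        if len(parts) < 5 or parts[1] not in KNOWN_STATES:
            continue
        name, state, read, write, cksum = parts[:5]
        if state != "ONLINE" or (read, write, cksum) != ("0", "0", "0"):
            flagged.append({"name": name, "state": state,
                            "read": read, "write": write, "cksum": cksum})
    return flagged


def pool_looks_clean(status_text):
    """True if no vdev row is flagged and zpool reports no known data errors."""
    return (not vdevs_with_errors(status_text)
            and "errors: No known data errors" in status_text)


# Sample text copied from the first post of this thread.
SAMPLE = """
        NAME                                            STATE     READ WRITE CKSUM
        pool_name123                                    DEGRADED     0     0     0
          raidz1-0                                      DEGRADED     0     0     0
            gptid/6c495656-0dd2-11ed-b1cc-d85ed3d9ed1f  ONLINE       0     0 30.1K
            gptid/6c53b063-0dd2-11ed-b1cc-d85ed3d9ed1f  ONLINE       0     0 30.1K
            gptid/6c3dfdab-0dd2-11ed-b1cc-d85ed3d9ed1f  DEGRADED     0     0 30.1K  too many errors
        cache
          gptid/6b22f26d-0dd2-11ed-b1cc-d85ed3d9ed1f    ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        <0xffffffffffffffff>:<0x9b4>
"""

if __name__ == "__main__":
    for row in vdevs_with_errors(SAMPLE):
        print(row["name"], row["state"], row["cksum"])
    print("clean:", pool_looks_clean(SAMPLE))
```

On the sample, the pool, the raidz1 vdev, and all three data disks are flagged, while the cache device is not.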
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
There appear to be some AMD Ryzen power save features that don't work well with TrueNAS CORE. I don't remember what they are, but search the forums on how to disable them in the BIOS.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Arwen said: "There appear to be some AMD Ryzen power save features that don't work well with TrueNAS CORE. I don't remember what they are, but search the forums on how to disable them in the BIOS."
Those would be the C6 power state and the "Cool N Quiet" technology, I believe.
 

phx7

Cadet
Joined
Aug 7, 2022
Messages
3
Hmm, is there no way to edit my original post? Anyway, I just wanted to come back and give an update in case future people find this post while searching for answers.

I ended up deleting my entire pool and starting a new one to fix the stuck error. The NAS has been running completely fine since then, with no other errors popping up after multiple scrubs and SMART tests.

I also made a mistake in my hardware post: my mobo is actually a Gigabyte A520I AC.

I had "C state" on "Auto" and "Cool N Quiet" on "Auto" this entire time too without seeing any issues. I've just acquired a spare GPU so I can go into the BIOS and turn both off; I will report back if turning them off ends up causing issues.
 