Pool degraded, Permanent errors present.

Tor243

Cadet
Joined
Mar 24, 2024
Messages
3
The setup:
4x Seagate Constellation ES.3 7200 rpm HDDs (refurbished by the manufacturer, still under warranty).
Ryzen 3 2200G
1 unspecified NVMe M.2 drive (some laptop one; it works fine, so it shouldn't be the root of the issue).
GA-A320M-H motherboard.
24GB RAM (At first I had 2 sticks, but one was faulty: it was not detected and the system would not boot. Then I used a single NEW 16GB stick and had the same issue. Now I'm using one 8GB and one 16GB stick.)
TrueNAS-SCALE-23.10.2

Everything runs over a 1 Gbit connection.

I've set up my first NAS. I installed everything and it worked perfectly. Then I started installing apps, and that's where the trouble began. While doing that, I noticed one of the drives making a weird clicking sound (the sound of the head hitting the surface, something you DON'T want to hear). The drive would drop offline every now and then for a few seconds, which caused some corruption (only in the apps being installed; no data had been moved at that point).

Now here's my issue. I fixed that drive with a simple cable swap (no new cable; I just swapped its cable with another drive's, and later swapped the cables around between all drives). It doesn't go offline anymore. I ran it through HD Tune to look for bad sectors, and none were found in the quick scan (I haven't done the long one yet, since it will take ages). I think, and hope, that the cables were not the issue.

I'm still getting corruption, though, and not just on that one drive but ON 2 DRIVES. Even after moving around the cables I suspected were "bad", only these two specific drives report errors. Sometimes the count is only 64, other times it's 424 or a bit more, but never anything over 1k over the span of a few days.

Here's what zpool status spews out:

Code:
admin@truenas[~]$ sudo zpool status -v
  pool: DOM_DATA_ALL
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 02:15:33 with 13 errors on Sun Mar 24 22:20:14 2024
config:

        NAME                                      STATE     READ WRITE CKSUM
        DOM_DATA_ALL                              DEGRADED     0     0     0
          raidz1-0                                DEGRADED     0     0     0
            d18f6763-6044-408d-b946-567923f24c6a  DEGRADED     0     0   424  too many errors
            4231ec6f-ab3a-4c62-9cc7-39bcb90a4694  DEGRADED     0     0   424  too many errors
            c4cba4c4-866d-4614-9ffd-9322a81c3ee7  ONLINE       0     0     0
            dccafb81-634a-4f72-adb3-674eff475320  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        DOM_DATA_ALL/ix-applications/catalogs@auto-2024-03-24_01-00:<0x24e31>
        DOM_DATA_ALL/ix-applications/catalogs@auto-2024-03-24_01-00:<0x24d3e>
        DOM_DATA_ALL/ix-applications/catalogs@auto-2024-03-24_01-00:/github_com_truecharts_catalog_main/stable/rapidphotodownloader/4.2.0
        DOM_DATA_ALL/ix-applications/catalogs@auto-2024-03-24_01-00:/github_com_truecharts_catalog_main/stable/rcon-webadmin/8.1.1
        DOM_DATA_ALL/ix-applications/catalogs@auto-2024-03-24_01-00:/github_com_truecharts_catalog_main/stable/radarr/20.2.0
        DOM_DATA_ALL/ix-applications/catalogs@auto-2024-03-24_01-00:<0x24dcf>
        DOM_DATA_ALL/ix-applications/catalogs@auto-2024-03-24_01-00:/github_com_truecharts_catalog_main/stable/qwantify/3.1.2
        DOM_DATA_ALL/ix-applications/catalogs@auto-2024-03-24_01-00:/github_com_truecharts_catalog_main/stable/rdesktop
        DOM_DATA_ALL/ix-applications/catalogs@auto-2024-03-23_01-00:<0x24e31>
        DOM_DATA_ALL/ix-applications/catalogs@auto-2024-03-23_01-00:<0x24d3e>
        DOM_DATA_ALL/ix-applications/catalogs@auto-2024-03-23_01-00:/github_com_truecharts_catalog_main/stable/rapidphotodownloader/4.2.0
        DOM_DATA_ALL/ix-applications/catalogs@auto-2024-03-23_01-00:/github_com_truecharts_catalog_main/stable/rcon-webadmin/8.1.1
        DOM_DATA_ALL/ix-applications/catalogs@auto-2024-03-23_01-00:/github_com_truecharts_catalog_main/stable/radarr/20.2.0
        DOM_DATA_ALL/ix-applications/catalogs@auto-2024-03-23_01-00:<0x24dcf>
        DOM_DATA_ALL/ix-applications/catalogs@auto-2024-03-23_01-00:/github_com_truecharts_catalog_main/stable/qwantify/3.1.2
        DOM_DATA_ALL/ix-applications/catalogs@auto-2024-03-23_01-00:/github_com_truecharts_catalog_main/stable/rdesktop
        DOM_DATA_ALL/ix-applications/catalogs:<0x24e31>
        DOM_DATA_ALL/ix-applications/catalogs:<0x24d3e>
        /mnt/DOM_DATA_ALL/ix-applications/catalogs/github_com_truecharts_catalog_main/stable/rapidphotodownloader/4.2.0
        /mnt/DOM_DATA_ALL/ix-applications/catalogs/github_com_truecharts_catalog_main/stable/rcon-webadmin/8.1.1
        /mnt/DOM_DATA_ALL/ix-applications/catalogs/github_com_truecharts_catalog_main/stable/radarr/20.2.0
        DOM_DATA_ALL/ix-applications/catalogs:<0x24dcf>
        /mnt/DOM_DATA_ALL/ix-applications/catalogs/github_com_truecharts_catalog_main/stable/qwantify/3.1.2
        /mnt/DOM_DATA_ALL/ix-applications/catalogs/github_com_truecharts_catalog_main/stable/rdesktop

  pool: SSDPOOL
 state: ONLINE
config:

        NAME         STATE     READ WRITE CKSUM
        SSDPOOL      ONLINE       0     0     0
          nvme0n1p4  ONLINE       0     0     0

errors: No known data errors

  pool: boot-pool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:12 with 0 errors on Tue Mar 19 03:45:14 2024
config:

        NAME         STATE     READ WRITE CKSUM
        boot-pool    ONLINE       0     0     0
          nvme0n1p3  ONLINE       0     0     0

errors: No known data errors
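
To keep an eye on which devices are accumulating errors between scrubs, the config table above can be tallied with a short script. This is only a minimal Python sketch: the function name `devices_with_errors` is made up for illustration, the sample rows are copied from the output above, and it assumes plain integer counters (abbreviated counts such as `1.2K` are not handled).

```python
import re

# Matches one row of the "config:" table in `zpool status` output:
# NAME  STATE  READ  WRITE  CKSUM  [optional note]
# The header row is skipped automatically because "READ" is not a number.
ROW = re.compile(
    r"^\s+(?P<name>\S+)\s+(?P<state>[A-Z]+)\s+"
    r"(?P<read>\d+)\s+(?P<write>\d+)\s+(?P<cksum>\d+)(?:\s+(?P<note>.+))?$"
)


def devices_with_errors(status_text):
    """Return (name, state, read, write, cksum) for rows with a non-zero counter."""
    hits = []
    for line in status_text.splitlines():
        m = ROW.match(line)
        if not m:
            continue
        counts = [int(m[k]) for k in ("read", "write", "cksum")]
        if any(counts):
            hits.append((m["name"], m["state"], *counts))
    return hits


# Rows copied from the DOM_DATA_ALL pool in the output above.
SAMPLE = """\
        NAME                                      STATE     READ WRITE CKSUM
        DOM_DATA_ALL                              DEGRADED     0     0     0
          raidz1-0                                DEGRADED     0     0     0
            d18f6763-6044-408d-b946-567923f24c6a  DEGRADED     0     0   424  too many errors
            4231ec6f-ab3a-4c62-9cc7-39bcb90a4694  DEGRADED     0     0   424  too many errors
            c4cba4c4-866d-4614-9ffd-9322a81c3ee7  ONLINE       0     0     0
            dccafb81-634a-4f72-adb3-674eff475320  ONLINE       0     0     0
"""

for row in devices_with_errors(SAMPLE):
    print(row)
```

On the sample this reports only the two DEGRADED members, each with 424 checksum errors and no read or write errors.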

Now, I know about bad connections; I've read through A LOT of forum posts trying to pin this issue down. I don't think anything is wrong with the cabling, and I've re-seated the cables multiple times by now. I've run multiple scrubs and, as you can see, nothing was repaired and the errors remain.
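
The permanent-error list in the output above is shorter than it looks: its 24 entries are the same 8 objects, each listed once for the two snapshots and once for the live dataset. A minimal Python sketch of that grouping follows. The helper name `group_by_object` is hypothetical, and it assumes the live dataset is mounted at `/mnt/<dataset name>` (which matches the paths shown) and that the dataset name itself contains no colon.

```python
from collections import defaultdict

# Assumption: the live dataset is mounted at /mnt/<dataset name>,
# which matches the paths shown in the zpool status output.
DATASET = "DOM_DATA_ALL/ix-applications/catalogs"
MOUNTPOINT = "/mnt/" + DATASET


def group_by_object(error_lines, dataset=DATASET, mountpoint=MOUNTPOINT):
    """Map each damaged object to the datasets or snapshots that list it."""
    seen = defaultdict(list)
    for raw in error_lines:
        entry = raw.strip()
        if not entry:
            continue
        if entry.startswith(mountpoint + "/"):
            # Live file reported by its mounted path.
            where, obj = dataset, entry[len(mountpoint):]
        else:
            # "<dataset or snapshot>:<path or object id>"
            where, _, obj = entry.partition(":")
        seen[obj].append(where)
    return dict(seen)


# A few entries copied from the listing (two objects, three places each).
SAMPLE = [
    "DOM_DATA_ALL/ix-applications/catalogs@auto-2024-03-24_01-00:<0x24e31>",
    "DOM_DATA_ALL/ix-applications/catalogs@auto-2024-03-24_01-00:/github_com_truecharts_catalog_main/stable/radarr/20.2.0",
    "DOM_DATA_ALL/ix-applications/catalogs@auto-2024-03-23_01-00:<0x24e31>",
    "DOM_DATA_ALL/ix-applications/catalogs@auto-2024-03-23_01-00:/github_com_truecharts_catalog_main/stable/radarr/20.2.0",
    "DOM_DATA_ALL/ix-applications/catalogs:<0x24e31>",
    "/mnt/DOM_DATA_ALL/ix-applications/catalogs/github_com_truecharts_catalog_main/stable/radarr/20.2.0",
]

for obj, places in group_by_object(SAMPLE).items():
    print(f"{obj}: listed {len(places)} times")
```

Grouping this way makes it easier to see that all of the damage so far sits in the app catalog dataset rather than in separate files scattered across the pool.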

I need help moving forward from this. I think I've used up all of my options by now. I haven't done anything through SSH besides clearing the errors to see what would happen. I also haven't swapped the drive for another one, since I don't have the resources to do that for now. A fresh install is also out of the question, since I've got about 3TB of data that I can't store anywhere else for now. It NEEDS to be on that NAS; there's simply no other space to put it. Snapshots are also out of the question, since I've been at this for at least 2 weeks and I've only got 5 snapshots.

I have no clue what the best course of action would be. Run SeaTools to fix the drives up? Contact the drive manufacturer about the warranty? Do something specific in the file system? I'm lost.
 

Tor243

I might add that SMART tests complete with no errors. I've only done the short ones for now, but those don't report any errors.
 

Tor243

Some additional info after I tested today:

- The system has a GTX 1050 Ti installed. When I pull it out, leaving only the integrated graphics, no drives are detected besides the NVMe one.
- Extended SMART tests finished with no errors. I just ran them.
- Corruption seems to appear when transferring larger files. The whole backup (made with Macrium Reflect) is listed as corrupted when done and can't be repaired.
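
One way to check the last point independently of the backup software is to hash a large file before and after copying it. The following is a minimal Python sketch under stated assumptions: the helper names `sha256_of` and `copy_and_verify` are made up for illustration, and the demo copies a temporary file locally rather than to the NAS share. Reading a file back straight after writing it may be served from cache, so a match here does not prove the data on disk is intact.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in 1 MiB chunks so large files do not need to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src, dst):
    """Copy src to dst and report whether both files hash identically."""
    shutil.copyfile(src, dst)
    return sha256_of(src) == sha256_of(dst)


if __name__ == "__main__":
    # Demo with a throwaway 8 MiB file; replace the paths with a real
    # source file and a destination on the share being tested.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "source.bin"
        dst = Path(tmp) / "copy.bin"
        src.write_bytes(b"\xAB" * (8 * 1024 * 1024))
        print("copy matches source:", copy_and_verify(src, dst))
```

If the hashes differ only for large transfers, and differ in different places on repeated runs, that points more toward something in the data path (RAM, controller, cabling, power) than toward one specific bad sector.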
 