Clear Degraded Status After Data Corruption?

FlexibleToast

Dabbler
Joined
Aug 10, 2014
Messages
32
A storm rolled through and I got some weird power interruptions. When FreeNAS came back online, it came up without the LSI controller; only the motherboard drives were up, which is 4 of the 6 in that RAIDZ2 pool, so of course it was running in degraded mode. Power was then interrupted again, apparently during a snapshot. All the drives are mounted now and no actual data seems lost. When I run zpool status tank0 I get:

Code:
  pool: tank0
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub repaired 0 in 0 days 00:02:37 with 110 errors on Mon Jul  1 18:06:23 2019
config:

        NAME                                            STATE     READ WRITE CKSUM
        tank0                                           DEGRADED     0     0   332
          raidz2-0                                      DEGRADED     0     0   664
            gptid/5235ccd8-44db-11e8-af65-38d547e287d1  DEGRADED     0     0     0  too many errors
            gptid/d42750fd-827b-11e8-a06a-38d547e287d1  DEGRADED     0     0     0  too many errors
            gptid/303e97ac-443f-11e8-af65-38d547e287d1  DEGRADED     0     0     0  too many errors
            gptid/a346846f-1462-11e9-931c-38d547e287d1  DEGRADED     0     0     0  too many errors
            gptid/423e4174-9e85-11e8-a368-38d547e287d1  ONLINE       0     0     5
            gptid/cf3e9fee-b3d7-11e8-b41d-38d547e287d1  ONLINE       0     0     4

errors: 110 data errors, use '-v' for a list
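
For anyone who wants to pull the per-device counters out of a status dump like the one above, here is a minimal sketch. It assumes the counters are plain integers (real output can abbreviate large values, which this does not handle), and the names `ROW` and `parse_rows` are made up for illustration. Only a few rows of the table are copied in to keep it short.

```python
import re

# Rows copied from the `zpool status tank0` output above (shortened).
STATUS = """\
        NAME                                            STATE     READ WRITE CKSUM
        tank0                                           DEGRADED     0     0   332
          raidz2-0                                      DEGRADED     0     0   664
            gptid/5235ccd8-44db-11e8-af65-38d547e287d1  DEGRADED     0     0     0  too many errors
            gptid/423e4174-9e85-11e8-a368-38d547e287d1  ONLINE       0     0     5
"""

# name, state, three integer counters, optional trailing note.
# The header row does not match because READ/WRITE/CKSUM are not digits.
ROW = re.compile(r"^\s*(\S+)\s+([A-Z]+)\s+(\d+)\s+(\d+)\s+(\d+)(?:\s+(.*))?$")


def parse_rows(text):
    """Return one dict per device row in the config table."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            name, state, rd, wr, ck, note = m.groups()
            rows.append({
                "name": name,
                "state": state,
                "read": int(rd),
                "write": int(wr),
                "cksum": int(ck),
                "note": (note or "").strip(),
            })
    return rows


# Print anything that is not ONLINE or that has checksum errors.
for row in parse_rows(STATUS):
    if row["state"] != "ONLINE" or row["cksum"] > 0:
        print(row["name"], row["state"], row["cksum"], row["note"])
```

Note that in this output the two ONLINE disks are the ones with nonzero CKSUM counts, while the four DEGRADED disks show zero, so filtering on state alone would miss them.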


Using the '-v' option gives this extra info:

Code:
errors: Permanent errors have been detected in the following files:

        tank0/VM-Storage/mediamgr-vm:<0x1>
        tank0/Backup/Joseph@auto-20190701.1304-2w:/Jool-Win10/WindowsImageBackup/jool/Backup 2019-07-01 000010/a04f5669-91fa-4f2b-97e9-2c99bff99008.vhdx
        tank0/Backup/Joseph@auto-20190701.1304-2w:/Jool-Win10/JOOL/Backup Set 2019-06-09 190005/Backup Files 2019-06-30 190005/Backup files 13.zip
        tank0/Backup/Joseph@auto-20190701.1304-2w:/Jool-Win10/JOOL/Backup Set 2019-06-09 190005/Backup Files 2019-06-30 190005/Backup files 35.zip
        tank0/Backup/Joseph@auto-20190701.1304-2w:/Joseph/JOOL/Configuration/Catalog2.edb


It looks like yesterday's snapshot of a Windows backup got corrupted. Not a big deal; I can discard it and just perform a new backup. I don't know what the first entry is, though. I know it refers to the zvol for my "mediamgr" VM, but that machine is running and everything seems to be working.
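
To make the split between the snapshot entries and the zvol entry explicit, here is a rough sketch that groups the permanent-error lines by dataset and snapshot. It assumes each entry has the form `dataset[@snapshot]:target` and that the first colon is the separator (dataset names may legally contain colons, so this is not safe in general). The helper names `split_entry` and `group_by_source` are hypothetical, and only three of the five entries are copied in.

```python
from collections import defaultdict

# Entries copied from the `zpool status -v` output above (subset).
ERRORS = [
    "tank0/VM-Storage/mediamgr-vm:<0x1>",
    "tank0/Backup/Joseph@auto-20190701.1304-2w:/Jool-Win10/JOOL/Backup Set 2019-06-09 190005/Backup Files 2019-06-30 190005/Backup files 13.zip",
    "tank0/Backup/Joseph@auto-20190701.1304-2w:/Joseph/JOOL/Configuration/Catalog2.edb",
]


def split_entry(entry):
    """Split 'dataset[@snapshot]:target' at the first colon (assumption)."""
    source, _, target = entry.partition(":")
    dataset, _, snapshot = source.partition("@")
    return dataset, snapshot or None, target


def group_by_source(entries):
    """Map (dataset, snapshot) to the list of affected targets."""
    groups = defaultdict(list)
    for entry in entries:
        dataset, snapshot, target = split_entry(entry)
        groups[(dataset, snapshot)].append(target)
    return dict(groups)


for (dataset, snapshot), targets in group_by_source(ERRORS).items():
    where = f"{dataset}@{snapshot}" if snapshot else dataset
    print(where, len(targets))
```

The zvol entry comes out with no snapshot and a target of `<0x1>`, which is an object number rather than a file path, so it belongs to the live zvol and not to the backup snapshot.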
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
For the future, I would suggest putting all drives on a single controller; the SAS controller would be my recommendation.
Also, it is a very good idea to get a UPS to ride out random power issues.

As for the pool, you should be able to use the zpool clear command (from the command line) to clear the errors. Here is a reference:

https://docs.oracle.com/cd/E19253-01/819-5461/gazge/index.html
 

FlexibleToast

Dabbler
Joined
Aug 10, 2014
Messages
32
Yeah, I understand, but life is about compromises, isn't it? The SAS controller has 8 ports and I have 12 drives. It was on a UPS; I think the battery needs replacing. It is nearing 4 years old at this point, though that's not exactly something at the front of my mind.

Anyway, yes, I did the zpool clear. I forgot to mention it because it hadn't worked before. Well, it still didn't work (kind of). It does its job and clears the error status, but then ZFS does its job, scrubs, finds those errors again, and puts the pool back in a degraded state. Would just deleting those files fix the issue? I can always just recreate a backup of a working system. I don't know what I would attempt with the zvol, though. Revert to a previous snapshot? I could back up and restore the configs of the apps running on it pretty trivially.
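
The sequence being asked about could be sketched roughly like this. It only prints the commands and does not run them; the function name `recovery_commands` is made up. It covers the corrupted backup snapshot only and does nothing about the zvol entry. Whether this is enough to bring the pool out of DEGRADED is an assumption, not something confirmed in this thread, and it may take more than one scrub before the error list empties.

```python
import shlex


def recovery_commands(pool, snapshot):
    """Build (not run) the commands: drop the bad snapshot, clear, scrub, re-check."""
    return [
        ["zfs", "destroy", snapshot],        # discard the corrupted snapshot
        ["zpool", "clear", pool],            # reset the error counters
        ["zpool", "scrub", pool],            # re-verify everything still referenced
        ["zpool", "status", "-v", pool],     # see which errors remain
    ]


# Dry run: print the commands for review before typing any of them.
for cmd in recovery_commands("tank0", "tank0/Backup/Joseph@auto-20190701.1304-2w"):
    print(shlex.join(cmd))
```

Since `zfs destroy` is irreversible, printing first and copying the lines by hand seemed safer than wiring this to `subprocess`.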
 