SOLVED The Message ID: ZFS-8000-8A indicates corrupted data exists in the current pool

Status
Not open for further replies.

larencio88

Dabbler
Joined
May 5, 2016
Messages
13
Good Morning,

After a power outage, I am seeing this error appear.
Code:
errors: Permanent errors have been detected in the following files:                                                                 
                                                                                                                                   
        tank/.system/cores:<0x0>                                     

All of my data appears to be OK, but I am a little worried. I am not sure what the damaged file is.

Code:
[root@freenas ~]# zpool status -xv                                                                                                 
  pool: tank                                                                                                                       
state: ONLINE                                                                                                                     
status: One or more devices has experienced an error resulting in data                                                             
        corruption.  Applications may be affected.                                                                                 
action: Restore the file in question if possible.  Otherwise restore the                                                           
        entire pool from backup.                                                                                                   
   see: http://illumos.org/msg/ZFS-8000-8A                                                                                         
  scan: scrub repaired 0 in 6h22m with 0 errors on Thu Feb 16 06:22:11 2017                                                         
config:                                                                                                                             
                                                                                                                                   
        NAME                                            STATE     READ WRITE CKSUM                                                 
        tank                                            ONLINE       0     0     0                                                 
          mirror-0                                      ONLINE       0     0     0                                                 
            gptid/728891ef-c1ed-11e6-8676-90e2ba10fb36  ONLINE       0     0     0                                                 
            gptid/73b26b53-c1ed-11e6-8676-90e2ba10fb36  ONLINE       0     0     0                                                 
            gptid/75887ab0-c1ed-11e6-8676-90e2ba10fb36  ONLINE       0     0     0                                                 
          mirror-1                                      ONLINE       0     0     0                                                 
            gptid/ef1ec14a-c1ed-11e6-8676-90e2ba10fb36  ONLINE       0     0     0                                                 
            gptid/a74d44ee-c7e2-11e6-83c0-90e2ba10fb36  ONLINE       0     0     0                                                 
            gptid/f3022fca-c1ed-11e6-8676-90e2ba10fb36  ONLINE       0     0     0                                                 
          mirror-2                                      ONLINE       0     0     0                                                 
            gptid/53daad39-c1ee-11e6-8676-90e2ba10fb36  ONLINE       0     0     0                                                 
            gptid/55c7424f-c1ee-11e6-8676-90e2ba10fb36  ONLINE       0     0     0                                                 
            gptid/718c0169-c728-11e6-8676-90e2ba10fb36  ONLINE       0     0     0                                                 
          mirror-3                                      ONLINE       0     0     0                                                 
            gptid/40382162-c714-11e6-8676-90e2ba10fb36  ONLINE       0     0     0                                                 
            gptid/4cfcec8d-c6fc-11e6-8676-90e2ba10fb36  ONLINE       0     0     0                                                 
            gptid/61623ed4-c606-11e6-8676-90e2ba10fb36  ONLINE       0     0     0                                                 
                                                                                                                                   
errors: Permanent errors have been detected in the following files:                                                                 
                                                                                                                                   
        tank/.system/cores:<0x0>                                                                                                   
[root@freenas ~]#                                              

Does anyone have a suggestion on how I should proceed?
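
For anyone who wants to script a check for this condition, here is a minimal Python sketch that pulls the entries listed under "errors:" out of `zpool status -v` text. It is not a FreeNAS tool; the function name `permanent_errors` is made up for illustration, and the sample text is trimmed from the output above. An entry such as `<0x0>` appears to be an object number printed in place of a file path, which is why no file name is shown.

```python
def permanent_errors(status_text):
    """Return the entries listed after an 'errors: Permanent errors ...' line."""
    entries = []
    in_errors = False
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("errors:"):
            # Only collect entries when the pool reports permanent errors.
            in_errors = "Permanent errors" in stripped
            continue
        if in_errors and stripped:
            entries.append(stripped)
    return entries


# Sample trimmed from the zpool status output posted above.
sample = """\
  pool: tank
 state: ONLINE
errors: Permanent errors have been detected in the following files:

        tank/.system/cores:<0x0>
"""

print(permanent_errors(sample))
```

The text before the colon is the dataset (`tank/.system/cores`), so the damage here sits in the system dataset rather than in user data.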
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Restore the file or delete it. And get a UPS so that next time this doesn't happen to your important data.

 

larencio88

Dabbler
Joined
May 5, 2016
Messages
13
I just updated the specs in my signature: I have a Supermicro X10SL7-F with the onboard controller. The drives are a mixture of Toshiba, Seagate, and Western Digital 3.0 TB models.
 

larencio88

Dabbler
Joined
May 5, 2016
Messages
13
Error message attached.
 

Attachments

  • Screenshot from 2017-02-22 11:39:19.png
    12.4 KB · Views: 395

Jailer

Not strong, but bad
Joined
Sep 12, 2014
Messages
4,977
Are you running periodic smart tests?
 

larencio88

Dabbler
Joined
May 5, 2016
Messages
13
Yes, I have set up the SMART test schedule.
 

Attachments

  • Screenshot from 2017-02-22 12:11:46.png
    15.1 KB · Views: 401

Jailer

Not strong, but bad
Joined
Sep 12, 2014
Messages
4,977
But have you checked your drive(s) to see whether the tests are actually running? Post the output of smartctl -a /dev/da7 and place the output in code tags.
 

larencio88

Dabbler
Joined
May 5, 2016
Messages
13
Here is the output.
Code:
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)   
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org   
   
=== START OF INFORMATION SECTION ===   
Model Family:  Toshiba 3.5" DT01ACA... Desktop HDD   
Device Model:  TOSHIBA DT01ACA300   
Serial Number:  Y5FM2LTGS   
LU WWN Device Id: 5 000039 fe3c8acd5   
Firmware Version: MX6OABB0   
User Capacity:  3,000,592,982,016 bytes [3.00 TB]   
Sector Sizes:  512 bytes logical, 4096 bytes physical   
Rotation Rate:  7200 rpm   
Form Factor:  3.5 inches   
Device is:  In smartctl database [for details use: -P show]   
ATA Version is:  ATA8-ACS T13/1699-D revision 4   
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)   
Local Time is:  Wed Feb 22 12:55:54 2017 CST   
SMART support is: Available - device has SMART capability.   
SMART support is: Enabled   
   
=== START OF READ SMART DATA SECTION ===   
SMART overall-health self-assessment test result: PASSED   
   
General SMART Values:   
Offline data collection status:  (0x85) Offline data collection activity   
  was aborted by an interrupting command from host.   
  Auto Offline Data Collection: Enabled.   
Self-test execution status:  (  0) The previous self-test routine completed   
  without error or no self-test has ever   
  been run.   
Total time to complete Offline   
data collection:  (22222) seconds.   
Offline data collection   
capabilities:  (0x5b) SMART execute Offline immediate.   
  Auto Offline data collection on/off support.   
  Suspend Offline collection upon new   
  command.   
  Offline surface scan supported.   
  Self-test supported.   
  No Conveyance Self-test supported.   
  Selective Self-test supported.   
SMART capabilities:  (0x0003) Saves SMART data before entering   
  power-saving mode.   
  Supports SMART auto save timer.   
Error logging capability:  (0x01) Error logging supported.   
  General Purpose Logging supported.   
Short self-test routine   
recommended polling time:  (  1) minutes.   
Extended self-test routine
recommended polling time:  ( 371) minutes.
SCT capabilities:  (0x003d) SCT Status supported.   
  SCT Error Recovery Control supported.   
  SCT Feature Control supported.   
  SCT Data Table supported.   
   
SMART Attributes Data Structure revision number: 16   
Vendor Specific SMART Attributes with Thresholds:   
ID# ATTRIBUTE_NAME  FLAG  VALUE WORST THRESH TYPE  UPDATED  WHEN_FAILED RAW_VALUE   
  1 Raw_Read_Error_Rate  0x000b  100  100  016  Pre-fail  Always  -  0   
  2 Throughput_Performance  0x0005  139  139  054  Pre-fail  Offline  -  71   
  3 Spin_Up_Time  0x0007  133  133  024  Pre-fail  Always  -  431 (Average 433)   
  4 Start_Stop_Count  0x0012  100  100  000  Old_age  Always  -  32   
  5 Reallocated_Sector_Ct  0x0033  100  100  005  Pre-fail  Always  -  0   
  7 Seek_Error_Rate  0x000b  100  100  067  Pre-fail  Always  -  0   
  8 Seek_Time_Performance  0x0005  124  124  020  Pre-fail  Offline  -  33   
  9 Power_On_Hours  0x0012  100  100  000  Old_age  Always  -  4334   
 10 Spin_Retry_Count  0x0013  100  100  060  Pre-fail  Always  -  0   
 12 Power_Cycle_Count  0x0032  100  100  000  Old_age  Always  -  32   
192 Power-Off_Retract_Count 0x0032  100  100  000  Old_age  Always  -  74   
193 Load_Cycle_Count  0x0012  100  100  000  Old_age  Always  -  74   
194 Temperature_Celsius  0x0002  166  166  000  Old_age  Always  -  36 (Min/Max 20/43)   
196 Reallocated_Event_Count 0x0032  100  100  000  Old_age  Always  -  0   
197 Current_Pending_Sector  0x0022  100  100  000  Old_age  Always  -  0   
198 Offline_Uncorrectable  0x0008  100  100  000  Old_age  Offline  -  0   
199 UDMA_CRC_Error_Count  0x000a  200  200  000  Old_age  Always  -  49   
   
SMART Error Log Version: 1   
ATA Error Count: 49 (device log contains only the most recent five errors)   
  CR = Command Register [HEX]   
  FR = Features Register [HEX]   
  SC = Sector Count Register [HEX]   
  SN = Sector Number Register [HEX]   
  CL = Cylinder Low Register [HEX]   
  CH = Cylinder High Register [HEX]   
  DH = Device/Head Register [HEX]   
  DC = Device Command Register [HEX]   
  ER = Error register [HEX]   
  ST = Status register [HEX]   
Powered_Up_Time is measured from power on, and printed as   
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,   
SS=sec, and sss=millisec. It "wraps" after 49.710 days.   
   
Error 49 occurred at disk power-on lifetime: 4323 hours (180 days + 3 hours)   
  When the command that caused the error occurred, the device was active or idle.   
   
  After command completion occurred, registers were:   
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 c1 9f 6d ad 03  Error: ICRC, ABRT at LBA = 0x03ad6d9f = 61697439   
   
  Commands leading to the command that caused the error were:   
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name   
  -- -- -- -- -- -- -- --  ----------------  --------------------   
  60 00 00 60 6d ad 40 00  05:38:57.828  READ FPDMA QUEUED   
  60 00 00 60 6e ad 40 00  05:38:57.799  READ FPDMA QUEUED   
  ea 00 00 00 00 00 00 00  05:38:57.776  FLUSH CACHE EXT   
  60 00 00 f0 4a ac 40 00  05:38:56.559  READ FPDMA QUEUED   
  60 00 00 f0 49 ac 40 00  05:38:56.531  READ FPDMA QUEUED   
   
Error 48 occurred at disk power-on lifetime: 4317 hours (179 days + 21 hours)   
  When the command that caused the error occurred, the device was active or idle.   
   
  After command completion occurred, registers were:   
  ER ST SC SN CL CH DH   
  -- -- -- -- -- -- --   
  84 51 41 3f d3 ce 0f  Error: ICRC, ABRT at LBA = 0x0fced33f = 265212735   
   
  Commands leading to the command that caused the error were:   
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name   
  -- -- -- -- -- -- -- --  ----------------  --------------------   
  60 00 00 80 d2 ce 40 00  00:09:37.597  READ FPDMA QUEUED   
  60 00 00 00 1b d2 40 00  00:09:37.584  READ FPDMA QUEUED   
  2f 00 01 10 00 00 00 00  00:09:37.579  READ LOG EXT   
  60 00 00 00 1b d2 40 00  00:09:37.562  READ FPDMA QUEUED   
  60 08 00 d8 39 cf 40 00  00:09:37.561  READ FPDMA QUEUED   
   
Error 47 occurred at disk power-on lifetime: 4317 hours (179 days + 21 hours)   
  When the command that caused the error occurred, the device was active or idle.   
   
  After command completion occurred, registers were:   
  ER ST SC SN CL CH DH   
  -- -- -- -- -- -- --   
  84 51 f1 0f 1b d2 0f  Error: ICRC, ABRT at LBA = 0x0fd21b0f = 265427727   
   
  Commands leading to the command that caused the error were:   
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name   
  -- -- -- -- -- -- -- --  ----------------  --------------------   
  60 00 00 00 1b d2 40 00  00:09:37.562  READ FPDMA QUEUED   
  60 08 00 d8 39 cf 40 00  00:09:37.561  READ FPDMA QUEUED   
  60 08 00 70 10 0d 40 00  00:09:37.538  READ FPDMA QUEUED   
  60 08 00 b0 39 cf 40 00  00:09:37.518  READ FPDMA QUEUED   
  60 08 00 90 ba 50 40 00  00:09:37.490  READ FPDMA QUEUED

Error 46 occurred at disk power-on lifetime: 4317 hours (179 days + 21 hours)   
  When the command that caused the error occurred, the device was active or idle.   
   
  After command completion occurred, registers were:   
  ER ST SC SN CL CH DH   
  -- -- -- -- -- -- --   
  84 51 31 ef 2a ca 0f  Error: ICRC, ABRT at LBA = 0x0fca2aef = 264907503   
   
  Commands leading to the command that caused the error were:   
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name   
  -- -- -- -- -- -- -- --  ----------------  --------------------   
  60 00 00 20 2a ca 40 00  00:09:37.387  READ FPDMA QUEUED   
  60 00 00 20 29 ca 40 00  00:09:37.386  READ FPDMA QUEUED   
  60 08 00 f8 28 ca 40 00  00:09:37.310  READ FPDMA QUEUED   
  60 10 00 a8 74 50 40 00  00:09:37.282  READ FPDMA QUEUED   
  60 08 00 a0 74 50 40 00  00:09:37.266  READ FPDMA QUEUED   
   
Error 45 occurred at disk power-on lifetime: 4317 hours (179 days + 21 hours)   
  When the command that caused the error occurred, the device was active or idle.   
   
  After command completion occurred, registers were:   
  ER ST SC SN CL CH DH   
  -- -- -- -- -- -- --   
  84 51 a1 9f 99 c8 0f  Error: ICRC, ABRT at LBA = 0x0fc8999f = 264804767   
   
  Commands leading to the command that caused the error were:   
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name   
  -- -- -- -- -- -- -- --  ----------------  --------------------   
  60 00 00 40 99 c8 40 00  00:09:37.159  READ FPDMA QUEUED   
  60 08 00 00 92 0c 40 00  00:09:37.156  READ FPDMA QUEUED   
  60 08 00 48 f6 c8 40 00  00:09:37.154  READ FPDMA QUEUED   
  60 08 00 50 f6 c8 40 00  00:09:37.136  READ FPDMA QUEUED   
  60 08 00 00 da ec 40 00  00:09:37.100  READ FPDMA QUEUED   
   
SMART Self-test log structure revision number 1   
Num  Test_Description  Status  Remaining  LifeTime(hours)  LBA_of_first_error   
# 1  Extended offline  Completed without error  00%  4329  -   
# 2  Short offline  Completed without error  00%  4295  -   
# 3  Short offline  Completed without error  00%  4127  -   
# 4  Extended offline  Completed without error  00%  4038  -   
# 5  Short offline  Completed without error  00%  3959  -   
# 6  Short offline  Completed without error  00%  3719  -   
# 7  Extended offline  Completed without error  00%  3630  -   
# 8  Short offline  Completed without error  00%  3551  -   
# 9  Short offline  Completed without error  00%  3383  -   
#10  Extended offline  Completed without error  00%  3294  -   
#11  Short offline  Completed without error  00%  3215  -   
#12  Short offline  Completed without error  00%  2975  -   
#13  Extended offline  Completed without error  00%  2888  -
#14  Short offline  Completed without error  00%  2836  -   
#15  Short offline  Completed without error  00%  2668  -   
#16  Short offline  Completed without error  00%  1900  -   
#17  Extended offline  Completed without error  00%  1763  -   
#18  Short offline  Completed without error  00%  1685  -   
#19  Short offline  Completed without error  00%  1517  -   
#20  Extended offline  Completed without error  00%  1428  -   
#21  Short offline  Completed without error  00%  1349  -   
   
SMART Selective self-test log data structure revision number 1   
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS   
  1  0  0  Not_testing   
  2  0  0  Not_testing   
  3  0  0  Not_testing   
  4  0  0  Not_testing   
  5  0  0  Not_testing   
Selective self-test flags (0x0):   
  After scanning selected spans, do NOT read-scan remainder of disk.   
If Selective self-test is pending on power-up, resume after 0 minute delay. 
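
As a reading aid for the error entries above, here is a small Python sketch that decodes one "ER ST SC SN CL CH DH" register line. It assumes the 28-bit LBA layout (low nibble of DH, then CH, CL, SN), which reproduces the LBA values smartctl printed in this log, and it only checks the two error-register bits that appear here (bit 7 as ICRC, bit 2 as ABRT). The function name `decode_error_registers` is hypothetical.

```python
def decode_error_registers(hex_bytes):
    """Decode an 'ER ST SC SN CL CH DH' register dump from a smartctl error entry."""
    er, st, sc, sn, cl, ch, dh = (int(b, 16) for b in hex_bytes.split())
    flags = []
    if er & 0x80:
        flags.append("ICRC")  # bit 7: interface CRC error on the link
    if er & 0x04:
        flags.append("ABRT")  # bit 2: command aborted
    # Assumed 28-bit LBA layout: DH low nibble, then CH, CL, SN.
    lba = ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn
    return flags, lba


# Register line from "Error 49" in the output above.
flags, lba = decode_error_registers("84 51 c1 9f 6d ad 03")
print(flags, lba)
```

All five logged errors carry the same ER value of 84, so each one is an interface CRC error followed by an abort, not a media error.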
 

Jailer

Not strong, but bad
Joined
Sep 12, 2014
Messages
4,977
You have a CRC error count of 49 on that drive. Check your cabling. Also what are you running for a power supply?
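
To make that check repeatable, here is a minimal Python sketch that reads the raw value of one attribute from saved `smartctl -a` text. The helper name `raw_attribute` is made up for illustration, and it assumes the ten-column attribute layout shown in the output above. The sample rows are copied from that output.

```python
def raw_attribute(smart_text, name):
    """Return the RAW_VALUE column (as an int) for one attribute row, or None."""
    for line in smart_text.splitlines():
        fields = line.split()
        # Row layout: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] == name:
            return int(fields[9])
    return None


# Rows copied from the smartctl output posted above.
sample = """\
  5 Reallocated_Sector_Ct  0x0033  100  100  005  Pre-fail  Always  -  0
197 Current_Pending_Sector  0x0022  100  100  000  Old_age  Always  -  0
199 UDMA_CRC_Error_Count  0x000a  200  200  000  Old_age  Always  -  49
"""

print(raw_attribute(sample, "UDMA_CRC_Error_Count"))
```

As far as I know this counter is cumulative, so the useful test is to note the value, reseat or replace the cable, and compare again later. If the number keeps climbing, the link problem is still there.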
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
Does anyone do power-cut testing on FreeNAS? Supposedly ZFS should have no problems with it.
 

larencio88

Dabbler
Joined
May 5, 2016
Messages
13
The Power supply is a Seasonic S12II-620 watts. It's possible the startup current for 12 drives momentarily overloads it. It has been running over a year without any other problems. What puzzles me is how the error message says the corrupted file is uncorrectable, yet the pool is triple mirrored. The file that is corrupted appears to be a system file, not part of my data.
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Quote:
The Power supply is a Seasonic S12II-620 watts. It's possible the startup current for 12 drives momentarily overloads it. It has been running over a year without any other problems. What puzzles me is how the error message says the corrupted file is uncorrectable, yet the pool is triple mirrored. The file that is corrupted appears to be a system file, not part of my data.

Power cuts should not cause problems for ZFS. At most you lose the last transaction group. What you're missing here is that your system has had these problems for some time and you are only now noticing them.

This thread has gone on too long. Please read the forum rules and post your hardware specs and FreeNAS version. Also post the output of zpool status.

 

larencio88

Dabbler
Joined
May 5, 2016
Messages
13
Code:
Build        FreeNAS-9.10.2-U1 (86c7ef5)
Bootdrive    Intel SSD 710 100GB
Motherboard  Supermicro X10SL7-F with onboard SMC2308, IT Firmware
Platform     Intel(R) Xeon(R) CPU E3-1226 v3 @ 3.30GHz
Memory       32697MB ECC
Storage      4 Mirrors of 3 x 3.0 TB


Code:
[root@freenas ~]# zpool status -xv                                                                                                 
  pool: tank                                                                                                                       
state: ONLINE                                                                                                                     
status: One or more devices has experienced an error resulting in data                                                             
        corruption.  Applications may be affected.                                                                                 
action: Restore the file in question if possible.  Otherwise restore the                                                           
        entire pool from backup.                                                                                                   
   see: http://illumos.org/msg/ZFS-8000-8A                                                                                         
  scan: scrub repaired 0 in 6h22m with 0 errors on Thu Feb 16 06:22:11 2017                                                         
config:                                                                                                                             
                                                                                                                                   
        NAME                                            STATE     READ WRITE CKSUM                                                 
        tank                                            ONLINE       0     0     0                                                 
          mirror-0                                      ONLINE       0     0     0                                                 
            gptid/728891ef-c1ed-11e6-8676-90e2ba10fb36  ONLINE       0     0     0                                                 
            gptid/73b26b53-c1ed-11e6-8676-90e2ba10fb36  ONLINE       0     0     0                                                 
            gptid/75887ab0-c1ed-11e6-8676-90e2ba10fb36  ONLINE       0     0     0                                                 
          mirror-1                                      ONLINE       0     0     0                                                 
            gptid/ef1ec14a-c1ed-11e6-8676-90e2ba10fb36  ONLINE       0     0     0                                                 
            gptid/a74d44ee-c7e2-11e6-83c0-90e2ba10fb36  ONLINE       0     0     0                                                 
            gptid/f3022fca-c1ed-11e6-8676-90e2ba10fb36  ONLINE       0     0     0                                                 
          mirror-2                                      ONLINE       0     0     0                                                 
            gptid/53daad39-c1ee-11e6-8676-90e2ba10fb36  ONLINE       0     0     0                                                 
            gptid/55c7424f-c1ee-11e6-8676-90e2ba10fb36  ONLINE       0     0     0                                                 
            gptid/718c0169-c728-11e6-8676-90e2ba10fb36  ONLINE       0     0     0                                                 
          mirror-3                                      ONLINE       0     0     0                                                 
            gptid/40382162-c714-11e6-8676-90e2ba10fb36  ONLINE       0     0     0                                                 
            gptid/4cfcec8d-c6fc-11e6-8676-90e2ba10fb36  ONLINE       0     0     0                                                 
            gptid/61623ed4-c606-11e6-8676-90e2ba10fb36  ONLINE       0     0     0                                                 
                                                                                                                                   
errors: Permanent errors have been detected in the following files:                                                                 
                                                                                                                                   
        tank/.system/cores:<0x0>                                                                                                   
[root@freenas ~]#                                                    
 

larencio88

Dabbler
Joined
May 5, 2016
Messages
13
Quote:
What is the ashift on this pool?

I don't know the command to check that; the install was done using the default settings.
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Quote:
What is the ashift on this pool?

Super random question that has nothing to do with the problem.

 

Makki

Explorer
Joined
Nov 8, 2016
Messages
57
Quote:
Does anyone do power-cut testing on FreeNAS? Supposedly ZFS should have no problems with it.

I did several, both physical and virtual (powering off ESXi "cold"), with several TB of data on the pool and a read/write load of ~100MB/s, and never had any issue.

Based on the values you've posted, I'd also suggest inspecting the cabling.

Michael

P.S.: This was done for DR testing only. The systems all have a UPS, of course, but in my experience UPSes fail more often than mains power here :) And anything that wasn't tested doesn't work.
 

larencio88

Dabbler
Joined
May 5, 2016
Messages
13
I am posting a copy of the system log. Scanning through it, these are my thoughts.

  1. The log begins 20 days before the power outage. There are no entries during this time period for disk errors.
  2. Feb 19 22:59:22 the power outage occurs.
  3. The UPS initiates a graceful shutdown.
  4. Feb 19 23:09:22 there is an I/O error during shutdown.
  5. When the server was powered up 2 days later, at Feb 21 20:31:55, there were multiple disk errors on da7.

Random thoughts: Can a bad cable on one drive corrupt a file on the pool? Could it be a failing drive itself? Could a drive without TLER cause a hang?
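
The gaps between those log timestamps can be worked out with a short Python sketch. Syslog lines carry no year, so 2017 is an assumption taken from the scrub date in the zpool output; the `parse` helper is made up for illustration, and `%b` expects English month names.

```python
from datetime import datetime

FMT = "%Y %b %d %H:%M:%S"


def parse(stamp, year=2017):
    """Parse a syslog-style timestamp; the year is assumed, not in the log."""
    return datetime.strptime(f"{year} {stamp}", FMT)


outage = parse("Feb 19 22:59:22")    # power outage
io_error = parse("Feb 19 23:09:22")  # I/O error during shutdown
power_up = parse("Feb 21 20:31:55")  # server powered up again

print(io_error - outage)    # time from outage to the I/O error
print(power_up - io_error)  # time the server stayed off afterwards
```

So the I/O error was logged exactly ten minutes after the outage, and the server was then off for just under two days.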

Maybe this post will help someone troubleshooting in the future.

Edit: It looks like the log file is too long to post in code brackets (it exceeds 3000 characters), so I am attaching a text file instead.
 

Attachments

  • Freenaslog.txt
    99.6 KB · Views: 313

rs225

Guru
Joined
Jun 28, 2014
Messages
878
Quote:
Super random question that has nothing to do with the problem.

Thank you. ;) The phenomenon is torn writes with write amplification. Simply put, if I write one 512-byte sector to an AF (Advanced Format) disk, 8 sectors are at risk of loss if that write fails. The same applies if a 4K write is unaligned, except that 16 sectors are at risk. Another scenario is a drive that automatically claims completion of such writes to avoid exposing the actual latency, in the expectation that more writes will follow to the same track. There are three makes of drive in this pool.
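
The sector arithmetic can be sketched in a few lines of Python. This is only an illustration of the counting, using the 512-byte logical and 4096-byte physical sizes reported by smartctl for the drive in this thread; the function name `logical_sectors_at_risk` is hypothetical.

```python
def logical_sectors_at_risk(offset, length, logical=512, physical=4096):
    """Count the logical sectors held by every physical sector a write touches."""
    first_phys = offset // physical
    last_phys = (offset + length - 1) // physical
    return (last_phys - first_phys + 1) * (physical // logical)


print(logical_sectors_at_risk(0, 512))     # one 512-byte write
print(logical_sectors_at_risk(512, 4096))  # 4K write, misaligned by one sector
print(logical_sectors_at_risk(0, 4096))    # 4K write, aligned
```

The three calls give 8, 16 and 8: the misaligned 4K write straddles two physical sectors, so twice as many logical sectors depend on it completing.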

Ashift should be viewable by zdb -e tank | grep ashift
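
Once you have that output, ashift is just the base-2 logarithm of the sector size ZFS uses for a vdev (9 means 512 bytes, 12 means 4096 bytes). A minimal Python sketch for reading it out of saved text follows; the sample lines are hypothetical and are not this pool's actual values, and the name `ashift_values` is made up for illustration.

```python
import re


def ashift_values(zdb_text):
    """Return (ashift, sector size in bytes) for each 'ashift: N' line found."""
    return [(int(n), 2 ** int(n)) for n in re.findall(r"ashift:\s*(\d+)", zdb_text)]


# Hypothetical sample lines, NOT taken from the pool in this thread.
sample = "            ashift: 12\n            ashift: 9\n"

print(ashift_values(sample))
```

On drives with 4096-byte physical sectors like the one above, an ashift of 9 would mean ZFS issues 512-byte-aligned writes, which is the case the torn-write concern applies to.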

Quote:
Feb 19 23:09:22 there is an I/O error during shutdown.

I'm not sure that is a real ZFS I/O error. In any case, even if there was an I/O error, the theory is there should be no corruption to the pool. But if there was an I/O error and corruption, then there is probably a problem in the code, even if the window of exposure is merely 'during a shutdown'.
 