Critical alerts | unreadable (pending) sectors on two hard drives in RAID5

Gee3

Cadet
Joined
Sep 20, 2020
Messages
8
Hello dear forum members,

This week I powered up my secondary NAS, an HP ProLiant MicroServer N54L, for the first time in a while, installed the latest version of FreeNAS, and wanted to store 12 TB of backup data on it.

This only needs to work once; after that, the existing 3 TB drives in the NAS will be replaced anyway with 6 TB models that are many years newer.

During the backup, the following messages appeared:

Bildschirmfoto 2020-09-20 um 12.29.54.png

Is it possible to tell from the smartctl output below whether the data will survive about two more weeks and one full read?

Thanks for your help, and I wish you all a nice Sunday


Gee3

root@gee-n54l:~ # smartctl -a /dev/ada1
smartctl 7.0 2018-12-30 r4883 [FreeBSD 11.3-RELEASE-p11 amd64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD30EFRX-68EUZN0
Serial Number:    WD-WCC4N1LEP5FR
LU WWN Device Id: 5 0014ee 2640055f1
Firmware Version: 82.00A82
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Sun Sep 20 12:19:29 2020 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (43200) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 433) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x703d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   198   198   051    Pre-fail  Always       -       3433
  3 Spin_Up_Time            0x0027   180   180   021    Pre-fail  Always       -       5991
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       14
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   081   081   000    Old_age   Always       -       14301
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       12
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       565
194 Temperature_Celsius     0x0022   121   114   000    Old_age   Always       -       29
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       17
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

SMART Error Log Version: 1
ATA Error Count: 45 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 45 occurred at disk power-on lifetime: 14299 hours (595 days + 19 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 f8 d4 08 48  Error: UNC at LBA = 0x0808d4f8 = 134796536

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 00 90 d4 08 48 08   1d+14:28:40.266  READ DMA
  c8 00 00 90 d4 08 48 08   1d+14:28:36.870  READ DMA
  c8 00 00 90 d4 08 48 08   1d+14:28:33.473  READ DMA
  c8 00 00 90 d4 08 48 08   1d+14:28:30.077  READ DMA
  c8 00 00 90 d4 08 48 08   1d+14:28:26.690  READ DMA

Error 44 occurred at disk power-on lifetime: 14299 hours (595 days + 19 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 f8 d4 08 48  Error: UNC at LBA = 0x0808d4f8 = 134796536

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 00 90 d4 08 48 08   1d+14:28:36.870  READ DMA
  c8 00 00 90 d4 08 48 08   1d+14:28:33.473  READ DMA
  c8 00 00 90 d4 08 48 08   1d+14:28:30.077  READ DMA
  c8 00 00 90 d4 08 48 08   1d+14:28:26.690  READ DMA

Error 43 occurred at disk power-on lifetime: 14299 hours (595 days + 19 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 f8 d4 08 48  Error: UNC at LBA = 0x0808d4f8 = 134796536

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 00 90 d4 08 48 08   1d+14:28:33.473  READ DMA
  c8 00 00 90 d4 08 48 08   1d+14:28:30.077  READ DMA
  c8 00 00 90 d4 08 48 08   1d+14:28:26.690  READ DMA

Error 42 occurred at disk power-on lifetime: 14299 hours (595 days + 19 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 f8 d4 08 48  Error: UNC at LBA = 0x0808d4f8 = 134796536

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 00 90 d4 08 48 08   1d+14:28:30.077  READ DMA
  c8 00 00 90 d4 08 48 08   1d+14:28:26.690  READ DMA

Error 41 occurred at disk power-on lifetime: 14299 hours (595 days + 19 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 f8 d4 08 48  Error: UNC at LBA = 0x0808d4f8 = 134796536

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 00 90 d4 08 48 08   1d+14:28:26.690  READ DMA

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

root@gee-n54l:~ # smartctl -a /dev/ada4
smartctl 7.0 2018-12-30 r4883 [FreeBSD 11.3-RELEASE-p11 amd64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD30EFRX-68EUZN0
Serial Number:    WD-WCC4N2ZHN1KN
LU WWN Device Id: 5 0014ee 261585fc5
Firmware Version: 82.00A82
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Sun Sep 20 12:38:42 2020 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (40080) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 402) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x703d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       6
  3 Spin_Up_Time            0x0027   182   181   021    Pre-fail  Always       -       5875
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       50
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   072   072   000    Old_age   Always       -       20561
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       29
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       1
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       1439
194 Temperature_Celsius     0x0022   122   112   000    Old_age   Always       -       28
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

SMART Error Log Version: 1
ATA Error Count: 2
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 2 occurred at disk power-on lifetime: 20558 hours (856 days + 14 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 e8 ce d0 e0  Error: UNC at LBA = 0x00d0cee8 = 13684456

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 00 10 ce d0 e0 00   1d+13:46:31.337  READ DMA
  c8 00 00 10 ce d0 e0 00   1d+13:46:27.968  READ DMA
  c8 00 00 10 cd d0 e0 00   1d+13:46:27.967  READ DMA
  c8 00 00 10 cc d0 e0 00   1d+13:46:27.967  READ DMA
  c8 00 00 10 cb d0 e0 00   1d+13:46:27.951  READ DMA

Error 1 occurred at disk power-on lifetime: 20558 hours (856 days + 14 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 e8 ce d0 e0  Error: UNC at LBA = 0x00d0cee8 = 13684456

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 00 10 ce d0 e0 00   1d+13:46:27.968  READ DMA
  c8 00 00 10 cd d0 e0 00   1d+13:46:27.967  READ DMA
  c8 00 00 10 cc d0 e0 00   1d+13:46:27.967  READ DMA
  c8 00 00 10 cb d0 e0 00   1d+13:46:27.951  READ DMA
  c8 00 00 10 ca d0 e0 00   1d+13:46:27.950  READ DMA

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,972
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

According to this data, you are not running any SMART tests, which is a sure sign that the system has not been configured properly.

Run a SMART long/extended test on all your drives; they can all be run at the same time. The command is smartctl -t long /dev/ada1, and it will take a minimum of 433 minutes (~7 hours 13 minutes) if all goes well and you are not using the system. If the system is active, it will take longer for the drives to scan the entire surface. Use smartctl -a /dev/ada1 to view the results.
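As a rough illustration of "run it on all drives", here is a minimal Python sketch that only builds and prints the smartctl commands, one per disk. The device list is an assumption (the thread names ada1 to ada4; check which disks your own system has), and the helper names `long_test_commands` and `minutes_to_hm` are made up for this example.

```python
def long_test_commands(devices):
    """Return one smartctl argument list per device for a long self-test."""
    return [["smartctl", "-t", "long", f"/dev/{name}"] for name in devices]


def minutes_to_hm(minutes):
    """Split a polling time in minutes into (hours, minutes)."""
    return divmod(minutes, 60)


# Assumed device names; replace with the disks actually present.
DEVICES = ["ada1", "ada2", "ada3", "ada4"]

for cmd in long_test_commands(DEVICES):
    # Dry run: print the command instead of executing it.
    print(" ".join(cmd))

hours, mins = minutes_to_hm(433)  # polling time reported for ada1
print(f"Earliest completion after about {hours} h {mins} min")
```

To actually start the tests, each argument list could be passed to `subprocess.run(cmd, check=True)` as root; the tests then run inside the drives, so the commands return immediately.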

Lastly, read the FreeNAS user manual and set up routine SMART Short and Long tests. I recommend a daily Short test (takes about 2 minutes) and a weekly Long test (for your drives a minimum of 433 minutes).

Current_Pending_Sector is at 17, as you know. Pending sectors on their own are not a problem, but they are a sign that the drive may be failing. I suspect that when you run the SMART Long test, it will not complete to 100% and will fail. If it does complete, then your drive is fine right now. If the drive is still under warranty, you can RMA it and get it replaced. Just make sure you specify that you do not want an SMR drive.
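For anyone who wants to keep an eye on these counters without reading the whole report, here is a small Python sketch that pulls the raw values out of the attribute table. It assumes the usual ten-column layout of `smartctl -A` output, and the sample rows are copied from the ada1 report in this thread; `parse_smart_attributes` is just an illustrative name.

```python
def parse_smart_attributes(text):
    """Map attribute name -> raw value (int) from smartctl attribute rows."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit():
            try:
                attrs[fields[1]] = int(fields[9])
            except ValueError:
                continue  # raw value is not a plain integer; skip it
    return attrs


# Rows taken from the ada1 output posted above.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       17
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
"""

attrs = parse_smart_attributes(SAMPLE)
for name in ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable"):
    flag = "CHECK" if attrs[name] > 0 else "ok"
    print(f"{name}: {attrs[name]} ({flag})")
```

A non-zero value is only a prompt to look closer, in line with the point above that pending sectors are a warning sign rather than proof of failure.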
 

Gee3

Cadet
Joined
Sep 20, 2020
Messages
8
The SMART Extended self-test has been started for ada1 and ada4.
I just hope the drives will pass it. They're definitely out of any possible RMA period.
Thanks for that tip!
As for the configuration of regular short and extended tests: I may set them up when I set up the N54L box again with the new drives.
Up until the current backup task, the system had been shut down completely and not used for about 2 years... And before that I was just playing around a little with FreeNAS ;)
So I didn't take the time to get it running perfectly smoothly with all the possible (and probably recommended) tweaks...
Best regards
Gee3
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,972
The SMART Extended self-test is started for ada1 and ada4.
You should do all your drives, including ada2 and ada3.

Good Luck.
 

Gee3

Cadet
Joined
Sep 20, 2020
Messages
8
The results of the extended self-test are:
root@gee-n54l:~ # smartctl -a /dev/ada1
smartctl 7.0 2018-12-30 r4883 [FreeBSD 11.3-RELEASE-p11 amd64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD30EFRX-68EUZN0
Serial Number:    WD-WCC4N1LEP5FR
LU WWN Device Id: 5 0014ee 2640055f1
Firmware Version: 82.00A82
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Mon Sep 21 09:27:15 2020 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 121) The previous self-test completed having
                                        the read element of the test failed.
Total time to complete Offline
data collection:                (43200) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 433) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x703d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   198   198   051    Pre-fail  Always       -       3433
  3 Spin_Up_Time            0x0027   180   180   021    Pre-fail  Always       -       5991
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       14
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   081   081   000    Old_age   Always       -       14322
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       12
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       587
194 Temperature_Celsius     0x0022   121   114   000    Old_age   Always       -       29
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       17
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

SMART Error Log Version: 1
ATA Error Count: 45 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 45 occurred at disk power-on lifetime: 14299 hours (595 days + 19 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 f8 d4 08 48  Error: UNC at LBA = 0x0808d4f8 = 134796536

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 00 90 d4 08 48 08   1d+14:28:40.266  READ DMA
  c8 00 00 90 d4 08 48 08   1d+14:28:36.870  READ DMA
  c8 00 00 90 d4 08 48 08   1d+14:28:33.473  READ DMA
  c8 00 00 90 d4 08 48 08   1d+14:28:30.077  READ DMA
  c8 00 00 90 d4 08 48 08   1d+14:28:26.690  READ DMA

Error 44 occurred at disk power-on lifetime: 14299 hours (595 days + 19 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 f8 d4 08 48  Error: UNC at LBA = 0x0808d4f8 = 134796536

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 00 90 d4 08 48 08   1d+14:28:36.870  READ DMA
  c8 00 00 90 d4 08 48 08   1d+14:28:33.473  READ DMA
  c8 00 00 90 d4 08 48 08   1d+14:28:30.077  READ DMA
  c8 00 00 90 d4 08 48 08   1d+14:28:26.690  READ DMA

Error 43 occurred at disk power-on lifetime: 14299 hours (595 days + 19 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 f8 d4 08 48  Error: UNC at LBA = 0x0808d4f8 = 134796536

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 00 90 d4 08 48 08   1d+14:28:33.473  READ DMA
  c8 00 00 90 d4 08 48 08   1d+14:28:30.077  READ DMA
  c8 00 00 90 d4 08 48 08   1d+14:28:26.690  READ DMA

Error 42 occurred at disk power-on lifetime: 14299 hours (595 days + 19 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 f8 d4 08 48  Error: UNC at LBA = 0x0808d4f8 = 134796536

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 00 90 d4 08 48 08   1d+14:28:30.077  READ DMA
  c8 00 00 90 d4 08 48 08   1d+14:28:26.690  READ DMA

Error 41 occurred at disk power-on lifetime: 14299 hours (595 days + 19 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 f8 d4 08 48  Error: UNC at LBA = 0x0808d4f8 = 134796536

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 00 90 d4 08 48 08   1d+14:28:26.690  READ DMA

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%     14306         7929928

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

root@gee-n54l:~ # smartctl -a /dev/ada4
smartctl 7.0 2018-12-30 r4883 [FreeBSD 11.3-RELEASE-p11 amd64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red
Device Model:     WDC WD30EFRX-68EUZN0
Serial Number:    WD-WCC4N2ZHN1KN
LU WWN Device Id: 5 0014ee 261585fc5
Firmware Version: 82.00A82
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Mon Sep 21 09:30:50 2020 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 121) The previous self-test completed having
                                        the read element of the test failed.
Total time to complete Offline
data collection:                (40080) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 402) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x703d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       6
  3 Spin_Up_Time            0x0027   182   181   021    Pre-fail  Always       -       5875
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       50
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   072   072   000    Old_age   Always       -       20582
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       29
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       1
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       1461
194 Temperature_Celsius     0x0022   122   112   000    Old_age   Always       -       28
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

SMART Error Log Version: 1
ATA Error Count: 2
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 2 occurred at disk power-on lifetime: 20558 hours (856 days + 14 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 e8 ce d0 e0  Error: UNC at LBA = 0x00d0cee8 = 13684456

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 00 10 ce d0 e0 00   1d+13:46:31.337  READ DMA
  c8 00 00 10 ce d0 e0 00   1d+13:46:27.968  READ DMA
  c8 00 00 10 cd d0 e0 00   1d+13:46:27.967  READ DMA
  c8 00 00 10 cc d0 e0 00   1d+13:46:27.967  READ DMA
  c8 00 00 10 cb d0 e0 00   1d+13:46:27.951  READ DMA

Error 1 occurred at disk power-on lifetime: 20558 hours (856 days + 14 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 e8 ce d0 e0  Error: UNC at LBA = 0x00d0cee8 = 13684456

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 00 10 ce d0 e0 00   1d+13:46:27.968  READ DMA
  c8 00 00 10 cd d0 e0 00   1d+13:46:27.967  READ DMA
  c8 00 00 10 cc d0 e0 00   1d+13:46:27.967  READ DMA
  c8 00 00 10 cb d0 e0 00   1d+13:46:27.951  READ DMA
  c8 00 00 10 ca d0 e0 00   1d+13:46:27.950  READ DMA

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%     20566         13684456

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Both drives report
SMART overall-health self-assessment test result: PASSED
Does that mean I can trust them for this one-time job?

I've started extended self-tests for the other drives too; just waiting for the tests to finish.
 

Gee3

Cadet
Joined
Sep 20, 2020
Messages
8
And to be safe in the future, the regular SMART short and extended tests are properly configured:

Bildschirmfoto 2020-09-21 um 09.47.30.png
 

Gee3

Cadet
Joined
Sep 20, 2020
Messages
8
Backup your data and replace the disks.

That wasn't the answer I was hoping for, but I guess I shouldn't trust the drives anymore...

Basically I hoped that my RAID-Z1 setup would mitigate the risk of the bad drive state for one backup operation :)
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,972
Serial Number: WD-WCC4N1LEP5FR
It's not terrible yet, but the fact that it cannot pass an Extended/Long test counts as a failure, and while your data is not at risk yet, it could be.
Serial Number: WD-WCC4N2ZHN1KN
This drive is in exactly the same state as the other one.
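To make the "cannot pass an Extended/Long test" point concrete, here is a hedged Python sketch that reads the self-test log rows from the two reports above. It assumes the columns are separated by at least two spaces, as in the smartctl output posted in this thread; the function name `parse_selftest_row` is only illustrative.

```python
import re


def parse_selftest_row(line):
    """Parse one '# N ...' row of the SMART self-test log into a dict."""
    # Assumption: columns are separated by two or more spaces.
    parts = re.split(r"\s{2,}", line.strip())
    if len(parts) != 6 or not parts[0].startswith("#"):
        return None
    _num, description, status, remaining, hours, lba = parts
    return {
        "description": description,
        "status": status,
        "remaining_percent": int(remaining.rstrip("%")),
        "lifetime_hours": int(hours),
        "first_error_lba": None if lba == "-" else int(lba),
        "failed": not status.startswith("Completed without error"),
    }


# Rows copied from the posted results (ada1 first, then ada4).
ROWS = [
    "# 1  Extended offline    Completed: read failure       90%     14306         7929928",
    "# 1  Extended offline    Completed: read failure       90%     20566         13684456",
]

for row in ROWS:
    entry = parse_selftest_row(row)
    scanned = 100 - entry["remaining_percent"]
    print(f"{entry['status']} after ~{scanned}% of the surface, "
          f"first bad LBA {entry['first_error_lba']}")
```

Both tests stopped with 90% remaining, so neither drive got past roughly the first tenth of its surface. On ada4 the failing LBA 13684456 is the same one that appears in its ATA error log.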

One good thing to note is you are running a RAIDZ1.

Have you run a SCRUB? Do this to validate your pool data. Odds are it will pass just fine.

Have you run a SMART Long test on your other drives (ada2, ada3)? If not, do so. If they are all the same vintage as the failed drives, my advice is to replace them all. You can replace them one at a time: let the new drive resilver into the pool, then replace the next drive, and you will not lose data or incur any downtime, if that is important. (The User Manual gives you information on this.) And as previously stated, back up your data if it's important to you.

Your SMART Jobs do not look correct to me. Here is what I get from the scheduling:
Short: Run every hour on minute 0.
Long: Run every day at 1:00 AM.

It should look more like this:
Short: 0 0 * * *
Long: 5 0 * * sun
That would make the Short test run at midnight (00:00) every day. The Long test would run every Sunday at 00:05, which gives Sunday's Short test ample time to complete before the Long test starts.
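
For anyone unsure how to read those five fields (minute, hour, day of month, month, day of week), here is a small Python sketch that checks whether a given time matches such a schedule. It is only an illustration: `cron_matches` and `field_matches` are names I made up, and the sketch handles nothing beyond `*`, single numbers, and three-letter day names (no ranges, lists, or steps, and no `7` for Sunday).

```python
from datetime import datetime

# Cron counts Sunday as day 0.
DAY_NAMES = {"sun": 0, "mon": 1, "tue": 2, "wed": 3, "thu": 4, "fri": 5, "sat": 6}

def field_matches(field, value, names=None):
    """Match one cron field: '*', a plain number, or (optionally) a name."""
    if field == "*":
        return True
    if names and field.lower() in names:
        return names[field.lower()] == value
    return int(field) == value

def cron_matches(expr, when):
    """True if 'when' falls on the schedule 'minute hour dom month dow'."""
    minute, hour, dom, month, dow = expr.split()
    cron_dow = (when.weekday() + 1) % 7  # Python: Monday=0, cron: Sunday=0
    return (
        field_matches(minute, when.minute)
        and field_matches(hour, when.hour)
        and field_matches(dom, when.day)
        and field_matches(month, when.month)
        and field_matches(dow, cron_dow, DAY_NAMES)
    )

# Sunday, 20 September 2020, 00:05: the Long test is due, the Short one is not.
sunday = datetime(2020, 9, 20, 0, 5)
print(cron_matches("5 0 * * sun", sunday))
print(cron_matches("0 0 * * *", sunday))
```

An expression like `0 * * * *`, by contrast, matches minute 0 of every hour, which is the "every hour" behaviour described above for the current Short job.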
 

Gee3

Cadet
Joined
Sep 20, 2020
Messages
8
Thanks - I'm running the scrub right now.
Code:
root@gee-n54l:~ # zpool status
  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:03:38 with 0 errors on Tue Sep 15 03:48:38 2020
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          da0p2       ONLINE       0     0     0

errors: No known data errors

  pool: gee-n54l-pool
 state: ONLINE
  scan: scrub in progress since Sun Sep 20 00:00:02 2020
        2.67T scanned at 1.84G/s, 2.13T issued at 65.5M/s, 13.2T total
        472M repaired, 16.17% done, 2 days 01:08:49 to go
config:

        NAME                                                STATE     READ WRITE CKSUM
        gee-n54l-pool                                       ONLINE       0     0     0
          raidz1-0                                          ONLINE       0     0     0
            gptid/cc8d9843-2b4f-11e6-ac6e-28924a36104f.eli  ONLINE       0     0     0
            gptid/6dc5d13b-9d58-11e7-b8c1-28924a36104f.eli  ONLINE       0     0     0
            gptid/41404fdd-c46b-11e6-a422-28924a36104f.eli  ONLINE       0     0     0
            gptid/04bc3e91-505d-11e8-bcb2-28924a36104f.eli  ONLINE       0     0     0
            gptid/3257e780-12c3-11e6-b121-28924a36104f.eli  ONLINE       0     0     0

errors: No known data errors
It doesn't look good:
472M repaired, 16.17% done, 2 days 01:08:49 to go
I've already ordered replacement drives.
When they arrive I'll do a completely new FreeNAS hardware and software setup from scratch.
The current drives are really old...
Thanks for your help!
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,972
It doesn't look good:
It's repairing the data it finds to be corrupt, which is a good thing, but the more bad data it finds, the longer the scrub will take. It looks like you have a lot of data as well, ~13TB. Keep in mind that whenever you read your data, ZFS checks that data the same way a scrub does to ensure it's good. With that in mind, if you have a place to copy all your data to, I'd cancel the scrub and copy that data now. Copy the important stuff first. If you are like me, most of your data is backups of your systems; I generally copy off only the backups I need to keep (the last and generally the first). I cleaned off about 3TB of data the other day just by deleting some old backups. I still have more to go through, probably just over a TB. Then I need to clean out old data as well, but that will take me some time. My point is, back up what you need to back up. Then you can install from scratch and load your new drives with your data.

And you are welcome.
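
As a rough cross-check of the "2 days to go" figure in the scrub status, the estimate can be reproduced from the numbers zpool prints. This is only back-of-the-envelope arithmetic in Python: it assumes that `T` and `M` in the zpool output are binary units (TiB and MiB) and that the issue rate stays constant, which it probably will not while repairs are going on. The function name is made up for this example.

```python
def scrub_eta_seconds(total_tib, issued_tib, rate_mib_per_s):
    """Remaining scrub time if the current issue rate stayed constant."""
    remaining_mib = (total_tib - issued_tib) * 1024 * 1024
    return remaining_mib / rate_mib_per_s

# Values from the status line: 2.13T issued at 65.5M/s, 13.2T total.
eta = scrub_eta_seconds(13.2, 2.13, 65.5)
days, rest = divmod(int(eta), 86400)
print(f"about {days} days and {rest // 3600} hours remaining")
print(f"{2.13 / 13.2:.1%} issued so far")
```

The result lands close to what zpool reported; the small difference comes from the rounded values in the status line. The practical point stands: at that rate a full pass over ~13TB takes days, so copying the important data first is the quicker way to safety.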
 

Gee3

Cadet
Joined
Sep 20, 2020
Messages
8
You're completely right - like you, I'm very much about data safety, security and backups.
The data currently on my FreeNAS instance is completely irrelevant as it is only a backup of my primary NAS. My primary NAS has RAID6 and newer hard drives, none of which are failing or showing bad SMART values :)
Also, the most important personal data like pictures, music, private videos and documents is synchronised across 4 independent NAS/desktops/notebooks by Resilio Sync.
So I'm quite relaxed about my important data.
The backup I'm doing is mostly to move from 10 x 6TB drives (RAID 6) on my primary NAS to 6 x 14TB drives (also RAID 6), to get more usable capacity and lower my power consumption...
 

micneu

Patron
Joined
Mar 23, 2019
Messages
473
Hmmm, I thought we were in the German section.
If I answered in German in the English section, I think people there would freak out.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,972
Hmmm, I thought we were in the German section.
If I answered in German in the English section, I think people there would freak out.

I agree and I apologize. You are right: if you did this in an English section of this forum, you would get yelled at. I will try to be more attentive.
 