9.3 sending alerts for unreadable drives

Status
Not open for further replies.

godofdew11

Dabbler
Joined
Dec 9, 2014
Messages
11
Hey guys,
Running FreeNAS 9.3 on an Intel i3-2100 / MSI H67MS-E43 with 16 GB RAM and 5x Hitachi Deskstar 2TB in RAIDZ.

I am getting emails stating that errors are climbing on 2 drives, and most recently that they are unreadable.
When I log into the GUI, I have a red alert light on the right side reporting 4 critical alerts: two drives unreadable and two drives offline (the same two).

When I go to Storage > Volumes > View Volumes > Volume Status, it shows all 5 drives online with 0 errors.

Do I have dead drives? Please bear with me, I am not terribly experienced with FreeNAS.

Thanks,
Don
 

Sakuru

Guru
Joined
Nov 20, 2015
Messages
527
Please post the output of the "zpool status" command.
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
And the output of smartctl -a /dev/adaX for each drive, between code tags please.
 

godofdew11

Dabbler
Joined
Dec 9, 2014
Messages
11
zpool status
scan: scrub repaired 0 in 10h57m with 0 errors on Sun Feb 28 10:57:46 2016
config:

        NAME                  STATE     READ WRITE CKSUM
        Donsnetworkstorage    ONLINE       0     0     0
          raidz2-0            ONLINE       0     0     0
            gpt/ad0           ONLINE       0     0     0
            gpt/ad1           ONLINE       0     0     0
            gpt/ad2           ONLINE       0     0     0
            gpt/ad3           ONLINE       0     0     0
            gpt/ad4           ONLINE       0     0     0

errors: No known data errors

pool: freenas-boot
state: ONLINE
scan: scrub repaired 0 in 0h1m with 0 errors on Thu Feb 25 03:46:26 2016
config:

        NAME                                          STATE     READ WRITE CKSUM
        freenas-boot                                  ONLINE       0     0     0
          gptid/73a1e3b8-089a-11e5-9d1f-6c626d3a4e60  ONLINE       0     0     0

errors: No known data errors
[root@freenas ~]#

When I run the last command (SMART test?) it scrolls past and leaves a bunch of information I can't get to. But 2 of the disks come back with different info at the end than the others: the ones showing errors in the GUI.

ada0

Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 380) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000b 100 100 016 Pre-fail Always - 0
2 Throughput_Performance 0x0005 136 136 054 Pre-fail Offline - 95
3 Spin_Up_Time 0x0007 141 141 024 Pre-fail Always - 395 (Average 382)
4 Start_Stop_Count 0x0012 100 100 000 Old_age Always - 101
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 0
7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always - 0
8 Seek_Time_Performance 0x0005 146 146 020 Pre-fail Offline - 29
9 Power_On_Hours 0x0012 095 095 000 Old_age Always - 41342
10 Spin_Retry_Count 0x0013 100 100 060 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 101
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 306
193 Load_Cycle_Count 0x0012 100 100 000 Old_age Always - 306
194 Temperature_Celsius 0x0002 214 214 000 Old_age Always - 28 (Min/Max 17/42)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

ada1
Error 29 occurred at disk power-on lifetime: 38298 hours (1595 days + 18 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 8d 0d 03 1e 06 Error: ICRC, ABRT 141 sectors at LBA = 0x061e030d = 102630157

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 ac ee 02 1e 40 00 3d+18:56:54.089 READ DMA EXT
25 00 ab 43 02 1e 40 00 3d+18:56:54.088 READ DMA EXT
25 00 ac 97 01 1e 40 00 3d+18:56:54.087 READ DMA EXT
25 00 ab ec 00 1e 40 00 3d+18:56:54.086 READ DMA EXT
25 00 ab 40 00 1e 40 00 3d+18:56:54.085 READ DMA EXT

Error 28 occurred at disk power-on lifetime: 38297 hours (1595 days + 17 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 4c f4 83 59 03 Error: ICRC, ABRT 76 sectors at LBA = 0x035983f4 = 56198132

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 ab 95 83 59 40 00 3d+18:09:04.153 READ DMA EXT
25 00 ab e9 82 59 40 00 3d+18:09:04.152 READ DMA EXT
25 00 ac 3d 82 59 40 00 3d+18:09:04.151 READ DMA EXT
25 00 ab 92 81 59 40 00 3d+18:09:04.150 READ DMA EXT
25 00 ac e6 80 59 40 00 3d+18:09:04.149 READ DMA EXT

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

ada2
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 375) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000b 100 100 016 Pre-fail Always - 0
2 Throughput_Performance 0x0005 135 135 054 Pre-fail Offline - 98
3 Spin_Up_Time 0x0007 139 139 024 Pre-fail Always - 404 (Average 387)
4 Start_Stop_Count 0x0012 100 100 000 Old_age Always - 105
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 0
7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always - 0
8 Seek_Time_Performance 0x0005 146 146 020 Pre-fail Offline - 29
9 Power_On_Hours 0x0012 095 095 000 Old_age Always - 41346
10 Spin_Retry_Count 0x0013 100 100 060 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 105
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 304
193 Load_Cycle_Count 0x0012 100 100 000 Old_age Always - 304
194 Temperature_Celsius 0x0002 214 214 000 Old_age Always - 28 (Min/Max 17/43)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

ada3
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 394) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000b 100 100 016 Pre-fail Always - 0
2 Throughput_Performance 0x0005 136 136 054 Pre-fail Offline - 95
3 Spin_Up_Time 0x0007 141 141 024 Pre-fail Always - 395 (Average 381)
4 Start_Stop_Count 0x0012 100 100 000 Old_age Always - 105
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 0
7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always - 0
8 Seek_Time_Performance 0x0005 148 148 020 Pre-fail Offline - 28
9 Power_On_Hours 0x0012 095 095 000 Old_age Always - 41347
10 Spin_Retry_Count 0x0013 100 100 060 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 105
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 301
193 Load_Cycle_Count 0x0012 100 100 000 Old_age Always - 301
194 Temperature_Celsius 0x0002 214 214 000 Old_age Always - 28 (Min/Max 17/43)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

ada4
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 0

SMART Error Log Version: 1
ATA Error Count: 1
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 1 occurred at disk power-on lifetime: 40909 hours (1704 days + 13 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 53 65 f6 1e 05 Error: UNC 83 sectors at LBA = 0x051ef665 = 85915237

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 ab 0d f6 1e 40 00 3d+12:59:25.418 READ DMA EXT
25 00 ac 61 f5 1e 40 00 3d+12:59:25.418 READ DMA EXT
25 00 ab b6 f4 1e 40 00 3d+12:59:25.417 READ DMA EXT
25 00 ab 0a f4 1e 40 00 3d+12:59:25.416 READ DMA EXT
25 00 ac 5e f3 1e 40 00 3d+12:59:25.375 READ DMA EXT

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Hope that is what you guys are looking for.

Thanks for the help!
Don
 

Sakuru

Guru
Joined
Nov 20, 2015
Messages
527
If you use something like PuTTY to SSH into your FreeNAS box, you will be able to scroll back. It can also be useful to save the output to a text file: "smartctl -a /dev/adaX > adaX.txt".
It makes your post more readable if you use Code tags for text dumps like you get from smartctl. See the attached picture.
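
The redirect above can be repeated for every drive with a small loop. This is a minimal sketch, assuming the data drives are ada0 through ada4 (the five shown in this thread) and a Bourne-compatible shell; the function name save_smart_reports and the DRIVES and SMARTCTL variables are made up for illustration. The FreeNAS root shell may be csh, so save this as a script and run it with sh rather than typing it at the prompt.

```sh
#!/bin/sh
# Save the full SMART report of each drive to its own text file.
# Assumption: the drives are ada0..ada4; edit DRIVES to match your system.
DRIVES="ada0 ada1 ada2 ada3 ada4"
# Hypothetical convenience: SMARTCTL can be overridden, e.g. for a dry run.
SMARTCTL="${SMARTCTL:-smartctl}"

save_smart_reports() {
    # $1: directory to write the reports into (defaults to the current one)
    outdir="${1:-.}"
    for d in $DRIVES; do
        $SMARTCTL -a "/dev/$d" > "$outdir/$d.txt"
    done
}
```

Calling save_smart_reports /tmp would leave ada0.txt through ada4.txt in /tmp, ready to be pasted between code tags.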
 

Attachments

  • 2016-03-24_14-55-01.png

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
Wow, 41k hours and no SMART test ever run on those drives?

Please do a long SMART test on each drive (smartctl -t long /dev/adaX) and then repost the output of smartctl -a.
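
For reference, the long tests can be started on all drives in one go, since each test runs inside the drive and smartctl returns right away. A rough sketch, again assuming drives ada0 through ada4 and a Bourne-compatible shell; the function names and the DRIVES and SMARTCTL variables are hypothetical. The outputs posted above report a recommended polling time of roughly 375 to 394 minutes for the extended test, so expect to wait several hours before checking the results.

```sh
#!/bin/sh
# Start a long (extended) SMART self-test on each drive, then inspect results.
# Assumption: the drives are ada0..ada4; edit DRIVES to match your system.
DRIVES="ada0 ada1 ada2 ada3 ada4"
# Hypothetical convenience: SMARTCTL can be overridden, e.g. for a dry run.
SMARTCTL="${SMARTCTL:-smartctl}"

start_long_tests() {
    for d in $DRIVES; do
        # Returns immediately; the test itself runs in the drive firmware.
        $SMARTCTL -t long "/dev/$d"
    done
}

show_selftest_logs() {
    for d in $DRIVES; do
        echo "== $d =="
        # -l selftest prints only the self-test log section.
        $SMARTCTL -l selftest "/dev/$d"
    done
}
```

Run start_long_tests once, wait for the tests to finish, then use show_selftest_logs for a quick look before reposting the full smartctl -a output.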
 

gpsguy

Active Member
Joined
Jan 22, 2012
Messages
4,472
^^ I would have replied earlier with a similar message, but some replies are difficult to type on my phone.


 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
First you need to run a SMART long test on each drive, as @Bidule0hm stated, and once those are complete, post the drive data. You also need to set up routine SMART testing. I prefer a daily short test and a weekly long test on all my drives. It would also help to post exactly what the emails said; that should identify the hard drives (I hope).
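
To make the daily-short, weekly-long routine concrete, here is a small sketch of the schedule logic only. FreeNAS normally handles this scheduling from its web GUI, so this is an illustration and not how FreeNAS implements it. The function name smart_test_type_for_day and the choice of Sunday for the long test are assumptions.

```sh
#!/bin/sh
# Pick the SMART test type for a given day: long once a week, short otherwise.
# $1: day of week 1..7 as printed by `date +%u` (1 = Monday, 7 = Sunday)
# $2: day on which the weekly long test runs (assumed default: 7, Sunday)
smart_test_type_for_day() {
    if [ "$1" -eq "${2:-7}" ]; then
        echo long
    else
        echo short
    fi
}

# Example use from a daily job (commented out so nothing runs on load):
# smartctl -t "$(smart_test_type_for_day "$(date +%u)")" /dev/ada0
```

With the default, the function prints long on Sundays and short on every other day.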

Try my troubleshooting guide here; it will step you through everything you need to do to identify the problem:
https://forums.freenas.org/index.ph...leshooting-guide-basic-common-failures.41026/

All the instructions to identify your problems should be clear; if not, ask.
 