HDD problem or not?

Status
Not open for further replies.

ajschot

Patron
Joined
Nov 7, 2016
Messages
341
Code:
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.1-STABLE amd64] (local build)

Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org


=== START OF INFORMATION SECTION ===

Model Family:	 Western Digital Green

Device Model:	 WDC WD40EZRX-00SPEB0

Serial Number:	WD-WCC4E1174874

LU WWN Device Id: 5 0014ee 209dda498

Firmware Version: 80.00A80

User Capacity:	4,000,787,030,016 bytes [4.00 TB]

Sector Sizes:	 512 bytes logical, 4096 bytes physical

Rotation Rate:	5400 rpm

Device is:		In smartctl database [for details use: -P show]

ATA Version is:   ACS-2 (minor revision not indicated)

SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)

Local Time is:	Sun Oct 14 12:42:29 2018 CEST

SMART support is: Available - device has SMART capability.

SMART support is: Enabled


=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: PASSED


General SMART Values:

Offline data collection status:  (0x82)	Offline data collection activity

					was completed without error.

					Auto Offline Data Collection: Enabled.

Self-test execution status:	  (   0)	The previous self-test routine completed

					without error or no self-test has ever 

					been run.

Total time to complete Offline 

data collection:		 (54000) seconds.

Offline data collection

capabilities:			 (0x7b) SMART execute Offline immediate.

					Auto Offline data collection on/off support.

					Suspend Offline collection upon new

					command.

					Offline surface scan supported.

					Self-test supported.

					Conveyance Self-test supported.

					Selective Self-test supported.

SMART capabilities:			(0x0003)	Saves SMART data before entering

					power-saving mode.

					Supports SMART auto save timer.

Error logging capability:		(0x01)	Error logging supported.

					General Purpose Logging supported.

Short self-test routine 

recommended polling time:	 (   2) minutes.

Extended self-test routine

recommended polling time:	 ( 540) minutes.

Conveyance self-test routine

recommended polling time:	 (   5) minutes.

SCT capabilities:		   (0x7035)	SCT Status supported.

					SCT Feature Control supported.

					SCT Data Table supported.


SMART Attributes Data Structure revision number: 16

Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME		  FLAG	 VALUE WORST THRESH TYPE	  UPDATED  WHEN_FAILED RAW_VALUE

  1 Raw_Read_Error_Rate	 0x002f   200   200   051	Pre-fail  Always	   -	   0

  3 Spin_Up_Time			0x0027   175   171   021	Pre-fail  Always	   -	   8216

  4 Start_Stop_Count		0x0032   080   080   000	Old_age   Always	   -	   20824

  5 Reallocated_Sector_Ct   0x0033   200   200   140	Pre-fail  Always	   -	   0

  7 Seek_Error_Rate		 0x002e   100   253   000	Old_age   Always	   -	   0

  9 Power_On_Hours		  0x0032   069   069   000	Old_age   Always	   -	   23072

 10 Spin_Retry_Count		0x0032   100   100   000	Old_age   Always	   -	   0

 11 Calibration_Retry_Count 0x0032   100   100   000	Old_age   Always	   -	   0

 12 Power_Cycle_Count	   0x0032   100   100   000	Old_age   Always	   -	   728

192 Power-Off_Retract_Count 0x0032   200   200   000	Old_age   Always	   -	   242

193 Load_Cycle_Count		0x0032   189   189   000	Old_age   Always	   -	   33988

194 Temperature_Celsius	 0x0022   115   099   000	Old_age   Always	   -	   37

196 Reallocated_Event_Count 0x0032   200   200   000	Old_age   Always	   -	   0

197 Current_Pending_Sector  0x0032   200   200   000	Old_age   Always	   -	   0

198 Offline_Uncorrectable   0x0030   200   200   000	Old_age   Offline	  -	   0

199 UDMA_CRC_Error_Count	0x0032   200   200   000	Old_age   Always	   -	   3

200 Multi_Zone_Error_Rate   0x0008   200   200   000	Old_age   Offline	  -	   0


SMART Error Log Version: 1

ATA Error Count: 10 (device log contains only the most recent five errors)

	CR = Command Register [HEX]

	FR = Features Register [HEX]

	SC = Sector Count Register [HEX]

	SN = Sector Number Register [HEX]

	CL = Cylinder Low Register [HEX]

	CH = Cylinder High Register [HEX]

	DH = Device/Head Register [HEX]

	DC = Device Command Register [HEX]

	ER = Error register [HEX]

	ST = Status register [HEX]

Powered_Up_Time is measured from power on, and printed as

DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,

SS=sec, and sss=millisec. It "wraps" after 49.710 days.


Error 10 occurred at disk power-on lifetime: 988 hours (41 days + 4 hours)

  When the command that caused the error occurred, the device was active or idle.


  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  04 51 01 00 00 00 a0  Error: ABRT


  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  b0 d6 01 e0 4f c2 a0 00	  00:01:42.872  SMART WRITE LOG

  b0 d6 01 e0 4f c2 a0 00	  00:01:42.858  SMART WRITE LOG

  80 45 01 01 44 57 a0 00	  00:01:42.848  [VENDOR SPECIFIC]

  ec 44 01 01 00 00 a0 00	  00:01:42.797  IDENTIFY DEVICE

  80 44 10 00 44 57 a0 00	  00:01:24.545  [VENDOR SPECIFIC]


Error 9 occurred at disk power-on lifetime: 988 hours (41 days + 4 hours)

  When the command that caused the error occurred, the device was active or idle.


  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  04 51 01 00 00 00 a0  Error: ABRT


  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  b0 d6 01 e0 4f c2 a0 00	  00:01:42.858  SMART WRITE LOG

  80 45 01 01 44 57 a0 00	  00:01:42.848  [VENDOR SPECIFIC]

  ec 44 01 01 00 00 a0 00	  00:01:42.797  IDENTIFY DEVICE

  80 44 10 00 44 57 a0 00	  00:01:24.545  [VENDOR SPECIFIC]

  b0 d6 01 e0 4f c2 a0 00	  00:01:24.541  SMART WRITE LOG


Error 8 occurred at disk power-on lifetime: 988 hours (41 days + 4 hours)

  When the command that caused the error occurred, the device was active or idle.


  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  04 51 01 00 00 00 a0  Error: ABRT


  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  b0 d6 01 e0 4f c2 a0 00	  00:01:24.541  SMART WRITE LOG

  b0 d6 01 e0 4f c2 a0 00	  00:01:24.527  SMART WRITE LOG

  b0 d6 01 e0 4f c2 a0 00	  00:01:24.510  SMART WRITE LOG

  b0 d6 01 e0 4f c2 a0 00	  00:01:24.500  SMART WRITE LOG

  80 45 01 01 44 57 a0 00	  00:01:24.497  [VENDOR SPECIFIC]


Error 7 occurred at disk power-on lifetime: 988 hours (41 days + 4 hours)

  When the command that caused the error occurred, the device was active or idle.


  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  04 51 01 00 00 00 a0  Error: ABRT


  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  b0 d6 01 e0 4f c2 a0 00	  00:01:24.527  SMART WRITE LOG

  b0 d6 01 e0 4f c2 a0 00	  00:01:24.510  SMART WRITE LOG

  b0 d6 01 e0 4f c2 a0 00	  00:01:24.500  SMART WRITE LOG

  80 45 01 01 44 57 a0 00	  00:01:24.497  [VENDOR SPECIFIC]

  ec 44 01 01 00 00 a0 00	  00:01:24.446  IDENTIFY DEVICE


Error 6 occurred at disk power-on lifetime: 988 hours (41 days + 4 hours)

  When the command that caused the error occurred, the device was active or idle.


  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  04 51 01 00 00 00 a0  Error: ABRT


  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  b0 d6 01 e0 4f c2 a0 00	  00:01:24.510  SMART WRITE LOG

  b0 d6 01 e0 4f c2 a0 00	  00:01:24.500  SMART WRITE LOG

  80 45 01 01 44 57 a0 00	  00:01:24.497  [VENDOR SPECIFIC]

  ec 44 01 01 00 00 a0 00	  00:01:24.446  IDENTIFY DEVICE

  80 44 10 00 44 57 a0 00	  00:00:52.074  [VENDOR SPECIFIC]


SMART Self-test log structure revision number 1

Num  Test_Description	Status				  Remaining  LifeTime(hours)  LBA_of_first_error

# 1  Short offline	   Completed without error	   00%	 22813		 -

# 2  Short offline	   Completed without error	   00%	 22587		 -

# 3  Short offline	   Completed without error	   00%	 22331		 -

# 4  Short offline	   Completed without error	   00%	 22012		 -

# 5  Short offline	   Completed without error	   00%	 21820		 -

# 6  Short offline	   Completed without error	   00%	 21378		 -

# 7  Short offline	   Completed without error	   00%	 21237		 -

# 8  Short offline	   Completed without error	   00%	 21069		 -

# 9  Short offline	   Completed without error	   00%	 20984		 -

#10  Short offline	   Completed without error	   00%	 20816		 -

#11  Short offline	   Completed without error	   00%	 20648		 -

#12  Short offline	   Completed without error	   00%	 20545		 -

#13  Short offline	   Completed without error	   00%	 20410		 -

#14  Short offline	   Completed without error	   00%	 20254		 -

#15  Short offline	   Completed without error	   00%	 19994		 -

#16  Short offline	   Completed without error	   00%	 19832		 -

#17  Short offline	   Completed without error	   00%	 19664		 -

#18  Short offline	   Completed without error	   00%	 19504		 -

#19  Short offline	   Completed without error	   00%	 19336		 -

#20  Short offline	   Completed without error	   00%	 19168		 -

#21  Short offline	   Completed without error	   00%	 19000		 -


SMART Selective self-test log data structure revision number 1

 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS

	1		0		0  Not_testing

	2		0		0  Not_testing

	3		0		0  Not_testing

	4		0		0  Not_testing

	5		0		0  Not_testing

Selective self-test flags (0x0):

  After scanning selected spans, do NOT read-scan remainder of disk.

If Selective self-test is pending on power-up, resume after 0 minute delay.


root@freenas:~ # 



I found a lot of topics about "The status of volume Data is ONLINE: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected."
I tried to find out which disk is giving errors. The volume is not degraded, and the report above is from the only drive that shows any errors. I think it is not that bad, right? It looks like the error occurred while the drive was idle.
I really don't understand the error. Would somebody be so kind as to tell me what is wrong? And is there a way to reset this error to see whether it comes back?
This error appeared after I changed the power settings of the drives.
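One way to judge how recent the logged ATA errors are is to compare the "power-on lifetime" in each error entry with the current Power_On_Hours raw value. The following is a minimal sketch, assuming the raw value of attribute 9 is a plain hour count (as it is in the report above); the function name `error_ages` is made up for illustration.

```python
import re

def error_ages(report: str) -> dict:
    """Map each logged ATA error number to the hours elapsed since it occurred."""
    # Attribute 9 row: the raw value is the last number on the line.
    poh = re.search(
        r"^[ \t]*9[ \t]+Power_On_Hours[^\n]*?(\d+)[ \t]*$", report, re.MULTILINE
    )
    if poh is None:
        raise ValueError("Power_On_Hours attribute not found")
    now = int(poh.group(1))
    ages = {}
    pattern = r"Error (\d+) occurred at disk power-on lifetime: (\d+) hours"
    for m in re.finditer(pattern, report):
        ages[int(m.group(1))] = now - int(m.group(2))
    return ages

# Lines copied from the report above (whitespace normalized).
sample = """\
  9 Power_On_Hours          0x0032   069   069   000    Old_age   Always       -       23072
Error 10 occurred at disk power-on lifetime: 988 hours (41 days + 4 hours)
Error 9 occurred at disk power-on lifetime: 988 hours (41 days + 4 hours)
"""
print(error_ages(sample))  # {10: 22084, 9: 22084}
```

For this drive the logged errors sit at 988 power-on hours, while the report was taken at 23072 hours, so the entries in the error log are roughly 22,000 hours old.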
 

Jailer

Not strong, but bad
Joined
Sep 12, 2014
Messages
4,977
You're not running long SMART tests. Run a long SMART test on all your drives and check them again.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
I can't figure out what the periodicity is for the Short Tests; the intervals vary widely. I'd recommend running a daily Short Test and a weekly Long Test. Since your Long Test takes 9 hours, choose a start time when FreeNAS sees minimal usage, or the test will just take a bit longer to complete.

As for errors, look at ID 199 UDMA_CRC_Error_Count = 3. This typically points to a suspect SATA data cable rather than an internal hard drive error.

Look at the Hard Drive Troubleshooting link in my signature, it may help.
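As a rough illustration of reading the attribute table, the sketch below scans the rows and reports watched attributes whose raw value is above zero. The shortlist of IDs (5, 197, 198, 199) and the function name are assumptions made for this example; smartctl itself defines no such list.

```python
import re

# Attribute IDs to watch; this shortlist is an assumption for illustration.
WATCHED_IDS = {5, 197, 198, 199}

# One row of the attribute table:
# ID, name, flag, value, worst, thresh, type, updated, when_failed, raw value.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]{4}\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"(\S+)\s+(\S+)\s+(\S+)\s+(\d+)"
)

def nonzero_watched(report: str) -> dict:
    """Return {attribute_name: raw_value} for watched attributes with raw > 0."""
    flagged = {}
    for line in report.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # not an attribute row
        attr_id, name, raw = int(m.group(1)), m.group(2), int(m.group(9))
        if attr_id in WATCHED_IDS and raw > 0:
            flagged[name] = raw
    return flagged

# Rows copied from the report in the first post (whitespace normalized).
sample = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       3
"""
print(nonzero_watched(sample))  # {'UDMA_CRC_Error_Count': 3}
```

For the posted report only the CRC counter is flagged, which matches the cable suspicion; the sector-related counters are all zero.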
 

ajschot

Patron
Joined
Nov 7, 2016
Messages
341
Thank you Joe, that is a relief. I recently moved the machine because I was cleaning the room where the FreeNAS box stands, so I will check the cables and run a long test.
 

ajschot

Patron
Joined
Nov 7, 2016
Messages
341
I am a noob, I know, and I never dug into this because I only ever had one bad disk and that was easy to find. Now I have run a long test, but how can I obtain a report?
I got a critical message in the FreeNAS GUI that looks about the same, and a short test shows the same results, but how do I get the report of the long test?

I will try to replace the HBA card. I had problems with one before; I just pulled it out and slid it back in, and it worked again. I think it is the same card, so I think the card is bad (I got it second hand).
 

Redcoat

MVP
Joined
Feb 18, 2014
Messages
2,925

ajschot

Patron
Joined
Nov 7, 2016
Messages
341
For SMART test info, including how to get the report of your long test, go here https://forums.freenas.org/index.ph...bleshooting-guide-all-versions-of-freenas.17/
I know, I checked that already, but it does not say how to get the report from a long test.

Code:
3)  Once the period of time has lapsed for the testing, obtain a SMART Status Result and return to the troubleshooting text.


If I do a -H I just get PASSED and no more info. There must be a way to get a report from a long test.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
smartctl -a /dev/ada1 for example. I never use the "-H" parameter, it's not a true representation of the drive health.
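If you would rather collect the full report from a script than from the shell, a minimal sketch could look like the following. It assumes smartmontools is installed, the script runs as root, and /dev/ada1 stands in for your own device; the helper names `smartctl_command` and `full_report` are made up for this example.

```python
import subprocess

def smartctl_command(device: str, flag: str = "-a") -> list:
    """Build the argument list for a full report ('-a' or the longer '-x')."""
    if flag not in ("-a", "-x"):
        raise ValueError("expected '-a' or '-x'")
    return ["smartctl", flag, device]

def full_report(device: str, flag: str = "-a") -> str:
    """Run smartctl and return its text output."""
    # smartctl's exit status is a bit mask that can be non-zero even when a
    # report was printed, so the return code is deliberately not checked.
    result = subprocess.run(
        smartctl_command(device, flag),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout

# Example (only works on a machine with smartctl and the named device):
# print(full_report("/dev/ada1"))
```

The command is passed as a list rather than a shell string so the device path is never interpreted by a shell.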
 
Joined
May 10, 2017
Messages
838
The "report" for the long test will appear together with the other results here:

Code:
Num  Test_Description	Status				 Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline	  Completed without error	  00%	 22813		 -
# 2  Short offline	  Completed without error	  00%	 22587		 -
# 3  Short offline	  Completed without error	  00%	 22331		 -
# 4  Short offline	  Completed without error	  00%	 22012		 -
...
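To check from a script whether a long test has landed in that table, something along these lines could work. It is only a sketch: it assumes the column layout shown above, that smartctl labels long tests "Extended offline", and that entry #1 is the most recent; the function names are invented for the example.

```python
import re

# One row of the self-test log: number, description, status,
# remaining percent, lifetime hours, LBA of first error ("-" if none).
ROW = re.compile(r"^#\s*(\d+)\s+(.+?)\s{2,}(.+?)\s+(\d+)%\s+(\d+)\s+(\S+)\s*$")

def parse_self_tests(log: str) -> list:
    """Parse self-test log rows into dictionaries, in the order listed."""
    tests = []
    for line in log.splitlines():
        m = ROW.match(line.strip())
        if m:
            tests.append({
                "num": int(m.group(1)),
                "description": m.group(2),
                "status": m.group(3),
                "remaining_pct": int(m.group(4)),
                "lifetime_hours": int(m.group(5)),
                "lba_of_first_error": m.group(6),
            })
    return tests

def latest_extended(tests: list):
    """Return the first (most recent) 'Extended offline' entry, or None."""
    for t in tests:
        if t["description"].startswith("Extended"):
            return t
    return None

# Rows copied from the log above (whitespace normalized).
sample = """\
# 1  Short offline       Completed without error       00%     22813         -
# 2  Short offline       Completed without error       00%     22587         -
# 3  Short offline       Completed without error       00%     22331         -
# 4  Short offline       Completed without error       00%     22012         -
"""
tests = parse_self_tests(sample)
print(len(tests), latest_extended(tests))  # 4 None
```

With the rows posted so far the result is None, meaning no long test has been logged yet; once one finishes it should show up as a new entry #1.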
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
I want to see the entire output from the command I listed above. It's fine to see that the long test passed, but I want to see if something new showed up in the IDs as well.
 

BigDave

FreeNAS Enthusiast
Joined
Oct 6, 2013
Messages
2,479

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
If I do a -H I just get PASSED and no more info. There must be a way to get a report from a long test.
That is not what you are supposed to do anyhow. I have seen drives with thousands of bad sectors still report that they "PASSED". You need to be looking at the full report by giving the command smartctl -x /dev/ada1
 