SOLVED Critical alert shown re HDD

Status
Not open for further replies.

VladTepes

Patron
Joined
May 18, 2016
Messages
287
Part I

I have an inkling of the answer but no harm in asking a few questions.

I have a 3TB WD HDD in my FreeNAS box which is reporting a critical alert: "CRITICAL: Sept. 5, 2017, 3:43 a.m. - Device: /dev/ada6, Self-Test Log error count increased from 4 to 5"

I'm trying to find the last SMART Test, but can't at the moment.
(I've not been receiving email updates either so will have to check that out)
Can anyone advise how I could run a SMART test on that drive?

I normally access the FreeNAS box via the web GUI. If I had to run the test via the console that would be one thing, but how would I cut and paste the output from the NAS computer?
(Naturally I can't remember how to SSH...)

In any event it obviously has problems, it's just the extent in question.

I have checked online and the drive is definitely OOW (out of warranty).

Serial Number   Status                    Model Number   Description   Expiration Date
WMC1T3798899    Out of Limited Warranty   WD30EFRX       WD Red        06/28/2016

I don't suppose I have any recourse other than to buy another drive?

Part II

If it does come to replacing the drive, is it just a matter of pulling one drive and plugging in another in its place?
Perhaps I should look up the instructions....
 

CraigD

Patron
Joined
Mar 8, 2016
Messages
343
Part I

Open a shell (PuTTY is better):
Code:
root@freenas:~ # smartctl -t short /dev/ada6
smartctl 6.5 2016-05-07 r4318 [FreeBSD 11.0-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 2 minutes for test to complete.
Test will complete after Wed Sep  6 00:24:31 2017

Use smartctl -X to abort test.
root@freenas:~ # smartctl -a /dev/da6
smartctl 6.5 2016-05-07 r4318 [FreeBSD 11.0-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:  Seagate Barracuda 7200.14 (AF)
Device Model:  ST2000DM001-1ER164
Serial Number:  Z4Z5G5FF
LU WWN Device Id: 5 000c50 0911f589f
Firmware Version: CC26
User Capacity:  2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:  512 bytes logical, 4096 bytes physical
Rotation Rate:  7200 rpm
Form Factor:  3.5 inches
Device is:  In smartctl database [for details use: -P show]
ATA Version is:  ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:  Wed Sep  6 00:48:30 2017 NZST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
  was completed without error.
  Auto Offline Data Collection: Enabled.
Self-test execution status:  (  0) The previous self-test routine completed
  without error or no self-test has ever
  been run.
Total time to complete Offline
data collection:  (  80) seconds.
Offline data collection
capabilities:  (0x7b) SMART execute Offline immediate.
  Auto Offline data collection on/off support.
  Suspend Offline collection upon new
  command.
  Offline surface scan supported.
  Self-test supported.
  Conveyance Self-test supported.
  Selective Self-test supported.
SMART capabilities:  (0x0003) Saves SMART data before entering
  power-saving mode.
  Supports SMART auto save timer.
Error logging capability:  (0x01) Error logging supported.
  General Purpose Logging supported.
Short self-test routine
recommended polling time:  (  1) minutes.
Extended self-test routine
recommended polling time:  ( 208) minutes.
Conveyance self-test routine
recommended polling time:  (  2) minutes.
SCT capabilities:  (0x1085) SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME  FLAG  VALUE WORST THRESH TYPE  UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate  0x000f  117  099  006  Pre-fail  Always  -  162964616
  3 Spin_Up_Time  0x0003  096  096  000  Pre-fail  Always  -  0
  4 Start_Stop_Count  0x0032  100  100  020  Old_age  Always  -  49
  5 Reallocated_Sector_Ct  0x0033  100  100  010  Pre-fail  Always  -  0
  7 Seek_Error_Rate  0x000f  075  060  030  Pre-fail  Always  -  34192255
  9 Power_On_Hours  0x0032  088  088  000  Old_age  Always  -  11036
10 Spin_Retry_Count  0x0013  100  100  097  Pre-fail  Always  -  0
12 Power_Cycle_Count  0x0032  100  100  020  Old_age  Always  -  49
183 Runtime_Bad_Block  0x0032  100  100  000  Old_age  Always  -  0
184 End-to-End_Error  0x0032  100  100  099  Old_age  Always  -  0
187 Reported_Uncorrect  0x0032  100  100  000  Old_age  Always  -  0
188 Command_Timeout  0x0032  100  100  000  Old_age  Always  -  0 0 0
189 High_Fly_Writes  0x003a  099  099  000  Old_age  Always  -  1
190 Airflow_Temperature_Cel 0x0022  073  063  045  Old_age  Always  -  27 (Min/Max 23/33)
191 G-Sense_Error_Rate  0x0032  100  100  000  Old_age  Always  -  0
192 Power-Off_Retract_Count 0x0032  100  100  000  Old_age  Always  -  46
193 Load_Cycle_Count  0x0032  100  100  000  Old_age  Always  -  84
194 Temperature_Celsius  0x0022  027  040  000  Old_age  Always  -  27 (0 13 0 0 0)
197 Current_Pending_Sector  0x0012  100  100  000  Old_age  Always  -  0
198 Offline_Uncorrectable  0x0010  100  100  000  Old_age  Offline  -  0
199 UDMA_CRC_Error_Count  0x003e  200  200  000  Old_age  Always  -  0
240 Head_Flying_Hours  0x0000  100  253  000  Old_age  Offline  -  11035h+56m+58.633s
241 Total_LBAs_Written  0x0000  100  253  000  Old_age  Offline  -  8057118756
242 Total_LBAs_Read  0x0000  100  253  000  Old_age  Offline  -  177992074850

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description  Status  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline  Completed without error  00%  10990  -
# 2  Short offline  Completed without error  00%  10748  -
# 3  Extended offline  Completed without error  00%  10584  -
# 4  Short offline  Completed without error  00%  10413  -
# 5  Extended offline  Completed without error  00%  10248  -
# 6  Short offline  Completed without error  00%  10005  -
# 7  Extended offline  Completed without error  00%  9840  -
# 8  Short offline  Completed without error  00%  9669  -
# 9  Extended offline  Completed without error  00%  9505  -
#10  Short offline  Completed without error  00%  9285  -
#11  Short offline  Completed without error  00%  9157  -
#12  Short offline  Completed without error  00%  8926  -
#13  Extended offline  Completed without error  00%  8797  -
#14  Short offline  Completed without error  00%  8664  -
#15  Extended offline  Completed without error  00%  8524  -
#16  Short offline  Completed without error  00%  8185  -
#17  Extended offline  Completed without error  00%  8044  -
#18  Short offline  Completed without error  00%  7945  -
#19  Extended offline  Completed without error  00%  7804  -
#20  Short offline  Completed without error  00%  7465  -
#21  Extended offline  Completed without error  00%  7324  -





Part II

If possible, leave the original drives in place for now, add another drive, then click REPLACE in the GUI (guide below).

Be sure you're replacing the right drive by checking the serial number (double-check the drive before clicking REPLACE).
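
For anyone following along, one way to do that serial-number check from the shell is to pull the serial out of `smartctl -i`. This is only a rough sketch: the function name `serial_of` is made up for illustration, and it assumes the `Serial Number:` line looks like the smartctl 6.5 output shown in this thread.

```shell
#!/bin/sh
# serial_of: hypothetical helper (not part of smartmontools).
# Reads `smartctl -i` output on stdin and prints the drive's serial number.
# Assumes the line format "Serial Number:  <serial>" seen in this thread.
serial_of() {
    awk '/^Serial Number:/ { print $3 }'
}

# Demo on captured lines, so the sketch runs without touching a real drive.
printf 'Device Model:  WDC WD30EFRX-68AX9N0\nSerial Number:  WD-WMC1T3798899\n' | serial_of
```

On the NAS itself the equivalent would be `smartctl -i /dev/ada6 | serial_of`, run from an sh script rather than pasted into the default root shell. Compare the result with the label on the physical drive and with the serial shown in the GUI before clicking REPLACE.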

Your drive already has issues; once errors start, they normally just get worse and worse.

Have Fun
replace drive.jpg
 

VladTepes

Patron
Joined
May 18, 2016
Messages
287
Craig, thank you so much for that comprehensive reply; it is VERY much appreciated.

Unfortunately my motherboard only has capacity for 6 drives (well, boot + 6) and I have 6 drives in the array, so adding a drive doesn't appear to be an option for me. What's the best way of proceeding in that case?
 

CraigD

Patron
Joined
Mar 8, 2016
Messages
343
Just pull the drive, but make sure it is the right one. The "missing" drive will show up as a long string of numbers and letters.

Have Fun
 

Zwck

Patron
Joined
Oct 27, 2016
Messages
371
So basically pull it and put a new one in?
 

CraigD

Patron
Joined
Mar 8, 2016
Messages
343
Yes, and click REPLACE.

I've had five drives die in 14 months. Each of the failed drives had a long, hard, hot life in many different USB enclosures or computers. All were desktop drives that spun for 50,000-86,000 hours before death (3 more are still going).

The most important thing: NO DATA WAS LOST.

I'm off to bed; it's 1:40 am.

Have Fun
 

VladTepes

Patron
Joined
May 18, 2016
Messages
287
Thanks :)
I might have to cough up $150 for a new 3TB drive....

I tried to PuTTY into the NAS (to do a SMART test) but PuTTY won't play; it tells me there's no such place.
The IP works perfectly from my web browser, so I have no idea why PuTTY doesn't like it. :mad:
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
CraigD said: "Just pull the drive, but make sure it is the right one."

Follow the manual and offline the drive before pulling it. If you have swap in use on the drive you'll crash the system otherwise.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
VladTepes said: "I tried to PuTTY into the NAS (to do a SMART test) but PuTTY won't play; it tells me there's no such place. The IP works perfectly from my web browser."

You need to enable the SSH service, with root login allowed, in order to log in as root over SSH.
 

VladTepes

Patron
Joined
May 18, 2016
Messages
287
Do you mean in FreeNAS or Putty?
 

CraigD

Patron
Joined
Mar 8, 2016
Messages
343
Untitled.jpg

Stux said: "Follow the manual and offline the drive before pulling it. If you have swap in use on the drive you'll crash the system otherwise."

I keep forgetting that some people don't power down and unplug the power cable before working on their machines.

Have Fun
 

VladTepes

Patron
Joined
May 18, 2016
Messages
287
OK, so a result, of sorts.

I got PuTTY to play and ran the SMART test as instructed. It said it would take 2 minutes. I gave it more than 10 minutes and nothing happened; it seemed to just be in limbo.
So I ran smartctl -a /dev/ada6 and got the following result:

Code:
root@Remus:~ # smartctl -a /dev/ada6
smartctl 6.5 2016-05-07 r4318 [FreeBSD 11.0-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:	 Western Digital Red
Device Model:	 WDC WD30EFRX-68AX9N0
Serial Number:	WD-WMC1T3798899
LU WWN Device Id: 5 0014ee 603666ef4
Firmware Version: 80.00A80
User Capacity:	3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:	 512 bytes logical, 4096 bytes physical
Device is:		In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:	Wed Sep  6 21:37:40 2017 AEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
										was never started.
										Auto Offline Data Collection: Disabled.
Self-test execution status:	  (   0) The previous self-test routine completed
										without error or no self-test has ever
										been run.
Total time to complete Offline
data collection:				(40380) seconds.
Offline data collection
capabilities:					(0x7b) SMART execute Offline immediate.
										Auto Offline data collection on/off support.
										Suspend Offline collection upon new
										command.
										Offline surface scan supported.
										Self-test supported.
										Conveyance Self-test supported.
										Selective Self-test supported.
SMART capabilities:			(0x0003) Saves SMART data before entering
										power-saving mode.
										Supports SMART auto save timer.
Error logging capability:		(0x01) Error logging supported.
										General Purpose Logging supported.
Short self-test routine
recommended polling time:		(   2) minutes.
Extended self-test routine
recommended polling time:		( 405) minutes.
Conveyance self-test routine
recommended polling time:		(   5) minutes.
SCT capabilities:			  (0x70bd) SCT Status supported.
										SCT Error Recovery Control supported.
										SCT Feature Control supported.
										SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME		  FLAG	 VALUE WORST THRESH TYPE	  UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate	 0x002f   200   200   051	Pre-fail  Always	   -	   100
  3 Spin_Up_Time			0x0027   183   179   021	Pre-fail  Always	   -	   5825
  4 Start_Stop_Count		0x0032   098   098   000	Old_age   Always	   -	   2247
  5 Reallocated_Sector_Ct   0x0033   200   200   140	Pre-fail  Always	   -	   0
  7 Seek_Error_Rate		 0x002e   200   200   000	Old_age   Always	   -	   0
  9 Power_On_Hours		  0x0032   088   088   000	Old_age   Always	   -	   9136
10 Spin_Retry_Count		0x0032   100   100   000	Old_age   Always	   -	   0
11 Calibration_Retry_Count 0x0032   100   100   000	Old_age   Always	   -	   0
12 Power_Cycle_Count	   0x0032   098   098   000	Old_age   Always	   -	   2066
192 Power-Off_Retract_Count 0x0032   200   200   000	Old_age   Always	   -	   17
193 Load_Cycle_Count		0x0032   200   200   000	Old_age   Always	   -	   2229
194 Temperature_Celsius	 0x0022   108   101   000	Old_age   Always	   -	   42
196 Reallocated_Event_Count 0x0032   200   200   000	Old_age   Always	   -	   0
197 Current_Pending_Sector  0x0032   200   200   000	Old_age   Always	   -	   0
198 Offline_Uncorrectable   0x0030   100   253   000	Old_age   Offline	  -	   0
199 UDMA_CRC_Error_Count	0x0032   200   200   000	Old_age   Always	   -	   0
200 Multi_Zone_Error_Rate   0x0008   200   200   000	Old_age   Offline	  -	   1

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description	Status				  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline	   Completed without error	   00%	  9136		 -
# 2  Short offline	   Completed: read failure	   10%	  9136		 2171748640
# 3  Short offline	   Completed: read failure	   10%	  9094		 2171748640
# 4  Short offline	   Completed: read failure	   10%	  8854		 2171748640
# 5  Extended offline	Completed: read failure	   60%	  8759		 2171748640
# 6  Short offline	   Completed: read failure	   10%	  8686		 2171748640
# 7  Short offline	   Completed: read failure	   10%	  8518		 2171748640
# 8  Extended offline	Completed without error	   00%	  8429		 -
# 9  Short offline	   Completed: read failure	   10%	  8351		 2171748640
#10  Short offline	   Completed: read failure	   10%	  8111		 2171748640
#11  Extended offline	Completed without error	   00%	  8021		 -
#12  Short offline	   Completed without error	   00%	  7943		 -
#13  Short offline	   Completed: read failure	   10%	  7775		 2171748640
#14  Extended offline	Completed: read failure	   60%	  7680		 2171748640
#15  Short offline	   Completed: read failure	   10%	  7607		 2171748640
#16  Short offline	   Completed: read failure	   10%	  7391		 2171748640
#17  Extended offline	Completed: read failure	   60%	  7297		 2171748640
#18  Short offline	   Completed: read failure	   10%	  7224		 2171748640
#19  Short offline	   Completed: read failure	   10%	  7056		 2171748640
#20  Extended offline	Completed without error	   00%	  6966		 -
#21  Short offline	   Completed: read failure	   10%	  6888		 2171748640
10 of 16 failed self-tests are outdated by newer successful extended offline self-test # 8

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
	1		0		0  Not_testing
	2		0		0  Not_testing
	3		0		0  Not_testing
	4		0		0  Not_testing
	5		0		0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.



Now that doesn't look the same as other SMART reports I have seen in the past. Usually the FLAG column in the SMART attributes with thresholds has various letters to explain what's happening, rather than a hexadecimal code?

So:
I'm not sure why the test didn't complete, or at least tell me it completed or that it failed.
I'm not sure what the results mean (some seem to contradict others), but if I had to make a diagnosis I'd be telling it to make out a will.... ?

Comments / Explanations welcomed.
 
Joined
Dec 2, 2015
Messages
730
SMART tests are run by the drive itself, and the drive sends no feedback to the computer during a test, or when a test finishes or fails. So smartctl cannot report progress or completion to the user. The only way to find out the result of a test is to issue a command like smartctl -a /dev/XXX or, for even more detailed results, smartctl -x /dev/XXX.
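
In practice that means: start the test, wait at least as long as smartctl says, then ask the drive for its self-test log. Below is a minimal sketch, assuming a POSIX sh (save it as a script rather than pasting it into the default root shell). The name `run_short_test` is made up for illustration, and the 2-minute default is simply the figure quoted earlier in this thread.

```shell
#!/bin/sh
# run_short_test: hypothetical wrapper, not part of smartmontools.
# Usage: run_short_test <device> [wait_minutes]
run_short_test() {
    dev="$1"
    wait_minutes="${2:-2}"   # assumed default; use the "Please wait N minutes" value

    # Ask the drive to start a short self-test. smartctl returns immediately;
    # the drive runs the test on its own and reports nothing back.
    smartctl -t short "$dev" || return 1

    # Give the drive time to finish.
    sleep "$((wait_minutes * 60))"

    # Read back only the self-test log, where the result is recorded.
    smartctl -l selftest "$dev"
}
```

For example, `run_short_test /dev/ada6 2`. If the newest log entry still shows the test as in progress, wait a little longer and run `smartctl -l selftest /dev/ada6` again.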

I won't comment on the drive issue, as I'm no expert in this area. You may find the Hard Drive Troubleshooting Guide (found in the Resources section of this forum) useful.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
VladTepes said: "I'm not sure why the test didn't complete, or at least tell me it completed or that it failed."
Well, the most recent SMART test passed ("completed without error"). Sixteen of the twenty before that failed. Odds say the disk is mostly dead.
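
That tally can be reproduced from the self-test log rather than counted by eye. A rough sketch, assuming the log lines look like the smartctl 6.5 output posted above; `count_selftests` is a hypothetical helper name, and entries with any other status (aborted, interrupted, in progress) are counted in the total only.

```shell
#!/bin/sh
# count_selftests: hypothetical helper, not part of smartmontools.
# Reads `smartctl -l selftest` (or `smartctl -a`) output on stdin and
# tallies the numbered self-test log entries.
count_selftests() {
    awk '
        /^# *[0-9]+ / {                       # entry lines start with "# 1", "#10", ...
            total++
            if ($0 ~ /Completed without error/) passed++
            else if ($0 ~ /Completed: /) failed++   # e.g. "Completed: read failure"
        }
        END { printf "total=%d passed=%d failed=%d\n", total, passed, failed }
    '
}

# Demo on four entries copied from the log in this thread.
count_selftests <<'EOF'
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      9136         -
# 2  Short offline       Completed: read failure       10%      9136         2171748640
# 5  Extended offline    Completed: read failure       60%      8759         2171748640
# 8  Extended offline    Completed without error       00%      8429         -
EOF
# prints: total=4 passed=2 failed=2
```

Fed the full 21-entry log from `smartctl -l selftest /dev/ada6`, it should report 5 passed and 16 failed, which matches the count above once the newest passing test is set aside.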
 

VladTepes

Patron
Joined
May 18, 2016
Messages
287
Thanks to both of you.

Here's another related question. Now that I know to offline the drive (or power down), swap it out and then click REPLACE, that seems easy enough.
BUT do I need to 'burn in' the new disk as I did when setting up the array originally?
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504

Inxsible

Guru
Joined
Aug 14, 2017
Messages
1,123
I agree with danb35. Always burn in the drives, whether new or old.

I recently put 4x500GB used drives in my NAS and 2 of the drives had 'Reallocated_Sector_Ct' raw values set to 9 and 7. The 'Offline_Uncorrectable' and 'Current_Pending_Sector' raw values were still 0. I was wondering if I should add them to my pool, but eventually I figured I could take a chance since the values were under the threshold. But I will keep an eye on those numbers very carefully and if they ever increase, I will replace the drives.

So in your SMART tests, keep an eye on those values.
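
If it helps, here is a small sketch for pulling just those three raw values out of the attribute table. It assumes the ten-column layout printed by `smartctl -A` in this thread (raw value in the last column); `watch_attrs` is a made-up name, not a smartmontools command.

```shell
#!/bin/sh
# watch_attrs: hypothetical helper, not part of smartmontools.
# Reads `smartctl -A` output on stdin and prints NAME=RAW_VALUE for the
# three sector-health attributes discussed above.
# Assumes column 2 is the attribute name and column 10 the raw value.
watch_attrs() {
    awk '$2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable)$/ { print $2 "=" $10 }'
}

# Demo on rows copied from the WD Red output earlier in this thread.
watch_attrs <<'EOF'
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   088   088   000    Old_age   Always       -       9136
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
EOF
```

On a live system that would be `smartctl -A /dev/adaX | watch_attrs`, with the device name adjusted to suit. Saving the output after each scheduled test makes it easy to see whether any of the numbers has gone up.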
 

VladTepes

Patron
Joined
May 18, 2016
Messages
287
OK so I guess that opens up a follow up question.

Obviously I DON'T want to press REPLACE until such time as I am ready to add the new drive back into the pool.

That is, I can physically hook up the drive, do the burn-in stuff (making sure I burn in the correct drive :) ) and THEN, when all is well, click on REPLACE.

Have I got that correct?
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Yes
 

VladTepes

Patron
Joined
May 18, 2016
Messages
287
Thank you all.
 