Boot SSD Failing?

Status
Not open for further replies.

chris crude

Patron
Joined
Oct 13, 2016
Messages
210
I received an email alert:

Device: /dev/ada2, Failed SMART usage Attribute: 1 Raw_Read_Error_Rate

So I checked ada2, and it's my 30GB SSD boot (FreeNAS OS) drive. This drive is an OCZ original; I paid as much for this SSD years ago as you would for a good 1TB drive now. It has had lots of use and is probably ready to be retired.
My question (finally): is this a normal SMART code for an SSD? It happened shortly after I upgraded from 11.1 to 11.2BETA. Maybe that's a coincidence. My system is still stable, but I do have a replacement on the way. Any education on this matter is appreciated.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
SMART codes are not always clear, and ID 1 may or may not be a true reflection of a failure. However, if the count is low (1 to 10) rather than super high (10,000...), then it's likely to be accurate. Tell you what: post the entire output of smartctl -x /dev/ada2 in code brackets and we will let you know. There are more indicators on an SSD than ID 1, but since some of these IDs are custom to the vendor, we need to see all the data in order to give you a good answer. If the "-x" switch fails, then use the "-a" switch. Also look at my little guide in my signature; it can give you pointers.
 

chris crude

Patron
Joined
Oct 13, 2016
Messages
210
Thanks Joe. I will be away from the machine at band practice tonight, but should be able to get that information within 24 hours.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
No problem. I'm hit and miss on the computer these days, but if I don't chime in, someone else most certainly will. One of the values you should be looking for is the wear level. I wish I could tell you it was a specific ID, but I've seen different ID numbers used. Another value is total data written. If you take the size of your drive and multiply it by 2000, that is roughly the total writes a typical SSD can endure. If your drive is close to that maximum, then it will fail soon with the inability to write any more data.
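To put numbers on that rule of thumb, here is a minimal Python sketch. The 2000 multiplier is only the rough figure quoted above, not a vendor specification, and the function name is made up for illustration. The capacity and Lifetime_Writes_GiB values are the ones from the smartctl output posted later in this thread.

```python
GIB = 2**30  # bytes per GiB; smartctl reports attribute 241 in GiB here


def endurance_used(capacity_bytes, written_gib, cycles=2000):
    """Fraction of the rule-of-thumb write budget already consumed.

    cycles=2000 is the rough multiplier from the post above (an assumption,
    not a manufacturer rating).
    """
    budget_bytes = capacity_bytes * cycles
    return written_gib * GIB / budget_bytes


capacity = 60_021_399_040   # "User Capacity" from the smartctl output
written = 3992              # attribute 241 Lifetime_Writes_GiB
used = endurance_used(capacity, written)

print(f"write budget: {capacity * 2000 / 1e12:.0f} TB")
print(f"budget used:  {used:.1%}")
```

By this estimate the drive has used only a few percent of its write budget, so worn-out flash alone would not explain the alert.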

On a good note, small SSDs are much cheaper these days! The Samsung 860 is a fantastic model; the 850 is great as well, and I own a few of the 850s. I think the write endurance is slightly higher on the 860 model. I only buy Samsung branded SSDs for my main computers now, just as a personal preference. I have a Samsung TV (x2), used to own a Samsung refrigerator (it stayed with the house when we moved out), and have had a few Samsung cell phones. Of course I've used all kinds of brands for other projects, and so far only one needed a firmware upgrade because it locked up after running so many hours.

Hey, good luck.
 

chris crude

Patron
Joined
Oct 13, 2016
Messages
210
Here we go
Code:
root@freenas:~ # smartctl -x /dev/ada2
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.2-STABLE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:	 SandForce Driven SSDs
Device Model:	 OCZ-VERTEX3
Serial Number:	OCZ-H0E954WL84QJO8UN
LU WWN Device Id: 5 e83a97 f15d79b4d
Firmware Version: 2.13
User Capacity:	60,021,399,040 bytes [60.0 GB]
Sector Size:	  512 bytes logical/physical
Rotation Rate:	Solid State Device
Device is:		In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS, ACS-2 T13/2015-D revision 3
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:	Sat Oct 20 19:11:27 2018 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Unavailable
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Unavailable

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
										was never started.
										Auto Offline Data Collection: Disabled.
Self-test execution status:	  (   0) The previous self-test routine completed
										without error or no self-test has ever
										been run.
Total time to complete Offline
data collection:				( 2097) seconds.
Offline data collection
capabilities:					(0x7f) SMART execute Offline immediate.
										Auto Offline data collection on/off support.
										Abort Offline collection upon new
										command.
										Offline surface scan supported.
										Self-test supported.
										Conveyance Self-test supported.
										Selective Self-test supported.
SMART capabilities:			(0x0003) Saves SMART data before entering
										power-saving mode.
										Supports SMART auto save timer.
Error logging capability:		(0x01) Error logging supported.
										General Purpose Logging supported.
Short self-test routine
recommended polling time:		(   1) minutes.
Extended self-test routine
recommended polling time:		(  48) minutes.
Conveyance self-test routine
recommended polling time:		(   2) minutes.
SCT capabilities:			  (0x0021) SCT Status supported.
										SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME		  FLAGS	VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate	 POSR--   092   092   050	-	0/171580964
  5 Retired_Block_Count	 PO--CK   100   100   003	-	0
  9 Power_On_Hours_and_Msec -O--CK   000   000   000	-	113378h+27m+47.170s
 12 Power_Cycle_Count	   -O--CK   100   100   000	-	158
171 Program_Fail_Count	  -O--CK   000   000   000	-	0
172 Erase_Fail_Count		-O--CK   000   000   000	-	0
174 Unexpect_Power_Loss_Ct  ----CK   000   000   000	-	37
177 Wear_Range_Delta		------   000   000   000	-	8
181 Program_Fail_Count	  -O--CK   000   000   000	-	0
182 Erase_Fail_Count		-O--CK   000   000   000	-	0
187 Reported_Uncorrect	  -O--CK   100   100   000	-	0
194 Temperature_Celsius	 -O---K   128   129   000	-	128 (0 127 0 129 0)
195 ECC_Uncorr_Error_Count  --SRC-   120   120   000	-	0/171580964
196 Reallocated_Event_Count PO--CK   100   100   003	-	0
201 Unc_Soft_Read_Err_Rate  --SRC-   120   120   000	-	0/171580964
204 Soft_ECC_Correct_Rate   --SRC-   120   120   000	-	0/171580964
230 Life_Curve_Status	   PO--C-   100   100   000	-	100
231 SSD_Life_Left		   PO--C-   100   100   010	-	0
233 SandForce_Internal	  ------   000   000   000	-	13705
234 SandForce_Internal	  -O--CK   000   000   000	-	3992
241 Lifetime_Writes_GiB	 -O--CK   000   000   000	-	3992
242 Lifetime_Reads_GiB	  -O--CK   000   000   000	-	24128
							||||||_ K auto-keep
							|||||__ C event count
							||||___ R error rate
							|||____ S speed/performance
							||_____ O updated online
							|______ P prefailure warning

General Purpose Log Directory Version 1
SMART		   Log Directory Version 1 [multi-sector log support]
Address	Access  R/W   Size  Description
0x00	   GPL,SL  R/O	  1  Log Directory
0x07	   GPL	 R/O	  1  Extended self-test log
0x09		   SL  R/W	  1  Selective self-test log
0x10	   GPL	 R/O	  1  NCQ Command Error log
0x11	   GPL,SL  R/O	  1  SATA Phy Event Counters log
0x80-0x9f  GPL,SL  R/W	 16  Host vendor specific log
0xb7	   GPL,SL  VS	  16  Device vendor specific log
0xd0	   GPL	 VS   65535  Device vendor specific log
0xd0	   SL	  VS	 255  Device vendor specific log
0xe0	   GPL,SL  R/W	  1  SCT Command/Status
0xe1	   GPL,SL  R/W	  1  SCT Data Transfer

SMART Extended Comprehensive Error Log (GP Log 0x03) not supported

SMART Error Log not supported

SMART Extended Self-test Log Version: 1 (1 sectors)
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
	1		0		0  Not_testing
	2		0		0  Not_testing
	3		0		0  Not_testing
	4		0		0  Not_testing
	5		0		0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:				  3
SCT Version (vendor specific):	   0 (0x0000)
SCT Support Level:				   1
Device State:						Active (0)
Current Temperature:					 ? Celsius
Power Cycle Min/Max Temperature:	 127/-127 Celsius
Lifetime	Min/Max Temperature:	  0/ 0 Celsius
Under/Over Temperature Limit Count:   0/0

SCT Temperature History Version:	 10 (Unknown, should be 2)
Temperature Sampling Period:		 1 minute
Temperature Logging Interval:		10 minutes
Min/Max recommended Temperature:	 -52/88 Celsius
Min/Max Temperature Limit:		   34/89 Celsius
Temperature History Size (Index):	30464 (451)
Invalid Temperature History Size or Index

SCT Error Recovery Control command not supported

Device Statistics (GP/SMART Log 0x04) not supported

Pending Defects log (GP Log 0x0c) not supported

SATA Phy Event Counters (GP Log 0x11)
ID	  Size	 Value  Description
0x0001  2			0  Command failed due to ICRC error
0x0003  2			0  R_ERR response for device-to-host data FIS
0x0004  2			0  R_ERR response for host-to-device data FIS
0x0006  2			0  R_ERR response for device-to-host non-data FIS
0x0007  2			0  R_ERR response for host-to-device non-data FIS
0x0008  2			0  Device-to-host non-data FIS retries
0x0009  2		   48  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2		   47  Device-to-host register FISes sent due to a COMRESET
0x000f  2			0  R_ERR response for host-to-device data FIS, CRC
0x0010  2			0  R_ERR response for host-to-device data FIS, non-CRC
0x0012  2			0  R_ERR response for host-to-device non-data FIS, CRC
0x0013  2			0  R_ERR response for host-to-device non-data FIS, non-CRC
0x0002  2			0  R_ERR response for data FIS
0x0005  2			0  R_ERR response for non-data FIS
0x000b  2			0  CRC errors within host-to-device FIS
0x000d  2			0  Non-CRC errors within host-to-device FIS
 

tres_kun

Dabbler
Joined
Oct 10, 2015
Messages
40
Only 4TB of data written at its age.
It sure failed soon.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Well, in looking at the SMART data, all I can say is I don't see any obvious failures. The data is definitely not clear. Since this problem started when you upgraded, I would suggest that you roll back to the previous version and see if the problem goes away. If it goes away, then maybe submit a bug report. Have you tried the 11.2-RC yet?
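One way to double-check "no obvious failures" is to compare each attribute's normalized VALUE against its THRESH. The sketch below is a hedged illustration, not a smartmontools API: the function names are hypothetical, the sample rows are copied from the output above, and it assumes the usual convention that an attribute counts as failing when VALUE is at or below a non-zero THRESH.

```python
# A few rows copied from the smartctl -x attribute table in this thread.
SAMPLE = """\
  1 Raw_Read_Error_Rate     POSR--   092   092   050    -    0/171580964
  5 Retired_Block_Count     PO--CK   100   100   003    -    0
171 Program_Fail_Count      -O--CK   000   000   000    -    0
231 SSD_Life_Left           PO--C-   100   100   010    -    0
"""


def parse_attributes(text):
    """Parse 'ID NAME FLAGS VALUE WORST THRESH FAIL RAW' rows into dicts."""
    rows = []
    for line in text.splitlines():
        parts = line.split(None, 7)  # keep the raw value (may contain spaces) whole
        if len(parts) < 8 or not parts[0].isdigit():
            continue  # skip headers, legends, and blank lines
        rows.append({
            "id": int(parts[0]),
            "name": parts[1],
            "value": int(parts[3]),
            "worst": int(parts[4]),
            "thresh": int(parts[5]),
            "raw": parts[7].strip(),
        })
    return rows


def at_or_below_threshold(rows):
    """Attributes whose normalized value has reached a non-zero threshold."""
    return [r for r in rows if r["thresh"] > 0 and r["value"] <= r["thresh"]]


rows = parse_attributes(SAMPLE)
flagged = at_or_below_threshold(rows)
for r in rows:
    print(r["id"], r["name"], r["value"], r["thresh"])
print("flagged:", [r["name"] for r in flagged])
```

For these rows nothing is flagged: ID 1 sits at 092 against a threshold of 050 at the time of this snapshot, which matches the PASSED overall result.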
 

Szyrs

Dabbler
Joined
Jan 23, 2015
Messages
15
Just out of curiosity, how did you manage to squeeze 113,378 "Power: On" hours in between now and 2011, when the OCZ Vertex3 was released?
 

chris crude

Patron
Joined
Oct 13, 2016
Messages
210
Good question, seeing that's nearly double the actual number of hours in a 7-year span. Some of the SMART data doesn't make sense, that bit among others.
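For anyone who wants to check the arithmetic, here is a small Python sketch. It assumes the earliest possible start is January 1, 2011 (the thread only gives the release year), and uses the "Local Time" line from the smartctl output as the end point.

```python
from datetime import datetime

reported_hours = 113_378                        # attribute 9 raw value (hours part)
earliest = datetime(2011, 1, 1)                 # assumption: start of the release year
snapshot = datetime(2018, 10, 20, 19, 11, 27)   # "Local Time is" from smartctl

# Upper bound on hours the drive could have been powered on.
max_hours = (snapshot - earliest).total_seconds() / 3600
ratio = reported_hours / max_hours

print(f"hours available since 2011: {max_hours:,.0f}")
print(f"reported / available:       {ratio:.2f}")
```

Even with the most generous start date, the reported counter is well above what the calendar allows, so the raw value of attribute 9 cannot be taken at face value on this drive.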
 