Uncorrectable parity/CRC error on SSD

Status
Not open for further replies.

Plato

Contributor
Joined
Mar 24, 2016
Messages
101
Hi,

I have 9 x WD Red 6 TB drives (RAIDZ1) and 1 x 256 GB SSD for jails.

I have no problems with the HDDs, but on the SSD I sometimes get the error below:

Code:
Nov 25 07:52:13 freenas (ada10:ahcich15:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 00 a0 8c 3e 40 12 00 00 01 00 00
Nov 25 07:52:13 freenas (ada10:ahcich15:0:0:0): CAM status: Uncorrectable parity/CRC error
Nov 25 07:52:13 freenas (ada10:ahcich15:0:0:0): Retrying command
Nov 25 07:58:34 freenas (ada10:ahcich15:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 00 58 4d 2b 40 12 00 00 01 00 00
Nov 25 07:58:34 freenas (ada10:ahcich15:0:0:0): CAM status: Uncorrectable parity/CRC error
Nov 25 07:58:34 freenas (ada10:ahcich15:0:0:0): Retrying command
Nov 25 08:14:24 freenas (ada10:ahcich15:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 00 50 6b 3d 40 12 00 00 01 00 00
Nov 25 08:14:24 freenas (ada10:ahcich15:0:0:0): CAM status: Uncorrectable parity/CRC error
Nov 25 08:14:24 freenas (ada10:ahcich15:0:0:0): Retrying command


The thing is, this happens on one day and then not again for a month, so it isn't regular. I have also never had more than 3 errors in one burst. Today, for example, I got three errors and that was it; I don't expect any more after this.

I checked SMART and the reallocated sector count is still zero (so no bad sectors).

I did see some errors in the SMART log, though.

As far as I know, FreeNAS did eventually write the data to disk successfully. Should I be concerned about something that happens only every 1-2 months? My only jail takes about 40 GB on the disk right now (mostly Plex metadata). Should I arrange a backup for the jail volume?
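
For anyone who wants to see where on the disk those retried writes were aimed, the ACB bytes in the log can be decoded. This is only a minimal Python sketch. It assumes the byte order that FreeBSD's CAM layer appears to print (command, features, LBA low/mid/high, device, LBA low/mid/high exp, features exp, sector count, sector count exp) and that NCQ commands carry the transfer length in the features fields. `decode_acb` is a hypothetical helper name, not part of any tool.

```python
def decode_acb(acb: str) -> dict:
    """Decode the 12 hex bytes printed after 'ACB:' in a CAM error line.

    Assumed layout (not confirmed in this thread):
    cmd, features, lba_low, lba_mid, lba_high, device,
    lba_low_exp, lba_mid_exp, lba_high_exp, features_exp,
    sector_count, sector_count_exp
    """
    b = [int(x, 16) for x in acb.split()]
    if len(b) != 12:
        raise ValueError("expected 12 ACB bytes")
    # 48-bit LBA: the three 'exp' bytes form the upper half.
    lba = (b[8] << 40) | (b[7] << 32) | (b[6] << 24) | (b[4] << 16) | (b[3] << 8) | b[2]
    # Assumption: for FPDMA QUEUED commands the sector count is in the features fields.
    sectors = (b[9] << 8) | b[1]
    return {"command": b[0], "lba": lba, "sectors": sectors}


# The three ACB strings from the log above.
acbs = [
    "61 00 a0 8c 3e 40 12 00 00 01 00 00",
    "61 00 58 4d 2b 40 12 00 00 01 00 00",
    "61 00 50 6b 3d 40 12 00 00 01 00 00",
]
decoded = [decode_acb(a) for a in acbs]
for d in decoded:
    offset_gib = d["lba"] * 512 / 2**30  # 512-byte logical sectors assumed
    print(f"cmd=0x{d['command']:02x} lba=0x{d['lba']:x} "
          f"sectors={d['sectors']} offset={offset_gib:.1f} GiB")
```

Under those assumptions all three failures are 256-sector writes to three different LBAs rather than one repeated location. That would fit a link-level problem better than a single bad spot on the flash, although three samples prove little.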
 

darkwarrior

Patron
Joined
Mar 29, 2015
Messages
336
Hi there,

What SMART errors are you seeing? Can you post the full output of smartctl -x /dev/adaX?

Concerning backups:
In my opinion everything should be backed up. But if you can live with a Plex metadata rebuild, that is your choice ;-)
Personally, it would annoy me to have to re-download all the Plex data for my library...

BTW: What is the rest of your system config?
 

Plato

Contributor
Joined
Mar 24, 2016
Messages
101
Hi,

I'm using an ASRock C2750D4I with 32 GB of ECC RAM. The boot volume is on two mirrored 16 GB mini USB drives, the jail volume is on a SanDisk 256 GB SSD, and there are 9 WD Red 6 TB drives attached to the mainboard as RAIDZ1.

Here is the SMART output of the SSD:

Code:
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:	SanDisk SD8SBAT256G1122
Serial Number:	154933404374
LU WWN Device Id: 5 001b44 f1e551ed6
Firmware Version: Z2201000
User Capacity:	256,060,514,304 bytes [256 GB]
Sector Size:	  512 bytes logical/physical
Rotation Rate:	Solid State Device
Form Factor:	  2.5 inches
Device is:		Not in smartctl database [for details use: -P showall]
ATA Version is:  ACS-2 T13/2015-D revision 3
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:	Fri Nov 25 14:13:54 2016 EET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:  Unavailable
APM feature is:  Disabled
Rd look-ahead is: Enabled
Write cache is:  Enabled
ATA Security is:  Disabled, frozen [SEC2]
Wt Cache Reorder: Unavailable

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x02)  Offline data collection activity
		  was completed without error.
		  Auto Offline Data Collection: Disabled.
Self-test execution status:	  (  0)  The previous self-test routine completed
		  without error or no self-test has ever 
		  been run.
Total time to complete Offline 
data collection:	 (	0) seconds.
Offline data collection
capabilities:	   (0x11) SMART execute Offline immediate.
		  No Auto Offline data collection support.
		  Suspend Offline collection upon new
		  command.
		  No Offline surface scan supported.
		  Self-test supported.
		  No Conveyance Self-test supported.
		  No Selective Self-test supported.
SMART capabilities:			(0x0002)  Does not save SMART data before
		  entering power-saving mode.
		  Supports SMART auto save timer.
Error logging capability:		(0x01)  Error logging supported.
		  General Purpose Logging supported.
Short self-test routine 
recommended polling time:   (  2) minutes.
Extended self-test routine
recommended polling time:   (  10) minutes.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME		  FLAGS	VALUE WORST THRESH FAIL RAW_VALUE
  5 Reallocated_Sector_Ct  -O--CK  100  100  000	-	0
  9 Power_On_Hours		  -O--CK  148  100  000	-	3476
12 Power_Cycle_Count	  -O--CK  100  100  000	-	26
166 Unknown_Attribute	  -O--CK  100  100  000	-	110
167 Unknown_Attribute	  -O--CK  100  100  000	-	0
168 Unknown_Attribute	  -O--CK  100  100  000	-	354
169 Unknown_Attribute	  -O--CK  100  100  000	-	319
170 Unknown_Attribute	  -O--CK  100  100  000	-	0
171 Unknown_Attribute	  -O--CK  100  100  000	-	0
172 Unknown_Attribute	  -O--CK  100  100  000	-	0
173 Unknown_Attribute	  -O--CK  100  100  ---	-	380
174 Unknown_Attribute	  -O--CK  100  100  000	-	22
187 Reported_Uncorrect	  -O--CK  100  100  000	-	0
194 Temperature_Celsius	-O---K  070  100  000	-	30 (Min/Max 0/50)
199 UDMA_CRC_Error_Count	-O--CK  100  100  000	-	0
230 Unknown_SSD_Attribute  -O--CK  100  100  000	-	10
232 Available_Reservd_Space PO--CK  100  100  004	-	100
233 Media_Wearout_Indicator -O--CK  100  100  000	-	49454
241 Total_LBAs_Written	  ----CK  253  253  000	-	13173
242 Total_LBAs_Read		----CK  253  253  000	-	1528
							||||||_ K auto-keep
							|||||__ C event count
							||||___ R error rate
							|||____ S speed/performance
							||_____ O updated online
							|______ P prefailure warning

General Purpose Log Directory Version 1
SMART		  Log Directory Version 1 [multi-sector log support]
Address	Access  R/W  Size  Description
0x00	  GPL,SL  R/O	  1  Log Directory
0x01		  SL  R/O	  1  Summary SMART error log
0x02		  SL  R/O	  1  Comprehensive SMART error log
0x03	  GPL	R/O	  1  Ext. Comprehensive SMART error log
0x04	  GPL,SL  R/O	  8  Device Statistics log
0x06		  SL  R/O	  1  SMART self-test log
0x07	  GPL	R/O	  1  Extended self-test log
0x09		  SL  R/W	  1  Selective self-test log
0x10	  GPL	R/O	  1  SATA NCQ Queued Error log
0x11	  GPL	R/O	  1  SATA Phy Event Counters log
0x30	  GPL,SL  R/O	  9  IDENTIFY DEVICE data log
0x80-0x9f  GPL,SL  R/W	16  Host vendor specific log
0xe0	  GPL,SL  R/W	  1  SCT Command/Status
0xe1	  GPL,SL  R/W	  1  SCT Data Transfer

Warning! SMART Extended Comprehensive Error Log Structure error: invalid SMART checksum.
SMART Extended Comprehensive Error Log Version: 1 (1 sectors)
Device Error Count: 11 (device log contains only the most recent 4 errors)
  CR	= Command Register
  FEATR  = Features Register
  COUNT  = Count (was: Sector Count) Register
  LBA_48 = Upper bytes of LBA High/Mid/Low Registers ]  ATA-8
  LH	= LBA High (was: Cylinder High) Register	]  LBA
  LM	= LBA Mid (was: Cylinder Low) Register	  ] Register
  LL	= LBA Low (was: Sector Number) Register	]
  DV	= Device (was: Device/Head) Register
  DC	= Device Control Register
  ER	= Error register
  ST	= Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 11 [0] occurred at disk power-on lifetime: 65535 hours (2730 days + 15 hours)
  When the command that caused the error occurred, the device was in a vendor specific state.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  ff -- ff ff ff ff ff ff ff ff ff ff ff

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  ff ff ff ff ff ff ff ff ff ff ff ff ff 49d+17:02:47.295  [VENDOR SPECIFIC]
  ff ff ff ff ff ff ff ff ff ff ff ff ff 49d+17:02:47.295  [VENDOR SPECIFIC]
  ff ff ff ff ff ff ff ff ff ff ff ff ff 49d+17:02:47.295  [VENDOR SPECIFIC]
  ff ff ff ff ff ff ff ff ff ff ff ff ff 49d+17:02:47.295  [VENDOR SPECIFIC]
  ff ff ff ff ff ff ff ff ff ff ff ff ff 49d+17:02:47.295  [VENDOR SPECIFIC]

Error 10 [3] occurred at disk power-on lifetime: 65535 hours (2730 days + 15 hours)
  When the command that caused the error occurred, the device was in a vendor specific state.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  ff -- ff ff ff ff ff ff ff ff ff ff ff

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  ff ff ff ff ff ff ff ff ff ff ff ff ff 49d+17:02:47.295  [VENDOR SPECIFIC]
  ff ff ff ff ff ff ff ff ff ff ff ff ff 49d+17:02:47.295  [VENDOR SPECIFIC]
  ff ff ff ff ff ff ff ff ff ff ff ff ff 49d+17:02:47.295  [VENDOR SPECIFIC]
  ff ff ff ff ff ff ff ff ff ff ff ff ff 49d+17:02:47.295  [VENDOR SPECIFIC]
  ff ff ff ff ff ff ff ff ff ff ff ff ff 49d+17:02:47.295  [VENDOR SPECIFIC]

Error 9 [2] occurred at disk power-on lifetime: 65535 hours (2730 days + 15 hours)
  When the command that caused the error occurred, the device was in a vendor specific state.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  ff -- ff ff ff ff ff ff ff ff ff ff ff

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  ff ff ff ff ff ff ff ff ff ff ff ff ff 49d+17:02:47.295  [VENDOR SPECIFIC]
  ff ff ff ff ff ff ff ff ff ff ff ff ff 49d+17:02:47.295  [VENDOR SPECIFIC]
  ff ff ff ff ff ff ff ff ff ff ff ff ff 49d+17:02:47.295  [VENDOR SPECIFIC]
  ff ff ff ff ff ff ff ff ff ff ff ff ff 49d+17:02:47.295  [VENDOR SPECIFIC]
  ff ff ff ff ff ff ff ff ff ff ff ff ff 49d+17:02:47.295  [VENDOR SPECIFIC]

Error 8 [1] occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  04 -- 51 01 00 00 12 6b 00 3d 50 00 00  Error: ABRT

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  b0 00 d1 01 01 00 00 4f 00 c2 01 40 08	00:00:00.000  SMART READ ATTRIBUTE THRESHOLDS [OBS-4]
  2f 00 00 01 01 00 00 00 00 00 03 40 08	00:00:00.000  READ LOG EXT
  2f 00 00 01 01 00 00 00 00 00 00 40 08	00:00:00.000  READ LOG EXT
  b0 00 d5 01 01 00 00 4f 00 c2 00 40 08	00:00:00.000  SMART READ LOG
  b0 00 da 00 00 00 00 4f 00 c2 00 40 08	00:00:00.000  SMART RETURN STATUS

Warning! SMART Extended Self-test Log Structure error: invalid SMART checksum.
SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description	Status				  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline	  Completed without error	  00%	  3459		-
# 2  Short offline	  Unknown status (0xb)		  10%	53042		-
# 3  Short offline	  Completed without error	  00%	45441		-
# 4  Short offline	  Completed without error	  00%	  3456		-
# 5  Short offline	  Completed without error	  00%	  3456		-
# 6  Short offline	  Completed without error	  00%	  3455		-
# 7  Short offline	  Completed without error	  00%	  3455		-
# 8  Short offline	  Completed without error	  00%	  3455		-
# 9  Short offline	  Completed without error	  00%	  3454		-
#10  Short offline	  Completed without error	  00%	  3453		-
#11  Short offline	  Completed without error	  00%	  3452		-
#12  Reserved (0x1d)	Completed without error	  00%	  3451		-
#13  Reserved (0x32)	Unknown status (0xc)		  150%	  3450		-
#14  Short offline	  Unknown status (0xb)		  10%	  3449		-
#15  Short offline	  Completed without error	  00%	  3449		-
#16  Short offline	  Completed without error	  00%	  3448		-
#17  Short offline	  Completed without error	  00%	  3447		-
#18  Short offline	  Completed without error	  00%	  3446		-
#19  Reserved (0x1d)	Completed without error	  00%	  3446		-

Selective Self-tests/Logging not supported

SCT Commands not supported

Device Statistics (GP Log 0x04)
Page  Offset Size		Value Flags Description
0x01  =====  =			  =  ===  == General Statistics (rev 1) ==
0x01  0x008  4			  26  ---  Lifetime Power-On Resets
0x01  0x018  6			  0  ---  Logical Sectors Written
0x01  0x020  6			  0  ---  Number of Write Commands
0x01  0x028  6			  0  ---  Logical Sectors Read
0x01  0x030  6			  0  ---  Number of Read Commands
0x07  =====  =			  =  ===  == Solid State Device Statistics (rev 1) ==
0x07  0x008  1			  12  ---  Percentage Used Endurance Indicator
								|||_ C monitored condition met
								||__ D supports DSN
								|___ N normalized value

SATA Phy Event Counters (GP Log 0x11)
ID	  Size	Value  Description
0x0001  4			0  Command failed due to ICRC error
0x0002  4			0  R_ERR response for data FIS
0x0005  4			0  R_ERR response for non-data FIS
0x000a  4		  115  Device-to-host register FISes sent due to a COMRESET
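
When chasing CRC errors, the numbers worth pulling out of that output are attribute 199 and the SATA Phy event counters. Below is a minimal Python sketch that reads them from saved `smartctl -x` text. The sample lines are copied from the output above with whitespace normalised; the regular expressions and the `parse` helper are my own assumptions about the layout, not anything smartmontools provides.

```python
import re

# Lines copied from the smartctl -x output above (whitespace normalised).
SAMPLE = """\
  5 Reallocated_Sector_Ct   -O--CK  100  100  000  -  0
187 Reported_Uncorrect      -O--CK  100  100  000  -  0
199 UDMA_CRC_Error_Count    -O--CK  100  100  000  -  0
0x0001  4    0  Command failed due to ICRC error
0x0002  4    0  R_ERR response for data FIS
0x0005  4    0  R_ERR response for non-data FIS
0x000a  4  115  Device-to-host register FISes sent due to a COMRESET
"""

# ID, name, six flag characters, VALUE, WORST, THRESH, FAIL, then the raw value.
ATTR_RE = re.compile(r"^\s*(\d+)\s+(\S+)\s+[-A-Z]{6}\s+\d+\s+\d+\s+\S+\s+\S+\s+(\d+)")
# Counter ID, size, value, description.
PHY_RE = re.compile(r"^\s*(0x[0-9a-f]{4})\s+\d+\s+(\d+)\s+(.+)$")


def parse(text):
    """Return ({attr_id: (name, raw)}, {counter_id: (description, value)})."""
    attrs, phy = {}, {}
    for line in text.splitlines():
        m = ATTR_RE.match(line)
        if m:
            attrs[int(m.group(1))] = (m.group(2), int(m.group(3)))
            continue
        m = PHY_RE.match(line)
        if m:
            phy[int(m.group(1), 16)] = (m.group(3).strip(), int(m.group(2)))
    return attrs, phy


attrs, phy = parse(SAMPLE)
print("attribute 199:", attrs[199])
for counter_id in sorted(phy):
    print(f"phy 0x{counter_id:04x}:", phy[counter_id])
```

In this output every CRC-related counter reads zero even though the kernel logged CRC errors, while the COMRESET counter stands at 115. Whether the drive simply does not count these errors cannot be told from this output alone.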
 

darkwarrior

Patron
Joined
Mar 29, 2015
Messages
336
Hmm.. I don't have much experience with SMART results on SSDs, but IMO that drive looks fishy. o_O

First of all, the SMART results are messed up. The extended self-test log does not record LifeTime(hours) consistently with the test executions. :confused:
Additionally, it looks like that drive has not had a long SMART test in a long time (if ever?).
Can you run a long SMART test and share the results?
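
To make the "messed up" part concrete, here is a rough Python sketch that flags self-test log entries which cannot be right. It uses a handful of rows copied from the extended self-test log above and the Power_On_Hours raw value from the same output. The three checks (hours above Power_On_Hours, more than 100% remaining, unknown status) are my own assumptions about what counts as implausible.

```python
POWER_ON_HOURS = 3476  # attribute 9 raw value from the output above

# (entry, status, remaining %, lifetime hours), copied from the self-test log above.
entries = [
    (1, "Completed without error", 0, 3459),
    (2, "Unknown status (0xb)", 10, 53042),
    (3, "Completed without error", 0, 45441),
    (4, "Completed without error", 0, 3456),
    (13, "Unknown status (0xc)", 150, 3450),
    (14, "Unknown status (0xb)", 10, 3449),
]


def problems(status, remaining, hours):
    """Return a list of reasons why a log entry looks implausible."""
    reasons = []
    if hours > POWER_ON_HOURS:
        reasons.append(f"lifetime {hours} h exceeds Power_On_Hours {POWER_ON_HOURS}")
    if remaining > 100:
        reasons.append(f"{remaining}% remaining")
    if status.startswith("Unknown status"):
        reasons.append(status)
    return reasons


flagged = {}
for num, status, remaining, hours in entries:
    reasons = problems(status, remaining, hours)
    if reasons:
        flagged[num] = reasons
        print(f"#{num}: " + "; ".join(reasons))
```

Entries 2, 3, 13 and 14 are flagged, which matches the invalid checksum warning smartctl prints for this log.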
 

Plato

Contributor
Joined
Mar 24, 2016
Messages
101
Hi,

I ran a long test but it doesn't show up in the log. I checked the progress while it ran, and the remaining percentage gradually decreased from 90%. But as far as I can tell, the log looks no different:

Code:
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:	SanDisk SD8SBAT256G1122
Serial Number:	154933404374
LU WWN Device Id: 5 001b44 f1e551ed6
Firmware Version: Z2201000
User Capacity:	256,060,514,304 bytes [256 GB]
Sector Size:	  512 bytes logical/physical
Rotation Rate:	Solid State Device
Form Factor:	  2.5 inches
Device is:		Not in smartctl database [for details use: -P showall]
ATA Version is:  ACS-2 T13/2015-D revision 3
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:	Fri Nov 25 16:36:04 2016 AST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x02)  Offline data collection activity
		  was completed without error.
		  Auto Offline Data Collection: Disabled.
Self-test execution status:	  (  0)  The previous self-test routine completed
		  without error or no self-test has ever 
		  been run.
Total time to complete Offline 
data collection:	 (	0) seconds.
Offline data collection
capabilities:	   (0x11) SMART execute Offline immediate.
		  No Auto Offline data collection support.
		  Suspend Offline collection upon new
		  command.
		  No Offline surface scan supported.
		  Self-test supported.
		  No Conveyance Self-test supported.
		  No Selective Self-test supported.
SMART capabilities:			(0x0002)  Does not save SMART data before
		  entering power-saving mode.
		  Supports SMART auto save timer.
Error logging capability:		(0x01)  Error logging supported.
		  General Purpose Logging supported.
Short self-test routine 
recommended polling time:   (  2) minutes.
Extended self-test routine
recommended polling time:   (  10) minutes.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME		  FLAG	VALUE WORST THRESH TYPE	  UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct  0x0032  100  100  000	Old_age  Always	  -	  0
  9 Power_On_Hours		  0x0032  148  100  000	Old_age  Always	  -	  3476
12 Power_Cycle_Count	  0x0032  100  100  000	Old_age  Always	  -	  26
166 Unknown_Attribute	  0x0032  100  100  000	Old_age  Always	  -	  110
167 Unknown_Attribute	  0x0032  100  100  000	Old_age  Always	  -	  0
168 Unknown_Attribute	  0x0032  100  100  000	Old_age  Always	  -	  354
169 Unknown_Attribute	  0x0032  100  100  000	Old_age  Always	  -	  319
170 Unknown_Attribute	  0x0032  100  100  000	Old_age  Always	  -	  0
171 Unknown_Attribute	  0x0032  100  100  000	Old_age  Always	  -	  0
172 Unknown_Attribute	  0x0032  100  100  000	Old_age  Always	  -	  0
173 Unknown_Attribute	  0x0032  100  100  ---	Old_age  Always	  -	  380
174 Unknown_Attribute	  0x0032  100  100  000	Old_age  Always	  -	  22
187 Reported_Uncorrect	  0x0032  100  100  000	Old_age  Always	  -	  0
194 Temperature_Celsius	0x0022  066  100  000	Old_age  Always	  -	  34 (Min/Max 0/50)
199 UDMA_CRC_Error_Count	0x0032  100  100  000	Old_age  Always	  -	  0
230 Unknown_SSD_Attribute  0x0032  100  100  000	Old_age  Always	  -	  10
232 Available_Reservd_Space 0x0033  100  100  004	Pre-fail  Always	  -	  100
233 Media_Wearout_Indicator 0x0032  100  100  000	Old_age  Always	  -	  49473
241 Total_LBAs_Written	  0x0030  253  253  000	Old_age  Offline	  -	  13177
242 Total_LBAs_Read		0x0030  253  253  000	Old_age  Offline	  -	  1528

Warning! SMART ATA Error Log Structure error: invalid SMART checksum.
SMART Error Log Version: 1
Warning: ATA error count 65535 inconsistent with error log pointer 1

ATA Error Count: 65535 (device log contains only the most recent five errors)
  CR = Command Register [HEX]
  FR = Features Register [HEX]
  SC = Sector Count Register [HEX]
  SN = Sector Number Register [HEX]
  CL = Cylinder Low Register [HEX]
  CH = Cylinder High Register [HEX]
  DH = Device/Head Register [HEX]
  DC = Device Command Register [HEX]
  ER = Error register [HEX]
  ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 65535 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 51 00 a8 c2 2f 00

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  06 01 01 00 00 00 40 08	  00:00:00.000  DATA SET MANAGEMENT
  b0 da 00 00 4f c2 40 08	  00:00:00.000  SMART RETURN STATUS
  b0 d0 01 00 4f c2 40 08	  00:00:00.000  SMART READ DATA
  b0 d5 01 06 4f c2 40 08	  00:00:00.000  SMART READ LOG
  b0 d5 01 01 4f c2 40 08	  00:00:00.000  SMART READ LOG

Error 65534 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
  When the command that caused the error occurred, the device was doing SMART Offline or Self-test.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 51 a0 58 75 52 00

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  06 01 01 00 00 00 40 08	  00:00:00.000  DATA SET MANAGEMENT
  b0 da 00 00 4f c2 40 08	  00:00:00.000  SMART RETURN STATUS
  b0 d0 01 00 4f c2 40 08	  00:00:00.000  SMART READ DATA
  b0 d5 01 06 4f c2 40 08	  00:00:00.000  SMART READ LOG
  b0 d5 01 01 4f c2 40 08	  00:00:00.000  SMART READ LOG

Error 65533 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 51 20 40 ad 41 00

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  06 01 01 00 00 00 40 08	  00:00:00.000  DATA SET MANAGEMENT
  b0 da 00 00 4f c2 40 08	  00:00:00.000  SMART RETURN STATUS
  b0 d0 01 00 4f c2 40 08	  00:00:00.000  SMART READ DATA
  b0 d5 01 06 4f c2 40 08	  00:00:00.000  SMART READ LOG
  b0 d5 01 01 4f c2 40 08	  00:00:00.000  SMART READ LOG

Error 65532 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
  When the command that caused the error occurred, the device was doing SMART Offline or Self-test.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  00 00 00 18 8f 69 00

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ea 00 00 00 00 00 40 08	  00:00:00.000  FLUSH CACHE EXT
  b0 da 00 00 4f c2 40 08	  00:00:00.000  SMART RETURN STATUS
  b0 d0 01 00 4f c2 40 08	  00:00:00.000  SMART READ DATA
  b0 d5 01 06 4f c2 40 08	  00:00:00.000  SMART READ LOG
  b0 d5 01 01 4f c2 40 08	  00:00:00.000  SMART READ LOG

Error 65531 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
  When the command that caused the error occurred, the device was doing SMART Offline or Self-test.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  04 51 00 50 6b 3d 00  Error: ABRT

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  b0 d0 01 00 4f c2 40 08	  00:00:00.000  SMART READ DATA
  b0 d1 01 01 4f c2 40 08	  00:00:00.000  SMART READ ATTRIBUTE THRESHOLDS [OBS-4]
  b0 da 00 00 4f c2 40 08	  00:00:00.000  SMART RETURN STATUS
  b0 d5 01 00 4f c2 40 08	  00:00:00.000  SMART READ LOG
  b0 d5 01 01 4f c2 40 08	  00:00:00.000  SMART READ LOG

SMART Self-test log structure revision number 1
Num  Test_Description	Status				  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline	  Completed without error	  00%	  131		-
# 2  Short offline	  Completed without error	  00%	  130		-
# 3  Short offline	  Completed without error	  00%	  129		-
# 4  Short offline	  Completed without error	  00%	  128		-
# 5  Short offline	  Completed without error	  00%	  128		-
# 6  Short offline	  Completed without error	  00%	  127		-
# 7  Short offline	  Completed without error	  00%	  127		-
# 8  Short offline	  Completed without error	  00%	  127		-
# 9  Short offline	  Completed without error	  00%	  126		-
#10  Short offline	  Completed without error	  00%	  125		-
#11  Short offline	  Completed without error	  00%	  124		-
#12  Short offline	  Completed without error	  00%	  123		-
#13  Short offline	  Completed without error	  00%	  122		-
#14  Short offline	  Completed without error	  00%	  121		-
#15  Short offline	  Completed without error	  00%	  121		-
#16  Short offline	  Completed without error	  00%	  120		-
#17  Short offline	  Completed without error	  00%	  119		-
#18  Short offline	  Completed without error	  00%	  118		-
#19  Short offline	  Completed without error	  00%	  118		-
#20  Short offline	  Completed without error	  00%	  117		-
#21  Short offline	  Completed without error	  00%	  117		-

Selective Self-tests/Logging not supported
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
First, it looks like you have SMART short tests running more than once an hour. Change that to somewhere between once a day and once a week. Otherwise, you'll never have a useful test history.

Second, the drive does look a bit flaky, so if you're not ready to replace it, start backing up anything on it that you care about.
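
On the test frequency, the estimate can be reproduced from the LifeTime(hours) column of the self-test log in the previous post. This is a minimal Python sketch; since that column is itself suspect on this drive, treat the result as rough.

```python
from collections import Counter

# LifeTime(hours) for entries #1 to #21 of the self-test log in the previous post.
lifetime_hours = [131, 130, 129, 128, 128, 127, 127, 127, 126, 125, 124,
                  123, 122, 121, 121, 120, 119, 118, 118, 117, 117]

per_hour = Counter(lifetime_hours)
span_hours = max(lifetime_hours) - min(lifetime_hours)
# Average gap between consecutive tests, in minutes.
avg_interval_min = span_hours / (len(lifetime_hours) - 1) * 60
busiest_hour, busiest_count = per_hour.most_common(1)[0]

print(f"{len(lifetime_hours)} tests over {span_hours} hours")
print(f"average interval: about {avg_interval_min:.0f} minutes")
print(f"most tests logged in a single hour: {busiest_count} (hour {busiest_hour})")
```

With the whole log covering well under a day, anything that happened last week or last month has already been overwritten.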
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
It's very apparent that smartctl cannot understand your SSD. As far as I know, smartctl is better suited to hard drives than to SSDs, but these tests are read-only, so they should not affect the SSD adversely. I believe you have two options here...

1) Replace the SATA data cable between the SSD and the motherboard. See if the problem persists.
2) Replace the SSD and the SATA data cable. See if the problem persists.

Also, what does a scrub report? Have there been any repairs during a scrub? Probably not, but it should be asked.

And as @Robert Trevellyan said, change the frequency of the SMART short test. On hard drives I run the short test once a day and the long test once a week; if I had SSDs in a pool I'd probably test less frequently.
 

Plato

Contributor
Joined
Mar 24, 2016
Messages
101
Hi,

First of all, the problem still occurs from time to time (every few days). I haven't seen more than 3 messages at one time, and mostly only one.

But I will add a mirror to that SSD. I have another SSD of the same capacity, so that should at least keep any possible problems at bay.

I also adjusted my short and long SMART self-test schedules and my scrubs. So far, scrubs have not reported any problems on either the SSD or the HDDs.

I'll also change the cable when I open the case. As you said, it may be a cable problem (which I have seen before on one of my HDDs).

BTW, I also saw a DRDY (Device Ready) error on one of my HDDs, but again only occasionally (maybe 3-4 times in the last 8 months). Do you think I should replace the drive, or wait and see whether it develops into a more serious problem?

Given that this system runs 24/7, how long do you think it will take for the HDDs to degrade? I use it for my downloads (TV shows / movies) and for Plex Media Server, and it's filled almost to the brim (almost 90% full).

I know that keeping it that full decreases performance, but that will have to wait while I build the expansion for this system.
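
For a sense of scale, here is the back-of-the-envelope arithmetic as a small Python sketch. It assumes RAIDZ1 costs roughly one disk's worth of capacity to parity and ignores ZFS metadata and padding overhead, so the real numbers will be somewhat lower.

```python
DISKS = 9
DISK_TB = 6        # decimal terabytes, as marketed
PARITY_DISKS = 1   # RAIDZ1 (assumption: one disk's worth of capacity goes to parity)
FILL = 0.90        # "almost 90% full"

raw_data_tb = (DISKS - PARITY_DISKS) * DISK_TB
raw_data_tib = raw_data_tb * 1e12 / 2**40
used_tb = raw_data_tb * FILL
free_tb = raw_data_tb - used_tb

print(f"approx. data capacity: {raw_data_tb} TB ({raw_data_tib:.1f} TiB)")
print(f"used at {FILL:.0%}: {used_tb:.1f} TB, free: {free_tb:.1f} TB")
```

So on paper there are only about 5 TB left out of roughly 48 TB.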
 

LawrenceSystems

Dabbler
Joined
Jun 14, 2017
Messages
14
Just finished a build using six WD Red 6 TB drives in a RAIDZ2 setup, and under full load I am randomly getting the same error. SMART status is fine on all drives. I am wondering if it is just something between these drives and FreeNAS.
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
Please start your own thread and describe exactly what you're seeing. Include full details of your system, including hardware and software. That way people will have the best chance of helping you.
 