Is this bad: SMART error (ErrorCount) detected on host: freenas


esamett

Patron
Joined
May 28, 2011
Messages
345
I recently started using the SMART service thanks to cyberjock. I got this message last night:

Code:


This message was generated by the smartd daemon running on:

  host name:  freenas
  DNS domain: domain

The following warning/error was logged by the smartd daemon:

Device: /dev/da12 [SAT], ATA error count increased from 1 to 3

Device info:
HGST HDN724040ALE640, S/N:PK2334PBK7JY9T, WWN:5-000cca-23dedb4b8, FW:MJAOA5E0, 4.00 TB

For details see host's SYSLOG.

You can also use the smartctl utility for further investigation.
No additional messages about this problem will be sent.


The drive's manufacture date is June 2014. I will recheck the cable connections today, since I have just added fans and moved the server to address temperatures of 42°C.

In case it is helpful, here is the latest freenas.domain security run output:
Code:
freenas.domain kernel log messages:
> ugen0.2: <American Power Conversion> at usbus0
> mps1: Calling Reinit from mps_wait_command
> mps1: Reinitializing controller,
> mps1: Firmware: 19.00.00.00, Driver: 16.00.00.00-fbsd
> mps1: IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>
> mps1: mps_reinit finished sc 0xffffff8000a9d000 post 4 free 3
> mps1: Reinit success
> mpssas_get_sata_identify: request for page completed with error 60mpssas_get_sata_identify: error reading SATA PASSTHRU; iocstatus = 0x43
> mpssas_get_sata_identify: error reading SATA PASSTHRU; iocstatus = 0x43
> mpssas_get_sata_identify: error reading SATA PASSTHRU; iocstatus = 0x43
> mpssas_get_sata_identify: error reading SATA PASSTHRU; iocstatus = 0x43
> failure at /fusion/jkh/921/freenas/FreeBSD/src/sys/dev/mps/mps_sas_lsi.c:667/mpssas_add_device()! Could not get ID for device with handle 0x000c
> mpssas_fw_work: failed to add device with handle 0xc
> mps_config_get_sas_device_pg0: page read with error; iocstatus = 0x20
> mps_config_get_sas_device_pg0: page read with error; iocstatus = 0x20
> mpssas_add_device: error reading SAS device page0
> mpssas_fw_work: failed to add device with handle 0xd
> mps_config_get_sas_device_pg0: page read with error; iocstatus = 0x20
> mps_config_get_sas_device_pg0: page read with error; iocstatus = 0x20
> mpssas_add_device: error reading SAS device page0
> mpssas_fw_work: failed to add device with handle 0xe
> mps_config_get_sas_device_pg0: page read with error; iocstatus = 0x20
> mps_config_get_sas_device_pg0: page read with error; iocstatus = 0x20
> mpssas_add_device: error reading SAS device page0
> mpssas_fw_work: failed to add device with handle 0xf
> mps_config_get_sas_device_pg0: page read with error; iocstatus = 0x20
> mps_config_get_sas_device_pg0: page read with error; iocstatus = 0x20
> mpssas_add_device: error reading SAS device page0
> mpssas_fw_work: failed to add device with handle 0x10
> run_interrupt_driven_hooks: still waiting after 60 seconds for xpt_config
> da6 at mps0 bus 0 scbus0 target 6 lun 0
> da6: <ATA APPLE HDD HTS547 D70F> Fixed Direct Access SCSI-6 device
> da6: Serial Number      J2200050J358LA
> da6: 300.000MB/s transfers
> da6: Command Queueing enabled
> da6: 476940MB (976773168 512 byte sectors: 255H 63S/T 60801C)
> da7 at mpt0 bus 0 scbus1 target 0 lun 0
> da7: <ATA Hitachi HDS5C302 A580> Fixed Direct Access SCSI-5 device
> da7: Serial Number      ML0220F30UE7HD
> da7: 300.000MB/s transfers
> da7: Command Queueing enabled
> da7: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
> SMP: AP CPU #1 Launched!
> SMP: AP CPU #3 Launched!
> SMP: AP CPU #2 Launched!
> da14 at mps1 bus 0 scbus3 target 6 lun 0
> da14: <ATA Hitachi HDS5C302 A580> Fixed Direct Access SCSI-6 device
> da14: Serial Number      ML0220F318WXKD
> da14: 600.000MB/s transfers
> da14: Command Queueing enabled
> da14: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
> da11 at mps1 bus 0 scbus3 target 3 lun 0
> da11: <ATA HGST HDN724040AL A5E0> Fixed Direct Access SCSI-6 device
> da11: Serial Number      PK2334PBK09G6T
> da11: 600.000MB/s transfers
> da11: Command Queueing enabled
> da11: 3815447MB (7814037168 512 byte sectors: 255H 63S/T 486401C)
> da12 at mps1 bus 0 scbus3 target 4 lun 0
> da12: <ATA HGST HDN724040AL A5E0> Fixed Direct Access SCSI-6 device
> da12: Serial Number      PK2334PBK7JY9T
> da12: 600.000MB/s transfers
> da12: Command Queueing enabled
> da12: 3815447MB (7814037168 512 byte sectors: 255H 63S/T 486401C)
> Timecounter "TSC-low" frequency 1906399595 Hz quality 1000
> vboxdrv: fAsync=0 offMin=0x86d offMax=0x323d
> (da12:mps1:0:4:0): READ(16). CDB: 88 00 00 00 00 01 10 3c f2 98 00 00 00 20 00 00
> (da12:mps1:0:4:0): CAM status: SCSI Status Error
> (da12:mps1:0:4:0): SCSI status: Check Condition
> (da12:mps1:0:4:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
> (da12:mps1:0:4:0): Retrying command (per sense data)
> (da12:mps1:0:4:0): READ(16). CDB: 88 00 00 00 00 01 06 30 97 20 00 00 00 20 00 00
> (da12:mps1:0:4:0): CAM status: SCSI Status Error
> (da12:mps1:0:4:0): SCSI status: Check Condition
> (da12:mps1:0:4:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
> (da12:mps1:0:4:0): Retrying command (per sense data)
> arp: 192.168.0.103 moved from 00:0c:29:9b:ec:f3 to bc:92:6b:2e:f9:6b on re0
> arp: 192.168.0.103 moved from bc:92:6b:2e:f9:6b to 00:0c:29:9b:ec:f3 on re0
> (da12:mps1:0:4:0): READ(16). CDB: 88 00 00 00 00 01 10 41 1c 20 00 00 00 20 00 00
> (da12:mps1:0:4:0): CAM status: SCSI Status Error
> (da12:mps1:0:4:0): SCSI status: Check Condition
> (da12:mps1:0:4:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
> (da12:mps1:0:4:0): Retrying command (per sense data)
>   (da12:mps1:0:4:0): READ(16). CDB: 88 00 00 00 00 01 10 36 44 68 00 00 01 00 00 00 length 131072 SMID 483 terminated ioc 804b scsi 0 state 0 xfer 0
> (da12:mps1:0:4:0): READ(16). CDB: 88 00 00 00 00 01 10 36 43 68 00 00 01 00 00 00
> (da12:mps1:0:4:0): CAM status: SCSI Status Error
> (da12:mps1:0:4:0): SCSI status: Check Condition
> (da12:mps1:0:4:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
> (da12:mps1:0:4:0): Retrying command (per sense data)
> (da12:mps1:0:4:0): READ(16). CDB: 88 00 00 00 00 01 10 3d eb 88 00 00 00 60 00 00
> (da12:mps1:0:4:0): CAM status: SCSI Status Error
> (da12:mps1:0:4:0): SCSI status: Check Condition
> (da12:mps1:0:4:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
> (da12:mps1:0:4:0): Retrying command (per sense data)
> (da12:mps1:0:4:0): READ(16). CDB: 88 00 00 00 00 01 10 40 78 68 00 00 00 20 00 00
> (da12:mps1:0:4:0): CAM status: SCSI Status Error
> (da12:mps1:0:4:0): SCSI status: Check Condition
> (da12:mps1:0:4:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
> (da12:mps1:0:4:0): Retrying command (per sense data)
> (da12:mps1:0:4:0): READ(16). CDB: 88 00 00 00 00 01 10 40 96 d0 00 00 00 20 00 00
> (da12:mps1:0:4:0): CAM status: SCSI Status Error
> (da12:mps1:0:4:0): SCSI status: Check Condition
> (da12:mps1:0:4:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
> (da12:mps1:0:4:0): Retrying command (per sense data)
> (da12:mps1:0:4:0): READ(16). CDB: 88 00 00 00 00 01 14 7f 27 68 00 00 00 20 00 00
> (da12:mps1:0:4:0): CAM status: SCSI Status Error
> (da12:mps1:0:4:0): SCSI status: Check Condition
> (da12:mps1:0:4:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
> (da12:mps1:0:4:0): Retrying command (per sense data)
> (da12:mps1:0:4:0): READ(16). CDB: 88 00 00 00 00 01 11 16 a8 98 00 00 00 20 00 00
> (da12:mps1:0:4:0): CAM status: SCSI Status Error
> (da12:mps1:0:4:0): SCSI status: Check Condition
> (da12:mps1:0:4:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
> (da12:mps1:0:4:0): Retrying command (per sense data)
> (da12:mps1:0:4:0): READ(16). CDB: 88 00 00 00 00 01 14 71 32 68 00 00 00 20 00 00
> (da12:mps1:0:4:0): CAM status: SCSI Status Error
> (da12:mps1:0:4:0): SCSI status: Check Condition
> (da12:mps1:0:4:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
> (da12:mps1:0:4:0): Retrying command (per sense data)
>   (da12:mps1:0:4:0): READ(16). CDB: 88 00 00 00 00 01 14 7f ee 10 00 00 01 00 00 00 length 131072 SMID 247 terminated ioc 804b scsi 0 state 0 xfer 0
> (da12:mps1:0:4:0): READ(16). CDB: 88 00 00 00 00 01 14 7f ed 10 00 00 00 a0 00 00
> (da12:mps1:0:4:0): CAM status: SCSI Status Error
> (da12:mps1:0:4:0): SCSI status: Check Condition
> (da12:mps1:0:4:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
> (da12:mps1:0:4:0): Retrying command (per sense data)
> (da12:mps1:0:4:0): READ(16). CDB: 88 00 00 00 00 01 14 74 63 70 00 00 01 00 00 00
> (da12:mps1:0:4:0): CAM status: SCSI Status Error
> (da12:mps1:0:4:0): SCSI status: Check Condition
> (da12:mps1:0:4:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
> (da12:mps1:0:4:0): Retrying command (per sense data)
> (da12:mps1:0:4:0): READ(16). CDB: 88 00 00 00 00 01 14 67 ac 00 00 00 00 20 00 00
> (da12:mps1:0:4:0): CAM status: SCSI Status Error
> (da12:mps1:0:4:0): SCSI status: Check Condition
> (da12:mps1:0:4:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
> (da12:mps1:0:4:0): Retrying command (per sense data)
> (da12:mps1:0:4:0): READ(16). CDB: 88 00 00 00 00 01 14 67 b9 88 00 00 00 20 00 00
> (da12:mps1:0:4:0): CAM status: SCSI Status Error
> (da12:mps1:0:4:0): SCSI status: Check Condition
> (da12:mps1:0:4:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
> (da12:mps1:0:4:0): Retrying command (per sense data)
> (da12:mps1:0:4:0): READ(16). CDB: 88 00 00 00 00 01 14 81 84 c8 00 00 00 20 00 00
> (da12:mps1:0:4:0): CAM status: SCSI Status Error
> (da12:mps1:0:4:0): SCSI status: Check Condition
> (da12:mps1:0:4:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
> (da12:mps1:0:4:0): Retrying command (per sense data)
>   (da12:mps1:0:4:0): READ(16). CDB: 88 00 00 00 00 01 14 82 8e 08 00 00 01 00 00 00 length 131072 SMID 525 terminated ioc 804b scsi 0 state 0 xfer 0
> (da12:mps1:0:4:0): READ(16). CDB: 88 00 00 00 00 01 14 82 8d 08 00 00 01 00 00 00
> (da12:mps1:0:4:0): CAM status: SCSI Status Error
> (da12:mps1:0:4:0): SCSI status: Check Condition
> (da12:mps1:0:4:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
> (da12:mps1:0:4:0): Retrying command (per sense data)
> (da12:mps1:0:4:0): READ(16). CDB: 88 00 00 00 00 01 14 81 3f 70 00 00 00 20 00 00
> (da12:mps1:0:4:0): CAM status: SCSI Status Error
> (da12:mps1:0:4:0): SCSI status: Check Condition
> (da12:mps1:0:4:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
> (da12:mps1:0:4:0): Retrying command (per sense data)
> (da12:mps1:0:4:0): READ(16). CDB: 88 00 00 00 00 01 14 81 41 d0 00 00 01 00 00 00
> (da12:mps1:0:4:0): CAM status: SCSI Status Error
> (da12:mps1:0:4:0): SCSI status: Check Condition
> (da12:mps1:0:4:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
> (da12:mps1:0:4:0): Retrying command (per sense data)
> (da12:mps1:0:4:0): READ(16). CDB: 88 00 00 00 00 01 14 6a 38 e0 00 00 00 20 00 00
> (da12:mps1:0:4:0): CAM status: SCSI Status Error
> (da12:mps1:0:4:0): SCSI status: Check Condition
> (da12:mps1:0:4:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
> (da12:mps1:0:4:0): Retrying command (per sense data)
> (da12:mps1:0:4:0): READ(16). CDB: 88 00 00 00 00 01 14 7a bd 00 00 00 00 20 00 00
> (da12:mps1:0:4:0): CAM status: SCSI Status Error
> (da12:mps1:0:4:0): SCSI status: Check Condition
> (da12:mps1:0:4:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
> (da12:mps1:0:4:0): Retrying command (per sense data)

-- End of security output --
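Every sense line in that log carries asc:47,3, which the kernel decodes as "Information unit iuCRC error detected": a CRC error on the link, not a failed read from the platters. A minimal Python sketch, assuming the log is available as lines of text, tallies those errors per device and decodes the starting LBA and block count from each READ(16) CDB (LBA in bytes 2 to 9, transfer length in bytes 10 to 13, big-endian). `decode_read16` and `summarize` are hypothetical names. Retries spread over many unrelated LBAs, as here, are consistent with a transport problem rather than one bad region of the disk.

```python
import re
from collections import Counter

# Matches lines like "(da12:mps1:0:4:0): READ(16). CDB: 88 00 ... 00"
CDB_RE = re.compile(r"\((da\d+):[^)]*\): READ\(16\)\. CDB: ((?:[0-9a-f]{2} ?){16})")
# Matches the sense line for an interface CRC error (ASC 0x47, ASCQ 0x03)
ICRC_RE = re.compile(r"\((da\d+):[^)]*\): SCSI sense: .*asc:47,3")


def decode_read16(cdb_hex):
    """Return (lba, blocks) from a 16-byte READ(16) CDB given as hex text."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a READ(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")      # bytes 2-9: logical block address
    blocks = int.from_bytes(cdb[10:14], "big")  # bytes 10-13: transfer length
    return lba, blocks


def summarize(log_lines):
    """Count iuCRC sense errors per device and collect the READ(16) LBAs seen."""
    icrc = Counter()
    lbas = []
    for line in log_lines:
        m = CDB_RE.search(line)
        if m:
            lbas.append(decode_read16(m.group(2))[0])
        m = ICRC_RE.search(line)
        if m:
            icrc[m.group(1)] += 1
    return icrc, lbas


if __name__ == "__main__":
    sample = [
        "> (da12:mps1:0:4:0): READ(16). CDB: 88 00 00 00 00 01 10 3c f2 98 00 00 00 20 00 00",
        "> (da12:mps1:0:4:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)",
        "> (da12:mps1:0:4:0): READ(16). CDB: 88 00 00 00 00 01 06 30 97 20 00 00 00 20 00 00",
        "> (da12:mps1:0:4:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)",
    ]
    counts, lbas = summarize(sample)
    print(dict(counts))            # {'da12': 2}
    print([hex(x) for x in lbas])  # ['0x1103cf298', '0x106309720']
```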


Upshot: is this a "return it, watch it, or forget it" message?

Thanks to all again,

e
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Please post the output of
Code:
smartctl -a /dev/da12
 

esamett

Patron
Joined
May 28, 2011
Messages
345
Thank you for the reply. Here is the data from the shell:

Code:
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model: HGST HDN724040ALE640
Serial Number: PK2334PBK7JY9T
LU WWN Device Id: 5 000cca 23dedb4b8
Firmware Version: MJAOA5E0
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 7200 rpm
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sun Sep 21 08:40:51 2014 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x80) Offline data collection activity
        was never started.
        Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
        without error or no self-test has ever
        been run.
Total time to complete Offline
data collection: ( 24) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
        Auto Offline data collection on/off support.
        Suspend Offline collection upon new
        command.
        Offline surface scan supported.
        Self-test supported.
        No Conveyance Self-test supported.
        Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
        power-saving mode.
        Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
        General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 580) minutes.
SCT capabilities: (0x003d) SCT Status supported.
        SCT Error Recovery Control supported.
        SCT Feature Control supported.
        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   137   137   054    Pre-fail  Offline      -       77
  3 Spin_Up_Time            0x0007   139   139   024    Pre-fail  Always       -       632 (Average 479)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       18
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   121   121   020    Pre-fail  Offline      -       34
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       1054
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       18
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       59
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       59
194 Temperature_Celsius     0x0002   162   162   000    Old_age   Always       -       37 (Min/Max 25/53)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       25

SMART Error Log Version: 1
ATA Error Count: 25 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 25 occurred at disk power-on lifetime: 1054 hours (43 days + 22 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 31 87 82 19 01 Error: ICRC, ABRT at LBA = 0x01198287 = 18449031

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 80 10 b8 83 19 40 00 00:24:03.657 READ FPDMA QUEUED
60 00 08 b8 82 19 40 00 00:24:03.656 READ FPDMA QUEUED
60 00 00 b8 81 19 40 00 00:24:03.656 READ FPDMA QUEUED
60 00 10 b8 80 19 40 00 00:24:03.655 READ FPDMA QUEUED
60 00 08 b8 7f 19 40 00 00:24:03.655 READ FPDMA QUEUED

Error 24 occurred at disk power-on lifetime: 1054 hours (43 days + 22 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 00 00 00 00 00

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ec 00 00 00 00 00 00 00 00:02:00.628 IDENTIFY DEVICE
ec 00 00 00 00 00 00 00 00:01:50.045 IDENTIFY DEVICE
ec 00 00 00 00 00 00 00 00:01:45.199 IDENTIFY DEVICE
ec 00 00 00 00 00 00 00 00:01:45.197 IDENTIFY DEVICE
00 e0 01 01 00 00 00 ff 00:01:44.453 NOP [Reserved subcommand] [OBS-ACS-2]

Error 23 occurred at disk power-on lifetime: 1054 hours (43 days + 22 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 00 00 00 00 00

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ec 00 00 00 00 00 00 00 00:01:50.045 IDENTIFY DEVICE
ec 00 00 00 00 00 00 00 00:01:45.199 IDENTIFY DEVICE
ec 00 00 00 00 00 00 00 00:01:45.197 IDENTIFY DEVICE
00 e0 01 01 00 00 00 ff 00:01:44.453 NOP [Reserved subcommand] [OBS-ACS-2]
00 e0 01 01 00 00 a0 ff 00:01:44.390 NOP [Reserved subcommand] [OBS-ACS-2]

Error 22 occurred at disk power-on lifetime: 1051 hours (43 days + 19 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 01 87 9e bf 0b Error: ICRC, ABRT at LBA = 0x0bbf9e87 = 197107335

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 20 00 68 9e bf 40 00 11:58:46.537 READ FPDMA QUEUED
ea 00 00 00 00 00 00 00 11:58:41.751 FLUSH CACHE EXT
61 08 08 20 be c0 40 00 11:58:41.751 WRITE FPDMA QUEUED
61 08 00 20 bc c0 40 00 11:58:41.751 WRITE FPDMA QUEUED
61 08 08 20 04 40 40 00 11:58:41.751 WRITE FPDMA QUEUED

Error 21 occurred at disk power-on lifetime: 1048 hours (43 days + 16 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:

 

The end of the printout got mixed up; here is the rest, with some overlap:
Code:

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 1047 -
# 2 Short offline Completed without error 00% 1004 -
# 3 Short offline Completed without error 00% 956 -
# 4 Short offline Completed without error 00% 908 -
# 5 Short offline Completed without error 00% 860 -
# 6 Short offline Completed without error 00% 812 -
# 7 Short offline Completed without error 00% 764 -
# 8 Extended offline Completed without error 00% 751 -
# 9 Short offline Completed without error 00% 716 -
#10 Short offline Completed without error 00% 669 -
#11 Short offline Completed without error 00% 622 -
#12 Short offline Completed without error 00% 574 -
#13 Short offline Completed without error 00% 550 -
#14 Short offline Completed without error 00% 502 -



SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

[root@freenas ~]#
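Every error-log entry shown above has ER = 0x84, which smartctl prints as "ICRC, ABRT". A minimal Python sketch of how those register bytes decode, assuming the 28-bit register layout of the summary error log (LBA bits 24 to 27 in the low nibble of DH, then CH, CL and SN); `lba28` and `decode_error_row` are hypothetical names. The UNC (uncorrectable data) bit, which would point at a media problem, is clear in all of these entries.

```python
# Bits of the ATA Error register that matter here
ER_ICRC = 0x80  # interface CRC error during an Ultra DMA transfer
ER_UNC = 0x40   # uncorrectable data error (a media problem)
ER_ABRT = 0x04  # command aborted


def lba28(sn, cl, ch, dh):
    """Rebuild a 28-bit LBA: low nibble of DH, then CH, CL, SN (high to low)."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


def decode_error_row(row):
    """Decode an 'ER ST SC SN CL CH DH' row from the smartctl ATA error log."""
    er, _st, _sc, sn, cl, ch, dh = (int(x, 16) for x in row.split()[:7])
    names = ((ER_ICRC, "ICRC"), (ER_UNC, "UNC"), (ER_ABRT, "ABRT"))
    flags = [name for bit, name in names if er & bit]
    return flags, lba28(sn, cl, ch, dh)


if __name__ == "__main__":
    row = "84 51 31 87 82 19 01 Error: ICRC, ABRT at LBA = 0x01198287 = 18449031"
    print(decode_error_row(row))  # (['ICRC', 'ABRT'], 18449031)
```

The decoded LBA matches the value smartctl itself prints for Error 25, which is a quick sanity check on the byte order.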
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
A UDMA_CRC_Error_Count of 25 isn't a good value for a drive this young.

This is most commonly caused by a bad SATA/SAS cable. Find the cable for that drive and replace it.
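The reasoning here (CRC errors climbing while the sector-level counters stay at zero) can be written down as a rough triage. This Python sketch assumes the attribute table text, for example from `smartctl -A /dev/da12`, has been captured as a string. Treating IDs 5, 196, 197 and 198 as the "media" counters, and the names `raw_values` and `triage`, are this sketch's own assumptions, not a rule from smartmontools.

```python
# Attributes that count problems on the platters themselves
MEDIA_IDS = {
    5: "Reallocated_Sector_Ct",
    196: "Reallocated_Event_Count",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}
CRC_ID = 199  # UDMA_CRC_Error_Count: CRC errors on the SATA link


def raw_values(table_text):
    """Map attribute ID -> raw value from smartctl attribute-table rows."""
    values = {}
    for line in table_text.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID and have RAW_VALUE as the 10th field
        if len(fields) >= 10 and fields[0].isdigit():
            try:
                values[int(fields[0])] = int(fields[9])
            except ValueError:
                pass  # skip vendor-formatted raw values such as "0/0"
    return values


def triage(values):
    """Rough heuristic, not a verdict: separate link errors from media errors."""
    media = {name: values[i] for i, name in MEDIA_IDS.items() if values.get(i, 0) > 0}
    if media:
        return "media counters nonzero: " + ", ".join(f"{k}={v}" for k, v in media.items())
    crc = values.get(CRC_ID, 0)
    if crc:
        return f"{crc} link CRC errors, media counters clean: suspect cable or backplane"
    return "no CRC or media errors recorded"
```

Fed the table posted above, `triage(raw_values(text))` reports 25 link CRC errors with clean media counters, which is the same conclusion reached by eye.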
 

esamett

Patron
Joined
May 28, 2011
Messages
345
Uh oh... another error email:
Code:
This message was generated by the smartd daemon running on:

  host name:  freenas
  DNS domain: domain

The following warning/error was logged by the smartd daemon:

Device: /dev/da12 [SAT], ATA error count increased from 24 to 27

Device info:
HGST HDN724040ALE640, S/N:PK2334PBK7JY9T, WWN:5-000cca-23dedb4b8, FW:MJAOA5E0, 4.00 TB

For details see host's SYSLOG.

You can also use the smartctl utility for further investigation.
No additional messages about this problem will be sent.
 

esamett

Patron
Joined
May 28, 2011
Messages
345
Code:
smartctl 6.2 2013-07-26 r3841 [FreeBSD 9.2-RELEASE-p10 amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model: HGST HDN724040ALE640
Serial Number: PK2334PBK7JY9T
LU WWN Device Id: 5 000cca 23dedb4b8
Firmware Version: MJAOA5E0
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 7200 rpm
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sun Sep 21 08:58:51 2014 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x80) Offline data collection activity
        was never started.
        Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
        without error or no self-test has ever
        been run.
Total time to complete Offline
data collection: ( 24) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
        Auto Offline data collection on/off support.
        Suspend Offline collection upon new
        command.
        Offline surface scan supported.
        Self-test supported.
        No Conveyance Self-test supported.
        Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
        power-saving mode.
        Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
        General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 580) minutes.
SCT capabilities: (0x003d) SCT Status supported.
        SCT Error Recovery Control supported.
        SCT Feature Control supported.
        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   137   137   054    Pre-fail  Offline      -       77
  3 Spin_Up_Time            0x0007   139   139   024    Pre-fail  Always       -       632 (Average 479)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       18
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   121   121   020    Pre-fail  Offline      -       34
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       1054
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       18
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       59
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       59
194 Temperature_Celsius     0x0002   162   162   000    Old_age   Always       -       37 (Min/Max 25/53)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       27

SMART Error Log Version: 1
ATA Error Count: 27 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 27 occurred at disk power-on lifetime: 1054 hours (43 days + 22 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 01 2f d6 18 01 Error: ICRC, ABRT at LBA = 0x0118d62f = 18404911

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 20 00 10 d6 18 40 00 00:33:21.527 READ FPDMA QUEUED
60 20 00 f0 d5 18 40 00 00:33:21.526 READ FPDMA QUEUED
60 20 00 d0 d5 18 40 00 00:33:21.526 READ FPDMA QUEUED
60 20 00 b0 d5 18 40 00 00:33:21.526 READ FPDMA QUEUED
60 20 00 90 d5 18 40 00 00:33:21.526 READ FPDMA QUEUED

Error 26 occurred at disk power-on lifetime: 1054 hours (43 days + 22 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 11 7f 75 16 01 Error: ICRC, ABRT at LBA = 0x0116757f = 18249087

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 20 00 70 75 16 40 00 00:31:44.572 READ FPDMA QUEUED
60 20 00 50 75 16 40 00 00:31:44.572 READ FPDMA QUEUED
60 20 00 30 75 16 40 00 00:31:44.572 READ FPDMA QUEUED
60 20 00 10 75 16 40 00 00:31:44.572 READ FPDMA QUEUED
60 20 00 d0 74 16 40 00 00:31:44.572 READ FPDMA QUEUED

Error 25 occurred at disk power-on lifetime: 1054 hours (43 days + 22 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 31 87 82 19 01 Error: ICRC, ABRT at LBA = 0x01198287 = 18449031

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 80 10 b8 83 19 40 00 00:24:03.657 READ FPDMA QUEUED
60 00 08 b8 82 19 40 00 00:24:03.656 READ FPDMA QUEUED
60 00 00 b8 81 19 40 00 00:24:03.656 READ FPDMA QUEUED
60 00 10 b8 80 19 40 00 00:24:03.655 READ FPDMA QUEUED
60 00 08 b8 7f 19 40 00 00:24:03.655 READ FPDMA QUEUED

Error 24 occurred at disk power-on lifetime: 1054 hours (43 days + 22 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 00 00 00 00 00

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ec 00 00 00 00 00 00 00 00:02:00.628 IDENTIFY DEVICE
ec 00 00 00 00 00 00 00 00:01:50.045 IDENTIFY DEVICE
ec 00 00 00 00 00 00 00 00:01:45.199 IDENTIFY DEVICE
ec 00 00 00 00 00 00 00 00:01:45.197 IDENTIFY DEVICE
00 e0 01 01 00 00 00 ff 00:01:44.453 NOP [Reserved subcommand] [OBS-ACS-2]

Error 23 occurred at disk power-on lifetime: 1054 hours (43 days + 22 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 00 00 00 00 00

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ec 00 00 00 00 00 00 00 00:01:50.045 IDENTIFY DEVICE
ec 00 00 00 00 00 00 00 00:01:45.199 IDENTIFY DEVICE
ec 00 00 00 00 00 00 00 00:01:45.197 IDENTIFY DEVICE
00 e0 01 01 00 00 00 ff 00:01:44.453 NOP [Reserved subcommand] [OBS-ACS-2]
00 e0 01 01 00 00 a0 ff 00:01:44.390 NOP [Reserved subcommand] [OBS-ACS-2]

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 1047 -
# 2 Short offline Completed without error 00% 1004 -
# 3 Short offline Completed without error 00% 956 -
# 4 Short offline Completed without error 00% 908 -
# 5 Short offline Completed without error 00% 860 -
# 6 Short offline Completed without error 00% 812 -
# 7 Short offline Completed without error 00% 764 -
# 8 Extended offline Completed without error 00% 751 -
# 9 Short offline Completed without error 00% 716 -
#10 Short offline Completed without error 00% 669 -
#11 Short offline Completed without error 00% 622 -
#12 Short offline Completed without error 00% 574 -
#13 Short offline Completed without error 00% 550 -
#14 Short offline Completed without error 00% 502 -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

[root@freenas ~]# 
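For anyone reading the error log above: every record ends with ER=84 and ST=51 right after a run of IDENTIFY DEVICE commands. Below is a minimal Python sketch that decodes those two register bytes, assuming the classic ATA/ATAPI bit layout for the Error and Status registers. The table and function names are made up for illustration and are not part of smartctl.

```python
# ATA Error register bits (classic ATA/ATAPI naming; illustrative table).
ERROR_BITS = {
    7: "ICRC",  # interface CRC error on the controller-to-drive link
    6: "UNC",   # uncorrectable data error
    5: "MC",    # media changed
    4: "IDNF",  # requested ID/sector not found
    3: "MCR",   # media change request
    2: "ABRT",  # command aborted
    1: "NM",    # no media
    0: "obs",   # obsolete
}

# ATA Status register bits (illustrative table).
STATUS_BITS = {
    7: "BSY",   # device busy
    6: "DRDY",  # device ready
    5: "DF",    # device fault
    4: "DSC",   # seek complete (obsolete in later ATA revisions)
    3: "DRQ",   # data request
    2: "CORR",  # corrected data (obsolete)
    1: "IDX",   # index (obsolete)
    0: "ERR",   # an error occurred; details are in the Error register
}


def decode(byte_hex, table):
    """Return the names of all bits set in a hex register byte, high bit first."""
    value = int(byte_hex, 16)
    return [name for bit, name in sorted(table.items(), reverse=True) if value & (1 << bit)]


def decode_registers(line):
    """Decode the ER and ST columns of an 'ER ST SC SN CL CH DH' row."""
    fields = line.split()
    return decode(fields[0], ERROR_BITS), decode(fields[1], STATUS_BITS)


print(decode_registers("84 51 00 00 00 00 00"))
# (['ICRC', 'ABRT'], ['DRDY', 'DSC', 'ERR'])
```

So ER=0x84 is ICRC plus ABRT, and ST=0x51 has the ERR bit set. ICRC errors come from the link between controller and drive, which fits the cable diagnosis in this thread better than a media failure.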
 

Yatti420

Wizard
Joined
Aug 12, 2012
Messages
1,437
Try a new SATA cable, as cj pointed out.

 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I answered you above.....
 

esamett

Patron
Joined
May 28, 2011
Messages
345
It's a SATA SFF-8087 fan-out cable. I can order a new one, but first I will make sure it is properly plugged into the controller card.

To confirm your advice: I should cancel the RMA and replace the cable, and my drive is likely healthy?

Thanks.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Looks like the drive is probably fine.

Related tip: if you use an SSH session, you don't need to pipe through more (or less) to get the full output of smartctl, and since you can resize the window, the formatting is preserved.
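A minimal sketch along the same lines, assuming Python 3.9+ on the box and smartctl on the PATH: run smartctl directly (no pager), save the full report to a file, and decode smartctl's exit status. The bit meanings follow the RETURN VALUES section of the smartctl man page; the helper names and any output path are hypothetical.

```python
import subprocess

# Exit-status bits as described in the smartctl(8) man page (RETURN VALUES).
SMARTCTL_EXIT_BITS = {
    0: "command line did not parse",
    1: "device open failed or device did not return IDENTIFY DEVICE",
    2: "some SMART or ATA command failed, or SMART data checksum error",
    3: "SMART status check returned DISK FAILING",
    4: "prefail attributes at or below threshold",
    5: "some attributes were at or below threshold in the past",
    6: "device error log contains records of errors",
    7: "device self-test log contains records of errors",
}


def decode_smartctl_status(code: int) -> list[str]:
    """Return the description of every bit set in a smartctl exit status."""
    return [text for bit, text in SMARTCTL_EXIT_BITS.items() if code & (1 << bit)]


def save_smart_report(device: str, path: str) -> list[str]:
    """Hypothetical helper: run 'smartctl -a' without a pager, write the
    complete report to path, and return the decoded exit-status bits."""
    # No check=True: smartctl uses a bitmask exit status, so a nonzero code
    # is normal on a drive that has logged errors.
    result = subprocess.run(["smartctl", "-a", device], capture_output=True, text=True)
    with open(path, "w") as report:
        report.write(result.stdout)
    return decode_smartctl_status(result.returncode)
```

For example, decode_smartctl_status(64) returns ['device error log contains records of errors'], which is the kind of status a drive with a nonempty ATA error log like the one above would be expected to produce.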
 

esamett

Patron
Joined
May 28, 2011
Messages
345
cj:
Got the cable ordered. In medicine we have to verify important information. Depending on the circumstance it is called a "read back" or a "time-out" - periodically helpful, typically annoying. :)

e guru:
SSH still intimidates me. I keep forgetting which socket wrench it needs. :rolleyes:

Thank you all for your help and patience,

e
 

esamett

Patron
Joined
May 28, 2011
Messages
345
Different problem:
lots of checksum errors on ada4p2, serial number xxx-E7HD. One disk went offline about two hours ago, but I didn't have its serial number then; I do now. I disconnected all drives and reconnected them one at a time without any disk dropping off. I restarted the server and now this happened.

I changed the cable and the interface it was using. The volume came up clean. Doing a scrub now.

Advice?
o_O
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Yeah.. sounds like you've got some bigger problems to deal with than just the cable problem. o_O
 

esamett

Patron
Joined
May 28, 2011
Messages
345
If it's a bad drive, it's about time for that to happen: about three years. Can you point me in the direction of the proper test?

Thanks.

P.S. The scrub is correcting some errors:
Code:
pool: tank2
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: http://illumos.org/msg/ZFS-8000-9P
scan: scrub in progress since Sun Sep 21 19:04:44 2014
13.4G scanned out of 5.73T at 208M/s, 7h59m to go
1008K repaired, 0.23% done
config:

NAME                                            STATE     READ WRITE CKSUM
tank2                                           ONLINE       0     0     0
  raidz3-0                                      ONLINE       0     0     0
    gptid/ac247040-2690-11e4-b8b4-ac220b508cd3  ONLINE       0     0     0
    gptid/acb24730-2690-11e4-b8b4-ac220b508cd3  ONLINE       0     0     0
    gptid/ad41bc23-2690-11e4-b8b4-ac220b508cd3  ONLINE       0     0


I just ran zpool clear tank2 and will watch for more errors. If they come back, it should be the drive?
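One way to do that watching, as a minimal sketch: capture the output of zpool status tank2 and flag every device row whose READ, WRITE or CKSUM counter is not zero. This assumes Python 3 is available and the usual NAME/STATE/READ/WRITE/CKSUM column layout; the function name and the sample rows are made up.

```python
# Device states that zpool status prints in its config table.
STATES = {"ONLINE", "DEGRADED", "FAULTED", "OFFLINE", "UNAVAIL", "REMOVED"}


def devices_with_errors(status_text):
    """Return (name, state, read, write, cksum) for each device row in
    zpool status output whose error counters are not all zero."""
    flagged = []
    for line in status_text.splitlines():
        fields = line.split()
        # Device rows look like: NAME STATE READ WRITE CKSUM [notes...]
        # Skipping rows whose second field is not a state drops headers
        # and lines such as "scan: ..." or "errors: ...".
        if len(fields) < 5 or fields[1] not in STATES:
            continue
        name, state, read, write, cksum = fields[:5]
        # Counters can be abbreviated (e.g. "1.2K"); anything but "0" counts.
        if (read, write, cksum) != ("0", "0", "0"):
            flagged.append((name, state, read, write, cksum))
    return flagged


# Made-up sample rows, not taken from this pool.
sample = """\
NAME          STATE     READ WRITE CKSUM
tank2         ONLINE       0     0     0
  raidz3-0    ONLINE       0     0     0
    gptid/a   ONLINE       0     0     0
    gptid/b   ONLINE       0     0    14
"""
print(devices_with_errors(sample))
# [('gptid/b', 'ONLINE', '0', '0', '14')]
```

Run it after each scrub; if the same disk keeps showing up after a zpool clear and a cable and port swap, that points at the drive rather than the cabling.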
 

esamett

Patron
Joined
May 28, 2011
Messages
345
More checksum errors, now on da9p2, from the same drive (serial number xxE7HD). I guess I have a bad disk. Glad I have RAIDZ3.

I just offlined the drive and will order a replacement. I'm out of spares. If I lose another drive I will shut down until replacements arrive.
 