Problem with WD REDS

-Adam- · May 13, 2019

Hi!

I have a strange problem with my backup FreeNAS box. It is set up as RAIDZ1 (since it is a replica backup of my main RAIDZ2 FreeNAS).

Three days ago I got a notification that one of the four hard drives is getting errors:

FREENAS-BACKUP.local kernel log messages:

(aprobe0:ata3:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
(aprobe0:ata3:0:0:0): CAM status: Command timeout
(aprobe0:ata3:0:0:0): Retrying command
(ada1:ata3:0:0:0): SETFEATURES ENABLE WCACHE. ACB: ef 02 00 00 00 40 00 00 00 00 00 00
(ada1:ata3:0:0:0): CAM status: Command timeout
(ada1:ata3:0:0:0): Retrying command
(aprobe0:ata3:0:0:0): SET_MULTI. ACB: c6 00 00 00 00 40 00 00 00 00 10 00
(aprobe0:ata3:0:0:0): CAM status: Command timeout
(aprobe0:ata3:0:0:0): Retrying command
(aprobe0:ata3:0:0:0): SETFEATURES SET TRANSFER MODE. ACB: ef 03 00 00 00 40 00 00 00 00 45 00
(aprobe0:ata3:0:0:0): CAM status: Command timeout
(aprobe0:ata3:0:0:0): Retrying command
(ada1:ata3:0:0:0): SETFEATURES ENABLE RCACHE. ACB: ef aa 00 00 00 40 00 00 00 00 00 00
(ada1:ata3:0:0:0): CAM status: Command timeout
(ada1:ata3:0:0:0): Retrying command
(ada1:ata3:0:0:0): SETFEATURES ENABLE WCACHE. ACB: ef 02 00 00 00 40 00 00 00 00 00 00
(ada1:ata3:0:0:0): CAM status: Command timeout
(ada1:ata3:0:0:0): Retrying command
(ada1:ata3:0:0:0): SETFEATURES ENABLE RCACHE. ACB: ef aa 00 00 00 40 00 00 00 00 00 00
(ada1:ata3:0:0:0): CAM status: Command timeout
(ada1:ata3:0:0:0): Retrying command

Checking status of zfs pools:
NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
DOE_ARRAY 10.9T 6.27T 4.60T - - 0% 57% 1.00x DEGRADED /mnt
freenas-boot 14G 774M 13.2G - - - 5% 1.00x ONLINE -

pool: DOE_ARRAY
state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
repaired.
scan: scrub repaired 0 in 0 days 04:47:09 with 0 errors on Sun Apr 28 04:47:10 2019
config:

NAME STATE READ WRITE CKSUM
DOE_ARRAY DEGRADED 0 0 0
raidz1-0 DEGRADED 0 0 0
gptid/d68b5873-2938-11e9-a464-001b21b6e94c ONLINE 0 0 0
gptid/d77038ae-2938-11e9-a464-001b21b6e94c FAULTED 1 416 0 too many errors
gptid/d85a31dd-2938-11e9-a464-001b21b6e94c ONLINE 0 0 0
gptid/d93b2fae-2938-11e9-a464-001b21b6e94c ONLINE 0 0 0

errors: No known data errors

New alerts:
* Device: /dev/ada1, not capable of SMART self-check

I have purchased a new 3 TB WD Red, but today a second disk disappeared:

FREENAS-BACKUP.local kernel log messages:

(aprobe0:ata3:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
(aprobe0:ata3:0:0:0): CAM status: Command timeout
(aprobe0:ata3:0:0:0): Retrying command
(aprobe0:ata3:0:0:0): SET_MULTI. ACB: c6 00 00 00 00 40 00 00 00 00 10 00
(aprobe0:ata3:0:0:0): CAM status: Command timeout
(aprobe0:ata3:0:0:0): Retrying command
(aprobe0:ata3:0:0:0): SETFEATURES SET TRANSFER MODE. ACB: ef 03 00 00 00 40 00 00 00 00 45 00
(aprobe0:ata3:0:0:0): CAM status: Command timeout
(aprobe0:ata3:0:0:0): Retrying command
(aprobe0:ata3:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
(aprobe0:ata3:0:0:0): CAM status: Command timeout
(aprobe0:ata3:0:0:0): Retrying command
(aprobe0:ata3:0:0:0): SET_MULTI. ACB: c6 00 00 00 00 40 00 00 00 00 10 00
(aprobe0:ata3:0:0:0): CAM status: Command timeout
(aprobe0:ata3:0:0:0): Retrying command
(ada1:ata3:0:0:0): SETFEATURES ENABLE RCACHE. ACB: ef aa 00 00 00 40 00 00 00 00 00 00
(ada1:ata3:0:0:0): CAM status: Command timeout
(ada1:ata3:0:0:0): Retrying command
(aprobe0:ata3:0:0:0): SETFEATURES SET TRANSFER MODE. ACB: ef 03 00 00 00 40 00 00 00 00 45 00
(aprobe0:ata3:0:0:0): CAM status: Command timeout
(aprobe0:ata3:0:0:0): Retrying command
(ada1:ata3:0:0:0): SETFEATURES ENABLE WCACHE. ACB: ef 02 00 00 00 40 00 00 00 00 00 00
(ada1:ata3:0:0:0): CAM status: Command timeout
(ada1:ata3:0:0:0): Retrying command
(aprobe0:ata3:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00
(aprobe0:ata3:0:0:0): CAM status: Command timeout
(aprobe0:ata3:0:0:0): Retrying command
(aprobe0:ata3:0:0:0): SET_MULTI. ACB: c6 00 00 00 00 40 00 00 00 00 10 00
(aprobe0:ata3:0:0:0): CAM status: Command timeout
(aprobe0:ata3:0:0:0): Retrying command
ada2 at ata3 bus 0 scbus1 target 1 lun 0
ada2: <WDC WD30EFRX-68EUZN0 82.00A82> s/n WD-WCC4N6EATE69 detached
ada1 at ata3 bus 0 scbus1 target 0 lun 0
ada1: <WDC WD30EFRX-68EUZN0 82.00A82> s/n WD-WCC4N4CH8V2A detached
GEOM_MIRROR: Device swap1: provider ada1p1 disconnected.
GEOM_MIRROR: Device swap0: provider ada2p1 disconnected.
(ada2:ata3:0:1:0): Periph destroyed
(ada1:ata3:0:0:0): Periph destroyed
GEOM_ELI: Device mirror/swap1.eli destroyed.
GEOM_MIRROR: Device swap1: provider destroyed.
GEOM_MIRROR: Device swap1 destroyed.
GEOM_ELI: Device mirror/swap0.eli destroyed.
GEOM_MIRROR: Device swap0: provider destroyed.
GEOM_MIRROR: Device swap0 destroyed.
GEOM_MIRROR: Device mirror/swap0 launched (2/2).
GEOM_ELI: Device mirror/swap0.eli created.
GEOM_ELI: Encryption: AES-XTS 128
GEOM_ELI: Crypto: hardware

Checking status of zfs pools:
NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
DOE_ARRAY 10.9T 6.27T 4.60T - - 0% 57% 1.00x UNAVAIL /mnt
freenas-boot 14G 774M 13.2G - - - 5% 1.00x ONLINE -

pool: DOE_ARRAY
state: UNAVAIL
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
see: http://illumos.org/msg/ZFS-8000-JQ
scan: scrub repaired 0 in 0 days 04:47:09 with 0 errors on Sun Apr 28 04:47:10 2019
config:

NAME STATE READ WRITE CKSUM
DOE_ARRAY UNAVAIL 0 0 0
raidz1-0 UNAVAIL 0 0 0
gptid/d68b5873-2938-11e9-a464-001b21b6e94c ONLINE 0 0 0
gptid/d77038ae-2938-11e9-a464-001b21b6e94c FAULTED 1 416 0 too many errors
13022967432022470486 REMOVED 0 0 0 was /dev/gptid/d85a31dd-2938-11e9-a464-001b21b6e94c
gptid/d93b2fae-2938-11e9-a464-001b21b6e94c ONLINE 0 0 0

errors: 2 data errors, use '-v' for a list

I have rebooted the machine and the data is available again (the second disk, ada2, is visible), but ada1 still has to be replaced because of the SMART errors. My question is: why was ada2 removed from the pool?

Could you please take a look at the smartctl output below and tell me what you think? I will post the results in the next messages.

I'm definitely replacing ada1, but ada2 looks OK in my opinion.

Thanks!
Adam

-Adam- · May 13, 2019

ADA1 (not the full output; posts are limited to 30,000 characters)

smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.2-STABLE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Western Digital Red
Device Model: WDC WD30EFRX-68EUZN0
Serial Number: WD-WCC4N4CH8V2A
LU WWN Device Id: 5 0014ee 20df4eb67
Firmware Version: 82.00A82
User Capacity: 3,000,592,982,016 bytes [3.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Mon May 13 09:50:51 2019 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is: Unavailable
APM feature is: Unavailable
Rd look-ahead is: Enabled
Write cache is: Enabled
DSN feature is: Unavailable
ATA Security is: Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (38760) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 389) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x703d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
1 Raw_Read_Error_Rate POSR-K 200 200 051 - 0
3 Spin_Up_Time POS--K 175 171 021 - 6250
4 Start_Stop_Count -O--CK 100 100 000 - 90
5 Reallocated_Sector_Ct PO--CK 200 200 140 - 0
7 Seek_Error_Rate -OSR-K 200 200 000 - 0
9 Power_On_Hours -O--CK 073 073 000 - 20308
10 Spin_Retry_Count -O--CK 100 253 000 - 0
11 Calibration_Retry_Count -O--CK 100 253 000 - 0
12 Power_Cycle_Count -O--CK 100 100 000 - 90
192 Power-Off_Retract_Count -O--CK 200 200 000 - 67
193 Load_Cycle_Count -O--CK 200 200 000 - 172
194 Temperature_Celsius -O---K 123 109 000 - 27
196 Reallocated_Event_Count -O--CK 200 200 000 - 0
197 Current_Pending_Sector -O--CK 200 200 000 - 0
198 Offline_Uncorrectable ----CK 100 253 000 - 0
199 UDMA_CRC_Error_Count -O--CK 177 089 000 - 194661
200 Multi_Zone_Error_Rate ---R-- 200 200 000 - 0
||||||_ K auto-keep
|||||__ C event count
||||___ R error rate
|||____ S speed/performance
||_____ O updated online
|______ P prefailure warning

General Purpose Log Directory Version 1
SMART Log Directory Version 1 [multi-sector log support]
Address Access R/W Size Description
0x00 GPL,SL R/O 1 Log Directory
0x01 SL R/O 1 Summary SMART error log
0x02 SL R/O 5 Comprehensive SMART error log
0x03 GPL R/O 6 Ext. Comprehensive SMART error log
0x04 GPL,SL R/O 8 Device Statistics log
0x06 SL R/O 1 SMART self-test log
0x07 GPL R/O 1 Extended self-test log
0x09 SL R/W 1 Selective self-test log
0x0c GPL R/O 2048 Pending Defects log
0x10 GPL R/O 1 NCQ Command Error log
0x11 GPL R/O 1 SATA Phy Event Counters log
0x21 GPL R/O 1 Write stream error log
0x22 GPL R/O 1 Read stream error log
0x80-0x9f GPL,SL R/W 16 Host vendor specific log
0xa0-0xa7 GPL,SL VS 16 Device vendor specific log
0xa8-0xb7 GPL,SL VS 1 Device vendor specific log
0xbd GPL,SL VS 1 Device vendor specific log
0xc0 GPL,SL VS 1 Device vendor specific log
0xc1 GPL VS 93 Device vendor specific log
0xe0 GPL,SL R/W 1 SCT Command/Status
0xe1 GPL,SL R/W 1 SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
Device Error Count: 42 (device log contains only the most recent 24 errors)
CR = Command Register
FEATR = Features Register
COUNT = Count (was: Sector Count) Register
LBA_48 = Upper bytes of LBA High/Mid/Low Registers ] ATA-8
LH = LBA High (was: Cylinder High) Register ] LBA
LM = LBA Mid (was: Cylinder Low) Register ] Register
LL = LBA Low (was: Sector Number) Register ]
DV = Device (was: Device/Head) Register
DC = Device Control Register
ER = Error register
ST = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 42 [17] occurred at disk power-on lifetime: 20247 hours (843 days + 15 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
10 -- 51 00 48 00 01 19 05 4a 98 40 00 Error: IDNF 72 sectors at LBA = 0x119054a98 = 4714744472

Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
35 00 00 00 48 00 01 19 05 4a 98 40 00 13d+22:35:50.299 WRITE DMA EXT
35 00 00 00 78 00 01 19 05 4a 18 40 00 13d+22:35:50.299 WRITE DMA EXT
35 00 00 00 38 00 00 a7 c6 3a d8 40 00 13d+22:35:50.299 WRITE DMA EXT
35 00 00 00 50 00 00 74 20 72 90 40 00 13d+22:35:50.298 WRITE DMA EXT
35 00 00 00 88 00 00 49 6e 49 78 40 00 13d+22:35:50.296 WRITE DMA EXT

Error 41 [16] occurred at disk power-on lifetime: 20247 hours (843 days + 15 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
10 -- 51 00 10 00 00 49 6e 48 c0 40 00 Error: IDNF 16 sectors at LBA = 0x496e48c0 = 1231964352

Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
35 00 00 00 10 00 00 49 6e 48 c0 40 00 13d+22:31:10.116 WRITE DMA EXT
35 00 00 00 08 00 00 47 4c 03 88 40 00 13d+22:31:10.116 WRITE DMA EXT
35 00 00 00 08 00 00 47 4c 03 78 40 00 13d+22:31:10.116 WRITE DMA EXT
35 00 00 00 28 00 00 46 13 59 f0 40 00 13d+22:31:10.115 WRITE DMA EXT
35 00 00 00 10 00 00 46 13 59 d8 40 00 13d+22:31:10.115 WRITE DMA EXT

Warning! SMART Extended Comprehensive Error Log Structure error: invalid SMART checksum.
Error 40 [15] occurred at disk power-on lifetime: 18234 hours (759 days + 18 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
10 -- b1 60 34 00 1a 1d bc da 98 a4 00 Device Fault; Error: IDNF 24628 sectors at LBA = 0x1a1dbcda98 = 112168065688

Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
35 00 00 00 30 00 01 12 bc ca 98 40 00 29d+12:02:07.357 WRITE DMA EXT
35 00 00 01 00 00 01 12 bc c9 98 40 00 29d+12:02:07.356 WRITE DMA EXT
35 00 00 01 00 00 01 12 bc c8 98 40 00 29d+12:02:07.355 WRITE DMA EXT
35 00 00 01 00 00 01 12 bc c7 98 40 00 29d+12:02:07.354 WRITE DMA EXT
35 00 00 01 00 00 01 12 bc c6 98 40 00 29d+12:02:07.353 WRITE DMA EXT

Error 39 [14] occurred at disk power-on lifetime: 18234 hours (759 days + 18 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
10 -- 51 01 00 00 01 14 1a 92 40 40 00 Error: IDNF 256 sectors at LBA = 0x1141a9240 = 4632252992

Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
35 00 00 01 00 00 01 14 1a 92 40 40 00 29d+11:54:33.982 WRITE DMA EXT
35 00 00 01 00 00 01 14 1a 91 40 40 00 29d+11:54:33.981 WRITE DMA EXT
35 00 00 00 d8 00 01 14 1a 90 68 40 00 29d+11:54:33.980 WRITE DMA EXT
35 00 00 01 00 00 01 14 1a 8f 68 40 00 29d+11:54:33.979 WRITE DMA EXT
35 00 00 01 00 00 01 14 1a 8e 68 40 00 29d+11:54:33.977 WRITE DMA EXT

Error 38 [13] occurred at disk power-on lifetime: 18234 hours (759 days + 18 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
10 -- 51 00 10 00 01 3f 42 e3 f8 40 00 Error: IDNF 16 sectors at LBA = 0x13f42e3f8 = 5356315640

Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
35 00 00 00 10 00 01 3f 42 e3 f8 40 00 29d+11:53:32.593 WRITE DMA EXT
35 00 00 00 08 00 01 3f 42 e3 e8 40 00 29d+11:53:32.593 WRITE DMA EXT
35 00 00 00 10 00 01 19 cd 4e b0 40 00 29d+11:53:32.593 WRITE DMA EXT
35 00 00 00 10 00 01 13 db 35 f8 40 00 29d+11:53:32.593 WRITE DMA EXT
35 00 00 00 08 00 01 13 db 35 e0 40 00 29d+11:53:32.593 WRITE DMA EXT

Error 37 [12] occurred at disk power-on lifetime: 18187 hours (757 days + 19 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
10 -- 51 01 00 00 00 7f ca 00 20 40 00 Error: IDNF 256 sectors at LBA = 0x7fca0020 = 2143944736

Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
35 00 00 01 00 00 00 7f ca 00 20 40 00 27d+12:26:52.666 WRITE DMA EXT
35 00 00 00 20 00 00 7f ca 00 00 40 00 27d+12:26:52.665 WRITE DMA EXT
35 00 00 01 00 00 00 7f c9 ff 00 40 00 27d+12:26:52.664 WRITE DMA EXT
35 00 00 01 00 00 00 7f c9 fe 00 40 00 27d+12:26:52.664 WRITE DMA EXT
35 00 00 01 00 00 00 7f c9 fd 00 40 00 27d+12:26:52.661 WRITE DMA EXT

Error 36 [11] occurred at disk power-on lifetime: 18187 hours (757 days + 19 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
10 -- 51 00 e0 00 00 7f 75 91 48 40 00 Error: IDNF 224 sectors at LBA = 0x7f759148 = 2138411336

Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
35 00 00 00 e0 00 00 7f 75 91 48 40 00 27d+12:25:28.220 WRITE DMA EXT
35 00 00 01 00 00 00 7f 75 90 48 40 00 27d+12:25:28.219 WRITE DMA EXT
35 00 00 01 00 00 00 7f 75 8f 48 40 00 27d+12:25:28.218 WRITE DMA EXT
35 00 00 01 00 00 00 7f 75 8e 48 40 00 27d+12:25:28.217 WRITE DMA EXT
35 00 00 01 00 00 00 7f 75 8d 48 40 00 27d+12:25:28.216 WRITE DMA EXT

Error 35 [10] occurred at disk power-on lifetime: 18187 hours (757 days + 19 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
10 -- 51 01 00 00 00 7d fe 13 98 40 00 Error: IDNF 256 sectors at LBA = 0x7dfe1398 = 2113803160

Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
35 00 00 01 00 00 00 7d fe 13 98 40 00 27d+12:24:18.899 WRITE DMA EXT
35 00 00 01 00 00 00 7d fe 12 98 40 00 27d+12:24:18.898 WRITE DMA EXT
35 00 00 01 00 00 00 7d fe 11 98 40 00 27d+12:24:18.897 WRITE DMA EXT
35 00 00 00 30 00 00 7d fe 11 68 40 00 27d+12:24:18.896 WRITE DMA EXT
35 00 00 01 00 00 00 7d fe 10 68 40 00 27d+12:24:18.895 WRITE DMA EXT

SMART Extended Self-test Log Version: 1 (1 sectors)
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 17191 -
# 2 Extended offline Completed without error 00% 16447 -
# 3 Extended offline Completed without error 00% 15728 -
# 4 Extended offline Completed without error 00% 14989 -
# 5 Extended offline Completed without error 00% 14270 -
# 6 Extended offline Completed without error 00% 13526 -
# 7 Extended offline Completed without error 00% 12783 -
# 8 Extended offline Completed without error 00% 12065 -
# 9 Extended offline Completed without error 00% 11322 -
#10 Extended offline Completed without error 00% 10644 -
#11 Extended offline Completed without error 00% 9930 -
#12 Extended offline Completed without error 00% 9259 -
#13 Extended offline Completed without error 00% 8515 -
#14 Extended offline Completed without error 00% 7772 -
#15 Extended offline Completed without error 00% 7053 -
#16 Extended offline Completed without error 00% 6308 -
#17 Extended offline Completed without error 00% 5589 -
#18 Extended offline Completed without error 00% 4846 -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version: 3
SCT Version (vendor specific): 258 (0x0102)
SCT Support Level: 1
Device State: Active (0)
Current Temperature: 27 Celsius
Power Cycle Min/Max Temperature: 25/27 Celsius
Lifetime Min/Max Temperature: 2/41 Celsius
Under/Over Temperature Limit Count: 0/0
Vendor specific:
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

SCT Temperature History Version: 2
Temperature Sampling Period: 1 minute
Temperature Logging Interval: 1 minute
Min/Max recommended Temperature: 0/60 Celsius
Min/Max Temperature Limit: -41/85 Celsius
Temperature History Size (Index): 478 (400)
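
As a sanity check on the error log above, the reported LBAs can be compared with the size of the disk. The sketch below is only an illustration: the capacity and sector size are copied from the information section, the hex LBAs from errors 39 to 42, and `check_lba` is a hypothetical helper name. Under those assumptions, error 40 (the entry that follows the "invalid SMART checksum" warning) points well beyond the last sector, which may mean that entry is garbled rather than a real location on the platters.

```python
SECTOR_BYTES = 512                   # logical sector size reported by smartctl
CAPACITY_BYTES = 3_000_592_982_016   # "User Capacity" from the information section
MAX_LBA = CAPACITY_BYTES // SECTOR_BYTES - 1

# Error number -> LBA in hex, copied from the ADA1 error log.
ERRORS = {
    42: "0x119054a98",
    41: "0x496e48c0",
    40: "0x1a1dbcda98",
    39: "0x1141a9240",
}


def check_lba(hex_lba, max_lba=MAX_LBA):
    """Return the LBA as an integer and whether it lies within the disk."""
    lba = int(hex_lba, 16)
    return lba, lba <= max_lba


if __name__ == "__main__":
    for number, hex_lba in ERRORS.items():
        lba, in_range = check_lba(hex_lba)
        verdict = "within disk" if in_range else "beyond end of disk"
        print(f"Error {number}: LBA {lba} ({verdict})")
```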

-Adam- · May 13, 2019

ADA2

smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.2-STABLE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Western Digital Red
Device Model: WDC WD30EFRX-68EUZN0
Serial Number: WD-WCC4N6EATE69
LU WWN Device Id: 5 0014ee 2b89fdea1
Firmware Version: 82.00A82
User Capacity: 3,000,592,982,016 bytes [3.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5400 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Mon May 13 09:46:44 2019 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is: Unavailable
APM feature is: Unavailable
Rd look-ahead is: Enabled
Write cache is: Enabled
DSN feature is: Unavailable
ATA Security is: Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (39060) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 392) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x703d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
1 Raw_Read_Error_Rate POSR-K 200 200 051 - 45
3 Spin_Up_Time POS--K 176 172 021 - 6183
4 Start_Stop_Count -O--CK 100 100 000 - 90
5 Reallocated_Sector_Ct PO--CK 200 200 140 - 0
7 Seek_Error_Rate -OSR-K 200 200 000 - 0
9 Power_On_Hours -O--CK 073 073 000 - 20307
10 Spin_Retry_Count -O--CK 100 253 000 - 0
11 Calibration_Retry_Count -O--CK 100 253 000 - 0
12 Power_Cycle_Count -O--CK 100 100 000 - 90
192 Power-Off_Retract_Count -O--CK 200 200 000 - 67
193 Load_Cycle_Count -O--CK 200 200 000 - 177
194 Temperature_Celsius -O---K 123 109 000 - 27
196 Reallocated_Event_Count -O--CK 200 200 000 - 0
197 Current_Pending_Sector -O--CK 200 200 000 - 0
198 Offline_Uncorrectable ----CK 100 253 000 - 0
199 UDMA_CRC_Error_Count -O--CK 200 200 000 - 0
200 Multi_Zone_Error_Rate ---R-- 200 200 000 - 7
||||||_ K auto-keep
|||||__ C event count
||||___ R error rate
|||____ S speed/performance
||_____ O updated online
|______ P prefailure warning

General Purpose Log Directory Version 1
SMART Log Directory Version 1 [multi-sector log support]
Address Access R/W Size Description
0x00 GPL,SL R/O 1 Log Directory
0x01 SL R/O 1 Summary SMART error log
0x02 SL R/O 5 Comprehensive SMART error log
0x03 GPL R/O 6 Ext. Comprehensive SMART error log
0x04 GPL,SL R/O 8 Device Statistics log
0x06 SL R/O 1 SMART self-test log
0x07 GPL R/O 1 Extended self-test log
0x09 SL R/W 1 Selective self-test log
0x0c GPL R/O 2048 Pending Defects log
0x10 GPL R/O 1 NCQ Command Error log
0x11 GPL R/O 1 SATA Phy Event Counters log
0x21 GPL R/O 1 Write stream error log
0x22 GPL R/O 1 Read stream error log
0x80-0x9f GPL,SL R/W 16 Host vendor specific log
0xa0-0xa7 GPL,SL VS 16 Device vendor specific log
0xa8-0xb7 GPL,SL VS 1 Device vendor specific log
0xbd GPL,SL VS 1 Device vendor specific log
0xc0 GPL,SL VS 1 Device vendor specific log
0xc1 GPL VS 93 Device vendor specific log
0xe0 GPL,SL R/W 1 SCT Command/Status
0xe1 GPL,SL R/W 1 SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (6 sectors)
Device Error Count: 1
CR = Command Register
FEATR = Features Register
COUNT = Count (was: Sector Count) Register
LBA_48 = Upper bytes of LBA High/Mid/Low Registers ] ATA-8
LH = LBA High (was: Cylinder High) Register ] LBA
LM = LBA Mid (was: Cylinder Low) Register ] Register
LL = LBA Low (was: Sector Number) Register ]
DV = Device (was: Device/Head) Register
DC = Device Control Register
ER = Error register
ST = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 1 [0] occurred at disk power-on lifetime: 16779 hours (699 days + 3 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER -- ST COUNT LBA_48 LH LM LL DV DC
-- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 b8 00 00 01 b4 ff 88 e1 00 Error: UNC 184 sectors at LBA = 0x01b4ff88 = 28639112

Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name
-- == -- == -- == == == -- -- -- -- -- --------------- --------------------
c8 00 00 00 b8 00 00 01 b4 fe f0 e1 00 7d+16:12:06.960 READ DMA
c8 00 00 00 b0 00 00 01 b4 fe 40 e1 00 7d+16:12:06.960 READ DMA
c8 00 00 00 b0 00 00 01 b4 fd 90 e1 00 7d+16:12:06.959 READ DMA
c8 00 00 00 b0 00 00 01 b4 fc e0 e1 00 7d+16:12:06.958 READ DMA
c8 00 00 00 b0 00 00 01 b4 fc 30 e1 00 7d+16:12:06.958 READ DMA

SMART Extended Self-test Log Version: 1 (1 sectors)
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed: read failure 90% 17183 28639112
# 2 Extended offline Completed without error 00% 16447 -
# 3 Extended offline Completed without error 00% 15728 -
# 4 Extended offline Completed without error 00% 14989 -
# 5 Extended offline Completed without error 00% 14270 -
# 6 Extended offline Completed without error 00% 13526 -
# 7 Extended offline Completed without error 00% 12783 -
# 8 Extended offline Completed without error 00% 12065 -
# 9 Extended offline Completed without error 00% 11322 -
#10 Extended offline Completed without error 00% 10643 -
#11 Extended offline Completed without error 00% 9930 -
#12 Extended offline Completed without error 00% 9259 -
#13 Extended offline Completed without error 00% 8516 -
#14 Extended offline Completed without error 00% 7772 -
#15 Extended offline Completed without error 00% 7052 -
#16 Extended offline Completed without error 00% 6308 -
#17 Extended offline Completed without error 00% 5589 -
#18 Extended offline Completed without error 00% 4846 -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version: 3
SCT Version (vendor specific): 258 (0x0102)
SCT Support Level: 1
Device State: Active (0)
Current Temperature: 27 Celsius
Power Cycle Min/Max Temperature: 26/27 Celsius
Lifetime Min/Max Temperature: 2/41 Celsius
Under/Over Temperature Limit Count: 0/0
Vendor specific:
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

SCT Temperature History Version: 2
Temperature Sampling Period: 1 minute
Temperature Logging Interval: 1 minute
Min/Max recommended Temperature: 0/60 Celsius
Min/Max Temperature Limit: -41/85 Celsius
Temperature History Size (Index): 478 (287)

Index Estimated Time Temperature Celsius
288 2019-05-13 01:49 26 *******
... ..(364 skipped). .. *******
175 2019-05-13 07:54 26 *******
176 2019-05-13 07:55 ? -
177 2019-05-13 07:56 27 ********
178 2019-05-13 07:57 ? -
179 2019-05-13 07:58 27 ********
180 2019-05-13 07:59 26 *******
... ..( 13 skipped). .. *******
194 2019-05-13 08:13 26 *******
195 2019-05-13 08:14 27 ********
... ..( 35 skipped). .. ********
231 2019-05-13 08:50 27 ********
232 2019-05-13 08:51 26 *******
... ..( 54 skipped). .. *******
287 2019-05-13 09:46 26 *******

SCT Error Recovery Control:
Read: 70 (7.0 seconds)
Write: 70 (7.0 seconds)

Device Statistics (GP Log 0x04)
Page Offset Size Value Flags Description
0x01 ===== = = === == General Statistics (rev 2) ==
0x01 0x008 4 90 --- Lifetime Power-On Resets
0x01 0x010 4 20307 --- Power-on Hours
0x01 0x018 6 46761129643 --- Logical Sectors Written
0x01 0x020 6 385986052 --- Number of Write Commands
0x01 0x028 6 87563724494 --- Logical Sectors Read
0x01 0x030 6 434019481 --- Number of Read Commands
0x03 ===== = = === == Rotating Media Statistics (rev 1) ==
0x03 0x008 4 19935 --- Spindle Motor Power-on Hours
0x03 0x010 4 19918 --- Head Flying Hours
0x03 0x018 4 245 --- Head Load Events
0x03 0x020 4 0 --- Number of Reallocated Logical Sectors
0x03 0x028 4 329 --- Read Recovery Attempts
0x03 0x030 4 0 --- Number of Mechanical Start Failures
0x04 ===== = = === == General Errors Statistics (rev 1) ==
0x04 0x008 4 1 --- Number of Reported Uncorrectable Errors
0x04 0x010 4 0 --- Resets Between Cmd Acceptance and Completion
0x05 ===== = = === == Temperature Statistics (rev 1) ==
0x05 0x008 1 26 --- Current Temperature
0x05 0x010 1 25 --- Average Short Term Temperature
0x05 0x018 1 27 --- Average Long Term Temperature
0x05 0x020 1 40 --- Highest Temperature
0x05 0x028 1 16 --- Lowest Temperature
0x05 0x030 1 35 --- Highest Average Short Term Temperature
0x05 0x038 1 20 --- Lowest Average Short Term Temperature
0x05 0x040 1 29 --- Highest Average Long Term Temperature
0x05 0x048 1 22 --- Lowest Average Long Term Temperature
0x05 0x050 4 0 --- Time in Over-Temperature
0x05 0x058 1 60 --- Specified Maximum Operating Temperature
0x05 0x060 4 0 --- Time in Under-Temperature
0x05 0x068 1 0 --- Specified Minimum Operating Temperature
0x06 ===== = = === == Transport Statistics (rev 1) ==
0x06 0x008 4 821 --- Number of Hardware Resets
0x06 0x010 4 19295 --- Number of ASR Events
0x06 0x018 4 0 --- Number of Interface CRC Errors
|||_ C monitored condition met
||__ D supports DSN
|___ N normalized value

Pending Defects log (GP Log 0x0c) supported [please try: '-l defects']

SATA Phy Event Counters (GP Log 0x11)
ID Size Value Description
0x0001 2 0 Command failed due to ICRC error
0x0002 2 0 R_ERR response for data FIS
0x0003 2 0 R_ERR response for device-to-host data FIS
0x0004 2 0 R_ERR response for host-to-device data FIS
0x0005 2 0 R_ERR response for non-data FIS
0x0006 2 0 R_ERR response for device-to-host non-data FIS
0x0007 2 0 R_ERR response for host-to-device non-data FIS
0x0008 2 0 Device-to-host non-data FIS retries
0x0009 2 17 Transition from drive PhyRdy to drive PhyNRdy
0x000a 2 18 Device-to-host register FISes sent due to a COMRESET
0x000b 2 0 CRC errors within host-to-device FIS
0x000f 2 0 R_ERR response for host-to-device data FIS, CRC
0x0012 2 0 R_ERR response for host-to-device non-data FIS, CRC
0x8000 4 3914 Vendor specific

sretalla · May 13, 2019

It looks to me like you pulled the wrong drive for replacement.

Did you go by serial number, or were you relying on the adaX label (which can change at reboot or when drives are changed)?

The disk in the second position of the zpool status was the faulted one in the first output and it still was in the second, but you had replaced the third one.
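
The positions mentioned here can be read straight out of the pasted `zpool status` text. The following is a minimal sketch, not a FreeNAS tool: the member lines are copied from the second output in this thread, and `unhealthy_members` is a hypothetical helper. It only reports gptid labels; matching a gptid to an adaX name and a serial number would still have to be done on the machine itself (for example with `glabel status` and `smartctl -i`).

```python
import re

# Member lines copied from the second `zpool status` output in this thread.
ZPOOL_CONFIG = """\
gptid/d68b5873-2938-11e9-a464-001b21b6e94c ONLINE 0 0 0
gptid/d77038ae-2938-11e9-a464-001b21b6e94c FAULTED 1 416 0 too many errors
13022967432022470486 REMOVED 0 0 0 was /dev/gptid/d85a31dd-2938-11e9-a464-001b21b6e94c
gptid/d93b2fae-2938-11e9-a464-001b21b6e94c ONLINE 0 0 0
"""

# name, state, READ, WRITE, CKSUM, optional trailing note
MEMBER = re.compile(
    r"^(\S+)\s+(ONLINE|DEGRADED|FAULTED|REMOVED|UNAVAIL|OFFLINE)"
    r"\s+(\d+)\s+(\d+)\s+(\d+)(?:\s+(.*))?$"
)


def unhealthy_members(config_text):
    """Return (position, name, state, note) for each member that is not ONLINE."""
    result = []
    for position, line in enumerate(config_text.splitlines(), start=1):
        match = MEMBER.match(line.strip())
        if match and match.group(2) != "ONLINE":
            result.append(
                (position, match.group(1), match.group(2), match.group(6) or "")
            )
    return result


if __name__ == "__main__":
    for position, name, state, note in unhealthy_members(ZPOOL_CONFIG):
        print(f"position {position}: {name} {state} {note}".rstrip())
```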

-Adam- · May 13, 2019

I haven't done anything yet. ADA1 has to be replaced, but ADA2 was disconnected by the system and I don't know why:

GEOM_MIRROR: Device swap0: provider ada2p1 disconnected.
(ada2:ata3:0:1:0): Periph destroyed
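
One detail the kernel log does give is which controller channel each disk is attached to. The sketch below is an illustration with a hypothetical helper, `disks_by_channel`, applied to the four detach lines copied from the first post. It groups disks by channel and attaches the serial numbers, so a drive can be identified by serial rather than by its adaX name. In this log both ada1 and ada2 sit on `ata3` (targets 0 and 1), so a timeout or reset on that one channel would be one possible explanation for both disks detaching together.

```python
import re
from collections import defaultdict

# Detach messages copied from the kernel log in the first post.
KERNEL_LOG = """\
ada2 at ata3 bus 0 scbus1 target 1 lun 0
ada2: <WDC WD30EFRX-68EUZN0 82.00A82> s/n WD-WCC4N6EATE69 detached
ada1 at ata3 bus 0 scbus1 target 0 lun 0
ada1: <WDC WD30EFRX-68EUZN0 82.00A82> s/n WD-WCC4N4CH8V2A detached
"""

LOCATION = re.compile(r"^(ada\d+) at (\S+) bus \d+ scbus\d+ target (\d+) lun \d+$")
SERIAL = re.compile(r"^(ada\d+): <[^>]*> s/n (\S+) detached$")


def disks_by_channel(log_text):
    """Map each channel name to a list of (disk, target, serial) tuples."""
    locations = []   # (disk, channel, target) in log order
    serials = {}     # disk -> serial number
    for line in log_text.splitlines():
        line = line.strip()
        location = LOCATION.match(line)
        if location:
            locations.append(
                (location.group(1), location.group(2), int(location.group(3)))
            )
            continue
        serial = SERIAL.match(line)
        if serial:
            serials[serial.group(1)] = serial.group(2)

    channels = defaultdict(list)
    for disk, channel, target in locations:
        channels[channel].append((disk, target, serials.get(disk)))
    return dict(channels)


if __name__ == "__main__":
    for channel, disks in disks_by_channel(KERNEL_LOG).items():
        print(channel, disks)
```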

Yorick · May 13, 2019

Hmm. Maybe time to look hard at layer 1.
Controller? Cables? Power?
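
The SMART data posted earlier already gives some numbers for that layer. SMART attribute 199 (UDMA_CRC_Error_Count) counts CRC errors on the interface between drive and controller, which is usually read as a cabling or connector symptom rather than a media defect. The sketch below is a minimal illustration, assuming the attribute rows are copied as-is from the two smartctl outputs in this thread; `raw_value` is a hypothetical helper. ada1 reports 194661 such errors and ada2 reports 0, while ada2's own logs show 821 hardware resets and 17 PhyRdy to PhyNRdy transitions.

```python
# Attribute 199 rows copied from the two smartctl outputs in this thread.
SMART_LINES = {
    "ada1": "199 UDMA_CRC_Error_Count -O--CK 177 089 000 - 194661",
    "ada2": "199 UDMA_CRC_Error_Count -O--CK 200 200 000 - 0",
}


def raw_value(attribute_line):
    """Parse one attribute row and return (id, name, raw value).

    Column order assumed from the header in the posts:
    ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
    """
    fields = attribute_line.split()
    return int(fields[0]), fields[1], int(fields[-1])


if __name__ == "__main__":
    for disk, line in SMART_LINES.items():
        attr_id, name, raw = raw_value(line)
        note = "check cable and port" if raw > 0 else "no interface CRC errors"
        print(f"{disk}: {name} (ID {attr_id}) raw={raw} -> {note}")
```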
