Hey
@cyberjock - Exact same problem, different drive (/dev/da3 as opposed to /dev/da11). I took your advice and did nothing but reboot the box. According to FreeNAS the problem went away on reboot, but it gave me this notice in the GUI:
Code:
The volume vol1 (ZFS) state is ONLINE: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected.
So I jumped on the CLI and saw that it had brought the drive back online and resilvered 460K of data, but /dev/da3 (gptid/faeb3d6d-ed62-11e4-a956-0cc47a31abcc) was showing 2 cksum errors:
Code:
[root@plexnas] ~# zpool status vol1
  pool: vol1
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: resilvered 460K in 0h0m with 0 errors on Mon Jun  1 08:47:31 2015
config:

        NAME                                            STATE     READ WRITE CKSUM
        vol1                                            ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/f46fb4ec-ed62-11e4-a956-0cc47a31abcc  ONLINE       0     0     0
            gptid/f69f4e21-ed62-11e4-a956-0cc47a31abcc  ONLINE       0     0     0
            gptid/f8cde372-ed62-11e4-a956-0cc47a31abcc  ONLINE       0     0     0
            gptid/faeb3d6d-ed62-11e4-a956-0cc47a31abcc  ONLINE       0     0     2
            gptid/fd087ff0-ed62-11e4-a956-0cc47a31abcc  ONLINE       0     0     0
            gptid/ff28300a-ed62-11e4-a956-0cc47a31abcc  ONLINE       0     0     0
          raidz2-1                                      ONLINE       0     0     0
            gptid/013d5491-ed63-11e4-a956-0cc47a31abcc  ONLINE       0     0     0
            gptid/0357b342-ed63-11e4-a956-0cc47a31abcc  ONLINE       0     0     0
            gptid/05811f51-ed63-11e4-a956-0cc47a31abcc  ONLINE       0     0     0
            gptid/079f5f22-ed63-11e4-a956-0cc47a31abcc  ONLINE       0     0     0
            gptid/09b81318-ed63-11e4-a956-0cc47a31abcc  ONLINE       0     0     0
            gptid/a82dda8c-ef5f-11e4-bb0a-0cc47a31abcc  ONLINE       0     0     0

errors: No known data errors
[root@plexnas] ~#
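For reference, this is roughly how I matched the gptid in the zpool output back to /dev/da3 (glabel ships with FreeBSD/FreeNAS; the exact output columns here are from memory, so treat this as a sketch):

```shell
# list GEOM labels; the gptid line shows which daX partition backs it
glabel status | grep faeb3d6d
# then sanity-check the physical drive behind that device by serial number
smartctl -i /dev/da3 | grep -i 'Serial Number'
```

Worth doing before clearing or replacing anything, so you're certain which physical disk you're talking about.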
smartctl was not showing any errors at all (same as before):
Code:
[root@plexnas] ~# smartctl -a /dev/da3
smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p13 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: HGST Deskstar NAS
Device Model: HGST HDN724040ALE640
Serial Number: PK1334PCJZNLWX
LU WWN Device Id: 5 000cca 24cea1f81
Firmware Version: MJAOA5E0
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Mon Jun 1 09:08:14 2015 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000b 100 100 016 Pre-fail Always - 0
2 Throughput_Performance 0x0005 136 136 054 Pre-fail Offline - 80
3 Spin_Up_Time 0x0007 133 133 024 Pre-fail Always - 577 (Average 588)
4 Start_Stop_Count 0x0012 100 100 000 Old_age Always - 68
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 0
7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always - 0
8 Seek_Time_Performance 0x0005 121 121 020 Pre-fail Offline - 34
9 Power_On_Hours 0x0012 100 100 000 Old_age Always - 1198
10 Spin_Retry_Count 0x0013 100 100 060 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 57
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 100
193 Load_Cycle_Count 0x0012 100 100 000 Old_age Always - 100
194 Temperature_Celsius 0x0002 200 200 000 Old_age Always - 30 (Min/Max 21/43)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 1190 -
# 2 Short offline Completed without error 00% 1166 -
# 3 Short offline Completed without error 00% 1118 -
# 4 Short offline Completed without error 00% 1070 -
# 5 Short offline Completed without error 00% 1022 -
# 6 Short offline Completed without error 00% 974 -
# 7 Extended offline Completed without error 00% 961 -
# 8 Short offline Completed without error 00% 926 -
# 9 Short offline Completed without error 00% 878 -
#10 Short offline Completed without error 00% 830 -
#11 Short offline Completed without error 00% 783 -
#12 Short offline Completed without error 00% 734 -
#13 Short offline Completed without error 00% 686 -
#14 Short offline Completed without error 00% 638 -
#15 Extended offline Completed without error 00% 625 -
#16 Short offline Completed without error 00% 590 -
#17 Short offline Completed without error 00% 542 -
#18 Extended offline Completed without error 00% 117 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
[root@plexnas] ~#
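Since everything in that self-test log is from the regularly scheduled tests, I'm also thinking of kicking off a fresh long test on the drive just to be thorough. Standard smartctl flags, nothing exotic:

```shell
# queue an extended (long) offline self-test on the suspect drive
smartctl -t long /dev/da3
# several hours later, review the result in the self-test log
smartctl -l selftest /dev/da3
```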
So I cleared the error:
Code:
[root@plexnas] ~# zpool clear vol1 gptid/faeb3d6d-ed62-11e4-a956-0cc47a31abcc
And now everything looks good again:
Code:
[root@plexnas] ~# zpool status vol1
  pool: vol1
 state: ONLINE
  scan: resilvered 460K in 0h0m with 0 errors on Mon Jun  1 08:47:31 2015
config:

        NAME                                            STATE     READ WRITE CKSUM
        vol1                                            ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/f46fb4ec-ed62-11e4-a956-0cc47a31abcc  ONLINE       0     0     0
            gptid/f69f4e21-ed62-11e4-a956-0cc47a31abcc  ONLINE       0     0     0
            gptid/f8cde372-ed62-11e4-a956-0cc47a31abcc  ONLINE       0     0     0
            gptid/faeb3d6d-ed62-11e4-a956-0cc47a31abcc  ONLINE       0     0     0
            gptid/fd087ff0-ed62-11e4-a956-0cc47a31abcc  ONLINE       0     0     0
            gptid/ff28300a-ed62-11e4-a956-0cc47a31abcc  ONLINE       0     0     0
          raidz2-1                                      ONLINE       0     0     0
            gptid/013d5491-ed63-11e4-a956-0cc47a31abcc  ONLINE       0     0     0
            gptid/0357b342-ed63-11e4-a956-0cc47a31abcc  ONLINE       0     0     0
            gptid/05811f51-ed63-11e4-a956-0cc47a31abcc  ONLINE       0     0     0
            gptid/079f5f22-ed63-11e4-a956-0cc47a31abcc  ONLINE       0     0     0
            gptid/09b81318-ed63-11e4-a956-0cc47a31abcc  ONLINE       0     0     0
            gptid/a82dda8c-ef5f-11e4-bb0a-0cc47a31abcc  ONLINE       0     0     0

errors: No known data errors
[root@plexnas] ~#
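For anyone finding this later: after a `zpool clear` I'll usually also run a scrub so the pool gets re-read end-to-end and any lingering problem shows back up in the counters:

```shell
# re-read and verify checksums on every allocated block in the pool;
# fresh READ/WRITE/CKSUM errors will appear in zpool status
zpool scrub vol1
# check scrub progress and the per-device error counts
zpool status vol1
```

If the cksum count on that drive stays at zero through a full scrub, I'd feel a lot better about it being a transient controller/cabling hiccup rather than the disk itself.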
I did receive this in my email; I'm assuming it's related to the issue, since it's specifically about the drive in question:
Code:
plexnas.rstechnical.com kernel log messages:
> (noperiph:mpr0:0:4294967295:0): SMID 1 Aborting command 0xffffff8000f82de8
> (da3:mpr0:0:11:0): READ(16). CDB: 88 00 00 00 00 01 17 7f 1c 40 00 00 00 08 00 00 length 4096 SMID 722 terminated ioc 804b scsi 0 state c xfer 0
> (da3:mpr0:0:11:0): READ(16). CDB: 88 00 00 00 00 01 17 7d 86 90 00 00 01 00 00 00 length 131072 SMID 416 terminated ioc 804b scsi 0 state c xfer 0
> (da3:mpr0:0:11:0): READ(16). CDB: 88 00 00 00 00 01 17 7d 85 90 00 00 01 00 00 00 length 131072 SMID 436 terminated ioc 804b scsi 0 state c xfer 0
> (da3:mpr0:0:11:0): READ(16). CDB: 88 00 00 00 00 01 17 7d 84 90 00 00 01 00 00 00 length 131072 SMID 144 terminated ioc 804b scsi 0 state c xfer 0
> (da3:mpr0:0:11:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
> (da3:mpr0:0:11:0): CAM status: Command timeout
> (da3:mpr0:0:11:0): Retrying command
> (da3:mpr0:0:11:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
> (da3:mpr0:0:11:0): CAM status: SCSI Status Error
> (da3:mpr0:0:11:0): SCSI status: Check Condition
> (da3:mpr0:0:11:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
> (da3:mpr0:0:11:0): Error 6, Retries exhausted
> (da3:mpr0:0:11:0): Invalidating pack
-- End of security output --
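Out of curiosity I also decoded the LBA from the first terminated READ(16) in that log. In a READ(16) CDB, bytes 2-9 are the big-endian LBA, so for `88 00 00 00 00 01 17 7f 1c 40 ...` that's 0x1177F1C40 (my arithmetic, not something the log prints):

```shell
# bytes 2-9 of the READ(16) CDB (00 00 00 01 17 7f 1c 40) are the LBA, big-endian
printf 'LBA: %d\n' 0x1177F1C40
# prints: LBA: 4689173568
# at 512-byte logical sectors that's roughly a 2.4 TB offset into the 4 TB drive
```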
I also downloaded the Advanced => Debug info as you suggested and attached it to this post. I'd be very interested in your take on what the problem might be.