mloiterman
Dabbler
- Joined
- Jan 30, 2013
- Messages
- 45
- motherboard make and model
- SuperMicro X11SSH-CTF
- Firmware Revision : 01.48
- Firmware Build Time : 06/22/2018
- BIOS Version: 2.2
- BIOS Build Time: 05/23/2018
- Redfish Version : 1.0.1
- CPU make and model
- CPU: Intel(R) Xeon(R) CPU E3-1275 v5 @ 3.60GHz (3600.18-MHz K8-class CPU)
- Origin="GenuineIntel" Id=0x506e3 Family=0x6 Model=0x5e Stepping=3
- RAM quantity
- 32 GiB
- Crucial 2 x 16GB DDR4-2400 EUDIMM 1.2V CL17
- boot drive
- Intel SSD 600p Series SSDPEKKW128G7X1 (128 GB, M.2 80mm PCIe NVMe 3.0 x4, 3D1, TLC)
- hard drives, quantity, model numbers, and RAID configuration
- 8 x ST4000LM024
- RAIDZ2
- hard disk controllers
- Avago Technologies (LSI) SAS3008
- Code:
Avago Technologies SAS3 Flash Utility
Version 16.00.00.00 (2017.05.02)
Copyright 2008-2017 Avago Technologies. All rights reserved.

Adapter Selected is a Avago SAS: SAS3008(C0)

Controller Number : 0
Controller : SAS3008(C0)
PCI Address : 00:01:00:00
SAS Address : 5003048-0-1e04-6000
NVDATA Version (Default) : 0e.00.20.00
NVDATA Version (Persistent) : 0e.00.20.00
Firmware Product ID : 0x2221 (IT)
Firmware Version : 15.00.03.00
NVDATA Vendor : LSI
NVDATA Product ID : LSI3008-IT
BIOS Version : 08.35.00.00
UEFI BSD Version : 17.00.00.00
FCODE Version : N/A
Board Name : LSI3008-IT
Board Assembly : N/A
Board Tracer Number : N/A
- network cards
- ix0: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 3.2.12-k> mem 0xd0200000-0xd03fffff,0xd0404000-0xd0407fff irq 16 at device 0.0 on pci4
- ix1: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 3.2.12-k> mem 0xd0000000-0xd01fffff,0xd0400000-0xd0403fff irq 17 at device 0.1 on pci4
- FreeNAS-11.2-RELEASE (Build Date: Dec 5, 2018 21:28)
On 12/26/18 I received this error:
Code:
New alerts:
* The volume tank state is DEGRADED: One or more devices are faulted in response to persistent errors. Sufficient replicas exist for the pool to continue functioning in a degraded state.
* Device: /dev/da7 [SAT], failed to read SMART Attribute Data
This was in the kernel log:
Code:
(da7:mpr0:0:7:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 776 Aborting command 0xfffffe0001094b80
mpr0: Sending reset from mprsas_send_abort for target ID 7
(pass7:mpr0:0:7:0): ATA COMMAND PASS THROUGH(16). CDB: 85 08 0e 00 d0 00 01 00 00 00 4f 00 c2 00 b0 00 length 512 SMID 626 terminated ioc 804b loginfo 31130000 scsi 0 state c xfer 0
mpr0: Unfreezing devq for target ID 7
(da7:mpr0:0:7:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
(da7:mpr0:0:7:0): CAM status: Command timeout
(da7:mpr0:0:7:0): Retrying command
(da7:mpr0:0:7:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
(da7:mpr0:0:7:0): CAM status: SCSI Status Error
(da7:mpr0:0:7:0): SCSI status: Check Condition
(da7:mpr0:0:7:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da7:mpr0:0:7:0): Error 6, Retries exhausted
(da7:mpr0:0:7:0): Invalidating pack
I replaced the drive and resilvered without any errors. The resilver completed last night and I thought all was OK.
This morning, I received this error:
Code:
New alerts:
* Device: /dev/da6 [SAT], failed to read SMART Attribute Data
This was in the kernel log:
Code:
pid 2232 (syslog-ng), uid 0: exited on signal 6 (core dumped)(da6:mpr0:0:6:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 449 Aborting command 0xfffffe0001077570mpr0: Sending reset from mprsas_send_abort for target ID 6(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 52 b0 00 00 00 80 00 00 length 65536 SMID 978 terminated ioc 804b loginfo 31130000 scsi 0 state c xfer 0(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 51 a8 00 00 00 80 00 00 length 65536 SMID 316 terminated ioc 804b loginfo 31130000 scsi 0 state c xfer 0(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 52 b0 00 00 00 80 00 00(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 52 28 00 00 00 80 00 00 length 65536 SMID 729 terminated ioc 804b l(da6:mpr0:0:6:0): CAM status: CCB request completed with an erroroginfo 31130000 scsi 0 state c xfer 0(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 51 20 00 00 00 80 00 00 length 65536 SMID 930 terminated ioc 804b l(da6:mpr0:0:6:0): Retrying commandoginfo 31130000 scsi 0 state c xfer 0(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 51 a8 00 00 00 80 00 00(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 50 a0 00 00 00 80 00 00 length 65536 SMID 886 terminated ioc 804b l(da6:mpr0:0:6:0): CAM status: CCB request completed with an erroroginfo 31130000 scsi 0 state c xfer 0(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 50 18 00 00 00 80 00 00 length 65536 SMID 802 terminated ioc 804b l(da6:mpr0:0:6:0): Retrying command(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 52 28 00 00 00 80 00 00oginfo 31130000 scsi 0 state c xfer 0(da6:mpr0:0:6:0): CAM status: CCB request completed with an error(da6:mpr0:0:6:0): Retrying command(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 51 20 00 00 00 80 00 00(da6:mpr0:0:6:0): WRITE(16). 
CDB: 8a 00 00 00 00 01 1c da 4f 98 00 00 00 80 00 00 length 65536 SMID 558 terminated ioc 804b l(da6:mpr0:0:6:0): CAM status: CCB request completed with an erroroginfo 31130000 scsi 0 state c xfer 0(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 4f 10 00 00 00 80 00 00 length 65536 SMID 588 terminated ioc 804b l(da6:mpr0:0:6:0): Retrying command(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 50 a0 00 00 00 80 00 00oginfo 31130000 scsi 0 state c xfer 0(da6:mpr0:0:6:0): CAM status: CCB request completed with an error(da6:mpr0:0:6:0): Retrying command(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 50 18 00 00 00 80 00 00(pass6:mpr0:0:6:0): ATA COMMAND PASS THROUGH(16). CDB: 85 08 0e 00 d0 00 01 00 00 00 4f 00 c2 00 b0 00 length 512 SMID 480 te(da6:mpr0:0:6:0): CAM status: CCB request completed with an error(da6:mpr0:0:6:0): Retrying command(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 4f 98 00 00 00 80 00 00rminated ioc 804b loginfo 31130000 scsi 0 state c xfer 0(da6:mpr0:0:6:0): CAM status: CCB request completed with an error(da6:mpr0:0:6:0): Retrying command(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 4f 10 00 00 00 80 00 00(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 4f 08 00 00 00 08 00 00 length 4096 SMID 778 terminated ioc 804b lo(da6:mpr0:0:6:0): CAM status: CCB request completed with an errorginfo 31130000 scsi 0 state c xfer 0(da6:mpr0:0:6:0): Retrying command(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 4e 08 00 00 01 00 00 00 length 131072 SMID 765 terminated ioc 804b (da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 4f 08 00 00 00 08 00 00loginfo 31130000 scsi 0 state c xfer 0(da6:mpr0:0:6:0): CAM status: CCB request completed with an error(da6:mpr0:0:6:0): Retrying command(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 4d 48 00 00 00 c0 00 00 length 98304 SMID 867 terminated ioc 804b l(da6:mpr0:0:6:0): WRITE(16). 
CDB: 8a 00 00 00 00 01 1c da 4e 08 00 00 01 00 00 00oginfo 31130000 scsi 0 state c xfer 0(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 4c 48 00 00 01 00 00 00 length 131072 SMID 905 terminated ioc 804b (da6:mpr0:0:6:0): CAM status: CCB request completed with an errorloginfo 31130000 scsi 0 state c xfer 0(da6:mpr0:0: (da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 4b 48 00 00 01 00 00 00 length 131072 SMID 671 terminated ioc 804b 6:loginfo 31130000 scsi 0 state c xfer 00): Retrying command(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 4a 48 00 00 01 00 00 00 length 131072 SMID 596 terminated ioc 804b (da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 4d 48 00 00 00 c0 00 00loginfo 31130000 scsi 0 state c xfer 0(da6:mpr0:0:6:0): CAM status: CCB request completed with an error(da6:mpr0:0:6:0): Retrying command(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 4c 48 00 00 01 00 00 00(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 49 48 00 00 01 00 00 00 length 131072 SMID 731 terminated ioc 804b (da6:mpr0:0:6:0): CAM status: CCB request completed with an errorloginfo 31130000 scsi 0 state c xfer 0(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 48 48 00 00 01 00 00 00 length 131072 SMID 293 terminated ioc 804b (da6:loginfo 31130000 scsi 0 state c xfer 0mpr0:0:6:0): Retrying command(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 47 48 00 00 01 00 00 00 length 131072 SMID 805 terminated ioc 804b (da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 4b 48 00 00 01 00 00 00loginfo 31130000 scsi 0 state c xfer 0(da6:mpr0:0:6:0): CAM status: CCB request completed with an error(da6:mpr0:0:6:0): Retrying command(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 4a 48 00 00 01 00 00 00(da6:mpr0:0:6:0): WRITE(16). 
CDB: 8a 00 00 00 00 01 1c da 46 48 00 00 01 00 00 00 length 131072 SMID 186 terminated ioc 804b (da6:mpr0:0:6:0): CAM status: CCB request completed with an errorloginfo 31130000 scsi 0 state c xfer 0(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 58 ac 00 70 00 00 00 f8 00 00 length 126976 SMID 983 terminated ioc 804b (da6:loginfo 31130000 scsi 0 state c xfer 0mpr0: Unfreezing devq for target ID 6mpr0:0:6:0): Retrying command(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 49 48 00 00 01 00 00 00(da6:mpr0:0:6:0): CAM status: CCB request completed with an error(da6:mpr0:0:6:0): Retrying command(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 48 48 00 00 01 00 00 00(da6:mpr0:0:6:0): CAM status: CCB request completed with an error(da6:mpr0:0:6:0): Retrying command(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 47 48 00 00 01 00 00 00(da6:mpr0:0:6:0): CAM status: CCB request completed with an error(da6:mpr0:0:6:0): Retrying command(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 46 48 00 00 01 00 00 00(da6:mpr0:0:6:0): CAM status: CCB request completed with an error(da6:mpr0:0:6:0): Retrying command(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 58 ac 00 70 00 00 00 f8 00 00(da6:mpr0:0:6:0): CAM status: CCB request completed with an error(da6:mpr0:0:6:0): Retrying command(da6:mpr0:0:6:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00(da6:mpr0:0:6:0): CAM status: Command timeout(da6:mpr0:0:6:0): Retrying command(da6:mpr0:0:6:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00(da6:mpr0:0:6:0): CAM status: SCSI Status Error(da6:mpr0:0:6:0): SCSI status: Check Condition(da6:mpr0:0:6:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)(da6:mpr0:0:6:0): Error 6, Retries exhausted(da6:mpr0:0:6:0): Invalidating pack
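The WRITE(16) entries in the log above are easier to read once the CDB bytes are decoded. The following is a minimal sketch, assuming the standard SCSI WRITE(16) layout (opcode in byte 0, a big-endian LBA in bytes 2-9, a big-endian block count in bytes 10-13) and 512-byte logical sectors. The function name `decode_write16` is made up for illustration; the CDBs and lengths are copied from the da6 log.

```python
def decode_write16(cdb_hex: str, sector_size: int = 512) -> dict:
    """Decode a SCSI WRITE(16) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 16 or cdb[0] != 0x8A:
        raise ValueError("not a 16-byte WRITE(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")      # starting logical block address
    blocks = int.from_bytes(cdb[10:14], "big")  # transfer length in blocks
    return {"lba": lba, "blocks": blocks, "bytes": blocks * sector_size}


# CDBs copied from the da6 log, paired with the "length" value printed next to them.
samples = [
    ("8a 00 00 00 00 01 1c da 52 b0 00 00 00 80 00 00", 65536),
    ("8a 00 00 00 00 01 1c da 4f 08 00 00 00 08 00 00", 4096),
    ("8a 00 00 00 00 01 1c da 4e 08 00 00 01 00 00 00", 131072),
    ("8a 00 00 00 00 01 58 ac 00 70 00 00 00 f8 00 00", 126976),
]

for cdb_hex, logged_length in samples:
    decoded = decode_write16(cdb_hex)
    # The last column shows whether the decoded byte count matches the logged length.
    print(hex(decoded["lba"]), decoded["blocks"], decoded["bytes"] == logged_length)
```

This only confirms that the logged transfer lengths agree with the CDBs and shows which LBAs the terminated writes were aimed at. It says nothing about why the controller terminated them.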
I've already ordered a replacement drive, which will be here on Thursday, just in case. But what is going on here? Is drive /dev/da6 dead due to the stress of the resilver? Is it just a coincidence? Something else?
For reference:
Code:
root@marshall:~ # zpool status
  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:00:11 with 0 errors on Wed Dec 26 03:45:11 2018
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          nvd0p2      ONLINE       0     0     0

errors: No known data errors

  pool: tank
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
        repaired.
  scan: resilvered 1.53T in 3 days 03:11:16 with 0 errors on Sun Dec 30 18:05:28 2018
config:

        NAME                                            STATE     READ WRITE CKSUM
        tank                                            DEGRADED     0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/30805cfa-0044-11e7-95e6-0cc47ac56608  ONLINE       0     0     0
            gptid/317926db-0044-11e7-95e6-0cc47ac56608  ONLINE       0     0     0
            gptid/32708da7-0044-11e7-95e6-0cc47ac56608  ONLINE       0     0     0
            gptid/8892b717-1998-11e7-96f0-0cc47ac56608  ONLINE       0     0     0
          raidz2-1                                      DEGRADED     0     0     0
            gptid/3478907c-0044-11e7-95e6-0cc47ac56608  ONLINE       0     0     0
            gptid/358a11bf-0044-11e7-95e6-0cc47ac56608  ONLINE       0     0     0
            gptid/febc4ffd-2f92-11e7-8a04-0cc47ac56608  FAULTED      6   118     0  too many errors
            gptid/8d602633-0a19-11e9-be92-0cc47ac56608  ONLINE       0     0     0

errors: No known data errors
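For anyone who wants to pull the error counters out of `zpool status` output like the above without reading it by eye, here is a rough sketch. It assumes the text has one device per line with plain integer READ/WRITE/CKSUM columns; rows where ZFS abbreviates a count (for example with a K suffix) are skipped. The helper name `devices_with_errors` and the shortened sample are illustrative only.

```python
STATES = {"ONLINE", "DEGRADED", "FAULTED", "OFFLINE", "UNAVAIL", "REMOVED"}

# Shortened excerpt of the tank pool config from this post.
SAMPLE = """\
        NAME                                            STATE     READ WRITE CKSUM
        tank                                            DEGRADED     0     0     0
          raidz2-1                                      DEGRADED     0     0     0
            gptid/358a11bf-0044-11e7-95e6-0cc47ac56608  ONLINE       0     0     0
            gptid/febc4ffd-2f92-11e7-8a04-0cc47ac56608  FAULTED      6   118     0  too many errors
            gptid/8d602633-0a19-11e9-be92-0cc47ac56608  ONLINE       0     0     0
"""


def devices_with_errors(status_text: str) -> list:
    """Return (name, state, read, write, cksum) for rows with any nonzero counter."""
    rows = []
    for line in status_text.splitlines():
        parts = line.split()
        # A config row looks like: NAME STATE READ WRITE CKSUM [optional note]
        if len(parts) >= 5 and parts[1] in STATES and all(p.isdigit() for p in parts[2:5]):
            read, write, cksum = (int(p) for p in parts[2:5])
            if read or write or cksum:
                rows.append((parts[0], parts[1], read, write, cksum))
    return rows


for row in devices_with_errors(SAMPLE):
    print(row)
```

On the excerpt this reports only the FAULTED gptid with 6 read and 118 write errors. Mapping that gptid back to a daN device still has to be done separately, for example with `glabel status`.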
Code:
root@marshall:~ # smartctl -a /dev/da6
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.2-STABLE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 2.5 5400
Device Model:     ST4000LM024-2AN17V
Serial Number:    WCK0K9GW
LU WWN Device Id: 5 000c50 0a8fcf1dc
Firmware Version: 0001
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5526 rpm
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Dec 31 11:23:17 2018 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x82)
    Offline data collection activity was completed without error.
    Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0)
    The previous self-test routine completed without error or no self-test has ever been run.
Total time to complete Offline data collection: ( 0) seconds.
Offline data collection capabilities: (0x7b)
    SMART execute Offline immediate.
    Auto Offline data collection on/off support.
    Suspend Offline collection upon new command.
    Offline surface scan supported.
    Self-test supported.
    Conveyance Self-test supported.
    Selective Self-test supported.
SMART capabilities: (0x0003)
    Saves SMART data before entering power-saving mode.
    Supports SMART auto save timer.
Error logging capability: (0x01)
    Error logging supported.
    General Purpose Logging supported.
Short self-test routine recommended polling time: ( 1) minutes.
Extended self-test routine recommended polling time: ( 652) minutes.
Conveyance self-test routine recommended polling time: ( 2) minutes.
SCT capabilities: (0x30a5)
    SCT Status supported.
    SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   082   064   006    Pre-fail  Always       -       152689043
  3 Spin_Up_Time            0x0003   097   097   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       27
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   092   060   045    Pre-fail  Always       -       1540777129
  9 Power_On_Hours          0x0032   084   084   000    Old_age   Always       -       14572 (48 233 0)
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       27
183 SATA_Downshift_Count    0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   076   070   040    Old_age   Always       -       24 (Min/Max 22/28)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       52
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       143
194 Temperature_Celsius     0x0022   024   040   000    Old_age   Always       -       24 (0 21 0 0 0)
195 Hardware_ECC_Recovered  0x001a   082   064   000    Old_age   Always       -       152689043
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       14571 (180 15 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       15977107520
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       128477063110
254 Free_Fall_Sensor        0x0032   100   100   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     14569         -
# 2  Short offline       Completed without error       00%     14564         -
# 3  Short offline       Completed without error       00%      8156         -
# 4  Extended offline    Completed without error       00%      8002         -
# 5  Short offline       Completed without error       00%      7988         -
# 6  Short offline       Completed without error       00%      7916         -
# 7  Short offline       Completed without error       00%      7748         -
# 8  Short offline       Completed without error       00%      7580         -
# 9  Short offline       Completed without error       00%      7413         -
#10  Extended offline    Completed without error       00%      7258         -
#11  Short offline       Completed without error       00%      7245         -
#12  Short offline       Completed without error       00%      7077         -
#13  Short offline       Completed without error       00%      6909         -
#14  Short offline       Completed without error       00%      6742         -
#15  Extended offline    Completed without error       00%      6587         -
#16  Short offline       Completed without error       00%      6574         -
#17  Short offline       Completed without error       00%      6502         -
#18  Short offline       Completed without error       00%      6334         -
#19  Short offline       Completed without error       00%      6166         -
#20  Short offline       Completed without error       00%      5998         -
#21  Extended offline    Completed without error       00%      5843         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
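To make the SMART table above quicker to scan, here is a small sketch that flags a handful of attributes when their raw value is nonzero. The watch list (IDs 5, 187, 188, 197, 198, 199) is an assumption about which counters are usually worth a first look, not an official list, and `nonzero_watched` is a made-up helper name. The sample lines are copied from the da6 output.

```python
import re

# Assumed watch list: reallocated, reported uncorrectable, command timeout,
# pending, offline uncorrectable, and UDMA CRC error counters.
WATCH_IDS = {5, 187, 188, 197, 198, 199}

# ID, name, flag, value, worst, thresh, type, updated, when_failed, raw (first number)
ATTR_RE = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]{4}\s+\d+\s+\d+\s+\d+\s+\S+\s+\S+\s+\S+\s+(\d+)"
)

SAMPLE = """\
  1 Raw_Read_Error_Rate     0x000f   082   064   006    Pre-fail  Always       -       152689043
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
"""


def nonzero_watched(smart_text: str) -> dict:
    """Map attribute name to raw value for watched attributes with a nonzero raw value."""
    hits = {}
    for line in smart_text.splitlines():
        m = ATTR_RE.match(line)
        if m and int(m.group(1)) in WATCH_IDS and int(m.group(3)) != 0:
            hits[m.group(2)] = int(m.group(3))
    return hits


print(nonzero_watched(SAMPLE))
```

For da6 nothing on the watch list is nonzero, which matches the PASSED self-assessment and the empty error log. An empty result does not prove the drive is healthy; it only means these particular counters have not moved.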