CAM / SCSI errors - help please!

Status
Not open for further replies.

Charles Frank

Dabbler
Joined
Dec 24, 2014
Messages
13
I've got a FreeNAS box built on a Supermicro X10SL7-F-O motherboard that had been flawless. I ran an update a few days ago, as I hadn't patched it for some time, and (this may be coincidence) a day or two later I started noticing the box being a little 'flaky'. Tonight I dumped some data to it, and when I tried to check it the files themselves were unreadable, even though I could see the shares fine from my Windows PC and could list their contents OK. So I checked, and I'm seeing a lot of CAM and SCSI status errors:

(da3:mps0:0:3:0): READ(10). CDB: 28 00 00 40 02 a0 00 00 e0 00 length 114688 SMID 193 command timeout cm 0xfffffe0000a8dd50 ccb 0xfffff8009f7be000
(noperiph:mps0:0:4294967295:0): SMID 2 Aborting command 0xfffffe0000a8dd50
mps0: Sending reset from mpssas_send_abort for target ID 3
mps0: Unfreezing devq for target ID 3
(da3:mps0:0:3:0): READ(10). CDB: 28 00 00 40 02 a0 00 00 e0 00
(da3:mps0:0:3:0): CAM status: command timeout
(da3:mps0:0:3:0): Retrying command
(da3:mps0:0:3:0): READ(10). CDB: 28 00 00 40 02 a0 00 00 e0 00
(da3:mps0:0:3:0): CAM status: SCSI Status Error
(da3:mps0:0:3:0): SCSI status: Check Condition
(da3:mps0:0:3:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da3:mps0:0:3:0): Retrying command (per sense data)

and it just keeps looping through that sequence. I've reseated all the power and data cables, both at the drive end and at the on-board LSI controller, to no avail. The FreeNAS UI itself gave me no warnings, the drives reported they were OK, and scrubs showed nothing.

Assuming the update had borked something, I did something very silly (don't hate me, my knowledge of FreeNAS/BSD is very limited): I did a clean reinstall of FreeNAS 9.10 (my existing setup had been upgraded to 9.10 from 9.3), intending to import my volumes and settings. This is where things went from bad to worse... The clean 9.10 install couldn't import my volume. It could see it, but when it tried to import it, the above error just started cycling again.

With my (VERY) limited knowledge I'm guessing one of the drives has died, but my understanding of ZFS was (and correct me if I'm wrong) that it would warn you and the array would carry on, assuming you set it up with redundant drives (I am using 6x3TB Hitachi NAS drives set up as RAID-Z2).

Any help here would be MUCH appreciated - I've got about 10TB on my server, and while I have a backup of the data, it would take me a considerable amount of time to copy it all back over if I had to do a complete rebuild from scratch.

Cheers,

Charles.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Ok, let's see the output of:
  • zpool status
  • smartctl -a /dev/da3
  • sas2flash -listall
Code tags throughout, please, it'll make everyone's life easier.
 

Charles Frank

Dabbler
Joined
Dec 24, 2014
Messages
13
Here are the outputs requested:

zpool status:

Code:
  pool: freenas-boot
state: ONLINE
  scan: none requested
config:

   NAME  STATE  READ WRITE CKSUM
   freenas-boot  ONLINE  0  0  0
    mirror-0  ONLINE  0  0  0
    gptid/2df18d19-4f7c-11e6-a7c7-0cc47a3001e8  ONLINE  0  0  0
    da8p2  ONLINE  0  0  0

errors: No known data errors

  pool: volume0
state: ONLINE
  scan: scrub repaired 0 in 0h0m with 0 errors on Sun Jun 26 00:00:10 2016
config:

   NAME  STATE  READ WRITE CKSUM
   volume0  ONLINE  0  0  0
    gptid/b8cd60ed-983d-11e4-b676-0cc47a3001e8  ONLINE  0  0  0

errors: No known data errors

  pool: volume1
state: ONLINE
  scan: resilvered 148K in 0h0m with 0 errors on Thu Jul 21 21:51:41 2016
config:

   NAME  STATE  READ WRITE CKSUM
   volume1  ONLINE  0  0  0
    raidz2-0  ONLINE  0  0  0
    gptid/e155fa41-9841-11e4-b676-0cc47a3001e8  ONLINE  0  0  0
    gptid/e39bc97d-9841-11e4-b676-0cc47a3001e8  ONLINE  0  0  0
    gptid/e5ff8a2b-9841-11e4-b676-0cc47a3001e8  ONLINE  0  0  0
    gptid/e89b2d25-9841-11e4-b676-0cc47a3001e8  ONLINE  0  0  0
    gptid/eb223c63-9841-11e4-b676-0cc47a3001e8  ONLINE  0  0  0
    gptid/eda32740-9841-11e4-b676-0cc47a3001e8  ONLINE  0  0  0

errors: No known data errors



smartctl -a output:

Code:
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:  Hitachi Ultrastar 7K3000
Device Model:  Hitachi HUA723030ALA640
Serial Number:  MK0331YHG61MBA
LU WWN Device Id: 5 000cca 225c2c053
Firmware Version: MKAOA5C0
User Capacity:  3,000,592,982,016 bytes [3.00 TB]
Sector Size:  512 bytes logical/physical
Rotation Rate:  7200 rpm
Form Factor:  3.5 inches
Device is:  In smartctl database [for details use: -P show]
ATA Version is:  ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:  Mon Jul 25 10:41:06 2016 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
See vendor-specific Attribute list for failed Attributes.

General SMART Values:
Offline data collection status:  (0x84)   Offline data collection activity
           was suspended by an interrupting command from host.
           Auto Offline Data Collection: Enabled.
Self-test execution status:  (  0)   The previous self-test routine completed
           without error or no self-test has ever
           been run.
Total time to complete Offline
data collection:      (29271) seconds.
Offline data collection
capabilities:         (0x5b) SMART execute Offline immediate.
           Auto Offline data collection on/off support.
           Suspend Offline collection upon new
           command.
           Offline surface scan supported.
           Self-test supported.
           No Conveyance Self-test supported.
           Selective Self-test supported.
SMART capabilities:  (0x0003)   Saves SMART data before entering
           power-saving mode.
           Supports SMART auto save timer.
Error logging capability:  (0x01)   Error logging supported.
           General Purpose Logging supported.
Short self-test routine
recommended polling time:     (  1) minutes.
Extended self-test routine
recommended polling time:     ( 488) minutes.
SCT capabilities:     (0x003d)   SCT Status supported.
           SCT Error Recovery Control supported.
           SCT Feature Control supported.
           SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME  FLAG  VALUE WORST THRESH TYPE  UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate  0x000b  074  074  016  Pre-fail  Always  -  180879365
  2 Throughput_Performance  0x0005  131  131  054  Pre-fail  Offline  -  97
  3 Spin_Up_Time  0x0007  125  125  024  Pre-fail  Always  -  617 (Average 618)
  4 Start_Stop_Count  0x0012  100  100  000  Old_age  Always  -  66
  5 Reallocated_Sector_Ct  0x0033  001  001  005  Pre-fail  Always  FAILING_NOW 2005
  7 Seek_Error_Rate  0x000b  100  100  067  Pre-fail  Always  -  0
  8 Seek_Time_Performance  0x0005  133  133  020  Pre-fail  Offline  -  27
  9 Power_On_Hours  0x0012  099  099  000  Old_age  Always  -  12728
10 Spin_Retry_Count  0x0013  100  100  060  Pre-fail  Always  -  0
12 Power_Cycle_Count  0x0032  100  100  000  Old_age  Always  -  66
192 Power-Off_Retract_Count 0x0032  100  100  000  Old_age  Always  -  360
193 Load_Cycle_Count  0x0012  100  100  000  Old_age  Always  -  360
194 Temperature_Celsius  0x0002  142  142  000  Old_age  Always  -  42 (Min/Max 17/46)
196 Reallocated_Event_Count 0x0032  001  001  000  Old_age  Always  -  2389
197 Current_Pending_Sector  0x0022  100  100  000  Old_age  Always  -  0
198 Offline_Uncorrectable  0x0008  100  100  000  Old_age  Offline  -  0
199 UDMA_CRC_Error_Count  0x000a  200  200  000  Old_age  Always  -  0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description  Status  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline  Completed without error  00%  4189  -
# 2  Short offline  Completed without error  00%  4103  -
# 3  Short offline  Completed without error  00%  4039  -
# 4  Short offline  Completed without error  00%  3872  -
# 5  Short offline  Completed without error  00%  3703  -
# 6  Short offline  Completed without error  00%  3535  -
# 7  Short offline  Completed without error  00%  3391  -
# 8  Short offline  Completed without error  00%  3295  -
# 9  Short offline  Completed without error  00%  3133  -
#10  Short offline  Completed without error  00%  2974  -
#11  Short offline  Completed without error  00%  2806  -
#12  Short offline  Completed without error  00%  2662  -
#13  Short offline  Completed without error  00%  2590  -
#14  Short offline  Completed without error  00%  2439  -
#15  Short offline  Completed without error  00%  2280  -
#16  Short offline  Completed without error  00%  2112  -
#17  Short offline  Completed without error  00%  1970  -
#18  Short offline  Completed without error  00%  1875  -
#19  Short offline  Completed without error  00%  1707  -
#20  Short offline  Completed without error  00%  1539  -
#21  Short offline  Completed without error  00%  26  -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
  1  0  0  Not_testing
  2  0  0  Not_testing
  3  0  0  Not_testing
  4  0  0  Not_testing
  5  0  0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.



smartctl -x output:

Code:
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:  Hitachi Ultrastar 7K3000
Device Model:  Hitachi HUA723030ALA640
Serial Number:  MK0331YHG61MBA
LU WWN Device Id: 5 000cca 225c2c053
Firmware Version: MKAOA5C0
User Capacity:  3,000,592,982,016 bytes [3.00 TB]
Sector Size:  512 bytes logical/physical
Rotation Rate:  7200 rpm
Form Factor:  3.5 inches
Device is:  In smartctl database [for details use: -P show]
ATA Version is:  ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:  Mon Jul 25 10:41:20 2016 BST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:  Unavailable
APM feature is:  Disabled
Rd look-ahead is: Enabled
Write cache is:  Disabled
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
See vendor-specific Attribute list for failed Attributes.

General SMART Values:
Offline data collection status:  (0x84)   Offline data collection activity
           was suspended by an interrupting command from host.
           Auto Offline Data Collection: Enabled.
Self-test execution status:  (  0)   The previous self-test routine completed
           without error or no self-test has ever
           been run.
Total time to complete Offline
data collection:      (29271) seconds.
Offline data collection
capabilities:         (0x5b) SMART execute Offline immediate.
           Auto Offline data collection on/off support.
           Suspend Offline collection upon new
           command.
           Offline surface scan supported.
           Self-test supported.
           No Conveyance Self-test supported.
           Selective Self-test supported.
SMART capabilities:  (0x0003)   Saves SMART data before entering
           power-saving mode.
           Supports SMART auto save timer.
Error logging capability:  (0x01)   Error logging supported.
           General Purpose Logging supported.
Short self-test routine
recommended polling time:     (  1) minutes.
Extended self-test routine
recommended polling time:     ( 488) minutes.
SCT capabilities:     (0x003d)   SCT Status supported.
           SCT Error Recovery Control supported.
           SCT Feature Control supported.
           SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME  FLAGS  VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate  PO-R--  074  074  016  -  180879365
  2 Throughput_Performance  P-S---  131  131  054  -  97
  3 Spin_Up_Time  POS---  125  125  024  -  617 (Average 618)
  4 Start_Stop_Count  -O--C-  100  100  000  -  66
  5 Reallocated_Sector_Ct  PO--CK  001  001  005  NOW  2005
  7 Seek_Error_Rate  PO-R--  100  100  067  -  0
  8 Seek_Time_Performance  P-S---  133  133  020  -  27
  9 Power_On_Hours  -O--C-  099  099  000  -  12728
10 Spin_Retry_Count  PO--C-  100  100  060  -  0
12 Power_Cycle_Count  -O--CK  100  100  000  -  66
192 Power-Off_Retract_Count -O--CK  100  100  000  -  360
193 Load_Cycle_Count  -O--C-  100  100  000  -  360
194 Temperature_Celsius  -O----  142  142  000  -  42 (Min/Max 17/46)
196 Reallocated_Event_Count -O--CK  001  001  000  -  2389
197 Current_Pending_Sector  -O---K  100  100  000  -  0
198 Offline_Uncorrectable  ---R--  100  100  000  -  0
199 UDMA_CRC_Error_Count  -O-R--  200  200  000  -  0
  ||||||_ K auto-keep
  |||||__ C event count
  ||||___ R error rate
  |||____ S speed/performance
  ||_____ O updated online
  |______ P prefailure warning

General Purpose Log Directory Version 1
SMART  Log Directory Version 1 [multi-sector log support]
Address  Access  R/W  Size  Description
0x00  GPL,SL  R/O  1  Log Directory
0x01  SL  R/O  1  Summary SMART error log
0x03  GPL  R/O  1  Ext. Comprehensive SMART error log
0x04  GPL  R/O  7  Device Statistics log
0x06  SL  R/O  1  SMART self-test log
0x07  GPL  R/O  1  Extended self-test log
0x08  GPL  R/O  1  Power Conditions log
0x09  SL  R/W  1  Selective self-test log
0x10  GPL  R/O  1  SATA NCQ Queued Error log
0x11  GPL  R/O  1  SATA Phy Event Counters log
0x20  GPL  R/O  1  Streaming performance log [OBS-8]
0x21  GPL  R/O  1  Write stream error log
0x22  GPL  R/O  1  Read stream error log
0x80-0x9f  GPL,SL  R/W  16  Host vendor specific log
0xe0  GPL,SL  R/W  1  SCT Command/Status
0xe1  GPL,SL  R/W  1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (1 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description  Status  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline  Completed without error  00%  4189  -
# 2  Short offline  Completed without error  00%  4103  -
# 3  Short offline  Completed without error  00%  4039  -
# 4  Short offline  Completed without error  00%  3872  -
# 5  Short offline  Completed without error  00%  3703  -
# 6  Short offline  Completed without error  00%  3535  -
# 7  Short offline  Completed without error  00%  3391  -
# 8  Short offline  Completed without error  00%  3295  -
# 9  Short offline  Completed without error  00%  3133  -
#10  Short offline  Completed without error  00%  2974  -
#11  Short offline  Completed without error  00%  2806  -
#12  Short offline  Completed without error  00%  2662  -
#13  Short offline  Completed without error  00%  2590  -
#14  Short offline  Completed without error  00%  2439  -
#15  Short offline  Completed without error  00%  2280  -
#16  Short offline  Completed without error  00%  2112  -
#17  Short offline  Completed without error  00%  1970  -
#18  Short offline  Completed without error  00%  1875  -
#19  Short offline  Completed without error  00%  1707  -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
  1  0  0  Not_testing
  2  0  0  Not_testing
  3  0  0  Not_testing
  4  0  0  Not_testing
  5  0  0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:  3
SCT Version (vendor specific):  256 (0x0100)
SCT Support Level:  1
Device State:  SMART Off-line Data Collection executing in background (4)
Current Temperature:  42 Celsius
Power Cycle Min/Max Temperature:  25/43 Celsius
Lifetime  Min/Max Temperature:  17/46 Celsius
Under/Over Temperature Limit Count:  0/0

SCT Temperature History Version:  2
Temperature Sampling Period:  1 minute
Temperature Logging Interval:  1 minute
Min/Max recommended Temperature:  0/60 Celsius
Min/Max Temperature Limit:  -40/70 Celsius
Temperature History Size (Index):  128 (64)

Index  Estimated Time  Temperature Celsius
  65  2016-07-25 08:34  28  *********
  66  2016-07-25 08:35  29  **********
  67  2016-07-25 08:36  29  **********
  68  2016-07-25 08:37  30  ***********
  69  2016-07-25 08:38  30  ***********
  70  2016-07-25 08:39  31  ************
  71  2016-07-25 08:40  31  ************
  72  2016-07-25 08:41  32  *************
  73  2016-07-25 08:42  32  *************
  74  2016-07-25 08:43  33  **************
  75  2016-07-25 08:44  33  **************
  76  2016-07-25 08:45  34  ***************
...  ..(  3 skipped).  ..  ***************
  80  2016-07-25 08:49  34  ***************
  81  2016-07-25 08:50  35  ****************
  82  2016-07-25 08:51  35  ****************
  83  2016-07-25 08:52  36  *****************
...  ..(  2 skipped).  ..  *****************
  86  2016-07-25 08:55  36  *****************
  87  2016-07-25 08:56  37  ******************
...  ..(  4 skipped).  ..  ******************
  92  2016-07-25 09:01  37  ******************
  93  2016-07-25 09:02  38  *******************
...  ..(  2 skipped).  ..  *******************
  96  2016-07-25 09:05  38  *******************
  97  2016-07-25 09:06  39  ********************
...  ..( 16 skipped).  ..  ********************
114  2016-07-25 09:23  39  ********************
115  2016-07-25 09:24  40  *********************
...  ..(  3 skipped).  ..  *********************
119  2016-07-25 09:28  40  *********************
120  2016-07-25 09:29  41  **********************
...  ..( 11 skipped).  ..  **********************
  4  2016-07-25 09:41  41  **********************
  5  2016-07-25 09:42  42  ***********************
...  ..( 13 skipped).  ..  ***********************
  19  2016-07-25 09:56  42  ***********************
  20  2016-07-25 09:57  41  **********************
  21  2016-07-25 09:58  42  ***********************
  22  2016-07-25 09:59  42  ***********************
  23  2016-07-25 10:00  41  **********************
  24  2016-07-25 10:01  41  **********************
  25  2016-07-25 10:02  42  ***********************
...  ..(  6 skipped).  ..  ***********************
  32  2016-07-25 10:09  42  ***********************
  33  2016-07-25 10:10  43  ************************
  34  2016-07-25 10:11  42  ***********************
...  ..(  9 skipped).  ..  ***********************
  44  2016-07-25 10:21  42  ***********************
  45  2016-07-25 10:22  41  **********************
  46  2016-07-25 10:23  42  ***********************
  47  2016-07-25 10:24  42  ***********************
  48  2016-07-25 10:25  41  **********************
  49  2016-07-25 10:26  42  ***********************
...  ..( 14 skipped).  ..  ***********************
  64  2016-07-25 10:41  42  ***********************

SCT Error Recovery Control:
  Read: Disabled
  Write: Disabled

Device Statistics (GP Log 0x04)
Page  Offset Size  Value Flags Description
0x01  =====  =  =  ===  == General Statistics (rev 1) ==
0x01  0x008  4  66  ---  Lifetime Power-On Resets
0x01  0x010  4  12728  ---  Power-on Hours
0x01  0x018  6  7129801523  ---  Logical Sectors Written
0x01  0x020  6  40590812  ---  Number of Write Commands
0x01  0x028  6  1728126918179  ---  Logical Sectors Read
0x01  0x030  6  578608314  ---  Number of Read Commands
0x03  =====  =  =  ===  == Rotating Media Statistics (rev 1) ==
0x03  0x008  4  12725  ---  Spindle Motor Power-on Hours
0x03  0x010  4  12725  ---  Head Flying Hours
0x03  0x018  4  360  ---  Head Load Events
0x03  0x020  4  2005  ---  Number of Reallocated Logical Sectors
0x03  0x028  4  2050759  ---  Read Recovery Attempts
0x03  0x030  4  0  ---  Number of Mechanical Start Failures
0x04  =====  =  =  ===  == General Errors Statistics (rev 1) ==
0x04  0x008  4  0  ---  Number of Reported Uncorrectable Errors
0x04  0x010  4  86  ---  Resets Between Cmd Acceptance and Completion
0x05  =====  =  =  ===  == Temperature Statistics (rev 1) ==
0x05  0x008  1  42  ---  Current Temperature
0x05  0x010  1  42  N--  Average Short Term Temperature
0x05  0x018  1  41  N--  Average Long Term Temperature
0x05  0x020  1  46  ---  Highest Temperature
0x05  0x028  1  17  ---  Lowest Temperature
0x05  0x030  1  43  N--  Highest Average Short Term Temperature
0x05  0x038  1  22  N--  Lowest Average Short Term Temperature
0x05  0x040  1  41  N--  Highest Average Long Term Temperature
0x05  0x048  1  25  N--  Lowest Average Long Term Temperature
0x05  0x050  4  0  ---  Time in Over-Temperature
0x05  0x058  1  60  ---  Specified Maximum Operating Temperature
0x05  0x060  4  0  ---  Time in Under-Temperature
0x05  0x068  1  0  ---  Specified Minimum Operating Temperature
0x06  =====  =  =  ===  == Transport Statistics (rev 1) ==
0x06  0x008  4  743  ---  Number of Hardware Resets
0x06  0x010  4  339  ---  Number of ASR Events
0x06  0x018  4  0  ---  Number of Interface CRC Errors
  |||_ C monitored condition met
  ||__ D supports DSN
  |___ N normalized value

SATA Phy Event Counters (GP Log 0x11)
ID  Size  Value  Description
0x0001  2  0  Command failed due to ICRC error
0x0002  2  0  R_ERR response for data FIS
0x0003  2  0  R_ERR response for device-to-host data FIS
0x0004  2  0  R_ERR response for host-to-device data FIS
0x0005  2  0  R_ERR response for non-data FIS
0x0006  2  0  R_ERR response for device-to-host non-data FIS
0x0007  2  0  R_ERR response for host-to-device non-data FIS
0x0009  2  11  Transition from drive PhyRdy to drive PhyNRdy
0x000a  2  12  Device-to-host register FISes sent due to a COMRESET
0x000b  2  0  CRC errors within host-to-device FIS
0x000d  2  0  Non-CRC errors within host-to-device FIS



and finally SAS2Flash:

Code:
LSI Corporation SAS2 Flash Utility
Version 16.00.00.00 (2013.03.01)
Copyright (c) 2008-2013 LSI Corporation. All rights reserved

  Adapter Selected is a LSI SAS: SAS2308_1(D1)

Num  Ctlr  FW Ver  NVDATA  x86-BIOS  PCI Addr
----------------------------------------------------------------------------

0  SAS2308_1(D1)  20.00.04.00  14.01.30.16  07.39.00.00  00:02:00:00

  Finished Processing Commands Successfully.
  Exiting SAS2Flash.




Thanks for looking at this, guys - I can't make head nor tail of most of it... though going by the smartctl output, it looks like da3 is on the verge of failing, even though everything else says the pool is fine?

Charles.
 

m0nkey_

MVP
Joined
Oct 27, 2015
Messages
2,739
Code:
  5 Reallocated_Sector_Ct  0x0033  001  001  005  Pre-fail  Always  FAILING_NOW 2005
The drive is going bad. Time to replace it.

I also note that you have never run a long SMART test on the drive: the self-test log shows only short tests.
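
For anyone wondering how to read that attribute table: smartctl reports an attribute as failing once its normalized VALUE drops to or below THRESH (a THRESH of 000 never trips). Below is a rough Python sketch of that check, run against a few rows copied from the smartctl -a output earlier in the thread. The names `parse_attr_line` and `failing` are my own, and it assumes the ten-column layout that `smartctl -a` printed here (not the shorter `-x` layout), so treat it as an illustration rather than a monitoring tool.

```python
from typing import List, NamedTuple, Optional


class SmartAttr(NamedTuple):
    attr_id: int
    name: str
    value: int   # normalized current value
    worst: int   # lowest normalized value seen so far
    thresh: int  # failure threshold (0 means the attribute never trips)
    raw: str     # vendor-specific raw value, kept as text


def parse_attr_line(line: str) -> Optional[SmartAttr]:
    """Parse one row of the attribute table; return None for anything else.

    Assumed columns (hypothetical parser, matching the output in this thread):
    ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW...
    """
    parts = line.split()
    if len(parts) < 10 or not parts[0].isdigit():
        return None
    try:
        return SmartAttr(
            attr_id=int(parts[0]),
            name=parts[1],
            value=int(parts[3]),
            worst=int(parts[4]),
            thresh=int(parts[5]),
            raw=" ".join(parts[9:]),
        )
    except ValueError:
        return None


def failing(attrs: List[SmartAttr]) -> List[SmartAttr]:
    """Return attributes whose normalized value is at or below a non-zero threshold."""
    return [a for a in attrs if a.thresh > 0 and a.value <= a.thresh]


# Rows copied from the smartctl -a output posted above.
SAMPLE = [
    "ID# ATTRIBUTE_NAME  FLAG  VALUE WORST THRESH TYPE  UPDATED  WHEN_FAILED RAW_VALUE",
    "  1 Raw_Read_Error_Rate  0x000b  074  074  016  Pre-fail  Always  -  180879365",
    "  5 Reallocated_Sector_Ct  0x0033  001  001  005  Pre-fail  Always  FAILING_NOW 2005",
    "196 Reallocated_Event_Count 0x0032  001  001  000  Old_age  Always  -  2389",
    "197 Current_Pending_Sector  0x0022  100  100  000  Old_age  Always  -  0",
]

if __name__ == "__main__":
    parsed = [a for a in map(parse_attr_line, SAMPLE) if a is not None]
    for attr in failing(parsed):
        print(f"{attr.attr_id} {attr.name}: value={attr.value} thresh={attr.thresh} raw={attr.raw}")
```

On these rows only Reallocated_Sector_Ct is flagged (value 1 against a threshold of 5). Reallocated_Event_Count also sits at 001, but its threshold is 000, so it is informational only.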
 