failed to read SMART Attribute Data

mloiterman

Dabbler
Joined
Jan 30, 2013
Messages
45
  • motherboard make and model
    • SuperMicro X11SSH-CTF
      • Firmware Revision : 01.48
      • Firmware Build Time : 06/22/2018
      • BIOS Version: 2.2
      • BIOS Build Time: 05/23/2018
      • Redfish Version : 1.0.1
  • CPU make and model
    • CPU: Intel(R) Xeon(R) CPU E3-1275 v5 @ 3.60GHz (3600.18-MHz K8-class CPU)
      • Origin="GenuineIntel" Id=0x506e3 Family=0x6 Model=0x5e Stepping=3
  • RAM quantity
    • 32 GiB
      • Crucial 2 x 16GB DDR4-2400 EUDIMM 1.2V CL17
  • boot drive
    • Intel SSD 600p Series SSDPEKKW128G7X1 (128 GB, M.2 80mm PCIe NVMe 3.0 x4, 3D1, TLC)
  • hard drives, quantity, model numbers, and RAID configuration
    • 8 x ST4000LM024
    • RAIDZ2
  • hard disk controllers
    • Avago Technologies (LSI) SAS3008
      • Code:
        Avago Technologies SAS3 Flash Utility
        Version 16.00.00.00 (2017.05.02)
        Copyright 2008-2017 Avago Technologies. All rights reserved.
        
            Adapter Selected is a Avago SAS: SAS3008(C0)
        
            Controller Number              : 0
            Controller                     : SAS3008(C0)
            PCI Address                    : 00:01:00:00
            SAS Address                    : 5003048-0-1e04-6000
            NVDATA Version (Default)       : 0e.00.20.00
            NVDATA Version (Persistent)    : 0e.00.20.00
            Firmware Product ID            : 0x2221 (IT)
            Firmware Version               : 15.00.03.00
            NVDATA Vendor                  : LSI
            NVDATA Product ID              : LSI3008-IT
            BIOS Version                   : 08.35.00.00
            UEFI BSD Version               : 17.00.00.00
            FCODE Version                  : N/A
            Board Name                     : LSI3008-IT
            Board Assembly                 : N/A
            Board Tracer Number            : N/A
        
  • network cards
    • ix0: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 3.2.12-k> mem 0xd0200000-0xd03fffff,0xd0404000-0xd0407fff irq 16 at device 0.0 on pci4
    • ix1: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 3.2.12-k> mem 0xd0000000-0xd01fffff,0xd0400000-0xd0403fff irq 17 at device 0.1 on pci4
  • FreeNAS-11.2-RELEASE (Build Date: Dec 5, 2018 21:28)
On 12/21/18, I flashed the firmware on my LSI3008 from 12.00.02.00 (IR) to 15.00.03.00 (IT) as part of the upgrade from 11.1-U6 to 11.2. Everything appeared to go OK, and I received no errors or problems.

On 12/26/18 I received this error:
Code:
New alerts:
* The volume tank state is DEGRADED: One or more devices are faulted in response to persistent errors. Sufficient replicas exist for the pool to continue functioning in a degraded state.
* Device: /dev/da7 [SAT], failed to read SMART Attribute Data


This was in the kernel log:
Code:
(da7:mpr0:0:7:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 776 Aborting command 0xfffffe0001094b80
mpr0: Sending reset from mprsas_send_abort for target ID 7
(pass7:mpr0:0:7:0): ATA COMMAND PASS THROUGH(16). CDB: 85 08 0e 00 d0 00 01 00 00 00 4f 00 c2 00 b0 00 length 512 SMID 626 terminated ioc 804b loginfo 31130000 scsi 0 state c xfer 0
mpr0: Unfreezing devq for target ID 7
(da7:mpr0:0:7:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
(da7:mpr0:0:7:0): CAM status: Command timeout
(da7:mpr0:0:7:0): Retrying command
(da7:mpr0:0:7:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
(da7:mpr0:0:7:0): CAM status: SCSI Status Error
(da7:mpr0:0:7:0): SCSI status: Check Condition
(da7:mpr0:0:7:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da7:mpr0:0:7:0): Error 6, Retries exhausted
(da7:mpr0:0:7:0): Invalidating pack
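The `ATA COMMAND PASS THROUGH(16)` entry in that log is the command the controller terminated alongside the target reset. A minimal Python sketch, assuming the SAT ATA PASS-THROUGH(16) byte layout and decoding only the fields needed to name the ATA command, suggests it is a SMART READ DATA request (command 0xB0, feature 0xD0). That would plausibly explain why the alert reads "failed to read SMART Attribute Data": the SMART poll got caught in the reset, which is a symptom and not by itself proof of a SMART fault. The function name `decode_ata_pt16` is hypothetical.

```python
# Decode the ATA PASS-THROUGH(16) CDB printed by FreeBSD CAM.
# Assumes the SAT ATA PASS-THROUGH(16) layout; only identifying fields are decoded.

SMART_FEATURES = {
    0xD0: "READ DATA",
    0xD5: "READ LOG",
    0xD8: "ENABLE OPERATIONS",
    0xDA: "RETURN STATUS",
}


def decode_ata_pt16(cdb_hex: str) -> dict:
    """Return the protocol, features, count and ATA command of a 0x85 CDB."""
    cdb = bytes.fromhex(cdb_hex)  # fromhex accepts space-separated byte pairs
    if len(cdb) != 16 or cdb[0] != 0x85:
        raise ValueError("not an ATA PASS-THROUGH(16) CDB")
    features = (cdb[3] << 8) | cdb[4]
    count = (cdb[5] << 8) | cdb[6]
    command = cdb[14]
    name = f"ATA command 0x{command:02X}"
    # SMART (0xB0) requests carry the signature 0x4F / 0xC2 in LBA mid / LBA high
    if command == 0xB0 and (cdb[10], cdb[12]) == (0x4F, 0xC2):
        name = "SMART " + SMART_FEATURES.get(features & 0xFF, f"0x{features:02X}")
    return {
        "protocol": (cdb[1] >> 1) & 0x0F,  # 4 = PIO Data-In
        "features": features,
        "count": count,  # number of 512-byte blocks to transfer
        "command": command,
        "name": name,
    }


if __name__ == "__main__":
    print(decode_ata_pt16("85 08 0e 00 d0 00 01 00 00 00 4f 00 c2 00 b0 00"))
```

With `count` = 1, the transfer is one 512-byte block, which matches the `length 512` in the log line. The same CDB also shows up terminated in the later da6 log (SMID 480).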


I replaced the drive and resilvered without any errors. The resilver completed last night, and I thought all was OK.

This morning, I received this error:

Code:
New alerts:
* Device: /dev/da6 [SAT], failed to read SMART Attribute Data


This was in the kernel log:

Code:
pid 2232 (syslog-ng), uid 0: exited on signal 6 (core dumped)
(da6:mpr0:0:6:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 449 Aborting command 0xfffffe0001077570
mpr0: Sending reset from mprsas_send_abort for target ID 6
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 52 b0 00 00 00 80 00 00 length 65536 SMID 978 terminated ioc 804b loginfo 31130000 scsi 0 state c xfer 0
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 51 a8 00 00 00 80 00 00 length 65536 SMID 316 terminated ioc 804b loginfo 31130000 scsi 0 state c xfer 0
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 52 b0 00 00 00 80 00 00
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 52 28 00 00 00 80 00 00 length 65536 SMID 729 terminated ioc 804b loginfo 31130000 scsi 0 state c xfer 0
(da6:mpr0:0:6:0): CAM status: CCB request completed with an error
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 51 20 00 00 00 80 00 00 length 65536 SMID 930 terminated ioc 804b loginfo 31130000 scsi 0 state c xfer 0
(da6:mpr0:0:6:0): Retrying command
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 51 a8 00 00 00 80 00 00
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 50 a0 00 00 00 80 00 00 length 65536 SMID 886 terminated ioc 804b loginfo 31130000 scsi 0 state c xfer 0
(da6:mpr0:0:6:0): CAM status: CCB request completed with an error
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 50 18 00 00 00 80 00 00 length 65536 SMID 802 terminated ioc 804b loginfo 31130000 scsi 0 state c xfer 0
(da6:mpr0:0:6:0): Retrying command
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 52 28 00 00 00 80 00 00
(da6:mpr0:0:6:0): CAM status: CCB request completed with an error
(da6:mpr0:0:6:0): Retrying command
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 51 20 00 00 00 80 00 00
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 4f 98 00 00 00 80 00 00 length 65536 SMID 558 terminated ioc 804b loginfo 31130000 scsi 0 state c xfer 0
(da6:mpr0:0:6:0): CAM status: CCB request completed with an error
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 4f 10 00 00 00 80 00 00 length 65536 SMID 588 terminated ioc 804b loginfo 31130000 scsi 0 state c xfer 0
(da6:mpr0:0:6:0): Retrying command
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 50 a0 00 00 00 80 00 00
(da6:mpr0:0:6:0): CAM status: CCB request completed with an error
(da6:mpr0:0:6:0): Retrying command
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 50 18 00 00 00 80 00 00
(pass6:mpr0:0:6:0): ATA COMMAND PASS THROUGH(16). CDB: 85 08 0e 00 d0 00 01 00 00 00 4f 00 c2 00 b0 00 length 512 SMID 480 terminated ioc 804b loginfo 31130000 scsi 0 state c xfer 0
(da6:mpr0:0:6:0): CAM status: CCB request completed with an error
(da6:mpr0:0:6:0): Retrying command
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 4f 98 00 00 00 80 00 00
(da6:mpr0:0:6:0): CAM status: CCB request completed with an error
(da6:mpr0:0:6:0): Retrying command
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 4f 10 00 00 00 80 00 00
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 4f 08 00 00 00 08 00 00 length 4096 SMID 778 terminated ioc 804b loginfo 31130000 scsi 0 state c xfer 0
(da6:mpr0:0:6:0): CAM status: CCB request completed with an error
(da6:mpr0:0:6:0): Retrying command
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 4e 08 00 00 01 00 00 00 length 131072 SMID 765 terminated ioc 804b loginfo 31130000 scsi 0 state c xfer 0
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 4f 08 00 00 00 08 00 00
(da6:mpr0:0:6:0): CAM status: CCB request completed with an error
(da6:mpr0:0:6:0): Retrying command
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 4d 48 00 00 00 c0 00 00 length 98304 SMID 867 terminated ioc 804b loginfo 31130000 scsi 0 state c xfer 0
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 4e 08 00 00 01 00 00 00
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 4c 48 00 00 01 00 00 00 length 131072 SMID 905 terminated ioc 804b loginfo 31130000 scsi 0 state c xfer 0
(da6:mpr0:0:6:0): CAM status: CCB request completed with an error
(da6:mpr0:0:6:0): Retrying command
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 4b 48 00 00 01 00 00 00 length 131072 SMID 671 terminated ioc 804b loginfo 31130000 scsi 0 state c xfer 0
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 4a 48 00 00 01 00 00 00 length 131072 SMID 596 terminated ioc 804b loginfo 31130000 scsi 0 state c xfer 0
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 4d 48 00 00 00 c0 00 00
(da6:mpr0:0:6:0): CAM status: CCB request completed with an error
(da6:mpr0:0:6:0): Retrying command
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 4c 48 00 00 01 00 00 00
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 49 48 00 00 01 00 00 00 length 131072 SMID 731 terminated ioc 804b loginfo 31130000 scsi 0 state c xfer 0
(da6:mpr0:0:6:0): CAM status: CCB request completed with an error
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 48 48 00 00 01 00 00 00 length 131072 SMID 293 terminated ioc 804b loginfo 31130000 scsi 0 state c xfer 0
(da6:mpr0:0:6:0): Retrying command
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 47 48 00 00 01 00 00 00 length 131072 SMID 805 terminated ioc 804b loginfo 31130000 scsi 0 state c xfer 0
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 4b 48 00 00 01 00 00 00
(da6:mpr0:0:6:0): CAM status: CCB request completed with an error
(da6:mpr0:0:6:0): Retrying command
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 4a 48 00 00 01 00 00 00
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 46 48 00 00 01 00 00 00 length 131072 SMID 186 terminated ioc 804b loginfo 31130000 scsi 0 state c xfer 0
(da6:mpr0:0:6:0): CAM status: CCB request completed with an error
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 58 ac 00 70 00 00 00 f8 00 00 length 126976 SMID 983 terminated ioc 804b loginfo 31130000 scsi 0 state c xfer 0
(da6:mpr0:0:6:0): Retrying command
mpr0: Unfreezing devq for target ID 6
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 49 48 00 00 01 00 00 00
(da6:mpr0:0:6:0): CAM status: CCB request completed with an error
(da6:mpr0:0:6:0): Retrying command
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 48 48 00 00 01 00 00 00
(da6:mpr0:0:6:0): CAM status: CCB request completed with an error
(da6:mpr0:0:6:0): Retrying command
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 47 48 00 00 01 00 00 00
(da6:mpr0:0:6:0): CAM status: CCB request completed with an error
(da6:mpr0:0:6:0): Retrying command
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 1c da 46 48 00 00 01 00 00 00
(da6:mpr0:0:6:0): CAM status: CCB request completed with an error
(da6:mpr0:0:6:0): Retrying command
(da6:mpr0:0:6:0): WRITE(16). CDB: 8a 00 00 00 00 01 58 ac 00 70 00 00 00 f8 00 00
(da6:mpr0:0:6:0): CAM status: CCB request completed with an error
(da6:mpr0:0:6:0): Retrying command
(da6:mpr0:0:6:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
(da6:mpr0:0:6:0): CAM status: Command timeout
(da6:mpr0:0:6:0): Retrying command
(da6:mpr0:0:6:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
(da6:mpr0:0:6:0): CAM status: SCSI Status Error
(da6:mpr0:0:6:0): SCSI status: Check Condition
(da6:mpr0:0:6:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da6:mpr0:0:6:0): Error 6, Retries exhausted
(da6:mpr0:0:6:0): Invalidating pack
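To see what was in flight when target 6 was reset, the CDB bytes in that log can be decoded. A minimal Python sketch, assuming the standard SBC layouts for WRITE(16) (8-byte LBA in bytes 2-9, 4-byte block count in bytes 10-13) and SYNCHRONIZE CACHE(10), plus the 512-byte logical sector size smartctl reports for this drive; `decode_cdb` is a hypothetical helper name.

```python
# Decode the SCSI CDBs seen in the da6 CAM log (WRITE(16), SYNCHRONIZE CACHE(10)).
LOGICAL_SECTOR = 512  # smartctl: "512 bytes logical, 4096 bytes physical"


def decode_cdb(cdb_hex: str) -> str:
    cdb = bytes.fromhex(cdb_hex)
    opcode = cdb[0]
    if opcode == 0x8A and len(cdb) == 16:  # WRITE(16)
        lba = int.from_bytes(cdb[2:10], "big")
        blocks = int.from_bytes(cdb[10:14], "big")
        return f"WRITE(16) lba={lba} blocks={blocks} bytes={blocks * LOGICAL_SECTOR}"
    if opcode == 0x35 and len(cdb) == 10:  # SYNCHRONIZE CACHE(10)
        lba = int.from_bytes(cdb[2:6], "big")
        blocks = int.from_bytes(cdb[7:9], "big")
        # blocks == 0 means "from lba to the end of the medium"
        return f"SYNCHRONIZE CACHE(10) lba={lba} blocks={blocks}"
    return f"opcode 0x{opcode:02X} not decoded"


if __name__ == "__main__":
    for cdb in (
        "35 00 00 00 00 00 00 00 00 00",
        "8a 00 00 00 00 01 1c da 52 b0 00 00 00 80 00 00",
        "8a 00 00 00 00 01 1c da 4e 08 00 00 01 00 00 00",
    ):
        print(decode_cdb(cdb))
```

The decoded sizes (128 blocks = 65536 bytes, 256 blocks = 131072 bytes) match the `length` fields in the log, and most of the failed writes fall in one small, nearly contiguous LBA range. That reads as in-flight writes being terminated by the reset and then retried, rather than a scattered set of bad sectors; treat it as an interpretation of this one log, not a diagnosis.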


I've already ordered a replacement drive, just in case; it will be here on Thursday. But what is going on here? Is drive /dev/da6 dead from the stress of the resilver? Is it just a coincidence? Something else?

For reference:

Code:
root@marshall:~ # zpool status
  pool: freenas-boot
state: ONLINE
  scan: scrub repaired 0 in 0 days 00:00:11 with 0 errors on Wed Dec 26 03:45:11 2018
config:

    NAME        STATE     READ WRITE CKSUM
    freenas-boot  ONLINE       0     0     0
      nvd0p2    ONLINE       0     0     0

errors: No known data errors

  pool: tank
state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
    Sufficient replicas exist for the pool to continue functioning in a
    degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
    repaired.
  scan: resilvered 1.53T in 3 days 03:11:16 with 0 errors on Sun Dec 30 18:05:28 2018
config:

    NAME                                            STATE     READ WRITE CKSUM
    tank                                            DEGRADED     0     0     0
      raidz2-0                                      ONLINE       0     0     0
        gptid/30805cfa-0044-11e7-95e6-0cc47ac56608  ONLINE       0     0     0
        gptid/317926db-0044-11e7-95e6-0cc47ac56608  ONLINE       0     0     0
        gptid/32708da7-0044-11e7-95e6-0cc47ac56608  ONLINE       0     0     0
        gptid/8892b717-1998-11e7-96f0-0cc47ac56608  ONLINE       0     0     0
      raidz2-1                                      DEGRADED     0     0     0
        gptid/3478907c-0044-11e7-95e6-0cc47ac56608  ONLINE       0     0     0
        gptid/358a11bf-0044-11e7-95e6-0cc47ac56608  ONLINE       0     0     0
        gptid/febc4ffd-2f92-11e7-8a04-0cc47ac56608  FAULTED      6   118     0  too many errors
        gptid/8d602633-0a19-11e9-be92-0cc47ac56608  ONLINE       0     0     0

errors: No known data errors
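The faulted device is identified only by its gptid. A minimal Python sketch that pulls the rows with non-zero READ/WRITE/CKSUM counters, or a FAULTED state, out of `zpool status` text; it assumes the counters are plain integers (zpool abbreviates very large counts such as `1.2K`, which this does not handle). On FreeBSD, `glabel status` shows which `daN` partition each gptid label belongs to. `faulty_devices` is a hypothetical name.

```python
import re

# Config rows look like: NAME  STATE  READ  WRITE  CKSUM  [note]
ROW = re.compile(
    r"^\s+(\S+)\s+(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)"
    r"\s+(\d+)\s+(\d+)\s+(\d+)"
)


def faulty_devices(status_text: str) -> list:
    """(name, state, read, write, cksum) for rows with errors or a FAULTED state."""
    hits = []
    for line in status_text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # headers, "pool:", "state:", "errors:" lines, etc.
        name, state = m.group(1), m.group(2)
        read, write, cksum = (int(m.group(i)) for i in (3, 4, 5))
        if read or write or cksum or state == "FAULTED":
            hits.append((name, state, read, write, cksum))
    return hits
```

For example, `faulty_devices(subprocess.run(["zpool", "status", "tank"], capture_output=True, text=True).stdout)` on this pool would be expected to return only the `gptid/febc4ffd-...` row with 6 read and 118 write errors.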


Code:
root@marshall:~ # smartctl -a /dev/da6
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.2-STABLE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 2.5 5400
Device Model:     ST4000LM024-2AN17V
Serial Number:    WCK0K9GW
LU WWN Device Id: 5 000c50 0a8fcf1dc
Firmware Version: 0001
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5526 rpm
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Mon Dec 31 11:23:17 2018 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 652) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x30a5) SCT Status supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   082   064   006    Pre-fail  Always       -       152689043
  3 Spin_Up_Time            0x0003   097   097   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       27
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   092   060   045    Pre-fail  Always       -       1540777129
  9 Power_On_Hours          0x0032   084   084   000    Old_age   Always       -       14572 (48 233 0)
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       27
183 SATA_Downshift_Count    0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0                
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   076   070   040    Old_age   Always       -       24 (Min/Max 22/28)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       52
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       143
194 Temperature_Celsius     0x0022   024   040   000    Old_age   Always       -       24 (0 21 0 0 0)
195 Hardware_ECC_Recovered  0x001a   082   064   000    Old_age   Always       -       152689043
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       14571 (180 15 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       15977107520
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       128477063110
254 Free_Fall_Sensor        0x0032   100   100   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     14569         -
# 2  Short offline       Completed without error       00%     14564         -
# 3  Short offline       Completed without error       00%      8156         -
# 4  Extended offline    Completed without error       00%      8002         -
# 5  Short offline       Completed without error       00%      7988         -
# 6  Short offline       Completed without error       00%      7916         -
# 7  Short offline       Completed without error       00%      7748         -
# 8  Short offline       Completed without error       00%      7580         -
# 9  Short offline       Completed without error       00%      7413         -
#10  Extended offline    Completed without error       00%      7258         -
#11  Short offline       Completed without error       00%      7245         -
#12  Short offline       Completed without error       00%      7077         -
#13  Short offline       Completed without error       00%      6909         -
#14  Short offline       Completed without error       00%      6742         -
#15  Extended offline    Completed without error       00%      6587         -
#16  Short offline       Completed without error       00%      6574         -
#17  Short offline       Completed without error       00%      6502         -
#18  Short offline       Completed without error       00%      6334         -
#19  Short offline       Completed without error       00%      6166         -
#20  Short offline       Completed without error       00%      5998         -
#21  Extended offline    Completed without error       00%      5843         -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
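Despite the alert, this output does not show any of the counters usually watched for media or link trouble. A minimal Python sketch that extracts their raw values from `smartctl -a` text; the choice of attributes (5, 187, 188, 197, 198, 199) is a common rule of thumb rather than a vendor specification, and `watched_attributes` is a hypothetical name. A rising 199 (UDMA CRC errors) is usually read as a cable or backplane problem rather than a failing disk.

```python
# Commonly watched SMART attributes (rule of thumb, not a vendor list)
WATCHED = {
    5: "Reallocated_Sector_Ct",
    187: "Reported_Uncorrect",
    188: "Command_Timeout",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
    199: "UDMA_CRC_Error_Count",
}


def watched_attributes(smartctl_text: str) -> dict:
    """Map attribute ID -> raw value for the WATCHED attributes."""
    found = {}
    in_table = False
    for line in smartctl_text.splitlines():
        if line.startswith("ID# ATTRIBUTE_NAME"):
            in_table = True
            continue
        if not in_table:
            continue
        fields = line.split()
        if not fields:
            break  # the attribute table ends at the first blank line
        attr_id = int(fields[0])
        if attr_id in WATCHED:
            # Columns: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW...
            # Only the first token of RAW_VALUE is used (some drives print several).
            found[attr_id] = int(fields[9])
    return found
```

Filtering the result with `{k: v for k, v in watched_attributes(text).items() if v}` gives an empty dict for the output above, i.e. nothing flagged.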
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Have you worked through this guide:

Hard Drive Troubleshooting Guide (All Versions of FreeNAS)
https://forums.freenas.org/index.ph...bleshooting-guide-all-versions-of-freenas.17/

You have some write errors in your zpool status; you might want to try doing a clear and running the scrub again. Also, do a long test on that da6 drive. This could be a remnant of the previous fault, or it might be a new issue; more testing is called for. The trouble with those 2.5-inch 4TB drives is that they use SMR (shingled magnetic recording), and that puts a lot of extra wear on the drives. I wouldn't expect them to last as well as standard drives.
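The suggested follow-up (clear the errors, scrub again, run a long SMART test on da6) can be written down as a dry-run script. A minimal Python sketch; the pool, gptid and device values are taken from the zpool status and smartctl output above, and `followup_commands` / `run` are hypothetical helpers. With the default `dry_run=True` it only prints the commands.

```python
import subprocess


def followup_commands(pool: str, vdev: str, disk: str) -> list:
    """argv lists for: clear the vdev's error counters, re-scrub, start a long test."""
    return [
        ["zpool", "clear", pool, vdev],    # reset READ/WRITE/CKSUM counts for the vdev
        ["zpool", "scrub", pool],          # re-read and verify every block in the pool
        ["smartctl", "-t", "long", disk],  # returns at once; the drive runs the test itself
    ]


def run(commands: list, dry_run: bool = True) -> None:
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # stop at the first failing command


if __name__ == "__main__":
    run(followup_commands("tank",
                          "gptid/febc4ffd-2f92-11e7-8a04-0cc47ac56608",
                          "/dev/da6"))
```

The extended test on this drive is rated at 652 minutes in the smartctl output, so its result is best checked later with `smartctl -l selftest /dev/da6`.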
 

mloiterman

I have worked through that guide and so far nothing sticks out. I'm running a long test now, which will complete sometime later tonight. If that checks out OK, I guess I will do a clear and a scrub, and then keep an eye on it? In the meantime, I will wait for my brand new drive.
 

Chris Moore

(da6:mpr0:0:6:0): CAM status: CCB request completed with an error
(da6:mpr0:0:6:0): Retrying command
This does look like a mechanical fault in the drive.
 

Chris Moore

Let us know how the test comes out.
 

mloiterman

Long test finished and it seems to have passed:

Code:
root@marshall:~ # smartctl -a /dev/da6
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.2-STABLE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 2.5 5400
Device Model:     ST4000LM024-2AN17V
Serial Number:    WCK0K9GW
LU WWN Device Id: 5 000c50 0a8fcf1dc
Firmware Version: 0001
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5526 rpm
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Tue Jan  1 12:30:24 2019 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)    Offline data collection activity
                    was completed without error.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 241)    Self-test routine in progress...
                    10% of test remaining.
Total time to complete Offline
data collection:         (    0) seconds.
Offline data collection
capabilities:              (0x7b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   1) minutes.
Extended self-test routine
recommended polling time:      ( 652) minutes.
Conveyance self-test routine
recommended polling time:      (   2) minutes.
SCT capabilities:            (0x30a5)    SCT Status supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   082   064   006    Pre-fail  Always       -       152689043
  3 Spin_Up_Time            0x0003   097   097   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       27
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   092   060   045    Pre-fail  Always       -       1546569245
  9 Power_On_Hours          0x0032   084   084   000    Old_age   Always       -       14598 (219 140 0)
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       27
183 SATA_Downshift_Count    0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   074   070   040    Old_age   Always       -       26 (Min/Max 22/29)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       52
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       143
194 Temperature_Celsius     0x0022   026   040   000    Old_age   Always       -       26 (0 21 0 0 0)
195 Hardware_ECC_Recovered  0x001a   082   064   000    Old_age   Always       -       152689043
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       14596 (77 49 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       15977107520
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       128477063110
254 Free_Fall_Sensor        0x0032   100   100   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Self-test routine in progress 10%     14598         -
# 2  Extended offline    Completed without error       00%     14583         -
# 3  Short offline       Completed without error       00%     14569         -
# 4  Short offline       Completed without error       00%     14564         -
# 5  Short offline       Completed without error       00%      8156         -
# 6  Extended offline    Completed without error       00%      8002         -
# 7  Short offline       Completed without error       00%      7988         -
# 8  Short offline       Completed without error       00%      7916         -
# 9  Short offline       Completed without error       00%      7748         -
#10  Short offline       Completed without error       00%      7580         -
#11  Short offline       Completed without error       00%      7413         -
#12  Extended offline    Completed without error       00%      7258         -
#13  Short offline       Completed without error       00%      7245         -
#14  Short offline       Completed without error       00%      7077         -
#15  Short offline       Completed without error       00%      6909         -
#16  Short offline       Completed without error       00%      6742         -
#17  Extended offline    Completed without error       00%      6587         -
#18  Short offline       Completed without error       00%      6574         -
#19  Short offline       Completed without error       00%      6502         -
#20  Short offline       Completed without error       00%      6334         -
#21  Short offline       Completed without error       00%      6166         -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


Nothing new in kernel log.

The drive will be replaced regardless, but I'm not sure what to make of this.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Let's take a step back for a minute and re-evaluate what is going on before you make a huge mistake.

Let me recap what I think is going on...
1. Your system was running perfectly fine on FreeNAS 11.1-U6 prior to your upgrade.
2. You upgraded your HBA to new firmware.
3. You upgraded FreeNAS to 11.2-RELEASE.

My first suggestion is to upgrade to FreeNAS 11.2-U1. While I'm not sure it will fix the problem, it is the first thing I'd try, fingers crossed.

If that doesn't work, here are the steps I'd use to continue troubleshooting:
1. Remove the HBA card from your system and connect your hard drives directly to your eight SATA ports. I'm not sure whether your NVMe boot drive uses up a SATA port; if it does, I'd remove it and boot from a USB flash drive for the troubleshooting. Remember, we need to figure out what is failing, and I doubt it's your hard drives.
2. Once you have your HBA out and the system running, run a scrub on your pool and make sure it is all clean.
3. If your system is now working fine and is stable (give it a few days or longer), you can suspect your HBA (firmware or cables; yes, cables do go bad, and many folks can speak from first-hand experience).
4. If your system is still throwing the same errors, do you have a backup of your important data? If not, get that done, then continue to the next step.
5. Run Memtest86 (3 full passes minimum; longer is always better) and a CPU stress test (2-3 hours) with all the hard drives still connected, to ensure your system is still stable. If you have failures, recheck all your electrical connections; you could have a bad power supply, or some other component may have been damaged. Power can be tricky to fix if the power supply itself isn't at fault.

Report back what you have done. I wouldn't replace any hard drives based on your SMART results; the drive you posted reported no physical errors. You might also want to run an Extended test on all of your drives, since it doesn't appear that you are running any routine SMART tests, and post the results for each drive so we can check whether something is slipping through the cracks. Once we are done, set up a routine SMART test schedule; for example, I run a daily short test and a weekly long test.
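To start the Extended test on every drive from the shell rather than one at a time, something like the following Python sketch could be used. It only wraps the standard `smartctl -t long /dev/<device>` invocation. The device names `da0` through `da7` are an assumption based on the da6/da7 names mentioned in this thread, and the helper names are hypothetical. By default it does a dry run that just prints the commands.

```python
import shlex
import subprocess

# ASSUMPTION: the eight pool disks are da0..da7 (the thread mentions da6 and
# da7). Confirm the real names with `camcontrol devlist` before running.
DRIVES = [f"da{i}" for i in range(8)]


def long_test_commands(drives):
    """Build one `smartctl -t long /dev/<drive>` argv list per drive."""
    return [["smartctl", "-t", "long", f"/dev/{d}"] for d in drives]


def start_long_tests(drives, dry_run=True):
    """Print (dry run) or execute the smartctl commands.

    smartctl only starts the test; the drive runs it in the background
    and records the result in its own self-test log.
    """
    for argv in long_test_commands(drives):
        if dry_run:
            print(shlex.join(argv))
        else:
            # smartctl's exit status is a bitmask, so a nonzero value is not
            # necessarily a failure; don't raise on it here.
            subprocess.run(argv, check=False)


if __name__ == "__main__":
    start_long_tests(DRIVES)
```

Each smartctl call returns as soon as the test has been started; results can be read later with `smartctl -a /dev/daN`. Routine scheduling is still better left to the SMART test task in the FreeNAS GUI.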

Hope this helps.
 

mloiterman

joeschmuck said:
Let's take a step back for a minute and re-evaluate what is going on before you make a huge mistake.

Let me recap what I think is going on:
1. Your system was running perfectly fine on FreeNAS 11.1-U6 prior to your upgrade.
2. You upgraded your HBA to new firmware.
3. You upgraded FreeNAS to 11.2-RELEASE.

No, it was more like:

1. System was running perfectly fine on FreeNAS 11.1-U6.
2. Upgraded FreeNAS to 11.2-RELEASE.
3. Upgraded HBA to new firmware AND flashed to IT version
4. Drive 1 failure (da7)
5. Resilver
6. Drive 2 failure (da6)

joeschmuck said:
My first suggestion is to upgrade to FreeNAS 11.2-U1. While I'm not sure it will work, it is the first thing I'd try, fingers crossed.

OK. Should I leave the drives exactly as they are, without running zpool clear first?

joeschmuck said:
If that doesn't work, here are the steps I'd use to continue troubleshooting:
1. Remove the HBA card from your system and connect your hard drives directly to your eight SATA ports. I'm not sure whether your NVMe boot drive uses up a SATA port; if it does, I'd remove it and boot from a USB flash drive for the troubleshooting. Remember, we need to figure out what is failing, and I doubt it's your hard drives.

I have no HBA card; the controller is built into my motherboard. How should I proceed?

joeschmuck said:
You might also want to run an Extended test on all of your drives, since it doesn't appear that you are running any routine SMART tests, and post the results for each drive so we can check whether something is slipping through the cracks. Once we are done, set up a routine SMART test schedule; for example, I run a daily short test and a weekly long test.

The drives WERE set up for weekly short and monthly long SMART tests, but somewhere along the line drives were replaced and I never added them back to the SMART task. This has now been corrected, and I believe a long test is scheduled to run tonight.

My main question: should I run zpool clear now and observe, or not yet?

Also worth noting: I have updated my HBA to the latest firmware provided by SuperMicro. LSI offers newer firmware versions, but so far I have chosen not to use them. In another thread, Chris suggested I upgrade to the very latest version regardless of what SuperMicro provides, but I decided against it.

Thoughts on that from everyone?
 

mloiterman

I've upgraded to 11.2-U1, and after the system came back up it started a resilver on its own. This was the initial zpool status:

Code:
root@marshall:~ # zpool status
  pool: freenas-boot
state: ONLINE
  scan: scrub repaired 0 in 0 days 00:00:11 with 0 errors on Wed Dec 26 03:45:11 2018
config:

    NAME        STATE     READ WRITE CKSUM
    freenas-boot  ONLINE       0     0     0
      nvd0p2    ONLINE       0     0     0

errors: No known data errors

  pool: tank
state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
    continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Tue Jan  1 22:22:10 2019
    6.22G scanned at 56.9M/s, 4.10G issued at 37.4M/s, 11.5T total
    175M resilvered, 0.03% done, 3 days 17:26:45 to go
config:

    NAME                                            STATE     READ WRITE CKSUM
    tank                                            ONLINE       0     0     0
      raidz2-0                                      ONLINE       0     0     0
        gptid/30805cfa-0044-11e7-95e6-0cc47ac56608  ONLINE       0     0     0
        gptid/317926db-0044-11e7-95e6-0cc47ac56608  ONLINE       0     0     0
        gptid/32708da7-0044-11e7-95e6-0cc47ac56608  ONLINE       0     0     0
        gptid/8892b717-1998-11e7-96f0-0cc47ac56608  ONLINE       0     0     0
      raidz2-1                                      ONLINE       0     0     0
        gptid/3478907c-0044-11e7-95e6-0cc47ac56608  ONLINE       0     0     0
        gptid/358a11bf-0044-11e7-95e6-0cc47ac56608  ONLINE       0     0     0
        gptid/febc4ffd-2f92-11e7-8a04-0cc47ac56608  ONLINE       0     0    10
        gptid/8d602633-0a19-11e9-be92-0cc47ac56608  ONLINE       0     0     0

errors: No known data errors



After a few minutes, I checked the status again; this is the result:

Code:
root@marshall:~ # zpool status
  pool: freenas-boot
state: ONLINE
  scan: scrub repaired 0 in 0 days 00:00:11 with 0 errors on Wed Dec 26 03:45:11 2018
config:

    NAME        STATE     READ WRITE CKSUM
    freenas-boot  ONLINE       0     0     0
      nvd0p2    ONLINE       0     0     0

errors: No known data errors

  pool: tank
state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
    attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
    using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: resilvered 705M in 0 days 00:03:13 with 0 errors on Tue Jan  1 22:25:23 2019
config:

    NAME                                            STATE     READ WRITE CKSUM
    tank                                            ONLINE       0     0     0
      raidz2-0                                      ONLINE       0     0     0
        gptid/30805cfa-0044-11e7-95e6-0cc47ac56608  ONLINE       0     0     0
        gptid/317926db-0044-11e7-95e6-0cc47ac56608  ONLINE       0     0     0
        gptid/32708da7-0044-11e7-95e6-0cc47ac56608  ONLINE       0     0     0
        gptid/8892b717-1998-11e7-96f0-0cc47ac56608  ONLINE       0     0     0
      raidz2-1                                      ONLINE       0     0     0
        gptid/3478907c-0044-11e7-95e6-0cc47ac56608  ONLINE       0     0     0
        gptid/358a11bf-0044-11e7-95e6-0cc47ac56608  ONLINE       0     0     0
        gptid/febc4ffd-2f92-11e7-8a04-0cc47ac56608  ONLINE       0     0    10
        gptid/8d602633-0a19-11e9-be92-0cc47ac56608  ONLINE       0     0     0

errors: No known data errors


I'm running a long test on the suspect drive now and it will be done in about 11 hours.
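The suspect drive here is the one with 10 in the CKSUM column. When comparing several `zpool status` snapshots (before and after a resilver, a `zpool clear`, or a reboot), a small parser can pull out just the devices with nonzero counters. This Python sketch uses a hypothetical function name and assumes the counters are plain integers, as in the output above; `zpool status` may abbreviate large counts (for example with a K suffix), which this sketch does not handle.

```python
import re

# A config row of `zpool status`: device name, state, then READ WRITE CKSUM.
# ASSUMPTION: counters are plain integers (no abbreviated values like "1.2K").
DEVICE_ROW = re.compile(
    r"^\s+(\S+)\s+(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)"
    r"\s+(\d+)\s+(\d+)\s+(\d+)\s*$"
)


def devices_with_errors(status_text):
    """Return {device: (read, write, cksum)} for rows with any nonzero counter.

    Pool and vdev rows are parsed the same way as leaf devices, so they are
    reported too if their own counters are nonzero.
    """
    bad = {}
    for line in status_text.splitlines():
        m = DEVICE_ROW.match(line)
        if m:
            counts = tuple(int(m[i]) for i in (3, 4, 5))
            if any(counts):
                bad[m[1]] = counts
    return bad
```

Run against the two snapshots above, this would flag only `gptid/febc4ffd-2f92-11e7-8a04-0cc47ac56608` with `(0, 0, 10)`.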
 

mloiterman

Long test completed without error:

Code:
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.2-STABLE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 2.5 5400
Device Model:     ST4000LM024-2AN17V
Serial Number:    WCK0K9GW
LU WWN Device Id: 5 000c50 0a8fcf1dc
Firmware Version: 0001
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5526 rpm
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Jan  2 14:41:48 2019 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 652) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x30a5) SCT Status supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   082   064   006    Pre-fail  Always       -       169186874
  3 Spin_Up_Time            0x0003   097   097   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       27
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   092   060   045    Pre-fail  Always       -       1550080588
  9 Power_On_Hours          0x0032   084   084   000    Old_age   Always       -       14624 (99 95 0)
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       27
183 SATA_Downshift_Count    0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   073   070   040    Old_age   Always       -       27 (Min/Max 22/29)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       52
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       143
194 Temperature_Celsius     0x0022   027   040   000    Old_age   Always       -       27 (0 21 0 0 0)
195 Hardware_ECC_Recovered  0x001a   082   064   000    Old_age   Always       -       169186874
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       14622 (213 1 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       15993481528
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       128477186933
254 Free_Fall_Sensor        0x0032   100   100   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     14624         -
# 2  Extended offline    Completed without error       00%     14599         -
# 3  Extended offline    Completed without error       00%     14583         -
# 4  Short offline       Completed without error       00%     14569         -
# 5  Short offline       Completed without error       00%     14564         -
# 6  Short offline       Completed without error       00%      8156         -
# 7  Extended offline    Completed without error       00%      8002         -
# 8  Short offline       Completed without error       00%      7988         -
# 9  Short offline       Completed without error       00%      7916         -
#10  Short offline       Completed without error       00%      7748         -
#11  Short offline       Completed without error       00%      7580         -
#12  Short offline       Completed without error       00%      7413         -
#13  Extended offline    Completed without error       00%      7258         -
#14  Short offline       Completed without error       00%      7245         -
#15  Short offline       Completed without error       00%      7077         -
#16  Short offline       Completed without error       00%      6909         -
#17  Short offline       Completed without error       00%      6742         -
#18  Extended offline    Completed without error       00%      6587         -
#19  Short offline       Completed without error       00%      6574         -
#20  Short offline       Completed without error       00%      6502         -
#21  Short offline       Completed without error       00%      6334         -
                                        
SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
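When reading output like this, the attributes most often treated as signs of media trouble are Reallocated_Sector_Ct (5), Reported_Uncorrect (187), Current_Pending_Sector (197) and Offline_Uncorrectable (198), while UDMA_CRC_Error_Count (199) usually points at a cable or controller link rather than the disk itself. That selection is a common rule of thumb, not a vendor specification. The Python sketch below, with hypothetical names, pulls the raw values of those attributes out of `smartctl -a` text and reports any that are nonzero.

```python
import re

# ASSUMPTION: a rule-of-thumb watch list, not a vendor specification.
# 199 (UDMA_CRC_Error_Count) usually implicates cabling or the controller.
WATCHED = {
    5: "Reallocated_Sector_Ct",
    187: "Reported_Uncorrect",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
    199: "UDMA_CRC_Error_Count",
}

# ID, name, flag, value, worst, thresh, type, updated, when_failed, raw value.
# Only the leading integer of RAW_VALUE is captured ("14624 (99 95 0)" -> 14624).
ATTR_ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+\d+\s+\d+\s+\d+"
    r"\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def nonzero_watched(smartctl_text):
    """Return {attribute_name: raw_value} for watched attributes with raw > 0."""
    hits = {}
    for line in smartctl_text.splitlines():
        m = ATTR_ROW.match(line)
        if m and int(m[1]) in WATCHED and int(m[3]) > 0:
            hits[m[2]] = int(m[3])
    return hits
```

On the long-test output above, all five watched raw values are 0, so `nonzero_watched` would return an empty dict.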
 

joeschmuck

Give 11.2-U1 some time; maybe it fixed the issue.

With respect to HBA firmware, you should know what you are doing before making any changes. You could put your system back on the firmware it was using before you made the changes, or listen to Chris; I think he gives darn good advice more often than I do. You could always roll it back if problems strike. But for now, let the system run and keep your fingers crossed. If the problem does not come back, I would not make any changes at all unless you have a clear reason to.
 

mloiterman

joeschmuck said:
Give 11.2-U1 some time; maybe it fixed the issue.

With respect to HBA firmware, you should know what you are doing before making any changes. You could put your system back on the firmware it was using before you made the changes, or listen to Chris; I think he gives darn good advice more often than I do. You could always roll it back if problems strike. But for now, let the system run and keep your fingers crossed. If the problem does not come back, I would not make any changes at all unless you have a clear reason to.

After running the long test, I rebooted, and now everything appears clear:

Code:
root@marshall:~ # zpool status
  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:00:11 with 0 errors on Wed Dec 26 03:45:11 2018
config:

    NAME        STATE     READ WRITE CKSUM
    freenas-boot  ONLINE       0     0     0
      nvd0p2    ONLINE       0     0     0

errors: No known data errors

  pool: tank
 state: ONLINE
  scan: resilvered 705M in 0 days 00:03:13 with 0 errors on Tue Jan  1 22:25:23 2019
config:

    NAME                                            STATE     READ WRITE CKSUM
    tank                                            ONLINE       0     0     0
      raidz2-0                                      ONLINE       0     0     0
        gptid/30805cfa-0044-11e7-95e6-0cc47ac56608  ONLINE       0     0     0
        gptid/317926db-0044-11e7-95e6-0cc47ac56608  ONLINE       0     0     0
        gptid/32708da7-0044-11e7-95e6-0cc47ac56608  ONLINE       0     0     0
        gptid/8892b717-1998-11e7-96f0-0cc47ac56608  ONLINE       0     0     0
      raidz2-1                                      ONLINE       0     0     0
        gptid/3478907c-0044-11e7-95e6-0cc47ac56608  ONLINE       0     0     0
        gptid/358a11bf-0044-11e7-95e6-0cc47ac56608  ONLINE       0     0     0
        gptid/febc4ffd-2f92-11e7-8a04-0cc47ac56608  ONLINE       0     0     0
        gptid/8d602633-0a19-11e9-be92-0cc47ac56608  ONLINE       0     0     0

errors: No known data errors


I also called SuperMicro, and technical support confirmed that their recommendation is to use ONLY the HBA firmware they distribute. The technician said they make changes to the firmware to ensure it works well with their hardware. So I am using the IT version of the latest SuperMicro firmware and will stay current with that channel.

I have a new drive and a new HBA cable arriving tomorrow, but I think you're right, and I will not make any further changes unless there is a reason to.
 