LIGISTX
Guru
Joined: Apr 12, 2015
Messages: 525
A few days ago I got a message that looked like a possible loose cable issue. This is the email:
Code:
.local kernel log messages:
(da9:mps0:0:16:0): WRITE(16). CDB: 8a 00 00 00 00 01 cd 91 6f e8 00 00 00 08 00 00 length 4096 SMID 761 terminated ioc 804b loginfo 31110d01 scsi 0 state c xfer 4096
(da9:mps0:0:16:0): WRITE(16). CDB: 8a 00 00 00 00 01 cd 91 6f e8 00 00 00 08 00 00
(da9:mps0:0:16:0): CAM status: CCB request completed with an error
(da9:mps0:0:16:0): Retrying command
(da9:mps0:0:16:0): WRITE(16). CDB: 8a 00 00 00 00 01 cd 91 6f e8 00 00 00 08 00 00
(da9:mps0:0:16:0): CAM status: SCSI Status Error
(da9:mps0:0:16:0): SCSI status: Check Condition
(da9:mps0:0:16:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da9:mps0:0:16:0): Retrying command (per sense data)
-- End of security output --
The above was on 11.1U6
Two nights ago I decided to update to 11.1U7, and a few hours later that night I got a slew of emails. Not wanting to deal with it before work yesterday, I shut down FreeNAS, then ESXi, and then the server itself.
I replugged all the cables, made sure everything was seated, and brought everything back up; it seemed to be fine. I believe it had to resilver a drive, but not the entire drive, since it finished very quickly. I checked zpool status and it looked fine.
This morning I woke up to more emails. I am not really sure what to check or what is useful here, but the pool is degraded once again. Looking at the security run output from the night I upgraded to 11.1U7, I see similar issues to those above at the very bottom of the emailed output:
Code:
epair0a: promiscuous mode enabled
(da9:mps0:0:16:0): WRITE(16). CDB: 8a 00 00 00 00 01 94 a2 6c a0 00 00 00 10 00 00 length 8192 SMID 342 terminated ioc 804b loginfo 31110d01 scsi 0 state c xfer 0
(da9:mps0:0:16:0): WRITE(16). CDB: 8a 00 00 00 00 01 94 a2 6c a0 00 00 00 10 00 00
(da9:mps0:0:16:0): CAM status: CCB request completed with an error
(da9:mps0:0:16:0): Retrying command
(da9:mps0:0:16:0): WRITE(16). CDB: 8a 00 00 00 00 01 94 a2 6c a0 00 00 00 10 00 00
(da9:mps0:0:16:0): CAM status: SCSI Status Error
(da9:mps0:0:16:0): SCSI status: Check Condition
(da9:mps0:0:16:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da9:mps0:0:16:0): Retrying command (per sense data)
(da9:mps0:0:16:0): WRITE(16). CDB: 8a 00 00 00 00 01 94 a2 6d a8 00 00 00 08 00 00 length 4096 SMID 178 terminated ioc 804b loginfo 31110d01 scsi 0 state c xfer 0
(da9:mps0:0:16:0): WRITE(16). CDB: 8a 00 00 00 00 01 94 a2 6d a8 00 00 00 08 00 00
(da9:mps0:0:16:0): CAM status: CCB request completed with an error
(da9:mps0:0:16:0): Retrying command
(da9:mps0:0:16:0): WRITE(16). CDB: 8a 00 00 00 00 01 94 a2 6d a8 00 00 00 08 00 00
(da9:mps0:0:16:0): CAM status: SCSI Status Error
(da9:mps0:0:16:0): SCSI status: Check Condition
(da9:mps0:0:16:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da9:mps0:0:16:0): Retrying command (per sense data)
(da9:mps0:0:16:0): WRITE(16). CDB: 8a 00 00 00 00 01 94 a2 6d a8 00 00 00 08 00 00
(da9:mps0:0:16:0): CAM status: SCSI Status Error
(da9:mps0:0:16:0): SCSI status: Check Condition
(da9:mps0:0:16:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da9:mps0:0:16:0): Retrying command (per sense data)
(da9:mps0:0:16:0): WRITE(16). CDB: 8a 00 00 00 00 01 94 a2 aa 38 00 00 00 08 00 00 length 4096 SMID 762 terminated ioc 804b loginfo 31110d01 scsi 0 state c xfer 0
(da9:mps0:0:16:0): WRITE(16). CDB: 8a 00 00 00 00 01 94 a2 aa 38 00 00 00 08 00 00
(da9:mps0:0:16:0): CAM status: CCB request completed with an error
(da9:mps0:0:16:0): Retrying command
(da9:mps0:0:16:0): WRITE(16). CDB: 8a 00 00 00 00 01 94 a2 aa 38 00 00 00 08 00 00
(da9:mps0:0:16:0): CAM status: SCSI Status Error
(da9:mps0:0:16:0): SCSI status: Check Condition
(da9:mps0:0:16:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da9:mps0:0:16:0): Retrying command (per sense data)
(da9:mps0:0:16:0): WRITE(10). CDB: 2a 00 0a 70 7b e8 00 00 20 00 length 16384 SMID 1021 terminated ioc 804b loginfo 31110d01 scsi 0 state c xfer 0
(da9:mps0:0:16:0): WRITE(10). CDB: 2a 00 0a 70 7b e8 00 00 20 00
(da9:mps0:0:16:0): CAM status: CCB request completed with an error
(da9:mps0:0:16:0): Retrying command
(da9:mps0:0:16:0): WRITE(10). CDB: 2a 00 0a 70 7b e8 00 00 20 00
(da9:mps0:0:16:0): CAM status: SCSI Status Error
(da9:mps0:0:16:0): SCSI status: Check Condition
(da9:mps0:0:16:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da9:mps0:0:16:0): Retrying command (per sense data)
mps0: mpssas_prepare_remove: Sending reset for target ID 16
da9 at mps0 bus 0 scbus33 target 16 lun 0
da9: <ATA WDC WD40EFRX-68N 0A82> s/n WD-WCC7K2YF4UTL detached
(da9:mps0:0:16:0): WRITE(10). CDB: 2a 00 0a 70 7b e8 00 00 20 00 length 16384 SMID 599 terminated ioc 804b loginfo 31110d01 scsi 0 state c xfer 0
(da9:mps0:0:16:0): WRITE(10). CDB: 2a 00 0a 70 7b e8 00 00 20 00
mps0: Unfreezing devq for target ID 16
(da9:mps0:0:16:0): CAM status: CCB request completed with an error
(da9:mps0:0:16:0): Error 5, Periph was invalidated
GEOM_MIRROR: Device swap0: provider da9p1 disconnected.
(da9:mps0:0:16:0): Periph destroyed
-- End of security output --
Today I woke up to these, after the scheduled scrub of the pool started last night:
Code:
.local kernel log messages:
uhub0: 8 ports with 8 removable, self powered
ugen0.2: <VMware VMware Virtual USB Mouse> at usbus0
da2 at mps0 bus 0 scbus33 target 9 lun 0
da2: <ATA WDC WD40EFRX-68N 0A82> Fixed Direct Access SPC-4 SCSI device
da2: Serial Number WD-WCC7K7XF3TTF
da2: 600.000MB/s transfers
da2: Command Queueing enabled
da2: 3815447MB (7814037168 512 byte sectors)
da2: quirks=0x8<4K>
da4 at mps0 bus 0 scbus33 target 11 lun 0
da4: <ATA WDC WD40EFRX-68N 0A82> Fixed Direct Access SPC-4 SCSI device
da4: Serial Number WD-WCC7K7PHXACK
da4: 600.000MB/s transfers
da4: Command Queueing enabled
da4: 3815447MB (7814037168 512 byte sectors)
da4: quirks=0x8<4K>
-- End of security output --
Code:
Checking status of zfs pools:

NAME          SIZE   ALLOC  FREE   EXPANDSZ  FRAG  CAP  DEDUP  HEALTH    ALTROOT
freenas-boot  15.9G  2.27G  13.6G  -         -     14%  1.00x  ONLINE    -
xxx           36.2T  20.4T  15.9T  -         4%    56%  1.00x  DEGRADED  /mnt

  pool: xxx
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: scrub in progress since Thu Feb 7 00:00:07 2019
        12.5T scanned at 1.18G/s, 11.4T issued at 1.08G/s, 20.4T total
        1.27M repaired, 56.16% done, 0 days 02:21:11 to go
config:

        NAME                                            STATE     READ WRITE CKSUM
        xxxxxxxx                                        DEGRADED     0     0     0
          raidz2-0                                      DEGRADED     0     0     0
            gptid/ab0351e8-44ea-11e8-8cad-e0071bffdaee  ONLINE       0     0     0
            gptid/abbfceac-44ea-11e8-8cad-e0071bffdaee  ONLINE       0     0     0
            gptid/ac8d872a-44ea-11e8-8cad-e0071bffdaee  ONLINE       0     0     0
            gptid/ad4a2436-44ea-11e8-8cad-e0071bffdaee  ONLINE       0     0     0
            gptid/ae0d7e64-44ea-11e8-8cad-e0071bffdaee  ONLINE       0     0     0
            gptid/aeca106f-44ea-11e8-8cad-e0071bffdaee  ONLINE       0     0     0
            gptid/af89686d-44ea-11e8-8cad-e0071bffdaee  ONLINE       0     0     0
            gptid/b04ad4fc-44ea-11e8-8cad-e0071bffdaee  DEGRADED     0     0   373  too many errors  (repairing)
            gptid/b10b6452-44ea-11e8-8cad-e0071bffdaee  ONLINE       0     0     0
            gptid/b1d949c1-44ea-11e8-8cad-e0071bffdaee  ONLINE       0     0     0

errors: No known data errors
-- End of daily output --
Running zpool status right now, it looks like it is 92% repaired. So do I have a drive issue? Is the controller getting confused? (In one of the logs above, da2 and da4 are listed, but I don't see that log showing any sign of an error, and honestly I don't know what that log is telling me.) Or is something just unhappy, and the scrub/repair it's currently doing may help?
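In case it helps anyone reading later: the per-device counters in the zpool status output above can be pulled out with a quick awk filter. To keep the snippet self-contained here, it runs on a sample line copied from my output; on the live box you would pipe the real `zpool status xxx` into the same awk instead.

```shell
# Print NAME, READ, WRITE, CKSUM for any config line that carries counters.
# Sample line copied from the zpool status output above; live usage would be:
#   zpool status xxx | awk '$3 ~ /^[0-9]+$/ && $5 ~ /^[0-9]+$/ { ... }'
printf '%s\n' 'gptid/b04ad4fc-44ea-11e8-8cad-e0071bffdaee DEGRADED 0 0 373 too many errors (repairing)' |
awk '$3 ~ /^[0-9]+$/ && $5 ~ /^[0-9]+$/ { print $1, "read:", $3, "write:", $4, "cksum:", $5 }'
# prints: gptid/b04ad4fc-44ea-11e8-8cad-e0071bffdaee read: 0 write: 0 cksum: 373
```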
I have been running this box for almost 1.5 years now with very few issues, nothing even close to "oh shit, my data!", and it has been under ESXi for many months without issue. Whatever this is seems to have started a week ago with the initial log message about retrying a command on da9, and now I have whatever issue this may be. I have restarted since that da9 retry issue, so I am not sure if da9 is still the same drive or not; I should have noted down its serial number. Right now, I know da9 is the drive with the serial ending in F4UTL, although I am not sure I see that in any of the above logs, nor am I sure how to check what may be wrong.
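For anyone else trying to tie a daX name back to a serial number: as I understand it, smartctl can report it directly (smartctl and glabel ship with FreeNAS; /dev/da9 is just my device). Since I can't paste live output here, the runnable part below parses a sample smartctl info line:

```shell
# On the live box, the real commands would be:
#   smartctl -i /dev/da9 | grep -i serial   # drive model and serial
#   glabel status | grep da9                # maps gptid labels to daX partitions
# Self-contained sample of the smartctl "Serial Number" line so this runs anywhere:
smart_line='Serial Number:    WD-WCC7K2YF4UTL'
printf '%s\n' "$smart_line" | awk -F': *' '/Serial Number/ { print $2 }'
# prints: WD-WCC7K2YF4UTL
```

That serial matches the "s/n WD-WCC7K2YF4UTL detached" message in the second log above, for whatever that is worth.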
I figure I will let this thing finish its scrub and see what sort of resolution it comes to on its own (I know FreeNAS is pretty good about not destroying data, which is my reasoning for letting it finish). All the other ESXi VMs seem to be fine, so I don't think it's a low-level hardware issue causing data errors, although no other VM would be as sensitive as FreeNAS. I am really only running a couple of Ubuntu VMs doing very simple things, as this is just a little homelab playground; and I am not very advanced, so it's a pretty boring playground.
Any help would be great!