Disk going bad or expected behavior?

Status
Not open for further replies.

soulbabel

Cadet
Joined
Nov 23, 2013
Messages
8
I built my first FreeNAS server about 2 months ago with six 4TB WD Red drives in a RAIDZ2 array. A few days ago, one of the disks reported a SMART 'read failure' after an extended test, so I ran a zpool scrub, which repaired 256K of data. I then ran another SMART extended test, and a 'read failure' occurred at a different LBA. I'm now running another zpool scrub, which has repaired another 256K of data so far. Should I take this as a sign that the disk is starting to go bad, or is this typical wear-and-tear behavior?

I also have three new 4TB WD Red drives that are currently undergoing stress testing in a separate Linux machine. I was planning to use them in a second, eight-disk RAIDZ2 once I get five more drives, but I'm in no rush to get that done. Should I just use one of these disks to resilver the array and RMA the disk with the errors?

I'm still new to all this so thanks for all your help. Also, the server uses ECC RAM if that is helpful. I am enclosing the output of smartctl and zpool status below:

cmd: smartctl -a /dev/da5
Code:
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       15
  3 Spin_Up_Time            0x0027   187   176   021    Pre-fail  Always       -       7641
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       47
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       701
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       47
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       41
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       1530
194 Temperature_Celsius     0x0022   119   118   000    Old_age   Always       -       33
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       7
 
SMART Error Log Version: 1
No Errors Logged
 
SMART Self-test log structure revision number 1
Num  Test_Description    Status                     Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       10%        678           24681960
# 2  Extended offline    Completed: read failure       90%        676           24681960
# 3  Short offline       Completed: read failure       10%        666           24681960
# 4  Short offline       Completed without error       00%        656           -
# 5  Short offline       Completed without error       00%        630           -
# 6  Extended offline    Completed: read failure       90%        621           24682008
# 7  Short offline       Completed without error       00%        618           -
# 8  Short offline       Completed without error       00%        606           -
# 9  Short offline       Completed without error       00%        594           -
#10  Short offline       Completed without error       00%        582           -
#11  Short offline       Completed without error       00%        570           -
#12  Short offline       Completed without error       00%        557           -
#13  Short offline       Completed without error       00%        556           -
#14  Short offline       Completed without error       00%        544           -
#15  Short offline       Completed without error       00%        532           -
#16  Short offline       Completed without error       00%        520           -
#17  Short offline       Completed without error       00%        508           -
#18  Short offline       Completed without error       00%        496           -
#19  Short offline       Completed without error       00%        484           -
#20  Short offline       Completed without error       00%        472           -
#21  Short offline       Completed without error       00%        460           -
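
For anyone who wants to track this kind of self-test log over time, here is a minimal Python sketch that counts the failed tests and lists the distinct failing LBAs. It is not part of the original post. The function names (`parse_selftest_log`, `summarize_failures`, `lba_to_byte_offset`) are made up for illustration, the regex only covers the "Short offline" and "Extended offline" rows seen above, and the 512-byte logical sector size is an assumption that should be checked against what `smartctl -i` reports for the drive.

```python
import re

# One row of the "SMART Self-test log" table printed by `smartctl -a`.
# Only the two test types that appear in the log above are handled.
ROW = re.compile(
    r"^#\s*(?P<num>\d+)\s+"
    r"(?P<test>Short offline|Extended offline)\s+"
    r"(?P<status>.+?)\s+"
    r"(?P<remaining>\d+)%\s+"
    r"(?P<hours>\d+)\s+"
    r"(?P<lba>\d+|-)\s*$"
)


def parse_selftest_log(text):
    """Return one dict per self-test row found in `text`."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if not m:
            continue  # header, blank line, or another smartctl section
        lba = m.group("lba")
        rows.append({
            "num": int(m.group("num")),
            "test": m.group("test"),
            "status": m.group("status"),
            "hours": int(m.group("hours")),
            "lba": None if lba == "-" else int(lba),
        })
    return rows


def summarize_failures(rows):
    """Count failed tests and collect the distinct first-error LBAs."""
    failed = [r for r in rows if "failure" in r["status"]]
    lbas = sorted({r["lba"] for r in failed if r["lba"] is not None})
    return {"total": len(rows), "failed": len(failed), "lbas": lbas}


def lba_to_byte_offset(lba, sector_size=512):
    """Convert an LBA to a byte offset (sector_size is an assumption)."""
    return lba * sector_size


# A few rows copied from the log above.
SAMPLE = """\
# 1  Short offline      Completed: read failure      10%      678        24681960
# 2  Extended offline    Completed: read failure      90%      676        24681960
# 4  Short offline      Completed without error      00%      656        -
# 6  Extended offline    Completed: read failure      90%      621        24682008
"""

if __name__ == "__main__":
    summary = summarize_failures(parse_selftest_log(SAMPLE))
    print(summary)
    for lba in summary["lbas"]:
        print(lba, lba_to_byte_offset(lba))
```

More than one distinct LBA among the failures, as in this log, means the read errors are not confined to a single sector.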


cmd: zpool status
Code:
pool: tank1
 state: ONLINE
  scan: scrub in progress since Fri Dec 20 21:53:03 2013
        7.75T scanned out of 12.2T at 175M/s, 7h20m to go
        256K repaired, 63.70% done
config:
 
        NAME                                            STATE     READ WRITE CKSUM
        tank1                                           ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/6177eafb-4c2c-11e3-a43a-002590d72d3b  ONLINE       0     0     0
            gptid/626456f8-4c2c-11e3-a43a-002590d72d3b  ONLINE       0     0     0
            gptid/634c34ad-4c2c-11e3-a43a-002590d72d3b  ONLINE       0     0     0
            gptid/64368d6a-4c2c-11e3-a43a-002590d72d3b  ONLINE       0     0     0
            gptid/6524b559-4c2c-11e3-a43a-002590d72d3b  ONLINE       0     0     0
            gptid/66130c69-4c2c-11e3-a43a-002590d72d3b  ONLINE       0     0     0  (repairing)
        logs
          gptid/349c2eca-4c2d-11e3-a43a-002590d72d3b    ONLINE       0     0     0
 
errors: No known data errors
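
The `zpool status` config table can be checked the same way. The sketch below is a rough, assumed approach rather than anything from the original post: it pulls out each device row and flags any row that is not ONLINE, has a nonzero READ/WRITE/CKSUM counter, or carries a trailing note such as `(repairing)`. The names `parse_vdev_rows` and `flag_suspect_rows` are hypothetical, and the regex only handles plain integer counters, not abbreviated ones.

```python
import re

# One device row of the `zpool status` config table:
#   NAME  STATE  READ  WRITE  CKSUM  [(note)]
# Simplification: counters must be plain integers.
VDEV = re.compile(
    r"^\s*(?P<name>\S+)\s+(?P<state>[A-Z]+)\s+"
    r"(?P<read>\d+)\s+(?P<write>\d+)\s+(?P<cksum>\d+)"
    r"(?:\s+\((?P<note>[^)]*)\))?\s*$"
)


def parse_vdev_rows(text):
    """Return one dict per pool/vdev/device row found in `text`."""
    rows = []
    for line in text.splitlines():
        m = VDEV.match(line)
        if not m:
            continue  # header row, "logs" label, blank or unrelated line
        rows.append({
            "name": m.group("name"),
            "state": m.group("state"),
            "read": int(m.group("read")),
            "write": int(m.group("write")),
            "cksum": int(m.group("cksum")),
            "note": m.group("note"),
        })
    return rows


def flag_suspect_rows(rows):
    """Rows that are not ONLINE, have error counts, or carry a note."""
    return [
        r for r in rows
        if r["state"] != "ONLINE"
        or r["read"] or r["write"] or r["cksum"]
        or r["note"]
    ]


# Config section copied from the output above.
SAMPLE = """\
        NAME                                            STATE     READ WRITE CKSUM
        tank1                                           ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/6177eafb-4c2c-11e3-a43a-002590d72d3b  ONLINE       0     0     0
            gptid/626456f8-4c2c-11e3-a43a-002590d72d3b  ONLINE       0     0     0
            gptid/634c34ad-4c2c-11e3-a43a-002590d72d3b  ONLINE       0     0     0
            gptid/64368d6a-4c2c-11e3-a43a-002590d72d3b  ONLINE       0     0     0
            gptid/6524b559-4c2c-11e3-a43a-002590d72d3b  ONLINE       0     0     0
            gptid/66130c69-4c2c-11e3-a43a-002590d72d3b  ONLINE       0     0     0  (repairing)
        logs
          gptid/349c2eca-4c2d-11e3-a43a-002590d72d3b    ONLINE       0     0     0
"""

if __name__ == "__main__":
    for row in flag_suspect_rows(parse_vdev_rows(SAMPLE)):
        print(row["name"], row["state"], row["note"])
```

On the output above, the only row this flags is the device marked `(repairing)`. Its error counters are all zero, so the counters alone would not have pointed at the disk.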
 

warri

Guru
Joined
Jun 6, 2011
Messages
1,193
Your long SMART tests are failing, so I'd RMA the drive.
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
Definitely not expected behavior. This drive should be considered somewhere between 1 hour and 1 year away from not working. Replace now.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Actually, your disk is already bad if it's failing SMART tests (which it is).
 