Help needed with SMART Test Failure on brand new HDD

Status
Not open for further replies.

alieg

Dabbler
Joined
Jul 12, 2013
Messages
44
I'd been getting SMART errors on one of my disks for a while. Since the disks are mirrored and I use BitTorrent Sync to copy important files to 2 PCs, I decided just to monitor it. Nothing really happened for several months, until last weekend when my 2nd HDD failed a SMART test.
I decided to replace both drives (2TB WD Greens, approx. 18 months old) on Monday, but as soon as I replaced the first drive I got a SMART failure from the new disk, and I've continued to get errors every time I reboot.

This message was generated by the smartd daemon running on:

host name: freenas
DNS domain: local

The following warning/error was logged by the smartd daemon:

Device: /dev/ada0, Failed SMART usage Attribute: 1 Raw_Read_Error_Rate.


I don't know if I can trust the results for a brand new drive, and I'm having trouble interpreting them.
If someone could at least point me in the right direction, that would be great :)

SMART Attributes Data Structure revision number: 12033
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0xc82f   000   200   051    Pre-fail  Always   FAILING_NOW 3298534883328
  3 Spin_Up_Time            0xfd27   000   253   021    Pre-fail  Always   FAILING_NOW 0
  4 Start_Stop_Count        0x6432   002   100   000    Old_age   Always   -           5497558138882
  5 Reallocated_Sector_Ct   0xc833   000   200   140    Pre-fail  Always   FAILING_NOW 0 (1792 0)
  7 Seek_Error_Rate         0xfd2e   000   253   000    Old_age   Always   -           9895604649984
  9 Power_On_Hours          0x6432   081   100   000    Old_age   Always   -           81 (10 0 0)
 10 Spin_Retry_Count        0xfd32   000   253   000    Old_age   Always   -           12094627905536
 11 Calibration_Retry_Count 0xfd32   000   253   000    Old_age   Always   -           13194139533312
 12 Power_Cycle_Count       0x6432   002   100   000    Old_age   Always   -           211106232532994
192 Power-Off_Retract_Count 0xc832   000   200   000    Old_age   Always   -           212205744160768
193 Load_Cycle_Count        0xc832   010   200   000    Old_age   Always   -           213305255788554
194 Temperature_Celsius     0x6e22   033   110   000    Old_age   Always   -           33 (196 0 0 0 0)
196 Reallocated_Event_Count 0xc832   000   200   000    Old_age   Always   -           0 (50432 0)
197 Current_Pending_Sector  0xc832   000   200   000    Old_age   Always   -           217703302299648
198 Offline_Uncorrectable   0xfd30   000   253   000    Old_age   Offline  -           218802813927424
199 UDMA_CRC_Error_Count    0xc832   000   200   000    Old_age   Always   -           219902325555200
200 Multi_Zone_Error_Rate   0xc808   000   200   000    Old_age   Offline  -           0

Warning! SMART ATA Error Log Structure error: invalid SMART checksum.
SMART Error Log Version: 0
No Errors Logged

Version: FreeNAS-9.2.1.5-RELEASE-x64 (80c1d35)
 

Attachments

  • Screenshot 2014-08-28 21.56.43.png

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
I'm torn between a cable issue (UDMA CRC errors are astronomical) and a just plain messed-up HDD (nearly all values are crazy high).

Replace the SATA cable. If that doesn't work, try it in another computer. If it's not back to normal, RMA it. Test new drives on said alternative computer first.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
That's a disk that is totally f......


I'd just RMA it right now to be honest. The firmware is clearly out to lunch on that drive.
 

philiplu

Explorer
Joined
Aug 10, 2014
Messages
58
Note that the insane raw values are actually ((SMART ID# in next line) << 40) + (real raw value?). E.g. ID# 12, Power_Cycle_Count, is 211,106,232,532,994. That's hex C00000000002, which is (192 << 40) + 2. The next line has SMART ID# 192, and a power cycle count of 2 is reasonable. That probably rules out a cable issue. Probably explains the checksum error as well. So yeah, firmware is buggy, RMA it.
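
To check that pattern against a few more rows, here is a minimal Python sketch. It assumes the raw value is a 48-bit field with the stray ID in the top 8 bits and the real count in the low 40 bits; the name split_raw and the sample list are only for illustration, with the numbers copied from the smartctl table in the first post.

```python
# Split a suspicious SMART raw value into (top byte, low 40 bits).
# Assumption: raw values are 48 bits wide and the next attribute's ID
# has leaked into the top byte, as described above.
LOW_BITS = 40
LOW_MASK = (1 << LOW_BITS) - 1


def split_raw(raw):
    """Return (high_byte, low_value) for a 48-bit raw value."""
    return raw >> LOW_BITS, raw & LOW_MASK


# (attribute ID, name, reported raw value), copied from the smartctl output
samples = [
    (4, "Start_Stop_Count", 5497558138882),
    (12, "Power_Cycle_Count", 211106232532994),
    (193, "Load_Cycle_Count", 213305255788554),
    (199, "UDMA_CRC_Error_Count", 219902325555200),
]

for attr_id, name, raw in samples:
    high, low = split_raw(raw)
    print(f"{attr_id:>3} {name:<22} high={high:<3} low={low}")
```

For these four rows the high byte comes out as 5, 192, 194 and 200, which are the IDs of the rows that follow them in the table, and the low parts are 2, 2, 10 and 0. Decoded this way, the UDMA CRC error count is 0 rather than astronomical.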

What kind of drive is this, BTW? That part scrolled off your screenshot.
 

alieg

Dabbler
Joined
Jul 12, 2013
Messages
44
Thanks everyone,

I had a go at replacing the SATA cable. I offlined the drive, formatted and resilvered it, then ran a long test on the disk.
I don't know if this is a weird issue: if I then run smartctl -a /dev/ada0 it returns "No self tests have been logged...", but if I issue the same command again it returns a list of tests (see screenshots).

Screenshot 2014-08-30 09.01.57.png Screenshot 2014-08-30 09.02.26.png

I'm still getting the message "previous self-test completed with error (read test element)" repeatedly, but none of those crazy values... yet.

Screenshot 2014-08-30 09.18.50.png

I've got in touch with Amazon, where I bought the drives, to return them. They're both 2TB WD Reds, and my previous drives were both 2TB WD Greens.
Here are 2 of the SMART error emails. When I first installed the drive it was reporting a capacity of 137GB; after I rebooted it showed the correct 2TB, but I noticed the 137GB figure again here:

This message was generated by the smartd daemon running on:

host name: freenas
DNS domain: local

The following warning/error was logged by the smartd daemon:

Device: /dev/ada0, Failed SMART usage Attribute: 1 Raw_Read_Error_Rate.

Device info:
WDWDWDEFEF-6-6UZUZ �, S/N: D-D-C4C4H1H14Y4Y, WWN:2-f6b2f6-b00000000, FW:800A0AWD, 137 GB

This message was generated by the smartd daemon running on:

host name: freenas
DNS domain: local

The following warning/error was logged by the smartd daemon:

Device: /dev/ada0, Failed SMART usage Attribute: 1 Raw_Read_Error_Rate.

Device info:
WDWDWDEFEF-6-6UZUZ �, S/N: D-D-C4C4H1H14Y4Y, WWN:4-ee24ee-22f6b2f6b, FW:800A0AWD, 2.00 TB

For details see host's SYSLOG.

I'm looking at new drives now, but I'm having trouble deciding what to buy and where to buy it. I'm reluctant to get more WD hardware, or to get anything from Amazon because of their packaging.

I was looking at these Seagate Pipelines, as they claim to have a higher temperature tolerance:
http://www.ebuyer.com/414223-seagate-2tb-pipeline-internal-hard-drive-st2000vm003
 

Z300M

Guru
Joined
Sep 9, 2011
Messages
882

alieg

Dabbler
Joined
Jul 12, 2013
Messages
44
Z300M said: "Seagate has a line of drives intended specifically for NAS machines: the 2TB version would be ST2000VN..."

Yeah, those were the other ones I was looking at before, but they're a little harder to find.

OK, I'm really confused now. I've run more tests, and they've passed on both disks. However, the message "Device: /dev/ada1, previous self-test completed with error (read test element)" now keeps being repeated in the console, plus 1 or 2 similar messages about a servo/seek test element.
 