Is this a drive failing?

Status
Not open for further replies.

KempelofDoom

Explorer
Joined
Apr 11, 2014
Messages
72
I now get emails containing the following:

freenas.CRASH_SVR kernel log messages:
> ahcich8: Timeout on slot 13 port 0
> ahcich8: is 00000000 cs 00002000 ss 00002000 rs 00002000 tfd 50 serr 00000000 cmd 10008d17
> (ada7:ahcich8:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 08 e0 15 dc 40 10 00 00 00 00 00
> (ada7:ahcich8:0:0:0): CAM status: Command timeout
> (ada7:ahcich8:0:0:0): Retrying command

I have a disk to replace ada7, but when I follow the guide to replacing a disk, I don't see any of the buttons to begin the process. Is the above an indication of a drive failure? The pool passes scrubs with no errors. I get those messages daily now. Is there a command-line guide for drive replacement?
 

warri

Guru
Joined
Jun 6, 2011
Messages
1,193
How about starting by adding your FreeNAS version and hardware specs as the forum rules state?
Did you configure regular SMART tests?

Then you should poll the SMART information for the drive with smartctl -a -q noserial /dev/ada7 and see whether there are indicators of a failing drive. Furthermore, dmesg may give you more information about the error.
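For reference, here is a minimal sketch of those two checks from a shell on the FreeNAS box. The helper name disk_messages is made up for illustration, and the device names ada7 and ahcich8 are simply the ones from the kernel log above, so adjust them to your system.

```shell
# Hypothetical helper: keep only the log lines that mention any of the
# given device names (for example the disk and its AHCI channel).
disk_messages() {
    # Build an alternation such as "ada7|ahcich8" from the arguments.
    pattern=$(printf '%s|' "$@")
    pattern=${pattern%"|"}
    grep -E "$pattern"
}

# Usage on the server (not run here):
#   smartctl -a -q noserial /dev/ada7
#   dmesg | disk_messages ada7 ahcich8
```

Filtering on both names should catch the controller-side timeouts as well as the per-disk CAM messages.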
 

KempelofDoom

Explorer
Joined
Apr 11, 2014
Messages
72
Build FreeNAS-9.2.1.3-RELEASE-x64 (dc0c46b)
Platform AMD A8-5600K APU with Radeon(tm) HD Graphics
Memory 32180MB

smartctl 6.2 2013-07-26 r3841 [FreeBSD 9.2-RELEASE-p3 amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Western Digital Caviar Green (AF, SATA 6Gb/s)
Device Model: WDC WD30EZRX-00DC0B0
Firmware Version: 80.00A80
User Capacity: 3,000,592,982,016 bytes [3.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Mon Apr 28 09:50:56 2014 CDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x84) Offline data collection activity
was suspended by an interrupting command from host.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (40320) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 404) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x70b5) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   182   181   021    Pre-fail  Always       -       5891
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       32
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   089   089   000    Old_age   Always       -       8074
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       31
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       23
193 Load_Cycle_Count        0x0032   001   001   000    Old_age   Always       -       839516
194 Temperature_Celsius     0x0022   109   104   000    Old_age   Always       -       41
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

[root@freenas] ~#
 

warri

Guru
Joined
Jun 6, 2011
Messages
1,193
The load cycle count of this drive is extremely high (839,516). These drives are usually rated for 300,000-500,000 cycles over their total lifetime, and your drive is just about a year old. This is a known issue with WD Greens (2 and 3 TB); my build is facing the same problem. You should read this article for more information: Hacking WD Greens(and Reds) with WDIDLE3.exe
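To put that number in perspective, here is a rough calculation from the two relevant attributes in the posted output (Power_On_Hours and Load_Cycle_Count). The function name load_cycle_rate is only illustrative, and it assumes the raw value is the tenth whitespace-separated column of the smartctl attribute table, as it is in the listing posted here.

```shell
# Hypothetical helper: read smartctl attribute lines on stdin and report
# how often the heads have been parked on average.
load_cycle_rate() {
    awk '
        $2 == "Power_On_Hours"   { hours = $10 }   # raw power-on hours
        $2 == "Load_Cycle_Count" { cycles = $10 }  # raw load/unload cycles
        END {
            if (hours > 0 && cycles > 0)
                printf "%.1f cycles/hour, one cycle every %.1f s\n",
                       cycles / hours, hours * 3600 / cycles
            else
                print "attributes not found"
        }
    '
}

# Usage on the server (not run here):
#   smartctl -A /dev/ada7 | load_cycle_rate
```

With the values posted here (8074 hours, 839516 cycles) this works out to roughly 104 cycles per hour, or one head park about every 35 seconds.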

The rest of the SMART values look OK. Is the drive connected directly to the mainboard or do you use a separate controller? Which mainboard are you using?

EDIT: This drive has never performed a SMART test. You should probably configure automatic SMART tests; see the FreeNAS manual for details. It wouldn't hurt to run a short test now to see the results.
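A short test can also be started by hand, roughly as sketched below. The wrapper run_short_test is a hypothetical helper, and the two-minute default wait is taken from the "recommended polling time" for the short self-test that the drive reports in the posted output.

```shell
# Hypothetical helper: start a short SMART self-test, wait, then show the
# self-test log. The second argument overrides the wait time in seconds.
run_short_test() {
    dev=$1
    wait_seconds=${2:-120}
    smartctl -t short "$dev"      # the test runs inside the drive, in the background
    sleep "$wait_seconds"
    smartctl -l selftest "$dev"   # print the self-test log with the result
}

# Usage on the server (not run here):
#   run_short_test /dev/ada7
```

If the log still shows the test in progress, wait a little longer and run smartctl -l selftest again.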
 

warri

Guru
Joined
Jun 6, 2011
Messages
1,193
Sorry for the double post. Missing buttons in the GUI have been an issue for touchscreen devices and certain browsers in the past (for example, see Bug #4486 and Bug #4130). You might want to check with another browser and/or upgrade to the latest FreeNAS version, although I don't think any GUI-related changes went into 9.2.1.4 and 9.2.1.5.

Correctly replacing the drive from the command line is a bit tricky, since FreeNAS expects certain things, such as a GPT partition for ZFS to reside in and a swap partition, both of which are created correctly when using the GUI.
 

KempelofDoom

Explorer
Joined
Apr 11, 2014
Messages
72
Total noob here but I just figured out what I was doing wrong. I was looking for a volume status button but didn't realize the button that looked like a sheet of paper was the view status button till now. Once I clicked on that, I was able to see the disks and can now do the offline and replace process. Man do I feel dumb.
 

warri

Guru
Joined
Jun 6, 2011
Messages
1,193
We're all just human ;)

If you are replacing the disk with a new WD Green, remember to run wdidle3 on it before (or after) installing it, as explained in the other thread. If you have other WD Greens in your pool now, it is important to do the same on them, or you might see more drives failing shortly.
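A quick way to spot other affected drives is to compare Load_Cycle_Count across all disks. This is only a sketch: report_load_cycles is a hypothetical helper, and it assumes the same smartctl attribute layout as the output posted earlier in this thread (raw value in the tenth column).

```shell
# Hypothetical helper: print "<disk><TAB><raw Load_Cycle_Count>" for each
# disk name given, or "n/a" when the attribute cannot be read.
report_load_cycles() {
    for disk in "$@"; do
        count=$(smartctl -A "/dev/$disk" |
                awk '$2 == "Load_Cycle_Count" { print $10 }')
        printf '%s\t%s\n' "$disk" "${count:-n/a}"
    done
}

# Usage on the server (not run here); kern.disks lists the disk names
# known to the FreeBSD kernel:
#   report_load_cycles $(sysctl -n kern.disks)
```

Any drive whose count is climbing by thousands per day is a candidate for the wdidle3 treatment.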
 