mgb
I'm in the process of burn-in testing a new server and there's 1 SAS HDD (out of 10) that's reporting quite a high number of "Errors Corrected by ECC" compared to the others.
The system:
Chassis: Supermicro CSE-826BE26-R920LPB (Dual Expanders Backplane)
Motherboard: Supermicro X10DRi-T
HBA: 2x LSI 9207-8i (only 1 is currently connected)
HDDs: 10x HGST 2TB 7K4000 SAS2
The procedure:
After running memtest86 and memtest86+ on the server, I followed the [How To] Hard Drive Burn-In Testing guide (thanks @qwertymodo). You'll notice ~8 TB of data processed in the output below, which is the result of badblocks' default four passes over the 2 TB drive.
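For anyone wanting to sanity-check that ~8 TB figure, here is a minimal sketch of the arithmetic. It assumes badblocks was run in its destructive write mode, which by default writes and reads back four test patterns over the whole drive; the function and constant names are just mine for illustration. The capacity is the "User Capacity" value smartctl reports for this drive.

```python
# Capacity as reported by smartctl ("User Capacity") for this drive, in bytes.
CAPACITY_BYTES = 2_000_398_934_016

# Assumption: badblocks write mode with its default of four test patterns,
# each written to and read back from the entire drive.
PATTERNS = 4


def expected_gigabytes(capacity_bytes: int, patterns: int) -> float:
    """Data moved in one direction over all patterns, in units of 10^9 bytes."""
    return capacity_bytes * patterns / 1e9


if __name__ == "__main__":
    # Compare against the "Gigabytes processed" column of the error counter log.
    print(f"{expected_gigabytes(CAPACITY_BYTES, PATTERNS):.3f}")
```

Under that assumption the result lines up with the "Gigabytes processed" value in the write row of the error counter log, so the counters should reflect essentially nothing but the burn-in itself.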
The suspicious HDD:
Code:
root@sysresccd /root % smartctl -q noserial -a /dev/sdh
smartctl 6.4 2015-06-04 r4109 [x86_64-linux-3.14.50-std460-amd64] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               HGST
Product:              HUS724020ALS640
Revision:             A280
Compliance:           SPC-4
User Capacity:        2,000,398,934,016 bytes [2.00 TB]
Logical block size:   512 bytes
LB provisioning type: unreported, LBPME=0, LBPRZ=0
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Wed Oct 7 14:32:51 2015 UTC
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     32 C
Drive Trip Temperature:        85 C

Manufactured in week 14 of year 2015
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  4
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  6
Elements in grown defect list: 0

Vendor (Seagate) cache information
  Blocks sent to initiator = 2048408550375424

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:       5304      354         0      5658       8029       8001.691           0
write:         0        0         0         0          6       8001.596           0
verify:     1535      176         0      1711      26283          0.064           0

Non-medium error count:        0

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background long   Completed                   -      46                 - [-   -    -]
# 2  Background long   Completed                   -       6                 - [-   -    -]
# 3  Background short  Completed                   -       0                 - [-   -    -]

Long (extended) Self Test duration: 22650 seconds [377.5 minutes]
Of the other 9 HDDs, 2 had 0 errors, 2 had 1 error, 3 had fewer than 10 errors, and 2 had fewer than 30 errors, while this one shows over 5,000 corrected read errors. That seems extremely high compared to the others.
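To compare drives on a common footing rather than eyeballing raw counts, here is a rough sketch that parses the read/write/verify rows of the error counter log and normalizes the total corrected errors by the amount of data processed. The field names and helper functions are my own labels, not anything defined by smartmontools, and the sample text is simply the three rows from the output above.

```python
import re

# The three data rows of the error counter log for the suspicious drive.
SAMPLE = """\
read:       5304      354         0      5658       8029       8001.691           0
write:         0        0         0         0          6       8001.596           0
verify:     1535      176         0      1711      26283          0.064           0
"""

# Hypothetical field names, in the column order smartctl prints them.
FIELDS = (
    "ecc_fast", "ecc_delayed", "rereads_rewrites", "total_corrected",
    "algorithm_invocations", "gigabytes_processed", "total_uncorrected",
)

ROW = re.compile(r"^(read|write|verify):\s+(.*)$")


def parse_error_counter_log(text):
    """Return {operation: {field: value}} for each recognized row."""
    rows = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue
        values = match.group(2).split()
        if len(values) != len(FIELDS):
            continue  # skip rows that do not have the expected seven columns
        rows[match.group(1)] = {
            name: float(value) if name == "gigabytes_processed" else int(value)
            for name, value in zip(FIELDS, values)
        }
    return rows


def corrected_per_terabyte(row):
    """Total corrected errors per 10^12 bytes processed (0.0 if nothing processed)."""
    terabytes = row["gigabytes_processed"] / 1000
    return row["total_corrected"] / terabytes if terabytes else 0.0


if __name__ == "__main__":
    for operation, row in parse_error_counter_log(SAMPLE).items():
        print(f"{operation}: {corrected_per_terabyte(row):.1f} corrected errors per TB")
```

For this drive the read row works out to roughly 707 corrected errors per TB read. Feeding the same rows from the other nine drives through it would give a like-for-like comparison, since they all went through the same burn-in. The verify row covers so little data that its rate is not very meaningful.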
This is definitely something I should be worried about, right?
Any suggestions on how to more thoroughly test this drive?
I'm thinking I should RMA the drive.
Thanks in advance!
--mgb