Seagate SAS drives not dead but not healthy

RegularJoe

Patron
Joined
Aug 19, 2013
Messages
330
Hi All,

I think some of my drives need replacing, but is there a way to know for sure? I am running FreeNAS-11.2-U7. Other SAS drives in the same system show none of these issues. The SAS shelf is an HP D2600 with only one cable and one I/O module installed.

They are failing in the following ways:
1) "Read SMART Self-Test Log Failed"
2) "failed to read SMART values"
3) They have one or more "Elements in grown defect list"
4) They have many "Recovered via rewrite in-place" entries

They are still working, but I am concerned and suspect that at times they slow down scrubs and other operations. Is there a camcontrol command to clear the history on the Seagate SAS drives?

Thanks,
Joe

root@feenas1[~]# smartctl -x /dev/da0
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.2-STABLE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor: SEAGATE
Product: ST33000650SS
Revision: RS17
Compliance: SPC-4
User Capacity: 3,000,592,982,016 bytes [3.00 TB]
Logical block size: 512 bytes
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Logical Unit id: 0x5000c50066ff7fb7
Serial number: 9XK0K33T
Device type: disk
Transport protocol: SAS (SPL-3)
Local Time is: Mon Jan 6 10:50:16 2020 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Temperature Warning: Disabled or Not Supported
Read Cache is: Enabled
Writeback Cache is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature: 32 C
Drive Trip Temperature: 68 C

Manufactured in week 15 of year 2011
Specified cycle count over device lifetime: 10000
Accumulated start-stop cycles: 16
Specified load-unload count over device lifetime: 300000
Accumulated load-unload cycles: 97
Elements in grown defect list: 13

Vendor (Seagate) cache information
Blocks sent to initiator = 1687146662
Blocks received from initiator = 1247742005
Blocks read from cache and sent to initiator = 1554906834
Number of read and write commands whose size <= segment size = 48875294
Number of read and write commands whose size > segment size = 737151

Vendor (Seagate/Hitachi) factory information
number of hours powered up = 27224.63
number of minutes until next internal SMART test = 27

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   2490732599        0         0  2490732599          0      77829.633           0
write:           0        0         0           0          0      13962.738           0
verify:     254926        0         0      254926          0          0.000           0

Non-medium error count: 4831

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background long   Completed                   32    27199                 - [-   -    -]
# 2  Background short  Completed                   32    27158                 - [-   -    -]
# 3  Background short  Completed                   32    27133                 - [-   -    -]
# 4  Background short  Completed                   32    27108                 - [-   -    -]
# 5  Background short  Completed                   32    27084                 - [-   -    -]
# 6  Background short  Completed                   32    27060                 - [-   -    -]
# 7  Background long   Completed                   32    27030                 - [-   -    -]
# 8  Background short  Completed                   32    26987                 - [-   -    -]
# 9  Background short  Completed                   32    26963                 - [-   -    -]
#10  Background short  Completed                   32    26939                 - [-   -    -]
#11  Background long   Completed                   32    26935                 - [-   -    -]
#12  Background short  Completed                   32    26818                 - [-   -    -]
#13  Background short  Completed                   32    26794                 - [-   -    -]
#14  Background short  Completed                   32    26770                 - [-   -    -]
#15  Background short  Completed                   32    26746                 - [-   -    -]
#16  Background long   Completed                   32    26713                 - [-   -    -]
#17  Background short  Completed                   32    26664                 - [-   -    -]
#18  Background short  Completed                   32    26543                 - [-   -    -]
#19  Background long   Completed                   32    26529                 - [-   -    -]
#20  Background short  Completed                   32    26495                 - [-   -    -]

Long (extended) Self Test duration: 27600 seconds [460.0 minutes]

Background scan results log
Status: waiting until BMS interval timer expires
Accumulated power on time, hours:minutes 27224:38 [1633478 minutes]
Number of background scans performed: 63, scan progress: 0.00%
Number of background medium scans performed: 63

# when lba(hex) [sk,asc,ascq] reassign_status
1 14069:07 000000001875ce88 [1,16,0] Recovered via rewrite in-place
2 25369:38 00000000b52b3fca [1,18,7] Recovered via rewrite in-place
3 25463:56 00000000adbe9b06 [1,18,7] Recovered via rewrite in-place
4 25464:49 00000000b40c724a [1,18,7] Recovered via rewrite in-place
5 25468:39 00000000a9d5a23e [1,18,7] Recovered via rewrite in-place
6 25468:41 00000000aa075feb [1,18,7] Recovered via rewrite in-place
7 25468:41 00000000aa076dd7 [1,18,7] Recovered via rewrite in-place
8 25468:52 00000000ab21f0a6 [1,18,7] Recovered via rewrite in-place
9 25468:57 00000000abaebded [1,18,7] Recovered via rewrite in-place
10 25471:33 00000000ab0b4c26 [1,18,7] Recovered via rewrite in-place
11 25501:33 0000000055a078ca [1,18,7] Recovered via rewrite in-place
12 25503:10 00000000a7fb5389 [1,18,7] Recovered via rewrite in-place
13 25503:17 00000000adf62c47 [1,18,7] Recovered via rewrite in-place
14 25520:23 0000000034b7e6aa [1,18,7] Recovered via rewrite in-place
15 25520:53 0000000029be8786 [1,18,7] Recovered via rewrite in-place
16 25522:32 000000008051a196 [1,18,7] Recovered via rewrite in-place
17 25545:51 0000000041d8da13 [1,18,7] Recovered via rewrite in-place
18 25546:48 00000000740a7d4d [1,18,7] Recovered via rewrite in-place
19 25546:51 000000007688f0ff [1,18,7] Recovered via rewrite in-place
20 25547:49 00000000a75d25be [1,18,7] Recovered via rewrite in-place
21 25547:49 00000000a75fb5cf [1,18,7] Recovered via rewrite in-place
22 25733:05 0000000067a82942 [1,18,7] Recovered via rewrite in-place
23 25748:38 000000002993bb8d [1,18,7] Recovered via rewrite in-place
24 25750:06 000000002db476b5 [1,18,7] Recovered via rewrite in-place
25 25750:15 0000000034fd733b [1,18,7] Recovered via rewrite in-place
26 25750:16 000000003661149d [1,18,7] Recovered via rewrite in-place
27 25779:48 0000000051ec88b8 [1,18,7] Recovered via rewrite in-place
28 25807:11 0000000073134d4e [1,18,7] Recovered via rewrite in-place
29 25810:50 000000009068551e [1,18,7] Recovered via rewrite in-place
30 25811:29 00000000adf8bff0 [1,18,7] Recovered via rewrite in-place
31 25811:39 00000000b5256bb4 [1,18,7] Recovered via rewrite in-place
32 25814:50 000000012e5b6ef7 [1,18,7] Recovered via rewrite in-place
33 25893:49 000000007b6dc8af [1,18,7] Recovered via rewrite in-place
34 25913:44 00000000003ff816 [1,18,7] Recovered via rewrite in-place
35 25952:07 0000000015df8ce3 [1,18,7] Recovered via rewrite in-place
36 25952:25 000000001875c3d7 [1,18,7] Recovered via rewrite in-place
37 25952:25 000000001875ce88 [1,18,7] Recovered via rewrite in-place
38 25956:02 0000000099d83b34 [1,18,7] Recovered via rewrite in-place
39 25956:27 00000000ad2ae7b9 [1,18,7] Recovered via rewrite in-place
40 25956:47 00000000bc7954c7 [1,18,7] Recovered via rewrite in-place
41 25956:58 00000000c4d7bfde [1,18,7] Recovered via rewrite in-place
42 25957:09 00000000cd6b55dd [1,18,7] Recovered via rewrite in-place
43 25967:15 00000000003ff79d [1,18,7] Recovered via rewrite in-place
44 25992:11 00000000003ff73e [1,18,7] Recovered via rewrite in-place
45 25994:21 000000000854ec0a [1,18,7] Recovered via rewrite in-place
46 25994:27 000000000ceec0a1 [1,18,7] Recovered via rewrite in-place
47 25995:02 000000002c6990ca [1,18,7] Recovered via rewrite in-place
48 25995:07 000000003038b5b8 [1,18,7] Recovered via rewrite in-place
49 25995:10 0000000032cd373f [1,18,7] Recovered via rewrite in-place
50 26196:13 00000000003ff7fb [1,18,7] Recovered via rewrite in-place
51 26460:56 000000004ede3439 [1,18,7] Recovered via rewrite in-place
52 26462:17 000000004ede5c18 [1,18,7] Recovered via rewrite in-place
53 26470:39 00000000003ff78f [1,18,7] Recovered via rewrite in-place
54 26687:07 00000000003ff81c [1,18,7] Recovered via rewrite in-place
55 26697:56 00000000003ffa07 [1,18,7] Recovered via rewrite in-place
56 26698:07 00000000b0b0ce0f [1,18,7] Recovered via rewrite in-place
57 26739:05 00000000ba674fef [1,18,7] Recovered via rewrite in-place
58 26830:07 00000000240d4e68 [1,18,7] Recovered via rewrite in-place
59 26830:19 000000002eec88fe [1,18,7] Recovered via rewrite in-place
60 26832:17 00000000963fd79b [1,18,7] Recovered via rewrite in-place
61 26832:51 00000000af96ca90 [1,18,7] Recovered via rewrite in-place
62 26835:09 00000000d8f04e58 [1,18,7] Recovered via rewrite in-place
63 26835:39 00000000df7699c9 [1,18,7] Recovered via rewrite in-place
64 26839:45 0000000116dfee52 [1,18,7] Recovered via rewrite in-place
65 26903:51 000000013e427f31 [1,18,7] Recovered via rewrite in-place
66 26903:56 00000000003ff79c [1,18,7] Recovered via rewrite in-place
67 26936:26 00000000003ffa0e [1,18,7] Recovered via rewrite in-place
68 26942:43 0000000032de2d25 [1,18,7] Recovered via rewrite in-place
69 26943:29 00000000592ba81d [1,18,7] Recovered via rewrite in-place
70 26944:25 00000000878de961 [1,18,7] Recovered via rewrite in-place
71 26944:26 0000000087cff807 [1,18,7] Recovered via rewrite in-place
72 26944:31 000000008c23ea95 [1,18,7] Recovered via rewrite in-place
73 26944:37 0000000090e5b859 [1,18,7] Recovered via rewrite in-place
74 26944:37 0000000090e6ba36 [1,18,7] Recovered via rewrite in-place
75 26944:37 0000000090e9b134 [1,18,7] Recovered via rewrite in-place
76 26945:16 00000000af965be2 [1,18,7] Recovered via rewrite in-place
77 26945:51 00000000c98a3a9c [1,18,7] Recovered via rewrite in-place
78 26945:51 00000000c98c54df [1,18,7] Recovered via rewrite in-place
79 26945:51 00000000c98f2d61 [1,18,7] Recovered via rewrite in-place
80 26946:19 00000000de360183 [1,18,7] Recovered via rewrite in-place
81 26948:11 00000001238061d8 [1,18,7] Recovered via rewrite in-place
82 26949:00 000000013e43e75a [1,18,7] Recovered via rewrite in-place
83 26980:53 000000002930e7d4 [1,18,7] Recovered via rewrite in-place
84 26982:08 000000003605185d [1,18,7] Recovered via rewrite in-place
85 26985:36 000000004698528f [1,18,7] Recovered via rewrite in-place
86 26986:01 000000005af2d539 [1,18,7] Recovered via rewrite in-place
87 26987:30 00000000003ff73e [1,18,7] Recovered via rewrite in-place
88 27016:18 000000001e4646f5 [1,18,7] Recovered via rewrite in-place
89 27025:15 00000000003ff7f4 [1,18,7] Recovered via rewrite in-place
90 27035:08 000000009db6c368 [1,18,7] Recovered via rewrite in-place
91 27035:58 00000000c128c38c [1,18,7] Recovered via rewrite in-place
92 27135:10 00000000003ff80e [1,18,7] Recovered via rewrite in-place
93 27143:39 00000000141c5a4b [1,18,7] Recovered via rewrite in-place
94 27144:13 000000003379541d [1,18,7] Recovered via rewrite in-place
95 27144:50 0000000054685896 [1,18,7] Recovered via rewrite in-place
96 27144:53 00000000575403ef [1,18,7] Recovered via rewrite in-place
97 27146:08 00000000981da863 [1,18,7] Recovered via rewrite in-place
98 27146:25 00000000a62accdd [1,18,7] Recovered via rewrite in-place
99 27146:40 00000000b26182c4 [1,18,7] Recovered via rewrite in-place
100 27147:08 00000000c7e73d74 [1,18,7] Recovered via rewrite in-place
101 27147:09 00000000c90f975f [1,18,7] Recovered via rewrite in-place
102 27147:32 00000000da005167 [1,18,7] Recovered via rewrite in-place
103 27148:30 00000001038725b0 [1,18,7] Recovered via rewrite in-place
104 27149:10 000000011dbbd089 [1,18,7] Recovered via rewrite in-place
105 27166:06 00000000003ff93d [1,18,7] Recovered via rewrite in-place

Protocol Specific port log page for SAS SSP
relative target port id = 1
generation code = 1
number of phys = 1
phy identifier = 0
attached device type: expander device
attached reason: power on
reason: power on
negotiated logical link rate: phy enabled; 6 Gbps
attached initiator port: ssp=0 stp=0 smp=1
attached target port: ssp=0 stp=0 smp=1
SAS address = 0x5000c50033ff7fb5
attached SAS address = 0x500143802368273f
attached phy identifier = 0
Invalid DWORD count = 0
Running disparity error count = 0
Loss of DWORD synchronization = 0
Phy reset problem = 0
Phy event descriptors:
Invalid word count: 0
Running disparity error count: 0
Loss of dword synchronization count: 0
Phy reset problem count: 0
relative target port id = 2
generation code = 1
number of phys = 1
phy identifier = 1
attached device type: expander device
attached reason: power on
reason: unknown
negotiated logical link rate: phy enabled; 1.5 Gbps
attached initiator port: ssp=0 stp=0 smp=1
attached target port: ssp=0 stp=0 smp=1
SAS address = 0x5000c50033ff7fb6
attached SAS address = 0x500143802368273d
attached phy identifier = 0
Invalid DWORD count = 0
Running disparity error count = 0
Loss of DWORD synchronization = 2
Phy reset problem = 0
Phy event descriptors:
Invalid word count: 0
Running disparity error count: 0
Loss of dword synchronization count: 2
Phy reset problem count: 0
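
For anyone who wants to track these indicators over time rather than eyeball the dump, here is a rough Python sketch that pulls a few numbers out of saved `smartctl -x` text. The function name `summarize_smartctl` and the dictionary keys are made up for illustration, and the regular expressions assume the output looks like the paste above; other drives or smartctl versions may format things differently. It only counts what the drive reports and does not decide whether a drive should be replaced.

```python
import re

def summarize_smartctl(text):
    """Extract a few SAS health indicators from `smartctl -x` text.

    Hypothetical helper: the key names are illustrative, and the patterns
    assume the layout shown in the output above.
    """
    summary = {"grown_defects": None, "non_medium_errors": None}

    m = re.search(r"Elements in grown defect list:[ \t]*(\d+)", text)
    if m:
        summary["grown_defects"] = int(m.group(1))

    m = re.search(r"Non-medium error count:[ \t]*(\d+)", text)
    if m:
        summary["non_medium_errors"] = int(m.group(1))

    # Background scan rows look like:
    #   2 25369:38 00000000b52b3fca [1,18,7] Recovered via rewrite in-place
    rows = re.findall(
        r"^[ \t]*\d+[ \t]+(\d+):(\d+)[ \t]+([0-9a-fA-F]{16})[ \t]+\[[^\]]*\][ \t]+(.+)$",
        text,
        flags=re.MULTILINE,
    )
    summary["bms_entries"] = len(rows)
    # The same LBA can show up more than once, so count distinct ones too.
    summary["distinct_lbas"] = len({lba.lower() for _, _, lba, _ in rows})
    return summary

# A few lines copied from the output above, used as sample input.
sample = """\
Elements in grown defect list: 13
Non-medium error count: 4831
# when lba(hex) [sk,asc,ascq] reassign_status
1 14069:07 000000001875ce88 [1,16,0] Recovered via rewrite in-place
36 25952:25 000000001875c3d7 [1,18,7] Recovered via rewrite in-place
37 25952:25 000000001875ce88 [1,18,7] Recovered via rewrite in-place
"""

print(summarize_smartctl(sample))
```

Running it against output saved every few days would show whether the grown defect list or the number of background scan entries keeps climbing, which is easier to judge than a single snapshot.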
 

artlessknave

Wizard
Joined
Oct 29, 2016
Messages
1,506
I'm confused. You know the drives are failing, but you don't... believe they are failing?
The first thing to do with drives that report failure errors is to replace them. Of COURSE they are causing performance issues... they are failing :/
 

RegularJoe

Patron
Joined
Aug 19, 2013
Messages
330
That drive was replaced because it had "Elements in grown defect list: 13".

It is odd that it and others pass the SMART short and long tests.

The right query on SAS drives is smartctl -x /dev/daxx, since it shows the Error counter log, here with no uncorrected errors:

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:    941006820        0         0   941006820          0    2171263.067           0
write:           0        0         0           0          0     152606.766           0
verify: 3931722295        1         0  3931722296          1     591364.266           0
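
If it helps, the read/write/verify rows of the Error counter log above can be split into named fields with a small Python sketch like the one below. The function name `parse_error_counter_log` and the field names are my own labels for the seven columns in the header, not anything smartctl defines, and the parsing assumes each row has exactly the seven values shown here.

```python
def parse_error_counter_log(text):
    """Return {"read": {...}, "write": {...}, "verify": {...}}.

    Hypothetical helper: the field names below are illustrative labels
    for the seven columns in the smartctl header.
    """
    fields = (
        "ecc_fast",
        "ecc_delayed",
        "rereads_rewrites",
        "total_corrected",
        "algorithm_invocations",
        "gigabytes_processed",
        "total_uncorrected",
    )
    rows = {}
    for line in text.splitlines():
        parts = line.split()
        # A data row is the operation label plus seven numeric columns.
        if len(parts) == 8 and parts[0] in ("read:", "write:", "verify:"):
            values = [float(p) if "." in p else int(p) for p in parts[1:]]
            rows[parts[0].rstrip(":")] = dict(zip(fields, values))
    return rows

# The three rows from the log above, used as sample input.
log = """\
read: 941006820 0 0 941006820 0 2171263.067 0
write: 0 0 0 0 0 152606.766 0
verify: 3931722295 1 0 3931722296 1 591364.266 0
"""

counters = parse_error_counter_log(log)
for op, c in counters.items():
    print(op, "uncorrected:", c["total_uncorrected"])
```

Splitting on whitespace means the sketch works on both the aligned and the flattened form of the table, since only the order of the columns matters.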


I have the same question as this guy: https://sourceforge.net/p/smartmontools/mailman/message/35168309/