The drives are brand new.
I thought the HBA card was the culprit as well... and it still could be, but I think there is a low chance of two in a row being bad.
I have a few HBA cards (E) with the same controller on them that I could try.
It's worth a try. I can also move to an older chassis with a slower backplane (SATA 3 Gb/s) just to see if the problem persists.
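The ICRC errors in the SMART log below, and the raw value of attribute 199 (UDMA_CRC_Error_Count), count transfer errors on the link between drive and controller (cable, backplane, HBA) rather than media problems. One way to judge whether another HBA or the older chassis helps is to record that counter on every drive before the change and compare after some load. A minimal bash sketch; the function names and the `/dev/sd{a..l}` range are hypothetical and need adjusting to the real pool members. Output is keyed by serial number, since `sdX` names can move around when drives change chassis.

```bash
#!/usr/bin/env bash
# Sketch: snapshot SMART attribute 199 (UDMA_CRC_Error_Count) per drive,
# keyed by serial number, so runs can be diffed before/after a hardware swap.

# Read `smartctl -i -A` output on stdin; print "<serial> <attr 199 raw value>".
serial_and_crc() {
  awk '/^Serial Number:/ { sn = $NF }
       $1 == 199        { crc = $NF }
       END              { print sn, crc }'
}

# Print one "<serial> <crc>" line per device given as an argument.
snapshot_crc() {
  local dev
  for dev in "$@"; do
    smartctl -i -A "$dev" | serial_and_crc
  done
}

# Hypothetical usage (device range is an assumption):
#   snapshot_crc /dev/sd{a..l} | sort > crc-before.txt
#   ...swap the HBA or move the drives, run a scrub...
#   snapshot_crc /dev/sd{a..l} | sort > crc-after.txt
#   diff crc-before.txt crc-after.txt
```

If the counters stop climbing after the swap, that points at the part that was replaced; if they keep rising on the same drives, the common path (backplane, cables, PSU) stays suspect.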
Below is the SMART output for one of the 12 drives that are reporting errors:
Code:
root@truenas[/home/admin]# smartctl -a /dev/sdb
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.107+truenas] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Toshiba MG09ACA... Enterprise Capacity HDD
Device Model: TOSHIBA MG09ACA18TE
Serial Number: Z2H0A51LFJDH
LU WWN Device Id: 5 000039 c38d10385
Firmware Version: 0105
User Capacity: 18,000,207,937,536 bytes [18.0 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-4 T13/BSR INCITS 529 revision 5
SATA Version is: SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Sun Jul 23 22:06:33 2023 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 120) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: (1504) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   050    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   100   100   050    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0027   100   100   001    Pre-fail  Always       -       8575
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       36
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   050    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   100   100   050    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       559
 10 Spin_Retry_Count        0x0033   100   100   030    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       36
 23 Helium_Condition_Lower  0x0023   100   100   075    Pre-fail  Always       -       0
 24 Helium_Condition_Upper  0x0023   100   100   075    Pre-fail  Always       -       0
 27 MAMR_Health_Monitor     0x0023   100   100   030    Pre-fail  Always       -       199514
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       1
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       35
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       47
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       31 (Min/Max 18/45)
196 Reallocated_Event_Count 0x0033   100   100   010    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       2176
220 Disk_Shift              0x0002   100   100   000    Old_age   Always       -       134348802
222 Loaded_Hours            0x0032   099   099   000    Old_age   Always       -       540
223 Load_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
224 Load_Friction           0x0022   100   100   000    Old_age   Always       -       0
226 Load-in_Time            0x0026   100   100   000    Old_age   Always       -       612
240 Head_Flying_Hours       0x0001   100   100   001    Pre-fail  Offline      -       0
241 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       18518740887
242 Total_LBAs_Read         0x0032   100   100   000    Old_age   Always       -       6890963785
SMART Error Log Version: 1
ATA Error Count: 2176 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 2176 occurred at disk power-on lifetime: 510 hours (21 days + 6 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 43 00 3f db 53 40 Error: ICRC, ABRT at LBA = 0x0053db3f = 5495615
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 a0 00 a0 d9 53 40 00 13d+06:32:21.688 READ FPDMA QUEUED
60 70 00 30 d8 53 40 00 13d+06:32:21.688 READ FPDMA QUEUED
60 a8 08 88 d7 53 40 00 13d+06:32:21.688 READ FPDMA QUEUED
60 70 00 18 d7 53 40 00 13d+06:32:21.688 READ FPDMA QUEUED
60 50 00 c8 d6 53 40 00 13d+06:32:21.688 READ FPDMA QUEUED
Error 2175 occurred at disk power-on lifetime: 409 hours (17 days + 1 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 43 08 3f dc 7e 40 Error: ICRC, ABRT at LBA = 0x007edc3f = 8313919
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 f8 10 40 dc 7e 40 00 9d+01:27:27.316 READ FPDMA QUEUED
60 60 08 e0 d7 7e 40 00 9d+01:27:27.312 READ FPDMA QUEUED
60 f8 00 e8 cf 7e 40 00 9d+01:27:27.312 READ FPDMA QUEUED
60 68 10 78 c8 7e 40 00 9d+01:27:27.305 READ FPDMA QUEUED
60 f8 08 78 c0 7e 40 00 9d+01:27:27.305 READ FPDMA QUEUED
Error 2174 occurred at disk power-on lifetime: 370 hours (15 days + 10 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 43 00 ff a3 2f 40 Error: ICRC, ABRT at LBA = 0x002fa3ff = 3122175
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 18 08 00 a4 2f 40 00 7d+09:54:22.423 READ FPDMA QUEUED
60 c8 00 38 9c 2f 40 00 7d+09:54:22.422 READ FPDMA QUEUED
60 18 00 20 9c 2f 40 00 7d+09:54:22.422 READ FPDMA QUEUED
60 20 00 00 9c 2f 40 00 7d+09:54:22.421 READ FPDMA QUEUED
60 f0 00 10 99 2f 40 00 7d+09:54:22.420 READ FPDMA QUEUED
Error 2173 occurred at disk power-on lifetime: 369 hours (15 days + 9 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 43 00 5f 7d eb 40 Error: ICRC, ABRT at LBA = 0x00eb7d5f = 15433055
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 18 08 60 7d eb 40 00 7d+09:47:06.903 READ FPDMA QUEUED
60 50 00 10 7d eb 40 00 7d+09:47:06.903 READ FPDMA QUEUED
60 50 00 b8 76 eb 40 00 7d+09:47:06.901 READ FPDMA QUEUED
60 18 00 98 76 eb 40 00 7d+09:47:06.900 READ FPDMA QUEUED
60 e8 00 b0 71 eb 40 00 7d+09:47:06.898 READ FPDMA QUEUED
Error 2172 occurred at disk power-on lifetime: 369 hours (15 days + 9 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 43 00 6f 8b 4a 40 Error: ICRC, ABRT at LBA = 0x004a8b6f = 4885359
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 68 00 08 8a 4a 40 00 7d+09:39:05.309 READ FPDMA QUEUED
60 f8 08 10 88 4a 40 00 7d+09:39:05.309 READ FPDMA QUEUED
60 70 00 a0 87 4a 40 00 7d+09:39:05.309 READ FPDMA QUEUED
60 18 08 88 87 4a 40 00 7d+09:39:05.309 READ FPDMA QUEUED
60 18 00 68 87 4a 40 00 7d+09:39:05.309 READ FPDMA QUEUED
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 403 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
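All five logged errors above have the same shape: ER = 0x84, which smartctl spells out as ICRC + ABRT, raised during READ FPDMA QUEUED commands at scattered LBAs. A small bash sketch of how those fields decode: bit 7 of the ATA Error register is ICRC (interface CRC error) and bit 2 is ABRT (command aborted), and the LBA smartctl prints is the low nibble of DH followed by CH, CL and SN. The Powered_Up_Time wrap is just a 32-bit millisecond counter overflowing. The function names are made up for illustration, and only the ER bits relevant here are decoded.

```bash
#!/usr/bin/env bash
# Sketch: decode fields of a smartctl ATA error log entry by hand.

# Name the set bits of an ATA Error register (ER) value, e.g. 0x84.
# Only a few bits are handled; this is not a full decoder.
decode_er() {
  local er=$1 out=()
  (( er & 0x80 )) && out+=(ICRC)   # bit 7: interface CRC error
  (( er & 0x40 )) && out+=(UNC)    # bit 6: uncorrectable data
  (( er & 0x10 )) && out+=(IDNF)   # bit 4: ID / address not found
  (( er & 0x04 )) && out+=(ABRT)   # bit 2: command aborted
  echo "${out[*]}"
}

# Rebuild the LBA from the DH, CH, CL and SN bytes of the log entry:
# low nibble of DH, then CH, CL, SN (most to least significant).
lba_from_regs() {
  local dh=$1 ch=$2 cl=$3 sn=$4
  echo $(( ((dh & 0x0f) << 24) | (ch << 16) | (cl << 8) | sn ))
}

# Powered_Up_Time wraps when a 32-bit millisecond counter overflows.
wrap_days() {
  awk 'BEGIN { printf "%.3f\n", 2^32 / 86400000 }'
}

# Error 2176: "84 43 00 3f db 53 40" = ER ST SC SN CL CH DH
decode_er 0x84                      # ICRC ABRT
lba_from_regs 0x40 0x53 0xdb 0x3f   # 5495615
wrap_days                           # 49.710
```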
Code:
root@truenas[/home/admin]# zpool status RaidZ2HDDx12
pool: RaidZ2HDDx12
state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
repaired.
scan: scrub canceled on Fri Jul 21 21:02:33 2023
config:
NAME                                     STATE     READ WRITE CKSUM
RaidZ2HDDx12                             DEGRADED     0     0     0
  raidz2-0                               DEGRADED   402     0     0
    ebe8c2eb-2533-4a13-8aaa-3734fb5b1d8d DEGRADED    32     0     0  too many errors
    a0a3f34d-d4d9-4110-a1aa-2315c0e4002f ONLINE       0     0     0
    340a9673-ede1-453a-bb46-2ce484c6aa36 DEGRADED    72     0     0  too many errors
    5a48a761-b60a-40ff-9795-f7e9bfa76f47 ONLINE       0     0     0
    95f7aa5a-dd0c-4984-9e58-a7231cf8b929 ONLINE       6     0     0
    bae59d59-7c85-415d-a57c-50d736bf5f4f ONLINE       0     0     0
    f1562200-2055-40af-902b-3e983b9a618a DEGRADED   270     0     0  too many errors
    a359990b-a6a4-4165-b455-98712e5c215b FAULTED    111     0     0  too many errors
    110cd40e-a2c2-4c5c-91f4-a4334717b57a DEGRADED    18     0     0  too many errors
    9b2c5579-853b-4f0d-8394-2507fcee01b4 FAULTED     23     0     0  too many errors
    08e8bf74-f131-467b-8803-db9e27b277e0 ONLINE       0     0     0
    39845aed-cb63-4205-a306-f3ff0fa3138a DEGRADED    16     0     0  too many errors
errors: No known data errors
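With read errors spread across eight of the twelve disks, it can help to pull just the affected members out of `zpool status` when comparing runs before and after a hardware change. A rough awk-based sketch; it assumes pool members are named by partition UUID, as in this TrueNAS pool, and `problem_devs` is a made-up name.

```bash
#!/usr/bin/env bash
# Sketch: read `zpool status` output on stdin and print the pool members
# that are not ONLINE or have non-zero READ/WRITE/CKSUM counters.
# Assumption: leaf devices are listed by UUID (e.g. ebe8c2eb-2533-...).
problem_devs() {
  awk '$1 ~ /^[0-9a-f]+-[0-9a-f]+-/ && NF >= 5 &&
       ($2 != "ONLINE" || $3 + $4 + $5 > 0) { print $1, $2, $3, $4, $5 }'
}

# Example: zpool status RaidZ2HDDx12 | problem_devs
```

One possible follow-up once the HBA/backplane question is settled: `zpool clear RaidZ2HDDx12` resets the counters (as the action line above suggests), and a fresh `zpool scrub RaidZ2HDDx12` then shows whether new read errors appear on the same members.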