Detailed hardware specs? Pool layout? SMART output?
You did nothing in your post but tell a story. You need to give data so people can help you.
Like in my avatar:
Xeon E5-2658 v4 with 96 GB of ECC registered RAM in an ESXi 6.5 host.
FreeNAS 11-RC3 runs with 20 cores and 64 GB of RAM.
I use 8x 4 TB hard disks from different brands (2x WD Red, 3x Seagate Barracuda, 3x Toshiba MD04) on a Dell 6 Gbps 9211-8i SAS controller in IT mode (firmware P20, device passthrough).
The zpool is RAIDZ2.
At the moment I am running a long SMART test on disk 8 (da7), which has been resilvering on every boot since I replaced disk 7 (da6).
I checked the cables and they seem to be fine.
I don't understand what is happening. After the resilver I get this message in my alerts:
Code:
CRITICAL: May 24, 2017 11:15 - The status of volume Data is ONLINE: One or more devices has experienced an error resulting in data corruption. Applications may be affected.
Checking the status shows that the resilver completed with 13 errors.
Right now I am running a long SMART test (smartctl -t long /dev/da7),
so I have to wait and see. Here is the report so far (30% of the SMART self-test completed):
Code:
=== START OF INFORMATION SECTION ===
Model Family: Toshiba 3.5" MD04ACA... Enterprise HDD
Device Model: TOSHIBA MD04ACA400
Serial Number: Y6J7KYDWFSAA
LU WWN Device Id: 5 000039 76ba0323b
Firmware Version: FP2A
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Wed May 24 12:17:54 2017 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x80) Offline data collection activity
was never started.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 247) Self-test routine in progress...
70% of test remaining.
Total time to complete Offline
data collection: ( 120) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 475) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000b 100 100 050 Pre-fail Always - 0
2 Throughput_Performance 0x0005 100 100 050 Pre-fail Offline - 0
3 Spin_Up_Time 0x0027 100 100 001 Pre-fail Always - 6725
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 5
5 Reallocated_Sector_Ct 0x0033 100 100 050 Pre-fail Always - 0
7 Seek_Error_Rate 0x000b 100 100 050 Pre-fail Always - 0
8 Seek_Time_Performance 0x0005 100 100 050 Pre-fail Offline - 0
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 25
10 Spin_Retry_Count 0x0033 100 100 030 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 4
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 0
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 2
193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 17
194 Temperature_Celsius 0x0022 100 100 000 Old_age Always - 42 (Min/Max 26/43)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 253 000 Old_age Always - 0
220 Disk_Shift 0x0002 100 100 000 Old_age Always - 0
222 Loaded_Hours 0x0032 100 100 000 Old_age Always - 23
223 Load_Retry_Count 0x0032 100 100 000 Old_age Always - 0
224 Load_Friction 0x0022 100 100 000 Old_age Always - 0
226 Load-in_Time 0x0026 100 100 000 Old_age Always - 637
240 Head_Flying_Hours 0x0001 100 100 001 Pre-fail Offline - 0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
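As far as I can see, the attributes that usually point to bad sectors or bad cabling (5, 197, 198 and 199) all have a raw value of 0 in the report above. To pull just those out of the table, a small filter like this could be used. It is only a sketch: smart_key_attrs is a name I made up (not part of smartmontools), and it assumes the ten-column attribute layout shown above.

```shell
# Print ID, name and raw value for attributes 5, 197, 198 (sector
# problems) and 199 (CRC errors, usually cabling).
# smart_key_attrs is a hypothetical helper name, not a smartmontools tool.
smart_key_attrs() {
    awk '$1 ~ /^(5|197|198|199)$/ && NF >= 10 { print $1, $2, $10 }'
}

# Real usage would be: smartctl -A /dev/da7 | smart_key_attrs
# Here it is fed three lines copied from the report above.
smart_key_attrs <<'EOF'
  5 Reallocated_Sector_Ct 0x0033 100 100 050 Pre-fail Always - 0
194 Temperature_Celsius 0x0022 100 100 000 Old_age Always - 42 (Min/Max 26/43)
199 UDMA_CRC_Error_Count 0x0032 200 253 000 Old_age Always - 0
EOF
```

The temperature line is skipped on purpose, only the four IDs in the pattern are printed.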
I know the temperature is high, but that is always the case with these Toshiba disks. In my old NAS I also had 2 of them and they always ran at 45-47 degrees C. I never had any problems with those. The ones in my FreeNAS are newer.
Maybe something else went wrong during the first resilver of da6?
I have read all about the resilver problems in older FreeNAS versions. Would it help if I take da7 offline, then scrub my pool, and when that is finished bring it back online and let it resilver?
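To be concrete, this is the sequence I have in mind. It is only a sketch that prints the commands and does not touch the pool. The pool name Data comes from the alert, and da7 is a placeholder: I assume the real device name (possibly a gptid label) has to be taken from zpool status first.

```shell
# Prints the planned command sequence only; nothing here changes the pool.
POOL="Data"   # volume name from the alert
DISK="da7"    # placeholder: use the device name that zpool status shows

plan() {
    echo "zpool offline $POOL $DISK"   # take the suspect disk out of the vdev
    echo "zpool scrub $POOL"           # check the remaining seven disks
    echo "zpool status -v $POOL"       # repeat until the scrub has finished
    echo "zpool online $POOL $DISK"    # bring it back and let it resilver
}

plan
```

I would run these by hand, one at a time, and wait for the scrub to finish before the online step. While da7 is offline the RAIDZ2 can only survive one more disk failure.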