This is an older 45Drives system with 45 x 4 TB drives that we reformatted and installed TrueNAS 13.0-U5.2 on. I know it has some unpopular components, like the HighPoint Rocket 750 card. I also did not follow the best practice of building multiple vdevs to join into a pool. Neither of those should change the underlying disk mechanics, though.
I had a drive go bad last week (da20). The hot spare kicked in. I ordered a replacement drive and another for stock. While replacing the bad drive, I probably bumped its neighbor (da19) just enough that the other hot spare kicked in for that drive. I finished replacing the first drive and that spare returned to available. The second drive shows no SMART issues in the short test log and I don't see any other problems with it through ZFS, but the spare is still attached to that drive. The only error I received for da19 is this:
* Device: /dev/hptnr [hpt_disk_1/20/1], Read SMART Error Log Failed.
This morning I got a similar error on a third drive (da34), but the hot spare did not kick in:
* Device: /dev/hptnr [hpt_disk_2/11/1], Self-Test Log error count increased from 0 to 1.
I know both drives should probably be replaced, but I would replace da34 before da19.
I am also wondering why the hot spare kicked in for da19 but not for da34. I read in another thread that ZFS is not SMART-aware, so it does not activate spares based on SMART results. However, the "Read SMART Error Log Failed" message is the only error I saw for da19, and the error for da34 seems more definitive.
I read in another thread ( https://www.truenas.com/community/t...ad-drive-replacement-resilver-complete.88796/ ) to try zpool detach on the spare drive that is still in use. I tried the detach from the GUI on the in-use spare, but it errored out with "[EZFS_NOTSUP] Cannot detach root-level vdevs".
Results from smartctl -a on the 2 drives are:
Code:
(da19)
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       4
  3 Spin_Up_Time            0x0027   230   175   021    Pre-fail  Always       -       5491
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       51
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   002   002   000    Old_age   Always       -       72182
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       51
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       50
193 Load_Cycle_Count        0x0032   190   190   000    Old_age   Always       -       31309
194 Temperature_Celsius     0x0022   127   119   000    Old_age   Always       -       25
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      6641         -
# 2  Short offline       Completed without error       00%      6628         -
# 3  Short offline       Completed without error       00%      5242         -
Code:
(da34) - hpt_disk_2/11/1
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       10
  3 Spin_Up_Time            0x0027   182   173   021    Pre-fail  Always       -       7866
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       49
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   001   001   000    Old_age   Always       -       73195
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       49
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       48
193 Load_Cycle_Count        0x0032   199   199   000    Old_age   Always       -       3343
194 Temperature_Celsius     0x0022   125   118   000    Old_age   Always       -       27
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       50%      7653         4333832
# 2  Short offline       Completed without error       00%      6262         -
# 3  Short offline       Completed without error       00%      6087         -
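For anyone comparing the two logs, here is a rough Python sketch of how I read the da34 self-test entries. It only parses the three rows pasted above and is not a general smartctl parser. The names (parse_self_tests, unwrap_hours) are my own, and the hour arithmetic assumes the self-test log stores LifeTime(hours) as a 16-bit counter that wraps at 65536, which would explain why the log shows 7653 while Power_On_Hours is 73195.

```python
import re

# Attribute 9 (Power_On_Hours) raw value from the da34 output above.
POWER_ON_HOURS = 73195

# Self-test log rows copied from the da34 output above.
SELF_TEST_LOG = """\
# 1  Short offline       Completed: read failure       50%      7653         4333832
# 2  Short offline       Completed without error       00%      6262         -
# 3  Short offline       Completed without error       00%      6087         -
"""

# One log row: "# N  <description and status>  NN%  <hours>  <LBA or ->".
ROW = re.compile(
    r"^#\s*(?P<num>\d+)\s+(?P<text>.+?)\s+(?P<remaining>\d+)%"
    r"\s+(?P<hours>\d+)\s+(?P<lba>\d+|-)\s*$"
)


def parse_self_tests(log_text):
    """Return one dict per self-test row found in log_text."""
    rows = []
    for line in log_text.splitlines():
        match = ROW.match(line.strip())
        if match is None:
            continue  # skip headers or anything that is not a test row
        lba = match.group("lba")
        rows.append({
            "num": int(match.group("num")),
            "text": match.group("text"),
            "remaining_pct": int(match.group("remaining")),
            "logged_hours": int(match.group("hours")),
            "lba": None if lba == "-" else int(lba),
        })
    return rows


def unwrap_hours(logged_hours, power_on_hours):
    """Assumption: the log keeps hours modulo 65536. Return the most
    recent absolute hour that matches and is not later than now."""
    absolute = power_on_hours - (power_on_hours % 65536) + logged_hours
    if absolute > power_on_hours:
        absolute -= 65536
    return absolute


tests = parse_self_tests(SELF_TEST_LOG)
# Treat any row whose status text mentions "failure" as a failed test.
failed = [t for t in tests if "failure" in t["text"]]

for t in failed:
    when = unwrap_hours(t["logged_hours"], POWER_ON_HOURS)
    print(f"test #{t['num']}: {t['text']}, first bad LBA {t['lba']}, "
          f"around power-on hour {when} "
          f"({POWER_ON_HOURS - when} h before this report)")
```

Under that wraparound assumption, the failed short test on da34 would have run only a few hours before I pulled this output, which lines up with the alert arriving this morning.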