bicycle_wreck
Cadet
- Joined
- Aug 2, 2020
- Messages
- 4
Hello,
I'm currently running FreeNAS-11.3-U1. My primary pool consists of two 8 TB WD Red drives in a ZFS mirror and is scrubbed weekly. The primary pool is replicated to a third (single) WD Red drive as a backup.
This morning I received an email alert:
New alerts:
* Pool primary state is ONLINE: One or more devices is currently being resilvered. The pool will continue to function, possibly in a degraded state..
Current alerts:
* Scrub of pool 'backup' finished.
* Scrub of pool 'primary' finished.
* Scrub of pool 'primary' started.
* Scrub of pool 'backup' started.
* A system update is available. Go to System -> Update to download and apply the update.
* Replication "primary -> root@localhost:backup/" succeeded.
* Pool primary state is ONLINE: One or more devices is currently being resilvered. The pool will continue to function, possibly in a degraded state..
Shortly after, within ten minutes, I received a follow-up email alert:
The following alert has been cleared:
* Pool primary state is ONLINE: One or more devices is currently being resilvered. The pool will continue to function, possibly in a degraded state..
Current alerts:
* Scrub of pool 'backup' finished.
* Scrub of pool 'primary' finished.
* Scrub of pool 'primary' started.
* Scrub of pool 'backup' started.
* A system update is available. Go to System -> Update to download and apply the update.
* Replication "primary -> root@localhost:backup/" succeeded.
Checking my zpool status shows me that the other pools were scrubbed with zero errors, but the primary pool resilvered around 8 GB of data:
Code:
  pool: primary
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: resilvered 8.10G in 0 days 00:05:28 with 0 errors on Sun Aug  2 06:11:23 2020
config:

        NAME                                            STATE     READ WRITE CKSUM
        primary                                         ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            gptid/6248a972-5454-11e7-9cea-002590f01e26  ONLINE       0     0     0
            gptid/62dac90d-5454-11e7-9cea-002590f01e26  ONLINE       0     0     0

errors: No known data errors
I've had this configuration (with some software updates here and there) for about four years, but this is the first time a scrub has ever resilvered anything, so I decided to run a SMART (short) test on each drive in the primary mirror.
ada0:
Code:
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   132   132   054    Pre-fail  Offline      -       112
  3 Spin_Up_Time            0x0007   146   146   024    Pre-fail  Always       -       451 (Average 453)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       55
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       18
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   128   128   020    Pre-fail  Offline      -       18
  9 Power_On_Hours          0x0012   097   097   000    Old_age   Always       -       27450
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       55
 22 Helium_Level            0x0023   100   100   025    Pre-fail  Always       -       100
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       1191
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       1191
194 Temperature_Celsius     0x0002   176   176   000    Old_age   Always       -       34 (Min/Max 20/39)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       18
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0
ada1:
Code:
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   131   131   054    Pre-fail  Offline      -       116
  3 Spin_Up_Time            0x0007   147   147   024    Pre-fail  Always       -       446 (Average 448)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       58
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       20
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   128   128   020    Pre-fail  Offline      -       18
  9 Power_On_Hours          0x0012   097   097   000    Old_age   Always       -       27450
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       58
 22 Helium_Level            0x0023   100   100   025    Pre-fail  Always       -       100
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       1188
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       1188
194 Temperature_Celsius     0x0002   176   176   000    Old_age   Always       -       34 (Min/Max 20/39)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       20
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0
So, each of the primary drives has a Reallocated_Sector_Ct above zero (18 for ada0 and 20 for ada1). I originally tested both drives before install with badblocks and then smartctl long, and those counts were zero. The Reallocated_Sector_Ct for the backup WD Red is 0.
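To keep an eye on these counters over time, something like the following could pull the raw values out of saved `smartctl -A` output. This is only a rough sketch: the function name `raw_values`, the `WATCHED_IDS` selection, and the regular expression are my own assumptions rather than anything FreeNAS or smartmontools provides, and the sample rows are copied from the ada0 table above.

```python
import re

# Attribute IDs I chose to watch for signs of media trouble
# (a personal selection, not an official list).
WATCHED_IDS = {5, 196, 197, 198}

# One attribute row: ID, NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED,
# WHEN_FAILED, then RAW_VALUE (only its leading integer is captured).
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]{4}"
    r"\s+\d+\s+\d+\s+\d+\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def raw_values(smart_text):
    """Return {attribute_name: raw_value} for the watched attribute IDs."""
    found = {}
    for line in smart_text.splitlines():
        match = ROW.match(line)
        if match and int(match.group(1)) in WATCHED_IDS:
            found[match.group(2)] = int(match.group(3))
    return found


# Rows copied from the ada0 output in this post.
ADA0_SAMPLE = """\
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       55
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       18
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       18
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
"""

if __name__ == "__main__":
    for name, value in raw_values(ADA0_SAMPLE).items():
        print(f"{name}: {value}")
```

Running this periodically against each drive and comparing the numbers with the previous run would show whether the reallocation counts are still climbing or have settled.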
So the counts have evidently crept up over the past four years. Given the recent (and first) resilver and these reallocation counts, I'm thinking about ordering a couple more WD Reds and building a new pool. Is this an overreaction? Are there any other diagnostics I should run (e.g., smartctl long)?
Thank you.