PhiloEpisteme
Guru
- Joined
- Oct 18, 2018
- Messages
- 969
Hi folks,
First, my build.
FreeNAS Release: FreeNAS-11.2-RELEASE-U2
Board: X11SSM-F, i3-7100, 32GB ECC RAM, LGA 1151 socket, IPMI, 2x GbE Intel i210-AT
HBA: LSI/Broadcom SAS9207-8i, Firmware 20.00.07.00
Storage Pool 1: 1 vdev, 6 x 3 TB drives in RAIDZ2
Storage Pool 2: 1 vdev, 6 x 2 TB drives in RAIDZ2
I got the following alert for one of my drives from my email report.
Code:
New alerts:
* Device: /dev/da3 [SAT], 1 Currently unreadable (pending) sectors
* Device: /dev/da3 [SAT], 1 Offline uncorrectable sectors
Alerts:
* Device: /dev/da3 [SAT], 1 Currently unreadable (pending) sectors
* Device: /dev/da3 [SAT], 1 Offline uncorrectable sectors
If I look at the smartctl output for the drive I see the output attached to this post.
The pieces that I _think_ are most salient are:
Code:
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       1
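For anyone who wants to track these two counters over time, here is a minimal Python sketch. It assumes the usual ten-column layout of the smartctl attribute table; `parse_smart_attributes`, `nonzero_watched`, `SAMPLE`, and `WATCHED` are hypothetical names of my own, not part of smartctl or FreeNAS.

```python
# Hypothetical helpers (not part of smartctl or FreeNAS): parse rows of the
# SMART attribute table and report watched counters with a non-zero raw value.
# Assumed column order: ID, NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED,
# WHEN_FAILED, RAW_VALUE.

SAMPLE = """\
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       1
"""

WATCHED = ("Current_Pending_Sector", "Offline_Uncorrectable")


def parse_smart_attributes(text):
    """Return {attribute_name: fields} for every attribute row in text."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split(None, 9)  # the raw value may itself contain spaces
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # skip headers and anything that is not an attribute row
        attrs[fields[1]] = {
            "id": int(fields[0]),
            "value": int(fields[3]),
            "worst": int(fields[4]),
            "thresh": int(fields[5]),
            "raw": fields[9].strip(),
        }
    return attrs


def nonzero_watched(attrs, watched=WATCHED):
    """Return {name: raw_count} for watched attributes whose raw count is > 0."""
    result = {}
    for name in watched:
        if name not in attrs:
            continue
        raw_count = int(attrs[name]["raw"].split()[0])  # leading integer only
        if raw_count > 0:
            result[name] = raw_count
    return result


if __name__ == "__main__":
    print(nonzero_watched(parse_smart_attributes(SAMPLE)))
```

Both attributes show a normalized value of 100 against a threshold of 000, so by the drive's own threshold they have not failed; only the raw count of 1 is non-zero.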
Code:
Error 21 occurred at disk power-on lifetime: 848 hours (35 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 65 79 6f 01  Error: UNC at LBA = 0x016f7965 = 24082789
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 d0 30 85 6f 41 00   1d+13:53:18.405  READ FPDMA QUEUED
  60 00 00 30 84 6f 41 00   1d+13:53:18.405  READ FPDMA QUEUED
  60 00 00 30 83 6f 41 00   1d+13:53:18.405  READ FPDMA QUEUED
  60 00 00 30 82 6f 41 00   1d+13:53:18.405  READ FPDMA QUEUED
  60 00 00 30 81 6f 41 00   1d+13:53:18.405  READ FPDMA QUEUED
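As a sanity check on my reading of that log entry, here is a small Python sketch that rebuilds the failing LBA from the register dump. It assumes the classic 28-bit ATA layout (SN = LBA bits 7:0, CL = bits 15:8, CH = bits 23:16, low nibble of DH = bits 27:24) and that bit 6 (0x40) of the error register is the UNC flag; the function name `lba28_from_registers` is my own.

```python
# Sketch under the assumptions above: decode the "registers were" row of the
# smartctl error log entry shown in this post.

UNC_BIT = 0x40  # assumed: bit 6 of the ATA error register = uncorrectable data


def lba28_from_registers(sn, cl, ch, dh):
    """Combine the four address registers into a 28-bit LBA."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# Register values copied from the log: ER ST SC SN CL CH DH
er, st, sc, sn, cl, ch, dh = bytes.fromhex("40 51 00 65 79 6f 01")

lba = lba28_from_registers(sn, cl, ch, dh)
is_unc = bool(er & UNC_BIT)

# Power-on lifetime at the time of the error, split into days and hours.
days, hours = divmod(848, 24)

if __name__ == "__main__":
    print(f"LBA = 0x{lba:08x} = {lba}, UNC = {is_unc}")
    print(f"848 hours = {days} days + {hours} hours")
```

The decoded value agrees with the `0x016f7965 = 24082789` that smartctl prints, so the error is an uncorrectable read at that single sector.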
When I first saw the errors, I checked the Checksum column under Storage -> Pools -> {pool} -> Status and saw that this drive had the only non-zero value. I then ran another scrub, and the Checksum value now shows 0 for every disk in that pool.
The latest scrub status reports as:
Code:
SCRUB Status: FINISHED Errors: 0
When it first started happening, I was receiving the email alert regularly. I have now not received the alert via email or the UI for 5 days. I haven't pulled the drive yet because I was still finishing the burn-in on a replacement drive. The pool lists as HEALTHY and the drive lists as online. The drive is in a 6-disk RAIDZ2 vdev.
So, what's the deal? I imagine I should replace the drive anyway rather than risk it. Seagate is willing to replace it, but with a used, refurbished drive. The current drive has very few power-on hours, though, so it seems a shame to swap it for a used one. Why have the alerts gone away and the checksum count returned to 0? I did upgrade to U2 within the last few days, in case that is relevant.