Hi!
I've been using FreeNAS on a few different boxes for a few years, but I'm running into an issue with my latest build. It's been running FreeNAS 24x7 for two to three years now.
OS Version:
FreeNAS-11.2-U5
(Build Date: Jun 24, 2019 18:41)
Processor & MB:
Intel(R) Core(TM) i5-7400 CPU @ 3.00GHz (4 cores)
ASUS PRIME Z270-K LGA1151
Memory:
32 GiB (4 x 8GB XPG Gammix D10 DDR4 2666MHz (PC4 21300))
SATA Controller:
Ableconn PEX10-SAT 10 Port SATA 6G PCI Express Host Adapter Card - AHCI 6 Gbps SATA III
(the rest are connected directly to the MB)
I can hunt down the rest of the specs as needed, but basically it has 10 x WD Red SATA drives (regular, not Pro) and 5 x Seagate IronWolf drives; all 15 are 4TB.
I have short SMART tests set to run weekly and long tests once a month. The OS runs on 2 x 64GB mirrored thumb drives (Corsair Flash Voyager Vega 64GB).
Lately, I've been getting a lot of email alerts saying a drive is "not capable of SMART self-check." The server will stay up for a few weeks with no issues, and then the alerts suddenly start at random (the timing doesn't line up with the monthly long test). I'll get an alert about one drive, then another the next day, and so on until there are 7 or 8 drives listed in the email. At that point the server can become unresponsive and I can't even SSH in. When I can get in, I'll check the pool status and find either that the pool is degraded and has resilvered, or that one of the 7 or 8 drives has been removed. If I restart the server, it reports the pool online, all drives online, and no issues.
I'll then manually kick off a full long test on the drive that was removed (I noted the gptid and matched it to the drive info), and the drive passes. I am in no way an expert at reading the results, but my understanding is that as long as the value and worst value are above the threshold, I should be good. I've attached a sample report from one of the WD drives and one of the Seagate drives; the most recent drive that was removed is the WD.
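For anyone else trying to read these reports the same way: the rule of thumb above (normalized VALUE and WORST staying above THRESH) can be checked mechanically against the attribute table that `smartctl -A` prints. Here's a minimal sketch of that check; the sample rows below are illustrative made-up values, not taken from my attached reports.

```python
# Sketch: flag SMART attributes whose normalized VALUE or WORST has
# dropped to or below the vendor's failure THRESH. The column layout
# assumed here matches smartctl's attribute table:
# ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE

SAMPLE = """\
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
"""

def failing_attributes(smart_table: str) -> list[str]:
    """Return names of attributes at or below their threshold."""
    failing = []
    for line in smart_table.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and carry at least
        # the VALUE/WORST/THRESH columns.
        if len(fields) < 6 or not fields[0].isdigit():
            continue
        name = fields[1]
        value, worst, thresh = (int(fields[i]) for i in (3, 4, 5))
        if value <= thresh or worst <= thresh:
            failing.append(name)
    return failing

print(failing_attributes(SAMPLE))  # healthy sample -> []
```

Note that a passing table doesn't rule out a flaky cable or controller port, which wouldn't show up in these normalized values at all.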
The WD one seems ok to me. The Seagate one also seems ok, but probably close to giving up on me. The drives are all 2-3 years old.
I guess my question is: why do I keep getting these alerts, and how do I fix whatever the issue is? Is it genuinely that all the drives happen to be going bad so soon, all at similar times?