gregnostic
Dabbler
- Joined: May 19, 2016
- Messages: 17
I'm running FreeNAS 9.10.2 on a SuperMicro 846BE16 chassis with a Xeon E5-2630 v4 in an X10DRi motherboard. An LSI SAS 9211-4i HBA in IT mode connects the chassis' SAS2 backplane to the system.
I was previously running 24 3TB Toshiba disks in four RAID-Z2 vdevs of six disks each. Free space was running low (though still well under the recommended 80% usage threshold), so I decided to upgrade. I swapped in six new 10TB Seagate IronWolf NAS (non-Pro) disks to grow one of the vdevs, using the process described in the FreeNAS manual: offline the old disk, physically swap it, run the replace operation, and let the pool resilver. It may be worth mentioning that the disks are all in 512-byte sector mode (to match the disks they replaced).
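For reference, the commands behind that GUI process look roughly like this (the pool name and gptid labels here are placeholders, not my actual ones):

zpool offline tank gptid/old-disk-label                          # take the old disk out of service
zpool replace tank gptid/old-disk-label gptid/new-disk-label     # after the physical swap
zpool status tank                                                # watch the resilver progress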
Prior to this swap, I hadn't experienced any issues with disk faults. Less than a day after the final disk resilvered, FreeNAS reported that one of the new IronWolf disks had encountered a few dozen read faults and a few dozen write faults, and kicked the drive out of the pool.
I took the disk, connected it to my desktop via a USB dock, and ran some tests with SeaTools. SeaTools reported the drive as fine and didn't find any bad sectors. So I re-inserted the disk, the pool resilvered, and things were normal for a little while. Since then, I've also run long SMART tests on the disks, and they showed no errors.
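In case the details matter, I kicked off the SMART tests from the FreeNAS shell along these lines (the device name is an example; the actual da numbers vary):

smartctl -t long /dev/da0    # start a long self-test on one disk
smartctl -a /dev/da0         # read the results once the test completes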
But despite the drives reporting healthy, a disk will fault roughly once a day. Removing the disk from the machine and reinserting it brings it (and the pool) back to normal. Until the next incident. So far, five of the six disks have reported these faults, though two of them faulted only once, early on. Three of the disks have faulted repeatedly.
None of the other 18 disks in the zpool are experiencing any problems, so I don't suspect a hardware issue with the backplane, SAS cable, HBA, or motherboard/CPU/memory. If any of those were failing, presumably the other disks would be faulting as well.
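The per-device error counters seem to back this up. For anyone who wants specifics, I've been checking with something like this (pool name is a placeholder):

zpool status -v tank     # per-device read/write/checksum error counts
dmesg | grep -i cam      # any CAM/transport errors logged by FreeBSD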
Given that there's no outward indication that anything is wrong with the disks themselves, I'm at a bit of a loss as to what to do from here. Things work perfectly until faults are reported (all the faults seem to arrive at once rather than accumulating over time) and the drive is kicked offline.
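One aside, in case it changes anyone's suggestions: I believe the physical reseat can be done from the shell instead, roughly (pool and device names are placeholders again):

zpool online tank gptid/disk-label    # bring an offlined disk back into the pool
zpool clear tank                      # reset the pool's error counters after a transient fault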
Does anyone have any suggestions for what I might do to either: a) prove the disks are actually faulty; or b) prevent the fault incidents if the disks are fine?
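For (a), the most detailed per-drive data I know how to pull is the extended SMART dump, e.g. (device name is an example):

smartctl -x /dev/da0    # full SMART output, including the device error log and phy event counters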