Multiple new Ironwolf 10TB disks reporting occasional faults in FreeNAS, otherwise appear fine

gregnostic · Dec 10, 2017

I'm running FreeNAS 9.10.2. I've got a SuperMicro 846BE16 chassis with a Xeon E5-2630v4 in an X10DRi motherboard. An LSI SAS 9211-4i in IT mode connects the chassis' SAS2 backplane to the HBA.

I was previously running 24 3TB Toshiba disks in four RAID-Z2 vdevs of six disks each. Free space was running low--but was still well under the recommended 80% usage threshold--so I decided to upgrade. I swapped in six new 10TB Seagate Ironwolf NAS (non-Pro) disks to grow one of the vdevs, using the process described in the FreeNAS manual (take old disk offline, swap disks, replace disk, let zpool resilver). It may be worth mentioning that the disks are all in 512B sector mode (because that's what they replaced).

Prior to this swap, I hadn't experienced any issues with disk faults. Less than a day after the final disk resilvered, FreeNAS reported that one of the new Ironwolf disks had encountered several read and write faults (a few dozen each) and kicked the drive out of the pool.

I took the disk, connected it via USB dock to my desktop, and ran some tests using SeaTools. SeaTools reported the drive as being fine and didn't appear to find any bad sectors. So I re-inserted the disk, the pool resilvered, and things were normal for a little while. Since this occurred, I ran long SMART tests on the disks and there no errors.

But despite the fact that the drives are reporting healthy, about once a day since, a disk will fault. Removing the disk from the machine and reinserting it brings it and the pool back to normal. Until the next incident. So far, five of the six disks have reported these faults, but two of those only one time and early on. Three of the disks have faulted repeatedly.

None of the other 18 disks in the zpool are experiencing any problems. Thus I don't expect this is a hardware issue for the backplane, SAS cable, HBA, or motherboard/CPU/memory. If those were having problems, presumably the other disks would be experiencing faults as well.

Given that there's no outward indication that anything is wrong with the disks themselves, I'm at a bit of a loss as to what to do from here. Things work perfectly until faults are reported with the disks (and all the faults seem to happen at once, they don't collect over time) and the drive is kicked offline.

Does anyone have any suggestions for what I might do to either: a) prove the disks are actually faulty; or b) prevent the fault incidents if the disks are fine?

Chris Moore · Dec 10, 2017

There has actually been some discussion about this:
https://forums.freenas.org/index.ph...h-seagate-10tb-enterprise-st10000nm0016.58251

gregnostic · Dec 10, 2017

Interesting. Good, in a way, to know that I'm not the only one having similar issues. Not so great that there isn't a known resolution other than "just switch to different disks."

Thanks for identifying the connection to the other thread.

Chris Moore · Dec 11, 2017

You might want to contact Seagate about it. The more information they can get, the more users they find are affected, the more likely they are to develop a solution.

gregnostic · Dec 11, 2017

Would that I could. Unfortunately I've now had two incidents where two disks were dropped from the vdev at the same time, leaving me with no redundancy in the vdev. That's a rather more precarious situation than what I'm willing to keep running, considering that three of the disks are going offline regularly.

If there was something more concrete than "hope that Seagate can find and deploy a solution in the immediate future" I might be able to justify keeping it online and debugging the issue. But as it is, I have to consider my available options so I can bring my storage back online in a way that I can trust. (And right now, I definitely don't trust it.)

Chris Moore · Dec 11, 2017

I know that it is not an easy thing, but I think your only solution would be to obtain another set of 10TB disks. I don't think it is possible to go from the 10TB disks you have back to the ones you were using before and I am sure you know that if you lose 1 vdev you lose the pool.
Sorry I can't offer any less painful solution.

iXsystems is using WD drives in the systems they sell.

I have also heard good things about the HGST and they are less expensive than the WD drives:
https://www.newegg.com/Product/Product.aspx?Item=N82E16822146148

No point second guessing it now, but why did you jump all the way to 10TB? Why not 6 or 8TB?

Chris Moore · Dec 11, 2017

gregnostic said:
But as it is, I have to consider my available options so I can bring my storage back online in a way that I can trust.

I did as much searching as I can tonight and I can't find any way of replacing a drive with a smaller drive. It appears to be specifically prohibited by the ZFS design. You can expand, but not contract a vdev. Sorry

gregnostic · Dec 11, 2017

Yeah, I suspect you're right about needing new disks.

As for why 10TB, it came down to the odd math of upgrading disks rather than adding new disks. Because my chassis is full of disks (24 in total), I couldn't just add another vdev; I could only grow an existing one. That means growing the vdev from 3TB disks to 10TB disks gives me roughly 17TiB of additional usable space ((10TB - 3TB) * 0.909TiB/TB * 4/6) whereas upgrading to 6TB disks would only give me 7.2TiB of additional usable space. Given the prices of the available disks, the price per additional usable GiB was roughly equal. And so with all things being (roughly) equal, I chose the option that would have eventually given me the largest overall capacity once I upgraded all the vdevs.

Now, of course, I wish I had chosen a different model of disk or at least a different capacity. But the reasoning at the time seemed sound...

gregnostic · Dec 11, 2017

Chris Moore said:
I did as much searching as I can tonight and I can't find any way of replacing a drive with a smaller drive. It appears to be specifically prohibited by the ZFS design. You can expand, but not contract a vdev. Sorry

Yeah, that's what my research has come up with as well. I honestly don't expect another answer, but I figured I'd ask just in case someone out there happened to know whether it might be possible in the case of very little data being added after the vdev was grown.

This is really just a case of wishful thinking since it could've (potentially) saved me a good deal of money--a large restocking fee at a minimum, or perhaps the full cost of new disks, depending on what direction I plan to head in.

I appreciate your input, even if it wasn't the magical answer I wanted to hear. ;)

Important Announcement for the TrueNAS Community.

Multiple new Ironwolf 10TB disks reporting occasional faults in FreeNAS, otherwise appear fine

gregnostic

Dabbler

Chris Moore

Hall of Famer

gregnostic

Dabbler

Chris Moore

Hall of Famer

gregnostic

Dabbler

Chris Moore

Hall of Famer

Chris Moore

Hall of Famer

gregnostic

Dabbler

gregnostic

Dabbler

Similar threads

Important Announcement for the TrueNAS Community.

Multiple new Ironwolf 10TB disks reporting occasional faults in FreeNAS, otherwise appear fine

Dabbler

Hall of Famer

Dabbler

Hall of Famer

Dabbler

Hall of Famer

Hall of Famer

Dabbler

Dabbler

Important Announcement for the TrueNAS Community.

Related topics on forums.truenas.com for thread: "Multiple new Ironwolf 10TB disks reporting occasional faults in FreeNAS, otherwise appear fine"

Similar threads