Weird drive dropout problem

Status
Not open for further replies.

tomf84

Dabbler
Joined
Apr 4, 2013
Messages
10
Strange one this.

Previously known good solid working box.

4x 1TB SATA drives in RAIDZ1 (one is actually a 2TB drive, as the first step in a long-term plan to move to 4x 2TB)
Gigabyte Atom board with four SATA ports, Gigabit ethernet
4GB RAM, boots off USB stick
This case which has a simple backplane that just passes the SATA connection straight through

Symptoms:
Starting a few weeks ago, when copying large files off the NAS to my desktop via a CIFS share, the transfer would progress normally for about a minute, then almost invariably the speed would drop to zero and stay there. One drive, always in the same "slot" in the array, has its activity light solidly on (and it stays on until reboot). All the other drives have their lights off. While the transfer is frozen, the web UI is unresponsive. A couple of minutes later, the OS seems to give up on the stuck drive, the transfer continues, and the UI comes back to life. "zpool status" shows errors after this. The kernel log (nearly?) always tells me it has lost track of the drive, and the /dev node for the drive disappears until reboot.
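
To take CIFS and the network out of the picture, the same kind of large sequential read can be timed locally on the NAS. This is only a rough sketch, not something from the original troubleshooting: the function name, chunk size and stall threshold are all assumptions, and the file path is whatever large file sits on the pool.

```python
import time


def find_stalls(path, chunk_size=4 * 1024 * 1024, stall_threshold_s=5.0):
    """Read `path` sequentially and return (offset, seconds) for each slow chunk.

    chunk_size and stall_threshold_s are illustrative defaults, not tuned values.
    """
    stalls = []
    offset = 0
    # buffering=0 gives raw reads, so each read() maps to one OS-level read
    with open(path, "rb", buffering=0) as f:
        while True:
            start = time.monotonic()
            chunk = f.read(chunk_size)
            elapsed = time.monotonic() - start
            if not chunk:
                break  # end of file
            if elapsed >= stall_threshold_s:
                stalls.append((offset, elapsed))
            offset += len(chunk)
    return stalls
```

If a call such as find_stalls("/mnt/tank/bigfile") (a hypothetical path) reports multi-second reads at the same moment the activity light sticks, the stall is below the file-sharing layer. Data already in the ARC or page cache will read quickly, so a file larger than RAM gives a more honest result.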

I have:
- Moved the drives around - in all but one of the perhaps 20 or 30 times I have seen this issue while troubleshooting, the problem does NOT follow the drive.
- Swapped out the SATA cables
- Swapped out the power supply
- Tried with the onboard SATA ports and also an Adaptec PCI card - so it's not the port itself - which is very weird.
- Run Memtest86 - it passes cleanly
- Checked temperatures - all about 35C
- Checked SMART data - nothing nasty.
- Tried the drive in the "sticky" SATA port on its own power cable and SATA connection, separate from the case's own backplane

No difference.

Here's the weird bit.
I knocked up a quick tool to do heavy random reads on all four drives under Ubuntu, and the sticky drive thing isn't reproducible under that OS.
Is it worth installing the ZFS bits into Ubuntu and importing the array to see if the problem replicates under a different OS?
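
For anyone wanting to try the same test, here is a minimal sketch of that kind of random-read tool. It is not the actual tool used above: the function names, block size and duration are assumptions, and it simply reads random aligned blocks from each device in its own thread and records the slowest single read.

```python
import os
import random
import threading
import time


def device_size(path):
    """Return the size in bytes of a file or (on Linux) a block device."""
    with open(path, "rb") as f:
        return f.seek(0, os.SEEK_END)


def hammer(path, duration_s, results, block_size):
    """Read random blocks from `path` for `duration_s` seconds."""
    size = device_size(path)
    last_block = max(size // block_size - 1, 0)
    reads = 0
    slowest = 0.0
    deadline = time.monotonic() + duration_s
    with open(path, "rb", buffering=0) as f:
        while time.monotonic() < deadline:
            offset = random.randint(0, last_block) * block_size
            start = time.monotonic()
            f.seek(offset)
            f.read(block_size)
            slowest = max(slowest, time.monotonic() - start)
            reads += 1
    # Store (number of reads, slowest single read in seconds) for this device
    results[path] = (reads, slowest)


def stress(paths, duration_s=60, block_size=1024 * 1024):
    """Hammer all `paths` concurrently, one thread per device."""
    results = {}
    threads = [
        threading.Thread(target=hammer, args=(p, duration_s, results, block_size))
        for p in paths
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Something like stress(["/dev/sda", "/dev/sdb", "/dev/sdc", "/dev/sdd"], duration_s=600) would exercise all four drives; those device names are placeholders, and reading raw devices needs root. A stuck drive would show up as one device with a very large slowest-read time or a far lower read count than the others. Note that this only reads, goes through the page cache, and does not generate the same access pattern as ZFS, which may be part of why the problem did not reproduce.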

I have a backup.

Thoughts really appreciated.
 

tomf84

Dabbler
Joined
Apr 4, 2013
Messages
10
As a result of this issue I have abandoned FreeNAS and moved to a vanilla Ubuntu install with ZFS-on-Linux. It works flawlessly, with better performance to boot.

I speculate that something was introduced at some point in FreeBSD that doesn't agree with my hardware.

For historical reference, this is with a Gigabyte GA-D510UD mini-ITX Atom board.
 