tomf84
Dabbler
- Joined
- Apr 4, 2013
- Messages
- 10
Strange one this.
Previously known good solid working box.
4x 1TB SATA drives (actually one is a 2TB since long term plan is to go to 4x2TB and that's the first step) in RAIDZ1
Gigabyte Atom board with four SATA ports, Gigabit ethernet
4GB RAM, boots off USB stick
The case has a simple backplane that just passes the SATA connection straight through
Symptoms:
Starting a few weeks ago, when copying large files off the NAS to my desktop via a CIFS share, the transfer progresses as normal for about a minute, then almost invariably the speed drops to zero and stays there. One drive, always in the same "slot" in the array, has its activity light solidly on (and it stays on until reboot). All other drives have their lights off. While it's frozen, the web UI isn't responsive. A couple of minutes later, the OS seems to give up on the stuck drive, the transfer continues, and the UI comes back to life. "zpool status" shows errors after this. The kernel log (nearly?) always tells me it has lost track of the drive, and the /dev node for the drive disappears until reboot.
I have
- Moved the drives around - in all but one of the perhaps 20 or 30 occurrences I've seen while troubleshooting, the problem does NOT follow the drive.
- Swapped out the SATA cables
- Swapped out the power supply
- Tried with the onboard SATA ports and also an Adaptec PCI card - so it's not the port itself - which is very weird.
- Ran Memtest86 - passes cleanly
- Checked temperatures - all about 35C
- Checked SMART data - nothing nasty.
- Tried the drive in the "sticky" slot on its own power cable and SATA connection, separate from the case's backplane
No difference.
Here's the weird bit.
I knocked up a quick tool to do heavy random reads on all four drives under Ubuntu, and the sticky drive thing isn't reproducible under that OS.
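For anyone wanting to run the same kind of test, here is a rough Python sketch of a random-read hammer. It is not the exact tool used above, just an illustration of the idea: the function names (random_reads, hammer), the block size and the read count are all made up for the example, and it assumes a Unix-like OS where os.pread is available.

```python
import os
import random
import threading


def random_reads(path, block_size=4096, count=1000, seed=None):
    """Read `count` blocks at random block-aligned offsets; return total bytes read."""
    rng = random.Random(seed)
    fd = os.open(path, os.O_RDONLY)
    try:
        # Seeking to the end gives the size for regular files and block devices alike.
        size = os.lseek(fd, 0, os.SEEK_END)
        blocks = size // block_size
        if blocks == 0:
            return 0
        total = 0
        for _ in range(count):
            offset = rng.randrange(blocks) * block_size
            total += len(os.pread(fd, block_size, offset))
        return total
    finally:
        os.close(fd)


def hammer(paths, **kwargs):
    """Run random_reads against every path at once, one thread per drive."""
    results = {}

    def worker(p):
        results[p] = random_reads(p, **kwargs)

    threads = [threading.Thread(target=worker, args=(p,)) for p in paths]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

It would be called with something like hammer(["/dev/sda", "/dev/sdb", "/dev/sdc", "/dev/sdd"], count=100000), run as root; those device names are placeholders. Note it goes through the page cache rather than using direct I/O, which should matter little for random reads spread over whole 1TB drives, but it is one way this differs from what ZFS does to the disks.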
Is it worth installing the ZFS bits into Ubuntu and importing the array to see if the problem replicates under a different OS?
I have a backup.
Thoughts really appreciated.