Drives keep detaching for seconds...

mka · Jun 26, 2018

Hi, a few month ago I first noticed a problem with my freenas 11.1 setup.

FreeNAS 11.1-U5 -> Intel S1200V3RPS - 32GB ECC Ram - Intel Pentium G3220

raid-z2: 3x WD Red pro 10TB, 2x WD Red 10TB, 1x WD Red 6TB
mirror: 2x Crucial MX500 256GB

While the system was under I/O load, one could hear a sound like one hard drive would power down ...and then come back instantly. This has become an increasing problem with my system during that last months. It is always the same... the moment it happens I get two emails:

Code:

The volume tank0 state is DEGRADED: One or more devices has been removed by the administrator. Sufficient replicas exist for the pool to continue functioning in a degraded state.

Code:

The volume tank0 state is ONLINE: One or more devices is currently being resilvered. The pool will continue to function, possibly in a degraded state.

When I check the pool status it says some like (resilvering in one minute):

Code:

  pool: tank0
 state: ONLINE
  scan: resilvered 208K in 0 days 00:00:01 with 0 errors on Fri Jun 22 08:10:46 2018
config:

		NAME											STATE	 READ WRITE CKSUM
		tank0										   ONLINE	   0	 0	 0
		  raidz2-0									  ONLINE	   0	 0	 0
			ada2p2									  ONLINE	   0	 0	 0
			gptid/4f220877-2798-11e5-a823-001e6758f3be  ONLINE	   0	 0	 0
			gptid/cd53c072-6ec4-11e8-a70f-001e6758f3be  ONLINE	   0	 0	 0
			gptid/fa6b1cfd-63a4-11e7-a4ab-001e6758f3be  ONLINE	   0	 0	 0
			ada6p2									  ONLINE	   0	 0	 0
			gptid/e4cc4248-eaf6-11e7-98bf-001e6758f3be  ONLINE	   0	 0	 0
		logs
		  gptid/91f29e9c-a8ba-11e4-9b66-001e6758f3be	ONLINE	   0	 0	 0

errors: No known data errors

And interestingly ada2p2 and ada6p2 are not using gptid. This also used to be different a few months back...

The message log looks like this (~% sudo tail -F /var/log/messages)

Code:

Jun 15 08:33:56 nas ada5 at ahcich11 bus 0 scbus11 target 0 lun 0
Jun 15 08:33:56 nas ada5: <WDC WD101KFBX-68R56N0 83.H0A03> s/n 7JH1GZ8C detached
Jun 15 08:33:56 nas GEOM_MIRROR: Device swap1: provider ada5p1 disconnected.
Jun 15 08:33:56 nas (ada5:ahcich11:0:0:0): Periph destroyed
Jun 15 08:33:57 nas ZFS: vdev state changed, pool_guid=1627826801526537155 vdev_guid=12388419720507868321
Jun 15 08:34:07 nas ada5 at ahcich11 bus 0 scbus11 target 0 lun 0
Jun 15 08:34:07 nas ada5: <WDC WD101KFBX-68R56N0 83.H0A03> ACS-2 ATA SATA 3.x device
Jun 15 08:34:07 nas ada5: Serial Number 7JH1GZ8C
Jun 15 08:34:07 nas ada5: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
Jun 15 08:34:07 nas ada5: Command Queueing enabled
Jun 15 08:34:07 nas ada5: 9537536MB (19532873728 512 byte sectors)
Jun 15 08:34:08 nas ZFS: vdev state changed, pool_guid=1627826801526537155 vdev_guid=114792066075657344
Jun 15 08:34:08 nas ZFS: vdev state changed, pool_guid=1627826801526537155 vdev_guid=15860307608178835282
Jun 15 08:34:09 nas ZFS: vdev state changed, pool_guid=1627826801526537155 vdev_guid=11885823204073833750
Jun 15 08:34:09 nas ZFS: vdev state changed, pool_guid=1627826801526537155 vdev_guid=12388419720507868321
Jun 15 08:34:09 nas ZFS: vdev state changed, pool_guid=1627826801526537155 vdev_guid=9980259974459671096
Jun 15 08:34:09 nas ZFS: vdev state changed, pool_guid=1627826801526537155 vdev_guid=3171397022653890544

It's always a different disk (ada5 is an example here). And I seems I'm loosing swap paritions... everytime this happens to a drive, the next day I get a warning mail like this:

Code:

Checking status of gmirror(8) devices:

		Name	Status  Components

mirror/swap0  DEGRADED  ada7p1 (ACTIVE)

mirror/swap1  COMPLETE  ada5p1 (ACTIVE)

						ada4p1 (ACTIVE)

mirror/swap2  COMPLETE  ada3p1 (ACTIVE)

						ada2p1 (ACTIVE)


-- End of daily output --

So swap0 seems to be missing it's mirror. But over time this escalated to this...

Code:

Checking status of gmirror(8) devices:

		Name	Status  Components

mirror/swap0  DEGRADED  ada7p1 (ACTIVE)

mirror/swap2  COMPLETE  ada3p1 (ACTIVE)

						ada2p1 (ACTIVE)

I'm now missing swap1 altogether and swap0 is degraded.

I had been doing the FreeNAS 11 updates as they arrive. And I've been upgrading from Western Digital RED 6TB to Western Digital RED (Pro) 10TB. Every 3-6months I get a new harddrive and replace the oldest drive in my pool. I first thought the PSU can't handle the new drives, but the new one are actually specified with lower power consumption. Then I checked the power cords and made sure every hard drive had a direct connection to the PSU. But the problem remained...

Does anyone has an idea how to proceed? Why keep my drives detaching and why am I also losing swap partitions (although the pool itself is healthy)

Thank you

Chris Moore · Jun 26, 2018

mka said:
Hi, a few month ago I first noticed a problem with my freenas 11.1 setup.

FreeNAS 11.1-U5 -> Intel S1200V3RPS - 32GB ECC Ram - Intel Pentium G3220

raid-z2: 3x WD Red pro 10TB, 2x WD Red 10TB, 1x WD Red 6TB

mirror: 2x Crucial MX500 256GB

What drive controller are you using? It looks like that board has six SATA ports but this is 8 drives, unless I missed it, you didn't say how they are connected.

mka said:
I had been doing the FreeNAS 11 updates as they arrive. And I've been upgrading from western digital RED 6TB to Western Digital RED (Pro) 10TB. Every 3-6months I get a new harddrive and replace the oldest drive in my pool. I first thought the PSU can't handle the new drives, but the new one are actually specified with lower power consumption. Then I checked the power cords and made sure every hard drive had a direct connection to the PSU. But the problem remained...

How long has the power supply been in service, and what kind is it? Even if it has been working fine for the last ten years, it could be getting tired.

mka said:
Does anyone has an idea how to proceed? Why keep my drives detaching and why am I also losing swap partitions (although the pool itself is healthy)

The problems you are describing indicate that the drive is dropping communication with the system for some time and then, when communication is restored, the system is automatically reslivering the pool, but swap does not automatically recover.
Are you doing any power saving settings to try and make the system sleep?

pro lamer · Jun 26, 2018

Is it possible the disk heads park due to the short (by default) idle timeout... https://forums.freenas.org/index.php?posts/98858

Chris Moore · Jun 26, 2018

pro lamer said:
Is it possible the disk heads park due to the short (by default) idle timeout... https://forums.freenas.org/index.php?posts/98858

He is using NAS drives (Red and Red Pro) it should not have that problem in the disk hardware, but it is possible to have reconfigured FreeNAS.

pro lamer · Jun 26, 2018

Chris Moore said:
Red and Red Pro) it should not have that problem

I haven't checked if mine are setup to 8 seconds by default yet (and cannot promise to do it this month) but the thread I referred to, mentioned "WD Greens (and Reds)" literally.

Chris Moore · Jun 26, 2018

pro lamer said:
I haven't checked if mine are setup to 8 seconds by default yet (and cannot promise to do it this month) but the thread I referred to, mentioned "WD Greens (and Reds)" literally.

First, the article was written in 2014 and some things have changed. You can still find WD Green drives out in the world, but he product line has been merged with the WD Blue drives and the same 'parking problem' exists with the Blue drives.
Second, the author is talking about the difference between the WD Green and the WD Red, not saying that they both suffer from the same malady. It is the Green / Blue drives that park their heads automatically, not the Red drives. Red and Red Pro drives are designed to run 24/7 in a NAS environment.

Chris Moore · Jun 26, 2018

pro lamer said:
I haven't checked if mine are setup to 8 seconds by default yet (and cannot promise to do it this month) but the thread I referred to, mentioned "WD Greens (and Reds)" literally.

PS. I have 80 of the WD Red drives and 60 of the WD Red Pro drives in servers at work. They don't park their heads or spin down.

cyberjock said:
If you've been an avid reader of the forums it appears that WD has potentially dropped the ball on WD Reds recently. A few users have reported that their WD Reds have an aggressive head parking setting and it was corrected by using the WDIDLE3 tool. Of course, WD doesn't seem to be acknowledging or denying anything(no surprises there!) but it is cause for concern.

I think the time for this concern is historical, not current.

mka · Jun 27, 2018

Thanks for the reply.

The PSU is a Enermax Triathlor 450W about 4 years old. The consumption is typically around 60W. It is specified with 20A on 5V&12V. But the problem doesn't seem to be correlated with spin up times. The drive standby is configured as 60min but they usually don't spin down (only tested by hearing).

I only checked total power consumption. But maybe the new drives feature a different peak power characteristic.

mka · Jun 27, 2018

The tank0 with the HDDs is connected to the Intel onboard SATA controller.

The tank1 with ssds uses a small Digitus DS 30104 PCIe sata controller.

Chris Moore · Jun 27, 2018

mka said:
The drive standby is configured as 60min but they usually don't spin down (only tested by hearing).

So you have attempted to create a power management scheme for the drives so that they will go to 'standby'. This could be a problem of your own making. When FreeNAS wants data, it wants it immediately and if the drive has gone to sleep, and doesn't respond quickly enough, FreeNAS will see that as a device fault.

mka · Jul 3, 2018

Ok. I'll test that. I've used the same schema for more than 6 years without any problems and configured using the FreeNAS UI. But my problems don't appear in classical standby situations… But rather high load situations. For example I'm working on the NAS for a couple of minutes and the drives are not in standby anymore. I tar a big directory with 10000 files and one drive starts detaching.

I've just ordered a new PSU to test that area. Since the problem increased with every new drive I added. The new PSU is more powerful on 12V and should arrive within the next couple of hours.

mka · Jul 3, 2018

Disabling the standby measures didn't effect the detaching issue I had.

kdragon75 · Jul 3, 2018

Try a different PSU. Enermax are not exactly high end and if it has a bad (they all go bad) electrolytic capacitor you could lose efficiency and/or introduce riple into the feed. You could hook up an oscilloscope and monitor your 12v and 5v rails but its probably cheaper and easier to just swap the PSU

mka · Jul 3, 2018

I already have. In about 20minutes it is installed and I will test any correlation with the issues I observed.

mka · Jul 3, 2018

It's a bit too early to say, but it looks like the issue has disappeared with the new PSU. Earlier I was able to force it quite easily… but not anymore. I've been testing for 2 hours and everything seems to work as expected.

Fingers crossed!

mka · Jul 7, 2018

The problems have officially disappeared ... thank you all.

toadman · Jul 8, 2018

Glad you fixed it!

An observation and a curious question. I noticed in your setup you said:

raid-z2: 3x WD Red pro 10TB, 2x WD Red 10TB, 1x WD Red 6TB

So that's effectively 4 x 6TB = 24TB of space in the pool. Correct? (2 disks of parity, but the pool only uses 6 TB of space per disk due to the smallest disk being 6TB)

If so, I'm curious why you didn't go with 5 disks of 10TB? That would give you 3 x 10 = 30TB of space in that pool with less disks. (Though as it stands I suppose you can quickly go from 24TB to 40TB by just replacing that 6TB when the need arises.)

Chris Moore · Jul 9, 2018

toadman said:
(Though as it stands I suppose you can quickly go from 24TB to 40TB by just replacing that 6TB when the need arises.)

Because of the statement about the PSU being 4 years old:

mka said:
The PSU is a Enermax Triathlor 450W about 4 years old.

I am guessing that the OP originally had 6TB drives in the system when it was built and has been upgrading to 10TB drives. Once all the drives are upgraded, he will be able to access all the space. Also, there is this:

mka said:
I've used the same schema for more than 6 years without any problems

Purely a guess on my part, but the OP may have been using FreeNAS long enough that they have rotated through 2 (or more) sets of drives. I am on my third full set of drives, not counting the ones I have bought as replacements. I started with 1TB drives, moved to 2TB drives and last year I ordered a full box of 25 drives, so I could upgrade the drives in each of my system from 2TB to 4TB. Here is a photo I took before I unwrapped some of them:

mka · Jul 10, 2018

correct. The original zpool was a RaidZ2 with 6x 2TB Green drives. It went to 4TB RED Drives... and so on. Since many years I replace one hard drive every 3-6 months.

I will buy the last 10TB drive soon.

Important Announcement for the TrueNAS Community.

Drives keep detaching for seconds...

Contributor

Hall of Famer

Guru

Hall of Famer

Guru

Hall of Famer

Hall of Famer

Contributor

Contributor

Hall of Famer

Contributor

Contributor

Wizard

Contributor

Contributor

Contributor

Guru

Hall of Famer

Contributor

Similar threads