mka
Contributor
Hi, a few months ago I first noticed a problem with my FreeNAS 11.1 setup.
FreeNAS 11.1-U5 -> Intel S1200V3RPS - 32GB ECC RAM - Intel Pentium G3220
- RAID-Z2: 3x WD Red Pro 10TB, 2x WD Red 10TB, 1x WD Red 6TB
- mirror: 2x Crucial MX500 256GB
While the system was under I/O load, one could hear a sound as if a hard drive powered down... and then came back instantly. This has become an increasing problem over the last months. It is always the same: the moment it happens, I get two emails:
Code:
The volume tank0 state is DEGRADED: One or more devices has been removed by the administrator. Sufficient replicas exist for the pool to continue functioning in a degraded state.
Code:
The volume tank0 state is ONLINE: One or more devices is currently being resilvered. The pool will continue to function, possibly in a degraded state.
When I check the pool status, it reports something like this (the resilver completes within seconds):
Code:
  pool: tank0
 state: ONLINE
  scan: resilvered 208K in 0 days 00:00:01 with 0 errors on Fri Jun 22 08:10:46 2018
config:

        NAME                                            STATE     READ WRITE CKSUM
        tank0                                           ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            ada2p2                                      ONLINE       0     0     0
            gptid/4f220877-2798-11e5-a823-001e6758f3be  ONLINE       0     0     0
            gptid/cd53c072-6ec4-11e8-a70f-001e6758f3be  ONLINE       0     0     0
            gptid/fa6b1cfd-63a4-11e7-a4ab-001e6758f3be  ONLINE       0     0     0
            ada6p2                                      ONLINE       0     0     0
            gptid/e4cc4248-eaf6-11e7-98bf-001e6758f3be  ONLINE       0     0     0
        logs
          gptid/91f29e9c-a8ba-11e4-9b66-001e6758f3be    ONLINE       0     0     0

errors: No known data errors
Interestingly, ada2p2 and ada6p2 are not listed by gptid. This, too, was different a few months back...
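As far as I know, glabel(8) shows which gptid label maps to which partition, so the plain adaXp2 entries can be cross-checked against the rest:
Code:
# map gptid labels to their backing partitions (gptid/<uuid> -> adaXpY)
glabel status | grep gptid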
The message log looks like this (~% sudo tail -F /var/log/messages):
Code:
Jun 15 08:33:56 nas ada5 at ahcich11 bus 0 scbus11 target 0 lun 0
Jun 15 08:33:56 nas ada5: <WDC WD101KFBX-68R56N0 83.H0A03> s/n 7JH1GZ8C detached
Jun 15 08:33:56 nas GEOM_MIRROR: Device swap1: provider ada5p1 disconnected.
Jun 15 08:33:56 nas (ada5:ahcich11:0:0:0): Periph destroyed
Jun 15 08:33:57 nas ZFS: vdev state changed, pool_guid=1627826801526537155 vdev_guid=12388419720507868321
Jun 15 08:34:07 nas ada5 at ahcich11 bus 0 scbus11 target 0 lun 0
Jun 15 08:34:07 nas ada5: <WDC WD101KFBX-68R56N0 83.H0A03> ACS-2 ATA SATA 3.x device
Jun 15 08:34:07 nas ada5: Serial Number 7JH1GZ8C
Jun 15 08:34:07 nas ada5: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
Jun 15 08:34:07 nas ada5: Command Queueing enabled
Jun 15 08:34:07 nas ada5: 9537536MB (19532873728 512 byte sectors)
Jun 15 08:34:08 nas ZFS: vdev state changed, pool_guid=1627826801526537155 vdev_guid=114792066075657344
Jun 15 08:34:08 nas ZFS: vdev state changed, pool_guid=1627826801526537155 vdev_guid=15860307608178835282
Jun 15 08:34:09 nas ZFS: vdev state changed, pool_guid=1627826801526537155 vdev_guid=11885823204073833750
Jun 15 08:34:09 nas ZFS: vdev state changed, pool_guid=1627826801526537155 vdev_guid=12388419720507868321
Jun 15 08:34:09 nas ZFS: vdev state changed, pool_guid=1627826801526537155 vdev_guid=9980259974459671096
Jun 15 08:34:09 nas ZFS: vdev state changed, pool_guid=1627826801526537155 vdev_guid=3171397022653890544
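The same pattern repeats in the log for other drives; the detach events can be pulled out directly, and each line includes the device name and serial number of the drive that dropped:
Code:
# list every drive-detach event still in the current log
# (/var/log/messages rotates, so older events sit in the rotated files)
grep detached /var/log/messages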
It's always a different disk (ada5 is just an example here). And it seems I'm losing swap partitions... every time this happens to a drive, the next day I get a warning mail like this:
Code:
Checking status of gmirror(8) devices:
        Name    Status  Components
mirror/swap0  DEGRADED  ada7p1 (ACTIVE)
mirror/swap1  COMPLETE  ada5p1 (ACTIVE)
                        ada4p1 (ACTIVE)
mirror/swap2  COMPLETE  ada3p1 (ACTIVE)
                        ada2p1 (ACTIVE)

-- End of daily output --
So swap0 seems to be missing its mirror. But over time this escalated to this...
Code:
Checking status of gmirror(8) devices:
        Name    Status  Components
mirror/swap0  DEGRADED  ada7p1 (ACTIVE)
mirror/swap2  COMPLETE  ada3p1 (ACTIVE)
                        ada2p1 (ACTIVE)
I'm now missing swap1 altogether and swap0 is degraded.
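From the gmirror(8) man page I would guess the degraded mirror can be repaired by hand, roughly like the sketch below (ada5p1 is just a placeholder; the right partition depends on which disk dropped). Is that safe on FreeNAS, or is there a proper way to rebuild the swap mirrors?
Code:
# sketch only -- device name is a placeholder, check `gmirror status` first
sudo gmirror forget swap0              # drop the stale, disconnected component
sudo gmirror insert swap0 /dev/ada5p1  # re-add the partition to the mirror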
I have been installing the FreeNAS 11 updates as they arrive. I have also been upgrading from Western Digital Red 6TB to Western Digital Red (Pro) 10TB drives: every 3-6 months I get a new hard drive and replace the oldest drive in my pool. I first thought the PSU couldn't handle the new drives, but the new ones are actually specified with lower power consumption. Then I checked the power cords and made sure every hard drive had a direct connection to the PSU. But the problem remained...
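Would SMART data help narrow this down? After an event I could run something like this on the drive that dropped (ada5 as the example again):
Code:
# full SMART report for the drive that last detached;
# UDMA_CRC_Error_Count hints at cabling problems,
# Power-Off_Retract_Count at unexpected power loss
sudo smartctl -a /dev/ada5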
Does anyone have an idea how to proceed? Why do my drives keep detaching, and why am I also losing swap partitions (even though the pool itself stays healthy)?
Thank you