SOLVED Detached drive causes system log to fill and system to become unresponsive

NASbox

Guru
Joined
May 8, 2012
Messages
650
I upgraded from 11.1-U7 to 11.2-U7 and my log started to fill with these messages:

Dec 7 14:15:44 freenas ZFS: vdev state changed, pool_guid=1487853121776782679 vdev_guid=12501387109407037534

and the system became unresponsive and had to be power cycled.

How do I identify what pool/drive pool_guid=1487853121776782679 and vdev_guid=12501387109407037534 relate to?
(Converting the numbers, which appear to be decimal, to hex didn't yield anything obvious.)
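The closest I could think of (a rough sketch, assuming zpool status supports the -g flag on this release and zdb is available) would be to compare the GUIDs against zpool/zdb output:
Code:
# List the GUID of every imported pool -- compare against pool_guid
zpool get guid

# Show vdev GUIDs in place of device names -- compare against vdev_guid
zpool status -g

# Or dump the ZFS label directly off a suspect device
zdb -l /dev/gptid/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx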

What state changed?

When I reverted to 11.1-U7, the problem disappeared. Any suggestions?
 

G8One2

Patron
Joined
Jan 2, 2017
Messages
248
I get those when a scrub is initiated on the pool. They aren't really errors, rather informational. The system becoming unresponsive is likely a different issue.
 

NASbox

Guru
Joined
May 8, 2012
Messages
650
I get those when a scrub is initiated on the pool. They aren't really errors, rather informational. The system becoming unresponsive is likely a different issue.
In my case I am getting several hundred of them -- the system becomes totally unresponsive.
 

dlavigne

Guest
Something else is going on. What are the hardware specs? Any issues with SMART tests? Or zpool status?
 

NASbox

Guru
Joined
May 8, 2012
Messages
650
Something else is going on. What are the hardware specs? Any issues with SMART tests? Or zpool status?
Hardware specs - i7-2600 / 32GB DDR3 RAM - Idle system.

Thanks Dru. I thought I had checked, but I had only run zpool status on 11.1-U7. To answer your question specifically, I took a chance, rebooted into 11.2-U7, and repeated zpool status:
Code:
#>zpool status
  pool: BACKUP02
state: UNAVAIL
status: One or more devices could not be opened.  There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-3C
  scan: none requested
config:

        NAME                    STATE     READ WRITE CKSUM
        BACKUP02                UNAVAIL      0     0     0
          12501387109407037534  UNAVAIL      0     0     0  was /dev/gptid/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx

It turns out I had pulled a backup drive without properly exporting its pool. This pool consists of a single-disk vdev, and it is only mounted/accessed by scripts from the command line. As soon as I put the disk back in, the messages stopped. I then properly remounted and immediately exported BACKUP02, and AFAIK the system is now stable and functioning correctly.
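Roughly, the cleanup once the disk was physically back in place looked like this (a sketch from memory, not an exact transcript):
Code:
# Re-import the pool now that its disk is present again
zpool import BACKUP02

# Confirm the single-disk vdev shows ONLINE
zpool status BACKUP02

# Cleanly export it so the drive can be pulled without leaving a stale pool behind
zpool export BACKUP02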

I am considering this matter resolved and attributing it to sloppy procedures.

My situation was not the normal use case, but it is very similar to what would happen if a single-drive pool failed at boot, something that could happen in actual use.

@dlavigne For the benefit of the community, I decided to simulate this situation:
  • Using 11.2-U7, connected BACKUP02
  • Powered off the system using the GUI
  • Disconnected the disk (hot swap mount)
  • Powered the system back on
  • Tailed the messages on /var/log/messages
  • Reconnected the disk; the messages stopped (see below)
Code:
Dec 11 00:56:30 freenas ZFS: vdev state changed, pool_guid=1487853121776782679 vdev_guid=12501387109407037534
Dec 11 00:56:30 freenas ZFS: vdev state changed, pool_guid=1487853121776782679 vdev_guid=12501387109407037534
Dec 11 00:56:30 freenas ada3 at ahcich5 bus 0 scbus6 target 0 lun 0
Dec 11 00:56:30 freenas ada3: <WDC WD100EFAX-68LHPN0 83.H0A83> ACS-2 ATA SATA 3.x device
Dec 11 00:56:30 freenas ada3: Serial Number XXXXXXXX
Dec 11 00:56:30 freenas ada3: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
Dec 11 00:56:30 freenas ada3: Command Queueing enabled
Dec 11 00:56:30 freenas ada3: 9537536MB (19532873728 512 byte sectors)
Dec 11 00:56:31 freenas ZFS: vdev state changed, pool_guid=1487853121776782679 vdev_guid=12501387109407037534
Dec 11 00:56:31 freenas ZFS: vdev state changed, pool_guid=1487853121776782679 vdev_guid=12501387109407037534
Dec 11 00:56:32 freenas ZFS: vdev state changed, pool_guid=1487853121776782679 vdev_guid=12501387109407037534
Dec 11 00:56:32 freenas ZFS: vdev state changed, pool_guid=1487853121776782679 vdev_guid=12501387109407037534

@dlavigne I don't know if you want to refer this to the development team, but I would think the frequency of this error message (65 messages/second on my system) should be reduced, and something should be done so that this condition doesn't cause the system to become unresponsive (based on the log messages, I think it took about half an hour for that to happen).
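(The messages-per-second figure is only a rough count; something like this shows the rate, assuming the default /var/log/messages location:)
Code:
# Count "vdev state changed" entries per log timestamp (fields 1-3 are the date/time)
grep 'vdev state changed' /var/log/messages | awk '{print $1, $2, $3}' | uniq -c | sort -rn | head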
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
@dlavigne I don't know if you want to refer this to the development team, but I would think the frequency of this error message (65 messages/second on my system) should be reduced, and something should be done so that this condition doesn't cause the system to become unresponsive (based on the log messages, I think it took about half an hour for that to happen).

It can't be a problem in FreeNAS, since you imported/created/mounted that pool at the CLI (not supported). It's more likely something that might be corrected in FreeBSD or ZFS.

If a pool isn't able to mount at boot, it shouldn't show up in zpool status. Pools managed with the GUI behave like that, so your scenario shouldn't happen during supported use of FreeNAS.
 

NASbox

Guru
Joined
May 8, 2012
Messages
650
It can't be a problem in FreeNAS, since you imported/created/mounted that pool at the CLI (not supported). It's more likely something that might be corrected in FreeBSD or ZFS.

If a pool isn't able to mount at boot, it shouldn't show up in zpool status. Pools managed with the GUI behave like that, so your scenario shouldn't happen during supported use of FreeNAS.
I just put it out there in case anyone cares. The pool was created using the GUI, but I subsequently took it over using the command line.

If someone has a test system they aren't worried about, it would be a good idea to repeat this test with the GUI: create a single-disk pool, power off, remove the SATA cable, and reboot. See if it causes the system to crash. If it is a FreeBSD/ZFS issue, then someone with some cred can file a bug report as appropriate.
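Something along these lines (assuming the stock /var/log/messages location) should be enough to watch for the flood after the reboot:
Code:
# Watch for the flood in real time after rebooting with the disk disconnected
tail -f /var/log/messages | grep 'vdev state changed'

# Or just count how many have accumulated so far
grep -c 'vdev state changed' /var/log/messages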
 