Thibaut
Hello, this is somewhat related to this other thread (89487), but I think it deserves a separate discussion since it pinpoints an error occurring in the TrueNAS interface.
I'm using TrueNAS-12.0-U4, but the problem has also occurred on earlier v12 versions; I can't remember whether it happened under v11 as well, but I think it did.
The situation:
A drive fails in a raidz pool and the system issues an alert indicating that it has to be replaced.
The failed hard disk is shown as UNAVAILABLE in the TrueNAS interface:
The failed hard disk is taken offline using the menu, and the action is confirmed in the dialog that pops up:
The failed hard disk is then taken out of the server and physically replaced with a new hard disk, which then appears in the Storage > Disks list:
Finally, the reference to the failed disk is replaced in the pool's disks list:
The replacement process starts:
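For reference, the same workflow can be done entirely from the shell with `zpool offline` and `zpool replace`. Here is a minimal sketch that only builds the equivalent commands (the pool and device names are placeholders, not my actual ones); review them before running anything:

```python
def replacement_commands(pool, failed_dev, new_dev):
    """Return the zpool commands equivalent to the UI steps above.
    All names are placeholders; run them manually and at your own risk."""
    return [
        ["zpool", "offline", pool, failed_dev],           # UI: Offline action
        # (physically swap the disk at this point)
        ["zpool", "replace", pool, failed_dev, new_dev],  # UI: Replace action
        ["zpool", "status", pool],                        # watch the resilver
    ]

for cmd in replacement_commands("mypool", "gptid/failed-disk", "da8"):
    print(" ".join(cmd))
```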
The problem:
After a moment, an error is displayed:
The full content of the error report is:
Code:
Error: concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/zfs.py", line 277, in replace
    target.replace(newvdev)
  File "libzfs.pyx", line 391, in libzfs.ZFS.__exit__
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/zfs.py", line 277, in replace
    target.replace(newvdev)
  File "libzfs.pyx", line 2060, in libzfs.ZFSVdev.replace
libzfs.ZFSException: no such pool or dataset

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/concurrent/futures/process.py", line 243, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/worker.py", line 94, in main_worker
    res = MIDDLEWARE._run(*call_args)
  File "/usr/local/lib/python3.9/site-packages/middlewared/worker.py", line 45, in _run
    return self._call(name, serviceobj, methodobj, args, job=job)
  File "/usr/local/lib/python3.9/site-packages/middlewared/worker.py", line 39, in _call
    return methodobj(*params)
  File "/usr/local/lib/python3.9/site-packages/middlewared/worker.py", line 39, in _call
    return methodobj(*params)
  File "/usr/local/lib/python3.9/site-packages/middlewared/schema.py", line 977, in nf
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/zfs.py", line 279, in replace
    raise CallError(str(e), e.code)
middlewared.service_exception.CallError: [EZFS_NOENT] no such pool or dataset
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/middlewared/job.py", line 367, in run
    await self.future
  File "/usr/local/lib/python3.9/site-packages/middlewared/job.py", line 403, in __run_body
    rv = await self.method(*([self] + args))
  File "/usr/local/lib/python3.9/site-packages/middlewared/schema.py", line 973, in nf
    return await f(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/pool_/replace_disk.py", line 122, in replace
    raise e
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/pool_/replace_disk.py", line 102, in replace
    await self.middleware.call(
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1241, in call
    return await self._call(
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1206, in _call
    return await self._call_worker(name, *prepared_call.args)
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1212, in _call_worker
    return await self.run_in_proc(main_worker, name, args, job)
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1139, in run_in_proc
    return await self.run_in_executor(self.__procpool, method, *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1113, in run_in_executor
    return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
middlewared.service_exception.CallError: [EZFS_NOENT] no such pool or dataset
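As an aside, the layered trace above is just Python's standard exception chaining: the middleware worker catches the libzfs.ZFSException and re-raises it as a CallError, which is why the same "no such pool or dataset" message shows up repeatedly. A minimal sketch of that pattern (these classes are illustrative stand-ins, not the actual middlewared code):

```python
class ZFSException(Exception):
    """Illustrative stand-in for libzfs.ZFSException."""

class CallError(Exception):
    """Illustrative stand-in for middlewared.service_exception.CallError."""

def replace_vdev():
    # Simulates libzfs failing to locate the pool/dataset.
    raise ZFSException("no such pool or dataset")

def middleware_replace():
    # The middleware catches the low-level error and re-raises it;
    # "from e" preserves the original as __cause__, producing the
    # "direct cause of the following exception" lines in the trace.
    try:
        replace_vdev()
    except ZFSException as e:
        raise CallError(f"[EZFS_NOENT] {e}") from e

try:
    middleware_replace()
except CallError as err:
    print(err)                            # [EZFS_NOENT] no such pool or dataset
    print(type(err.__cause__).__name__)   # ZFSException
```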
Although at this point it may seem that the whole disk replacement process has failed, the resilvering does in fact get started, as can be seen from the CLI with the
zpool status mypool
command. At that point, however, the TrueNAS web interface does NOT report the resilvering as being in progress.
After a moment, reloading the TrueNAS web page shows that a resilvering is going on:
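Since the web UI lags behind the actual pool state, a quick way to watch the resilver is to poll `zpool status` from a script. A minimal sketch, assuming the "resilver in progress ... NN.N% done" wording that my pools print (the exact format can vary between ZFS versions, so parse defensively):

```python
import re
import subprocess

def resilver_progress(pool, status_text=None):
    """Return the resilver completion percentage, or None if no resilver
    is running. If status_text is None, `zpool status <pool>` is invoked;
    passing canned text lets this run without an actual pool."""
    if status_text is None:
        status_text = subprocess.run(
            ["zpool", "status", pool],
            capture_output=True, text=True, check=True,
        ).stdout
    if "resilver in progress" not in status_text:
        return None
    # zpool prints a progress line such as "..., 45.6% done, ..."
    m = re.search(r"([\d.]+)% done", status_text)
    return float(m.group(1)) if m else None

# Canned sample output so this example runs anywhere:
sample = """  pool: mypool
 state: ONLINE
  scan: resilver in progress since Mon Jun 21 10:00:00 2021
        512G issued at 150M/s, 45.6% done, 0 days 01:02:03 to go
"""
print(resilver_progress("mypool", sample))  # 45.6
```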
Reproducibility:
From what I have experienced so far, this error doesn't occur every time a disk is replaced. Of the three disks replaced so far under v12, two have shown this problem, but one went without a glitch.
I currently have no clue as to what causes the problem, why it happens, or whether it stems from the TrueNAS code itself.
Any idea regarding this would be welcome.
Thank you.