Can't import pool I/O error

jspenc

Cadet
Joined
Dec 11, 2021
Messages
7
Hey, I think this might be a bit of a long shot given what I've seen in other posts, but I'm keen to get a look in before I give up.

I've added my specs to my signature. In terms of vdevs, I have two RAIDZ1 vdevs with 3 drives in each.

I've been having issues with my system; at this stage I think they were related to a faulty SAS cable, but I can't pin it on that with 100% certainty. I have since replaced the cable and things seem more stable (drives are no longer dropping off), but as the pool no longer loads I can't confirm this. I had already replaced one drive since the issues started, and was about to replace another when I hit the current problem.

Anyhow, when I replaced one of the SAS cables (the other is internal to the system), I thought I would try plugging one of the SAS connectors into the motherboard, essentially to work out whether the HBA was to blame.

When the system booted after that, my pool didn't even appear in the GUI. I didn't give it too much thought; I shut the system down and moved the SAS connector back onto the HBA. The problem is that now, when I try to import the pool, I get the error below. I should mention the system did previously use this onboard SAS connection when I only had 3 drives, but the HBA has been in "production" for over a year now. Come to think of it, moving to the HBA didn't cause any issues at the time, so maybe it isn't at fault.

Code:
root@freenas:/ # zpool import NAS
cannot import 'NAS': I/O error
        Destroy and re-create the pool from
        a backup source.


I've tried running this import command with various flags that other posts have suggested.
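
For reference, the variants were along these lines (reconstructing from memory, so the exact combinations may have differed):

Code:
# Force the import even if the pool looks active on another system
zpool import -f NAS

# Read-only import, so nothing gets written to the pool
zpool import -f -o readonly=on NAS

# Ask ZFS to rewind to an earlier transaction group if the latest one is damaged
zpool import -f -F NAS

# Import under an alternate root so datasets don't mount over the live system
zpool import -f -R /mnt NAS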


The output of zpool import is

Code:
root@freenas:/ # zpool import
   pool: NAS
     id: 13256022188567172076
  state: FAULTED
status: One or more devices contains corrupted data.
 action: The pool cannot be imported due to damaged devices or data.
        The pool may be active on another system, but can be imported using
        the '-f' flag.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-5E
 config:

        NAS                                             FAULTED  corrupted data
          raidz1-0                                      ONLINE
            gptid/9fb2f2a2-ff14-11e5-88b4-70106f3e74fc  ONLINE
            gptid/20b6d4af-505c-11ec-9610-70106f3e74fc  ONLINE
            gptid/a1a5e7aa-ff14-11e5-88b4-70106f3e74fc  ONLINE
          raidz1-1                                      DEGRADED
            gptid/7ec79194-7cb5-11ea-93b0-70106f3e74fc  FAULTED  corrupted data
            gptid/e39a8a76-5280-11eb-a93b-70106f3e74fc  ONLINE
            gptid/e6819d66-513f-11ec-9467-70106f3e74fc  ONLINE


The faulted drive is the one I had scheduled for replacement. Given the pool is RAIDZ1, my assumption is that I should still be able to get at the data on the pool :confused:

After running this I still only get boot information in /var/log/messages, and likewise for dmesg, so there's nothing else to go on there I'm afraid.
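
For what it's worth, the checking was just a matter of watching the logs from a second session while retrying the import, along these lines:

Code:
# Follow the system log while the import runs in another session
tail -f /var/log/messages

# Afterwards, check the most recent kernel messages
dmesg | tail -n 50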

One thing that makes me think moving the SAS connector caused issues is that when I run glabel status I get duplicates of each of my partitions. Maybe this is okay and I'm just misunderstanding, but it felt relevant.

Code:
root@freenas:/ # glabel status
                                      Name  Status  Components
gptid/7ec79194-7cb5-11ea-93b0-70106f3e74fc     N/A  da0p2
gptid/20b6d4af-505c-11ec-9610-70106f3e74fc     N/A  da1p2
gptid/9fb2f2a2-ff14-11e5-88b4-70106f3e74fc     N/A  da2p2
gptid/a1a5e7aa-ff14-11e5-88b4-70106f3e74fc     N/A  da3p2
gptid/e39a8a76-5280-11eb-a93b-70106f3e74fc     N/A  da4p2
gptid/e6819d66-513f-11ec-9467-70106f3e74fc     N/A  da5p2
gptid/daf9bb1e-d17a-11e8-92d6-70106f3e74fc     N/A  da6p1
gptid/db1a7481-d17a-11e8-92d6-70106f3e74fc     N/A  da6p2
gptid/e66b3688-513f-11ec-9467-70106f3e74fc     N/A  da5p1
gptid/e3846caf-5280-11eb-a93b-70106f3e74fc     N/A  da4p1
gptid/a198eb0a-ff14-11e5-88b4-70106f3e74fc     N/A  da3p1
gptid/9fa35dce-ff14-11e5-88b4-70106f3e74fc     N/A  da2p1
gptid/20128937-505c-11ec-9610-70106f3e74fc     N/A  da1p1
gptid/7d7eb525-7cb5-11ea-93b0-70106f3e74fc     N/A  da0p1
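
On the duplicate-looking labels: a way to sanity-check that would be to dump the partition table of one of the disks. If it shows the usual FreeNAS layout (a small swap partition as p1 plus the ZFS data partition as p2), then two gptid entries per disk would be expected rather than a sign of damage. A rough check, using da0 as an example:

Code:
# Show the GPT layout of one disk; on a standard FreeNAS setup p1 is swap and p2 the ZFS partition
gpart show da0

# List the partitions with their raw UUIDs (glabel shows these as gptid/<uuid>)
gpart list da0 | grep -E 'Name|rawuuid|type'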



This is the output of camcontrol devlist:

Code:
root@freenas:/ # camcontrol devlist
<ATA ST8000DM004-2CX1 0001>        at scbus0 target 26 lun 0 (pass0,da0)
<ATA ST8000DM004-2CX1 0001>        at scbus0 target 28 lun 0 (pass1,da1)
<ATA WDC WD40EFRX-68W 0A82>        at scbus0 target 29 lun 0 (pass2,da2)
<ATA WDC WD40EFRX-68W 0A82>        at scbus0 target 31 lun 0 (pass3,da3)
<ATA WDC WD80EFAX-68K 0A81>        at scbus0 target 32 lun 0 (pass4,da4)
<ATA WDC WD80EDAZ-11T 0A81>        at scbus0 target 33 lun 0 (pass5,da5)
<AHCI SGPIO Enclosure 2.00 0001>   at scbus7 target 0 lun 0 (pass6,ses0)
<SanDisk Extreme 0001>             at scbus9 target 0 lun 0 (pass7,da6)



Hangs head in shame... I don't have a backup of the full pool; this is a home server and the expense is simply too high to do that. I do have a copy of my most important assets backed up to the cloud, so it's not the end of the world, but I would obviously still rather keep or recover what data I can if anyone has any suggestions.
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Try removing da0. Then your pool should appear as just missing one member of the raidz1-1 vdev and should be importable. After that, just run your usual replacement procedure.
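
To make sure you pull the right physical disk, match da0 to a serial number first, for example:

Code:
# Print drive identity, including the serial number, so you can match it to the physical label
smartctl -i /dev/da0

# Or read the identify data via CAM
camcontrol identify da0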
 

jspenc

Cadet
Joined
Dec 11, 2021
Messages
7
Thanks for the swift response.

I have shut down, removed the drive, and rebooted.

Weirdly, if I run zpool import it still shows the partition on the now-removed drive as faulted :frown:

Code:
root@freenas:~ # zpool import
   pool: NAS
     id: 13256022188567172076
  state: FAULTED
status: One or more devices contains corrupted data.
 action: The pool cannot be imported due to damaged devices or data.
        The pool may be active on another system, but can be imported using
        the '-f' flag.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-5E
 config:

        NAS                                             FAULTED  corrupted data
          raidz1-0                                      ONLINE
            gptid/9fb2f2a2-ff14-11e5-88b4-70106f3e74fc  ONLINE
            gptid/20b6d4af-505c-11ec-9610-70106f3e74fc  ONLINE
            gptid/a1a5e7aa-ff14-11e5-88b4-70106f3e74fc  ONLINE
          raidz1-1                                      DEGRADED
            gptid/7ec79194-7cb5-11ea-93b0-70106f3e74fc  FAULTED  corrupted data
            gptid/e39a8a76-5280-11eb-a93b-70106f3e74fc  ONLINE
            gptid/e6819d66-513f-11ec-9467-70106f3e74fc  ONLINE

The partition ID is no longer listed in glabel status:
Code:
root@freenas:~ # glabel status
                                      Name  Status  Components
gptid/20b6d4af-505c-11ec-9610-70106f3e74fc     N/A  da0p2
gptid/9fb2f2a2-ff14-11e5-88b4-70106f3e74fc     N/A  da1p2
gptid/a1a5e7aa-ff14-11e5-88b4-70106f3e74fc     N/A  da2p2
gptid/e39a8a76-5280-11eb-a93b-70106f3e74fc     N/A  da3p2
gptid/e6819d66-513f-11ec-9467-70106f3e74fc     N/A  da4p2
gptid/daf9bb1e-d17a-11e8-92d6-70106f3e74fc     N/A  da5p1
gptid/db1a7481-d17a-11e8-92d6-70106f3e74fc     N/A  da5p2
gptid/e66b3688-513f-11ec-9467-70106f3e74fc     N/A  da4p1
gptid/e3846caf-5280-11eb-a93b-70106f3e74fc     N/A  da3p1
gptid/a198eb0a-ff14-11e5-88b4-70106f3e74fc     N/A  da2p1
gptid/9fa35dce-ff14-11e5-88b4-70106f3e74fc     N/A  da1p1
gptid/20128937-505c-11ec-9610-70106f3e74fc     N/A  da0p1



Importing the pool via the CLI results in the same I/O error as before.
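
One variant I haven't tried yet, noting it here in case it matters: pointing the import explicitly at the gptid device nodes with -d, in case stale device paths left over from the cable shuffling are confusing things.

Code:
# Search only the gptid device nodes for pool members
zpool import -d /dev/gptid NAS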

If I try to import from the TrueNAS UI, the pool isn't actually listed. Do I possibly need to export or disconnect it first? I do see it listed on the ui/storage/pools/ page.


[Attached screenshot: the pool listed on the ui/storage/pools/ page]
 

jspenc

Cadet
Joined
Dec 11, 2021
Messages
7
An update on my journey. I decided to be brave and do the export through the GUI, which then allowed me to retry the import through the GUI. I get the following stack trace when attempting the import. My Python isn't the best, but from what I can tell this is just a TrueNAS wrapper over the underlying ZFS library. It's a shame ZFS isn't giving a clearer error than just 'I/O error', and from what I can see there is no verbose mode available.

Code:
Error: concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/concurrent/futures/process.py", line 243, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/worker.py", line 94, in main_worker
    res = MIDDLEWARE._run(*call_args)
  File "/usr/local/lib/python3.9/site-packages/middlewared/worker.py", line 45, in _run
    return self._call(name, serviceobj, methodobj, args, job=job)
  File "/usr/local/lib/python3.9/site-packages/middlewared/worker.py", line 39, in _call
    return methodobj(*params)
  File "/usr/local/lib/python3.9/site-packages/middlewared/worker.py", line 39, in _call
    return methodobj(*params)
  File "/usr/local/lib/python3.9/site-packages/middlewared/schema.py", line 979, in nf
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/zfs.py", line 371, in import_pool
    self.logger.error(
  File "libzfs.pyx", line 391, in libzfs.ZFS.__exit__
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/zfs.py", line 365, in import_pool
    zfs.import_pool(found, new_name or found.name, options, any_host=any_host)
  File "libzfs.pyx", line 1095, in libzfs.ZFS.import_pool
  File "libzfs.pyx", line 1123, in libzfs.ZFS.__import_pool
libzfs.ZFSException: I/O error
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/middlewared/job.py", line 367, in run
    await self.future
  File "/usr/local/lib/python3.9/site-packages/middlewared/job.py", line 403, in __run_body
    rv = await self.method(*([self] + args))
  File "/usr/local/lib/python3.9/site-packages/middlewared/schema.py", line 975, in nf
    return await f(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/pool.py", line 1421, in import_pool
    await self.middleware.call('zfs.pool.import_pool', pool['guid'], {
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1256, in call
    return await self._call(
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1221, in _call
    return await self._call_worker(name, *prepared_call.args)
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1227, in _call_worker
    return await self.run_in_proc(main_worker, name, args, job)
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1154, in run_in_proc
    return await self.run_in_executor(self.__procpool, method, *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1128, in run_in_executor
    return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
libzfs.ZFSException: ('I/O error',)


In the meantime I've run long SMART tests on the removed drive, which passed. I'm a bit nonplussed about whether there are any viable next steps now...
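
For reference, the test sequence on the pulled drive was roughly the following (device name assumed; it will differ depending on where the drive is attached):

Code:
# Start the extended (long) SMART self-test
smartctl -t long /dev/da0

# Once it finishes, review the self-test log and overall health
smartctl -a /dev/da0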
 

jspenc

Cadet
Joined
Dec 11, 2021
Messages
7
I discovered this command, which does seem to provide some additional debug info:

Code:
[root@freenas ~]# zdb -e NAS -ul

Configuration for import:
        vdev_children: 2
        version: 5000
        pool_guid: 13256022188567172076
        name: 'NAS'
        state: 0
        hostid: 16122292
        hostname: 'freenas.local'
        vdev_tree:
            type: 'root'
            id: 0
            guid: 13256022188567172076
            children[0]:
                type: 'raidz'
                id: 0
                guid: 8181317042114998967
                nparity: 1
                metaslab_array: 35
                metaslab_shift: 36
                ashift: 12
                asize: 11995904212992
                is_log: 0
                create_txg: 4
                children[0]:
                    type: 'disk'
                    id: 0
                    guid: 9461126416905487160
                    whole_disk: 1
                    DTL: 451
                    create_txg: 4
                    path: '/dev/gptid/9fb2f2a2-ff14-11e5-88b4-70106f3e74fc'
                children[1]:
                    type: 'disk'
                    id: 1
                    guid: 3447576264904098358
                    DTL: 389
                    create_txg: 4
                    expansion_time: 1639079830
                    resilver_txg: 32899901
                    path: '/dev/gptid/20b6d4af-505c-11ec-9610-70106f3e74fc'
                children[2]:
                    type: 'disk'
                    id: 2
                    guid: 6581231283218499162
                    whole_disk: 1
                    DTL: 449
                    create_txg: 4
                    path: '/dev/gptid/a1a5e7aa-ff14-11e5-88b4-70106f3e74fc'
            children[1]:
                type: 'raidz'
                id: 1
                guid: 18424999579640736559
                nparity: 1
                metaslab_array: 385
                metaslab_shift: 37
                ashift: 12
                asize: 23998232788992
                is_log: 0
                create_txg: 23235487
                children[0]:
                    type: 'disk'
                    id: 0
                    guid: 11813773381553303204
                    path: '/dev/gptid/7ec79194-7cb5-11ea-93b0-70106f3e74fc'
                    DTL: 459
                    create_txg: 23235487
                    removed: 1
                children[1]:
                    type: 'disk'
                    id: 1
                    guid: 2669887881765897796
                    DTL: 279
                    create_txg: 23235487
                    path: '/dev/gptid/e39a8a76-5280-11eb-a93b-70106f3e74fc'
                children[2]:
                    type: 'disk'
                    id: 2
                    guid: 10727936423154890141
                    DTL: 466
                    create_txg: 23235487
                    expansion_time: 1639090577
                    path: '/dev/gptid/e6819d66-513f-11ec-9467-70106f3e74fc'
        load-policy:
            load-request-txg: 18446744073709551615
            load-rewind-policy: 2
zdb: can't open 'NAS': Input/output error

ZFS_DBGMSG(zdb) START:
spa.c:6032:spa_import(): spa_import: importing NAS
spa_misc.c:411:spa_load_note(): spa_load(NAS, config trusted): LOADING
vdev.c:129:vdev_dbgmsg(): disk vdev '/dev/gptid/9fb2f2a2-ff14-11e5-88b4-70106f3e74fc': best uberblock found for spa NAS. txg 32960835
spa_misc.c:411:spa_load_note(): spa_load(NAS, config untrusted): using uberblock with txg=32960835
vdev.c:129:vdev_dbgmsg(): disk vdev '/dev/gptid/7ec79194-7cb5-11ea-93b0-70106f3e74fc': vdev_load: vdev_dtl_load failed [error=5]
spa_misc.c:396:spa_load_failed(): spa_load(NAS, config trusted): FAILED: vdev_load failed [error=5]
spa_misc.c:411:spa_load_note(): spa_load(NAS, config trusted): UNLOADING
ZFS_DBGMSG(zdb) END
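
The line that stands out to me is the vdev_dtl_load failure (error=5, i.e. EIO) on the 7ec79194 device. If I can get the drive back in the system, I may also dump the ZFS labels on that partition directly to see whether they are readable at all:

Code:
# Dump the ZFS labels on the suspect partition (only possible while the device node exists)
zdb -l /dev/gptid/7ec79194-7cb5-11ea-93b0-70106f3e74fc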




Also, I have given running this command a go. TBH I'm not sure if it's just locked up or actually doing something; I will leave it to run overnight and see if it makes any progress.

Code:
zpool import NAS -f -F -m -X
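
If anyone else ends up here: a way to tell whether that import is actually doing something or has simply hung is to watch disk activity from a second session, for example:

Code:
# Live per-disk I/O statistics; steady reads suggest the import is still walking the pool
gstat -p

# Confirm the import process is still alive (likely in disk wait rather than spinning)
ps aux | grep "zpool import"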
 

zsw12abc

Dabbler
Joined
Nov 22, 2022
Messages
25
Hey @jspenc, I'm facing the same issue you faced.
Did you solve your issue?
 
Joined
Nov 6, 2022
Messages
2
@jspenc @zsw12abc any success with this critical problem?

It's really sad. One of my disks has failed and it seems this will end my journey with TrueNAS. What's really the point of using RAID when there are no working tools I can use to repair the pool? I could have used RAID 0 instead.

'I/O error' doesn't really tell me what to do next.
 
Joined
Nov 6, 2022
Messages
2
I've also started a Reddit thread, but after two months without any leads or success, I will probably wipe the pool and start using Windows as a file server again.
 