Untypical reboot loop and crashed pool: How to rescue data first?

chaddictive

Dabbler
Joined
Jan 17, 2021
Messages
16
Hi there,

I would really appreciate some help from the community. I started with TrueNAS six weeks ago, read a lot, tested a lot, got everything running and thought: since all is good, let's put my 2 x 4 TB WD Red drives into production now. A week later I "de-indexed" the main folder in my pool via Windows SMB and stopped it halfway because it wasn't finishing - I had started this because of some speed issues in my LAN when accessing the NAS, but in the end the router was to blame. The same day I also copied a 78 GB image to the NAS overnight (I don't know whether either of these triggered the issue). In the morning I heard a beep every minute... I had a reboot loop - my pool crashed and so far I have had no luck importing it. It looks like the same random issue some other people have reported here, and the same issue already has a JIRA ticket here.

Now, before I try to solve that issue long term, does anybody have an idea how to rescue my data first? What I have tried already:

1. Installed the latest FreeNAS 11.3-U5 (on a separate drive) and tried to import the pool via the GUI (where it is visible for import): after the import the NAS restarts and the reboot loop starts again. To get out of that loop, I have to disconnect the two HDDs, start TrueNAS and disconnect the pool in the GUI. Then I can restart again with the HDDs attached.
2. Installed a fresh TrueNAS 12.0-U1.1 and tried to import manually with "zpool import -o readonly=on RaidPool" and also "zpool import -o readonly=on -f RaidPool". The NAS restarts but still no pool is imported; it is as if nothing happened. I tried the same in FreeNAS with the same result.
3. Created a new pool on a separate drive and tried to import the data into it, with no luck (that was really just a try without reading up first; it doesn't make sense).

Are there any other commands I should try for an import via the shell without destroying my data? Or should I try it in a VM? Or with some special tool for Windows? That's the reason I ask here first, before I damage anything. I "just" need to access my data in any way possible first.
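(For reference: these are the read-only diagnostic commands I understand to be safe to run before attempting any repair. This is only a sketch - the gptid path is just a placeholder for the real device, and RaidPool is my pool name.)

Code:
 # List pools that are visible for import, without actually importing anything
 zpool import

 # Dump the ZFS labels of one pool member (read-only)
 zdb -l /dev/gptid/<disk-gptid>

 # Attempt a read-only import mounted under /mnt, so nothing is written to the pool
 zpool import -R /mnt -o readonly=on RaidPool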

My Specs:
System: TrueNAS 12.0-U1.1
Mainboard: 880GMA-E35 FX
CPU: AMD Athlon(tm) II X3 455 Processor
RAM: 1x G.Skill Ripjaws 4GB DDR3 PC3-10666 + 1x 4GB Corsair XMS3 1333MHz (I have just changed that to 2x Ripjaws; I hope the mixed RAM wasn't the issue)
Drives: 1x Kingston SSD + 2x 4TB WD Red Plus

The message I got on the console after the unsuccessful read-only import attempt is the following:

Code:
traverse_visitbp() at traverse_visitbp+0x703/frame 0xfffffe0228cec3e0
traverse_impl() at traverse_impl+0x317/frame 0xfffffe0228cec500
traverse_pool() at traverse_pool+0x149/frame 0xfffffe0228cec5c0
spa_load() at spa_load+0x141a/frame 0xfffffe0228cec720
spa_load_best() at spa_load_best+0x65/frame 0xfffffe0228cec770
spa_import() at spa_import+0x27b/frame 0xfffffe0228cec830
zfs_ioc_pool_import() at zfs_ioc_pool_import+0x163/frame 0xfffffe0228cec880
zfsdev_ioctl() at zfsdev_ioctl+0x715/frame 0xfffffe0228cec920
devfs_ioctl_f() at devfs_ioctl_f+0x126/frame 0xfffffe0228cec980
kern_ioctl() at kern_ioctl+0x267/frame 0xfffffe0228cec9f0
sys_ioctl() at sys_ioctl+0x15b/frame 0xfffffe0228cecac0
amd64_syscall() at amd64_syscall+0xa86/frame 0xfffffe0228cecbf0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe0228cecbf0
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x80120106a, rsp = 0x7fffffffbc68, rbp = 0x7fffffffbce0 ---
KDB: enter: panic
[ thread pid 1787 tid 100557 ]
Stopped at kdb_enter+0x3b: movq $0,kdb_why
db:0:kdb.enter.default> write cn_mute 1
cn_mute
db:0:kdb.enter.default> textdump dump
db:0:kdb.enter.default> reset
cpu_reset: Restarting BSP
cpu_reset_proxy: Stopped CPU 2


If the message from the first reboot loop is interesting for you, I can post that too, but it ends pretty much the same way with "KDB: enter: panic" etc.

I have already searched the forums for days, so I hope I didn't miss an easy way out of this. If I did, sorry for that, and I would be very grateful for any hint.

Thanks in advance for any comment on this.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
Installed a fresh TrueNAS 12.0-U1.1 and tried to import manually with "zpool import -o readonly=on RaidPool" and also "zpool import -o readonly=on -f RaidPool". The NAS restarts but still no pool is imported; it is as if nothing happened.
Are you sure nothing imported? You would see an error if not.

Check zfs list before you assume nothing is there.

You could also try an import with the -F switch, which may be able to discard the last few transactions to get to a healthy pool (but may lose some very recent data).
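Roughly, the check-then-recover sequence would look like this (just a sketch; RaidPool stands in for your pool name, and combining readonly=on with -F keeps the recovery attempt from writing anything to the disks):

Code:
 # First confirm whether the pool really did not import
 zfs list
 zpool status -v

 # If it is genuinely absent, try a read-only recovery import
 zpool import -o readonly=on -F RaidPool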

Or with some special tool for windows?
After you have determined that there is really no other way to import it in TrueNAS or linux, you might resort to Klennet (https://www.klennet.com/zfs-recovery/default.aspx) or there's also an "open source" option, which may or may not provide some options to recover: https://github.com/Stefan311/ZfsSpy
 

chaddictive

Dabbler
Joined
Jan 17, 2021
Messages
16
Are you sure nothing imported? You would see an error if not.

Check zfs list before you assume nothing is there.

You could also try an import with the -F switch, which may be able to discard the last few transactions to get to a healthy pool (but may lose some very recent data).
Yep, no error and nothing imported. I type "zpool import -f RaidPool" into the shell, the cursor goes down one line but no message comes back, and after around 20 seconds the NAS reboots. When I watch the NAS on my monitor, I get the "KDB: enter: panic" part (see picture).

[Attached photo: console output showing the KDB panic]


After the reboot I check with "zfs list" and get the following:

Code:
 Warning: settings changed through the CLI are not written to
 the configuration database and will be reset on reboot.

 root@truenas[~]# zfs list
 NAME                                                      USED  AVAIL  REFER  MOUNTPOINT
 WD_3TB                                                   15.6M  2.63T    88K  /mnt/WD_3TB
 WD_3TB/.system                                           12.7M  2.63T   104K  legacy
 WD_3TB/.system/configs-03127bf8be404c849cf141a05d5f9b13    96K  2.63T    96K  legacy
 WD_3TB/.system/configs-e6cb7c26dd1243d3962c3f87f9eddfe5    88K  2.63T    88K  legacy
 WD_3TB/.system/cores                                      640K  1023M   640K  legacy
 WD_3TB/.system/rrd-03127bf8be404c849cf141a05d5f9b13      5.79M  2.63T  5.79M  legacy
 WD_3TB/.system/rrd-e6cb7c26dd1243d3962c3f87f9eddfe5      5.23M  2.63T  5.23M  legacy
 WD_3TB/.system/samba4                                     192K  2.63T   192K  legacy
 WD_3TB/.system/services                                    96K  2.63T    96K  legacy
 WD_3TB/.system/syslog-03127bf8be404c849cf141a05d5f9b13    264K  2.63T   264K  legacy
 WD_3TB/.system/syslog-e6cb7c26dd1243d3962c3f87f9eddfe5    168K  2.63T   168K  legacy
 WD_3TB/.system/webui                                       88K  2.63T    88K  legacy
 WD_3TB/Backup                                              88K  2.63T    88K  /mnt/WD_3TB/Backup
 boot-pool                                                1.37G   433G    96K  none
 boot-pool/ROOT

That is only my 3 TB HDD. My RaidPool, based on the two 4 TB HDDs, is not there.
 

chaddictive

Dabbler
Joined
Jan 17, 2021
Messages
16
And before that I checked to be sure, please see pictures:

[Attached screenshot: GUI showing the two HDDs and the pool available for import]


The two HDDs are there, and the pool can be seen in the GUI for import.
 

chaddictive

Dabbler
Joined
Jan 17, 2021
Messages
16
After you have determined that there is really no other way to import it in TrueNAS or linux, you might resort to Klennet (https://www.klennet.com/zfs-recovery/default.aspx) or there's also an "open source" option, which may or may not provide some options to recover: https://github.com/Stefan311/ZfsSpy
Thanks already for that hint. I'll hold off on that for now; maybe you or somebody else can get me on the right track. I would pay something to get my data back, but $400 for Klennet is way too much. Maybe I'll try the open-source tool, but they say it may not work with newer ZFS pools. Well, let's see how far I get.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
@chaddictive, you mention 2 x 4TB WD Reds. I truly hope they are WD Red Plus or Pro.

Recently Western Digital did something utterly stupid and changed an existing NAS line, the "WD Red", to SMR disks (Shingled Magnetic Recording). SMR is dog slow for write-heavy loads. Worse, Western Digital has a bug in the disks' firmware that ZFS can trigger, which makes them completely unsuitable for FreeNAS/TrueNAS. (Unlike Seagate Archive disks, which, while slow, DO work with ZFS.)

When Western Digital got caught turning the WD Reds into SMR disks, they relabeled the old CMR WD Reds as "Red Plus", and still marketed the SMR drives as NAS-capable. As I said, stupid.

So, please get us the model numbers of the 2 x 4 TB disks. If I understand correctly, the WD40EFAX models are the affected disks.
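One way to read the model numbers straight from the NAS shell (a quick sketch; ada0 is an example device name and may differ on your system):

Code:
 # List all attached disks with their vendor/model strings
 camcontrol devlist

 # Or query one drive's model and serial via SMART
 smartctl -i /dev/ada0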
 

chaddictive

Dabbler
Joined
Jan 17, 2021
Messages
16
@Arwen: Thanks for the hint.

In this case, my WD Reds are the Plus WD40EFRX, NX HA500. The model number is WD40EFRX - 68N32NO. When I bought them, I thought I'd spend a few bucks more on the Plus, without knowing the differences behind it. Good call, even though it won't be the solution for my issue now, I guess.
 

chaddictive

Dabbler
Joined
Jan 17, 2021
Messages
16
Are you sure nothing imported? You would see an error if not.

Check zfs list before you assume nothing is there.

You could also try an import with the -F switch, which may be able to discard the last few transactions to get to a healthy pool (but may lose some very recent data).


After you have determined that there is really no other way to import it in TrueNAS or linux, you might resort to Klennet (https://www.klennet.com/zfs-recovery/default.aspx) or there's also an "open source" option, which may or may not provide some options to recover: https://github.com/Stefan311/ZfsSpy
Since I am not getting any further with repairing the pool, I will now try some data-rescue options. Are there any other ways to import the pool with another OS? Linux or anything else? If there is, installing a completely different OS is an option for me to raise my chances. Thanks again for any hint.
I may also try ZfsSpy (open source), but since it is not maintained anymore, I guess it won't work with the newer ZFS version.
 

koli

Dabbler
Joined
Jan 25, 2015
Messages
10
Since I am not getting any further with repairing the pool, I will now try some data-rescue options. Are there any other ways to import the pool with another OS? Linux or anything else? If there is, installing a completely different OS is an option for me to raise my chances.

Proxmox comes to mind. I don't have first-hand experience, but I believe it should be straightforward (under normal circumstances). Definitely Linux as well, Ubuntu very likely.
Btw, the hardware you are running is a no-no for TrueNAS; I think that's why you are not getting much advice here. People don't really want to touch something like that...
 

TangoFB

Cadet
Joined
Oct 22, 2020
Messages
7
Not sure if this will help, but I have run into this exact same error. I originally thought my old hardware had given up the ghost, so I built a whole new system, and then when I was importing my pool... BOOM, reboot loop again.

That brought me here to your post. One difference is that I was able to get my pool to import as read-only, but I had issues getting it to mount properly under /mnt so I could actually access it with rsync or any other tools.

I was finally able to get the pool properly listed under /mnt with the following command:

Code:
 zpool import -R /mnt -o readonly=on <poolname>


I hope this helps someone since I have seen this is a growing issue.
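A rough way to double-check that the datasets really landed under the altroot before pointing rsync at them (a sketch; RaidPool is just a placeholder for the pool name):

Code:
 zpool import -R /mnt -o readonly=on RaidPool

 # Confirm the datasets are mounted under /mnt
 zfs list -o name,mountpoint,mounted -r RaidPool
 ls /mnt/RaidPool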
 

chaddictive

Dabbler
Joined
Jan 17, 2021
Messages
16
I hope this helps someone since I have seen this is a growing issue.

Thanks for this info; so I am not the only one with this issue, and maybe it didn't arise just because of my newbie status with TrueNAS.

I have actually tried to rescue my data via Ubuntu and GhostBSD, with no positive outcome so far... :(

The only new info I have is that GhostBSD (FreeBSD) sees my "RaidPool" with "The pool cannot be imported due to damaged devices or data", and when I try to import the pool with the same commands as in TrueNAS, a reboot loop starts there too...
...great, FreeBSD is consistent :rolleyes:
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
How was your 2x4TB pool constructed? As a mirror or as a stripe? If a stripe, your only option is Klennet. If a mirror, try importing your pool with only one drive connected.
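A rough outline of that single-drive attempt, read-only the whole way (a sketch; power the box down before disconnecting a drive, and RaidPool stands in for the pool name):

Code:
 # With only one mirror member attached, see what ZFS can detect
 zpool import

 # If the pool shows up (probably as DEGRADED), try a read-only import under /mnt
 zpool import -R /mnt -o readonly=on RaidPool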
 

chaddictive

Dabbler
Joined
Jan 17, 2021
Messages
16
How was your 2x4TB pool constructed? As a mirror or as a stripe? If a stripe, your only option is Klennet. If a mirror, try importing your pool with only one drive connected.
It is a mirror. I just tried with only one of the drives via Ubuntu, and so far I see the same issue.

With "sudo zpool import -o readonly=on RaidPool" I get the message, that "the pool was previously in use from another system....the pool can be imported, use 'zpool import -f" to import the pool."

With "sudo zpool import -o readonly=on -f RaidPool" the curser just blinks the whole time and I dont get a message back. I will leave it alone for a few hours and will try the same on FreeBSD. Lets see.

But thanks for that hint. It is/was worth a try.
 

chaddictive

Dabbler
Joined
Jan 17, 2021
Messages
16
Since version U7 (which includes OpenZFS 2.0.6) is out now, this should theoretically have fixed the issue (for anybody who may run into the same problem), as I read here.

Sadly this wasn't true for me; I get this I/O error message back when trying to import the pool via the UI.


Code:
 Error: concurrent.futures.process._RemoteTraceback:
 """
 Traceback (most recent call last):
   File "/usr/local/lib/python3.9/concurrent/futures/process.py", line 243, in _process_worker
     r = call_item.fn(*call_item.args, **call_item.kwargs)
   File "/usr/local/lib/python3.9/site-packages/middlewared/worker.py", line 94, in main_worker
     res = MIDDLEWARE._run(*call_args)
   File "/usr/local/lib/python3.9/site-packages/middlewared/worker.py", line 45, in _run
     return self._call(name, serviceobj, methodobj, args, job=job)
   File "/usr/local/lib/python3.9/site-packages/middlewared/worker.py", line 39, in _call
     return methodobj(*params)
   File "/usr/local/lib/python3.9/site-packages/middlewared/worker.py", line 39, in _call
     return methodobj(*params)
   File "/usr/local/lib/python3.9/site-packages/middlewared/schema.py", line 979, in nf
     return f(*args, **kwargs)
   File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/zfs.py", line 371, in import_pool
     self.logger.error(
   File "libzfs.pyx", line 391, in libzfs.ZFS.__exit__
   File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/zfs.py", line 365, in import_pool
     zfs.import_pool(found, new_name or found.name, options, any_host=any_host)
   File "libzfs.pyx", line 1095, in libzfs.ZFS.import_pool
   File "libzfs.pyx", line 1123, in libzfs.ZFS.__import_pool
 libzfs.ZFSException: I/O error
 """

 The above exception was the direct cause of the following exception:

 Traceback (most recent call last):
   File "/usr/local/lib/python3.9/site-packages/middlewared/job.py", line 367, in run
     await self.future
   File "/usr/local/lib/python3.9/site-packages/middlewared/job.py", line 403, in __run_body
     rv = await self.method(*([self] + args))
   File "/usr/local/lib/python3.9/site-packages/middlewared/schema.py", line 975, in nf
     return await f(*args, **kwargs)
   File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/pool.py", line 1421, in import_pool
     await self.middleware.call('zfs.pool.import_pool', pool['guid'], {
   File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1256, in call
     return await self._call(
   File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1221, in _call
     return await self._call_worker(name, *prepared_call.args)
   File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1227, in _call_worker
     return await self.run_in_proc(main_worker, name, args, job)
   File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1154, in run_in_proc
     return await self.run_in_executor(self.__procpool, method, *args, **kwargs)
   File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1128, in run_in_executor
     return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
 libzfs.ZFSException: ('I/O error',)


When I try to import via the shell, I use "zpool import -o readonly=on -f Poolname" and get this back:

[Attached screenshot: shell output of the failed import attempt]


If anybody can help, I would really, really appreciate it.

Cheers
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
The command line is more or less correct.

You can try importing your pool using the recovery option, "-F". You will lose the last 5 seconds or so of data that was in flight, and you will want to perform a scrub of the pool after the import to see whether any other data is lost.

Whether you want to risk it is up to you.

If it were me, with a simple 2-disk mirror, I would buy another drive and low-level clone one of the pool's disks. That would give me a back-out plan in case I wanted to try again. Better yet, buy 2 new disks and clone both.
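A minimal sketch of that clone-then-recover approach, assuming the pool member is ada1 and the blank spare is ada3 (device names are examples only; double-check them with camcontrol devlist first, since swapping if= and of= would overwrite the source disk):

Code:
 # Low-level clone of one pool member onto a spare of equal or larger size
 dd if=/dev/ada1 of=/dev/ada3 bs=1M conv=noerror,sync status=progress

 # After a successful -F import, scrub the pool to surface any remaining damage
 zpool scrub RaidPool
 zpool status -v RaidPool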
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Thanks Arwen.

Nothing happens then; see the screenshot after I used "-F" with a capital letter.

[Attached screenshot: shell after running the import with -F]
The first command used the lower-case "-f", which does not do the recovery mode. With the second command, nothing is normally printed after a successful import.

So, do as @winnielinnie wrote and check the pool status.
 

chaddictive

Dabbler
Joined
Jan 17, 2021
Messages
16
I did read a lot (in my opinion), but I never read that basic fact that the options are case-sensitive.

And I am getting my hope back now. I checked the pool status and it says it's online:

Code:
 root@truenas[~]# zpool status -v
   pool: RaidPool
  state: ONLINE
 config:

         NAME                                            STATE     READ WRITE CKSUM
         RaidPool                                        ONLINE       0     0     0
           mirror-0                                      ONLINE       0     0     0
             gptid/b0b1f12e-51e9-11eb-a93f-8c89a5838bcd  ONLINE       0     0     0
             gptid/b0bf43df-51e9-11eb-a93f-8c89a5838bcd  ONLINE       0     0     0

 errors: No known data errors

   pool: boot-pool
  state: ONLINE
   scan: scrub repaired 0B in 00:00:19 with 0 errors on Thu Feb 24 03:45:19 2022

I guess now comes more basic stuff for you guys, but I couldn't import the pool via the UI - no pools available...
 