HELP ZFS Pool data recovery

Dawson

Explorer
Joined
Jun 17, 2023
Messages
80
My TrueNAS setup:
  • Lives on Proxmox VM
  • Three 4TB hard drives passed through to the VM
  • Running in RaidZ1
  • File system: ZFS
Please help!

4 days ago, one of my 4TB hard drives failed on me. I instantly ordered a new one, and it came in yesterday. I threw it into my Proxmox server, located it, and wiped it through the Proxmox interface (or so I thought). Turns out I wiped one of the drives that was still in the ZFS pool (which was no longer redundant), not the new one. The dumbest mistake I've made in my life!

So what I have right now is three working 4TB drives. One of them is new, with no data on it. Another is the one I wiped (it used to have all my data on it), and the last one's data is still intact.

Is there any chance I can recover the data on the drive I wiped so I can get the pool back up and running? Currently, TrueNAS says the pool is unavailable.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Is there any chance I can recover the data on the drive I wiped so I can get the pool back up and running?

Your pool was built with RAIDZ1, so it lacks the ability to tolerate two drive failures. Your only hope here is if your "failed" disk isn't entirely failed and can be recovered. What sort of failure did it suffer?
 

Dawson

Explorer
Joined
Jun 17, 2023
Messages
80
I am not sure. I can hear it spinning and nothing seems out of the ordinary, but it doesn't show up in Proxmox.
 

Dawson

Explorer
Joined
Jun 17, 2023
Messages
80
So TrueNAS lives in a Proxmox VM, and I use a command to pass the drives through to the VM. Proxmox has to see a drive in order for TrueNAS to see it. When the failure happened, the drive was gone from Proxmox, and when I went into TrueNAS the pool said it was degraded. The drive was no longer showing in TrueNAS either.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
That's super unsupported and highly inadvisable. Please see the forum's virtualization resource.


You've created a situation where I can't help you, because I have no idea how "use a command to pass the drives through" works, and there are multiple layers that could be messing with it. This makes it very difficult to debug what's happened. You need someone intimately familiar with Proxmox and whatever "drive pass through" technique you used to look in Proxmox to see why the drive isn't available, and then if it is recoverable. I am not that person.

We've seen people use file-based virtual disks and all sorts of wacky stuff to address insufficiencies in their systems, but the problem with this is that if your file-based virtual disk (just using this as an example) gets fsck'ed out of existence, ALL the blocks that were on that file become irretrievable, whereas with a real hard disk, only the failed blocks become irretrievable. ZFS may be able to cope with drives where failed blocks become irretrievable and work around that. This is one of many reasons you MUST use PCIe passthru for the disk controller or HBA and let TrueNAS have direct access to the storage controller.

I don't know how to help you. This isn't to say that your data is permanently lost. You may yet be able to debug your disk, but that is outside the scope of the TrueNAS community forums. You might check with the Proxmox folks.
 

Dawson

Explorer
Joined
Jun 17, 2023
Messages
80
That's what I meant by "pass through": it passes the entire drive and controller over to the VM so Proxmox doesn't have to touch it. Basically, what I'm asking is: is it possible to recover the data that was lost when I wiped the disk? I haven't written anything to the disk since.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
That's what I meant by "pass through": it passes the entire drive and controller over to the VM so Proxmox doesn't have to touch it. Basically, what I'm asking is: is it possible to recover the data that was lost when I wiped the disk? I haven't written anything to the disk since.

If the controller is passed through to the VM, then there is no reason that Proxmox would be able to see the controller or the disk. You previously indicated

Located it and wiped it through the Proxmox interface (or so I thought).

which suggests that you did not actually pass through the controller; Proxmox would be unable to access the drive if it had been properly passed through to the VM. Perhaps you should provide a full description of your system and its configuration.
 

Dawson

Explorer
Joined
Jun 17, 2023
Messages
80
How would I be able to pass it through to a VM if Proxmox doesn't see it? Think about that for a sec. Proxmox has to see EVERYTHING; if it can't see it, then how would it be possible to pass it to a VM?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Think about that for a sec.

Think about my post count for a sec. Think about who the author of the resource I quoted above is. I have been doing virtualization of FreeNAS for more than a decade. I might know a little something about it. I'm not just some forum rando.

One hard learned lesson is that you pass the disk CONTROLLER, not the disks, through to the VM. This causes the hypervisor to be unable to see the disk controller, because the disk controller has been reserved for the VM. Then the VM is able to grab control of the disk controller and manipulate the disks appropriately. This is the safe way to attach disks to your TrueNAS VM, because it closely resembles a bare metal setup.
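For Proxmox specifically, passing the whole controller through might look like the following sketch. The VM ID (100) and PCI address (0000:03:00.0) are placeholders, not values from this thread; both will differ on a real system:

```shell
# Find the PCI address of the SATA/SAS controller or HBA
lspci | grep -iE 'sata|sas|hba'

# Pass the entire controller through to the TrueNAS VM
# (VM ID 100 and address 0000:03:00.0 are assumed for illustration)
qm set 100 -hostpci0 0000:03:00.0
```

After the controller is reserved for the VM like this, Proxmox can no longer see the controller or any disks attached to it; only the TrueNAS guest can, which is exactly the behavior described above.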
 

Dawson

Explorer
Joined
Jun 17, 2023
Messages
80
Turns out the drive didn't fail. There's still hope! So it's back in TrueNAS. The pool says unavailable. How do I get it back? I appreciate your help.
 

Attachments

  • Screenshot 2023-06-17 121654.png (39.4 KB)

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Turns out the drive didn't fail. There's still hope! So it's back in TrueNAS. The pool says unavailable. How do I get it back? I appreciate your help.

I believe that ZFS is not convinced that the disks are all available without conflict because one of the disks appears to be active on "another" ZFS system (which may just be your existing ZFS system but some hours ago).

It seems to think that a forced import could be successful. That's probably what I'd try. From the command line, try

zpool import -Fn Tank

which should indicate whether or not ZFS thinks a forced import will work. This is somewhat more accurate than just the status output message.
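For context, a sketch of the sequence one might follow here, run as root from the TrueNAS shell (the pool name Tank is taken from the post above):

```shell
# List importable pools and their reported state
zpool import

# Dry run: -n reports whether a recovery import (-F) would succeed,
# without actually changing anything on disk
zpool import -Fn Tank

# If the dry run looks good, attempt the actual recovery import
zpool import -F Tank

# Then check pool health
zpool status Tank
```

Note that -F works by rewinding the pool to its last importable transaction group, so the most recent few seconds of writes before the failure may be discarded.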
 

Dawson

Explorer
Joined
Jun 17, 2023
Messages
80
I tried the command, and it finished with no output. The pool is still unavailable, and "zpool import" still shows the same thing. What next?
 

Attachments

  • Screenshot 2023-06-17 121654.png (39.4 KB)
  • Screenshot 2023-06-17 122924.png (50.8 KB)

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
The one that shows unavailable is the one I formatted by accident.

Is it actually there in the TrueNAS VM? You haven't indicated whether this is TrueNAS CORE or TrueNAS SCALE.
 

Dawson

Explorer
Joined
Jun 17, 2023
Messages
80
Yes, the drive is showing. This is TrueNAS CORE. The 4TB drives are all the old hard drives, while the new one is sitting right next to me, unplugged.
 

Attachments

  • Screenshot 2023-06-17 123502.png (52 KB)

Dawson

Explorer
Joined
Jun 17, 2023
Messages
80
This is the same error I get when doing it through the GUI.
 

Attachments

  • Screenshot 2023-06-17 124011.png (5.9 KB)

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Yes, the drive is showing. This is TrueNAS core. The 4tb drives are all the old hard drives. while the New one is sitting right next to me, unplugged.

My guess is that there is something it doesn't like about the disk beginning with GPTID 11b39573. Does /dev/gptid/11b39573* exist if you look for it from the UNIX shell? Like maybe not even being there?
 

Dawson

Explorer
Joined
Jun 17, 2023
Messages
80
My guess is that there is something it doesn't like about the disk beginning with GPTID 11b39573. Does /dev/gptid/11b39573* exist if you look for it from the UNIX shell? Like maybe not even being there?
How would I look for that?
 

Attachments

  • Screenshot 2023-06-17 130419.png (33.9 KB)

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
How would I look for that?

It's not there in your list. So that's what ZFS is mad about. It wants to see all its disks or it doesn't like it. I think you need to treat this as a failed disk replacement scenario. The manual has the proper current process for taking care of this.
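For reference, a sketch of how one could check for that device node on TrueNAS CORE (which is FreeBSD-based); the gptid prefix is the one mentioned above:

```shell
# List every GPT label the system currently knows about
ls /dev/gptid/

# Check for the specific label the pool is expecting
ls /dev/gptid/11b39573*

# Map gptid labels back to their underlying disks and partitions
glabel status
```

If the label is absent from both listings, ZFS has no device to attach that vdev to, which matches the "unavailable" state shown in the screenshots.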
 