Backup & Restore of a VM Zvol via Replication

Turnspit

Dabbler
Joined
Jun 10, 2023
Messages
16
I'm currently testing out my future backup & restore strategy to another (soon to be offsite) TrueNAS device, both for my datasets and for Zvols used by VMs.
While backing up and restoring datasets via Replication works like a charm, I'm having trouble getting replicated Zvols to run as VM disks again.

I've set up my server A to push replications via SSH to server B, and then pull them back from B to A again.

Neither on server B nor on server A (after receiving the replicated Zvol back) am I able to select the Zvol for VM usage (with "Read Only" set to "Off" on the Zvol). I've also tried a "Full Filesystem Replication", to no avail.

I've then set up another Zvol, identical to the original one, and copied the contents of the replicated Zvol over to the new one via "dd". I lose my snapshots in the process, but lo and behold, I'm able to use this Zvol for a VM and it boots up correctly.
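For reference, the dd workaround looks roughly like this (the size, block size and pool/Zvol names below are placeholders for my actual setup):

Code:
# create a fresh zvol with the same size and block size as the original (values are placeholders)
zfs create -V 50G -o volblocksize=16k pool/vms/newvol
# block-copy the contents of the replicated zvol into the new one
dd if=/dev/zvol/pool/vms/replicated-vol of=/dev/zvol/pool/vms/newvol bs=1M status=progress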

Is there a way to directly use a replicated Zvol for a VM? Or is there maybe something wrong with my snapshot/replication tasks (see screenshots)?
I'm running version 22.12.3.2 on both machines.


Snapshot.png

Replication-1.png

Replication-2.png
 

samarium

Contributor
Joined
Apr 8, 2023
Messages
192
If the backup target is going to remain a backup target, then you don't want to modify the data on the backup server; it would have to be rolled back before the next replication anyway.
I would clone the object and then access the clone for testing; you can throw the clone away after testing.
You can keep the clone around for a while, but it depends on its origin snapshot, and if that snapshot is due to expire, the expiry will be blocked because the clone exists, so you would need to manage that. If you want a fully independent copy, you can zfs send/recv the origin snapshot into a new zvol on the backup server and use that.
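A minimal sketch of both options, assuming a snapshot named @auto-2023-07-17 and placeholder pool paths:

Code:
# lightweight clone for testing; it stays dependent on the origin snapshot
zfs clone pool/backup/vm-zvol@auto-2023-07-17 pool/test/vm-zvol-clone
# throw the clone away when you're done testing
zfs destroy pool/test/vm-zvol-clone

# or make a fully independent copy on the backup server instead
zfs send pool/backup/vm-zvol@auto-2023-07-17 | zfs recv pool/test/vm-zvol-copy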

See: man zfs-clone on the terminal, or check the web.

I don't use zvols much, but I have replicated them between machines and used them as block devices attached to VMs. Sometimes I use them in the VM, then shut down the VM, import them on the ZFS machine, and access the data there. Not TN, but ZFS, so the concept is sound.

Looking at your configuration, maybe just use Full Filesystem Replication.
Have you tried opening the VM on the backup server? Check the partition table: sgdisk -p /dev/zvol/pool/path/to/zvol
Have you compared the zvol properties on each end? zfs get all pool/path/to/zvol | sort possibly with | grep -v default on the end
You might also check that the zvol and its snapshots are the same, and the same size, on each end: zfs list -r -t snap pool/path/to/zvol
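For convenience, all three checks in one go (substitute your own pool/path):

Code:
# partition table as seen through the zvol's block device
sgdisk -p /dev/zvol/pool/path/to/zvol
# zvol properties, sorted, ignoring defaults
zfs get all pool/path/to/zvol | sort | grep -v default
# the zvol's snapshots and their sizes
zfs list -r -t snap pool/path/to/zvol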

I just checked a zvol I had replicated from my TNS server to my backup server, and the sgdisk test works on both ends. I did nothing special; it was part of a hierarchical replication of a dataset tree.
 
Last edited:

Turnspit

Dabbler
Joined
Jun 10, 2023
Messages
16
I was not intending to modify the files on the backup target; I just messed with the Zvol to see whether it would be recognized there as a proper VM target.

Sadly, Full Filesystem Replication doesn't make a difference; I had already tried that.

I checked the original Zvol, the replicated one, and the one replicated back to the original host with both sgdisk and zfs get all - they look exactly the same.

Comparing the snapshots, it seems the replicated ones are a tiny bit smaller than the originals - might that be an issue?
Bildschirmfoto vom 2023-07-17 20-39-22.png



The interesting thing I observed and just tested out with a fresh replication:
Right after a fresh replication, I don't get the option to choose the replicated Zvol as a VM target. Without changing anything else (even leaving the Zvol at Read Only), just issuing sgdisk -p /dev/zvol/pool/path/to/zvol makes the Zvol suddenly available as a VM disk.

Does the Zvol maybe somehow need to get initialized for the system to properly recognize it as a Zvol?
When trying to add the Zvol as a VM disk manually before, I also got the error message "Zvol is required", as if the system didn't properly recognize it.
 

samarium

Contributor
Joined
Apr 8, 2023
Messages
192
As far as I know, the zvol is usable as soon as the replication is complete. I don't think minor size differences are an issue - metadata or compression, maybe. Maybe the middleware is not cognisant of the zvol just after it is replicated and needs a poke to rescan. IIRC you can restart it with systemctl.
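If you want to give that a try, this is the service to poke (middlewared is the middleware service name on SCALE, as far as I know):

Code:
# restart the TrueNAS middleware so it rescans devices
systemctl restart middlewared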
 

Turnspit

Dabbler
Joined
Jun 10, 2023
Messages
16
So I did a few more tests, all with the same results:

Freshly re-replicating 2 different Zvols of 2 VMs from the backup target back over to the main machine - nothing.
Setting "Read-Only" to "Off" on the replicated Zvols - nothing.
Running systemctl restart middlewared - nothing.

Running sgdisk -p /dev/zvol/pool/path/to/zvol on each of the Zvols separately - success!
After running sgdisk on each Zvol respectively, they became available one after the other as disk targets for a VM.

I recorded the process for illustration purposes:
 

samarium

Contributor
Joined
Apr 8, 2023
Messages
192
I can't see a reason sgdisk would make a difference; it is only reading.
The /dev/zvol namespace is populated by udev before sgdisk can use it, which happens when the zd device is instantiated - I assume once the volume replication is complete.
If middleware has a hook in udev then it already knows.
How long does it take after the replication for webui to see the zvols as existing if you do nothing?
Could it just be timing and you have to wait for TN to get organized?
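If you want to watch what happens at the device level while a replication lands, something like this might help (the zvol path is a placeholder):

Code:
# watch block-device events arriving from the kernel/udev
udevadm monitor --subsystem-match=block
# in another window, check whether the device node exists yet
ls -l /dev/zvol/pool/path/to/zvol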
 

Turnspit

Dabbler
Joined
Jun 10, 2023
Messages
16
First of all, thanks for your continued support samarium! :smile:

I did another fresh re-replication this morning, turned Read-Only Off on the new Zvol, waited about 5 hours, and took another look - the Zvol still isn't registered for VM usage.
Restarted middleware - nothing.
Ran sgdisk on the Zvol - it immediately pops up.

It seems to me that before running sgdisk (or maybe any other command properly identifying the Zvol as a Zvol?) TrueNAS does not detect it as a proper Zvol.
 

samarium

Contributor
Joined
Apr 8, 2023
Messages
192

I suppose you could try something really minimal like : < /dev/zvol/path/to/zvol, which merely opens and closes the zvol.

Or you could try manually creating a zvol with zfs create -V 1G -o volblocksize=16k pool/path/to/zvol and see if TN detects it and allows you to add it to a VM - and if it doesn't, whether using sgdisk then allows it to be detected.

While you are testing those things, maybe monitor kernel messages with journalctl -kf.
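Putting those suggestions together, something like this (the 1G size, block size and paths are just placeholders):

Code:
# minimal poke: open and close the freshly replicated zvol's device node
: < /dev/zvol/pool/path/to/zvol

# create a throwaway zvol by hand and see whether the webui picks it up
zfs create -V 1G -o volblocksize=16k pool/path/to/testvol

# kernel messages to watch while doing the above
journalctl -kf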

What happens if, after you have replicated once and the webui has noticed the zvol, you modify the zvol on the original host and replicate the changes to the backup host? Does the zvol disappear? Is the updated data visible? Can you still access the data with your VM on the backup host? Do you have to run sgdisk again?

You should log a JIRA ticket, since it seems to be a TN webui/middleware issue after using TN replication.
 

Turnspit

Dabbler
Joined
Jun 10, 2023
Messages
16
journalctl -kf just prints something like debugfs: Directory 'zd144' with parent 'block' already present! upon replicating or manually creating a Zvol, and otherwise stays silent during the process.

Running : < /dev/zvol/path/to/zvol on a freshly replicated Zvol immediately makes it visible for VM usage.

The manually created Zvol is also usable as a VM disk from the get-go, no need to open and close it manually or run sgdisk.

It really seems like the UI/middleware aren't properly aware of a freshly replicated Zvol.

I will test the changing/replicating again later/tomorrow when I've got some more spare time.
 

samarium

Contributor
Joined
Apr 8, 2023
Messages
192
I'm wondering if this has something to do with the zfs event daemon, zed, and some interaction between it and middlewared.

I don't know much about either. Something for me to read on the train tomorrow.
 

samarium

Contributor
Joined
Apr 8, 2023
Messages
192
Well, I did a little reading, and a little testing, and ...

If you run zpool events -vf in one window, journalctl -kf in another, and udevadm monitor in a third, and then run through something like this:

Code:
# variables: adjust pool and path to suit; vol1/vol2 and snap1/snap2 are throwaway names
pool=name path=vm/path zvol=/dev/zvol z1=vol1 z2=vol2 s1=snap1 s2=snap2
# clean up any leftovers from a previous run
test -e $zvol/$pool/$path/$z1 && zfs destroy -r $pool/$path/$z1
test -e $zvol/$pool/$path/$z2 && zfs destroy -r $pool/$path/$z2
# create a 1G test zvol, partition it, and take two snapshots, relabelling the partition each time
zfs create -V 1G $pool/$path/$z1 && sleep 2 && sgdisk -n1:: -c1:$z1 -p $zvol/$pool/$path/$z1
zfs snapshot $pool/$path/$z1@$s1 && sleep 2 && sgdisk -c1:$z1-$s1 -p $zvol/$pool/$path/$z1
zfs snapshot $pool/$path/$z1@$s2 && sleep 2 && sgdisk -c1:$z1-$s2 -p $zvol/$pool/$path/$z1

so we have vol1 with a couple of snapshots, and we can replicate it and see what happens to events, kernel messages, and udev:
Code:
# check monitor windows, maybe press return a few times in each
zfs send -RceL $pool/$path/$z1@$s2 | zfs recv -Fuv $pool/$path/$z2
# check monitor windows, maybe press return a few times in each
sgdisk -p $zvol/$pool/$path/$z2
# check monitor windows, maybe press return a few times in each

It seems like each time I access a zvol, I get a kernel message about the zvol device, and udev events.
I'm guessing that middlewared is hooked into udev to pick up the new zd device, but I can't test that at the moment because I am away from my TNS testing VM, and I haven't had time to set up a temporary one. I'm not sure what would be useful for middleware monitoring; journalctl -xe -u middlewared might not see anything.
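For the middleware side, the best I can suggest is following its logs while you replicate or poke a zvol; I believe SCALE also writes to /var/log/middlewared.log, but I haven't verified that on a current release:

Code:
# follow middleware messages in the journal during a replication
journalctl -u middlewared -f
# or tail the middleware log file directly (path as I remember it on SCALE)
tail -f /var/log/middlewared.log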

Let me know if you find something.

Thinking about it some more: now that you know the device is good and you can trigger availability by accessing it, you don't really need to care anymore. You probably should still log a JIRA ticket, but it is up to iX to figure out a mechanism that works for them to detect the availability of a replicated zvol; maybe the set refreservation zevent would be a hint for where to go looking.
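And since opening the device is enough to trigger it, a crude stop-gap after each replication would be to poke every replicated zvol in one go (the target dataset is a placeholder):

Code:
# open/close every zvol under the replication target so the middleware notices them
for z in $(zfs list -H -o name -t volume -r pool/backup/vms); do
    : < /dev/zvol/$z
done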
 
Last edited: