Running a command before snapshots happen

Status
Not open for further replies.

emcadmin

Dabbler
Joined
Jan 14, 2013
Messages
15
Is it possible to run a command before the snapshot runs? I have an rsync script that backs up another system and would like to have a snapshot made after the script runs.

I know that I could run zfs snapshot, but I want it to be the same as the periodic snapshots.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Would it be simpler to rsync the periodic snapshot to wherever?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
What? Explain a little better. It sounds like you want the snapshot taken at the same time that the rsync starts copying data. If you did that then you wouldn't have the data copied from the rsync.
 

emcadmin

Dabbler
Joined
Jan 14, 2013
Messages
15
> Would it be simpler to rsync the periodic snapshot to wherever?

Sorry, this wasn't what I meant.

> What? Explain a little better. It sounds like you want the snapshot taken at the same time that the rsync starts copying data. If you did that then you wouldn't have the data copied from the rsync.

No. I'm going to be running the rsync job on the NAS.
Here's what I was thinking. If the periodic snapshot had a pre-command, I could use this to start the rsync of another system, and after it finished, the snapshot would be performed.

Basically, the operation would go something like this:
1) Periodic snapshot task starts.
2) Runs precommand
2a) rsync runs to sync a remote server to the local nas.
3) Snapshot is taken.

The only modifications to this dataset would be the rsync process.

This would be similar to the way the NetApp filer we are replacing works with its SnapVault function.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Yeah, rsync doesn't work like that. Once it starts running it returns you to the command line and runs in the background (at least it did for me). So even if it ran as a precommand, the snapshot would be taken seconds after the rsync starts and your snapshot wouldn't include your rsync'd data.
 

emcadmin

Dabbler
Joined
Jan 14, 2013
Messages
15
> Yeah, rsync doesn't work like that. Once it starts running it returns you to the command line and runs in the background (at least it did for me). So even if it ran as a precommand, the snapshot would be taken seconds after the rsync starts and your snapshot wouldn't include your rsync'd data.

Rsync is just like any other unix process. It will not return to the command prompt after it is invoked until it is finished or the command was requested to go into the background. I have already used rsync on the NAS to do the copy. I just want the snapshot to be taken after rsync runs.
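
A quick way to see the difference is the sketch below, where `sleep` stands in for a long rsync run and both function names are made up for illustration:

```shell
#!/bin/sh
# Foreground: the shell waits for the command, so the next line runs only after it exits.
foreground_order() {
    sleep 1                       # stand-in for: rsync <options> <source> <destination>
    echo "transfer finished"
    echo "snapshot taken"
}

# Background: the trailing '&' returns control at once, so the next line runs first.
background_order() {
    ( sleep 1; echo "transfer finished" ) &
    echo "snapshot taken"
    wait                          # block until the background job has exited
}
```

Only the second form would put the snapshot ahead of the copied data, and that only happens when the '&' is asked for.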
 

fracai

Guru
Joined
Aug 22, 2012
Messages
1,212
I see a few options here.

1) Snapshots are pretty much instantaneous, so just schedule the rsync task to occur after the snapshot task. OK, scheduling snapshots is tricky, but if you set up the task and then rename the latest snapshot so the time is when you want the snapshots to occur, it should be reliable. (Basically, if you want snapshots to occur on the hour: set a snapshot task, wait for it to run, use "zfs rename" to change name of the snapshot so the minutes part of the name is "00". Your next snapshot for that task should occur on the hour.)

Is it really that important that you get a snapshot right after the rsync completes?

2) Instead of relying on two tasks to rsync and snapshot, write a wrapper script that executes the rsync command and then makes a new snapshot. You can even name the snapshot using the same format as the snapshot task would (@auto-YYYYmmdd-HHMM-"retention time (2m/4d/3w/etc.)"). Rsync should indeed not return until the task is complete, unless it was invoked with an '&' at the end to indicate that it should be a background task.


Personally, I picked option 1. Once you've got a few rsync tasks running, for varying sizes of data, it gets harder and harder to make sure you're snapshotting after everything is done. So instead, just snapshot on each hour and run your rsync tasks right after that.
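
For reference, option 2 could look roughly like the sketch below. The dataset, paths, host, retention value, and the `snapshot_name` and `sync_then_snapshot` helpers are all placeholders for illustration. The name format simply follows the description above, so compare it against an existing periodic snapshot in the output of `zfs list -t snapshot` before relying on it. Whether the periodic task would then expire a hand-made snapshot is something to verify, not something this sketch guarantees.

```shell
#!/bin/sh
# Hypothetical placeholders; replace with real values.
DATASET="tank/backups/server1"
SRC="backupuser@server1.example.com:/data/"
DEST="/mnt/tank/backups/server1/"
RETENTION="2w"

# Build a snapshot name in the format described above.
# $1 = dataset, $2 = retention, $3 = timestamp (YYYYmmdd-HHMM)
snapshot_name() {
    printf '%s@auto-%s-%s\n' "$1" "$3" "$2"
}

# rsync runs in the foreground; the snapshot is taken only if rsync succeeds.
sync_then_snapshot() {
    rsync -a "$SRC" "$DEST" || return 1
    zfs snapshot "$(snapshot_name "$DATASET" "$RETENTION" "$(date +%Y%m%d-%H%M)")"
}
```

A cron job would then call `sync_then_snapshot` at whatever interval suits the dataset.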
 

emcadmin

Dabbler
Joined
Jan 14, 2013
Messages
15
> I see a few options here.
>
> 1) Snapshots are pretty much instantaneous, so just schedule the rsync task to occur after the snapshot task. OK, scheduling snapshots is tricky, but if you set up the task and then rename the latest snapshot so the time is when you want the snapshots to occur, it should be reliable. (Basically, if you want snapshots to occur on the hour: set a snapshot task, wait for it to run, use "zfs rename" to change name of the snapshot so the minutes part of the name is "00". Your next snapshot for that task should occur on the hour.)
>
> Is it really that important that you get a snapshot right after the rsync completes?

I planned on having a zfs replication task run. From what I have seen, the replication task runs after the snapshot is taken.

> 2) Instead of relying on two tasks to rsync and snapshot, write a wrapper script that executes the rsync command and then makes a new snapshot. You can even name the snapshot using the same format as the snapshot task would (@auto-YYYYmmdd-HHMM-"retention time (2m/4d/3w/etc.)"). Rsync should indeed not return until the task is complete, unless it was invoked with an '&' at the end to indicate that it should be a background task.
>
> Personally, I picked option 1. Once you've got a few rsync tasks running, for varying sizes of data, it gets harder and harder to make sure you're snapshotting after everything is done. So instead, just snapshot on each hour and run your rsync tasks right after that.

In this case, the dataset will only have one rsync process run against it. I was going to have several datasets set up like this with different schedules. Only one rsync process would ever touch the data.
 

fracai

Guru
Joined
Aug 22, 2012
Messages
1,212
I'm not sure how replication is implemented, but I think I'd just handle all three tasks in a wrapper script run as a cron task.
1) rsync
2) snapshot
3) replicate

Basically, I think your requirements are greater than what can be provided by the current GUI. You could file a bug requesting the "pre-snapshot" feature, but for now I think rolling your own is the better option.
 

emcadmin

Dabbler
Joined
Jan 14, 2013
Messages
15
> I'm not sure how replication is implemented, but I think I'd just handle all three tasks in a wrapper script run as a cron task.
> 1) rsync
> 2) snapshot
> 3) replicate
>
> Basically, I think your requirements are greater than what can be provided by the current GUI. You could file a bug requesting the "pre-snapshot" feature, but for now I think rolling your own is the better option.

I was looking through the scripts. The snapshot script (autosnap.py) calls autorepl.py at the end. I don't know much about python, so I can't figure out what the snapshotting thing is doing. I do know the snapshot script is called every minute via cron.
 

fracai

Guru
Joined
Aug 22, 2012
Messages
1,212
Yeah, I wouldn't bother modifying the existing scripts. I'd just create a new shell script that runs the following commands.
1) date # to get the new snapshot name
2) rsync <options> <source> <destination>
3) zfs snapshot <dataset>@<snapshot name>
4) zfs send <dataset>@<snapshot name> # ... whatever you need to do to send it somewhere (file, ssh destination, etc)

You can then set cron to run that script at whatever interval you want.
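
Put together, the four steps could look like the sketch below. Every name in it (dataset, paths, hosts, the `manual-` snapshot prefix, and the `backup_and_replicate` function) is a hypothetical placeholder, and sending over ssh into `zfs receive` is only one of the possible destinations mentioned in step 4.

```shell
#!/bin/sh
# Hypothetical placeholders; replace with real values.
DATASET="tank/backups/server1"
SRC="backupuser@server1.example.com:/data/"
DEST="/mnt/tank/backups/server1/"
REMOTE="root@nas2.example.com"
REMOTE_DATASET="tank2/backups/server1"

backup_and_replicate() {
    # 1) Pick the snapshot name up front.
    snap="${DATASET}@manual-$(date +%Y%m%d-%H%M)"
    # 2) rsync runs in the foreground; stop here if it fails.
    rsync -a "$SRC" "$DEST" || return 1
    # 3) Snapshot the dataset now that the transfer is complete.
    zfs snapshot "$snap" || return 1
    # 4) Send the full snapshot stream to the other machine.
    zfs send "$snap" | ssh "$REMOTE" zfs receive "$REMOTE_DATASET"
}
```

This sends a full stream every time, which only works while the destination dataset does not exist yet. An incremental send against the previous snapshot (`zfs send -i`) would be the obvious next refinement.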
 

emcadmin

Dabbler
Joined
Jan 14, 2013
Messages
15
> Yeah, I wouldn't bother modifying the existing scripts. I'd just create a new shell script that runs the following commands.
> 1) date # to get the new snapshot name
> 2) rsync <options> <source> <destination>
> 3) zfs snapshot <dataset>@<snapshot name>
> 4) zfs send <dataset>@<snapshot name> # ... whatever you need to do to send it somewhere (file, ssh destination, etc)
>
> You can then set cron to run that script at whatever interval you want.

I thought about that, except for #4. I wanted to do as little as possible outside of the GUI. I've read that things can break if you do. My #2 is already a script and doesn't modify anything related to FreeNAS. The actual snapshots, from what I've seen, are queried from ZFS, but the creation of the snapshots is set up through the GUI.

I think for now, I'll have a cron job run when I need it to, and have the snapshot task run 30-60 minutes later.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Yeah, generally you should try to stick to the GUI. In this case I think you could get away with #1-4 without significant problems. Although the GUI may not acknowledge the ZFS snapshots for you so you'd have to delete them manually. Then they may appear if you reboot. I'm not sure exactly how the GUI tracks the snapshots and has the moment of revelation where it realizes there is a new snapshot. It may be the second you click the menu item to display your snapshots.

Honestly, if you really want to do everything via cronjob I'm sure it would work. Your GUI may lie to you about the ZFS snapshots that exist and may not let you delete them from the GUI even if it sees them, but you seem advanced enough to understand those downfalls. I'd just do it how you want and see what happens.
 

fracai

Guru
Joined
Aug 22, 2012
Messages
1,212
From my experience, the GUI is now aware of manual (i.e., command-line-created) snapshots.
 

emcadmin

Dabbler
Joined
Jan 14, 2013
Messages
15
> Yeah, generally you should try to stick to the GUI. In this case I think you could get away with #1-4 without significant problems. Although the GUI may not acknowledge the ZFS snapshots for you so you'd have to delete them manually. Then they may appear if you reboot. I'm not sure exactly how the GUI tracks the snapshots and has the moment of revelation where it realizes there is a new snapshot. It may be the second you click the menu item to display your snapshots.
>
> Honestly, if you really want to do everything via cronjob I'm sure it would work. Your GUI may lie to you about the ZFS snapshots that exist and may not let you delete them from the GUI even if it sees them, but you seem advanced enough to understand those downfalls. I'd just do it how you want and see what happens.

As I understand it, the actual ZFS snapshots are queried by the GUI to see what they are and aren't stored anywhere else. They probably won't come back after a reboot as they are part of the filesystem. The goal was to have the snapshot taken after the rsync transfer so that the replication would have everything in it and the secondary would be up to date in the event the primary dies. The other reason was to let the FreeNAS part handle expiring snapshots.

I do understand some of those downfalls. The rsync script I wrote for another purpose. It did its own snapshots, but never cleaned them up. That was something that I always did manually. This script is for backing up servers at another place, and I felt that using the snapshot functionality of the GUI was better since it cleans up after itself anyway.

If I knew enough about Python, I could probably add the pre-command to the snapshot functionality in the GUI, but unfortunately, I don't.
 

fracai

Guru
Joined
Aug 22, 2012
Messages
1,212
I'm not sure what you mean by:
> As I understand it, the actual ZFS snapshots are queried by the GUI to see what they are and aren't stored anywhere else. They probably won't come back after a reboot as they are part of the filesystem.

Snapshots made by the GUI or command line are stored by ZFS in the filesystem. If the GUI stores anything in a database, it keeps it up to date with changes made on the command line (this was one of the big changes in, I think, FreeNAS 8.3). I can back this up: snapshots that I've made automatically from GUI tasks, manually from the GUI, and manually on the command line all display in the GUI and don't disappear after a reboot.

If I understand your goals correctly, you want to run an rsync task, snapshot the result of that operation, replicate that snapshot to another server, and then remove the snapshot (either now or sometime in the future).

I think I'd accomplish this by ignoring the GUI altogether.
1) rsync <whatever>
2) zfs snapshot <dataset>@tmp-replication
3) zfs send <dataset>@tmp-replication <...>
4) zfs destroy <dataset>@tmp-replication

If you want to keep other snapshots for purposes of local rollbacks, configure that in the GUI, but your operations should be able to be handled completely independent of the GUI.
 

emcadmin

Dabbler
Joined
Jan 14, 2013
Messages
15
> I'm not sure what you mean by:
> Snapshots made by the GUI or command line are stored by ZFS in the filesystem. If the GUI stores anything in a database, it keeps it up to date with changes made on the command line (this was one of the big changes in, I think, FreeNAS 8.3). I can back this up: snapshots that I've made automatically from GUI tasks, manually from the GUI, and manually on the command line all display in the GUI and don't disappear after a reboot.

Basically, the snapshot that is taken is not stored in FreeNAS's config file. When you go into the GUI to list snapshots, it runs a command every time (e.g. zfs list -t snapshot). The snapshot task is stored in the config.

> If I understand your goals correctly, you want to run an rsync task, snapshot the result of that operation, replicate that snapshot to another server, and then remove the snapshot (either now or sometime in the future).
>
> I think I'd accomplish this by ignoring the GUI altogether.
> 1) rsync <whatever>
> 2) zfs snapshot <dataset>@tmp-replication
> 3) zfs send <dataset>@tmp-replication <...>
> 4) zfs destroy <dataset>@tmp-replication
>
> If you want to keep other snapshots for purposes of local rollbacks, configure that in the GUI, but your operations should be able to be handled completely independent of the GUI.

I could do that. The down side is that I'd have to write something to manage the snapshots. If I did that, I'd make the script manage the link trees instead.
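
Something along these lines is what that management would involve, as a minimal sketch. It assumes a plain keep-the-newest-N policy and snapshot names that sort chronologically. Both are assumptions for illustration, as are the `snapshots_to_prune` function name and the `manual-` prefix, and none of this is how the GUI expires its own snapshots.

```shell
#!/bin/sh
# Read snapshot names (one per line) on stdin and print all but the newest $1.
# Assumes names sort chronologically, e.g. <dataset>@manual-YYYYmmdd-HHMM.
snapshots_to_prune() {
    keep="$1"
    sort | awk -v keep="$keep" '
        { names[NR] = $0 }
        END { for (i = 1; i <= NR - keep; i++) print names[i] }'
}

# Hypothetical usage; review the printed list before destroying anything:
#   zfs list -H -t snapshot -o name -r "$DATASET" | grep "@manual-" |
#       snapshots_to_prune 10 |
#       while read -r snap; do zfs destroy "$snap"; done
```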

I'll file a wishlist bug on the snapshot feature to add a pre-command.
 

fracai

Guru
Joined
Aug 22, 2012
Messages
1,212
> Basically, the snapshot that is taken is not stored in FreeNAS's config file. When you go into the GUI to list snapshots, it runs a command every time (e.g. zfs list -t snapshot). The snapshot task is stored in the config.

Gotcha, though I still don't know what you meant by:

> They probably won't come back after a reboot as they are part of the filesystem.

The snapshots will be destroyed on reboot? They won't be displayed in the GUI?

> I could do that. The down side is that I'd have to write something to manage the snapshots. If I did that, I'd make the script manage the link trees instead.

What link trees? And, I'm not sure how you like the snapshots to be managed now, but you might be helped by the ZFS Rollup script (thread 10304) that I wrote for deleting snapshots at different intervals.
 

emcadmin

Dabbler
Joined
Jan 14, 2013
Messages
15
> Gotcha, though I still don't know what you meant by: The snapshots will be destroyed on reboot? They won't be displayed in the GUI?

I said that, oops. Should have been "will". No, snapshots aren't destroyed on reboot.

> What link trees? And, I'm not sure how you like the snapshots to be managed now, but you might be helped by the ZFS Rollup script (thread 10304) that I wrote for deleting snapshots at different intervals.

It's a function of my script. If the destination is a symlink, it will create a new directory and use --link-dest in rsync to create a hardlink tree. I use this on Linux for my own machines. I'm just adapting it to work on FreeBSD/FreeNAS.

Sorry for the snapshot confusion. For one of the systems I'm working with, the schedule is more complex than all the other systems.
I have this set up on a NetApp FAS270:
sv_daily 10@mon-fri@20
sv_hourly 10@mon-fri@10,13,15
sv_weekly 2@sat@20
Format was <snap name> <count>@<day list>@<hour list>
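
For the cron side of this, a line in that format can be turned into cron time fields mechanically. The sketch below is only an illustration: `schedule_to_cron` is a made-up name, the minute is fixed at 0 as an assumption, and the count is ignored because cron has no notion of retention.

```shell
#!/bin/sh
# Convert one schedule line into the five cron time fields.
# Input format: <snap name> <count>@<day list>@<hour list>
schedule_to_cron() {
    spec="${1#* }"                          # drop the snapshot name
    days="$(echo "$spec" | cut -d@ -f2)"    # e.g. mon-fri
    hours="$(echo "$spec" | cut -d@ -f3)"   # e.g. 10,13,15
    # Map day names to numbers so the range is plain numeric cron syntax.
    days="$(echo "$days" | sed -e 's/sun/0/g' -e 's/mon/1/g' -e 's/tue/2/g' \
        -e 's/wed/3/g' -e 's/thu/4/g' -e 's/fri/5/g' -e 's/sat/6/g')"
    echo "0 $hours * * $days"
}
```

For example, `schedule_to_cron 'sv_hourly 10@mon-fri@10,13,15'` prints `0 10,13,15 * * 1-5`.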

This is why I wanted a pre-command. It would be easier to manage. It would let the GUI manage all the snapshots, and the script would be able to run before the snapshot takes place.

At the moment, I have a cron job set up for those transfer times (2 actually) and 3 snapshot schedules configured. Unfortunately, the snapshot schedule can't be defined in a similar fashion to cron jobs in the GUI.

Maybe I'm just being picky, but this is what we (myself and the other people in IT here) agreed on.
 

fracai

Guru
Joined
Aug 22, 2012
Messages
1,212
> It's a function of my script. If the destination is a symlink, it will create a new directory and use --link-dest in rsync to create a hardlink tree. I use this on Linux for my own machines. I'm just adapting it to work on FreeBSD/FreeNAS.

Gotcha, I used to incrementally back up some data like this using "rsnapshot". Now I just rsync the data to my pool and let snapshots handle the incremental backup and my rollup script prune extraneous snapshots.

> Sorry for the snapshot confusion. For one of the systems I'm working with, the schedule is more complex than all the other systems.
> I have this set up on a NetApp FAS270:
> sv_daily 10@mon-fri@20
> sv_hourly 10@mon-fri@10,13,15
> sv_weekly 2@sat@20
> Format was <snap name> <count>@<day list>@<hour list>
>
> This is why I wanted a pre-command. It would be easier to manage. It would let the GUI manage all the snapshots, and the script would be able to run before the snapshot takes place.
>
> At the moment, I have a cron job set up for those transfer times (2 actually) and 3 snapshot schedules configured. Unfortunately, the snapshot schedule can't be defined in a similar fashion to cron jobs in the GUI.
>
> Maybe I'm just being picky, but this is what we (myself and the other people in IT here) agreed on.

Being picky is what IT is about I thought? ;-)

Anyway, I think I see what you're doing and my methods would end up with more snapshots than I think you'd like to see.

I agree though, I wish snapshots were scheduled the same way as cron jobs. (I find it irritating that you can't set the minute that the snapshot will occur)
 