Running out of storage space even after deleting stuff


onlineforums

Explorer
Joined
Oct 1, 2017
Messages
56
I have an iXsystems-built FreeNAS Mini, purchased on Amazon. It is running FreeNAS 9.10.2-U1, and it has eight 4TB WDC HDDs in it.

I originally created one volume; let's just call it VOLUME01. I then created one dataset on it.
When I click on Storage -> Volumes it looks like this:

Code:
VOLUME01
	-VOLUME01  (child of "VOLUME01")
		-IN  (child of the "VOLUME01" child; this, I believe, is the only dataset)


The idea behind this was that the "IN" child dataset would hold the inbound data that comes in from backing up a remote server. rsync pulls the remote server's data, compares it to the IN directory, and makes the appropriate changes so that IN stays in sync with the remote directory.
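
Something like this, roughly (a sketch; the host and paths are placeholders, not the real job):

Code:
# Mirror the remote directory into the IN dataset; --delete removes
# files locally when they disappear on the remote side.
rsync -avz --delete backupuser@remote.example.com:/data/ /mnt/VOLUME01/IN/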

Then every evening the IN directory gets copied into the /mnt/VOLUME01/backup/<date> folder, so if I go to the /backup/ folder it lists every single day for the past several months. The IN folder is always up to date. Yes, there is a TON of duplicate data, but I'm a newbie and didn't trust the deduplication feature. Since I have a rotation script that deletes /backup/<date> folders older than about 2 months, I figured I could endlessly do full directory backups without worrying about 95%+ of the data being duplicated.
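
The nightly job is conceptually something like this (a sketch reconstructed from the description above; the real script and retention may differ):

Code:
#!/bin/sh
# Copy today's state of IN into a dated backup folder...
DATE=$(date +%Y-%m-%d)
cp -a /mnt/VOLUME01/IN "/mnt/VOLUME01/backup/${DATE}"
# ...and rotate out dated folders older than ~60 days.
find /mnt/VOLUME01/backup -mindepth 1 -maxdepth 1 -type d -mtime +60 -exec rm -rf {} +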

I then have snapshots set up for the IN folder to run every day and be retained for 2 months.

When I look at the snapshots, all of them show "0" for Used and between 190GB and 210GB for Refer. There are around 175 of these line items in the snapshots section.

My concern is that I received an email saying disk space was at 95% usage. I then deleted around 20 dated backup folders, and every few deletions I would get an email saying 94% usage, 93%, 92%, 91%, etc., and then the emails stopped because I was no longer using more than 90% of the disk.

Fast forward 24 hours and I get several emails in a matter of minutes saying 91% usage, 92%, 93%, 94%, 95%! Back to where I was, even though there was maybe only 50MB of new data in the IN folder (dataset).

So I'm a bit confused. I thought maybe the snapshots are actually each using 200GB (there are about 3 a day, even though the IN folder is only 250GB total).

I'm just lost on where the data is hiding. When I run df -h I get:

Code:
VOLUME01       97% usage
VOLUME01/IN    40% usage (220GB out of 567GB)


Any ideas? Anything need to be cleared up?
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Anything need to be cleared up?
Almost certainly the snapshots, and the space taken by them is unintuitive at best. Do you really need snapshots? If I'm understanding you correctly, the box is only used as a backup destination, you keep backups for two months, then delete them (and yes, doing a full backup every day is extraordinarily wasteful of storage, but it has the advantage of being simple). If you keep snapshots for two months as well, in effect, you're keeping your backups for four months. Do you really need to do that?
 

MrToddsFriends

Documentation Browser
Joined
Jan 12, 2015
Messages
1,338
I'm just lost on where the data is hiding. When I run df -h I get
VOLUME01 97% usage
VOLUME01/IN at 40% usage (220GB out of 567GB)

Any ideas? Anything need to be cleared up?

Does the written property of ZFS give you some insight? See also its description in the Native Properties section of the zfs man page ("The amount of referenced space written to this dataset since the previous snapshot.").

Try the commands

Code:
zfs list -r -o name,avail,used,refer,usedds,usedsnap,usedchild,written -t all VOLUME01/IN

or alternatively

Code:
zfs list -r -o space,refer,written -t all VOLUME01/IN

The latter one shows a few more columns not related to snapshots, but it is easier to memorize. I hope I got the dataset part right for your setup.
 

onlineforums

Explorer
Joined
Oct 1, 2017
Messages
56
danb35 and MrToddsFriends, thank you for the replies.

danb35 - The reason for the snapshots is that there will be 200MB worth of change every day. I want to be able to easily revert, or maybe, if there is a way, see what the changes are in a snapshot compared to the IN dataset. Basically, I set up the "basic" method, with a ton of duplicated data, for the primary purpose of keeping it simple, even though it uses more disk space; if I ever have to "go back" to a particular date, I can simply scp the dated folder to the remote server. Ultimately, though, I think you are right: there really isn't any great reason to keep snapshots on top of copying the entire IN dataset into dated backup folders. I figured that each snapshot didn't take much space, as it would only take up space for anything that gets deleted/modified/added. Is that not right?
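
As an aside on the "see what the changes are" part: ZFS has zfs diff, which compares a snapshot against the live dataset (the snapshot name here is a placeholder):

Code:
# Lists files added (+), removed (-), modified (M), or renamed (R)
# since the named snapshot was taken.
zfs diff VOLUME01/IN@daily-2017-11-03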

MrToddsFriends - When I run that command, I see the written property, and throughout the last 60 days (2 months) it shows 195GB from August and then, every few days, 180M, 238M, 200M, 115M, 185M, 115M, etc., totaling around 30GB. The biggest entries in the written column are the 195GB, 17GB, and 4GB; the rest are all in the hundreds of MBs.

Again, my initial thought with snapshots was that a snapshot would take up the space of the IN dataset (which is only about 200MB) and then from there only take up space according to what gets deleted, moved, modified, or added in the IN dataset. I guess my understanding of snapshots is not correct.
 
Joined
Apr 9, 2015
Messages
1,258
Maybe the space of a single snapshot is 200MB, but let's say you have 100 snapshots sitting there: take that 200MB times 100 and you have 20,000MB, or about 20GB, used just to hold them. But from what I can see, my guess is that a lot of data has been written at some point, and if you are snapshotting the whole pool recursively, you probably have a lot more tied up than you think.

If you REALLY want to know how much space your snapshots are taking, go into the Storage tab and click on the Snapshots sub-tab. Then add up the whole column that says "Used" and you will have a pretty good guess.
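
From the shell, something like this adds that column up for you (a sketch, using the pool name from earlier in the thread):

Code:
# Sum the USED value (exact bytes via -Hp) of every snapshot in the pool.
zfs list -t snapshot -r -o used -Hp VOLUME01 | awk '{s+=$1} END {printf "%.1f GiB\n", s/1024/1024/1024}'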
 

onlineforums

Explorer
Joined
Oct 1, 2017
Messages
56
Hi all, OP here. I deleted ALL snapshots and disabled the snapshot tasks for the time being. I am at around 94% usage right now.

I think my best bet at this point is to leverage the deduplication functionality. If you recall from the original post, I have an "IN" dataset, which is the folder the rsync comes into. Then every evening I copy the IN folder into a backups folder (not in the IN dataset) named with that day's date. I hold these for a rolling 30-day period, so at any given point in time I have 30 copies of about 99% of the same files. Let's say I have a 500MB file in the IN dataset; that means I have 30 of that exact same 500MB file in the backup folders under their appropriate dates (11/1, 11/2, 11/3, etc.).

Since I already have this all configured, what is the best method of using the deduplication functionality so that I only have ONE 500MB file on disk, with the 11/1, 11/2, 11/3, etc. copies all referencing that ONE 500MB file?

A corollary question: what happens if I delete ONE of the 500MB files? Will it really not be deleted, since there are other copies of it? In other words, let's say the original 500MB file is in the IN dataset (and then in 30 different places in the backup folders). If I delete the 500MB IN file, would that remove it from the 30 different places, or is it not direct like that, because the 500MB file really isn't the "IN" file itself but a placement on the hard disk?

Thanks all!
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
I think my best bet at this point is to leverage the deduplication functionality.
I doubt it--I don't think the Mini holds enough RAM to make deduplication safe to use.
what is the best method of using the deduplication functionality so that I only have ONE 500mb file and then the 11/1, 11/2, 11/3, etc all reference that ONE 500mb file?
There isn't one; dedup doesn't work that way--it works at the block level. If you enable dedup on the relevant pool/dataset, data will be deduplicated as it is written. But there's no way to deduplicate data that's already there.
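
For reference, dedup is a per-dataset property and only affects new writes (the dataset name here is illustrative):

Code:
# Only blocks written after this point are deduplicated; existing data
# stays as-is unless it is rewritten (e.g., copied into a fresh dataset).
zfs set dedup=on VOLUME01/backups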

What you need to do is come up with a sane backup strategy--this isn't it.
 

onlineforums

Explorer
Joined
Oct 1, 2017
Messages
56
I doubt it--I don't think the Mini holds enough RAM to make deduplication safe to use.

There isn't one; dedup doesn't work that way--it works at the block level. If you enable dedup on the relevant pool/dataset, data will be deduplicated as it is written. But there's no way to deduplicate data that's already there.

What you need to do is come up with a sane backup strategy--this isn't it.
danb35 - Thanks for the reply. After posting, I've been reading about dedup, and I agree: it isn't the best method, and the Mini wouldn't have sufficient RAM to go this route.

Sane backup strategy - As you correctly pointed out previously, what I have going on right now has the benefit of simplicity. Files come in via rsync into the IN dataset. Every evening a job copies the IN folder into a backup folder named appropriately, "2017-11-03" for today. Then once a week it puts a copy into a WEEKLY backup folder (in addition to the daily), and once a month into a MONTHLY backup folder. Every day it then deletes the DAILY from 60 days previous (or maybe 30, I may have shrunk it). Either way, I want to retain the long-term backups (YEARLY) forever, monthlies for probably a year or two, and dailies for the past 30 days (since of course I would also have the weeklies for the past year).

I just can't let any files get deleted from the main server that rsyncs to this server, because that would delete the corresponding files in the IN folder on the FreeNAS, and then ultimately they wouldn't be in the backups. I'm just trying to keep ALL FILES that ever get rsynced into the IN folder, long term.

I'm not sure how to sanely put this together beyond what I've done, which seems very basic and logical: long-term storage (monthly backups), medium-term storage (weeklies), and short-term storage (dailies).
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
FYI, rsync has too many options, one of which is not to delete on the destination.
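
That is: by default rsync leaves files on the destination even after they vanish from the source; it only removes them when --delete is passed. A sketch (host and paths illustrative):

Code:
# No --delete flag, so files removed on the remote side remain in IN.
rsync -av backupuser@remote.example.com:/data/ /mnt/VOLUME01/IN/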
 

onlineforums

Explorer
Joined
Oct 1, 2017
Messages
56
FYI, rsync has too many options, one of which is not to delete on the destination.
Yes, this is understood, but we want the "IN" folder on the FreeNAS to be identical (including deletions) to the server it is getting the files from. We would then use the backups, if ever needed (probably not), for the files that were deleted.

Maybe not doing these manual backups and only using snapshots would be the best option, because that would keep the deleted files?
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
How you back up is up to you. Another of rsync's too many options is --inplace, which can be useful on COW filesystems.
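
The reason --inplace can matter on a copy-on-write filesystem like ZFS: by default rsync writes a changed file to a temporary copy and renames it, so every snapshot pins an entire old version of the file; with --inplace only the changed blocks are rewritten, and snapshots hold just those deltas. A sketch (host and paths illustrative):

Code:
# Rewrite changed blocks within existing files rather than replacing
# whole files, which keeps ZFS snapshot deltas small.
rsync -av --inplace --delete backupuser@remote.example.com:/data/ /mnt/VOLUME01/IN/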
 

onlineforums

Explorer
Joined
Oct 1, 2017
Messages
56
Any input or feedback on a better method, based on what I am doing, that accomplishes the same thing while using less storage?
 

Jailer

Not strong, but bad
Joined
Sep 12, 2014
Messages
4,977
I'm not sure how to sanely put this together beyond what I've done, which seems very basic and logical: long-term storage (monthly backups), medium-term storage (weeklies), and short-term storage (dailies).
You can set up snapshot tasks to do all of these. Duplicating the data over and over again seems rather wasteful.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
A better approach is to take snapshots of your IN dataset. Daily, weekly, monthly, etc., set to expire. Done.

And no dupes.
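
For illustration, the periodic snapshot tasks in the FreeNAS GUI boil down to something like this under the hood (names and retention here are hypothetical):

Code:
zfs snapshot VOLUME01/IN@daily-2017-11-03    # e.g. expire after 30 days
zfs snapshot VOLUME01/IN@weekly-2017-W44     # e.g. expire after 1 year
zfs snapshot VOLUME01/IN@monthly-2017-11     # e.g. keep long term
zfs destroy VOLUME01/IN@daily-2017-10-03     # an expired daily being removed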
 

garm

Wizard
Joined
Aug 19, 2017
Messages
1,556
And if you want a duplicate on separate media, then just copy the content of a snapshot there, at any time.
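
Snapshots are browsable read-only through the hidden .zfs directory, so the copy can be made with ordinary tools (snapshot name and destination are placeholders):

Code:
# Copy the contents of one daily snapshot of IN to external media.
cp -a /mnt/VOLUME01/IN/.zfs/snapshot/daily-2017-11-03/. /mnt/external-disk/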
 