4TB of Free Space borrowed by SOMETHING in only 12 Hours.. But What?

RchGrav

Dabbler
Joined
Feb 21, 2014
Messages
36
Version: FreeNAS-11.2-U2.1

I'll let the pictures speak for themselves, but ZFS shows 4TB being eaten away in a very short amount of time. Over this period I was generating Plex thumbnails for existing movies, yet the dataset holding those thumbnails does not show any anomalies. Well, aside from the fact that when I check the properties in Windows there is a HUGE disparity between Size and Size on disk, but that is logical to me, since Size matches the ZFS statistics. I ran a few commands at the console to eliminate the possibility of a GUI issue; the reports in the legacy and new GUI indicate the same progressive loss of space over the past 12 hours.
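
For reference, these were roughly the sorts of console checks I mean (the pool is named puddle; the exact invocations may have differed slightly):

Code:
# overall pool capacity vs. per-dataset accounting
zpool list puddle
zfs list -o space -r puddle
# what the mounted filesystem reports
df -h /mnt/puddle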

Aside from the thumbnail generation there is also an rsync task configured to replicate approximately 7TB of data, but it has completed. I haven't shut it off yet, but it has been done for a day or so. There was some file renaming done on the receiving side for organization purposes on my Plex server.

Now, following a reboot the available space does seem to correct itself. So my question is this: what exactly about ZFS causes this MASSIVE drift in the available space being reported? Is there some kind of temp file for rsync, or some hidden metadata, gradually taking up 4TB of temporary space, or what? Is there a command capable of uncovering where this space is going, or what is consuming it? Can this space be reclaimed with a command, without needing to reboot the system? Would the space return on its own if ZFS were just left to its own devices? There MUST be some command or something to uncover what is going on here.

I've searched the posts here and see other people with similar threads about space disparities being resolved by a reboot, but there isn't really a clear answer. I'm interested in getting to the bottom of this. A 4TB disparity is big; there has to be some command capable of showing what is temporarily using this space, and of reclaiming it without rebooting.

I was about to disable the thumbnail process and the rsync task to stop this from happening, but that MAY not be the best approach here. I'll leave everything configured as-is for the moment so I can test any suggestions if anyone has any.

#1 Command or method to show what is temporarily using the space and how much?

#2 Command or method to reclaim free space without rebooting.

At the rate my free space was being consumed, I would have run out entirely in a very short time.

I haven't set up snapshots yet... My space went from 19.5TB to 15.5TB overnight without adding any data...


[Attachment: dash.JPG]


Space being eaten overnight, then returning after a reboot:
[Attachment: media space 3.JPG]

Code:
NAME                                                     AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
puddle                                                   14.9T  20.2T         0     88K              0      20.2T
puddle/.system                                           14.9T  46.7M         0     96K              0      46.6M
puddle/.system/configs-02e83c25ec534b63a4d545000fc9a19d  14.9T  3.40M         0   3.40M              0          0
puddle/.system/cores                                     14.9T  5.26M         0   5.26M              0          0
puddle/.system/rrd-02e83c25ec534b63a4d545000fc9a19d      14.9T  36.9M         0   36.9M              0          0
puddle/.system/samba4                                    14.9T   276K         0    276K              0          0
puddle/.system/syslog-02e83c25ec534b63a4d545000fc9a19d   14.9T   724K         0    724K              0          0
puddle/.system/webui                                     14.9T    88K         0     88K              0          0
puddle/apps                                              14.9T  20.0G         0   20.0G              0          0
puddle/home                                              14.9T    88K         0     88K              0          0
puddle/iocage                                            14.9T  2.81G         0   3.70M              0      2.81G
puddle/iocage/download                                   14.9T   272M         0     88K              0       272M
puddle/iocage/download/11.2-RELEASE                      14.9T   272M         0    272M              0          0
puddle/iocage/images                                     14.9T    88K         0     88K              0          0
puddle/iocage/jails                                      14.9T  1.54G         0     96K              0      1.54G
puddle/iocage/jails/plex                                 14.9T   296M         0     96K              0       295M
puddle/iocage/jails/plex/root                            14.9T   295M         0    295M              0          0
puddle/iocage/jails/radarr                               14.9T   484M         0     96K              0       484M
puddle/iocage/jails/radarr/root                          14.9T   484M         0    484M              0          0
puddle/iocage/jails/sabnzbd                              14.9T   245M         0     96K              0       245M
puddle/iocage/jails/sabnzbd/root                         14.9T   245M         0    245M              0          0
puddle/iocage/jails/sonarr                               14.9T   458M         0     96K              0       458M
puddle/iocage/jails/sonarr/root                          14.9T   458M         0    458M              0          0
puddle/iocage/jails/syncovery                            14.9T  96.4M         0     92K              0      96.3M
puddle/iocage/jails/syncovery/root                       14.9T  96.3M         0   96.3M              0          0
puddle/iocage/log                                        14.9T   116K         0    116K              0          0
puddle/iocage/releases                                   14.9T  1.00G         0     88K              0      1.00G
puddle/iocage/releases/11.2-RELEASE                      14.9T  1.00G         0     88K              0      1.00G
puddle/iocage/releases/11.2-RELEASE/root                 14.9T  1.00G     1.08M   1024M              0          0
puddle/iocage/templates                                  14.9T    88K         0     88K              0          0
puddle/media                                             14.9T  20.2T         0   20.2T              0          0

No Snapshots to speak of
Code:
NAME                                                 USED  AVAIL  REFER  MOUNTPOINT
freenas-boot/ROOT/11.2-U2.1@2005-01-01-01:49:47     2.19M      -   755M  -
freenas-boot/ROOT/11.2-U2.1@2019-03-25-22:36:53     2.29M      -   755M  -
puddle/iocage/releases/11.2-RELEASE/root@plex        240K      -  1024M  -
puddle/iocage/releases/11.2-RELEASE/root@sonarr      240K      -  1024M  -
puddle/iocage/releases/11.2-RELEASE/root@radarr      240K      -  1024M  -
puddle/iocage/releases/11.2-RELEASE/root@syncovery   240K      -  1024M  -
puddle/iocage/releases/11.2-RELEASE/root@sabnzbd     144K      -  1024M  -
 

Attachments: media space.JPG, media space 2.JPG

RchGrav

Dabbler
Joined
Feb 21, 2014
Messages
36
Snapshots?
No, no snapshots. None set up yet.

I'm fully perplexed... Does rsync use up space that then gets reclaimed on a reboot? I can't figure this out.

I see other threads discussing similar disparities that get resolved by a reboot. Do I need to schedule reboots? That doesn't seem right.


Code:
# zfs list -t snapshot
NAME                                                 USED  AVAIL  REFER  MOUNTPOINT
freenas-boot/ROOT/11.2-U2.1@2005-01-01-01:49:47     2.19M      -   755M  -
freenas-boot/ROOT/11.2-U2.1@2019-03-25-22:36:53     2.29M      -   755M  -
puddle/iocage/releases/11.2-RELEASE/root@plex        240K      -  1024M  -
puddle/iocage/releases/11.2-RELEASE/root@sonarr      240K      -  1024M  -
puddle/iocage/releases/11.2-RELEASE/root@radarr      240K      -  1024M  -
puddle/iocage/releases/11.2-RELEASE/root@syncovery   240K      -  1024M  -
puddle/iocage/releases/11.2-RELEASE/root@sabnzbd     144K      -  1024M  -
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Well, go through the files in that dataset and figure out what is taking up so much space.
 

RchGrav

Dabbler
Joined
Feb 21, 2014
Messages
36
Well, go through the files in that dataset and figure out what is taking up so much space.
I did that... do you think that's not the very first thing I did?

Let's see: I did an aging report on the stuff added to the filesystem, and only 49GB was added in the past 24 hours, not 4TB.
[Attachment: chart.JPG (file aging report)]


I also analyzed the filesystem deeply in a number of ways, including searching for hidden objects and reviewing the overall structure and usage reports. The df command said 20T, everything reported by ZFS said 20T, but I was only able to find 16TB of files in the filesystem. If it were files taking up the space, why would the space come back after a reboot?
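
To give an idea, the comparison boiled down to something like this (illustrative rather than the exact commands I ran; puddle/media is where the bulk of the data lives):

Code:
# sum of the actual files on disk (slow on ~20T of media)
du -sh /mnt/puddle/media
# what ZFS says the same dataset is using
zfs list -o name,used,referenced puddle/media
# what df reports for the pool
df -h /mnt/puddle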

@Ericloewe What method of scanning the dataset would you recommend?
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Does the reboot take a very long time, waiting on ZFS to import the pool?
 

RchGrav

Dabbler
Joined
Feb 21, 2014
Messages
36
I started writing this post when I saw 4TB was missing overnight, researching the situation at the same time...

When I started finding posts saying that a reboot brings back the free space, I documented as much as I could before doing the actual reboot, so I could try to get to the bottom of exactly what was going on here.

Obviously I could say "problem resolved, I rebooted and my space is back", but the part of my brain which needs "understanding" is fighting me on that.

I guess ultimately it doesn't matter, and it all comes down to how ZFS accounts for its space. But given that a reboot seems to shore up the space accounting, it leaves me wondering whether there is a way to trigger that without exporting and re-importing the pool.

Some backstory: I just spent a chunk of change buying another 2x 10TB drives after an initial 6x 10TB drives. (I went with mirroring instead of parity, so I'm already sort of weeping over the capacity.) Seeing half of the space you just added to your pool disappear in 12 hours, when no new data was copied into the pool during that period, is what I would simply describe as a "Wait... WTF" moment which calls for more understanding.

If what is required to get accurate usage numbers is a full export and re-import of the ZFS pool, then so be it. I just thought that, given a situation which is possibly reproducible and my inability to find an answer by searching, it might be good to get to the bottom of it.

Given what I know beyond a doubt (a reboot resolved a 4TB discrepancy in space accounting), what are the candidates for what was going on?

Again, nothing critical here, just wondering. I'll continue to research on my own as well.

Rich
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
This almost sounds like ZFS falling behind on freeing blocks after files are rewritten or deleted; that process must be completed at import time if it was interrupted. But I'm not sure df would reflect that...

Do you have atime=on? That would cause a massive quantity of rewrites if you ran rsync on a directory with tons of files.
 

RchGrav

Dabbler
Joined
Feb 21, 2014
Messages
36
This almost sounds like ZFS falling behind on freeing blocks after files are rewritten or deleted; that process must be completed at import time if it was interrupted. But I'm not sure df would reflect that...

Do you have atime=on? That would cause a massive quantity of rewrites if you ran rsync on a directory with tons of files.

I'll check that. I didn't modify the atime setting, so it would theoretically be at whatever default 11.2 ships with.
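
For anyone following along, I believe checking it (and changing it, if needed) goes something like this, with puddle/media being the dataset rsync writes into:

Code:
# show whether atime updates are enabled on the dataset
zfs get atime puddle/media
# turn them off (my assumption: this is safe for a media-only dataset)
zfs set atime=off puddle/media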


So if ZFS were left to its own devices, would we expect the space accounting to revert and the 4TB to be reclaimed, given enough patience?

That's all I really need to know, actually. Then I could watch for that to happen; I would be interested to see it take place.
 

Death Dream

Dabbler
Joined
Feb 18, 2019
Messages
24
I had something happening to me that sounds very similar to what you are seeing. I have two 120GB SSDs in a mirrored setup. Most of my plugins live there, but I also have a single dataset that is used for my downloads. This dataset has a high turnaround of downloading and then moving files off of it. The thing is, the dataset is completely empty when not in use, yet it was acting like it was completely full. Rebooting the server fixed the issue and it now shows as empty. Nothing I read really explained what was happening; everything kept pointing at snapshots being the issue, and that wasn't the case.

I'm not a fan of having to reboot the server to get the free space back.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
A reboot shouldn't fix anything; it's just that ZFS's background deletion requires that any pending frees be completed before the pool finishes importing.

It'd be interesting to get some numbers on how that process is coming along in the background. zpool get freeing poolname should give an idea of what's going on.
 

Death Dream

Dabbler
Joined
Feb 18, 2019
Messages
24
I'm sure this will change as my dataset within this pool increases and decreases in size.

Code:
NAME        PROPERTY  VALUE    SOURCE
MiscMirror  freeing   0        default
 

RchGrav

Dabbler
Joined
Feb 21, 2014
Messages
36
A reboot shouldn't fix anything; it's just that ZFS's background deletion requires that any pending frees be completed before the pool finishes importing.

It'd be interesting to get some numbers on how that process is coming along in the background. zpool get freeing poolname should give an idea of what's going on.

Thanks @Ericloewe this is more like the kind of stuff I was looking for when I made the initial post.
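
If I catch the space drifting again, I'll poll that property while it is happening, probably with a simple loop along these lines:

Code:
# report pending background frees every 10 seconds
while true; do
    date
    zpool get freeing puddle
    sleep 10
done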
 

Death Dream

Dabbler
Joined
Feb 18, 2019
Messages
24
Restarting the jail that is doing all the downloading fixes this issue too. I forgot to update this thread once I found that out. It's easier than rebooting the whole box.
 