Alex White
Cadet
Joined: Jul 12, 2017
Messages: 3
Hi all,
I've done quite a lot of searching on this, but most forum threads seem to recommend compression over deduplication. I don't think compression would work well for our requirements, so I've signed up to hopefully find out a bit more. We use NextCloud and we're looking to upgrade our storage backend, currently a Synology box. The way NextCloud stores previous versions of files is by simply keeping another full copy that is nearly identical to the current one (i.e. a Word document with one extra character is stored as two complete files: the old version without the new character and the current version with it).
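To get a feel for how much of that versioning overhead block-level deduplication could reclaim, here is a rough sketch I'd run against a copy of the data. It is not NextCloud- or ZFS-specific; the chunk size and directory path are placeholder assumptions, and fixed-size chunking will understate what variable-size chunking (which Windows Server dedup uses) can achieve, since an inserted byte shifts every later chunk boundary.

```python
import hashlib
import os

CHUNK_SIZE = 128 * 1024              # assumed 128 KiB fixed chunks
DATA_DIR = "/srv/nextcloud/data"     # hypothetical path to the NextCloud data directory

total_bytes = 0
unique_chunks = {}  # chunk hash -> chunk length

for root, _dirs, files in os.walk(DATA_DIR):
    for name in files:
        path = os.path.join(root, name)
        try:
            with open(path, "rb") as f:
                while True:
                    chunk = f.read(CHUNK_SIZE)
                    if not chunk:
                        break
                    total_bytes += len(chunk)
                    # identical chunks hash to the same key, so they are only counted once
                    unique_chunks[hashlib.sha256(chunk).hexdigest()] = len(chunk)
        except OSError:
            continue  # skip unreadable files

deduped_bytes = sum(unique_chunks.values())
if total_bytes:
    print(f"Logical data:      {total_bytes / 1e12:.2f} TB")
    print(f"Unique chunks:     {deduped_bytes / 1e12:.2f} TB")
    print(f"Estimated savings: {100 * (1 - deduped_bytes / total_bytes):.1f}%")
```

Treat the result as a lower bound on the savings a real dedup engine would report.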
We currently have around 30TB of data in NextCloud and it's growing at around 1TB per week, so we're looking to build a new system capable of storing around 300TB. I've done tests with Windows Server 2016: on 5TB of real-world user data our deduplication rate was around 66%, with memory usage still under 4GB including Windows Server itself. Looking at ZFS, people seem to suggest 1GB of RAM for every 1TB of deduplicated data, which would mean quite a high-spec server for our eventual 300TB data set. I would prefer to go with Linux if possible, but I don't see how we could get the same reduction with compression, or afford the memory required for deduplication.
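For reference, that ZFS rule of thumb comes from the in-core dedup table (DDT): one entry per unique block, commonly quoted at roughly 320 bytes of RAM each. A quick back-of-envelope (the per-entry size and the average block sizes are assumptions, and it treats every block as unique, so it's a worst case):

```python
# Rough ZFS dedup-table (DDT) RAM estimate.
# Assumptions: ~320 bytes of RAM per DDT entry (commonly quoted figure),
# all blocks unique (worst case), average block sizes as listed below.

BYTES_PER_DDT_ENTRY = 320            # assumption
POOL_SIZE_TB = 300                   # eventual data set from the post
POOL_SIZE_BYTES = POOL_SIZE_TB * 10**12

for avg_block in (64 * 1024, 128 * 1024, 1024 * 1024):  # 64K, 128K, 1M average block size
    entries = POOL_SIZE_BYTES / avg_block
    ram_gb = entries * BYTES_PER_DDT_ENTRY / 10**9
    print(f"avg block {avg_block // 1024:>4} KiB -> ~{entries / 1e9:.2f} billion entries, ~{ram_gb:,.0f} GB RAM")
```

Duplicate blocks share a DDT entry, so the real figure would be lower, but it shows why the RAM requirement scales with the size of the pool rather than with how busy it is.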
Performance is not an issue for us, as most of our users upload over 1Mbps to 20Mbps broadband connections. We simply need the most efficient way to store the data. Is there any way we could get the same level of data savings on Linux without using heaps of RAM, or are we stuck with Windows? Any help would be much appreciated.
Many thanks
Alex