Alex White
Cadet
Joined: Jul 12, 2017
Messages: 3
Hi all,
I've done quite a lot of searching on this, but most forum threads seem to recommend compression over deduplication. I don't think compression would work well for our requirements, so I've signed up to hopefully find out a bit more. We use NextCloud and we're looking to upgrade our storage backend, currently a Synology box. The way NextCloud stores previous versions of files is by simply keeping another full copy that is nearly identical to the current one (i.e. a Word document with one extra character is stored as two complete files: the old version without the new character and the current version with it).
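To get a feel for how much of that versioning overhead block-level deduplication could reclaim, here is a rough sketch I'd run against a copy of the data. It is not NextCloud- or ZFS-specific; the chunk size and directory path are placeholder assumptions, and fixed-size chunking will understate what variable-size chunking (which Windows Server dedup uses) can achieve, since an inserted byte shifts every later chunk boundary.

```python
import hashlib
import os

CHUNK_SIZE = 128 * 1024              # assumed 128 KiB fixed chunks
DATA_DIR = "/srv/nextcloud/data"     # hypothetical path to the NextCloud data directory

total_bytes = 0
unique_chunks = {}  # chunk hash -> chunk length

for root, _dirs, files in os.walk(DATA_DIR):
    for name in files:
        path = os.path.join(root, name)
        try:
            with open(path, "rb") as f:
                while True:
                    chunk = f.read(CHUNK_SIZE)
                    if not chunk:
                        break
                    total_bytes += len(chunk)
                    # identical chunks hash to the same key, so they are only counted once
                    unique_chunks[hashlib.sha256(chunk).hexdigest()] = len(chunk)
        except OSError:
            continue  # skip unreadable files

deduped_bytes = sum(unique_chunks.values())
if total_bytes:
    print(f"Logical data:      {total_bytes / 1e12:.2f} TB")
    print(f"Unique chunks:     {deduped_bytes / 1e12:.2f} TB")
    print(f"Estimated savings: {100 * (1 - deduped_bytes / total_bytes):.1f}%")
```

Treat the result as a lower bound on the savings a real dedup engine would report.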
We currently have around 30TB of data in NextCloud and it's growing at around 1TB per week, so we're looking to build a new system capable of storing around 300TB. I've done tests with Windows Server 2016: on 5TB of real-world user data our deduplication rate was around 66%, with memory usage still under 4GB including Windows Server itself. Looking at ZFS, people seem to suggest 1GB of RAM for every 1TB of deduplicated data, which would mean quite a high-spec server for our eventual 300TB data set. I would prefer to go with Linux if possible, but I don't see how we could get the same reduction with compression, or afford the memory required for deduplication.
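For reference, that ZFS rule of thumb comes from the in-core dedup table (DDT): one entry per unique block, commonly quoted at roughly 320 bytes of RAM each. A quick back-of-envelope (the per-entry size and the average block sizes are assumptions, and it treats every block as unique, so it's a worst case):

```python
# Rough ZFS dedup-table (DDT) RAM estimate.
# Assumptions: ~320 bytes of RAM per DDT entry (commonly quoted figure),
# all blocks unique (worst case), average block sizes as listed below.

BYTES_PER_DDT_ENTRY = 320            # assumption
POOL_SIZE_TB = 300                   # eventual data set from the post
POOL_SIZE_BYTES = POOL_SIZE_TB * 10**12

for avg_block in (64 * 1024, 128 * 1024, 1024 * 1024):  # 64K, 128K, 1M average block size
    entries = POOL_SIZE_BYTES / avg_block
    ram_gb = entries * BYTES_PER_DDT_ENTRY / 10**9
    print(f"avg block {avg_block // 1024:>4} KiB -> ~{entries / 1e9:.2f} billion entries, ~{ram_gb:,.0f} GB RAM")
```

Duplicate blocks share a DDT entry, so the real figure would be lower, but it shows why the RAM requirement scales with the size of the pool rather than with how busy it is.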
Performance is not an issue for us, as most of our users upload over 1Mbps to 20Mbps broadband connections. We simply need the most efficient way to store the data. Is there any way we could get the same level of data savings on Linux without using heaps of RAM, or are we stuck with Windows? Any help would be much appreciated.
Many thanks
Alex