Why "Optimal" number of disks?

Status
Not open for further replies.

Stanri010

Explorer
Joined
Apr 15, 2014
Messages
81
So the general recommendation for the number of disks goes like this:

  • Start a RAIDZ1 at 3, 5, or 9 disks.
  • Start a RAIDZ2 at 4, 6, or 10 disks.
  • Start a RAIDZ3 at 5, 7, or 11 disks.
Based on everything I've been able to find, the recommendation is based on minimizing the amount of overhead due to parity, or in other words, minimizing the amount of wasted space.

4-disk RAID-Z2 = 128KiB / 2 = 64KiB = good
5-disk RAID-Z2 = 128KiB / 3 = ~43KiB = BAD!
6-disk RAID-Z2 = 128KiB / 4 = 32KiB = good
10-disk RAID-Z2 = 128KiB / 8 = 16KiB = good
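
To spell out the arithmetic (just the naive stripe math, ignoring sector padding; real allocation also depends on recordsize and ashift), here's a quick sketch:

Code:
#include <stdio.h>

/* Divide a 128 KiB record across the data disks of a RAIDZ2 vdev
   and check whether each disk gets an even, power-of-two chunk. */
int main(void)
{
    const int widths[] = { 4, 5, 6, 8, 10 };     /* total disks in the vdev */

    for (unsigned i = 0; i < sizeof(widths) / sizeof(widths[0]); i++) {
        int data_disks = widths[i] - 2;          /* RAIDZ2 uses 2 parity disks */
        printf("%2d-disk RAID-Z2 = 128KiB / %d = %.1f KiB per disk\n",
               widths[i], data_disks, 128.0 / data_disks);
    }
    return 0;
}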

However, there are articles floating around that pretty much say the recommendation is not all that important, either because of compression or because the impact is small with different block sizes.

ZFS stripe width: http://blog.delphix.com/matt/2014/06/06/zfs-stripe-width/

4k overhead: https://web.archive.org/web/2014040...s.org/ritk/zfs-4k-aligned-space-overhead.html

Calculator: https://jsfiddle.net/Biduleohm/hfqdpbLm/8/embedded/result/

So am I correct to understand that if I run an 8-drive RAIDZ2 vdev, the only downside is that I might be losing out on a little bit of theoretical space? There shouldn't be any throughput or IOPS performance hits, right?

EDIT: I guess my question stems from the fact that the enclosures I've got generally hold drives in multiples of 4 per cage, and the HBA has one SAS breakout to 4 SATA ports. For future dependability, and using RAIDZ2, multiples of 8 are ideal for my situation.

Between the following two configurations, 8 and 10 drives, the overhead difference doesn't seem to be that much. Or is the calculator wrong?

[Screenshots: calculator results for the 8-disk and 10-disk RAIDZ2 configurations]
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
I clearly say in the first post of the calculator's thread that the block overhead figure is experimental, and that I created another thread to ask how to calculate it precisely (yeah, it's titled "checksums overhead" because at first I wanted to ask about that, but then the thread turned into a discussion of all the types of overhead). Moreover, by "block overhead" I initially meant the space lost because a file is smaller than the smallest block, but I think it's ridiculously small for a home server, since nearly every file is bigger than 4 kB.

There is also the allocation overhead, which I haven't added for now because it's more or less the same issue: a member gave me a formula, but it seems it's not the right formula anymore because of some changes in ZFS.

I need to confirm all of this on IRC with the devs; I haven't had the time to do it yet, sorry :)
 

dlavigne

Guest
However, there are articles floating around that pretty much say the recommendation is not all that important, either because of compression or because the impact is small with different block sizes.

Assuming that this is the article you are referring to, it should be noted that it was written by one of the creators of ZFS who is now a maintainer of OpenZFS.
 

Stanri010

Explorer
Joined
Apr 15, 2014
Messages
81
Assuming that this is the article you are referring to, it should be noted that it was written by one of the creators of ZFS who is now a maintainer of OpenZFS.

Yes, that's the article I quoted in my post.

How come there's a strong push to keep everyone on the prescribed optimal number of disks?
 

dlavigne

Guest
Which push? The ZFS Primer section of the FreeNAS Guide refers to it, and the Volume Manager no longer suggests an optimal number of disks (just a minimum).
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
Actually the "power of two number of data drives" rule is used for two different things: the performance and the overhead size. But the rule is useless for the performance if compression is enabled. And the overhead size doesn't impact the perfomance, it's just some overhead on the usable space.

So, you merged two totally different things in this thread; just be careful not to conflate the two ;)
 

Stanri010

Explorer
Joined
Apr 15, 2014
Messages
81
My mistake then. I just see the "rule" (RAIDZ1 at 3, 5, or 9 disks; RAIDZ2 at 4, 6, or 10 disks; RAIDZ3 at 5, 7, or 11 disks) posted everywhere, but never a true explanation as to why, or to what extent you're no longer "optimal". Before posting, I couldn't even figure out whether it was just overhead loss or an actual read/write throughput loss.
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
Yeah, the summary is:

a) The rule isn't useful for the performance side of things if you enable compression (and you should, at least for lz4; it's just utterly awesome)

b) The rule is useful to reduce the overhead size (but I don't know the formula yet, so the gain might be just 0.5%, for example)

You've already linked the excellent article on the performance side of this rule, so you have the answer to "but never a true explanation as to why, or to what extent you're no longer 'optimal'" ;)
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
I'd say that you should enable compression if you plan to store data which can be compressed. If you are only storing compressed backups then there is no need to select compression; however, it doesn't hurt to select it because, as I understand it, if the file you are saving is already compressed then it will not be compressed again, it will just be stored as is. It has something to do with the algorithm (beyond my understanding): if it tries to compress the data and cannot do so without increasing the file size, it just bypasses the compression. This is done with a sample of the file to save, so it is a quick decision. This is as I understand it.
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
Yep, unless you're sure you don't want compression, you should enable it (again, this is true for lz4; gzip isn't the same) because there is almost no performance hit if it can't compress a file (because it's already compressed) ;)
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I'd say that you should enable compression if you plan to store data which can be compressed. If you are only storing compressed backups then there is no need to select compression; however, it doesn't hurt to select it because, as I understand it, if the file you are saving is already compressed then it will not be compressed again, it will just be stored as is. It has something to do with the algorithm (beyond my understanding): if it tries to compress the data and cannot do so without increasing the file size, it just bypasses the compression. This is done with a sample of the file to save, so it is a quick decision. This is as I understand it.

Correct. If a block that is to be written doesn't shrink by 12.5% or more, then the uncompressed block is stored. Saw the code myself about 6 months ago on this topic. :P
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Correct. If a block that is to be written doesn't shrink by 12.5% or more, then the uncompressed block is stored. Saw the code myself about 6 months ago on this topic. :p
12.5%? That seems awfully high, given abundant CPU power. Is that configurable by choosing a different compression level?
 

SirMaster

Patron
Joined
Mar 19, 2014
Messages
241
No, you can't change it.

Code:
size_t
zio_compress_data(enum zio_compress c, void *src, void *dst, size_t s_len)
{
    uint64_t *word, *word_end;
    size_t c_len, d_len;
    zio_compress_info_t *ci = &zio_compress_table[c];

    ASSERT((uint_t)c < ZIO_COMPRESS_FUNCTIONS);
    ASSERT((uint_t)c == ZIO_COMPRESS_EMPTY || ci->ci_compress != NULL);

    /*
     * If the data is all zeroes, we don't even need to allocate
     * a block for it.  We indicate this by returning zero size.
     */
    word_end = (uint64_t *)((char *)src + s_len);
    for (word = src; word < word_end; word++)
        if (*word != 0)
            break;

    if (word == word_end)
        return (0);

    if (c == ZIO_COMPRESS_EMPTY)
        return (s_len);

    /* Compress at least 12.5% */
    d_len = s_len - (s_len >> 3);
    c_len = ci->ci_compress(src, dst, s_len, d_len, ci->ci_level);

    if (c_len > d_len)
        return (s_len);

    ASSERT3U(c_len, <=, d_len);
    return (c_len);
}
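
To put numbers on that threshold (my own worked example, not part of the ZFS source): for the default 128 KiB record, d_len works out to 112 KiB, so if the compressor can't get the block down to 112 KiB or less, the uncompressed block is written.

Code:
#include <assert.h>
#include <stddef.h>

int main(void)
{
    size_t s_len = 131072;                /* a 128 KiB record */
    size_t d_len = s_len - (s_len >> 3);  /* subtract 1/8, i.e. 12.5% */
    assert(d_len == 114688);              /* 112 KiB budget for the compressor */
    return 0;
}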
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
Well, it seems like you could easily change "d_len = s_len - (s_len >> 3);" to "d_len = s_len - (s_len >> 2);", for example. However, I don't know if you can change it to anything other than a power-of-two division (it should be OK, but be careful about performance issues if you do that).
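
For what it's worth (just from reading the code, untested): s_len >> 2 is s_len / 4, so that variant would demand 25% savings instead of 12.5%. A non-power-of-two threshold would simply need a real division, something like these hypothetical variants:

Code:
/* hypothetical edits to the threshold line in zio_compress_data() */
d_len = s_len - (s_len >> 3);   /* stock: keep compressed if >= 12.5% smaller */
d_len = s_len - (s_len >> 2);   /* keep only if >= 25% smaller */
d_len = s_len - (s_len / 10);   /* keep only if >= 10% smaller */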
 

SirMaster

Patron
Joined
Mar 19, 2014
Messages
241
Well, it seems like you could easily change "d_len = s_len - (s_len >> 3);" to "d_len = s_len - (s_len >> 2);", for example. However, I don't know if you can change it to anything other than a power-of-two division (it should be OK, but be careful about performance issues if you do that).

Well sure, if you want to patch the kernel yourself then anything is possible. I meant it wasn't possible to alter it without recompiling ZFS yourself.
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
Ah ok :)
 

DaveFL

Explorer
Joined
Dec 4, 2014
Messages
68
So if you are limited to 6 disks, possibly 8 (6 of which are hot-swap), what would you use?
 

rihad

Cadet
Joined
Jan 6, 2019
Messages
7
So the general recommendation for the number of disks goes like this:
  • Start a RAIDZ1 at 3, 5, or 9 disks.
  • Start a RAIDZ2 at 4, 6, or 10 disks.

This is a weird recommendation from a space overhead perspective. If you have 4 disks and decide to use RAIDZ2 instead of RAIDZ1 to comply with the "rule of thumb", you will only have 2 disks' worth of usable space, not 3. And given that disks age and wear out at more or less the same rate, having 2 spares isn't that much better than having just one. There still might not be enough time for the newly replaced disk to catch up (resilver). Trying to install disks from different manufacturers in the hope that they won't die at about the same time is just too much hassle. Off-site backups should always complement any RAID setup.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
having 2 spares isn't that much better than having just one
That is an incorrect and dangerous statement.


given that disks age and wear out at more or less the same rate
That is far from reality. In the middle of the bathtub curve, disk failures are highly unpredictable and rarely correlate to any measure of wear.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
This is a weird recommendation from a space overhead perspective.
You quoted a post from almost four years ago, which referred to a "general recommendation" that was out of date even then (as the rest of this thread demonstrates). Why?
 