So... what comes after RAID-Z3?


RedBear

Explorer
Joined
May 16, 2015
Messages
53
After reading the online article about how (and why) RAID5 "died" in 2009, and seeing a reference in the newbie slideshow that even RAID-Z3 will stop being able to guarantee data protection as soon as 2019, I am very curious as to what comes next. If I'm understanding things correctly, this is happening because individual drive data density is becoming so high that an unrecoverable read error (and/or a second or third drive failure) becomes ever more probable during the ever-longer rebuild process.
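To put rough numbers on that argument, here's a quick back-of-the-envelope calculation (the 10^-14 URE rate and the drive sizes are illustrative assumptions, not vendor specs):

```python
# Back-of-the-envelope math behind the "RAID5 died in 2009" argument.
# Assumption (illustrative): consumer drives quoted at one unrecoverable
# read error per 1e14 bits read. A degraded RAID5 rebuild must read every
# surviving drive end to end without hitting a single URE.
import math

URE_PER_BIT = 1e-14

def p_ure_during_rebuild(surviving_drives, drive_tb):
    """Probability of at least one URE while reading all surviving drives."""
    bits_read = surviving_drives * drive_tb * 1e12 * 8
    return -math.expm1(bits_read * math.log1p(-URE_PER_BIT))

for drives, tb in [(4, 1), (4, 2), (7, 6), (11, 10)]:
    print(f"{drives + 1} x {tb}TB RAID5, one drive dead: "
          f"P(URE during rebuild) ~ {p_ure_during_rebuild(drives, tb):.0%}")
```

The exponent scales with the total bits read, which is why bigger drives (and wider arrays) push the failure probability toward certainty.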

Are we looking at a future where we have to stick to sub-10TB drives, or need ten fully redundant mirrors of each 25TB drive just to guarantee long-term data retention? Genuinely curious here, since the whole reason I'm getting into FreeNAS/ZFS in the first place is to build myself storage systems that will take care of my data for the coming decades. Surely there is a practical limit to new, higher RAID-Z levels; I've certainly heard no mention of RAID-Z4 or 5 even being in development. And yet, as the slideshow says, RAID-Z3 will soon "fail" at its task. Eventually we'll get stuck in a loop of arrays failing and then the backup array failing before we can even complete the restoration process.

Seems like the relatively near future of computing actually calls for something far more resilient and recoverable than ZFS, or data storage costs are going to start ballooning within a decade or so. Perhaps we need filesystems that happily allow a few unrecoverable bit errors here and there rather than freaking out and losing 100TB of data. (Although, I guess we already have plenty of filesystems that do that, amirite? *wink, wink, nudge, nudge*)

Or am I misunderstanding the whole mathematical inevitability of the future bit-storage situation?
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
I'd say 2019 sounds very pessimistic; I'm not even sure that RAID-Z2 will be dead by 2019...
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
This whole discussion is maddening.

RAID-Z is not dead (not even close), RAID-Z2 is not even remotely threatening to be dead, and RAID-Z3 won't be dead before any of you are collecting social security checks.

Now of course, I certainly don't recommend someone put 5x6TB drives in RAID-Z. Or 8x2TB drives in RAID-Z. That would, in fact, be stupid.

But blanket soundbites that unsophisticated people take as gospel from a nerd are really quite damaging. "RAID-Z is dead", with its implication that there is no situation and no pool for which RAID-Z is correct, has been everywhere for a few years and is one such statement. RAID-Z is no more dead than Cyberjock's mom.

Depending on the size of drives in the array, the accuracy of the stated failure rates, the number of drives in the vdev, and all that other jazz, RAID-Z, in fact, may be alive and well. For example, in a small FreeNAS, say 3x3TB, I wouldn't even *REMOTELY* hesitate to run that on RAID-Z. Like anything else, you have to understand the risks, you have to assess where your particular ability to administer your server and maintain your hard drives will fall on the continuum, you have to understand how your particular collection of vdevs and/or pools crosses the risk axis, and so on.

*I* run RAID-Z, gladly, on one of my own vdevs, which has 3x2TB drives.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
The real problem with a hypothetical RAIDZ4 is that you need a set of functions that allow you to reconstruct the original two blocks' worth of data from any two of the six blocks generated from them (two data blocks plus four parity).
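For the curious, here is a toy sketch of the kind of math that makes "any k of n" reconstruction work: Reed-Solomon-style encoding over GF(2^8). This is my own illustration of the principle, not ZFS's actual parity layout.

```python
# Toy erasure code over GF(2^8): k data bytes become n coded bytes such that
# ANY k survivors reconstruct the data. Demonstration only; not ZFS's code.

EXP, LOG = [0] * 512, [0] * 256
x = 1
for i in range(255):                  # log/antilog tables for polynomial 0x11d
    EXP[i], LOG[x] = x, i
    x <<= 1
    if x & 0x100:
        x ^= 0x11D
for i in range(255, 512):
    EXP[i] = EXP[i - 255]

def gmul(a, b):
    return 0 if 0 in (a, b) else EXP[LOG[a] + LOG[b]]

def encode(data, n):
    """Evaluate the data polynomial at n distinct field points (Horner)."""
    out = []
    for j in range(n):
        acc = 0
        for c in reversed(data):
            acc = gmul(acc, EXP[j]) ^ c
        out.append(acc)
    return out

def recover(survivors, k):
    """Rebuild the k data bytes from any k (index, value) survivor pairs."""
    rows = []
    for j, v in survivors[:k]:
        row = [1]
        for _ in range(k - 1):
            row.append(gmul(row[-1], EXP[j]))
        rows.append(row + [v])
    for col in range(k):              # Gaussian elimination; addition is XOR
        piv = next(r for r in range(col, k) if rows[r][col])
        rows[col], rows[piv] = rows[piv], rows[col]
        inv = EXP[255 - LOG[rows[col][col]]]
        rows[col] = [gmul(inv, e) for e in rows[col]]
        for r in range(k):
            if r != col and rows[r][col]:
                f = rows[r][col]
                rows[r] = [e ^ gmul(f, p) for e, p in zip(rows[r], rows[col])]
    return [rows[i][k] for i in range(k)]

data = [42, 197]                      # two data blocks (k = 2)
coded = encode(data, 6)               # six coded blocks (a "RAIDZ4"-ish n = 6)
assert recover([(1, coded[1]), (4, coded[4])], 2) == data   # any 2 of 6 work
```

Any pair of surviving blocks pins down the degree-1 data polynomial, which is exactly the "set of functions" property: the heavy lifting is just field arithmetic plus a small linear solve.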
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Ericloewe said:
The real problem with a hypothetical RAIDZ4 is that you need a set of functions that allow you to reconstruct the original two blocks' worth of data from any two of the six blocks generated from them (two data blocks plus four parity).

That's easily achieved from a coding perspective. But the CPU cost of computing the parity grows steeply with each additional parity level, so RAIDZ4 might be pretty ugly. I think that instead of RAIDZ4 you'd be better off with smaller RAIDZ3 vdevs. The space lost to parity, whether in one larger RAIDZ4 vdev or several smaller RAIDZ3 vdevs, should basically even out if you are trying to even out the risk. ;)
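For a feel of that space trade-off, here's some trivial arithmetic (the layouts are arbitrary examples, not recommendations); how closely the overheads "even out" depends on the widths you pick:

```python
# Raw space lost to parity: one wide RAIDZ4 vdev vs. two narrower RAIDZ3
# vdevs built from the same number of drives. Example layouts only.
layouts = [
    ("1 x 12-wide RAIDZ4", 1, 12, 4),
    ("2 x 6-wide RAIDZ3 ", 2, 6, 3),
    ("1 x 16-wide RAIDZ4", 1, 16, 4),
    ("2 x 8-wide RAIDZ3 ", 2, 8, 3),
]
for label, vdevs, width, parity in layouts:
    total = vdevs * width
    lost = vdevs * parity / total
    print(f"{label}: {total} drives, {lost:.0%} of raw capacity is parity")
```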
 

RedBear

Explorer
Joined
May 16, 2015
Messages
53
DrKK said:
This whole discussion is maddening.

RAID-Z is not dead (not even close), RAID-Z2 is not even remotely threatening to be dead, and RAID-Z3 won't be dead before any of you are collecting social security checks.

Now of course, I certainly don't recommend someone put 5x6TB drives in RAID-Z. Or 8x2TB drives in RAID-Z. That would, in fact, be stupid.

But blanket soundbites that unsophisticated people take as gospel from a nerd are really quite damaging. "RAID-Z is dead", with its implication that there is no situation and no pool for which RAID-Z is correct, has been everywhere for a few years and is one such statement. RAID-Z is no more dead than Cyberjock's mom.

Depending on the size of drives in the array, the accuracy of the stated failure rates, the number of drives in the vdev, and all that other jazz, RAID-Z, in fact, may be alive and well. For example, in a small FreeNAS, say 3x3TB, I wouldn't even *REMOTELY* hesitate to run that on RAID-Z. Like anything else, you have to understand the risks, you have to assess where your particular ability to administer your server and maintain your hard drives will fall on the continuum, you have to understand how your particular collection of vdevs and/or pools crosses the risk axis, and so on.

*I* run RAID-Z, gladly, on one of my own vdevs, which has 3x2TB drives.

Well... um, duh?

I did attempt to make it abundantly clear that I was only referring to the largest available hard drives (6/8TB) and the even larger drives (10/15/25TB) that may become available in the next decade or so. Their data density is so high that they are outrunning their own error rates: an unrecoverable read error becomes likely within the time it takes to rebuild a failed drive in the array. I believe that was the point of the original article declaring RAID5 "dead" in 2009. It's nice that you still have a purpose for a 3-disk RAID-Z set of 2TB drives, but the future will need much more space than that. Of course, if you stick with historical drive sizes you can also stick with historical RAID levels like RAID5/RAID-Z, and soon RAID-Z2. But as time marches onward, individual drives will get so data-dense that even RAID-Z3 levels of data protection will be essentially a gamble rather than a near certainty. In other words, RAID-Z3 will eventually be "dead" for practical purposes. Whether that occurs in 5 years or 15 years is a quibble.

As I suspected, the primary solution being proposed, nonchalantly, is simply adding more full mirrors of each disk. This of course increases costs much faster than adding another Z-level of redundancy to a multi-disk pool. I don't think having, as I said before, ten full mirrors of each single 25TB hard drive will ever be economically feasible for anyone. Even enterprise users are going to start balking at data storage costs that increase geometrically every few years.

cyberjock said:
That's easily achieved from a coding perspective. But the CPU cost of computing the parity grows steeply with each additional parity level, so RAIDZ4 might be pretty ugly. I think that instead of RAIDZ4 you'd be better off with smaller RAIDZ3 vdevs. The space lost to parity, whether in one larger RAIDZ4 vdev or several smaller RAIDZ3 vdevs, should basically even out if you are trying to even out the risk. ;)

And finally Cyberjock (the author of the PowerPoint slideshow claiming RAID-Z3 will be dying somewhere around 2019) weighs in. This pretty much confirms my suspicion that there are practical limits to going to higher Z-levels, due to the calculations required. There's a point of diminishing returns where it becomes more feasible to simply throw more physical storage hardware at the problem. But then we run into issues with money, electricity, and physical space, not just at home but in data centers as well.

Of course, we seem to be approaching the physical limits of data density in spinning-disk hard drives, so maybe we'll never actually see one larger than 10 or 12TB. Then again, someone just released a 6TB SSD, so maybe ultra-high-capacity SSDs will be the norm in 20 years. But SSDs have read errors too, so ultimately we're left with the same problem. There is simply a practical limit to how dense we can make our data before various forms of entropy try to take it away from us, and no one seems to have an answer to that mathematical brick wall besides "just use more mirrors dewd".

It's an interesting conundrum, and it will be interesting to see how the data storage industry uses that magical word "innovation" to someday solve it. If they can.
 

mjws00

Guru
Joined
Jul 25, 2014
Messages
798
When drive density starts to approach the mathematical limits imposed by UREs, it would seem to me that we will build error correction into the device. I don't know what that looks like, but there is no reason why we couldn't have some form of onboard RAID built in: a tiny dedicated proc doing onboard z2- or z3-like correction and parity, such that all you see is one huge, reliable, fast device... By the time we are approaching these limits, they should be able to incorporate tiered storage as well. Think hybrid NVRAM, SSD, and HDD with built-in z3-like characteristics. Chain 'em all up via whatever uber bus we're on... hell, maybe they are all quantum and wireless. Maybe it is local, maybe it is cloud... maybe everything is "live" and streamed and only a few data centers have to consider the problem.

At some point you will have all the storage you want, as fast as you can afford, and it will 'just work'. My kids already have it... they just have some old prick in the middle making it work vs buying off the shelf or signing up.

Density does not increase in a vacuum.
 

RedBear

Explorer
Joined
May 16, 2015
Messages
53
mjws00 said:
When drive density starts to approach the mathematical limits imposed by UREs, it would seem to me that we will build error correction into the device. I don't know what that looks like, but there is no reason why we couldn't have some form of onboard RAID built in: a tiny dedicated proc doing onboard z2- or z3-like correction and parity, such that all you see is one huge, reliable, fast device... By the time we are approaching these limits, they should be able to incorporate tiered storage as well. Think hybrid NVRAM, SSD, and HDD with built-in z3-like characteristics. Chain 'em all up via whatever uber bus we're on... hell, maybe they are all quantum and wireless. Maybe it is local, maybe it is cloud... maybe everything is "live" and streamed and only a few data centers have to consider the problem.

At some point you will have all the storage you want, as fast as you can afford, and it will 'just work'. My kids already have it... they just have some old prick in the middle making it work vs buying off the shelf or signing up.

Density does not increase in a vacuum.

On the surface that would seem to make sense, but I think you're engaging in a bit of magical thinking; when you follow it through, it doesn't hold up. Hard drives already contain basic error detection and correction, plus hot-spare sectors, just like SSDs, so they don't give up the ghost when they find a single unreadable sector. Given that we are already brushing our noses up against a truly hard limit on data density, it doesn't seem practical that we'll ever shrink the data far enough to fit RAID-Z3-like redundancy inside the drive enclosure itself without losing so much capacity to parity that we end up with less than we started with. Even if it were possible, it would only be a stop-gap measure, with its own practical limit rapidly reached.

Whether it's local or cloud, the bits have to be stored in a physical location on some physical hardware device. Magical thinking doesn't shift the physical limitations of the universe, and we seem to be rapidly approaching those limitations from several different directions simultaneously. Or maybe we'll be using base-4 DNA-based 3-D storage media 30 years from now, and none of these current practical limits with magnetic storage will matter anymore. But is there any guarantee that DNA-based storage would be any more reliable, with a lower URE? If not, we're left with the same problem of how to keep the data from degrading into noise, and back to the limits of data redundancy versus physical space occupied.
 

mjws00

Guru
Joined
Jul 25, 2014
Messages
798
Of course any speculation requires magical thinking. The fact is we are solving problems that don't exist yet.

Solving the problem of reliable data is a function of measuring the error rate and applying corrective math. We can apply the principles to DNA-based or quantum-based storage; we can build the necessary functions into the media and substrate itself. Perhaps that DNA is structured such that the checksum and parity bits are built in, correcting errors at a far higher rate than they are encountered.

In terms of pure magnetic hard limits... meh. They exist. Then they are beaten. It is the same with error rates: 10^14 can easily become 10^28. Who knows, maybe we get down to lossless compression in atomic structures. We are already well beyond capacity having meaning. It might be pretty short-sighted to assume data requires storage on a physical device in a physical location. Why couldn't it exist as a stream or a wave or a state? What is the density of light?

Obviously I'm just having fun. Truth is, our current math and physics can easily address this near term. There is no reason z2 or z3 couldn't be built into hardware, and 6TB+ SSDs are here now. We could stack them deeper; we could join them up. Magnetic limits are just like tape or punch cards: they matter for a while, then they do not.
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
Just a thought: drive sizes are increasing, but CPU computational power is increasing too. So even if RAID-Z4 uses too many resources now to be useful, you can be pretty sure that in the future it'll be just a light load...

As for future storage types: I bet on a 3D solution, not necessarily holographic, but storage in more than just a surface.
 

RedBear

Explorer
Joined
May 16, 2015
Messages
53
mjws00 said:
Of course any speculation requires magical thinking. The fact is we are solving problems that don't exist yet.

Solving the problem of reliable data is a function of measuring the error rate and applying corrective math. We can apply the principles to DNA-based or quantum-based storage; we can build the necessary functions into the media and substrate itself. Perhaps that DNA is structured such that the checksum and parity bits are built in, correcting errors at a far higher rate than they are encountered.

In terms of pure magnetic hard limits... meh. They exist. Then they are beaten. It is the same with error rates: 10^14 can easily become 10^28. Who knows, maybe we get down to lossless compression in atomic structures. We are already well beyond capacity having meaning. It might be pretty short-sighted to assume data requires storage on a physical device in a physical location. Why couldn't it exist as a stream or a wave or a state? What is the density of light?

Obviously I'm just having fun. Truth is, our current math and physics can easily address this near term. There is no reason z2 or z3 couldn't be built into hardware, and 6TB+ SSDs are here now. We could stack them deeper; we could join them up. Magnetic limits are just like tape or punch cards: they matter for a while, then they do not.

You may be having fun, but you're still strongly implying that there are no hard physical limits that will ever be insurmountable. I have to fundamentally disagree with that assessment.

Bidule0hm said:
Just a thought: drive sizes are increasing, but CPU computational power is increasing too. So even if RAID-Z4 uses too many resources now to be useful, you can be pretty sure that in the future it'll be just a light load...

As for future storage types: I bet on a 3D solution, not necessarily holographic, but storage in more than just a surface.

Neither drive density nor CPU speed is going up the way it used to. The only reason we've gotten any real improvement out of modern CPUs is that we've transitioned from single-threaded processing to highly parallelized processing over the last decade or so. But parallelization of computing tasks has its limits too. And then, as Cyberjock noted, higher Z-levels will start to eat up so much disk space for parity that we might as well just go to multiple full-blown mirrors and cut out the need for higher-powered CPUs.

Going from 2D to 3D storage with URE rates remaining constant will just mean that we reach the limits of RAID-Z3 even faster. For sure 3D storage of some kind will make our data more dense in the future, but it won't make those bits any easier to protect from entropy.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Most of you are thinking about this the wrong way. A URE on a degraded RAID5 prevents recovery of a data block because you are missing the additional redundancy that would allow you to rebuild it. That's a hard error. That's the 2009 issue.

On RAIDZ2, you can lose a drive and ALSO experience a URE while rebuilding that drive, and you still have sufficient redundancy to finish the rebuild. The URE does not invalidate the entire drive; it merely invalidates the block being read. As long as that block can be rebuilt from the remaining redundancy, you're fine. This is why RAIDZ2 is a very good choice for resiliency: you would need to lose a drive, have a bad block, and then have ANOTHER corresponding bad block on one of the remaining hard drives in order to get an unrecoverable error.

RAIDZ3 protects against even more failure than that.

Any claim that RAIDZ3 dies in 2019 is based on a misunderstanding of what is going on. In order for a block to be unrecoverable on RAIDZ3, you need a URE (or drive loss) covering that SPECIFIC block on four separate devices. At least for the foreseeable future, that is statistically so frickin' unlikely that it just isn't going to happen unless you've dropped your entire array out a second-story window, baked it to death, or otherwise found a way to make large numbers of drives simultaneously develop huge numbers of faults.

Put differently, each of your devices could develop a bad block rate of 1 in 100,000 and as long as the bad blocks were randomly placed across the drives, RAIDZ3 would very likely be able to fully recover the data. This is an experiment that you can actually try -- you don't need failing disks, even! Just write garbage blocks out to an experimental pool.
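To put a number on "so frickin' unlikely", here is the back-of-the-envelope version (the bad-block rate and pool geometry are illustrative assumptions):

```python
# Chance that one SPECIFIC block is unreadable on 4+ of n devices at once,
# assuming independent, randomly placed bad blocks. The rate and geometry
# here are illustrative assumptions, not measurements.
from math import comb

p = 1e-5        # assumed per-device bad-block rate (1 in 100,000)
n = 10          # devices in the vdev
blocks = 10**9  # order-of-magnitude block count for a large pool

p_block = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(4, n + 1))
print(f"P(a given block is lost under RAIDZ3) ~ {p_block:.1e}")      # ~2e-18
print(f"Expected lost blocks across the pool  ~ {p_block * blocks:.1e}")
```

And here is a sketch of the garbage-writing experiment itself (file names and sizes are made up; the zpool commands are standard, but run this only as root on a disposable test box, because it deliberately corrupts the pool's backing store):

```python
# Build a RAIDZ3 pool on sparse files, scribble random garbage across every
# backing "disk", then scrub and watch the checksum errors get repaired.
# Writes skip the first/last 4 MiB of each file to spare the ZFS vdev labels.
import os, random, subprocess

FILES = [f"/tmp/rz3-{i}.img" for i in range(7)]   # 7-wide RAIDZ3
SIZE = 1 << 30                                    # 1 GiB per backing file

for f in FILES:
    with open(f, "wb") as fh:
        fh.truncate(SIZE)                         # sparse file vdev

subprocess.run(["zpool", "create", "testpool", "raidz3", *FILES], check=True)
subprocess.run(["cp", "-R", "/usr/share/man", "/testpool/"], check=True)

for f in FILES:                                   # damage EVERY device
    with open(f, "r+b") as fh:
        for _ in range(100):
            fh.seek(random.randrange(4 << 20, SIZE - (4 << 20)))
            fh.write(os.urandom(4096))

subprocess.run(["zpool", "scrub", "testpool"], check=True)
# The scrub runs in the background; re-check status until it completes.
subprocess.run(["zpool", "status", "-v", "testpool"], check=True)
```

With the damage scattered at random, the scrub should repair everything from the surviving redundancy; the CKSUM counters climb, but no files are lost.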

The real concern is that as the amount of time it takes to rebuild an array increases, the window of opportunity for additional drives to fail becomes much larger. This IS a real problem, of course.

RAID6 (RAIDZ2) becomes more unreliable as we pass into the 2020s, but RAIDZ3 is good for a while longer than that.

Anyone wishing to participate in this discussion should probably read http://queue.acm.org/detail.cfm?id=1670144 and understand what I'm talking about prior to replying. See in particular Figure 7.
 

RedBear

Explorer
Joined
May 16, 2015
Messages
53
Thanks very much for the very well-informed input, jgreco. I've certainly never said that RAID-Z3 is no good at the moment, and I don't know of anything better. If you think 2019 is too soon for RAID-Z3 to die, you might want to ask Cyberjock to update his newbie slideshow with a more realistic estimate. 2026, perhaps?

A very interesting quote from that link:

RAID-6 is inadequate, leading to the need for triple-parity RAID, but that, too, if current trends persist, will become insufficient. Not only is there a need for triple-parity RAID, but there's also a need for efficient algorithms that truly address the general case of RAID with an arbitrary number of parity devices.

Beyond RAID-5 and -6, what are the implications for RAID-1, simple two-way mirroring? RAID-1 can be viewed as a degenerate form of RAID-5, so even if bit error rates improve at the same rate as hard-drive capacities, the time to repair for RAID-1 could become debilitating. How secure would an administrator be running without redundancy for a week-long scrub? For the same reasons that make triple-parity RAID necessary where RAID-6 had sufficed, three-way mirroring will displace two-way mirroring for applications that require both high performance and strong data reliability. Indeed, four-way mirroring may not be far off, since even three-way mirroring is effectively a degenerate, but more reliable, form of RAID-6, and will be susceptible to the same failings.

Sounds a lot like what I've been saying. As drives get larger they will need to be fully mirrored to two other disks, then to three, then to four, and so on. Costs could become quite prohibitive at some point. It might take two or three decades, but it seems inevitable. We might need a petabyte of raw storage in the future just to keep a single 100TB drive's worth of data intact.

And the Figure 7 you refer to seems to show RAID6 (RAID-Z2) about to cross 50% probability of data loss by 2019, while triple-parity approaches a disturbing 10% or so at the same time.

Interesting non sequitur: Blade Runner just started playing on channel 519 as I'm composing this. It's supposedly set in 2019. Good thing the planet doesn't actually look like that quite yet.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
RedBear said:
And the Figure 7 you refer to seems to show RAID6 (RAID-Z2) about to cross 50% probability of data loss by 2019, while triple-parity approaches a disturbing 10% or so at the same time.

No, it doesn't. Compare that to Figure 6. There's no scale provided, but it is clear that comparatively, RAID6 in 2019 will be facing a similar crisis to RAID5 in 2009. This makes various assumptions about disk sizes, etc., but it should be clear that the RAIDZ3 curve is substantially further out there.

Also see https://www.cafaro.net/2014/05/26/why-raid-5-is-not-dead-yet/
 

RedBear

Explorer
Joined
May 16, 2015
Messages
53
jgreco said:
No, it doesn't. Compare that to Figure 6. There's no scale provided, but it is clear that comparatively, RAID6 in 2019 will be facing a similar crisis to RAID5 in 2009. This makes various assumptions about disk sizes, etc., but it should be clear that the RAIDZ3 curve is substantially further out there.

Also see https://www.cafaro.net/2014/05/26/why-raid-5-is-not-dead-yet/

You are so right. Silly me; my brain assumed that since Figure 7 has four sections and is labeled simply "probability", each section was 25%, but it's just an expansion of the same data as Figure 6. My mistake.

I assume Figure 6 is the one actually showing 0-100% on the left, which means RAID5 reaches 100% probability of data loss around 2017. RAID-Z3 (or their very interesting "RAID-7.3" terminology) appears to be in pretty good shape until about, what, 2030, at a wild guess?

Thanks for the perspective.
 

mjws00

Guru
Joined
Jul 25, 2014
Messages
798
RedBear said:
You may be having fun, but you're still strongly implying that there are no hard physical limits that will ever be insurmountable. I have to fundamentally disagree with that assessment.
Nope. I'm suggesting that when we hit hard limits, we add structure, or pivot. For example, with steel we triangulate (structure and math), or we switch to titanium. With data we add algorithms, and we switch mediums (magnetic to laser, 3D, holographic, quantum). Creativity and innovation are awesome.

Another fun part is that the need for storage decreases as we increase bandwidth. I might have 100TB of media, but my kids won't need to. We can stream anything. I had 96GB of storage on my phone... when I upgraded, I added none of that back. I just stream. My laptop has literally no meaningful data on it. You steal it... I don't care. Wipe it, meh. I don't care how big the SSD is or how fast the proc is. Content gets created in the cloud or synced. If I need horsepower I RDP into a workstation or server. Conceptually I could buy those cycles from Amazon or any provider.

Honestly, I've been dying to put together a 200+ TB rig with NVMe and all the goodies. But outside of enterprise, who needs it? There really aren't that many consumers of that space and speed beyond 4K video and the datacenter.

I suppose the difference, philosophically, is that when people throw around distant dates for failure I tend to LMAO. Especially in tech: 5 years is forever, 10 is insanity. What disruptive technologies will come into being in that time? But it is fun to think about, accepting that we will be wildly wrong. The thoughts might lead to innovation or progress.
 

RedBear

Explorer
Joined
May 16, 2015
Messages
53
mjws00 said:
Another fun part is that the need for storage decreases as we increase bandwidth. I might have 100TB of media, but my kids won't need to. We can stream anything. I had 96GB of storage on my phone... when I upgraded, I added none of that back. I just stream. My laptop has literally no meaningful data on it. You steal it... I don't care. Wipe it, meh. I don't care how big the SSD is or how fast the proc is. Content gets created in the cloud or synced. If I need horsepower I RDP into a workstation or server. Conceptually I could buy those cycles from Amazon or any provider.

Honestly, I've been dying to put together a 200+ TB rig with NVMe and all the goodies. But outside of enterprise, who needs it? There really aren't that many consumers of that space and speed beyond 4K video and the datacenter.

I'd like to say you have a limited use case, but I have to acknowledge that streaming and cloud storage are finally becoming many folks' primary way of using data. Your children's children will probably have no clue what you mean by "local storage". We will be considered the oddballs who don't quite grasp the purpose of the Internet.
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
Cloud storage is becoming many folks' primary way of using data? I mean, sure, it's growing, but I don't know that I would characterize it as taking over traditional data storage. My only cloud storage is emergency backups.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
That won't happen until the US last mile providers get a frickin' clue.
 