This makes absolutely no sense to me... and
Not sure why. It's actually pretty logical.
Say you have a 4-disk RAIDZ1. Each stripe has 3 disks of data and 1 disk of parity. Imagine you have 100% reliable RAM. If ZFS already knows that one of those four parts is bad, and knows which part, why would you NOT want to fix it? It knows something is wrong. If it willingly chose not to fix it, then you'd really have no redundancy for that stripe, since you're already missing 1 of the 4 pieces. So if you were to lose a disk at that moment you'd be in trouble: you'd start resilvering, and when you got to that stripe... oops. There's no protection for missing 2 of 4 pieces. Instant resilver failure. So either you acknowledge the error and fix it while you still have the ability to, or you ignore it and potentially have problems with the pool later. Unfortunately, the catch in "ZFS knows" is that ZFS is assuming the data isn't bad because of RAM, and that the fault lies somewhere else in the storage path, such as the cabling or the hard disk platters.
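The "oops" above is just arithmetic. Here's a toy sketch of single-parity redundancy using byte-wise XOR (real RAIDZ is more involved on disk, but the redundancy math is the same idea): with all four pieces intact you can lose any one piece and rebuild it, but once one piece is already bad, losing a disk leaves two unknowns and only one parity equation.

```python
# Toy model of a 4-disk RAIDZ1 stripe: 3 data blocks + 1 XOR parity block.
# This illustrates the redundancy math only, not the real on-disk format.

def xor_parity(blocks):
    """Byte-wise XOR of a list of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

def rebuild(known_pieces):
    """Recover ONE missing piece from the other three (data + parity).

    XOR is its own inverse, so XORing the three surviving pieces
    yields exactly the missing one.
    """
    return xor_parity(known_pieces)

data = [b"AAA", b"BBB", b"CCC"]   # three data "disks"
parity = xor_parity(data)         # one parity "disk"

# Lose any single piece and the stripe survives:
recovered = rebuild([data[1], data[2], parity])
assert recovered == data[0]

# Lose TWO pieces (a known-bad block plus a failed disk during resilver)
# and the surviving two pieces no longer determine the missing data:
# one XOR equation cannot solve for two unknowns.
```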
It's no different than calling up the bank the instant you realize there's a bank error. You don't want to wait until later, when checks start bouncing because of an unauthorized withdrawal. You want it taken care of right now. You know something is wrong, so why wouldn't you fix it? ZFS just tries to proactively fix any errors as it goes. It makes perfect sense to me.
The only place that things go horribly wrong is if you aren't using ECC RAM. Then instead of a bad disk it's bad RAM locations, and ZFS does the wrong thing. Since ZFS was supposed to be the most reliable file system ever built, its designers had to make a choice: assume the RAM is good and trust that the disk is bad, or vice versa. Guess which one happens far, far more frequently, and even more so if you use ECC RAM. So they made the conscious choice to assume the RAM is good and that the disk is bad (or at least that something in the data path from the disk is causing corruption). If the corruption was a fluke from cabling or something, no harm done, since ZFS rewrote exactly what was already on the platters. But if the disk was bad, then the error gets corrected. The better way to verify all of this is to do regular scrubs, which are already recommended.
This is very different than any hardware RAID I've ever used. It doesn't write data back to the drive unless something has changed.
Oh, I promise you, if a disk has a read error the array WILL write that stripe again. It's not that anything changed; it's that you had a read error, so the array attempts to fix it by writing the stripe again. The hard drive remaps the bad sector to a spare sector, and the data recalculated from parity is written there. Now you have full redundancy. If it didn't do that, it would be just like my example above: you've technically lost some amount of redundancy for that stripe of data. So again, why would you willingly NOT fix the error when you can fix it right now, instead of waiting until later, when it might cause data loss for your array? I know what I'd do if I wanted to sell you my RAID controllers and didn't want customers complaining about corrupted data and failed array rebuilds from using my controllers.
I'm not sure how you could prove the reads and writes on hardware RAID without deliberately causing read errors. But with ZFS it's easy: zpool iostat 1 and/or gstat will show you, second by second, what the reads and writes are. If you were to write some trash data to one disk, then as you read your data back you'd see it being rebuilt on the disk you trashed. It would take some care, since you don't want to trash the partition table or the ZFS identifiers, but it's totally doable. I had a bad disk, so I got to see this first hand with my blazingly fast 5 MB/sec 6-disk RAIDZ2 pool.
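What that read-triggered repair looks like can be sketched in miniature. This is a toy model, not ZFS internals: assume a two-way mirror with a checksum recorded at write time, and a read path that rewrites any copy failing its checksum from a copy that passes (every name here is made up for illustration).

```python
# Toy sketch of self-healing on read: two mirrored "disks" plus a
# checksum recorded at write time. Not a real ZFS API.
import hashlib

def checksum(block):
    return hashlib.sha256(block).digest()

# Two copies of the same block; then simulate trashing one "disk".
good = bytearray(b"important data")
trashed = bytearray(b"GARBAGEGARBAGE")
recorded_sum = checksum(good)

def self_healing_read(primary, secondary, expected_sum):
    """Read primary; if its checksum fails, repair it in place from
    a copy that still matches the recorded checksum."""
    if checksum(primary) != expected_sum:
        if checksum(secondary) == expected_sum:
            primary[:] = secondary  # rewrite the bad copy on "disk"
        else:
            raise IOError("both copies fail checksum; no redundancy left")
    return bytes(primary)

result = self_healing_read(trashed, good, recorded_sum)
assert result == b"important data"
assert bytes(trashed) == b"important data"  # the trashed disk was repaired
```

The same logic applies with parity instead of a mirror copy: any read that fails its checksum triggers a rewrite from the surviving redundancy, which is exactly the write activity you'd watch scroll by in zpool iostat.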
1. I can't believe you'd use Wikipedia as a reliable source for data. /hangsheadinshame
2. Another wiki... really? Do you not trust technical documentation from Oracle and/or Sun? Not to mention that, if I'm not mistaken, Solaris ran only on specific hardware sold by Sun, and all of it had ECC RAM. OpenSolaris was the generally available software, and that's not what that wiki is for.
3. That presentation has been linked many times here. There's nothing surprising about it, and it's all been discussed to death in other threads. One thing someone pointed out is that the presentation you linked covers Sun's ZFS implementation, not FreeBSD's; as such, results may differ. Some things work a little differently in RAM on FreeBSD than in Sun's ZFS implementation, but they still produce the same on-disk final product. Don't ask me what those differences are, as I never used Sun products, and documentation from Sun's website was removed years ago when Oracle bought them. It's just like how FreeBSD's code didn't simply port over to Linux for the ZFS on Linux project, and that project clearly has major obstacles to overcome in some areas despite being derived from FreeBSD's code base. You may get the same final product on disk, but how you get there is different. But hey, that presentation does say ZFS "fails to maintain data integrity in the face of memory corruption". That sure sounds like something I would want to avoid... such as with ECC RAM.