Confirming whether a checkpoint *guarantees* reversion to the same exact state at the low level of "HDD data held on platters"?

Stilez

Guru
Joined
Apr 8, 2016
Messages
529
I'm planning some benchmarking with my live pool, as part of pursuing a long-standing and severe ZFS bug/problem/issue.

Ideally I would do this on a temporary pool with spare disks, but that's not practical or realistic - unfortunately the issue probably only arises with a large, fast, part-filled pool, which I can't replicate just by copying a bunch of sample files. I'd need to create a full-scale pool with about £1800 of spare disks (~8 x 8TB @ 7200), then dd over about 15TB of data to get the same exact HDD content layout. That's just not going to happen.

The benchmarking itself is to test how far the issue can be mitigated by sysctls alone, and hopefully to find which sysctls have an impact and how far they can be tuned before other negative effects arise. The sysctls all control quite low-level ZFS/HDD behaviour - how writes are coalesced, how TXG data is aggregated, min/max block sizes for user data/spacemaps, min/max cache and IO settings, and other low-level vdev/HDD IO-related ZFS sysctls. At the moment progress on the issue is hampered because there are only guesses at optimal values, or even at whether sysctls can help much without incurring significant downsides, and I want to replace those guesses with real data.
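To make that concrete, the harness step I have in mind is just recording the baseline values and applying one candidate combination from a script. Something like this sketch - where the OID names and values are examples only, not the real candidate list, and some candidates may turn out to be loader tunables that need a reboot rather than a live sysctl write:

```python
"""Minimal sketch: record baseline values and apply one candidate set of
ZFS sysctls. The OIDs/values below are illustrative only - the real candidate
list and sensible ranges are exactly what the benchmarking is meant to find.
"""
import subprocess

# Example FreeBSD OIDs; the values are placeholders, not recommendations.
CANDIDATE_SYSCTLS = {
    "vfs.zfs.txg.timeout": "5",
    "vfs.zfs.vdev.aggregation_limit": "131072",
    "vfs.zfs.dirty_data_max": str(4 * 1024 ** 3),
}

def read_sysctl(oid):
    """Return the current value of a sysctl OID."""
    out = subprocess.run(["sysctl", "-n", oid],
                         capture_output=True, text=True, check=True)
    return out.stdout.strip()

def write_sysctl(oid, value):
    """Set a sysctl OID (needs root; fails if the OID is a boot-time tunable)."""
    subprocess.run(["sysctl", "%s=%s" % (oid, value)], check=True)

def snapshot(oids):
    """Record baseline values so every trial can be restored to the same settings."""
    return {oid: read_sysctl(oid) for oid in oids}

if __name__ == "__main__":
    baseline = snapshot(CANDIDATE_SYSCTLS)
    print("baseline:", baseline)
    for oid, value in CANDIDATE_SYSCTLS.items():
        write_sysctl(oid, value)
```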

My approach will be to checkpoint the pool, quiesce everything, and then systematically work through various combinations of candidate sysctl changes. For each trial run, I plan to rewind the pool to the checkpoint, reboot to clear ARC/L2ARC, perform various well-defined sets of IO operations with those sysctls set (covering different file-size mixes, total sizes and activity types), dtrace and time the operations, and use the output to narrow down any overall sweet spots, as well as to discover which sysctls dominate or best mitigate this particular issue on my own pool. I can then feed that back into the bug report, where at present there is still a bit too much guesswork as to optimal values and possible negative side-effects.
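To show what I mean by one trial run, here is roughly the shape of it as a script. This is purely a sketch: the pool name "tank", the dataset path and run_workload() are placeholders, the real runs would reboot between trials (to clear ARC/L2ARC) rather than just re-import, and if I'm reading zpool-checkpoint(8) right a fresh checkpoint has to be taken after each rewind before the next trial.

```python
"""Minimal sketch of one trial run: rewind to the checkpoint, apply one
combination of candidate sysctls, then run and time a defined IO mix.

"tank", BENCH_DIR and run_workload() are placeholders, not the real harness.
"""
import json
import subprocess
import time

POOL = "tank"                   # placeholder pool name
BENCH_DIR = "/mnt/tank/bench"   # placeholder dataset the IO mix targets

def sh(*argv):
    """Run a command, failing loudly on error."""
    subprocess.run(argv, check=True)

def rewind_to_checkpoint(pool):
    """Discard everything written since the checkpoint was taken."""
    sh("zpool", "export", pool)
    sh("zpool", "import", "--rewind-to-checkpoint", pool)
    # As I read zpool-checkpoint(8), the checkpoint is consumed by the rewind,
    # so a fresh "zpool checkpoint" is needed before the next trial.

def run_workload(tag):
    """Placeholder for one well-defined IO mix; returns elapsed seconds."""
    start = time.monotonic()
    # e.g. unpack a fixed file set into BENCH_DIR, rewrite it, delete it,
    # with dtrace capturing the low-level IO behaviour alongside.
    return time.monotonic() - start

def trial(tag, sysctls):
    """One benchmark run against a pool freshly rewound to the checkpoint."""
    rewind_to_checkpoint(POOL)
    for oid, value in sysctls.items():
        sh("sysctl", "%s=%s" % (oid, value))
    return {"tag": tag, "sysctls": sysctls, "seconds": run_workload(tag)}

if __name__ == "__main__":
    print(json.dumps(trial("small-files", {"vfs.zfs.txg.timeout": "5"}), indent=2))
```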

I won't be changing any disks or vdevs, splitting mirrors, or reguid-ing - just reading and writing to datasets in the existing pool - so checkpointing shouldn't hit problems according to the man pages and technical docs. I plan to use the initial checkpoint to ensure the pool starts each run in a known, precisely identical state: the same exact HDD sectors in use, holding the same exact data. That way I can be sure test runs don't leave any changes behind to contaminate subsequent runs, even "invisibly" (metadata, pool properties, etc.) and even at the lowest level of "data on platters", and that I get my pool back unchanged once I'm finished.
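For completeness, the bracketing checkpoint operations I expect to use are just the standard ones from zpool-checkpoint(8) - again a sketch, where "tank" is a placeholder and the per-trial rewind itself is in the trial sketch above:

```python
"""Sketch of the checkpoint operations bracketing the whole exercise.
"tank" is a placeholder pool name; the per-trial rewind is in the trial sketch.
"""
import subprocess

POOL = "tank"  # placeholder pool name

def create_checkpoint(pool):
    """Taken once, before any trials, to pin the 'known' starting state."""
    subprocess.run(["zpool", "checkpoint", pool], check=True)

def discard_checkpoint(pool):
    """Drops the checkpoint while keeping the current pool state - only wanted
    if I ever decide to keep post-checkpoint changes; otherwise the exercise
    ends with one final rewind instead."""
    subprocess.run(["zpool", "checkpoint", "-d", pool], check=True)

if __name__ == "__main__":
    create_checkpoint(POOL)
```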

From my understanding, ZFS's COW approach and the basic design of checkpoints *should* always guarantee that's so. The revert should just discard all changed blocks without trace, leaving the original blocks pristine (except maybe an uberblock or two?). But since I'll be relying on this at a very low level - both to ensure comparability between test runs that exercise different metadata block-size and layout-controlling sysctls on disk, and for the pool's final rewind to a "known" clean state after all benchmarks are done - I want to check extremely carefully with someone who knows much more than I do about the technicalities of it.
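If it would help to sanity-check this empirically on my own hardware, I'm thinking of a crude spot check along these lines: sample and hash raw regions of each member device before taking the checkpoint and again after the final rewind, skipping the areas where (as I understand the on-disk layout) the labels and uberblock rings live. The device paths, the 4 MiB front / 512 KiB back skip sizes and the sampling scheme are all my own assumptions, it needs root and a quiesced or exported pool, and of course whether every sampled region *should* come back identical is essentially the question I'm asking.

```python
"""Spot-check sketch: hash sampled raw regions of each pool member device,
to be compared before the checkpoint and after the final rewind.

Device paths and skip sizes are my own assumptions; this is a spot check of
sampled offsets, not proof of byte-identical platters.
"""
import hashlib
import os

DEVICES = ["/dev/da0p2", "/dev/da1p2"]   # placeholder member devices/partitions
SKIP_FRONT = 4 * 1024 * 1024   # front labels + boot block reserve (my reading of the layout)
SKIP_BACK = 512 * 1024         # two trailing labels
CHUNK = 1024 * 1024            # 1 MiB per sample; keeps raw-device reads aligned
SAMPLES = 64                   # evenly spaced sample points per device

def sample_hashes(dev):
    """Hash SAMPLES evenly spaced CHUNK-sized regions, skipping the label areas."""
    hashes = []
    with open(dev, "rb") as f:
        # SEEK_END is assumed to report the provider size; diskinfo(8) is an alternative.
        f.seek(0, os.SEEK_END)
        size = f.tell()
        usable = size - SKIP_FRONT - SKIP_BACK - CHUNK
        step = (usable // SAMPLES) // CHUNK * CHUNK   # keep offsets CHUNK-aligned
        for i in range(SAMPLES):
            f.seek(SKIP_FRONT + i * step)
            hashes.append(hashlib.sha256(f.read(CHUNK)).hexdigest())
    return hashes

if __name__ == "__main__":
    for dev in DEVICES:
        print(dev)
        for i, digest in enumerate(sample_hashes(dev)):
            print("  sample %02d: %s" % (i, digest))
```

I'd save the output from a run taken just before creating the checkpoint and diff it against a run taken after the final rewind.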

So my question is: whatever I change about the blocks being written - block sizes, coalescence settings, or anything else - will rewinding to the checkpoint always revert the pool to precisely the same low-level layout and data on the platters as when I took the checkpoint originally? And if not, what is the extent of any caveats, edge cases or exceptions?