DrKK
FreeNAS Generalissimo
- Joined
- Oct 15, 2013
- Messages
- 3,630
I just want to put this in one place for everyone.
I was skeptical about the new ZFS-on-boot-pool feature introduced in 9.3. I thought it was.....how do I say....overkill. I was (or so I thought) totally fine with the UFS boot that we had in 9.2 and before.
Well, I've had my ass saved twice now, and neither save would have happened without ZFS on boot.
It turns out that whole shitloads of USB boot devices were corrupting my FreeNAS operating system (in my case, especially some brand new Kingston Micro DT's; apparently Kingston's descent into lack of reliability/repeatability now extends to USB thumb drives). I would never have known this was happening without the new SCRUB BOOT POOL feature, which reported between 1 and 10 CKSUM errors every time I ran it, on a brand new FreeNAS, with a brand new Kingston thumb drive, that I am building for an associate.
Since I originally installed this appliance with only one boot device (I mean, hell, it's worked for us in the FreeNAS community fine for years to have only one UFS boot device), these CKSUM errors were uncorrectable: with a single device, ZFS can detect a bad block, but it has no second copy to repair it from.
By the time I added a mirrored device to the boot pool, the corruption had already happened.
How do I know that? I'm so glad you asked.
I ran the "VERIFY INSTALL" feature (again, new in 9.3), located under SYSTEM->UPDATE!!! This process did *NOT* finish, indicating corruption. (I ran that same process on a known good FreeNAS, i.e., my main FreeNAS server, which has never had a single bit of error on any data or boot device, and it finished in about 30 seconds with no errors.)
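The internals of VERIFY INSTALL aren't described in this post, but the general idea behind a check like this is to compare the files on disk against known-good checksums. Here's a minimal Python sketch of that idea only. It is not FreeNAS's implementation: the manifest format (a dict of path to expected SHA-256) and the function names `sha256_of` and `verify_against_manifest` are hypothetical, made up for illustration.

```python
import hashlib
import os


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_against_manifest(manifest):
    """Compare files on disk against a manifest of expected checksums.

    `manifest` is a hypothetical {path: expected_sha256_hex} dict.
    Returns a list of (path, problem) tuples, sorted by path; an empty
    list means every file matched.
    """
    problems = []
    for path, expected in sorted(manifest.items()):
        if not os.path.isfile(path):
            problems.append((path, "missing"))
        elif sha256_of(path) != expected:
            problems.append((path, "checksum mismatch"))
    return problems
```

An empty result is the "finished with no errors" case; anything else is a list of files worth worrying about.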
So these are my recommendations:
- Kingston's known dodginess with RAM and SSDs may be extending to their USB flash devices. As much as I never thought I'd say this in my life, for the time being I am recommending against using Kingston SSDs, RAM, and DOKs (thumb drives), until they can get themselves together or, alternately, surrender their market share to competitors that don't play this kind of game (e.g., SanDisk). I'm not sure how serious the situation is, but it's not on me to figure that out. The risk is non-zero, and I'd rather not take it.
- Take two identical, new, thumb drives. Make these your boot pool for FreeNAS at install time, from the first minute. I recommend 16GB.
- Occasionally manually run boot pool scrubs (the button is located in the system->boot area).
- Occasionally manually run "verify install"s (the button is located in the system->update area).
- I would only keep up to about a dozen boot clones on your boot pool. Delete ones you don't need. When my device got too full to perform an update, the result was somewhat counterintuitive and nasty (I didn't realize my problem was a full boot device).
- The minute you get CKSUM (or other) scrubbing errors on a boot device, replace it. These things are cheap enough that there's no excuse to run a boot pool you are not 100% confident in.
If you have a single boot device, and you've already upgraded? That's cool. Buy a second device. Run a boot scrub and a "verify install" right now. When your second device arrives, put it in service mirroring your boot pool.
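If you'd rather not eyeball the scrub results every time, the READ/WRITE/CKSUM check can be scripted. Below is a minimal Python sketch that parses `zpool status`-style text and reports any device with a nonzero error counter. Some assumptions: the boot pool is named `freenas-boot` (check yours), the sample text is made up for illustration rather than captured from a real system, and `devices_with_errors` is a hypothetical helper name.

```python
import re

# Illustrative sample only: made up for this sketch, not real output.
SAMPLE_STATUS = """\
  pool: freenas-boot
 state: ONLINE
config:

    NAME          STATE     READ WRITE CKSUM
    freenas-boot  ONLINE       0     0     0
      mirror-0    ONLINE       0     0     0
        da0p2     ONLINE       0     0     7
        da1p2     ONLINE       0     0     0

errors: No known data errors
"""

# Counters are plain integers, or abbreviated values such as 1.2K.
COUNTER = re.compile(r"^\d+(\.\d+)?[KMGT]?$")


def devices_with_errors(status_text):
    """Return {name: (read, write, cksum)} for rows with a nonzero counter.

    Counters are kept as strings so abbreviated values survive intact.
    """
    flagged = {}
    in_table = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:5] == ["NAME", "STATE", "READ", "WRITE", "CKSUM"]:
            in_table = True  # header row: device rows follow
            continue
        if not fields:
            in_table = False  # a blank line ends the device table
            continue
        if not in_table or len(fields) < 5:
            continue
        name, counters = fields[0], tuple(fields[2:5])
        if not all(COUNTER.match(c) for c in counters):
            continue  # not a device row
        if any(c != "0" for c in counters):
            flagged[name] = counters
    return flagged


print(devices_with_errors(SAMPLE_STATUS))
```

On a live system you would feed it the text of `zpool status freenas-boot` instead of the sample. Any device it flags is one to replace, per the recommendation above.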