Yesterday at 5am, my FreeNAS sent me the following alert:
> Boot pool status is ONLINE: None.
Somehow I wasn't asleep, noticed the alert, and decided to look into it. The web UI did not respond; ssh accepted the login but never got to a shell prompt. File sharing, rsync, and everything else worked fine. Crap!
IPMI didn't work, so I had to go down to the basement. I couldn't reboot with ctrl+alt+del, the system just froze, so I reset it. It booted, but a few minutes later, while I was looking around, a scrub started again on `freenas-boot` and boom, the system froze again. Aha, I thought. Let's reboot again and, just in case, copy the geli keys, the latest config database, and `pwenc_secret` from `/data`. System reset, boot, scp four files to a different system, and the scrub froze the system again. OK, then... let's find a new flash drive. Do I have any available? Nope. Ordered two on Amazon, delivery in a week :-( I found a couple of older ones, created install media, booted the server, and installed FreeNAS to the new flash drive. Rebooted. I put `pwenc_secret` and `freenas-v1.db` into one directory, created a `geli` subfolder, put the keys from the two pools there, tarred everything into a single file, and uploaded the archive as a restore to the fresh FreeNAS install. The system rebooted twice and I ended up with a working system again, with alarms that the system was rebooted twice, and no data or config loss.
Time wasted -- 2 hours. Why so long? Because I spent an hour trying to log in via IPMI, which requires Java, and there were exceptions...
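For reference, the packing step described above can be sketched in a few lines of Python. This is only an illustration of the layout I described (two files plus a `geli` subfolder with one key per pool), not an official FreeNAS tool. The function name, the key file names, and the choice to put everything at the root of an uncompressed tar are my assumptions here; check what your FreeNAS version expects before relying on it.

```python
import tarfile
from pathlib import Path


def build_restore_archive(staging_dir, geli_key_files, archive_path):
    """Pack pwenc_secret, freenas-v1.db and a geli/ subfolder into one tar file.

    Assumption: the files sit at the root of the archive and the archive is
    uncompressed. The function name and arguments are hypothetical.
    """
    staging = Path(staging_dir)
    with tarfile.open(str(archive_path), "w") as tar:
        # The two files that were copied off /data before the reinstall.
        for name in ("pwenc_secret", "freenas-v1.db"):
            tar.add(str(staging / name), arcname=name)
        # One key file per encrypted pool, stored under geli/.
        for key in geli_key_files:
            key = Path(key)
            tar.add(str(key), arcname="geli/" + key.name)
    return Path(archive_path)
```

Calling it with the directory that holds the two files and a list of the two pool key files produces a single archive to upload through the web UI. Listing its contents with `tar -tf` before uploading is a cheap sanity check.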
So what's the moral of this story?
1. Keep backups of your system configuration (I had backups, but they were a couple of days old).
1. Always make a mirrored boot pool, because you never know how long a USB key will live. Previously I used one for 6+ years without issues, but this one (SanDisk Ultra Fit 32GB) didn't last even 6 months.
1. Always scrub your pools, at least once a month.
1. Did I mention backups? Especially geli keys.
I should also point out that this was the first time I had to restore a system after a crash, and the first time I had geli encryption on my system. The documentation is confusing about passphrases and recovery keys. I don't think there is a passphrase in my config, and I don't recall setting one up, but there is a mention somewhere that every time you download new keys, the previous ones are invalidated. This is utterly confusing. Perhaps at 5-ish am I didn't notice the difference between the encryption key and the recovery key. But this is something I need to figure out -- I will set up a test VM and play with these things.
How was your day? Stay safe and keep backups.