Server didn't come back up after power outage

NasKar

Guru
Joined
Jan 8, 2016
Messages
739
I was away for the weekend and noticed that I couldn't reach my FreeNAS server over my VPN and that Plex wouldn't work. I could reach my pfSense router over the VPN and rebooted it, but I couldn't reach the FreeNAS box or its IPMI. When I got home, my Supermicro case was alarming and the red light was on. The battery backup for the server was dead. After I turned the battery backup back on, the server alarm and red light went off.

The server log through the IPMI:
2  2020/02/07 20:45:27  PS1 Status  Power Supply  Power Supply Failure Detected - Asserted
3  2020/02/10 00:00:54  PS1 Status  Power Supply  Power Supply Failure Detected - Deasserted

freenasSuper.local had an unscheduled system reboot. The operating system successfully came back online at Fri Feb 7 15:46:19 2020.

FYI, I fixed the time on the IPMI interface after seeing that the server came back up tomorrow <G>.

In the past, before I upgraded to 11.3, I would get an email if my server was on UPS battery.

1) Why didn't I get an email? Is that a bug in 11.3?
2) Why didn't the system shut down before the battery was exhausted (current setting is 900 seconds)?
3) Why didn't the server come back up so I could reach it through the VPN? I'm not sure, but I think it did come back up and I just couldn't reach it until I rebooted the pfSense router when I got home. My Windows machine wasn't connecting to the internet (DNS error), and the router reboot fixed that.
4) How can I test my UPS without unplugging it?
5) Do I have to worry that my data is not 100% if the server didn't shut down properly?

Supermicro CSE-836BE16-R920B 3U Server Chassis 2x 920W 16-Bay BPN-SAS2-836EL1
Motherboard SuperMicro x9srl-f
CPU Intel Xeon E5-2650 V2
Memory 4x M393B2G70QH0-YK0 Samsung 16GB PC3L-12800R DDR3-1600MHz ECC Registered Server Ram 64GB
8 x WD Red 4TB ZFS Raid Z2
HBA HP-H220-6Gbps-SAS-PCI-E-3-0-HBA-LSI-9207-8i-P20-IT-Mode-for-ZFS-FreeNAS-unRAID
Heatsink Dynatron R27
SSD Boot Kingston 120 GB
 

Heracles

Wizard
Joined
Feb 2, 2018
Messages
1,401
Hey NasKar,

Let's start with the easiest one:
5) Do I have to worry that my data is not 100% if the server didn't shut down properly?

You can clear all doubt by doing a scrub. Your pool should be OK, but to have absolute evidence, just do a scrub. It will either finish without any errors or fix whatever errors may have been caused by the failure. With RaidZ2, it would be very surprising (almost impossible) for ZFS not to be able to recover if something got corrupted.

1) Why didn't I get an email? Is that a bug in 11.3?
2) Why didn't the system shut down before the battery was exhausted (current setting is 900 seconds)?

Now that your system is back up, it would be a good time to test your e-mail alert config. In the GUI, you can send yourself a test e-mail to confirm that your config is working. Does this test work right now?

It is about the same for NUT, the program that manages the UPS, which relates to your next question:
4) How can I test my UPS without unplugging it?

NUT can run both long and short battery tests. You can do a short test basically anytime. Because you just suffered a full discharge, wait for your battery to be at 100% before doing a long test. Do some research on NUT and its upsc and upscmd commands (upsc reads the UPS status, upscmd sends commands such as battery tests). You should find what you need.
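As a rough illustration of the short and long tests mentioned above, here is a minimal Python sketch that builds an `upscmd` call. It assumes the UPS is named `ups` in NUT (as in `upsc ups`) and that the driver exposes the standard NUT instant commands `test.battery.start.quick` and `test.battery.start.deep`; whether a given model supports them should be checked first with `upscmd -l ups`. The user name and password parameters are placeholders for whatever is configured in NUT, and the helper names are made up for this example.

```python
import subprocess

# Standard NUT instant command names; support depends on the UPS and driver.
QUICK_TEST = "test.battery.start.quick"
DEEP_TEST = "test.battery.start.deep"


def battery_test_command(ups_name, deep=False, user=None, password=None):
    """Build the upscmd argument list for a quick or deep battery test."""
    cmd = ["upscmd"]
    if user is not None:
        cmd += ["-u", user]
    if password is not None:
        cmd += ["-p", password]
    cmd += [ups_name, DEEP_TEST if deep else QUICK_TEST]
    return cmd


def start_battery_test(ups_name, **kwargs):
    """Run upscmd and return the CompletedProcess (requires NUT installed)."""
    return subprocess.run(
        battery_test_command(ups_name, **kwargs),
        capture_output=True,
        text=True,
        check=False,
    )


if __name__ == "__main__":
    # Only print the command here; call start_battery_test("ups") to run it.
    print(" ".join(battery_test_command("ups")))
```

After a test has run, the result should show up in the `ups.test.result` variable reported by `upsc`.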

As for why the server did not come back up, it could be because the UPS itself did not power back on, or because the server's BIOS is configured not to power on when AC power returns.
 

NasKar

Code:
root@freenasSuper:~ # zpool scrub v1
root@freenasSuper:~ # zpool status
  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:01:01 with 0 errors on Fri Feb  7 03:46:01 2020
config:

        NAME        STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          ada0p2    ONLINE       0     0     0

errors: No known data errors

  pool: v1
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: scrub in progress since Sun Feb  9 21:49:06 2020
        10.8G scanned at 1.35G/s, 2.11M issued at 270K/s, 19.8T total
        0 repaired, 0.00% done, no estimated completion time
config:

        NAME                                            STATE     READ WRITE CKSUM
        v1                                              ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/25d002c6-e570-11e8-895f-0cc47a086882  ONLINE       0     0     0
            gptid/269dfc75-e570-11e8-895f-0cc47a086882  ONLINE       0     0     0
            gptid/6c65eeea-28f6-11e9-b8d8-0cc47a086882  ONLINE       0     0     0
            gptid/284641fa-e570-11e8-895f-0cc47a086882  ONLINE       0     0     0
            gptid/290a719c-e570-11e8-895f-0cc47a086882  ONLINE       0     0     0
            gptid/29ea2573-e570-11e8-895f-0cc47a086882  ONLINE       0     0     0
            gptid/2aa2373b-e570-11e8-895f-0cc47a086882  ONLINE       0     0     0
            gptid/2b66cbc8-e570-11e8-895f-0cc47a086882  ONLINE       0     0     0
          raidz2-1                                      ONLINE       0     0     0
            gptid/c9d7aef5-d8e6-11e9-9cac-0cc47a086882  ONLINE       0     0     0
            gptid/caf6fc4d-d8e6-11e9-9cac-0cc47a086882  ONLINE       0     0     0
            gptid/cc12d56b-d8e6-11e9-9cac-0cc47a086882  ONLINE       0     0     0
            gptid/cd30af17-d8e6-11e9-9cac-0cc47a086882  ONLINE       0     0     0
            gptid/ce509cdf-d8e6-11e9-9cac-0cc47a086882  ONLINE       0     0     0
            gptid/cf7a7624-d8e6-11e9-9cac-0cc47a086882  ONLINE       0     0     0
            gptid/d0afc546-d8e6-11e9-9cac-0cc47a086882  ONLINE       0     0     0
            gptid/d1dbbacf-d8e6-11e9-9cac-0cc47a086882  ONLINE       0     0     0

errors: No known data errors

Thanks for the info.
Looks like the pool is ok.
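Note that the status above shows the scrub on v1 still in progress, so the final verdict only comes once it completes. A minimal Python sketch for checking that from `zpool status` text follows; the function names are made up for this example, and the sample text is trimmed from the output above.

```python
SAMPLE = """\
  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:01:01 with 0 errors on Fri Feb  7 03:46:01 2020
errors: No known data errors

  pool: v1
 state: ONLINE
status: Some supported features are not enabled on the pool.
  scan: scrub in progress since Sun Feb  9 21:49:06 2020
errors: No known data errors
"""


def summarize_zpool_status(text):
    """Collect the state, scan and errors lines for each pool."""
    pools = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("pool:"):
            current = {"pool": line.split(":", 1)[1].strip()}
            pools.append(current)
        elif current is not None:
            for key in ("state", "scan", "errors"):
                if line.startswith(key + ":"):
                    current[key] = line.split(":", 1)[1].strip()
    return pools


def scrub_finished_clean(pool):
    """True only if a completed scrub reported no errors on an ONLINE pool."""
    scan = pool.get("scan", "")
    return (
        pool.get("state") == "ONLINE"
        and scan.startswith("scrub repaired")
        and "with 0 errors" in scan
        and pool.get("errors") == "No known data errors"
    )


if __name__ == "__main__":
    for pool in summarize_zpool_status(SAMPLE):
        print(pool["pool"], scrub_finished_clean(pool))
```

On a live system, the text would come from running `zpool status` again after the scrub has finished.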

I can send a test email with /system/email/send test email.
 

troybs1d

Dabbler
Joined
Feb 7, 2020
Messages
22
If your UPS supports serial commands to turn specific outlets (or outlet groups) off and on, one option is to set up a basic ARM-based PC with a good-quality USB serial adapter so you can control the UPS when you're away from the system.

Also, what size UPS do you have? That system should draw about 250-325 watts per PSU (or less when idling), so even with both PSUs on one UPS it should run for a decent amount of time. For real-world load testing of the UPS, you can temporarily move the FreeNAS system to a basic power strip, find a device that draws about the same amount of current (e.g. an audio amplifier or a mini-fridge), and do a simple unplug test with a stopwatch (a phone app nowadays) to see whether the batteries are still holding up.
 

NasKar

Code:
upsc ups
battery.charge: 100
battery.charge.low: 10
battery.charge.warning: 50
battery.date: 2001/09/25
battery.mfr.date: 2010/08/05
battery.runtime: 1868
battery.runtime.low: 120
battery.type: PbAc
battery.voltage: 27.2
battery.voltage.nominal: 24.0
device.mfr: American Power Conversion
device.model: Back-UPS BR1000G
device.serial: xxxxxxxxxxxxxxxxxxxx
device.type: ups
driver.name: usbhid-ups
driver.parameter.pollfreq: 30
driver.parameter.pollinterval: 2
driver.parameter.port: /dev/ugen1.3
driver.parameter.synchronous: no
driver.version: 2.7.4
driver.version.data: APC HID 0.96
driver.version.internal: 0.41
input.sensitivity: medium
input.transfer.high: 147
input.transfer.low: 88
input.voltage: 119.0
input.voltage.nominal: 120
ups.alarm: No battery installed!
ups.beeper.status: disabled
ups.delay.shutdown: 20
ups.firmware: 868.L1 .D
ups.firmware.aux: L1 
ups.load: 28
ups.mfr: American Power Conversion
ups.mfr.date: 2010/08/05
ups.model: Back-UPS BR1000G
ups.productid: 0002
ups.realpower.nominal: 600
ups.serial: xxxxxxxxxxxxxxxxxxxx
ups.status: ALARM OL
ups.test.result: No test initiated
ups.timer.reboot: 0
ups.timer.shutdown: -1
ups.vendorid: 051d 

My BR1000G is connected to my server with a USB cable.
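The output above already contains some hints: `ups.status` reads `ALARM OL` and `ups.alarm` reads `No battery installed!`. A minimal Python sketch for pulling such flags out of `upsc` text might look like this. The function names are made up for this example, the sample is trimmed from the output above, and the flag meanings (OL on line, OB on battery, LB low battery, RB replace battery) are the standard NUT status flags.

```python
SAMPLE = """\
battery.charge: 100
battery.mfr.date: 2010/08/05
battery.runtime: 1868
ups.alarm: No battery installed!
ups.load: 28
ups.realpower.nominal: 600
ups.status: ALARM OL
ups.test.result: No test initiated
"""


def parse_upsc(text):
    """Turn 'key: value' lines from upsc into a dictionary of strings."""
    variables = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            variables[key.strip()] = value.strip()
    return variables


def ups_warnings(variables):
    """Return human-readable warnings based on the ups.status flags."""
    warnings = []
    flags = variables.get("ups.status", "").split()
    if "ALARM" in flags:
        warnings.append("UPS alarm: " + variables.get("ups.alarm", "unknown"))
    if "OB" in flags:
        warnings.append("UPS is running on battery")
    if "LB" in flags:
        warnings.append("UPS battery is low")
    if "RB" in flags:
        warnings.append("UPS reports the battery needs replacing")
    return warnings


if __name__ == "__main__":
    for warning in ups_warnings(parse_upsc(SAMPLE)):
        print(warning)
```

On a live system, the text would come from the `upsc ups` command instead of the sample string.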
 

troybs1d

OK, got it. A fairly standard UPS. I can't tell the current load directly, but at the reported 28% load on a 600 W unit it is about 168 watts on average. I'd expect around 10 to 15 minutes of runtime if the battery is still good, but the battery manufacture date is Aug 2010, so I'm pretty certain the batteries need to be replaced at this point in 2020. Expect 3 (maybe 5) years of battery life. Since you have dual PSUs, you can temporarily move the system to a basic power strip with no downtime. Even if you connected a 60 W incandescent light bulb to that UPS, I wouldn't expect it to last long at this time.
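The arithmetic behind those numbers can be sketched in a few lines of Python. The inputs are taken from the `upsc` output earlier in the thread (`ups.load: 28`, `ups.realpower.nominal: 600`, `battery.mfr.date: 2010/08/05`); the function names are made up for this example, and the reference date is just an assumed "today" around the time of the outage.

```python
from datetime import date


def estimated_load_watts(load_percent, nominal_watts):
    """Approximate real power draw from the UPS load percentage."""
    return nominal_watts * load_percent / 100


def battery_age_years(manufactured, today):
    """Battery age in years, using an average year length of 365.25 days."""
    return (today - manufactured).days / 365.25


if __name__ == "__main__":
    load = estimated_load_watts(28, 600)
    age = battery_age_years(date(2010, 8, 5), date(2020, 2, 10))
    print(f"Estimated load: {load:.0f} W")
    print(f"Battery age: {age:.1f} years")
```

A battery more than nine years old is well past the 3 to 5 year range mentioned above.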

I'm a fan of the CyberPower OR1500LCDRT2U / OR2200LCDRT2U (you can get a NEMA 5-20 to 5-15 adapter for about $15), as they have both USB and serial connectivity. If you happen to run or convert to VMware ESXi, they provide a premade, web-configurable VM that can gracefully shut down the hypervisor server (and thus your FreeNAS VM). You can also control outlet groups through serial/USB.
 