danb35 (Hall of Famer; joined Aug 16, 2011; 15,504 messages)
Since updating from 11.2 to 11.2-U1, my FreeNAS server has rebooted on its own five times, once after I had reverted to 11.2. There's nothing obvious in the logs either.
Background:
I bought a few WD Easystore drives to shuck and resilver into my pool to expand it. After spending most of the last week running badblocks on them (which completed with no issues), I resilvered two of them into the pool yesterday. That ultimately completed, though it took much longer than I'd expected (perhaps resilvering two drives at the same time wasn't a good plan). Since I have plenty of spare bays in my server, I had put the two replacement drives in two of them. So last night, I pulled the two replaced drives out of my system so I could move the new drives into their "permanent" slots. The new disks were showing in the GUI as da20 and da21. For da21, I offlined it through the GUI, ran "sesutil locate da21 on" to make sure I was pulling the right one, pulled the disk, put it into the right bay, onlined it through the GUI, and ran "sesutil locate all off" (I didn't use "da21 off" because da21 was no longer in the same place, and I thought it might get confused). At this point, all the system fans stopped, and eventually the locate light turned off. After giving da21 a few seconds to resilver, I moved da20 using a similar process, though this time I ran "sesutil locate da20 off" once I'd confirmed da20's location.
The system fans hadn't restarted yet, but I expected they'd start back up as temps rose; the server's in a pretty cool location right now (a building detached from my house). About an hour later, I found that my assumption was wrong when I got a temperature alert email for a couple of my disks. So I logged into the IPMI admin page, set the fans to a higher speed, and checked again after a while. Surprisingly, disk temps didn't seem to be dropping. Since the system was prompting me to update to 11.2-U1 anyway, I went ahead and ran that update. Once it completed, I powered down the server, powered it back up, and my temperature problems appeared to be solved.
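For reference, the per-disk move described above (offline, light the locate LED, pull and re-seat, online, clear the LEDs) can be sketched from the shell. This is a minimal sketch that swaps the GUI offline/online steps for the plain "zpool offline" / "zpool online" commands; it is not necessarily identical to what the FreeNAS GUI runs. The pool name, member label, and device passed to it are hypothetical placeholders, and DRY_RUN=1 only prints the commands.

```shell
#!/bin/sh
# Sketch of moving one pool member to a different bay, mirroring the steps above.
# Arguments are hypothetical placeholders, not values from this system:
#   pool   - pool name as shown by `zpool status`
#   member - the member's label in `zpool status` (FreeNAS pools typically use gptid/... labels)
#   dev    - the device node sesutil uses to find the bay, e.g. da21
# With DRY_RUN=1 the commands are printed instead of executed.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

move_disk() {
  pool=$1
  member=$2
  dev=$3

  # Take the member out of service so ZFS stops using it.
  run zpool offline "$pool" "$member"
  # Light the locate LED on the disk's current bay.
  run sesutil locate "$dev" on

  # Physical step: pull the lit disk and insert it in its new bay.
  if [ "${DRY_RUN:-0}" != 1 ]; then
    printf 'Move %s to its new bay, then press Enter: ' "$dev"
    read -r _
  fi

  # Bring the member back; ZFS resilvers whatever changed while it was offline.
  run zpool online "$pool" "$member"
  # The device may now sit in a different bay, so clear every locate LED.
  run sesutil locate all off
}
```

For example, after sourcing the file, "DRY_RUN=1 move_disk tank gptid/example da21" prints the four commands in order without touching any pool.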
When I got up this morning, I had three emails about uncommanded system restarts (side note: those emails give the time in UTC, not local time, which is very confusing). Checking system uptime confirmed that the system really had restarted. Not good. A little later in the morning, it happened again. Figuring it had something to do with 11.2-U1, I reverted to 11.2-RELEASE and rebooted. About an hour ago, the system rebooted again.
Troubleshooting:
The IPMI event log shows nothing at all from the relevant timeframe. The system is attached to a 3 kVA UPS, which shows no history of power issues during the relevant period. The system has dual redundant power supplies, each with far more capacity than is needed to run it in its current configuration. The system log doesn't appear to have anything interesting either: there's simply nothing at all for the 3.5 hours preceding the last reboot. Here's the log from that reboot (too big to include, pastebin here).
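The checks above can be gathered in one pass from the shell after the next unexpected reboot. This is a rough sketch, assuming FreeBSD-style paths (/var/log/messages) and that ipmitool is installed with local access to the BMC; each check is skipped with a message if the tool or file is missing, and the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: collect uptime, reboot records, the IPMI event log, and recent syslog lines.
# Assumptions: /var/log/messages is the syslog file and ipmitool can reach the local BMC.
# Missing tools or files are reported and skipped rather than treated as errors.

section() { printf '=== %s ===\n' "$1"; }
have() { command -v "$1" >/dev/null 2>&1; }

collect_reboot_info() {
  section "Uptime"
  if have uptime; then uptime; else echo "uptime not available"; fi

  # "reboot" is the pseudo-user that last(1) logs at each system boot.
  section "Reboot history"
  if have last; then
    last -n 10 reboot 2>/dev/null || echo "no login accounting records"
  else
    echo "last not available"
  fi

  # The BMC's system event log, where hardware-level resets would normally appear.
  section "IPMI system event log"
  if have ipmitool; then
    ipmitool sel elist 2>/dev/null || echo "could not read SEL"
  else
    echo "ipmitool not available"
  fi

  section "Last lines of syslog"
  if [ -r /var/log/messages ]; then
    tail -n 50 /var/log/messages
  else
    echo "/var/log/messages not readable"
  fi
}
```

Running "collect_reboot_info > reboot-report.txt" right after a reboot keeps a snapshot that can be compared across incidents.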
System configuration:
SuperMicro SuperStorage Server 6047R-E1R36L (Motherboard: X9DRD-7LN4F-JBOD, Chassis: SuperChassis 847E16-R1K28LPB)
2 x Xeon E5-2670, 128 GB RAM, Chelsio T420E-CR
Pool: 6 x 6 TB RAIDZ2, 6 x 4 TB RAIDZ2, (2 x 2 TB + 4 x 3 TB) RAIDZ2
Jails: Plex Media Server, Urbackup, Transmission/SABNZBd+/Sonarr/Radarr, BOINC
APC SUM3000RMXL2U UPS + SUM48RMXLBP2U battery pack
I'm kind of at a loss here, and not at all happy with a suddenly unstable system. Thoughts?