FreeNAS that won't POST after 5 years

sahuagin2

Cadet
Joined
Sep 30, 2019
Messages
4
Hoping to get some suggestions on how to diagnose what I need to fix.

tl;dr: it was working, now it's dead. Won't POST. IPMI sort of works.

Purchased FreeNAS Mini July 2014.
Upgraded a few years ago to 32GB memory.
Still running original boot drive.
Has 4x4TB HGST RED drives. Z1 configuration.
250GB SSD, not in use.
Chelsio 10Gb T5 card, one connection.
Intel Atom CPU C2750 @ 2.40GHz, 8 cores
8GB swap, usually full; Plex seems to have a memory leak.
Exporting NFS, Samba, AFP/Time Capsule.
Jails running: Redmine, git, Prometheus/Grafana, Plex

ada0: HGST 4TB
ada1: HGST 4TB
ada2: Crucial 250GB
ada3: 16GB SATA Flash Drive SFDE001A
ada4: M4-CT512M4SSD2 (500GB Crucial SSD?)
ada5: HGST 4TB
ada6: HGST 4TB


Symptoms: I've been running disk-intensive operations for the last few days. Today I noticed that one program was hung waiting for NFS. I couldn't get to the FreeNAS GUI, and ping wasn't responding. I rebooted the FreeNAS via the power button. It power-cycles but doesn't POST: there is no signal on the VGA connection, and the drive indicators don't light up to show drive activity.
I tried a second monitor. I haven't tried a second cable.
IPMI is reachable via my iPhone application. I can get limited information and reboot the device from IPMI.

What's been running on the machine:
A program on NFS that ungzips a few hundred GB of files and processes the data. It has been running for 3 days; it is memory- and swap-intensive on the host and uses little CPU.
Rsync from a laptop of about 30GB of pictures (this likely would have triggered Plex to reindex as well).
Time Capsule backup.

Remote notification information.
Sept 18: A new update is available. Version FreeNAS-11.2-U6
Sept 18: zfs memory throttle: 1 event - the number of times ZFS had to limit the ARC growth in the last 10 minutes
Sept 19: Security run. swap_pager_getswapspace: failed
Sept 19: starting scrub of pool 'freenas-boot'
Sept 19: scrub of pool 'freenas-boot' finished
Sept 20-21: "swap_pager_getswapspace: failed"
Sept 22: starting scrub of pool 'volume1'
Sept 22: ipv4 tcphandshake last collected sec = 00:00:13 ago (shows that the machine was slow)
Sept 22: Recovered: ipv4 udperrors last collected sec(was warning for 13 minutes and 31 seconds)
Sept 22: 10min cpu usage = 86.2%
Sept 22: 10 min disk utilization ada[0,1,5,6] - 91%, 90.6%, 90.4%, 90.9% the percentage of time the disk was busy during the last 10 minutes
Sept 22: ipv4.tcphandshake last collected sec = 00:00:11 ago
Sept 22: ipv4.udperrors last collected sec = 00:00:11 ago
Sept 22: Recovered: 10 min cpu usage (was warning for 1 minutes)
Sept 22: Recovered: ipv4.tcphandshake was warning for 9 minutes and 54 seconds
Sept 22: Recovered: ipv4.udperrors was warning for 9 minutes and 54 seconds
Sept 22: Recovered: 10 min disk utilization ada[0,1,5,6] - was warning for 27m, 7s, 24 m, 7s, 20m 7s, 25m 7s
Sept 22: needs attention 10 min disk utilization ada[0,1,5,6] - 10 min disk utilization, % time the disk was busy, 90,91,90.1,90.3%
Sept 22: Recovered: 10 min disk utilization ada[0,1,5,6] - 1h10m, 35m, 28m, 21m
Sept 22: 10min disk utilization ada[0,1,5] 90,92.6,02.1% (note, ada6 doesn't show up here)
Sept 22: recovered ada5 was warning for 18 minutes
Sept 22: Space usage for pool "freenas-boot" is 86%. Optimal pool performance...
Sept 22: Recovered 10m disk utilization ada3, was warning for 11 minutes
Sept 22: A new Update is available. Go to System -> Update to download and apply.
Sept 22: scrub of pool 'volume1' finished
Sept 22: recovered ada[0,1] 10 min disk utilization, was warning for 3 hours and [21,18] minutes



Sept 23: Changes in mounted filesystems. (I take this to mean that I updated before this message). /mnt/iocage/releases/11.2-RELEASE... freenas-boot/ROOT/11.2-U6@2019-9-22-10:00:59
Sept 23: Space usage for pool "freenas-boot" is 86%
Sept 24: Changes in mounted filesystems. iocage to 11.2-U6, as above
Sept 27: starting scrub of pool 'freenas-boot'
Sept 27: scrub of pool 'freenas-boot' finished
Sept 28: @4:15 AM security check
Sept 28 @ 9:36 PM: Unscheduled system reboot: The operating system successfully came back online at Sun Sept 29 01:26:20 2019. (Note that the email is predicting the future.)

Sept 28-Sept 30 - I think the program I'm running is taking a long time, so I leave it.

Sept 30: I note that the program has 0 CPU usage and is waiting in rpcconnect, connect, even though the program itself does no networking. So I start to investigate.
Sept 30: Reboot machine with power button. Can't get a signal to the monitor to see the BIOS. Machine doesn't boot to the OS. Drives are not spinning up.
Sept 30: Try a different, newer monitor, no signal.
Sept 30: IPMI connects. I can operate power control and see LAN settings. FRU shows limited info: manufacturer ASRock, Mfg. Date/Time 2012/07/27 17:31:00
: Sensors, Event Log, and Health Check all return "Undefined completion code = CAh"
: Power Supply doesn't return an error but doesn't show any information.
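
For anyone trying to interpret the `CAh` value: it looks like one of the generic completion codes defined in the IPMI specification, where it means the BMC could not return the requested number of data bytes. The sketch below is only a lookup aid, not a diagnostic tool. The table is a partial subset of the generic codes, and `describe_completion_code` is a hypothetical helper name chosen for illustration.

```python
# Partial table of generic IPMI completion codes (subset only).
IPMI_COMPLETION_CODES = {
    0x00: "Command completed normally",
    0xC0: "Node busy",
    0xC1: "Invalid command",
    0xC3: "Timeout while processing command",
    0xC9: "Parameter out of range",
    0xCA: "Cannot return number of requested data bytes",
    0xCB: "Requested sensor, data, or record not present",
    0xCC: "Invalid data field in request",
    0xD3: "Destination unavailable",
    0xD5: "Command not supported in present state",
    0xFF: "Unspecified error",
}


def describe_completion_code(text):
    """Decode a hex completion code written as 'CAh', '0xCA', or 'ca'."""
    cleaned = text.strip().lower()
    if cleaned.startswith("0x"):
        cleaned = cleaned[2:]
    elif cleaned.endswith("h"):
        cleaned = cleaned[:-1]
    code = int(cleaned, 16)
    meaning = IPMI_COMPLETION_CODES.get(code, "not in this table")
    return f"0x{code:02X}: {meaning}"


print(describe_completion_code("CAh"))
```

On its own this code does not identify a failed component. It only suggests that the BMC answered the request but could not supply the sensor or log data that was asked for.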


I suspect that it's dead, but if I can diagnose which component failed, I can avoid replacing the entire thing. I'd appreciate your thoughts on what else to try.

Thank you.
 

Jailer

Not strong, but bad
Joined
Sep 12, 2014
Messages
4,977
Contact iX; the Avoton boards in the Minis are known to die after a period of time.
 

sahuagin2

Cadet
Joined
Sep 30, 2019
Messages
4
Thanks, I'll give that a shot.

I've been able to get the IPMI web page to respond. It seems to indicate that it can't detect a CPU fan speed and that the CPU temperature is 123C.
 