Python 3.6 crash - Failed to Fully fault in core file segment

Sethbest · Nov 4, 2019

Hi there,
I've had this server up and running for a few years now, going through regular updates and replacing failing drives, running mostly as an SMB share and hosting my Plex server. I have a 5-disk RAIDZ2 setup.

Recently I had a drive failure, so I shut the server down from the web console, swapped in a replacement drive and booted it up. The web UI never came back up, so I went to my server room (closet) to check. There was nothing but a spew of errors (see below). No commands did anything, and an occasional line scrolled past about failing to start the middleware or about the Ethernet link going up and down. It didn't look like it had loaded my storage pool, and I couldn't get the thing to shut down. So I nervously hard-rebooted it, hoping the storage wasn't mounted so I wouldn't corrupt the volume. It booted back up with the same errors but eventually did get to the usual 1-12 console menu. It never got back online (no web UI), so I shut it down safely this time since it let me (typing 11, then y). I pulled the new disk, thinking it was screwing things up, and plugged the old failed drive back in. Same behavior.

Failed to fully fault in a core file segment at VA 080064e00 with pid 302 (Python 3.6), uid -: exited on signal 11 (core dumped)
VM_fault : Pager read error. pid 384 (python 3.6)

This repeats, with the PIDs generally increasing the longer it runs.

I've SSH'd into it but I'm reaching the edge of my experience (I'm a Windows sysadmin, my Linux familiarity is pretty low).

Luckily I've got a backup of about 80% of my media (though stupidly not the other 20%), so hopefully my ZFS volume is still healthy somewhere in there. I'd appreciate any tips, including how to get some useful logs over SSH (I've copied and pasted a chunk of what I could output into the attached txt file).
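For anyone staring at the same wall of repeated messages, here is a minimal Python sketch of one way to summarize them once the console output has been pasted into a text file. It assumes each crash line contains a fragment shaped like `pid 302 (Python 3.6)`, as in the two lines quoted above; the real FreeNAS console format may differ, and the function name `summarize_crashes` is made up for illustration.

```python
import re

# Assumption: crash lines contain "pid <number> (<process name>)",
# matching the two lines transcribed in this post.
PID_PATTERN = re.compile(r"pid (\d+) \(([^)]+)\)", re.IGNORECASE)


def summarize_crashes(console_text):
    """Group the PIDs seen in crash lines by lower-cased process name."""
    pids_by_process = {}
    for line in console_text.splitlines():
        for pid, name in PID_PATTERN.findall(line):
            pids_by_process.setdefault(name.lower(), []).append(int(pid))
    return pids_by_process


sample = (
    "Failed to fully fault in a core file segment at VA 080064e00 "
    "with pid 302 (Python 3.6), uid -: exited on signal 11 (core dumped)\n"
    "VM_fault : Pager read error. pid 384 (python 3.6)\n"
)
summary = summarize_crashes(sample)
for process, pids in summary.items():
    print(f"{process}: {len(pids)} crash lines, PIDs {min(pids)} to {max(pids)}")
```

Seeing a steadily climbing list of PIDs for one process name suggests the same program is being restarted and crashing over and over, rather than many different programs failing.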

Thanks.

Sethbest · Nov 4, 2019

Tooling around a bit, I managed to FTP into the server and sure enough my storage array is fine, so really this is just a matter of what to do with these errors. Any slight direction is helpful, thanks!

Sethbest · Nov 5, 2019

Since I can't find this Python crash posted about anywhere else, I'm assuming at this point that it's either:

1. A weird one-off error resulting from a corrupt configuration
2. A bug resulting from my last update (this might be my first time rebooting since then)

pschatz100 · Nov 6, 2019

Please post a description of your system. This is required by forum rules because it is difficult to offer comments without knowing anything about your hardware or what version of FreeNAS you are running. Are you booting off a USB thumb drive? Problems like this often pop up when the boot device is starting to fail.

Most likely, your data is OK.

Sethbest · Nov 6, 2019

Lenovo ThinkServer, i3 processor, 20 GB ECC RAM, 5x 4 TB 7200 rpm 3.5" drives. Config is on an 8 GB USB drive.

The version is 11.2, I think, but I'm not sure how to check the exact subversion since the UI doesn't come up.

Since I can get to the shell, is there a command to export a config backup? I have an old one backed up from somewhere in the 10 version, but I don't know if that will work.

dlavigne · Nov 6, 2019

Have you tried installing the same version to a USB stick to see if that boots? If it does, your original boot device died.

Sethbest · Nov 6, 2019

dlavigne said:
Have you tried installing the same version to a USB stick to see if that boots? If it does, your original boot device died.

I can certainly try this, but while it will rule out a hardware issue, won't it just boot into a blank FreeNAS configuration? I had always assumed that all of the configuration was on the USB stick.

dlavigne · Nov 6, 2019

Yes. But if you can boot you can retrieve the latest auto-saved version of the config. See the Save Config section of https://www.ixsystems.com/documentation/freenas/11.2-U6/system.html#general for details.

pschatz100 · Nov 6, 2019

It can't hurt to try @dlavigne's suggestion. Booting into a clean installation will not harm your data. If you can restore the auto-saved config, then you will be in good shape.

If you cannot restore the configuration, you can always re-enter the information manually. Yes, it will be annoying to do this.

Sethbest · Nov 7, 2019

Thanks for the tips. I'm in the process now. I tried setting it up on a 32 GB USB drive yesterday and it took around 4 hours, appearing to hang at different points (the install failed the first few times, so I tried installing from another system). I plugged it into my server to boot right before going to sleep (pretty late at this point) and it was so slow to boot that I gave up and figured I'd just check it in the morning. It looked like it was up and running at that point, but I didn't know its IP, so I tried the "1" command to configure the Ethernet settings. This just brought me to an "are you sure you'd like to delete the current network settings" prompt; I'd say "yes", it would say "unable to access web interface", and then it would go back to the regular console.

Given how weird the install onto the USB stick was, I scrapped that whole attempt and am trying with a 16 GB stick I found instead (I found some forum posts about hanging installs being a result of using too large a USB drive). I just got it installed onto the drive at work and I'll plug it into my server to boot when I get home.

Are the initial installs hardware agnostic? I've been trying to prep the USB drive on systems that are more convenient to work on, figuring I can plug it into the server for its first boot-up and be fine (my server is in a low-ceiling closet and I sprained my back this weekend, so it's been brutal crouching in there even as little as I have so far).
---
My janky boot at home (I haven't brought the new USB stick home yet) finally generated a web UI, so I looked around for a way to find the autosaved config. I couldn't figure out how to get to the "system dataset" though, so I just loaded in my old config backup (looks like 9.10.1-U4). The UI hasn't come back up yet, so I don't know if this worked or not.

Any tips for finding the "system dataset"? I can probably download the file via FTP if I know where it is.
---
Never mind, I found it in another post:

/var/db/system/configs

I'll check there when I get home.

pschatz100 · Nov 7, 2019

Are you saying that it took 4 hours to boot or four hours to create the boot device? It should not take a long time to create the boot device.

Again, without any information about the hardware you are trying to boot, it is impossible to comment. You could be trying to boot on an incompatible hardware configuration.

Hope you were able to find the auto-saved configuration file. It will save you some effort once you get past your hardware issues.

Sethbest · Nov 8, 2019

pschatz100 said:
Are you saying that it took 4 hours to boot or four hours to create the boot device? It should not take a long time to create the boot device.

Again, without any information about the hardware you are trying to boot, it is impossible to comment. You could be trying to boot on an incompatible hardware configuration.

Hope you were able to find the auto-saved configuration file. It will save you some effort once you get past your hardware issues.

Four hours to create the boot device, an unknown amount of time to boot. My 16GB stick that I made yesterday was done in about 30 minutes and boots in less than 5, so the issue with those delays was the 32GB USB stick.

I've got it booted up at home now with an accessible Web UI so I'll experiment with capturing and uploading the autosave config file today as I get time to remote into my home network from work.

I did post hardware info; here it is repeated:
"Lenovo ThinkServer, i3 processor, 20 GB ECC RAM, 5x 4 TB 7200 rpm 3.5" drives. Config is on an 8 GB USB drive."

I didn't dig through my old receipts to get exact model numbers for the processor or the frequency of the RAM or anything, but as I've had this server running for a few years with only the occasional issue like this, I doubt there's an incompatibility rearing its head now.

Thanks for the continued help.
---

I am unable to find the autosave configuration file. There is no "system" directory in /var/db.

I'm searching but haven't found any alternate locations listed in other forum posts yet.

Sethbest · Nov 8, 2019

OK, I think I'm missing something here. From what dlavigne suggested, I should be able to boot to a clean-config USB and access the autosaved config from the system dataset.

So either

1. This location is actually on my old failing USB drive
or
2. It's in the root of the mounted zpool, but on a blank config there is no zpool and the /mnt directory is empty.

So I either need to try to get this file off my old USB drive or find a way to import my old disk pool. Is that right?
---
Imported my old pool; that worked rather painlessly. The configs folder, though, doesn't contain any backup files to restore, just a bunch of .db files which I don't know what to do with.
---
Never mind, the .db files are accepted as config backups, neato! I'm uploading it now.
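
For anyone landing here later, a rough Python sketch of picking the most recent of those .db files once the pool is imported. It assumes the FreeNAS config backups are ordinary SQLite databases (so they start with the standard SQLite file header) and that the newest modification time is the one you want; the helper names are made up, and the directory is whatever your configs folder turns out to be (the other post mentioned /var/db/system/configs).

```python
from pathlib import Path
from typing import Optional

# Every SQLite 3 database file begins with this 16-byte header.
SQLITE_MAGIC = b"SQLite format 3\x00"


def is_sqlite_file(path: Path) -> bool:
    """Return True if the file starts with the SQLite 3 header."""
    try:
        with path.open("rb") as fh:
            return fh.read(16) == SQLITE_MAGIC
    except OSError:
        return False


def newest_config_backup(configs_dir: str) -> Optional[Path]:
    """Return the most recently modified SQLite .db file under configs_dir.

    Searches subdirectories too, since the layout of the configs folder
    is an assumption here. Returns None if nothing suitable is found.
    """
    base = Path(configs_dir)
    if not base.is_dir():
        return None
    candidates = [
        p for p in base.rglob("*.db")
        if p.is_file() and is_sqlite_file(p)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

Calling `newest_config_backup("/var/db/system/configs")` from a Python prompt on the server would give the path to download and then upload through the web UI's config upload option. Files that merely end in .db but are not SQLite are skipped.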

Sethbest · Nov 8, 2019

Alright, my server is back up and running! Thanks for the direction. This was a pain but a useful learning experience too.

pschatz100 · Nov 9, 2019

Glad you are back up and running. As a courtesy to others who might find themselves in a similar situation, it would be nice if you would summarize what happened and mark the thread solved.

By the way, frustrations like this with USB flash media are some of the reasons why many users are migrating to small SSDs as boot devices.
