Fatal Trap 9 While Installing

Dlauth · Jun 6, 2015

I ran into some issues with my last FreeNAS build on this hardware so I am reinstalling.

Now, when I try to boot up the install ISO, I am getting a Fatal Trap 9: general protection fault while in kernel mode. From looking this up, it is due to using an AMD cpu.

Does anyone know of a way to resolve this?

Hardware:
ASRock QC5000-ITX AMD FT3
http://www.newegg.com/Product/Produ...&cm_mmc=TEMC-RMA-Approvel-_-Content-_-text-_-

jgreco · Jun 6, 2015

There's suggestions for testing and burning in your FreeNAS system over in the hardware forum.

Dlauth · Jun 8, 2015

Anyone?

jgreco · Jun 8, 2015

There's suggestions for testing and burning in your FreeNAS system over in the hardware forum.

https://forums.freenas.org/index.php?threads/building-burn-in-and-testing-your-freenas-system.17750/

Dlauth · Jun 8, 2015

This is due to having an AMD cpu though.

jgreco · Jun 8, 2015

Is it? Because it doesn't sound like a foregone conclusion to me. You indicated that you had "some issues" with your last FreeNAS build on the same hardware. In my mind, that sounds like something may have gone awry, burnt up, or become otherwise marginalized. Despite being an AMD fan for many years (I've still got AMD DX4-100 ADZ's in stock, talk about a rare CPU!) I have to tell you that their CPU's have been consistently a little less robust and that servers built around their CPU's have been a little less pleasant to use over the years.

jgreco · Jun 8, 2015

AMD's CPUs like to burn.

Dlauth · Jun 8, 2015

Your right, I was just thinking maybe the system updated and the latest version of FreeNAS had some issues with AMD APUs.

Running memtest now and it looks like I have a bunch of errors. Does this specifically give errors for RAM or would CPU issues come into the test also?

jgreco · Jun 8, 2015

If you're running memtest and you have a bunch of errors, that means your system is not able to run FreeNAS and be stable. Since your platform doesn't support ECC, it could easily be spewing bad bits into your pool, which ZFS has no way to recover from.

1) This is why we encourage people to run ZFS on appropriate hardware, specifically including ECC.

2) This is why no one bothered to answer you after I gave you what was very likely the correct answer. Unless you can search the forum and actually find instances where other people had the same problem, the people here probably aren't aware of any "problems after an update."

3) In the future, a more detailed problem report (see the Forum Rules, link in red at the top of every page) might get your issue slightly more interest, although, in this case, maybe not, since GPF's are a strong indicator of hardware-screw-ed-ness, and I think everyone here would probably have said "test that hardware."

Dlauth · Jun 8, 2015

so bad mem is possibly the issue.

Since ZFS cannot recover from this, im guessing my data is now corrupt?

jgreco · Jun 8, 2015

Well, no, the problem is that we don't know.

ZFS assumes the host computing platform is trustworthy. ZFS caches a lot of data in RAM.

Now what CAN happen is that bits can rot while in memory, and then if ZFS takes that block, makes an update, and pushes it out to the pool, it'll even look like a valid block because a new checksum will have been calculated for it.

If that's a file data block, then, yes, the file data is corrupted, but that's pretty much the end of it.

The real nightmare is if it is pool metadata - which is strongly likely to remain cached in RAM. So I'm going to give you a trivial hypothetical scenario here, and I need you to understand that this is admittedly contrived. Don't argue the point, listen to the logic behind it. So you have a pool metadata block that holds the free block list. It showed that block 123 was allocated. Now as it happens, that held inode 4's data (that's the root directory inode). A RAM error flips that block from "allocated" to "free". A subsequent write flushes that out to disk. Nothing seems to be wrong. You continue to safely fill the disk with data, because that block is one lonely block and ZFS likes to allocate contiguous ranges of blocks. You fill the pool, 60, 70, 80% full... then one day the pool has sufficiently few blocks on it that ZFS "allocates" block 123 for file data. Data gets written.

Suddenly every frickin' file in your ZFS pool is "gone", because inode 4 got stomped, and inode 4 was the linchpin for the whole damn filesystem.

Conventional filesystems have utilities to help detect badness. FFS calls it "fsck" and NTFS calls it "chkdsk". These can't always detect or fix badness, but they're a necessary evil because hard disks do develop bad blocks and this is their recovery strategy.

ZFS's recovery strategy is to be able to use checksums to identify when data has rotted, and then to be able to pull that data from redundancy. ZFS should never NEED a fsck utility - pool blocks are not supposed to go bad. And ZFS is supposed to be able to reliably detect and correct hard drive blocks that have gone bad.

By running ZFS on a non-ECC system, you eliminate the "reliably detect" capability and introduce new opportunity for undetected corruption.

So ... the problem. There's a good chance no permanent damage was done to your pool. Unfortunately, there's also a reasonably large chance that some damage WAS done to your pool, and it is also possible that undetectable damage was done to the pool. Damage to the pool might come back someday to haunt you, as I outlined above. Or it might not.

The safe thing to do, at this point, if you care about the data, is to use some non-ZFS-replication method to copy all the data off the pool, destroy the pool, recreate the pool, and then reload your pool. This gives you a known good state for the metadata. Any damage done to the file data has already been done, and is up to you to find that on your own.

There have been various arguments about how likely the various factors in all of this actually are. It is like debating how many angels can dance on the head of a pin. I'm just telling you what I know the possibilities to be.

Dlauth · Jun 8, 2015

It looks like the RAM just needed to be reseated.

Should I still transfer the data off and form a new pool?

jgreco · Jun 8, 2015

The risks are as previously outlined. I am not omniscient and cannot tell you with certainty that your metadata is intact.

Dlauth · Jun 8, 2015

No worries.

I do have previous jails that were associated with plugins. Is it possible to specify jails when installing plugins again or do I have to delete my old jails and start again?

jgreco · Jun 8, 2015

No clue, sorry, my FreeNAS boxes are NAS-only.

Dlauth · Jun 8, 2015

Ok so I thought I got it working since I was able to set everything up and was actually in the middle of a movie.

The movie stopped so I went back to check the NAS and it stated checksum failed and after trying to boot a few times, I am now getting the same error with rescue mode now.

Ideas?

jgreco · Jun 9, 2015

I think I don't trust your hardware. I fail to see how I didn't totally hit a home run with my initial answer in this thread. Take the box offline and put it through some continuous thorough hardware testing until you see no failures for a straight month. You should not be able to get errors from CPU (cpuburn), memory (memtest86), or disks (my disk testing script, etc). That is decades of experience in the industry talking. Those same decades tell me that you're unlikely to follow the advice, so I'm checkin' out of this thread.

Important Announcement for the TrueNAS Community.

Fatal Trap 9 While Installing

Dlauth

Dabbler

jgreco

Resident Grinch

Dlauth

Dabbler

jgreco

Resident Grinch

Dlauth

Dabbler

jgreco

Resident Grinch

jgreco

Resident Grinch

Dlauth

Dabbler

jgreco

Resident Grinch

Dlauth

Dabbler

jgreco

Resident Grinch

Dlauth

Dabbler

jgreco

Resident Grinch

Dlauth

Dabbler

jgreco

Resident Grinch

Dlauth

Dabbler

jgreco

Resident Grinch

Similar threads