Brand-new build dead / no POST after 1 week

Status
Not open for further replies.

Jared Potter

Dabbler
Joined
Jan 30, 2015
Messages
18
Hello all, I'm new to this forum. I usually direct my questions/problems to the NCIX forum, but since I'm new to FreeNAS and NASes in general, I figured I should post here as well, even though the problem is mostly hardware. Also, some "security" emails have been sent each night that might be the system telling me there was a problem, though I haven't yet been able to figure out what most of the data means. (Yay, learning!) Thanks for your help, and sorry, this is copied and pasted from the other post and hastily edited.

So last week, to the day, I built my first NAS using FreeNAS. Everything's been working pretty smoothly. I'm learning as I go along, and I get a nightly report emailed to me with mostly useless information (to me, at least). A couple of times I've gone to access the NAS and found it powered down, but with no real information about what happened, I just assumed it crashed. It powers up and resumes without a problem. SMART checks and scrubs all come back clean, but a couple of times I've received a "security report" with this:

Quote:

> (ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 00 18 a2 60 40 a5 00 00 01 00 00
> (ada1:ahcich1:0:0:0): CAM status: Uncorrectable parity/CRC error
> (ada1:ahcich1:0:0:0): Retrying command

This came up three times, once each night in a row.
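Since the advice below hinges on how many of these errors show up (one versus dozens or hundreds), a quick tally per disk can help. The following is a minimal sketch, assuming the report lines look exactly like the ones quoted above; `count_crc_errors` is a hypothetical helper, not part of FreeNAS.

```python
import re
from collections import Counter

# Matches CAM lines like the ones quoted above, e.g.
# "(ada1:ahcich1:0:0:0): CAM status: Uncorrectable parity/CRC error".
# Assumption: the device name is the first field inside the parentheses.
CRC_LINE = re.compile(r"\((\w+):[^)]*\): CAM status: Uncorrectable parity/CRC error")


def count_crc_errors(report_text: str) -> Counter:
    """Count CRC error lines per device (e.g. 'ada1') in a report."""
    counts = Counter()
    for line in report_text.splitlines():
        match = CRC_LINE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    sample = """\
(ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 00 18 a2 60 40 a5 00 00 01 00 00
(ada1:ahcich1:0:0:0): CAM status: Uncorrectable parity/CRC error
(ada1:ahcich1:0:0:0): Retrying command
"""
    print(count_crc_errors(sample))  # Counter({'ada1': 1})
```

Running it over each night's report would show whether the errors stay at one per night on a single disk or start spreading to other disks after a cable swap.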

From my research, most people say it's probably a faulty cable, but they also report dozens or hundreds of these errors. I only get one, and when I ran a scrub and a short SMART test, both came back clean. They recommended replacing the SATA cable as an easy first step. That was what I was going to do today before the system failed.

This morning I woke up to again find the NAS powered down, but this time at 03:00 and with a big long security report. Some of it I understand, since it's sort of a summary of the changes I've made, like adding and removing the media servers. But this time, when I go to turn it back on, I just get an infinite power cycle: on for 1 second with lights and fans spinning, then off, repeating over and over.
So I took it apart and did a paperclip test on the PSU, which powered on and idled nicely. I reseated everything and rerouted the cables to tidy them up a bit more, but no dice. The system immediately starts power cycling again. Is the mobo dead?! I spent a tonne of money on this system, and I've never had a system die within a week before.

Specs:
ASRock E3C226D2I server motherboard

Kingston KVR16E11/8 8GB DDR3-1600 ECC CL11 DIMM W/TS Memory x2

Intel Pentium G3220 Dual Core 3.0GHz Processor LGA1150 Haswell 3MB Cache Retail

Corsair CX Series CX600M 600W ATX 12V 80 Plus Bronze Modular Power Supply

6x 3TB WD Red NAS HDDs, configured in a RAIDZ2 setup, which have been working really well

as well as a 16GB Kingston flash drive for the FreeNAS OS.

Everything's connected to a 1000VA UPS.

I know the PSU is a bit on the overpowered side, but it was what I needed to get enough SATA power cables.

After the system failed to boot/POST, I swapped ada1's SATA cable as the forums suggested anyway, and also replaced a second cable that was badly bent sideways from the chassis closing and pushing against it, but I can't see the HDDs/cables being the reason the system won't POST.

Thanks!
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
First of all, is there anything in the IPMI event log?
 

Jared Potter

Yes, I just learned about IPMI from another forum user and set it up to see what was going on. This is what the event log shows each time the system reset:
VCCIO_OUT | Voltage | Lower Non-Recoverable - Going Low - Asserted
I tried another PSU and no luck same error.
 

Ericloewe

What's the threshold set to?

If you need some help getting the information, you can adapt this guide to help you.
 

Jared Potter

VCCIO_OUT: Not Available | Lower Non-Recoverable
Thresholds for this sensor

  • Lower Non-Recoverable (LNR):0.849
  • Lower Critical (LC):0.9
  • Lower Non-Critical (LNC):0.94
  • Upper Non-Recoverable (UNR):1.27
  • Upper Critical (UC):1.209
  • Upper Non-Critical (UNC):1.15
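To read the event log entry against these thresholds: "Lower Non-Recoverable - Going Low - Asserted" roughly means the VCCIO_OUT reading dropped to or below the LNR value of 0.849 (presumably volts). The sketch below classifies a reading into the bands listed above. It assumes lower thresholds trip at or below their value and upper thresholds at or above; a given BMC's exact comparison may differ, and `classify` is a hypothetical helper, not an IPMI tool.

```python
from typing import Optional

# Thresholds copied from the VCCIO_OUT sensor page above (presumably volts).
VCCIO_OUT_THRESHOLDS = {
    "LNR": 0.849,
    "LC": 0.9,
    "LNC": 0.94,
    "UNC": 1.15,
    "UC": 1.209,
    "UNR": 1.27,
}


def classify(reading: Optional[float], t: dict) -> str:
    """Return the most severe threshold band a reading falls into.

    Assumption: a lower threshold is crossed at or below its value,
    an upper threshold at or above its value.
    """
    if reading is None:
        return "Not Available"  # no reading, as the BMC showed above
    # Check the most severe bands first so they take priority.
    if reading <= t["LNR"]:
        return "Lower Non-Recoverable"
    if reading >= t["UNR"]:
        return "Upper Non-Recoverable"
    if reading <= t["LC"]:
        return "Lower Critical"
    if reading >= t["UC"]:
        return "Upper Critical"
    if reading <= t["LNC"]:
        return "Lower Non-Critical"
    if reading >= t["UNC"]:
        return "Upper Non-Critical"
    return "OK"


if __name__ == "__main__":
    print(classify(1.05, VCCIO_OUT_THRESHOLDS))  # OK
    print(classify(0.80, VCCIO_OUT_THRESHOLDS))  # Lower Non-Recoverable
    print(classify(None, VCCIO_OUT_THRESHOLDS))  # Not Available
```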
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
That error is a classic "bad SATA cable" sign, but it can also come from dirty power causing errors in transmission. That AHCI error is literally reporting that the data from the SATA/SAS controller to the disk is corrupted. The cause is for you to figure out, although it's more often than not the cable.

The NAS powering down is a very bad sign. That's almost always a classic hardware problem of some kind. I'd be willing to bet if you check IPMI you'll have a voltage threshold that was hit at some point. That might mean your PSU is not doing its job, the regulators on the motherboard might not be doing their job (or overheating) or something like that. Unfortunately this problem is going to be difficult to identify.

If the AHCI errors continue even with new cables, I'd be willing to bet that your power fluctuations are responsible for both the AHCI errors and the loss of power. The easiest way to rule out the PSU is to plug in a spare one and see if the problem magically goes away.
 

Jared Potter

So, assuming it's an AHCI problem, unplugging all the SATA drives should theoretically allow the system to boot and POST? Since all the scrubs and SMART tests came back normal earlier, and now the system is fully down and won't stay powered for more than half a second with this VCCIO_OUT error, I'm under the impression that the mobo itself had issues right out of the gate and just took time to fail. Of the four times total I got the CRC error (and it was only the one repeated error, whereas most forum posts involve dozens if not hundreds of errors), not every time involved finding the system powered down. Actually, the first power-down was the day after I set everything up, and there was no email or other notification other than it being off (I had just assumed there was a shutdown timer I'd inadvertently turned on while playing around and learning).
 

Jared Potter

So I unplugged all the drives from both power and the SATA ports, powered off the whole system, and reset the CMOS jumper. In the BMC, the value reported back as normal with no error flag. I tried booting, and after a few tries the error flag reappeared. I guess it's a full-on motherboard issue :( Thanks for your help, though!
 

cyberjock

Oh, looks like you actually said that voltages had been recorded before I posted. Haha. Woohoo for not responding right away.

The AHCI problems are likely a symptom, not *the* problem.

At this point, though, I'd say it's probably your motherboard. That voltage is not provided by the PSU, so the regulators are likely bad. However, the PSU may have some kind of nasty sinewave due to it not being up to par, so if you don't want to jump to the conclusion that the motherboard itself is bad, I'd try a spare PSU and see if you get the same result. If so, then you'll know it's the motherboard. ;)
 

Jared Potter

Yeah, I pulled the PSU from my computer and tried it, with no change in POST. I hope the PSU isn't damaging the motherboard. I guess we shall find out.
 

cyberjock

I seriously doubt the PSU is to blame for this... I highly suspect the motherboard just because of the voltage that is the problem.
 

Jared Potter

So frustrating. Am I safe to assume that, since it was set up as a RAIDZ2 run by the OS and not the onboard RAID, my data should be fine? For the OS that's on the flash drive, should I wipe and reinstall it for the new mobo? I've replaced a mobo on Windows, configured it exactly the same way, and still had Windows not boot due to the registered drivers and such. I'm assuming that's probably the same deal here?
 

Ericloewe

FreeNAS couldn't care less about being moved across systems. It won't be a problem.
 