YAN (Yet Another Noob) Build

Status
Not open for further replies.

okgunguy

Explorer
Joined
Aug 4, 2015
Messages
72
I've reinstalled FreeNAS 9.3.1, which I thought was the latest version. It sat for a couple of days, and when I went back into my FreeNAS Web-GUI it was all jacked up. I wish I had taken a screenshot of it. It looked like an unformatted webpage, the kind that shows just the words with no buttons where the words should be. I restarted the NAS and everything went back to normal after the reboot.
I checked, and the GUI said an upgrade was available, so I performed an upgrade through the GUI, and it says it completed successfully. I have set up my snapshot and SMART test schedules again and have let them run a couple of times now. I've re-checked the SMART results and they are still perfect. My snapshots are being taken, and I've even put a few movies back on there.
I have not reinstalled Plex yet.
I'm still not ruling out a hardware issue, and I've been researching how to find out whether any ECC errors have occurred, or whether any errors have come out of my RAM at all. The only commands I can find are 'ipmicfg -sel list' and 'ipmicfg -sdr', and I have no idea what these commands do, or whether they will do anything at all. Does anybody have commands or tools to run from the command line to see if my RAM is causing any problems? There has to be something. I've looked all through my IPMI, and the only log entries it's putting out are about my fans going too slow. Is there a setting in there that I'm missing to pull up more in-depth logs? Or is my RAM just not causing any errors? If they were, would they spit out logs in the same spot as my fans do? I'm sorry that I don't know the correct terminology for everything. The only thing I can think of is to shut down the whole system, reboot into my UBCD, and run Memtest until smoke starts to come out of it. 32GB is a lot of RAM to not have any bad sections. Too much for my luck, anyway. There HAS to be something wrong.
I'm not getting mysterious reboots or anything. And after the upgrade the Web-GUI has pulled up perfectly fine every time.
Until I find some commands for checking the RAM, I'm planning to practice deleting a movie or two and rolling back snapshots until I get the hang of it, and to make sure it works properly. I also want to restore my config a couple of times. Thanks for listening, and for any advice...

Edit: I'm going to try the command 'memerr' when I get home this evening. It is looking promising. Otherwise, I've found a few more commands listed as Linux commands that I'll try.
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
is my RAM just not causing any errors? If they were, would they spit out logs in the same spot as my fans do?
As far as I know, correctable errors would be logged, and uncorrectable errors would halt the system, but I defer to those who know for sure.
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
As far as I know, correctable errors would be logged, and uncorrectable errors would halt the system, but I defer to those who know for sure.

You're right ;)
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
Right. One error in a lifetime is OK, but one error per month/week/day is not normal and points to a bad stick.
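To put rough numbers on that rule of thumb, here is a minimal Python sketch. It assumes you have already pulled the dates of correctable ECC errors out of your IPMI event log by hand (the function does not read any log itself), and the 30-day window and "more than one error" threshold are my own hypothetical choices, not a Supermicro or memtest standard.

```python
from datetime import date, timedelta


def looks_like_bad_stick(error_dates, window_days=30, max_errors_in_window=1):
    """Flag a stick when more than max_errors_in_window correctable ECC errors
    fall within any window_days-long span. Both thresholds are hypothetical."""
    dates = sorted(error_dates)
    window = timedelta(days=window_days)
    start = 0
    for end in range(len(dates)):
        # Move the window start forward until it covers at most window_days.
        while dates[end] - dates[start] > window:
            start += 1
        if end - start + 1 > max_errors_in_window:
            return True
    return False


if __name__ == "__main__":
    print(looks_like_bad_stick([date(2015, 8, 1)]))                    # False
    print(looks_like_bad_stick([date(2015, 8, 1), date(2015, 8, 3)]))  # True
```

A single error, or two errors years apart, never trips it; two errors within the same month do.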
 
Joined
Apr 9, 2015
Messages
1,258
If you have another system you can drop the RAM into and run memtest stick by stick, it could help in a couple of ways. It will check the RAM, and it will also help figure out whether something else in the setup could be causing the problem. If all the sticks pass memtest outside their current system, then it is either an incompatibility with the system or some other part of the hardware. I know it sucks to have to do this but either way you will have to run memtest on each stick separately to narrow down the problem, so it's like killing two birds with one stone.
 

okgunguy

Explorer
Joined
Aug 4, 2015
Messages
72
I know it sucks to have to do this but either way you will have to run memtest on each stick separately to narrow down the problem
Don't I know it!!!
I just so happen to have a refurb pc coming that I was going to use as my DVD burner/interface with my NAS. It is supposed to have DDR3 memory in it. Even though it won't support ECC, it still should run my UBCD memtests just fine right?
And how long do you suggest I test each stick? 2 passes? 3? 18?
I'll be putting the refurb through all kinds of stress tests anyway, so this won't be that big of a headache.
BTW, if anyone is looking for cheap PCs, Newegg Business has all kinds of refurbs, some as low as around $80 WITH Windows. Windows 7 is $100 by itself. I'm getting a 1-year replacement warranty with:
Refurbished: ThinkCentre Desktop Computer M70E Core 2 Duo E8400 (3.00 GHz) 8 GB DDR3 1 TB HDD Windows 7 Professional 64-Bit = $189.99 Item #: 9B-83-798-021
 
Joined
Apr 9, 2015
Messages
1,258
Don't I know it!!!
I just so happen to have a refurb pc coming that I was going to use as my DVD burner/interface with my NAS. It is supposed to have DDR3 memory in it. Even though it won't support ECC, it still should run my UBCD memtests just fine right?
And how long do you suggest I test each stick? 2 passes? 3? 18?

Uhhh, technically, no, that will not work correctly. You will want to check it in a computer that has ECC functionality. While it may test the memory basics, you will not be able to test the ECC. It should still boot, though, since you have unbuffered RAM. Personally, I would find another system I have that does support ECC RAM and test it there.

Right now, since your FreeNAS is obviously having problems, I would shut it down, disconnect the drives (including the USB drives), and start running memtest. Before you start the test, I would pull all the sticks out and make sure the slots are clear of any dust. I know it doesn't seem like much, but I have had computers where a little piece of dust got into just the wrong place and the system would throw errors. The same thing can happen with the CPU, but I would focus on the RAM first.

You can start by testing all four sticks together, with as little other hardware connected as possible, to see if there are errors. If there are, test one stick at a time, using a single slot. If every stick passes in that slot, start testing a single stick in each of the other slots. If each slot passes, test in pairs, first in the blue sockets and then in the black sockets. So basically you are going to be without a NAS for a while. Along the way you may uncover a stick, slot, or pair that throws an error. The reason I tend to test outside the system in question is to determine whether the board or the memory is at fault.
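If it helps to see the whole testing order written out, here is a hypothetical Python sketch that lists every memtest run in the sequence described above. The stick and slot names (stick1, blue1, etc.) are placeholders rather than your board's actual DIMM labels, and in practice you stop at the first stage that pinpoints the problem instead of doing all 13 boots.

```python
# Hypothetical labels: four sticks and four slots (two blue, two black).
STICKS = ["stick1", "stick2", "stick3", "stick4"]
BLUE_SLOTS = ["blue1", "blue2"]
BLACK_SLOTS = ["black1", "black2"]


def memtest_plan(sticks, blue_slots, black_slots):
    """Return the ordered memtest runs; each run maps slot -> stick."""
    slots = blue_slots + black_slots
    plan = []
    # Stage 1: all sticks installed together, minimal other hardware.
    plan.append(dict(zip(slots, sticks)))
    # Stage 2: each stick on its own, always in the same first slot.
    for stick in sticks:
        plan.append({slots[0]: stick})
    # Stage 3: a single stick that passed stage 2 (here simply the first)
    # moved through every slot in turn.
    for slot in slots:
        plan.append({slot: sticks[0]})
    # Stage 4: pairs of sticks, first in the blue sockets, then the black ones.
    pairs = [sticks[i:i + 2] for i in range(0, len(sticks), 2)]
    for color_slots in (blue_slots, black_slots):
        for pair in pairs:
            plan.append(dict(zip(color_slots, pair)))
    return plan


if __name__ == "__main__":
    for number, run in enumerate(memtest_plan(STICKS, BLUE_SLOTS, BLACK_SLOTS), 1):
        print(number, run)
```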

If everything checks out fine, then it's time to start suspecting other things, so hopefully memtest finds a stick with a problem. Also, memtest is not a 100% guarantee, but I have no clue who would have a hardware memory tester for ECC RAM, and memtest should be good enough to find a problem if there is one. And remember: if you find one bad stick, keep checking the others. If all four sticks come up bad, then it's time to suspect the slot or the board. While you cannot correctly test ECC RAM in a system that does not support ECC, you can test the board with non-ECC RAM, so you could pull a stick from a known-good working system and test it in the board.
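The "bad stick vs. bad slot vs. bad board" reasoning can be written down as a small Python sketch. It is only an illustration under my own assumptions: results is a hypothetical dict mapping (stick, slot) to True when memtest passed, a stick that never passes anywhere is flagged, a slot that never passes with any stick is flagged, and if every stick fails, the slot or board gets the blame instead.

```python
def diagnose(results):
    """results: hypothetical dict mapping (stick, slot) -> True if memtest passed.

    Returns a list of plain-English suspects.
    """
    if all(results.values()):
        return ["no errors found: start suspecting other hardware"]
    sticks = sorted({stick for stick, _ in results})
    slots = sorted({slot for _, slot in results})
    # A stick is suspect if it never passed in any slot it was tested in.
    bad_sticks = [s for s in sticks
                  if not any(ok for (st, _), ok in results.items() if st == s)]
    # A slot is suspect if no stick ever passed in it.
    bad_slots = [sl for sl in slots
                 if not any(ok for (_, x), ok in results.items() if x == sl)]
    if bad_sticks == sticks:
        # Every stick failed everywhere: blame the slot(s) or board, not all sticks.
        return ["every stick failed: suspect the slot or the board"]
    suspects = ["bad stick: " + s for s in bad_sticks]
    suspects += ["bad slot: " + sl for sl in bad_slots]
    return suspects or ["no consistent pattern: retest the failing combinations"]
```

For example, if stick2 is the only one that fails in the first slot, this returns ["bad stick: stick2"].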

Did I mention this is going to suck? Yeah, I wasn't kidding on that one, but you have to be very thorough in every single step. Unless someone else has a better idea for pinpointing the error than this plan, expect to spend the next few days to a week running tests.
 

okgunguy

Explorer
Joined
Aug 4, 2015
Messages
72
I'm trying to research my best option going into testing my RAM. I want to handle every task as if I had a ton of data that I absolutely must not lose.
I will:
1) export my config - via GUI onto one of my laptops (once my refurb pc gets here, it will be my primary station where all backups will be stored)
2) export my pool from the system - via command line ' zpool export r2d2 ' - r2d2 being the name of my pool
3) power down, pull my flash drives, disconnect all my drives' SATA cables, put in my UBCD flash drive, boot up and begin testing
Once completed with testing, I will:
1) plug my flash drives and hard drives back in and boot up - hopefully it will come up with my original config. If not, I can import my config from the backup
2) import my data pool to the system - via GUI
Am I even close? Or should I use the GUI to export as well? Or should I use only command line?
 
Joined
Apr 9, 2015
Messages
1,258
I think it will be as easy as powering down, then disconnecting the drives and pulling the flash drives. There should be no changes made to the system at all beyond the disconnect. Once they are plugged back in, the only thing the FreeNAS software will know is that it was powered down and then back up again at a later time.

The reason for disconnecting the drives is to ensure that they can in no way influence the RAM testing; there's no need for the board to be handling connected devices during a RAM test. If the RAM passes a memory test with flying colors, you can hook the drives back up, boot one last time into memtest, and run another test to see if it throws an error with them connected. It's not likely, but even if the memory tests fine on its own, it is possible that some other part of the system COULD cause the memory to error. I have dealt with a lot of different systems, and sometimes the weirdest things you would never suspect turn out to cause a problem. That is why computers are so much fun to me in general: working the problem until it's figured out, while trying not to literally pull your hair out in the end.

You also will not need to import the pool once everything is reconnected; just connect the drive cables back up and plug the flash drives back in. If the RAM passes all the checks and the drives are plugged into the same ports, it will just work. The only reason you would have to import your pool is if you reinstall FreeNAS onto the flash drives and start over.
 

okgunguy

Explorer
Joined
Aug 4, 2015
Messages
72
Thanks for that explanation. How can I go about checking the flash drives as well? They aren't the most reliable pieces of hardware to begin with, and if they are injecting errors into the RAM at boot-up, that could be the culprit too. I understand that they aren't used after boot-up unless you make changes. Would either of the commands ' fsck /dev/da# ' or ' fsck freenas-boot ' do anything? I'm not sure whether fsck takes the device ('da0') or the volume name ('freenas-boot'), or am I off base on that as well? I can't find a definitive answer on the forums or Google on how to check a flash drive for errors or corruption. There is a thread talking about SMART data on SATA-DOMs, etc., but the consensus there seems to be that SMART is pretty much useless for anything solid state. I wish I was in front of my NAS as I write this, or had remote access; I'd just try these commands myself and see if they worked.
 

Nick2253

Wizard
Joined
Apr 21, 2014
Messages
1,633
fsck is not appropriate for ZFS. No fsck equivalent exists for ZFS, largely because scrubs alleviate the need for file system consistency checks.
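What you would do on the boot device instead is run a scrub and then check the result: from the shell that's 'zpool scrub freenas-boot', followed by 'zpool status freenas-boot' once it finishes. The Python sketch below only reads that status output and looks at the "errors:" line. It assumes the standard ZFS wording "errors: No known data errors" for a clean pool, and both helper names are hypothetical.

```python
import subprocess


def pool_has_known_errors(status_text):
    """Return True unless the 'errors:' line of `zpool status` output reads
    'errors: No known data errors' (the usual ZFS wording for a clean pool)."""
    for line in status_text.splitlines():
        line = line.strip()
        if line.startswith("errors:"):
            return line != "errors: No known data errors"
    raise ValueError("no 'errors:' line found; is this zpool status output?")


def boot_pool_has_errors(pool="freenas-boot"):
    # Hypothetical helper: run only after 'zpool scrub freenas-boot' has finished.
    out = subprocess.check_output(["zpool", "status", pool], universal_newlines=True)
    return pool_has_known_errors(out)
```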
 
Joined
Apr 9, 2015
Messages
1,258
Honestly, my opinion is that if you question the flash drive, throw it out and pick up a new one. A pair of 16GB SanDisk Cruzer Fits can be picked up for less than $10.00 each on Amazon. No need to fret over the cost of one dinner out with the wife at Arby's. For that matter, pick them up, buy something else to make whoever you have in the house happy, and get free shipping.

I wouldn't even worry about backing up the config as it's more practice setting things back up. That is what I did when I switched over from my testbed and old flash drive to my in build system. It was functional within 5 minutes which included importing the drives from the test system. The actual FreeNAS configuration is not much more than users, CIFS and a few other modules and where jail data is stored. Once the drives are imported set the jail data back to where you had it before and then reboot. All jails should be available and ready to go.
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
should I use the GUI to export as well?
I agree that the pool export/import is unnecessary, but if you did, you would use the GUI to Detach the volume. In general, only use the CLI for tasks that can't be accomplished through the GUI.
 

okgunguy

Explorer
Joined
Aug 4, 2015
Messages
72
Thank you everyone for all the responses. You guys all rock!!
I'm noticing another problem. Maybe the same problem. In my IPMI logs I'm seeing 'watchdog 2' logs. How do I go about investigating this further? Are these a memory thing? Or what causes these? Or are these anything I should even worry about?
[Attached screenshot: Screenshot (1).png]
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
I'm seeing 'watchdog 2' logs.
disclaimer: I have no direct experience with IPMI so this post is a bit speculative

As you know, IPMI runs on a separate SOC that lives on your motherboard and runs even when the main system is powered off. The watchdog is the component of the IPMI software that watches the main system to see if it's locked up. If the watchdog judges that the main system has crashed, it may reset it. I assume this is configurable.

You would need to check your mobo documentation to find out what to make of those log entries. For example, I don't know if the Hard Reset events mean "the entire system lost power, just an FYI" or if they mean "I saw the main system crash so I reset it".
 

rogerh

Guru
Joined
Apr 18, 2014
Messages
1,111
I think the watchdog reset means that either FreeNAS is halting or locking up in some way and being reset, or the timer is inappropriately resetting it during a pause due to, say, a temporary disk read error.[1] Either way it needs sorting out, but if it is simply the watchdog being too sensitive, perhaps it can be disabled in the BIOS or via a motherboard jumper.

[1] The WD Greens don't have TLER, do they? That could lead to pauses that will not harm FreeNAS or ZFS but might cause the watchdog to time out, along with otherwise unexplained periods of unresponsiveness. It shouldn't happen too often if the disks are otherwise good.
 

okgunguy

Explorer
Joined
Aug 4, 2015
Messages
72
Here is what my manual says (screenshot below). I'm thinking I should just disable it, or set it to the NMI setting. Thoughts? Either way, I don't think I want it randomly resetting my NAS, even if there is a hung application. I'd rather be able to get in there, see what is hung up, and troubleshoot it. Right? I pulled all my SMART results earlier this morning, and none of my drives are showing any error counts. It could still be my RAM; I haven't started testing it yet.
[Attached image: Watchdog.png]

This is also in the manual:

Power Configuration
Watch Dog Function
If enabled, the Watch Dog Timer will allow the system to reboot when it is inactive for more than 5 minutes. The options are Enabled and Disabled.
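For anyone else reading, here is roughly how a watchdog timer like the one in that manual excerpt behaves, as a simplified Python model. The idea: software on the host has to check in ("pet" the watchdog) regularly, and if nothing checks in for longer than the timeout (5 minutes per the manual), the board assumes a hang and resets. This is a toy model under my own assumptions, not how Supermicro's firmware is actually implemented, and it ignores what happens after the reset.

```python
# Timeout taken from the manual excerpt: "inactive for more than 5 minutes".
TIMEOUT_S = 5 * 60


def watchdog_resets(heartbeats, timeout_s=TIMEOUT_S):
    """Toy model of a watchdog timer.

    heartbeats: sorted times (in seconds) at which host software pets the
    watchdog. Each heartbeat restarts the countdown. Whenever the gap to the
    next heartbeat exceeds timeout_s, a reset is recorded at the moment the
    countdown would have expired. Behaviour after the reset is not modelled.
    """
    resets = []
    for prev, nxt in zip(heartbeats, heartbeats[1:]):
        if nxt - prev > timeout_s:
            resets.append(prev + timeout_s)
    return resets


if __name__ == "__main__":
    # Check-ins every minute, then a 400 s stall (a hung process, or a long
    # disk error-recovery pause on a drive without TLER).
    print(watchdog_resets([0, 60, 120, 520, 580]))  # -> [420]
```

In this model a stall of 5 minutes or less never fires; anything longer does, which matches both explanations offered above (a real hang, or a pause that is merely long).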
 