
Building, Burn-In, and Testing your FreeNAS system

I've been meaning to post some guidance here for a while now. We frequently see people come to the forums with hardware problems that should have been caught during the system build process, but since many of the users here are DIY'ers without professional experience building servers, the system often goes from parts-in-box to in-use in a few hours.

This process also needs to be repeated when you are upgrading your system by adding new parts or changing out existing parts. There's a little bit of "use your brain" in how strict you need to be, but doing stuff like just dumping more RAM into your box and then booting it with a production pool attached can lead to great sadness. Don't do that. Your hardware needs to be solid and validated, and if it isn't, you can scramble your bits, possibly irretrievably.

This isn't a complete or comprehensive guide, more like a work-in-progress.

The Build Process

It is tempting to just rip open all those boxes you got, put everything together, and pray. Bad idea.

1) Set up a suitable workspace.

1a) If you are in a low humidity environment, pay extra special attention to environmental static controls, including making sure that the clothes you're wearing have been conditioned with fabric softener. Fabric softener can be diluted with water in a sprayer and applied to carpets, which also reduces the nasty winter zaps!

1b) All computer assembly work should be done on top of an anti-static mat. Fry's, NewEgg, and Amazon all have these for around $20. This is not optional! Static discharge can subtly damage your silicon.

1c) Assemble while wearing an anti-static wrist strap. These are available from $5-$20 at those retailers and elsewhere. Ideally you should wear some ESD gloves. Even the cheap $1-a-pair ones will not only help with ESD but will also keep skin oils off your surfaces and help reduce contaminants.

1d) Make sure you hook up the wires from your anti-static gear to a proper ground.

1e) Do not wear static-prone clothing, particularly many synthetics. Short sleeves and shorts can help reduce static!

Handle all components only in your anti-static environment, preferably by their edges, never by exposed contacts. A lot of people like to install their RAM and CPU on a mainboard prior to installing in the case, but this increases the number of components in play at a time. It is ideal to install the mainboard in the chassis first, then ground the chassis, and then install components one at a time. In a small chassis build, this may be impractical, and even in a large chassis it could make installation of the CPU tricky. Only remove components from the packages as they are actually required. Resist the urge to unpack everything and spread it out. Extra handling is extra risk. Keep your mind on grounding and careful handling.

When mounting your mainboard, make sure that it is securely supported at all points where the chassis offers screw holes. Make sure that the chassis doesn't have any screw standoffs in places where the motherboard does not have a hole; these can short out a motherboard.

Tighten all screws until you feel moderate resistance, then give just a bit more of a twist. You want a solidly seated screw, not loose, not stripped.

Smoke Test

Our name for the initial power-on test. Computers run on smoke. Once the smoke comes out, they stop working.

You should smoke test on the bench with the chassis open so that you can visually inspect.

All cards should be fully seated, meaning that among other things you should see an even amount of the board's copper fingers exposed along the length of the socket.

All DIMM modules should be fully seated. A properly seated DIMM will include the clips on the side being fully engaged, which is a fairly quick visual test. Also verify that the module's copper fingers appear to be uniform.

Power it on and make sure that the hardware manifest agrees with what the BIOS reports. Observe to make sure that all your fans are spinning, and take some time to address any buzzy or unpleasant vibration noise.

Configuration

In the BIOS:

Reset the BIOS settings to defaults.

Configure server to power on after power loss (if this is desired)

Configure power button to NOT shut off host immediately upon press. A momentary press of the power button is supposed to send a signal to the OS to shut down cleanly and power off. It's a good idea to test that once the OS is installed.

Configure the boot priority to prioritize the USB or SATA DOM, and if possible, disable booting from the data disks.

Burn-In and Testing

The burn-in and testing may be separated out or done as kind of a commingled whole. It may disappoint you to discover that proper burn-in and testing actually requires more than a thousand hours of time - a month and a half. This is the time during which you want infant mortality to strike.

Be sure to run an exhaustive test of memtest86 on all the system memory. If you have chosen to play with fire by not buying an ECC system, this is really the only significant opportunity you have to discover problems without your pool being at risk. If you have ECC, try to identify the way in which your mainboard manufacturer provides failure logging (BIOS, BMC/IPMI, etc). Any memory failures should be addressed. Do not rely on ECC to repair problems in a problematic module. The memtest86 tests can be run several times throughout the burn-in period to validate your memory. Don't just run one pass. Run it for a week or two.

Find one of the CPU-stressing utilities such as "CPUstress" (http://www.ultimatebootcd.com/), run it for a day, and monitor the temperature and fan behaviour in your chassis. Especially if you are attempting to build a "quiet" NAS, now is the opportunity to make sure that your cooling strategy is going to work.
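
If you happen to be running the stress test from a FreeBSD or FreeNAS shell, temperature watching can be scripted. What follows is only a rough sketch under some assumptions: that the coretemp (Intel) or amdtemp (AMD) driver is loaded so that readings show up as dev.cpu.N.temperature, and that readings look like "45.0C". The helper names and the 80C limit are made up for illustration; look up the real limit for your CPU.

```shell
#!/bin/sh
# Assumption: FreeBSD with coretemp/amdtemp loaded, which exposes per-core
# readings as dev.cpu.N.temperature in a form like "45.0C".

# Hypothetical helper: reduce a reading like "45.0C" to whole degrees ("45").
temp_to_int() {
    echo "${1%%.*}" | tr -d 'C'
}

# Hypothetical helper: report one reading against a limit.
# usage: check_temp <core> <reading> <limit>
check_temp() {
    t=$(temp_to_int "$2")
    if [ "$t" -ge "$3" ]; then
        echo "WARN core $1 at ${t}C (limit ${3}C)"
    else
        echo "ok core $1 at ${t}C"
    fi
}

# Poll every core forever; stop with Ctrl-C.
# usage: monitor_temps <limit> <interval_seconds>
monitor_temps() {
    ncpu=$(sysctl -n hw.ncpu)
    while :; do
        i=0
        while [ "$i" -lt "$ncpu" ]; do
            check_temp "$i" "$(sysctl -n "dev.cpu.$i.temperature")" "$1"
            i=$((i + 1))
        done
        sleep "$2"
    done
}

# Example (not run automatically): monitor_temps 80 10
```

Leave this running in a second session while the stress utility is working and watch for warnings. If your board has a BMC, its sensor readings are another source worth comparing against.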

Run SMART tests on all your drives, including the manufacturer's conveyance test and the long test. For this purpose it might be convenient to set up an instance of FreeNAS.
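
As a rough sketch of what that looks like from the shell, assuming smartmontools is installed and your drives show up as da0, da1, and so on (they may be adaN on onboard SATA ports): the helper names and the DRY_RUN switch below are my own inventions for illustration, while the smartctl options are the standard ones. Not every drive supports the conveyance test, in which case smartctl will say so.

```shell
#!/bin/sh
# Assumptions: smartmontools is installed; device names like da0 are examples.

# Hypothetical helper: run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Start a self-test of the given kind on each listed drive.
# usage: start_selftest <conveyance|long> <dev>...
start_selftest() {
    kind=$1
    shift
    for dev in "$@"; do
        run smartctl -t "$kind" "/dev/$dev"
    done
}

# Show the self-test log for each listed drive once the tests have finished.
# usage: show_selftest_log <dev>...
show_selftest_log() {
    for dev in "$@"; do
        run smartctl -l selftest "/dev/$dev"
    done
}

# Example (not run automatically):
#   start_selftest conveyance da0 da1
#   start_selftest long da0 da1
#   show_selftest_log da0 da1
```

The tests run inside the drive in the background, and smartctl prints an estimated completion time when a test is started. Let one test finish before starting the next on the same drive, and expect the long test to take hours on large disks.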

SMART is not sufficient to weed out bad drives, though it is a good thing to run. It is unable to identify subtle problems in the SATA/SAS channels to your drives, for example. In order to validate both the drives and their ability to successfully shuffle data, be sure to run plenty of tests on them. I suggest:

1) Individual sequential read and write tests. This is basically just using "dd if=/dev/da${n} of=/dev/null bs=1048576" to do a read test, and "dd if=/dev/zero of=/dev/da${n} bs=1048576" to do a write test. Note the reported read/write speeds at the end and compare them to both the other drives in the pool and also any benchmark stats you can find for that particular drive on the Internet. They should be very close.

2) Simultaneous sequential read and write tests. This is using those same tests in parallel.

3) I am kind of lazy so I will then have it do seek testing by starting multiple tests per drive and separating them by a minute or two each. The drives will do lots of seeking within a relatively small locality. Other people like to use different tools to do this; I have absolutely no objection but as I noted I'm kind of lazy.

4) I've recently provided a tool to assist: https://forums.freenas.org/index.php?resources/solnet-array-test.1/
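
Steps 1 through 3 can be sketched in a few lines of shell. This is an illustration only, not the solnet-array-test tool: the function names and the DEV_DIR variable are hypothetical, and only the read side is shown because it is non-destructive. The write test is the same idea with the dd command from step 1, and it will destroy whatever is on the drive.

```shell
#!/bin/sh
# Assumption: drives appear under /dev as da0, da1, ...; DEV_DIR is a
# hypothetical override so the functions can be tried against ordinary files.
DEV_DIR=${DEV_DIR:-/dev}

# Step 1: sequential read of one drive; dd prints its speed summary at the end.
# usage: read_test <dev>
read_test() {
    dd if="$DEV_DIR/$1" of=/dev/null bs=1048576
}

# Step 2: the same read test on several drives at once.
# usage: parallel_read <dev>...
parallel_read() {
    for dev in "$@"; do
        read_test "$dev" &
    done
    wait
}

# Step 3: several readers on one drive, started <delay> seconds apart,
# so the heads have to seek between nearby regions.
# usage: staggered_read <dev> <count> <delay_seconds>
staggered_read() {
    i=0
    while [ "$i" -lt "$2" ]; do
        read_test "$1" &
        i=$((i + 1))
        if [ "$i" -lt "$2" ]; then
            sleep "$3"
        fi
    done
    wait
}

# Example (not run automatically):
#   read_test da0
#   parallel_read da0 da1 da2 da3
#   staggered_read da0 4 90
```

Compare the speeds dd reports across drives in each phase. A drive that is much slower than its siblings, or a whole set that slows down badly when run in parallel, is worth investigating before you trust it with data.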

NO AMOUNT OF STRESS should result in the kernel reporting timeouts or other issues communicating with your drives. If it does - update drivers, reseat cables, Google for known problems with your drives or controller, upgrade drive firmware, and/or generally fix your busted hardware. You cannot use a system throwing spurious storage subsystem errors for ZFS.
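
One lazy way to check for this while the tests run is to filter the kernel messages for anything that looks like storage trouble. This is a sketch: the function name is hypothetical, and the pattern list is my guess at common FreeBSD CAM complaints, not an exhaustive set, so still read the full log now and then.

```shell
#!/bin/sh
# Hypothetical helper: pass through only the lines on stdin that look like
# storage trouble. The patterns are illustrative, not exhaustive.
storage_errors() {
    grep -E -i 'timeout|CAM status|retrying command|retries exhausted'
}

# Example (not run automatically):
#   dmesg | storage_errors
#   storage_errors < /var/log/messages
```

No output is the result you want. Anything that does show up names the device and controller involved, which tells you which cable, port, or drive to start poking at.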

Once you've hit a system with tests like these for a few weeks, confidence in the platform should be established. Then you move on to installing FreeNAS, setting up ZFS, and beating on that for a while... For this part, check out the solnet-array-test Resource.
Author
jgreco

Latest reviews

Thanks!

I modified the `dd` commands to be like:
`dd if=/dev/da${n} of=/dev/null bs=1M status=progress`

The `status=progress` bit makes it give status updates as it goes, which is nice
Just retired, and one of my first projects is to streamline the family's wildly grown infrastructure. After soldering a 100 ohm resistor into our Synology DS415 to get it running again, my confidence in that system was broken and a proper NAS had gained top-level priority. As a former DEC system engineer I was really pleased to read this guide. I am convinced that a lot of the problems we have today have their root cause in hastily assembled systems that went over the counter without any testing. For me it's kind of Zen to let the system mature a little before I put my life in pictures on it.

Thank you for that, good work, indeed.
A good guide that helps a newbie like myself who has never built a computer from parts.
The HDD burn-in process could be expanded, but another resource discusses it as well (https://forums.freenas.org/index.php?resources/hard-drive-burn-in-testing.92/). Merging the two, or at least linking them and making them coherent, would be a nice upgrade.
Great that I saw this now, before I start copying over data.
jgreco
Yep. The time to discover your FreeNAS box falls over when stressed is before you start loading all your data on it. ;-)