Critical Error During Bad Block Testing

Charles Rhoades · Nov 12, 2015

I've just built a new server and have started the HDD burn-in tests listed here: https://forums.freenas.org/index.php?threads/how-to-hard-drive-burn-in-testing.21451/#post-124942

The drives have passed the Conveyance, Short, and Long SMART tests, and I've kicked off the bad blocks testing. I have 6 Seagate 4TB NAS drives set up in a RAIDZ2 volume. I started the bad blocks tests for each drive sequentially in the Shell and made it through ADA4. When I entered the command for ADA5, the Shell window disappeared, and I cannot re-open it. Looking at the Reporting for the drives, it appears that all are running fine except for ADA5.

Shortly thereafter I received an Alert System error report: "CRITICAL: The volume Vol1 (ZFS) state is ONLINE: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected."

I haven't stopped the testing yet; it has been running for about 6 hours now, and I expect it will take a couple of days to complete. How should I proceed? Should I let the tests finish on the "good drives", or stop the bad blocks testing immediately, identify the problem drive(s), and start the RMA on the bad drive(s)? How do I safely stop the testing if I cannot reopen the Shell?

solarisguy · Nov 12, 2015

Do you have data in the volume Vol1, and are you running badblocks on it at the same time?

What was the exact command line you used to start badblocks?

I gather that you did not use tmux. You have to learn how to use it.

Enable SSH. Log in to your system using SSH. Post here (as CODE) the output of
smartctl -a /dev/ada5
and output of
zpool status Vol1

You should kill all the badblocks processes, and do not start them again until you have fixed the issues with your volume Vol1 and learned how to use tmux.
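The two diagnostics requested above can be saved to files so they are easy to paste into a reply. This is a minimal sketch, assuming an SSH session on the server; the `save_output` helper and the `/tmp` file names are illustrative and not part of FreeNAS.

```shell
#!/bin/sh
# save_output is a hypothetical helper: run a command and save its
# stdout and stderr to the file named by the first argument.
save_output() {
    out="$1"; shift
    "$@" > "$out" 2>&1
}

# Both commands only read state; they do not write to the drive or the pool.
# smartctl can exit non-zero even when it produced a full report.
save_output /tmp/smart_ada5.txt smartctl -a /dev/ada5 \
    || echo "smartctl exited non-zero; check /tmp/smart_ada5.txt" >&2
save_output /tmp/zpool_Vol1.txt zpool status Vol1 \
    || echo "zpool exited non-zero; check /tmp/zpool_Vol1.txt" >&2
```

The saved files can then be viewed with `cat` and copied into the post.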

Bidule0hm · Nov 12, 2015

First, you should have used tmux in an SSH session.

Then, why did you run badblocks on drives that are already in a pool? That's a very, very bad idea.

Edit: solarisguy has been faster...

Charles Rhoades · Nov 12, 2015

No data in the machine at all. Brand new and clean.
I used the Shell and here are the commands:
sysctl kern.geom.debugflags=0x10
tmux
badblocks -ws /dev/ada0
Ctrl+B "
badblocks -ws /dev/ada1
Ctrl+B "
badblocks -ws /dev/ada2
Ctrl+B "
badblocks -ws /dev/ada3
Ctrl+B "
badblocks -ws /dev/ada4
Ctrl+B "
badblocks -ws /dev/ada5

Charles Rhoades · Nov 12, 2015

Well... I'm a noob. I followed the video instructions for the initial setup very carefully, and they set up a Volume, a Share, and a Snapshot. The first thing I figured out was the email setup, then the UPS, and then I started the burn-in as described by qwertymodo. My plan was to follow up with Memtest+. Nowhere did I find any warnings about testing after setting up volumes or anything else. Just doing the best I can. I don't know SSH.

Charles Rhoades · Nov 12, 2015

How do I safely stop the bad blocks testing?

Bidule0hm · Nov 12, 2015

1. Set up the SSH service (look at the Services tab in the GUI)

2. Connect to the server with an SSH client

3. Re-attach the tmux session (tmux a) and cancel the commands with Ctrl + C

Or if you can't re-attach the tmux session:

3. Run top to see the PID(s) of the badblocks process(es)

4. Kill the badblocks process(es). We will provide the command when you reach this step (because it's a dangerous command)
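As an alternative to reading PIDs off `top`, the lookup in the fallback steps can be sketched as a dry run that prints a `kill` line for each badblocks process without executing it. It assumes `ps` and `awk` are available over SSH; `list_badblocks_pids` is a hypothetical helper name, not a FreeNAS command.

```shell
#!/bin/sh
# list_badblocks_pids (hypothetical helper): reads "PID COMMAND" lines on
# stdin and prints the PID of every process whose command name is exactly
# "badblocks". The header line is skipped because it does not match.
list_badblocks_pids() {
    awk '$2 == "badblocks" { print $1 }'
}

# ps -axo pid,comm lists every process with its PID and command name.
# The loop only echoes the kill commands; nothing is terminated here.
ps -axo pid,comm | list_badblocks_pids | while read -r pid; do
    echo "kill $pid"
done
```

Printing the commands first makes it possible to check each PID against the process list before running anything by hand.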

Charles Rhoades · Nov 12, 2015

OK... I set up SSH and turned it on, but the service fails to finish starting.

Charles Rhoades · Nov 12, 2015

In the PC world, I'd just reboot. The server isn't responding.

solarisguy · Nov 12, 2015

Yes, you can reboot from the console. Does the console (not the Web GUI) respond?
