Critical Error During Bad Block Testing

Status
Not open for further replies.

Charles Rhoades

Dabbler
Joined
Oct 27, 2015
Messages
34
I've just built a new server and have started the HDD burn-in tests listed here: https://forums.freenas.org/index.php?threads/how-to-hard-drive-burn-in-testing.21451/#post-124942

The drives have passed the Conveyance, Short, and Long SMART tests, and I've kicked off the badblocks testing. I have 6 Seagate 4TB NAS drives set up in a RAIDZ2 volume. I started the badblocks test for each drive sequentially in the Shell and made it through ada4. When I entered the command for ada5, the Shell window disappeared, and I cannot reopen it. Looking at the Reporting for the drives, it appears that all are running fine except for ada5. Shortly thereafter I received an Alert System error report: "CRITICAL: The volume Vol1 (ZFS) state is ONLINE: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected." I haven't stopped the testing yet; it has been running for about 6 hours now, and I expect it will take a couple of days to complete. How should I proceed? Should I let the tests finish on the "good" drives, or immediately stop the badblocks testing, identify the problem drive(s), and start the RMA on them? How do I safely stop the testing if I cannot reopen the Shell?
 

solarisguy

Guru
Joined
Apr 4, 2014
Messages
1,125
Do you have data in the volume Vol1 while you are running badblocks at the same time?

What was the exact command line you used to start badblocks?

I gather that you did not use tmux. You have to learn how to use it.

Enable SSH. Login to your system using SSH. Post here (as CODE) the output of
smartctl -a /dev/ada5
and output of
zpool status Vol1

You should kill all the badblocks processes, and do not restart them until you have fixed the issues with your volume Vol1 and learned how to use tmux.
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
First, you should have used tmux in an SSH session.

Then, why did you run badblocks on drives that are already in a pool? That's a very, very bad idea.

Edit: solarisguy has been faster...
 

Charles Rhoades

Dabbler
Joined
Oct 27, 2015
Messages
34
No data in the machine at all. Brand new and clean.
I used the Shell and here are the commands:
sysctl kern.geom.debugflags=0x10
tmux
badblocks -ws /dev/ada0
Ctrl+B "
badblocks -ws /dev/ada1
Ctrl+B "
badblocks -ws /dev/ada2
Ctrl+B "
badblocks -ws /dev/ada3
Ctrl+B "
badblocks -ws /dev/ada4
Ctrl+B "
badblocks -ws /dev/ada5
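
For reference, the same sequence could be expressed as one detached, named tmux session per drive, which would make each test easy to re-attach later over SSH. This is only a sketch: the session names (`bb_ada0` and so on) and the `plan_burnin` helper are made up for illustration, and the function only prints the commands instead of running them, because `badblocks -w` overwrites the whole drive.

```shell
# Hypothetical helper: print (do not run) one tmux command per drive.
# badblocks -w is destructive, so review the output before running any of it.
plan_burnin() {
    for disk in "$@"; do
        # The session name "bb_<disk>" is an arbitrary choice for this sketch.
        printf "tmux new-session -d -s bb_%s 'badblocks -ws /dev/%s'\n" "$disk" "$disk"
    done
}

plan_burnin ada0 ada1 ada2 ada3 ada4 ada5
```

A session started this way could later be re-attached with `tmux attach -t bb_ada5`.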
 

Charles Rhoades

Dabbler
Joined
Oct 27, 2015
Messages
34
Well... I'm a noob. I followed the video instructions for the initial setup very carefully, and they set up a Volume, a Share, and a Snapshot. The first thing I figured out was the email setup, then the UPS, and then I started the burn-in as described by qwertymodo. My plan was to follow up with Memtest+. Nowhere did I find any warnings about testing after setting up volumes or anything else. I'm just doing the best I can. I don't know SSH.
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
1. Set up the SSH service (look at the Services tab in the GUI)

2. Connect to the server with an SSH client

3. Re-attach the tmux session (tmux a) and cancel the commands with Ctrl + C

Or if you can't re-attach the tmux session:

3. Run top to see the PID(s) of the badblocks process(es)

4. Kill the badblocks process(es). We will provide the command when you reach this step (because it's a dangerous command)
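
As an alternative to reading the PIDs off top, the lookup in the fallback step 3 can be done non-interactively. A minimal sketch, assuming a POSIX-style `ps` (the `-A` and `-o pid= -o args=` options); the function names `filter_badblocks` and `list_badblocks` are hypothetical. It only lists processes and does not kill anything.

```shell
# Keep only lines whose command (second field) is badblocks,
# with or without a leading path. Input format: "PID COMMAND ARGS...".
filter_badblocks() {
    awk '$2 ~ /(^|\/)badblocks$/'
}

# Print "PID COMMAND ARGS..." for every running badblocks process.
# Read-only: nothing is signalled or killed here.
list_badblocks() {
    ps -A -o pid= -o args= | filter_badblocks
}

list_badblocks
```

The first column of the output is the PID of each badblocks process, and the device name at the end of each line shows which drive it is testing.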
 