FreeNAS stops responding after running badblocks

Slavik · Jun 28, 2016

I'm a new user, getting familiar with FreeNAS. My setup:
- 9.10-STABLE-201606072003 installed as VM on ESXI 6.0
- DELL PowerEdge 610 with 48 GB ECC Memory. VM has 24 GB of dedicated (reserved) RAM and 3 vCPU
- H200 HBA passed-through to VM
- 1 SATA HDD (Seagate 3TB) connected to the H200. The system sees it as /dev/da1

*Steps:*
- I read about burn-in and decided to run:
[root@freenas] ~# badblocks -b 4096 -ns /dev/da1
The HDD had some bad blocks. Oops, not good. I won't use it.

- However, after running badblocks for 20+ hours, it printed this to the console:
badblocks: Device not configured during test data write, block 487188762

And now I can't connect to the web UI; the system is not responding. Later I found that I need to restart the VM to restore it.
I'm still connected over SSH, but the session feels super slow.

Here is some output from dmesg (complete output attached).

Code:

(da1:mps0:0:3:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da1:mps0:0:3:0): Retrying command (per sense data)
        (da1:mps0:0:3:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 248 terminated ioc 804b scsi 0 state c xfer 131072
        (da1:mps0:0:3:0): READ(10). CDB: 28 00 e8 4f 48 c8 00 00 08 00 length 4096 SMID 318 terminated ioc 804b scsi 0 state c xfer 0(da1:mps0:0:3:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00

(da1:mps0:0:3:0): CAM status: CCB request completed with an error
(da1:mps0:0:3:0): Retrying command
(da1:mps0:0:3:0): READ(10). CDB: 28 00 e8 4f 48 c8 00 00 08 00
(da1:mps0:0:3:0): CAM status: CCB request completed with an error
(da1:mps0:0:3:0): Error 5, Retries exhausted
(da1:mps0:0:3:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
(da1:mps0:0:3:0): CAM status: SCSI Status Error
(da1:mps0:0:3:0): SCSI status: Check Condition
(da1:mps0:0:3:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da1:mps0:0:3:0): Error 6, Retries exhausted
(da1:mps0:0:3:0): Invalidating pack

Of course, I won't use that HDD and will get another one. My plan is to put 3 of them into RAIDZ1.

However, my question is: why does the system become unstable when one hard drive is failing?

Slavik · Jun 28, 2016

Here is the complete dmesg output.

Robert Trevellyan · Jun 28, 2016

ST3000DM001-9YN1

This model is notorious for sudden early failure.

Slavik said:
why does the system become unstable when one hard drive is failing?

A failing drive, especially a desktop model, can easily cause a system to become unresponsive, due to long intervals spent retrying commands.
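To put rough numbers on "long intervals spent retrying commands": on FreeBSD the `da` driver exposes its per-command timeout and retry count as the sysctls `kern.cam.da.default_timeout` and `kern.cam.da.retry_count`. The sketch below just multiplies them out as a crude upper bound for one failing command. The helper name is made up for illustration, the values 60 and 4 are assumed defaults that should be read from your own system, and real CAM error recovery is more involved than this arithmetic.

```shell
#!/bin/sh
# Crude upper bound on how long a single failing command can block:
# one initial attempt plus every retry, each running into the timeout.
# worst_case_stall_seconds is a hypothetical helper, not a FreeBSD tool.
worst_case_stall_seconds() {
    timeout_s=$1   # e.g. from: sysctl -n kern.cam.da.default_timeout
    retries=$2     # e.g. from: sysctl -n kern.cam.da.retry_count
    echo $(( timeout_s * (retries + 1) ))
}

# Example with assumed values (60 s timeout, 4 retries):
worst_case_stall_seconds 60 4
```

With those assumed values a single bad command can hold things up for several minutes, and a dying disk produces many such commands in a row. `smartctl -l scterc /dev/da1` reports whether the drive supports a limited error recovery time, which desktop models often lack.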

SweetAndLow · Jun 28, 2016

Smart data for drive?

Slavik · Jun 28, 2016

SweetAndLow said:
Smart data for drive?

The SMART data is bad. But I'm still concerned about why a bad drive can bring the system into a bad state.

Is this the right command to get SMART data?

Code:

[root@freenas] /var/tmp# smartctl -l selftest /dev/da1
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%     17696         1681925312
# 2  Conveyance offline  Completed: read failure       90%     17696         1681925312
# 3  Conveyance offline  Completed: read failure       90%     17696         1681925312
# 4  Short offline       Completed: read failure       90%     17696         1681925312
# 5  Extended offline    Completed: read failure       90%     17436         96040
# 6  Short offline       Completed: read failure       10%     17031         96040

Also, I see this in the FreeNAS log:

Code:

Jun 28 11:59:57 freenas smartd[2206]: Device: /dev/da1 [SAT], 40 Currently unreadable (pending) sectors
Jun 28 11:59:57 freenas smartd[2206]: Device: /dev/da1 [SAT], 40 Offline uncorrectable sectors

I'll replace the drive and configure RAIDZ1 tonight and see how it goes...
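A quick way to summarize a self-test log like the one above is to count the entries that ended in a read failure. This is only a sketch: the helper name is hypothetical, and it assumes the row layout that smartctl 6.5 printed in this thread.

```shell
#!/bin/sh
# count_read_failures: hypothetical helper, reads a self-test log on stdin
# and prints how many logged tests ended with "read failure".
count_read_failures() {
    awk '/^# *[0-9]+ / && /read failure/ { n++ } END { print n + 0 }'
}

# Sample rows copied from the log in this thread.
sample='# 1  Extended offline    Completed: read failure       90%     17696         1681925312
# 5  Extended offline    Completed: read failure       90%     17436         96040
# 6  Short offline       Completed: read failure       10%     17031         96040'

printf '%s\n' "$sample" | count_read_failures
```

Against a real disk the same helper would be used as `smartctl -l selftest /dev/da1 | count_read_failures`. Any non-zero count on a drive meant for a pool is a good reason to replace it.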

maglin · Jun 28, 2016

SMART data is retrieved with
smartctl -a /dev/daX, where X is the drive number.

Your badblocks command should be
badblocks -ws -b 4096 /dev/daX

The -ns is for a non-destructive test. You want to actually write data to every block at least 4x.

You also need to set up the I/O subsystem so it can write into the MBR of the drive. You can find that in the HDD burn-in thread.

I can't comment on the system response, but if you had already made a pool with that drive before running badblocks, and your system dataset was on it, I could see that affecting FreeNAS.
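Put together, the sequence could look like the sketch below. The `run` and `burn_in` helpers are hypothetical wrappers, and by default the script only prints the commands instead of running them. The `sysctl kern.geom.debugflags=0x10` line is an assumption about what the HDD burn-in thread recommends for allowing raw writes to the start of the disk, so verify it there before relying on it.

```shell
#!/bin/sh
# burn_in: hypothetical wrapper around the commands discussed in this thread.
# With DRY_RUN=1 (the default here) it only prints what it would run.
# WARNING: badblocks -w destroys all data on the target disk.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

burn_in() {
    disk=$1                                # e.g. /dev/da1
    # Assumption: GEOM debug flag taken from the burn-in thread; check it there.
    run sysctl kern.geom.debugflags=0x10
    run badblocks -ws -b 4096 "$disk"      # destructive write test, 4 patterns
    run smartctl -a "$disk"                # full SMART report afterwards
}

burn_in /dev/da1
```

Setting `DRY_RUN=0` makes it execute the commands for real, which should only be done on a disk that holds no data and is not part of any pool.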


SweetAndLow · Jun 28, 2016

Was there a pool configured on this drive? Having the system dataset on this drive could bring the system down.

Slavik · Jun 28, 2016

SweetAndLow said:
Was there a pool configured on this drive? Having the system dataset on this drive could bring the system down.

Yes, there was one pool on that single drive.

The system itself, FreeNAS, was installed on VM storage (SSD).

SweetAndLow · Jun 28, 2016

FreeNAS stores most of its configuration files on your ZFS pool. So when you ran badblocks and your drive started to fail, everything on that drive was ruined. That is what slowed down your system. It makes complete sense, and it is what should have happened.
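Before running badblocks, it helps to confirm the disk is not backing a pool. FreeNAS usually lists pool members by gptid label in `zpool status`, so the device name has to be mapped to its labels first. The helper below is a hypothetical sketch that does this mapping from `glabel status` output; the sample text is made up and only mirrors that command's Name / Status / Components layout.

```shell
#!/bin/sh
# labels_on_disk: hypothetical helper. Reads `glabel status` output on stdin
# and prints the label names whose component is the given disk or one of
# its partitions (da1, da1p1, da1p2, ...).
labels_on_disk() {
    awk -v d="$1" '$3 ~ ("^" d "(p[0-9]+)?$") { print $1 }'
}

# Made-up sample in the layout `glabel status` uses (Name, Status, Components).
sample='                                      Name  Status  Components
gptid/11111111-2222-3333-4444-555555555555     N/A  da1p2
gptid/aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee     N/A  da10p2'

printf '%s\n' "$sample" | labels_on_disk da1
```

On a live system this would be `glabel status | labels_on_disk da1`. If any printed label also shows up in `zpool status`, the disk belongs to a pool, and the pool should be detached (and the system dataset moved elsewhere) before any destructive test.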
