FreeNAS stops responding after running badblocks

Status
Not open for further replies.

Slavik

Dabbler
Joined
Jun 6, 2016
Messages
39
I'm a new user and am getting familiar with FreeNAS.
- 9.10-STABLE-201606072003 installed as a VM on ESXi 6.0
- Dell PowerEdge 610 with 48 GB of ECC memory. The VM has 24 GB of dedicated (reserved) RAM and 3 vCPUs
- H200 HBA passed through to the VM
- 1 SATA HDD (Seagate 3TB) connected to the H200. The system sees it as /dev/da1
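
Before pointing a destructive test at a disk, it is worth confirming which device node belongs to which physical drive. A minimal sketch, assuming the usual FreeBSD `camcontrol devlist` line format (`<inquiry string> at scbusN target N lun N (passN,daN)`); the helper name `da_for_model` is hypothetical:

```shell
#!/bin/sh
# da_for_model (hypothetical helper): read `camcontrol devlist` output on stdin
# and print /dev/daN for every line whose inquiry string contains the model text.
da_for_model() {
    awk -v m="$1" 'index($0, m) {
        # The peripheral list is the trailing "(...)" group, e.g. "(pass1,da1)".
        if (match($0, /\([^)]*\)$/)) {
            list = substr($0, RSTART + 1, RLENGTH - 2)
            n = split(list, dev, ",")
            for (i = 1; i <= n; i++)
                if (dev[i] ~ /^da[0-9]+$/) print "/dev/" dev[i]
        }
    }'
}
```

Usage: `camcontrol devlist | da_for_model ST3000DM001` should print the `da` node of the Seagate drive. A model match is only a hint; if several identical drives are attached, compare serial numbers (for example with `smartctl -i`) before running anything destructive.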

*Steps:*
- I read about burn-in and decided to run:
[root@freenas] ~# badblocks -b 4096 -ns /dev/da1
The HDD had some bad blocks. Oops, not good. I won't use it.

- However, after running badblocks for 20+ hours, it printed this to the console:
badblocks: Device not configured during test data write, block 487188762

Now I can't connect to the web UI; the system is not responding. Later I found that I need to restart the VM to restore it.
I'm still connected to SSH console, but it feels super slow.

Here is some output from dmesg (complete output attached).
Code:
(da1:mps0:0:3:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da1:mps0:0:3:0): Retrying command (per sense data)
        (da1:mps0:0:3:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 248 terminated ioc 804b scsi 0 state c xfer 131072
        (da1:mps0:0:3:0): READ(10). CDB: 28 00 e8 4f 48 c8 00 00 08 00 length 4096 SMID 318 terminated ioc 804b scsi 0 state c xfer 0(da1:mps0:0:3:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00

(da1:mps0:0:3:0): CAM status: CCB request completed with an error
(da1:mps0:0:3:0): Retrying command
(da1:mps0:0:3:0): READ(10). CDB: 28 00 e8 4f 48 c8 00 00 08 00
(da1:mps0:0:3:0): CAM status: CCB request completed with an error
(da1:mps0:0:3:0): Error 5, Retries exhausted
(da1:mps0:0:3:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
(da1:mps0:0:3:0): CAM status: SCSI Status Error
(da1:mps0:0:3:0): SCSI status: Check Condition
(da1:mps0:0:3:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da1:mps0:0:3:0): Error 6, Retries exhausted
(da1:mps0:0:3:0): Invalidating pack
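
The block numbers in these messages use two different units: badblocks counts in its `-b 4096` block size, while a READ(10) CDB carries a 512-byte LBA in bytes 2 to 5 (big-endian) and a transfer length in bytes 7 and 8. A small conversion sketch, assuming 512-byte logical sectors (consistent with the `length 4096` / `08` pair in the log); the function names are hypothetical:

```shell
#!/bin/sh
# bb_to_lba (hypothetical): convert a badblocks block number, given the block
# size passed with -b, into a 512-byte LBA. Assumes 512-byte logical sectors.
bb_to_lba() {
    block="$1"; bsize="$2"
    echo $(( block * bsize / 512 ))
}

# cdb10_lba (hypothetical): decode a READ(10)/WRITE(10) CDB printed by CAM as
# ten hex bytes. Bytes 2..5 (counting from 0) are the LBA and bytes 7..8 the
# transfer length; positional parameters are 1-based, hence $3..$6 and $8..$9.
cdb10_lba() {
    [ "$#" -eq 10 ] || { echo "need 10 CDB bytes" >&2; return 1; }
    lba=$(( 0x$3 << 24 | 0x$4 << 16 | 0x$5 << 8 | 0x$6 ))
    len=$(( 0x$8 << 8 | 0x$9 ))
    echo "lba=$lba blocks=$len"
}
```

With the numbers above, `bb_to_lba 487188762 4096` prints 3897510096 and `cdb10_lba 28 00 e8 4f 48 c8 00 00 08 00` prints `lba=3897510088 blocks=8`, so the failed read covers the 4 KiB block just before the one badblocks reported: both messages point at the same region of the disk.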


Of course, I won't use that HDD; I'll get another one. My plan is to put 3 of them into a RAIDZ1 pool.

However, my question is: why does the system become unstable when one hard drive is failing?
 

Slavik

Dabbler
Joined
Jun 6, 2016
Messages
39
Here is complete dmesg output
 

Attachments

  • dmesg.txt
    25.6 KB

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
ST3000DM001-9YN1
This model is notorious for sudden early failure.
why system becomes unstable, as one hard drive is failing?
A failing drive, especially a desktop model, can easily cause a system to become unresponsive, due to long intervals spent retrying commands.
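
A back-of-the-envelope way to see why the retries hurt: the FreeBSD CAM `da` driver retries a failed command up to `kern.cam.da.retry_count` times, and each attempt can wait up to `kern.cam.da.default_timeout` seconds before the driver finally reports "Retries exhausted". The sketch below assumes the worst case, where every attempt runs into the full timeout; the drive's own internal error recovery is not modelled, and the function name is hypothetical:

```shell
#!/bin/sh
# worst_case_stall (hypothetical): upper bound, in seconds, on how long a
# single I/O can hang if the first attempt and every retry hit the timeout.
worst_case_stall() {
    timeout="$1"; retries="$2"
    echo $(( timeout * (retries + 1) ))
}
```

Usage: `worst_case_stall "$(sysctl -n kern.cam.da.default_timeout)" "$(sysctl -n kern.cam.da.retry_count)"`. With a 60-second timeout and 4 retries it prints 300, i.e. up to five minutes for a single unreadable sector. Drives that support SCT Error Recovery Control (TLER/ERC on NAS models) can be told to give up sooner; `smartctl -l scterc /dev/da1` reports whether a drive supports it.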
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
SMART data for the drive?
 

Slavik

Dabbler
Joined
Jun 6, 2016
Messages
39
SMART data for the drive?
The SMART data is bad. But I'm still concerned about why a bad drive can bring the whole system into a bad state.

Is this the right command to get SMART data?
Code:
[root@freenas] /var/tmp# smartctl -l selftest /dev/da1
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%     17696         1681925312
# 2  Conveyance offline  Completed: read failure       90%     17696         1681925312
# 3  Conveyance offline  Completed: read failure       90%     17696         1681925312
# 4  Short offline       Completed: read failure       90%     17696         1681925312
# 5  Extended offline    Completed: read failure       90%     17436         96040
# 6  Short offline       Completed: read failure       10%     17031         96040
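
The self-test log already answers the question: every test ended in "read failure", and LBA_of_first_error shows where. A sketch that extracts only the failed tests, assuming the column layout shown above (LBA in the last field, power-on hours in the one before it); `failed_selftests` is a hypothetical helper name:

```shell
#!/bin/sh
# failed_selftests (hypothetical): read `smartctl -l selftest` output on stdin
# and print one line per self-test that ended in a failure.
failed_selftests() {
    awk '/^# *[0-9]+ / && /failure/ {
        # $3 $4 = test description, $(NF-1) = LifeTime(hours), $NF = first error LBA
        printf "test %s (%s %s): %s hours, first error at LBA %s\n", $2, $3, $4, $(NF-1), $NF
    }'
}
```

Usage: `smartctl -l selftest /dev/da1 | failed_selftests`. On the log above it prints six lines: the two oldest tests first failed at LBA 96040 and the four newest at LBA 1681925312.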


Also, I see this in the FreeNAS log:
Code:
Jun 28 11:59:57 freenas smartd[2206]: Device: /dev/da1 [SAT], 40 Currently unreadable (pending) sectors
Jun 28 11:59:57 freenas smartd[2206]: Device: /dev/da1 [SAT], 40 Offline uncorrectable sectors


I'll replace the drive and configure RAIDZ1 tonight and see how it goes...
 

maglin

Patron
Joined
Jun 20, 2015
Messages
299
SMART data is retrieved with
smartctl -a /dev/daX, where X is the drive number.
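
For a quick yes/no on the attributes that matter most for a dying disk, a sketch that scans the attribute table printed by `smartctl -A` (also included in `smartctl -a`) and flags nonzero raw values for Reallocated_Sector_Ct (5), Current_Pending_Sector (197) and Offline_Uncorrectable (198), the last two being the counters smartd was already warning about. It assumes the default ATA attribute table layout with RAW_VALUE in the tenth column; the helper name is hypothetical:

```shell
#!/bin/sh
# bad_sector_attrs (hypothetical): read `smartctl -A` output on stdin, print
# attributes 5, 197 and 198 whose raw value is nonzero, and exit 1 if any are.
bad_sector_attrs() {
    awk '
        $1 == 5 || $1 == 197 || $1 == 198 {
            if ($10 + 0 > 0) { print $2 "=" $10; bad = 1 }
        }
        END { exit bad }'
}
```

Usage: `smartctl -A /dev/da1 | bad_sector_attrs || echo "do not trust this disk"`. A nonzero count does not say how fast a drive is dying, only that it has already lost data it could not recover.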

Your badblocks command should be
badblocks -ws -b 4096 /dev/daX

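The -n flag selects the non-destructive read-write mode. For a burn-in you want -w, the destructive write-mode test, which writes every block four times, once per pattern (0xaa, 0x55, 0xff, 0x00), and reads each pass back to verify it.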
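
Because `-w` writes each of its four patterns across the whole disk and then reads it back, the test makes eight full passes, which is why it runs for so long. A rough estimate; the average throughput is an assumption you pass in (real drives are faster on the outer tracks and slower on the inner ones), and the helper name is hypothetical:

```shell
#!/bin/sh
# badblocks_w_hours (hypothetical): rough duration, in whole hours, of
# `badblocks -w` with its default four patterns. Each pattern is written to
# the whole disk and read back, so 4 x 2 = 8 full passes.
# Arguments: disk size in bytes, assumed average throughput in MB/s (10^6 B/s).
badblocks_w_hours() {
    bytes="$1"; mbps="$2"
    passes=8
    echo $(( bytes * passes / (mbps * 1000000) / 3600 ))
}
```

Usage: under an assumed 150 MB/s average, `badblocks_w_hours 3000000000000 150` prints 44, i.e. nearly two days for a nominal 3 TB drive.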

You also need to set up the I/O subsystem so it lets you write to the MBR of the drive (on FreeBSD this is the GEOM debug sysctl, kern.geom.debugflags=0x10). You can find that in the thread for HDD burn-in.
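
On FreeBSD/FreeNAS, setting `kern.geom.debugflags=0x10` lifts GEOM's protection against writing to a disk it considers in use; setting it back to 0 restores the default. A dry-run sketch of the whole sequence; the `DRY_RUN` switch and the function names are hypothetical, and the destructive commands only run when `DRY_RUN=0`:

```shell
#!/bin/sh
# run_or_echo (hypothetical): print the command when DRY_RUN is unset or 1,
# otherwise execute it.
run_or_echo() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# burnin (hypothetical): destructive burn-in of one disk. ERASES EVERYTHING on it.
burnin() {
    dev="$1"
    run_or_echo sysctl kern.geom.debugflags=0x10   # allow raw writes to the disk
    run_or_echo badblocks -ws -b 4096 "$dev"       # 4 patterns, each written and verified
    run_or_echo sysctl kern.geom.debugflags=0      # restore GEOM's write protection
    run_or_echo smartctl -a "$dev"                 # review SMART data afterwards
}
```

Usage: `burnin /dev/da1` only prints the commands; `DRY_RUN=0 burnin /dev/da1` runs them. Double-check the device name first, since `-w` destroys all data on it.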

I can't comment on the system response, but if you had already made a pool with that drive before running badblocks, and your system dataset was on it, I could see that affecting FreeNAS.


 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Was there a pool configured on this drive? If the system dataset was on this drive, its failure could bring the system down.
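
One way to check: FreeNAS keeps its system dataset in a dataset named `.system` directly under whichever pool hosts it, so the list of dataset names shows which pool that is. A sketch that reads `zfs list -H -o name` output; the helper name and the pool name `tank` in the example are hypothetical:

```shell
#!/bin/sh
# system_dataset_pool (hypothetical): read `zfs list -H -o name` on stdin and
# print the pool holding the FreeNAS system dataset (a top-level ".system").
system_dataset_pool() {
    awk -F/ 'NF == 2 && $2 == ".system" { print $1 }'
}
```

Usage: `zfs list -H -o name | system_dataset_pool`. If it prints the pool that lives on the failing disk, I/O errors on that disk can also stall the logging and reporting data FreeNAS writes there.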
 

Slavik

Dabbler
Joined
Jun 6, 2016
Messages
39
Was there a pool configured on this drive? The issue with system dataset being on this drive could bring the system down.
Yes, there was one pool on that single drive.

The system itself, FreeNAS, was installed on VM storage (SSD).
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
FreeNAS stores its system dataset (logs, reporting data and similar files) on your ZFS pool. So when you ran badblocks and the drive started to fail, everything on that drive was ruined, and that is what slowed down your system. It makes complete sense; that is what you would expect to happen.
 