SMART questions + a little issue

Status
Not open for further replies.

razvanc.mobile (Dabbler; joined Oct 19, 2015; 16 messages)
Hey guys,

I'm having a small issue with my FreeNAS server (FreeNAS-9.3-STABLE-201509282017).

My hardware:

SR2612UR chassis with 12 hot-swap 3.5" drive cages
Single L5630 CPU
12 GB ECC RAM
Intel RS2WC040 RAID controller (LSI SAS2008)
12 x 1 TB HDDs (I'll post the make and model of each)

The controller is not a true HBA (it runs the original Intel firmware and has not been reflashed to IT mode), but it can export the drives as JBOD.

This is my RAIDZ1 pool. The server is used only for backups: it exports an 8 TB iSCSI target to a Windows machine running Veeam B&R, and a smaller dataset over NFS to a Proxmox 4.0 server.



Code:
  pool: storage
state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: none requested
config:

        NAME                                            STATE     READ WRITE CKSUM
        storage                                         ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            gptid/a6a38928-75a5-11e5-9681-001e6728a258  ONLINE       0     0     0
            gptid/a78b17c9-75a5-11e5-9681-001e6728a258  ONLINE       0     0     0
            gptid/a85f63ae-75a5-11e5-9681-001e6728a258  ONLINE       0     0     0
            gptid/a8f0e77a-75a5-11e5-9681-001e6728a258  ONLINE       0     0     0
            gptid/a984d1a4-75a5-11e5-9681-001e6728a258  ONLINE       0     0     0
            gptid/aa34a38b-75a5-11e5-9681-001e6728a258  ONLINE       0     0     0
            gptid/aaf61800-75a5-11e5-9681-001e6728a258  ONLINE       0     0     0
            gptid/abb6de05-75a5-11e5-9681-001e6728a258  ONLINE       0     0     0
            gptid/ac9cb3dc-75a5-11e5-9681-001e6728a258  ONLINE       0     0     0
            gptid/ad833005-75a5-11e5-9681-001e6728a258  ONLINE       0     0     0
            gptid/ae56d58c-75a5-11e5-9681-001e6728a258  ONLINE       0     0     1
            gptid/af1aab04-75a5-11e5-9681-001e6728a258  ONLINE       0     0     0





I saw a message about this error in the UI today.
As far as I can tell, it's a checksum error on one of the drives.
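The drive with errors can also be picked out of the zpool status output mechanically rather than by eye. This is a minimal Python sketch, assuming the config table keeps the NAME/STATE/READ/WRITE/CKSUM column layout shown above; the sample is an excerpt of that output and devices_with_errors is a hypothetical helper. It only handles plain integer counters; zpool can abbreviate large counts (for example with a K suffix), and such rows would be skipped here.

```python
import re

# Excerpt of the zpool status config table from the post.
ZPOOL_STATUS = """\
        NAME                                            STATE     READ WRITE CKSUM
        storage                                         ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            gptid/ad833005-75a5-11e5-9681-001e6728a258  ONLINE       0     0     0
            gptid/ae56d58c-75a5-11e5-9681-001e6728a258  ONLINE       0     0     1
            gptid/af1aab04-75a5-11e5-9681-001e6728a258  ONLINE       0     0     0
"""

# One device row: name, state, then the READ/WRITE/CKSUM counters as plain integers.
ROW = re.compile(r"^\s*(\S+)\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$")


def devices_with_errors(status_text):
    """Return (name, read, write, cksum) for every row with a non-zero counter."""
    hits = []
    for line in status_text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # the header line, or anything that is not a device row
        name, _state, r, w, c = m.groups()
        counts = (int(r), int(w), int(c))
        if any(counts):
            hits.append((name, *counts))
    return hits


print(devices_with_errors(ZPOOL_STATUS))
```

For the sample above this reports only the gptid/ae56d58c... member, with a single checksum error.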

I'm trying to identify the drive like this:

Code:
camcontrol devlist
<ATA ST31000524NS SN12>            at scbus0 target 14 lun 0 (pass0)
<ATA ST1000NC001-1DY1 CN01>        at scbus0 target 15 lun 0 (pass1)
<ATA ST1000NC001-1DY1 CN01>        at scbus0 target 16 lun 0 (pass2)
<ATA ST31000524NS SN12>            at scbus0 target 17 lun 0 (pass3)
<INTEL SR2612UR I106>              at scbus0 target 18 lun 0 (ses0,pass4)
<ATA WDC WD1002FAEX-0 1D05>        at scbus0 target 19 lun 0 (pass5)
<ATA Hitachi HUA72201 A3EA>        at scbus0 target 20 lun 0 (pass6)
<ATA Hitachi HUA72201 A3EA>        at scbus0 target 21 lun 0 (pass7)
<ATA Hitachi HUA72201 A3EA>        at scbus0 target 22 lun 0 (pass8)
<ATA ST31000528AS CC38>            at scbus0 target 23 lun 0 (pass9)
<ATA Hitachi HUA72201 A3EA>        at scbus0 target 24 lun 0 (pass10)
<ATA Hitachi HUA72201 A3EA>        at scbus0 target 25 lun 0 (pass11)
<ATA Hitachi HUA72201 A3EA>        at scbus0 target 26 lun 0 (pass12)
<Kingston DataTraveler 2.0 PMAP>   at scbus2 target 0 lun 0 (pass13,da0)

Code:
glabel status
                                      Name  Status  Components
gptid/a85f63ae-75a5-11e5-9681-001e6728a258     N/A  mfisyspd0p2
gptid/a8f0e77a-75a5-11e5-9681-001e6728a258     N/A  mfisyspd1p2
gptid/a984d1a4-75a5-11e5-9681-001e6728a258     N/A  mfisyspd2p2
gptid/aa34a38b-75a5-11e5-9681-001e6728a258     N/A  mfisyspd3p2
gptid/aaf61800-75a5-11e5-9681-001e6728a258     N/A  mfisyspd4p2
gptid/abb6de05-75a5-11e5-9681-001e6728a258     N/A  mfisyspd5p2
gptid/ac9cb3dc-75a5-11e5-9681-001e6728a258     N/A  mfisyspd6p2
gptid/ad833005-75a5-11e5-9681-001e6728a258     N/A  mfisyspd7p2
gptid/ae56d58c-75a5-11e5-9681-001e6728a258     N/A  mfisyspd8p2
gptid/af1aab04-75a5-11e5-9681-001e6728a258     N/A  mfisyspd9p2
gptid/a6a38928-75a5-11e5-9681-001e6728a258     N/A  mfisyspd10p2
gptid/a78b17c9-75a5-11e5-9681-001e6728a258     N/A  mfisyspd11p2
gptid/83846e35-7415-11e5-86c0-001e6728a258     N/A  da0p1


Apparently the issue is with mfisyspd8 (the member with the checksum error, gptid/ae56d58c-75a5-11e5-9681-001e6728a258, is mfisyspd8p2).
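The gptid-to-disk step is a lookup in the glabel status table: find the row for the gptid with errors and strip the partition suffix from its component. A minimal sketch over an excerpt of the output above, assuming three whitespace-separated columns per row; gptid_to_disk is a hypothetical helper name.

```python
import re

# Excerpt of the glabel status output from the post.
GLABEL_STATUS = """\
                                      Name  Status  Components
gptid/ad833005-75a5-11e5-9681-001e6728a258     N/A  mfisyspd7p2
gptid/ae56d58c-75a5-11e5-9681-001e6728a258     N/A  mfisyspd8p2
gptid/af1aab04-75a5-11e5-9681-001e6728a258     N/A  mfisyspd9p2
gptid/83846e35-7415-11e5-86c0-001e6728a258     N/A  da0p1
"""


def gptid_to_disk(glabel_text):
    """Map each gptid label to its whole-disk device (partition suffix stripped)."""
    mapping = {}
    for line in glabel_text.splitlines():
        fields = line.split()
        if len(fields) != 3 or not fields[0].startswith("gptid/"):
            continue  # skip the header and anything unexpected
        label, _status, component = fields
        # 'mfisyspd8p2' -> 'mfisyspd8': drop the trailing GPT partition index.
        mapping[label] = re.sub(r"p\d+$", "", component)
    return mapping


print(gptid_to_disk(GLABEL_STATUS)["gptid/ae56d58c-75a5-11e5-9681-001e6728a258"])  # mfisyspd8
```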

Code:
 mfiutil show drives
mfi0 Physical Drives:
14 (  931G) JBOD <ST31000524NS SN12 serial=9WK39MH1> SATA E1:S1
15 (  931G) JBOD <ST1000NC001-1DY1 CN01 serial=Z1D2V8C6> SATA E1:S2
16 (  931G) JBOD <ST1000NC001-1DY1 CN01 serial=Z1D2V9GP> SATA E1:S3
17 (  931G) JBOD <ST31000524NS SN12 serial=9WK3A4YP> SATA E1:S4
19 (  931G) JBOD <WDC WD1002FAEX-0 1D05 serial=WD-WCATR5822785> SATA E1:S9
20 (  931G) JBOD <Hitachi HUA72201 A3EA serial=JPW9H0N01EMMKV> SATA E1:S10
21 (  931G) JBOD <Hitachi HUA72201 A3EA serial=JPW9H0N01E8GBV> SATA E1:S11
22 (  931G) JBOD <Hitachi HUA72201 A3EA serial=JPW9H0N01EML1V> SATA E1:S12
23 (  931G) JBOD <ST31000528AS CC38 serial=5VP3W9HN> SATA E1:S5
24 (  931G) JBOD <Hitachi HUA72201 A3EA serial=JPW9L0N10JLJBV> SATA E1:S6
25 (  931G) JBOD <Hitachi HUA72201 A3EA serial=JPW9H0N01ELD1V> SATA E1:S7
26 (  931G) JBOD <Hitachi HUA72201 A3EA serial=JPW9H0N01EM34V> SATA E1:S8


As far as I can see, the disk at target 23 is this one:
23 (  931G) JBOD <ST31000528AS CC38 serial=5VP3W9HN> SATA E1:S5
which I assume is the same device as
<ATA ST31000528AS CC38> at scbus0 target 23 lun 0 (pass9)


So the device with the checksum error is pass9.
Indeed, running smartctl -a -d sat /dev/pass9 shows plenty of SMART errors.
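The matching done by hand above can be written down as a join of the two listings. This sketch assumes, exactly as the post does, that on scbus0 the CAM target number equals the device ID printed by mfiutil; as a sanity check it only accepts a pair when the model strings also agree (camcontrol shows these SATA drives with an "ATA" vendor prefix, which is stripped before comparing). The samples are excerpts of the outputs above, and pass_to_physical is a hypothetical name.

```python
import re

# Excerpts of `camcontrol devlist` and `mfiutil show drives` from the post.
CAMCONTROL = """\
<ATA ST31000524NS SN12>            at scbus0 target 14 lun 0 (pass0)
<INTEL SR2612UR I106>              at scbus0 target 18 lun 0 (ses0,pass4)
<ATA ST31000528AS CC38>            at scbus0 target 23 lun 0 (pass9)
<Kingston DataTraveler 2.0 PMAP>   at scbus2 target 0 lun 0 (pass13,da0)
"""
MFIUTIL = """\
mfi0 Physical Drives:
14 (  931G) JBOD <ST31000524NS SN12 serial=9WK39MH1> SATA E1:S1
23 (  931G) JBOD <ST31000528AS CC38 serial=5VP3W9HN> SATA E1:S5
"""

CAM_ROW = re.compile(r"^<(.+?)>\s+at (scbus\d+) target (\d+) lun \d+ \(([^)]*)\)")
MFI_ROW = re.compile(r"^\s*(\d+) \(\s*\S+\)\s+\S+\s+<(.+?) serial=(\S+)>\s+\S+\s+(\S+)")


def pass_to_physical(cam_text, mfi_text, bus="scbus0"):
    """Map passN -> (model, serial, slot), ASSUMING CAM target == mfiutil device ID."""
    drives = {}  # mfiutil device ID -> (model, serial, slot)
    for line in mfi_text.splitlines():
        m = MFI_ROW.match(line)
        if m:
            dev_id, model, serial, slot = m.groups()
            drives[int(dev_id)] = (model, serial, slot)

    result = {}
    for line in cam_text.splitlines():
        m = CAM_ROW.match(line)
        if not m or m.group(2) != bus:
            continue  # other buses (e.g. the USB stick) are not behind mfi0
        model = m.group(1)
        if model.startswith("ATA "):
            model = model[4:]  # drop the "ATA" vendor field before comparing
        target = int(m.group(3))
        pass_dev = next((p for p in m.group(4).split(",") if p.startswith("pass")), None)
        # Accept the pair only if the target exists in mfiutil AND the models agree.
        if pass_dev and target in drives and drives[target][0] == model:
            result[pass_dev] = drives[target]
    return result


print(pass_to_physical(CAMCONTROL, MFIUTIL)["pass9"])
```

With the samples above, pass9 resolves to the ST31000528AS with serial 5VP3W9HN in slot E1:S5, which is the same conclusion reached by hand; the enclosure (pass4) and the USB stick are left out.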

Which brings me to my two questions:

1. Given the mfisyspdX numbering, how can I tell which mfisyspd device corresponds to /dev/pass9?
2. I'm unable to use SMART monitoring in the UI: no SMART tests can be started, probably because of the not-quite-HBA controller I'm using.
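One possible angle on question 1, offered strictly as an unverified guess: the mfisyspd numbers might follow the order in which mfiutil show drives lists the JBOD drives (14, 15, 16, 17, 19, ... rather than slot order). Under that guess mfisyspd8 would be device 23, which agrees with the single data point in this thread (mfisyspd8 carries the checksum error and target 23/pass9 shows SMART errors). One data point proves nothing, and nothing in the thread confirms this as documented mfi driver behavior, so confirm by serial number before pulling a disk. guess_syspd_map is a hypothetical helper.

```python
import re

# `mfiutil show drives` output from the post (device IDs are NOT in slot order).
MFIUTIL = """\
mfi0 Physical Drives:
14 (  931G) JBOD <ST31000524NS SN12 serial=9WK39MH1> SATA E1:S1
15 (  931G) JBOD <ST1000NC001-1DY1 CN01 serial=Z1D2V8C6> SATA E1:S2
16 (  931G) JBOD <ST1000NC001-1DY1 CN01 serial=Z1D2V9GP> SATA E1:S3
17 (  931G) JBOD <ST31000524NS SN12 serial=9WK3A4YP> SATA E1:S4
19 (  931G) JBOD <WDC WD1002FAEX-0 1D05 serial=WD-WCATR5822785> SATA E1:S9
20 (  931G) JBOD <Hitachi HUA72201 A3EA serial=JPW9H0N01EMMKV> SATA E1:S10
21 (  931G) JBOD <Hitachi HUA72201 A3EA serial=JPW9H0N01E8GBV> SATA E1:S11
22 (  931G) JBOD <Hitachi HUA72201 A3EA serial=JPW9H0N01EML1V> SATA E1:S12
23 (  931G) JBOD <ST31000528AS CC38 serial=5VP3W9HN> SATA E1:S5
24 (  931G) JBOD <Hitachi HUA72201 A3EA serial=JPW9L0N10JLJBV> SATA E1:S6
25 (  931G) JBOD <Hitachi HUA72201 A3EA serial=JPW9H0N01ELD1V> SATA E1:S7
26 (  931G) JBOD <Hitachi HUA72201 A3EA serial=JPW9H0N01EM34V> SATA E1:S8
"""


def guess_syspd_map(mfi_text):
    """HYPOTHESIS ONLY: assume mfisyspdN is the N-th JBOD drive in mfiutil's listing order."""
    ids = [int(m.group(1))
           for m in re.finditer(r"^(\d+) \(.*\bJBOD\b", mfi_text, re.MULTILINE)]
    return {f"mfisyspd{n}": dev_id for n, dev_id in enumerate(ids)}


guess = guess_syspd_map(MFIUTIL)
print(guess["mfisyspd8"])  # 23 under this guess; confirm by serial number
```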


The drives show up as mfisyspd0 to mfisyspd11 in the FreeNAS UI. smartctl fails on /dev/mfisyspdX ("No such file or directory") but works just fine on the drives' /dev/passX devices (pass0 to pass3 and pass5 to pass12; pass4 is the enclosure). So how can I make FreeNAS see my drives under their /dev/passX names, or is there another way to have smartctl check my drives automatically, without me having to run it manually?
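Until the UI can drive SMART through these devices, the manual command can at least be scripted. A minimal sketch, assuming the camcontrol devlist format shown above: it builds one smartctl -d sat command per ATA disk that is exposed only as a passN device (so the SES enclosure on pass4 and the USB stick are skipped), and a separate helper runs them. -d sat, -a, -H and -t short are standard smartctl options; the function names are hypothetical, and feeding the results back into the FreeNAS UI is not covered. Running the script periodically (for example from a cron job) is one option.

```python
import re
import subprocess

# `camcontrol devlist` output from the post.
CAMCONTROL = """\
<ATA ST31000524NS SN12>            at scbus0 target 14 lun 0 (pass0)
<ATA ST1000NC001-1DY1 CN01>        at scbus0 target 15 lun 0 (pass1)
<ATA ST1000NC001-1DY1 CN01>        at scbus0 target 16 lun 0 (pass2)
<ATA ST31000524NS SN12>            at scbus0 target 17 lun 0 (pass3)
<INTEL SR2612UR I106>              at scbus0 target 18 lun 0 (ses0,pass4)
<ATA WDC WD1002FAEX-0 1D05>        at scbus0 target 19 lun 0 (pass5)
<ATA Hitachi HUA72201 A3EA>        at scbus0 target 20 lun 0 (pass6)
<ATA Hitachi HUA72201 A3EA>        at scbus0 target 21 lun 0 (pass7)
<ATA Hitachi HUA72201 A3EA>        at scbus0 target 22 lun 0 (pass8)
<ATA ST31000528AS CC38>            at scbus0 target 23 lun 0 (pass9)
<ATA Hitachi HUA72201 A3EA>        at scbus0 target 24 lun 0 (pass10)
<ATA Hitachi HUA72201 A3EA>        at scbus0 target 25 lun 0 (pass11)
<ATA Hitachi HUA72201 A3EA>        at scbus0 target 26 lun 0 (pass12)
<Kingston DataTraveler 2.0 PMAP>   at scbus2 target 0 lun 0 (pass13,da0)
"""

# An ATA disk whose only peripheral is a passN device (no ses/da attached).
ATA_PASS_ROW = re.compile(r"^<ATA [^>]+>\s+at scbus\d+ target \d+ lun \d+ \((pass\d+)\)\s*$")


def smart_commands(devlist_text, action=("-a",)):
    """Build one `smartctl <action> -d sat /dev/passN` command per matching disk."""
    cmds = []
    for line in devlist_text.splitlines():
        m = ATA_PASS_ROW.match(line)
        if m:
            cmds.append(["smartctl", *action, "-d", "sat", "/dev/" + m.group(1)])
    return cmds


def run_commands(cmds):
    """Run each command and return {device: (exit status, stdout)}.

    smartctl's exit status is a bit mask, so a non-zero value does not by itself
    mean the command failed to run; inspect the bits or the output text.
    """
    results = {}
    for cmd in cmds:
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results[cmd[-1]] = (proc.returncode, proc.stdout)
    return results


def live_devlist():
    """Fetch the current device list on the FreeNAS box itself."""
    return subprocess.run(["camcontrol", "devlist"], capture_output=True,
                          text=True, check=True).stdout
```

On the server itself, run_commands(smart_commands(live_devlist(), action=("-t", "short"))) would start a short self-test on every disk, and action=("-H",) would ask each disk for its overall health verdict instead.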
 

anodos (Sambassador, iXsystems; joined Mar 6, 2014; 9,554 messages)
@cyberjock @DrKK @jgreco Holy crap, that's a 12-wide raidz1 vdev. Now there's something ya don't see every day.

I don't have experience with mfi. It feels like you're playing with fire with this setup. I think it would be a good idea to redesign your server so that you (1) use IT firmware and (2) have more vdevs in your pool.
 

cyberjock (Inactive Account; joined Mar 25, 2012; 19,526 messages)
Impressive. You're ignoring quite a few recommendations that directly affect ZFS's reliability.

What anodos recommended is exactly what you should do. You have much bigger problems than just a failing disk. Back up the data, switch to IT firmware, and rebuild your zpool with two (or more) vdevs.
 

razvanc.mobile (Dabbler; joined Oct 19, 2015; 16 messages)
The failing disk is not the issue I posted about. I can rebuild or change the zpool at any time, and I can identify a drive simply by its serial number, pull it out of its hot-swap cage, and replace it.

My two questions were the following, and honestly they won't change even if I make two RAIDZ3 vdevs out of the 12 drives:

- Is there any way to map the /dev/mfisyspd[0-11] devices to the /dev/passX devices, to find which mfisyspd corresponds to which pass device?
- Relatedly, is there any way to tell FreeNAS to collect SMART data using a specific command, AND integrate it into the UI? I can run smartctl -a -d sat /dev/passX manually, or even script it.

PS. Please don't get me wrong. I value your advice, and I am aware of the risk of running a 12-drive RAIDZ1. Simply put, in my setup I value capacity more than safety. Losing less than 10% of the capacity by using RAIDZ1 on ZFS instead of single drives with LVM or plain old md RAID0 is OK, but losing more than that isn't.
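The capacity trade-off in the PS is simple arithmetic: with equal-width RAIDZ vdevs, parity takes (vdevs x parity level) of the disks. The sketch below compares, for 12 disks, the current layout, the two RAIDZ3 vdevs mentioned in the reply, and two other two-vdev layouts chosen purely for illustration. It ignores ZFS metadata, allocation padding and reserved space, so real usable capacity is somewhat lower in every case.

```python
from fractions import Fraction


def parity_fraction(disks, vdevs, parity):
    """Share of raw capacity used by parity, for `vdevs` RAIDZ vdevs of equal width."""
    if disks % vdevs:
        raise ValueError("this sketch assumes the disks split evenly across vdevs")
    return Fraction(vdevs * parity, disks)


# Layouts for 12 disks: the current pool plus illustrative two-vdev alternatives.
LAYOUTS = {
    "1 x 12-wide raidz1": (12, 1, 1),
    "2 x 6-wide raidz1": (12, 2, 1),
    "2 x 6-wide raidz2": (12, 2, 2),
    "2 x 6-wide raidz3": (12, 2, 3),
}

for name, args in LAYOUTS.items():
    print(f"{name}: {float(parity_fraction(*args)):.1%} parity")
```

This prints 8.3%, 16.7%, 33.3% and 50.0% respectively, so of these layouts only the current 12-wide RAIDZ1 stays under the 10% threshold stated above.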


PPS. I have no idea why the quick specs on Intel's site say "No" for JBOD while the manual says otherwise. Anyway, the drives are configured as pass-through, not as individual RAID0 arrays.

Code:
Up to 64 physical drives in passthrough mode (JBOD) and up to 16 physical drives in up
to 16 RAID arrays per controller in RAID mode.
Drives not configured as part of a RAID array can be configured as “pass through” drives
in Non-RAID mode
Support RAID levels 0, 1, 5, 10, and 50
 

Ericloewe (Server Wrangler, Moderator; joined Feb 15, 2014; 20,194 messages)
The controller should be crossflashable to LSI SAS 9211 IT mode.
 