SOLVED zpools on shelf go offline (messages included)

Status
Not open for further replies.

mlinton

Dabbler
Joined
Jun 20, 2013
Messages
24
Good Afternoon Everyone....

Here is what I have...
Supermicro 16Bay Server
- X7DBE
- 2x Intel Xeon
- 32GB ECC-RAM
- CSE-836TQ-R800B

IBM M1015 (IT Mode)
- Port 1 of the M1015 has a fan-out cable going to the front bays of the chassis. Currently, there are only three 2 TB drives in RAIDZ (used for backups / mirrors). This pool has been working great, although one of my drives is starting to fail, so the RMA process has begun. That is neither here nor there...

- Port 2 jumps over to an internal to external SAS adapter, heads out the back of the chassis and plugs into a DELL MD1220 (x24, 2.5 inch hard drive shelf with dual controller modules). Here is where my problem is.

There is only one zpool on the shelf configured as:
- x6, 300 GB SAS drives (two RAIDZ vdevs)
- x1, 60 GB SSD (Cache)
- x1, 60 GB SSD (LOG)
- x1, 300 GB SAS (Spare)

This pool is used exclusively for ESXi storage via NFS. The pool is also (or rather, was) being replicated to the other internal pool that I mentioned earlier. Usually, this pool works just fine for my needs: 6-12 VMs with relatively low IO needs. Until a few days ago, that is. I would get an email that looks like this...

Device: /dev/da12 [SAT], not capable of SMART self-check

Device: /dev/da11 [SAT], Read SMART Self-Test Log Failed

Device: /dev/da1 [SAT], 40 Currently unreadable (pending) sectors

Device: /dev/da11 [SAT], not capable of SMART self-check

Device: /dev/da11 [SAT], failed to read SMART Attribute Data

Device: /dev/da12 [SAT], Read SMART Self-Test Log Failed

Device: /dev/da10 [SAT], Read SMART Error Log Failed

Device: /dev/da8, failed to read SMART values

Device: /dev/da10 [SAT], Read SMART Self-Test Log Failed

The volume zpool02_vmware (ZFS) state is UNAVAIL: One or more devices are faulted in response to IO failures.

Device: /dev/da10 [SAT], failed to read SMART Attribute Data

Device: /dev/da12 [SAT], failed to read SMART Attribute Data

Device: /dev/da10 [SAT], not capable of SMART self-check

Followed up by tons of SYSTEM DOWN emails -- because the zpool was going offline.
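
One way to read the alert list above: most of the messages say smartd could not talk to a drive at all, while only da1 reports an actual media problem. A minimal Python sketch of that split follows. The `classify` helper and its phrase list are assumptions made for illustration, not part of smartd or FreeNAS, and the sample is a subset of the alerts quoted above.

```python
import re
from collections import defaultdict

# Subset of the alert lines quoted in this post.
ALERTS = """\
Device: /dev/da12 [SAT], not capable of SMART self-check
Device: /dev/da11 [SAT], Read SMART Self-Test Log Failed
Device: /dev/da1 [SAT], 40 Currently unreadable (pending) sectors
Device: /dev/da8, failed to read SMART values
Device: /dev/da10 [SAT], Read SMART Error Log Failed
"""

# Matches "Device: /dev/daN [SAT], message" with the [SAT] tag optional.
ALERT_RE = re.compile(r"^Device: (/dev/\w+)(?: \[\w+\])?, (.+)$")

# Heuristic (an assumption of this sketch): these phrases suggest smartd
# could not query the drive, as opposed to the drive reporting bad sectors.
UNREACHABLE_HINTS = ("not capable of", "failed to read", "Failed")


def classify(text):
    """Group device names into 'unreachable' and 'media' alert kinds."""
    result = defaultdict(set)
    for line in text.splitlines():
        match = ALERT_RE.match(line.strip())
        if not match:
            continue
        device, message = match.groups()
        if any(hint in message for hint in UNREACHABLE_HINTS):
            result["unreachable"].add(device)
        else:
            result["media"].add(device)
    return {kind: sorted(devices) for kind, devices in result.items()}


if __name__ == "__main__":
    for kind, devices in classify(ALERTS).items():
        print(kind, devices)
```

On this sample, da8 and da10 through da12 land in the "unreachable" group and only da1 lands in "media", which hints at a problem on the path to the shelf rather than several drives failing at once.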

During all of this, the other pool (which also has a few small CIFS shares on it) remains online and the rest of the system works as expected. Unfortunately, all of my VMs quickly find themselves without access to storage and become inaccessible. Both the shelf and the server have dual power supplies going to separate battery backup units. The system is also cabled to the first UPS to shut down at low battery.

For the last couple of days, this pool has gone down once or twice a day. According to the GUI it is simply "unknown" -- yet clicking on "view disk" shows all of the disks that I expect to see.

I have been hacking at this all morning and cannot seem to find the problem. I even tried rebuilding the pool yesterday (and running long tests on all of my drives beforehand). For some reason, the SMART reporting on one or more of those drives kicks the entire pool offline.

Interestingly, rebooting the FreeNAS box will bring everything back online and it acts like nothing ever happened. Other than my entire environment having to be rebooted from the ground up (which sucks), there are no permanent problems.

Please Advise, between this and three VERY loud young children -- my brain is fried.

FreeNAS messages are attached. I went back just before the crash and then into the reboot.

Thanks All,
Mark
 

Attachments

  • freenas_messages.txt
    58.7 KB · Views: 247

mlinton

Dabbler
Joined
Jun 20, 2013
Messages
24
oops, I forgot to mention -- da1 is the drive with a known issue in the good pool, da3 and up are a part of the pool with issues.
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
Port 2 jumps over to an internal to external SAS adapter, heads out the back of the chassis and plugs into a DELL MD1220 (x24, 2.5 inch hard drive shelf with dual controller modules). Here is where my problem is.
Exactly ;)

But seriously, unless your M1015 is on the wrong firmware revision, it seems likely that something along the route from there to the drives is the issue. I'm probably stating the obvious though...
 

mlinton

Dabbler
Joined
Jun 20, 2013
Messages
24
I'm glad I found my own problem. Even if I was wrong!

Just for kicks...
mps0@pci0:9:0:0: class=0x010700 card=0x30201000 chip=0x00721000 rev=0x03 hdr=0x00
vendor = 'LSI Logic / Symbios Logic'
device = 'SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon]'
class = mass storage
subclass = SAS


FreeNAS Version: FreeNAS-9.3-STABLE-201506232120
 

mlinton

Dabbler
Joined
Jun 20, 2013
Messages
24
Well.... After three smooth days it happened again. It occurred while running a Time Machine backup for a client computer to zpool01 (the good pool). I wonder if it has something to do with system load.... Here are some messages for anyone with thoughts...

Jul 1 13:51:10 MLTS-NKNAS01 smartd[3597]: Device: /dev/da1 [SAT], 40 Currently unreadable (pending) sectors
Jul 1 13:51:21 MLTS-NKNAS01 (da12:mps0:0:31:0): WRITE(10). CDB: 2a 00 00 43 c5 e8 00 00 60 00 length 49152 SMID 471 terminated ioc 804b scsi 0 state c xfer 0
Jul 1 13:51:21 MLTS-NKNAS01 (da12:mps0:0:31:0): WRITE(10). CDB: 2a 00 00 43 c5 e8 00 00 60 00
Jul 1 13:51:21 MLTS-NKNAS01 (da12:mps0:0:31:0): CAM status: SCSI Status Error
Jul 1 13:51:21 MLTS-NKNAS01 (da12:mps0:0:31:0): SCSI status: Check Condition
Jul 1 13:51:21 MLTS-NKNAS01 (da12:mps0:0:31:0): SCSI sense: NOT READY asc:4,0 (Logical unit not ready, cause not reportable)
Jul 1 13:51:22 MLTS-NKNAS01 (da12:mps0:0:31:0): Retrying command (per sense data)
Jul 1 13:51:22 MLTS-NKNAS01 (da12:mps0:0:31:0): WRITE(10). CDB: 2a 00 00 43 c5 e8 00 00 60 00
Jul 1 13:51:22 MLTS-NKNAS01 (da12:mps0:0:31:0): CAM status: SCSI Status Error
Jul 1 13:51:22 MLTS-NKNAS01 (da12:mps0:0:31:0): SCSI status: Check Condition
Jul 1 13:51:22 MLTS-NKNAS01 (da12:mps0:0:31:0): SCSI sense: NOT READY asc:4,0 (Logical unit not ready, cause not reportable)
Jul 1 13:51:22 MLTS-NKNAS01 (da12:mps0:0:31:0): Retrying command (per sense data)
Jul 1 13:51:22 MLTS-NKNAS01 (da12:mps0:0:31:0): WRITE(10). CDB: 2a 00 00 43 b3 a8 00 00 08 00
Jul 1 13:51:22 MLTS-NKNAS01 (da12:mps0:0:31:0): CAM status: SCSI Status Error
Jul 1 13:51:22 MLTS-NKNAS01 (da12:mps0:0:31:0): SCSI status: Check Condition
Jul 1 13:51:22 MLTS-NKNAS01 (da12:mps0:0:31:0): SCSI sense: NOT READY asc:4,0 (Logical unit not ready, cause not reportable)
Jul 1 13:51:22 MLTS-NKNAS01 (da12:mps0:0:31:0): Retrying command (per sense data)
Jul 1 13:51:22 MLTS-NKNAS01 (da12:mps0:0:31:0): WRITE(10). CDB: 2a 00 00 43 c7 e8 00 00 20 00
Jul 1 13:51:22 MLTS-NKNAS01 (da12:mps0:0:31:0): CAM status: SCSI Status Error
Jul 1 13:51:22 MLTS-NKNAS01 (da12:mps0:0:31:0): SCSI status: Check Condition
Jul 1 13:51:22 MLTS-NKNAS01 (da12:mps0:0:31:0): SCSI sense: NOT READY asc:4,0 (Logical unit not ready, cause not reportable)
Jul 1 13:51:22 MLTS-NKNAS01 (da12:mps0:0:31:0): Retrying command (per sense data)
Jul 1 13:51:22 MLTS-NKNAS01 (da12:mps0:0:31:0): WRITE(10). CDB: 2a 00 00 43 c5 e8 00 00 60 00
Jul 1 13:51:22 MLTS-NKNAS01 (da12:mps0:0:31:0): CAM status: SCSI Status Error
Jul 1 13:51:22 MLTS-NKNAS01 (da12:mps0:0:31:0): SCSI status: Check Condition
Jul 1 13:51:22 MLTS-NKNAS01 (da12:mps0:0:31:0): SCSI sense: NOT READY asc:4,0 (Logical unit not ready, cause not reportable)
Jul 1 13:51:22 MLTS-NKNAS01 (da12:mps0:0:31:0): Retrying command (per sense data)
Jul 1 13:51:22 MLTS-NKNAS01 (da12:mps0:0:31:0): WRITE(10). CDB: 2a 00 00 43 b3 a8 00 00 08 00
Jul 1 13:51:22 MLTS-NKNAS01 (da12:mps0:0:31:0): CAM status: SCSI Status Error
Jul 1 13:51:22 MLTS-NKNAS01 (da12:mps0:0:31:0): SCSI status: Check Condition
Jul 1 13:51:22 MLTS-NKNAS01 (da12:mps0:0:31:0): SCSI sense: NOT READY asc:4,0 (Logical unit not ready, cause not reportable)
Jul 1 13:51:22 MLTS-NKNAS01 (da12:mps0:0:31:0): Retrying command (per sense data)
Jul 1 13:51:22 MLTS-NKNAS01 (da12:mps0:0:31:0): WRITE(10). CDB: 2a 00 00 43 c7 e8 00 00 20 00
Jul 1 13:51:22 MLTS-NKNAS01 (da12:mps0:0:31:0): CAM status: SCSI Status Error
Jul 1 13:51:22 MLTS-NKNAS01 (da12:mps0:0:31:0): SCSI status: Check Condition
Jul 1 13:51:22 MLTS-NKNAS01 (da12:mps0:0:31:0): SCSI sense: NOT READY asc:4,0 (Logical unit not ready, cause not reportable)
Jul 1 13:51:22 MLTS-NKNAS01 (da12:mps0:0:31:0): Retrying command (per sense data)
Jul 1 13:51:22 MLTS-NKNAS01 (da12:mps0:0:31:0): WRITE(10). CDB: 2a 00 00 43 c5 e8 00 00 60 00
Jul 1 13:51:22 MLTS-NKNAS01 (da12:mps0:0:31:0): CAM status: SCSI Status Error
Jul 1 13:51:22 MLTS-NKNAS01 (da12:mps0:0:31:0): SCSI status: Check Condition
Jul 1 13:51:22 MLTS-NKNAS01 (da12:mps0:0:31:0): SCSI sense: NOT READY asc:4,0 (Logical unit not ready, cause not reportable)
Jul 1 13:51:22 MLTS-NKNAS01 (da12:mps0:0:31:0): Retrying command (per sense data)
Jul 1 13:51:23 MLTS-NKNAS01 (da12:mps0:0:31:0): WRITE(10). CDB: 2a 00 00 43 b3 a8 00 00 08 00 length 4096 SMID 469 terminated ioc 804b scsi 0 state 0 xfer 0
Jul 1 13:51:23 MLTS-NKNAS01 (da12:mps0:0:31:0): WRITE(10). CDB: 2a 00 00 43 c7 e8 00 00 20 00 length 16384 SMID 509 terminated ioc 804b scsi 0 state 0 xfer 0
Jul 1 13:51:23 MLTS-NKNAS01 (da12:mps0:0:31:0): WRITE(10). CDB: 2a 00 00 43 c5 e8 00 00 60 00 length 49152 SMID 98 terminated ioc 804b scsi 0 state 0 xfer 0
Jul 1 13:51:24 MLTS-NKNAS01 (da12:mps0:0:31:0): WRITE(10). CDB: 2a 00 00 43 c7 e8 00 00 20 00
Jul 1 13:51:24 MLTS-NKNAS01 (da12:mps0:0:31:0): CAM status: CCB request aborted by the host
Jul 1 13:51:24 MLTS-NKNAS01 (da12:mps0:0:31:0): Retrying command
Jul 1 13:51:24 MLTS-NKNAS01 (da12:mps0:0:31:0): WRITE(10). CDB: 2a 00 00 43 b3 a8 00 00 08 00
Jul 1 13:51:24 MLTS-NKNAS01 (da12:mps0:0:31:0): CAM status: CCB request aborted by the host
Jul 1 13:51:24 MLTS-NKNAS01 (da12:mps0:0:31:0): Retrying command
Jul 1 13:51:24 MLTS-NKNAS01 (da12:mps0:0:31:0): WRITE(10). CDB: 2a 00 00 43 c5 e8 00 00 60 00
Jul 1 13:51:24 MLTS-NKNAS01 (da12:mps0:0:31:0): CAM status: CCB request aborted by the host
Jul 1 13:51:24 MLTS-NKNAS01 (da12:mps0:0:31:0): Error 5, Retries exhausted
Jul 1 13:51:24 MLTS-NKNAS01 da5 at mps0 bus 0 scbus0 target 19 lun 0
Jul 1 13:51:24 MLTS-NKNAS01 da5: <SEAGATE ST9300605SS 0004> s/n 6XP5EDNV0000B3369V2A detached
Jul 1 13:51:24 MLTS-NKNAS01 da4 at mps0 bus 0 scbus0 target 18 lun 0
Jul 1 13:51:24 MLTS-NKNAS01 da4: <TOSHIBA MBF2300RC 5704> s/n EB10PC4040PT detached
Jul 1 13:51:24 MLTS-NKNAS01 da7 at mps0 bus 0 scbus0 target 21 lun 0
Jul 1 13:51:24 MLTS-NKNAS01 da7: <SEAGATE ST9300605SS 0004> s/n 6XP5E6350000B337QBN6 detached
Jul 1 13:51:24 MLTS-NKNAS01 da6 at mps0 bus 0 scbus0 target 20 lun 0
Jul 1 13:51:24 MLTS-NKNAS01 da6: <SEAGATE ST9300605SS 0004> s/n 6XP5E60S0000B337QB9X detached
Jul 1 13:51:24 MLTS-NKNAS01 da3 at mps0 bus 0 scbus0 target 17 lun 0
Jul 1 13:51:24 MLTS-NKNAS01 da3: <SEAGATE ST9300605SS 0004> s/n 6XP5E5QV0000B337QBHN detached
Jul 1 13:51:24 MLTS-NKNAS01 da8 at mps0 bus 0 scbus0 target 22 lun 0
Jul 1 13:51:24 MLTS-NKNAS01 da8: <TOSHIBA MBF2300RC 5704> s/n EB10PC4041N7 detached
Jul 1 13:51:24 MLTS-NKNAS01 da11 at mps0 bus 0 scbus0 target 30 lun 0
Jul 1 13:51:24 MLTS-NKNAS01 da11: <ATA KINGSTON SV300S3 BBF0> s/n 50026B775504339C detached
Jul 1 13:51:24 MLTS-NKNAS01 da9 at mps0 bus 0 scbus0 target 23 lun 0
Jul 1 13:51:24 MLTS-NKNAS01 da9: <TOSHIBA MBF2300RC 5704> s/n EB10PC40421T detached
Jul 1 13:51:24 MLTS-NKNAS01 da10 at mps0 bus 0 scbus0 target 26 lun 0
Jul 1 13:51:24 MLTS-NKNAS01 da10: <ATA HGST HTS721010A9 A3J0> s/n JS10106200N7UT detached
Jul 1 13:51:24 MLTS-NKNAS01 ses0 at mps0 bus 0 scbus0 target 25 lun 0
Jul 1 13:51:24 MLTS-NKNAS01 ses0: <DELL MD1220 1.01> detached
Jul 1 13:51:24 MLTS-NKNAS01 da12 at mps0 bus 0 scbus0 target 31 lun 0
Jul 1 13:51:24 MLTS-NKNAS01 da12: <ATA KINGSTON SV300S3 BBF0> s/n 50026B775504374A detached
Jul 1 13:51:24 MLTS-NKNAS01 (ses0:mps0:0:25:0): Periph destroyed
Jul 1 13:51:24 MLTS-NKNAS01 GEOM_ELI: Device da5p1.eli destroyed.
Jul 1 13:51:24 MLTS-NKNAS01 GEOM_ELI: Detached da5p1.eli on last close.
Jul 1 13:51:24 MLTS-NKNAS01 GEOM_ELI: Device da4p1.eli destroyed.
Jul 1 13:51:24 MLTS-NKNAS01 GEOM_ELI: Detached da4p1.eli on last close.
Jul 1 13:51:24 MLTS-NKNAS01 GEOM_ELI: Device da7p1.eli destroyed.
Jul 1 13:51:24 MLTS-NKNAS01 GEOM_ELI: Detached da7p1.eli on last close.
Jul 1 13:51:24 MLTS-NKNAS01 GEOM_ELI: Device da6p1.eli destroyed.
Jul 1 13:51:24 MLTS-NKNAS01 GEOM_ELI: Detached da6p1.eli on last close.
Jul 1 13:51:24 MLTS-NKNAS01 GEOM_ELI: Device da3p1.eli destroyed.
Jul 1 13:51:24 MLTS-NKNAS01 GEOM_ELI: Detached da3p1.eli on last close.
Jul 1 13:51:24 MLTS-NKNAS01 GEOM_ELI: Device da8p1.eli destroyed.
Jul 1 13:51:24 MLTS-NKNAS01 GEOM_ELI: Detached da8p1.eli on last close.
Jul 1 13:51:24 MLTS-NKNAS01 GEOM_ELI: Device da9p1.eli destroyed.
Jul 1 13:51:24 MLTS-NKNAS01 GEOM_ELI: Detached da9p1.eli on last close.
Jul 1 13:51:24 MLTS-NKNAS01 (da12:mps0:0:31:0): Periph destroyed
Jul 1 13:51:24 MLTS-NKNAS01 (da11:mps0:0:30:0): Periph destroyed
Jul 1 13:51:24 MLTS-NKNAS01 (da10:mps0:0:26:0): Periph destroyed
Jul 1 13:51:24 MLTS-NKNAS01 (da9:mps0:0:23:0): Periph destroyed
Jul 1 13:51:24 MLTS-NKNAS01 (da8:mps0:0:22:0): Periph destroyed
Jul 1 13:51:24 MLTS-NKNAS01 (da7:mps0:0:21:0): Periph destroyed
Jul 1 13:51:24 MLTS-NKNAS01 (da6:mps0:0:20:0): Periph destroyed
Jul 1 13:51:24 MLTS-NKNAS01 (da5:mps0:0:19:0): Periph destroyed
Jul 1 13:51:24 MLTS-NKNAS01 (da4:mps0:0:18:0): Periph destroyed
Jul 1 13:51:25 MLTS-NKNAS01 (da3:mps0:0:17:0): Periph destroyed
Jul 1 13:51:27 MLTS-NKNAS01 (probe9:mps0:0:25:0): REPORT LUNS. CDB: a0 00 00 00 00 00 00 00 00 10 00 00
Jul 1 13:51:27 MLTS-NKNAS01 (probe9:mps0:0:25:0): CAM status: SCSI Status Error
Jul 1 13:51:27 MLTS-NKNAS01 (probe9:mps0:0:25:0): SCSI status: Check Condition
Jul 1 13:51:27 MLTS-NKNAS01 (probe9:mps0:0:25:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
Jul 1 13:51:27 MLTS-NKNAS01 (probe9:mps0:0:25:0): Retrying command (per sense data)
Jul 1 13:51:27 MLTS-NKNAS01 ses0 at mps0 bus 0 scbus0 target 25 lun 0
Jul 1 13:51:27 MLTS-NKNAS01 ses0: <DELL MD1220 1.01> Fixed Enclosure Services SCSI-5 device
Jul 1 13:51:27 MLTS-NKNAS01 ses0: 600.000MB/s transfers
Jul 1 13:51:27 MLTS-NKNAS01 ses0: Command Queueing enabled
Jul 1 13:51:27 MLTS-NKNAS01 ses0: SCSI-3 ENC Device
Jul 1 13:51:27 MLTS-NKNAS01 zfsd: Replace vdev(zpool02_vmware/16307615575899142963) by physical path(attach): cannot replace 16307615575899142963 with /dev/gptid/9c7d276a-1baa$
Jul 1 13:51:27 MLTS-NKNAS01 zfsd: Event exceeds event size limit of 8192 bytes.
Jul 1 13:51:27 MLTS-NKNAS01 zfsd: Truncated 11 characters from event.
Jul 1 13:51:28 MLTS-NKNAS01 ses0: da4,pass5: SAS Device Slot Element: 2 Phys
Jul 1 13:51:28 MLTS-NKNAS01 ses0: phy 0: SATA device
Jul 1 13:51:28 MLTS-NKNAS01 da3 at mps0 bus 0 scbus0 target 30 lun 0
Jul 1 13:51:28 MLTS-NKNAS01 ses0: phy 0: parent 500c04f253bfe93f addr 500c04f253bfe912
Jul 1 13:51:28 MLTS-NKNAS01 da3: ses0: phy 1: SAS device type 0 id 1
Jul 1 13:51:28 MLTS-NKNAS01 syslog-ng[5038]: Error processing log message: <ATA KINGSTON SV300S3 BBF0> Fixed Direct Access SCSI-6 device
Jul 1 13:51:28 MLTS-NKNAS01 ses0: phy 1: protocols: Initiator( None ) Target( None )
Jul 1 13:51:28 MLTS-NKNAS01 da3: Serial Number 50026B775504339C
Jul 1 13:51:28 MLTS-NKNAS01 ses0: phy 1: parent 0 addr 0
Jul 1 13:51:28 MLTS-NKNAS01 da3: 600.000MB/s transfers
Jul 1 13:51:28 MLTS-NKNAS01 da3: Command Queueing enabled
Jul 1 13:51:28 MLTS-NKNAS01 da3: 57241MB (117231408 512 byte sectors: 255H 63S/T 7297C)
Jul 1 13:51:28 MLTS-NKNAS01 da4 at mps0 bus 0 scbus0 target 31 lun 0
Jul 1 13:51:28 MLTS-NKNAS01 da4: <ATA KINGSTON SV300S3 BBF0> Fixed Direct Access SCSI-6 device
Jul 1 13:51:28 MLTS-NKNAS01 da4: Serial Number 50026B775504374A
Jul 1 13:51:28 MLTS-NKNAS01 da4: 600.000MB/s transfers
Jul 1 13:51:28 MLTS-NKNAS01 da4: Command Queueing enabled
Jul 1 13:51:28 MLTS-NKNAS01 da4: 57241MB (117231408 512 byte sectors: 255H 63S/T 7297C)
Jul 1 13:51:28 MLTS-NKNAS01 ses0: da3,pass4: SAS Device Slot Element: 2 Phys
Jul 1 13:51:28 MLTS-NKNAS01 ses0: phy 0: SATA device
Jul 1 13:51:28 MLTS-NKNAS01 ses0: phy 0: parent 500c04f253bfe93f addr 500c04f253bfe913
Jul 1 13:51:28 MLTS-NKNAS01 ses0: phy 1: SAS device type 0 id 1
Jul 1 13:51:28 MLTS-NKNAS01 ses0: phy 1: protocols: Initiator( None ) Target( None )
Jul 1 13:51:28 MLTS-NKNAS01 ses0: phy 1: parent 0 addr 0
Jul 1 13:51:30 MLTS-NKNAS01 da5 at mps0 bus 0 scbus0 target 26 lun 0
Jul 1 13:51:30 MLTS-NKNAS01 da5: <ATA HGST HTS721010A9 A3J0> Fixed Direct Access SCSI-6 device
Jul 1 13:51:30 MLTS-NKNAS01 da5: Serial Number JS10106200N7UT
Jul 1 13:51:30 MLTS-NKNAS01 da5: 600.000MB/s transfers
Jul 1 13:51:30 MLTS-NKNAS01 da5: Command Queueing enabled
Jul 1 13:51:30 MLTS-NKNAS01 da5: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)
Jul 1 13:51:31 MLTS-NKNAS01 da6 at mps0 bus 0 scbus0 target 19 lun 0
Jul 1 13:51:31 MLTS-NKNAS01 da6: <SEAGATE ST9300605SS 0004> Fixed Direct Access SCSI-6 device
Jul 1 13:51:31 MLTS-NKNAS01 da6: Serial Number 6XP5EDNV0000B3369V2A
Jul 1 13:51:31 MLTS-NKNAS01 da6: 600.000MB/s transfers
Jul 1 13:51:31 MLTS-NKNAS01 da6: Command Queueing enabled
Jul 1 13:51:31 MLTS-NKNAS01 da6: 286102MB (585937500 512 byte sectors: 255H 63S/T 36472C)
Jul 1 13:51:31 MLTS-NKNAS01 ses0: da6,pass7: SAS Device Slot Element: 2 Phys
Jul 1 13:51:31 MLTS-NKNAS01 da7 at mps0 bus 0 scbus0 target 17 lun 0
Jul 1 13:51:31 MLTS-NKNAS01 ses0: phy 0: SAS device type 1 id 0
Jul 1 13:51:31 MLTS-NKNAS01 da7: ses0: phy 0: protocols: Initiator( None ) Target( SSP )
Jul 1 13:51:31 MLTS-NKNAS01 syslog-ng[5038]: Error processing log message: <SEAGATE ST9300605SS 0004> Fixed Direct Access SCSI-6 device
Jul 1 13:51:31 MLTS-NKNAS01 ses0: phy 0: parent 500c04f253bfe93f addr 5000c5005fc6a495
Jul 1 13:51:31 MLTS-NKNAS01 da7: Serial Number 6XP5E5QV0000B337QBHN
Jul 1 13:51:31 MLTS-NKNAS01 ses0: phy 1: SAS device type 1 id 1
Jul 1 13:51:31 MLTS-NKNAS01 da7: 600.000MB/s transfersses0: phy 1: protocols: Initiator( None ) Target( SSP )
Jul 1 13:51:31 MLTS-NKNAS01 ses0: phy 1: parent 500c04f253bfe9bf addr 5000c5005fc6a496
Jul 1 13:51:31 MLTS-NKNAS01 da7: Command Queueing enabled
Jul 1 13:51:31 MLTS-NKNAS01 da7: 286102MB (585937500 512 byte sectors: 255H 63S/T 36472C)
Jul 1 13:51:31 MLTS-NKNAS01 ses0: da7,pass8: SAS Device Slot Element: 2 Phys
Jul 1 13:51:31 MLTS-NKNAS01 ses0: phy 0: SAS device type 1 id 0
Jul 1 13:51:31 MLTS-NKNAS01 da8 at mps0 bus 0 scbus0 target 21 lun 0
Jul 1 13:51:31 MLTS-NKNAS01 ses0: phy 0: protocols: Initiator( None ) Target( SSP )
Jul 1 13:51:31 MLTS-NKNAS01 ses0: phy 0: parent 500c04f253bfe93f addr 5000c5005fd3789d
Jul 1 13:51:31 MLTS-NKNAS01 da8: ses0: phy 1: SAS device type 1 id 1
Jul 1 13:51:31 MLTS-NKNAS01 syslog-ng[5038]: Error processing log message: <SEAGATE ST9300605SS 0004> Fixed Direct Access SCSI-6 device
Jul 1 13:51:31 MLTS-NKNAS01 ses0: phy 1: protocols: Initiator( None ) Target( SSP )
Jul 1 13:51:31 MLTS-NKNAS01 da8: Serial Number 6XP5E6350000B337QBN6
Jul 1 13:51:31 MLTS-NKNAS01 ses0: phy 1: parent 500c04f253bfe9bf addr 5000c5005fd3789e
Jul 1 13:51:31 MLTS-NKNAS01 da8: 600.000MB/s transfers
Jul 1 13:51:31 MLTS-NKNAS01 da8: Command Queueing enabled
Jul 1 13:51:31 MLTS-NKNAS01 da8: 286102MB (585937500 512 byte sectors: 255H 63S/T 36472C)
Jul 1 13:51:31 MLTS-NKNAS01 ses0: da8,pass9: SAS Device Slot Element: 2 Phys
Jul 1 13:51:31 MLTS-NKNAS01 ses0: phy 0: SAS device type 1 id 0
Jul 1 13:51:31 MLTS-NKNAS01 ses0: phy 0: protocols: Initiator( None ) Target( SSP )
Jul 1 13:51:31 MLTS-NKNAS01 ses0: phy 0: parent 500c04f253bfe93f addr 5000c5005fd3638d
Jul 1 13:51:31 MLTS-NKNAS01 ses0: phy 1: SAS device type 1 id 1
Jul 1 13:51:31 MLTS-NKNAS01 ses0: phy 1: protocols: Initiator( None ) Target( SSP )
Jul 1 13:51:31 MLTS-NKNAS01 ses0: phy 1: parent 500c04f253bfe9bf addr 5000c5005fd3638e
Jul 1 13:51:35 MLTS-NKNAS01 ses0: da9,pass10: SAS Device Slot Element: 2 Phys
Jul 1 13:51:35 MLTS-NKNAS01 da10 at mps0 bus 0 scbus0 target 18 lun 0
Jul 1 13:51:35 MLTS-NKNAS01 ses0: phy 0: SAS device type 1 id 0
Jul 1 13:51:35 MLTS-NKNAS01 da10: ses0: phy 0: protocols: Initiator( None ) Target( SSP )
Jul 1 13:51:35 MLTS-NKNAS01 syslog-ng[5038]: Error processing log message: <TOSHIBA MBF2300RC 5704> Fixed Direct Access SCSI-5 device
Jul 1 13:51:35 MLTS-NKNAS01 ses0: phy 0: parent 500c04f253bfe93f addr 50000393f8305e1a
Jul 1 13:51:35 MLTS-NKNAS01 da10: Serial Number EB10PC4040PT
Jul 1 13:51:35 MLTS-NKNAS01 ses0: phy 1: SAS device type 1 id 1
Jul 1 13:51:35 MLTS-NKNAS01 da10: 600.000MB/s transfersses0: phy 1: protocols: Initiator( None ) Target( SSP )
Jul 1 13:51:35 MLTS-NKNAS01 ses0: phy 1: parent 500c04f253bfe9bf addr 50000393f8305e1b
Jul 1 13:51:35 MLTS-NKNAS01 da10: Command Queueing enabled
Jul 1 13:51:35 MLTS-NKNAS01 da10: 286102MB (585937500 512 byte sectors: 255H 63S/T 36472C)
Jul 1 13:51:35 MLTS-NKNAS01 da9 at mps0 bus 0 scbus0 target 22 lun 0
Jul 1 13:51:35 MLTS-NKNAS01 da9: <TOSHIBA MBF2300RC 5704> Fixed Direct Access SCSI-5 device
Jul 1 13:51:35 MLTS-NKNAS01 da9: Serial Number EB10PC4041N7
Jul 1 13:51:35 MLTS-NKNAS01 da9: 600.000MB/s transfers
Jul 1 13:51:35 MLTS-NKNAS01 da9: Command Queueing enabled
Jul 1 13:51:35 MLTS-NKNAS01 da9: 286102MB (585937500 512 byte sectors: 255H 63S/T 36472C)
Jul 1 13:51:35 MLTS-NKNAS01 da11 at mps0 bus 0 scbus0 target 23 lun 0
Jul 1 13:51:35 MLTS-NKNAS01 da11: <TOSHIBA MBF2300RC 5704> Fixed Direct Access SCSI-5 device
Jul 1 13:51:35 MLTS-NKNAS01 da11: Serial Number EB10PC40421T
Jul 1 13:51:35 MLTS-NKNAS01 da11: 600.000MB/s transfers
Jul 1 13:51:35 MLTS-NKNAS01 da11: Command Queueing enabled
Jul 1 13:51:35 MLTS-NKNAS01 da11: 286102MB (585937500 512 byte sectors: 255H 63S/T 36472C)
Jul 1 13:51:35 MLTS-NKNAS01 ses0: da10,pass11: SAS Device Slot Element: 2 Phys
Jul 1 13:51:35 MLTS-NKNAS01 ses0: phy 0: SAS device type 1 id 0
Jul 1 13:51:35 MLTS-NKNAS01 ses0: phy 0: protocols: Initiator( None ) Target( SSP )
Jul 1 13:51:35 MLTS-NKNAS01 ses0: phy 0: parent 500c04f253bfe93f addr 50000393f82adf5e
Jul 1 13:51:35 MLTS-NKNAS01 ses0: phy 1: SAS device type 1 id 1
Jul 1 13:51:35 MLTS-NKNAS01 ses0: phy 1: protocols: Initiator( None ) Target( SSP )
Jul 1 13:51:35 MLTS-NKNAS01 ses0: phy 1: parent 500c04f253bfe9bf addr 50000393f82adf5f
Jul 1 13:51:35 MLTS-NKNAS01 ses0: da11,pass12: SAS Device Slot Element: 2 Phys
Jul 1 13:51:35 MLTS-NKNAS01 ses0: phy 0: SAS device type 1 id 0
Jul 1 13:51:35 MLTS-NKNAS01 ses0: phy 0: protocols: Initiator( None ) Target( SSP )
Jul 1 13:51:35 MLTS-NKNAS01 ses0: phy 0: parent 500c04f253bfe93f addr 50000393f830717e
Jul 1 13:51:36 MLTS-NKNAS01 ses0: phy 1: SAS device type 1 id 1
Jul 1 13:51:36 MLTS-NKNAS01 ses0: phy 1: protocols: Initiator( None ) Target( SSP )
Jul 1 13:51:36 MLTS-NKNAS01 ses0: phy 1: parent 500c04f253bfe9bf addr 50000393f830717f
Jul 1 13:51:39 MLTS-NKNAS01 da12 at mps0 bus 0 scbus0 target 20 lun 0
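
The device names also shift across the detach and re-attach in this log, so a given daN is not the same drive before and after. A rough Python sketch that pairs old and new names by serial number, using a few lines copied from the log above (the `renumbering` helper is hypothetical, written only for this illustration):

```python
import re

# A few lines copied from the log above: three detaches and the matching
# re-attach "Serial Number" lines for the same drives.
SAMPLE = """\
Jul 1 13:51:24 MLTS-NKNAS01 da11: <ATA KINGSTON SV300S3 BBF0> s/n 50026B775504339C detached
Jul 1 13:51:24 MLTS-NKNAS01 da12: <ATA KINGSTON SV300S3 BBF0> s/n 50026B775504374A detached
Jul 1 13:51:24 MLTS-NKNAS01 da5: <SEAGATE ST9300605SS 0004> s/n 6XP5EDNV0000B3369V2A detached
Jul 1 13:51:28 MLTS-NKNAS01 da3: Serial Number 50026B775504339C
Jul 1 13:51:28 MLTS-NKNAS01 da4: Serial Number 50026B775504374A
Jul 1 13:51:31 MLTS-NKNAS01 da6: Serial Number 6XP5EDNV0000B3369V2A
"""

DETACHED_RE = re.compile(r"\s(da\d+): <[^>]*> s/n (\S+) detached$")
ATTACHED_RE = re.compile(r"\s(da\d+): Serial Number (\S+)$")


def renumbering(text):
    """Map serial number -> (name before detach, name after re-attach)."""
    before, after = {}, {}
    for raw in text.splitlines():
        line = raw.rstrip()
        detached = DETACHED_RE.search(line)
        if detached:
            before[detached.group(2)] = detached.group(1)
            continue
        attached = ATTACHED_RE.search(line)
        if attached:
            after[attached.group(2)] = attached.group(1)
    # Keep only drives seen on both sides of the event.
    return {serial: (before[serial], after[serial])
            for serial in before if serial in after}


if __name__ == "__main__":
    for serial, (old, new) in renumbering(SAMPLE).items():
        print(f"{serial}: {old} -> {new}")
```

In this sample the Kingston SSD that was da11 comes back as da3, so any later message naming a daN device is best matched to a physical drive by serial number.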
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
Jul 1 13:51:27 MLTS-NKNAS01 (probe9:mps0:0:25:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
Did you power cycle something? Right before this the system lost contact with a whole bunch of drives, just as if they were hot-swapped out.
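
That reading can be checked against the log by grouping the "detached" lines by timestamp. A small Python sketch, assuming the syslog line layout shown in the post above (the `detaches_by_second` helper is made up for this illustration, and the sample is a handful of lines from that log):

```python
import re
from collections import defaultdict

# A few lines copied from the log in the previous post.
SAMPLE = """\
Jul 1 13:51:24 MLTS-NKNAS01 da5: <SEAGATE ST9300605SS 0004> s/n 6XP5EDNV0000B3369V2A detached
Jul 1 13:51:24 MLTS-NKNAS01 da4: <TOSHIBA MBF2300RC 5704> s/n EB10PC4040PT detached
Jul 1 13:51:24 MLTS-NKNAS01 ses0: <DELL MD1220 1.01> detached
Jul 1 13:51:24 MLTS-NKNAS01 da12: <ATA KINGSTON SV300S3 BBF0> s/n 50026B775504374A detached
Jul 1 13:51:24 MLTS-NKNAS01 (da12:mps0:0:31:0): Periph destroyed
"""

# timestamp, hostname, device name, <model>, optional serial, "detached"
DETACH_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+"
    r"(?P<dev>[a-z]+\d+): <(?P<model>[^>]*)>.* detached$"
)


def detaches_by_second(text):
    """Return {timestamp: [(device, model), ...]} for every detach line."""
    groups = defaultdict(list)
    for line in text.splitlines():
        match = DETACH_RE.match(line.strip())
        if match:
            groups[match.group("ts")].append(
                (match.group("dev"), match.group("model"))
            )
    return dict(groups)


if __name__ == "__main__":
    for timestamp, devices in detaches_by_second(SAMPLE).items():
        print(timestamp, len(devices), [dev for dev, _ in devices])
```

Every detach in this sample shares the 13:51:24 timestamp, including the ses0 enclosure device itself, which fits a reset of the whole shelf better than a set of individual drive faults.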
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
What is the output of "sas2flash -listall"?
 

mlinton

Dabbler
Joined
Jun 20, 2013
Messages
24
Robert, Thank you for the suggestion. Both the server and shelf have dual power supplies. Each side goes to a separate APC 1500 battery backup unit. Even if one failed a self test or something, the other should have picked up. The server also remained online and no work was being done around that cabinet.

Cyberjock, as requested...

[root@MLTS-NKNAS01] ~# sas2flash -listall
LSI Corporation SAS2 Flash Utility
Version 16.00.00.00 (2013.03.01)
Copyright (c) 2008-2013 LSI Corporation. All rights reserved

Adapter Selected is a LSI SAS: SAS2008(B2)

Num   Ctlr          FW Ver        NVDATA        x86-BIOS      PCI Addr
----------------------------------------------------------------------------

0     SAS2008(B2)   16.00.00.00   10.00.00.06   07.29.00.00   00:09:00:00

Finished Processing Commands Successfully.
Exiting SAS2Flash.
[root@MLTS-NKNAS01] ~#


Thank you both for your help!
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
The server also remained online and no work was being done around that cabinet.
Well something caused that device to reset, temporarily knocking out a whole bunch of drives. They all came back a few seconds later.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
The sas2flash output was as expected, so that doesn't give any clue as to what happened.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Try cleaning the PCI-e slot and card. Can't hurt, so might as well start there.
 

mlinton

Dabbler
Joined
Jun 20, 2013
Messages
24
Well... this really sucks....

So, there I was...

I'm working on the cabinet adjacent to the one with the FreeNAS box (finishing up some work on a firewall), when all of a sudden I hear a "BZZZZZZZZ" -- then a bunch of fans go "wwwwwWWWWWWWEEEERRRRRRRR"... (you know where this is going)...

Then my phone blows up with system down messages.

So, I take a VERY close look at everything and as expected it all looks ok... except for the fact that the zpool just went offline. Turns out, this disk shelf is the ONLY piece of Dell equipment that we have, and unfortunately that lowers our familiarity with some of it. As it turns out, just because the Dell power supplies show a green LED, it does not mean that they are, how should I put this? TURNED ON!!

One of the power supplies had TWO GREEN LEDs and it was plugged into a FAILED UPS. The other one, with ONE GREEN LED, was plugged into a GOOD UPS. Add all that together and you get a shelf that goes offline every 2-3 days when the one UPS fails its self-test. Because it was a backup unit, it was not being monitored... now it is.

All that to say... Y'all are great, I feel foolish, and everything has been working like a champ since figuring this out...

God bless pfSense, building that firewall fixed my FreeNAS problem.

Good job Robert, you were right!

Thanks again!
 