Oracle 7320 ZFS Appliance and J4410 Diskshelf problem with PMC Sierra 8001

Status
Not open for further replies.

MrZ (Cadet, joined May 4, 2018, 4 messages)

Hi

So this is my first post on the FreeNAS forums (sorry, it's a long one). I have some trouble with old hardware that I want to run FreeNAS on, but I've hit a brick wall :/ I have worked as a sysadmin with Unix/Linux for a long time, and one of the reasons I opted for FreeNAS was that I have never used FreeBSD, so there are a lot of new commands and stuff to learn :) This was my way of easing into it...

So, about my problem(s):
I have errors with multipathing and CAM status.
All of a sudden one path fails and never recovers, and to top it off the kernel panics and the system takes a dive... I have multiple boxes like this, so I switched around cables, HBAs, disk shelves and so on, but I can't seem to get a stable system out of my hardware :/

Errors like these pop in and out:
Code:
May  6 09:06:50 freenas (da20:pmspcbsd0:0:20:0): CAM status: SCSI Status Error
May  6 09:06:50 freenas (da20:pmspcbsd0:0:20:0): SCSI status: Check Condition
May  6 09:06:50 freenas (da20:pmspcbsd0:0:20:0): SCSI sense: ABORTED COMMAND asc:4b,5 (Data offset error)

May  6 23:39:33 freenas.local GEOM_MULTIPATH: Error 5, da27 in disk8 marked FAIL
May  6 23:39:33 freenas.local GEOM_MULTIPATH: da5 is now active path in disk8
May  6 23:42:41 freenas.local swap_pager: indefinite wait buffer: bufobj: 0, blkno: 15184, size: 20480
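
To see whether the CAM errors cluster on particular devices, the log lines can be tallied per device and status. This is only a minimal sketch: the function name `tally_cam_status` is made up for illustration, and the regular expression assumes the `(daN:controller:bus:target:lun): CAM status: ...` layout seen in the excerpt above.

```python
import re
from collections import Counter

# Matches CAM lines such as:
#   (da20:pmspcbsd0:0:20:0): CAM status: SCSI Status Error
CAM_STATUS = re.compile(r"\((\w+):(\w+):\d+:\d+:\d+\): CAM status: (.+)$")

def tally_cam_status(lines):
    """Count CAM status messages per (device, status) pair."""
    counts = Counter()
    for line in lines:
        m = CAM_STATUS.search(line.rstrip())
        if m:
            device, _controller, status = m.groups()
            counts[(device, status)] += 1
    return counts

# Two lines copied from the log excerpt; only the first is a CAM status line.
sample = [
    "May  6 09:06:50 freenas (da20:pmspcbsd0:0:20:0): CAM status: SCSI Status Error",
    "May  6 09:06:50 freenas (da20:pmspcbsd0:0:20:0): SCSI status: Check Condition",
]
for (device, status), n in tally_cam_status(sample).items():
    print(device, status, n)
```

On a live system the same function could be fed an open log file, for example `tally_cam_status(open("/var/log/messages"))`, assuming that is where the messages end up.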


It's stated that FreeNAS should have support for PMC Sierra 8001 cards, and I see the card pop up in dmesg:
pcib11: <ACPI PCI-PCI bridge> at device 9.0 on pci0
pci11: <ACPI PCI bus> on pcib11
pmspcv0: <PMC Sierra SPC SAS-SATA Card> mem 0xdf6f0000-0xdf6fffff,0xdf6e0000-0xdf6effff,0xdf6d0000-0xdf6dffff,0xdf6c0000-0xdf6cffff irq 32 at device 0.0 on pci11
pci0: <base peripheral, interrupt controller> at device 20.0 (no driver attached)
(noperiph:pmspcbsd0:0:-1:ffffffff): rescan already queued
(noperiph:pmspcbsd0:0:-1:ffffffff): rescan already queued
pci0: <base peripheral, interrupt controller> at device 20.1 (no driver attached)
pci0: <base peripheral, interrupt controller> at device 20.2 (no driver attached)
pci0: <base peripheral, interrupt controller> at device 20.3 (no driver attached)

Does the pmspcv0: <PMC.... line mean I have support for the card? I'm guessing so.

But when I look for the module with kldstat, it is not found:
kldstat -n pmspcv.ko
kldstat: can't find file pmspcv.ko: No such file or directory

Code:
# kldstat -d
Id Refs Address			Size	 Name
 1   72 0xffffffff80200000 20a0000  kernel
 2	1 0xffffffff82631000 ffe3c	ispfw.ko
 3	1 0xffffffff82731000 7f2a	 freenas_sysctl.ko
 4	1 0xffffffff82811000 84a6	 ipmi.ko
 5	1 0xffffffff8281a000 ef2	  smbus.ko
 6	1 0xffffffff8281b000 333885   vmm.ko
 7	1 0xffffffff82b4f000 3108	 nmdm.ko
 8	1 0xffffffff82b53000 101c6	geom_mirror.ko
 9	1 0xffffffff82b64000 46c3	 geom_stripe.ko
10	1 0xffffffff82b69000 fbad	 geom_raid3.ko
11	1 0xffffffff82b79000 16e56	geom_raid5.ko
12	1 0xffffffff82b90000 59c9	 geom_gate.ko
13	1 0xffffffff82b96000 4d68	 geom_multipath.ko
14	1 0xffffffff82b9b000 837	  dtraceall.ko
15	9 0xffffffff82b9c000 41e31	dtrace.ko
16	1 0xffffffff82bde000 48f4	 dtmalloc.ko
17	1 0xffffffff82be3000 5b4e	 dtnfscl.ko
18	1 0xffffffff82be9000 67f3	 fbt.ko
19	1 0xffffffff82bf0000 58e8a	fasttrap.ko
20	1 0xffffffff82c49000 1741	 sdt.ko
21	1 0xffffffff82c4b000 bf02	 systrace.ko
22	1 0xffffffff82c57000 c082	 systrace_freebsd32.ko
23	1 0xffffffff82c64000 5452	 profile.ko
24	1 0xffffffff82c6a000 1bbc9	hwpmc.ko
25	1 0xffffffff82c86000 d006	 t3_tom.ko
26	2 0xffffffff82c94000 4626	 toecore.ko
27	1 0xffffffff82c99000 15e3a	t4_tom.ko


I found a page hinting that it should be loaded as a module via loader.conf:
Code:
vi /boot/loader.conf
pmspcv_load="YES"

Still the same output from kldstat -n pmspcv.ko:
kldstat: can't find file pmspcv.ko: No such file or directory

To load a module, doesn't it need to be present under /boot/kernel/ or /boot/modules?
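
A driver that attaches in dmesg but has no .ko entry of its own in kldstat is normally compiled into the kernel image itself (the "kernel" entry, Id 1). In that case there is no pmspcv.ko file to find, and kldstat -v, which lists the modules contained in each loaded file, is the place to look. As a rough sketch of the dmesg side of that check, the helper below (the name `attached_devices` is hypothetical) collects the drivers that actually attached to a device:

```python
import re

# A device attach line looks like "pmspcv0: <PMC Sierra SPC SAS-SATA Card> ..."
# Group 1 is the driver name, group 2 the unit number, group 3 the description.
ATTACH = re.compile(r"^([a-z][a-z0-9_]*?)(\d+): <([^>]*)>")

def attached_devices(dmesg_text):
    """Map driver name -> list of device descriptions seen in dmesg."""
    found = {}
    for line in dmesg_text.splitlines():
        # The parent bus reports unclaimed devices; those are not attachments.
        if "(no driver attached)" in line:
            continue
        m = ATTACH.match(line)
        if m:
            found.setdefault(m.group(1), []).append(m.group(3))
    return found

# Lines copied from the dmesg excerpt above.
sample_dmesg = "\n".join([
    "pci11: <ACPI PCI bus> on pcib11",
    "pmspcv0: <PMC Sierra SPC SAS-SATA Card> mem 0xdf6f0000-0xdf6fffff,"
    "0xdf6e0000-0xdf6effff,0xdf6d0000-0xdf6dffff,0xdf6c0000-0xdf6cffff"
    " irq 32 at device 0.0 on pci11",
    "pci0: <base peripheral, interrupt controller> at device 20.0 (no driver attached)",
])
print("pmspcv" in attached_devices(sample_dmesg))
```

If the driver name shows up here, the card has a driver bound to it regardless of what kldstat -n reports for the .ko file name.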

Disk/HW info
Code:
# camcontrol devlist
<SEAGATE ST32000SSSUN2.0T 061A>	at scbus0 target 0 lun 0 (da0,pass2)
<SEAGATE ST32000SSSUN2.0T 061A>	at scbus0 target 1 lun 0 (da1,pass3)
<SEAGATE ST32000SSSUN2.0T 061A>	at scbus0 target 2 lun 0 (da2,pass4)
<SEAGATE ST32000SSSUN2.0T 061A>	at scbus0 target 3 lun 0 (da3,pass5)
<SEAGATE ST32000SSSUN2.0T 061A>	at scbus0 target 4 lun 0 (da4,pass6)
<SEAGATE ST32000SSSUN2.0T 061A>	at scbus0 target 5 lun 0 (da5,pass7)
<SEAGATE ST32000SSSUN2.0T 061A>	at scbus0 target 6 lun 0 (da6,pass8)
<SEAGATE ST32000SSSUN2.0T 061A>	at scbus0 target 7 lun 0 (da7,pass9)
<SEAGATE ST32000SSSUN2.0T 061A>	at scbus0 target 8 lun 0 (da8,pass10)
<SEAGATE ST32000SSSUN2.0T 061A>	at scbus0 target 9 lun 0 (da9,pass11)
<SEAGATE ST32000SSSUN2.0T 061A>	at scbus0 target 10 lun 0 (da10,pass12)
<SEAGATE ST32000SSSUN2.0T 061A>	at scbus0 target 11 lun 0 (da11,pass13)
<SEAGATE ST32000SSSUN2.0T 061A>	at scbus0 target 12 lun 0 (da12,pass14)
<SEAGATE ST32000SSSUN2.0T 061A>	at scbus0 target 13 lun 0 (da13,pass15)
<SEAGATE ST32000SSSUN2.0T 061A>	at scbus0 target 14 lun 0 (da14,pass16)
<SEAGATE ST32000SSSUN2.0T 061A>	at scbus0 target 15 lun 0 (da15,pass17)
<SEAGATE ST32000SSSUN2.0T 061A>	at scbus0 target 16 lun 0 (da16,pass18)
<SEAGATE ST32000SSSUN2.0T 061A>	at scbus0 target 17 lun 0 (da17,pass19)
<SEAGATE ST32000SSSUN2.0T 061A>	at scbus0 target 18 lun 0 (da18,pass20)
<SEAGATE ST32000SSSUN2.0T 061A>	at scbus0 target 19 lun 0 (da19,pass21)
<STEC ZeusIOPs G3 E12B>			at scbus0 target 20 lun 0 (da20,pass22)
<STEC ZeusIOPs G3 E12B>			at scbus0 target 21 lun 0 (da21,pass23)
<SUN Storage J4410 3529>		   at scbus0 target 22 lun 0 (pass24,ses0)
<SEAGATE ST32000SSSUN2.0T 061A>	at scbus0 target 23 lun 0 (da22,pass25)
<SEAGATE ST32000SSSUN2.0T 061A>	at scbus0 target 24 lun 0 (da23,pass26)
<SEAGATE ST32000SSSUN2.0T 061A>	at scbus0 target 25 lun 0 (da24,pass27)
<SEAGATE ST32000SSSUN2.0T 061A>	at scbus0 target 26 lun 0 (da25,pass28)
<SEAGATE ST32000SSSUN2.0T 061A>	at scbus0 target 27 lun 0 (da26,pass29)
<SEAGATE ST32000SSSUN2.0T 061A>	at scbus0 target 28 lun 0 (da27,pass30)
<SEAGATE ST32000SSSUN2.0T 061A>	at scbus0 target 29 lun 0 (da28,pass31)
<SEAGATE ST32000SSSUN2.0T 061A>	at scbus0 target 30 lun 0 (da29,pass32)
<SEAGATE ST32000SSSUN2.0T 061A>	at scbus0 target 31 lun 0 (da30,pass33)
<SEAGATE ST32000SSSUN2.0T 061A>	at scbus0 target 32 lun 0 (da31,pass34)
<SEAGATE ST32000SSSUN2.0T 061A>	at scbus0 target 33 lun 0 (da32,pass35)
<SEAGATE ST32000SSSUN2.0T 061A>	at scbus0 target 34 lun 0 (da33,pass36)
<SEAGATE ST32000SSSUN2.0T 061A>	at scbus0 target 35 lun 0 (da34,pass37)
<SEAGATE ST32000SSSUN2.0T 061A>	at scbus0 target 36 lun 0 (da35,pass38)
<SEAGATE ST32000SSSUN2.0T 061A>	at scbus0 target 37 lun 0 (da36,pass39)
<SEAGATE ST32000SSSUN2.0T 061A>	at scbus0 target 38 lun 0 (da37,pass40)
<SEAGATE ST32000SSSUN2.0T 061A>	at scbus0 target 39 lun 0 (da38,pass41)
<SEAGATE ST32000SSSUN2.0T 061A>	at scbus0 target 40 lun 0 (da39,pass42)
<SEAGATE ST32000SSSUN2.0T 061A>	at scbus0 target 41 lun 0 (da40,pass43)
<SEAGATE ST32000SSSUN2.0T 061A>	at scbus0 target 42 lun 0 (da41,pass44)
<STEC ZeusIOPs G3 E12B>			at scbus0 target 43 lun 0 (da42,pass45)
<STEC ZeusIOPs G3 E12B>			at scbus0 target 44 lun 0 (da43,pass46)
<SUN Storage J4410 3529>		   at scbus0 target 45 lun 0 (pass47,ses1)
<SEAGATE ST95000NSSUN500G 1109M2L0ZG SF04>  at scbus1 target 0 lun 0 (pass0,ada0)
<SEAGATE ST95000NSSUN500G 1108M2KAAX SF04>  at scbus2 target 0 lun 0 (pass1,ada1)


# zpool status
  pool: freenas-boot
 state: ONLINE
  scan: none requested
config:

		NAME		STATE	 READ WRITE CKSUM
		freenas-boot  ONLINE	   0	 0	 0
		  ada0p2	ONLINE	   0	 0	 0

errors: No known data errors

  pool: master
 state: ONLINE
  scan: none requested
config:

		NAME											STATE	 READ WRITE CKSUM
		master										  ONLINE	   0	 0	 0
		  raidz2-0									  ONLINE	   0	 0	 0
			gptid/27696752-5146-11e8-a0b0-002128c0ab6e  ONLINE	   0	 0	 0
			gptid/282c244a-5146-11e8-a0b0-002128c0ab6e  ONLINE	   0	 0	 0
			gptid/28f1c382-5146-11e8-a0b0-002128c0ab6e  ONLINE	   0	 0	 0
			gptid/29c004de-5146-11e8-a0b0-002128c0ab6e  ONLINE	   0	 0	 0
			gptid/2a92f968-5146-11e8-a0b0-002128c0ab6e  ONLINE	   0	 0	 0
			gptid/2b5e73f2-5146-11e8-a0b0-002128c0ab6e  ONLINE	   0	 0	 0
			gptid/2c3d6af3-5146-11e8-a0b0-002128c0ab6e  ONLINE	   0	 0	 0
			gptid/2d0d63c3-5146-11e8-a0b0-002128c0ab6e  ONLINE	   0	 0	 0
			gptid/2dd6801f-5146-11e8-a0b0-002128c0ab6e  ONLINE	   0	 0	 0
			gptid/2eac8189-5146-11e8-a0b0-002128c0ab6e  ONLINE	   0	 0	 0
			gptid/2f84c5a5-5146-11e8-a0b0-002128c0ab6e  ONLINE	   0	 0	 0
			gptid/30568a9b-5146-11e8-a0b0-002128c0ab6e  ONLINE	   0	 0	 0
			gptid/31420621-5146-11e8-a0b0-002128c0ab6e  ONLINE	   0	 0	 0
			gptid/322ad8b6-5146-11e8-a0b0-002128c0ab6e  ONLINE	   0	 0	 0
			gptid/330e489b-5146-11e8-a0b0-002128c0ab6e  ONLINE	   0	 0	 0
			gptid/33f0ec7e-5146-11e8-a0b0-002128c0ab6e  ONLINE	   0	 0	 0
			gptid/34ca6c34-5146-11e8-a0b0-002128c0ab6e  ONLINE	   0	 0	 0
			gptid/3598abcc-5146-11e8-a0b0-002128c0ab6e  ONLINE	   0	 0	 0
			gptid/366c7863-5146-11e8-a0b0-002128c0ab6e  ONLINE	   0	 0	 0
			gptid/374313e3-5146-11e8-a0b0-002128c0ab6e  ONLINE	   0	 0	 0
		logs
		  gptid/38602bc5-5146-11e8-a0b0-002128c0ab6e	ONLINE	   0	 0	 0
		cache
		  gptid/38e94a9a-5146-11e8-a0b0-002128c0ab6e	ONLINE	   0	 0	 0

errors: No known data errors

# glabel status
									  Name  Status  Components
gptid/971fb895-4f6c-11e8-88d1-002128c0ab6e	 N/A  ada0p1
gptid/27696752-5146-11e8-a0b0-002128c0ab6e	 N/A  multipath/disk3p2
gptid/282c244a-5146-11e8-a0b0-002128c0ab6e	 N/A  multipath/disk4p2
gptid/28f1c382-5146-11e8-a0b0-002128c0ab6e	 N/A  multipath/disk5p2
gptid/29c004de-5146-11e8-a0b0-002128c0ab6e	 N/A  multipath/disk6p2
gptid/2a92f968-5146-11e8-a0b0-002128c0ab6e	 N/A  multipath/disk7p2
gptid/2b5e73f2-5146-11e8-a0b0-002128c0ab6e	 N/A  multipath/disk8p2
gptid/2c3d6af3-5146-11e8-a0b0-002128c0ab6e	 N/A  multipath/disk9p2
gptid/2d0d63c3-5146-11e8-a0b0-002128c0ab6e	 N/A  multipath/disk10p2
gptid/2dd6801f-5146-11e8-a0b0-002128c0ab6e	 N/A  multipath/disk11p2
gptid/2eac8189-5146-11e8-a0b0-002128c0ab6e	 N/A  multipath/disk12p2
gptid/2f84c5a5-5146-11e8-a0b0-002128c0ab6e	 N/A  multipath/disk13p2
gptid/30568a9b-5146-11e8-a0b0-002128c0ab6e	 N/A  multipath/disk14p2
gptid/31420621-5146-11e8-a0b0-002128c0ab6e	 N/A  multipath/disk15p2
gptid/322ad8b6-5146-11e8-a0b0-002128c0ab6e	 N/A  multipath/disk16p2
gptid/330e489b-5146-11e8-a0b0-002128c0ab6e	 N/A  multipath/disk17p2
gptid/33f0ec7e-5146-11e8-a0b0-002128c0ab6e	 N/A  multipath/disk18p2
gptid/34ca6c34-5146-11e8-a0b0-002128c0ab6e	 N/A  multipath/disk19p2
gptid/3598abcc-5146-11e8-a0b0-002128c0ab6e	 N/A  multipath/disk20p2
gptid/366c7863-5146-11e8-a0b0-002128c0ab6e	 N/A  multipath/disk21p2
gptid/374313e3-5146-11e8-a0b0-002128c0ab6e	 N/A  multipath/disk22p2
gptid/38602bc5-5146-11e8-a0b0-002128c0ab6e	 N/A  multipath/disk1p1
gptid/38e94a9a-5146-11e8-a0b0-002128c0ab6e	 N/A  multipath/disk2p1
gptid/2e943c2a-5146-11e8-a0b0-002128c0ab6e	 N/A  multipath/disk12p1
gptid/2dc39b10-5146-11e8-a0b0-002128c0ab6e	 N/A  multipath/disk11p1
gptid/2cfc418c-5146-11e8-a0b0-002128c0ab6e	 N/A  multipath/disk10p1
gptid/2c25fd54-5146-11e8-a0b0-002128c0ab6e	 N/A  multipath/disk9p1
gptid/2b486738-5146-11e8-a0b0-002128c0ab6e	 N/A  multipath/disk8p1
gptid/2a836521-5146-11e8-a0b0-002128c0ab6e	 N/A  multipath/disk7p1
gptid/29a5beba-5146-11e8-a0b0-002128c0ab6e	 N/A  multipath/disk6p1
gptid/28e10dcf-5146-11e8-a0b0-002128c0ab6e	 N/A  multipath/disk5p1
gptid/281dfa3d-5146-11e8-a0b0-002128c0ab6e	 N/A  multipath/disk4p1
gptid/275b601f-5146-11e8-a0b0-002128c0ab6e	 N/A  multipath/disk3p1
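
Since GEOM_MULTIPATH reports failures by multipath name (disk8 in the log above) while zpool status shows gptid labels, the glabel output is what ties the two together. A small sketch of that lookup, assuming the three-column layout shown above; `labels_on_disk` is a made-up helper name:

```python
def labels_on_disk(glabel_output, disk):
    """Return {partition: gptid label} for every partition of a multipath disk."""
    # The trailing "p" keeps "disk1" from also matching "disk11p2".
    prefix = f"multipath/{disk}p"
    result = {}
    for line in glabel_output.splitlines():
        fields = line.split()
        # Data rows have three columns: Name, Status, Components.
        if len(fields) == 3 and fields[2].startswith(prefix):
            result[fields[2]] = fields[0]
    return result

# Rows copied from the glabel status output above.
sample_glabel = """\
gptid/2b5e73f2-5146-11e8-a0b0-002128c0ab6e	 N/A  multipath/disk8p2
gptid/2dd6801f-5146-11e8-a0b0-002128c0ab6e	 N/A  multipath/disk11p2
gptid/2b486738-5146-11e8-a0b0-002128c0ab6e	 N/A  multipath/disk8p1
"""
for part, label in labels_on_disk(sample_glabel, "disk8").items():
    print(part, label)
```

For disk8 this points at gptid/2b5e73f2-5146-11e8-a0b0-002128c0ab6e, which is one of the raidz2-0 members in the zpool status listing.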



P.S. Sorry for the long post :/

dlavigne

Guest
Were you able to resolve this? If not, it might be worth creating a report at bugs.freenas.org. If you do, post the issue number here.

MrZ (Cadet, joined May 4, 2018, 4 messages)

So far I have run tests on both 7320 servers and also tested 2 different shelves, but I get the same kind of errors.

I removed the J4410 disk shelf and added a Supermicro 45-disk shelf instead: no multipathing, but the same CAM errors and drives getting disabled all of a sudden.

I also moved the SAS cables around and replaced them as well; same errors.

Today I got SAS cables from Oracle (530-3883-01), the ones that are specified in the ZFS appliance documentation, and I will try those tonight (until now I have used standard 8088 cables from similar systems like HP/NetApp).

Other OS
I tested CentOS 7.5 and Xubuntu 18.04, but I get the same kind of errors when using the 7320 and the Supermicro shelf.

OEL 7.5
I tested Oracle Linux and this seems to work: I don't get any CAM errors, and multipathing and so on works (no benchmarking done yet).

https://docs.oracle.com/cd/E93554_01/E95779/E95779.pdf
pm80xx 0.1.38 PMC-Sierra PM8001/8006/8081/8088/8089/8074/8076/8077/8070/8072 SAS/ SATA controller driver

EDIT
I added the Oracle SAS cables; it worked for around 2 hours, then:

Code:

May 14 18:43:29 freenas2.skynet.local kernel: arp: 192.168.66.11 moved from 02:36:90:00:0c:0a to 00:21:28:c0:ab:6e on epair0b
May 14 19:58:35 freenas2.skynet.local kernel: arp: 192.168.66.11 moved from 02:36:90:00:0c:0a to 00:21:28:c0:ab:6e on epair0b
May 14 20:50:09 freenas2.skynet.local syslog-ng[2014]: syslog-ng shutting down; version='3.7.3'
May 14 20:50:10 freenas2.skynet.local syslog-ng[23888]: syslog-ng starting up; version='3.7.3'
May 14 20:51:52 freenas2.skynet.local (da30:pmspcbsd0:0:31:0): WRITE(10). CDB: 2a 00 58 40 33 50 00 00 08 00
May 14 20:51:52 freenas2.skynet.local (da30:pmspcbsd0:0:31:0): CAM status: CCB request aborted by the host
May 14 20:51:52 freenas2.skynet.local (da30:pmspcbsd0:0:31:0): Retrying command
May 14 20:51:52 freenas2.skynet.local (da30:pmspcbsd0:0:31:0): WRITE(10). CDB: 2a 00 58 40 33 50 00 00 08 00
May 14 20:51:52 freenas2.skynet.local (da30:pmspcbsd0:0:31:0): CAM status: CCB request aborted by the host
May 14 20:51:52 freenas2.skynet.local (da30:pmspcbsd0:0:31:0): Retrying command
May 14 20:51:52 freenas2.skynet.local (da30:pmspcbsd0:0:31:0): WRITE(10). CDB: 2a 00 58 40 33 50 00 00 08 00
May 14 20:51:52 freenas2.skynet.local (da30:pmspcbsd0:0:31:0): CAM status: CCB request aborted by the host
May 14 20:51:52 freenas2.skynet.local (da30:pmspcbsd0:0:31:0): Retrying command
May 14 20:51:52 freenas2.skynet.local (da30:pmspcbsd0:0:31:0): WRITE(10). CDB: 2a 00 58 40 33 50 00 00 08 00
May 14 20:51:52 freenas2.skynet.local (da30:pmspcbsd0:0:31:0): CAM status: CCB request aborted by the host
May 14 20:51:52 freenas2.skynet.local (da30:pmspcbsd0:0:31:0): Retrying command
May 14 20:51:52 freenas2.skynet.local (da30:pmspcbsd0:0:31:0): WRITE(10). CDB: 2a 00 58 40 33 50 00 00 08 00
May 14 20:51:52 freenas2.skynet.local (da30:pmspcbsd0:0:31:0): CAM status: CCB request aborted by the host
May 14 20:51:52 freenas2.skynet.local (da30:pmspcbsd0:0:31:0): Error 5, Retries exhausted
May 14 20:51:52 freenas2.skynet.local GEOM_MULTIPATH: Error 5, da30 in disk11 marked FAIL
May 14 20:51:52 freenas2.skynet.local GEOM_MULTIPATH: da8 is now active path in disk11
May 14 20:54:49 freenas2.skynet.local /alert.py: [system.alert:393] Alert module '<samba4.Samba4Alert object at 0x814ccada0>' failed: timed out
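
The failover sequence in this log (a path marked FAIL with error 5, which is EIO, followed by another path becoming active) can be pulled out of the noise with a short parser. This is a sketch only, with a hypothetical function name `failover_events`, and it assumes the two GEOM_MULTIPATH message formats seen above:

```python
import re

FAIL = re.compile(r"GEOM_MULTIPATH: Error (\d+), (\w+) in (\w+) marked FAIL")
ACTIVE = re.compile(r"GEOM_MULTIPATH: (\w+) is now active path in (\w+)")

def failover_events(lines):
    """Return one dict per failed path, with the replacement path if logged."""
    events = []
    for line in lines:
        m = FAIL.search(line)
        if m:
            events.append({"errno": int(m.group(1)), "failed": m.group(2),
                           "disk": m.group(3), "active": None})
            continue
        m = ACTIVE.search(line)
        if m:
            # Attach the new active path to the latest open failure on that disk.
            for ev in reversed(events):
                if ev["disk"] == m.group(2) and ev["active"] is None:
                    ev["active"] = m.group(1)
                    break
    return events

# Two lines copied from the log above.
sample_log = [
    "May 14 20:51:52 freenas2.skynet.local GEOM_MULTIPATH: Error 5, da30 in disk11 marked FAIL",
    "May 14 20:51:52 freenas2.skynet.local GEOM_MULTIPATH: da8 is now active path in disk11",
]
for ev in failover_events(sample_log):
    print(ev["disk"], ev["failed"], "->", ev["active"])
```

A failure that never gets an "active" entry would indicate that no alternate path took over for that disk.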


root@freenas2:~ # gpart list multipath/disk11
Geom name: multipath/disk11
modified: false
state: OK
fwheads: 255
fwsectors: 63
last: 3907029127
first: 40
entries: 152
scheme: GPT
Providers:
1. Name: multipath/disk11p1
   Mediasize: 2147483648 (2.0G)
   Sectorsize: 512
   Stripesize: 0
   Stripeoffset: 65536
   Mode: r1w1e1
   rawuuid: 2dc39b10-5146-11e8-a0b0-002128c0ab6e
   rawtype: 516e7cb5-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 2147483648
   offset: 65536
   type: freebsd-swap
   index: 1
   end: 4194431
   start: 128
2. Name: multipath/disk11p2
   Mediasize: 1998251360256 (1.8T)
   Sectorsize: 512
   Stripesize: 0
   Stripeoffset: 2147549184
   Mode: r1w1e2
   rawuuid: 2dd6801f-5146-11e8-a0b0-002128c0ab6e
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b
   label: (null)
   length: 1998251360256
   offset: 2147549184
   type: freebsd-zfs
   index: 2
   end: 3907029119
   start: 4194432
Consumers:
1. Name: multipath/disk11
   Mediasize: 2000398933504 (1.8T)
   Sectorsize: 512
   Mode: r2w2e5

The following command freezes:
smartctl -a -d scsi -T permissive /dev/da11

This one also freezes, partway through the seek test:
root@freenas2:~ # diskinfo -t /dev/multipath/disk11
/dev/multipath/disk11
		512			 # sectorsize
		2000398933504   # mediasize in bytes (1.8T)
		3907029167	  # mediasize in sectors
		0			   # stripesize
		0			   # stripeoffset
		243201		  # Cylinders according to firmware.
		255			 # Heads according to firmware.
		63			  # Sectors according to firmware.
		SEAGATE ST32000SSSUN2.0T		# Disk descr.
		001111L49N4H		9WM49N4H	# Disk ident.
		id1,enc@n500163600050ea3d/type@0/slot@9/elmdesc@DISK_08 # Physical path
		Not_Zoned	   # Zone Mode

Seek times:
		Full stroke: freeze...

After this, it's a kernel panic and a reboot...