X10SL7-F scrub causes LSI 2308 to reset and removes drives from zpool

Status
Not open for further replies.

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
They are on the same power cable from the PSU. I'm not using a SAS expander or SATA breakout cable; all drives are plugged in with separate SATA cables. I'm under the impression that the SAS ports are also compatible with SATA cables, let me know if not.
Okay, well, I will try manually making a zpool with a 250 GB WD and a 320 GB WD drive and test that setup to try to determine whether it's the drives or the controller, and then I will report back!

The cables are the same, yes.
 

nas2160

Dabbler
Joined
Feb 1, 2015
Messages
32
Sorry about the delay in getting this testing done! Been busy over the weekend.

Here are the things I have tried so far:
  1. A zpool mirror made up of a 320 GB 2.5” WD and a 320 GB 2.5” Seagate connected via the LSI 2308 ports. I copied 20 GB of data to it and ran a scrub, which completed fine. However, I decided that two old 2.5” drives might not generate enough bandwidth to trigger the error, if the error is bandwidth-related.
  2. I borrowed 4 x 1 TB 3.5” drives and set up a RAIDZ2 zpool consisting of two Samsungs, one WD and one Seagate. I copied several hundred GB to it and ran scrubs and concurrent copies. No error.
  3. A friend of mine has identical hardware, so I took my two boot USB drives and my 4 x 4 TB drives to his place, swapped my disks into his box and ran a scrub. It completed fine on his LSI 2308 with v16 firmware.
  4. Back in my own box, my 4 x 4 TB drives again hit the same error as before, 2% into a scrub of the 1.24 TB of data.
So I’m not sure what else I should test. I haven’t given the new 4 x 1 TB disks a hard workload as I’m not sure how to go about it. Any tips?

What would it be about my disks that would cause the 2308 controller to reset?

All my drives passed the manufacturer software tests in Windows when I tested them last week, so I don't think I would be able to RMA any of my drives.

For my testing so far, I have simply copied 100-200 GB of data over a 1 Gbit/s network to my FreeNAS box and run a concurrent scrub.
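To push the disks harder than a network copy can (a 1 Gbit/s link tops out at roughly 110-120 MB/s), one option is to generate the write load on the FreeNAS box itself while the scrub runs. The following is a minimal sketch, not a tested burn-in tool. It assumes a Python 3 interpreter is available on the box and that ZFS dedup is off on the target dataset, because it reuses a single random buffer. It writes random rather than zero data so that ZFS compression cannot shrink the writes. The function name `write_incompressible` and any path you pass it are hypothetical.

```python
import os
import time

MIB = 1 << 20


def write_incompressible(path: str, total_bytes: int, chunk_bytes: int = MIB) -> tuple[int, float]:
    """Write total_bytes of random data to path and return (bytes_written, seconds).

    Random data cannot be compressed away, unlike data from /dev/zero.
    A single random buffer is reused, which assumes dedup is off on the dataset.
    """
    chunk = memoryview(os.urandom(chunk_bytes))
    written = 0
    start = time.monotonic()
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(chunk_bytes, total_bytes - written)
            f.write(chunk[:n])
            written += n
        f.flush()
        os.fsync(f.fileno())  # make sure the data has actually been handed to the pool
    return written, time.monotonic() - start
```

For example, `write_incompressible("/mnt/Test/loadfile", 200 * 1024 * MIB)` (hypothetical path on the test pool) writes about 200 GiB while a `zpool scrub` is running. Several copies can be run in parallel against different file names to raise the load further.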

I don't know how to use dd that well but I have run the following commands:

dd testing carried out:

Code:
[root@******] /mnt/Test# dd if=/dev/zero of=/mnt/Test/ddfile bs=2048k count=100000
100000+0 records in
100000+0 records out
209715200000 bytes transferred in 87.948327 secs (2384527456 bytes/sec)

[root@*****] /mnt/Test# dd of=/dev/zero if=/mnt/Test/ddfile bs=2048k count=100000
100000+0 records in
100000+0 records out
209715200000 bytes transferred in 19.755141 secs (10615727814 bytes/sec)

[root@*****] /mnt/Test# dd of=/dev/zero if=/mnt/Test/ddfile bs=2048k count=1000000
100000+0 records in
100000+0 records out
209715200000 bytes transferred in 19.802995 secs (10590074903 bytes/sec)

[root@*****] /mnt/Test# dd of=/dev/zero if=/mnt/Test/ddfile bs=2048k count=100000000
100000+0 records in
100000+0 records out
209715200000 bytes transferred in 19.761201 secs (10612472313 bytes/sec)

[root@*****] /mnt/Test# dd if=/dev/zero of=/mnt/Test/ddfile bs=2048k count=800000
800000+0 records in
800000+0 records out
1677721600000 bytes transferred in 695.820471 secs (2411141479 bytes/sec)

[root@*****] /mnt/Test# dd of=/dev/zero if=/mnt/Test/ddfile bs=2048k count=800000
800000+0 records in
800000+0 records out
1677721600000 bytes transferred in 160.763501 secs (10435960823 bytes/sec)
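The dd summary lines above can be turned into MiB/s and checked against what was requested. The sketch below is illustrative only. It shows that the read runs with `count=1000000` and `count=100000000` still report `100000+0 records`, because dd stops at the end of the 200 GB file. The rates themselves, about 2.3 GB/s written and about 10 GB/s read, are far beyond what four spinning disks can sustain. A likely explanation (an assumption, checkable with `zfs get compression` on the dataset) is that compression is enabled. In that case ZFS stores all-zero blocks as holes, so very little data from `/dev/zero` reaches the disks, and these runs say little about the controller under load. (Writing to `/dev/zero` as the output does discard data, but `/dev/null` is the conventional sink.) The names `summarize_dd` and `DD_SUMMARY` are hypothetical.

```python
import re

# FreeBSD dd summary line, as in the transcript above, e.g.
# "209715200000 bytes transferred in 87.948327 secs (2384527456 bytes/sec)"
DD_SUMMARY = re.compile(
    r"(?P<bytes>\d+) bytes transferred in (?P<secs>[\d.]+) secs \((?P<rate>\d+) bytes/sec\)"
)

MIB = 1 << 20


def summarize_dd(summary_line: str, bs_bytes: int, count: int) -> dict:
    """Convert a dd summary line to MiB/s and flag runs that stopped before `count` blocks."""
    m = DD_SUMMARY.search(summary_line)
    if m is None:
        raise ValueError(f"not a dd summary line: {summary_line!r}")
    transferred = int(m["bytes"])
    secs = float(m["secs"])
    return {
        "mib_per_sec": transferred / secs / MIB,
        # dd stops at end of input, so asking for more blocks than the file holds moves fewer bytes
        "stopped_early": transferred < bs_bytes * count,
    }
```

With `bs=2048k` (2 MiB blocks), `summarize_dd(line, 2048 * 1024, 1000000)` for the third run reports `stopped_early` as true, while the first write run does not.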


dmesg:
Code:
Copyright (c) 1992-2014 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
    The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 9.3-RELEASE-p8 #2 r275790+6365d9c: Sat Jan 24 09:16:18 PST 2015
    root@build3.ixsystems.com:/tank/home/jkh/build/93/FN/objs/os-base/amd64/fusion/jkh/93/FN/FreeBSD/src/sys/FREENAS.amd64 amd64
gcc version 4.2.1 20070831 patched [FreeBSD]
CPU: Intel(R) Xeon(R) CPU E3-1231 v3 @ 3.40GHz (3400.07-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0x306c3  Family = 0x6  Model = 0x3c  Stepping = 3
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x7ffafbff<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,<b11>,FMA,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
  AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
  AMD Features2=0x21<LAHF,ABM>
  Standard Extended Features=0x2fbb<GSFSBASE,TSCADJ,SMEP,ENHMOVSB,INVPCID>
  TSC: P-state invariant, performance statistics
real memory  = 17716740096 (16896 MB)
avail memory = 16481837056 (15718 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <SUPERM SMCI--MB>
FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
FreeBSD/SMP: 1 package(s) x 4 core(s) x 2 SMT threads
cpu0 (BSP): APIC ID:  0
cpu1 (AP): APIC ID:  1
cpu2 (AP): APIC ID:  2
cpu3 (AP): APIC ID:  3
cpu4 (AP): APIC ID:  4
cpu5 (AP): APIC ID:  5
cpu6 (AP): APIC ID:  6
cpu7 (AP): APIC ID:  7
WARNING: VIMAGE (virtualized network stack) is a highly experimental feature.
ACPI Warning: FADT (revision 5) is longer than ACPI 5.0 version, truncating length 268 to 256 (20111123/tbfadt-325)
ioapic0 <Version 2.0> irqs 0-23 on motherboard
ispfw: registered firmware <isp_1040>
ispfw: registered firmware <isp_1040_it>
ispfw: registered firmware <isp_1080>
ispfw: registered firmware <isp_1080_it>
ispfw: registered firmware <isp_12160>
ispfw: registered firmware <isp_12160_it>
ispfw: registered firmware <isp_2100>
ispfw: registered firmware <isp_2200>
ispfw: registered firmware <isp_2300>
ispfw: registered firmware <isp_2322>
ispfw: registered firmware <isp_2400>
ispfw: registered firmware <isp_2400_multi>
ispfw: registered firmware <isp_2500>
ispfw: registered firmware <isp_2500_multi>
kbd1 at kbdmux0
cryptosoft0: <software crypto> on motherboard
aesni0: <AES-CBC,AES-XTS> on motherboard
padlock0: No ACE support.
acpi0: <SUPERM SMCI--MB> on motherboard
acpi0: Power Button (fixed)
cpu0: <ACPI CPU> on acpi0
cpu1: <ACPI CPU> on acpi0
cpu2: <ACPI CPU> on acpi0
cpu3: <ACPI CPU> on acpi0
cpu4: <ACPI CPU> on acpi0
cpu5: <ACPI CPU> on acpi0
cpu6: <ACPI CPU> on acpi0
cpu7: <ACPI CPU> on acpi0
hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on acpi0
Timecounter "HPET" frequency 14318180 Hz quality 950
Event timer "HPET" frequency 14318180 Hz quality 550
atrtc0: <AT realtime clock> port 0x70-0x77 irq 8 on acpi0
Event timer "RTC" frequency 32768 Hz quality 0
attimer0: <AT timer> port 0x40-0x43,0x50-0x53 irq 0 on acpi0
Timecounter "i8254" frequency 1193182 Hz quality 0
Event timer "i8254" frequency 1193182 Hz quality 100
Timecounter "ACPI-fast" frequency 3579545 Hz quality 900
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1808-0x180b on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
pcib1: <ACPI PCI-PCI bridge> irq 16 at device 1.0 on pci0
pci1: <ACPI PCI bus> on pcib1
pcib2: <ACPI PCI-PCI bridge> irq 16 at device 1.1 on pci0
pci2: <ACPI PCI bus> on pcib2
mps0: <LSI SAS2308> port 0xe000-0xe0ff mem 0xf7240000-0xf724ffff,0xf7200000-0xf723ffff irq 17 at device 0.0 on pci2
mps0: Firmware: 16.00.01.00, Driver: 16.00.00.00-fbsd
mps0: IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>
pci0: <serial bus, USB> at device 20.0 (no driver attached)
ehci0: <EHCI (generic) USB 2.0 controller> mem 0xf7514000-0xf75143ff irq 16 at device 26.0 on pci0
usbus0: EHCI version 1.0
usbus0 on ehci0
pcib3: <ACPI PCI-PCI bridge> irq 16 at device 28.0 on pci0
pci3: <ACPI PCI bus> on pcib3
pcib4: <ACPI PCI-PCI bridge> at device 0.0 on pci3
pci4: <ACPI PCI bus> on pcib4
vgapci0: <VGA-compatible display> port 0xd000-0xd07f mem 0xf6000000-0xf6ffffff,0xf7000000-0xf701ffff irq 16 at device 0.0 on pci4
vgapci0: Boot video device
pcib5: <ACPI PCI-PCI bridge> irq 18 at device 28.2 on pci0
pci5: <ACPI PCI bus> on pcib5
igb0: <Intel(R) PRO/1000 Network Connection version - 2.4.0> port 0xc000-0xc01f mem 0xf7400000-0xf747ffff,0xf7480000-0xf7483fff irq 18 at device 0.0 on pci5
igb0: Using MSIX interrupts with 5 vectors
igb0: Ethernet address: 0c:c4:7a:30:08:1e
igb0: Bound queue 0 to cpu 0
igb0: Bound queue 1 to cpu 1
igb0: Bound queue 2 to cpu 2
igb0: Bound queue 3 to cpu 3
pcib6: <ACPI PCI-PCI bridge> irq 19 at device 28.3 on pci0
pci6: <ACPI PCI bus> on pcib6
igb1: <Intel(R) PRO/1000 Network Connection version - 2.4.0> port 0xb000-0xb01f mem 0xf7300000-0xf737ffff,0xf7380000-0xf7383fff irq 19 at device 0.0 on pci6
igb1: Using MSIX interrupts with 5 vectors
igb1: Ethernet address: 0c:c4:7a:30:08:1f
igb1: Bound queue 0 to cpu 4
igb1: Bound queue 1 to cpu 5
igb1: Bound queue 2 to cpu 6
igb1: Bound queue 3 to cpu 7
ehci1: <EHCI (generic) USB 2.0 controller> mem 0xf7513000-0xf75133ff irq 23 at device 29.0 on pci0
usbus1: EHCI version 1.0
usbus1 on ehci1
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
ahci0: <Intel Lynx Point AHCI SATA controller> port 0xf050-0xf057,0xf040-0xf043,0xf030-0xf037,0xf020-0xf023,0xf000-0xf01f mem 0xf7512000-0xf75127ff irq 19 at device 31.2 on pci0
ahci0: AHCI v1.30 with 6 6Gbps ports, Port Multiplier not supported
ahcich0: <AHCI channel> at channel 0 on ahci0
ahcich1: <AHCI channel> at channel 1 on ahci0
ahcich2: <AHCI channel> at channel 2 on ahci0
ahcich3: <AHCI channel> at channel 3 on ahci0
ahcich4: <AHCI channel> at channel 4 on ahci0
ahcich5: <AHCI channel> at channel 5 on ahci0
acpi_button0: <Sleep Button> on acpi0
acpi_button1: <Power Button> on acpi0
acpi_tz0: <Thermal Zone> on acpi0
acpi_tz1: <Thermal Zone> on acpi0
uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
uart1: <16550 or compatible> port 0x2f8-0x2ff irq 3 on acpi0
uart2: <16550 or compatible> port 0x3e8-0x3ef irq 7 on acpi0
orm0: <ISA Option ROMs> at iomem 0xc0000-0xc7fff,0xce000-0xcefff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: CGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3d0-0x3db iomem 0xb8000-0xbffff on isa0
wbwd0: DevID 0xc3 DevRev 0x33, will not attach, please report this.
coretemp0: <CPU On-Die Thermal Sensors> on cpu0
est0: <Enhanced SpeedStep Frequency Control> on cpu0
p4tcc0: <CPU Frequency Thermal Control> on cpu0
coretemp1: <CPU On-Die Thermal Sensors> on cpu1
est1: <Enhanced SpeedStep Frequency Control> on cpu1
p4tcc1: <CPU Frequency Thermal Control> on cpu1
coretemp2: <CPU On-Die Thermal Sensors> on cpu2
est2: <Enhanced SpeedStep Frequency Control> on cpu2
p4tcc2: <CPU Frequency Thermal Control> on cpu2
coretemp3: <CPU On-Die Thermal Sensors> on cpu3
est3: <Enhanced SpeedStep Frequency Control> on cpu3
p4tcc3: <CPU Frequency Thermal Control> on cpu3
coretemp4: <CPU On-Die Thermal Sensors> on cpu4
est4: <Enhanced SpeedStep Frequency Control> on cpu4
p4tcc4: <CPU Frequency Thermal Control> on cpu4
coretemp5: <CPU On-Die Thermal Sensors> on cpu5
est5: <Enhanced SpeedStep Frequency Control> on cpu5
p4tcc5: <CPU Frequency Thermal Control> on cpu5
coretemp6: <CPU On-Die Thermal Sensors> on cpu6
est6: <Enhanced SpeedStep Frequency Control> on cpu6
p4tcc6: <CPU Frequency Thermal Control> on cpu6
coretemp7: <CPU On-Die Thermal Sensors> on cpu7
est7: <Enhanced SpeedStep Frequency Control> on cpu7
p4tcc7: <CPU Frequency Thermal Control> on cpu7
ZFS filesystem version: 5
ZFS storage pool version: features support (5000)
Timecounters tick every 1.000 msec
ipfw2 (+ipv6) initialized, divert enabled, nat enabled, default to accept, logging disabled
usbus0: 480Mbps High Speed USB v2.0
usbus1: 480Mbps High Speed USB v2.0
ugen0.1: <Intel> at usbus0
uhub0: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus0
ugen1.1: <Intel> at usbus1
uhub1: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus1
uhub0: 2 ports with 2 removable, self powered
uhub1: 2 ports with 2 removable, self powered
ugen0.2: <vendor 0x8087> at usbus0
uhub2: <vendor 0x8087 product 0x8008, class 9/0, rev 2.00/0.05, addr 2> on usbus0
ugen1.2: <vendor 0x8087> at usbus1
uhub3: <vendor 0x8087 product 0x8000, class 9/0, rev 2.00/0.05, addr 2> on usbus1
uhub2: 4 ports with 4 removable, self powered
uhub3: 6 ports with 6 removable, self powered
ugen0.3: <vendor 0x0000> at usbus0
uhub4: <vendor 0x0000 product 0x0001, class 9/0, rev 2.00/0.00, addr 3> on usbus0
uhub4: 4 ports with 3 removable, self powered
ugen0.4: <vendor 0x0557> at usbus0
ukbd0: <vendor 0x0557 product 0x2419, class 0/0, rev 1.10/1.00, addr 4> on usbus0
kbd0 at ukbd0
ums0: <vendor 0x0557 product 0x2419, class 0/0, rev 1.10/1.00, addr 4> on usbus0
ums0: 3 buttons and [Z] coordinates ID=0
ugen1.3: <SanDisk> at usbus1
umass0: <SanDisk Extreme, class 0/0, rev 2.10/0.10, addr 3> on usbus1
umass0:  SCSI over Bulk-Only; quirks = 0x0100
umass0:8:0:-1: Attached to scbus8
ugen1.4: <SanDisk> at usbus1
umass1: <SanDisk Extreme, class 0/0, rev 2.10/0.10, addr 4> on usbus1
umass1:  SCSI over Bulk-Only; quirks = 0x0100
umass1:9:1:-1: Attached to scbus9
SMP: AP CPU #1 Launched!
da0 at mps0 bus 0 scbus0 target 0 lun 0
da0: <ATA ST31000528AS CC38> Fixed Direct Access SCSI-6 device
da0: Serial Number             9VP8ND8J
da0: 300.000MB/s transfersSMP: AP CPU #2 Launched!

da0: Command Queueing enabled
da0: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)
da3 at mps0 bus 0 scbus0 target 3 lun 0
SMP: AP CPU #6 Launched!
da3: <ATA WDC WD1001FALS-4 3D06> Fixed Direct Access SCSI-6 device
da3: Serial Number      WD-WCATR3358760
da3: 300.000MB/s transfers
da3: Command Queueing enabled
SMP: AP CPU #4 Launched!
da3: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)
da4 at umass-sim0 bus 0 scbus8 target 0 lun 0
da4: <SanDisk Extreme 0001> Removable Direct Access SCSI-6 device
SMP: AP CPU #7 Launched!
da4: Serial Number AA011001140339330562
da4: 40.000MB/s transfers
da4: 30533MB (62533296 512 byte sectors: 255H 63S/T 3892C)
da4: quirks=0x2<NO_6_BYTE>
SMP: AP CPU #3 Launched!
da5 at umass-sim1 bus 1 scbus9 target 0 lun 0
da5: <SanDisk Extreme 0001> Removable Direct Access SCSI-6 device
da5: Serial Number AA011001142318180585
da5: 40.000MB/s transfersSMP: AP CPU #5 Launched!

da5: 30533MB (62533296 512 byte sectors: 255H 63S/T 3892C)
da5: quirks=0x2<NO_6_BYTE>
da1 at mps0 bus 0 scbus0 target 1 lun 0
da1: <ATA SAMSUNG HD103SJ 0001> Fixed Direct Access SCSI-6 device
da1: Serial Number S246J9GZ903228
da1: 300.000MB/s transfers
da1: Command Queueing enabled
da1: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)
da2 at mps0 bus 0 scbus0 target 2 lun 0
da2: <ATA SAMSUNG HD103SJ 0001> Fixed Direct Access SCSI-6 device
da2: Serial Number S246J9GZ903229
da2: 300.000MB/s transfers
da2: Command Queueing enabled
da2: 953869MB (1953525168 512 byte sectors: 255H 63S/T 121601C)
Timecounter "TSC-low" frequency 1700036282 Hz quality 1000
GEOM: da5: the primary GPT table is corrupt or invalid.
GEOM: da5: using the secondary instead -- recovery strongly advised.
Trying to mount root from zfs:freenas-boot/ROOT/FreeNAS-9.3-STABLE-201501301837 []...
GEOM_RAID5: Module loaded, version 1.3.20140711.62 (rev f91e28e40bf7)
GEOM_ELI: Device gptid/23bdd134-aeb3-11e4-a05d-0cc47a30081e.eli created.
GEOM_ELI: Encryption: AES-XTS 128
GEOM_ELI:     Crypto: hardware
GEOM_ELI: Device gptid/24b2ae0e-aeb3-11e4-a05d-0cc47a30081e.eli created.
GEOM_ELI: Encryption: AES-XTS 128
GEOM_ELI:     Crypto: hardware
GEOM_ELI: Device gptid/257f7d30-aeb3-11e4-a05d-0cc47a30081e.eli created.
GEOM_ELI: Encryption: AES-XTS 128
GEOM_ELI:     Crypto: hardware
GEOM_ELI: Device gptid/26029b49-aeb3-11e4-a05d-0cc47a30081e.eli created.
GEOM_ELI: Encryption: AES-XTS 128
GEOM_ELI:     Crypto: hardware
ipmi0: <IPMI System Interface> port 0xca2,0xca3 on acpi0
ipmi0: KCS mode found at io 0xca2 on acpi
ipmi0: IPMI device rev. 1, firmware rev. 1.42, version 2.0
ipmi0: Number of channels 2
ipmi0: Attached watchdog
ipmi1: <IPMI System Interface> on isa0
device_attach: ipmi1 attach returned 16
wbwd0: DevID 0xc3 DevRev 0x33, will not attach, please report this.
wbwd0: HEFRAS and EFER do not align: EFER 0x4e DevID 0x00 DevRev 0x00 CR26 0x00
GEOM_ELI: Device da0p1.eli created.
GEOM_ELI: Encryption: AES-XTS 256
GEOM_ELI:     Crypto: hardware
GEOM_ELI: Device da1p1.eli created.
GEOM_ELI: Encryption: AES-XTS 256
GEOM_ELI:     Crypto: hardware
GEOM_ELI: Device da2p1.eli created.
GEOM_ELI: Encryption: AES-XTS 256
GEOM_ELI:     Crypto: hardware
GEOM_ELI: Device da3p1.eli created.
GEOM_ELI: Encryption: AES-XTS 256
GEOM_ELI:     Crypto: hardware
vboxdrv: fAsync=0 offMin=0x2ca offMax=0xf62
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
I suggest you bring the issue to Seagate's attention and see if they have anything to say.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Ericloewe said: "I suggest you bring the issue to Seagate's attention and see if they have anything to say."
+1

If they're the same company they were when I gave them up in 2008, you'll hit a dead end and basically be stuck with unreliable drives that you can't do jack squat with. Good luck!
 

nas2160

Dabbler
Joined
Feb 1, 2015
Messages
32
I have run further tests using the 1TB drives I borrowed from a friend:

1. Set up a new manual RAIDZ2 pool with 2 x 4 TB WD Reds + 1 x 4 TB Seagate NAS HDD + 1 x 1 TB WD.
Copied 130 GB of data to it and ran a scrub: it finished normally.

2. Shut down and swapped out 1 x 4 TB WD Red for my other 4 TB Seagate NAS HDD.
Booted up, resilvered the pool and scrubbed -> the same error occurred; 1 x 4 TB WD and 1 x 4 TB Seagate dropped from the array. 1 x 1 TB WD and 1 x 4 TB WD Red remained online.

3. Shut down, removed the Seagate added in step 2 and replaced it with the WD Red, so the setup was the same as in step 1.
Booted up, resilvered the pool and scrubbed -> the same error occurred; 2 x 4 TB WD and 1 x 1 TB WD dropped from the array. 1 x 4 TB Seagate remained online.

4. Shut down, removed the remaining 1 x 4 TB Seagate and added a 1 x 1 TB Samsung, so there were no Seagate drives in the pool.
Booted up and resilvered -> the error occurred during the resilver, but no HDDs dropped from the array. All HDDs remained online.
Testing is ongoing, but do the above results rule out the Seagate drives as the cause of the LSI 2308 resets?
I have emailed SuperMicro to ask for their opinion.
 

nas2160

Dabbler
Joined
Feb 1, 2015
Messages
32
I have also not updated my FreeNAS 9.3-RELEASE since the error occurred, to try to determine the cause of the problem.
 

nas2160

Dabbler
Joined
Feb 1, 2015
Messages
32
5. Same setup as in step 4. After copying a further 75 GB to the pool and running a scrub -> the same error occurs; all disks except the 1 TB Samsung are dropped from the pool.
6. Shut down; after the reboot the error occurs again, but no disks are dropped from the pool. I assume this is because no I/O was occurring to the pool, so ZFS had no read errors to drop the disks for.

My conclusion is that this is an issue with the LSI 2308.
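The elimination argument in steps 1 to 6 can be checked mechanically. The minimal sketch below transcribes each step's drive set and whether the controller reset. It does not model which disks dropped, since those details are not fully consistent across the steps. Two things fall out. First, resets happened in steps 4 to 6 with no Seagate drive present, so Seagate drives are not necessary for the fault. Second, steps 1 and 3 used identical drives but had different outcomes, so the fault looks intermittent rather than tied to one drive set. The sketch cannot, on its own, exclude components common to every run, such as cabling, the PSU or the WD Reds. The names `RUNS`, `resets_without` and `same_pool_different_outcome` are illustrative only.

```python
from itertools import combinations

# Each run: (step label, drives in the pool, did the LSI 2308 reset?)
# Transcribed from steps 1-6 above; which disks dropped is not modelled.
RUNS = [
    ("1", ["WD Red 4TB", "WD Red 4TB", "Seagate NAS 4TB", "WD 1TB"], False),
    ("2", ["WD Red 4TB", "Seagate NAS 4TB", "Seagate NAS 4TB", "WD 1TB"], True),
    ("3", ["WD Red 4TB", "WD Red 4TB", "Seagate NAS 4TB", "WD 1TB"], True),
    ("4", ["WD Red 4TB", "WD Red 4TB", "WD 1TB", "Samsung 1TB"], True),
    ("5", ["WD Red 4TB", "WD Red 4TB", "WD 1TB", "Samsung 1TB"], True),
    ("6", ["WD Red 4TB", "WD Red 4TB", "WD 1TB", "Samsung 1TB"], True),
]


def resets_without(vendor: str) -> list[str]:
    """Steps where the controller reset even though no drive from `vendor` was present."""
    return [step for step, drives, reset in RUNS
            if reset and not any(vendor in d for d in drives)]


def same_pool_different_outcome() -> list[tuple[str, str]]:
    """Pairs of steps with identical drive sets but different results."""
    return [(a, b) for (a, da, ra), (b, db, rb) in combinations(RUNS, 2)
            if sorted(da) == sorted(db) and ra != rb]


print("Resets with no Seagate present:", resets_without("Seagate"))
print("Same drives, different outcome:", same_pool_different_outcome())
```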
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Sounds like a reasonable conclusion to me.
 

nas2160

Dabbler
Joined
Feb 1, 2015
Messages
32
Good. What do you think is the best way to contact SuperMicro support? I have emailed support@supermicro.com, as that was the only address I could find. Maybe they will get back to me soon, but they haven't yet (24 hours).
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
nas2160 said: "Good. What do you think is the best way to contact SuperMicro support? I have emailed support@supermicro.com, as that was the only address I could find. Maybe they will get back to me soon, but they haven't yet (24 hours)."

In Europe, at least, they have a form that does pretty much the same as the email, but helps you not forget details that might help.
 

JayG30

Contributor
Joined
Jun 26, 2013
Messages
158
I'm glad to hear it isn't the Seagate NAS drives. It seems some people have had bad experiences with Seagate, but personally I've had far more success with them than with anything from WD. I have used the same drives in a setup for 2 years without a single issue, so I can't say I'm surprised.

Perhaps consider talking to the reseller that sold you the parts? No idea where you are or who you purchased through, but places like Newegg will RMA the board for you without much hassle.

So you have no backplane in your setup (what case)?
And you don't think it might be PSU-related? I didn't see which one you are using mentioned. Perhaps, under enough load, it could cause instability and make disks drop?
 

GrumpyBear

Contributor
Joined
Jan 28, 2015
Messages
141
Just a thought, but several people have reported the heat sink on the LSI to be very hot, and some have also mentioned that it is "wiggly". Have you tried blowing some air over the heat sink on the LSI? Maybe the heat sink isn't bonded very well and the die is overheating. Mind you, you'd think that if this were the case, it would have surfaced during the HDD tests, assuming they were run concurrently.

I'm watching this one as I'm planning on running mixed vendor with 3TB drives as well.
 

nas2160

Dabbler
Joined
Feb 1, 2015
Messages
32
JayG30: Aw, really? Well, I'm glad some people have had good experiences with both Seagate and WD!
Yes, maybe I should try that. I bought the motherboard from Amazon.com, but I live in New Zealand, so it's a fair effort to send it back. SuperMicro themselves have not been very useful so far: they are telling me to run the v19 driver and see if that helps. So I might see if I can get that driver from FreeBSD, add it to FreeNAS and update the firmware to v19 to match?

Is it worth trying, to prove to SuperMicro that it isn't driver-related (assuming the same error occurs with the v19 driver and firmware)?

No, I don't have a backplane, just SATA cables from the board to my drives. My case is the Fractal Define R4.
Hmm, possibly. I have the Seasonic G-450. I think it's unlikely, but I guess there is still the possibility.
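If the v19 driver and firmware combination gets tried, it helps to confirm from the boot log which versions are actually in use. The dmesg output earlier in this thread reports `mps0: Firmware: 16.00.01.00, Driver: 16.00.00.00-fbsd`. As far as I know, the usual FreeNAS advice at the time was to keep the HBA firmware phase matched to the mps driver phase. The minimal sketch below assumes the attach line keeps that `Firmware: …, Driver: …` format. It pulls out the major versions for each mps unit and flags mismatches. The function names are hypothetical.

```python
import re

# Matches the mps attach line seen in dmesg, e.g.
# "mps0: Firmware: 16.00.01.00, Driver: 16.00.00.00-fbsd"
MPS_VERSION_LINE = re.compile(
    r"^(?P<unit>mps\d+): Firmware: (?P<fw>[\d.]+), Driver: (?P<drv>[\d.]+)"
)


def mps_versions(dmesg_text: str) -> dict[str, tuple[int, int]]:
    """Map each mps unit to (firmware major, driver major) as reported in dmesg text."""
    found = {}
    for line in dmesg_text.splitlines():
        m = MPS_VERSION_LINE.match(line.strip())
        if m:
            found[m["unit"]] = (int(m["fw"].split(".")[0]), int(m["drv"].split(".")[0]))
    return found


def mismatched(dmesg_text: str) -> list[str]:
    """Return the mps units whose firmware and driver major versions differ."""
    return [unit for unit, (fw, drv) in mps_versions(dmesg_text).items() if fw != drv]
```

Feeding it the output of `dmesg` should give `{"mps0": (16, 16)}` for the current setup. After a v19 update, both numbers should read 19.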
 

nas2160

Dabbler
Joined
Feb 1, 2015
Messages
32
GrumpyBear: Yes, I have read that too. No, I haven't tried checking whether it's wiggly or running air directly over it. That's true, and they were run concurrently, so that is another option to look into. I will try a few more software fixes first, but will keep this in mind :)
I really want to get this sorted; it's now been nearly two months and I can't set up my system for good and put my data on it! Very frustrating.
 

GrumpyBear

Contributor
Joined
Jan 28, 2015
Messages
141
My disks survived a week and a half of torture, so I've loaded some data on them and am doing some more testing as I go. I have 4 x 3 TB WD Reds and 4 x 3 TB Seagate NAS disks, all hooked up to the 2308. I even have the same CPU and amount of RAM, but I'm using the low-voltage DIMMs from Crucial. I am not running encryption, though.

I'll force a manual scrub later today and see if anything shakes loose.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Got any cables over 1m in length? Who knows, the 2308 might have trouble dealing with excessively long SATA cables.
 

mjt5282

Contributor
Joined
Mar 19, 2013
Messages
139
What BIOS version are you running? I am running the same motherboard and have kernel panic() issues about 50% of the time after a reboot, with one of the IBM M1015 HBAs that I flashed to v16 IT-mode firmware (after flashing to the 2.0 version).
 

GrumpyBear

Contributor
Joined
Jan 28, 2015
Messages
141
GrumpyBear said: "... I'll force a manual scrub later today and see if anything shakes loose."
Ran a manual scrub using "zpool scrub <Pool>" from the command line. No issues reported.
I'm using the same case as nas2160 (Fractal Define R4) and all 52cm cables from SuperMicro (had some leftover from my workstation).

I'll run more scrubs over the course of the day today and post if there are any issues.
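Running scrubs back to back over a day can be automated with a small loop around the same `zpool scrub` command plus `zpool status`. This is a minimal sketch. It assumes `zpool status <pool>` includes the text `scrub in progress` while a scrub is running, which is worth checking against your own output first. It returns each final status so the error counters and any dropped disks can be inspected afterwards. The helper names and the pool name in the usage note are hypothetical.

```python
import subprocess
import time
from typing import Callable, List


def run(cmd: List[str]) -> str:
    """Run a command and return its stdout, raising CalledProcessError if it fails."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def scrub_and_wait(pool: str, runner: Callable[[List[str]], str] = run,
                   poll_seconds: float = 60.0,
                   sleep: Callable[[float], None] = time.sleep) -> str:
    """Start a scrub of `pool`, poll `zpool status` until it stops, return the final status."""
    runner(["zpool", "scrub", pool])
    while True:
        status = runner(["zpool", "status", pool])
        # Assumption: this phrase appears in zpool status only while a scrub is running.
        if "scrub in progress" not in status:
            return status
        sleep(poll_seconds)


def repeat_scrubs(pool: str, times: int, **kwargs) -> List[str]:
    """Run `times` back-to-back scrubs and collect the final status text of each."""
    return [scrub_and_wait(pool, **kwargs) for _ in range(times)]
```

For example, `repeat_scrubs("tank", 4)` (with `tank` standing in for the real pool name) runs four scrubs in a row. The `runner` and `sleep` parameters exist only so the loop can be exercised without a real pool.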

Note that I'm running a Corsair HX650 power supply whereas you have the Seasonic G450. The Seasonics are very well regarded here and I highly doubt it is even breaking a sweat. I'm consuming 75-100W during a scrub and the peak start up wattage measured was only 185W.
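The headroom argument can be put in numbers by setting GrumpyBear's measured figures (up to 100 W during a scrub and 185 W at start-up) against each PSU's rating. This is a rough sketch under two assumptions. First, nas2160's four-disk system draws no more than GrumpyBear's eight-disk one, which is plausible but has not been measured. Second, if the figures were taken at the wall (the post doesn't say), they include PSU losses, so the percentages err on the high side. `load_percent` is an illustrative helper.

```python
def load_percent(draw_watts: float, rated_watts: float) -> float:
    """Percentage of a PSU's rated output represented by a given draw."""
    return 100.0 * draw_watts / rated_watts


# GrumpyBear's measured figures from the post above.
MEASURED = {"scrub (upper end)": 100, "start-up peak": 185}
PSUS = {"Seasonic G-450": 450, "Corsair HX650": 650}

for psu, rated in PSUS.items():
    for label, watts in MEASURED.items():
        print(f"{psu}: {label} {watts} W -> {load_percent(watts, rated):.0f}% of rating")
```

Even the start-up peak comes to about 41% of the G-450's rating, which is consistent with the view that the PSU is unlikely to be the problem.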
 

nas2160

Dabbler
Joined
Feb 1, 2015
Messages
32
GrumpyBear: That sounds like a good test of the LSI 2308. It's good you have very similar hardware.
I think my SATA cables are the same as the ones GrumpyBear used, and are 52 cm.
I'm running BIOS version 2.0 from some time in 2014 (I will check on next boot). Is there a newer one?
Wow, that isn't very good! Weird.
 