Struggling to replace failed disk – Can Not Offline Disk

Status
Not open for further replies.

nhmilleraz

Cadet
Joined
Aug 24, 2015
Messages
5


I have recently inherited an environment that includes a FreeNAS system and am having some challenges resolving an issue with a disk that is pending failure. I’m an old network guy (WAN) with some ancient Unix background (think SunOS-era stuff), so please be gentle if I’ve missed stupid stuff. I’ve searched this forum and other web sources and tried the routinely recommended actions with no luck.


Basic system information (copied from the System > System Information screen): the system is an HP DL180 G6 with twelve 2 TB drives and 8 GB of memory. The rest is shown below and in the dmesg output.

Code:
Hostname  NAME_REPLACED.local
Build  FreeNAS-9.2.1.7-RELEASE-x64 (fdbe9a0)
Platform  Intel(R) Xeon(R) CPU X5560 @ 2.80GHz
Memory  8162MB
System Time  Mon Aug 24 07:09:35 MST 2015
Uptime  7:09AM up 1:22, 0 users
Load Average  0.40, 0.67, 0.71

FreeNAS Version: FreeNAS-9.2.1.7-RELEASE-x64 (fdbe9a0)


So – the details of the problem:

The disk in question appears healthy to FreeNAS but SMART is routinely presenting the following error:

Code:
Device: /dev/ciss0 [cciss_disk_05] [SCSI], SMART Failure: HARDWARE IMPENDING FAILURE GENERAL HARD DRIVE FAILURE


I have attempted to follow the instructions for replacing the disk which I find here: http://doc.freenas.org/9.3/freenas_storage.html#replacing-a-failed-drive

When I attempt to off-line the drive in question I get the following message:

Code:
NAME_REPLACED manage.py: [middleware.exceptions:38] [MiddlewareError: Disk offline failed: "cannot offline gptid/9a193a6d-294e-11e4-a834-f4ce46829492: no valid replicas, "]


Numerous sources suggest running a scrub and retrying – which I have done without any change in behavior (I’ve also scheduled a weekly scrub as there was nothing of that nature in place).

I finish the scrub and get the same message.

So – two primary questions:

1) How can I offline the offending disk so that I can replace it?

2) I suspect this may be a function of how the file system was mapped to the disks: they appear to be set up as RAID 0 devices rather than JBOD, but I’m not sure. Is that the cause?

The system in question is an HP DL180 G6 with 12 X 2TB SAS disks. The disk in question reports 46 unresolved errors. I’ve purchased a replacement disk as a precaution.

The system in question contains ALL of the business’s critical data (including years of work product). I have completed a very simple “backup” of the data (duplicated the file sets in their most critical shares to a separate and now off-line disk), so losing the file system would not result in huge data loss, but it would seriously impact productivity until I get things back up. Given that the errors on the disk appear to be minimal (at the moment), I’m moving cautiously regarding changes.

Advice on how to proceed?

Thanks in advance!

ADDITIONAL INFORMATION

Dmesg

Code:
[root@NAME_REPLACED] ~# dmesg

Copyright (c) 1992-2013 The FreeBSD Project.

Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994

  The Regents of the University of California. All rights reserved.

FreeBSD is a registered trademark of The FreeBSD Foundation.

FreeBSD 9.2-RELEASE-p10 #0 r262572+4fb5adc: Wed Aug  6 17:07:16 PDT 2014

  root@build3.ixsystems.com:/fusion/jkh/921/freenas/os-base/amd64/fusion/jkh/921/freenas/FreeBSD/src/sys/FREENAS.amd64 amd64

gcc version 4.2.1 20070831 patched [FreeBSD]

CPU: Intel(R) Xeon(R) CPU  X5560  @ 2.80GHz (2800.16-MHz K8-class CPU)

  Origin = "GenuineIntel"  Id = 0x106a5  Family = 0x6  Model = 0x1a  Stepping = 5

  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>

  Features2=0x9ce3bd<SSE3,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,DCA,SSE4.1,SSE4.2,POPCNT>

  AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>

  AMD Features2=0x1<LAHF>

  TSC: P-state invariant, performance statistics

real memory  = 8589934592 (8192 MB)

avail memory = 8218636288 (7837 MB)

Event timer "LAPIC" quality 400

ACPI APIC Table: <HP  ProLiant>

FreeBSD/SMP: Multiprocessor System Detected: 16 CPUs

FreeBSD/SMP: 2 package(s) x 4 core(s) x 2 SMT threads

 cpu0 (BSP): APIC ID:  0

 cpu1 (AP): APIC ID:  1

 cpu2 (AP): APIC ID:  2

 cpu3 (AP): APIC ID:  3

 cpu4 (AP): APIC ID:  4

 cpu5 (AP): APIC ID:  5

 cpu6 (AP): APIC ID:  6

 cpu7 (AP): APIC ID:  7

 cpu8 (AP): APIC ID: 16

 cpu9 (AP): APIC ID: 17

 cpu10 (AP): APIC ID: 18

 cpu11 (AP): APIC ID: 19

 cpu12 (AP): APIC ID: 20

 cpu13 (AP): APIC ID: 21

 cpu14 (AP): APIC ID: 22

 cpu15 (AP): APIC ID: 23

WARNING: VIMAGE (virtualized network stack) is a highly experimental feature.

ioapic0 <Version 2.0> irqs 0-23 on motherboard

ioapic1 <Version 2.0> irqs 24-47 on motherboard

kbd1 at kbdmux0

cryptosoft0: <software crypto> on motherboard

aesni0: No AESNI support.

padlock0: No ACE support.

acpi0: <HP ProLiant> on motherboard

acpi0: Power Button (fixed)

acpi0: reservation of 0, a0000 (3) failed

acpi0: reservation of 100000, bff00000 (3) failed

cpu0: <ACPI CPU> on acpi0

cpu1: <ACPI CPU> on acpi0

cpu2: <ACPI CPU> on acpi0

cpu3: <ACPI CPU> on acpi0

cpu4: <ACPI CPU> on acpi0

cpu5: <ACPI CPU> on acpi0

cpu6: <ACPI CPU> on acpi0

cpu7: <ACPI CPU> on acpi0

cpu8: <ACPI CPU> on acpi0

cpu9: <ACPI CPU> on acpi0

cpu10: <ACPI CPU> on acpi0

cpu11: <ACPI CPU> on acpi0

cpu12: <ACPI CPU> on acpi0

cpu13: <ACPI CPU> on acpi0

cpu14: <ACPI CPU> on acpi0

cpu15: <ACPI CPU> on acpi0

attimer0: <AT timer> port 0x40-0x43 irq 0 on acpi0

Timecounter "i8254" frequency 1193182 Hz quality 0

Event timer "i8254" frequency 1193182 Hz quality 100

atrtc0: <AT realtime clock> port 0x70-0x71 irq 8 on acpi0

Event timer "RTC" frequency 32768 Hz quality 0

hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on acpi0

Timecounter "HPET" frequency 14318180 Hz quality 950

Event timer "HPET" frequency 14318180 Hz quality 450

Event timer "HPET1" frequency 14318180 Hz quality 440

Event timer "HPET2" frequency 14318180 Hz quality 440

Event timer "HPET3" frequency 14318180 Hz quality 440

Timecounter "ACPI-fast" frequency 3579545 Hz quality 900

acpi_timer0: <24-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0

pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0

pci0: <ACPI PCI bus> on pcib0

pcib1: <ACPI PCI-PCI bridge> at device 1.0 on pci0

pci7: <ACPI PCI bus> on pcib1

igb0: <Intel(R) PRO/1000 Network Connection version - 2.4.0> port 0xe880-0xe89f mem 0xfbe60000-0xfbe7ffff,0xfbe40000-0xfbe5ffff,0xfbeb8000-0xfbebbfff irq 28 at device 0.0 on pci7

igb0: Using MSIX interrupts with 9 vectors

igb0: Ethernet address: f4:ce:46:82:94:92

igb0: Bound queue 0 to cpu 0

igb0: Bound queue 1 to cpu 1

igb0: Bound queue 2 to cpu 2

igb0: Bound queue 3 to cpu 3

igb0: Bound queue 4 to cpu 4

igb0: Bound queue 5 to cpu 5

igb0: Bound queue 6 to cpu 6

igb0: Bound queue 7 to cpu 7

igb1: <Intel(R) PRO/1000 Network Connection version - 2.4.0> port 0xec00-0xec1f mem 0xfbee0000-0xfbefffff,0xfbec0000-0xfbedffff,0xfbebc000-0xfbebffff irq 40 at device 0.1 on pci7

igb1: Using MSIX interrupts with 9 vectors

igb1: Ethernet address: f4:ce:46:82:94:93

igb1: Bound queue 0 to cpu 8

igb1: Bound queue 1 to cpu 9

igb1: Bound queue 2 to cpu 10

igb1: Bound queue 3 to cpu 11

igb1: Bound queue 4 to cpu 12

igb1: Bound queue 5 to cpu 13

igb1: Bound queue 6 to cpu 14

igb1: Bound queue 7 to cpu 15

pcib2: <ACPI PCI-PCI bridge> at device 3.0 on pci0

pci6: <ACPI PCI bus> on pcib2

ciss0: <HP Smart Array P212> port 0xd800-0xd8ff mem 0xfbc00000-0xfbdfffff,0xfbbff000-0xfbbfffff irq 24 at device 0.0 on pci6

ciss0: PERFORMANT Transport

pcib3: <ACPI PCI-PCI bridge> at device 7.0 on pci0

pci5: <ACPI PCI bus> on pcib3

pcib4: <ACPI PCI-PCI bridge> at device 9.0 on pci0

pci4: <ACPI PCI bus> on pcib4

pci0: <base peripheral, interrupt controller> at device 20.0 (no driver attached)

pci0: <base peripheral, interrupt controller> at device 20.1 (no driver attached)

pci0: <base peripheral, interrupt controller> at device 20.2 (no driver attached)

uhci0: <Intel 82801JI (ICH10) USB controller USB-D> port 0xb800-0xb81f irq 16 at device 26.0 on pci0

uhci0: LegSup = 0x2400

usbus0 on uhci0

ehci0: <Intel 82801JI (ICH10) USB 2.0 controller USB-B> mem 0xfaff8000-0xfaff83ff irq 18 at device 26.7 on pci0

usbus1: EHCI version 1.0

usbus1 on ehci0

pcib5: <ACPI PCI-PCI bridge> irq 16 at device 28.0 on pci0

pci3: <ACPI PCI bus> on pcib5

pcib6: <ACPI PCI-PCI bridge> irq 16 at device 28.4 on pci0

pci2: <ACPI PCI bus> on pcib6

vgapci0: <VGA-compatible display> mem 0xf8000000-0xf8ffffff,0xfbafc000-0xfbafffff,0xfb000000-0xfb7fffff irq 16 at device 0.0 on pci2

uhci1: <Intel 82801JI (ICH10) USB controller USB-A> port 0xb880-0xb89f irq 23 at device 29.0 on pci0

uhci1: LegSup = 0x2400

usbus2 on uhci1

uhci2: <Intel 82801JI (ICH10) USB controller USB-B> port 0xbc00-0xbc1f irq 19 at device 29.1 on pci0

uhci2: LegSup = 0x2400

usbus3 on uhci2

uhci3: <Intel 82801JI (ICH10) USB controller USB-C> port 0xc000-0xc01f irq 18 at device 29.2 on pci0

uhci3: LegSup = 0x2400

usbus4 on uhci3

ehci1: <Intel 82801JI (ICH10) USB 2.0 controller USB-A> mem 0xfaffa000-0xfaffa3ff irq 23 at device 29.7 on pci0

usbus5: EHCI version 1.0

usbus5 on ehci1

pcib7: <ACPI PCI-PCI bridge> at device 30.0 on pci0

pci1: <ACPI PCI bus> on pcib7

isab0: <PCI-ISA bridge> at device 31.0 on pci0

isa0: <ISA bus> on isab0

ahci0: <Intel ICH10 AHCI SATA controller> port 0xc880-0xc887,0xc800-0xc803,0xc480-0xc487,0xc400-0xc403,0xc080-0xc09f mem 0xfaffc000-0xfaffc7ff irq 19 at device 31.2 on pci0

ahci0: AHCI v1.20 with 6 3Gbps ports, Port Multiplier not supported

ahcich0: <AHCI channel> at channel 0 on ahci0

ahcich1: <AHCI channel> at channel 1 on ahci0

ahcich2: <AHCI channel> at channel 2 on ahci0

ahcich3: <AHCI channel> at channel 3 on ahci0

ahcich4: <AHCI channel> at channel 4 on ahci0

ahcich5: <AHCI channel> at channel 5 on ahci0

acpi_button0: <Power Button> on acpi0

uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0

qpi0: <QPI system bus> on motherboard

pcib8: <QPI Host-PCI bridge> pcibus 255 on qpi0

pci255: <PCI bus> on pcib8

pcib9: <QPI Host-PCI bridge> pcibus 254 on qpi0

pci254: <PCI bus> on pcib9

ichwd0 on isa0

ichwd0: ICH WDT present but disabled in BIOS or hardware

device_attach: ichwd0 attach returned 6

ichwd0 at port 0x830-0x837,0x860-0x87f on isa0

ichwd0: ICH WDT present but disabled in BIOS or hardware

device_attach: ichwd0 attach returned 6

orm0: <ISA Option ROMs> at iomem 0xc0000-0xc7fff,0xc8000-0xc8fff,0xc9000-0xccfff on isa0

sc0: <System console> at flags 0x100 on isa0

sc0: VGA <16 virtual consoles, flags=0x300>

vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0

atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0

atkbd0: <AT Keyboard> irq 1 on atkbdc0

kbd0 at atkbd0

atkbd0: [GIANT-LOCKED]

wbwd0: HEFRAS and EFER do not align: EFER 0x2e DevID 0xff DevRev 0xff CR26 0xff

coretemp0: <CPU On-Die Thermal Sensors> on cpu0

est0: <Enhanced SpeedStep Frequency Control> on cpu0

p4tcc0: <CPU Frequency Thermal Control> on cpu0

coretemp1: <CPU On-Die Thermal Sensors> on cpu1

est1: <Enhanced SpeedStep Frequency Control> on cpu1

p4tcc1: <CPU Frequency Thermal Control> on cpu1

coretemp2: <CPU On-Die Thermal Sensors> on cpu2

est2: <Enhanced SpeedStep Frequency Control> on cpu2

p4tcc2: <CPU Frequency Thermal Control> on cpu2

coretemp3: <CPU On-Die Thermal Sensors> on cpu3

est3: <Enhanced SpeedStep Frequency Control> on cpu3

p4tcc3: <CPU Frequency Thermal Control> on cpu3

coretemp4: <CPU On-Die Thermal Sensors> on cpu4

est4: <Enhanced SpeedStep Frequency Control> on cpu4

p4tcc4: <CPU Frequency Thermal Control> on cpu4

coretemp5: <CPU On-Die Thermal Sensors> on cpu5

est5: <Enhanced SpeedStep Frequency Control> on cpu5

p4tcc5: <CPU Frequency Thermal Control> on cpu5

coretemp6: <CPU On-Die Thermal Sensors> on cpu6

est6: <Enhanced SpeedStep Frequency Control> on cpu6

p4tcc6: <CPU Frequency Thermal Control> on cpu6

coretemp7: <CPU On-Die Thermal Sensors> on cpu7

est7: <Enhanced SpeedStep Frequency Control> on cpu7

p4tcc7: <CPU Frequency Thermal Control> on cpu7

coretemp8: <CPU On-Die Thermal Sensors> on cpu8

est8: <Enhanced SpeedStep Frequency Control> on cpu8

p4tcc8: <CPU Frequency Thermal Control> on cpu8

coretemp9: <CPU On-Die Thermal Sensors> on cpu9

est9: <Enhanced SpeedStep Frequency Control> on cpu9

p4tcc9: <CPU Frequency Thermal Control> on cpu9

coretemp10: <CPU On-Die Thermal Sensors> on cpu10

est10: <Enhanced SpeedStep Frequency Control> on cpu10

p4tcc10: <CPU Frequency Thermal Control> on cpu10

coretemp11: <CPU On-Die Thermal Sensors> on cpu11

est11: <Enhanced SpeedStep Frequency Control> on cpu11

p4tcc11: <CPU Frequency Thermal Control> on cpu11

coretemp12: <CPU On-Die Thermal Sensors> on cpu12

est12: <Enhanced SpeedStep Frequency Control> on cpu12

p4tcc12: <CPU Frequency Thermal Control> on cpu12

coretemp13: <CPU On-Die Thermal Sensors> on cpu13

est13: <Enhanced SpeedStep Frequency Control> on cpu13

p4tcc13: <CPU Frequency Thermal Control> on cpu13

coretemp14: <CPU On-Die Thermal Sensors> on cpu14

est14: <Enhanced SpeedStep Frequency Control> on cpu14

p4tcc14: <CPU Frequency Thermal Control> on cpu14

coretemp15: <CPU On-Die Thermal Sensors> on cpu15

est15: <Enhanced SpeedStep Frequency Control> on cpu15

p4tcc15: <CPU Frequency Thermal Control> on cpu15

Timecounters tick every 1.000 msec

ipfw2 (+ipv6) initialized, divert enabled, nat enabled, default to accept, logging disabled

DUMMYNET 0xfffffe0002fa1080 with IPv6 initialized (100409)

load_dn_sched dn_sched RR loaded

load_dn_sched dn_sched WF2Q+ loaded

load_dn_sched dn_sched FIFO loaded

load_dn_sched dn_sched PRIO loaded

load_dn_sched dn_sched QFQ loaded

usbus0: 12Mbps Full Speed USB v1.0

usbus1: 480Mbps High Speed USB v2.0

usbus2: 12Mbps Full Speed USB v1.0

usbus3: 12Mbps Full Speed USB v1.0

usbus4: 12Mbps Full Speed USB v1.0

usbus5: 480Mbps High Speed USB v2.0

ugen0.1: <Intel> at usbus0

uhub0: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus0

ugen1.1: <Intel> at usbus1

uhub1: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus1

ugen2.1: <Intel> at usbus2

uhub2: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus2

ugen3.1: <Intel> at usbus3

uhub3: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus3

ugen4.1: <Intel> at usbus4

uhub4: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus4

ugen5.1: <Intel> at usbus5

uhub5: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus5

da0 at ciss0 bus 0 scbus0 target 0 lun 0

da0: <COMPAQ RAID 0 OK> Fixed Direct Access SCSI-5 device

da0: Serial Number PACCP9SXU6IM

da0: 135.168MB/s transfers

da0: Command Queueing enabled

da0: 1907697MB (3906963632 512 byte sectors: 255H 32S/T 65535C)

da0: quirks=0x1<NO_SYNC_CACHE>

da1 at ciss0 bus 0 scbus0 target 1 lun 0

da1: <COMPAQ RAID 0 OK> Fixed Direct Access SCSI-5 device

da1: Serial Number PACCP9SXU6IM

da1: 135.168MB/s transfers

da1: Command Queueing enabled

da1: 1907697MB (3906963632 512 byte sectors: 255H 32S/T 65535C)

da1: quirks=0x1<NO_SYNC_CACHE>

da2 at ciss0 bus 0 scbus0 target 2 lun 0

da2: <COMPAQ RAID 0 OK> Fixed Direct Access SCSI-5 device

da2: Serial Number PACCP9SXU6IM

da2: 135.168MB/s transfers

da2: Command Queueing enabled

da2: 1907697MB (3906963632 512 byte sectors: 255H 32S/T 65535C)

da2: quirks=0x1<NO_SYNC_CACHE>

da3 at ciss0 bus 0 scbus0 target 3 lun 0

da3: <COMPAQ RAID 0 OK> Fixed Direct Access SCSI-5 device

da3: Serial Number PACCP9SXU6IM

da3: 135.168MB/s transfers

da3: Command Queueing enabled

da3: 1907697MB (3906963632 512 byte sectors: 255H 32S/T 65535C)

da3: quirks=0x1<NO_SYNC_CACHE>

da4 at ciss0 bus 0 scbus0 target 4 lun 0

da4: <COMPAQ RAID 0 OK> Fixed Direct Access SCSI-5 device

da4: Serial Number PACCP9SXU6IM

da4: 135.168MB/s transfers

da4: Command Queueing enabled

da4: 1907697MB (3906963632 512 byte sectors: 255H 32S/T 65535C)

da4: quirks=0x1<NO_SYNC_CACHE>

da5 at ciss0 bus 0 scbus0 target 5 lun 0

da5: <COMPAQ RAID 0 OK> Fixed Direct Access SCSI-5 device

da5: Serial Number PACCP9SXU6IM

da5: 135.168MB/s transfers

da5: Command Queueing enabled

da5: 1907697MB (3906963632 512 byte sectors: 255H 32S/T 65535C)

da5: quirks=0x1<NO_SYNC_CACHE>

da6 at ciss0 bus 0 scbus0 target 6 lun 0

da6: <COMPAQ RAID 0 OK> Fixed Direct Access SCSI-5 device

da6: Serial Number PACCP9SXU6IM

da6: 135.168MB/s transfers

da6: Command Queueing enabled

da6: 1907697MB (3906963632 512 byte sectors: 255H 32S/T 65535C)

da6: quirks=0x1<NO_SYNC_CACHE>

da7 at ciss0 bus 0 scbus0 target 7 lun 0

da7: <COMPAQ RAID 0 OK> Fixed Direct Access SCSI-5 device

da7: Serial Number PACCP9SXU6IM

da7: 135.168MB/s transfers

da7: Command Queueing enabled

da7: 1907697MB (3906963632 512 byte sectors: 255H 32S/T 65535C)

da7: quirks=0x1<NO_SYNC_CACHE>

da8 at ciss0 bus 0 scbus0 target 8 lun 0

da8: <COMPAQ RAID 0 OK> Fixed Direct Access SCSI-5 device

da8: Serial Number PACCP9SXU6IM

da8: 135.168MB/s transfers

da8: Command Queueing enabled

da8: 1907697MB (3906963632 512 byte sectors: 255H 32S/T 65535C)

da8: quirks=0x1<NO_SYNC_CACHE>

da9 at ciss0 bus 0 scbus0 target 9 lun 0

da9: <COMPAQ RAID 0 OK> Fixed Direct Access SCSI-5 device

da9: Serial Number PACCP9SXU6IM

da9: 135.168MB/s transfers

da9: Command Queueing enabled

da9: 1907697MB (3906963632 512 byte sectors: 255H 32S/T 65535C)

da9: quirks=0x1<NO_SYNC_CACHE>

da10 at ciss0 bus 0 scbus0 target 10 lun 0

da10: <COMPAQ RAID 0 OK> Fixed Direct Access SCSI-5 device

da10: Serial Number PACCP9SXU6IM

da10: 135.168MB/s transfers

da10: Command Queueing enabled

da10: 1907697MB (3906963632 512 byte sectors: 255H 32S/T 65535C)

da10: quirks=0x1<NO_SYNC_CACHE>

da11 at ciss0 bus 0 scbus0 target 11 lun 0

da11: <COMPAQ RAID 0 OK> Fixed Direct Access SCSI-5 device

da11: Serial Number PACCP9SXU6IM

da11: 135.168MB/s transfers

da11: Command Queueing enabled

da11: 1907697MB (3906963632 512 byte sectors: 255H 32S/T 65535C)

da11: quirks=0x1<NO_SYNC_CACHE>

SMP: AP CPU #1 Launched!

SMP: AP CPU #9 Launched!

SMP: AP CPU #4 Launched!

SMP: AP CPU #13 Launched!

SMP: AP CPU #6 Launched!

SMP: AP CPU #15 Launched!

SMP: AP CPU #7 Launched!

SMP: AP CPU #10 Launched!

SMP: AP CPU #3 Launched!

SMP: AP CPU #11 Launched!

SMP: AP CPU #5 Launched!

SMP: AP CPU #8 Launched!

SMP: AP CPU #2 Launched!

SMP: AP CPU #14 Launched!

SMP: AP CPU #12 Launched!

Timecounter "TSC-low" frequency 1400077924 Hz quality 1000

uhub0: 2 ports with 2 removable, self powered

uhub2: 2 ports with 2 removable, self powered

uhub3: 2 ports with 2 removable, self powered

uhub4: 2 ports with 2 removable, self powered

uhub1: 2 ports with 2 removable, self powered

uhub5: 6 ports with 6 removable, self powered

ugen5.2: <Generic> at usbus5

umass0: <Generic Mass Storage, class 0/0, rev 2.00/1.05, addr 2> on usbus5

umass0:  SCSI over Bulk-Only; quirks = 0x4101

umass0:9:0:-1: Attached to scbus9

da12 at umass-sim0 bus 0 scbus9 target 0 lun 0

da12: <Generic Flash Disk 8.07> Removable Direct Access SCSI-2 device

da12: Serial Number 9A73589E

da12: 40.000MB/s transfers

da12: 2033MB (4164608 512 byte sectors: 255H 63S/T 259C)

da12: quirks=0x2<NO_6_BYTE>

Trying to mount root from ufs:/dev/ufs/FreeNASs1a [ro]...

WARNING: /data was not properly dismounted

GEOM_RAID5: Module loaded, version 1.1.20130907.44 (rev 5c6d2a159411)

ZFS filesystem version: 5

ZFS storage pool version: features support (5000)

GEOM_ELI: Device da0p1.eli created.

GEOM_ELI: Encryption: AES-XTS 256

GEOM_ELI:  Crypto: software

GEOM_ELI: Device da1p1.eli created.

GEOM_ELI: Encryption: AES-XTS 256

GEOM_ELI:  Crypto: software

GEOM_ELI: Device da2p1.eli created.

GEOM_ELI: Encryption: AES-XTS 256

GEOM_ELI:  Crypto: software

GEOM_ELI: Device da3p1.eli created.

GEOM_ELI: Encryption: AES-XTS 256

GEOM_ELI:  Crypto: software

GEOM_ELI: Device da4p1.eli created.

GEOM_ELI: Encryption: AES-XTS 256

GEOM_ELI:  Crypto: software

GEOM_ELI: Device da5p1.eli created.

GEOM_ELI: Encryption: AES-XTS 256

GEOM_ELI:  Crypto: software

GEOM_ELI: Device da6p1.eli created.

GEOM_ELI: Encryption: AES-XTS 256

GEOM_ELI:  Crypto: software

GEOM_ELI: Device da7p1.eli created.

GEOM_ELI: Encryption: AES-XTS 256

GEOM_ELI:  Crypto: software

GEOM_ELI: Device da8p1.eli created.

GEOM_ELI: Encryption: AES-XTS 256

GEOM_ELI:  Crypto: software

GEOM_ELI: Device da9p1.eli created.

GEOM_ELI: Encryption: AES-XTS 256

GEOM_ELI:  Crypto: software

GEOM_ELI: Device da10p1.eli created.

GEOM_ELI: Encryption: AES-XTS 256

GEOM_ELI:  Crypto: software

GEOM_ELI: Device da11p1.eli created.

GEOM_ELI: Encryption: AES-XTS 256

GEOM_ELI:  Crypto: software

vboxdrv: fAsync=0 offMin=0x248 offMax=0x4a5e

bridge0: Ethernet address: 02:9b:8c:26:45:00

igb0: promiscuous mode enabled

bridge0: link state changed to UP

epair0a: Ethernet address: 02:cc:5b:00:0c:0a

epair0b: Ethernet address: 02:cc:5b:00:0d:0b

epair0a: link state changed to UP

epair0b: link state changed to UP

epair0a: promiscuous mode enabled

igb0: link state changed to DOWN

ng_ether_ifnet_arrival_event: can't re-name node epair0b

igb0: link state changed to UP

epair1a: Ethernet address: 02:c3:5b:00:0d:0a

epair1b: Ethernet address: 02:c3:5b:00:0e:0b

epair1a: link state changed to UP

epair1b: link state changed to UP

epair1a: promiscuous mode enabled

ng_ether_ifnet_arrival_event: can't re-name node epair1b

epair2a: Ethernet address: 02:78:60:00:0e:0a

epair2b: Ethernet address: 02:78:60:00:0f:0b

epair2a: link state changed to UP

epair2b: link state changed to UP

epair2a: promiscuous mode enabled

ng_ether_ifnet_arrival_event: can't re-name node epair2b

GEOM_ELI: Device da5p1.eli destroyed.

GEOM_ELI: Detached da5p1.eli on last close.

arp: 192.168.0.17 moved from 02:78:60:00:0e:0a to f4:ce:46:82:94:92 on epair2b

arp: 192.168.0.17 moved from 02:c3:5b:00:0d:0a to f4:ce:46:82:94:92 on epair1b

arp: 192.168.0.17 moved from 02:cc:5b:00:0c:0a to f4:ce:46:82:94:92 on epair0b 


smartctl output for the drive in question:

Code:
[root@NAME_REPLACED] ~# smartctl -a --device=cciss,5 /dev/ciss0

smartctl 6.2 2013-07-26 r3841 [FreeBSD 9.2-RELEASE-p10 amd64] (local build)

Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org


/dev/ciss0 [cciss_disk_05] [SCSI]: Device open changed type from 'sat,auto+cciss' to 'cciss'

=== START OF INFORMATION SECTION ===

Vendor:  SEAGATE

Product:  ST32000SSSUN2.0T

Revision:  0514

User Capacity:  2,000,398,934,016 bytes [2.00 TB]

Logical block size:  512 bytes

Rotation Rate:  7202 rpm

Device type:  disk

Transport protocol:  SAS

Local Time is:  Mon Aug 24 06:59:54 2015 MST

SMART support is:  Available - device has SMART capability.

SMART support is:  Enabled

Temperature Warning:  Enabled


=== START OF READ SMART DATA SECTION ===

SMART Health Status: HARDWARE IMPENDING FAILURE GENERAL HARD DRIVE FAILURE [asc=5d, ascq=10]


Current Drive Temperature:  38 C

Drive Trip Temperature:  68 C


Manufactured in week 42 of year 2010

Specified cycle count over device lifetime:  10000

Accumulated start-stop cycles:  61

Specified load-unload count over device lifetime:  300000

Accumulated load-unload cycles:  61

Elements in grown defect list: 219


Vendor (Seagate) cache information

  Blocks sent to initiator = 417260609

  Blocks received from initiator = 3296184888

  Blocks read from cache and sent to initiator = 1808591969

  Number of read and write commands whose size <= segment size = 54421900

  Number of read and write commands whose size > segment size = 0


Vendor (Seagate/Hitachi) factory information

  number of hours powered up = 10926.33

  number of minutes until next internal SMART test = 4


Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   624192224       48         0  624192272  624192546      16315.393         274
write:          0        0         0          0          0       6113.069           0
verify:      4482        0         0       4482       4482          0.078           0


Non-medium error count:  27


SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background long   Completed                   -    10924                 - [-   -    -]
# 2  Background long   Completed                   -    10912                 - [-   -    -]
# 3  Background long   Completed                   -    10900                 - [-   -    -]
# 4  Background long   Completed                   -    10888                 - [-   -    -]
# 5  Background long   Completed                   -    10876                 - [-   -    -]
# 6  Background long   Completed                   -    10869                 - [-   -    -]
# 7  Background long   Completed                   -    10852                 - [-   -    -]
# 8  Background long   Completed                   -    10840                 - [-   -    -]
# 9  Background long   Completed                   -    10828                 - [-   -    -]
#10  Background long   Completed                   -    10817                 - [-   -    -]
#11  Background long   Completed                   -    10804                 - [-   -    -]
#12  Background long   Completed                   -    10793                 - [-   -    -]
#13  Background long   Completed                   -    10780                 - [-   -    -]
#14  Background long   Completed                   -    10768                 - [-   -    -]
#15  Background long   Completed                   -    10756                 - [-   -    -]
#16  Background long   Completed                   -    10744                 - [-   -    -]
#17  Background long   Completed                   -    10732                 - [-   -    -]
#18  Background long   Completed                   -    10720                 - [-   -    -]
#19  Background long   Completed                   -    10708                 - [-   -    -]
#20  Background long   Completed                   -    10696                 - [-   -    -]

Long (extended) Self Test duration: 18500 seconds [308.3 minutes]

 

dlavigne

Guest
"no valid replicas" means that there isn't any redundancy. What does zpool status say?
 

nhmilleraz

Cadet
Joined
Aug 24, 2015
Messages
5
The output of zpool status -v is as follows:

Code:
[root@NAME_CHANGED] ~# zpool status -v
  pool: NAME_CHANGEDNAS
 state: ONLINE
  scan: scrub in progress since Mon Aug 24 06:13:44 2015
        5.33T scanned out of 5.67T at 489M/s, 0h11m to go
        0 repaired, 94.12% done
config:

        NAME                                          STATE     READ WRITE CKSUM
        NAME_CHANGEDNAS                               ONLINE       0     0     0
          gptid/95d8cdf8-294e-11e4-a834-f4ce46829492  ONLINE       0     0     0
          gptid/96b4f82a-294e-11e4-a834-f4ce46829492  ONLINE       0     0     0
          gptid/978bc9cf-294e-11e4-a834-f4ce46829492  ONLINE       0     0     0
          gptid/9863f4fb-294e-11e4-a834-f4ce46829492  ONLINE       0     0     0
          gptid/9946c3b3-294e-11e4-a834-f4ce46829492  ONLINE       0     0     0
          gptid/9a193a6d-294e-11e4-a834-f4ce46829492  ONLINE       0     0     0
          gptid/9aea8dbc-294e-11e4-a834-f4ce46829492  ONLINE       0     0     0
          gptid/9bb9795f-294e-11e4-a834-f4ce46829492  ONLINE       0     0     0
          gptid/9c8dea95-294e-11e4-a834-f4ce46829492  ONLINE       0     0     0
          gptid/9d5c05fe-294e-11e4-a834-f4ce46829492  ONLINE       0     0     0
          gptid/9e30a396-294e-11e4-a834-f4ce46829492  ONLINE       0     0     0
          gptid/9f218532-294e-11e4-a834-f4ce46829492  ONLINE       0     0     0

errors: No known data errors
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Your pool is a 12-disk stripe with absolutely no redundancy. When that disk fails, your data will disappear. If you have a way to connect the replacement disk to your machine without removing the failed disk, you can do the disk replacement without removing the failing disk first, which will solve your immediate problem. The replacement process will copy the data from the failing disk to the replacement, and should offline the failing disk when it's finished. To solve your pool configuration problem, you'll need to back up your data, destroy the pool, and recreate it with redundancy.
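For reference, the in-place replacement described above corresponds to a single `zpool replace` call at the ZFS level. The sketch below only prints the command rather than running it. The pool name and the failing member's gptid are taken from the output earlier in this thread; `da13` is a hypothetical name for the new disk, and the helper name is made up. On FreeNAS the GUI replace function is the usual route, so treat this as an illustration of what happens underneath, not as a procedure.

```shell
#!/bin/sh
# Dry run: build and print the command, do not execute it.
# Usage: replace_cmd POOL OLD_DEVICE NEW_DEVICE
replace_cmd() {
    printf 'zpool replace %s %s %s\n' "$1" "$2" "$3"
}

# Pool and failing member as shown in this thread; da13 is an assumed
# device name for the replacement disk.
replace_cmd NAME_CHANGEDNAS \
    gptid/9a193a6d-294e-11e4-a834-f4ce46829492 \
    da13
```

Because the old disk stays attached while the new one resilvers, the pool never has to run with a missing member, which is what makes this workable on a stripe.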

I'd think you should also have more RAM for 24 TB of storage.
 

nhmilleraz

Cadet
Joined
Aug 24, 2015
Messages
5
OK - thanks... so we do have a mess then... I had hoped I was just misunderstanding the output of zpool status.

Given that FreeNAS is not complaining about the disk - I'm moving to create a full backup of the volume right now and will delete / recreate the volume this evening. It'll be interesting to go through that process - and "pucker" when I push the big red button to kill the existing volume.

I appreciate the feedback - and your kindness re my "newbness" re this technology.
 

INCSlayer

Contributor
Joined
Apr 4, 2015
Messages
197
When you make a new pool, a single 12-disk vdev is not recommended. In that case it is better to make a zpool consisting of 2 RAIDZ2 vdevs (I think I got the terminology right).

Yes, that will lose you some space, but hey, that's the cost of redundancy.
 

nhmilleraz

Cadet
Joined
Aug 24, 2015
Messages
5
The simplistic approach that I THINK I am taking is as follows:

1) Attach an external 6TB drive (in a USB-attached cradle) to the server (I've done this and FreeNAS sees it)
2) Rsync the entire data set from the existing 12-disk stripe volume to the single drive (I've cleaned out some old data and am down to 4TB)
3) Validate the copy / re-Rsync the files tonight or early tomorrow AM immediately before step 4.
4) Delete the 12-disk stripe volume
5) Create a new RaidZ2 pool with sufficient space to handle the existing 4TB data set. I've got some reading to do - but I THINK I'm going to build a 6 disk pool (4+2) - with ~8 TB of space - and then repeat this with a second pool of the same dimensions for other uses
6) Rsync the data set back over to the new volume
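A quick sanity check on the capacity figure in step 5, as a rough sketch: it counts only data disks times disk size and ignores ZFS metadata overhead and the TB/TiB difference, so real usable space will be somewhat lower. The function name is made up for illustration.

```shell
#!/bin/sh
# Rough usable capacity of one vdev: (disks - parity disks) * disk size.
# Usage: raidz_usable_tb DISKS PARITY_DISKS DISK_SIZE_TB
raidz_usable_tb() {
    echo $(( ($1 - $2) * $3 ))
}

raidz_usable_tb 6 2 2    # one 6-disk RAIDZ2 vdev of 2 TB disks -> 8
raidz_usable_tb 12 0 2   # the current 12-disk stripe, no parity -> 24
```

So each six-disk RAIDZ2 group gives roughly 8 TB before overhead, which leaves headroom over the 4TB data set.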

Any immediate gotchas there? I know I've got to consider migration of user rights from the old volume to the new one... I'll start looking at that momentarily - but any short-cuts to processes for that work would be appreciated as well...
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
rsync, if done correctly, will preserve ownership, permissions, etc. You could also create a ZFS volume on the USB drive, and use ZFS replication to transfer the data--this will also preserve all the metadata.
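As a sketch of what "done correctly" might look like: archive mode (`-a`) carries ownership, permissions, timestamps and symlinks, `-H` keeps hard links, and `-n` makes a trial run that changes nothing. Both paths are hypothetical, the helper name is made up, and rsync has to run as root to preserve ownership. The helper only prints the commands.

```shell
#!/bin/sh
# Print the rsync invocations for a trial run and then the real copy.
# Usage: rsync_plan SRC_DIR DEST_DIR
rsync_plan() {
    # Trailing slash on the source copies its contents, not the directory itself.
    printf 'rsync -aHvn %s/ %s/\n' "$1" "$2"   # -n: dry run, changes nothing
    printf 'rsync -aHv %s/ %s/\n' "$1" "$2"    # the actual copy
}

# Both mount points below are assumptions, not paths from this thread.
rsync_plan /mnt/NAME_CHANGEDNAS /mnt/usb6tb
```

Running the same command a second time after the first pass copies only what changed in between, which suits the "re-Rsync immediately before step 4" idea.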

Your plan looks good. For step 5, you can create two separate pools, or you can have a single pool consisting of two six-disk RAIDZ2 vdevs. Which one better fits your use case is really your call.
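To make the single-pool option concrete, here is roughly what it looks like at the ZFS command level. This is only an illustration: `tank` and the `daN` device names are placeholders, the helper name is made up, the function just prints the command, and on FreeNAS you would normally build the pool through the GUI.

```shell
#!/bin/sh
# Print a zpool create line for one pool with two six-disk RAIDZ2 vdevs.
# Usage: two_raidz2_cmd POOL  (device names da0..da11 are placeholders)
two_raidz2_cmd() {
    vdev1="da0 da1 da2 da3 da4 da5"
    vdev2="da6 da7 da8 da9 da10 da11"
    printf 'zpool create %s raidz2 %s raidz2 %s\n' "$1" "$vdev1" "$vdev2"
}

two_raidz2_cmd tank
```

Two separate pools would instead be two `zpool create` calls with one `raidz2` group each.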
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
I vote you reevaluate things really quick. When rebuilding you will switch from a stripe layout to a RAIDZ-type layout, and some things are going to change: performance and total available storage. You will also have to reconfigure everything unless you keep the same dataset and pool names (maybe?). I don't think it is going to be trivial to destroy and recreate everything in one evening after people have left. I'm also not sure your hardware is a good choice for FreeNAS. Someone else should chime in on this, since I'm not sure what the platform is, but a quick look shows a RAID card and non-Intel NICs, which are not recommended hardware choices.
 

nhmilleraz

Cadet
Joined
Aug 24, 2015
Messages
5
Thanks SweetAndLow... The challenge that I have is that this is an existing solution being used by a small number of users (single digit) on a daily basis. It is the primary datastore for a small business (about 10 staff). If this dies today they are in a world of hurt. I've been brought in as a consultant to help them stabilize the environment after the (sudden) departure of the guy that built it... So - priority 1 was a static copy of the most critical data (done)... priority 2 was addressing the "failing disk" issue... priority 3 is getting some sort of redundancy in place... and so on... Downstream is a more comprehensive re-design of their storage solution. I appreciate the pointers... I'll fold that in either after step 2 or 3.... :smile:
 