FreeNAS 9.3 with degraded volume - need advice/help.

Houe

Dabbler
Joined
Nov 20, 2015
Messages
16
So the day has come. I woke up to an email informing me that my FreeNAS volume is in a degraded state. This data is very important to me and my family. I'm running an older version of FreeNAS, as stated in the title, so hopefully someone can still help me. When I view the volume status, it shows "DEGRADED". Three of the four drives are "ONLINE" and one is "UNAVAIL". When I click the "UNAVAIL" drive, the only option I see is a Replace button. The email I received from the server is:

The volume Edmund (ZFS) state is DEGRADED: One or more devices could not be opened. Sufficient replicas exist for the pool to continue functioning in a degraded state.

I later received another, very similar email with this added information:

The volume Edmund (ZFS) state is DEGRADED: One or more devices could not be opened. Sufficient replicas exist for the pool to continue functioning in a degraded state.
Device: /dev/ada2, unable to open device


I received one more email with a wall of text that I'll add as a reply to this thread.

I never received any SMART report saying the drive was failing. I don't know if FreeNAS even sees the drive anymore or if it is completely dead. Any way to tell?

This array is a RAIDZ2 array consisting of four 4TB drives. If my understanding is correct, my data will be fine even if I lose one more drive.
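As a rough check of that understanding, here is a minimal sketch of the redundancy arithmetic for a single RAIDZ vdev. The function names are made up for illustration, and the capacity figure ignores ZFS metadata and padding overhead, so treat it as an approximation only.

```python
def remaining_failures_tolerated(parity: int, unavailable: int) -> int:
    """How many more member disks a RAIDZ vdev can lose before data is lost.

    parity is 1 for RAIDZ1, 2 for RAIDZ2, 3 for RAIDZ3.
    """
    if unavailable > parity:
        raise ValueError("vdev has already lost more disks than its parity level")
    return parity - unavailable


def approx_usable_tb(disks: int, parity: int, disk_tb: float) -> float:
    """Approximate usable capacity: data disks times disk size (overhead ignored)."""
    return (disks - parity) * disk_tb


# Four 4TB disks in RAIDZ2 with one disk currently UNAVAIL.
print(remaining_failures_tolerated(parity=2, unavailable=1))  # 1
print(approx_usable_tb(disks=4, parity=2, disk_tb=4.0))       # 8.0
```

With one disk already gone, the pool can survive exactly one more failure, but a second failure would leave it with no redundancy at all.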

I think I can follow the manual's instructions to replace the drive with a new one, but my question is this: since I can still access the data and my "backup" is not up to date (it might be a year or more old), should I back up the data first, or should I just replace the drive and try to repair the volume? I know my backups should have been done more frequently, but life happens and it was neglected (my fault). How would you suggest I proceed?

Thanks
 

Houe

freenas.local kernel log messages:
> Waiting (max 60 seconds) for system process `vnlru' to stop...done
> Waiting (max 60 seconds) for system process `bufdaemon' to stop...done
> Waiting (max 60 seconds) for system process `syncer' to stop...
> Syncing disks, vnodes remaining...0 0 0 0 done All buffers synced.
> Copyright (c) 1992-2014 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
> The Regents of the University of California. All rights reserved.
> FreeBSD is a registered trademark of The FreeBSD Foundation.
> FreeBSD 9.3-RELEASE-p31 #0 r288272+e7e804d: Mon May 16 21:29:59 PDT 2016
>
> root@build3.ixsystems.com:/tank/home/stable-builds/FN/objs/os-base/amd
> 64/tank/home/stable-builds/FN/FreeBSD/src/sys/FREENAS.amd64 amd64 gcc
> version 4.2.1 20070831 patched [FreeBSD]
> CPU: Intel(R) Xeon(R) CPU E3-1225 v3 @ 3.20GHz (3192.68-MHz K8-class CPU)
> Origin = "GenuineIntel" Id = 0x306c3 Family = 0x6 Model = 0x3c Stepping = 3
> Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
> Features2=0x7ffafbff<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,SMX,EST,TM2,SSSE3,<b11>,FMA,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,TSCDLT,AESNI,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
> AMD Features=0x2c100800<SYSCALL,NX,Page1GB,RDTSCP,LM>
> AMD Features2=0x21<LAHF,ABM>
> Standard Extended Features=0x27ab<GSFSBASE,TSCADJ,SMEP,ENHMOVSB,INVPCID>
> TSC: P-state invariant, performance statistics real memory =
> 9107931136 (8686 MB) avail memory = 8022417408 (7650 MB) Event timer
> "LAPIC" quality 600
> ACPI APIC Table: <LENOVO TC-FB >
> FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
> FreeBSD/SMP: 1 package(s) x 4 core(s)
> cpu0 (BSP): APIC ID: 0
> cpu1 (AP): APIC ID: 2
> cpu2 (AP): APIC ID: 4
> cpu3 (AP): APIC ID: 6
> WARNING: VIMAGE (virtualized network stack) is a highly experimental feature.
> ACPI Warning: FADT (revision 5) is longer than ACPI 5.0 version,
> truncating length 268 to 256 (20111123/tbfadt-325)
> ioapic0 <Version 2.0> irqs 0-23 on motherboard
> kbd1 at kbdmux0
> cryptosoft0: <software crypto> on motherboard
> aesni0: <AES-CBC,AES-XTS> on motherboard
> padlock0: No ACE support.
> acpi0: <LENOVO TC-FB> on motherboard
> acpi0: Power Button (fixed)
> acpi0: reservation of 67, 1 (4) failed
> cpu0: <ACPI CPU> on acpi0
> cpu1: <ACPI CPU> on acpi0
> cpu2: <ACPI CPU> on acpi0
> cpu3: <ACPI CPU> on acpi0
> hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on
> acpi0 Timecounter "HPET" frequency 14318180 Hz quality 950 Event timer
> "HPET" frequency 14318180 Hz quality 550 Event timer "HPET1" frequency
> 14318180 Hz quality 440 Event timer "HPET2" frequency 14318180 Hz
> quality 440 Event timer "HPET3" frequency 14318180 Hz quality 440
> Event timer "HPET4" frequency 14318180 Hz quality 440
> atrtc0: <AT realtime clock> port 0x70-0x77 irq 8 on acpi0
> atrtc0: Warning: Couldn't map I/O.
> Event timer "RTC" frequency 32768 Hz quality 0
> attimer0: <AT timer> port 0x40-0x43,0x50-0x53 irq 0 on acpi0
> Timecounter "i8254" frequency 1193182 Hz quality 0 Event timer "i8254"
> frequency 1193182 Hz quality 100 Timecounter "ACPI-fast" frequency
> 3579545 Hz quality 900
> acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1808-0x180b on acpi0
> pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
> pci0: <ACPI PCI bus> on pcib0
> vgapci0: <VGA-compatible display> port 0xf000-0xf03f mem
> 0xf7800000-0xf7bfffff,0xe0000000-0xefffffff irq 16 at device 2.0 on
> pci0
> vgapci0: Boot video device
> pci0: <serial bus, USB> at device 20.0 (no driver attached)
> pci0: <simple comms> at device 22.0 (no driver attached)
> uart2: <Intel Lynx Point KT Controller> port 0xf0e0-0xf0e7 mem
> 0xf7c36000-0xf7c36fff irq 19 at device 22.3 on pci0
> em0: <Intel(R) PRO/1000 Network Connection 7.6.1-k> port 0xf080-0xf09f
> mem 0xf7c00000-0xf7c1ffff,0xf7c35000-0xf7c35fff irq 20 at device 25.0
> on pci0
> em0: Using an MSI interrupt
> em0: Ethernet address: 6c:0b:84:3d:c8:a7
> ehci0: <EHCI (generic) USB 2.0 controller> mem 0xf7c34000-0xf7c343ff
> irq 17 at device 26.0 on pci0
> usbus0: EHCI version 1.0
> usbus0 on ehci0
> pcib1: <ACPI PCI-PCI bridge> irq 16 at device 28.0 on pci0
> pci1: <ACPI PCI bus> on pcib1
> pcib2: <ACPI PCI-PCI bridge> irq 19 at device 28.3 on pci0
> pci2: <ACPI PCI bus> on pcib2
> pcib3: <ACPI PCI-PCI bridge> irq 19 at device 0.0 on pci2
> pci3: <ACPI PCI bus> on pcib3
> ehci1: <EHCI (generic) USB 2.0 controller> mem 0xf7c33000-0xf7c333ff
> irq 23 at device 29.0 on pci0
> usbus1: EHCI version 1.0
> usbus1 on ehci1
> isab0: <PCI-ISA bridge> at device 31.0 on pci0
> isa0: <ISA bus> on isab0
> ahci0: <Intel Lynx Point AHCI SATA controller> port
> 0xf0d0-0xf0d7,0xf0c0-0xf0c3,0xf0b0-0xf0b7,0xf0a0-0xf0a3,0xf060-0xf07f
> mem 0xf7c32000-0xf7c327ff irq 19 at device 31.2 on pci0
> ahci0: AHCI v1.30 with 5 6Gbps ports, Port Multiplier not supported
> ahcich0: <AHCI channel> at channel 0 on ahci0
> ahcich1: <AHCI channel> at channel 1 on ahci0
> ahcich2: <AHCI channel> at channel 2 on ahci0
> ahcich3: <AHCI channel> at channel 3 on ahci0
> ahcich4: <AHCI channel> at channel 4 on ahci0
> acpi_button0: <Power Button> on acpi0
> acpi_tz0: <Thermal Zone> on acpi0
> acpi_tz1: <Thermal Zone> on acpi0
> uart0: <16550 or compatible> port 0x3f8-0x3ff irq 4 flags 0x10 on
> acpi0
> orm0: <ISA Option ROM> at iomem 0xd0000-0xd0fff on isa0
> sc0: <System console> at flags 0x100 on isa0
> sc0: VGA <16 virtual consoles, flags=0x300>
> vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on
> isa0
> atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
> atkbd0: <AT Keyboard> irq 1 on atkbdc0
> kbd0 at atkbd0
> atkbd0: [GIANT-LOCKED]
> wbwd0: HEFRAS and EFER do not align: EFER 0x2e DevID 0xff DevRev 0xff
> CR26 0xff
> coretemp0: <CPU On-Die Thermal Sensors> on cpu0
> est0: <Enhanced SpeedStep Frequency Control> on cpu0
> coretemp1: <CPU On-Die Thermal Sensors> on cpu1
> est1: <Enhanced SpeedStep Frequency Control> on cpu1
> coretemp2: <CPU On-Die Thermal Sensors> on cpu2
> est2: <Enhanced SpeedStep Frequency Control> on cpu2
> coretemp3: <CPU On-Die Thermal Sensors> on cpu3
> est3: <Enhanced SpeedStep Frequency Control> on cpu3 ZFS filesystem
> version: 5 ZFS storage pool version: features support (5000)
> Timecounters tick every 1.000 msec
> ipfw2 (+ipv6) initialized, divert enabled, nat enabled, default to
> accept, logging disabled
> usbus0: 480Mbps High Speed USB v2.0
> usbus1: 480Mbps High Speed USB v2.0
> ugen0.1: <Intel> at usbus0
> uhub0: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on
> usbus0
> ugen1.1: <Intel> at usbus1
> uhub1: <Intel EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on
> usbus1
> (aprobe2:ahcich2:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00
> 00 00 00
> (aprobe2:ahcich2:0:0:0): CAM status: ATA Status Error
> (aprobe2:ahcich2:0:0:0): ATA status: 71 (DRDY DF SERV ERR), error: 04
> (ABRT )
> (aprobe2:ahcich2:0:0:0): RES: 71 04 00 00 00 40 00 00 00 00 00
> (aprobe2:ahcich2:0:0:0): Retrying command
> (aprobe2:ahcich2:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00
> 00 00 00
> (aprobe2:ahcich2:0:0:0): CAM status: ATA Status Error
> (aprobe2:ahcich2:0:0:0): ATA status: 71 (DRDY DF SERV ERR), error: 04
> (ABRT )
> (aprobe2:ahcich2:0:0:0): RES: 71 04 00 00 00 40 00 00 00 00 00
> (aprobe2:ahcich2:0:0:0): Error 5, Retries exhausted
> uhub0: 3 ports with 3 removable, self powered
> uhub1: 3 ports with 3 removable, self powered
> ugen0.2: <vendor 0x8087> at usbus0
> uhub2: <vendor 0x8087 product 0x8008, class 9/0, rev 2.00/0.04, addr
> 2> on usbus0
> ugen1.2: <vendor 0x8087> at usbus1
> uhub3: <vendor 0x8087 product 0x8000, class 9/0, rev 2.00/0.04, addr
> 2> on usbus1
> uhub2: 6 ports with 6 removable, self powered
> uhub3: 8 ports with 8 removable, self powered
> ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
> ada0: <ST4000VN000-1H4168 SC46> ATA-9 SATA 3.x device
> ada0: Serial Number Z304QQ4Q
> ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
> ada0: Command Queueing enabled
> ada0: 3815447MB (7814037168 512 byte sectors: 16H 63S/T 16383C)
> ada0: Previously was known as ad4
> ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
> ada1: <ST4000VN000-1H4168 SC46> ATA-9 SATA 3.x device
> ada1: Serial Number Z304QP9V
> ada1: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
> ada1: Command Queueing enabled
> ada1: 3815447MB (7814037168 512 byte sectors: 16H 63S/T 16383C)
> ada1: Previously was known as ad6
> ada2 at ahcich3 bus 0 scbus3 target 0 lun 0
> ada2: <HGST HDN724040ALE640 MJAOA5E0> ATA-8 SATA 3.x device
> ada2: Serial Number PK2338P4H5UEEC
> ada2: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
> ada2: Command Queueing enabled
> ada2: 3815447MB (7814037168 512 byte sectors: 16H 63S/T 16383C)
> ada2: Previously was known as ad10
> ada3 at ahcich4 bus 0 scbus4 target 0 lun 0
> ada3: <INTEL SSDSA2M080G2GC 2CV102M3> ATA-7 SATA 2.x device
> ada3: Serial Number CVPO0100014Y080JGN
> ada3: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
> ada3: Command Queueing enabled
> ada3: 76319MB (156301488 512 byte sectors: 16H 63S/T 16383C)
> ada3: quirks=0x1<4K>
> ada3: Previously was known as ad12
> SMP: AP CPU #1 Launched!
> SMP: AP CPU #3 Launched!
> SMP: AP CPU #2 Launched!
> Timecounter "TSC-low" frequency 1596337589 Hz quality 1000 Trying to
> mount root from zfs:freenas-boot/ROOT/FreeNAS-9.3-STABLE-201605170422 []...
> GEOM_RAID5: Module loaded, version 1.3.20140711.62 (rev f91e28e40bf7)
> wbwd0: HEFRAS and EFER do not align: EFER 0x2e DevID 0xff DevRev 0xff
> CR26 0xff
> GEOM_ELI: Device ada0p1.eli created.
> GEOM_ELI: Encryption: AES-XTS 256
> GEOM_ELI: Crypto: hardware
> GEOM_ELI: Device ada1p1.eli created.
> GEOM_ELI: Encryption: AES-XTS 256
> GEOM_ELI: Crypto: hardware
> GEOM_ELI: Device ada2p1.eli created.
> GEOM_ELI: Encryption: AES-XTS 256
> GEOM_ELI: Crypto: hardware
> vboxdrv: fAsync=0 offMin=0x2aa offMax=0xd13

-- End of security output --
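
One way to read a boot log like this is to compare the AHCI channels whose probe failed against the channels that ended up with a disk attached. The following is a minimal Python sketch of that idea; the regular expressions are assumptions fitted to the line formats quoted above, not a general parser for FreeBSD kernel messages.

```python
import re

# Patterns fitted to the quoted log lines (an assumption, not a general parser).
PROBE_FAIL = re.compile(
    r"\(aprobe\d+:(ahcich\d+):[\d:]+\): Error \d+, Retries exhausted"
)
ATTACHED = re.compile(r"\b(ada\d+) at (ahcich\d+) bus")


def summarize_channels(log_text):
    """Return (failed_channels, attached) where attached maps channel -> disk."""
    failed = set()
    attached = {}
    for raw in log_text.splitlines():
        line = re.sub(r"^>\s?", "", raw)  # drop the email quoting prefix
        m = PROBE_FAIL.search(line)
        if m:
            failed.add(m.group(1))
        m = ATTACHED.search(line)
        if m:
            attached[m.group(2)] = m.group(1)
    return sorted(failed), attached


sample = """\
> (aprobe2:ahcich2:0:0:0): Error 5, Retries exhausted
> ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
> ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
> ada2 at ahcich3 bus 0 scbus3 target 0 lun 0
> ada3 at ahcich4 bus 0 scbus4 target 0 lun 0
"""

failed, attached = summarize_channels(sample)
missing = [ch for ch in failed if ch not in attached]
print(missing)  # ['ahcich2']
```

In the log above, the probe on ahcich2 exhausts its retries and no disk attaches on that channel, while ada2 attaches on ahcich3. That would be consistent with the disk on ahcich2 not being detected at boot, though a cable or port fault could produce the same pattern.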
 

Houe

Another email I received:

Checking status of zfs pools:
NAME          SIZE   ALLOC  FREE   EXPANDSZ  FRAG  CAP  DEDUP  HEALTH    ALTROOT
Edmund        14.5T  3.39T  11.1T  -         9%    23%  1.00x  DEGRADED  /mnt
freenas-boot  74.5G  1.55G  73.0G  -         -     2%   1.00x  ONLINE    -

pool: Edmund
state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for
the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
see: http://illumos.org/msg/ZFS-8000-2Q
scan: scrub repaired 0 in 3h2m with 0 errors on Sun Dec 8 03:02:26 2019
config:

NAME                                            STATE     READ WRITE CKSUM
Edmund                                          DEGRADED     0     0     0
  raidz2-0                                      DEGRADED     0     0     0
    gptid/834c8c1a-acc3-11e5-b60b-6c0b843dc8a7  ONLINE       0     0     0
    gptid/8408e0ba-acc3-11e5-b60b-6c0b843dc8a7  ONLINE       0     0     0
    9997707428433300241                         UNAVAIL      0     0     0  was /dev/gptid/84c03ddc-acc3-11e5-b60b-6c0b843dc8a7
    gptid/858346c2-acc3-11e5-b60b-6c0b843dc8a7  ONLINE       0     0     0

errors: No known data errors

-- End of daily output --
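
To pick out the affected member from a status report like this one, a small script can scan the config table for rows in a problem state. This is a minimal Python sketch under the assumption that the text follows the `zpool status` layout quoted above; the function name and the set of states treated as problems are choices made here for illustration.

```python
# States treated as "problem" states for a member device (an illustrative choice).
PROBLEM_STATES = {"UNAVAIL", "FAULTED", "OFFLINE", "REMOVED"}


def find_problem_devices(status_text):
    """Return (name, state, previous_path) for each problem row in the config table."""
    problems = []
    in_config = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:2] == ["NAME", "STATE"]:
            in_config = True  # rows after this header belong to the config table
            continue
        if not in_config or len(fields) < 2:
            continue
        if fields[0] == "errors:":
            break  # end of the config table
        if fields[1] in PROBLEM_STATES:
            # zpool appends "was <path>" when the original device is missing.
            old = fields[fields.index("was") + 1] if "was" in fields[:-1] else None
            problems.append((fields[0], fields[1], old))
    return problems


sample = """\
config:

NAME STATE READ WRITE CKSUM
Edmund DEGRADED 0 0 0
raidz2-0 DEGRADED 0 0 0
gptid/834c8c1a-acc3-11e5-b60b-6c0b843dc8a7 ONLINE 0 0 0
gptid/8408e0ba-acc3-11e5-b60b-6c0b843dc8a7 ONLINE 0 0 0
9997707428433300241 UNAVAIL 0 0 0 was /dev/gptid/84c03ddc-acc3-11e5-b60b-6c0b843dc8a7
gptid/858346c2-acc3-11e5-b60b-6c0b843dc8a7 ONLINE 0 0 0

errors: No known data errors
"""

for name, state, old in find_problem_devices(sample):
    print(name, state, old)
```

The numeric name is the GUID ZFS shows when it cannot open the device, and the "was" path is the gptid label the missing disk had when it was last seen.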
 

Alecmascot

Guru
Joined
Mar 18, 2014
Messages
1,177
Probably best to replace the defective drive with the spare you have been keeping for this contingency, then do a backup after the resilver is finished.
The backup may be slow if the pool is still degraded.
 