Pool unavailable, all disks online but cannot import. (a bit desperate here)

sgn.advertising · Mar 10, 2022

Hello! I'm fairly new to TrueNAS. I built the "server" in April 2021.

SYSTEM:
Intel Core i3-8100 CPU
32GB DDR4 Memory
GIGABYTE z390 AORUS PRO
DeLock PCI-E 10-SATA ports ( ASMedia / JMicron chipset)
6x 8TB Seagate IronWolf HDDs
2x 128GB Kingston SSDs (one for boot, one for a fast SSD pool used to run VMs)
1x 480GB ADATA SSD (just a temporary SSD share, I didn't have anything better to do with it)
450W 80+ Gold PSU

Pool DataVault1 - 3 mirrored vdevs made from the 6x 8TB Seagate HDDs

PROBLEM:
Sometime in January I started getting errors and notifications from the system. I don't have screenshots of those errors, but I will try to describe them as best I can.
At first the errors were about disks, like: Device ada2 not capable of SMART self-test
At the same time my pool would become unhealthy, with checksum errors on the drive(s) that reported that error (I'm talking 5-15 checksum errors; once it was one drive with 200-something).

I started reading up on what could cause this, but due to a full schedule I wasn't consistent in my troubleshooting and it took me a few weeks. I ran short and long SMART tests on all 6 drives. All clean.
In the meantime I cleared the checksum errors on the pool and everything was running smoothly.
I read here on the forum that I should check the CRC_Error_Count, and indeed I found RAW_VALUEs around 1-5. So, on the recommendation of multiple users, two days ago I replaced all the cables leading to the 6 Seagate HDDs.
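
For anyone who wants to repeat that check, here is a minimal sketch of pulling the CRC counter out of `smartctl -A` output. It assumes the usual smartmontools attribute table for ATA drives, where attribute 199 is the UDMA CRC error count and the raw value is the tenth column. `crc_count` is a made-up helper name, not an existing tool.

```shell
# Hypothetical helper: reads `smartctl -A` output on stdin and prints the
# raw value of SMART attribute 199 (UDMA CRC error count).
# Assumes the standard 10-column ATA attribute table.
crc_count() {
    awk '$1 == 199 { print $10 }'
}

# Possible usage on the NAS (device name is only an example):
#   smartctl -A /dev/ada0 | crc_count
```

As far as I know this counter is cumulative and is not reset by swapping cables, so a small value that stays the same is probably old history, while a value that keeps climbing would point at the cable, port or controller.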

At this moment all hell broke loose. While the machine was booting I started seeing these messages in the console, in a loop:

Code:

Root mount waiting for: CAM usbus0
(noperiph:ahcich0:0:-1:ffffffff): rescan already queued
uhub0: 26 ports with 26 removable, self powered
ses0 at ahciem0 bus 0 scbus16 target 0 lun 0
ses0: <AHCI SGPIO Enclosure 2.00 0001> SEMB S-E-S 2.00 device
ses0: SEMB SES Device
ada0 at ahcich1 bus 0 scbus1 target 0 lun 0
ada0: <ST8000VN004-2M2101 SC60> ACS-4 ATA SATA 3.x device
ada0: Serial Number WSD07MD2
ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 7630885MB (15628053168 512 byte sectors)
ses0: (none) in 'Slot 00', SATA Slot: scbus10 target 0
ses0: (none) in 'Slot 01', SATA Slot: scbus11 target 0
ada1 at ahcich10 bus 0 scbus8 target 0 lun 0
ada1: <ADATA SU630 XD0R00V0> ACS-3 ATA SATA 3.x device
ada1: Serial Number 2K5229QBDNLW
ada1: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 457862MB (937703088 512 byte sectors)
ses0: (none) in 'Slot 02', SATA Slot: scbus12 target 0
ada2 at ahcich11 bus 0 scbus9 target 0 lun 0
ada2: <KINGSTON SA400S37120G SBFKB1D1> ACS-4 ATA SATA 3.x device
ada2: Serial Number 50026B7682F830F9
ada2: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada2: Command Queueing enabled
ada2: 114473MB (234441648 512 byte sectors)
ses0: (none) in 'Slot 03', SATA Slot: scbus13 target 0
ada3 at ahcich12 bus 0 scbus10 target 0 lun 0
ada3: <ST8000VN004-2M2101 SC60> ACS-4 ATA SATA 3.x device
ada3: Serial Number WSD07PYV
ada3: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada3: Command Queueing enabled
ada3: 7630885MB (15628053168 512 byte sectors)
ses0: (none) in 'Slot 04', SATA Slot: scbus14 target 0
ada4 at ahcich13 bus 0 scbus11 target 0 lun 0
ada4: <ST8000VN004-2M2101 SC60> ACS-4 ATA SATA 3.x device
ada4: Serial Number WSD0QZAT
ada4: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada4: Command Queueing enabled
ada4: 7630885MB (15628053168 512 byte sectors)
ses0: (none) in 'Slot 05', SATA Slot: scbus15 target 0
ada5 at ahcich14 bus 0 scbus12 target 0 lun 0
ada5: <ST8000VN004-2M2101 SC60> ACS-4 ATA SATA 3.x device
ada5: Serial Number WSD0QYYQ
ada5: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada5: Command Queueing enabled
ada5: 7630885MB (15628053168 512 byte sectors)
ada6 at ahcich15 bus 0 scbus13 target 0 lun 0
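
Since adaN numbers can change whenever cables are moved while serial numbers do not, a log like the one above can be boiled down to a device/serial/model list. This is only a sketch that assumes the FreeBSD boot message format shown in this post; `disk_inventory` is a made-up helper name.

```shell
# Hypothetical helper: reads FreeBSD boot messages on stdin and prints
# "device serial model" for every ada disk that reports a serial number.
disk_inventory() {
    awk '
        /^ada[0-9]+: </ {
            dev = $1; sub(/:$/, "", dev)
            # keep the text between < and > (model plus firmware revision)
            s = $0; sub(/^[^<]*</, "", s); sub(/>.*$/, "", s)
            model[dev] = s
        }
        /^ada[0-9]+: Serial Number / {
            dev = $1; sub(/:$/, "", dev)
            print dev, $4, model[dev]
        }
    '
}

# Possible usage on the NAS:
#   dmesg | disk_inventory
```

Running it before and after recabling would show which physical drive ended up behind which device name.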

When the server was finally up and running (after more than 20 minutes), my main pool was in a faulted state and already resilvering. The progress percentage was going up really slowly and the estimate was 5-6 days (fluctuating).

After more than 24h there was no progress on the resilvering, and of the 6 drives 2 were online, 2 removed, and I think 1 faulted or in some other status, I cannot remember.
"zpool status" no longer shows the DataVault1 pool. Today I shut down the server and decided to take another possible source of problems out of the equation, the 10-port SATA PCI-E card: I connected all 6 HDDs to the motherboard and moved the boot SSD to the SATA card.

At first, when running zpool import, I saw the DataVault1 pool with only 3 drives online; of the other 3, one was faulted and two unavail. While I was reading lots of forum posts some 3-4 hours went by, and now when I run "zpool import" the pool looks like this:

Code:

root@ODIN[~]# zpool import
   pool: DataVault1
     id: 151585561031890332
  state: ONLINE
status: One or more devices were being resilvered.
 action: The pool can be imported using its name or numeric identifier.
 config:

        DataVault1                                      ONLINE
          mirror-0                                      ONLINE
            gptid/8e3aabdb-8899-11eb-b080-b42e9949088d  ONLINE
            gptid/8ea1f970-8899-11eb-b080-b42e9949088d  ONLINE
          mirror-1                                      ONLINE
            gptid/8ed779a6-8899-11eb-b080-b42e9949088d  ONLINE
            gptid/8f0162db-8899-11eb-b080-b42e9949088d  ONLINE
          mirror-2                                      ONLINE
            gptid/8c1f3017-8899-11eb-b080-b42e9949088d  ONLINE
            gptid/8d8996d3-8899-11eb-b080-b42e9949088d  ONLINE
root@ODIN[~]# zpool import -f 151585561031890332
cannot import 'DataVault1': one or more devices is currently unavailable

As you can see, all drives are now ONLINE... I'm very, very confused.
But running the import command on that pool says that devices are unavailable.
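
One thing that could be checked at this point is whether every gptid label in the import listing is actually visible to the system at that moment. A rough sketch, assuming both command outputs have been saved to files first; `missing_labels` is a made-up helper name, not a ZFS or TrueNAS tool.

```shell
# Hypothetical helper, not part of ZFS or TrueNAS.
#   $1 = file with saved `zpool import` output
#   $2 = file with saved `glabel status` output
# Prints each gptid label named in the pool config that glabel does not list.
missing_labels() {
    awk '
        FILENAME == ARGV[1] {                # first file given to awk: glabel status
            if ($1 ~ /^gptid\//) have[$1] = 1
            next
        }
        $1 ~ /^gptid\// && !($1 in have) { print $1 }   # second file: zpool import
    ' "$2" "$1"
}

# Possible usage on the NAS (file paths are only examples):
#   zpool import  > /tmp/import.txt
#   glabel status > /tmp/glabel.txt
#   missing_labels /tmp/import.txt /tmp/glabel.txt
```

Empty output would mean every label the pool expects is currently present, which would suggest the "unavailable" message has another cause than a partition that is simply not there.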

I'm a bit desperate and cannot comprehend what is happening here, so I would very much appreciate any help.

PS: I know the saying about making multiple backups. I have a replica of this server in mind for redundancy, but right now I cannot afford to build another 48TB NAS.

PS 2: Ask me for anything more that I should provide. I know my way around computers pretty well, but I am really new at this TrueNAS thing :D

Patrick M. Hausen · Mar 10, 2022

What does a zpool status -v show now? And please don't do anything else to the pool, just report the status.

sgn.advertising · Mar 10, 2022

Nothing regarding the big data pool :(

Code:

root@ODIN[~]# zpool status -v
  pool: VM-Services
 state: ONLINE
  scan: scrub repaired 0B in 00:10:24 with 0 errors on Sun Feb 27 00:10:24 2022
config:

        NAME                                          STATE     READ WRITE CKSUM
        VM-Services                                   ONLINE       0     0     0
          gptid/2681bc21-4089-11ec-8755-3c7c3f4b6b3a  ONLINE       0     0     0
          gptid/71f9abcb-4176-11ec-9d22-3c7c3f4b6b3a  ONLINE       0     0     0

errors: No known data errors

  pool: boot-pool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:35 with 0 errors on Tue Mar  8 03:45:35 2022
config:

        NAME        STATE     READ WRITE CKSUM
        boot-pool   ONLINE       0     0     0
          ada8p2    ONLINE       0     0     0

errors: No known data errors
root@ODIN[~]#

Etorix · Mar 10, 2022

sgn.advertising said:
SYSTEM:
Intel Core i3-8100 CPU
32GB DDR4 Memory
GIGABYTE z390 AORUS PRO
DeLock PCI-E 10-SATA ports ( ASMedia / JMicron chipset) <===

I fear your problem is this inappropriate card. What is attached to it?

Multiply your problems with SATA Port Multipliers and cheap SATA controllers

jgreco submitted a new resource: Multiply your problems with SATA Port Multipliers and cheap SATA controllers - Bad technology, multiplied by more evil In the last year or two, we've had a resurgence of users asking about SATA Port Multipliers and cheap SATA controllers. Please, do NOT use...

www.truenas.com

First, let's diagnose, then you may need to think about getting a proper HBA or attaching all six drives directly to the motherboard.

Patrick M. Hausen · Mar 10, 2022

Etorix said:
I fear your problem is this inappropriate card.

The OP already wrote that all disks are attached to the mainboard now instead.

sgn.advertising · Mar 10, 2022

Etorix said:
I fear your problem is this inappropriate card. What is attached to it?

The 6 HDDs were attached to it before, back when I had the sporadic checksum errors.

Etorix said:
First, let's diagnose, then you may need to think about getting a proper HBA or attaching all six drives directly to the motherboard.

Right now the 6 HDDs are connected to the mobo, and I am willing to buy a proper HBA card since the mobo only has 6 ports. But I can't seem to find one in any store in my country (Romania).
I only find these low-end SATA "extenders" in every IT store. If you have any recommendations on HBAs I am here to listen, and I will try to find them in some international stores.

sgn.advertising · Mar 10, 2022

Here is an update on the SATA expansion card.
This is the product from DeLock

Jailer · Mar 10, 2022

sgn.advertising said:
If you have any recommendation on HBAs I am here to listen and I will try and find them on some international stores.

Is eBay an option for you? There are plenty of used HBAs for sale there that many members use.

sgn.advertising · Mar 10, 2022

It's kind of an option, yes. Due to customs in our country it would take around 2 months for an order to arrive, but yes, I could order.
Do you have a suggestion for a specific HBA card? I am not that knowledgeable about these types of components. What should I look for?

Etorix · Mar 10, 2022

Basically anything based on an LSI 2008, 2308 or 3008… It shouldn't take two months to arrive and clear customs if you can find a seller in the EU.
For instance, a seller on ServeTheHome.com offers IT-flashed Dell PERC H200 for 35E.

What does zpool status -v report?

sgn.advertising · Mar 11, 2022

Etorix said:
Basically anything based on an LSI 2008, 2308 or 3008… It shouldn't take two months to arrive and clear customs if you can find a seller in the EU.
For instance, a seller on ServeTheHome.com offers IT-flashed Dell PERC H200 for 35E.

What does zpool status -v report?

Okay. Thank you for the suggestion.

By the way, the zpool status -v output was already posted a few comments above.

sgn.advertising · Mar 11, 2022

So, next steps:
I found and secured an order for an LSI H310 card. It will arrive in a few days. What should I do next?

Turn off the server completely and wait for the card? And as soon as it arrives, connect all 6 HDDs to it and boot the server?

Is there anything else I should do?

Thank you to everyone offering their help. I hope that, with your help, I will be able to fix these issues.

Patrick M. Hausen · Mar 11, 2022

If the pool cannot be imported with all drives connected to the mainboard, connecting the same drives to a new HBA won't change that. You need the HBA for future stable and reliable operation while providing room for expansion. The current problem with the pool - sorry, I have no idea. If the six connected and visible ("ONLINE") drives are indeed the complete pool, you should be able to import. If there are drives missing because you don't have enough SATA ports on the mainboard, well, then it's obvious. But I did not read that from your messages.

Someone with deeper ZFS knowledge needs to step in.

sgn.advertising · Mar 11, 2022

So, here is an update. Right now I'm at the office and have access to the server.
I ran another zpool status -v and the result is the same

Code:

root@ODIN[~]# zpool status -v
  pool: VM-Services
 state: ONLINE
  scan: resilvered 0B in 00:00:00 with 0 errors on Fri Mar 11 13:39:02 2022
config:

        NAME                                          STATE     READ WRITE CKSUM
        VM-Services                                   ONLINE       0     0     0
          gptid/2681bc21-4089-11ec-8755-3c7c3f4b6b3a  ONLINE       0     0     0
          gptid/71f9abcb-4176-11ec-9d22-3c7c3f4b6b3a  ONLINE       0     0     0

errors: No known data errors

  pool: boot-pool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:35 with 0 errors on Tue Mar  8 03:45:35 2022
config:

        NAME        STATE     READ WRITE CKSUM
        boot-pool   ONLINE       0     0     0
          ada8p2    ONLINE       0     0     0

errors: No known data errors

BUT... some weird things are happening.

I ran zpool import again and things are no longer okay:

Code:

root@ODIN[~]# zpool import
   pool: DataVault1
     id: 151585561031890332
  state: FAULTED
status: One or more devices were being resilvered.
 action: The pool cannot be imported due to damaged devices or data.
        The pool may be active on another system, but can be imported using
        the '-f' flag.
 config:

        DataVault1                                      FAULTED  corrupted data
          mirror-0                                      DEGRADED
            gptid/8e3aabdb-8899-11eb-b080-b42e9949088d  FAULTED  corrupted data
            gptid/8ea1f970-8899-11eb-b080-b42e9949088d  ONLINE
          mirror-1                                      DEGRADED
            gptid/8ed779a6-8899-11eb-b080-b42e9949088d  UNAVAIL  cannot open
            gptid/8f0162db-8899-11eb-b080-b42e9949088d  UNAVAIL  corrupted data
          mirror-2                                      ONLINE
            gptid/8c1f3017-8899-11eb-b080-b42e9949088d  ONLINE
            gptid/8d8996d3-8899-11eb-b080-b42e9949088d  ONLINE

Second zpool import 10 minutes later

Code:

root@ODIN[~]# zpool import
   pool: DataVault1
     id: 151585561031890332
  state: DEGRADED
status: One or more devices were being resilvered.
 action: The pool can be imported despite missing or damaged devices.  The
        fault tolerance of the pool may be compromised if imported.
 config:

        DataVault1                                      DEGRADED
          mirror-0                                      ONLINE
            gptid/8e3aabdb-8899-11eb-b080-b42e9949088d  ONLINE
            gptid/8ea1f970-8899-11eb-b080-b42e9949088d  ONLINE
          mirror-1                                      DEGRADED
            gptid/8ed779a6-8899-11eb-b080-b42e9949088d  ONLINE
            ada1p2                                      UNAVAIL  cannot open
          mirror-2                                      ONLINE
            gptid/8c1f3017-8899-11eb-b080-b42e9949088d  ONLINE
            gptid/8d8996d3-8899-11eb-b080-b42e9949088d  ONLINE

Something really shady is going on there
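
The two listings above can also be compared mechanically rather than by eye. A small sketch, assuming each `zpool import` output was saved to its own file; `state_changes` is a made-up helper name, and the list of state keywords is just the set I would expect to see in such a listing.

```shell
# Hypothetical helper: $1 and $2 are files holding an earlier and a later
# `zpool import` listing. Prints every pool/vdev/disk whose state differs.
state_changes() {
    awk '
        # Config rows look like "<name> <STATE> [reason]"; skip "state: ..." header lines.
        $1 !~ /:$/ && $2 ~ /^(ONLINE|DEGRADED|FAULTED|UNAVAIL|REMOVED|OFFLINE)$/ {
            if (FILENAME == ARGV[1]) { before[$1] = $2; next }
            seen[$1] = 1
            if (!($1 in before))        print $1 ": (not listed) -> " $2
            else if (before[$1] != $2)  print $1 ": " before[$1] " -> " $2
        }
        END {
            # entries that were in the earlier listing but are gone from the later one
            for (n in before)
                if (!(n in seen)) print n ": " before[n] " -> (not listed)"
        }
    ' "$1" "$2"
}

# Possible usage on the NAS (file paths are only examples):
#   zpool import > /tmp/import-1.txt
#   ... wait a while ...
#   zpool import > /tmp/import-2.txt
#   state_changes /tmp/import-1.txt /tmp/import-2.txt
```

Besides the disks that flip between FAULTED/UNAVAIL and ONLINE, this kind of comparison also makes it obvious that the second member of mirror-1 is named by its gptid in the first listing and as ada1p2 in the second.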

mistermanko · Mar 11, 2022

Seeing your "gaming-branded" mainboard, I suspect the board itself. I would recheck the cabling on every SATA connection. You could also try installing TrueNAS SCALE in another boot environment and importing the pool there. SCALE sometimes has better driver support for "non-server" hardware.
