sgn.advertising
Cadet
- Joined
- Jul 2, 2021
- Messages
- 9
Hello! I'm fairly new to TrueNAS. Built the "server" in April 2021.
SYSTEM:
Intel Core i3-8100 CPU
32GB DDR4 Memory
GIGABYTE z390 AORUS PRO
DeLock PCI-E 10-SATA ports ( ASMedia / JMicron chipset)
6x 8TB Seagate IronWolf HDDs
2x 128GB Kingston SSDs (one for boot, one for a fast SSD pool used to run VMs)
1x 480GB ADATA SSD (just a temporary SSD share, didn't have anything better to do with the SSD)
450W 80+ Gold PSU
Pool DataVault1 - 3 mirrored vdevs made from the 6x 8TB Seagate HDDs
PROBLEM:
Sometime in January I started getting errors and notifications from the system. I don't have screenshots of those errors, but I will try to describe them as best I can.
The errors were at first regarding disks...like: Device ada2 not capable of SMART self-test
At the same time my pool would become unhealthy, with some checksum errors on the drive(s) that reported that error (I'm talking 5-15 checksum errors; once it was one drive with 200 and something).
I started reading about what could cause this, but due to a full schedule I wasn't consistent in my troubleshooting and it took me a few weeks. I ran short and long SMART tests on all 6 drives. All clean.
In the meantime I cleared the checksum errors on the pool and everything was running smoothly.
I read here on the forum that I should check the CRC_Error_Count, and indeed I found RAW_VALUEs of around 1-5. So, on the recommendation of multiple users, two days ago I replaced all the cables leading to the 6 Seagate HDDs.
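For anyone who wants to check the same attribute, a minimal sketch of pulling the raw CRC count out of `smartctl -A` text might look like this. It assumes the attribute is reported under the name `UDMA_CRC_Error_Count` in smartctl's usual ten-column attribute table; the sample row is made up for illustration (it is not copied from my drives), and `crc_error_count` is just a hypothetical helper name.

```python
def crc_error_count(smartctl_output):
    """Return the raw UDMA_CRC_Error_Count from `smartctl -A` text, or None if absent."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have ten columns:
        # ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[1] == "UDMA_CRC_Error_Count":
            return int(fields[9])
    return None


# Illustrative sample only (assumption): one header row and one attribute row.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       5
"""

if __name__ == "__main__":
    print(crc_error_count(SAMPLE))
```

Since this counter normally only goes up, what matters is whether it keeps growing after the cables are swapped, so it would make sense to note the value per drive before and after.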
At that moment all hell broke loose. While booting the machine I started seeing these messages in the console in a loop:
Code:
Root mount waiting for: CAM usbus0
(noperiph:ahcich0:0:-1:ffffffff): rescan already queued
uhub0: 26 ports with 26 removable, self powered
ses0 at ahciem0 bus 0 scbus16 target 0 lun 0
ses0: <AHCI SGPIO Enclosure 2.00 0001> SEMB S-E-S 2.00 device
ses0: SEMB SES Device
ada0 at ahcich1 bus 0 scbus1 target 0 lun 0
ada0: <ST8000VN004-2M2101 SC60> ACS-4 ATA SATA 3.x device
ada0: Serial Number WSD07MD2
ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 7630885MB (15628053168 512 byte sectors)
ses0: (none) in 'Slot 00', SATA Slot: scbus10 target 0
ses0: (none) in 'Slot 01', SATA Slot: scbus11 target 0
ada1 at ahcich10 bus 0 scbus8 target 0 lun 0
ada1: <ADATA SU630 XD0R00V0> ACS-3 ATA SATA 3.x device
ada1: Serial Number 2K5229QBDNLW
ada1: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 457862MB (937703088 512 byte sectors)
ses0: (none) in 'Slot 02', SATA Slot: scbus12 target 0
ada2 at ahcich11 bus 0 scbus9 target 0 lun 0
ada2: <KINGSTON SA400S37120G SBFKB1D1> ACS-4 ATA SATA 3.x device
ada2: Serial Number 50026B7682F830F9
ada2: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada2: Command Queueing enabled
ada2: 114473MB (234441648 512 byte sectors)
ses0: (none) in 'Slot 03', SATA Slot: scbus13 target 0
ada3 at ahcich12 bus 0 scbus10 target 0 lun 0
ada3: <ST8000VN004-2M2101 SC60> ACS-4 ATA SATA 3.x device
ada3: Serial Number WSD07PYV
ada3: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada3: Command Queueing enabled
ada3: 7630885MB (15628053168 512 byte sectors)
ses0: (none) in 'Slot 04', SATA Slot: scbus14 target 0
ada4 at ahcich13 bus 0 scbus11 target 0 lun 0
ada4: <ST8000VN004-2M2101 SC60> ACS-4 ATA SATA 3.x device
ada4: Serial Number WSD0QZAT
ada4: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada4: Command Queueing enabled
ada4: 7630885MB (15628053168 512 byte sectors)
ses0: (none) in 'Slot 05', SATA Slot: scbus15 target 0
ada5 at ahcich14 bus 0 scbus12 target 0 lun 0
ada5: <ST8000VN004-2M2101 SC60> ACS-4 ATA SATA 3.x device
ada5: Serial Number WSD0QYYQ
ada5: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada5: Command Queueing enabled
ada5: 7630885MB (15628053168 512 byte sectors)
ada6 at ahcich15 bus 0 scbus13 target 0 lun 0
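To keep track of which physical disk sits on which controller channel after moving cables around, the boot messages above can be boiled down to a small table. This is only a rough sketch: `disks_from_boot_log` is a hypothetical helper, the sample text is a short excerpt of the log above, and the regular expressions assume the `adaN at ahcichM`, `adaN: <model>` and `adaN: Serial Number ...` message shapes seen in that log.

```python
import re

# Short excerpt of the boot messages from this post (ada0 and ada1 only).
BOOT_LOG = (
    "ada0 at ahcich1 bus 0 scbus1 target 0 lun 0 "
    "ada0: <ST8000VN004-2M2101 SC60> ACS-4 ATA SATA 3.x device "
    "ada0: Serial Number WSD07MD2 "
    "ada1 at ahcich10 bus 0 scbus8 target 0 lun 0 "
    "ada1: <ADATA SU630 XD0R00V0> ACS-3 ATA SATA 3.x device "
    "ada1: Serial Number 2K5229QBDNLW "
)


def disks_from_boot_log(text):
    """Map each adaN device to its AHCI channel, model string and serial number."""
    disks = {}
    for dev, channel in re.findall(r"\b(ada\d+) at (ahcich\d+)", text):
        disks.setdefault(dev, {})["channel"] = channel
    for dev, model in re.findall(r"\b(ada\d+): <([^>]+)>", text):
        disks.setdefault(dev, {})["model"] = model
    for dev, serial in re.findall(r"\b(ada\d+): Serial Number (\S+)", text):
        disks.setdefault(dev, {})["serial"] = serial
    return disks


if __name__ == "__main__":
    for dev, info in sorted(disks_from_boot_log(BOOT_LOG).items()):
        print(dev, info.get("channel"), info.get("serial"), info.get("model"))
```

The patterns do not depend on line breaks, so the same function should work whether the log is pasted as one long line or one message per line.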
When the server was finally up and running (after more than 20 minutes), my main pool was in a faulted state and already resilvering. The progress percentage was going up really slowly and the estimate was 5-6 days (fluctuating).
After more than 24h there was no progress on the resilvering, and of the 6 drives 2 were online, 2 removed, and I think 1 faulted or in some other status, I cannot remember.
"zpool status" no longer shows the DataVault1 pool. Today I stopped the server and decided to take another possible source of problems out of the equation, the 10-port SATA PCI-E card: I connected all 6 HDDs directly to the motherboard and moved the boot SSD to the SATA card.
At first, when running "zpool import", I was seeing the DataVault1 pool with only 3 drives online, and of the other 3, one faulted and two unavailable. While I was reading lots of forum posts some 3-4 hours went by, and now when I run "zpool import" the pool looks like this:
Code:
root@ODIN[~]# zpool import
   pool: DataVault1
     id: 151585561031890332
  state: ONLINE
 status: One or more devices were being resilvered.
 action: The pool can be imported using its name or numeric identifier.
 config:

        DataVault1                                      ONLINE
          mirror-0                                      ONLINE
            gptid/8e3aabdb-8899-11eb-b080-b42e9949088d  ONLINE
            gptid/8ea1f970-8899-11eb-b080-b42e9949088d  ONLINE
          mirror-1                                      ONLINE
            gptid/8ed779a6-8899-11eb-b080-b42e9949088d  ONLINE
            gptid/8f0162db-8899-11eb-b080-b42e9949088d  ONLINE
          mirror-2                                      ONLINE
            gptid/8c1f3017-8899-11eb-b080-b42e9949088d  ONLINE
            gptid/8d8996d3-8899-11eb-b080-b42e9949088d  ONLINE
root@ODIN[~]# zpool import -f 151585561031890332
cannot import 'DataVault1': one or more devices is currently unavailable
As you can see, all drives are now ONLINE... I'm very, very confused.
But running the import command on that pool says that devices are unavailable.
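To double-check which labels belong to which mirror, the `zpool import` listing can be grouped per vdev with a few lines of Python. This is only a sketch under my own assumptions: `vdev_members` is a made-up helper name, the sample is the mirror-0 part of the output above, and the parsing simply assumes each `gptid/...` entry is followed by its state. On this FreeBSD-based system the labels could then presumably be matched to adaN partitions with `glabel status`, which I have not shown here.

```python
def vdev_members(zpool_text):
    """Group gptid labels (with their reported state) under the mirror vdev they belong to."""
    vdevs = {}
    current = None
    tokens = zpool_text.split()
    for i, tok in enumerate(tokens):
        if tok.startswith("mirror-"):
            # A new mirror vdev starts; following gptid entries belong to it.
            current = tok
            vdevs[current] = []
        elif tok.startswith("gptid/") and current is not None:
            # Assumption: the token right after the label is its state.
            state = tokens[i + 1] if i + 1 < len(tokens) else "UNKNOWN"
            vdevs[current].append((tok, state))
    return vdevs


# mirror-0 portion of the `zpool import` output from this post.
SAMPLE = (
    "DataVault1 ONLINE mirror-0 ONLINE "
    "gptid/8e3aabdb-8899-11eb-b080-b42e9949088d ONLINE "
    "gptid/8ea1f970-8899-11eb-b080-b42e9949088d ONLINE"
)

if __name__ == "__main__":
    for vdev, members in vdev_members(SAMPLE).items():
        print(vdev, members)
```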
I'm a bit desperate and cannot comprehend what is happening here, so I would very much appreciate any help.
PS: I know the saying about making multiple backups. I have in mind a replica of this server for redundancy but right now I cannot afford to build another 48TB NAS.
PS 2: Ask me for anything else I need to provide. I know my way around computers pretty well, but I am really new to this TrueNAS thing :D