Resilvering drive, then faulted and removed drives (TrueNAS-13.0-U3.1)

Pctravel

Dabbler
Joined
Aug 11, 2018
Messages
26
I'm running TrueNAS 13 (version as in the subject) on an ASUS P9C-D server board with a PIKE card, with RAIDZ2 on 6 Ultrastar HE8 HUH728080ALE604 drives. I can still access the data for now, but obviously this is an issue.

One of my drives reported a SMART error for low helium level, so I added my spare drive, selected Replace, selected the new disk, and resilvering started as normal. It reported that it had finished, but during the night I got a critical alert: one drive was faulted and another was removed, as illustrated below. The other odd thing is that under Disks I still see the drive that was supposed to be replaced as well as the replacement disk, and under pool status I see two da1 entries, as shown below. Additionally, I currently have 7 disks running, but Disks only shows 5. I have another replacement drive but no idea how best to proceed. I did reboot once but didn't want to do anything else without some advice.

This system started on FreeNAS and migrated fine to TrueNAS. It has been running fine for years; I have replaced other drives with no issues before, and swapped all the 4 TB drives for these 8 TB drives without problems. By process of elimination using the serials in the Disks menu, the two drives that were removed and faulted are my top two drives, and I suspect they could have overheated during the resilver. They are at the top of my server with no fan blowing directly on them (I plan to change that). It could also be my PIKE card, which I'm also going to set up a fan to cool. Can someone suggest the best course here?

The alert:

The following devices are not healthy:
  • Disk ATA HGST HUH728080AL VKG7PK5X is FAULTED
  • Disk 13548086041729960873 is REMOVED
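The same states can be read from the shell with zpool status. As a rough sketch (the function name unhealthy_devices and the pool name tank are made up for illustration; the thread does not give the real pool name), a small awk filter can print only the rows of the config table whose state is not ONLINE:

```shell
# Hypothetical helper: reads `zpool status` output on stdin and prints
# "name state" for every row in the config table that is not healthy.
# Pool and vdev rows are included too, since they are DEGRADED as well.
unhealthy_devices() {
    awk '
        /^[[:space:]]*config:/ { cfg = 1; next }   # device table starts here
        /^[[:space:]]*errors:/ { cfg = 0 }         # device table ends here
        cfg && $2 ~ /^(DEGRADED|FAULTED|OFFLINE|REMOVED|UNAVAIL)$/ { print $1, $2 }
    '
}

# Usage on the live system (pool name "tank" is an assumption):
#   zpool status tank | unhealthy_devices
```

This only reads the status output, so it is safe to run while deciding what to do next.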

1684281697074.png
1684281738467.png
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
You can probably see why da1 was removed with dmesg | grep da1.

Likely that it was failing to read or write in time at some point... maybe due to overheating or whatever, so certainly look into that.
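One caveat with a plain grep da1 is that it also matches da10, da11, or ada1. A minimal sketch of a stricter filter, assuming the usual FreeBSD device naming (the helper name drive_events is made up for illustration):

```shell
# Hypothetical helper: filters kernel messages on stdin for one device.
# grep -w matches the name only as a whole word, so "da1" does not
# also match "da10" or "ada1".
drive_events() {
    dev="$1"
    grep -w -- "$dev"
}

# Usage on the live system:
#   dmesg | drive_events da1
```

Timeouts, retries, or "lost device" style messages just before the drive dropped would point toward the drive, cabling, or controller rather than ZFS itself.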

A reboot is likely to bring that drive back online, and if the damage wasn't permanent/fatal, it may be fine.

Did you burn in the new drive first?

I would suggest that you ensure you have any important data from that pool saved elsewhere (backed up) and then reboot.

You may find the disks situation resolves itself in terms of not showing all, but it's highly concerning that it's showing da1 in the pool twice.

Is the pike card flashed in IT mode?
 

Pctravel

I did a full scan of the drive before installing it and it checked out. Overnight it dropped all the drives. I do have the PIKE in IT mode, but I'm not sure I saw its BIOS on the last boot. I can see in the scrolling boot output that it's "syncing" all drives, but they don't show up at all any more once the system comes up. I also got a new alert stating syslog_ng is not running. I was going to pull everything and reseat it all. Also ordered a new PSU... argh. I did already copy what I needed, but I hope I can get it running again.
 

Pctravel

If my PIKE card died, is there an issue with using a PCIe SATA card together with the onboard 6 Gb/s ports native to the board? Does it need to be an LSI chipset? I'm assuming my PIKE 2008 card is dead. TrueNAS is booting up but zero disks are showing now.
 

sretalla

If my pike card died is there an issue with using a pcie sata card with the onboard 6gbs ports native to the board??
ZFS doesn't care which kind of (good) SATA controller you use. Onboard is usually fine.
 

Pctravel

So it looks like the issue was the PIKE 2008 controller. I pulled it, attached 5 drives and my boot drive to the native 3 Gb/s and 6 Gb/s ports on the motherboard, and plugged two into an ASM1064 + JMB575 chipset PCIe x1 board. That board has 8 ports available, but I'm only using two; I figured the native ports were much better. TrueNAS recognized all drives at power-on. The pool came back up, all good. Two drives showed 2 checksum errors, but I think that's probably linked to the dying PIKE. I'm running a long SMART test and will clear the errors if it comes back successful. No further errors have shown up beyond those two. It seems weird that the controller died, but I'm glad it's back up.
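For anyone following the same steps from the shell, here is a rough sketch of the check-then-clear sequence. The pool name tank, the device name ada0, and the helper name cksum_errors are all assumptions for illustration; substitute your own. The helper only parses plain integer counts, so abbreviated values that zpool status may print for very large counts are not handled.

```shell
# Hypothetical helper: reads `zpool status` output on stdin and prints
# "name count" for each row of the config table with a non-zero CKSUM column.
cksum_errors() {
    awk '
        /^[[:space:]]*config:/ { cfg = 1; next }   # device table starts here
        /^[[:space:]]*errors:/ { cfg = 0 }         # device table ends here
        cfg && $5 ~ /^[0-9]+$/ && $5 + 0 > 0 { print $1, $5 }
    '
}

# Usage on the live system (pool and device names are assumptions):
#   zpool status tank | cksum_errors   # which devices logged checksum errors
#   smartctl -t long /dev/ada0         # start a long SMART self-test
#   smartctl -a /dev/ada0              # read the result once the test finishes
#   zpool clear tank                   # reset the error counters afterwards
```

Running a scrub after clearing is a reasonable way to confirm that the counters stay at zero on the new controller.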
 