Disk confusion

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Thanks for posting your system specs. Listing the hard drive makes/models would also help: if not in your system specs, then at least in this thread, since we are dealing with a drive issue.

@joeschmuck , except for the USB, I use a controller in IT passthrough mode, so it should be able to grab SMART data, shouldn't it?
I would think so. What I do not know is whether the firmware you have on the Dell board is the correct firmware. That is my ignorance talking; the firmware might be perfectly fine. I'd expect it to be a case of either it works or it doesn't work at all. Here is a link to read that might help you figure out if the correct firmware is installed.

But I see you have an enclosure, and this might be why you are not able to run the smartctl command. What perplexes me is that you can't get SMART data from the onboard SATA-connected drives either. Maybe knowing the drive makes/models will help? I wish I could pull a rabbit out of my hat and tell you exactly what is going on, but I'm just not that smart.

Let's get back to your original problem... This isn't a thread about my script not working.
1. If you haven't already done so, run a scrub on the pool 'Main' with: zpool scrub Main. That will take less than 4 hours to complete.
2. Next, check the scrub status with: zpool status Main. Look specifically for this line:
" scan: scrub repaired 0B in 03:36:33 with 0 errors on Sun Mar 12 04:36:34 2023" and of course the date/time will be different. We are focused on "0B" and "0 errors".
3. If any errors are listed (I suspect you will have a CKSUM error), clear them with: zpool clear Main. Then repeat step 2 and verify the errors are gone.
4. Drives da18 and da20 could be failing. In the GUI, set up a SMART Long/Extended test to run every evening on these drives. It may sound excessive, but the drives should be able to handle the testing or fail. Do this for about a week and see if the error reports happen again. If they do (I suspect they will), replace these drives. If the error messages go away, good news. OR you could just replace those two drives now.
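
If you would rather not eyeball the scan line in step 2 every time, here is a minimal Python sketch that checks it for you. It only assumes the line looks like the example quoted above; the function name scrub_is_clean and the regular expression are my own for illustration and are not part of ZFS or TrueNAS.

```python
import re

# Matches a finished-scrub summary line as printed by `zpool status`, e.g.
#   scan: scrub repaired 0B in 03:36:33 with 0 errors on Sun Mar 12 04:36:34 2023
# Assumption: the line follows the format quoted in step 2.
SCAN_RE = re.compile(r"scan:\s+scrub repaired (\S+) in (.+?) with (\d+) errors")


def scrub_is_clean(status_text):
    """Check the scrub summary inside `zpool status` output.

    Returns True if the scrub repaired nothing and found 0 errors,
    False if it repaired data or found errors, and None if no
    finished-scrub line is present (e.g. a scrub is still running).
    """
    match = SCAN_RE.search(status_text)
    if match is None:
        return None
    repaired, _duration, errors = match.groups()
    return repaired in ("0", "0B") and int(errors) == 0


# To try it on a live system, feed it the text of `zpool status Main`,
# for example captured with:
#   subprocess.run(["zpool", "status", "Main"], capture_output=True, text=True).stdout
```

A None result just means no completed scrub was reported yet, so wait for the scrub to finish and check again.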

Some advice: If you have the opportunity to rebuild your vdevs/pools in a RAIDZ2 format, that is a better use of the hot-spare disk. It also lets a vdev survive two disks failing at the same time.
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
@Davvo , I updated my profile with the complete TrueNAS system specs. Also, I did the clear and scrub. We should see if the errors come back. If so, I will update this post.
OK, let us know. Also, be advised that Realtek NICs are problematic.
 