Uncorrectable I/O failure and disaster recovery

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Try not to do more than one thing at a time with the drives. What I mean is, if you are doing a scrub, don't run a SMART test or access your data. Let the scrub finish. The same goes for any other operation: let it finish. The SMR drives you have installed are fine for reading; however, writing becomes a problem once the drive has been completely written to at least once, because from then on each write requires a lot of reading and rewriting of data.
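If you want to make the "one thing at a time" rule hard to break by accident, here is a minimal Python sketch that checks whether a scrub is still running before you kick off anything else. It only looks for the "scrub in progress" text on the `scan:` line of `zpool status` output. The pool name `tank` and the sample text are made up for illustration, so treat this as a rough idea rather than a tested tool.

```python
import subprocess


def scrub_in_progress(zpool_status_text: str) -> bool:
    """Return True if a 'scan:' line in `zpool status` output reports a running scrub."""
    for line in zpool_status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("scan:") and "scrub in progress" in stripped:
            return True
    return False


def read_zpool_status(pool: str) -> str:
    """Run `zpool status <pool>` and return its text output (needs ZFS on the host)."""
    result = subprocess.run(
        ["zpool", "status", pool], capture_output=True, text=True, check=True
    )
    return result.stdout


# Illustrative sample text only; real output has more lines and your pool name.
SAMPLE_RUNNING = """\
  pool: tank
 state: ONLINE
  scan: scrub in progress since Mon Oct  9 12:00:00 2023
"""

SAMPLE_IDLE = """\
  pool: tank
 state: ONLINE
  scan: scrub repaired 0B in 05:12:44 with 0 errors on Sun Oct  8 05:12:45 2023
"""

if __name__ == "__main__":
    print(scrub_in_progress(SAMPLE_RUNNING))
    print(scrub_in_progress(SAMPLE_IDLE))
```

On the NAS itself you would call `scrub_in_progress(read_zpool_status("tank"))` with your own pool name and hold off on the SMART test while it returns True.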

I just pulled cupcakes out of the oven, damn they smell good. Yes, a man that bakes, once in a blue moon.
 

ghost_za

Dabbler
Joined
Oct 13, 2021
Messages
42
Try not to do more than one thing at a time with the drives. What I mean is, if you are doing a scrub, don't run a SMART test or access your data. Let the scrub finish. The same goes for any other operation: let it finish. The SMR drives you have installed are fine for reading; however, writing becomes a problem once the drive has been completely written to at least once, because from then on each write requires a lot of reading and rewriting of data.

You're misunderstanding.
The 2TB drive that was scrubbing is in the FreeNAS box. The 3TB drive that I was running a SMART test on is in a separate PC. Once that test is done, I will attempt a data recovery on the 3TB.

But it's fun here in the land of poverty, so I have to get the failing 2TB back into a Windows PC so I can back up the 3TB's data to that one...

Fun times!! :wink:

*** EDIT: *** I made cupcakes/muffins yesterday. Those were also delightful lol

*** EDIT: *** I tested both drives on 2 SATA cables just in case, but I used a third one just now to rule out the absolute obvious... It seems like it's running better now. I will legit straight up kick myself if this was all caused by faulty SATA cables. Oh, and to state the obvious: I rebooted the TrueNAS box because the drive detached (logs were posted).
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
Code:
Oct  9 17:06:48 truenas ada3 at ahcich3 bus 0 scbus3 target 0 lun 0
Oct  9 17:06:48 truenas ada3: <WDC WD20EARX-00PASB0 51.0AB51> s/n WD-WCAZAJ631126 detached
Oct  9 17:06:48 truenas GEOM_MIRROR: Device swap2: provider ada3p1 disconnected.
Oct  9 17:37:03 truenas 1 2023-10-09T15:37:03.039502+00:00 truenas.vs.lan devd 369 - - notify_clients: send() failed; dropping unresponsive client


This is causing high CPU load. It's this service that is causing the high CPU usage.

What is this?

PS: There are lots of log entries like that "send() failed" one.
You may want to see if you can remove the swap partition on any SMR drive. Even if you are not swapping (aka using the pagefile, in MS-Windows speak, for others reading this), that is just one more thing that could occasionally be causing trouble. TrueNAS Core probably does a sanity check of swap at some point (in the GEOM mirroring software). And if the drive is busy doing its lame SMR thing, then that has absolutely impacted your system, as the error message shows.

As for how to remove / prevent swap from being on your SMR drives, sorry I don't have the answer. But, it is WORTH investigating.
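To get a feel for how often the drive drops out versus how often devd complains, something like the following Python sketch could tally both from a saved copy of the system log. The regular expression is only modelled on the lines quoted above and is an assumption about the general format; other drives or log formats may need it adjusted.

```python
import re

# Modelled on the "adaN: <model> s/n SERIAL detached" line quoted in this thread.
DETACH_RE = re.compile(
    r"(?P<dev>\w+): <(?P<model>[^>]+)> s/n (?P<serial>\S+) detached"
)
SEND_FAILED = "notify_clients: send() failed"


def summarize(log_text: str):
    """Return (list of (device, serial) detach events, count of devd send() failures)."""
    detached = []
    send_failures = 0
    for line in log_text.splitlines():
        match = DETACH_RE.search(line)
        if match:
            detached.append((match.group("dev"), match.group("serial")))
        if SEND_FAILED in line:
            send_failures += 1
    return detached, send_failures


# The four log lines posted earlier in the thread.
SAMPLE_LOG = """\
Oct  9 17:06:48 truenas ada3 at ahcich3 bus 0 scbus3 target 0 lun 0
Oct  9 17:06:48 truenas ada3: <WDC WD20EARX-00PASB0 51.0AB51> s/n WD-WCAZAJ631126 detached
Oct  9 17:06:48 truenas GEOM_MIRROR: Device swap2: provider ada3p1 disconnected.
Oct  9 17:37:03 truenas 1 2023-10-09T15:37:03.039502+00:00 truenas.vs.lan devd 369 - - notify_clients: send() failed; dropping unresponsive client
"""

if __name__ == "__main__":
    events, failures = summarize(SAMPLE_LOG)
    print(events)
    print(failures)
```

If the detach count keeps climbing after the cable swap, that points back at the drive (or its swap partition) rather than the cabling.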
 