One or more devices has experienced an unrecoverable error

CheeryFlame (Contributor; joined Nov 21, 2022; 184 messages)
Hello!

I'm receiving this message in my inbox every day and wondered whether there's anything I can do, or whether my SSD is just giving up on life.

These are 250 GB drives and they're already full anyway, so I needed to replace them soon regardless.

ZFS has finished a scrub:

   eid: 4211
 class: scrub_finish
  host: truenas
  time: 2023-06-21 00:07:25-0400
  pool: Pollen
 state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
  scan: scrub repaired 96K in 00:07:23 with 0 errors on Wed Jun 21 00:07:25 2023
config:

        NAME                                      STATE     READ WRITE CKSUM
        Pollen                                    ONLINE       0     0     0
          mirror-0                                ONLINE       0     0     0
            20c3c53e-646f-4826-a3ec-21010927584d  ONLINE       0     0     2
            9bd6c8d1-f378-4598-a358-0e83c7b1226d  ONLINE       0     0     0
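To spot the affected disk in a report like this without eyeballing the table, here is a minimal Python sketch that scans the config rows of zpool status output and returns every row whose READ, WRITE or CKSUM counter is non-zero. It assumes the five-column NAME/STATE/READ/WRITE/CKSUM layout of the report above and should only be fed the config rows. zpool can abbreviate large counts (e.g. 1.2K), so the sketch compares the raw strings against "0" instead of converting them to integers.

```python
# Config rows from the scrub report above, copied verbatim.
STATUS = """\
        NAME                                      STATE     READ WRITE CKSUM
        Pollen                                    ONLINE       0     0     0
          mirror-0                                ONLINE       0     0     0
            20c3c53e-646f-4826-a3ec-21010927584d  ONLINE       0     0     2
            9bd6c8d1-f378-4598-a358-0e83c7b1226d  ONLINE       0     0     0
"""


def devices_with_errors(config_text):
    """Return {device: (read, write, cksum)} for rows with any non-zero counter."""
    flagged = {}
    for line in config_text.splitlines():
        fields = line.split()
        # Device rows have exactly five columns: NAME STATE READ WRITE CKSUM.
        if len(fields) != 5 or fields[0] == "NAME":
            continue
        name, _state, read, write, cksum = fields
        # Counters may be abbreviated (e.g. "1.2K"), so compare strings, not ints.
        if (read, write, cksum) != ("0", "0", "0"):
            flagged[name] = (read, write, cksum)
    return flagged


if __name__ == "__main__":
    for device, counters in devices_with_errors(STATUS).items():
        print(device, "READ/WRITE/CKSUM =", counters)
```

Here only the first mirror member is flagged, with two checksum errors, which matches the scrub summary.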
 
(Joined Jan 7, 2015; 1,155 messages)
Hard to say. A checksum error can be cable related, but generally only if you've been inside the case recently, it's a new build, you're using lots of power splitters, etc. If it's showing up after many years of being rock solid, I'd say it's about time to replace the drive, or at the very least to have a couple of new, larger disks on standby. Resilver the failing disk onto a larger SSD, and once that's done, replace the other one with a larger SSD to realize the new space.
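The one-disk-at-a-time upgrade described above works because a mirror can only hold as much as its smallest member: the pool keeps its old size after the first resilver and grows only once the second disk has been swapped. A minimal Python sketch of that reasoning, which also lists the zpool commands in order. The device names (old-ssd-1, new-ssd-1, ...) and the 1000 GB size are hypothetical placeholders, and on TrueNAS you would normally do the replace step from the web UI rather than the shell.

```python
def mirror_capacity(sizes_gb):
    """A mirror vdev stores only as much as its smallest member."""
    return min(sizes_gb)


def upgrade_plan(pool, members, replacements):
    """Return (command, mirror capacity in GB once that step completes) pairs.

    members: {device: size_gb} for the current mirror.
    replacements: [(old_device, new_device, new_size_gb), ...], applied in order.
    Each 'zpool replace' starts a resilver; wait until 'zpool status' shows it
    has finished before running the next replace.
    """
    sizes = dict(members)
    # autoexpand lets the pool grow once every member of the mirror is larger.
    steps = [(f"zpool set autoexpand=on {pool}", mirror_capacity(sizes.values()))]
    for old, new, new_size in replacements:
        del sizes[old]
        sizes[new] = new_size
        steps.append((f"zpool replace {pool} {old} {new}", mirror_capacity(sizes.values())))
    return steps


if __name__ == "__main__":
    # Hypothetical names and sizes: two 250 GB SSDs swapped for two 1000 GB SSDs.
    plan = upgrade_plan(
        "Pollen",
        {"old-ssd-1": 250, "old-ssd-2": 250},
        [("old-ssd-1", "new-ssd-1", 1000), ("old-ssd-2", "new-ssd-2", 1000)],
    )
    for command, capacity in plan:
        print(f"{command:<45} -> mirror capacity {capacity} GB")
```

The capacity column stays at 250 GB until the last replace, which is why both disks have to be swapped before the new space shows up.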
 

CheeryFlame
I don't have any power splitters, although I did work inside the server this past month to troubleshoot my mid bay. Still, I bought this drive on eBay, so there's a chance the seller sold it close to the end of its life.
 

joeschmuck (Old Man, Moderator; joined May 28, 2011; 10,994 messages)
The SMART data should tell you whether the drive is at end of life; examine the Wear Level. (Back in April it was at 92%, so I doubt that's it, and the power-on hours are actually low at 6276.) Also, UDMA_CRC errors are generally caused by data cable communication errors. You do have a high power cycle count: at face value, it is slightly more than half the power-on hours. If you are powering this server off frequently, that could cause the issue. But I don't know the history of this SSD; it could have been in another system that had its power cycled a lot, such as a laptop.
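The power-cycle observation above is simple arithmetic on two SMART counters. A minimal Python sketch, assuming you copy Power_On_Hours and Power_Cycle_Count from smartctl -a output: the 6276 hours come from this thread, while the 3200 cycles are a hypothetical value chosen only to match "slightly more than half the power-on hours".

```python
def power_cycle_stats(power_on_hours, power_cycle_count):
    """Relate a drive's power cycles to its power-on hours."""
    return {
        # Average runtime between power-ups.
        "hours_per_cycle": round(power_on_hours / power_cycle_count, 2),
        # The "more than half the power-on hours" ratio.
        "cycles_per_hour": round(power_cycle_count / power_on_hours, 2),
    }


if __name__ == "__main__":
    # 6276 h is from the thread; 3200 cycles is a hypothetical placeholder.
    print(power_cycle_stats(6276, 3200))
```

With those inputs the drive averages about two hours per power-up, which would point to a previous life in a machine that was switched on and off often rather than an always-on server.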

It is troubling that you are getting these every day. Does this mean that you have a Scrub set to run every day/night?

If you are not cycling power often, then I would suspect the SSD as well, once you have ruled everything else out. But it's an 850 Pro; it should be pretty solid.

If the pool is full, I suspect that could lead to a problem, but you wouldn't think TrueNAS would allow any data corruption no matter how full the pool was; it would just warn the hell out of you.

Hope you solve the issue.
 

CheeryFlame
Hey Joe, always happy to read your help!

The SMART data does look good for those drives, although this data can be reset or spoofed. Since one of them is from eBay, that could be the case here.

I'm not exactly sure how many shutdowns per month count as powering off a server frequently. I had to troubleshoot the mid bay over the past few weeks, which meant reinstalling and rebooting multiple times. TechMikeNY sent me a new mid bay and cables, since they think one or the other is faulty.

I do have scrub tasks set to run every night. That was recommended in a tutorial I followed when I first set up the server.

I'm also getting the TrueNAS messages about the pool having reached 80% of its capacity. At this point I'm only keeping 14 days of backups for my apps, and I would like to keep much more than that, more like a month. Some users advised me that 256 GB would be enough for the apps pool; well, not in my use case. I'm looking at 1 TB SSDs now and will upgrade ASAP.
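Whether a month of app backups fits is a back-of-the-envelope calculation: the live data plus one day's worth of snapshot churn per retained day, compared with the 80% warning level mentioned above. A minimal Python sketch; the 100 GiB of live data and 6 GiB of daily change are hypothetical numbers, picked so that 14 days lands near 80% of a 250 GB drive, and the sketch ignores ZFS metadata and reserved (slop) space.

```python
GB = 10**9   # drive vendors use decimal gigabytes
GIB = 2**30  # ZFS and TrueNAS report binary gibibytes


def usable_gib(drive_gb):
    """Raw capacity of a mirror of drive_gb drives, in GiB (no ZFS overhead)."""
    return drive_gb * GB / GIB


def retention_fits(drive_gb, live_gib, daily_change_gib, days, threshold=0.80):
    """Return (fits under the threshold?, estimated GiB used) for `days` of snapshots."""
    used = live_gib + daily_change_gib * days
    return used <= usable_gib(drive_gb) * threshold, round(used, 1)


if __name__ == "__main__":
    # Hypothetical workload: 100 GiB of app data, 6 GiB of changes per day.
    for drive_gb, days in [(250, 14), (250, 30), (1000, 30)]:
        print(drive_gb, "GB,", days, "days:", retention_fits(drive_gb, 100, 6, days))
```

A 250 GB drive is only about 233 GiB as ZFS sees it, so under these assumptions 14 days just squeezes under 80%, a month does not, and a 1 TB mirror has plenty of headroom.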

This will fix the issue in this thread, but we'll never know about the 850 drives!
 

joeschmuck
I hope the problem goes away when the drives are replaced, just so the problem is fixed quickly. If not, maybe that new mid bay or a cable is causing it.

Best of luck and please post when you find the failed component.
 

CheeryFlame
Yes, I'll update once I know more. I haven't received the new mid bay and cables yet; they're stuck at customs!
 
(Joined Jan 7, 2015; 1,155 messages)
I think a nightly scrub on small SSDs could be a bit much, but it's also totally OK; it's kind of what scrubs are for. I just run mine monthly. Old, unneeded snapshots of frequently changing data, as is the case with an "app" pool, can use a lot of space too. As cheap as disks are now, put terabyte drives in; they'll last until they're about to fail or fill up, and then you put in 2 TB drives, etc. That's the play, especially if you want to keep a month of backups.
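A middle ground between nightly and monthly is to keep the nightly schedule but gate it on how long ago the last scrub completed; TrueNAS scrub tasks have a "Threshold days" setting that appears to work this way. A minimal Python sketch of that gating logic (not TrueNAS's actual code), simulating 60 scheduled nights from a hypothetical start date and assuming each scrub finishes the night it starts.

```python
from datetime import date, timedelta


def should_scrub(last_completed, today, threshold_days):
    """Run only if at least threshold_days have passed since the last scrub."""
    return last_completed is None or (today - last_completed).days >= threshold_days


def simulate(start, nights, threshold_days):
    """Return the dates a nightly-scheduled, threshold-gated scrub would actually run."""
    runs, last = [], None
    for offset in range(nights):
        today = start + timedelta(days=offset)
        if should_scrub(last, today, threshold_days):
            runs.append(today)
            last = today  # assumes each scrub completes the same night
    return runs


if __name__ == "__main__":
    start = date(2023, 6, 1)  # hypothetical start date
    print(len(simulate(start, 60, 1)), "scrubs with a 1-day threshold")
    print(simulate(start, 60, 35))
```

The schedule still fires every night, but with a 35-day threshold only two of those 60 nights actually scrub.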

Also, if you suspect something environmental, try running zpool clear POOL as the error message suggests, and see whether the error returns in subsequent scrubs. Apart from the space issue, things certainly don't appear to be in bad shape.
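The clear-and-rescrub check can be scripted. A minimal Python sketch that runs zpool clear, then zpool scrub -w (the -w flag waits for the scrub to finish; it exists in OpenZFS 2.0 and later), then returns the zpool status -v text so you can check whether the CKSUM count on the suspect disk climbs again. The runner parameter is an assumption added only so the sequence can be tested without a real pool; on the server the commands need root.

```python
import subprocess


def run(cmd, runner=subprocess.run):
    """Run one command and return its stdout; raise if it exits non-zero."""
    return runner(cmd, capture_output=True, text=True, check=True).stdout


def clear_and_rescrub(pool, runner=subprocess.run):
    """Reset the error counters, scrub again, and return the fresh status report."""
    run(["zpool", "clear", pool], runner)        # zero the READ/WRITE/CKSUM counters
    run(["zpool", "scrub", "-w", pool], runner)  # -w: block until the scrub completes
    return run(["zpool", "status", "-v", pool], runner)


if __name__ == "__main__":
    print(clear_and_rescrub("Pollen"))
```

If the counters stay at zero across a few scrubs after a clear, the original errors were more likely a one-off (for example from the mid bay work) than a failing SSD.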
 

CheeryFlame
I've found these at a very good price. I think it'll be hard to beat, and they'll be able to hold a ton of backups for my app dataset. If anyone's interested, they're $74 USD for 1.92 TB, which is pretty darn cheap.
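For comparing listings like this, it helps to normalize to price per TB and to remember that ZFS reports binary units. A small Python sketch using the $74 and 1.92 TB figures from this post; it ignores ZFS overhead.

```python
TB = 10**12  # drive capacities are quoted in decimal terabytes
TIB = 2**40  # ZFS and TrueNAS show binary tebibytes


def price_per_tb(price_usd, capacity_tb):
    """Listing price normalized to US dollars per decimal terabyte."""
    return round(price_usd / capacity_tb, 2)


def tb_to_tib(capacity_tb):
    """Convert a vendor TB figure to the TiB figure ZFS will display."""
    return capacity_tb * TB / TIB


if __name__ == "__main__":
    print(price_per_tb(74, 1.92), "USD per TB")
    print(round(tb_to_tib(1.92), 2), "TiB as reported by ZFS (before overhead)")
```

So the listing works out to roughly $38.54 per TB, and a 1.92 TB drive shows up as about 1.75 TiB.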
 

joeschmuck
My only questions are: what does "Seller Refurbished" mean, and what is the wear level of these drives? If they are just drives pulled from running systems, I can deal with that, but if the wear level is 10% (90% used), then I don't see this as a good deal. I know others have purchased from this retailer and I suspect they are reputable, but the wear level matters more here than whether the drives will last 90 days. Maybe they can tell you the wear level if you ask.
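The 10% / 90% example above is the key conversion: the wear level reported by SMART typically counts down from 100 on a new drive, so the share of rated endurance already used is 100 minus that value. A minimal Python sketch that also turns the remaining percentage into approximate writes left; the 3500 TBW rating is a hypothetical placeholder, not the spec of the drive in this thread, and the linear model is a simplification.

```python
def percent_used(wear_level_remaining):
    """Wear level counts down from 100 (new), so used = 100 - remaining."""
    return 100 - wear_level_remaining


def remaining_endurance_tbw(rated_tbw, wear_level_remaining):
    """Rough writes left, assuming wear tracks the rated endurance linearly."""
    return rated_tbw * wear_level_remaining / 100


if __name__ == "__main__":
    RATED_TBW = 3500  # hypothetical rating; check the datasheet for the actual model
    for remaining in (97, 95, 10):
        print(f"wear level {remaining}%: {percent_used(remaining)}% used, "
              f"~{remaining_endurance_tbw(RATED_TBW, remaining):.0f} TBW left")
```

A drive at 95% has used only a few percent of its endurance, while one at 10% has used nine tenths of it even if it passes every burn-in test.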
 

CheeryFlame
It's odd that it isn't listed on their website, but you can usually call them and get an answer; they have stellar phone support. As for this drive, the eBay description states a wear level of 95-97% remaining, which is practically new. I usually don't trust claims from eBay sellers, but I do trust this one: I've bought 16 drives from them this year, they all survived the burn-in script, and they're still rocking.
 

joeschmuck
I might look to them the next time I need to replace or upgrade my pool drives to SSDs. I don't need super fast; small and light would be a great benefit. And lower power consumption is nice too.
 

CheeryFlame
Yeah, they're very good at this price, and it's great to have plenty of backups of your apps and then replicate that pool to the main data pool.
 
(Joined Oct 22, 2019; 3,641 messages)
joeschmuck wrote: What is Seller Refurbished? What is the wear level of these drives?
At $75 for a 2-TiB Enterprise SSD, my guess is "Seller Refurbished" means it's a dead or failing drive where they buffed out any scratches and scuffs on the plastic exterior. :tongue:
 

CheeryFlame
That's good, it means more deals for us! But for real, server part deals are legit.
 