What does critical error "Currently unreadable (pending) sectors" mean?

MikeyG

Patron
Joined
Dec 8, 2017
Messages
442
I opened up a ticket for this and the devs said to add -C 0 to my SMART options. Did this a week ago and haven't gotten a notification about my MX500s' pending sectors since.

You can see https://jira.ixsystems.com/browse/NAS-103155 for reference.
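For anyone else hitting this: the option goes into the disk's "S.M.A.R.T. extra options" field in the UI, which (as I understand it) gets passed through to smartd for that device. A rough sketch of what the equivalent smartd.conf entry looks like, with the device name being just an example:

Code:
# -a    monitor the usual default set of SMART attributes and health status
# -C 0  disable the warning for a non-zero Current Pending Sector count (attribute 197)
/dev/ada3 -a -C 0

Keep in mind this only silences the spurious pending-sector alerts from the MX500 firmware; it doesn't change anything on the drive itself.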
 

MikeyG

Patron
Joined
Dec 8, 2017
Messages
442
Any updates on this? Did Crucial ever release a firmware update to rectify it, or did a TrueNAS update bypass the issue?
I ended up getting rid of the MX500s a long time ago, probably in late 2019, so I never figured out a real fix for this. I replaced them with WD Blue SSDs, of which I have a bunch, and not a single problem with them for at least a year now.
 
Joined
May 13, 2021
Messages
22
I ended up getting rid of the MX500s a long time ago, probably in late 2019, so I never figured out a real fix for this. I replaced them with WD Blue SSDs, of which I have a bunch, and not a single problem with them for at least a year now.

Well, there aren't many good SATA SSDs on a budget. The choices are the Crucial MX500, Samsung 860 Evo/870 Evo, and ADATA SU800.

I was wondering if I should go with the MX500 and add that fix you mentioned, or just go with the 870 Evo.
 

MikeyG

Patron
Joined
Dec 8, 2017
Messages
442
I think the "fix" that I mentioned did work. But, I had like 4 of the MX500's at the time, and was wanting to buy a lot more SSDs, and the idea of buying more with a firmware bug pissed me off too much. So I switched to the WDs like a mentioned. Been getting the 2TB models from between $150-$170 for a while now if I'm patient, mostly on Amazon and Ebay. I personally wouldn't touch the crucial unless I read some release notes with a firmware update that specifically fixed the issue.

From what I recall the Samsungs were also a fine choice, they just cost more than the WDs when I started buying them. Not sure about the adata's.
 
Joined
May 13, 2021
Messages
22
I think the "fix" that I mentioned did work. But, I had like 4 of the MX500's at the time, and was wanting to buy a lot more SSDs, and the idea of buying more with a firmware bug pissed me off too much. So I switched to the WDs like a mentioned. Been getting the 2TB models from between $150-$170 for a while now if I'm patient, mostly on Amazon and Ebay. I personally wouldn't touch the crucial unless I read some release notes with a firmware update that specifically fixed the issue.

From what I recall the Samsungs were also a fine choice, they just cost more than the WDs when I started buying them. Not sure about the adata's.

Well, this is the current status for SATA SSDs: none of the legit reviewers I've seen are enthusiastic about SATA SSDs when, for the same price, you can get an M.2 NVMe with better performance.

But I'm installing TrueNAS on a QNAP that only supports regular SATA SSDs in one of its bays.

The reason I was looking at the MX500 is that it offers some limited power-loss protection on a budget, with decent performance.

WaltR said:
Crucial's MX500 series SATA has Integrated Power Loss Immunity, which is amazing for a $42 250 GB Consumer SSD
honeybadger

Hi WaltR. Unfortunately Micron/Crucial's "power loss immunity" is not the same as protection for data in-flight - the implementation on the Crucial MX consumer drives is only guaranteed to protect data at rest from being corrupted by a new write in progress when power is lost. This is not the same as the end-to-end power-loss-protection offered on enterprise drives, which will protect "pending writes" that are in the drive's DRAM cache.

Thankfully the drive does obey the "flush cache" command properly, and will in fact push its DRAM to NAND when asked; but this is limited by the write speed of the NAND and whether or not there are pages free to be "fast programmed" as SLC, so it will likely be slow.

If you do have one, please feel free to benchmark it using the diskinfo command found in the thread in my signature and provide the results. I imagine the SLC write cache could let it deliver some fairly good numbers for bursty workloads, but sustained writes could cause it to choke.
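(For reference, the invocation is roughly the one below. The device name is only an example, and the test writes to the raw device, so only point it at a drive with nothing on it you care about.)

Code:
# List attached drives, then run the naive write benchmark plus the synchronous-write test
camcontrol devlist
diskinfo -wS /dev/ada1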

And with all that said - welcome to the forums.
 

Whiskey

Dabbler
Joined
Jul 10, 2021
Messages
29
Well, this is the current status for SATA SSDs: none of the legit reviewers I've seen are enthusiastic about SATA SSDs when, for the same price, you can get an M.2 NVMe with better performance.

But I'm installing TrueNAS on a QNAP that only supports regular SATA SSDs in one of its bays.

The reason I was looking at the MX500 is that it offers some limited power-loss protection on a budget, with decent performance.
Resurrecting this thread once more :smile:
@Moogle Stiltzkin did you go ahead with the MX500, and if so, can you share your experiences? I'm considering the 4 TB model. Looked at the 870 Evo too, but those are €77 more expensive, which is a bit much (especially because I need at least two).
 

MikeyG

Patron
Joined
Dec 8, 2017
Messages
442
Resurrecting this thread once more :smile:
@Moogle Stiltzkin did you go ahead with the MX500, and if so, can you share your experiences? I'm considering the 4 TB model. Looked at the 870 Evo too, but those are €77 more expensive, which is a bit much (especially because I need at least two).
You didn't ask me directly, but just in case you want some info: like I mentioned above, the MX500s I had were problematic. I replaced them with a whole bunch of Western Digital Blue WDS200T2B0A drives that I've had for 3 years now. Still not a single problem with them.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Definitely avoid the MX500s. I've seen no indication that they ever fixed the TRIM bug that causes crazy write amplification and, on the cosmetic side (which has since been worked around by smartmontools), spurious errors to be reported.
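If you're stuck with MX500s you already own, the relevant raw values are easy to keep an eye on (a sketch; the device name is just an example):

Code:
# Attribute 197 (current pending sectors) is the one the buggy firmware trips spuriously;
# 5 (reallocated) and 198 (offline uncorrectable) should stay at 0 on a healthy drive.
smartctl -A /dev/ada2 | awk '$1 == 5 || $1 == 197 || $1 == 198'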
 
Joined
May 13, 2021
Messages
22
Resurrecting this thread once more :smile:
@Moogle Stiltzkin did you go ahead with the MX500, and if so, can you share your experiences? I'm considering the 4 TB model. Looked at the 870 Evo too, but those are €77 more expensive, which is a bit much (especially because I need at least two).
I use them in my NAS (QNAP TBS-453DX), which I rarely use, so I don't have an extended period of usage to give you an idea of how stable these M.2 SATA SSDs (Crucial MX500) are.

But based on other sources online, including here, it seems to be a firmware issue, which is concerning when the issue is ignored and remains unfixed. That can't be good, so you may want to look at a more problem-free TLC SSD.

4 TB? Then you should be extra careful with your choice, since those are considered premium at that kind of capacity. M.2 SATA is being phased out these days in favor of M.2 NVMe because of the former's bottleneck, so unless you're forced to (due to hardware support limitations), best stick to M.2 NVMe (check first that whatever you're using supports that format).

My more recent purchase was a 2 TB Crucial P5 Plus, which I don't have issues with. I use it for my desktop PC :}

But there doesn't seem to be a 4 TB variant of it? There are other 4 TB options you can check on Google; just read the reviews before buying ^^
 

Whiskey

Dabbler
Joined
Jul 10, 2021
Messages
29
Thanks all, as always this forum is an absolute treasure. I'll do some further research!
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
For 4TB 2.5" SATA, I feel comfortable recommending the Samsung 870 Evos. I've run a bunch of 850s, 860s and 870s in 250 GB, 500 GB and 4 TB capacities and zero issues so far.

EDIT: See three posts down.
 

homer27081990

Patron
Joined
Aug 9, 2022
Messages
321
Katy P (6/6/2019, 4:44:59 PM): Okay so at the base the error looks to be indicating bad sectors on the drive. It is not uncommon for SSDs to show a number of bad sectors but that doesn't necessarily point to a fault on the drive. You can find our details on why this can happen on our Knowledge Base page here:
Hahaha, yeah, all SSDs do this. Surely, Intel PCIe and KIOXIA enterprise models started it. It's a trend, hahaha
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Well, that was short-lived. Two failing Samsung 870 EVOs (4 TB), on the same machine, on the same pool, with like 25 TB of writes. Not looking good for these.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Well, that was short-lived. Two failing Samsung 870 EVOs (4 TB), on the same machine, on the same pool, with like 25 TB of writes. Not looking good for these.
What is failing on them? Also, did you investigate whether this is a known problem with the drive, like maybe a chipset or firmware issue? I'm curious because I tend to buy Samsung SSDs. Also, was this an M.2 or 2.5" form factor? If M.2, did it have a heatsink? Yeah, I like the details.

And I agree, it does suck; only 25 TB written, not good at all. Which leads me to the last question: how many hours did it take to fail?
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
What is failing on them?
Lots of bad data. Some detected by the drive and reported by SMART, tons more silent corruption.
Also, did you investigate if this is a known problem with the drive, like maybe a chipset or firmware issue? I'm curious because I tend to buy Samsung SSD's
Apparently it is! I was just reading up on this; word on the street is that early batches were defective, in an unspecified way.

For reference, SMART data for one of these disks:
Code:
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.13.0-40-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Samsung based SSDs
Device Model:     Samsung SSD 870 EVO 4TB
Serial Number:    S6BCNG0R206683E
LU WWN Device Id: 5 002538 f712073d3
Firmware Version: SVT01B6Q
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-4 T13/BSR INCITS 529 revision 5
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Aug 18 16:14:07 2022 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM feature is:   Unavailable
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 121) The previous self-test completed having
                                        the read element of the test failed.
Total time to complete Offline
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x53) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 320) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  5 Reallocated_Sector_Ct   PO--CK   098   098   010    -    66
  9 Power_On_Hours          -O--CK   098   098   000    -    8544
 12 Power_Cycle_Count       -O--CK   099   099   000    -    8
177 Wear_Leveling_Count     PO--C-   099   099   000    -    7
179 Used_Rsvd_Blk_Cnt_Tot   PO--C-   098   098   010    -    66
181 Program_Fail_Cnt_Total  -O--CK   100   100   010    -    0
182 Erase_Fail_Count_Total  -O--CK   100   100   010    -    0
183 Runtime_Bad_Block       PO--C-   098   098   010    -    66
187 Uncorrectable_Error_Cnt -O--CK   099   099   000    -    6
190 Airflow_Temperature_Cel -O--CK   067   062   000    -    33
195 ECC_Error_Rate          -O-RC-   199   199   000    -    6
199 CRC_Error_Count         -OSRCK   100   100   000    -    0
235 POR_Recovery_Count      -O--C-   099   099   000    -    5
241 Total_LBAs_Written      -O--CK   099   099   000    -    54314432143
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning
                            

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      1  Comprehensive SMART error log
0x03       GPL     R/O      1  Ext. Comprehensive SMART error log
0x04       GPL,SL  R/O      8  Device Statistics log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x09           SL  R/W      1  Selective self-test log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x13       GPL     R/O      1  SATA NCQ Send and Receive log
0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xa1           SL  VS      16  Device vendor specific log
0xa5           SL  VS      16  Device vendor specific log
0xce           SL  VS      16  Device vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (1 sectors)
Device Error Count: 6 (device log contains only the most recent 4 errors)
        CR     = Command Register
        FEATR  = Features Register
        COUNT  = Count (was: Sector Count) Register
        LBA_48 = Upper bytes of LBA High/Mid/Low Registers ]  ATA-8
        LH     = LBA High (was: Cylinder High) Register    ]   LBA
        LM     = LBA Mid (was: Cylinder Low) Register      ] Register
        LL     = LBA Low (was: Sector Number) Register     ]
        DV     = Device (was: Device/Head) Register
        DC     = Device Control Register
        ER     = Error register
        ST     = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 6 [1] occurred at disk power-on lifetime: 8514 hours (354 days + 18 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  00 -- 51 00 01 00 00 00 00 00 10 00 00  Error:  at LBA = 0x00000010 = 16

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  60 01 00 00 10 00 00 50 d6 b1 e0 00 02  2d+19:15:28.103  READ FPDMA QUEUED
  60 01 00 00 08 00 00 50 d6 b0 e0 00 01  2d+19:15:28.103  READ FPDMA QUEUED
  60 01 00 00 00 00 00 50 d6 af e0 00 00  2d+19:15:28.103  READ FPDMA QUEUED
  60 01 00 00 08 00 00 50 d6 ae e0 00 01  2d+19:15:28.103  READ FPDMA QUEUED
  60 01 00 00 00 00 00 50 d6 ad e0 00 00  2d+19:15:28.103  READ FPDMA QUEUED

Error 5 [0] occurred at disk power-on lifetime: 8432 hours (351 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  00 -- 51 00 01 00 00 00 00 00 10 00 00  Error:  at LBA = 0x00000010 = 16

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  61 00 10 00 18 00 00 2d 55 2d 30 00 03 49d+02:22:41.792  WRITE FPDMA QUEUED
  60 01 00 00 10 00 00 50 d6 be 60 00 02 49d+02:22:41.792  READ FPDMA QUEUED
  60 01 00 00 08 00 00 50 d6 bd 60 00 01 49d+02:22:41.792  READ FPDMA QUEUED
  60 01 00 00 00 00 00 50 d6 bc 60 00 00 49d+02:22:41.792  READ FPDMA QUEUED
  60 01 00 00 10 00 00 50 d6 bb 60 00 02 49d+02:22:41.792  READ FPDMA QUEUED

Error 4 [3] occurred at disk power-on lifetime: 8432 hours (351 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  00 -- 51 00 01 00 00 00 00 00 10 00 00  Error:  at LBA = 0x00000010 = 16

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  61 00 40 00 08 00 00 2d 55 2c 30 00 01 49d+02:22:41.630  WRITE FPDMA QUEUED
  60 00 80 00 00 00 00 50 d6 b1 60 00 00 49d+02:22:41.630  READ FPDMA QUEUED
  60 00 80 00 00 00 00 50 d6 a6 e0 00 00 49d+02:22:41.630  READ FPDMA QUEUED
  60 00 80 00 00 00 00 50 d6 a6 60 00 00 49d+02:22:41.630  READ FPDMA QUEUED
  60 00 80 00 00 00 00 50 d6 a5 e0 00 00 49d+02:22:41.630  READ FPDMA QUEUED

Error 3 [2] occurred at disk power-on lifetime: 8432 hours (351 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 01 00 00 00 00 00 10 00 00  Error: WP at LBA = 0x00000010 = 16

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  61 00 10 00 08 00 00 d1 c0 bc 10 00 01 49d+02:22:41.471  WRITE FPDMA QUEUED
  61 00 10 00 08 00 00 d1 c0 ba 10 00 01 49d+02:22:41.471  WRITE FPDMA QUEUED
  61 00 10 00 08 00 00 00 bf aa 10 00 01 49d+02:22:41.471  WRITE FPDMA QUEUED
  60 00 80 00 00 00 00 50 d6 a4 60 00 00 49d+02:22:41.471  READ FPDMA QUEUED
  60 00 10 00 18 00 00 d1 c0 bc 10 00 03 49d+02:22:41.471  READ FPDMA QUEUED

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%      8543         750470952

----- snip -----

Temperatures are between 29 and 36 degrees Celsius.

I'll need to RMA these soon. Meanwhile, there's a WD Blue 4 TB mirroring the two offenders. Which is a good thing, because:
Code:
  pool: rpool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: scrub repaired 304K in 0 days 00:29:27 with 0 errors on Wed Aug 17 10:59:16 2022
config:

        NAME                                              STATE     READ WRITE CKSUM
        rpool                                             ONLINE       0     0     0
          mirror-0                                        ONLINE       0     0     0
            sdb4                                          ONLINE       2     0     4
            sda4                                          ONLINE       9     0    16
            scsi-SATA_WDC_WDS400T2B0A_21011S802094-part4  ONLINE       0     0     0

errors: No known data errors

And it hasn't even been 48 hours since the last zpool clear!
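(For anyone wondering how the WD Blue ended up in there: attaching an extra member to an existing mirror is a one-liner. The sketch below uses placeholder device names rather than the exact commands I ran.)

Code:
# Attach a new device alongside an existing mirror member, then watch the resilver
zpool attach rpool sda4 /dev/disk/by-id/ata-EXAMPLE_DISK-part4
zpool status rpool

# Once the failing drives are pulled for RMA, drop them from the mirror
zpool detach rpool sdb4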
Also, was this an M.2 or 2.5" form factor?
2.5", Samsung seems to not have bothered with an 870 M.2 line.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
And yes, you did just read "scrub repaired 304K"! Phew, they must be full, you say? Nope:
Code:
NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
rpool  3.62T   668G  2.97T        -         -     6%    17%  1.00x    ONLINE  -

Not even 700 GB stored on these things!
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
That is very bad. Think about folks who don't have redundancy or even a simple backup; they just lose the data. It's good that you can at least RMA these two. Thanks for posting all that data, but I'm sure it frustrates you having this problem in the first place.

So what will you be replacing these with? The RMA'd replacements?

Yes, I know we have fallen off the original topic.
 