Hello,
I've read all the posts on this topic in this forum (and there were quite a few), as well as many posts in other forums, and I still can't figure out what I should be looking at.
I get warnings on 2 of my SSDs:
Device: /dev/sdi [SAT], 2 Currently unreadable (pending) sectors.
Device: /dev/sde [SAT], 1 Currently unreadable (pending) sectors.
In the web GUI, under S.M.A.R.T. Test Results for each disk, I see Remaining: N/A; Lifetime: 894; Error: N/A (Lifetime is different for each disk).
Since I'm a simple home user, I do what the experts say:
1 - I run the SMART test manually:
smartctl -t long /dev/sdi
and smartctl -t long /dev/sde
2 - I look for errors in the output:
smartctl -a /dev/sdi
and smartctl -a /dev/sde
Code:
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.79+truenas] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     SPCC Solid State Disk
Serial Number:    30083939530
LU WWN Device Id: 5 000000 000003061
Firmware Version: 030fAA20
User Capacity:    512,110,190,592 bytes [512 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available, deterministic, zeroed
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-4 (minor revision not indicated)
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Feb 15 16:51:37 2023 EET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x02) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (   33) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (   2) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x0031) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 20
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032   100   100   050    Old_age   Always       -       0
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       1405
 12 Power_Cycle_Count       0x0012   100   100   000    Old_age   Always       -       92
167 Unknown_Attribute       0x0022   100   100   000    Old_age   Always       -       0
168 Unknown_Attribute       0x0012   100   100   000    Old_age   Always       -       0
169 Unknown_Attribute       0x0013   100   100   010    Pre-fail  Always       -       0
170 Unknown_Attribute       0x0033   100   100   010    Pre-fail  Always       -       114
171 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
172 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
174 Unknown_Attribute       0x0022   100   100   000    Old_age   Always       -       0
175 Program_Fail_Count_Chip 0x0033   100   100   010    Pre-fail  Always       -       0
177 Wear_Leveling_Count     0x0012   100   100   000    Old_age   Always       -       0
180 Unused_Rsvd_Blk_Cnt_Tot 0x0033   100   100   000    Pre-fail  Always       -       114
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0033   100   100   090    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   000   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0012   100   100   000    Old_age   Always       -       92
194 Temperature_Celsius     0x0022   040   040   000    Old_age   Always       -       40 (Min/Max 40/40)
196 Reallocated_Event_Count 0x0012   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       0
206 Unknown_SSD_Attribute   0x0032   200   200   000    Old_age   Always       -       7
207 Unknown_SSD_Attribute   0x0032   200   200   000    Old_age   Always       -       72
208 Unknown_SSD_Attribute   0x0032   200   200   000    Old_age   Always       -       27
231 Unknown_SSD_Attribute   0x0023   098   098   005    Pre-fail  Always       -       2
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       11327
234 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       17780
241 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       1857
242 Total_LBAs_Read         0x0032   100   100   000    Old_age   Always       -       8147
243 Unknown_Attribute       0x0032   050   050   000    Old_age   Always       -       38

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      1402         -
# 2  Short offline       Completed without error       00%      1343         -
# 3  Extended offline    Completed without error       00%      1295         -
# 4  Extended offline    Completed without error       00%      1228         -
# 5  Short offline       Completed without error       00%      1215         -
# 6  Extended offline    Completed without error       00%      1108         -
# 7  Short offline       Completed without error       00%      1060         -
# 8  Short offline       Completed without error       00%       894         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
  128        0    65535  Read_scanning was completed without error
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
Code:
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.79+truenas] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 2 minutes for test to complete.
Test will complete after Wed Feb 15 16:57:47 2023 EET
Use smartctl -X to abort test.

root@truenas[~]# smartctl -a /dev/sde
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.79+truenas] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     SPCC Solid State Disk
Serial Number:    30083933959
LU WWN Device Id: 5 000000 000001202
Firmware Version: 030fAA20
User Capacity:    512,110,190,592 bytes [512 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available, deterministic, zeroed
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-4 (minor revision not indicated)
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Wed Feb 15 16:58:10 2023 EET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x02) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (   33) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (   2) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x0031) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 20
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032   100   100   050    Old_age   Always       -       0
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       1393
 12 Power_Cycle_Count       0x0012   100   100   000    Old_age   Always       -       90
167 Unknown_Attribute       0x0022   100   100   000    Old_age   Always       -       0
168 Unknown_Attribute       0x0012   100   100   000    Old_age   Always       -       0
169 Unknown_Attribute       0x0013   100   100   010    Pre-fail  Always       -       0
170 Unknown_Attribute       0x0033   100   100   010    Pre-fail  Always       -       114
171 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
172 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
174 Unknown_Attribute       0x0022   100   100   000    Old_age   Always       -       0
175 Program_Fail_Count_Chip 0x0033   100   100   010    Pre-fail  Always       -       0
177 Wear_Leveling_Count     0x0012   100   100   000    Old_age   Always       -       0
180 Unused_Rsvd_Blk_Cnt_Tot 0x0033   100   100   000    Pre-fail  Always       -       114
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0033   100   100   090    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   000   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0012   100   100   000    Old_age   Always       -       90
194 Temperature_Celsius     0x0022   040   040   000    Old_age   Always       -       40 (Min/Max 40/40)
196 Reallocated_Event_Count 0x0012   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       0
206 Unknown_SSD_Attribute   0x0032   200   200   000    Old_age   Always       -       7
207 Unknown_SSD_Attribute   0x0032   200   200   000    Old_age   Always       -       63
208 Unknown_SSD_Attribute   0x0032   200   200   000    Old_age   Always       -       25
231 Unknown_SSD_Attribute   0x0023   098   098   005    Pre-fail  Always       -       2
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       8492
234 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       10539
241 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       1899
242 Total_LBAs_Read         0x0032   100   100   000    Old_age   Always       -       7407
243 Unknown_Attribute       0x0032   050   050   000    Old_age   Always       -       37

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      1393         -
# 2  Short offline       Completed without error       00%      1331         -
# 3  Extended offline    Completed without error       00%      1283         -
# 4  Extended offline    Completed without error       00%      1276         -
# 5  Short offline       Completed without error       00%      1203         -
# 6  Extended offline    Completed without error       00%      1096         -
# 7  Short offline       Completed without error       00%      1048         -
# 8  Extended offline    Completed without error       00%      1014         -
# 9  Extended offline    Completed without error       00%      1014         -
#10  Short offline       Completed without error       00%       882         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
  128        0    65535  Read_scanning was completed without error
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
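To make the comparison between the alert and the attribute table easier to repeat, here is a minimal Python sketch that parses an alert line and the attribute rows from saved `smartctl -a` text. It is only an illustration: the function names (`parse_alert`, `parse_attributes`, `pending_mismatch`) and the watch list of attribute IDs are assumptions made for this sketch, not anything TrueNAS or smartmontools provides, and the sample rows are copied from the /dev/sdi output above.

```python
import re

# Attribute IDs often watched for media or link problems.
# This list is an assumption for the sketch, not an official recommendation.
WATCH_IDS = (5, 187, 196, 197, 199)

# Matches alert lines such as:
# "Device: /dev/sdi [SAT], 2 Currently unreadable (pending) sectors."
ALERT_RE = re.compile(
    r"Device: (\S+) \[SAT\], (\d+) Currently unreadable \(pending\) sectors"
)

# Matches one row of the "Vendor Specific SMART Attributes" table:
# ID, name, flag, value, worst, thresh, type, updated, when_failed, raw.
ATTR_RE = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]{4}\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+(\S+)\s+(\S+)\s+(\S+)\s+(\d+)"
)


def parse_alert(line):
    """Return (device, pending_count) from an alert line, or None."""
    m = ALERT_RE.search(line)
    return (m.group(1), int(m.group(2))) if m else None


def parse_attributes(text):
    """Return {attribute_id: (name, raw_value)} for every table row found."""
    attrs = {}
    for line in text.splitlines():
        m = ATTR_RE.match(line)
        if m:
            attrs[int(m.group(1))] = (m.group(2), int(m.group(9)))
    return attrs


def pending_mismatch(alert_count, attrs):
    """True if attribute 197 is present and its raw value differs from the alert."""
    if 197 not in attrs:
        return False
    return attrs[197][1] != alert_count


# A few rows copied from the /dev/sdi output in this post.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0032   100   100   050    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   000   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   040   040   000    Old_age   Always       -       40 (Min/Max 40/40)
196 Reallocated_Event_Count 0x0012   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       0
"""

if __name__ == "__main__":
    alert = parse_alert(
        "Device: /dev/sdi [SAT], 2 Currently unreadable (pending) sectors."
    )
    attrs = parse_attributes(SAMPLE)
    for attr_id in WATCH_IDS:
        if attr_id in attrs:
            name, raw = attrs[attr_id]
            print(f"{attr_id:>3} {name}: raw={raw}")
    print("alert and attribute 197 disagree:", pending_mismatch(alert[1], attrs))
```

With the values posted here, the alert reports 2 pending sectors while attribute 197 has a raw value of 0, which is exactly the disagreement I can't explain.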
And this is where I got very confused: everywhere it says "No Errors Logged" and "Completed without error".
I built the server at Christmas:
ProLiant ML350e Gen8 v2
HP H220 (SAS2308_1(D1) 20.00.07.00 14.01.30.16 07.39.02.00)
2*Intel(R) Xeon(R) CPU E5-2450 v2 @ 2.50GHz
72GB RAM
8 brand new SSDs for data and 1 for boot
TrueNAS-SCALE-22.12.0
Now on to the questions I need help with:
1 - What should I do about the "Currently unreadable (pending) sectors" warnings?
2 - Which of all the information from the SMART test should I watch and follow over time?
3 - This one is a suggestion to the developers. Could the OS read the full output after a scheduled S.M.A.R.T. task runs and show it under S.M.A.R.T. Test Results in the web GUI, so that we simple users see a simplified result that still includes everything important? And, if possible, add a "button" to "fix" the problem.
Many users know how to use the shell, but many do not. I'm running commands I found on the forums and hoping they work for me too. It took me a while to figure out that some of them can't be run as admin and that I have to log in as root.
Sorry for my English, and thanks for your time!