The short version:
I did a long test on these drives, but got no obvious (to novice me...) errors. Perhaps I'm missing something and/or don't know how to read the results.
The longer version:
I recently built two identical FreeNAS servers.
-HP DL380e G8, LFF x12 on SAS2 backplane w/ dual connectors
-96GB RAM
-Currently six 4TB HGST SATA drives (in the server itself), soon to be replaced by Seagate Exos 10TB SATA drives.
-Currently a pair of HP/LSI 9217-4i4e HBAs
- - I have tried other cards before, like the nearly identical 9207-4i4e, and others including an HP P822 in HBA mode.
- - In the current setup, the internal port of each 9217 is connected to the server's SAS2 backplane, and the external port of each is plugged into an HP D2700 SFF x25 SAS enclosure.
- - - This enclosure currently holds six Crucial MX500 1000GB SSDs, two of which show the "1 Currently unreadable (pending) sectors" message.
- - - Earlier I tried one card with two internal ports, both connected to the backplane, plus one card with two external ports, each connected to a different controller on the same enclosure. FreeNAS (and Windows when I had it installed for testing, and even HP's bootable RAID/HBA utility) didn't seem to notice or care about the second port connection, let alone use it for extra bandwidth, unless the primary one failed. Since I don't seem to be getting extra bandwidth/speed, I figure I'll just use dual HBAs across the same number of internal & external ports and at least gain controller redundancy (though not full multipathing, since the drives are SATA rather than SAS).
Some caveats:
-The HGST drives are mostly new or low-usage drives.
-The Crucial SSDs are brand new, which is why I find errors on so many of them suspicious.
-Neither of these servers is in production or has data on it yet. I can wipe and/or rebuild the pools or even the systems as needed, though time is becoming 'of the essence'.
-When I was trying the previous 9207 HBAs, I got many "Currently unreadable (pending) sectors" errors across random drives. When I changed the cards (to 9217) on both servers and wiped the drives, most of these went away and have yet to return. I suspected something funny about those specific cards.
-The 9207s & 9217s are using version 19 (as opposed to 20) of the IT-mode firmware. FreeNAS doesn't seem to mind. The reason I'm not using the latest version 20 (from Broadcom's website) is that during the POST sequence of both servers, the HBA would say something to the effect of 'Press Ctrl-C to enter the SAS BIOS config'. Doing so with v20 would crash the server with an NMI fault. v19 does not have this issue.
-My experience with hardware and FreeNAS in general (largely limited to the web GUI) is decent, but my understanding of SSH and the underlying OS is quite novice. That said, I figured out how to get these outputs, and I've seen other posts about this error, but none mention only one error per drive. I'm also not sure what to make of the lack of obvious error counts.
-I read one thread (can't find it now) that suggested this could be a false positive caused by having SMART tests and pool scrubs scheduled too closely together. Previously, I had a short test scheduled around midnight for ALL drives (mechanical, SSD, and all SSD SLOG & cache drives), plus once-a-week pool scrubs (one pool for mechanical data vdevs, the other pool for SSD data vdevs) starting around 1am. Perhaps these need to be separated much more. I only saw cyberjock's "Scrub and SMART testing schedules" article after this point.
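To make the scheduling question concrete, here is a minimal Python sketch that checks whether a SMART test window and a scrub window collide. The 2-minute short-test length is the recommended polling time smartctl reports for these SSDs; the 3-hour scrub length, the 8-hour long test, and the example date are purely hypothetical placeholders, not measurements from these servers.

```python
from datetime import datetime, timedelta

def windows_overlap(start_a, length_a, start_b, length_b):
    """Return True if [start_a, start_a+length_a) intersects [start_b, start_b+length_b)."""
    return start_a < start_b + length_b and start_b < start_a + length_a

# Arbitrary example date; only the times of day matter here.
night = datetime(2019, 6, 2)

# Short test around midnight, using the 2-minute polling time smartctl reports.
short_test = (night, timedelta(minutes=2))

# Scrub starting around 1am. The 3-hour length is a hypothetical assumption.
scrub = (night + timedelta(hours=1), timedelta(hours=3))

# Hypothetical long test on a large mechanical drive, assumed to take 8 hours.
slow_long_test = (night, timedelta(hours=8))

print(windows_overlap(*short_test, *scrub))      # short test vs. scrub
print(windows_overlap(*slow_long_test, *scrub))  # long test vs. scrub
```

Under these assumed lengths, a short test at midnight finishes well before a 1am scrub, while a multi-hour long test started at the same time would run into it. Real durations would have to come from the drives and pools themselves.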
Long test output of da0:
Code:
root@FreeNAS02[~]# smartctl -a /dev/da0
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.2-STABLE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model:     CT1000MX500SSD1
Serial Number:    1852E1E0B169
LU WWN Device Id: 5 00a075 1e1e0b169
Firmware Version: M3CR023
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Tue Jun 4 15:43:33 2019 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (    0) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  30) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x0031) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0032   100   100   010    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       820
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       53
171 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
172 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
173 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       2
174 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       17
180 Unused_Rsvd_Blk_Cnt_Tot 0x0033   000   000   000    Pre-fail  Always       -       45
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   071   055   000    Old_age   Always       -       29 (Min/Max 0/45)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
202 Unknown_SSD_Attribute   0x0030   100   100   001    Old_age   Offline      -       0
206 Unknown_SSD_Attribute   0x000e   100   100   000    Old_age   Always       -       0
210 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
246 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       4312042892
247 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       68557416
248 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       43355933
SMART Error Log Version: 1
Invalid Error Log index = 0x0d (T13/1321D rev 1c Section 8.41.6.8.2.2 gives valid range from 1 to 5)
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%       819         -
# 2  Short offline       Completed without error       00%       780         -
# 3  Short offline       Completed without error       00%       756         -
# 4  Short offline       Completed without error       00%       732         -
# 5  Short offline       Completed without error       00%       708         -
# 6  Short offline       Completed without error       00%       683         -
# 7  Short offline       Completed without error       00%       658         -
# 8  Short offline       Completed without error       00%       633         -
# 9  Short offline       Completed without error       00%       607         -
#10  Short offline       Completed without error       00%       582         -
#11  Short offline       Completed without error       00%       557         -
#12  Extended offline    Completed without error       00%       538         -
#13  Extended offline    Completed without error       00%       534         -
SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
root@FreeNAS02[~]#
Long test output of da2:
Code:
root@FreeNAS02[~]# smartctl -a /dev/da2
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.2-STABLE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model:     CT1000MX500SSD1
Serial Number:    1852E1E040A1
LU WWN Device Id: 5 00a075 1e1e040a1
Firmware Version: M3CR023
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Tue Jun 4 15:46:04 2019 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (    0) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  30) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x0031) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0032   100   100   010    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       836
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       46
171 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
172 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
173 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       2
174 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       17
180 Unused_Rsvd_Blk_Cnt_Tot 0x0033   000   000   000    Pre-fail  Always       -       42
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   065   056   000    Old_age   Always       -       35 (Min/Max 0/44)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
202 Unknown_SSD_Attribute   0x0030   100   100   001    Old_age   Offline      -       0
206 Unknown_SSD_Attribute   0x000e   100   100   000    Old_age   Always       -       0
210 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
246 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       4312202524
247 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       68559898
248 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       33340769
SMART Error Log Version: 1
Invalid Error Log index = 0x0d (T13/1321D rev 1c Section 8.41.6.8.2.2 gives valid range from 1 to 5)
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%       835         -
# 2  Short offline       Completed without error       00%       796         -
# 3  Short offline       Completed without error       00%       772         -
# 4  Short offline       Completed without error       00%       748         -
# 5  Short offline       Completed without error       00%       723         -
# 6  Short offline       Completed without error       00%       698         -
# 7  Short offline       Completed without error       00%       672         -
# 8  Short offline       Completed without error       00%       646         -
# 9  Short offline       Completed without error       00%       620         -
#10  Short offline       Completed without error       00%       594         -
#11  Short offline       Completed without error       00%       568         -
#12  Extended offline    Completed without error       00%       549         -
#13  Extended offline    Completed without error       00%       544         -
SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
root@FreeNAS02[~]#
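Since the attribute table is where the one pending sector actually shows up, here is a minimal Python sketch that pulls non-zero raw values for a handful of error-related attributes out of pasted smartctl text. The choice of watched IDs (5, 187, 197, 198, 199) is an assumption about which attributes matter here, not an official list, and the sample rows are copied from the da2 output above.

```python
import re

# Attribute IDs treated as "error-related" for this sketch (an assumption).
WATCHED_IDS = {5, 187, 197, 198, 199}

# One row of the smartctl attribute table:
# ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]{4}\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+(\S+)\s+(\S+)\s+(\S+)\s+(\d+)"
)

def nonzero_watched(smartctl_text):
    """Return {attribute_name: raw_value} for watched attributes with raw != 0."""
    found = {}
    for line in smartctl_text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # not an attribute row (headers, log entries, etc.)
        attr_id = int(match.group(1))
        raw_value = int(match.group(9))
        if attr_id in WATCHED_IDS and raw_value != 0:
            found[match.group(2)] = raw_value
    return found

# Rows copied from the da2 attribute table.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0032   100   100   010    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
"""

print(nonzero_watched(SAMPLE))
```

For the da2 rows this reports only Current_Pending_Sector with a raw value of 1, which matches the single pending-sector message while the self-test log itself shows no failed tests.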
Thanks in advance.
-Sam