Sudden Checksum Errors on Multiple Drives

Chris Tobey

Hello everyone,

I am having an issue where multiple drives have started showing checksum errors within the last 48 hours.

Setup:
  • 12 x 20 TB Seagate Exos X20 SAS drives in a 6 x 2-drive mirror vdev configuration.
  • 2 x 1.6 TB Intel P5800X
  • LSI 9300-8i with SuperMicro SAS backplane
  • Dual Intel Xeon E5-2620 v3 @ 2.40 GHz
  • 24 x 16 GB DDR4 ECC RAM
ZFS Configuration:
  • 6 x 2-drive mirror vdevs (Seagate 20 TB HDDs)
  • 2 cache (L2ARC) devices (Intel P5800X)
  • Current usage at 75%.
As mentioned, about 48 hours ago one drive started showing checksum errors. I promptly replaced it with a cold spare and started the resilvering process, which is currently on track to complete after about 4.5 days (!). While the resilver has been running, three additional drives have started reporting checksum errors, including one that was replaced for the same reason just a few weeks ago. The core server has been running well for 5+ years, but about two months ago I upgraded the drives from 12 x 16 TB in 2 x 6-drive RAID-Z2 vdevs to the current 12 x 20 TB in 6 x 2-drive mirror vdevs.
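
For reference, the resilver estimate can be sanity-checked from the figures in the zpool status output further down (80.8T total, 37.3T issued at 223M/s). This is only a rough sketch: it assumes zpool prints binary units (T = TiB, M = MiB) and that the issue rate stays constant, and the function names are purely illustrative.

```python
# Rough check of the resilver ETA from the zpool status figures.
# Assumption: zpool prints binary units (T = TiB, M = MiB).
TIB = 1024 ** 4
MIB = 1024 ** 2


def remaining_seconds(total_tib, issued_tib, rate_mib_s):
    """Seconds left if the issue rate stays constant (an assumption)."""
    remaining_bytes = (total_tib - issued_tib) * TIB
    return remaining_bytes / (rate_mib_s * MIB)


def format_duration(seconds):
    """Format a number of seconds as 'D days HH:MM'."""
    days, rem = divmod(int(seconds), 86400)
    hours, rem = divmod(rem, 3600)
    return f"{days} days {hours:02d}:{rem // 60:02d}"


# Figures taken from the "scan:" section of the zpool status output.
eta = remaining_seconds(total_tib=80.8, issued_tib=37.3, rate_mib_s=223)
print(format_duration(eta))
```

This comes out at a little over two days remaining, in line with the "to go" figure zpool reports. Added to the roughly 48 hours already elapsed, that is where the 4.5-day total comes from.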

SMART has not reported any issues.
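
Since these are SAS drives, `smartctl -a` does not print an ATA attribute table, so the fields I am looking at are the health status, the grown defect list, the non-medium error count, and the "Total uncorrected errors" column. Below is a minimal sketch of how those could be pulled out of the text for a quick comparison across drives. The function name `summarize_smartctl` is hypothetical, and the sample is abridged from the /dev/da20 output further down.

```python
import re


def summarize_smartctl(text):
    """Pull a few health indicators out of `smartctl -a` text for a SAS drive."""
    summary = {}
    patterns = {
        "health": r"SMART Health Status:\s*(\S+)",
        "grown_defects": r"Elements in grown defect list:\s*(\d+)",
        "non_medium_errors": r"Non-medium error count:\s*(\d+)",
    }
    for key, pattern in patterns.items():
        match = re.search(pattern, text)
        if match:
            value = match.group(1)
            summary[key] = int(value) if value.isdigit() else value
    # The last column of the read:/write: rows is "Total uncorrected errors".
    for op in ("read", "write"):
        row = re.search(rf"^{op}:\s+(.*)$", text, re.MULTILINE)
        if row:
            summary[f"{op}_uncorrected"] = int(row.group(1).split()[-1])
    return summary


# Abridged from the /dev/da20 output in this post.
sample = """\
SMART Health Status: OK
Elements in grown defect list: 0
read:          0        0         0         0          0      69561.430           0
write:         0        0         0         0          0      70870.866           0
Non-medium error count:        1
"""
print(summarize_smartctl(sample))
```

To run it against a live drive, the text could come from something like `subprocess.run(["smartctl", "-a", "/dev/da20"], capture_output=True, text=True).stdout`.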

zpool status:
Code:
# zpool status SG2
  pool: SG2
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
    continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sat Nov  4 12:43:56 2023
    38.5T scanned at 230M/s, 37.3T issued at 223M/s, 80.8T total
    5.60T resilvered, 46.23% done, 2 days 08:45:00 to go
config:

    NAME                                            STATE     READ WRITE CKSUM
    SG2                                             DEGRADED     0     0     0
      mirror-0                                      DEGRADED     0     0     0
        gptid/334033e5-6138-11ee-930b-000743123c30  DEGRADED     0     0   297  too many errors
        gptid/a90c314e-442c-11ee-a112-3cecef580736  ONLINE       0     0     0
      mirror-1                                      ONLINE       0     0     0
        gptid/a832a9f6-442c-11ee-a112-3cecef580736  ONLINE       0     0     0
        gptid/a828e09d-442c-11ee-a112-3cecef580736  ONLINE       0     0     0
      mirror-2                                      ONLINE       0     0     0
        gptid/a8190bc6-442c-11ee-a112-3cecef580736  ONLINE       0     0     0
        gptid/1713afdb-651b-11ee-930b-000743123c30  ONLINE       0     0     9
      mirror-3                                      DEGRADED     0     0     0
        gptid/a8a0c4ce-442c-11ee-a112-3cecef580736  ONLINE       0     0     0
        gptid/a8c27bbf-442c-11ee-a112-3cecef580736  DEGRADED     0     0   672  too many errors
      mirror-4                                      ONLINE       0     0     0
        gptid/a82139f3-442c-11ee-a112-3cecef580736  ONLINE       0     0     0
        gptid/a7ec34ad-442c-11ee-a112-3cecef580736  ONLINE       0     0     0
      mirror-5                                      DEGRADED     0     0     0
        gptid/a7f4a3d3-442c-11ee-a112-3cecef580736  DEGRADED     0     0   244  too many errors
        gptid/5688f682-7b31-11ee-930b-000743123c30  ONLINE       0     0     0  (resilvering)
    cache
      gptid/bd027deb-4b92-11ee-930b-000743123c30    ONLINE       0     0     0
      gptid/bd5e92c6-4b92-11ee-930b-000743123c30    ONLINE       0     0     0

errors: No known data errors
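
To make the affected devices easier to pick out, here is a rough sketch that scans `zpool status` text and lists every leaf device with a non-zero CKSUM count, along with the vdev it belongs to. The function name `devices_with_checksum_errors` is hypothetical, and the parsing assumes the plain column layout shown above; counts that zpool abbreviates (for example with a K suffix) are skipped.

```python
def devices_with_checksum_errors(status_text):
    """Return (vdev, device, cksum) for each leaf device with CKSUM > 0."""
    results = []
    current_vdev = None
    in_config = False
    for line in status_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "NAME":        # header row of the config table
            in_config = True
            continue
        if fields[0] == "errors:":     # end of the config table
            break
        if not in_config:
            continue
        if len(fields) == 1:           # section label such as "cache"
            current_vdev = fields[0]
            continue
        if len(fields) < 5:
            continue
        name, cksum = fields[0], fields[4]
        if name.startswith(("mirror-", "raidz")):
            current_vdev = name
        elif cksum.isdigit() and int(cksum) > 0:
            results.append((current_vdev, name, int(cksum)))
    return results


# Excerpt of the output above.
sample_status = """\
    NAME                                            STATE     READ WRITE CKSUM
    SG2                                             DEGRADED     0     0     0
      mirror-0                                      DEGRADED     0     0     0
        gptid/334033e5-6138-11ee-930b-000743123c30  DEGRADED     0     0   297  too many errors
        gptid/a90c314e-442c-11ee-a112-3cecef580736  ONLINE       0     0     0
      mirror-2                                      ONLINE       0     0     0
        gptid/a8190bc6-442c-11ee-a112-3cecef580736  ONLINE       0     0     0
        gptid/1713afdb-651b-11ee-930b-000743123c30  ONLINE       0     0     9
errors: No known data errors
"""
for vdev, device, cksum in devices_with_checksum_errors(sample_status):
    print(vdev, device, cksum)
```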


smartctl -a:
Code:
root@myserver:~ # smartctl -a /dev/da12
smartctl 7.2 2020-12-30 r5155 [FreeBSD 12.2-RELEASE-p12 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               SEAGATE
Product:              ST20000NM002D
Revision:             E003
Compliance:           SPC-5
User Capacity:        20,000,588,955,648 bytes [20.0 TB]
Logical block size:   512 bytes
Physical block size:  4096 bytes
LU is fully provisioned
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000c500daf77fd3
Serial number:        ZVTAYHB80000xxxxxxxx
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Mon Nov  6 12:32:13 2023 EST
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Grown defects during certification <not available>
Total blocks reassigned during format <not available>
Total new blocks reassigned <not available>
Power on minutes since format <not available>
Current Drive Temperature:     38 C
Drive Trip Temperature:        60 C

Accumulated power on time, hours:minutes 1729:29
Manufactured in week 17 of year 2023
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  5
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  149
Elements in grown defect list: 0

Vendor (Seagate Cache) information
  Blocks sent to initiator = 2910781000
  Blocks received from initiator = 1322596240
  Blocks read from cache and sent to initiator = 2694736860
  Number of read and write commands whose size <= segment size = 210477695
  Number of read and write commands whose size > segment size = 2626177

Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 1729.48
  number of minutes until next internal SMART test = 23

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0          0      96048.320           0
write:         0        0         0         0          0      71049.596           0

Non-medium error count:        0


[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background short  Completed                   -    1692                 - [-   -    -]

Long (extended) Self-test duration: 65535 seconds [1092.2 minutes]

root@myserver:~ # smartctl -a /dev/da13
smartctl 7.2 2020-12-30 r5155 [FreeBSD 12.2-RELEASE-p12 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               SEAGATE
Product:              ST20000NM002D
Revision:             E003
Compliance:           SPC-5
User Capacity:        20,000,588,955,648 bytes [20.0 TB]
Logical block size:   512 bytes
Physical block size:  4096 bytes
LU is fully provisioned
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000c500daf78bd3
Serial number:        ZVTAYGHG0000xxxxxxxx
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Mon Nov  6 12:32:17 2023 EST
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Grown defects during certification <not available>
Total blocks reassigned during format <not available>
Total new blocks reassigned <not available>
Power on minutes since format <not available>
Current Drive Temperature:     38 C
Drive Trip Temperature:        60 C

Accumulated power on time, hours:minutes 842:24
Manufactured in week 17 of year 2023
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  6
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  38
Elements in grown defect list: 0

Vendor (Seagate Cache) information
  Blocks sent to initiator = 4251837784
  Blocks received from initiator = 1819054296
  Blocks read from cache and sent to initiator = 1607436120
  Number of read and write commands whose size <= segment size = 128029947
  Number of read and write commands whose size > segment size = 1240881

Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 842.40
  number of minutes until next internal SMART test = 41

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0          0      30764.244           0
write:         0        0         0         0          0      53709.374           0

Non-medium error count:        0


[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background short  Completed                   -     805                 - [-   -    -]

Long (extended) Self-test duration: 65535 seconds [1092.2 minutes]

root@myserver:~ # smartctl -a /dev/da14
smartctl 7.2 2020-12-30 r5155 [FreeBSD 12.2-RELEASE-p12 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               SEAGATE
Product:              ST20000NM002D
Revision:             E003
Compliance:           SPC-5
User Capacity:        20,000,588,955,648 bytes [20.0 TB]
Logical block size:   512 bytes
Physical block size:  4096 bytes
LU is fully provisioned
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000c500daf911ab
Serial number:        ZVTAYVT70000xxxxxxxx
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Mon Nov  6 12:32:21 2023 EST
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Grown defects during certification <not available>
Total blocks reassigned during format <not available>
Total new blocks reassigned <not available>
Power on minutes since format <not available>
Current Drive Temperature:     38 C
Drive Trip Temperature:        60 C

Accumulated power on time, hours:minutes 1729:26
Manufactured in week 16 of year 2023
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  5
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  147
Elements in grown defect list: 0

Vendor (Seagate Cache) information
  Blocks sent to initiator = 1602118672
  Blocks received from initiator = 3962131928
  Blocks read from cache and sent to initiator = 825043650
  Number of read and write commands whose size <= segment size = 200970048
  Number of read and write commands whose size > segment size = 2607053

Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 1729.43
  number of minutes until next internal SMART test = 18

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0          0      73388.052           0
write:         0        0         0         0          0      72400.889           0

Non-medium error count:        0


[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background short  Completed                   -    1692                 - [-   -    -]

Long (extended) Self-test duration: 65535 seconds [1092.2 minutes]

root@myserver:~ # smartctl -a /dev/da15
smartctl 7.2 2020-12-30 r5155 [FreeBSD 12.2-RELEASE-p12 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               SEAGATE
Product:              ST20000NM002D
Revision:             E003
Compliance:           SPC-5
User Capacity:        20,000,588,955,648 bytes [20.0 TB]
Logical block size:   512 bytes
Physical block size:  4096 bytes
LU is fully provisioned
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000c500daf87b97
Serial number:        ZVT8F5AX0000xxxxxxxx
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Mon Nov  6 12:32:24 2023 EST
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Grown defects during certification <not available>
Total blocks reassigned during format <not available>
Total new blocks reassigned <not available>
Power on minutes since format <not available>
Current Drive Temperature:     36 C
Drive Trip Temperature:        60 C

Accumulated power on time, hours:minutes 1729:24
Manufactured in week 17 of year 2023
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  5
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  148
Elements in grown defect list: 0

Vendor (Seagate Cache) information
  Blocks sent to initiator = 1936199296
  Blocks received from initiator = 2216669176
  Blocks read from cache and sent to initiator = 1129056924
  Number of read and write commands whose size <= segment size = 202213693
  Number of read and write commands whose size > segment size = 2601382

Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 1729.40
  number of minutes until next internal SMART test = 16

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0          0      73559.103           0
write:         0        0         0         0          0      71507.190           0

Non-medium error count:        0


[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background short  Completed                   -    1692                 - [-   -    -]

Long (extended) Self-test duration: 65535 seconds [1092.2 minutes]

root@myserver:~ # smartctl -a /dev/da16
smartctl 7.2 2020-12-30 r5155 [FreeBSD 12.2-RELEASE-p12 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               SEAGATE
Product:              ST20000NM002D
Revision:             E003
Compliance:           SPC-5
User Capacity:        20,000,588,955,648 bytes [20.0 TB]
Logical block size:   512 bytes
Physical block size:  4096 bytes
LU is fully provisioned
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000c500daf78ea3
Serial number:        ZVTAYGDX0000xxxxxxxx
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Mon Nov  6 12:32:26 2023 EST
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Grown defects during certification <not available>
Total blocks reassigned during format <not available>
Total new blocks reassigned <not available>
Power on minutes since format <not available>
Current Drive Temperature:     33 C
Drive Trip Temperature:        60 C

Accumulated power on time, hours:minutes 723:40
Manufactured in week 17 of year 2023
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  5
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  34
Elements in grown defect list: 0

Vendor (Seagate Cache) information
  Blocks sent to initiator = 201456488
  Blocks received from initiator = 255642480
  Blocks read from cache and sent to initiator = 1267786971
  Number of read and write commands whose size <= segment size = 126315442
  Number of read and write commands whose size > segment size = 1156435

Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 723.67
  number of minutes until next internal SMART test = 12

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0          0      26491.425           0
write:         0        0         0         0          0      59506.068           0

Non-medium error count:        0


[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background short  Completed                   -     686                 - [-   -    -]

Long (extended) Self-test duration: 65535 seconds [1092.2 minutes]

root@myserver:~ # smartctl -a /dev/da17
smartctl 7.2 2020-12-30 r5155 [FreeBSD 12.2-RELEASE-p12 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               SEAGATE
Product:              ST20000NM002D
Revision:             E003
Compliance:           SPC-5
User Capacity:        20,000,588,955,648 bytes [20.0 TB]
Logical block size:   512 bytes
Physical block size:  4096 bytes
LU is fully provisioned
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000c500daf745eb
Serial number:        ZVTAY4450000xxxxxxxx
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Mon Nov  6 12:32:30 2023 EST
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Grown defects during certification <not available>
Total blocks reassigned during format <not available>
Total new blocks reassigned <not available>
Power on minutes since format <not available>
Current Drive Temperature:     29 C
Drive Trip Temperature:        60 C

Accumulated power on time, hours:minutes 1729:29
Manufactured in week 17 of year 2023
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  5
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  148
Elements in grown defect list: 0

Vendor (Seagate Cache) information
  Blocks sent to initiator = 2111045336
  Blocks received from initiator = 2949025560
  Blocks read from cache and sent to initiator = 2376556045
  Number of read and write commands whose size <= segment size = 207699753
  Number of read and write commands whose size > segment size = 2621357

Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 1729.48
  number of minutes until next internal SMART test = 29

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0          0      97837.878           0
write:         0        0         0         0          0      71882.400           0

Non-medium error count:        0


[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background short  Completed                   -    1692                 - [-   -    -]

Long (extended) Self-test duration: 65535 seconds [1092.2 minutes]

root@myserver:~ # smartctl -a /dev/da18
smartctl 7.2 2020-12-30 r5155 [FreeBSD 12.2-RELEASE-p12 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               SEAGATE
Product:              ST20000NM002D
Revision:             E003
Compliance:           SPC-5
User Capacity:        20,000,588,955,648 bytes [20.0 TB]
Logical block size:   512 bytes
Physical block size:  4096 bytes
LU is fully provisioned
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000c500daf754a7
Serial number:        ZVTAY3X20000xxxxxxxx
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Mon Nov  6 12:32:32 2023 EST
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Grown defects during certification <not available>
Total blocks reassigned during format <not available>
Total new blocks reassigned <not available>
Power on minutes since format <not available>
Current Drive Temperature:     37 C
Drive Trip Temperature:        60 C

Accumulated power on time, hours:minutes 1729:28
Manufactured in week 17 of year 2023
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  6
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  146
Elements in grown defect list: 0

Vendor (Seagate Cache) information
  Blocks sent to initiator = 2043826408
  Blocks received from initiator = 3994883376
  Blocks read from cache and sent to initiator = 1006833027
  Number of read and write commands whose size <= segment size = 193128078
  Number of read and write commands whose size > segment size = 2552151

Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 1729.47
  number of minutes until next internal SMART test = 16

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0          0      69216.160           0
write:         0        0         0         0          0      70218.339           0

Non-medium error count:        0


[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background short  Completed                   -    1692                 - [-   -    -]

Long (extended) Self-test duration: 65535 seconds [1092.2 minutes]

root@myserver:~ # smartctl -a /dev/da19
smartctl 7.2 2020-12-30 r5155 [FreeBSD 12.2-RELEASE-p12 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               SEAGATE
Product:              ST20000NM002D
Revision:             E003
Compliance:           SPC-5
User Capacity:        20,000,588,955,648 bytes [20.0 TB]
Logical block size:   512 bytes
Physical block size:  4096 bytes
LU is fully provisioned
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000c500daf4186f
Serial number:        ZVTAXDN10000xxxxxxxx
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Mon Nov  6 12:32:34 2023 EST
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Grown defects during certification <not available>
Total blocks reassigned during format <not available>
Total new blocks reassigned <not available>
Power on minutes since format <not available>
Current Drive Temperature:     38 C
Drive Trip Temperature:        60 C

Accumulated power on time, hours:minutes 1729:27
Manufactured in week 17 of year 2023
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  5
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  148
Elements in grown defect list: 0

Vendor (Seagate Cache) information
  Blocks sent to initiator = 4117791248
  Blocks received from initiator = 992147008
  Blocks read from cache and sent to initiator = 923461280
  Number of read and write commands whose size <= segment size = 196632654
  Number of read and write commands whose size > segment size = 2571121

Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 1729.45
  number of minutes until next internal SMART test = 17

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0          0      70278.030           0
write:         0        0         0         0          0      70880.192           0

Non-medium error count:        0


[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background short  Completed                   -    1692                 - [-   -    -]

Long (extended) Self-test duration: 65535 seconds [1092.2 minutes]

root@myserver:~ # smartctl -a /dev/da20
smartctl 7.2 2020-12-30 r5155 [FreeBSD 12.2-RELEASE-p12 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               SEAGATE
Product:              ST20000NM002D
Revision:             E003
Compliance:           SPC-5
User Capacity:        20,000,588,955,648 bytes [20.0 TB]
Logical block size:   512 bytes
Physical block size:  4096 bytes
LU is fully provisioned
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000c500daf749df
Serial number:        ZVTAY4070000xxxxxxxx
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Mon Nov  6 12:32:38 2023 EST
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Grown defects during certification <not available>
Total blocks reassigned during format <not available>
Total new blocks reassigned <not available>
Power on minutes since format <not available>
Current Drive Temperature:     38 C
Drive Trip Temperature:        60 C

Accumulated power on time, hours:minutes 1729:28
Manufactured in week 17 of year 2023
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  5
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  148
Elements in grown defect list: 0

Vendor (Seagate Cache) information
  Blocks sent to initiator = 2718180616
  Blocks received from initiator = 974249776
  Blocks read from cache and sent to initiator = 1088085026
  Number of read and write commands whose size <= segment size = 197617999
  Number of read and write commands whose size > segment size = 2568509

Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 1729.47
  number of minutes until next internal SMART test = 22

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0          0      69561.430           0
write:         0        0         0         0          0      70870.866           0

Non-medium error count:        1


[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background short  Completed                   -    1692                 - [-   -    -]

Long (extended) Self-test duration: 65535 seconds [1092.2 minutes]

root@myserver:~ # smartctl -a /dev/da21
smartctl 7.2 2020-12-30 r5155 [FreeBSD 12.2-RELEASE-p12 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               SEAGATE
Product:              ST20000NM002D
Revision:             E003
Compliance:           SPC-5
User Capacity:        20,000,588,955,648 bytes [20.0 TB]
Logical block size:   512 bytes
Physical block size:  4096 bytes
LU is fully provisioned
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000c500daf784ef
Serial number:        ZVTAYGY10000xxxxxxxx
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Mon Nov  6 12:32:41 2023 EST
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Grown defects during certification <not available>
Total blocks reassigned during format <not available>
Total new blocks reassigned <not available>
Power on minutes since format <not available>
Current Drive Temperature:     36 C
Drive Trip Temperature:        60 C

Accumulated power on time, hours:minutes 1729:27
Manufactured in week 17 of year 2023
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  5
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  147
Elements in grown defect list: 0

Vendor (Seagate Cache) information
  Blocks sent to initiator = 2861377536
  Blocks received from initiator = 1476096152
  Blocks read from cache and sent to initiator = 937816978
  Number of read and write commands whose size <= segment size = 197297353
  Number of read and write commands whose size > segment size = 2571602

Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 1729.45
  number of minutes until next internal SMART test = 21

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0          0      69634.746           0
write:         0        0         0         0          0      71127.864           0

Non-medium error count:        0


[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background short  Completed                   -    1692                 - [-   -    -]

Long (extended) Self-test duration: 65535 seconds [1092.2 minutes]

root@myserver:~ # smartctl -a /dev/da22
smartctl 7.2 2020-12-30 r5155 [FreeBSD 12.2-RELEASE-p12 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               SEAGATE
Product:              ST20000NM002D
Revision:             E003
Compliance:           SPC-5
User Capacity:        20,000,588,955,648 bytes [20.0 TB]
Logical block size:   512 bytes
Physical block size:  4096 bytes
LU is fully provisioned
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000c500da81a09f
Serial number:        ZVT6VLW50000xxxxxxxx
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Mon Nov  6 12:32:44 2023 EST
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Grown defects during certification <not available>
Total blocks reassigned during format <not available>
Total new blocks reassigned <not available>
Power on minutes since format <not available>
Current Drive Temperature:     35 C
Drive Trip Temperature:        60 C

Accumulated power on time, hours:minutes 1980:16
Manufactured in week 02 of year 2023
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  7
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  5348
Elements in grown defect list: 0

Vendor (Seagate Cache) information
  Blocks sent to initiator = 1079430840
  Blocks received from initiator = 1356389904
  Blocks read from cache and sent to initiator = 2387780025
  Number of read and write commands whose size <= segment size = 106097233
  Number of read and write commands whose size > segment size = 1442569

Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 1980.27
  number of minutes until next internal SMART test = 51

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0          0      62125.320           0
write:         0        0         0         0          0      38080.250           0

Non-medium error count:        0


[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background short  Completed                   -    1943                 - [-   -    -]

Long (extended) Self-test duration: 65535 seconds [1092.2 minutes]

root@myserver:~ # smartctl -a /dev/da23
smartctl 7.2 2020-12-30 r5155 [FreeBSD 12.2-RELEASE-p12 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               SEAGATE
Product:              ST20000NM002D
Revision:             E003
Compliance:           SPC-5
User Capacity:        20,000,588,955,648 bytes [20.0 TB]
Logical block size:   512 bytes
Physical block size:  4096 bytes
LU is fully provisioned
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000c500dadc7b2f
Serial number:        ZVTAJPMZ0000xxxxxxxx
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Mon Nov  6 12:32:48 2023 EST
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Grown defects during certification <not available>
Total blocks reassigned during format <not available>
Total new blocks reassigned <not available>
Power on minutes since format <not available>
Current Drive Temperature:     30 C
Drive Trip Temperature:        60 C

Accumulated power on time, hours:minutes 1729:27
Manufactured in week 17 of year 2023
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  7
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  148
Elements in grown defect list: 0

Vendor (Seagate Cache) information
  Blocks sent to initiator = 1749031712
  Blocks received from initiator = 2526037000
  Blocks read from cache and sent to initiator = 1299015920
  Number of read and write commands whose size <= segment size = 178983741
  Number of read and write commands whose size > segment size = 2288797

Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 1729.45
  number of minutes until next internal SMART test = 15

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0          0      71264.249           0
write:         0        0         0         0          0      60670.116           0

Non-medium error count:        0


[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background short  Completed                   -    1699                 - [-   -    -]


Long (extended) Self-test duration: 65535 seconds [1092.2 minutes]


How concerned should one be with these errors?
Do I let the resilver complete and do a zpool clear?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Check airflow over the HBA and SAS expander. This is one of the most common failures causing checksum errors, because an HBA with restricted airflow can bake its little brains out.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Not for the 2008, but for the 3008 you should be able to run "mprutil show adapter" or some variation on that. These chips naturally run really warm, which makes them particularly sensitive to poor airflow, so you really need to pull the server apart and inspect and/or test for airflow. Your drive changeout may also have increased the temperatures. I believe there was a thread around here discussing the TrueNAS Mini not having certified the 20 TB drives due to the higher temperatures they seem to come with.
 

Chris Tobey

Contributor
Joined
Feb 11, 2014
Messages
114
The load is quite high at the moment because the pool is resilvering, and the HBA is currently at 66 °C, which seems reasonable to me.
Code:
# mprutil show adapter
mpr0 Adapter:
       Board Name: SAS9300-8i
   Board Assembly: H3-25573-00H
        Chip Name: LSISAS3008
    Chip Revision: ALL
    BIOS Revision: 8.21.00.00
Firmware Revision: 9.00.00.00
  Integrated RAID: no
         SATA NCQ: ENABLED
 PCIe Width/Speed: x8 (8.0 GB/sec)
        IOC Speed: Full
      Temperature: 66 C

PhyNum  CtlrHandle  DevHandle  Disabled  Speed   Min    Max    Device
0       0001        0009       N         12      3.0    12     SAS Initiator
1       0001        0009       N         12      3.0    12     SAS Initiator
2       0001        0009       N         12      3.0    12     SAS Initiator
3       0001        0009       N         12      3.0    12     SAS Initiator
4       0001        0009       N         12      3.0    12     SAS Initiator
5       0001        0009       N         12      3.0    12     SAS Initiator
6       0001        0009       N         12      3.0    12     SAS Initiator
7       0001        0009       N         12      3.0    12     SAS Initiator 
 

Chris Tobey

Contributor
Joined
Feb 11, 2014
Messages
114
I set up a script to print the date and temperature every second, and it has been hovering around 65-67 °C over the past 20 hours.
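For anyone who wants to do the same, here is a minimal sketch of that kind of logging loop. It is not the exact script used here: the function names `parse_temp` and `log_temp` and the `run` argument are made up for illustration, and it assumes the `Temperature:` line appears in the `mprutil show adapter` output in the form shown earlier in the thread.

```shell
#!/bin/sh
# Sketch only: parse_temp, log_temp and the "run" argument are illustrative names.

# Read `mprutil show adapter` output on stdin and print the numeric
# value of the "Temperature:" line (e.g. "66" for "Temperature: 66 C").
parse_temp() {
    awk -F': *' '/Temperature:/ { sub(/ *C *$/, "", $2); print $2; exit }'
}

# Print a timestamp and the HBA temperature every INTERVAL seconds (default 1).
log_temp() {
    interval="${1:-1}"
    while :; do
        temp=$(mprutil show adapter | parse_temp)
        printf '%s %s C\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$temp"
        sleep "$interval"
    done
}

# Start logging only when invoked as: ./hba_temp.sh run [interval]
if [ "${1:-}" = "run" ]; then
    log_temp "${2:-1}"
fi
```

Redirecting the output to a file (for example `./hba_temp.sh run 1 >> hba_temp.log`, where the script name is also just a placeholder) keeps a record that can be lined up against the times the checksum errors appear.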
 