smartctl attributes not displayed for SAS drives?

SMnasMAN · Apr 14, 2019

I too don't like the limited data that SAS drives give you vs SATA drives... Where is the "Enterprise" of SAS drives if I can't get (any) useful data out of them to tell if they are pre-fail?!

That said, I did find this great command, which on my HGST SAS drives gave me a tad more info (additionally, I'm kind of disappointed that smartctl -a, i.e. --all, is NOT really showing ALL the data possible on SAS drives, as it's missing the data below):

smartctl -l background /dev/da30

(lowercase "L") -- (actually, the command above is the only way I have found to get my HGST SAS drives' power-on hours (POH) via smartctl, so that is helpful)

If you want to see how useless SAS "smart" data is, below is the output of a SAS drive that is 100% failing....nice....

Code:

root@freenas:/mnt/new3uRDz5x3disks # smartctl --all /dev/da30
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.2-STABLE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               HITACHI
Product:              HUS723030ALS640
Revision:             A222
Compliance:           SPC-4
User Capacity:        3,000,592,982,016 bytes [3.00 TB]
Logical block size:   512 bytes
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000cca03e4afb04
Serial number:        YVHA7AED
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Sun Apr 14 20:45:59 2019 CDT
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     43 C
Drive Trip Temperature:        85 C

Manufactured in week 48 of year 2012
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  68
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  2883
Elements in grown defect list: 0

Vendor (Seagate) cache information
  Blocks sent to initiator = 20736093531930624

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0   269671         0    269671   29810891     560605.872           0
write:         0  1898285         0   1898285     136999     134366.471           0
verify:        0     1879         0      1879      49360       2796.920           0

Non-medium error count:        0

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background long   Completed                   -   44172                 - [-   -    -]
# 2  Background long   Completed                   -   44099                 - [-   -    -]
# 3  Background short  Completed                   -       1                 - [-   -    -]

Long (extended) Self Test duration: 27182 seconds [453.0 minutes]

root@freenas:/mnt/new3uRDz5x3disks #
root@freenas:/mnt/new3uRDz5x3disks # smartctl -l background /dev/da30
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.2-STABLE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
Background scan results log
  Status: scan is active
    Accumulated power on time, hours:minutes 45824:26 [2749466 minutes]
    Number of background scans performed: 285,  scan progress: 76.48%
    Number of background medium scans performed: 285
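For anyone who wants to track these numbers over time rather than eyeball them, here is a minimal Python sketch that pulls the error counter log rows above into plain numbers. It assumes the seven-column layout shown in this output; the helper names (`parse_error_counter_log`, `corrected_per_terabyte`) are made up for illustration, and the "corrected errors per TB" ratio is just a convenient way to compare rows, not a vendor threshold.

```python
import re

# Rows copied from the "Error counter log" section of the output above.
ERROR_COUNTER_LOG = """\
read:          0   269671         0    269671   29810891     560605.872           0
write:         0  1898285         0   1898285     136999     134366.471           0
verify:        0     1879         0      1879      49360       2796.920           0
"""

# Column names are hypothetical labels for the seven columns in the header.
FIELDS = (
    "ecc_fast",
    "ecc_delayed",
    "rereads_rewrites",
    "total_corrected",
    "algorithm_invocations",
    "gigabytes_processed",
    "total_uncorrected",
)


def parse_error_counter_log(text):
    """Return {operation: {field: number}} for the read/write/verify rows."""
    rows = {}
    for line in text.splitlines():
        match = re.match(r"\s*(read|write|verify):\s+(.*)", line)
        if not match:
            continue
        values = match.group(2).split()
        if len(values) != len(FIELDS):
            continue  # layout differs from the one assumed here
        rows[match.group(1)] = {
            name: float(value) if "." in value else int(value)
            for name, value in zip(FIELDS, values)
        }
    return rows


def corrected_per_terabyte(row):
    """Corrected errors per 10^12 bytes processed (illustrative ratio only)."""
    return row["total_corrected"] / (row["gigabytes_processed"] / 1000.0)


log = parse_error_counter_log(ERROR_COUNTER_LOG)
for operation, row in log.items():
    print(
        f"{operation}: {corrected_per_terabyte(row):.1f} corrected errors per TB, "
        f"{row['total_uncorrected']} uncorrected"
    )
```

Logging these ratios periodically would at least show whether the corrected-error rate is climbing, even though the drive still reports zero uncorrected errors and "SMART Health Status: OK".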

Chris Moore · Apr 14, 2019

I agree that the data a SAS drive gives is pretty useless. Did you try using the -x (extended) option?

SMnasMAN said:
Status: scan is active
Accumulated power on time, hours:minutes 45824:26 [2749466 minutes]

This indicates that a scan is still in progress, and the current number of power-on hours is 45824.

SMnasMAN said:
# 1  Background long   Completed                   -   44172                 - [-   -    -]

This indicates the last test to complete was at 44172 hours, which means no test has been run in about 1652 hours (45824 - 44172). If that drive were running 24/7, that would be a little more than two months (about 69 days). Maybe test more often?
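The arithmetic here is easy to script. Below is a rough Python sketch that takes the power-on time line from `smartctl -l background` and the self-test log rows from `smartctl --all` (both copied from the output earlier in the thread) and works out how long ago the last self-test finished. The function names are hypothetical and the regular expressions assume the exact line formats shown in this thread.

```python
import re

# Lines copied from the smartctl output posted earlier in the thread.
BACKGROUND_LOG = (
    "    Accumulated power on time, hours:minutes 45824:26 [2749466 minutes]"
)
SELFTEST_LOG = """\
# 1  Background long   Completed                   -   44172                 - [-   -    -]
# 2  Background long   Completed                   -   44099                 - [-   -    -]
# 3  Background short  Completed                   -       1                 - [-   -    -]
"""


def power_on_hours(background_text):
    """Whole power-on hours from the background scan results log."""
    match = re.search(
        r"Accumulated power on time, hours:minutes\s+(\d+):(\d+)", background_text
    )
    if match is None:
        raise ValueError("no power-on time found")
    return int(match.group(1))


def selftest_hours(selftest_text):
    """LifeTime (hours) value of each completed self-test row, in log order."""
    hours = []
    for line in selftest_text.splitlines():
        match = re.match(
            r"#\s*\d+\s+Background (?:long|short)\s+Completed\s+\S+\s+(\d+)", line
        )
        if match:
            hours.append(int(match.group(1)))
    return hours


current = power_on_hours(BACKGROUND_LOG)
last_test = max(selftest_hours(SELFTEST_LOG))
gap_hours = current - last_test
gap_days = gap_hours / 24

print(f"{gap_hours} hours since the last completed self-test ({gap_days:.1f} days)")
```

With the values from this drive, the gap is 45824 - 44172 = 1652 hours, which matches the "little more than two months" estimate.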

Chris Moore · May 22, 2019

Black Ninja said:
Perhaps too long, that you didn't even read what my suggestion was:

Sorry I missed this response. I have compared apples to apples many times. Take a look at this spec sheet:
https://www.seagate.com/www-content...-5-hdd-10tb-channelDS1863-5C-1608US-en_US.pdf
I know, this is Seagate instead of HGST, but the point I am making is better illustrated here. The same mechanical drive is simply fitted with a different interface card to switch from SATA to SAS. There really is no difference in most of the drives when you switch from SATA to SAS. In some very limited cases there is actually a difference in the mechanical components of the drive but those drives are significantly more expensive. So it depends on your usage as to whether those drives are worth investing in. If the price is close to the same, regardless of the interface, then you are not actually getting a better drive just because it is SAS instead of SATA. The only way you MIGHT be getting a better drive is if you do have a significantly more expensive drive. Not all vendors carry all drives so it is often difficult to see the difference in price from the same vendor.
I do a lot of these comparisons for work. We ordered 86 bare drives to upgrade existing servers and a whole new server that will include another 65 drives this year. They should be delivered in the next few months.

jgreco · May 22, 2019

SMnasMAN said:
I too don't like the limited data that SAS drives give you vs SATA drives... Where is the "Enterprise" of SAS drives if i can't get (any) useful data out of them to tell if they are pre-fail?!

SCSI drives evolved separately from IDE/PATA/SATA, and most of their statistics reporting happens through the mode pages. See /usr/share/misc/scsi_modes and the related camcontrol stuff. Seriously, I'm surprised this conversation has gone on so long without someone stating the obvious...

Chris Moore · May 22, 2019

jgreco said:
I'm surprised this conversation has gone on so long without someone stating the obvious... :)

How does that work with smartctl?

jgreco · May 23, 2019

Chris Moore said:
How does that work with smartctl?

It doesn't, obviously. It's like asking where you plug in the EV charging cable for your conventional internal combustion engine car, or something like that.

SCSI doesn't have SMART attributes.

Back in the old days, back when ESDI still walked the earth and PC drives were dumb-ish, SCSI had a problem in that the drive controller for a HDD was not directly attached to the host, as most other HDD controllers were. It was at the far end of a general purpose bus, one that was designed to attach not only HDD's, but also tape drives, CDROM's, printers, scanners, jukebox devices, etc. This meant that you couldn't just directly twiddle bits on a directly-attached controller in the device driver, which is how things like MFM/RLL worked. The electronics that drove a SCSI device's physical bits needed to be behind a controller that could speak SCSI, and you needed to communicate with the controller which then talked to the drive electronics on your behalf.

The SCSI folks solved this by creating a way to interact with SCSI devices to allow "meta" level configuration and monitoring. These were mode pages. For hard drives, for example, you could explain to the drive what sort of policy you wanted when an error was encountered. You would go into the Read-Write Error Recovery mode page, and set or clear the ARRE and AWRE bits (Automatic Read Reallocation Enable, Automatic Write Reallocation Enable). If you set AWRE=1, and tried to write a block to a bad sector, it would, without further ado, catch the error, remap the bad block to a spare, write the data to the spare, and just report success back to the initiator. But you could also set AWRE=0 and let the error propagate back to the initiator to allow the initiator device to handle the error.
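To make the ARRE/AWRE idea concrete, here is a small Python sketch of how those two flags can be decoded from, or set in, the flags byte of the Read-Write Error Recovery mode page. It assumes the usual layout in which AWRE is bit 7 and ARRE is bit 6 of byte 2 of that page; check the drive's product manual before relying on that. The function names and the sample byte value are hypothetical, and nothing here talks to a real drive.

```python
AWRE = 0x80  # assumed bit 7: Automatic Write Reallocation Enable
ARRE = 0x40  # assumed bit 6: Automatic Read Reallocation Enable


def decode_reallocation_bits(flags_byte):
    """Report whether AWRE and ARRE are set in the page's flags byte."""
    return {
        "AWRE": bool(flags_byte & AWRE),
        "ARRE": bool(flags_byte & ARRE),
    }


def set_reallocation_bits(flags_byte, awre, arre):
    """Return a copy of the flags byte with AWRE/ARRE forced on or off."""
    flags_byte &= ~(AWRE | ARRE) & 0xFF  # clear both bits, keep the others
    if awre:
        flags_byte |= AWRE
    if arre:
        flags_byte |= ARRE
    return flags_byte


# Hypothetical flags byte as it might be read back from a drive.
sample = 0xC0
print(decode_reallocation_bits(sample))

# AWRE=0: let write errors propagate back to the initiator instead of
# silently remapping the block.
print(hex(set_reallocation_bits(sample, awre=False, arre=True)))
```

Actually reading or writing the page on a live system would go through a tool such as camcontrol, which is outside the scope of this sketch.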

Now the problem with this mechanism is that SCSI's a bit of a big ugly mess, because it was designed not only to work with HDD's, but also tape drives, CDROM's, printers, bla bla bla. Also, it was designed very early on, in the mid '80's, when embedded processors were not really a huge thing, so the design was hyper-focused on efficiency and the ability to create device firmware that could fit into the limited embedded processors of the era. As a result, SCSI tends to be rather arcane and tedious to work within. Further, drive manufacturers started to chafe at some of the limitations of standards-based SCSI early on, and would often create customized functionality, sometimes for RAID array manufacturers, etc. Some of these things became standardized later, others remain miniature landmines for implementors. We call some of these "vendor-specific extensions" and we call others of these "quirks" and you can Google for "scsi quirks" to get a feel for that.

But the whole thing is, as the embedded controllers powering HDD's became more powerful, with more memory and larger firmware regions becoming available, the number of things a drive could potentially keep track of grew, and the HDD-related portions of the standard didn't seem to evolve to match. Manufacturers often provide access to this information in a vendor-specific manner, but it isn't super-easy.

So the other thing is that during the '80's and '90's, consumers who had MFM/RLL/IDE drives enjoyed the experience of random and sometimes quite sudden HDD failure, which was a significant problem for people who often didn't have a backup. So, with the advent of the new ATA standard to replace IDE, manufacturers started working towards what we've come to know as S.M.A.R.T., which became common in the mid-2000's, with many mainboard manufacturers introducing HDD health reporting at boot time, because SMART made this easy to do. Monitoring HDD performance statistics was easier to implement on a 2000's-era HDD than it was on a 1980's-era HDD because the controllers and interface silicon were so much better.

But SMART is an ATA thing. It isn't a SCSI thing. And "SAS" is simply Serial Attached SCSI.

My recollection is that smartmontools still tries to make sense out of the data available on SAS drives, but expect that to be somewhat more limited due to the history of all this, and because drive manufacturers often don't share their vendor-specific extension information with the public.

Also I'm sure someone can find some minor bits or nits to quibble over here. I'm trying to condense ~20-30 years of HDD and connectivity tech evolution into a not-ungodly-long post, and mostly doing it from memory. I don't care to argue trite corrections but feel free to call me out on any large scale screwups.

Chris Moore · May 23, 2019

jgreco said:
Also I'm sure someone can find some minor bits or nits to quibble over here. I'm trying to condense ~20-30 years of HDD and connectivity tech evolution into a not-ungodly-long post, and mostly doing it from memory.

Thanks for sharing. I am doing some research into this and Google has provided me with some assistance but I wonder if you know of a good resource that I could read to help me with the how-to part of getting more info out of a SAS drive. I have a batch of SAS drives in one of my systems at work and all the FreeNAS GUI will do with them is give me a mess of errors about S.M.A.R.T. not being supported.

jgreco · May 23, 2019

Chris Moore said:
Thanks for sharing. I am doing some research into this and Google has provided me with some assistance but I wonder if you know of a good resource that I could read to help me with the how-to part of getting more info out of a SAS drive. I have a batch of SAS drives in one of my systems at work and all the FreeNAS GUI will do with them is give me a mess of errors about S.M.A.R.T. not being supported.

No, not really, sorry. If you had asked me 20 years ago, I'd probably have had a bunch of additional information. Unfortunately, this is one of those niche areas where vendor-specific info was esoteric information even back then. My interest in the topic was generally along the lines of needing to adjust ARRE/AWRE. That fizzled out slowly between 2005 and 2010 as most of our systems here transitioned from mostly-SCSI to mostly-SATA, with only a minimal toying with SAS. Just because server manufacturers make "SAS" drive bays doesn't mean I'm gonna spend that kind of money. ;-)

Manufacturer-provided drive analysis tools of the time definitely tracked statistics on remapped sectors and often a variety of other statistics about the drive. Stuff like remapped sectors was handled with a special command (READ-DEFECT maybe?) and as far as I recall most of the rest was driven through the mode page mechanism, though if someone accused a manufacturer of creating special command codes, I wouldn't be at all shocked by that.

The best current practical documentation is probably whatever smartmontools and other "drive repair" tools of that era have gathered. I suspect much specialist knowledge has been lost over the last 20 years as SAS HDD's have become a "perfected commodity" and they're just not of interest to most "hardware hacker" types. Everyone's rushing off to SATA HDD for capacity and SSD for performance.

Expect working with this stuff to be frustrating and the available information to be sparse, tedious, inscrutable, etc. I'm pretty sure that basic stats support was *NOT* part of the standard SCSI mode pages, alas.

Chris Moore · Jan 13, 2021

jgreco said:
Expect working with this stuff to be frustrating and the available information to be sparse, tedious, inscrutable, etc. I'm pretty sure that basic stats support was *NOT* part of the standard SCSI mode pages, alas.

I wanted to thank you again for sharing your wealth of knowledge. Thanks, very much.

jgreco · Jan 13, 2021

I don't know what brought that on, but as always, you're welcome.

Chris Moore · Jan 13, 2021

jgreco said:
I don't know what brought that on, but as always, you're welcome.

Discussion here:

Not getting SMART data from new drives

I'm building a new server on a Dell R720 and have the following SAS drives in my system: Seagate DL2400MM0159 and Toshiba PX04SVB040. I'm only getting limited SMART data from these SAS drives. On both types of drives, I don't get the "SMART Attributes Data Structure" table with all the SMART...

www.truenas.com
