Strange SMART behavior:

Status
Not open for further replies.

pombaer

Cadet
Joined
Jan 11, 2017
Messages
2
Hello community!
I want to use a Dell R530 with a PERC H730 Mini RAID controller (LSI Logic / Symbios Logic MegaRAID SAS-3 3108) and its attached disks (one 112 GB SSD and three 4 TB Seagate Constellation ES.3 SATA drives) as a storage device. The controller is in HBA mode and I am using the "mrsas" driver in FreeNAS. All SATA disks are configured as "Non Raid Disks" and show up as "/dev/da[1-3]".
When I enable SMART in the FreeNAS GUI, it immediately switches back to OFF.

After that I tried to retrieve SMART information with the following command:

smartctl -a -d sat /dev/da1

Running this command produces some errors (attached at the end of this post). The strange part is that after the command has run, the disk's device node is removed from "/dev".
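One way to confirm that the command itself makes the device node disappear is to check for the node immediately before and after running it. The following is a minimal Python sketch, not part of the original post: the function names are hypothetical, and it only checks whether the path exists, so it says nothing about why the node went away.

```python
import os
import subprocess


def run_command(command):
    """Run the command and return its exit status."""
    return subprocess.run(command).returncode


def run_and_check_device(device, command, runner=run_command):
    """Run a command and report whether the device node exists before and after."""
    existed_before = os.path.exists(device)
    exit_status = runner(command)
    exists_after = os.path.exists(device)
    return {
        "existed_before": existed_before,
        "exit_status": exit_status,
        "exists_after": exists_after,
        # True only if the node was there before the command and gone afterwards.
        "vanished": existed_before and not exists_after,
    }


# Example call, using the device and flags from the post (needs root and smartctl):
# print(run_and_check_device("/dev/da1", ["smartctl", "-a", "-d", "sat", "/dev/da1"]))
```

The runner is passed in as a parameter so that the check can be exercised without real hardware.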

I also tried starting the smartd daemon on the command line in debug mode, with the same result (the output is also at the end of this post).

smartd -d -c /usr/local/etc/smartd.conf


I need a way to watch the disk state on my server. I want to create a ZFS zpool with the disks; hardware RAID would also be acceptable for me, but disk monitoring is a hard requirement. How do you monitor your disks? Does anyone have experience with the Dell PERC controller?
On Linux I can use Dell's OpenManage tools, but what are the alternatives on FreeBSD or FreeNAS?


The Output of smartctl:

Code:
=== START OF INFORMATION SECTION ===
Model Family:  Seagate Constellation ES.3
Device Model:  ST4000NM0033-9ZM170
Serial Number:  Z1Z90YN1
LU WWN Device Id: 5 000c50 07bb69f50
Add. Product Id:  DELL(tm)
Firmware Version: GA6A
User Capacity:  4,000,787,030,016 bytes [4.00 TB]
Sector Size:  512 bytes logical/physical
Rotation Rate:  7200 rpm
Form Factor:  3.5 inches
Device is:  In smartctl database [for details use: -P show]
ATA Version is:  ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:  Wed Jan 11 15:16:27 2017 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART Status command failed: Input/output error
SMART overall-health self-assessment test result: PASSED
Warning: This result is based on an Attribute check.

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
  was completed without error.
  Auto Offline Data Collection: Enabled.
Self-test execution status:  (  0) The previous self-test routine completed
  without error or no self-test has ever
  been run.
Total time to complete Offline
data collection:  (  90) seconds.
Offline data collection
capabilities:  (0x7b) SMART execute Offline immediate.
  Auto Offline data collection on/off support.
  Suspend Offline collection upon new
  command.
  Offline surface scan supported.
  Self-test supported.
  Conveyance Self-test supported.
  Selective Self-test supported.
SMART capabilities:  (0x0003) Saves SMART data before entering
  power-saving mode.
  Supports SMART auto save timer.
Error logging capability:  (0x01) Error logging supported.
  General Purpose Logging supported.
Short self-test routine
recommended polling time:  (  2) minutes.
Extended self-test routine
recommended polling time:  ( 494) minutes.
Conveyance self-test routine
recommended polling time:  (  3) minutes.
SCT capabilities:  (0x50bd) SCT Status supported.
  SCT Error Recovery Control supported.
  SCT Feature Control supported.
  SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME  FLAG  VALUE WORST THRESH TYPE  UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate  0x010f  081  063  ---  Pre-fail  Always  -  142900498
  3 Spin_Up_Time  0x0103  092  092  ---  Pre-fail  Always  -  0
  4 Start_Stop_Count  0x0032  100  100  ---  Old_age  Always  -  183
  5 Reallocated_Sector_Ct  0x0133  100  100  ---  Pre-fail  Always  -  0
  7 Seek_Error_Rate  0x000f  086  060  ---  Pre-fail  Always  -  417552077
  9 Power_On_Hours  0x0032  088  088  ---  Old_age  Always  -  11205
 10 Spin_Retry_Count  0x0013  100  100  ---  Pre-fail  Always  -  0
 12 Power_Cycle_Count  0x0032  100  100  ---  Old_age  Always  -  182
184 End-to-End_Error  0x0032  100  100  ---  Old_age  Always  -  0
187 Reported_Uncorrect  0x0032  100  100  ---  Old_age  Always  -  0
188 Command_Timeout  0x0032  100  100  ---  Old_age  Always  -  0
189 High_Fly_Writes  0x003a  100  100  ---  Old_age  Always  -  0
190 Airflow_Temperature_Cel 0x0022  057  048  ---  Old_age  Always  -  43 (Min/Max 40/43)
191 G-Sense_Error_Rate  0x0032  100  100  ---  Old_age  Always  -  0
192 Power-Off_Retract_Count 0x0032  100  100  ---  Old_age  Always  -  181
193 Load_Cycle_Count  0x0032  100  100  ---  Old_age  Always  -  598
194 Temperature_Celsius  0x0022  043  052  ---  Old_age  Always  -  43 (0 21 0 0 0)
195 Hardware_ECC_Recovered  0x001a  021  003  ---  Old_age  Always  -  142900498
196 Reallocated_Event_Count 0x0032  000  000  ---  Old_age  Always  -  65535
197 Current_Pending_Sector  0x0012  100  100  ---  Old_age  Always  -  0
198 Offline_Uncorrectable  0x0010  100  100  ---  Old_age  Offline  -  0
199 UDMA_CRC_Error_Count  0x003e  200  200  ---  Old_age  Always  -  0
240 Head_Flying_Hours  0x0000  100  253  ---  Old_age  Offline  -  10390 (75 118 0)
241 Total_LBAs_Written  0x0000  100  253  ---  Old_age  Offline  -  13765656312
242 Total_LBAs_Read  0x0000  100  253  ---  Old_age  Offline  -  548188675824

Read SMART Error Log failed: Input/output error

Read SMART Self-test Log failed: Input/output error

Read SMART Selective Self-test Log failed: Input/output error
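The output above mixes usable data (the attribute table) with commands that failed with an I/O error. As a rough illustration, the Python sketch below separates the two. The function name and the regular expression are assumptions written against the output shown here, not an official smartctl parser, and other smartctl versions may format things differently.

```python
import re

# Matches rows of the attribute table, for example:
# "  9 Power_On_Hours  0x0032  088  088  ---  Old_age  Always  -  11205"
ATTR_ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+(0x[0-9a-fA-F]+)\s+(\d+)\s+(\d+)\s+(\S+)\s+"
    r"(\S+)\s+(\S+)\s+(\S+)\s+(.+)$"
)


def parse_smartctl(text):
    """Split smartctl -a output into failed-command lines and attribute rows."""
    failed = []
    attributes = {}
    for line in text.splitlines():
        # Lines such as "Read SMART Error Log failed: Input/output error".
        if "failed:" in line:
            failed.append(line.strip())
            continue
        m = ATTR_ROW.match(line)
        if m:
            attributes[m.group(2)] = {
                "id": int(m.group(1)),
                "value": int(m.group(4)),
                "worst": int(m.group(5)),
                "thresh": m.group(6),  # printed as "---" in the output above
                "raw": m.group(10).strip(),
            }
    return failed, attributes
```

Run against the output above, this would collect the four "failed:" lines separately from the attribute rows, which makes it easier to see that the attribute read succeeded while the status and log reads did not.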


The Output of smartd daemon:

Code:
Configuration file smartd.conf parsed.
Device: /dev/da2, type changed from 'scsi' to 'sat'
Device: /dev/da2 [SAT], opened
Device: /dev/da2 [SAT], ST4000NM0033-9ZM170, S/N:Z1Z90YN1, WWN:5-000c50-07bb69f50, FW:GA6A, 4.00 TB
Device: /dev/da2 [SAT], found in smartd database: Seagate Constellation ES.3
Device: /dev/da2 [SAT], not capable of SMART Health Status check
Device: /dev/da2 [SAT], Read SMART Thresholds failed, ignoring -f Directive
Device: /dev/da2 [SAT], Read SMART Self Test Log Failed
Device: /dev/da2 [SAT], no SMART Self-test Log, ignoring -l selftest
Device: /dev/da2 [SAT], Read Summary SMART Error Log failed
Device: /dev/da2 [SAT], no SMART Error Log, ignoring -l error
Device: /dev/da2 [SAT], is SMART capable. Adding to "monitor" list.
Device: /dev/da1, type changed from 'scsi' to 'sat'
Device: /dev/da1 [SAT], opened
Device: /dev/da1 [SAT], ST4000NM0033-9ZM170, S/N:Z1Z90RLG, WWN:5-000c50-07bb6db8b, FW:GA6A, 4.00 TB
Device: /dev/da1 [SAT], found in smartd database: Seagate Constellation ES.3
Device: /dev/da1 [SAT], not capable of SMART Health Status check
Device: /dev/da1 [SAT], Read SMART Thresholds failed, ignoring -f Directive
Device: /dev/da1 [SAT], Read SMART Self Test Log Failed
Device: /dev/da1 [SAT], no SMART Self-test Log, ignoring -l selftest
Device: /dev/da1 [SAT], Read Summary SMART Error Log failed
Device: /dev/da1 [SAT], no SMART Error Log, ignoring -l error
Device: /dev/da1 [SAT], is SMART capable. Adding to "monitor" list.
Device: /dev/da3, type changed from 'scsi' to 'sat'
Device: /dev/da3 [SAT], opened
Device: /dev/da3 [SAT], ST4000NM0033-9ZM170, S/N:Z1Z90ZC2, WWN:5-000c50-07bb686d5, FW:GA6A, 4.00 TB
Device: /dev/da3 [SAT], found in smartd database: Seagate Constellation ES.3
Device: /dev/da3 [SAT], not capable of SMART Health Status check
Device: /dev/da3 [SAT], Read SMART Thresholds failed, ignoring -f Directive
Device: /dev/da3 [SAT], Read SMART Self Test Log Failed
Device: /dev/da3 [SAT], no SMART Self-test Log, ignoring -l selftest
Device: /dev/da3 [SAT], Read Summary SMART Error Log failed
Device: /dev/da3 [SAT], no SMART Error Log, ignoring -l error
Device: /dev/da3 [SAT], is SMART capable. Adding to "monitor" list.
Monitoring 3 ATA/SATA, 0 SCSI/SAS and 0 NVMe devices
Device: /dev/da2 [SAT], open() failed: No such file or directory
Sending warning via /usr/local/www/freenasUI/tools/smart_alert.py to root ...
Warning via /usr/local/www/freenasUI/tools/smart_alert.py to root: successful
Device: /dev/da1 [SAT], open() failed: No such file or directory
Sending warning via /usr/local/www/freenasUI/tools/smart_alert.py to root ...
Warning via /usr/local/www/freenasUI/tools/smart_alert.py to root: successful
Device: /dev/da3 [SAT], open() failed: No such file or directory
Sending warning via /usr/local/www/freenasUI/tools/smart_alert.py to root ...
Warning via /usr/local/www/freenasUI/tools/smart_alert.py to root: successful

 

pombaer

Cadet
Joined
Jan 11, 2017
Messages
2
OK, thank you for your reply. Do you know of any alternative way to watch disk health? Maybe MegaCli offers one? I still have little in-depth experience with FreeNAS and RAID/storage controllers.

Do you have any idea why the devices are removed from "/dev/" when SMART starts? This behavior is very strange: the device node is deleted whether or not the disk is in use by a zpool, so it could damage my storage system.
 

m0nkey_

MVP
Joined
Oct 27, 2015
Messages
2,739
pombaer said:
"OK, thank you for your reply. Do you know of any alternative way to watch disk health? Maybe MegaCli offers one? I still have little in-depth experience with FreeNAS and RAID/storage controllers."

FreeNAS only offers SMART to monitor the health of the disks.

pombaer said:
"Do you have any idea why the devices are removed from "/dev/" when SMART starts? This behavior is very strange: the device node is deleted whether or not the disk is in use by a zpool, so it could damage my storage system."

Likely because it's generating I/O errors. Syslog or dmesg should give you an idea why it was removed.
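To follow up on the syslog/dmesg suggestion, it helps to cut the kernel log down to the lines that mention the affected disk or the controller driver. The Python sketch below does only that filtering. The function name is hypothetical, and the text to search (for example the output of `dmesg` or the contents of /var/log/messages) has to be supplied by the caller.

```python
import re


def kernel_lines_for(device, dmesg_text, driver="mrsas"):
    """Return kernel log lines that mention the device or the driver."""
    name = device.rsplit("/", 1)[-1]  # "/dev/da1" -> "da1"
    # Word boundaries keep "da1" from also matching "da10";
    # the driver name may carry a unit number, as in "mrsas0".
    pattern = re.compile(
        r"\b(?:%s|%s\d*)\b" % (re.escape(name), re.escape(driver))
    )
    return [line for line in dmesg_text.splitlines() if pattern.search(line)]


# Example call (assumes dmesg is available on the system):
# import subprocess
# text = subprocess.run(["dmesg"], capture_output=True, text=True).stdout
# for line in kernel_lines_for("/dev/da1", text):
#     print(line)
```

Lines logged around the time the smartctl command ran are the ones most likely to show why the device was detached.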
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Quote:
"This card is not suitable for FreeNAS as it (afaik) cannot be flashed to IT mode. Although it has a 'HBA' mode, smartd will never be able to monitor the drives."

Actually, it should work with the mrsas stack. It allows for true direct-attach even without IT mode.

Of course, it's less than ideal since it hasn't been widely tested, but it's an option.

You'll probably want to file a bug report. Please post the issue number here, for future reference. I'd like to see mrsas working properly, to try and get more usable SAS controllers working.
 

Magius

Explorer
Joined
Sep 29, 2016
Messages
70
pombaer said:
"Running this command gives me some errors [...] the strange behavior is that after running the command the block device from "/dev" will be deleted/removed."

I've actually seen this exact kind of behavior before, on Linux and Windows, with motherboard SATA ports and LSI-driven SAS ports, and in my case I was able to trace it back to the drives. The only reason I mention it is to point out that an HBA port really can shut down completely after processing SMART data from a drive.

When I used to see this regularly, I was issuing 'smartctl -x' instead of 'smartctl -a'. After years of doing this with no problems, I came across some drives that would start returning SMART data and then, either partway through or just after the data came back, the SAS/SATA port they were attached to would go completely dead. The device /dev/sdx would no longer exist in the OS, and hot-plugging it or trying another drive did nothing; the whole port was dead. With several adapters I could reproduce this reliably every time I issued the '-x' option to that hard drive model. Rebooting the machine fixed everything until the command was run again.

This was actually the reason I changed all of my test scripts to use 'smartctl -a' instead of '-x' many years back. The '-a' option didn't seem to trigger the issue, at least not nearly as often, and since then I've only very rarely seen the problem again, as in single-digit occurrences out of thousands of tries. I don't have any experience with your Seagate ES.3s, and my guess is they're far too common to have an unknown issue with crashing on SMART commands, so I'm not suggesting that this is your answer. I'm just pointing out that "weirder things can happen", and each combination of HBA, firmware, driver and hard drive can have its own unique issues :)
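For test scripts like the ones described above, smartctl's exit status is one way to tell "a command to the disk failed" apart from "the disk reports it is failing". The smartctl(8) man page documents the status as a bit mask. The Python sketch below decodes it. The function name is hypothetical and the bit descriptions are paraphrased from the man page, so check them against the version installed on your system.

```python
# Bit meanings paraphrased from the "EXIT STATUS" section of smartctl(8).
EXIT_BITS = [
    "command line did not parse",
    "device open failed or device did not identify itself",
    "a SMART or other ATA command failed, or a checksum error occurred",
    'SMART status check returned "DISK FAILING"',
    "prefail attributes at or below threshold",
    "some attributes were at or below threshold in the past",
    "device error log contains error records",
    "device self-test log contains error records",
]


def decode_exit_status(status):
    """Return the description of every bit set in a smartctl exit status."""
    return [text for bit, text in enumerate(EXIT_BITS) if status & (1 << bit)]
```

A script can then treat bit 1 or bit 2 as a sign that the transport path (HBA, driver, port) needs a look before the result is read as a verdict on the drive.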
 