SMART on 3ware controller.

Status
Not open for further replies.

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
What you COULD do: run camcontrol devlist, which lists how many 3ware drives there are along with their daXX values, then run tw_cli /c0 show, parse out the drives, and match each uXX to each daXX in order.

camcontrol devlist output:

Code:
<AMCC 9650SE-24M DISK 4.10>        at scbus3 target 0 lun 0 (pass0,da0)
<AMCC 9650SE-24M DISK 4.10>        at scbus3 target 1 lun 0 (pass1,da1)
<AMCC 9650SE-24M DISK 4.10>        at scbus3 target 2 lun 0 (pass2,da2)
<AMCC 9650SE-24M DISK 4.10>        at scbus3 target 3 lun 0 (pass3,da3)
<AMCC 9650SE-24M DISK 4.10>        at scbus3 target 4 lun 0 (pass4,da4)
<AMCC 9650SE-24M DISK 4.10>        at scbus3 target 5 lun 0 (pass5,da5)
<AMCC 9650SE-24M DISK 4.10>        at scbus3 target 6 lun 0 (pass6,da6)
<AMCC 9650SE-24M DISK 4.10>        at scbus3 target 7 lun 0 (pass7,da7)
<AMCC 9650SE-24M DISK 4.10>        at scbus3 target 8 lun 0 (pass8,da8)
<AMCC 9650SE-24M DISK 4.10>        at scbus3 target 9 lun 0 (pass9,da9)
<AMCC 9650SE-24M DISK 4.10>        at scbus3 target 10 lun 0 (pass10,da10)
<AMCC 9650SE-24M DISK 4.10>        at scbus3 target 11 lun 0 (pass11,da11)
<AMCC 9650SE-24M DISK 4.10>        at scbus3 target 12 lun 0 (pass12,da12)
<AMCC 9650SE-24M DISK 4.10>        at scbus3 target 13 lun 0 (pass13,da13)
<AMCC 9650SE-24M DISK 4.10>        at scbus3 target 14 lun 0 (pass14,da14)
<AMCC 9650SE-24M DISK 4.10>        at scbus3 target 15 lun 0 (pass15,da15)
<Corsair Voyager 1100>             at scbus11 target 0 lun 0 (pass16,da16)


Then run tw_cli /c0 show
tw_cli /c0 show output:
Code:
VPort Status         Unit Size      Type  Phy Encl-Slot    Model
------------------------------------------------------------------------------
p0    OK             u0   1.36 TB   SATA  0   -            WDC WD15EADS-00S2B0 
p1    OK             u1   931.51 GB SATA  1   -            ST31000340AS        
p2    OK             u2   931.51 GB SATA  2   -            ST31000528AS        
p3    OK             u3   1.36 TB   SATA  3   -            ST31500341AS        
p4    OK             u4   931.51 GB SATA  4   -            ST31000340AS        
p5    OK             u5   1.36 TB   SATA  5   -            ST31500341AS        
p6    OK             u6   1.36 TB   SATA  6   -            ST31500341AS        
p7    OK             u7   931.51 GB SATA  7   -            ST31000340AS        
p8    OK             u8   931.51 GB SATA  8   -            ST31000340AS        
p9    OK             u9   1.36 TB   SATA  9   -            ST31500341AS        
p10   OK             u10  931.51 GB SATA  10  -            ST31000528AS        
p11   OK             u11  1.36 TB   SATA  11  -            ST31500341AS        
p12   OK             u12  931.51 GB SATA  12  -            ST31000340AS        
p13   OK             u13  1.36 TB   SATA  13  -            ST31500341AS        
p19   OK             u14  931.51 GB SATA  19  -            ST31000340AS        
p23   OK             u15  1.36 TB   SATA  23  -            ST31500341AS


Now we know that da0 matches u0, da1 matches u1, etc...

Run tw_cli /c0/u0 show serial, which outputs:

Code:
/c0/u0 serial number = ******************


Then pull out the serial and you have a /dev/daXX matching a given serial number! Dump that into the FreeNAS GUI and all is working :D
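
For anyone who would rather script it, here is a rough Python sketch of the matching step described above. It only parses text in the two formats pasted in this post and pairs the entries in order. The function names are made up for illustration, and the "in order" pairing is the same assumption made above, so check the result against the serial numbers before trusting it.

```python
import re


def parse_camcontrol(text, vendor="AMCC"):
    """Return the daXX names of devices whose description starts with `vendor`."""
    devices = []
    for line in text.splitlines():
        # Example line: <AMCC 9650SE-24M DISK 4.10>  at scbus3 target 0 lun 0 (pass0,da0)
        m = re.search(r"<(.*?)>.*\((?:pass\d+,)?(da\d+)", line)
        if m and m.group(1).startswith(vendor):
            devices.append(m.group(2))
    # Sort numerically so da10 comes after da9.
    return sorted(devices, key=lambda name: int(name[2:]))


def parse_tw_cli(text):
    """Return (unit, port) pairs such as ("u14", "p19") from `tw_cli /c0 show` output."""
    units = []
    for line in text.splitlines():
        # Example line: p19   OK   u14  931.51 GB SATA  19  -  ST31000340AS
        m = re.match(r"(p\d+)\s+\S+\s+(u\d+)\s", line)
        if m:
            units.append((m.group(2), m.group(1)))
    return sorted(units, key=lambda pair: int(pair[0][1:]))


def match_units(camcontrol_text, tw_cli_text):
    """Pair each daXX with a (unit, port) tuple, assuming both lists are in the same order."""
    das = parse_camcontrol(camcontrol_text)
    units = parse_tw_cli(tw_cli_text)
    if len(das) != len(units):
        raise ValueError(f"{len(das)} da devices but {len(units)} 3ware units")
    return dict(zip(das, units))
```

Feed match_units() the saved output of the two commands. The port is kept next to the unit because the unit and port numbers are not always the same (u14 is on p19 in the listing above).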
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Dammit.. now I wish I knew more about scripting in BSD :( I think it's time I see if I can figure out this mess on my own ;)
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
I think I have this working, but you will have to test it. First save ix-smartd rev-12158 from trunk to somewhere writable. Then patch it with the attached file, remount / read-write (mount -uw /) and copy ix-smartd to /conf/base/etc/rc.d/ix-smartd. I would double check the owner & permissions on it.
 

Attachments

  • patch.ix-smartd.rev-12158.txt
    630 bytes · Views: 348

William Grzybowski

Wizard
iXsystems
Joined
May 27, 2011
Messages
1,754
Support for 3ware (twa) has been added in r12171, based on this patch.

Please give it a try in tomorrow's nightly release.

Thanks
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Still having problems with SMART on 3ware controller. Looking for some clarification. I've upgraded to 8.3 and am using a 3ware controller (9650SE-24M8). Support was added by giving William SSH access to the server so he could get the back-asswards stuff in 3ware to work. It was so promising, until tonight.

I decided that I wanted to verify that each of the 3 types of SMART tests works with my 3ware controller now that 8.3-RELEASE is out and I had upgraded this server. So I set up /dev/da0 and /dev/da1 to run a long test tonight (Nov 11) at 1700 hours. I set up the test at about 1650. Ports 4 and 5 on the controller are /dev/da0 and /dev/da1. From a previous manual run of the long test via 'smartctl -t long -d 3ware,4 /dev/twa0' I knew that this test takes 296 minutes to complete. The /etc/local/smartd.conf file is below, with long tests set up only for da0 and da1:

Code:
/dev/twa0 -d 3ware,11 -n never -W 0,0,0 -m myemail@hotmail.com 
/dev/twa0 -d 3ware,22 -n never -W 0,0,0 -m myemail@hotmail.com 
/dev/twa0 -d 3ware,16 -n never -W 0,0,0 -m myemail@hotmail.com 
/dev/twa0 -d 3ware,4 -n never -W 0,0,0 -m myemail@hotmail.com -s L/(11)/(11)/(7)/(17) 
/dev/twa0 -d 3ware,17 -n never -W 0,0,0 -m myemail@hotmail.com 
/dev/twa0 -d 3ware,5 -n never -W 0,0,0 -m myemail@hotmail.com -s L/(11)/(11)/(7)/(17) 
/dev/twa0 -d 3ware,6 -n never -W 0,0,0 -m myemail@hotmail.com 
/dev/twa0 -d 3ware,18 -n never -W 0,0,0 -m myemail@hotmail.com 
/dev/twa0 -d 3ware,19 -n never -W 0,0,0 -m myemail@hotmail.com 
/dev/twa0 -d 3ware,7 -n never -W 0,0,0 -m myemail@hotmail.com 
/dev/twa0 -d 3ware,20 -n never -W 0,0,0 -m myemail@hotmail.com 
/dev/twa0 -d 3ware,8 -n never -W 0,0,0 -m myemail@hotmail.com 
/dev/twa0 -d 3ware,21 -n never -W 0,0,0 -m myemail@hotmail.com 
/dev/twa0 -d 3ware,9 -n never -W 0,0,0 -m myemail@hotmail.com 
/dev/twa0 -d 3ware,10 -n never -W 0,0,0 -m myemail@hotmail.com 
/dev/twa0 -d 3ware,23 -n never -W 0,0,0 -m myemail@hotmail.com 


Everything looks okay, so far so good.
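
A quick way to double check which ports actually have a test scheduled is to pull the -d 3ware,N and -s values out of each line. This is only a sketch that assumes every entry is a single whitespace-separated line like the ones above; scheduled_tests is a made-up helper name, not part of FreeNAS or smartmontools.

```python
def scheduled_tests(conf_text):
    """Map each 3ware port number to its -s schedule regex, or None if it has none."""
    result = {}
    for line in conf_text.splitlines():
        # Drop trailing comments and skip blank lines.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        tokens = line.split()
        port = None
        schedule = None
        # Look at each option together with the token that follows it.
        for flag, value in zip(tokens, tokens[1:]):
            if flag == "-d" and value.startswith("3ware,"):
                port = int(value.split(",", 1)[1])
            elif flag == "-s":
                schedule = value
        if port is not None:
            result[port] = schedule
    return result


def ports_with_tests(conf_text):
    """Return the sorted port numbers that have any -s schedule."""
    return sorted(p for p, s in scheduled_tests(conf_text).items() if s)
```

For the file above, ports_with_tests() should come back with just ports 4 and 5, which matches what was set up in the GUI.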

So I decided I'd check the status of the long test. I ran 'smartctl -a -d 3ware,4 /dev/twa0' and found that the test is not running. Output is:

Code:
# smartctl -a -d 3ware,4 /dev/twa0
smartctl 5.43 2012-06-30 r3573 [FreeBSD 8.3-RELEASE-p4 amd64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.11
Device Model:     ST31500341AS
Serial Number:    *********
LU WWN Device Id: 5 000c50 010a1145e
Firmware Version: CC1H
User Capacity:    1,500,301,910,016 bytes [1.50 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Sun Nov 11 17:11:17 2012 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  617) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 296) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x103f) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   118   099   006    Pre-fail  Always       -       197456198
  3 Spin_Up_Time            0x0003   100   092   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       781
  5 Reallocated_Sector_Ct   0x0033   099   099   036    Pre-fail  Always       -       73
  7 Seek_Error_Rate         0x000f   083   060   030    Pre-fail  Always       -       213625880
  9 Power_On_Hours          0x0032   079   079   000    Old_age   Always       -       19189
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       3
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       545
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   097   000    Old_age   Always       -       579831333030
189 High_Fly_Writes         0x003a   001   001   000    Old_age   Always       -       178
190 Airflow_Temperature_Cel 0x0022   059   048   045    Old_age   Always       -       41 (Min/Max 40/41)
194 Temperature_Celsius     0x0022   041   052   000    Old_age   Always       -       41 (0 16 0 0 0)
195 Hardware_ECC_Recovered  0x001a   030   019   000    Old_age   Always       -       197456198
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       4449
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       164565966932149
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       4042743741
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       59243558

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


So, I know the test did not initiate. I can manually initiate a test at any time with 'smartctl -t long -d 3ware,4 /dev/twa0'. Then you see an output with:

Code:
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Self-test routine in progress 90%     19190         -
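
If you want to check for a running test from a script instead of eyeballing it, the log rows can be parsed with something like the sketch below. It assumes the row layout shown above (test name, status, remaining percent, lifetime hours, LBA) and the function names are hypothetical.

```python
import re

# One self-test log row, e.g.:
# "# 1  Extended offline    Self-test routine in progress 90%     19190         -"
ROW = re.compile(r"^#\s*(\d+)\s+(.+?)\s{2,}(.+?)\s+(\d+)%\s+(\d+)\s+(\S+)\s*$")


def parse_selftest_log(smartctl_output):
    """Return one dict per self-test log row found in smartctl output."""
    entries = []
    for line in smartctl_output.splitlines():
        m = ROW.match(line)
        if m:
            entries.append({
                "num": int(m.group(1)),
                "description": m.group(2),
                "status": m.group(3),
                "remaining_percent": int(m.group(4)),
                "lifetime_hours": int(m.group(5)),
                "lba_of_first_error": m.group(6),
            })
    return entries


def running_tests(smartctl_output):
    """Return only the rows whose status says a test is still in progress."""
    return [e for e in parse_selftest_log(smartctl_output)
            if "in progress" in e["status"].lower()]
```

An empty list from running_tests() on the 'smartctl -a' output means nothing is running, which is what the earlier "No self-tests have been logged" output amounts to.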


So I changed the Long Test to a Short Test in the GUI. The following 2 lines are now in the smartd.conf:

Code:
/dev/twa0 -d 3ware,4 -n never -W 0,0,0 -m myemail@hotmail.com -s S/(11)/(11)/(7)/(17)
/dev/twa0 -d 3ware,5 -n never -W 0,0,0 -m myemail@hotmail.com -s S/(11)/(11)/(7)/(17)


The letter after -s is what changed. I then changed from a Short Test to a Conveyance Test. smartd.conf now has these 2 lines:

Code:
/dev/twa0 -d 3ware,4 -n never -W 0,0,0 -m myemail@hotmail.com -s C/(11)/(11)/(7)/(17)
/dev/twa0 -d 3ware,5 -n never -W 0,0,0 -m myemail@hotmail.com -s C/(11)/(11)/(7)/(17)


The letter after -s is "L" for long test, "S" for short test, and "C" for conveyance test. Seems to make logical sense, except I don't seem to have any tests that actually run. I've verified that the SMART service is ON.
I can manually initiate a long test at any time with 'smartctl -t long -d 3ware,XX /dev/twa0'.

So does anyone have any ideas what is wrong? Should I be receiving an email after the long test completes if I manually initiate a long test (or any test for that matter)? I'm trying to work out whether my attempt to verify the SMART tests is actually valid, or whether the test wasn't supposed to run for some reason. I've never received a SMART email from this system because SMART didn't work on 8.2. I'm only now finally getting around to setting up SMART as a normal service on my FreeNAS servers because I have the 3ware controllers.

Ideas or suggestions? As a stop-gap I was thinking of setting up a cron job to run the long and short tests, but I'm not sure the conveyance testing is even running.
 

William Grzybowski

Wizard
iXsystems
Joined
May 27, 2011
Messages
1,754
I don't understand you. Why do you think a SMART test will run as soon as you set it up?
They are scheduled. It looks to me like you have just selected the wrong times; (11)/(11)/(7)/(17) does not look right to me at all. What did you choose for day of the week, month, hours, etc.?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I had chosen everything to force a test yesterday evening: Sunday, November 11th at 1700 hours (unless I checked a wrong box somewhere). This particular machine isn't always available to me, so I figured I'd run a test and see if the SMART tests actually run. I definitely haven't ruled out that my test was completely invalid due to my inexperience with SMART. I just thought it should be easy to prove they run correctly last night by scheduling a test at 1700.

I see nothing wrong with the smartd.conf settings. Even comparing my attempted test against http://smartmontools.sourceforge.net/man/smartd.conf.5.html I see that the test should have occurred November 11th, Sunday, at 1700 hours. This is the correct month, day, day of the week, and hour that my test should have run.

I'm just curious (and puzzled) why the test didn't start at 1700 like I expected. The smartd.conf info seems completely correct.
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
cyberjock said: "I'm just curious(and puzzled) why the test didn't start at 1700 like I should have expected. The smartd.conf info seems completely correct."
According to the man page the test should run between 17:00 & 17:59 inclusive if I'm not mistaken. While not your problem it would be better to only specify day of week & hour.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
paleoN said: "According to the man page the test should run between 17:00 & 17:59 inclusive if I'm not mistaken. While not your problem it would be better to only specify day of week & hour."

I was driving home a few mins ago and I thought to myself "what if it runs anytime during that hour and not necessarily at that exact hour". Then I saw you posted and I thought "get ready.. he's gonna throw a stone at your glass house". LOL. Sure enough, the test was basically invalidated because I didn't give it enough time. I'll have to redo my test and try it again to see. I was really baffled because William fixed it all up, and everything looked right, but it still didn't work.

I never questioned William's patch to make 3ware controllers start working correctly. I was more questioning whether there was a bug with smartctl or whether I just suck at trying to use SMART :P
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Finally. Got results from my SMART tests. All is good. Long Test started during the hour and not on the hour.

So.. Any recommendations on how often to run:

-Offline Immediate Test
-Long Test
-Short Test
-Conveyance Test

???

I was thinking:

-Offline Immediate Test - Once a day at 1am
-Short Self Test - Once a day at 2am
-Long Self Test - Twice a month at 3am
-Conveyance Test - Hourly

Scrub also will run at noon on the same day as Long Self Test. This will also prevent a Long Self Test and scrub from running at the same time.
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
cyberjock said:
-Offline Immediate Test
-Long Test
-Short Test
-Conveyance Test

What does each of the above tests do? What benefit do you get from running them at the frequency you propose? Have you considered the downsides of running the tests?

cyberjock said: "So.. Any recommendations on how often to run:"

Personally, I would lose all of them except the long tests.
 

bollar

Patron
Joined
Oct 28, 2012
Messages
411
paleoN said: "Personally, I would lose all of them except the long tests."

At what frequency? Nightly? Weekly?

And what are the downsides of the short and conveyance tests? Feel free to point me to a FAQ somewhere -- I've been looking and haven't come up with the right keywords to generate the answers.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I just did them from some website I saw. No explanation was given and no comments were made.. so that's why I asked here. I can find no forum threads with google discussing how often tests should be run. :(

Personally, if paleoN says to do 1 long a week, that's good enough for me. I'll delete them all except the long test. :)
 

bollar

Patron
Joined
Oct 28, 2012
Messages
411
cyberjock said: "I just did them from some website I saw. No explanation was given and no comments were made.. so that's why I asked here. I can find no forum threads with google discussing how often tests should be run. :(

Personally, if paleoN says to do 1 long a week, that's good enough for me. I'll delete them all except the long test. :)"

I hear you. ;)

Maybe it's because I don't understand how smartd works, but how do disk prefail errors get reported outside of running one of the tests?
 

paylesspizzaman

Explorer
Joined
Sep 1, 2015
Messages
92
I'll revive a very old thread here. I have a 3ware 9650se-24 controller (jbod mode) in a test setup and am trying to see if disk da0 is in standby or not. First I ran "camcontrol devlist" which returned this info for da0:

<AMCC 9659SE-24 DISK 4.10> at scbus0 target 0 lun 0 (pass0,da0)

I then ran the following commands:

camcontrol cmd -a da0 "E5 00 00 00 00 00 00 00 00 00 00 00" -r -

camcontrol cmd -a -d 3ware, 0 /dev/twa0 "E5 00 00 00 00 00 00 00 00 00 00 00" -r -

camcontrol cmd -a --device=3ware,0 /dev/twa0 "E5 00 00 00 00 00 00 00 00 00 00 00" -r -

All of which returned:
camcontrol: subcommand cmd requires a valid device identifier

Any idea what I'm doing wrong or maybe this can't be done with the 3ware controller?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
As a former (keyword: former) user of the 3ware 9650 controller in two different FreeNAS machines, I can tell you that you're better off replacing that thing *right now*. Consider yourself enlightened by reading this post and just replace it. It's a terrible match for FreeNAS and ZFS, and is going to create problems for you later. I lost over $500 because of the lessons of 3ware. There's a reason we don't recommend them.. they really shouldn't be used, even in their "jbod" mode with FreeNAS/ZFS.
 