SOLVED New error - unrecoverable

Status
Not open for further replies.

tstorzuk

Explorer
Joined
Jun 13, 2011
Messages
92
UGH, the problems are now multiplying like rabbits. This one is on a different server than in my previous post.

I just got this error message by email:

Code:
  pool: TRINITY_RAID-01
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
  scan: scrub repaired 0 in 7h38m with 0 errors on Sun Feb 15 09:38:36 2015
config:

        NAME                                            STATE     READ WRITE CKSUM
        TRINITY_RAID-01                                 ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/934f3544-e66c-11e2-aea7-002590ab7843  ONLINE       0     0     0
            gptid/93cb8c02-e66c-11e2-aea7-002590ab7843  ONLINE       0     0     0
            gptid/944e85b1-e66c-11e2-aea7-002590ab7843  ONLINE       0     0     0
            gptid/59c9cd9f-d827-11e3-b3b8-002590ab7843  ONLINE       0     0     0
            gptid/95474b4f-e66c-11e2-aea7-002590ab7843  ONLINE       0 21.7K     0
            gptid/6e0b6f9b-4a9a-11e4-88d2-002590ab7843  ONLINE       0     0     0
            gptid/9644ecaf-e66c-11e2-aea7-002590ab7843  ONLINE       0     0     0

errors: No known data errors


So I logged into my server to see what was going on, and right away the yellow alert button was flashing, so I checked it. It tells me the following:

Code:
WARNING: The volume TRINITY_RAID-01 (ZFS) status is UNKNOWN: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected. Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'.


When I take a look at the pool, it seems that one of the drives is missing/not showing up. I found out that it's listed in the disks, but is definitely not listed in the volume. Also, when I view the volumes, it says that the pool is healthy. How can that be if the drive isn't even listed?

How do I determine if the drive needs to be replaced?
 

tstorzuk

Explorer
Joined
Jun 13, 2011
Messages
92
Hmm,

I just checked the other email security report, and it tells me the following:

Code:
trinity kernel log messages:
+++ /tmp/security.2MJX5zPd      2015-02-28 03:01:00.000000000 -0700
+(da3:mps0:0:4:0): SYNCHRONIZE CACHE(10). CDB: 35 0 0 0 0 0 0 0 0 0
+(da3:mps0:0:4:0): CAM status: SCSI Status Error
+(da3:mps0:0:4:0): SCSI status: Check Condition
+(da3:mps0:0:4:0): SCSI sense: HARDWARE FAILURE asc:44,0 (Internal target failure)
+(da3:mps0:0:4:0): SYNCHRONIZE CACHE(10). CDB: 35 0 0 0 0 0 0 0 0 0
+(da3:mps0:0:4:0): CAM status: SCSI Status Error
+(da3:mps0:0:4:0): SCSI status: Check Condition
+(da3:mps0:0:4:0): SCSI sense: HARDWARE FAILURE asc:44,0 (Internal target failure)
+(da3:mps0:0:4:0): WRITE(10). CDB: 2a 0 0 40 1 a0 0 0 8 0
+(da3:mps0:0:4:0): CAM status: SCSI Status Error
+(da3:mps0:0:4:0): SCSI status: Check Condition
+(da3:mps0:0:4:0): SCSI sense: HARDWARE FAILURE asc:44,0 (Internal target failure)
+(da3:mps0:0:4:0): WRITE(10). CDB: 2a 0 0 40 1 a0 0 0 8 0
+(da3:mps0:0:4:0): CAM status: SCSI Status Error
+(da3:mps0:0:4:0): SCSI status: Check Condition
+(da3:mps0:0:4:0): SCSI sense: HARDWARE FAILURE asc:44,0 (Internal target failure)
+(da3:mps0:0:4:0): WRITE(10). CDB: 2a 0 0 40 3 a0 0 0 8 0
+(da3:mps0:0:4:0): CAM status: SCSI Status Error
+(da3:mps0:0:4:0): SCSI status: Check Condition
+(da3:mps0:0:4:0): SCSI sense: HARDWARE FAILURE asc:44,0 (Internal target failure)
+(da3:mps0:0:4:0): WRITE(10). CDB: 2a 0 0 40 1 a0 0 0 8 0
+(da3:mps0:0:4:0): CAM status: SCSI Status Error
+(da3:mps0:0:4:0): SCSI status: Check Condition
+(da3:mps0:0:4:0): SCSI sense: HARDWARE FAILURE asc:44,0 (Internal target failure)
+(da3:mps0:0:4:0): WRITE(10). CDB: 2a 0 0 40 3 a0 0 0 8 0
+(da3:mps0:0:4:0): CAM status: SCSI Status Error
+(da3:mps0:0:4:0): SCSI status: Check Condition
+(da3:mps0:0:4:0): SCSI sense: HARDWARE FAILURE asc:44,0 (Internal target failure)
+(da3:mps0:0:4:0): WRITE(10). CDB: 2a 0 0 40 1 a0 0 0 8 0
+(da3:mps0:0:4:0): CAM status: SCSI Status Error
+(da3:mps0:0:4:0): SCSI status: Check Condition
+(da3:mps0:0:4:0): SCSI sense: HARDWARE FAILURE asc:44,0 (Internal target failure)
+(da3:mps0:0:4:0): WRITE(10). CDB: 2a 0 0 40 3 a0 0 0 8 0
+(da3:mps0:0:4:0): CAM status: SCSI Status Error
+(da3:mps0:0:4:0): SCSI status: Check Condition
+(da3:mps0:0:4:0): SCSI sense: HARDWARE FAILURE asc:44,0 (Internal target failure)
+(da3:mps0:0:4:0): WRITE(10). CDB: 2a 0 0 40 1 a0 0 0 8 0
+(da3:mps0:0:4:0): CAM status: SCSI Status Error
+(da3:mps0:0:4:0): SCSI status: Check Condition
+(da3:mps0:0:4:0): SCSI sense: HARDWARE FAILURE asc:44,0 (Internal target failure)
+(da3:mps0:0:4:0): WRITE(16). CDB: 8a 0 0 0 0 1 5d 50 9f a0 0 0 0 8 0 0
+(da3:mps0:0:4:0): CAM status: SCSI Status Error
+(da3:mps0:0:4:0): SCSI status: Check Condition
+(da3:mps0:0:4:0): SCSI sense: HARDWARE FAILURE asc:44,0 (Internal target failure)
+(da3:mps0:0:4:0): WRITE(10). CDB: 2a 0 0 40 3 a0 0 0 8 0
+(da3:mps0:0:4:0): CAM status: SCSI Status Error
+(da3:mps0:0:4:0): SCSI status: Check Condition
+(da3:mps0:0:4:0): SCSI sense: HARDWARE FAILURE asc:44,0 (Internal target failure)
+(da3:mps0:0:4:0): WRITE(16). CDB: 8a 0 0 0 0 1 5d 50 a1 a0 0 0 0 8 0 0
+(da3:mps0:0:4:0): CAM status: SCSI Status Error
+(da3:mps0:0:4:0): SCSI status: Check Condition
+(da3:mps0:0:4:0): SCSI sense: HARDWARE FAILURE asc:44,0 (Internal target failure)
+(da3:mps0:0:4:0): WRITE(16). CDB: 8a 0 0 0 0 1 5d 50 9f a0 0 0 0 8 0 0
+(da3:mps0:0:4:0): CAM status: SCSI Status Error
+(da3:mps0:0:4:0): SCSI status: Check Condition
+(da3:mps0:0:4:0): SCSI sense: HARDWARE FAILURE asc:44,0 (Internal target failure)
+(da3:mps0:0:4:0): WRITE(10). CDB: 2a 0 0 40 3 a0 0 0 8 0
+(da3:mps0:0:4:0): CAM status: SCSI Status Error
+(da3:mps0:0:4:0): SCSI status: Check Condition
+(da3:mps0:0:4:0): SCSI sense: HARDWARE FAILURE asc:44,0 (Internal target failure)
+(da3:mps0:0:4:0): WRITE(16). CDB: 8a 0 0 0 0 1 5d 50 a1 a0 0 0 0 8 0 0
+(da3:mps0:0:4:0): CAM status: SCSI Status Error
+(da3:mps0:0:4:0): SCSI status: Check Condition
+(da3:mps0:0:4:0): SCSI sense: HARDWARE FAILURE asc:44,0 (Internal target failure)
+(da3:mps0:0:4:0): WRITE(16). CDB: 8a 0 0 0 0 1 5d 50 9f a0 0 0 0 8 0 0
+(da3:mps0:0:4:0): CAM status: SCSI Status Error
+(da3:mps0:0:4:0): SCSI status: Check Condition
+(da3:mps0:0:4:0): SCSI sense: HARDWARE FAILURE asc:44,0 (Internal target failure)
+(da3:mps0:0:4:0): WRITE(16). CDB: 8a 0 0 0 0 1 5d 50 a1 a0 0 0 0 8 0 0
+(da3:mps0:0:4:0): CAM status: SCSI Status Error
+(da3:mps0:0:4:0): SCSI status: Check Condition
+(da3:mps0:0:4:0): SCSI sense: HARDWARE FAILURE asc:44,0 (Internal target failure)
+(da3:mps0:0:4:0): WRITE(16). CDB: 8a 0 0 0 0 1 5d 50 9f a0 0 0 0 8 0 0
+(da3:mps0:0:4:0): CAM status: SCSI Status Error
+(da3:mps0:0:4:0): SCSI status: Check Condition
+(da3:mps0:0:4:0): SCSI sense: HARDWARE FAILURE asc:44,0 (Internal target failure)
+(da3:mps0:0:4:0): WRITE(16). CDB: 8a 0 0 0 0 1 5d 50 a1 a0 0 0 0 8 0 0
+(da3:mps0:0:4:0): CAM status: SCSI Status Error
+(da3:mps0:0:4:0): SCSI status: Check Condition
+(da3:mps0:0:4:0): SCSI sense: HARDWARE FAILURE asc:44,0 (Internal target failure)
+(da3:mps0:0:4:0): WRITE(16). CDB: 8a 0 0 0 0 1 5d 50 9f a0 0 0 0 8 0 0
+(da3:mps0:0:4:0): CAM status: SCSI Status Error
+(da3:mps0:0:4:0): SCSI status: Check Condition
+(da3:mps0:0:4:0): SCSI sense: HARDWARE FAILURE asc:44,0 (Internal target failure)
+(da3:mps0:0:4:0): WRITE(16). CDB: 8a 0 0 0 0 1 5d 50 a1 a0 0 0 0 8 0 0
+(da3:mps0:0:4:0): CAM status: SCSI Status Error
+(da3:mps0:0:4:0): SCSI status: Check Condition
+(da3:mps0:0:4:0): SCSI sense: HARDWARE FAILURE asc:44,0 (Internal target failure)
+(da3:mps0:0:4:0): READ(16). CDB: 88 0 0 0 0 1 5d 50 9e 90 0 0 0 10 0 0
+(da3:mps0:0:4:0): CAM status: SCSI Status Error
+(da3:mps0:0:4:0): SCSI status: Check Condition
+(da3:mps0:0:4:0): SCSI sense: HARDWARE FAILURE asc:44,0 (Internal target failure)
+(da3:mps0:0:4:0): READ(16). CDB: 88 0 0 0 0 1 5d 50 a0 90 0 0 0 10 0 0
+(da3:mps0:0:4:0): CAM status: SCSI Status Error
+(da3:mps0:0:4:0): SCSI status: Check Condition
+(da3:mps0:0:4:0): SCSI sense: HARDWARE FAILURE asc:44,0 (Internal target failure)
+(da3:mps0:0:4:0): READ(16). CDB: 88 0 0 0 0 1 5d 50 9e 90 0 0 0 10 0 0
+(da3:mps0:0:4:0): CAM status: SCSI Status Error
+(da3:mps0:0:4:0): SCSI status: Check Condition
+(da3:mps0:0:4:0): SCSI sense: HARDWARE FAILURE asc:44,0 (Internal target failure)
+(da3:mps0:0:4:0): READ(16). CDB: 88 0 0 0 0 1 5d 50 9e 90 0 0 0 10 0 0
+(da3:mps0:0:4:0): CAM status: SCSI Status Error
+(da3:mps0:0:4:0): SCSI status: Check Condition
+(da3:mps0:0:4:0): SCSI sense: HARDWARE FAILURE asc:44,0 (Internal target failure)
+(da3:mps0:0:4:0): READ(16). CDB: 88 0 0 0 0 1 5d 50 a0 90 0 0 0 10 0 0
+(da3:mps0:0:4:0): CAM status: SCSI Status Error
+(da3:mps0:0:4:0): SCSI status: Check Condition
+(da3:mps0:0:4:0): SCSI sense: HARDWARE FAILURE asc:44,0 (Internal target failure)
+(da3:mps0:0:4:0): READ(16). CDB: 88 0 0 0 0 1 5d 50 9e 90 0 0 0 10 0 0
+(da3:mps0:0:4:0): CAM status: SCSI Status Error
+(da3:mps0:0:4:0): SCSI status: Check Condition
+(da3:mps0:0:4:0): SCSI sense: HARDWARE FAILURE asc:44,0 (Internal target failure)
+(da3:mps0:0:4:0): READ(16). CDB: 88 0 0 0 0 1 5d 50 a0 90 0 0 0 10 0 0
+(da3:mps0:0:4:0): CAM status: SCSI Status Error
+(da3:mps0:0:4:0): SCSI status: Check Condition
+(da3:mps0:0:4:0): SCSI sense: HARDWARE FAILURE asc:44,0 (Internal target failure)
+(da3:mps0:0:4:0): READ(16). CDB: 88 0 0 0 0 1 5d 50 9e 90 0 0 0 10 0 0
+(da3:mps0:0:4:0): CAM status: SCSI Status Error
+(da3:mps0:0:4:0): SCSI status: Check Condition
+(da3:mps0:0:4:0): SCSI sense: HARDWARE FAILURE asc:44,0 (Internal target failure)
+(da3:mps0:0:4:0): READ(16). CDB: 88 0 0 0 0 1 5d 50 a0 90 0 0 0 10 0 0
+(da3:mps0:0:4:0): CAM status: SCSI Status Error
+(da3:mps0:0:4:0): SCSI status: Check Condition
+(da3:mps0:0:4:0): SCSI sense: HARDWARE FAILURE asc:44,0 (Internal target failure)
+(da3:mps0:0:4:0): READ(16). CDB: 88 0 0 0 0 1 5d 50 a0 90 0 0 0 10 0 0
+(da3:mps0:0:4:0): CAM status: SCSI Status Error
+(da3:mps0:0:4:0): SCSI status: Check Condition
+(da3:mps0:0:4:0): SCSI sense: HARDWARE FAILURE asc:44,0 (Internal target failure)


What I'm reading from this is that the device is da3, but from checking the pool and drives, I thought for sure it was da7 (the one that isn't listed in the pool), because da3 shows up as a null drive when looking at the pool.

Does that help to diagnose what might be wrong? Should I be replacing drive da3, or testing it? Or is my RAID card having fits like the one in my other server, so that it might need its firmware changed?
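
One way to double-check which device the kernel is complaining about is to tally the failed commands in the log by device and controller. The following is a minimal Python sketch, assuming the line format seen in the excerpt above; the regular expression and the function name tally_cam_errors are illustrative and not part of FreeNAS.

```python
import re
from collections import Counter

# Matches command lines such as "+(da3:mps0:0:4:0): WRITE(10). CDB: 2a 0 ...".
# The pattern is an assumption based only on the excerpt in this thread.
CMD_LINE = re.compile(r"\((\w+):(\w+):\d+:\d+:\d+\): ([A-Z ]+\(\d+\))\. CDB:")


def tally_cam_errors(log_text):
    """Count failed commands per (device, controller, command)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = CMD_LINE.search(line)
        if match:
            counts[match.groups()] += 1
    return counts


# A few lines copied from the security report above.
sample = """\
+(da3:mps0:0:4:0): SYNCHRONIZE CACHE(10). CDB: 35 0 0 0 0 0 0 0 0 0
+(da3:mps0:0:4:0): CAM status: SCSI Status Error
+(da3:mps0:0:4:0): WRITE(10). CDB: 2a 0 0 40 1 a0 0 0 8 0
+(da3:mps0:0:4:0): SCSI sense: HARDWARE FAILURE asc:44,0 (Internal target failure)
+(da3:mps0:0:4:0): READ(16). CDB: 88 0 0 0 0 1 5d 50 9e 90 0 0 0 10 0 0
"""

for (device, controller, command), n in sorted(tally_cam_errors(sample).items()):
    print(device, controller, command, n)
```

If every failed command lands on the same device and controller, as it does in the excerpt, that narrows the suspects to that disk, its cable or backplane slot, or that controller port.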
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
Odds are exceptionally good that it's the RAID card.
 

tstorzuk

Explorer
Joined
Jun 13, 2011
Messages
92
DrKK,

Is there a log file or something else I can do to test/check if it's the RAID card?
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
I've never used a RAID card, so I'm no expert. But I've been sitting around when cyberjock has been talking about these things.

Since you're throwing CAM errors, it's almost certainly a RAID card error, provided the disk's SMART output doesn't show anything scary. Run smartctl -a /dev/da3 and post the output here in "code" tags or, better yet, on pastebin.
 

tstorzuk

Explorer
Joined
Jun 13, 2011
Messages
92
Result of smartctl -a /dev/da3:

Code:
[root@trinity] ~# smartctl -a /dev/da3
smartctl 5.43 2012-06-30 r3573 [FreeBSD 8.3-RELEASE-p7 amd64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda (SATA 3Gb/s, 4K Sectors)
Device Model:     ST3000DM001-1CH166
Serial Number:    Z1F1GQN7
LU WWN Device Id: 5 000c50 04ebcf754
Firmware Version: CC24
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Sat Feb 28 09:54:41 2015 MST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  584) seconds.
Offline data collection
capabilities:                    (0x73) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 326) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x3085) SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   119   099   006    Pre-fail  Always       -       229952616
  3 Spin_Up_Time            0x0003   093   093   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       119
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   065   059   030    Pre-fail  Always       -       94552680432
  9 Power_On_Hours          0x0032   087   087   000    Old_age   Always       -       11944
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       119
183 Runtime_Bad_Block       0x0032   099   099   000    Old_age   Always       -       1
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   069   049   045    Old_age   Always       -       31 (Min/Max 29/31)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       118
193 Load_Cycle_Count        0x0032   016   016   000    Old_age   Always       -       168954
194 Temperature_Celsius     0x0022   031   051   000    Old_age   Always       -       31 (0 22 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       8602819499575
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       44550147662
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       239429708098

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     11826         -
# 2  Short offline       Completed without error       00%     11725         -
# 3  Short offline       Completed without error       00%     11605         -
# 4  Short offline       Completed without error       00%     11509         -
# 5  Extended offline    Completed without error       00%     11491         -
# 6  Short offline       Completed without error       00%     11389         -
# 7  Short offline       Completed without error       00%     11221         -
# 8  Short offline       Completed without error       00%     11110         -
# 9  Extended offline    Completed without error       00%     11092         -
#10  Short offline       Completed without error       00%     10990         -
#11  Short offline       Completed without error       00%     10870         -
#12  Short offline       Completed without error       00%     10775         -
#13  Extended offline    Interrupted (host reset)      20%     10756         -
#14  Short offline       Completed without error       00%     10656         -
#15  Extended offline    Interrupted (host reset)      00%     10536         -
#16  Short offline       Completed without error       00%     10439         -
#17  Short offline       Completed without error       00%     10330         -
#18  Short offline       Completed without error       00%     10234         -
#19  Extended offline    Completed without error       00%     10216         -
#20  Short offline       Completed without error       00%     10157         -
#21  Short offline       Completed without error       00%     10037         -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
OK:

1) That looks good. I don't suspect the drive is dying.
2) I suspect your RAID card has a problem.
3) Time to update your FreeNAS, sir. FreeNAS 8.3, sir?
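
The "looks good" reading in point 1 can be approximated by checking a handful of raw values in the attribute table. This is a rough Python sketch, not a diagnostic tool: the choice of attribute IDs is a common rule of thumb (an assumption, not an official list), and parse_attributes and suspicious are hypothetical helper names. The sample rows are copied from the smartctl output posted above.

```python
# Attributes commonly watched for media or cabling trouble. This particular
# selection is a rule of thumb (an assumption), not an official list.
WATCHED_IDS = (5, 187, 197, 198, 199)


def parse_attributes(smart_text):
    """Map attribute ID -> (name, raw value) for rows of the attribute table."""
    attrs = {}
    for line in smart_text.splitlines():
        fields = line.split()
        # A table row has at least 10 columns, starts with a numeric ID,
        # carries a hex flag in the third column and a numeric raw value.
        if (len(fields) >= 10 and fields[0].isdigit()
                and fields[2].startswith("0x") and fields[9].isdigit()):
            attrs[int(fields[0])] = (fields[1], int(fields[9]))
    return attrs


def suspicious(attrs):
    """Return the watched attribute IDs whose raw value is non-zero."""
    return [i for i in WATCHED_IDS if i in attrs and attrs[i][1] > 0]


sample = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
"""

print(suspicious(parse_attributes(sample)))
```

An empty list here only means the drive reports nothing alarming about itself in these counters; it does not rule out a cable, backplane, or controller fault.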
 

tstorzuk

Explorer
Joined
Jun 13, 2011
Messages
92
Yes, 8.3. If it ain't broke, don't fix it.

Plus, I find 8.3 is way easier to navigate in the GUI and find things than 9.x versions.

I'll swap out the RAID card sometime this weekend.....if time permits. My other server also needs to have the firmware changed on all of its RAID cards, so it will be a worthwhile task. Thanks for the help!
 

tstorzuk

Explorer
Joined
Jun 13, 2011
Messages
92
Ok, well, I thought this problem was resolved. Now I'm not so sure. After reading this thread, I'm not 100% sure that the error lies with my RAID card. So I booted the server back up, and there are no errors now (because a hard boot resets everything). As per the advice in the above thread, I'm starting to run LONG SMART tests on all HDDs in the pool. The problem is that it's going to take 300+ minutes to run on each drive.

So my question is, can I run concurrent LONG SMART tests on each drive in the pool, or do I have to wait until one completes before I start another one?

BTW, I'll also check my zpool status once this first LONG SMART test has finished running on the first HDD in the pool (I would have done this first had I known how long it would take) to see if there is still an error overall. But I'm not going to run it while the first HDD has the test running in off-line mode.
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
Each drive's SMART tests are totally autonomous. Run all the LONG tests all at the same time. They are drive level, not filesystem level.
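
Because smartctl -t long only asks the drive to start the test and then returns, kicking the tests off one after another leaves them all running at once. A small Python sketch of that idea follows; the device names are placeholders (only da3 and da7 are named in this thread), and long_test_commands and start_long_tests are hypothetical helpers. It defaults to a dry run that only prints the commands.

```python
import subprocess


def long_test_commands(devices):
    """Build one 'smartctl -t long' command line per device name."""
    return [["smartctl", "-t", "long", f"/dev/{dev}"] for dev in devices]


def start_long_tests(devices, dry_run=True):
    """Start a long self-test on every device, or just print the commands."""
    commands = long_test_commands(devices)
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            # Returns as soon as the drive accepts the request; the test
            # itself keeps running inside the drive.
            subprocess.run(argv, check=True)
    return commands


# Placeholder device list; substitute the disks that are actually in the pool.
start_long_tests(["da3", "da7"])
```

Progress and results can then be read per drive with smartctl -l selftest /dev/da3 (and likewise for the other disks).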
 

tstorzuk

Explorer
Joined
Jun 13, 2011
Messages
92
Ok, well, now I don't know what's going on. I'm getting absolutely no errors now. I have run the long and short SMART tests and checked the zpool status, and still nothing.

Could that error have been a one-off kinda thing?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Three things..

1. If you call that card your "RAID" card again, you will probably be disemboweled. ;) You probably freaked out every person of any kind of experience the second they saw the word "RAID" in your post and ignored you after that. Hardware RAID is basically suicide for ZFS. If you truly used hardware RAID, the community takes the stance that "you deserve what's coming for ignoring those recommendations".
2. Post a debug of your system after it's been running for a day or two. That will give any kind of hardware problem a chance to show up and result in error messages.
3. The SMART from a disk tells you what the disk thinks of itself. The SMART info shows that the disk has zero indicators of a problem. So unless you happen to have a disk with a defect that happens to not affect any of the indicators it monitors (exceedingly unlikely, but not theoretically impossible) then the problem lies elsewhere.

What's the output of "sas2flash -listall"?
 

tstorzuk

Explorer
Joined
Jun 13, 2011
Messages
92
Sorry this reply has taken so long.... Life happened and other things became more important.

1. Ok, I'll try to refer to the device as a SATA Controller (?) instead of a RAID card.....didn't know it would offend so much. :(

2. Which portion of the Debug should I post? There are 11. I'm guessing the Hardware Configuration, and/or the Loader Configuration. I know you don't want to see everything....

3. Ok.

Would I be able to load sas2flash under FreeNAS and run it or would I have to run it via a separate boot device (USB)? I just SSH'd in to my server and the command isn't known.
 

BigDave

FreeNAS Enthusiast
Joined
Oct 6, 2013
Messages
2,479

tstorzuk

Explorer
Joined
Jun 13, 2011
Messages
92
From the shell I'm getting the same output;

[root@trinity ~]# sas2flash -listall
bash: sas2flash: command not found

as I get from the Secure Shell (SSH);

[root@trinity] ~# sas2flash -listall
sas2flash: Command not found.

I am using version 8.3; could that be the reason it's not an available command?
 

BigDave

FreeNAS Enthusiast
Joined
Oct 6, 2013
Messages
2,479
My guess is YES, that's the reason.
Assuming cyberjock's intention was to check your driver/firmware versions, try this command and post the output, please:
dmesg | grep mps
 

tstorzuk

Explorer
Joined
Jun 13, 2011
Messages
92
Here it is;

Code:
[root@trinity] ~# dmesg | grep mps
mps0: <LSI SAS2008> port 0xe000-0xe0ff mem 0xdfa00000-0xdfa03fff,0xdf980000-0xdf9bffff irq 16 at device 0.0 on pci1
mps0: Firmware: 15.00.00.00, Driver: 13.00.00.00-fbsd
mps0: IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>
mps0: [ITHREAD]
mps1: <LSI SAS2008> port 0xd000-0xd0ff mem 0xdf400000-0xdf403fff,0xdf380000-0xdf3bffff irq 17 at device 0.0 on pci2
mps1: Firmware: 15.00.00.00, Driver: 13.00.00.00-fbsd
mps1: IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>
mps1: [ITHREAD]
mps2: <LSI SAS2008> port 0xc000-0xc0ff mem 0xdee00000-0xdee03fff,0xded80000-0xdedbffff irq 16 at device 0.0 on pci3
mps2: Firmware: 15.00.00.00, Driver: 13.00.00.00-fbsd
mps2: IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>
mps2: [ITHREAD]
[root@trinity] ~#


Ok, I'm guessing that the driver should be updated to match the firmware.
 

BigDave

FreeNAS Enthusiast
Joined
Oct 6, 2013
Messages
2,479
The driver is part of the FreeNAS version you are using (you don't want to change this). Just flash the card with the matching firmware version, P13 (IT mode), and then you can test things out.
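
The comparison being made here (firmware phase 15 against driver phase 13) can be pulled out of the dmesg output mechanically. A minimal Python sketch, assuming the "Firmware: ..., Driver: ..." line format shown in the output above; phase_mismatches is a hypothetical helper, and treating the first version component as the phase number follows the usage in this thread.

```python
import re

# Matches lines such as "mps0: Firmware: 15.00.00.00, Driver: 13.00.00.00-fbsd".
VERSION_LINE = re.compile(r"(mps\d+): Firmware: ([\d.]+), Driver: ([\d.]+)")


def phase_mismatches(dmesg_text):
    """Return {controller: (firmware_phase, driver_phase)} where they differ."""
    mismatches = {}
    for controller, firmware, driver in VERSION_LINE.findall(dmesg_text):
        fw_phase = int(firmware.split(".")[0])
        drv_phase = int(driver.split(".")[0])
        if fw_phase != drv_phase:
            mismatches[controller] = (fw_phase, drv_phase)
    return mismatches


# Version lines copied from the dmesg output posted above.
sample = """\
mps0: Firmware: 15.00.00.00, Driver: 13.00.00.00-fbsd
mps1: Firmware: 15.00.00.00, Driver: 13.00.00.00-fbsd
mps2: Firmware: 15.00.00.00, Driver: 13.00.00.00-fbsd
"""

for controller, (fw, drv) in sorted(phase_mismatches(sample).items()):
    print(f"{controller}: firmware P{fw}, driver P{drv}")
```

All three controllers report the same mismatch, so a flash to the driver's phase would apply to each card.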
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I'd say it's unlikely that the firmware version mismatch would cause problems with just a single disk. But since 8.3 is circa 12th century BC, support is gonna be non-existent. Your best bet is to get on something MUCH more recent, make the firmware match the driver you are using, *then* figure out *if* there is still a problem.
 