SOLVED Security Run Output Error

Status
Not open for further replies.

tstorzuk

Explorer
Joined
Jun 13, 2011
Messages
92
A little help, please, for a noob:

I recently got my NAS up and running. I've filled one of my ZFS pools and am starting to load up another one.

Since I started copying data onto the next pool, I've been getting errors in the security run output. This is what it's telling me:

Code:
+++ /tmp/security.GlB5bLOs      2013-08-01 03:01:00.000000000 -0600
+(da12:mps1:0:4:0): READ(10). CDB: 28 0 c9 29 ed c8 0 0 8 0
+(da12:mps1:0:4:0): CAM status: SCSI Status Error
+(da12:mps1:0:4:0): SCSI status: Check Condition
+(da12:mps1:0:4:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
+(da12:mps1:0:4:0): READ(10). CDB: 28 0 2 7a fb 58 0 0 8 0
+(da12:mps1:0:4:0): CAM status: SCSI Status Error
+(da12:mps1:0:4:0): SCSI status: Check Condition
+(da12:mps1:0:4:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)


From this I can tell that drive da12 is having some kind of error, but I have no idea what kind. Can someone help me identify this error and what it might mean? I know the drive is in the second pool, which I recently started writing files to.
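The CDB bytes in those lines can be decoded to see which sectors the failed commands targeted. Below is a minimal POSIX sh sketch, assuming the bytes are printed as space-separated hex without leading zeros (as in the log above) and using the standard READ(10)/WRITE(10) layout: byte 0 is the opcode (0x28 for READ, 0x2a for WRITE), bytes 2-5 are the big-endian LBA, and bytes 7-8 are the transfer length in blocks. The function name decode_cdb10 is hypothetical.

```shell
#!/bin/sh
# decode_cdb10 (hypothetical helper): decode a READ(10)/WRITE(10) CDB as
# printed by FreeBSD CAM, e.g. "28 0 c9 29 ed c8 0 0 8 0".
# Layout: byte 0 = opcode, bytes 2-5 = LBA (big-endian),
#         bytes 7-8 = transfer length in logical blocks.
decode_cdb10() {
    # Split the ten hex bytes into $1..$10 (byte 0 is $1).
    set -- $1
    op=$((0x$1))
    lba=$(( (0x$3 << 24) | (0x$4 << 16) | (0x$5 << 8) | 0x$6 ))
    blocks=$(( (0x$8 << 8) | 0x$9 ))
    case $op in
        40) name=READ10 ;;    # opcode 0x28
        42) name=WRITE10 ;;   # opcode 0x2a
        *)  name=UNKNOWN ;;
    esac
    printf '%s lba=%d blocks=%d\n' "$name" "$lba" "$blocks"
}

decode_cdb10 "28 0 c9 29 ed c8 0 0 8 0"
```

Decoded this way, the two da12 failures were 8-block reads at unrelated addresses (LBA 3374968264 and 41614168). Errors scattered across the disk like this fit a link problem somewhat better than a single bad sector, though that is an inference rather than proof.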

I searched the forums and found this post, which seems very relevant. I have twenty-four 3.0 TB Seagate drives, but only da12 seems to be giving me any grief. The other 23 seem fine so far (I haven't touched the third ZFS pool yet, so I'm not totally sure of that, but I tested each drive for two weeks before putting it in the server, and they all passed every test). My pools also have hot spares, so if anything happens to da12 I should be OK...

The post I found also mentions a faulty backplane, which I've had before in another Norco RPC-4220 case (this one is a Norco RPC-4224). I'm not going to order a new one until I know whether my third pool is fully functioning, in case I need more than one backplane. Tonight I'll start transferring files to the third pool to see if I get similar errors on it.

Is there a way to spin up da12 so that its activity light comes on and I can identify it in the case? I have spares in case I need to swap it out. Swapping it would also help me tell whether the drive itself is faulty or the backplane is.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
It looks like it's having read errors. I'd find that disk and replace it at your next opportunity. As for the light: no, not unless the drive's activity LED can be controlled by the hardware it's attached to, and that hardware has software included in FreeNAS that can drive the LED.
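The sense data narrows this down a little. Sense key ABORTED COMMAND with ASC/ASCQ 47h/03h ("information unit iuCRC error detected") means a CRC check failed on data sent over the link between the controller and the drive; a drive that cannot read its own platters normally reports sense key MEDIUM ERROR instead (for example ASC/ASCQ 11h/00h, unrecovered read error). Transport CRC errors point toward the cable, backplane, or HBA link rather than the disk surface. The sketch below tallies sense lines per device under that rough heuristic, assuming the CAM log format shown above; sense_summary and its link/media/reset buckets are hypothetical, not part of FreeNAS.

```shell
#!/bin/sh
# sense_summary (hypothetical helper): count "SCSI sense:" lines per device
# from FreeBSD CAM messages (dmesg or the nightly security-run diff) and
# bucket them with a rough heuristic:
#   link  = ASC 47h (SCSI parity / iuCRC errors: data corrupted in transit)
#   media = sense key MEDIUM ERROR (the drive could not read its own media)
#   reset = UNIT ATTENTION with ASC 29h (power on / reset occurred)
sense_summary() {
    awk '
    /SCSI sense:/ {
        # Device name: text between the first "(" and the first ":".
        dev = $0
        sub(/^[^(]*\(/, "", dev)
        sub(/:.*/, "", dev)
        # Sense text without the description, e.g. "ABORTED COMMAND asc:47,3".
        sense = $0
        sub(/.*SCSI sense: /, "", sense)
        sub(/ *\(.*/, "", sense)
        kind = "other"
        if (sense ~ /^MEDIUM ERROR/) kind = "media"
        else if (sense ~ /asc:47,/) kind = "link"
        else if (sense ~ /^UNIT ATTENTION asc:29,/) kind = "reset"
        count[dev " " kind]++
    }
    END { for (k in count) print k, count[k] }
    ' | sort
}

# Usage on the NAS (not run here):
#   dmesg | sense_summary
```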

What I do is run smartctl -a /dev/da12, write down the serial number, then shut down the server and hunt down that disk. The serial number is also shown in the GUI if you look at the disks.
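The same serial-number lookup can be scripted. A small sketch, assuming smartctl's identity output contains a "Serial Number:" line (as it does for SATA drives); disk_serial is a hypothetical helper name, and the kern.disks loop in the usage comment assumes a FreeBSD-based system like FreeNAS.

```shell
#!/bin/sh
# disk_serial (hypothetical helper): read `smartctl -i` output on stdin and
# print the value of the "Serial Number:" field.
disk_serial() {
    awk -F': *' '/^Serial [Nn]umber:/ { print $2; exit }'
}

# Usage on the NAS (not run here):
#   smartctl -i /dev/da12 | disk_serial
#   for d in $(sysctl -n kern.disks); do echo "$d $(smartctl -i /dev/$d | disk_serial)"; done
```

Printing every disk with its serial before shutting down gives a list to match against the labels on the drives.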
 

tstorzuk

Explorer
Joined
Jun 13, 2011
Messages
92
Ugh, I thought all the testing I did would reduce the chance of these drives having errors! Well, they're still under warranty.

So, are there any special steps I should follow to replace the faulty drive? Find the serial number, shut down the server, identify the faulty disk, swap it out... and then will FreeNAS take care of rebuilding onto the new drive, or do I have to start the rebuild somehow?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Consult the manual. There's a section for replacing disks.
 

tstorzuk

Explorer
Joined
Jun 13, 2011
Messages
92
Yup, I found it by doing a Google search. Thanks for the help. I'll see what happens when I replace the drive. Hopefully it's just the drive, and not the backplane. I don't want to have to tear apart the whole case to get in and replace one.
 

tstorzuk

Explorer
Joined
Jun 13, 2011
Messages
92
OK, I've followed the manual section "Replacing a Failed Drive or SSD", and the replacement drive is currently resilvering.

Quick question: how do I monitor the resilver using zpool status? I've tried
Code:
zpool status {RAID POOL NAME}

but it doesn't really show me the progress of the resilver.

I searched Google for "zpool status options" and got Oracle and Solaris documentation... many, many pages of it. I learned that zpool status is a pretty powerful tool, but I didn't find an option for viewing resilver progress. I don't want to mess anything else up, so if someone knows which options to use, please let me know.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
It is zpool status... look at the lines labelled state: and scan:. They'll show you the resilver progress, when it's complete, and so on.
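To watch just those lines without scrolling through the whole config table, a small filter helps. A sketch assuming the usual zpool status layout, where the scan: block ends at the config: line; scan_progress is a hypothetical name.

```shell
#!/bin/sh
# scan_progress (hypothetical helper): from `zpool status` output on stdin,
# print the "state:" line and the "scan:" block (which holds resilver or
# scrub progress), stopping at "config:".
scan_progress() {
    awk '
    /^ *state:/  { sub(/^ +/, ""); print; next }
    /^ *scan:/   { show = 1 }
    /^ *config:/ { show = 0 }
    show         { sub(/^ +/, ""); print }
    '
}

# Usage on the NAS (not run here); re-check once a minute:
#   while :; do zpool status TRINITY_RAID-02 | scan_progress; sleep 60; done
```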
 

tstorzuk

Explorer
Joined
Jun 13, 2011
Messages
92
Ahhh,

I had to go to the console. The GUI shell only showed a little bit of the output (there's no way to resize the shell window or page up). The resilver has over an hour left... but I'm getting a lot of the same errors on the new drive, so I'm guessing it's a problem with the backplane. I'm going to have to order a new one.

Thanks for the help!
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
You can read output that scrolls by too fast by piping it through more, for example: zpool status | more.

Post the output of zpool status.
 

tstorzuk

Explorer
Joined
Jun 13, 2011
Messages
92
Output of zpool status as requested;
Code:
  pool: TRINITY_RAID-02
state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Fri Aug  2 17:25:31 2013
        2.34T scanned out of 3.34T at 356M/s, 0h49m to go
        327G resilvered, 69.99% done
config:

        NAME                                              STATE    READ WRITE CKSUM
        TRINITY_RAID-02                                  DEGRADED    0    0    0
          raidz2-0                                        DEGRADED    0    0    0
            gptid/78a36f95-e66d-11e2-aea7-002590ab7843    ONLINE      0    0    0
            gptid/791e1c0e-e66d-11e2-aea7-002590ab7843    ONLINE      0    0    0
            gptid/79a1317d-e66d-11e2-aea7-002590ab7843    ONLINE      0    0    0
            gptid/7a25ab00-e66d-11e2-aea7-002590ab7843    ONLINE      0    0    0
            replacing-4                                  OFFLINE      0    0    0
              6728111850616894390                        OFFLINE      0    0    0  was /dev/gptid/7aafe464-e66d-11e2-aea7-002590ab7843
              gptid/d2369baa-fbca-11e2-9a7e-002590ab7843  ONLINE      0    0    0  (resilvering)
            gptid/7b33f2c1-e66d-11e2-aea7-002590ab7843    ONLINE      0    0    0
            gptid/7c377b71-e66d-11e2-aea7-002590ab7843    ONLINE      0    0    0
        spares
          gptid/7cdcdaeb-e66d-11e2-aea7-002590ab7843      AVAIL

errors: No known data errors
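The "0h49m to go" estimate is essentially the remaining amount divided by the current rate. Assuming zpool's T and M suffixes are binary units (TiB and MiB), (3.34 - 2.34) TiB = 1,048,576 MiB, and at 356 MiB/s that is about 2,945 s, or roughly 49 minutes. A sketch of the same arithmetic, assuming the rate stays constant; resilver_eta_min is a hypothetical helper.

```shell
#!/bin/sh
# resilver_eta_min (hypothetical helper): whole minutes left for a scan,
# given scanned and total size in TiB and the rate in MiB/s, assuming the
# rate stays constant for the rest of the scan.
resilver_eta_min() {
    awk -v scanned="$1" -v total="$2" -v rate="$3" 'BEGIN {
        remaining_mib = (total - scanned) * 1024 * 1024   # TiB -> MiB
        printf "%d\n", remaining_mib / rate / 60          # s -> whole minutes
    }'
}

resilver_eta_min 2.34 3.34 356
```

The same calculation applies to a scrub, since its scan: line reports scanned, total, and rate in the same format.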
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I don't see any errors there.. what errors are you referring to?
 

tstorzuk

Explorer
Joined
Jun 13, 2011
Messages
92
They came up when I restarted the server and the resilver started. They're very similar to the ones in my first post:

Code:
    +++ /tmp/security.GlB5bLOs      2013-08-01 03:01:00.000000000 -0600
    +(da12:mps1:0:4:0): READ(10). CDB: 28 0 c9 29 ed c8 0 0 8 0
    +(da12:mps1:0:4:0): CAM status: SCSI Status Error
    +(da12:mps1:0:4:0): SCSI status: Check Condition
    +(da12:mps1:0:4:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)
    +(da12:mps1:0:4:0): READ(10). CDB: 28 0 2 7a fb 58 0 0 8 0
    +(da12:mps1:0:4:0): CAM status: SCSI Status Error
    +(da12:mps1:0:4:0): SCSI status: Check Condition
    +(da12:mps1:0:4:0): SCSI sense: ABORTED COMMAND asc:47,3 (Information unit iuCRC error detected)


They only show up at the console when I page up. I had to take a couple of photos to get them all:

Code:
    +(da15:mps1:0:8:0): WRITE(10). CDB: 2a 0 8 2b aa b8 0 0 68 0 length 53248 SMID 717 terminated ioc 804b scsi 0 state c xfer 0
    +(da15:mps1:0:8:0): WRITE(10). CDB: 2a 0 8 2b a7 e8 0 0 68 0 length 53248 SMID 3001 terminated ioc 804b scsi 0 state c xfer 0
    (several more lines like the two above, which I didn't have the patience to type out)
    +(da15:mps1:0:8:0): CAM status: SCSI Status Error
    +(da15:mps1:0:8:0): SCSI status: Check Condition
    +(da15:mps1:0:8:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
    +(da15:mps1:0:8:0): WRITE(10). CDB: 2a 0 8 2c 89 30 0 0 d8 0
    +(da15:mps1:0:8:0): CAM status: SCSI Status Error
    +(da15:mps1:0:8:0): SCSI status: Check Condition
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Yeah... da12 and da15 are both having issues. It's probably the backplane.
 

tstorzuk

Explorer
Joined
Jun 13, 2011
Messages
92
Actually, I was just thinking... it could be a cable too. I'll try swapping the cable when I can shut the server down again (after the resilver finishes), since that's much easier to change than the backplane. Otherwise, I'll order a couple of backplanes. It's good to have a spare handy in case this happens on one of the other servers.
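Each kernel message starts with a CAM path such as (da12:mps1:0:4:0): device da12 on controller instance mps1, bus 0, target 4, LUN 0. Listing the paths of the affected disks shows which controller they share, which helps decide whether a shared cable or backplane is the common factor. A sketch with sed, assuming that five-field path format; cam_paths is hypothetical, and how target numbers map to physical slots depends on the HBA and backplane (camcontrol devlist shows the current attachment of each disk).

```shell
#!/bin/sh
# cam_paths (hypothetical helper): print the unique CAM paths found at the
# start of kernel messages like "+(da12:mps1:0:4:0): READ(10). ...",
# split into device, controller, bus, target and LUN.
cam_paths() {
    sed -n 's/^[^(]*(\([a-z]*[0-9]*\):\([a-z]*[0-9]*\):\([0-9]*\):\([0-9]*\):\([0-9]*\)).*/\1 controller=\2 bus=\3 target=\4 lun=\5/p' | sort -u
}

# Usage on the NAS (not run here):
#   dmesg | cam_paths
```

For the messages in this thread, both da12 and da15 sit on mps1, which is consistent with (but does not by itself prove) a fault in a component they share.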

Thanks for the help.
 

tstorzuk

Explorer
Joined
Jun 13, 2011
Messages
92
OK, I got my new backplane and installed it. The pool now shows as HEALTHY and ONLINE in the GUI... but...

I'm still getting errors in the nightly status email:
Code:
Checking status of zfs pools:
NAME              SIZE  ALLOC  FREE    CAP  DEDUP  HEALTH  ALTROOT
TRINITY_RAID-01    19T  16.7T  2.27T    88%  1.00x  ONLINE  /mnt
TRINITY_RAID-02    19T  4.78T  14.2T    25%  1.00x  ONLINE  /mnt
TRINITY_RAID-03    19T  4.43T  14.6T    23%  1.00x  ONLINE  /mnt
 
  pool: TRINITY_RAID-02
state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
  see: http://www.sun.com/msg/ZFS-8000-9P
  scan: scrub repaired 0 in 0h0m with 0 errors on Sun Aug 11 00:00:06 2013
config:
 
        NAME                                            STATE    READ WRITE CKSUM
        TRINITY_RAID-02                                ONLINE      0    0    0
          raidz2-0                                      ONLINE      0    0    0
            gptid/78a36f95-e66d-11e2-aea7-002590ab7843  ONLINE      0    0    0
            gptid/791e1c0e-e66d-11e2-aea7-002590ab7843  ONLINE      0    0    0
            gptid/79a1317d-e66d-11e2-aea7-002590ab7843  ONLINE      0    0    0
            gptid/7a25ab00-e66d-11e2-aea7-002590ab7843  ONLINE      0    0    0
            gptid/d2369baa-fbca-11e2-9a7e-002590ab7843  ONLINE      0    0    2
            gptid/7b33f2c1-e66d-11e2-aea7-002590ab7843  ONLINE      0    0    0
            gptid/7c377b71-e66d-11e2-aea7-002590ab7843  ONLINE      0    0    0
        spares
          gptid/7cdcdaeb-e66d-11e2-aea7-002590ab7843    AVAIL


The drive "gptid/d2369baa-fbca-11e2-9a7e-002590ab7843 ONLINE 0 0 2" is da15, the drive I was having problems with before I replaced the backplane.
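In a long config table, the rows with non-zero counters are easy to miss. A sketch of an awk filter that prints only those rows, assuming the unwrapped zpool status layout with NAME, STATE, READ, WRITE and CKSUM columns; vdev_errors is a hypothetical name. On FreeBSD, glabel status lists each gptid label next to the daX partition that carries it, which is one way to confirm which disk a gptid refers to.

```shell
#!/bin/sh
# vdev_errors (hypothetical helper): from `zpool status` output on stdin,
# print every row of the config table whose READ, WRITE or CKSUM count is
# non-zero.
vdev_errors() {
    awk '
    /^ *NAME +STATE +READ +WRITE +CKSUM/ { intable = 1; next }
    /^ *(spares|errors:)/                { intable = 0 }
    intable && NF >= 5 && ($3 + $4 + $5) > 0 {
        print $1, "read=" $3, "write=" $4, "cksum=" $5
    }
    '
}

# Usage on the NAS (not run here):
#   zpool status TRINITY_RAID-02 | vdev_errors
#   glabel status | grep d2369baa    # which daX partition carries that gptid
```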

Are there any steps I might have missed in getting this pool back up and running properly? I'm scrubbing the volume now, but I'm not sure how long it will take or when it will finish.
 

tstorzuk

Explorer
Joined
Jun 13, 2011
Messages
92
Sorry for wasting anyone's time, but I think I figured it out.

I ran zpool clear TRINITY_RAID-02 and it seems to be good now. I'll scrub the pool again and see if anything else turns up.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
zpool clear doesn't fix any errors; it just resets those counters to 0. If you had errors during a scrub, scrubbed again, and still have errors, then you have a problem that needs to be addressed. In essence, those errors cost you one disk's worth of redundancy.
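Following that advice, the check worth making is: after a fresh scrub has finished, the scan: line reports 0 errors and every READ/WRITE/CKSUM counter is still 0. A sketch under those assumptions; pool_clean_after_scrub is a hypothetical helper, and it only gives a meaningful answer once the scrub has completed, because an in-progress scan: line will not match.

```shell
#!/bin/sh
# pool_clean_after_scrub (hypothetical helper): exit 0 only if `zpool status`
# output on stdin shows a finished scrub that repaired 0 with 0 errors AND
# no non-zero READ/WRITE/CKSUM counter in the config table.
pool_clean_after_scrub() {
    awk '
    /scan: scrub repaired 0 in .* with 0 errors/ { scrub_ok = 1 }
    /^ *NAME +STATE +READ +WRITE +CKSUM/          { intable = 1; next }
    /^ *(spares|errors:)/                         { intable = 0 }
    intable && NF >= 5 && ($3 + $4 + $5) > 0     { bad = 1 }
    END { exit ((scrub_ok && !bad) ? 0 : 1) }
    '
}

# Usage on the NAS (not run here):
#   zpool clear TRINITY_RAID-02 && zpool scrub TRINITY_RAID-02
#   # ...wait for the scrub to finish, then:
#   zpool status TRINITY_RAID-02 | pool_clean_after_scrub && echo "pool clean"
```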
 

tstorzuk

Explorer
Joined
Jun 13, 2011
Messages
92
Cyberjock,

Wow, it's almost been a month already.....yikes!!!

Since running zpool clear I haven't had any more errors with the pool or the drive. It's all good now.

Long story short, the backplane was faulty. I got a replacement from Norco pretty quickly, plus a spare for the future, just in case. After swapping it in I was still getting errors, so I ran zpool clear, which cleared everything from the pool. Now all reports are good.

Code:
Checking status of zfs pools:
NAME              SIZE  ALLOC  FREE    CAP  DEDUP  HEALTH  ALTROOT
TRINITY_RAID-01    19T  17.6T  1.41T    92%  1.00x  ONLINE  /mnt
TRINITY_RAID-02    19T  4.31T  14.7T    22%  1.00x  ONLINE  /mnt
TRINITY_RAID-03    19T  3.35T  15.6T    17%  1.00x  ONLINE  /mnt
 
all pools are healthy


Now I have to figure out how to add drive temperatures to my daily report... but that's for another post.
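As a possible starting point for the temperature report, assuming the drives expose SMART attribute 194 (Temperature_Celsius) in smartctl -A output with the raw value in the 10th column. Some drives report attribute 190 (Airflow_Temperature_Cel) instead, and drive_temp is a hypothetical helper.

```shell
#!/bin/sh
# drive_temp (hypothetical helper): print the raw value of SMART attribute
# 194 (Temperature_Celsius) from `smartctl -A` output on stdin.
drive_temp() {
    awk '$1 == 194 && $2 == "Temperature_Celsius" { print $10; exit }'
}

# Usage on the NAS (not run here), one line per disk:
#   for d in $(sysctl -n kern.disks); do
#       echo "$d $(smartctl -A /dev/$d | drive_temp)"
#   done
```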

Thanks for your help diagnosing this problem. I'll change the post to [SOLVED].
 

tstorzuk

Explorer
Joined
Jun 13, 2011
Messages
92
Hmm, I can't find a way to change the original thread title, so I can't mark it [SOLVED].
 