HELP!: "swap_pager: I/O error - pagein failed"

Status
Not open for further replies.
Joined
Jun 26, 2012
Messages
260
This error has started showing up repeatedly:
swap_pager: I/O error - pagein failed; blkno 966,size 8192, error 6
vm_fault: pager read error, pid 2361
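That kernel message packs the swap block number, the I/O size in bytes and an errno value into one line. On FreeBSD, errno 6 is ENXIO ("Device not configured"), which is consistent with a disk that has gone away rather than one that merely returned bad data. FreeNAS normally places a small swap partition on each data disk, so a disappearing disk can show up as swap errors first. As a minimal sketch (Python; the helper name `parse_swap_pager_error` is hypothetical), the fields can be pulled out like this:

```python
import re
from typing import Optional

# Matches the FreeBSD kernel message quoted above, tolerating "error6" or "error 6".
SWAP_ERR = re.compile(
    r"swap_pager: I/O error - (?P<op>\w+) failed; "
    r"blkno (?P<blkno>\d+),\s*size (?P<size>\d+),\s*error\s*(?P<errno>\d+)"
)

# Only the value seen in this thread; 6 is ENXIO in FreeBSD's errno.h.
FREEBSD_ERRNO = {6: "ENXIO (Device not configured)"}


def parse_swap_pager_error(line: str) -> Optional[dict]:
    """Return the fields of a swap_pager error line, or None if it doesn't match."""
    m = SWAP_ERR.search(line)
    if m is None:
        return None
    errno_val = int(m.group("errno"))
    return {
        "operation": m.group("op"),    # e.g. "pagein"
        "blkno": int(m.group("blkno")),  # swap block that failed
        "size": int(m.group("size")),    # I/O size in bytes
        "errno": errno_val,
        "meaning": FREEBSD_ERRNO.get(errno_val, "unknown"),
    }
```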

If I leave the box alone for a day or two, the console shows this error repeated several times. Pressing Enter on the keyboard takes it back to the standard "ready" screen. For a while after I noticed the issue the GUI still worked, but now I get "The page you are looking for is temporarily unavailable..."

The GUI alert was: "One or more devices has experienced an unrecoverable error...Determine if the device needs to be replaced"

Is this basically telling me that one or both of my drives have bad sectors?
How can I figure out which one is bad?
Should I just replace it, or is there a way to salvage the failing drive and still have reliability?


8 GB RAM, FreeNAS 8.3.0-RELEASE, two 3 TB HDDs in RAID1 (ZFS mirror)
 

dlavigne

Guest
It doesn't sound very good for the drive. The 8.3.0 Guide should have instructions on how to view the failed drive and replace it. You could also paste the output of zpool status here, inside code tags.
 
Joined
Jun 26, 2012
Messages
260
I'm a complete noob. I know enough to put this box together, but actual troubleshooting and commands, not so much. I'm not sure how to get a full error log...

Everything shows up as healthy...

Code:
[root@freenas ~]# zpool status -x
all pools are healthy
[root@freenas ~]# zpool status
  pool: CVZData1
 state: ONLINE
  scan: none requested
config:

        NAME                                            STATE     READ WRITE CKSUM
        CVZData1                                        ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            gptid/25e5cb57-c195-11e1-b9d6-6c626d8c6b58  ONLINE       0     0     0
            gptid/268bcf06-c195-11e1-b9d6-6c626d8c6b58  ONLINE       0     0     0

errors: No known data errors
 

gpsguy

Active Member
Joined
Jan 22, 2012
Messages
4,472
Please run SMART tests on both of your drives and post the output in code tags.

The syntax should be along the lines of: smartctl -a -q noserial /dev/adaX

Substitute your device names for adaX.
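To collect that output for both drives in one pass, here is a minimal Python 3.7+ sketch that wraps the same smartctl command with subprocess. The helper names `smartctl_cmd` and `collect_smart` are hypothetical, and it assumes smartctl is on the PATH (otherwise subprocess raises FileNotFoundError). smartctl's exit status is a bitmask, so a non-zero value is recorded rather than treated as a failure of the script.

```python
import subprocess
from typing import Dict, List, Tuple


def smartctl_cmd(device: str, hide_serial: bool = True) -> List[str]:
    """Build the smartctl command suggested above for one device name such as 'ada0'."""
    cmd = ["smartctl", "-a"]
    if hide_serial:
        # '-q noserial' keeps the serial number out of output you post publicly
        cmd += ["-q", "noserial"]
    cmd.append(f"/dev/{device}")
    return cmd


def collect_smart(devices: List[str], hide_serial: bool = True) -> Dict[str, Tuple[int, str]]:
    """Run smartctl on each device and return {device: (exit_status, combined_output)}."""
    results = {}
    for dev in devices:
        proc = subprocess.run(smartctl_cmd(dev, hide_serial), capture_output=True, text=True)
        # Non-zero is not automatically fatal: smartctl encodes several conditions as bits
        results[dev] = (proc.returncode, proc.stdout + proc.stderr)
    return results
```

For example, `collect_smart(["ada0", "ada1"])` returns one entry per disk; pass `hide_serial=False` when you actually need the serial number to find the physical drive.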


 
Joined
Jun 26, 2012
Messages
260
I'm completely unsure how to retrieve the log/info from the SMART tests; my command-line knowledge is nil. If someone could walk me through it, that would be great. Googling and searching the forums hasn't helped, since every post I've found assumes a fairly well-versed reader.

Anyway, I can see that ada1p2 is not functioning (the GUI is working, but the error is clearly back). I have two SATA drives. Does this mean that the drive connected to SATA port 1 is not functioning and should simply be replaced?

Code:
Data1          ONLINE  0 0 0
  mirror-0     ONLINE  0 0 0
    ada1p2     null    0 0 0
    ada0p2     ONLINE  0 0 0
 

gpsguy

Active Member
Joined
Jan 22, 2012
Messages
4,472
No, the best way to identify it is by serial number. Refer to the section titled "Viewing Disks" in the manual for instructions on how to retrieve the serial number. When you replace the drive, please follow the directions in the manual.

If you run the command smartctl -a /dev/ada0 (or /dev/ada1) from the shell (as you did for zpool status), it will also return the serial number, along with detailed information about your drive and its health. If the output for one disk doesn't scroll off the screen, perhaps you can take pictures of the screen and post them here.

Getting the data via SSH/PuTTY is best, but I assume you don't know how to do it that way yet.
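Since the serial number is what you match against the label on the physical drive, a small Python sketch can pick the identifying fields out of smartctl -a (or smartctl -i) output. The helper name `drive_identity` is hypothetical, and it assumes the "Field: value" layout of smartctl's information section; remember that output produced with -q noserial will not contain a serial number at all.

```python
from typing import Dict

# Fields from smartctl's information section that help match a device to a physical drive
WANTED = ("Device Model", "Serial Number", "User Capacity")


def drive_identity(smartctl_output: str) -> Dict[str, str]:
    """Return {field: value} for the identifying 'Field: value' lines in smartctl output."""
    info = {}
    for line in smartctl_output.splitlines():
        # Split at the first colon only, so values like timestamps stay intact
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key in WANTED:
            info[key] = value.strip()
    return info
```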
 
Joined
Jun 26, 2012
Messages
260
This is from ada0:

Code:
smartctl 5.43 2012-06-30 r3573 [FreeBSD 8.3-RELEASE-p5 amd64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda (SATA 3Gb/s, 4K Sectors)
Device Model:     ST3000DM001-9YN166
Serial Number:    W1F0NAN7
LU WWN Device Id: 5 000c50 0511e979c
Firmware Version: CC4B
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4
Local Time is:    Tue Mar 18 01:00:16 2014 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 121) The previous self-test completed having
                                        the read element of the test failed.
Total time to complete Offline
data collection:                 (  575) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 328) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x3085) SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   111   095   006    Pre-fail  Always       -       79608032
  3 Spin_Up_Time            0x0003   093   092   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       53
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       144
  7 Seek_Error_Rate         0x000f   072   060   030    Pre-fail  Always       -       19186212
  9 Power_On_Hours          0x0032   084   084   000    Old_age   Always       -       14217
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       68
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       353
188 Command_Timeout         0x0032   100   098   000    Old_age   Always       -       17180131332
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   057   048   045    Old_age   Always       -       43 (Min/Max 23/47)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       65
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       131
194 Temperature_Celsius     0x0022   043   052   000    Old_age   Always       -
197 Current_Pending_Sector  0x0012   096   088   000    Old_age   Always       -       704
198 Offline_Uncorrectable   0x0010   096   088   000    Old_age   Offline      -       704
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       127706557593434
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       31361295986790
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       96105284060958

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       90%     14164         149884

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
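Note that the drive still reports PASSED overall, so that verdict alone is not a reliable health check. The attribute rows matter more: non-zero raw values for IDs 5, 187, 197 and 198 are commonly watched as signs of media trouble. For ada0 the posted raw values are 144, 353, 704 and 704, and its short self-test ended with "Completed: read failure". A rough Python sketch (the helper names and the choice of watched IDs are assumptions, not a smartmontools feature) that flags those rows in smartctl -a text:

```python
from typing import Dict

# Attribute IDs commonly watched for media problems (a judgment call, not a standard)
WATCHED = {
    5: "Reallocated_Sector_Ct",
    187: "Reported_Uncorrect",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}


def worrying_attributes(smartctl_output: str) -> Dict[str, int]:
    """Return {attribute_name: raw_value} for watched attributes with a non-zero raw value."""
    flagged = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW...
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id = int(fields[0])
        if attr_id in WATCHED and fields[9].isdigit() and int(fields[9]) > 0:
            flagged[fields[1]] = int(fields[9])
    return flagged


def self_test_read_failure(smartctl_output: str) -> bool:
    """True if the self-test log contains a test that ended with a read failure."""
    return "Completed: read failure" in smartctl_output
```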
 
Joined
Jun 26, 2012
Messages
260
ada1 is not found.

Code:
[root@freenas ~]# smartctl -a /dev/ada1 |more                                 
smartctl 5.43 2012-06-30 r3573 [FreeBSD 8.3-RELEASE-p5 amd64] (local build)   
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net     
                                                                               
/dev/ada1: No such file or directory                                           
Smartctl: please specify device type with the -d option.                       
                                                                               
Use smartctl -h to get a usage summary  
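"No such file or directory" here means there is no /dev/ada1 device node at all, so the OS no longer sees that disk; a dead drive, a loose cable or a bad port are all possible causes. A hypothetical Python sketch that lists which whole ATA disks currently have nodes under /dev, ignoring partitions such as ada0p2 (the helper names are illustrative only):

```python
import os
import re
from typing import Iterable, List

# ada0, ada1, ... but not partitions such as ada0p1 or ada0p2
WHOLE_ADA_DISK = re.compile(r"^ada\d+$")


def whole_disks(dev_names: Iterable[str]) -> List[str]:
    """Keep only whole-disk ATA device names, sorted by unit number."""
    disks = [n for n in dev_names if WHOLE_ADA_DISK.match(n)]
    return sorted(disks, key=lambda n: int(n[3:]))


def missing_disks(expected: Iterable[str], dev_dir: str = "/dev") -> List[str]:
    """Return the expected disk names that currently have no device node in dev_dir."""
    present = set(whole_disks(os.listdir(dev_dir)))
    return [d for d in expected if d not in present]
```

On this box, `missing_disks(["ada0", "ada1"])` would be expected to report `ada1` while the disk is gone.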
 
Joined
Jun 26, 2012
Messages
260
Latest zpool status:

Code:
[root@freenas ~]# zpool status -x
  pool: CVZData1
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
  scan: none requested
config:

        NAME                                            STATE     READ WRITE CKSUM
        CVZData1                                        DEGRADED     0     0     0
          mirror-0                                      DEGRADED     0     0     0
            4209355097016643583                         UNAVAIL      0     0     0  was /dev/gptid/25e5cb57-c195-11e1-b9d6-6c626d8c6b58
            gptid/268bcf06-c195-11e1-b9d6-6c626d8c6b58  ONLINE       0     0     0

errors: No known data errors
[root@freenas ~]#
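In this output the pool and mirror-0 are DEGRADED only because one leaf is UNAVAIL; ZFS lists the missing leaf by its numeric GUID and notes which gptid it used to be. A minimal Python sketch (the helper name `unhealthy_vdevs` is hypothetical) that picks out every config row whose state isn't ONLINE, assuming the usual NAME STATE READ WRITE CKSUM layout with one device per line:

```python
from typing import List, Tuple

HEALTHY = {"ONLINE"}


def unhealthy_vdevs(zpool_status: str) -> List[Tuple[str, str, str]]:
    """Return (name, state, note) for each config row whose state isn't ONLINE."""
    rows = []
    in_config = False
    for line in zpool_status.splitlines():
        stripped = line.strip()
        if stripped.startswith("NAME") and "STATE" in stripped:
            in_config = True  # column header of the config table
            continue
        if in_config and stripped.startswith("errors:"):
            break  # the config table ends before the errors: line
        fields = stripped.split()
        if not in_config or len(fields) < 5:
            continue  # skip header lines and blank lines
        name, state = fields[0], fields[1]
        if state not in HEALTHY:
            # Anything after CKSUM, e.g. "was /dev/gptid/...", is kept as a note
            rows.append((name, state, " ".join(fields[5:])))
    return rows
```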
 
Joined
Jun 26, 2012
Messages
260
Heading out to buy a new 3TB drive. Annoyballs.
 
Joined
Jun 26, 2012
Messages
260
This is the procedure to follow, right?
Code:
6.3.11 Replacing a Failed Drive or ZIL Device
If you are using any form of redundant RAID, you should replace a failed drive as soon as possible to
repair the degraded state of the RAID. Depending upon the capability of your hardware, you may or
may not need to reboot in order to replace the disk. AHCI capable hardware does not require a reboot.
NOTE: a stripe (RAID0) does not provide redundancy. If you lose a disk in a stripe, the data on the
stripe is lost.
Before physically removing the failed drive or ZIL device, go to Storage → Volumes → View Volumes
→ Volume Status and locate the failed device. Once you have located the failed device in the GUI,
perform the following steps:
1. If the disk is formatted with ZFS, click the disk's Offline button in order to change its status to
OFFLINE. This step is needed to properly remove the device from the ZFS pool and to prevent
swap issues. If your hardware supports hot-pluggable disks, click the disk's Offline button, pull
the disk, then skip to step 3.
NOTE: if the process of changing the disk's status to OFFLINE fails with a "disk offline failed - no
valid replicas" message, you will need to scrub the ZFS volume first using its Scrub Volume button in
Storage → Volumes → View Volumes. Once the scrub completes, try to Offline the disk again before
proceeding.
2. If the hardware is not AHCI capable, shutdown the system in order to physically replace the
disk. When finished, return to the GUI and locate the OFFLINE disk.
3. Once the disk is showing as OFFLINE, click the disk's Replace button. Select the replacement
disk from the drop-down menu and click the Replace Disk button. If the disk is being added to a
ZFS pool, it will start to resilver. You can use the zpool status command in Shell to monitor the
status of the resilvering.
4. If the replaced disk continues to be listed after resilvering is complete, use the Detach button to
remove the disk from the list.
In the example shown in Figure 6.3n, failed disk ada0 is being replaced by disk ada3.
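Step 3 suggests watching the resilver with zpool status from the Shell. As a hedged Python 3.7+ sketch (the helper names are hypothetical, and it assumes the scan line reads "resilver in progress ..." while the resilver runs, as ZFS of this era prints it), the polling could look like this. It only watches; the replacement itself still goes through the GUI steps above.

```python
import subprocess
import time


def scan_line(zpool_status: str) -> str:
    """Return the text after 'scan:' in zpool status output, or '' if there is none."""
    for line in zpool_status.splitlines():
        stripped = line.strip()
        if stripped.startswith("scan:"):
            return stripped[len("scan:"):].strip()
    return ""


def resilvering(zpool_status: str) -> bool:
    """True while zpool status reports a resilver in progress."""
    return scan_line(zpool_status).startswith("resilver in progress")


def wait_for_resilver(pool: str, interval_s: int = 300) -> str:
    """Poll 'zpool status <pool>' until no resilver is running; return the final scan line."""
    while True:
        out = subprocess.run(["zpool", "status", pool],
                             capture_output=True, text=True, check=True).stdout
        if not resilvering(out):
            return scan_line(out)
        time.sleep(interval_s)  # resilvering a 3 TB mirror can take hours
```

For example, `wait_for_resilver("CVZData1")` would block until the scan line no longer reports a resilver in progress.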
 
Joined
Jun 26, 2012
Messages
260
Replaced the drive a few days ago.
Thanks for your help!
 