swap_pager: I/O error - pagein failed

Status
Not open for further replies.

mdreed

Dabbler
Joined
Nov 18, 2018
Messages
13
Hey guys,

I've got a several-year-old FreeNAS server (an ASRock C2550D4I with 8 GB of ECC memory, 2x5 TB & 2x3 TB WD Red NAS hard drives) that has been causing me trouble. It'll be fine at boot, but if I leave it alone for a little while (a day, maybe even just a few hours) it tends to freeze up and require a reboot. When it does this, I can't access the UI or even ssh in.

It wasn't a priority until recently, when I had some time (California fires closed my work for a week, so side projects finally moved to the top of the pile). When I connect a monitor to it while it's frozen, I get this. From what I've gleaned online from others with similar errors, it seems like FreeNAS is trying to page in from a drive that is throwing errors?

Here's zpool status (note I just upgraded from FreeNAS 9 to 11 and haven't upgraded my pools yet). Still running a scrub, but haven't seen anything amiss yet.

zpool status
  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:10:43 with 0 errors on Fri Nov 16 03:55:43 2018
config:

        NAME                                          STATE     READ WRITE CKSUM
        freenas-boot                                  ONLINE       0     0     0
          gptid/46a3905f-8166-11e4-b1b4-d050992dfacc  ONLINE       0     0     0

errors: No known data errors

  pool: plex
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: scrub in progress since Fri Nov 16 14:49:42 2018
        2.52T scanned at 309M/s, 2.18T issued at 79.1M/s, 2.91T total
        0 repaired, 75.02% done, 0 days 02:40:35 to go
config:

        NAME                                            STATE     READ WRITE CKSUM
        plex                                            ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            gptid/29ba2f2f-82f2-11e4-8cac-d050992dfacc  ONLINE       0     0     0
            gptid/2a3ba454-82f2-11e4-8cac-d050992dfacc  ONLINE       0     0     0
          mirror-1                                      ONLINE       0     0     0
            gptid/1492339f-ae78-11e4-984d-d050992dfacc  ONLINE       0     0     0
            gptid/154e3a7b-ae78-11e4-984d-d050992dfacc  ONLINE       0     0     0

errors: No known data errors

The output of smartctl -x on the four drives is here. One thing that might point to the problem: when I ran the command on /dev/ada2, it froze the machine for a while before eventually saying:
> Read SCT Status failed: Input/output error
> SCT (Get) Error Recovery Control command failed

While it was hanging, error messages saying "swap_pager: indefinite wait buffer: bufobj: 0, blkno: 5021, size: 20480" came up on the screen. Once it got through the wait period, though, things seemed to recover fine.
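
If these messages end up in a saved console log, a small parser makes it easier to see which block numbers and sizes keep recurring. The following is only a minimal sketch: the function name `parse_swap_pager` is hypothetical, and it assumes the message keeps the "description, then key: number pairs" shape quoted above.

```python
import re

# Assumed message shape: "swap_pager: <description>: key: number, key: number, ..."
PAIR = re.compile(r"(\w+):\s*(\d+)")


def parse_swap_pager(line):
    """Return (description, {field: int}) for a swap_pager line, or None."""
    if "swap_pager:" not in line:
        return None
    rest = line.split("swap_pager:", 1)[1].strip()
    first = PAIR.search(rest)
    if first is None:
        # No "key: number" pairs found, so keep the whole text as the description.
        return rest, {}
    description = rest[: first.start()].rstrip(" :")
    fields = {key: int(value) for key, value in PAIR.findall(rest)}
    return description, fields
```

Feeding it each line of a log and tallying the returned fields would show whether the waits cluster on the same swap blocks.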

So! What should I do? Is /dev/ada2 dead or dying? Is there a way I can further diagnose the problem? Is the 8 gigs of memory I have too little? Should I try using this script to periodically pagein?

The other possibility I was wondering about is whether the USB stick I have FreeNAS installed on could be dying. Is there any way to check that?

Thanks for any advice!
 

mdreed

Yikes. Now using smartctl on either /dev/ada2 or /dev/ada3 frequently causes the hanging problem. ada0 and ada1 never have. Could *two* disks be dying?

Also, zpool status has degraded dramatically:

% zpool status
  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:10:43 with 0 errors on Fri Nov 16 03:55:43 2018
config:

        NAME                                          STATE     READ WRITE CKSUM
        freenas-boot                                  ONLINE       0     0     0
          gptid/46a3905f-8166-11e4-b1b4-d050992dfacc  ONLINE       0     0     0

errors: No known data errors

  pool: plex
 state: UNAVAIL
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://illumos.org/msg/ZFS-8000-JQ
  scan: scrub in progress since Fri Nov 16 14:49:42 2018
        2.82T scanned at 102M/s, 2.54T issued at 63.9M/s, 2.91T total
        0 repaired, 87.15% done, 0 days 01:42:15 to go
config:

        NAME                                            STATE     READ WRITE CKSUM
        plex                                            UNAVAIL      0     0     0
          mirror-0                                      UNAVAIL      0     0     0
            1574802938002244967                         REMOVED      0     0     0  was /dev/gptid/29ba2f2f-82f2-11e4-8cac-d050992dfacc
            13989635869729815871                        REMOVED      0     0     0  was /dev/gptid/2a3ba454-82f2-11e4-8cac-d050992dfacc
          mirror-1                                      DEGRADED     0     0     0
            gptid/1492339f-ae78-11e4-984d-d050992dfacc  ONLINE       0     0     0
            14211114452282420954                        REMOVED      0     0     0  was /dev/gptid/154e3a7b-ae78-11e4-984d-d050992dfacc

errors: 15 data errors, use '-v' for a list
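
To keep track of which devices drop out between checks, the status text can be scanned for any row that is not ONLINE. This is a rough sketch rather than a proper monitoring tool: `unhealthy_vdevs` is a hypothetical helper, and it assumes the NAME / STATE / READ / WRITE / CKSUM column layout shown in the output above.

```python
import re

# A vdev row: name, upper-case state, three counters, then an optional note
# such as "was /dev/gptid/...". Counters may carry a K/M/G/T/P suffix.
COUNT = r"(\d[\d.]*[KMGTP]?)"
ROW = re.compile(
    r"^\s*(\S+)\s+([A-Z]+)\s+" + COUNT + r"\s+" + COUNT + r"\s+" + COUNT
    + r"(?:\s+(.*))?$"
)


def unhealthy_vdevs(status_text):
    """Return (pool, name, state, note) for every row whose state is not ONLINE."""
    pool = None
    found = []
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("pool:"):
            # Remember which pool the following rows belong to.
            pool = stripped.split(":", 1)[1].strip()
            continue
        match = ROW.match(line)
        if match and match.group(2) != "ONLINE":
            note = (match.group(6) or "").strip()
            found.append((pool, match.group(1), match.group(2), note))
    return found
```

Run against the output above, this would list plex, mirror-0, mirror-1 and the three REMOVED members, with the "was /dev/gptid/..." note showing which partition each numeric ID used to be.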
 

mdreed

Now ada1, ada2, and ada3 are no longer visible to smartctl. ada0 remains all OK, as far as I can tell. Oof.

% sudo smartctl -x /dev/ada1
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.1-STABLE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

/dev/ada1: Unable to detect device type
Please specify device type with the -d option.

Use smartctl -h to get a usage summary
 