FreeNAS becomes unresponsive, console keeps printing "swap_pager_getswapspace"?

Status
Not open for further replies.

victorhooi

Contributor
Joined
Mar 16, 2012
Messages
184
I have a FreeNAS system that seems to become unresponsive after some time.

The machine has 64 GB of RAM. I am running a single Bhyve VM with 8 GB of RAM, and inside that VM a Docker instance (running qBittorrent).

The web interface becomes unresponsive - when I try to access it, I see the error message:

> Connecting to NAS... Make sure the NAS system is powered on and connected to the network.

On the console, I see the following error messages continually printed:

Code:
swap_pager_getswapspace(2): failed
swap_pager_getswapspace(3): failed


Any idea on what's going on?
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Please share the hardware and software information about your system per the:

Updated Forum Rules 4/11/17
https://forums.freenas.org/index.php?threads/updated-forum-rules-4-11-17.45124/

toadman

Guru
Joined
Jun 4, 2013
Messages
619
I have a FreeNAS system that seems to become unresponsive after some time.

The machine has 64 GB of RAM. I am running a single Bhyve VM with 8 GB of RAM, and inside that VM a Docker instance (running qBittorrent).

The web interface becomes unresponsive - when I try to access it, I see the error message:

> Connecting to NAS... Make sure the NAS system is powered on and connected to the network.

On the console, I see the following error messages continually printed:

Code:
swap_pager_getswapspace(2): failed
swap_pager_getswapspace(3): failed


Any idea on what's going on?


My guess is you are running a version of FreeNAS with a known memory leak in either SMB or SNMP.
 

victorhooi

Contributor
Joined
Mar 16, 2012
Messages
184
Please share the hardware and software information about your system per the:

Updated Forum Rules 4/11/17
https://forums.freenas.org/index.php?threads/updated-forum-rules-4-11-17.45124/

Sorry, you're right, I should have included that.

Hardware is:
  • SuperMicro A2SDi-8C+-HLN4F
  • 64GB RAM
  • 6 x 8TB WD HDDs, currently in RAID-Z1
Software is FreeNAS-11.2-MASTER-201806210452.

I guess it could be a memory leak.

I was thinking maybe it was qBittorrent - however, this is running inside Docker, which is itself running inside a Bhyve VM instance, which is limited to 8GB of RAM.

Is there any other diagnostic information I can fetch from the machine itself?
 

toadman

Guru
Joined
Jun 4, 2013
Messages
619
I haven't checked the updates for the 11.2 branch. I know the SMB leak was fixed in 11.1-U5, but the SNMP leak fix is targeted for 11.2. No idea whether the version of 11.2 you have includes it.

But something is causing swap usage, and you should investigate by instrumenting the system and looking at which process(es) are using memory.
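For example, as a rough, untested starting point from the FreeNAS shell (nothing fancy, just stock FreeBSD tools):

Code:
swapinfo -h                      # current swap devices and how much is in use
top -b -o res | head -25         # one batch snapshot of processes, sorted by resident memory
ps aux | sort -rnk 6 | head -15  # processes sorted by RSS (column 6, in kilobytes)
vmstat -s | grep -i swap         # cumulative swap pager counters since boot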
 

victorhooi

Contributor
Joined
Mar 16, 2012
Messages
184
I believe the issue is linked to FreeNAS + Bhyve VM + Docker.

That is - if I just boot up FreeNAS and then also boot up the Bhyve VM (running Ubuntu 18.04) it is fine.

However, if I then start the Docker container (https://github.com/wernight/docker-qbittorrent), some time afterwards, the machine becomes unresponsive, and the console output contains that "swap_pager_getswapspace(2): failed" line printed over and over again.

I will verify again tonight - I'll have had the FreeNAS + Bhyve VM running for a day by then.

This doesn't make sense though.

My Bhyve instance is configured for 8GB of RAM, and the machine has 64GB of physical RAM.

How can something running in Docker somehow escape the Bhyve instance and its configured RAM limit, and take down the whole machine?

What sort of instrumentation were you thinking of? Happy to share any diagnostics or metrics you think would help solve this mystery.

I can also try updating to the newest nightly if you think it's been fixed since the 201806210452 build (i.e. the build from 4 days ago).
 

-fun-

Contributor
Joined
Oct 27, 2015
Messages
171
But something is causing swap usage, and you should investigate by instrumenting the system and looking at which process(es) are using memory.

I had the very same problem this morning and had to reboot my system. The system had been up with stable swap usage for a very long time before. Something must have caused the swap to increase "suddenly". I don't have a VM running, but I do have some iocage jails. It may have been one of them that caused the problem today, but that is guesswork. I was running 11.1-RELEASE (the update to 11.1-U5 is running right now).

While I know how to identify the culprit when I'm at the console, I just cannot wait for weeks or even months for the problem to occur.

So how can I instrument the system so that I get an early warning (preferably via email) when swap memory is above a defined threshold?
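Something along these lines is what I have in mind, run from cron every few minutes (an untested sketch; the threshold and the mail recipient are placeholders):

Code:
#!/bin/sh
# Untested sketch: email a warning when swap usage crosses a threshold.
THRESHOLD=20    # percent of total swap (placeholder)
MAILTO=root     # placeholder recipient

# Sum the Used and 1K-blocks columns across all swap devices listed by swapinfo.
usage=$(swapinfo -k | awk 'NR > 1 && $1 != "Total" { used += $3; total += $2 }
    END { if (total > 0) printf "%d", used * 100 / total; else print 0 }')

if [ "$usage" -ge "$THRESHOLD" ]; then
    {
        echo "Swap usage is ${usage}% (threshold ${THRESHOLD}%)."
        echo
        swapinfo -k
        echo
        # Biggest memory consumers by resident set size (ps aux column 6).
        ps aux | sort -rnk 6 | head -15
    } | mail -s "$(hostname): swap usage warning" "$MAILTO"
fi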

Also, I would love to know whether there is any way to safeguard the system so it does not become completely unresponsive when swap is full.

And is there any way to increase swap space?

-fun-
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Software is FreeNAS-11.2-MASTER-201806210452.
I had to get home where I could look that "201806210452" number up. I thought you were talking about 11.1-U2, but I was wrong.
The 11.2 branch is a development branch, not a release branch.
If you are looking to have a stable system, you should be on the "FreeNAS-11-STABLE" train.
The current stable release is 11.1-U5, and the last I read, they are not quite ready to release 11.2 as it is still in development and not ready for production.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
And is there any way to increase swap space?
Unless you want to destroy your pool and create it again, the only way to add more swap would be to add more drives. Each pair of drives gets a swap mirror created during system boot. If you have an odd number of drives, it only uses an even number of them. The default is to create a 2GB swap partition on each drive but those partitions are mirrored, so if you have two drives you only get 2GB of swap. If you have 4 drives, you get 4GB of swap and that continues but only up to a certain number of drives. I have 32 drives in my primary NAS and I only get 5GB of swap space. Under normal conditions, you shouldn't need any.
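If you want to see how your own system laid this out, something like this from a shell should show it (assuming the stock FreeNAS swap-on-mirror setup; the exact device names will vary):

Code:
swapinfo -h      # swap devices and current usage
gmirror status   # the swap mirrors built from pairs of drives
gpart show       # per-drive partitions, including the freebsd-swap slices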
 

victorhooi

Contributor
Joined
Mar 16, 2012
Messages
184
I can confirm that the issue does not occur with just FreeNAS + Bhyve VM (Ubuntu 18.04). I've had it running for around 24 hours to verify.

However, it is consistently reproducible with FreeNAS + Bhyve VM (Ubuntu 18.04) + Docker container (qBittorrent-nox).

This is very odd, as I would have thought the Bhyve VM would limit everything to the configured memory (8GB); the machine has 64 GB of physical RAM.

Could it be possible there is some bug with Bhyve that somehow lets it overcome this memory limit? Or is this perhaps some expected behaviour I'm not aware of?

And yeah, I know, nightly =), but that's why I'm doing this testing now, and I'm hoping I can gather enough info to file a helpful bug report, and get it fixed.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
I guess it could be a memory leak.
You should submit a ticket on this.
On the console, I see the following error messages continually printed:

Code:
swap_pager_getswapspace(2): failed
swap_pager_getswapspace(3): failed


Any idea on what's going on?
Something about the virtualization is causing the system to try to allocate swap space, and it becomes unresponsive when it fails. Your system has 6 drives, so it should have between 5 and 6 GB of swap space available, but it should not normally need to allocate that space. When you create a ticket, it will upload a lot of log files and diagnostic data to the Redmine system that will, hopefully, allow the development team to find the issue.
 

victorhooi

Contributor
Joined
Mar 16, 2012
Messages
184
Do I need to file this immediately after rebooting after I see the swap message?

(I can't do it during the issue as FreeNAS becomes unresponsive).
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Do I need to file this immediately after rebooting after I see the swap message?

(I can't do it during the issue as FreeNAS becomes unresponsive).
You could probably file the report now because some of the diagnostic information they will need should certainly still be available. Also be sure to give a good explanation of the symptoms and circumstances. There have been a lot of problems with FreeNAS crashing because of running out of swap space (and related issues) so it is something they want to find and fix. They may ask you for additional information after they review the post.
 

ctphillips

Dabbler
Joined
Apr 20, 2012
Messages
12
I am seeing something very similar in the latest 11.2 RC2 version. I attempt to copy a large amount of data over AFP (via rsync from an older FreeNAS 9.3 server), and the copy eventually hangs with similar messages appearing in the console. The 11.2 server just disappears off the desktop of the intermediary computer running the rsync. The 11.2 unit becomes unresponsive and requires a forced shutdown.

My server config. is as follows:

Dell R510
48 GB of ECC RAM
24 TB of storage
PERC H310 (flashed to IT mode)

Both the 9.3 server and 11.2 RC2 units are connected to a router along with a Mac laptop acting as an intermediary. The Mac laptop connects to each unit via AFP and runs a command like so:

Code:
rsync -a /Volumes/9-3Server/OldBigFolder/ /Volumes/11-2RC2Server/NewBigFolder

Again, the transfer eventually hangs, the 11.2 unit disappears from the Mac's desktop and the 11.2 unit becomes entirely unresponsive with just repeated messages appearing in the console.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
I would say that you should build your new system with FreeNAS 11.1-U6 instead of using the RC (release candidate) which is still in development and could have some bugs. I use rsync daily on my 11.1-U6 systems and it works fine.
along with a Mac laptop acting as an intermediary.
Why? You can easily do an rsync directly from one FreeNAS to another with no intermediate step. It is even possible that the intermediate step is the cause of the problem.
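For example, something like this, run from a shell on the new FreeNAS box, would pull the data straight across (a sketch only; the hostname and dataset paths are placeholders, and it assumes SSH is enabled on the old server):

Code:
# Sketch only: pull directly from the old server to the new one over SSH.
rsync -avh --progress root@old-nas:/mnt/tank/OldBigFolder/ /mnt/tank/NewBigFolder/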
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
and the 11.2 unit becomes entirely unresponsive
PS. you might want to report this as a bug, so the developers can look into it. Also, remember what it says below, highlighted in yellow:

[screenshot attachment: upload_2018-11-18_1-58-33.png]


I have an 11.2 system running but it is not my primary system, not even a backup.
 

ctphillips

Dabbler
Joined
Apr 20, 2012
Messages
12
I would say that you should build your new system with FreeNAS 11.1-U6 instead of using the RC (release candidate) which is still in development and could have some bugs. I use rsync daily on my 11.1-U6 systems and it works fine.

Why? You can easily do an rsync directly from one FreeNAS to another with no intermediate step. It is even possible that the intermediate step is the cause of the problem.

Thanks for the reply. I have taken your recommendation and stepped back to 11.1-U6. As for *why* I was migrating the data that way, the answer is easy - I'm a relative n00b. I understand rsync in the Mac command line very easily. Despite this being my THIRD FreeNAS server I have never clearly understood snapshotting and ZFS replication. However...I'm learning. I have just set up a ZFS replication task from the old server to the new server using the online documentation as a guide and though it took a couple of tries - it appears to be working! (I had to "enable" the periodic snapshot task as the last step.) The new server is chattering away and I can see that the old server is transmitting data at a decent clip.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
I understand rsync in the Mac command line very easily. Despite this being my THIRD FreeNAS server I have never clearly understood snapshotting and ZFS replication.
The command-line use of rsync on FreeNAS is almost identical to the Mac, although there are some small details that are different because they are two different flavors of Unix.
Mac OS is a certified version of Unix, by the way, whereas FreeNAS, based on FreeBSD, is also a version of Unix, but without a certification, as those things cost big money.

ZFS replication can be a challenge to get working, but once you have it working it can be really nice. Do let us know how it works out.
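If you ever want to see what the GUI replication task boils down to, it is roughly this at the command line (a sketch only; the pool/dataset names and hostname here are placeholders):

Code:
# On the old server: snapshot the dataset, then send it to the new server over SSH.
zfs snapshot tank/data@migrate-1
zfs send tank/data@migrate-1 | ssh root@new-nas zfs receive -F tank/data

# Later, send only the changes since the previous snapshot.
zfs snapshot tank/data@migrate-2
zfs send -i tank/data@migrate-1 tank/data@migrate-2 | ssh root@new-nas zfs receive tank/data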
 