SOLVED nfsd consumes too much CPU

Status
Not open for further replies.

dniq

Explorer
Joined
Aug 29, 2012
Messages
74
Something strange is going on on one of the FreeNAS (9.1.1) servers I have: after a while nfsd starts consuming A LOT of CPU - up to 1200% (I have the "Number of servers" set to 12 on a 16-core server).

The two servers are virtually identical - one in NY and one in CA, serving the same content to the same number of identically configured web servers, running Ubuntu (all servers are virtual machines, and are clones made from the same original VM).

Both servers have 2x 8-core CPUs, 32G of RAM, and a zpool consisting of 4 striped 2-disk mirrors + 2 striped log disks. All sysctls and tunables are exactly the same on both servers. The volume in question is shared read-only to all servers, plus read-write to one server in each location (that server is responsible for keeping the content of the shared volume in sync between the two locations).

The load on the NY server is actually slightly higher overall, as there are several other volumes shared, but nfsd rarely - if ever - uses more than 200% of CPU. The server in CA shows nfsd using around 50-100% of CPU right after a reboot, but after a couple of hours that number grows to 500-800%, and might eventually reach 1200%.

My attempts to use "truss" or "ktrace" on the nfsd process in CA resulted in a server crash and reboot :(

Not sure what's going on :(
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
How big are your log disks?
 

dniq

Explorer
Joined
Aug 29, 2012
Messages
74
2x900G, but it rarely uses more than a few dozen megabytes.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Yeah.. you need to change that. It's impossible for you to use that much space. I'd be shocked if it even used 1GB of disk space. The horror on my face that you bought 900GB SSDs for that :(
 

dniq

Explorer
Joined
Aug 29, 2012
Messages
74
Well, I initially bought them as a backup for the dual-SD storage I boot FreeNAS from, and these were the smallest ones available :)

The SD boot seems to work fine so far, so I thought I'd use them for logs. But I'm ordering SSDs for both logs and l2arc.

Still, it's the same setup on both FreeNAS servers - and yet on one of them the CPU consumed by the nfsd process grows to insane levels, while on the other it's working fine :(
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Wait a minute. Now you're really throwing me for a loop. Do you have both logs and an l2arc? Are your current logs(and/or l2arc) SSD right now? And how were 900GB SSDs the smallest you could find? 900GB SSD disks aren't exactly pocket change.
 

dniq

Explorer
Joined
Aug 29, 2012
Messages
74
No, right now both disks (HDDs) are used for logs, in striped mode (for performance). I have nothing else to do with them anyway - it's a Dell R720xd server with two 2.5" slots in the back, and that's where these two are installed.

If I buy the SSDs, I'll use one for logs and one for L2ARC (thinking about 200G for logs and 400G for L2ARC).
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Ok, you need to stop and go do a lot more reading. Reasons why I say this:

1. A 200GB slog is WAAAAAY too big. You should know that even a 5GB slog is freakin' ginormous for your situation (rough numbers sketched below). I'm betting you were disappointed when you saw only a few dozen MB used. That's exactly what I'd expect.
2. slogs should be SSDs or 15kRPM drives. Anything else is unlikely to help, and can definitely make the pool perform more slowly. I wouldn't be the least bit surprised if they are hurting more than they are helping.
3. A 400GB L2ARC can't possibly be put to full use on your system. In fact, I'd bet it's far more likely that if you put a 400GB L2ARC on your system you'll find that performance gets worse.

So you need to stop and go do a lot more reading. You can't throw a bunch of hardware at it and be guaranteed to make it faster. It has to be right-sized for your server and what you use the server for or you'll be throwing more money at the server when performance gets worse.
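
To put rough numbers on point 1, here's a quick back-of-envelope sketch. The throughput figure is an assumed round number (not anything measured on these servers), and the 5-second txg interval is the FreeBSD 9.x default for vfs.zfs.txg.timeout:

```python
# Back-of-envelope slog sizing. ZFS only has to keep sync writes on the
# slog until the next transaction group commit (vfs.zfs.txg.timeout,
# 5 seconds by default on FreeBSD 9.x), so the slog only ever needs to
# hold a few seconds' worth of incoming sync writes.
txg_timeout_s = 5        # default txg commit interval
sync_write_mb_s = 200    # ASSUMED peak sync-write rate over NFS, not measured
safety_factor = 3        # headroom for back-to-back txgs and bursts

slog_needed_gb = sync_write_mb_s * txg_timeout_s * safety_factor / 1024
print(f"slog space actually needed: ~{slog_needed_gb:.1f} GB")
# ~2.9 GB even at a sustained 200 MB/s of sync writes -- which is why a
# pair of 900 GB log devices never shows more than a few dozen MB in use.
```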
 

dniq

Explorer
Joined
Aug 29, 2012
Messages
74
Well, the SSDs - especially the SAS, SLC type - aren't available in many sizes. I'm not even sure I can get one smaller than 200GB.

I did read about separate disks for L2ARC and don't recall any specific recommendations on the size.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Well, the SSDs - especially the SAS, SLC type - aren't available in many sizes. I'm not even sure I can get one smaller than 200GB.

I did read about separate disks for L2ARC and don't recall any specific recommendations on the size.

That's because we can't just throw out some simple rule of thumb for recommended size. Deciding what size to use is pretty complicated and depends on a whole bunch of factors. The expectation is that you'll do your homework and figure out which factors are important for your situation.

You don't need to use SLC (although those are definitely the best for slogs). There are plenty of MLC SSDs out there that provide high-endurance writes, especially now that SLC is practically non-existent. You'll pay more per GB for those than for your standard consumer drive, but again, you have to figure out what sizes to go with and such.
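
One factor that's easy to put a number on is how much RAM the L2ARC itself costs you: every record cached on the L2ARC needs a header kept in the ARC, in RAM. A rough sketch - the ~180 bytes per header is an assumption (the exact figure varies by ZFS version), and the record sizes are only illustrative:

```python
# Rough estimate of ARC (RAM) consumed just by L2ARC bookkeeping.
# ASSUMPTION: ~180 bytes of in-RAM header per record cached on the L2ARC;
# the exact figure depends on the ZFS version, but it's in this ballpark.
header_bytes = 180
l2arc_gb = 400

def arc_overhead_gb(avg_record_kb):
    records = l2arc_gb * 1024 * 1024 / avg_record_kb
    return records * header_bytes / 1024 ** 3

for rec_kb in (128, 16, 8):   # big sequential files vs. VM-image-sized records
    print(f"{rec_kb:>3} KB records -> ~{arc_overhead_gb(rec_kb):.1f} GB of ARC spent on headers")
# With small records (VM images, millions of little files), a 400 GB L2ARC
# can eat several GB of a 32 GB ARC just for headers, crowding out the much
# faster in-RAM cache -- which is how an oversized L2ARC can hurt performance.
```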
 

dniq

Explorer
Joined
Aug 29, 2012
Messages
74
Well, for me there's very little difference between the cost of MLC and SLC drives, so...

The pool I have is 10T, with a lot of small files (about 20 million of them). VMware images are stored on it as well.

My thinking was that 400GB would accommodate all the content read throughout the day, with little need to read much from the actual HDDs. Especially for VMware - there are about 50 VM images on each server, up to 20G each. The default configuration makes VMware extremely slow with FreeNAS, so I had to disable sync for its dataset. I'm hoping the addition of SSDs will remove the need to keep sync disabled.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
All I can say is you need to do more reading. Your misconceptions are getting worse the more you say (no offense). While you are welcome to buy a 400GB L2ARC and install it, you'll never be able to use all 400GB no matter how hard you try. But anyway...

The fact that you even considered disabling sync just baffles me. If you'd read around here at all about sync, you'd know it's borderline irresponsible to disable it for any reason beyond testing. It's clearly documented in ZFS manuals everywhere that sync shouldn't be disabled for any reason except testing. It's an excellent way to clobber your data really fast and thoroughly.

I truly hope you keep religious backups.

The more you say the more you scare me.
 

dniq

Explorer
Joined
Aug 29, 2012
Messages
74
I know about sync. But with it on, FreeNAS absolutely can't be used to host VMware images - the performance is absolutely horrendous (about 3 megabytes per second). The server's got redundant, battery- and generator-backed power.

But all this is beside the point. I'm curious why nfsd starts using way too much CPU on one of the servers, when both servers are virtually identical.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I know about sync. But with it on, FreeNAS absolutely can't be used to host VMware images - the performance is absolutely horrendous (about 3 megabytes per second). The server's got redundant, battery- and generator-backed power.

Totally normal, totally expected, and totally able to be worked around.
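
To illustrate why it's expected: every sync write has to reach stable storage before the NFS server replies, so with no fast slog the spinning mirrors set the pace. A rough sketch - the write size and commit latency below are assumed round numbers, not measurements from these servers:

```python
# Why synchronous NFS writes crawl without a fast slog: the server can't
# acknowledge a sync write until it's on stable storage, so throughput is
# roughly (bytes per sync write) / (time to commit each one).
write_kb = 32      # ASSUMED size of each synchronous NFS write from ESXi
commit_ms = 10     # ASSUMED time to commit a ZIL block on 7200 RPM mirrors

throughput_mb_s = (write_kb / 1024) / (commit_ms / 1000)
print(f"~{throughput_mb_s:.1f} MB/s")   # ~3 MB/s -- about what was reported above
# A low-latency SSD slog takes that per-write wait off the spinning disks,
# which is why it (and not sync=disabled) is the right fix here.
```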

As for your nfsd thing, I don't have a solid answer for you (notice nobody else has your problem?), but I'd consider stopping and redoing the administration of your server after doing a lot of reading. My guess is you're making a whole laundry list of mistakes (common for those that don't want to read and/or do things with a limited budget but expect the world) and you'll find things improve when you reconfigure your server properly.

It's well documented that ZFS + ESXi VMs = suck for performance. But there are plenty of solutions:

1. More RAM (64GB+ if you have lots of VMs, which you do)
2. Add an L2ARC.
3. Add a ZIL.
4. Tweak ZFS.
5. Combination of the above.
6. Go to UFS

Notice I didn't include options like "disable sync" or "make it work with a limited budget". You either do the proper things or you are asking for trouble. Most of this technology won't forgive you if you mis-administer your server. The expectation that you are informed and know what you are doing is absolutely valid. Just ask the people before you that have lost their entire pools without warning after their server worked great for a while with no problems.

I'm not sure why you chose to go with ZFS and disable sync versus moving over to UFS. I'd have expected more happiness and reliability from UFS than from ZFS with sync=disabled. But to each their own.

By the way... sync=disabled isn't safe just because you have a backup generator and a UPS. But I'll let you read up on why that's the case.

I'll just say what I said before. I truly hope you keep religious backups.
 

dniq

Explorer
Joined
Aug 29, 2012
Messages
74
I know exactly what sync=disabled does. And I've already said that adding SSDs for ZIL and L2ARC is for this very purpose. Besides sync=disabled, the only other option in ZFS is to disable cache flush, which is a global setting. I don't want it to be global; I only want it on the one dataset, and I accept the risk until I get the SSDs installed.

The main reasons I use ZFS are snapshots and replication.

And contrary to what you think, I have done a lot of reading, on top of my own experience dealing with servers for over 20 years, which gives me a good idea of what I might want to tweak - so I have a feeling that I know what I'm doing.

I would have used truss or ktrace to see what keeps nfsd so occupied, but they crash the server. So my only recourse is to ask others and see if anyone has any ideas about what might be causing this behavior - ideally someone who is familiar with FreeBSD innards and might know what nfsd could be doing.
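
For what it's worth, a gentler way to at least watch the symptom than attaching truss/ktrace would be to poll nfsd's CPU usage from userland. This is only a hypothetical sketch - it assumes the psutil Python package is available on the box (it isn't part of stock FreeNAS 9.1), and the process-name match may need adjusting:

```python
# Poll total CPU% used by nfsd every few seconds, so the growth from ~50%
# to 500-800% can be lined up against time of day and client activity,
# without attaching a tracer to the process.
import time
import psutil   # ASSUMED to be installed; not part of stock FreeNAS 9.1

def nfsd_cpu_percent(interval=5.0):
    """Sum CPU% across all processes whose name contains 'nfsd'."""
    procs = [p for p in psutil.process_iter(["name"])
             if "nfsd" in (p.info["name"] or "")]
    for p in procs:
        p.cpu_percent(None)          # prime the per-process counters
    time.sleep(interval)
    total = 0.0
    for p in procs:
        try:
            total += p.cpu_percent(None)
        except psutil.NoSuchProcess:
            pass
    return total

if __name__ == "__main__":
    while True:
        print(time.strftime("%H:%M:%S"), f"nfsd CPU: {nfsd_cpu_percent():.0f}%")
```

Logging that next to the content-sync/replication schedule might at least show whether the climb tracks a particular job.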
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
If you'd really done a lot of reading, you'd also know that with sync=disabled the ZIL is effectively short-circuited. But we won't talk about that, because you've told me that "you've done a lot of reading" and your 20 years in IT must be worth something, right?

And to be bluntly honest, I hear all the time that people have done a lot of reading in the forums. But if you don't do enough of it, and of the right stuff, I don't really care how much you did. Some people consider "a lot" to be an hour, others spend weeks on it. And if you don't read the right stuff you can read the wrong stuff for years and still never grasp the fundamentals. And 20 years of servers does NOT prepare you for ZFS. At all. Do you have any idea how many people have shown up here and been smacked around by ZFS despite being in the IT and server industry longer than I've been standing upright? ZFS is nothing like what you are used to. So unless you're about to tell me you have years of experience with ZFS, you are absolutely a newbie to ZFS. But hey, go ahead and keep that "feeling" that you know what you are doing. You're the one with the broken server you can't explain, and you aren't willing to even entertain the idea that you need to do more reading. And I don't think I can point a finger at a single thing you've done right with your configuration aside from using mirrored vdevs.

Good luck!

PS. Don't expect me to respond to you any further. You clearly aren't willing to consider any mistakes you've made that maybe are responsible for your problem. And not surprisingly nobody else has posted either...for obvious reasons.
 

dniq

Explorer
Joined
Aug 29, 2012
Messages
74
I ask a question and expect someone who knows an answer to chime in. I'm not asking anyone to judge my level of expertise in any topic. What you're doing is trying to start a flame war. I'm merely asking for an answer to a specific question. What nfsd is doing and how it's doing it has nothing to do with ZFS. The issue is clearly related to networking, regardless of what file system is being used. My guess is that for some reason a lot of files are being opened and not closed, leaving nfsd struggling with the table of open file handles. Just one possibility. It could be a bug - this never happened on FreeNAS 8.3.

P.S. Curious that it's not the first time you've gone completely off-topic. Strange thing to do for an administrator.
 

DJ9

Contributor
Joined
Sep 20, 2013
Messages
183
Actually, CJ was just concerned about your setup going kaboom and pointing out some of these issues. I didn't take his response as "trying to start a flame".

Best of luck.
 

dniq

Explorer
Joined
Aug 29, 2012
Messages
74
Like I said - these are acceptable risks. For me, performance is more important than the reliability of a single server - I have plenty of redundancy and fallback mechanisms in place for that. FreeNAS in my production network is merely an experiment - a test of sorts of whether I want to look further into TrueNAS (or, more specifically, into FreeBSD/ZFS-based NAS in general).

So far it has failed in terms of live replication - it's only good for backups. The replica is unusable in live production.

And now there's this odd nfsd problem - which may or may not be related to FreeNAS specifically.

While none of this directly applies to TrueNAS, it provides valuable information about what to look for, should I decide to give it a shot.

As for the equipment I use - I buy things I can reuse if this experiment fails. Hence my choice of servers, CPUs, HDDs and SSDs.
 

dniq

Explorer
Joined
Aug 29, 2012
Messages
74
So, the problem was resolved by re-creating the zpool. After that, nfsd is using about 40-50% of CPU. Weird... it could have been the result of an interrupted replication.
 