Memory utilization and performance problem.

Status
Not open for further replies.

datnus (Contributor)
Hi guys,
I have a FreeNAS 8.0.4 box with:
- 6 x 7,200 rpm HDDs
- ZFS RAIDZ2 (1 spare)
- Quad-core Xeon CPU
- 8 GB RAM
- 2 Gigabit NICs

The box serves iSCSI to VMware ESXi 5.
After a while (2-3 months), the FreeNAS box becomes very slow on I/O and ESXi hangs.

Any advice on where the bottleneck is and what I should upgrade?

top shows:
[root@data1] ~# top
last pid: 5118; load averages: 0.11, 0.06, 0.04 up 0+08:31:06 18:41:51
58 processes: 1 running, 57 sleeping
CPU: 0.2% user, 0.0% nice, 8.1% system, 0.3% interrupt, 91.4% idle
Mem: 7896K Active, 295M Inact, 7337M Wired, 133M Buf, 259M Free
Swap: 12G Total, 12G Free

PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
1951 root 15 44 0 192M 30052K ucond 0 5:32 0.00% istgt
1998 root 7 44 0 66324K 6228K ucond 0 0:09 0.00% collectd
1879 root 6 44 0 154M 36816K uwait 1 0:02 0.00% python
4871 root 1 44 0 9916K 1148K nanslp 3 0:01 0.00% vmstat
2395 www 1 44 0 19328K 2292K kqread 1 0:01 0.00% lighttpd
1632 root 1 44 0 11780K 1876K select 1 0:01 0.00% ntpd
2483 root 1 76 0 64100K 4732K ttyin 0 0:00 0.00% python
2128 root 1 47 0 7832K 1208K nanslp 0 0:00 0.00% cron
4850 root 1 44 0 33304K 4428K select 0 0:00 0.00% sshd
1294 root 4 76 0 5684K 992K rpcsvc 1 0:00 0.00% nfsd
1140 root 1 44 0 6904K 1156K select 0 0:00 0.00% syslogd
5031 root 1 44 0 8936K 1176K nanslp 1 0:00 0.00% iostat
1271 root 1 44 0 7836K 1108K select 1 0:00 0.00% rpcbind
1326 root 1 76 0 7956K 1048K rpcsvc 1 0:00 0.00% rpc.lockd
1293 root 1 76 0 5684K 968K select 1 0:00 0.00% nfsd
4852 root 1 44 0 10172K 1764K pause 2 0:00 0.00% csh
1310 root 1 44 0 263M 992K select 1 0:00 0.00% rpc.statd

=> Low CPU%

---------------------------------------
vmstat shows:
procs memory page disks faults cpu
r b w avm fre flt re pi po fr sr da0 ad0 in sy cs us sy id
0 0 0 832M 259M 0 0 0 0 198 0 0 13 777 7484 12770 0 1 99
0 0 0 834M 258M 617 0 0 0 640 0 0 7 667 2021 4698 0 0 99
0 0 0 838M 256M 1770 0 0 25 1517 0 0 8 1393 3495 8154 0 1 99
0 0 0 838M 255M 64 0 0 0 38 0 0 25 549 2543 5671 0 0 100
0 0 0 830M 434M 187 0 0 0 29653 0 0 274 1733 2974 20264 0 14 85
0 0 0 830M 442M 0 0 0 0 7515 0 0 396 1955 106 23480 0 4 96
0 0 0 830M 442M 43 0 0 0 5560 0 0 421 2057 106 24983 0 5 95
0 0 0 822M 452M 124 0 0 0 3257 0 0 176 1214 787 14059 0 2 98
0 0 0 822M 451M 563 0 0 0 602 0 0 17 783 2850 6560 0 1 99
0 0 0 822M 451M 0 0 0 0 0 0 0 6 99 351 1219 0 0 100
0 0 0 822M 451M 2 0 0 0 33 0 0 7 159 515 1701 0 0 100
0 0 0 822M 449M 2 0 0 0 35 0 0 41 1127 3772 9724 0 1 99
0 0 0 828M 443M 1594 0 0 0 987 0 0 16 860 2076 5399 0 0 99
=> Most of the memory is cached (wired) and free memory is low.

----------------------
iostat shows
tty da0 ada0 ada1 cpu
tin tout KB/t tps MB/s KB/t tps MB/s KB/t tps MB/s us ni sy in id
0 39 0.00 0 0.00 2.00 13 0.03 3.82 17 0.06 3 0 1 0 96
0 117 0.00 0 0.00 2.19 10 0.02 2.24 8 0.02 0 0 0 0 100
0 39 0.00 0 0.00 2.00 6 0.01 2.00 7 0.01 0 0 0 0 100
0 39 0.00 0 0.00 2.00 10 0.02 2.00 7 0.01 0 0 0 0 100
0 39 0.00 0 0.00 7.56 78 0.58 9.30 62 0.57 0 0 1 0 99
0 39 0.00 0 0.00 9.21 21 0.19 7.77 21 0.16 0 0 0 0 100
0 39 0.00 0 0.00 2.00 2 0.00 2.00 3 0.01 0 0 0 0 100
0 39 0.00 0 0.00 2.00 0 0.00 2.00 1 0.00 0 0 0 0 100
0 39 0.00 0 0.00 6.92 43 0.29 7.40 40 0.29 0 0 1 0 99
0 39 0.00 0 0.00 2.00 4 0.01 2.00 4 0.01 1 0 1 0 98
0 39 0.00 0 0.00 2.00 2 0.00 2.00 2 0.00 0 0 0 0 100

=> Not much I/O
---------------------
 

datnus (Contributor)
And the free memory is decreasing...
Any help is appreciated. :D
 

sqwob (Explorer)
If what people generally say about required memory is accurate, you are low on RAM: with 1 TB disks the rule works out to 8 + 6 = 14 GB, so you'd be about 6 GB short; with 2 TB disks you'd be about 12 GB short of the ideal configuration.
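A rough sketch of that rule, for reference (the exact figures people quote vary a bit, so treat it as a guide only):

# commonly quoted FreeNAS sizing rule: 8 GB base + roughly 1 GB per TB of raw disk
raw_tb=6                                   # e.g. 6 x 1 TB disks; use 12 for 6 x 2 TB
echo "suggested RAM: $((8 + raw_tb)) GB"   # 6 x 1 TB -> 14 GB, 6 x 2 TB -> 20 GB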

Of course, load and concurrent users also matter...

Given the hardware you have (Xeon, etc.), definitely upgrade the memory to get the most out of it.
 

datnus (Contributor)
If what people generally say about required memory is accurate, you are low on RAM: with 1 TB disks the rule works out to 8 + 6 = 14 GB, so you'd be about 6 GB short; with 2 TB disks you'd be about 12 GB short of the ideal configuration.

Of course, load and concurrent users also matter...

Given the hardware you have (Xeon, etc.), definitely upgrade the memory to get the most out of it.

I have around 1.78 TB of capacity (5 x 500 GB HDDs in RAIDZ + 1 spare).
So will 16 GB be sufficient?
What is the optimal amount of RAM in my case?

Thanks so much.
 

sqwob (Explorer)
I have around 1.78 TB of capacity (5 x 500 GB HDDs in RAIDZ + 1 spare).
So will 16 GB be sufficient?
What is the optimal amount of RAM in my case?

Thanks so much.

In that case memory shouldn't be the bottleneck; by the magic formula it's 8 GB + 2.5 GB = 10.5 GB of RAM for your configuration.

What percentage of your RAID array is full? I've read that you get performance degradation once the disks are about 80% full.
 

datnus (Contributor)
Used capacity is around 1.2TB/1.7TB.
Currently the RAM is only 8GB.
 

datnus (Contributor)
This morning FreeNAS only delivers up to 16 MB/s and the latency in ESXi goes up to 1000 ms!
Please help :(:(:(

I configured MPIO (multipath I/O) for iSCSI, with 2 NICs on ESXi and 2 NICs on FreeNAS, but it doesn't help much.
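On the ESXi side it is just the standard round-robin path binding, roughly like this (the device ID below is a placeholder for my LUN):

# ESXi 5 shell: put the iSCSI LUN on round-robin so both paths get used
esxcli storage nmp device set --device naa.600000000000000 --psp VMW_PSP_RR
esxcli storage nmp device list --device naa.600000000000000   # confirm the path selection policy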
Where should I upgrade?
Thanks MILLIONS!! :)

top
last pid: 91262; load averages: 0.03, 0.09, 0.08 up 12+00:20:33 10:31:18
57 processes: 1 running, 56 sleeping
CPU: 0.4% user, 0.0% nice, 7.1% system, 1.3% interrupt, 91.2% idle
Mem: 41M Active, 286M Inact, 7095M Wired, 1536K Cache, 206M Buf, 475M Free
Swap: 12G Total, 7220K Used, 12G Free

PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
1998 root 7 44 0 66836K 3596K ucond 1 4:48 0.00% collectd
86744 root 23 44 0 244M 196M ucond 1 4:29 0.00% istgt
1879 root 6 44 0 168M 47056K uwait 2 0:54 0.00% python
1632 root 1 44 0 11780K 916K select 0 0:21 0.00% ntpd
2395 www 1 44 0 19328K 1772K kqread 0 0:16 0.00% lighttpd
2128 root 1 47 0 7832K 644K nanslp 1 0:04 0.00% cron
1140 root 1 44 0 6904K 664K select 1 0:03 0.00% syslogd
1271 root 1 44 0 7836K 516K select 0 0:01 0.00% rpcbind
90411 root 1 44 0 33304K 3840K select 0 0:00 0.00% sshd
91087 root 1 44 0 9224K 1940K CPU2 3 0:00 0.00% top
2483 root 1 76 0 64100K 1892K ttyin 0 0:00 0.00% python
90413 root 1 44 0 10172K 2212K ttyin 0 0:00 0.00% csh
91073 root 1 44 0 10172K 1492K pause 0 0:00 0.00% csh
820 root 1 44 0 3204K 132K select 2 0:00 0.00% devd
2100 root 1 44 0 24976K 1480K select 0 0:00 0.00% sshd
2485 root 1 76 0 6772K 404K ttyin 3 0:00 0.00% getty
2484 root 1 76 0 6772K 404K ttyin 0 0:00 0.00% getty

zpool iostat shows:
[root@data1] ~# zpool iostat data1 2
capacity operations bandwidth
pool used avail read write read write
---------- ----- ----- ----- ----- ----- -----
data1 2.07T 186G 152 267 1.18M 1.87M
data1 2.07T 186G 1.55K 0 12.3M 0
data1 2.07T 186G 1.58K 0 12.6M 0
data1 2.07T 186G 1.74K 0 13.9M 0
data1 2.07T 186G 243 500 1.83M 2.19M
data1 2.07T 186G 274 2.29K 2.13M 17.5M
data1 2.07T 186G 1.32K 0 10.5M 0
data1 2.07T 186G 1.29K 0 10.2M 0
data1 2.07T 186G 1.95K 0 15.5M 0
data1 2.07T 186G 1.45K 0 11.5M 0
data1 2.07T 186G 1.85K 0 14.7M 0
data1 2.07T 186G 1.96K 0 15.6M 0
data1 2.07T 186G 2.24K 0 17.9M 0
 

cyberjock (Inactive Account)
ZFS and iSCSI don't necessarily go well together; the way iSCSI writes data isn't efficient with ZFS. If you search the forums you'll see there are some optimizations you can do that help to some extent, but the best thing you can do is switch to UFS. The person who has written a lot about this in the forums has given up trying to help people; if you do any kind of searching on 'iscsi' and 'performance' you'll find he's said it all 1000 times already.

You could try upgrading the RAM (that's the easiest option versus converting to UFS), but there's no guarantee it will help. You are slightly thin on RAM. Personally, I always shoot for at least 12 GB minimum and go up if the rule of thumb calls for a lot more. But if you really want good performance you'll need to read up on the stuff I mentioned above.
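The tweaks in those threads are mostly sysctl/loader tunables along these lines (a sketch only; the exact names and safe values depend on the FreeBSD release under your FreeNAS build, so verify before applying and change one thing at a time):

# examples of the kind of tunables discussed for iSCSI on ZFS (verify against your version first)
sysctl vfs.zfs.prefetch_disable=1          # prefetch often hurts small random iSCSI I/O
sysctl vfs.zfs.txg.timeout=5               # shorter transaction groups smooth out write bursts
sysctl net.inet.tcp.sendbuf_max=2097152    # bigger TCP buffers for the gigabit iSCSI links
sysctl net.inet.tcp.recvbuf_max=2097152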

When you copy and paste, if you put the output inside code tags it'll be nice and pretty.
 

datnus (Contributor)
1) OK, I will try to put in more RAM; will 16 GB or 32 GB be sufficient?
I need to move the data somewhere before converting to UFS, right?

2) Last time, the latency for NFS on FreeNAS was high; iSCSI on FreeNAS has lower latency.

3) I changed the quoting as requested.
 

cyberjock (Inactive Account)
More RAM never hurts. For you, 16 GB is probably plenty. But if you switch to UFS it's a waste of money.

Go read up on the tweaks discussed in the forum for iSCSI and ZFS. They may help enough to keep you on ZFS.
 

datnus (Contributor)
[root@data1] ~# zpool status
pool: data1
state: ONLINE
scrub: scrub in progress for 10h30m, 35.56% done, 19h1m to go

It seems the scrub has been running for a very long time. A scrub checks disk/data integrity, right?
I have terminated it using zpool scrub -s data1.

The latency went back to 10 ms.
Is there anything else I should do to improve or optimize this?
 

cyberjock (Inactive Account)
Not really. Scrubs go faster with less load on the server, so running them at night is definitely a good plan. If you are using iSCSI on ZFS you can expect scrubs to perform just as poorly as the iSCSI devices. Scrubs really need to be performed in accordance with the manual, which says weekly for consumer-grade drives and monthly for datacenter-grade. I'm sure you don't have datacenter-grade drives (they're like 2-3x the price of standard hard drives, and if you could afford those you wouldn't have only 8GB of RAM), so you should be scrubbing weekly. Of course, looking at that post, it looks like it would take your system about 30 hours just to complete a scrub. So yeah.. sucky :(
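If you want scrubs pinned to quiet hours, a weekly cron entry along these lines does it (FreeNAS also has a scrub scheduler in the GUI; adjust the day and time to suit):

# run a scrub on data1 every Sunday at 02:00
0 2 * * 0 /sbin/zpool scrub data1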
 

joeschmuck (Old Man, Moderator)
Correct me if I'm wrong, but this looks like a clear case of a nearly full pool. Capacity is 1.7 TB and 1.2 TB is currently in use. Spread across 5 drives of 500 GB each, that would leave 100 GB of space on each drive, which is not very much. It isn't quite down to 10% free, but this is a server that was working well and now magically isn't, so RAM shouldn't be the issue here. If nothing else changed, then it's likely the full drives.

I recommend you pull some data off, store it elsewhere, maybe on DVDs, and see if that improves your performance.
 

datnus (Contributor)
Oops, so ZFS performance goes down as the free capacity goes down.
OK, I will move some data elsewhere. Thanks. :)
Now I know where the bottleneck is. :D
 

cyberjock (Inactive Account)
That pool is only 70% full. Not sure I'd blame insufficient free space rather than iSCSI on ZFS = bad.
 

joeschmuck (Old Man, Moderator)
That pool is only 70% full. Not sure I'd blame insufficient free space rather than iSCSI on ZFS = bad.
Looking at the zpool iostat data above it appears there is about 8.3% available space.

I'm gonna be honest here: there is a lot of confusing data in this thread, but after doing some research I understand it a bit better:
- The first post states there is a RAIDZ2 of 5 drives.
- There is a post saying the drives are 5 x 500 GB each, plus one spare drive.
- There is a post stating the capacity is 1.7 TB with 1.2 TB used.
- There is a post with zpool iostat showing 2.07 TB used and 186 GB available.

I'm going to use zpool iostat as my guide for this thread until someone changes my mind, and I'm easy to change when I know I'm wrong.

Let me know what you think.
 

paleoN (Wizard)
Looking at the zpool iostat data above it appears there is about 8.3% available space.
Over 90% full means the pool has switched to space-based optimization, which hurts.

I'm going to use zpool iostat as my guide for this thread until someone changes my mind, and I'm easy to change when I know I'm wrong.
This is correct; the pool-level numbers include parity, so the disk space figures look inflated. A simple zpool list would have been easier to read.
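For anyone following along, comparing the two views side by side makes the difference obvious (zpool-level numbers count raw space including parity; dataset-level numbers are usable space):

# pool-level view: raw space, parity blocks included
zpool list data1
# dataset-level view: usable space after RAIDZ parity
zfs list data1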
 

datnus (Contributor)
Hi,
6 disks = 1 spare + 5 data (RAIDZ1). Each disk is 500 GB.

[root@data1] ~# zpool list
NAME SIZE USED AVAIL CAP HEALTH ALTROOT
data1 2.25T 2.07T 185G 91% ONLINE /mnt

On top of ZFS, I use VMFS (VMware) over iSCSI.
And ESXi reports 1.76 TB capacity and 600 GB free.

On the FreeNAS GUI, the Volumes page shows data1 as 1.76 TB.

I'm confused myself. Which one is correct?
 

joeschmuck (Old Man, Moderator)
Type 'zpool list' in a shell (let us know what it says); it will give you the Size, Allocated (Used), and Free. When you add Used and Free they should equal the Size.
As I now understand it, this is maybe the better way to go.

ZFS is confusing but I'm getting a better grasp on it as time passes. I would not use the ESXi report at all to determine capacity for a ZFS pool.

So I'm still convinced that you are running below 10% free space, which is why your system slowed down. The easiest fix I can see is to reduce the amount of data on the drives, or to replace the 500 GB drives with 2 TB drives (there is a specific process for that). Also, I guess I need to do some reading, but you have a "spare" drive. I have no idea how that spare drive factors in, how you are notified there was a failure and that the spare is now in use, etc. Would it be better to reallocate the spare drive to create a RAIDZ2, or is there a performance hit? I'm now going to read into that and see what I find.
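For reference, the drive-replacement route is done one disk at a time, roughly like this (device names below are placeholders; check the documentation for your FreeNAS version before starting):

# swap each 500 GB disk for a 2 TB disk, one at a time, letting each resilver finish
zpool replace data1 ada1 ada6     # old device, new device (placeholder names)
zpool status data1                # wait for the resilver to complete before the next swap
# once the last disk has been replaced the pool can grow into the new space
# (newer ZFS versions use the autoexpand property; older ones may need an export/import)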
 

datnus (Contributor)
If zpool list is reporting correctly, then I need to upgrade the HDDs urgently.

I'm just curious: if I have VMFS on top of a ZFS iSCSI extent, how can ZFS know how much free space is left, since the file system is controlled by VMFS?
 