zpool poor performance - recovery options


deasmi

Dabbler
Joined
Mar 21, 2013
Messages
14
I have a zpool whose performance has become awful.

It has a number of iSCSI zvols on it which are presented to VMware; sometimes initial accesses can take > 5s, and there are latency spikes of > 7s.

After chasing more complicated explanations, I've noticed it's over 60% full, and that's almost certainly the cause.

If I get it back below 60% or 50%, is it likely to recover, or am I better off destroying it altogether and starting again?

Thanks


nas1# zpool status vol1
  pool: vol1
 state: ONLINE
  scan: scrub repaired 0 in 2h4m with 0 errors on Sun May 10 05:04:06 2015
config:

        NAME                                            STATE     READ WRITE CKSUM
        vol1                                            ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            gptid/b9457b59-4a28-11e3-b141-000c29ec0891  ONLINE       0     0     0
            gptid/b9cc4265-4a28-11e3-b141-000c29ec0891  ONLINE       0     0     0
          mirror-1                                      ONLINE       0     0     0
            gptid/ba530c5d-4a28-11e3-b141-000c29ec0891  ONLINE       0     0     0
            gptid/badc1cdf-4a28-11e3-b141-000c29ec0891  ONLINE       0     0     0

errors: No known data errors

nas1# zpool get all vol1
NAME  PROPERTY                       VALUE                  SOURCE
vol1  size                           920G                   -
vol1  capacity                       43%                    -
vol1  altroot                        /mnt                   local
vol1  health                         ONLINE                 -
vol1  guid                           14124041658513480580   default
vol1  version                        -                      default
vol1  bootfs                         -                      default
vol1  delegation                     on                     default
vol1  autoreplace                    off                    default
vol1  cachefile                      /data/zfs/zpool.cache  local
vol1  failmode                       continue               local
vol1  listsnapshots                  off                    default
vol1  autoexpand                     on                     local
vol1  dedupditto                     0                      default
vol1  dedupratio                     1.00x                  -
vol1  free                           517G                   -
vol1  allocated                      403G                   -
vol1  readonly                       off                    -
vol1  comment                        -                      default
vol1  expandsize                     -                      -
vol1  freeing                        0                      default
vol1  fragmentation                  17%                    -
vol1  leaked                         0                      default
vol1  feature@async_destroy          enabled                local
vol1  feature@empty_bpobj            active                 local
vol1  feature@lz4_compress           active                 local
vol1  feature@multi_vdev_crash_dump  enabled                local
vol1  feature@spacemap_histogram     active                 local
vol1  feature@enabled_txg            active                 local
vol1  feature@hole_birth             active                 local
vol1  feature@extensible_dataset     enabled                local
vol1  feature@embedded_data          active                 local
vol1  feature@bookmarks              enabled                local
vol1  feature@filesystem_limits      enabled                local
vol1  feature@large_blocks           enabled                local
nas1#
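
For anyone who wants to watch the two numbers I'm worried about (capacity and fragmentation) without wading through the full property dump, this should pull just those columns; I'm using the full property names, which I believe this ZFS version accepts:

nas1# zpool list -o name,size,allocated,free,capacity,fragmentation vol1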
 

GeoffK

Dabbler
Joined
Apr 29, 2015
Messages
29
How many VMs have you got? What are they? What are you doing with them?
What CPU are you running?
How much RAM do you have?
What are your ARC stats like?
How much of your ARC are you using?
What type of disks are they? SSDs?
Do you have an L2ARC? What size is it?
Do you have a SLOG? What size is it?
What is your network connectivity? Are you running jumbo frames?
Is VAAI enabled and working with VMware?

Legitimately, there is only so much performance you can squeeze out of two-drive mirrors; with two vdevs, that's all you really have, even though ZFS is smart and uses both drives in each vdev to accelerate reads/writes. Chances are you've hit that.

Unfortunately, BSD doesn't have per-disk latency stats built into zpool iostat <pool> 1 or zpool iostat -v 1 (something illumos does).
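
On the ARC questions, you can pull the basics straight from sysctl on FreeBSD; these are the stock kstat names, and I'm assuming 9.3 exposes the same ones:

# current ARC size and target size, then hit/miss counters
sysctl kstat.zfs.misc.arcstats.size kstat.zfs.misc.arcstats.c
sysctl kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses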
 

deasmi

Dabbler
Joined
Mar 21, 2013
Messages
14
How many VMs have you got? < 10 on this zvol
What are they? Mainly CentOS (6|7), a few Windows
What are you doing with them? Essentially nothing, it's a lab.
What CPU are you running? All-in-one VM (yes, I know): Xeon 1230 v2, LSI passed through with VT-d
How much RAM do you have? 8GB (tested at 6/12/16GB, same issue)
What are your ARC stats like? Healthy
How much of your ARC are you using?
What type of disks are they? SSDs? WD RAID Edition 2, 500GB
Do you have an L2ARC? What size is it? No
Do you have a SLOG? What size is it? No (added an SSD-backed one, no change)
What is your network connectivity? It's all virtual; iperf shows > 5Gbps
Are you running jumbo frames? No
Is VAAI enabled and working with VMware? Yes

Legitimately, there is only so much performance you can squeeze out of two-drive mirrors; with two vdevs, that's all you really have, even though ZFS is smart and uses both drives in each vdev to accelerate reads/writes. Chances are you've hit that.

Unfortunately, BSD doesn't have per-disk latency stats built into zpool iostat <pool> 1 or zpool iostat -v 1 (something illumos does).

I'm not sure a lot of those are relevant to my specific query (I've probably explained my question poorly), but I've answered them above anyway.

I am suffering from a sudden increase in high (> 5s) latency spikes, which I currently suspect is due to being > 60% full and 17% fragmented.
Apart from a slow increase in zvol utilisation, the only real change has been the 9.3 updates.

Average I/O load is zero within the margin of error; I just get crazy latency when first accessing a virtual disk sequentially, or when putting any kind of random read load onto the virtual disk.

For example, with a single VM running on the zvol, logging onto a Linux VM and running something like

dd if=/dev/sda2 | pv > /dev/null

(pv is pipe viewer: http://blog.johngoulah.com/tag/pipe-viewer/)

We will see 0-100KB/s throughput for 5-15s, then a jump to 70-100MB/s which is then sustained.
70-100MB/s is the performance I'm used to seeing and am more than happy with; it's the first 5-15s of nothing that is very annoying.
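
To put numbers on the two phases separately, something like the following should time the initial stall and the sustained read independently (the 64MB/2GB sizes are arbitrary picks of mine, nothing tuned):

# the first 64MB catches the cold start, the second read measures the sustained rate
time dd if=/dev/sda2 of=/dev/null bs=1M count=64
time dd if=/dev/sda2 of=/dev/null bs=1M count=2048 skip=64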

All I really wanted to know here is: if I drop the usage below, say, 50%, will the fragmentation improve over time, or should I just blow the zvol away and start again?
I may already know the answer, as the pool is now down to 43% and the fragmentation hasn't moved.

This may or may not be the solution; it was just that nixing the fragmentation was the next step in my troubleshooting. It's just a lab, but when it takes 10s to start emacs it's annoying.
 

deasmi

Dabbler
Joined
Mar 21, 2013
Messages
14
A bit more digging with gstat shows the drives are 0% busy when this is happening, so I'm really not sure what is going on.
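
For reference, this is roughly the invocation I'm watching; I believe both flags are available on this FreeBSD release (drop -p if not), and the ms/r and ms/w columns are the per-operation latencies alongside %busy:

# physical disks only, refresh every second
gstat -p -I 1s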
 

GeoffK

Dabbler
Joined
Apr 29, 2015
Messages
29
Fragmentation, as I understand it, never improves. Cap your zvols to 60% :)
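
One way to hold yourself to that, and this is just my suggestion rather than anything tested here, is a quota on the pool's root dataset, since space used by the zvols underneath counts against it; 550G is simply ~60% of this 920G pool:

zfs set quota=550G vol1
zfs get quota vol1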
 

deasmi

Dabbler
Joined
Mar 21, 2013
Messages
14
Fragmentation, as I understand it, never improves. Cap your zvols to 60% :)

Thanks, I think a rebuild then.

However, further testing has shown this actually has nothing to do with the zvol.

The same LUN presented directly to the Linux VM over iSCSI doesn't show this behaviour. It's only with a VMFS/VMDK or raw device mapping.

Something between FreeNAS and ESXi... grr. And given it's fine from FreeNAS to Linux, my money is currently on VMware.
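
For anyone wanting to repeat the direct test, it was roughly this from inside the Linux guest using open-iscsi; the portal address and target IQN below are placeholders rather than my actual lab values:

# discover the FreeNAS target and log in (placeholder IP and IQN)
iscsiadm -m discovery -t sendtargets -p 192.168.1.10
iscsiadm -m node -T iqn.2005-10.org.example:target0 -p 192.168.1.10 --login
# then read the newly attached disk the same way as the vmdk test (device name will vary)
dd if=/dev/sdb of=/dev/null bs=1M count=2048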
 