Higher iSCSI latency on TrueNAS vs. Nimble CS300

TxAggieEngineer

TrueNAS configuration
Intel Core i3-3220 3.3GHz
32GB ECC RAM
Intel X520 NIC for 10G connectivity
2x Kingston 240GB A400 SSDs
6x HGST Ultrastar 2TB disks (3x mirrored vdevs)
Samsung 980 Pro 2TB NVMe SSD on PCIe expansion card (not currently used)
Silicon Power 128GB NVMe SSD on PCIe expansion card (as L2ARC)
iSCSI sync disabled on zvol
3x ESXi hosts with 2x dedicated 10G NICs for iSCSI
iSCSI MPIO round-robin with IOPS limit set to 8 (verification commands for these settings are sketched below the list)
2x Cisco 4500X switches for host-to-array connections
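For reference, the sync and MPIO settings above can be double-checked with something like the following; the zvol path and the naa device ID are placeholders for the real ones:

# On TrueNAS: expect sync=disabled on the zvol backing the iSCSI extent (zvol name is a placeholder)
zfs get sync data/esxi-zvol

# On each ESXi host: confirm the round-robin policy and the IOPS=8 limit (naa ID is a placeholder)
esxcli storage nmp device list -d naa.6589cfc0000000000000000000000000
esxcli storage nmp psp roundrobin deviceconfig get -d naa.6589cfc0000000000000000000000000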

I've been doing some testing with two lightly used VMs running on this TrueNAS device. According to the hosts (esxtop and the per-VM disk graphs), read and especially write latencies are consistently higher when these VMs run on TrueNAS than on the Nimble CS300 the same hosts connect to. Where I see latencies of less than 2 ms with the VMs on the Nimble, I see 10-30 ms when they are on the TrueNAS unit. Same VMs, same hosts, same network switches; I just Storage vMotion them between the two arrays.

I've been running "gstat -dp" and "esxtop" in separate windows to see whether there is any correlation between physical disk performance and the LAT/rd and LAT/wr values seen by the host, but there doesn't seem to be one. gstat shows "%busy" of up to about 10-15%, with "ms/w" at <1 and "ms/r" in the 8-14 range. What's strange is that most of the time read activity shows up on only a single disk, while all six disks show activity during writes. I've read various things about L2ARC hindering read performance with certain devices and/or RAM configurations, but that wouldn't explain the higher write latencies.
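For anyone wanting to line the two views up more precisely, both tools have batch modes that can be captured to files and compared afterwards (the intervals, counts and output paths below are arbitrary):

# On TrueNAS: batch-mode gstat (no curses UI), physical disks only, 2-second samples
gstat -b -p -d -I 2s > /tmp/gstat.log

# On the ESXi host: batch-mode esxtop, 2-second samples, 150 iterations (~5 minutes)
esxtop -b -d 2 -n 150 > /tmp/esxtop.csv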

Various outputs are below. The Dashboard currently shows 9.3 GiB for "Services", 16.3 GiB for "ZFS Cache" and 6.2 GiB "Free".
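Those figures can also be cross-checked from the shell; on this FreeBSD 13.1 build the ARC counters are exposed via sysctl (names can vary slightly between OpenZFS versions):

# Current ARC size in bytes (should roughly match the "ZFS Cache" figure)
sysctl -n kstat.zfs.misc.arcstats.size

# Configured ARC ceiling
sysctl -n vfs.zfs.arc_max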

Any thoughts about why the latencies would be so much higher?

root@hq-nas-4[~]# zpool iostat -v
                                                  capacity     operations     bandwidth
pool                                            alloc   free   read  write   read  write
----------------------------------------------  -----  -----  -----  -----  -----  -----
boot-pool                                       1.30G   205G      0      0    501    412
  mirror-0                                      1.30G   205G      0      0    501    412
    ada0p2                                          -      -      0      0    273    206
    ada1p2                                          -      -      0      0    228    206
----------------------------------------------  -----  -----  -----  -----  -----  -----
data                                             219G  5.22T     22     62   313K  3.23M
  mirror-0                                      63.6G  1.75T      6     19  88.1K   988K
    gptid/48e2fc97-e446-11ed-a878-90e2ba8416c8      -      -      3      9  44.3K   494K
    gptid/48db9f4f-e446-11ed-a878-90e2ba8416c8      -      -      3      9  43.8K   494K
  mirror-1                                      63.7G  1.75T      6     19  88.2K  1008K
    gptid/48cd96dc-e446-11ed-a878-90e2ba8416c8      -      -      3      9  44.5K   504K
    gptid/48f34f63-e446-11ed-a878-90e2ba8416c8      -      -      3      9  43.6K   504K
  mirror-2                                      91.4G  1.72T      8     23   136K  1.28M
    gptid/48ebf10a-e446-11ed-a878-90e2ba8416c8      -      -      4     11  65.0K   657K
    gptid/4905ab65-e446-11ed-a878-90e2ba8416c8      -      -      4     11  71.3K   657K
cache                                               -      -      -      -      -      -
  gptid/4194116b-f39f-11ed-a8d2-90e2ba8416c8    1.90G   117G      0      0  2.10K  16.5K
----------------------------------------------  -----  -----  -----  -----  -----  -----
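Note that zpool iostat without an interval reports averages since boot, so a sampled run while the test VMs are busy is probably more representative, e.g.:

# 5-second samples of the data pool, per vdev; -l adds latency columns on OpenZFS 2.x
zpool iostat -vl data 5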

root@hq-nas-4[~]# zpool get capacity,size,health,fragmentation
NAME       PROPERTY       VALUE   SOURCE
boot-pool  capacity       0%      -
boot-pool  size           206G    -
boot-pool  health         ONLINE  -
boot-pool  fragmentation  0%      -
data       capacity       4%      -
data       size           5.44T   -
data       health         ONLINE  -
data       fragmentation  0%      -

------------------------------------------------------------------------
ZFS Subsystem Report                            Wed May 17 11:51:47 2023
FreeBSD 13.1-RELEASE-p7                                    zpl version 5
Machine: hq-nas-4.bell.local (amd64)                    spa version 5000

ARC status: HEALTHY
Memory throttle count: 0

ARC size (current): 53.0 % 16.3 GiB
Target size (adaptive): 53.2 % 16.4 GiB
Min size (hard limit): 3.2 % 1018.5 MiB
Max size (high water): 30:1 30.8 GiB
Most Frequently Used (MFU) cache size: 1.6 % 268.0 MiB
Most Recently Used (MRU) cache size: 98.4 % 15.7 GiB
Metadata cache size (hard limit): 75.0 % 23.1 GiB
Metadata cache size (current): 2.4 % 572.7 MiB
Dnode cache size (hard limit): 10.0 % 2.3 GiB
Dnode cache size (current): 0.2 % 5.4 MiB

ARC hash breakdown:
Elements max: 3.9M
Elements current: 42.5 % 1.7M
Collisions: 16.7M
Chain max: 9
Chains: 252.6k

ARC misc:
Deleted: 29.0M
Mutex misses: 1.8k
Eviction skips: 191.2k
Eviction skips due to L2 writes: 0
L2 cached evictions: 33.1 GiB
L2 eligible evictions: 71.8 GiB
L2 eligible MFU evictions: 11.7 % 8.4 GiB
L2 eligible MRU evictions: 88.3 % 63.4 GiB
L2 ineligible evictions: 347.7 GiB

ARC total accesses (hits + misses): 89.5M
Cache hit ratio: 92.5 % 82.8M
Cache miss ratio: 7.5 % 6.7M
Actual hit ratio (MFU + MRU hits): 92.4 % 82.7M
Data demand efficiency: 57.6 % 12.8M
Data prefetch efficiency: 9.9 % 1.3M

Cache hits by cache type:
Most frequently used (MFU): 90.2 % 74.7M
Most recently used (MRU): 9.6 % 8.0M
Most frequently used (MFU) ghost: 0.2 % 193.8k
Most recently used (MRU) ghost: 0.1 % 86.4k

Cache hits by data type:
Demand data: 8.9 % 7.4M
Prefetch data: 0.2 % 130.3k
Demand metadata: 90.9 % 75.2M
Prefetch metadata: 0.1 % 42.0k

Cache misses by data type:
Demand data: 81.0 % 5.4M
Prefetch data: 17.7 % 1.2M
Demand metadata: 1.0 % 65.8k
Prefetch metadata: 0.3 % 20.3k

DMU prefetch efficiency: 15.9M
Hit ratio: 6.7 % 1.1M
Miss ratio: 93.3 % 14.8M

L2ARC status: HEALTHY
Low memory aborts: 1
Free on write: 31.0k
R/W clashes: 0
Bad checksums: 0
I/O errors: 0

L2ARC size (adaptive): 3.7 GiB
Compressed: 51.9 % 1.9 GiB
Header size: 0.5 % 19.2 MiB
MFU allocated size: 68.3 % 1.3 GiB
MRU allocated size: 30.1 % 589.5 MiB
Prefetch allocated size: 1.7 % 32.4 MiB
Data (buffer content) allocated size: 92.1 % 1.8 GiB
Metadata (buffer content) allocated size: 7.9 % 154.2 MiB

L2ARC breakdown: 6.7M
Hit ratio: 3.1 % 206.0k
Miss ratio: 96.9 % 6.5M
Feeds: 236.3k

L2ARC writes:
Writes sent: 100 % 8.8k

L2ARC evicts:
Lock retries: 0
Upon reading: 0
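For live ARC behaviour while a test VM is busy, the arcstat tool that ships with OpenZFS samples the same counters over time (it may be installed as arcstat.py on some builds):

# 1-second samples of ARC reads, hit/miss rates and current ARC size
arcstat 1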
 