Performance of ZFS used with a File GeoDatabase

Status
Not open for further replies.

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
I have not created many threads here and I don't know if this is the best spot for this one, but I figured I would give it a try. Here is the problem I have.

I have a server that houses several ArcGIS "file geodatabase" containers. This is a special ArcGIS thing that is not really a database in the sense of something like SQL, where all the data is inside a single file. Instead, it is a folder (directory structure) that contains thousands of small files. When the application is accessing these files, it opens many of them and relates the data in the individual files to produce composite imagery in the form of 3D maps that have many layers of data. Some of the files are as small as 1 KB.
Here is a link to ArcGIS info on it: http://desktop.arcgis.com/en/arcmap/10.3/manage-data/geodatabases/what-is-a-geodatabase.htm
When a user makes changes, it may involve updates to many of the files, and this is apparently working a bit too slowly on my server. According to users, "Access to the database on the server 'freezes', where the same database, when moved to a local drive, works fine." It is not convenient to move the database to a local drive because, with so many small files to copy, it can take quite a while, and it would need to be moved back to the server after the work is complete.
I am wondering what I can do with ZFS to accelerate the updates. I am currently using a RAIDZ2 pool with 4 vdevs, and I am willing to make changes; I am just not sure what would give me the best results.

The server in question is running on a Supermicro X10 dual-Xeon board with two 8-core processors at 2.6 GHz and 256 GB of memory. The network connection between the workstation and the server is 1 Gb/s, but it is not fully utilized during the transactions. The slowdown appears to be on the server side. Server processor utilization is usually less than 25%, and memory utilization has stabilized around 75%. The system has been running for about a year, and this is really the only complaint. It is a little slow, and I would like to make it faster.

Any suggestions?
 

Nick2253

Wizard
Joined
Apr 21, 2014
Messages
1,633
How are you accessing these files? SMB?

If so, and if you have an appropriate SSD on hand, I would recommend trying a SLOG. My guess is that, with all the small-file writes, you are IOPS-limited. A SLOG should help overcome some of that.
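For reference, attaching a SLOG is a one-liner once the SSD is partitioned. A sketch, assuming the pool is named tank and the NVMe device shows up as nvd0 (both assumptions; check zpool status and the device list for the real names):

```shell
# Single SLOG device (only a few GB of the SSD are actually needed)
zpool add tank log nvd0p1
# Or mirrored, so a dying SSD can't lose in-flight sync writes
zpool add tank log mirror nvd0p1 nvd1p1
# Confirm the log vdev shows up
zpool status tank
```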
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
How are you accessing these files? SMB?
Yes, SMB share to Windows 10 computers.
I was thinking that a SLOG might be called for but I had seen so many posts on the forum saying it is only for certain workloads, I was not sure if this was one of those situations. I will make arrangements to obtain a suitable SSD.
 

Nick2253

Wizard
Joined
Apr 21, 2014
Messages
1,633
I'm not going to go so far out on a limb as to say that it will definitely help your performance, because I don't have direct experience with your use case, but I will say that it should help your performance.

Writing all those little files really pushes up the I/O requirements on your NAS: each file write requires a data transfer to the NAS, a write to the drive, and a confirmation that the write completed successfully (a sync write). With a SLOG, these writes from the client should complete much faster, because they will be sync-written to the fast SLOG, and from there they can be grouped into more efficient transaction groups, which reduces I/O on the drives.
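A quick way to confirm that sync writes are the bottleneck before buying hardware: temporarily disable sync on the dataset and re-run the slow operation. This is a diagnostic only (sync=disabled risks losing the last few seconds of writes on a crash, so don't leave it set), and the dataset name tank/gis is an assumption:

```shell
zfs get sync tank/gis            # "standard" honors client sync requests
zfs set sync=disabled tank/gis   # TEMPORARY: treat all writes as async
# ... re-run the slow ArcGIS edit or SMB copy and compare ...
zfs set sync=standard tank/gis   # restore the default
```

If the operation gets dramatically faster with sync disabled, a SLOG should capture most of that gain safely.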
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080

Attachments

  • smb4.conf.txt
    2.5 KB

c32767a

Patron
Joined
Dec 13, 2012
Messages
371
Yes, SMB share to Windows 10 computers.
I was thinking that a SLOG might be called for but I had seen so many posts on the forum saying it is only for certain workloads, I was not sure if this was one of those situations. I will make arrangements to obtain a suitable SSD.

I would look at telemetry from the disks in your vdev while doing whatever it is in ArcGIS that triggers the load. In particular, how busy are the individual disks and what does latency look like.

Based on what I know of ArcGIS, geodatabases are more of a metadata and IOPS strain than streaming reads and writes.
Probably anything that increases the IOPS capability of the pool would help. Can you buy 20TB of SSDs?? Might be able to do it all in 2 of those Samsung 30TB units they just announced.. :P

It would also be interesting to know if the smbd I/O is doing sync writes to the pool; I'm not a Windows expert, so I'm not sure how Samba handles that I/O by default.

An NVMe SLOG (Intel P3700, etc.) and L2ARC would be things I would think about for sure. But a SLOG may only make sense if sync writes are involved. (I suppose you could force all writes to be sync as a test.) I'd also look to see if readahead/prefetch is helping or hurting you. If the geodatabase accesses are unpredictable, prefetch may be hurting you by making the disks do more seeks and reads than necessary. If your hit rates are low, you might want to disable it and see how that affects performance.
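Prefetch can be checked and toggled at runtime. A sketch, assuming the FreeBSD/FreeNAS 11 sysctl names (visible in the tunables dump later in this thread):

```shell
# Current file-level prefetch hit/miss counters
sysctl kstat.zfs.misc.zfetchstats.hits kstat.zfs.misc.zfetchstats.misses
# If misses dominate, disable prefetch and re-test the workload
sysctl vfs.zfs.prefetch_disable=1
# Revert if it doesn't help
sysctl vfs.zfs.prefetch_disable=0
```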
 

Nick2253

Wizard
Joined
Apr 21, 2014
Messages
1,633
L2ARC won't really help this problem, because L2ARC will only speed up reads. It sounds like this problem is mostly write issues.
 

c32767a

Patron
Joined
Dec 13, 2012
Messages
371
L2ARC won't really help this problem, because L2ARC will only speed up reads. It sounds like this problem is mostly write issues.

It depends. In the past, ArcGIS would read the old geodatabase file, make changes in memory, then write a new one.
That's why I said L2ARC is one of the things I would be thinking about. It might help, it might not, but it's worth investigating. That needs a lot more detail about how each part of the puzzle is performing before making any decisions.

Frankly, my suspicion is that the real culprit is metadata updates. But again, need more info to know.

BTW, what does "zfs-stats -a" produce?
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Frankly, my suspicion is that the real culprit is metadata updates. But again, need more info to know.
Tell me what I can tell you that would help.
Here is something I have done that may help. I did some tests from Windows using directories of files to try to eliminate any peculiarities of ArcGIS from the picture. When copying a directory of large files to the NAS, I get consistent speeds of 112 MB/s, but when copying a similar quantity of data made up of small files, the speed fluctuates wildly depending on the file size, and sometimes the copy even appears to stop for a few seconds before resuming.
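For anyone who wants to reproduce this without ArcGIS, a scratch directory of small files can be generated on any machine and then copied to the share while watching the transfer rate. The path and file count below are arbitrary:

```shell
# Synthesize a directory of many 1 KB files, similar in size to the
# smallest geodatabase files described above.
mkdir -p /tmp/smallfile-test
i=1
while [ "$i" -le 1000 ]; do
  head -c 1024 /dev/urandom > "/tmp/smallfile-test/file_$i.bin"
  i=$((i + 1))
done
# Sanity check: count the generated files
ls /tmp/smallfile-test | wc -l
```

Copying this directory over SMB should show the same stop-and-go behavior if the small-file pattern is the cause.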
 

c32767a

Patron
Joined
Dec 13, 2012
Messages
371
Tell me what I can tell you that would help.
Here is something I have done that may help. I did some tests from Windows using directories of files to try to eliminate any peculiarities of ArcGIS from the picture. When copying a directory of large files to the NAS, I get consistent speeds of 112 MB/s, but when copying a similar quantity of data made up of small files, the speed fluctuates wildly depending on the file size, and sometimes the copy even appears to stop for a few seconds before resuming.

What does zfs-stats -a produce?

Also, watch gstat while triggering a slow copy and monitor the disk busy% for the disks in the pool.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
What does zfs-stats -a produce?
Code:
:~ # zfs-stats -a
------------------------------------------------------------------------
sysctl: unknown oid 'kstat.zfs.misc.arcstats.l2_writes_hdr_miss'
sysctl: unknown oid 'kstat.zfs.misc.arcstats.recycle_miss'
sysctl: unknown oid 'kstat.zfs.misc.zfetchstats.bogus_streams'
sysctl: unknown oid 'kstat.zfs.misc.zfetchstats.colinear_hits'
sysctl: unknown oid 'kstat.zfs.misc.zfetchstats.colinear_misses'
sysctl: unknown oid 'kstat.zfs.misc.zfetchstats.reclaim_failures'
sysctl: unknown oid 'kstat.zfs.misc.zfetchstats.reclaim_successes'
sysctl: unknown oid 'kstat.zfs.misc.zfetchstats.streams_noresets'
sysctl: unknown oid 'kstat.zfs.misc.zfetchstats.streams_resets'
sysctl: unknown oid 'kstat.zfs.misc.zfetchstats.stride_hits'
sysctl: unknown oid 'kstat.zfs.misc.zfetchstats.stride_misses'
ZFS Subsystem Report							Thu Mar  1 23:28:14 2018
------------------------------------------------------------------------
System Information:

		Kernel Version:						 1101505 (osreldate)
		Hardware Platform:					  amd64
		Processor Architecture:				 amd64

FreeBSD 11.1-STABLE #0 r321665+de6be8c8d30(freenas/11.1-stable): Tue Feb 20 02:38:09 UTC 2018	 root
11:28PM  up 4 days,  4:43, 1 user, load averages: 0.33, 0.39, 0.40
------------------------------------------------------------------------
System Memory Statistics:
		Physical Memory:						32702.67M
		Kernel Memory:						  700.32M
		DATA:						   94.53%  662.06M
		TEXT:						   5.46%   38.25M
------------------------------------------------------------------------
ZFS pool information:
		Storage pool Version (spa):			 5000
		Filesystem Version (zpl):			   5
------------------------------------------------------------------------
ARC Misc:
		Deleted:								11909187
		Recycle Misses:						 0
		Mutex Misses:						   7268
		Evict Skips:							7268

ARC Size:
		Current Size (arcsize):		 83.07%  24400.78M
		Target Size (Adaptive, c):	  83.30%  24467.95M
		Min Size (Hard Limit, c_min):   13.12%  3855.56M
		Max Size (High Water, c_max):   ~7:1	29372.00M

ARC Size Breakdown:
		Recently Used Cache Size (p):   72.13%  17650.45M
		Freq. Used Cache Size (c-p):	27.86%  6817.50M

ARC Hash Breakdown:
		Elements Max:						   1658525
		Elements Current:			   97.77%  1621635
		Collisions:							 7129979
		Chain Max:							  0
		Chains:								 243658

ARC Eviction Statistics:
		Evicts Total:						   1140928179200
		Evicts Eligible for L2:		 99.26%  1132494328320
		Evicts Ineligible for L2:	   0.73%   8433850880
		Evicts Cached to L2:					491679621120

ARC Efficiency
		Cache Access Total:					 39060128
		Cache Hit Ratio:				68.35%  26699403
		Cache Miss Ratio:			   31.64%  12360725
		Actual Hit Ratio:			   64.12%  25048415

		Data Demand Efficiency:		 97.78%
		Data Prefetch Efficiency:	   14.90%

		CACHE HITS BY CACHE LIST:
		  Anonymously Used:			 2.68%   716117
		  Most Recently Used (mru):	 39.26%  10483309
		  Most Frequently Used (mfu):   54.55%  14565106
		  MRU Ghost (mru_ghost):		0.70%   188389
		  MFU Ghost (mfu_ghost):		2.79%   746482

		CACHE HITS BY DATA TYPE:
		  Demand Data:				  51.16%  13660148
		  Prefetch Data:				4.46%   1191702
		  Demand Metadata:			  39.50%  10547567
		  Prefetch Metadata:			4.86%   1299986

		CACHE MISSES BY DATA TYPE:
		  Demand Data:				  2.50%   309525
		  Prefetch Data:				55.04%  6803764
		  Demand Metadata:			  40.68%  5029286
		  Prefetch Metadata:			1.76%   218150
------------------------------------------------------------------------
L2 ARC Summary:
		Low Memory Aborts:					  7
		R/W Clashes:							0
		Free on Write:						  146394

L2 ARC Size:
		Current Size: (Adaptive)				123417.26M
		Header Size:					0.05%   73.14M

L2 ARC Evicts:
		Lock Retries:						   170
		Upon Reading:						   0

L2 ARC Read/Write Activity:
		Bytes Written:						  354080.14M
		Bytes Read:							 94325.26M

L2 ARC Breakdown:
		Access Total:						   12353568
		Hit Ratio:					  8.01%   989629
		Miss Ratio:					 91.98%  11363939
		Feeds:								  384254

		WRITES:
		  Sent Total:				   100.00% 74602
f_float_divide: 0 / 0: division by 0
f_float_divide: 0 / 0: division by 0
f_float_divide: 0 / 0: division by 0
------------------------------------------------------------------------
VDEV Cache Summary:
		Access Total:						   0
		Hits Ratio:					 0.00%   0
		Miss Ratio:					 0.00%   0
		Delegations:							0
f_float_divide: 0 / 0: division by 0
f_float_divide: 0 / 0: division by 0
f_float_divide: 0 / 0: division by 0
f_float_divide: 0 / 0: division by 0
------------------------------------------------------------------------
File-Level Prefetch Stats (DMU):

DMU Efficiency:
		Access Total:						   170373872
		Hit Ratio:					  3.07%   5239965
		Miss Ratio:					 96.92%  165133907

		Colinear Access Total:				  0
		Colinear Hit Ratio:			 0.00%   0
		Colinear Miss Ratio:			0.00%   0

		Stride Access Total:					0
		Stride Hit Ratio:			   0.00%   0
		Stride Miss Ratio:			  0.00%   0

DMU misc:
		Reclaim successes:					  0
		Reclaim failures:					   0
		Stream resets:						  0
		Stream noresets:						0
		Bogus streams:						  0
------------------------------------------------------------------------
ZFS Tunable (sysctl):
		kern.maxusers=2379
		vfs.zfs.vol.immediate_write_sz=32768
		vfs.zfs.vol.unmap_sync_enabled=0
		vfs.zfs.vol.unmap_enabled=1
		vfs.zfs.vol.recursive=0
		vfs.zfs.vol.mode=2
		vfs.zfs.sync_pass_rewrite=2
		vfs.zfs.sync_pass_dont_compress=5
		vfs.zfs.sync_pass_deferred_free=2
		vfs.zfs.zio.dva_throttle_enabled=1
		vfs.zfs.zio.exclude_metadata=0
		vfs.zfs.zio.use_uma=1
		vfs.zfs.zil_slog_bulk=786432
		vfs.zfs.cache_flush_disable=0
		vfs.zfs.zil_replay_disable=0
		vfs.zfs.version.zpl=5
		vfs.zfs.version.spa=5000
		vfs.zfs.version.acl=1
		vfs.zfs.version.ioctl=7
		vfs.zfs.debug=0
		vfs.zfs.super_owner=0
		vfs.zfs.immediate_write_sz=32768
		vfs.zfs.min_auto_ashift=12
		vfs.zfs.max_auto_ashift=13
		vfs.zfs.vdev.queue_depth_pct=1000
		vfs.zfs.vdev.write_gap_limit=4096
		vfs.zfs.vdev.read_gap_limit=32768
		vfs.zfs.vdev.aggregation_limit=1048576
		vfs.zfs.vdev.trim_max_active=64
		vfs.zfs.vdev.trim_min_active=1
		vfs.zfs.vdev.scrub_max_active=2
		vfs.zfs.vdev.scrub_min_active=1
		vfs.zfs.vdev.async_write_max_active=10
		vfs.zfs.vdev.async_write_min_active=1
		vfs.zfs.vdev.async_read_max_active=3
		vfs.zfs.vdev.async_read_min_active=1
		vfs.zfs.vdev.sync_write_max_active=10
		vfs.zfs.vdev.sync_write_min_active=10
		vfs.zfs.vdev.sync_read_max_active=10
		vfs.zfs.vdev.sync_read_min_active=10
		vfs.zfs.vdev.max_active=1000
		vfs.zfs.vdev.async_write_active_max_dirty_percent=60
		vfs.zfs.vdev.async_write_active_min_dirty_percent=30
		vfs.zfs.vdev.mirror.non_rotating_seek_inc=1
		vfs.zfs.vdev.mirror.non_rotating_inc=0
		vfs.zfs.vdev.mirror.rotating_seek_offset=1048576
		vfs.zfs.vdev.mirror.rotating_seek_inc=5
		vfs.zfs.vdev.mirror.rotating_inc=0
		vfs.zfs.vdev.trim_on_init=1
		vfs.zfs.vdev.bio_delete_disable=0
		vfs.zfs.vdev.bio_flush_disable=0
		vfs.zfs.vdev.cache.bshift=16
		vfs.zfs.vdev.cache.size=0
		vfs.zfs.vdev.cache.max=16384
		vfs.zfs.vdev.metaslabs_per_vdev=200
		vfs.zfs.vdev.trim_max_pending=10000
		vfs.zfs.txg.timeout=5
		vfs.zfs.trim.enabled=1
		vfs.zfs.trim.max_interval=1
		vfs.zfs.trim.timeout=30
		vfs.zfs.trim.txg_delay=32
		vfs.zfs.space_map_blksz=4096
		vfs.zfs.spa_min_slop=134217728
		vfs.zfs.spa_slop_shift=5
		vfs.zfs.spa_asize_inflation=24
		vfs.zfs.deadman_enabled=1
		vfs.zfs.deadman_checktime_ms=5000
		vfs.zfs.deadman_synctime_ms=1000000
		vfs.zfs.debug_flags=0
		vfs.zfs.debugflags=0
		vfs.zfs.recover=0
		vfs.zfs.spa_load_verify_data=1
		vfs.zfs.spa_load_verify_metadata=1
		vfs.zfs.spa_load_verify_maxinflight=10000
		vfs.zfs.ccw_retry_interval=300
		vfs.zfs.check_hostid=1
		vfs.zfs.mg_fragmentation_threshold=85
		vfs.zfs.mg_noalloc_threshold=0
		vfs.zfs.condense_pct=200
		vfs.zfs.metaslab.bias_enabled=1
		vfs.zfs.metaslab.lba_weighting_enabled=1
		vfs.zfs.metaslab.fragmentation_factor_enabled=1
		vfs.zfs.metaslab.preload_enabled=1
		vfs.zfs.metaslab.preload_limit=3
		vfs.zfs.metaslab.unload_delay=8
		vfs.zfs.metaslab.load_pct=50
		vfs.zfs.metaslab.min_alloc_size=33554432
		vfs.zfs.metaslab.df_free_pct=4
		vfs.zfs.metaslab.df_alloc_threshold=131072
		vfs.zfs.metaslab.debug_unload=0
		vfs.zfs.metaslab.debug_load=0
		vfs.zfs.metaslab.fragmentation_threshold=70
		vfs.zfs.metaslab.gang_bang=16777217
		vfs.zfs.free_bpobj_enabled=1
		vfs.zfs.free_max_blocks=18446744073709551615
		vfs.zfs.zfs_scan_checkpoint_interval=7200
		vfs.zfs.zfs_scan_legacy=0
		vfs.zfs.no_scrub_prefetch=0
		vfs.zfs.no_scrub_io=0
		vfs.zfs.resilver_min_time_ms=3000
		vfs.zfs.free_min_time_ms=1000
		vfs.zfs.scan_min_time_ms=1000
		vfs.zfs.scan_idle=50
		vfs.zfs.scrub_delay=4
		vfs.zfs.resilver_delay=2
		vfs.zfs.top_maxinflight=32
		vfs.zfs.delay_scale=500000
		vfs.zfs.delay_min_dirty_percent=60
		vfs.zfs.dirty_data_sync=67108864
		vfs.zfs.dirty_data_max_percent=10
		vfs.zfs.dirty_data_max_max=4294967296
		vfs.zfs.dirty_data_max=3429124505
		vfs.zfs.max_recordsize=1048576
		vfs.zfs.default_ibs=17
		vfs.zfs.default_bs=9
		vfs.zfs.zfetch.array_rd_sz=1048576
		vfs.zfs.zfetch.max_idistance=67108864
		vfs.zfs.zfetch.max_distance=33554432
		vfs.zfs.zfetch.min_sec_reap=2
		vfs.zfs.zfetch.max_streams=8
		vfs.zfs.prefetch_disable=0
		vfs.zfs.send_holes_without_birth_time=1
		vfs.zfs.mdcomp_disable=0
		vfs.zfs.per_txg_dirty_frees_percent=30
		vfs.zfs.nopwrite_enabled=1
		vfs.zfs.dedup.prefetch=1
		vfs.zfs.arc_min_prescient_prefetch_ms=6
		vfs.zfs.arc_min_prfetch_ms=1
		vfs.zfs.l2c_only_size=0
		vfs.zfs.mfu_ghost_data_esize=9619243008
		vfs.zfs.mfu_ghost_metadata_esize=6459322880
		vfs.zfs.mfu_ghost_size=16078565888
		vfs.zfs.mfu_data_esize=6630696960
		vfs.zfs.mfu_metadata_esize=404825088
		vfs.zfs.mfu_size=7607708672
		vfs.zfs.mru_ghost_data_esize=7222326784
		vfs.zfs.mru_ghost_metadata_esize=1517044736
		vfs.zfs.mru_ghost_size=8739371520
		vfs.zfs.mru_data_esize=16367195648
		vfs.zfs.mru_metadata_esize=2829824
		vfs.zfs.mru_size=16774895104
		vfs.zfs.anon_data_esize=0
		vfs.zfs.anon_metadata_esize=0
		vfs.zfs.anon_size=1809920
		vfs.zfs.l2arc_norw=0
		vfs.zfs.l2arc_feed_again=1
		vfs.zfs.l2arc_noprefetch=0
		vfs.zfs.l2arc_feed_min_ms=200
		vfs.zfs.l2arc_feed_secs=1
		vfs.zfs.l2arc_headroom=2
		vfs.zfs.l2arc_write_boost=40000000
		vfs.zfs.l2arc_write_max=10000000
		vfs.zfs.arc_meta_limit=7699693632
		vfs.zfs.arc_free_target=56617
		vfs.zfs.compressed_arc_enabled=1
		vfs.zfs.arc_grow_retry=60
		vfs.zfs.arc_shrink_shift=7
		vfs.zfs.arc_average_blocksize=8192
		vfs.zfs.arc_no_grow_shift=5
		vfs.zfs.arc_min=4042855424
		vfs.zfs.arc_max=30798774528
		vm.kmem_size=42862501888
		vm.kmem_size_scale=1
		vm.kmem_size_min=0
		vm.kmem_size_max=1319413950874
------------------------------------------------------------------------

 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Also, watch gstat while triggering a slow copy and monitor the disk busy% for the disks in the pool.
Code:
 L(q)  ops/s	r/s   kBps   ms/r	w/s   kBps   ms/w   %busy Name
	0	175	  0	  0	0.0	171  13762	3.4   29.8| da0
	0	196	  0	  0	0.0	192  19767	2.1   26.6| da1
	0	195	  0	  0	0.0	191  19767	2.6   35.6| da2
	5	176	  0	  0	0.0	172  13139	1.9   26.6| da3
	0	180	  0	  0	0.0	176  13766	3.6   27.1| da4
	0	175	  0	  0	0.0	171  13810	1.6   28.2| da5
	0	  0	  0	  0	0.0	  0	  0	0.0	0.0| da6
	0	  0	  0	  0	0.0	  0	  0	0.0	0.0| da7
	0	201	  0	  0	0.0	197  19767	2.4   31.3| da8
	0	198	  0	  0	0.0	194  19767	2.1   27.2| da9
	0	200	  0	  0	0.0	196  19787	2.1   28.6| da10
	0	195	  0	  0	0.0	191  19767	2.3   28.6| da11
	0	184	  0	  0	0.0	180  13826	2.9   27.8| da12
	0	177	  0	  0	0.0	173  13806	4.5   32.7| da13
	0	  0	  0	  0	0.0	  0	  0	0.0	0.0| da14
	0	  0	  0	  0	0.0	  0	  0	0.0	0.0| da15
	0	390	  0	  0	0.0	390  48430	3.9   34.4| ada0
	0	 49	  0	  0	0.0	 47	284	0.4   13.7| ada1
	0	  0	  0	  0	0.0	  0	  0	0.0	0.0| da0p1
	0	 89	  0	  0	0.0	 85  13762	1.6   29.8| da0p2
	0	 89	  0	  0	0.0	 85  13762	1.6   29.8| gptid/90d74abf-d18c-11e7-bdb7-002590aecc79
	0	  0	  0	  0	0.0	  0	  0	0.0	0.0| da1p1
	0	 65	  0	  0	0.0	 61  19767	1.7   26.6| da1p2
	0	  0	  0	  0	0.0	  0	  0	0.0	0.0| da2p1
	0	 63	  0	  0	0.0	 59  19767	2.5   35.6| da2p2
	0	  0	  0	  0	0.0	  0	  0	0.0	0.0| da3p1
	1	 92	  0	  0	0.0	 88  12756	1.0   26.5| da3p2
	0	 50	  0	  0	0.0	 48	284	0.4   13.1| ada2
	0	  0	  0	  0	0.0	  0	  0	0.0	0.0| da4p1
	0	 93	  0	  0	0.0	 89  13766	1.4   27.1| da4p2
	0	  0	  0	  0	0.0	  0	  0	0.0	0.0| da5p1
	0	 88	  0	  0	0.0	 84  13810	1.0   28.3| da5p2
	0	  0	  0	  0	0.0	  0	  0	0.0	0.0| da6p1
	0	  0	  0	  0	0.0	  0	  0	0.0	0.0| da6p2
	0	  0	  0	  0	0.0	  0	  0	0.0	0.0| da7p1
	0	  0	  0	  0	0.0	  0	  0	0.0	0.0| da7p2
	0	  0	  0	  0	0.0	  0	  0	0.0	0.0| da8p1
	0	 69	  0	  0	0.0	 65  19767	1.8   31.3| da8p2
	0	  0	  0	  0	0.0	  0	  0	0.0	0.0| da9p1
	0	 66	  0	  0	0.0	 62  19767	1.7   27.2| da9p2
	0	  0	  0	  0	0.0	  0	  0	0.0	0.0| da10p1
	0	 68	  0	  0	0.0	 64  19787	1.7   28.6| da10p2
	0	  0	  0	  0	0.0	  0	  0	0.0	0.0| da11p1
	0	 63	  0	  0	0.0	 59  19767	1.8   28.6| da11p2
	0	  0	  0	  0	0.0	  0	  0	0.0	0.0| da12p1
	0	 98	  0	  0	0.0	 94  13826	1.3   27.8| da12p2
	0	  0	  0	  0	0.0	  0	  0	0.0	0.0| da13p1
	0	 91	  0	  0	0.0	 87  13806	1.8   32.8| da13p2
	0	  0	  0	  0	0.0	  0	  0	0.0	0.0| da14p1
	0	  0	  0	  0	0.0	  0	  0	0.0	0.0| da14p2
	0	  0	  0	  0	0.0	  0	  0	0.0	0.0| da15p1
	0	  0	  0	  0	0.0	  0	  0	0.0	0.0| da15p2
	0	 57	  0	  0	0.0	 57  48430	6.0   34.4| ada0p1
	0	  0	  0	  0	0.0	  0	  0	0.0	0.0| ada1p1
	0	 49	  0	  0	0.0	 47	284	0.4   13.7| ada1p2
	0	  0	  0	  0	0.0	  0	  0	0.0	0.0| gptid/90bf3248-d18c-11e7-bdb7-002590aecc79
	0	 65	  0	  0	0.0	 61  19767	1.7   26.7| gptid/87a407df-d20b-11e7-bdb7-002590aecc79
	0	 63	  0	  0	0.0	 59  19767	2.5   35.6| gptid/2698b33e-d23a-11e7-bdb7-002590aecc79
	1	 92	  0	  0	0.0	 88  12756	1.0   26.5| gptid/78e8e147-d97d-11e7-b781-002590aecc79
	0	 93	  0	  0	0.0	 89  13766	1.4   27.1| gptid/78b8bf90-d9b1-11e7-b781-002590aecc79
	0	 88	  0	  0	0.0	 84  13810	1.0   28.3| gptid/1b9f316d-da26-11e7-b781-002590aecc79
	0	  0	  0	  0	0.0	  0	  0	0.0	0.0| gptid/ac988093-1c60-11e7-ae9a-002590a96034
	0	  0	  0	  0	0.0	  0	  0	0.0	0.0| mirror/swap2
	0	  0	  0	  0	0.0	  0	  0	0.0	0.0| gptid/a8c5731f-1c60-11e7-ae9a-002590a96034
	0	  0	  0	  0	0.0	  0	  0	0.0	0.0| mirror/swap0.eli
	0	 69	  0	  0	0.0	 65  19767	1.8   31.3| gptid/5c9ac389-d262-11e7-bdb7-002590aecc79
	0	 66	  0	  0	0.0	 62  19767	1.7   27.2| gptid/9615c0df-d2e3-11e7-bdb7-002590aecc79
	0	 68	  0	  0	0.0	 64  19787	1.7   28.6| gptid/ee8fb387-d2ba-11e7-bdb7-002590aecc79
	0	  0	  0	  0	0.0	  0	  0	0.0	0.0| mirror/swap4
	0	 63	  0	  0	0.0	 59  19767	1.8   28.7| gptid/feeeb5ef-d307-11e7-bdb7-002590aecc79
	0	 98	  0	  0	0.0	 94  13826	1.3   27.8| gptid/55f074a3-cdb5-11e7-bdb7-002590aecc79
	0	  0	  0	  0	0.0	  0	  0	0.0	0.0| mirror/swap3
	0	 91	  0	  0	0.0	 87  13806	1.8   32.8| gptid/bbf7a1c8-73ee-11e7-81aa-002590aecc79
	0	  0	  0	  0	0.0	  0	  0	0.0	0.0| gptid/aa5c9c82-1c60-11e7-ae9a-002590a96034
	0	  0	  0	  0	0.0	  0	  0	0.0	0.0| mirror/swap1

 

c32767a

Patron
Joined
Dec 13, 2012
Messages
371
So it looks like your disks aren't excessively busy. I can't speak to the SMB configuration, but I might look through there and see if anything can be tuned.

If you search on "zfs large directories small files performance", there are a few illumos and Solaris threads that talk about tuning the ARC to hold metadata only, as opposed to metadata + data.

Based on your ZFS stats, a couple of things stand out. You're seeing a 90+% miss rate on file-level prefetch, which means your workload is highly random. So I might experiment with forcing most of the ARC to be metadata rather than data. If you have an SSD, you could configure ZFS to use almost all RAM for metadata and force most of the data caching to the SSD.
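A sketch of that experiment, assuming a dataset named tank/gis (a placeholder). Note that primarycache=metadata stops the ARC from caching file data at all for that dataset, so test carefully and revert if reads get worse:

```shell
# Keep only metadata in RAM for this dataset; data blocks then rely on L2ARC
zfs set primarycache=metadata tank/gis
zfs set secondarycache=all tank/gis
# Optionally raise the ARC metadata ceiling (value is bytes; the current
# limit is visible in the zfs-stats tunables -- the figure below is only
# an example, not a recommendation)
sysctl vfs.zfs.arc_meta_limit=15399387264
```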

It would be interesting to do some testing on the SMB side, in particular to check and see if it's doing sync writes. If it is, that may be another thing to look at.

Oddly enough, this is in @anodos ' message footers. :) It might be relevant.
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,554
So it looks like your disks aren't excessively busy. I can't speak to the SMB configuration, but I might look through there and see if anything can be tuned.

If you search on "zfs large directories small files performance", there are a few illumos and Solaris threads that talk about tuning the ARC to hold metadata only, as opposed to metadata + data.

Based on your ZFS stats, a couple of things stand out. You're seeing a 90+% miss rate on file-level prefetch, which means your workload is highly random. So I might experiment with forcing most of the ARC to be metadata rather than data. If you have an SSD, you could configure ZFS to use almost all RAM for metadata and force most of the data caching to the SSD.

It would be interesting to do some testing on the SMB side, in particular to check and see if it's doing sync writes. If it is, that may be another thing to look at.

Oddly enough, this is in @anodos ' message footers. :) It might be relevant.

On the Samba side of things, Samba 4.7 introduced the new parameter mangled names = illegal. This should stop Samba from generating 8.3-compatible hashes for files/dirs unless they contain invalid NTFS characters. If you don't need ADS (alternate data stream) support, you can disable streams_xattr and ea support.
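A sketch of the relevant share settings in smb4.conf; the share name and path are placeholders, and mangled names = illegal requires Samba 4.7 or later:

```ini
[gisdata]
    path = /mnt/tank/gis
    # Skip 8.3 short-name hashing except for names with illegal NTFS characters
    mangled names = illegal
    # If alternate data streams aren't needed, remove streams_xattr from the
    # share's "vfs objects" list and disable extended attribute support
    ea support = no
```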

I don't know if that will help performance in your case because I don't know enough about ArcGIS workloads. You can also use vfs_time_audit to see if any specific VFS operations are hanging.
 

MrToddsFriends

Documentation Browser
Joined
Jan 12, 2015
Messages
1,338
On the Samba side of things, Samba 4.7 introduced the new parameter mangled names = illegal.

Samba name mangling may have a serious performance impact when writing many small files to common directories. This should be observable using top (remember the single-threadedness of Samba). I remember having played with the Samba case sensitive setting.
https://forums.freenas.org/index.ph...ctory-with-multiple-files-30-000-files.46655/

The mangled names setting is new to me.
 

c32767a

Patron
Joined
Dec 13, 2012
Messages
371
On the Samba side of things, Samba 4.7 introduced the new parameter mangled names = illegal. This should stop Samba from generating 8.3-compatible hashes for files/dirs unless they contain invalid NTFS characters. If you don't need ADS (alternate data stream) support, you can disable streams_xattr and ea support.

I don't know if that will help performance in your case because I don't know enough about ArcGIS workloads. You can also use vfs_time_audit to see if any specific VFS operations are hanging.

I'm not the OP, but the particular workload he's talking about is one where ArcGIS uses lots of directories to hold thousands of files with a couple of KB of text each. When the user draws a map, it enumerates all the files and reads data from the ones it needs. If the user updates a feature, it does the same batch of reads, followed by a number of writes in those directories. It's really hard on metadata I/O.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
speed fluctuates wildly depending on the file size and sometimes the copy even appears to stop for a few seconds before resuming.

I think this is a sign that the writes are blocking on a transaction flush or something like that. A SLOG might help.

Also, check the smbd thread usage with top; if it's sitting at 100%, it could also be single-thread CPU bound.

Disable atime?
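Both checks sketched, again assuming a dataset named tank/gis (a placeholder):

```shell
# atime off: stops ZFS from issuing a metadata write on every file access,
# which matters when thousands of small files are opened per operation
zfs set atime=off tank/gis
zfs get atime tank/gis
# Watch per-thread CPU; an smbd entry pinned near 100% suggests the share
# is bound by a single thread rather than by the disks
top -H
```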
 