Help with ZFS performance. Another one of those ;)


SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Yeah, I do. I am running two NAS boxes in parallel, so the data isn't my biggest worry, actually. I was thinking about trashing the pool again and copying the data once more.
Do you know why I get checksum faults on the spare but not on the drives? Could it be due to the replacement of the faulty drive? That it tries to read from a super-slow disk but fails?
Some kind of hardware failure. Check the SMART status of the disk.
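For reference, a minimal sketch of that check (daX is just a placeholder for whichever device is suspect):
Code:
smartctl -a /dev/daX       # full health report: error counters, defect list, self-test log
smartctl -t long /dev/daX  # start a long self-test; re-check with -a once it has finished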

Sent from my Nexus 5X using Tapatalk
 

Sparx

Contributor
Joined
Apr 18, 2017
Messages
107
So that's not looking too bad, I hope. da5 is coming in, da7 is on the way out. da7 shows some uncorrected errors.

Code:
[root@freenas] ~# smartctl -a /dev/da5
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:			   TOSHIBA
Product:			  MG04SCA60EE
Revision:			 0101
Compliance:		   SPC-4
User Capacity:		6,001,175,126,016 bytes [6.00 TB]
Logical block size:   512 bytes
Physical block size:  4096 bytes
Rotation Rate:		7200 rpm
Form Factor:		  3.5 inches
Logical Unit id:	  0x5000039688ca0071
Serial number:		95B0A0MMFWWB
Device type:		  disk
Transport protocol:   SAS (SPL-3)
Local Time is:		Sun Apr 30 15:46:51 2017 CEST
SMART support is:	 Available - device has SMART capability.
SMART support is:	 Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:	 30 C
Drive Trip Temperature:		65 C

Manufactured in week 37 of year 2015
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  97
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  1165
Elements in grown defect list: 0

Error counter log:
		   Errors Corrected by		   Total   Correction	 Gigabytes	Total
			   ECC		  rereads/	errors   algorithm	  processed	uncorrected
		   fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:		  0		0		 0		 0		  0	 242781.223		   0
write:		 0		0		 0		 0		  0	   8982.951		   0

Non-medium error count:	 2205

SMART Self-test log
Num  Test			  Status				 segment  LifeTime  LBA_first_err [SK ASC ASQ]
	 Description							  number   (hours)
# 1  Background long   Completed				   -	3715				 - [-   -	-]
# 2  Background short  Completed				   -	3697				 - [-   -	-]
# 3  Background short  Completed				   -	3691				 - [-   -	-]
# 4  Background short  Completed				   -	3682				 - [-   -	-]
# 5  Background short  Completed				   -	3673				 - [-   -	-]
# 6  Background short  Completed				   -	3667				 - [-   -	-]
# 7  Background short  Completed				   -	3658				 - [-   -	-]
# 8  Background short  Completed				   -	3649				 - [-   -	-]
# 9  Background short  Completed				   -	3644				 - [-   -	-]
#10  Background short  Completed				   -	3641				 - [-   -	-]
#11  Background long   Completed				   -	3190				 - [-   -	-]
#12  Background long   Completed				   -	1261				 - [-   -	-]

Long (extended) Self Test duration: 45470 seconds [757.8 minutes]

[root@freenas] ~# smartctl -a /dev/da7
smartctl 6.5 2016-05-07 r4318 [FreeBSD 10.3-STABLE amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:			   TOSHIBA
Product:			  MG04SCA60EE
Revision:			 0101
Compliance:		   SPC-4
User Capacity:		6,001,175,126,016 bytes [6.00 TB]
Logical block size:   512 bytes
Physical block size:  4096 bytes
Rotation Rate:		7200 rpm
Form Factor:		  3.5 inches
Logical Unit id:	  0x5000039688ca023d
Serial number:		95B0A0QXFWWB
Device type:		  disk
Transport protocol:   SAS (SPL-3)
Local Time is:		Sun Apr 30 15:47:00 2017 CEST
SMART support is:	 Available - device has SMART capability.
SMART support is:	 Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:	 31 C
Drive Trip Temperature:		65 C

Manufactured in week 37 of year 2015
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  84
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  87
Elements in grown defect list: 6

Error counter log:
		   Errors Corrected by		   Total   Correction	 Gigabytes	Total
			   ECC		  rereads/	errors   algorithm	  processed	uncorrected
		   fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:		  0	   40		14		 0		  0	 170729.322		  14
write:		 0		0		 0		 0		  0	   8576.563		   0

Non-medium error count:	 1117

SMART Self-test log
Num  Test			  Status				 segment  LifeTime  LBA_first_err [SK ASC ASQ]
	 Description							  number   (hours)
# 1  Background long   Completed				   -	2512				 - [-   -	-]
# 2  Background short  Completed				   -	2494				 - [-   -	-]
# 3  Background short  Completed				   -	2488				 - [-   -	-]
# 4  Background short  Completed				   -	2479				 - [-   -	-]
# 5  Background short  Completed				   -	2470				 - [-   -	-]
# 6  Background short  Completed				   -	2464				 - [-   -	-]
# 7  Background short  Completed				   -	2455				 - [-   -	-]
# 8  Background short  Completed				   -	2446				 - [-   -	-]
# 9  Background short  Completed				   -	2441				 - [-   -	-]
#10  Background short  Completed				   -	2438				 - [-   -	-]
#11  Background long   Completed				   -	1987				 - [-   -	-]
#12  Background long   Completed				   -	  58				 - [-   -	-]

Long (extended) Self Test duration: 45470 seconds [757.8 minutes]
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Hmmm, I wonder why it doesn't give you the full output? Maybe Toshiba drives are different.

Sent from my Nexus 5X using Tapatalk
 

Sparx

Contributor
Joined
Apr 18, 2017
Messages
107
And elements in the grown defect list as well. That disk really looks like it's broken. Very surprising. It must have been a handling issue or something when I was swapping servers. It didn't show any faults before.
 

Sparx

Contributor
Joined
Apr 18, 2017
Messages
107
Maybe Toshiba drives are different.
Yeah, it's a SAS drive. The output isn't standardized for them, or something. There is probably some tool from Toshiba to read out the vendor-specific stuff.
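For what it's worth, smartctl can sometimes pull a bit more out of SAS drives with its extended options; a sketch, with no guarantee this Toshiba exposes anything extra:
Code:
smartctl -x /dev/da7        # extended output: adds whatever extra SCSI log pages the drive exposes
smartctl -l error /dev/da7  # just the error counter log pages shown above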
 

Sparx

Contributor
Joined
Apr 18, 2017
Messages
107
So the drive was replaced/resilvered during the weekend, but the write performance didn't really go up. Also, the read performance is very shaky (when looking at an SMB share it goes from 20Mb/s to 300Mb/s). I will get exact numbers, but right now I'm doing a scrub.
The SMB write performance was steady at 450MB/s-ish. It should be about double that. (Iperf gives me 9.8Gbit in both directions.)
Code:
 pre scrub write performance
[root@freenas] /mnt/pool/users/asdf# dd if=/dev/zero of=testfile bs=1M count=100k
102400+0 records in
102400+0 records out
107374182400 bytes transferred in 375.448286 secs (285989273 bytes/sec)
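(One caveat on that test, as an aside: if lz4 compression is enabled on the dataset, /dev/zero compresses to almost nothing and dd numbers get skewed. A sketch of a less compressible variant, with illustrative file names:)
Code:
zfs get compression pool/users                                              # check whether compression is on
dd if=/dev/random of=/mnt/pool/users/asdf/rand.src bs=1M count=10k          # stage incompressible data once (this step may itself be slow)
dd if=/mnt/pool/users/asdf/rand.src of=/mnt/pool/users/asdf/rand.dst bs=1M  # time this write; the source should mostly come from ARC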

What is funny right now is that my scrub performance is significantly higher than normal write performance.
Code:
  pool: pool
state: ONLINE
  scan: scrub in progress since Tue May  2 08:10:01 2017
		8.70T scanned out of 12.3T at 700M/s, 1h30m to go
		0 repaired, 70.48% done


Any ideas?

Edit: The read is actually 20MB/s to 300MB/s; I was remembering it wrong. But it's still very much going up and down.
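For the exact numbers, one option (just a sketch) is to watch per-disk throughput while a transfer or the scrub is running:
Code:
zpool iostat -v pool 5    # per-vdev/per-disk read and write bandwidth, refreshed every 5 seconds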
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
What's the ashift and dataset block size? To get the ashift you use zdb; make sure you don't grab your boot device's value. Your perf is about half what it should be.
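A minimal sketch of that, assuming the stock FreeNAS cache file location:
Code:
zdb -U /data/zfs/zpool.cache | grep ashift   # one line per vdev; ignore the boot and cache pools
zfs get recordsize pool/users                # dataset block size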

Sent from my Nexus 5X using Tapatalk
 

Sparx

Contributor
Joined
Apr 18, 2017
Messages
107
I can't use "zdb pool" directly; pool is the name of the pool :)

Is this what you asked for?

Code:
[root@freenas] /mnt/pool/users/asdf# zdb -U /data/zfs/zpool.cache
cache:
	version: 5000
	name: 'cache'
	state: 0
	txg: 92799
	pool_guid: 5217483977355999898
	hostid: 1265787019
	hostname: ''
	vdev_children: 1
	vdev_tree:
		type: 'root'
		id: 0
		guid: 5217483977355999898
		children[0]:
			type: 'disk'
			id: 0
			guid: 14236693763475585468
			path: '/dev/gptid/1ab1b350-2aae-11e7-9374-0cc47ac2d7a2'
			whole_disk: 1
			metaslab_array: 35
			metaslab_shift: 33
			ashift: 12
			asize: 1022057250816
			is_log: 0
			create_txg: 4
	features_for_read:
		com.delphix:hole_birth
		com.delphix:embedded_data
pool:
	version: 5000
	name: 'pool'
	state: 0
	txg: 239408
	pool_guid: 4175284369927752705
	hostid: 1265787019
	hostname: ''
	vdev_children: 1
	vdev_tree:
		type: 'root'
		id: 0
		guid: 4175284369927752705
		children[0]:
			type: 'raidz'
			id: 0
			guid: 18183794312889816212
			nparity: 2
			metaslab_array: 36
			metaslab_shift: 38
			ashift: 12
			asize: 35994137001984
			is_log: 0
			create_txg: 4
			children[0]:
				type: 'disk'
				id: 0
				guid: 9958022679745706725
				path: '/dev/gptid/b9143fdb-2e81-11e7-8435-0cc47ac2d7a2'
				whole_disk: 1
				DTL: 209
				create_txg: 4
			children[1]:
				type: 'disk'
				id: 1
				guid: 11453018795583805680
				path: '/dev/gptid/1cd5d458-29f8-11e7-a752-0cc47ac2d7a2'
				whole_disk: 1
				DTL: 215
				create_txg: 4
			children[2]:
				type: 'disk'
				id: 2
				guid: 8559793710603504932
				path: '/dev/gptid/1e8f9c8b-29f8-11e7-a752-0cc47ac2d7a2'
				whole_disk: 1
				DTL: 214
				create_txg: 4
			children[3]:
				type: 'disk'
				id: 3
				guid: 8681161354936130511
				path: '/dev/gptid/20491adb-29f8-11e7-a752-0cc47ac2d7a2'
				whole_disk: 1
				DTL: 213
				create_txg: 4
			children[4]:
				type: 'disk'
				id: 4
				guid: 10326190067582090984
				path: '/dev/gptid/2208be5c-29f8-11e7-a752-0cc47ac2d7a2'
				whole_disk: 1
				DTL: 212
				create_txg: 4
			children[5]:
				type: 'disk'
				id: 5
				guid: 2961336148613652957
				path: '/dev/gptid/23c4b81c-29f8-11e7-a752-0cc47ac2d7a2'
				whole_disk: 1
				DTL: 211
				create_txg: 4
	features_for_read:
		com.delphix:hole_birth
		com.delphix:embedded_data


Code:
 recordsize
[root@freenas] /mnt/pool/users/asdf# zfs get recordsize pool
NAME  PROPERTY	VALUE	SOURCE
pool  recordsize  128K	 default
[root@freenas] /mnt/pool/users/asdf# zfs get recordsize pool/users
NAME		PROPERTY	VALUE	SOURCE
pool/users  recordsize  128K	 default

 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
How full is the pool? You can also ignore the scrub performance; that number is worthless.
 

Sparx

Contributor
Joined
Apr 18, 2017
Messages
107
How full is the pool?
Not very.
Code:
[root@freenas] /mnt/pool/users/asdf# zpool list
NAME		   SIZE  ALLOC   FREE  EXPANDSZ   FRAG	CAP  DEDUP  HEALTH  ALTROOT
cache		  944G   482M   944G		 -	 0%	 0%  1.00x  ONLINE  /mnt
freenas-boot   444G  1.69G   442G		 -	  -	 0%  1.00x  ONLINE  -
pool		  32.5T  12.6T  19.9T		 -	21%	38%  1.00x  ONLINE  /mnt


Edit:
BTW, why does it say 12.6T allocated when there's just 8.6T of data in it? I didn't understand that part.
 

Sparx

Contributor
Joined
Apr 18, 2017
Messages
107
Or did you mean the volblocksize before?
Code:
[root@freenas] /mnt/pool/users/asdf# zfs get volblocksize pool/users
NAME		PROPERTY	  VALUE	 SOURCE
pool/users  volblocksize  -		 -
[root@freenas] /mnt/pool/users/asdf# zfs get volblocksize pool
NAME  PROPERTY	  VALUE	 SOURCE
pool  volblocksize  -		 -
 

Sparx

Contributor
Joined
Apr 18, 2017
Messages
107
Ah, I get it. It's the pooled size of all disks; the overhead comes from parity.
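Roughly, yes: with 6 disks in RAIDZ2, every 4 data blocks carry 2 parity blocks, so raw allocation runs about 1.5x the usable data (ignoring padding and metadata). A quick sanity check of the numbers above:
Code:
echo "scale=1; 8.6 * 6 / 4" | bc   # ~12.9T raw allocation for 8.6T of data, close to the 12.6T zpool list shows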
 

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
Not quite sure where you are in this problem at present. Here is an effort to clarify matters:
1. Did you manage to establish that all drives are functioning?
2. Have you eliminated the possibility that a cable/hotswap slot is causing issues? (i.e. can you reproduce the slow speed with another drive in that slot?)
3. Is the current main problem related to network speeds?
 

Sparx

Contributor
Joined
Apr 18, 2017
Messages
107
1. Did you manage to establish that all drives are functioning?
2. Have you eliminated the possibility that a cable/hotswap slot is causing issues? (i.e. can you reproduce the slow speed with another drive in that slot?)
3. Is the current main problem related to network speeds?

1) I have replaced/resilvered the faulty drive and reformatted it. That one is not in use, but (after the format) it passes a smartctl long test. All drives used in the pool show normal speeds with the script suggested before (solnet).
2) Not tested. But since the other drives in the pool show good performance, maybe it isn't needed? I will re-run the solnet test over the weekend to confirm the results.
3) Iperf tests in both directions show 9.8-ish Gbit. That shouldn't be the issue.

Edit: The iperf was run with -w 2M to get the speed up. With -w 512k I had to run two instances to get to ~10Gbit. That could be related to processing power, I guess.
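If the single-stream limit really is CPU-bound, parallel streams are an easy way to confirm it from one invocation; a sketch with classic iperf options (the server address is a placeholder):
Code:
iperf -c <server-ip> -w 512k -P 2 -t 30   # two parallel streams, 30 s run
iperf -c <server-ip> -w 2M -t 30          # single stream with the larger window, for comparison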
 

Dice

Wizard
Joined
Dec 11, 2015
Messages
1,410
1. Good.
2. Cool. I concur; I don't think it is needed at this point.
3.
The SMB write performance was steady at 450MB/s-ish. It should be about double that.
Looks like you could look into tweaking SMB. There is a resource on this, and there should also be an old how-to guide or sticky (maybe in the archived section) that provides some options.
4.
That could be related to processing power, I guess.
Referencing the CPU graph and load average may give additional clues (please post your findings).

cheers.
 

Sparx

Contributor
Joined
Apr 18, 2017
Messages
107
Looking at the history of the graphs, the CPU has never been above 20% load. But each individual core still has its limits, especially on the Xeon D with its rather low turbo speeds. That's why I think one iperf process might not be able to fill 10Gbit with smaller packets. (It's the Xeon D-1528, so 6 cores with turbo up to... ehhh... 2.5GHz-ish?) I haven't had any real success seeing the actual current CPU frequency; it always states 1901, meaning anywhere between 1900 and 2500MHz. On my previous OS, which was Debian-based, I could install a tool called i7z to see the actual frequencies.
Hmm, a quick Google showed me that it is actually available for FreeBSD:
https://github.com/freebsd/freebsd-ports/tree/master/sysutils/i7z
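On FreeBSD the stock sysctls can also report clock steps, though turbo is represented as that base+1 MHz state (the 1901 above), so they won't show the real boosted clock either; a sketch:
Code:
sysctl dev.cpu.0.freq          # current P-state of core 0, in MHz (needs cpufreq/powerd attached)
sysctl dev.cpu.0.freq_levels   # available frequency/power steps
sysctl hw.model hw.clockrate   # CPU model string and nominal clock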

SMB tweaking could definitely be the next step. But what is a bit strange to me is that the sequential write performance drops a bit when doing a 100GB test file with dd. That won't be a big issue, since the files I will place are so far mostly 10-15GB. And the read performance is very unstable. I'll show a Windows graph of it a bit later; I can't access the server ATM.
 

Sparx

Contributor
Joined
Apr 18, 2017
Messages
107
I'm coming back again to the ZFS performance. Just look at this dd read test. At least the ARC cache works, but oh my, it's slow when not cached.
Code:
First without cache
root@freenas:~ # dd if=/mnt/pool/downloads/openSUSE-Leap-42.2-DVD-x86_64.iso of=/dev/null bs=1M
4451+1 records in
4451+1 records out
4667525120 bytes transferred in 158.049951 secs (29531962 bytes/sec)
Second cached.
root@freenas:~ # dd if=/mnt/pool/downloads/openSUSE-Leap-42.2-DVD-x86_64.iso of=/dev/null bs=1M
4451+1 records in
4451+1 records out
4667525120 bytes transferred in 1.425335 secs (3274685325 bytes/sec)


30MB/s vs 3GB/s read? Jeeeeez. I was googling how to flush the ARC cache but didn't find anything for FreeBSD (just for test purposes, so I don't have to find a new file every time I want to dd from an actual file instead of cached data). Linux has a command like:
Code:
echo 3 > /proc/sys/vm/drop_caches
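There doesn't seem to be a direct FreeBSD equivalent of that knob. One workaround for testing (disruptive, and only a sketch) is to export and re-import the pool between runs, which discards its ARC contents; otherwise, reading files that haven't been touched recently keeps cache hits rare:
Code:
zpool export pool && zpool import pool   # drops the pool's cached data; fails if datasets are busy, and shares disappear briefly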
 

Sparx

Contributor
Joined
Apr 18, 2017
Messages
107
Oh, I just remembered: I had popped an NVMe drive into the server and not actually used it for anything. I shared a new folder over SMB and got write performance around 450MB/s and a read of around 910MB/s. So that read would be another ARC copy, I guess. I think I will try switching out my RAID/HBA card tomorrow for something else, to see if that is perhaps broken. I think I have an Adaptec 8885 lying around somewhere.
 

Sparx

Contributor
Joined
Apr 18, 2017
Messages
107
This is the write performance of the NVMe drive.
Code:
[martin@freenas /mnt/cache/downloads2]$ dd if=/dev/zero of=testfile bs=1m count=100k
102400+0 records in
102400+0 records out
107374182400 bytes transferred in 79.314044 secs (1353785241 bytes/sec)


So the 450MB/s write over SMB on my other pool might be hitting a bottleneck outside the pool after all. The read, on the other hand, is another story. I'll try another HBA.
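Before swapping the HBA it might also be worth watching per-disk latency during one of the slow reads; if a single member of the RAIDZ2 is dragging, it stands out immediately. A sketch:
Code:
gstat -p    # live per-disk busy %, queue depth and ms/transaction; one slow da# sticks out quickly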
 