Slow spa_sync after dataset deletion

drdreuza

Cadet
Joined
May 4, 2020
Messages
5
Hello,

I have some simple questions:

1) When the system says "slow spa_sync: ...", is the server actually doing something, or is it doing nothing/waiting?

2) If I press Ctrl+T I see a load of 0.00 and the state [sbwait], so it seems the server is doing nothing. Is there a way to see what is going on?

3) I deleted a 3 TB dataset which had deduplication enabled, and I know this could be the cause, but why does a server with 16 cores/32 threads and 128 GB of RAM have to be down for days for work (a scrub or a deletion) that could be done in the background?

FreeNAS: 11.3-U2
Machine: HP DL360e
2x CPU: E5-2450L (8 cores/16 threads each)
RAM: 128 GB (64 GB per CPU), 1333 MHz

Storage:
1 pool, RAIDZ1 + 500 GB NVMe L2ARC
  • Physical Drive in Port 1I Box 1 Bay 1
      Status: OK
      Serial Number: WFG0M610
      Model: ST4000LM024-2AN1
      Media Type: HDD
      Capacity: 4000 GB
      Location: Port 1I Box 1 Bay 1
      Firmware Version: 0001
      Drive Configuration: Unconfigured
      Encryption Status: Not Encrypted
  • Physical Drive in Port 1I Box 1 Bay 2
      Status: OK
      Serial Number: WFG0X4CB
      Model: ST4000LM024-2AN1
      Media Type: HDD
      Capacity: 4000 GB
      Location: Port 1I Box 1 Bay 2
      Firmware Version: 0001
      Drive Configuration: Unconfigured
      Encryption Status: Not Encrypted
  • Physical Drive in Port 1I Box 1 Bay 3
      Status: OK
      Serial Number: WFF0V8VK
      Model: ST4000LM024-2AN1
      Media Type: HDD
      Capacity: 4000 GB
      Location: Port 1I Box 1 Bay 3
      Firmware Version: 0001
      Drive Configuration: Unconfigured
      Encryption Status: Not Encrypted
  • Physical Drive in Port 1I Box 1 Bay 4
      Status: OK
      Serial Number: WFG148M6
      Model: ST4000LM024-2AN1
      Media Type: HDD
      Capacity: 4000 GB
      Location: Port 1I Box 1 Bay 4
      Firmware Version: 0001
      Drive Configuration: Unconfigured
      Encryption Status: Not Encrypted

Boot status (screenshots attached): Clipboard Image (7).jpg, Clipboard Image (8).jpg
 

drdreuza

Cadet
Joined
May 4, 2020
Messages
5
I think slow spa_sync is just waiting forever; after every clean (Ctrl+Alt+Del) reboot, the condensing procedure seems to get a little further.

I downgraded from 11.3-U1 to 11.2-U5, but I cannot import the pool.

zpool import
   pool: spazio1
     id: 2243976908156379567
  state: UNAVAIL
 status: The pool can only be accessed in read-only mode on this system. It
         cannot be accessed in read-write mode because it uses the following
         feature(s) not supported on this system:
             com.delphix:spacemap_v2 (Space maps representing large segments are more efficient.)
 action: The pool cannot be imported in read-write mode. Import the pool with
         "-o readonly=on", access the pool on a system that supports the
         required feature(s), or recreate the pool from backup.
 config:

        spazio1                                         UNAVAIL  unsupported feature(s)
          raidz1-0                                      ONLINE
            gptid/9a714cde-cbfd-11e9-9ade-fc15b4106fc8  ONLINE
            gptid/9c7a9a9e-cbfd-11e9-9ade-fc15b4106fc8  ONLINE
            gptid/9e9cdf02-cbfd-11e9-9ade-fc15b4106fc8  ONLINE
            gptid/a2b56e84-cbfd-11e9-9ade-fc15b4106fc8  ONLINE
        logs
          gptid/a36b862d-cbfd-11e9-9ade-fc15b4106fc8    ONLINE

It seems we can't go backward from 11.3 to 11.2, and 11.3 is somewhat buggy?
The disks are good.
 

Yorick

Wizard
Joined
Nov 4, 2018
Messages
1,912
Oh it's not buggy. Looks like you updated the pool feature flags, which warns you the pool will be read-only on a previous version. And so it is.
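
If all you needed on 11.2 was to read the data, the read-only import the status output suggests would look something like this from a shell (pool name taken from your paste; -R keeps it from mounting over the live system; I haven't tested this on your exact setup):

    zpool import -o readonly=on -R /mnt spazio1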
 

Yorick

Wizard
Joined
Nov 4, 2018
Messages
1,912
There is an older thread that talks about this, over yonder: https://www.ixsystems.com/community/threads/slow-spa_sync-on-reboot.42349/

You say you deleted a dataset, so that’s a little different, not a zvol.

How full was this pool?

So if I get this right, because there was a reboot before the operation completed, now it has to do this on import. Every reboot will just have it start over. It could take weeks or months to complete, depending on how badly fragmented the pool was and the speed of your drive. You are not gated by CPU and RAM, but by IOPS.

Couple options for you.
- reboot into 11.3 and wait. If need be weeks and months. Should eventually complete and finish the import.
- install a clean FreeNAS 11.3 as a separate boot environment, one that doesn’t “know” about this pool. It should boot and allow you to do the import from command line. Will take the same time but system is up
- same idea but on a different box, with the drives moved over, so this one can boot up sans its pool

One way or another, ZFS needs to finish this transaction.
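
For the second and third options, the manual import from a shell would be roughly this (first list what's visible, then import by name; -f may be needed if the pool still looks "in use" by the old install, and the long wait happens during the import itself):

    zpool import
    zpool import -f -R /mnt spazio1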

What caused the initial reboot that triggered the loop? And how long had it been since the dataset had been deleted, at that point? That’s just a curious question for reference
 

drdreuza

Cadet
Joined
May 4, 2020
Messages
5
Oh it's not buggy. Looks like you updated the pool feature flags, which warns you the pool will be read-only on a previous version. And so it is.

Yeah, now I know...
There is an older thread that talks about this, over yonder: https://www.ixsystems.com/community/threads/slow-spa_sync-on-reboot.42349/

You say you deleted a dataset, so that’s a little different, not a zvol.

How full was this pool?

So if I get this right, because there was a reboot before the operation completed, now it has to do this on import. Every reboot will just have it start over. It could take weeks or months to complete, depending on how badly fragmented the pool was and the speed of your drive. You are not gated by CPU and RAM, but by IOPS.

Couple options for you.
- reboot into 11.3 and wait. If need be weeks and months. Should eventually complete and finish the import.
- install a clean FreeNAS 11.3 as a separate boot environment, one that doesn’t “know” about this pool. It should boot and allow you to do the import from command line. Will take the same time but system is up
- same idea but on a different box, with the drives moved over, so this one can boot up sans its pool

One way or another, ZFS needs to finish this transaction.

What caused the initial reboot that triggered the loop? And how long had it been since the dataset had been deleted, at that point? That’s just a curious question for reference

The pool was at 75%-80%.

I have verified that after every reboot the bptree traversal bookmark (shown before the slow spa_sync messages) advances: -1/1272/0/0, then on the next reboot -1/1878/0/0, etc. So I think the sync may not be restarting from scratch every time.
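
(I'm reading those bookmarks off the console messages; something along these lines should pull them out of the log, assuming they end up in /var/log/messages:

    grep -i spa_sync /var/log/messages
)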

Here is how I managed to regain a usable pool:

1. I downgraded from 11.3-U1 to 11.2-U5; FreeNAS came up, but as you correctly stated, the pool couldn't be imported.
2. So I upgraded to 11.3-U2.1 and everything changed: after some time (20 minutes?) the system came up, with no slow spa_sync.

Then, after a while, FreeNAS panicked and rebooted, and after two reboots it stabilized.

Now it is up and the data is accessible.

But... (my mistakes):

- I turned off deduplication (bad?), and now the pool usage has risen to 93%.
- The 3.5 TB dataset I deleted still has not released its space.
- The scrub estimate says it will take literally forever to complete: 15,135,961,220 years. I don't think SMR disks last that long ;)

(screenshot attached: Clipboard Image (9).jpg)
The initial reboot was a shutdown I had to do for a scheduled maintenance power outage in my area :-/

Thank you, it's good to have someone to help when a data pool goes wild :)
 

Yorick

Wizard
Joined
Nov 4, 2018
Messages
1,912
I'll definitely look into what SMR is...

It’s a hard drive recording technology that increases density by 10-15% at the cost of drastically slowing down sustained random write. Guess what ZFS does a lot - sustained random write.

Any 2.5” drive of 2TB or more will use SMR, most likely. Some 1TB do as well. WD Red 2-6TB do, and a lot of consumer / desktop drives do.
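
If you want to double-check what's actually in the box, something like this will show the model strings so you can look them up (ada0 is just an example device; camcontrol lists everything the system sees):

    camcontrol devlist
    smartctl -i /dev/ada0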

There’s been quite the excitement over WD “sneaking” SMR into NAS drives. Good for you on avoiding all that shouting so far.

In a nutshell: SMR slows things down so drastically during operations like resilver - or in your case deletion of a deduped dataset - that a case can be made that SMR should never be used with ZFS.
 

Yorick

Wizard
Joined
Nov 4, 2018
Messages
1,912
The 3.5 TB dataset I deleted still has not released its space.

Something that could cause this:
- Do you have snapshots of the dataset? Delete all of them.
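
Something like this would list any snapshots left on the pool, and the pool's "freeing" property should show how much space from the destroy is still waiting to be released in the background (pool name from your earlier output):

    zfs list -t snapshot -r -o name,used,refer spazio1
    zpool get freeing spazio1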
 

Yorick

Wizard
Joined
Nov 4, 2018
Messages
1,912
In the short term, you’re likely going to work to get your pool usage down below 80%, and, if you haven’t done so already, create a backup of your data, and live with the slow write behavior.

Also in the short term, learn about why dedupe in ZFS is always the wrong answer. I kid only a little: Really, in almost all cases, the overhead of dedupe isn’t worth it. There are good articles on that online, including this one: https://www.ixsystems.com/blog/freenas-worst-practices/

At least you have enough RAM for those dedupe tables.
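
If you're curious how big the dedup table actually is, I believe this prints a DDT summary, including the estimated in-core size:

    zpool status -D spazio1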

Medium term, plan to move to a case that can hold 3.5” drives and transfer your data over to a new pool. While you are at it, consider the merits of a raidz2 pool, so it has a decent chance of surviving a resilver. 6x8TB, maybe? If going for smaller drives, consider Ironwolf or N300 or Red Pro, but not Red - you’ll be right back in SMR Land with the small WD Red.

The WD Elements 8TB external, and similar externals, currently still use HGST He8 inside. Those are nice drives. A lot of folk “shuck” to get these inexpensively.
 

drdreuza

Cadet
Joined
May 4, 2020
Messages
5
Something that could cause this:
- Do you have snapshots of the dataset? Delete all of them.

Fortunately no, the server is only for backup purposes.

Now it seems to be accelerating and freeing the space; I calculated roughly 600 GB of space freed per day.
In the short term, you’re likely going to work to get your pool usage down below 80%, and, if you haven’t done so already, create a backup of your data, and live with the slow write behavior.

Also in the short term, learn about why dedupe in ZFS is always the wrong answer. I kid only a little: Really, in almost all cases, the overhead of dedupe isn’t worth it. There are good articles on that online, including this one: https://www.ixsystems.com/blog/freenas-worst-practices/

At least you have enough RAM for those dedupe tables.

Medium term, plan to move to a case that can hold 3.5” drives and transfer your data over to a new pool. While you are at it, consider the merits of a raidz2 pool, so it has a decent chance of surviving a resilver. 6x8TB, maybe? If going for smaller drives, consider Ironwolf or N300 or Red Pro, but not Red - you’ll be right back in SMR Land with the small WD Red.

The WD Elements 8TB external, and similar externals, currently still use HGST He8 inside. Those are nice drives. A lot of folk “shuck” to get these inexpensively.

I think I stumbled into the "open buffet syndrome" :) When I created the pool it was easy: a few clicks and I enabled everything (I have 32 threads and 128 GB of RAM). I admit it's not my first time with ZFS, but it is with FreeNAS. The 2.5" SMR drives and not reading the best practices did the rest.

Lesson learned :) "RTFM" and "You cannot bend the laws of physics" :)

The 3.5" disks are an interesting option. In the past I used to put 2.5" disks in my servers, mainly for energy reasons (also the IBM x3650 M2/M3 etc. mostly take 2.5" drives), but densities are now so high that a single 3.5" disk can in fact replace a bunch of 2.5" disks, so there is no reason to insist on 2.5" disks with their limitations: SMR, 5400 rpm, etc.

Thank you for the suggestions!
 

Yorick

Wizard
Joined
Nov 4, 2018
Messages
1,912

That's not really a limitation. I like the 5400/5900 rpm drives, they are quieter and draw less power. By the time your pool fills, whatever speed advantage a 7200rpm drive has is gone. If you can get 4TB of 7200rpm or 8TB of 5400rpm for the same price (that's not a crazy comparison, given that "shucking" is a thing), the 8TB drives will win the speed race every single time, just because the pool isn't as full.
 