Slow spa_sync after dataset deletion

drdreuza

Cadet
Joined
May 4, 2020
Messages
5
Hello,

I have some simple questions:

1) When the system says "slow spa_sync: ...", is the server actually doing something, or is it doing nothing/waiting?

2) If I press Ctrl+T I see a load of 0.00 and the state [sbwait], so it seems the server is doing nothing. Is there a way to see what is going on?

3) I deleted a 3 TB dataset which had deduplication enabled, and I know this could be the cause, but why does a server with 16 cores/32 threads and 128 GB of RAM have to be down for days for work (a scrub or a deletion) that could be done in the background?

FreeNAS: 11.3-U2
Machine: HP DL360e
2x CPU: E5-2450L (8 cores/16 threads each)
RAM: 128 GB (64 GB per CPU), 1333 MHz

Storage:
1 pool, RAIDZ1 + 500 GB NVMe L2ARC
  • Physical Drive in Port 1I Box 1 Bay 1
      Status: OK
      Serial Number: WFG0M610
      Model: ST4000LM024-2AN1
      Media Type: HDD
      Capacity: 4000 GB
      Location: Port 1I Box 1 Bay 1
      Firmware Version: 0001
      Drive Configuration: Unconfigured
      Encryption Status: Not Encrypted
  • Physical Drive in Port 1I Box 1 Bay 2
      Status: OK
      Serial Number: WFG0X4CB
      Model: ST4000LM024-2AN1
      Media Type: HDD
      Capacity: 4000 GB
      Location: Port 1I Box 1 Bay 2
      Firmware Version: 0001
      Drive Configuration: Unconfigured
      Encryption Status: Not Encrypted
  • Physical Drive in Port 1I Box 1 Bay 3
      Status: OK
      Serial Number: WFF0V8VK
      Model: ST4000LM024-2AN1
      Media Type: HDD
      Capacity: 4000 GB
      Location: Port 1I Box 1 Bay 3
      Firmware Version: 0001
      Drive Configuration: Unconfigured
      Encryption Status: Not Encrypted
  • Physical Drive in Port 1I Box 1 Bay 4
      Status: OK
      Serial Number: WFG148M6
      Model: ST4000LM024-2AN1
      Media Type: HDD
      Capacity: 4000 GB
      Location: Port 1I Box 1 Bay 4
      Firmware Version: 0001
      Drive Configuration: Unconfigured
      Encryption Status: Not Encrypted

Boot status (screenshots attached): Clipboard Image (7).jpg, Clipboard Image (8).jpg
 

drdreuza

Cadet
Joined
May 4, 2020
Messages
5
I think slow spa_sync is just waiting forever; after every clean (Ctrl+Alt+Del) reboot, the condensing procedure seems to get a little further.

I downgraded from 11.3-U1 to 11.2-U5, but I cannot import the pool.

zpool import
   pool: spazio1
     id: 2243976908156379567
  state: UNAVAIL
 status: The pool can only be accessed in read-only mode on this system. It
         cannot be accessed in read-write mode because it uses the following
         feature(s) not supported on this system:
             com.delphix:spacemap_v2 (Space maps representing large segments are more efficient.)
 action: The pool cannot be imported in read-write mode. Import the pool with
         "-o readonly=on", access the pool on a system that supports the
         required feature(s), or recreate the pool from backup.
 config:

        spazio1                                         UNAVAIL  unsupported feature(s)
          raidz1-0                                      ONLINE
            gptid/9a714cde-cbfd-11e9-9ade-fc15b4106fc8  ONLINE
            gptid/9c7a9a9e-cbfd-11e9-9ade-fc15b4106fc8  ONLINE
            gptid/9e9cdf02-cbfd-11e9-9ade-fc15b4106fc8  ONLINE
            gptid/a2b56e84-cbfd-11e9-9ade-fc15b4106fc8  ONLINE
        logs
          gptid/a36b862d-cbfd-11e9-9ade-fc15b4106fc8    ONLINE

It seems we can't go backward from 11.3 to 11.2, and 11.3 is somewhat buggy?
The disks are good.
 

Yorick

Wizard
Joined
Nov 4, 2018
Messages
1,912
Oh it's not buggy. Looks like you updated the pool feature flags, which warns you the pool will be read-only on a previous version. And so it is.
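
If all you needed on 11.2 was to read the data, the read-only import the status output suggests would look something like this from a shell (pool name taken from your paste; -R keeps it from mounting over the live system; I haven't tested this on your exact setup):

    zpool import -o readonly=on -R /mnt spazio1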
 

Yorick

Wizard
Joined
Nov 4, 2018
Messages
1,912
There is an older thread that talks about this, over yonder: https://www.ixsystems.com/community/threads/slow-spa_sync-on-reboot.42349/

You say you deleted a dataset, so that’s a little different, not a zvol.

How full was this pool?

So if I get this right, because there was a reboot before the operation completed, now it has to do this on import. Every reboot will just have it start over. It could take weeks or months to complete, depending on how badly fragmented the pool was and the speed of your drive. You are not gated by CPU and RAM, but by IOPS.

Couple options for you.
- reboot into 11.3 and wait. If need be weeks and months. Should eventually complete and finish the import.
- install a clean FreeNAS 11.3 as a separate boot environment, one that doesn’t “know” about this pool. It should boot and allow you to do the import from command line. Will take the same time but system is up
- same idea but on a different box, with the drives moved over, so this one can boot up sans its pool

One way or another, ZFS needs to finish this transaction.
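
For the second and third options, the manual import from a shell would be roughly this (first list what's visible, then import by name; -f may be needed if the pool still looks "in use" by the old install, and the long wait happens during the import itself):

    zpool import
    zpool import -f -R /mnt spazio1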

What caused the initial reboot that triggered the loop? And how long had it been since the dataset had been deleted, at that point? That’s just a curious question for reference
 

drdreuza

Cadet
Joined
May 4, 2020
Messages
5
Oh it's not buggy. Looks like you updated the pool feature flags, which warns you the pool will be read-only on a previous version. And so it is.

Yeah, now I know...
There is an older thread that talks about this, over yonder: https://www.ixsystems.com/community/threads/slow-spa_sync-on-reboot.42349/

You say you deleted a dataset, so that’s a little different, not a zvol.

How full was this pool?

So if I get this right, because there was a reboot before the operation completed, now it has to do this on import. Every reboot will just have it start over. It could take weeks or months to complete, depending on how badly fragmented the pool was and the speed of your drive. You are not gated by CPU and RAM, but by IOPS.

Couple options for you.
- reboot into 11.3 and wait. If need be weeks and months. Should eventually complete and finish the import.
- install a clean FreeNAS 11.3 as a separate boot environment, one that doesn’t “know” about this pool. It should boot and allow you to do the import from command line. Will take the same time but system is up
- same idea but on a different box, with the drives moved over, so this one can boot up sans its pool

One way or another, ZFS needs to finish this transaction.

What caused the initial reboot that triggered the loop? And how long had it been since the dataset had been deleted, at that point? That’s just a curious question for reference

The pool was at 75%-80%.

I have verified that after every reboot the bptree traversal bookmark (shown before the slow spa_sync messages) advances: -1/1272/0/0, then on the next reboot -1/1878/0/0, etc. So I think the sync may not be restarting from scratch every time.
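
(I'm reading those bookmarks off the console messages; something along these lines should pull them out of the log, assuming they end up in /var/log/messages:

    grep -i spa_sync /var/log/messages
)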

Here is how I managed to regain a usable pool:

1. I downgraded from 11.3-U1 to 11.2-U5; FreeNAS came up, but as you correctly stated, the pool couldn't be imported.
2. So I upgraded to 11.3-U2.1 and everything changed: after some time (20 minutes?) the system came up, with no slow spa_sync.

Then, after a while, FreeNAS panicked and rebooted, and after two reboots it stabilized.

Now it is up and the data is accessible.

But... (my mistakes):

- I turned off deduplication (bad?), and now the pool usage has risen to 93%.
- The 3.5 TB dataset I deleted still has not released its space.
- The scrub estimate says it will take literally forever to complete: 15,135,961,220 years. I don't think SMR disks last that long ;)

(screenshot attached: Clipboard Image (9).jpg)
The initial reboot was a shutdown I had to do for a scheduled maintenance power outage in my area :-/

Thank you, it's good to have someone to help when a data pool goes wild :)
 

Yorick

Wizard
Joined
Nov 4, 2018
Messages
1,912
I'll definitely look into what SMR is...

It’s a hard drive recording technology that increases density by 10-15% at the cost of drastically slowing down sustained random write. Guess what ZFS does a lot - sustained random write.

Any 2.5” drive of 2TB or more will use SMR, most likely. Some 1TB do as well. WD Red 2-6TB do, and a lot of consumer / desktop drives do.
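
If you want to double-check what's actually in the box, something like this will show the model strings so you can look them up (ada0 is just an example device; camcontrol lists everything the system sees):

    camcontrol devlist
    smartctl -i /dev/ada0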

There’s been quite the excitement over WD “sneaking” SMR into NAS drives. Good for you on avoiding all that shouting so far.

In a nutshell: SMR slows things down so drastically during operations like resilver - or in your case deletion of a deduped dataset - that a case can be made that SMR should never be used with ZFS.
 

Yorick

Wizard
Joined
Nov 4, 2018
Messages
1,912
The 3.5 TB dataset I deleted still has not released its space.

Something that could cause this:
- Do you have snapshots of the dataset? Delete all of them.
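
Something like this would list any snapshots left on the pool, and the pool's "freeing" property should show how much space from the destroy is still waiting to be released in the background (pool name from your earlier output):

    zfs list -t snapshot -r -o name,used,refer spazio1
    zpool get freeing spazio1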
 

Yorick

Wizard
Joined
Nov 4, 2018
Messages
1,912
In the short term, you’re likely going to work to get your pool usage down below 80%, and, if you haven’t done so already, create a backup of your data, and live with the slow write behavior.

Also in the short term, learn about why dedupe in ZFS is always the wrong answer. I kid only a little: Really, in almost all cases, the overhead of dedupe isn’t worth it. There are good articles on that online, including this one: https://www.ixsystems.com/blog/freenas-worst-practices/

At least you have enough RAM for those dedupe tables.
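
If you're curious how big the dedup table actually is, I believe this prints a DDT summary, including the estimated in-core size:

    zpool status -D spazio1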

Medium term, plan to move to a case that can hold 3.5” drives and transfer your data over to a new pool. While you are at it, consider the merits of a raidz2 pool, so it has a decent chance of surviving a resilver. 6x8TB, maybe? If going for smaller drives, consider Ironwolf or N300 or Red Pro, but not Red - you’ll be right back in SMR Land with the small WD Red.

The WD Elements 8TB external, and similar externals, currently still use HGST He8 inside. Those are nice drives. A lot of folk “shuck” to get these inexpensively.
 

drdreuza

Cadet
Joined
May 4, 2020
Messages
5
Something that could cause this:
- Do you have snapshots of the dataset? Delete all of them.

Fortunately no, the server is only for backup purposes.

Now it seems to be accelerating and freeing the space; I calculated roughly 600 GB of space freed per day.
In the short term, you’re likely going to work to get your pool usage down below 80%, and, if you haven’t done so already, create a backup of your data, and live with the slow write behavior.

Also in the short term, learn about why dedupe in ZFS is always the wrong answer. I kid only a little: Really, in almost all cases, the overhead of dedupe isn’t worth it. There are good articles on that online, including this one: https://www.ixsystems.com/blog/freenas-worst-practices/

At least you have enough RAM for those dedupe tables.

Medium term, plan to move to a case that can hold 3.5” drives and transfer your data over to a new pool. While you are at it, consider the merits of a raidz2 pool, so it has a decent chance of surviving a resilver. 6x8TB, maybe? If going for smaller drives, consider Ironwolf or N300 or Red Pro, but not Red - you’ll be right back in SMR Land with the small WD Red.

The WD Elements 8TB external, and similar externals, currently still use HGST He8 inside. Those are nice drives. A lot of folk “shuck” to get these inexpensively.

I think I stumbled into the "open buffet syndrome" :) When I created the pool it was easy: a few clicks and I enabled everything (I have 32 threads and 128 GB of RAM). I admit it's not my first time with ZFS, but it is with FreeNAS. The 2.5" SMR drives and not reading the best practices did the rest.

Lesson learned :) "RTFM" and "You cannot bend the laws of physics" :)

The 3.5" disks are an interesting option. In the past I used to put 2.5" disks in my servers, mainly for energy reasons (also the IBM x3650 M2/M3 etc. mostly take 2.5" drives), but densities are now so high that a single 3.5" disk can in fact replace a bunch of 2.5" disks, so there is no reason to insist on 2.5" disks with their limitations: SMR, 5400 rpm, etc.

Thank you for the suggestions!
 

Yorick

Wizard
Joined
Nov 4, 2018
Messages
1,912

That's not really a limitation. I like the 5400/5900 rpm drives, they are quieter and draw less power. By the time your pool fills, whatever speed advantage a 7200rpm drive has is gone. If you can get 4TB of 7200rpm or 8TB of 5400rpm for the same price (that's not a crazy comparison, given that "shucking" is a thing), the 8TB drives will win the speed race every single time, just because the pool isn't as full.
 