Can't get why scrub is so slow / replication failing during slow scrub

Ulysse_31

Dabbler
Joined
Aug 22, 2023
Messages
49
So are you saying we're not even talking about TrueNAS here?

You're posting in the TrueNAS CORE forum.
The issue I am having *is* on a TrueNAS CORE server ... the server I compared it with (the older server mentioned before), on which I DO NOT HAVE ANY issues during scrubbing and replication, is an old SOLARIS 11 server ...
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
The issue I am having *is* on a TrueNAS CORE server ... the server I compared it with (the older server mentioned before), on which I DO NOT HAVE ANY issues during scrubbing and replication, is an old SOLARIS 11 server ...
Well, you're comparing current(ish) OpenZFS to (old) Solaris ZFS code in addition to any hardware differences.

Like I already said though, if you're on TrueNAS CORE, you really need to consider getting that firmware updated.
 

Ulysse_31

Dabbler
Joined
Aug 22, 2023
Messages
49
Well, you're comparing current(ish) OpenZFS to (old) Solaris ZFS code in addition to any hardware differences.
Two things here:
1-OpenZFS and ZFS on Solaris 11 are not that far apart ... I'm talking about zfs v31 on Solaris ...
2-Even if OpenZFS has evolved a lot since then (because v31 is quite old), I would expect the scrubbing algorithm to improve, not regress, in OpenZFS ^^' so performance should increase or stay the same ^^'

Like I already said though, if you're on TrueNAS CORE, you really need to consider getting that firmware updated.
Yes, I do note that, and we will certainly proceed with the firmware upgrade ... We'll have to find a window for that ... it may take some time, so I'll come back about this.

To get back to the performance issue (which may well be entirely linked to the card firmware), I just wanted to clarify some more information about scrub time and IO performance: I launched a scrub on the old server (the Solaris one) to check its performance, just to confirm I was not talking rubbish ^^'. So to recap => a never-defragmented pool of successive raidz2 vdevs, each added at around 90% usage ... right now at 90% usage ... here is the output of the current zpool status:



root@<host>:~# zpool status
  pool: STORAGE1
 state: ONLINE
  scan: scrub in progress since Tue Aug 22 12:00:54 2023
    3.00T scanned out of 111T at 223M/s, 5d21h to go
    0 repaired, 2.70% done
config:

        NAME         STATE     READ WRITE CKSUM
        STORAGE1     ONLINE       0     0     0
          raidz2-0   ONLINE       0     0     0
            c2t0d0   ONLINE       0     0     0
            c2t1d0   ONLINE       0     0     0
            c2t2d0   ONLINE       0     0     0
            c2t3d0   ONLINE       0     0     0
            c2t4d0   ONLINE       0     0     0
            c2t5d0   ONLINE       0     0     0
            c2t6d0   ONLINE       0     0     0
            c2t7d0   ONLINE       0     0     0
            c2t8d0   ONLINE       0     0     0
            c2t9d0   ONLINE       0     0     0
            c2t10d0  ONLINE       0     0     0
            c2t11d0  ONLINE       0     0     0
          raidz2-1   ONLINE       0     0     0
            c2t12d0  ONLINE       0     0     0
            c2t13d0  ONLINE       0     0     0
            c2t14d0  ONLINE       0     0     0
            c2t15d0  ONLINE       0     0     0
            c2t16d0  ONLINE       0     0     0
            c2t17d0  ONLINE       0     0     0
            c2t18d0  ONLINE       0     0     0
            c2t19d0  ONLINE       0     0     0
            c2t20d0  ONLINE       0     0     0
            c2t21d0  ONLINE       0     0     0
            c2t22d0  ONLINE       0     0     0
            c2t23d0  ONLINE       0     0     0
          raidz2-2   ONLINE       0     0     0
            c2t24d0  ONLINE       0     0     0
            c2t25d0  ONLINE       0     0     0
            c2t26d0  ONLINE       0     0     0
            c2t27d0  ONLINE       0     0     0
            c2t28d0  ONLINE       0     0     0
            c2t29d0  ONLINE       0     0     0
            c2t30d0  ONLINE       0     0     0
            c2t31d0  ONLINE       0     0     0
            c2t32d0  ONLINE       0     0     0
            c2t33d0  ONLINE       0     0     0
            c2t34d0  ONLINE       0     0     0
            c2t35d0  ONLINE       0     0     0

so ... yeah ... 223MBytes per sec ... no issue with it ...
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
223MBytes per sec ... no issue with it ...
More interesting to know is your recordsize and the types of files (large or small... or a mix)

What I find interesting is that a scrub will take you nearly 6 days.
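You can read the recordsize straight off the datasets, by the way; just as an example (pool name taken from your Solaris output above, adjust for your TrueNAS pool):

# Recordsize (and where it was set) for every dataset in the pool
zfs get -r -o name,value,source recordsize STORAGE1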
 

Ulysse_31

Dabbler
Joined
Aug 22, 2023
Messages
49
More interesting to know is your recordsize and the types of files (large or small... or a mix)

What I find interesting is that a scrub will take you nearly 6 days.
For 111TB of data in an 85%-used pool (seems we cleaned up some data ^^), 6 days is not a big deal ^^ (and it did not impact the replications in the past ^^)
 

Ulysse_31

Dabbler
Joined
Aug 22, 2023
Messages
49
I guess in fact I was wrong ... we are getting way more than 200MBytes per sec ^^', we are now at 265MBytes per sec and it is still increasing ...



root@<host>:~# zpool status
  pool: STORAGE1
 state: ONLINE
  scan: scrub in progress since Tue Aug 22 12:00:54 2023
    4.50T scanned out of 111T at 265M/s, 4d21h to go
    0 repaired, 4.05% done

and again, this is an "old" R610 with MD1200 SAS disk bays ... so we are far from 20MB/s ...
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Yeah, v28 < zfs < 5000 has increasingly little to do with OpenZFS, so big differences are expected. The specifics are hard to analyze abstractly because it's also a very different OS, but conclusions from one do not apply to the other directly.

Not to say something isn't wrong. I do agree that the updated firmware on the HBA is as good a place as any to start.
 

Ulysse_31

Dabbler
Joined
Aug 22, 2023
Messages
49
Yeah, v28 < zfs < 5000 has increasingly little to do with OpenZFS, so big differences are expected. The specifics are hard to analyze abstractly because it's also a very different OS, but conclusions from one do not apply to the other directly.

Not to say something isn't wrong. I do agree that the updated firmware on the HBA is as good a place as any to start.
Totally agree, I will proceed with the firmware update as a first step, and if necessary will pause/resume scrubs while waiting for the window in which we can do the update; it'll be a good start for finding out where the problem comes from.
Just a quick update about the scrub status on our old system :
[attached screenshot: zpool status output showing the scrub at 489M/s with 1d21h to go]


489 MBytes per sec and 1 day 21 hours to go ... I think we can definitively conclude that scrubs on RAIDZn are NOT around 20MB/s ...
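For the pause/resume I mentioned above, this is the kind of thing I have in mind on the TrueNAS CORE side (the pool name below is a placeholder; OpenZFS supports pausing a scrub, while I believe the old Solaris release only offers -s to stop it outright):

zpool scrub -p BACKUP1   # pause the running scrub
zpool scrub BACKUP1      # run again later to resume from where it was paused
zpool scrub -s BACKUP1   # or stop/cancel it entirely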
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
I think we can definitively conclude that scrubs on RAIDZn are NOT around 20MB/s ...
To confirm what I originally said... that's a potential "worst case" scenario (lots of not filled, non-sequential blocks) that can explain why things would be slow and that would be normal... doesn't mean it can't be faster if IOPS are able to grab multiple sequential (completely filled) blocks.
 

Ulysse_31

Dabbler
Joined
Aug 22, 2023
Messages
49
To confirm what I originally said... that's a potential "worst case" scenario (lots of not filled, non-sequential blocks) that can explain why things would be slow and that would be normal... doesn't mean it can't be faster if IOPS are able to grab multiple sequential (completely filled) blocks.
Oh? So when someone comes telling you he is having issues with a scrub ... the first things you say are:
1-your config/setup is "wrong"
2-please look at this "potential worst case scenario"
3-you are "wrong"

...
Sorry but ... I think you should work a bit on how you go about troubleshooting ^^'
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Oh? So when someone comes telling you he is having issues with a scrub ... the first things you say are:
1-your config/setup is "wrong"
2-please look at this "potential worst case scenario"
3-you are "wrong"

...
Sorry but ... I think you should work a bit on how you go about troubleshooting ^^'
Why so aggressive? I personally think you are getting great advice. @sretalla is only providing you information which sounds reasonable to me in order to assist you.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
I believe our neutral friend has been stating something slightly different, which I agree with:

The numbers you're seeing are not out of the realm of what might happen, depending on the data you're storing. We don't actually know what you're storing (unless I missed that in one of your posts) - but if it's broadly similar to your other system, then there is probably something that can be easily fixed, which is currently slowing things down a lot.

"Slowing things down a lot" is in itself also a problem, as things start to time out, especially over the network. It's not a guarantee that it's a problem derived from the slowness, but there also isn't an indication that it isn't, considering that reducing pool load by pausing the scrub works around the issue.

More generally, context is key and the more you can tell us about your data and your workload, the more specific advice can be.
 

Ulysse_31

Dabbler
Joined
Aug 22, 2023
Messages
49
Why so aggressive? I personally think you are getting great advice. @sretalla is only providing you information which sounds reasonable to me in order to assist you.
I am not being aggressive, just saying something that seems obvious to me: do not throw out an easy comment like "you are wrong" when someone is asking for advice.
Let me reformulate: even if my problem is not solved, I'm really happy to have a clue and a direction to search. The *second* part, in which we discussed the LSI card message and it *potentially* being the source (potentially, because the thread mentioned strictly says "Please Note: This problem applies only to firmware versions below 16.00.12.00 and Only affects SATA drives. SAS drives are not affected" and here we are talking about SAS ...), is really a good start and a first clue.
It just feels normal to me that if something seems totally unproductive, like saying right at the start, without asking anything first, "your config/setup is wrong" and giving a totally unrelated calculation to back up that first stance ... only to finally say it was a worst case scenario when proven wrong ... well ... yes, it needs to be mentioned, because in a support forum you are searching for support ... not easy judgement with no real logic.

So again yes ^^ thanks a lot for the hint on the card firmware (but ... no thanks on the RAIDZn performance advice :p ).
 

Ulysse_31

Dabbler
Joined
Aug 22, 2023
Messages
49
I believe our neutral friend has been stating something slightly different, which I agree with:

The numbers you're seeing are not out of the realm of what might happen, depending on the data you're storing. We don't actually know what you're storing (unless I missed that in one of your posts) - but if it's broadly similar to your other system, then there is probably something that can be easily fixed, which is currently slowing things down a lot.

"Slowing things down a lot" is in itself also a problem, as things start to time out, especially over the network. It's not a guarantee that it's a problem derived from the slowness, but there also isn't an indication that it isn't, considering that reducing pool load by pausing the scrub works around the issue.

More generally, context is key and the more you can tell us about your data and your workload, the more specific advice can be.
What really confuses me is that even if the pool is under high IO ... a zfs send should not, in theory, end like this with an error ... in the worst case the send should just be slow as hell with a ridiculously small share of the bandwidth ...
I do see the link between the high IO and the scrub ... but why are the sends getting interrupted? That I do not understand ...
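On that note, if the sends keep getting cut off until the firmware question is settled, I understand OpenZFS has resumable receives that our scripts could in principle use; a rough sketch with made-up dataset and host names:

# Destination side: -s keeps the partial state of an interrupted receive
zfs send -i PROD/dept1@prev PROD/dept1@new | ssh backup-host zfs receive -s BACKUP1/dept1
# After a failure, read the resume token on the destination ...
zfs get -H -o value receive_resume_token BACKUP1/dept1
# ... and restart the stream from the source using that token
zfs send -t <token> | ssh backup-host zfs receive -s BACKUP1/dept1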
 

Ulysse_31

Dabbler
Joined
Aug 22, 2023
Messages
49
More generally, context is key and the more you can tell us about your data and your workload, the more specific advice can be.

How do we work with it ... let me give some description of it:
Historically, we had a production storage and a "backup" (a mirror with more retention ...) under Solaris; the production storage was under external support (because at that time we were using AoE from Coraid ...), while the "backup" was supported only by us. The snapshotting and replication were done with internal scripts (using zfs send / recv and a DB to keep track of dataset source/dest/current_sync_snapshot) ...
We replaced the production with a TrueNAS from iXsystems, and we built a TrueNAS CORE to take the place of the "backup" node for now (the node that is having those replication issues when scrubbing).
We kept the same "behavior" we had with the Solaris setup: the production server has various datasets, accessed via different protocols depending on the department (SMB/CIFS, NFS ...), with various retention rules applied via snapshots. The "backup" mainly replicates the daily snapshots, which have 2-3 weeks of retention on production, but generates weekly and monthly snapshots locally in order to keep a longer retention ...
This is the general behavior for ... let's say 95% of the datasets ... about a month ago we added a new dataset that is synced weekly instead of daily (weekly snapshots are made on production and those weeklies are then replicated to the "backup") ... in total there are currently 47 datasets; the replications run during the night, about half launched at 1AM ... the other half launched at 2AM ...

Not sure if all this is clear ... so if you have questions don't hesitate ^^
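To illustrate the mechanics (these are not our actual scripts; dataset, snapshot and host names are made up), each nightly job boils down to an incremental send of the delta between the last snapshot both sides have and the new daily one:

# On production: take the daily snapshot, then send only the increment to the backup node
zfs snapshot PROD/dept1@daily-2023-08-22
zfs send -i PROD/dept1@daily-2023-08-21 PROD/dept1@daily-2023-08-22 | ssh backup-host zfs receive -F BACKUP1/dept1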
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
do not throw out an easy comment like "you are wrong" when someone is asking for advice.
Looking back over the thread, I see I said that one time, in response to your claim that IOPS are spread across all drives in RAIDZ2 operations.

I stand by my statement. They are not. I explained why I thought that was the case more than once.

I did not claim at any other time that you were wrong and offered suggestions and explanations for what might be happening and why.

You have at no point "proven me wrong", but I hope we have come to a better understanding of what was said.

I also clearly explained to you why I proposed updating your (13 major versions behind) firmware to current, and that it was in no way related to the "last incremental fix" discussed on the page I referred you to (which is the iX recommended version for your card).

In reflecting, I really don't feel like I should be here defending my actions, so I'll stop doing that and wish you well in your journey to a solution.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
What really confuses me is that even if the pool is under high IO ...
How many IOPS is it actually pushing, anyway? What does zpool iostat say? Of course, iostat on the disks themselves may also be relevant, depending on how things go.
How do we work with it ... let me give some description of it:
Historically, we had a production storage and a "backup" (a mirror with more retention ...) under Solaris; the production storage was under external support (because at that time we were using AoE from Coraid ...), while the "backup" was supported only by us. The snapshotting and replication were done with internal scripts (using zfs send / recv and a DB to keep track of dataset source/dest/current_sync_snapshot) ...
We replaced the production with a TrueNAS from iXsystems, and we built a TrueNAS CORE to take the place of the "backup" node for now (the node that is having those replication issues when scrubbing).
We kept the same "behavior" we had with the Solaris setup: the production server has various datasets, accessed via different protocols depending on the department (SMB/CIFS, NFS ...), with various retention rules applied via snapshots. The "backup" mainly replicates the daily snapshots, which have 2-3 weeks of retention on production, but generates weekly and monthly snapshots locally in order to keep a longer retention ...
This is the general behavior for ... let's say 95% of the datasets ... about a month ago we added a new dataset that is synced weekly instead of daily (weekly snapshots are made on production and those weeklies are then replicated to the "backup") ... in total there are currently 47 datasets; the replications run during the night, about half launched at 1AM ... the other half launched at 2AM ...

Not sure if all this is clear ... so if you have questions don't hesitate ^^
What kind of data? Large files? Databases? Small files?
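Something along these lines, captured while the scrub and a replication are both running, would help (pool name is a placeholder):

zpool iostat -v BACKUP1 5    # per-vdev bandwidth and IOPS, refreshed every 5 seconds
gstat -p                     # per-disk view on the FreeBSD side of TrueNAS CORE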
 

Ulysse_31

Dabbler
Joined
Aug 22, 2023
Messages
49
Looking back over the thread, I see I said that one time, in response to your claim that IOPS are spread across all drives in RAIDZ2 operations.

I stand by my statement. They are not. I explained why I thought that was the case more than once.
I encourage you to do the same => yes, I said that IOPS are spread across the drives ... because they are ^^ ... I think there is a misunderstanding here => spreading something across means "dividing one quantity, evenly or unevenly, among multiple different places" ... again ... as I already said before ... I did NOT say that IOPS were being cumulated/summed/whatever ... So I stand by my statements too ^^

I did not claim at any other time that you were wrong and offered suggestions and explanations for what might be happening and why.

You have at no point "proven me wrong", but I hope we have come to a better understanding of what was said.
I'm sorry, but what I understand from your post can simply be summed up as "an average hard drive can do 20MB/s" ... then you added the worst case scenario afterwards ...

I also clearly explained to you why I proposed updating your (13 major versions behind) firmware to current, and that it was in no way related to the "last incremental fix" discussed on the page I referred you to (which is the iX recommended version for your card).
Yes, and I do not have any issue with that; I even said (and I'll say it again) thank you for pointing this out. I only said that for now it is a hint, not a certainty, that the issue comes from there, since the referenced post itself is about issues with old firmware and SATA drives (NOT the SAS drives we are using ...).

In reflecting, I really don't feel like I should be here defending my actions, so I'll stop doing that and wish you well in your journey to a solution.
I'm not asking you to defend or justify anything ... again ... I'm just saying that answering straight away with:
1- "your config/setup is wrong" (while ... it isn't ... => you did not even ask for any further info about it ... just read my first post ... and yours right below ... then my answer ... and yours again ... the first 4 posts set the tone)
then:
2- it's normal because of the size, plus the calculation that followed ...
3- no, in fact it was a "worst case scenario"

I am being factual; that is not what I was expecting from a support forum ... that is all.
 

Ulysse_31

Dabbler
Joined
Aug 22, 2023
Messages
49
How many IOPS is it actually pushing, anyway? What does zpool iostat say? Of course, iostat on the disks themselves may also be relevant, depending on how things go.

What kind of data? Large files? Databases? Small files?

IOPS depend on how much data was generated during the day; this can vary with the period of the week, the month or even the time of year ... we are NOT an IT company ... the data itself is very diverse and depends on the department's dataset; it can be anything from 4k csv files, to 20 MByte zip files ... or 200 MByte tar.gz ... or typical excel/powerpoint/word documents ... would you like a rough estimate of the distribution of the file types?
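If a rough size breakdown would help, I could run something like this against one of the dataset mountpoints (FreeBSD stat syntax; pool and path names here are just examples):

# Crude histogram of file sizes under one dataset
find /mnt/BACKUP1/dept1 -type f -exec stat -f '%z' {} + | awk '
  $1 < 131072    { b["<128K"]++;   next }
  $1 < 1048576   { b["128K-1M"]++; next }
  $1 < 104857600 { b["1M-100M"]++; next }
                 { b[">100M"]++ }
  END { for (s in b) print s, b[s] }'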
 