Can't understand why the scrub is so slow / replication failing during slow scrub

Ulysse_31

Dabbler
Joined
Aug 22, 2023
Messages
49
IOPS depend on how much data was generated during the day; this can vary with the period of the week, the month, or even the time of year ... we are NOT an IT company ... the data itself is very diverse and depends on the department dataset: it can be anything from 4 KB csv files to 20 MB zip files ... or 200 MB tar.gz archives ... or typical Excel/PowerPoint/Word documents ... would you like a rough estimate of the distribution of file types?
oh I don't know if it helps, but the file types and content between the old system and the new one are exactly the same (file type/structure did not change between the two)
 
Last edited:

Ulysse_31

Dabbler
Joined
Aug 22, 2023
Messages
49
How many IOPS is it actually pushing, anyway? What does zpool iostat say? Of course, iostat on the disks themselves may also be relevant, depending on how things go.

What kind of data? Large files? Databases? Small files?
For the zpool iostat question ... well it seems to reflect the scrubbing :

Code:
zpool iostat STORAGE1 1 1000
              capacity     operations     bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
STORAGE1    57.6T  70.6T  2.08K    181  47.1M  5.98M
STORAGE1    57.6T  70.6T  5.82K      0  33.9M      0
STORAGE1    57.6T  70.6T  4.76K      0  66.1M      0
STORAGE1    57.6T  70.6T  4.95K      0  88.6M      0
STORAGE1    57.6T  70.6T     52  1.09K   211K  14.4M
STORAGE1    57.6T  70.6T  3.77K     23  24.2M   166K
STORAGE1    57.6T  70.6T  5.24K      0  33.5M      0
STORAGE1    57.6T  70.6T  6.69K      0  45.2M      0
STORAGE1    57.6T  70.6T  6.58K      0  42.6M      0
STORAGE1    57.6T  70.6T  6.54K      0  43.5M      0
STORAGE1    57.6T  70.6T  6.41K      0  47.1M      0
STORAGE1    57.6T  70.6T  6.75K      0  46.0M      0
STORAGE1    57.6T  70.6T  7.09K      0  44.5M      0
STORAGE1    57.6T  70.6T  6.91K      0  38.7M      0
STORAGE1    57.6T  70.6T  6.85K      0  37.3M      0
STORAGE1    57.6T  70.6T  6.75K      0  38.9M      0
STORAGE1    57.6T  70.6T  5.27K      0  30.0M      0
STORAGE1    57.6T  70.6T  5.10K      0   105M      0
STORAGE1    57.6T  70.6T  2.66K    134  44.3M  1.74M
STORAGE1    57.6T  70.6T  2.56K    300  16.7M  1.77M
STORAGE1    57.6T  70.6T  4.69K      0  29.3M      0
STORAGE1    57.6T  70.6T  6.76K      0  46.0M      0
STORAGE1    57.6T  70.6T  6.62K      0  41.8M      0
STORAGE1    57.6T  70.6T  6.72K      0  47.3M      0
STORAGE1    57.6T  70.6T  6.49K      0  49.0M      0
STORAGE1    57.6T  70.6T  6.84K      0  48.7M      0
STORAGE1    57.6T  70.6T  7.34K      0  46.1M      0
STORAGE1    57.6T  70.6T  7.11K      0  40.4M      0
STORAGE1    57.6T  70.6T  6.86K      0  37.3M      0
STORAGE1    57.6T  70.6T  6.74K      0  38.8M      0
STORAGE1    57.6T  70.6T  5.44K      0  32.2M      0
STORAGE1    57.6T  70.6T  4.21K      0  90.7M      0
STORAGE1    57.6T  70.6T  1.22K    282  25.1M  2.86M
STORAGE1    57.6T  70.6T  2.36K    554  15.4M  10.2M
STORAGE1    57.6T  70.6T  4.50K      0  27.7M      0
STORAGE1    57.6T  70.6T  6.69K      0  45.5M      0
STORAGE1    57.6T  70.6T  6.65K      0  41.8M      0
STORAGE1    57.6T  70.6T  6.76K      0  47.8M      0
STORAGE1    57.6T  70.6T  6.42K      0  45.6M      0
STORAGE1    57.6T  70.6T  6.42K      0  45.8M      0
STORAGE1    57.6T  70.6T  7.32K      0  46.4M      0
STORAGE1    57.6T  70.6T  7.19K      0  41.7M      0
STORAGE1    57.6T  70.6T  6.86K      0  36.8M      0
STORAGE1    57.6T  70.6T  6.89K      0  39.8M      0
STORAGE1    57.6T  70.6T  5.45K      0  32.3M      0
STORAGE1    57.6T  70.6T  5.06K      0   104M      0
STORAGE1    57.6T  70.6T  3.51K      0  49.9M      0
STORAGE1    57.6T  70.6T    798    703  5.19M  9.70M
STORAGE1    57.6T  70.6T  4.06K      0  25.5M      0
STORAGE1    57.6T  70.6T  5.95K      0  39.2M      0
STORAGE1    57.6T  70.6T  6.71K      0  44.2M      0

Please note that the pool does nothing particular during the day ... nothing at all in fact ... it only "works" at night, while replicating ...
 
Last edited by a moderator:

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
For the zpool iostat question ... well it seems to reflect the scrubbing :

Code:
zpool iostat STORAGE1 1 1000
              capacity     operations     bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
STORAGE1    57.6T  70.6T  2.08K    181  47.1M  5.98M
[...]
STORAGE1    57.6T  70.6T  6.71K      0  44.2M      0

Please note that the pool does nothing particular during the day ... nothing at all in fact ... it only "works" at night, while replicating ...
Can you get us the output with -q?
 

Ulysse_31

Dabbler
Joined
Aug 22, 2023
Messages
49
Can you get us the output with -q?
Sure, here it is :



Code:
zpool iostat -q STORAGE1 1 1000
              capacity     operations     bandwidth    syncq_read    syncq_write   asyncq_read  asyncq_write   scrubq_read   trimq_write
pool        alloc   free   read  write   read  write   pend  activ   pend  activ   pend  activ   pend  activ   pend  activ   pend  activ
----------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
STORAGE1    57.7T  70.6T  2.10K    180  47.0M  5.95M      0      0      0      0      0      0      0      0  5.54K     34      0      0
STORAGE1    57.7T  70.6T     6K      0  34.6M      0      0      0      0      0      0      0      0      0  5.41K     36      0      0
STORAGE1    57.7T  70.6T  5.12K      0  47.8M      0      0      0      0      0      0      0      0      0  5.56K     30      0      0
STORAGE1    57.7T  70.6T  4.86K      0   117M      0      0      0      0      0      0      0      0      0    869     20      0      0
STORAGE1    57.7T  70.6T    793    522  6.80M  7.51M      0      0      0      0      0      0     24      9      0      0      0      0
STORAGE1    57.7T  70.6T  6.80K     23  54.1M   165K      0      0      0      0      0      0      0      0  8.09K     36      0      0
STORAGE1    57.7T  70.6T  6.92K      0  55.2M      0      0      0      0      0      0      0      0      0  7.19K     36      0      0
STORAGE1    57.7T  70.6T  7.27K      0  47.2M      0      0      0      0      0      0      0      0      0  6.21K     36      0      0
STORAGE1    57.7T  70.6T  7.02K      0  42.3M      0      0      0      0      0      0      0      0      0  6.55K     36      0      0
STORAGE1    57.7T  70.6T  7.02K      0  39.5M      0      0      0      0      0      0      0      0      0  6.67K     36      0      0
STORAGE1    57.7T  70.6T  6.75K      0  36.9M      0      0      0      0      0      0      0      0      0  5.74K     35      0      0
STORAGE1    57.7T  70.6T  6.38K      0  35.2M      0      0      0      0      0      0      0      0      0  4.86K     36      0      0
STORAGE1    57.7T  70.6T  5.05K      0  30.2M      0      0      0      0      0      0      0      0      0  6.25K     35      0      0
STORAGE1    57.7T  70.6T  4.00K      0  94.5M      0      0      0      0      0      0      0      0      0  1.93K      9      0      0
STORAGE1    57.7T  70.6T    895    367  21.0M  3.71M      0      0      0      0      0      0      0      1      0      0      0      0
STORAGE1    57.7T  70.6T  5.00K    469  40.5M  10.6M      0      0      0      0      0      0      0      0  7.96K     36      0      0
STORAGE1    57.7T  70.6T  6.98K      0  54.8M      0      0      0      0      0      0      0      0      0  7.04K     36      0      0
STORAGE1    57.7T  70.6T  7.18K      0  50.3M      0      0      0      0      0      0      0      0      0  7.10K     35      0      0
STORAGE1    57.7T  70.6T  7.20K      0  42.9M      0      0      0      0      0      0      0      0      0  6.83K     36      0      0
STORAGE1    57.7T  70.6T  7.15K      0  41.1M      0      0      0      0      0      0      0      0      0  5.90K     36      0      0
STORAGE1    57.7T  70.6T  6.86K      0  37.3M      0      0      0      0      0      0      0      0      0  6.06K     33      0      0
STORAGE1    57.7T  70.6T  6.51K      0  36.5M      0      0      0      0      0      0      0      0      0  5.17K     36      0      0
STORAGE1    57.7T  70.6T  5.17K      0  29.4M      0      0      0      0      0      0      0      0      0  6.45K     36      0      0
STORAGE1    57.7T  70.6T  5.21K      0   110M      0      0      0      0      0      0      0      0      0  2.93K     15      0      0
STORAGE1    57.7T  70.6T  2.57K    100  37.5M  1.35M      0      0      0      0      0      0      9      2      0      0      0      0
STORAGE1    57.7T  70.6T  4.15K    363  33.7M  3.76M      0      0      0      0      0      0      0      0  8.14K     36      0      0
STORAGE1    57.7T  70.6T  7.00K      0  55.1M      0      0      0      0      0      0      0      0      0  7.20K     36      0      0
STORAGE1    57.7T  70.6T  7.10K      0  51.8M      0      0      0      0      0      0      0      0      0  6.91K     36      0      0
STORAGE1    57.7T  70.6T  7.29K      0  43.3M      0      0      0      0      0      0      0      0      0  5.55K     36      0      0
STORAGE1    57.7T  70.6T  7.02K      0  41.2M      0      0      0      0      0      0      0      0      0  6.98K     36      0      0
STORAGE1    57.7T  70.6T  6.93K      0  37.6M      0      0      0      0      0      0      0      0      0  6.29K     36      0      0
STORAGE1    57.7T  70.6T  6.50K      0  35.7M      0      0      0      0      0      0      0      0      0  5.42K     34      0      0
STORAGE1    57.7T  70.6T  5.47K      0  31.8M      0      0      0      0      0      0      0      0      0  5.11K     36      0      0
STORAGE1    57.7T  70.6T  4.96K      0  88.4M      0      0      0      0      0      0      0      0      0  2.86K     24      0      0
STORAGE1    57.7T  70.6T  4.42K      0  69.7M      0      0      0      0      0      0      0      0      0    151      9      0      0
STORAGE1    57.7T  70.6T  1.32K    443  8.51M  5.33M      0      0      0      0      0      0      0      0  5.09K     36      0      0
STORAGE1    57.7T  70.6T  7.11K      0  57.4M      0      0      0      0      0      0      0      0      0  7.88K     36      0      0
STORAGE1    57.7T  70.6T  6.79K      0  54.4M      0      0      0      0      0      0      0      0      0  7.40K     36      0      0
STORAGE1    57.7T  70.6T  7.27K      0  45.2M      0      0      0      0      0      0      0      0      0  7.35K     36      0      0
STORAGE1    57.7T  70.6T  6.93K      0  41.8M      0      0      0      0      0      0      0      0      0  7.17K     36      0      0
STORAGE1    57.7T  70.6T  6.97K      0  38.9M      0      0      0      0      0      0      0      0      0  6.64K     36      0      0
STORAGE1    57.7T  70.6T  6.71K      0  36.5M      0      0      0      0      0      0      0      0      0  5.83K     29      0      0
STORAGE1    57.7T  70.6T  6.22K      0  34.5M      0      0      0      0      0      0      0      0      0  5.25K     36      0      0
STORAGE1    57.7T  70.6T  5.04K      0  30.3M      0      0      0      0      0      0      0      0      0  5.76K     33      0      0
STORAGE1    57.7T  70.6T  3.81K      0  94.0M      0      0      0      0      0      0      0      0      0  1.64K      9      0      0
STORAGE1    57.7T  70.6T    801    358  19.3M  5.68M      0      0      0      0      0      0     65     11      0      0      0      0
STORAGE1    57.7T  70.6T  5.96K    170  47.6M  1.47M      0      0      0      0      0      0      0      0  7.64K     36      0      0
STORAGE1    57.7T  70.6T  6.99K      0  54.7M      0      0      0      0      0      0      0      0      0  7.07K     36      0      0
STORAGE1    57.7T  70.6T  7.14K      0  48.8M      0      0      0      0      0      0      0      0      0  6.61K     36      0      0
STORAGE1    57.7T  70.6T  7.08K      0  42.6M      0      0      0      0      0      0      0      0      0  5.88K     36      0      0
STORAGE1    57.7T  70.6T  7.00K      0  40.0M      0      0      0      0      0      0      0      0      0  6.26K     36      0      0


Don't know how much (how long) you would like ... sorry for the spam if it's too big ^^' and tell me if you need more (maybe it would be better as an attachment).
 
Last edited by a moderator:
Joined
Jun 15, 2022
Messages
674
To confirm what I originally said... that's a potential "worst case" scenario (lots of not-completely-filled, non-sequential blocks) that can explain why things would be slow, and that would be normal... it doesn't mean it can't be faster if the IOPS are able to grab multiple sequential (completely filled) blocks.
How many IOPS is it actually pushing, anyway? What does zpool iostat say? Of course, iostat on the disks themselves may also be relevant, depending on how things go.

What kind of data? Large files? Databases? Small files?
Nailed it.


 

Ulysse_31

Dabbler
Joined
Aug 22, 2023
Messages
49
Nailed it.


Hello ^^

I have no issue with those definitions; I think the main issue here is the usage of some words, and some shortcuts that people may assume when hearing/reading something ...

I'll try to rephrase, again, just to be sure that no misunderstanding persists (again) ^^:
A given pool, by its setup/config, will have a given number of IOPS (Input/Output operations Per Second) in a given circumstance. Those IOPS can be quantified as a value: the inputs and outputs usable per second.
When operations (reading/writing) are made on that pool, that amount, the inputs and outputs per second done on the pool, will be spread across the drives that are inside the pool ... I could also say: during one second, inputs and outputs will happen across the drives composing that pool ...

That is why ... the IOPS of a pool are spread across the drives of that pool ... and that is exactly what I meant: not more, not less.
I did NOT mean that IOPS would be in some way increasing or cumulative or ... anything else.

Hope that helps remove some of the misunderstanding that people may have had reading my earlier posts ^^' (I really hope so ... I'm honestly getting tired of this subject).
 
Last edited:

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
The length is fine, but I would appreciate it if you could use CODE Tags instead of QUOTE next time, to preserve whitespace. I've edited your posts accordingly.

Now, as for interpreting the output... You're 100% IOPS-bound, but that's not a surprise. I'm not sure if there's an easy way to explicitly get figures for actual IOPS, rather than sampling the queue lengths. I guess we can examine it at the disk level with iostat -x.

If the disks show 100-200 IOPS each, that's a sign that the problem is that the pool is too fragmented (keep in mind that the fragmentation value reported by ZFS is only Free Space Fragmentation, which impacts write performance but does not really impact reads directly).
If the disks show 10-20 IOPS each, that's a sign that the problem is more on the hardware layer - with the prime suspect at this time being the HBA.
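Something along these lines would do for the disk-level check (a sketch only; the disk names are placeholders and the exact invocation depends on whether you're on CORE or SCALE):

Code:
# Sketch only - disk names (da0.../sdb...) are placeholders, adjust to your system.
# TrueNAS CORE (FreeBSD):
iostat -x -w 5 da0 da1 da2 da3
# TrueNAS SCALE (Linux, sysstat):
iostat -x sdb sdc sdd sde 5
# In both cases, compare the per-disk r/s column against the 100-200 vs 10-20 ranges above.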

As a point of comparison (sadly without direct iostat comparisons, because the machine is running a Pre-OpenZFS merger version), a pool with a single 12-wide RAIDZ2, with mostly large blocks - though with some non-performance-critical zvols thrown in, plus a lot of tiny-to-smallish files in the mix (maybe like 10 small files for each huge one) - with 10 TB HDDs and close to 80% full, is right now pushing well over 1.5 GB/s of issued scrub reads - and it's using up IOPS to fill up the scan buffer so that it can issue the reads sequentially. This corresponds to high 50s/low 60s read operations per second on every disk, plus a handful of write ops.

When operations (reading/writing) are made on that pool, that amount, the inputs and outputs per second done on the pool, will be spread across the drives that are inside the pool ... I could also say: during one second, inputs and outputs will happen across the drives composing that pool ...
They're not spread. One request from ZFS's higher layers will cause one request per disk, with all disks contributing to the same, single request. Yes, they're smaller, but that does not help you, since the disks are limited by how many requests they can serve, regardless of how large they are.
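You can see this directly in the per-disk breakdown; a quick sketch, with the pool name taken from your earlier output:

Code:
# Per-vdev and per-disk breakdown, 5-second samples. During the scrub, every
# leaf disk in the raidz2 vdev should show a similar read operations count,
# since the scrub reads every column, parity included.
zpool iostat -v STORAGE1 5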
 

Ulysse_31

Dabbler
Joined
Aug 22, 2023
Messages
49
So a quick update about the slow scrub currently running on our TrueNAS node with issues:
[screenshot: scrub status from the GUI]


The scrub speed is slowly decreasing (we started at around 50 MB/s ... we are now at 38.8 ...) ... no errors for now ...
The estimated time also keeps increasing ... this seems to be turning into a never-ending story ...
For now, I have no other hypothesis/clue than the card firmware (maybe ... but I will wait until the scrub ends to plan an export/import of the pool ... since it seems we should not be accessing disk data during flashing ...)
I really cannot understand such poor performance ... we have a non-TrueNAS server with the exact same data type and bigger data usage, with more than 5 times the performance of this one ...
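(For reference, the same progress figures can be read from the shell; just a sketch using standard zpool commands, assuming an OpenZFS version with scrub pause support:)

Code:
# Scrub progress from the shell - the scanned/issued totals and the estimated
# completion time are the same figures as in the GUI screenshot above.
zpool status -v STORAGE1
# If the scrub ever has to be interrupted before flashing the card, it can be
# paused and resumed later (it continues from the pause point, not from zero):
zpool scrub -p STORAGE1    # pause
zpool scrub STORAGE1       # resume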
 

Ulysse_31

Dabbler
Joined
Aug 22, 2023
Messages
49
They're not spread. One request from ZFS's higher layers will cause one request per disk, with all disks contributing to the same, single request. Yes, they're smaller, but that does not help you, since the disks are limited by how many requests they can serve, regardless of how large they are.
OMG ... do you mean that only one IO is made per second on a pool? or do you mean that in one second only one drive in the pool will be asked?

...

don't those two nonsense questions give you a hint about what I wrote?
 
Last edited:

Ulysse_31

Dabbler
Joined
Aug 22, 2023
Messages
49
The length is fine, but I would appreciate it if you could use CODE Tags instead of QUOTE next time, to preserve whitespace. I've edited your posts accordingly.
roger that, will do ^^
Now, as for interpreting the output... You're 100% IOPS-bound, but that's not a surprise. I'm not sure if there's an easy way to explicitly get figures for actual IOPS, rather than sampling the queue lengths. I guess we can examine it at the disk level with iostat -x.

If the disks show 100-200 IOPS each, that's a sign that the problem is that the pool is too fragmented (keep in mind that the fragmentation value reported by ZFS is only Free Space Fragmentation, which impacts write performance but does not really impact reads directly).
If the disks show 10-20 IOPS each, that's a sign that the problem is more on the hardware layer - with the prime suspect at this time being the HBA.
strange, since zpool status -v shows no frag ... As you said => it shows only free-space fragmentation ... OK, but how can a drive be fragmented on a CoW system that does not delete data?
[screenshot: pool overview showing no fragmentation]
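(For the record, the same figure can be read from the CLI; a small sketch, with the pool name assumed from earlier:)

Code:
# Free-space fragmentation as ZFS reports it - pool-wide and per-vdev.
zpool get fragmentation STORAGE1
zpool list -v STORAGE1    # FRAG column per vdev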


As a point of comparison (sadly without direct iostat comparisons, because the machine is running a Pre-OpenZFS merger version), a pool with a single 12-wide RAIDZ2, with mostly large blocks - though with some non-performance-critical zvols thrown in, plus a lot of tiny-to-smallish files in the mix (maybe like 10 small files for each huge one) - with 10 TB HDDs and close to 80% full, is right now pushing well over 1.5 GB/s of issued scrub reads - and it's using up IOPS to fill up the scan buffer so that it can issue the reads sequentially. This corresponds to high 50s/low 60s read operations per second on every disk, plus a handful of write ops.
UPDATE: sorry, I read a bit too quickly, apologies for that one ... OK, you meant a comparison ... again ... we are at a 44% used pool
 
Last edited:

Ulysse_31

Dabbler
Joined
Aug 22, 2023
Messages
49
The length is fine, but I would appreciate it if you could use CODE Tags instead of QUOTE next time, to preserve whitespace. I've edited your posts accordingly.

Now, as for interpreting the output... You're 100% IOPS-bound, but that's not a surprise. I'm not sure if there's an easy way to explicitly get figures for actual IOPS, rather than sampling the queue lengths. I guess we can examine it at the disk level with iostat -x.
hmmm ... not sure, but man zpool-iostat under TrueNAS does not mention a -x option ...?
[screenshot: zpool-iostat man page]

Oh ... you meant iostat directly this time ^^ let me look at it ^^
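(Side note, in case it helps: zpool-iostat itself has latency and request-size views, even without a -x; a sketch, pool name assumed from earlier:)

Code:
# Average latencies per vdev/disk (total_wait, disk_wait, queue waits ...):
zpool iostat -vl STORAGE1 5
# Request-size histograms, to see whether the scrub reads are tiny or aggregated:
zpool iostat -r STORAGE1 5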
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
OMG ... do you mean that only one IO is made per second on a pool? or do you mean that in one second only one drive in the pool will be asked?

...

don't those two nonsense questions give you a hint about what I wrote?
No, not at all. But one operation on the vdev means that every disk needs to do the same operation, which is why you have the same IOPS capability as a single disk.
strange, since zpool status -v shows no frag ... As you said => it shows only free-space fragmentation ... OK, but how can a drive be fragmented on a CoW system that does not delete data?
You're not wrong, in the sense that your case does not sound like it should be pathological. It's not completely out of the question, but it doesn't really fit the picture you've painted.
 

Ulysse_31

Dabbler
Joined
Aug 22, 2023
Messages
49
No, not at all. But one operation on the vdev means that every disk needs to do the same operation, which is why you have the same IOPS capability as a single disk.
But I never meant that a pool would have more, or less (or even anything related), IOPS capability than one drive! That IS my point!
I just said that the inputs and outputs per second happening on a pool during a system operation (like a scrub) are spread among the 12 drives that compose the pool. I never, ever talked about the quantity of IOPS.
I said that the "n" IOs happening per second (notably during the scrub operation) are spread between the 12 drives of the pool ...
Really sorry if you cannot understand what I'm trying to explain ... I did my best ... ^^

You're not wrong, in the sense that your case does not sound like it should be pathological. It's not completely out of the question, but it doesn't really fit the picture you've painted.
Could you be more explicit on this one ? what does not fit what ?
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Could you be more explicit on this one ? what does not fit what ?
Your description does not suggest a workload that would explain the poor performance.
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
@Ulysse_31 I have seen the first page of this thread and I suggest reading the following resource.
 

Ulysse_31

Dabbler
Joined
Aug 22, 2023
Messages
49
@Ulysse_31 I have seen the first page of this thread and I suggest reading the following resource.
Hi Davvo ^^

Since my account is not a "rookie" account anymore ^^' I'll be able to update that first post and add, as an UPDATE, some additional information that was discussed afterwards ^^
But let me summarize it here; I'll use the occasion to give even more information:
This server is a "droppin replacement" of an already existing zfs "backup" replication server: its config / shape / potential bottlenecks ... following our type of data : we setup the first "basic shape & config" of this server role in ... 2014 ... by "basic shape" I'm talking about the choice of using a poweredge server with a powervault SAS disk bay in raidz2, at that time, the OS was a freebsd 9, and our first tool scripts to make zfs syncs.
In 2017, since the production node was a solaris 11, and to avoid getting too distant versions of zfs, we decided to move the "zfs replication server" also to a solaris 11 OS.
So we built again, same structure, same shape : a poweredge server, a powervault drive bay, this time 12x 4Tb SAS drives, since this version worked just fine in tandem with the production server, we decided to prolong the lifetime of this last one a first time: when it arrived at 90% pool usage in 2019, we added a new powervault drive bay in daisy chain, and added a new raidz2 vdev to the pool. And we double down in 2021 with a third bay, and a third vdev, again, when it arrived at 90% pool usage.
With both hosts, either the freebsd or the solaris, we never had bottleneck issues during scrubs, those two servers did there job very well.
We recently moved the production server to a TrueNAS server, a M40-HA, in which we replicated the same data structure / data usage that was in the old production server also ... so the type of data is still the exact same type.
That is why we decided to give a try on a "zfs replication server" based on TrueNAS. So again, we took the same setup & hardware profile : a poweredge server, a powervault SAS drive bay, and 12x 12Tb drives this time.
Right now, on its last iteration, the Solaris 11 version, which is right now filled with 6 years of data retention, and 111Tb of data (85% pool usage) we are making scrubs with speeds of 489MBytes per sec.
I can totally understand that "depending on the data type and desired IO load, we need to select the pool & hardware setup accordingly", that of course makes total sense.
But we are talking here of an hardware setup and profile that was tested for our usage since pretty much time and we NEVER had issues ^^" so it is well sized for our usage.

I would like to add an extra side note: we have been using ZFS for ... a while now ... I touched my first ZFS filesystem on Solaris 8 (was it ... 2008? ^^' ) while it was still owned by Sun ... and here in the company I have worked for since 2014. We use ZFS in various other contexts; we discovered and had good and bad experiences (the good of snapshotting and replicating ... the bad of dedup :p ... stuff like that ...). We use ZFS under Linux with ZoL ... we use it in a Proxmox cluster environment ... we even use ZFS on OpenIndiana for an archival system ... and ... in all those years of usage ... AND within our specific usage, we never had a bottleneck like that during a scrub ...

But let me add an update to the first post to avoid other misleading conclusions ^^'
 
Last edited:

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
Well, Solaris ZFS is not OpenZFS, which TN is based on, so there could be differences.
 

Ulysse_31

Dabbler
Joined
Aug 22, 2023
Messages
49
Well, Solaris ZFS is not OpenZFS, which TN is based on, so there could be differences.
Nor is FreeBSD 9, right ^^' but I would not expect a 6-year-old version of ZFS (v31 / v28 FreeBSD / 5000 OpenZFS) to be more efficient for our "basic" usage ^^' than a current 2023 OpenZFS version ...
 

Ulysse_31

Dabbler
Joined
Aug 22, 2023
Messages
49
Nor is FreeBSD 9, right ^^' but I would not expect a 6-year-old version of ZFS (v31 / v28 FreeBSD / 5000 OpenZFS) to be more efficient for our "basic" usage ^^' than a current 2023 OpenZFS version ...
and this is also why I'm clearly more inclined to look for either a potential firmware issue (card or drive), a bad cable, or some other hardware-related issue ...
But my main concern here is the difficulty of finding a real error message => for now, apart from those messages from the SAS card, which are not errors but whose looping may be suspicious ... I do not have anything ...
if there were errors happening at read or write ... I would expect timeout errors somewhere ... oh well ... again, I'll have to wait for the scrub to end before planning a card firmware upgrade ...
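(In the meantime, these are the few places I know to look for read/write timeouts; a sketch only, the driver and device names depend on CORE vs SCALE and on your hardware:)

Code:
# SAS/HBA driver messages (mpr/mps on CORE; mpt3sas on SCALE):
dmesg | grep -iE 'mpr|mps|mpt3sas|timeout|retry'
# Per-drive error counters / defect lists (device name is a placeholder):
smartctl -a /dev/da0
# ZFS-level read/write/checksum error counters:
zpool status -v STORAGE1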
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
Hopefully this evening I will find the time to accurately read the complete thread and try to understand what's going on. Then I might be able to offer some help.
 