Troubleshooting an extremely slow pool with almost no disk I/O

Derkades

Dabbler
Joined
Aug 6, 2018
Messages
16
Usually when I transfer files to or from my raidz1 pool I get around 80 MB/s, with zpool iostat 1 reporting 500-4k IOPS. However, right now the pool seems to be very slow. I can't play back video smoothly and FTP transfers run at a few kilobytes per second. No scrub is running, and zpool iostat 1 reports almost no I/O, for example:
Code:
read operations: 41
write operations: 0
read bandwidth: 679K
write bandwidth: 37.6K


htop reports a system load of 0.68, memory usage of 8G/20G, and CPU usage of 0-10%.

zpool get fragmentation reports a fragmentation of 36%.

I can read/write at good speeds to my other pools.

What other factors should I consider? Everything looks fine to me, but there must be a reason why it's slow.

TL;DR: Low CPU usage, memory usage, and disk I/O. The pool is very slow (1-20 KB/s). Other pools with similar disk I/O (and obviously the same memory/CPU usage) are fine.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Please tell me what kind of drives you are using in the suspect pool and how those drives are connected to the system.
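If it helps, something along these lines from the FreeNAS shell should list the disks and which controller/port each one is attached to (the ada0 device name below is just an example; substitute your own devices):
Code:
# List attached disks and the controllers they hang off
camcontrol devlist

# Show model, serial and firmware for a single drive
smartctl -i /dev/ada0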

You might also want to look at zpool iostat -v, because it shows the I/O to individual drives. I used this a while back to identify slow drives in my pool that were slowing down the entire pool.
It looks like this:
Code:
zpool iostat -v
                                           capacity     operations    bandwidth
pool                                    alloc   free   read  write   read  write
--------------------------------------  -----  -----  -----  -----  -----  -----
Backup                                  15.7T  27.8T    647    661  76.5M  76.7M
  raidz1                                7.82T  13.9T    321    328  38.0M  38.1M
    gptid/2e919d3d-2c1a-11e9-af8b-00074306773b      -      -     38    101  13.1M  13.1M
    gptid/2f292da6-2c1a-11e9-af8b-00074306773b      -      -     32     75  13.0M  13.0M
    gptid/2fb95d07-2c1a-11e9-af8b-00074306773b      -      -     40    101  13.1M  13.1M
    gptid/30514e6b-2c1a-11e9-af8b-00074306773b      -      -     32     75  13.0M  13.0M
  raidz1                                7.92T  13.8T    326    332  38.5M  38.6M
    gptid/41d3312f-2c1a-11e9-af8b-00074306773b      -      -     34    103  13.2M  13.3M
    gptid/426b7b47-2c1a-11e9-af8b-00074306773b      -      -     30     76  13.2M  13.2M
    gptid/43029d18-2c1a-11e9-af8b-00074306773b      -      -     38    102  13.2M  13.3M
    gptid/43aa9a5b-2c1a-11e9-af8b-00074306773b      -      -     33     76  13.2M  13.2M
--------------------------------------  -----  -----  -----  -----  -----  -----
Emily                                   17.1T  26.4T    164     44  19.6M   497K
  raidz2                                9.42T  12.3T     91      7  10.8M   116K
    gptid/af7c42c6-bf05-11e8-b5f3-0cc47a9cd5a4      -      -     46      2  1.86M  37.8K
    gptid/b07bc723-bf05-11e8-b5f3-0cc47a9cd5a4      -      -     46      2  1.77M  38.7K
    gptid/b1893397-bf05-11e8-b5f3-0cc47a9cd5a4      -      -     47      2  1.80M  37.2K
    gptid/b2bfc678-bf05-11e8-b5f3-0cc47a9cd5a4      -      -     48      2  1.84M  37.7K
    gptid/b3c1849e-bf05-11e8-b5f3-0cc47a9cd5a4      -      -     46      2  1.76M  38.7K
    gptid/b4d16ad2-bf05-11e8-b5f3-0cc47a9cd5a4      -      -     47      2  1.81M  37.4K
  raidz2                                7.66T  14.1T     73     12  8.80M   141K
    gptid/bc1e50e5-c1fa-11e8-87f0-0cc47a9cd5a4      -      -     39      3  1.48M  51.0K
    gptid/a03dd690-c1fb-11e8-87f0-0cc47a9cd5a4      -      -     38      2  1.48M  47.5K
    gptid/a6ed2ed5-c240-11e8-87f0-0cc47a9cd5a4      -      -     38      2  1.48M  47.5K
    gptid/b9de3232-bf05-11e8-b5f3-0cc47a9cd5a4      -      -     39      3  1.44M  51.0K
    gptid/baf4aba8-bf05-11e8-b5f3-0cc47a9cd5a4      -      -     39      3  1.46M  47.3K
    gptid/bbf26621-bf05-11e8-b5f3-0cc47a9cd5a4      -      -     40      3  1.50M  47.4K
logs                                        -      -      -      -      -      -
  gptid/ae487c50-bec3-11e8-b1c8-0cc47a9cd5a4   116K  19.9G      0     25      1   240K
cache                                       -      -      -      -      -      -
  gptid/ae52d59d-bec3-11e8-b1c8-0cc47a9cd5a4  97.2G   902M      1      2  12.9K  2.33M
--------------------------------------  -----  -----  -----  -----  -----  -----
Irene                                   19.2T  24.3T     92      1  10.9M   161K
  raidz2                                9.54T  12.2T     46      0  5.43M  80.4K
    gptid/8710385b-becf-11e8-b1c8-0cc47a9cd5a4      -      -      3      0  1.36M  20.4K
    gptid/87e94156-becf-11e8-b1c8-0cc47a9cd5a4      -      -      2      0  1.36M  20.4K
    gptid/88db19ad-becf-11e8-b1c8-0cc47a9cd5a4      -      -      4      0  1.36M  20.4K
    gptid/89addd3b-becf-11e8-b1c8-0cc47a9cd5a4      -      -      4      0  1.36M  20.4K
    gptid/8a865453-becf-11e8-b1c8-0cc47a9cd5a4      -      -      2      0  1.36M  20.4K
    gptid/8b66b1ef-becf-11e8-b1c8-0cc47a9cd5a4      -      -      5      0  1.36M  20.4K
  raidz2                                9.64T  12.1T     46      0  5.49M  80.1K
    gptid/8c69bc72-becf-11e8-b1c8-0cc47a9cd5a4      -      -      4      0  1.37M  20.3K
    gptid/8d48655d-becf-11e8-b1c8-0cc47a9cd5a4      -      -      3      0  1.38M  20.3K
    gptid/8e2b6d1f-becf-11e8-b1c8-0cc47a9cd5a4      -      -      3      0  1.37M  20.3K
    gptid/8efea929-becf-11e8-b1c8-0cc47a9cd5a4      -      -      2      0  1.37M  20.3K
    gptid/8fd4d25c-becf-11e8-b1c8-0cc47a9cd5a4      -      -      4      0  1.38M  20.3K
    gptid/90c2759a-becf-11e8-b1c8-0cc47a9cd5a4      -      -      5      0  1.37M  20.3K
--------------------------------------  -----  -----  -----  -----  -----  -----
freenas-boot                            11.9G  25.4G      3     43  34.4K   287K
  mirror                                11.9G  25.4G      3     43  34.4K   287K
    gptid/f659fd6d-4b12-11e6-a97c-002590aecc79      -      -      0      8  28.1K   287K
    gptid/f6a61d33-4b12-11e6-a97c-002590aecc79      -      -      0      8  27.8K   287K
--------------------------------------  -----  -----  -----  -----  -----  -----
 

Derkades

Dabbler
Joined
Aug 6, 2018
Messages
16
I'm using 3TB WD Reds. The I/O stats don't seem to go up or down when I start or stop accessing files.

A few snapshots of zpool iostat -v:

Code:
  raidz1                                8.69T  2.19T     21      0   345K      0
    gptid/aaa06e3f-98ba-11e8-8396-00259050d50e      -      -      0      0      0      0
    gptid/ab7b2ffa-98ba-11e8-8396-00259050d50e      -      -      0      0      0      0
    gptid/ac62c788-98ba-11e8-8396-00259050d50e      -      -      4      0  79.4K      0
    gptid/ad521081-98ba-11e8-8396-00259050d50e      -      -      5      0   151K      0

Code:
  raidz1                                8.69T  2.19T     26      0   420K  3.88K
    gptid/aaa06e3f-98ba-11e8-8396-00259050d50e      -      -      1      0  11.9K      0
    gptid/ab7b2ffa-98ba-11e8-8396-00259050d50e      -      -      0      0  3.98K      0
    gptid/ac62c788-98ba-11e8-8396-00259050d50e      -      -      0      3  7.96K  15.9K
    gptid/ad521081-98ba-11e8-8396-00259050d50e      -      -      1      3  83.5K  15.9K
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Have you run diagnostics on these drives? I have a feeling something is wrong with at least one of them. A drive can be slow because of a defect but still pass diagnostics. I had to replace three drives in my pool last year because of that.
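If you have not run them lately, SMART long self-tests are the usual starting point. A rough sketch (ada0 is just an example device name; run one per disk and give them several hours to finish):
Code:
# Start a long (extended) self-test on one drive
smartctl -t long /dev/ada0

# Afterwards, check the test log plus reallocated/pending sector counts
smartctl -a /dev/ada0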

Also, a RAIDz1 pool with only one vdev is only going to be as fast as the slowest drive in that vdev, so a single slow drive will slow the entire pool. I am looking at gptid/ab7b2ffa-98ba-11e8-8396-00259050d50e with a lot of suspicion because of how low that number is, but the others have pretty low numbers too.
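To catch a slow drive in the act, you could also watch per-disk latency while generating some load. Something like this on the console (assuming the pool members show up as regular ada/da devices) highlights a disk whose ms/r or %busy is far above its siblings:
Code:
# Live per-disk I/O statistics; -p restricts output to physical providers (the raw disks)
gstat -p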

Would you share the output of zpool list please?
 

Derkades

Dabbler
Joined
Aug 6, 2018
Messages
16
After a system reboot (which took a very long time; my SSH connection was still open 10 minutes after running the reboot command), it's working again. Here's the output of zpool list and zpool status (note that this is from the working state, but the output is the same as when I ran it before, IIRC):
Code:
root@freenas:~ # zpool list
NAME           SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
Opslag        10.9T  8.68T  2.19T        -         -    25%    79%  1.00x  ONLINE  /mnt
SSD            220G  93.5G   127G        -         -    30%    42%  1.00x  ONLINE  /mnt
freenas-boot   111G   768M   110G        -         -      -     0%  1.00x  ONLINE  -
root@freenas:~ # zpool status
  pool: Opslag
 state: ONLINE
  scan: scrub canceled on Sun Feb 10 21:03:57 2019
config:

        NAME                                            STATE     READ WRITE CKSUM
        Opslag                                          ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            gptid/aaa06e3f-98ba-11e8-8396-00259050d50e  ONLINE       0     0     0
            gptid/ab7b2ffa-98ba-11e8-8396-00259050d50e  ONLINE       0     0     0
            gptid/ac62c788-98ba-11e8-8396-00259050d50e  ONLINE       0     0     0
            gptid/ad521081-98ba-11e8-8396-00259050d50e  ONLINE       0     0     0

errors: No known data errors

  pool: SSD
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:08:44 with 0 errors on Sun Feb 10 21:10:17 2019
config:

        NAME                                          STATE     READ WRITE CKSUM
        SSD                                           ONLINE       0     0     0
          gptid/34e4dccc-1d8b-11e9-a27f-00259050d50e  ONLINE       0     0     0

errors: No known data errors

  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:00:06 with 0 errors on Thu Feb  7 03:45:07 2019
config:

        NAME        STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          ada5p2    ONLINE       0     0     0

errors: No known data errors
 

l@e

Contributor
Joined
Nov 4, 2013
Messages
143
Check your scrub time. Maybe the system was scrubbing the pool. That could explain the low performance on the client side.
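For example, something like this would confirm it either way (Opslag is the pool name from the output above):
Code:
# The scan: line shows whether a scrub is running, and when it started or finished
zpool status Opslag | grep scan

# If a scrub is running and you want it out of the way for now
zpool scrub -s Opslag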
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
I see a potential problem with this:
Code:
root@freenas:~ # zpool list
NAME           SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
Opslag        10.9T  8.68T  2.19T        -         -    25%    79%  1.00x  ONLINE  /mnt

The pool capacity is at 79%, and performance will drop significantly as it nears 90% and fall off a cliff at 90%. It is definitely time to consider expanding your storage.
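If you want to see where the space is going before deciding whether to expand or prune, something along these lines breaks usage down per dataset, including space held only by snapshots (Opslag is your pool name from zpool list):
Code:
# The USEDSNAP column shows space that only snapshots are holding on to
zfs list -o space -r Opslag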
What is zpool status -v showing now? No change?
 

Derkades

Dabbler
Joined
Aug 6, 2018
Messages
16
Check your scrub time. Maybe the system was scrubbing the pool. That could explain the low performance on the client side.
Unfortunately that's not the issue; it's currently 01:07 (AM) and I stopped the scrub at 21:03 (09:03 PM).

I see a potential problem with this:
Code:
root@freenas:~ # zpool list
NAME           SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
Opslag        10.9T  8.68T  2.19T        -         -    25%    79%  1.00x  ONLINE  /mnt

The pool capacity is at 79%, and performance will drop significantly as it nears 90% and fall off a cliff at 90%. It is definitely time to consider expanding your storage.
What is zpool status -v showing now? No change?
No change. Keep in mind that the status I posted earlier was from after the reboot, when everything was working again. I am indeed a bit low on storage. I don't have much time to go through my files and delete stuff right now, so I am just keeping it under 80% for now.
 

Derkades

Dabbler
Joined
Aug 6, 2018
Messages
16
I just remembered that I did a zfs send/receive from my main pool to my SSD pool, which I cancelled shortly afterwards (Ctrl+C) when I realised that the zvol I was trying to move was 300 GB (larger than the entire SSD). It didn't throw any errors, but it might have "left ZFS in a weird state". This happened right around the time I started experiencing the issue.
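To rule that out, I suppose I could look for leftover receive state on the target pool. A rough sketch of what I have in mind (SSD is the pool I was receiving into; the receive_resume_token check only applies if the receive was started with -s):
Code:
# Look for an unexpected partial dataset left behind on the target pool
zfs list -t all -r SSD

# If the receive was resumable (-s), this property would hold a resume token
zfs get -r receive_resume_token SSD

# A lingering resumable receive can be discarded with: zfs receive -A SSD/<dataset>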
 

l@e

Contributor
Joined
Nov 4, 2013
Messages
143
Can you try a dd test and check the behavior on that pool?
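For example, something roughly like this (the path is only an example location on the pool, lz4 compression will make the /dev/zero write look faster than it really is, and the read may be served partly from ARC if the file fits in RAM):
Code:
# Sequential write test: ~10 GB of zeroes into a scratch file on the pool
dd if=/dev/zero of=/mnt/Opslag/ddtest bs=1m count=10000

# Sequential read test of the same file
dd if=/mnt/Opslag/ddtest of=/dev/null bs=1m

# Clean up the scratch file afterwards
rm /mnt/Opslag/ddtest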
 