Best practices: Zabbix disk performance monitoring on TrueNAS SCALE 22.02

Girl_0dmin · Jan 16, 2023

Hello all!
I've added my TrueNAS system to my Zabbix Server 6.0 LTS using SNMP monitoring, but I can't see disk await (r_await, w_await) or disk queue (avgrq-sz, avgqu-sz) metrics like the ones iostat and sar report:

Code:

$ iostat -yzx 5
Linux 2.6.32-642.13.1.el6.x86_64 (vagrant1)     04/01/2017      _x86_64_        (1 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.20    0.00    0.00   99.80

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda               0.00     0.20    0.00    2.20     0.00    19.24     8.73     0.00    2.00    0.00    2.00   0.45   0.10
dm-0              0.00     0.00    0.00    2.40     0.00    19.24     8.00     0.01    2.17    0.00    2.17   0.42   0.10

Code:

$ sar -d
04:50:01 PM       DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
05:00:01 PM    dev8-0      0.12      0.00      0.92      7.67      0.00      1.21      0.90      0.01
05:00:01 PM  dev253-0      0.12      0.00      0.92      8.00      0.00      1.45      0.94      0.01
05:00:01 PM  dev253-1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
05:10:01 PM    dev8-0      0.14      0.00      1.07      7.90      0.00      1.05      0.80      0.01
05:10:01 PM  dev253-0      0.13      0.00      1.07      8.00      0.00      1.18      0.81      0.01
05:10:01 PM  dev253-1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
05:20:01 PM    dev8-0      0.11      0.00      0.79      7.26      0.00      1.52      1.05      0.01
05:20:01 PM  dev253-0      0.10      0.00      0.79      8.00      0.00      2.19      1.15      0.01
05:20:01 PM  dev253-1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
05:30:01 PM    dev8-0      0.12      0.00      0.97      7.89      0.00      1.22      0.93      0.01
05:30:01 PM  dev253-0      0.12      0.00      0.97      8.00      0.00      1.42      0.95      0.01
05:30:01 PM  dev253-1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
05:40:01 PM    dev8-0      0.12      0.00      0.84      7.20      0.00      0.96      0.77      0.01
05:40:01 PM  dev253-0      0.11      0.00      0.84      8.00      0.00      1.19      0.86      0.01
05:40:01 PM  dev253-1      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
05:50:01 PM    dev8-0      0.11      0.00      0.84      7.75      0.00      1.31      0.94      0.01
05:50:01 PM  dev253-0      0.11      0.00      0.84      8.00      0.00      2.03      0.97      0.01
05:50:01 PM  dev253-1      0.00      0.00      0.00      0.00      0.00      0.00

Is it possible to install zabbix-agent on TrueNAS SCALE 22.02? Is it necessary to uninstall zabbix-agent before upgrading TrueNAS SCALE from 22.02 to 22.12? What are the best practices?
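For reference, iostat derives r_await, w_await and avgqu-sz from per-device kernel counters that are also exposed in /sys/block/<dev>/stat (field layout in the kernel's Documentation/block/stat.rst), so once an agent runs on the box these values can be computed from two samples. The sketch below is only an illustration under those assumptions: the script name disk_await.sh and the helper disk_metrics are made up, and as far as I know Zabbix 6.0's own "Linux by Zabbix agent" template already builds similar r_await/w_await items from the same file, so it is worth checking that first.

```shell
#!/bin/bash
# Sketch: iostat-style r_await, w_await (ms) and avgqu-sz for one block device,
# computed from two samples of /sys/block/<dev>/stat. disk_metrics is hypothetical.

# Usage: disk_metrics "<first stat line>" "<second stat line>" <interval_ms>
# Prints: r_await w_await avgqu-sz
disk_metrics() {
    awk -v s1="$1" -v s2="$2" -v ms="$3" 'BEGIN {
        split(s1, a); split(s2, b)            # awk arrays are 1-based
        dr = b[1] - a[1]; drt = b[4] - a[4]   # reads completed, ms spent reading
        dw = b[5] - a[5]; dwt = b[8] - a[8]   # writes completed, ms spent writing
        dq = b[11] - a[11]                    # weighted ms requests spent queued
        printf "%.2f %.2f %.2f\n", (dr > 0 ? drt / dr : 0), (dw > 0 ? dwt / dw : 0), dq / ms
    }'
}

# Only sample real hardware when a device name is given, e.g. ./disk_await.sh sda
if [ -n "${1:-}" ]; then
    first=$(cat "/sys/block/$1/stat")
    sleep 5
    second=$(cat "/sys/block/$1/stat")
    # 5000 ms is the nominal interval; sleep overhead makes this approximate
    disk_metrics "$first" "$second" 5000
fi
```

If you wire something like this into a UserParameter, keep the sampling interval below the agent's Timeout (3 seconds by default in 6.0, as far as I know), or the item will time out.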

tessierp · Feb 26, 2023

I would also like to know the best step-by-step installation process for TrueNAS.

I have installed Zabbix on many Debian servers I own, and it is quite easy. apt-get doesn't seem to be available on TrueNAS, so the only way would be to wget the Zabbix agent package and install that. Although this is quite easy to do, I am wondering whether it is recommended; I just do not want to cause any issues with my TrueNAS system. Basic SNMP monitoring is already enabled; however, the only way to monitor CPU temperatures is by installing Zabbix Agent 2.

Has anyone done this, and is it recommended?

Perhaps adding Zabbix Agent 2 support is something the TrueNAS SCALE team could consider in the future.

Thanks

Basserra · Mar 16, 2023

No, we will not get Zabbix support unless you submit a feature request through JIRA and iX likes it (or you).
I have been running the official package from Zabbix on TNScale for months with no issues; it also works on TNCore (FreeBSD). I created a dataset holding the package, a custom conf file (for temperature monitoring and a custom address), and a script. It's a really simple script: install the repo, re-enable apt, install the package, disable apt again, link the custom conf file, restart the service, done. I set it as a 'post-init' script, it doesn't require any interaction, and I'm pretty sure it persisted through the 22.02 to 22.12 upgrade.

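#!/bin/bash
# Post-init script: (re)installs Zabbix Agent 2 from the release package kept on a dataset

# Install the zabbix-release package, which adds the Zabbix APT repository
dpkg -i /mnt/poolname/docker/zabbix/zabbix-release_6.0-4+debian11_all.deb

# apt is not executable on TrueNAS SCALE; re-enable it temporarily
chmod +x /usr/bin/apt*

apt update
apt reinstall -y zabbix-agent2

# Disable apt again
chmod -x /usr/bin/apt*

# Link the custom config into the agent's include directory (-f replaces an existing link on re-runs)
ln -sf /mnt/poolname/docker/zabbix/custom.conf /etc/zabbix/zabbix_agent2.d/

systemctl restart zabbix-agent2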
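As written, the script keeps going if a step fails, which is why the "disable apt" step always runs. If you would rather have it stop at the first failure (so it doesn't, for example, restart a half-installed agent), one option is set -e combined with an EXIT trap that still re-disables apt. This is a sketch of that variant, not the poster's script; DATASET, BIN_DIR, disable_apt and install_agent are placeholder names, and the --run flag is an assumption so the file does nothing unless asked.

```shell
#!/bin/bash
# Variant of the post-init script: abort on the first failing step (set -e)
# while an EXIT trap still makes apt non-executable again.
DATASET=${DATASET:-/mnt/poolname/docker/zabbix}   # placeholder dataset path
BIN_DIR=${BIN_DIR:-/usr/bin}                      # where the apt binaries live

disable_apt() {
    chmod a-x "$BIN_DIR"/apt*
}

install_agent() {
    set -e
    chmod a+x "$BIN_DIR"/apt*
    trap disable_apt EXIT   # fires on normal exit and on a set -e abort
    dpkg -i "$DATASET/zabbix-release_6.0-4+debian11_all.deb"
    apt update
    apt reinstall -y zabbix-agent2
    ln -sf "$DATASET/custom.conf" /etc/zabbix/zabbix_agent2.d/
    systemctl restart zabbix-agent2
}

# Only touch the system when explicitly asked, e.g. ./zabbix.sh --run
if [ "${1:-}" = "--run" ]; then
    install_agent
fi
```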

tessierp · Mar 17, 2023

Hi Basserra,

Thanks for sharing your script. The steps in there are pretty much the steps I take to install Zabbix on any Debian.

I'm not sure I follow the part about the dataset. Did you install the Zabbix configuration on that dataset as opposed to directly on the TrueNAS SCALE system?

Basserra · Mar 17, 2023

tessierp said:
I'm not sure I follow the part about the dataset. Did you install the Zabbix configuration on that dataset as opposed to directly on the TrueNAS SCALE system?

No, it gets installed to the OS through apt as usual, after TrueNAS finishes initializing. I downloaded the repository package from zabbix.com/download and keep that file in a dataset for persistence (it can be anywhere); dpkg installs from it, so you don't have to download it each time. You can upload it to TrueNAS from your PC or wget it from a TrueNAS SSH session. I also keep the script and a custom .conf file in there for organization.

├── poolname
│ ├── dataset
│ │ ├── zabbix-release_6.0-4+debian11_all.deb
│ │ ├── zabbix.sh
│ │ └── custom.conf

Below is a rough, unfinished conf file I use. Note that you can get the temperature readings (or any other info, for that matter) any way you want; this is just what I found in the past, and it has been working fine for me. You can run the commands yourself over SSH on a host to check whether they return the correct values, or just start with sensors to see everything. It's a rather simple sensors|grep|cut pipeline just to get the numbers. Then, in Zabbix, I had to manually 'Create Items' on each host, where the 'Key' matches the UserParameter=CPU.Temp.X name, and you can copy/paste °C into the 'Units' field to make it look pretty. I can provide a screenshot of the Zabbix UI for reference if you want. I've used this (or something similar) for temperatures on TNScale, TNCore, pfSense, and Proxmox. "Server=" is the IP range of my k8s containers, which is required on the TNScale host running Zabbix Server in a container; otherwise, it's the standard IP/port for other hosts.

### Standard zabbix.conf settings ###
SourceIP=10.1.0.71
### IP range of k8s containers, specific only to the host of the containers ###
Server=172.17.0.0/16
ListenIP=10.1.0.71
ServerActive=10.1.0.71:30410
Hostname=DataBass

### Custom Parameters for Temperatures ###
### Required for all below ###
UnsafeUserParameters=1

### AMD ###
UserParameter=CPU.Temp.ctl,sensors | grep "Tctl" | cut -c16-17
UserParameter=CPU.Temp.die,sensors | grep "Tdie" | cut -c16-17
UserParameter=CPU.Temp.ccd1,sensors | grep "Tccd1" | cut -c16-17
UserParameter=CPU.Temp.ccd3,sensors | grep "Tccd3" | cut -c16-17
UserParameter=CPU.Temp.ccd5,sensors | grep "Tccd5" | cut -c16-17
UserParameter=CPU.Temp.ccd7,sensors | grep "Tccd7" | cut -c16-17

### Intel ###
### Avg temp method 1 ###
#UserParameter=CPU.Temp.Avg,sensors | tail -n 5 | head -n 4 | awk -F'[:+°]' '{avg+=$3}END{print avg/NR}'
### Avg temp method 2 ###
#UserParameter=CPU.Temp.Avg,sensors | awk '/^Core /{++r; gsub(/[^[:digit:]]+/, "", $3); s+=$3} END{print s/(10*r)}'
#UserParameter=CPU.Temp.0,sensors | grep "Core 0" | cut -c17-19
#UserParameter=CPU.Temp.1,sensors | grep "Core 1" | cut -c17-19
#UserParameter=CPU.Temp.2,sensors | grep "Core 2" | cut -c17-19
#UserParameter=CPU.Temp.3,sensors | grep "Core 3" | cut -c17-19
#UserParameter=CPU.Temp.4,sensors | grep "Core 4" | cut -c17-19
#UserParameter=CPU.Temp.5,sensors | grep "Core 5" | cut -c17-19
#UserParameter=CPU.Temp.6,sensors | grep "Core 6" | cut -c17-19
#UserParameter=CPU.Temp.7,sensors | grep "Core 7" | cut -c17-19

### NVMe SSDs - UNFINISHED ###
UserParameter=nvme.temp.sabrent,nvme smart-log /dev/disk/by-id/nvme-Sabrent_serialNumber | grep -i '^temperature' | awk '{print $3}'
#UserParameter=nvme.temp.samsung,nvme smart-log /dev/disk/by-id/nvme-Samsung_SSD_950_PRO_256GB_serialNumber | grep -i '^temperature' | awk '{print $3}'

### SATA HDDs - UNFINISHED ###
#UserParameter=hdd.temp.1,hddtemp /dev/disk/by-id/ata-HGST_HUS726040ALE610_serialNumber | awk '{print $4}' | cut -c1-2

EDIT: I mistakenly hit Ctrl+Enter and posted before finishing the write up.
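The cut -c16-17 extraction above depends on the reading starting at a fixed column and keeps only two digits, so it would drop the decimal and break on readings of 100 °C or more. A sketch of a column-independent alternative, reusing the awk -F'[:+°]' split from "Avg temp method 1": splitting on ':', '+' and the degree sign makes field 3 the full reading. The helper names sensor_value and avg_core_temp are hypothetical, and the approach assumes the reading carries a '+' sign (negative readings such as -59.0°C would not parse).

```shell
#!/bin/bash
# Read `sensors` output on stdin. Splitting each line on ":", "+" and the degree
# sign turns "Tctl:         +30.2°C" into fields "Tctl", spaces, "30.2", ...

# Print the reading for one exact label (e.g. Tctl, Tccd1, "Core 0")
sensor_value() {
    awk -F'[:+°]' -v label="$1" '$1 == label { print $3; exit }'
}

# Average of all "Core N" readings (Intel coretemp lines)
avg_core_temp() {
    awk -F'[:+°]' '/^Core [0-9]+:/ { s += $3; n++ } END { if (n) print s / n }'
}
```

In the agent config the same awk program could be inlined, e.g. UserParameter=CPU.Temp.ctl,sensors | awk -F'[:+°]' '$1 == "Tctl" { print $3; exit }'. Running the command by hand over SSH first, as suggested above, is still the best check.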

tessierp · Mar 26, 2023

Thanks for sharing this.

This is more or less what I have done with my Proxmox servers. The only reason I haven't done so with my TrueNAS SCALE server is that I am concerned that "foreign" packages TrueNAS doesn't recognize, like the sensor tools and the Zabbix agent, could get in the way when TrueNAS upgrades itself in the future. Furthermore, my CPU is an AMD Ryzen 7 3700X, and I have to edit /etc/default/grub and add acpi_enforce_resources=lax to the kernel command line: GRUB_CMDLINE_LINUX_DEFAULT="quiet acpi_enforce_resources=lax". I guess that is the part that worries me the most.

Normally I would just try it, but I have important data on my NAS, and even though I make regular backups, I don't want to find myself rebuilding everything from scratch.

Basserra · Mar 26, 2023

Well, lm-sensors and nvme smart-log (nvme-cli) should already be part of TrueNAS, but hddtemp isn't, and you could always apt remove the Zabbix packages before an upgrade. I admit I hate needing the extra Zabbix agent plugin packages that come with the agent package, but it wasn't reporting correctly without them. I think there might be a Docker container for the Zabbix agent, which could be a fine, non-volatile option too. I also tried using Glances in a similar fashion to my Zabbix setup (it's also available as a container), but it wasn't reporting everything correctly for me either; maybe it was missing some AMD/EPYC compatibility. We have Netdata available through a container, which is the only 'supported' method besides SNMP (like a jail on TNCore), but I'm not keen on using containers to monitor the host. Finally, we might be able to have a post-init script build packages from source into a dataset and add/remove the services manually; that's just a thought I've had, and it seems too extreme. Or just script SSH commands to pull numbers into a fully custom setup with its own logging. I hope you can find something that suits your setup.
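For the "apt remove the Zabbix stuff before an upgrade" route, a sketch of a cleanup script that undoes the post-init script earlier in the thread: it removes every installed package whose name starts with "zabbix-" (agent, plugins and the zabbix-release repo package) plus the custom.conf symlink. The helper name installed_zabbix_pkgs, the --run flag and the symlink path are assumptions based on that script's layout; dpkg-query's ${db:Status-Abbrev} field is what marks a package as installed ("ii").

```shell
#!/bin/bash
# Pre-upgrade cleanup sketch: remove installed zabbix-* packages and the config link.
CONF_LINK=/etc/zabbix/zabbix_agent2.d/custom.conf   # assumed symlink location

# Filter `dpkg-query -W -f='${db:Status-Abbrev} ${Package}\n'` output (stdin)
# down to installed ("ii") packages whose names start with "zabbix-"
installed_zabbix_pkgs() {
    awk '$1 == "ii" && $2 ~ /^zabbix-/ { print $2 }'
}

# Only touch the system when explicitly asked, e.g. ./zabbix-remove.sh --run
if [ "${1:-}" = "--run" ]; then
    pkgs=$(dpkg-query -W -f='${db:Status-Abbrev} ${Package}\n' | installed_zabbix_pkgs)
    if [ -n "$pkgs" ]; then
        chmod a+x /usr/bin/apt*
        trap 'chmod a-x /usr/bin/apt*' EXIT   # re-disable apt however we exit
        apt remove -y $pkgs                   # unquoted on purpose: one argument per package
    fi
    rm -f "$CONF_LINK"
fi
```

Since the post-init script reinstalls everything from the dataset, running it again after the upgrade should bring the agent back.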

tessierp said:
I have to edit /etc/default/grub and add the following : GRUB_CMDLINE_LINUX_DEFAULT="quiet acpi_enforce_resources=lax"

It's not that important, but may I ask what this is needed for? I have a Ryzen 7 2700 running XCP-ng, but it uses very stock settings and I've never tried anything special with it. I just want that system as stable as possible.

tessierp · Mar 31, 2023

Sorry for the late answer. First, to answer your question about acpi_enforce_resources=lax, you will find the answer here: https://askubuntu.com/questions/1164206/lm-sensors-and-amd-ryzen-x570-chipset

I am using the X570D4U-2L2T motherboard in two of my servers, including my TrueNAS system, and I had trouble getting proper sensor readings without it. I'm not sure whether that is still the case with newer kernels. Anyway, the problem is that there is no /etc/default/grub file in TrueNAS SCALE, so adding my own GRUB_CMDLINE_LINUX_DEFAULT parameter wouldn't work; at least, I am not sure how I would do that on TrueNAS SCALE. In /etc/default/grub.d there is a truenas.cfg file, but if I add something to GRUB_CMDLINE_LINUX_DEFAULT there, it gets removed after a reboot. So I'm not sure how to get the sensor information; right now I get almost no data. Here is what is reported on my TrueNAS system:

k10temp-pci-00c3
Adapter: PCI adapter
Tctl: +30.2°C
Tccd1: +30.5°C

On my Proxmox server, where I can edit the grub file, I get a lot more after enabling lax:

nct6798-isa-0290
Adapter: ISA adapter
in0: 520.00 mV (min = +0.00 V, max = +1.74 V)
in1: 1.67 V (min = +0.00 V, max = +0.00 V) ALARM
in2: 3.33 V (min = +0.00 V, max = +0.00 V) ALARM
in3: 3.31 V (min = +0.00 V, max = +0.00 V) ALARM
in4: 1.79 V (min = +0.00 V, max = +0.00 V) ALARM
in5: 1.01 V (min = +0.00 V, max = +0.00 V) ALARM
in6: 1.22 V (min = +0.00 V, max = +0.00 V) ALARM
in7: 3.33 V (min = +0.00 V, max = +0.00 V) ALARM
in8: 3.18 V (min = +0.00 V, max = +0.00 V) ALARM
in9: 1.66 V (min = +0.00 V, max = +0.00 V) ALARM
in10: 392.00 mV (min = +0.00 V, max = +0.00 V) ALARM
in11: 352.00 mV (min = +0.00 V, max = +0.00 V) ALARM
in12: 1.02 V (min = +0.00 V, max = +0.00 V) ALARM
in13: 920.00 mV (min = +0.00 V, max = +0.00 V) ALARM
in14: 2.02 V (min = +0.00 V, max = +0.00 V) ALARM
fan1: 0 RPM (min = 0 RPM)
fan2: 0 RPM (min = 0 RPM)
fan3: 0 RPM (min = 0 RPM)
fan5: 0 RPM (min = 0 RPM)
fan7: 0 RPM (min = 0 RPM)
SYSTIN: -59.0°C (high = +80.0°C, hyst = +75.0°C) sensor = thermistor
CPUTIN: +99.5°C (high = +80.0°C, hyst = +75.0°C) ALARM sensor = thermistor
AUXTIN0: +15.0°C sensor = thermistor
AUXTIN1: +38.0°C sensor = thermistor
AUXTIN2: +42.0°C sensor = thermistor
AUXTIN3: -59.0°C sensor = thermistor
PCH_CHIP_CPU_MAX_TEMP: +0.0°C
PCH_CHIP_TEMP: +0.0°C
PCH_CPU_TEMP: +0.0°C
PCH_MCH_TEMP: +0.0°C
intrusion0: ALARM
intrusion1: ALARM
beep_enable: disabled

k10temp-pci-00c3
Adapter: PCI adapter
Tctl: +44.1°C
Tccd1: +43.5°C
Tccd2: +44.0°C

However, my fans are not reported here for some odd reason, even though I know they work, since my IPMI interface does report fan speeds.
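Since edits to /etc/default/grub.d/truenas.cfg get reverted, whatever route ends up setting acpi_enforce_resources=lax, it helps to confirm the option actually reached the running kernel after a reboot. /proc/cmdline holds the command line the kernel booted with; the sketch below just checks it for an exact token. The helper name has_kernel_opt is hypothetical.

```shell
#!/bin/bash
# Check whether an exact option (e.g. acpi_enforce_resources=lax) is on the
# command line the running kernel was booted with.
has_kernel_opt() {
    opt=$1
    file=${2:-/proc/cmdline}   # optional second argument: check a saved copy instead
    # One token per line, then match a whole line literally
    tr ' ' '\n' < "$file" | grep -qxF -- "$opt"
}

# Example: ./check_cmdline.sh acpi_enforce_resources=lax
if [ -n "${1:-}" ]; then
    if has_kernel_opt "$1"; then
        echo "$1: present"
    else
        echo "$1: missing"
    fi
fi
```

An exact-token match avoids false positives such as acpi_enforce_resources=strict matching a plain substring search for acpi_enforce_resources.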

What you suggest could be an option, provided there is a way to run a container on TrueNAS that has access to the system's sensors. Otherwise, the other options are less appealing: sure, I could uninstall the agent before an upgrade, but then I might forget, and this is the kind of thing I prefer not to have to worry about. And I do understand why we need the agent; it supports many more reporting options than SNMP, but unfortunately it is one more package to install on every machine.

IMO the best way would be for the TrueNAS SCALE team to include Zabbix support, but as you said, perhaps that will never happen.
