Prometheus
2024-10-24
Download and install
Go to https://prometheus.io/download/ and download the latest version.
export PROM_VER="2.54.0"
wget "https://github.com/prometheus/prometheus/releases/download/v${PROM_VER}/prometheus-${PROM_VER}.linux-amd64.tar.gz"
Verify the checksum is correct.
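The release page publishes a sha256sums.txt file alongside the tarball; assuming that file, the checksum can be verified like this:
wget "https://github.com/prometheus/prometheus/releases/download/v${PROM_VER}/sha256sums.txt"
grep "prometheus-${PROM_VER}.linux-amd64.tar.gz" sha256sums.txt | sha256sum -c -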
Unpack the tarball:
tar xvfz prometheus-*.tar.gz
rm prometheus-*.tar.gz
Create two directories for Prometheus to use: /etc/prometheus for configuration files and /var/lib/prometheus for application data.
sudo mkdir /etc/prometheus /var/lib/prometheus
Move the prometheus and promtool binaries to /usr/local/bin:
cd prometheus-*
sudo mv prometheus promtool /usr/local/bin
Move the configuration file to the configuration directory:
sudo mv prometheus.yml /etc/prometheus/prometheus.yml
Move the remaining files to their appropriate directories:
sudo mv consoles/ console_libraries/ /etc/prometheus/
Verify that Prometheus is installed:
prometheus --version
Configure prometheus.service
Create a prometheus user and assign ownership to directories:
sudo useradd -rs /bin/false prometheus
sudo chown -R prometheus: /etc/prometheus /var/lib/prometheus
Save the following contents to a file at /etc/systemd/system/prometheus.service:
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target
[Service]
User=prometheus
Group=prometheus
Type=simple
Restart=on-failure
RestartSec=5s
ExecStart=/usr/local/bin/prometheus \
--config.file /etc/prometheus/prometheus.yml \
--storage.tsdb.path /var/lib/prometheus/ \
--web.console.templates=/etc/prometheus/consoles \
--web.console.libraries=/etc/prometheus/console_libraries \
--web.listen-address=0.0.0.0:9090 \
--web.enable-lifecycle \
--log.level=info
[Install]
WantedBy=multi-user.target
Reload the system daemons:
sudo systemctl daemon-reload
Start and enable prometheus.service:
sudo systemctl enable --now prometheus.service
For systems running SELinux, the following policy settings must be applied. Save the policy to a file named prometheus.te:
module prometheus 1.0;
require {
type init_t;
type websm_port_t;
type user_home_t;
type unreserved_port_t;
type hplip_port_t;
class file { execute execute_no_trans map open read };
class tcp_socket name_connect;
}
#============= init_t ==============
allow init_t hplip_port_t:tcp_socket name_connect;
allow init_t unreserved_port_t:tcp_socket name_connect;
allow init_t user_home_t:file { execute execute_no_trans map open read };
allow init_t websm_port_t:tcp_socket name_connect;
Now compile and import the module:
sudo checkmodule -M -m -o prometheus.mod prometheus.te
sudo semodule_package -o prometheus.pp -m prometheus.mod
sudo semodule -i prometheus.pp
Restart prometheus.service. If it does not start, ensure all SELinux policies have been applied by generating a module from the audit log:
sudo grep "prometheus" /var/log/audit/audit.log | sudo audit2allow -M prometheus
sudo semodule -i prometheus.pp
Restart prometheus.service again:
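sudo systemctl restart prometheus.service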
The Prometheus web interface and dashboard should now be browsable at http://localhost:9090
Install and configure Node Exporter on each client using Ansible
Install the prometheus.prometheus collection from Ansible Galaxy:
ansible-galaxy collection install prometheus.prometheus
Ensure you have an inventory file listing the clients to set up Node Exporter on.
---
prometheus-clients:
  hosts:
    host0:
      ansible_user: user0
      ansible_host: host0 ip address or hostname
      ansible_python_interpreter: /usr/bin/python3
    host1:
      ...
    host2:
      ...
Create node_exporter-setup.yml:
---
- hosts: prometheus-clients
  tasks:
    - name: Import the node_exporter role
      import_role:
        name: prometheus.prometheus.node_exporter
The default values for the node_exporter role variables should be fine.
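If you do need to override something, for example to pin a release or change the listen address, the role accepts variables. The variable names below follow the role's documented defaults but are shown here as an assumption; check them against your installed collection version:
- hosts: prometheus-clients
  tasks:
    - name: Import the node_exporter role
      import_role:
        name: prometheus.prometheus.node_exporter
      vars:
        node_exporter_version: "1.8.2"                     # assumed variable name: pin a specific release
        node_exporter_web_listen_address: "0.0.0.0:9100"   # assumed variable name: listen address and port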
Run ansible-playbook.
ansible-playbook -i inventory.yml node_exporter-setup.yml
Node Exporter should now be installed, started, and enabled on each host in the prometheus-clients group of the inventory.
To confirm that statistics are being collected on each host, navigate to http://host_url:9100. A page entitled Node Exporter should be displayed containing a link for Metrics. Click the link and confirm that statistics are being collected.
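The same check can be done from the command line; this simply prints the first few lines of the metrics endpoint:
curl -s http://host_url:9100/metrics | head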
Note that each node_exporter host must be accessible through the firewall on port 9100. Firewalld can be configured for the internal zone on each host.
sudo firewall-cmd --zone=internal --permanent --add-source=<my_ip_addr>
sudo firewall-cmd --zone=internal --permanent --add-port=9100/tcp
sudo firewall-cmd --reload
Note: I have to configure the internal zone on Firewalld to allow traffic from my IP address on ports HTTP, HTTPS, SSH, and 1965 in order to access, for example, my web services on the node_exporter host.
Install Node Exporter on FreeBSD
As of FreeBSD 14.1-RELEASE, the version of Node Exporter available, v1.6.1, is outdated. To install the latest version, ensure the ports tree is checked out before running the commands below.
sudo cp -v /usr/ports/sysutils/node_exporter/files/node_exporter.in /usr/local/etc/rc.d/node_exporter
sudo chmod +x /usr/local/etc/rc.d/node_exporter
sudo chown root:wheel /usr/local/etc/rc.d/node_exporter
sudo pkg install gmake go
Download the latest release’s source code from https://github.com/prometheus/node_exporter. Unpack the tarball.
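For example, fetching the v1.8.2 source archive using GitHub's standard tag-archive URL:
fetch https://github.com/prometheus/node_exporter/archive/refs/tags/v1.8.2.tar.gz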
tar xvf v1.8.2.tar.gz
cd node_exporter-1.8.2
gmake build
sudo mv node_exporter /usr/local/bin/
sudo chown root:wheel /usr/local/bin/node_exporter
sudo sysrc node_exporter_enable="YES"
sudo service node_exporter start
Configure Prometheus to monitor the client nodes
Edit /etc/prometheus/prometheus.yml. My Prometheus configuration looks like this:
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: "remote_collector"
    scrape_interval: 10s
    static_configs:
      - targets: ["hyperreal.coffee:9100", "box.moonshadow.dev:9100", "10.0.0.26:9100", "bttracker.nirn.quest:9100"]
The remote_collector job scrapes metrics from each of the hosts running node_exporter. Ensure that port 9100 is open in the firewall, and if it is a public-facing node, ensure that port 9100 can only be accessed from my IP address.
Configure Prometheus to monitor qBittorrent client nodes
For each qBittorrent instance you want to monitor, set up a Docker or Podman container with https://github.com/caseyscarborough/qbittorrent-exporter. The containers run on the machine running Prometheus, so they are accessible at localhost. Let’s say I have three qBittorrent instances I want to monitor.
podman run -d \
  --name=qbittorrent-exporter-0 \
  -e QBITTORRENT_USERNAME=username0 \
  -e QBITTORRENT_PASSWORD=password0 \
  -e QBITTORRENT_BASE_URL=http://localhost:8080 \
  -p 17871:17871 \
  --restart=always \
  caseyscarborough/qbittorrent-exporter:latest

podman run -d \
  --name=qbittorrent-exporter-1 \
  -e QBITTORRENT_USERNAME=username1 \
  -e QBITTORRENT_PASSWORD=password1 \
  -e QBITTORRENT_BASE_URL=https://qbittorrent1.tld \
  -p 17872:17871 \
  --restart=always \
  caseyscarborough/qbittorrent-exporter:latest

podman run -d \
  --name=qbittorrent-exporter-2 \
  -e QBITTORRENT_USERNAME=username2 \
  -e QBITTORRENT_PASSWORD=password2 \
  -e QBITTORRENT_BASE_URL=https://qbittorrent2.tld \
  -p 17873:17871 \
  --restart=always \
  caseyscarborough/qbittorrent-exporter:latest
Using systemd quadlets
[Unit]
Description=qbittorrent-exporter
After=network-online.target
[Container]
Image=docker.io/caseyscarborough/qbittorrent-exporter:latest
ContainerName=qbittorrent-exporter
Environment=QBITTORRENT_USERNAME=username
Environment=QBITTORRENT_PASSWORD=password
Environment=QBITTORRENT_BASE_URL=http://localhost:8080
PublishPort=17871:17871
[Install]
WantedBy=multi-user.target default.target
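Assuming the unit above is saved as qbittorrent-exporter.container, place it where Podman's systemd generator looks for quadlets (/etc/containers/systemd/ for rootful, ~/.config/containers/systemd/ for rootless), then reload systemd and start the generated service. The file name and paths here are the usual quadlet conventions, not taken from the original setup:
sudo mv qbittorrent-exporter.container /etc/containers/systemd/
sudo systemctl daemon-reload
sudo systemctl start qbittorrent-exporter.service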
Now add this to the scrape_configs section of /etc/prometheus/prometheus.yml to configure Prometheus to scrape these metrics:
- job_name: "qbittorrent"
static_configs:
- targets: ["localhost:17871", "localhost:17872", "localhost:17873"]
Monitor Caddy with Prometheus and Loki
Caddy: metrics activation
Add the metrics global option and ensure the admin endpoint is enabled:
{
    admin 0.0.0.0:2019
    servers {
        metrics
    }
}
Restart Caddy:
sudo systemctl restart caddy
sudo systemctl status caddy
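To confirm that metrics are being served, query the admin endpoint (assuming it is listening on port 2019 as configured above):
curl -s http://localhost:2019/metrics | head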
Caddy: logs activation
I have my Caddy configuration modularized with /etc/caddy/Caddyfile being the central file. It looks something like this:
{
    admin 0.0.0.0:2019
    servers {
        metrics
    }
}
## hyperreal.coffee
import /etc/caddy/anonoverflow.caddy
import /etc/caddy/breezewiki.caddy
import /etc/caddy/cdn.caddy
...
Each file that is imported is a virtual host that has its own separate configuration and corresponds to a subdomain of hyperreal.coffee. I have logging disabled on most of them except the ones for which troubleshooting with logs would be convenient, such as the one for my Mastodon instance. For /etc/caddy/fedi.caddy, I’ve added these lines to enable logging:
fedi.hyperreal.coffee {
    log {
        output file /var/log/caddy/fedi.log {
            roll_size 100MiB
            roll_keep 5
            roll_keep_for 100d
        }
        format json
        level INFO
    }
}
Restart Caddy:
sudo systemctl restart caddy
sudo systemctl status caddy
Ensure port 2019 can only be accessed by my IP address, using Firewalld’s internal zone:
sudo firewall-cmd --zone=internal --permanent --add-port=2019/tcp
sudo firewall-cmd --reload
sudo firewall-cmd --info-zone=internal
Add the Caddy configuration to the scrape_configs section of /etc/prometheus/prometheus.yml:
- job_name: "caddy"
static_configs:
- targets: ["hyperreal.coffee:2019"]
Restart Prometheus on the monitor host:
sudo systemctl restart prometheus.service
Loki and Promtail setup
On the node running Caddy, install the loki and promtail packages:
sudo apt install -y loki promtail
Edit the Promtail configuration file at /etc/promtail/config.yml and add a caddy job under scrape_configs:
  - job_name: caddy
    static_configs:
      - targets:
          - localhost
        labels:
          job: caddy
          __path__: /var/log/caddy/*.log
          agent: caddy-promtail
    pipeline_stages:
      - json:
          expressions:
            duration: duration
            status: status
      - labels:
          duration:
          status:
The entire Promtail configuration should look like this:
# This minimal config scrape only single log file.
# Primarily used in rpm/deb packaging where promtail service can be started during system init process.
# And too much scraping during init process can overload the complete system.
# https://github.com/grafana/loki/issues/11398
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://localhost:3100/loki/api/v1/push

scrape_configs:
  - job_name: system
    static_configs:
      - targets:
          - localhost
        labels:
          job: varlogs
          #NOTE: Need to be modified to scrape any additional logs of the system.
          __path__: /var/log/messages
  - job_name: caddy
    static_configs:
      - targets:
          - localhost
        labels:
          job: caddy
          __path__: /var/log/caddy/*log
          agent: caddy-promtail
    pipeline_stages:
      - json:
          expressions:
            duration: duration
            status: status
      - labels:
          duration:
          status:
Restart Promtail and Loki services:
sudo systemctl restart promtail
sudo systemctl restart loki
To ensure that the promtail user has permissions to read caddy logs:
sudo usermod -aG caddy promtail
sudo chmod g+r /var/log/caddy/*.log
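Group membership changes only apply to newly started processes, so restart Promtail once more after adding it to the caddy group:
sudo systemctl restart promtail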
The Prometheus dashboard should now show the Caddy target with a state of “UP”.
Monitor Tor node
Edit /etc/tor/torrc to add the metrics configuration. x.x.x.x is the IP address where Prometheus is running.
## Prometheus exporter
MetricsPort 0.0.0.0:9035 prometheus
MetricsPortPolicy accept x.x.x.x
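Restart Tor so the MetricsPort settings take effect (the unit name is assumed to be tor; on some distributions it is tor@default):
sudo systemctl restart tor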
Configure Firewalld to allow inbound traffic to port 9035 on the internal zone. Ensure the internal zone’s source is the IP address of the server where Prometheus is running. Ensure port 443 is accessible from the Internet on Firewalld’s public zone.
sudo firewall-cmd --zone=internal --permanent --add-source=x.x.x.x
sudo firewall-cmd --zone=internal --permanent --add-port=9035/tcp
sudo firewall-cmd --zone=public --permanent --add-service=https
sudo firewall-cmd --reload
Edit /etc/prometheus/prometheus.yml to add the Tor config. y.y.y.y is the IP address where Tor is running.
scrape_configs:
  - job_name: "tor-relay"
    static_configs:
      - targets: ["y.y.y.y:9035"]
Restart Prometheus.
sudo systemctl restart prometheus.service
Go to Grafana and import tor_stats.json as a new dashboard, using the Prometheus datasource.
Monitor Synapse homeserver
On the server running Synapse, edit /etc/matrix-synapse/homeserver.yaml to enable metrics:
enable_metrics: true
Add a new listener to /etc/matrix-synapse/homeserver.yaml for Prometheus metrics:
listeners:
  - port: 9400
    type: metrics
    bind_addresses: ['0.0.0.0']
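Restart Synapse so the new listener comes up (matrix-synapse is the unit name used by the Debian packages; adjust if your installation differs):
sudo systemctl restart matrix-synapse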
On the server running Prometheus, add a target for Synapse.
- job_name: "synapse"
scrape_interval: 1m
metrics_path: "/_synapse/metrics"
static_configs:
- targets: ["hyperreal:9400"]
Also add the Synapse recording rules.
rule_files:
  - /etc/prometheus/synapse-v2.rules
On the server running Prometheus, download the Synapse recording rules.
sudo wget https://files.hyperreal.coffee/prometheus/synapse-v2.rules -O /etc/prometheus/synapse-v2.rules
Restart Prometheus.
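sudo systemctl restart prometheus.service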
Use synapse.json for Grafana dashboard.
Monitor Elasticsearch
On the host running Elasticsearch, download the latest elasticsearch_exporter release from the prometheus-community/elasticsearch_exporter GitHub releases.
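For example (the version shown here is illustrative; substitute the latest release):
wget https://github.com/prometheus-community/elasticsearch_exporter/releases/download/v1.8.0/elasticsearch_exporter-1.8.0.linux-amd64.tar.gz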
tar xvf elasticsearch_exporter*.tar.gz
cd elasticsearch_exporter*/
sudo cp -v elasticsearch_exporter /usr/local/bin/
Create /etc/systemd/system/elasticsearch_exporter.service:
[Unit]
Description=elasticsearch exporter
After=network.target
[Service]
Restart=always
User=prometheus
ExecStart=/usr/local/bin/elasticsearch_exporter --es.uri=http://localhost:9200
ExecReload=/bin/kill -HUP $MAINPID
TimeoutStopSec=20s
SendSIGKILL=no
[Install]
WantedBy=multi-user.target
Reload the daemons and enable/start elasticsearch_exporter.
sudo systemctl daemon-reload
sudo systemctl enable --now elasticsearch_exporter.service
Ensure port 9114 is allowed in Firewalld’s internal zone.
sudo firewall-cmd --permanent --zone=internal --add-port=9114/tcp
sudo firewall-cmd --reload
If using Tailscale, ensure the host running Prometheus can access port 9114 on the host running Elasticsearch.
On the host running Prometheus, download the elasticsearch.rules.
wget https://raw.githubusercontent.com/prometheus-community/elasticsearch_exporter/refs/heads/master/examples/prometheus/elasticsearch.rules.yml
sudo mv elasticsearch.rules.yml /etc/prometheus/
Edit /etc/prometheus/prometheus.yml to add the elasticsearch_exporter config:
rule_files:
  - "/etc/prometheus/elasticsearch.rules.yml"
...
...
  - job_name: "elasticsearch_exporter"
    static_configs:
      - targets: ["hyperreal:9114"]
Restart Prometheus.
sudo systemctl restart prometheus.service
For a Grafana dashboard, copy the contents of the file located here: https://files.hyperreal.coffee/grafana/elasticsearch.json.
Use HTTPS with Tailscale
If the Tailscale certificate has already been generated, skip this step.
sudo tailscale cert HOSTNAME.TAILNET.ts.net
sudo mkdir /etc/tailscale-ssl-certs
sudo mv HOSTNAME.TAILNET.ts.net.crt HOSTNAME.TAILNET.ts.net.key /etc/tailscale-ssl-certs/
sudo chown -R root:root /etc/tailscale-ssl-certs
Ensure the prometheus.service systemd file contains the --web.config.file flag:
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target
[Service]
User=prometheus
Group=prometheus
Type=simple
Restart=on-failure
RestartSec=5s
ExecStart=/usr/local/bin/prometheus \
--config.file /etc/prometheus/prometheus.yml \
--storage.tsdb.path /var/lib/prometheus/ \
--web.console.templates=/etc/prometheus/consoles \
--web.console.libraries=/etc/prometheus/console_libraries \
--web.listen-address=0.0.0.0:9090 \
--web.enable-lifecycle \
--web.config.file /etc/prometheus/web.yml \
--log.level=info
[Install]
WantedBy=multi-user.target
Create the file /etc/prometheus/web.yml:
tls_server_config:
  cert_file: /etc/prometheus/prometheus.crt
  key_file: /etc/prometheus/prometheus.key
Copy the cert and key to /etc/prometheus:
sudo cp -v /etc/tailscale-ssl-certs/HOSTNAME.TAILNET.ts.net.crt /etc/prometheus/prometheus.crt
sudo cp -v /etc/tailscale-ssl-certs/HOSTNAME.TAILNET.ts.net.key /etc/prometheus/prometheus.key
Ensure the permissions are correct on the web config, cert, and key.
sudo chown prometheus:prometheus /etc/prometheus/web.yml
sudo chown prometheus:prometheus /etc/prometheus/prometheus.crt
sudo chown prometheus:prometheus /etc/prometheus/prometheus.key
sudo chmod 644 /etc/prometheus/prometheus.crt
sudo chmod 644 /etc/prometheus/prometheus.key
Reload the daemons and restart Prometheus.
sudo systemctl daemon-reload
sudo systemctl restart prometheus.service
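To verify that Prometheus is now serving over HTTPS, query its health endpoint through the Tailscale hostname:
curl https://HOSTNAME.TAILNET.ts.net:9090/-/healthy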