Monitoring ZFS with Grafana

Date: 2026-02-08
Host: grafana
Component: Prometheus + Grafana + exporters + ZFS
Scope: ZFS monitoring

ZFS / Zpool Metrics for Prometheus (node_exporter + textfile collector on FreeBSD)

This guide sets up zpool capacity metrics (zpool list) for Prometheus/Grafana using node_exporter’s textfile collector on FreeBSD.

This complements (not replaces) zfs_exporter, which provides ARC/internal ZFS metrics but not pool capacity.


What I Got Working

Metrics exposed to Prometheus:

  • zpool_size_bytes{pool="zroot"}
  • zpool_alloc_bytes{pool="zroot"}
  • zpool_free_bytes{pool="zroot"}
  • zpool_capacity_ratio{pool="zroot"}

These map directly to:

zpool list

1. Ensure node_exporter is enabled with textfile collector

/etc/rc.conf

node_exporter_enable="YES"
node_exporter_args="--collector.textfile.directory=/var/db/node_exporter/textfile_collector"

Create the directory first so the collector has something to read:

mkdir -p /var/db/node_exporter/textfile_collector

Then restart:

service node_exporter restart

2. Create the Zpool Prometheus script

Save as:

/usr/local/bin/zpool_prom.sh
#!/bin/sh

OUT="/var/db/node_exporter/textfile_collector/zpool.prom.$$"
FINAL="/var/db/node_exporter/textfile_collector/zpool.prom"

# zpool list -Hp columns:
# name size alloc free ckpoint expandsz frag cap dedup health altroot

zpool list -Hp | awk '
BEGIN {
  print "# HELP zpool_size_bytes ZFS pool total size in bytes"
  print "# TYPE zpool_size_bytes gauge"
  print "# HELP zpool_alloc_bytes ZFS pool allocated bytes"
  print "# TYPE zpool_alloc_bytes gauge"
  print "# HELP zpool_free_bytes ZFS pool free bytes"
  print "# TYPE zpool_free_bytes gauge"
  print "# HELP zpool_capacity_ratio ZFS pool capacity used as ratio (0-1)"
  print "# TYPE zpool_capacity_ratio gauge"
}
{
  pool=$1; size=$2; alloc=$3; free=$4; cap=$8;
  gsub(/%/,"",cap);
  printf "zpool_size_bytes{pool=\"%s\"} %s\n", pool, size;
  printf "zpool_alloc_bytes{pool=\"%s\"} %s\n", pool, alloc;
  printf "zpool_free_bytes{pool=\"%s\"} %s\n", pool, free;
  printf "zpool_capacity_ratio{pool=\"%s\"} %.6f\n", pool, cap/100.0;
}' > "$OUT" && mv "$OUT" "$FINAL"

Make executable:

chmod +x /usr/local/bin/zpool_prom.sh

Test once:

/usr/local/bin/zpool_prom.sh
cat /var/db/node_exporter/textfile_collector/zpool.prom
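You can also sanity-check the awk parsing in isolation by feeding it a fabricated `zpool list -Hp` line (all numbers below are made up; only the capacity-ratio line is shown):

```shell
# Fake one tab-separated zpool list -Hp row:
# name size alloc free ckpoint expandsz frag cap dedup health altroot
printf 'zroot\t1000\t250\t750\t-\t-\t5\t25\t1.00x\tONLINE\t-\n' |
  awk '{ printf "zpool_capacity_ratio{pool=\"%s\"} %.6f\n", $1, $8/100.0 }'
# prints: zpool_capacity_ratio{pool="zroot"} 0.250000
```

If the ratio comes out wrong here, the column order on your OpenZFS version differs and the script's field numbers need adjusting.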

3. Schedule the script (cron)

crontab -e

Add:

* * * * * /usr/local/bin/zpool_prom.sh >/dev/null 2>&1

This updates metrics once per minute.
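If graphs ever flatline, a quick staleness check is to see whether cron is still rewriting the file (a sketch; relies on `find -mmin`, which both FreeBSD and GNU find support):

```shell
# Prints a warning only if zpool.prom is older than 2 minutes,
# which usually means cron or the script is failing.
f=/var/db/node_exporter/textfile_collector/zpool.prom
if [ -n "$(find "$f" -mmin +2 2>/dev/null)" ]; then
  echo "stale: $f"
fi
```

node_exporter also exposes node_textfile_mtime_seconds per .prom file, so the same freshness check can be alerted on from Prometheus instead of the shell.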


4. Verify node_exporter exposes the metrics

curl -s http://127.0.0.1:9100/metrics | egrep '^zpool_(size|alloc|free|capacity)_'

You should see:

zpool_size_bytes{pool="zroot"} ...
zpool_alloc_bytes{pool="zroot"} ...
zpool_free_bytes{pool="zroot"} ...
zpool_capacity_ratio{pool="zroot"} ...

If not, node_exporter is not reading the textfile directory you configured.


5. Prometheus scrape config

Example prometheus.yml:

scrape_configs:
  - job_name: node
    static_configs:
      - targets:
          - clemente:9100

Reload Prometheus.


6. Grafana Queries (zpool list equivalent)

Use these PromQL queries:

Pool size

zpool_size_bytes

Allocated

zpool_alloc_bytes

Free

zpool_free_bytes

Capacity %

100 * zpool_capacity_ratio

Units:

  • Bytes → IEC (GiB)
  • Capacity → Percent (0–100)

Notes

  • node_filesystem_* vs. zpool list:
    • node_filesystem_* shows the dataset/mountpoint view (df/statfs)
    • This setup exposes real pool-level capacity
  • zfs_exporter is useful for ARC/cache telemetry, not pool capacity.

Per‑Jail ZFS Usage (Bastille jails)

This section adds per‑jail disk usage using the ZFS datasets that back each jail: <pool>/bastille/jails/<jail>/root. This reports REFER (actual referenced data), USED (includes snapshots), and AVAIL per jail.

What You’ll Get (per jail)

  • zfs_jail_refer_bytes{jail="<jail>",dataset="<dataset>"}
  • zfs_jail_used_bytes{jail="<jail>",dataset="<dataset>"}
  • zfs_jail_avail_bytes{jail="<jail>",dataset="<dataset>"}

These map directly to:

zfs list -Hp -o name,used,avail,refer

1) Create the per‑jail exporter script

Save as:

/usr/local/bin/zfs_jails_prom.sh
#!/bin/sh

OUT="/var/db/node_exporter/textfile_collector/zfs_jails.prom.$$"
FINAL="/var/db/node_exporter/textfile_collector/zfs_jails.prom"

# Export per‑jail root datasets:
# <pool>/bastille/jails/<jail>/root

zfs list -Hp -o name,used,avail,refer | awk '
BEGIN {
  print "# HELP zfs_jail_used_bytes ZFS dataset USED bytes for the jail root dataset (includes snapshots)"
  print "# TYPE zfs_jail_used_bytes gauge"
  print "# HELP zfs_jail_avail_bytes ZFS dataset AVAIL bytes for the jail root dataset"
  print "# TYPE zfs_jail_avail_bytes gauge"
  print "# HELP zfs_jail_refer_bytes ZFS dataset REFER bytes for the jail root dataset (referenced data)"
  print "# TYPE zfs_jail_refer_bytes gauge"
}
$1 ~ /\/bastille\/jails\/[^\/]+\/root$/ {
  ds=$1; used=$2; avail=$3; refer=$4;

  jail=ds;
  sub(/^.*\/bastille\/jails\//, "", jail);
  sub(/\/root$/, "", jail);

  printf "zfs_jail_used_bytes{jail=\"%s\",dataset=\"%s\"} %s\n", jail, ds, used;
  printf "zfs_jail_avail_bytes{jail=\"%s\",dataset=\"%s\"} %s\n", jail, ds, avail;
  printf "zfs_jail_refer_bytes{jail=\"%s\",dataset=\"%s\"} %s\n", jail, ds, refer;
}' > "$OUT" && mv "$OUT" "$FINAL"

Make executable and test:

chmod +x /usr/local/bin/zfs_jails_prom.sh
/usr/local/bin/zfs_jails_prom.sh
tail -n 20 /var/db/node_exporter/textfile_collector/zfs_jails.prom
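To exercise the jail-matching and label-extraction logic without touching ZFS, feed the awk program one fabricated `zfs list -Hp` line (the dataset name and sizes below are invented):

```shell
# Fake row: name used avail refer, tab-separated, like zfs list -Hp output.
printf 'zroot/bastille/jails/web01/root\t2048\t9000\t1024\n' | awk '
$1 ~ /\/bastille\/jails\/[^\/]+\/root$/ {
  jail=$1;
  sub(/^.*\/bastille\/jails\//, "", jail);
  sub(/\/root$/, "", jail);
  printf "zfs_jail_refer_bytes{jail=\"%s\",dataset=\"%s\"} %s\n", jail, $1, $4;
}'
# prints: zfs_jail_refer_bytes{jail="web01",dataset="zroot/bastille/jails/web01/root"} 1024
```

Datasets that do not match the .../bastille/jails/<jail>/root pattern produce no output, which is exactly how the exporter script skips non-jail datasets.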

2) Schedule the script (cron)

* * * * * /usr/local/bin/zfs_jails_prom.sh >/dev/null 2>&1

3) Verify metrics are exposed by node_exporter

curl -s http://127.0.0.1:9100/metrics | egrep '^zfs_jail_(used|avail|refer)_bytes' | head

You should see one line per jail/dataset.

4) Grafana / PromQL examples

Actual jail footprint (REFER):

zfs_jail_refer_bytes{job="node"}

Includes snapshots (USED):

zfs_jail_used_bytes{job="node"}

Percent of pool allocated (approx):

100 * zfs_jail_refer_bytes{job="node"} / zpool_alloc_bytes{pool="zroot",job="node"}

Top consumers (bar gauge):

topk(10, zfs_jail_refer_bytes{job="node"})

Notes

  • This requires one dataset per jail (you have .../bastille/jails/<jail>/root).
  • ZFS cannot attribute pool ALLOC/FREE to jails; this reports dataset usage only.

Troubleshooting

No zpool metrics in /metrics:

  • Check node_exporter args:
    ps auxww | grep '[n]ode_exporter'
    
  • Directory mismatch is the most common failure.

Wrong capacity %:

  • Ensure awk uses $8 for CAP from zpool list -Hp.
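
To confirm which field holds CAP on your OpenZFS version, print the fields of one row with their positions. A fabricated row is piped in here so the check is self-contained; in practice replace the printf with `zpool list -Hp | head -1`:

```shell
# Number every tab-separated field; CAP should appear as field 8.
printf 'zroot\t1000\t250\t750\t-\t-\t5\t25\t1.00x\tONLINE\t-\n' |
  awk '{ for (i = 1; i <= NF; i++) printf "%d=%s\n", i, $i }'
```

If CAP lands in a different position, adjust `cap=$8` in zpool_prom.sh accordingly.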

Done

You now have accurate, pool-level ZFS metrics in Prometheus/Grafana that match:

zpool list