pfSense DHCP Outage One Day After Installing Netgate pfSense Plus 26.03-RELEASE

Summary

On 2026-04-06 at approximately 09:38 EDT, the Kea DHCP service on pfSense (26.03-RELEASE) stopped serving leases on all interfaces. The root cause was corruption of the <dhcpd> block in /cf/conf/config.xml: the block was emptied, which caused pfSense to generate a Kea configuration with an empty interfaces list. Restoring config.xml from backup did not correct the generated Kea configuration; service was restored by activating a previous ZFS boot environment and rebooting.


Timeline

Time (EDT)    Event
08:54         config.xml drops ~15KB; <dhcpd> block wiped (config-1775480086.xml -> config-1775480087.xml)
08:54–09:37   Multiple config saves at reduced size; pfSense regenerates Kea config with empty interfaces each time
09:38         Kea begins logging DHCPSRV_NO_SOCKETS_OPEN; DHCP stops serving leases
09:38–09:41   Kea restarts repeatedly, fails each time
~09:45        Operator detects outage; accesses box via WireGuard VPN
~10:00        Investigation begins
~10:30        Root cause identified: empty <dhcpd> in config.xml
~10:45        Config restore from backup attempted; Kea config generator continues producing empty interfaces
~11:00        ZFS boot environment rollback to default_20260405014509 initiated
~11:10        Service restored

Root Cause

A pfSense package operation (pfBlockerNG or Suricata reload/apply) triggered a config write at 08:54 that silently cleared the entire <dhcpd> section from /cf/conf/config.xml. This is a pfSense 26.x bug: package-initiated config saves can clobber unrelated service configuration blocks.

With <dhcpd> empty, pfSense’s Kea config generator correctly (but fatally) produced:

"interfaces-config": {
    "interfaces": []
}

Kea had no interface to bind to and could not serve DHCP traffic.
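
This failure mode is easy to test for mechanically. The sketch below flags a generated Kea DHCPv4 config whose interfaces list is empty. The function name kea_interfaces_empty is hypothetical, and the config path shown in the usage comment is an assumption about where pfSense writes the generated file; adjust it for the installed version.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 if the Kea config file given as $1 contains
# an empty "interfaces" array (Kea has nothing to bind to), exit 1 otherwise.
kea_interfaces_empty() {
    # Strip all whitespace first so that `"interfaces": [ ]`, even when
    # spread over several lines, matches the compact fixed string.
    tr -d '[:space:]' < "$1" | grep -qF '"interfaces":[]'
}

# Usage (path is an assumption, not confirmed for 26.03):
#   kea_interfaces_empty /usr/local/etc/kea/kea-dhcp4.conf && echo "Kea has no interfaces"
```

A check like this could run after each config regeneration so that an empty interfaces list is caught before clients start failing to renew leases.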


Diagnosis

Initial indicators

The Kea log showed repeated DHCPSRV_NO_SOCKETS_OPEN warnings but no interface binding errors, indicating that Kea started successfully but had no interfaces configured to listen on.

Key finding

<!-- /cf/conf/config.xml -->
<dhcpd>
</dhcpd>

Empty <dhcpd> block confirmed via xmllint. LAN interface igc1 was fully up with correct IP (192.168.40.1/24), ruling out interface failure.
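
The same check can be repeated against any backup file without opening it by hand. The sketch below is a rough, line-oriented test that assumes config.xml is pretty-printed with the <dhcpd> open and close tags on their own lines; the function name is hypothetical. Where xmllint is available, an XPath query against the parsed document would be the more robust choice.

```shell
#!/bin/sh
# Hypothetical helper: print the number of non-blank lines found between
# <dhcpd> and </dhcpd> in the file given as $1. A result of 0 means the
# block is empty. Assumes one tag per line (pretty-printed XML).
dhcpd_content_lines() {
    awk '
        # Opening tag. <dhcpdv6> does not match because ">" must follow "dhcpd".
        /<dhcpd>/ {
            # Open and close on the same line is treated as an empty block.
            if ($0 !~ /<\/dhcpd>/) inside = 1
            next
        }
        /<\/dhcpd>/ { inside = 0 }
        inside && NF { count++ }
        END { print count + 0 }
    ' "$1"
}

# Usage:
#   dhcpd_content_lines /cf/conf/config.xml
```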

Config backup file size comparison

663796  Apr  6 08:54  config-1775480086.xml   <- last known good
648920  Apr  6 08:54  config-1775480087.xml   <- dhcpd block gone (~15KB loss)

Every config backup from config-1775480087.xml onward is missing the <dhcpd> content.
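
Because the corruption showed up as a sudden size drop between two consecutive backups, scanning the backup directory for such drops is a quick way to find the last good file. The following is a minimal sketch, assuming backups follow the config-<epoch>.xml naming seen above so that a lexical sort is also chronological. The function name and the 4096-byte default threshold are illustrative choices, not pfSense features.

```shell
#!/bin/sh
# Hypothetical helper: list consecutive backup pairs in directory $1 where
# the newer file is more than $2 bytes (default 4096) smaller than the older.
find_size_drops() {
    dir=$1
    threshold=${2:-4096}
    prev_file=""
    prev_size=0
    # The glob expands in sorted order; equal-length epoch names sort by time.
    for f in "$dir"/config-*.xml; do
        [ -f "$f" ] || continue
        # wc -c pads its output with spaces on some systems; strip them.
        size=$(wc -c < "$f" | tr -d ' ')
        drop=$((prev_size - size))
        if [ -n "$prev_file" ] && [ "$drop" -gt "$threshold" ]; then
            echo "$(basename "$prev_file") -> $(basename "$f"): $drop bytes smaller"
        fi
        prev_file=$f
        prev_size=$size
    done
}

# Usage:
#   find_size_drops /cf/conf/backup
```

Given the two file sizes recorded above (663796 and 648920 bytes), the pair from this incident would be reported as 14876 bytes smaller.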


Recovery Attempts

1. Config restore (partial failure)

Restored config-1775480086.xml to /cf/conf/config.xml. The <dhcpd> block was confirmed present and correct via xmllint and PHP config parsing. However, dhcpd_dhcp4_configure() continued generating an empty interfaces list. The Kea config generator function could not be located on disk (pfSense 26.x uses a different package structure), preventing direct inspection of the gating condition.

2. ZFS boot environment rollback (success)

bectl list
bectl activate default_20260405014509
reboot

Rolled back to the Apr 5 boot environment predating both the upgrade and the config corruption. Full service restored on reboot.
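
When choosing a rollback target, it helps to list only the boot environments that are not currently active. The sketch below filters the tab-separated output of bectl list -H. The function name is hypothetical, and it assumes the second column holds the active flags (N for active now, R for active on reboot, - for neither).

```shell
#!/bin/sh
# Hypothetical helper: read `bectl list -H` output on stdin and print the
# names of boot environments that are neither active now nor on next boot.
inactive_bes() {
    # Column 1 is the BE name, column 2 the active flags.
    awk -F '\t' '$2 == "-" { print $1 }'
}

# Usage:
#   bectl list -H | inactive_bes
```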


Available Boot Environments

BE                      Created               Size
default_20260405014509  2026-02-01 08:05      147M   <- used for recovery
default                 2026-04-05 01:45      12.3G  <- corrupted state

Contributing Factors

  • pfSense 26.03 package reloads (pfBlockerNG/Suricata) can trigger config writes that corrupt unrelated service blocks — known upstream issue
  • No pre-maintenance config backup was taken before working on pfBlockerNG/Suricata
  • rc.reload_all and service restart commands do not force regeneration of Kea config from a restored config.xml in this pfSense version
  • The Kea config generator location changed in pfSense 26.x, making surgical debugging harder

Resolution

ZFS boot environment rollback to default_20260405014509 (2026-04-05).


Action Items

  • Before touching pfBlockerNG or Suricata: manually snapshot config — cp /cf/conf/config.xml /cf/conf/config.xml.pre-$(date +%Y%m%d%H%M%S)
  • Investigate pfSense 26.x bug report for package-initiated config corruption of <dhcpd>
  • Document bectl list / bectl activate in pfSense runbook as primary recovery path
  • Consider scheduled off-box config backups (e.g. scp to another machine on the local network)
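
The manual snapshot and the off-box copy from the list above could be wrapped in one small helper. This is a sketch only: snapshot_config is a hypothetical name, and the remote user, host (a documentation-range address), and destination directory in the usage comment are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: copy the config file ($1, default /cf/conf/config.xml)
# to a timestamped sibling and print the path of the copy.
snapshot_config() {
    src=${1:-/cf/conf/config.xml}
    dest="$src.pre-$(date +%Y%m%d%H%M%S)"
    # -p preserves mode and timestamps; stop if the copy fails.
    cp -p "$src" "$dest" || return 1
    echo "$dest"
}

# Usage (remote user, host, and directory are placeholders):
#   snap=$(snapshot_config) && scp "$snap" backup@192.0.2.10:pfsense-configs/
```

Printing the destination path keeps the helper composable: the same call works on its own before a package change or as the first half of a scheduled off-box copy.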

References

  • Affected host: pfSense (26.03-RELEASE)
  • Last good config backup: /cf/conf/backup/config-1775480086.xml
  • Recovery BE: default_20260405014509
  • Operator: Johnny