model-watchdog
March 15, 2026
Auto-rollback for AI agent config changes. Zero dependencies beyond Python 3.8+.
The Problem
I changed my AI agent's model config from claude-opus-4-5 to claude-opus-4-6 without checking if the installed software version supported it. The agent went down for 10 hours while I was asleep.
This tool watches your agent's health endpoint. When it detects repeated failures, it rolls the config back to the last known good version and restarts the service. It saves that "last known good" backup whenever the config changes and the agent is still healthy.
Quick Start
# Probe http://localhost:18789/health every 30s
# Roll back after 3 failures in 3 minutes
python3 watchdog.py
# Custom config
python3 watchdog.py --config watchdog.yaml
# One-shot health check (for CI/scripts)
python3 watchdog.py --check-once
How It Works
- Probe your agent's health endpoint every N seconds
- On K failures within M minutes → roll back the config and restart the service
- When the agent is healthy after a config change → update the "good backup"
- Alert via Telegram, Slack, Discord, or any HTTP webhook
Agent healthy with new config → save as "good backup"
↓
Config changes (model upgrade, etc.)
↓
Agent starts failing
↓
K failures in M minutes → rollback to good backup → restart
↓
Alert sent → agent back online
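The "K failures in M minutes" rule can be sketched as a sliding window over failure timestamps. This is a minimal illustration, not the code in watchdog.py: the class name `FailureWindow` and its methods are hypothetical, and the README does not say whether a healthy probe resets the count, so this sketch lets old failures age out only by time.

```python
import time
from collections import deque


class FailureWindow:
    """Tracks probe failures inside a sliding time window (hypothetical helper)."""

    def __init__(self, failures=3, window_sec=180):
        self.failures = failures      # K: failures needed to trigger a rollback
        self.window_sec = window_sec  # M: window length in seconds
        self._stamps = deque()        # timestamps of recent failures, oldest first

    def record_failure(self, now=None):
        """Record one failed probe; return True if the threshold is reached."""
        now = time.monotonic() if now is None else now
        self._stamps.append(now)
        # Drop failures that have fallen out of the window.
        while self._stamps and now - self._stamps[0] > self.window_sec:
            self._stamps.popleft()
        return len(self._stamps) >= self.failures


if __name__ == "__main__":
    window = FailureWindow(failures=3, window_sec=180)
    print(window.record_failure(now=0))   # False
    print(window.record_failure(now=30))  # False
    print(window.record_failure(now=60))  # True
```

Passing `now` explicitly keeps the helper easy to test; a real loop would call `record_failure()` with no argument after each failed probe.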
Config
Generate a sample config:
python3 watchdog.py --dump-config > watchdog.yaml
Key options:
{
  "probe": {
    "url": "http://localhost:18789/health",
    "timeout_sec": 5,
    "expected_status": 200,
    "expected_body": "ok"
  },
  "thresholds": {
    "failures": 3,
    "window_sec": 180,
    "probe_interval_sec": 30
  },
  "rollback": {
    "config_path": "~/.openclaw/openclaw.json",
    "backup_path": "~/.openclaw/openclaw.json.watchdog-good",
    "restart_cmd": "systemctl --user restart openclaw-gateway",
    "restart_wait_sec": 10
  },
  "alerts": {
    "telegram_bot_token": "...",
    "telegram_chat_id": "..."
  }
}
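To make the options above concrete, here is a rough sketch of how the "probe" and "rollback" sections might be consumed. The function names are hypothetical and this is not taken from watchdog.py. It assumes that `expected_body` is a substring match and that `restart_cmd` can be run without a shell; the actual tool may behave differently on both points.

```python
import os
import shlex
import shutil
import subprocess


def response_is_healthy(status, body, probe_cfg):
    """Compare an HTTP status and body against the "probe" settings."""
    if status != probe_cfg.get("expected_status", 200):
        return False
    expected_body = probe_cfg.get("expected_body")
    # Assumption: expected_body is a substring match, not an exact match.
    return expected_body is None or expected_body in body


def save_good_backup(rollback_cfg):
    """Copy the live config to backup_path while the agent is healthy."""
    config = os.path.expanduser(rollback_cfg["config_path"])
    backup = os.path.expanduser(rollback_cfg["backup_path"])
    shutil.copy2(config, backup)


def roll_back(rollback_cfg):
    """Restore the backup over the live config, then run restart_cmd."""
    config = os.path.expanduser(rollback_cfg["config_path"])
    backup = os.path.expanduser(rollback_cfg["backup_path"])
    if not os.path.exists(backup):
        return False  # no known-good config to restore
    shutil.copy2(backup, config)
    # shlex.split turns the command string into argv, so no shell is invoked.
    result = subprocess.run(shlex.split(rollback_cfg["restart_cmd"]), check=False)
    return result.returncode == 0
```

`os.path.expanduser` handles the `~` in the default paths. After `roll_back` returns, a caller would presumably wait `restart_wait_sec` seconds before probing again.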
Run as a Service
# Install as systemd user service
mkdir -p ~/.config/systemd/user
cat > ~/.config/systemd/user/model-watchdog.service << EOF
[Unit]
Description=model-watchdog AI agent health monitor
After=network.target
[Service]
ExecStart=/usr/bin/python3 /path/to/watchdog.py --config /path/to/watchdog.yaml
Restart=always
RestartSec=5
[Install]
WantedBy=default.target
EOF
systemctl --user enable --now model-watchdog
systemctl --user status model-watchdog
Works With
- OpenClaw (default config paths)
- Any AI agent with an HTTP health endpoint
- Any service with a config file + restart command
Why No Dependencies?
Agents running 24/7 on minimal VPS installs shouldn't need a pip install to stay alive. This is a single Python file, standard library only.
Optional: pip install pyyaml for YAML config support (JSON works without it).
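One way to keep YAML optional is to import PyYAML lazily and fall back to the standard library's `json` module. The sketch below is an illustration of that pattern, not the loader in watchdog.py. The function name `load_config` and the choice to try JSON parsing on a `.yaml` file when PyYAML is missing are assumptions.

```python
import json


def load_config(path):
    """Load a config from JSON, or from YAML when PyYAML is installed."""
    with open(path, "r", encoding="utf-8") as fh:
        text = fh.read()
    if path.endswith((".yaml", ".yml")):
        try:
            import yaml  # optional dependency: pip install pyyaml
        except ImportError:
            # Without PyYAML, a JSON-formatted file can still be parsed.
            try:
                return json.loads(text)
            except json.JSONDecodeError as exc:
                raise RuntimeError(
                    "YAML config needs PyYAML: pip install pyyaml"
                ) from exc
        return yaml.safe_load(text)
    return json.loads(text)
```

Because the import happens inside the function, a JSON-only setup never touches PyYAML at all.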
License
MIT