Build an AI Incident Copilot (CLI) in Python
When an incident hits, most engineers repeat the same manual loop: pull recent logs, scan for errors, and guess what to check next.
This post builds incopilot, a CLI tool that automates first-pass triage:
- Collect logs from systemd journal and/or Docker
- Detect high-signal patterns (timeouts, OOM, disk full, 5xx, panics)
- Map findings to the Four Golden Signals
- Output report.md + report.json, ready to paste into an incident doc
Safe by design: suggestions only — no destructive commands.
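To make the detection and mapping steps concrete, here is a minimal sketch of how log lines could be matched against patterns and rolled up into golden signals. The pattern names, the regexes, and the `analyze` function are illustrative assumptions, not the actual contents of incopilot's config.py or analyzer.py.

```python
import re
from collections import Counter

# Hypothetical pattern table; the real tool may use different names and regexes.
PATTERNS = {
    "timeout": re.compile(r"timed out|timeout", re.IGNORECASE),
    "oom": re.compile(r"out of memory|oom-?kill", re.IGNORECASE),
    "disk_full": re.compile(r"no space left on device", re.IGNORECASE),
    # Matches a 5xx status after `status=` / `status:` or after an access-log request line.
    "http_5xx": re.compile(r'(?:status[=: ]|HTTP/\d\.\d" )5\d{2}\b'),
    "panic": re.compile(r"\bpanic\b|traceback", re.IGNORECASE),
}

# Assumed mapping from pattern to one of the Four Golden Signals
# (latency, traffic, errors, saturation).
GOLDEN_SIGNAL = {
    "timeout": "latency",
    "oom": "saturation",
    "disk_full": "saturation",
    "http_5xx": "errors",
    "panic": "errors",
}


def analyze(lines):
    """Count matching lines per pattern, then aggregate the counts per golden signal."""
    hits = Counter()
    for line in lines:
        for name, rx in PATTERNS.items():
            if rx.search(line):
                hits[name] += 1  # one hit per line, even if the pattern repeats in it
    by_signal = Counter()
    for name, count in hits.items():
        by_signal[GOLDEN_SIGNAL[name]] += count
    return dict(hits), dict(by_signal)
```

Counting lines rather than individual matches keeps a single noisy line from inflating a signal, which is one reasonable choice for a first-pass summary.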
Architecture
Project structure
incopilot/
    __init__.py
    cli.py          # argument parsing + console output
    collectors.py   # journalctl, docker logs, file, bundle
    analyzer.py     # pattern detection + line normalization
    reporter.py     # report.md / report.json generation
    config.py       # patterns, golden-signal map, safe-command list
scripts/
    demo_generate_sample_logs.py
posts/
requirements.txt
pyproject.toml
README.md
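As a rough illustration of what a collector module like collectors.py might do, the sketch below builds read-only `journalctl` and `docker logs` commands and captures their output. The function names (`journal_cmd`, `docker_cmd`, `collect`) and the choice of output flags are assumptions for this example, not the project's actual code.

```python
import subprocess


def journal_cmd(unit, since):
    """Build a read-only journalctl command for one systemd unit."""
    return ["journalctl", "--unit", unit, "--since", since,
            "--no-pager", "--output", "short-iso"]


def docker_cmd(container, since):
    """Build a read-only `docker logs` command for one container."""
    return ["docker", "logs", "--since", since, "--timestamps", container]


def collect(cmd, timeout=30):
    """Run a command and return its output as a list of lines.

    `docker logs` replays the container's stderr stream on stderr,
    so both streams are captured and concatenated.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True,
                          timeout=timeout, check=False)
    return (proc.stdout + proc.stderr).splitlines()
```

Passing the command as a list (rather than a shell string) avoids shell interpolation of unit or container names, which fits the suggestions-only, no-side-effects goal.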
Setup
git clone https://github.com/AutoShiftOps/incopilot.git
cd incopilot
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
Quick test (no real services needed)
python scripts/demo_generate_sample_logs.py
python -m incopilot file --path sample.log
ls out/
Systemd journal triage
python -m incopilot journal --unit nginx --since "30 min ago"
Docker triage
python -m incopilot docker --container my-api --since 1h
Both sources (bundle)
python -m incopilot bundle \
--unit nginx \
--container my-api \
--since-journal "30 min ago" \
--since-docker 1h
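The four subcommands above map naturally onto `argparse` subparsers. The following is a sketch of how such a parser could be laid out, using only the flags shown in the commands above; which flags are required and what their defaults are is a guess, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser with one subcommand per log source (assumed layout)."""
    parser = argparse.ArgumentParser(prog="incopilot")
    sub = parser.add_subparsers(dest="source", required=True)

    p_file = sub.add_parser("file", help="analyze a log file")
    p_file.add_argument("--path", required=True)

    p_journal = sub.add_parser("journal", help="analyze a systemd unit's journal")
    p_journal.add_argument("--unit", required=True)
    p_journal.add_argument("--since", default=None)

    p_docker = sub.add_parser("docker", help="analyze a container's logs")
    p_docker.add_argument("--container", required=True)
    p_docker.add_argument("--since", default=None)

    p_bundle = sub.add_parser("bundle", help="analyze journal and docker logs together")
    p_bundle.add_argument("--unit", required=True)
    p_bundle.add_argument("--container", required=True)
    p_bundle.add_argument("--since-journal", default=None)
    p_bundle.add_argument("--since-docker", default=None)

    return parser
```

For example, `build_parser().parse_args(["journal", "--unit", "nginx", "--since", "30 min ago"])` yields a namespace with `source`, `unit`, and `since` attributes that a dispatcher can route to the matching collector.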
What you get
out/report.md — paste into your incident doc
out/report.json — attach to a ticket or POST to a webhook
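To show how the two output files could be produced from a single set of findings, here is a small sketch. The report schema (`source`, `findings`, `pattern`, `signal`, `count`) and the `write_reports` function are assumptions made for this example; the real reporter.py may structure things differently.

```python
import json
from pathlib import Path


def write_reports(report, out_dir="out"):
    """Write report.json and report.md for a findings dict (hypothetical schema)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Machine-readable copy for tickets or webhooks.
    json_path = out / "report.json"
    json_path.write_text(json.dumps(report, indent=2) + "\n", encoding="utf-8")

    # Human-readable copy: a single Markdown table of findings.
    md = [
        f"# Triage report: {report['source']}",
        "",
        "| Pattern | Golden signal | Count |",
        "| --- | --- | --- |",
    ]
    for finding in report["findings"]:
        md.append(f"| {finding['pattern']} | {finding['signal']} | {finding['count']} |")
    md_path = out / "report.md"
    md_path.write_text("\n".join(md) + "\n", encoding="utf-8")

    return md_path, json_path
```

Generating both files from the same dict keeps the Markdown and JSON views consistent with each other.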
What to improve next
- Per-service pattern packs (nginx, postgres, java, node)
- Slack/Teams webhook posting (--webhook <url>)
- Unit tests + GitHub Actions CI
- Scheduled timer (systemd timer unit) for proactive reports
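As a starting point for the webhook item, the sketch below posts a report dict as JSON using only the standard library. It is an assumption about how a `--webhook <url>` option could work, not an existing feature; real Slack or Teams webhooks expect their own payload shapes, so the report would likely need to be wrapped accordingly.

```python
import json
import urllib.request


def build_webhook_request(url, report):
    """Build (but do not send) a JSON POST request carrying the report."""
    body = json.dumps(report).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_report(url, report, timeout=10):
    """Send the report and return the HTTP status code."""
    request = build_webhook_request(url, report)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Separating request construction from sending makes the payload easy to unit-test without network access.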
