Compliance-Oriented Incident Command

Incident command for teams that
answer to compliance.

Monitoring tools detect failures.
AlertEngine records how humans respond to them.

Authorized. Audited. Replayable. Nothing executes without your explicit approval, and every decision is recorded for auditors.

📋
Policy
Deterministic rules, no AI
→
🧠
Diagnose
AI explains root cause
→
💬
Alert
WhatsApp or Telegram
→
✅
Authorize
Engineer taps approve
→
⚡
Execute
Your webhook runs
→
📒
Audit
Immutable ledger entry
Audit
Immutable ledger entry
$ pip install fastapi-alertengine
No autonomous remediation · No alert fatigue · No silent failures · No dashboards required
232
Tests Passing
Python 3.10 · 3.11 · 3.12
10/10
Adversarial Audit
All checks passed
17
Orchestrator Modules
~3,500 lines, zero stubs
Live
Production Tenant
Fintech, Zimbabwe

Policy is the floor. AI is the ceiling.

The hierarchy is enforced by architecture, not convention. Claude cannot trigger a state transition. Policy can override Claude. The audit log proves every decision.

1

Detection

Deterministic policy rules evaluate health score, P95 latency, and error rate. No AI involved. Policy decides whether an incident exists.

Actor: policy
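
To make the idea of a deterministic policy gate concrete, here is a minimal sketch. The threshold values, the `Metrics` class, and the `breached_rules` and `incident_exists` helpers are all assumptions invented for this example; they are not AlertEngine's actual policy or API.

```python
from dataclasses import dataclass

# Hypothetical thresholds for illustration only. The real values live in the
# orchestrator's versioned policy and are not shown on this page.
POLICY = {
    "version": "example-1",
    "min_health_score": 50,
    "max_p95_ms": 1500.0,
    "max_error_rate": 0.05,
}


@dataclass(frozen=True)
class Metrics:
    health_score: int   # 0 to 100
    p95_ms: float       # P95 latency in milliseconds
    error_rate: float   # fraction of failed requests, 0.0 to 1.0


def breached_rules(m: Metrics, policy: dict = POLICY) -> list[str]:
    """Return the name of every threshold the metrics breach."""
    breaches = []
    if m.health_score < policy["min_health_score"]:
        breaches.append("health_score")
    if m.p95_ms > policy["max_p95_ms"]:
        breaches.append("p95_latency")
    if m.error_rate > policy["max_error_rate"]:
        breaches.append("error_rate")
    return breaches


def incident_exists(m: Metrics, policy: dict = POLICY) -> bool:
    """Plain comparisons only: the same input always gives the same answer."""
    return bool(breached_rules(m, policy))
```

With these example thresholds, the degraded checkout metrics used elsewhere on this page (score 23, P95 2.8s, 19% errors) would breach all three rules, and no model is consulted to reach that decision.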
2

Diagnosis

Two AI models independently analyze the incident. If they agree, one clean alert. If they diverge, a Dissent Alert shows both theories.

Actor: claude
3

Authorization

Engineer receives WhatsApp or Telegram alert. Taps approve on a JWT-signed, single-use recovery link. Nothing executes without this step.

Actor: engineer
4

Execution

Orchestrator calls your recovery webhook. 3 retries with exponential backoff. Dead Letter Queue on failure. You control what the webhook does.

Actor: orchestrator
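
The retry behaviour described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the orchestrator's code: it reads "3 retries" as one initial attempt plus three retries, starts the backoff at one second and doubles it, and uses a plain list as the Dead Letter Queue. The function name and parameters are hypothetical.

```python
import time
from typing import Callable


def call_with_retries(
    send: Callable[[dict], bool],
    payload: dict,
    dead_letters: list,
    retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try the webhook once, then retry with exponentially growing delays."""
    for attempt in range(retries + 1):
        try:
            if send(payload):
                return True
        except Exception:
            # Treat any error raised by the sender as a failed attempt.
            pass
        if attempt < retries:
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s with the defaults
    # Every attempt failed: park the payload for later inspection.
    dead_letters.append(payload)
    return False
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; a caller would normally leave the default.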
5

Audit

Every stage, every actor, every confidence score, and every policy version is written to an append-only Redis log. Replayable from the ledger alone.

Actor: system
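
A small sketch of what "replayable from the ledger alone" can mean in practice. An in-memory list stands in for the Redis log, and the `Ledger` class, the method signatures, and the `DETECTED` and `DIAGNOSED` stage names are assumptions made for illustration.

```python
from typing import Optional


class Ledger:
    """In-memory stand-in for an append-only log (illustrative only)."""

    def __init__(self) -> None:
        self._events: list[dict] = []

    def append_event(self, incident_id: str, stage: str, actor: str,
                     policy_version: str) -> None:
        # Events are only ever appended; nothing is updated or deleted.
        self._events.append({
            "seq": len(self._events),
            "incident_id": incident_id,
            "stage": stage,
            "actor": actor,
            "policy_version": policy_version,
        })

    def get_audit_log(self, incident_id: str) -> list[dict]:
        # Return copies so callers cannot mutate stored history.
        return [dict(e) for e in self._events
                if e["incident_id"] == incident_id]


def replay_stage(log: list[dict]) -> Optional[str]:
    """Rebuild the current stage from the ledger alone."""
    return log[-1]["stage"] if log else None


def authorized_before_executed(log: list[dict]) -> bool:
    """Check that no EXECUTED entry appears without an earlier AUTHORIZED."""
    stages = [e["stage"] for e in log]
    if "EXECUTED" not in stages:
        return True
    return "AUTHORIZED" in stages[:stages.index("EXECUTED")]


ledger = Ledger()
for stage, actor in [("DETECTED", "policy"), ("DIAGNOSED", "claude"),
                     ("AUTHORIZED", "engineer"), ("EXECUTED", "orchestrator")]:
    ledger.append_event("inc-1", stage, actor, policy_version="example-1")
```

Because current state is derived by reading the log rather than stored separately, an auditor can check ordering claims such as "authorized before executed" directly against the entries.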
🚨 Checkout API degraded
Health score: 23/100  |  P95: 2.8s  |  Errors: 19%
Diagnosis
Both models agree, confidence: 87%
Database connection pool exhausted after query timeout change
Recent deployment (Diff-in-Pocket)
3m ago, a1b2c3d: "Fix checkout query isolation level" (+12/-3)
⚠️ This commit touched database/query files
Suggested fix
Restart checkout worker pool
[ Approve fix ]

Two models. One verdict, or a Dissent Alert.

When two AI specialists disagree, the disagreement is more valuable than either answer alone. AlertEngine surfaces it before you approve anything.

Models agree

Consensus Alert

Both models independently reached the same diagnosis. You receive one clean alert with combined confidence.

⚡ Action Recommended
Score: 23 | P95: 2.8s
 
Issue: Database pool exhausted
Confidence: 87% (both models agree)
 
👉 Approve fix: [link]
Models disagree

Dissent Alert

The models reached different conclusions. You see both theories, specific logs to check, and two approve paths. The disagreement prevents false confidence.

⚠️ Degraded State: Models Disagree
Score: 23 | P95: 2.8s
 
Theory A (Database): Pool exhausted (82%)
Check: DB slow query log
 
Theory B (Network): Upstream timeout (76%)
Check: Upstream response times
 
Investigate before approving.
👉 Trust A   👉 Trust B
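
The consensus and dissent paths shown above could be modelled along these lines. This is a sketch under assumptions: the `Diagnosis` fields, the rule that two diagnoses "agree" when they name the same category, and the choice to report the lower of the two confidence scores are all invented for illustration. The page does not say how AlertEngine compares diagnoses or combines confidence.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Diagnosis:
    category: str      # e.g. "database" or "network"
    summary: str
    confidence: float  # 0.0 to 1.0
    check: str         # what the engineer should look at


def build_alert(a: Diagnosis, b: Diagnosis) -> dict:
    """Return one consensus alert, or a dissent alert carrying both theories."""
    if a.category == b.category:
        return {
            "kind": "consensus",
            "issue": a.summary,
            # Assumption: report the more cautious of the two scores.
            "confidence": min(a.confidence, b.confidence),
        }
    return {
        "kind": "dissent",
        "theories": [
            {"label": "A", **asdict(a)},
            {"label": "B", **asdict(b)},
        ],
    }


theory_a = Diagnosis("database", "Pool exhausted", 0.82, "DB slow query log")
theory_b = Diagnosis("network", "Upstream timeout", 0.76, "Upstream response times")
alert = build_alert(theory_a, theory_b)
```

The design point is that disagreement is never averaged away: when the categories differ, both theories and their checks are passed through to the engineer unchanged.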

Six stages. Full attribution at every step.

From anomaly detection to authorized recovery, entirely through your phone. Every actor logged.

1

Instrument

Add instrument(app). P95 latency, error rate, and health scoring start immediately. Free SDK, MIT licensed.

2

Detect

Orchestrator polls /health/alerts every 5s. Policy gates run first: deterministic, no AI. Incident opens when thresholds breach.

3

Diagnose

Two AI models analyze independently. Commit context injected (Diff-in-Pocket). Dissent alert if models diverge. Confidence-gated, so no noise.

4

Authorize

WhatsApp or Telegram alert arrives. Plain English diagnosis. Tap approve. JWT-signed, single-use, 5-minute TTL. Nothing runs without this.

5

Execute

Your recovery webhook is called. 3 retries with exponential backoff. DLQ on failure. Orchestrator never touches your servers directly.

Free Forever · MIT Licensed

Local Incident Sensing: Free Forever

Everything you need to understand what your API is doing right now. No account. No cloud. No catch, until you need alerts.

  • ✓ P95 Latency Tracking: real percentiles, not averages
  • ✓ Error Rate Detection: 4xx/5xx with configurable thresholds
  • ✓ Health Score 0–100: composite, trend-aware
  • ✓ Anomaly Scoring: detects spikes vs your baseline
  • ✓ /health/alerts Endpoint: clean JSON, AI-agent friendly
  • ✓ Memory Fallback: Redis optional, never crashes your app
  • ✓ MIT Licensed: use it however you like
The catch: You see the score drop. You don't know why. You don't get alerts. You don't get recovery links.
That's the orchestrator.
# Install
pip install fastapi-alertengine

# In your FastAPI app
from fastapi import FastAPI
from fastapi_alertengine import instrument

app = FastAPI()
instrument(app)  # that's it

# Now visit /health/alerts
# {
#   "status": "critical",
#   "health_score": {"score": 23, "trend": "degrading"},
#   "metrics": {
#     "overall_p95_ms": 2847.3,
#     "error_rate": 0.19,
#     "anomaly_score": 1.4
#   },
#   "alerts": [...]
# }
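
To see why the feature list stresses "real percentiles, not averages", here is a small standalone sketch. It uses the nearest-rank method, which is an assumption chosen for simplicity; the page does not say how the SDK computes percentiles, and the `p95` helper is hypothetical.

```python
import math
import statistics


def p95(latencies_ms: list[float]) -> float:
    """Nearest-rank 95th percentile: the value 95% of requests are at or below."""
    if not latencies_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(latencies_ms)
    rank = math.ceil(0.95 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


# 90 fast requests and 10 slow ones.
samples = [100.0] * 90 + [3000.0] * 10
mean_ms = statistics.mean(samples)
p95_ms = p95(samples)
```

In this sample the mean works out to 390 ms, which hides the problem, while the P95 is 3000 ms, which is what one in ten users actually experiences.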

Alerts where engineers actually are.

No new apps to install. No dashboards to check. Just the channel your team already uses.

💬

WhatsApp

Via Twilio or Sent.dm. The most reliable mobile interrupt channel globally. Recovery approvals arrive as tappable links.

Growth plan+
✈️

Telegram

Via Telegram Bot API. Available on all plans including Starter. No per-message cost. Instant delivery globally.

All plans
#

Slack

Webhook-based Slack integration for team notifications. Incidents posted to your channel with recovery link.

Compliance plan+
🔗

Webhook

Generic HTTP webhook fallback. Fires when primary channel fails. Integrates with any endpoint: PagerDuty, Teams, custom.

All plans
📞

Voice

Automated voice call escalation via Twilio. Fires after 180s if no approval received. Secondary engineer notified after 300s.

Compliance plan+
📒

Audit Ledger

Every delivery attempt logged immutably. Success, failure, provider, actor, timestamp. Full ledger per incident.

All plans
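
The Voice channel's escalation timing can be expressed as a small pure function. This sketch assumes both the 180s and 300s timers are measured from the moment the first alert was sent, which the page does not state explicitly; the function and action names are hypothetical.

```python
VOICE_CALL_AFTER_S = 180       # from the Voice card above
SECONDARY_NOTIFY_AFTER_S = 300  # from the Voice card above


def escalation_actions(seconds_since_alert: float, approved: bool) -> list[str]:
    """Return the escalation steps that are due at this point in time."""
    if approved:
        # An approval stops all escalation.
        return []
    actions = []
    if seconds_since_alert >= VOICE_CALL_AFTER_S:
        actions.append("voice_call_primary")
    if seconds_since_alert >= SECONDARY_NOTIFY_AFTER_S:
        actions.append("notify_secondary")
    return actions
```

For example, at 200 seconds with no approval only the voice call is due; at 300 seconds the secondary engineer is notified as well.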

From awareness to evidence.

The SDK is free forever. As your compliance requirements grow, so does what AlertEngine proves. Every plan includes unlimited team users.

Free
$0

Detection SDK only. MIT licensed. Runs on your servers. See the score drop, but not why, and not on your phone.

You see it. You don't get told.

pip install
Starter
$19/mo
Operational Awareness

1 service. 5 incidents/mo. Telegram alerts. Know when your app breaks, before your users tell you.

One hour of downtime costs more than a year of Starter.

Get started
Growth
$99/mo
Diagnostic Intelligence

1 service. 10 incidents/mo. WhatsApp + AI diagnosis. Know what broke, not just that it broke.

One false-positive 3am alert costs more than a month of Growth.

Get started
Compliance
$799/mo
Regulatory-Grade Auditability

10 services. 200 incidents/mo. Every incident logged with actor, policy version, and decision. Export your audit trail. Prove compliance to auditors.

SOC 2 Type II audit costs $15K–$50K. Compliance is $799/mo insurance against that delay.

Get started
Platform
$1,500/mo
Regulatory-Grade Auditability

20 services. 1,000 incidents/mo. Custom policy thresholds versioned and logged in every audit entry. Built for platforms that answer to regulators.

Generic thresholds don't work at scale. Custom thresholds become compliance evidence.

Get started

Need dedicated deployment, custom SLA, or procurement paperwork?

Contact us: Enterprise

Every principle enforced by code. Every claim provable by audit.

AlertEngine is designed for teams where operational decisions must be documented and defensible.

Principle | Enforcement | Audit proof
Policy decides incidents, not AI | should_recover() in pipeline.py gates RECOVERED | actor: "policy" in audit log
AI explains, humans authorize | Claude generates message; JWT gates execution | actor: "claude" then actor: "engineer"
Nothing executes without approval | POST /action/recover/confirm requires valid JWT | AUTHORIZED before EXECUTED in every log
Every action logged immutably | append_event() on every transition | get_audit_log() returns complete timeline
Deterministic alert rules | incident_policy.py, a single versioned POLICY dict | policy_version in every audit entry
Cross-tenant isolation | Tenant ID validated on every endpoint | 403 on mismatch, confirmed by adversarial audit
Replay attack prevention | Atomic Redis SET NX, single-use JWT | 20 concurrent attempts → exactly 1 succeeded

Human-Authorized. Always.

No automated remediation. No background execution. Every recovery action requires explicit human authorization.

🔑

JWT Recovery Tokens

Every recovery action is gated by a tenant-scoped JWT with a 5-minute TTL. Tokens are single-use and validated atomically in Redis, so replay is not possible.
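
The single-use guarantee comes down to an atomic "claim this token ID if nobody has yet" step. The sketch below imitates that step with a lock-protected set so it runs without a Redis server. The `OneTimeClaims` class and `race` helper are hypothetical, signature verification is left out, and the 5-minute expiry is not modelled.

```python
import threading


class OneTimeClaims:
    """In-memory stand-in for an atomic set-if-absent (like Redis SET with NX)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._claimed: set[str] = set()

    def claim(self, token_id: str) -> bool:
        """Return True for the first caller only; every later caller gets False."""
        with self._lock:
            if token_id in self._claimed:
                return False
            self._claimed.add(token_id)
            return True


def race(claims: OneTimeClaims, token_id: str, attempts: int = 20) -> int:
    """Fire concurrent claims for one token and count how many succeed."""
    results: list[bool] = []

    def worker() -> None:
        results.append(claims.claim(token_id))

    threads = [threading.Thread(target=worker) for _ in range(attempts)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)


winners = race(OneTimeClaims(), "example-token-id")
```

However the threads interleave, exactly one claim wins. In a real deployment the check and the write would typically be a single Redis `SET` with the `NX` option and an expiry, so the atomicity comes from Redis rather than from a process-local lock.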

👁

Preview Before Authorization

GET the recovery link to see exactly what will happen. POST to execute. The preview is read-only, and irreversible actions are always a separate, explicit step.

🔒

Cross-Tenant Isolation

All endpoints enforce tenant ownership. Adversarial audit confirmed: attempting to access another tenant's incidents always returns 403.

📒

Immutable Audit Ledger

Every alert, diagnosis, delivery attempt, and recovery authorization is written to an append-only log with full actor attribution. State is reconstructable from the ledger alone.

Survived a full adversarial audit.

An autonomous AI agent acted as a hostile tenant and attempted to break isolation, replay tokens, and flood the system. 10/10 passed.

Check | Result | Detail
Cross-tenant audit access | ✓ Blocked | 403 returned
Cross-tenant delivery access | ✓ Blocked | 403 returned
Recovery token replay (20 concurrent) | ✓ Protected | 1 succeeded, 19 rejected
Duplicate incident creation (race) | ✓ Protected | Exactly 1 created
Concurrent token flood | ✓ Handled | Atomic Redis SET NX
Natural incident detection | ✓ Confirmed | End-to-end verified
WhatsApp delivery | ✓ Confirmed | Live production delivery
Recovery authorization audit trail | ✓ Written | Immutable append-only log
Degraded mode handling | ✓ Confirmed | NORMAL/DEGRADED/EMERGENCY
Lease renewal under load | ✓ Atomic | Lua compare-and-delete

Source-available orchestrator. MIT-licensed SDK.

Clean separation between the free SDK and the paid orchestrator. The orchestrator is published for security audit, not for self-hosting.

fastapi_alertengine/        ← Free PyPI package, MIT licensed
  middleware.py             ← RequestMetricsMiddleware
  engine.py                 ← Core alert engine
  intelligence.py           ← Adaptive thresholds, health scoring
  storage.py                ← Redis Streams persistence

orchestrator/               ← Source-available for audit, NOT for self-hosting
  pipeline.py               ← State machine + IncidentStage enum
  incident_policy.py        ← Single source of truth for all thresholds
  claude_engine.py          ← AI diagnosis (tool use, hardened)
  diagnostic_council.py     ← Dual-model incident court
  commit_context.py         ← Diff-in-Pocket commit correlation
  audit.py                  ← Immutable forensic ledger
  plans.py                  ← Billing tiers and feature gates

tests/                      ← 232 tests, Python 3.10/3.11/3.12
docs/                       ← This landing page + ARCHITECTURE.md
🇿🇼

Built in Zimbabwe, where the constraint became the feature.

I spent my career in accounting and finance before building AlertEngine. In finance, no transaction executes without authorization and every action leaves an audit trail. AlertEngine applies that same discipline to production infrastructure.

In Zimbabwe, engineers aren't always at laptops when things break. WhatsApp is the operational control plane. That constraint produced something better than a dashboard ever could.

10/10
Adversarial audit checks passed including replay attacks and cross-tenant isolation
Live
Live fintech tenant monitored in production: real workloads, real incidents
232
Tests passing across the full SDK and orchestration suite
5s
Detection latency from spike to WhatsApp alert

You'll be live within 2 hours.

Fill in your details and we'll configure your tenant, fire a test alert to your phone, and send your invoice. No credit card upfront.

The endpoint AlertEngine will poll every 5 seconds
The endpoint we call when you tap Approve β€” you control what it does

No credit card upfront. Invoice sent after your test alert fires. Pay via Payoneer or wire transfer.

Policy is the floor. AI is the ceiling. The ledger proves it.

The SDK is free and takes one line. The managed layer is ready when you are.

$ pip install fastapi-alertengine