security | 2026-01-24 | 1 min read
Incident Response for LLM Agents
Runbooks for misfires—containment, rollback, evidence capture, and post-incident improvements.
date: 2026-01-24
tags: [security, reliability, operations, governance]
What counts as an “agent incident”?
- Unauthorized tool call
- Data exfiltration attempt
- Incorrect action taken in an external system
- Budget runaway (cost spike)
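One way to make these categories actionable is to encode them as a type that alerts and runbooks can share. The sketch below is illustrative only: the event fields (`tool`, `allowed_tools`, `blocked_egress`, `action_mismatch`, `cost_usd`, `budget_usd`) are hypothetical names, not part of any specific agent framework, and the check order is an arbitrary choice.

```python
from enum import Enum
from typing import Optional


class IncidentType(Enum):
    UNAUTHORIZED_TOOL_CALL = "unauthorized_tool_call"
    DATA_EXFILTRATION_ATTEMPT = "data_exfiltration_attempt"
    INCORRECT_EXTERNAL_ACTION = "incorrect_external_action"
    BUDGET_RUNAWAY = "budget_runaway"


def classify(event: dict) -> Optional[IncidentType]:
    """Map a run event (hypothetical schema) to an incident type, or None."""
    tool = event.get("tool")
    # A tool call outside the run's allow-list is treated as unauthorized.
    if tool is not None and tool not in event.get("allowed_tools", ()):
        return IncidentType.UNAUTHORIZED_TOOL_CALL
    # Assumed flag set by an egress filter when it blocks outbound data.
    if event.get("blocked_egress", False):
        return IncidentType.DATA_EXFILTRATION_ATTEMPT
    # Assumed flag set when the external system's state differs from intent.
    if event.get("action_mismatch", False):
        return IncidentType.INCORRECT_EXTERNAL_ACTION
    # Spend above the run's budget counts as a runaway.
    if event.get("cost_usd", 0.0) > event.get("budget_usd", float("inf")):
        return IncidentType.BUDGET_RUNAWAY
    return None
```

An event that matches none of the checks returns `None`, so ordinary runs do not open incidents.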
The 4-phase runbook
1) Detect
- anomaly alerts: spikes in cost, tool errors, or policy denials
- user-reported issues (support channel)
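A minimal sketch of the anomaly check, assuming each metric (cost, tool errors, policy denials) is sampled per fixed time window. The threshold rule, `factor`, and `min_samples` are illustrative assumptions, not a recommended tuning.

```python
from statistics import mean
from typing import Sequence


def is_spike(
    history: Sequence[float],
    current: float,
    factor: float = 3.0,
    min_samples: int = 5,
) -> bool:
    """Flag `current` if it exceeds `factor` times the mean of `history`."""
    # Too little history to form a baseline: do not alert.
    if len(history) < min_samples:
        return False
    baseline = mean(history)
    # With a zero baseline, any activity at all is unusual.
    if baseline == 0:
        return current > 0
    return current > factor * baseline
```

The same function can be applied to each metric separately, for example cost per window, tool error count, and policy deny count.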
2) Contain
- disable the affected workflow or tool at the policy layer
- rotate tenant-scoped keys if needed
- quarantine run logs and evidence
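Disabling at the policy layer can be as simple as a deny set that every tool call is checked against. This is a sketch under assumptions: `PolicyGate` and its method names are hypothetical, and a real deployment would persist this state rather than hold it in memory.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class PolicyGate:
    """In-memory kill switch consulted before every tool call (hypothetical)."""

    disabled_workflows: Set[str] = field(default_factory=set)
    disabled_tools: Set[str] = field(default_factory=set)

    def disable_workflow(self, workflow: str) -> None:
        self.disabled_workflows.add(workflow)

    def disable_tool(self, tool: str) -> None:
        self.disabled_tools.add(tool)

    def allows(self, workflow: str, tool: str) -> bool:
        # A call is blocked if either its workflow or its tool is disabled.
        return (
            workflow not in self.disabled_workflows
            and tool not in self.disabled_tools
        )
```

Disabling a single tool keeps unrelated workflows running, which limits the blast radius of the containment step itself.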
3) Eradicate
- patch policy rules, tool schema, or prompt template
- add regression tests for the failing case
- verify with eval harness
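A regression test for the failing case can replay the offending tool call against the patched policy and assert the expected decision. The sketch assumes the policy is a plain function from `(tool, args)` to allow/deny; `RegressionCase`, `run_regressions`, and `patched_policy` are hypothetical names used for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RegressionCase:
    name: str
    tool: str
    args: dict
    expect_allowed: bool


def run_regressions(
    policy: Callable[[str, dict], bool],
    cases: List[RegressionCase],
) -> List[str]:
    """Return the names of cases where the policy decision is unexpected."""
    return [
        case.name
        for case in cases
        if policy(case.tool, case.args) != case.expect_allowed
    ]


def patched_policy(tool: str, args: dict) -> bool:
    # Example patch: deletions now require a recorded human approver.
    if tool == "delete_record":
        return bool(args.get("approved_by"))
    return True


CASES = [
    RegressionCase("incident-delete-without-approval", "delete_record", {"id": 7}, False),
    RegressionCase("delete-with-approval", "delete_record", {"id": 7, "approved_by": "ops"}, True),
    RegressionCase("read-still-works", "get_record", {"id": 7}, True),
]
```

An empty result from `run_regressions(patched_policy, CASES)` means the incident case is blocked and the neighboring legitimate cases still pass.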
4) Recover
- re-enable under tighter budgets
- add monitoring and alerts
- communicate to stakeholders
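Re-enabling under tighter budgets can be sketched as a per-run guard whose limits are scaled down after an incident. The class names, the 0.5 scaling factor, and the choice to limit both cost and tool-call count are assumptions for illustration.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a run would exceed its cost or tool-call budget."""


@dataclass
class RunBudget:
    max_cost_usd: float
    max_tool_calls: int
    spent_usd: float = 0.0
    tool_calls: int = 0

    def charge(self, cost_usd: float) -> None:
        """Record one tool call, or raise before the budget is exceeded."""
        if self.tool_calls >= self.max_tool_calls:
            raise BudgetExceeded("tool-call limit reached")
        if self.spent_usd + cost_usd > self.max_cost_usd:
            raise BudgetExceeded("cost limit reached")
        self.tool_calls += 1
        self.spent_usd += cost_usd


def tightened(budget: RunBudget, factor: float = 0.5) -> RunBudget:
    """Return a fresh budget with limits scaled by `factor` (at least 1 call)."""
    return RunBudget(
        max_cost_usd=budget.max_cost_usd * factor,
        max_tool_calls=max(1, int(budget.max_tool_calls * factor)),
    )
```

Checking the limit before recording the charge means a rejected call leaves the counters unchanged, so the run can be stopped cleanly.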
Evidence capture (non-negotiable)
- run envelope (policy hash, route decision)
- tool call ledger
- output diff vs expected
- human approvals (if any)
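The four evidence items can be bundled into one record and sealed with a digest so later tampering is detectable. This is a sketch, not a prescribed schema: the field names and `EvidenceRecord` itself are hypothetical, while `hashlib`, `json`, and `difflib` are Python standard library modules.

```python
import difflib
import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import List


def diff_output(expected: str, actual: str) -> str:
    """Unified diff of expected vs. actual output, as a single string."""
    lines = difflib.unified_diff(
        expected.splitlines(),
        actual.splitlines(),
        fromfile="expected",
        tofile="actual",
        lineterm="",
    )
    return "\n".join(lines)


@dataclass
class EvidenceRecord:
    run_id: str
    policy_hash: str          # run envelope: hash of the policy in force
    route_decision: str       # run envelope: which route the run took
    tool_calls: List[dict]    # tool call ledger, in call order
    output_diff: str          # output diff vs expected
    approvals: List[str] = field(default_factory=list)  # human approvals, if any

    def digest(self) -> str:
        """SHA-256 over a canonical JSON encoding of the record."""
        payload = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Storing the digest separately from the quarantined record gives a simple integrity check during the post-incident review.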