Series

S06: The AI Agent Threat Model

Direct prompt injection was the easy case. The 2026 attacks ride in on tool descriptions, retrieved content, and MCP sampling. Comment-and-Control hit Claude Code, Gemini CLI, and Copilot Agent from one PR comment. The OpenClaw crisis exposed 21,000 framework instances. Platform teams own the defence, not the model vendors.

Defence is layered, platform-enforced, and sandboxed. Tool description allowlists, retrieval provenance, MCP sampling limits, per-conversation sandboxes, and human gates for state-changing actions. The model is one piece of the system. The platform is the rest.

No posts yet

Command Palette