
S03: The Agent Production Stack

The industry spent 2024–2025 building agents like prompts: role-play a "researcher," role-play a "planner," role-play a "critic," wire them together with a prompt, ship to production, debug in Slack.

It didn't work. Research claims that more than 75% of multi-agent systems become unmanageable past five agents. In production, agent behaviour is a black box to observability tooling: your APM shows green while the agent burns $4,200 on a failure mode nobody has a dashboard for. Prompt-injection defences live inside each app, get re-implemented incorrectly, and drift.

The winning 2026 pattern is not "more agents." It is fewer agents, better tools, durable execution, OpenTelemetry gen_ai traces, platform-level guardrails, and SPIFFE-based identity. That is an infrastructure problem, not a modelling problem, which is exactly why this is a Playbook series.

The through-line: every article in this series refuses the "agents are just prompts" framing. The unit of architecture is the platform primitive (traces, state machines, policy, identity), not the agent itself.

Stop building agents like prompts. Build them like state machines.
Durable execution (Temporal, Restate, Inngest), idempotency keys for tool calls, and human-in-the-loop as an interrupt primitive.
Apr 19, 2026 · 11 min read
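The idea in that subtitle can be sketched in a few dozen lines. What follows is a minimal, hand-rolled Python sketch, not the API of Temporal, Restate, or Inngest. It models the agent as an explicit state machine and keys each tool call with a deterministic idempotency key, so a retried step replays its recorded result instead of firing the side effect twice. Human review is modelled as an interrupt that suspends the run until a decision is recorded. All names (`AgentRun`, `ToolLedger`, `HumanInputRequired`, `issue_refund`) are hypothetical, and the in-memory ledger stands in for the durable storage a durable-execution engine would provide.

```python
# Hypothetical names throughout; the in-memory ledger is a stand-in for durable storage.
import hashlib
import json
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Any, Callable, Dict, Optional


class State(Enum):
    PLANNING = auto()
    CALLING_TOOL = auto()
    AWAITING_HUMAN = auto()
    DONE = auto()


class HumanInputRequired(Exception):
    """Interrupt: the run suspends here until a human decision is recorded."""


def idempotency_key(run_id: str, step: int, tool: str, args: Dict[str, Any]) -> str:
    # Same run, step, tool and arguments always hash to the same key,
    # so a retried step is recognised as a repeat of one already done.
    payload = json.dumps([run_id, step, tool, args], sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass
class ToolLedger:
    """In-memory stand-in for a durable record of completed tool calls."""
    results: Dict[str, Any] = field(default_factory=dict)
    executions: int = 0

    def call(self, key: str, fn: Callable[..., Any], args: Dict[str, Any]) -> Any:
        if key in self.results:          # seen before: replay, don't re-execute
            return self.results[key]
        self.executions += 1
        self.results[key] = fn(**args)   # recorded only after the call returns
        return self.results[key]


@dataclass
class AgentRun:
    run_id: str
    ledger: ToolLedger
    state: State = State.PLANNING
    step: int = 0
    approved: Optional[bool] = None
    output: Any = None

    def advance(self, tool: Callable[..., Any], args: Dict[str, Any]) -> State:
        """Perform exactly one explicit transition and return the new state."""
        if self.state is State.PLANNING:
            self.state = State.CALLING_TOOL
        elif self.state is State.CALLING_TOOL:
            key = idempotency_key(self.run_id, self.step, tool.__name__, args)
            self.output = self.ledger.call(key, tool, args)
            self.state = State.AWAITING_HUMAN
        elif self.state is State.AWAITING_HUMAN:
            if self.approved is None:
                raise HumanInputRequired(self.run_id)
            # Approval finishes the run; rejection sends it back to planning.
            self.state = State.DONE if self.approved else State.PLANNING
            self.step += 1          # next tool call gets a fresh key
            self.approved = None    # each review needs its own decision
        return self.state


def issue_refund(order_id: str, amount: int) -> str:
    # Hypothetical side-effecting tool.
    return f"refund:{order_id}:{amount}"


ledger = ToolLedger()
run = AgentRun("run-42", ledger)
args = {"order_id": "A17", "amount": 40}
run.advance(issue_refund, args)                  # PLANNING -> CALLING_TOOL
run.advance(issue_refund, args)                  # executes the tool once
crashed_retry = AgentRun("run-42", ledger, state=State.CALLING_TOOL)
crashed_retry.advance(issue_refund, args)        # same key: replayed from the ledger
try:
    run.advance(issue_refund, args)              # suspends for a human
except HumanInputRequired:
    run.approved = True                          # the human's decision arrives
run.advance(issue_refund, args)                  # AWAITING_HUMAN -> DONE
```

The design choice that matters is deriving the key from the run ID and step number rather than from a random UUID: a retry after a crash recomputes the same key, so the ledger can recognise it. This sketch records a result only after the tool returns, so a crash mid-call can still re-execute the side effect; closing that gap typically requires the downstream API to accept the idempotency key as well.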
