Blast radius: bound what one turn can touch · The Builder's Stack


Your follow-up agent drafts replies inside sales threads and sends them once a rep approves. One afternoon one turn goes wrong. Maybe the model produced a confident misreading, maybe a malformed thread set it up; the cause will matter tomorrow. Today the only question is what that turn could reach. If the send tool accepts any address and the token behind it can read the whole CRM, then a feature scoped to "reply in this thread" was, for one bad turn, a system that could message every contact your company has. Nobody designed that reach; it accumulated one convenient permission at a time, and you would have met it for the first time in the incident review.

That reach is the blast radius.

The blast radius is everything a single turn can touch before something outside the model stops it.

This chapter asks you to assume the turn goes wrong, through bad output, bad input, or bad luck, and to decide what it can reach when it does. The radius is a design decision, not a discovery, and if you skip the decision you will make the discovery in production.

The model does not set the radius, your infrastructure does

"The action surface: every tool is delegated authority" gave you the inventory, and "The autonomy ladder: place every action deliberately" set how much supervision each action gets. The radius is the third question, and it decides the stakes of the other two, because it answers how far a placed, supervised action carries when it misfires anyway. None of the answer lives in the model or the prompt; it lives in the scopes on the credential, the caps in the executor, the environment the agent runs in, and how fast you can stop it. Two products can run the same model on the same instructions, and one bad turn can cost a draft in the first and the whole account in the second.

What an unbounded radius costs you

The clearest example is older than any model you will ship. On August 1, 2012, the trading firm Knight Capital deployed an update to its automated trading system incorrectly, and the system began sending unintended orders at machine speed. Roughly 440 million dollars was gone in about 45 minutes. There was no effective way to stop the system while the event ran, and the loss left the firm needing an emergency rescue within days.

The lesson travels precisely because there was no model in the loop and no prompt to harden. An automated system held live authority, nothing bounded what it could reach, and nothing could stop it mid-flight. The trading itself does not matter here. Every tool in this chapter exists to be the thing that was missing that morning, either a bound the turn cannot cross or a switch that stops it while it runs.

Bound the turn before it runs

Each of these bounds is enforced in infrastructure, where no model output can reach it.

Least privilege, per session and per turn. A turn carries only the authority its current action needs, and the authority expires with the session. In practice that means scoped tokens with short lifetimes instead of a standing master key, allowlists for recipients and destinations (the reply tool can address the thread's participants, not the address book), and a sandbox around anything that executes. Sandboxed execution is now the norm across agent products for a reason: code the agent writes runs in a disposable environment with no production credentials and almost no reach, so the worst script costs you the sandbox.
Hard caps on money and volume. Set a per-user spend ceiling, a per-session action count, and a per-turn limit on rows written or messages sent. A cap is not a quality check; it sets your worst case in advance, so if a turn can send at most five messages, even a fully compromised turn is a five-message incident no matter what the output requested. Size each cap from the legitimate job, meaning what a good turn realistically needs plus some margin, then let it hold.
Environment separation. An agent in an experiment never meets production data, and an agent in production holds credentials only for the systems its feature needs. The separation you built in "Guardrails: keep secrets, money, and data safe" matters more once a loop generates its own next step, because the path from a test prompt to a production write no longer needs a human to walk it. Eval runs and load tests get staging copies and synthetic records, never live customers.
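The first two bounds can be sketched in a few lines of executor code. This is a minimal illustration under stated assumptions, not a reference implementation: `TurnBounds`, `send_reply`, `BoundExceeded`, and the `deliver` callback are hypothetical names, and a real system would load the allowlist from the thread record and the cap from configuration.

```python
from dataclasses import dataclass


class BoundExceeded(Exception):
    """Raised when a requested action falls outside the turn's bounds."""


@dataclass
class TurnBounds:
    # Hypothetical shape: the only addresses this turn may message,
    # and the most messages it may send.
    allowed_recipients: frozenset[str]
    max_messages: int
    sent: int = 0


def send_reply(bounds: TurnBounds, recipient: str, body: str, deliver) -> None:
    """Check both bounds in the executor, then hand off to the real sender."""
    if recipient.lower() not in bounds.allowed_recipients:
        raise BoundExceeded(f"{recipient} is not a participant in this thread")
    if bounds.sent >= bounds.max_messages:
        raise BoundExceeded(f"per-turn cap of {bounds.max_messages} messages reached")
    deliver(recipient, body)  # `deliver` stands in for your mail client
    bounds.sent += 1


# Usage: the allowlist comes from the thread record, never from model output.
outbox = []
bounds = TurnBounds(
    allowed_recipients=frozenset({"buyer@example.com", "rep@example.com"}),
    max_messages=2,
)
send_reply(bounds, "buyer@example.com", "Following up on pricing.",
           lambda r, b: outbox.append((r, b)))
```

Both checks run before the send, in code the model's output never reaches, so a turn that asks for a third message or a stranger's address gets an exception instead of a delivery.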

Add a kill switch that stops the agent mid-action

You probably have a way to stop the agent between runs, a flag to flip or a queue to drain. That is not enough, because a bad turn does not wait for the end of the run to do its damage; Knight's system kept trading for the whole event because nothing in place could stop it mid-flight. A real kill switch stops the loop mid-action: the executor checks a stop flag before every tool call rather than once at session start, in-flight work halts in a state you can reconcile, and flipping it needs no deploy and no particular engineer on call. Once you have one, rehearse it: pull the switch under production conditions on a quiet day, time how long the agent keeps acting after you decide to stop, and treat that number as part of the radius. "Monitoring, how you know it broke (and what it costs)" tells you when to pull the switch; the switch itself decides whether the damage stops after one action or runs for the rest of the hour.
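A stop flag checked before every tool call might look like the sketch below. It is illustrative only: `STOP`, `run_turn`, and `execute` are hypothetical names, and the in-process `threading.Event` stands in for whatever shared flag (a feature-flag service, a database row) you can flip without a deploy.

```python
import threading

# Hypothetical in-process flag; assume a shared, externally flippable
# flag in a real deployment.
STOP = threading.Event()


def run_turn(planned_calls, execute):
    """Run tool calls in order, checking the stop flag before each one.

    Returns (completed_calls, halted) so that halted work can be reconciled.
    """
    completed = []
    for call in planned_calls:
        if STOP.is_set():
            return completed, True
        execute(call)
        completed.append(call)
    return completed, False


# Usage: simulate an operator pulling the switch while the second call runs.
log = []


def execute(call):
    log.append(call)
    if call == "send:2":
        STOP.set()


completed, halted = run_turn(["send:1", "send:2", "send:3"], execute)
```

The call that was already running finishes, the third never starts, and the returned list records exactly what went out. During a rehearsal, the gap between flipping the flag and the last completed call is the number to time.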

Try it now

The drill takes about fifteen minutes and runs on your own agent feature, real or planned.

Compute today's radius. Gather every tool definition and the credential behind each one: token scopes, API keys, database grants. For each tool, write down the worst a single compromised turn could reach with today's permissions: which recipients, which records, which dollar amounts, which environments. The fast version is to paste the definitions and scopes into Claude Code and ask for the worst-case single-turn reach per tool, meaning the most damaging case rather than the most likely one.

Write the radius as one sentence. State the honest worst case in one line, in the form of "one bad turn can email every contact in the CRM and write to the production orders table." Most teams find the sentence names authority nobody remembers granting.

Add the two caps that shrink it most. Pick the two bounds that cut the most reach for the least product cost. A recipient allowlist and a per-session action cap are the usual winners, with an expiring scoped token close behind.

Re-compute and keep both sentences. Write the new worst-case line under the bounds you chose, and put the before and after where the team can see them. The distance between the two sentences is risk you removed without touching the model, the prompt, or the feature.
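If it helps to keep the drill's output in the repository, the inventory and both sentences fit in a short script. This is one possible sketch: `ToolGrant` and `radius_sentence` are hypothetical names, the tool names and "after" entries are invented for illustration, and a person still writes each worst case by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolGrant:
    tool: str
    worst_case: str  # worst single-turn reach, written by a human reviewer


def radius_sentence(grants):
    """Join per-tool worst cases into the one-line radius statement."""
    reaches = [g.worst_case for g in grants]
    if not reaches:
        return "One bad turn can reach nothing."
    if len(reaches) == 1:
        joined = reaches[0]
    else:
        joined = ", ".join(reaches[:-1]) + " and " + reaches[-1]
    return f"One bad turn can {joined}."


# Illustrative entries only; replace them with your own tools and grants.
before = [
    ToolGrant("send_email", "email every contact in the CRM"),
    ToolGrant("db_write", "write to the production orders table"),
]
after = [
    ToolGrant("send_email", "email the participants of one thread"),
    ToolGrant("db_write", "write at most five rows to the production orders table"),
]

print("Before:", radius_sentence(before))
print("After: ", radius_sentence(after))
```

Keeping the two lists side by side in version control means any new grant shows up as a diff to the sentence, which is where the team should notice it.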

Chapter Summary

The blast radius is everything one turn can reach before something outside the model stops it. Assume the turn goes wrong and set that reach yourself, before production sets it for you.
The radius is set by your infrastructure, not the model or the prompt: the scopes on the credential, the caps in the executor, the environment, and how fast you can stop the agent.
Give each turn only the authority its current action needs, and make it expire with the session. Use scoped, short-lived tokens and allowlists for recipients and destinations, not a standing master key.
Put hard caps on spend, action counts, and volume so a fully compromised turn is still a small, known incident rather than an open-ended one.
Keep environments separate, so an experiment never touches production data and a production agent holds credentials only for the systems its feature needs.
A cap written into the prompt is not a cap. A limit only counts when the executor, the token, or the database grant enforces it where no output can reach it.
Add a kill switch that stops the loop mid-action, not just between runs, then rehearse it and time how long the agent keeps acting after you pull it.
Knight Capital shows what machine-speed authority costs with no bound and no switch, and it needed no model to get there.
Everything here assumed the bad turn arrives by accident. Next, "Injection: the input is the attack surface" covers the bad turn that an attacker deliberately engineers, and the bounds you just set are what make that turn survivable.

Sources

U.S. Securities and Exchange Commission (2013). Administrative order in the matter of Knight Capital Americas LLC.
Press and post-incident reporting on the Knight Capital trading event (2012).
Published agent-building guidance from major AI labs on tool scoping, least privilege, and sandboxed execution (2024).
