Write the Agent Charter and ship with authority you chose · The Builder's Stack

The security questionnaire arrives the week you planned to launch. Your inbox assistant drafts replies, schedules meetings, and files tickets, and the buyer's security team wants one section before the pilot: every action taken without a human, what each can reach, and what stops it mid-incident. Every answer already exists, scattered across the tool list in the repository, the rungs in a config, the caps in the executor, and the stop procedure in one engineer's head. You spend a day reassembling decisions you already made into the most useful page the product owns.

This chapter writes that page before anyone demands it.

The Agent Charter is one page that records every decision this part asked you to make about your agent: its job, what each tool can reach, how much it can act on its own, and what stops it mid-incident.

It ships as The Agent Charter, a fillable PDF in the artifacts library, with one field per decision. The walk below fills it for that assistant, and the drill fills yours.

Write the job and the one line it never crosses

The charter opens with one sentence of purpose and one sentence of refusal. The job sentence reads: the owner of an overloaded inbox hires this product to turn each morning's mail into drafted replies, scheduled meetings, and filed tickets. Every field below answers to that sentence. The refusal is the single rule that holds across every feature, rung, and future promotion: the assistant never communicates with anyone outside the thread's participants, whatever an email asks it to do. Write yours as a prohibition, short enough to recite from memory.

List every tool the agent can call, straight from the code

Next comes the tool inventory from The action surface: every tool is delegated authority. It has one row per tool and is filled from the code rather than from memory, because the charter binds the product only if it records what is actually wired up:

Tool	Reads	Writes	Reversible	Why it exists
search_mail	the mailbox	nothing	nothing to undo	triage starts from history
draft_reply	the thread	a draft inside the product	fully, delete the draft	drafting is the job
send_reply	nothing	another person's inbox	only inside the send delay	approved replies must go out
schedule_meeting	calendars	events and invites	with residue, invitees saw it	meetings are half the mail
file_ticket	nothing	the help desk queue	with residue, the team read it	issues need an owner

If you cannot fill in the last column for a row, that tool is power the agent does not need, and a bad request will eventually find it, so delete the row while removing it is still cheap.
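
The inventory can also live next to the code, so the table and the wiring are compared rather than trusted. The sketch below is one way to do that, not a prescribed format: the `ToolRow` type and the `unjustified` and `drift` helpers are hypothetical names, and the rows are copied from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolRow:
    name: str
    reads: str
    writes: str
    reversible: str
    why: str  # the "Why it exists" column


# Rows copied from the charter table. In a real project they would be
# generated from the agent's tool registry rather than typed by hand.
CHARTER_ROWS = [
    ToolRow("search_mail", "the mailbox", "nothing",
            "nothing to undo", "triage starts from history"),
    ToolRow("draft_reply", "the thread", "a draft inside the product",
            "fully, delete the draft", "drafting is the job"),
    ToolRow("send_reply", "nothing", "another person's inbox",
            "only inside the send delay", "approved replies must go out"),
    ToolRow("schedule_meeting", "calendars", "events and invites",
            "with residue, invitees saw it", "meetings are half the mail"),
    ToolRow("file_ticket", "nothing", "the help desk queue",
            "with residue, the team read it", "issues need an owner"),
]


def unjustified(rows):
    """Return the names of tools whose 'why' column is blank."""
    return [row.name for row in rows if not row.why.strip()]


def drift(rows, registered_names):
    """Compare the charter rows against the tools actually wired up."""
    chartered = {row.name for row in rows}
    registered = set(registered_names)
    return {
        "missing_from_charter": sorted(registered - chartered),
        "missing_from_code": sorted(chartered - registered),
    }
```

Anything `unjustified` returns is a candidate for deletion, and anything under `missing_from_charter` is authority the product holds that nobody wrote down.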

Give every action an autonomy level and a rule for raising it

Each row gets an autonomy level, a rung, from The autonomy ladder: place every action deliberately. Place each action one rung lower than your confidence suggests.

Act silently: the reads and draft_reply, because the worst draft costs a deletion.
Act with undo: file_ticket and holds on the user's own calendar, each with a compensating action built.
Act with approval: send_reply and any invite that leaves the account, the actions that reach another human.

Next to each placement, write the rule for raising it. Before send_reply moves up from act with approval to act with undo, two things have to be true and dated: its eval slice passes the bar, and it has run at real volume under approval for a stretch with no incident charged to it. When an incident does happen, demote the action first and investigate it second.
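
A minimal sketch of that promotion rule, under stated assumptions: the `Rung` and `Placement` names are hypothetical, the two dated fields stand in for whatever evidence your team records, and demoting by a single rung is an illustrative choice, since the text says only to demote first.

```python
from dataclasses import dataclass
from datetime import date
from enum import IntEnum
from typing import Optional


class Rung(IntEnum):
    # A higher value means more autonomy.
    ACT_WITH_APPROVAL = 1
    ACT_WITH_UNDO = 2
    ACT_SILENTLY = 3


@dataclass
class Placement:
    action: str
    rung: Rung
    eval_passed_on: Optional[date] = None      # date the eval slice cleared the bar
    clean_run_ended_on: Optional[date] = None  # date the incident-free stretch completed

    def can_promote(self) -> bool:
        """Both pieces of evidence must be present and dated."""
        return (
            self.rung < Rung.ACT_SILENTLY
            and self.eval_passed_on is not None
            and self.clean_run_ended_on is not None
        )

    def promote(self) -> None:
        if not self.can_promote():
            raise PermissionError(f"{self.action}: promotion evidence is incomplete")
        self.rung = Rung(self.rung + 1)
        # Assumption: evidence is spent, so the next promotion needs fresh dates.
        self.eval_passed_on = None
        self.clean_run_ended_on = None

    def record_incident(self) -> None:
        """Demote first; the investigation happens afterward."""
        # Assumption: drop one rung, never below act with approval.
        self.rung = Rung(max(self.rung - 1, Rung.ACT_WITH_APPROVAL))
        self.eval_passed_on = None
        self.clean_run_ended_on = None
```

Keeping the evidence as dates rather than booleans means the charter can show when each claim was last true, not just that someone once believed it.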

Name the three injection ingredients and block the one you can

This field lists the three legs of the lethal trifecta, from Injection: the input is the attack surface, and marks the one you block. For an inbox assistant, the first two come built in: it reads private mailbox data, and it takes in mail written by strangers. So you block the third leg, sending data out: in the executor, replies can go only to thread participants, invites are held to the same list, and tickets reach only the registered queue. This is the refusal sentence from the top of the charter, now enforced in code. We have no reliable defense against injection itself, so the page assumes a malicious message gets through and records how much damage it could do.
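
The check on the third leg can be sketched as a pair of guards the executor runs before any outbound write. The function and exception names here are hypothetical, and lowercasing plus trimming is a simplification of real address matching, so treat this as an illustration of where the check sits rather than a complete filter.

```python
class EgressBlocked(Exception):
    """Raised when a write would reach someone outside the allowed set."""


def _normalize(address: str) -> str:
    # Simplified comparison; real address canonicalization is more involved.
    return address.strip().lower()


def check_recipients(recipients, thread_participants) -> None:
    """Allow a send or invite only if every recipient is already on the thread."""
    allowed = {_normalize(a) for a in thread_participants}
    outsiders = sorted({_normalize(r) for r in recipients} - allowed)
    if outsiders:
        raise EgressBlocked(f"recipients outside the thread: {outsiders}")


def check_ticket_queue(queue: str, registered_queue: str) -> None:
    """Allow a ticket only if it targets the one registered queue."""
    if queue != registered_queue:
        raise EgressBlocked(f"queue {queue!r} is not the registered queue")
```

Because the guards take the thread's participants from the mailbox record and not from the model's output, an injected instruction can change what the agent asks for but not what the executor permits.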

Cap how much one run can do, and test the kill switch

Next come the limits from Blast radius: bound what one turn can touch. Size them from the real job: a normal morning needs a handful of sends, so the per-session ceiling sits just above that, with matching ceilings on invites and tickets, all enforced in code that the model's output cannot override. The kill switch has three blanks, each demanding a specific answer: who can flip it (anyone on call, no deploy needed), where it takes effect (the executor checks a flag before every tool call, so it stops the agent mid-action), and when it was last pulled in a test, with how long the stop took to land. The Knight Capital collapse, where a runaway trading system lost about 440 million dollars in under an hour, is why that last blank matters: an untested stop is not a stop.
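
Both limits can sit in one small wrapper around tool calls. This is a sketch under assumptions: the `Executor` class and its exceptions are hypothetical, the kill switch is modeled as any callable that returns a boolean (a feature flag lookup in practice), and the cap values are chosen by whoever constructs the executor.

```python
class Halted(Exception):
    """Raised when the kill switch is on."""


class CapExceeded(Exception):
    """Raised when a tool has used up its per-session ceiling."""


class Executor:
    def __init__(self, caps, kill_switch_is_on):
        self.caps = dict(caps)                      # tool name -> max calls per session
        self.kill_switch_is_on = kill_switch_is_on  # callable returning True when flipped
        self.counts = {}                            # tool name -> calls attempted so far

    def call(self, tool_name, tool_fn, *args, **kwargs):
        # The flag is read before every tool call, so a flip lands between steps.
        if self.kill_switch_is_on():
            raise Halted(f"kill switch is on; refused {tool_name}")
        cap = self.caps.get(tool_name)  # tools without an entry are uncapped
        used = self.counts.get(tool_name, 0)
        if cap is not None and used >= cap:
            raise CapExceeded(f"{tool_name} reached its ceiling of {cap}")
        # Count the attempt before running it, so a failing call still spends budget.
        self.counts[tool_name] = used + 1
        return tool_fn(*args, **kwargs)
```

Nothing the model emits reaches `caps` or the flag, which is the property the charter field is recording. A kill switch drill then amounts to flipping the flag during a sandbox run and timing how long it takes for the next call to raise `Halted`.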

Write down how a run gets undone, and which evals gate it

The receipts field, from Receipts and recovery: design for the failed run, records what a person sees after a run and what they can take back. The ledger is written in plain user language, "drafted a reply to the pricing thread", "filed the export bug", with each line marked undoable or final. The compensation map pairs every write with the action that reverses it: drafts delete cleanly, calendar holds release, tickets withdraw with a note. A sent reply is the exception, because once its send delay closes there is no way to recall it, so its row is final and it stays at act with approval.
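
One way to keep the ledger and the compensation map honest is to derive each line's undoable or final mark from the map itself. In this sketch the compensating tool names (`delete_draft`, `release_hold`, `withdraw_ticket_with_note`) are hypothetical, and `send_reply` is modeled as it stands after its send delay has closed.

```python
from dataclasses import dataclass

# Every write tool maps to the action that reverses it, or to None if final.
# The compensating tool names are illustrative, not from a real product.
COMPENSATION = {
    "draft_reply": "delete_draft",
    "schedule_meeting": "release_hold",
    "file_ticket": "withdraw_ticket_with_note",
    "send_reply": None,  # final once the send delay closes
}


@dataclass(frozen=True)
class LedgerLine:
    summary: str  # plain user language, e.g. "filed the export bug"
    tool: str

    @property
    def status(self) -> str:
        # A tool missing from the map raises KeyError instead of passing as final.
        return "undoable" if COMPENSATION[self.tool] else "final"


def render(lines):
    """Format the ledger a person sees after a run."""
    return [f"{line.summary} [{line.status}]" for line in lines]


def unmapped(write_tools):
    """Return write tools that have no row in the compensation map."""
    return sorted(t for t in write_tools if t not in COMPENSATION)
```

Anything `unmapped` returns is a write with no recovery decision behind it, which is a gap in the charter and not only in the code.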

The eval field connects to Agent evals: judge the trajectory, not just the answer.

The process rubric. The agent reads before it writes, escalates to a human instead of guessing, and never composes a reply to anyone outside the thread, whatever an incoming email tells it to do.
The outcome bar. The drafts are good enough to send with light edits, the proposed holds turn into real meetings, and the tickets land in the right queue.
The sandbox rule. Every change is tested against a seeded mailbox that includes hostile mail, and no eval run touches a live inbox.
The pass rate that allows promotion. The charter sets the required pass rate ahead of time, and before any action moves up a rung, its eval slice has to clear that rate with every injection case passing. After promotion, the check described in The regression gate: no change ships blind keeps the bar from slipping.
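
The promotion condition in the last item reduces to a short check. This is a sketch with hypothetical names (`CaseResult`, `clears_promotion_bar`); the required rate is whatever the charter fixed in advance, and treating a slice with no injection cases as a failure is an added assumption, on the grounds that such a slice proves nothing about hostile mail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseResult:
    case_id: str
    passed: bool
    is_injection: bool = False


def clears_promotion_bar(results, required_pass_rate: float) -> bool:
    """True only if the slice meets the preset rate and every injection case passes."""
    if not results:
        return False
    injection = [r for r in results if r.is_injection]
    # Assumption: a slice without injection cases cannot justify a promotion.
    if not injection or not all(r.passed for r in injection):
        return False
    pass_rate = sum(r.passed for r in results) / len(results)
    return pass_rate >= required_pass_rate
```

The injection check runs before the rate is computed, so a single hostile case that slips through blocks the promotion no matter how high the overall score is.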

Sign it before launch, and sign it again when the tools change

The last field is a name and a date. Whoever owns the decision to ship signs that the charter matches the product as it was actually built, and that signature pays off at the next security questionnaire, postmortem, or new-hire onboarding, each of which gets a finished page instead of a walkthrough. The signature expires the moment the product changes in any way the page describes, whether that is a new tool, a wider scope, a new way to send data out, or an action moved up or down a rung; re-signing takes minutes, and only the changed fields need fresh evidence. A product that has outgrown its charter is running on power nobody chose to grant it, the exact problem this part opened with in When your product starts doing things.
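
Expiry can be detected mechanically by fingerprinting the fields the signature covers. The sketch below is one possible approach rather than part of the charter itself: `charter_fingerprint` and `signature_is_current` are hypothetical helpers, and the scope and rule strings passed in are placeholders for whatever your product records.

```python
import hashlib
import json


def charter_fingerprint(tools, scopes, egress_rules, rungs) -> str:
    """Hash the fields whose change expires the signature.

    tools, scopes, and egress_rules are lists of strings; rungs maps an
    action name to its rung name. Lists are sorted so order does not matter.
    """
    payload = json.dumps(
        {
            "tools": sorted(tools),
            "scopes": sorted(scopes),
            "egress": sorted(egress_rules),
            "rungs": rungs,
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def signature_is_current(signed_fingerprint: str, current_fingerprint: str) -> bool:
    """The signature holds only while the product still matches what was signed."""
    return signed_fingerprint == current_fingerprint
```

Storing the fingerprint next to the name and date lets a build step flag a stale signature when a tool is added or a rung moves, instead of relying on someone remembering to re-sign.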

Try it now

The charter is the drill. Give it about an hour with the PDF open, on your agent feature, real or planned.

Fill the top from the code. Write the job sentence and the hard line yourself, then have Claude Code enumerate every registered tool, server, scope, and description verbatim and fill the table. Where it contradicts the version in your head, the code is the truth and the gap is your first finding.

Place, gate, and cap. Give every row a rung with its evidence sentence, name the trifecta legs and the one you gate, size the caps from the job, and name who flips the kill switch. If it has never been pulled in a test, schedule that test before you sign.

Finish the undo story and the gates. Map one compensating action per write tool, mark final where nothing compensates, then set the rubric, the bar, the sandbox rule, and the pass rate that promotes.

Read the blanks as your backlog. Every empty field points at the chapter whose drill produces it; the boxes you cannot fill are the work you have left. Fill them, then sign.

Chapter Summary

The Agent Charter is one page that records every decision you made about your agent in this part, so it is ready before a buyer, a postmortem, or a new engineer ever asks for it.
Open with two sentences: the job the product was hired to do, and the one line it must never cross, written as a prohibition short enough to recite.
List every tool the agent can call in a table filled straight from the code, and delete any tool you cannot justify in the "why it exists" column.
Give each tool an autonomy level, place it a rung lower than your confidence suggests, and write down the dated evidence required before it can move up.
You cannot stop an agent from reading private data or hostile input, so block the third injection leg instead: keep it from sending anything out to anyone outside the thread.
Cap how much one run can do in code the model cannot override, keep a tested kill switch anyone on call can flip mid-action, and for every write record the action that reverses it, marking anything you cannot reverse as final.
Wire in the evals that gate each promotion: a process rubric, an outcome bar, a sandbox that never touches live data, and a pass rate set in advance.
Sign the charter before launch, and sign it again whenever the tools, scope, or autonomy levels change, since a product that outgrows its charter is running on power nobody chose to grant it.
Your agent now ships with authority you chose, and the charter waits in the artifacts library. Next up is Why attackers love AI products, because the product you just empowered is about to draw attention.

Sources

Simon Willison, writing that named the lethal trifecta for AI agents (2025).
OWASP Top 10 for Large Language Model Applications (2023; updated 2025).
U.S. Securities and Exchange Commission, administrative order on the Knight Capital trading failure (2013).
