Write your Orchestration Plan and run your first fleet · The Builder's Stack


The folder lands on your desk: a forty-page research dump of analyst notes, sales-call fragments, and competitor changelogs, with a structured competitive brief due by the end of the week. You have run a fleet on work like this before, which is exactly the problem, because that run survives only in fragments. The cut-list was a message typed to an orchestrator, the budget was a number in your head, the gates were wherever you happened to pause, and the transcript tying it together is three projects back. Every decision that made the run work has to be made again at full price.

The work itself did not get any harder, but you are missing the one page where all those decisions live, and writing that page is the last skill in this part.

One page that makes a fleet repeatable

Each chapter in this part settled one running decision: whether to run a fleet at all, where to cut the work, which pattern to run it in, who checks the output, what it costs, how it fails, and where you stand while it runs.

The Orchestration Plan is the one page where those answers stop living in your head and become something the next run starts from.

It ships as a fillable PDF in the artifacts library and stays one page, because it only works if you read it at kickoff and rewrite it in minutes at the close.

Today the page lets you run the fleet, because every contract, check, and ceiling the agents need is right there to paste in. Next quarter it lets you repeat the job, when someone asks for the same brief on a new market. The rest of the chapter fills it in for the brief due by the end of the week.

Write the task, the success bar, and the cut-list

The task, in two sentences. Write what goes in and what comes out: the dump becomes a competitive brief, one section per competitor named in the folder index, every claim cited to a page. A task that needs a paragraph is several tasks.

The success bar, stated so a stranger could grade it. For the brief: a reader who never opens the dump can state each competitor's positioning, pricing model, and latest move, and any claim spot-checked against its page reference holds up. If grading would need a conversation with you, the bar is not written yet.

The cut-list, with contracts. Write one line per item in the three-clause form from Decomposition: split work so agents never collide. The dump cuts by entity: each competitor item gets its tagged pages, the template, and the settled decisions; returns one section at an agreed path with page citations; must not touch the other sections or the template. The cross-competitor summary is its own item, the only one that reads every section. Finish the cut-list with one collision check: no entry may appear in two must-not-touch lines.
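The contracts are regular enough to hold as data, and once they are, the collision check becomes a script instead of a read-through. Below is a minimal Python sketch under stated assumptions. The names CutItem and collisions and the example paths are hypothetical. How the rule becomes code is also an assumption: the sketch turns it into two concrete tests, that no output path is returned by two items and that no item returns a path its own must-not-touch list forbids.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class CutItem:
    """One cut-list line in the three-clause form (hypothetical structure)."""
    name: str
    gets: list[str]                   # inputs the item receives
    returns: str                      # the single agreed output path
    must_not_touch: list[str] = field(default_factory=list)


def collisions(items: list[CutItem]) -> list[str]:
    """Return readable problems; an empty list means the cut-list passes."""
    problems = []
    # Assumed test 1: no output path may be claimed by two items.
    for path, n in Counter(item.returns for item in items).items():
        if n > 1:
            problems.append(f"{path} is returned by {n} items")
    # Assumed test 2: no item may write a path its own contract forbids.
    for item in items:
        if item.returns in item.must_not_touch:
            problems.append(f"{item.name} returns {item.returns}, which it must not touch")
    return problems
```

For the brief, each competitor item gets its tagged pages and the template, and returns one file under sections/. The summary item reads every section and must not touch any of them. A cut-list that passes returns an empty list, so the check can gate the launch.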

Choose the pattern, the checks, and the budget

The pattern, with one line of why. Pick from the menu in Patterns: fan-outs, pipelines, and judge panels, then write the reason beside it, the one-sentence defense from Why one agent stops being enough. For the brief, the pattern is a fan-out of uniform section writers, with a verifier joining each section as it lands and a single barrier before the summary, which has to wait because it is the one step that reads every finished section at once.
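The shape of that pattern can be sketched with Python's asyncio. This is a minimal illustration, not a harness: write_section, verify_section, and summarize are hypothetical stand-ins for agent calls. Each writer's task chains its own verification, so a section is checked the moment it lands. asyncio.gather is the single barrier that the summary waits behind.

```python
from __future__ import annotations

import asyncio


async def write_section(competitor: str) -> str:
    # Hypothetical stand-in for a writer agent call.
    await asyncio.sleep(0)
    return f"## {competitor}\n(section body)"


async def verify_section(competitor: str, section: str) -> bool:
    # Hypothetical stand-in for the per-section verifier.
    await asyncio.sleep(0)
    return section.startswith(f"## {competitor}")


async def summarize(sections: list[str]) -> str:
    # Hypothetical stand-in for the cross-competitor summary writer.
    await asyncio.sleep(0)
    return "\n\n".join(sections)


async def write_then_verify(competitor: str) -> tuple[str, str, bool]:
    section = await write_section(competitor)
    # The verifier joins as soon as this section lands, without waiting for the others.
    ok = await verify_section(competitor, section)
    return competitor, section, ok


async def run_brief(competitors: list[str]) -> tuple[dict[str, bool], str]:
    # Fan-out: one uniform writer per competitor, all started at once.
    tasks = [asyncio.create_task(write_then_verify(c)) for c in competitors]
    # The single barrier: gather returns only when every write-and-verify pair is done.
    results = await asyncio.gather(*tasks)
    verdicts = {c: ok for c, _, ok in results}
    # Assumption of this sketch: only sections that passed verification reach the summary.
    passed = [section for _, section, ok in results if ok]
    summary = await summarize(passed)
    return verdicts, summary
```

Run it with asyncio.run(run_brief(["acme", "globex"])). The design choice worth noticing is where the barrier sits. Verification is not behind it, so the only step that waits for the whole fleet is the one that genuinely needs every section.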

The verification stages, deterministic first. Scripts run before models, the order set out in Verification: make the fleet check its own work. For the brief, the scripts confirm that the section matches the template, that every claim carries a page reference, and that every reference points inside the dump. Judgment runs second: one verifier per section, tasked with refuting each claim against its cited page. Then write the authority line: mechanical defects are fixed in place and logged, and structural misses send the item back to its writer.
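The three script checks are cheap enough to write once and run on every section before any model sees it. This is a minimal sketch built on assumptions the chapter does not fix. The template is assumed to require three headings, taken from the success bar. Every non-heading line is assumed to be a claim ending in a citation like "[p. 12]". The name check_section is hypothetical.

```python
import re

# Assumed template: one heading per thing the success bar asks a reader to recover.
REQUIRED_HEADINGS = ["### Positioning", "### Pricing model", "### Latest move"]
# Assumed citation format: every claim line ends with a reference such as "[p. 12]".
CITATION = re.compile(r"\[p\. (\d+)\]\s*$")


def check_section(text: str, dump_pages: int) -> list[str]:
    """Deterministic checks, run before any model-based verifier."""
    defects = []
    lines = [ln.strip() for ln in text.splitlines()]
    # 1. The section matches the template: every required heading is present.
    for heading in REQUIRED_HEADINGS:
        if heading not in lines:
            defects.append(f"missing heading: {heading}")
    for ln in lines:
        if not ln or ln.startswith("#"):
            continue  # headings and blank lines are not claims
        match = CITATION.search(ln)
        if match is None:
            # 2. Every claim carries a page reference.
            defects.append(f"uncited claim: {ln}")
        elif not 1 <= int(match.group(1)) <= dump_pages:
            # 3. Every reference points inside the dump.
            defects.append(f"reference outside dump: {ln}")
    return defects
```

Against a forty-page dump, a clean section returns an empty list. A section with an uncited line and a citation to page 41 returns one defect for each. Only sections that come back clean need to reach a verifier, which keeps the expensive judgment stage for questions a script cannot answer.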

The budget, with the ceiling rule attached. Put the cap where your tooling can enforce it mid-run, and size it with the pricing method from Economics: what a fleet costs and when it pays. The tiers put a small model on tagging and splitting the dump, your everyday model in the writing seats, and your strongest model on judging the assembled brief. The ceiling rule is written before any tokens move: when the cap hits, spawning stops and checking finishes on what already exists.
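A ceiling rule is easiest to trust when the orchestrator applies it before every scheduling step, rather than relying on you to remember it. This is a minimal sketch. Budget and next_actions are hypothetical names, and token counts are assumed to be reported back to the orchestrator after each call.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Budget:
    """Token cap enforced by the orchestrator (hypothetical structure)."""
    cap: int
    spent: int = 0

    def charge(self, tokens: int) -> None:
        # Assumes each agent call reports the tokens it consumed.
        self.spent += tokens

    @property
    def capped(self) -> bool:
        return self.spent >= self.cap


def next_actions(budget: Budget, unwritten: list[str], unchecked: list[str]) -> list[tuple[str, str]]:
    """Apply the ceiling rule before every scheduling step."""
    checks = [("check", s) for s in unchecked]
    if budget.capped:
        # Cap hit: spawning stops, and checking finishes on what already exists.
        return checks
    return [("write", s) for s in unwritten] + checks
```

Below the cap, the scheduler returns both new writes and pending checks. Once charge pushes spending to the cap, only the checks come back. The run ends with fewer sections, but every section it produced has still been verified.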

Set the gates, the abort trigger, and the resume story

The gates. Mark where the run stops and waits for you, and give a first fleet exactly two. The first sits after the pilot item: one writer runs, you read its section end to end, and the fan-out widens only after the template survives a real competitor. The second sits before the merge: verifier logs and the coverage line get read before anything is assembled. Where to stand between them is the discipline of Supervision: stay in command of many agents.
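Two gates fit in a few lines once a gate is treated as a blocking call that records a note. This is a minimal sketch, not a feature of any particular tool. The names gate and run_with_gates are hypothetical, write stands in for a writer agent, and ask defaults to Python's input() so a real run actually stops and waits.

```python
from __future__ import annotations

from typing import Callable


def gate(name: str, log: list[str], ask: Callable[[str], str] = input) -> bool:
    """Stop the run, take a one-line note from the operator, and record it."""
    note = ask(f"[{name}] what did you check, and what did you let through? ")
    verdict = ask(f"[{name}] continue? (y/n) ").strip().lower()
    log.append(f"{name}: {note} -> {'continue' if verdict == 'y' else 'stop'}")
    return verdict == "y"


def run_with_gates(
    competitors: list[str],
    write: Callable[[str], str],
    log: list[str],
    ask: Callable[[str], str] = input,
) -> tuple[dict[str, str], bool]:
    """Pilot one item, widen the fan-out, then hold before the merge."""
    pilot = competitors[0]
    sections = {pilot: write(pilot)}
    # Gate 1: read the pilot section end to end before the fan-out widens.
    if not gate("after pilot", log, ask):
        return sections, False
    for competitor in competitors[1:]:
        sections[competitor] = write(competitor)
    # Gate 2: read verifier logs and the coverage line before anything is assembled.
    if not gate("before merge", log, ask):
        return sections, False
    return sections, True  # cleared to assemble the brief
```

Each gate writes one line to the log, so the gate notes the drill asks for fall out of the run instead of being reconstructed afterward. A "no" at the first gate leaves exactly one section written, which is the cheap failure the pilot exists to buy.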

The abort trigger. Write one condition that ends the run early: the kill rule, scaled to a fleet's blast radius. For the brief: two sections rejected for the same structural reason means the flaw lives in the template, so the run stops before paying for that flaw at full width.
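The brief's trigger is a counter keyed by rejection reason. This is a minimal sketch with hypothetical names (AbortTrigger, AbortRun). It assumes structural rejections arrive with a short reason string, and that mechanical defects fixed in place are never reported to it.

```python
from collections import defaultdict


class AbortRun(Exception):
    """Raised when the abort trigger fires; the orchestrator stops the run."""


class AbortTrigger:
    def __init__(self, threshold: int = 2):
        self.threshold = threshold
        self.by_reason = defaultdict(list)  # reason -> sections rejected for it

    def record_rejection(self, section: str, reason: str) -> None:
        # Only structural rejections are recorded; in-place mechanical fixes never reach here.
        self.by_reason[reason].append(section)
        hits = self.by_reason[reason]
        if len(hits) >= self.threshold:
            raise AbortRun(
                f"{', '.join(hits)} rejected for the same reason ({reason}); "
                "the flaw lives in the template"
            )
```

Two rejections for different reasons let the run continue. The second rejection that repeats a reason raises and ends the run. Keying by reason, not counting all rejections, is what separates a template flaw from two unrelated misses.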

The resume story. Write down what survives an interruption, because something eventually will interrupt the run: a sleeping laptop, a rate limit, you reclaiming the machine. For the brief: each section lands in its own file, a journal line records each completion as its output lands, and a cold resume reads the journal, skips what it lists, and reruns what was live at the cut. The cures come from Failure modes: how fleets go wrong together. The plan records where the journal lives, so a resume needs no memory of the run that died.
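The journal can be an append-only file with one JSON line per completed item. This is a minimal sketch under assumptions. The function names record_done and items_to_rerun and the JSON Lines format are illustrative choices. The extra requirement that the output file still exists is a guard added here, not something the chapter specifies.

```python
import json
from pathlib import Path


def record_done(journal: Path, item: str, output: str) -> None:
    """Append one journal line once an item's output file has landed."""
    with journal.open("a", encoding="utf-8") as f:
        f.write(json.dumps({"item": item, "output": output}) + "\n")


def items_to_rerun(journal: Path, all_items: list) -> list:
    """Cold resume: skip what the journal lists, rerun everything else."""
    done = set()
    if journal.exists():
        for line in journal.read_text(encoding="utf-8").splitlines():
            try:
                entry = json.loads(line)
            except json.JSONDecodeError:
                continue  # a half-written final line from the crash counts as not done
            # Guard added in this sketch: a listed item counts only if its file is still there.
            if Path(entry["output"]).exists():
                done.add(entry["item"])
    return [i for i in all_items if i not in done]
```

Everything the journal does not list gets rerun, which covers whatever was live at the cut along with anything never started. The resume needs only the journal path, which is the one fact the plan records.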

Try it now

The run is the drill, so block an afternoon and bring real work.

Fill the plan. Pick a backlog job too big for one session, open the fillable Orchestration Plan in the artifacts library, and complete every field before any tokens move. The cut-list from the decomposition drill and the cap from the economics drill drop straight in. An empty field is a decision you have not made, and mid-run is the most expensive place to make it.
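If you keep a machine-readable copy of the plan alongside the PDF, the empty-field rule can be checked before launch. This is a minimal sketch. The field names are assumptions modeled on this chapter's headings, not the fillable PDF's actual fields, and OrchestrationPlan and undecided are hypothetical names.

```python
from dataclasses import dataclass, fields


@dataclass
class OrchestrationPlan:
    # Field names are assumptions modeled on the chapter's headings.
    task: str = ""
    success_bar: str = ""
    cut_list: str = ""
    pattern: str = ""
    verification_stages: str = ""
    budget_and_ceiling_rule: str = ""
    gates: str = ""
    abort_trigger: str = ""
    resume_story: str = ""


def undecided(plan: OrchestrationPlan) -> list:
    """Every blank field is a decision not yet made; launch only when this is empty."""
    return [f.name for f in fields(plan) if not getattr(plan, f.name).strip()]
```

A plan with only the task and the success bar filled in returns the seven remaining field names, in page order, as the list of decisions still owed.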

Run the fleet from the page. Launch three to six agents in Claude Code and brief them by pasting from the plan itself: the contracts, the stages, the ceiling rule. If you catch yourself typing an instruction that is not on the page, add it to the page first.
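The rule about adding an instruction to the page before typing it can be enforced by building every brief from the plan alone. This is a minimal sketch. The dictionary layout and brief_for are hypothetical, and the resulting text is what you would paste into each agent session. A missing key raises KeyError, which is the point: the instruction has to go on the page first.

```python
def brief_for(plan: dict, item: str) -> str:
    """Assemble an agent brief only from text already on the page."""
    contract = plan["cut_list"][item]  # KeyError means the page is missing this contract
    return "\n".join([
        f"Task: {plan['task']}",
        f"You get: {', '.join(contract['gets'])}",
        f"You return: {contract['returns']}",
        f"You must not touch: {', '.join(contract['must_not_touch'])}",
        f"Your output will be checked by: {'; '.join(plan['verification_stages'])}",
        f"If the budget cap is hit: {plan['ceiling_rule']}",
    ])
```

Because the function has no place for free text, a brief can only say what the plan says. Any instruction you find yourself wanting to add becomes an edit to the plan instead.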

Hold your two gates. At each gate, actually stop: read what the gate was placed to catch, write one line on what you checked and what you let through, then let the run continue. The gate notes look optional today, but they are the memory the opening scene was missing.

Write down the one thing the plan got wrong. Every first plan misses something: a cap set too tight, a must-not-touch entry nobody predicted, a gate placed too late to matter. Correct the page while the run is fresh and keep it with the gate notes, because the second fleet starts from the first plan.

Scale it down: three agents on one document. Two writers take two sections of the same brief and one verifier runs behind them, with every field still filled, because the page is most of the practice at any fleet size.

Chapter Summary

The Orchestration Plan is one page that records every decision a fleet needs, so the next run starts from the page instead of remaking each call at full price.
It stays one page: you read it at kickoff and rewrite it in a few minutes at the close, which only works if it never grows into a long document.
Write the task in two sentences, a success bar a stranger could grade, and a cut-list with a contract per item. Finish the cut-list by checking that no item appears in two must-not-touch lines.
Choose the pattern with one line of why, list the verification stages with scripts before models, and set a budget with a ceiling rule that says what happens when the cap is hit.
Give a first fleet exactly two gates, one abort condition that ends the run early, and a written note of what survives if the run is interrupted.
The plan sits at the moving edge of the field, so let every run correct the page; that is how it keeps up as the tooling changes.
One thing does not move: every fleet scales up from a single supervised agent, as argued in Supervision: keep a human in charge of the agent. The short version of those rules lives in the verify-first rule.
This chapter closes The Frontier level, the last level of the Builder's Stack.

