AI-Native PM

Founding Essay · 04

The Operating Manual

The manual that operationalizes Shape · Ship · Track. It tells you how to run the cycle, not whether you are pointed at the right problem.

14 min read · Steps to run the framework

In "Shape · Ship · Track" we said: you shape how a model behaves, you ship it to a human, you track whether it holds, and then you do it again. This manual tells you how to run that cycle. It does not tell you whether you are pointed at the right problem.

[Figure: The Operating Manual: Shape · Ship · Track, operationalized. Three columns sit under one Continuous Operations band. Shape: Frame the problem, Write the behavior, Choose the model, Prototype it. Ship: Build the guardrails, Build the eval suite, Earn trust when unsure, Set the cost and speed budget. Track: Watch in production, Catch the drift, Feed back into Shape, with a feedback path running back to Shape.]
The cycle opens into its activities. Continuous Operations runs across every turn.

Quick reference

Move | Activity | What it produces
--- | --- | ---
Shape | Frame the problem | Discovery guide, Problem frame, Definition of good
Shape | Write the behavior, then the prompt | Behavior contract, Starter system prompt, Escalation rules
Shape | Choose the model and grounding | Model rationale, Context architecture, Grounding decision
Shape | Prototype it | Working prototype, Prototyping checklist, Edge-case list
Ship | Build the guardrails | Input and output guardrails, Escalation path, Rollout checklist
Ship | Build the eval suite | Regression evals, Pass and fail thresholds, Release gate
Ship | Earn trust when unsure | Preview and undo design, Source and confidence display
Ship | Set the cost and speed budget | Latency and cost budget, Routing and caching plan
Track | Watch behavior in production | Quality dashboard, Metrics taxonomy, Session-review habit
Track | Catch the drift | Drift alerts, Regression gate on model changes
Track | Feed it back into Shape | New eval cases, Ranked list of contract fixes
Continuous Operations | Govern the knowledge | Curation policy, Refresh cadence, Conflict rules
Continuous Operations | Govern access and safety | Access-as-behavior rules, Safe-refusal patterns
Continuous Operations | Supervise the agents | Supervision design, Iteration caps, Reliability budget
Continuous Operations | Build the team | Hiring rubric, Org change plan

Each activity below expands its row. The term that opens every bullet is the output named in the table.

Shape: decide how the system behaves, and make it real

Shape is the move that changed the job. You are no longer specifying features. You are specifying behavior, and a probabilistic model behaves differently every time unless you give it a reason not to.

Frame the problem. Decide what the model should do, and what good looks like, before you build.

  • Discovery guide: the questions you ask real users to tell whether AI is the right answer, not just a possible one.
  • Problem frame: who the user is and what failure looks like for them.
  • Definition of good: the plain bar for behavior you will hold the product to.

Write the behavior, then the prompt. Put the behavior in a contract, then build the system prompt that enforces it.

  • Behavior contract: what the model must do, must never do, and one example of each.
  • Starter system prompt: its role, boundaries, tone, and output format, versioned like code.
  • Escalation rules: the triggers that hand a conversation to a human.
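A minimal sketch of the contract-then-prompt idea, assuming the contract lives as structured data so the system prompt can be regenerated from it and versioned alongside the code. The `BehaviorContract` fields, the version string, the example values, and the substring-based `should_escalate` check are all hypothetical; a real escalation rule might use a classifier rather than keyword matching.

```python
from dataclasses import dataclass

# Hypothetical shape for a behavior contract; field names are illustrative, not a standard.
@dataclass(frozen=True)
class BehaviorContract:
    version: str                      # bump on every change, like a code release
    role: str
    tone: str
    output_format: str
    must: tuple[str, ...]             # what the model must always do
    must_never: tuple[str, ...]       # what the model must never do
    example_good: str                 # one example of each
    example_bad: str
    escalation_triggers: tuple[str, ...] = ()  # phrases that hand the conversation to a human

def render_system_prompt(c: BehaviorContract) -> str:
    """Build the starter system prompt from the contract, stamped with its version."""
    lines = [
        f"# system-prompt v{c.version}",
        f"Role: {c.role}",
        f"Tone: {c.tone}",
        f"Output format: {c.output_format}",
        "You must:",
        *[f"- {rule}" for rule in c.must],
        "You must never:",
        *[f"- {rule}" for rule in c.must_never],
        f"Good example: {c.example_good}",
        f"Bad example: {c.example_bad}",
    ]
    return "\n".join(lines)

def should_escalate(c: BehaviorContract, user_message: str) -> bool:
    """Naive keyword check: escalate if any trigger phrase appears in the message."""
    text = user_message.lower()
    return any(trigger.lower() in text for trigger in c.escalation_triggers)

# Illustrative contract for a nutrition-logging assistant (all values hypothetical).
contract = BehaviorContract(
    version="1.2.0",
    role="Help parents log what their kids ate",
    tone="Plain and calm",
    output_format="JSON with food, grams, confidence",
    must=("Say when a photo is too unclear to judge",),
    must_never=("Give a medical diagnosis",),
    example_good="I can't tell the portion from this photo. Can you add the weight?",
    example_bad="That is 412 calories.",
    escalation_triggers=("allergic reaction", "not eating"),
)
```

Because the prompt is rendered from the contract, a change to the behavior shows up as a diff in one place, and the version stamp tells you which prompt produced which output.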

Choose the model and grounding. Pick the model as a product decision, then decide how it gets facts it was not trained on.

  • Model rationale: the model you chose and why, weighed on capability, cost, latency, scale, data sensitivity, and modalities. Many products route between models: a frontier model for the hard path, a smaller one for simple, high-volume requests.
  • Context architecture: what gets retrieved and what stays in the prompt.
  • Grounding decision: prompt context, retrieval, fine-tuning, or a mix. They are layers, not alternatives.
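To make "what gets retrieved and what stays in the prompt" concrete, here is a sketch assuming a tiny in-memory corpus and word-overlap ranking as a stand-in for a real retriever such as embedding search. Stable instructions stay in the prompt; facts are pulled in per query. The corpus, document ids, and function names are hypothetical.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercase word set; crude, but enough to rank by overlap."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, corpus: dict[str, str], k: int = 2) -> list[str]:
    """Return up to k document ids, ranked by words shared with the query."""
    q = tokenize(query)
    scored = [(len(q & tokenize(body)), doc_id) for doc_id, body in corpus.items()]
    scored = [pair for pair in scored if pair[0] > 0]     # drop documents with no overlap
    scored.sort(key=lambda pair: (-pair[0], pair[1]))     # best score first, ties by id
    return [doc_id for _, doc_id in scored[:k]]

def build_context(system_prompt: str, query: str, corpus: dict[str, str], k: int = 2) -> str:
    """Layer 1: fixed instructions in the prompt. Layer 2: facts retrieved for this query."""
    sources = "\n".join(f"[{doc_id}] {corpus[doc_id]}" for doc_id in retrieve(query, corpus, k))
    return f"{system_prompt}\n\nSources:\n{sources}\n\nUser: {query}"

# Hypothetical knowledge base.
corpus = {
    "portion-guide": "Portions are logged in grams and rounded to the nearest 5 g.",
    "refund-policy": "Refunds are issued within 14 days of purchase.",
    "allergen-list": "Common allergens include peanuts, milk and eggs.",
}
```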

Prototype it. Build a real version yourself in a day, and let it find the cases your spec missed.

  • Working prototype: a real version you built yourself, run against messy real inputs.
  • Prototyping checklist: the steps you reuse on the next idea.
  • Edge-case list: what it got wrong (the blurry plate, the three foods, chicken or turkey), fed back into the contract.
[Figure: Shape, the activities. 1 Frame the problem (what good looks like); 2 Write the behavior (contract + prompt); 3 Choose the model (model + grounding); 4 Prototype it (build, find edge cases). Decide how the system behaves, and make it real.]
The activities inside Shape.

Ship: put it in front of a human, behind guardrails, at a workable cost and speed

AI failures are quiet. A confidently wrong answer looks exactly like a right one, so a bad nutrition number quietly tells a parent their underfed kid is fine. Ship is where you defend against the failures users will never report.

Build the guardrails. Decide what the model may never do on its own, and enforce it.

  • Input and output guardrails: filter inputs for private data and prompt injection, and validate outputs against the schema and content rules.
  • Escalation path: anything irreversible or high-stakes goes to a human first.
  • Rollout checklist: the tripwires that halt a release.
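A sketch of the three guardrails, assuming the model is asked to return JSON. The regexes, injection phrases, schema fields, and action names are illustrative placeholders: pattern matching catches only the crudest cases, so read this as the shape of each check rather than a working defense.

```python
import json
import re

# Illustrative patterns only; real PII and injection detection needs far more than regexes.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\b\d{3}[-. ]?\d{3}[-. ]?\d{4}\b")
INJECTION_HINTS = ("ignore previous instructions", "ignore all previous", "reveal your system prompt")

def check_input(text: str) -> tuple[str, list[str]]:
    """Input guardrail: redact obvious private data and flag likely prompt injection."""
    flags = []
    if any(hint in text.lower() for hint in INJECTION_HINTS):
        flags.append("possible_injection")
    redacted = PHONE.sub("[phone]", EMAIL.sub("[email]", text))
    if redacted != text:
        flags.append("private_data_redacted")
    return redacted, flags

# Hypothetical output schema: field name -> accepted type(s).
REQUIRED_FIELDS = {"food": str, "grams": (int, float), "confidence": (int, float)}

def validate_output(raw: str) -> list[str]:
    """Output guardrail: return the problems found; an empty list means it passed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ["not_json"]
    if not isinstance(data, dict):
        return ["not_object"]
    problems = [f"bad_{name}" for name, kind in REQUIRED_FIELDS.items()
                if not isinstance(data.get(name), kind)]
    if not problems:  # content rules, checked only once the types are right
        if data["grams"] < 0:
            problems.append("negative_grams")
        if not 0 <= data["confidence"] <= 1:
            problems.append("confidence_out_of_range")
    return problems

# Hypothetical action names; anything irreversible waits for a person.
IRREVERSIBLE_ACTIONS = {"delete_account", "send_email"}

def route_action(action: str) -> str:
    """Escalation path: irreversible or high-stakes actions go to a human first."""
    return "needs_human_approval" if action in IRREVERSIBLE_ACTIONS else "auto"
```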

Build the eval suite. Measure that it works, because you cannot catch quiet failures by eye.

  • Regression evals: a small set of known-good cases you run on every change, plus automated faithfulness and safety checks.
  • Pass and fail thresholds: the bar a change must clear, on a rubric you own.
  • Release gate: nothing ships until it clears, backed by a weekly human sample that catches new failure modes.
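A minimal regression-eval harness and release gate, assuming the model is any function from prompt to text. The stub model, case names, checks, and the 0.9 threshold are hypothetical; the automated faithfulness checks and the weekly human sample sit outside this sketch.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    name: str
    prompt: str
    check: Callable[[str], bool]  # the rubric you own: True if the output is acceptable

def run_suite(model: Callable[[str], str], cases: list[EvalCase]) -> dict[str, bool]:
    """Run every known-good case against the model and record pass or fail."""
    return {case.name: case.check(model(case.prompt)) for case in cases}

def release_gate(results: dict[str, bool], min_pass_rate: float, must_pass: set[str]) -> bool:
    """Ship only if the overall pass rate clears the bar AND every critical case passes."""
    if not results:
        return False
    pass_rate = sum(results.values()) / len(results)
    critical_ok = all(results.get(name, False) for name in must_pass)
    return pass_rate >= min_pass_rate and critical_ok

# Hypothetical stand-in for a real model call.
def stub_model(prompt: str) -> str:
    return "I can't tell from this photo." if "blurry" in prompt else "Looks like rice."

cases = [
    EvalCase("blurry_photo_admits_uncertainty", "blurry photo of a plate", lambda out: "can't tell" in out),
    EvalCase("no_medical_advice", "is my kid sick?", lambda out: "diagnos" not in out.lower()),
    EvalCase("names_the_food", "clear photo of rice", lambda out: "rice" in out.lower()),
]
results = run_suite(stub_model, cases)
```

The gate has two conditions on purpose: a high average can hide one critical case that regressed, so the cases you can never afford to fail are listed separately.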

Earn trust in the uncertain moments. Design those moments instead of hiding them.

  • Preview and undo design: preview what an action will do, confirm the irreversible ones, and keep undo cheap.
  • Source and confidence display: show where an answer came from and how reliable it is, not one flat tone for everything.
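A sketch of preview, confirm, and undo, plus a line that shows source and confidence. `ActionRunner`, `show_answer`, and the 0.85 and 0.6 confidence cut-offs are hypothetical; where a trustworthy confidence score comes from is a separate problem this does not solve.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Action:
    description: str
    reversible: bool

@dataclass
class ActionRunner:
    applied: list = field(default_factory=list)      # stand-in for real side effects
    undo_stack: list = field(default_factory=list)   # only reversible actions land here

    def preview(self, action: Action) -> str:
        """Show what will happen before it happens."""
        warning = "" if action.reversible else " This cannot be undone."
        return f"About to: {action.description}.{warning}"

    def run(self, action: Action, confirmed: bool = False) -> str:
        """Irreversible actions need explicit confirmation; reversible ones stay cheap to undo."""
        if not action.reversible and not confirmed:
            return "confirmation_required"
        self.applied.append(action)
        if action.reversible:
            self.undo_stack.append(action)
        return "done"

    def undo(self) -> Optional[Action]:
        """Reverse the most recent reversible action, if any."""
        if not self.undo_stack:
            return None
        action = self.undo_stack.pop()
        # Remove the most recent occurrence of this exact action from the applied list.
        for i in range(len(self.applied) - 1, -1, -1):
            if self.applied[i] is action:
                del self.applied[i]
                break
        return action

def show_answer(answer: str, source: str, confidence: float) -> str:
    """Attach the source and a coarse confidence label (cut-offs are illustrative)."""
    label = "high" if confidence >= 0.85 else "medium" if confidence >= 0.6 else "low, please check"
    return f"{answer} (source: {source}; confidence: {label})"
```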

Set the cost and speed budget. A good response is relevant, consistent, appropriate, affordable, and fast. You cannot maximize all five.

  • Latency and cost budget: the speed the experience needs and the most a single request may cost.
  • Routing and caching plan: cap output length, cache repeats, route easy work to cheaper models, and stream for perceived speed.
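A sketch of the routing and caching plan, with hypothetical prices, budgets, and model labels ("small", "frontier"). `call_model` is a placeholder for a real provider call, which is also where you would request streaming; the four-characters-per-token estimate is a rough assumption.

```python
import time
from dataclasses import dataclass
from functools import lru_cache

# Hypothetical prices and limits for illustration only; substitute your provider's real numbers.
PRICE_PER_1K_TOKENS = {"small": 0.0005, "frontier": 0.01}
MAX_OUTPUT_TOKENS = 300            # cap output length to bound both cost and latency
COST_BUDGET_PER_REQUEST = 0.005    # the most a single request may cost
LATENCY_BUDGET_S = 2.0             # the speed the experience needs

@dataclass(frozen=True)
class Reply:
    model: str
    text: str
    est_cost: float
    latency_s: float

def pick_model(needs_reasoning: bool) -> str:
    """Route the hard path to the frontier model and simple volume to the small one."""
    return "frontier" if needs_reasoning else "small"

def estimate_cost(model: str, prompt_tokens: int) -> float:
    """Worst-case cost: the prompt plus a full-length (capped) output."""
    return (prompt_tokens + MAX_OUTPUT_TOKENS) / 1000 * PRICE_PER_1K_TOKENS[model]

def call_model(model: str, prompt: str, max_tokens: int) -> str:
    """Placeholder for a real, ideally streaming, API call limited to max_tokens."""
    return f"[{model}] reply to: {prompt[:40]}"

@lru_cache(maxsize=10_000)
def answer(prompt: str, needs_reasoning: bool = False) -> Reply:
    """Identical requests are served from the cache instead of re-calling the model."""
    model = pick_model(needs_reasoning)
    start = time.perf_counter()
    text = call_model(model, prompt, MAX_OUTPUT_TOKENS)
    latency = time.perf_counter() - start
    prompt_tokens = len(prompt) // 4  # rough assumption: about 4 characters per token
    return Reply(model, text, estimate_cost(model, prompt_tokens), latency)

def within_budget(reply: Reply) -> bool:
    """True if this reply respected both the cost and the latency budget."""
    return reply.est_cost <= COST_BUDGET_PER_REQUEST and reply.latency_s <= LATENCY_BUDGET_S
```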
[Figure: Ship, the activities. 1 Build the guardrails (filter, validate, escalate); 2 Build the eval suite (evals + release gate); 3 Earn trust when unsure (preview, sources, undo); 4 Set the cost and speed budget (latency + routing). Put it in front of a human, behind guardrails, at a workable cost and speed.]
The activities inside Ship.

Track: find out whether it is behaving, and catch what users never report

People do not file bugs against a quietly wrong AI. They lose trust and leave. Track is where you go looking for the failures they never send you.

Watch behavior in production. Measure what matters once real people are using it, and read real sessions.

  • Quality dashboard: accuracy, faithfulness, latency, cost, and safety, live.
  • Metrics taxonomy: product signal kept separate from model behavior.
  • Session-review habit: read a sample of real sessions. The numbers say something is wrong, the sessions say what.
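A sketch of the dashboard split and the session-review habit, assuming each session is logged as a dict of metric values. The metric names and the two-panel split are illustrative; what matters is that product signals and model signals are never blended into one number.

```python
import random
from statistics import mean

# Hypothetical taxonomy: product signals describe what users do; model signals describe what the model does.
PRODUCT_METRICS = {"retention_7d", "sessions_per_user", "task_completed"}
MODEL_METRICS = {"accuracy", "faithfulness", "latency_s", "cost_usd", "safety_flag"}

def dashboard(sessions: list[dict]) -> dict[str, dict[str, float]]:
    """Average each metric across the sessions that report it, in two separate panels."""
    panels = {"product": {}, "model": {}}
    for panel, names in (("product", PRODUCT_METRICS), ("model", MODEL_METRICS)):
        for name in sorted(names):
            values = [s[name] for s in sessions if name in s]
            if values:
                panels[panel][name] = mean(values)
    return panels

def review_sample(sessions: list[dict], k: int, seed: int = 0) -> list[dict]:
    """Draw a reproducible random sample of real sessions for a human to read."""
    rng = random.Random(seed)
    return rng.sample(sessions, min(k, len(sessions)))
```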

Catch the drift. The model moves under you, so watch for slow decay.

  • Drift alerts: a warning when a metric slides while the product still looks fine from the outside.
  • Regression gate on model changes: re-run the eval suite on every model update before it reaches users.
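A minimal drift check and model-update gate, assuming you keep a baseline window of a quality metric and per-case eval results from the current model. The tolerance and function names are hypothetical; real drift detection might use a rolling window or a statistical test instead of a fixed threshold.

```python
from statistics import mean

def drift_alert(baseline: list[float], recent: list[float], max_drop: float) -> bool:
    """Alert when the recent average of a quality metric falls more than max_drop below baseline."""
    if not baseline or not recent:
        return False
    return mean(baseline) - mean(recent) > max_drop

def model_update_gate(old_results: dict[str, bool], new_results: dict[str, bool]) -> list[str]:
    """Re-run the same suite on the new model; list cases that passed before and fail now."""
    return sorted(name for name, passed in old_results.items()
                  if passed and not new_results.get(name, False))
```

An empty list from `model_update_gate` is the condition for letting the model update through; any name it returns is a regression to look at before users see it.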

Feed it back into Shape. Turn every failure you catch into the next turn's work.

  • New eval cases: each real failure turned into a test you keep.
  • Ranked list of contract fixes: what Shape changes on the next turn of the cycle.
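A sketch of turning caught failures into the next turn's work, assuming each failure is tagged with the contract clause it broke. The `Failure` fields and the `must_not_equal` check are hypothetical, and that check is deliberately weak: a real regression case would carry a rubric, not just a forbidden string.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Failure:
    session_id: str
    user_input: str
    bad_output: str
    contract_clause: str  # which part of the behavior contract it broke (hypothetical tagging)

def to_eval_case(f: Failure) -> dict:
    """Freeze a real failure into a regression case: same input, and the bad output must not recur."""
    return {"name": f"regression_{f.session_id}", "prompt": f.user_input, "must_not_equal": f.bad_output}

def rank_contract_fixes(failures: list[Failure]) -> list[tuple[str, int]]:
    """Rank contract clauses by how often real sessions broke them, most frequent first."""
    return Counter(f.contract_clause for f in failures).most_common()
```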
[Figure: Track, the activities. 1 Watch in production (dashboard + sessions); 2 Catch the drift (alerts + regression gate); 3 Feed back into Shape (new evals + fixes). Find out whether it is behaving, and feed what you learn back into Shape.]
The activities inside Track.

Continuous Operations: the umbrella across every turn of the cycle

This is not a fourth step in the sequence. It is what keeps the cycle running once you have a team and a portfolio, not one person and one product. It follows from the fact that makes the cycle a cycle: a probabilistic system is never finished, so operating it is never finished either.

Govern the knowledge. A retrieval system is only as good as what it is allowed to read.

  • Curation policy: what is eligible to be indexed.
  • Refresh cadence: how often sources update, and how stale content gets flagged.
  • Conflict rules: how a new source wins over an old one it contradicts.
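A sketch of the three knowledge rules, assuming each source records its origin, the topic it answers, and its last-updated date. The allowed origins, the 180-day staleness window, and the newest-wins rule are illustrative policies, not recommendations.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass(frozen=True)
class Source:
    doc_id: str
    topic: str      # what the document answers; used to detect conflicts
    origin: str     # where it came from
    updated: date

ALLOWED_ORIGINS = {"official_guidelines", "internal_policy"}  # curation policy (illustrative)
MAX_AGE = timedelta(days=180)                                  # refresh cadence (illustrative)

def eligible(src: Source) -> bool:
    """Curation: only approved origins may be indexed."""
    return src.origin in ALLOWED_ORIGINS

def is_stale(src: Source, today: date) -> bool:
    """Refresh: flag anything older than the allowed age."""
    return today - src.updated > MAX_AGE

def resolve_conflicts(sources: list[Source]) -> dict[str, Source]:
    """Conflicts: one source per topic; among eligible sources, the most recently updated wins."""
    winners: dict[str, Source] = {}
    for src in sources:
        if not eligible(src):
            continue
        current = winners.get(src.topic)
        if current is None or src.updated > current.updated:
            winners[src.topic] = src
    return winners
```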

Govern access and safety. Tell the system what each person is allowed to see.

  • Access-as-behavior rules: what each person may see, treated as a product decision, not just infrastructure.
  • Safe-refusal patterns: what the system says when asked for what someone cannot have, without leaking that it exists.
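A sketch of access as behavior, assuming every document carries the roles allowed to see it. Filtering happens before anything reaches the model, and the refusal text is identical whether a document is missing or forbidden, so the reply does not reveal that it exists. The roles, documents, and substring search are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Doc:
    doc_id: str
    text: str
    allowed_roles: frozenset

REFUSAL = "I couldn't find anything I can share on that."

def visible_docs(docs: list[Doc], role: str) -> list[Doc]:
    """Filter BEFORE retrieval, so the model never sees what this user may not see."""
    return [d for d in docs if role in d.allowed_roles]

def answer_lookup(docs: list[Doc], role: str, query: str) -> str:
    """Same reply for 'does not exist' and 'not allowed', so a refusal leaks nothing."""
    hits = [d for d in visible_docs(docs, role) if query.lower() in d.text.lower()]
    return hits[0].text if hits else REFUSAL
```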

Supervise the agents. Once a product acts on its own, every weakness compounds.

  • Supervision design: a human or an explicit check over anything that acts on its own.
  • Iteration caps: limits, loop detection, and give-up criteria so an agent cannot run forever.
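  • Reliability budget: a per-step error bar, because errors compound: ten steps that each succeed 95% of the time finish end to end only about 60% of the time (0.95^10 ≈ 0.60, if step errors are independent).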
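A sketch of a supervised agent loop with an iteration cap, loop detection, and a human check before a flagged step takes effect, plus the compounding arithmetic behind the reliability budget. `run_agent`, its callbacks, and modeling agent state as a string are hypothetical simplifications; the arithmetic assumes step failures are independent.

```python
from typing import Callable

def end_to_end_success(per_step: float, steps: int) -> float:
    """Chance that every step succeeds, assuming independent errors."""
    return per_step ** steps

def run_agent(step: Callable[[str], str], start: str, max_steps: int = 10,
              is_done: Callable[[str], bool] = lambda s: s == "DONE",
              needs_check: Callable[[str], bool] = lambda s: False) -> tuple[str, list[str]]:
    """Run until done; stop on the cap, on a repeated state, or before a step that needs a human."""
    seen = {start}
    state, trace = start, []
    for _ in range(max_steps):                 # iteration cap
        proposed = step(state)
        if needs_check(proposed):              # supervision: a human approves before it takes effect
            return "handed_to_human", trace
        state = proposed
        trace.append(state)
        if is_done(state):
            return "done", trace
        if state in seen:                      # loop detection: we have been here before
            return "gave_up_loop", trace
        seen.add(state)
    return "gave_up_cap", trace                # give-up criterion: ran out of steps
```

For example, `end_to_end_success(0.95, 10)` is about 0.599, the 60% figure in the bullet above.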

Build the team. A cycle only you can run is not yet a practice.

  • Hiring rubric: how you hire for the new craft.
  • Org change plan: how you bring the wider organization to the new bar.
[Figure: Continuous Operations, the activities, running across every turn of the loop. Govern the knowledge (curate, refresh, resolve); Govern access (access as behavior); Supervise the agents (checks, caps, budget); Build the team (hiring + change). Always on, across Shape, Ship, and Track.]
Continuous Operations, the umbrella over every turn.

Where to start

You do not read this end to end. Start where the work is hardest right now.

  • You cannot say what good behavior is? Start in Shape.
  • It seems to work, but you cannot prove it? Start in Ship.
  • You suspect it is sliding but cannot see how? Start in Track.
  • It works for you and breaks for everyone else? Start in Continuous Operations.

Sources and further reading

This sits alongside the Founding Essays. The discipline and the fundamentals are in "AI-Native Product Management Is a Discipline." The cognitive science behind Ship and Track is in "The Mind on the Other Side of the Model." The three-move practice is in "Shape · Ship · Track."

  • Retrieval-augmented generation: Lewis et al., "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks" (Facebook AI Research, 2020). The naive-to-advanced-to-modular progression: Gao et al., "Retrieval-Augmented Generation for Large Language Models: A Survey" (2023).
  • The agent that senses, plans, and acts: from robotics and autonomous-systems research, applied here to language-model agents. OpenAI's five levels of progress toward AGI: reported by Bloomberg (2024).
  • Deeper pieces on system prompts, model selection, evals, and agent reliability follow this one, each with its own citations.

FuelTheFam is real and live at fuelthefam.com.