AI-Native PM
The Operating Manual


The Console

The cycle, broken all the way down. Pick a move, open any activity to see the artifacts it produces, and take the whole thing with you as a checklist you can run.

Shape · Ship · Track · Continuous Operations

Where to start

You do not read this end to end. Start where the work is hardest right now.


Shape

Decide how the system behaves, and make it real.

What it produces

Discovery guide
the questions you ask real users to tell whether AI is the right answer, not just a possible one.
Problem frame
who the user is and what failure looks like for them.
Definition of good
the plain bar for behavior you will hold the product to.

What it produces

Behavior contract
what the model must do, must never do, and one example of each.
Starter system prompt
its role, boundaries, tone, and output format, versioned like code.
Escalation rules
the triggers that hand a conversation to a human.
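You can sketch the escalation rules as named triggers that run on every turn. This is a minimal sketch. The keyword list, the thresholds, and the `model_confidence` score are hypothetical stand-ins for whatever your behavior contract actually names.

```python
from dataclasses import dataclass

# Hypothetical sensitive topics; the real list comes from your behavior contract.
ESCALATION_KEYWORDS = ("refund", "legal", "cancel my account")

@dataclass
class Turn:
    user_text: str
    model_confidence: float  # assumed 0.0-1.0 score produced by your own pipeline
    failed_attempts: int     # how many times the model has already missed on this request

def should_escalate(turn: Turn, min_confidence: float = 0.6, max_attempts: int = 2) -> list[str]:
    """Return the name of every trigger that fired; an empty list means stay automated."""
    reasons = []
    text = turn.user_text.lower()
    if any(keyword in text for keyword in ESCALATION_KEYWORDS):
        reasons.append("sensitive_topic")
    if turn.model_confidence < min_confidence:
        reasons.append("low_confidence")
    if turn.failed_attempts >= max_attempts:
        reasons.append("repeated_failure")
    return reasons
```

Returning the reasons instead of a bare yes or no gives the human who picks up the conversation a head start, and it lets you count which triggers fire most.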

What it produces

Model rationale
the model you chose and why, weighed on capability, cost, latency, scale, data sensitivity, and modalities. Most products route: a frontier model for the hard path, a smaller one for the simple, high-volume work.
Context architecture
what gets retrieved and what stays in the prompt.
Grounding decision
prompt context, retrieval, fine-tuning, or a mix. They are layers, not alternatives.
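You can sketch the routing split in the model rationale as one function. This assumes a crude heuristic: tool use, images, or long requests count as the hard path. Real routers often use a trained classifier instead. The model names are hypothetical placeholders.

```python
from dataclasses import dataclass

# Hypothetical model identifiers; substitute whatever your rationale doc picks.
FRONTIER_MODEL = "frontier-large"
SMALL_MODEL = "small-fast"

@dataclass
class Request:
    text: str
    needs_tools: bool = False
    has_image: bool = False

def route(req: Request, max_simple_words: int = 40) -> str:
    """Send the hard path to the frontier model and simple volume to the small one."""
    hard = (
        req.needs_tools
        or req.has_image
        or len(req.text.split()) > max_simple_words
    )
    return FRONTIER_MODEL if hard else SMALL_MODEL
```

Keeping the rule this explicit makes it easy to rerun your evals when you move the threshold.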

What it produces

Working prototype
a real version you built yourself, run against messy real inputs.
Prototyping checklist
the steps you reuse on the next idea.
Edge-case list
what it got wrong (the blurry plate, the three foods, chicken or turkey), fed back into the contract.

Ship

Put it in front of a human, behind guardrails, at a workable cost and speed.

What it produces

Input and output guardrails
filter inputs for private data and prompt injection, and validate outputs against the schema and content rules.
Escalation path
anything irreversible or high-stakes goes to a human first.
Rollout checklist
the tripwires that halt a release.
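Here is a minimal sketch of input and output guardrails. It uses deliberately naive regexes and phrase matching for illustration. Production systems use dedicated PII detectors and injection classifiers. The output schema and banned phrases are hypothetical.

```python
import json
import re

# Naive patterns for illustration only; they will miss plenty and over-match some.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")
INJECTION_HINTS = ("ignore previous instructions", "ignore all previous", "reveal your system prompt")

def filter_input(text: str) -> tuple[str, bool]:
    """Redact obvious private data and flag likely prompt injection."""
    redacted = EMAIL.sub("[EMAIL]", text)
    redacted = PHONE.sub("[PHONE]", redacted)
    suspicious = any(hint in text.lower() for hint in INJECTION_HINTS)
    return redacted, suspicious

# Hypothetical output schema: field name -> required Python type.
OUTPUT_SCHEMA = {"answer": str, "sources": list}
BANNED_PHRASES = ("guaranteed cure",)

def validate_output(raw: str) -> tuple[bool, str]:
    """Check the model's JSON output against the schema, then the content rules."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False, "not valid JSON"
    if not isinstance(data, dict):
        return False, "expected a JSON object"
    for field, expected_type in OUTPUT_SCHEMA.items():
        if not isinstance(data.get(field), expected_type):
            return False, f"missing or wrong type: {field}"
    if any(phrase in data["answer"].lower() for phrase in BANNED_PHRASES):
        return False, "violates content rules"
    return True, "ok"
```

A failed validation is a natural tripwire. Retry once, then fall back to the escalation path rather than showing the raw output.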

What it produces

Regression evals
a small set of known-good cases you run on every change, plus automated faithfulness and safety checks.
Pass and fail thresholds
the bar a change must clear, on a rubric you own.
Release gate
nothing ships until it clears, backed by a weekly human sample that catches new failure modes.
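You can sketch the regression evals and the release gate together. This assumes the simplest possible scorer, which checks whether the known-good answer appears in the output. In practice you would swap in your own rubric, plus the faithfulness and safety checks. The 0.95 threshold is a hypothetical bar.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    prompt: str
    expected: str  # known-good answer; scored here by case-insensitive containment

def run_suite(model: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Return the fraction of known-good cases the model still passes."""
    if not cases:
        raise ValueError("empty eval suite")
    passed = sum(1 for c in cases if c.expected.lower() in model(c.prompt).lower())
    return passed / len(cases)

def release_gate(pass_rate: float, threshold: float = 0.95) -> bool:
    """Nothing ships unless the suite clears the bar you own."""
    return pass_rate >= threshold
```

`model` is any callable from prompt to text, so the same suite runs against a prompt change, a model swap, or a stub in CI.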

What it produces

Preview and undo design
preview what an action will do, confirm the irreversible ones, and keep undo cheap.
Source and confidence display
show where an answer came from and how sure the model really is, not one flat tone for everything.
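Here is a minimal sketch of preview and undo. It assumes each action the assistant proposes carries a `do` callable and, when reversible, an `undo` callable. `Action` and `ActionRunner` are hypothetical names, and the source and confidence display is not shown.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Action:
    description: str
    do: Callable[[], None]
    undo: Optional[Callable[[], None]] = None  # None marks the action as irreversible

class ActionRunner:
    def __init__(self) -> None:
        self.undo_stack: list[Action] = []

    def preview(self, action: Action) -> str:
        """Tell the user what will happen before it happens."""
        tag = "" if action.undo else " (cannot be undone)"
        return f"Will: {action.description}{tag}"

    def run(self, action: Action, confirmed: bool = False) -> bool:
        """Apply the action; irreversible ones require explicit confirmation."""
        if action.undo is None and not confirmed:
            return False  # caller should show preview() and ask first
        action.do()
        if action.undo is not None:
            self.undo_stack.append(action)  # only reversible actions can be undone
        return True

    def undo_last(self) -> bool:
        """Reverse the most recent reversible action, if there is one."""
        if not self.undo_stack:
            return False
        self.undo_stack.pop().undo()
        return True
```

Reversible actions run without a prompt because undo is cheap. Only the irreversible ones pay the cost of a confirmation step.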

What it produces

Latency and cost budget
the speed the experience needs and the cost a request may run.
Routing and caching plan
cap output length, cache repeats, route easy work to cheaper models, and stream for perceived speed.
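You can sketch the cost side of the budget as an output cap, a worst-case cost check, and a cache for repeated prompts. The prices, cap, and budget below are hypothetical, and the sketch uses one blended per-token price where real providers usually price input and output separately. Streaming is not shown.

```python
import hashlib
from typing import Callable

# Hypothetical blended prices per 1,000 tokens; use your provider's real rate card.
PRICE_PER_1K = {"frontier-large": 0.015, "small-fast": 0.0006}
MAX_OUTPUT_TOKENS = 300           # cap output length
COST_BUDGET_PER_REQUEST = 0.01    # dollars a single request may run

_cache: dict[str, str] = {}

def cache_key(model: str, prompt: str) -> str:
    # Normalize case and whitespace so trivial repeats hit the cache.
    normalized = " ".join(prompt.lower().split())
    return hashlib.sha256(f"{model}|{normalized}".encode()).hexdigest()

def estimated_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    return (input_tokens + output_tokens) / 1000 * PRICE_PER_1K[model]

def within_budget(model: str, input_tokens: int) -> bool:
    # Worst case: the model uses the full output cap.
    return estimated_cost(model, input_tokens, MAX_OUTPUT_TOKENS) <= COST_BUDGET_PER_REQUEST

def cached_call(model: str, prompt: str, call: Callable[[str, str, int], str]) -> str:
    """Call the model (via the caller-supplied `call`) only on a cache miss."""
    key = cache_key(model, prompt)
    if key not in _cache:
        _cache[key] = call(model, prompt, MAX_OUTPUT_TOKENS)
    return _cache[key]
```

Checking the worst case up front tells you which requests must go to the cheaper model before you ever send them.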

Track

Find out whether it is behaving, and catch what users never report.

What it produces

Quality dashboard
accuracy, faithfulness, latency, cost, and safety, live.
Metrics taxonomy
product signal kept separate from model behavior.
Session-review habit
read a sample of real sessions. The numbers tell you something is wrong; the sessions tell you what.

What it produces

Drift alerts
a warning when a metric slides while the product still looks fine from the outside.
Regression gate on model changes
re-run the eval suite on every model update before it reaches users.
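Here is a minimal sketch of a drift alert and a regression gate on model changes. It assumes higher-is-better scores, such as a daily faithfulness pass rate, and compares simple averages. The tolerance values are hypothetical, and real monitoring would also account for noise and sample size.

```python
from statistics import mean

def drift_alert(baseline: list[float], recent: list[float], tolerance: float = 0.05) -> bool:
    """Warn when the recent average has slid more than `tolerance` below baseline."""
    return mean(baseline) - mean(recent) > tolerance

def approve_model_update(current_pass_rate: float, candidate_pass_rate: float,
                         max_drop: float = 0.0) -> bool:
    """Block a model update whose eval pass rate falls below the current one."""
    return candidate_pass_rate >= current_pass_rate - max_drop
```

The alert fires on the metric itself, so it catches a slide that users have not reported yet.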

What it produces

New eval cases
each real failure turned into a test you keep.
Ranked list of contract fixes
what Shape changes on the next turn of the cycle.

Continuous Operations

The umbrella across every turn of the cycle.

What it produces

Curation policy
what is eligible to be indexed.
Refresh cadence
how often sources update, and how stale content gets flagged.
Conflict rules
how a new source wins over an old one it contradicts.
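You can sketch the curation policy, refresh cadence, and conflict rule as one indexing pass. This assumes each source has an `approved` flag and a hypothetical `topic` key that marks two sources as covering the same fact. Detecting real contradictions is much harder than matching a key. The 90-day staleness window is also an assumption.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Source:
    doc_id: str
    topic: str       # hypothetical key: same topic means the sources may contradict
    text: str
    updated: date
    approved: bool   # curation policy: only approved sources are eligible to be indexed

def is_stale(src: Source, today: date, max_age_days: int = 90) -> bool:
    """Flag content that has not been refreshed within the cadence."""
    return today - src.updated > timedelta(days=max_age_days)

def build_index(sources: list[Source]) -> dict[str, Source]:
    """Keep one source per topic; the newer one wins when they conflict."""
    index: dict[str, Source] = {}
    for src in sources:
        if not src.approved:
            continue
        current = index.get(src.topic)
        if current is None or src.updated > current.updated:
            index[src.topic] = src
    return index
```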

What it produces

Access-as-behavior rules
what each person may see, treated as a product decision, not just infrastructure.
Safe-refusal patterns
what the system says when asked for what someone cannot have, without leaking that it exists.
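Here is a minimal sketch of access as behavior with a safe refusal. It assumes permissions are expressed as groups on each document, and that `search` is a hypothetical retrieval function you supply. The key move is filtering before retrieval. Then return one reply whether a document is missing or forbidden, so the refusal never confirms that it exists.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Doc:
    doc_id: str
    text: str
    allowed_groups: frozenset

REFUSAL = "I couldn't find anything I can share on that."

def visible_docs(docs: list[Doc], user_groups: set) -> list[Doc]:
    """Filter before retrieval so the model never sees what this user may not."""
    return [d for d in docs if d.allowed_groups & user_groups]

def answer(question: str, docs: list[Doc], user_groups: set,
           search: Callable[[str, list[Doc]], list[Doc]]) -> str:
    hits = search(question, visible_docs(docs, user_groups))
    if not hits:
        # Same reply for "does not exist" and "not allowed": nothing leaks.
        return REFUSAL
    return hits[0].text
```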

What it produces

Supervision design
a human or an explicit check over anything that acts on its own.
Iteration caps
limits, loop detection, and give-up criteria so an agent cannot run forever.
Reliability budget
a per-step error bar, because errors compound: ten chained steps at 95% each succeed end to end only about 60% of the time.
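You can sketch the reliability budget and iteration caps together. The arithmetic assumes steps fail independently, so end-to-end success is the per-step rate raised to the number of steps. The agent loop assumes a hypothetical `step` function that returns an action label, a new hashable state, and a done flag. Its loop detector is crude: it gives up when the same action meets the same state twice.

```python
from typing import Callable, Hashable, Tuple

def end_to_end_success(per_step: float, steps: int) -> float:
    """Independent steps compound: success = per_step ** steps."""
    return per_step ** steps

def required_per_step(target: float, steps: int) -> float:
    """Per-step reliability needed to hit an end-to-end target."""
    return target ** (1 / steps)

StepFn = Callable[[Hashable], Tuple[str, Hashable, bool]]

def run_agent(step: StepFn, state: Hashable, max_steps: int = 10) -> Tuple[str, Hashable]:
    """Run until done, until the iteration cap, or until a loop is detected."""
    seen = set()
    for _ in range(max_steps):
        action, state, done = step(state)
        if done:
            return "done", state
        if (action, state) in seen:
            return "gave_up_loop", state  # repeating itself: stop and hand to a human
        seen.add((action, state))
    return "gave_up_cap", state
```

`end_to_end_success(0.95, 10)` comes out just under 0.6. To finish ten steps 90% of the time, `required_per_step(0.9, 10)` says each step needs roughly 99%.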

What it produces

Hiring rubric
how you hire for the new craft.
Org change plan
how you bring the wider organization to the new bar.