This is a working checklist of what an AI-native product manager owns, grouped by the three moments of ownership: Shape, decide what the product does; Ship, prove it before release; and Track, own it once it is live. Left alone, these decisions get made by accident, by whoever is closest to the build, and show up later as incidents. Run the map against your own product to find what is running unowned before an incident finds it for you.
Each item below comes with the move that delivers it and a plain test for done. The test matters more than the move: if you cannot finish the sentence "done when," you do not own the item yet, you are hoping for it.
Shape: decide what the product does
Shape is where behavior gets chosen, before a line of integration code is worth writing, because everything downstream inherits these four.
| You own | The move that delivers it | Done when | Learn it |
|---|---|---|---|
| The behavior contract | Write the rules in plain words: what it does, what it refuses, the tone, the format. Treat the system prompt as a spec, not a vibe. | A teammate can read the contract and predict the product's behavior without running it. | Prompting is engineering, not wording |
| The model and the fallback | Pick a model on cost, latency, and quality for this job, then decide what happens the moment it runs past its depth. | You can point to the cost, latency, and quality numbers behind the model you chose, and the escalation path for when it runs past its depth is written down before launch. | Choose a model you can live with |
| Grounding | Feed it the specific facts your product needs, and bound what it is allowed to state as true. | Every factual answer traces to a source you control, not to the model's training. | Give the model the facts it wasn't trained on |
| The action surface | List every tool the product can call, place each on the autonomy ladder, and decide whose authority it borrows for each one. | No action runs at higher autonomy, or on wider access, than you chose for it on paper. | The action surface: every tool is delegated authority |
Ship: prove it before release
Shaping behavior is a claim. Shipping is where you make the claim testable, so a release becomes a decision backed by evidence rather than a hope backed by a good demo.
| You own | The move that delivers it | Done when | Learn it |
|---|---|---|---|
| The quality bar | Break "good" into a few qualities you can judge one at a time, with a pass line for each. | "Is this good?" has a yes-or-no answer that two reviewers would reach the same way. | The quality bar: decide what good means |
| The eval set | Gather real and adversarial inputs into a set that samples production, not the happy path you already believed in. | The set holds the inputs that have actually broken things, not just the ones you expected, and each case traces to the transcript or attack it came from. | Test cases: build the set that samples reality |
| The regression gate | Make every change clear the set, so a tone fix cannot quietly break a refund quote three cases away. | No change ships without passing the gate, and the gate lives in the pipeline, not in someone's memory. | The regression gate: no change ships blind |
| The human factors of doubt | Show what the product is unsure about, make the warning impossible to miss, and keep an undo on every risky action. | A first-time user can tell when the product is guessing and can reverse a mistake in one step. | Perception: make the warning impossible to miss |
Track: own it once it is live
A shipped behavior drifts. Models get swapped, prompts get edited, inputs change, and the product you tested is not the product running a month later. When an AI coding agent deleted a company's production database during a stated change freeze, the gap was not a smarter model, it was a blast radius and a stop that no one kept watching. Track is the ownership that does not end at launch.
| You own | The move that delivers it | Done when | Learn it |
|---|---|---|---|
| The stop still works | Halt a live run yourself after each model or prompt change, and confirm one wrong action still cannot reach past what it was scoped to. | You have stopped a real run mid-flight since the last change, not just at launch, and a bad turn still touches only the one thing it was scoped to. | Blast radius: bound what one turn can touch |
| Production signals, fed back | Read real transcripts on a schedule, sort failures into a few buckets you can count, and turn each caught failure into a permanent eval case. | You can name your top failure modes by volume from this week, and the latest incident is already a case in the set. | Production signals: evals after the ship |
| Monitoring and the bill | Wire alerts for when it breaks and a ceiling for what it spends, before either one surprises you. | You hear about an outage or a runaway bill from a page, not from a customer or an invoice. | Monitoring, how you know it broke |
| The trust posture | Decide what the product may do, on whose behalf, and with which data, then write it where the on-call engineer can find it at 3 a.m. | The boundaries are one document, not tribal knowledge, and your public claims do not promise more than your controls deliver. | Write your Security Posture and ship defended |
Track is the row most teams underbuild, because the demo is over and the dashboards look quiet. It runs as a loop, not a launch step: a signal from production names a failure, the failure becomes a bucket, the bucket becomes an eval case, and the gate keeps that case from ever regressing.
Try it this week
Open the map against one feature you have already shipped. For every row, write two things: the name of the person who owns it, and the date you last checked its "done when." The blank rows are your backlog, ranked by what the failure would cost. Two blanks are worth hunting for first: an autonomous action with no stop, and a factual answer with no source. Each one is cheap to close now and expensive to clean up later.
Where this goes next
This map is the index; the Builder's Stack chapters in the last column are the depth behind each row. For the operating loop the three groups come from, read Shape Ship Track, and for why owning behavior is the job rather than acquiring a faster set of tools, read AI-Native Product Management Is a Discipline. The fillable versions of the documents this map points to, the Agent Charter, the Quality Bar, and the Security Posture, live in the Builder's Stack artifacts.
Sources
- Reporting on an AI coding agent that deleted a customer's production database during a stated change freeze, and the platform's response (July 2025): https://www.theregister.com/2025/07/21/replit_saastr_vibe_coding_incident/
- The three groups and the chapter depth come from our Shape Ship Track loop and the practice parts of The Builder's Stack.