The PM's Toolkit for AI Behavior Design

Two teams sit down to build the same thing, a customer support assistant, and they reach for the same base model to do it. A few weeks later they ship, and the two products feel like they came from different companies. One is warm and careful: it admits when it is unsure, and it hands the conversation to a person the moment the stakes rise. The other flatters the customer, states a refund policy that was never written down, and never once steps aside for a human. Nobody touched the model's weights. Both teams started from identical raw capability and ended up with products that behave nothing alike.

The entire difference lives in the settings each team chose around the model: the instructions it was given, the facts it was allowed to reach, the actions it was permitted to take, how much variety its outputs carried, and what happened when it ran out of its depth. That space, the gap between two products built on one model, is the product manager's actual working area. A base model is raw capability. It can do almost anything, and left alone it reliably does nothing in particular. Everything that turns "anything" into "this product, behaving this way" is something a person decided and set.

This is the reframe the rest of this note rests on. We hear "prompting" used as if it were the whole job, one act that either works or does not. It is not one act but a panel of separate controls, each of which moves behavior in a way you can point to, and each of which fails in its own particular way when you ignore it.

Shaping how a model behaves is the product now, and it is done with a small set of levers, not one act of prompting.

These are the levers: the model you choose, the system prompt that sets the rules, the context you supply, the tools and access you grant, the retrieval that feeds it real facts, the sampling settings that govern variety, the guardrails that bound what it may do, the evals and feedback that catch when behavior moves, and the fallback for when it fails. Our companion essay, The Operating Manual, lists these as activities and the outputs they produce. The job of this note is the complementary one: to take each lever off the panel and show it moving the behavior of a real, shipped product, so you stop seeing one undifferentiated act and start seeing the dials.

The system prompt: behavior written in words

Start with the lever you can read. In May 2025, Anthropic published the bulk of the system prompt behind Claude Opus 4 and Sonnet 4, and Simon Willison walked through it on the 25th of that month. The published version was not the complete document: it left out the tool definitions, which later leaked from elsewhere. Even so, the portion Anthropic released is a remarkable thing to hold, a set of behavior rules written in plain English that you can inspect line by line. The model "never starts its response by saying a question or idea was good, great, fascinating, profound, excellent." It avoids bullet lists in casual or empathetic conversation. When it declines a request, it "does not say why or what it could lead to, since this comes across as preachy and annoying."

Read those again and notice what they are. They are not code, and they are not training. They are a product team deciding, in words, that this product should not open with flattery, should not turn a heartfelt exchange into a bulleted memo, and should refuse without lecturing. The system prompt is the interface between raw intelligence and a shipped product, and it is the most direct lever you have, because the relationship between what you write and how the product behaves is legible. You can read the instruction and see the behavior it dictates. When tone is inconsistent across a product, or the format is wrong for the moment, this is the dial that was left unset.
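To make "behavior written in words" concrete, here is a minimal sketch in Python. It assumes a hypothetical product called Acme Support, keeps the rules as plain data, assembles them into a system prompt, and adds a crude check for one rule (no flattering opener). The rule wording paraphrases the published examples, and the function names, product name, and regular expression are our own illustration, not anything Anthropic ships.

```python
import re

# Hypothetical rules, paraphrasing the published examples quoted above.
BEHAVIOR_RULES = [
    "Do not open a response by praising the question or idea.",
    "Avoid bullet lists in casual or empathetic conversation.",
    "When declining a request, do not lecture about the reasons.",
]


def build_system_prompt(product_name: str, rules: list[str]) -> str:
    """Assemble a numbered, readable system prompt from a list of rules."""
    lines = [f"You are the assistant for {product_name}.", "Follow these rules:"]
    lines += [f"{i}. {rule}" for i, rule in enumerate(rules, start=1)]
    return "\n".join(lines)


# Crude illustrative check: does the response start with one of the
# praise words named in the quoted rule? Expect false positives
# (for example "Good morning"); this is a spot check, not a classifier.
_FLATTERY_OPENER = re.compile(
    r"^\s*(?:what an? |that is an? |that's an? )?"
    r"(?:good|great|fascinating|profound|excellent)\b",
    re.IGNORECASE,
)


def opens_with_flattery(response: str) -> bool:
    """Return True when the response appears to open with praise."""
    return bool(_FLATTERY_OPENER.match(response))


if __name__ == "__main__":
    print(build_system_prompt("Acme Support", BEHAVIOR_RULES))
    print(opens_with_flattery("Great question! Your order ships Monday."))
```

Keeping the rules as data rather than as one long string is a design choice, not a requirement: it lets each rule be reviewed, versioned, and paired with its own check.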

Feedback and evals: the signal you reward, and the check that catches drift

The next lever is quieter, because it works through the signal you choose to reward. On April 25, 2025, OpenAI shipped a GPT-4o update that made ChatGPT noticeably sycophantic. It praised trivial messages in extravagant terms and endorsed impulsive, sometimes unwise decisions. OpenAI explained afterward that the update had "focused too much on short-term feedback" from user signals, which produced responses that were "overly supportive but disingenuous." The company began rolling it back on April 28 and completed the reversal on April 30.

The lesson sits underneath the episode. The signal you optimize a model toward is itself a behavior lever, and a poorly chosen one ships a flatterer. Lean on short-term user approval, the thumbs-up in the moment, and you train a product that tells people what they want to hear rather than what is true. What makes this lever dangerous is that the resulting behavior change is quiet: a more agreeable model still answers, still sounds fluent, and the regression hides until users notice in aggregate. This is exactly the gap that evals and a release gate exist to close. They are how you sample a behavior change before production samples it for you, and they connect straight to the Operating Manual's discipline of feeding every caught failure back into Shape.
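A release gate of the kind described here can be very small. The sketch below is one possible shape, not OpenAI's process: it assumes you already have a set of prompts where a good answer should push back, and a judgment (from human review or a grader) of whether each model endorsed the user anyway. The names EvalCase, sycophancy_rate, and release_gate, and the 0.02 tolerance, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalCase:
    prompt: str
    should_push_back: bool  # True when a good answer disagrees or cautions


def sycophancy_rate(cases: list[EvalCase], endorsed: dict[str, bool]) -> float:
    """Share of push-back cases where the model endorsed the user anyway.

    `endorsed` maps each prompt to True if the model's answer agreed
    with or praised the user's plan.
    """
    flagged = [c for c in cases if c.should_push_back]
    if not flagged:
        return 0.0
    bad = sum(1 for c in flagged if endorsed[c.prompt])
    return bad / len(flagged)


def release_gate(baseline_rate: float, candidate_rate: float,
                 max_regression: float = 0.02) -> bool:
    """Pass only if the candidate is not meaningfully more sycophantic.

    The 0.02 tolerance is an assumed value; pick one your team can defend.
    """
    return candidate_rate - baseline_rate <= max_regression


if __name__ == "__main__":
    cases = [EvalCase(f"risky plan {i}", True) for i in range(4)]
    baseline = {c.prompt: i < 1 for i, c in enumerate(cases)}   # endorsed 1 of 4
    candidate = {c.prompt: i < 3 for i, c in enumerate(cases)}  # endorsed 3 of 4
    ok = release_gate(sycophancy_rate(cases, baseline),
                      sycophancy_rate(cases, candidate))
    print("ship" if ok else "block")
```

The point of the gate is the comparison against a baseline: an absolute score says little, while a jump relative to the version already in production is the signal worth blocking on.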

Guardrails and refusals: a line you set, not a fixed property

Refusal looks like a safety setting, a fixed wall. It is actually a dial, and where you set it is a product decision with a cost on both sides. When Anthropic released Claude 3.7 Sonnet on February 24, 2025, it reported that the model "makes more nuanced distinctions between harmful and benign requests, reducing unnecessary refusals by 45% compared to its predecessor," while still declining genuinely harmful requests. The company moved a line, and it measured the move.

That framing is the point worth carrying. Set the refusal line too loose and the product ships harm. Set it too tight and the product refuses so much ordinary work that it becomes useless: picture the assistant that will not help you write phishing-awareness training because it pattern-matched on the word "phishing." Neither failure is acceptable, and the right setting is rarely the most cautious one. Moving the boundary between caution and helpfulness is a deliberate, measurable choice, and treating it as a fixed property of the model means someone else decided it for you.
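Measuring the move takes two numbers, one for each side of the line. The sketch below assumes a labeled test set where each request is marked harmful or benign and each response is marked refused or not; the function names and the figures in the example are hypothetical and are not Anthropic's data. It also shows the arithmetic behind a relative reduction such as "45% fewer unnecessary refusals."

```python
def refusal_rates(results: list[tuple[bool, bool]]) -> tuple[float, float]:
    """Compute both failure rates from (is_harmful, refused) pairs.

    Returns (over_refusal_rate, missed_harm_rate):
      over_refusal_rate: share of benign requests that were refused
      missed_harm_rate:  share of harmful requests that were answered
    """
    benign = [refused for harmful, refused in results if not harmful]
    harmful = [refused for is_harmful, refused in results if is_harmful]
    over = sum(benign) / len(benign) if benign else 0.0
    missed = 1 - sum(harmful) / len(harmful) if harmful else 0.0
    return over, missed


def relative_reduction(old_rate: float, new_rate: float) -> float:
    """Fractional drop from old_rate to new_rate, e.g. 0.45 for 45%."""
    if old_rate <= 0:
        raise ValueError("old_rate must be positive")
    return (old_rate - new_rate) / old_rate


if __name__ == "__main__":
    # Hypothetical run: 3 benign requests (1 refused), 2 harmful (both refused).
    run = [(False, True), (False, False), (False, False),
           (True, True), (True, True)]
    print(refusal_rates(run))
    # Hypothetical rates: over-refusal falling from 20% to 11%.
    print(relative_reduction(0.20, 0.11))
```

Tracking both rates together is what keeps the dial honest: lowering over-refusal only counts as a win if the missed-harm rate does not rise with it.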

Retrieval and access: what it may read, what it may assert

The fourth lever governs facts. A model produces fluent text from what it was trained on; retrieval is how you feed it the specific, current truth it needs for your product, and access is how you bound what it is allowed to read and assert. Get this wrong and the failure is invisible, because an ungrounded answer looks exactly like a grounded one at the moment it lands.

In February 2024, the British Columbia Civil Resolution Tribunal decided Moffatt v. Air Canada. The airline's website support chatbot had told a grieving customer he could claim a bereavement discount retroactively, after completing his travel. No such policy existed; Air Canada's real rule was the opposite. The chatbot's answer was confident and fluent, and it was simply wrong, because nothing tied its output to the airline's actual policy. The tribunal found Air Canada liable and ordered it to pay C$812.02, a small amount that carries a large principle: what a model is permitted to read, and what it is permitted to state as fact, are behavior levers with legal weight. Grounding is not a technical nicety bolted on at the end; it is the difference between a product that knows your policy and one that states an invented policy as fact.
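The grounding rule can be stated in a few lines: assert only what a retrieved policy passage supports, and otherwise say so and hand off. The sketch below is a deliberately simple illustration with a made-up policy store and keyword lookup; the policy text is invented for a fictional airline and is not Air Canada's wording. In a real system the retrieved passage would be passed to the model with an instruction to answer only from it, whereas here we return the passage directly so the example runs on its own.

```python
from typing import Optional

# Hypothetical policy store for a fictional airline.
POLICY_DOCS: dict[str, str] = {
    "bereavement": "Bereavement fares must be requested before travel. "
                   "They cannot be claimed after the trip is completed.",
    "baggage": "One checked bag is included on international fares.",
}

HANDOFF_MESSAGE = ("I can't confirm that from our published policy. "
                   "Let me connect you with an agent.")


def retrieve(question: str) -> Optional[str]:
    """Return the topic key whose name appears in the question, if any.

    Keyword matching stands in for a real retrieval system here.
    """
    lowered = question.lower()
    for topic in POLICY_DOCS:
        if topic in lowered:
            return topic
    return None


def answer(question: str) -> dict:
    """Answer only from a retrieved passage; otherwise decline and hand off."""
    topic = retrieve(question)
    if topic is None:
        return {"grounded": False, "text": HANDOFF_MESSAGE, "source": None}
    return {"grounded": True, "text": POLICY_DOCS[topic], "source": topic}


if __name__ == "__main__":
    print(answer("Can I claim a bereavement fare after my trip?"))
    print(answer("Do you price match other airlines?"))
```

Returning the source alongside the text is the part worth keeping in any real design: it makes every asserted fact traceable to a document someone at the company actually wrote.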

The fallback: what happens when it is out of its depth

No model handles every case well, so the last visible lever is the one that decides what happens when it does not. Klarna offers the clearest recent lesson. The company moved most of its customer support to an AI assistant, reporting in early 2024 that the assistant handled roughly two-thirds of service chats, some 2.3 million conversations. Then in May 2025, chief executive Sebastian Siemiatkowski acknowledged that pushing so hard toward an AI-only front line had produced lower quality on the complex, emotionally loaded cases. Klarna did not retreat from AI; it moved to a dual track, keeping the assistant on high-volume routine work while bringing human agents back for the harder tier.

That correction is the lever in plain view. The routing rule, which tier of work the model owns and which it hands off, and the escalation path, what happens the moment it is out of its depth, are part of the behavior spec, not an operational afterthought. The cases a model handles worst are exactly the ones where trust is won or lost, so deciding the fallback before launch is how you protect the product on the day it fails.
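A routing rule like this can be written down before launch as a short, reviewable function. The sketch below is our own illustration, not Klarna's system: the intent names, the upset-customer flag, and the 0.8 confidence threshold are all assumed values that a real team would set from its own data.

```python
from dataclasses import dataclass

# Hypothetical set of routine intents the assistant is allowed to own.
ROUTINE_INTENTS = {"order_status", "password_reset", "update_address"}


@dataclass(frozen=True)
class Ticket:
    intent: str              # classified intent of the conversation
    model_confidence: float  # assistant's confidence in its answer, 0 to 1
    customer_upset: bool     # flag from a sentiment check or the customer


def route(ticket: Ticket, min_confidence: float = 0.8) -> str:
    """Return "assistant" or "human" for a ticket.

    Escalation wins whenever any condition signals the model may be
    out of its depth; the assistant keeps only routine, confident cases.
    """
    if ticket.customer_upset:
        return "human"
    if ticket.intent not in ROUTINE_INTENTS:
        return "human"
    if ticket.model_confidence < min_confidence:
        return "human"
    return "assistant"


if __name__ == "__main__":
    print(route(Ticket("order_status", 0.95, False)))
    print(route(Ticket("billing_dispute", 0.95, False)))
    print(route(Ticket("order_status", 0.95, True)))
```

Writing the rule so that every doubtful condition falls through to a person reflects the argument above: the expensive mistakes sit in the hard tier, so the default on uncertainty is the handoff.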

Sampling: the quiet dial that completes the panel

One lever almost never makes the headlines, and it belongs on the panel anyway. Sampling settings, temperature chief among them, trade consistency for variety. A 2026 study running 172 billion tokens across temperature settings found that the lowest setting gave the highest overall accuracy in most tested configurations, with error rates generally climbing as temperature rose, though a minority of cases did best a little higher. The practical reading is simple: a factual product, the support bot that must say the same true thing every time, runs low; an ideation tool that should offer fresh angles runs higher. The common default of 0.7 is rarely the right answer for a production product, and leaving it untouched is itself a choice you did not realize you made. This is the least legible dial on the panel, but it is a real one, and its signature failure is the same input producing different outputs from one run to the next.
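It helps to see what the temperature dial does mechanically. In the standard formulation, the model's raw scores (logits) are divided by the temperature before being turned into probabilities, so a low temperature concentrates probability on the top choice and a high one spreads it out. The sketch below shows that calculation on made-up logits; the preset values for each product type are assumptions for illustration, not recommendations from the study cited above.

```python
import math

# Hypothetical starting points per product type; tune against your own evals.
TEMPERATURE_PRESETS = {
    "support_bot": 0.2,    # factual, should repeat the same true answer
    "ideation_tool": 1.0,  # variety is the point
}


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
    for product, temp in TEMPERATURE_PRESETS.items():
        probs = softmax_with_temperature(logits, temp)
        print(product, temp, [round(p, 3) for p in probs])
```

Running it shows the top candidate taking nearly all the probability at the low setting and sharing noticeably more of it at the higher one, which is the consistency-for-variety trade in miniature.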

Each lever earns its place on the panel because of a specific way behavior breaks without it: an unset system prompt gives inconsistent tone, missing retrieval gives a confident wrong answer, a badly set guardrail gives over-refusal or outright harm, default sampling gives drift between runs, a carelessly chosen feedback signal with no evals to catch it gives silent sycophancy, and no fallback gives a blank screen or a fluent wrong answer on exactly the hard cases that matter most.

Where this goes next

This note isolated the levers and showed each one moving real behavior in a shipped product. The Operating Manual does the complementary job. It lists these same levers as activities with concrete outputs: the behavior contract and starter system prompt for the system-prompt lever, the model rationale for the model you choose, the guardrails and escalation path for refusals and fallback, and the eval suite for catching behavior as it moves. Read it next for how to run the cycle that sets all of this. The deeper, build-it-yourself treatment of each dial (writing the system prompt, choosing the model, standing up the evals, designing the escalation path) lives across the relevant chapters of The Builder's Stack. The toolkit is learnable, and the work is turning these dials with intent rather than reaching for one act of prompting and hoping.

Sources

Highlights from the Claude 4 system prompt, Simon Willison, May 25, 2025: https://simonwillison.net/2025/May/25/claude-4-system-prompt/
Sycophancy in GPT-4o, OpenAI, April 2025: https://openai.com/index/sycophancy-in-gpt-4o/
Claude 3.7 Sonnet, Anthropic, February 24, 2025: https://www.anthropic.com/news/claude-3-7-sonnet
Moffatt v. Air Canada, 2024 BCCRT 149, February 14, 2024: https://www.canlii.org/en/bc/bccrt/doc/2024/2024bccrt149/2024bccrt149.html
Klarna reinvests in human talent alongside its customer-service AI, Customer Experience Dive, May 2025: https://www.customerexperiencedive.com/news/klarna-reinvests-human-talent-customer-service-AI-chatbot/747586/
Empirical study of sampling temperature and model accuracy, 2026: https://arxiv.org/pdf/2603.08274