Founding Essay · 02

The Human Factors

The AI-native PM's job is to shape how the system behaves. This essay is the toolkit for shaping it around the human on the other side, drawn from the cognitive science of how people perceive, remember, and decide.

16 min read · Design AI products using cognitive science

From 2013 to 2015 I was in Bentley's research-intensive Human Factors in Information Design program, where I wrote a few papers on how people perceive, remember, and make sense of software. The products I studied were ordinary, and none of them had anything to do with AI. I keep reaching for those papers now, because the cognitive science in them is the most practical guide I know to building AI products that fit the people who use them. This essay is the outcome of applying that research to AI products.

The AI-native PM's job is to shape how the system behaves. The question is what to shape it toward, and the answer is the human on the other side, a cognitive system with real, well-mapped limits. The science of those limits is a working toolkit for the job.

The following ideas from that research carry most of the weight here:

  • Pre-attentive processing is the fast, automatic pass your vision makes in about the first fifth of a second, before you consciously read anything.
  • Working memory is the small, fragile store for what you are thinking about right now, the same limited pool your attention and your anxiety both draw from.
  • Prior knowledge and mental models are the structures you have built over years that let you understand a new tool by analogy to old ones.
  • Metacognition is your thinking about your own thinking: the supervisor that watches whether you are right and decides when to change course.

Each of the recommendations below is drawn from this research.

[Figure: The ideas. Four independent lenses on the person using your product: (1) pre-attentive processing, what the eye grabs first; (2) working memory, the small pool you think in; (3) mental models, the priors you bring; (4) metacognition, knowing when to doubt. They are not a sequence, and any one can be the thing you got wrong.]

The recommendations

These hold wherever a human meets a model: a chat box, an autonomous agent, a recommendation feed, an AI feature buried in another product.

1
Make the warning impossible to miss.

The eye sorts a screen in a fraction of a second, whatever is on it: a chat answer, a report, a code suggestion, a number on a dashboard. Put the warning, the confidence, and the source where the eye lands first: in color, in weight, in an icon. A model's "I'm not sure about this" buried in the same gray as everything else might as well not be there.

[Figure: One cue, seen first. A row of identical gray dots with one larger teal dot that stands out. One cue is seen before a word is read.]

2
Don't make users hold the session in their head.

Ten steps into an agentic run, or ten turns into a chat, the user has lost track of what the system knows and what it has changed. Keep the working set visible, flag the model's uncertain outputs, and make undo one step.

[Figure: The session outgrows working memory. A long bar for the session or context window sits above a short seven-segment bar for the user's working memory (about 7 things). The session outgrows what the user can hold.]

3
Lower the anxiety at the high-stakes moments.

When the stakes are high, people get anxious, and anxiety eats the very attention they need to think. It bites hardest in agentic moments: just before the agent sends the email, moves the money, or deletes the record. Make the model's behavior predictable, preview what it will do, confirm anything irreversible, and keep undo cheap. A calmer moment leaves the user more mind for the decision.

[Figure: One pool, the task and the worry. Two bars of equal width: at low stakes most of the pool is capacity for the task; at high stakes worry takes most of it. Higher stakes, more worry, less left to think with.]

4
Tell users what the system can do.

A blank prompt box tells the user nothing, and neither does an agent with hidden powers or a feature no one finds. Show what the system can and cannot do, and seed real examples. Never quietly do the thing an expert would never do; that is where trust dies fastest. And the bar keeps rising: people who live in AI now expect any software to understand plain intent.

[Figure: Two schemas, one user. Old schema: software is literal. New schema: AI gets what I mean. The same user carries both, and your interface picks which fires.]

5
Help users tell when you're wrong.

A confident wrong answer is most dangerous to the people who don't know the domain, because they can't catch it. It gets worse when the system acts on that answer: an agent running a whole plan off a wrong premise, a recommendation engine pushing the wrong thing. Show sources, show real confidence levels, make checking cheap, and flag when the user is past what they can judge.

[Figure: The sensemaking loop. Evaluating the output requires domain knowledge, and domain knowledge comes from evaluating output. Confident, wrong output is most dangerous to the novice.]

6
Don't let the model be its own boss.

A model can't reliably detect when it is wrong or when its own plan has stopped working, so the supervisor has to be a human or an explicit external check. This matters most once a product is agentic. An agent working toward a goal needs someone who can see the plan before it runs, watch it as it runs, and stop or redirect it in one step.

[Figure: A supervisor from outside the model. A supervisory layer (human, critic, eval, guardrail, checkpoint) sits above the model, which has execution but no boss of its own.]

The research behind each recommendation

1. The first two hundred milliseconds

Before you consciously read anything, your visual system has already swept over the screen. This is the pre-attentive stage, named by Feature Integration Theory, and it catches only a few features: color, size, orientation, motion. It runs, in one description I keep coming back to, "autonomously, involuntarily, nearly effortlessly, prior to and even in the absence of conscious awareness."

A uniform wall of AI output (a chat answer, a generated document, a crowded dashboard) gives that first pass nothing to grab. Worse, under heavy load the visual system suppresses detail rather than just ignoring it: a busy display measurably lowers detection of things clearly visible on screen, an effect called load-induced blindness. A dense answer does not just bury the warning; it can leave the user unable to see it.

[Figure: The first 200 milliseconds. Two grids of identical gray dots. On the left, a warning is buried in the same gray as everything else; on the right, one dot set apart by color and size pops out at a glance. Caption: One cue, seen before the rest.]

2. The conversation is heavier than it looks

Working memory, the store for what you are actively thinking about, is famously small: the magical number seven, plus or minus two. Cognitive Load Theory splits its load three ways: intrinsic (the task's own difficulty), extraneous (difficulty added by how it is presented), and germane (the real work of understanding). Design exists to cut the extraneous kind.

A long agentic run or a multi-turn chat piles on extraneous load. The user has to hold the whole thread: what they said, what the model has been given, what it has not, and what it will do next. The product quietly assumes their working memory is as large as the model's context window. It is not, and that gap is where users get lost.
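
One way to close that gap is to have the product hold the thread so the user does not have to. The sketch below is a minimal illustration in Python, not a description of any shipping product. The `SessionLedger` class, its method names, and the cap of seven visible lines are all assumptions made for the example; the cap simply echoes the "about seven things" figure above.

```python
from dataclasses import dataclass, field


@dataclass
class SessionLedger:
    """Hypothetical running record of what the model was given and what it changed."""

    max_visible: int = 7  # assumption: echoes the "about seven things" limit
    given: list[str] = field(default_factory=list)
    changed: list[str] = field(default_factory=list)

    def note_given(self, item: str) -> None:
        self.given.append(item)

    def note_changed(self, item: str) -> None:
        self.changed.append(item)

    def summary(self) -> list[str]:
        """Return at most max_visible lines, changes first, newest first."""
        lines = [f"changed: {c}" for c in reversed(self.changed)]
        lines += [f"given: {g}" for g in reversed(self.given)]
        hidden = len(lines) - self.max_visible
        if hidden > 0:
            # Reserve the last slot for an overflow note so the panel
            # never grows past max_visible lines.
            shown = lines[: self.max_visible - 1]
            lines = shown + [f"... and {hidden + 1} earlier items"]
        return lines
```

The design choice worth noting is the hard cap: the panel summarizes the session at the size of the user's working memory, not the model's context window.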

[Figure: Working memory vs the context window. A long bar represents the model's effectively vast context window; a short bar of about seven segments represents the user's working memory. The product assumes they are the same size. They are not. Caption: The session outgrows working memory.]

3. When the stakes rise, anxiety eats the memory you need

Working memory does not just hold things; it shapes what we perceive. Perception is the brain's work, not the eye's, which is why one patient could see a glove perfectly yet describe it only as "a continuous surface, infolded on itself" with "five outpouchings." At the center of working memory is executive attention, a mechanism of fluid intelligence, and it is one fixed, shared pool. That is the catch, because emotion draws on the same pool.

Anxiety is worry about your own performance; it rises as your confidence falls and peaks at maximum uncertainty. At high arousal the mind spends its limited capacity on the worry itself, leaving less for the task. So the higher the stakes, the more anxious the user, and the less working memory they have left exactly when they need it most.

AI products manufacture exactly these moments. Is this answer right? What is the agent about to do to my files, my money? At the sharp end of that uncertainty, the user is running on a shrunken working memory. So shaping the model's behavior is partly emotional design: make what it will do predictable, preview it, confirm the irreversible, keep undo cheap. Lowering the stakes is not hand-holding; it hands the user back the attention the moment is stealing.
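
The preview, confirm, and undo pattern can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a reference implementation: `Action`, `execute`, and `undo_last` are hypothetical names, and the rule that an action without an `undo` function counts as irreversible is a simplification chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Action:
    """Hypothetical description of one thing an agent wants to do."""

    description: str                               # plain-language preview
    run: Callable[[], None]                        # performs the action
    undo: Optional[Callable[[], None]] = None      # None marks it irreversible


def execute(
    action: Action,
    confirm: Callable[[str], bool],
    undo_stack: list[Action],
) -> bool:
    """Run an action, asking first only when it cannot be undone.

    Returns True if the action ran, False if the user declined.
    """
    if action.undo is None:
        # Irreversible: show the preview and wait for an explicit yes.
        if not confirm(f"This cannot be undone: {action.description}"):
            return False
    action.run()
    if action.undo is not None:
        undo_stack.append(action)  # reversible work stays one step from undo
    return True


def undo_last(undo_stack: list[Action]) -> bool:
    """Undo the most recent reversible action in one step."""
    if not undo_stack:
        return False
    action = undo_stack.pop()
    action.undo()
    return True
```

Reversible actions run without interruption and remain one call away from undo, so confirmation is spent only on the moments that carry real stakes.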

[Figure: One pool, the task and the worry. Two bars of equal total width represent one fixed pool of working memory. At low stakes most of it is capacity for the task; at high stakes anxiety takes most of it. Caption: Anxiety draws from the same pool as attention.]

4. The blank box and the model in the user's head

Prior knowledge is stored as schemas, and a schema fires when something matches it. In one of my papers I used an image I still love: a frog surrounded by dead flies will starve, yet it snaps at a single buzzing fly and flees a large moving shape, because a small moving thing trips its food schema and a large one its escape schema. Same frog, opposite reactions. People work the same way, with far richer schemas, making sense of a new tool through the metaphors and mental models that similar tools left behind.

With AI, the schema that fires is usually the wrong one. People reach for their model of ordinary software, same input, same output, and meet a system that answers differently every time. A blank prompt box affords nothing visible, telling the user nothing about what it can do. Part of the job is to build the new schema for them, with example prompts and plain disclosure of what the system can do.

The reverse has started to bite, and it is the part I would watch. AI is writing a new schema of its own. People who live in these tools now expect any software to take plain language and hand back the result, not the steps, and older products feel broken to them. The same user carries both schemas, the old literal one and the new AI one, and your interface decides which fires. A product that looks like a chat but behaves literally trips the wrong one, and so does a traditional product that ignores the new expectation.

One kind of schema is unforgiving: negative knowledge, the hard-won sense of what not to do that is half of real expertise. When a tool quietly does the thing an expert knows never to do, it breaks trust, because it crossed a line the user knew was there even when the model did not.

[Figure: Two schemas, one user. Old schema: software is literal, same input, same output. New schema: AI understands what I mean. The same user carries both, and a product that looks like a chat but behaves literally trips the wrong one. Caption: Which mental model the interface fires.]

5. Knowing when to doubt the answer

Metacognition is thinking about your own thinking, the boss function that decides where to spend limited mental effort. The most important skill anyone brings to an AI product is exactly that: knowing what you do not know, and when to check.

This is hard with AI for a reason that has a name, the sensemaking paradox: you need domain knowledge to judge the output, but judging the output is how you build domain knowledge. For an expert, fine. For a novice it is a trap: they cannot tell the AI is wrong precisely because they lack the knowledge it is meant to supply. This is how lawyers have filed briefs full of cases the model invented, fluent and confident and wrong, aimed at the person least able to catch it. Most products make it worse by giving every answer the same flat confidence.
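
A rough sketch of the alternative to flat confidence follows. It assumes the product already has a confidence score between 0 and 1 for each answer and a list of sources it drew on; where that score comes from, and whether it is well calibrated, is outside the sketch and should not be taken for granted. The `Answer` class, the `present` function, the level names, and the thresholds are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Answer:
    text: str
    confidence: float  # assumption: a score in [0, 1] supplied by the product
    sources: list[str] = field(default_factory=list)


def present(answer: Answer, high: float = 0.8, low: float = 0.5) -> dict:
    """Map an answer to a display treatment instead of one flat style.

    The thresholds are illustrative, not empirically derived.
    """
    if not answer.sources:
        # Nothing the user could check against: never show as confident.
        level = "unverified"
    elif answer.confidence >= high:
        level = "confident"
    elif answer.confidence >= low:
        level = "check"
    else:
        level = "unsure"
    return {
        "text": answer.text,
        "level": level,
        "show_warning": level != "confident",
        "sources": answer.sources,
    }
```

An answer with no sources is capped at "unverified" however high its score, because a novice has no cheap way to check it.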

[Figure: The sensemaking paradox. A loop between two boxes: evaluating the output requires domain knowledge, and domain knowledge comes from evaluating output. Someone who has neither cannot enter the loop. Caption: The sensemaking loop.]

6. The model cannot be its own boss

Metacognition is not only monitoring. On its regulatory side, the boss function plans the approach, watches the work as it runs, and drops a strategy once it stops paying off. It has been called "a higher-order agent overlooking and governing the cognitive system, while simultaneously being part of it."

There is an old catch in that phrase, Comte's paradox: "one cannot split one's self in two, of whom one thinks whilst the other observes him thinking." Nothing fully supervises itself from the inside, and a language model is a sharp example. It cannot reliably detect when it is wrong or when its plan has stopped working. It has execution without a boss.

So the supervisor has to come from outside the model, a human or explicit scaffolding: a critic, an eval, a guardrail, a checkpoint. This turns urgent the moment a product becomes agentic and starts acting on its own. An agent running a bad plan with no one watching is among the most expensive failures in AI today. The judgment a model cannot bring to its own work is exactly what the product has to supply around it.
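
As a minimal sketch of a supervisor that sits outside the model, the loop below runs a plan one step at a time and hands each result to an external check before continuing. Everything named here is an assumption for illustration: `run_supervised`, `Verdict`, and the `supervisor` callback are hypothetical, and the callback stands in for whatever the product uses, whether a human prompt, a rule-based guardrail, or a separate evaluator. The sketch checks after each step; a fuller version would also show the whole plan for approval before the first step runs.

```python
from enum import Enum
from typing import Callable, Iterable


class Verdict(Enum):
    CONTINUE = "continue"
    STOP = "stop"


def run_supervised(
    plan: Iterable[str],
    do_step: Callable[[str], str],
    supervisor: Callable[[str, str], Verdict],
) -> list[tuple[str, str]]:
    """Run plan steps one at a time, with an outside check after each.

    supervisor receives (step, result) and is not the model itself.
    Returns the log of steps that actually ran.
    """
    log: list[tuple[str, str]] = []
    for step in plan:
        result = do_step(step)
        log.append((step, result))
        if supervisor(step, result) is Verdict.STOP:
            break  # one-step stop: nothing after this checkpoint runs
    return log
```

The point of the shape is that the stop decision never passes through `do_step`: the code that executes the plan has no say in whether the plan continues.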

[Figure: Execution without a boss. The model is execution with no boss of its own. A supervisory layer from outside the model (a human plus critics, evals, guardrails, and checkpoints) watches and governs it. Caption: A check that sits outside the model.]

The best AI products are already built this way

None of this is theoretical. The clearest sign that human factors matter more now, not less, is that the teams building the strongest AI products are spending more on them as their models get more capable. The better the model, the more the hard part shifts to the human deciding whether to trust the output, and the best teams design for exactly that.

Anthropic's Claude is a useful example, because it is not one product but several on the same model. That model sits under the chat, under Claude Code for engineers, and under Cowork for everyday knowledge work, yet each one meets a different human doing a different job, and the affordances change to match. Three products instead of one universal chat box is the mental-models principle in practice: a coding task, a research question, and a folder of files each trip a different mental model, so each gets its own tools instead of one blank box pretending to do everything.

Look closer and the rest of the list is there too. Cowork keeps a running task list and loops you in as it works, sparing your working memory so you never hold ten steps in your head. Claude Code shows its plan before it runs and its diffs before they land, and it asks before the irreversible thing. That eases the working-memory load at the sharp moment and supplies the metacognition the model lacks: a checkpoint between it and the commit. Across all of them the model shows its reasoning, marks its uncertain outputs, and points back to what it used, the metacognition that helps the human tell when it is wrong.

The pattern repeats at other labs too, in a research tool that shows its sources or a coding assistant that previews its edits. Take the load the model creates, and design it back down for the human who has to act on the output. The companies pulling ahead treat that as the product itself, not as packaging around the model.

[Figure: One model, many products. Claude the chat: cites what it used (metacognition), flags what it's unsure of (perception). Claude Code for engineers: plans before it runs (metacognition), asks before it commits (working memory). Cowork for knowledge work: running task list (working memory), loops you in to approve (metacognition). A product per job, not one chat box (mental models). Caption: One model, shaped to each job.]

The mind on the other side

The products that win the next few years will be the ones designed for the mind that has to use them, with its real and well-documented limits, rather than for an idealized user who has none. That mind is on the other side of every model. It is the part of the system the model has no access to, and it is the part the PM is there to protect.

[Figure: The mind on the other side. On one side of the interface sits the model, a grid of components; on the other sits the mind that has to use it, which the model cannot see and the PM is there to protect. Caption: The mind on the other side of the model.]

Sources and further reading

The cognitive science here is drawn from the canonical literature and from the peer-reviewed papers I wrote applying it to real consumer products at Bentley.

  • Anne Treisman (1985, 1986, 1992), Feature Integration Theory, and John Bargh (1992), on the automaticity of pre-attentive processing.
  • Nilli Lavie and colleagues, including Konstantinou and Lavie, on perceptual load and load-induced blindness.
  • Alan Baddeley and Graham Hitch (1974), the working memory model, and George Miller (1956), "The Magical Number Seven, Plus or Minus Two," Psychological Review.
  • John Sweller, with Paas and Renkl, on Cognitive Load Theory and its intrinsic, extraneous, and germane components.
  • On working memory as executive attention and how anxiety draws on the same capacity: Engle (2002); Liebert and Morris (1967); Atkinson and Feather (1966); Pessoa (2009); Lang, Davis, and Öhman (2000); and Oliver Sacks, The Man Who Mistook His Wife for a Hat.
  • James Gibson, on the ecological theory of affordances, and Don Norman, The Design of Everyday Things (Basic Books, 1988, revised 2013).
  • Kirschner, Clark, and Sweller (2006), on prior knowledge and long-term memory, and Arbib (1992), on schemas and their activation.
  • Gartmeier, Bauer, Gruber, and Heid (2008), on negative knowledge.
  • John Flavell (1979), "Metacognition and Cognitive Monitoring," American Psychologist, and Diane Halpern (1998), on metacognition as the boss function; Veenman, Van Hout-Wolters, and Afflerbach (2006), and Vermunt (1996), on metacognition as a higher-order, regulatory function.
  • Pirolli and Russell (2011), on sensemaking, and Pirolli (1999), on information foraging.
  • On the products in "The best AI products are already built this way": Claude Cowork, Claude Code permission modes, and an overview of Anthropic's 2026 releases.

These ideas are drawn from the papers I wrote between 2013 and 2015, on pre-attentive processing, prior knowledge and mental models, metacognition, and working memory under emotional load. When the academic record page goes live, the papers will be linkable there.