Why AI products need human factors · The Builder's Stack

The model under your product got better this quarter, and it will get better again next quarter. The people using your product are not following the same curve. They hesitate before accepting an output, they drift back to the old way of doing the task, and some of them tried the feature twice and never came back. The model keeps getting more capable, while the trust people place in it stays about where it was.

The clearest evidence we know of comes from a randomized controlled trial run in 2025 by the research group METR, which had experienced open-source developers complete real tasks from their own repositories, with frontier AI tools allowed on some tasks and not on others.

What happened: the developers finished 19 percent slower with the tools than without them.
What they believed: that the tools had made them about 20 percent faster.

The bottleneck was not the model. The time the developers lost, and could not feel themselves losing, went into the human side of the work: prompting, waiting, reviewing the output, and deciding whether to trust what came back.
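To make the size of that gap concrete, here is a minimal arithmetic sketch in Python. It assumes "19 percent slower" means tasks took 19 percent more time, and it reads "20 percent faster" as 20 percent less time. Both readings, the 100-minute baseline, and the function name perception_gap are illustrative assumptions, not figures or methods taken from the study.

```python
def perception_gap(baseline_minutes: float,
                   actual_change_pct: float,
                   believed_change_pct: float) -> dict:
    """Compare how long a task really took with how long it felt.

    Percentages are changes in task time relative to working unaided:
    +19 means 19 percent more time, -20 means 20 percent less time.
    """
    actual = baseline_minutes * (1 + actual_change_pct / 100)
    believed = baseline_minutes * (1 + believed_change_pct / 100)
    return {
        "actual_minutes": round(actual, 1),
        "believed_minutes": round(believed, 1),
        # Time lost that the person does not perceive losing.
        "gap_minutes": round(actual - believed, 1),
    }


# Hypothetical 100-minute task, using the two headline percentages.
result = perception_gap(100, actual_change_pct=19, believed_change_pct=-20)
print(result)
```

Under those assumptions, a task that would take 100 minutes unaided takes about 119 minutes with the tools while feeling like about 80, a gap of roughly 39 minutes that the developer never notices.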

This part of The Practice is about that human side. Human factors is the discipline of designing for how people actually perceive, remember, decide, and verify, and the chapters ahead turn it into tools you can run against your own product.

Where this material comes from

Our essay The Human Factors makes the argument this part is built on: the user is a cognitive system with real, documented limits, and the AI products that win are the ones designed for those limits. The essay makes the case, and these chapters show you what to build.

The material has a specific provenance. One of us spent two years in a human factors graduate program writing research papers that applied cognitive science to real products, among them a stock-trading interface, a dense e-commerce homepage, a professional camera, and photo-editing software. Every paper ran the same discipline:

Establish the science.
Apply it to the shipping design.
Separate what the product got right from what it got wrong.
End with fixes a team could build.

The chapters in this part run that same discipline on AI products.

Those papers predate modern AI, and they still hold for a simple reason: they describe the half of the system that does not change. Models get retrained every few months, while the human visual system, the limits of working memory, and the way people store and retrieve what they know have not changed in fifty thousand years.

The four lenses: perception, working memory, mental models, metacognition

This part stands on four bodies of research, and each one becomes a lens you can hold up to your own product.

Perception. Your visual system makes a fast, automatic pass over every screen in roughly the first fifth of a second, before you consciously read a word. That pass is pre-attentive processing, and it registers only a few features, such as color, size, orientation, and motion. Whatever fails to stand out in that pass may never be noticed at all.

Working memory. The mental workspace for whatever you are actively thinking about holds only a handful of items (Miller's classic estimate was seven, plus or minus two), and it is a single limited pool. Attention draws on it, anxiety draws on it, and a long AI session draws on it hardest of all.

Mental models. Nobody meets your product cold. Users arrive with a mental model assembled from every tool they have used before, and that stored knowledge makes them fast when your design matches it and lost when it does not. The model people bring to AI products is usually wrong in specific, predictable ways.

Metacognition. Metacognition is the skill of monitoring your own thinking, knowing what you do not know, and deciding when to verify. It is the most important skill a user brings to an AI product, and most AI products give it nothing to work with.

The chapters on anxiety and supervision build directly on these lenses. Anxiety matters because it consumes the same limited pool that working memory runs on, and supervision extends metacognition to systems that act on their own, where the checking has to come from a person.

What each chapter covers

The essay closes with six recommendations, and each one gets a chapter here. Every chapter follows the same arc: a concrete failure moment, the science behind it, the recommendation as an operating rule, a real shipping product examined closely, moves to operationalize the rule in your own product, and a drill you can run in fifteen minutes.

Perception: make the warning impossible to miss. If a critical signal does not register in the first glance, for the user it does not exist.
Working memory: keep the session on the screen. Session state belongs in the interface, not in the user's head.
Anxiety: lower the stakes at risky moments. Worry consumes the exact attention a risky moment requires, so make actions previewable and reversible.
Mental models: show people what the system can do. A blank prompt box teaches nothing, and a capability nobody can find might as well not have shipped.
Metacognition: help people catch wrong answers. A confident wrong answer does the most damage to the user least equipped to spot it, so verification has to be cheap.
Supervision: keep a human in charge of the agent. The model cannot reliably detect its own failures, so the check has to live outside it.

The examples come from shipping products across categories, from writing assistants and meeting tools to support bots and coding agents, and the failures get covered as honestly as the successes.

The capstone, Run the human factors audit, assembles all six recommendations into one checklist you can run against your own product in about an hour. The checklist ships as a downloadable, fillable PDF in the artifacts library.
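If you would rather track the audit in code than on paper, the six rules can be sketched as a small data structure. This is a hypothetical representation, not the contents of the PDF checklist: the rule wording is taken from the chapter list above, while the names AuditItem, AUDIT_ITEMS, and summarize, and the simple pass or fail scoring, are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditItem:
    lens: str  # which chapter the rule belongs to
    rule: str  # the operating rule, as worded in the chapter list


# Rule text comes from the chapter list; the structure is hypothetical.
AUDIT_ITEMS = [
    AuditItem("Perception", "Make the warning impossible to miss."),
    AuditItem("Working memory", "Keep the session on the screen."),
    AuditItem("Anxiety", "Lower the stakes at risky moments."),
    AuditItem("Mental models", "Show people what the system can do."),
    AuditItem("Metacognition", "Help people catch wrong answers."),
    AuditItem("Supervision", "Keep a human in charge of the agent."),
]


def summarize(results: dict[str, bool]) -> dict:
    """Turn per-lens pass/fail answers into a short summary.

    `results` maps each lens name to True (the product follows the rule)
    or False (it does not). Every lens must be answered.
    """
    missing = [item.lens for item in AUDIT_ITEMS if item.lens not in results]
    if missing:
        raise ValueError(f"No answer recorded for: {', '.join(missing)}")
    failed = [item.lens for item in AUDIT_ITEMS if not results[item.lens]]
    return {
        "passed": len(AUDIT_ITEMS) - len(failed),
        "total": len(AUDIT_ITEMS),
        # Failed lenses, in chapter order, are the chapters to read first.
        "start_with": failed,
    }
```

Calling summarize with an answer for each of the six lenses returns how many rules the product passes and which chapters to start with. A single pass or fail per lens is a deliberate simplification, and a real audit would likely record evidence for each answer.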

How to read this part

You do not need a cognitive science background. Every concept is introduced in plain language the first time it appears, and the studies behind each chapter are listed in its Sources section if you want to go deeper.

Read the chapters in order for the full build, lens by lens, or jump straight to the one your product is bleeding on:

If users keep missing warnings you are sure are visible, start with perception.
If they lose the thread ten turns into a session, start with working memory.

The chapters reference each other, but each stands alone.

If you have not yet built or run anything with AI in the loop, go through Working with AI as a builder in Foundations first, because this part assumes you have a product, or at least a prototype, to point these tools at.

The one idea that runs through every chapter

The user is a cognitive system with hard limits, and most AI products are designed as if those limits did not exist.

If you keep one sentence from this part, keep that one. Your model has a context window with a published size, an engineering team tuning it, and a dashboard tracking its use. The person on the other side of the screen has a context window too, made of a working memory that holds about seven items, a visual system that makes its first pass in about a fifth of a second, and attention that narrows exactly when the stakes rise. Nobody publishes that spec and nobody on most teams owns it, which is why designing for it is the clearest advantage left in AI products.

The model's half of the system gets stronger every quarter without your help. The human half is fixed, and designing for it is the work this part teaches you to do.

Chapter Summary

The model under your product keeps getting better every quarter, but the people using it do not improve on the same schedule.
A 2025 controlled trial showed developers working 19 percent slower with AI tools while believing they were 20 percent faster, so the real cost was on the human side and they could not feel it.
Human factors is the study of how people actually perceive, remember, decide, and verify, and this part turns it into tools you can run against your own product.
This material comes from graduate research that applied cognitive science to shipping products, and it still holds because it describes the half of the system that does not change.
The part stands on four lenses you can hold up to any product: perception, working memory, mental models, and metacognition.
The one idea to keep is that the user is a cognitive system with hard limits, and most AI products are built as if those limits did not exist.
The model half of the system gets stronger on its own, while the human half stays fixed, so designing for the human half is the clearest advantage left.
You can read the chapters in order for the full build, or jump straight to the lens your product is struggling with most.
This part operationalizes our essay The Human Factors, so start there for the argument, then work through the chapters and run the audit at the end.

Sources

The Human Factors, the essay this part operationalizes.
Becker, J., Rush, N., Barnes, E., & Rein, D. (2025). Measuring the impact of early-2025 AI on experienced open-source developer productivity. METR. arXiv:2507.09089.
Treisman, A. (1986). Features and objects in visual processing. Scientific American, 255(5).
Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63(2).
Baddeley, A. D., & Hitch, G. (1974). Working memory. In The Psychology of Learning and Motivation (Vol. 8). Academic Press.
Engle, R. W. (2002). Working memory capacity as executive attention. Current Directions in Psychological Science, 11(1).
Norman, D. A. (2013). The Design of Everyday Things. Revised edition. Basic Books.
Flavell, J. H. (1979). Metacognition and cognitive monitoring. American Psychologist, 34(10).
The graduate research papers behind this part (2013 to 2015), applying vision science, working memory, mental models, and metacognition to shipping products.
