A new user opens your AI product for the first time and finds an empty input box. She types the kind of question she would put into a search bar, the closest thing she has used before. The answer comes back fine. She files the product under "chatbot," closes the tab, and moves on.
She never learns that the same product could have drafted the report she then wrote by hand, checked it against her team's style guide, and turned it into slides. The capability was there the whole session, and nothing on the screen said so.
She did not skip the manual, because there was no manual. The product asked her to guess, she guessed from the tools she already knew, and the guess undersold it by an order of magnitude. This chapter turns the fourth recommendation of our essay "The Human Factors" into practice: show people what the system can do.
Why people guess wrong about a new AI product
Nobody works out a new tool from first principles. We reach into long-term memory for a stored pattern that looks similar and act on it. A classic illustration is the frog that starves among dead flies but snaps at a buzzing one, because the moving fly matches a stored pattern for food and the still flies match nothing. People run on the same machinery with far larger libraries. A mental model, the internal sketch of how a thing works that experience builds, is why an expert operates a familiar tool without conscious thought.
With AI, the stored pattern misleads in a predictable way, because your users now carry two competing mental models for software. The older one, built over decades of literal software, expects the same input to produce the same output and every capability to be listed in a menu. The newer one, built in the last few years, expects to state an intent in plain language and get the result back. Which of the two a user reaches for is decided by what your interface looks like, just as the moving fly rather than the still ones set off the frog.
A blank input box shows nothing about what is possible, so the user simply acts on whichever stored pattern comes to mind first. Studies of non-experts writing prompts show how that goes wrong: people instruct the system the way they would instruct another person, and they judge the whole product from a single success or failure. Researchers studying LLM products call the result a gulf of envisioning, where users do not know what to ask for, how to phrase it, or what to expect back.
A second failure costs you your best users. One of the graduate research papers behind this part reviewed a high-end camera aimed at photographers upgrading from the maker's entry-level line, and its findings carry straight into AI products.
- Reversing a stored reflex breaks trust. The camera reversed the rotation direction of its main control dials, so the stored sense of which way to turn for a higher value was suddenly wrong on the controls those photographers used most. They had to relearn a reflex that had been automatic for years, and the paper concluded the design had cost the product their trust, not just their speed.
- Experts carry a never-do list. The same review drew on negative knowledge, an expert's hard-won sense of what is wrong and what must be avoided. The camera dropped a reading its users considered essential, and without it the rest of the display lost its value.
An AI product hits the same wall when it quietly does something an expert considers off-limits, like sending a message in the user's name or editing work it was not asked to touch. The expert reacts the way those photographers did and stops trusting every other control too.
Put the capabilities on the screen
Treat capability disclosure as interface design, not documentation.
For a new user, whatever the first screen shows is the entire product, so the message about what it can do belongs on the screen itself, not in a help center or a launch post.
Design research has a name for what a person can tell is possible just by looking: the perceived affordance. A capability the screen gives no sign of does not exist for the person sitting in front of it.
Show users where the product stops, not only what it can do. Research on human-AI teams found that people perform better with a system whose errors they can predict than with a slightly more accurate one they cannot read, so where your product fails is part of what you disclose. We go deeper in "Help people catch wrong answers."
Real products that show what they can do
NotebookLM teaches its own boundary before you can misunderstand it. Google's research tool answers only from the documents you upload, not the open web. The empty state will not let you chat until you add sources, and every answer carries citations back into your own files, so every use reinforces that the tool reads your documents and nothing else.
Duolingo Max names each capability and puts it where it is needed. Instead of one open chat, the app scopes its AI into labeled features. "Explain My Answer" appears as a button after specific exercise types and opens a bounded chat about why you got the exercise wrong, and "Roleplay" offers practice conversations in set scenarios. Users always know what the AI can do and where to find it.
Apple shows the cost of promising a capability the product cannot match. Its iPhone 16 marketing, starting in September 2024, advertised a personal, context-aware Siri, giving buyers a picture of the product that the shipped version never delivered. The features were delayed indefinitely in March 2025, a false-advertising class action followed, and in May 2026 Apple agreed to a $250 million settlement while denying wrongdoing. The gap between what was advertised and what shipped is now something a company can be made to pay for.
How to build capability disclosure into your product
- Rebuild the empty state as a capability map. Replace the bare box with seeded examples that span the range: one obvious task, one intermediate, one nobody would guess. Rotate them so returning users keep learning.
- Mark the door to the full list. If a command menu, palette, or help screen lists everything the product can do, say so on the first screen. A capability list nobody can find might as well not exist.
- Disclose the boundary with the same care as the highlights. Write down the failure modes your users hit most, starting from the failure modes catalog in the Playbook, and place each warning in the product where it applies.
- Collect the never-do list before you ship. Ask your most expert users what they would never let a tool do without asking, such as sending email in their name, overwriting unsaved work, or touching production data. Enforce those lines as guardrails, not as instructions the model may or may not follow.
- Pick the mental model your interface invokes, then keep its promise. If it looks like open conversation, it has to handle plain intent. If it is a literal tool, stop dressing it up as chat.
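The never-do item above separates a guardrail from an instruction, and a short sketch may make the difference concrete. It assumes the product routes every action the model proposes through ordinary code before anything runs. The action names, the `ProposedAction` type, and the `authorize` function are hypothetical and chosen only for illustration; your own list comes from the expert interviews.

```python
from dataclasses import dataclass

# Hypothetical action names standing in for a real never-do list.
NEVER_WITHOUT_ASKING = frozenset(
    {"send_email", "overwrite_file", "write_production_data"}
)


@dataclass(frozen=True)
class ProposedAction:
    """An action the model wants the product to take for the user."""
    name: str
    summary: str


def authorize(action: ProposedAction, user_confirmed: bool = False) -> str:
    """Decide in code, outside the model, whether an action may run.

    Returns "run" or "ask_user". The check looks only at the action name
    and the user's explicit confirmation, so nothing written in a prompt
    or produced by the model can switch it off.
    """
    if action.name in NEVER_WITHOUT_ASKING and not user_confirmed:
        return "ask_user"
    return "run"


if __name__ == "__main__":
    draft = ProposedAction("send_email", "Send the status report to the team")
    print(authorize(draft))                       # ask_user
    print(authorize(draft, user_confirmed=True))  # run
    print(authorize(ProposedAction("summarize", "Summarize the open document")))  # run
```

The point of the design is where the check lives. A line in the system prompt asks the model to behave; a check like this one sits between the model and the action, so it holds even when the model ignores the request.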
A 15-minute drill: audit your first screen
Open your product as a brand-new user, with a fresh account and a cleared cache. Without clicking or scrolling, write down every capability the first screen communicates; only what is visible counts. Then write down the five capabilities you most want users to discover, and compare the lists.
The overlap is usually close to zero, and the gap is your finding. Every capability on the wish list that is missing from the screen is a backlog item, and most cost a sentence, an example prompt, or a labeled menu entry. The result feeds one row of the audit workbook in the artifacts library.
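If you want to keep the drill's result in a form you can rerun after each release, the comparison can be sketched in a few lines. This is a minimal sketch that assumes both lists are short text labels you typed by hand. The function name `audit_first_screen` and the sample capabilities are hypothetical, and labels are matched exactly after trimming and lowercasing, so you still have to judge whether two differently worded labels mean the same capability.

```python
def audit_first_screen(visible, wanted):
    """Split the wish list into capabilities the first screen shows and misses.

    visible: labels for every capability the first screen communicates.
    wanted:  labels for the capabilities you most want users to discover.
    Returns (found, missing), both in wish-list order.
    """
    shown = {label.strip().lower() for label in visible}
    found, missing = [], []
    for capability in wanted:
        if capability.strip().lower() in shown:
            found.append(capability)
        else:
            missing.append(capability)
    return found, missing


if __name__ == "__main__":
    # Hypothetical audit of a product whose first screen is a bare input box.
    visible = ["Ask a question"]
    wanted = [
        "Ask a question",
        "Draft a report",
        "Check a draft against the style guide",
        "Turn a report into slides",
        "Summarize a document",
    ]
    found, missing = audit_first_screen(visible, wanted)
    print(f"Visible: {len(found)} of {len(wanted)}")
    for capability in missing:
        print(f"Backlog item: {capability}")
```

Each entry in `missing` corresponds to one backlog item from the drill.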
Chapter Summary
- People understand a new product by reaching for a stored pattern from an old one, so they guess at your AI product instead of learning it from scratch.
- Those stored patterns fail in two ways: they either undersell what the product can do or lead users to expect things it cannot deliver.
- For a new user, the first screen is the whole product, so what it can do has to show on the screen, not in a help center or a launch post.
- A capability the screen never shows is one most users will never find.
- Disclose where the product fails alongside what it does well, because users work better with a system whose errors they can predict.
- Crossing an expert's hard line without asking, like acting in their name or editing work it was not told to touch, costs you their trust in every other control.
- Make the empty state show the range of tasks, mark the door to the full list, and enforce the never-do list as guardrails the model cannot ignore.
- To score your product against this lens and the others, continue to "Run the human factors audit."
Sources
- Arbib, M. A. (1992). Schema Theory. In Encyclopedia of Artificial Intelligence (2nd ed.). Wiley.
- Bansal, G., Nushi, B., Kamar, E., Lasecki, W. S., Weld, D. S., & Horvitz, E. (2019). Beyond Accuracy: The Role of Mental Models in Human-AI Team Performance. HCOMP 2019.
- Blackler, A., & Hurtienne, J. (2007). Towards a unified view of intuitive interaction. MMI-Interaktiv, 13.
- Gartmeier, M., Bauer, J., Gruber, H., & Heid, H. (2008). Negative Knowledge: Understanding Professional Learning and Expertise. Vocations and Learning, 1(2).
- Gibson, J. J. (1977). The Theory of Affordances. In Perceiving, Acting, and Knowing. Lawrence Erlbaum.
- Norman, D. (2004). Affordances and Design. jnd.org.
- Subramonyam, H., Pea, R., Pondoc, C., Agrawala, M., & Seifert, C. (2024). Bridging the Gulf of Envisioning: Cognitive Challenges in Prompt-Based Interactions with LLMs. CHI 2024.
- Zamfirescu-Pereira, J. D., Wong, R. Y., Hartmann, B., & Yang, Q. (2023). Why Johnny Can't Prompt: How Non-AI Experts Try (and Fail) to Design LLM Prompts. CHI 2023.
- One of the graduate research papers behind this part: a 2013 human factors review of a prosumer camera's controls against the prior knowledge and expertise of its upgrading users.
- Google NotebookLM Help: source-grounded answers and citations.
- Duolingo Max announcement (2023) and OpenAI's Duolingo case study.
- Public reporting on the Apple Intelligence Siri feature delay (2025) and the related false-advertising settlement (2026).