Threat-model your AI feature · The Builder's Stack


The launch review for your support assistant has been circling one question for an hour. Someone asks whether the feature is safe to ship, and the answers pull in every direction: an engineer says the system prompt forbids anything dangerous, a designer remembers a headline about a chatbot tricked into leaking data, and nobody can say whether that could happen here, because nobody can name what there is to leak. The meeting ends with "investigate safety" as its action item, which guarantees the same meeting in two weeks. The argument circles because "safe" has no object yet: safe from whom, protecting what, at which door.

The chapter "Why attackers love AI products" covered why your feature draws this attention. This chapter turns the circular argument into a page you can defend: a threat model. Security teams have run this exercise for decades, and the PM-sized version needs no security degree. It is three questions answered on one page.

What do you hold that someone wants?
Where does outside input get in?
Where does trust change hands?

Name what you hold that someone wants

Start with assets, anything that costs you real money or real trust when it is stolen, corrupted, or abused. An AI feature usually holds five.

User data. Conversations, uploaded files, and stored history are valuable to your users, which makes them valuable to someone else.
Keys and credentials. Model API keys, vendor tokens, and the permissions your feature acts with are worth stealing because they spend as you.
The corpus. The knowledge base your feature answers from cuts both ways: whoever reads it has your private material, and whoever writes to it steers your answers.
The bill. Every model call costs money, so an attacker who can spend your tokens or your compute has found an open wallet.
Your reputation. A screenshot of your product misbehaving travels further than anything your marketing publishes that quarter.

Assets rarely fail alone: the same unguarded door tends to expose several at once, chat history beside credentials beside the bill, and every incident in this chapter follows that pattern.

List the doors where outside input gets in

The second question lists the entry points where text or data you did not write reaches your product.

The text box. Whatever the user types goes straight into the model's working context.
Files. Uploads, attachments, and shared documents carry whatever their author put in them, visible or not.
Retrieved pages. Anything your feature fetches from the web or pulls out of search results was written by a stranger.
Tool results. Whatever a tool call returns lands in the same context as everything else, and you do not control everything that feeds those systems.
The doors around the model. Admin panels, vendor dashboards, test accounts, and the tokens connecting them are not AI at all, which is exactly why AI teams forget to count them.

You cannot defend a product whose doors you have not listed, and most circular safety arguments mean the list was never made.

The ordinary doors bite hardest. In July 2025, researchers got into McHire, the chatbot hiring platform McDonald's uses, through a test admin account that accepted 123456 as both username and password. Behind that login sat chats and contact details tied to roughly 64 million applicants. The AI took the headlines; the way in had nothing to do with the model.

Mark where trust changes hands

The third question marks boundaries, the handoffs where one side grants more trust than the other earned. An AI feature has three big ones.

User to model. The user's words, plus anything riding along with them, become the context the model works from.
Model to tools. Model output stops being text and starts being action: a draft becomes a sent email, generated code runs, a record gets written.
Product to vendors. Your data crosses into systems you do not operate, and their security becomes part of yours.

Boundary failures tend to be quiet. Ray, an open-source framework that runs AI workloads for thousands of companies, is designed for trusted private networks and ships without authentication by default. In March 2024, researchers reported attackers had spent months inside clusters sitting open on the internet, mining cryptocurrency and collecting credentials. Nobody broke the software; the trusted network it was designed for was not there.

Write each threat as one sentence

With assets, doors, and boundaries on the page, a threat stops being a vague worry and becomes something you can write.

A threat is an asset, a door, and a path between them, written in one sentence.

For the support assistant from the opening scene, the lines could read like this.

A crafted email in the shared inbox carries instructions that walk customer history out through the assistant's next reply.
A poisoned page in the help-center corpus steers the assistant into quoting refund terms you do not offer, and the screenshots cost you a month of trust.
A leaked vendor token lets a stranger run your model account flat over a weekend, and you find out from the invoice.

A line this specific names its own defense and its own test. Five honest lines beat a day of workshop slides, and we write ours with the whole team in the room, because arguing over which five make the cut is the real security review.
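One way to keep the lines honest is to store each one as a small record with its three parts. The sketch below is illustrative only: the `ThreatLine` class, its `problems` check, and the asset and door labels attached to each sentence are assumptions made for this example, and the one-sentence test is a crude punctuation count, not a real parser.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreatLine:
    """One threat: an asset, a door, and the path between them."""

    asset: str  # what is at stake
    door: str   # where outside input gets in
    path: str   # one sentence connecting the door to the asset

    def problems(self) -> list[str]:
        """Return reasons this line is not yet a usable threat line."""
        issues = []
        for name in ("asset", "door", "path"):
            if not getattr(self, name).strip():
                issues.append(f"{name} is empty")
        # Crude heuristic: more than one sentence-ending mark suggests
        # two threats sharing a line, which should be split.
        if sum(self.path.count(mark) for mark in ".!?") > 1:
            issues.append("path looks like more than one sentence; split it")
        return issues


# The three support-assistant lines from the text. The asset and door
# labels are one reading of each sentence (an assumption of this sketch).
lines = [
    ThreatLine(
        asset="customer history",
        door="email in the shared inbox",
        path="A crafted email in the shared inbox carries instructions that "
             "walk customer history out through the assistant's next reply.",
    ),
    ThreatLine(
        asset="reputation",
        door="page in the help-center corpus",
        path="A poisoned page in the help-center corpus steers the assistant "
             "into quoting refund terms you do not offer, and the screenshots "
             "cost you a month of trust.",
    ),
    ThreatLine(
        asset="the bill",
        door="vendor token",
        path="A leaked vendor token lets a stranger run your model account "
             "flat over a weekend, and you find out from the invoice.",
    ),
]

for line in lines:
    print(f"{line.asset} <- {line.door}: {line.problems() or 'ok'}")
```

A spreadsheet with the same three columns does the same job; the point is that a line missing its asset or its door is not finished.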

The frameworks are free when you want depth

When you outgrow one page, the public frameworks are free.

NIST's AI Risk Management Framework, published in January 2023, gives you a shared vocabulary for mapping, measuring, and managing AI risk, which matters the day an enterprise customer asks how you govern yours.
MITRE ATLAS is a public catalog of attack techniques used against real AI systems. One of its case studies documents MathGPT, a math app that turned questions into Python code and ran it, until a crafted question in January 2023 read out the host's environment variables and its API key: the bill and the keys, both taken through the text box. When your threat lines feel far-fetched, ATLAS is the reality check.
OWASP's Top 10 for LLM applications has ranked prompt injection first since the list began, and the chapter "Injection: the input is the attack surface" gives that attack the full treatment it deserves.

Use them to check your lines, not to replace them.

Score each line by reversibility and reach

Your lines need an order, and the ranking reuses the two questions that "The autonomy ladder: place every action deliberately" asked of every action: how reversible is the damage, and how far does it travel before a human can react?

A stolen API key rotates in an hour; leaked customer records never come back.
A bad answer to one user costs an apology; a poisoned corpus misleads every user until someone notices.

Defend the irreversible, far-reaching lines first, and let the reversible, contained ones wait behind monitoring. The scored page becomes page one of the document this part builds toward in "Write your Security Posture and ship defended", and every chapter between now and then adds a section to it.
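The sort itself can be sketched in a few lines. Everything here is an assumption made for illustration: the `ScoredThreat` record, the yes/no scoring, the choice to break ties by putting irreversibility ahead of reach, and the scores given to the four examples (the text states only that a key rotates, records never come back, one bad answer stays with one user, and a poisoned corpus reaches every user).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredThreat:
    line: str
    irreversible: bool  # True if the damage cannot be undone
    far_reaching: bool  # True if it spreads widely before a human can react


def rank(threats: list[ScoredThreat]) -> list[ScoredThreat]:
    """Order threats so irreversible, far-reaching ones come first."""
    return sorted(
        threats,
        # Count how many of the two flags are set; on a tie, put the
        # irreversible threat first (an arbitrary choice for this sketch).
        key=lambda t: (t.irreversible + t.far_reaching, t.irreversible),
        reverse=True,
    )


# Illustrative scores only; your own page will score these differently.
scored = [
    ScoredThreat("stolen API key", irreversible=False, far_reaching=False),
    ScoredThreat("leaked customer records", irreversible=True, far_reaching=True),
    ScoredThreat("bad answer to one user", irreversible=False, far_reaching=False),
    ScoredThreat("poisoned corpus", irreversible=False, far_reaching=True),
]

ranked = rank(scored)
for position, threat in enumerate(ranked, start=1):
    print(position, threat.line)
```

Two yes/no questions are coarse on purpose: the goal is an order the team can argue about in the room, not a precise risk number.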

Try it now

The drill takes about fifteen minutes and runs on your own AI feature, real or planned.

Write your asset list. List what your feature holds that someone could want: user data, keys, the corpus, the bill, your reputation, plus anything specific to your product, then underline the one whose loss you could not undo.

Write your door list. Name every place outside input gets in: the text box, files, retrieved pages, tool results, and the vendor tokens and admin accounts around the model. A good list includes at least one door you had not considered before.

Write your top five threat lines. Give each line one asset, one door, and one path, in a single sentence. If a line will not fit in one sentence, it is two threats, so split it.

Sort the lines by reversibility and reach. The line that is both irreversible and far-reaching moves to the top; that is the threat your next sprint should answer.

Keep the page and date it. Paste your assets and doors into Claude Code and ask it to propose the five threat lines you missed, then keep any that sting. This page seeds the Security Posture you will write at the end of this part.

Chapter Summary

"Is it safe" cannot be answered until you name what you are protecting, so the meeting that skips that step will be held again.
A PM-sized threat model is three questions on one page: what you hold that someone wants, where outside input gets in, and where trust changes hands.
Your assets are user data, keys and credentials, the corpus, the bill, and your reputation, and they often fail together.
The doors include the text box, files, retrieved pages, tool results, and the ordinary infrastructure around the model, and you cannot defend a door you never listed.
Trust changes hands at three handoffs, user to model, model to tools, and product to vendors, and the quiet failures happen where a boundary was assumed instead of built.
A threat is one asset, one door, and one path between them, written in one sentence, and five honest lines do more than a workshop.
NIST's AI RMF, MITRE ATLAS, and OWASP's LLM list are free checks on whether your lines match how AI products actually get attacked.
Score each line by how reversible the damage is and how far it reaches, and defend the worst ones first.
Reread the page whenever the product gains a tool, a vendor, or an input type, because a threat model that stops changing stops protecting.
Your threat map becomes page one of your Security Posture, and next up is "Identity: whose keys your AI holds", which follows the asset attackers reach for first.

Sources

NIST, Artificial Intelligence Risk Management Framework (AI RMF 1.0), January 2023.
MITRE ATLAS, the public knowledge base of adversary techniques and case studies against AI systems, including the case study "Achieving Code Execution in MathGPT via Prompt Injection" (incident, January 2023).
OWASP Top 10 for Large Language Model Applications (2023; updated 2025).
Security researchers' disclosure and TechCrunch reporting on the McHire hiring platform's test admin account, July 2025.
Oligo Security, research on the ShadowRay campaign against publicly exposed Ray clusters, March 2024.
