Your support bot is live, and a customer asks the simplest possible question: how long do I have to return something? The bot answers, confidently, "you have fourteen days." Your actual policy is thirty. The model did not lie and it is not broken. It was never told your policy, so it did what a model does with a question it has no facts for: it produced the most plausible-sounding answer, which happened to be wrong.
A model has only its training data plus whatever you put in the prompt, so to get answers about your business, your users, or anything recent, you have to put those facts in front of it.
The model was trained on the world, not your business
A model is trained on a large snapshot of public text, frozen at a cutoff date. That gives it broad competence and a real blind spot: it has no access to your prices, your policies, your documents, your user's account, or anything that happened after its training ended. Ask it about any of those and it answers in the same fluent voice it uses for things it was trained on, which is exactly what makes the wrong answer dangerous. The fix is almost never a smarter model. It is giving the model the facts.
Two ways to hand over the facts
There are two, and you choose by size.
- Stuff them in. When the facts are small and stable (your refund policy, a price list, a short FAQ), just include them in the prompt on every call. It is the simplest thing that works, and for a surprising number of features it is all you need.
- Fetch the relevant few. When the facts are too large to send every time (hundreds of help articles, or a whole product catalog), you store them, and for each question you pull only the handful that match and put those in the prompt. That is retrieval, also called RAG, short for retrieval-augmented generation, and the diagram above is the whole pattern of it: store your documents, find the ones that fit the question, add them to the prompt, and let the model answer from them.
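Stuffing can be as small as the sketch below. It assumes a made-up policy text and a hypothetical helper named `build_prompt`; the string it builds is what you would send to whichever model API you use, and no real provider call is shown.

```python
# Made-up policy text for illustration; a real feature would load your own wording.
REFUND_POLICY = "Customers may return any item within 30 days of delivery for a full refund."


def build_prompt(question: str, facts: str) -> str:
    """Put the facts in front of the model on every call ("stuff them in")."""
    return (
        "Answer the customer's question using only the facts below. "
        "If the facts do not cover it, say you do not know.\n\n"
        f"Facts:\n{facts}\n\n"
        f"Question: {question}"
    )


question = "How long do I have to return something?"
prompt = build_prompt(question, REFUND_POLICY)
print(prompt)
```

The policy rides along with every question, so the model no longer has to guess at the number of days.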
The machinery that makes "find the ones that fit" fast and accurate, embeddings, vector stores, chunking, reranking, is real and worth learning, and it lives in The Frontier. At this level you need the pattern, not the plumbing, because the pattern is what decides whether your feature can answer from your facts at all.
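Stripped of the plumbing, the pattern fits in a few lines. The sketch below is a toy: it assumes three made-up help articles and uses a crude word-overlap score as a stand-in for the embeddings and vector stores a real system would use. `DOCUMENTS`, `retrieve`, and `prompt_with_documents` are hypothetical names chosen for illustration.

```python
import re

# Step 1: store your documents (made-up help articles for illustration).
DOCUMENTS = {
    "returns": "You can return any item within 30 days of delivery for a full refund.",
    "shipping": "Standard shipping takes 3 to 5 business days within the country.",
    "warranty": "Hardware is covered by a one year warranty against defects.",
}


def words(text: str) -> set[str]:
    """Lowercase the text and split it into a set of words and numbers."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: dict[str, str], k: int = 2) -> list[tuple[str, str]]:
    """Step 2: find up to k documents sharing the most words with the question."""
    q = words(question)
    scored = [(len(q & words(text)), doc_id) for doc_id, text in documents.items()]
    scored = [(score, doc_id) for score, doc_id in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first, ties by id
    return [(doc_id, documents[doc_id]) for _, doc_id in scored[:k]]


def prompt_with_documents(question: str, hits: list[tuple[str, str]]) -> str:
    """Step 3: add the retrieved documents to the prompt, labeled by id."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in hits)
    return f"Answer using only these documents.\n\n{context}\n\nQuestion: {question}"


question = "How many days do I have to return an item?"
hits = retrieve(question, DOCUMENTS, k=1)
prompt = prompt_with_documents(question, hits)
print(prompt)
```

Swapping the word-overlap score for something better changes how good the matches are, not the shape of the flow.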
When you actually need retrieval
You do not always. Builders reach for a vector database the way they reach for microservices, because it sounds serious, and then maintain a search system to answer questions a paragraph of text would have covered. The honest rule is to stuff first and add retrieval only when the knowledge is genuinely too big to send every time, changes too often to paste by hand, or differs per user. A refund policy is three sentences; put it in the prompt and move on.
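The rule above can be written down as a small check. This is a sketch under assumptions: the `KnowledgeSource` fields and the `MAX_WORDS_TO_STUFF` threshold are invented for illustration, and the right size limit depends on your model and your budget.

```python
from dataclasses import dataclass

# Assumed cutoff for "too big to send every time"; pick your own.
MAX_WORDS_TO_STUFF = 2000


@dataclass
class KnowledgeSource:
    approx_words: int       # rough size of the text you would paste
    changes_often: bool     # too fresh to keep pasting by hand
    differs_per_user: bool  # for example, account data


def choose_approach(source: KnowledgeSource) -> str:
    """Stuff first; reach for retrieval only when one of the three conditions holds."""
    too_big = source.approx_words > MAX_WORDS_TO_STUFF
    if too_big or source.changes_often or source.differs_per_user:
        return "retrieve"
    return "stuff"


refund_policy = KnowledgeSource(approx_words=40, changes_often=False, differs_per_user=False)
help_center = KnowledgeSource(approx_words=500_000, changes_often=True, differs_per_user=False)

print(choose_approach(refund_policy))
print(choose_approach(help_center))
```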
Context makes a right answer likely, not certain
Handing the model the right document sharply raises the odds of a right answer, but it does not guarantee one. The model can still lean on the wrong line, and your retrieval step can pull the wrong document, so the same discipline from the last chapter applies: check the values, and show your work. When you can, let the feature display which sources it used, so a wrong answer comes with a visible trail instead of arriving as a confident sentence from nowhere.
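One way to check the values and show your work is sketched below. It assumes the model's answer arrives as plain text alongside the ids of the documents that were put in the prompt; `Answer`, `unsupported_numbers`, and `render` are hypothetical names. The check is deliberately narrow: it only catches digits, so "thirty" spelled out would slip past it.

```python
import re
from dataclasses import dataclass


@dataclass
class Answer:
    text: str
    sources: list[str]  # ids of the documents that were put in the prompt


def numbers_in(text: str) -> set[str]:
    """Collect every integer or decimal written with digits."""
    return set(re.findall(r"\d+(?:\.\d+)?", text))


def unsupported_numbers(answer_text: str, source_texts: list[str]) -> set[str]:
    """Return numbers the answer states that appear in none of the sources."""
    supported: set[str] = set()
    for text in source_texts:
        supported |= numbers_in(text)
    return numbers_in(answer_text) - supported


def render(answer: Answer) -> str:
    """Show the answer together with the trail of sources behind it."""
    return f"{answer.text}\nSources: {', '.join(answer.sources)}"


# Made-up source text and model outputs for illustration.
source_texts = ["Items may be returned within 30 days of delivery."]
wrong = Answer("You have 14 days to return an item.", ["returns-policy"])
right = Answer("You have 30 days to return an item.", ["returns-policy"])

print(unsupported_numbers(wrong.text, source_texts))
print(render(right))
```

A non-empty result from `unsupported_numbers` is a signal to hold the answer back or flag it, not proof that the answer is wrong.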
Try it now
No setup: Find a question your model gets wrong about your own domain, one your own facts could answer. Paste the relevant policy or document straight into the prompt and ask again. Watch the answer go from confident-and-wrong to correct. You just did retrieval by hand, and you proved the fix is facts, not a better model.
With your tools: In your feature, ask Claude Code to add the few facts the model needs directly to the prompt if they are small, or a simple retrieval step that pulls matching documents if they are large, and to show which sources each answer used. If your tools are not set up yet, The Setup Clinic gets you there in one sitting. In Codex or Cursor the move is the same: add the facts to the prompt, or a small retrieval step, and show the sources.
Chapter Summary
- A model has only its training data plus what you put in the prompt, so it cannot answer about your business, your users, or anything after its cutoff unless you tell it.
- Asked about facts it was never given, a model produces a confident, plausible guess, which is what makes the wrong answer dangerous.
- The fix is giving it the facts, not finding a smarter model.
- For small, stable facts, stuff them into the prompt on every call; it is simple and often enough.
- For facts too large to send every time, use retrieval: store your documents, fetch the few that match each question, and add those to the prompt.
- The deeper retrieval machinery, embeddings and vector stores, is a Frontier topic; here you need the pattern, not the plumbing.
- Reach for retrieval only when the knowledge is too big, too fresh, or too per-user to paste by hand; do not build a search system for a paragraph.
- Right context makes a right answer likely, not certain, so check the values and show which sources an answer used.
- More context is not better: extra material costs tokens and can bury the fact that matters, so give the model the few right pieces.
- Next up, The customization ladder places stuffing, retrieval, and the heavier options in the order you should reach for them.
Sources
- AWS and Pinecone explainers on retrieval-augmented generation, 2026.
- OpenAI, Anthropic, and Google documentation on context windows and providing context, 2026.