Building the right context · The Builder's Stack


In June 2023, a federal judge sanctioned two lawyers $5,000 for filing a brief that cited six cases that did not exist. The brief had been drafted with ChatGPT. If you build in a regulated industry, Mata v. Avianca is probably already in your compliance team's training deck, usually under a slide about hallucination.

That reading is comfortable and incomplete. Models produce fabrications as a matter of course, and no amount of scolding changes the failure rate. The deeper failure sat upstream of the fake cases: an answer arrived with no source attached, in a profession where every claim must trace to authority, and nothing in the workflow forced the trace before the document went out the door.

This chapter is about making the trace mandatory. In a high-stakes product, you do not let the model answer from whatever its training data happened to contain. You assemble the context it answers from, and you assemble it under rules.

In a regulated product, retrieval is a control

Retrieval is the step that fetches documents and places them in front of the model before it answers. In a consumer product, it is a quality lever: fetch better passages, get better answers. In a regulated product, it belongs to the same family as access control and change control.

Retrieval decides whether the model answers for itself or reads from documents your firm already stands behind.

An answer with no retrieved source is the model's own output, and nobody in your company has ever reviewed it. A grounded answer is, at its best, a restatement of material that already cleared review, with a trail leading back to it.

Regulators and auditors evaluate controls by asking how they were designed, how they are tested, and what evidence they leave behind. Hold your retrieval pipeline to that standard and the rest of this chapter follows from it.

Answer only from an approved corpus

The most important move is to limit answers to documents your firm has approved. The pattern has three parts you can copy directly.

Define the approved corpus. Answers come only from a body of content the firm has signed off: policies, filings, product documentation, published research. What qualifies is a decision your legal and compliance teams own, and they are the authority here; your job is to make the product incapable of answering from anywhere else.
Refuse when the corpus is silent. When retrieval returns nothing relevant, the product says it cannot answer and routes the question to a person, because falling back to training data reopens the exact hole you just closed.
Say so in the product. Tell users where answers come from. In this market the constraint is a selling point, not a limitation.
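
The first two parts can be sketched as a gate that sits between retrieval and the model. This is a minimal illustration, not a reference implementation: the `Passage` and `Decision` types, the `decide` function, the `MIN_SCORE` cutoff, and the sample documents are all hypothetical, and a real relevance threshold would have to be tuned and tested against your own corpus.

```python
from dataclasses import dataclass

# Hypothetical relevance cutoff; a real value would be tuned and tested.
MIN_SCORE = 0.5


@dataclass(frozen=True)
class Passage:
    doc_id: str
    text: str
    score: float  # relevance score from the retriever, higher is better


@dataclass(frozen=True)
class Decision:
    action: str      # "answer" or "refuse"
    passages: tuple  # passages the model may read; empty on refusal
    message: str     # shown to the user on refusal


def decide(passages, approved_ids, min_score=MIN_SCORE):
    """Keep only relevant passages from approved documents; refuse if none remain."""
    usable = tuple(
        p for p in passages
        if p.doc_id in approved_ids and p.score >= min_score
    )
    if not usable:
        # No fallback to the model's training data: refuse and hand off.
        return Decision(
            "refuse", (),
            "I can't answer that from approved sources. Routing to a person.",
        )
    return Decision("answer", usable, "")


# Made-up sample data for illustration only.
approved = {"fee-schedule-2024", "privacy-policy"}
hits = [
    Passage("fee-schedule-2024", "The outgoing wire fee is $25.", 0.82),
    Passage("marketing-draft", "Wires are basically free!", 0.91),  # not approved
]

print(decide(hits, approved).action)      # answer
print(decide(hits[1:], approved).action)  # refuse
```

The unapproved passage scores highest and is still dropped, which is the point: approval is a hard filter, and relevance only ranks what survives it.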

Morgan Stanley shipped this pattern in September 2023: an assistant built on GPT-4 that answers financial advisors' questions from roughly 100,000 pieces of the firm's own research, with citations into that corpus. It went to advisors rather than customers, so a licensed professional sat between the model and any client, and that pairing of an internal corpus with an internal audience is a sound default for a first regulated deployment.

Cite sources so a reviewer can check the answer

Every answer shows which documents produced it, precisely enough that a reviewer, an auditor, or the user can pull the source and check the claim against it. Citations are what let review happen at all, which is not the same as letting reviewers skip it. A 2024 Stanford study of leading AI legal research tools found false or misgrounded answers in roughly one in six queries, and worse for some tools, even in products marketed as hallucination-free. Misgrounded means the citation is real but the document does not actually back the claim, which is the failure a glance at a citation list never catches.
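
One cheap automated check, sketched below, is to require every citation to carry a quoted span and confirm that the span really appears in the cited document. The citation shape (`doc_id` plus `quote`), the `check_citations` helper, and the sample corpus are assumptions made for illustration. The check catches citations to missing documents and quotes that were never in the source. It does not prove the quote supports the claim, so it narrows the reviewer's job rather than replacing it.

```python
def normalize(text):
    """Collapse whitespace and case so formatting differences don't matter."""
    return " ".join(text.lower().split())


def check_citations(citations, corpus):
    """Return (citation, problem) pairs; an empty list means every quote was found."""
    problems = []
    for c in citations:
        source = corpus.get(c["doc_id"])
        if source is None:
            problems.append((c, "unknown document"))
        elif normalize(c["quote"]) not in normalize(source):
            problems.append((c, "quote not found in document"))
    return problems


# Made-up sample data for illustration only.
corpus = {
    "fee-schedule-v7": "Outgoing domestic wires cost $25 per transfer. "
                       "Incoming wires are free.",
}
citations = [
    {"doc_id": "fee-schedule-v7", "quote": "outgoing domestic wires cost $25"},
    {"doc_id": "fee-schedule-v7", "quote": "International wires cost $10"},
    {"doc_id": "fee-schedule-v9", "quote": "Incoming wires are free."},
]

for citation, problem in check_citations(citations, corpus):
    print(citation["doc_id"], "->", problem)
```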

Professional deployments build that humility into the rollout. When Allen & Overy rolled out the legal assistant Harvey firm-wide in 2023, output was treated as a first draft under lawyer review, with the citation pointing the reviewer to the source.

Serve only the current version

Serving stale documents is a compliance problem, not just a quality one. Documents in the corpus carry versions and effective dates, and the index, the searchable copy your retrieval queries, serves only the current one. When a policy is superseded or a prospectus replaced, the old chunks (the stored passages retrieval returns) leave the index in the same change that publishes the new ones, never in a quarterly cleanup. A product that quotes last year's prospectus or a withdrawn policy has handed a customer a superseded document, which is a compliance incident rather than a ranking bug.

Effective dates also matter when a question is about a specific point in time, since a question about the rules as they stood in March should never be answered from today's documents without saying so.
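
A minimal sketch of version selection by effective date follows. The `DocVersion` record and the `in_force` function are hypothetical names, the dates are invented, and the sketch assumes each document's effective ranges never overlap, with the end date treated as exclusive.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DocVersion:
    doc_id: str
    version: int
    effective_from: date
    effective_to: Optional[date]  # None means still in force; end date is exclusive


def in_force(versions, doc_id, as_of):
    """Return the version of doc_id that was in force on as_of, or None."""
    for v in versions:
        if v.doc_id != doc_id:
            continue
        started = v.effective_from <= as_of
        not_ended = v.effective_to is None or as_of < v.effective_to
        if started and not_ended:
            return v
    return None


# Made-up sample data for illustration only.
versions = [
    DocVersion("prospectus", 1, date(2023, 1, 1), date(2024, 1, 1)),
    DocVersion("prospectus", 2, date(2024, 1, 1), None),
]

today_view = in_force(versions, "prospectus", date(2024, 6, 1))
march_view = in_force(versions, "prospectus", date(2023, 3, 15))
print(today_view.version)  # 2
print(march_view.version)  # 1
```

Ordinary questions would pass today's date. A question about the rules as they stood on a past date would pass that date instead, and the answer should say which version it came from.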

Apply the asker's permissions on every query

Entitlements, the record of who may open which documents, are enforced at your document store and almost nowhere else by default. The part teams miss is that the user's permissions have to follow the question into the index. Retrieval runs as the user, filters candidate passages down to documents that user could open, and only then does anything reach the model. If the user could not open a document, the model must not read it to them, however relevant the passage scores. In multi-tenant products, where one deployment serves many customer organizations, the rule gets stricter still: retrieval that crosses a tenant line is a breach, with everything that word obligates you to do.
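
The ordering matters as much as the filter, and a short sketch makes it concrete. The `User` and `Chunk` shapes and the `authorized_chunks` function are illustrative assumptions, not any particular vector store's API. The filter runs before the top-k cut, so a restricted passage can neither reach the model nor crowd out an allowed one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    tenant_id: str
    doc_id: str
    text: str
    score: float  # relevance score, higher is better


@dataclass(frozen=True)
class User:
    tenant_id: str
    readable_docs: frozenset  # doc ids this user could open directly


def authorized_chunks(user, candidates, top_k=3):
    """Drop anything the user could not open, then rank and truncate."""
    allowed = [
        c for c in candidates
        if c.tenant_id == user.tenant_id and c.doc_id in user.readable_docs
    ]
    return sorted(allowed, key=lambda c: c.score, reverse=True)[:top_k]


# Made-up sample data for illustration only.
junior = User("acme", frozenset({"handbook"}))
candidates = [
    Chunk("acme", "board-minutes", "Confidential restructuring plan.", 0.97),
    Chunk("globex", "handbook", "Another tenant's handbook.", 0.95),
    Chunk("acme", "handbook", "Expense reports are due monthly.", 0.71),
]

for chunk in authorized_chunks(junior, candidates):
    print(chunk.doc_id, chunk.text)
```

Only the third chunk survives here, even though it scores lowest. In production the same filter would ideally be pushed into the index query itself so unauthorized chunks are never fetched at all.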

Log what the model saw

This habit pays off last, but it costs almost nothing to add on the day you build the pipeline: for every answer, log which chunks the model received and which document versions they came from. When someone asks months later why the product said what it said, that log is the difference between replaying the exact context and having nothing to show. Later in this part, the evidence work turns the log into proof you can hand an auditor. Assembling the right documents, versions, and constraints for each request is the same discipline as context engineering, practiced under rules, and the log is how you check your own work.
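
What such a log entry might hold can be sketched in a few lines. The field names, the `context_record` function, and the sample values are assumptions for illustration. The sketch stores a hash of each chunk's text alongside its document version, so a later replay can confirm the stored chunk is byte-for-byte what the model received.

```python
import hashlib
import json
from datetime import datetime, timezone


def context_record(request_id, user_id, question, chunks, now=None):
    """Build one JSON-serializable log entry describing what the model saw."""
    now = now or datetime.now(timezone.utc)
    return {
        "request_id": request_id,
        "user_id": user_id,
        "logged_at": now.isoformat(),
        "question": question,
        "chunks": [
            {
                "doc_id": c["doc_id"],
                "version": c["version"],
                "chunk_id": c["chunk_id"],
                # Hash of the exact text placed in front of the model.
                "sha256": hashlib.sha256(c["text"].encode("utf-8")).hexdigest(),
            }
            for c in chunks
        ],
    }


# Made-up sample data for illustration only.
record = context_record(
    request_id="req-001",
    user_id="advisor-17",
    question="What is the outgoing wire fee?",
    chunks=[
        {"doc_id": "fee-schedule", "version": 3, "chunk_id": "fee-schedule#12",
         "text": "Outgoing domestic wires cost $25 per transfer."},
    ],
    now=datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc),
)

# One line per answer, appended to whatever log store you already use.
print(json.dumps(record))
```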

Try it now

The grounding audit takes about fifteen minutes and runs against your own product and the documents you already trust.

Pick ten questions your approved documents answer. Choose questions whose correct answers live in the corpus: a fee, a coverage limit, an eligibility rule. Note which document you expect each answer to come from.

Ask them from a real account. Use the least-privileged account you can get, the trial tenant or the junior role, never your admin login.

Record the evidence for each answer. Note whether the answer cited a source, whether that source was the current version, and whether anything returned, in the answer or its citations, came from a document that account could not open.

Tally and route. A missing citation is a grounding gap, a stale version is a freshness gap, and anything across an entitlement line goes to your security team today rather than into the backlog.

Scale it down: three questions against one document you know cold.
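
If you would rather keep the tally in a script than a spreadsheet, the recording and routing steps can be sketched as below. The `AuditRow` fields mirror the three things the audit asks you to note, and the names and sample rows are hypothetical. The sketch assumes freshness can only be judged when a source was cited, so an uncited answer counts as a grounding gap alone.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditRow:
    question: str
    cited: bool                # did the answer cite a source?
    current_version: bool      # was the cited source the current version?
    crossed_entitlement: bool  # did anything come from a document the account could not open?


def classify(row):
    """Map one audit row to the gaps it reveals."""
    gaps = []
    if not row.cited:
        gaps.append("grounding")
    elif not row.current_version:
        gaps.append("freshness")
    if row.crossed_entitlement:
        gaps.append("entitlement")
    return gaps


def tally(rows):
    """Count gaps of each kind across all audited questions."""
    return Counter(gap for row in rows for gap in classify(row))


# Made-up sample results for illustration only.
rows = [
    AuditRow("What is the wire fee?", True, True, False),
    AuditRow("What is the coverage limit?", False, False, False),
    AuditRow("Who is eligible?", True, False, True),
]

counts = tally(rows)
print(dict(counts))
if counts["entitlement"]:
    print("Send entitlement findings to security today.")
```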

Chapter Summary

In a regulated product, retrieval is a control on a par with access control and change control, not just a way to fetch better passages.
An answer with no retrieved source is the model's own output that no one has reviewed. A grounded answer restates material that already cleared review and leaves a trail back to it.
Answer only from an approved corpus. Your legal and compliance teams decide what goes in it, and your job is to make the product unable to answer from anywhere else.
When retrieval finds nothing relevant, the product should say it cannot answer and route the question to a person rather than fall back to training data.
Put a citation on every answer so a reviewer can pull the source and check the claim. Citations let review happen; they do not replace it.
Serve only the current version of every document, and drop superseded versions the moment the new ones publish. Quoting a withdrawn policy is a compliance incident, not a ranking bug.
Carry the asker's permissions into the index on every query. If the user could not open a document, the model must not read it to them, and crossing a tenant line is a breach.
Log which chunks and document versions the model saw for every answer, so you can explain months later why the product said what it said.
Next is Security and guardrails, which covers the people who will deliberately try to bend these controls.

Sources

Mata v. Avianca, sanctions order, S.D.N.Y. (2023).
Morgan Stanley, announcement of the AI @ Morgan Stanley Assistant (2023).
Magesh and colleagues, Stanford study of hallucination in AI legal research tools (2024).
Allen & Overy, announcement of the Harvey deployment (2023).
