Setting boundaries: what your AI may and may never do · The Builder's Stack

DoNotPay sold itself for years as "the world's first robot lawyer." In January 2023 it planned to have its AI argue a live traffic case through an earbud, then withdrew after state bar associations objected that a robot arguing in court is unauthorized practice of law. In September 2024 the FTC settled with the company for $193,000 over the robot-lawyer claims themselves, which the agency said had never been tested.

The software ran fine. The boundary was what failed, because the product was sold into work the law reserves for licensed people, and selling it that way was the violation. For a consumer app the cost was a settlement and a bad news cycle. The mistake is the same in a bank or an insurer; only the stakes change, and there they can end the company.

Nobody had drawn an envelope around the product.

An envelope is a written decision about what your product may do freely, what it must refuse, and what it hands to a person, made before anyone writes a prompt.

The chapter "Why high-stakes industries are different" covered why wrong outputs carry asymmetric cost in regulated industries. This chapter covers the envelope itself: how to decide what your product may do, what it must refuse, and what it hands to a person.

Sort everything into three zones: may do, must refuse, hand to a human

The envelope is a product decision that comes before prompts, evals, and model selection. It sorts everything your product could do into three zones.

May do freely. The product can explain how an offering works, compare published options, define terms, and summarize documents the user provides. These outputs inform the user without committing the firm to anything.
Must refuse. The product never advises on a specific decision, diagnoses, decides a claim, promises an outcome, or quotes a binding price. Each of these either is the regulated act itself or commits the firm the instant it appears on screen.
Hand to a licensed human. This zone holds anything account-specific and anything whose right answer depends on the person's actual circumstances rather than on general facts. Here the deliverable is a clean handoff with context attached, not an answer.

Where the lines sit is a call your legal and compliance teams own; they are the authority on what counts as advice under your licenses, and this chapter teaches the product skill, not legal advice. Your job is to make sure the envelope exists before the first prompt and that the product enforces it. DoNotPay had no refuse zone, and the product was sold straight into licensed work.
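
One way to make sure the envelope exists before the first prompt is to write it down as data the product checks before it answers. The sketch below is a minimal illustration in Python, not a prescribed implementation. The intent labels (such as `explain_product`) are hypothetical and assume some upstream step has already labeled the user's ask; where each intent belongs is still a call for legal and compliance. Sending unlisted intents to a person is also an assumption made for this example.

```python
from enum import Enum


class Zone(Enum):
    MAY_DO = "may do freely"
    MUST_REFUSE = "must refuse"
    HANDOFF = "hand to a licensed human"


# Hypothetical intent labels for illustration only.
# The real list and its zone assignments come from legal and compliance.
ENVELOPE = {
    "explain_product": Zone.MAY_DO,
    "compare_published_options": Zone.MAY_DO,
    "define_term": Zone.MAY_DO,
    "summarize_user_document": Zone.MAY_DO,
    "advise_on_specific_decision": Zone.MUST_REFUSE,
    "diagnose": Zone.MUST_REFUSE,
    "decide_claim": Zone.MUST_REFUSE,
    "promise_outcome": Zone.MUST_REFUSE,
    "quote_binding_price": Zone.MUST_REFUSE,
    "account_specific_question": Zone.HANDOFF,
}


def zone_for(intent: str) -> Zone:
    """Look up the zone for an intent label.

    Assumption: an intent nobody has sorted yet goes to a person
    rather than being answered by the product.
    """
    return ENVELOPE.get(intent, Zone.HANDOFF)
```

Keeping the envelope in one reviewable table, rather than scattered across prompts, gives compliance a single artifact to sign off on and change.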

Why finance puts advice in the refuse zone

In finance, the law forces advice into the refuse zone. Under Regulation Best Interest, the SEC rule adopted in 2019, a broker-dealer's recommendation to a retail customer must serve that customer's best interest. The older suitability duties point the same way: advice has to fit the person, meaning their goals, holdings, debts, and risk tolerance. A chatbot answering an anonymous visitor has no verified picture of any of that, and a duty to fit the recommendation to the person cannot be met when the product has no reliable record of who the person is. That leaves only the outer zones as compliant: education, options with their tradeoffs, and a handoff to someone licensed.

Regulators also read bot output as the firm speaking. FINRA, the self-regulatory organization that supervises US broker-dealers, confirmed in Regulatory Notice 24-09 in 2024 that generative chatbot output is a communication with the public under its rules, supervised and retained like an ad or a letter. A throwaway chat reply carries the same compliance weight as a published brochure.

Write refusals that move the user forward

A refusal is a piece of product design, not a legal apology, yet most teams write refusal copy last and treat it as an error state. That produces a dead-end reply with no next move, and dead ends teach users to rephrase and probe until something slips through, which is the same manipulation pattern the guardrail above describes. A refusal that works is built differently.

State the boundary in plain words, not policy citations. A working version sounds like "I can't recommend what to do with your retirement account, because the right answer depends on your full financial picture, which I don't have."
Offer the adjacent thing the product may do. The same reply can continue with "I can walk you through how people generally weigh this choice, and the questions worth bringing to an advisor."
Route forward in one step. Put a licensed human one tap away with the conversation attached, rather than a phone number and a fresh start.
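
The three parts above can be kept together as one structure so that a refusal without a next move cannot ship. This is a hedged sketch in Python: the `Refusal` class, its field names, and the button label are hypothetical, and the example copy is the wording quoted in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Refusal:
    boundary: str        # what the product will not do, in plain words
    adjacent_offer: str  # the nearest thing the product may do
    handoff_label: str   # label for the one-tap route to a licensed person

    def __post_init__(self) -> None:
        # Reject dead ends: every refusal must carry all three parts.
        for name in ("boundary", "adjacent_offer", "handoff_label"):
            if not getattr(self, name).strip():
                raise ValueError(f"refusal is missing its {name}")

    def render(self) -> str:
        # Plain-text rendering; a real UI would show the label as a button.
        return f"{self.boundary} {self.adjacent_offer}\n[{self.handoff_label}]"


retirement = Refusal(
    boundary=(
        "I can't recommend what to do with your retirement account, "
        "because the right answer depends on your full financial picture, "
        "which I don't have."
    ),
    adjacent_offer=(
        "I can walk you through how people generally weigh this choice, "
        "and the questions worth bringing to an advisor."
    ),
    handoff_label="Talk to a licensed advisor",  # hypothetical button label
)
```

Attaching the conversation to the handoff is left out here; the point of the sketch is only that the copy review fails early when any of the three parts is empty.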

Users hit these boundaries exactly when something real is at stake for them, which makes refusal and handoff the highest-anxiety screens you will ship. The skills in "Anxiety: lower the stakes at risky moments" apply at full strength: lower the temperature, show what happens next, and never make a person feel punished for asking.

Keep the final decision with a person, and be careful what you claim in public

Some decisions have to stay with a person even when a model could produce the answer. Lemonade, the insurance company, ran into the public side of this in 2021 after describing AI that scanned the videos customers submit with claims for signs of fraud. The backlash forced the company to delete the posts and clarify publicly that humans decide claims. The failure here was not a wrong output reaching a customer; it was the way the company described its system. Saying out loud that a model makes the decision is itself a regulatory and trust problem.

Keep the decision with a named person. The model can assemble the file, flag inconsistencies, and draft the letter, while the decision itself, on a claim, a credit line, a diagnosis, is made by a person whose name goes on it.
Treat your public description as part of the envelope. Regulators and customers read marketing copy as a statement of fact about your controls, so copy that inflates the model's role can create the incident by itself.
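
The first point can be sketched as a record where the model fills in the supporting material and the decision fields stay empty until a named person sets them. This is a minimal illustration in Python under stated assumptions: `ClaimFile`, `record_decision`, and all field names are hypothetical, and a real system would verify the reviewer's identity rather than accept a string.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ClaimFile:
    claim_id: str
    model_summary: str = ""                               # drafted by the model
    model_flags: List[str] = field(default_factory=list)  # inconsistencies it flagged
    decision: Optional[str] = None                        # set only by a person
    decided_by: Optional[str] = None                      # the name that goes on it


def record_decision(claim: ClaimFile, decision: str, decided_by: str) -> ClaimFile:
    """Record a decision only when a named person is attached to it."""
    if not decided_by.strip():
        raise ValueError("a decision needs a named person")
    claim.decision = decision
    claim.decided_by = decided_by
    return claim
```

Because the model's output and the human decision live in separate fields, the record itself shows who decided, which also keeps the public description of the system honest.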

Start with an internal tool, then widen the envelope as you prove it

The lowest-risk first version of a high-stakes AI product faces your own staff, not your customers. It drafts the reply a licensed advisor reviews and sends, summarizes the case file before the claims adjuster opens it, and pulls the relevant policy language while the agent is on the call. Trained professionals catch the errors, and every catch becomes evidence that the product works.

That evidence is what lets you widen the envelope. When we scope a first version in a regulated product, we start here even when the customer-facing vision is the whole point, because compliance leaders only widen an envelope when they can see the record: accuracy on real questions, how the product refuses under real pressure, and review logs with named reviewers. Show that the controls held internally, then open the may-do zone to customer education, and widen it again as the record grows. The envelope opens to customers one step at a time, and each step has to be earned.

Try it now

Draw the envelope for your own product using the asks you actually receive. Budget about 15 minutes.

Pull ten real asks. Collect the ten most recent real user requests from support tickets, chat logs, or notes from sales calls, in their verbatim wording rather than the categories in your head.

Sort each ask into a zone. Mark each one as may do freely, must refuse, or hand to a human, and notice your hesitations; any ask you stall on marks the edge of your envelope and belongs in your next conversation with compliance.

Write the two hardest refusals. Take the two asks where refusing feels most costly and draft the actual copy: the boundary in plain words, the adjacent thing the product offers instead, and the one-step route to a person. If you cannot write a refusal that leaves the user better off than a dead end, the ask belongs in the handoff zone instead.

Chapter Summary

The envelope is your written decision about what the product may do freely, what it must refuse, and what it hands to a person, drawn before anyone writes a prompt.
Sort everything the product could do into three zones: may do freely, must refuse, and hand to a licensed human.
In regulated work, putting advice in the refuse zone is not a style choice; the law requires it, because real advice has to fit a specific person and an anonymous chatbot cannot see who it is talking to.
Enforce the lines with gates, review, output filters, restricted tools, and human checkpoints, not by hoping the system prompt holds up under pressure.
A motivated user will eventually get the product to say anything it can say, so anything that would commit the firm belongs in the refuse or handoff zone until a real control makes it safe.
A refusal is product design: state the boundary in plain words, offer the nearest thing the product can do, and put a person one tap away.
Keep binding decisions with a named human, and be careful what you claim in public, because regulators and customers read your marketing as a statement of fact about your controls.
Ship the first version to your own staff, prove the controls with a clean record, then widen the envelope toward customers one earned step at a time.
Next up is "Choosing the right model," which takes up the question of which model can operate inside these boundaries.

Sources

FTC (2024). Action against DoNotPay over robot-lawyer claims; press reporting on the withdrawn 2023 courtroom plan.
SEC (2019). Regulation Best Interest.
FINRA (2024). Regulatory Notice 24-09, on generative AI chatbot output as communications with the public.
Lemonade (2021). Public statements on AI claims handling, including the clarification that humans decide claims.
