You have made an API call before, even if you did not write the code yourself. In "APIs, how systems talk to each other," you saw the pattern: your app sends a request to a server it does not own, and the server sends back a response. Calling a model is almost exactly that. The aura around "using AI" makes the first call feel like it needs a research degree, but the call itself is an ordinary request to someone else's server, with one twist in what comes back and one twist in the bill.
A model call is an ordinary API request with two twists: the response is generated fresh each time instead of looked up, and you are billed by the token for both what you send and what comes back.
What you send, and what you get back
The request carries two things: your instructions and the input to work on. The instructions, often called the system prompt, set the job ("you list groceries from a photo, one item per line"). The input is whatever the user supplied, which for many current models can be an image as easily as text. The whole thing is a normal web request, and the pattern barely changes across OpenAI, Anthropic, and Google; what varies is mostly field names, such as whether the system prompt travels as a message or as its own top-level field.
POST (the provider's messages endpoint)
{
  "model": "<your chosen model>",
  "messages": [
    { "role": "system", "content": "List the groceries you see. One item per line." },
    { "role": "user", "content": "<the fridge photo>" }
  ]
}
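As a rough sketch only, the request body above can be assembled in Python like this. The function name `build_request` and the placeholder strings are illustrative assumptions, not any provider's SDK, and the message layout follows the generic shape shown above, which individual providers vary.

```python
import json


def build_request(model, system_prompt, user_input):
    """Assemble a generic chat-style request body (hypothetical shape)."""
    return {
        "model": model,
        "messages": [
            # The instructions: set the job for the model.
            {"role": "system", "content": system_prompt},
            # The input: whatever the user supplied.
            {"role": "user", "content": user_input},
        ],
    }


body = build_request(
    model="<your chosen model>",
    system_prompt="List the groceries you see. One item per line.",
    user_input="<the fridge photo>",
)

# This JSON text is what would travel in the body of the POST request.
print(json.dumps(body, indent=2))
```

Nothing here contacts a server; it only shows that the request is plain structured data before it is sent.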
What comes back is generated text, not a record pulled from a table. Send the same request twice and the two replies can differ slightly, because the model produces an answer rather than retrieving one. That single fact shapes everything later in this part: you cannot assume the output is identical every time, so you will learn to constrain it in "Get output you can build on" and to check it before you trust it.
Tokens: the unit you are billed in
The model reads in units called tokens, not characters or words.
A token is a chunk of text the model reads and writes, roughly a few characters or a short word, and you pay per token for everything you send and everything you get back.
A short sentence is a dozen or two tokens; a photo and its list might run to a thousand or two. You are charged for the input tokens and the output tokens together, so a long instruction you send on every call costs you on every call. Per-token prices change too often to print here, so we work in orders of magnitude: a single small call like the one above costs a fraction of a cent today, and the current numbers live on the providers' pricing pages and in the Playbook's cost section. The same unit sets a hard limit you will meet in "Give the model the facts it wasn't trained on": the context window, the most tokens a model can take in one call.
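To make the billing arithmetic concrete, here is a minimal sketch. The prices are made-up placeholders chosen for easy math, not any provider's real rates, and `estimate_cost` is a hypothetical helper rather than part of any library.

```python
def estimate_cost(input_tokens, output_tokens,
                  input_price_per_million, output_price_per_million):
    """Return the dollar cost of one call, billing input and output tokens."""
    input_cost = input_tokens / 1_000_000 * input_price_per_million
    output_cost = output_tokens / 1_000_000 * output_price_per_million
    return input_cost + output_cost


# PLACEHOLDER prices in dollars per million tokens (not real rates).
INPUT_PRICE = 1.00
OUTPUT_PRICE = 4.00

# One small call: about 1,500 tokens in, 200 tokens out.
one_call = estimate_cost(1_500, 200, INPUT_PRICE, OUTPUT_PRICE)

# A 500-token instruction is paid for again on every single call.
instruction_cost_over_10k_calls = (
    estimate_cost(500, 0, INPUT_PRICE, OUTPUT_PRICE) * 10_000
)

print(f"one call: ${one_call:.4f}")
print(f"instruction alone, 10,000 calls: ${instruction_cost_over_10k_calls:.2f}")
```

With these placeholder numbers, one call comes to about a quarter of a cent, while the 500-token instruction alone adds up to about five dollars across 10,000 calls. Swap in current rates from a pricing page to get real figures.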
The call lives on your server, never in the browser
To make the call, your app sends an API key, a secret string that tells the provider who is paying. That key can spend real money, so where it lives matters more than any other choice in this chapter.
The key belongs on your backend, the server-side half of your app, read from an environment variable that never ships to the browser. If you put the key in frontend code, anyone who opens your site can read it and run up your bill on their own projects. The pattern is fixed: the browser asks your server, your server calls the model with the key, and the answer comes back through your server. The key never leaves your side.
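The fixed pattern above can be sketched in a few lines. This is an illustration under assumptions, not a finished route: `MODEL_API_KEY` is a hypothetical variable name, `handle_browser_request` stands in for whatever route your framework provides, and `call_model` is a placeholder for the real provider call.

```python
import os

SYSTEM_PROMPT = "List the groceries you see. One item per line."


def load_api_key(env=os.environ):
    """Read the key from the server's environment; fail loudly if it is missing."""
    key = env.get("MODEL_API_KEY")  # hypothetical variable name
    if not key:
        raise RuntimeError("MODEL_API_KEY is not set on the server")
    return key


def handle_browser_request(user_input, call_model, env=os.environ):
    """Server-side handler: the browser sends input, only the reply goes back."""
    api_key = load_api_key(env)
    # call_model is a placeholder for the provider call, made from the server.
    reply_text = call_model(api_key, SYSTEM_PROMPT, user_input)
    # Return only the model's reply; the key is never part of the response.
    return {"reply": reply_text}


def fake_call_model(api_key, system_prompt, user_input):
    """Stand-in for the provider so the sketch runs without a network call."""
    return "milk\neggs"


response = handle_browser_request(
    "<the fridge photo>", fake_call_model, env={"MODEL_API_KEY": "example-secret"}
)
print(response)
```

The point to notice is the shape: the key is read on the server, used on the server, and absent from what travels back to the browser.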
You direct the call, then read it
You will not hand-write this code, the same way you have not hand-written the rest of your build. You describe the feature to your tool, and it writes the backend route that takes an input, calls the model, and returns the result. Then you read the diff the way "Review what the AI built" taught you, checking three things: the key comes from an environment variable, the call happens on the server, and the response is handled when it arrives. The drill below makes your first call two ways, once by hand to see it directly and once through your tool to put it in your build.
Try it now
No setup: Open a provider's web console (OpenAI's Playground, Anthropic's Console, or Google's AI Studio) and make one call by hand. Write a one-line system instruction, paste or type an input, and run it. Read what it shows you: the reply, and the token count for the request and the response. Then run the exact same input again and compare the two replies: they will often differ slightly. You have now made a model call and seen the bill, with no code at all.
With your tools: In the project you are building, ask Claude Code to add a single backend route that calls a model with a fixed instruction and returns the text, reading the API key from an environment variable and never exposing it to the browser. Run it once and read the response. If your tools are not set up yet, The Setup Clinic gets you to a working session in one sitting. In Codex or Cursor the move is the same: ask for one server-side route that makes the call and returns the result, with the key in an environment variable.
Chapter Summary
- A model call is an ordinary API request: your app sends instructions and an input to the provider's server, and the server returns a reply.
- The two twists are that the reply is generated fresh each time, so the same input can return slightly different output, and that you pay by the token.
- A token is a small chunk of text, and you are billed for the tokens you send and the tokens you get back, so long instructions cost you on every call.
- Per-token prices change constantly, so think in orders of magnitude and read the providers' pricing pages and the Playbook for current numbers.
- The number of tokens a model can take in one call is its context window, a limit you will plan around later in this part.
- Your API key can spend money, so it lives in a server-side environment variable and never reaches the browser, a public repo, or a screenshot.
- The call itself runs on your backend: the browser asks your server, your server calls the model, and the answer returns through your server.
- You direct your tool to write the route and then read the diff, rather than hand-writing the code.
- Next up, "Choose a model you can live with" helps you pick which model that call should reach.
Sources
- OpenAI, Anthropic, and Google developer documentation on messages, tokens, and authentication, 2026.
- Provider pricing pages for current per-token rates, 2026.