You land at a hotel front desk at midnight with a reservation that has gone missing. You do not walk around the counter and fix it in the booking system yourself. You state your case in the form the desk expects (a name, a confirmation number, a card), and the receptionist disappears into systems you will never see. What comes back is a key or a polite no with a reason attached.
Connected software gets work from other software the same way. One system never reaches into another's code or database; it hands a front desk a request in an agreed form and takes what comes back. That desk is an API, and every AI feature you will ever scope depends on at least one.
What an API is
An API is the front desk of a service: you hand in a request in exactly the form it asks for, and you get back a result in exactly the form it promised, or a refusal with a reason.
The service itself, the code, the data, the machines doing the work, stays in a back office you cannot enter. What you get instead is a contract posted at the desk, and the deal protects both sides, since you never have to understand the back office and the service never lets strangers inside.
An API call is the same round trip you watched in the journey of a request, now with both notes standardized. The knocking is usually done by your backend, which asks other services for the work your product cannot do itself: charging a card, sending an email, generating text.
Endpoints are the doors, and every request has the same parts
The desk in the diagram is not one window but a row of doors. Each thing the service can do gets its own named door, called an endpoint. A payments service has one door for creating a charge, another for looking one up, a third for refunds. When a tool says it is calling an API, it means knocking on one specific door, and the request has the same few parts every time.
- A verb. GET asks the service to look something up; POST asks it to do something new.
- An address. The door's name, written like a web address, such as POST /charges.
- A body. The job's details as labeled fields: amount, currency, which card.
- A key. A long secret string that proves who is knocking and which account to bill; "Auth, who you are vs. what you can do" gives keys their full treatment.
Get the form right and the desk does the work; get it wrong and the refusal comes back as an error code with a reason. Either way the desk answers, which is what makes other services safe to build on.
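To make the four parts concrete, here is a minimal sketch in Python that assembles a request without sending it. The address api.example.com, the field names, and the placeholder key are all assumptions made up for illustration, and sending the key as a "Bearer" header is a common convention rather than something every service uses; a real service's docs define its own.

```python
import json
import urllib.request

# Hypothetical values for illustration only; not a real service or key.
BASE_URL = "https://api.example.com"
API_KEY = "sk_test_placeholder"  # a real key is loaded from a secret store, never typed into code


def build_charge_request(amount, currency, card_token):
    """Assemble the four parts of a request without sending it."""
    body = {"amount": amount, "currency": currency, "card": card_token}
    return urllib.request.Request(
        url=BASE_URL + "/charges",                 # the address (which door)
        method="POST",                             # the verb (do something new)
        data=json.dumps(body).encode("utf-8"),     # the body (labeled fields)
        headers={
            "Authorization": "Bearer " + API_KEY,  # the key (who is knocking)
            "Content-Type": "application/json",
        },
    )


request = build_charge_request(1999, "usd", "card_placeholder")
print(request.get_method(), request.full_url)
```

Nothing leaves your machine here; the point is only that every request, to any desk, is these same four pieces filled in differently.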
Model APIs add tokens, token pricing, and rate limits
When the service behind the desk is a model provider, the same front-desk picture still works. The door for generating text takes the model name, the conversation so far, and a cap on reply length, and returns the generated text. What changes is the meter on the desk.
- The cargo is measured in tokens. A token is a chunk of text the model counts, a short word or a piece of a longer one; a page of English runs a few hundred. Everything you send is counted on the way in (instructions, user question, attached documents), and everything generated is counted on the way out.
- You pay per token, in both directions. Prices are quoted in dollars per million tokens, with input and output priced separately and output costing more. The numbers move often enough that we will not print any here; the provider's pricing page is the source of truth.
- Rate limits are the bouncer at the door. The desk caps how fast one customer can knock, usually as requests per minute and tokens per minute, and past the cap you get a polite no that means slow down and retry. It keeps one customer's bug from taking the desk down for everyone.
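The "slow down and retry" answer can be handled with a few lines of waiting logic. The sketch below is one common approach, not any provider's official client: it assumes the refusal arrives as HTTP status 429 (the standard "Too Many Requests" code), and it uses a stand-in function in place of a real network call. The names call_with_retry and send are hypothetical.

```python
import time

RATE_LIMITED = 429  # standard HTTP status for "Too Many Requests"


def call_with_retry(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it is not rate limited, waiting longer each time."""
    for attempt in range(max_attempts):
        status, body = send()
        if status != RATE_LIMITED:
            return status, body
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # wait 1s, then 2s, then 4s, ...
    return status, body  # still rate limited after the last attempt


# A stand-in desk that refuses twice, then answers (no network involved).
replies = iter([(429, "slow down"), (429, "slow down"), (200, "generated text")])
waits = []  # record the waits instead of actually sleeping
result = call_with_retry(lambda: next(replies), sleep=waits.append)
print(result, waits)
```

Doubling the wait after each refusal is a design choice: it backs off quickly when the desk is busy instead of knocking again at the same pace.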
How one cheap request becomes a real bill
Token pricing reads as free at the scale you first meet it. A request that sends a few thousand tokens and gets a few hundred back costs pennies at most, which is so small that nobody stops to check it. The real bill, though, is that price paid by every user, every day, for a month.
That multiplication is the step that surprises people, and it cuts both ways: sometimes it turns a feature that demos for pennies into a real line on the monthly invoice, and just as often it shows an idea is cheaper than everyone assumed. Either answer is worth having before the build starts.
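The multiplication is short enough to write down. In the sketch below, every number is a placeholder chosen to make the arithmetic easy to follow: the token counts, the request volume, and especially the two prices are assumptions, not any provider's real rates.

```python
def monthly_cost(tokens_in, tokens_out, price_in_per_million,
                 price_out_per_million, requests_per_day, days=30):
    """Return (cost of one request, cost of a month of requests) in dollars."""
    per_request = (tokens_in * price_in_per_million
                   + tokens_out * price_out_per_million) / 1_000_000
    return per_request, per_request * requests_per_day * days


# Placeholder inputs: 3,000 tokens in, 500 out, 10,000 requests a day,
# at made-up prices of $2 (input) and $10 (output) per million tokens.
per_request, per_month = monthly_cost(
    tokens_in=3000,
    tokens_out=500,
    price_in_per_million=2.00,
    price_out_per_million=10.00,
    requests_per_day=10_000,
)
print(f"${per_request:.3f} per request, ${per_month:,.2f} per month")
```

With these placeholder inputs, a request that costs about a cent adds up to roughly $3,300 a month. Swap in the real prices and your own volume and the same function gives the number to bring to the planning conversation.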
How to read an API docs page fast
None of this machinery was invented for AI. Stripe, the classic example of a well-run front desk, has accepted payment requests in the same form for years. Most modern builds, AI or not, are a thin layer of your own logic standing at a row of other people's desks, so the skill that lasts is reading the sign on a new desk quickly.
The sign is the endpoint's documentation page, and reading one is a matter of spotting a familiar pattern rather than doing engineering. Every provider's docs follow the same skeleton, and you only ever need three things from it.
- The form. The verb and address at the top, then the table of fields with the required ones marked. Leave a required field out and the request fails before any work starts.
- The example. Nearly every page shows one complete request and the response it produces. Read it first; a worked example tells you in seconds what the field tables explain in paragraphs.
- The limits. Rate limits, maximum sizes, and the link to pricing, sometimes on a separate page. This part tells you whether the volume you sketched in the token math will fit.
Find those three and you can sanity-check any integration an AI tool proposes, which is most of what you will ever need from API documentation.
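Checking the limits against your own volume is simple arithmetic too. A rough sketch, assuming the provider publishes a requests-per-minute cap and a tokens-per-minute cap; the limit values and traffic figures below are invented, and providers differ in exactly which tokens count toward the cap, so treat the result as a first pass.

```python
def fits_limits(peak_requests_per_minute, tokens_per_request, rpm_limit, tpm_limit):
    """Compare an estimated busy-minute load against two published caps."""
    peak_tokens_per_minute = peak_requests_per_minute * tokens_per_request
    return {
        "requests_ok": peak_requests_per_minute <= rpm_limit,
        "tokens_ok": peak_tokens_per_minute <= tpm_limit,
    }


# Invented numbers: 40 requests in the busiest minute, 3,500 tokens each,
# against hypothetical caps of 60 requests and 100,000 tokens per minute.
check = fits_limits(
    peak_requests_per_minute=40,
    tokens_per_request=3500,
    rpm_limit=60,
    tpm_limit=100_000,
)
print(check)
```

In this made-up case the request count fits but the token count does not, which is the kind of mismatch worth finding on the docs page rather than in production.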
Try it now
No setup: Pick a model provider and open its pricing page next to the docs page for its text-generation endpoint. On the pricing page, find the unit, dollars per million tokens, and confirm that input and output carry different prices. On the docs page, find the three things: the form, the worked example, and the limits. Then run the receipt on your own idea. Estimate tokens in and tokens out for one typical request of the feature you want, multiply by an honest guess at requests per day, then by thirty, and price the total against the page you just read. The drill replaces a guess with a monthly number.
With your tools: Hand Claude Code one sample task from that same feature, a paragraph to summarize or a question over a document, and ask it to estimate input and output tokens for that request, then the daily and monthly cost at your expected volume. Check the arithmetic yourself with the pricing page open. We run this estimate before any AI feature of ours gets a yes. If Claude Code is not installed yet, the Setup Clinic walks you through it. In Codex or Cursor the move is the same: paste the sample task into the sidebar chat and ask for the token estimate and the monthly math.
Chapter Summary
- An API is the front desk of a service: you hand in a request in the form it asks for, and you get back the work or a refusal with a reason.
- One system never reaches into another's code or database. It only ever sends a request to the front desk and takes what comes back.
- Endpoints are the named doors, and every request carries the same few parts: a verb, an address, a body, and a key.
- A model API is the same kind of front desk, except the thing being measured is tokens, the chunks of text the model counts going in and coming out.
- You pay per token in both directions, priced per million, with output costing more than input. Prices change often, so the provider's pricing page is the only number to trust.
- Rate limits cap how fast you can call the service. Past the cap you get a polite "slow down and retry."
- A single request costs pennies, but the real bill is that price paid by every user, every day, for a month, so do that multiplication before you build.
- To read any docs page, find three things: the form, the worked example, and the limits. That is enough to sanity-check any integration an AI tool proposes.
- Everything these services hand back has to live somewhere, and where it lives shapes the rest of your build, which is where "Data, where information lives" picks up.