You type a question into ChatGPT and hit send. For a moment there is nothing but a blinking cursor, and then words begin to arrive, not all at once but in a steady drip, the answer assembling itself on screen as if someone on the other end were typing very fast. You have watched this hundreds of times without asking what happens in that gap, or why the words arrive one at a time instead of landing as a finished block.
Both questions have the same answer, and it is the most reusable picture in this level.
Every interaction with a connected product, every tap, click, search, and send, is a round trip: a request goes out, a computer somewhere does some work, and a response comes back.
This chapter walks the loop once at slow speed, names its parts, and shows you where to watch it run for real.
The round trip, step by step
Take the moment you hit send and stretch it out.
Your device packages a request. Your browser or phone wraps everything the other side needs into a small structured note: the text you typed, which conversation it belongs to, proof of who you are. That note is the request, and the device sending it is called the client.
The request travels. It leaves through your Wi-Fi or cellular connection, passes through your internet provider, and hops across a chain of machines until it reaches a specific computer run by the company behind the product. That computer may sit in a data center in another city or on another continent, the territory we mapped in where software lives.
The server works. The computer that receives the request is the server. It checks who you are, loads your conversation, and hands your question to the model, which begins generating an answer. All of this happens out of sight, which is why it feels like magic from the outside.
A response returns. Whatever the server produced is wrapped into a response and sent back along the same kind of path. Your device unpacks it and updates the screen, and the loop is closed.
The cargo changes from product to product, but the loop does not. When you like a photo, the request says, in effect, "add one like to this photo for this account." The server updates one record in a database (the subject of data, where information lives), and the response is just an acknowledgment. A search, a checkout, and a sent message run the same loop with different notes inside.
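If it helps to see the loop as code, here is a toy sketch of the like-a-photo round trip in Python. It is an illustration under loud assumptions, not how any real product is built: the names `Request`, `Response`, `server_handle`, and the in-memory `likes` dictionary are all made up for this example, and the "network" is nothing more than a function call.

```python
from dataclasses import dataclass

# Hypothetical stand-in for a database: photo id -> number of likes.
likes = {"photo-42": 7}


@dataclass
class Request:
    """The structured note the client sends."""
    action: str     # what the client wants to happen
    photo_id: str   # which photo it applies to
    account: str    # who is asking


@dataclass
class Response:
    """The note the server sends back."""
    ok: bool
    detail: str


def server_handle(request: Request) -> Response:
    """Server side: do the work, then answer."""
    if request.action != "like":
        return Response(ok=False, detail="unknown action")
    if request.photo_id not in likes:
        return Response(ok=False, detail="no such photo")
    likes[request.photo_id] += 1  # update one record
    return Response(ok=True, detail="liked")


# Client side: package the request, "send" it, and read the response.
request = Request(action="like", photo_id="photo-42", account="you")
response = server_handle(request)
print(response)            # Response(ok=True, detail='liked')
print(likes["photo-42"])   # 8
```

Swap the fields inside `Request` and the body of `server_handle`, and the same three moves (package, work, respond) describe a search or a checkout.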
The vocabulary of the round trip
When an AI tool says it will add an endpoint, or warns you about response latency, it is pointing at pieces of this loop. Here are the words.
- Request. The structured note the client sends, asking for something to happen.
- Response. The note that comes back, carrying the result or an error.
- Server. The computer that receives requests and does the work; the backend opens it up later in this level.
- Endpoint. The specific door on the server a request knocks on; liking a photo and posting a comment go through different doors, the subject of APIs.
- Latency. The time between sending a request and receiving its response, the wait you feel.
Each of these also lives in the glossary for the day you need it back.
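To make the "different doors" idea concrete, here is a minimal Python sketch of a server that picks a handler based on the endpoint a request knocked on. The paths `/photos/like` and `/photos/comment`, the handler names, and the shape of the response dictionaries are hypothetical, chosen only for illustration; real servers use a web framework to do this lookup.

```python
# Hypothetical handlers: each one is the work behind a single door.
def like_photo(body: dict) -> dict:
    return {"status": "ok", "message": f"liked {body['photo_id']}"}


def post_comment(body: dict) -> dict:
    return {"status": "ok", "message": f"comment added: {body['text']}"}


# The endpoints: a made-up path for each door, mapped to its handler.
ENDPOINTS = {
    "/photos/like": like_photo,
    "/photos/comment": post_comment,
}


def server(path: str, body: dict) -> dict:
    """Look up the door the request knocked on and run its handler."""
    handler = ENDPOINTS.get(path)
    if handler is None:
        # A response can carry an error instead of a result.
        return {"status": "error", "message": f"no endpoint at {path}"}
    return handler(body)


print(server("/photos/like", {"photo_id": "photo-42"}))
print(server("/photos/comment", {"text": "nice shot"}))
print(server("/photos/share", {}))  # no such door, so an error comes back
```

When an AI tool says it will "add an endpoint", the change it has in mind is roughly one new entry in a table like `ENDPOINTS` plus the function behind it.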
Latency earns extra attention because it explains how products feel. Every wait is two things added together: distance and work. Distance is the travel time across the physical gap between you and the server, a small fraction of a second even with an ocean in between. Work is everything the server does before it can answer.
The mix is what varies. A like button is nearly all distance, because updating one database record takes almost no time, while an AI answer is nearly all work, because generating the text takes far longer than the trip there and back. When a product of yours starts feeling slow, the useful question is whether distance or work got bigger; monitoring answers that with data instead of guesses.
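The arithmetic is simple enough to write down. The sketch below adds distance and work for the two cases above and reports how much of each wait is work. The millisecond figures are invented for illustration, not measurements of any real product, and the function names are ours.

```python
def latency_ms(distance_ms: float, work_ms: float) -> float:
    """Total wait = travel time there and back + time the server spends working."""
    return distance_ms + work_ms


def work_share(distance_ms: float, work_ms: float) -> float:
    """Fraction of the wait that is server work."""
    return work_ms / latency_ms(distance_ms, work_ms)


# Illustrative, made-up numbers in milliseconds, not measurements.
like_button = {"distance_ms": 80, "work_ms": 5}
ai_answer = {"distance_ms": 80, "work_ms": 8000}

for name, parts in [("like button", like_button), ("AI answer", ai_answer)]:
    total = latency_ms(**parts)
    share = work_share(**parts)
    print(f"{name}: {total:.0f} ms total, {share:.0%} of it is work")

# like button: 85 ms total, 6% of it is work
# AI answer: 8080 ms total, 99% of it is work
```

Same distance in both rows; only the work changed, and it changed the wait by two orders of magnitude.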
Streaming is one answer arriving in pieces
Now return to the words arriving one at a time. The tempting explanation is that the product is making many tiny round trips, one per word. The real mechanism is simpler. There is one request and one response, but the server starts sending the response before the model has finished generating it. The connection stays open, the answer flows through it in chunks, and your screen paints each piece as it lands. That is streaming.
It exists because of the work half of latency. A full answer can take many seconds to generate, and if the server held the response until the last word existed, you would watch a spinner the whole time. Streaming sends you the first word as soon as it has been generated instead of making you wait for the last one. The total time does not shrink; what changes is when you start reading.
You already know this pattern from video, where a film starts playing long before the whole file has arrived. AI chat does the same thing with text, so the typewriter effect is not a design flourish; it is just what a response looks like while it is still being delivered.
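Here is a small Python simulation of that difference. It does not open a real connection; it uses a pretend clock to compare when words would reach you if the server held the whole answer versus sending each word as it is produced. The generation speed and the sample sentence are assumptions picked for the example.

```python
from typing import Iterator

SECONDS_PER_WORD = 0.5  # assumed generation speed, for illustration only
ANSWER = "a response can be sent while it is still being written".split()


def generate() -> Iterator[tuple[float, str]]:
    """Pretend model: yields (seconds elapsed, word) as each word is produced."""
    elapsed = 0.0
    for word in ANSWER:
        elapsed += SECONDS_PER_WORD
        yield elapsed, word


def buffered() -> list[tuple[float, str]]:
    """Hold everything until the last word exists, then deliver all at once."""
    words = list(generate())
    finished_at = words[-1][0]
    return [(finished_at, word) for _, word in words]


def streamed() -> list[tuple[float, str]]:
    """Deliver each word the moment it is generated."""
    return list(generate())


for name, delivery in [("buffered", buffered()), ("streamed", streamed())]:
    first_word_at = delivery[0][0]
    last_word_at = delivery[-1][0]
    print(f"{name}: first word at {first_word_at}s, last word at {last_word_at}s")

# buffered: first word at 5.5s, last word at 5.5s
# streamed: first word at 0.5s, last word at 5.5s
```

Both versions finish at the same moment. Streaming only moves the first word earlier, which is the whole point.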
One screen, many round trips
The chat window makes the loop easy to see because there is roughly one journey per message, but most screens are busier than that. Open a feed app and the first second sets off dozens of requests: one for the page itself, one for notifications, one per image. They return at different speeds in no guaranteed order, and the screen assembles from whatever has landed. The loading you see is just many of these round trips finishing at different times and filling in the page as they arrive.
Each of those journeys is work a server performed on your behalf, which is why traffic, the number of requests arriving per second, does a lot to set the bill in hosting.
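The busy-screen case can be sketched with Python's `asyncio`. This is a simulation under stated assumptions: the resource names are invented, and each "request" is just a short random sleep standing in for a real round trip. What it shows is the pattern from above: everything is fired at once, and responses are handled in whatever order they land.

```python
import asyncio
import random

# Hypothetical resources a feed screen might ask for on load.
RESOURCES = ["page", "notifications", "image-1", "image-2", "image-3"]


async def round_trip(name: str) -> str:
    """Stand-in for one request: wait a random short time, then 'respond'."""
    await asyncio.sleep(random.uniform(0.01, 0.05))
    return name


async def load_screen() -> list[str]:
    """Fire every request at once and record the order responses land in."""
    tasks = [asyncio.create_task(round_trip(name)) for name in RESOURCES]
    arrived = []
    for finished in asyncio.as_completed(tasks):
        name = await finished
        arrived.append(name)
        print(f"arrived: {name}")
    return arrived


arrival_order = asyncio.run(load_screen())
```

Run it a few times and the arrival order will usually differ from run to run, even though the requests always go out in the same order.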
Try it now
No setup: Open a site you use daily in a desktop browser, right-click the page, and choose Inspect (some browsers label it slightly differently). The panel that appears is DevTools, built into every major browser, and its Network tab is a live window onto every round trip the page makes. Refresh and watch the lines roll in; each line is one request paired with its response, along with how long the round trip took. Now click one thing on the site (a like, a filter, a search) and find the new line your click created. Select it and read the response that came back. If you use an AI chat, send it a question and watch the reply stream word by word, now that you can name what you are seeing. We keep this tab open whenever a build misbehaves, because it answers the first diagnostic question at a glance: did the request go out, and did a response come back?
With your tools: Give Claude Code a small task in any folder, something like reading one file and summarizing what it does, and watch the terminal while it works. The pause after you press enter is a request traveling and work starting on a model server, and the text flowing in afterward is the response streaming back, the same journey from the chat window, now in your terminal. Ask for a multi-step task and you will see several round trips land in sequence, one per model call. If Claude Code is not installed yet, the Setup Clinic walks you through it. In Codex or Cursor the move is the same: hand the sidebar a small task and watch the response stream into the panel.
Chapter Summary
- Every interaction with a connected product is a round trip: your device sends a request, a server does some work, and a response comes back.
- The device that sends the request is the client, and the computer that receives it and does the work is the server.
- An endpoint is the specific door on the server that a request knocks on; different actions go to different doors.
- The cargo inside a request changes from one action to the next, but the pattern of the loop stays the same.
- Latency is the wait between sending a request and getting the response, and it is always distance plus work.
- A like button is almost all distance because the server does very little, while an AI answer is almost all work because generating the text takes time.
- Streaming is one response delivered in pieces: the server starts sending before the model has finished, so you start reading sooner even though the total time is the same.
- Most screens fire many round trips at once, and the loading you see is those requests finishing at different times and filling in the page.
- Your browser's Network tab in DevTools shows you every round trip on any site, which makes it the first place to look when a build misbehaves.
- Next on the map, in Languages and frameworks: approve the pick, is what both ends of this journey are written in and how to approve a tool's pick with confidence.