A small product goes live, and launch week goes well. Then, at 3 a.m. one night, the email service behind the sign-up form starts rejecting requests. Nothing visible changes: the homepage loads, the demo runs, and the server reports itself healthy, yet every visitor who tries to join hits a spinner and leaves. Two days later a stranger's email arrives: "Tried to sign up three times this week. Is this still happening?" That email is the team's monitoring system, and it reported the incident two days late.
Things break, and that part is normal. What went wrong here is that the team found out second, after a user did.
Monitoring is the discipline of finding out about a problem before your users do, and in an AI product it also watches the one thing users will never report: what every answer costs.
Logs, metrics, and error tracking: the three layers you need
Monitoring as an industry runs deep, but the part you need collapses into three layers, each answering one question about your running product.
- Logs answer "what happened." Every request your backend handles, every error it hits, every model call it makes can be written down as a timestamped line of text. You read logs after something has gone wrong, to reconstruct the story.
- Metrics answer "how much." Numbers over time: requests per minute, response time, error rate, and in an AI product, tokens. You watch metrics for trends, and you attach alerts to their thresholds.
- Error tracking answers "what broke." When code fails, the failure is captured, grouped with identical failures, and ranked by how many users it hit, so you fix the worst one first.
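The three layers can be sketched in a few lines of standard-library Python. This is an illustration of the idea, not what a host or a dedicated error tracker actually runs: the names `handle`, `metrics`, `error_groups`, and `worst_errors` are hypothetical, and failures are ranked here by how often they occurred rather than by how many distinct users they hit.

```python
import logging
from collections import Counter

logging.basicConfig(format="%(asctime)s %(levelname)s %(message)s", level=logging.INFO)
log = logging.getLogger("app")

metrics = Counter()       # "how much": running counts
error_groups = Counter()  # "what broke": identical failures grouped together


def handle(request_id, action):
    """Run one request and record it in all three layers."""
    metrics["requests"] += 1
    try:
        result = action()
        log.info("request=%s status=ok", request_id)  # "what happened"
        return result
    except Exception as exc:
        metrics["errors"] += 1
        # Group by exception type and message so repeats collapse into one entry.
        error_groups[f"{type(exc).__name__}: {exc}"] += 1
        log.error("request=%s status=failed error=%s", request_id, exc)
        return None


def error_rate():
    """Share of requests that failed, the kind of number an alert attaches to."""
    return metrics["errors"] / metrics["requests"] if metrics["requests"] else 0.0


def worst_errors(n=3):
    """Most frequent failure groups first, so the worst one gets fixed first."""
    return error_groups.most_common(n)
```

Each request leaves a timestamped log line, moves a counter, and, if it fails, lands in a group with its identical siblings.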
Little of this needs building: hosts ship logs and basic metrics by default, and dedicated error trackers are quick to add to an existing app. That covers every failure a machine can recognize. An AI product adds two signals this stack will not watch for you, and both deserve attention from day one.
Watch the AI bill, because it climbs without anything failing
Every model call is metered in tokens, and the meter never pauses. Your bill is roughly the tokens you spend per request, times the number of requests per day, times the price per token. It is the same receipt you priced in "APIs, how systems talk to each other," except now it runs against real production traffic instead of your estimate.
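The arithmetic behind that receipt can be sketched as a single function. The numbers below are made up for illustration, and the single flat price per million tokens is a simplifying assumption: real pricing pages often list separate rates for input and output tokens.

```python
def daily_cost(tokens_per_request, requests_per_day, price_per_million_tokens):
    """Rough daily bill: tokens per request x requests per day x price per token."""
    tokens_per_day = tokens_per_request * requests_per_day
    return tokens_per_day / 1_000_000 * price_per_million_tokens


# Hypothetical figures: 2,000 tokens per request, 500 requests a day,
# and an assumed price of 5.00 per million tokens.
baseline = daily_cost(2_000, 500, 5.00)

# The same traffic after a prompt edit doubles the context sent per request.
after_prompt_edit = daily_cost(4_000, 500, 5.00)

print(baseline, after_prompt_edit)
```

With these assumed figures the baseline comes to 5.00 a day, and doubling the tokens per request doubles the bill without a single request failing.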
What makes the bill a monitoring problem is that it can climb without anything breaking. A retry loop calls the model three times for one question, a prompt edit doubles the context you send, traffic spikes overnight, or one user pastes a whole contract where you expected a paragraph. Every one of those returns success and leaves your error tracker green, so the cost meter is the only place the change shows up.
Set a billing alert before the feature meets real users. This is the cheapest first move in AI monitoring, and the one we treat as non-negotiable. Every major provider lets you set a spending alert in minutes, at a threshold you choose. Put it a notch above what a normal month should cost (your expected usage, priced from the provider's pricing page, defines normal), and the quiet runaway bill becomes a routine notification that reaches you while the number is still small.
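Choosing the threshold is a small calculation. The sketch below assumes "a notch" means twenty percent above a normal month; that margin, and the function names, are illustrative choices rather than anything a provider prescribes. The alert itself is configured in the provider's billing settings, not in your code.

```python
def alert_threshold(normal_monthly_cost, notch=1.2):
    """Spending-alert threshold set a notch above a normal month (assumed: +20%)."""
    return round(normal_monthly_cost * notch, 2)


def should_alert(month_to_date_spend, threshold):
    """True once spending so far this month reaches the threshold."""
    return month_to_date_spend >= threshold


# Hypothetical: a normal month is expected to cost 150.00.
threshold = alert_threshold(150.00)
print(threshold, should_alert(95.00, threshold), should_alert(181.50, threshold))
```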
A request can succeed and still answer badly
Classic monitoring catches what machines can detect: a crash, a timeout, an error code. A model feature can fail with none of those, because the model returns a fluent, assured, wrong answer, the request completes cleanly, and every dashboard reports success. A support bot can misstate your refund policy a hundred times before the first angry email, because as far as the infrastructure can tell, every one of those conversations worked.
Count what users do next. That is the honest countermeasure at this level. Track the thumbs-down rate if your product has the buttons, retries (the same user asking the same question again within a minute), and abandons mid-flow. These are crude proxies, but they turn "answered badly" into a number that can trend and alert like any other, so a retry rate that doubles in a week reports a quality drop that produced zero errors. Measuring answer quality properly is a discipline called evaluation, and the next level, The Practice, goes deep there.
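A retry rate of this kind can be computed from a plain list of question events. This is a minimal sketch under stated assumptions: the event shape `(user_id, question, timestamp_seconds)` is hypothetical, and "the same question" is approximated by comparing text after trimming whitespace and lowercasing, which will miss rephrasings.

```python
def retry_rate(events, window_seconds=60):
    """Fraction of questions that repeat the same user's same question
    within `window_seconds` of that user's previous ask.

    events: iterable of (user_id, question, timestamp_seconds), in any order.
    """
    last_seen = {}
    retries = 0
    total = 0
    for user_id, question, ts in sorted(events, key=lambda e: e[2]):
        total += 1
        key = (user_id, question.strip().lower())
        previous = last_seen.get(key)
        if previous is not None and ts - previous <= window_seconds:
            retries += 1
        last_seen[key] = ts
    return retries / total if total else 0.0


events = [
    ("u1", "How do refunds work?", 0),
    ("u1", "how do refunds work? ", 30),   # same user, same question, 30 s later
    ("u2", "Pricing?", 40),
    ("u1", "How do refunds work?", 200),   # outside the one-minute window
]
print(retry_rate(events))
```

Computed daily, the result is a number you can chart and alert on like any other metric.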
A status page shows users what your monitoring sees
A status page shows users what a product's own monitoring sees: current health by component, live updates during an incident, and a history of resolved ones. Every major model provider runs one, and the histories are not empty, because incidents happen at every scale of company. For you as a builder, the page works as both a diagnostic tool and an education.
- Diagnosis. When your AI feature degrades, the provider's status page is the fastest answer to "is this us or them." Check it before you spend an hour debugging your own code.
- Education. A resolved incident, read end to end, shows what competent monitoring sounds like: detection time, mitigation steps, root cause, and follow-up work, written for the public.
Once people rely on your build, a small status page of your own is one of the cheapest ways to earn their trust, because it shows them you noticed the problem first and are already on it.
Write your one-sentence monitoring spec
You do not have to design any of this. You have to specify it, and the specification fits in one sentence with three slots: the worst silent failure, the signal that would catch it, and where the alert lands. The worst silent failure is the thing users would feel before you did, which for most products is the sign-up flow, the checkout, or the AI feature itself. The signal is the number that moves when that breaks, and the landing spot is anywhere that actually reaches you.
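To see how small the resulting check is, take two hypothetical specs: alert when the sign-up success rate stays below ninety percent for fifteen minutes, and alert when daily token spend doubles against the trailing week. The sketch below assumes you already collect one success-rate reading per minute and one token total per day. The function names and input shapes are illustrative, and delivering the alert to a channel or inbox is left out.

```python
from statistics import mean


def signup_alert(per_minute_success_rates, threshold=0.90, minutes=15):
    """True if each of the most recent `minutes` readings is below `threshold`."""
    recent = per_minute_success_rates[-minutes:]
    return len(recent) == minutes and all(rate < threshold for rate in recent)


def spend_alert(today_tokens, trailing_week_tokens, factor=2.0):
    """True if today's token spend is at least `factor` times the
    average daily spend over the trailing week."""
    if not trailing_week_tokens:
        return False
    return today_tokens >= factor * mean(trailing_week_tokens)


# Fifteen straight minutes at 80% success trips the first rule.
print(signup_alert([0.95] * 5 + [0.80] * 15))

# 2.1M tokens today against a 1M-per-day trailing week trips the second.
print(spend_alert(2_100_000, [1_000_000] * 7))
```

Each function reads off one slot of the sentence: the failure picks the metric, the signal becomes the threshold and the window, and the landing place is whatever you call when the function returns true.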
"If sign-up success rate drops below ninety percent for fifteen minutes, message our shared channel." "If daily token spend doubles against the trailing week, email both of us." Either sentence is a complete monitoring spec, and handing it to your build tool is enough for the tool to propose the wiring. If you cannot finish the sentence, the failure is not specific enough yet, and sharpening it is the real work.
Try it now
No setup: Pick one model provider or hosting company you would build on and find its status page (the convention is the word "status" next to the company name in a search). Open a resolved incident and read it end to end. Note how the problem was detected, how long it took to get from detection to mitigation, how often updates were posted, and what the root cause turned out to be. The calm, specific writing is worth copying when your own first incident arrives.
With your tools: Open the billing page of the account your coding tools charge against and turn on its spending alert, set a notch above a normal month. Every provider offers one, and if no account exists yet, the Setup Clinic gets you there in one sitting. Then write your one-sentence monitoring spec: describe to Claude Code the build you are circling, ask for the most likely silent failures, pick the one that would hurt most, and finish the sentence yourself with the signal and the landing place. In Codex or Cursor the move is the same: ask the sidebar chat for candidate silent failures, then complete the spec by hand. Keep the sentence, because it goes into the plan you will assemble in Write your Build Plan and start the build.
Chapter summary
- Monitoring means finding out about a problem before your users have to tell you about it.
- The three layers cover most of it: logs record what happened, metrics count how much, and error tracking names what broke and how many people it hit.
- Your hosting and a quick-to-add error tracker give you most of those three by default, so little of it needs building.
- An AI product adds two signals the standard stack will not watch for you: the bill and the quality of the answers.
- The bill can climb without anything breaking, so set a provider spending alert before the feature meets real users; it is the cheapest first move there is.
- A request can finish cleanly and still return a wrong answer, so count what users do next: thumbs-down, retries, and abandons stand in for quality until you measure it properly.
- Check a provider's status page when your feature degrades to see fast whether the problem is yours or theirs, and run a small one of your own once people depend on your build.
- You do not have to design any of this: write a one-sentence spec naming the worst silent failure, the signal that would catch it, and a landing place for the alert that you actually look at, then hand it to your build tool.
- Next, the course turns to the product only you can scope, starting with Find your build and pick its shape.
Sources
- Public status pages and incident histories of major model providers and cloud hosting platforms.