AI-Native PM
7 min · 0 of 7 in The Technology Stack

Data, where information lives


You are twenty minutes into a build session. The screens exist, the agent is wiring up the save button, and then it stops to ask: should it set up a database for the saved items, or keep them in a file for now? For most people the word database lands with extra weight, as if this were the moment the project turns into serious infrastructure, but the question is smaller than it sounds. All data lives in some structure, somewhere. Your photos live in files, your household budget lives in a spreadsheet, your bank balance lives in a database. The decision is which structure this data deserves, and the menu is short.

The four forms data can live in

Nearly every piece of data in every piece of software sits in one of these places.

  • A file on disk. The simplest answer, and the right home for drafts, exports, images, and configuration. Files suit data that is read often, written rarely, and touched by one person or one program at a time.
  • A spreadsheet. A file with structure: rows, columns, and simple lookups. Google Sheets and Airtable have made the spreadsheet a respectable data layer for small products, with an editing interface included for free.
  • A database. A purpose-built system for storing and querying structured records, designed so that many users can read and write at once. Postgres, MySQL, and SQLite are the names you will meet first. This is where the backend keeps everything it must not lose.
  • A data warehouse. A database tuned for analytics across long history ("which feature do paying customers touch most, by month") rather than for serving an app. Snowflake and BigQuery live here, and you will not need one for a long time.
Figure: The four forms data can live in, ascending in weight. File (read often, written rarely), spreadsheet (small, shared lookups), database (concurrent reads and writes; marked DEFAULT), and data warehouse (analytics, much later). Caption: For most builds, a hosted Postgres database. Move on.

A short test sorts almost any piece of data into its form. Who reads it? If only the person who created it, on one device, a file is fine; the moment data is shared between users or synced across devices, you want a database. How much, and how often? Modest volume read casually fits a spreadsheet or any database, while heavy, constant traffic calls for a database chosen for that workload. When the test comes up "database," the unexciting default is Postgres, hosted by a provider so you never operate it yourself. Saying yes to that default is the same move you practiced in Languages and frameworks: approve the pick.
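The two questions can be written down as a tiny decision function. This is a simplified sketch: `pick_storage_form` and its two yes/no inputs are hypothetical, and real data rarely splits this cleanly, but it shows the order in which the test is applied.

```python
def pick_storage_form(shared_or_synced: bool, heavy_constant_traffic: bool) -> str:
    """Encode the two-question test (a simplified, hypothetical helper)."""
    # How much, how often? Heavy, constant traffic needs a database chosen for it.
    if heavy_constant_traffic:
        return "database chosen for the workload"
    # Who reads it? Shared or synced data belongs in a database, by default hosted Postgres.
    # (At modest volume a spreadsheet also works.)
    if shared_or_synced:
        return "hosted Postgres"
    # Only the creator, on one device: a file is fine.
    return "file"


print(pick_storage_form(shared_or_synced=False, heavy_constant_traffic=False))  # file
print(pick_storage_form(shared_or_synced=True, heavy_constant_traffic=False))   # hosted Postgres
```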

Relational versus document, in plain English

Database articles sort the options into two camps, and the distinction is simpler than it looks. Relational databases (Postgres, MySQL, SQLite) hold tables with rows and columns, like a strict spreadsheet. Every row in the users table has the same columns, and one query can answer questions that span tables, like "every comment by every user who signed up in January." Document databases (MongoDB, Firestore) hold JSON-style records whose structure is not enforced by default, so two records in the same collection can carry different fields.
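To make the January example concrete, here is a minimal sketch using SQLite through Python's standard `sqlite3` module, so it runs without a server. The `users` and `comments` tables, their columns, and the sample rows are all made up for illustration; the point is that a single query joins two tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    signed_up TEXT NOT NULL  -- ISO date, e.g. '2024-01-09'
);
CREATE TABLE comments (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    body    TEXT NOT NULL
);
INSERT INTO users (id, name, signed_up) VALUES
    (1, 'Ada',   '2024-01-09'),
    (2, 'Grace', '2024-02-20');
INSERT INTO comments (user_id, body) VALUES
    (1, 'Love the new layout'),
    (1, 'Export is slow'),
    (2, 'Where is dark mode?');
""")

# One query spans both tables: every comment by every user who signed up in January.
rows = conn.execute("""
    SELECT users.name, comments.body
    FROM comments
    JOIN users ON users.id = comments.user_id
    WHERE strftime('%m', users.signed_up) = '01'
    ORDER BY comments.id
""").fetchall()

for name, body in rows:
    print(f"{name}: {body}")
```

Grace's comment is left out because she signed up in February; the join does the cross-table work that a document store would leave to your application code.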

Pick relational unless you have a specific reason not to. The strictness works in your favor, because the database itself refuses malformed records, and you can still change the structure later through controlled steps called migrations.
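Both halves of that claim, refusing bad records and changing structure through migrations, fit in a short sketch. It again uses SQLite via Python's `sqlite3`; because SQLite is more permissive about column types than Postgres, the sketch relies on explicit `NOT NULL`, `UNIQUE`, and `CHECK` constraints. The `MIGRATIONS` list and `migrate` function are a hypothetical, bare-bones stand-in for a real migration tool, using SQLite's `user_version` pragma to remember which steps have already run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE users (
        id    INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE CHECK (email LIKE '%_@_%')
    )
""")
conn.execute("INSERT INTO users (email) VALUES (?)", ("ada@example.com",))

# The database itself refuses malformed records: missing, badly shaped, duplicate.
rejected = []
for bad_email in [None, "not-an-email", "ada@example.com"]:
    try:
        conn.execute("INSERT INTO users (email) VALUES (?)", (bad_email,))
    except sqlite3.IntegrityError as err:
        rejected.append((bad_email, str(err)))

# Migrations: ordered structural changes, each applied exactly once.
# SQLite's user_version pragma records how many have run so far.
MIGRATIONS = [
    "ALTER TABLE users ADD COLUMN plan TEXT NOT NULL DEFAULT 'free'",
]


def migrate(conn: sqlite3.Connection) -> None:
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    for version, statement in enumerate(MIGRATIONS[current:], start=current + 1):
        conn.execute(statement)
        conn.execute(f"PRAGMA user_version = {version}")
    conn.commit()


migrate(conn)
migrate(conn)  # running again is a no-op: nothing new to apply
print(rejected)
print(conn.execute("SELECT email, plan FROM users").fetchall())
```

Existing rows pick up the new column's default, which is what makes a migration a controlled step rather than a rewrite.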

The two kinds of data in an AI product

Once a model enters your product, data splits into two kinds, and the split clears up most of the mystery in how these products behave.

What the model is given this turn. Every request to a model carries everything available for that one answer: instructions, the user's question, snippets retrieved from documents, the recent turns of the conversation. The bundle is called the prompt, or the context. Your backend assembles it fresh for every request, the model returns an answer, and the bundle is gone. It is the same round trip you traced in The journey of a request, with a heavier payload. The bundle also has a hard ceiling called the context window, the maximum amount of text one request can include.
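A minimal sketch of that per-request assembly, assuming a hypothetical `build_prompt` helper and a made-up limit. Real context windows are counted in tokens, not characters; characters keep the sketch free of tokenizer dependencies. Each call starts from nothing, and when the bundle is too big, the oldest conversation turns are dropped first.

```python
# Hypothetical budget, in characters rather than the tokens real models use.
CONTEXT_WINDOW_CHARS = 2000


def build_prompt(instructions: str, question: str, snippets: list[str],
                 recent_turns: list[str], limit: int = CONTEXT_WINDOW_CHARS) -> str:
    """Assemble one request's context from scratch, trimming old turns to fit the limit."""
    turns = list(recent_turns)  # copy: the caller's stored history is not modified

    def assemble() -> str:
        return "\n\n".join([instructions, *snippets, *turns, question])

    prompt = assemble()
    # Drop the oldest conversation turns first until the bundle fits.
    while len(prompt) > limit and turns:
        turns.pop(0)
        prompt = assemble()
    if len(prompt) > limit:
        raise ValueError("instructions, snippets, and question alone exceed the limit")
    return prompt


history = [f"turn {i}: " + "x" * 50 for i in range(10)]
prompt = build_prompt("Be brief.", "What changed?", ["snippet A"], history, limit=300)
print(len(prompt))
```

Nothing in `build_prompt` survives the call; the next request rebuilds the bundle from whatever the product passes in.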

What your product stores. Accounts, documents, preferences, and conversation history are ordinary records in the ordinary forms above, durable between sessions, queryable, and yours to protect.

Figure: In the prompt, this turn: instructions, the question, retrieved snippets, recent turns (rebuilt for every request, then gone). Stored by your product: accounts, documents, history (durable, queryable, yours to protect). Caption: Products feel like they remember because they store, then re-send.

The bridge between the two columns is the entire illusion of memory. When an AI product greets you by name, matches your writing style, or picks up last week's thread, the model did not retain any of it between requests, because it cannot. The product stored those details as records and packed them into the next prompt.
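The store-then-re-send pattern fits in a few lines. In this sketch the `profiles` table and the `remember` and `prompt_for` helpers are hypothetical; the model call itself is left out, because the model only ever sees the string that `prompt_for` returns.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE profiles (user_id TEXT PRIMARY KEY, name TEXT, style TEXT)")


def remember(user_id: str, name: str, style: str) -> None:
    # Stored data: a durable record that outlives any single conversation.
    db.execute("INSERT OR REPLACE INTO profiles VALUES (?, ?, ?)", (user_id, name, style))
    db.commit()


def prompt_for(user_id: str, message: str) -> str:
    # Context: rebuilt for this one request from whatever the product stored.
    lines = []
    row = db.execute(
        "SELECT name, style FROM profiles WHERE user_id = ?", (user_id,)
    ).fetchone()
    if row is not None:
        name, style = row
        lines.append(f"The user's name is {name}. Match their style: {style}.")
    lines.append(f"User: {message}")
    return "\n".join(lines)


remember("u1", "Ada", "short and direct")
print(prompt_for("u1", "Pick up where we left off last week."))
```

A user with no stored record gets a prompt with no personal details, which is exactly how the "memory" disappears in a product that stores nothing.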

When a product seems to remember you, it stored the detail as a record and re-sent it in the next prompt.

The design question that follows is what to store and what to send each turn.

NotebookLM puts both kinds of data on screen where you can see them

Google's NotebookLM makes both kinds of data visible on one screen. You upload sources, such as PDFs, notes, and transcripts, then ask questions, and it answers only from the sources you uploaded, with every answer citing back into them.

Read it as a dissection. The sources are stored data, durable records that wait for you between sessions. Each answer rebuilds its context from them: the product pulls the relevant passages, packs them into a prompt alongside your question, and the model returns an answer whose citations point straight back into the stored sources. Close the tab tonight and the sources are still there tomorrow, while the context behind each answer was assembled for it and then discarded. Most AI products run on exactly this machinery and hide it. NotebookLM leaves it showing, which makes it worth a careful look from any builder.
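The same machinery can be sketched in miniature. This is not NotebookLM's method: it swaps in naive word overlap for ranking passages, where production systems often use embedding-based search. The `retrieve` and `grounded_prompt` helpers, the citation-label format, and the sample sources are all hypothetical. What it does show is the shape of the pipeline: stored sources, passages pulled per question, and labels the answer can cite.

```python
import re


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(sources: dict[str, str], question: str, k: int = 2) -> list[tuple[str, str]]:
    """Return up to k (citation label, passage) pairs, ranked by word overlap with the question."""
    asked = words(question)
    scored = []
    for source_name, text in sources.items():
        passages = [chunk.strip() for chunk in text.split("\n\n") if chunk.strip()]
        for number, passage in enumerate(passages, start=1):
            overlap = len(asked & words(passage))
            if overlap:
                scored.append((overlap, f"{source_name}, passage {number}", passage))
    scored.sort(key=lambda item: item[0], reverse=True)  # stable: ties keep source order
    return [(label, passage) for _, label, passage in scored[:k]]


def grounded_prompt(sources: dict[str, str], question: str) -> str:
    cited = retrieve(sources, question)
    context = "\n".join(f"[{label}] {passage}" for label, passage in cited)
    return ("Answer only from the passages below and cite them by label.\n\n"
            f"{context}\n\nQuestion: {question}")


sources = {  # stored data: uploaded once, kept between sessions
    "interview.txt": "Customers want faster exports.\n\nThe team meets on Fridays.",
    "roadmap.md": "Exports move to a background job in Q3.\n\nDark mode ships later.",
}
print(grounded_prompt(sources, "When do exports get faster?"))
```

The sources dictionary persists; the prompt is built per question and thrown away, and the bracketed labels are what let an answer point back into the stored material.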

Pick the smallest form that fits

For a first build, the move is to list what you would actually store, then choose the smallest form that holds it. A tool only you use can keep its data in files. A signup list, a tracker, or a small internal catalog runs honorably on a spreadsheet, and Sheets or Airtable hands you an admin screen for free. Shared records, user accounts, or anything written by many people at once points to hosted Postgres. We have started more than one build on a spreadsheet, and the later upgrade to a database was an afternoon of work rather than a rewrite. Moving up a form is a routine, well-documented job, while dragging unneeded infrastructure through your first months costs you every week.
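To give a feel for why moving up a form is routine, here is a minimal sketch of the spreadsheet-to-database step: take a CSV export of the sheet and load it into a table with real constraints. SQLite stands in for hosted Postgres so the example runs without a server, and `SHEET_CSV`, the `signups` table, and `load_sheet_into_db` are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical CSV export of a signup sheet; the header row becomes the column names.
SHEET_CSV = """email,name,signed_up
ada@example.com,Ada,2024-01-09
grace@example.com,Grace,2024-02-20
"""


def load_sheet_into_db(csv_text: str, conn: sqlite3.Connection) -> int:
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    conn.execute("""
        CREATE TABLE IF NOT EXISTS signups (
            email     TEXT PRIMARY KEY,
            name      TEXT NOT NULL,
            signed_up TEXT NOT NULL
        )
    """)
    # Named placeholders are filled from each row's dict, keyed by the sheet's headers.
    conn.executemany(
        "INSERT INTO signups (email, name, signed_up) VALUES (:email, :name, :signed_up)",
        rows,
    )
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
count = load_sheet_into_db(SHEET_CSV, conn)
print(count, conn.execute("SELECT name FROM signups ORDER BY signed_up").fetchall())
```

The sheet's columns map directly onto table columns; the new part is that the database now rejects a duplicate email or a missing name, which the spreadsheet never did.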

Try it now

No setup: Audit one AI product you already use, such as a chat assistant, an email tool that drafts replies, or a meeting notetaker. Write two lists. The first list holds what it keeps across sessions: saved conversations, uploaded files, custom instructions, anything a settings page shows it has retained about you. That is stored data. The second list holds what survives only within one conversation: the details you gave it ten messages ago that are absent when you open a fresh thread. That is context, rebuilt every request. If the product seems to remember something across sessions, find the stored record that explains it.

With your tools: Open Claude Code, describe the product you are circling, and ask: "Split this product's data into what we store durably and what we assemble into the model's prompt on each request, as two lists. Then name the smallest storage form that fits the stored list." Challenge anything heavy in its answer, and ask what would have to change before you graduate a form. In Codex or Cursor the move is the same: put the two-list question to the sidebar chat and push back on the storage pick. If your tools are not set up yet, the Setup Clinic gets you there in one sitting.

Chapter Summary

  • Data lives in one of four places: a file on disk, a spreadsheet, a database, or a data warehouse.
  • Pick the smallest form that holds what you need, since moving up a form later is a routine job, not a rewrite.
  • Once records are shared between users or synced across devices, you want a database, and hosted Postgres is the safe default.
  • Prefer a relational database unless you have a specific reason not to, because its strictness keeps malformed records out.
  • An AI product holds two kinds of data: what you store durably, and what your backend packs into the model's prompt for a single request.
  • The prompt, also called the context, is built fresh for every request and then discarded, and it has a size limit called the context window.
  • When a product seems to remember you, the product stored the detail as a record and re-sent it in the next prompt; the model kept nothing.
  • Store less by default, because data you never keep cannot leak, cannot be subpoenaed, and cannot be sent to the wrong place.
  • Every form on that menu still needs a machine to run on, and renting that machine is the subject of Hosting, renting a computer on the internet.

Sources

  • Google NotebookLM documentation on source-grounded answers and citations.