A plain-English field guide
Built in four tiers, each a bit deeper than the last. Read the first to get the shape, keep going as far as you fancy. No code, no jargon left unexplained.
01
Building software by describing what you want in plain English to an AI, and judging the result by whether it works, not by reading the code.
You steer on outcomes. The AI writes, runs and fixes the actual code underneath. The skill isn't typing code. It's knowing what you want, recognising when it's wrong, and describing the fix clearly.
The work always moves in the same loop:
Things being a bit wrong on the first pass is the normal state, not a sign of failure. Most of vibe coding is that fourth step: spotting what's off and saying so. Get comfortable there and the whole thing stops feeling stressful.
02
Almost every app is the same five pieces. Learn these names and you can follow any conversation about how an app works.
Everything the user sees and taps, running on their phone or browser. It mostly asks for things and displays what comes back. It holds no real secrets and stores nothing permanently itself.
Small programs on a server that each do one job: fetch a question, mark an answer, take a payment. The screen asks; a worker does the actual work and sends the result back.
The permanent filing cabinet: users, scores, who has paid. Anything that must survive the app being closed and reopened lives here. A common one is Supabase.
When an app needs content written or judged on the fly (a fresh question, marking a written answer), a worker calls an AI model and gets text back.
A service that runs the whole thing on the public internet so anyone can reach your web address. Vercel is a popular one. "Deploying" is handing it a new version to put live.
An app is a screen that asks, workers that do, a memory that remembers, a brain that writes, and a landlord that hosts it. Everything below is a zoom-in on one of those.
03
The most useful idea in the whole guide. There are two versions of any app: the local copy on the computer where work happens, and the live version that real users actually hit.
They are different copies, and a change isn't real for users until it's deployed to the live one. You can change a colour, see it look perfect on your screen, and users still see the old version, because your change is sitting on the local copy and hasn't been pushed live yet.
When something "should work but doesn't," ask first: is the live version actually running the change I think it is? Very often the fix was made but never deployed, so users are still on the old one.
04
Apps rarely break with a clear "here's the problem" message. They tend to misbehave in a handful of classic ways. Knowing the shapes means you can describe what you're seeing, which is most of the battle.
Describe exactly what you saw and did: "I tapped play and nothing happened, no error." That short, precise description points straight at one of the shapes above. Vague ("it's broken") makes it ten times slower to find.
05
Git is a time machine with save points. As an app is built, you make commits — labelled snapshots of the whole project at a moment ("added login," "fixed scoring"). Each one is frozen forever, so the project is a long chain of save points going back to the start.
A branch is a parallel copy where you can try something risky without touching what works. The live version lives on a branch called main. To attempt a big redesign you make a new branch, go wild there, and only merge it back into main once it's good. Until then, main keeps serving real users untouched.
Git runs on a computer; GitHub is the website where projects live in the cloud — the master copy everyone agrees on. You push your commits up to it and pull the latest down.
It's the real answer to "what if my laptop dies." Once code is pushed to GitHub, the laptop is just a window onto it. Lose the laptop, get a new one, pull everything back in minutes. Your work isn't tied to any one machine.
06
The supporting cast you'll hear named constantly.
07
You know it's the filing cabinet. Here's the actual structure, because it's simpler than it sounds.
A database is a set of tables, and a table is basically a spreadsheet. An app might have a users table (one row per person, with columns for id, email, whether they've paid) and a scores table (one row each time someone answers a question).
The clever part is the link between tables. Every user has a unique id, and every score row carries that id. So "show me this person's progress this week" means: find their id, then pull every score row with that id from the last seven days. You link by id instead of copying someone's name onto hundreds of rows. That linking is why it's called a "relational" database.
08
A sane app never keeps your real password. It stores a hash — your password run through a one-way scrambler that cannot be reversed. When you log in, it scrambles what you typed and compares the two scrambles. A match means you're right, and the real password was never kept anywhere. (A pinch of per-user randomness, called salt, stops two identical passwords producing the same scramble.)
Proving your password once is step one. Every later tap also has to prove it's still you, but you don't retype the password each time. On success the server hands your device a token — a signed pass — stored in a cookie, and every later request quietly carries it. If that pass expires or stops matching, the server stops trusting it and you get logged out.
09
Computers only truly understand binary — on and off. A programming language is a human-writable layer that gets translated down to that. There are many because there are many jobs; you wouldn't frame a house with a screwdriver.
A web app is unusual in that it speaks several at once, each handling one layer:
What's on the page: this is a heading, this is a button. No looks, no behaviour.
Colour, spacing, fonts, layout. The brand colours, the rounded buttons, the readable type. Pure looks.
Behaviour: what happens when you tap. It's the only language browsers run, so every interactive site uses it — and it also runs server workers, so it covers both ends of an app.
JavaScript plus "types": you declare that a score is a number and an email is text, and the computer catches the mix-ups before anything ships. Common in serious modern apps.
How a worker asks the filing cabinet for exactly the rows it needs.
Not always in an app's core, but the language of AI and data work, so you'll hear it constantly.
React isn't a language — it's a framework, a big pre-built kit on top of JavaScript that makes building app screens far less painful. A framework is to a language what a kit car is to raw metal: same underlying material, most of the hard structure already made for you.
10
A large language model — the AI you build with, and the AI inside an app writing content — is a giant next-chunk predictor. Trained on an enormous amount of text, it learned the patterns of language and code well enough to continue any prompt sensibly. Predict the next bit, over and over, very fast. That's the whole trick.
The AI doesn't read letters or whole words. It reads tokens — chunks of roughly three-quarters of a word. A short word is one token; a longer or unusual one is two or three. Both what you send in and what comes back are measured in tokens.
You pay per token, in and out, so longer prompts and longer answers cost more and take longer. It's why apps that generate a lot of content often use cheaper or free models for the high-volume work, and save the expensive, cleverer models for the genuinely hard tasks.
The AI can only hold so many tokens in view at once — a lot, but finite. Push past it and the oldest parts drop out of sight. This is why very long conversations sometimes "forget" things said near the start, and why building with AI involves keeping the important context in front of it.
Training is the months-long, hugely expensive process where the model's maker teaches it patterns from data. It happens once. Using the finished model to answer — every reply, every piece of content an app generates — is called inference, and it's cheap and quick by comparison. You never pay to train; you pay small amounts to use. Bigger models are smarter but slower and pricier, so the craft is matching the model to the job.
The AI you build with: a clever, conversational assistant that writes and fixes your app's code.
The AI inside the app: a model your app calls to generate or mark content for users, live and at scale, usually on a tight budget.
Same underlying technology, very different jobs and costs.
It predicts plausible text, not verified truth. The same machinery that writes a clean answer can write a confident wrong one — that's called a hallucination. It's exactly why good apps put guardrails around AI output (checking, auto-marking, limits) rather than trusting it blindly. Fluent is not the same as correct.
11
In vibe coding the human isn't redundant — the human is the product brain. The valuable work is:
Describe outcomes, not solutions. Give concrete examples ("like Duolingo's streak"). Test the real thing on a real device. When something breaks, say exactly what you saw. And push back when an answer is vague or unproven.
12