AI Without the Hype — a field guide, from nesting dolls to agents

Say the word "AI" at a dinner table and you'll get five different pictures in five different heads. Someone thinks of the chatbot that helped write their last email. Someone thinks of the recommendation feed that knows them a little too well. Someone thinks of a Hollywood robot with opinions about humanity. Someone thinks of their job. And someone thinks it's all a bubble.

They're not wrong, exactly. They're each holding a different piece of the same thing. "AI" has become one of those words that's used so often it's stopped meaning anything specific — like "cloud" a decade ago, or "digital" before that.

This is a field guide to the real thing underneath the word. No hype, no doom, no jargon left undefined. By the end you'll be able to tell the difference between AI, machine learning, generative AI, and the "agents" everyone suddenly can't stop talking about — and, more usefully, you'll know when to reach for these tools and when to keep your hands off.

I build with these tools every day, so this isn't a view from the bleachers. It's what I'd tell a smart friend who asked, "Okay, but what is all this, really?"

01 The nesting dolls

Here's the single most useful thing to hold in your head. These terms aren't rivals. For the most part they're nested — each one lives inside the last, like Russian dolls.

Artificial Intelligence ⊃ Machine Learning ⊃ Deep Learning ⊃ Generative AI → Agentic AI

The map on one screen. The first four nest cleanly — each is a narrower kind of the one before. The fifth is different: today's agents are generative AI given tools and a loop — built on top, not strictly inside. More on that below.

The biggest doll, Artificial Intelligence, is the whole ambition: machines doing things we'd call "intelligent" if a person did them. It's old — the term dates to a 1956 workshop at Dartmouth College — and most of it, for most of its history, had nothing to do with the chatbots of today.

Inside it sits Machine Learning: instead of a programmer writing the rules, the machine learns the rules from examples. Inside that is Deep Learning, a particular style of machine learning using neural networks with many layers. Inside that is Generative AI — the deep-learning systems that produce new content: text, images, code, audio. And the newest layer, built on top of these, is Agentic AI: generative models wired up so they can take actions in the world, not just produce a wall of text.

A note on that last one, because a careful reader will spot it. The first four really do nest — each is a smaller, more specific kind of the thing around it. Agents don't fit that pattern. An agent isn't a narrower kind of generative AI; it's a generative model with scaffolding bolted on — tools, memory, a control loop. Think of the fifth doll less as sitting inside the fourth and more as the fourth given hands. (Autonomous agents in the older sense — game-players, robots — predate generative AI entirely. But the ones everyone's talking about now are built right here.) We'll come back to why that distinction matters for safety.

When a headline says "AI," it almost always means one of the two smallest dolls. The confusion comes from using the biggest word for the smallest thing. Let's open each one.

02 The idea that changed everything

For decades, making a computer "smart" meant a human writing down the rules. Want to catch spam? A programmer writes: if the email contains "FREE MONEY," flag it. This works right up until spammers write "FR€€ M0NEY," and now you're writing rules forever, losing a war one clever spelling at a time.

Machine learning flipped the approach. Instead of writing the rules, you show the machine thousands of emails already labelled spam or not spam, and let it figure out the patterns itself. Nobody tells it to look for "€" — it discovers, from the examples, that certain patterns tend to go with spam. That's the whole idea, and it's genuinely profound: the program isn't written, it's grown from data.
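Here is a toy sketch of that contrast in Python. The six labelled emails are made up, and the learned half is a tiny naive Bayes word counter. That is one classic spam-filtering technique among many. Real filters learn from millions of examples and far richer signals.

```python
import math
from collections import Counter

def rule_based_is_spam(email):
    # The old way: a hand-written rule that only knows one exact phrase.
    return "FREE MONEY" in email.upper()

def train(examples):
    # The learned way: count words in emails labelled spam (True) or not spam (False).
    word_counts = {True: Counter(), False: Counter()}
    label_counts = Counter()
    for text, is_spam in examples:
        label_counts[is_spam] += 1
        word_counts[is_spam].update(text.lower().split())
    return word_counts, label_counts

def learned_is_spam(email, word_counts, label_counts):
    vocab = set(word_counts[True]) | set(word_counts[False])
    scores = {}
    for label in (True, False):
        # Start from how common the label is, then add evidence from each word (in log space).
        score = math.log(label_counts[label] / sum(label_counts.values()))
        total_words = sum(word_counts[label].values())
        for word in email.lower().split():
            # Add-one smoothing: an unseen word lowers the score instead of zeroing it out.
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return scores[True] > scores[False]

# Hypothetical labelled examples.
examples = [
    ("claim your fr€€ m0ney now", True),
    ("fr€€ prize click now", True),
    ("win cash now click here", True),
    ("lunch meeting moved to friday", False),
    ("here are the notes from the meeting", False),
    ("can you review my draft", False),
]
word_counts, label_counts = train(examples)

new_email = "click now for fr€€ cash"
print(rule_based_is_spam(new_email))                            # False: the rule misses it
print(learned_is_spam(new_email, word_counts, label_counts))    # True: the learned model flags it
```

Nobody wrote a rule about "fr€€" or "click". The learned version picked those patterns up from the labelled examples.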

This is why your photos app can find every picture of your dog, why your bank flags a strange transaction, and why Netflix has an unsettlingly good guess about your Friday night. None of those were hand-coded rule by rule. They were trained on examples.

"Training" is worth pausing on, because it's the word that trips people up. Training is the phase where the model looks at mountains of examples and slowly adjusts millions, billions, sometimes trillions of internal numbers — called parameters or weights — until its guesses match reality. It's expensive, slow, and done once (or occasionally). "Inference" is the opposite: the everyday act of using the finished model to get an answer. Training is writing the textbook; inference is looking something up in it. When people worry about AI's energy use, they're often picturing training — a massive one-time cost — but the quieter, constant cost is billions of inferences a day.

Deep learning, the next doll in, is just machine learning done with neural networks: layers of simple math loosely inspired by how neurons connect. Stack enough layers ("deep") and feed them enough data, and these networks turned out to be shockingly good at messy, real-world patterns — images, sound, and, crucially, language. Everything that follows is deep learning.
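To show what "layers of simple math" means, here is a sketch of a tiny network's forward pass. Each layer takes weighted sums of its inputs, then applies a simple non-linearity. ReLU is used here as one common choice. The weights are hand-picked and hypothetical. In a real network, training sets them, and deep networks stack many more, much wider layers.

```python
def relu(values):
    # The non-linearity between layers: keep positives, zero out negatives.
    return [max(0.0, v) for v in values]

def dense(inputs, weights, biases):
    # One layer: each output is a weighted sum of all inputs plus a bias.
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

# Hypothetical weights and biases for a 2 -> 3 -> 1 network.
layers = [
    ([[0.5, -0.2], [0.3, 0.8], [-0.6, 0.1]], [0.0, 0.1, 0.0]),  # 2 inputs -> 3 hidden units
    ([[1.0, -1.0, 0.5]], [0.2]),                                 # 3 hidden units -> 1 output
]

def forward(inputs):
    x = inputs
    for i, (weights, biases) in enumerate(layers):
        x = dense(x, weights, biases)
        if i < len(layers) - 1:  # no activation on the final output layer
            x = relu(x)
    return x

print(forward([1.0, 2.0]))  # roughly [-1.7]
```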
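The training-versus-inference split described earlier in this section can be sketched with a deliberately tiny model. It has a single parameter, so its prediction is w × hours. The four examples are made up. Training repeatedly nudges w to shrink the error, using gradient descent, the standard workhorse. Inference just multiplies. Real models do the same kind of nudging across billions of parameters.

```python
# Hypothetical examples: (hours, cost). The hidden pattern is cost = 3 * hours.
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0), (4.0, 12.0)]

def train(data, steps=200, learning_rate=0.01):
    """Training: repeatedly adjust w so predictions move closer to the examples."""
    w = 0.0  # start from a bad guess
    for _ in range(steps):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= learning_rate * grad  # small step in the direction that reduces error
    return w

def infer(w, x):
    """Inference: use the finished parameter. Nothing is learned here."""
    return w * x

w = train(data)                   # slow, done once
print(round(w, 3))                # prints 3.0
print(round(infer(w, 10.0), 1))   # prints 30.0; fast, done every time you ask
```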

03 Generative AI, and the one trick behind it

Here's where the last few years happened.

Most machine learning classifies: spam or not, dog or cat, fraud or fine. Generative AI produces. Ask it a question and it writes you a paragraph. Describe a picture and it paints one. It creates content that never existed before — hence "generative."

The systems behind text — Large Language Models, or LLMs — run on a trick that sounds almost too simple to be interesting: they predict the next word.

That's it. That's the engine. Give the model "The capital of France is" and it predicts the most likely next chunk of text: "Paris." Give it the first half of your email and it predicts the second. It does this over and over, one piece at a time, each new word feeding back in to predict the next. (That's the raw engine. The polite, helpful assistant you actually talk to is this engine plus a layer of training that reshapes what it chooses to say — but next-word prediction is what's humming underneath all of it.)
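The predict-a-word-and-feed-it-back loop can be sketched with a toy "model". It just counts which word followed which in a tiny made-up corpus, which is called a bigram model. This only illustrates the loop. A real LLM scores every token in its vocabulary using everything in its context window, not just the previous word.

```python
from collections import Counter, defaultdict

# A hypothetical, tiny "training corpus".
corpus = ("the capital of france is paris . "
          "the capital of italy is rome . "
          "the capital of france is paris .").split()

# "Training": count which word follows which.
next_counts = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    next_counts[current][following] += 1

def predict_next(word):
    followers = next_counts.get(word)
    if not followers:
        return "."  # nothing learned about this word, so stop
    return followers.most_common(1)[0][0]  # the most frequent follower

def generate(prompt, max_words=5):
    words = prompt.split()
    for _ in range(max_words):
        nxt = predict_next(words[-1])
        words.append(nxt)  # each new word feeds back in as input
        if nxt == ".":
            break
    return " ".join(words)

print(generate("the capital of"))  # the capital of france is paris .
```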

People hear this and shrug — "so it's fancy autocomplete?" And, mechanically, yes. But something strange and important happens at scale. To predict the next word well across billions of examples — code, poetry, physics papers, arguments, recipes, jokes — the model is forced to build internal representations of grammar, facts, tone, reasoning patterns, the shape of a good explanation. Predicting the next word well enough, across enough of human writing, turns out to require a working model of an enormous amount of how the world gets described. Autocomplete this good stops feeling like autocomplete.

A few mechanics worth knowing, because they explain almost every quirk you'll meet:

Tokens. The model doesn't read letters or whole words — it reads tokens, chunks of text roughly ¾ of a word on average. "Copenhagen" might be one token; an unusual name might be several. Everything is counted, priced, and limited in tokens. It's part of why models can stumble on letter-level tasks like counting the r's in a word — they work in chunks, not characters, so spelling-level operations don't come naturally (even though newer models handle the classic examples fine).
Embeddings. Each token gets turned into a long list of numbers, called an embedding, that positions it in a vast "meaning space" where related ideas sit near each other. "King" lands near "queen"; "Copenhagen" near "Denmark." This is how a machine that only does arithmetic can operate on meaning: it turns meaning into geometry.
The context window. The model can only "see" so much text at once — its context window, measured in tokens. Everything you've said in a conversation, plus its own replies, has to fit. Run past it and the earliest parts fall out of view — which is why a long chat can seem to "forget" how it began. Windows have grown enormously, but they're never infinite.
Temperature. Because the model deals in probabilities for the next token, there's a dial — temperature — for how adventurous it gets. Low temperature: it leans hard toward the likeliest words, sounding focused and a bit repetitive (turn it all the way down and it's nearly deterministic). High temperature: it takes more risks, sounding more creative and more prone to wandering. Same model, different personality, one number.
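Here is a rough sketch of how a chat might stay inside its context window. It assumes the ¾-word rule of thumb for counting tokens and a "drop the oldest messages first" policy. Both are simplifications. Real systems count with the model's own tokenizer, and some summarise or pin early messages instead of dropping them. The function names are hypothetical.

```python
def estimate_tokens(text):
    # Rule of thumb: a token is about 3/4 of a word, so tokens ≈ words / 0.75.
    # Real systems count exactly with the model's own tokenizer.
    return round(len(text.split()) / 0.75)

def fit_to_window(messages, window_tokens):
    """Keep the most recent messages that fit; the oldest fall out of view first."""
    kept, used = [], 0
    for message in reversed(messages):
        cost = estimate_tokens(message)
        if used + cost > window_tokens:
            break
        kept.append(message)
        used += cost
    return list(reversed(kept))

conversation = [
    "My name is Ada and I am planning a trip to Lisbon.",
    "Great, what dates are you thinking about?",
    "Early May, for about five days.",
    "What was my name again?",
]

for message in fit_to_window(conversation, window_tokens=30):
    print(message)
```

With a 30-token budget, the estimates come to 16, 9, 8 and 7 tokens. Only the last three messages fit, at 24 tokens. The message containing the name has fallen out of view, which is the "forgetting" described above.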
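The "meaning as geometry" idea comes down to comparing the directions of vectors. Here is a sketch using cosine similarity, a common way to compare embeddings. The 3-number vectors are invented for illustration. Real embeddings are learned during training and run to hundreds or thousands of numbers.

```python
import math

# Hypothetical, hand-picked 3-number embeddings.
embeddings = {
    "copenhagen": [0.9, 0.1, 0.2],
    "denmark":    [0.8, 0.2, 0.1],
    "king":       [0.1, 0.9, 0.3],
    "queen":      [0.2, 0.8, 0.4],
    "banana":     [0.1, 0.1, 0.9],
}

def cosine_similarity(a, b):
    """1.0 means pointing the same way; values near 0 mean unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(word):
    # Find the other word whose vector points most nearly the same way.
    others = (w for w in embeddings if w != word)
    return max(others, key=lambda w: cosine_similarity(embeddings[word], embeddings[w]))

print(nearest("copenhagen"))  # prints: denmark
print(nearest("king"))        # prints: queen
```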
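Temperature is usually applied in two steps. First, the model's raw scores are divided by the temperature. Then a softmax turns them into probabilities, and the next word is sampled from those. Here is a minimal sketch with four hypothetical candidate words and made-up scores:

```python
import math
import random

def softmax_with_temperature(scores, temperature):
    """Turn raw next-token scores into probabilities; temperature must be > 0."""
    scaled = [s / temperature for s in scores]
    top = max(scaled)  # subtracting the max avoids overflow without changing the result
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical candidates and scores for the word after "The capital of France is".
candidates = ["Paris", "Lyon", "beautiful", "a"]
scores = [5.0, 2.0, 1.5, 1.0]

for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(scores, t)
    print(t, {w: round(p, 2) for w, p in zip(candidates, probs)})

# Sampling: draw the next word at random, weighted by the probabilities.
rng = random.Random(42)  # fixed seed so the demo is repeatable
print(rng.choices(candidates, weights=softmax_with_temperature(scores, 2.0), k=5))
```

At 0.2, almost all the probability piles onto "Paris". At 2.0, the alternatives get a real chance. The temperature can't be exactly zero here. "All the way down" is typically handled by simply taking the top-scoring token.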

Why it makes things up

Now the most important thing to understand about generative AI, the thing that should shape how you use it: it is a fluent guesser, not a database.

An LLM has no lookup table of facts it consults. It generates the most plausible-sounding continuation based on patterns it learned. Most of the time, plausible and true line up — the patterns of real language are mostly about real things. But when they don't, the model will produce something beautifully written, perfectly confident, grammatically flawless, and completely false. A court case that never happened. A citation to a paper that doesn't exist. A quote the person never said.

This is called hallucination, and here's the part people miss: it's not a bug being patched out — it's the flip side of the exact same mechanism that makes the model useful. The system that improvises fluent language for you is the system that improvises fluent nonsense. You can reduce it — better training, giving the model real sources to work from, teaching it to say "I don't know" — but you can't fully separate the creativity from the confabulation, because they're the same act. A model built so it could never venture past what it can verify would lose much of what lets it write you something genuinely new.

How confident an LLM sounds tells you nothing about whether it's right. It is exactly as smooth and certain when it's wrong as when it's right.

That's the single rule to tattoo somewhere. Fluency is not truth.

One recent wrinkle

Reasoning models. Instead of answering in one shot, newer models can spend extra effort thinking step-by-step before they reply, working things out in a private scratchpad. This noticeably improves maths, code and multi-step logic. It also reduces, but never eliminates, the confident-nonsense problem, because the model gets a chance to check its own work.

It doesn't repeal the rule above — a reasoning model can still be fluently wrong — but on hard problems it's a real step up from pure first-guess autocomplete. It's the biggest shift of the last couple of years, and it's worth knowing the option exists.

04 Grounding it: "read these first"

If a model makes things up because it's guessing from memory, the obvious fix is: don't make it guess from memory. Hand it the actual documents and tell it to answer from those.

That's the whole idea behind RAG — Retrieval-Augmented Generation, an ugly name for a sensible trick. When you ask a question, the system first retrieves the relevant text — from your company's files, a manual, a knowledge base — often by turning each passage into its own point in that same meaning space and grabbing the ones nearest your question. It pastes those into the model's context window, and then asks the question. The model answers from what's in front of it, not from fuzzy memory. It can even point you to where each claim came from — though it's still worth spot-checking that the source actually says what it's cited for.

RAG in one picture: fetch the real passages, put them in front of the model, then ask. It changes the game from "trust my memory" to "here's the source, check my work."
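The retrieve-then-ask flow can be sketched in a few lines. To keep it self-contained, this version ranks passages by the words they share with the question. That is a crude stand-in for the "nearest in meaning space" embedding search real systems use. It also stops at building the prompt, since the model call depends on whichever service you use. The passages and the instruction wording are hypothetical.

```python
import re

# Hypothetical knowledge base; in practice, chunks of your own documents.
passages = [
    "You can get a refund within 30 days of purchase if you have a receipt.",
    "Our office is closed on public holidays.",
    "Shipping to Europe takes 5 to 7 business days.",
]

def words(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, passages, k=1):
    """Rank passages by shared words (a stand-in for embedding similarity)."""
    q = words(question)
    ranked = sorted(passages, key=lambda p: len(q & words(p)), reverse=True)
    return ranked[:k]

def build_prompt(question, passages, k=1):
    # Paste the retrieved sources into the prompt, numbered so they can be cited.
    sources = retrieve(question, passages, k)
    numbered = "\n".join(f"[{i + 1}] {s}" for i, s in enumerate(sources))
    return (
        "Answer using only the sources below and cite them by number. "
        "If they don't contain the answer, say you don't know.\n\n"
        f"Sources:\n{numbered}\n\nQuestion: {question}"
    )

print(build_prompt("How many days do I have to get a refund?", passages))
```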

This is why "chat with your PDF" and most useful company AI tools work the way they do. It doesn't make hallucination impossible — the model can still misread what it's given — but it moves you from blind trust to something checkable. That's most of what separates a toy from a tool.

05 Agentic AI, or giving the model hands

Everything so far is a model that talks. You ask, it answers, the loop ends. Useful, but passive — a brilliant intern who can only ever hand you a memo.

Agentic AI is what happens when you give that intern hands and a to-do list. The breakthrough is unglamorous: you let the model use tools. You tell it, "you can search the web, run this code, query this database, send this email, book this calendar slot" — and, critically, you let it decide when to reach for each one and what to do next based on the result.

That last part is the whole ballgame. An agent runs in a loop:

The agent loop. It keeps going — perceive, plan, act, observe — deciding each step from what the last one returned, until the goal is met or it gives up.

Ask a plain chatbot to "book me a table for four near the office Thursday" and it writes you a lovely paragraph about how it can't do that. Ask an agent with the right tools, and it checks your calendar, searches restaurants, reads the availability, picks one, books it, and adds it to your calendar — deciding each step based on what the last step returned. The intern stopped writing memos and started running the errand.
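Here is a cut-down version of the restaurant errand as a bare-bones agent loop. The tools are hypothetical stubs. `plan_next_step` is a scripted stand-in for the model. In a real agent, an LLM makes that decision by reading the goal and the history of tool results. The shape is what matters: plan, act, observe, repeat. A step limit keeps the loop from running forever.

```python
# Hypothetical tools; a real agent would call actual calendar, search, and booking services.
def check_calendar(day):
    return {"day": day, "free_at": "19:00"}

def search_restaurants(near):
    return [
        {"name": "Trattoria Uno", "slots": ["18:00", "19:00"]},
        {"name": "Bistro Deux", "slots": ["20:30"]},
    ]

def book(name, time, party):
    return {"booked": name, "time": time, "party": party}

TOOLS = {"check_calendar": check_calendar,
         "search_restaurants": search_restaurants,
         "book": book}

def plan_next_step(goal, history):
    """Scripted stand-in for the model: choose the next tool call from what has happened."""
    seen = {step["tool"]: step["result"] for step in history}  # perceive
    if "check_calendar" not in seen:
        return "check_calendar", {"day": goal["day"]}
    if "search_restaurants" not in seen:
        return "search_restaurants", {"near": goal["near"]}
    if "book" not in seen:
        free = seen["check_calendar"]["free_at"]
        for place in seen["search_restaurants"]:
            if free in place["slots"]:
                return "book", {"name": place["name"], "time": free, "party": goal["party"]}
        return None  # nothing fits: stop rather than guess
    return None  # goal met

def run_agent(goal, max_steps=10):
    history = []
    for _ in range(max_steps):  # hard cap so a confused agent can't loop forever
        step = plan_next_step(goal, history)  # plan
        if step is None:
            break
        tool_name, args = step
        result = TOOLS[tool_name](**args)  # act
        history.append({"tool": tool_name, "args": args, "result": result})  # observe
    return history

for step in run_agent({"day": "Thursday", "near": "the office", "party": 4}):
    print(step["tool"], "->", step["result"])
```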

This is genuinely powerful, and it's also where the sharp edges are. An agent that can act can act wrongly — book the wrong night, email the wrong person, delete the wrong file, spend real money, and do it faster than you can catch it. It compounds its own mistakes: one bad observation leads to a bad plan leads to three more bad actions. Every capability you hand an agent is also a way it can go wrong without asking. The engineering that matters most in agents isn't making them capable — it's the guardrails: what they're allowed to touch, and when they have to stop and ask a human first. (This is the "given hands" caveat from the very first section, made real.)
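Here is a sketch of one common guardrail pattern. It has two parts: an allowlist of tools, and a human approval check in front of anything irreversible. The set of risky tools and the `approve` callback are hypothetical. In a real app, `approve` might show a confirmation dialog.

```python
# Hypothetical policy: actions risky or irreversible enough to need a human "yes".
NEEDS_APPROVAL = {"send_email", "delete_file", "make_payment"}

def guarded_call(tool_name, args, tools, approve):
    """Run a tool only if it is known and, when risky, only after a human approves."""
    if tool_name not in tools:  # allowlist: the agent can't touch what it wasn't given
        return {"refused": tool_name, "reason": "unknown tool"}
    if tool_name in NEEDS_APPROVAL and not approve(tool_name, args):
        return {"skipped": tool_name, "reason": "human declined"}
    return {"result": tools[tool_name](**args)}

# Hypothetical tools standing in for real side effects.
tools = {
    "search_web": lambda query: f"results for {query}",
    "delete_file": lambda path: f"deleted {path}",
}

def decline_everything(tool_name, args):
    # A real approve() would ask a person; this one always says no, for the demo.
    print(f"Agent wants to run {tool_name} with {args}: declined.")
    return False

print(guarded_call("search_web", {"query": "weather"}, tools, decline_everything))
print(guarded_call("delete_file", {"path": "report.docx"}, tools, decline_everything))
print(guarded_call("make_payment", {"amount": 500}, tools, decline_everything))
```

Low-risk lookups go straight through. The file deletion waits for a person. The payment tool was never granted, so it is refused outright.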

If 2023 was the year AI learned to talk, the years since have been about teaching it to do — carefully.

06 The jargon decoder

The terms you'll actually run into, one line each:

Model: The trained system itself — the "brain." Everything else is plumbing around it.
LLM: Large Language Model — a model trained on vast amounts of text to predict and generate language.
Token: The chunk of text a model reads and writes in; roughly ¾ of a word. What you're charged and limited by.
Prompt: What you send the model: your question plus any instructions and context.
Context window: The maximum amount of text (in tokens) a model can consider at once.
Hallucination: When a model states something false with total confidence. A side effect of guessing: it can be reduced, but not patched out like an ordinary glitch.
Training vs. inference: Learning from data (once, expensive) vs. using the finished model (constantly).
Fine-tuning: Training a general model a bit more on specialised examples to shape its behaviour.
RAG: Feeding the model real source documents at question time so it answers from them, not from memory.
Parameters / weights: The millions-to-trillions of internal numbers a model tunes during training. A rough proxy for size.
Reasoning model: One that "thinks" step-by-step before answering, trading extra time for better hard-problem accuracy.
Multimodal: A model that handles more than text, such as images, audio, and video, often together.
Agent: A model given tools and a loop, so it can take actions, not just produce text.

07 How a model gets its manners

One more piece, because it explains why today's assistants feel so different from the raw technology. A model is built in roughly three stages:

Pretraining. The model reads a staggering amount of text and learns, purely by predicting the next token, the patterns of language and the world as described in it. Out of this comes something knowledgeable but wild — a savant with no sense of what you actually want.
Fine-tuning. It's then trained on curated examples of good responses — helpful, on-format, following instructions — which shapes the raw ability into something that behaves like an assistant.
Alignment (RLHF and friends). Finally, humans (and increasingly other AI) rate its answers, and the model is nudged toward the ones people prefer. RLHF (Reinforcement Learning from Human Feedback) is the best-known version. This stage is a big part of why the model is polite and hedges on dangerous questions. It also nudges the model to admit uncertainty, though imperfectly: the same training can make a model sound more agreeable and confident than it should, which is exactly why fluency still isn't truth.

None of this makes the model know things the way a database does — it's still, underneath, the fluent guesser. It just learned to guess in a more helpful, better-behaved way. (One caveat that matters in 2026: the same reinforcement-learning machinery, pointed at problems with checkable answers like maths and code, is also how those "reasoning" models are trained to think in steps — so this stage can add real capability now, not just polish.)

08 Using AI well

So how should you actually use this? The honest framing: an LLM is a fast, tireless, wildly well-read intern with no memory of yesterday, occasional confident delusions, and zero stake in being right. Manage it like that and you'll get enormous value. Trust it like an oracle and it'll eventually embarrass you.

It's genuinely excellent at:

First drafts of anything. Emails, outlines, plans, code, cover letters. A blank page is expensive; a mediocre draft to react to is cheap. The model closes that gap instantly.
Transforming text you already have. Summarise this report, reformat these notes, make this friendlier, translate this. Here it barely guesses — the source is right there — so it's at its most reliable.
Explaining and tutoring. Ask it to explain something at any level, then ask "why?" five times. A patient tutor that never sighs is a real gift for learning.
Brainstorming and un-sticking. Forty names, ten angles, the counter-argument to your own position. Quantity on demand, and you keep the good ones.
Coding. Boilerplate, unfamiliar syntax, explaining an error, a rough first version. This is where the productivity leap is most real right now — including how much of my own work gets built.

A few habits that separate people who get a lot from AI from people who get burned:

Give it context. "Improve this" is weak. "Rewrite this for a skeptical CFO, under 150 words, plain English" is strong. The model can't read your mind — only your prompt. Vague in, vague out.
Ask for the reasoning, then check it. "Walk me through how you got there" both improves the answer and gives you something to sanity-check.
Treat every factual claim as a lead, not a verdict. Names, numbers, dates, quotes, citations, legal or medical specifics — verify before you rely on them. The fluent-guesser problem is always present.
Keep a human in the loop for anything that matters — especially the moment it can act, not just suggest.

09 When not to use it

Just as important, and less often said. Some of these are "don't," and some are "only with your eyes open."

High-stakes decisions with no human check. Medical, legal, financial, safety. Use it to understand your options, to prepare questions, to draft — never as the final word where a confident hallucination could genuinely hurt you. It doesn't know it's wrong, and it can't be held responsible. You can.
Anything needing current, exact truth it can't look up. A plain model's knowledge has a cutoff and no live feed. Today's price, this week's news, a real-time number — either give it a tool to look it up, or don't ask.
Confidential or personal data you don't control. Assume anything you paste into a consumer tool may be stored or used to improve the service unless you've checked the terms and settings. Client secrets, health info, passwords, proprietary code — know the tool's data policy before you hand it something you can't take back.
Things you can't verify. If a claim is confidently stated and you have no way to check it, you don't have an answer — you have a guess wearing a suit. In a domain where you can't tell right from wrong-but-plausible, the tool is at its most dangerous, because it removes the friction that used to make you look things up.
When checking costs more than doing it yourself. Sometimes verifying a plausible-looking answer takes longer than just doing the task. Notice when that's true.

And a quiet one: the skill you delegate is the skill you lose. Using AI to get unstuck builds you up; using it to never think hollows you out. Let it handle the toil; keep doing the thinking you actually want to be good at. The goal is a sharper you, not a dependent one.

10 The traffic-light rule

If you remember nothing else, remember three lights.

Most bad AI experiences are just a yellow treated like a green, or a red treated like a yellow.

🟢 Green — go. Low stakes, easy to verify, or you're transforming material you already have. Drafts, summaries, brainstorms, explanations, code you'll test, translation you can gut-check. Let it rip.
🟡 Yellow — verify. Useful, but check before you trust or send. Facts, figures, names, citations, anything that leaves your desk with your name on it. Use it, then confirm it.
🔴 Red — stop, or human-only. High-stakes and unverifiable. Final medical/legal/financial calls, confidential data in tools you don't control, letting an agent act irreversibly without a human gate. Slow down.
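If it helps to see the rule as a checklist, here is one possible reading of it as a function. The parameter names are hypothetical. The mapping is an interpretation of the three lights above, not a substitute for judgement: the inputs are exactly the calls you have to make yourself.

```python
def traffic_light(high_stakes, easy_to_verify, transforming_own_material=False,
                  irreversible_without_human=False, confidential_in_untrusted_tool=False):
    """One reading of the three lights; every input is a judgement call you make."""
    # Red: the explicit stop cases.
    if confidential_in_untrusted_tool or irreversible_without_human:
        return "red"
    if high_stakes and not easy_to_verify:
        return "red"
    # Green: low stakes, and either checkable or a transformation of your own material.
    if not high_stakes and (easy_to_verify or transforming_own_material):
        return "green"
    # Everything else: use it, then confirm it.
    return "yellow"

print(traffic_light(high_stakes=False, easy_to_verify=True))   # a first draft: green
print(traffic_light(high_stakes=True, easy_to_verify=True))    # figures in a client report: yellow
print(traffic_light(high_stakes=True, easy_to_verify=False))   # a final medical call: red
```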

11 The honest bottom line

The hype says AI is either about to solve everything or take everything. Up close, from inside the work, it's neither. It's a genuinely new kind of power tool — the biggest lever on knowledge work in a generation — that happens to be fluent, tireless, occasionally and confidently wrong, and completely without judgement about when its output matters.

That combination rewards a specific kind of person: not the one who trusts it most, and not the one who refuses to touch it, but the one who knows what it's doing under the hood and stays in the loop. The person who can tell green from yellow from red. The person who lets it draft, and does the deciding themselves.

You now know what's inside all five nesting dolls, why the thing makes things up, and how to tell a good job for it from a dangerous one. That's more than most people arguing about AI online can say.

The tool is remarkable. The person who verifies wins.