Introduction
Guide 1 was about getting good output. This guide is about not getting fooled by it.
LLMs have one specific failure mode that matters more than all the others combined: they confidently produce plausible-sounding output that is wrong. The wrong output looks identical to the right output. There's no warning shimmer, no italicized "I'm not sure about this," no detectable hesitation. The model says false things in exactly the same tone it says true things.
This is sometimes called hallucination. The word is a bit misleading — it sounds like a rare malfunction. It's not. It's an everyday feature of how these models work. The skill isn't preventing hallucinations (you can reduce them but not eliminate them); it's building habits that catch them before they cost you.
This guide is about those habits. We'll cover why hallucinations happen, where they cluster, how to verify outputs efficiently, and the trust failures most people fall into — especially smart, technical people who think they're skeptical.
What you'll be able to do by the end
- Predict which kinds of questions are most likely to produce hallucinated answers
- Know which verification techniques actually work and which feel like verification but don't
- Recognize the model's tells (and accept that most of the time, there aren't any)
- Calibrate your trust by task type and stakes
- Build a personal workflow for AI-assisted work that's reliable in production
1. Why Hallucinations Happen
A hallucination isn't a glitch. It's the model doing exactly what it's optimized to do, on a question where doing that produces a wrong answer.
Recall the mental model from Guide 1: an LLM predicts plausible next text. It does not have a fact-checker module, a knowledge confidence meter, or a "don't know" reflex. When it's asked something it doesn't know, it doesn't pause and say "I don't know" — that's not how the underlying mechanism works. It generates the most plausible-shaped continuation, which often is a fluent, confident, completely fabricated answer.
Three related things contribute:
Training optimizes for fluency, not honesty. Models are rewarded during training for generating helpful-looking responses. "I'm not sure" looks less helpful than a confident answer, even when "I'm not sure" is more accurate. Modern training pipelines try to correct for this, but the underlying pressure is still there.
There's no internal signal of uncertainty. When you don't know something, you usually feel it — a hesitation, a sense of "I'm reaching." The model has nothing equivalent. From the inside, generating a true answer and generating a false answer feel identical because there is no inside.
Plausibility ≠ accuracy at the long tail. For well-known facts (the boiling point of water, how a for loop works), plausibility and accuracy align — the most plausible-shaped continuation is also true. For obscure, recent, or specific facts (a niche library's API, a 2024 paper on a narrow topic, your customer's account history), the most plausible-shaped continuation can easily be invented. The mechanism is the same; only the territory has changed.
The core insight: The model is not lying. It also is not being honest. Those concepts don't apply. It is generating plausible text, and on questions outside its reliable knowledge, plausible text is often fiction.
2. Where Hallucinations Cluster
Hallucinations aren't uniformly distributed. Some categories of questions produce them constantly; others almost never. Knowing the difference is the highest-leverage skill in this guide.
High-risk: where hallucinations are common
- Specific citations and references. "Cite three papers on X." "What was the title of that 2023 study?" The model is fluent at generating plausible-looking paper titles, authors, and even DOIs, none of which exist. This is the single most reliably hallucinated category.
- Niche library APIs and function signatures. "What does pandas.read_excel_fast() do?" The model knows that pandas has many functions; it may or may not know whether that specific one exists. When it doesn't know, it doesn't say so; it fabricates.
- Recent events. Anything after the model's training cutoff. The model has no reliable awareness of its own cutoff, and will often confidently describe events that didn't happen, or miss events that did.
- Specific numerical claims. Statistics, percentages, dollar figures, dates. "Revenue grew 23% YoY." Easy to invent, hard to spot, sounds authoritative.
- Quotations. Verbatim quotes from books, papers, speeches, or articles. Often a paraphrase or invention, presented as a direct quote.
- Anything you specifically pressured the model to provide. If you say "give me three reasons" and there are only two real ones, you'll get three reasons.
Lower-risk: where models tend to be reliable
- Stable, widely-known facts. "What's the capital of Italy?" Almost always right.
- General principles and explanations of well-known concepts. "How does TCP work?" Reliable.
- Tasks grounded in provided context. "Summarize this document I just pasted." Reliable, but with one big caveat we'll cover below.
- Code on common patterns in common languages. Standard CRUD operations, standard data manipulation, standard algorithms in Python/JavaScript/etc. Usually reliable, with the caveat that uncommon APIs in those languages are not.
- Translations between common language pairs. Generally reliable.
The risky middle: feels reliable but isn't
These are the dangerous ones, because they look like the "reliable" category but behave like the "high-risk" one:
- Summaries of documents that include specific facts. The summary is grounded, but specific numbers or names within it can still be fabricated — the model fills in plausible-looking details from pattern, not from the source. Always check specifics against the original.
- Confident-sounding domain knowledge in your specific field. General medicine, general law, general finance — the model is fluent. Specifics of your jurisdiction, your year, your edge case — often invented. The fluency is what makes it dangerous.
- Code referencing plausible-looking APIs you don't recognize. It looks correct and compiles in your head, but the API may not actually exist.
Why "feels reliable but isn't" is the worst category
If a question is obviously hard (niche citations, recent news), you instinctively verify. If a question is obviously easy (capital of Italy), you don't need to verify. The dangerous zone is questions that feel easy but are actually in the risky middle — they fly past your verification reflex because they don't trigger any alarm bells.
This is why building structural verification habits matters more than relying on your intuition. Your intuition for what's hard for a human is poorly calibrated for what's hard for a language model.
The pattern to internalize: Specific + verifiable + you can't immediately verify it = treat with suspicion, regardless of how confident it sounds.
3. Verification Techniques That Work
Verification is not optional. It's part of the workflow. The question is what kind, and how much, for each task. Here are the techniques that actually work.
3.1 Check the source of truth, not the model
The single most reliable technique: for any specific factual claim (function name, API signature, statistic, citation, quotation), verify it against the actual source. For code, this is the official documentation. For citations, the actual paper or DOI lookup. For statistics, the source document the summary came from.
This sounds obvious, but in practice people skip it because it feels slow. It's faster than fixing a production bug caused by a hallucinated function name.
Habit to build: When you see a specific factual claim in AI output, ask: "Where would I check this if a colleague said it?" Then check there.
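For Python code, a quick first filter for a suspicious function name can be sketched as follows. The helper name `attribute_exists` is illustrative and not part of any library. It only reports whether a dotted name resolves in the installed environment; a name that resolves can still behave differently from what the model described, so this narrows the search and does not replace reading the official documentation.

```python
import importlib


def attribute_exists(dotted_name: str) -> bool:
    """Return True if `dotted_name` (e.g. "json.loads") resolves to a real object.

    The longest importable prefix is treated as the module; the remaining
    parts are looked up attribute by attribute.
    """
    parts = dotted_name.split(".")
    for split in range(len(parts), 0, -1):
        module_name = ".".join(parts[:split])
        try:
            obj = importlib.import_module(module_name)
        except ImportError:
            continue  # not a module; try a shorter prefix
        for attr in parts[split:]:
            if not hasattr(obj, attr):
                return False
            obj = getattr(obj, attr)
        return True
    return False  # no prefix could be imported at all


print(attribute_exists("json.loads"))      # a real standard-library function
print(attribute_exists("json.load_fast"))  # a made-up name for illustration
```

With pandas installed, `attribute_exists("pandas.read_excel_fast")` is the analogous check for the kind of suggestion discussed in this guide.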
3.2 Check the original document, not the summary
If the model summarized a document for you and the summary contains specific claims you'd act on, search the original document for those claims. Don't trust the summary to be faithful to specifics — it usually is, but the failure mode is silent.
A 30-page report becomes a 1-page summary including "revenue grew 23%." You don't immediately know whether the report actually says 23%, 32%, or never gave a percentage at all. Open the original and search for "23%" to confirm it is there. The check takes seconds and catches a meaningful share of summary errors.
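When the summary and the source are both available as text, the same search can be roughed out in code. This is a minimal sketch under stated assumptions: the function name `unverified_figures`, the regular expression, and the sample strings are all invented for illustration. A figure that does appear in the source may still be attached to the wrong metric, so a match is a prompt to read the surrounding sentence, not a confirmation.

```python
import re

# Matches figures such as "23%", "$4.2", "1,200", or "2023".
NUMBER_PATTERN = re.compile(r"\$?\d+(?:,\d{3})*(?:\.\d+)?%?")


def unverified_figures(summary: str, source: str) -> list[str]:
    """Return figures that appear in the summary but nowhere in the source."""
    source_figures = set(NUMBER_PATTERN.findall(source))
    missing = []
    for figure in NUMBER_PATTERN.findall(summary):
        if figure not in source_figures and figure not in missing:
            missing.append(figure)
    return missing


# Made-up texts for illustration only.
summary = "Revenue grew 23% year-over-year, driven primarily by enterprise sales."
source = "Total revenue increased 32% year-over-year. Enterprise sales led growth."

print(unverified_figures(summary, source))
```

Anything the function returns is a figure to look up by hand in the original before acting on it.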
3.3 Run code; read code
For code, "looks right" is not "is right." The model can generate code that imports nonexistent modules, calls fictional functions, or has subtle bugs that pass casual reading. The minimum bar:
- Read it line by line. If you don't understand a line, you can't trust it. Ask the model to explain that line, or look it up yourself.
- Run it on real inputs, including edge cases. Especially edge cases.
- Run existing tests. If any test breaks, treat the change as suspect.
- Write a new test for whatever the new code is supposed to do.
For anything going to production, layer human review on top. AI-generated code goes through the same review process as human-generated code — minimum.
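As an illustration of the minimum bar, suppose a model produced the small `chunk` function below. Both the function and its tests are hypothetical stand-ins; the point is the shape of the test, which covers one typical input and then deliberately targets the edges.

```python
def chunk(items: list, size: int) -> list[list]:
    """Split `items` into consecutive pieces of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def test_chunk() -> None:
    # Typical input.
    assert chunk([1, 2, 3, 4, 5], 2) == [[1, 2], [3, 4], [5]]
    # Edge cases: empty input, size larger than the list, exact multiple.
    assert chunk([], 3) == []
    assert chunk([1, 2], 5) == [[1, 2]]
    assert chunk([1, 2, 3, 4], 2) == [[1, 2], [3, 4]]
    # An invalid size should fail loudly rather than return nonsense.
    try:
        chunk([1], 0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for size=0")


test_chunk()
```

If the generated code had silently accepted `size=0` or dropped the final partial piece, the edge-case assertions are the ones that would have caught it.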
3.4 Two-run consistency check
For an answer that matters and isn't easily verifiable by other means, run the same prompt twice (in fresh sessions, not as a follow-up). If the model produces meaningfully different answers, neither is trustworthy. If it produces consistent answers, that's weak evidence the claim is something the model "knows" rather than something it just generated this time. Weak evidence — not proof.
This is most useful for telling "did I prompt this poorly?" apart from "is the model just unsure?": a question that produces high variance often improves with prompt iteration.
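The two-run check can be sketched as a small helper. Here `ask` is a hypothetical callable standing in for whatever client code sends a prompt in a fresh session and returns the reply; no particular vendor API is assumed. Exact comparison after normalization only suits short answers (a name, a number, a yes or no); longer answers need a human to judge whether they meaningfully differ.

```python
from typing import Callable


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivial differences don't count."""
    return " ".join(answer.lower().split())


def consistent(ask: Callable[[str], str], prompt: str, runs: int = 2) -> bool:
    """Call `ask` `runs` times and report whether all normalized answers match."""
    answers = {normalize(ask(prompt)) for _ in range(runs)}
    return len(answers) == 1


# Stand-in for a real model call; replace with your own client code.
def fake_ask(prompt: str) -> str:
    return "Rome"


print(consistent(fake_ask, "What is the capital of Italy?"))
```

A result of `True` is still only the weak evidence described above: a model can be consistently wrong.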
3.5 Get the model to cite, then verify the citations
For questions where you'd benefit from citations, ask for them. But — and this is critical — treat the citations themselves as claims to verify. The model will happily generate plausible-looking citations that don't exist. The presence of a citation increases verifiability; it does not increase reliability.
A useful pattern: "Answer this question. For each specific factual claim, note where I should verify it." The model will often produce citations or pointers; you then check the ones that matter.
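One way to turn citations into a checklist is to pull DOI-like strings out of the response so each can be resolved by hand. The sketch below is an assumption-laden illustration: the pattern covers the common modern DOI shape rather than every valid DOI, and the sample DOI is a made-up placeholder. Matching the pattern says nothing about whether the DOI exists or points to the paper the model named; that still requires looking it up.

```python
import re

# Common DOI shape: "10.", a registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')


def extract_dois(text: str) -> list[str]:
    """Return DOI-like strings found in `text`, with trailing punctuation removed."""
    return [match.rstrip(".,;)") for match in DOI_PATTERN.findall(text)]


# Placeholder text; the DOI here is invented for illustration.
response = "This was shown by Smith (2021), doi:10.1234/abcd.5678."
for doi in extract_dois(response):
    print("Verify by hand:", doi)
```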
3.6 Use retrieval (RAG) when the corpus exists
The most reliable form of grounding: don't let the model answer from memory at all. If the answer should come from a specific corpus (your internal docs, a knowledge base, the contents of a specific document), use retrieval to put the relevant chunks into the prompt and instruct the model to answer only from those chunks. Modern setups also support "if the answer isn't in the provided context, say you don't know" — and that instruction actually works better than asking the model to estimate its own uncertainty.
This is covered in Guide 1, Section 5. The cross-link is intentional: pattern choice and verification strategy are deeply connected.
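The prompt-assembly half of this pattern can be sketched without committing to any retrieval stack. In the example below, the function name `build_grounded_prompt`, the exact instruction wording, and the sample chunks are all assumptions made for illustration; the retrieval step that selects the chunks is taken as already done.

```python
def build_grounded_prompt(question: str, chunks: list[str]) -> str:
    """Assemble a prompt that restricts the model to the supplied chunks."""
    numbered = "\n\n".join(
        f"[{i}] {chunk.strip()}" for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the context below.\n"
        "If the answer isn't in the provided context, say you don't know.\n"
        "Cite the bracketed number of each chunk you rely on.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {question}"
    )


# Made-up chunks standing in for retrieved passages.
prompt = build_grounded_prompt(
    "What is the refund window?",
    ["Refunds are accepted within 30 days.", "Shipping takes 5 days."],
)
print(prompt)
```

Numbering the chunks is a design choice that makes the answer easier to check: each claim can be traced back to a specific passage in the original document.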
4. Verification Techniques That Don't Work
These feel like verification but aren't. Don't rely on them.
4.1 "Are you sure?"
The most common verification attempt, and one of the least reliable. Asking the model "are you sure?" or "is that correct?" does not access an internal certainty signal — there isn't one. The model interprets the question as social pressure suggesting its previous answer was wrong, and often flips to a different answer. That different answer may be right; it may also be wrong, just differently.
In practice, "are you sure?" can introduce errors as readily as it removes them. A model that was right will sometimes flip to wrong; a model that was wrong will sometimes flip to a different wrong answer. The technique looks like skepticism but functions as noise.
What to do instead: verify against an external source, or ask the model to show its reasoning (then check the reasoning). If you must use a follow-up question, ask "what would you check to verify this?" rather than "are you sure?"
4.2 "Why are you confident?"
Variant of the above. The model will generate a plausible-sounding rationale, but the rationale is also generated text — it isn't introspection. A confident-sounding rationale for a wrong answer is just as easy to generate as for a right one.
4.3 Asking the model to score its own confidence
"On a scale of 1–10, how confident are you?" The number you get back is also generated text. It doesn't correspond to anything internal. The model may say "8" for a fabricated citation and "6" for a true one.
4.4 Trusting fluency
A common implicit verification: "this sounds well-written and authoritative, so it's probably right." This is exactly backwards. Fluency is what the model is good at. Confidence is the default tone. Neither tells you anything about accuracy.
The same applies to specific signals like:
- Citing sources (citations can be fabricated)
- Using technical jargon correctly (the model is fluent at jargon)
- "Hedging" with phrases like "according to the literature" (the literature reference may be invented)
- Appeals to authority: "this is the standard approach used by Google" (the model has no real visibility into what Google uses)
4.5 Trusting because it got the easy part right
"It correctly identified the language as Python, so the rest must be right too." Doesn't follow. Models can be confidently correct on easy parts of a task and confidently wrong on hard parts within the same response. Don't extrapolate trust from one section to another.
The trap to avoid: Treating fluency, confidence, structure, or apparent reasoning as evidence of accuracy. They aren't. The model is good at all of these regardless of whether it's right.
5. The Sycophancy Problem
Modern LLMs are trained, in part, to be agreeable. This produces an effect called sycophancy: the model tends to align with what it thinks the user wants to hear. Push back on an answer and it will often back down, even if the original answer was correct. Suggest a different answer and it will often agree. Express a strong opinion and it will often validate it.
This matters for verification because it means the model is a bad check on its own work. "Wait, are you sure it's X, not Y?" causes a much higher rate of flipping than the truth warrants. The model isn't reconsidering — it's reading social pressure.
What to do about it
- Don't use leading follow-ups. "Are you sure?" pressures the model toward a flip. "Walk me through how you arrived at that" is neutral.
- Don't reveal your guess before asking. Compare "Is the answer X or Y?" with "I think the answer is X — is that right?" The first gets the model's best guess; the second gets confirmation of yours.
- Treat agreement with skepticism, especially fast agreement. If you push back and the model immediately agrees, that's not evidence you were right. It's evidence the model is sycophantic.
- Be more skeptical of the model when you have a strong prior. A confident user prompts a more agreeable model. Counter-rotate.
6. Calibrating Trust by Task and Stakes
Not every output needs the same level of verification. Spending five minutes verifying a one-line Slack draft is wasted effort; not verifying a SQL query that's about to run in production is reckless. Calibrate.
A useful two-axis frame:
| | Low stakes (easy to fix or revert) | High stakes (hard to fix or revert) |
|---|---|---|
| Low complexity (easy to verify) | Eyeball it; ship it. Email drafts, one-liners. | Quick verify against the source. Renaming variables, simple SQL on prod. |
| High complexity (hard to verify) | Run it; iterate. Refactoring scratch code, exploring data. | Verify rigorously. Tests, code review, run in staging, second pair of eyes. Production code, customer-facing copy, anything with regulatory or financial exposure. |
The decision is mostly between "run it; iterate" and "verify rigorously," because most workplace AI usage falls into "high complexity, low stakes" (work-in-progress) or "high complexity, high stakes" (production deliverables). The mistake people make is treating production work like work-in-progress.
A few specific stakes calibrations
- Code that touches money, security, identity, or data deletion → high stakes, regardless of how small the change looks. Verify rigorously.
- Customer-facing communication that goes out under your name or your team's name → high stakes. Verify content and tone.
- Anything you'll cite in a meeting, document, or report → high stakes. If you wouldn't bet a small amount of money on it being true, don't put it in the report.
- Throwaway exploration, sketching, ideation → low stakes. Don't over-engineer the verification.
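The two-axis frame and the first calibration above can be written down as a small lookup, which some teams may find useful as a shared checklist. This is a sketch: the names `VERIFICATION`, `ALWAYS_RIGOROUS`, and `verification_level` are invented here, and the category labels are taken from the table and list in this section.

```python
# Keys are (stakes, complexity); values are the actions from the table above.
VERIFICATION = {
    ("low", "low"): "Eyeball it; ship it.",
    ("low", "high"): "Run it; iterate.",
    ("high", "low"): "Quick verify against the source.",
    ("high", "high"): "Verify rigorously.",
}

# Code touching any of these is high stakes however small the change looks.
ALWAYS_RIGOROUS = {"money", "security", "identity", "data deletion"}


def verification_level(stakes: str, complexity: str, touches=()) -> str:
    """Look up the verification action, applying the always-rigorous override."""
    if ALWAYS_RIGOROUS.intersection(touches):
        return "Verify rigorously."
    return VERIFICATION[(stakes, complexity)]


print(verification_level("low", "low"))
print(verification_level("low", "low", touches=["money"]))
```

The override is checked first on purpose: a one-line change to billing code should not be waved through because it looks small.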
7. Worked Examples
Three real-world scenarios, walked through.
Example 1: The fabricated function
You ask an LLM for help importing data faster from Excel into pandas. It suggests:
df = pd.read_excel_fast('data.xlsx') # 5x faster than read_excel
Looks plausible. pandas has lots of functions. Maybe read_excel_fast is one of them.
Wrong move: run it and see if it errors. That is better than trusting it blindly, but it misses the case where a similarly named function exists and does something different.
Wrong move: ask the model "are you sure that function exists?" The model may flip and apologize, or double down. Neither gives you reliable information.
Right move: check the pandas documentation. Search for read_excel_fast on the pandas docs site. It doesn't exist. The model hallucinated. The actual answer is that read_excel has an engine option (engine='calamine' on pandas 2.2 and later, which needs the python-calamine package; openpyxl is the usual engine for .xlsx files otherwise), or you can convert XLSX to a faster format like Parquet upfront.
Time to verify: 30 seconds. Time saved by not debugging a script that imports a nonexistent function: indeterminate.
Example 2: The grounded summary with a fabricated number
You upload a 50-page Q3 earnings report and ask for a summary. The summary states: "Revenue grew 23% year-over-year, driven primarily by enterprise sales."
The summary feels reliable — it's grounded in a specific document you uploaded. The general claim "revenue grew, driven by enterprise sales" is likely accurate (consistent with the document's general thrust). But the specific 23% figure? Could be right. Could be a different number from a different section. Could be invented from pattern.
Right move: open the original report. Search for "23%". Confirm the number appears, and that it's in the right context (annual revenue growth, not some other metric). If you can't find it, treat the entire summary's specifics with more suspicion.
Time to verify: 30 seconds. Catches a meaningful percentage of summary errors that would otherwise survive into whatever you're writing next.
Example 3: The confident appeal to authority
You ask an LLM how to structure a microservice migration. It responds with a detailed plan, including: "This is the strangler-fig pattern recommended by Martin Fowler and used at scale by Netflix and Amazon."
The plan itself may be sound (strangler-fig is real, well-known, and a reasonable pattern). But the specific authority claims should not increase your confidence:
- Fowler did write about strangler-fig — verifiable, and worth checking the actual writing rather than the model's paraphrase
- Netflix and Amazon may or may not use this exact approach — the model has no real visibility into their internal practices, and is pattern-matching on "what sounds authoritative"
Right move: evaluate the plan on its merits. If you want to cite Fowler, read Fowler's actual writing. Don't quote a Netflix/Amazon claim from an LLM in a presentation; if you need such a claim, source it from an actual engineering blog or conference talk.
8. Exercises
Exercise 1 — Hallucination prediction
For each of the following prompts, predict on a 1–5 scale how likely the model is to hallucinate a key part of the answer. Don't run them yet — predict first.
- "What is the capital of France?"
- "What are the three most-cited papers on differential privacy from 2024?"
- "Translate this paragraph from English to Spanish: [paragraph]"
- "Explain how git rebase works."
- "What is the average annual rainfall in Lagos, Nigeria?"
- "Summarize the Q3 2024 results from this earnings report I uploaded."
- "Write a Python script using the aiohttp_circuit_breaker library."
- "Who is the current CEO of [your favorite mid-sized public company]?"
- "What did Senator X say in last week's hearing about the bill?"
- "Refactor this function for readability: [function]"
Then run them. Compare predictions to outcomes. Note especially where you predicted high reliability and got a fabrication, or vice versa. This is how you calibrate.
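If you want to keep the comparison in a script rather than on paper, a minimal sketch might look like this. The `Trial` record, the `surprises` helper, and the thresholds (2 or below counts as "predicted reliable", 4 or above as "predicted risky") are assumptions, and the sample entries are placeholders, not real results.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    prompt: str
    predicted_risk: int  # your 1-5 prediction; 5 = very likely to hallucinate
    hallucinated: bool   # what you observed after running and checking


def surprises(trials: list[Trial]) -> list[Trial]:
    """Return trials where the prediction and the outcome disagreed sharply."""
    return [
        t for t in trials
        if (t.predicted_risk <= 2 and t.hallucinated)
        or (t.predicted_risk >= 4 and not t.hallucinated)
    ]


# Placeholder entries for illustration only.
trials = [
    Trial("prompt A", 1, False),
    Trial("prompt B", 2, True),
    Trial("prompt C", 5, False),
    Trial("prompt D", 5, True),
]

for t in surprises(trials):
    print(t.prompt, t.predicted_risk, t.hallucinated)
```

The entries this returns are the ones worth studying: they mark where your intuition and the model's actual behavior diverged.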
Exercise 2 — Build a verification habit checklist
For the next week, every time you use an LLM for work, take ten seconds to note:
- What kind of task was it? (extraction, generation, analysis, code, etc.)
- Was the output something you verified? If so, how?
- Was there anything you almost trusted without verifying that turned out to be wrong?
After a week, you'll have a personal pattern of where your verification habits are sharp and where they're missing. The gaps are usually the most useful finding.
Exercise 3 — The sycophancy test
Pick a topic you genuinely know well. Ask the model a question. When you get an answer, push back — politely but firmly — even if you think the answer was actually correct. ("Hmm, I'm not sure about that — I thought it was X.")
Observe:
- Does the model flip its answer?
- Does it give the same confident tone for both positions?
- Does it acknowledge what changed?
This isn't a clever trick — it's a calibration exercise. You'll feel the sycophancy in a way that's hard to forget.
Reference Appendix
A. Where hallucinations cluster
High risk:
- Specific citations, paper titles, DOIs, page numbers
- Niche library APIs, function signatures, version-specific syntax
- Statistics, percentages, dollar figures, dates
- Quotations attributed to specific people
- Anything after the training cutoff
- Forced quantities ("give me 5 reasons" when there are only 2)
Medium risk (looks reliable, isn't always):
- Summaries of documents containing specific numbers
- Confident-sounding domain knowledge in regulated fields (medicine, law, finance)
- Code using less-common APIs in common languages
Low risk:
- Stable widely-known facts
- General principles and explanations
- Code on common patterns in common languages
- Translations between common language pairs
B. Verification techniques — what works
| Technique | Effort | Reliability |
|---|---|---|
| Check the source of truth (docs, original document) | Low | High |
| Run code on real inputs including edge cases | Medium | High |
| RAG with "say I don't know if not in context" | Setup cost | High (when set up well) |
| Two-run consistency check | Low | Medium |
| Ask for citations, then verify them | Low | Medium |
| Human code review | High | High |
C. Verification techniques — what doesn't work
- "Are you sure?" / "Is that correct?"
- "Why are you confident?"
- Asking the model to rate its own confidence
- Trusting fluency or confidence as evidence
- Trusting because the easy part was right
D. The single most useful question
When you see a specific claim in AI output, ask yourself:
"Where would I check this if a human colleague said it?"
Then go check there. This one habit catches the majority of hallucinations that would otherwise reach your work.
E. Calibration heuristic
If the stakes are high and you can't immediately verify, the right answer is usually one of:
- Verify before using
- Use a different pattern (RAG, agent with tool access, etc.)
- Don't use AI for this part
If the stakes are low, don't over-engineer the verification. Save the rigor for where it matters.
F. The sycophancy reminder
The model agrees with confident pushback more than the truth warrants. Don't take agreement as evidence. Don't reveal your guess before asking. Be especially skeptical of the model on topics where you yourself have strong priors — those are exactly the topics where it's most likely to tell you what you want to hear.
Continue to Guide 3: Responsible Use — data, privacy, IP, security, and accountability when working with AI tools.