Fine-Tuning vs Prompting vs RAG: Which Should You Use?

A plain-English decision guide to prompting, RAG, and fine-tuning: what each one actually does, when to reach for it, and how to combine all three.

Happyness Mallya·March 28, 2026·11 min read


Sooner or later, everyone who builds anything with an AI model asks the same question: "How do I make this thing know my stuff?" My documents, my tone of voice, my product, my rules. The model is clearly capable, but out of the box it answers like a clever stranger who has never met you. So you start hunting for the lever that makes it yours, and within about ten minutes you've bumped into three intimidating words: prompting, RAG, and fine-tuning.

The internet will happily tell you all three are "how you customise an LLM," which is technically true and practically useless, because they solve completely different problems, cost wildly different amounts, and most people reach for the expensive one first when the cheap one would have done the job. So let me lay them out plainly, in the order you should actually try them, and give you a decision framework you can carry into your own project.

The three levers, in plain English

Before the framework, here's what each one actually is, stripped of jargon.

Prompting is just telling the model what you want, well. You write clear instructions, and often you include a few examples right there in the message ("here are three customer emails and how I'd reply to each, now you do the fourth"). That last trick has a name, few-shot prompting, but don't let the name fool you. It's still just words in the prompt. You change nothing about the model. You change what you say to it.

RAG (retrieval-augmented generation) is giving the model the right pages to read before it answers. You keep your own documents in a searchable store, and when a question comes in, you fetch the relevant passages and paste them into the prompt alongside the question. The model then answers grounded in your real material instead of its frozen training memory. If you want the full walkthrough, I wrote one in "What Is RAG (Retrieval-Augmented Generation)?", but the one-line version is: look it up first, then answer.
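To make "look it up first, then answer" concrete, here is a minimal sketch of the retrieval half. Everything in it is an assumption for illustration: the `DOCS` store, the word-overlap scoring, and the prompt wording are stand-ins, and a real system would typically use a search index or embeddings instead of counting shared words.

```python
import re

# Hypothetical in-memory store; a real system would use a search index
# or a vector database instead of a plain dict.
DOCS = {
    "refunds": "Refunds are available within 30 days of purchase with a receipt.",
    "shipping": "Standard shipping takes 3 to 5 business days.",
    "warranty": "Hardware is covered by a 12 month warranty.",
}


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, docs: dict[str, str], k: int = 1) -> list[str]:
    """Return the k passages that share the most words with the question."""
    q = tokenize(question)
    ranked = sorted(docs.values(), key=lambda p: len(q & tokenize(p)), reverse=True)
    return ranked[:k]


def build_rag_prompt(question: str, docs: dict[str, str]) -> str:
    """Paste the retrieved passages into the prompt next to the question."""
    context = "\n".join(f"- {p}" for p in retrieve(question, docs))
    return (
        "Answer using only the passages below. "
        "If they do not contain the answer, say so.\n\n"
        f"Passages:\n{context}\n\nQuestion: {question}"
    )


prompt = build_rag_prompt("How long does shipping take?", DOCS)
print(prompt)
```

The model itself never appears in this sketch, and that is the point: change the text stored under `DOCS["shipping"]` and the very next prompt carries the new wording, with no retraining anywhere.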

Fine-tuning is actually retraining the model a little, on your own examples, so its behaviour changes. You feed it hundreds or thousands of input-output pairs ("when the input looks like this, respond like that") and the model's internal settings shift to make that pattern its default. After fine-tuning, you have a genuinely different model that has absorbed a style, a format, or a specialised skill into its bones.
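If "input-output pairs" sounds abstract, this is roughly what a training file looks like before it goes anywhere near a training job. Treat it as a sketch: the `input` and `output` field names and the one-JSON-object-per-line layout are assumptions here, and each provider or training library defines its own exact schema.

```python
import json

# Hypothetical training pairs; a real dataset would hold hundreds or thousands.
examples = [
    {
        "input": "Where is my order?",
        "output": "Thanks for reaching out! Let me check that for you.",
    },
    {
        "input": "I want a refund.",
        "output": "Thanks for reaching out! I can help with that refund.",
    },
]


def to_jsonl(pairs: list[dict]) -> str:
    """Serialise input-output pairs as one JSON object per line."""
    return "\n".join(json.dumps(pair, ensure_ascii=False) for pair in pairs)


jsonl = to_jsonl(examples)
print(jsonl)
```

Notice that both outputs open the same way. That repeated pattern, not any individual fact, is what the model picks up.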

Notice the pattern. Prompting changes your words. RAG changes the model's reading material. Fine-tuning changes the model itself. They escalate in power, but they also escalate in cost and effort, which is exactly why the order you try them in matters.

Start with prompting

Many of the problems people try to solve with RAG or fine-tuning turn out, on inspection, to be prompting problems in disguise. The model wasn't lacking knowledge or training. It was lacking clear instructions.

Prompting is the cheapest, fastest, most reversible lever you have. There's nothing to build, nothing to host, nothing to retrain. You edit a string and try again. You can iterate ten times in the time it would take to set up either of the other two approaches. So it should always be your first move, and often your only move.

A few-shot example is your secret weapon here. If you want the model to classify support tickets into five categories, you don't need to fine-tune it on ten thousand labelled tickets. You can often just show it three or four labelled examples in the prompt and it will catch the pattern immediately. If you want a specific output format, like JSON with exact field names, show it the format once. Models are remarkable mimics; give them a clear target and they hit it.
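Here is what that ticket-classification prompt might look like when you build it in code. It is a sketch under assumptions: the five category names and the three labelled tickets are invented for illustration, and the finished string would be sent to whichever model you already use.

```python
# Hypothetical categories and labelled examples, invented for illustration.
CATEGORIES = ["billing", "shipping", "technical", "account", "other"]

FEW_SHOT = [
    ("I was charged twice this month.", "billing"),
    ("My parcel still hasn't arrived.", "shipping"),
    ("The app crashes when I open settings.", "technical"),
]


def build_few_shot_prompt(ticket: str) -> str:
    """Build a prompt that shows labelled examples, then asks about a new ticket."""
    lines = [
        "Classify the support ticket into one of: " + ", ".join(CATEGORIES) + ".",
        "Reply with the category name only.",
        "",
    ]
    for text, label in FEW_SHOT:
        lines += [f"Ticket: {text}", f"Category: {label}", ""]
    # End on an open "Category:" so the model completes the pattern.
    lines += [f"Ticket: {ticket}", "Category:"]
    return "\n".join(lines)


prompt = build_few_shot_prompt("I can't reset my password.")
print(prompt)
```

Swapping an example or adding a category is a one-line edit, which is why this lever is so quick to iterate on.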

When does prompting run out of road? When you hit one of two walls. Either the model needs to know something it simply wasn't trained on (your private data, today's prices), or you need it to behave in a consistent way so reliably that stuffing examples into every single prompt becomes impractical. Those two walls point to the next two levers, in that exact order.

If you want to get genuinely good at this first lever before reaching for the others, I put the practical techniques in How to Write AI Prompts That Actually Work.

Add RAG when the problem is knowledge

The first wall, the model not knowing your stuff, is a knowledge problem, and RAG is the knowledge tool.

Here's the key insight people miss: a model's training is a frozen snapshot. It was never trained on your internal handbook, your support tickets from last night, or the price you changed this morning, and no amount of clever prompting can summon information that isn't there. You can't prompt your way to facts the model never saw. (For why the model's memory is frozen like this, see How Large Language Models Actually Work.)

RAG solves this by changing what the model gets to read, not what it is. Real examples where RAG is the right call:

A support bot that answers from your current help docs, so when you update a policy, the bot updates with it, no retraining required.
An internal tool that lets staff ask questions across thousands of company documents and get answers with citations they can verify.
A research assistant that pulls from a constantly growing library of papers, where the knowledge changes faster than any training run could keep up.

The thread running through all of these: the information is private, changing, or both. That's RAG's home turf. It also meaningfully reduces hallucination, because a model handed the actual passage has far less room to invent an answer. It's reading instead of guessing.

What RAG does not fix is behaviour. RAG can hand the model your entire knowledge base and the model will still answer in its own generic voice, in whatever format it feels like, with whatever quirks it came with. Retrieval feeds the brain; it doesn't change the personality. For that, you need the third lever.

Fine-tune when the problem is behaviour

The second wall, needing rock-solid consistent behaviour, is a behaviour problem, and fine-tuning is the behaviour tool.

Fine-tuning shines when you need the model to always do something a particular way and prompting it every time has become unreliable or unwieldy. Good real-world fits:

A model that must always output in a strict, unusual format your downstream system depends on, where even a 1-in-50 deviation breaks things.
A brand voice so specific that describing it in a prompt never quite lands, but feeding the model a thousand examples of "us" teaches it the feel.
A narrow, repetitive classification or extraction task done at high volume, where baking the skill in is cheaper per call than carrying examples in every prompt.

The common thread here is behaviour, style, or format that must be consistent and is hard to capture in words. When you can't easily explain the pattern but you can show lots of examples of it, fine-tuning earns its keep.
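Before deciding that a format problem needs fine-tuning, it helps to measure how often the prompted model actually breaks the format. The sketch below does that for a batch of saved outputs. The two required field names and the sample outputs are hypothetical, so substitute whatever your downstream system expects.

```python
import json

# Hypothetical schema: the exact fields the downstream system depends on.
REQUIRED_FIELDS = {"category", "priority"}


def is_valid(output: str) -> bool:
    """True if the output is a JSON object with exactly the required fields."""
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and set(data) == REQUIRED_FIELDS


def failure_rate(outputs: list[str]) -> float:
    """Fraction of outputs that break the expected format."""
    if not outputs:
        return 0.0
    failures = sum(1 for output in outputs if not is_valid(output))
    return failures / len(outputs)


# Invented sample of model outputs saved from a prompted run.
sample_outputs = [
    '{"category": "billing", "priority": "high"}',
    '{"category": "shipping", "priority": "low"}',
    'Sure! Here is the JSON: {"category": "billing"}',
    '{"category": "account"}',
]

rate = failure_rate(sample_outputs)
print(f"{rate:.0%} of outputs broke the format")
```

If the rate on a realistic sample stays above what your system can tolerate even after you have improved the prompt, that is evidence the behaviour is worth baking in.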

But be honest with yourself about the cost. Fine-tuning means assembling a quality dataset (often the hardest part, and it has to be genuinely good, because the model learns your mistakes too), running and paying for a training job, and then hosting or managing a custom model that you now have to maintain and re-tune as things drift. It is the most powerful lever and the most expensive in time, money, and ongoing upkeep. It's also the wrong tool for knowledge: people try to fine-tune facts into a model and find it expensive, leaky, and instantly stale. Facts belong in RAG. Behaviour belongs in fine-tuning.
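A rough way to sanity-check the cost side is to work out how many calls it takes for the per-call saving to repay the upfront bill. The figures below are made up purely to show the arithmetic; plug in your own measured costs. The sketch also ignores ongoing maintenance and re-tuning, which only push the break-even point further out.

```python
import math
from decimal import Decimal


def break_even_calls(upfront: str, few_shot_per_call: str, tuned_per_call: str):
    """Calls needed before fine-tuning's per-call saving repays its upfront cost.

    Costs are decimal strings to avoid floating-point rounding.
    Returns None when the fine-tuned model is not cheaper per call.
    """
    saving = Decimal(few_shot_per_call) - Decimal(tuned_per_call)
    if saving <= 0:
        return None
    return math.ceil(Decimal(upfront) / saving)


# Hypothetical numbers: 500 upfront for dataset and training,
# 0.004 per call with long few-shot prompts, 0.0015 per call once tuned.
calls = break_even_calls("500", "0.004", "0.0015")
print(calls)
```

With these invented numbers the answer is 200,000 calls, which is why the economics only favour fine-tuning for narrow tasks at genuinely high volume.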

They combine, and the best systems use all three

These aren't rivals on a shelf where you pick one. The most capable systems layer them.

Picture a polished customer-support assistant. It's fine-tuned so it always replies in the company's warm, on-brand voice and in a tidy structured format. It uses RAG to pull the customer's actual order history and the latest help docs, so its answers are grounded in current, private facts. And it runs on a carefully written prompt that ties it together with clear instructions and a couple of few-shot examples for the tricky edge cases. Behaviour from fine-tuning, knowledge from RAG, glue from prompting. Each lever doing the one job it's actually good at.
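As a sketch of how the three layers meet in a single request, the snippet below assembles one. All names are hypothetical: `support-tuned-v1` stands in for a fine-tuned model's identifier, the passages stand in for whatever your retrieval step returned, and the dictionary is a generic shape, not any particular provider's API.

```python
# Prompting layer: the instructions that tie everything together.
INSTRUCTIONS = (
    "You are the support assistant. Answer only from the context provided. "
    "If the context does not cover the question, say you will escalate."
)


def build_request(question, passages, examples, model="support-tuned-v1"):
    """Assemble one request: tuned model, retrieved context, and prompt glue."""
    # Few-shot examples for the tricky edge cases.
    shots = "\n\n".join(f"Customer: {q}\nAssistant: {a}" for q, a in examples)
    # RAG layer: passages fetched for this specific question.
    context = "\n".join(f"- {p}" for p in passages)
    prompt = (
        f"{INSTRUCTIONS}\n\n"
        f"Examples:\n{shots}\n\n"
        f"Context:\n{context}\n\n"
        f"Customer: {question}\nAssistant:"
    )
    # Fine-tuning layer: the request targets the tuned model by name.
    return {"model": model, "prompt": prompt}


request = build_request(
    question="Can I still return my headphones?",
    passages=[
        "Order 1042: headphones, delivered 12 days ago.",
        "Returns are accepted within 30 days of delivery.",
    ],
    examples=[
        ("Can I return an opened item?", "Happy to help! Opened items can be returned."),
    ],
)
print(request["prompt"])
```

Each layer lives in a different place: the voice is in the model name, the facts are in the context, and the rules are in the instructions, so each can be changed without touching the other two.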

The reason the order still matters even when you combine them is cost discipline. If you start with prompting, you'll discover that maybe 70 percent of what you wanted was just an instructions problem, and you'll have spent nothing finding that out. Then RAG handles the knowledge gap for a modest engineering effort. Only what's genuinely left, the stubborn behaviour you couldn't prompt your way to, justifies the real expense of fine-tuning. Reach for the expensive lever first and you'll often spend weeks training a model to do something a better paragraph would have fixed in an afternoon.

So the next time you ask "how do I make the AI know my stuff?", pause and split the question. Is it a knowledge gap, a behaviour gap, or just an instructions gap? Name which wall you've actually hit, and the right lever names itself.
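If it helps to see the whole framework in one place, here it is as a tiny function. It is only a sketch of the order argued for in this article: the three yes/no inputs are simplifications I have assumed, and real projects will need judgment the function cannot supply.

```python
def choose_lever(
    prompt_already_improved: bool,
    facts_wrong_or_missing: bool,
    behaviour_inconsistent: bool,
) -> str:
    """Pick the next lever to try, cheapest first."""
    if not prompt_already_improved:
        return "prompting"  # always exhaust the cheap lever first
    if facts_wrong_or_missing:
        return "RAG"  # knowledge gap
    if behaviour_inconsistent:
        return "fine-tuning"  # behaviour gap
    return "prompting"  # nothing else is wrong, so keep refining instructions


print(choose_lever(False, True, True))
print(choose_lever(True, True, True))
print(choose_lever(True, False, True))
```

When both gaps exist, the function returns RAG before fine-tuning, matching the order in which the two walls were introduced.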

Frequently asked questions

Can't I just fine-tune the model on my documents so it knows them?

You can, but it's usually the wrong call. Fine-tuning is poor at storing facts: the knowledge gets baked in vaguely, leaks at the edges, and goes stale the moment your documents change, forcing you to retrain. RAG keeps facts in a store the model reads at question time, so updating knowledge is as simple as editing a file. Use fine-tuning for behaviour, RAG for knowledge.

What is few-shot prompting, and is it different from fine-tuning?

Few-shot prompting means including a handful of examples directly in your prompt to show the model the pattern you want. It changes nothing about the model itself; it's just words in the message, so it's free and instant to try. Fine-tuning teaches the model through actual retraining on many examples, permanently. Few-shot is the cheap first thing to try before you ever consider fine-tuning.

How do I know whether I have a knowledge problem or a behaviour problem?

Ask what's actually wrong with the output. If the model gives wrong or made-up facts, or can't answer about your specific, private, or current information, that's a knowledge problem, so reach for RAG. If the facts are fine but the tone, format, or style is inconsistent or off-brand, that's a behaviour problem, and that's where fine-tuning helps. Many real issues are neither and just need a clearer prompt.

Is RAG always cheaper than fine-tuning?

In most cases yes, especially over time. RAG has real engineering cost to set up retrieval well, but it leaves the model untouched and updates by editing documents. Fine-tuning costs more up front (dataset, training run) and keeps costing in maintenance and re-tuning as things drift. The exception is very high-volume narrow tasks, where a fine-tuned model can be cheaper per call than carrying long examples in every prompt.

If I'm just starting, what should I actually do first?

Spend real effort on prompting before anything else. Write clear instructions, add a few examples, and iterate. A large share of what people assume needs RAG or fine-tuning turns out to be solvable with a better prompt, and you'll learn this in an afternoon for free. Only after prompting genuinely runs out of road should you add RAG for knowledge gaps, then fine-tuning for behaviour gaps.
