Every team that gets serious about AI hits the same fork in the road: do you keep handing the model a single context file, or do you build retrieval (RAG)? It's framed as a technical decision, but it's really a question of scale and freshness. Get it right and your AI answers like it knows your world. Get it wrong and you either drown the model in irrelevant text or spend a weekend wiring a vector database you never needed. Let's settle it.
What a context file actually is
A context file is a single document — context.md, CLAUDE.md, a Custom Instructions block — that you load in full at the start of a session. Your role, your standards, your active projects, the decisions you've already made: all of it goes into the model's context window every single time. The model sees the whole thing, top to bottom, on every turn.
This is the simplest possible memory. There's no index, no search, no infrastructure — just text you re-supply (or that the tool auto-loads for you). Because the model reads all of it, nothing relevant can be "missed." The only ceiling is size: a context file works beautifully right up until your knowledge stops fitting comfortably in the window.
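To make "just text you re-supply" concrete, here is a minimal sketch of what a tool does when it loads a context file. The function names, the separator, and the four-characters-per-token estimate are illustrative assumptions, not any particular tool's behavior.

```python
from pathlib import Path


def load_context(path: str) -> str:
    """Read the whole context file; there is no index and no search step."""
    return Path(path).read_text(encoding="utf-8")


def build_prompt(context_text: str, question: str) -> str:
    """Prepend the full context file to the question, every time."""
    return f"{context_text.strip()}\n\n---\n\nQuestion: {question.strip()}"


def rough_token_count(text: str) -> int:
    """Very rough size check, assuming about 4 characters per token."""
    return len(text) // 4


# Inline text stands in for load_context("context.md") (a hypothetical path).
context = "Role: freelance writer\nStandards: plain language, short sentences\n"
prompt = build_prompt(context, "Draft an intro for client A.")
print(rough_token_count(prompt), "tokens (rough estimate)")
```

The size check is the only part worth watching over time: once that number stops being comfortably small relative to the window, the ceiling described above is getting close.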
What RAG actually is
RAG stands for retrieval-augmented generation. Instead of loading everything, you chop your knowledge into many small chunks, convert each into a numerical fingerprint (an embedding), and store them in a search index (often a vector database). When you ask a question, the system embeds your query, finds the handful of chunks most similar to it, and injects only those into the prompt. The model "augments" its answer with retrieved context — hence the name.
The payoff is scale: RAG can sit on top of thousands of documents and still send the model only the few paragraphs that matter for the question at hand. The cost is everything you just added — an embedding step, a store to maintain, a retrieval step that has to pick the right chunks, and a new failure mode when it picks the wrong ones.
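The chunk, embed, search, inject loop can be sketched in a few lines. This is a toy under stated assumptions: the `embed` function below uses plain word counts as a stand-in for a real embedding model, a Python list stands in for the vector database, and the sample documents are made up.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts. A real system calls an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def chunk(text: str, max_words: int = 40) -> list[str]:
    """Split a document into fixed-size word windows."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


# Hypothetical knowledge base, already chunked.
docs = [
    "Refunds are issued within 14 days of purchase when the order is unused.",
    "Our office is closed on public holidays and weekends.",
    "To reset a password, open Settings and choose Reset password.",
]
top = retrieve("how do I reset my password", docs, k=1)
print(top[0])  # prints the password-reset chunk
```

Only `top` would be injected into the prompt. Note the failure mode the text mentions: if the query shares no vocabulary with the right chunk, this toy scorer ranks it no higher than the others, which is exactly why real systems rely on learned embeddings and still need tuning.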
A context file gives the model everything, every time. RAG gives the model only the relevant slice, on demand. One trades scale for simplicity; the other trades simplicity for scale.
The tradeoffs that actually decide it
Five dimensions separate the two. Most "which should I use" debates are really arguments about one of these five.
| Dimension | Context file | RAG / retrieval |
|---|---|---|
| Size / scale | Bounded by the context window — great up to ~10–20 pages, then it strains | Effectively unlimited; sits on thousands of documents |
| Freshness | You edit one file; changes apply instantly on next load | You must re-embed and re-index when source content changes |
| Cost per request | You pay for the whole file's tokens every call — cheap when small, expensive when huge | You pay only for the retrieved chunks — cheaper at large scale |
| Complexity | Plain text. No infra. Easy to read, diff, and debug | Embeddings + vector store + retrieval logic to build and maintain |
| Reliability | Nothing is "missed" — the model sees all of it | Only as good as retrieval; the right chunk has to surface |
Notice the pattern: the context file wins on simplicity, freshness, and guaranteed coverage; RAG wins on scale and per-request cost once you're large. There's a crossover point, and the whole decision is figuring out which side of it you're on.
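The "cost per request" row is simple arithmetic, sketched below. Every number here is an assumption chosen for illustration (500 tokens per page, 5 retrieved chunks of 300 tokens each); substitute your own measurements.

```python
def context_file_tokens(pages: int, tokens_per_page: int = 500) -> int:
    """Tokens sent per request when the whole file is loaded."""
    return pages * tokens_per_page


def rag_tokens(k: int = 5, chunk_tokens: int = 300) -> int:
    """Tokens sent per request when only k retrieved chunks are injected."""
    return k * chunk_tokens


def crossover_pages(k: int = 5, chunk_tokens: int = 300, tokens_per_page: int = 500) -> int:
    """Smallest page count at which the whole file costs more tokens than the retrieved chunks."""
    return rag_tokens(k, chunk_tokens) // tokens_per_page + 1


for pages in (10, 200):
    print(pages, "pages:", context_file_tokens(pages), "tokens vs", rag_tokens(), "with retrieval")
print("token crossover at", crossover_pages(), "pages")
```

Token count alone puts the crossover very early, which is why it should not be read as the decision point. The table's other rows (complexity, reliability, freshness) are what keep a context file the better choice well past that point.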
A decision framework
Don't start with RAG. Start with the simplest thing that works and climb only when it stops. Here's the ladder.
Default: a context file (start here)
If everything the model needs fits in roughly 10–20 pages and doesn't change many times a day, use a context file. This covers the vast majority of individuals and small teams. It needs no infrastructure, is instantly editable, and is impossible to "mis-retrieve." Reach for anything fancier only when this genuinely breaks.
Outgrowing one file: split + summarize (in between)
Before jumping to full RAG, try the middle ground. Break your one file into a few topic files and load only the relevant one per task. Or keep a short index/summary always loaded and link out to detail. Modern large context windows (200K–1M tokens) push this stage much further than people expect — you can hold a lot before retrieval earns its keep.
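Here is one way the split-and-summarize stage might look: a short index that is always loaded, plus one topic file chosen per task. The topic names, keyword sets, and keyword-overlap routing are all hypothetical; choosing the file by hand works just as well.

```python
import re
from typing import Optional


def pick_topic(task: str, keywords: dict[str, set[str]]) -> Optional[str]:
    """Pick the topic whose keywords overlap the task most; None if nothing matches."""
    words = set(re.findall(r"[a-z]+", task.lower()))
    best, best_hits = None, 0
    for topic, kws in keywords.items():
        hits = len(words & kws)
        if hits > best_hits:
            best, best_hits = topic, hits
    return best


def assemble(index_text: str, topic_texts: dict[str, str], task: str,
             keywords: dict[str, set[str]]) -> str:
    """Always include the short index; add at most one topic file."""
    parts = [index_text]
    topic = pick_topic(task, keywords)
    if topic is not None:
        parts.append(topic_texts[topic])
    parts.append(f"Task: {task}")
    return "\n\n".join(parts)


# Hypothetical topic files, shown inline instead of read from disk.
keywords = {"clients": {"client", "invoice", "contract"}, "style": {"voice", "tone", "style"}}
topic_texts = {"clients": "Client notes...", "style": "Voice and tone guide..."}
print(assemble("Index: who I am, what lives where.", topic_texts,
               "Match my tone and voice in this post", keywords))
```

This is still plain text with no embeddings and no store, so it keeps most of the context file's simplicity while cutting what gets loaded per task.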
Graduate to RAG (when it scales)
Move to retrieval when your knowledge is bigger than any window, changes faster than anyone can hand-edit a file, or must be searched: a documentation site, a support knowledge base, years of notes, many clients' material. At that point the per-request token savings and unlimited scale of RAG outweigh its setup cost. Below that threshold, RAG is complexity you'll regret.
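The three rungs of the ladder can be written down as a small function. The thresholds are placeholders, not recommendations: 20 pages echoes the rough figure above, while `few_edits`, `constant_edits`, and the 400-page window in the examples are assumptions made only for this sketch.

```python
def choose_approach(pages: int, window_pages: int, edits_per_day: float,
                    must_search: bool = False, *, small_pages: int = 20,
                    few_edits: int = 3, constant_edits: int = 24) -> str:
    """Walk the ladder: RAG only when forced, context file when small and stable."""
    # Rung 3: bigger than the window, must be searched, or changing constantly.
    if pages > window_pages or must_search or edits_per_day >= constant_edits:
        return "rag"
    # Rung 1: small and stable.
    if pages <= small_pages and edits_per_day <= few_edits:
        return "context file"
    # Rung 2: fits in the window but has outgrown a single file.
    return "split + summarize"


print(choose_approach(pages=8, window_pages=400, edits_per_day=1))     # context file
print(choose_approach(pages=120, window_pages=400, edits_per_day=1))   # split + summarize
print(choose_approach(pages=5000, window_pages=400, edits_per_day=2))  # rag
```

The order of the checks carries the argument: retrieval is chosen only when one of its three triggers fires, and everything else falls back to the simpler options.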
The decision in one sentence each
If you only remember one heuristic per option, make it these:
- Use a context file when your knowledge is small, stable, and you want zero infrastructure and guaranteed coverage.
- Use RAG when your knowledge is large, constantly changing, or has to be searched, and you can accept the cost of building and tuning retrieval.
- Use both — the strongest setups keep a tiny always-loaded context file for the essentials and let RAG handle the long tail.
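The "use both" setup amounts to a prompt with one guaranteed part and one budgeted part. A minimal sketch, assuming the retrieved chunks arrive already ranked best-first from some retriever and that tokens are estimated at roughly 4 characters each; the function name and budget are hypothetical.

```python
def assemble_hybrid_prompt(essentials: str, retrieved: list[str], question: str,
                           budget_tokens: int = 2000) -> str:
    """Always include the essentials file; add retrieved chunks until the budget is spent."""

    def rough_tokens(text: str) -> int:
        return len(text) // 4  # assumption: about 4 characters per token

    parts = [essentials]
    used = rough_tokens(essentials)
    for chunk in retrieved:  # assumed ranked best-first
        cost = rough_tokens(chunk)
        if used + cost > budget_tokens:
            break  # stop at the first chunk that does not fit
        parts.append(chunk)
        used += cost
    parts.append(f"Question: {question}")
    return "\n\n".join(parts)


essentials = "Standards: cite the contract clause; keep answers under 200 words."
retrieved = ["Client A contract, clause 4: net-30 payment terms.",
             "Client A kickoff notes: prefers email over calls."]
print(assemble_hybrid_prompt(essentials, retrieved, "What are Client A's payment terms?"))
```

The essentials are never subject to retrieval, so they cannot be missed; only the long tail competes for the remaining budget.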
Concrete examples
The framework is easiest to feel with real cases:
- A freelance writer who wants AI to match their voice and remember three active clients: a single context.md. Tiny, stable, RAG would be absurd here.
- A developer on one codebase with conventions and architecture decisions: a CLAUDE.md in the repo root. The tool auto-loads it; nothing to retrieve.
- A support team answering from a 2,000-article help center that updates weekly: RAG. No window holds it, and the content changes too often to keep pasting.
- A consultant with 40 client folders of contracts and notes: RAG over the archive, plus a small always-loaded file with their standards and working style. The file guarantees the essentials; retrieval finds the right client doc.
Bigger windows raise the ceiling but don't remove the need for retrieval. Filling a 1M-token window costs more per call, can dilute the model's attention so it misses buried details, and still can't hold an entire enterprise knowledge base. Large windows make context files viable for more cases, but RAG stays the answer when your corpus is bigger than any window or changes by the hour.
Start simple, earn the complexity
The most common mistake is reaching for RAG because it sounds like the "real" engineering answer. It usually isn't — not yet. A clean context file fixes the forgetting problem for most people on day one, with no infrastructure to maintain and nothing that can silently retrieve the wrong thing. Build the file first, live with it, and let the pain tell you when you've genuinely outgrown it. When your knowledge no longer fits the window or changes faster than you can edit, that's your signal to graduate — and by then you'll know exactly what you need retrieval to do.