amarnathresearch.com

RAG for Dummies

Who pretend to know everything by using jargon.
Here's what actually happens under the hood.

Part 1 — The Indexing Phase


Someone at a party says "Oh yeah, we use RAG with vector embeddings in our semantic search pipeline" and everyone nods like they understand.

They don't. Let's fix that.

RAG (Retrieval-Augmented Generation) is just a fancy way of saying: "Let the AI read your documents before answering." That's it. The rest is implementation details — which we'll walk through with a real example, animated, and with zero hand-waving.

The Indexing Phase is the prep work. It's like making flashcards before an exam — you do it once, and then you can answer questions fast.

🎬 Watch the demo in action

See exactly how RAG works — upload a document, ask questions, get answers grounded in your data. All running locally, no API keys, no external calls.

🚀 Try it yourself: Visit our live demo on HuggingFace Spaces — anyone can practice with their own documents, instantly.

1
The starting point
You Have Documents. That's It.

PDFs, Word docs, text files — whatever you've got. A document loader opens each file and rips out the raw text, page by page.

The important part? It also records where each piece of text came from — the filename, the page number. This is metadata, and it's how you'll trace answers back to the source later.

What actually happens
📄 company_handbook.pdf → PyPDFLoader → Raw text + metadata
🔍 Real example — page 4 extracted
"Our company offers 20 days of paid leave per year for full-time employees. Part-time employees receive 10 days. Unused leave can be carried over to the next year, up to a maximum of 5 days. Employees must submit leave requests at least 2 weeks in advance..."
🏷️ Metadata — the sticky note attached
{ source_file: "company_handbook.pdf", page: 4 }

Translation: "This text came from company_handbook.pdf, page 4." That's all metadata is. A label.
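
To make the sticky-note idea concrete, here is a minimal sketch in plain Python. It is not PyPDFLoader itself: the `PageRecord` class and the `attach_metadata` function are hypothetical names made up for this illustration, and the page texts are assumed to be already extracted. The metadata keys simply mirror the example above.

```python
from dataclasses import dataclass, field


@dataclass
class PageRecord:
    """One page of extracted text plus its 'sticky note' of metadata."""
    text: str
    metadata: dict = field(default_factory=dict)


def attach_metadata(source_file: str, page_texts: list[str]) -> list[PageRecord]:
    """Pair each page's raw text with where it came from.

    `page_texts` is assumed to be already extracted (one string per page);
    a real PDF loader would produce these strings for you.
    """
    records = []
    for page_number, text in enumerate(page_texts, start=1):
        records.append(
            PageRecord(
                text=text.strip(),
                metadata={"source_file": source_file, "page": page_number},
            )
        )
    return records


# Hypothetical stand-in for the text a loader pulled out of a 4-page PDF.
pages = [
    "Welcome to the company.",
    "Our values.",
    "Working hours.",
    "Our company offers 20 days of paid leave per year for full-time employees.",
]
records = attach_metadata("company_handbook.pdf", pages)
print(records[3].metadata)
```

The label travels with the text from here on, which is what lets a later answer point back to a specific file and page.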
2
The chopping block
Chunking. AKA "Cutting Text Into Pieces."

A full page is too long to search effectively, so we cut it into smaller pieces called chunks, around 800 characters each. That is roughly four times the length of this paragraph.

The secret sauce? Each chunk overlaps with the next one by ~200 characters. Why? Because if an important sentence happens to land right at the cutting point, it'll show up in both chunks. No information lost.

People at conferences call this "recursive character text splitting with configurable overlap". You can call it "chopping with safety margins."

Chunk #1 (chars 0–780)
"Our company offers 20 days of paid leave per year for full-time employees. Part-time employees receive 10 days. Unused leave can be carried over to the next year, up to a maximum of 5 days. Employees must submit leave requests at least 2 weeks in advance through the HR portal. Emergency leave can be requested with manager approval. The company also provides 10 paid public holidays per year."
Chunk #2 (chars 580–1200)
"Emergency leave can be requested with manager approval. The company also provides 10 paid public holidays per year. Maternity leave is 16 weeks paid. Paternity leave is 4 weeks paid."
💜 See the two sentences beginning "Emergency leave can be requested..."? They're in BOTH chunks.
That's the overlap. If someone asks "can I take emergency leave?", both chunks have the answer. The overlap is your insurance policy against losing information at the seams.
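
Here is "chopping with safety margins" as a minimal sketch. Note the swap: this is a plain fixed-size sliding window, not the recursive splitter mentioned above, which also tries to cut at natural boundaries such as paragraphs and sentences. The function name `chunk_text` and the placeholder text are assumptions for illustration; the 800 and 200 defaults come from the numbers in this section.

```python
def chunk_text(text: str, chunk_size: int = 800, overlap: int = 200) -> list[tuple[int, int, str]]:
    """Cut text into fixed-size windows that share `overlap` characters.

    Returns (start, end, chunk) tuples so each piece can be traced back
    to its character range in the original text.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap  # how far the window moves each time
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append((start, end, text[start:end]))
        if end == len(text):
            break  # the last window reached the end of the text
        start += step
    return chunks


# Placeholder text: 2,000 characters standing in for a real page.
page_text = "".join(chr(ord("a") + i % 26) for i in range(2000))
chunks = chunk_text(page_text)
for start, end, _ in chunks:
    print(f"chars {start}-{end}")
```

Each window starts 600 characters after the previous one, so the last 200 characters of one chunk are the first 200 of the next. A bigger overlap means safer seams but more chunks to store.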
3
The magic trick
Embeddings. The Part Everyone Pretends to Understand.

Here's where people start throwing around "vector embeddings" and "384-dimensional semantic space" at dinner parties.

Here's what actually happens:

A small AI model reads each chunk and converts it into a list of 384 numbers. That's it. It's a list of numbers. These numbers capture the meaning of the text — not the exact words, but what the text is about.

Texts with similar meanings get similar numbers. "Paid leave" and "vacation days" produce very similar lists, even though they share zero words. "Leave the building" produces a very different list, because the meaning is different despite using the word "leave."

Input → Model → Output
"20 days of paid leave..." → 🧠 MiniLM → [0.023, -0.189, 0.452, ... ×384]
✅ Same meaning, different words = Similar numbers
Text A | Text B | Similarity
"paid leave policy" | "vacation days rules" | 0.87 (high! The model knows they mean the same thing)
"leave the building" | "paid leave" | 0.23 (low! Same word, totally different meaning)
What a "384-dimensional vector" looks like. It's just... numbers.

Each colored cell = one number. Blue = positive, red = negative, pale = near zero. That's the whole vector. No magic. Just 384 numbers.
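
So how do two lists of numbers turn into a single similarity score? One common way is cosine similarity, sketched below in plain Python. This is an assumption for illustration: the article does not say which measure the demo uses, and the three tiny vectors are made up by hand rather than produced by MiniLM.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up 4-number vectors for illustration only; a real model emits 384 numbers.
paid_leave = [0.9, 0.1, 0.3, 0.0]
vacation_days = [0.8, 0.2, 0.4, 0.1]
leave_building = [0.0, 0.9, -0.2, 0.7]

print(round(cosine_similarity(paid_leave, vacation_days), 2))
print(round(cosine_similarity(paid_leave, leave_building), 2))
```

Vectors pointing the same way score close to 1, and unrelated ones score close to 0. The same arithmetic works on 384 numbers; there are just more terms in the sums.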

4
The filing cabinet
Vector Store. It's a Database. That's All.

People call it a "vector database" because calling it "a database that stores lists of numbers alongside text" doesn't sound as impressive at conferences.

ChromaDB stores three things per chunk: the original text, the 384-number list, and the metadata (filename, page). It saves everything to a folder on your disk called ./chroma_db/.

Next time you start the app, it loads from disk. No re-processing. You only run the indexing step again when your documents change.

What's actually stored — think of it as a spreadsheet
ID | Text | Numbers | Label
c_001 | "Our company offers 20 days of paid leave..." | [0.023, -0.189, ...] | handbook.pdf, p4
c_002 | "Emergency leave can be requested..." | [-0.145, 0.278, ...] | handbook.pdf, p4
c_003 | "The dress code policy requires..." | [0.312, -0.067, ...] | handbook.pdf, p7
c_004 | "Quarterly revenue reached $2.4M..." | [0.087, 0.441, ...] | q3_report.pdf, p1
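
If "a database that stores lists of numbers alongside text" still sounds abstract, here is a toy version. It is emphatically not ChromaDB or its API: `TinyVectorStore` is a hypothetical class invented for this sketch, it keeps everything in one JSON file, and the vector is cut to two numbers to keep the example short. It only shows the save-once, reload-later idea.

```python
import json
import tempfile
from pathlib import Path


class TinyVectorStore:
    """A toy store: one record per chunk, persisted as a single JSON file."""

    def __init__(self, path: Path):
        self.path = Path(path)
        self.records: dict[str, dict] = {}
        if self.path.exists():
            # Reload what a previous run saved, so nothing is re-processed.
            self.records = json.loads(self.path.read_text(encoding="utf-8"))

    def add(self, chunk_id: str, text: str, vector: list[float], metadata: dict) -> None:
        """Store the text, its numbers, and its label under one ID."""
        self.records[chunk_id] = {"text": text, "vector": vector, "metadata": metadata}

    def save(self) -> None:
        """Write every record to disk."""
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(self.records), encoding="utf-8")


# A temporary folder is used here; the article's demo saves to ./chroma_db/ instead.
folder = Path(tempfile.mkdtemp())
store = TinyVectorStore(folder / "store.json")
store.add(
    "c_001",
    "Our company offers 20 days of paid leave...",
    [0.023, -0.189],  # truncated: only the first two numbers of the vector
    {"source_file": "handbook.pdf", "page": 4},
)
store.save()

# A "restart": a fresh object loads the same records back from disk.
reloaded = TinyVectorStore(folder / "store.json")
print(reloaded.records["c_001"]["metadata"])
```

A real vector database adds the part that matters at scale: finding the nearest vectors quickly without comparing against every row.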
✅ Indexing complete. Here's the summary.
Your documents are now searchable by meaning, not keywords. The database is saved to disk. You can unplug, reboot, come back in a week — it's all still there. Zero API keys used, zero data left your machine, zero dollars spent.

Next up: Part 2 — The Query Phase, where someone actually asks a question and the magic unfolds.
The Whole Indexing Phase in One Breath
📄 Your PDFs → ✂️ Chop into pieces → 🧠 Convert to numbers → 💾 Save to database

That's it. Four steps. Everything else is implementation details and fancy vocabulary.