For people who pretend to know everything by using jargon.
Here's what actually happens under the hood.
Someone at a party says "Oh yeah, we use RAG with vector embeddings in our semantic search pipeline" and everyone nods like they understand.
They don't. Let's fix that.
RAG (Retrieval-Augmented Generation) is just a fancy way of saying: "Let the AI read your documents before answering." That's it. The rest is implementation details, which we'll walk through with a real example and zero hand-waving.
The Indexing Phase is the prep work, in four steps: load the documents, chunk them, embed the chunks, store the results. It's like making flashcards before an exam: you do it once, and then you can answer questions fast.
PDFs, Word docs, text files — whatever you've got. A document loader opens each file and rips out the raw text, page by page.
The important part? It also records where each piece of text came from — the filename, the page number. This is metadata, and it's how you'll trace answers back to the source later.
{ source_file: "company_handbook.pdf", page: 4 }
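Here is a minimal sketch of that loading step. It assumes plain-text files where pages are separated by form-feed characters, which is what some PDF-to-text tools emit. Real PDF or Word loaders work differently, and the function name `load_pages` is made up for illustration.

```python
from pathlib import Path


def load_pages(path):
    """Read one plain-text file and return one record per non-empty page.

    Assumption: pages are separated by form-feed characters.
    """
    raw = Path(path).read_text(encoding="utf-8")
    records = []
    for page_number, page_text in enumerate(raw.split("\f"), start=1):
        text = page_text.strip()
        if not text:
            continue  # skip blank pages, but keep the original page numbering
        records.append({
            "text": text,
            # The metadata travels with the text so answers can be traced back.
            "metadata": {"source_file": Path(path).name, "page": page_number},
        })
    return records
```

Each record carries its own `source_file` and `page`, so nothing downstream has to guess where a piece of text came from.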
A full page is too long to search effectively. So we cut it into smaller pieces called chunks. Around 800 characters each. About the length of this paragraph three times.
The secret sauce? Each chunk overlaps with the next one by ~200 characters. Why? Because if an important sentence happens to land right at the cutting point, it'll show up in both chunks. No information lost.
People at conferences call this "recursive character text splitting with configurable overlap". You can call it "chopping with safety margins."
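"Chopping with safety margins" fits in a dozen lines. The sketch below is the simplest possible version: it cuts at fixed character positions. A recursive splitter also tries to cut at paragraph and sentence boundaries first, which this one does not attempt. The name `chunk_text` is an assumption, and the defaults mirror the 800 and 200 figures above.

```python
def chunk_text(text, chunk_size=800, overlap=200):
    """Cut text into fixed-size chunks that share `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


sample = "x" * 2000
chunks = chunk_text(sample)
print(len(chunks))
```

With these defaults, a 2000-character text becomes three chunks starting at positions 0, 600, and 1200, and the last 200 characters of each chunk are the first 200 of the next.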
Here's where people start throwing around "vector embeddings" and "384-dimensional semantic space" at dinner parties.
Here's what actually happens:
A small AI model reads each chunk and converts it into a list of 384 numbers. That's it. It's a list of numbers. These numbers capture the meaning of the text — not the exact words, but what the text is about.
Texts with similar meanings get similar numbers. "Paid leave" and "vacation days" produce very similar lists, even though they share zero words. "Leave the building" produces a very different list, because the meaning is different despite using the word "leave."
| Text A | Text B | Similarity |
|---|---|---|
| "paid leave policy" | "vacation days rules" | 0.87 — high! Model knows they mean the same thing |
| "leave the building" | "paid leave" | 0.23 — low! Same word, totally different meaning |
Picture the vector as a row of colored cells, one cell per number: blue = positive, red = negative, pale = near zero. That's the whole vector. No magic. Just 384 numbers.
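So how do you get a single similarity score out of two lists of numbers? A common choice is cosine similarity, sketched below. The text above doesn't say which measure produced its scores, so treat that as an assumption. The 4-number vectors are invented toy values, not output from a real embedding model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors, made up for illustration (real ones would have 384 numbers).
paid_leave = [0.9, 0.1, 0.0, 0.2]
vacation_days = [0.8, 0.2, 0.1, 0.3]
leave_building = [0.0, 0.9, 0.8, 0.1]

similar = cosine_similarity(paid_leave, vacation_days)
different = cosine_similarity(paid_leave, leave_building)
print(similar > different)
```

Vectors that point the same way score close to 1, and vectors that point in unrelated directions score close to 0. That is all "semantic similarity" means at this level.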
People call it a "vector database" because calling it "a database that stores lists of numbers alongside text" doesn't sound as impressive at conferences.
ChromaDB, the vector database used in this walkthrough, stores three things per chunk: the original text, the 384-number list, and the metadata (filename, page). It saves everything to a folder on your disk, ./chroma_db/ in this example.
Next time you start the app, it loads from disk. No re-processing. You only run the indexing step again when your documents change.
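To make "a database that stores lists of numbers alongside text" concrete, here is a deliberately tiny stand-in built on a JSON file. This is not ChromaDB's API. The class name `TinyVectorStore` and the file name `chunks.json` are hypothetical, and a real vector database adds fast similarity search on top.

```python
import json
from pathlib import Path


class TinyVectorStore:
    """Hypothetical stand-in for a vector database; not ChromaDB's API."""

    def __init__(self, folder):
        self.path = Path(folder) / "chunks.json"
        # Load existing records from disk so a restart needs no re-indexing.
        if self.path.exists():
            self.records = json.loads(self.path.read_text(encoding="utf-8"))
        else:
            self.records = {}

    def add(self, chunk_id, text, vector, metadata):
        # Three things per chunk: the text, the numbers, and the metadata.
        self.records[chunk_id] = {
            "text": text,
            "vector": vector,
            "metadata": metadata,
        }

    def save(self):
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(self.records), encoding="utf-8")
```

Create a store pointing at a folder, call `add` once per chunk, then `save`. Creating a second store on the same folder later reads the saved records back instead of recomputing them.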
| ID | Text | Numbers | Label |
|---|---|---|---|
| c_001 | "Our company offers 20 days of paid leave..." | [0.023, -0.189, ...] | handbook.pdf, p4 |
| c_002 | "Emergency leave can be requested..." | [-0.145, 0.278, ...] | handbook.pdf, p4 |
| c_003 | "The dress code policy requires..." | [0.312, -0.067, ...] | handbook.pdf, p7 |
| c_004 | "Quarterly revenue reached $2.4M..." | [0.087, 0.441, ...] | q3_report.pdf, p1 |
That's it. Four steps. Everything else is implementation details and fancy vocabulary.