Ever asked an AI about something that happened yesterday, only for it to confidently lie to your face? That's because LLMs are frozen in time, limited by their training data.
Enter RAG (Retrieval-Augmented Generation). It's like giving your AI an open-book exam. Instead of guessing, it looks up the answer in your documents first.
In this post, we're building a simple RAG pipeline using LangChain. Let's dive in!
The "Big Idea"
RAG works in three simple steps:
Index: Chop your documents into small "chunks" and turn them into math (vectors).
Retrieve: When a user asks a question, find the chunks that match best.
Augment: Stuff those chunks into the prompt and let the AI summarize them.
The Setup
You'll need a few libraries. Open your terminal and run:
pip install langchain langchain-openai langchain-community chromadb pypdf
The Code
Here is a complete, minimal script to chat with a PDF. Replace the placeholder key ("sk-...") with your actual OpenAI key.
import os
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import Chroma
from langchain.chains import RetrievalQA
# 1. Set your API Key
os.environ["OPENAI_API_KEY"] = "sk-..."
# 2. Load your data (Change this to your PDF path!)
loader = PyPDFLoader("my_awesome_doc.pdf")
data = loader.load()
# 3. Chop it up! (Chunking)
# We split text so the AI doesn't get overwhelmed.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = text_splitter.split_documents(data)
# 4. Create the "Brain" (Vector Store)
# This turns text into vectors and stores them locally.
vectorstore = Chroma.from_documents(
documents=chunks,
embedding=OpenAIEmbeddings()
)
# 5. Build the RAG Chain
llm = ChatOpenAI(model_name="gpt-4o", temperature=0)
rag_chain = RetrievalQA.from_chain_type(
llm=llm,
chain_type="stuff", # "Stuff" all chunks into the prompt
retriever=vectorstore.as_retriever()
)
# 6. Ask away!
question = "What is the main conclusion of this document?"
response = rag_chain.invoke({"query": question})  # RetrievalQA expects a "query" key and returns a "result" key
print(f"AI: {response['result']}")
Why did we do that?
RecursiveCharacterTextSplitter: Why not just feed the whole PDF? Because LLMs have a "context window" (a limit on how much text they can read at once). Chunking keeps the info bite-sized and relevant.
ChromaDB: This is our temporary database. It stores the "meaning" of our text so we can search it numerically.
chain_type="stuff": This is the funniest name in LangChain. It literally means "stuff all the retrieved documents into the prompt." The sketch after this list shows what that looks like done by hand.
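Curious what the chain is actually doing? Here's a rough, minimal sketch of the same loop done by hand, reusing the question, vectorstore, and llm objects from the script above. The prompt wording is my own illustration, not LangChain's internal template.

# Roughly what RetrievalQA with chain_type="stuff" does (illustrative prompt, not the real template)
question = "What is the main conclusion of this document?"

# Retrieve: vector search returns the k most similar chunks
docs = vectorstore.similarity_search(question, k=4)

# Augment: "stuff" the chunk texts into a single prompt
context = "\n\n".join(doc.page_content for doc in docs)
prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"

# Generate: the LLM answers from the stuffed context
answer = llm.invoke(prompt)
print(answer.content)

Same idea, just spelled out: search, stuff, generate.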
Pro-Tips for the Road
Overlap matters: Notice chunk_overlap=100? This ensures that if a sentence is cut in half, the context lives in both chunks (see the splitter demo after this list).
Local Models: Don't want to pay for OpenAI? Swap ChatOpenAI for Ollama and run it 100% locally! A sketch of the swap follows below.
Garbage In, Garbage Out: If your PDF is a messy scan, your RAG will be messy too. Clean your data!
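To see the overlap with your own eyes, here's a tiny sketch with toy sizes (the sample sentence and the 60/20 numbers are just for illustration, not the values used above):

# Quick demo of chunk overlap with deliberately tiny sizes
from langchain_text_splitters import RecursiveCharacterTextSplitter

sample = "RAG pipelines retrieve relevant chunks before generation. Overlap keeps neighboring chunks in context."
demo_splitter = RecursiveCharacterTextSplitter(chunk_size=60, chunk_overlap=20)
for chunk in demo_splitter.split_text(sample):
    print(repr(chunk))  # consecutive chunks share up to 20 characters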
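And if you do want to go fully local, the swap might look something like this. It assumes you already have an Ollama server running with a chat model and an embedding model pulled; "llama3" and "nomic-embed-text" are placeholders, use whatever you have installed.

# Local-only variant: requires a running Ollama server with models already pulled
from langchain_community.chat_models import ChatOllama
from langchain_community.embeddings import OllamaEmbeddings

llm = ChatOllama(model="llama3", temperature=0)  # any chat model you've pulled locally
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=OllamaEmbeddings(model="nomic-embed-text")  # a local embedding model
)
# Everything else (RetrievalQA.from_chain_type, the retriever, the query) stays the same.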
Wrapping Up
You just built a working end-to-end RAG loop. RAG is the backbone of almost every AI startup today. Whether it's a legal bot, a medical assistant, or a "Chat with your Resume" tool, you now have the blueprint.
What are you planning to build with RAG? Let me know in the comments!