# Building an Enterprise RAG System: Lessons from Production
RAG (Retrieval-Augmented Generation) is the most practical way to give LLMs access to your private documents. But most tutorials stop at "here's a LangChain hello world." Production RAG is a different beast.
I've been running a RAG system in production for Turkish documents. Here's what I learned - and why standard approaches fail for non-English text.
## The Problem with Default RAG
If you follow a typical RAG tutorial:
```python
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings

# This works fine for English
# (note: chunk_size counts characters by default, not tokens)
splitter = RecursiveCharacterTextSplitter(chunk_size=512)
chunks = splitter.split_documents(docs)
embeddings = OpenAIEmbeddings()
```
This gives you ~90% retrieval accuracy on English documents. On Turkish? About 60%.
## Why Turkish Breaks Standard RAG

### 1. Tokenization
Turkish is agglutinative. One word can express an entire English sentence:
- "goruntuleyemeyebileceklerimizdenmissinizcesine" = "as if you were one of those whom we would not be able to view"
BPE tokenizers trained on English split this into 15+ meaningless subwords. Your 512-token chunk now covers much less text than expected.
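You can see the fragmentation directly. A quick probe, assuming the tiktoken package and OpenAI's cl100k_base vocabulary (exact counts vary by tokenizer, but the pattern holds):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
word = "görüntüleyemeyebileceklerimizdenmişsinizcesine"
print(len(enc.encode(word)))    # double-digit subword count for one word
print(len(enc.encode("view")))  # 1 token for the rough English root
```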
### 2. Chunking

Token-count chunking cuts mid-word in Turkish because words are longer. A chunk boundary at token 512 might split "görüntüleyemeyebileceklerimizdenmişsinizcesine" in half.
### 3. Embeddings
Multilingual embeddings (e.g., multilingual-e5) help but still underperform. The embedding space doesn't fully capture Turkish morphological relationships.
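A minimal way to probe this, assuming the sentence-transformers package and the intfloat/multilingual-e5-base checkpoint (e5 models expect a "query: " prefix on inputs):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("intfloat/multilingual-e5-base")
# "belge" = "document", "belgelerimizden" = "from our documents"
forms = ["query: belge", "query: belgelerimizden"]
emb = model.encode(forms, normalize_embeddings=True)
# Morphological variants of the same root often score lower than
# comparable English inflection pairs.
print(util.cos_sim(emb[0], emb[1]))
```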
## Our Solution: A 4-Step Pipeline

### Step 1: Morphological Preprocessing
Before chunking, we reduce words to stems so morphological variants line up in embedding space; the snippet below uses the Snowball Turkish stemmer, but any Turkish morphological analyzer can stand in:
```python
import snowballstemmer

# Snowball ships a Turkish stemmer; swap in a full morphological
# analyzer (e.g. Zemberek) if you need finer-grained stems.
stemmer = snowballstemmer.stemmer("turkish")

def preprocess_turkish(text):
    # Stem words for better embedding alignment,
    # but keep the original text for display.
    words = text.split()
    stems = stemmer.stemWords(words)
    return {
        "original": text,
        "stemmed": " ".join(stems),
        "tokens": words,
    }
```
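In use, the `stemmed` field goes into the index and the `original` field is what users see:

```python
doc = preprocess_turkish("Belgeleri dün görüntüledik.")  # "We viewed the documents yesterday."
# doc["stemmed"] feeds the vector index; doc["original"] is shown in results.
```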
### Step 2: Sentence-Boundary Chunking
Instead of splitting by token count, we split on sentence boundaries:
```python
import re

def chunk_by_sentences(text, max_sentences=5, overlap=1):
    # Group whole sentences into chunks, overlapping by `overlap`
    # sentences (assumes overlap < max_sentences).
    sentences = re.split(r"(?<=[.!?])\s+", text)
    step = max_sentences - overlap
    chunks = []
    for i in range(0, len(sentences), step):
        chunks.append(" ".join(sentences[i:i + max_sentences]))
    return chunks
```
This ensures no word is ever split mid-morpheme.
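A quick check of the overlap behavior (the sentence text is arbitrary):

```python
text = "Cümle bir. Cümle iki. Cümle üç. Cümle dört. Cümle beş. Cümle altı."
for c in chunk_by_sentences(text, max_sentences=3, overlap=1):
    print(c)
# Cümle bir. Cümle iki. Cümle üç.
# Cümle üç. Cümle dört. Cümle beş.
# Cümle beş. Cümle altı.
```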
### Step 3: Weaviate Hybrid Search
We use Weaviate's hybrid search combining BM25 (keyword) with vector search:
```python
import weaviate
from weaviate.gql.get import HybridFusion

client = weaviate.Client("http://localhost:8080")

result = (
    client.query.get("Document", ["content", "metadata"])
    .with_hybrid(
        query="türkçe belge arama",  # "Turkish document search"
        alpha=0.5,  # 0 = pure BM25, 1 = pure vector; 0.5 weights them equally
        fusion_type=HybridFusion.RELATIVE_SCORE,
    )
    .with_autocut(2)
    .with_limit(5)
    .do()
)
```
BM25 catches exact term matches (critical for proper nouns in Turkish), while vector search handles semantic similarity.
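For completeness, a sketch of the indexing side. The `Document` class matches the query above; the batch size and the extra `original` property are illustrative:

```python
client.schema.create_class({
    "class": "Document",
    "vectorizer": "text2vec-transformers",
    "properties": [
        {"name": "content",  "dataType": ["text"]},  # stemmed text, vectorized
        {"name": "original", "dataType": ["text"]},  # untouched text for display
        {"name": "metadata", "dataType": ["text"]},
    ],
})

client.batch.configure(batch_size=100)
with client.batch as batch:
    for chunk in chunks:  # output of chunk_by_sentences() on stemmed text
        batch.add_data_object({"content": chunk}, "Document")
```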
### Step 4: Production Docker Stack
```yaml
version: '3.8'
services:
  weaviate:
    image: semitechnologies/weaviate
    ports:
      - "8080:8080"
    environment:
      QUERY_DEFAULTS_LIMIT: 25
      DEFAULT_VECTORIZER_MODULE: text2vec-transformers
      ENABLE_MODULES: text2vec-transformers,reranker-transformers
      # the text2vec-transformers module needs an inference container:
      TRANSFORMERS_INFERENCE_API: http://t2v-transformers:8080
      # reranker-transformers likewise needs RERANKER_INFERENCE_API and
      # its own container (omitted here for brevity)
      CLUSTER_HOSTNAME: node1
  t2v-transformers:
    # model choice is illustrative; any multilingual checkpoint works
    image: semitechnologies/transformers-inference:sentence-transformers-paraphrase-multilingual-MiniLM-L12-v2
    environment:
      ENABLE_CUDA: '0'
  api:
    build: ./api
    ports:
      - "8000:8000"
    environment:
      WEAVIATE_URL: http://weaviate:8080
  ingest:
    build: ./ingest
    volumes:
      - ./documents:/data
```
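After `docker compose up -d`, a quick readiness check from Python:

```python
import weaviate

client = weaviate.Client("http://localhost:8080")
print(client.is_ready())  # True once Weaviate and its modules are up
```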
## Benchmark Results
We tested on 500 Turkish documents with 2000 queries:
| Approach | Recall@5 | Precision@5 | Latency (p95) |
|---|---|---|---|
| LangChain defaults | 61% | 54% | 450ms |
| + Sentence chunking | 72% | 65% | 420ms |
| + Morphological preprocessing | 84% | 78% | 480ms |
| + Hybrid search (Weaviate) | 93% | 88% | 520ms |
Each step adds measurable improvement. The full pipeline reaches 93% recall, comparable to English-optimized systems.
## Lessons Learned
- Don't trust default settings. Every RAG component assumes English text.
- Hybrid search is not optional for non-English. BM25 catches what embeddings miss.
- Preprocessing > better embeddings. Spending time on morphological analysis gave bigger gains than switching embedding models.
- Test with real queries. Our benchmark uses questions actual users asked, not synthetic queries.
- Monitor retrieval quality continuously. We log every query + retrieved chunks and review weekly; a minimal logging sketch follows this list.
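A minimal sketch of that retrieval log; field names and the truncation length are illustrative. JSON Lines keeps it grep-friendly, and `ensure_ascii=False` preserves Turkish characters:

```python
import json
import time

def log_retrieval(query, chunks, path="retrieval_log.jsonl"):
    record = {
        "ts": time.time(),
        "query": query,
        "chunks": [c["content"][:200] for c in chunks],  # truncate long chunks
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```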
## Try It Yourself
I've packaged this entire pipeline as a product:
- Starter ($79): Single project, up to 10K documents, Docker Compose stack, community support
- Professional ($249): Unlimited projects, custom embedding training, white-label, priority support
Check it out: BilgeStore RAG System
Read more: Blog post with full technical details
Questions? I'm happy to discuss Turkish NLP challenges or RAG architecture decisions in the comments.