The Challenge: Making 250-Page Documents Searchable by Meaning.
In Part 1, I showed you QuantTrade AI answering questions like:
"What's NVIDIA's China revenue exposure?" by pulling exact sections from its 10-K filing.
Here's what happens under the hood when you ask that question:
- Your query gets converted into a 384-dimensional vector (embedding)
- We search through thousands of document chunks stored in PostgreSQL
- Retrieve the top 5 most semantically similar sections
- Feed them to an LLM (Opus or GPT) along with context
- Get back a cited, evidence-backed answer
- Total time: ~800ms (500ms for chart, 300ms for RAG retrieval)
The secret? pgvector — a PostgreSQL extension that turns your familiar SQL database into a high-performance vector store.
Let me show you exactly how we built it.
You might be wondering: why pgvector over Pinecone, Weaviate, or Qdrant?
When I started this project, I evaluated several vector database options:
| Solution | Pros | Cons | Verdict |
|---|---|---|---|
| Pinecone | Managed, fast, great DX | $70/month minimum, vendor lock-in | Too expensive for MVP |
| Weaviate | Open source, feature-rich | Complex setup, separate infrastructure | Overkill for our scale |
| Qdrant | Rust-based, high performance | Another service to manage | Considered as fallback |
| pgvector | Lives in existing PostgreSQL, simple, free | Slightly slower at massive scale | Perfect fit |
For our use case (10,000-50,000 document chunks), pgvector hits the sweet spot between simplicity and performance.
pgvector fits our architecture because it runs directly inside PostgreSQL:
- Embeddings live alongside OHLCV data, sessions, and chat history, with no extra service to manage
- Writes get ACID-level transaction safety
- It stays cost-efficient on platforms like Neon, Render, or Supabase
- Searches come back in well under 100 ms at our scale
- Everything is plain, familiar SQL rather than a new query language
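The first and last points are easy to demonstrate. Here's a minimal sketch of a single SQL statement that mixes vector search with ordinary relational data; the connection string and the `ohlcv` table (name and columns) are illustrative placeholders, not our exact schema:

```python
# Minimal sketch: because embeddings live next to our other tables, one SQL
# statement can combine vector search with regular relational data.
# The connection string and the ohlcv table are illustrative placeholders.
from sqlalchemy import create_engine, text

engine = create_engine("postgresql+psycopg2://user:pass@localhost/quanttrade")

def chunks_with_latest_close(query_embedding: list, ticker: str):
    """Top-5 chunks for a ticker, joined with its most recent closing price."""
    sql = text("""
        SELECT c.content,
               1 - (c.embedding <=> CAST(:qvec AS vector)) AS similarity,
               p.close AS latest_close
        FROM document_chunks c
        JOIN LATERAL (
            SELECT close
            FROM ohlcv
            WHERE ohlcv.ticker = c.ticker
            ORDER BY ts DESC
            LIMIT 1
        ) p ON TRUE
        WHERE c.ticker = :ticker
        ORDER BY c.embedding <=> CAST(:qvec AS vector)
        LIMIT 5
    """)
    with engine.connect() as conn:
        # pgvector accepts the '[v1, v2, ...]' text format, so str(list) works
        return conn.execute(sql, {"qvec": str(query_embedding), "ticker": ticker}).fetchall()
```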
Architecture: The RAG Pipeline
Here's the full flow from SEC filing → searchable vectors:
┌─────────────────────────────────────────────────────────────┐
│ 1. INGEST PHASE (Background Celery Task) │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Fetch SEC Filing (10-K, 10-Q, 8-K) │
│ Source: SEC EDGAR API │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Chunk Document │
│ Strategy: 1000 tokens/chunk │
│ Overlap: 200 tokens │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Generate Embeddings │
│ Model: all-MiniLM-L6-v2 │
│ Output: 384-dimensional vectors │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Store in PostgreSQL with pgvector │
│ Table: document_chunks │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ 2. QUERY PHASE (Real-time API Request) │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ User asks: "NVDA China exposure?" │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Convert query to embedding │
│ Same model: all-MiniLM-L6-v2 │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Vector similarity search │
│ SELECT * │
│ ORDER BY embedding <=> query_embedding │
│ LIMIT 5 │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Feed chunks to LLM │
│ With conversation context │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Stream response to frontend │
└─────────────────────────────────────────────────────────────┘
Let's Build the Vector Database
Step 1: Database Schema Setup
First, we enable pgvector and create our tables:
```sql
CREATE EXTENSION IF NOT EXISTS vector;   -- Enable pgvector

-- Store document chunks with embeddings
CREATE TABLE document_chunks (
    id SERIAL PRIMARY KEY,
    ticker VARCHAR(10) NOT NULL,          -- e.g., 'NVDA'
    document_type VARCHAR(20) NOT NULL,   -- '10-K', '10-Q', '8-K'
    filing_date DATE NOT NULL,
    chunk_index INTEGER NOT NULL,         -- Position in document
    content TEXT NOT NULL,                -- The actual text chunk
    embedding vector(384),                -- 384-dimensional embedding
    metadata JSONB,                       -- Extra context (section name, page, etc.)
    created_at TIMESTAMP DEFAULT NOW()
);

-- Critical: index for fast similarity search
CREATE INDEX ON document_chunks
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

-- Additional indexes for filtering
CREATE INDEX idx_ticker ON document_chunks(ticker);
CREATE INDEX idx_document_type ON document_chunks(document_type);
```
- vector(384): Matches our embedding model's output dimension
- ivfflat index: pgvector's approximate nearest neighbor search (99% accuracy, 10x faster); its probes setting is tunable, as shown in the sketch below
- vector_cosine_ops: Uses cosine distance (best for normalized embeddings)
- lists = 100: Index parameter (tuned for ~10K-50K vectors)
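At query time, ivfflat only scans `probes` of those lists (default 1), so you can trade a little latency for better recall per session. A quick sketch, reusing the SQLAlchemy `engine` from the sketch above and a `query_embedding` produced by the embedding model introduced in Step 3:

```python
# Sketch: raising ivfflat.probes scans more lists per query, improving recall
# at the cost of latency. Assumes the engine and query_embedding defined in
# other snippets of this post.
from sqlalchemy import text

with engine.connect() as conn:
    conn.execute(text("SET ivfflat.probes = 10"))  # session-scoped setting
    rows = conn.execute(
        text("""
            SELECT id, ticker, content
            FROM document_chunks
            ORDER BY embedding <=> CAST(:qvec AS vector)
            LIMIT 5
        """),
        {"qvec": str(query_embedding.tolist())},
    ).fetchall()
```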
Step 2: Document Chunking Strategy
Semantic chunking with overlap
```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

def chunk_document(text: str, ticker: str, doc_type: str):
    """
    Chunk SEC filing into semantically coherent segments.
    """
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,       # Target ~1000 tokens
        chunk_overlap=200,     # 200 token overlap for context
        separators=["\n\n", "\n", ". ", " ", ""],  # Respect paragraph boundaries
        length_function=len
    )
    chunks = splitter.split_text(text)

    # Add metadata for each chunk
    chunk_objects = []
    for idx, chunk in enumerate(chunks):
        chunk_objects.append({
            "ticker": ticker,
            "document_type": doc_type,
            "chunk_index": idx,
            "content": chunk,
            "metadata": {
                "total_chunks": len(chunks),
                "position": f"{idx+1}/{len(chunks)}"
            }
        })
    return chunk_objects
```
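To make the ingest flow concrete, here's roughly how this gets called; `fetch_filing_text` is a hypothetical helper standing in for the SEC EDGAR fetch step:

```python
# Illustrative usage; fetch_filing_text is a hypothetical helper standing in
# for the SEC EDGAR fetch step of the ingest pipeline.
filing_text = fetch_filing_text(ticker="NVDA", form_type="10-K")
chunks = chunk_document(filing_text, ticker="NVDA", doc_type="10-K")

print(len(chunks))                        # a few hundred chunks for a long 10-K
print(chunks[0]["metadata"]["position"])  # e.g. "1/<total_chunks>"
```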
Step 3: Generating Embeddings
We use sentence-transformers with the all-MiniLM-L6-v2 model:
```python
from sentence_transformers import SentenceTransformer
from typing import List
import numpy as np

class EmbeddingService:
    def __init__(self):
        # Lazy initialization to avoid cold start overhead
        self._model = None

    @property
    def model(self):
        """Load model only when needed."""
        if self._model is None:
            self._model = SentenceTransformer('all-MiniLM-L6-v2')
        return self._model

    def generate_embeddings(self, texts: List[str]) -> np.ndarray:
        """
        Generate 384-dimensional embeddings for text chunks.

        Args:
            texts: List of text chunks to embed

        Returns:
            numpy array of shape (len(texts), 384)
        """
        # Batch processing for efficiency
        embeddings = self.model.encode(
            texts,
            batch_size=32,              # Process 32 chunks at once
            show_progress_bar=False,
            convert_to_numpy=True,
            normalize_embeddings=True   # L2 normalization for cosine similarity
        )
        return embeddings
```
Performance optimizations:
- Lazy loading : Model loads on first use (saves ~500MB RAM on cold start)
- Batch processing : 32 texts at once (5x faster than one-by-one)
- Normalization : Pre-normalized for cosine similarity
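A quick, illustrative sanity check of the service:

```python
# Illustrative sanity check of the embedding service.
import numpy as np

service = EmbeddingService()
vectors = service.generate_embeddings(["China represented 14% of total revenue"])

print(vectors.shape)                       # (1, 384)
print(float(np.linalg.norm(vectors[0])))   # ~1.0, thanks to normalize_embeddings=True
```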
Step 4: Storing Vectors in PostgreSQL
Here's how we insert chunks with embeddings:

```python
from sqlalchemy import create_engine, text
from typing import List, Dict
import json

class VectorStore:
    def __init__(self, db_url: str):
        self.engine = create_engine(db_url)
        self.embedding_service = EmbeddingService()

    def add_documents(self, chunks: List[Dict]):
        """
        Add document chunks with embeddings to PostgreSQL.
        """
        # Extract text for embedding generation
        texts = [chunk["content"] for chunk in chunks]

        # Generate embeddings in batch
        embeddings = self.embedding_service.generate_embeddings(texts)

        # Insert into database
        with self.engine.connect() as conn:
            for chunk, embedding in zip(chunks, embeddings):
                conn.execute(
                    text("""
                        INSERT INTO document_chunks
                            (ticker, document_type, filing_date, chunk_index,
                             content, embedding, metadata)
                        VALUES
                            (:ticker, :doc_type, :filing_date, :chunk_index,
                             :content, :embedding, :metadata)
                    """),
                    {
                        "ticker": chunk["ticker"],
                        "doc_type": chunk["document_type"],
                        "filing_date": chunk.get("filing_date"),
                        "chunk_index": chunk["chunk_index"],
                        "content": chunk["content"],
                        "embedding": embedding.tolist(),  # Convert numpy to list
                        "metadata": json.dumps(chunk["metadata"])
                    }
                )
            conn.commit()
```
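Putting the ingest phase together, the background job from the architecture diagram looks roughly like this; the Celery wiring, the `fetch_filing` helper, and the environment variable names are placeholders rather than our exact production code:

```python
# Rough sketch of the background ingest task from the architecture diagram.
# The Celery wiring, fetch_filing helper, and env var names are placeholders.
import os
from celery import Celery

celery_app = Celery("quanttrade", broker=os.environ["REDIS_URL"])

@celery_app.task
def ingest_filing(ticker: str, doc_type: str, filing_date: str):
    raw_text = fetch_filing(ticker, doc_type)             # SEC EDGAR fetch (placeholder)
    chunks = chunk_document(raw_text, ticker, doc_type)   # Step 2
    for chunk in chunks:
        chunk["filing_date"] = filing_date                # needed by the INSERT above

    store = VectorStore(os.environ["DATABASE_URL"])       # Steps 3-4
    store.add_documents(chunks)
    return {"ticker": ticker, "chunks": len(chunks)}
```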
Step 5: Vector Similarity Search
Now for the magic — retrieving relevant chunks:
```python
def similarity_search(
    self,
    query: str,
    ticker: str = None,
    k: int = 5
) -> List[Dict]:
    """
    Find top-k most similar document chunks to query.

    Args:
        query: User's question
        ticker: Optional stock filter (e.g., 'NVDA')
        k: Number of results to return

    Returns:
        List of matching chunks with similarity scores
    """
    # Convert query to embedding
    query_embedding = self.embedding_service.generate_embeddings([query])[0]

    # Build SQL query
    sql = """
        SELECT
            id,
            ticker,
            document_type,
            filing_date,
            content,
            metadata,
            1 - (embedding <=> :query_embedding) AS similarity
        FROM document_chunks
        WHERE 1=1
    """
    params = {"query_embedding": query_embedding.tolist(), "k": k}

    # Optional ticker filter
    if ticker:
        sql += " AND ticker = :ticker"
        params["ticker"] = ticker

    sql += """
        ORDER BY embedding <=> :query_embedding
        LIMIT :k
    """

    with self.engine.connect() as conn:
        results = conn.execute(text(sql), params).fetchall()

    return [
        {
            "id": row[0],
            "ticker": row[1],
            "document_type": row[2],
            "filing_date": row[3],
            "content": row[4],
            # JSONB may come back as a dict or a string depending on the driver
            "metadata": row[5] if isinstance(row[5], dict) else json.loads(row[5]),
            "similarity": float(row[6])
        }
        for row in results
    ]
```
Output:
Score: 0.847 | China represented 14% of total revenue in Q1
Score: 0.721 | Export controls may impact H200 chip sales
Score: 0.634 | NVIDIA's Data Center revenue was $10.3B in Q2
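From there, the retrieved chunks get stitched into the LLM prompt with citations. A minimal sketch of that step; the prompt wording and the `call_llm` helper are illustrative, not our exact production prompt:

```python
# Illustrative: turning retrieved chunks into LLM context with citations.
# The prompt wording and call_llm helper are placeholders; `store` is a
# VectorStore instance from Step 4.
def build_context_prompt(question, chunks):
    sources = "\n\n".join(
        f"[{i + 1}] ({c['ticker']} {c['document_type']}, {c['filing_date']})\n{c['content']}"
        for i, c in enumerate(chunks)
    )
    return (
        "Answer using ONLY the sources below and cite them by [number].\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )

question = "What is NVDA's China revenue exposure?"
chunks = store.similarity_search(question, ticker="NVDA", k=5)
prompt = build_context_prompt(question, chunks)
# answer = call_llm(prompt)  # placeholder for the Opus/GPT call
```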
Performance Metrics: Does It Actually Work?
| Metric | Before Optimization | After Optimization | Improvement |
|---|---|---|---|
| Query latency | 3.2 s | 287 ms | 11× faster |
| Embedding generation | 450 ms (per chunk) | 89 ms (batch of 32) | 5× faster |
| Index build time | 15 min (10K docs) | 15 min | Same (one-time) |
| Memory usage | 2.1 GB | 850 MB | 2.5× reduction |
| Accuracy (Top-5 retrieval) | 94% | 94% | Same |
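If you want to confirm the ivfflat index is actually being used (rather than a sequential scan), `EXPLAIN ANALYZE` is the quickest check; a sketch reusing the engine and a query embedding from the earlier snippets:

```python
# Quick check that queries hit the ivfflat index instead of a sequential scan.
# Reuses the SQLAlchemy engine and a query_embedding from earlier sketches.
from sqlalchemy import text

with engine.connect() as conn:
    plan = conn.execute(
        text("""
            EXPLAIN ANALYZE
            SELECT id
            FROM document_chunks
            ORDER BY embedding <=> CAST(:qvec AS vector)
            LIMIT 5
        """),
        {"qvec": str(query_embedding.tolist())},
    ).fetchall()

for (line,) in plan:
    print(line)  # expect an "Index Scan" on the embedding index, not a "Seq Scan"
```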
What's Next?
In Part 3, I'll show you how we make AI predictions explainable using SHAP:
"From Black Box to Glass Box: Building Explainable Risk Scores"
- Why transparency matters when real money is involved
- How SHAP breaks down risk into individual factors
- Making ML predictions auditable with mathematical proofs
The full project lives at YashJoshi2109/QuantTrade-AI (QuantCopilot, an AI-powered trading and research terminal). Live app: https://quanttrade.us/