Yash Joshi
How We Built a Vector Database for SEC Filings Using PostgreSQL + pgvector

The Challenge: Making 250-Page Documents Searchable by Meaning
In Part 1, I showed you QuantTrade AI answering questions like:

"What's NVIDIA's China revenue exposure?" by pulling exact sections from its 10-K filing.


Here's what happens under the hood when you ask that question:
  • Your query gets converted into a 384-dimensional vector (embedding)
  • We search through thousands of document chunks stored in PostgreSQL
  • Retrieve the top 5 most semantically similar sections
  • Feed them to the LLM (Opus or GPT) with context
  • Get back a cited, evidence-backed answer
  • Total time: ~800ms (500ms for chart, 300ms for RAG retrieval)
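
To make that flow concrete, here's a minimal orchestration sketch. The VectorStore and its similarity_search method are built step by step later in this post; answer_question and call_llm are hypothetical names for the glue code and the LLM client, not the production code.

def answer_question(store, question: str, ticker: str, call_llm) -> str:
    """Minimal RAG loop: retrieve the top 5 chunks, then ask the LLM for a cited answer."""
    # Steps 1-3: embed the question and pull the most similar filing chunks
    # (similarity_search is built in Step 5 below).
    chunks = store.similarity_search(query=question, ticker=ticker, k=5)

    # Step 4: label each chunk so the model can cite its evidence.
    context = "\n\n".join(
        f"[{c['document_type']} {c['filing_date']} | chunk {c['id']}]\n{c['content']}"
        for c in chunks
    )
    prompt = (
        "Answer using only the excerpts below and cite the chunk labels you rely on.\n\n"
        f"{context}\n\nQuestion: {question}"
    )

    # Step 5: call_llm is a stand-in for whichever LLM client you use (Opus, GPT, ...).
    return call_llm(prompt)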


The secret? pgvector — a PostgreSQL extension that turns your familiar SQL database into a high-performance vector store.

Let me show you exactly how we built it.

You might be wondering: why pgvector over Pinecone, Weaviate, or Qdrant?
When I started this project, I evaluated several vector database options:

| Solution | Pros | Cons | Verdict |
|----------|------|------|---------|
| Pinecone | Managed, fast, great DX | $70/month minimum, vendor lock-in | Too expensive for MVP |
| Weaviate | Open source, feature-rich | Complex setup, separate infrastructure | Overkill for our scale |
| Qdrant | Rust-based, high performance | Another service to manage | Considered as fallback |
| pgvector | Lives in existing PostgreSQL, simple, free | Slightly slower at massive scale | Perfect fit |

For our use case (10,000-50,000 document chunks), pgvector hits the sweet spot between simplicity and performance.

pgvector fits our architecture because it runs directly inside PostgreSQL: embeddings live alongside OHLCV data, sessions, and chat history, with no extra service to manage. We keep ACID transaction safety, it stays cost-efficient on platforms like Neon, Render, and Supabase, searches come back in under 100 ms at our scale, and we work entirely in familiar SQL instead of learning a new query language.
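
One concrete payoff of keeping everything in one database: a single SQL statement can mix vector search with ordinary relational joins. A hedged sketch, assuming a hypothetical ohlcv_daily price table next to the document_chunks table defined below; the connection URL and the query vector are placeholders.

from sqlalchemy import create_engine, text

engine = create_engine("postgresql://user:pass@localhost/quanttrade")  # placeholder URL
query_vec = "[" + ",".join(["0.0"] * 384) + "]"   # stand-in for a real 384-dim query embedding

with engine.connect() as conn:
    rows = conn.execute(text("""
        SELECT c.content,
               p.close AS latest_close
        FROM document_chunks c
        JOIN LATERAL (
            SELECT close
            FROM ohlcv_daily                 -- hypothetical price table in the same database
            WHERE ticker = c.ticker
            ORDER BY trade_date DESC
            LIMIT 1
        ) p ON TRUE
        WHERE c.ticker = 'NVDA'
        ORDER BY c.embedding <=> CAST(:q AS vector)   -- vector search and joins in one statement
        LIMIT 5
    """), {"q": query_vec}).fetchall()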


Architecture: The RAG Pipeline

Here's the full flow from SEC filing → searchable vectors:

┌─────────────────────────────────────────────────────────────┐
│  1. INGEST PHASE (Background Celery Task)                   │
└─────────────────────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│  Fetch SEC Filing (10-K, 10-Q, 8-K)                         │
│  Source: SEC EDGAR API                                      │
└─────────────────────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│  Chunk Document                                             │
│  Strategy: 1000 tokens/chunk                                │
│  Overlap: 200 tokens                                        │
└─────────────────────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│  Generate Embeddings                                        │
│  Model: all-MiniLM-L6-v2                                    │
│  Output: 384-dimensional vectors                            │
└─────────────────────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│  Store in PostgreSQL with pgvector                          │
│  Table: document_chunks                                     │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│  2. QUERY PHASE (Real-time API Request)                     │
└─────────────────────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│  User asks: "NVDA China exposure?"                          │
└─────────────────────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│  Convert query to embedding                                 │
│  Same model: all-MiniLM-L6-v2                               │
└─────────────────────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│  Vector similarity search                                   │
│  SELECT *                                                   │
│  ORDER BY embedding <=> query_embedding                     │
│  LIMIT 5                                                    │
└─────────────────────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│  Feed chunks to LLM                                         │
│  With conversation context                                  │
└─────────────────────────────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│  Stream response to frontend                                │
└─────────────────────────────────────────────────────────────┘


Let's Build the Vector Database

Step 1: Database Schema Setup
First, we enable pgvector and create our tables:


CREATE EXTENSION IF NOT EXISTS vector;        -- Enable pgvector

-- Store document chunks with embeddings
CREATE TABLE document_chunks (
    id SERIAL PRIMARY KEY,
    ticker VARCHAR(10) NOT NULL,              -- e.g., 'NVDA'
    document_type VARCHAR(20) NOT NULL,       -- '10-K', '10-Q', '8-K'
    filing_date DATE NOT NULL,
    chunk_index INTEGER NOT NULL,             -- Position in document
    content TEXT NOT NULL,                    -- The actual text chunk
    embedding vector(384),                    -- 384-dimensional embedding
    metadata JSONB,                           -- Extra context (section name, page, etc.)
    created_at TIMESTAMP DEFAULT NOW()
);

-- Critical: index for fast similarity search
CREATE INDEX ON document_chunks 
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

-- Additional indexes for filtering
CREATE INDEX idx_ticker ON document_chunks(ticker);
CREATE INDEX idx_document_type ON document_chunks(document_type);

  • vector(384) : Matches our embedding model's output dimension
  • ivfflat index : pgvector's approximate nearest neighbor search (99% accuracy, 10x faster)
  • vector_cosine_ops : Uses cosine distance (best for normalized embeddings)
  • lists : 100 — Index parameter (tuned for ~10K-50K vectors)
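
One query-time knob worth knowing: with an ivfflat index, PostgreSQL only scans ivfflat.probes of those 100 lists per query (the default is 1), so you can trade a little latency for recall. A minimal sketch, assuming a SQLAlchemy engine like the one used in later steps; the probe count of 10 and the dummy query vector are just illustrative.

from sqlalchemy import create_engine, text

engine = create_engine("postgresql://user:pass@localhost/quanttrade")  # placeholder URL
query_vec = "[" + ",".join(["0.0"] * 384) + "]"   # stand-in for a real 384-dim query embedding

with engine.connect() as conn:
    # Scan 10 of the 100 inverted lists per query: better recall, slightly higher latency.
    conn.execute(text("SET ivfflat.probes = 10"))
    rows = conn.execute(text("""
        SELECT id, ticker, content
        FROM document_chunks
        ORDER BY embedding <=> CAST(:q AS vector)
        LIMIT 5
    """), {"q": query_vec}).fetchall()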

Step 2: Document Chunking Strategy
Semantic chunking with overlap


from langchain.text_splitter import RecursiveCharacterTextSplitter

def chunk_document(text: str, ticker: str, doc_type: str):
    """
    Chunk SEC filing into semantically coherent segments.
    """
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,           # ~1000 characters per chunk (len counts characters, not tokens)
        chunk_overlap=200,         # 200-character overlap to carry context across chunks
        separators=["\n\n", "\n", ". ", " ", ""],  # Respect paragraph boundaries
        length_function=len
    )

    chunks = splitter.split_text(text)

    # Add metadata for each chunk
    chunk_objects = []
    for idx, chunk in enumerate(chunks):
        chunk_objects.append({
            "ticker": ticker,
            "document_type": doc_type,
            "chunk_index": idx,
            "content": chunk,
            "metadata": {
                "total_chunks": len(chunks),
                "position": f"{idx+1}/{len(chunks)}"
            }
        })

    return chunk_objects
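A quick usage sketch to show the shape of the output; the sample filing text is made up:

sample = ("Item 1A. Risk Factors\n\n"
          "Export controls targeting China may reduce demand for our data center products. "
          "We derive a meaningful portion of revenue from the region.\n\n") * 40

chunks = chunk_document(sample, ticker="NVDA", doc_type="10-K")

print(len(chunks))                         # number of ~1000-character chunks
print(chunks[0]["metadata"]["position"])   # e.g. '1/10'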

Step 3: Generating Embeddings
We use sentence-transformers with the all-MiniLM-L6-v2 model:


from sentence_transformers import SentenceTransformer
from typing import List
import numpy as np

class EmbeddingService:
    def __init__(self):
        # Lazy initialization to avoid cold start overhead
        self._model = None

    @property
    def model(self):
        """Load model only when needed."""
        if self._model is None:
            self._model = SentenceTransformer('all-MiniLM-L6-v2')
        return self._model

    def generate_embeddings(self, texts: List[str]) -> np.ndarray:
        """
        Generate 384-dimensional embeddings for text chunks.

        Args:
            texts: List of text chunks to embed

        Returns:
            numpy array of shape (len(texts), 384)
        """
        # Batch processing for efficiency
        embeddings = self.model.encode(
            texts,
            batch_size=32,              # Process 32 chunks at once
            show_progress_bar=False,
            convert_to_numpy=True,
            normalize_embeddings=True   # L2 normalization for cosine similarity
        )

        return embeddings

Performance optimizations:

  • Lazy loading : Model loads on first use (saves ~500MB RAM on cold start)
  • Batch processing : 32 texts at once (5x faster than one-by-one)
  • Normalization : Pre-normalized for cosine similarity
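
A small sanity check makes those last two points visible: every vector comes back with 384 dimensions and unit length, so cosine similarity is just a dot product.

import numpy as np

service = EmbeddingService()
vecs = service.generate_embeddings([
    "NVIDIA China revenue exposure",
    "Export controls may impact chip sales",
])

print(vecs.shape)                    # (2, 384)
print(np.linalg.norm(vecs[0]))       # ~1.0, because normalize_embeddings=True
print(float(vecs[0] @ vecs[1]))      # cosine similarity of the two sentences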

Step 4: Storing Vectors in PostgreSQL
Here's how we insert chunks with embeddings:

from sqlalchemy import create_engine, text
from typing import List, Dict
import json

class VectorStore:
    def __init__(self, db_url: str):
        self.engine = create_engine(db_url)
        self.embedding_service = EmbeddingService()

    def add_documents(self, chunks: List[Dict]):
        """
        Add document chunks with embeddings to PostgreSQL.
        """
        # Extract text for embedding generation
        texts = [chunk["content"] for chunk in chunks]

        # Generate embeddings in batch
        embeddings = self.embedding_service.generate_embeddings(texts)

        # Insert into database
        with self.engine.connect() as conn:
            for chunk, embedding in zip(chunks, embeddings):
                conn.execute(
                    text("""
                        INSERT INTO document_chunks 
                        (ticker, document_type, filing_date, chunk_index, 
                         content, embedding, metadata)
                        VALUES 
                        (:ticker, :doc_type, :filing_date, :chunk_index,
                         :content, CAST(:embedding AS vector), :metadata)
                    """),
                    {
                        "ticker": chunk["ticker"],
                        "doc_type": chunk["document_type"],
                        "filing_date": chunk.get("filing_date"),
                        "chunk_index": chunk["chunk_index"],
                        "content": chunk["content"],
                        "embedding": embedding.tolist(),  # Convert numpy to list
                        "metadata": json.dumps(chunk["metadata"])
                    }
                )
            conn.commit()
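If hand-assembling vector literals ever feels fragile, the pgvector Python package also ships SQLAlchemy support so embeddings bind natively. A minimal alternative sketch, assuming pip install pgvector and the table from Step 1; the connection URL and values are dummies:

from datetime import date

from pgvector.sqlalchemy import Vector
from sqlalchemy import Column, Date, Integer, String, Text, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class DocumentChunk(Base):
    __tablename__ = "document_chunks"
    id = Column(Integer, primary_key=True)
    ticker = Column(String(10))
    document_type = Column(String(20))
    filing_date = Column(Date)
    chunk_index = Column(Integer)
    content = Column(Text)
    embedding = Column(Vector(384))   # pgvector-aware column type, no manual casting needed

engine = create_engine("postgresql://user:pass@localhost/quanttrade")  # placeholder URL
with Session(engine) as session:
    session.add(DocumentChunk(
        ticker="NVDA", document_type="10-K", filing_date=date(2024, 2, 21),
        chunk_index=0, content="...", embedding=[0.0] * 384,  # dummy values
    ))
    session.commit()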

Step 5: Vector Similarity Search
Now for the magic — retrieving relevant chunks:


def similarity_search(
    self, 
    query: str, 
    ticker: str = None,
    k: int = 5
) -> List[Dict]:
    """
    Find top-k most similar document chunks to query.

    Args:
        query: User's question
        ticker: Optional stock filter (e.g., 'NVDA')
        k: Number of results to return

    Returns:
        List of matching chunks with similarity scores
    """
    # Convert query to an embedding, then to pgvector's text format '[x1,x2,...]'
    query_embedding = self.embedding_service.generate_embeddings([query])[0]
    query_vector = "[" + ",".join(map(str, query_embedding.tolist())) + "]"

    # Build SQL query
    sql = """
        SELECT 
            id,
            ticker,
            document_type,
            filing_date,
            content,
            metadata,
            1 - (embedding <=> CAST(:query_embedding AS vector)) AS similarity
        FROM document_chunks
        WHERE 1=1
    """

    params = {"query_embedding": query_embedding.tolist(), "k": k}

    # Optional ticker filter
    if ticker:
        sql += " AND ticker = :ticker"
        params["ticker"] = ticker

    sql += """
        ORDER BY embedding <=> CAST(:query_embedding AS vector)
        LIMIT :k
    """

    with self.engine.connect() as conn:
        results = conn.execute(text(sql), params).fetchall()

    return [
        {
            "id": row[0],
            "ticker": row[1],
            "document_type": row[2],
            "filing_date": row[3],
            "content": row[4],
            "metadata": json.loads(row[5]),
            "similarity": float(row[6])
        }
        for row in results
    ]

Output:

Score: 0.847 | China represented 14% of total revenue in Q1
Score: 0.721 | Export controls may impact H200 chip sales
Score: 0.634 | NVIDIA's Data Center revenue was $10.3B in Q2

Performance Metrics: Does It Actually Work?

| Metric | Before Optimization | After Optimization | Improvement |
|--------|---------------------|--------------------|-------------|
| Query latency | 3.2 s | 287 ms | 11× faster |
| Embedding generation | 450 ms (per chunk) | 89 ms (batch of 32) | 5× faster |
| Index build time | 15 min (10K docs) | 15 min | Same (one-time) |
| Memory usage | 2.1 GB | 850 MB | 2.5× reduction |
| Accuracy (top-5 retrieval) | 94% | 94% | Same |
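
Most of the query-latency win comes from the ivfflat index, so it's worth confirming the planner actually uses it. A quick check with EXPLAIN ANALYZE (a sketch; the plan output on your data will differ, and the URL and query vector are placeholders):

from sqlalchemy import create_engine, text

engine = create_engine("postgresql://user:pass@localhost/quanttrade")  # placeholder URL
query_vec = "[" + ",".join(["0.0"] * 384) + "]"   # stand-in for a real 384-dim query embedding

with engine.connect() as conn:
    plan = conn.execute(text("""
        EXPLAIN ANALYZE
        SELECT id
        FROM document_chunks
        ORDER BY embedding <=> CAST(:q AS vector)
        LIMIT 5
    """), {"q": query_vec}).fetchall()

for line in plan:
    print(line[0])   # expect an Index Scan on the ivfflat index rather than a Seq Scan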

⭐ Star the repo

What's Next?
In Part 3, I'll show you how we make AI predictions explainable using SHAP:

"From Black Box to Glass Box: Building Explainable Risk Scores"

  • Why transparency matters when real money is involved
  • How SHAP breaks down risk into individual factors
  • Making ML predictions auditable with mathematical proofs

GitHub: YashJoshi2109 / QuantTrade-AI

QuantCopilot is an AI-powered trading and research terminal that blends real-time, TradingView-style charts with an intelligent copilot chat. It ingests market data, news, and filings, then uses machine learning, RAG, and large language models to explain price moves, summarize documents, and highlight risks.

Quants AI Trading

A comprehensive AI-powered trading and research platform with TradingView-style UI, featuring real-time market data, AI copilot, RAG-powered insights, and backtesting capabilities.

Live app: https://quanttrade.us/




Core Features

  • Symbol Analysis with AI — Real-time charts, indicators, and AI-powered explanations
  • Earnings & Filings Deep Dive — RAG-powered analysis of SEC filings and earnings calls
  • Personal Watchlist — Portfolio-aware AI insights and risk monitoring
  • Strategy Backtesting — Rule-based strategy testing with AI explanations
  • Risk Analysis — Comprehensive risk scoring and factor analysis

Architecture

Frontend (Next.js, TradingView-style charts) → API Gateway/Backend (FastAPI REST API) → Data Services (market data) and RAG/Copilot Service (embeddings)
