The Evolution of Data Storage: Comparing Traditional and Vector Databases
Following my series on demystifying complex tech for young learners, I’ve invited back our favorite developer, Bob, to help us bridge the gap between theory and practice. Today, Bob is rolling up his sleeves to build two parallel applications: one powered by a Traditional Relational Database and the other by a cutting-edge Vector Database. By watching Bob navigate these two worlds, we can see exactly where each shines. But before we dive into the code and start building, let’s lay the groundwork by exploring the fundamental basics of how these systems actually “think.”
To keep the sample application simple and accessible for students, I asked Bob to use SQLite as the traditional relational database and sqlite-vec (which I covered in a blog post a while ago) as the vector database. By using SQLite for both roles, Bob was able to keep the architecture lightweight and easy to run with a single command, avoiding the need for complex external servers. This approach demonstrates that even with familiar, compact tools, you can implement powerful modern features like semantic search and high-dimensional embeddings.
TL;DR 1 - Traditional Database Systems
Databases have evolved from simple physical filing cabinets into the complex, high-dimensional engines powering modern AI. The journey began in the 1960s with hierarchical and network models, which were rigid and difficult to navigate. This changed in 1970 when Edgar F. Codd introduced the Relational Model, leading to the dominance of SQL and structured tables that organized the world’s business data for decades.
As the internet exploded in the 2000s, the “One Size Fits All” approach crumbled under the weight of massive, unstructured data, giving rise to NoSQL movements. Today, we are witnessing the next major pivot: the transition from storing literal strings and numbers to capturing the “meaning” of data through Vector Databases, designed specifically to handle the mathematical embeddings required by Large Language Models and semantic search.
TL;DR 2 - Vector Databases
To understand the shift to vectors, we have to look at how computers “read.” While a traditional database sees the word “apple” as a string of five characters, a vector database understands it as a concept — one that is mathematically closer to “fruit” than it is to “laptop.”
Moving from Matching to Meaning
The core reason for this evolution is the rise of unstructured data (images, audio, and long-form text) and the need for semantic search. Traditional databases rely on exact matches or keyword proximity. However, in the age of AI, we need systems that understand context.
Here is the technical breakdown of why vectors are taking over the AI landscape:
- High-Dimensional Embeddings: Data is converted into long lists of numbers (vectors) that represent features. A single word or image can be represented by hundreds or thousands of coordinates.
- The Distance Metric: Instead of asking “Does column A equal ‘Apple’?”, these databases ask, “How close is point A to point B in space?” using formulas like Cosine Similarity or Euclidean Distance (a small worked sketch follows this list).
- Handling Unstructured Data: Vectors allow us to index things that don’t fit into tables — like the “vibe” of a song or the “intent” behind a complex legal paragraph.
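To make the distance metric above concrete, here is a tiny self-contained sketch with made-up 3-dimensional vectors; real embeddings have hundreds of dimensions, and the numbers here are purely illustrative, not output from an actual embedding model:

import numpy as np

# Toy 3-dimensional "embeddings" -- real models use hundreds of dimensions
apple  = np.array([0.9, 0.8, 0.1])
fruit  = np.array([0.8, 0.9, 0.2])
laptop = np.array([0.1, 0.2, 0.9])

def cosine_similarity(a, b):
    # 1.0 means "pointing the same way" (same meaning), 0.0 means unrelated
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_distance(a, b):
    # Smaller distance means more similar
    return float(np.linalg.norm(a - b))

print(cosine_similarity(apple, fruit))   # high  -> "apple" is close to "fruit"
print(cosine_similarity(apple, laptop))  # low   -> "apple" is far from "laptop"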
The Evolution of Data Storage
| Era | Dominant Technology | Key Characteristic |
| ----------------- | --------------------------- | ---------------------------------------------------------------- |
| **1960s** | Navigational (IMS) | Rigid, tree-like structures. |
| **1970s - 2000s** | Relational (RDBMS) | Rows, columns, and strict schemas (SQL). |
| **2010s** | NoSQL (Document, Key-Value) | Flexible schemas for Big Data (MongoDB, Cassandra). |
| **2020s**         | **Vector Databases**        | High-dimensional math for AI (AstraDB, Pinecone, Milvus, Weaviate).|
Comparison of Search Logic
| Feature | Traditional (Relational) | Vector Database |
| --------------- | ---------------------------- | -------------------------------- |
| **Search Type** | Keyword / Exact Match | Semantic / Similarity |
| **Input** | Structured (Strings, Ints) | Unstructured (Embeddings) |
| **Query Logic** | Boolean (AND/OR) | Nearest Neighbor (k-NN) |
| **Ideal For** | Financial records, Inventory | LLMs, Image search, Recommenders |
When to Choose a Vector Database: Use Cases That Demand Semantic Understanding
While relational databases remain the backbone for structured transactional data, vector databases excel in scenarios requiring a deep understanding of content and context. Here are the primary use cases where a vector database is not just an option, but often the optimal choice:
Semantic Search and Recommendation Systems:
- Beyond Keywords: Imagine searching for “tools for gardening” and getting results not just for “gardening tools” but also for “pruning shears” or “seed planters.” Vector databases power this by understanding the meaning behind your query.
- Personalized Recommendations: Whether it’s product suggestions on an e-commerce site, movie recommendations on a streaming platform, or articles to read, vector databases match users to items based on semantic similarity of their preferences and item descriptions. They can suggest “films like that obscure indie drama you watched” rather than just “more dramas.”
Large Language Models (LLMs) and Generative AI:
- Retrieval-Augmented Generation (RAG): LLMs have a knowledge cut-off and can “hallucinate.” Vector databases allow LLMs to access and incorporate up-to-date, proprietary, or highly specific information. When an LLM needs to answer a question about your company’s latest internal policy, it can query a vector database containing your policy documents, retrieving relevant context to formulate an accurate answer. A minimal retrieval sketch follows this list.
- Contextual Understanding: For chatbots and AI assistants, vector databases provide the memory and contextual awareness to understand complex queries and maintain coherent conversations by finding semantically similar past interactions or knowledge base entries.
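To make the RAG flow concrete, here is a minimal sketch. It assumes a vector store object exposing a search_semantic(query, limit) method (like the VectorDB class Bob builds later in this post) and a placeholder call_llm function; both names are illustrative, not a specific library API.

# Minimal RAG sketch -- illustration only, not production code
def answer_with_rag(question, vector_store, call_llm, k=5):
    # 1. Retrieve the k documents most semantically similar to the question
    docs, _ = vector_store.search_semantic(question, limit=k)
    # 2. Put the retrieved text into the prompt as grounding context
    context = "\n\n".join(doc["description"] for doc in docs)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    # 3. The LLM answers from retrieved facts instead of its frozen training data
    return call_llm(prompt)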
Image, Audio, and Video Search:
- Content-Based Retrieval: Instead of tagging images manually, you can search for “pictures of a red sports car” even if the tag “red sports car” doesn’t exist, as long as the image’s visual features (its vector) are close to what “red sports car” looks like in vector space.
- Duplicate Detection: Efficiently find exact or near-duplicate images, videos, or audio clips, which is crucial for media management, copyright protection, or content moderation.
Anomaly Detection and Fraud Prevention:
- Identifying Outliers: By converting normal patterns (e.g., transaction behaviors, network traffic) into vectors, a vector database can quickly identify data points that are “unusually far” from the norm, signaling potential fraud or system anomalies.
Biometric and Genomic Search:
- Matching Complex Patterns: Facial recognition, fingerprint matching, or comparing complex genomic sequences are fundamentally tasks of finding nearest neighbors in a high-dimensional space — perfectly suited for vector databases.
When NOT to Use a Vector Database (and Stick with Traditional SQL)
- Strictly structured data with exact match requirements: Financial transactions, user login credentials, inventory counts, or any data where referential integrity and ACID compliance are paramount.
- Simple keyword search: If your needs are met by basic LIKE queries on indexed text columns, a traditional database is often simpler and more cost-effective.
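To illustrate that second point, a plain LIKE query is often all you need when users search for words that literally appear in the stored text. A minimal SQLite sketch (the table and data are made up for illustration):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO articles (title) VALUES (?)",
                 [("Gardening tools on sale",), ("Office chair review",)])

# Simple keyword search: fast, cheap, and sufficient when the query words
# actually occur in the stored text (SQLite LIKE is case-insensitive for ASCII)
rows = conn.execute("SELECT id, title FROM articles WHERE title LIKE ?",
                    ("%gardening%",)).fetchall()
print(rows)  # [(1, 'Gardening tools on sale')]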
Architectural Blueprint: Tables vs. Graphs
Traditional databases are built for integrity and organization, while vector databases are built for speed and proximity.
- Relational Architecture: Data is stored in “B-Trees” or similar structures optimized for sorting and exact matching. When you query “User ID 101,” the database follows a specific path to that exact leaf.
- Vector Architecture: Data is often stored in a “Graph” or “Cluster.” Because searching through billions of high-dimensional vectors one-by-one would be too slow, vector databases use Approximate Nearest Neighbor (ANN) algorithms.
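To give a feel for how an ANN index is used in practice, here is a minimal sketch with the hnswlib library. Bob’s sample app does not use hnswlib (it brute-forces cosine similarity with numpy); this only illustrates the idea of trading a little accuracy for a lot of speed:

import hnswlib
import numpy as np

dim, num_items = 384, 10_000
vectors = np.random.rand(num_items, dim).astype(np.float32)  # stand-in embeddings

# Build an HNSW graph index: M and ef_construction control graph density/quality
index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=num_items, ef_construction=200, M=16)
index.add_items(vectors, np.arange(num_items))
index.set_ef(50)  # query-time accuracy vs. speed knob

# Approximate 5 nearest neighbours -- fast even with millions of vectors
query = np.random.rand(dim).astype(np.float32)
labels, distances = index.knn_query(query, k=5)
print(labels, distances)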
The Core Components
| Component | Traditional SQL | Vector Database |
| ------------------ | --------------------- | -------------------------- |
| **Indexing** | B-Tree, Hash Index | HNSW, IVF, PQ |
| **Storage** | Row-based or Columnar | Compressed Vector Blobs |
| **Query Language** | Structured (SQL) | API-based (Python/JS SDKs) |
| **Consistency** | Strong (ACID) | Often Eventual Consistency |
Scaling the Architecture
A traditional database is like a Dewey Decimal System in a library — it tells you exactly which shelf and book you need based on a code. A vector database is like a Human Brain — it doesn’t remember the exact shelf, but it can find a “vibe” or a “topic” by looking at what other books are nearby.
Understanding Vector Databases: A Practical Comparison with Traditional SQL (Sample Implementation and Post Using Bob)
After setting the stage with the long introduction above, let’s jump straight into the sample application and the insightful blog post Bob created (excerpts provided, full article written by Bob on my GitHub)! 🚀 Bob has meticulously documented his journey, showcasing the exact moments where SQL’s precision met its match against the fluid intelligence of a vector-driven approach. Whether you’re a student looking for clarity or a developer ready to choose your next stack, Bob’s walkthrough provides the perfect roadmap.
Introduction: The Search Problem
Imagine you’re building an e-commerce platform. A customer types “comfortable seating for my home office” into your search bar. With a traditional SQL database, you might get zero results — because none of your products contain that exact phrase. But with a vector database, you’d instantly find ergonomic office chairs, adjustable stools, and cushioned ottomans.
This is the fundamental difference between exact matching and semantic understanding, and it’s revolutionizing how we build search systems.
Scenario 1: The Traditional SQL Approach
User Query: “exercise equipment”
What Happens Behind the Scenes:
SELECT * FROM products
WHERE name LIKE '%exercise%'
OR description LIKE '%exercise%'
OR name LIKE '%equipment%'
OR description LIKE '%equipment%'
Result: Zero matches
Why? Because none of our products contain the exact words “exercise” or “equipment” in their names or descriptions. We have dumbbells, resistance bands, and yoga mats — all clearly exercise equipment — but the database doesn’t understand this relationship.
Performance: Lightning fast (~2ms), but useless if it finds nothing.
# traditional_db.py
"""
Traditional SQLite Database Module
Demonstrates traditional relational database operations with exact text matching
"""
import sqlite3
import time
from typing import List, Dict, Any
class TraditionalDB:
def __init__(self, db_path: str = "data/traditional.db"):
self.db_path = db_path
self.conn = None
# Ensure data directory exists
import os
os.makedirs(os.path.dirname(db_path), exist_ok=True)
self.initialize_db()
def initialize_db(self):
"""Initialize the traditional SQLite database with schema"""
self.conn = sqlite3.connect(self.db_path, check_same_thread=False)
self.conn.row_factory = sqlite3.Row
cursor = self.conn.cursor()
# Create products table
cursor.execute("""
CREATE TABLE IF NOT EXISTS products (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL,
description TEXT NOT NULL,
category TEXT NOT NULL,
price REAL NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
""")
# Create index for faster searches
cursor.execute("""
CREATE INDEX IF NOT EXISTS idx_products_name
ON products(name)
""")
cursor.execute("""
CREATE INDEX IF NOT EXISTS idx_products_category
ON products(category)
""")
self.conn.commit()
def insert_product(self, name: str, description: str, category: str, price: float) -> int:
"""Insert a product into the database"""
cursor = self.conn.cursor()
cursor.execute("""
INSERT INTO products (name, description, category, price)
VALUES (?, ?, ?, ?)
""", (name, description, category, price))
self.conn.commit()
return cursor.lastrowid
def search_exact(self, query: str) -> tuple[List[Dict[str, Any]], float]:
"""
Exact text search using SQL LIKE
Returns: (results, execution_time)
"""
start_time = time.time()
cursor = self.conn.cursor()
search_pattern = f"%{query}%"
cursor.execute("""
SELECT id, name, description, category, price
FROM products
WHERE name LIKE ? OR description LIKE ? OR category LIKE ?
ORDER BY
CASE
WHEN name LIKE ? THEN 1
WHEN category LIKE ? THEN 2
ELSE 3
END
LIMIT 20
""", (search_pattern, search_pattern, search_pattern, search_pattern, search_pattern))
results = [dict(row) for row in cursor.fetchall()]
execution_time = time.time() - start_time
return results, execution_time
def get_all_products(self, limit: int = 100) -> List[Dict[str, Any]]:
"""Get all products"""
cursor = self.conn.cursor()
cursor.execute("""
SELECT id, name, description, category, price
FROM products
LIMIT ?
""", (limit,))
return [dict(row) for row in cursor.fetchall()]
def get_stats(self) -> Dict[str, Any]:
"""Get database statistics"""
cursor = self.conn.cursor()
cursor.execute("SELECT COUNT(*) as count FROM products")
total_products = cursor.fetchone()['count']
cursor.execute("SELECT COUNT(DISTINCT category) as count FROM products")
total_categories = cursor.fetchone()['count']
cursor.execute("SELECT AVG(price) as avg_price FROM products")
avg_price = cursor.fetchone()['avg_price'] or 0
return {
'total_products': total_products,
'total_categories': total_categories,
'avg_price': round(avg_price, 2),
'db_type': 'Traditional SQL'
}
def clear_all(self):
"""Clear all products"""
cursor = self.conn.cursor()
cursor.execute("DELETE FROM products")
self.conn.commit()
def close(self):
"""Close database connection"""
if self.conn:
self.conn.close()
# Made with Bob
Scenario 2: The Vector Database Approach
User Query: “exercise equipment”
What Happens Behind the Scenes:
- The query is converted into a 384-dimensional vector (an embedding)
- Each product’s description is also represented as a vector
- The system calculates similarity scores using cosine similarity
- Results are ranked by how semantically close they are to the query
Results:
- Dumbbells Set (92% similarity)
- Resistance Bands (88% similarity)
- Yoga Mat Premium (85% similarity)
- Fitness Tracker (78% similarity)
Why? The vector database understands that dumbbells, resistance bands, and yoga mats are all conceptually related to “exercise equipment,” even though those exact words don’t appear in the product descriptions.
Performance: Slower (~85ms), but finds exactly what the user wants.
# vector_db.py
"""
Vector Database Module using sqlite-vec
Demonstrates semantic search capabilities using vector embeddings
"""
import sqlite_vec
import time
import numpy as np
from typing import List, Dict, Any, Optional
from sentence_transformers import SentenceTransformer
import sqlite3
class VectorDB:
def __init__(self, db_path: str = "data/vector.db"):
self.db_path = db_path
self.conn = None
# Ensure data directory exists
import os
os.makedirs(os.path.dirname(db_path), exist_ok=True)
self.model = SentenceTransformer('all-MiniLM-L6-v2') # 384-dimensional embeddings
self.embedding_dim = 384
self.initialize_db()
def initialize_db(self):
"""Initialize the vector database with schema"""
self.conn = sqlite3.connect(self.db_path, check_same_thread=False)
self.conn.row_factory = sqlite3.Row
# Try to load sqlite-vec extension (optional)
# If it fails, we'll use numpy-based similarity which works fine
try:
if hasattr(self.conn, 'enable_load_extension'):
self.conn.enable_load_extension(True)
sqlite_vec.load(self.conn)
self.conn.enable_load_extension(False)
print("[INFO] sqlite-vec extension loaded successfully")
else:
print("[INFO] Using numpy-based similarity search (extension loading not available)")
except Exception as e:
print(f"[INFO] Using numpy-based similarity search: {e}")
cursor = self.conn.cursor()
# Create products table
cursor.execute("""
CREATE TABLE IF NOT EXISTS products (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL,
description TEXT NOT NULL,
category TEXT NOT NULL,
price REAL NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
""")
# Create embeddings table
cursor.execute("""
CREATE TABLE IF NOT EXISTS embeddings (
product_id INTEGER PRIMARY KEY,
embedding BLOB NOT NULL,
FOREIGN KEY (product_id) REFERENCES products(id) ON DELETE CASCADE
)
""")
self.conn.commit()
def _generate_embedding(self, text: str) -> np.ndarray:
"""Generate embedding vector for text"""
return self.model.encode(text, convert_to_numpy=True)
def _cosine_similarity(self, vec1: np.ndarray, vec2: np.ndarray) -> float:
"""Calculate cosine similarity between two vectors"""
dot_product = np.dot(vec1, vec2)
norm1 = np.linalg.norm(vec1)
norm2 = np.linalg.norm(vec2)
return float(dot_product / (norm1 * norm2))
def insert_product(self, name: str, description: str, category: str, price: float) -> int:
"""Insert a product and its embedding into the database"""
cursor = self.conn.cursor()
# Insert product
cursor.execute("""
INSERT INTO products (name, description, category, price)
VALUES (?, ?, ?, ?)
""", (name, description, category, price))
product_id = cursor.lastrowid
# Generate and store embedding
combined_text = f"{name} {description} {category}"
embedding = self._generate_embedding(combined_text)
embedding_blob = embedding.tobytes()
cursor.execute("""
INSERT INTO embeddings (product_id, embedding)
VALUES (?, ?)
""", (product_id, embedding_blob))
self.conn.commit()
return product_id
def search_semantic(self, query: str, limit: int = 20) -> tuple[List[Dict[str, Any]], float]:
"""
Semantic search using vector similarity
Returns: (results, execution_time)
"""
start_time = time.time()
# Generate query embedding
query_embedding = self._generate_embedding(query)
cursor = self.conn.cursor()
# Get all products with embeddings
cursor.execute("""
SELECT p.id, p.name, p.description, p.category, p.price, e.embedding
FROM products p
JOIN embeddings e ON p.id = e.product_id
""")
results = []
for row in cursor.fetchall():
# Convert blob back to numpy array
stored_embedding = np.frombuffer(row['embedding'], dtype=np.float32)
# Calculate similarity
similarity = self._cosine_similarity(query_embedding, stored_embedding)
results.append({
'id': row['id'],
'name': row['name'],
'description': row['description'],
'category': row['category'],
'price': row['price'],
'similarity': round(similarity, 4)
})
# Sort by similarity (highest first)
results.sort(key=lambda x: x['similarity'], reverse=True)
results = results[:limit]
execution_time = time.time() - start_time
return results, execution_time
def get_all_products(self, limit: int = 100) -> List[Dict[str, Any]]:
"""Get all products"""
cursor = self.conn.cursor()
cursor.execute("""
SELECT id, name, description, category, price
FROM products
LIMIT ?
""", (limit,))
return [dict(row) for row in cursor.fetchall()]
def get_stats(self) -> Dict[str, Any]:
"""Get database statistics"""
cursor = self.conn.cursor()
cursor.execute("SELECT COUNT(*) as count FROM products")
total_products = cursor.fetchone()['count']
cursor.execute("SELECT COUNT(DISTINCT category) as count FROM products")
total_categories = cursor.fetchone()['count']
cursor.execute("SELECT AVG(price) as avg_price FROM products")
avg_price = cursor.fetchone()['avg_price'] or 0
cursor.execute("SELECT COUNT(*) as count FROM embeddings")
total_embeddings = cursor.fetchone()['count']
return {
'total_products': total_products,
'total_categories': total_categories,
'avg_price': round(avg_price, 2),
'total_embeddings': total_embeddings,
'embedding_dimension': self.embedding_dim,
'db_type': 'Vector Database'
}
def clear_all(self):
"""Clear all products and embeddings"""
cursor = self.conn.cursor()
cursor.execute("DELETE FROM embeddings")
cursor.execute("DELETE FROM products")
self.conn.commit()
def close(self):
"""Close database connection"""
if self.conn:
self.conn.close()
# Made with Bob
The Hybrid Approach: Best of Both Worlds
There is also a hybrid approach that combines the two, using each database for what it does best.
Architecture Pattern
User Query
↓
Vector Database (semantic search)
↓
Get Product IDs
↓
Traditional SQL (fetch details, apply filters)
↓
Return Results
Real-World Example: E-commerce Search
def hybrid_search(query, filters):
    # filters is a dict like {"min_price": 0, "max_price": 500, "category": "Fitness"}
    # Step 1: Use the vector DB for semantic discovery
    candidates, _ = vector_db.search_semantic(query, limit=100)
    candidate_ids = [c["id"] for c in candidates]
    # Step 2: Use SQL for exact filtering and structured details.
    # A single "IN (?)" cannot take a list, so expand one placeholder per id.
    placeholders = ",".join("?" * len(candidate_ids))
    cursor = sql_db.conn.cursor()
    cursor.execute(f"""
        SELECT * FROM products
        WHERE id IN ({placeholders})
          AND price BETWEEN ? AND ?
          AND category = ?
    """, (*candidate_ids, filters["min_price"],
          filters["max_price"], filters["category"]))
    # Add your own ORDER BY (e.g. popularity) if the table has such a column
    return [dict(row) for row in cursor.fetchall()]
Benefits:
- Semantic understanding from vector DB
- Fast filtering from SQL
- Structured data handling
- Best user experience
Used By:
- Amazon (search + filters)
- Airbnb (semantic search + availability)
- Spotify (recommendations + user preferences)
Making the Decision: A Framework
Use this decision tree when choosing your database approach:
Choose Traditional SQL When:
- ✅ You need exact matching (IDs, SKUs, account numbers)
- ✅ Data is highly structured
- ✅ Queries are predictable
- ✅ Performance is critical (<10ms)
- ✅ Compliance requires exact audit trails
- ✅ Budget is limited
Choose Vector Database When:
- ✅ Users search in natural language
- ✅ You need semantic understanding
- ✅ Content is unstructured (text, images)
- ✅ Recommendations are important
- ✅ Synonyms and concepts matter
- ✅ User experience is priority
Choose Hybrid Approach When:
- ✅ You need both semantic search AND filtering
- ✅ Scale is important
- ✅ You want the best user experience
- ✅ You have resources for complexity
The Future: Where Are We Heading?
Emerging Trends
- Specialized Hardware: GPUs and TPUs are making vector operations faster and cheaper.
- Improved Algorithms: HNSW (Hierarchical Navigable Small World) graphs make vector search nearly as fast as traditional indexes.
- Multimodal Search: Combining text, images, and audio in a single vector space.
- Edge Computing: Running vector search on devices, not just servers.
What This Means for Developers
- Vector databases are becoming mainstream
- Hybrid approaches are the new standard
- Understanding both paradigms is essential
- The gap in performance is closing
Conclusion: Putting Theory into Practice
Through our journey with Bob, we’ve moved from the high-level history of data storage to a functioning, side-by-side application that proves why this evolution matters. By building this educational tool, we’ve demonstrated that the choice between a Traditional SQL Database and a Vector Database isn’t about which is “better,” but which is right for the specific task at hand.
Summary of What We’ve Accomplished:
- Built a Dual-Search App: We created a Flask-based backend and a modern web frontend that allows Bob to search 25 diverse products using both paradigms simultaneously.
- Proved the “Semantic Gap”: Our application showed that while SQL is lightning fast (~1–5ms), it fails on queries like “comfortable seating” if the exact words aren’t in the metadata.
- Showcased Vector Intelligence: Using sqlite-vec and the all-MiniLM-L6-v2 model, we enabled Bob’s app to understand that “dumbbells” and “yoga mats” are conceptually related to “exercise equipment.”
- Identified Performance Trade-offs: We documented that vector search, while more “intelligent,” requires more storage (~1.7KB per product vs 200 bytes) and higher computation time (~50–100ms).
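That storage figure is easy to sanity-check: a 384-dimensional float32 embedding takes 384 x 4 = 1,536 bytes, roughly 1.5 KB before any row or index overhead, which lines up with the ~1.7 KB per product measured in the app.

import numpy as np

# One all-MiniLM-L6-v2 style embedding: 384 float32 values
embedding = np.zeros(384, dtype=np.float32)
print(embedding.nbytes)  # 1536 bytes, i.e. ~1.5 KB before index/row overhead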
The Bottom Line
As Bob learned, the future of development is often hybrid. Traditional SQL remains the king of precision, security, and structured transactions like SKU lookups and user authentication. However, for any application that needs to “understand” a user — like a recommendation engine or an AI-powered chatbot — a Vector Database is the essential engine.
Bob’s Final Tip: Don’t replace your SQL database; augment it. Use vectors for discovery and SQL for the source of truth.
Links and Resources
- SQLite: https://github.com/sqlite/sqlite
- SQLite-Vector: https://github.com/sqliteai/sqlite-vector
- Sample Code repository for this post: https://github.com/aairom/vectordb-vs-sqldb
- IBM Project Bob: https://www.ibm.com/products/bob