Kirill Tolmachev

Posted on Feb 14

Database Indexes Explained Like You're 5

#postgres #database #sql #beginners

Imagine you walk into a library with ten thousand books, and you need to find one specific title. You have two options: start at the first shelf and check every single spine until you find it, or walk over to the catalog system, look up the title, and get the exact shelf and position in seconds.

A database index works exactly like that catalog. It's a separate data structure that helps your database find rows without scanning the entire table. And once you understand how indexes work, you'll know why some queries take 2 milliseconds and others take 20 seconds — even on the same table.

The SQL examples in this article use PostgreSQL, but the concepts apply equally to MySQL, SQL Server, SQLite, and pretty much any relational database you'll encounter.

The phone book you never asked for

Here's another way to think about it. A phone book is sorted alphabetically by last name. If you need to find "Martinez," you can flip to the M section and get there quickly. That's an index at work — the data is organized in a way that makes lookup fast.

But what if someone asks you to find everyone in the phone book with the phone number 555-0142? Now you're stuck reading every single entry, because the book isn't sorted by phone number. There's no shortcut.

That's what happens inside your database when you run a query against a column that has no index.

What happens without an index: the sequential scan

Let's say you have a users table with a million rows, and you run this:

SELECT * FROM users WHERE email = 'ada@example.com';

Without an index on the email column, PostgreSQL has to perform a sequential scan — it reads every row in the table, checks whether email matches, and moves on. For a million rows, that's a million comparisons. You can see this happening with EXPLAIN:

EXPLAIN SELECT * FROM users WHERE email = 'ada@example.com';

Seq Scan on users  (cost=0.00..20834.00 rows=1 width=72)
  Filter: (email = 'ada@example.com'::text)

That Seq Scan label is telling you the database is walking through the entire table. For small tables this is fine — maybe even faster than using an index. But once your table grows, sequential scans become a real bottleneck.

B-tree indexes: the workhorse

When you create an index in most databases, you're getting a B-tree index by default. B-tree stands for "balanced tree," and you can picture it like a decision tree that narrows down where your data lives.

Think of it this way: imagine a dictionary. You don't start at page one and read every word. You know that "M" words are roughly in the middle, so you open there. Then you narrow it down — "Ma" words, then "Mar," then "Martinez." Each step cuts out a huge chunk of the remaining pages.

A B-tree works similarly. It organizes values in a sorted, tree-like structure where each level of the tree narrows the search space dramatically. Finding a value in a million-row table might only require checking 20 or so nodes in the tree, instead of scanning all million rows.

Creating one is straightforward:

CREATE INDEX idx_users_email ON users (email);

Now run that same query again:

EXPLAIN SELECT * FROM users WHERE email = 'ada@example.com';

Index Scan using idx_users_email on users  (cost=0.42..8.44 rows=1 width=72)
  Index Cond: (email = 'ada@example.com'::text)

The cost dropped from ~20,834 to ~8. Instead of scanning the whole table, PostgreSQL jumped straight to the right spot via the index. That's the kind of difference that turns a 3-second API response into a 10-millisecond one.

Composite indexes: column order matters more than you think

Sometimes your queries filter on more than one column. Say you're looking up orders by customer and status:

SELECT * FROM orders
WHERE customer_id = 42 AND status = 'shipped';

You could create two separate indexes — one on customer_id and one on status — but a composite index covering both columns is usually more efficient:

CREATE INDEX idx_orders_customer_status ON orders (customer_id, status);

Here's the thing that trips people up: the order of columns in a composite index matters a lot. Think of it like a phone book sorted by last name, then first name. You can quickly look up everyone named "Martinez," and within those, you can find "Sofia Martinez" easily. But you can't efficiently search for everyone named "Sofia" regardless of last name — the book isn't organized that way.

The same logic applies to composite indexes. The index above works great for:

WHERE customer_id = 42 (uses the index — it's the leading column)
WHERE customer_id = 42 AND status = 'shipped' (uses both columns)

But it won't help much with:

WHERE status = 'shipped' (the leading column is customer_id, so the index can't narrow things down)

The rule of thumb is: put the column you filter on most often (or the most selective one) first.

When indexes hurt instead of help

Indexes aren't free. Every index you add comes with trade-offs, and understanding them keeps you from making your database slower while trying to make it faster.

Write overhead is the big one. Every time you INSERT, UPDATE, or DELETE a row, the database has to update not just the table but every index on that table too. If you have ten indexes on a table that gets thousands of writes per second, those index updates add up fast. Your reads get quicker, but your writes get slower.

Storage is the other cost. An index is a real data structure sitting on disk, and for large tables, indexes can take up significant space — sometimes as much as the table itself.

So indexes are a trade-off: faster reads in exchange for slower writes and more disk usage. For a table that gets read a thousand times for every write, indexes are almost always worth it. For a high-write logging table that rarely gets queried, you might want to keep indexes minimal.

Covering indexes: skipping the table entirely

Normally, when PostgreSQL uses an index to find matching rows, it still has to go back to the actual table to fetch the full row data. The index told it where the row is, but it still needs to read the row itself. This is sometimes called a "table lookup" or "heap fetch."

A covering index is one that contains all the columns your query needs, so the database never has to visit the table at all. It gets everything it needs directly from the index.

For example, suppose you frequently run this:

SELECT email, created_at FROM users WHERE email = 'ada@example.com';

If you create an index that includes both columns:

CREATE INDEX idx_users_email_created ON users (email, created_at);

PostgreSQL can answer the query entirely from the index — it finds the email in the B-tree and the created_at value is right there alongside it. You'll see this show up as an "Index Only Scan" in EXPLAIN:

Index Only Scan using idx_users_email_created on users  (cost=0.42..4.44 rows=1 width=40)
  Index Cond: (email = 'ada@example.com'::text)

That "Index Only Scan" means zero table lookups, which is about as fast as it gets. This technique is particularly useful for queries that run thousands of times per minute, where shaving off even a small amount of I/O adds up.

Common mistakes (and how to avoid them)

Indexing every column. More indexes doesn't mean faster. Each one costs write performance and storage. Be intentional — index columns that actually appear in your WHERE clauses, JOIN conditions, and ORDER BY clauses.

Forgetting to index columns you filter and join on. This is the flip side. If you have a query that joins orders to customers on customer_id, and there's no index on orders.customer_id, that join is doing a sequential scan every time. Check your slow queries and see what columns they're filtering on.

Wrong column order in composite indexes. As we covered above, an index on (status, customer_id) won't help a query that only filters on customer_id. Think about which queries will actually use the index, and put the most commonly filtered or most selective column first.

Never looking at EXPLAIN. You don't have to guess whether an index is being used — the database will tell you. Get comfortable running EXPLAIN (or EXPLAIN ANALYZE for actual execution times) on your slow queries. It takes ten seconds and saves hours of guessing.

EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 42 AND status = 'shipped';

This gives you the query plan and the actual execution time, so you can see exactly what changed after adding an index.

Rules of thumb

When you're deciding whether to add an index, these guidelines will serve you well in the vast majority of cases:

Always index foreign keys. If a column is used in JOIN conditions, it should probably have an index. Some databases do this automatically; PostgreSQL does not.
Index columns in your WHERE clause — especially on large tables where you're selecting a small percentage of rows.
Think about selectivity. An index on a boolean column with two possible values rarely helps, because the database still has to read half the table. An index on email (where every value is unique) is extremely effective.
Fewer, smarter indexes beat more indexes. One well-designed composite index can serve multiple queries. Three separate single-column indexes often can't.
Monitor before and after. Use EXPLAIN ANALYZE to verify that your new index actually improves the query. Sometimes the optimizer ignores it, and you've added write overhead for nothing.
Don't forget partial indexes. In PostgreSQL, you can index only a subset of rows: CREATE INDEX idx_active_users ON users (email) WHERE active = true; — smaller index, faster lookups, lower overhead.

Where to go from here

If you've been writing queries without thinking about indexes, now's a great time to open up your slowest endpoint, run EXPLAIN ANALYZE on its main query, and see what the database is actually doing. You might find a sequential scan on a table with millions of rows, and a single CREATE INDEX statement could cut your response time by 99%.

Indexes are one of those concepts where a little knowledge goes a long way. You don't need to be a DBA to get this right — you just need to understand the trade-offs and pay attention to your query plans. Start with your slowest queries, add indexes where they make sense, and measure the difference.

Got a query that's been bugging you? Drop it in the comments — let's figure out if an index can fix it.

Top comments (1)

Amar Abaz • Feb 15 • Edited

Nicely explained👌