Piter Adyson
PostgreSQL indexing explained — 5 index types and when to use each

Indexes are one of those things that everybody knows they should use, but few people actually understand beyond the basics. You create an index, the query gets faster, done. Except when it doesn't. Or when the wrong index makes things slower. Or when you're running five indexes on a table and none of them are being used.

PostgreSQL ships with five distinct index types, each designed for different access patterns. Picking the right one can be the difference between a query that takes 2 milliseconds and one that takes 20 seconds. This article covers all five: when each actually helps, and when it's a waste of disk space.


How PostgreSQL indexes work under the hood

Before jumping into specific types, it helps to understand what an index actually does. A PostgreSQL index is a separate data structure that maps column values to the physical location of rows on disk. When you run a query with a WHERE clause, the planner checks whether an index exists that can narrow down the search instead of scanning every row.

Without an index, PostgreSQL performs a sequential scan. It reads the entire table, row by row, checking each one against your filter. For a table with 100 rows, that's fine. For a table with 100 million rows, it's a problem.

-- Without an index, this scans the entire table
SELECT * FROM orders WHERE customer_id = 'abc-123';

-- With an index on customer_id, PostgreSQL jumps directly to matching rows
CREATE INDEX idx_orders_customer_id ON orders (customer_id);

Indexes aren't free though. Every index takes disk space and slows down INSERT, UPDATE and DELETE operations because PostgreSQL has to maintain the index alongside the table data. A table with ten indexes means every write operation updates ten additional data structures.
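To see what your existing indexes already cost, PostgreSQL exposes built-in size functions. A quick sketch (the table name is illustrative):

```sql
-- Compare the total index footprint against the table itself
SELECT pg_size_pretty(pg_relation_size('orders'))  AS table_size,
       pg_size_pretty(pg_indexes_size('orders'))   AS total_index_size;
```

If the index footprint rivals or exceeds the table size, that's a hint to audit which indexes actually earn their keep.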

The goal is to have the right indexes for your query patterns and nothing more.

1. B-tree — the default workhorse

B-tree is the default index type in PostgreSQL. If you run CREATE INDEX without specifying a type, you get a B-tree. It handles equality and range queries on sortable data, which covers the vast majority of real-world use cases.

B-tree indexes store data in a balanced tree structure. Each node contains sorted keys and pointers to child nodes, allowing PostgreSQL to find any value in O(log n) time. They support =, <, >, <=, >=, BETWEEN and IS NULL operators efficiently.

-- All of these use B-tree indexes effectively
CREATE INDEX idx_orders_created_at ON orders (created_at);

SELECT * FROM orders WHERE created_at > '2026-01-01';
SELECT * FROM orders WHERE created_at BETWEEN '2026-01-01' AND '2026-02-01';
SELECT * FROM orders WHERE created_at = '2026-02-08';

-- Multi-column B-tree indexes
CREATE INDEX idx_orders_customer_date ON orders (customer_id, created_at);

-- This uses the index (leftmost prefix rule)
SELECT * FROM orders WHERE customer_id = 'abc-123' AND created_at > '2026-01-01';

-- This also uses the index (first column matches)
SELECT * FROM orders WHERE customer_id = 'abc-123';

-- This does NOT use the index efficiently (skips the first column)
SELECT * FROM orders WHERE created_at > '2026-01-01';

The column order in multi-column B-tree indexes matters a lot. PostgreSQL can use the index starting from the leftmost column. If your query only filters on the second column, the index likely won't help.

Scenario                                    | B-tree works?
--------------------------------------------|------------------
Exact match (WHERE status = 'active')       | Yes
Range queries (WHERE price > 100)           | Yes
Sorting (ORDER BY created_at DESC)          | Yes
Pattern matching (WHERE name LIKE 'John%')  | Yes (prefix only)
Pattern matching (WHERE name LIKE '%John%') | No
Array or JSON containment                   | No

Note that prefix LIKE matching only uses a B-tree when the column's collation is C or the index is created with the text_pattern_ops operator class.

B-tree is the right choice for primary keys, foreign keys, timestamp columns used in range filters and any column you frequently sort on. If you're unsure which index type to use, B-tree is almost always a safe starting point.

2. Hash — fast equality lookups

Hash indexes build a hash table mapping each value to the row locations that contain it. They only support equality comparisons (=), but they do it with O(1) lookup time instead of O(log n) for B-tree.

Before PostgreSQL 10, hash indexes were not crash-safe because they weren't WAL-logged. That made them basically unusable in production. Since PostgreSQL 10, they're fully crash-safe and a reasonable option for specific workloads.

CREATE INDEX idx_sessions_token ON sessions USING hash (token);

-- This uses the hash index
SELECT * FROM sessions WHERE token = 'a1b2c3d4e5f6';

-- This does NOT use the hash index (not an equality check)
SELECT * FROM sessions WHERE token > 'a1b2c3d4e5f6';

Hash indexes are smaller than B-tree indexes for the same data, which can matter for large tables with high-cardinality columns. If you have a table with 50 million rows and you only ever look up by an exact session token or API key, a hash index uses less memory and disk.

In practice, the difference is often marginal. B-tree handles equality just fine, and it also supports range queries as a bonus. Most PostgreSQL users never create a hash index. But if you're optimizing a high-throughput lookup table where every byte of index size matters, it's worth benchmarking.

When to use hash over B-tree:

  • Exact match queries only, no range scans
  • Very high cardinality columns (UUIDs, tokens, hashes)
  • You want the smallest possible index size
  • You've benchmarked and confirmed it outperforms B-tree for your workload
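Benchmarking the size claim is straightforward: build both index types on the same column and compare. A sketch, assuming the sessions table from above (index names are illustrative):

```sql
-- Build both index types on the same high-cardinality column
CREATE INDEX idx_sessions_token_btree ON sessions (token);
CREATE INDEX idx_sessions_token_hash  ON sessions USING hash (token);

-- Compare their on-disk sizes
SELECT indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE relname = 'sessions';
```

Drop whichever one loses; keeping both just doubles your write overhead on that column.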

3. GIN — for full-text search, arrays and JSONB

GIN stands for Generalized Inverted Index. It's designed for values that contain multiple elements, like arrays, JSONB documents and full-text search vectors. Where a B-tree maps one value to one row, a GIN index maps each element inside a composite value to the rows that contain it.

Think of it like a book index at the back of a textbook. You look up a word and it tells you all the pages where that word appears. GIN does the same thing for array elements, JSON keys and text lexemes.

-- Full-text search
CREATE INDEX idx_articles_search ON articles USING gin (to_tsvector('english', body));

SELECT * FROM articles
WHERE to_tsvector('english', body) @@ to_tsquery('postgresql & indexing');

-- JSONB containment
CREATE INDEX idx_events_data ON events USING gin (metadata);

SELECT * FROM events
WHERE metadata @> '{"source": "api", "version": 2}';

-- Array containment
CREATE INDEX idx_products_tags ON products USING gin (tags);

SELECT * FROM products
WHERE tags @> ARRAY['electronics', 'wireless'];

GIN indexes are slower to build and update than B-tree indexes. Every insert potentially needs to update many entries in the inverted index. For write-heavy tables, this can be a noticeable overhead. PostgreSQL mitigates this with "fastupdate" which batches pending index entries, but it means the index can be slightly behind during heavy writes.
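The fastupdate behavior is controllable per index. A sketch of the two knobs (the index name matches the JSONB example above; whether you want either depends on your write patterns):

```sql
-- Option 1: disable the pending list entirely, trading slower writes
-- for an index that is always fully up to date
CREATE INDEX idx_events_data ON events USING gin (metadata)
WITH (fastupdate = off);

-- Option 2: keep fastupdate on and flush the pending list on demand,
-- e.g. after a bulk load, instead of waiting for autovacuum
SELECT gin_clean_pending_list('idx_events_data'::regclass);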

Feature                                      | B-tree   | GIN
---------------------------------------------|----------|---------------
Equality and range queries                   | Yes      | No
Full-text search (@@)                        | No       | Yes
Array containment (@>)                       | No       | Yes
JSONB containment and key existence (@>, ?, ?&) | No    | Yes
Index build speed                            | Fast     | Slow
Write overhead                               | Low      | Medium to high
Index size                                   | Moderate | Large

GIN is the correct choice whenever you need to search within composite values. If you're running WHERE tags @> ..., WHERE metadata @> ... or WHERE tsvector @@ tsquery, a GIN index is what you want. Just be aware that it comes with higher write costs and larger disk usage compared to B-tree.

4. GiST — for geometric, range and proximity queries

GiST stands for Generalized Search Tree. It's a framework for building custom index types, but in practice it's mostly used for geometric data (points, polygons, circles), range types (date ranges, integer ranges) and full-text search (as an alternative to GIN).

GiST indexes work by recursively partitioning the search space. For geometric data, imagine dividing a map into progressively smaller regions. To find all restaurants within 500 meters, the index eliminates entire regions that are too far away without checking individual rows.

-- PostGIS spatial queries
CREATE INDEX idx_locations_geo ON locations USING gist (coordinates);

SELECT * FROM locations
WHERE ST_DWithin(coordinates, ST_MakePoint(-73.985, 40.748)::geography, 500);

-- Range overlap queries
CREATE INDEX idx_reservations_period ON reservations USING gist (during);

SELECT * FROM reservations
WHERE during && daterange('2026-02-01', '2026-02-15');

-- Nearest-neighbor search
SELECT name, ST_Distance(coordinates, ST_MakePoint(-73.985, 40.748)::geography) AS distance
FROM locations
ORDER BY coordinates <-> ST_MakePoint(-73.985, 40.748)::geography
LIMIT 10;

GiST also supports full-text search, but with different trade-offs compared to GIN. GiST full-text indexes are faster to build and smaller on disk, but slower for queries, especially when a search term appears in many documents. GIN is generally preferred for full-text search unless you're combining it with other GiST-supported operations.
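Creating a GiST full-text index looks just like the GIN version with a different method keyword. A sketch, reusing the articles table from the GIN section:

```sql
-- GiST alternative to the GIN full-text index: faster to build,
-- smaller on disk, but generally slower to query
CREATE INDEX idx_articles_search_gist ON articles
USING gist (to_tsvector('english', body));
```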

When to use GiST:

  • PostGIS and geographic data (finding nearby points, intersecting polygons)
  • Range type operations (overlapping date ranges, integer ranges)
  • Nearest-neighbor queries (ORDER BY ... <->)
  • Exclusion constraints (preventing overlapping ranges in a table)

-- Exclusion constraint using GiST
-- The btree_gist extension is required so GiST can handle
-- plain equality (=) on room_id inside the constraint
CREATE EXTENSION IF NOT EXISTS btree_gist;

-- Prevents overlapping room reservations
CREATE TABLE room_bookings (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    room_id INTEGER NOT NULL,
    during TSTZRANGE NOT NULL,
    EXCLUDE USING gist (room_id WITH =, during WITH &&)
);

The exclusion constraint example is particularly useful. It guarantees at the database level that no two bookings for the same room can overlap. This is something you can't do with B-tree indexes.
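You can see the constraint fire by inserting two overlapping bookings for the same room (the timestamps here are illustrative):

```sql
-- First booking succeeds
INSERT INTO room_bookings (room_id, during)
VALUES (101, tstzrange('2026-03-01 14:00+00', '2026-03-01 16:00+00'));

-- Second booking overlaps the first for room 101, so PostgreSQL
-- rejects it: "conflicting key value violates exclusion constraint"
INSERT INTO room_bookings (room_id, during)
VALUES (101, tstzrange('2026-03-01 15:00+00', '2026-03-01 17:00+00'));
```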

5. BRIN — for large, naturally ordered tables

BRIN stands for Block Range Index. It's the most space-efficient index type PostgreSQL offers, but it only works well under a specific condition: the physical order of rows on disk must correlate with the column values.

Instead of indexing every row, BRIN indexes store summary information (min and max values) for each block range, which is a group of consecutive physical pages. When PostgreSQL scans for rows, it checks the block summaries and skips entire ranges that can't contain matching data.

-- Perfect for a time-series table where rows are inserted in chronological order
CREATE INDEX idx_logs_created_at ON access_logs USING brin (created_at);

-- This can skip huge portions of the table
SELECT * FROM access_logs
WHERE created_at BETWEEN '2026-02-01' AND '2026-02-02';

The size difference is dramatic. A B-tree index on a 100 GB table might be 2 GB. A BRIN index on the same table could be 100 KB. That's not a typo. BRIN indexes are orders of magnitude smaller because they store one summary per block range instead of one entry per row.

But this efficiency has a hard prerequisite. If the data isn't physically ordered on disk by the indexed column, BRIN is useless. If you insert rows with random timestamps, the min/max summaries for each block range will span the entire value space, and PostgreSQL won't be able to skip anything.
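When the correlation is good, you can also tune how coarse the block summaries are. The pages_per_range storage parameter defaults to 128; a sketch with a smaller value (the index name is illustrative):

```sql
-- Smaller ranges give finer-grained skipping during scans,
-- at the cost of a somewhat larger (still tiny) index
CREATE INDEX idx_logs_created_at_brin ON access_logs
USING brin (created_at) WITH (pages_per_range = 32);
```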

Good candidates for BRIN:

  • Append-only tables with timestamp columns (logs, events, audit trails)
  • Tables where rows are inserted in natural order of some column
  • Very large tables (millions or billions of rows) where B-tree index size is a concern

Bad candidates for BRIN:

  • Tables with frequent updates that change the indexed column
  • Tables where rows are inserted in random order
  • Small tables (B-tree is more efficient for small datasets)

BRIN is a specialized tool. When it fits, it's incredible. When it doesn't, it won't help at all. Check the correlation between physical row order and column values using pg_stats before deciding:

SELECT tablename, attname, correlation
FROM pg_stats
WHERE tablename = 'access_logs' AND attname = 'created_at';

A correlation value close to 1 or -1 means BRIN will work well. Values near 0 mean the data is randomly distributed and BRIN won't help.

Practical indexing tips

Knowing which index types exist is half the story. The other half is using them effectively.

Check if your indexes are actually being used. PostgreSQL tracks index usage statistics. If an index hasn't been scanned in months, it's costing you write performance for no benefit.

SELECT
    indexrelname AS index_name,
    idx_scan AS times_used,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY idx_scan ASC;

Use EXPLAIN ANALYZE before and after creating indexes. Don't assume an index will help. Verify it. Sometimes the planner chooses a sequential scan because the table is small enough that the index adds no value.

EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 'abc-123';

Consider partial indexes for filtered queries. If you only ever query active orders, index only the active rows:

CREATE INDEX idx_orders_active ON orders (customer_id)
WHERE status = 'active';

This index is smaller and faster than indexing all orders because it only covers rows matching the condition.

Don't forget about covering indexes. If a query only needs columns that are all in the index, PostgreSQL can answer it entirely from the index without touching the table. This is called an index-only scan.

CREATE INDEX idx_orders_covering ON orders (customer_id) INCLUDE (total, created_at);

-- This can be served entirely from the index
SELECT total, created_at FROM orders WHERE customer_id = 'abc-123';

Keeping your data safe while you optimize

Experimenting with indexes is relatively low-risk since you can always drop an index and try again. But schema changes, large data migrations and production experiments can go wrong in ways that are harder to undo.

Having a reliable PostgreSQL backup strategy means you can experiment with confidence. Databasus is one tool built for this: it handles automated scheduled backups with compression, encryption and multiple storage destinations, and works for individual developers and enterprise teams alike.

Choosing the right index for your workload

There's no universal "best" index type. The right choice depends entirely on your data and your queries. B-tree covers most common scenarios. GIN handles composite and full-text data. GiST solves geometric and range problems. Hash optimizes pure equality lookups. BRIN saves massive disk space on naturally ordered data.

Start with EXPLAIN ANALYZE on your slowest queries, identify what kind of operations they perform and match those operations to the appropriate index type. One well-chosen index beats five poorly chosen ones every time.
