Athreya aka Maneshwar

Posted on Feb 8

B+-Tree Structure: How Order Is Maintained at Scale

#webdev #programming #database #architecture

Hello, I'm Maneshwar. I'm working on FreeDevTools online currently building "one place for all dev tools, cheat codes, and TLDRs" — a free, open-source hub where developers can quickly find and use tools without any hassle of searching all over the internet.

Yesterday, we crossed an important boundary. We stopped talking about interfaces and began talking about shape. Today’s reading cements that shift.

At its core, a B-tree — where B stands for balanced is an answer to a very practical problem:
how do you keep a large collection of records sorted, searchable, and efficient when the data lives on disk rather than in memory?

The answer is not binary trees, and it’s not linked lists.

It’s a family of height-balanced, n-ary trees designed specifically for external storage.

From B-Tree to B+-Tree

In a classic B-tree, both internal nodes and leaf nodes store actual entries.

Search keys and data coexist throughout the tree.

This gives good performance across inserts, deletes, point lookups, and ordered traversal.

SQLite, however, primarily uses a B+-tree variant.

In a B+-tree:

All actual entries live only in leaf nodes
Leaf entries are stored as (key, value) pairs
Internal nodes store only keys and child pointers

Internal nodes exist purely to guide searches.

They are routing tables, not data holders.

As a result, internal nodes can be packed more densely, which reduces tree height and disk I/O during traversal.

Height Balance and Uniform Depth

One defining property of both B-trees and B+-trees is height balance.

Every leaf node sits at the same depth. No matter which key you search for, you traverse the same number of levels from the root to reach a leaf. There are no “deep” or “shallow” paths.

This is what gives B+-trees their predictability.
Search cost grows logarithmically with the number of entries, even as the database scales to millions or billions of rows.

Internal Nodes: Keys as Boundaries, Not Values

Internal nodes in a B+-tree are best thought of as decision points.

They contain:

An ordered list of keys
A corresponding list of child pointers

For an internal node with an upper bound of n + 1 children:

It contains at most n keys
And at most n + 1 child pointers

The interpretation of those keys follows a strict ordering rule.

The leftmost pointer refers to keys less than or equal to the first key.

Each subsequent pointer refers to keys greater than the previous key and less than or equal to the next.

The rightmost pointer covers everything greater than the last key.

These keys are not duplicates of data entries. They are separators that carve the key space into non-overlapping ranges.

Leaf Nodes: Where Data Actually Lives

Leaf nodes are where the real work happens.

Each leaf contains:

Sorted (key, value) pairs
Enough entries to satisfy minimum and maximum occupancy constraints

In many implementations including SQLite’s — leaf nodes are also linked together in key order. This makes range scans efficient, because once you reach the first matching key, you can walk forward leaf by leaf without climbing back up the tree.

This is one of the reasons B+-trees excel at queries like:

SELECT * FROM table WHERE key BETWEEN A AND B;

The tree structure gets you to the first leaf quickly. The leaf chain handles the rest.

Variable Fan-Out and Disk Awareness

A B+-tree is not binary. It is wide.

Internal nodes can have many children, dozens or even hundreds depending on page size and entry size.

The exact limits are implementation-dependent, but there is always an upper bound and a lower bound, usually around half of that upper bound

The root node is special. It is allowed to violate the minimum occupancy rule and may have as few as one child. This flexibility simplifies tree growth and shrinkage.

All of this is designed with disk pages in mind. A node is sized so that it fits neatly into a page, minimizing disk reads and writes.

Search Complexity and Guarantees

Because the tree is height-balanced and wide, search complexity is tightly bounded.

Finding a key requires traversing:

One internal node per level
Until a leaf is reached

The number of levels grows as O(log m), where m is the total number of entries.

In practice, the tree is usually very shallow — often just two or three levels deep.

This is why B+-trees remain dominant in disk-based database systems decades after their invention.

Why This Matters for SQLite

All the pager logic, cache discipline, journaling rules, and tree interface functions we’ve studied so far exist to support this structure.

The pager ensures pages arrive safely in memory
The tree module interprets those pages as nodes
Cursors traverse nodes without worrying about disk I/O
Structural invariants are preserved across crashes and rollbacks

Understanding the B+-tree shape explains why SQLite’s layers are arranged the way they are.

Now that the structure is clear, we’re finally ready to watch it move.

In the next post, we’ll walk through:

Search
Insert
Delete
Node splitting and merging

My experiments and hands-on executions related to SQLite will live here: lovestaco/sqlite

References:

SQLite Database System: Design and Implementation. N.p.: Sibsankar Haldar, (n.d.).

👉 Check out: FreeDevTools

Any feedback or contributors are welcome!

It’s online, open-source, and ready for anyone to use.

⭐ Star it on GitHub: freedevtools

DEV Community