Athreya aka Maneshwar

Posted on Feb 13

Page Structure: From Logical Trees to Raw Bytes

#webdev #programming #database #architecture

Hello, I'm Maneshwar. I'm working on git-lrc: a Git hook for checking AI-generated code.
AI agents write code fast. They also silently remove logic, change behavior, and introduce bugs -- without telling you. You often find out in production.
git-lrc fixes this. It hooks into git commit and reviews every diff before it lands. 60-second setup. Completely free.

You already know that a SQLite database file is divided into fixed-size pages. Every structural concept we’ve discussed (B+-trees, internal nodes, leaf nodes, overflow chains) ultimately lives inside these pages.

The tree module manages all of them.

Each page is one of the following:

A tree page (internal, leaf, or overflow)
A free page
Or, in special cases, a lock-byte or pointer-map page

The free pages themselves are not scattered randomly.

They are organized into a single trunk list: a linked list of trunk pages, each of which records the page numbers of further free pages, as shown in the following figure:

This trunk list allows SQLite to recycle unused pages efficiently. When nodes split or merge, or when tables are dropped, pages return to this free structure.
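As a rough sketch of how such a trunk list could be walked, the Python below follows it through raw file bytes. It assumes the layout described in SQLite's published file format: bytes 32-35 of the file header hold the first trunk page number (0 if there is none), and each trunk page begins with a 4-byte pointer to the next trunk page followed by a 4-byte count of the free page numbers stored after it. The name `walk_freelist` is made up for illustration, and the loop has no protection against a corrupt, cyclic list.

```python
import struct

def walk_freelist(db: bytes) -> tuple[list[int], list[int]]:
    """Return (trunk_pages, leaf_pages) found by following the trunk list."""
    (page_size,) = struct.unpack_from(">H", db, 16)
    if page_size == 1:                 # the value 1 encodes a 65536-byte page
        page_size = 65536
    (trunk,) = struct.unpack_from(">I", db, 32)   # first trunk page, 0 = none
    trunks, leaves = [], []
    while trunk != 0:
        base = (trunk - 1) * page_size            # pages are numbered from 1
        next_trunk, n_leaves = struct.unpack_from(">II", db, base)
        trunks.append(trunk)
        # The free page numbers follow the two 4-byte fields.
        leaves.extend(struct.unpack_from(f">{n_leaves}I", db, base + 8))
        trunk = next_trunk
    return trunks, leaves
```

On a real database, passing the whole file contents to this function after dropping a table should list the released pages. With auto-vacuum off, the combined count would be expected to match `PRAGMA freelist_count`.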

Page 1: The Special Case

Every page in the database can serve any role — except Page 1.

Page 1 is special.

It always contains:

The file header (first 100 bytes)
Followed by a B+-tree root node

The file header describes global properties of the database file itself.

The first 100 bytes are reserved strictly for this header. The rest of Page 1 behaves like a normal B+-tree page.
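To make the split concrete, here is a minimal sketch that decodes a few header fields from the start of Page 1. The field offsets (magic string at 0, page size at 16, page count at 28) are taken from SQLite's file format documentation. The function name and the returned dictionary keys are hypothetical.

```python
import struct

MAGIC = b"SQLite format 3\x00"

def parse_file_header(page1: bytes) -> dict:
    """Decode a few fields from the 100-byte header at the start of page 1."""
    header = page1[:100]
    if len(header) < 100 or header[:16] != MAGIC:
        raise ValueError("not a SQLite 3 database header")
    (raw_page_size,) = struct.unpack_from(">H", header, 16)
    (page_count,) = struct.unpack_from(">I", header, 28)
    return {
        # The stored value 1 stands for a 65536-byte page.
        "page_size": 65536 if raw_page_size == 1 else raw_page_size,
        # In-header size in pages; old writers may leave it zero or stale.
        "page_count": page_count,
        # The B+-tree part of page 1 begins right after the file header.
        "btree_header_offset": 100,
    }
```

Reading the first 100 bytes of any database file in binary mode and passing them in should be enough to try it.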

Every other page that holds tree data is entirely consumed by either:

A single tree node
Or overflow content

A tree node never spans pages, and a page never holds two nodes. One page, one structural role.

Tree Page Structure: The Four-Region Layout

Now we zoom into a tree page — internal or leaf.

Each tree page is logically divided into cells.

A cell is the atomic unit of storage inside the tree.

On an internal page, a cell contains:

- A key value
- The child pointer preceding that key

On a leaf page, a cell contains:

- The payload (or part of it)
- No child pointer

But the physical layout of a tree page is more interesting.

Every internal or leaf page is divided into four regions, laid out in this physical order:

Page header
Cell pointer array
Unallocated space
Cell content area

What makes this elegant is how the layout grows.

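The cell pointer array grows downward from just after the header, toward higher offsets.
The cell content area grows upward from the end of the page, toward lower offsets.
The unallocated space lives between them.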

They grow toward each other like two opposing stacks.

This design allows SQLite to:

Insert and delete cells without rewriting the entire page
Maintain logical order independently from physical placement
Compact space when fragmentation grows

The cell pointer array acts as a miniature directory for the page. It maps each cell's logical position to the offset where that cell is actually stored inside the page.

So even if cell bodies move around during balancing or compaction, logical order remains stable.
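The idea of two regions growing toward each other can be illustrated with a toy page. This is not SQLite's on-disk format: the `ToyPage` class, its 2-byte header, and the 1-byte length prefix on each cell are all invented for the sketch. What it does share with the real layout is the shape: a sorted array of 2-byte offsets at the front, cell bodies packed from the back, and free space in the middle.

```python
import bisect
import struct

class ToyPage:
    """Toy slotted page: cell count, pointer array, free gap, then cells."""
    HEADER = 2  # hypothetical header: only a big-endian 2-byte cell count

    def __init__(self, size: int = 64):
        self.buf = bytearray(size)
        self.content_start = size      # cell content grows toward lower offsets

    def _count(self) -> int:
        return struct.unpack_from(">H", self.buf, 0)[0]

    def pointers(self) -> list[int]:
        """Cell offsets in logical (key) order."""
        return list(struct.unpack_from(f">{self._count()}H", self.buf, self.HEADER))

    def cells(self) -> list[bytes]:
        """Cell bodies in logical order, read through the pointer array."""
        out = []
        for off in self.pointers():
            length = self.buf[off]     # invented 1-byte length prefix
            out.append(bytes(self.buf[off + 1 : off + 1 + length]))
        return out

    def insert(self, cell: bytes) -> None:
        n = self._count()
        ptr_end = self.HEADER + 2 * (n + 1)   # end of pointer array after insert
        new_start = self.content_start - (1 + len(cell))
        if len(cell) > 255 or new_start < ptr_end:
            raise ValueError("cell does not fit")
        # The cell body goes at the low end of the content area...
        self.buf[new_start] = len(cell)
        self.buf[new_start + 1 : new_start + 1 + len(cell)] = cell
        self.content_start = new_start
        # ...while its pointer is slotted in at the sorted position.
        ptrs = self.pointers()
        ptrs.insert(bisect.bisect(self.cells(), cell), new_start)
        struct.pack_into(f">H{n + 1}H", self.buf, 0, n + 1, *ptrs)

page = ToyPage()
for key in (b"pear", b"apple", b"fig"):
    page.insert(key)
print(page.cells())     # [b'apple', b'fig', b'pear']
print(page.pointers())  # [53, 49, 59]
```

The cells come back in key order even though their offsets are not ascending: only the small pointer array was reshuffled on each insert, and no cell body was moved.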

Structure of the Page Header

The page header sits at the very beginning of the page (except on Page 1, where it begins at byte 100).

The header stores only management metadata for that page.

Key details:

Multi-byte integers are stored in big-endian format
The first byte (offset 0) contains flags indicating page type

The page-type flags identify whether the page is:

Table B+-tree internal page
Table B+-tree leaf page
Index B-tree internal page
Index B-tree leaf page

For internal pages, there is also a 4-byte rightmost child pointer stored at offset 8, which makes the header 12 bytes long instead of the 8 bytes used on leaf pages.

That pointer exists because internal nodes in a B+-tree have one more child pointer than separator keys.

All other child pointers are embedded inside cells.

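The rest of the header stores:

Offset of the first free block (2 bytes at offset 1)
Cell count (2 bytes at offset 3)
Start of the cell content area (2 bytes at offset 5)
Count of fragmented free bytes (1 byte at offset 7)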

All of this enables the page to behave like a self-contained memory manager.
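The header fields described above can be decoded with a few lines of Python. This is a sketch under the assumption that the byte layout follows SQLite's file format documentation (flag values 0x02, 0x05, 0x0A, 0x0D, and field offsets 1, 3, 5, 7, 8). The function name `parse_page_header` and the dictionary keys are hypothetical.

```python
import struct

PAGE_TYPES = {
    0x02: "index internal",
    0x05: "table internal",
    0x0A: "index leaf",
    0x0D: "table leaf",
}

def parse_page_header(page: bytes, page_number: int) -> dict:
    """Decode a tree page header; on page 1 it follows the 100-byte file header."""
    base = 100 if page_number == 1 else 0
    flags, first_freeblock, cell_count, content_start, fragmented = (
        struct.unpack_from(">BHHHB", page, base)   # big-endian, 8 bytes in total
    )
    if flags not in PAGE_TYPES:
        raise ValueError(f"not a tree page (flag byte {flags:#04x})")
    info = {
        "type": PAGE_TYPES[flags],
        "first_freeblock": first_freeblock,        # 0 means no free blocks
        "cell_count": cell_count,
        "content_start": content_start or 65536,   # a stored 0 encodes 65536
        "fragmented_free_bytes": fragmented,
        "header_size": 8,
    }
    if "internal" in info["type"]:
        # Internal pages carry the extra rightmost child pointer at offset 8.
        (info["rightmost_child"],) = struct.unpack_from(">I", page, base + 8)
        info["header_size"] = 12
    # The cell pointer array follows the header: one 2-byte offset per cell.
    ptr_base = base + info["header_size"]
    info["cell_pointers"] = list(
        struct.unpack_from(f">{cell_count}H", page, ptr_base)
    )
    return info
```

Given the raw bytes of one page and its page number, the result shows the page type, the cell count, and the offsets that the cell pointer array records for each cell.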

Why This Layout Is So Powerful

Notice the layering again:

The pager manages pages as opaque byte arrays.
The tree module interprets those byte arrays as structured nodes.
Each node manages its own mini-allocation system internally.

This design achieves:

Efficient in-page insert/delete
Minimal disk rewrites
Clean separation between logical order and physical layout
Crash safety via journaling at page granularity

Every B+-tree operation we discussed earlier (search, insert, delete) ultimately reads or rewrites cells inside this four-region page structure.

The abstraction only works because the layout is disciplined.

👉 Check out: git-lrc
Any feedback or contributors are welcome! It’s online, source-available, and ready for anyone to use.
⭐ Star it on GitHub:

HexmosTech / git-lrc
