DEV Community

kalos

Recreating Order Books from Tick Data: Principles, Workflow & Implementation


Introduction
When building quantitative trading strategies, running historical backtests and constructing market microstructure models, restoring the order book state at any given timestamp is a fundamental requirement. Most public market data feeds only provide periodic order book snapshots and trade records, which are insufficient for reconstructing full historical order depth.

Tick data fills this gap. This article covers the core principles, data processing rules, standard workflow, practical code and performance optimizations for order book reconstruction, and is aimed at quantitative developers and strategy researchers.

1. Tick Data Overview & Common Event Types
Tick data represents the finest granularity of market data. Every order placement, order cancellation, price modification and trade execution generates a new tick record.

The core idea of order book reconstruction is straightforward: replay all tick events strictly in chronological order to reproduce every state change of the order book.

Trade prices alone cannot reflect order volume distribution across different price levels. By processing continuous tick data along the timeline, we can fully restore multi-level bid and ask depth. There are four major tick event types in practical usage:

  • Trades: Adjust order volume at matched price levels when orders are filled.
  • New orders: Add new pending entries to bid or ask queues.
  • Order cancellations: Reduce existing volume on corresponding price levels.
  • Price updates: Move existing orders to new price levels and restructure the order book.
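The four event types can be sketched as a small dispatcher that keeps one price-to-quantity map per side. This is a minimal sketch, not a vendor format. Several things are assumptions:

- the Tick fields;
- the event codes 'new', 'cancel', 'trade' and 'modify';
- the convention that a trade tick names the resting side it consumed;
- that a price update moves the full quantity to the new price.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical tick layout; real feeds name and encode these fields differently
@dataclass
class Tick:
    ts: int                            # event time, e.g. epoch milliseconds
    kind: str                          # 'new', 'cancel', 'trade' or 'modify'
    side: str                          # 'bid' or 'ask' (for trades: the resting side hit)
    price: float
    qty: float
    new_price: Optional[float] = None  # destination price, used only by 'modify'

Book = Dict[str, Dict[float, float]]   # side -> {price: total quantity}

def new_book() -> Book:
    return {'bid': {}, 'ask': {}}

def _add(levels: Dict[float, float], price: float, delta: float) -> None:
    """Change the quantity at one price level; drop the level once nothing remains."""
    qty = levels.get(price, 0.0) + delta
    if qty > 0:
        levels[price] = qty
    else:
        levels.pop(price, None)

def apply_tick(book: Book, tick: Tick) -> None:
    levels = book[tick.side]
    if tick.kind == 'new':                   # new order adds resting volume
        _add(levels, tick.price, tick.qty)
    elif tick.kind in ('cancel', 'trade'):   # cancels and fills remove it
        _add(levels, tick.price, -tick.qty)
    elif tick.kind == 'modify':              # price update: move volume to the new level
        if tick.new_price is None:
            raise ValueError('modify tick needs new_price')
        _add(levels, tick.price, -tick.qty)
        _add(levels, tick.new_price, tick.qty)
    else:
        raise ValueError(f'unknown event type: {tick.kind!r}')
```

Deleting emptied levels, rather than storing zeros, keeps each dictionary limited to live price levels. This makes sorting and best-price lookups simpler.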

2. Data Structure & Core Processing Rules
In practice, bids and asks are stored separately, each as a mapping from price level to order quantity (a dictionary, or an array indexed by price tick). For a standard 5-level order book, bids are sorted from high to low and asks from low to high, following standard market display conventions.
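With one dictionary per side, the sorted 5-level view can be produced on demand. The sketch below uses heapq from the standard library to pick the best levels without sorting every price. The 'bid'/'ask' key layout is an assumption. A sorted container such as sortedcontainers.SortedDict is an alternative when the book is queried after every tick.

```python
import heapq
from typing import Dict, List, Tuple

Levels = List[Tuple[float, float]]

def top_levels(book: Dict[str, Dict[float, float]], n: int = 5) -> Tuple[Levels, Levels]:
    """Return the best n bid and ask levels as (price, qty) pairs."""
    bids = heapq.nlargest(n, book['bid'].items())   # highest prices first
    asks = heapq.nsmallest(n, book['ask'].items())  # lowest prices first
    return bids, asks

book = {'bid': {99.0: 1, 101.0: 2, 100.0: 3}, 'ask': {103.0: 1, 102.0: 4}}
print(top_levels(book, n=2))
# ([(101.0, 2), (100.0, 3)], [(102.0, 4), (103.0, 1)])
```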

Upon receiving each tick record, we first identify its event type and then update the volume at the affected price level. One rule must be strictly followed: process all ticks in exact timestamp order. Events that share a timestamp must keep their original feed order, or follow the exchange sequence number if the feed provides one.
Even a millisecond-level ordering error leaves the reconstructed book inconsistent with the real market.
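Sorting by timestamp alone is not enough when several events share the same millisecond. Below is a minimal sketch that assumes each record carries a per-feed sequence number. The seq field is hypothetical; use whatever ordering key your vendor provides. Offline files are sorted by (ts, seq). Streaming input passes through a guard that fails loudly instead of silently applying an out-of-order event.

```python
from typing import Iterable, Iterator, List, NamedTuple

class RawTick(NamedTuple):
    ts: int       # exchange timestamp, e.g. epoch milliseconds
    seq: int      # hypothetical per-feed sequence number used as a tie-breaker
    payload: str  # undecoded event body

def in_replay_order(ticks: Iterable[RawTick]) -> List[RawTick]:
    """Offline files: sort by timestamp, then by sequence within the same timestamp."""
    return sorted(ticks, key=lambda t: (t.ts, t.seq))

def assert_ordered(ticks: Iterable[RawTick]) -> Iterator[RawTick]:
    """Streams: pass ticks through, raising if one arrives out of order."""
    last = None
    for t in ticks:
        key = (t.ts, t.seq)
        if last is not None and key < last:
            raise ValueError(f'out-of-order tick {key} after {last}')
        last = key
        yield t
```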

For offline historical tick files, avoid loading the entire dataset into memory at once. Parsing and applying records one at a time has three benefits: memory usage stays bounded, the logic stays clear, and debugging and validation during backtest development become simpler.
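Reading a historical file as a stream keeps only one record in memory at a time. This sketch uses the standard csv module. The column names ts, kind, side, price and qty are a hypothetical layout, not a specific vendor's format.

```python
import csv
from typing import Iterable, Iterator

def iter_ticks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed tick per CSV row without loading the whole file."""
    for row in csv.DictReader(lines):
        yield {
            'ts': int(row['ts']),
            'kind': row['kind'],
            'side': row['side'],
            'price': float(row['price']),
            'qty': float(row['qty']),
        }

# Usage with a real file (hypothetical path):
# with open('ticks.csv', newline='') as f:
#     for tick in iter_ticks(f):
#         ...  # apply the tick to the book here
```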

3. Standard Workflow for Order Book Reconstruction
We use the timestamp 10:15:30 as an example to demonstrate a general three-step workflow. The same approach works for both real-time streaming and historical playback.
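1. Initialize the order book
Create empty bid and ask structures, so every price level starts with zero volume. An empty starting book is correct only if replay begins before any orders are resting, for example at the start of a trading session. Otherwise, seed it from a full depth snapshot first.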

2. Update the order book sequentially
Iterate over tick records in time order and apply each one according to its event type. Add volume for new orders, subtract it for cancellations, and move it between levels for price updates. Trades consume resting orders; a large trade may clear several consecutive price levels.

3. Capture the target snapshot
Keep updating the order book until an incoming tick's timestamp exceeds the target time, then stop without applying that tick. The in-memory book at that moment is the complete order book for the selected time point.
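The three steps map onto a single replay loop. This minimal sketch makes several assumptions:

- ticks arrive already time-ordered as (time, kind, side, price, qty) tuples;
- times are zero-padded 'HH:MM:SS' strings;
- only new, cancel and trade events are covered;
- each trade tick reports the fill at one price level, so a sweep across several levels appears as several trade ticks.

```python
import heapq
from typing import Iterable, Tuple

# Hypothetical record layout: (time 'HH:MM:SS', kind, side, price, qty)
Tick = Tuple[str, str, str, float, float]

def book_at(ticks: Iterable[Tick], target: str = '10:15:30', depth: int = 5):
    # Step 1: empty book; every price level implicitly starts at zero
    book = {'bid': {}, 'ask': {}}
    # Step 2: apply ticks in time order
    for ts, kind, side, price, qty in ticks:
        if ts > target:
            break  # Step 3: first tick past the target; stop without applying it
        if kind == 'new':
            delta = qty
        elif kind in ('cancel', 'trade'):
            delta = -qty
        else:
            raise ValueError(f'unsupported event type: {kind!r}')
        levels = book[side]
        remaining = levels.get(price, 0.0) + delta
        if remaining > 0:
            levels[price] = remaining
        else:
            levels.pop(price, None)  # level fully cancelled or consumed
    # The book now reflects the market as of `target`; return the top levels
    bids = heapq.nlargest(depth, book['bid'].items())
    asks = heapq.nsmallest(depth, book['ask'].items())
    return bids, asks
```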

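Tick data formats vary across vendors. Some deliver order-level events or depth updates, while others include only trades. Depth updates can be applied directly. A trade-only feed, however, cannot recover resting bid and ask volume on its own and must be combined with snapshots or depth data. Either way, the goal is the same: restoring full bid and ask depth for backtesting and market research.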
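For depth-update feeds, the book is maintained by setting level quantities rather than adding and subtracting order quantities. The sketch below assumes a common convention, not any particular vendor's:

- a snapshot replaces a side entirely;
- an incremental update carries the new total quantity at a price;
- a quantity of zero means the level is gone.

```python
from typing import Dict, Iterable, Tuple

Book = Dict[str, Dict[float, float]]  # side -> {price: total quantity}

def apply_snapshot(book: Book, bids: Iterable[Tuple[float, float]],
                   asks: Iterable[Tuple[float, float]]) -> None:
    """Replace both sides with a full depth snapshot of (price, qty) pairs."""
    book['bid'] = {p: q for p, q in bids if q > 0}
    book['ask'] = {p: q for p, q in asks if q > 0}

def apply_level_update(book: Book, side: str, price: float, qty: float) -> None:
    """Apply an absolute depth update: qty is the new total at that price, 0 deletes."""
    if qty > 0:
        book[side][price] = qty
    else:
        book[side].pop(price, None)
```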

4. Practical Code Implementation
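The following code subscribes to the AllTick API's real-time WebSocket stream and captures the order book at the target time. As written, it expects each message to carry the top 5 bid and ask levels. It therefore tracks depth snapshots rather than replaying individual order events. It requires the websocket-client package and can serve as a starting point for a data collection module or backtest framework.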

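```python
import json

import websocket  # provided by the websocket-client package

TARGET_TIME = '10:15:30'  # zero-padded HH:MM:SS, so string comparison follows time order
LEVELS = 5

# Order book: 'buy' holds the bid side, 'sell' holds the ask side
# Key: price, Value: order quantity
order_book = {'buy': {}, 'sell': {}}

def on_message(ws, message):
    # Parse raw tick data
    tick = json.loads(message)
    # Halt at the first tick past the target time, before applying it
    if tick['time'] > TARGET_TIME:
        # Bids from high to low, asks from low to high
        print('bids:', sorted(order_book['buy'].items(), reverse=True))
        print('asks:', sorted(order_book['sell'].items()))
        ws.close()
        return
    # Each message carries the top 5 bid and ask levels, so replace both sides
    # instead of merging; merging would leave stale price levels behind
    order_book['buy'] = {
        price: qty
        for price, qty in zip(tick['bidPrice'][:LEVELS], tick['bidQty'][:LEVELS])
        if qty > 0
    }
    order_book['sell'] = {
        price: qty
        for price, qty in zip(tick['askPrice'][:LEVELS], tick['askQty'][:LEVELS])
        if qty > 0
    }

# Establish a persistent WebSocket connection for real-time data subscription
# (placeholder URL; the real endpoint, authentication and subscription request
# come from the data vendor's documentation)
ws = websocket.WebSocketApp("wss://example.alltick.co/realtime",
                            on_message=on_message)
ws.run_forever()
```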

Key Development Notes
Two points require extra attention during coding. First, maintain correct sorting rules for bid and ask prices to guarantee valid order book structure. Second, calculate volume increments and decrements accurately according to event types. Logical mistakes here will cause data distortion and affect subsequent backtesting and model analysis.
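Both failure modes above tend to show up as an invalid book. A cheap consistency check after each update, or after each captured snapshot, therefore catches many of them early. This minimal sketch uses the same assumed {'bid': {...}, 'ask': {...}} layout. Some markets legitimately show a crossed book during auctions, so treat a crossed result as a warning in those sessions.

```python
from typing import Dict, List

def check_book(book: Dict[str, Dict[float, float]]) -> List[str]:
    """Return a list of problems found in the book (empty if it looks valid)."""
    problems = []
    for side in ('bid', 'ask'):
        for price, qty in book[side].items():
            if qty <= 0:
                problems.append(f'{side} level {price} has non-positive qty {qty}')
    if book['bid'] and book['ask']:
        best_bid = max(book['bid'])
        best_ask = min(book['ask'])
        if best_bid >= best_ask:
            problems.append(f'crossed book: best bid {best_bid} >= best ask {best_ask}')
    return problems
```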

5. Optimization for Large-Scale Datasets
When processing massive historical tick files and running large-scale backtests, memory usage and computational efficiency become critical bottlenecks. Three common optimizations help:
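1. Limit stored price levels
Keep only the top N levels your research or strategy needs instead of storing every price level, which reduces memory consumption. With order-level data, apply this limit to the snapshots you save rather than to the live book. Once the top levels are consumed, deeper levels move up, and they can only be shown correctly if they were tracked.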

2. Adopt incremental updates
Only modify price levels affected by each tick event, rather than rebuilding the entire order book. This reduces redundant computation and improves overall performance.

3. Add timestamp indexes
Build time indexes for historical tick files to quickly locate target time ranges. This eliminates unnecessary data traversal and shortens preprocessing time for backtesting.
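A sparse timestamp index can be built in one pass and searched with bisect. This sketch assumes a time-sorted CSV file with a header row and an integer timestamp in the first column; build_index and seek_offset are hypothetical helper names. Jumping into the middle of a file only helps if you also have the book state at that point, such as a periodically saved snapshot, because replay needs a starting state.

```python
import bisect
from typing import Iterable, List, Tuple

def build_index(lines: Iterable[bytes], every: int = 10_000) -> List[Tuple[int, int]]:
    """Return (timestamp, byte offset) for every `every`-th data line after the header."""
    index = []
    it = iter(lines)
    offset = len(next(it))  # skip the header line
    for i, line in enumerate(it):
        if i % every == 0:
            index.append((int(line.split(b',', 1)[0]), offset))
        offset += len(line)  # byte length, so offsets stay valid for f.seek()
    return index

def seek_offset(index: List[Tuple[int, int]], target_ts: int) -> int:
    """Offset of the last indexed row with timestamp <= target_ts (first row if none)."""
    if not index:
        raise ValueError('empty index')
    keys = [ts for ts, _ in index]
    i = bisect.bisect_right(keys, target_ts) - 1
    return index[max(i, 0)][1]

# Usage with a real file (hypothetical path), opened in binary mode:
# with open('ticks.csv', 'rb') as f:
#     idx = build_index(f)
#     f.seek(seek_offset(idx, target_ts))
#     ...  # read forward from here
```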

In addition, some third-party tick feeds lack order cancellation records. A common workaround is to carry the last valid order book state forward to stand in for the missing changes. This cannot achieve 100% accuracy, but it is often acceptable for quantitative backtesting, market microstructure analysis and strategy review.

6. Conclusion
Reconstructing order books with tick data is essentially replaying market micro-events in chronological order. Compared with candlestick charts and pure trade data, restored order books reveal detailed liquidity distribution and order changes. They are essential data sources for liquidity factor mining, short-term trend modeling and high-frequency strategy development.

For quantitative developers and researchers, mastering tick data parsing and order book reconstruction expands the available data dimensions, makes historical backtests more realistic and accelerates strategy iteration. Subtle changes in orders and trades often drive short-term market movements. Fully leveraging tick and order book data helps build more refined quantitative models and trading strategies.
