EmilyL

How Do You Handle WebSocket Drops and Missing Ticks in US Stock Data? Here’s My Approach

If you’ve ever run a tick-based backtest and ended up with results that just didn’t add up, the culprit might be hiding in your data pipeline — specifically, in those few seconds when your WebSocket feed went silent without you noticing.

As a financial data analyst working with US equities, I’ve been through this more times than I’d like. Today, I’ll walk through the reconnection and backfill mechanism I built to ensure my tick stream remains complete and correctly ordered, even when the network misbehaves.

The Problem: Real-Time Feeds Aren’t “Set and Forget”

WebSocket connections for live tick data are long-lived, which makes them susceptible to network hiccups, server-side maintenance, and local resource limits. A common pitfall is assuming that a library’s auto-reconnect feature solves everything. It doesn’t — most auto-reconnects only resume from “now,” leaving a gap in your time series.

That gap might contain the very trade that triggered your entry signal or the quote that would have prevented a false breakout. So we need two things: automatic reconnection with full state recovery and automatic backfill of missing data.

Reconnection That Remembers

My reconnection strategy is built around these principles:

  • Heartbeat watchdog: I set an application-level timer. If no message (including ping/pong) is received within a threshold (e.g., 15 seconds), I forcibly close the socket and initiate reconnection. This proactive detection beats waiting for OS-level timeouts.
  • Persist subscription context: I keep the list of subscribed tickers in memory. Upon reconnection, the system resubscribes to every ticker automatically — no missed symbols, no manual re-entry.
  • Backoff and circuit breaker: Continuous rapid reconnects can get your IP blacklisted. I use exponential backoff between attempts and a cap on consecutive failures, after which the system pauses and alerts.
  • Handoff to backfill: Once reconnected, a flag is set along with the timestamp of the last successfully received tick. This tells the backfill module exactly where the hole begins.
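
To make the watchdog and backoff ideas concrete, here is a minimal sketch rather than a production implementation. The names `HeartbeatWatchdog`, `backoff_delays`, and `connect_with_backoff` are hypothetical, as are the default values for the base delay, the delay cap, and the failure limit. The `connect` callable stands in for whatever opens your socket and returns True on success.

```python
import time
from typing import Callable, Iterator


class HeartbeatWatchdog:
    """Tracks the time of the last message and reports when the feed is stale."""

    def __init__(self, timeout_s: float = 15.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self._clock = clock
        self._last_seen = clock()

    def touch(self) -> None:
        """Call on every inbound message, including ping/pong frames."""
        self._last_seen = self._clock()

    def is_stale(self) -> bool:
        """True once no message has arrived for longer than the threshold."""
        return self._clock() - self._last_seen > self.timeout_s


def backoff_delays(base_s: float = 1.0, cap_s: float = 60.0,
                   max_failures: int = 8) -> Iterator[float]:
    """Yield exponentially growing delays, then stop (the circuit breaker)."""
    for attempt in range(max_failures):
        yield min(cap_s, base_s * 2 ** attempt)


def connect_with_backoff(connect: Callable[[], bool],
                         sleep: Callable[[float], None] = time.sleep,
                         **policy: float) -> bool:
    """Call connect() until it succeeds or the failure cap is reached."""
    for delay in backoff_delays(**policy):
        if connect():
            return True
        sleep(delay)
    return False  # caller should pause the system and raise an alert
```

A periodic task can poll `is_stale()` and close the socket when it returns True, which then hands control to `connect_with_backoff`. The sketch leaves out random jitter, which is worth adding if many clients might reconnect at the same moment.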

Filling the Hole with Historical Data

The backfill process is essentially a time-range query against a historical API:

  1. Watermark tracking: During normal streaming, I continuously update a last_tick_time variable. In production, I also persist it to a fast store like Redis.
  2. Freeze on disconnect: When the feed drops, last_tick_time stops advancing. It now defines the start of the missing period.
  3. Retrieve missing ticks: After reconnection, a GET request is sent to the historical tick endpoint with start_time set to the frozen watermark.
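  4. Merge in order: The returned ticks and the live ticks both enter a time-sorted buffer. Live ticks that arrive while the backfill is still running wait in that buffer until it completes, so downstream consumers always process events in strict chronological order.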
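
The merge step can be sketched as follows, under a few assumptions: both inputs are already sorted by time, timestamps are ISO-8601 UTC strings in one uniform format (so string comparison matches time order), and every tick carries a unique `id`. That `id` field and the `merge_ticks` name are hypothetical, so substitute whatever unique key your provider exposes.

```python
import heapq
from typing import Iterable, Iterator

Tick = dict  # assumed shape: {"id": ..., "timestamp": "2026-06-05T10:15:01Z", ...}


def merge_ticks(backfilled: Iterable[Tick], live: Iterable[Tick]) -> Iterator[Tick]:
    """Merge two timestamp-sorted tick streams, dropping duplicate ids."""
    seen = set()
    # heapq.merge assumes each input is already sorted by the key
    for tick in heapq.merge(backfilled, live, key=lambda t: t["timestamp"]):
        if tick["id"] in seen:
            continue  # same tick delivered by both the history API and the live feed
        seen.add(tick["id"])
        yield tick
```

Deduplication matters because a query that starts at the frozen watermark can return the tick that set the watermark, and the first live ticks after reconnection can overlap the tail of the backfill. In a long-running process the `seen` set should be bounded, for example by discarding ids older than the current watermark.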

In my setup, I rely on data platforms like AllTick that offer aligned real-time and historical schemas. This means the backfilled ticks can be inserted directly into the same processing pipeline without any translation layer.

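import json
import time

import requests
import websocket  # provided by the websocket-client package

WS_URL = "wss://apis.alltick.co/stock/ws"
HISTORY_URL = "https://apis.alltick.co/stock/api/history"

# Watermark: timestamp of the last tick received (seed value for illustration)
last_tick_time = "2026-06-05T10:15:00Z"


def on_message(ws, message):
    """Advance the watermark on every live tick."""
    global last_tick_time
    data = json.loads(message)
    last_tick_time = data["timestamp"]
    print(data)


def fetch_missing_data(start_time):
    """Backfill ticks from the historical endpoint, starting at the watermark."""
    global last_tick_time
    resp = requests.get(HISTORY_URL, params={"start_time": start_time}, timeout=10)
    resp.raise_for_status()
    ticks = resp.json()
    for tick in ticks:
        print(tick)
    # Move the watermark to the last backfilled tick
    if ticks:
        last_tick_time = ticks[-1]["timestamp"]


def on_open(ws):
    # Runs on every (re)connect: fill the hole left since the last tick.
    # This is a blocking call, which is acceptable for a short example.
    fetch_missing_data(last_tick_time)


def run():
    """Reconnect loop: run_forever() blocks until the socket closes."""
    while True:
        ws = websocket.WebSocketApp(WS_URL, on_open=on_open, on_message=on_message)
        ws.run_forever()
        time.sleep(5)  # fixed pause for brevity; use exponential backoff in production


if __name__ == "__main__":
    run()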

Practical Tips for Stability

  • Sorted merge buffer: Use a priority queue keyed by timestamp to merge live and backfilled data. This keeps output in time order even when backfilled and live ticks arrive interleaved.
  • Fine-grained watermarks: Maintain a separate last_tick_time per symbol. This scopes backfill requests to only the affected tickers, reducing load.
  • Comprehensive logging: Record every disconnection event — when it happened, how long it lasted, and how many ticks were backfilled. This data is gold when auditing strategy performance or evaluating provider reliability.
  • Staggered subscription: After reconnecting, subscribe to symbols in batches with short delays to keep your system’s resource usage steady.
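
As a rough illustration of the per-symbol watermark and staggered subscription tips, here is a small sketch. The `Watermarks` class and the `batches` helper are hypothetical names, and timestamps are again assumed to be uniformly formatted ISO-8601 UTC strings so that plain string comparison works.

```python
from typing import Dict, Iterator, List, Sequence


class Watermarks:
    """Keeps the latest tick timestamp seen for each symbol."""

    def __init__(self) -> None:
        self._last: Dict[str, str] = {}

    def update(self, symbol: str, timestamp: str) -> None:
        # Only move forward, so a late backfilled tick cannot rewind the watermark
        if timestamp > self._last.get(symbol, ""):
            self._last[symbol] = timestamp

    def backfill_plan(self) -> Dict[str, str]:
        """Map each symbol to the start_time its backfill request should use."""
        return dict(self._last)


def batches(symbols: Sequence[str], size: int) -> Iterator[List[str]]:
    """Split a symbol list into fixed-size batches for staggered subscription."""
    for i in range(0, len(symbols), size):
        yield list(symbols[i:i + size])
```

After a reconnect, one option is to loop over `batches(symbols, size)`, send one subscribe message per batch, and sleep briefly between batches. The batch size and delay depend on your provider's rate limits, so treat any specific numbers as something to tune.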

Wrapping Up

Solid data engineering doesn’t just make your pipeline faster — it makes your analysis trustworthy. By combining fast heartbeat detection, stateful reconnection, and precise historical backfill, you can turn an unreliable streaming feed into a dependable foundation for quantitative research. Give it a try in your next project, and you’ll likely spend a lot less time second-guessing your backtest results.
