Lalit Mishra

The Client Knows Best: Deep Dive into WebRTC getStats() and Quality Monitoring

Opening Context – Why Server Logs Lie

In the distributed architecture of real-time communications, the server is often the last to know about a degradation in quality. You can scale your Selective Forwarding Units (SFUs) to handle ten thousand concurrent streams, ensure your signaling WebSocket servers have sub-millisecond latency, and provision massive bandwidth headroom, yet your users will still report frozen video and robotic audio.

Traditional backend observability relies on server-side logs: socket connection states, ICE (Interactive Connectivity Establishment) completion events, and SFU ingress/egress bitrates. These metrics are necessary but insufficient. They represent the infrastructure's view of the world, not the user's reality.

A "green" server dashboard frequently coexists with a "red" user experience. The server knows it forwarded a packet. It does not know if that packet arrived at the client, if it arrived in the correct order, or if the client’s decoder had the CPU cycles to render it. Packet loss, jitter buffer underruns, and decode failures are strictly client-side phenomena.

The only authoritative source of truth for media quality is the client itself. The WebRTC RTCPeerConnection.getStats() API provides a direct window into the media engine's internal state. It exposes the raw telemetry required to distinguish between a network failure, a device limitation, and a platform bug. To build a robust WebRTC platform, you must treat the client not just as a consumer of media, but as an active telemetry node in your observability mesh.

Meme Placeholder: Backend dashboard all green. User screaming “Video frozen.” Caption: “When your logs say OK but the client says otherwise.”

The getStats Standard – Understanding the W3C Report Model

The W3C WebRTC Statistics API is a standardized interface that returns an RTCStatsReport. Unlike earlier, browser-specific implementations (such as the legacy callback-based getStats in Chrome), the modern standard returns a map-like structure where every entry represents a specific component of the peer connection at a specific point in time.

Invoking await peerConnection.getStats() returns an iterable map of RTCStats dictionaries. Navigating this map requires an understanding of the relationship between the report types. The architecture is a graph, not a tree.

The critical report types for quality monitoring are:

  • inbound-rtp: Describes media received from a remote peer. This is where you find packet loss, bytes received, and decode metrics for incoming video/audio.
  • outbound-rtp: Describes media sent by the local client. This tracks bytes sent and encoding CPU usage.
  • remote-inbound-rtp: This is a mirror. It contains data sent back via RTCP Receiver Reports (RR), telling the sender how the receiver is perceiving the stream (primarily Round Trip Time and fraction lost).
  • candidate-pair: Describes the active transport path (local IP to remote IP). This provides the most accurate RTT and current throughput capacity.
  • transport: Aggregated transport metrics, including encryption (DTLS) state.

Each report contains a timestamp (high-resolution DOMHighResTimeStamp), a type, and an id. The id is crucial for linking reports. For example, an inbound-rtp report will have a transportId field pointing to the associated transport report.

Traversing this structure requires iterating over the report values and filtering by type, as shown in the sketch below. The sheer volume of data returned (often dozens of reports per call) means that raw ingestion is prohibitively expensive. We must identify and extract only the signals that matter.
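
As a minimal sketch (assuming an already-connected RTCPeerConnection named pc; the helper name is illustrative), the traversal filters by type and then follows id references to reach related reports:

async function logInboundVideo(pc) {
  const report = await pc.getStats();
  for (const stats of report.values()) {
    if (stats.type === 'inbound-rtp' && stats.kind === 'video') {
      // Follow a graph edge: inbound-rtp -> transport via transportId
      const transport = report.get(stats.transportId);
      console.log('packetsLost:', stats.packetsLost,
                  'dtlsState:', transport?.dtlsState);
    }
  }
}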

Diagram showing RTCPeerConnection at center, branching to inbound-rtp, outbound-rtp, candidate-pair, transport reports, with timestamps and metric labels.

Identifying Critical Metrics

Not all stats are created equal. While the spec defines hundreds of metrics, only a specific subset correlates directly with perceived quality of experience (QoE).

Round Trip Time (RTT)

RTT is the primary indicator of network latency. High RTT degrades interactivity and forces the congestion control algorithm to be more conservative.

  • Source: candidate-pair.currentRoundTripTime. This is the most immediate measure of the network path.
  • Secondary Source: remote-inbound-rtp.roundTripTime. This is calculated from RTCP reports and is useful for understanding the "view from the other side."
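
A minimal sketch of sampling both sources from one report (pc and sampleRtt are assumed names; both fields are reported in seconds, so they are converted to milliseconds here):

async function sampleRtt(pc) {
  const report = await pc.getStats();
  let pathRttMs = null;    // local measurement on the active transport path
  let remoteRttMs = null;  // RTCP-derived view reported by the remote peer
  for (const stats of report.values()) {
    if (stats.type === 'candidate-pair' && stats.nominated &&
        stats.currentRoundTripTime !== undefined) {
      pathRttMs = stats.currentRoundTripTime * 1000;
    }
    if (stats.type === 'remote-inbound-rtp' && stats.roundTripTime !== undefined) {
      remoteRttMs = stats.roundTripTime * 1000;
    }
  }
  return { pathRttMs, remoteRttMs };
}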

Jitter

Jitter measures the variance in packet arrival time. The WebRTC jitter buffer must delay playback to smooth this variance. If jitter exceeds the buffer size, packets are discarded, resulting in robotic audio or frozen video.

  • Source: inbound-rtp.jitter (reported in seconds; multiply by 1000 for milliseconds).

Packets Lost

Packet loss is the enemy of smooth media. WebRTC can tolerate minor loss via NACKs (Negative Acknowledgments) and FEC (Forward Error Correction), but sustained loss > 5% usually results in visible artifacts.

  • Source: inbound-rtp.packetsLost. Note that this is a cumulative counter. You must calculate the delta between samples to determine the current loss rate.
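
For illustration, a hypothetical helper that turns two consecutive inbound-rtp snapshots (prev and curr, taken one polling interval apart) into an interval loss fraction:

function lossFraction(prev, curr) {
  const lost = curr.packetsLost - prev.packetsLost;
  const received = curr.packetsReceived - prev.packetsReceived;
  const total = lost + received;
  // 0.0 to 1.0 for this interval only, not the lifetime of the stream
  return total > 0 ? lost / total : 0;
}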

Frames Decoded and Dropped

This is the "smoking gun" for video freezes.

  • framesDecoded: A cumulative count of frames successfully passed through the decoder.
  • framesDropped: Frames received but discarded before decoding (usually due to CPU saturation or buffer underruns).

The Freeze Detector:
The most reliable way to detect a video freeze is to monitor the delta of framesDecoded over time.
If bytesReceived is increasing (network is flowing) but framesDecoded delta is zero, the user is looking at a frozen frame. This specific signature distinguishes a network cutoff (where bytes stop) from a decoder failure or keyframe starvation (where bytes flow but video doesn't update).
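
A minimal sketch of that signature check, with prev and curr standing in for two consecutive inbound-rtp video snapshots:

function classifyStall(prev, curr) {
  const bytesFlowing = (curr.bytesReceived - prev.bytesReceived) > 0;
  const framesAdvanced = (curr.framesDecoded - prev.framesDecoded) > 0;
  if (!bytesFlowing) return 'network-stall';   // nothing is arriving at all
  if (!framesAdvanced) return 'decoder-stall'; // bytes flow, picture is frozen
  return 'ok';
}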

Time-series graph showing framesDecoded plateauing while connectionState remains connected.

Client-Side Normalization Architecture

Sending the full getStats() dump to your backend every second is an anti-pattern. It wastes client bandwidth, CPU, and storage. The client must act as an edge processor, normalizing and aggregating data before transmission.

The Strategy:

  1. Poll Interval: Poll getStats() every 1 to 5 seconds.
  2. Delta Computation: Store the previous report. Calculate current_value - previous_value for cumulative counters like bytesReceived, packetsLost, and framesDecoded.
  3. Normalization: Convert raw bytes to bits-per-second (bps). Convert cumulative loss to "loss per interval."
  4. Batching: Accumulate 5-10 samples before sending to the server to reduce HTTP/WebSocket overhead.

Production-Grade JavaScript Implementation

The following class implements a robust telemetry agent. It polls the standard getStats() API on a fixed interval, computes the deltas required for rate and freeze detection, and batches the results before upload.

class WebRTCTelemetryAgent {
  constructor(peerConnection, ingestionUrl, pollingIntervalMs = 2000) {
    this.pc = peerConnection;
    this.ingestionUrl = ingestionUrl;
    this.intervalMs = pollingIntervalMs;
    this.previousStats = new Map(); // Store previous reports by ID
    this.timer = null;
    this.sessionId = crypto.randomUUID();
    this.buffer = [];
  }

  start() {
    this.timer = setInterval(() => this.collectStats(), this.intervalMs);
  }

  stop() {
    if (this.timer) clearInterval(this.timer);
    this.flush(); // Send remaining data
  }

  async collectStats() {
    if (this.pc.connectionState === 'closed') {
      this.stop();
      return;
    }

    try {
      const report = await this.pc.getStats();
      const metrics = this.parseReport(report);

      if (metrics) {
        this.buffer.push(metrics);
        // Batch size of 5 samples creates a ~10s send window
        if (this.buffer.length >= 5) {
          this.flush();
        }
      }
    } catch (e) {
      console.error("Telemetry error:", e);
    }
  }

  parseReport(report) {
    let selectedCandidatePair = null;
    let inboundVideo = null;
    let inboundAudio = null;

    // 1. Identify active candidate pair and inbound streams
    for (const stats of report.values()) {
      // Prefer the nominated pair: it is the path actually carrying media
      if (stats.type === 'candidate-pair' && stats.nominated && stats.state === 'succeeded') {
        selectedCandidatePair = stats;
      }
      if (stats.type === 'inbound-rtp') {
        if (stats.kind === 'video') inboundVideo = stats;
        if (stats.kind === 'audio') inboundAudio = stats;
      }
    }

    if (!selectedCandidatePair) return null;

    const timestamp = new Date().toISOString();

    // 2. Compute Deltas for Video
    const videoMetrics = this.computeStreamDeltas(inboundVideo, 'video');

    return {
      sessionId: this.sessionId,
      timestamp: timestamp,
      rtt: (selectedCandidatePair.currentRoundTripTime ?? 0) * 1000, // ms (0 until first measurement)
      availableOutgoingBitrate: selectedCandidatePair.availableOutgoingBitrate,
      video: videoMetrics,
      // Audio logic omitted for brevity
    };
  }

  computeStreamDeltas(currentStats, kind) {
    if (!currentStats) return null;

    const prev = this.previousStats.get(currentStats.id);
    this.previousStats.set(currentStats.id, currentStats);

    if (!prev) return null; // First sample, cannot compute delta

    const timeDelta = (currentStats.timestamp - prev.timestamp) / 1000; // seconds
    if (timeDelta <= 0) return null;

    // 3. Key Calculations
    const packetsLostDelta = currentStats.packetsLost - prev.packetsLost;
    const packetsReceivedDelta = currentStats.packetsReceived - prev.packetsReceived;
    const totalPackets = packetsLostDelta + packetsReceivedDelta;

    // Packet Loss Fraction (0.0 to 1.0)
    const lossFraction = totalPackets > 0 ? packetsLostDelta / totalPackets : 0;

    // Bitrate Calculation
    const bytesDelta = currentStats.bytesReceived - prev.bytesReceived;
    const bitrate = (bytesDelta * 8) / timeDelta; // bits per second

    // Freeze Detection Logic
    const framesDecodedDelta = currentStats.framesDecoded - prev.framesDecoded;
    const isFrozen = (kind === 'video' && bitrate > 10000 && framesDecodedDelta === 0);

    return {
      bitrate: Math.round(bitrate),
      packetLoss: parseFloat(lossFraction.toFixed(4)),
      jitter: currentStats.jitter * 1000, // ms
      framesDecodedPerSecond: Math.round(framesDecodedDelta / timeDelta),
      isFrozen: isFrozen
    };
  }

  async flush() {
    if (this.buffer.length === 0) return;

    const payload = [...this.buffer];
    this.buffer = [];

    // Use beacon or fetch keepalive to ensure delivery on page unload
    fetch(this.ingestionUrl, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ metrics: payload }),
      keepalive: true
    }).catch(e => console.warn("Failed to send telemetry", e));
  }
}

This code does the heavy lifting: it converts cumulative counters into rate-based metrics (framesDecodedPerSecond, bitrate) and a boolean isFrozen flag, so the backend never has to reconstruct deltas itself.

Backend Ingestion and Processing (Python)

The backend must ingest these time-series bursts efficiently. A high-performance Python framework like FastAPI is ideal here. We define a strict schema using Pydantic to validate the incoming telemetry.

The ingestion layer has two responsibilities:

  1. Persist raw data to the time-series database (TSDB).
  2. Evaluate real-time alerts (e.g., if freeze count > X).

from fastapi import FastAPI, BackgroundTasks, HTTPException
from pydantic import BaseModel
from typing import List, Optional
import time

app = FastAPI()

# --- Schema Definition ---

class VideoMetrics(BaseModel):
    bitrate: int
    packetLoss: float
    jitter: float
    framesDecodedPerSecond: int
    isFrozen: bool

class TelemetryPoint(BaseModel):
    sessionId: str
    timestamp: str
    rtt: float
    availableOutgoingBitrate: Optional[float] = None
    video: Optional[VideoMetrics] = None

class TelemetryBatch(BaseModel):
    metrics: List[TelemetryPoint]

# --- Processing Logic ---

# In-memory alert buffer (Replace with Redis in production)
alert_buffer = {} 

def process_metrics_batch(batch: TelemetryBatch):
    """
    1. Write to Time-Series DB (e.g., InfluxDB, TimescaleDB)
    2. Check for freeze patterns
    """
    for point in batch.metrics:
        # Mock DB Insertion
        # db.write_point("telemetry", tags={"session": point.sessionId}, fields=point.dict())

        # Freeze Detection Logic
        if point.video and point.video.isFrozen:
            check_alert_threshold(point.sessionId)

def check_alert_threshold(session_id: str):
    """
    Simple hysteresis: trigger an alert once 3 freeze events arrive within 10 seconds of each other.
    """
    current_time = time.time()
    state = alert_buffer.get(session_id, {"count": 0, "last_seen": 0})

    # Reset the streak if freeze events are more than 10 seconds apart
    if current_time - state["last_seen"] > 10:
        state["count"] = 0

    state["count"] += 1
    state["last_seen"] = current_time
    alert_buffer[session_id] = state

    if state["count"] >= 3:
        trigger_pagerduty(session_id)
        state["count"] = 0 # Reset after alert

def trigger_pagerduty(session_id: str):
    print(f"[ALERT] CRITICAL: Sustained video freeze detected for session {session_id}")

# --- API Endpoint ---

@app.post("/v1/telemetry/ingest")
async def ingest_telemetry(batch: TelemetryBatch, background_tasks: BackgroundTasks):
    if not batch.metrics:
        raise HTTPException(status_code=400, detail="Empty batch")

    # Offload processing to background task to keep API response < 20ms
    background_tasks.add_task(process_metrics_batch, batch)

    return {"status": "accepted", "processed": len(batch.metrics)}

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)

This Python implementation leverages FastAPI's BackgroundTasks to ensure the ingestion endpoint returns immediately: the request path only validates the schema, while persistence and alert evaluation run after the response is sent. Telemetry upload should never compete with the call itself for client resources, and a fast-returning endpoint keeps that cost minimal. The processing logic separates persistence from alerting.

Storage Strategy

Relational databases are poor candidates for high-frequency telemetry. The write amplification of indexing millions of rows per hour will cripple a standard PostgreSQL instance.

Recommended Storage Engines:

  1. TimescaleDB (PostgreSQL Extension): Excellent if you want SQL capability. It partitions tables by time (hypertables) to keep indices small and writes fast.
  2. InfluxDB: Purpose-built for high-write-volume observability.
  3. ClickHouse: If you are operating at massive scale (millions of concurrent sessions), the columnar storage of ClickHouse offers superior compression and query speed for analytics.

Schema Design:
The schema should avoid deep nesting. Flatten the critical metrics for faster aggregation.

  • Tags (Indexed): session_id, room_id, user_id, platform (e.g., "chrome_110"), region.
  • Fields (Metrics): rtt, jitter, packet_loss_pct, bitrate_bps, frames_decoded_delta, freeze_flag (boolean/int).
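
For illustration, a single flattened point could look like the following (the identifiers and values are hypothetical; the field names mirror the list above rather than any fixed schema):

const point = {
  // Tags (indexed, low cardinality)
  session_id: 'a7f3c2e1-session',
  room_id: 'room-42',
  platform: 'chrome_110',
  region: 'eu-west-1',
  // Fields (metrics)
  rtt: 48.0,                 // ms
  jitter: 12.5,              // ms
  packet_loss_pct: 0.4,      // percent over the interval
  bitrate_bps: 1850000,
  frames_decoded_delta: 58,
  freeze_flag: 0
};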

Real-World Debugging Scenario

Consider a typical production incident. A VIP client reports: "The video froze for 20 seconds during the board meeting."

Without client stats, you check the server logs. The SFU logs show the WebSocket remained open (connectionState: connected). The bandwidth estimation logs show a drop, but not a disconnect. You are blind.

With the telemetry system built above, you query the specific session_id in your TSDB.

The Forensic Analysis:

  1. Check Signaling: You confirm connectionState remained "connected." The user did not disconnect.
  2. Check Bitrate: You see bitrate_bps drop from 2Mbps to 50kbps at T+10:00.
  3. Check Packet Loss: Simultaneously, packet_loss_pct spikes to 15%.
  4. Check Freeze Flag: The frames_decoded_delta drops to 0 for exactly 18 seconds, triggering the freeze_flag.
  5. Check RTT: Crucially, you see rtt spike from 50ms to 800ms before the freeze.

Conclusion: The rising RTT and Packet Loss prove this was network congestion on the user's last mile (likely Wi-Fi contention), not a backend failure. The SFU reacted correctly by lowering bitrate (congestion control), but the loss was too high to sustain video. You can now confidently explain to the client that the issue lies in their local network environment, backed by hard data.

Architectural Pattern: The Side-Channel Approach

Do not piggyback telemetry on your primary signaling WebSocket. Signaling is critical infrastructure; if your telemetry payload floods the WebSocket queue, you risk blocking "Offer/Answer" negotiation or "Candidate" exchange, causing the very failures you are trying to measure.

The Side-Channel Pattern:

  1. Signaling Channel (WebSocket): Reserved strictly for SDP, ICE candidates, and roster updates. High priority.
  2. Telemetry Channel (HTTP/POST): A separate path for stats. Use navigator.sendBeacon or a low-priority fetch loop (see the sketch after this list). If telemetry packets fail, it is acceptable; if signaling packets fail, the call drops.
  3. Aggregation Tier: A dedicated microservice (like the Python example above) that validates and queues data.
  4. Storage Tier: The Time-Series Database.
  5. Alert Engine: A worker that polls the TSDB or subscribes to the Aggregation Tier to trigger PagerDuty/Slack alerts based on freeze thresholds.
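
A minimal sketch of the final flush over the side channel, assuming a telemetryAgent instance of the WebRTCTelemetryAgent class above; navigator.sendBeacon hands the payload to the browser so it survives page unload:

window.addEventListener('pagehide', () => {
  if (telemetryAgent.buffer.length === 0) return;
  // The beacon is queued by the browser and sent even as the page tears down
  const payload = new Blob(
    [JSON.stringify({ metrics: telemetryAgent.buffer })],
    { type: 'application/json' }
  );
  navigator.sendBeacon('/v1/telemetry/ingest', payload);
  telemetryAgent.buffer = [];
});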

Conclusion

Observability in WebRTC is not about collecting logs; it is about reconstructing the user's reality. The getStats API is the only mechanism that bridges the gap between network physics and perceived quality. By implementing client-side delta computation, efficient ingestion pipelines, and intelligent freeze detection, engineering teams transform vague user complaints into actionable infrastructure insights. In the world of real-time video, if you aren't measuring the client, you aren't measuring anything.
