Lalit Mishra
Breaking the Limits: Hybrid WebRTC Load Testing with k6 and xk6-browser

The WebRTC Testing Gap: Why HTTP Tools Fail

In the world of standard REST APIs, load testing is a solved problem. You spin up JMeter, Locust, or standard k6, blast an endpoint with 10,000 requests per second, and measure the Time to First Byte (TTFB). If the server returns a 200 OK within 50ms, you are green.

In WebRTC, a 200 OK from the signaling server is barely the starting line.

WebRTC is not a request-response protocol; it is a complex, stateful negotiation followed by a continuous, high-bandwidth UDP stream. A "successful" test in the world of WebRTC requires validating a formidable chain of events (sketched in browser terms just after this list):

  1. Signaling: WebSocket connection, session creation, SDP offer/answer exchange.
  2. ICE Gathering: STUN packet exchange, candidate discovery, and pair selection.
  3. DTLS Handshake: Secure key exchange for the media plane.
  4. SRTP Flow: The actual encryption and transmission of audio/video packets.
  5. Congestion Control: The bandwidth estimation (BWE) loop reacting to packet loss and jitter.
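
For orientation, here is what that chain looks like from inside the browser: a minimal, hypothetical sketch in plain in-page JavaScript (not a k6 script) mapping RTCPeerConnection events to the stages above. The STUN server URL is only an example.

// Plain browser-side JavaScript; the STUN URL is a placeholder example
const pc = new RTCPeerConnection({
  iceServers: [{ urls: 'stun:stun.l.google.com:19302' }],
});

// Stage 2: ICE gathering (STUN binding requests, candidate discovery)
pc.onicegatheringstatechange = () =>
  console.log('ice gathering:', pc.iceGatheringState);

// Stages 2-3: candidate pair selection and the DTLS handshake surface here
pc.oniceconnectionstatechange = () =>
  console.log('ice connection:', pc.iceConnectionState);

// Stages 4-5: once 'connected', SRTP media flows and the BWE loop kicks in;
// verify it with pc.getStats(), as the hybrid script below does
pc.onconnectionstatechange = () =>
  console.log('overall:', pc.connectionState);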

Traditional tools like JMeter can simulate the WebSocket signaling (the "control plane"), but they are completely blind to the "media plane." They cannot execute JavaScript, they cannot decode VP8/H264 frames, and they cannot calculate jitter buffer state. You might have a signaling server that handles 50k concurrent users perfectly while your media server (SFU) drops 40% of packets because it is buckling under the encryption load.

Validating a WebRTC platform requires a browser. However, browsers are expensive. This creates the "Testing Gap": we need the volume of protocol testing combined with the fidelity of browser testing.


The Architecture: Hybrid Load Testing with k6

To bridge this gap, we utilize k6, a modern load testing tool built in Go, and its extension xk6-browser.

Standard k6 is optimized for raw throughput. It can generate thousands of virtual users (VUs) per core, ideal for stressing your signaling layer (WebSocket handling, room creation logic, database lookups).

xk6-browser embeds a headless Chromium instance directly into the k6 VU. This allows us to script full browser interactions—clicking buttons, verifying DOM elements, and crucially, executing the WebRTC JavaScript APIs (getUserMedia, RTCPeerConnection).
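
To make that concrete, here is a minimal sketch of a single browser VU, using the same k6/experimental/browser module as the full script later in this post; the room URL is a placeholder:

import { browser } from 'k6/experimental/browser';

export const options = {
  scenarios: {
    smoke: {
      executor: 'shared-iterations',
      iterations: 1,
      options: { browser: { type: 'chromium' } },
    },
  },
};

export default async function () {
  const page = browser.newPage();
  try {
    // The page executes real JavaScript, so getUserMedia and RTCPeerConnection
    // behave exactly as they would for a human user.
    await page.goto('https://staging-app.myrtcplatform.com/room/smoke-test');
  } finally {
    page.close();
  }
}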

The Hybrid Architecture is the gold standard for RTC validation. In a single test execution, we define two distinct scenarios:

  1. The Swarm (Protocol Level): 95% of the load. Thousands of lightweight VUs hitting the signaling APIs, joining rooms via WebSocket, and keeping sessions open to stress memory and connection limits.
  2. The Scouts (Browser Level): 5% of the load. Full headless Chrome instances that publish and subscribe to real media. These agents verify that media actually flows, that ICE checks complete, and that video frames are decoded.


Scripting the Test: The Implementation

This section details how to construct a hybrid test. We use k6 scenarios to mix the two workloads.

First, you must build a custom k6 binary that includes the browser extension; locally that is a single command: xk6 build --with github.com/grafana/xk6-browser. (We cover the Docker build process later.)

The Hybrid Script (load-test.js)

This script demonstrates a realistic flow: authenticating via REST, establishing a signaling path, and then launching a browser to verify media flow.

import http from 'k6/http';
import { check, sleep } from 'k6';
import { browser } from 'k6/experimental/browser';
import ws from 'k6/ws';
import { Trend } from 'k6/metrics';

// Custom metric backing the 'browser_connection_time' threshold below
const browserConnectionTime = new Trend('browser_connection_time', true);

// Configuration: Define two distinct scenarios
export const options = {
  scenarios: {
    // Scenario 1: High volume signaling stress (Lightweight)
    signaling_storm: {
      executor: 'ramping-vus',
      startVUs: 0,
      stages: [
        { duration: '30s', target: 50 }, // Ramp up to 50 concurrent signalers
        { duration: '1m', target: 50 },  // Hold
        { duration: '30s', target: 0 },  // Ramp down
      ],
      exec: 'signalingStress', // Function to execute
    },
    // Scenario 2: Real media verification (Heavyweight)
    browser_media_check: {
      executor: 'constant-vus',
      vus: 2, // Only 2 concurrent browsers (CPU intensive!)
      duration: '2m',
      exec: 'browserMediaCheck', // Function to execute
      options: {
        browser: {
          type: 'chromium',
        },
      },
    },
  },
  thresholds: {
    'http_req_duration': ['p(95)<200'], // Auth API must be fast
    'ws_connecting': ['p(95)<500'], // WS handshake < 500ms
    'browser_connection_time': ['p(90)<2000'], // Media connected < 2s
    'checks': ['rate>0.99'], // 99% success rate
  },
};

const BASE_URL = 'https://staging-api.myrtcplatform.com';
const ROOM_URL = 'https://staging-app.myrtcplatform.com/room';

// --- SCENARIO 1: Protocol Stress (Signaling Only) ---
export function signalingStress() {
  // 1. Authenticate via REST
  const authRes = http.post(`${BASE_URL}/auth/login`, JSON.stringify({
    username: `loaduser_${__VU}`,
    password: 'password123'
  }), { headers: { 'Content-Type': 'application/json' } });

  check(authRes, { 'status is 200': (r) => r.status === 200 });
  const token = authRes.json('token');

  // 2. Connect to WebSocket Signaling
  const params = { headers: { 'Authorization': `Bearer ${token}` } };
  const wsUrl = `wss://signaling.myrtcplatform.com/ws?room=loadtest`;

  const res = ws.connect(wsUrl, params, function (socket) {
    socket.on('open', () => {
      // Simulate "Join Room" protocol message
      socket.send(JSON.stringify({ event: 'join_room', roomId: 'loadtest' }));
    });

    socket.on('message', (data) => {
      const msg = JSON.parse(data);
      // Basic heartbeat logic to keep connection alive
      if (msg.event === 'ping') {
        socket.send(JSON.stringify({ event: 'pong' }));
      }
    });

    // Hold the connection open for 30 seconds to simulate a user in the room.
    // setTimeout (rather than sleep) keeps the event loop free to answer pings.
    socket.setTimeout(() => socket.close(), 30000);
  });

  check(res, { 'status is 101': (r) => r && r.status === 101 });
}

// --- SCENARIO 2: Browser Media Check (Full Stack) ---
export async function browserMediaCheck() {
  const page = browser.newPage();

  try {
    // 1. Navigate to the WebRTC Application
    // Note: We use query params to auto-join and skip UI prompts if possible
    await page.goto(`${ROOM_URL}/loadtest?autoJoin=true`);

    // 2. Inject fake media streams (Crucial for headless environments)
    // Most modern WebRTC apps handle permissions, but Chrome needs args (see Docker section)

    // Wait for the "Join" button or the local video preview
    const joinBtn = page.locator('#join-conference-button');
    await joinBtn.waitFor({ state: 'visible' });
    await joinBtn.click();

    // 3. Wait for Connection State "Connected"
    // We inject a script to poll the RTCPeerConnection state
    // This assumes the app exposes the pc object or we find it in the DOM
    await page.waitForFunction(() => {
      // Accessing internal app state - this depends on your app structure
      // Often easiest to attach the PC to the window object in test builds
      const pc = window.myWebRTCApp.peerConnection; 
      return pc && pc.iceConnectionState === 'connected';
    }, { timeout: 15000 });

    // 4. Custom Metric: Measure time to connect
    // Assumes the app records 'connection_start' and 'connection_success' performance marks
    const connectionTime = await page.evaluate(() => {
      return window.performance
        .measure('time_to_connected', 'connection_start', 'connection_success')
        .duration;
    });

    // Record the custom metric used by the 'browser_connection_time' threshold
    browserConnectionTime.add(connectionTime);

    // 5. Verify Media Flow (Stats API)
    // Check if bytes are actually being sent
    const bytesSent = await page.evaluate(async () => {
      const pc = window.myWebRTCApp.peerConnection;
      const stats = await pc.getStats();
      let sent = 0;
      stats.forEach(report => {
        if (report.type === 'outbound-rtp' && report.kind === 'video') {
          sent = report.bytesSent;
        }
      });
      return sent;
    });

    check(bytesSent, {
      'video bytes sent > 0': (v) => v > 0,
    });

    sleep(10); // Stay in call for 10 seconds

  } finally {
    page.close();
  }
}


[Figure: ICE connection state flow, Checking -> Connected -> Completed, annotated with the DTLS handshake, ICE candidate exchange, and SRTP key derivation steps.]


Resource Constraints: The Cost of Chrome

The most common mistake engineers make when transitioning to browser-based testing is underestimating the CPU cost.

A standard k6 VU (protocol-level) consumes negligible CPU and only a few megabytes of memory. You can run thousands on a laptop.
A headless Chrome instance, however, is a full operating-system process tree. Between encoding 720p video (even simulated), handling SRTP encryption, and running the rendering pipeline (even headless), a single Chrome instance can consume 0.5 to 1 full CPU core and 500MB+ of RAM.

If you attempt to spawn 50 browser users on an 8-core machine, the CPU will saturate immediately. The browser threads will starve, video encoding will lag, and your test results will show "network latency" that is actually just "local CPU starvation."

Sizing Rule of Thumb: Allocate 1 vCPU and 1GB RAM per concurrent Browser VU. On an 8-core, 16GB load generator, that means roughly 6 to 8 browser VUs at most, leaving headroom for the OS and the k6 runtime itself.

Building the Custom k6 Binary

The standard k6 binary cannot run the script above; you must compile k6 with the xk6-browser extension.

Dockerfile for Hybrid Runner:

# Build Stage
FROM golang:1.21 as builder

# Install xk6 to build custom binaries
RUN go install go.k6.io/xk6/cmd/xk6@latest

# Build k6 with the browser extension
RUN xk6 build --with github.com/grafana/xk6-browser

# Runtime Stage
FROM debian:bookworm-slim

# Install Chromium dependencies (Critical for headless mode)
RUN apt-get update && apt-get install -y \
    chromium \
    ca-certificates \
    fonts-liberation \
    libasound2 \
    libatk-bridge2.0-0 \
    libnspr4 \
    libnss3 \
    lsb-release \
    xdg-utils \
    wget \
    && rm -rf /var/lib/apt/lists/*

COPY --from=builder /go/k6 /usr/bin/k6

# Environment variables to force Chrome behavior
# "fake-ui-for-media-stream" bypasses the camera permission prompt
# "use-fake-device-for-media-stream" generates a synthetic green video pattern
ENV K6_BROWSER_ARGS="no-sandbox,disable-setuid-sandbox,fake-ui-for-media-stream,use-fake-device-for-media-stream"

WORKDIR /home/k6
ENTRYPOINT ["k6"]


Distributed Architecture: Scaling the Swarm

Because of the high resource cost of browser testing, you cannot run a large-scale test from a single machine. You need a distributed architecture.

We utilize Kubernetes to orchestrate a fleet of k6 runners. This approach allows us to scale the "browser" scenario horizontally across dozens of nodes.

Kubernetes Job for Distributed Load:

apiVersion: batch/v1
kind: Job
metadata:
  name: webrtc-load-test
spec:
  parallelism: 10 # Spin up 10 Pods simultaneously
  template:
    spec:
      containers:
      - name: k6-runner
        image: my-registry/k6-custom:latest
        command: ["k6", "run", "/scripts/load-test.js"]
        # Note: a plain Job runs the full script in every pod. To give each pod its own
        # slice of the work, pass a per-pod K6_EXECUTION_SEGMENT (e.g. "0:1/10"),
        # or use the grafana/k6-operator, which handles segmentation for you.
        env:
        - name: K6_BROWSER_HEADLESS
          value: "true"
        resources:
          requests:
            cpu: "2000m" # Guarantee 2 vCPUs per pod
            memory: "2Gi"
        volumeMounts:
        - name: test-script
          mountPath: /scripts
      restartPolicy: Never
      volumes:
      - name: test-script
        configMap:
          name: k6-scripts




CI/CD Integration: The Gateway to Production

Ad-hoc testing is useful; automated regression testing is vital. By integrating this into GitHub Actions, we ensure that no Pull Request degrades media quality.

GitHub Actions Workflow (.github/workflows/perf-test.yml)

name: WebRTC Performance Validation

on:
  push:
    branches: [ main ]
  schedule:
    - cron: '0 2 * * *' # Nightly full stress test

jobs:
  load-test:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout Code
        uses: actions/checkout@v3

      - name: Build Custom k6 (Cached)
        uses: actions/cache@v3
        with:
          path: ./k6
          key: ${{ runner.os }}-xk6-${{ hashFiles('go.sum') }}
        # (Insert build steps here if cache miss)

      - name: Run Protocol Sanity Check
        run: |
          ./k6 run scripts/protocol-check.js \
            --vus 10 --duration 30s \
            --out json=protocol_results.json

      - name: Run Browser Stress Test (Nightly Only)
        if: github.event_name == 'schedule'
        env:
          K6_BROWSER_HEADLESS: "true"
          # K6_PROMETHEUS_RW_SERVER_URL must also point at your remote-write endpoint
        run: |
          ./k6 run scripts/hybrid-load.js \
            --tag test_id=${{ github.run_id }} \
            --out experimental-prometheus-rw

      - name: Upload Artifacts
        if: failure()
        uses: actions/upload-artifact@v3
        with:
          name: k6-results
          path: ./*.json


Observability and Failure Analysis

Running the test is half the battle; interpreting the failure is the rest.

Exporting Metrics
The --out experimental-prometheus-rw flag in the CI config above is critical. It pushes real-time metrics to a remote Prometheus instance, letting you visualize the signals below (a sketch of wiring them up as custom metrics follows the list):

  • Signaling Latency: Time from WebSocket open to connected state.
  • ICE Failure Rate: Percentage of connections that stall at checking.
  • Media Latency: Time from connected to first byte received.
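
A minimal sketch of how these signals could be wired up as custom k6 metrics. The metric names and the room_joined event are assumptions about your protocol, not k6 built-ins:

import { Trend, Rate } from 'k6/metrics';

// Arbitrary metric names; they only need to match your dashboards and thresholds
const signalingLatency = new Trend('signaling_latency', true);
const iceFailureRate = new Rate('ice_failure_rate');

// Inside the ws.connect() callback of signalingStress():
//   const openedAt = Date.now();
//   socket.on('message', (data) => {
//     const msg = JSON.parse(data);
//     if (msg.event === 'room_joined') {
//       signalingLatency.add(Date.now() - openedAt);
//     }
//   });

// Inside browserMediaCheck(), around the waitForFunction() ICE check:
//   try { await page.waitForFunction(...); iceFailureRate.add(0); }
//   catch (e) { iceFailureRate.add(1); } // the connection stalled at "checking"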

Interpreting Failures

  1. High Signaling Latency, Low Media Loss: Your WebSocket server (Node.js/Go) is bottlenecked on CPU or Event Loop. The media server (SFU) is fine, but users can't get in.
  2. Fast Signaling, High Media Packet Loss: Your SFU is overloaded. It can't encrypt/decrypt packets fast enough. Check the SFU's UDP buffer sizing and kernel network tuning (net.core.rmem_max). The getStats sketch after this list shows how the browser scouts can surface this case directly.
  3. ICE Connection Timeouts: Often a STUN/TURN issue. Your STUN server might be rate-limiting the massive influx of binding requests.
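
To catch case 2 from inside a scout, you can sample receive-side packet loss and jitter straight from the WebRTC Stats API. A minimal sketch, assuming the same window.myWebRTCApp.peerConnection handle the hybrid script relies on:

import { check } from 'k6';

// Call from inside an async browser-VU function, passing the xk6-browser page object
export async function checkInboundVideo(page) {
  const rxStats = await page.evaluate(async () => {
    const pc = window.myWebRTCApp.peerConnection;
    const stats = await pc.getStats();
    const result = { packetsLost: 0, packetsReceived: 0, jitter: 0 };
    stats.forEach((report) => {
      if (report.type === 'inbound-rtp' && report.kind === 'video') {
        result.packetsLost = report.packetsLost;
        result.packetsReceived = report.packetsReceived;
        result.jitter = report.jitter; // seconds, per the WebRTC stats spec
      }
    });
    return result;
  });

  check(rxStats, {
    'video packet loss < 5%': (s) =>
      s.packetsReceived > 0 &&
      s.packetsLost / (s.packetsLost + s.packetsReceived) < 0.05,
  });
}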

Conclusion: Systems-Level Validation

WebRTC is brittle. A small race condition in your SDP negotiation logic or a memory leak in your SFU's jitter buffer implementation can bring down a platform. Testing with HTTP tools gives you a false sense of security, validating only the "lobby" of your building while the "conference room" is on fire.

By adopting the hybrid approach—using k6 for the massive swarm of signaling traffic and xk6-browser for the high-fidelity media scouts—you gain the ability to validate the entire stack. This is not just testing; it is production insurance.
