Ilya Ploskovitov

Posted on Feb 3 • Originally published at chaosqa.com

Playwright & Chaos Engineering: 3 Ways to Break Your UI in 10 Lines of Code 🧨

#testing #automation #playwright #python

"The tests are green, but production is down."

We’ve all been there. Your CI/CD pipeline looks like a Christmas tree (all green), yet five minutes after deployment, the support tickets start rolling in. Why? Because we tend to test only the "Happy Path." In the real world, users walk into elevators (network loss), backends hit database deadlocks (500 errors), and low-end devices choke on heavy JS (race conditions that only surface on a slow CPU).

Here are 3 simple ways to inject chaos into your Playwright tests, in Python or TypeScript, with no dependencies beyond Playwright itself.

1. The "Kill the Backend" Scenario (500 Error Injection)

What happens if your billing API fails? Does your UI show a "Retry" button, or does it hang forever?

Scenario: Intercept a critical API call and return a 500 Internal Server Error.

Python Code

from playwright.sync_api import Page, expect


def test_billing_failure(page: Page):
    # Intercept the payment endpoint and answer with a 500 before it reaches the backend
    page.route("**/api/v1/billing/pay", lambda route: route.fulfill(
        status=500,
        content_type="application/json",
        body='{"error": "Internal Database Error"}'
    ))

    # Relative URL: needs a base URL configured for the test run
    page.goto("/checkout")
    page.get_by_role("button", name="Pay Now").click()

    # Assert that the UI handles the crash gracefully
    expect(page.locator(".error-message")).to_be_visible()

TypeScript Code

import { test, expect } from '@playwright/test';

test('handle billing failure', async ({ page }) => {
  // Intercept the payment endpoint and answer with a 500 before it reaches the backend
  await page.route('**/api/v1/billing/pay', route => route.fulfill({
    status: 500,
    contentType: 'application/json',
    body: JSON.stringify({ error: 'Internal Database Error' }),
  }));

  await page.goto('/checkout');
  await page.getByRole('button', { name: 'Pay Now' }).click();

  // Assert that the UI handles the crash gracefully
  await expect(page.locator('.error-message')).toBeVisible();
});
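
A permanent 500 only proves that the error message shows up. To check that a "Retry" button actually recovers, you can fail the first call and let the next one through. Here is a minimal Python sketch of that idea. Note that `fail_first` is a made-up helper name, not part of Playwright, and it assumes the handler is handed a Playwright `Route` object.

```python
from typing import Any, Callable


def fail_first(times: int, status: int = 500) -> Callable[[Any], None]:
    """Build a route handler that fails the first `times` requests, then passes the rest through.

    Hypothetical helper: `route` only needs the .fulfill() and .continue_()
    methods that Playwright's Route provides.
    """
    remaining = times

    def handler(route: Any) -> None:
        nonlocal remaining
        if remaining > 0:
            remaining -= 1
            # Same canned failure as in the test above
            route.fulfill(
                status=status,
                content_type="application/json",
                body='{"error": "Internal Database Error"}',
            )
        else:
            # Budget used up: let the request go on to the real backend
            route.continue_()

    return handler
```

Register it with `page.route("**/api/v1/billing/pay", fail_first(1))`: the first click on "Pay Now" gets the 500, and whatever retry the UI offers should then go through.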

2. The "Elevator Effect" (Sudden Offline Mode)

Users move. Networks drop. If your app is an SPA, losing the connection mid-session can leave its local state corrupted.

Scenario: Start a file upload and cut the internet connection.

Python Code

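from playwright.sync_api import BrowserContext, Page, expect


def test_upload_interruption(page: Page, context: BrowserContext):
    page.goto("/upload")
    # Assumes the app starts uploading as soon as a file is selected
    page.get_by_label("File").set_input_files("heavy_video.mp4")

    # Chaos: cut the network mid-upload
    context.set_offline(True)

    # The UI should offer a way to resume (a "Connection lost" banner would be another valid signal)
    expect(page.get_by_role("button", name="Resume")).to_be_visible()

    context.set_offline(False)  # Restore network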

TypeScript Code

import { test, expect } from '@playwright/test';

test('recovery on network loss', async ({ page, context }) => {
  await page.goto('/upload');
  await page.getByLabel('File').setInputFiles('heavy_video.mp4');

  // Chaos: cut the network mid-upload
  await context.setOffline(true);

  await expect(page.getByRole('button', { name: 'Resume' })).toBeVisible();

  await context.setOffline(false); // Restore network
});
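
One catch with the tests above: if the assertion fails, the line that restores the network never runs. That matters as soon as you reuse the same context for further steps. A small context manager is one way to guarantee cleanup. This is a sketch, assuming `context` is a Playwright `BrowserContext`; the `offline` name is invented for this example.

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def offline(context: Any) -> Iterator[None]:
    """Cut the network for the duration of the `with` block, then always restore it."""
    context.set_offline(True)
    try:
        yield
    finally:
        # Runs even when an assertion inside the block fails
        context.set_offline(False)


# Usage inside a test (page/context come from the Playwright fixtures):
#
#     with offline(context):
#         expect(page.get_by_role("button", name="Resume")).to_be_visible()
```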

3. The "Old Phone" Race Condition (CPU Throttling)

Async bugs often hide behind the speed of your developer laptop. Slowing down the CPU changes how long each script takes, which can change the order in which async callbacks finish and expose elusive race conditions.

Python Code

from playwright.sync_api import Page, expect


def test_race_condition(page: Page):
    # Slow down CPU by 6x using Chrome DevTools Protocol (CDP); Chromium only
    client = page.context.new_cdp_session(page)
    client.send("Emulation.setCPUThrottlingRate", {"rate": 6})

    page.goto("/heavy-dashboard")
    page.get_by_role("button", name="Load Stats").click()

    # Assert that the status eventually becomes 'Ready' (10 s budget for the throttled CPU)
    expect(page.locator("#status")).to_contain_text("Ready", timeout=10000)

TypeScript Code

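import { test, expect } from '@playwright/test';

test('catch race conditions', async ({ page }) => {
  // Slow down CPU by 6x using Chrome DevTools Protocol (CDP); Chromium only
  const client = await page.context().newCDPSession(page);
  await client.send('Emulation.setCPUThrottlingRate', { rate: 6 });

  await page.goto('/heavy-dashboard');
  await page.getByRole('button', { name: 'Load Stats' }).click();

  // Assert that the status eventually becomes 'Ready' (10 s budget for the throttled CPU)
  await expect(page.locator('#status')).toContainText('Ready', { timeout: 10000 });
});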
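
If you only want part of a test to run slow, the throttle can be scoped and then reset. In CDP, a rate of 1 means no throttling. The sketch below wraps that in a context manager; `throttled_cpu` is a hypothetical name, and it assumes `page` is a Playwright `Page` running in a Chromium-based browser.

```python
from contextlib import contextmanager
from typing import Any, Iterator


@contextmanager
def throttled_cpu(page: Any, rate: float = 6) -> Iterator[None]:
    """Throttle the page's CPU by `rate` inside the `with` block, then reset it."""
    client = page.context.new_cdp_session(page)
    client.send("Emulation.setCPUThrottlingRate", {"rate": rate})
    try:
        yield
    finally:
        # A rate of 1 switches throttling off again
        client.send("Emulation.setCPUThrottlingRate", {"rate": 1})
        client.detach()


# Usage inside a test:
#
#     with throttled_cpu(page, rate=6):
#         page.get_by_role("button", name="Load Stats").click()
#         expect(page.locator("#status")).to_contain_text("Ready", timeout=10000)
```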

💡 Pro Tip: When to Run These?

Don't run chaos tests on every PR. They are inherently more complex and can be "flaky" if your timeouts aren't tuned.

Best Practice: Add them to a nightly or pre-release suite.

Limit: CPU throttling goes through the Chrome DevTools Protocol (CDP), so it only works on Chromium-based browsers.
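
Both rules can live in one small gate that each chaos test consults before doing anything. This is only a sketch: the `RUN_CHAOS` environment variable and the `chaos_skip_reason` helper are assumptions made up for this example, and the browser name is assumed to be one of Playwright's `chromium`, `firefox`, or `webkit`.

```python
import os
from typing import Mapping, Optional


def chaos_skip_reason(
    browser_name: str,
    needs_cdp: bool = False,
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return why a chaos test should be skipped, or None if it should run."""
    # Hypothetical switch: only the nightly / pre-release job sets RUN_CHAOS=1
    if env.get("RUN_CHAOS") != "1":
        return "chaos tests run only when RUN_CHAOS=1"
    # CDP-based tests (CPU throttling) need a Chromium-based browser
    if needs_cdp and browser_name != "chromium":
        return f"CDP is unavailable on {browser_name}"
    return None


# Usage inside a test, with pytest and the pytest-playwright `browser_name` fixture:
#
#     reason = chaos_skip_reason(browser_name, needs_cdp=True)
#     if reason:
#         pytest.skip(reason)
```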

Wrapping Up
Resilience is a feature. If you only test for success, you're only doing half of your job as a QA Engineer. Break your UI before your users do.

I’ve written a more detailed deep-dive on Resilience Strategy & CI/CD integration on my new blog. Check it out at ChaosQA.com.

Top comments (1)

myroslav mokhammad abdeljawwad • Feb 9

This is a great example of testing the system, not just the UI.

The backend-kill and offline scenarios especially resonate — most bugs I’ve seen in production weren’t logic errors, they were “the backend is slow / dead / half-alive and the frontend didn’t expect it.” Intercepting real routes instead of mocking everything feels like the right balance between realism and control.

CPU throttling is an underrated one too. A lot of race conditions simply never show up on a fast dev machine, but the moment you simulate an older device or a busy browser, async assumptions fall apart.

I also appreciate the point about not running chaos tests on every PR. Treating resilience as a pre-release concern rather than a gating unit test matches how these failures actually behave in the wild.

Out of curiosity — have you seen teams push this further by pairing Playwright chaos tests with backend fault injection (e.g., delayed responses, partial writes) so the UI and API are stressed together? Feels like that’s where these tests really start paying dividends.

Good stuff — this is the kind of testing content more teams should be practicing, not just reading about.