Flaky tests are one of the most quietly expensive problems in modern engineering teams. They don't break your product.
They don't cause incidents. But they silently drain developer time, erode trust in CI results, and make every red build feel like a guessing game.
This is the story of how Franklin, a financial platform built for e-commerce businesses, diagnosed and solved exactly that problem.
A Bit of Background
Franklin provides payment cards with high spend limits, expense management tooling, and cashback on card expenses. Because they operate in the financial domain, release confidence isn't optional; it's the baseline.
Their stack: Playwright for E2E tests, GitHub Actions for CI.
For reporting, they relied on Currents. It worked early on. But as their test suite scaled, cracks started to show.
What Was Actually Breaking
When a CI run failed, the team had no structured way to understand why.
The workflow looked something like this:
- Open GitHub
- Navigate to the CI run
- Manually inspect failures
- Try to determine: is this flaky, or is this real?
- Rerun tests when uncertain
- Repeat
"We were using Currents earlier, but as our test suite grew, it felt expensive and still lacked the visibility we needed."
— Johan Frølich, CTO & Co-founder, Franklin
Four specific problems were compounding over time:
1. No historical context: every failure looked like a new failure, even if it had happened 20 times before
2. No flaky test tracking: the team couldn't distinguish unstable tests from genuine regressions
3. No intelligent classification: debugging meant manual inspection, not structured analysis
4. Rising cost: the reporting layer was getting more expensive as usage grew, without proportional value in return
The Structural Fix
Franklin restructured their CI reporting layer by adopting TestDino. The pipeline itself stayed the same (code push → GitHub Actions → Playwright runs), but it gained a reporting and analysis step with flaky detection, quarantine, and smart retry built in.
Franklin's updated CI pipeline: Code Push → GitHub Actions → Playwright → Flaky Detection & Retry → Stable CI
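For concreteness, here's a minimal sketch of the Playwright side of such a setup, assuming a standard Playwright + GitHub Actions project. The case study doesn't describe TestDino's specific wiring, so this only shows the generic pieces: retries enabled in CI so a failed test gets another attempt that can be classified, and a machine-readable report artifact that a reporting layer can ingest.

```ts
// playwright.config.ts: illustrative only, not Franklin's actual config
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // Retry failed tests in CI so a one-off failure can be re-attempted
  // and classified, while local runs still fail fast.
  retries: process.env.CI ? 2 : 0,
  reporter: [
    ['list'],
    // Machine-readable results that an external reporting/analysis
    // layer can pick up from the CI run.
    ['json', { outputFile: 'test-results/results.json' }],
  ],
});
```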
What changed in practice:
Before: Start from GitHub. Navigate manually. Guess whether a failure is real.
After: Start from a centralized dashboard. Review failures with full history. Use flaky test detection to decide what needs attention.
"With TestDino, it became much easier to understand failures and quickly tell whether something was flaky or a real issue."
— Johan Frølich, CTO & Co-founder, Franklin
Four Specific Improvements
1. Centralized Dashboard
Instead of jumping between CI artifacts and logs, the team now starts every investigation from one place. Less context switching, faster triage.
2. Flaky Test History and Tracking
Flaky tests became visible and actionable rather than speculative. The team could see which tests had a pattern of instability and prioritize stabilization accordingly.
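To make "visible and actionable" concrete: Playwright itself already distinguishes tests that failed and then passed on retry within a single run. The sketch below is not TestDino's implementation; it shows a tiny custom reporter that surfaces those per-run flaky outcomes, with a tracking layer adding the cross-run history on top.

```ts
// flaky-reporter.ts: a minimal custom Playwright reporter (illustrative sketch)
import type { Reporter, TestCase, TestResult } from '@playwright/test/reporter';

class FlakyReporter implements Reporter {
  private flaky: string[] = [];

  onTestEnd(test: TestCase, _result: TestResult): void {
    // Playwright reports an outcome of 'flaky' when a test failed on an
    // earlier attempt but passed on a retry within the same run.
    if (test.outcome() === 'flaky') {
      this.flaky.push(test.titlePath().join(' > '));
    }
  }

  onEnd(): void {
    if (this.flaky.length > 0) {
      console.log(`Flaky tests in this run (${this.flaky.length}):`);
      for (const title of this.flaky) {
        console.log(`  - ${title}`);
      }
    }
  }
}

export default FlakyReporter;
```

A reporter like this would be registered alongside the others in the Playwright config, for example `reporter: [['list'], ['./flaky-reporter.ts']]`.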
3. Branch and Environment Mapping
Test behavior is now visible per branch and per environment. This made it significantly easier to understand where instability was originating: a staging environment, a specific PR, a particular configuration.
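One way to get that per-branch, per-environment view, sketched here under the assumption that branch and environment are exposed as environment variables in CI (GITHUB_REF_NAME and GITHUB_SHA are standard GitHub Actions variables; TEST_ENV is a hypothetical name), is to stamp each run's report with metadata that a reporting layer can group by:

```ts
// playwright.config.ts (excerpt): attach run metadata to the report
import { defineConfig } from '@playwright/test';

export default defineConfig({
  // Serialized into the test report, so results can be grouped by
  // branch / environment / commit downstream.
  metadata: {
    branch: process.env.GITHUB_REF_NAME ?? 'local',
    environment: process.env.TEST_ENV ?? 'staging', // hypothetical variable
    commit: process.env.GITHUB_SHA ?? 'uncommitted',
  },
});
```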
4. GitHub and Linear Integration
Failures surfaced directly in the tools developers were already using. Less tool-hopping, faster response.
"The interface is intuitive and easy to use, which really helps when investigating failed tests day to day."
— Johan Frølich, CTO & Co-founder, Franklin
The Results
30–40% reduction in Playwright CI reporting costs after switching
- Failure investigation: Manual inspection with no context → Centralized dashboard with full history
- Flaky test tracking: Hard to identify → Visible and actionable
- Environment visibility: Difficult to compare → Clear branch/environment mapping
- Debugging effort: Repetitive and slow → Reduced by 10–20%
- Reporting costs: Higher cost tool → 30–40% cost savings
- Developer experience: Fragmented and noisy → Clear and intuitive
"We estimate around 10 to 20% time saved on test and error-related work, and roughly 30 to 40% cost savings compared to our previous reporting setup."
— Johan Frølich, CTO & Co-founder, Franklin
The Underlying Lesson
The core problem Franklin had wasn't a tooling problem; it was a visibility problem.
When developers can't quickly understand whether a test failure is real or flaky, they do one of two things: they waste time investigating noise, or they start ignoring failures altogether. Neither is sustainable, and both erode the value of your CI pipeline over time.
The fix wasn't rewriting tests. It wasn't changing frameworks. It was getting structured, historical context into the failure investigation workflow, so that every red build came with enough information to act on.
That's what reduced debugging time and brought costs down. Not the tool itself, but what the tool made visible.
Worth Thinking About for Your Own Setup
A few questions worth asking about your current CI pipeline:
- When a test fails, how long does it take to determine if it's flaky or real?
- Do you have historical data showing which tests are consistently unstable?
- Are failures surfaced where developers already work, or do they require tool-switching to investigate?
- Is your reporting layer cost-justified relative to the visibility it actually provides?
If the answers are uncomfortable, you're probably carrying more CI overhead than you need to.
📖 Full case study with additional detail: Franklin × TestDino

