Antoine Dubois

Posted on Jun 11

The Modern Test Automation Stack Is Not Just Playwright vs Selenium Anymore

#testing #qa #ai #automation

There was a time when choosing a test automation stack mostly meant choosing between Selenium and whatever newer tool people were excited about that year.

That conversation feels too small now.

Modern test automation is not just about whether a browser can click a button.

It is about whether your team can keep tests alive after the product changes, whether CI failures are trustworthy, and whether your tool can handle login, emails, SMS, APIs, test data, roles, sessions, preview environments, mobile layouts, and all the other boring things that turn a nice demo into a maintenance job.

That is why I like thinking about test automation in terms of ownership.

Not just:

Can this tool create a test?

But:

Can this team still trust, debug, and maintain this suite six months from now?

I went through the guides on Test Automation Tools and grouped them into a more practical reading path.

Start with the business case

Before comparing tools, it helps to understand what automation is supposed to save.

A lot of teams talk about ROI in vague terms. "We want to automate regression" sounds good, but leadership usually needs a more concrete answer:

How many manual testing hours are being saved?
How many release delays are being avoided?
How many defects are being caught earlier?
How much time is being lost maintaining the automation itself?

A good place to start is the Test Automation ROI Calculator.

The useful thing about ROI thinking is that it forces you to count hidden costs. A free open-source framework is not free if a senior engineer spends a week every month fixing selectors, test data, CI config, reports, and flaky failures.
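
To make the hidden-cost point concrete, here is a rough sketch of the arithmetic. It is not the calculator mentioned above; the field names and every number are hypothetical inputs chosen only to illustrate how maintenance hours eat into the savings.

```python
from dataclasses import dataclass


@dataclass
class AutomationCosts:
    """Monthly figures. Every value is a hypothetical input, not a benchmark."""
    manual_hours_saved: float   # regression hours no longer done by hand
    maintenance_hours: float    # selectors, test data, CI config, reports, flaky failures
    hourly_rate: float          # blended cost of one engineering hour
    tooling_cost: float = 0.0   # licenses, CI minutes, device clouds


def monthly_net_savings(c: AutomationCosts) -> float:
    """Value of the hours saved minus the cost of keeping the suite alive."""
    saved = c.manual_hours_saved * c.hourly_rate
    spent = c.maintenance_hours * c.hourly_rate + c.tooling_cost
    return saved - spent


# A "free" framework where one engineer spends about a week a month on upkeep.
free_framework = AutomationCosts(
    manual_hours_saved=60, maintenance_hours=40, hourly_rate=80.0
)
print(monthly_net_savings(free_framework))  # 1600.0
```

With these made-up inputs the suite still pays for itself, but two thirds of the saving is already spent on upkeep. Raise maintenance_hours to 60 and the net drops to zero.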

That connects directly to the Flaky Test Cost Calculator, because flaky tests are one of the easiest automation costs to underestimate.

A flaky test does not just waste the time needed to rerun it. It creates a decision every time CI goes red:

Is this a real bug?
Should we block the release?
Who has enough context to debug it?
Can we ignore it this time?
Should we quarantine it?

Once that happens often enough, people stop trusting the pipeline.

And when people stop trusting the pipeline, automation becomes theater.
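
A back-of-the-envelope version of that cost, as a sketch only. The function name, the parameters, and the sample numbers are assumptions for illustration; it counts triage time and ignores the harder-to-measure cost of lost trust.

```python
def weekly_flaky_cost_hours(
    ci_runs_per_week: int,
    flaky_failure_rate: float,
    triage_minutes_per_failure: float,
) -> float:
    """Expected hours per week spent deciding what a red build means."""
    if not 0.0 <= flaky_failure_rate <= 1.0:
        raise ValueError("flaky_failure_rate must be between 0 and 1")
    false_alarms = ci_runs_per_week * flaky_failure_rate
    return false_alarms * triage_minutes_per_failure / 60


# Hypothetical team: 200 runs a week, 5% go red for non-product reasons,
# and each red build takes 30 minutes to triage.
print(weekly_flaky_cost_hours(200, 0.05, 30))  # 5.0
```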

Tool selection is really maintenance selection

A lot of tool comparisons focus on features.

That is fine, but the better question is usually maintenance.

The article The Real Cost of Maintaining Locator-Heavy UI Tests gets into one of the biggest long-term problems in UI automation: locators.

Selectors look like a small detail when the suite is new. Then the frontend changes. A button moves. A label changes. A CSS class gets regenerated. A component library update changes the DOM. Suddenly the test suite becomes a second product that also needs constant care.

That is why these comparison pieces are useful:

This is not really about declaring that one approach is always better.

Code-first tools like Playwright, Cypress, and Selenium can be great when the team has the skill and discipline to maintain the stack. But that also means the team owns everything around the framework: fixtures, helpers, selectors, reports, environments, retries, data setup, CI behavior, and debugging workflow.

A managed or low-code platform can make more sense when the goal is broader test ownership, especially if QA, product, or support teams need to inspect and update flows without turning every change into a developer ticket.

No-code and low-code testing are mostly about who owns the tests

No-code testing sometimes gets dismissed too quickly.

The weak version of no-code is record-and-playback that creates brittle tests nobody trusts.

But the useful version is different. It gives teams an editable test model, lowers the barrier for test creation, and reduces the amount of custom framework work needed to cover business flows.

These guides are good for that part of the evaluation:

The practical question is not "Can non-technical people create tests?"

The better question is:

Can the people closest to the regression risk contribute to the automation without making the suite worse?

That distinction matters.

A manual QA person who understands the product deeply might be better positioned to define a critical regression flow than a developer who only sees the implementation. But the tool still needs guardrails. Otherwise, the suite can become a pile of duplicated, fragile, unclear flows.

Good low-code tools should not hide complexity in a way that makes debugging impossible. They should expose enough structure that tests remain understandable, reviewable, and maintainable.

Browser coverage is still a real problem

Browser testing is one of those topics people assume is mostly solved.

It is not.

Chrome on a developer laptop is not the same thing as Safari on macOS, Edge in an enterprise environment, Firefox in CI, or a mobile viewport with different rendering behavior.

For browser coverage, these guides are useful:

The key is to avoid treating browser coverage as a giant checkbox.

You probably do not need every test on every browser. You need a smart browser matrix based on risk:

critical flows across major browsers
layout-sensitive flows across responsive breakpoints
payment, login, and onboarding flows in realistic environments
a smaller smoke suite for fast CI feedback
deeper regression runs where the cost is justified

Testing everything everywhere sounds responsible, but it can become slow, expensive, and noisy.

The goal is confidence, not maximum theoretical coverage.
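
One way to express a risk-based matrix is to tag tests and let the tags decide where they run. This is a minimal sketch: the tag names, the browser and viewport labels, and the targets_for helper are all hypothetical, and a real project would map them onto whatever its runner calls projects or configurations.

```python
# Hypothetical tags and target names, for illustration only.
MATRIX = {
    "critical": ["chromium", "firefox", "webkit"],
    "layout": ["chromium-desktop", "chromium-tablet", "chromium-mobile"],
    "smoke": ["chromium"],
}
DEFAULT_TARGETS = ["chromium"]


def targets_for(tags: set[str]) -> list[str]:
    """Union of targets for every tag on a test, in a stable order."""
    selected: list[str] = []
    for tag in sorted(tags):
        for target in MATRIX.get(tag, []):
            if target not in selected:
                selected.append(target)
    # Untagged or unknown tests fall back to a single fast target.
    return selected or list(DEFAULT_TARGETS)


print(targets_for({"critical"}))       # ['chromium', 'firefox', 'webkit']
print(targets_for({"settings-page"}))  # ['chromium']
```

The point of the fallback is that most tests stay cheap by default, and a wider run has to be asked for explicitly.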

CI failures need a debugging workflow, not just reruns

CI is where test automation gets real.

A suite that passes locally but fails in CI is not necessarily a bad suite. But if nobody can quickly explain why it failed, it becomes a release problem.

These two guides are especially useful:

A good CI test gate should answer a few questions quickly:

Did the product break?
Did the test break?
Did the environment break?
Is the failure reproducible?
Is this blocking or informational?
Who owns the fix?

Too many teams treat all red builds the same. That is how release gates become noisy and political.

A reliable gate needs tiers. Some tests should block releases. Some should warn. Some should run nightly. Some should be quarantined, but only temporarily. The release process should reflect risk, not just test count.
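
A tiered gate can be sketched in a few lines. The tier names follow the paragraph above; the Tier enum, the gate_decision function, and the test names are hypothetical, and a real pipeline would read the tiers from test metadata rather than a hand-written dict.

```python
from enum import Enum


class Tier(Enum):
    BLOCKING = "blocking"        # a failure stops the release
    WARNING = "warning"          # a failure is reported but does not block
    NIGHTLY = "nightly"          # runs outside the release gate
    QUARANTINED = "quarantined"  # temporarily excluded from the gate


def gate_decision(failures: dict[str, Tier]) -> str:
    """Map failed test names and their tiers to one release decision."""
    tiers = set(failures.values())
    if Tier.BLOCKING in tiers:
        return "block"
    if Tier.WARNING in tiers:
        return "warn"
    return "pass"


failed = {
    "checkout_pays_with_card": Tier.WARNING,
    "profile_avatar_upload": Tier.QUARANTINED,
}
print(gate_decision(failed))  # warn
```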

The guide Why Test Suites Fail Only in Preview Environments: A Debugging Guide for Modern Web Teams is also worth reading because preview environments create their own strange category of failures.

Preview environments often differ from production in small but important ways:

seeded data
auth configuration
feature flags
CDN behavior
asset caching
domain and cookie rules
deployment timing
third-party integrations

A test failure in preview might be a product bug, but it might also be a deployment or environment issue. You need evidence before you guess.
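
One cheap source of evidence is a diff of the settings each environment actually runs with. The sketch below assumes you can export both environments' settings as flat dictionaries; the config_drift helper, the keys, and the values are hypothetical.

```python
UNSET = "<unset>"  # placeholder for a key that one environment does not define


def config_drift(preview: dict, production: dict) -> dict:
    """Return {key: (preview_value, production_value)} for every differing key."""
    drift = {}
    for key in sorted(preview.keys() | production.keys()):
        left = preview.get(key, UNSET)
        right = production.get(key, UNSET)
        if left != right:
            drift[key] = (left, right)
    return drift


# Hypothetical settings, for illustration only.
preview_env = {
    "cookie_domain": ".preview.example.test",
    "feature_flags.new_checkout": True,
    "seed_data": "minimal",
    "cdn_cache_ttl_seconds": 0,
}
production_env = {
    "cookie_domain": ".example.test",
    "feature_flags.new_checkout": False,
    "cdn_cache_ttl_seconds": 0,
}

for key, (left, right) in config_drift(preview_env, production_env).items():
    print(f"{key}: preview={left!r} production={right!r}")
```

Attaching a diff like this to a failed preview run does not prove the cause, but it narrows the guess before anyone starts debugging the test itself.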

Flaky UI tests usually come from boring causes

Flakiness has a mythology around it, but the causes are usually boring.

Unstable selectors. Shared test data. Bad waits. Race conditions. Network timing. Environment drift. Overlapping parallel tests. Animations. UI state that was not reset properly.

The guide Flaky UI Tests: Root Causes, Fix Patterns, and Prevention is a good overview.

The important thing is to stop treating flakiness as random.

Most flaky tests are telling you that something is uncontrolled:

the page state
the data state
the browser state
the environment
the timing model
the selector strategy

Once you identify what is uncontrolled, the fix becomes less mysterious.
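
Before hunting for the uncontrolled variable, it helps to separate flaky tests from consistently failing ones. A simple heuristic, sketched here under the assumption that CI history is available as (test name, commit, passed) records: flag any test that both passed and failed on the same commit. The find_flaky name and the sample history are hypothetical.

```python
from collections import defaultdict


def find_flaky(runs: list[tuple[str, str, bool]]) -> set[str]:
    """Flag tests with mixed outcomes on a single commit.

    Each run is (test_name, commit_sha, passed). If the same commit produced
    both a pass and a failure, the code under test did not change between
    the two outcomes, so something else did.
    """
    outcomes: dict[tuple[str, str], set[bool]] = defaultdict(set)
    for test_name, commit_sha, passed in runs:
        outcomes[(test_name, commit_sha)].add(passed)
    return {test for (test, _), seen in outcomes.items() if seen == {True, False}}


# Hypothetical CI history.
history = [
    ("login_with_sso", "a1b2c3", True),
    ("login_with_sso", "a1b2c3", False),  # same commit, different outcome
    ("export_report", "a1b2c3", False),
    ("export_report", "d4e5f6", True),    # outcome changed along with the commit
]
print(find_flaky(history))  # {'login_with_sso'}
```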

Hard UI surfaces need to be evaluated before buying a tool

A clean login page is not a good tool evaluation.

Any test automation tool can look good on a simple login form.

The real evaluation should include the annoying parts of your app:

iframes
Shadow DOM
dynamic components
multi-role flows
session isolation
API-driven setup
test data reset
mobile breakpoints
checkout flows
email or SMS verification
third-party widgets

These guides cover those harder surfaces:

The self-healing locators topic is especially interesting.

Self-healing can be useful, but it should not be magic. If a tool changes a locator automatically, the team should be able to understand what changed and why. Otherwise, you may reduce maintenance in one place while creating a trust problem somewhere else.

Automation needs debuggability as much as it needs resilience.

End-to-end testing is bigger than browser automation

Browser automation is only part of end-to-end testing.

A real user journey may include:

sign-up
email verification
SMS OTP
checkout
API side effects
database state
file uploads
downloads
notifications
webhooks

That is why the Best End-to-End Testing Tools guide is useful.

It pushes the conversation past "can this tool click through the UI?" and toward "can this tool validate the workflow the business actually cares about?"

The same applies to broader comparison articles like:

Small QA teams especially need to be careful here.

They usually do not have unlimited time to maintain a custom framework, debug flaky test infrastructure, and build missing integrations around a browser library. The tool choice needs to match team capacity, not just technical preference.

AI testing is becoming part of regression strategy

AI is changing test automation, but not in the simplistic "AI writes all the tests and everyone goes home" way.

The more realistic version is that AI helps with test creation, locator recovery, coverage suggestions, and faster maintenance. But teams still need review, structure, and clear release criteria.

These two articles are good for that topic:

The second one is especially relevant as more products add AI features directly into the UI.

LLM-powered features are awkward to test because the output is not always deterministic. Exact text assertions become brittle. Prompt changes can alter tone, format, ordering, or length without necessarily breaking the user experience.

So the testing strategy has to change.

Instead of testing every generated sentence literally, teams need to define contracts:

required sections
safe rendering
length boundaries
fallback behavior
loading and streaming states
error handling
business-level expectations

AI does not remove the need for testing. It just changes what needs to be tested.
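
A contract check of that kind might look like the sketch below. It covers only three of the items above (required sections, length boundaries, and a crude safe-rendering check); the OutputContract class, the section names, and the limits are hypothetical, and the script-tag pattern is a stand-in for real sanitization, not a replacement for it.

```python
import re
from dataclasses import dataclass, field


@dataclass
class OutputContract:
    required_sections: list[str]
    min_chars: int
    max_chars: int
    # Crude safe-rendering check: reject anything that opens a script tag.
    forbidden_patterns: list[str] = field(default_factory=lambda: [r"<\s*script\b"])


def contract_violations(text: str, contract: OutputContract) -> list[str]:
    """Return a list of human-readable problems; an empty list means the text passes."""
    problems = []
    lowered = text.lower()
    for section in contract.required_sections:
        if section.lower() not in lowered:
            problems.append(f"missing section: {section}")
    if not contract.min_chars <= len(text) <= contract.max_chars:
        problems.append(
            f"length {len(text)} outside {contract.min_chars}-{contract.max_chars}"
        )
    for pattern in contract.forbidden_patterns:
        if re.search(pattern, text, flags=re.IGNORECASE):
            problems.append(f"unsafe content matches: {pattern}")
    return problems


contract = OutputContract(["Summary", "Next steps"], min_chars=20, max_chars=500)

# Two differently worded outputs can both satisfy the same contract.
a = "Summary: the invoice is overdue. Next steps: send a reminder."
b = "SUMMARY\nPayment is late.\nNEXT STEPS\nEmail the customer today."
print(contract_violations(a, contract))  # []
print(contract_violations(b, contract))  # []
```

The assertion no longer depends on the exact wording, so a prompt change that alters tone or ordering does not break the test unless it also breaks the contract.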

A practical way to choose your stack

After going through all of these guides, I think a useful decision process looks like this:

1. Define the flows that actually matter

Do not start with tools.

Start with the flows that would hurt the business if they broke:

signup
login
billing
checkout
onboarding
account changes
password reset
data import
critical reports
notifications

Then decide what kind of testing each flow needs.

2. Separate browser testing from workflow testing

Some tests only need browser automation.

Others need API setup, email validation, SMS verification, database checks, or cross-user behavior.

Those are different problems. Do not pretend one simple browser script covers all of them.

3. Estimate maintenance honestly

Ask who will update tests after UI changes.

If the answer is "only one engineer who is already busy," that is a risk.

If the answer is "QA can update common flows safely," that changes the tool requirements.

4. Evaluate on ugly cases

Do not buy a tool after a polished demo.

Try it on the messy parts:

flaky pages
dynamic elements
iframes
Shadow DOM
real auth
real test data
preview environments
CI failures
mobile layouts
multi-role workflows

That is where you learn the truth.

5. Measure trust, not just coverage

A test suite with 2,000 tests can still be useless if everyone ignores the failures.

Track things like:

failure rate
false failure rate (failures not caused by a product defect)
rerun frequency
time to debug
time to update after UI changes
number of tests quarantined
release delays caused by automation

Those numbers tell you whether the suite is helping or slowing the team down.
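
A few of these can be computed from triage records alone. The sketch below assumes each failure gets a verdict during triage (real defect or not) and a rough debugging time; the Failure record, the trust_report function, and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Failure:
    test_name: str
    real_defect: bool     # triage verdict: did the product actually break?
    debug_minutes: float  # time spent reaching that verdict


def trust_report(total_runs: int, failures: list[Failure]) -> dict[str, float]:
    """Summarize how often the suite fails and how often it fails for no product reason."""
    if total_runs <= 0:
        raise ValueError("total_runs must be positive")
    false_failures = [f for f in failures if not f.real_defect]
    avg_debug = (
        sum(f.debug_minutes for f in failures) / len(failures) if failures else 0.0
    )
    return {
        "failure_rate": len(failures) / total_runs,
        "false_failure_rate": len(false_failures) / total_runs,
        "avg_debug_minutes": avg_debug,
    }


# Hypothetical month: 100 runs, 4 failures, only 1 of them a real defect.
failures = [
    Failure("checkout_total", real_defect=True, debug_minutes=60),
    Failure("login_redirect", real_defect=False, debug_minutes=20),
    Failure("login_redirect", real_defect=False, debug_minutes=20),
    Failure("avatar_upload", real_defect=False, debug_minutes=20),
]
print(trust_report(100, failures))
```

In this made-up month, three of the four red builds were not product bugs. That ratio says more about whether people will believe the next red build than the raw test count does.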

Final thought

The test automation market is noisy because every tool can show a nice demo.

The harder question is what happens after the demo.

Who maintains the tests?

Who debugs failures?

Who owns the data?

Who fixes the selectors?

Who decides whether CI is red because the product broke or because the test suite is having a bad day?

That is where the real cost shows up.

The best test automation stack is not the one that creates the first test fastest. It is the one your team can keep trusting as the product, browser landscape, CI pipeline, and release process keep changing.

Top comments (1)

TopStar AI • Jun 11

This is an excellent perspective on test automation ownership and maintainability. I really appreciate how you highlight that the real cost and risk come not from writing tests, but from maintaining, debugging, and trusting them over time. Your emphasis on separating browser automation from workflow testing, evaluating tools on messy real-world cases, and measuring trust metrics instead of just coverage is especially valuable for teams scaling testing efforts.
I’d love to collaborate and explore ways to enhance test resilience with AI-assisted maintenance, self-healing locators, and CI observability, ensuring teams can trust automation while reducing technical debt. Sharing strategies on flaky test detection, environment drift handling, and multi-role workflow coverage could be very impactful.
Would you be open to discussing collaboration or prototyping approaches for robust, scalable, and trustworthy test automation workflows?