Flakiness isn't from your test framework
https://www.pexels.com/photo/three-brown-bread-on-ceramic-plate-8909323/ - Flakiness in your tests and on the plate :D


This week I saw Filip Hric share a post from Gleb Bahmutov, ex-principal engineer at Cypress, explaining that the way Cypress works, running inside the browser rather than going through a transport layer like Playwright or WebDriver-based frameworks do, makes its tests less flaky.

That’s 100% not the reason your tests are flaky. I’m not going to lie, it shocked me that Gleb would say this, as I’ve always thought of him as a good engineer after seeing his work on Cypress. The real reason your tests are flaky comes down to how you interpret the UI and how the browser runs the code. To put it another way: you think of your test as a series of synchronous steps, while the browser runs everything asynchronously. That mismatch leads to flakiness.
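To make that concrete, here is a minimal sketch of the mismatch using Selenium in Python. The page, locators, and expected count are all hypothetical; the point is only that the assertion can run before the browser has finished re-rendering.

```python
# Hypothetical page: clicking "add-item" triggers an asynchronous list update.
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com/todo")  # hypothetical URL

driver.find_element(By.ID, "add-item").click()

# Flaky: this runs as soon as the click returns, which may be before the
# front end has re-rendered the list, so the count is sometimes stale.
items = driver.find_elements(By.CSS_SELECTOR, "#todo-list li")
assert len(items) == 1

driver.quit()
```

The fix is to wait on the condition you actually care about, which is what the waiting strategies later in this post are for.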

Add in different browsers interpreting the code in different ways and you get even more flakiness. This is why some frameworks don’t want to, or can’t, support different browsers.

Single-threadedness of JavaScript

Cypress runs in the page that’s being tested. That means Cypress is hemmed in by the same-origin policy. It injects what it needs into the page, and the problem is that the test runner and the application then share a single JavaScript thread, so the CPU has to swap between commands from different tasks. This can look like less flakiness, but only because everything is running much more slowly. Since there is no guarantee of ordering between test commands and the front end, the reduced flakiness is pure chance.
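As a loose analogy only (this is not Cypress’s implementation), here is a small Python asyncio sketch of an "app" task and a "test" task sharing one thread and one event loop. Neither can assume the other has finished its previous step before its own next step runs; the interleaving depends entirely on when each task yields.

```python
import asyncio

async def app():
    # Stand-in for the front end re-rendering in small asynchronous steps.
    for i in range(3):
        await asyncio.sleep(0)  # yield back to the shared event loop
        print(f"app: render step {i}")

async def test():
    # Stand-in for test commands queued on the same single thread.
    for i in range(3):
        await asyncio.sleep(0)  # also yields; ordering vs. app() is not guaranteed
        print(f"test: command {i}")

async def main():
    # Both coroutines run interleaved on one thread, like test commands and
    # application code sharing the page's JavaScript thread.
    await asyncio.gather(app(), test())

asyncio.run(main())
```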

Selenium, when Jason Huggins created it, used this same in-page technique for automating the browser, and Selenium moved away from it when we merged with WebDriver. Hugs has been calling this out forever. It’s also the reason why you can’t do basic things like trusted events, iframes, or navigating between different origins.

Driving the browser from the outside in, where the inside is your webpage, is always going to give you a more realistic testing experience.

Transport layers

The transport layer for speaking to the browser doesn’t affect flakiness. Its main benefit is scalability. If your tests have to run in the same browser as the runner, you struggle to scale. Since Selenium’s main transport is based on HTTP, we know it’s highly scalable. Cypress tests are less scalable because Cypress wanted to do everything in the browser.
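For illustration, the W3C WebDriver wire protocol is just HTTP and JSON, which is why it is easy to put a remote machine or a whole Grid behind it. A rough sketch, assuming a Selenium server or Grid listening locally on port 4444:

```python
import requests

BASE = "http://localhost:4444"  # assumption: a local Selenium server or Grid

# Create a session: the server starts (or routes to) a browser for us.
resp = requests.post(f"{BASE}/session", json={
    "capabilities": {"alwaysMatch": {"browserName": "chrome"}}
})
session_id = resp.json()["value"]["sessionId"]

# Navigating is just another HTTP call, so the browser can live anywhere.
requests.post(f"{BASE}/session/{session_id}/url",
              json={"url": "https://example.com"})

# Tear the session down when we're done.
requests.delete(f"{BASE}/session/{session_id}")
```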

But what about CDP?

What about it? It’s a Chromium-based protocol. Playwright uses it, Puppeteer uses it, and, this might shock you, Selenium uses it too. For a start, it’s how EdgeDriver and ChromeDriver speak to the browser, so if you’ve used ChromeDriver at any point in the last 8+ years, you’ve used CDP. Selenium can also speak directly to the browser using CDP for some commands; its network interception and logging APIs are examples. WebdriverIO likewise supports both WebDriver and Puppeteer through these APIs. So if we follow Gleb’s post, we would need to move everything down to the Playwright/Puppeteer part of his diagram.
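As a hedged example of Selenium speaking CDP directly: the Python bindings expose execute_cdp_cmd on Chromium-based drivers. The blocked URL below is hypothetical, and these CDP domains only exist in Chromium, which is exactly the limitation discussed next.

```python
from selenium import webdriver

driver = webdriver.Chrome()

# Send raw CDP commands: enable the Network domain, then block requests to a
# hypothetical analytics host before loading the page.
driver.execute_cdp_cmd("Network.enable", {})
driver.execute_cdp_cmd("Network.setBlockedURLs",
                       {"urls": ["*analytics.example.com*"]})

driver.get("https://example.com")
driver.quit()
```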

The one downside to relying on CDP is that you’re limited to Chromium-specific APIs. These are not stable by design, as the Chromium team will tell you. It’s the reason why ChromeDriver/EdgeDriver needs to be updated with each browser release. Fortunately, Selenium Manager can update your drivers for you without you needing to worry. If you’re not using Selenium Manager you will have to update all your dependencies, just as you would with Playwright or Puppeteer. If you’re using the Selenium event-driven APIs then you will also have to update your Selenium dependency.

As mentioned, it does limit us to Chromium-specific APIs, which is why Selenium is working with Google, Apple, Mozilla, and a few other little companies to bring about the new WebDriver BiDi spec. When WebDriver BiDi lands in all browsers, the need to update in lockstep with the browser will drop away, just as it did with Firefox and geckodriver.

The Puppeteer team is also supporting WebDriver BiDi. A debug protocol is not the best fit for automation as it relies heavily on browser state; it’s great for debugging, but not for automation. Since this work is happening in the open, we would be happy for the Playwright and Cypress teams to come and collaborate with us.

How do we solve flakiness then?

Auto-waiting, minimizing what your tests are doing, and root cause analysis of failures. WebdriverIO, Nightwatch, Playwright, Puppeteer, and Cypress all have auto-waiting, each at a different level. Opinions differ on what should be waited for, but they are all aiming for the same result.

Selenium has had auto-waiting in a minimal way with explicit and implicit waits. People can struggle with them, and these waits are opinionated just like the ones above. So… we’re down to which opinions we like. If you’re learning a tool for your CV… well, that’s just a different set of opinions. Knowing how to do web testing, and understanding it, is the superpower you need.
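For reference, a small sketch of the two styles using Selenium’s Python bindings; the page and locators are hypothetical, and in a real suite you would generally pick one style rather than mixing them.

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()

# Implicit wait (global, opinionated): every find_element polls for up to 5s.
# driver.implicitly_wait(5)

driver.get("https://example.com/app")
driver.find_element(By.ID, "save").click()

# Explicit wait: poll for the specific condition this test cares about.
WebDriverWait(driver, 10).until(
    EC.text_to_be_present_in_element((By.ID, "status"), "Saved")
)

driver.quit()
```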

As an aside, when I was at Mozilla, Automattic came to us saying they wanted Puppeteer support because they were dropping Selenium due to flakiness… and then the flakiness was still happening with Puppeteer.

Flakiness has been around for a long time. Simon Stewart wrote a good blog post about these problems 15 years ago and how to solve some of them. This is not a new problem, and anyone telling you their framework will solve it is lying.

Unfortunately, there will still be some flakiness… and it’s not your fault.

Front-end frameworks hate testers

These frameworks make flakiness an equal-opportunity problem when it comes to testing: they cause it no matter which tool you use.

Browsers have had to put a lot of effort into improving the speed of rendering and painting to handle these frameworks, because of the constant moving of elements in and out of the DOM. These asynchronous changes to the DOM affect the way your tests run. In Selenium you’ll hit a StaleElementReferenceException. These are painful, but auto-waiting can solve it for you easily. Batteries-included systems, like NightwatchJS, WebdriverIO, Playwright, and Cypress, try to make how this works opaque to the end user.
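One common way of coping with that, sketched here with Selenium in Python as an illustration rather than the internals of any of those frameworks: re-locate the element on every attempt instead of holding on to a reference the front end has already thrown away.

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.common.exceptions import StaleElementReferenceException

driver = webdriver.Chrome()
driver.get("https://example.com/app")  # hypothetical page that re-renders a lot

def click_when_stable(locator, timeout=10):
    """Re-find and click the element, retrying if the DOM was re-rendered."""
    def attempt(d):
        try:
            d.find_element(*locator).click()  # fresh lookup on every poll
            return True
        except StaleElementReferenceException:
            return False  # the node was replaced mid-interaction; poll again
    WebDriverWait(driver, timeout).until(attempt)

click_when_stable((By.ID, "refresh"))
driver.quit()
```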

Asynchronicity is hard

As I alluded to above, we don’t think about tests the same way we think about rendering the front end. When we want to take data off a server and render it, it all happens asynchronously because of how JavaScript and the fetch APIs work. The move back to server-side rendering is not going to solve this either… it just shifts the heavy lifting around. Yes, with promises we can write code that looks synchronous, but that only applies to the bit of code it wraps. A slow response from a server can still impact our tests. Having your tests and your front end competing on the same thread won’t make your tests less flaky other than by sheer luck.

Finally… work in the same world as your users

So is there any benefit to running your tests in Electron? Well… how many of your users use your site through Electron? Probably about the same number as would use Playwright-Firefox or Playwright-WebKit. It’s a number that is very close to 0.

I’ve talked many times about making your tests work in the environments your users actually use. It’s important to have a number of tests running there, and I’ve got examples in my talk from last year. Different environments, like mobile versus desktop, can also lead to different reasons for flakiness.

So… when picking a framework, pick one that works well for you. It should be able to do a login test easily. If it can’t, then it shouldn’t be used.
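For completeness, this is roughly the bar I mean, sketched with Selenium in Python. The URL, locators, and credentials are all hypothetical.

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
try:
    driver.get("https://example.com/login")
    driver.find_element(By.NAME, "username").send_keys("test-user")
    driver.find_element(By.NAME, "password").send_keys("not-a-real-password")
    driver.find_element(By.CSS_SELECTOR, "button[type=submit]").click()

    # Wait for something only a logged-in user can see.
    WebDriverWait(driver, 10).until(
        EC.visibility_of_element_located((By.ID, "account-menu"))
    )
finally:
    driver.quit()
```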

John Pourdanis

🤷🏼 Professional Nothing Knower | Quality Engineer in Software Industry


In my opinion, the problem starts when you use the browser (an external dependency that you can’t control) to automate tests at the highest level of the pyramid. A hybrid approach, using the API to bring your web app into the state you want before running your tiny web test, may solve the problem. Also, if you have too much logic in the UI (unless you need it there), that’s an architectural problem. In most cases the web UI should play the role of a consumer of the API, so API testing should cover all of your logic, with a minimum of web UI tests to simulate the real user experience and check that the integration between UI and API didn’t break. Finding the right balance in your test suite is the key against flakiness.

Nick B.

Test Automation Architect


In my experience it was never the tools that caused the flakiness but myself. Often it was using general wait conditions in situations that were more nuanced. There’s tremendous difficulty getting all of that right because of Ajax and the various conditions that are possible with such a fluid interface. The other problem is conflicting wait conditions, like using a default timeout alongside fluent waits, creating race conditions. It’s important to know that no tool can solve this problem for you, because each JavaScript library has its own peculiar way of doing things and nothing is consistent. It used to be worse, because at some point backend services generated all the UI code on the fly.

Matt Mayhew

Full Stack Tester and experienced Software Engineering Leader


This is an awesome article, thank you for spelling this out!

Sebastian Stautz

☯️🕊️☮️FIX THE SYSTEM!🛠️Testing, Ludo, Economy🧭👨👩👧👧🌐🤘🥋🏹🤺 (ENFP) #Contextdriven


Great article! I made this picture to show the difference between how humans and machines perceive GUIs. A GUI is not an interface made first and foremost for machines, and therefore machines will always have problems interacting with it.

Jani M.

DevOps Engineer & Partner @ NorthCode


Auto-waiting in Selenium isn’t that hard to implement, although the "biggest" obstacle I had was instrumenting the page. At the time there was no decent approach, but it’s still doable. Injecting code that monitors all the async events and animation events isn’t that big a piece of code, and with Selenium’s EventFiringWebElement I got things to a pretty decent stage and used it successfully for auto-waiting within Robot Framework’s SeleniumLibrary. PS. The article was well rounded, this was not an argument :)

