Flaky Tests in E2E Test Automation Clarified — Part A: Flaky Tests are Common but People tend to Neglect them
Before we can address flaky tests, we need to treat them honestly and seriously.
In this series:
Part A: Flaky Tests are Common but People tend to Neglect them
Part B: Test Suite Size Matters *
Part C: Causes 1-3
Part D: Causes 4-7 (upcoming)
Part E: Solutions 1-3 (upcoming)
Part F: Solutions 4-7* (upcoming)
Part G: FAQ (upcoming)
In 2018, I watched the best presentation on CI/CD I have ever seen, “Continuous Integration at Facebook”. The presenter listed three characteristics of CI at Facebook.
The first item, “High-Signal”, means that, ideally, automated test failures are mostly genuine, not false positives: test failures caused by something other than the application itself. A common cause of false positives is ‘flaky tests’.
In this article, I will focus on flaky tests in the most common form of functional testing: end-to-end web test automation.
Please note that Facebook listed “High-Signal” as the №1 issue in its CI/CD process. In my opinion, this ranking is correct for any real CI/CD.
When false positives are common, the team stops trusting the test results, the testing process, and the testing team. Without trust, there is really nothing left. That’s why we often see ‘fake automated testers’ sitting in a corner while the team pays no attention to their work; manual testing is what the team actually trusts. Many CI/CD failures come down to people naively assuming that automated tests should just pass.
A ‘flaky’ test, according to Google, is an automated test that “exhibits both a passing and a failing result with the same code.” (Google Testing Blog)
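To make Google’s definition concrete, here is a minimal sketch of what a flaky test looks like in practice. The names (`fetch_result`, `test_checkout_banner`) and the 10% failure rate are hypothetical, purely for illustration; the point is that the same code sometimes passes and sometimes fails.

```python
import random

def fetch_result():
    # Hypothetical backend call whose response occasionally arrives
    # too late or empty (simulated here with randomness).
    return "done" if random.random() < 0.9 else None

def test_checkout_banner():
    # A flaky check: nothing in the application changed, yet this
    # assertion passes on some runs and fails on others.
    assert fetch_result() == "done"

# Run the identical test 100 times against identical code:
failures = 0
for _ in range(100):
    try:
        test_checkout_banner()
    except AssertionError:
        failures += 1
print(f"{failures} failures out of 100 runs of unchanged code")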
I was happy to see this Facebook CI/CD slide because I have been saying the same thing for years. Yet, prior to 2018, I never heard any agile coach, Scrum Master, test automation architect, or DevOps engineer mention it. I was puzzled: ‘high signal’ (that is, highly stable test execution) is so obviously crucial to any CI/CD solution, so why did so few people talk about it?
Table of Contents:
· Flaky Tests are Common
∘ 16% of tests are flaky at Google
∘ 72% in a Tricentis Report
· What’s the flakiness rate in your automated test suite?
Flaky Tests are Common
Many software teams have no idea what their flaky-test rate is, because they gave up on end-to-end test automation (and turned back to manual testing) before ever measuring it. Only a small percentage ever reach a suite of even 50 tests, and those that do often find the suite too hard to maintain due to many issues, with flaky tests certainly the main one.
16% of tests are flaky at Google
“Almost 16% of our tests have some level of flakiness associated with them! This is a staggering number; it means that more than 1 in 7 of the tests written by our world-class engineers occasionally fail in a way not caused by changes to the code or tests.” — (Google Testing Blog, 2016)
This is the figure at arguably the top software company in the world.
72% in a Tricentis Report
In the article “As Test Automation Matures, So Do False Positives”, by Wolfgang Platz, “From the cradle to the grave, most test automation efforts are plagued by false positives: tests that fail even though the application is behaving correctly. This is a tremendous pain. Tricentis has found that a staggering 72 percent of test failures are actually false positives.”
What’s the flakiness rate in your automated test suite?
Now that you have seen “16% at Google” and “72%” in the Tricentis report, what is yours?
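One simple way to estimate your own rate is to rerun the same suite against the same commit several times and flag any test that both passed and failed. A minimal sketch, using a made-up run history (all test names and outcomes below are hypothetical):

```python
# Hypothetical history: test name -> outcomes across 5 CI runs
# of the *same* commit.
history = {
    "test_login":        ["pass", "pass", "pass", "pass", "pass"],
    "test_checkout":     ["pass", "fail", "pass", "pass", "fail"],  # flaky
    "test_search":       ["fail", "fail", "fail", "fail", "fail"],  # genuine failure
    "test_profile_edit": ["pass", "pass", "fail", "pass", "pass"],  # flaky
}

def flakiness_rate(history):
    # Per Google's definition: a test is flaky if the same code
    # produced both a passing and a failing result.
    flaky = [name for name, runs in history.items()
             if "pass" in runs and "fail" in runs]
    return len(flaky) / len(history), flaky

rate, flaky_tests = flakiness_rate(history)
print(f"Flaky tests: {flaky_tests} -> rate {rate:.0%}")
```

Note that a test failing on every rerun (like `test_search` above) is not flaky; it signals a genuine defect or a broken test.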
Before you answer, let me share a real story.
At one consulting job (in a large company, with over 500 IT staff), one test lead told me that the core product team achieved daily Continuous Testing, running 980 end-to-end (via UI) tests every night (and passing).
It turned out to be a complete lie. Over 70% of the tests failed every night, but under the guidance of the chief ‘agile coach’, the ‘DevOps’ team configured the CI server to report GREEN regardless!
Can you guess how the story ends? The test lead who exposed the lie (even though I had warned him not to) was forced to leave the company. Here is our exchange after the event.
In the next articles in this series, I will talk about:

Test suite size matters (coming)

For many test automation engineers, the challenges of handling flakiness are not apparent at first. This is often because their test suites are small, perhaps only a dozen tests. However, as the number of tests and test steps grows, the likelihood of flakiness rises quickly. Keeping the suite “GREEN” (all tests passing) becomes increasingly difficult, with the difficulty growing almost exponentially.

The causes of flaky tests (1-3, 4-7) (coming)

Mostly not the framework, but rather human issues (skills and mindset).

Solutions (coming)

Why the so-called ‘auto-wait’ and Cypress-style retry are the wrong approach.
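The suite-size point can be made concrete with a little probability. Under a deliberately simplified model (my assumption, not from the article: tests fail independently, each with the same reliability), a suite of n tests is all-green with probability p ** n, so even very reliable tests rarely produce a fully green run once the suite is large:

```python
# Simplified model (assumption: independent tests, equal reliability p):
# the probability that an n-test suite goes all-green is p ** n.
def green_probability(p: float, n: int) -> float:
    return p ** n

for n in (10, 50, 200, 980):  # 980 echoes the suite size in the story above
    print(f"{n:4d} tests, 99.5% reliable each -> "
          f"all-green on {green_probability(0.995, n):.1%} of runs")
```

With 99.5%-reliable tests, 10 tests go green about 95% of the time, but 980 tests go green well under 1% of the time, which is why small suites hide the problem and large ones expose it.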