Yesterday’s massive AWS outage once again highlights why test automation and sound test engineering practices are essential
Reflection on Quality Engineering
Yesterday’s global news headline was the AWS outage.
I noticed yesterday’s outage even before it hit the news. I was writing a new article on Substack when I got an unexpected error. I tried posting it on Medium—same problem. That’s when I realized it had to be a major infrastructure provider issue.
Prolonged outages like this are rarely caused by hardware faults—those are easy to identify and replace. They almost always come from software errors. And there are plenty of examples:
Did CrowdStrike (and Microsoft) Learn the Lesson? I Don’t Think So
Effective E2E Test Automation Could Have Saved the Sonos CEO’s Job
Amazon has done quite well—otherwise, AWS wouldn’t have grown to its current scale. Still, the public (and the countless businesses relying on its services) rightfully expect better.
It’s a good time to reflect on what truly ensures software quality.
Software quality depends on testing, and a solid test engineering process is only possible through automation.
Automation, in this context, has two key aspects:
Verification – deliberately checking for issues introduced by new changes.
Health Check Monitoring – detecting problems in the background by running checks on a regular schedule. For automation in software health checks, see my article, “My Simple Approach to App Health Check with Automation”.
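To make the health check idea concrete, here is a minimal sketch in Python using only the standard library. It is not the approach from my referenced article and not how any platform provider does it. The names (CheckResult, run_check, run_health_checks) and the example URLs are hypothetical, chosen only for illustration. The sketch assumes a check passes when an HTTP GET returns status 200, and that any exception (timeout, DNS failure, HTTP error) counts as unhealthy.

```python
import time
import urllib.request
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CheckResult:
    name: str
    healthy: bool
    detail: str
    elapsed_seconds: float


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code of a GET request to `url`."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def run_check(
    name: str,
    url: str,
    fetch_status: Callable[[str], int] = http_status,
    expected_status: int = 200,
) -> CheckResult:
    """Run one check; a failure is recorded as unhealthy, never raised."""
    started = time.monotonic()
    try:
        status = fetch_status(url)
        healthy = status == expected_status
        detail = f"HTTP {status}"
    except Exception as error:  # timeouts, DNS failures, HTTP errors, etc.
        healthy = False
        detail = f"{type(error).__name__}: {error}"
    return CheckResult(name, healthy, detail, time.monotonic() - started)


def run_health_checks(
    targets: Dict[str, str],
    fetch_status: Callable[[str], int] = http_status,
) -> List[CheckResult]:
    """Run every check in `targets` (name -> URL) and collect the results."""
    return [run_check(name, url, fetch_status) for name, url in targets.items()]


if __name__ == "__main__":
    # Hypothetical targets; replace with the pages that matter to your app.
    targets = {
        "home page": "https://example.com/",
        "sign-in page": "https://example.com/login",
    }
    for result in run_health_checks(targets):
        label = "OK" if result.healthy else "FAIL"
        print(f"[{label}] {result.name}: {result.detail} ({result.elapsed_seconds:.2f}s)")
```

A scheduler such as cron or a CI server would run this script at a fixed interval and raise an alert on any FAIL. The fetch function is passed in as a parameter so the checks themselves can be tested without touching the network.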
Once the two aspects above are implemented and consistently maintained, the next level is to add various edge-case scenarios to accelerate the disaster recovery process.
I am sure all platform providers have the above scheme; the question is how well it is implemented.
Human factors play a crucial role — we often overlook routine processes that prevent rare but serious dangers. Only a few CEOs, like those at JAL, consistently invest in disaster prevention efforts.
“Everyone onboard the Japanese airliner — crew members and passengers — followed the process for a textbook 90-second evacuation.”
“Miracle at Haneda: how cabin crew pulled off great escape from Japan plane fire” — The Guardian
“‘It was a miracle’: How passengers escaped a JAL fireball in Tokyo” — Reuters
If you think about it, shouldn’t every airline conduct regular training to prevent incidents like the one above? However, most airlines wouldn’t go as far as JAL did — such training is costly. That’s why Reuters called it a ‘miracle.’
Software testing is like insurance — when disasters don’t occur, some managers may question its value and ask, ‘What’s the point of doing this?’
To wrap this up, a real test automation engineer is extremely rare, even at tech giants.
“In my experience, great developers do not always make great testers, but great testers (who also have strong design skills) can make great developers. It’s a mindset and a passion. … They are gold”.
- Patrick Copeland, Google Senior Engineering Director, in an interview (2010)
“95% of the time, 95% of test engineers will write bad GUI automation just because it’s a very difficult thing to do correctly”.
- Microsoft Test Guru Alan Page, author of “How We Test Software at Microsoft”, in an interview (2015)
“Testing is harder than developing. If you want to have good testing you need to put your best people in testing.”
- Gerald Weinberg, software legend, in a podcast (2018)
Besides the mindset issue (software teams often favor developers far more than testers), software test automation courses are rarely offered at universities. This specialized knowledge can only be gained through self-learning, professional training, or coaching. Only highly motivated individuals who are willing to invest in their professional growth can truly master it. For those who do, the rewards are substantial. I’m a living example: today, as a solopreneur, I owe many of my accomplishments to test automation.
Related reading:
My new book: “End-to-end Test Automation Anti-Patterns”