Laws of Software Development: Murphy’s Law in Software Testing

Anything that can go wrong will go wrong

Aug 11, 2024

This article is part of the “Laws of Software Development” series:

80/20 Rule
Broken Window Theory
Parkinson’s Law: “work expands so as to fill the time available for its completion”
Sturgeon’s law: “ninety percent of everything is crap”
Murphy’s law: “Anything that can go wrong will go wrong”
The 10,000-Hour Rule: “The key to achieving true expertise is simply a matter of practicing”
Brooks’ Law: “Adding manpower to a late project makes it later.”
Hofstadter’s Law: “It always takes longer than you expect, even when you take into account Hofstadter’s Law.”
Conway’s Law: “Any piece of software reflects the organizational structure that produced it.”
…

Table of Contents
· Murphy’s law
∘ 1. IT executives often neglect the importance of quality.
∘ 2. Lacking E2E Test Automation
∘ 3. Lacking a real Continuous Testing process.
∘ 4. Need to understand the limits of Manual Testing
· My own experience with Murphy’s Law

Murphy’s law

Murphy’s law is an adage or epigram that is typically stated as: “Anything that can go wrong will go wrong.” [source: Wikipedia]

Software testers like to quote Murphy’s Law to support their work.

On 2024–07–19, Windows Blue Screen of Death errors caused a massive global outage. More detail in this video.

Some people still call July 19th “International Blue Screen Day”.

https://twitter.com/dwisiswant0/status/1814269588605170115

According to Microsoft, the outage was “due to a recent CrowdStrike Update”.

“Crowdstrike is a cybersecurity company valued at $80B, and the market leader for Windows endpoint protection, with around 22% market share. So 1 out of 5 businesses operating Windows machines use them.
What seems to have happened is that Crowdstrike pushed an innocent-enough software update… to all Windows machines, globally, pretty much at the same time. Crowdstrike’s software operates at the kernel level: and this update crashes Windows.
Normally, when buggy code is pushed to production: you’d “just” revert this change, and push the previous version (or code that works correctly,) and when clients get this patch: they are restored. But not in this case: because these machines are non-functional.
The fix — as advised by Crowdstrike — is manual and time consuming, and needs to be repeated for every single Windows machine impacted. The machine needs to be booted to safe mode, a file needs to be deleted, then the machine rebooted.”

- Explained by Gergely Orosz

Either way, this should NOT have happened, to either CrowdStrike or Microsoft. This is a classic example of Murphy’s Law.

This article does not intend to mock Microsoft or CrowdStrike; instead, it explores the lessons we can learn and how to prevent similar incidents in the future.

Ideally, a software update should undergo a round of end-to-end regression testing before release. CrowdStrike, a large cybersecurity software company valued at $80 billion, surely understands this. So why did it fail so badly?

Update (2024–07–24): “CrowdStrike pledges better tests after IT outage”.

https://www-bbc-com.cdn.ampproject.org/c/s/www.bbc.com/news/articles/ce58p0048r0o.amp
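
The idea that an update should pass end-to-end regression testing before it ships can be pictured as a simple release gate. The following is a minimal Python sketch, not a description of CrowdStrike's actual process: the two check functions and their names are hypothetical stand-ins for real end-to-end tests, and the second one simulates a crash on purpose.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CheckResult:
    name: str
    passed: bool


def run_regression_suite(checks: List[Callable[[], bool]]) -> List[CheckResult]:
    """Run every check; a check that raises an exception counts as a failure."""
    results = []
    for check in checks:
        try:
            passed = bool(check())
        except Exception:
            passed = False
        results.append(CheckResult(check.__name__, passed))
    return results


def can_release(results: List[CheckResult]) -> bool:
    """Allow the release only if at least one check ran and all of them passed."""
    return bool(results) and all(r.passed for r in results)


# Hypothetical end-to-end checks, for illustration only.
def machine_boots_after_update() -> bool:
    return True


def agent_loads_new_update_file() -> bool:
    # Assumption: simulates the update crashing the software under test.
    raise RuntimeError("simulated crash while loading the update")


if __name__ == "__main__":
    results = run_regression_suite(
        [machine_boots_after_update, agent_loads_new_update_file]
    )
    for r in results:
        print(f"{r.name}: {'PASS' if r.passed else 'FAIL'}")
    print("release allowed:", can_release(results))
```

The design choice worth noting is that an empty suite does not count as a pass: if no end-to-end checks ran, the gate stays closed.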

1. IT executives often neglect the importance of quality.

Quality assurance (i.e., software testing in the IT industry) has always been treated as second-class compared to coding and is often neglected.

See also: A Good Software Testing Culture Can be Easily Broken, Lessons learned from Google’s “$100 billion error” on 2023–02–09.

More examples of this: software companies rarely

· provide test automation training to staff (instead, they pay for so-called agile training by fake agile coaches);
· seek external professional help, e.g. a test automation coach or mentor (most test automation attempts led by senior or principal software engineers fail repeatedly and are then abandoned).

2. Lacking E2E Test Automation

The CrowdStrike incident is another example of “Unit and Integration Testing is not enough”.

The heading of World Quality Report 2018–2019, https://www.capgemini.com/au-en/service/world-quality-report-2018-19/

Fake testers have been promoting ways to avoid or undermine real E2E UI testing, such as:
