The Agile Way

The Agile Way

Share this post

The Agile Way
The Agile Way
Did CrowdStrike Learn the Lesson? I Don’t Think So

Did CrowdStrike Learn the Lesson? I Don’t Think So

I did not see the real engineering solution in CrowdStrike’s Preliminary Post Incident Review, besides pledges better testing.

Zhimin Zhan's avatar
Zhimin Zhan
Aug 22, 2024
∙ Paid
2

Share this post

The Agile Way
The Agile Way
Did CrowdStrike Learn the Lesson? I Don’t Think So
Share

Five days after causing the horrific global IT outage, CrowdStrike released a detailed review of the incident on 2024–07–24.

“The sensor release process begins with automated testing, both prior to and after merging into our code base. This includes unit testing, integration testing, performance testing and stress testing.”

In this “How Do We Prevent This From Happening Again?” section:

  • Improve Rapid Response Content testing by using testing types such as:

  • Local developer testing

  • Content update and rollback testing

  • Stress testing, fuzzing and fault injection

  • Stability testing

  • Content interface testing

Wow, there are a lot of testing terms there, even with a fancy name: “Rapid Response Content testing”. To non-IT people, it seems that this company is getting serious about testing from now on.

I don’t think CrowdStrike fully learned a lesson well from this review document. Let’s revisit the basics.

What happened from the customers’ point of view?

Millions of users found their Windows computers got the ‘blue screen of death” at the same time.

We are referring to tens of millions of computers, not just isolated incidents like a specific user’s computer with tricky software installed. Essentially, it failed End-User Testing and was easily detectable during that phase, not about unit testing, performance testing, integration testing, or stress testing, as repeatedly mentioned in CrowdStrike’s review document.

https://www.reddit.com/r/ProgrammerHumor/comments/1aiq7tw/therealtester/, with 7.6K likes

If we get a bit more technical (most computer users will still understand), this refers to regression testing, which tests a new release shouldn’t break existing features.

If CrowdStrike senior engineers have a solid understanding of CI/CD and End-Test Automation, particularly in the context of regression testing, its review document will highlight those. But it did not.

The Solution, a Proper Engineering Way

Keep reading with a 7-day free trial

Subscribe to The Agile Way to keep reading this post and get 7 days of free access to the full post archives.

Already a paid subscriber? Sign in
© 2025 Zhimin Zhan
Privacy ∙ Terms ∙ Collection notice
Start writingGet the app
Substack is the home for great culture

Share