Recommend a Great CI/DevOps Presentation: “Continuous Integration at Facebook”
See what a real Continuous Integration is like.
I started practising Continuous Integration in late 2005, using CruiseControl (CC for short), the so-called “Grand-daddy” of CI servers. The most challenging part was running automated UI tests; it took me a couple of years to master it.
I had to hack into the CruiseControl code (Java) and implement some features myself. If you are interested, read ‘My Continuous Testing Journey’.
Since then, I have attended a number of CI presentations at conferences/meetups. Frankly, none was good (check out CI/CD Clarified for the reasons why). They did not address the real challenges of CI at all, let alone offer practical solutions.
On 20/06/2015, I found an excellent presentation (YouTube) on how Facebook does CI, given at F8 (Facebook Developer Conference) by Katie Coons, a software engineer in Product Stability (a good division name) at Facebook. It was quite short, only 15 minutes, but packed with wisdom. I have watched it many times, and it is still valid. It remains, in my opinion, the best presentation on CI/CD and DevOps. I was very surprised by the low number of views and likes on YouTube.
Katie started the presentation with
“№1 goal of CI at Facebook is developer efficiency”
“We want computers waiting on humans.”
Then the three key goals of CI at Facebook:
High-Signal
“We don’t want our developers to chase down the failures that are not their fault, that’s a waste of developers’ time”
Rapid
“The system must provide rapid feedback to developers because our developers will not wait for it. Moving fast is a really important part of Facebook’s culture”
Frequent
“The system must provide frequent feedback to developers. Because the sooner we let them know about a problem, the easier and faster it is for them to fix it”
From my experience, I concur. Ironically, all the other CI presentations I have seen missed these key goals, with the possible exception of Rapid, and even then they were often vague. I like that Katie gave a specific target time for a CI build: 10 minutes.
“10-Minute Build” is an Extreme Programming Practice.
Here, I share my understanding of these 3 CI goals:
High-Signal
Reduce false alarms, i.e., test failures flagged by the build that are not real failures. If the team does not trust the results of automated test executions, everything goes out of the window, right? However, we frequently see projects executing automated tests every night in Jenkins with a mere ~40% pass rate. Why bother?
Rapid
The feedback from the CI build must be very quick. As we all know, the classic software engineering rule is: the earlier a bug is found, the cheaper and quicker it is to fix. However, it is easier said than done. Few companies have a proper parallel testing lab set up to make executing automated functional tests faster.
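To illustrate why a parallel testing lab matters for build time, here is a minimal sketch. It is not how Facebook (or any particular CI server) schedules tests; the test file names, their durations, and the `distribute` helper are all hypothetical. It simply hands each test file, longest first, to whichever build agent currently has the least work, then compares the sequential time with the parallel wall-clock time.

```python
import heapq


def distribute(durations, agent_count):
    """Greedy scheduling: longest test first, onto the least-loaded agent.

    `durations` maps a (hypothetical) test file name to its run time in seconds.
    Returns the per-agent assignments and the resulting wall-clock time.
    """
    # Heap entries are (seconds_already_assigned, agent_index).
    heap = [(0.0, i) for i in range(agent_count)]
    heapq.heapify(heap)
    assignments = [[] for _ in range(agent_count)]

    longest_first = sorted(durations.items(), key=lambda kv: kv[1], reverse=True)
    for name, seconds in longest_first:
        load, idx = heapq.heappop(heap)  # least-loaded agent so far
        assignments[idx].append(name)
        heapq.heappush(heap, (load + seconds, idx))

    # The build finishes when the busiest agent finishes.
    wall_clock = max(load for load, _ in heap)
    return assignments, wall_clock


# Hypothetical durations (in seconds) for six end-to-end test files.
durations = {
    "login_spec": 120,
    "checkout_spec": 300,
    "search_spec": 90,
    "profile_spec": 150,
    "admin_spec": 240,
    "signup_spec": 60,
}

sequential = sum(durations.values())
assignments, parallel = distribute(durations, agent_count=3)

print(f"sequential: {sequential}s, parallel on 3 agents: {parallel:.0f}s")
for idx, tests in enumerate(assignments):
    print(f"agent {idx}: {tests}")
```

With these made-up numbers, three agents cut the run from 16 minutes to 5.5 minutes. Real suites also need test isolation (separate data, browsers and sessions per agent), which this sketch ignores.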
Once, a manager (of another division) wanted me to review their CI implementation (over the phone). She said: “Our DevOps guys spent a lot of effort, the build runs for 10 hours …”. I replied: “It has already failed. Your DevOps guys had no idea how to reduce the build time”. The result was exactly as I predicted: the whole thing was discarded shortly after.
Frequent
It should be easy to trigger a run of the CI build. There is no point in a total execution time of 10 minutes if preparing each build takes 2 hours.
On developer’s involvement in automated testing and CI:
“It is really important at Facebook I will never, as a developer, write code and toss over the fence and for someone else to write test for it. It is my job as a developer at Facebook to write tests for my code, and my reviewers make sure I do”.
💡Show the above to your programmers who don’t want to write automated tests. The fact: most programmers don’t have the capability to develop automated unit tests, let alone Automated End-to-End UI Testing which is much more challenging.
On WebDriver:
“For all of our end-to-end tests at Facebook we use WebDriver. WebDriver is an open-source JSON wire protocol, I encourage you all check it out if you haven’t already.”
“One of the great advantages of WebDriver is that it gets applications across all platforms.”
💡Show the above section to your team members who want to create ‘own test framework’ or use other commercial/proprietary ones. If you are interested in why Selenium WebDriver is the best choice for test automation, please read my other articles:
“Please, not another Web Test Automation Framework, just use Selenium WebDriver”
“Selenium WebDriver is the Easiest-to-Learn Web Test Automation Framework”
“Advice on Self-Learning Test Automation with Selenium WebDriver”
On Continuous Testing Server:
Facebook uses its own server, Sandcastle. Of course, it is not Jenkins/Bamboo/TeamCity, which are traditional CI servers. Have you ever seen Level 2 of AgileWay CT Grading (50+ E2E tests) implemented in Jenkins? I mean reliable test executions with quick feedback. I haven’t.
“> 1000 test results per second every single day”
On addressing flaky automated tests:
“Failures inevitable at Scale”
“Infrastructure failures get retried”
“You would be surprised how many tests cannot run in parallel with themselves”
Please note the test execution retries here. From the context, the retries were handled by the Sandcastle CI/CT server, NOT inside the test scripts or test framework, as Cypress does (bad). If you are interested in this topic, please read my other article: “Why Auto-Retry of Test Execution in a Test Framework is Wrong?”
On testing infrastructure:
“At Facebook, we have some of our top engineers working on development infrastructure”
And this impressive Facebook build farm.
(the above is pretty much what I got at the moment)
Wow!
That’s why I gave Facebook Level 6 on AgileWay CT Grading (the highest level, reached by only a handful of organisations in the world).