Move fast and break things

It’s one of the principles that has guided Facebook’s development process since its earliest days. In five short words it encapsulates a philosophy of rapid development, constant iteration, and the courage to leave the past behind. Of course, some might wonder why you couldn’t just stop at the “move fast” part. The truth is that breaking things is unavoidable.

Even disregarding features that “work” but need to be broken in order to continue innovating (for example, how the profile has changed, often dramatically, over the years), Facebook is a social product connecting over a billion people across the globe. It simply isn’t possible to simulate the unique strains that this level of activity creates. More importantly, it’s generally impossible to understand how people will use a feature or react to a change until after it has been implemented and pushed out into production (though generally first to a smaller subset of users). A billion people will pretty quickly try every possible way to interact with your code, so features will be used in ways you never expected, and sometimes things will break in ways that you didn’t anticipate. Because you can’t get that level of feedback until things reach production, moving fast is inextricably tied to the process of deployment.

It’s hard to imagine now, but back in early 2004, Facebook (or rather, TheFacebook, as it was called then) was just a small college social network available only to Harvard students. In those earliest days, “deployment” just meant pushing a new version of the dozen or so PHP files that comprised Facebook up to a single Apache server. There wasn’t a clear line between development and deployment, and there was certainly no formal release process. As we launched at more and more colleges, the site quickly expanded to dozens of servers, and then to our first colocation center, where we racked all the servers ourselves in one epic all-night session.

As Facebook grew, the deployment process quickly became more structured, which led us to realize a necessary corollary of the “move fast and break things” philosophy: trusting and empowering engineers. This discovery came when Facebook hired the first members of its Ops team. When the new Ops team arrived, they immediately wanted to change the way Facebook handled deployment. Facebook was less than a year old, and at that time, any engineer could push code live to production at any time (once the changes had been checked by other engineers, of course). The new Ops team wanted to change this process by creating a staging environment that would be a necessary stopping point before any code touched production. Once there, each release would be thoroughly tested by a QA team to make sure that nothing would break when it was pushed live.

Adam D’Angelo, who would later become Facebook’s CTO, led the charge in resisting this change. He argued not only that this was impossible (due to the unique characteristics of Facebook mentioned above), but also that it would dramatically slow down Facebook’s speed of development. In the end, Mark agreed with Adam, and the conversation changed to how we could build tools to support this kind of rapid development.

As Facebook continued to scale, and as the number of servers grew into the hundreds, it soon became impractical to push new versions of the code live at a moment’s notice. This necessitated weekly release cycles that happened during low points in the site’s usage. The weekly cycle created an opportunity to bring the idea of staging back, albeit in a modified form. Changes to the code were first pushed internally to the version of Facebook that employees used, effectively making the whole company part of the QA team. Engineers still had broad latitude to decide when their code was production-ready, but there was a window in which changes could see significant usage internally before they reached the outside world. This also allowed the team to get feedback from a larger group for features that were still in flux.

While deployment has become more advanced and is no longer closely tied to weekly cycles, for the most part this is still the system in place today, and the drive of the Ops team continues to be towards making it faster for engineers to release their code (though with some important safeguards). It’s always easier to see the right answers in hindsight, but Facebook’s speed of technical advancement turned out to be one of its greatest assets in the battle with its early rivals.

The good news is that with all of the cloud platforms out there today, it’s easier to deploy and scale web services than ever before. This ease is leading many startups to forgo a dedicated Ops team for longer, leaving the duties of deployment and server management to developers. Opbeat is a powerful new tool in this burgeoning DevOps movement. It allows developers to move faster and spend less time fixing things when they break. This helps teams that don’t have the resources of Facebook to adopt some of the goals of rapid development and empowering engineers.

About the author
Andrew McCollum is a co-founder of Facebook and an investor in Opbeat.