How to deploy your web service without crashing the cell network

Hello great person,

This week's paper came to me through the interwebs. It covers how Facebook ships hundreds of releases to its services all around the globe. TBH, the most interesting part for me was the section on techniques that smaller companies can get away with, but that at Facebook's scale would crash the telecommunication infrastructure (by forcing billions of users to reconnect at the same time). I also learned about some nice Linux kernel features from the paper. Did you know you can hand over sockets without killing them?
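If you want to try the socket handover yourself, here is a minimal sketch of the general idea, not Facebook's actual implementation. It assumes the handover happens by passing a file descriptor over a Unix domain socket (SCM_RIGHTS), which Python 3.9+ on Linux exposes as `socket.send_fds` and `socket.recv_fds`. The function names and the single-process `demo` are my own illustration; in a real deployment the two ends would live in the old and the new server process.

```python
import socket


def hand_over_listener(listener, channel):
    """Old process: send the listening socket's file descriptor over a Unix socket."""
    socket.send_fds(channel, [b"listener"], [listener.fileno()])


def take_over_listener(channel):
    """New process: receive the descriptor and wrap it in a socket object."""
    _msg, fds, _flags, _addr = socket.recv_fds(channel, 1024, 1)
    return socket.socket(fileno=fds[0])  # family and type are detected from the fd


def demo():
    # Stand-in for the control channel between the old and the new process.
    old_side, new_side = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(8)
    port = listener.getsockname()[1]

    # A client connects before the handover and waits in the accept backlog.
    client = socket.create_connection(("127.0.0.1", port))

    hand_over_listener(listener, old_side)
    taken = take_over_listener(new_side)
    listener.close()  # the old copy goes away, the kernel socket stays open

    # The new owner accepts the connection that was queued before the handover.
    conn, _peer = taken.accept()
    client.sendall(b"ping")
    data = conn.recv(4)
    same_port = taken.getsockname()[1] == port

    for s in (conn, client, taken, old_side, new_side):
        s.close()
    return data, same_port


if __name__ == "__main__":
    print(demo())
```

The point of the exercise: the listening socket is never closed from the kernel's point of view, so the waiting client does not have to reconnect.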


Abstract:

Modern network infrastructure has evolved into a complex organism to satisfy the performance and availability requirements for the billions of users. Frequent releases such as code upgrades, bug fixes and security updates have become a norm. Millions of globally distributed infrastructure components including servers and load-balancers are restarted frequently from multiple times per-day to per-week. However, every release brings possibilities of disruptions as it can result in reduced cluster capacity, disturb intricate interaction of the components operating at large scales and disrupt the end-users by terminating their connections. The challenge is further complicated by the scale and heterogeneity of supported services and protocols. In this paper, we leverage different components of the end-to-end networking infrastructure to prevent or mask any disruptions in face of releases. Zero Downtime Release is a collection of mechanisms used at Facebook to shield the end-users from any disruptions, preserve the cluster capacity and robustness of the infrastructure when updates are released globally. Our evaluation shows that these mechanisms prevent any significant cluster capacity degradation when a considerable number of productions servers and proxies are restarted and minimizes the disruption for different services (notably TCP, HTTP and publish/subscribe).

Download Link:

https://research.fb.com/wp-content/uploads/2020/12/Zero-Downtime-Release-Disruption-free-Load-Balancing-of-a-Multi-Billion-User-Website.pdf (yes, they use WordPress for that site)


It would be awesome if you could help grow our little paper community even more by sharing it with your circles (you can also @eu_frey me on Twitter for retweets :D):

simon-frey.com/weeklycspaper

If you have any paper recommendations for me, please do not hesitate to reach out via [email protected] (please keep the Backend & DevOps topic focus in mind).


With much love,

Simon Frey