How to deploy your web service without crashing the cell network

Hello great person,

This week's paper came to me through the interwebs. It covers how Facebook ships hundreds of releases to its services all around the globe. TBH, the most interesting part for me was the section on techniques that work for smaller companies but that, at Facebook's scale, would crash the telecommunication infrastructure (by forcing billions of users to reconnect at the same time). I also learned about some nice features of the Linux kernel from the paper. Did you know you can hand over sockets without killing them?
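If you are wondering what handing over a socket can look like in practice, here is a minimal sketch. It assumes the handover is done by passing file descriptors over a Unix domain socket (SCM_RIGHTS), which is the general Linux facility for this; it is not Facebook's actual code. The function names are made up for illustration, both "processes" live in one Python process to keep the demo short, and it needs Python 3.9+ on a Unix-like system.

```python
import socket


def hand_over_listener(channel: socket.socket, listener: socket.socket) -> None:
    """Old process: send the listening socket's descriptor over a Unix socket."""
    # send_fds wraps sendmsg() with an SCM_RIGHTS control message.
    socket.send_fds(channel, [b"listener"], [listener.fileno()])


def take_over_listener(channel: socket.socket) -> socket.socket:
    """New process: receive the descriptor and wrap it in a socket object."""
    _msg, fds, _flags, _addr = socket.recv_fds(channel, 1024, 1)
    return socket.socket(fileno=fds[0])


def demo() -> bytes:
    # A socketpair stands in for the Unix domain socket that would
    # connect the old and the new server process.
    old_end, new_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

    # The "old" server owns a listening TCP socket on a free local port.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen()
    port = listener.getsockname()[1]

    # Hand the descriptor over, then let the old server drop its copy.
    hand_over_listener(old_end, listener)
    inherited = take_over_listener(new_end)
    listener.close()  # the kernel socket stays open via the inherited descriptor

    # A client can still connect to the same port without ever seeing a gap.
    client = socket.create_connection(("127.0.0.1", port))
    conn, _peer = inherited.accept()
    client.sendall(b"still reachable")
    data = conn.recv(1024)

    for s in (client, conn, inherited, old_end, new_end):
        s.close()
    return data


if __name__ == "__main__":
    print(demo())
```

The point of the sketch: the port never stops listening, because the kernel keeps the socket alive as long as at least one process holds a descriptor for it.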

Software exists to create business value

I am Simon Frey, the author of the Weekly CS Paper Newsletter. And I have great news: you can work with me.

As a CTO as a Service, I will help you choose the right technology for your company, build up your team, and be a deeply technical sparring partner for your product development strategy.

Check out my website simon-frey.com to learn more, or contact me directly via the button below.

Let’s work together!

Abstract:

Modern network infrastructure has evolved into a complex organism
to satisfy the performance and availability requirements for the billions of users. Frequent releases such as code upgrades, bug fixes and
security updates have become a norm. Millions of globally distributed
infrastructure components including servers and load-balancers
are restarted frequently from multiple times per-day to per-week.
However, every release brings possibilities of disruptions as it can
result in reduced cluster capacity, disturb intricate interaction of the
components operating at large scales and disrupt the end-users by
terminating their connections. The challenge is further complicated
by the scale and heterogeneity of supported services and protocols.
In this paper, we leverage different components of the end-to-end networking infrastructure to prevent or mask any disruptions
in face of releases. Zero Downtime Release is a collection of mechanisms used at Facebook to shield the end-users from any disruptions,
preserve the cluster capacity and robustness of the infrastructure
when updates are released globally. Our evaluation shows that these
mechanisms prevent any significant cluster capacity degradation
when a considerable number of productions servers and proxies are
restarted and minimizes the disruption for different services (notably
TCP, HTTP and publish/subscribe).

Download Link:

https://research.fb.com/wp-content/uploads/2020/12/Zero-Downtime-Release-Disruption-free-Load-Balancing-of-a-Multi-Billion-User-Website.pdf (yes, they use WordPress for that site)

Weekly in-depth computer science knowledge to become a better programmer. For free!
Over 2000 subscribers. One-click unsubscribe.