How open source introduces risk to your software stack and enterprise

Hello awesome person,

As a producer and massive consumer of open source software in both my private and professional life, I always wonder how this whole system still works. We build our businesses and larger software projects on a massive amount of free labor. Especially in my professional context I consider this a high risk, as dependencies are too often introduced into code bases that were never intended for professional use (looking at you, npm “is-even” package). The code quality can be low, and that is totally fine for the authors to do (it is their free time they devote here), but users of the software should be aware of the risks involved. This week's paper proposes a framework to identify open source software that might be at especially high risk. In my day-to-day work I use a less scientific approach (e.g. GitHub stars and active development on a project), but it is always nice to learn how others approach this problem.

Software exists to create business value

I am Simon Frey, the author of the Weekly CS Paper Newsletter, and I have great news: you can work with me.

As CTO as a Service, I will help you choose the right technology for your company, build up your team and be a deeply technical sparring partner for your product development strategy.

Check out my website simon-frey.com to learn more or contact me directly. Let’s work together!

Abstract:

The widespread adoption of Free/Libre and Open Source Software (FLOSS) means that the ongoing maintenance of many widely used software components relies on the collaborative effort of volunteers who set their own priorities and choose their own tasks. We argue that this has created a new form of risk that we call ‘underproduction’ which occurs when the supply of software engineering labor becomes out of alignment with the demand of people who rely on the software produced. We present a conceptual framework for identifying relative underproduction in software as well as a statistical method for applying our framework to a comprehensive dataset from the Debian GNU/Linux distribution that includes 21,902 source packages and the full history of 461,656 bugs. We draw on this application to present two experiments: (1) a demonstration of how our technique can be used to identify at-risk software packages in a large FLOSS repository and (2) a validation of these results using an alternate indicator of package risk. Our analysis demonstrates both the utility of our approach and reveals the existence of widespread underproduction in a range of widely-installed software components in Debian.
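To make the idea of relative underproduction a bit more tangible, here is a deliberately simplified sketch. It is not the statistical method from the paper; it only borrows the idea of comparing, across a set of packages, how much a package is relied upon with how well its maintenance keeps up. As assumptions for illustration, it uses install counts as the demand signal and median bug-fix time as the quality signal, turns both into percentile ranks, and scores each package by demand rank minus quality rank. The `Package` type, its fields, the function names, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Package:
    name: str
    installs: int           # assumed demand proxy: how many people rely on it
    median_fix_days: float  # assumed quality proxy: lower means bugs get fixed faster


def percentile_ranks(values):
    """Map each value to a rank in (0, 1], where 1.0 is the largest value.

    Ties are broken by input order, because Python's sort is stable.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    for position, index in enumerate(order):
        ranks[index] = (position + 1) / n
    return ranks


def underproduction_scores(packages):
    """Hypothetical score: demand rank minus quality rank.

    A positive score means a package ranks higher in use than in
    maintenance responsiveness relative to the other packages.
    """
    demand = percentile_ranks([p.installs for p in packages])
    # Negate fix time so that faster bug resolution yields a higher quality rank.
    quality = percentile_ranks([-p.median_fix_days for p in packages])
    return {p.name: d - q for p, d, q in zip(packages, demand, quality)}


if __name__ == "__main__":
    # Invented example data, not taken from the paper's Debian dataset.
    packages = [
        Package("libfoo", installs=90_000, median_fix_days=400.0),
        Package("libbar", installs=85_000, median_fix_days=5.0),
        Package("toolbaz", installs=300, median_fix_days=20.0),
    ]
    scores = underproduction_scores(packages)
    for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name}: {score:+.2f}")
```

In this toy data, `libfoo` is the most installed package but has the slowest bug fixes, so it is the one flagged with a positive score. Choosing the demand and quality proxies, handling ties, and accounting for uncertainty are exactly the parts the paper treats far more carefully than this rank comparison does.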

Download Link:

https://arxiv.org/pdf/2103.00352.pdf