Presentation by Tom Mens of joint work with Alexandre Decan (University of Mons) at the SATTOSE 2017 research seminar in Madrid (7 June 2017).
Abstract: We carry out a quantitative empirical comparison of the macro-level evolution of software packaging ecosystems for a multitude of different programming languages. We report on the most important observed differences and commonalities in the evolution of their package dependency networks. We hypothesise that the observed commonalities emerge due to the ecosystem scale and complexity. Inspired by Lehman’s laws of software evolution, we seek evidence for a series of empirically observable “laws of software ecosystem evolution”.
Towards Laws of Software Ecosystem Evolution: An Empirical Comparison of Seven Software Packaging Ecosystems
1. An Empirical Comparison of Seven Package Dependency Networks
NuGet, npm, Cargo, CRAN, CPAN, Packagist, RubyGems
Towards Laws of Software Ecosystem Evolution
Tom Mens and Alexandre Decan
COMPLEXYS Research Institute
University of Mons, Belgium
3. Software Ecosystems
Large and coherent collections of software components that are maintained by large and geographically distributed online communities.
6. Package Dependency Networks
Extracted using the open-source discovery service http://libraries.io (CC BY-SA 4.0)
Name       Created  Language    Packages  Dependencies
Cargo      2014     Rust        9k        150k
CPAN       1995     Perl        34k       1,078k
CRAN       1997     R           12k       164k
npm        2010     JavaScript  462k      1,369k
NuGet      2010     .NET        84k       1,665k
Packagist  2012     PHP         97k       1,863k
RubyGems   2004     Ruby        132k      1,894k
7. Laws of Software Evolution
Empirically observed by M. Lehman for large proprietary software systems
Continuing Growth
Continuing Change
Increasing Complexity
[ … ]
Do they also hold for software ecosystems?
Lehman M.M. and Belady L.A., 1985. Program Evolution – Processes of Software Change.
Free download from http://informatique.umons.ac.be/genlog/BeladyLehman1985-ProgramEvolution.pdf
10. Evolution of number of package updates per month
Continuing Change
Fastest growth for npm, NuGet, Packagist
11. Package releases get updated often
Survival probability of a package release
Continuing Change
The probability that a package release is updated within 2 months exceeds 50%.
For CRAN: within 6 months.
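The survival probability on this slide can be estimated with a standard survival-analysis estimator. The following is a minimal Python sketch that assumes the Kaplan-Meier estimator, since the slide does not name the method. Each release is observed for some number of days. It is either updated (an event) or still the latest release when observation ends (right-censored). The function names and durations are hypothetical illustrations, not data from the study.

```python
from collections import Counter

def kaplan_meier(durations, updated):
    """Estimate S(t) = P(a release is still not updated after t days).

    durations: observation time in days for each release.
    updated:   True if the release was superseded by a newer release (event),
               False if it was still the latest one when observation ended
               (right-censored).
    Returns (t, S(t)) pairs at each time where at least one update occurred.
    """
    events = Counter(d for d, u in zip(durations, updated) if u)
    leaving = Counter(durations)  # releases that leave the risk set at time t
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        if events[t]:
            survival *= 1 - events[t] / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve

def median_time_to_update(curve):
    """First time at which the estimated survival drops to 50% or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached within the observed period

# Hypothetical observations for eight releases (not data from the study).
durations = [10, 25, 40, 40, 70, 90, 120, 200]
updated = [True, True, True, False, True, True, False, True]
curve = kaplan_meier(durations, updated)
print(curve)
print(median_time_to_update(curve))  # 70
```

On these made-up data, the estimated survival first drops to 50% or below at 70 days. Under this reading, the slide's claim corresponds to an estimated survival below 0.5 at roughly 60 days (2 months), or roughly 180 days for CRAN.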
12. Younger packages get updated more often …
Continuing Change
Over 50% of updates are for packages up to 6 months old (up to 3 months old in one ecosystem) …
… except for older ecosystems, where over 50% of updates are for packages over 2 years old
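The proportions on this slide can be reproduced in two steps. First, compute the age of the updated package at the time of each update. Then take cumulative shares of those ages. The following is a minimal sketch. The day-based cut-offs for "3 months", "6 months" and "2 years", the helper names and the dates are all assumptions for illustration.

```python
from datetime import date

# Assumed cut-offs: "3 months" ~ 90 days, "6 months" ~ 182 days, "2 years" ~ 730 days.
THREE_MONTHS, SIX_MONTHS, TWO_YEARS = 90, 182, 730

def package_age_at_update(created, updated):
    """Age of the package (in days) when one of its updates was released."""
    return (updated - created).days

def share_up_to(ages, max_days):
    """Fraction of updates that concern packages at most `max_days` old."""
    return sum(1 for a in ages if a <= max_days) / len(ages)

def share_older_than(ages, min_days):
    """Fraction of updates that concern packages strictly older than `min_days`."""
    return sum(1 for a in ages if a > min_days) / len(ages)

# Hypothetical (package creation date, update date) pairs for updates released in 2016.
updates = [
    (date(2016, 1, 10), date(2016, 2, 1)),   # 22 days old
    (date(2016, 1, 10), date(2016, 3, 15)),  # 65 days old
    (date(2015, 12, 1), date(2016, 4, 1)),   # 122 days old
    (date(2013, 5, 1), date(2016, 6, 1)),    # over 2 years old
    (date(2016, 6, 1), date(2016, 6, 20)),   # 19 days old
]
ages = [package_age_at_update(c, u) for c, u in updates]
print(share_up_to(ages, THREE_MONTHS))    # 0.6
print(share_up_to(ages, SIX_MONTHS))      # 0.8
print(share_older_than(ages, TWO_YEARS))  # 0.2
```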
13. Complexity caused by
– high proportion of dependent packages
Ecosystem Complexity
“I had one case where my package heavily depended on another package and after a while that package was removed from CRAN and stopped being maintained. So I had to remove one of the main features of my package. Now I try to minimize dependencies on packages that are not maintained by ‘established’ maintainers or by me.”
15. Most of the complexity is hidden …
Ecosystem Complexity
16. Most of the complexity is hidden …
… in the transitive dependencies
Ecosystem Complexity
17. Complexity increases over time for some ecosystems (npm, NuGet, Cargo)
Evolution of the ratio between the number of transitive and the number of direct dependencies
Increasing Complexity
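The ratio on this slide can be computed from a dependency graph. For each package, compare the set of packages reachable through dependency edges with its direct dependencies. The following is a minimal sketch that makes three assumptions. First, the graph is a dict mapping each package to its direct dependencies. Second, the transitive set includes the direct dependencies. Third, counts are summed over the whole ecosystem rather than averaged per package. All names and the toy graph are hypothetical.

```python
def transitive_dependencies(graph, package):
    """All packages reachable from `package` through dependency edges.

    The result includes the direct dependencies and excludes `package` itself;
    the `seen` set makes the traversal terminate on cyclic graphs.
    """
    seen = set()
    stack = list(graph.get(package, ()))
    while stack:
        dep = stack.pop()
        if dep != package and dep not in seen:
            seen.add(dep)
            stack.extend(graph.get(dep, ()))
    return seen

def transitive_to_direct_ratio(graph):
    """Total transitive dependencies divided by total direct dependencies."""
    direct = sum(len(set(deps)) for deps in graph.values())
    transitive = sum(len(transitive_dependencies(graph, p)) for p in graph)
    return transitive / direct if direct else 0.0

# Hypothetical snapshot: package -> list of direct dependencies.
graph = {
    "app": ["web", "log"],
    "web": ["http", "log"],
    "http": ["util"],
    "log": ["util"],
    "util": [],
}
print(sorted(transitive_dependencies(graph, "app")))  # ['http', 'log', 'util', 'web']
print(transitive_to_direct_ratio(graph))              # 1.5
```

Tracking this ratio on successive snapshots of the graph would give an evolution curve of the kind shown on the slide.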
18. Most of the complexity is deeply hidden …
… in the transitive dependencies
Proportion of top-level packages by depth of dependency tree
Over 50% of top-level packages have a deep dependency tree.
Ecosystem Complexity
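One way to compute the depth of a top-level package's dependency tree is a breadth-first traversal. It places each transitive dependency at its shortest distance from the root. The following is a minimal sketch. The slide does not spell out these definitions, so several parts are assumptions: the depth definition, the notion of a top-level package as one with no dependents, the helper names and the toy graph.

```python
from collections import Counter, deque

def top_level_packages(graph):
    """Packages that no other package depends on."""
    required = {dep for deps in graph.values() for dep in deps}
    return sorted(p for p in graph if p not in required)

def dependency_depth(graph, package):
    """Depth of the breadth-first dependency tree rooted at `package`.

    Each transitive dependency is placed at its shortest distance from the
    root; the depth is the largest such distance (0 for no dependencies).
    """
    distance = {package: 0}
    queue = deque([package])
    while queue:
        current = queue.popleft()
        for dep in graph.get(current, ()):
            if dep not in distance:
                distance[dep] = distance[current] + 1
                queue.append(dep)
    return max(distance.values())

def depth_profile(graph):
    """Number of top-level packages for each dependency-tree depth."""
    return Counter(dependency_depth(graph, p) for p in top_level_packages(graph))

# Hypothetical snapshot: package -> list of direct dependencies.
graph = {
    "app": ["web", "log"],
    "cli": ["util"],
    "web": ["http", "log"],
    "http": ["util"],
    "log": ["util"],
    "util": [],
}
print(top_level_packages(graph))              # ['app', 'cli']
print(sorted(depth_profile(graph).items()))   # [(1, 1), (2, 1)]
```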
19. Impact of transitive dependencies
npm, March 2016
Unexpected removal of left-pad caused > 2% of all packages to break (> 5,400 packages)
Ecosystem Complexity
“This impacted many thousands of projects. [...] We began observing hundreds of failures per minute, as dependent projects – and their dependents, and their dependents... – all failed when requesting the now-unpublished package.”
20. Impact of transitive dependencies
RubyGems, November 2010
Release 0.5.0 of i18n broke the dependent package ActiveRecord, which was transitively required by > 5% of all packages (930 packages)
Ecosystem Complexity
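A removal like left-pad's, or a breaking release like i18n 0.5.0, affects every package that reaches it through reverse dependency edges. The following is a minimal sketch that counts such transitive dependents in a hypothetical toy graph and expresses them as a share of the ecosystem. The function names are illustrative. The last line applies the same percentage arithmetic to the left-pad counts given in the speaker notes.

```python
from collections import defaultdict

def reverse_graph(graph):
    """Map each package to the set of packages that depend on it directly."""
    dependents = defaultdict(set)
    for pkg, deps in graph.items():
        for dep in deps:
            dependents[dep].add(pkg)
    return dependents

def transitive_dependents(graph, package):
    """Packages that directly or indirectly depend on `package`.

    These are the packages that may break if `package` is removed or ships
    a breaking release.
    """
    dependents = reverse_graph(graph)
    seen = set()
    stack = [package]
    while stack:
        current = stack.pop()
        for dependent in dependents.get(current, ()):
            if dependent not in seen:
                seen.add(dependent)
                stack.append(dependent)
    seen.discard(package)  # ignore cycles leading back to the package itself
    return seen

def affected_share(graph, package):
    """Fraction of all packages that transitively depend on `package`."""
    return len(transitive_dependents(graph, package)) / len(graph)

# Hypothetical ecosystem: a tiny padding library used through two layers.
graph = {
    "pad": [],
    "str": ["pad"],
    "fmt": ["str"],
    "cli": ["fmt"],
    "web": [],
    "db": [],
}
print(sorted(transitive_dependents(graph, "pad")))  # ['cli', 'fmt', 'str']
print(f"{affected_share(graph, 'pad'):.0%}")         # 50%

# Same percentage arithmetic with the left-pad counts from the speaker notes.
print(f"{5407 / 255844:.2%}")  # 2.11%
```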
21. Impact of transitive dependencies
• P-Impact Index = number of packages that are transitively required by at least P% of all packages.
Evolution of 5-Impact Index
Increasing Complexity
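The P-Impact Index follows directly from its definition. For every package, count how many packages transitively require it. Then count the packages whose share reaches P%. The following is a minimal sketch, assuming the dependency graph is a dict of direct dependencies. The names and toy graph are hypothetical. The speaker notes mention that the index is inspired by the h-index.

```python
from collections import Counter

def transitive_dependencies(graph, package):
    """All packages reachable from `package` via dependency edges (excluding itself)."""
    seen = set()
    stack = list(graph.get(package, ()))
    while stack:
        dep = stack.pop()
        if dep != package and dep not in seen:
            seen.add(dep)
            stack.extend(graph.get(dep, ()))
    return seen

def p_impact_index(graph, p):
    """Number of packages transitively required by at least p% of all packages."""
    n = len(graph)
    required_by = Counter()
    for pkg in graph:
        required_by.update(transitive_dependencies(graph, pkg))
    # Integer comparison avoids floating-point issues: count / n >= p / 100.
    return sum(1 for count in required_by.values() if count * 100 >= p * n)

# Hypothetical ecosystem of 10 packages.
graph = {
    "a": ["core"], "b": ["core"], "c": ["util"], "util": ["core"], "core": [],
    "d": [], "e": [], "f": [], "g": [], "h": [],
}
print(p_impact_index(graph, 5))   # 2
print(p_impact_index(graph, 20))  # 1
print(p_impact_index(graph, 50))  # 0
```

Computing `p_impact_index(graph, 5)` on successive snapshots would give an evolution of the 5-Impact Index of the kind plotted on the slide.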
22. Summary
Observed evidence of evolution “laws” of
software (packaging) ecosystems
Increasing growth
Continuing change
Increasing complexity
(How) could we find evidence for other laws?
23. Complex Networks
Emergent properties have been observed in complex networks
– Small-world phenomenon
– Power-law behaviour (unequal, skewed distributions)
– …
Do they also hold for package dependency networks?
24. Low proportion of required packages
Unequally Distributed Connectivity
25. • A low proportion of required packages concentrates a high proportion of reverse dependencies
– From 6% to 17% of required packages concentrate over 80% of all reverse dependencies.
• A high proportion of package updates is concentrated in a minority of packages.
Power Law Behaviour
Skewed Distributions
Emergent property of complex networks?
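The concentration figures can be obtained in two steps. First, sort required packages by their number of reverse dependencies. Then accumulate counts until 80% of all reverse dependencies are covered. The same cumulative sums give the points of a Lorenz curve. The following is a minimal sketch. The input counts and function names are hypothetical.

```python
def min_share_of_packages(counts, target=0.8):
    """Smallest fraction of required packages whose reverse-dependency counts
    together account for at least `target` of all reverse dependencies."""
    ordered = sorted(counts, reverse=True)
    total = sum(ordered)
    running = 0
    for i, c in enumerate(ordered, start=1):
        running += c
        if running >= target * total:
            return i / len(ordered)
    return 1.0

def lorenz_curve(counts):
    """Points (cumulative share of packages, cumulative share of reverse
    dependencies), with packages sorted from least to most required."""
    ordered = sorted(counts)
    total = sum(ordered)
    points = [(0.0, 0.0)]
    running = 0
    for i, c in enumerate(ordered, start=1):
        running += c
        points.append((i / len(ordered), running / total))
    return points

# Hypothetical reverse-dependency counts of 20 required packages.
counts = [50, 20, 5, 5, 4, 3, 3, 2] + [1] * 12
print(min_share_of_packages(counts))  # 0.25
print(lorenz_curve(counts)[-1])       # (1.0, 1.0)
```

Here 25% of the required packages account for at least 80% of the reverse dependencies. The further the Lorenz curve sags below the diagonal, the more unequal the distribution.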
26. Skewed distributions of in- and out-degree in the package dependency graph
• Few packages with many dependents (resp. dependencies)
• Many packages with very few dependents (resp. dependencies)
Power Law Behaviour
Skewed Distributions
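The in- and out-degree distributions are frequency tables over the direct dependents and direct dependencies of each package. The following is a minimal sketch on a hypothetical star-shaped graph. It only tabulates the distributions. Checking whether they actually follow a power law would require a proper statistical fit, which this sketch does not attempt.

```python
from collections import Counter

def degree_distributions(graph):
    """Return (in-degree distribution, out-degree distribution).

    In-degree  = number of direct dependents of a package.
    Out-degree = number of direct dependencies of a package.
    Each distribution maps a degree value to the number of packages having it.
    """
    out_degree = {pkg: len(set(deps)) for pkg, deps in graph.items()}
    in_degree = Counter()
    for deps in graph.values():
        in_degree.update(set(deps))
    in_dist = Counter(in_degree[pkg] for pkg in graph)
    out_dist = Counter(out_degree.values())
    return in_dist, out_dist

# Hypothetical graph: one heavily required package, many barely connected ones.
graph = {
    "core": [],
    "util": [],
    "a": ["core"],
    "b": ["core"],
    "c": ["core"],
    "d": ["core", "util"],
}
in_dist, out_dist = degree_distributions(graph)
print(sorted(in_dist.items()))   # [(0, 4), (1, 1), (4, 1)]
print(sorted(out_dist.items()))  # [(0, 2), (1, 3), (2, 1)]
```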
27. Summary
Observed evidence of complex network behaviour (power laws)
Unequal distribution of package dependencies
Unequal distribution of package updates
Other emerging properties from complex networks?
28. Open Questions
Many observed similarities across ecosystems …
… but also some differences
To what extent does an ecosystem's policy influence its evolution?
Many tools help support package maintainers
• DependencyCI, Gemnasium, …
• How should they be improved?
– e.g., to deal with transitive dependencies, co-installability issues, …
29. References
• A. Decan, T. Mens, P. Grosjean. An empirical comparison of package dependency networks in seven software ecosystems. Submitted.
• E. Constantinou, T. Mens. Socio-technical evolution of the Ruby ecosystem in GitHub. SANER 2017.
• A. Decan, T. Mens, M. Claes. An empirical comparison of dependency issues in OSS packaging ecosystems. SANER 2017.
• E. Constantinou, T. Mens. Social and technical evolution of software ecosystems: A case study of Rails. WEA 2016.
• A. Decan, T. Mens, M. Claes. On the topology of package dependency networks: A comparison of programming language ecosystems. WEA 2016.
Editor's notes
Logarithmic y-axis
Number of dependencies, considering only the latest release of each package.
Logarithmic y-axis
Non-required packages = solid line
Required packages = dashed line
Proportion of updates in 2016 by package age
Inspired by the h-index
Measures the propensity for an ecosystem to change, taking into account the amplitude (number of packages) and the importance (number of package updates).
For left-pad, I took 1 March 2016 as the reference date. At that time, left-pad had 5,407 dependent packages out of 255,844 in total (i.e., 2.11%). For i18n, I took the release date of version 0.5.0, i.e., 28 November 2010. At that time, 1,435 packages depended on it, out of 17,869 (i.e., 8.03%). The activerecord package is the one that "broke" following the change in i18n (I have no evidence either way as to whether other packages broke!). ActiveRecord then had 930 dependent packages, i.e., 5.2% of the ecosystem.
Emergence: process whereby larger entities, patterns, and regularities arise through interactions among smaller or simpler entities that themselves do not exhibit such properties.
“Network thinking is providing novel ways to think about difficult problems such as how to do efficient search on the Web, […] how to manage large organisations, how to preserve ecosystems, […] and, more generally, what kind of resilience and vulnerabilities are intrinsic to natural, social, and technological networks, and how to exploit and protect such systems.” (Melanie Mitchell, Complexity: A Guided Tour)
The small-world phenomenon was originally observed in the late 1960s by the social psychologist Stanley Milgram.
- S. Milgram, “The Small World Problem,” Psychology Today, 2, 1967, pp. 60–67.
- J. Travers and S. Milgram, “An Experimental Study of the Small World Problem,” Sociometry, 32(4), 1969, pp. 425–443.
Clustering can be measured with the clustering coefficient; a clustered network has a high clustering coefficient.
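The local clustering coefficient of a package is the fraction of pairs of its neighbours that are themselves linked. Averaging it over all packages summarises the whole network. The following is a minimal sketch that makes two assumptions. First, the dependency graph is treated as undirected. Second, nodes with fewer than two neighbours get a coefficient of 0; this is a common convention, but the notes do not state it. The names and toy graph are hypothetical.

```python
from itertools import combinations

def undirected_adjacency(graph):
    """Undirected neighbour sets, ignoring edge direction and self-loops."""
    adj = {pkg: set() for pkg in graph}
    for pkg, deps in graph.items():
        for dep in deps:
            if dep != pkg:
                adj.setdefault(pkg, set()).add(dep)
                adj.setdefault(dep, set()).add(pkg)
    return adj

def local_clustering(adj, node):
    """Fraction of pairs of neighbours of `node` that are themselves linked."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return 2 * links / (k * (k - 1))

def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)

# Hypothetical graph: a triangle a-b-c plus a package d that depends on a.
graph = {"a": ["b", "c"], "b": ["c"], "c": [], "d": ["a"]}
adj = undirected_adjacency(graph)
print(local_clustering(adj, "a"))  # 0.3333333333333333
print(average_clustering(adj))
```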
Connected packages = packages having at least 1 dependent (incoming dependency) or dependency (outgoing dependency)
Weakly connected component = maximal subgraph in which each vertex is connected to every other vertex by an undirected edge path.
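Both notions above can be computed from a dependency dict with a graph traversal that ignores edge direction. The following is a minimal sketch. The function names and toy graph are hypothetical.

```python
def connected_packages(graph):
    """Packages with at least one dependent or at least one dependency."""
    required = {dep for deps in graph.values() for dep in deps}
    return {pkg for pkg, deps in graph.items() if deps or pkg in required}

def weakly_connected_components(graph):
    """Components of the graph when edge direction is ignored."""
    adj = {pkg: set() for pkg in graph}
    for pkg, deps in graph.items():
        for dep in deps:
            adj.setdefault(pkg, set()).add(dep)
            adj.setdefault(dep, set()).add(pkg)
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        component, stack = set(), [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(component)
    return components

# Hypothetical graph: package -> list of direct dependencies.
graph = {"a": ["b"], "b": [], "c": ["b"], "x": ["y"], "y": [], "lonely": []}
print(sorted(connected_packages(graph)))  # ['a', 'b', 'c', 'x', 'y']
print(sorted(sorted(c) for c in weakly_connected_components(graph)))
# [['a', 'b', 'c'], ['lonely'], ['x', 'y']]
```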
Lorenz curve computed on January 2017.