Presentation at DrupalCamp 2018 (Ghent) by Tom Mens (University of Mons) about lessons learned and guidelines based on a historical empirical analysis of the npm JavaScript packaging ecosystem, and the impact of technical problems in its package dependency network. This work is part of the SECOHealth and SECO-ASSIST research projects, co-financed by the FNRS-FRS.
1. Tom Mens, University of Mons, Belgium
On the health of the npm packaging ecosystem
2. On the health of the npm
packaging ecosystem
Guidelines and lessons
learned based on historical
software data analytics
Tom Mens
Software Engineering Lab
tom.mens@umons.ac.be
T Mens
E Constantinou
A Decan
@tom_mens
3. Research Context
• Today over 80% of all software in any technology product
or service is open source software (OSS).
• CHAOSS focuses on creating analytics and metrics to help
define OSS community health.
https://chaoss.community
"The CHAOSS community is developing metrics, methodologies, and software for
expressing open source project health and sustainability. By doing so, CHAOSS
seeks to improve the transparency of open source project health and
sustainability so that relevant stakeholders can make more informed decisions
about open source project engagement."
14. Motivation: Security
vulnerabilities
Equifax security exploit in 2017
“attackers entered its system in mid-May through a web-application vulnerability
that had a patch available in March. In other words, the credit-reporting giant had
more than two months to take precautions that would have defended the personal
data of 143 million people from being exposed. It didn’t.”
Wired Magazine, “Equifax Has No Excuse”, September 2017
"Patching the security hole was labor intensive and difficult, in part because it
involved downloading an updated version of Struts and then using it to rebuild all
apps that used older, buggy Struts versions. Some websites may depend on dozens
or even hundreds of such apps, which may be scattered across dozens of servers on
multiple continents. Once rebuilt, the apps must be extensively tested before going
into production to ensure they don’t break key functions on the site.”
Ars Technica, Failure to patch two-month-old bug led to massive Equifax breach, September
2017
15. Understanding npm
through Big Data Analytics
npm = software package manager for JavaScript since 2010
In 2017:
3.5TB of storage required for hosting 500K packages
2.3 million opened GitHub pull requests for JavaScript repositories
We analysed:
~462 thousand packages
~3 million package releases
~13.6 million (runtime) package dependencies
16. Ecosystems grow rapidly
For npm: Exponential growth of
• #packages
• #package updates
• #dependencies
[Figures: number of new packages per trimester; number of package updates per trimester; total number of package dependencies]
17. Ecosystems grow rapidly
Package updates can be the cause
of many maintainability issues or
even failures in dependent
packages !
18. Issues in packages may have
high transitive impact
[Figures: average dependency depth for top-level packages; proportional dependency depth for top-level packages]
Many "top-level" packages have a high number
of indirect (transitive) dependencies
19. Issues in packages may have
high transitive impact
March 2016: Unexpected removal of left-pad caused
> 2% of all packages to break (> 5,400 packages)
Number of packages that are transitively required by at least 5% of all packages
20. Lesson learned: Be wary of
transitive dependencies!
• Developers are often unaware of transitive
dependencies
• It just takes one such transitive package to break or
compromise your software!
Monitoring tools may help to detect and address such
dependency issues
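The idea of a transitive dependency can be sketched in a few lines. The following is a minimal illustration, not one of the monitoring tools mentioned above: the dependency graph and all package names in it are hypothetical, and a real tool would read this information from package metadata or a lock file.

```python
from collections import deque

# Hypothetical dependency graph: package -> direct runtime dependencies.
# All names are made up for illustration.
DEPENDENCIES = {
    "my-app": ["web-framework", "logger"],
    "web-framework": ["router", "string-utils"],
    "logger": ["string-utils"],
    "router": ["pad-helper"],
    "string-utils": ["pad-helper"],
    "pad-helper": [],
}


def transitive_dependencies(package, graph):
    """Return {dependency: depth} for everything reachable from `package`.

    Depth 1 means a direct dependency; larger depths are indirect ones.
    Breadth-first search guarantees each depth is the shortest one.
    """
    depths = {}
    queue = deque((dep, 1) for dep in graph.get(package, []))
    while queue:
        name, depth = queue.popleft()
        if name in depths:
            continue  # already reached via a path that is at least as short
        depths[name] = depth
        queue.extend((dep, depth + 1) for dep in graph.get(name, []))
    return depths


def affected_by(broken, graph):
    """Return the sorted packages that directly or indirectly depend on `broken`."""
    return sorted(
        pkg for pkg in graph if broken in transitive_dependencies(pkg, graph)
    )
```

In this toy graph, "my-app" declares only two dependencies but ends up relying on five packages, and `affected_by("pad-helper", DEPENDENCIES)` lists every other package in the graph, which is the left-pad situation in miniature.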
21. Security vulnerabilities
• When are vulnerabilities discovered in npm?
• When are vulnerabilities fixed in npm?
• When do dependent packages adopt a fixed release?
SOURCE: A Decan, T Mens, E Constantinou (2018)
IEEE Int'l Conf. Mining Software Repositories
"On the impact of security vulnerabilities in the npm package dependency network"
"37% of websites include a JavaScript library with a known open source vulnerability."
T. Lauinger et al. "Thou Shalt Not Depend on Me: Analysing the Use of Outdated JavaScript
Libraries on the Web", NDSS 2017.
23. When are vulnerabilities discovered in npm ?
>40% of all vulnerabilities are not discovered even 2 years after
their introduction, regardless of their severity.
It takes a long time to discover vulnerabilities
regardless of their severity
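The speaker notes attribute this figure to survival analysis. As a rough sketch of what such an analysis computes, the following implements a basic Kaplan-Meier estimator. It is not the authors' code, and the observations are hypothetical toy values: each pair holds the number of months a vulnerability was observed and whether it was discovered in that time (False means it was still undiscovered when observation ended).

```python
# Hypothetical observations: (months observed, discovered during observation?).
OBSERVATIONS = [
    (3, True), (6, True), (6, False), (12, True),
    (24, False), (30, True), (36, False), (36, False),
]


def kaplan_meier(observations):
    """Estimate the probability of remaining undiscovered over time.

    Returns a list of (time, survival probability) pairs, one for each
    time at which at least one discovery was observed.
    """
    at_risk = len(observations)
    survival = 1.0
    curve = []
    for t in sorted({duration for duration, _ in observations}):
        events = sum(1 for d, found in observations if d == t and found)
        leaving = sum(1 for d, _ in observations if d == t)
        if events:
            # Fraction of the still-observed vulnerabilities that survive time t.
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        # Discovered and censored observations both leave the risk set.
        at_risk -= leaving
    return curve


for months, probability in kaplan_meier(OBSERVATIONS):
    print(f"after {months:2d} months: {probability:.2f} still undiscovered")
```

The estimator accounts for vulnerabilities that had not yet been discovered when the data was collected, which a plain average of discovery times would ignore.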
25. When are vulnerabilities fixed in npm ?
Most vulnerabilities are fixed quickly, and before
becoming public.
1 out of 5 vulnerabilities takes more than a year to be fixed
(e.g. in unmaintained packages that should be deprecated)
26. When do dependent packages
adopt a fixed release?
1 out of 3 dependents never update their
dependency on a vulnerable package
Improper or too restrictive use of dependency constraints
Dependent package is no longer actively maintained
Maintainers of dependent packages are unaware of the
vulnerability or the fix
Fixed package version is incompatible
27. Technical Lag
(a.k.a. dependency freshness)
Goal
• Study, at an ecosystem level, how outdated npm software
packages are with respect to their upstream dependencies.
• Study to what extent semantic versioning is respected
SOURCE: A Decan, T Mens, E Constantinou (2018)
IEEE Int'l Conf. Software Maintenance and Evolution
"On the evolution of technical lag in the npm package dependency network"
Technical lag is caused by dependency constraints
preventing the use of a more recent package version
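One way to make this definition concrete is to measure the lag in time. The sketch below is only one possible reading of the definition above, not the measurement used in the cited paper: the release history is hypothetical, and the dependency constraint is modeled as a simple predicate on version tuples.

```python
from datetime import date

# Hypothetical release history of one dependency: (version, release date).
RELEASES = [
    ((1, 0, 0), date(2017, 1, 10)),
    ((1, 1, 0), date(2017, 4, 2)),
    ((1, 1, 1), date(2017, 5, 20)),
    ((2, 0, 0), date(2017, 11, 15)),
    ((2, 1, 0), date(2018, 2, 1)),
]


def technical_lag_days(releases, is_allowed):
    """Days between the newest allowed release and the newest release overall.

    `is_allowed` is a predicate on a version tuple that stands in for the
    dependency constraint. Returns None if no release is allowed at all.
    """
    _, newest_date = max(releases)  # tuples compare by version first
    allowed = [(version, day) for version, day in releases if is_allowed(version)]
    if not allowed:
        return None
    _, installed_date = max(allowed)
    return (newest_date - installed_date).days


# A constraint that only accepts the 1.x series is stuck on 1.1.1.
print(technical_lag_days(RELEASES, lambda version: version[0] == 1))  # 257
# A constraint that accepts everything has no lag.
print(technical_lag_days(RELEASES, lambda version: True))  # 0
```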
28. Technical Lag
Main findings
• 1 out of 4 package dependencies suffers
from technical lag
• 1 out of 4 package releases has a technical lag
of more than 9 months
• Minor and patch updates tend to increase technical lag,
even though they are supposed to be backward
compatible
• Major updates tend to reduce technical lag
29. Technical Lag
Actionable results
• Appropriate use of version constraints could reduce
technical lag in 17% of all releases
• Dependency monitoring tools should inform developers
of technical lag and help to reduce it.
• Package maintainers should help dependent packages
to upgrade to new releases as easily as possible.
• Package maintainers should backport important bug
and security fixes to earlier major releases.
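The effect of the chosen version constraint can be illustrated with a simplified matcher. This is a sketch under stated assumptions, not npm's resolver: it handles only plain MAJOR.MINOR.PATCH versions and three constraint forms (exact, tilde, caret) with their usual npm meaning, and the list of available versions is hypothetical.

```python
def parse(version):
    """Turn 'MAJOR.MINOR.PATCH' into a tuple of ints (no prerelease tags)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def satisfies(version, constraint):
    """Check a version against an exact, tilde (~) or caret (^) constraint."""
    v = parse(version)
    if constraint.startswith("^"):
        base = parse(constraint[1:])
        # Caret: the left-most non-zero component must not change.
        if base[0] > 0:
            upper = (base[0] + 1, 0, 0)
        elif base[1] > 0:
            upper = (0, base[1] + 1, 0)
        else:
            upper = (0, 0, base[2] + 1)
        return base <= v < upper
    if constraint.startswith("~"):
        base = parse(constraint[1:])
        # Tilde: only patch-level updates are accepted.
        return base <= v < (base[0], base[1] + 1, 0)
    return v == parse(constraint)  # exact pin


def best_match(available, constraint):
    """Return the highest available version allowed by the constraint, or None."""
    matches = [v for v in available if satisfies(v, constraint)]
    return max(matches, key=parse) if matches else None


# Hypothetical published versions of one dependency.
AVAILABLE = ["1.2.0", "1.2.5", "1.3.0", "2.0.0"]

for constraint in ["1.2.0", "~1.2.0", "^1.2.0"]:
    print(constraint, "->", best_match(AVAILABLE, constraint))
```

With these versions, the exact pin stays on 1.2.0, the tilde constraint picks up 1.2.5, and the caret constraint reaches 1.3.0. None of them reaches 2.0.0, which is why a major release upstream creates lag until the dependent package updates its constraint.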
30. Be prudent !
• Only add a dependency if it is really needed
• Avoid too many (transitive) dependencies
• Avoid adding dependencies to problematic packages
• too high technical lag
• security vulnerabilities
• unmaintained or deprecated packages
Guidelines and
lessons learned
31. Be agile !
• Detect and fix vulnerabilities early
• Embrace semantic versioning
• Use (transitive) dependency monitoring tools to review your
dependencies regularly
• Integrate these tools in your Continuous Integration process
32. Be communicative !
• Inform your dependents about
• incompatible upgrades: by adhering to semantic versioning
• planned updates
• deprecated features
• Help your dependents to upgrade more easily
• Provide (automated) migration guidelines
• Provide alpha/beta releases
• Test your changes on dependents before releasing updates
33. SoHeal 2019
2nd International ICSE Workshop on Software Health
Montreal, Canada, 28 May 2019
• Position papers: 1 February 2019
• Industry/practitioner talk proposals: 15 February 2019
https://soheal.github.io
@iw_soheal
What?
Software Health encompasses many socio-technical aspects:
success, longevity, growth, resilience, survival, diversity,
sustainability, popularity, inclusiveness ...
Why?
• Raise awareness of software health
• Present tools, methods, practical experiences, ...
• Advance body of knowledge on software health.
Who?
Open Source Community Members, Industry and Academia
Editor's notes
Technical Diversity: different platforms, different programming languages, different application domains, different packages with similar functionality
Community Smells: Lone Wolves, Isolated Teams, Communication Problems
Contributor Abandonment: Rage quitting
“The package left-pad essentially contains a few lines of source code but has thousands of dependent projects, including Node and Babel.
When its developer decided to unpublish all his modules from npm, this had important consequences, ‘almost breaking the internet’.”
March 2016
Unexpected removal of left-pad caused > 2% of all packages to break
(> 5,400 packages)
RubyGems, November 2010
Release 0.5.0 of i18n broke dependent package ActiveRecord, transitively required by > 5% of all packages (930)
Transitive dependencies are a problem, especially since dependency monitoring tools typically only consider direct dependencies.
Breaking changes = backward incompatible changes that are not announced as such. If semantic versioning is used, breaking changes should only arise in "major" releases.
Volume: need to store, analyse and manipulate huge quantities of data when studying software ecosystems (containing tens of thousands of components and dependencies, a huge number of commits, thousands of contributors, millions of lines of code, …)
For packaging ecosystems (numbers reported by A. Decan):
Number of packages / releases (excluding prereleases) / (runtime) dependencies (March 2018):
- Cargo: 14,491 / 80,778 / 292,470
- npm: 698,647 / 4,432,172 / 19,838,481
- Packagist: 126,363 / 832,899 / 2,273,465
- RubyGems: 143,737 / 825,386 / 1,970,396
Other numbers that could be of interest (coming from https://octoverse.github.com/, 2017), e.g.:
- Number of opened PRs for JavaScript: 2.3M
- Number of opened PRs for Python: 1M
- ...
Some numbers for single projects:
- Rails (RubyGems): 68,980 commits, 346 releases, 3,570 contributors; 370 open, 11,064 closed issues; 720 open, 20,689 closed PRs
- Django (Python): 25,703 commits, 186 releases, 1,584 contributors; 28,665 issues (not on GitHub); 162 open, 9,784 closed PRs
- React (JavaScript): 9,885 commits, 89 releases, 1,178 contributors; 373 open, 5,553 closed issues; 90 open, 6,684 closed PRs
#packages grows exponentially for npm and Packagist
#dependencies and #package updates grow exponentially for npm, linearly for Packagist
Survival analysis: after 24 months, 40% of all vulnerabilities are still not discovered!
+ Most vulnerabilities are quickly fixed after their discovery.
- ~20% of vulnerabilities take more than 1 year to be fixed.
Version constraints could reduce technical lag in 17% of all releases.
E.g., through better use of semantic versioning
Package maintainers should help dependent packages to upgrade to new releases as easily as possible.
E.g., through (automated) migration guidelines, or by providing alpha/beta releases