On the diversity of software popularity metrics: An empirical study of npm


Presentation by Prof. Tom Mens (University of Mons) of an ERA-track paper at SANER 2019, the International Conference on Software Analysis, Evolution and Reengineering (Hangzhou, China, February 2019).

Abstract: Software systems often leverage open source software libraries to reuse functionality. Such libraries are readily available through software package managers like npm for JavaScript. Because of the huge number of packages available in such package distributions, developers often decide to rely on or contribute to a software package based on its popularity. Moreover, it is common practice for researchers to depend on popularity metrics for data sampling and for choosing the right candidates for their studies. However, the meaning of popularity is relative and can be defined and measured in a diversity of ways, which might produce different outcomes even when considered for the same studies. In this paper, we show evidence of how much the meaning of popularity differs in software engineering research. Moreover, we empirically analyse the relationship between different software popularity measures. As a case study, for a large dataset of 175k npm packages, we computed and extracted 9 different popularity metrics from three open source tracking systems: libraries.io, npmjs.com and GitHub. We found that popularity can indeed be measured with different unrelated metrics, each of which can be defined within a specific context. This indicates a need for a generic framework that would use a portfolio of popularity metrics drawing from different concepts.

Acknowledgments: This work was partially supported by the EU Research FP (H2020-MSCA-ITN-2014-642954, Seneca), the Spanish Government (TIN2014-59400-R, SobreVision), and the Excellence of Science Project SECO-Assist (O015718F, FWO - Vlaanderen and F.R.S.-FNRS).


On the Diversity of Software Package Popularity
An Empirical Study of npm
Ahmed Zerouali, Tom Mens, G. Robles, J. Gonzalez Barahona
IEEE Int’l Conf. Software Analysis, Evolution and Reengineering
Hangzhou, China - February 24-27, 2019
@tom_mens secoassist.github.io

How is software popularity measured? Practical view

How is software popularity measured? Research view

How are these software popularity measures related?

9 popularity metrics
# dependent external repositories (libraries.io)
# transitive runtime dependents (libraries.io)
# direct runtime dependents (npm and libraries.io)
# downloads (npm)
# npm stars (npm)
# github stars
# github forks
# pull requests
# subscribers

Metrics Emanating From the Same Source
Spearman correlation coefficient between libraries.io popularity metrics:
# dependent external repositories vs # transitive runtime dependents: r = 0.63 (moderately strong)
# dependent external repositories vs # direct runtime dependents: r = 0.53 (moderate)
# direct runtime dependents vs # transitive runtime dependents: r = 0.66 (moderately strong)
Interpretation of Spearman correlation:
[0.0, 0.2[ very weak
[0.2, 0.4[ weak
[0.4, 0.6[ moderate
[0.6, 0.8[ moderately strong
[0.8, 1.0] strong
No strong correlation between different libraries.io popularity metrics
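
As a rough illustration of how such coefficients and labels can be obtained, the sketch below computes a Spearman coefficient in plain Python (ranks with ties averaged, then a Pearson correlation on the ranks) and maps it to the interpretation scale from the slide. The function names, the sample numbers, and the use of the absolute value for negative coefficients are assumptions made for this example; the slides do not say which tooling the authors used.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman coefficient: Pearson correlation of the two rank vectors.

    Assumes equal-length inputs and that neither input is constant.
    """
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def interpret(r):
    """Label a coefficient with the slide's scale (applied to |r|, an assumption)."""
    strength = abs(r)
    if strength < 0.2:
        return "very weak"
    if strength < 0.4:
        return "weak"
    if strength < 0.6:
        return "moderate"
    if strength < 0.8:
        return "moderately strong"
    return "strong"


# Invented numbers for five hypothetical packages (not data from the study).
downloads = [120, 4500, 300, 80, 9000]
npm_stars = [3, 10, 25, 1, 40]
r = spearman(downloads, npm_stars)
print(f"r = {r:.2f} ({interpret(r)})")
```

Because the coefficient only looks at ranks, it is insensitive to the heavy skew that is typical of counts such as downloads or stars.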

Metrics Emanating From the Same Source
Spearman correlation coefficient between npm popularity metrics:
# npm stars vs # downloads: r = 0.39 (weak)
# npm stars vs # direct runtime dependents: r = 0.27 (weak)
# direct runtime dependents vs # downloads: r = 0.42 (moderate)
No strong correlation between different npm popularity metrics

Metrics Emanating From the Same Source
Spearman correlation coefficient between GitHub popularity metrics:
# pull requests vs # github stars: r = 0.64 (moderately strong)
# pull requests vs # forks: r = 0.70 (moderately strong)
# pull requests vs # subscribers: r = 0.53 (moderate)
# subscribers vs # github stars: r = 0.55 (moderately strong)
# subscribers vs # forks: r = 0.55 (moderately strong)
# forks vs # github stars: r = 0.73 (moderately strong)
No strong correlation between different GitHub popularity metrics

Metrics Emanating From the Same Source
Example: # forks versus # github stars
Based on 175K npm packages: r = 0.73 (moderately strong)
Based on GitHub’s 5000 most starred repositories: r = 0.56 (moderate)
The chosen population affects the outcome of the results
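
The effect of the chosen population can be reproduced on a toy example. The sketch below uses the tie-free shortcut formula for the Spearman coefficient, 1 - 6·Σd² / (n·(n² - 1)), and compares the coefficient on a full set of observations with the coefficient on only the top-ranked ones. All numbers and function names are invented for illustration and deliberately extreme; they are not taken from the study.

```python
def rank(values):
    """Ranks 1..n; assumes all values are distinct (no ties)."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


def spearman_no_ties(xs, ys):
    """Spearman coefficient via the rank-difference formula (valid without ties)."""
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rank(xs), rank(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def restrict_to_top(xs, ys, k):
    """Keep the k observations with the largest x (e.g. the most starred)."""
    top = sorted(zip(xs, ys), reverse=True)[:k]
    return [x for x, _ in top], [y for _, y in top]


# Hypothetical star and fork counts for eight repositories.
stars = [1, 2, 3, 4, 5, 6, 7, 8]
forks = [1, 2, 3, 4, 8, 7, 6, 5]

full = spearman_no_ties(stars, forks)
top_stars, top_forks = restrict_to_top(stars, forks, 4)
top_only = spearman_no_ties(top_stars, top_forks)

print(f"all repositories:  r = {full:.2f}")
print(f"top 4 by stars:    r = {top_only:.2f}")
```

In this toy data the two metrics agree on the overall ordering but disagree among the most starred items, so sampling only the top of one metric changes the coefficient substantially.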

Metrics Emanating from Different Sources
# runtime dependent repositories (libraries.io)
# direct runtime dependents (npm and libraries.io)
# npm downloads
# npm stars
# github subscribers
Aggarwal-Popularity = #github forks + #github stars + (#pull requests)²
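
The composite metric on this slide is a direct arithmetic combination of three GitHub counts. A minimal sketch, with a hypothetical function name and invented input values:

```python
def aggarwal_popularity(forks, stars, pull_requests):
    """Composite score from the slide: forks + stars + (pull requests) squared."""
    return forks + stars + pull_requests ** 2


# Invented counts for a hypothetical repository.
score = aggarwal_popularity(forks=10, stars=100, pull_requests=5)
print(score)
```

Because pull requests enter squared, a repository with many pull requests can outrank one with far more stars and forks.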

Metrics Emanating from Different Sources
No strong correlation between popularity metrics from different sources

Popularity agreement
How many of the top 1,000 most depended-upon npm packages are part of the top 1,000 of the other popularity metrics?
Little agreement on top 1,000 most popular packages
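
One way to quantify this kind of agreement is the share of a reference top-k list that also appears in the top-k list of another metric. The sketch below is an assumption about how such an overlap could be computed; the function names, the tie-breaking by package name, and the sample data are all hypothetical.

```python
def top_k(scores, k):
    """Names of the k highest-scoring packages (ties broken by name)."""
    ordered = sorted(scores, key=lambda name: (-scores[name], name))
    return set(ordered[:k])


def top_k_agreement(reference, other, k):
    """Fraction of the reference top-k that is also in the other metric's top-k."""
    return len(top_k(reference, k) & top_k(other, k)) / k


# Invented values for four hypothetical packages.
dependents = {"pkg-a": 50, "pkg-b": 40, "pkg-c": 30, "pkg-d": 1}
downloads = {"pkg-a": 900, "pkg-b": 10, "pkg-c": 20, "pkg-d": 800}

agreement = top_k_agreement(dependents, downloads, k=2)
print(f"{agreement:.0%} of the top 2 by dependents are in the top 2 by downloads")
```

With k set to 1,000 and one score dictionary per metric, the same function would give the overlap described on the slide.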

Conclusion
Popularity metrics measure different things
• No strong correlation between metrics from same source
• No strong correlation between metrics from different sources
• Little agreement on topmost popular packages
→ Different metrics may produce different outcomes
Selected population affects the correlation
→ Different datasets may produce different outcomes
Research on popularity needs to take into account the diversity and context-dependence of software popularity metrics.

Limitations and Future Work
• Consider more popularity metrics
• Consider datasets other than npm and GitHub
• Assess reproducibility of empirical research based on popularity metrics



Dataset
175,774 npm packages
