
On the diversity of software popularity metrics: An empirical study of npm

Presentation by Prof. Tom Mens (University of Mons) of an ERA-track paper at SANER 2019, the International Conference on Software Analysis, Evolution and Reengineering (Hangzhou, China, February 2019).
Abstract: Software systems often leverage open source software libraries to reuse functionality. Such libraries are readily available through software package managers like npm for JavaScript. Because of the huge number of packages available in such package distributions, developers often decide to rely on or contribute to a software package based on its popularity. Moreover, it is common practice for researchers to rely on popularity metrics for data sampling and for choosing the right candidates for their studies. However, the meaning of popularity is relative and can be defined and measured in a variety of ways that might produce different outcomes, even when considered for the same studies. In this paper, we show evidence of how much the meaning of popularity differs in software engineering research. Moreover, we empirically analyse the relationship between different software popularity measures. As a case study, for a large dataset of 175k npm packages, we computed and extracted 9 different popularity metrics from three open source tracking systems: libraries.io, npmjs.com and GitHub. We found that popularity can indeed be measured with different, unrelated metrics, and that each metric can be defined within a specific context. This indicates the need for a generic framework that would use a portfolio of popularity metrics drawing from different concepts.
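The relationships between metrics in this talk are reported as Spearman rank correlations, read against a five-band interpretation scale ([0.0, 0.2[ very weak up to [0.8, 1.0] strong). A minimal Python sketch of that computation follows, assuming each metric is available as a list of numbers aligned by package. Tied values receive average ranks. The function names (`average_ranks`, `spearman`, `interpret`) are hypothetical and not taken from the study's tooling, and the toy numbers in the usage section are invented for illustration, not data from the paper.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Rank values 1..n in ascending order; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items each")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rank correlation is undefined for a constant sequence")
    return cov / (vx * vy) ** 0.5


def interpret(r: float) -> str:
    """Label |r| with the half-open bands used on the slides (abs() is an assumption)."""
    magnitude = abs(r)
    if magnitude < 0.2:
        return "very weak"
    if magnitude < 0.4:
        return "weak"
    if magnitude < 0.6:
        return "moderate"
    if magnitude < 0.8:
        return "moderately strong"
    return "strong"


if __name__ == "__main__":
    # Invented toy values for five packages, aligned by index (not study data).
    downloads = [120, 5400, 30, 980, 15]
    npm_stars = [3, 40, 1, 2, 0]
    r = spearman(downloads, npm_stars)
    print(round(r, 2), interpret(r))  # 0.9 strong
```

For real datasets, `scipy.stats.spearmanr` performs the same average-rank computation; the hand-rolled version only makes the steps explicit. Applying `abs()` in `interpret` is an assumption, since the slides only report positive coefficients.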
Acknowledgments: This work was partially supported by the EU Research FP (H2020-MSCA-ITN-2014-642954, Seneca), the Spanish Government (TIN2014-59400-R, SobreVision), the Excellence of Science Project SECO-Assist (O015718F, FWO - Vlaanderen and F.R.S.-FNRS).

  1. On the Diversity of Software Package Popularity: An Empirical Study of npm. Ahmed Zerouali, Tom Mens, G. Robles, J. Gonzalez Barahona. IEEE Int’l Conf. Software Analysis, Evolution and Reengineering, Hangzhou, China, February 24-27, 2019. @tom_mens, secoassist.github.io
  2. How is software popularity measured? Practical view
  3. How is software popularity measured? Research view
  4. How are these software popularity measures related?
  5. Dataset: 175,774 packages
  6. 9 popularity metrics:
     • # dependent external repositories (libraries.io)
     • # transitive runtime dependents (libraries.io)
     • # direct runtime dependents (npm and libraries.io)
     • # downloads (npm)
     • # npm stars (npm)
     • # github stars
     • # github forks
     • # pull requests
     • # subscribers
  7. Metrics Emanating From the Same Source (libraries.io). Spearman correlation coefficients:
     • # dependent external repositories vs. # transitive runtime dependents: r = 0.63 (moderately strong)
     • # dependent external repositories vs. # direct runtime dependents: r = 0.53 (moderate)
     • # direct runtime dependents vs. # transitive runtime dependents: r = 0.66 (moderately strong)
     Interpretation of Spearman correlation: [0.0, 0.2[ very weak; [0.2, 0.4[ weak; [0.4, 0.6[ moderate; [0.6, 0.8[ moderately strong; [0.8, 1.0] strong.
     No strong correlation between different libraries.io popularity metrics.
  8. Metrics Emanating From the Same Source (npm). Spearman correlation coefficients:
     • # npm stars vs. # downloads: r = 0.39 (weak)
     • # npm stars vs. # direct runtime dependents: r = 0.27 (weak)
     • # direct runtime dependents vs. # downloads: r = 0.42 (moderate)
     Interpretation of Spearman correlation: [0.0, 0.2[ very weak; [0.2, 0.4[ weak; [0.4, 0.6[ moderate; [0.6, 0.8[ moderately strong; [0.8, 1.0] strong.
     No strong correlation between different npm popularity metrics.
  9. Metrics Emanating From the Same Source (GitHub). Spearman correlation coefficients:
     • # pull requests vs. # github stars: r = 0.64 (moderately strong)
     • # pull requests vs. # forks: r = 0.70 (moderately strong)
     • # pull requests vs. # subscribers: r = 0.53 (moderate)
     • # subscribers vs. # github stars: r = 0.55 (moderate)
     • # subscribers vs. # forks: r = 0.55 (moderate)
     • # forks vs. # github stars: r = 0.73 (moderately strong)
     No strong correlation between different GitHub popularity metrics.
  10. Metrics Emanating From the Same Source. Example: # forks versus # github stars:
     • r = 0.73 (moderately strong), based on 175K npm packages
     • r = 0.56 (moderate), based on GitHub’s 5,000 most starred repositories
     The chosen population affects the outcome of the results.
  11. Metrics Emanating from Different Sources:
     • # runtime dependent repositories (libraries.io)
     • # direct runtime dependents (npm and libraries.io)
     • # npm downloads
     • # npm stars
     • # github subscribers
     • Aggarwal-Popularity = # github forks + # github stars + (# pull requests)²
  12. Metrics Emanating from Different Sources: no strong correlation between popularity metrics from different sources.
  13. Popularity agreement: how many of the top 1,000 most depended-upon npm packages are part of the top 1,000 of the other popularity metrics? There is little agreement on the top 1,000 most popular packages.
  14. Conclusion
     Popularity metrics measure different things:
     • No strong correlation between metrics from the same source
     • No strong correlation between metrics from different sources
     • Little agreement on the topmost popular packages
     → Different metrics may produce different outcomes.
     The selected population affects the correlation.
     → Different datasets may produce different outcomes.
     Research on popularity needs to take into account the diversity and context-dependence of software popularity metrics.
  15. Limitations and Future Work
     • Consider more popularity metrics
     • Consider datasets other than npm and GitHub
     • Assess the reproducibility of empirical research based on popularity metrics
  16. Questions
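Slides 11 and 13 involve two further computations: a composite popularity score and the overlap between the top-ranked packages of two metrics. The Python sketch below assumes each metric is a mapping from package name to score. The helper names are hypothetical, ties at the cut-off are broken alphabetically by package name (the slides do not say how the study handled ties), and the package names and numbers are made up. `aggarwal_popularity` only transcribes the formula as written on slide 11.

```python
from typing import Mapping


def aggarwal_popularity(forks: int, stars: int, pull_requests: int) -> int:
    """Composite score transcribed from slide 11: forks + stars + (pull requests)^2."""
    return forks + stars + pull_requests ** 2


def top_k(scores: Mapping[str, float], k: int) -> set[str]:
    """Names of the k highest-scoring packages.

    Ties are broken alphabetically by package name (an assumption).
    """
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    return set(ranked[:k])


def top_k_agreement(a: Mapping[str, float], b: Mapping[str, float], k: int) -> int:
    """Number of packages that are in the top k of both metric a and metric b."""
    return len(top_k(a, k) & top_k(b, k))


if __name__ == "__main__":
    # Invented toy data; the slides use k = 1000 over 175,774 npm packages.
    dependents = {"pkg-a": 900, "pkg-b": 850, "pkg-c": 10, "pkg-d": 5}
    downloads = {"pkg-a": 1_000_000, "pkg-b": 200, "pkg-c": 50_000, "pkg-d": 30}
    print(top_k_agreement(dependents, downloads, k=2))  # 1 (only pkg-a is in both)
    print(aggarwal_popularity(forks=10, stars=100, pull_requests=3))  # 119
```

An agreement of k means both metrics select the same top-k set; values far below k correspond to the "little agreement" reported on slide 13.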
