Six Easy Pieces (of Quantitatively Analyzing Open Source Software)
Dirk Riehle, SAP Research, SAP Labs LLC
dirk@riehle.org, www.riehle.org, twitter.com/driehle
Open Source Software
Talk Overview (Agenda)
1. The Growth of Open Source Software
2. Data Mining for Fun and Profit
3. Efficiently Estimating Commit Sizes
4. The Commit Size Distribution of Open Source
5. Developer Activity in Open Source Software Projects
6. The Commenting Practice of Open Source
7. Team Size Evolution in Open Source Projects
8. Conclusions
The Growth of Open Source Software
Amit Deshpande, Dirk Riehle. “The Total Growth of Open Source.” In Proceedings of the Fourth Conference on Open Source Systems (OSS 2008). Springer Verlag, 2008. Pages 197-209. http://www.riehle.org/2008/03/14/the-total-growth-of-open-source/
Source Code Growth in Open Source (figure; SLoC = source lines of code)
Model of Source Code Growth
where y: total open source lines of code; x: time from Jan 1995 to Dec 2006 in months

Approach      Model                                  R-square value
Lower bound   $y = 2 \times 10^{6} \, e^{0.0464x}$   0.964
Upper bound   $y = 784098 \, e^{0.0555x}$            0.961
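To make the two fitted models concrete, the sketch below evaluates them and derives their doubling times: solving $e^{bt} = 2$ gives $t = \ln 2 / b$. The coefficients are copied from the table; the function names, and treating Jan 1995 as month 0 (so Dec 2006 is month 143), are assumptions made only for illustration.

```python
import math

# (a, b) for y = a * e^(b * x), copied from the slide. x = months since
# Jan 1995 (assumed to start at 0), y = total open source lines of code.
# "2E+06" is the rounded value shown on the slide, so results are approximate.
MODELS = {
    "lower bound": (2e6, 0.0464),
    "upper bound": (784098.0, 0.0555),
}

def predict(a: float, b: float, months: float) -> float:
    """Evaluate the exponential growth model at a month offset."""
    return a * math.exp(b * months)

def doubling_time(b: float) -> float:
    """Months for y to double: e^(b*t) = 2  =>  t = ln 2 / b."""
    return math.log(2) / b

if __name__ == "__main__":
    for name, (a, b) in MODELS.items():
        print(f"{name}: y(143) ~ {predict(a, b, 143):.3g} SLoC, "
              f"doubling time ~ {doubling_time(b):.1f} months")
```

Run as a script, it reports doubling times of roughly 14.9 months (lower bound) and 12.5 months (upper bound). These are derived from the slide's rounded coefficients, not numbers taken from the paper.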
Project Growth in Open Source
Model of Project Growth
where y: total number of open source projects; x: time from Jan 1995 to Dec 2006 in months

Model                          R-square value
$y = 7.1511 \, e^{0.0499x}$    0.956
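The slides do not say how these exponential models were fitted. One common approach, shown here only as a sketch and not necessarily the authors' method, is ordinary least squares on $\ln y = \ln a + b x$, with the R-square value computed on the log scale. The function name `fit_exponential` is hypothetical.

```python
import math
from typing import Sequence, Tuple

def fit_exponential(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Fit y = a * e^(b*x) by least squares on ln(y).

    Returns (a, b, r_squared); r_squared is measured on the log scale.
    All ys must be positive.
    """
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    logs = [math.log(y) for y in ys]
    mean_x = sum(xs) / n
    mean_l = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    sxl = sum((x - mean_x) * (l - mean_l) for x, l in zip(xs, logs))
    b = sxl / sxx                     # slope of ln(y) against x
    ln_a = mean_l - b * mean_x        # intercept of ln(y)
    ss_res = sum((l - (ln_a + b * x)) ** 2 for x, l in zip(xs, logs))
    ss_tot = sum((l - mean_l) ** 2 for l in logs)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return math.exp(ln_a), b, r_squared
```

As a sanity check, feeding it noise-free points generated from the project-growth model for months 0 to 143 recovers a = 7.1511 and b = 0.0499 with an R-square of 1; real project counts would of course scatter around the curve.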
Where Open Source is Growing
Data Mining for Fun and Profit
Oliver Arafat, Amit Deshpande, Philipp Hofmann, Dirk Riehle. http://www.riehle.org/publications/
Motivation and Approach
Data Source, Data Quality
Open Source Analytics Tool Chain
Efficiently Estimating Commit Sizes
Philipp Hofmann, Dirk Riehle. “Estimating Commit Sizes Efficiently.” In Proceedings of the 5th International Conference on Open Source Systems (OSS 2009). Springer Verlag, 2009. Forthcoming. http://www.riehle.org/2009/02/11/estimating-commit-sizes-efficiently/
Definition of Commit Size
What Diff Does

    Line  a.txt  b.txt
    01    a      a
    02    b      b
    03    c      c
    04    d      e
    05    f      e
    06    g      e
    07    h      g
    08    m      h
    09    n      j
    10           m

    $ diff a.txt b.txt
    4,5c4,6
    < d
    < f
    ---
    > e
    > e
    > e
    7a9
    > j
    9d10
    < n
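The same three diff sections (change, add, delete) can be reproduced with Python's standard `difflib.SequenceMatcher`, whose opcodes map naturally onto diff's normal-format hunk headers. This is a sketch: `SequenceMatcher` uses a different matching algorithm than GNU diff, so it will not always produce identical hunks, although for this example it does. The helper names `normal_diff` and `_range` are hypothetical.

```python
import difflib
from typing import List, Sequence, Tuple

# The two files from the slide, one entry per line.
A_TXT = ["a", "b", "c", "d", "f", "g", "h", "m", "n"]
B_TXT = ["a", "b", "c", "e", "e", "e", "g", "h", "j", "m"]

def _range(lo: int, hi: int) -> str:
    """Format the 0-based half-open slice [lo, hi) as a 1-based diff range."""
    return str(lo + 1) if hi - lo == 1 else f"{lo + 1},{hi}"

def normal_diff(old: Sequence[str], new: Sequence[str]) -> List[Tuple[str, List[str], List[str]]]:
    """Return (hunk header, removed lines, added lines) for each diff section."""
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    hunks = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if tag == "replace":
            header = f"{_range(i1, i2)}c{_range(j1, j2)}"
        elif tag == "delete":
            header = f"{_range(i1, i2)}d{j1}"   # j1: new line it follows
        else:  # "insert": i1 is the old line the addition follows
            header = f"{i1}a{_range(j1, j2)}"
        hunks.append((header, list(old[i1:i2]), list(new[j1:j2])))
    return hunks

if __name__ == "__main__":
    for header, removed, added in normal_diff(A_TXT, B_TXT):
        print(header)
        for line in removed:
            print("<", line)
        if removed and added:
            print("---")
        for line in added:
            print(">", line)
```

Running it prints the same output as `diff a.txt b.txt` above: a change hunk `4,5c4,6`, an addition `7a9`, and a deletion `9d10`.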
The Trouble with Diff
Some Diff Section Size Examples

Diff section (1, 1):
Event     SLoC added   SLoC removed   SLoC changed   Number of modifications
Event 1   0            0              1              1
Event 2   1            1              0              2

Diff section (4, 3):
Event     SLoC added   SLoC removed   SLoC changed   Number of modifications
Event 1   1            0              3              4
Event 2   2            1              2              5
Event 3   3            2              1              6
Event 4   4            3              0              7
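The tables above follow a simple pattern. Reading a section (n, o) as n lines on the new side and o lines on the old side (an interpretation inferred from the numbers, not stated on the slide), any c from min(n, o) down to 0 lines may have been changed, with the remaining n - c lines added and o - c lines removed. The number of modifications therefore ranges from max(n, o) to n + o. The sketch below enumerates these events; `Event`, `possible_events`, and `modification_bounds` are hypothetical names.

```python
from typing import List, NamedTuple, Tuple

class Event(NamedTuple):
    added: int
    removed: int
    changed: int

    @property
    def modifications(self) -> int:
        return self.added + self.removed + self.changed

def possible_events(new_lines: int, old_lines: int) -> List[Event]:
    """All events that could explain a diff section, ordered as on the slide.

    c changed lines leave new_lines - c added and old_lines - c removed,
    for c = min(new_lines, old_lines) down to 0.
    """
    most = min(new_lines, old_lines)
    return [Event(new_lines - c, old_lines - c, c) for c in range(most, -1, -1)]

def modification_bounds(new_lines: int, old_lines: int) -> Tuple[int, int]:
    """Fewest modifications (max changed lines) and most (no changed lines)."""
    return max(new_lines, old_lines), new_lines + old_lines

if __name__ == "__main__":
    for i, e in enumerate(possible_events(4, 3), start=1):
        print(f"Event {i}: {e.added} added, {e.removed} removed, "
              f"{e.changed} changed, {e.modifications} modifications")
```

Whether these two bounds are exactly what the "Lower Bound" and "Upper Bound" heuristics compute is an assumption; the names merely suggest it.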
Garden Variety of Heuristics

#   Approach            Error mean   Error standard deviation
1   Lower Bound          3.86        16.64
2   Upper Bound         -4.41         6.39
3   Bounds Mean         -0.27         7.68
4   GNU diff            -1.96        19.55
5   GNU diff -d         -3.06        30.87
6   Ldiff               -5.95        40.35
7   Linear Estimation    0            5.44
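The table reports an error mean and standard deviation per heuristic, but the slide does not define either. The sketch below assumes error = actual size minus estimated size (which fits the lower bound's positive mean and the upper bound's negative mean) and uses the population standard deviation; both are assumptions. It also sums the per-section bounds over a commit to form bound-based estimates. The GNU diff, Ldiff, and Linear Estimation approaches are not modeled, and the function names are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple

def bound_estimates(sections: Sequence[Tuple[int, int]]) -> Tuple[int, int, float]:
    """Commit-size estimates from diff sections given as (new_lines, old_lines).

    lower: as many lines as possible counted as changed -> max(new, old)
    upper: no lines counted as changed                  -> new + old
    mean:  midpoint of the two bounds
    """
    lower = sum(max(new, old) for new, old in sections)
    upper = sum(new + old for new, old in sections)
    return lower, upper, (lower + upper) / 2

def estimation_error(actual: Sequence[float], estimated: Sequence[float]) -> Tuple[float, float]:
    """Mean and population standard deviation of (actual - estimated)."""
    if len(actual) != len(estimated) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    errors: List[float] = [a - e for a, e in zip(actual, estimated)]
    return statistics.mean(errors), statistics.pstdev(errors)
```

For example, a commit whose diff has sections (4, 3) and (1, 1) gets a lower bound of 5, an upper bound of 9, and a bounds mean of 7.0.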
Visual Comparison of Heuristics
Definition of Commit Size
The Commit Size Distribution of Open Source
Oliver Arafat, Dirk Riehle. “The Commit Size Distribution of Open Source Software.” In Proceedings of the 42nd Hawaii International Conference on System Sciences (HICSS-42). IEEE Press, 2009. Pages 1-8. http://www.riehle.org/2008/09/23/the-commit-size-distribution-of-open-source-software/
The Overall Commit Size Distribution
The Dominance of Small Commits
The Overall Commit Size Distribution
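As a minimal sketch of how a commit size distribution can be tabulated, assuming each commit's size has already been computed (for example as SLoC modified), the code below computes relative frequencies and the complementary cumulative distribution P(size >= s). Plotting the latter on log-log axes is a common way to inspect a distribution dominated by small values with a long tail of large ones. The function names are hypothetical, and no data from the study is used.

```python
from collections import Counter
from typing import Dict, Iterable

def size_distribution(commit_sizes: Iterable[int]) -> Dict[int, float]:
    """Relative frequency of each commit size, keyed in ascending size order."""
    counts = Counter(commit_sizes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {size: n / total for size, n in sorted(counts.items())}

def ccdf(commit_sizes: Iterable[int]) -> Dict[int, float]:
    """Fraction of commits whose size is at least s, for each observed s."""
    result: Dict[int, float] = {}
    remaining = 1.0
    for size, p in size_distribution(commit_sizes).items():  # ascending sizes
        result[size] = remaining
        remaining -= p
    return result
```

With the toy input `[1, 1, 2, 10]`, half of the commits have size 1, and the CCDF drops from 1.0 at size 1 to 0.25 at size 10.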
Developer Activity in Open Source Software Projects
Dirk Riehle, Oliver Arafat, Amit Deshpande. “Developer Activity in Open Source Software Projects.” In preparation.
Amit Deshpande, Dirk Riehle. “Continuous Integration in Open Source Software Development.” In Proceedings of the Fourth Conference on Open Source Systems (OSS 2008). Springer Verlag, 2008. Pages 273-280. http://www.riehle.org/2008/03/08/continuous-integration-in-open-source-software-development/
Average Commit Size
Average Commit Frequency
Changes in Developer Behavior
The Commenting Practice of Open Source
Oliver Arafat, Dirk Riehle. “The Comment Density of Open Source Software Code.” In Companion to Proceedings of the 31st International Conference on Software Engineering (ICSE 2009). IEEE Press, 2009. Forthcoming. http://www.riehle.org/2009/02/04/the-comment-density-of-open-source-software-code/
Average Comment Density
Comment Density by Programming Language

#   Language     Average [%]   Stddev [%]   Population size
1   Java         26            11           1085
2   PHP          22            12           559
3   C/C++        18            8            1621
4   JavaScript   16            9            276
5   Python       11            8            534
6   Perl         10            7            273
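The slides do not spell out how comment density is measured. This sketch assumes it is comment lines divided by the sum of comment lines and source lines, with blank lines excluded. It handles only C-style `//` and `/* */` comments (as in Java, C/C++, or JavaScript, not Python or Perl), counts a line with any code on it as source, and ignores comment markers inside string literals. All of these are simplifying assumptions, and the function names are hypothetical.

```python
from typing import Iterable, Tuple

def classify_lines(lines: Iterable[str]) -> Tuple[int, int]:
    """Count (source_lines, comment_lines) for C-style comment syntax."""
    source = comment = 0
    in_block = False  # inside an unterminated /* ... */ comment
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # blank lines are ignored
        has_code = has_comment = False
        i = 0
        while i < len(line):
            if in_block:
                has_comment = True
                end = line.find("*/", i)
                if end == -1:
                    i = len(line)  # block comment continues on the next line
                else:
                    in_block = False
                    i = end + 2
            elif line.startswith("//", i):
                has_comment = True
                i = len(line)  # rest of the line is a comment
            elif line.startswith("/*", i):
                has_comment = True
                in_block = True
                i += 2
            else:
                if not line[i].isspace():
                    has_code = True
                i += 1
        if has_code:
            source += 1
        elif has_comment:
            comment += 1
    return source, comment

def comment_density(lines: Iterable[str]) -> float:
    """Comment lines / (comment lines + source lines); 0.0 for empty input."""
    source, comment = classify_lines(lines)
    total = source + comment
    return comment / total if total else 0.0
```

On a six-line Java-style snippet with a three-line Javadoc-style block comment above a three-line method, this yields a density of 0.5; the line with `return a + b; // trivial` counts as source under the assumption above.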
Comment Density by Commit Size
Comment Density by Team Size
Comment Density by Project Age
Commenting in Open Source
Team Size Evolution in Open Source Projects
Philipp Hofmann, Dirk Riehle. “Team Size Evolution in Open Source Software Projects.” In preparation.
Team Size Evolution (figure)
Is Open Source Scale-Free?
Conclusions
Thank you!
dirk@riehle.org, www.riehle.org, twitter.com/driehle
Comments are welcome!
