On the variation and
                   specialisation of workload
                       The Gnome case
                        B. Vasilescu, A. Serebrenik, M. Goeminne, T. Mens




Tuesday 4 December 2012
Gnome as an ecosystem

                   • Ecosystem: set of interconnected projects
                   • ~ 1400 projects
                   • ~ 3000 contributors
                   • 15 years of activity

                          Variation and specialisation of workload   Benevol 2012
How does workload vary
                 across contributors?

                   • Who are they?
                   • What do they do?
                   • How do they do it?
              A partial answer by analysing the git repositories.

Who are the contributors?



Identity matching
                   • Contributors have an account per project
                        repository…
                   • … and sometimes more than one.
                   • No explicit links between the accounts,
                        need to guess them.
                   • Based on names and e-mails found in the
                        git repositories.

Identity matching (cont.)
                   •    (Semi-)automatic classification techniques.
                   •    Must take into account variations, abbreviations,
                        permutations, misspellings, nicknames, etc.
                   •    No process is perfect: even a manually post-checked result can
                        contain false positives and false negatives.
                   •    Since Gnome as a whole has no strict identification policy,
                        some matches cannot be detected without extra
                        context information. Fictitious example:
                        •   Robbie Williams <robbiew@gnome.org>
                        •   Euphegenia Doubtfire <euphegenia@gmail.com>
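
The kind of name and e-mail normalisation described above can be sketched roughly as follows. This is a minimal illustration, not the authors' matching procedure: the rules (lower-casing, stripping accents, ignoring word order, comparing the part of the e-mail before the `@`), the helper names `normalise_name`, `email_prefix` and `same_person`, and the "Jean Dupont" test data are all assumptions made for the example.

```python
import unicodedata


def normalise_name(name: str) -> frozenset:
    """Lower-case, strip accents and punctuation, and ignore word order."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    cleaned = "".join(
        ch if ch.isalnum() else " " for ch in without_accents.lower()
    )
    return frozenset(cleaned.split())


def email_prefix(email: str) -> str:
    """Return the part of the address before '@', lower-cased."""
    return email.split("@", 1)[0].lower()


def same_person(a: tuple, b: tuple) -> bool:
    """Hypothetical heuristic: same name tokens, or same e-mail prefix."""
    (name_a, mail_a), (name_b, mail_b) = a, b
    if normalise_name(name_a) == normalise_name(name_b):
        return True
    return email_prefix(mail_a) == email_prefix(mail_b)


if __name__ == "__main__":
    # Permuted, differently capitalised name (hypothetical data): matched.
    print(same_person(("Jean Dupont", "jd@example.org"),
                      ("DUPONT, Jean", "jean.dupont@example.com")))
    # The fictitious pair from the slide: neither rule links the accounts.
    print(same_person(("Robbie Williams", "robbiew@gnome.org"),
                      ("Euphegenia Doubtfire", "euphegenia@gmail.com")))
```

The second call returns `False`, which is the point of the fictitious example: without extra context, no rule based on names and e-mails alone can tell that the two accounts belong together.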


What do the
                        contributors do?


13 activity types
                   • Identified by the path, name and extension
                        of the touched files.
                        • Coding : *.c, *.java, etc.
                        • Translation : *.po, etc.
                        • Testing : */test/*, etc.
                        • ...
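
A rule-based classifier of this kind might look like the sketch below. It uses only the three rules shown on the slide; the remaining activity types are elided there and are not filled in here. The function name `classify`, the `"unknown"` fallback, and the choice to check path rules before extension rules are assumptions of this sketch.

```python
from fnmatch import fnmatchcase

# Only the three rules shown on the slide; the other activity types are
# elided there. Checking the path rule before the extension rules is an
# assumption of this sketch, not something the slides state.
RULES = [
    ("testing", ["*/test/*"]),
    ("translation", ["*.po"]),
    ("coding", ["*.c", "*.java"]),
]


def classify(path: str) -> str:
    """Return the first activity type whose pattern matches the file path."""
    for activity, patterns in RULES:
        if any(fnmatchcase(path, pattern) for pattern in patterns):
            return activity
    return "unknown"


if __name__ == "__main__":
    for path in ["src/main.c", "po/fr.po", "lib/test/check.c", "README"]:
        print(path, "->", classify(path))
```

Rule order matters here: `lib/test/check.c` matches both the testing and the coding patterns, so a real rule set needs an explicit way to resolve such overlaps.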
How do the contributors
                contribute?


Metrics

                   • APTW(p,c,t) : Number of files touched by
                        the contributor c performing an activity of
                        type t in a project p.
                   • Derived metrics, by aggregation: max, sum,
                        etc.
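
As a small worked example of the definition above, the sketch below counts APTW from a list of file touches and derives two aggregates. The records, the function names, and the decision to count every touch (rather than every distinct file) are assumptions for illustration; the slides do not specify these details.

```python
from collections import Counter

# Hypothetical records: (project, contributor, activity type, touched file).
touches = [
    ("gedit", "alice", "coding", "src/main.c"),
    ("gedit", "alice", "coding", "src/util.c"),
    ("gedit", "alice", "translation", "po/fr.po"),
    ("gimp", "alice", "coding", "app/core.c"),
    ("gimp", "bob", "translation", "po/de.po"),
]

# APTW(p, c, t): number of file touches per (project, contributor, type).
aptw = Counter((p, c, t) for p, c, t, _file in touches)


def sum_per_contributor(aptw: Counter) -> Counter:
    """Derived metric: sum of APTW over all projects and activity types."""
    totals = Counter()
    for (_p, c, _t), count in aptw.items():
        totals[c] += count
    return totals


def max_per_contributor(aptw: Counter) -> dict:
    """Derived metric: largest single APTW value of each contributor."""
    best = {}
    for (_p, c, _t), count in aptw.items():
        best[c] = max(best.get(c, 0), count)
    return best


if __name__ == "__main__":
    print(aptw[("gedit", "alice", "coding")])
    print(sum_per_contributor(aptw))
    print(max_per_contributor(aptw))
```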



Workload
                   • 50% of contributors made < 14 changes.
                   • 1 contributor made 185,874 changes.
                   [Figure: histogram of the number of authors (y-axis, 0 to 600) against log(AW) (x-axis, 0 to 12).]
                   Université de Mons, doctoral training report 2011, Mathieu Goeminne
The more things you do,
         the more things you can!
             • Correlations
              • Between the number of activity types and
                        the workload.
                   • Between the number of projects and the
                        workload.
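
The slides do not say which correlation coefficient was used. As one plausible choice for heavily skewed workload data, the sketch below computes a Spearman rank correlation in plain Python; the coefficient, the function names and the sample numbers are assumptions, not the authors' data or method.

```python
from math import sqrt


def ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation of two equally long, non-constant sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


if __name__ == "__main__":
    # Hypothetical contributors: number of activity types and workload.
    activity_types = [1, 1, 2, 3, 5]
    workload = [2, 5, 40, 300, 9000]
    print(round(spearman(activity_types, workload), 3))
```

Working on ranks keeps a single extreme contributor from dominating the coefficient, which matters when one contributor accounts for far more changes than the rest.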



Favorite activities of contributors
                     having ≥ 14 changes

                   • Most frequent
                        contributors
                        specialise in coding
                        and development
                        documentation.
                   • The other activities
                        are not subject to
                        specialisation.

Favorite activities of contributors
                      having < 14 changes

                   • Most occasional
                        contributors
                        specialise in
                        translation and
                        coding.
                   • The other activities
                        are not subject to
                        specialisation.

How strongly do
                        contributors focus?
                • Basic measure : RATW(c,t)
                  • % of the total workload of c dedicated to t.
                  • Use of Gini as inequality index:
                    • Value in [0, 1)
                      • 0 if the workload is equally distributed.
                      • Close to 1 if the workload is
                          concentrated in few activity types.
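
To make the two measures concrete, the sketch below computes RATW shares for one contributor and a Gini index over them. The slides name the Gini index but not the estimator; this example assumes the standard form (mean absolute difference divided by twice the mean) and assumes that activity types a contributor never performed are included with a workload of zero. With n activity types the maximum is 1 - 1/n, which is why the value stays below 1.

```python
def ratw(workload_by_type: dict) -> dict:
    """Share of a contributor's total workload dedicated to each type.

    Assumes the contributor has a non-zero total workload.
    """
    total = sum(workload_by_type.values())
    return {t: w / total for t, w in workload_by_type.items()}


def gini(values) -> float:
    """Gini index: mean absolute difference divided by twice the mean."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Equivalent closed form on sorted data:
    # sum over i of (2i - n - 1) * x_i, divided by n * sum(x).
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


if __name__ == "__main__":
    # Hypothetical contributors over four activity types.
    generalist = {"coding": 5, "translation": 5, "testing": 5, "other": 5}
    specialist = {"coding": 20, "translation": 0, "testing": 0, "other": 0}
    print(gini(ratw(generalist).values()))
    print(gini(ratw(specialist).values()))
```

The generalist scores 0 and the specialist scores 0.75, the maximum reachable with four activity types.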

Contributors’ focus (cont.)

         • Occasional contributors typically participate
                in a single activity type.
         • Frequent contributors typically participate
                in few activity types.




To summarise



What did we learn?
                   •    Most contributors are occasional and are involved
                        in only one activity type; few are very active;
                        frequent contributors are involved in few activity
                        types.
                   •    The more things you do, the more things you can.
                   •    Occasional contributors are translators, involved
                        in many projects. Frequent contributors are
                        coders and are involved in few projects.
                   •    And more in our paper.

How did we do it?
                   • Contributor matching: semi-automatic
                        and automatic methods.
                   • Activity identification based on file
                        path/name/extension rules.
                   • Advanced statistical analysis (among
                        others for the partial ordering of activity
                        types).
                   • Specialisation: aggregation with inequality
                        indices.
In the future
                   • Add a temporal aspect: how does
                        contributors’ behaviour change over time?
                   • Consider subsets of Gnome: sub-ecosystems
                        composed of projects that share stronger
                        properties than all projects do on average
                        (archived projects, projects grouped by theme, etc.).
                   • Combine both by studying migration trends.
                   •…
Thank you

       On the variation and specialisation of workload – A case study of the Gnome ecosystem
       community
       B. Vasilescu, A. Serebrenik, M. Goeminne, T. Mens
       Empirical Software Engineering
       Awaiting acceptance
