On the variation and specialisation of workload
The Gnome case
B. Vasilescu, A. Serebrenik, M. Goeminne, T. Mens




Tuesday 4 December 2012
Gnome as an ecosystem

                   • Ecosystem: set of interconnected projects
                   • ~ 1400 projects
                   • ~ 3000 contributors
                   • 15 years of activity

                          Variation and specialisation of workload   Benevol 2012
How does workload vary
                 across contributors?

                   • Who are they?
                   • What do they do?
                   • How do they do it?
              A partial answer by analysing the git repositories.

Who are the contributors?



Identity matching
                   • Contributors have an account per project
                        repository…
                   • … and sometimes more than one.
                   • No explicit links between the accounts,
                        need to guess them.
                   • Based on names and e-mails found in the
                        git repositories.

Identity matching (cont.)
                   •    (Semi-)automatic classification techniques.
                   •    Must take into account variations, abbreviations,
                        permutations, misspellings, nicknames, etc.
                   •    No perfect process: even a manually post-checked result can
                        contain false positives and false negatives.
                   •    Since Gnome as a whole has no strict identification policy,
                        some matches cannot be detected without extra context
                        information. Fictitious example:
                        •   Robbie Williams <robbiew@gnome.org>
                        •   Euphegenia Doubtfire <euphegenia@gmail.com>
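
A minimal sketch of what an automatic matching pass could look like, assuming accounts are given as (name, e-mail) pairs. The normalisation rules, the helper names and the sample accounts are illustrative assumptions, not the technique used in the study. Note that a purely textual approach like this one cannot link the two fictitious accounts above.

```python
import re
import unicodedata
from collections import defaultdict


def normalise_name(name):
    """Lower-case, strip accents and punctuation, and sort the tokens,
    so that 'Williams, Robbie' and 'robbie williams' compare equal."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    tokens = re.findall(r"[a-z]+", ascii_name.lower())
    return " ".join(sorted(tokens))


def email_prefix(email):
    """Part of the address before '@', lower-cased."""
    return email.split("@", 1)[0].lower()


def match_identities(accounts):
    """Group accounts sharing a normalised name or an e-mail prefix
    (hypothetical rule; real matching needs manual post-checking)."""
    parent = list(range(len(accounts)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    seen = {}
    for i, (name, email) in enumerate(accounts):
        for key in (("name", normalise_name(name)), ("mail", email_prefix(email))):
            if key in seen:
                parent[find(i)] = find(seen[key])
            else:
                seen[key] = i

    groups = defaultdict(list)
    for i, account in enumerate(accounts):
        groups[find(i)].append(account)
    return list(groups.values())


# Made-up accounts, for illustration only.
accounts = [
    ("Robbie Williams", "robbiew@gnome.org"),
    ("Williams, Robbie", "rw@example.org"),
    ("R. Williams", "robbiew@example.com"),
    ("Euphegenia Doubtfire", "euphegenia@gmail.com"),
]
groups = match_identities(accounts)
for group in groups:
    print(group)
```

Matching on the e-mail prefix alone can produce false positives (two different people called "john"), which is one reason a manual post-check remains necessary.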


What do the
                        contributors do?


13 activity types
                   • Identified by the path, name and extension
                        of the touched files.
                        • Coding : *.c, *.java, etc.
                        • Translation : *.po, etc.
                        • Testing : */test/*, etc.
                        • ...
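
A rule-based classifier of this kind could be sketched as follows. Only the three rules shown on the slide are encoded; the rule order (testing before translation before coding), the "unknown" fallback and the function name are assumptions made for illustration.

```python
from fnmatch import fnmatchcase

# (activity type, glob patterns), checked in order; the order is an assumption.
RULES = [
    ("testing", ["*/test/*"]),
    ("translation", ["*.po"]),
    ("coding", ["*.c", "*.java"]),
]


def classify(path):
    """Return the first activity type whose pattern matches the file path."""
    for activity, patterns in RULES:
        if any(fnmatchcase(path, pattern) for pattern in patterns):
            return activity
    return "unknown"


for path in ["src/main.c", "po/fr.po", "src/test/check.c", "README"]:
    print(path, classify(path))
```

Because a file such as src/test/check.c matches both the testing and the coding rule, some ordering between activity types is needed; the one used here is only a placeholder.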
How do the contributors
                contribute?


Metrics

                   • APTW(p,c,t) : Number of files touched by
                        the contributor c performing an activity of
                        type t in a project p.
                   • Derived metrics, by aggregation: max, sum,
                        etc.
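
As an illustration of the metric and its aggregations, the sketch below counts file touches per (project, contributor, activity type) and then sums them. The input records, the project and contributor names, and the two helper functions are hypothetical; counting every touch record (rather than distinct files) is also an assumption.

```python
from collections import Counter

# One record per touched file: (project, contributor, activity type). Made-up data.
touches = [
    ("project_a", "alice", "coding"),
    ("project_a", "alice", "coding"),
    ("project_a", "alice", "translation"),
    ("project_b", "alice", "coding"),
    ("project_b", "bob", "translation"),
]

# APTW(p, c, t): number of touches for each (project, contributor, type) triple.
aptw = Counter(touches)


def total_workload(aptw, contributor):
    """Sum of APTW over all projects and activity types for one contributor."""
    return sum(n for (p, c, t), n in aptw.items() if c == contributor)


def workload_by_activity(aptw, contributor):
    """Sum of APTW over all projects, kept separate per activity type."""
    totals = Counter()
    for (p, c, t), n in aptw.items():
        if c == contributor:
            totals[t] += n
    return dict(totals)


print(aptw[("project_a", "alice", "coding")])
print(total_workload(aptw, "alice"))
print(workload_by_activity(aptw, "alice"))
```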



Workload

                   • 50% of contributors made < 14 changes.
                   • 1 contributor made 185,874 changes.

[Figure: number of authors (y-axis, 0 to 600) plotted against log(AW) (x-axis, 0 to 12).]
Figure source: Université de Mons, Rapport de formation doctorale (doctoral training report) 2011, Mathieu Goeminne
The more things you do,
         the more things you can!
                   • Correlations
                        • Between the number of activity types and
                          the workload.
                        • Between the number of projects and the
                          workload.



Favorite activities of contributors
                     having ≥ 14 changes

                   • Most frequent
                        contributors
                        specialise in coding
                        and development
                        documentation.
                   • The other activities
                        are not subject to
                        specialisation.

Favorite activities of contributors
                      having < 14 changes

                   • Most occasional
                        contributors
                        specialise in
                        translation and
                        coding.
                   • The other activities
                        are not subject to
                        specialisation.

How strongly do contributors focus?
                • Basic measure: RATW(c,t)
                  • Percentage of the total workload of c dedicated to t.
                  • Use of Gini as inequality index:
                    • Value in [0, 1)
                      • 0 if the workload is equally distributed.
                      • Close to 1 if the workload is
                          concentrated in few activity types.
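
The two measures can be sketched as below. The Gini formula used here is the standard mean-absolute-difference form; whether the study uses exactly this variant is an assumption, and the function names are illustrative. With 13 activity types, a contributor whose whole workload falls in one type gets 12/13, which is why the value stays strictly below 1.

```python
def ratw(workload_by_type):
    """Share of a contributor's total workload spent on each activity type."""
    total = sum(workload_by_type.values())
    return {t: w / total for t, w in workload_by_type.items()}


def gini(values):
    """Gini index: sum of |x_i - x_j| over all pairs, divided by 2 * n * sum(x)."""
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    diff_sum = sum(abs(x - y) for x in values for y in values)
    return diff_sum / (2 * n * total)


# Made-up workloads over 13 activity types.
even = [5] * 13            # same workload in every type
focused = [10] + [0] * 12  # everything in a single type

print(ratw({"coding": 3, "translation": 1}))
print(gini(even))
print(gini(focused))
```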

Contributor’s focus (cont.)

         • Occasional contributors typically participate
                in a single activity type.
         • Frequent contributors typically participate
                in few activity types.




To summarise



What did we learn?
                   •    Most contributors are occasional and are involved
                        in only one activity type; few are very active;
                        frequent contributors are involved in few activity
                        types.
                   •    The more things you do, the more things you can.
                   •    Occasional contributors are translators, involved
                        in many projects. Frequent contributors are
                        coders and are involved in few projects.
                   •    More results are in our paper.

How did we do it?
                   • Contributor matching: semi-automatic
                        and automatic methods.
                   • Activity identification based on file
                        path/name/extension rules.
                   • Advanced statistical analysis (among
                        others for the partial ordering of activity
                        types).
                   • Specialisation: aggregation with inequality
                        indices.
In the future
                   • Add a temporal aspect: how does
                        contributors' behaviour change over time?
                   • Consider subsets of Gnome: sub-ecosystems
                        composed of projects that have more in
                        common than Gnome projects do on average
                        (archived projects, projects grouped by theme, etc.).
                   • Combine both by studying migration trends.
                   •…
Thank you

       On the variation and specialisation of workload – A case study of the Gnome ecosystem community
       B. Vasilescu, A. Serebrenik, M. Goeminne, T. Mens
       Empirical Software Engineering
       Awaiting acceptance

