This document discusses ecological studies of open source software ecosystems. It describes research on the GNOME and R ecosystems. The research aims to determine factors that influence the success of open source projects within an ecosystem by drawing analogies to biological ecology. Studies of the GNOME ecosystem examined contributor migrations between projects over time and project clustering. Studies of the R ecosystem analyzed its package dependency network.
130918 maelick claes - ecological studies of open source software ecosystems
1. Ecological Studies of Open Source Software Ecosystems
Empirical case studies with Gnome and R
Maëlick Claes, Tom Mens
Software Engineering Lab, Computer Science Department
Faculty of Science, University of Mons
18th September 2013
2. Research context
Ecosystems
Gnome studies
R studies
1 Research context
2 Ecosystems
3 Gnome studies
4 R studies
Maëlick Claes (UMONS)
Ecological Studies of Open Source Software Ecosystems
2013/09/18
2 / 23
3. ECOS (bit.ly/ecos-project)
Interdisciplinary project called “Ecological Studies of Open Source
Software Ecosystems” at the University of Mons (Belgium)
Tom Mens - Software Engineering Lab
Philippe Grosjean - Numerical Ecology of Aquatic Systems Lab (ECONUM)
4. Long-term goals
How far can we drive the analogy between natural and software
ecosystems?
Determine the main factors that drive the success or failure of OSS
projects within their ecosystem
Investigate new techniques and mechanisms to predict and/or
improve survivability of OSS projects
Inspired by research in biological ecology
Use these insights to help
the developer community to improve upon their practices
companies and users to compare and adopt OSS projects
5. Outline (section 2: Ecosystems)
6. Biological Ecosystem
Ecosystem
Physical and biological components of an environment considered in relation to each other as a unit
Combines all living organisms (plants, animals, micro-organisms) and physical components (light, water, soil, rocks, minerals)
Example: coral reef
High biodiversity: polyps, sea anemones, fish, mollusks, sponges, algae
Ecology
Scientific study of the interactions that determine the distribution and abundance of organisms
7. Software Ecosystem
Business-oriented software ecosystem
“a set of actors functioning as a unit and interacting with a shared market
for software and services, together with the relationships among them.”
(Jansen et al. 2009)
Examples
“App Stores” (Android, iOS)
Eclipse platform & plugins
8. Software Ecosystem
Development-centric view
“a collection of software products that have some given degree of symbiotic relationships.” (Messerschmitt & Szyperski 2003)
“a collection of software projects that are developed and evolve
together in the same environment.” (Lungu 2008)
Examples
GNOME & KDE
Linux distributions (Debian, Ubuntu)
R’s CRAN and others (CPAN, CTAN, . . . )
9. Biological and Software Ecosystems in Summary
11. Outline (section 3: Gnome studies)
12. Reticulate evolution
Darwinian evolution cannot always explain the evolution of some species: the causes are not always related to natural selection
The evolution “tree” of life is in fact an acyclic graph
Reticulation: hybrid speciation, horizontal gene transfer
[Image: Scleractinian coral polyps]
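The difference between a tree of life and a reticulate one can be made concrete as a data-structure check. The minimal sketch below assumes a phylogeny stored as a mapping from each taxon to its parent taxa; all taxon names are hypothetical placeholders. A hybrid species has two parents, so the graph stays acyclic but stops being a tree.

```python
from collections import deque

# Hypothetical phylogeny: each taxon maps to its parent taxa.
# "hybrid" has two parents, which is what hybrid speciation (reticulation) adds.
PARENTS = {
    "ancestor": [],
    "species_a": ["ancestor"],
    "species_b": ["ancestor"],
    "hybrid": ["species_a", "species_b"],
}


def is_acyclic(parents):
    """Kahn's algorithm: the graph is acyclic iff every node can be ordered."""
    children = {node: [] for node in parents}
    indegree = {node: len(ps) for node, ps in parents.items()}
    for node, ps in parents.items():
        for p in ps:
            children[p].append(node)
    queue = deque(n for n, d in indegree.items() if d == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    return visited == len(parents)


def is_tree(parents):
    """Rooted tree: acyclic, a single root, and at most one parent per node."""
    roots = [n for n, ps in parents.items() if not ps]
    single_parent = all(len(ps) <= 1 for ps in parents.values())
    return is_acyclic(parents) and len(roots) == 1 and single_parent


print(is_acyclic(PARENTS))  # True
print(is_tree(PARENTS))     # False: "hybrid" has two parents
```

Dropping the `hybrid` entry turns the same structure back into an ordinary tree.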
14. Migrations
Initial motivation
Horizontal gene transfer between projects?
Genotype of a project: contributors
Phenotype of a project: code
Can we make a parallel between code duplication and contributors?
Questions
Do joiners come from other GNOME projects or from outside the
ecosystem?
Do leavers tend to stay within other GNOME projects?
Do migration patterns change over time?
Do some projects attract or lose more contributors than others?
15. Migrations
GNOME git repositories
16 years of history (1997 to 2012)
1,418 projects (stored in git repositories)
1,315,997 commits
11,094 identities, 5,923 distinct persons after identity merging
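The slides do not say how the 11,094 identities were merged into 5,923 persons. One common heuristic, sketched here purely as an assumption and not as the authors' method, is to link (name, e-mail) identities that share an e-mail address or an accent- and case-normalized name, then take connected components with a union-find structure. All names and addresses below are hypothetical.

```python
import unicodedata


def normalize_name(name):
    """Lower-case, strip accents and collapse whitespace ('Zoë  Dupont' -> 'zoe dupont')."""
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(ascii_only.lower().split())


class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, i):
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def union(self, i, j):
        ri, rj = self.find(i), self.find(j)
        if ri != rj:
            self.parent[rj] = ri


def merge_identities(identities):
    """Group (name, email) identities sharing a normalized name or an e-mail address."""
    uf = UnionFind(len(identities))
    first_seen = {}  # matching key -> index of the first identity carrying it
    for idx, (name, email) in enumerate(identities):
        for key in (("name", normalize_name(name)), ("email", email.lower())):
            if key in first_seen:
                uf.union(first_seen[key], idx)
            else:
                first_seen[key] = idx
    groups = {}
    for idx, identity in enumerate(identities):
        groups.setdefault(uf.find(idx), []).append(identity)
    return list(groups.values())


ids = [
    ("Zoë Dupont", "zoe@example.org"),
    ("zoe dupont", "zdupont@example.com"),
    ("Z. Dupont", "zdupont@example.com"),
    ("Alex Martin", "alex@example.org"),
]
persons = merge_identities(ids)
print(len(ids), "identities ->", len(persons), "persons")  # 4 identities -> 2 persons
```

Matching on names alone can over-merge distinct people with common names, so a real pipeline would likely need stricter rules or manual review.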
Metrics for 6-month periods
Local joiners
Global joiners
Local leavers
Global leavers
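The four metrics are listed without definitions. The sketch below assumes one plausible reading, labelled here as hypothetical: for project p and 6-month period t, a *global joiner* makes their first GNOME commit ever in t (and to p), while a *local joiner* makes their first commit to p in t after having committed to other GNOME projects earlier. Symmetrically, a *global leaver* makes their last GNOME commit in t, while a *local leaver* stops committing to p in t but keeps committing to other GNOME projects later. Commit records and project names are toy data.

```python
from collections import defaultdict
from datetime import date


def period_of(day, start_year=1997):
    """Index of the 6-month period (January-June or July-December) containing `day`."""
    return (day.year - start_year) * 2 + (0 if day.month <= 6 else 1)


def migration_metrics(commits, start_year=1997):
    """commits: iterable of (person, project, date) tuples.
    Returns {(project, period): {metric name: set of persons}} under the
    hypothetical joiner/leaver definitions stated in the lead-in."""
    first_proj, last_proj = {}, {}  # (person, project) -> first/last active period
    first_eco, last_eco = {}, {}    # person -> first/last active period in the ecosystem
    for person, project, day in commits:
        t = period_of(day, start_year)
        key = (person, project)
        first_proj[key] = min(first_proj.get(key, t), t)
        last_proj[key] = max(last_proj.get(key, t), t)
        first_eco[person] = min(first_eco.get(person, t), t)
        last_eco[person] = max(last_eco.get(person, t), t)

    names = ("local_joiners", "global_joiners", "local_leavers", "global_leavers")
    metrics = defaultdict(lambda: {name: set() for name in names})
    for (person, project), t in first_proj.items():
        # Global joiner: the first commit to this project is also the first in the ecosystem.
        kind = "global_joiners" if first_eco[person] == t else "local_joiners"
        metrics[(project, t)][kind].add(person)
    for (person, project), t in last_proj.items():
        # Global leaver: no commits to any project after this period.
        kind = "global_leavers" if last_eco[person] == t else "local_leavers"
        metrics[(project, t)][kind].add(person)
    return dict(metrics)


commits = [
    ("ann", "proj_a", date(2001, 3, 1)),
    ("ann", "proj_b", date(2002, 9, 1)),
    ("bob", "proj_b", date(2002, 8, 15)),
    ("bob", "proj_b", date(2003, 2, 1)),
]
m = migration_metrics(commits)
print(m[("proj_b", 11)]["local_joiners"])   # {'ann'}
print(m[("proj_b", 11)]["global_joiners"])  # {'bob'}
```

Under these definitions everyone active in the last observed period would be counted as a global leaver, so that final period would need to be excluded or treated as censored.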
16. Migrations
Local vs. global trend
[Figure: difference between global and local joiners over time (1997 to 2012) for gimp, evolution and gtk+]
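The slide plots, per project, the difference between global and local joiners over time. Assuming per-period counts are already available, a minimal sketch computes that difference series and summarizes its trend with an ordinary least-squares slope; the slope is an illustrative choice of ours, not a statistic reported on the slides, and the counts are toy numbers rather than GNOME data.

```python
def difference_series(global_counts, local_counts):
    """Per-period difference global - local; a period missing on one side counts as 0."""
    periods = sorted(set(global_counts) | set(local_counts))
    return [(t, global_counts.get(t, 0) - local_counts.get(t, 0)) for t in periods]


def ols_slope(points):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


# Toy counts per 6-month period index (illustrative only).
global_joiners = {0: 6, 1: 5, 2: 3, 3: 2}
local_joiners = {0: 1, 1: 2, 2: 3, 3: 5}
diff = difference_series(global_joiners, local_joiners)
print(diff)             # [(0, 5), (1, 3), (2, 0), (3, -3)]
print(ols_slope(diff))  # -2.7
```

A negative slope here would mean that, over time, more newcomers to the project arrive from elsewhere in the ecosystem than from outside it, under whichever joiner definitions produced the counts.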
17. Migrations
Local vs. global trend
[Figure: difference between global and local leavers over time (1997 to 2012) for gimp, evolution and gtk+]
19. Migrations
Collaboration factor
CF(p) = collaboration factor of project p = percentage of coders in p who have also contributed to other GNOME projects
[Figure: difference between global and local joiners over time (1997 to 2012) for gimp, evolution and gtk+, annotated with each project's collaboration factor: GIMP = 65.3%, GTK+ = 94.8%, Evolution = 85.1%]
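The collaboration factor is a simple ratio, so a short sketch shows exactly what is counted. It assumes each project's coders are known as a set of already-merged persons; the project and person names are hypothetical.

```python
def collaboration_factor(project, contributors):
    """CF(p): percentage of coders of `project` who also contributed to at least one
    other project. `contributors` maps project name -> set of persons."""
    coders = contributors[project]
    if not coders:
        return 0.0
    # Everyone active in at least one other project of the ecosystem.
    others = set().union(*(c for p, c in contributors.items() if p != project))
    return 100.0 * len(coders & others) / len(coders)


contributors = {
    "project_a": {"ann", "bob", "carl", "dana"},
    "project_b": {"bob", "erin"},
    "project_c": {"carl", "erin"},
}
print(collaboration_factor("project_a", contributors))  # 50.0
print(collaboration_factor("project_b", contributors))  # 100.0
```

In this toy example, two of the four coders of `project_a` (bob and carl) also work elsewhere, giving 50%. A project whose coders are almost all shared, like GTK+ at 94.8%, sits at the high end of this scale.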
21. Project clustering
Hierarchical clustering
Distance between two projects represents how similar their contributor communities are and how intensively those members contribute
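The slides do not give the exact distance or linkage used. A minimal sketch, assuming each project is described by per-contributor commit counts, uses a weighted Jaccard distance (which reflects both shared members and how much they commit) and average-linkage agglomerative clustering. Both choices are our assumptions, and the data is toy data.

```python
from itertools import combinations


def weighted_jaccard_distance(a, b):
    """1 - (sum of minima / sum of maxima) of per-contributor commit counts."""
    people = set(a) | set(b)
    shared = sum(min(a.get(p, 0), b.get(p, 0)) for p in people)
    total = sum(max(a.get(p, 0), b.get(p, 0)) for p in people)
    return 1.0 - shared / total if total else 0.0


def average_linkage_distance(c1, c2, dist):
    """Mean pairwise distance between the members of two clusters."""
    pairs = [dist[frozenset((x, y))] for x in c1 for y in c2]
    return sum(pairs) / len(pairs)


def hierarchical_clustering(projects):
    """Agglomerative clustering with average linkage.
    `projects` maps project name -> {contributor: commit count}.
    Returns the merges in order as (cluster, cluster, distance) tuples."""
    dist = {frozenset((x, y)): weighted_jaccard_distance(projects[x], projects[y])
            for x, y in combinations(projects, 2)}
    clusters = [(name,) for name in projects]
    merges = []
    while len(clusters) > 1:
        # Merge the two closest clusters.
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: average_linkage_distance(pair[0], pair[1], dist))
        merges.append((c1, c2, average_linkage_distance(c1, c2, dist)))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
    return merges


projects = {
    "proj_a": {"ann": 40, "bob": 10},
    "proj_b": {"ann": 30, "bob": 15},
    "proj_c": {"carl": 50, "dana": 5},
    "proj_d": {"carl": 20, "dana": 10},
}
merges = hierarchical_clustering(projects)
for c1, c2, d in merges:
    print(c1, c2, round(d, 3))
```

The two pairs of projects sharing contributors merge first, and the final merge happens at distance 1.0 because the two groups have no contributors in common. The list of merges is the information a dendrogram draws.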
22. Project clustering
[Figure: dendrogram from the hierarchical clustering of GNOME projects (leaves include gupnp, gssdp, pygtk, pygobject, gnome.python, libglade, ...); one cluster is labelled “Python clustering”]
24. Project clustering
Language clustering
[Figure: scatter plot of lines of code (LOC, log scale) against number of files (log scale) per programming language: C, C++, C/C++ Header, C#, Python, Perl, JS, PHP, Visual Basic, Lisp, Java, IDL, yacc, Ruby, Objective C, Tcl/Tk, Assembly, lex, ASP.Net, Objective C++, Haskell]
25. Outline (section 4: R studies)
26. R Ecosystem
Open source statistical analysis environment based on the S language
Widely used by scientists from outside computer science
Modules, libraries and software are installed through a package system
Comprehensive R Archive Network (CRAN):
∼4500 maintained packages
∼10 years of history
Very strict policy: unmaintained or buggy packages are archived
⇒ problems arise: broken dependencies, reproducibility of scientific studies
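Why archiving causes dependency breaks can be shown with a transitive reverse-dependency walk. This minimal sketch assumes only hard dependencies count (as declared in a package's `Depends` and `Imports` fields) and that any package depending, directly or indirectly, on an archived package becomes uninstallable from CRAN. Package names are hypothetical.

```python
from collections import deque


def broken_by_archiving(depends, archived):
    """Packages left uninstallable once the `archived` packages leave CRAN.
    `depends` maps package -> set of packages it depends on.
    A package breaks if any of its dependencies is archived or itself broken."""
    reverse = {}  # package -> packages that depend on it
    for pkg, deps in depends.items():
        for dep in deps:
            reverse.setdefault(dep, set()).add(pkg)
    broken = set()
    queue = deque(archived)
    while queue:  # breadth-first walk over reverse dependencies
        pkg = queue.popleft()
        for dependent in reverse.get(pkg, ()):
            if dependent not in broken and dependent not in archived:
                broken.add(dependent)
                queue.append(dependent)
    return broken


depends = {
    "pkgA": set(),
    "pkgB": {"pkgA"},
    "pkgC": {"pkgB"},
    "pkgD": {"pkgA", "pkgE"},
    "pkgE": set(),
    "pkgF": {"pkgE"},
}
print(sorted(broken_by_archiving(depends, {"pkgA"})))  # ['pkgB', 'pkgC', 'pkgD']
```

Archiving a single package (`pkgA`) breaks `pkgC` even though `pkgC` never depends on it directly, which is the kind of cascade the strict archiving policy can trigger.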