Of Bugs and Men
(and Plugins too)

Michel Wermelinger, Yijun Yu
The Open University, UK

Markus Strohmaier
Technical University Graz, Austria

Plugins

Working Conf. on
Mining Softw. Repositories 2008
Int’l Conf. on Softw. Maintenance 2008

Motivation & Method

What is the validity, generality and usefulness
of design principles?
Study long-term evolution
Study architectural evolution
Study complex systems
Case study: Eclipse
modern CBS with reusable, extensible
components

Eclipse

Static dependency: X depends on Y
Dynamic dependency:
X uses extension points provided by Y
Self-cycles possible
We analysed whole Eclipse SDK (JDT, PDE, etc)

Eclipse releases

Various types of releases
Major (e.g. 3.1) and maintenance releases (e.g. 3.1.1)
Milestones (3.2M1) and Release candidates (3.2RC1)
Maintenance of current major release in parallel with milestones
and release candidates of next one
We analysed
20 major and maintenance releases over 6 years (1.0 to 3.3.1.1)
27 milestone and release candidates over 2 years (3.1 to 3.3)
grouped in 2 sequences: 1.0 – 3.1 and 3.1 – 3.3.1.1

Data processing (concrete)

[Figure: data-processing pipeline, applied over time to each release from 1.0 to 3.3.1.1]
Product repository: plugin.xml + manifest.mf → architecture extractors (AWK, XSLT) → RSF (plugins, dependencies)
Process repository: bug.xml reports (1-100 to 219901-220000) → bug info extractors (XSLT) → RSF (status, priority, severity)
RSF → CrocoPat → graphs, histograms, spectrum visualisations
Visualisation tools: graphviz, guess, ccvisu, OpenOffice, Trace2PNG

Some Research Questions

Is there continuous growth (Lehman’s 6th
law)?
Is there any pattern (e.g. superlinear growth)?
Does complexity increase (Lehman’s 2nd
law)?
Is there any effort to reduce it?
Does coupling decrease?
Does cohesion increase?

Modules

A simple structural model
Module = directed graph
Elements = internal or external
Arcs = internal or external relations
External elements and arcs show context
For Eclipse SDK module
elements = plugins or external components
arcs = static and/or dynamic dependencies

Module measures

Size = # internal elements
NIP = number of internal plugins
Complexity = # internal arcs
NISD/NIDD = number of internal static/dynamic
dependencies
Cohesion = complexity / size
Coupling = # external arcs
NESD = number of external static dependencies (NEDD is always zero)
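
The measures above can be sketched in a few lines. This is only an illustrative sketch, not the scripts used in the study: the function name `module_measures`, the toy plugin names, and the choice to treat any arc with an endpoint outside the module as external are assumptions.

```python
def module_measures(internal, arcs):
    """Compute size, complexity, cohesion and coupling of a module.

    internal: set of internal element names (e.g. plugins)
    arcs: set of (source, target) dependency pairs
    Assumption: an arc is internal only if both endpoints are internal.
    """
    internal_arcs = {(s, t) for s, t in arcs if s in internal and t in internal}
    external_arcs = set(arcs) - internal_arcs
    size = len(internal)
    complexity = len(internal_arcs)
    return {
        "size": size,
        "complexity": complexity,
        # cohesion = complexity / size, undefined for an empty module
        "cohesion": complexity / size if size else None,
        "coupling": len(external_arcs),
    }


# Toy module: three plugins, one self-cycle, one external dependency.
plugins = {"a", "b", "c"}
dependencies = {("a", "b"), ("b", "c"), ("c", "c"), ("a", "external.lib")}
toy = module_measures(plugins, dependencies)
```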

Size Evolution (1)

Number of plugins kept, added, deleted w.r.t. previous release
Number kept since initial release → stable architectural core
Segmented growth
Overall 4- to 5-fold growth, but not superlinear
Many changes in 3.0; few deletions overall
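
A minimal sketch of how kept, added and deleted plugins, and the stable core, could be derived from per-release plugin sets. The function names and the toy release data are hypothetical; only the set definitions follow the slide.

```python
def release_delta(previous, current):
    """Plugins kept, added and deleted w.r.t. the previous release."""
    return {
        "kept": current & previous,
        "added": current - previous,
        "deleted": previous - current,
    }


def stable_core(releases):
    """Plugins present in every release since the initial one."""
    core = set(releases[0])
    for plugins in releases[1:]:
        core &= plugins
    return core


# Toy history of three releases (hypothetical plugin names).
history = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "d", "e", "f"}]
delta = release_delta(history[0], history[1])
core = stable_core(history)
```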

Size Evolution (2)

Long equilibrium and short punctuation periods
Equilibrium: changes accommodated within current architecture
Punctuation: changes require architectural revisions
mostly in milestones
some in release candidates
hardly in maintenance

Architectural core

[Figure: layered dependency graph of the architectural core; plugins by layer, top to bottom]
jdt.ui
jdt.launching
jdt.doc.isv, jdt.doc.user, pde.doc.user, platform.doc.isv, platform.doc.user, help.ui, pde.runtime, ant.ui, search, compare, pde.core, debug.ui, jdt.debug
help, pde, ui, ant.core, jdt.core, debug.core
core.runtime, swt, core.resources

core with static and dynamic dependencies
self-cycles point to reuse of extension points
layered architecture
core is >40% of release 1.0 and ca. 10% of 3.3.1.1

Complexity Evolution

Charts show NISD (left) and NIDD (right)
Release 3.1 is major restructuring
Static dependencies decreased by 19%
Plugins increased by 57%
More deletions, i.e. effort to reduce complexity

Cohesion evolution (1)

Size (left) and complexity (right) grow in step
Two exceptions
Release 3.0 maintains size
Release 3.1 reduces complexity

Cohesion evolution (2)

Result: cohesion slightly decreases over time
Except for major increase during 3.0.* releases
Independently of static, dynamic, or both dependencies
Low cohesion: <3 (incoming or outgoing) dependencies per plugin
explicit effort to keep architecture loosely cohesive?

Coupling Evolution

Charts show NESD
Refactoring in 3.0:
All existing external dependencies removed via new internal
proxies
External component org.apache.xerces was removed
Overall, coupling is small compared to size and
complexity

Acyclic Dependency Principle

Dependency graph should be acyclic [Martin 96 and others]
decreases change propagation
eases release management and work allocation
Measured cycle length over joint dependency graph
Graph shows segmented growth of harmless self-cycles (length 1)
Single cycle with length > 1 was broken apart in release 3.0
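
One way to separate harmless self-cycles from longer cycles is sketched below. It is an assumption-laden illustration: it does not compute exact cycle lengths as in the study, but flags groups of two or more elements that lie on a common cycle (strongly connected components), using plain reachability. Function names and the toy graph are hypothetical.

```python
def self_cycles(arcs):
    """Elements with an arc to themselves (cycles of length 1)."""
    return {s for s, t in arcs if s == t}


def cyclic_groups(arcs):
    """Groups of 2+ elements that can all reach each other."""
    successors = {}
    for s, t in arcs:
        successors.setdefault(s, set()).add(t)
        successors.setdefault(t, set())

    def reachable(start):
        # Elements reachable from start via one or more arcs.
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for nxt in successors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    reach = {node: reachable(node) for node in successors}
    groups = set()
    for node in successors:
        group = frozenset(m for m in reach[node] if node in reach[m])
        if len(group) > 1:
            groups.add(group)
    return groups


toy_arcs = {("a", "a"), ("a", "b"), ("b", "c"), ("c", "b"), ("c", "d")}
loops = self_cycles(toy_arcs)
groups = cyclic_groups(toy_arcs)
```

A dependency graph satisfies the principle (apart from self-cycles) when `cyclic_groups` returns an empty set.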

Stable Dependency Principle

dependencies should be in direction of stability
[Martin 97]
changes propagate opposite to dependencies
if A depends on B, A can’t be harder to change than B
instability of element = fanout / (fanin + fanout)
irresponsible: fanin = 0, instability = 1, may change
independent: fanout = 0, instability = 0, no reason to
change
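
The instability formula can be sketched as follows. Two points are assumptions, not taken from the slides: self-cycles are ignored when counting fanin and fanout, and elements with no dependencies at all get `None` because the ratio is undefined for them.

```python
def instability(elements, arcs):
    """instability = fanout / (fanin + fanout) for each element.

    elements: set of element names; arcs: set of (source, target) pairs
    whose endpoints are all in elements.
    Assumption: self-cycles are not counted.
    """
    fanin = {e: 0 for e in elements}
    fanout = {e: 0 for e in elements}
    for source, target in arcs:
        if source == target:
            continue
        fanout[source] += 1
        fanin[target] += 1
    result = {}
    for e in elements:
        total = fanin[e] + fanout[e]
        result[e] = fanout[e] / total if total else None  # undefined if isolated
    return result


values = instability({"a", "b", "c", "d"}, {("a", "b"), ("b", "c"), ("a", "c")})
```

In this toy graph `a` is irresponsible (nothing depends on it) and `c` is independent (it depends on nothing).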

SDP Evolution

Charts show number of SDP violations
Absolute (left) and relative (right)
static, dynamic and both dependencies
Numbers kept low, with ratio tending to decrease
1-5% violations for static dependencies, 9-17% for dynamic
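
A hedged sketch of counting SDP violations. The slides do not spell out the exact test, so this assumes the usual reading of the principle: a dependency from A to B is a violation when A is more stable (lower instability) than B. Arcs with an undefined instability at either end, and self-cycles, are skipped; that is also an assumption.

```python
def sdp_violations(arcs, instability):
    """Return (violating arcs, ratio of violations to checked arcs).

    arcs: set of (source, target) pairs
    instability: dict mapping element to a value in [0, 1], or None if undefined
    Assumption: (A, B) violates SDP when instability[A] < instability[B].
    """
    checked = [
        (s, t)
        for s, t in arcs
        if s != t and instability.get(s) is not None and instability.get(t) is not None
    ]
    violations = {(s, t) for s, t in checked if instability[s] < instability[t]}
    ratio = len(violations) / len(checked) if checked else 0.0
    return violations, ratio


toy_instability = {"a": 1.0, "b": 0.5, "c": 0.0}
toy_dependencies = {("a", "b"), ("b", "c"), ("c", "b")}
violating, share = sdp_violations(toy_dependencies, toy_instability)
```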

Changeability measures

slight adaptation of [van Belle 04]
likelihood of changing an element
# of actual changes / max possible #
impact of an element’s changes
avg # of elements changed with it
acuteness = impact / likelihood
high for interfaces, low for method bodies
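
The three measures could be computed from a co-change history roughly as below. This is a sketch under stated assumptions, not the adapted [van Belle 04] definitions themselves: the history is a list of sets of elements changed per release transition, the maximum possible number of changes is the number of transitions, and "changed with it" excludes the element itself.

```python
def changeability(element, change_sets):
    """Likelihood, impact and acuteness of an element.

    change_sets: list of sets, one per release transition, holding the
    elements changed in that transition (hypothetical input format).
    """
    own = [changed for changed in change_sets if element in changed]
    likelihood = len(own) / len(change_sets) if change_sets else 0.0
    if not own:
        # Never changed: impact and acuteness are undefined.
        return {"likelihood": likelihood, "impact": None, "acuteness": None}
    # Average number of other elements changed together with this one.
    impact = sum(len(changed) - 1 for changed in own) / len(own)
    return {
        "likelihood": likelihood,
        "impact": impact,
        "acuteness": impact / likelihood,
    }


history = [{"a", "b"}, {"a"}, {"b", "c", "d"}, set()]
measures_a = changeability("a", history)
measures_c = changeability("c", history)
```

Element `c` changes rarely but together with many others, so its acuteness is high, which matches the intuition given for interfaces.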

Changes and Stability (1)

changes and stability are related
responsible elements: high change impact
independent elements: low change likelihood
stable elements: high change acuteness
van Belle: correlational linkage
implicit, from co-change observation
takes change propagation closure into account
Martin: causal linkage
must be given explicitly
only looks at immediate neighbours


measured fanin/fanout of the 69 plugins in release 2.0
measured impact/likelihood of same plugins over next 45
releases
normalised measures, ordered plugins by fanin and fanout
lower fanin ⇒ less responsible ⇒ lower impact: not quite so
lower fanout ⇒ less dependent ⇒ lower likelihood: somewhat


measured instability when defined (52 plugins in 2.0)
All but one irresponsible and independent plugins remained so over time
higher instability ⇒ lower acuteness: mixed
some trend but many exceptions
likelihood vs independence is better than impact vs responsibility
static causal linkage can’t predict future correlational linkage
former only accounts for internal drivers, latter includes external drivers

Conclusions (1)

Successful evolution of Eclipse due to…?
systematic architectural change process
segmented growth of size and complexity
cohesion kept low; cycles removed
SDP violations and coupling reduced
significant stable layered architectural core
Some consistency between causal and
correlational changeability measures

Conclusions (2)

many design principles/guidelines proposed, but…
no empirical evidence of usefulness for maintenance
selected representative case study
large, complex, successful, component-based system
accurate architectural information + enough evolution history
generic and lightweight approach
no reverse engineering, no static code analysis
modules and changeability measures
flexible scripting tool manipulating text files with relational data
potential practical implications of findings
confirmed some laws and principles; observed some patterns
investigated static and historic changeability measures

Bugs and Men

New Ideas and Emerging Results track
of Int’l Conf. on Software Eng. 2009

Motivation

Software engineering is a socio-technical activity
Global and open source software development led to
increased interest in and relevance of social aspects
Need for representing socio-technical relations
Bipartite graphs of software artefacts and people
Ad-hoc arc semantics, depending on relation
Ad-hoc flat layout, often hard to read
Relevant relations lost among many nodes and arcs
Sought improvements:
More compact, intuitive, and explicit representation
Distinguish ‘hierarchical’ importance of artefacts, people
and their relations.

General Approach

Obtain a bipartite socio-technical network
Compute socio-technical concept lattice
Apply formal concept analysis (FCA) theory
Use free tool ConExp (Concept Explorer)
Concept: clusters all artefacts associated to same
people
Hierarchy: partial ordering of clusters
Study different and evolving socio-technical
relations
Repeat for various relations and system releases

Case study

Requirements:
Should have non-trivial social and technical
structure
Should not have fluid social structure
Should provide different data sources (not just
code)
Eclipse
Has IBM lead and Bugzilla repository

The socio-technical network (1)

Build PBC network
P nodes: 16,025 people
B nodes: 101,966 Eclipse SDK bug reports
C nodes: 16 Eclipse SDK components
p-b arc: p reported/assigned to/discussed b
b-c arc: b is reported for c
Repeat for various releases and roles

The socio-technical network (2)

Build the PC network
Folding of PBC, i.e. p-c arc with weight b
person p is associated to b reports for
component c
Number of paths from p to c
Build the PC(k) network
Remove all arcs with weight < k
Remove all weight information
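
The folding and thresholding steps can be sketched like this. The arc lists, names and function names are hypothetical; the weight of a p-c arc is taken to be the number of distinct bug reports linking person p to component c, as described above.

```python
from collections import Counter


def fold_pbc(pb_arcs, bc_arcs):
    """Fold the PBC network into weighted PC arcs.

    pb_arcs: iterable of (person, bug) pairs
    bc_arcs: iterable of (bug, component) pairs
    Returns a Counter mapping (person, component) to the number of paths.
    """
    components_of = {}
    for bug, component in set(bc_arcs):
        components_of.setdefault(bug, set()).add(component)
    weights = Counter()
    for person, bug in set(pb_arcs):
        for component in components_of.get(bug, ()):
            weights[(person, component)] += 1
    return weights


def pc_k(weights, k):
    """PC(k): drop arcs with weight < k, then drop the weights."""
    return {arc for arc, weight in weights.items() if weight >= k}


pb = [("ann", 1), ("ann", 2), ("bob", 2), ("bob", 3)]
bc = [(1, "ui"), (2, "ui"), (3, "core")]
pc = fold_pbc(pb, bc)
pc_2 = pc_k(pc, 2)
```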

Formal Concept Analysis

Given objects O, attributes A and a relation R ⊆ O × A
e.g. O = components, A = assignees
Concept c = (o ⊆ O, a ⊆ A)
each object in o has all attributes a
o is the extent and a is the intent of the concept
Hierarchy: (o, a) ≤ (o’, a’) if o ⊆ o’ (or a’ ⊆ a)
From top to bottom: extent decreases, intent increases
Socio-technical concept lattice
Usually, people at level n (bottom=0) associated to n components
‘specialists’ at lower, ‘generalists’ at upper levels
Each node includes all its ancestors’ people and all its
descendants’ components
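
For small relations, the concepts can be enumerated directly. The sketch below is not how ConExp works internally; it is a simple textbook-style closure, assuming the relation is given as a dict from each object (component) to its set of attributes (people). It collects every intersection of attribute sets as an intent and derives the matching extent. All names are hypothetical.

```python
def concepts(relation):
    """Enumerate all formal concepts (extent, intent) of a relation.

    relation: dict mapping each object to its set of attributes.
    Intents are the full attribute set plus all intersections of object rows.
    """
    all_attributes = frozenset().union(*relation.values()) if relation else frozenset()
    intents = {all_attributes}
    for attributes in relation.values():
        intents |= {intent & attributes for intent in intents}
    result = set()
    for intent in intents:
        extent = frozenset(o for o, attributes in relation.items() if intent <= attributes)
        result.add((extent, intent))
    return result


def is_below(lower, upper):
    """Hierarchy: (o, a) <= (o', a') if o is a subset of o'."""
    return lower[0] <= upper[0]


assignees = {"ui": {"ann", "bob"}, "core": {"bob"}, "swt": {"cy"}}
lattice = concepts(assignees)
```

In this toy relation `bob` is associated to two components and appears in the concept with extent {ui, core}, one level above the concept for `ann`, who is only associated to `ui`.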

Release 1.0, assignees, k=10

USA coordinating
2 Canadian teams?
only 4 ‘generalists’
(2 components each)

the French team

only 1 developer associated:
what if they leave project?

the Swiss team
most developers associated:
is this largest or most
complex component?

Release 3.0, assignees, k=100

only 2 ‘generalists’ (3 components each)
Common developers: highly dependent components?

Used higher k because bug reports accumulate over time
Geographical and workload distribution like release 1.0

Release 3.0, discussants, k=100

Developers discuss more
components than they
are assigned to: due to
dependencies?

Developers don’t discuss all reports they are assigned to

Conclusions

Novel application of Formal Concept Analysis
Clustering and ordering of socio-technical relations
General tool-supported approach
Some advantages over bi-partite graphs
More scalable: not one node per person and artefact
More explicit: related people & artefacts in same node
More intuitive: uniform vertical layout & arc semantics
Helps spot expertise and potential problems
Generalist and specialist people
Artefacts with too many or too few people associated
Undesired or absent communication/coordination

Concluding conclusions

Software engineering is an inherently socio-technical
endeavour
Availability of FLOSS projects allows the study of
historical heterogeneous data
Used process and artefact data to present different
views on same case study
Evolution of architecture
Hierarchy of maintainers
Impact of dependencies
Opportunities for many studies, mining and
visualisation techniques that can help academics,
developers and managers
