1. Social Interactions around Cross-System Bug Fixings: The Case of FreeBSD and OpenBSD
Gerardo Canfora, Luigi Cerulo, Marta Cimitile, Massimiliano Di Penta
dipenta@unisannio.it
2. Context
• Source code is often reused across different systems
  ◦ Unixes (FreeBSD, OpenBSD, Linux)
  ◦ Office applications (NeoOffice, OpenOffice)
  ◦ Desktop environment apps (KDE or GNOME apps)
• Maintenance may require propagating bug fixes from one system to another
  ◦ We call this "Cross-System Bug Fixing" (CSBF)
• Example:
  ◦ FreeBSD, 1996/01/19, file ip_icmp.h:
    - "Added definitions for ICMP router discovery. Reviewed by: wollman"
  ◦ OpenBSD, 1996/08/02, file ip_icmp.h:
    - "ICMP Router Discovery definitions; from FreeBSD"
3. What we propose
• A method to track CSBFs
• A study of the social characteristics and development activity of CSBF committers
  ◦ Social characteristics: degree, betweenness, brokerage
  ◦ Development activity: commits, lines changed
4. Detecting CSBF - I
• Step 1: mine cross-referencing commits
  ◦ openbsd, atphy.c,2008/09/25 20:47:16,brad, Add a driver for the Attansic F1 PHY. From FreeBSD via kevlo@
• Step 2: mine commits previously performed on files with the same name in the other system
  ◦ freebsd,atphy.c,2008/05/19 01:12:10,yongari, Add Attansic/Atheros F1 PHY driver.
  ◦ openbsd, atphy.c,2008/09/25 20:47:16,brad, Add a driver for the Attansic F1 PHY. From FreeBSD via kevlo@
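Steps 1 and 2 can be sketched in a few lines of Python. This is only an illustration under assumptions: the record layout (system, file, timestamp, committer, note) follows the examples on this slide, while the regular expression `CROSS_REF` and the helper names `parse`, `referring_commits`, and `candidate_sources` are hypothetical and not taken from the authors' tooling.

```python
import re
from datetime import datetime

# Assumed pattern: a commit note that says "from FreeBSD" or "from OpenBSD".
CROSS_REF = re.compile(r"\bfrom\s+(freebsd|openbsd)\b", re.IGNORECASE)


def parse(line):
    """Split 'system,file,timestamp,committer,note' into a dict."""
    system, path, stamp, committer, note = [p.strip() for p in line.split(",", 4)]
    return {
        "system": system.lower(),
        "file": path,
        "when": datetime.strptime(stamp, "%Y/%m/%d %H:%M:%S"),
        "committer": committer,
        "note": note,
    }


def referring_commits(commits):
    """Step 1: keep commits whose note names the *other* system."""
    found = []
    for c in commits:
        match = CROSS_REF.search(c["note"])
        if match and match.group(1).lower() != c["system"]:
            found.append(c)
    return found


def candidate_sources(ref, commits):
    """Step 2: earlier commits on a same-named file in the other system."""
    return [
        c for c in commits
        if c["system"] != ref["system"]
        and c["file"] == ref["file"]
        and c["when"] < ref["when"]
    ]


log = [
    "freebsd,atphy.c,2008/05/19 01:12:10,yongari,Add Attansic/Atheros F1 PHY driver.",
    "openbsd, atphy.c,2008/09/25 20:47:16,brad,Add a driver for the Attansic F1 PHY. From FreeBSD via kevlo@",
]
commits = [parse(line) for line in log]
refs = referring_commits(commits)
links = [(r["committer"], s["committer"]) for r in refs for s in candidate_sources(r, commits)]
```

Here `links` pairs the referring committer with the committer of each candidate source commit. Steps 3 and 4 then narrow the candidates down.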
5. Detecting CSBF - II
• Step 3: compute file similarity with clone detection
  ◦ CCFinder
  ◦ Threshold: at least 10% of cloned lines
• Step 4: take the previous change whose commit note has the highest textual similarity
  ◦ Vector Space Model
  ◦ Cosine similarity; a threshold (0.20) filters out unrelated commits
• Example (cosine similarity = 0.72):
  ◦ Add Attansic/Atheros F1 PHY driver.
  ◦ Add a driver for the Attansic F1 PHY. From FreeBSD via kevlo@
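A minimal sketch of Step 4, assuming raw term-frequency vectors and a simple alphanumeric tokenizer. The slides do not specify the term weighting or preprocessing, so the score computed here will generally differ from the 0.72 reported for this pair. Only the 0.20 threshold comes from the slide; the function names are hypothetical.

```python
import math
import re
from collections import Counter

THRESHOLD = 0.20  # threshold stated on the slide


def tokens(note):
    """Lowercase alphanumeric tokens (an assumed, simplistic tokenizer)."""
    return re.findall(r"[a-z0-9]+", note.lower())


def cosine(a, b):
    """Cosine similarity between raw term-frequency vectors of two notes."""
    va, vb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(
        sum(v * v for v in vb.values())
    )
    return dot / norm if norm else 0.0


def best_match(ref_note, candidate_notes):
    """Return (score, note) for the most similar candidate above the threshold, or None."""
    scored = [(cosine(ref_note, n), n) for n in candidate_notes]
    scored = [pair for pair in scored if pair[0] >= THRESHOLD]
    return max(scored, default=None)


source_note = "Add Attansic/Atheros F1 PHY driver."
ref_note = "Add a driver for the Attansic F1 PHY. From FreeBSD via kevlo@"
score = cosine(source_note, ref_note)
match = best_match(ref_note, [source_note, "Fix typo in comment."])
```

Stop-word removal or tf-idf weighting would change the score. Either could be swapped into `cosine` without touching `best_match`.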
6. Building Committers' Network
• We extract communication from mailing lists
  ◦ Bug-fixing mailing lists
• A heuristic similar to that of Bird et al. [2006] maps inconsistent names and email addresses to a single identity
  ◦ It also maps committer IDs to mailing list names and emails
• Nodes of the network are labeled as:
  ◦ Committer / other mailing list contributor
  ◦ CSBF committer
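The network construction might look like the sketch below. It assumes that each communication is a (sender, recipient) pair and that an alias table has already been produced by the identity-matching heuristic. The heuristic itself is not reproduced here, and all addresses and the `build_network` helper are hypothetical.

```python
from collections import defaultdict


def build_network(messages, aliases, committers, csbf_committers):
    """Build weighted directed edges and node labels from (sender, recipient) pairs."""
    edges = defaultdict(int)
    for sender, recipient in messages:
        # Map every raw name or email to one canonical identity.
        s = aliases.get(sender.lower(), sender.lower())
        r = aliases.get(recipient.lower(), recipient.lower())
        if s != r:
            edges[(s, r)] += 1

    labels = {}
    for node in {n for edge in edges for n in edge}:
        if node in csbf_committers:
            labels[node] = "CSBF committer"
        elif node in committers:
            labels[node] = "committer"
        else:
            labels[node] = "other contributor"
    return dict(edges), labels


# Hypothetical alias table and messages, for illustration only.
aliases = {"brad@example.org": "brad", "b. rad": "brad", "yongari@example.org": "yongari"}
messages = [
    ("brad@example.org", "yongari@example.org"),
    ("B. Rad", "yongari@example.org"),
    ("someone@example.org", "brad@example.org"),
]
edges, labels = build_network(messages, aliases, {"brad", "yongari"}, {"brad"})
```

The two spellings of the same sender collapse into one node, so the edge from `brad` to `yongari` gets weight 2.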
7. Empirical Study
• Goal: analyze the phenomenon of CSBFs
• Purpose: understand its relevance with respect to the social characteristics of the developers involved
• Context: CVS repositories and mailing list archives of FreeBSD and OpenBSD
  ◦ Period: 1993-2009 (FreeBSD), 1998-2009 (OpenBSD)
  ◦ Commits: 119,000 (FreeBSD), 70,000 (OpenBSD)
8. Research Questions
• RQ1: How do the source code committers and contributors of the two systems overlap?
• RQ2: How frequent is the phenomenon of CSBFs?
• RQ3: Who are the contributors involved in CSBFs?
• RQ4: Are mailing list contributors involved in CSBFs more active than others?
9. RQ1: Team overlap
                                           FreeBSD  OpenBSD  Both
Committers                                     383      211    26
Mailing list contributors                     8035     3843   359
Committers and mailing list contributors       213      122    17
The two projects have fewer than 10% of their contributors in common: the FreeBSD and OpenBSD development teams are largely distinct.
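One way to read the "less than 10%" figure is as the share of people in both projects relative to everyone in either project. The slides do not say how the percentage was computed, so this interpretation and the helper `overlap_pct` are assumptions. The counts are the ones in the table.

```python
# (FreeBSD, OpenBSD, Both) counts taken from the table above.
counts = {
    "Committers": (383, 211, 26),
    "Mailing list contributors": (8035, 3843, 359),
    "Committers and mailing list contributors": (213, 122, 17),
}


def overlap_pct(freebsd, openbsd, both):
    """Percentage of people in both projects relative to the union of the two."""
    return 100.0 * both / (freebsd + openbsd - both)


overlap = {name: overlap_pct(*row) for name, row in counts.items()}
```

Under this reading, every row stays below 10%.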
10. RQ2: Commit filtering
[Bar chart: referring commits, cloned files, and linked commits for FreeBSD and OpenBSD; y-axis from 0 to 1000; bar labels 933, 439, 296, 133, 120, 59]
After filtering, not many commits remain, but...
11. RQ2: Cloned lines in CSBF files
[Figure: percentage of cloned lines in CSBF files; panels: C source files, header files]
• The percentage is smaller for .h files
• Preprocessor conditionals are used to make header files system-dependent
  ◦ #if defined(__FreeBSD__)
13. RQ3: social characteristics
• Importance in terms of
  ◦ (In/out) degree: number of (incoming/outgoing) communication links
  ◦ Betweenness: number of communications for which the node lies on the shortest path
• Brokerage metrics: useful to analyze the communication between two clusters
[Diagrams: B is a coordinator; B is a gatekeeper; B is a representative]
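The degree and brokerage metrics can be illustrated on a toy directed network. The sketch assumes the usual Gould-Fernandez definitions: for a path A → B → C with no direct A → C link, B is a coordinator when all three nodes share a group, a gatekeeper when only A is outside B's group, and a representative when only C is outside it. The slides do not spell these definitions out. Node names and edges are hypothetical, and betweenness is left out to keep the example short.

```python
from collections import Counter


def degrees(edges):
    """In-degree and out-degree of every node in a directed edge list."""
    indeg, outdeg = Counter(), Counter()
    for a, b in edges:
        outdeg[a] += 1
        indeg[b] += 1
    return indeg, outdeg


def brokerage(edges, group):
    """Count brokerage roles for each middle node B of an open path A -> B -> C."""
    edge_set = set(edges)
    roles = {}
    for a, b in edge_set:
        for b2, c in edge_set:
            # Require a genuine two-step path with no direct A -> C shortcut.
            if b2 != b or a == b or b == c or c == a or (a, c) in edge_set:
                continue
            ga, gb, gc = group[a], group[b], group[c]
            if ga == gb == gc:
                role = "coordinator"
            elif ga != gb and gb == gc:
                role = "gatekeeper"
            elif ga == gb and gb != gc:
                role = "representative"
            else:
                continue  # other roles are not covered on the slide
            roles.setdefault(b, Counter())[role] += 1
    return roles


# Hypothetical toy network: f* belong to FreeBSD, o* to OpenBSD.
group = {"f1": "FreeBSD", "f2": "FreeBSD", "f3": "FreeBSD", "o1": "OpenBSD", "o2": "OpenBSD"}
edges = [("f1", "f2"), ("f2", "f3"), ("f2", "o1"), ("o2", "f2")]
indeg, outdeg = degrees(edges)
roles = brokerage(edges, group)
```

In this toy network only `f2` sits in the middle of any path, and it plays each of the three roles once.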
14. RQ3: social characteristics
[Bar chart comparing CSBF and other contributors on Degree, In-degree, Out-degree, Betweenness / 1000, Coordinator / 10, Gatekeeper, and Representative; axis from 0 to 50]
• All differences are statistically significant
• High effect size (Cohen's d > 1)
• Contributors involved in CSBFs have a higher importance in communication and in the flow of communication between systems
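For reference, Cohen's d can be computed as the difference between two group means divided by a pooled standard deviation. The slides do not state which variant was used, so the pooled-variance form below is an assumption, and the sample values are made up purely to exercise the function.

```python
import math
from statistics import mean, variance


def cohens_d(x, y):
    """Cohen's d using the pooled sample standard deviation (assumed variant)."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled_var)


# Hypothetical metric values for two groups of contributors.
csbf_group = [2, 4, 6]
other_group = [1, 2, 3]
d = cohens_d(csbf_group, other_group)
```

By the common rule of thumb, values of d around 0.8 or above are read as a large effect.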
16. RQ4: change activity of CSBF committers and others
[Bar charts: LOC added/removed (axis from 0 to 40000) and commits (axis from 0 to 1500) for FreeBSD and OpenBSD, CSBF committers vs. others]
• All differences are statistically significant
• High effect size (Cohen's d ≈ 1)
• Contributors involved in CSBFs are more active than others
17. Conclusions and Work-in-Progress
• We proposed a method to mine CSBFs
• We reported a study on FreeBSD and OpenBSD where:
  ◦ The development teams are almost disjoint
  ◦ There is a small, though not negligible, portion of CSBFs
  ◦ Committers involved in CSBFs have
    - Higher social importance
    - Higher brokerage level
    - Higher activity in source code commits
• Work-in-progress:
  ◦ Better approaches to identify implicit CSBFs, tracking and linking changes occurring on both systems
  ◦ More extensive study on less obvious cases