- The document describes a research project that analyzes the co-evolution of code and database schema in a data-intensive software system called OSCAR, an open-source electronic medical record system.
- The researchers studied how the relationship between code and database files evolved over time, how the introduction of persistence technologies like Hibernate and JPA affected the system, and how developer contributions changed.
- Analysis of the OSCAR repository showed changes in the proportions of code file types, increasing numbers of developers over time, and shifts in how developers worked on code and database-related files after Hibernate and JPA were introduced.
Co-Evolving Changes in a Data-Intensive Software System
1. Co-Evolving Changes in a Data-Intensive Software System
Mathieu Goeminne, Alexandre Decan, Tom Mens
Service de Génie Logiciel, Université de Mons
http://informatique.umons.ac.be/genlog/projects/disse
EOSESE 2014: European Open Symposium on Empirical Software Engineering, July 2014
2. Context
European Open Symposium on Empirical Software Engineering, Lille, France, July 2014
• FNRS Projet de Recherche “Data-Intensive Software System Evolution”
– Interuniversity collaboration with Anthony Cleve and Loup Meurice (University of Namur)
• Overall goal
– Expand empirical MSR research to include database-related activities
– Analyse and support co-evolution between program code and database (schema) in data-intensive software systems
• Approach
– Develop generic framework
– Implement dedicated analysis and visualisation tools
– Carry out empirical case studies
3. Research Questions
• Code-centric focus
– RQ1: How does the relation between source code files and database-related files evolve?
– RQ2: What is the effect of introducing a particular persistency technology (Hibernate, JPA)?
• Social focus
– RQ3: How do developers divide their work, and how does this evolve over time?
4. Case Study
OSCAR: a Java electronic medical record system

characteristic                   value
duration                         3,939 days (> 129 months)
dates                            from Nov 2002 till Aug 2013
number of commits                18,727
number of distinct files         20,718 (of which 54% code files)
number of file touches           93,721
number of distinct developers    100
official repository              https://github.com/scoophealth/oscar.git
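The characteristics in the case-study table can all be derived from a commit log. The following is only a minimal sketch under stated assumptions: the record layout, the sample data and the `code_extensions` parameter are hypothetical, and the slides do not say how the authors defined "code files" or computed these values.

```python
from datetime import date

# Hypothetical commit records: (author, commit date, files touched).
# Illustrative data only, not taken from the OSCAR repository.
commits = [
    ("alice", date(2002, 11, 5), ["src/A.java", "web/a.jsp"]),
    ("bob", date(2003, 1, 20), ["src/A.java"]),
    ("alice", date(2003, 2, 1), ["db/schema.sql", "src/B.java"]),
]

def summarize(commits, code_extensions=(".java", ".jsp")):
    """Compute the kinds of characteristics listed in the case-study table.

    Which extensions count as "code" is an assumption of this sketch.
    """
    dates = [d for _, d, _ in commits]
    files = {f for _, _, touched in commits for f in touched}
    code_files = {f for f in files if f.endswith(code_extensions)}
    return {
        "duration_days": (max(dates) - min(dates)).days,
        "commits": len(commits),
        "distinct_files": len(files),
        "code_file_share": len(code_files) / len(files),
        # A file touch is one file modified by one commit.
        "file_touches": sum(len(touched) for _, _, touched in commits),
        "distinct_developers": len({author for author, _, _ in commits}),
    }

summary = summarize(commits)
```

Note that file touches exceed distinct files whenever a file is modified by more than one commit, which is why the table reports both numbers.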
5. Evolution of OSCAR: Code dimension
• 3-tier web application written in Java and JSP
[Figure: monthly aggregated proportion of JSP and Java files (y-axis 0% to 100%; x-axis 2003-07 to 2013-01; series: jsp, java)]
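One possible reading of "monthly aggregated proportion of JSP and Java files" is the share of each file type among the JSP and Java files observed in a given month. The sketch below follows that reading; the input layout (`files_by_month`) and the sample paths are hypothetical, and the slides do not confirm this exact definition.

```python
from collections import Counter

def jsp_java_proportions(files_by_month):
    """Return {month: (jsp_share, java_share)} among JSP and Java files only."""
    result = {}
    for month, paths in sorted(files_by_month.items()):
        # Count files by lower-cased extension; files without a dot are skipped.
        counts = Counter(
            p.rsplit(".", 1)[-1].lower() for p in paths if "." in p
        )
        total = counts["jsp"] + counts["java"]
        if total:
            result[month] = (counts["jsp"] / total, counts["java"] / total)
    return result

# Illustrative data only, not measurements from OSCAR.
files_by_month = {
    "2003-07": ["a.jsp", "b.jsp", "c.jsp", "D.java"],
    "2013-01": ["a.jsp", "D.java", "E.java", "F.java", "notes.txt"],
}
proportions = jsp_java_proportions(files_by_month)
```

Files of other types (such as `notes.txt` here) are ignored, so the two shares always sum to 1 for each month.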
6. Evolution of OSCAR: Social dimension
• Monthly number of distinct active developers for OSCAR (after identity merging)
[Figure: number of distinct active developers per month (y-axis 0 to 25; x-axis 2003-07 to 2013-01)]
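Counting distinct developers only makes sense once the different identities of one person have been merged. The slides do not describe the merging rule used for OSCAR, so the sketch below assumes a simple automatic rule (two identities are the same person if they share a normalized name or an e-mail address), implemented with a small union-find structure. All names and addresses are hypothetical.

```python
from collections import defaultdict

def merge_identities(identities):
    """Map each (name, email) pair to a canonical identity key.

    Assumed rule for this sketch: identities sharing a normalized name
    or an e-mail address belong to the same developer.
    """
    parent = {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for name, email in identities:
        name_key = "name:" + " ".join(name.lower().split())
        email_key = "email:" + email.lower()
        parent.setdefault(name_key, name_key)
        parent.setdefault(email_key, email_key)
        parent[find(name_key)] = find(email_key)  # union
    return {(n, e): find("email:" + e.lower()) for n, e in identities}

def monthly_active_developers(touches, canonical):
    """Count distinct merged developers per month."""
    active = defaultdict(set)
    for name, email, month in touches:
        active[month].add(canonical[(name, email)])
    return {month: len(devs) for month, devs in sorted(active.items())}

# Hypothetical (name, email, month) records.
touches = [
    ("Jane Doe", "jane@example.org", "2010-01"),
    ("jane doe", "jdoe@example.com", "2010-01"),
    ("J. Doe", "jdoe@example.com", "2010-02"),
    ("Sam Roe", "sam@example.org", "2010-02"),
]
canonical = merge_identities({(n, e) for n, e, _ in touches})
active_counts = monthly_active_developers(touches, canonical)
```

Here the three "Doe" identities collapse into one developer, so January counts one active developer instead of two. A purely automatic rule like this can merge distinct people who share a name, which is one reason a manual review step may be needed.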
7. Evolution of OSCAR: Code dimension
• Number of source code files w.r.t. database-related files
[Figure: number of files per month (y-axis 0 to 6000; x-axis 2003-07 to 2013-01; series: pure, sql)]
sql = code files containing embedded SQL statements
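Separating "sql" files from "pure" files requires some way to detect embedded SQL. The slides do not say how this was done; the sketch below assumes a simple heuristic, namely a string literal that starts with a SQL keyword followed by a typical clause keyword on the same line. The pattern and the sample snippets are hypothetical and will miss or misclassify some cases.

```python
import re

# Heuristic (an assumption of this sketch): a double-quoted literal that
# begins with SELECT ... FROM, INSERT INTO, UPDATE ... SET or DELETE FROM.
EMBEDDED_SQL = re.compile(
    r'"\s*(SELECT\b.+?\bFROM\b|INSERT\s+INTO\b|UPDATE\b.+?\bSET\b|DELETE\s+FROM\b)',
    re.IGNORECASE,
)

def classify(source_text):
    """Return 'sql' if the text appears to embed a SQL statement, else 'pure'."""
    return "sql" if EMBEDDED_SQL.search(source_text) else "pure"

# Hypothetical Java snippets.
with_query = 'String q = "select * from patient where id = ?";'
with_message = 'String msg = "Please select a patient";'
kind_query = classify(with_query)
kind_message = classify(with_message)
```

Requiring the keyword at the start of a literal avoids flagging ordinary user-facing messages that merely contain words such as "select" or "update".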
8. Evolution of OSCAR: Social dimension
• How does the activity of developers evolve over time?
[Figure: monthly aggregated number of file touches per developer]
9. Introducing Persistency Provider in OSCAR
• Hibernate
– introduced in OSCAR in July 2006
– Java object-relational mapping (ORM) library
  • XML files map Java classes to database tables and Java data types to SQL data types
  • facilitates data query and retrieval
  • generates SQL calls and relieves the developer from manual result set handling and object conversion
• Java Persistence API (JPA)
– introduced in OSCAR in July 2008
– industry standard ORM persistency API
– uses Java annotations instead of XML files
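Because Hibernate mappings live in XML files while JPA mappings are annotations inside Java classes, the two technologies leave different traces in a repository. The sketch below shows one way a mining script might tell them apart. It relies on the conventional `.hbm.xml` suffix and `<hibernate-mapping>` root element for Hibernate, and on `javax.persistence` annotations for JPA; the function, the annotation subset and the sample files are hypothetical and not the authors' tooling.

```python
import re

# A few common JPA mapping annotations (not an exhaustive list).
JPA_ANNOTATION = re.compile(r"@(Entity|Table|Column|Id)\b")

def persistence_technology(path, text):
    """Return 'hibernate', 'jpa' or None for one file (heuristic)."""
    if path.endswith(".hbm.xml") or "<hibernate-mapping" in text:
        return "hibernate"
    if (
        path.endswith(".java")
        and "javax.persistence" in text
        and JPA_ANNOTATION.search(text)
    ):
        return "jpa"
    return None

# Hypothetical file contents for illustration.
hbm = '<hibernate-mapping><class name="Patient" table="patient"/></hibernate-mapping>'
jpa = "import javax.persistence.Entity;\n@Entity\npublic class Patient {}"
plain = "public class Util {}"

tech_hbm = persistence_technology("Patient.hbm.xml", hbm)
tech_jpa = persistence_technology("Patient.java", jpa)
tech_plain = persistence_technology("Util.java", plain)
```

The example also illustrates the difference noted on the slide: the Hibernate mapping is a separate XML file, whereas the JPA mapping sits in the Java source itself.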
11. Introducing Persistency Provider: Social dimension
• Who is involved in introducing changes in database-related code?
[Figure: bubble chart by developer; bubble size = log(monthly aggregated number of touched files)]
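The bubble-size rule above can be sketched as follows. The slide does not state the logarithm base or whether repeated touches of the same file count once, so this sketch assumes the natural logarithm and counts distinct files per developer and month; the record layout and sample data are hypothetical.

```python
import math
from collections import defaultdict

def bubble_sizes(touches):
    """Map (developer, month) to log(number of distinct files touched)."""
    files = defaultdict(set)
    for developer, month, path in touches:
        files[(developer, month)].add(path)
    return {key: math.log(len(paths)) for key, paths in files.items()}

# Hypothetical (developer, month, path) records.
touches = [
    ("alice", "2010-01", "A.java"),
    ("alice", "2010-01", "B.java"),
    ("alice", "2010-01", "A.java"),
    ("bob", "2010-01", "C.java"),
]
sizes = bubble_sizes(touches)
```

With a plain logarithm, a developer who touched a single file in a month gets size 0, so a plotting script would probably add a small offset or a minimum radius to keep such bubbles visible.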
12. Evolution of OSCAR: Social dimension
• How do developers divide their work?
[Figure: set diagram of the number of OSCAR developers involved in file touches per activity type. Of 100 OSCAR developers: Java (87), JSP (86), SQL (89); Hibernate (HIB) and JPA are also shown. Counts as extracted, assignment to regions not recoverable: 3, 2, 5, 1, 6, 9, 31, 16, 15, 12]
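The counts behind such a diagram come from grouping developers by the activity types of the files they touched and then intersecting the groups. A minimal sketch, assuming hypothetical `(developer, activity)` records where the activity label has already been assigned to each touched file:

```python
from collections import defaultdict
from itertools import combinations

def developers_per_activity(touches):
    """Group developers by the activity type of the files they touched."""
    developers = defaultdict(set)
    for developer, activity in touches:
        developers[activity].add(developer)
    return dict(developers)

def pairwise_overlap(developers):
    """Count developers shared by each pair of activity types."""
    return {
        (a, b): len(developers[a] & developers[b])
        for a, b in combinations(sorted(developers), 2)
    }

# Hypothetical records, not OSCAR data.
touches = [
    ("alice", "Java"), ("alice", "SQL"), ("alice", "Java"),
    ("bob", "Java"),
    ("carol", "JSP"), ("carol", "SQL"),
]
by_activity = developers_per_activity(touches)
overlap = pairwise_overlap(by_activity)
```

Using sets means a developer counts once per activity type regardless of how many files they touched, which matches the diagram's focus on people rather than volume of work.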
13. Evolution of OSCAR: Social dimension
• How do developers divide their work?
[Figure: set diagram of the number of developers that introduce database-related code in some file for the first time. Of 100 OSCAR developers: Java (87), JSP (86), SQL (53); Hibernate (HIB) and JPA are also shown. Counts as extracted, assignment to regions not recoverable: 3, 24, 10, 10, 1, 0, 24, 9, 8, 11]
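Finding who introduces database-related code in a file for the first time requires walking the history in chronological order and remembering which files already contain each technology. The sketch below assumes hypothetical, already ordered `(developer, path, technology)` events, where `technology` is `None` for changes that add no database-related code; how such events were detected in the study is not stated on the slides.

```python
from collections import defaultdict

def first_introducers(events):
    """Return {technology: developers who first introduced it in some file}.

    `events` must be in chronological order.
    """
    seen = set()  # (path, technology) pairs already introduced
    introducers = defaultdict(set)
    for developer, path, technology in events:
        if technology is None or (path, technology) in seen:
            continue
        seen.add((path, technology))
        introducers[technology].add(developer)
    return dict(introducers)

# Hypothetical events, oldest first.
events = [
    ("alice", "A.java", "SQL"),
    ("bob", "A.java", "SQL"),    # not a first introduction for A.java
    ("bob", "B.java", "JPA"),
    ("carol", "B.java", None),   # change without database-related code
    ("carol", "C.java", "SQL"),
]
introducers = first_introducers(events)
```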
14. Preliminary Conclusions
RQ1: Code-related and database-related files evolve together (no “phased” co-evolution)
RQ2: Introduction of Hibernate, then JPA, which takes over from Hibernate, but embedded SQL still remains very important
RQ3: No clear separation of activities between developers
– The majority of developers change both db-related and db-unrelated code
– No observed periods dedicated to a specific activity
15. Future Work
• Analyse file changes at finer granularity
• Study other data-intensive software systems
• Study the evolution of DISS (data-intensive software system) quality
– Unit tests involving database-related classes
– Revisited modularity, coupling, cohesion
– Database inconsistencies
• Study the evolution of social aspects
– Are there distinct sub-communities?
– How is the effort distributed in each community? Are there different teamwork patterns in these communities?
16. References
• M. Goeminne, A. Decan, T. Mens, Co-evolving Code-Related and Database-Related Changes in a Data-Intensive Software System, CSMR-WCRE 2014 ERA track
• L. Meurice, A. Cleve, DAHLIA: A Visual Analyzer of Database Schema Evolution, CSMR-WCRE 2014 Tool Demo
• A. Cleve, T. Mens, J.-L. Hainaut, Data-Intensive System Evolution, IEEE Computer 43(8): 110-112 (2010)
• A. Cleve, M. Gobert, L. Meurice, J. Maes, J. Weber, Understanding database schema evolution: A case study, Science of Computer Programming (2013)
17. A Historical Dataset for the Gnome Ecosystem
Mathieu Goeminne, Tom Mens
Service de Génie Logiciel, Université de Mons
http://informatique.umons.ac.be/genlog/projects/disse
EOSESE 2014: European Open Symposium on Empirical Software Engineering, July 2014
18. Gnome ecosystem: Historical Dataset
• Goal
– Study the evolution of social aspects of the Gnome ecosystem (1,418 projects, 11,094 contributors, 1,315,997 commits).
• Methodology
1. Clone the source code repository of each Gnome project
2. Store its history in a MySQL database
3. Add extra information to facilitate empirical studies
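Step 2 of the methodology can be illustrated with a small loader. This is only a sketch under stated assumptions: it uses Python's built-in `sqlite3` module as a stand-in for MySQL, a hypothetical two-table schema, and a hypothetical text export in which each commit header line starts with `@` and is followed by the paths it touched (something a `git log --name-only` call with a custom `--pretty=format:` string could produce). The actual dataset schema is not shown on the slides.

```python
import sqlite3

def store_history(log_text, connection):
    """Load commit headers and touched files into two tables."""
    connection.executescript(
        """
        CREATE TABLE commits (id TEXT PRIMARY KEY, author TEXT, email TEXT, day TEXT);
        CREATE TABLE touches (commit_id TEXT REFERENCES commits(id), path TEXT);
        """
    )
    current = None
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines only separate commits
        if line.startswith("@"):
            # Header layout assumed by this sketch: @id|author|email|date
            current, author, email, day = line[1:].split("|")
            connection.execute(
                "INSERT INTO commits VALUES (?, ?, ?, ?)",
                (current, author, email, day),
            )
        elif current is not None:
            connection.execute("INSERT INTO touches VALUES (?, ?)", (current, line))
    connection.commit()

# Hypothetical exported history.
LOG = """\
@c1|Jane Doe|jane@example.org|2010-01-04
po/fr.po
src/main.c

@c2|Sam Roe|sam@example.org|2010-02-11
docs/manual.xml
"""
db = sqlite3.connect(":memory:")
store_history(LOG, db)
commit_count = db.execute("SELECT COUNT(*) FROM commits").fetchone()[0]
touch_count = db.execute("SELECT COUNT(*) FROM touches").fetchone()[0]
```

Keeping commits and file touches in separate tables lets later steps, such as identity merging or activity typing, be added as extra columns or tables without rewriting the history itself. The `|` separator is a simplification and would break on author names that contain that character.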
19. Gnome ecosystem: About the dataset
FLOSSMetrics MySQL dataset: https://bitbucket.org/mgoeminne/sgl-flossmetric-dbmerge
20. Gnome ecosystem: Extra information
• Identity merging
– CVSAnalY2 hack
– Semi-automatic identity merging based on name and e-mail
– 5,923 / 11,094 contributors after merging
• Activity types
– Tool for associating an activity type (coding, translation, documentation, etc.) to each file
– Regular expressions on file extension, file name and path
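Associating an activity type with each file through regular expressions can be sketched as an ordered list of rules where the first match wins. The rules below are hypothetical examples chosen for illustration; the actual patterns used for the Gnome dataset are not given on the slides.

```python
import re

# Hypothetical rules for this sketch; the first matching rule wins.
ACTIVITY_RULES = [
    ("translation", re.compile(r"(^|/)po/|\.(po|pot)$", re.IGNORECASE)),
    ("documentation", re.compile(r"(^|/)(docs?|help)/|(^|/)README[^/]*$", re.IGNORECASE)),
    ("coding", re.compile(r"\.(c|h|py|java|js|vala)$", re.IGNORECASE)),
]

def activity_type(path):
    """Return the activity type for a file path, or 'unknown'."""
    for activity, pattern in ACTIVITY_RULES:
        if pattern.search(path):
            return activity
    return "unknown"

# Hypothetical paths.
labels = {
    path: activity_type(path)
    for path in ["po/fr.po", "docs/manual.xml", "src/main.c", "README", "Makefile.am"]
}
```

Rule order matters: a file such as `help/C/example.py` is labelled as documentation here because the path rule is checked before the extension rule. Whether path or extension should take priority is a design choice that depends on the study.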
23. Gnome ecosystem: References
• M. Goeminne, M. Claes, and T. Mens. A historical dataset for the Gnome ecosystem, MSR 2013, pp. 225–228. https://bitbucket.org/mgoeminne/sgl-flossmetric-dbmerge
• B. Vasilescu, A. Serebrenik, M. Goeminne, and T. Mens. On the variation and specialisation of workload: a case study of the Gnome ecosystem community, Empirical Software Engineering, pp. 955–1008, 2014. http://dx.doi.org/10.1007/s10664-013-9244-1
24. References
Evolving Software Systems
Mens, Tom; Serebrenik, Alexander; Cleve, Anthony (Eds.)
2014, XXIII, 404 p.
Springer, ISBN 978-3-642-45398-4