SlideShare ist ein Scribd-Unternehmen logo
1 von 33
Downloaden Sie, um offline zu lesen
Model-based Mining of 
Source Code Repositories 
Markus Scheidgen, Joachim Fischer 
{scheidge,fischer}@informatik.hu-berlin.de 
1 
Tuesday, 30. September 2014
Agenda 
▶ Mining Software Repositories (MSR) and Software Evolution 
Research (SER) 
▶ srcrepo – a model-based MSR system 
■ components and mining process 
■ distributed storage of very large models with emf-fragments 
■ a meta-model for source code repositories 
■ gathering software metrics with an OCL-like internal Scala DSL 
▶ work in progress - discussion of remaining problems and 
limitations 
2 
Tuesday, 30. September 2014
Relevant Research Fields 
1. H. Kagdi, M.L. Collard, J.I. Maletic: A survey and taxonomy of approaches for mining software repositories in the context of software 
evolution; Journal of Software Maintenance and Evolution: Research and Practice; Vol.19/Nr.2/2007 
2.R. Lincke, J. Lundberg, W. Löwe: Comparing Software Metrics Tools; 8th International Symposium on Software Testing and Analysis; 
2008 
3.E.J. Chikofsky, J.H. Cross: Reverse engineering and design recovery: A taxonomy; IEEE Software; Vol.7/Nr.1/1990 
3 
Tuesday, 30. September 2014
Relevant Research Fields 
3 
Mining Software Repositories (MSR) 
The term mining software repositories (MSR) 
has been coined to describe a broad class of 
techniques for the examination of software 
repositories. 
The premise of MSR is that empirical and 
systematic investigations of repositories will 
shed new light on the process of software 
evolution. [1] 
Software Metrics 
A software metric is a 
mathematical definition mapping 
the entities of a software system 
to numeric metrics values. 
[...] to express features of 
software with numbers in order 
to facilitate software quality 
assessment. [2] 
Reverse Engineering 
Reverse engineering is the process 
of analyzing a subject system to 
(1) identify the system’s 
components and their 
interrelationships and (2) create 
representations of the system in 
another form or at a higher level 
of abstraction [3] 
Software Evolution Research (SER) 
■(dis-)proving Lehmann’s Laws of software evolution 
■empirical investigations of software repositories through statistical analysis of 
software metrics and software change metrics over the evolutionary cause of 
many software systems. 
Model-based Mining Software Repositories (with srcrepo) 
Overcoming heterogeneity and accessibility by raising the level of abstraction, while ensuring scalability and 
retaining meaningful information depth. 
1. H. Kagdi, M.L. Collard, J.I. Maletic: A survey and taxonomy of approaches for mining software repositories in the context of software 
evolution; Journal of Software Maintenance and Evolution: Research and Practice; Vol.19/Nr.2/2007 
2.R. Lincke, J. Lundberg, W. Löwe: Comparing Software Metrics Tools; 8th International Symposium on Software Testing and Analysis; 
2008 
3.E.J. Chikofsky, J.H. Cross: Reverse engineering and design recovery: A taxonomy; IEEE Software; Vol.7/Nr.1/1990 
Tuesday, 30. September 2014
Relevant Research Fields 
3 
Mining Software Repositories (MSR) 
The term mining software repositories (MSR) 
has been coined to describe a broad class of 
techniques for the examination of software 
repositories. 
The premise of MSR is that empirical and 
systematic investigations of repositories will 
shed new light on the process of software 
evolution. [1] 
Software Metrics 
A software metric is a 
mathematical definition mapping 
the entities of a software system 
to numeric metrics values. 
[...] to express features of 
software with numbers in order 
to facilitate software quality 
assessment. [2] 
Reverse Engineering 
Reverse engineering is the process 
of analyzing a subject system to 
(1) identify the system’s 
components and their 
interrelationships and (2) create 
representations of the system in 
another form or at a higher level 
of abstraction [3] 
Software Evolution Research (SER) 
■(dis-)proving Lehmann’s Laws of software evolution 
■empirical investigations of software repositories through statistical analysis of 
software metrics and software change metrics over the evolutionary cause of 
many software systems. 
Model-based Mining Software Repositories (with srcrepo) 
Overcoming heterogeneity and accessibility by raising the level of abstraction, while ensuring scalability and 
retaining meaningful information depth. 
1. H. Kagdi, M.L. Collard, J.I. Maletic: A survey and taxonomy of approaches for mining software repositories in the context of software 
evolution; Journal of Software Maintenance and Evolution: Research and Practice; Vol.19/Nr.2/2007 
2.R. Lincke, J. Lundberg, W. Löwe: Comparing Software Metrics Tools; 8th International Symposium on Software Testing and Analysis; 
2008 
3.E.J. Chikofsky, J.H. Cross: Reverse engineering and design recovery: A taxonomy; IEEE Software; Vol.7/Nr.1/1990 
Tuesday, 30. September 2014
Relevant Research Fields 
3 
Mining Software Repositories (MSR) 
The term mining software repositories (MSR) 
has been coined to describe a broad class of 
techniques for the examination of software 
repositories. 
The premise of MSR is that empirical and 
systematic investigations of repositories will 
shed new light on the process of software 
evolution. [1] 
Software Metrics 
A software metric is a 
mathematical definition mapping 
the entities of a software system 
to numeric metrics values. 
[...] to express features of 
software with numbers in order 
to facilitate software quality 
assessment. [2] 
Reverse Engineering 
Reverse engineering is the process 
of analyzing a subject system to 
(1) identify the system’s 
components and their 
interrelationships and (2) create 
representations of the system in 
another form or at a higher level 
of abstraction [3] 
Software Evolution Research (SER) 
■(dis-)proving Lehmann’s Laws of software evolution 
■empirical investigations of software repositories through statistical analysis of 
software metrics and software change metrics over the evolutionary cause of 
many software systems. 
Model-based Mining Software Repositories (with srcrepo) 
Overcoming heterogeneity and accessibility by raising the level of abstraction, while ensuring scalability and 
retaining meaningful information depth. 
1. H. Kagdi, M.L. Collard, J.I. Maletic: A survey and taxonomy of approaches for mining software repositories in the context of software 
evolution; Journal of Software Maintenance and Evolution: Research and Practice; Vol.19/Nr.2/2007 
2.R. Lincke, J. Lundberg, W. Löwe: Comparing Software Metrics Tools; 8th International Symposium on Software Testing and Analysis; 
2008 
3.E.J. Chikofsky, J.H. Cross: Reverse engineering and design recovery: A taxonomy; IEEE Software; Vol.7/Nr.1/1990 
Tuesday, 30. September 2014
Relevant Research Fields 
3 
Mining Software Repositories (MSR) 
The term mining software repositories (MSR) 
has been coined to describe a broad class of 
techniques for the examination of software 
repositories. 
The premise of MSR is that empirical and 
systematic investigations of repositories will 
shed new light on the process of software 
evolution. [1] 
Software Metrics 
A software metric is a 
mathematical definition mapping 
the entities of a software system 
to numeric metrics values. 
[...] to express features of 
software with numbers in order 
to facilitate software quality 
assessment. [2] 
Reverse Engineering 
Reverse engineering is the process 
of analyzing a subject system to 
(1) identify the system’s 
components and their 
interrelationships and (2) create 
representations of the system in 
another form or at a higher level 
of abstraction [3] 
Software Evolution Research (SER) 
■(dis-)proving Lehmann’s Laws of software evolution 
■empirical investigations of software repositories through statistical analysis of 
software metrics and software change metrics over the evolutionary cause of 
many software systems. 
Model-based Mining Software Repositories (with srcrepo) 
Overcoming heterogeneity and accessibility by raising the level of abstraction, while ensuring scalability and 
retaining meaningful information depth. 
1. H. Kagdi, M.L. Collard, J.I. Maletic: A survey and taxonomy of approaches for mining software repositories in the context of software 
evolution; Journal of Software Maintenance and Evolution: Research and Practice; Vol.19/Nr.2/2007 
2.R. Lincke, J. Lundberg, W. Löwe: Comparing Software Metrics Tools; 8th International Symposium on Software Testing and Analysis; 
2008 
3.E.J. Chikofsky, J.H. Cross: Reverse engineering and design recovery: A taxonomy; IEEE Software; Vol.7/Nr.1/1990 
Tuesday, 30. September 2014
Relevant Research Fields 
3 
Mining Software Repositories (MSR) 
The term mining software repositories (MSR) 
has been coined to describe a broad class of 
techniques for the examination of software 
repositories. 
The premise of MSR is that empirical and 
systematic investigations of repositories will 
shed new light on the process of software 
evolution. [1] 
Software Metrics 
A software metric is a 
mathematical definition mapping 
the entities of a software system 
to numeric metrics values. 
[...] to express features of 
software with numbers in order 
to facilitate software quality 
assessment. [2] 
Reverse Engineering 
Reverse engineering is the process 
of analyzing a subject system to 
(1) identify the system’s 
components and their 
interrelationships and (2) create 
representations of the system in 
another form or at a higher level 
of abstraction [3] 
Software Evolution Research (SER) 
■(dis-)proving Lehmann’s Laws of software evolution 
■empirical investigations of software repositories through statistical analysis of 
software metrics and software change metrics over the evolutionary cause of 
many software systems. 
Model-based Mining Software Repositories (with srcrepo) 
Overcoming heterogeneity and accessibility by raising the level of abstraction, while ensuring scalability and 
retaining meaningful information depth. 
1. H. Kagdi, M.L. Collard, J.I. Maletic: A survey and taxonomy of approaches for mining software repositories in the context of software 
evolution; Journal of Software Maintenance and Evolution: Research and Practice; Vol.19/Nr.2/2007 
2.R. Lincke, J. Lundberg, W. Löwe: Comparing Software Metrics Tools; 8th International Symposium on Software Testing and Analysis; 
2008 
3.E.J. Chikofsky, J.H. Cross: Reverse engineering and design recovery: A taxonomy; IEEE Software; Vol.7/Nr.1/1990 
Tuesday, 30. September 2014
Relevant Research Fields 
3 
Mining Software Repositories (MSR) 
The term mining software repositories (MSR) 
has been coined to describe a broad class of 
techniques for the examination of software 
repositories. 
The premise of MSR is that empirical and 
systematic investigations of repositories will 
shed new light on the process of software 
evolution. [1] 
Software Metrics 
A software metric is a 
mathematical definition mapping 
the entities of a software system 
to numeric metrics values. 
[...] to express features of 
software with numbers in order 
to facilitate software quality 
assessment. [2] 
Reverse Engineering 
Reverse engineering is the process 
of analyzing a subject system to 
(1) identify the system’s 
components and their 
interrelationships and (2) create 
representations of the system in 
another form or at a higher level 
of abstraction [3] 
Software Evolution Research (SER) 
■(dis-)proving Lehmann’s Laws of software evolution 
■empirical investigations of software repositories through statistical analysis of 
software metrics and software change metrics over the evolutionary cause of 
many software systems. 
Model-based Mining Software Repositories (with srcrepo) 
Overcoming heterogeneity and accessibility by raising the level of abstraction, while ensuring scalability and 
retaining meaningful information depth. 
1. H. Kagdi, M.L. Collard, J.I. Maletic: A survey and taxonomy of approaches for mining software repositories in the context of software 
evolution; Journal of Software Maintenance and Evolution: Research and Practice; Vol.19/Nr.2/2007 
2.R. Lincke, J. Lundberg, W. Löwe: Comparing Software Metrics Tools; 8th International Symposium on Software Testing and Analysis; 
2008 
3.E.J. Chikofsky, J.H. Cross: Reverse engineering and design recovery: A taxonomy; IEEE Software; Vol.7/Nr.1/1990 
Tuesday, 30. September 2014
srcrepo – Components and Process 
4 
Tuesday, 30. September 2014
srcrepo – Components and Process 
4 
large scale software 
repositories 
(e.g. github, sourceforge) 
source code 
repository 
(e.g. controlled 
by Git, SVN, CVS) 
source code 
(e.g. java, C++, 
eclipse*) 
issue tracker, mailing lists, wiki 
software projects 
Tuesday, 30. September 2014
srcrepo – Components and Process 
revision tree 
4 
large scale software 
repositories 
(e.g. github, sourceforge) 
srcrepo storage 
(EMF-models via EMF-Fragments, 
e.g. on mongodb) 
source code 
repository 
(e.g. controlled 
by Git, SVN, CVS) 
source code 
(e.g. java, C++, 
eclipse*) 
issue tracker, mailing lists, wiki 
software projects 
C! 
1 
2 3 
A" B# A! 
AST-models of new and 
changed CUs 
import 
Tuesday, 30. September 2014
srcrepo – Components and Process 
revision tree 
4 
large scale software 
repositories 
(e.g. github, sourceforge) 
srcrepo storage 
(EMF-models via EMF-Fragments, 
e.g. on mongodb) 
srcrepo runtime 
(headless eclipse RCP) 
source code 
repository 
(e.g. controlled 
by Git, SVN, CVS) 
source code 
(e.g. java, C++, 
eclipse*) 
issue tracker, mailing lists, wiki 
software projects 
C! 
1 
2 3 
A" B# A! 
AST-models of new and 
changed CUs 
import 
1 
revision tree 
2 3 
S! S" S" 
fully resolved snapshot 
models 
A! A!B" 
B"C# A# 
analysis 
Tuesday, 30. September 2014
5 
repository 
(e.g. controlled 
by Git, SVN, CVS) 
source code 
(e.g. java, C++, 
eclipse*) 
issue tracker, mailing lists, wiki 
C! 
1 
2 3 
A" B# A! 
AST-models of new and 
changed CUs 
import 
1 
2 3 
S! S" S" 
fully resolved snapshot 
models 
A! A!B" 
B"C# A# 
analysis 
Tuesday, 30. September 2014
5 
repository 
(e.g. controlled 
by Git, SVN, CVS) 
source code 
(e.g. java, C++, 
eclipse*) 
issue tracker, mailing lists, wiki 
C! 
1 
2 3 
A" B# A! 
AST-models of new and 
changed CUs 
import 
1 
2 3 
S! S" S" 
fully resolved snapshot 
models 
A! A!B" 
B"C# A# 
analysis OCL 
M! M" M# 
Tuesday, 30. September 2014
5 
repository 
(e.g. controlled 
by Git, SVN, CVS) 
source code 
(e.g. java, C++, 
eclipse*) 
issue tracker, mailing lists, wiki 
C! 
1 
2 3 
A" B# A! 
AST-models of new and 
changed CUs 
import 
1 
2 3 
S! S" S" 
fully resolved snapshot 
models 
A! A!B" 
B"C# A# 
analysis 
OCL EMF-Compare 
D!"# 
M!"# 
D#"$ 
M#"$ 
OCL 
M! M" M# 
Tuesday, 30. September 2014
5 
repository 
(e.g. controlled 
by Git, SVN, CVS) 
source code 
(e.g. java, C++, 
eclipse*) 
issue tracker, mailing lists, wiki 
C! 
1 
2 3 
A" B# A! 
AST-models of new and 
changed CUs 
import 
1 
2 3 
S! S" S" 
fully resolved snapshot 
models 
A! A!B" 
B"C# A# 
analysis 
M! 
M!"# 
M# M# 
M#"$ 
store 
timelines of metrics 
OCL EMF-Compare 
D!"# 
M!"# 
D#"$ 
M#"$ 
OCL 
M! M" M# 
Tuesday, 30. September 2014
5 
repository 
(e.g. controlled 
by Git, SVN, CVS) 
source code 
(e.g. java, C++, 
eclipse*) 
issue tracker, mailing lists, wiki 
statistics software 
(e.g. R, Matlab) 
C! 
1 
2 3 
A" B# A! 
AST-models of new and 
changed CUs 
import 
1 
2 3 
S! S" S" 
fully resolved snapshot 
models 
A! A!B" 
B"C# A# 
analysis 
M! 
M!"# 
M# M# 
M#"$ 
store 
timelines of metrics 
export 
OCL EMF-Compare 
D!"# 
M!"# 
D#"$ 
M#"$ 
OCL 
M! M" M# 
Tuesday, 30. September 2014
Distributed Model Store with emf-fragments 
6 
normal references 
* 
Tuesday, 30. September 2014
Distributed Model Store with emf-fragments 
6 
normal references 
* 
fragmenting references 
«fragments» 
* 
Tuesday, 30. September 2014
A Meta-Model for Source Code Repositories 
7 
code revision tree 
» 
«fragments» 
Tuesday, 30. September 2014
8 
MoDisco source code revision tree 
«fragments» 
Tuesday, 30. September 2014
A OCL-like internal Scala DSL for Computing Metrics 
▶ OCL-like internal Scala DSL analog to our internal Scala 
model transformation language [1] 
▶ OCL collection operations mapped to Scala’s higher-order 
fuctions [2]: 
1. L. George, A. Wider, M. Scheidgen: Type-Safe Model Transformation Languages as Internal DSLs in Scala; Theory and Practice of Model 
Transformations - 5th International Conference, ICMT; 2012 
2.Filip Krikava: Enrichting EMF Models with Scala; Slideshare 
9 
Tuesday, 30. September 2014
A OCL-like internal Scala DSL for Computing Metrics 
ownedPackages 
▶ OCL-like internal Scala Model DSL analog Package 
to our internal Scala 
model transformation language [1] 
▶ OCL collection operations mapped to Scala’s higher-order 
fuctions [2]: 
9 
pure OCL 
context Model: 
self.ownedElements->collect(p|p.ownedElements)->size 
Abstract 
Type 
Declaration 
ownedElements 
ownedElements 
* 
* 
* 
1. L. George, A. Wider, M. Scheidgen: Type-Safe Model Transformation Languages as Internal DSLs in Scala; Theory and Practice of Model 
Transformations - 5th International Conference, ICMT; 2012 
2.Filip Krikava: Enrichting EMF Models with Scala; Slideshare 
Tuesday, 30. September 2014
A OCL-like internal Scala DSL for Computing Metrics 
Abstract 
Type 
Declaration 
ownedPackages 
* 
▶ OCL-like internal Scala DSL ownedElements 
Model analog Package 
to our internal Scala 
model transformation language [1] 
▶ OCL collection operations mapped to Scala’s higher-order 
fuctions [2]: 
9 
pure OCL 
context Model: 
* 
ownedElements 
* 
self.ownedElements->collect(p|p.ownedElements)->size 
OCL-like expression in Scala 
def numberOfFirstPackageLevelTypes(self: Model): Int = 
self.getOwnedElements().collect(p=>p.getOwnedElements()).size() 
1. L. George, A. Wider, M. Scheidgen: Type-Safe Model Transformation Languages as Internal DSLs in Scala; Theory and Practice of Model 
Transformations - 5th International Conference, ICMT; 2012 
2.Filip Krikava: Enrichting EMF Models with Scala; Slideshare 
Tuesday, 30. September 2014
A OCL-like internal Scala DSL for Computing Metrics 
▶ Extending OCL’s collection operations: 
■ convenience operations 
■ closure 
■ aggregation 
■ execution 
trait OclCollection[E] extends java.lang.Iterable[E] 
{ 
def size(): Int 
def first(): E 
def exists(predicate: (E) => Boolean): Boolean 
def forAll(predicate: (E) => Boolean): Boolean 
def select(predicate: (E) => Boolean): OclCollection[E] 
def reject(predicate: (E) => Boolean): OclCollection[E] 
def collect[R](expr: (E) => R): OclCollection[R] 
def selectOfType[T]:OclCollection[T] 
def collectNotNull[R](expr: (E) => R): OclCollection[R] 
def collectAll[R](expr: (E) => OclCollection[R]): OclCollection[R] 
def closure(expr: (E) => OclCollection[10 
E]): OclCollection[E] 
123456789 
10 
11 
12 
13 
14 
15 
16 
Tuesday, 30. September 2014
trait OclCollection[E] extends java.lang.Iterable[E] 
{ 
def size(): Int 
def first(): E 
def exists(predicate: (E) => Boolean): Boolean 
def forAll(predicate: (E) => Boolean): Boolean 
def select(predicate: (E) => Boolean): OclCollection[E] 
def reject(predicate: (E) => Boolean): OclCollection[E] 
def collect[R](expr: (E) => R): OclCollection[R] 
def selectOfType[T]:OclCollection[T] 
def collectNotNull[R](expr: (E) => R): OclCollection[R] 
def collectAll[R](expr: (E) => OclCollection[R]): OclCollection[R] 
def closure(expr: (E) => OclCollection[E]): OclCollection[E] 
def aggregate[R,I](expr: (E) => I, start: () => R, aggr: (R, I) => R): R 
def sum(expr: (E) => Double): Double 
def product(expr: (E) => Double): Double 
def max(expr: (E) => Double): Double 
def min(expr: (E) => Double): Double 
def stats(expr: (E) => Double): Stats 
def run(runnable: (E) => Unit): Unit 
} 
11 
123456789 
10 
11 
12 
13 
14 
15 
16 
17 
18 
19 
20 
21 
22 
23 
24 
25 
Tuesday, 30. September 2014
Complex Example: Average Weighted Methods per Class (WMC) 
▶ WMC is the first CK-metric [1]. There different commonly 
used weights; here we use cyclomatic complexity. 
def classes(model:Model):OclCollection[ClassDeclaration] = 
model.getOwnedElements() 
.collectClosure(pkg=>pkg.getOwnedPackages()) 
.collectAll(pkg=>pkg.getOwnedElements()) 
.collectClosure(typeDcl=> 
typeDcl.getBodyDeclarations() 
.selectOfType[ClassDeclaration]) 
12 
def WMC(model:Model):Double = 
classes(model).stats(clazz=> 
clazz.getBodyDeclarations() 
.selectOfType[MethodDeclaration]() 
.sum(method=>cyclmaticComplexity(method))).average 
def cyclomaticComplexity(method:MethodDeclaration):Int = 
... 
123456789 
10 
11 
12 
13 
14 
15 
16 
1. S.R. Chidamber, C.F. Kemerer: A Metrics Suite for Object Oriented Design; IEEE Transactions on Software Eng.; Vol.20/Nr.6/1994 
Tuesday, 30. September 2014
Complex Example: Average Weighted Methods per Class (WMC) 
▶ WMC is the first CK-metric [1]. There different commonly 
used weights; here we use cyclomatic complexity. 
def classes(model:Model):OclCollection[ClassDeclaration] = 
model.getOwnedElements() 
.collectClosure(pkg=>pkg.getOwnedPackages()) 
.collectAll(pkg=>pkg.getOwnedElements()) 
.collectClosure(typeDcl=> 
typeDcl.getBodyDeclarations() 
.selectOfType[ClassDeclaration]) 
Model 
Abstract 
Body 
Declaration 
12 
def WMC(model:Model):Double = 
classes(model).stats(clazz=> 
clazz.getBodyDeclarations() 
Package 
ownedPackages 
.selectOfType[MethodDeclaration]() 
.sum(method=>cyclmaticComplexity(method))).average 
Abstract 
Type 
Declaration 
def cyclomaticComplexity(method:MethodDeclaration):Int = 
... 
123456789 
10 
11 
12 
13 
14 
15 
16 
Class 
Declaration 
1 
1. S.R. Chidamber, C.F. Kemerer: A Metrics Suite for Object Oriented Design; IEEE Transactions on Software Eng.; Vol.20/Nr.6/1994 
TypeAccess 
ownedElements 
ownedElements 
bodyDeclarations 
superInterfaces 
superClass 
returnType 
type 
usagesInTypeAccess 
* 
* * 
* 
* * 
1 
1 
Tuesday, 30. September 2014
Model 
Package 
ownedElements 
Complex Example: Average Weighted Methods * per Class * 
ownedPackages 
(WMC) 
ownedElements 
* 
▶ WMC is the first CK-metric [1]. There different Abstract 
commonly 
Type 
used weights; here we use cyclomatic complexity. 
Declaration 
bodyDeclarations 
type 
* 
1 
def classes(model:Model):OclCollection[Abstract 
ClassDeclaration] = 
model.getOwnedElements() 
Body 
usagesInTypeAccess 
Declaration 
.collectClosure(pkg=>pkg.getOwnedPackages()) 
.collectAll(pkg=>pkg.getOwnedElements()) 
.collectClosure(typeDcl=> 
typeDcl.getBodyDeclarations() 
.selectOfType[ClassDeclaration]) 
12 
def WMC(model:Model):Double = 
classes(model).stats(clazz=> 
clazz.getBodyDeclarations() 
superInterfaces 
* * 
TypeAccess 
1 
returnType 
1 
Method 
Declaration 
superClass 
.selectOfType[MethodDeclaration]() 
.sum(method=>cyclmaticComplexity(method))).average 
def cyclomaticComplexity(method:MethodDeclaration):Int = 
... 
123456789 
10 
11 
12 
13 
14 
15 
16 
Class 
Declaration 
Abstract 
Method 
Invocation 
method 
1 
1. S.R. Chidamber, C.F. Kemerer: A Metrics Suite for Object Oriented Design; IEEE Transactions on Software Eng.; Vol.20/Nr.6/1994 
Tuesday, 30. September 2014
Implementation of the OCL-Collection Operations 
▶ Just in time iterator-based implementation rather than 
straight forward aggregation of result collections. 
13 
:RepositoryModel 
:Rev :Rev :Rev 
:Par... :Par... :Par... :Par... :Par... :Par... 
:Di! :Di! :Di! :Di! :Di! :Di! :Di! :Di! :Di! :Di! 
1 
2 
3 
4 
Tuesday, 30. September 2014
Future Work, Remaining Problems, and Limitations 
14 
Scalability 
■very large compilation units 
■incremental snapshot creation 
■batching OCL execution 
■experiments with large scale 
repository (e.g. git.eclipse.org) 
Heterogeneity 
■MoDisco for different 
programming languages 
■common metrics meta-model 
(e.g. OMG, KDM) 
■VCS abstraction and support for 
different VCS 
Accessibility 
■relating results to 
software repository 
entities 
■persisting and 
exporting results 
Information Depth 
■diff-models from 
comparison of 
compilation 
units 
▶ Very large compilation units (CU): e.g. a 3 MB, 600 kLOC CU in org.eclipse.emf 
■ tends to have lots of dependencies ➞ changes often ➞ makes problem even bigger 
■ CUs are smallest common denominator between text-based VCS view and syntax-based 
AST view 
■ smaller units require model-comparison or text-to-AST mappings 
▶ Support for different programming languages: either abstraction, parallel meta-models, or 
mixed approach 
■ MoDisco is extendable, but only Java support exists; other languages need to be 
implemented ➞ parallel meta-models 
■ A reasonable abstraction for multiple (or all) programming language probably does 
not exist. 
■ A shared abstract meta-model that all language meta-models extends could be an 
sensible compromise. 
Tuesday, 30. September 2014
15 
Tuesday, 30. September 2014

Weitere ähnliche Inhalte

Kürzlich hochgeladen

%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
masabamasaba
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
masabamasaba
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
Health
 
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
masabamasaba
 
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
masabamasaba
 

Kürzlich hochgeladen (20)

%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go Platformless
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
 
WSO2CON 2024 Slides - Open Source to SaaS
WSO2CON 2024 Slides - Open Source to SaaSWSO2CON 2024 Slides - Open Source to SaaS
WSO2CON 2024 Slides - Open Source to SaaS
 
WSO2Con204 - Hard Rock Presentation - Keynote
WSO2Con204 - Hard Rock Presentation - KeynoteWSO2Con204 - Hard Rock Presentation - Keynote
WSO2Con204 - Hard Rock Presentation - Keynote
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
WSO2CON 2024 - How to Run a Security Program
WSO2CON 2024 - How to Run a Security ProgramWSO2CON 2024 - How to Run a Security Program
WSO2CON 2024 - How to Run a Security Program
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
 
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
 
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open SourceWSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
WSO2CON 2024 - Freedom First—Unleashing Developer Potential with Open Source
 
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
 
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
 
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
 
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
 
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
 

Empfohlen

How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
ThinkNow
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
Kurio // The Social Media Age(ncy)
 

Empfohlen (20)

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPT
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage Engineerings
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 

Model-based Mining of Software Repositories

  • 1. Model-based Mining of Source Code Repositories Markus Scheidgen, Joachim Fischer {scheidge,fischer}@informatik.hu-berlin.de 1 Tuesday, 30. September 2014
  • 2. Agenda ▶ Mining Software Repositories (MSR) and Software Evolution Research (SER) ▶ srcrepo – a model-based MSR system ■ components and mining process ■ distributed storage of very large models with emf-fragments ■ a meta-model for source code repositories ■ gathering software metrics with an OCL-like internal Scala DSL ▶ work in progress - discussion of remaining problems and limitations 2 Tuesday, 30. September 2014
  • 3. Relevant Research Fields 1. H. Kagdi, M.L. Collard, J.I. Maletic: A survey and taxonomy of approaches for mining software repositories in the context of software evolution; Journal of Software Maintenance and Evolution: Research and Practice; Vol.19/Nr.2/2007 2.R. Lincke, J. Lundberg, W. Löwe: Comparing Software Metrics Tools; 8th International Symposium on Software Testing and Analysis; 2008 3.E.J. Chikofsky, J.H. Cross: Reverse engineering and design recovery: A taxonomy; IEEE Software; Vol.7/Nr.1/1990 3 Tuesday, 30. September 2014
  • 4. Relevant Research Fields 3 Mining Software Repositories (MSR) The term mining software repositories (MSR) has been coined to describe a broad class of techniques for the examination of software repositories. The premise of MSR is that empirical and systematic investigations of repositories will shed new light on the process of software evolution. [1] Software Metrics A software metric is a mathematical definition mapping the entities of a software system to numeric metrics values. [...] to express features of software with numbers in order to facilitate software quality assessment. [2] Reverse Engineering Reverse engineering is the process of analyzing a subject system to (1) identify the system’s components and their interrelationships and (2) create representations of the system in another form or at a higher level of abstraction [3] Software Evolution Research (SER) ■(dis-)proving Lehmann’s Laws of software evolution ■empirical investigations of software repositories through statistical analysis of software metrics and software change metrics over the evolutionary cause of many software systems. Model-based Mining Software Repositories (with srcrepo) Overcoming heterogeneity and accessibility by raising the level of abstraction, while ensuring scalability and retaining meaningful information depth. 1. H. Kagdi, M.L. Collard, J.I. Maletic: A survey and taxonomy of approaches for mining software repositories in the context of software evolution; Journal of Software Maintenance and Evolution: Research and Practice; Vol.19/Nr.2/2007 2.R. Lincke, J. Lundberg, W. Löwe: Comparing Software Metrics Tools; 8th International Symposium on Software Testing and Analysis; 2008 3.E.J. Chikofsky, J.H. Cross: Reverse engineering and design recovery: A taxonomy; IEEE Software; Vol.7/Nr.1/1990 Tuesday, 30. September 2014
  • 5. Relevant Research Fields 3 Mining Software Repositories (MSR) The term mining software repositories (MSR) has been coined to describe a broad class of techniques for the examination of software repositories. The premise of MSR is that empirical and systematic investigations of repositories will shed new light on the process of software evolution. [1] Software Metrics A software metric is a mathematical definition mapping the entities of a software system to numeric metrics values. [...] to express features of software with numbers in order to facilitate software quality assessment. [2] Reverse Engineering Reverse engineering is the process of analyzing a subject system to (1) identify the system’s components and their interrelationships and (2) create representations of the system in another form or at a higher level of abstraction [3] Software Evolution Research (SER) ■(dis-)proving Lehmann’s Laws of software evolution ■empirical investigations of software repositories through statistical analysis of software metrics and software change metrics over the evolutionary cause of many software systems. Model-based Mining Software Repositories (with srcrepo) Overcoming heterogeneity and accessibility by raising the level of abstraction, while ensuring scalability and retaining meaningful information depth. 1. H. Kagdi, M.L. Collard, J.I. Maletic: A survey and taxonomy of approaches for mining software repositories in the context of software evolution; Journal of Software Maintenance and Evolution: Research and Practice; Vol.19/Nr.2/2007 2.R. Lincke, J. Lundberg, W. Löwe: Comparing Software Metrics Tools; 8th International Symposium on Software Testing and Analysis; 2008 3.E.J. Chikofsky, J.H. Cross: Reverse engineering and design recovery: A taxonomy; IEEE Software; Vol.7/Nr.1/1990 Tuesday, 30. September 2014
  • 6. Relevant Research Fields 3 Mining Software Repositories (MSR) The term mining software repositories (MSR) has been coined to describe a broad class of techniques for the examination of software repositories. The premise of MSR is that empirical and systematic investigations of repositories will shed new light on the process of software evolution. [1] Software Metrics A software metric is a mathematical definition mapping the entities of a software system to numeric metrics values. [...] to express features of software with numbers in order to facilitate software quality assessment. [2] Reverse Engineering Reverse engineering is the process of analyzing a subject system to (1) identify the system’s components and their interrelationships and (2) create representations of the system in another form or at a higher level of abstraction [3] Software Evolution Research (SER) ■(dis-)proving Lehmann’s Laws of software evolution ■empirical investigations of software repositories through statistical analysis of software metrics and software change metrics over the evolutionary cause of many software systems. Model-based Mining Software Repositories (with srcrepo) Overcoming heterogeneity and accessibility by raising the level of abstraction, while ensuring scalability and retaining meaningful information depth. 1. H. Kagdi, M.L. Collard, J.I. Maletic: A survey and taxonomy of approaches for mining software repositories in the context of software evolution; Journal of Software Maintenance and Evolution: Research and Practice; Vol.19/Nr.2/2007 2.R. Lincke, J. Lundberg, W. Löwe: Comparing Software Metrics Tools; 8th International Symposium on Software Testing and Analysis; 2008 3.E.J. Chikofsky, J.H. Cross: Reverse engineering and design recovery: A taxonomy; IEEE Software; Vol.7/Nr.1/1990 Tuesday, 30. September 2014
  • 7. Relevant Research Fields 3 Mining Software Repositories (MSR) The term mining software repositories (MSR) has been coined to describe a broad class of techniques for the examination of software repositories. The premise of MSR is that empirical and systematic investigations of repositories will shed new light on the process of software evolution. [1] Software Metrics A software metric is a mathematical definition mapping the entities of a software system to numeric metrics values. [...] to express features of software with numbers in order to facilitate software quality assessment. [2] Reverse Engineering Reverse engineering is the process of analyzing a subject system to (1) identify the system’s components and their interrelationships and (2) create representations of the system in another form or at a higher level of abstraction [3] Software Evolution Research (SER) ■(dis-)proving Lehmann’s Laws of software evolution ■empirical investigations of software repositories through statistical analysis of software metrics and software change metrics over the evolutionary cause of many software systems. Model-based Mining Software Repositories (with srcrepo) Overcoming heterogeneity and accessibility by raising the level of abstraction, while ensuring scalability and retaining meaningful information depth. 1. H. Kagdi, M.L. Collard, J.I. Maletic: A survey and taxonomy of approaches for mining software repositories in the context of software evolution; Journal of Software Maintenance and Evolution: Research and Practice; Vol.19/Nr.2/2007 2.R. Lincke, J. Lundberg, W. Löwe: Comparing Software Metrics Tools; 8th International Symposium on Software Testing and Analysis; 2008 3.E.J. Chikofsky, J.H. Cross: Reverse engineering and design recovery: A taxonomy; IEEE Software; Vol.7/Nr.1/1990 Tuesday, 30. September 2014
  • 8. Relevant Research Fields 3 Mining Software Repositories (MSR) The term mining software repositories (MSR) has been coined to describe a broad class of techniques for the examination of software repositories. The premise of MSR is that empirical and systematic investigations of repositories will shed new light on the process of software evolution. [1] Software Metrics A software metric is a mathematical definition mapping the entities of a software system to numeric metrics values. [...] to express features of software with numbers in order to facilitate software quality assessment. [2] Reverse Engineering Reverse engineering is the process of analyzing a subject system to (1) identify the system’s components and their interrelationships and (2) create representations of the system in another form or at a higher level of abstraction [3] Software Evolution Research (SER) ■(dis-)proving Lehmann’s Laws of software evolution ■empirical investigations of software repositories through statistical analysis of software metrics and software change metrics over the evolutionary cause of many software systems. Model-based Mining Software Repositories (with srcrepo) Overcoming heterogeneity and accessibility by raising the level of abstraction, while ensuring scalability and retaining meaningful information depth. 1. H. Kagdi, M.L. Collard, J.I. Maletic: A survey and taxonomy of approaches for mining software repositories in the context of software evolution; Journal of Software Maintenance and Evolution: Research and Practice; Vol.19/Nr.2/2007 2.R. Lincke, J. Lundberg, W. Löwe: Comparing Software Metrics Tools; 8th International Symposium on Software Testing and Analysis; 2008 3.E.J. Chikofsky, J.H. Cross: Reverse engineering and design recovery: A taxonomy; IEEE Software; Vol.7/Nr.1/1990 Tuesday, 30. September 2014
  • 9. Relevant Research Fields 3 Mining Software Repositories (MSR) The term mining software repositories (MSR) has been coined to describe a broad class of techniques for the examination of software repositories. The premise of MSR is that empirical and systematic investigations of repositories will shed new light on the process of software evolution. [1] Software Metrics A software metric is a mathematical definition mapping the entities of a software system to numeric metrics values. [...] to express features of software with numbers in order to facilitate software quality assessment. [2] Reverse Engineering Reverse engineering is the process of analyzing a subject system to (1) identify the system’s components and their interrelationships and (2) create representations of the system in another form or at a higher level of abstraction [3] Software Evolution Research (SER) ■(dis-)proving Lehmann’s Laws of software evolution ■empirical investigations of software repositories through statistical analysis of software metrics and software change metrics over the evolutionary cause of many software systems. Model-based Mining Software Repositories (with srcrepo) Overcoming heterogeneity and accessibility by raising the level of abstraction, while ensuring scalability and retaining meaningful information depth. 1. H. Kagdi, M.L. Collard, J.I. Maletic: A survey and taxonomy of approaches for mining software repositories in the context of software evolution; Journal of Software Maintenance and Evolution: Research and Practice; Vol.19/Nr.2/2007 2.R. Lincke, J. Lundberg, W. Löwe: Comparing Software Metrics Tools; 8th International Symposium on Software Testing and Analysis; 2008 3.E.J. Chikofsky, J.H. Cross: Reverse engineering and design recovery: A taxonomy; IEEE Software; Vol.7/Nr.1/1990 Tuesday, 30. September 2014
  • 10. srcrepo – Components and Process 4 Tuesday, 30. September 2014
  • 11. srcrepo – Components and Process 4 large scale software repositories (e.g. github, sourceforge) source code repository (e.g. controlled by Git, SVN, CVS) source code (e.g. java, C++, eclipse*) issue tracker, mailing lists, wiki software projects Tuesday, 30. September 2014
  • 12. srcrepo – Components and Process revision tree 4 large scale software repositories (e.g. github, sourceforge) srcrepo storage (EMF-models via EMF-Fragments, e.g. on mongodb) source code repository (e.g. controlled by Git, SVN, CVS) source code (e.g. java, C++, eclipse*) issue tracker, mailing lists, wiki software projects C! 1 2 3 A" B# A! AST-models of new and changed CUs import Tuesday, 30. September 2014
  • 13. srcrepo – Components and Process revision tree 4 large scale software repositories (e.g. github, sourceforge) srcrepo storage (EMF-models via EMF-Fragments, e.g. on mongodb) srcrepo runtime (headless eclipse RCP) source code repository (e.g. controlled by Git, SVN, CVS) source code (e.g. java, C++, eclipse*) issue tracker, mailing lists, wiki software projects C! 1 2 3 A" B# A! AST-models of new and changed CUs import 1 revision tree 2 3 S! S" S" fully resolved snapshot models A! A!B" B"C# A# analysis Tuesday, 30. September 2014
  • 14. 5 repository (e.g. controlled by Git, SVN, CVS) source code (e.g. java, C++, eclipse*) issue tracker, mailing lists, wiki C! 1 2 3 A" B# A! AST-models of new and changed CUs import 1 2 3 S! S" S" fully resolved snapshot models A! A!B" B"C# A# analysis Tuesday, 30. September 2014
  • 15. 5 repository (e.g. controlled by Git, SVN, CVS) source code (e.g. java, C++, eclipse*) issue tracker, mailing lists, wiki C! 1 2 3 A" B# A! AST-models of new and changed CUs import 1 2 3 S! S" S" fully resolved snapshot models A! A!B" B"C# A# analysis OCL M! M" M# Tuesday, 30. September 2014
  • 16. 5 repository (e.g. controlled by Git, SVN, CVS) source code (e.g. java, C++, eclipse*) issue tracker, mailing lists, wiki C! 1 2 3 A" B# A! AST-models of new and changed CUs import 1 2 3 S! S" S" fully resolved snapshot models A! A!B" B"C# A# analysis OCL EMF-Compare D!"# M!"# D#"$ M#"$ OCL M! M" M# Tuesday, 30. September 2014
  • 17. 5 repository (e.g. controlled by Git, SVN, CVS) source code (e.g. java, C++, eclipse*) issue tracker, mailing lists, wiki C! 1 2 3 A" B# A! AST-models of new and changed CUs import 1 2 3 S! S" S" fully resolved snapshot models A! A!B" B"C# A# analysis M! M!"# M# M# M#"$ store timelines of metrics OCL EMF-Compare D!"# M!"# D#"$ M#"$ OCL M! M" M# Tuesday, 30. September 2014
  • 18. 5 repository (e.g. controlled by Git, SVN, CVS) source code (e.g. java, C++, eclipse*) issue tracker, mailing lists, wiki statistics software (e.g. R, Matlab) C! 1 2 3 A" B# A! AST-models of new and changed CUs import 1 2 3 S! S" S" fully resolved snapshot models A! A!B" B"C# A# analysis M! M!"# M# M# M#"$ store timelines of metrics export OCL EMF-Compare D!"# M!"# D#"$ M#"$ OCL M! M" M# Tuesday, 30. September 2014
  • 19. Distributed Model Store with emf-fragments 6 normal references * Tuesday, 30. September 2014
  • 20. Distributed Model Store with emf-fragments 6 normal references * fragmenting references «fragments» * Tuesday, 30. September 2014
  • 21. A Meta-Model for Source Code Repositories 7 code revision tree » «fragments» Tuesday, 30. September 2014
  • 22. 8 MoDisco source code revision tree «fragments» Tuesday, 30. September 2014
  • 23. A OCL-like internal Scala DSL for Computing Metrics ▶ OCL-like internal Scala DSL analog to our internal Scala model transformation language [1] ▶ OCL collection operations mapped to Scala’s higher-order fuctions [2]: 1. L. George, A. Wider, M. Scheidgen: Type-Safe Model Transformation Languages as Internal DSLs in Scala; Theory and Practice of Model Transformations - 5th International Conference, ICMT; 2012 2.Filip Krikava: Enrichting EMF Models with Scala; Slideshare 9 Tuesday, 30. September 2014
  • 24. A OCL-like internal Scala DSL for Computing Metrics ownedPackages ▶ OCL-like internal Scala Model DSL analog Package to our internal Scala model transformation language [1] ▶ OCL collection operations mapped to Scala’s higher-order fuctions [2]: 9 pure OCL context Model: self.ownedElements->collect(p|p.ownedElements)->size Abstract Type Declaration ownedElements ownedElements * * * 1. L. George, A. Wider, M. Scheidgen: Type-Safe Model Transformation Languages as Internal DSLs in Scala; Theory and Practice of Model Transformations - 5th International Conference, ICMT; 2012 2.Filip Krikava: Enrichting EMF Models with Scala; Slideshare Tuesday, 30. September 2014
  • 25. A OCL-like internal Scala DSL for Computing Metrics Abstract Type Declaration ownedPackages * ▶ OCL-like internal Scala DSL ownedElements Model analog Package to our internal Scala model transformation language [1] ▶ OCL collection operations mapped to Scala’s higher-order fuctions [2]: 9 pure OCL context Model: * ownedElements * self.ownedElements->collect(p|p.ownedElements)->size OCL-like expression in Scala def numberOfFirstPackageLevelTypes(self: Model): Int = self.getOwnedElements().collect(p=>p.getOwnedElements()).size() 1. L. George, A. Wider, M. Scheidgen: Type-Safe Model Transformation Languages as Internal DSLs in Scala; Theory and Practice of Model Transformations - 5th International Conference, ICMT; 2012 2.Filip Krikava: Enrichting EMF Models with Scala; Slideshare Tuesday, 30. September 2014
  • 26. A OCL-like internal Scala DSL for Computing Metrics ▶ Extending OCL’s collection operations: ■ convenience operations ■ closure ■ aggregation ■ execution trait OclCollection[E] extends java.lang.Iterable[E] { def size(): Int def first(): E def exists(predicate: (E) => Boolean): Boolean def forAll(predicate: (E) => Boolean): Boolean def select(predicate: (E) => Boolean): OclCollection[E] def reject(predicate: (E) => Boolean): OclCollection[E] def collect[R](expr: (E) => R): OclCollection[R] def selectOfType[T]:OclCollection[T] def collectNotNull[R](expr: (E) => R): OclCollection[R] def collectAll[R](expr: (E) => OclCollection[R]): OclCollection[R] def closure(expr: (E) => OclCollection[10 E]): OclCollection[E] 123456789 10 11 12 13 14 15 16 Tuesday, 30. September 2014
  • 27. trait OclCollection[E] extends java.lang.Iterable[E] { def size(): Int def first(): E def exists(predicate: (E) => Boolean): Boolean def forAll(predicate: (E) => Boolean): Boolean def select(predicate: (E) => Boolean): OclCollection[E] def reject(predicate: (E) => Boolean): OclCollection[E] def collect[R](expr: (E) => R): OclCollection[R] def selectOfType[T]:OclCollection[T] def collectNotNull[R](expr: (E) => R): OclCollection[R] def collectAll[R](expr: (E) => OclCollection[R]): OclCollection[R] def closure(expr: (E) => OclCollection[E]): OclCollection[E] def aggregate[R,I](expr: (E) => I, start: () => R, aggr: (R, I) => R): R def sum(expr: (E) => Double): Double def product(expr: (E) => Double): Double def max(expr: (E) => Double): Double def min(expr: (E) => Double): Double def stats(expr: (E) => Double): Stats def run(runnable: (E) => Unit): Unit } 11 123456789 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 Tuesday, 30. September 2014
  • 28. Complex Example: Average Weighted Methods per Class (WMC) ▶ WMC is the first CK-metric [1]. There different commonly used weights; here we use cyclomatic complexity. def classes(model:Model):OclCollection[ClassDeclaration] = model.getOwnedElements() .collectClosure(pkg=>pkg.getOwnedPackages()) .collectAll(pkg=>pkg.getOwnedElements()) .collectClosure(typeDcl=> typeDcl.getBodyDeclarations() .selectOfType[ClassDeclaration]) 12 def WMC(model:Model):Double = classes(model).stats(clazz=> clazz.getBodyDeclarations() .selectOfType[MethodDeclaration]() .sum(method=>cyclmaticComplexity(method))).average def cyclomaticComplexity(method:MethodDeclaration):Int = ... 123456789 10 11 12 13 14 15 16 1. S.R. Chidamber, C.F. Kemerer: A Metrics Suite for Object Oriented Design; IEEE Transactions on Software Eng.; Vol.20/Nr.6/1994 Tuesday, 30. September 2014
  • 29. Complex Example: Average Weighted Methods per Class (WMC) ▶ WMC is the first CK-metric [1]. There different commonly used weights; here we use cyclomatic complexity. def classes(model:Model):OclCollection[ClassDeclaration] = model.getOwnedElements() .collectClosure(pkg=>pkg.getOwnedPackages()) .collectAll(pkg=>pkg.getOwnedElements()) .collectClosure(typeDcl=> typeDcl.getBodyDeclarations() .selectOfType[ClassDeclaration]) Model Abstract Body Declaration 12 def WMC(model:Model):Double = classes(model).stats(clazz=> clazz.getBodyDeclarations() Package ownedPackages .selectOfType[MethodDeclaration]() .sum(method=>cyclmaticComplexity(method))).average Abstract Type Declaration def cyclomaticComplexity(method:MethodDeclaration):Int = ... 123456789 10 11 12 13 14 15 16 Class Declaration 1 1. S.R. Chidamber, C.F. Kemerer: A Metrics Suite for Object Oriented Design; IEEE Transactions on Software Eng.; Vol.20/Nr.6/1994 TypeAccess ownedElements ownedElements bodyDeclarations superInterfaces superClass returnType type usagesInTypeAccess * * * * * * 1 1 Tuesday, 30. September 2014
  • 30. Model Package ownedElements Complex Example: Average Weighted Methods * per Class * ownedPackages (WMC) ownedElements * ▶ WMC is the first CK-metric [1]. There different Abstract commonly Type used weights; here we use cyclomatic complexity. Declaration bodyDeclarations type * 1 def classes(model:Model):OclCollection[Abstract ClassDeclaration] = model.getOwnedElements() Body usagesInTypeAccess Declaration .collectClosure(pkg=>pkg.getOwnedPackages()) .collectAll(pkg=>pkg.getOwnedElements()) .collectClosure(typeDcl=> typeDcl.getBodyDeclarations() .selectOfType[ClassDeclaration]) 12 def WMC(model:Model):Double = classes(model).stats(clazz=> clazz.getBodyDeclarations() superInterfaces * * TypeAccess 1 returnType 1 Method Declaration superClass .selectOfType[MethodDeclaration]() .sum(method=>cyclmaticComplexity(method))).average def cyclomaticComplexity(method:MethodDeclaration):Int = ... 123456789 10 11 12 13 14 15 16 Class Declaration Abstract Method Invocation method 1 1. S.R. Chidamber, C.F. Kemerer: A Metrics Suite for Object Oriented Design; IEEE Transactions on Software Eng.; Vol.20/Nr.6/1994 Tuesday, 30. September 2014
  • 31. Implementation of the OCL-Collection Operations ▶ Just in time iterator-based implementation rather than straight forward aggregation of result collections. 13 :RepositoryModel :Rev :Rev :Rev :Par... :Par... :Par... :Par... :Par... :Par... :Di! :Di! :Di! :Di! :Di! :Di! :Di! :Di! :Di! :Di! 1 2 3 4 Tuesday, 30. September 2014
  • 32. Future Work, Remaining Problems, and Limitations 14 Scalability ■very large compilation units ■incremental snapshot creation ■batching OCL execution ■experiments with large scale repository (e.g. git.eclipse.org) Heterogeneity ■MoDisco for different programming languages ■common metrics meta-model (e.g. OMG, KDM) ■VCS abstraction and support for different VCS Accessibility ■relating results to software repository entities ■persisting and exporting results Information Depth ■diff-models from comparison of compilation units ▶ Very large compilation units (CU): e.g. a 3 MB, 600 kLOC CU in org.eclipse.emf ■ tends to have lots of dependencies ➞ changes often ➞ makes problem even bigger ■ CUs are smallest common denominator between text-based VCS view and syntax-based AST view ■ smaller units require model-comparison or text-to-AST mappings ▶ Support for different programming languages: either abstraction, parallel meta-models, or mixed approach ■ MoDisco is extendable, but only Java support exists; other languages need to be implemented ➞ parallel meta-models ■ A reasonable abstraction for multiple (or all) programming language probably does not exist. ■ A shared abstract meta-model that all language meta-models extends could be an sensible compromise. Tuesday, 30. September 2014
  • 33. 15 Tuesday, 30. September 2014