Mining Metrics for Understanding Metamodel Characteristics
1. Dipartimento di Ingegneria e Scienze
Università degli Studi dell’Aquila
dell’Informazione e Matematica
Mining Metrics for Understanding
Metamodel Characteristics
Alfonso Pierantonio
Juri Di Rocco
Davide Di Ruscio
Ludovico Iovino
3. Alfonso Pierantonio
3
Introduction
Metamodels are a key concept in Model-Driven Engineering:
they represent the «trait-d'union» among all constituent
components
Metamodels formally define the modeling primitives used in
modeling activities
4. Alfonso Pierantonio
4
Motivation
It is of crucial relevance to investigate common characteristics
of metamodels, in order to have a better understanding on
– how they evolve over time
– what is the impact of metamodel changes throughout the
modeling ecosystem
Metamodel
5. Alfonso Pierantonio
5
Motivation
It is of crucial relevance to investigate common
characteristics of metamodels, in order to have a
better understanding on
– how they evolve over time
– what is the impact of metamodel changes throughout the
modeling ecosystem
Analyse metamodels in order to evalutate their
structural characteristics and the impact they might
have during the whole metamodel life-cycle
especially in case of metamodel evolutions
7. Alfonso Pierantonio
7
Measuring Metamodels
The applied process is able to identify linked
structural characteristics
Understand how they might change depending on the
nature of metamodels
8. Alfonso Pierantonio
8
Metrics calculation
Consists of the application of metrics on a data set of
metamodels
The applied metrics are borrowed from [1] and we
added new ones by leading to a set of 28 metrics
An excerpt of the considered metrics have been
considered in rest of the presentation
[1] W. James, Z. Athansios, M. Nicholas, R. Louis, K. Dimitios, P. Richard, and P. Fiona. What do
metamodels really look like? Frontiers of Computer Science, 2013.
9. Alfonso Pierantonio
9
Metrics calculation
The corpus of the analyzed metamodels has been
obtained by retrieving metamodels from different
repositories, i.e., EMFText Zoo, ATLZoo, Github,
GoogleCode
For a total number of 466 metamodels belonging to
different technical spaces and domains
11. Alfonso Pierantonio
11
Metrics Calculation
The metrics calculation has been implemented by
exploiting a model-driven toolchainThe Metamodel Metrics Calculator
is able to calculate for each
metamodel all the considered
metrics
12. Alfonso Pierantonio
12
Metrics Calculation
The metrics calculation has been implemented by
exploiting a model-driven toolchain
The Metamodel Metrics Calculator
is an ATL transformation whose
target models conform to the
Metrics metamodel
13. Alfonso Pierantonio
13
Metrics Calculation
The metrics calculation has been implemented by
exploiting a model-driven toolchain
Generating CVS files enables the
adoption of statistical tools like
IBM SPSS, Microsoft Excel, and
Libreoffice Calc for subsequent
analysis of the generated data
14. Alfonso Pierantonio
14
Calculation of metrics correlations
Correlation is used to detect cross-links and assess
relationships among observed data
We have considered Pearson’s and Spearman’s
coefficients to measure the correlations among
calculated metamodel metrics
Spearman only for highlighting curvilinear
correlations
Both Pearson’s and Spearman’s correlation indexes
assume values in the range of -1.00 (perfect negative
correlation) and +1.00 (perfect positive correlation)
A correlation with value 0 indicates that between two
variables there is no correlation
15. Alfonso Pierantonio
15
Selection of metrics correlations
We have calculated the Pearson’s and Spearman’s
correlation indexes for all the values of the
considered metrics
For each couple we have selected the coefficient
index (between Pearson and Spearman) having
higher correlation values
16. Alfonso Pierantonio
16
Selection of metrics correlations
The bar chart shows the degree of metric correlation:
the higher the bar, the more the metric is correlated
with other metrics.
17. Alfonso Pierantonio
17
Pearson’s correlation matrix
All the values greater than 0.7 or lesser than -0.73
have been highlighted to select the metrics that are
most related according to the Pearson’s index.
18. Alfonso Pierantonio
18
Pearson’s correlation matrix
All the values greater than 0.7 or lesser than -0.73
have been highlighted to select the metrics that are
most related according to the Pearson’s index.
19. Alfonso Pierantonio
19
Data analysis
The aim is to discuss and interpret the most relevant
correlations between structural characteristics which
have been found in the previous stages
20. Alfonso Pierantonio
20
Data analysis: abstraction
How the number of metaclasses is related to the
adoption of abstraction constructs, ie. abstract
metaclasses and supertypes
21. Alfonso Pierantonio
21
Data analysis: abstraction
How the number of metaclasses is related to the
adoption of abstraction constructs, ie. abstract
metaclasses and supertypes
22. Alfonso Pierantonio
22
Data analysis: abstraction
How the number of metaclasses is related to the
adoption of abstraction constructs, ie. abstract
metaclasses and supertypes
MGHL
Maximum hierarchical depth
23. Alfonso Pierantonio
23
Data analysis: abstraction
How the number of metaclasses is related to the
adoption of abstraction constructs, ie. abstract
metaclasses and supertypes
MHS
Maximum Hierarchy Siblings
24. Alfonso Pierantonio
24
Data analysis: abstraction
How the number of metaclasses is related to the
adoption of abstraction constructs, ie. abstract
metaclasses and supertypes
MTCWS
Number of metaclasses
having at least one super
type
26. Alfonso Pierantonio
26
Data analysis: abstraction
The #metaclasses and the #classes with supertypes
are strongly correlated (with Pearson index 0.99)
When the metamodel grows in size, the number of
inheritance hierarchies also increases, however the
hierarchy depth is somehow independent from the
metamodel size.
Interestingly, metamodel designers prefer to add
siblings in hierarchies instead of adding new
hierarchy levels
27. Alfonso Pierantonio
27
Data analysis: abstraction
This is testified by the graph that shows the values of
the MHS (Max Hierarchy
Sibling) and MGHL (Max
generalization hierarchical
level) metrics
Confirmed by the Pearson
correlation indexes between
MC and MHS (0.70) and that
between MC and MGHL (0.66).
28. Alfonso Pierantonio
28
Data analysis: abstraction
In metamodels with at most 50 metaclasses:
– the number of supertypes in
hierarchy is in between 0 and
20
– the number of siblings in a
hierarchy is in between 0 and
10
– the maximum height of a
hierarchy is in between 0 and
5
29. Alfonso Pierantonio
29
Data analysis: structural features
How structural features are used with hierarchies
We can consider the average number of features
(ASF) and the total number of metaclasses with
supertypes (MCWS) metrics
The Spearman approach permits to identify a greater
correlation index
30. Alfonso Pierantonio
30
Data analysis: structural features
Increasing the #metaclasses with supertypes, the
average # of structural features in a metaclass
decreases
31. Alfonso Pierantonio
31
Data analysis: structural features
By considering metamodels having in between 1 and 50
metaclasses, the average number of features (excluding the
inherited ones) of a metaclass ranges between 1 and 5
32. Alfonso Pierantonio
32
Data analysis: structural features
How the number of featureless metaclasses is
related to hierarchies height
This can indicate how specializations of metaclasses
can introduce or reduce structural features in
metamodels
MCWS and IFLMC are strongly correlated as
supported by the Pearson’s index having value 0.890
34. Alfonso Pierantonio
34
Data analysis: structural features
By increasing the number of metaclasses with super
types, the number of metaclasses without attributes
or references increases too
This means that when hierarchies are introduced,
usually existing features are subject to refactoring
operations mainly to move them to super classes
and to create leaves in the hierarchies inheriting
features from the super types.
35. Alfonso Pierantonio
35
Data analysis: isolated metaclasses
How isolated metaclasses are distributed
In this case we considered another statistical
instrument named Pareto analysis, which is
commonly referred to as the 80-20 principle (20% of
the causes account for 80% of the defects)
About 80% of the analyzed metamodels have a
percentage of isolated metaclasses in the range 0-
19%, by testifying the fact that isolated metaclasses
are not commonly used
37. Alfonso Pierantonio
37
Conclusions
We proposed a number of metrics which can be used
to acquire objective, transparent, and reproducible
measurements of metamodels.
The major goal is to better understand the main
characteristic of metamodels, how they are coupled,
and how they change depending on the metamodel
structure
A correlation analysis has been performed to identify
the most cross-linked metrics, which have, in turn,
been computed over 450 metamodels
38. Alfonso Pierantonio
38
Conclusions
These figures have been discussed in detail
highlighting the followings
- the adoption of inheritance is proportional to the
size of metamodels
- the number of metaclasses with supertypes are
inversely proportional to the average number of
structural features
- the number of metaclasses with supertypes is
proportional to the number of metaclasses without
attributes or references
- isolated metaclasses are not commonly used
apart from testing or educational purposes.