Presentation of paper on "pitfalls in aspect mining" at the Working Conference on Reverse Engineering (WCRE), Antwerp, Belgium, 2008.
The research domain of aspect mining studies the problem of (semi-)automatically identifying potential aspects and crosscutting concerns in a software system, to improve the system’s comprehensibility or enable its migration to an aspect-oriented solution. Unfortunately, most proposed aspect mining techniques have not lived up to their expectations yet. In this paper we provide a list of problems that most aspect mining techniques suffer from and identify some of the root causes underlying these problems. Based upon this analysis, we conclude that many of the problems seem to be caused directly or indirectly by the use of inappropriate techniques, a lack of rigour and semantics on what is being mined for and how, and in how the results of the mining process are presented to the user.
1. Pitfalls in Aspect Mining
Prof. Kim Mens, Université catholique de Louvain, B-1348 Louvain-la-Neuve, Belgium (kim.mens@uclouvain.be)
Dr. Andy Kellens, Vrije Universiteit Brussel, Belgium (akellens@vub.ac.be)
Dr. Jens Krinke, King's College London, United Kingdom (krinke@acm.org)
WCRE 2008, 15th Working Conference on Reverse Engineering
October 15th – 18th, 2008, Antwerp, Belgium
2. What’s this paper doing here?
Reverse engineering is about "recovering information from existing software and systems"
WCRE studies innovative methods for extracting such information, and ways of using that information for system renovation and program understanding
Aspect mining tries to identify potential aspects and crosscutting concerns in existing software systems, in order to improve the system's comprehensibility or to enable its migration to an aspect-oriented solution
3. Why did we write this paper?
Partly out of frustration
Prior research on aspect mining
Co-authored ~8 papers since 2004, including some survey papers
Variety of techniques, based on FCA, clustering, clone detection, ...
No satisfactory results: why?
4. Our goal
Most proposed aspect mining techniques have not lived up to their expectations yet
Draw up a list of problems that most aspect mining techniques suffer from
Identify root causes underlying these problems
Provide suggestions for improvements
Moment of reflection on state of research in aspect mining
no big “surprises”
provide broader basis for discussion
5. Aspects in a nutshell
Implementing a notify/listener mechanism, the OO way:

    public abstract class Customer {
        private CustomerID id;
        private Collection listeners;
        ...
        public Address getAddress() {
            return this.address; }
        public void setLastName(String name) {
            this.lastName = name; }
        public void setCustomerID(String id) {
            this.id = id;
            notifyListeners();   // tangling: code in one region addresses multiple concerns
        }
        ...
    }

    public class PrivateCustomer {
        private String lastName;
        private String firstName;
        ...
        public void setLastName(String name) {
            this.lastName = name;
            notifyListeners();   // scattering: code addressing one concern is spread around the system
        }
        public void setFirstName(String name) {
            this.firstName = name;
            notifyListeners();
        }
    }

    public class CorporateCustomer {
        private String companyName;
        private CompanyName taxNumber;
        ...
        public void setCompanyName(String name) {
            this.companyName = name;
            notifyListeners();
        }
        public void setTaxNumber(String nr) {
            this.taxNumber = nr;
            notifyListeners();
        }
    }

    public class CustomerListener {
        public void notify(Customer modifiedCustomer) {
            System.out.println("Customer " + modifiedCustomer.getID() + " was modified");
        }
    }

The AO way, with a clean separation of concerns:

    public abstract class Customer {
        private CustomerID id;
        ...
        public Address getAddress() {
            return this.address; }
        public void setLastName(String name) {
            this.lastName = name; }
        public void setCustomerID(String id) {
            this.id = id; }
    }

    public class PrivateCustomer {
        private String lastName;
        private String firstName;
        ...
        public void setLastName(String name) {
            this.lastName = name; }
        public void setFirstName(String name) {
            this.firstName = name; }
    }

    public class CorporateCustomer {
        private String companyName;
        private CompanyName taxNumber;
        ...
        public void setCompanyName(String name) {
            this.companyName = name; }
        public void setTaxNumber(String nr) {
            this.taxNumber = nr; }
    }

    public aspect ChangeNotification {
        // pointcut: selects all executions of Customer setters
        pointcut stateUpdate(Customer c) :
            execution(* Customer.set*(..)) && this(c);
        // advice: notifies the listeners after every state update
        after(Customer c): stateUpdate(c) {
            for (Iterator iterator = c.listeners.iterator(); iterator.hasNext();) {
                CustomerListener listener = (CustomerListener) iterator.next();
                listener.notify(c);
            }
        }
        ... some interclass definitions here ...
    }

tangling = code in one region addresses multiple concerns
scattering = code addressing one concern is spread around the system
6. Aspect Mining
Note: if you want to migrate towards aspects, aspect mining is only the first step.
You still need to "extract" the actual aspects from the discovered aspect candidates.
(figure: aspect 1, aspect 2 and aspect 3 being extracted from a code base)
7. Why Aspect Mining?
Legacy systems
large, complex systems
Not always clearly documented
Program understanding
useful to find crosscutting concerns (what?)
useful to find extent of the crosscutting concerns (where?)
First step in migration to aspect-oriented solution
or just to document the crosscutting concerns
8. How does it work?
Variety of techniques from data mining, code analysis, reverse engineering
(mostly) specifically redesigned to identify potential aspect candidates in software source code
by looking for symptoms of crosscutting concerns (scattering, tangling, code duplication, ...)
Semi-automated: manual intervention required to
set thresholds, fine-tune filters to apply, ...
verify, select and complete reported results
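To make "variety of techniques" concrete: one of the simplest code-analysis heuristics used in aspect mining is fan-in analysis, which flags methods called from many distinct places as candidate crosscutting concerns. A minimal sketch; the toy call graph, method names and threshold are invented for illustration, not taken from any particular tool:

```java
import java.util.*;

// Minimal sketch of a fan-in based aspect mining heuristic:
// a method called from many distinct callers is reported as a
// candidate crosscutting concern. Threshold and data are made up.
public class FanInMiner {

    // Returns methods whose number of distinct callers meets the threshold.
    public static List<String> candidates(Map<String, Set<String>> callers, int threshold) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : callers.entrySet()) {
            if (e.getValue().size() >= threshold) {
                result.add(e.getKey());
            }
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        // callee -> set of distinct calling methods (a toy call graph)
        Map<String, Set<String>> callers = new HashMap<>();
        callers.put("Logger.log", new HashSet<>(Arrays.asList(
            "Customer.setLastName", "Customer.setCustomerID",
            "Order.setAmount", "Order.cancel")));
        callers.put("Customer.getAddress", new HashSet<>(Arrays.asList("Invoice.print")));

        // With threshold 3, only the widely called logging method is reported.
        System.out.println(candidates(callers, 3)); // candidate: Logger.log
    }
}
```

The threshold is exactly the kind of user-tuned parameter mentioned above: set it too low and precision drops, too high and recall drops.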
9. Problems with aspect mining
Poor precision
Poor recall
(both at different levels of granularity)
Subjectivity
Scalability
Empirical validation
Comparability
Composability
10. Levels of granularity
Consequences:
- difficult to compare
- difficult to combine
- technique may not return what you look for
Make sure that you know what you are mining for:
joinpoints = places in the code that address a particular aspect
  Example: all mutators that notify a listener ("change notification aspects")
aspects = what aspects are implemented in the source code
  Examples: change notification, synchronisation, logging
crosscutting sorts = all aspects or concerns of a given kind
  Example: contract enforcement = the sort of all aspects that check a common condition for a set of methods. Example of such an aspect: before updating a view, check whether it is necessary to update.
Different techniques may work at different levels of granularity
11. Poor precision and poor recall
Precision = relevant candidates ÷ reported candidates
Poor precision => false positives => more user involvement
Recall = discovered aspects ÷ all aspects
Poor recall => false negatives => incomplete results
Hard to calculate: the complete set of aspects in a system is usually unknown
Recall tends to be inversely correlated with precision
Poor precision or recall occurs at different levels of granularity
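Both measures follow directly from the set of reported candidates and a set of known (manually identified) aspects. A small sketch of the computation, with invented candidate sets:

```java
import java.util.*;

// Precision and recall as defined on this slide:
//   precision = relevant reported candidates / all reported candidates
//   recall    = discovered aspects / all known aspects
// The candidate and reference sets below are invented for illustration.
public class MiningMetrics {

    public static double precision(Set<String> reported, Set<String> relevant) {
        if (reported.isEmpty()) return 0.0;
        Set<String> hits = new HashSet<>(reported);
        hits.retainAll(relevant);            // relevant candidates among those reported
        return (double) hits.size() / reported.size();
    }

    public static double recall(Set<String> reported, Set<String> relevant) {
        if (relevant.isEmpty()) return 0.0;
        Set<String> hits = new HashSet<>(relevant);
        hits.retainAll(reported);            // known aspects that were discovered
        return (double) hits.size() / relevant.size();
    }

    public static void main(String[] args) {
        Set<String> reported = new HashSet<>(Arrays.asList("logging", "caching", "undo", "moveFigure"));
        Set<String> known    = new HashSet<>(Arrays.asList("logging", "caching", "persistence"));
        System.out.printf("precision = %.2f%n", precision(reported, known)); // 2/4 = 0.50
        System.out.printf("recall    = %.2f%n", recall(reported, known));    // 2/3 = 0.67
    }
}
```

The "hard to calculate" point is visible here: computing recall requires the set of all aspects in the system, which in practice nobody knows.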
12. Example
Evaluation of clone detection techniques for identifying crosscutting concerns:
- 3 clone detection techniques (AST-based, token-based, PDG-based)
- 5 known aspects
- 16 KLOC of C code
- aspects manually annotated by programmer
- precision and recall compared to manual annotations

Figure: a CCFinder clone covering memory error handling. Lines marked 'M' belong to the memory handling concern; only the lines marked 'C' are included in the clone.

    M C  if (r != OK)
    M C  {
    M C      ERXA_LOG(r, 0, ("PLXAmem_malloc failure."));
    M C      ERXA_LOG(VSXA_MEMORY_ERR, r,
    M C          ("%s: failed to allocated %d bytes.",
    M             func_name, toread));
    M        r = VSXA_MEMORY_ERR;
    M    }

Table 1. Average precision of each technique for each of the five concerns:

    Concern                  AST    Token   PDG
    Memory handling          .65    .63     .81
    Null pointer checking    .99    .97     .80
    Range checking           .71    .59     .42
    Exception handling       .38    .36     .35
    Tracing                  .62    .57     .68

=> relatively poor precision: the clones often match code that is similar at the syntactic level yet semantically different, so the techniques return many false positives

(example taken from Bruntink, van Deursen, van Engelen and Tourwé, "On the use of clone detection for identifying crosscutting concerns", IEEE Trans. Software Engineering, 31(10):804–818, 2005)
13. Subjectivity and scalability
Subjectivity in interpretation of results
Filters, threshold values and blacklists configured by users
Ambiguity in interpretation of what is a valid aspect candidate
“if it is part of the core functionality, it is not an aspect”
e.g. “Moving Figures” in JHotDraw
Scalability can be problematic due to user involvement
often many results to be validated / refined by user
looking for false positives / completing the aspect seeds
14. Evaluate, compare and combine
Empirical validation
no common benchmark
subjectivity in interpretation
results at different levels of detail and granularity
Comparability
how to compare the quality of mining techniques?
Composability
how to combine the results of different mining techniques?
15. Causes of the problems
Inappropriate techniques
Too general-purpose
Too strong assumptions
Too optimistic approaches
Scattering versus tangling
Lack of use of semantic information
Imprecise definition of what an aspect is
Inadequate representation of results
16. Aspect mining problems and causes
Table mapping the problems (poor precision, poor recall, subjectivity, scalability, empirical validation, comparability, composability) against the causes (too general purpose, too strong assumptions, too optimistic approaches, no attention to tangling, lack of use of semantic info, imprecise definition, inadequate representation of results)
What can we learn from this table?
- The first three causes (the inappropriate techniques) cause most problems
- Most causes negatively affect either precision, recall, or both
- Poor precision negatively affects scalability: more user involvement
- Only the imprecise definition of what an aspect is seems specific to aspects
17. How to improve? (1)
Provide a more rigorous definition of "aspect"
Dedicated mining techniques may be more successful than general-purpose 'one size fits all' aspect mining techniques
Rely on semantics rather than on code structure
need for stable semantic foundation
Desired quality depends on purpose of mining
what is it that you want to do with the mined information?
initial understanding vs. migration towards aspects
18. How to improve? (2)
Leave room for variability
Look for counter-evidence
Look for symptoms of tangling
Choose adequate and uniform way of presenting the results
enough detail but not too much
Combine results of different techniques
Provide common framework to compare and evaluate mining techniques
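One simple composition strategy, sketched below under the assumption that each technique reports a flat set of named candidates (optimistic, given the different levels of granularity discussed earlier): keep only the candidates on which all techniques agree, trading recall for precision. The technique names and candidate sets are invented for illustration.

```java
import java.util.*;

// Hypothetical sketch of one way to combine mining results: intersect the
// candidate sets reported by different techniques. Agreement between
// independent techniques tends to raise precision at the cost of recall.
public class CombineResults {

    public static Set<String> agreedCandidates(List<Set<String>> perTechnique) {
        if (perTechnique.isEmpty()) return Collections.emptySet();
        Set<String> agreed = new HashSet<>(perTechnique.get(0));
        for (Set<String> candidates : perTechnique.subList(1, perTechnique.size())) {
            agreed.retainAll(candidates); // keep only candidates every technique reports
        }
        return agreed;
    }

    public static void main(String[] args) {
        Set<String> fanIn    = new HashSet<>(Arrays.asList("logging", "undo", "caching"));
        Set<String> clones   = new HashSet<>(Arrays.asList("logging", "caching", "checks"));
        Set<String> clusters = new HashSet<>(Arrays.asList("logging", "caching"));
        System.out.println(agreedCandidates(Arrays.asList(fanIn, clones, clusters)));
        // only "logging" and "caching" survive the intersection
    }
}
```

A union instead of an intersection would favour recall; which one is "adequate" depends, as slide 17 notes, on the purpose of the mining.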
19. Conclusion
Most encountered pitfalls not specific to “aspect mining”
relevant to any discovery / reverse engineering process
especially present in aspect mining due to relative immaturity of domain
potential for cross-fertilisation?
A word of warning
If you want to use aspect mining, don’t apply tools blindly
If you want to research aspect mining, still many research opportunities but also a high risk of failure