This document proposes a three-step process to mine features from the source code of multiple software product variants. The process first extracts code elements from each variant, then uses formal concept analysis to group elements into common and variable partitions. Finally, it clusters elements into features using latent semantic indexing and formal concept analysis to identify mandatory and optional features based on element similarity. The approach was implemented and evaluated on a collection of ArgoUML products, identifying most features. Future work includes combining textual and semantic similarity and generating feature models from the mined features.
Feature Mining From a Collection of Software Product Variants
Rafat AL-msie’deen¹, Abdelhak D. Seriai¹, Marianne Huchard¹,
Christelle Urtado², Sylvain Vauttier², and Hamzeh Eyal Salman¹
¹ LIRMM / CNRS & Montpellier 2 University, Montpellier, France
{Al-msiedee, Abdelhak.Seriai, huchard, eyalsalman}@lirmm.fr
² LGI2P / Ecole des Mines d’Alès, Nîmes, France
{Christelle.Urtado, Sylvain.Vauttier}@mines-ales.fr
1 Reverse Engineering Software Product Lines
Just as car manufacturers offer a full range of cars with common
characteristics and numerous variants and options, software developers may
segment users’ needs and offer them a software family to choose from. Such a
software family is called a software product line (SPL) [1]. An SPL is usually
characterized by two sets of features: the features that are shared by all
products in the family, called the SPL’s commonalities, and the features that
are shared by some, but not all, products in the family, called the SPL’s
variability. These two sets define the mandatory and optional parts of the SPL.
Software product line engineering (SPLE) focuses on capturing the commonalities
and variabilities between several software products that belong to the same
family. To describe more precisely which combinations of optional features are
possible (e.g., an optional feature might exclude another and require a third
one), SPLs are usually described with a de facto standard formalism called a
feature model. A feature model characterizes the whole software family. It
defines all valid feature sets, also called configurations. Each valid
configuration represents a specific product, whether an existing product or a
valid product-to-be.
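As an illustration of how a feature model delimits the valid configurations, the following minimal Python sketch encodes a toy model with one "requires" and one "excludes" constraint and enumerates its valid configurations by brute force. The feature names and the flat representation (sets plus constraint pairs) are assumptions made for this example only; they are not taken from the ArgoUML case study, and real feature models are usually tree-structured.

```python
from itertools import combinations

# Hypothetical toy feature model: every name below is an illustrative assumption.
MANDATORY = {"Editor"}
OPTIONAL = {"ReadOnly", "Logging", "Export"}
REQUIRES = {("Export", "Logging")}    # Export requires Logging
EXCLUDES = {("ReadOnly", "Export")}   # ReadOnly and Export cannot coexist


def is_valid(config):
    """Return True if `config` (a set of feature names) is a valid configuration."""
    if not MANDATORY <= config:
        return False  # a mandatory feature is missing
    if not config <= MANDATORY | OPTIONAL:
        return False  # an unknown feature was selected
    if any(a in config and b not in config for a, b in REQUIRES):
        return False  # a "requires" constraint is violated
    if any(a in config and b in config for a, b in EXCLUDES):
        return False  # an "excludes" constraint is violated
    return True


def valid_configurations():
    """Enumerate all valid configurations (exponential; fine for toy models)."""
    optional = sorted(OPTIONAL)
    result = []
    for size in range(len(optional) + 1):
        for chosen in combinations(optional, size):
            config = MANDATORY | set(chosen)
            if is_valid(config):
                result.append(config)
    return result


if __name__ == "__main__":
    for config in valid_configurations():
        print(sorted(config))
```

Each printed set corresponds to one product of the toy family, either an existing one or a product-to-be.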
Software product variants are seldom developed from scratch in a disciplined
way. Instead, ad hoc reuse techniques such as copy-paste-modify are applied to
the software’s code until the need arises to discipline the development by
adopting an SPLE approach. The expected benefits are improved product
maintenance, easier system migration, and the possibility of deriving new
products from the extracted features. Capitalizing on the existing code
requires reverse engineering, but manually analyzing the existing software
product variants to discover their features is time-consuming, error-prone,
and requires substantial effort. Automating feature mining from source code
would be of great help.
Surprisingly, the reverse engineering of features (or feature models) from
source code is seldom considered in the literature [2]. Existing approaches
mine features from a single software product variant, whereas we think it is
necessary to consider all available variants at once [3].
2 A Three-Step Process to Mine Features from Code
Feature location in object-oriented (OO) source code consists in identifying
the object-oriented building elements (OBEs) that implement a particular
feature across software product variants. The OBEs we consider are packages,
classes, attributes, methods, and method bodies. We assume that a feature can
be mapped to one and only one set of OBEs: each feature has a unique
implementation for the whole product family.
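To make the notion of OBE concrete, the sketch below extracts packages, classes, attributes, methods, and method bodies from source text. It is only an illustration under stated assumptions: it parses Python source with the standard `ast` module as a stand-in for the parser used on the actual variants, the package name is passed in by hand, and the `kind:qualified.name` identifier scheme and the body fingerprint are hypothetical conventions chosen for this example.

```python
import ast
import hashlib

# Two tiny hypothetical variants of the same class (illustrative only).
VARIANT_1 = '''
class Diagram:
    zoom = 1

    def draw(self):
        return "lines"
'''

VARIANT_2 = '''
class Diagram:
    zoom = 1

    def draw(self):
        return "lines and colors"

    def export(self):
        return "png"
'''


def extract_obes(source, package="default"):
    """Return the set of OBE identifiers found in one source string."""
    obes = {f"package:{package}"}
    for node in ast.parse(source).body:
        if not isinstance(node, ast.ClassDef):
            continue
        cls = f"{package}.{node.name}"
        obes.add(f"class:{cls}")
        for item in node.body:
            if isinstance(item, ast.FunctionDef):
                obes.add(f"method:{cls}.{item.name}")
                # Fingerprint the body so that two variants implementing the
                # same method differently yield different body OBEs.
                dumped = "".join(ast.dump(stmt) for stmt in item.body)
                digest = hashlib.sha1(dumped.encode("utf-8")).hexdigest()[:8]
                obes.add(f"body:{cls}.{item.name}#{digest}")
            elif isinstance(item, ast.Assign):
                # Class-level assignments play the role of attributes here.
                for target in item.targets:
                    if isinstance(target, ast.Name):
                        obes.add(f"attribute:{cls}.{target.id}")
    return obes


if __name__ == "__main__":
    v1 = extract_obes(VARIANT_1, package="app")
    v2 = extract_obes(VARIANT_2, package="app")
    print("shared:", sorted(v1 & v2))
    print("only in variant 2:", sorted(v2 - v1))
```

In this toy case the `draw` method is shared by both variants while its body is not, which is why method bodies are treated as OBEs in their own right.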
To mine features from the OO source code of software variants, we propose a
three-step process that relies on both Formal Concept Analysis (FCA) [4] and
Latent Semantic Indexing (LSI) [5]. Our approach:
1. extracts OBEs from each software product variant by parsing its code.
2. uses FCA to build a lattice from OBEs and software product variants that
hierarchically groups the OBEs of the software product variants into disjoint,
minimal partitions. This classification provides us with two OBE sets: common
OBEs (which are shared by all variants and can be found in the top node of the
lattice) and variable OBEs (which are shared by several but not all variants
and appear toward the bottom of the lattice).
3. clusters OBEs into features. Each OBE set is analyzed using LSI and FCA
techniques to mine the optional and mandatory features based on the lexical
similarity between OBEs.
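Step 2 can be sketched in a few lines of Python. The code below is a minimal illustration, not the implementation evaluated in this paper: it treats variants as FCA objects and OBEs as FCA attributes, computes all formal concepts by brute force (exponential, so suitable only for toy contexts), and labels each concept with the OBEs it introduces. The context `CONTEXT`, the variant names, and the OBE names are hypothetical.

```python
from itertools import combinations

# Hypothetical formal context: variant -> set of OBEs it contains.
CONTEXT = {
    "P1": {"Core.run", "Log.write"},
    "P2": {"Core.run", "Export.save"},
    "P3": {"Core.run", "Log.write", "Export.save", "Export.zip"},
}


def extent(context, obes):
    """Variants that contain every OBE in `obes`."""
    return frozenset(v for v, owned in context.items() if obes <= owned)


def intent(context, variants):
    """OBEs shared by every variant in `variants` (all OBEs if none given)."""
    all_obes = set().union(*context.values())
    return frozenset(all_obes.intersection(*(context[v] for v in variants)))


def concepts(context):
    """All formal concepts (extent, intent), found by closing variant subsets."""
    found = set()
    names = sorted(context)
    for size in range(len(names) + 1):
        for group in combinations(names, size):
            shared = intent(context, group)
            found.add((extent(context, shared), shared))
    return found


def partition_obes(context):
    """Map each concept extent to the OBEs introduced at that concept."""
    blocks = {}
    for ext, full_intent in concepts(context):
        introduced = frozenset(
            o for o in full_intent if extent(context, frozenset([o])) == ext
        )
        if introduced:
            blocks[ext] = introduced
    return blocks


def common_and_variable(context):
    """Split the disjoint OBE blocks into the common block and variable blocks."""
    blocks = partition_obes(context)
    everyone = frozenset(context)
    common = blocks.get(everyone, frozenset())
    variable = {ext: obes for ext, obes in blocks.items() if ext != everyone}
    return common, variable


if __name__ == "__main__":
    common, variable = common_and_variable(CONTEXT)
    print("common:", sorted(common))
    for ext, obes in variable.items():
        print(sorted(ext), "->", sorted(obes))
```

The block attached to the concept whose extent contains every variant is the common OBE set; every other block is a set of variable OBEs shared by exactly the variants in its extent. Step 3 would then cluster the OBEs inside each block by lexical similarity, which this sketch does not cover.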
We have implemented this three-step approach and evaluated its results on a
collection of ten ArgoUML products. The results showed that most of the
features were identified [6]. In future work, we plan to combine textual and
semantic similarity measures to determine feature implementations more
precisely. We also plan to use the mined common and variable features to
automate the building of the studied software family’s feature model.
References
1. AL-Msie’deen, R., Seriai, A.D., Huchard, M., Urtado, C., Vauttier, S., Salman, H.E.:
An approach to recover feature models from object-oriented source code. In: Actes
de la Journée Lignes de Produits 2012, Lille, France (Novembre 2012) 15–26
2. AL-Msie’deen, R., Seriai, A.D., Huchard, M., Urtado, C., Vauttier, S., Salman,
H.E.: Survey: reverse engineering feature model/features from different artefacts.
http://www.lirmm.fr/Survey (2013) [Online; accessed 24-January-2013].
3. Dit, B., Revelle, M., Gethers, M., Poshyvanyk, D.: Feature location in source code:
a taxonomy and survey. Journal of Software: Evolution and Process (2012) 53–95
4. Ganter, B., Wille, R.: Formal Concept Analysis, Mathematical Foundations.
Springer-Verlag (1999)
5. Marcus, A., Maletic, J.I.: Recovering documentation-to-source-code traceability
links using latent semantic indexing. In: Proceedings of the 25th International Con-
ference on Software Engineering. ICSE ’03, Washington, DC, USA, IEEE Computer
Society (2003) 125–135
6. AL-Msie’deen, R., Seriai, A.D., Huchard, M., Urtado, C., Vauttier, S., Salman, H.E.:
ArgoUML case study. http://www.lirmm.fr/CaseStudy (2013) [Online; accessed 23-
January-2013].