The growing market request for Web Applications is forcing software industries to produce applications under the pressure of a short time-to-market and a strong competition, with the consequence that low quality and poor documented software is often produced. Maintaining, evolving or comprehending these applications are not straightforward tasks, and reverse engineering processes should be defined and validated to support them.
In this paper a reverse engineering approach for reconstructing an object-oriented conceptual model of the application domain of a Web Application is presented. The proposed approach defines a process that reconstructs the model in three steps. In each step, heuristic criteria exploiting source code analysis are used for the identification of objects and their relationships. Tools for implementing this method have been produced, and experiments for validating it have been carried out with the support of case studies. Experimental results showed the feasibility and the effectiveness of the proposed approach.
What's New in Teams Calling, Meetings and Devices March 2024
Identifying Reusable Components in Web Applications
1. Identifying Reusable
Components in Web
Applications
P. Tramontana
U. De Carlini, A.R. Fasolino
Dipartimento di Informatica e Sistemistica
University of Naples Federico II, Italy
G.A. Di Lucca
RCOST – Research Centre on Software Technology
University of Sannio, Benevento, Italy
3. Web Applications (WA):
problems and open issues
• Basic software engineering principles,
such as modularity, encapsulation, or
separation of concerns cannot be realized
with some Web Application
implementation technologies
Maintaining and evolving these systems
are difficult and expensive tasks.
4. Web Applications (WA):
problems and open issues
• HTML language does not encourage the
information content to be separated from its
rendering
• Server and Client scripting languages don’t
encourage separation between Layout,
Content and Business Rules
5. Web Applications (WA):
problems and open issues
• Practice of developing new applications
by cloning existing pieces of code is
frequently adopted
Web applications provide a valuable
source of reusable software, which can be
used with success to support their
maintenance, reengineering and evolution.
6. Clone Definition
Clones: Duplicated or similar portions
of code in software artifacts
What kind of portions of code?
How measure similarity between these
portions of code?
7. What kind of portions of code?
Degree of granularity Type of clone
of a clone
Web Page Client Page
Server Page
Client Page inner components Script Block
Script Function
HTML Form
HTML table
Server Page inner Script Block
components Script Function
8. Comparing source code
•Levenshtein distance:
Minimum number of insertions, deletions, and
replacements of elements necessary to make two
vectors equal
• Euclidean distance
9. Distance between portions of code
Technique Description
STH Strings of HTML tags extracted from client pages are
compared by the Levenshtein distance.
CTH Counts of HTML tags extracted from client pages are
compared by the Euclidean distance.
SOA Strings of ASP built-in objects extracted from server
pages are compared by the Levenshtein distance.
AMA An array of software metrics extracted from ASP
script blocks in server pages is used to compare
server pages by the Euclidean distance.
10. Metrics for AMA
Software Metrics
NTC Total Number of characters per page
NTRC Total Number of lines of code per page
NTBC Total Number of ASP code blocks per page
NTSD Total Number of declarative statements per page
NTSC Total Number of conditional statements per page
NTSI Total Number of cyclic statements per page
NTF Total Number of Functions per page
NTS Total Number of Subroutines per page
NTOP Total Number of built-in ASP objects per page
LDBC Levenshtein distance of ASP code blocks
11. Phases of Clone Analysis Process
1. Separation of control and data
component within Web pages
2. Clone Detection
3. Clone Validation
4. Template Generation
5. Repository population
12. Separation of control and data
component
• Structure and data component are separated
in HTML portions of code
• ASP block code is transformed removing
blank characters and comments from the line,
and substituting each data element (such as
constants, or identifiers) by a dummy one.
• Consecutive blocks of ASP code will be
merged in a single comprehensive block.
13. Clone Detection
•Metrics are calculated for each
component
• Clone analysis techniques are applied
•Clone and near-miss-clone are
automatically detected with the support
of CloneDetector tool
14. Clone Validation
Clones (and near-miss-clones) are assessed by an
expert, considering:
•meaningfulness (does the considered clone
implement a meaningful abstraction in the business
domain, or in the solution domain?)
•completeness (does the clone include all the lines of
code necessary for implementing a valid abstraction?).
Clones may be discarded, merged or accepted as
reusable components.
16. Repository population
Template information are stored in a repository,
together with component information
Template
1
1
CloneCluster DataComponent
0..*
Dummy
1..*
WebPage ControlComponent follows
1
1
Client Server HTMLTag Statement
*
*
PageSubComponent TagAttribute
{incomplete}
Form Table Script Function Subroutine
17. Experiment
An experiment has been carried out, with 4 real web
applications
We used CloneDetector tool to search exact clones
of:
HTML pages (STH technique)
client scripts (STH technique)
functions in client scripts (STH technique)
forms (STH technique)
tables (STH technique)
ASP pages (AMA technique)
ASP script blocks (AMA technique)
functions/subroutines in ASP pages (AMA
technique)
18. Results
WA1 WA2 WA3 WA4
# HTML pages 55 23 23 -
# cloned HTML pages 18 2 8 -
# clusters of cloned HTML pages 3 1 4 -
# scripts in client pages 3 12 10 -
# cloned client scripts 3 4 2 -
# cluste rs of cloned client scripts 1 2 1 -
# functions in client pages 12 10 10 -
# cloned client functions 12 4 2 -
# clusters of cloned client functions 4 2 1 -
# forms in client pages 1 8 10 -
# cloned client forms 0 2 9 -
# clusters of clo ned client forms 0 1 3 -
# tables in client pages 7 4 3 -
# cloned client tables 0 0 0 -
# clusters of cloned client tables 0 0 0 -
# ASP pages 19 73 37 71
# cloned ASP pages 2 15 3 2
# clusters of cloned ASP pages 1 6 1 1
# scripts in ASP pages 75 576 150 5341
# cloned ASP scripts 4 0 36 0
# clusters of cloned ASP scripts 2 0 3 0
# functions/subroutines in ASP pages 5 0 3 165
# cloned ASP functions/subroutines 0 0 0 9
# clusters of cloned ASP 0 0 0 3
functions/subroutines
19. Conclusions
A clone detection method has been presented
An experiment has been carried out
Many cluster of cloned artifact has been detected and
validated
Some reusable component has been abstracted as
templates
Reengineering of static pages in a XML/XSL architecture
may be performed
Reuse of components or pages may be performed
Hinweis der Redaktion
In this talk, I will face the lack of reusability
The classic copy and paste We notice that C&P is often used during WA development, due also to ow know how of developers. So, we can find many software clones. We have two problems to solve: Identification of clones Reuse of clones
What is a clone?
What is a clone?
Fundamental distances are Levenshtein and Euclidean distance. These distances measure the distances between two vector of the same length. L. Distance is the minimum number of … L. Distance is a specialization of edit distance, with the same weight for each operation E. Distance is the geometric distance between two vectors, the square root of the sum of the squares of the differences between corrispondent elements of the two vectors
4 metrics has been considered: STH is used to compare portions of client pages AMA is used to compare server script blocks
The metrics used to calculate Asp Metrics Array are …
Nella prima fase un’analisi statica viene fatta per ricavare i dati necessari al calcolo delle metriche: vettore dei tag HTML, etc. etc. Nella seconda fase si procede al confronto di ogni coppia di pagine, form, block, function, table, in cerca di cloni o near miss clone Nella terza fase la meaningfulness e la completeness dei candidati cloni sono esaminate da un utente Nella quarta fase, viene isolata in un template la parte di controllo costante di un cluster di cloni (o near miss clone). La struttura di tali template è salvata in un database repository
We define clone a set of components with zero distance. We define Near-miss-clone if this distance is lower than a fixed experimental threshold
The detected set of clones need to be submitted to a validation activity aiming at assessing if they can be actually considered reusable components or not. Business domain knowledge and experience in building Web applications will be required for carrying out the validation activity.
Each set of validated clones is called cluster. In figure, on the left side, we see a client page whose structure was cloned. On the right side, we see its template, where each data has been substituted with a dummy element
We can operate a reengineering of cluster of cloned components