2. Outline
¨ Context and motivation
¨ Objectives and contribution
¨ Software architecture discovery
¤ Classification functions
¤ Architecture generation
¤ Implementation
¨ Maintainability verification
¨ Conclusions and future work
3. Introduction
[Figure: architecture views (Design, Implementation, Use Case, Deployment, Process) of components with operations doX() and doY()]
¨ Software architecture should guide software development and maintenance
¨ The architecture is documented in one or more architecture views
¨ Architecture patterns are used as reference models for software solutions
4. Problem
1 Development of a traditional web application
2 Use an architecture pattern as reference model (Ex: Model-View-Controller (MVC))
[Figure: Model, View and Controller components]
3 Is this a view, a model or a controller?
Are we really following the MVC pattern?
Is anyone maintaining the system documentation?
5. Related work
¨ In [Corazza 2010] a probabilistic approach is proposed to
partition software systems into meaningful sub-systems
¤ Analysis of variables, methods and class signatures
¤ This is a general approach and does not include historical data to
train the probabilistic classifier
¨ In [Maqbool 2007] a Bayesian method is described to
recover software systems architecture
¤ Use of a Naïve Bayes classifier, based on global variables
¤ Our approach considers a wider set of variables for the
discovery of software architecture
¨ Software Architecture verification tools
¤ Klocwork Architect (http://www.klocwork.com)
¤ Structure 101 (http://www.headwaysoftware.com)
A. Corazza, S. D. Martino, and G. Scanniello, “A probabilistic based approach towards software system clustering,” CSMR, 2010
O. Maqbool and H. Babri, “Bayesian learning for software architecture recovery,” ICEE, 2007
6. Outline
¨ Context and motivation
¨ Objectives and contribution
¨ Software architecture discovery
¤ Classification functions
¤ Architecture views
¤ Implementation
¨ Maintainability verification
¨ Conclusions and future work
7. Objectives and contribution
¨ Objectives
¤ Recover software architecture for Java web systems
▪ MVC-based or Clustered-based architecture
▪ Architecture Description Language (ADL)
▪ Scalable Vector Graphics (SVG)
¤ Help verify their maintainability intent
¨ Contribution
¤ Probabilistic approach for the generation of architecture documentation based on MexADL
11. Classification functions
MVC-based architecture
1 Analyze training data
¤ 619 manually classified components
¤ 17 representative projects (Grails, Spring Roo, Play, Struts 2)
¤ Variables: Type, Suffix, ExternalAPI, MVC Layer
2 Generate a classification function
¤ Training data + Weka = Bayesian network over Type, Suffix, ExternalAPI and MVC Layer
¤ BayesNet classifier, Simple estimator, TAN search
¤ 87% effectiveness, using the training set as a test option
Clustered-based architecture
1 Rely on clustering algorithms
¤ Variables: Type, ExternalAPI, Suffix, Cluster ID
¤ Java-ML, without training data
¤ Expectation-Maximization (EM) algorithm
15. Implementation
¨ Deployed as an open-source Eclipse plugin
¤ Context menu linked to WAR files and Eclipse Projects
¨ Sample application: SpringSource Petclinic
¨ Generated artifacts: SVG view, quality metrics, MVC-based ADL, Clustered-based ADL
¨ Full details in: http://code.google.com/p/web2mexadl
17. Maintainability verification
¨ After each compilation
¤ Quality metrics report
¤ Valid interactions report
¨ Full details in: http://code.google.com/p/mexadl
19. Conclusions
¨ The effectiveness of the probabilistic model is
promising, though further validation is required
¨ The generated architecture can help verify the
maintainability intent of software systems
¨ The approach is open to a variety of machine
learning algorithms, thanks to the flexibility of the
Weka and Java-ML projects
¨ Our implementation can be easily integrated with
current development environments
20. Future work
¨ To improve the classifier effectiveness, the Bayesian
network should be trained with a wider set of web
projects
¨ Support additional languages and platforms
¨ Increased support for systems outside the web
application domain
21. References
¨ Research paper
¤ J. Castrejón, R. Lozano, and G. Vargas-Solar, “Web2MexADL: Discovery and Maintainability Verification of Software Systems Architecture,” CSMR 2012 - Tool Demonstration Track
¨ Implementation
¤ http://code.google.com/p/web2mexadl
Hello, my name is Juan Carlos Castrejón and today I’m going to talk about Web2MexADL, a tool intended to discover and help maintain the architecture of software systems, in particular web applications. This work is part of a collaboration between the Tecnológico de Monterrey in Mexico and a couple of informatics laboratories of Grenoble in France.
First, I’m going to describe the general context and the motivation of our tool. Then, I’ll explain the particular objectives and the contribution of Web2MexADL. I’ll also describe the discovery process and the details of our current implementation. Then, I’m going to demonstrate its use in a sample scenario, showing how it can help verify the maintainability of web systems. Finally, I’ll present our conclusions and future work.
Let’s begin by talking about a common scenario in software engineering. [advance!] We can start by developing particular classes (or components) to implement the required logic in our system. [advance!] These components are usually grouped into modules, according to the different functionalities of our application. [advance!] As the system grows larger, we usually develop more than one module, and define interactions between them. [advance!] The representation of the system components, and the relations between them, is documented in one or more architecture views. These views can vary according to the development process that we use [advance!], or the particular intent that we try to communicate. [advance!] The architecture views can then guide software development and maintenance. We can rely on one or more architecture patterns to identify the types of components that are part of our system. These patterns convey common structures and interactions that are proven to solve particular requirements. In summary, [advance!] software architecture should guide software development and maintenance. For this, the architecture needs to be documented in one or more architecture views. And for the generation of these views, we can take architecture patterns as reference models.
However, [advance!] when we try to apply this theory to the development of a real-world web application, we may face several problems. First, [advance!] we need to choose an architecture pattern that can serve as a reference model for our application. A common choice for web applications is the Model-View-Controller (MVC) pattern, due to its natural separation of business, presentation and control logic. Assuming we rely on the MVC pattern, we are probably going to face the following problems: [advance!] During system development, how can we be sure that we are really following the MVC pattern? [advance!] For the development of a particular class, how can we know if it’s a model, a view or a controller? [advance!] And finally, can we rely on up-to-date documentation to make this analysis?
Software architecture discovery is a popular topic in both research and industrial environments. In particular, the use of probabilistic models to analyze the source code of software systems is a widespread technique among reverse engineering tools. These methods differ in their combination of random variables, in their algorithms, and in the nature of their training data. For example, the approach described in [Corazza 2010] builds the probabilistic model based on variables, methods and class signatures. However, this model is intended for general use and is not trained with historical data from any particular domain. In [Maqbool 2007] a similar approach is proposed, based on the analysis of global variables and on the definition of a Naïve Bayes classifier. In comparison, our tool relies on a wider set of variables for the discovery process and is open to a variety of probabilistic models. To recover software systems architecture, we can also rely on open-source or commercial tools such as Klocwork Architect and Structure 101. These tools deliver a good starting point for the analysis of software systems. However, the advantage of Web2MexADL is that the resulting documentation is based on architecture descriptions that can later be used to verify the maintainability of the system under analysis.
Let’s talk about the specific objectives and contribution of our tool.
Our tool has two main objectives. The first one is to recover software architecture for Java web systems, based either on the MVC pattern or on the identification of clusters. The recovered architecture is represented in two architecture views: an Architecture Description Language (ADL) document and a Scalable Vector Graphics (SVG) file. The former includes the system components, their expected interactions and the information required to verify the system's maintainability intent. The latter includes the classification of each system component, either in MVC layers or in clusters. The second objective of our tool is to help verify the maintainability of the recovered architecture. For this, we rely on the MexADL verification approach. I’ll explain this approach in the following slides. The main contribution of our tool is a probabilistic approach for the generation of architecture documentation based on MexADL.
[advance!] Our tool is not an isolated effort. Web2MexADL is part of an initiative intended to support software development based on architecture notations, and includes the participation of universities in Mexico and in France. [advance!] This initiative includes tools for software architecture definition, discovery and verification. All of these tools are open-source and can be easily added to current Java development environments, by means of Eclipse plugins.
In particular, our tool relies on MexADL to help verify the maintainability of software systems. MexADL is a verification approach, based on the ISO/IEC SQuaRE quality model, that relies on architecture documentation (ADL) containing quality metrics and valid relations between system components. These metrics and relations are then verified using Aspect Oriented Programming. MexADL is an extension of xADL, from the University of California, Irvine. Maintainability is the degree of effectiveness and efficiency with which a software product can be modified. The results of the maintainability verification are two HTML reports, generated after each system compilation. These reports contain the quality metrics and interactions analysis.
Now I’ll explain the details of the probabilistic model used by our tool.
Remember that the objective of our tool is to recover software architecture based either on the MVC pattern or on a clustered distribution. [advance!] For the recovery of MVC-based architectures, we rely on a technique known as supervised learning. This technique requires training data from which a classification function can be inferred. The results of this classification are layers of the MVC pattern. [advance!] To obtain the training data, we analyzed popular Java web development frameworks and representative applications developed with them. In particular, we relied on the Grails, Spring Roo, Play and Struts 2 frameworks, and on 17 sample applications included in their distributions. In total, we analyzed 619 source code artifacts, including Java classes, JSP, CSS and HTML files. [advance!] The analysis was conducted using the following 4 variables: ExternalAPI, Type, Suffix and MVC Layer. We classified the 619 source code artifacts by conducting a manual analysis of each artifact and by choosing appropriate values for their 4 associated variables. [advance!] The generation of the classification function was conducted using the Weka project, which is a tool that provides a collection of machine learning algorithms. For the current implementation, we chose the following configuration for the classification function: BayesNet classifier, Simple estimator and TAN search algorithm. [advance!] With this configuration, the following Bayesian network is generated. It achieves 87% effectiveness, using the training set as a test option. However, our tool doesn’t depend on this particular classification function. Using Weka, we could create another probabilistic model based on the same training data. This gives our tool great flexibility. [advance!] The recovery of a Clustered-based architecture is simpler in comparison to the MVC approach. [advance!] [advance!] We use the same 4 variables, but we don’t require a training set. [advance!]
We rely on the execution of clustering algorithms through the Java-ML project, which is an open-source tool that includes the implementation of machine learning algorithms. In the current implementation, we rely on the Expectation-Maximization algorithm to identify clusters, but as with the MVC approach, we are not tied to this particular algorithm. We can change to other clustering algorithms using the Java-ML interface.
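As a rough illustration of the supervised step described above, the following sketch learns a classification function from manually labelled rows and then measures accuracy on the training set itself. It is not the BayesNet/TAN configuration built with Weka: it swaps in a plain naive Bayes classifier with add-one smoothing, written against the Java standard library only. The class name `MvcLayerSketch`, the training rows and the variable values (such as `servlet` or `persistence`) are assumptions made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: categorical naive Bayes with add-one smoothing, swapped in
// for the BayesNet/TAN configuration that the tool builds with Weka.
final class MvcLayerSketch {
    private static final int FEATURES = 3; // Type, Suffix, ExternalAPI
    private final Map<String, Integer> layerCounts = new HashMap<>();
    // Key "<feature index>|<value>|<layer>" -> number of training rows.
    private final Map<String, Integer> valueCounts = new HashMap<>();
    // Distinct values seen per feature, used in the smoothing denominator.
    private final List<Set<String>> domains = new ArrayList<>();
    private int total = 0;

    MvcLayerSketch() {
        for (int i = 0; i < FEATURES; i++) domains.add(new HashSet<>());
    }

    // Each row is {type, suffix, externalApi, layer}.
    void train(List<String[]> rows) {
        for (String[] row : rows) {
            String layer = row[FEATURES];
            total++;
            layerCounts.merge(layer, 1, Integer::sum);
            for (int i = 0; i < FEATURES; i++) {
                domains.get(i).add(row[i]);
                valueCounts.merge(i + "|" + row[i] + "|" + layer, 1, Integer::sum);
            }
        }
    }

    // Returns the layer with the highest smoothed log-score.
    String classify(String type, String suffix, String externalApi) {
        String[] f = {type, suffix, externalApi};
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Integer> e : layerCounts.entrySet()) {
            double score = Math.log((double) e.getValue() / total);
            for (int i = 0; i < FEATURES; i++) {
                int count = valueCounts.getOrDefault(i + "|" + f[i] + "|" + e.getKey(), 0);
                score += Math.log((count + 1.0) / (e.getValue() + domains.get(i).size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }

    // Fraction of the given rows whose predicted layer matches the manual label.
    double accuracy(List<String[]> rows) {
        int hits = 0;
        for (String[] row : rows) {
            if (row[FEATURES].equals(classify(row[0], row[1], row[2]))) hits++;
        }
        return (double) hits / rows.size();
    }

    // Hypothetical training rows; the values are invented for illustration.
    static List<String[]> sampleRows() {
        return Arrays.asList(
            new String[] {"class", "Controller", "servlet", "Controller"},
            new String[] {"class", "Action", "servlet", "Controller"},
            new String[] {"class", "Entity", "persistence", "Model"},
            new String[] {"class", "Dao", "persistence", "Model"},
            new String[] {"jsp", "none", "none", "View"},
            new String[] {"html", "none", "none", "View"});
    }

    public static void main(String[] args) {
        MvcLayerSketch sketch = new MvcLayerSketch();
        sketch.train(sampleRows());
        System.out.println(sketch.classify("class", "Controller", "servlet"));
        System.out.println(sketch.accuracy(sampleRows()));
    }
}
```

Evaluating on the same rows used for training, as the talk does, tends to give an optimistic figure. A held-out set would be a stricter check.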
Now I’ll explain how our tool generates the architecture views.
[advance!] [advance!] To recover the architecture of a particular web system, its compiled artifacts are analyzed using the ASM framework, which is a library that allows the analysis of Java bytecode. [advance!] Each artifact is analyzed in order to assign values to the Type, ExternalAPI and Suffix variables. [advance!] Once they are assigned a value, this information is sent either to the Bayesian network or to the EM algorithm, [advance!] in order to classify the artifact into an MVC or Clustered-based architecture, respectively. The classification is executed using Weka and Java-ML. [advance!] Our tool uses the classification results to generate two architecture views. [advance!] The first one is an SVG file that depicts the classification of each artifact, using a color notation. [advance!] To generate this file we rely on the Graphviz project, which is an open-source graph visualization project. [advance!] The second architecture view is an ADL document that contains the system components (that is, MVC layers or clusters) and their proposed interactions. [advance!] This document is generated using templates included in our tool. These templates also include a set of proposed values for the quality metrics that are required by MexADL.
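The color notation of the SVG view can be pictured with a small sketch that turns classification results into Graphviz DOT text. This is an assumption about how such a file might be produced, not the tool's actual generator: the class name `DotViewSketch`, the color choices and the artifact names are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: renders classification results as Graphviz DOT text,
// coloring each artifact by its assigned layer. Colors are assumptions.
final class DotViewSketch {
    private static final Map<String, String> LAYER_COLORS = new LinkedHashMap<>();
    static {
        LAYER_COLORS.put("Model", "lightblue");
        LAYER_COLORS.put("View", "lightgreen");
        LAYER_COLORS.put("Controller", "lightsalmon");
    }

    // Builds one filled box node per artifact; unknown layers fall back to white.
    static String toDot(Map<String, String> layerByArtifact) {
        StringBuilder dot = new StringBuilder("digraph architecture {\n");
        dot.append("  node [shape=box, style=filled];\n");
        for (Map.Entry<String, String> e : layerByArtifact.entrySet()) {
            String color = LAYER_COLORS.getOrDefault(e.getValue(), "white");
            dot.append("  \"").append(e.getKey()).append("\" [fillcolor=\"")
               .append(color).append("\"];\n");
        }
        return dot.append("}\n").toString();
    }

    public static void main(String[] args) {
        // Hypothetical artifact names and classification results.
        Map<String, String> classified = new LinkedHashMap<>();
        classified.put("OwnerController", "Controller");
        classified.put("Owner", "Model");
        classified.put("owners.jsp", "View");
        System.out.print(toDot(classified));
    }
}
```

Saving the output to a file and running Graphviz with `dot -Tsvg` would produce an SVG rendering of the colored nodes.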
Now it’s time to demonstrate the implementation of our tool by using a sample scenario.
Our tool is deployed as an open-source Eclipse plugin that associates a context menu with WAR files and Eclipse projects. [advance!] To demonstrate its use, I’m going to rely on the SpringSource Petclinic application, a classic example of Java web development. In particular, we are going to see how to generate the [advance!] SVG and [advance!] [advance!] ADL architecture views. [Tool demonstration, according to the following video steps: - Summary of {Develop web application – Generate WAR file} - Discover MVC architecture - Discover Cluster architecture - Generate MexADL artifacts] [advance!] You can find more information on the project website.
I’m going to describe how the recovered architecture can help verify the maintainability of web systems.
[advance!] After each system compilation, two HTML reports are generated. [advance!] The first report contains the analysis of the quality metrics associated with the system components. [advance!] The second one depicts violations of the expected interactions between system components. [Tool demonstration, according to the following video steps: - Configure project - Generate MexADL reports]
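To make the idea of an interaction violation concrete, the sketch below compares observed dependencies between artifacts against a table of valid layer interactions. MexADL performs this verification with Aspect Oriented Programming. The plain lookup here is only a stand-in, and the class name `InteractionCheckSketch`, the valid-interaction table and the artifact names are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: a lookup-based check, not MexADL's aspect-oriented verification.
final class InteractionCheckSketch {
    // Hypothetical valid interactions: source layer -> layers it may depend on.
    private static final Map<String, Set<String>> VALID = new HashMap<>();
    static {
        VALID.put("Controller", new HashSet<>(Arrays.asList("Model", "View")));
        VALID.put("View", new HashSet<>(Collections.singletonList("Model")));
        VALID.put("Model", new HashSet<>());
    }

    // Each dependency is {sourceArtifact, targetArtifact}.
    static List<String> violations(Map<String, String> layerOf, List<String[]> dependencies) {
        List<String> found = new ArrayList<>();
        for (String[] d : dependencies) {
            String from = layerOf.get(d[0]);
            String to = layerOf.get(d[1]);
            // Skip unclassified artifacts and dependencies inside one layer.
            if (from == null || to == null || from.equals(to)) continue;
            Set<String> allowed = VALID.getOrDefault(from, Collections.<String>emptySet());
            if (!allowed.contains(to)) {
                found.add(d[0] + " (" + from + ") -> " + d[1] + " (" + to + ")");
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical classification results and observed dependencies.
        Map<String, String> layerOf = new HashMap<>();
        layerOf.put("OwnerController", "Controller");
        layerOf.put("Owner", "Model");
        List<String[]> deps = Arrays.asList(
            new String[] {"OwnerController", "Owner"},
            new String[] {"Owner", "OwnerController"});
        violations(layerOf, deps).forEach(System.out::println);
    }
}
```

In this example only the dependency from the model back to the controller is reported, since the assumed table lets controllers use models but not the reverse.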