SlideShare ist ein Scribd-Unternehmen logo
1 von 118
Downloaden Sie, um offline zu lesen
Dottorato in Scienze Computazionali e Informatiche, XXV Ciclo
Ph.D. Candidate:
Valerio Maggio
Thesis Advisors:
Dr. Sergio Di Martino
Dr. Anna Corazza
June 5th, 2013
UNSUPERVISED
MACHINE LEARNING
FOR SOFTWARE
MAINTENANCE
THESIS
G O A L
UNSUPERVISED
MACHINE LEARNING FOR
SOFTWARE MAINTENANCE
These solutions exploit (unsupervised) machine learning
techniques to mine information from the source code
Define and experimentally evaluate solutions (techniques and
prototype tools) for automatic software analysis to support
software maintenance activities.
There exist different types of
Software Maintenance.
SOFTWARE MAINTENANCE
“A software system must be continually adapted during its overall life cycle or it
progressively becomes less satisfactory.”
(cit. Lehman’s First Law of Software Evolution)
SOFTWARE MAINTENANCE
“A software system must be continually adapted during its overall life cycle or it
progressively becomes less satisfactory.”
(cit. Lehman’s First Law of Software Evolution)
Software Maintenance represents the most
expensive, time consuming and challenging
phase of the whole development process.
SOFTWARE MAINTENANCE
“A software system must be continually adapted during its overall life cycle or it
progressively becomes less satisfactory.”
(cit. Lehman’s First Law of Software Evolution)
Software Maintenance represents the most
expensive, time consuming and challenging
phase of the whole development process.
Software Maintenance could account up to the
85-90% of the total software costs.
ISSUES
SOFTWARE
MAINTENANCE
“Software Maintenance is about change!”
(cit. S. Jarzabek)
Change Analysis
ISSUES
SOFTWARE
MAINTENANCE
“Software Maintenance is about change!”
(cit. S. Jarzabek)
Change Analysis
Program
Comprehension
The documentation is usually
scarce or not up to date!
ISSUES
SOFTWARE
MAINTENANCE
The source code (usually) represents the most
reliable source of information about the system
“Software Maintenance is about change!”
(cit. S. Jarzabek)
Change Analysis
Program
Comprehension
The documentation is usually
scarce or not up to date!
ISSUES
SOFTWARE
MAINTENANCE
The source code (usually) represents the most
reliable source of information about the system
“Software Maintenance is about change!”
(cit. S. Jarzabek)
Change Analysis
Program
Comprehension
Reverse Engineering
The documentation is usually
scarce or not up to date!
REVERSE
E N G I N E
E R I N G
• Definition of tools and techniques to support maintenance activities
• Goal: Build higher-level software models in an automatic fashion
gathering information from the source code or any other document
• Goal: To aid the comprehension of the system
REVERSE
E N G I N E
E R I N G
• Definition of tools and techniques to support maintenance activities
• Goal: Build higher-level software models in an automatic fashion
gathering information from the source code or any other document
• Goal: To aid the comprehension of the system
STATIC
ANALYSIS
DYNAMIC
ANALYSIS
REVERSE
E N G I N E
E R I N G
• Definition of tools and techniques to support maintenance activities
• Goal: Build higher-level software models in an automatic fashion
gathering information from the source code or any other document
• Goal: To aid the comprehension of the system
STATIC
ANALYSIS
DYNAMIC
ANALYSIS
MACHINE
L E A R N
I N G
• Provides computational effective solutions to analyze large data sets
MACHINE
L E A R N
I N G
• Provides computational effective solutions to analyze large data sets
• Provides solutions that can be tailored to different tasks/domains
MACHINE
L E A R N
I N G
• Provides computational effective solutions to analyze large data sets
• Provides solutions that can be tailored to different tasks/domains
• Requires many efforts in:
• the definition of the relevant information best suited for the specific task/domain
MACHINE
L E A R N
I N G
• Provides computational effective solutions to analyze large data sets
• Provides solutions that can be tailored to different tasks/domains
• Requires many efforts in:
• the definition of the relevant information best suited for the specific task/domain
• the application of the learning algorithms to the considered data
UNSUPERVISEDLEARNING
• Supervised Learning:
• Learn from labelled samples
• Unsupervised Learning:
• Learn (directly) from the data
Learn by examples
UNSUPERVISEDLEARNING
• Supervised Learning:
• Learn from labelled samples
• Unsupervised Learning:
• Learn (directly) from the data
Learn by examples
UNSUPERVISEDLEARNING
• Supervised Learning:
• Learn from labelled samples
• Unsupervised Learning:
• Learn (directly) from the data
Learn by examples
(+) No cost of labeling samples
(-) Trade-off imposed on the quality of the data
THESIS
C O N T R
I B U T I
O N S
&
THESIS
C O N T R
I B U T I
O N S
Contributions to three relevant and related
open issues in Software Maintenance
&
THESIS
C O N T R
I B U T I
O N S
Contributions to three relevant and related
open issues in Software Maintenance
1. SOFTWARE
RE-MODULARIZATION
3. CLONE
DETECTION
2. SOURCE CODE
NORMALIZATION
&
ISSUE
#1
SOFTWARE
RE-MODULARIZATION
Software Classes
UI Process
Components
UI
Components
Data Access
Components
Data Helpers /
Utilities
Security
Operational Management
Communications
Business
Components
Application Facade
Buisiness
Workflows
Messages
Interfaces
Service Interfaces
Re-modularization provides a way to
support software maintainers by
automatically grouping together
(clustering) “related” software classes
SOFTWARE
RE-MODULARIZATION
PROBL
EM
S T A T E
M E N T
Re-modularization provides a way to
support software maintainers by
automatically grouping together
(clustering) “related” software classes
SOFTWARE
RE-MODULARIZATION
External
Systems
Service Consumers
Services
Service Interfaces
Messages
Interfaces
Cross Cutting
Security
OperationalManagement
Communications
Data
Data Access
Components
Data Helpers /
Utilities
Presentation
UI
Components
UI Process
Components
Business
Application Facade
Buisiness
Workflows
Business
Components
Clusters of Software Classes
PROBL
EM
S T A T E
M E N T
SOURCECODE
LEXICALINFORMATION
SOURCECODE
LEXICALINFORMATION
SOURCECODE
LEXICALINFORMATION (IDEA): Exploit the lexical information gathered from the source code
to produce clusters of classes that are lexically related.
SOURCECODE
LEXICALINFORMATION (IDEA): Exploit the lexical information gathered from the source code
to produce clusters of classes that are lexically related.
State of the Art: Information Retrieval (IR) based approaches.
IRINDEXING
Implicit assumption: The “same” words are used whenever a
particular concept occurs
IRINDEXING
1. Tokenization
2. Normalization
draw, the, are,
null, handl,
box, r,
rectangl, g,
graphic,
box,
display, box,
...
Draws, the, are,
NullHandle,
box, r,
Rectangle, g,
Graphics,
box,
displayBox,
...
Implicit assumption: The “same” words are used whenever a
particular concept occurs
IRINDEXING
SOURCECODEZONES
LEXICALINFORMATION
SOURCECODEZONES
LEXICALINFORMATION
Class Names
SOURCECODEZONES
LEXICALINFORMATION
Class Names
Attribute Names
SOURCECODEZONES
LEXICALINFORMATION
Class Names
Attribute Names
Method Names
Method Names
SOURCECODEZONES
LEXICALINFORMATION
Class Names
Attribute Names
Method Names
Parameter Names
Method Names
Parameter Names
SOURCECODEZONES
LEXICALINFORMATION
Class Names
Attribute Names
Method Names
Parameter Names
Comments
Comments
Method Names
Parameter Names
Comments
SOURCECODEZONES
LEXICALINFORMATION
Source Code
Class Names
Attribute Names
Method Names
Parameter Names
Comments
Comments
Method Names
Parameter Names
Source Code
Comments
ZONEINDEXING
ZONEINDEXING
RQ1: Do terms in different Zones provide different contributions?
1. Tokenization
2. Normalization
ZONEINDEXING
Draws, the, are,
NullHandle,
box, r,
draw, g,
Graphics,
color,
displayBox,
...
draw, the, are,
null, handl,
box, r,
draw, g,
graphic,
color,
display, box,
...
RQ1: Do terms in different Zones provide different contributions?
SOURCECODELEXICON JFreeChart JEdit
JUnit Xerces
SOURCECODELEXICON JFreeChart
• Good lexicon in every Zone!
SOURCECODELEXICON JEdit
• Very good in Comments
• Very poor in Method and Parameter names
SOURCECODELEXICON JUnit
• Good in Class and Attribute names
• No Comments at all
SOURCECODELEXICON Xerces
• Poor in Method, Parameter names and Comments
• Good in Method Code and Variable names
SOURCECODELEXICON JFreeChart JEdit
JUnit Xerces
RQ2: How to automatically weight the different contributions ?
CONTRI
BUTION
SOFTWARE RE-
MODULARIZATION
• We propose a source code Zone Indexing
Corazza A., Di Martino, S., Maggio, V., Scanniello, G.
Investigating the use of lexical information for software system clustering
(2011) Proceedings of the European Conference on Software Maintenance and Reengineering, CSMR, art. no. 5741257, pp. 35-44.
ISSN: 15345351 ISBN: 978-076954343-7
Corazza, A., Di Martino, S., Maggio, V., Scanniello, G.
Combining machine learning and information retrieval techniques for software clustering
(2012) Communications in Computer and Information Science, 255 CCIS, pp. 42-60. ISSN: 18650929 ISBN: 978-364228032-0
CONTRI
BUTION
SOFTWARE RE-
MODULARIZATION
• We propose a source code Zone Indexing
• An approach to automatically weight the contribution of different terms in
Zones
• Maximum-Likelihood Estimation (MLE) Approach
Corazza A., Di Martino, S., Maggio, V., Scanniello, G.
Investigating the use of lexical information for software system clustering
(2011) Proceedings of the European Conference on Software Maintenance and Reengineering, CSMR, art. no. 5741257, pp. 35-44.
ISSN: 15345351 ISBN: 978-076954343-7
Corazza, A., Di Martino, S., Maggio, V., Scanniello, G.
Combining machine learning and information retrieval techniques for software clustering
(2012) Communications in Computer and Information Science, 255 CCIS, pp. 42-60. ISSN: 18650929 ISBN: 978-364228032-0
CONTRI
BUTION
SOFTWARE RE-
MODULARIZATION
• We propose a source code Zone Indexing
• An approach to automatically weight the contribution of different terms in
Zones
• Maximum-Likelihood Estimation (MLE) Approach
• A variant of the K-medoid clustering algorithm
• Tailored to the software clustering task
Corazza A., Di Martino, S., Maggio, V., Scanniello, G.
Investigating the use of lexical information for software system clustering
(2011) Proceedings of the European Conference on Software Maintenance and Reengineering, CSMR, art. no. 5741257, pp. 35-44.
ISSN: 15345351 ISBN: 978-076954343-7
Corazza, A., Di Martino, S., Maggio, V., Scanniello, G.
Combining machine learning and information retrieval techniques for software clustering
(2012) Communications in Computer and Information Science, 255 CCIS, pp. 42-60. ISSN: 18650929 ISBN: 978-364228032-0
StackedVerticalBarRenderer
LineAndShapeRenderer
K-MEDOIDVARIANT
ArrayHolder
Plot
LinearPlotFitAlgorithm
LinearPlotFit
HorizontalNumberAxis
NumberAxisPropertyEditPanel
PlotFitAlgorithm
ChartFrame
GraphicsHolder
PropertyEditPanel
DateAxisPropertyEditPanel
StandardLegend
LegendItem
FigureLegend
VerticalNumberAxis
VerticalXYDataPlot
DataPlot
SampleXYDataset
SampleXYDatasetThread
VerticalPlot
FitAlgorithm
StackedVerticalBarRenderer
LineAndShapeRenderer
K-MEDOIDVARIANT
ArrayHolder
Plot
LinearPlotFitAlgorithm
LinearPlotFit
HorizontalNumberAxis
NumberAxisPropertyEditPanel
PlotFitAlgorithm
ChartFrame
GraphicsHolder
PropertyEditPanel
DateAxisPropertyEditPanel
StandardLegend
LegendItem
FigureLegend
VerticalNumberAxis
VerticalXYDataPlot
DataPlot
SampleXYDataset
SampleXYDatasetThread
Non-extreme Distribution
property
VerticalPlot
FitAlgorithm
StackedVerticalBarRenderer
LineAndShapeRenderer
K-MEDOIDVARIANT
ArrayHolder
Plot
LinearPlotFitAlgorithm
LinearPlotFit
HorizontalNumberAxis
NumberAxisPropertyEditPanel
PlotFitAlgorithm
ChartFrame
GraphicsHolder
PropertyEditPanel
DateAxisPropertyEditPanel
StandardLegend
LegendItem
FigureLegend
VerticalNumberAxis
VerticalXYDataPlot
DataPlot
SampleXYDataset
SampleXYDatasetThread
Non-extreme Distribution
property
VerticalPlot
FitAlgorithm
StackedVerticalBarRenderer
LineAndShapeRenderer
K-MEDOIDVARIANT
ArrayHolder
Plot
LinearPlotFitAlgorithm
LinearPlotFit
HorizontalNumberAxis
NumberAxisPropertyEditPanel
PlotFitAlgorithm
ChartFrame
GraphicsHolder
PropertyEditPanel
DateAxisPropertyEditPanel
StandardLegend
LegendItem
FigureLegend
VerticalNumberAxis
VerticalXYDataPlot
DataPlot
SampleXYDataset
SampleXYDatasetThread
Non-extreme Distribution
property
VerticalPlot
FitAlgorithm
Extreme
Extreme
StackedVerticalBarRenderer
LineAndShapeRenderer
K-MEDOIDVARIANT
ArrayHolder
Plot
LinearPlotFitAlgorithm
LinearPlotFit
HorizontalNumberAxis
NumberAxisPropertyEditPanel
PlotFitAlgorithm
ChartFrame
GraphicsHolder
PropertyEditPanel
DateAxisPropertyEditPanel
StandardLegend
LegendItem
FigureLegend
VerticalNumberAxis
VerticalXYDataPlot
DataPlot
SampleXYDataset
SampleXYDatasetThread
Non-extreme Distribution
property
VerticalPlot
FitAlgorithm
AUTHORITATIVENESS
RE
SULTS
Comparison of Clustering Results with an Authoritative Target Partition
Ant Lucene Tomcat Azureus Hibernate ITextdotNet jEdit JFreeChart JFTP JHotDraw JRefactory JUnit LiferayPortal Pmd Synapse TigerEnvelopes Velocity Xalan Xerces
0
0.2
0.4
0.6
0.8
1.0
19 target Open Source Java Systems FLAT INDEXING (NO ZONES) = STATE OF THE ART
FLAT INDEXING (NO ZONES) ZONE INDEXING ZONE INDEXING + MLE WEIGHTS
AUTHORITATIVENESS
RE
SULTS
Comparison of Clustering Results with an Authoritative Target Partition
Ant Lucene Tomcat Azureus Hibernate ITextdotNet jEdit JFreeChart JFTP JHotDraw JRefactory JUnit LiferayPortal Pmd Synapse TigerEnvelopes Velocity Xalan Xerces
0
0.2
0.4
0.6
0.8
1.0
19 target Open Source Java Systems FLAT INDEXING (NO ZONES) = STATE OF THE ART
FLAT INDEXING (NO ZONES) ZONE INDEXING ZONE INDEXING + MLE WEIGHTS
AUTHORITATIVENESS
RE
SULTS
Comparison of Clustering Results with an Authoritative Target Partition
Ant Lucene Tomcat Azureus Hibernate ITextdotNet jEdit JFreeChart JFTP JHotDraw JRefactory JUnit LiferayPortal Pmd Synapse TigerEnvelopes Velocity Xalan Xerces
0
0.2
0.4
0.6
0.8
1.0
19 target Open Source Java Systems FLAT INDEXING (NO ZONES) = STATE OF THE ART
FLAT INDEXING (NO ZONES) ZONE INDEXING ZONE INDEXING + MLE WEIGHTS
AUTHORITATIVENESS
RE
SULTS
Comparison of Clustering Results with an Authoritative Target Partition
19 target Open Source Java Systems FLAT INDEXING (NO ZONES) = STATE OF THE ART
0
0.2
0.4
0.6
0.8
1.0
Ant Lucene Tomcat Azureus Hibernate ITextdotNet jEdit JFreeChart JFTP JHotDraw JRefactory JUnit LiferayPortal Pmd Synapse TigerEnvelopes Velocity Xalan Xerces
FLAT INDEXING (NO ZONES) ZONE INDEXING ZONE INDEXING + MLE WEIGHTS
ISSUE
#2
SOURCE CODE
NORMALIZATION
SOURCECODE
LEXICALINFORMATION Implicit assumption: The “same” words are used whenever a
particular concept occurs
SOURCECODE
LEXICALINFORMATION Implicit assumption: The “same” words are used whenever a
particular concept occurs
1. Tokenization
2. Normalization
1.5 Identifier
Splitting
• Splitting algorithms based on naming conventions
• camelCase Splitter: r’(?<=!^)([A-Z][a-z]+)’
• NullHandle ==> Null | Handle
• displayBox ==> display | Box
draw, the, are,
box, r,
rectangl, g,
graphic,
color,
....
STATEOFTHEARTTOOLS
display, box
null, handl
1. Tokenization
2. Normalization
1.5 Identifier
Splitting
• Splitting algorithms based on naming conventions
• camelCase Splitter: r’(?<=!^)([A-Z][a-z]+)’
• NullHandle ==> Null | Handle
• displayBox ==> display | Box
draw, the, are,
box, r,
rectangl, g,
graphic,
color,
....
STATEOFTHEARTTOOLS
display, box
null, handl
1. Tokenization
2. Normalization
1.5 Identifier
Splitting
• Splitting algorithms based on naming conventions
• camelCase Splitter: r’(?<=!^)([A-Z][a-z]+)’
• NullHandle ==> Null | Handle
draw, the, are,
box, r,
rectangl, g,
graphic,
color,
....
STATEOFTHEARTTOOLS
display, box
null, handl
1. Tokenization
2. Normalization
1.5 Identifier
Splitting
• Splitting algorithms based on naming conventions
• camelCase Splitter: r’(?<=!^)([A-Z][a-z]+)’
• NullHandle ==> Null | Handle
• displayBox ==> display | Box
draw, the, are,
box, r,
rectangl, g,
graphic,
color,
....
STATEOFTHEARTTOOLS
null, handl
display, box
• camelCase Splitter: r’(?<=!^)([A-Z][a-z]+)’
IDENTIFIERSSPLITTING
• camelCase Splitter: r’(?<=!^)([A-Z][a-z]+)’
• drawXORRect ==> drawXOR | Rect
IDENTIFIERSSPLITTING
• camelCase Splitter: r’(?<=!^)([A-Z][a-z]+)’
• drawXORRect ==> drawXOR | Rect
• drawxorrect ==> NO SPLIT
Splitting algorithms based on naming conventions
are not robust enough
IDENTIFIERSSPLITTING
Splitting algorithms based on naming conventions
are not robust enough
ABBREVIATIONSEXPANSION
Splitting algorithms based on naming conventions
are not robust enough
• Heavy use of Abbreviations in the source code
• r as for Rectangle
• rect as for Rectangle
ABBREVIATIONSEXPANSION
Splitting algorithms based on naming conventions
are not robust enough
• Heavy use of Abbreviations in the source code
• r as for Rectangle
• rect as for Rectangle
ABBREVIATIONSEXPANSION
1. Tokenization
CODENORMALIZATION
2. Normalization
1.5 Code
Normalization
• AMAP (Hill and Pollock, 2008)
• SAMURAI (Enslen, et.al , 2011)
• TIDIER (Guerrouj, et.al , 2011)
• GenTest+Normalize (Lawrie and Binkley, 2011)
• LINSEN (Corazza, Di Martino and Maggio, 2012)
draw, the, are,
null, handl,
box,
rectangl,
graphic,
color,
display, box,
rectangl,
draw, xor,
rectangl
1. Tokenization
CODENORMALIZATION
2. Normalization
1.5 Code
Normalization
• AMAP (Hill and Pollock, 2008)
• SAMURAI (Enslen, et.al , 2011)
• TIDIER (Guerrouj, et.al , 2011)
• GenTest+Normalize (Lawrie and Binkley, 2011)
• LINSEN (Corazza, Di Martino and Maggio, 2012)
draw, the, are,
null, handl,
box,
rectangl,
graphic,
color,
display, box,
rectangl,
draw,xor,
rectangl
CONTRI
BUTION
SOURCE CODE
NORMALIZATION
• LINSEN: novel technique for code normalization
• Able to both Split Identifiers and Expand possible occurring
abbreviations
Corazza, A., Di Martino, S., Maggio, V.
LINSEN: An efficient approach to split identifiers and expand abbreviations
(2012) IEEE International Conference on Software Maintenance, ICSM, art. no. 6405277, pp. 233-242. ISBN: 978-146732312-3
CONTRI
BUTION
SOURCE CODE
NORMALIZATION
• LINSEN: novel technique for code normalization
• Able to both Split Identifiers and Expand possible occurring
abbreviations
• Based on an efficient String Matching technique:
Baeza-Yates&Perlberg Algorithm (BYP)
• Exploit different Sources of Information to find the matching
words
Corazza, A., Di Martino, S., Maggio, V.
LINSEN: An efficient approach to split identifiers and expand abbreviations
(2012) IEEE International Conference on Software Maintenance, ICSM, art. no. 6405277, pp. 233-242. ISBN: 978-146732312-3
SKETCHOFTHEALGORITHM
• NODES correspond to characters of the current identifier
Model: Weighted Directed Graph
Example: drawXORRect identifier
0
11
1
9
4
5
7
8
2 63
10
SKETCHOFTHEALGORITHM
• ARCS corresponds to matchings between identifier substrings
and dictionary words
• NODES correspond to characters of the current identifier
Model: Weighted Directed Graph
Example: drawXORRect identifier
0
11
1
9
4
5
7
8
2 63
10
SKETCHOFTHEALGORITHM
• ARCS corresponds to matchings between identifier substrings
and dictionary words
• Padding Arcs to ensure the Graph always connected
• NODES correspond to characters of the current identifier
Model: Weighted Directed Graph
Example: drawXORRect identifier
0
11
1
9
4
“r”,C-MAX
“d”,C-MAX
5
7
8
2
“R”,C-MAX
6
“a”,C-MAX
3
“w”,C-MAX
“X”,C-MAX
“O”,C-MAX
“e”,C-MAX
10
“c”,C-MAX“t”,C-MAX
“raw”,c(“raw”)
“draw”,c(“draw”)
“or”,c(“or”)
“xor”,c(“xor”)
“rectangle”,c(“rectangle”)
“R”,C-MAX
SKETCHOFTHEALGORITHM
• Every Arc is Labelled with the corresponding dictionary word
• Weights represent the “cost” of each matching
• Cost function [c(“word”)] favors longest words and domain-
related information
Model: Weighted Directed Graph
Example: drawXORRect identifier
0
11
1
9
4
“r”,C-MAX
“d”,C-MAX
5
7
8
2
“R”,C-MAX
6
“a”,C-MAX
3
“w”,C-MAX
“X”,C-MAX
“O”,C-MAX
“e”,C-MAX
10
“c”,C-MAX“t”,C-MAX
“raw”,c(“raw”)
“draw”,c(“draw”)
“or”,c(“or”)
“xor”,c(“xor”)
“rectangle”,c(“rectangle”)
“R”,C-MAX
SKETCHOFTHEALGORITHM
• The final Mapping Solution corresponds to the sequence of labels
in the path with the minimum cost (Djikstra Algorithm)
Model: Weighted Directed Graph
Example: drawXORRect identifier
0
11
1
9
4
“r”,C-MAX
“d”,C-MAX
5
7
8
2
“R”,C-MAX
6
“a”,C-MAX
3
“w”,C-MAX
“X”,C-MAX
“O”,C-MAX
“e”,C-MAX
10
“c”,C-MAX“t”,C-MAX
“raw”,c(“raw”)
“draw”,c(“draw”)
“or”,c(“or”)
“xor”,c(“xor”)
“rectangle”,c(“rectangle”)
“R”,C-MAX
SPLITTING
Accuracy Rates for the
comparison with
DTW (Madani et. al 2010)
0
0.25
0.5
0.75
1
JhotDraw 5.1 Lynx 2.8.5
DTW LINSEN DTW LINSEN
DTW LINSEN
RE
SULTS
# 1
Accuracy Rates for the
comparison with GenTest
(Lawrie and Binkley, 2011)
0
0.25
0.5
0.75
1
which 2.20 a2ps 4.14
GenTest LINSEN
DTW O(n3)
Accuracy Rates for the
comparison with
AMAP (Hill and Pollock, 2008)
0
0.25
0.5
0.75
1
CW DL OO AC PR SL TOTAL (Avg)
AMAP
LINSEN
EXPANSION
RE
SULTS
# 2
CW: Combination Words
DL: Dropped Letters
OO: Others
AC: Acronyms
PR: Prefix
SL: Single Letters
ISSUE
#3
CLONE
DETECTION
PROBL
EM
S T A T E
M E N T
CLONE DETECTION
PROBL
EM
S T A T E
M E N T
CLONE DETECTION
Clones Textual Similarity
PROBL
EM
S T A T E
M E N T
CLONE DETECTION
Clones Functional Similarity
PROBL
EM
S T A T E
M E N T
CLONE DETECTION
Clones affect the reliability of the system!
Sneaky Bug!
Duplix
Scorpio
PMD
CCFinder
Dup
CPD
Duplix
Shinobi
Clone Detective
Gemini
iClones
KClone
ConQAT
Deckard
Clone Digger
JCCD
CloneDr
SimScan
CLICS
NiCAD
Simian
Duploc
Dude
SDD
STATEOFTHEARTTOOLS
Duplix
Scorpio
PMD
CCFinder
Dup
CPD
Duplix
Shinobi
Clone Detective
Gemini
iClones
KClone
ConQAT
Deckard
Clone Digger
JCCD
CloneDr
SimScan
CLICS
NiCAD
Simian
Duploc
Dude
SDD
Text Based Tools:
Text is compared line by line
STATEOFTHEARTTOOLS
Duplix
Scorpio
PMD
CCFinder
Dup
CPD
Duplix
Shinobi
Clone Detective
Gemini
iClones
KClone
ConQAT
Deckard
Clone Digger
JCCD
CloneDr
SimScan
CLICS
NiCAD
Simian
Duploc
Dude
SDD
Token Based Tools:
Token sequences are
compared to sequences
STATEOFTHEARTTOOLS
Duplix
Scorpio
PMD
CCFinder
Dup
CPD
Duplix
Shinobi
Clone Detective
Gemini
iClones
KClone
ConQAT
Deckard
Clone Digger
JCCD
CloneDr
SimScan
CLICS
NiCAD
Simian
Duploc
Dude
SDD
Syntax Based Tools:
Syntax subtrees are compared
to each other
STATEOFTHEARTTOOLS
Duplix
Scorpio
PMD
CCFinder
Dup
CPD
Duplix
Shinobi
Clone Detective
Gemini
iClones
KClone
ConQAT
Deckard
Clone Digger
JCCD
CloneDr
SimScan
CLICS
NiCAD
Simian
Duploc
Dude
SDD
Graph Based Tools:
(sub) graphs are compared to
each other
STATEOFTHEARTTOOLS
CONTRI
BUTION
CLONE DETECTION
• Combining different sources of information to improve the
effectiveness of the detection process
Corazza, A., Di Martino, S., Maggio, V., Scanniello, G.
A Tree Kernel based approach for clone detection
(2010) IEEE International Conference on Software Maintenance, ICSM, art. no. 5609715. ISBN: 978-142448629-8
Corazza, A., Di Martino, S., Maggio, V., Moschitti, A., Passerini, A., Scanniello, G., Silvestri F.
Using Machine Learning and Information Retrieval Techniques to Improve Software Maintainability
(2013) Communications in Computer and Information Science, In Press
CONTRI
BUTION
CLONE DETECTION
• Combining different sources of information to improve the
effectiveness of the detection process
• Structural and Lexical Information
Corazza, A., Di Martino, S., Maggio, V., Scanniello, G.
A Tree Kernel based approach for clone detection
(2010) IEEE International Conference on Software Maintenance, ICSM, art. no. 5609715. ISBN: 978-142448629-8
Corazza, A., Di Martino, S., Maggio, V., Moschitti, A., Passerini, A., Scanniello, G., Silvestri F.
Using Machine Learning and Information Retrieval Techniques to Improve Software Maintainability
(2013) Communications in Computer and Information Science, In Press
CONTRI
BUTION
CLONE DETECTION
• Combining different sources of information to improve the
effectiveness of the detection process
• Structural and Lexical Information
• Application of Kernel Methods to Source Code Structures
Corazza, A., Di Martino, S., Maggio, V., Scanniello, G.
A Tree Kernel based approach for clone detection
(2010) IEEE International Conference on Software Maintenance, ICSM, art. no. 5609715. ISBN: 978-142448629-8
Corazza, A., Di Martino, S., Maggio, V., Moschitti, A., Passerini, A., Scanniello, G., Silvestri F.
Using Machine Learning and Information Retrieval Techniques to Improve Software Maintainability
(2013) Communications in Computer and Information Science, In Press
CONTRI
BUTION
CLONE DETECTION
• Combining different sources of information to improve the
effectiveness of the detection process
• Structural and Lexical Information
• Application of Kernel Methods to Source Code Structures
• Typically applied in other field such as NLP or Bioinformatics
Corazza, A., Di Martino, S., Maggio, V., Scanniello, G.
A Tree Kernel based approach for clone detection
(2010) IEEE International Conference on Software Maintenance, ICSM, art. no. 5609715. ISBN: 978-142448629-8
Corazza, A., Di Martino, S., Maggio, V., Moschitti, A., Passerini, A., Scanniello, G., Silvestri F.
Using Machine Learning and Information Retrieval Techniques to Improve Software Maintainability
(2013) Communications in Computer and Information Science, In Press
CODE
STRUCTURES
KERNELSFORSTRUCTURES
Computation of the dot product between (Graph) Structures
K( ),
CODE
STRUCTURES
KERNELSFORSTRUCTURES
Abstract Syntax Tree (AST)
Tree structure representing the syntactic structure of
the different instructions of a program (function)
Program Dependencies Graph (PDG)
(Directed) Graph structure representing the relationship
among the different statement of a program
Computation of the dot product between (Graph) Structures
K( ),
CODE
STRUCTURES
KERNELSFORSTRUCTURES
Abstract Syntax Tree (AST)
Tree structure representing the syntactic structure of
the different instructions of a program (function)
Program Dependencies Graph (PDG)
(Directed) Graph structure representing the relationship
among the different statement of a program
Computation of the dot product between (Graph) Structures
K( ),
CODE
KERNELFORCLONES
<
x y = =
x +
x 1
y -
y 1
while
block
while
block
block
if
>
b a = =
a +
a 1
b -
b 1
>
b 0 =
c 3
CODE AST
KERNELFORCLONES
<
x y = =
x +
x 1
y -
y 1
while
block
while
block
block
if
>
b a = =
a +
a 1
b -
b 1
>
b 0 =
c 3
CODE AST AST KERNEL
KERNELFORCLONES
<
block
while
= =
block
=
y -
=
x +
+
x 1
-
y 1
<
x y
>
b 0 =
c 3
if
block
>
b a
-
b 1
<
block
while
+
a 1
=
b -
=
a +
while
block<
x y
KERNELS
FOR CODE
STRUCTURES:
AST
KERNELFEATURES
while
block<
x y
KERNELS
FOR CODE
STRUCTURES:
AST
KERNELFEATURES Instruction Class (IC)
i.e., LOOP, CALL,
CONDITIONAL_STATEMENT
while
block<
x y
KERNELS
FOR CODE
STRUCTURES:
AST
KERNELFEATURES Instruction Class (IC)
i.e., LOOP, CALL,
CONDITIONAL_STATEMENT
Instruction (I)
i.e., FOR, IF, WHILE, RETURN
while
block<
x y
KERNELS
FOR CODE
STRUCTURES:
AST
KERNELFEATURES Instruction Class (IC)
i.e., LOOP, CALL,
CONDITIONAL_STATEMENT
Instruction (I)
i.e., FOR, IF, WHILE, RETURN
Context (C)
i.e., Instruction Class of
the closer statement node
while
block<
x y
KERNELS
FOR CODE
STRUCTURES:
AST
KERNELFEATURES Instruction Class (IC)
i.e., LOOP, CALL,
CONDITIONAL_STATEMENT
Instruction (I)
i.e., FOR, IF, WHILE, RETURN
Context (C)
i.e., Instruction Class of
the closer statement node
Lexemes (Ls)
Lexical information gathered
(recursively) from leaves
while
block<
x y
KERNELS
FOR CODE
STRUCTURES:
AST
KERNELFEATURES
IC = Conditional-Expr
I = Less-operator
C = Loop
Ls= [x,y]
IC = Loop
I = while-loop
C = Function-Body
Ls= [x, y]
Instruction Class (IC)
i.e., LOOP, CALL,
CONDITIONAL_STATEMENT
Instruction (I)
i.e., FOR, IF, WHILE, RETURN
Context (C)
i.e., Instruction Class of
the closer statement node
Lexemes (Ls)
Lexical information gathered
(recursively) from leaves
IC = Block
I = while-body
C = Loop
Ls= [ x ]
CLONE DETECTION
• Comparison with another (pure) AST-based clone detector
• Comparison on a system with randomly seeded clones
0
0.25
0.5
0.75
1
Precision Recall F-measure
CloneDigger Tree Kernel Tool
RE
SULTS
Results refer to clones where code
fragments have been modified by adding/
removing or changing code statements
CONCLUSIONS
CONCLUSIONS
CONCLUSIONS
CONCLUSIONS
CONCLUSIONS
CONCLUSIONS
FUTURE RESEARCH DIRECTIONS
• Re-modularization:
• Integrate code normalization algorithm (LINSEN) to lexical analysis
• Code Normalization:
• Investigate the contributions of different sources of information
used for disambiguations
• Clone Detection:
• Combine Static (and Kernel methods) with Dynamic Analysis
THANK YOU
June 5th, 2013 - Ph.D. Defense
Valerio Maggio
Ph.D. Student, University of Naples “Federico II”
valerio.maggio@unina.it

Weitere ähnliche Inhalte

Was ist angesagt?

Software Analytics: Towards Software Mining that Matters
Software Analytics: Towards Software Mining that MattersSoftware Analytics: Towards Software Mining that Matters
Software Analytics: Towards Software Mining that MattersTao Xie
 
Software Analytics: Data Analytics for Software Engineering and Security
Software Analytics: Data Analytics for Software Engineering and SecuritySoftware Analytics: Data Analytics for Software Engineering and Security
Software Analytics: Data Analytics for Software Engineering and SecurityTao Xie
 
Using electronic laboratory notebooks in the academic life sciences: a group ...
Using electronic laboratory notebooks in the academic life sciences: a group ...Using electronic laboratory notebooks in the academic life sciences: a group ...
Using electronic laboratory notebooks in the academic life sciences: a group ...SC CTSI at USC and CHLA
 
Big(ger) Data in Software Engineering
Big(ger) Data in Software EngineeringBig(ger) Data in Software Engineering
Big(ger) Data in Software EngineeringMehdi Mirakhorli
 
Towards Mining Software Repositories Research that Matters
Towards Mining Software Repositories Research that MattersTowards Mining Software Repositories Research that Matters
Towards Mining Software Repositories Research that MattersTao Xie
 
Bug or Not? Bug Report Classification using N-Gram Idf
Bug or Not? Bug Report Classification using N-Gram IdfBug or Not? Bug Report Classification using N-Gram Idf
Bug or Not? Bug Report Classification using N-Gram IdfHideaki Hata
 
Mining Software Repositories
Mining Software RepositoriesMining Software Repositories
Mining Software RepositoriesIsrael Herraiz
 
Machine & Deep Learning: Practical Deployments and Best Practices for the Nex...
Machine & Deep Learning: Practical Deployments and Best Practices for the Nex...Machine & Deep Learning: Practical Deployments and Best Practices for the Nex...
Machine & Deep Learning: Practical Deployments and Best Practices for the Nex...inside-BigData.com
 
Machine Learning for Everyone
Machine Learning for EveryoneMachine Learning for Everyone
Machine Learning for EveryoneAly Abdelkareem
 
Finding Bad Code Smells with Neural Network Models
Finding Bad Code Smells with Neural Network Models Finding Bad Code Smells with Neural Network Models
Finding Bad Code Smells with Neural Network Models IJECEIAES
 
Transferring Software Testing Tools to Practice
Transferring Software Testing Tools to PracticeTransferring Software Testing Tools to Practice
Transferring Software Testing Tools to PracticeTao Xie
 
Determining the Credibility of Science Communication
Determining the Credibility of Science CommunicationDetermining the Credibility of Science Communication
Determining the Credibility of Science CommunicationIsabelle Augenstein
 
Requirementv4
Requirementv4Requirementv4
Requirementv4stat
 
Anomaly detection (Unsupervised Learning) in Machine Learning
Anomaly detection (Unsupervised Learning) in Machine LearningAnomaly detection (Unsupervised Learning) in Machine Learning
Anomaly detection (Unsupervised Learning) in Machine LearningKuppusamy P
 
a-novel-web-attack-detection-system-for-internet-of-things-via-ensemble-class...
a-novel-web-attack-detection-system-for-internet-of-things-via-ensemble-class...a-novel-web-attack-detection-system-for-internet-of-things-via-ensemble-class...
a-novel-web-attack-detection-system-for-internet-of-things-via-ensemble-class...Manoj895639
 
The Python ecosystem for data science - Landscape Overview
The Python ecosystem for data science - Landscape OverviewThe Python ecosystem for data science - Landscape Overview
The Python ecosystem for data science - Landscape OverviewDr. Ananth Krishnamoorthy
 
Towards effective bug triage with software data reduction techniques
Towards effective bug triage with software data reduction techniquesTowards effective bug triage with software data reduction techniques
Towards effective bug triage with software data reduction techniquesPvrtechnologies Nellore
 
Assessing Galaxy's ability to express scientific workflows in bioinformatics
Assessing Galaxy's ability to express scientific workflows in bioinformaticsAssessing Galaxy's ability to express scientific workflows in bioinformatics
Assessing Galaxy's ability to express scientific workflows in bioinformaticsPeter van Heusden
 

Was ist angesagt? (20)

Software Analytics: Towards Software Mining that Matters
Software Analytics: Towards Software Mining that MattersSoftware Analytics: Towards Software Mining that Matters
Software Analytics: Towards Software Mining that Matters
 
Software Analytics: Data Analytics for Software Engineering and Security
Software Analytics: Data Analytics for Software Engineering and SecuritySoftware Analytics: Data Analytics for Software Engineering and Security
Software Analytics: Data Analytics for Software Engineering and Security
 
Using electronic laboratory notebooks in the academic life sciences: a group ...
Using electronic laboratory notebooks in the academic life sciences: a group ...Using electronic laboratory notebooks in the academic life sciences: a group ...
Using electronic laboratory notebooks in the academic life sciences: a group ...
 
Big(ger) Data in Software Engineering
Big(ger) Data in Software EngineeringBig(ger) Data in Software Engineering
Big(ger) Data in Software Engineering
 
Data mining weka
Data mining wekaData mining weka
Data mining weka
 
Towards Mining Software Repositories Research that Matters
Towards Mining Software Repositories Research that MattersTowards Mining Software Repositories Research that Matters
Towards Mining Software Repositories Research that Matters
 
Bug or Not? Bug Report Classification using N-Gram Idf
Bug or Not? Bug Report Classification using N-Gram IdfBug or Not? Bug Report Classification using N-Gram Idf
Bug or Not? Bug Report Classification using N-Gram Idf
 
Mining Software Repositories
Mining Software RepositoriesMining Software Repositories
Mining Software Repositories
 
Machine & Deep Learning: Practical Deployments and Best Practices for the Nex...
Machine & Deep Learning: Practical Deployments and Best Practices for the Nex...Machine & Deep Learning: Practical Deployments and Best Practices for the Nex...
Machine & Deep Learning: Practical Deployments and Best Practices for the Nex...
 
Machine Learning for Everyone
Machine Learning for EveryoneMachine Learning for Everyone
Machine Learning for Everyone
 
Finding Bad Code Smells with Neural Network Models
Finding Bad Code Smells with Neural Network Models Finding Bad Code Smells with Neural Network Models
Finding Bad Code Smells with Neural Network Models
 
Transferring Software Testing Tools to Practice
Transferring Software Testing Tools to PracticeTransferring Software Testing Tools to Practice
Transferring Software Testing Tools to Practice
 
Determining the Credibility of Science Communication
Determining the Credibility of Science CommunicationDetermining the Credibility of Science Communication
Determining the Credibility of Science Communication
 
Requirementv4
Requirementv4Requirementv4
Requirementv4
 
Anomaly detection (Unsupervised Learning) in Machine Learning
Anomaly detection (Unsupervised Learning) in Machine LearningAnomaly detection (Unsupervised Learning) in Machine Learning
Anomaly detection (Unsupervised Learning) in Machine Learning
 
a-novel-web-attack-detection-system-for-internet-of-things-via-ensemble-class...
a-novel-web-attack-detection-system-for-internet-of-things-via-ensemble-class...a-novel-web-attack-detection-system-for-internet-of-things-via-ensemble-class...
a-novel-web-attack-detection-system-for-internet-of-things-via-ensemble-class...
 
The Python ecosystem for data science - Landscape Overview
The Python ecosystem for data science - Landscape OverviewThe Python ecosystem for data science - Landscape Overview
The Python ecosystem for data science - Landscape Overview
 
Proposed Talk Outline for Pycon2017
Proposed Talk Outline for Pycon2017 Proposed Talk Outline for Pycon2017
Proposed Talk Outline for Pycon2017
 
Towards effective bug triage with software data reduction techniques
Towards effective bug triage with software data reduction techniquesTowards effective bug triage with software data reduction techniques
Towards effective bug triage with software data reduction techniques
 
Assessing Galaxy's ability to express scientific workflows in bioinformatics
Assessing Galaxy's ability to express scientific workflows in bioinformaticsAssessing Galaxy's ability to express scientific workflows in bioinformatics
Assessing Galaxy's ability to express scientific workflows in bioinformatics
 

Andere mochten auch

Unit testing with Junit
Unit testing with JunitUnit testing with Junit
Unit testing with JunitValerio Maggio
 
Unsupervised Machine Learning for clone detection
Unsupervised Machine Learning for clone detectionUnsupervised Machine Learning for clone detection
Unsupervised Machine Learning for clone detectionValerio Maggio
 
Machine Maintenance and Simple Troubleshooting
Machine Maintenance and Simple TroubleshootingMachine Maintenance and Simple Troubleshooting
Machine Maintenance and Simple TroubleshootingCJ F.
 
LINSEN an efficient approach to split identifiers and expand abbreviations
LINSEN an efficient approach to split identifiers and expand abbreviationsLINSEN an efficient approach to split identifiers and expand abbreviations
LINSEN an efficient approach to split identifiers and expand abbreviationsValerio Maggio
 
Unit testing and scaffolding
Unit testing and scaffoldingUnit testing and scaffolding
Unit testing and scaffoldingValerio Maggio
 
Maintenance of Machine & Main Functioning Parts of Machine
Maintenance of Machine & Main Functioning Parts of MachineMaintenance of Machine & Main Functioning Parts of Machine
Maintenance of Machine & Main Functioning Parts of MachineRohit Sorte
 
Basic Machine Shop Safety And Maintenance Tips
Basic Machine Shop Safety And Maintenance TipsBasic Machine Shop Safety And Maintenance Tips
Basic Machine Shop Safety And Maintenance TipsWhitney Mike Segura
 
Database Refactoring
Database RefactoringDatabase Refactoring
Database RefactoringAnton Keks
 
Maintenance Management
Maintenance ManagementMaintenance Management
Maintenance ManagementBisina Keshara
 

Andere mochten auch (13)

Unit testing with Junit
Unit testing with JunitUnit testing with Junit
Unit testing with Junit
 
Unsupervised Machine Learning for clone detection
Unsupervised Machine Learning for clone detectionUnsupervised Machine Learning for clone detection
Unsupervised Machine Learning for clone detection
 
Machine Maintenance and Simple Troubleshooting
Machine Maintenance and Simple TroubleshootingMachine Maintenance and Simple Troubleshooting
Machine Maintenance and Simple Troubleshooting
 
Junit
JunitJunit
Junit
 
LINSEN an efficient approach to split identifiers and expand abbreviations
LINSEN an efficient approach to split identifiers and expand abbreviationsLINSEN an efficient approach to split identifiers and expand abbreviations
LINSEN an efficient approach to split identifiers and expand abbreviations
 
JavaScript Refactoring
JavaScript RefactoringJavaScript Refactoring
JavaScript Refactoring
 
Unit testing and scaffolding
Unit testing and scaffoldingUnit testing and scaffolding
Unit testing and scaffolding
 
Junit
JunitJunit
Junit
 
Maintenance of Machine & Main Functioning Parts of Machine
Maintenance of Machine & Main Functioning Parts of MachineMaintenance of Machine & Main Functioning Parts of Machine
Maintenance of Machine & Main Functioning Parts of Machine
 
Basic Machine Shop Safety And Maintenance Tips
Basic Machine Shop Safety And Maintenance TipsBasic Machine Shop Safety And Maintenance Tips
Basic Machine Shop Safety And Maintenance Tips
 
Database Refactoring
Database RefactoringDatabase Refactoring
Database Refactoring
 
Sap plant maintenance
Sap plant maintenanceSap plant maintenance
Sap plant maintenance
 
Maintenance Management
Maintenance ManagementMaintenance Management
Maintenance Management
 

Ähnlich wie Unsupervised Machine Learning Techniques for Software Maintenance

Software Analytics - Achievements and Challenges
Software Analytics - Achievements and ChallengesSoftware Analytics - Achievements and Challenges
Software Analytics - Achievements and ChallengesTao Xie
 
sadfinal2007-121022230733-phpapp01.pdf
sadfinal2007-121022230733-phpapp01.pdfsadfinal2007-121022230733-phpapp01.pdf
sadfinal2007-121022230733-phpapp01.pdfshoukatali154717
 
V1_I2_2012_Paper3.doc
V1_I2_2012_Paper3.docV1_I2_2012_Paper3.doc
V1_I2_2012_Paper3.docpraveena06
 
Improvement of Software Maintenance and Reliability using Data Mining Techniques
Improvement of Software Maintenance and Reliability using Data Mining TechniquesImprovement of Software Maintenance and Reliability using Data Mining Techniques
Improvement of Software Maintenance and Reliability using Data Mining Techniquesijdmtaiir
 
Hardware Design Practices For Modern Hardware
Hardware Design Practices For Modern HardwareHardware Design Practices For Modern Hardware
Hardware Design Practices For Modern HardwareWinstina Kennedy
 
Unit 3 3 architectural design
Unit 3 3 architectural designUnit 3 3 architectural design
Unit 3 3 architectural designHiren Selani
 
Software Analytics: Towards Software Mining that Matters (2014)
Software Analytics:Towards Software Mining that Matters (2014)Software Analytics:Towards Software Mining that Matters (2014)
Software Analytics: Towards Software Mining that Matters (2014)Tao Xie
 
Over view of system analysis and design
Over view of system analysis and designOver view of system analysis and design
Over view of system analysis and designSaroj Dhakal
 
Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...
Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...
Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...Ali Alkan
 
CASE tools and their effects on software quality
CASE tools and their effects on software qualityCASE tools and their effects on software quality
CASE tools and their effects on software qualityUtkarsh Agarwal
 
Data Science.pptx NEW COURICUUMN IN DATA
Data Science.pptx NEW COURICUUMN IN DATAData Science.pptx NEW COURICUUMN IN DATA
Data Science.pptx NEW COURICUUMN IN DATAjaved75
 
AI for Software Engineering
AI for Software EngineeringAI for Software Engineering
AI for Software EngineeringMiroslaw Staron
 
System engineering
System engineeringSystem engineering
System engineeringLisa Elisa
 
Analyzing Big Data's Weakest Link (hint: it might be you)
Analyzing Big Data's Weakest Link  (hint: it might be you)Analyzing Big Data's Weakest Link  (hint: it might be you)
Analyzing Big Data's Weakest Link (hint: it might be you)HPCC Systems
 
International Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentInternational Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentIJERD Editor
 

Ähnlich wie Unsupervised Machine Learning Techniques for Software Maintenance (20)

Introduction
IntroductionIntroduction
Introduction
 
Software Analytics - Achievements and Challenges
Software Analytics - Achievements and ChallengesSoftware Analytics - Achievements and Challenges
Software Analytics - Achievements and Challenges
 
Software Development Life Cycle
Software Development Life Cycle Software Development Life Cycle
Software Development Life Cycle
 
sadfinal2007-121022230733-phpapp01.pdf
sadfinal2007-121022230733-phpapp01.pdfsadfinal2007-121022230733-phpapp01.pdf
sadfinal2007-121022230733-phpapp01.pdf
 
V1_I2_2012_Paper3.doc
V1_I2_2012_Paper3.docV1_I2_2012_Paper3.doc
V1_I2_2012_Paper3.doc
 
Improvement of Software Maintenance and Reliability using Data Mining Techniques
Improvement of Software Maintenance and Reliability using Data Mining TechniquesImprovement of Software Maintenance and Reliability using Data Mining Techniques
Improvement of Software Maintenance and Reliability using Data Mining Techniques
 
Intro
IntroIntro
Intro
 
Hardware Design Practices For Modern Hardware
Hardware Design Practices For Modern HardwareHardware Design Practices For Modern Hardware
Hardware Design Practices For Modern Hardware
 
Unit 3 3 architectural design
Unit 3 3 architectural designUnit 3 3 architectural design
Unit 3 3 architectural design
 
Software Analytics: Towards Software Mining that Matters (2014)
Software Analytics:Towards Software Mining that Matters (2014)Software Analytics:Towards Software Mining that Matters (2014)
Software Analytics: Towards Software Mining that Matters (2014)
 
Over view of system analysis and design
Over view of system analysis and designOver view of system analysis and design
Over view of system analysis and design
 
Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...
Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...
Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi...
 
CASE tools and their effects on software quality
CASE tools and their effects on software qualityCASE tools and their effects on software quality
CASE tools and their effects on software quality
 
Data Science.pptx NEW COURICUUMN IN DATA
Data Science.pptx NEW COURICUUMN IN DATAData Science.pptx NEW COURICUUMN IN DATA
Data Science.pptx NEW COURICUUMN IN DATA
 
AI for Software Engineering
AI for Software EngineeringAI for Software Engineering
AI for Software Engineering
 
System engineering
System engineeringSystem engineering
System engineering
 
Msr2021 tutorial-di penta
Msr2021 tutorial-di pentaMsr2021 tutorial-di penta
Msr2021 tutorial-di penta
 
H1803044651
H1803044651H1803044651
H1803044651
 
Analyzing Big Data's Weakest Link (hint: it might be you)
Analyzing Big Data's Weakest Link  (hint: it might be you)Analyzing Big Data's Weakest Link  (hint: it might be you)
Analyzing Big Data's Weakest Link (hint: it might be you)
 
International Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentInternational Journal of Engineering Research and Development
International Journal of Engineering Research and Development
 

Mehr von Valerio Maggio

Number Crunching in Python
Number Crunching in PythonNumber Crunching in Python
Number Crunching in PythonValerio Maggio
 
Clone detection in Python
Clone detection in PythonClone detection in Python
Clone detection in PythonValerio Maggio
 
A Tree Kernel based approach for clone detection
A Tree Kernel based approach for clone detectionA Tree Kernel based approach for clone detection
A Tree Kernel based approach for clone detectionValerio Maggio
 
Refactoring: Improve the design of existing code
Refactoring: Improve the design of existing codeRefactoring: Improve the design of existing code
Refactoring: Improve the design of existing codeValerio Maggio
 
Scaffolding with JMock
Scaffolding with JMockScaffolding with JMock
Scaffolding with JMockValerio Maggio
 
Design patterns and Refactoring
Design patterns and RefactoringDesign patterns and Refactoring
Design patterns and RefactoringValerio Maggio
 
Test Driven Development
Test Driven DevelopmentTest Driven Development
Test Driven DevelopmentValerio Maggio
 

Mehr von Valerio Maggio (9)

Number Crunching in Python
Number Crunching in PythonNumber Crunching in Python
Number Crunching in Python
 
Clone detection in Python
Clone detection in PythonClone detection in Python
Clone detection in Python
 
A Tree Kernel based approach for clone detection
A Tree Kernel based approach for clone detectionA Tree Kernel based approach for clone detection
A Tree Kernel based approach for clone detection
 
Refactoring: Improve the design of existing code
Refactoring: Improve the design of existing codeRefactoring: Improve the design of existing code
Refactoring: Improve the design of existing code
 
Scaffolding with JMock
Scaffolding with JMockScaffolding with JMock
Scaffolding with JMock
 
Junit in action
Junit in actionJunit in action
Junit in action
 
Design patterns and Refactoring
Design patterns and RefactoringDesign patterns and Refactoring
Design patterns and Refactoring
 
Test Driven Development
Test Driven DevelopmentTest Driven Development
Test Driven Development
 
Web frameworks
Web frameworksWeb frameworks
Web frameworks
 

Kürzlich hochgeladen

Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 

Kürzlich hochgeladen (20)

Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 

Unsupervised Machine Learning Techniques for Software Maintenance