A Heuristic-based Approach
   to Identify Concepts
    in Execution Traces
               Fatemeh Asadi*
            Massimiliano Di Penta**
              Giuliano Antoniol*
            Yann-Gaël Guéhéneuc**

     * Ecole Polytechnique de Montréal, Canada
   ** Dept. of Engineering - Univ. of Sannio, Italy
                 CSMR 2010 Madrid (Spain)              1
Motivations
• Software systems lack adequate documentation
• Developers try to understand systems through
   – Static analyses, visualizations built upon static data
   – Dynamic analyses, requiring the execution of the system
• (Dynamic) concept identification
   – Identify sets of method calls in execution traces responsible
     for the implementation of domain concepts or user-observable
     features
   – Existing approaches based on static analysis [Anquetil and
     Lethbridge (1998)], dynamic analysis [Wilde and Scully (1995),
     Tonella and Ceccato (2004)], IR techniques [Poshyvanyk et
     al. (2007)], or hybrid ones [Eaddy et al. (2008)]



Proposed approach
  A novel approach that analyzes execution traces and
  groups together method calls that:
    (i) are invoked sequentially
    (ii) are cohesive and decoupled from a conceptual point of view
  Assumptions
    Consider a feature being executed in a scenario
      – e.g., “Open a Web page from a browser”
        or “Save an image in a paint application”
    The set of methods related to the feature is likely to be:
      – (i) conceptually cohesive
      – (ii) decoupled from those of other features
      – (iii) sequentially invoked


Proposed approach
  Step I – System instrumentation
  Step II – Execution trace collection
  Step III – Trace pruning and compression
  Step IV – Textual analysis of methods’
  source code
  Step V – Search-based concept
  identification



Step I and Step II – Getting Traces
   Step I - System instrumentation
     System instrumented using the MoDeC instrumentor
      – MoDeC is a tool to extract and model sequence diagrams for
        Java systems
     Java bytecode instrumentation tool
      – Inserts dedicated method invocations into the system at
        method/constructor entry/exit points
      – Allows for trace tagging
      – Allows for trace tagging


   Step II - Execution trace collection
     We exercise a system following operation sequences
     taken from user manuals or use case descriptions
Step III – Trace Pruning and Compression
   Removing methods not very useful for feature identification
      Methods occurring in many scenarios
        – Are often utility methods
        – We use the same idea of tf-idf in Information Retrieval
      Too frequent methods
        – Could be for example related to crosscutting concerns
        – We remove methods having a frequency greater than
          Q3 + 2 × IQR (75th percentile + 2 × the interquartile range)
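
The two pruning rules above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the `max_share` cut-off for "many scenarios", and the choice of the inclusive quartile method are all assumptions, since the slide gives only the Q3 + 2 × IQR threshold.

```python
from collections import Counter
from statistics import quantiles


def prune_ubiquitous(traces, max_share=0.5):
    """Drop methods that occur in more than max_share of the scenario traces.

    max_share is a hypothetical parameter; the slide only says that the idea
    is borrowed from tf-idf and gives no cut-off.
    """
    doc_freq = Counter(m for trace in traces for m in set(trace))
    limit = max_share * len(traces)
    return [[m for m in trace if doc_freq[m] <= limit] for trace in traces]


def prune_frequent(trace, k=2.0):
    """Drop calls to methods whose call count exceeds Q3 + k * IQR."""
    counts = Counter(trace)
    if len(counts) < 2:          # quartiles need at least two data points
        return list(trace)
    q1, _, q3 = quantiles(counts.values(), n=4, method="inclusive")
    threshold = q3 + k * (q3 - q1)
    return [m for m in trace if counts[m] <= threshold]
```

For instance, a logging method called 50 times in a trace whose other methods are called once or twice falls above the threshold and is removed, while the rest of the trace is kept in order.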

   Trace compression
      Aim: collapse repetitions in execution traces
      Purpose: reduce the search space for Step V
      Examples:
        – m1(); m1(); m1();           →  m1();
        – m1(); m2(); m1(); m2();     →  m1(); m2();
      Performed using Run-Length Encoding (RLE)
      Applied to sub-sequences of arbitrary length
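
A minimal sketch of this kind of compression is given below. It is an RLE-style collapse that discards the repetition counts, as the slide's examples do. The greedy strategy (at each position, collapse the repeated block that covers the most calls) and the names `compress` and `max_len` are assumptions made for illustration.

```python
def compress(trace, max_len=None):
    """Collapse consecutive repetitions of sub-sequences of any length."""
    n = len(trace)
    out = []
    i = 0
    while i < n:
        best_len, best_reps = 1, 1
        # A block can only repeat if it fits at least twice in what is left.
        limit = (n - i) // 2
        if max_len is not None:
            limit = min(limit, max_len)
        for length in range(1, limit + 1):
            block = trace[i:i + length]
            reps = 1
            while trace[i + reps * length:i + (reps + 1) * length] == block:
                reps += 1
            # Prefer the repetition covering the most calls; ties keep the shorter block.
            if reps > 1 and reps * length > best_len * best_reps:
                best_len, best_reps = length, reps
        out.extend(trace[i:i + best_len])   # keep one copy of the block
        i += best_len * best_reps           # skip all of its repetitions
    return out


print(compress(["m1", "m1", "m1"]))              # ['m1']
print(compress(["m1", "m2", "m1", "m2"]))        # ['m1', 'm2']
```

The cost of this sketch grows quickly with trace length; bounding `max_len` is one simple way to keep it manageable on long traces.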
Step IV
   Conceptual cohesion and coupling determined according
   to [Marcus et al., 2008] and [Poshyvanyk et al., 2006]
   Index identifiers, comments contained in methods
     Extraction of identifiers and comment words
     Camel-case splitting of composed identifiers
     Stop word removal (English + Java keywords)
     Stemming using the Porter stemmer
     Indexing using tf-idf
     Reduce the term-document space into a (smaller) concept-
     document space using Latent Semantic Indexing (LSI)
       – Helps to cope with synonymy and homonymy
       – Concept space size = 50
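
The first indexing steps can be sketched with the Python standard library alone. This is only an illustration under stated assumptions: the stop-word list is a tiny sample, stemming and the LSI reduction are left out (they would normally come from a dedicated library), and the function names and the unsmoothed `tf * log(N / df)` weighting are choices made here, not taken from the slides.

```python
import math
import re
from collections import Counter

# Tiny illustrative sample; a real list would cover English and all Java keywords.
STOP_WORDS = {"the", "a", "of", "to", "and", "is", "in",
              "public", "void", "return", "new", "int", "this"}


def split_identifier(identifier):
    """Split a camel-case or underscore-separated identifier into lowercase words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", identifier)
    return [p.lower() for p in parts]


def tokenize(text):
    """Extract identifier and comment words, split them, and drop stop words."""
    words = []
    for token in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text):
        words.extend(split_identifier(token))
    return [w for w in words if w not in STOP_WORDS and not w.isdigit()]


def tf_idf(documents):
    """Map each method name to a {term: tf-idf weight} dictionary.

    documents: dict of method name -> source text (body and comments).
    """
    bags = {name: Counter(tokenize(text)) for name, text in documents.items()}
    doc_freq = Counter(term for bag in bags.values() for term in bag)
    n = len(bags)
    return {name: {t: tf * math.log(n / doc_freq[t]) for t, tf in bag.items()}
            for name, bag in bags.items()}
```

With this weighting, a term that appears in every method gets weight 0, which is the behaviour that makes tf-idf useful for discounting ubiquitous vocabulary.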


Step V
    We use a search-based optimization technique based on Genetic
    Algorithms (GA) to split traces into segments
    Representation: a bit-vector where 1 indicates the end of a segment
                Trace splitting       m1  m2  m1  m3  m4  m1  m4  m6  m1

                Representation        0   1   0   0   1   0   0   0   1
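
Decoding this representation is straightforward. The sketch below, with the hypothetical helper name `split_trace`, cuts the trace after every position whose bit is 1, and closes any trailing calls as a last segment.

```python
def split_trace(trace, bits):
    """Split a trace into segments; a 1 marks the last call of a segment."""
    if len(trace) != len(bits):
        raise ValueError("trace and bit vector must have the same length")
    segments, current = [], []
    for call, bit in zip(trace, bits):
        current.append(call)
        if bit == 1:
            segments.append(current)
            current = []
    if current:                      # calls after the last 1, if any
        segments.append(current)
    return segments


trace = ["m1", "m2", "m1", "m3", "m4", "m1", "m4", "m6", "m1"]
bits = [0, 1, 0, 0, 1, 0, 0, 0, 1]
print(split_trace(trace, bits))
# [['m1', 'm2'], ['m1', 'm3', 'm4'], ['m1', 'm4', 'm6', 'm1']]
```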


    Mutation: randomly flips a bit (i.e., splits or merges segments)
        [Figure: the bit vector 0 1 0 0 1 0 0 0 1 before and after one bit is flipped]


    Crossover: two-point
        Parents      0 1 0 0 1 0 0 0 1      0 0 1 0 0 1 0 0 1
        Offspring    0 1 0 0 0 1 0 0 1      0 0 1 0 1 0 0 0 1


    Selection: Roulette Wheel
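
The three operators can be sketched as below. This is not the authors' distributed implementation. The function names are hypothetical, and two details are assumptions: the last bit is never mutated (so that the end of the trace always closes a segment), and fitness values are non-negative with a positive sum. Cut points 3 and 6 are one choice that reproduces the offspring shown on the crossover slide.

```python
import random


def mutate(bits, rng=random):
    """Flip one randomly chosen bit, leaving the final bit untouched (assumption)."""
    child = list(bits)
    i = rng.randrange(len(child) - 1)
    child[i] ^= 1                    # 0 -> 1 splits a segment, 1 -> 0 merges two
    return child


def crossover_at(a, b, i, j):
    """Exchange the slice [i:j] between two parents given as lists."""
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]


def two_point_crossover(a, b, rng=random):
    """Two-point crossover with randomly drawn cut points."""
    i, j = sorted(rng.sample(range(len(a) + 1), 2))
    return crossover_at(a, b, i, j)


def roulette_select(population, fitnesses, rng=random):
    """Pick one individual with probability proportional to its fitness."""
    return rng.choices(population, weights=fitnesses, k=1)[0]


p1 = [0, 1, 0, 0, 1, 0, 0, 0, 1]
p2 = [0, 0, 1, 0, 0, 1, 0, 0, 1]
print(crossover_at(p1, p2, 3, 6))
# ([0, 1, 0, 0, 0, 1, 0, 0, 1], [0, 0, 1, 0, 1, 0, 0, 0, 1])
```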
Step V – Quality of the Solution
   Fitness Function:



   Segment Cohesion is the average (textual) similarity
   between any pair of methods in a segment
   Segment Coupling is the average (textual) similarity
   between a segment and all other segments in the trace
   Other GA parameters
     200 individuals
     2,000 generations for JHotDraw and 3,000 for ArgoUML
     5% mutation probability, 70% crossover probability
     Distributed GA implementation (across 4 servers)
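
The fitness formula itself did not survive on this slide, so the sketch below should be read with care. Cohesion and coupling follow the two definitions above, computed over the distinct methods of each segment (an assumption). The way they are combined, the mean over segments of cohesion divided by coupling, is purely an assumption made for illustration, as are the function names, the `eps` guard against division by zero, and the value 0 for the cohesion of a single-method segment. `sim` stands for any textual similarity between two methods, for example a cosine similarity in the LSI space.

```python
from itertools import combinations


def cohesion(segment, sim):
    """Average similarity between any pair of distinct methods in a segment."""
    methods = sorted(set(segment))
    pairs = list(combinations(methods, 2))
    if not pairs:
        return 0.0                   # assumption: a single method has no cohesion
    return sum(sim(a, b) for a, b in pairs) / len(pairs)


def coupling(segment, others, sim):
    """Average similarity between a segment's methods and those of all other segments."""
    inside = sorted(set(segment))
    outside = sorted({m for seg in others for m in seg})
    pairs = [(a, b) for a in inside for b in outside]
    if not pairs:
        return 0.0
    return sum(sim(a, b) for a, b in pairs) / len(pairs)


def fitness(segments, sim, eps=1e-9):
    """Assumed combination: mean of cohesion / coupling over all segments."""
    scores = []
    for k, seg in enumerate(segments):
        others = segments[:k] + segments[k + 1:]
        scores.append(cohesion(seg, sim) / (coupling(seg, others, sim) + eps))
    return sum(scores) / len(scores)
```

Whatever the exact formula, the intent stated on the slide is the same: a segmentation scores well when the methods inside each segment are textually similar to each other and dissimilar from those of the other segments.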

Empirical Study
 • Goal: analyze the novel concept location approach
 • Purpose: evaluate its capability of identifying
   meaningful concepts
 • Quality focus: accuracy and completeness of the
   identified concepts
 • Context: an implementation of our approach and
   execution traces extracted from two open source
   systems, JHotDraw and ArgoUML




Research Questions
  RQ1: How stable is the GA, across
  multiple runs, when identifying concepts
  in execution traces?
  RQ2: To what extent do the identified
  concepts match the ones in the oracle?
  RQ3: How accurate is the identification of
  concepts in execution traces?



RQ1: GA stability
   We compute the overlap between segmentations
   obtained in multiple runs using the Jaccard overlap
   score

     Two segments overlap when they contain calls in the same position
     of the trace
     Because a segment of trace T1 may overlap with several segments of T2,
     the highest similarity is chosen




             Run 1 m1 m2 m1 m3 m4 m1 m4 m6 m1

             Run 2 m1 m2 m1 m3 m4 m1 m4 m6 m1

                    2/3           2/4              3/4
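
The overlap computation can be sketched as follows, treating each segment as the set of trace positions it covers. The function names are hypothetical. The two example segmentations are the bit vectors used as parents on the crossover slide, which is an assumption about what the figure depicts. For them the sketch yields 2/3, 2/4 and 3/4, the values printed under the figure.

```python
from fractions import Fraction


def position_sets(bits):
    """Turn a bit vector into a list of sets of trace positions, one per segment."""
    sets, current = [], set()
    for pos, bit in enumerate(bits):
        current.add(pos)
        if bit == 1:
            sets.append(current)
            current = set()
    if current:
        sets.append(current)
    return sets


def jaccard(a, b):
    """Jaccard overlap of two position sets."""
    return Fraction(len(a & b), len(a | b))


def best_overlaps(bits1, bits2):
    """For each segment of the first run, the highest overlap with any segment of the second."""
    other = position_sets(bits2)
    return [max(jaccard(s, t) for t in other) for s in position_sets(bits1)]


run1 = [0, 1, 0, 0, 1, 0, 0, 0, 1]
run2 = [0, 0, 1, 0, 0, 1, 0, 0, 1]
print(best_overlaps(run1, run2))
# [Fraction(2, 3), Fraction(1, 2), Fraction(3, 4)]
```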

RQ1: Results




   Average overlap between 72% and 84%
   Slightly higher convergence for ArgoUML
   Ability of the algorithm to converge, despite the
   relatively large search space



RQ2: Matching with the Oracle
  We manually tag start-end of features while
  executing the system
    Using the MoDeC instrumentation tool
    While executing the instrumented system, the user triggers the
    introduction of <Start> and <Stop> tags in the trace

  Matching between identified traces and oracle
  computed as in RQ1

        Run 1    m1 m2 m1 m3 m4 m1 m4 m6 m1

        Oracle   m1 m2 m1 m3 m4 m1 m4 m6 m1

                  2/3           2/4              3/4

RQ2: Results




  High overlap for some features
    e.g., Draw rectangle or Draw circle
  Lower for features obtained by adapting other ones
    e.g., Add text, obtained by adapting Draw rectangle
  In other cases, low overlap is due to large segments
  being split into smaller, more cohesive ones
RQ3: Accuracy in trace identification
   Computed as in RQ2, except that we use
   precision instead of the Jaccard overlap score




         Run 1    m1 m2 m1 m3 m4 m1 m4 m6 m1

         Oracle   m1 m2 m1 m3 m4 m1 m4 m6 m1

                  2/2          2/3                3/4
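
Precision can be sketched in the same way: the share of an identified segment's positions that fall inside an oracle segment. Matching each identified segment to the oracle segment that gives the highest precision is an assumption, as are the function names and the two example segmentations. For these examples the sketch yields 2/2, 2/3 and 3/4, the values printed under the figure.

```python
from fractions import Fraction


def position_sets(bits):
    """Turn a bit vector into a list of sets of trace positions, one per segment."""
    sets, current = [], set()
    for pos, bit in enumerate(bits):
        current.add(pos)
        if bit == 1:
            sets.append(current)
            current = set()
    if current:
        sets.append(current)
    return sets


def precision(identified, oracle):
    """Share of the identified segment's positions that belong to the oracle segment."""
    return Fraction(len(identified & oracle), len(identified))


def best_precisions(run_bits, oracle_bits):
    """For each identified segment, the highest precision against any oracle segment."""
    oracle = position_sets(oracle_bits)
    return [max(precision(s, o) for o in oracle) for s in position_sets(run_bits)]


run = [0, 1, 0, 0, 1, 0, 0, 0, 1]
oracle = [0, 0, 1, 0, 0, 1, 0, 0, 1]
print(best_precisions(run, oracle))
# [Fraction(1, 1), Fraction(2, 3), Fraction(3, 4)]
```

Unlike the Jaccard score, precision does not penalize an identified segment for being smaller than the oracle feature, which is why a segment fully contained in a feature scores 100%.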



RQ3: Results




  Precision often very high
     In most cases above 85% and often equal to 100%
  Low precision (mean 32%) for Add text
  Relatively low (mean 69%) for Draw rectangle
  These two features are difficult to distinguish

Inspection of the obtained segments
  Add class (ArgoUML)
     The approach split this long feature, a sequence of 199 method calls, into 5 segments
     related to sub-features (creation of objects, adding the project class, handling
     namespace, setting object properties, handling persistence of the diagram)
  Create note (ArgoUML)
     Only the first part (50 methods) of the trace composed of 88 calls was identified
     Problems related to multi-threading
     Problems related to collapsing (during compression) loops containing variants
  Cut rectangle (JHotDraw)
     Only the last 39 out of 172 calls were included in the segment
     Methods related to adding to the clipboard and showing the rectangle as “cut”
     The first methods were related to GUI events and were split into many small segments
  Spawn window (JHotDraw)
     72 out of 197 methods included
     The remaining ones were related to setting up menu command properties




Threats to Validity
   Construct validity (relation between theory and observation)
      Multi-threading can change the ordering of calls in multiple
      executions of the same scenario
      A better assessment of the actual content of the obtained
      segments is needed

   Internal validity (presence of confounding factors)
      Trace tagging may be imprecise, again due to multi-threading
      Noise due to utility methods
      GA intrinsic randomness

   External validity (generalization of findings)
      We analyzed two different systems, multiple traces
      As usual, further empirical evaluation is needed

Conclusions
  We proposed a search-based approach to automatically locate
  concepts in execution traces
     By splitting traces into conceptually cohesive and decoupled segments
  Empirical study on traces from JHotDraw and ArgoUML shows that
     The approach is stable
     Identified segments are highly precise
     Segmentation is finer-grained than the high-level features
     Limitations due to multi-threading, GUI events, feature adaptation


  Work-in-progress:
     Improve performance
     Use enhanced compression techniques
     Automatically label identified concepts
     Perform an extensive empirical validation


Thank You!




             Questions?

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 

CSMR10a.ppt

  • 1. A Heuristic-based Approach to Identify Concepts in Execution Traces. Fatemeh Asadi*, Massimiliano Di Penta**, Giuliano Antoniol*, Yann-Gaël Guéhéneuc**. * Ecole Polytechnique de Montréal, Canada; ** Dept. of Engineering, Univ. of Sannio, Italy. CSMR 2010, Madrid (Spain)
  • 2. Motivations. Software systems lack adequate documentation. Developers try to understand systems through static analyses and visualizations built upon static data, and through dynamic analyses, which require executing the system. (Dynamic) concept identification: identify sets of method calls in execution traces responsible for the implementation of domain concepts or user-observable features. Existing approaches are based on static analysis [Anquetil and Lethbridge (1998)], dynamic analysis [Wilde and Scully (1995); Tonella and Ceccato (2004)], IR techniques [Poshyvanyk et al. (2007)], or hybrid ones [Eaddy et al. (2008)].
  • 3. Proposed approach. A novel approach that analyzes execution traces and groups together method calls that are (i) invoked in sequence and (ii) cohesive and decoupled from a conceptual point of view. Assumptions: consider a feature being executed in a scenario, e.g., “Open a Web page from a browser” or “Save an image in a paint application”. The set of methods related to the feature is likely to be (i) conceptually cohesive, (ii) decoupled from those of other features, and (iii) sequentially invoked.
  • 4. Proposed approach. Step I: System instrumentation. Step II: Execution trace collection. Step III: Trace pruning and compression. Step IV: Textual analysis of methods’ source code. Step V: Search-based concept identification.
  • 5. Step I and Step II: Getting Traces. Step I, system instrumentation: the system is instrumented using the MoDeC instrumentor (MoDeC is a tool to extract and model sequence diagrams for Java systems). The instrumentor is a Java bytecode instrumentation tool that inserts dedicated method invocations at method/constructor entry and exit points and allows for trace tagging. Step II, execution trace collection: we exercise the system following operation sequences taken from user manuals or use case descriptions.
  • 6. Step III: Trace Pruning and Compression. Pruning removes methods that are not very useful for feature identification. Methods occurring in many scenarios are often utility methods; we use the same idea as tf-idf in Information Retrieval. Too-frequent methods could, for example, be related to crosscutting concerns; we remove methods having a frequency above Q3 + 2 × IQR (the 75th percentile plus twice the interquartile range). Trace compression aims to collapse repetitions in execution traces, in order to reduce the search space for Step V. Examples of repetitions: m1(); m1(); m1(); m1(); m1(); m2(); and m1(); m2(); m1(); m2();. Compression is performed using Run Length Encoding (RLE), applied to sub-sequences of arbitrary length.
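The pruning threshold and the repetition collapsing described on slide 6 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `prune_frequent` and `compress_repetitions`, the `max_len` bound on the repeated block length, and the greedy left-to-right scan are all assumptions made for the example, and the quartiles come from Python's `statistics.quantiles`, which may differ slightly from the quartile definition used in the study.

```python
from collections import Counter
from statistics import quantiles


def prune_frequent(trace, k=2.0):
    """Drop calls to methods whose frequency exceeds Q3 + k * IQR.

    Assumes the trace contains at least two distinct methods, since
    quartiles are computed over the per-method call counts.
    """
    counts = Counter(trace)
    q1, _, q3 = quantiles(list(counts.values()), n=4)
    threshold = q3 + k * (q3 - q1)
    return [m for m in trace if counts[m] <= threshold]


def compress_repetitions(trace, max_len=3):
    """Collapse consecutive repetitions of blocks of up to max_len calls.

    Returns a list of (block, repetition_count) pairs, i.e. a run-length
    encoding over sub-sequences rather than over single symbols.
    """
    out = []
    i = 0
    while i < len(trace):
        collapsed = False
        for k in range(1, max_len + 1):
            block = trace[i:i + k]
            if len(block) < k:
                break  # not enough calls left for a block of this length
            reps = 1
            while trace[i + reps * k:i + (reps + 1) * k] == block:
                reps += 1
            if reps > 1:
                out.append((tuple(block), reps))
                i += reps * k
                collapsed = True
                break
        if not collapsed:
            out.append(((trace[i],), 1))
            i += 1
    return out
```

Applied to the two repetition examples from the slide, `compress_repetitions` yields one `m1` block repeated five times followed by a single `m2`, and one `(m1, m2)` block repeated twice.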
  • 7. Step IV. Conceptual cohesion and coupling are determined according to [Marcus et al., 2008] and [Poshyvanyk et al., 2006]. Identifiers and comments contained in methods are indexed: extraction of identifiers and comment words; camel-case splitting of composed identifiers; stop word removal (English + Java keywords); stemming using the Porter stemmer; indexing using tf-idf. The term-document space is then reduced into a (smaller) concept-document space using Latent Semantic Indexing (LSI), which helps to cope with synonymy and homonymy. Concept space = 50.
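The first indexing steps on slide 7 (camel-case splitting, stop word removal, tf-idf weighting) can be sketched with the standard library alone. This is a simplified illustration under stated assumptions: the tiny `STOP_WORDS` set, the regular expression used for splitting, and the plain `tf * log(N / df)` weighting are choices made for the example; Porter stemming and the LSI reduction to 50 concepts are deliberately left out because they need external libraries.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop list (English words + Java keywords).
STOP_WORDS = {"the", "a", "of", "to", "and", "in", "is",
              "public", "private", "void", "return", "new", "int"}


def split_identifier(identifier):
    """Split a composed identifier such as drawRectangleFigure into words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", identifier)
    return [p.lower() for p in parts]


def tokenize(text):
    """Extract identifier and comment words, split them, drop stop words."""
    words = []
    for raw in re.findall(r"[A-Za-z0-9]+", text):
        words.extend(split_identifier(raw))
    return [w for w in words if w not in STOP_WORDS and not w.isdigit()]


def tf_idf(documents):
    """Map each method name to a {term: weight} vector.

    documents maps a method name to its source text (code plus comments).
    """
    term_counts = {name: Counter(tokenize(text))
                   for name, text in documents.items()}
    n_docs = len(term_counts)
    doc_freq = Counter(t for counts in term_counts.values() for t in counts)
    return {
        name: {t: tf * math.log(n_docs / doc_freq[t])
               for t, tf in counts.items()}
        for name, counts in term_counts.items()
    }
```

With this weighting, a term that occurs in every method gets weight 0, which is the same intuition the deck applies to utility methods that occur in many scenarios.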
  • 8. Step V. We use a search-based optimization technique based on Genetic Algorithms (GA) to split traces into segments. Representation: a bit vector where 1 indicates the end of a segment. Example of trace splitting: trace m1 m2 m1 m3 m4 m1 m4 m6 m1, representation 0 1 0 0 1 0 0 0 1. Mutation: randomly flips a bit (i.e., splits or merges segments). Crossover: two-point. Selection: roulette wheel. [Figure: example bit vectors before and after mutation and two-point crossover]
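The representation and the three GA operators named on slide 8 can be sketched as below. The function names are hypothetical, and a few details are assumptions rather than facts from the deck: the end of the trace is treated as an implicit segment end, mutation flips exactly one bit, and roulette-wheel selection is expressed through `random.choices` with fitness values as weights.

```python
import random


def decode(trace, bits):
    """Turn a bit vector into segments; a 1 marks the end of a segment."""
    segments, current = [], []
    for call, bit in zip(trace, bits):
        current.append(call)
        if bit == 1:
            segments.append(current)
            current = []
    if current:  # assumption: the end of the trace closes the last segment
        segments.append(current)
    return segments


def mutate(bits, rng):
    """Flip one randomly chosen bit, which splits or merges segments."""
    i = rng.randrange(len(bits))
    return bits[:i] + [1 - bits[i]] + bits[i + 1:]


def two_point_crossover(p1, p2, rng):
    """Swap the slice between two random cut points of the parents."""
    a, b = sorted(rng.sample(range(len(p1) + 1), 2))
    return p1[:a] + p2[a:b] + p1[b:], p2[:a] + p1[a:b] + p2[b:]


def roulette_select(population, fitnesses, rng):
    """Pick one individual with probability proportional to its fitness."""
    return rng.choices(population, weights=fitnesses, k=1)[0]
```

Decoding the slide's example vector 0 1 0 0 1 0 0 0 1 over the trace m1 m2 m1 m3 m4 m1 m4 m6 m1 gives three segments of two, three, and four calls.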
  • 9. Step V: Quality of the Solution. The fitness function relies on two measures. Segment Cohesion is the average (textual) similarity between any pair of methods in a segment. Segment Coupling is the average (textual) similarity between a segment and all other segments in the trace. Other GA parameters: 200 individuals; 2,000 generations for JHotDraw and 3,000 for ArgoUML; 5% mutation probability; 70% crossover probability; distributed GA implementation (across 4 servers).
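The two measures defined on slide 9 can be sketched as follows. Several points are assumptions made for the example: textual similarity is taken to be cosine similarity between sparse `{term: weight}` vectors, coupling is averaged over all pairs formed by one method of the segment and one method of any other segment, and a segment with a single method is given cohesion 0. How the two measures are combined into a single fitness value is not recoverable from the transcript, so the sketch stops at computing them.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0


def cohesion(segment, vectors):
    """Average similarity between any pair of methods in the segment."""
    pairs = list(combinations(segment, 2))
    if not pairs:
        return 0.0  # assumption: a one-method segment has no cohesion
    return sum(cosine(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)


def coupling(index, segments, vectors):
    """Average similarity between segment `index` and all other segments."""
    own = segments[index]
    others = [m for i, seg in enumerate(segments) if i != index for m in seg]
    pairs = [(a, b) for a in own for b in others]
    if not pairs:
        return 0.0
    return sum(cosine(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)
```

A good segmentation in the sense of the deck is one where `cohesion` is high and `coupling` is low for each segment.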
  • 10. Empirical Study. Goal: analyze the novel concept location approach. Purpose: evaluate its capability of identifying meaningful concepts. Quality focus: accuracy and completeness of the identified concepts. Context: an implementation of our approach and execution traces extracted from two open source systems, JHotDraw and ArgoUML.
  • 11. Research Questions. RQ1: How stable is the GA, through multiple runs, when identifying concepts in execution traces? RQ2: To what extent do the identified concepts match the ones in the oracle? RQ3: How accurate is the identification of concepts in execution traces?
  • 12. RQ1: GA stability. We compute the overlap between segmentations obtained in multiple runs using the Jaccard overlap score. Two segments overlap when they contain calls at the same positions of the trace. Because a segment of trace T1 may overlap with several segments of T2, the highest similarity is chosen. Example: for two runs over the trace m1 m2 m1 m3 m4 m1 m4 m6 m1, the segment overlaps are 2/3, 2/4, and 3/4.
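The overlap computation on slide 12 can be sketched by treating each segment as a set of trace positions. The segment boundaries of the two runs did not survive in the transcript, so the two bit vectors below are hypothetical segmentations chosen only because they reproduce the ratios 2/3, 2/4, and 3/4 shown on the slide; the function names are likewise assumptions.

```python
from fractions import Fraction


def segments_from_bits(bits):
    """Return segments as sets of trace positions; a 1 ends a segment."""
    segments, current = [], set()
    for position, bit in enumerate(bits):
        current.add(position)
        if bit == 1:
            segments.append(current)
            current = set()
    if current:
        segments.append(current)
    return segments


def jaccard(a, b):
    """Jaccard overlap: shared positions over all positions covered."""
    return Fraction(len(a & b), len(a | b))


def best_overlaps(segments_a, segments_b):
    """For each segment of A, keep its highest overlap with a segment of B."""
    return [max(jaccard(s, t) for t in segments_b) for s in segments_a]


# Hypothetical segmentations of a 9-call trace (not taken from the slide).
run1 = segments_from_bits([0, 1, 0, 0, 1, 0, 0, 0, 1])
run2 = segments_from_bits([0, 0, 1, 0, 0, 1, 0, 0, 1])
overlaps = best_overlaps(run1, run2)
```

Here `overlaps` holds the fractions 2/3, 1/2 (that is, 2/4), and 3/4, one per segment of the first run.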
  • 13. RQ1: Results. Average overlap between 72% and 84%. Slightly higher convergence for ArgoUML. The algorithm is able to converge despite the relatively large search space.
  • 14. RQ2: Matching with the Oracle. We manually tag the start and end of features while executing the system, using the MoDeC instrumentation tool: while executing the instrumented system, the user triggers the introduction of <Start> and <Stop> tags in the trace. Matching between identified traces and the oracle is computed as in RQ1. Example: for Run 1 and the oracle on the trace m1 m2 m1 m3 m4 m1 m4 m6 m1, the segment overlaps are 2/3, 2/4, and 3/4.
  • 15. RQ2: Results. High overlap for some features, e.g., Draw rectangle or Draw circle. Lower overlap for features obtained by adapting other ones, e.g., Add text obtained by adapting Draw rectangle. In other cases, low overlap is due to large segments being split into several smaller, more cohesive ones.
  • 16. RQ3: Accuracy in trace identification. Computed similarly to RQ2; however, we use precision instead of the Jaccard overlap score. Example: for Run 1 and the oracle on the trace m1 m2 m1 m3 m4 m1 m4 m6 m1, the segment precisions are 2/2, 2/3, and 3/4.
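The precision variant on slide 16 can be sketched in the same position-set style. As with the overlap example, the identified and oracle segmentations below are hypothetical, picked only because they reproduce the ratios 2/2, 2/3, and 3/4 from the slide; taking the best-matching oracle segment for each identified segment is an assumption based on the statement that the computation is similar to RQ2.

```python
from fractions import Fraction


def segments_from_bits(bits):
    """Return segments as sets of trace positions; a 1 ends a segment."""
    segments, current = [], set()
    for position, bit in enumerate(bits):
        current.add(position)
        if bit == 1:
            segments.append(current)
            current = set()
    if current:
        segments.append(current)
    return segments


def precision(identified, oracle):
    """Share of the identified segment's calls that fall in the oracle segment."""
    return Fraction(len(identified & oracle), len(identified))


def best_precisions(identified_segments, oracle_segments):
    """For each identified segment, keep its best precision over the oracle."""
    return [max(precision(s, o) for o in oracle_segments)
            for s in identified_segments]


# Hypothetical segmentations of a 9-call trace (not taken from the slide).
identified = segments_from_bits([0, 1, 0, 0, 1, 0, 0, 0, 1])
oracle = segments_from_bits([0, 0, 1, 0, 0, 1, 0, 0, 1])
precisions = best_precisions(identified, oracle)
```

Unlike the Jaccard score, precision does not penalize an identified segment for being smaller than the oracle segment, which is why the first segment scores 2/2 here but 2/3 under Jaccard.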
  • 17. RQ3: Results. Precision is often very high: in most cases above 85% and often equal to 100%. Low precision (mean 32%) for Add text; relatively low (mean 69%) for Draw rectangle. These two features are difficult to distinguish.
  • 18. Inspection of the obtained segments. Add class (ArgoUML): the approach split this long feature, a sequence of 199 methods, into 5 segments related to sub-features (creation of objects, adding the project class, handling namespace, setting object properties, handling persistence of the diagram). Create note (ArgoUML): only the first part (50 methods) of the trace, composed of 88 calls, was identified; problems were related to multi-threading and to collapsing (during compression) loops containing variants. Cut rectangle (JHotDraw): only the last 39 out of 172 calls were included in the segment; these are methods related to adding to the clipboard and showing the rectangle as “cut”, while the first methods are related to GUI events and were split into many small segments. Spawn window (JHotDraw): 72 out of 197 methods were included; the remaining ones were related to setting up menu command properties.
  • 19. Threats to Validity. Construct validity (relation between theory and observation): multi-threading can change the ordering of calls in multiple executions of the same scenario; a better assessment of the actual content of the obtained segments is needed. Internal validity (presence of confounding factors): trace tagging may be imprecise, again due to multi-threading; noise due to utility methods; GA intrinsic randomness. External validity (generalization of findings): we analyzed two different systems and multiple traces; as usual, further empirical evaluation is needed.
  • 20. Conclusions. We proposed a search-based approach to automatically locate concepts in execution traces by splitting traces into conceptually cohesive and decoupled segments. An empirical study on traces from JHotDraw and ArgoUML shows that the approach is stable, the identified segments are highly precise, and the splitting is finer than the high-level features. Limitations are due to multi-threading, GUI events, and feature adaptation. Work in progress: improve performance; use enhanced compression techniques; automatically label identified concepts; perform an extensive empirical validation.
  • 21. Thank You! Questions?