EXPANDING IDENTIFIERS TO
                NORMALIZE SOURCE
                 CODE VOCABULARY
                            PRESENTED BY DAWN LAWRIE
                           LOYOLA UNIVERSITY MARYLAND


                        IN COLLABORATION WITH DAVE BINKLEY




Friday, October 7, 11
VOCABULARY MISMATCH


                        DIFFERENT VOCABULARY IN SOURCE CODE AND OTHER
                        SOFTWARE ARTIFACTS

                        EXAMPLE

                          REQUIREMENT - “FEATURE LOCATION”

                          SOURCE CODE - “FEATURELOCATION”

                            OR WORSE     “FLOC”




PURPOSE OF NORMALIZE



                        COPE WITH VOCABULARY MISMATCH

                         SOURCE CODE

                         OTHER SOFTWARE DOCUMENTS




EXAMPLE PROBLEMS



                        CONSIDER IDENTIFIERS

                         FEATURELOCATION -> FEATURE LOCATION      SPLITTING PROBLEM

                         FLOC -> F LOC                            SPLITTING PROBLEM

                         FLOC -> FEATURE LOCATION                 SPLITTING AND
                                                                  EXPANSION PROBLEM




WHY NORMALIZE?



                        MANY SE PROBLEMS CAN BE ADDRESSED USING
                        INFORMATION RETRIEVAL (IR) TECHNIQUES

                        UN-NORMALIZED CODE LEADS TO AN UNDERESTIMATE
                        OF THE IMPORTANCE OF CRUCIAL WORDS
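
A small illustration of the underestimate, as a sketch under stated assumptions: the token lists below are invented for this example, and the normalized list was written by hand rather than produced by Normalize. Counting terms before and after normalization shows how a query word such as "feature" can go uncounted when it only appears inside unsplit or abbreviated identifiers.

```python
from collections import Counter


def term_counts(tokens):
    """Count how often each lowercased token occurs."""
    return Counter(token.lower() for token in tokens)


# Hypothetical identifiers as they might appear in source code.
raw_tokens = ["featureLocation", "floc", "feature_location"]

# The same identifiers after (hand-written) splitting and expansion.
normalized_tokens = [
    "feature", "location",
    "feature", "location",
    "feature", "location",
]

raw = term_counts(raw_tokens)
normalized = term_counts(normalized_tokens)

# An IR model that weights terms by frequency sees no occurrence of
# "feature" in the raw tokens, but three after normalization.
print("raw count of 'feature':       ", raw["feature"])
print("normalized count of 'feature':", normalized["feature"])
```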




NORMALIZE PROBLEM STATEMENT




                        FIND THE BEST EXPANSION OVER ALL POSSIBLE SPLITS




                            FLOC    ->     FEATURE LOCATION


NORMALIZE ALGORITHM



                        TERMINOLOGY

                         HARD-WORD - WHITEHOUSE_LAWN    (2)

                         SOFT-WORD - WHITE-HOUSE_LAWN   (3)
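
The two terms can be illustrated with a short sketch. It assumes that hard words are delimited by explicit markers (underscores and lower-to-upper case changes) and that soft words are dictionary words found inside a hard word. The greedy longest-match segmentation and the three-word dictionary are stand-ins chosen for this example; they are not the GenTest splitter.

```python
import re

# Toy dictionary, assumed for this example only.
DICTIONARY = {"white", "house", "lawn"}


def hard_words(identifier):
    """Split on explicit markers: underscores and camelCase boundaries."""
    parts = []
    for chunk in identifier.split("_"):
        # Put a space wherever a lowercase letter or digit precedes an uppercase letter.
        spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", chunk)
        parts.extend(spaced.split())
    return [part.lower() for part in parts]


def soft_words(hard_word, dictionary=DICTIONARY):
    """Segment one hard word into dictionary words, longest match first."""
    words, start = [], 0
    while start < len(hard_word):
        for end in range(len(hard_word), start, -1):
            if hard_word[start:end] in dictionary:
                words.append(hard_word[start:end])
                start = end
                break
        else:
            # No dictionary word starts here; keep the hard word whole.
            return [hard_word]
    return words


hard = hard_words("whitehouse_lawn")
soft = [word for h in hard for word in soft_words(h)]
print(len(hard), hard)
print(len(soft), soft)
```

For `whitehouse_lawn` this gives two hard words and three soft words, matching the counts on the slide.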




NORMALIZE ALGORITHM


                        STRLEN    ->    STRING LENGTH




MACHINE TRANSLATION
                              APPROACH


                        EL       PAPA      VISITA     LA     IGLESIA

                        THE      FATHER    VISITS     THE    CHURCH
                                 POTATO    VISITOR
                                 POPE      HIT

                        STRONG COHESION




NORMALIZE ALGORITHM

       STRLEN
       S-TRLEN
       ST-RLEN            E(ST) = {SET, STOP, STRING}     E(RLEN) = {RIFLEMEN}
       STR-LEN            E(STR) = {STEER, STRING}        E(LEN) = {LENDER, LENGTH}
       STRL-EN
       STRLE-N            WILDCARD EXPANSION   R*L*E*N*
       S-T-RLEN
       S-TR-LEN
       S-TRL-EN
       S-TRLE-N
       ST-R-LEN
       ST-RL-EN
       ST-RLE-N
       STR-L-EN
       STR-LE-N
       STRL-E-N
       S-T-R-LEN
       S-T-RL-EN
       S-T-RLE-N
       S-TR-L-EN
       S-TR-LE-N
       S-TRL-E-N
       ST-R-L-EN
       ST-R-LE-N
       ST-RL-E-N
       STR-L-E-N
       S-T-R-L-EN
       S-T-R-LE-N
       S-TR-L-E-N
       ST-R-L-E-N
       S-T-R-L-E-N

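
The two ingredients of this slide, enumerating candidate splits and expanding a soft word with a wildcard pattern, can be sketched as follows. The function names, the small vocabulary, and the regular-expression reading of `R*L*E*N*` (each letter in order, any characters in between) are assumptions made for this example; the slides do not give the exact matching rules used by Normalize.

```python
import re
from itertools import combinations


def all_splits(word):
    """Yield every way of cutting `word` at its interior positions."""
    positions = range(1, len(word))
    for cut_count in range(len(word)):
        for cuts in combinations(positions, cut_count):
            bounds = (0, *cuts, len(word))
            yield [word[a:b] for a, b in zip(bounds, bounds[1:])]


def wildcard_pattern(soft_word):
    """Regex for a pattern such as r*l*e*n*: the letters in order, anything between."""
    return re.compile("".join(re.escape(c) + ".*" for c in soft_word))


def expansions(soft_word, vocabulary):
    """Return the vocabulary words that match the soft word's wildcard pattern."""
    pattern = wildcard_pattern(soft_word)
    return sorted(w for w in vocabulary if pattern.fullmatch(w))


# Hypothetical vocabulary built from the words shown on the slide.
VOCABULARY = {"set", "stop", "string", "steer", "lender", "length", "riflemen"}

splits = list(all_splits("strlen"))
print(len(splits), "candidate splits")
print("E(rlen) =", expansions("rlen", VOCABULARY))
print("E(str)  =", expansions("str", VOCABULARY))
print("E(len)  =", expansions("len", VOCABULARY))
```

A six-letter identifier has five interior positions, so the sketch yields 2^5 = 32 candidate splits. Note that this toy matcher would also return "steer" for `st`, which the slide's E(ST) does not list, so the real expander presumably applies rules beyond the plain pattern.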
NORMALIZE ALGORITHM PART I
             STR
                 STRING                 VS                 STEER
                   LENDER                                    LENDER
                 + LENGTH                                  + LENGTH
                 = COHESION A                              = COHESION B

                                    STRING
                        1. FIND COHESION BY SUMMING LOG OF
                           PROBABILITIES OF WORD PAIRS
                        2. SELECT EXPANSION THAT MAXIMIZES
                           COHESION

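
The two steps of Part I can be sketched as below. The pair probabilities and the floor value for unseen pairs are invented for illustration; in practice they would be estimated from co-occurrence data. The function names are likewise hypothetical.

```python
import math

# Hypothetical co-occurrence probabilities for (candidate, other word) pairs.
PAIR_PROB = {
    ("string", "length"): 1e-4,
    ("string", "lender"): 1e-9,
    ("steer", "length"): 1e-8,
    ("steer", "lender"): 1e-9,
}
FLOOR = 1e-12  # assumed probability for pairs that were never observed


def cohesion(candidate, other_words):
    """Step 1: sum the log probabilities of the candidate paired with each other word."""
    return sum(math.log(PAIR_PROB.get((candidate, w), FLOOR)) for w in other_words)


def best_expansion(candidates, other_words):
    """Step 2: pick the candidate expansion with the highest cohesion."""
    return max(candidates, key=lambda c: cohesion(c, other_words))


# Candidates for the soft word "str", scored against the expansions of "len".
other = ["lender", "length"]
cohesion_a = cohesion("string", other)
cohesion_b = cohesion("steer", other)
choice = best_expansion(["string", "steer"], other)
print(cohesion_a, cohesion_b, choice)
```

With these made-up numbers, cohesion A exceeds cohesion B and "string" is selected, as on the slide.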
NORMALIZE ALGORITHM PART II

                                         VS

                          STR-LEN                 ST-RLEN
                        STRING LENGTH           STOP RIFLEMEN

                             STRING LENGTH
                    1. FIND COHESION OVER EXPANSIONS
                        2. SELECT EXPANSION OF THE SPLIT
                            THAT MAXIMIZES COHESION
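
Part II compares whole splits rather than single soft words. A minimal sketch, assuming made-up pair probabilities and an exhaustive search over the expansion combinations of each split (the slides do not say how Normalize organizes this search, or how splits with different numbers of words are compared):

```python
import math
from itertools import product

# Hypothetical co-occurrence probabilities for ordered word pairs.
PAIR_PROB = {
    ("string", "length"): 1e-4,
    ("stop", "riflemen"): 1e-11,
}
FLOOR = 1e-12  # assumed probability for unseen pairs


def cohesion(words):
    """Sum log pair probabilities over every pair of words in the expansion."""
    total = 0.0
    for i in range(len(words)):
        for j in range(i + 1, len(words)):
            total += math.log(PAIR_PROB.get((words[i], words[j]), FLOOR))
    return total


def best_split(candidate_splits):
    """Return (score, split, expansion) with the highest cohesion."""
    best = None
    for split, expansion_sets in candidate_splits.items():
        for expansion in product(*expansion_sets):
            score = cohesion(expansion)
            if best is None or score > best[0]:
                best = (score, split, expansion)
    return best


# Each split maps to the candidate expansions of its soft words.
candidates = {
    ("str", "len"): [["steer", "string"], ["lender", "length"]],
    ("st", "rlen"): [["set", "stop", "string"], ["riflemen"]],
}
score, split, expansion = best_split(candidates)
print(split, expansion, score)
```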

ADDING CONTEXT

             DIR             E(DIR) = {DIRECTION, DIRECTORY}
                            CONTEXT = {FORWARD, BACKWARD}



                        FIND COHESION WITH CONTEXT WORDS IN ADDITION TO
                        EXPANSIONS OF OTHER SOFT WORDS

                        USED IN BOTH PART I AND PART II
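
A sketch of how context words might enter the cohesion score, again with invented probabilities and hypothetical function names. The context words are simply appended to the words each candidate is scored against.

```python
import math

# Hypothetical co-occurrence probabilities for (candidate, context word) pairs.
PAIR_PROB = {
    ("direction", "forward"): 1e-5,
    ("direction", "backward"): 1e-5,
    ("directory", "forward"): 1e-8,
    ("directory", "backward"): 1e-8,
}
FLOOR = 1e-12  # assumed probability for unseen pairs


def cohesion_with_context(candidate, other_expansions, context):
    """Score a candidate against other soft-word expansions plus the context words."""
    neighbours = [*other_expansions, *context]
    return sum(math.log(PAIR_PROB.get((candidate, w), FLOOR)) for w in neighbours)


def expand(candidates, other_expansions, context):
    """Pick the candidate with the highest cohesion once context is included."""
    return max(
        candidates,
        key=lambda c: cohesion_with_context(c, other_expansions, context),
    )


# "dir" is the identifier's only soft word, so only the context can disambiguate it.
choice = expand(["direction", "directory"], [], ["forward", "backward"])
print(choice)
```

With no context, both candidates would score the same empty sum; the context words FORWARD and BACKWARD are what tip the choice toward "direction" in this toy setup.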




NORMALIZE IMPLEMENTATION




                        USES GenTest TO SPLIT IDENTIFIERS

                          RETURNS MULTIPLE SPLITS

                        USES THE GOOGLE 5-GRAM DATASET FOR CO-OCCURRENCE DATA




EVALUATION

                    Program         LoC        SLoC      Unique Ids

                    which-2.20       3,670      2,293        487

                    a2ps-4.14       62,347     38,436      4,393


                    Program         Selected Ids   Hard Words   Soft Words

                    which-2.20          487           903         1,214

                    a2ps-4.14           211           459           618




EVALUATION

                        THREE GROUPS OF IDENTIFIERS

                          STANDARD LIBRARY CALLS

                          NAMES FROM STANDARD HEADER FILES / KEYWORDS

                          DOMAIN NAMES


                         Program         Filtered Ids   Reported Ids

                         which-2.20          152            335

                         a2ps-4.14            46            166

EXAMPLE EXPANSIONS

                        id          Top 10 Expansion    Top Expansion

                        nextchar    next_character      next_character
                        indfound    index_found_need    index_found
                        optarg      option_are_g        optarg
                        itemno      i_them_not          itemno




RESEARCH QUESTIONS



                        WHAT IS THE OVERALL ACCURACY OF NORMALIZE?

                        DOES THE VOCABULARY USED HAVE A SIGNIFICANT
                        IMPACT ON THE EXPANSION’S ACCURACY?

                        CAN THE EXPANDER INFORM THE SPLITTER?

                        CAN THE SPLITTER INFORM THE EXPANDER?




ACCURACY ON DOMAIN IDS




SOURCE OF EXPANSION WORDS



                        SOURCE CODE

                        INTERNAL DOCUMENTATION

                        MANUAL




BEST VOCABULARY SOURCE?




FUTURE WORK


                        EXPLORING DIFFERENT SOURCES OF CO-OCCURRENCE
                        DATA

                        EXPLORING DIFFERENT WAYS OF CALCULATING
                        PROBABILITIES

                        EXAMINING NORMALIZATION IN CONTEXT OF AN
                        INFORMATION RETRIEVAL TASK




SUMMARY



                        IDENTIFIERS ARE WRITTEN DIFFERENTLY FROM THE TEXT
                        OF OTHER SOFTWARE DOCUMENTS

                          DEGRADES PERFORMANCE OF IR TECHNIQUES

                        NORMALIZE CURRENTLY EXPANDS ABOUT HALF OF
                        SOFT WORDS CORRECTLY




QUESTIONS?


                         Need an identifier split?
                        GenTest Splitter available at
                            splitit.cs.loyola.edu



 
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDEADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation Developers
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 Workshop
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URL
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability Adventure
 
20230104 - machine vision
20230104 - machine vision20230104 - machine vision
20230104 - machine vision
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024
 

Natural Language Analysis - Expanding Identifiers to Normalize Source Code Vocabulary

  • 1. EXPANDING IDENTIFIERS TO NORMALIZE SOURCE CODE VOCABULARY. PRESENTED BY DAWN LAWRIE, LOYOLA UNIVERSITY MARYLAND, IN COLLABORATION WITH DAVE BINKLEY. Friday, October 7, 11
  • 2. VOCABULARY MISMATCH: DIFFERENT VOCABULARY IN SOURCE CODE AND OTHER SOFTWARE ARTIFACTS. EXAMPLE: REQUIREMENT - “FEATURE LOCATION”; SOURCE CODE - “FEATURELOCATION” OR WORSE “FLOC”
  • 3. PURPOSE OF NORMALIZE: COPE WITH VOCABULARY MISMATCH BETWEEN SOURCE CODE AND OTHER SOFTWARE DOCUMENTS
  • 4. EXAMPLE PROBLEMS: CONSIDER IDENTIFIERS FEATURELOCATION, FLOC
  • 5. EXAMPLE PROBLEMS: CONSIDER IDENTIFIERS FEATURE LOCATION (SPLITTING PROBLEM), FLOC
  • 6. EXAMPLE PROBLEMS: CONSIDER IDENTIFIERS FEATURE LOCATION (SPLITTING PROBLEM), F LOC (SPLITTING PROBLEM)
  • 7. EXAMPLE PROBLEMS: CONSIDER IDENTIFIERS FEATURE LOCATION (SPLITTING PROBLEM), FEATURE LOCATION (SPLITTING AND EXPANSION PROBLEM)
  • 8. WHY NORMALIZE? MANY SE PROBLEMS CAN BE ADDRESSED USING INFORMATION RETRIEVAL (IR) TECHNIQUES. UN-NORMALIZED CODE LEADS TO AN UNDERESTIMATE OF THE IMPORTANCE OF CRUCIAL WORDS
  • 9. NORMALIZE PROBLEM STATEMENT: FIND THE BEST EXPANSION OVER ALL POSSIBLE SPLITS. FLOC → FEATURE LOCATION
  • 10. NORMALIZE ALGORITHM TERMINOLOGY: HARD-WORD - WHITEHOUSE_LAWN; SOFT-WORD - WHITE-HOUSE_LAWN
  • 11. NORMALIZE ALGORITHM TERMINOLOGY: HARD-WORD - WHITEHOUSE_LAWN (2); SOFT-WORD - WHITE-HOUSE_LAWN
  • 12. NORMALIZE ALGORITHM TERMINOLOGY: HARD-WORD - WHITEHOUSE_LAWN (2); SOFT-WORD - WHITE-HOUSE_LAWN (3)
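The counts on the terminology slides can be reproduced with a small sketch. It assumes that hard words are the pieces delimited by explicit markers (underscores or lower-to-upper case changes) and that the "-" in the slide notation marks a soft-word boundary inside a hard word. The function names `hard_words` and `soft_words` are hypothetical and not taken from the talk.

```python
import re


def hard_words(identifier: str) -> list:
    """Split only at explicit markers: underscores and lower-to-upper case changes."""
    parts = []
    for chunk in identifier.split("_"):
        # Zero-width split between a lowercase and an uppercase letter (Python 3.7+).
        parts.extend(p for p in re.split(r"(?<=[a-z])(?=[A-Z])", chunk) if p)
    return parts


def soft_words(marked_split: str) -> list:
    """Read a split in the slide notation, where '-' marks a soft-word boundary."""
    return [w for hard in marked_split.split("_") for w in hard.split("-") if w]


print(hard_words("WHITEHOUSE_LAWN"))   # ['WHITEHOUSE', 'LAWN'] -> 2 hard words
print(soft_words("WHITE-HOUSE_LAWN"))  # ['WHITE', 'HOUSE', 'LAWN'] -> 3 soft words
```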
  • 14. NORMALIZE ALGORITHM: STRLEN → STRING LENGTH
  • 15. MACHINE TRANSLATION APPROACH: EL PAPA VISITA LA IGLESIA
  • 16. MACHINE TRANSLATION APPROACH: EL PAPA VISITA LA IGLESIA. CANDIDATE TRANSLATIONS: EL - THE; PAPA - FATHER, POTATO, POPE; VISITA - VISITS, VISITOR, HIT; LA - THE; IGLESIA - CHURCH
  • 17. MACHINE TRANSLATION APPROACH: EL PAPA VISITA LA IGLESIA. CANDIDATE TRANSLATIONS: EL - THE; PAPA - FATHER, POTATO, POPE; VISITA - VISITS, VISITOR, HIT; LA - THE; IGLESIA - CHURCH
  • 18. MACHINE TRANSLATION APPROACH: EL PAPA VISITA LA IGLESIA. CANDIDATE TRANSLATIONS: EL - THE; PAPA - FATHER, POTATO, POPE; VISITA - VISITS, VISITOR, HIT; LA - THE; IGLESIA - CHURCH. STRONG COHESION
  • 19. MACHINE TRANSLATION APPROACH: EL PAPA VISITA LA IGLESIA. CANDIDATE TRANSLATIONS: EL - THE; PAPA - FATHER, POTATO, POPE; VISITA - VISITS, VISITOR, HIT; LA - THE; IGLESIA - CHURCH. STRONG COHESION
  • 21. NORMALIZE ALGORITHM: STRLEN
  • 22. NORMALIZE ALGORITHM: STRLEN. SPLITS: S-TRLEN, ST-RLEN, STR-LEN, STRL_EN, STRLE_N, S_T_RLEN, S-TR-LEN, S_TRL_EN, S_TRLE_N, ST_R_LEN, ST_RL_EN, ST_RLE_N, STR_L_EN, STR_LE_N, STRL_E_N, S_T_R_LEN, S_T_RL_EN, S_T_RLE_N, S_TR_L_EN, S_TR_LE_N, S_TRL_E_N, ST_R_L_EN, ST_R_LE_N, ST_RL_E_N, STR_L_E_N, S_T_R_L_EN, S_T_R_LE_N, S_TR_L_E_N, ST_R_L_E_N, S-T-R-L-E-N
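Enumerating the candidate splits of a hard word, as the slide does for STRLEN, can be sketched as follows. Each of the n-1 gaps between characters is either cut or not, so a hard word of length n has 2^(n-1) splits, counting the unsplit word. The function name `all_splits` is a hypothetical helper and not part of the talk.

```python
from itertools import combinations


def all_splits(hard_word: str) -> list:
    """Return every way to cut hard_word into contiguous, non-empty pieces."""
    n = len(hard_word)
    splits = []
    for k in range(n):  # k = number of cut points, from 0 to n-1
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            splits.append([hard_word[a:b] for a, b in zip(bounds, bounds[1:])])
    return splits


splits = all_splits("strlen")
print(len(splits))               # 32, i.e. 2 ** (6 - 1)
print(["str", "len"] in splits)  # True
```

The exponential growth is why a practical implementation would prune or rank splits rather than score all of them for long identifiers.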
  • 23. NORMALIZE ALGORITHM: STRLEN. SPLITS: S-TRLEN, ST-RLEN, STR-LEN, STRL_EN, STRLE_N, S_T_RLEN, S-TR-LEN, S_TRL_EN, S_TRLE_N, ST_R_LEN, ST_RL_EN, ST_RLE_N, STR_L_EN, STR_LE_N, STRL_E_N, S_T_R_LEN, S_T_RL_EN, S_T_RLE_N, S_TR_L_EN, S_TR_LE_N, S_TRL_E_N, ST_R_L_EN, ST_R_LE_N, ST_RL_E_N, STR_L_E_N, S_T_R_L_EN, S_T_R_LE_N, S_TR_L_E_N, ST_R_L_E_N, S-T-R-L-E-N. E(RLEN) = {RIFLEMEN}
  • 24. NORMALIZE ALGORITHM: STRLEN. SPLITS: S-TRLEN, ST-RLEN, STR-LEN, STRL_EN, STRLE_N, S_T_RLEN, S-TR-LEN, S_TRL_EN, S_TRLE_N, ST_R_LEN, ST_RL_EN, ST_RLE_N, STR_L_EN, STR_LE_N, STRL_E_N, S_T_R_LEN, S_T_RL_EN, S_T_RLE_N, S_TR_L_EN, S_TR_LE_N, S_TRL_E_N, ST_R_L_EN, ST_R_LE_N, ST_RL_E_N, STR_L_E_N, S_T_R_L_EN, S_T_R_LE_N, S_TR_L_E_N, ST_R_L_E_N, S-T-R-L-E-N. E(RLEN) = {RIFLEMEN}; WILDCARD EXPANSION: R*L*E*N*
  • 25. NORMALIZE ALGORITHM: STRLEN. SPLITS: S-TRLEN, ST-RLEN, STR-LEN, STRL_EN, STRLE_N, S_T_RLEN, S-TR-LEN, S_TRL_EN, S_TRLE_N, ST_R_LEN, ST_RL_EN, ST_RLE_N, STR_L_EN, STR_LE_N, STRL_E_N, S_T_R_LEN, S_T_RL_EN, S_T_RLE_N, S_TR_L_EN, S_TR_LE_N, S_TRL_E_N, ST_R_L_EN, ST_R_LE_N, ST_RL_E_N, STR_L_E_N, S_T_R_L_EN, S_T_R_LE_N, S_TR_L_E_N, ST_R_L_E_N, S-T-R-L-E-N. E(ST) = {SET, STOP, STRING}; E(RLEN) = {RIFLEMEN}; E(STR) = {STEER, STRING}; E(LEN) = {LENDER, LENGTH}
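The wildcard expansion R*L*E*N* can be read as "the letters of the soft word, in order, with anything in between". A minimal sketch of that reading, using a regular expression over a toy vocabulary, is shown here. The vocabulary and the names `wildcard_pattern` and `expansions` are assumptions for illustration; the talk does not say how the candidate word list is stored or matched.

```python
import re

# Toy vocabulary, invented for illustration; a real run would draw on a
# dictionary, the source code, or its documentation.
VOCABULARY = {"set", "stop", "string", "steer", "lender", "length", "riflemen", "location"}


def wildcard_pattern(soft_word: str):
    """Turn 'rlen' into the regex r.*l.*e.*n.* (the slide's R*L*E*N*)."""
    return re.compile("".join(re.escape(ch) + ".*" for ch in soft_word.lower()))


def expansions(soft_word: str, vocabulary) -> list:
    """Return the vocabulary words that match the soft word's wildcard pattern."""
    pattern = wildcard_pattern(soft_word)
    return sorted(w for w in vocabulary if pattern.fullmatch(w.lower()))


print(expansions("rlen", VOCABULARY))  # ['riflemen']
print(expansions("str", VOCABULARY))   # ['steer', 'string']
print(expansions("len", VOCABULARY))   # ['lender', 'length']
```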
  • 26. NORMALIZE ALGORITHM PART I: STR: STRING VS STEER
  • 27. NORMALIZE ALGORITHM PART I: STR: STRING VS STEER, EACH PAIRED WITH LENDER AND LENGTH
  • 28. NORMALIZE ALGORITHM PART I: STR: STRING VS STEER, EACH PAIRED WITH LENDER AND LENGTH. 1. FIND COHESION BY SUMMING LOG OF PROBABILITIES OF WORD PAIRS
  • 29. NORMALIZE ALGORITHM PART I: STR: STRING VS STEER. COHESION A = (STRING, LENDER) + (STRING, LENGTH); COHESION B = (STEER, LENDER) + (STEER, LENGTH). 1. FIND COHESION BY SUMMING LOG OF PROBABILITIES OF WORD PAIRS
  • 30. NORMALIZE ALGORITHM PART I: STR: STRING VS STEER. COHESION A = (STRING, LENDER) + (STRING, LENGTH); COHESION B = (STEER, LENDER) + (STEER, LENGTH). 1. FIND COHESION BY SUMMING LOG OF PROBABILITIES OF WORD PAIRS 2. SELECT EXPANSION THAT MAXIMIZES COHESION
  • 31. NORMALIZE ALGORITHM PART I: STR: STRING VS STEER. COHESION A = (STRING, LENDER) + (STRING, LENGTH); COHESION B = (STEER, LENDER) + (STEER, LENGTH). 1. FIND COHESION BY SUMMING LOG OF PROBABILITIES OF WORD PAIRS 2. SELECT EXPANSION THAT MAXIMIZES COHESION
  • 32. NORMALIZE ALGORITHM PART I: STR: STRING VS STEER. COHESION A = (STRING, LENDER) + (STRING, LENGTH); COHESION B = (STEER, LENDER) + (STEER, LENGTH). RESULT: STRING. 1. FIND COHESION BY SUMMING LOG OF PROBABILITIES OF WORD PAIRS 2. SELECT EXPANSION THAT MAXIMIZES COHESION
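Part I's two steps (sum the log probabilities of word pairs, then keep the expansion with the highest total) can be sketched as below. The probabilities in `PAIR_PROB` are invented purely for illustration; the talk derives its statistics from co-occurrence data. The `FLOOR` smoothing value for unseen pairs is likewise an assumption, as are the function names.

```python
import math

# Hypothetical co-occurrence probabilities, made up for this example only.
PAIR_PROB = {
    frozenset(("string", "length")): 1e-4,
    frozenset(("string", "lender")): 1e-8,
    frozenset(("steer", "length")): 1e-7,
    frozenset(("steer", "lender")): 1e-8,
}
FLOOR = 1e-12  # assumed probability for pairs never seen together


def cohesion(candidate: str, other_words) -> float:
    """Sum the log probability of the candidate paired with each other word."""
    return sum(
        math.log(PAIR_PROB.get(frozenset((candidate, w)), FLOOR))
        for w in other_words
    )


def best_expansion(candidates, other_words) -> str:
    """Pick the candidate expansion whose cohesion with the other words is highest."""
    return max(candidates, key=lambda c: cohesion(c, other_words))


# Expansions of STR scored against the expansions of LEN.
print(best_expansion(["string", "steer"], ["lender", "length"]))  # string
```

Summing logs rather than multiplying raw probabilities avoids floating-point underflow when many small probabilities are combined.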
  • 33. NORMALIZE ALGORITHM PART II: STR-LEN VS ST-RLEN
  • 34. NORMALIZE ALGORITHM PART II: STR-LEN (STRING LENGTH) VS ST-RLEN (STOP RIFLEMEN)
  • 35. NORMALIZE ALGORITHM PART II: STR-LEN (STRING LENGTH) VS ST-RLEN (STOP RIFLEMEN). 1. FIND COHESION OVER EXPANSIONS
  • 36. NORMALIZE ALGORITHM PART II: STR-LEN (STRING LENGTH) VS ST-RLEN (STOP RIFLEMEN). 1. FIND COHESION OVER EXPANSIONS 2. SELECT EXPANSION OF THE SPLIT THAT MAXIMIZES COHESION
  • 37. NORMALIZE ALGORITHM PART II: STR-LEN (STRING LENGTH) VS ST-RLEN (STOP RIFLEMEN). 1. FIND COHESION OVER EXPANSIONS 2. SELECT EXPANSION OF THE SPLIT THAT MAXIMIZES COHESION
  • 38. NORMALIZE ALGORITHM PART II: STR-LEN (STRING LENGTH) VS ST-RLEN (STOP RIFLEMEN). RESULT: STRING LENGTH. 1. FIND COHESION OVER EXPANSIONS 2. SELECT EXPANSION OF THE SPLIT THAT MAXIMIZES COHESION
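Part II compares whole splits: each split is represented by its most cohesive expansion, and the split whose expansion scores highest wins. The sketch below follows that outline under several assumptions: the expansion sets are copied from the slides, the probabilities are invented, and only splits with the same number of soft words are compared, since the slides do not say how scores over different numbers of pairs are normalized. All names are hypothetical.

```python
import math
from itertools import combinations, product

# Expansion sets as shown on the slides.
EXPANSIONS = {
    "st": ["set", "stop", "string"],
    "rlen": ["riflemen"],
    "str": ["steer", "string"],
    "len": ["lender", "length"],
}
# Hypothetical co-occurrence probabilities, made up for this example only.
PAIR_PROB = {
    frozenset(("string", "length")): 1e-4,
    frozenset(("stop", "riflemen")): 1e-9,
}
FLOOR = 1e-12  # assumed probability for pairs never seen together


def cohesion(words) -> float:
    """Sum log pair probabilities over every pair of words in the expansion."""
    return sum(
        math.log(PAIR_PROB.get(frozenset(pair), FLOOR))
        for pair in combinations(words, 2)
    )


def best_for_split(split):
    """Best expansion of one split: try every combination of soft-word expansions."""
    return max(product(*(EXPANSIONS[s] for s in split)), key=cohesion)


def normalize(splits) -> list:
    """Best expansion over all given splits (same number of soft words assumed)."""
    return list(max((best_for_split(s) for s in splits), key=cohesion))


print(normalize([["str", "len"], ["st", "rlen"]]))  # ['string', 'length']
```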
  • 40. ADDING CONTEXT: DIR
  • 41. ADDING CONTEXT: DIR. E(DIR) = {DIRECTION, DIRECTORY}
  • 42. ADDING CONTEXT: DIR. E(DIR) = {DIRECTION, DIRECTORY}; CONTEXT = {FORWARD, BACKWARD}
  • 43. ADDING CONTEXT: DIR. E(DIR) = {DIRECTION, DIRECTORY}; CONTEXT = {FORWARD, BACKWARD}. FIND COHESION WITH CONTEXT WORDS IN ADDITION TO EXPANSIONS OF OTHER SOFT WORDS. USED IN BOTH PART I AND PART II
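The DIR example shows why context matters: with no other soft words in the identifier, only the surrounding words can separate DIRECTION from DIRECTORY. A minimal sketch, assuming context words are simply added to the set of words the candidate is scored against, follows. The probabilities, the `FLOOR` value, and the function names are invented for illustration.

```python
import math

# Hypothetical co-occurrence probabilities, made up for this example only.
PAIR_PROB = {
    frozenset(("direction", "forward")): 1e-4,
    frozenset(("direction", "backward")): 1e-4,
    frozenset(("directory", "forward")): 1e-7,
    frozenset(("directory", "backward")): 1e-8,
    frozenset(("directory", "file")): 1e-4,
}
FLOOR = 1e-12  # assumed probability for pairs never seen together


def cohesion_with_context(candidate: str, other_expansions, context) -> float:
    """Score a candidate against other soft-word expansions plus context words."""
    words = list(other_expansions) + list(context)
    return sum(
        math.log(PAIR_PROB.get(frozenset((candidate, w)), FLOOR)) for w in words
    )


def expand(candidates, other_expansions, context) -> str:
    """Pick the candidate with the highest context-aware cohesion."""
    return max(
        candidates,
        key=lambda c: cohesion_with_context(c, other_expansions, context),
    )


candidates = ["direction", "directory"]
print(expand(candidates, [], ["forward", "backward"]))  # direction
print(expand(candidates, [], ["file"]))                 # directory
```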
  • 44. NORMALIZE IMPLEMENTATION: USES GenTest TO SPLIT IDENTIFIERS; RETURNS MULTIPLE SPLITS; GOOGLE 5-GRAM DATASET
  • 45. EVALUATION
      Program      Loc      SLoc     Unique Ids
      which-2.20   3,670    2,293    487
      a2ps-4.14    62,347   38,436   4,393

      Program      Selected Ids   Hard Words   Soft Words
      which-2.20   487            903          1,214
      a2ps-4.14    211            459          618
  • 46. EVALUATION: THREE GROUPS OF IDENTIFIERS: STANDARD LIBRARY CALLS; NAMES FROM STANDARD HEADER FILES / KEYWORDS; DOMAIN NAMES
  • 47. EVALUATION: THREE GROUPS OF IDENTIFIERS: STANDARD LIBRARY CALLS; NAMES FROM STANDARD HEADER FILES / KEYWORDS; DOMAIN NAMES
  • 48. EVALUATION: THREE GROUPS OF IDENTIFIERS: STANDARD LIBRARY CALLS; NAMES FROM STANDARD HEADER FILES / KEYWORDS; DOMAIN NAMES
      Program      Filtered Ids   Reported Ids
      which-2.20   152            335
      a2ps-4.14    46             166
  • 49. EXAMPLE EXPANSIONS
      id         Top Expansion      Top 10 Expansion
      nextchar   next_character     next_character
      indfound   index_found_need   index_found
      optarg     option_are_g       optarg
      itemno     i_them_not         itemno
  • 50. RESEARCH QUESTIONS: WHAT IS THE OVERALL ACCURACY OF NORMALIZE? DOES THE VOCABULARY USED HAVE A SIGNIFICANT IMPACT ON THE EXPANSION’S ACCURACY? CAN THE EXPANDER INFORM THE SPLITTER? CAN THE SPLITTER INFORM THE EXPANDER?
  • 51. ACCURACY ON DOMAIN IDS
  • 52. SOURCE OF EXPANSION WORDS: SOURCE CODE, INTERNAL DOCUMENTATION, MANUAL
  • 54. FUTURE WORK: EXPLORING DIFFERENT SOURCES OF CO-OCCURRENCE DATA; EXPLORING DIFFERENT WAYS OF CALCULATING PROBABILITIES; EXAMINING NORMALIZATION IN CONTEXT OF AN INFORMATION RETRIEVAL TASK
  • 55. SUMMARY: IDENTIFIERS ARE WRITTEN DIFFERENTLY THAN OTHER SOFTWARE DOCUMENTS, WHICH DEGRADES PERFORMANCE OF IR TECHNIQUES. NORMALIZE CURRENTLY EXPANDS ABOUT HALF OF SOFT WORDS CORRECTLY
  • 56. QUESTIONS? Need an identifier split? GenTest Splitter available at splitit.cs.loyola.edu