3. Code Completion
Built-in feature of all modern IDEs
Speeds up development
Helps with longer identifier names for program comprehension
Less overhead for developers
Mostly supports single variables, methods, and API packages
Template-based support for control structures, event handling, and others
4. Thesis Statement
Novel approach to graph-based code completion
Graph-based feature extraction, searching, and
ranking of API usage patterns, matched against
the editing context of the current code
Empirical evaluation shows correctness and
usefulness: 95% precision, 92% recall, 93%
F-score over 24 real-world systems
12. Context-Sensitive Weight
wf(q) = 1 / (d + 1)
wf(q) = context-sensitive weight of feature q
q = feature of query Q
d = distance to the closest token in the Groum model
14. Query Processing and Feature
Extraction
Tokenizing
Partial parsing
Groum building
Feature extracting and weighting
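The four stages above can be sketched as a simple pipeline. The function names and data shapes below are illustrative placeholders only (the real system uses Eclipse's parser, the PPA tool, and Groums, not these stubs):

```python
# Hypothetical sketch of the four query-processing stages.

def tokenize(code):
    # Lexical analysis of the partially written code under the cursor.
    return code.split()

def partial_parse(tokens):
    # Stand-in for PPA: it would return an AST, with unresolved
    # nodes assigned 'Unknown Type'.
    return {"ast": tokens, "unresolved": "Unknown Type"}

def build_groum(parse):
    # Build a graph-based object usage model (Groum) from the AST.
    return {"nodes": parse["ast"], "edges": []}

def extract_features(groum, max_len=3):
    # Extract path features (length <= 3) for later weighting.
    return [(n,) for n in groum["nodes"]]

query = "reader readLine close"
features = extract_features(build_groum(partial_parse(tokenize(query))))
print(len(features))  # one single-node feature per token: 3
```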
15. Tokenizing, Partial Parsing
Lexical analysis
Keywords related to control structures are
preserved; other tokens are removed but saved
for later use
Eclipse Java parser
PPA tool returns an AST (Abstract Syntax Tree)
Unresolved nodes are assigned 'Unknown Type'
16. Groum Building
Groum is built from the AST
Unresolved nodes are discarded but still
considered as tokens
The query is converted to the following Groum
Fig 6: Groum of Query
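A Groum can be pictured as a small directed graph. The toy structure below assumes action nodes labeled "Class.method" and edges for control/data dependencies; the node labels and edge details are assumptions for illustration:

```python
# Toy Groum: action nodes plus directed dependency edges.

class Groum:
    def __init__(self):
        self.nodes = []   # action/control nodes, e.g. "BufferedReader.readLine"
        self.edges = []   # (src_index, dst_index) control/data dependency edges

    def add_node(self, label):
        self.nodes.append(label)
        return len(self.nodes) - 1

    def add_edge(self, src, dst):
        self.edges.append((src, dst))

# A plausible Groum for a file-reading query (labels are hypothetical):
g = Groum()
a = g.add_node("FileReader.<init>")
b = g.add_node("BufferedReader.<init>")
c = g.add_node("BufferedReader.readLine")
g.add_edge(a, b)  # FileReader flows into BufferedReader's constructor
g.add_edge(b, c)  # the reader is then used for readLine
```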
17. Feature Extraction & Weighting
Groum nodes are mapped to tokens from the
tokenization step
Features are extracted from the Groum as paths of length L <= 3
Three factors contribute to a feature's weight:
Structure-based factor (size)
Structure-based factor (centrality)
Use-based factor
18. Feature Extraction & Weighting
ws(q) = size-based weight of feature q of query Q
(ws(q) = 1 + size(q); 1 <= size(q) <= 3)
wc(q) = centrality-based weight of feature q of
query Q (wc(q) = n / s; n = number of
neighbors, s = size)
wf(q) = context-sensitive weight of feature q of
query Q (wf(q) = 1 / (d + 1); d = distance between
the focus node and the closest token in the
feature path of the Groum model)
19. Feature Extraction & Weighting
w(q) = total weight of feature q of query Q
ws(q) = size-based weight of feature q of query Q
wc(q) = centrality-based weight of feature q of query Q
wf(q) = use-based (context-sensitive) weight of feature q of query Q
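The three factors can be computed directly from the definitions above. How they combine into the total weight w(q) is not spelled out on the slide; the product used below is purely an illustrative assumption:

```python
# The three weight factors for a query feature q, per the slide's
# definitions; combining them as a product is an assumption.

def size_weight(size):
    # ws(q) = 1 + size(q), with 1 <= size(q) <= 3
    assert 1 <= size <= 3
    return 1 + size

def centrality_weight(n_neighbors, size):
    # wc(q) = n / s, n = number of neighbors, s = size
    return n_neighbors / size

def context_weight(d):
    # wf(q) = 1 / (d + 1), d = distance from the focus node
    # to the closest token
    return 1 / (d + 1)

def total_weight(size, n_neighbors, d):
    # Assumed combination: w(q) = ws(q) * wc(q) * wf(q)
    return size_weight(size) * centrality_weight(n_neighbors, size) * context_weight(d)

# e.g. a 2-node feature with 3 neighbors, 1 hop from the focus node:
print(total_weight(2, 3, 1))  # 3 * 1.5 * 0.5 = 2.25
```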
20. Pattern Managing, Searching and
Ranking
Pr(P) is the popularity of pattern P = frequency of
pattern P
Weight of feature p in pattern P is computed using
inverse indexing:
Np,P = occurrences of feature p in P
NP = total number of features in P
Np = number of patterns containing p
N = total number of patterns in the database
21. Pattern Managing, Searching and
Ranking
For each feature p, L(p) is the list of patterns from
which p can be extracted
p denotes a pattern feature, q a query feature
If sim(p, q) > δ, then p is added to F, the set of
mapped features for q
For each p ∈ F, the top n ranked patterns from L(p)
are added to C, the candidate patterns for relevance
computation
Then for each P in C, compute fit(P, Q)
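The candidate-search loop above can be sketched as follows. The `sim` placeholder stands in for the name-based similarity defined later; the index shape and function names are illustrative assumptions:

```python
# Sketch of candidate search over an inverted index L(p):
# map each query feature q to similar pattern features (sim > delta),
# then pool the top-n patterns from each matched feature's list.

def find_candidates(query_features, inverted_index, sim, delta=0.5, n=2):
    candidates = set()   # C: candidate patterns for relevance computation
    for q in query_features:
        # F: pattern features sufficiently similar to q
        F = [p for p in inverted_index if sim(p, q) > delta]
        for p in F:
            # L(p): patterns from which p can be extracted (assumed ranked)
            candidates.update(inverted_index[p][:n])
    return candidates

index = {"List.add": ["P1", "P2", "P3"], "Map.put": ["P4"]}
sim = lambda p, q: 1.0 if p == q else 0.0   # toy similarity
print(sorted(find_candidates(["List.add"], index, sim)))  # ['P1', 'P2']
```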
22. Feature Similarity
sim(p, q) is a name-based similarity between two
features, given that a feature is a collection of
labels of the form X.Y.Z, where
X = package name
Y = class name
Z = method name
23. Name-based Similarity (nsim)
wsim(X, X') is the word-based similarity
X and X' are broken down into two sequences of
words, L(X) and L(X')
Similarity is computed as Lo / Lm, where
Lo is the length of the LCS (longest common
subsequence) and Lm is the average length of the
two sequences
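The Lo / Lm computation can be made concrete with a standard dynamic-programming LCS. The camel-case word splitter is an assumption about how identifiers are broken into words:

```python
import re

# wsim(X, X') = Lo / Lm: Lo = LCS length of the two word sequences,
# Lm = their average length. The camel-case splitter is assumed.

def split_words(identifier):
    # "readAllLines" -> ["read", "all", "lines"]
    return [w.lower() for w in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", identifier)]

def lcs_len(a, b):
    # Classic dynamic-programming longest-common-subsequence length.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    return dp[-1][-1]

def wsim(X, X2):
    a, b = split_words(X), split_words(X2)
    Lo = lcs_len(a, b)
    Lm = (len(a) + len(b)) / 2
    return Lo / Lm

print(wsim("readLine", "readAllLines"))  # LCS = ["read"], Lm = 2.5 -> 0.4
```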
26. Pattern Oriented Code Completion
The matched pattern is selected and the
corresponding nodes in the Groum are matched
The missing nodes are filled in with code
27. Empirical Evaluation
Metrics: precision, recall, F-score
java.io and java.util APIs used as libraries
28 real-world open-source systems:
4 for training, 24 for testing
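The reported F-score is consistent with the stated precision and recall, since F is their harmonic mean:

```python
# F-score as the harmonic mean of precision and recall.

def f_score(precision, recall):
    return 2 * precision * recall / (precision + recall)

# With the 95% precision and 92% recall reported in the thesis statement:
print(round(f_score(0.95, 0.92), 2))  # 0.93, matching the ~93% F-score
```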
29. My Observation
Planning to use semantic web technology
Data and control dependency relationships can be
improved using semantic relationships such as
conceptual similarity
Matching of patterns is complex and error-prone;
a semantic score could be beneficial