3. Code Completion
Built-in feature of all modern IDEs
Speeds up development
Helps with longer identifier names for program comprehension
Less overhead for developers
Mostly supports single variables, methods, and API packages
Template-based support for control structures, event handling, and others
4. Thesis Statement
Novel approach to graph-based code completion
Graph-based feature extraction, searching, and
ranking of API usage patterns, matched against
the editing context of the current code
Empirical evaluation shows correctness and
usefulness: 95% precision, 92% recall, 93%
F-score over 24 real-world systems
12. Context-Sensitive Weight
wf(q) = 1 / (d + 1)
wf(q) = context-sensitive weight of feature q
q = feature of query Q
d = distance to the closest token in the Groum model
14. Query Processing and Feature
Extraction
Tokenizing
Partial parsing
Groum building
Feature extracting and weighting
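The four stages above can be sketched as a simple pipeline. The function names and data shapes below are illustrative placeholders only (the real system uses Eclipse's parser, the PPA tool, and Groums, not these stubs):

```python
# Hypothetical sketch of the four query-processing stages.

def tokenize(code):
    # Lexical analysis of the partially written code under the cursor.
    return code.split()

def partial_parse(tokens):
    # Stand-in for PPA: it would return an AST, with unresolved
    # nodes assigned 'Unknown Type'.
    return {"ast": tokens, "unresolved": "Unknown Type"}

def build_groum(parse):
    # Build a graph-based object usage model (Groum) from the AST.
    return {"nodes": parse["ast"], "edges": []}

def extract_features(groum, max_len=3):
    # Extract path features (length <= 3) for later weighting.
    return [(n,) for n in groum["nodes"]]

query = "reader readLine close"
features = extract_features(build_groum(partial_parse(tokenize(query))))
print(len(features))  # one single-node feature per token: 3
```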
15. Tokenizing, Partial Parsing
Lexical analysis
Keywords related to control structures are
preserved; other tokens are removed but saved
for later use
Eclipse Java parser
PPA tool returns an AST (Abstract Syntax Tree)
Unresolved nodes are assigned 'Unknown Type'
16. Groum Building
Groum is built from the AST
Unresolved nodes are discarded but still
considered as tokens
The query is converted to the following Groum
Fig 6: Groum of Query
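A Groum can be pictured as a small directed graph. The toy structure below assumes action nodes labeled "Class.method" and edges for control/data dependencies; the node labels and edge details are assumptions for illustration:

```python
# Toy Groum: action nodes plus directed dependency edges.

class Groum:
    def __init__(self):
        self.nodes = []   # action/control nodes, e.g. "BufferedReader.readLine"
        self.edges = []   # (src_index, dst_index) control/data dependency edges

    def add_node(self, label):
        self.nodes.append(label)
        return len(self.nodes) - 1

    def add_edge(self, src, dst):
        self.edges.append((src, dst))

# A plausible Groum for a file-reading query (labels are hypothetical):
g = Groum()
a = g.add_node("FileReader.<init>")
b = g.add_node("BufferedReader.<init>")
c = g.add_node("BufferedReader.readLine")
g.add_edge(a, b)  # FileReader flows into BufferedReader's constructor
g.add_edge(b, c)  # the reader is then used for readLine
```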
17. Feature Extraction & Weighting
Groum nodes are mapped to tokens from the
tokenization step
Features are extracted from the Groum as paths of length L <= 3
Three factors contribute to a feature's weight:
Structure-based factor (size)
Structure-based factor (centrality)
Use-based factor
18. Feature Extraction & Weighting
ws(q) = size-based weight of feature q of query Q
(ws(q) = 1 + size(q); 1 <= size(q) <= 3)
wc(q) = centrality-based weight of feature q of
query Q (wc(q) = n / s; n = number of
neighbors, s = size)
wf(q) = context-sensitive weight of feature q of
query Q (wf(q) = 1 / (d + 1); d = distance between
the focus node and the closest token in the
feature path of the Groum model)
19. Feature Extraction & Weighting
w(q) = total weight of feature q of query Q
ws(q) = size-based weight of feature q of query Q
wc(q) = centrality-based weight of feature q of query Q
wf(q) = use-based (context-sensitive) weight of feature q of query Q
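The three factors can be computed directly from the definitions above. How they combine into the total weight w(q) is not spelled out on the slide; the product used below is purely an illustrative assumption:

```python
# The three weight factors for a query feature q, per the slide's
# definitions; combining them as a product is an assumption.

def size_weight(size):
    # ws(q) = 1 + size(q), with 1 <= size(q) <= 3
    assert 1 <= size <= 3
    return 1 + size

def centrality_weight(n_neighbors, size):
    # wc(q) = n / s, n = number of neighbors, s = size
    return n_neighbors / size

def context_weight(d):
    # wf(q) = 1 / (d + 1), d = distance from the focus node
    # to the closest token
    return 1 / (d + 1)

def total_weight(size, n_neighbors, d):
    # Assumed combination: w(q) = ws(q) * wc(q) * wf(q)
    return size_weight(size) * centrality_weight(n_neighbors, size) * context_weight(d)

# e.g. a 2-node feature with 3 neighbors, 1 hop from the focus node:
print(total_weight(2, 3, 1))  # 3 * 1.5 * 0.5 = 2.25
```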
20. Pattern Managing, Searching and
Ranking
Pr(P) is the popularity of pattern P = frequency of
pattern P
Weight of feature p in pattern P is computed using
inverse indexing:
Np,P = occurrences of feature p in P
NP = total number of features in P
Np = number of patterns containing p
N = total number of patterns in the database
21. Pattern Managing, Searching and
Ranking
For each feature p, L(p) is the list of patterns from
which p can be extracted
p denotes a pattern feature, q a query feature
If sim(p, q) > δ, then p is added to F, the set of
mapped features for q
For each p ∈ F, the top n ranked patterns from L(p)
are added to C, the candidate patterns for relevance
computation
Then for each P in C, compute fit(P, Q)
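The candidate-search loop above can be sketched as follows. The `sim` placeholder stands in for the name-based similarity defined later; the index shape and function names are illustrative assumptions:

```python
# Sketch of candidate search over an inverted index L(p):
# map each query feature q to similar pattern features (sim > delta),
# then pool the top-n patterns from each matched feature's list.

def find_candidates(query_features, inverted_index, sim, delta=0.5, n=2):
    candidates = set()   # C: candidate patterns for relevance computation
    for q in query_features:
        # F: pattern features sufficiently similar to q
        F = [p for p in inverted_index if sim(p, q) > delta]
        for p in F:
            # L(p): patterns from which p can be extracted (assumed ranked)
            candidates.update(inverted_index[p][:n])
    return candidates

index = {"List.add": ["P1", "P2", "P3"], "Map.put": ["P4"]}
sim = lambda p, q: 1.0 if p == q else 0.0   # toy similarity
print(sorted(find_candidates(["List.add"], index, sim)))  # ['P1', 'P2']
```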
22. Feature Similarity
sim(p, q) is a name-based similarity between two
features, given that a feature is a collection of
labels of the form X.Y.Z, where
X = package name
Y = class name
Z = method name
23. Name-based Similarity (nsim)
wsim(X, X') is the word-based similarity
X and X' are broken down into two sequences of
words, L(X) and L(X')
Similarity is computed as Lo / Lm, where
Lo is the length of the LCS (longest common
subsequence) and Lm is the average length of the
two sequences
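The Lo / Lm computation can be made concrete with a standard dynamic-programming LCS. The camel-case word splitter is an assumption about how identifiers are broken into words:

```python
import re

# wsim(X, X') = Lo / Lm: Lo = LCS length of the two word sequences,
# Lm = their average length. The camel-case splitter is assumed.

def split_words(identifier):
    # "readAllLines" -> ["read", "all", "lines"]
    return [w.lower() for w in re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", identifier)]

def lcs_len(a, b):
    # Classic dynamic-programming longest-common-subsequence length.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    return dp[-1][-1]

def wsim(X, X2):
    a, b = split_words(X), split_words(X2)
    Lo = lcs_len(a, b)
    Lm = (len(a) + len(b)) / 2
    return Lo / Lm

print(wsim("readLine", "readAllLines"))  # LCS = ["read"], Lm = 2.5 -> 0.4
```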
26. Pattern Oriented Code Completion
The matched pattern is selected and the
corresponding nodes in the Groum are matched
The missing nodes are filled in with code
27. Empirical Evaluation
Metrics: precision, recall, F-score
java.io and java.util APIs used as libraries
28 real-world open-source systems:
4 for training, 24 for testing
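The reported F-score is consistent with the stated precision and recall, since F is their harmonic mean:

```python
# F-score as the harmonic mean of precision and recall.

def f_score(precision, recall):
    return 2 * precision * recall / (precision + recall)

# With the 95% precision and 92% recall reported in the thesis statement:
print(round(f_score(0.95, 0.92), 2))  # 0.93, matching the ~93% F-score
```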
29. My Observation
Planning to use semantic web technology
Data and control dependency relationships can be
improved using semantic relationships such as
conceptual similarity
Matching of patterns is complex and error-prone;
a semantic score could be beneficial