2. Overview
Understanding the terms.
Objectives.
In detail
o Keyword Retrieval
o Variable Retrieval
o API Specification Mining
o Function Retrieval
o Code Generation
Experiment
Conclusion
References
3. API Specification-Based Function Search Engine
Using Natural Language Query
API – Application Programming Interface.
An API is a set of commands, functions, and protocols that programmers can use when
building software for a specific platform or operating system.
APIs are usually implemented as header files or class libraries.
Ex:
o Java APIs
o ODBC for Microsoft Windows
4. API Specification-Based Function Search Engine
Using Natural Language Query
An API specification describes the classes and methods inside the API.
Each method (or function) and its use are briefly described in the API specification.
5. API Specification-Based Function Search Engine
Using Natural Language Query
A function search engine is, as the name suggests, a search engine over all the
methods in the API.
6. API Specification-Based Function Search Engine
Using Natural Language Query
A natural language query is a query phrased as a complete sentence or question to
begin a search.
Ex:
o “What is the capital of India?”
o “How to make pizza?”
7. API Specification-Based Function Search Engine
Using Natural Language Query
The title means a search engine for finding functions/methods in an Application
Programming Interface (API) using simple natural language queries.
Additionally, the paper suggests a means of automatically generating function calls
based on the search.
8. Programmers nearly always use existing functions while developing their applications.
These functions have grown more numerous and more diverse.
The problem: programmers must determine which functions they want and how to
call those functions.
The solution:
o This paper presents two novel approaches to address these problems.
o The first is an approach to find the right functions based on the API specification.
o The second is an approach to automatically generate code for function calls.
9. There are two main objectives in this paper:
o retrieving functions, and
o generating code for function calls.
Two different forms of query correspond to these objectives:
o The first is the “function search query”, which requests to look for functions.
o The second is the “function call query”, which requests to generate code for a function
call.
10. Fig: Function Search Model (inputs: function search query, function call query, API document; processes: keyword retrieval, mining, function retrieval, variable retrieval, code generation; outputs: function description, function call)
Keyword retrieval is the process of extracting keywords from a function search query.
Mining is the process of extracting contents from the API specification to support
function retrieval.
Function retrieval is the process of finding suitable functions by matching “the
extracted keywords from a function search query” to “descriptions of functions in
the API specification”.
Variable retrieval is the process of extracting variables from a function call query.
Code generation is the process of generating code for a function call based on the
retrieved function and the variables extracted from the function call query.
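A minimal, hypothetical skeleton of this model in Java is sketched below, with one method per process in the figure; the names and signatures are illustrative only and are not taken from the paper.

import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical skeleton of the function search model: one method per process.
public interface FunctionSearchModel {

    // Keyword retrieval: extract keywords from a function search query.
    List<String> retrieveKeywords(String functionSearchQuery);

    // Mining: extract function descriptions (function -> keyword set)
    // from the API specification document.
    Map<String, Set<String>> mineApiSpecification(String apiDocument);

    // Function retrieval: match the query keywords against the mined
    // descriptions and return the matching functions.
    List<String> retrieveFunctions(List<String> keywords,
                                   Map<String, Set<String>> minedDescriptions);

    // Variable retrieval: extract variables (name -> type) from a function call query.
    Map<String, String> retrieveVariables(String functionCallQuery);

    // Code generation: build a function call from a retrieved function
    // and the extracted variables.
    String generateFunctionCall(String function, Map<String, String> variables);
}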
11. There are several methods to identify keywords in a natural language sentence.
Some methods identify a keyword as a single word, while others identify keyword
phrases.
This paper introduces four natural language processing technologies to extract
keywords:
POS tagging, POS filtering, stemming, and synonym generation.
12. Fig: Keyword Retrieval Process (NL sentence → POS tagging → POS filtering → stemming → synonym generation → keywords)
POS tagging (part-of-speech tagging) is the technology to mark up each word in a
natural language sentence (NL sentence) with its part of speech.
POS filtering is the technology to remove stopwords such as prepositions, pronouns,
conjunctions, and interjections.
Stemming is the technology to reduce inflected (or sometimes derived) words to their
root form (e.g., ‘return’ is the root form of the words “returns, returning, returned”).
Synonym generation is the technology to identify synonyms of the retrieved keywords.
13. For the natural language query “Gets an element in the collection”, the following are the
results obtained at each stage.
o POS Tagging: Gets/VB an/DT element/NN in/IN the/DT collection/NN.
o POS Filtering: Gets element collection.
o Stemming: Get element collection.
o Synonym Generation: Get - have/return;
element - object/component;
collection - list/set.
NOTE:
VB - Verb
DT - Determiner
NN - Noun
IN - Preposition
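A minimal sketch of this keyword retrieval pipeline in Java is shown below. It assumes Stanford CoreNLP is on the classpath and uses its lemmatizer in place of a dedicated stemmer; synonym generation (e.g., via WordNet) is omitted. The stop-tag set and class names are illustrative, not from the paper.

import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.pipeline.CoreDocument;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import java.util.*;

// Sketch of keyword retrieval: POS tagging, POS filtering, and stemming
// (approximated here by CoreNLP lemmatization). Synonym generation is omitted.
public class KeywordRetrieval {
    // Tags to filter out: determiners, prepositions, pronouns, conjunctions, interjections.
    private static final Set<String> STOP_TAGS =
            new HashSet<>(Arrays.asList("DT", "IN", "PRP", "PRP$", "CC", "UH"));

    public static List<String> extractKeywords(String query) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        CoreDocument doc = new CoreDocument(query);
        pipeline.annotate(doc);

        List<String> keywords = new ArrayList<>();
        for (CoreLabel token : doc.tokens()) {
            if (!STOP_TAGS.contains(token.tag())) {       // POS filtering
                keywords.add(token.lemma().toLowerCase()); // root form
            }
        }
        return keywords;
    }

    public static void main(String[] args) {
        // Expected output: [get, element, collection]
        System.out.println(extractKeywords("Gets an element in the collection"));
    }
}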
14. There are two kinds of objects in a function call query:
- words and variables.
Many words in the query are related to each variable.
Each word in the query is relevant to at most one variable.
The words that are relevant to a variable are called the features of this variable.
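A minimal sketch of separating the two kinds of objects is shown below. It assumes variables are written inline as <name:Type>, as in the example query used later for code generation; the real variable retrieval in the paper works on a parse tree with the rules on the next slide, so this regular expression is only an illustration.

import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: split a function call query into variables and words, assuming
// variables are marked inline as <name:Type>. Illustration only.
public class FunctionCallQuerySplit {
    private static final Pattern VAR = Pattern.compile("<(\\w+):(\\w+)>");

    public static void main(String[] args) {
        String query = "inserts an element <e:Object> in a collection <a:ArrayList>";

        Map<String, String> variables = new LinkedHashMap<>(); // name -> type
        Matcher m = VAR.matcher(query);
        while (m.find()) {
            variables.put(m.group(1), m.group(2));
        }

        // Removing the variables yields the "function search query" that is
        // reused later during code generation.
        String words = query.replaceAll(VAR.pattern(), "").replaceAll("\\s+", " ").trim();

        System.out.println("variables: " + variables); // {e=Object, a=ArrayList}
        System.out.println("words    : " + words);     // inserts an element in a collection
    }
}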
15. Every relation between words and variables is represented by a “variable retrieval rule”
derived from a corresponding syntactic rule.
Ex: Some variable retrieval rules
o Root(sf V) -> VB(wf W) NP(sf V)
o NP(sf {v1, v2}) -> NP(vf v1) PP(vf v2)
o NP(sf V ∪ {v}) -> NP(sf V) PP(vf v)
o NP(sf V1 ∪ V2) -> NP(sf V1) PP(sf V2)
o PP(vf v[W1 W2]) -> IN(wf W1) NP(vf v[W2])
o PP(sf V) -> IN(wf W) NP(sf V)
o NP(wf W1 W2) -> NN(wf W1) NN(wf W2)
o NP(vf v[W1 W2]) -> NN(wf W1) NN(vf v[W2])
o NP(vf v[W1 W2 W3]) -> DT(wf W1) VBN(wf W2) NN(vf v[W3])
16. In Figure 3, a query in natural
language (“Insert element e in a set
at index k”) is parsed into a tree
structure using the Stanford Parser
tool.
The final result is:
o e[element];
o a[a set];
o k[at index];
Fig. 3: Parsing tree for
the function call query
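A sketch of producing such a parse tree with Stanford CoreNLP (which bundles the Stanford Parser cited by the paper) might look as follows; extracting the variable features from the tree with the retrieval rules is not shown.

import edu.stanford.nlp.pipeline.CoreDocument;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.trees.Tree;
import java.util.Properties;

// Sketch: parse the function call query into a constituency tree.
public class ParseFunctionCallQuery {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,parse");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        CoreDocument doc = new CoreDocument("Insert element e in a set at index k");
        pipeline.annotate(doc);

        // Print the bracketed parse tree of the first (and only) sentence.
        Tree tree = doc.sentences().get(0).constituencyParse();
        tree.pennPrint();
    }
}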
17. This subsection focuses on mining the API specification of Java, called the Java API
specification.
In the Java API specification, there are many contents related to functions which may be
mined to support the function retrieval process and the code generation process.
They are:
o function specification
o functionality description
o parameter features
18. Function specification: structured data that describes the usage of a function.
The information that can be extracted from this content includes the function name,
function scope, return type, list of parameters, and so on.
Functionality description: unstructured data in the form of natural language that
describes the functionality of the function.
To extract information from this content, the keyword retrieval method (presented on a
previous slide) is used.
Parameter features: unstructured data in the form of natural language that describes
features of the parameters in the function specification.
The necessary information in this content is extracted using natural language
processing technologies.
19. Example:
The function add() is described in the Java API specification of ArrayList as follows.
Function specification: public void add(int index, Object element).
Functionality description: “Inserts the specified element at the specified position in this
list”.
Parameter features: “index - index at which the specified element is to be inserted” and
“element - element to be inserted”.
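A minimal data structure for the mined information might look as follows; the field names are illustrative (not from the paper), and the values are taken from the add() example above.

import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: one record of information mined for a function in the Java API specification.
public class MinedFunction {
    String scope;                           // e.g. "public"
    String returnType;                      // e.g. "void"
    String className;                       // e.g. "java.util.ArrayList"
    String name;                            // e.g. "add"
    List<String> parameters;                // e.g. ["int index", "Object element"]
    String functionalityDescription;        // free text, mined with keyword retrieval
    Map<String, String> parameterFeatures;  // parameter name -> feature text

    static MinedFunction arrayListAdd() {
        MinedFunction f = new MinedFunction();
        f.scope = "public";
        f.returnType = "void";
        f.className = "java.util.ArrayList";
        f.name = "add";
        f.parameters = Arrays.asList("int index", "Object element");
        f.functionalityDescription =
                "Inserts the specified element at the specified position in this list";
        f.parameterFeatures = new LinkedHashMap<>();
        f.parameterFeatures.put("index", "index at which the specified element is to be inserted");
        f.parameterFeatures.put("element", "element to be inserted");
        return f;
    }
}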
20. There are three stages in the process of retrieving functions.
Stage 1: extracting the functions related to the user’s query based on some constraints.
Stage 2: refining the result obtained in the previous stage by removing irrelevant
functions.
Stage 3: ranking the collected relevant functions in descending order of their relevance
to the query.
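A simplified sketch of the three stages is given below. It assumes each function is represented by the keyword set mined from its functionality description and scores candidates only by keyword overlap; the constraints and ranking used in the paper are more elaborate.

import java.util.*;
import java.util.stream.Collectors;

// Sketch of the three function retrieval stages, using keyword overlap only.
public class FunctionRetrieval {
    public static List<String> retrieve(Set<String> queryKeywords,
                                        Map<String, Set<String>> functionKeywords) {
        // Stage 1: extract functions sharing at least one keyword with the query.
        Map<String, Long> scores = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : functionKeywords.entrySet()) {
            long overlap = queryKeywords.stream().filter(e.getValue()::contains).count();
            if (overlap > 0) scores.put(e.getKey(), overlap);
        }
        // Stage 2: refine by dropping weakly related functions (here: only one shared keyword).
        scores.values().removeIf(s -> s < 2);
        // Stage 3: rank the remaining functions in descending order of overlap.
        return scores.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Query keywords after stemming and synonym generation of
        // "Gets an element in the collection".
        Set<String> query = new HashSet<>(Arrays.asList(
                "get", "return", "element", "object", "collection", "list", "set"));

        Map<String, Set<String>> index = new HashMap<>();
        index.put("ArrayList.get(int)",
                new HashSet<>(Arrays.asList("return", "element", "position", "list")));
        index.put("ArrayList.add(int,Object)",
                new HashSet<>(Arrays.asList("insert", "element", "position", "list")));
        index.put("HashMap.clear()",
                new HashSet<>(Arrays.asList("remove", "mapping", "map")));

        // Expected: [ArrayList.get(int), ArrayList.add(int,Object)]
        System.out.println(retrieve(query, index));
    }
}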
21. The standard syntax of a function call statement is object.callName(arg1, arg2, …, argk).
To generate code for a function call, the user’s query is mapped to the corresponding
function call based on its function definition.
Two steps:
i. identifying a certain variable vj as the object o, and
ii. mapping the remaining variables to the corresponding arguments arg1, arg2, …, argk.
22. In the first step, the function retrieval method is used to identify a set of functions
related to the user’s query.
However, to use this method, the “function call query” needs to be transformed into a
“function search query” by removing all variables from the query.
The variable whose type contains at least one function related to the new query is the
desired object o.
In the second step, all other variables are set as parameters.
For example, given the query “inserts an element <e:Object> in a collection <a:ArrayList>”,
the variable a with type ArrayList contains the function add related to the new query
“inserts an element in a collection”, so a.add(?) is a suitable function call.
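A minimal sketch of these two steps for the example query is given below. The function lookup is stubbed to ArrayList.add; in the paper it is resolved by running function retrieval on the query with the variables removed.

import java.util.*;

// Sketch of code generation for "inserts an element <e:Object> in a collection <a:ArrayList>".
public class CodeGeneration {
    public static String generateCall(Map<String, String> variables) { // name -> type
        // Step 1: the variable whose type declares a function related to the
        // reduced query becomes the object; here we assume the retrieved
        // function is ArrayList.add, so the ArrayList variable is the object.
        String object = null;
        for (Map.Entry<String, String> v : variables.entrySet()) {
            if (v.getValue().equals("ArrayList")) {
                object = v.getKey();
                break;
            }
        }

        // Step 2: map the remaining variables to the arguments of the call.
        List<String> args = new ArrayList<>(variables.keySet());
        args.remove(object);

        return object + ".add(" + String.join(", ", args) + ")";
    }

    public static void main(String[] args) {
        Map<String, String> vars = new LinkedHashMap<>();
        vars.put("e", "Object");
        vars.put("a", "ArrayList");
        System.out.println(generateCall(vars)); // a.add(e)
    }
}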
23. A. User Study
In the first user study, ten common search tasks were designed and assigned to the
participants.
Then, each participant used FSE and some other search engines to complete these tasks.
Three search engines were given to the users for the study: FSE, Krugle, and Koders.
24. In the second user study, the participants suggested over 100 requests to generate
code for function calls.
Then, they checked the degree of fitness between the obtained results and their requests
to calculate the accuracy of FSE.
There are four degrees of fitness: Highly Relevant, Somewhat Relevant, Somewhat
Irrelevant, and Highly Irrelevant.
Highly Relevant - the top result in the set of returned solutions fits the user’s request
exactly.
Somewhat Relevant - the desired result is in the result set but not in the first position.
Somewhat Irrelevant - the result contains the function with the correct name but wrong
parameters.
Highly Irrelevant - the lowest level of fitness.
26. B. Results
In this figure:
92% - correct functions that were relevant to the user’s request.
71% - the correct function appeared in the first position of the solution set.
7% - no proper function was found.
27. An efficient function search approach using the API specification is proposed in this
paper.
A novel function call generation method is also presented, which generates source code to
invoke functions based on the variable features extracted from the user’s query.
Finally, the authors implemented FSE, a function search engine that helps programmers
quickly examine different functions that might be appropriate for a problem, obtain
more information about particular functions, and automatically generate code for
function calls to learn how to use a function.
28. [1] A. J. Ko, B. A. Myers, and H. H. Aung, “Six learning barriers in end-user programming systems,” in
Proc. of the 2004 IEEE Symposium on
Visual Languages - Human Centric Computing, ser. VLHCC ’04. IEEE Computer Society, 2004, pp. 199–
206.
[2] D. Mandelin, L. Xu, R. Bodík, and D. Kimelman, “Jungloid mining: helping to navigate the api jungle,”
in Proc. of the 2005 ACM SIGPLAN conference on Programming language design and
implementation, ser. PLDI ’05. ACM, 2005, pp. 48–61.
[3] J. Stylos and B. A. Myers, “Mica: A web-search tool for finding api components and examples,” in
Proc. of the Visual Languages and Human-Centric Computing, ser. VLHCC ’06. IEEE Computer
Society, 2006, pp. 195–202.
[4] R. Hoffmann, J. Fogarty, and D. S. Weld, “Assieme: finding and leveraging implicit references in a
web search interface for programmers,” in Proc. of the 20th annual ACM symposium on User interface
software and technology, ser. UIST ’07. ACM, 2007, pp. 13–22.
29. [5] S. Thummalapenta and T. Xie, “Parseweb: a programmer assistant for reusing open source code on
the web,” in Proc. of the twenty-second IEEE/ACM international conference on Automated software
engineering, ser. ASE ’07. ACM, 2007, pp. 204–213.
[6] M. Grechanik, C. Fu, Q. Xie, C. McMillan, D. Poshyvanyk, and C. Cumby, “A search engine for finding
highly relevant applications,” in Proc. of the 32nd ACM/IEEE International Conference on Software
Engineering - Volume 1, ser. ICSE ’10. ACM, 2010, pp. 475–484.
[7] S. Chatterjee, S. Juvekar, and K. Sen, “Sniff: A search engine for java using free-form queries,” in
Proc. of the 12th International Conference on Fundamental Approaches to Software Engineering: Held
as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2009, ser. FASE
’09. Springer-Verlag, 2009, pp. 385–400.
[8] M. Grechanik, K. M. Conroy, and K. A. Probst, “Finding relevant applications for prototyping,” in
Proc. of the Fourth International Workshop on Mining Software Repositories, ser. MSR ’07. IEEE
Computer Society, 2007, pp. 12–.
30. [9] R. Pandita, X. Xiao, H. Zhong, T. Xie, S. Oney, and A. Paradkar, “Inferring method specifications from
natural language api descriptions,” in Proceedings of the 2012 International Conference on Software
Engineering, ser. ICSE 2012. IEEE Press, 2012, pp. 815–825.
[10] A. Fantechi, S. Gnesi, G. Lami, and A. Maccari, “Application of linguistic techniques for use case
analysis,” in Proc. of the 10th Anniversary IEEE Joint International Conference on Requirements
Engineering, ser. RE ’02. IEEE Computer Society, 2002, pp. 157–164.
[11] D. Klein and C. D. Manning, “Accurate unlexicalized parsing,” in Proc. of the 41st Annual Meeting
on Association for Computational Linguistics - Volume 1, ser. ACL ’03. Association for Computational
Linguistics, 2003, pp. 423–430.
[12] L. Kof, “Scenarios: Identifying missing objects and actions by means of computational linguistics.”
in RE. IEEE, 2007, pp. 121–130.
[13] K. Rothenhäusler and H. Schütze, “Part of speech filtered word spaces,” in Proc. of the 2007
Workshop on Contextual Information in Semantic Space Models: Beyond Words and
Documents, 2007, pp. 25–32.
[14] D. Shepherd, Z. P. Fry, E. Hill, L. Pollock, and K. Vijay-Shanker, “Using natural language program
analysis to locate and understand action-oriented concerns,” in Proc. of the 6th international
conference on Aspect-oriented software development, ser. AOSD ’07. ACM, 2007, pp. 212–224.
31. [15] R. Hemayati, W. Meng, and C. Yu, “Semantic-based grouping of search engine results using
wordnet,” in Proc. of the joint 9th Asia-Pacific web and 8th international conference on web-age
information management conference on Advances in data and web management, ser.
APWeb/WAIM’07. Springer-Verlag, 2007, pp. 678–686.
[16] C. Manning and D. Klein. The Stanford Parser. [Online]. Available:
http://nlp.stanford.edu/software/lex-parser.shtml
[17] Java API. [Online]. Available: docs.oracle.com/javase/1.4.2/docs/api
[18] L. Vaughan, “New measurements for search engine evaluation proposed and tested,” Inf. Process.
Manage., vol. 40, no. 4, pp. 677–691, May 2004.
[19] Krugle Inc. [Online]. Available: http://opensearch.krugle.com/
[20] Koders Inc. [Online]. Available: http://www.koders.com/
[21] S. E. Sim, M. Umarji, S. Ratanotayanon, and C. V. Lopes, “How well do search engines support code
retrieval on the web?” ACM Trans. Softw. Eng. Methodol., vol. 21, no. 1, pp. 4:1–4:25, Dec. 2011.