Deep API Learning
Xiaodong Gu, Sunghun Kim
The Hong Kong University of Science and Technology
Hongyu Zhang, Dongmei Zhang
Microsoft Research
Programming is hard
• Unfamiliar problems
• Unfamiliar APIs [Robillard,2009]
Query: “how to parse XML files?”
API usage sequence:
DocumentBuilderFactory.newInstance
↓
DocumentBuilderFactory.newDocumentBuilder
↓
DocumentBuilder.parse
Obtaining API usage sequences based on a
query
The problem? The bag-of-words assumption! IR-based approaches lack a deep understanding of the semantics of the query.
Limitations of IR-based Approaches
“how to convert int to string”
“how to convert string to int”
“how to convert string to number”
public static Integer str2Int(String str) {
    Integer result = null;
    try {
        result = Integer.parseInt(str);
    } catch (NumberFormatException e) {
        // Fallback: drop every '-' and re-apply a single leading sign.
        String negativeMode = "";
        if (str.indexOf('-') != -1) negativeMode = "-";
        str = str.replaceAll("-", "");
        result = Integer.parseInt(negativeMode + str);
    }
    return result;
}
Limit #1: Cannot identify semantically related words
Limit #2: Cannot distinguish word ordering
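The word-ordering limit can be illustrated with a minimal sketch. The class name `BagOfWordsDemo` and its helper are hypothetical and only stand in for the order-insensitive representation; real IR engines score documents in more elaborate ways.

```java
import java.util.Map;
import java.util.TreeMap;

final class BagOfWordsDemo {
    /** Counts each lower-cased word in the query; word positions are discarded. */
    static Map<String, Integer> bagOfWords(String query) {
        Map<String, Integer> bag = new TreeMap<>();
        for (String word : query.toLowerCase().trim().split("\\s+")) {
            bag.merge(word, 1, Integer::sum);
        }
        return bag;
    }

    public static void main(String[] args) {
        Map<String, Integer> a = bagOfWords("how to convert int to string");
        Map<String, Integer> b = bagOfWords("how to convert string to int");
        // Both queries contain the same words, so their bags are identical.
        System.out.println(a.equals(b)); // prints "true"
    }
}
```

Because the two bags are equal, any retrieval model that only sees the bag cannot tell the two queries apart.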
DeepAPI – Learning The Semantics
[Figure: the query “how to parse XML files” is encoded by a DNN embedding model into a vector (1.1, 2.3, 0.4, ⋮, 5.0), from which a DNN language model generates the API sequence:]
DocumentBuilderFactory.newInstance
DocumentBuilderFactory.newDocumentBuilder
DocumentBuilder.parse
- Better query understanding (recognizes semantically related words and word ordering)
Background – RNN
• Recurrent Neural Network
[Figure: an RNN unrolled over the input words "parse xml file" (w1, w2, w3), with an input layer, a hidden layer (h1, h2, h3) and an output layer.]
  - Hidden layers are used recurrently for computation
  - This creates an internal state of the network that records dynamic temporal behavior
    $h_t = f(h_{t-1}, x_t)$
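The recurrence can be sketched in a few lines. This is a toy illustration, not the model used in the talk: it assumes f is a plain tanh layer, and the class `SimpleRnn`, the weight matrices `w` and `u`, and the one-hot inputs are all hypothetical.

```java
final class SimpleRnn {
    /** One recurrence step, assuming f is tanh: h_t = tanh(W * h_{t-1} + U * x_t). */
    static double[] step(double[][] w, double[][] u, double[] hPrev, double[] x) {
        double[] h = new double[hPrev.length];
        for (int i = 0; i < h.length; i++) {
            double sum = 0.0;
            for (int j = 0; j < hPrev.length; j++) sum += w[i][j] * hPrev[j];
            for (int j = 0; j < x.length; j++) sum += u[i][j] * x[j];
            h[i] = Math.tanh(sum);
        }
        return h;
    }

    /** Folds a sequence of input vectors into the final hidden state, starting from h_0 = 0. */
    static double[] encode(double[][] w, double[][] u, double[][] inputs) {
        double[] h = new double[w.length];
        for (double[] x : inputs) h = step(w, u, h, x);
        return h;
    }

    public static void main(String[] args) {
        double[][] w = {{0.5, -0.1}, {0.2, 0.3}};            // hidden-to-hidden weights (toy values)
        double[][] u = {{1.0, 0.0, 0.0}, {0.0, 1.0, 0.0}};   // input-to-hidden weights (toy values)
        double[][] parseXml = {{1, 0, 0}, {0, 1, 0}};        // one-hot "parse", then "xml"
        double[][] xmlParse = {{0, 1, 0}, {1, 0, 0}};        // same words, reversed order
        System.out.println(java.util.Arrays.toString(encode(w, u, parseXml)));
        System.out.println(java.util.Arrays.toString(encode(w, u, xmlParse)));
    }
}
```

Because each step depends on the previous hidden state, the two word orders end in different final states, which a bag-of-words representation cannot express.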
Background – RNN Encoder-Decoder
• A deep learning model for sequence-to-sequence learning
  - Encoder: an RNN that encodes a sequence of words (the query) into a vector:
    $h_t = f(h_{t-1}, x_t), \quad c = h_{T_x}$
  - Decoder: an RNN (language model) that sequentially generates a sequence of words (APIs) based on the (query) vector:
    $\Pr(y) = \prod_{t=1}^{T} p(y_t \mid y_1, \ldots, y_{t-1}, c)$
    $\Pr(y_t \mid y_1, \ldots, y_{t-1}, c) = g(h_t, y_{t-1}, c)$
    $h_t = f(h_{t-1}, y_{t-1}, c)$
  - Training: minimize the cost function:
    $L(\theta) = \frac{1}{N} \sum_{i=1}^{N} \sum_{t=1}^{T} cost_{it}, \quad cost_{it} = -\log p_\theta(y_{it} \mid x_i)$
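A small sketch of how the decoder probabilities combine into the sequence probability and the training cost. The per-step probabilities are assumed to be given (in a real model they come from the decoder's output layer); the class `SequenceCost` and its method names are hypothetical.

```java
final class SequenceCost {
    /** Pr(y): product over t of the per-step probabilities p(y_t | y_1..y_{t-1}, c). */
    static double sequenceProbability(double[] stepProbs) {
        double p = 1.0;
        for (double q : stepProbs) p *= q;
        return p;
    }

    /** Sum over t of cost_t = -log p(y_t | ...) for one training pair. */
    static double sequenceCost(double[] stepProbs) {
        double cost = 0.0;
        for (double q : stepProbs) cost -= Math.log(q);
        return cost;
    }

    /** L(theta): the per-pair costs averaged over the N training pairs. */
    static double averageCost(double[][] batch) {
        double total = 0.0;
        for (double[] stepProbs : batch) total += sequenceCost(stepProbs);
        return total / batch.length;
    }

    public static void main(String[] args) {
        double[] probs = {0.5, 0.25}; // toy per-step probabilities
        System.out.println(sequenceProbability(probs));
        System.out.println(sequenceCost(probs));
    }
}
```

Summing the per-step costs is the same as taking the negative log of the whole sequence probability, so minimizing the cost maximizes the likelihood of the reference API sequences.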
RNN Encoder-Decoder Model for API
Sequence Generation
[Figure: the encoder RNN reads the query "read text file" (x1, x2, x3) into hidden states h1, h2, h3 and summarizes it in the context vector c. Starting from <START>, the decoder RNN (hidden states h1 to h5) outputs y1 to y5: BufferedReader.new, FileReader.new, BufferedReader.read, BufferedReader.close, <EOS>, feeding each output back as the next input.]
Enhancing RNN Encoder-Decoder Model
with API importance
• Different APIs have different importance for a programming task
File.new FileWriter.new Logger.log FileWriter.write
• Weaken the unimportant APIs
  - IDF-based weighting
    $w_{idf}(y_t) = \log \frac{N}{n_{y_t}}$
  - Regularized cost function
    $cost_{it} = -\log p_\theta(y_{it} \mid x_i) - \lambda\, w_{idf}(y_{it})$
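A minimal sketch of the IDF weighting and the regularized cost. The slide does not define N and n; the sketch assumes N is the number of API sequences in the corpus and n is the number of sequences that contain the API. The class `IdfWeighting`, its methods, and the toy corpus are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

final class IdfWeighting {
    /** w_idf(y) = log(N / n_y), assuming N = number of sequences and n_y = sequences containing y. */
    static Map<String, Double> idf(List<List<String>> sequences) {
        Map<String, Integer> docFreq = new HashMap<>();
        for (List<String> seq : sequences) {
            // Count each API at most once per sequence.
            for (String api : new HashSet<>(seq)) docFreq.merge(api, 1, Integer::sum);
        }
        Map<String, Double> weights = new HashMap<>();
        int n = sequences.size();
        for (Map.Entry<String, Integer> e : docFreq.entrySet()) {
            weights.put(e.getKey(), Math.log((double) n / e.getValue()));
        }
        return weights;
    }

    /** cost = -log p - lambda * w_idf, following the regularized cost on the slide. */
    static double regularizedCost(double prob, double idfWeight, double lambda) {
        return -Math.log(prob) - lambda * idfWeight;
    }

    public static void main(String[] args) {
        List<List<String>> corpus = List.of(
                List.of("File.new", "FileWriter.new", "Logger.log", "FileWriter.write"),
                List.of("Logger.log", "URL.new"),
                List.of("Logger.log", "File.new", "File.exists"));
        Map<String, Double> w = idf(corpus);
        System.out.println("Logger.log: " + w.get("Logger.log"));
        System.out.println("URL.new:    " + w.get("URL.new"));
    }
}
```

In this toy corpus Logger.log occurs in every sequence and therefore gets weight 0, while rarer APIs get larger weights and a correspondingly lower cost at the same predicted probability.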
System Overview
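[Figure: system overview. Offline training: API sequences and natural language annotations are extracted from the code corpus to form training instances, which are used to train the RNN Encoder-Decoder. Given an API-related user query, the trained model returns suggested API sequences.]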
Step 1 – Preparing a Parallel Corpus
<API Sequence, Annotation> pairs
API Sequences (Java)                    # Annotations (English)
InputStream.read OutputStream.write     # copy a file from an inputstream to an outputstream
URL.new URL.openConnection              # open a url
File.new File.exists                    # test file exists
File.renameTo File.delete               # rename a file
StringBuffer.new StringBuffer.reverse   # reverse a string
⋮                                       # ⋮
• Collect 442,928 Java projects from GitHub (2008-2014)
• Parse source files into ASTs using Eclipse JDT
• Extract an API sequence and an annotation for each method body (when Javadoc
comment exists)
Extracting API Usage Sequences
Post-order traversal of each AST:
- Constructor invocation: new C() => C.new
- Method call: o.m() => C.m
- Parameters: o1.m1(o2.m2(),o3.m3()) => C2.m2-C3.m3-C1.m1
- A sequence of statements: stmt1;stmt2;…;stmtt; => s1-s2-…-st
- Conditional statement: if(stmt1){stmt2;} else{stmt3;} => s1-s2-s3
- Loop statements: while(stmt1){stmt2;} => s1-s2
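The post-order rules above can be sketched on a toy tree. This is not Eclipse JDT code: `Node` is a hypothetical, simplified AST node that carries an already-resolved API label (or none for structural nodes) and its children in evaluation order.

```java
import java.util.ArrayList;
import java.util.List;

final class ApiSequenceExtractor {
    /** Hypothetical simplified AST node: an optional API label plus children in evaluation order. */
    static final class Node {
        final String api;            // e.g. "BufferedReader.new"; null for structural nodes
        final List<Node> children;

        Node(String api, Node... children) {
            this.api = api;
            this.children = List.of(children);
        }
    }

    /** Post-order traversal: children (arguments, nested statements) first, then the node itself. */
    static void collect(Node node, List<String> out) {
        for (Node child : node.children) collect(child, out);
        if (node.api != null) out.add(node.api);
    }

    static String extract(Node root) {
        List<String> out = new ArrayList<>();
        collect(root, out);
        return String.join("-", out);
    }

    public static void main(String[] args) {
        // o1.m1(o2.m2(), o3.m3()): arguments are visited before the outer call.
        Node call = new Node("C1.m1", new Node("C2.m2"), new Node("C3.m3"));
        System.out.println(extract(call)); // prints "C2.m2-C3.m3-C1.m1"

        // while (stmt1) { stmt2; } modelled as a structural node with condition and body.
        Node loop = new Node(null, new Node("s1"), new Node(null, new Node("s2")));
        System.out.println(extract(loop)); // prints "s1-s2"
    }
}
```

Keeping structural nodes label-free means conditionals, loops, and statement blocks all reduce to the same rule: concatenate the children's sequences in order.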
…
BufferedReader reader = new BufferedReader(…);
while ((line = reader.readLine()) != null)
    …
reader.close();
[Figure: AST of the snippet, with nodes Body, Statement, Variable Declaration (Type BufferedReader, Variable reader), Constructor Invocation, While Statement, Method Invocation (readLine on Variable reader) and Block Statement.]
Extracted API sequence: BufferedReader.new BufferedReader.readLine BufferedReader.close …
Extracting Natural Language Annotations
/**
 * Copies bytes from a large (over 2GB) InputStream to an OutputStream.
 * This method uses the provided buffer, so there is no need to use a
 * BufferedInputStream.
 * @param input the InputStream to read from
 * . . .
 * @since 2.2
 */
public static long copyLarge(final InputStream input,
        final OutputStream output, final byte[] buffer) throws IOException {
    long count = 0;
    int n;
    while (EOF != (n = input.read(buffer))) {
        output.write(buffer, 0, n);
        count += n;
    }
    return count;
}
API sequence: InputStream.read OutputStream.write
Annotation: copies bytes from a large inputstream to an outputstream.
[Figure: a MethodDefinition node with a Javadoc Comment child and a Body child.]
The annotation is the first sentence of the documentation comment.
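A rough sketch of taking the first sentence of a documentation comment. The class `AnnotationExtractor` and its sentence-boundary rule (a period followed by whitespace or the end of the text) are assumptions made for illustration; the slide's example additionally lower-cases the text and drops the parenthetical, which this sketch does not attempt.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AnnotationExtractor {
    private static final Pattern SENTENCE_END = Pattern.compile("\\.(\\s|$)");

    /** Returns the first sentence of the description part of a Javadoc comment. */
    static String firstSentence(String javadoc) {
        StringBuilder text = new StringBuilder();
        for (String line : javadoc.split("\\R")) {
            // Strip the comment delimiters and the leading '*' of each line.
            String s = line.trim()
                    .replaceFirst("^/\\*+", "")
                    .replaceFirst("\\*+/$", "")
                    .replaceFirst("^\\*+", "")
                    .trim();
            if (s.startsWith("@")) break;          // block tags end the description
            if (!s.isEmpty()) text.append(s).append(' ');
        }
        String description = text.toString().trim();
        Matcher m = SENTENCE_END.matcher(description);
        return m.find() ? description.substring(0, m.end()).trim() : description;
    }

    public static void main(String[] args) {
        String doc = "/**\n"
                + " * Copies bytes from a large (over 2GB) InputStream to an OutputStream.\n"
                + " * This method uses the provided buffer, so there is no need to use a\n"
                + " * BufferedInputStream.\n"
                + " * @param input the InputStream to read from\n"
                + " */";
        System.out.println(firstSentence(doc));
    }
}
```

A period-based rule misfires on abbreviations such as "e.g.", so a production pipeline would need a more careful sentence splitter.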
Step 2 – Training the RNN Encoder-Decoder Model
• Data
  - 7,519,907 <API Sequence, Annotation> pairs
• Neural Network
  - Bi-GRU, 2 hidden layers, 1,000 hidden units
  - Word embedding: 120 dimensions
• Training Algorithm
  - SGD + Adadelta
  - Batch size: 200
• Hardware
  - Nvidia K20 GPU
Evaluation
• RQ1: How accurate is DeepAPI for generating API usage
sequences?
• RQ2: How accurate is DeepAPI under different parameter
settings?
• RQ3: Do the enhanced RNN Encoder-Decoder models improve
the accuracy of DeepAPI?
Automatic Evaluation:
- Data set:
  7,519,907 snippets with Javadoc comments
  Training set: 7,509,907 pairs; test set: 10,000 pairs
- Accuracy measure
  BLEU: the hits of n-grams of a candidate sequence against the ground-truth sequence.
  $BLEU = BP \cdot \exp\left(\sum_{n=1}^{N} w_n \log p_n\right)$
  $p_n = \frac{\#\,\text{n-grams appearing in the reference} + 1}{\#\,\text{n-grams of the candidate} + 1}$
  $BP = \begin{cases} 1 & \text{if } c > r \\ e^{(1 - r/c)} & \text{if } c \le r \end{cases}$
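A compact sketch of the BLEU computation as written on the slide. Several details are assumptions: c and r are taken to be the candidate and reference lengths, the weights are uniform (w_n = 1/N), and matching n-grams are counted with clipping as in standard BLEU. The class `Bleu` and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class Bleu {
    /** Counts the n-grams of a sequence of API tokens. */
    static Map<String, Integer> ngramCounts(List<String> seq, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + n <= seq.size(); i++) {
            counts.merge(String.join(" ", seq.subList(i, i + n)), 1, Integer::sum);
        }
        return counts;
    }

    /** Smoothed precision p_n = (matched + 1) / (candidate n-grams + 1). */
    static double precision(List<String> cand, List<String> ref, int n) {
        Map<String, Integer> candCounts = ngramCounts(cand, n);
        Map<String, Integer> refCounts = ngramCounts(ref, n);
        int matched = 0;
        int total = 0;
        for (Map.Entry<String, Integer> e : candCounts.entrySet()) {
            total += e.getValue();
            // Clipping (assumed): an n-gram matches at most as often as it occurs in the reference.
            matched += Math.min(e.getValue(), refCounts.getOrDefault(e.getKey(), 0));
        }
        return (matched + 1.0) / (total + 1.0);
    }

    /** BLEU = BP * exp(sum of w_n * log p_n) with uniform weights; expects a non-empty candidate. */
    static double bleu(List<String> cand, List<String> ref, int maxN) {
        double logSum = 0.0;
        for (int n = 1; n <= maxN; n++) {
            logSum += Math.log(precision(cand, ref, n)) / maxN;
        }
        double c = cand.size();
        double r = ref.size();
        double bp = c > r ? 1.0 : Math.exp(1.0 - r / c); // brevity penalty
        return bp * Math.exp(logSum);
    }

    public static void main(String[] args) {
        List<String> ref = List.of("File.new", "File.exists");
        System.out.println(bleu(ref, ref, 2));
        System.out.println(bleu(List.of("File.new"), ref, 1));
    }
}
```

The brevity penalty keeps a very short candidate from scoring well just because its few n-grams all appear in the reference.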
RQ1: How accurate is DeepAPI for generating API usage sequences?
- Comparison methods
  • Code search with pattern mining
    Code search: Lucene
    Summarizing API patterns: UP-Miner [Wang, MSR’13]
  • SWIM [Raghothaman, ICSE’16]
    Query-to-API mapping: statistical word alignment
    Searching API sequences using the bag of APIs: information retrieval
Human Evaluation:
- 30 API-related natural language queries:
  • 17 from Bing search logs
  • 13 longer queries and queries with semantically related words
- Accuracy metrics:
  • FRank: the rank of the first relevant result in the result list
  • Relevancy ratio: $\text{relevancy ratio} = \frac{\#\,\text{relevant results}}{\#\,\text{all selected results}}$
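Both human-evaluation metrics are easy to compute once each returned result has been judged. A minimal sketch, assuming the judgments arrive as a ranked boolean array; the class `HumanEvalMetrics` and the choice of -1 when no result is relevant are hypothetical.

```java
final class HumanEvalMetrics {
    /** 1-based rank of the first relevant result, or -1 if none is relevant (assumed convention). */
    static int fRank(boolean[] relevant) {
        for (int i = 0; i < relevant.length; i++) {
            if (relevant[i]) return i + 1;
        }
        return -1;
    }

    /** Number of relevant results divided by the number of all selected results. */
    static double relevancyRatio(boolean[] relevant) {
        int hits = 0;
        for (boolean r : relevant) if (r) hits++;
        return (double) hits / relevant.length;
    }

    public static void main(String[] args) {
        boolean[] judged = {false, false, true, true, false}; // toy judgments for 5 results
        System.out.println(fRank(judged));          // prints 3
        System.out.println(relevancyRatio(judged)); // prints 0.4
    }
}
```

A lower FRank and a higher relevancy ratio both indicate better results.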
• Examples
DeepAPI
- Distinguishes word ordering
  convert int to string => Integer.toString
  convert string to int => Integer.parseInt
- Identifies semantically related words
  save an image to a file => File.new ImageIO.write
  write an image to a file => File.new ImageIO.write
- Understands longer queries
  copy a file and save it to your destination path
  play the audio clip at the specified absolute URL
SWIM
- Partially matched sequences
  generate md5 hashcode => Object.hashCode
- Project-specific results
  test file exists => File.new, File.exists, File.getName, File.new, File.delete, FileInputStream.new,…
- Hard to understand longer queries
  copy a file and save it to your destination path
RQ2 – Accuracy Under Different Parameter
Settings
BLEU scores under different numbers of hidden units and word dimensions
RQ3 – Performance of the Enhanced RNN
Encoder-Decoder Models
• BLEU scores of different Models(%)
• BLEU scores under different λ
Conclusion
Apply the RNN Encoder-Decoder to generate API usage sequences for a given natural language query
- Recognizes semantically related words
- Recognizes word ordering
Future Work
- Explore applications of this model to other problems.
- Investigate the synthesis of sample code from the generated API sequences.
Thanks!

Deep API Learning (FSE 2016)
 
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxBSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxfenichawla
 

Kürzlich hochgeladen (20)

Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
 
Glass Ceramics: Processing and Properties
Glass Ceramics: Processing and PropertiesGlass Ceramics: Processing and Properties
Glass Ceramics: Processing and Properties
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
 
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur EscortsRussian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
Russian Call Girls in Nagpur Grishma Call 7001035870 Meet With Nagpur Escorts
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdf
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxBSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
 

Deep API Learning (FSE 2016)

  • 1. Deep API Learning. Xiaodong Gu, Sunghun Kim (The Hong Kong University of Science and Technology); Hongyu Zhang, Dongmei Zhang (Microsoft Research)
  • 2. Programming is hard: unfamiliar problems and unfamiliar APIs [Robillard, 2009]. Example query: “how to parse XML files?” Answer: DocumentBuilderFactory.newInstance → DocumentBuilderFactory.newDocumentBuilder → DocumentBuilder.parse
  • 3. Obtaining API usage sequences based on a query
  • 4. Obtaining API usage sequences based on a query. The problem? The bag-of-words assumption! Lack of a deep understanding of the semantics of the query
  • 5. Limitations of IR-based Approaches. Queries: “how to convert int to string”, “how to convert string to int”, “how to convert string to number”. Code snippet:
      static public Integer str2Int(String str) {
        Integer result = null;
        try {
          result = Integer.parseInt(str);
        } catch (Exception e) {
          String negativeMode = "";
          if (str.indexOf('-') != -1) negativeMode = "-";
          str = str.replaceAll("-", "");
          result = Integer.parseInt(negativeMode + str);
        }
        return result;
      }
    Limit #1: Cannot identify semantically related words. Limit #2: Cannot distinguish word ordering
  • 6. DeepAPI – Learning The Semantics. Query “how to parse XML files” → DNN – Embedding Model → vector (1.1, 2.3, 0.4, …, 5.0) → DNN – Language Model → DocumentBuilderFactory:newInstance, DocumentBuilderFactory:newDocumentBuilder, DocumentBuilder:parse. Better query understanding (recognizes semantically related words and word ordering)
  • 7. Background – RNN (Recurrent Neural Network). [Diagram: input layer with words w1, w2, w3 (“parse”, “xml”, “file”); hidden layer with states h1, h2, h3; output layer.] Hidden layers are used recurrently for computation. This creates an internal state of the network to record dynamic temporal behavior: $h_t = f(h_{t-1}, x_t)$
  • 8. Background – RNN Encoder-Decoder: a deep learning model for sequence-to-sequence learning
      • Encoder: an RNN that encodes a sequence of words (query) into a vector: $h_t = f(h_{t-1}, x_t)$, $c = h_{T_x}$
      • Decoder: an RNN (language model) that sequentially generates a sequence of words (APIs) based on the (query) vector: $\Pr(y) = \prod_{t=1}^{T} p(y_t \mid y_1, \ldots, y_{t-1}, c)$, $\Pr(y_t \mid y_1, \ldots, y_{t-1}, c) = g(h_t, y_{t-1}, c)$, $h_t = f(h_{t-1}, y_{t-1}, c)$
      • Training: minimize the cost function $L(\theta) = \frac{1}{N} \sum_{i=1}^{N} \sum_{t=1}^{T} cost_{it}$, where $cost_{it} = -\log p_\theta(y_{it} \mid x_i)$
  • 9. RNN Encoder-Decoder Model for API Sequence Generation. [Diagram: the encoder RNN reads the input words x1, x2, x3 (“Read Text File”) through hidden states h1, h2, h3 into the context vector c. The decoder RNN, starting from <START>, passes through hidden states h1 to h5 and outputs y1 to y5: BufferedReader.new, FileReader.new, BufferedReader.read, BufferedReader.close, <EOS>. Each output y1 to y4 is fed back as the next decoder input.]
  • 10. Enhancing RNN Encoder-Decoder Model with API importance
      • Different APIs have different importance for a programming task, e.g. File.new, FileWriter.new, Logger.log, FileWriter.write
      • Weaken the unimportant APIs
      • IDF-based weighting: $w_{idf}(y_t) = \log\left(\frac{N}{n_{y_t}}\right)$
      • Regularized cost function: $cost_{it} = -\log p_\theta(y_{it} \mid x_i) - \lambda\, w_{idf}(y_{it})$
  • 11. System Overview. [Diagram: Offline training: Code Corpus → API sequences and natural language annotations → training instances → training of the RNN Encoder-Decoder. An API-related user query is given to the RNN Encoder-Decoder, which returns suggested API sequences.]
  • 12. Step1 – Preparing a Parallel Corpus of <API Sequence, Annotation> pairs
      API Sequences (Java)                   # Annotations (English)
      InputStream.read OutputStream.write    # copy a file from an inputstream to an outputstream
      URL.new URL.openConnection             # open a url
      File.new File.exists                   # test file exists
      File.renameTo File.delete              # rename a file
      StringBuffer.new StringBuffer.reverse  # reverse a string
      ⋮                                      # ⋮
      • Collect 442,928 Java projects from GitHub (2008-2014)
      • Parse source files into ASTs using Eclipse JDT
      • Extract an API sequence and an annotation for each method body (when a Javadoc comment exists)
  • 13. Extracting API Usage Sequences. Post-order traversal of each AST:
      • Constructor invocation: new C() => C.new
      • Method call: o.m() => C.m
      • Parameters: o1.m1(o2.m2(), o3.m3()) => C2.m2-C3.m3-C1.m1
      • A sequence of statements: stmt1; stmt2; …; stmtt; => s1-s2-…-st
      • Conditional statement: if(stmt1){stmt2;} else{stmt3;} => s1-s2-s3
      • Loop statement: while(stmt1){stmt2;} => s1-s2
    Example:
      …
      BufferedReader reader = new BufferedReader(…);
      while((line=reader.readLine())!=null)
        …
      reader.close();
    [AST diagram with nodes: Body, Variable Declaration, Constructor Invocation, While Statement, Method Invocation, Block, Statement, Type, Variable]
    Extracted sequence: BufferedReader.new BufferedReader.readLine BufferedReader.close …
  • 14. Extracting Natural Language Annotations. The annotation is the first sentence of a documentation comment. [Diagram: MethodDefinition → Javadoc Comment, Body]
      /**
       * Copies bytes from a large (over 2GB) InputStream to an OutputStream.
       * This method uses the provided buffer, so there is no need to use a
       * BufferedInputStream.
       * @param input the InputStream to read from
       * . . .
       * @since 2.2
       */
      public static long copyLarge(final InputStream input, final OutputStream output, final byte[] buffer) throws IOException {
        long count = 0;
        int n;
        while (EOF != (n = input.read(buffer))) {
          output.write(buffer, 0, n);
          count += n;
        }
        return count;
      }
    API sequence: InputStream.read OutputStream.write
    Annotation: copies bytes from a large inputstream to an outputstream.
  • 15. Step2 – Training RNN Encoder-Decoder Model
      • Data: 7,519,907 <API Sequence, Annotation> pairs
      • Neural network: Bi-GRU, 2 hidden layers, 1,000 hidden units; word embedding: 120
      • Training algorithm: SGD + Adadelta; batch size: 200
      • Hardware: Nvidia K20 GPU
  • 16. Evaluation • RQ1: How accurate is DeepAPI for generating API usage sequences? • RQ2: How accurate is DeepAPI under different parameter settings? • RQ3: Do the enhanced RNN Encoder-Decoder models improve the accuracy of DeepAPI?
  • 17. RQ1: How accurate is DeepAPI for generating API usage sequences? Automatic evaluation:
      • Data set: 7,519,907 snippets with Javadoc comments. Training set: 7,509,907 pairs. Test set: 10,000 pairs
      • Accuracy measure: BLEU, the hits of n-grams of a candidate sequence to the ground truth sequence: $BLEU = BP \cdot \exp\left(\sum_{n=1}^{N} w_n \log p_n\right)$, where $p_n = \frac{\#\,\text{n-grams appearing in the reference} + 1}{\#\,\text{n-grams of candidate} + 1}$ and $BP = \begin{cases} 1 & \text{if } c > r \\ e^{(1 - r/c)} & \text{if } c \le r \end{cases}$
  • 18. RQ1: How accurate is DeepAPI for generating API usage sequences? Comparison methods:
      • Code Search with Pattern Mining. Code search: Lucene. Summarizing API patterns: UP-Miner [Wang, MSR’13]
      • SWIM [Raghothaman, ICSE’16]. Query-to-API mapping: statistical word alignment. Search API sequence using the bag of APIs: information retrieval
  • 19. RQ1: How accurate is DeepAPI for generating API usage sequences? Human evaluation:
      • 30 API-related natural language queries: 17 from Bing search logs; 13 longer queries and queries with semantically related words
      • Accuracy metrics: FRank, the rank of the first relevant result in the result list; relevancy ratio, $\text{relevancy ratio} = \frac{\#\,\text{relevant results}}{\#\,\text{all selected results}}$
  • 20.
  • 21. RQ1: How accurate is DeepAPI for generating API usage sequences? Examples:
      • DeepAPI
        • Distinguishes word ordering: convert int to string => Integer.toString; convert string to int => Integer.parseInt
        • Identifies semantically related words: save an image to a file => File.new ImageIO.write; write an image to a file => File.new ImageIO.write
        • Understands longer queries: copy a file and save it to your destination path; play the audio clip at the specified absolute URL
      • SWIM
        • Partially matched sequences: generate md5 hashcode => Object.hashCode
        • Project-specific results: test file exists => File.new, File.exists, File.getName, File.new, File.delete, FileInputStream.new, …
        • Hard to understand longer queries: copy a file and save it to your destination path
  • 22. RQ2 – Accuracy Under Different Parameter Settings BLEU scores under different number of hidden units and word dimensions
  • 23. RQ3 – Performance of the Enhanced RNN Encoder-Decoder Models • BLEU scores of different Models(%) • BLEU scores under different λ
  • 24. Conclusion: Apply RNN Encoder-Decoder for generating API usage sequences for a given natural language query: recognizes semantically related words; recognizes word ordering. Future work: explore the applications of this model to other problems; investigate the synthesis of sample code from the generated API sequences.
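
The IDF-based weighting and regularized cost on slide 10 can be illustrated with a small Java sketch. The slide does not define N and n, so this sketch assumes N is the number of API sequences in the training corpus and n is the number of sequences containing the API. The class name `IdfCost`, its method names, the toy corpus and the λ value are all hypothetical, chosen only for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the IDF weighting and regularized cost from slide 10. */
final class IdfCost {

    /** Counts, for each API, how many sequences contain it at least once. */
    static Map<String, Integer> documentFrequency(List<List<String>> corpus) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> sequence : corpus) {
            // A set ensures an API repeated inside one sequence is counted once.
            for (String api : new HashSet<>(sequence)) {
                df.merge(api, 1, Integer::sum);
            }
        }
        return df;
    }

    /** w_idf(y) = log(N / n_y); assumes n_y > 0. */
    static double idfWeight(int totalSequences, int sequencesContainingApi) {
        return Math.log((double) totalSequences / sequencesContainingApi);
    }

    /** cost_it = -log p(y_it | x_i) - lambda * w_idf(y_it), as written on the slide. */
    static double cost(double probability, double idf, double lambda) {
        return -Math.log(probability) - lambda * idf;
    }

    public static void main(String[] args) {
        // Toy corpus (assumption): Logger.log occurs everywhere, so its weight is 0.
        List<List<String>> corpus = Arrays.asList(
                Arrays.asList("File.new", "FileWriter.new", "Logger.log", "FileWriter.write"),
                Arrays.asList("Logger.log", "InputStream.read"),
                Arrays.asList("Logger.log", "URL.new"),
                Arrays.asList("Logger.log", "File.new"));
        Map<String, Integer> df = documentFrequency(corpus);
        double lambda = 0.05; // illustrative value only
        for (String api : corpus.get(0)) {
            double idf = idfWeight(corpus.size(), df.get(api));
            System.out.println(api + " idf=" + idf + " cost@p=0.5: " + cost(0.5, idf, lambda));
        }
    }
}
```

In this toy corpus the ubiquitous `Logger.log` gets weight 0, while the rarer `FileWriter.write` gets the largest weight, which is the distinction between unimportant and important APIs that the slide describes.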
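
The post-order extraction rules on slide 13 can be sketched on a toy tree. This is not the Eclipse JDT-based extractor the slides mention: `Node` is a hypothetical, hand-built stand-in for an AST node whose children are listed in evaluation order, and the class name `ApiSequenceSketch` is likewise an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of post-order API sequence extraction (slide 13). */
final class ApiSequenceSketch {

    /** Toy AST node: an optional API label plus children in evaluation order. */
    static final class Node {
        final String api;          // e.g. "BufferedReader.new"; null for structural nodes
        final List<Node> children;

        Node(String api, Node... children) {
            this.api = api;
            this.children = Arrays.asList(children);
        }
    }

    /** Returns the API sequence of the tree rooted at {@code root}. */
    static List<String> extract(Node root) {
        List<String> sequence = new ArrayList<>();
        visit(root, sequence);
        return sequence;
    }

    /** Post-order: emit all children first, then the node's own API call. */
    private static void visit(Node node, List<String> sequence) {
        for (Node child : node.children) {
            visit(child, sequence);
        }
        if (node.api != null) {
            sequence.add(node.api);
        }
    }

    public static void main(String[] args) {
        // o1.m1(o2.m2(), o3.m3()): argument calls are children of the outer call.
        Node nested = new Node("C1.m1", new Node("C2.m2"), new Node("C3.m3"));
        System.out.println(String.join("-", extract(nested)));

        // The BufferedReader example from the slide, built by hand.
        Node body = new Node(null,
                new Node("BufferedReader.new"),
                new Node(null, new Node("BufferedReader.readLine")), // while statement
                new Node("BufferedReader.close"));
        System.out.println(String.join(" ", extract(body)));
    }
}
```

Statement sequences, conditionals and loops all reduce to structural nodes with a null label here, so their children are simply emitted in order, matching the s1-s2-… rules on the slide.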
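
The BLEU measure on slide 17 can be sketched as follows. Several choices here are assumptions rather than details from the slides: c and r are taken as the candidate and reference lengths, the weights are uniform (w_n = 1/N), n-gram matches are clipped by their count in the reference, and the class name `BleuSketch` is hypothetical. The add-one smoothing and the brevity penalty follow the formulas on the slide.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the smoothed BLEU score from slide 17. */
final class BleuSketch {

    /** Counts every n-gram of the given order in a sequence of API tokens. */
    static Map<String, Integer> ngramCounts(List<String> sequence, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + n <= sequence.size(); i++) {
            counts.merge(String.join(" ", sequence.subList(i, i + n)), 1, Integer::sum);
        }
        return counts;
    }

    /** BLEU = BP * exp(sum_n w_n log p_n) with uniform weights w_n = 1/maxN. */
    static double bleu(List<String> candidate, List<String> reference, int maxN) {
        double logSum = 0.0;
        for (int n = 1; n <= maxN; n++) {
            Map<String, Integer> cand = ngramCounts(candidate, n);
            Map<String, Integer> ref = ngramCounts(reference, n);
            int matched = 0;
            int total = 0;
            for (Map.Entry<String, Integer> e : cand.entrySet()) {
                total += e.getValue();
                // Clipped match count (assumption): at most the count in the reference.
                matched += Math.min(e.getValue(), ref.getOrDefault(e.getKey(), 0));
            }
            double pn = (matched + 1.0) / (total + 1.0); // add-one smoothing from the slide
            logSum += Math.log(pn) / maxN;
        }
        int c = candidate.size();
        int r = reference.size();
        // Brevity penalty: 1 if c > r, otherwise e^(1 - r/c).
        double bp = c > r ? 1.0 : Math.exp(1.0 - (double) r / c);
        return bp * Math.exp(logSum);
    }

    public static void main(String[] args) {
        List<String> reference = Arrays.asList(
                "File.new", "FileWriter.new", "FileWriter.write", "FileWriter.close");
        List<String> candidate = Arrays.asList("File.new", "FileWriter.new");
        System.out.println(bleu(reference, reference, 4));
        System.out.println(bleu(candidate, reference, 2));
    }
}
```

A candidate identical to the reference scores 1, while a correct but shorter candidate is reduced only by the brevity penalty.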
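
The two human-evaluation metrics on slide 19 are simple enough to state in code. In this sketch a result list is represented as an array of relevance judgments in rank order. That representation, the class name `RankingMetrics`, and returning -1 when no result is relevant are assumptions made for illustration.

```java
/** Hypothetical sketch of FRank and relevancy ratio (slide 19). */
final class RankingMetrics {

    /** 1-based rank of the first relevant result, or -1 if none is relevant. */
    static int fRank(boolean[] relevant) {
        for (int i = 0; i < relevant.length; i++) {
            if (relevant[i]) {
                return i + 1;
            }
        }
        return -1;
    }

    /** relevancy ratio = # relevant results / # all selected results. */
    static double relevancyRatio(boolean[] relevant) {
        if (relevant.length == 0) {
            return 0.0; // no selected results (assumed convention)
        }
        int hits = 0;
        for (boolean r : relevant) {
            if (r) {
                hits++;
            }
        }
        return (double) hits / relevant.length;
    }

    public static void main(String[] args) {
        // Made-up judgments for five returned API sequences, in rank order.
        boolean[] judgments = {false, true, true, false, false};
        System.out.println("FRank = " + fRank(judgments));
        System.out.println("relevancy ratio = " + relevancyRatio(judgments));
    }
}
```

A lower FRank and a higher relevancy ratio both indicate better results.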

Editor's notes

  1. Q: What about the quality of the API sequences, such as API bugs? A: The language model learns probabilities from large-scale data, so these are just noise. DeepAPI only produces commonly used API sequences. Second, this is a threat to validity. Our future work will explore better training data. Q: Will augmenting the loss function affect the original conditional probabilities? A: It is just a regularization. Q: The motivating example “convert int to string” is not convincing, as Google can distinguish the queries? Q: How does the model distinguish words with different forms, for example “write an image” vs. “writing an image”? A: A word embedding mechanism identifies similar words. Q: In the real world, developers want an API graph instead of a sequence. A: There is no need for a graph. Sequences indicate different usages, and developers can synthesize their own graph structure of APIs.
  2. General search engines are not designed for source code and could return many unrelated results, such as discussions and programming experience. Besides, developers need to browse many web pages to check the results.
  3. Q: Will the Bi-GRU affect the API sequence? Why reverse API sequences? A: We use the Bi-GRU only for the query. For the API sequence, we use a traditional GRU.