DEFICIENT DOCUMENTATION DETECTION
A Methodology to Locate Deficient Project Documentation using Topic Analysis
Joshua Charles Campbell, Department of Computing Science
Chenlei Zhang, Department of Computing Science
Zhen Xu, Department of Electrical and Computer Engineering
Abram Hindle, Department of Computing Science
James Miller, Department of Electrical and Computer Engineering
The 10th Working Conference on Mining Software Repositories
MOTIVATION
• Developers consult two sources:
• Official: project documentation
• Crowd-sourced: Q&A website
MSR 2013
RESEARCH QUESTION
• Answer the question “Can we identify deficient
areas of project documentation by relating it to
Stack Overflow questions?”
• Provide a method to relate crowd-sourced questions to project documentation.
METHODOLOGY
Two-phase processing
• Data extraction
• Topic analysis
Stack Overflow questions + Project documentation → Data extraction
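A minimal sketch of the data-extraction step for the Stack Overflow side, assuming the public Stack Overflow data dump layout: a `Posts.xml` file of `<row>` elements in which `PostTypeId="1"` marks a question, `Body` holds escaped HTML, and `Tags` looks like `<php><imagick>`. The function names and sample rows are hypothetical, and the original tooling may have filtered and cleaned posts differently.

```python
import re
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from typing import List


class _TextCollector(HTMLParser):
    """Collects the text of an HTML fragment, discarding the markup."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def html_to_text(fragment: str) -> str:
    """Strip tags from a post body and collapse runs of whitespace."""
    collector = _TextCollector()
    collector.feed(fragment)
    collector.close()
    return " ".join("".join(collector.parts).split())


def extract_questions(posts_xml: str, tag: str) -> List[str]:
    """Return 'title + plain-text body' for each question carrying `tag`.

    Assumes the dump format described above (hypothetical helper).
    """
    questions = []
    for row in ET.fromstring(posts_xml).iter("row"):
        if row.get("PostTypeId") != "1":
            continue  # skip answers and other post types
        tags = re.findall(r"<([^>]+)>", row.get("Tags", ""))
        if tag in tags:
            title = row.get("Title", "")
            questions.append(f"{title} {html_to_text(row.get('Body', ''))}")
    return questions


# Hypothetical rows in the assumed dump format (attribute values are XML-escaped).
SAMPLE = """<posts>
  <row Id="1" PostTypeId="1" Title="Vignette with Imagick"
       Tags="&lt;php&gt;&lt;imagick&gt;"
       Body="&lt;p&gt;How do I use &lt;code&gt;vignetteImage&lt;/code&gt;?&lt;/p&gt;" />
  <row Id="2" PostTypeId="2" Body="&lt;p&gt;An answer&lt;/p&gt;" />
</posts>"""

print(extract_questions(SAMPLE, "php"))
# → ['Vignette with Imagick How do I use vignetteImage?']
```

The same plain-text cleanup could be applied to documentation pages so that both corpora reach the topic-analysis phase in a comparable form.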
Extracted text → Topic analysis (LDA) → Stack Overflow question/topic matrix + Project documentation/topic matrix → Max → Subtract → Ranked deficient topics
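The slide names the Max and Subtract steps without defining them. One plausible reading, sketched here as an assumption rather than the authors' exact method: Max assigns each question or documentation section to its highest-weighted LDA topic, and Subtract takes each topic's share of questions minus its share of documentation, so topics asked about often but rarely documented rank first. Both matrices are assumed to come from one shared LDA model (same topic columns); the function names and values are hypothetical.

```python
from collections import Counter
from typing import List, Tuple

# Rows are documents (questions or documentation sections);
# columns are per-topic proportions from one shared LDA model.
Matrix = List[List[float]]


def dominant_topic_shares(matrix: Matrix, num_topics: int) -> List[float]:
    """Assumed 'Max' step: give each row to its highest-weighted topic,
    then return the fraction of rows given to each topic."""
    counts = Counter(max(range(num_topics), key=row.__getitem__) for row in matrix)
    return [counts[topic] / len(matrix) for topic in range(num_topics)]


def rank_deficient_topics(questions: Matrix, docs: Matrix,
                          num_topics: int) -> List[Tuple[int, float]]:
    """Assumed 'Subtract' step: question share minus documentation share,
    sorted so the most under-documented topics come first."""
    q_shares = dominant_topic_shares(questions, num_topics)
    d_shares = dominant_topic_shares(docs, num_topics)
    scores = [q - d for q, d in zip(q_shares, d_shares)]
    return sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)


# Toy matrices with three topics (hypothetical values).
questions = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.7, 0.1], [0.1, 0.6, 0.3]]
docs = [[0.6, 0.1, 0.3], [0.2, 0.1, 0.7], [0.1, 0.2, 0.7], [0.5, 0.4, 0.1]]

print(rank_deficient_topics(questions, docs, num_topics=3))
# → [(1, 0.75), (0, -0.25), (2, -0.5)]
```

In this toy data, topic 1 dominates three of the four questions but no documentation section, so it ranks first. Comparing shares rather than raw counts keeps the score meaningful when the two corpora differ greatly in size.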
DEFICIENT TOPICS FOUND
PHP EXAMPLE
• Deficient documentation exists
• Stack Overflow question #7321289:
• “How want to apply a vignette effect to an image using PHP with
ImageMagik. I found this function but I’m not sure how to use it.”
• PHP documentation:
• Imagick::vignetteImage
• http://www.php.net/manual/en/imagick.vignetteimage.php
PYTHON EXAMPLE
• Deficient documentation exists
• Stack Overflow question #5893163:
• “What is the meaning of _ after for in this code?”
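For context, the idiom the question asks about is Python's conventional throwaway loop variable: `_` is an ordinary name, used by convention when the loop counter itself is never read. The snippet is an illustrative example, not the code from question #5893163, and the function name is hypothetical.

```python
def repeat_greeting(times: int) -> list:
    """Build a list by repeating a string; the loop counter is never read."""
    greetings = []
    for _ in range(times):  # "_" marks the counter as intentionally unused
        greetings.append("hello")
    return greetings


print(repeat_greeting(3))  # → ['hello', 'hello', 'hello']
```

Because `_` is a normal variable, it still holds the last value produced by `range` once the loop finishes; the underscore only signals intent to the reader.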
OUT-OF-SCOPE DOCUMENTATION
• Questions that relate to multiple projects
• For example, questions about HTML or MySQL
• Documentation should include clear indications and links for when a user needs to consult an external project's documentation
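A minimal sketch of how such questions might be flagged, assuming that tags identify projects: a question tagged with the target project and also with another known project is a candidate for a pointer to external documentation. The heuristic, the function name, and the project set are hypothetical, not part of the presented method.

```python
from typing import Iterable, Set


def external_projects(tags: Iterable[str], project: str,
                      known_projects: Set[str]) -> Set[str]:
    """Return the other known projects a question is tagged with.

    A non-empty result suggests the question is partly out of scope for
    `project`'s documentation and may need a link to external docs.
    """
    return {tag for tag in tags if tag in known_projects and tag != project}


# Hypothetical set of project tags.
KNOWN_PROJECTS = {"php", "python", "html", "mysql"}

print(external_projects(["php", "mysql", "pdo"], "php", KNOWN_PROJECTS))  # → {'mysql'}
print(external_projects(["php", "arrays"], "php", KNOWN_PROJECTS))        # → set()
```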
CONCLUSION
• Developed a method for locating deficiently documented aspects of project documentation;
• Successfully located deficient project
documentation using Stack Overflow questions.