The document describes an NLP-based tool called NEON that automatically identifies natural language rules for analyzing informal software artifacts. NEON trains on a set of artifacts to infer rules for identifying recurrent patterns. It was evaluated on its ability to classify app reviews using rules identified from a training set. Over a third of NEON's recommended rules were judged useful by human evaluators for classifying reviews into feature requests and problem discoveries. Future work involves using NEON to develop recommender systems for various software engineering tasks.
NEON: NLP-Based Tool for Analyzing Software Artifacts
1. An NLP-based Tool for Software Artifacts Analysis
Andrea Di Sorbo, Corrado A. Visaggio, Massimiliano Di Penta, Gerardo Canfora, Sebastiano Panichella
2. OUTLINE
Context:
Semi-structured and unstructured software artifacts.
Proposed Solution:
A tool to infer rules aimed at identifying recurrent natural
language patterns in informal software artifacts.
Evaluation:
Assessment of the inferred rules by human subjects
with NLP expertise.
Conclusions and Future Work
14. NEON’S ARCHITECTURE
Two main phases:
1) Training phase:
• a set of software artifacts of a specific
type (e.g., app reviews or issue reports)
is inspected to identify rules for
capturing recurrent NL patterns.
2) Testing phase:
• the inferred rules are leveraged to
recognize the information of interest in
a different corpus of software artifacts.
① The end-user provides a set of software
artifacts as training documents.
② The Parser preprocesses the training
documents and generates the semantic
graph of each sentence present in these
documents.
③ The PathsFinder (i) analyzes the
semantic graphs, (ii) finds all recurrent
paths in these graphs, and (iii) outputs
the rules (in XML format) able to
identify such paths.
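The training steps above can be illustrated with a minimal Python sketch. It is not NEON's implementation: the input format (each already-parsed sentence given as a list of lemmatized (governor, relation, dependent) triples), the definition of a "path" as a chain of at most `max_len` dependency edges, the `min_support` threshold, and the XML schema (`<rules>`, `<rule>`, `<edge>`) are all hypothetical assumptions chosen for illustration.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# Hypothetical input format: one sentence = list of (governor, relation, dependent)
# dependency edges, with lemmatized, lower-cased words.
Edge = tuple[str, str, str]
Path = tuple[str, ...]  # word, rel, word, rel, word, ...


def paths_in_graph(edges: list[Edge], max_len: int = 2) -> set[Path]:
    """Enumerate all directed paths of 1..max_len edges in one sentence graph."""
    children: dict[str, list[tuple[str, str]]] = {}
    for gov, rel, dep in edges:
        children.setdefault(gov, []).append((rel, dep))

    found: set[Path] = set()

    def extend(path: Path) -> None:
        if len(path) > 1:  # the path contains at least one edge
            found.add(path)
        if (len(path) - 1) // 2 >= max_len:  # edge budget exhausted
            return
        for rel, dep in children.get(path[-1], []):
            extend(path + (rel, dep))

    for node in children:  # every governor is a possible path start
        extend((node,))
    return found


def find_recurrent_paths(sentences: list[list[Edge]], min_support: int = 2,
                         max_len: int = 2) -> list[Path]:
    """Keep paths that occur in at least `min_support` distinct training sentences."""
    support: Counter = Counter()
    for edges in sentences:
        support.update(paths_in_graph(edges, max_len))  # a set: counted once per sentence
    return sorted(p for p, n in support.items() if n >= min_support)


def rules_to_xml(paths: list[Path], category: str) -> str:
    """Serialize each recurrent path as one rule in a hypothetical XML schema."""
    root = ET.Element("rules")
    for i, path in enumerate(paths):
        rule = ET.SubElement(root, "rule", id=str(i), category=category)
        for j in range(0, len(path) - 1, 2):
            ET.SubElement(rule, "edge", governor=path[j],
                          relation=path[j + 1], dependent=path[j + 2])
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    training = [
        [("add", "dobj", "option"), ("option", "amod", "dark")],     # "add a dark option"
        [("add", "dobj", "option"), ("option", "amod", "offline")],  # "add an offline option"
        [("add", "dobj", "support"), ("support", "nmod", "tablet")], # "add support for tablets"
    ]
    print(rules_to_xml(find_recurrent_paths(training), "feature_request"))
```

Counting each path once per sentence (document frequency) keeps a single repetitive review from turning its own phrasing into a rule; a real system would likely also abstract words (for example, to part-of-speech tags) so that rules generalize beyond exact lemmas.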
④ The end-user provides a set of software
artifacts different from the training
documents.
⑤ The Parser performs sentence splitting and tokenization and
generates the Stanford Dependencies (SD) representation of
each sentence.
⑥ The Classifier leverages the SD representation and the set of
XML rules to detect text structures that match the defined rules.
⑦ All the recognized sentences are
highlighted using different colors for
different categories.
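The testing steps can likewise be sketched in Python, purely as an illustration under stated assumptions rather than NEON's code. The sketch assumes a hypothetical XML schema in which each `<rule>` carries a `category` attribute and lists its dependency edges as `<edge>` children, treats a sentence as matching a rule when every edge of the rule occurs among the sentence's lemmatized dependency edges, and stands in for the colored highlighting with ANSI terminal colors. The rule contents, category names, and color codes are hypothetical, and parsing is assumed to have already produced the edges.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule file: each rule lists the dependency edges it requires.
RULES_XML = """
<rules>
  <rule id="0" category="feature_request">
    <edge governor="add" relation="dobj" dependent="option"/>
  </rule>
  <rule id="1" category="problem_discovery">
    <edge governor="crash" relation="nsubj" dependent="app"/>
  </rule>
</rules>
"""

Edge = tuple[str, str, str]

# Hypothetical ANSI color codes per category: green and red.
COLORS = {"feature_request": "32", "problem_discovery": "31"}


def load_rules(xml_text: str) -> list[tuple[str, frozenset[Edge]]]:
    """Read (category, required edges) pairs from the XML rule file."""
    rules = []
    for rule in ET.fromstring(xml_text.strip()).iter("rule"):
        edges = frozenset((e.get("governor"), e.get("relation"), e.get("dependent"))
                          for e in rule.iter("edge"))
        rules.append((rule.get("category"), edges))
    return rules


def classify(sentence_edges: list[Edge], rules) -> list[str]:
    """Return the sorted categories whose rule edges all occur in the sentence."""
    present = set(sentence_edges)
    return sorted({category for category, edges in rules if edges <= present})


def highlight(text: str, categories: list[str]) -> str:
    """Wrap a recognized sentence in the color of its first category."""
    if not categories:
        return text
    code = COLORS.get(categories[0], "33")  # yellow for unknown categories
    return f"\033[{code}m{text}\033[0m"


if __name__ == "__main__":
    rules = load_rules(RULES_XML)
    # Pre-parsed test sentences: (text, lemmatized dependency edges).
    reviews = [
        ("Please add a dark mode option.", [("add", "dobj", "option"), ("option", "compound", "mode")]),
        ("The app crashes on startup.", [("crash", "nsubj", "app"), ("crash", "nmod", "startup")]),
        ("Great app!", [("app", "amod", "great")]),
    ]
    for text, edges in reviews:
        print(highlight(text, classify(edges, rules)))
```

Matching edges as a set makes the check independent of word order in the review, which is the main reason to match on the dependency representation rather than on surface text.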
20. EVALUATION
Goal:
• assess NEON’s capability to identify rules that are useful for
automatically classifying app reviews into the feature request and
problem discovery classes.
Study Objects:
• 100 app reviews extracted from a labeled dataset
presented in previous work [1]:
• 50 app reviews of the feature request category;
• 50 app reviews of the problem discovery
category.
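The slides do not say how the 100 reviews were selected from the labeled dataset. Assuming a simple random draw within each category, a balanced sample of 50 feature requests and 50 problem discoveries could be obtained as in the following Python sketch; the `(text, label)` tuple format, the label strings, the function name `balanced_sample`, and the fixed seed are hypothetical.

```python
import random
from collections import Counter, defaultdict


def balanced_sample(labeled, per_class, classes, seed=0):
    """Draw `per_class` items uniformly at random (without replacement) from each class."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    by_label = defaultdict(list)
    for text, label in labeled:
        by_label[label].append((text, label))
    sample = []
    for label in classes:
        pool = by_label[label]
        if len(pool) < per_class:
            raise ValueError(f"only {len(pool)} items labeled {label!r}")
        sample.extend(rng.sample(pool, per_class))
    return sample


if __name__ == "__main__":
    # Synthetic placeholder data, not the dataset of [1].
    data = ([(f"fr {i}", "feature_request") for i in range(80)]
            + [(f"pd {i}", "problem_discovery") for i in range(70)]
            + [(f"other {i}", "other") for i in range(30)])
    picked = balanced_sample(data, 50, ["feature_request", "problem_discovery"])
    print(Counter(label for _, label in picked))
```

Sampling per class, rather than from the whole pool, guarantees the 50/50 split regardless of how skewed the labeled dataset is.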
Study Subjects:
• 3 subjects with NL parsing expertise:
• one professional software engineer (Subject 1);
• one SE master’s student (Subject 2);
• an author of the paper (Subject 3).
[1] Panichella et al., “How Can I Improve My App? Classifying User Reviews for
Software Maintenance and Evolution,” ICSME 2015, pp. 281-290.