NEON: NLP-Based Tool for Analyzing Software Artifacts

An NLP-based Tool for Software Artifacts Analysis
Andrea Di Sorbo, Corrado A. Visaggio, Massimiliano Di Penta,
Gerardo Canfora, Sebastiano Panichella

OUTLINE
Context:
Semi-structured and unstructured software artifacts.
Proposed Solution:
A tool that infers rules for identifying recurrent natural
language patterns in informal software artifacts.
Evaluation:
Assessment of the inferred rules involving human subjects
with NLP expertise.
Conclusions and Future Work

DISTRIBUTED DEVELOPMENT TEAMS
Development
teams are globally
distributed.

INFORMAL SOFTWARE ARTIFACTS
Information is disseminated
(and scattered) over semi-
structured and unstructured
software artifacts.

RECOMMENDER SYSTEMS
Informal software artifacts
have been leveraged to build
automated approaches aimed
at supporting developers in
several SE tasks.

INFORMATION RETRIEVAL MODELS
Information retrieval
approaches mainly treat
unstructured text as a
bag of words.
[Figure: a sentence shown as an unordered bag of words: for, great, !,
LDA, tool, is, summarizing, a, text]
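
As a minimal sketch of what the bag-of-words view discards, the Python snippet below counts word occurrences and ignores their order. The two review sentences and the bag_of_words helper are made up for illustration and are not taken from NEON or from any IR library.

```python
from collections import Counter
import re

def bag_of_words(text):
    """Lowercase the text, extract word tokens, and count them (order is discarded)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

# Hypothetical review sentences: same words, different problem reported.
review_a = "The app works but the sync fails"
review_b = "The sync works but the app fails"

# Both sentences collapse to the same bag, so a pure bag-of-words model
# cannot tell which component the user says is failing.
same_bag = bag_of_words(review_a) == bag_of_words(review_b)
print(same_bag)
```

Grammatical structure (which noun is the subject of "fails") is exactly the information that the parsing-based approaches on the following slides try to keep.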

INTENTION MINING
To automatically identify
textual patterns in informal
software documents that are
relevant to different
evolution tasks, we
proposed intention mining.
Di Sorbo et al., Development Emails Content Analyzer:
Intention Mining in Developer Discussions (T). ASE 2015: 12-23

INTENTION MINING
Intention mining has been
applied for classification,
summarization, or quality
assessment purposes.

CHALLENGE
Approaches based on Natural
Language (NL) parsing
techniques require the manual
definition of sets of NL rules;
this task is effort-intensive and
error-prone.

AUTOMATING INTENTION MINING
Recent research attempted to
automate and generalize
intention mining by using deep
learning-based methods.

INTERPRETABILITY?
Deep learning-based methods
make it difficult to understand
the specific linguistic patterns
that have been identified. Such
patterns are crucial to support
several SE tasks.

PROPOSED SOLUTION
NEON automatically identifies
the NL rules needed to
detect significant natural
language patterns occurring in
informal software documents.

NEON
NEON implements an
approach presented in
previous work and saves
more than 70% of the
time otherwise spent on
manually identifying and
defining NL rules.

NEON’S ARCHITECTURE
Two main phases:
1) Training phase:
• a set of software artifacts of a specific
type (e.g., app reviews or issue reports)
is inspected to identify rules for
capturing recurrent NL patterns.
2) Testing phase:
• the inferred rules are leveraged to
recognize the information of interest in
a different corpus of software artifacts.

① The end-user provides a set of software
artifacts as training documents.
② The Parser preprocesses the training
documents and generates the semantic
graph of each sentence present in these
documents.
③ The PathsFinder (i) analyzes the
semantic graphs, (ii) finds all recurrent
paths in these graphs, and (iii) outputs
the rules (in XML format) able to
identify such paths.
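
The training steps above can be sketched roughly as follows. This is an illustration under strong assumptions, not NEON's implementation: each semantic graph is reduced to hand-written (governor, relation, dependent) triples, a "path" is simplified to a single arc, "recurrent" means "occurs in at least min_support sentences", and the XML element and attribute names are hypothetical, since the slides do not show the rule schema.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# Hand-written, simplified dependency triples (governor lemma, relation,
# dependent lemma) for three made-up review sentences. A real run would
# obtain them from a dependency parser.
GRAPHS = [
    [("add", "aux", "should"), ("add", "nsubj", "you"), ("add", "dobj", "option")],
    [("add", "aux", "should"), ("add", "nsubj", "app"), ("add", "dobj", "mode")],
    [("crash", "nsubj", "app"), ("crash", "advmod", "always")],
]

def recurrent_arcs(graphs, min_support=2):
    """Return (governor, relation) arcs that occur in at least min_support graphs."""
    support = Counter()
    for edges in graphs:
        # Count each arc once per sentence, ignoring the dependent word.
        support.update({(gov, rel) for gov, rel, _ in edges})
    return sorted(arc for arc, n in support.items() if n >= min_support)

def to_xml(arcs):
    """Serialize the recurrent arcs as XML rules (hypothetical schema)."""
    root = ET.Element("rules")
    for gov, rel in arcs:
        ET.SubElement(root, "rule", governor=gov, relation=rel)
    return ET.tostring(root, encoding="unicode")

arcs = recurrent_arcs(GRAPHS)
rules_xml = to_xml(arcs)
print(rules_xml)
```

With the toy data, only the arcs leaving "add" recur in two sentences, so only they become candidate rules; the arcs of the single "crash" sentence fall below the threshold.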

Given two semantic graphs sharing a
common grammatical structure, to specify
the rule aimed at recognizing such a
common structure, the PathsFinder:
• selects the nodes of the verb and noun
types from both graphs;
• identifies the pairs of similar nodes;
• analyzes the children and the labeled
arcs outgoing from the pairs of similar
nodes.
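
A minimal sketch of this pairwise comparison, under stated assumptions: the Node class, the example graphs, and the similarity criterion (same lemma and same coarse part of speech) are all hypothetical, because the slides do not say how NEON decides that two nodes are similar.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    lemma: str
    pos: str                                   # Penn Treebank tag, e.g. "VB", "NN"
    arcs: dict = field(default_factory=dict)   # relation label -> child lemma

def content_nodes(graph):
    """Step 1: keep only the verb and noun nodes."""
    return [n for n in graph if n.pos.startswith(("VB", "NN"))]

def common_structure(g1, g2):
    """Steps 2 and 3: pair similar nodes, then keep the arc labels they share."""
    rules = []
    for a in content_nodes(g1):
        for b in content_nodes(g2):
            # Assumed similarity test: same lemma, same coarse POS.
            if a.lemma == b.lemma and a.pos[:2] == b.pos[:2]:
                shared = sorted(set(a.arcs) & set(b.arcs))
                if shared:
                    rules.append((a.lemma, shared))
    return rules

# "You should add a dark mode" (hand-written, simplified parse)
g1 = [
    Node("add", "VB", {"nsubj": "you", "aux": "should", "dobj": "mode"}),
    Node("you", "PRP"),
    Node("mode", "NN", {"det": "a", "amod": "dark"}),
]
# "The app should add an export option" (hand-written, simplified parse)
g2 = [
    Node("add", "VB", {"nsubj": "app", "aux": "should", "dobj": "option"}),
    Node("app", "NN", {"det": "the"}),
    Node("option", "NN", {"det": "an", "compound": "export"}),
]

rules = common_structure(g1, g2)
print(rules)
```

Here the two graphs share the verb "add" with subject, auxiliary, and direct-object arcs, which is the kind of common grammatical structure a rule would capture.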

④ The end-user provides a set of software
artifacts different from the training
documents.
⑤ The Parser performs sentence splitting
and tokenization and, for each sentence,
generates the Stanford Dependencies
(SD) representation.
⑥ The Classifier leverages the SD
representation and the set of XML rules
to detect text structures
that match the defined rules.
⑦ All the recognized sentences are
highlighted using different colors for
different categories.
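
The testing steps above can be sketched as a small rule matcher. Everything named here is an assumption made for illustration: the XML schema (rule, arc, category, governor, relation, dependent), the "*" wildcard, and the hand-written dependency triples standing in for the Parser's SD output. NEON's actual rule format is not shown in the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule file; "*" matches any word.
RULES_XML = """
<rules>
  <rule category="feature request">
    <arc governor="add" relation="aux" dependent="should"/>
  </rule>
  <rule category="problem discovery">
    <arc governor="crash" relation="nsubj" dependent="*"/>
  </rule>
</rules>
"""

def load_rules(xml_text):
    """Parse the XML into (category, [arc patterns]) pairs."""
    rules = []
    for rule in ET.fromstring(xml_text).iter("rule"):
        arcs = [(a.get("governor"), a.get("relation"), a.get("dependent"))
                for a in rule.iter("arc")]
        rules.append((rule.get("category"), arcs))
    return rules

def arc_matches(pattern, edge):
    """An arc pattern matches an edge if every field is equal or a wildcard."""
    return all(p == "*" or p == e for p, e in zip(pattern, edge))

def classify(edges, rules):
    """Return the category of the first rule whose arcs all match some edge, else None."""
    for category, arcs in rules:
        if all(any(arc_matches(arc, e) for e in edges) for arc in arcs):
            return category
    return None

rules = load_rules(RULES_XML)
# Simplified dependency triples for "You should add an export option".
sentence = [("add", "aux", "should"), ("add", "nsubj", "you"), ("add", "dobj", "option")]
print(classify(sentence, rules))
```

The returned category is what a front end could then map to a highlight color, one color per category.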

EVALUATION
Goal:
• assess NEON’s capability of identifying rules
useful to automatically classify app reviews into
the feature request and problem discovery classes.
Study Objects:
• 100 app reviews extracted from a labeled dataset
presented in previous work [1]:
• 50 app reviews of the feature request category;
• 50 app reviews of the problem discovery
category.
Study Subjects:
• 3 subjects with NL parsing expertise:
• one professional software engineer (Subject 1);
• one SE master student (Subject 2);
• an author of the paper (Subject 3).
[1] Panichella et al., How can I improve my app? Classifying user
reviews for software maintenance and evolution. ICSME 2015: 281-290

STUDY PROCEDURE
Subjects 1 and 2:
• independently inspect all the 241 candidate rules
provided by NEON and judge whether each rule
is relevant (or not) for identifying sentences
belonging to one of the feature request or
problem discovery categories.
Subject 3:
• expresses an independent judgment on the
relevance of the rules for which a disagreement
between the two initial raters is observed.
[Figure: Subject 1 and Subject 2 each rate the NL rules; Subject 3
rates the rules on which they disagree]
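
The adjudication procedure can be sketched as follows. The rule identifiers and votes are invented for illustration and are not study data; final_judgment is a hypothetical helper name.

```python
def final_judgment(rater1, rater2, tiebreaker):
    """Return rule id -> relevant?, consulting the tiebreaker only on disagreements."""
    final = {}
    for rule_id, vote1 in rater1.items():
        vote2 = rater2[rule_id]
        final[rule_id] = vote1 if vote1 == vote2 else tiebreaker[rule_id]
    return final

# Made-up votes (True = rule judged relevant).
subject1 = {"r1": True, "r2": False, "r3": True}
subject2 = {"r1": True, "r2": False, "r3": False}
subject3 = {"r3": True}   # Subject 3 only rates the disputed rule

judgments = final_judgment(subject1, subject2, subject3)
print(judgments)
```

Because the third vote is only needed on disagreements, a rule ends up relevant exactly when at least two of the three subjects consider it relevant.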

RESULTS
More than 1/3 of the rules
recommended by NEON were judged
useful by at least two of the three human
validators experienced in NL parsing.
Several patterns are very similar to
those manually identified in previous
work [1].

CONCLUSION
• NEON (i) infers NL rules for identifying
significant NL patterns in software
artifacts, and (ii) leverages such rules for
information classification (or extraction)
purposes.
• NEON is time-saving and useful for
mining rules from a variety of software
artifacts.
• The effectiveness of NEON might degrade
when dealing with sentences containing
mixtures of code elements and natural
language or incomplete sentences (e.g.,
commit messages, chats).

FUTURE WORK
We will leverage NEON to develop (or
improve) recommender systems supporting
developers in a variety of software
engineering tasks:
• requirements elicitation
• issue management
• task prioritization
• etc.
