Google Play, the Apple App Store, and the Windows Phone Store are well-known distribution platforms where users can download mobile apps, rate them, and write review comments about the apps they are using. Previous research demonstrated that these reviews contain important information that helps developers improve their apps. However, analyzing reviews is challenging due to the large number of reviews posted every day, the unstructured nature of reviews, and their varying quality. In this demo we present
ARdoc, a tool that combines three techniques: (1) Natural Language Parsing, (2) Text Analysis, and (3) Sentiment Analysis to automatically classify useful feedback contained in app reviews that is important for performing software maintenance and evolution tasks. Our quantitative and qualitative analysis (involving professional mobile developers) demonstrates that ARdoc correctly classifies feedback useful from a maintenance perspective in user reviews with high precision (ranging between 84% and 89%), recall (ranging between 84% and 89%), and F-Measure (ranging between 84% and 89%). While evaluating our tool, the developers in our study confirmed the usefulness of ARdoc in extracting important maintenance tasks for their mobile applications.
4. Users Submit Many Reviews Regularly
iOS apps receive on average 23 reviews per day
Facebook for iOS receives more than 4,000 reviews per day
[ Pagano et al. - RE 2013 ]
5. Past Work
Chen et al – ICSE 2014
AR-Miner: an approach to help app developers discover the most informative user reviews
i. text analysis and machine learning to filter out non-informative reviews
ii. topic analysis to recognize topics treated in the reviews classified as informative
8. Identifying Useful Reviews
i. The awful button in the page doesn’t work → BUG DESCRIPTION
ii. A button in the page should be added → FEATURE REQUEST
9. Available Sources for Identifying Useful Reviews
i. The awful button in the page doesn’t work
ii. A button in the page should be added
Three sources: sentiment → Sentiment Analysis; structure → Natural Language Parsing; lexicon → Text Analysis
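To make the three sources concrete, here is a toy sketch of how sentiment, lexicon, and structure signals could be pulled from a review sentence. This is not ARdoc's actual implementation: the word lists and patterns below are invented for illustration, and ARdoc relies on a real sentiment lexicon and full natural language parsing instead of regular expressions.

```python
# Illustrative sketch: extracting the three signals (sentiment, lexicon,
# structure) from a review sentence. Lexicons here are toy assumptions.
import re

# Tiny hand-made polarity lists (invented for this example).
NEGATIVE = {"awful", "broken", "crash", "terrible"}
POSITIVE = {"great", "love", "nice", "awesome"}

def sentiment(sentence):
    """Crude lexicon-based polarity score clamped to [-1, 1]."""
    words = re.findall(r"[a-z']+", sentence.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return max(-1, min(1, score))

def lexicon_features(sentence):
    """Keywords that often mark feature requests or problem reports."""
    words = set(re.findall(r"[a-z']+", sentence.lower()))
    return {"has_request_verb": bool(words & {"add", "should", "need", "want"}),
            "has_failure_term": bool(words & {"work", "crash", "fail", "bug"})}

def structure_features(sentence):
    """Shallow stand-in for a full NLP parse: negation and modality."""
    lowered = sentence.lower()
    return {"negated": bool(re.search(r"\b(doesn't|don't|not|never)\b", lowered)),
            "modal": bool(re.search(r"\b(should|could|would|must)\b", lowered))}

for s in ["The awful button in the page doesn't work",
          "A button in the page should be added"]:
    print(s)
    print("  sentiment:", sentiment(s))
    print("  lexicon:  ", lexicon_features(s))
    print("  structure:", structure_features(s))
```

On the two example sentences from the slide, the first scores negative sentiment with negation (a problem report), while the second is neutral with a modal and a request verb (a feature request) — which is exactly why combining the three sources is more informative than lexicon alone.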
21. ARdoc Classification Accuracy
3 Apps: Minesweeper, PowernAPP, Picturex
https://www.scribd.com/document/323048838/ARdoc-Appendix
ARdoc classifies useful feedback with a precision ranging between 84% and 89%, a recall ranging between 84% and 89%, and an F-Measure ranging between 84% and 89%
22. Conclusion & Future Work
1) ARdoc: a novel tool able to mine relevant feedback for real-world developers interested in accomplishing software maintenance and evolution tasks.
2) ARdoc classifies useful feedback with a precision ranging
between 84% and 89%, a recall ranging between 84% and
89%, and an F-Measure ranging between 84% and 89%
Di Sorbo et al., “What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes” – FSE 2016, 16/11/2016 (Session 11)
24. Thanks for Your Attention!
Stanford CoreNLP
Apache Lucene API
WEKA
Questions?
Editor's Notes
Hi, I’m Andrea Di Sorbo, a PhD student at the University of Sannio. In this paper we investigated possible ways of classifying user reviews according to software maintenance tasks, with the purpose of helping developers improve their apps.
Well, the context of our study is App Stores, such as Apple App Store and Google Play, where we know that users can download apps, give ratings and write reviews about the mobile apps they're using.
Indeed, previous studies demonstrated that about one third of the information contained in user reviews is helpful for developers, giving feedback containing requests for new features, bug descriptions, or requests for improvements to existing functionality.
For example, a study by Pagano (RE 2013) showed that mobile apps receive approximately 23 reviews per day, and popular apps, such as Facebook, receive on average more than 4,000 reviews per day.
To handle this problem, Chen et al. proposed AR-Miner, an approach to help app developers discover the most informative user reviews, which uses
i) text analysis and machine learning to filter out non-informative reviews and
ii) topic analysis to recognize topics treated in "informative" reviews.
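AR-Miner's first step — separating informative from non-informative reviews — can be sketched with a tiny supervised Naive Bayes text classifier. This is only a stand-in for illustration: the training examples below are invented, and AR-Miner's actual filtering uses more sophisticated machine learning over real review corpora.

```python
# Toy sketch of filtering non-informative reviews with a multinomial
# Naive Bayes classifier (Laplace smoothing). Training data is invented.
import math
import re
from collections import Counter, defaultdict

def tokens(text):
    return re.findall(r"[a-z']+", text.lower())

TRAIN = [
    ("the app crashes when I open the camera", "informative"),
    ("please add an option to export my data", "informative"),
    ("login fails after the latest update", "informative"),
    ("best app ever", "non-informative"),
    ("love it", "non-informative"),
    ("five stars", "non-informative"),
]

# Count classes and per-class word frequencies.
class_counts = Counter(label for _, label in TRAIN)
word_counts = defaultdict(Counter)
for text, label in TRAIN:
    word_counts[label].update(tokens(text))
vocab = {w for c in word_counts.values() for w in c}

def classify(text):
    """Return the label with the highest smoothed log-probability."""
    best, best_lp = None, -math.inf
    for label in class_counts:
        lp = math.log(class_counts[label] / len(TRAIN))
        total = sum(word_counts[label].values())
        for w in tokens(text):
            lp += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = label, lp
    return best

print(classify("the app crashes when loading"))  # -> informative
print(classify("awesome app, love it"))          # -> non-informative
```

Reviews classified as non-informative would simply be discarded before the more expensive topic-analysis step.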
We argue that the text lexicon represents just one of the possible dimensions that can be exploited to detect informative reviews.
In the first review the user exposes a problem, while in the second one the user suggests the implementation of a new feature.
Thus, understanding the intention in user reviews could add precious information
for accomplishing software maintenance and evolution tasks.
We believe that three different dimensions can be explored to determine the intention of a given user review: the sentiment, the structure, and the text lexicon.
Thus our final taxonomy comprised only four categories of sentences: feature request, problem discovery, information seeking, and information giving.
Relying on the techniques previously discussed, we can associate a label with each sentence in the review. These are some examples of useful feedback from users. These example sentences contain relevant information for improving an app: the first two sentences could suggest new functionalities for developers to implement, while the third and fourth sentences indicate bugs that need to be fixed.
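The labeling step can be illustrated with a simplified, purely rule-based sketch. ARdoc itself combines parse structure, a sentiment lexicon, and text features; the surface patterns below are invented assumptions that only approximate that pipeline on a few example sentences.

```python
# Simplified rule-based sketch of mapping review sentences onto the four
# taxonomy categories. Patterns are illustrative, not ARdoc's real rules.
import re

RULES = [
    ("PROBLEM DISCOVERY",   r"doesn't work|\bcrash|\bbug\b|\bfails?\b|\bfreez"),
    ("FEATURE REQUEST",     r"\bshould be add|\bplease add\b|\bwould be (nice|great)\b|\bwish\b"),
    ("INFORMATION SEEKING", r"\bhow (do|can) i\b|\bis there a way\b"),
    ("INFORMATION GIVING",  r"\bfyi\b|just letting you know|\bi updated\b"),
]

def classify_sentence(sentence):
    """Return the first matching category, or OTHER if none applies."""
    lowered = sentence.lower()
    for label, pattern in RULES:
        if re.search(pattern, lowered):
            return label
    return "OTHER"

examples = [
    "The awful button in the page doesn't work",   # -> PROBLEM DISCOVERY
    "A button in the page should be added",        # -> FEATURE REQUEST
    "How do I restore my saved games?",            # -> INFORMATION SEEKING
]
for s in examples:
    print(f"{classify_sentence(s):20s} {s}")
```

A purely lexical rule set like this is brittle, which is precisely the motivation for ARdoc's combination of parsing, sentiment, and text analysis rather than keyword matching alone.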