Title of the Project:
Detailed Classification of Customer Reviews
G.H Raisoni College of Engineering, Nagpur
Department of Information Technology and
Project Phase –I
With the rapid growth of e-commerce websites,
there is no doubt that more people are drawn to online
shopping nowadays. As shoppers are drawn to it, so
are sellers. But sellers can differ in quality as
well as quantity.
To make it easier for consumers to decide which product
suits their demands and needs, customer reviews on
e-commerce websites are an effective way for consumers
to find what they are looking for.
Sentiment analysis (or opinion mining) is a natural language
processing (NLP) technique used to determine whether data is
positive, negative, or neutral.
Sentiment analysis is often performed on text data to help
businesses monitor brand and product sentiment in customer
feedback and understand customer needs.
• The goal is to automatically recognize and classify opinions expressed in text to determine overall sentiment.
Sentiment analysis is the process of analyzing online writing to determine whether it is positive, negative, or neutral. Simply put,
sentiment analysis helps find the author's attitude towards a topic.
We will use the VADER and RoBERTa models and compare the results
between the two.
• VADER Model:
VADER is a lexicon- and rule-based sentiment analysis tool
that is specifically attuned to sentiments expressed on social media.
VADER sentiment analysis uses a dictionary that maps lexical features to sentiment
scores, which measure the intensity of an emotion. By adding up the intensity of each word in a
text, one can determine the sentiment score of that text.
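The word-level summation described above can be sketched as follows. This is a minimal illustration, not the VADER library itself: the tiny lexicon and its valence values are assumptions chosen for the example, the function name `compound_score` is hypothetical, and the real tool also applies rules for negation, intensifiers, punctuation and capitalization. The normalization constant of 15 follows the open-source VADER implementation.

```python
import math

# Hypothetical mini-lexicon (word -> valence), for illustration only.
# The real VADER lexicon holds thousands of entries.
TOY_LEXICON = {"happy": 2.7, "good": 1.9, "bad": -2.5, "hated": -3.2}


def compound_score(text, lexicon=TOY_LEXICON, alpha=15.0):
    """Add up word valences, then squash the total into the range (-1, 1)."""
    words = [w.strip(".,!?") for w in text.lower().split()]
    # Words missing from the lexicon contribute nothing to the total.
    total = sum(lexicon.get(w, 0.0) for w in words)
    return total / math.sqrt(total * total + alpha)


print(round(compound_score("This is bad I hated it"), 4))
```

With these assumed valences, only "bad" and "hated" contribute, so the total is negative and the normalized score lands close to -1.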
• RoBERTa Model:
RoBERTa stands for Robustly Optimized BERT Pre-training Approach.
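The goal of this model is to optimize how the BERT architecture is pre-trained (for example,
training longer on more data with larger batches and dropping the next-sentence-prediction
objective) so that the same architecture achieves better results.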
The VADER model for text sentiment is sensitive to both the polarity (positive/negative)
and the intensity (strength) of emotion.
• Part-of-speech (POS) tagging is the practice, in
natural language processing, of categorizing
words in a text (corpus) according to
their part of speech, depending on
each word's definition and context.
RoBERTa Model
RoBERTa is a Transformer model trained on a large corpus of data.
It accounts not only for the words themselves but also for their
context in relation to other words.
Its masked language modelling objective teaches the model to predict
purposefully hidden text inside examples of unannotated language.
RoBERTa outperforms BERT on the masked language
modelling objective and performs better on downstream tasks.
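The idea of hiding text on purpose can be sketched with a toy masking routine. This is a simplified illustration under stated assumptions: the function name `mask_tokens`, the whitespace tokenization and the 15% masking rate are choices made for the example, and the full BERT/RoBERTa recipe additionally replaces some selected tokens with random tokens or leaves them unchanged.

```python
import random


def mask_tokens(tokens, mask_prob=0.15, mask_token="<mask>", seed=None):
    """Hide a random subset of tokens; return (masked tokens, targets).

    targets[i] holds the original token where position i was hidden,
    and None where the token was left visible.
    """
    rng = random.Random(seed)
    masked, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(mask_token)  # the model must predict this token
            targets.append(tok)
        else:
            masked.append(tok)
            targets.append(None)
    return masked, targets


sentence = "I am so happy I ordered it".split()
# Drawing a fresh mask on each pass over the data is what RoBERTa
# calls dynamic masking; different seeds stand in for different passes.
for pass_seed in (0, 1):
    print(mask_tokens(sentence, mask_prob=0.3, seed=pass_seed))
```

During pre-training the model sees only the masked sequence and is scored on how well it recovers the hidden tokens, which is why no human annotation is needed.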
Positive reviews and their Polarity Scores
Review – “I am so happy I orderd it”
Here the comment is "I am so happy I orderd it". The review comes from
a satisfied customer who is
happy with the product, hence we can say this is a positive review.
Now consider the polarity scores:
Negative (neg) – 0.0
There are no negative words or emotions here, hence the neg score is 0.
Positive (pos) – 0.517
The positive score is high because the words suggest that the review is positive overall.
Compound Score – 0.646
Review – “This is bad I hated it”
Here the comment is "This is bad I hated it". The review comes from
an unsatisfied customer who is
unhappy with the product, hence we can say this is a negative review.
Now consider the polarity scores:
Negative (neg) – 0.658
The customer is unhappy with the product, so the emotion here is
negative. Hence the negative
value is higher here.
Positive (pos) – 0.0
There are no positive words or emotions here, hence the pos score is 0.
Compound Score – -0.8271
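A compound score is usually turned into a label by thresholding. The sketch below assumes the cut-offs of plus and minus 0.05 that are commonly recommended for VADER; the slides do not state which thresholds were used, and the function name `label_from_compound` is hypothetical.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a compound score in [-1, 1] to a sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# A clearly positive score, a score of zero, and a clearly negative score.
for score in (0.646, 0.0, -0.8271):
    print(score, "->", label_from_compound(score))
```

Scores near zero fall into the neutral band, so only reviews with a clear lean in either direction are labelled positive or negative.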
As our results show, RoBERTa produces scores that are much more
confident than those of the VADER model because it uses a deep learning
approach. This difference in approach explains the difference seen between
the two models.
One cannot dismiss the value that sentiment analysis offers to the
industry despite all the obstacles and potential issues that it faces.
Sentiment analysis is destined to become one of the key
determinants of many business decisions in the future because it
bases its findings on elements that are fundamentally
The results of sentiment analysis are helpful, but they cannot be used to
forecast a company’s success or other measures. Sentiment
analysis may occasionally be unnecessary and only serve as a
reporting measure after the harm has already been done.