 It aims to determine the attitude of a
speaker or a writer with respect to some
topic or the overall contextual polarity of a
document.
 The attitude may be his or her judgment or
evaluation, affective state (that is, the
emotional state of the author when writing),
or the intended emotional communication
(that is, the emotional effect the author
wishes to have on the reader).
 Determining document subjectivity:
Often called subjectivity classification, this subtask
determines whether a given text is objective (expressing a
fact) or subjective (expressing an opinion or emotion).
 Determining document orientation:
Often called sentiment classification or document-level
sentiment classification, this subtask determines the
polarity of a given subjective text. In other words,
determines whether this text expresses a positive or a
negative sentiment on its subject matter.
 Determining the strength of document orientation:
This subtask decides whether the positive sentiment
expressed by a text on its subject matter is weakly positive,
mildly positive or strongly positive.
 Consumer information
› Product reviews
 Marketing
› Consumer attitudes
› Trends
 Politics
› Politicians want to know voters’ views
› Voters want to know politicians’ stances and who else
supports them
 Social
› Find like-minded individuals or communities
 Machine learning
› Naïve Bayes
› Maximum Entropy Classifier
› SVM
 Unsupervised methods
› K-means
› Otsu’s Threshold
› Fuzzy c-means
 Data to annotate given.
 But no training data or additional
resources provided.
 Aim:
 To create a lexical resource in an
automated way without any human
intervention for annotating data.
 Affective lexicon to be used for polarity
classification.
 To obtain training material, use
emoticons as indicators of a mood within
a message.
 Split the tweets into 2 sets:
 positive -  ;) :} :] ...
 negative -  :{ :’( ...
 We get a positive word list and a
negative word list.
 If a word occurs more frequently in the
positive set than in the negative set, it is
treated as positive, and vice versa.
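The emoticon-based split can be sketched roughly as follows. This is a minimal illustration rather than the original authors' implementation: the emoticon sets, the whitespace tokenisation and the rule of skipping tweets with mixed or missing emoticons are all assumptions, since the slide lists only a few example emoticons.

```python
from collections import Counter

# Assumed emoticon sets; the slide gives only a few examples of each kind.
POSITIVE_EMOTICONS = {":)", ";)", ":}", ":]"}
NEGATIVE_EMOTICONS = {":(", ":{", ":'("}
ALL_EMOTICONS = POSITIVE_EMOTICONS | NEGATIVE_EMOTICONS


def build_word_lists(tweets):
    """Split tweets by emoticon polarity and derive positive/negative word lists."""
    pos_counts, neg_counts = Counter(), Counter()
    for tweet in tweets:
        tokens = tweet.lower().split()
        has_pos = any(t in POSITIVE_EMOTICONS for t in tokens)
        has_neg = any(t in NEGATIVE_EMOTICONS for t in tokens)
        if has_pos == has_neg:
            continue  # no emoticon or mixed signals: skip (assumption)
        words = [t for t in tokens if t not in ALL_EMOTICONS]
        (pos_counts if has_pos else neg_counts).update(words)
    # A word is positive if it is more frequent in the positive set, and vice versa.
    positive = {w for w in pos_counts if pos_counts[w] > neg_counts[w]}
    negative = {w for w in neg_counts if neg_counts[w] > pos_counts[w]}
    return positive, negative


tweets = [
    "love this phone :)",
    "great battery :]",
    "hate this phone :(",
    "awful screen :{",
]
positive_words, negative_words = build_word_lists(tweets)
```

Words that are equally frequent in both sets ("this" and "phone" in the toy data) end up in neither list.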
 Aim:
 To analyse the effectiveness of various
popular classifiers and identify the more
suitable classifier for twitter that could
ease the process of classifying
sentiments in tweets.
 Strategy:
To use two or more classifiers chained
one after the other. This resulted in a
higher yield and better accuracy of the
mined data.
 First stage: the incoming preprocessed
data is classified into three categories –
polar, neutral and irrelevant.
 Second stage: the data classified under
polar is fed to a second classifier for
further segregation into positive and
negative.
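A minimal sketch of the two-stage chaining idea is shown here. The two toy keyword rules are hypothetical stand-ins for trained classifiers; only the control flow (stage one filters, stage two refines the polar items) follows the description above.

```python
from typing import Callable, List

Classifier = Callable[[str], str]


def chain_classify(texts: List[str], stage_one: Classifier, stage_two: Classifier) -> List[str]:
    """Run stage one on every text; pass only 'polar' texts on to stage two."""
    labels = []
    for text in texts:
        label = stage_one(text)
        if label == "polar":
            label = stage_two(text)  # refine into 'positive' or 'negative'
        labels.append(label)
    return labels


# Toy keyword rules standing in for trained models (purely illustrative).
def toy_stage_one(text: str) -> str:
    words = set(text.lower().split())
    if not words & {"phone", "battery"}:
        return "irrelevant"
    if words & {"love", "hate"}:
        return "polar"
    return "neutral"


def toy_stage_two(text: str) -> str:
    return "positive" if "love" in text.lower().split() else "negative"


result = chain_classify(
    ["I love my phone", "I hate the battery", "the phone arrived", "nice weather"],
    toy_stage_one,
    toy_stage_two,
)
```

In practice either stage could be any of the classifiers named in this section; the chaining function does not depend on which one is plugged in.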
The classification algorithms used in the
research are:
 Naive Bayes
 Random Forest
 Support Vector Machines(SVM)
 SMO
 The research has been performed on
Tunisian users’ statuses on Facebook during
the “Arab Spring” era.
 The aim is to extract useful information
about users’ sentiments and behaviours
during this sensitive and significant period.
 For this purpose, a method based on
Support Vector Machine(SVM) and Naive
Bayes has been proposed.
 The methodology used is collection of raw
data, followed by lexicon development.
 Three types of lexicons were created:
lexicon for social acronyms, lexicon for
emoticons and lexicon for interjections.
 Then data preprocessing is done – stop
words removal and stemming, followed by
feature extraction.
 Finally, the machine learning algorithms are
applied.
 The performance of different feature sets
using Naive Bayes (NB) and SVM classifiers
was then compared.
 This paper is concerned with the
problem of mining social emotions from
text. The aim of this research is to
discover the connection between
different social emotions and affective
terms and based on it automatically
predict social emotion of the text.
 An official Chinese news portal has been
used for the dataset collection.
 The proposed solution is to construct a joint
emotion-topic model. Latent Dirichlet
Allocation (LDA) has been used with an
additional layer for emotion modelling.
 A three step process has been used for
generation of affective terms:
 The first step is to generate an emotion from a
document specific emotional distribution.
 The second step is to generate a latent topic
from a Multinomial distribution.
 The final step is to develop an approximate
inference method based on Gibbs sampling.
 As a complete generative model, the
proposed model allows a
number of conditional probabilities to be
inferred for unseen documents. For example,
probabilities of latent topics given an
emotion and that of terms given a topic.
This method was found to be better than
emotion-term model and multiclass SVM
as the emotion assignments at the term
level could be visualised.
 This paper proposes an aspect-based
sentiment classification approach to
analyze sentiments for tweets.
 In previous studies, the overall sentiment of
a tweet was determined. But this is not
useful for the companies which need to
monitor consumer opinion of their
product/services. For them it would be
more useful to have information as to which
aspects of the product/service the users are
happy or unhappy about.
 The aspect-based sentiment classifier makes use of a POS
tagger, a sentiment lexicon and a few gazetteer lists to produce
results of the form [aspect, sentiment words, polarity]. This
process consists of three main steps:
 1. Aspect-sentiment extraction: Given a tweet, this step
determines a list of possible aspect candidates along with their
associated sentiments and polarity.
 2. Aspect ranking and selection: A tweet can express many
different opinions. Only important aspects should be selected.
For example, when classifying tweets on a telecommunication
company, some of the aspects of interest include customer
service, 3G connectivity, speed, etc. In this step the aspect
candidates are then ranked and the set of most significant
aspects are selected as the expected aspects.
 3. Aspect classification: Using the set of expected aspects and
results from the aspect-sentiment extraction step, we obtain the
final list of aspects along with their polarity for each tweet.
 The experimental results suggested that
a layered classification approach which
uses the aspect-based classifier as the
first layer classification and the tweet-
level classifier as the second layer
classification is more effective than a
classifier trained using target-dependent
features. This approach is able to
consistently improve the performance of
existing sentiment classifiers.
 The aim of this research was to
automatically extract the set of
messages which contain opinions, filter
out non-opinion messages and
determine their sentiment directions, that
is positive or negative.
 Manually labelled data has been used
as training data to build the model.
 The initial step is to preprocess the crawled
tweets by removing usernames, hashtags,
retweet tags, non-English words.
 Three resources were constructed for the
further preprocessing which included a stop
word dictionary, an emotion dictionary and
an acronym dictionary.
 After preprocessing all words are
transformed into the form (word, POS tag,
English-word, Stop-word).
 Thereafter, tweets containing opinions are
extracted, filtering out the non-opinion
tweets. Naive Bayes classifier is then used
to classify the tweets based on sentiment.
 Since a word may have different meanings
in different domains, short text classification
is done.
 Two feature selection algorithms have been
used for this purpose – Mutual Information
(MI) and χ² (chi-square) Feature Selection. The short texts
are classified into different domains, so that
the classifier can automatically classify with
greater performance the tweets as being
either positive or negative.
 The main objective of this research was
to compare state-of-the-art Sentiment
Analysis methods against a novel hybrid
method.
 The Hybrid method adopts a
combination of both the supervised
methods and unsupervised methods.
 It utilizes a Sentiment lexicon to generate
a new set of features to train a linear
SVM classifier.
 In this paper, domain based Twitter Sentiment
Analysis is done. The domain considered is
smartphones.
 The Hybrid Polarity Detection System has three
modules:
 The first module is the Preprocessing Module in which
cleaning of data is done. The preprocessing steps
include removal of usernames, URL tags etc.
 The second module is the Sentiment Feature Generator
Module. In this module, slang terms are replaced with their
proper language equivalents. The SentiStrength lexicon is
then used to tag the words with their sentiment scores.
Fourteen features are extracted from the text.
 The third module is Machine Learning Classifier, in
which a linear SVM takes the input feature set and
classifies the tweets as positive or negative.
 In this paper, the authors have provided a
summary of the differential evolution algorithm
and its improved measures in order to facilitate
researchers studying the topic. Firstly the
differential evolution algorithm basics and its
various operations such as Mutation, Crossover
and Selection have been explained.
Thereafter, the different improvements
aimed at increasing optimization
performance are compared. These
improvements make differential evolution
more efficient in application. The improvement
measures mainly concern the evolution operations,
parameter settings and other improvements, with
a focus on the mutation operation.
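The three operations named above (mutation, crossover, selection) can be illustrated with a compact sketch of the classic DE/rand/1/bin scheme. This is a generic textbook variant, not any of the improved versions surveyed in the paper; the parameter values (`pop_size`, `f`, `cr`, `generations`) and the sphere test function are assumptions chosen only for illustration.

```python
import random


def differential_evolution(objective, bounds, pop_size=20, f=0.8, cr=0.9,
                           generations=200, seed=0):
    """Minimise `objective` with the classic DE/rand/1/bin scheme."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: combine three distinct individuals other than i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            mutant = [pop[a][k] + f * (pop[b][k] - pop[c][k]) for k in range(dim)]
            # Keep the mutant inside the search bounds.
            mutant = [min(max(v, lo), hi) for v, (lo, hi) in zip(mutant, bounds)]
            # Crossover: binomial mixing; j_rand guarantees one mutant gene.
            j_rand = rng.randrange(dim)
            trial = [
                mutant[k] if (rng.random() < cr or k == j_rand) else pop[i][k]
                for k in range(dim)
            ]
            # Selection: the trial replaces the target only if it is no worse.
            trial_score = objective(trial)
            if trial_score <= scores[i]:
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def sphere(x):
    """Simple test function with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best_x, best_score = differential_evolution(sphere, [(-5.0, 5.0)] * 3)
```

The scale factor `f` and crossover rate `cr` are the parameter settings that the improvement measures typically tune.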
 Most traditional clustering algorithms
simply assume that the number of
clusters is given and focus on the quality
of clustering results. This paper presents a
clustering algorithm for clustering and
automatically determining the number
of clusters as well. The proposed
algorithm has two steps. Firstly, a
region splitting and merging (RSM)
mechanism splits and then merges similar
groups until a self-adaptive threshold is
reached. Secondly, the number of
clusters is fine-tuned using automatic
clustering differential evolution (ACDE).
 Data Collection: Retrieval of twitter
status updates
 Lexicon development
 Data Pre-Processing
 Feature Extraction, Normalisation and
Reduction
 K-means Clustering
 Differential Evolution
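The K-means step of this pipeline can be sketched with plain Lloyd's algorithm for two clusters. This is a minimal illustration under assumptions: the feature vectors are invented toy values, the initial centroids are sampled at random, and the cluster indices carry no polarity meaning until they are mapped to positive or negative afterwards.

```python
import random


def kmeans(points, k=2, iterations=100, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, cluster index per point)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assignment = [None] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_assignment = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment


# Toy normalised feature vectors (hypothetical): [positive share, negative share].
points = [[0.9, 0.1], [0.8, 0.2], [1.0, 0.0], [0.1, 0.9], [0.2, 0.8], [0.0, 1.0]]
centroids, labels = kmeans(points)
```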
 Casefolding.
 Removal of:
 unnecessary punctuations
 extra blank spaces
 retweet tag
 usertags
 URL’s
 Hashtags
 Removal of stopwords
 Replacement of emoticons
 Positive emoticons – EPOS
 Negative emoticons – ENEG
 Neutral emoticons – ENEUT
 Replacement of sentiment words
 Positive words – POS
 Negative words – NEG
 Replacement of negation and intensity words
 Negation words – NEGATION
 Intensity words -- INTENSITY
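The preprocessing steps listed above could look roughly like this. It is a sketch under assumptions: the stop-word list, emoticon table and sentiment word table are tiny hypothetical samples, the regular expressions are one possible choice, and exclamation marks are deliberately kept as tokens rather than stripped with the other punctuation.

```python
import re

# Tiny illustrative lexicons; the real lists are not given on the slides.
STOPWORDS = {"the", "is", "a", "this", "to", "i", "my"}
EMOTICONS = {":)": "EPOS", ":]": "EPOS", ":(": "ENEG", ":{": "ENEG", ":|": "ENEUT"}
WORD_TAGS = {
    "good": "POS", "love": "POS",
    "bad": "NEG", "hate": "NEG",
    "not": "NEGATION", "never": "NEGATION",
    "very": "INTENSITY", "really": "INTENSITY",
}


def preprocess(tweet):
    """Return the cleaned token list for one tweet."""
    text = tweet.lower()                                 # casefolding
    text = re.sub(r"\brt\b", " ", text)                  # retweet tag
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)   # URLs
    text = re.sub(r"[@#]\w+", " ", text)                 # user tags and hashtags
    tokens = []
    for raw in text.split():                             # split() drops extra blanks
        if raw in EMOTICONS:
            tokens.append(EMOTICONS[raw])                # emoticon replacement
            continue
        # Keep word characters and '!' only; other punctuation is dropped.
        for piece in re.findall(r"\w+|!", raw):
            if piece in STOPWORDS:
                continue                                 # stop-word removal
            tokens.append(WORD_TAGS.get(piece, piece))   # sentiment/negation/intensity tags
    return tokens


example = preprocess("RT @bob: I really LOVE this phone!!! :) http://t.co/xyz #happy")
```

For the example tweet this yields the tokens INTENSITY, POS, phone, three exclamation marks and EPOS.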
 Feature Extraction
 The feature extraction is the process of extracting the main
characteristics of the text. For a machine learning algorithm to
perform well, it is essential to have features that are descriptive
of the text. The total number of occurrences of following features
have been taken into account for each tweet:
 Words
 Exclamation marks (!)
 EPOS keyword
 ENEG keyword
 ENEUT keyword
 POS keyword
 NEG keyword
 NEGATION keyword
 INTENSITY keyword
 Random words (words left, which do not fall into any category)
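A possible way to compute these counts from an already preprocessed token list is sketched here. The dictionary keys and the decision not to count exclamation-mark tokens as words are assumptions made for the illustration.

```python
from collections import Counter

KEYWORDS = ["EPOS", "ENEG", "ENEUT", "POS", "NEG", "NEGATION", "INTENSITY"]


def extract_features(tokens):
    """Count the ten per-tweet features listed above from a token list."""
    counts = Counter(tokens)
    exclamations = counts["!"]
    features = {
        "words": len(tokens) - exclamations,  # assumption: '!' tokens are not words
        "exclamations": exclamations,
    }
    for key in KEYWORDS:
        features[key] = counts[key]
    # Random words: whatever is left after the keyword categories.
    features["random"] = features["words"] - sum(counts[k] for k in KEYWORDS)
    return features


features = extract_features(["INTENSITY", "POS", "phone", "!", "!", "!", "EPOS"])
```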
 The values of all the features are normalised
to the range of 0 to 1. The normalised value
is given by
$$\mathrm{Normalised}(e) = \frac{e - E_{\min}}{E_{\max} - E_{\min}}$$
where,
$e$ - the original value
$E_{\max}$ - the maximum value of the feature
$E_{\min}$ - the minimum value of the feature
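As a small worked example of this min-max normalisation, the sketch below scales one feature column. Mapping a constant column to 0 is an assumption, since the formula is undefined when the maximum equals the minimum.

```python
def normalise_column(values):
    """Min-max scale one feature column to the range [0, 1]."""
    e_min, e_max = min(values), max(values)
    if e_max == e_min:
        return [0.0 for _ in values]  # assumption: a constant feature maps to 0
    return [(e - e_min) / (e_max - e_min) for e in values]


scaled = normalise_column([2, 4, 10])  # (4 - 2) / (10 - 2) = 0.25
```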
 Feature reduction is done by computing
cross correlation for the features. One
among the features which are closely
related is removed from the table.
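One way to realise this reduction step is sketched below. It assumes that "cross correlation" means the pairwise Pearson correlation between feature columns and that "closely related" means an absolute correlation above 0.9; both the threshold and the toy table are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long columns."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: treat as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def reduce_features(table, threshold=0.9):
    """Keep a feature only if it is not closely correlated with one already kept."""
    kept = []
    for name in table:
        if all(abs(pearson(table[name], table[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


table = {
    "words": [4, 6, 8, 10],
    "random": [2, 3, 4, 5],        # exactly half of "words": perfectly correlated
    "exclamations": [3, 0, 1, 0],
}
kept = reduce_features(table)
```

With this toy table the "random" column is dropped because it duplicates the information in "words".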
ALGORITHM                ACCURACY
K-MEANS                  51%
DIFFERENTIAL EVOLUTION   59%
Findings
 Through this project I have investigated the utility
of sentiment classification on a collected Twitter
dataset.
 While exploring the topic, I observed that there is
a limited number of algorithms that are useful for
twitter sentiment analysis.
 Twitter statuses have unique characteristics
compared to other corpora. Since tweets are
limited to 140 characters, the usual data mining
techniques used for movie reviews, etc. cannot be
used.
 Also, not many research papers were available
for feature reduction.
Conclusion
 This project has been a great learning experience in the field
of information retrieval and data mining. In this project, twitter
dataset was collected for the purpose of Sentiment analysis.
Various data preprocessing techniques were applied on the
dataset. Thereafter, features were extracted from each tweet
and normalised. Feature reduction was then applied to
remove one among the closely related features. The quality
of features/ attributes that are extracted from the training
dataset affects the performance of the technique. The K-means
clustering algorithm and Differential Evolution, an optimization
algorithm, were then applied to cluster the data into two classes,
positive and negative. Finally, the accuracies of these two
algorithms were compared. On the basis of these accuracies (51%
for K-means versus 59% for Differential Evolution), Differential
Evolution performs better than K-means on this Twitter dataset.
Future Work
 As future work, three more clustering
techniques will be applied as part of
unsupervised learning, namely Otsu’s
Threshold, Fuzzy c-means and the EM algorithm.
Next step would be to compare it with
supervised learning methods including SVM,
Naive Bayes and LDA. Accuracies of different
algorithms will be calculated and compared.
 Twitter dataset for a particular product will be
collected and Opinion mining will be
applied.

Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...DhatriParmar
 
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptx
Unraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptxUnraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptx
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptxDhatriParmar
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Seán Kennedy
 
How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17Celine George
 
Mythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITWMythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITWQuiz Club NITW
 

Kürzlich hochgeladen (20)

Narcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfNarcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdf
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx
 
Paradigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTAParadigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTA
 
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnvESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
 
Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"
 
Mental Health Awareness - a toolkit for supporting young minds
Mental Health Awareness - a toolkit for supporting young mindsMental Health Awareness - a toolkit for supporting young minds
Mental Health Awareness - a toolkit for supporting young minds
 
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptxBIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
 
Measures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped dataMeasures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped data
 
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptxINCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
 
ClimART Action | eTwinning Project
ClimART Action    |    eTwinning ProjectClimART Action    |    eTwinning Project
ClimART Action | eTwinning Project
 
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptxDecoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
 
Oppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmOppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and Film
 
Expanded definition: technical and operational
Expanded definition: technical and operationalExpanded definition: technical and operational
Expanded definition: technical and operational
 
4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx
 
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
 
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
 
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptx
Unraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptxUnraveling Hypertext_ Analyzing  Postmodern Elements in  Literature.pptx
Unraveling Hypertext_ Analyzing Postmodern Elements in Literature.pptx
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...
 
How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17
 
Mythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITWMythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITW
 

Major presentation

  • 1.
  • 2.
      • Sentiment analysis aims to determine the attitude of a speaker or writer with respect to some topic, or the overall contextual polarity of a document.
      • The attitude may be the author's judgment or evaluation, affective state (that is, the emotional state of the author when writing), or the intended emotional communication (that is, the emotional effect the author wishes to have on the reader).
  • 3.
      • Determining document subjectivity: Often called subjectivity classification, this subtask determines whether a given text is objective (expressing a fact) or subjective (expressing an opinion or emotion).
      • Determining document orientation: Often called sentiment classification or document-level sentiment classification, this subtask determines the polarity of a given subjective text. In other words, it determines whether the text expresses a positive or a negative sentiment on its subject matter.
      • Determining the strength of document orientation: This subtask decides whether the positive sentiment expressed by a text on its subject matter is weakly positive, mildly positive or strongly positive.
  • 4.
      • Consumer information
          › Product reviews
      • Marketing
          › Consumer attitudes
          › Trends
      • Politics
          › Politicians want to know voters' views
          › Voters want to know politicians' stances and who else supports them
      • Social
          › Find like-minded individuals or communities
  • 5.
      • Machine learning
          › Naïve Bayes
          › Maximum Entropy Classifier
          › SVM
      • Unsupervised methods
          › K-means
          › Otsu's Threshold
          › Fuzzy c-means
  • 6.
  • 7.
  • 8.
      • The data to annotate is given, but no training data or additional resources are provided.
      • Aim: To create a lexical resource in an automated way, without any human intervention, for annotating the data.
      • The resulting affective lexicon is to be used for polarity classification.
  • 9.
      • To obtain training material, use emoticons as indicators of the mood within a message.
      • Split the tweets into 2 sets:
          › positive - :) ;) :} :] ...
          › negative - :( :{ :'( ...
      • This gives a positive word list and a negative word list.
      • If a word occurs more frequently in the positive set, it is treated as positive, and vice versa.
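The emoticon-based lexicon construction described on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the emoticon sets, the whitespace tokenisation and the function name `build_lexicon` are all assumptions, and tweets containing both kinds of emoticon (or none) are simply skipped.

```python
from collections import Counter

# Assumed emoticon sets; the slide lists only a few examples.
POSITIVE_EMOTICONS = {":)", ";)", ":}", ":]"}
NEGATIVE_EMOTICONS = {":(", ":{", ":'("}
ALL_EMOTICONS = POSITIVE_EMOTICONS | NEGATIVE_EMOTICONS


def build_lexicon(tweets):
    """Return (positive_words, negative_words) derived from emoticon-labelled tweets."""
    pos_counts, neg_counts = Counter(), Counter()
    for tweet in tweets:
        tokens = tweet.lower().split()
        has_pos = any(t in POSITIVE_EMOTICONS for t in tokens)
        has_neg = any(t in NEGATIVE_EMOTICONS for t in tokens)
        if has_pos == has_neg:
            continue  # no emoticon at all, or mixed signals: skip the tweet
        words = [t for t in tokens if t not in ALL_EMOTICONS]
        (pos_counts if has_pos else neg_counts).update(words)
    # A word is positive if it is more frequent in the positive set, and vice versa.
    positive = {w for w in pos_counts if pos_counts[w] > neg_counts[w]}
    negative = {w for w in neg_counts if neg_counts[w] > pos_counts[w]}
    return positive, negative
```

Words that occur equally often in both sets end up in neither list; a real system would probably also normalise by the size of each set before comparing frequencies.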
  • 10.
  • 11. Aim: To analyse the effectiveness of various popular classifiers and identify the most suitable classifier for Twitter, one that could ease the process of classifying sentiments in tweets.
  • 12. Strategy: To use two or more classifiers chained one after the other. This resulted in a high yield and better accuracy of the mined data.
  • 13.
      • First stage: the incoming preprocessed data is classified into three categories: polar, neutral and irrelevant.
      • Second stage: the data classified as polar is fed to a second classifier for further segregation into positive and negative.
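A rough sketch of this two-stage chaining is shown below. The two toy keyword rules stand in for trained classifiers and are purely hypothetical (as are the word lists and function names); only the control flow, where polar items are passed on to a second classifier, reflects the slide.

```python
# Hypothetical word lists used by the stand-in classifiers.
TOPIC_WORDS = {"phone", "battery", "screen"}
POSITIVE_WORDS = {"good", "great", "love"}
NEGATIVE_WORDS = {"bad", "awful", "hate"}


def toy_stage_one(text):
    """Stand-in first-stage classifier: polar, neutral or irrelevant."""
    words = set(text.lower().split())
    if not words & TOPIC_WORDS:
        return "irrelevant"
    if words & (POSITIVE_WORDS | NEGATIVE_WORDS):
        return "polar"
    return "neutral"


def toy_stage_two(text):
    """Stand-in second-stage classifier: positive or negative."""
    words = set(text.lower().split())
    pos = len(words & POSITIVE_WORDS)
    neg = len(words & NEGATIVE_WORDS)
    return "positive" if pos >= neg else "negative"


def classify_two_stage(text, stage_one=toy_stage_one, stage_two=toy_stage_two):
    """Chain the classifiers: only polar texts reach the second stage."""
    category = stage_one(text)
    if category != "polar":
        return category
    return stage_two(text)
```

In practice each stage would be a trained model (for example one of the classifiers listed on the next slide) exposing the same text-in, label-out interface.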
  • 14. The classification algorithms used in the research are:
      • Naive Bayes
      • Random Forest
      • Support Vector Machines (SVM)
      • SMO
  • 15.
  • 16.
      • The research was performed on Tunisian users' statuses on Facebook during the "Arab Spring" era.
      • The aim is to extract useful information about users' sentiments and behaviours during this sensitive and significant period.
      • For this purpose, a method based on Support Vector Machine (SVM) and Naive Bayes has been proposed.
  • 17.
      • The methodology is collection of raw data, followed by lexicon development.
      • Three types of lexicons were created: a lexicon for social acronyms, a lexicon for emoticons and a lexicon for interjections.
      • Data preprocessing is then done (stop word removal and stemming), followed by feature extraction.
      • Finally, the machine learning algorithms are applied.
      • The performance of different feature sets using Naive Bayes (NB) and SVM classifiers was then compared.
  • 18.
  • 19.
      • This paper is concerned with the problem of mining social emotions from text. The aim of the research is to discover the connection between different social emotions and affective terms and, based on it, to automatically predict the social emotion of a text.
      • An official Chinese news portal was used for the dataset collection.
  • 20.
      • The proposed solution is to construct a joint emotion-topic model. Latent Dirichlet Allocation (LDA) is used with an additional layer for emotion modelling.
      • A three-step process is used for the generation of affective terms:
          › The first step is to generate an emotion from a document-specific emotional distribution.
          › The second step is to generate a latent topic from a multinomial distribution.
          › The final step is to develop an approximate inference method based on Gibbs sampling.
  • 21. As a complete generative model, the proposed model makes it possible to infer a number of conditional probabilities for unseen documents, for example the probabilities of latent topics given an emotion and of terms given a topic. The method was found to be better than the emotion-term model and multiclass SVM, as the emotion assignments at the term level could be visualised.
  • 22.
  • 23.
      • This paper proposes an aspect-based sentiment classification approach to analyse sentiments in tweets.
      • Previous studies determined the overall sentiment of a tweet. This is not useful for companies that need to monitor consumer opinion of their products or services. For them it would be more useful to know which aspects of the product or service users are happy or unhappy about.
  • 24. The aspect-based sentiment classifier makes use of a POS tagger, a sentiment lexicon and a few gazetteer lists to produce results of the form [aspect, sentiment words, polarity]. The process consists of three main steps:
      • 1. Aspect-sentiment extraction: Given a tweet, this step determines a list of possible aspect candidates along with their associated sentiments and polarity.
      • 2. Aspect ranking and selection: A tweet can express many different opinions, and only the important aspects should be selected. For example, when classifying tweets about a telecommunication company, aspects of interest include customer service, 3G connectivity, speed, etc. In this step the aspect candidates are ranked and the most significant ones are selected as the expected aspects.
      • 3. Aspect classification: Using the set of expected aspects and the results of the aspect-sentiment extraction step, the final list of aspects along with their polarity is obtained for each tweet.
  • 25. The experimental results suggested that a layered classification approach, which uses the aspect-based classifier as the first layer and the tweet-level classifier as the second layer, is more effective than a classifier trained using target-dependent features. This approach is able to consistently improve the performance of existing sentiment classifiers.
  • 26.
  • 27.
      • The aim of this research was to automatically extract the set of messages that contain opinions, filter out non-opinion messages and determine their sentiment direction, that is, positive or negative.
      • Manually labelled data was used as training data to build the model.
  • 28.
      • The initial step is to preprocess the crawled tweets by removing usernames, hashtags, retweet tags and non-English words.
      • Three resources were constructed for further preprocessing: a stop word dictionary, an emotion dictionary and an acronym dictionary.
      • After preprocessing, all words are transformed into the form (word, POS tag, English-word, Stop-word).
      • Thereafter, tweets containing opinions are extracted and the non-opinion tweets are filtered out. A Naive Bayes classifier is then used to classify the tweets based on sentiment.
  • 29.
      • Since a word may have different meanings in different domains, short text classification is done.
      • Two feature selection algorithms are used for this purpose: Mutual Information (MI) and χ² (chi-square) feature selection. The short texts are classified into different domains, so that the classifier can classify the tweets as positive or negative with greater performance.
  • 30.
  • 31.
      • The main objective of this research was to compare state-of-the-art sentiment analysis methods against a novel hybrid method.
      • The hybrid method combines supervised and unsupervised methods.
      • It uses a sentiment lexicon to generate a new set of features to train a linear SVM classifier.
  • 32.
      • In this paper, domain-based Twitter sentiment analysis is done. The domain considered is smartphones.
      • The Hybrid Polarity Detection System has three modules:
          › The first module is the Preprocessing Module, in which the data is cleaned. The preprocessing steps include removal of usernames, URL tags, etc.
          › The second module is the Sentiment Feature Generator Module. In this module, slang terms are replaced with their standard-language equivalents. The SentiStrength lexicon is then used to tag the words with their sentiment scores. Fourteen features are extracted from the text.
          › The third module is the Machine Learning Classifier, in which a linear SVM takes the feature set as input and classifies the tweets as positive or negative.
  • 33.
  • 34. In this paper, the authors provide a summary of the differential evolution algorithm and its improvement measures, in order to help researchers studying the topic. First, the basics of differential evolution and its operations (mutation, crossover and selection) are explained. Thereafter, the different improvements aimed at increasing optimisation performance are compared. The improvement measures mainly cover the evolution operations, parameter settings and other modifications, with a focus on the mutation operation.
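The three operations named on this slide (mutation, crossover, selection) can be illustrated with a small sketch of the classic DE/rand/1/bin variant. The specific variant, the parameter values (`f`, `cr`, population size) and the sphere test function are assumptions chosen for illustration; the surveyed paper compares many other variants.

```python
import random


def differential_evolution(objective, bounds, pop_size=20, f=0.8, cr=0.9,
                           generations=200, seed=0):
    """Minimise `objective` over box `bounds` with DE/rand/1/bin (illustrative sketch)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: combine three distinct individuals other than i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one dimension comes from the mutant
            trial = []
            for d in range(dim):
                if d == forced or rng.random() < cr:
                    # Crossover takes the mutant component, clipped to the bounds.
                    value = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    lo, hi = bounds[d]
                    value = min(max(value, lo), hi)
                else:
                    value = pop[i][d]
                trial.append(value)
            # Selection: keep the trial only if it is at least as good.
            trial_score = objective(trial)
            if trial_score <= scores[i]:
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def sphere(x):
    """Simple test objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)
```

Calling `differential_evolution(sphere, [(-5, 5), (-5, 5)])` returns the best vector found and its objective value.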
  • 35.
  • 36. Most traditional clustering algorithms simply assume that the number of clusters is given and focus on the quality of the clustering results. This paper presents an algorithm that both clusters the data and automatically determines the number of clusters. The proposed algorithm has two steps. First, a region splitting and merging (RSM) mechanism splits and then merges similar groups until a self-adaptive threshold is reached. Second, the number of clusters is fine-tuned using automatic clustering differential evolution (ACDE).
  • 37.
      • Data Collection: Retrieval of Twitter status updates
      • Lexicon development
      • Data Pre-Processing
      • Feature Extraction, Normalisation and Reduction
      • K-means Clustering
      • Differential Evolution
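As an illustration of the K-means clustering step in this pipeline, here is a minimal pure-Python sketch of Lloyd's algorithm. It is not the project's code: the random initialisation from the data points, the squared Euclidean distance and the default `k=2` (positive versus negative) are assumptions.

```python
import random


def kmeans(points, k=2, iterations=100, seed=0):
    """Cluster `points` (sequences of numbers) into k groups; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, new_labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels, centroids
```

Each point would be the normalised feature vector of one tweet. The cluster indices carry no meaning by themselves, so deciding which cluster is "positive" needs a separate step.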
  • 38.
  • 39.
      • Case folding
      • Removal of:
          › unnecessary punctuation
          › extra blank spaces
          › retweet tags
          › user tags
          › URLs
          › hashtags
      • Removal of stopwords
      • Replacement of emoticons
          › Positive emoticons: EPOS
          › Negative emoticons: ENEG
          › Neutral emoticons: ENEUT
      • Replacement of sentiment words
          › Positive words: POS
          › Negative words: NEG
      • Replacement of negation and intensity words
          › Negation words: NEGATION
          › Intensity words: INTENSITY
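The preprocessing steps listed on this slide might be sketched as below. The tiny word lists, the emoticon table and the regular expressions are illustrative assumptions, not the lexicons actually developed for the project. Exclamation marks are kept as separate tokens here, on the assumption that they are needed later as a feature.

```python
import re

# Hypothetical miniature lexicons; the project's real lexicons are not shown in the slides.
EMOTICONS = {":)": "EPOS", ";)": "EPOS", ":(": "ENEG", ":|": "ENEUT"}
POSITIVE_WORDS = {"good", "great", "love"}
NEGATIVE_WORDS = {"bad", "awful", "hate"}
NEGATION_WORDS = {"not", "no", "never"}
INTENSITY_WORDS = {"very", "really", "so"}
STOPWORDS = {"the", "a", "an", "is", "this", "to", "of"}


def replace_word(word):
    """Map a word to its category keyword, or return it unchanged."""
    if word in POSITIVE_WORDS:
        return "POS"
    if word in NEGATIVE_WORDS:
        return "NEG"
    if word in NEGATION_WORDS:
        return "NEGATION"
    if word in INTENSITY_WORDS:
        return "INTENSITY"
    return word


def preprocess(tweet):
    """Return the list of cleaned tokens for one tweet."""
    text = tweet.lower()                        # case folding
    text = re.sub(r"https?://\S+", " ", text)   # URLs
    text = re.sub(r"\brt\b", " ", text)         # retweet tag
    text = re.sub(r"[@#]\w+", " ", text)        # user tags and hashtags
    tokens = []
    for raw in text.split():                    # split() also drops extra blanks
        if raw in EMOTICONS:
            tokens.append(EMOTICONS[raw])
            continue
        word = re.sub(r"[^a-z0-9]", "", raw)    # strip punctuation
        if word and word not in STOPWORDS:
            tokens.append(replace_word(word))
        tokens.extend(["!"] * raw.count("!"))   # keep exclamation marks as tokens
    return tokens
```

For example, `preprocess("RT @bob: This phone is NOT very good!! :) http://t.co/x #win")` yields `["phone", "NEGATION", "INTENSITY", "POS", "!", "!", "EPOS"]`.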
  • 40. Feature Extraction
      • Feature extraction is the process of extracting the main characteristics of the text. For a machine learning algorithm to perform well, it is essential to have features that are descriptive of the text. The total number of occurrences of the following features is taken into account for each tweet:
          › Words
          › Exclamation marks (!)
          › EPOS keyword
          › ENEG keyword
          › ENEUT keyword
          › POS keyword
          › NEG keyword
          › NEGATION keyword
          › INTENSITY keyword
          › Random words (the remaining words, which do not fall into any category)
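A minimal sketch of this per-tweet feature counting follows. It assumes the tweet has already been turned into a token list in which category keywords and exclamation marks appear as separate tokens; the feature names and the choice to count keywords as words are assumptions.

```python
from collections import Counter

KEYWORDS = ("EPOS", "ENEG", "ENEUT", "POS", "NEG", "NEGATION", "INTENSITY")


def extract_features(tokens):
    """Count the features listed on the slide for one tokenised tweet."""
    counts = Counter(tokens)
    features = {
        "words": sum(1 for t in tokens if t != "!"),  # every token except '!'
        "exclamations": counts["!"],
    }
    for keyword in KEYWORDS:
        features[keyword] = counts[keyword]
    # Random words: words that were not replaced by any category keyword.
    features["random_words"] = features["words"] - sum(counts[k] for k in KEYWORDS)
    return features
```

Each tweet thus becomes a fixed-length vector of ten counts, which is the form the later normalisation and clustering steps operate on.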
  • 41.
  • 42. The values of all the features are normalised to the range 0 to 1. The normalised value is given by
      $$\mathrm{Normalised}(e) = \frac{e - E_{\min}}{E_{\max} - E_{\min}}$$
      where $e$ is the original value, $E_{\max}$ is the maximum value of the feature and $E_{\min}$ is the minimum value of the feature.
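The min-max normalisation on this slide amounts to the following small function, shown here as a sketch. Returning zeros for a constant feature (where the maximum equals the minimum) is an assumption, since the slide does not say how that case is handled.

```python
def min_max_normalise(values):
    """Scale one feature column to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant feature: the formula would divide by zero (assumed handling).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

For instance, a feature column `[2, 4, 6]` becomes `[0.0, 0.5, 1.0]`.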
  • 43.
  • 44. Feature reduction is done by computing the cross-correlation between the features. Where two features are closely correlated, one of them is removed from the feature table.
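One way to realise this correlation-based reduction is sketched below, using the Pearson coefficient and a greedy pass over the feature columns. The threshold of 0.9 and the rule of keeping the first of two correlated features are assumptions; the slides do not state either.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: treat as uncorrelated
    return cov / sqrt(var_x * var_y)


def reduce_features(columns, threshold=0.9):
    """Return the feature names to keep, dropping one of each closely correlated pair."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept
```

Here `columns` maps each feature name to its list of values across all tweets; dictionary order decides which feature of a correlated pair survives.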
  • 46. Findings
      • Through this project I have investigated the utility of sentiment classification on a collected dataset.
      • While exploring the topic, I observed that only a limited number of algorithms are useful for Twitter sentiment analysis.
      • Twitter statuses have unique characteristics compared to other corpora. Since they are limited to 140 characters, the usual data mining techniques used for movie reviews, etc. can't be applied.
      • Also, not many research papers were available on feature reduction.
  • 47. Conclusion
      • This project has been a great learning experience in the field of information retrieval and data mining. In this project, a Twitter dataset was collected for the purpose of sentiment analysis. Various data preprocessing techniques were applied to the dataset. Thereafter, features were extracted from each tweet and normalised. Feature reduction was then applied to remove one of each pair of closely related features. The quality of the features extracted from the training dataset affects the performance of the technique. The K-means clustering algorithm and Differential Evolution, an optimisation algorithm, were then applied to cluster the data into two classes, positive and negative. Finally, the accuracies of these two algorithms were compared. On the basis of these accuracies, it can be said that Differential Evolution performs better than the K-means algorithm for the Twitter dataset.
  • 48. Future Work
      • As future work, three more clustering techniques will be applied as part of unsupervised learning: Otsu's Threshold, Fuzzy c-means and the EM algorithm. The next step would be to compare them with supervised learning methods including SVM, Naive Bayes and LDA. The accuracies of the different algorithms will be calculated and compared.
      • A Twitter dataset for a particular product will be collected and opinion mining will be applied.