SlideShare ist ein Scribd-Unternehmen logo
1 von 68
Sentiment Analysis and
Opinion Mining
Introduction
 Sentiment analysis or opinion mining is the computational study of people's
opinions, appraisals, attitudes, and emotions toward entities, individuals,
issues, events, topics and their attributes.
 The task is technically challenging and practically very useful.
– Businesses want to find public or consumer opinions about their products and services.
– Potential customers want to know the opinions of existing users before they use a
service or purchase a product.
 With user generated content on social media (i.e., reviews, forum discussions,
blogs and social networks) on the Web, individuals and organizations are
increasingly using public opinions for their decision making.
Need for Automated Sentiment Analysis
 Finding and monitoring opinion sites on the Web and distilling the information in
them remains a formidable task because of the proliferation of diverse sites.
 Each site typically contains a huge volume of opinionated text that is not always
easily deciphered in long forum postings and blogs.
 The average human reader will have difficulty identifying relevant sites and
accurately summarizing the information and opinions contained in them.
 Human analysis of text information is subject to considerable biases, e.g., people
often pay greater attention to opinions consistent with their own preferences. People
have difficulty, in producing consistent results when the amount of information to
be processed is large.
 Automated opinion mining and summarization systems are needed, as subjective
biases and mental limitations can be overcome with an objective sentiment analysis
system.
Levels of Analysis
 Sentiment analysis is carried out at three levels:
 Document level: The task at this level is to classify whether a whole
opinion document expresses a positive or negative sentiment
– Given a product review, the system determines whether the review expresses an
overall positive or negative opinion about the product. This task is commonly known
as document-level sentiment classification.
 This level of analysis assumes that each document expresses opinions on a
single entity (e.g., a single product).
 It is not applicable to documents which evaluate or compare multiple
entities.
Levels of Analysis
 Sentence level: The analysis goes to the sentences and determines
whether each sentence expressed a positive, negative, or neutral opinion.
 Neutral usually means no opinion. This level of analysis is closely related
to subjectivity classification, which distinguishes sentences that express
factual information from sentences (called objective sentences) Vs. that
express subjective views and opinions(called subjective sentences).
 Subjectivity is not equivalent to sentiment as many objective sentences can
imply opinions, e.g., “We bought the car last month and the windshield
wiper has fallen off.”
 Analysis is done at clause level but the clause level is still not enough, e.g.,
“Apple is doing very well in this lousy economy.”
Levels of Analysis
 Entity or Aspect Level: Document level and the sentence level analyses do
not discover what exactly people liked and did not like (Also called Feature
Level)
 Instead of looking at language constructs (documents, paragraphs, sentences,
clauses or phrases), aspect level directly looks at the opinion itself. Idea is
that an opinion consists of a sentiment (positive or negative) and a target
 An opinion without its target being identified is of limited use. Realizing the
importance of opinion targets also helps to understand the sentiment analysis
problem better.
– Example: “although the service is not that great, I still love this restaurant”
– Has a positive tone, but cannot say it is entirely positive. In fact, the sentence is positive
about the restaurant (emphasized), but negative about its service (not emphasized).
Sentiment Lexicon
 The most important indicators of sentiments are sentiment words, or
opinion words. These are words that are commonly used to express positive
or negative sentiments.
 For example, good, wonderful, and amazing are positive sentiment words,
and bad, poor, and terrible are negative sentiment words.
 There are also phrases and idioms, e.g., cost someone an arm and a leg.
 Sentiment words and phrases are instrumental to sentiment analysis.
 A list of such words and phrases is called a sentiment lexicon (or opinion
lexicon).
 Over the years, researchers have designed numerous algorithms to compile
such lexicons
Issues
 A positive or negative sentiment word may have opposite
orientations in different application domains.
 For example, “suck” usually indicates negative sentiment, e.g.,
“This battery sucks,” but it can also imply positive sentiment, e.g.,
“This vacuum cleaner really sucks (dirt).”
 Sarcastic sentences with or without sentiment words are hard to
deal with, e.g., “What a great car! It stopped working in two days.”
 Sarcasms are not very common in consumer reviews about
products and services, but are very common in other places, eg.
political discussions,
Issues
 A sentence containing sentiment words may not express any
sentiment.
 This happens in Question (interrogative) sentences and conditional
sentences e.g., “Can you tell me which Sony camera is good?” and “If I can
find a good camera in the shop, I will buy it.”
 These sentences contain the sentiment word “good”, but does not
express a positive or negative opinion on any specific camera.
 Not all conditional sentences or interrogative sentences express no
sentiments, e.g., “Does anyone know how to repair this terrible
printer”
Issues
 Many sentences without sentiment words can also imply opinions.
 Many of these are objective sentences that are used to express
some factual information.
 “This washer uses a lot of water” implies a negative sentiment
about the washer since it uses a lot of resource (water).
 “After sleeping on the mattress for two days, a valley has formed
in the middle” expresses a negative opinion about the mattress.
 These sentences are objective as it states a fact. They have no
sentiment words.
Issues
 “(1) I bought an iPhone a few days ago. (2) It was such a nice phone. (3) The touch
screen was really cool. (4) The voice quality was clear too. (5) However, my mother was
mad with me as I did not tell her before I bought it. (6) She also thought the phone was
too expensive, and wanted me to return it to the shop …”
 Sentences (2), (3) and (4) express some positive opinions
 sentences (5) and (6) express negative opinions or emotions
 Targets
– The target of the opinion in sentence (2) is the iPhone as a whole, and the targets of the opinions in
sentences (3) and (4) are touch screen" and voice quality“
– sentence (6) is the price of the iPhone
– target of the opinion/emotion in sentence (5) is me", not iPhone
 Holder of the opinions in sentences (2), (3) and (4) is the author of the review, but in
sentences (5) and (6) it is “my mother”.
Entity: Definition
 An entity e is a product, service, person, event, organization, or
topic. It is associated with a pair, e : (T;W), where T is a hierarchy
of components (or parts), sub-components, and so on, and W is a
set of attributes of e. Each component or sub-component also has
its own set of attributes.
 Samsung Galaxy is an entity. It has a set of components,
– battery and screen,
– It has a set of attributes, voice quality, size, and weight.
– The battery has its own set of attributes, e.g., battery life, and battery size
Entity and Attributes
 Entity is represented as a tree or hierarchy. The root of the tree is
the name of the entity.
 Each non-root node is a component or sub-component of the
entity.
 Each link is a part-of relation.
 Each node is associated with a set of attributes.
 An opinion can be expressed on any node and any attribute of the
node.
 Both components and attributes are combined and called
“Aspects”
Opinion
 An opinion is a positive or negative sentiment, attitude, emotion or
appraisal about an entity or an aspect of the entity from an opinion holder.
Positive, negative and neutral are called opinion orientations (also called
sentiment orientations, semantic orientations, or polarities).
 An opinion is a quintuple, (ei; aij ; ooijkl; hk; tl), where ei is the entity, aij is an
aspect j of ei, ooijkl is the orientation of the opinion about aspect aij, hk is the
opinion holder, and tl is the time when the opinion is expressed by hk.
– ooijkl can be positive, negative or neutral, with different strength/intensity
levels
– quintuple can be regarded as a schema of a database table for analysis
Opinion mining
 Objective : Given a collection of opinion documents D, discover all
opinion quintuples (ei; aij ; ooijkl; hk; tl) in D.
1. Extract all entity expressions in D, and group synonymous entity expressions
into entity clusters. Each entity expression cluster indicates a unique entity ei.
2. Extract all aspect expressions of the entities, and group aspect expressions into
clusters. Each aspect expression cluster of entity ei indicates a unique aspect aij
3. Extract opinion holder and time information from the text or unstructured data.
4. Determine whether each opinion on an aspect is positive, negative or neutral.
5. Produce all opinion quintuples (ei; aij ; ooijkl; hk; tl) expressed in D based on the
results of the above
Example of Extraction
 bigXyz on Nov-4-2010:(1) I bought a Motorola phone and my girlfriend
bought a Nokia phone yesterday. (2) We called each other when we got
home. (3) The voice of my Moto phone was unclear, but the camera was
good. (4) My girlfriend was quite happy with her phone, and its sound
quality. (5) I want a phone with good voice quality. (6) So I probably will
not keep it.
 QUINTIPLES
 (Motorola, voice quality, negative, bigXyz, Nov-4-2010)
 (Motorola, camera, positive, bigXyz, Nov-4-2010)
 (Nokia, GENERAL, positive, bigXyz's girlfriend, Nov-4-2010)
 (Nokia, voice quality, positive, bigXyz's girlfriend, Nov-4-2010)
Two more Definitions
 An objective sentence (sentence 1&2) presents some factual
information about the world, while a subjective sentence expresses
some personal feelings, views or beliefs.
– Subjective expressions come in many forms, e.g., opinions, allegations,
desires, beliefs, suspicions, and speculations.
– A subjective sentence may not contain an opinion (Sentence 5)
– Not every objective sentence contains no opinion. “the earphone broke in
two days", is an objective sentence but it implies a negative sentiment.
Two more Definitions
 Emotions are our subjective feelings and thoughts
– There are 6 primary emotions, i.e., love, joy, surprise, anger, sadness,
and fear, which can be sub-divided into many secondary and tertiary
emotions. Each emotion can also have different intensities.
– The concepts of emotions and opinions are not equivalent.
– Many opinion sentences express no emotion (e.g., “the voice of this
phone is clear”), which are called rational evaluation sentences
– Many emotion sentences give no opinion, (e.g., “I am so surprised to see
you”)
Document Sentiment Classification
 Given an opinionated document d evaluating an entity e, determine
the opinion orientation oo on e, i.e., determine oo on aspect
GENERAL in the quintuple (e;GENERAL; oo; h; t). e, h, and t are
assumed known or irrelevant.
– Also known as the document-level sentiment classification
– Sentiment classification assumes that the opinion document d (e.g., a product
review) expresses opinions on a single entity e and the opinions are from a
single opinion holder h.
– This assumption holds for customer reviews of products and services
because each such review usually focuses on a single product and is written
by a single reviewer.
Classification based on Supervised Learning
 Three classes, positive, negative and neutral.
 Since each review already has a reviewer-assigned rating (e.g., 1-5
stars), training and testing data are readily available.
– A review with 4 or 5 stars is a positive review, a review with 1 or 2 stars
is a negative review and a review with 3 stars is a neutral review.
– Naïve Bayesian classification, and support vector machines (SVM).
– It was shown that using unigrams (a bag of individual words) as features
in classification performed well with either naive Bayesian or SVM.
Feature set for Classification
 Terms and their frequency: individual words or word n-grams and their
frequency counts.
– word positions may also be important.
– TF-IDF weighting scheme.
 Opinion words and phrases: Used to express positive or negative sentiments.
– beautiful, wonderful, good, and amazing are positive opinion words, and bad, poor,
and terrible are negative
– Many opinion words are adjectives and adverbs. Nouns (rubbish, junk, and crap) and
verbs (hate and like) can also indicate opinions.
– There are also opinion phrases and idioms, cost someone an arm and a leg. Opinion
words and phrases are instrumental to sentiment analysis
Feature set for Classification
 Part of speech: adjectives are important indicators of opinions and treated
as special features.
 Negations: Negation words are important because their appearances often
change the opinion orientation.
– “I don't like this camera” is negative.
– Negation words must be handled with care because not all occurrences of such words
mean negation.
– “not” in “not only but also” does not change the orientation direction
 Syntactic dependency: Word dependency based features generated from
parsing or dependency trees
Feature set for Classification
 Manually labeling training data can be time-consuming and label intensive.
 Opinion words can be utilized in the training process.
 Tan et al. used opinion words to label a portion of informative examples
and then learn a new supervised classifier based on labeled ones.
 Opinion words can be utilized to increase the sentiment classification
accuracy.
 Regression can be used for predicting Rating scores (e.g., 1-5 stars)
– the rating scores are ordinal!
 Domain Specificity: A classifier trained using one domain often performs
poorly when it is applied or tested on another domain.
Classification – Unsupervised Learning
 Three Step Process
 Step 1:
 Phrases containing adjectives or adverbs are extracted as
adjectives and adverbs are good indicators of opinions.
– Context is important. “unpredictable" breaking distance of car vs.
“unpredictable” ending of the mystery movie
 The algorithm extracts two consecutive words, where one
member of the pair is an adjective or adverb, and the other
is a context word
Classification – Unsupervised Learning
 Step 2: Estimate the semantic orientation of the extracted phrases using the point-wise
mutual information (PMI) measure
𝑃𝑀𝐼(𝑡1, 𝑡2 ) = 𝑙𝑜𝑔2
𝑃(𝑡1 ∩ 𝑡2)
𝑃 𝑡1 . 𝑃(𝑡2)
 PMI is a measure of the degree of statistical dependence between t1 and t2 and log of this
ratio is the amount of information that we acquire about the presence of one of the words
when we observe the other
 The semantic/opinion orientation (SO) of a phrase is computed based on its association
with the positive reference word “excellent” and its association with the negative
reference word “poor”
SO(Phrase)=PMI(Phrase, “Excellent”) – PMI(Phrase, “Poor”)
 The probabilities are calculated by issuing queries and collecting the number of hits.
Searching the two terms together and separately, we can estimate the probabilities
Classification – Unsupervised Learning
 Step 3: The algorithm computes the average SO of all phrases in a review,
and classifies the review as recommended if the average SO is positive
– Final classification accuracies on reviews from various domains range from 84% for
automobile reviews to 66% for movie reviews.
 Advantage of document level sentiment classification: it provides a
prevailing opinion on an entity, topic or event.
 Shortcomings:
– It does not give details on what people liked and/or disliked and
– It is not easily applicable to non-reviews, e.g., forum and blog postings, because many
such postings evaluate multiple entities and compare them.
Sentence-level Sentiment Classification.
 Document-level sentiment classification techniques can also be applied to
individual sentences.
 The task of classifying a sentence as subjective or objective is often called
subjectivity classification
 The resulting subjective sentences are also classified as expressing positive or
negative opinions
 1. Subjectivity classification: Determine whether s is a subjective sentence or
an objective sentence
 2. Sentence-level sentiment classification: If s is subjective, determine whether
it expresses a positive, negative or neutral opinion.
Assumption
 The sentence expresses a single opinion from a single opinion
holder.
 This assumption is only appropriate for simple sentences with a
single opinion,
– “The picture quality of this camera is amazing.”
 Compound and complex sentences, a single sentence may express
more than one opinion.
– “The picture quality of this camera is amazing and so is the battery life,
but the view finder is too small for such a great camera"
Opinion Lexicon Expansion
 Opinion words: also known as opinion-bearing words or sentiment words.
 Positive opinion words are used to express some desired states while
negative opinion words are used to express some undesired states.
– beautiful, wonderful, good, and amazing.
– bad, poor, and terrible.
 There are also opinion phrases and idioms: “Cost someone an arm and a leg”.
 Collectively, they are called the opinion lexicon. Used for opinion mining.
 Three Approaches: Manual, Dictionary-based, and Corpus-based.
– The manual approach is time-consuming and not usually used alone, but combined
with automated approaches as the check because automated methods make mistakes.
Dictionary based approach
 Bootstrapping using a small set of seed opinion words and an online
dictionary, e.g., WordNet.
 The strategy is to first collect a small set of opinion words manually with
known orientations, and then to grow this set by searching for their
synonyms and antonyms.
 The newly found words are added to the seed list and the next iteration
starts. The iterative process stops when no more new words are found.
 After the process completes, manual inspection can be carried out to remove
and/or correct errors.
Dictionary based approach
 Shortcoming: The approach is unable to find opinion words
with domain and context specific orientations, which is
quite common.
– For example, for a speaker phone, if it is quiet, it is usually
negative. However, for a car, if it is quiet, it is positive.
 The corpus-based approach can help deal with this problem.
Corpus-based approach
 The methods rely on syntactic or co-occurrence patterns and also a seed list of
opinion words to find other opinion words in a large corpus
 The technique starts with a list of seed opinion adjectives, and uses them and a
set of linguistic constraints or conventions on connectives to identify
additional adjective opinion words and their orientations.
 Conjunction “AND”: conjoined adjectives usually have the same orientation.
– “This car is beautiful and spacious”
– "This car is beautiful and difficult to drive“ (AND Conjunction is not usually used)
 Rules or constraints are also designed for other connectives, OR, BUT,
EITHER-OR, and NEITHER-NOR.
 This idea is called sentiment consistency
Corpus-based approach
 Learning is applied to a large corpus to determine if two conjoined
adjectives are of the same or different orientations.
 Same and different-orientation links between adjectives are formed
 Clustering is performed on these to produce two sets of words: positive and
negative.
 Inter-sentential consistency is the idea to neighboring sentences.
 The same opinion orientation (positive or negative) is usually expressed in a
few consecutive sentences.
 Opinion changes are indicated by adversative expressions such as “but” and
“however”.
Corpus-based approach
 Different orientations in different contexts even in the same domain.
– Digital camera: “The battery life is long (+)” ;
– “The time taken to auto-focus is long" (-).
 Consider both possible opinion words and aspects together, and use
 the pair (aspect, opinion word) as the opinion context, (battery life", long").
 This determines opinion words and their orientations together with the
aspects that they modify.
 Can be used to analyze comparative sentences.
 Many contexts can be more complex, consuming a large amount of
resources.
Aspect-Based Sentiment Analysis
 In a typical opinionated document, the author writes both positive and
negative aspects of the entity, although the general sentiment on the entity
may be positive or negative. Document and sentence sentiment
classification does not provide such information.
 Aspect-based sentiment analysis needs to be used
 At the aspect level, the mining objective is to discover every quintuple (ei;
aij ; ooijkl; hk; tl) in a given document d.
 To achieve the objective, five tasks need to be performed.
Aspect extraction
 Extract aspects that have been evaluated.
– “The picture quality of this camera is amazing,” the aspect is “picture
quality" of the entity represented by “this camera”. The evaluation is not
about the camera as a whole, but about its picture quality.
– The sentence “I love this camera” evaluates the camera as a whole, i.e.,
the GENERAL aspect of the entity represented by “this camera”.
 Whenever we talk about an aspect, we must know which entity it
belongs to.
 It is a Two-step Process
Aspect extraction
 1. Find frequent nouns and noun phrases.
 Nouns and noun phrases (or groups) are identified by a POS tagger; the
frequencies are counted; and only the frequent ones are kept.
 A frequency threshold can be decided experimentally.
 When people comment on different aspects of a product, the vocabulary
that they use usually converges. The nouns that are frequently talked about
are usually genuine and important aspects.
 Irrelevant contents in reviews are often diverse, i.e., they are quite different
in different reviews. These are infrequent nouns
Aspect extraction
 2. Find infrequent aspects by exploiting the relationships between
aspects and opinion words.
– The previous step can miss many genuine aspect expressions which are infrequent.
This step tries to find some of them.
 The same opinion word can be used to describe or modify different
aspects. Opinion words that modify frequent aspects can also
modify infrequent aspects, and thus can be used to extract
infrequent aspects.
– For example, “picture” has been found to be a frequent aspect, and we have the
sentence, “The pictures are absolutely amazing.”
– “software“ can also be extracted as an aspect from the following sentence, “The
software is amazing.”
Aspect extraction
 Point-wise mutual information (PMI) score between the phrase and some
meronymy discriminators associated with the product class can be used.
 The meronymy discriminators for the “scanner” class are, “of scanner”,
“scanner has”, “scanner comes with”, etc., which are used
 To find components or parts of scanners by searching the Web.
 𝑃𝑀𝐼 𝑎, 𝑑 =
ℎ𝑖𝑡𝑠(𝑎∩𝑑)
ℎ𝑖𝑡𝑠 𝑎 .ℎ𝑖𝑡𝑠(𝑑)
 If the PMI value of a candidate aspect is too low, it may not be a
component of the product because a and d do not co-occur frequently.
Aspect sentiment classification
 Determine whether the opinions on different aspects are positive,
negative or neutral. In the first example below, the opinion on the
“picture quality" aspect is positive, and in the second example, the
opinion on the GENERAL aspect is also positive.
 “The picture quality of this camera is amazing," the aspect is “picture
quality" of the entity represented by “this camera". does not indicate the
GENERAL aspect because the evaluation is not about the camera as a
whole, but about its picture quality.
 “I love this camera" evaluates the camera as a whole, i.e., the GENERAL
aspect of the entity represented by “this camera".
Lexicon-based Approach
 Uses an opinion lexicon, - a list of opinion words and phrases, and a
set of rules to determine the orientations of opinions in a sentence
 It also considers opinion shifters and “but-clauses”.
 4 steps
 1. Mark opinion words and phrases: Given a sentence that contains
one or more aspects, this step marks all opinion words and phrases in
the sentence.
– Each positive word is assigned the opinion score of +1, each negative word is
assigned the opinion score of -1
Lexicon-based Approach
 2. Handle opinion shifters: Opinion shifters are words and phrases that can
shift or change opinion orientations.
– Negation words like not, never, none, nobody, nowhere, neither and cannot are the
most common type.
 Sarcasm changes orientation
– “What a great car, it failed to start the first day.”
 Spotting them and handling them correctly in actual sentences by an
automated system is not easy.
 Not every appearance of an opinion shifter changes the opinion orientation
– “not only … but also”
Lexicon-based Approach
 3. Handle but-clauses:
 In English, but means contrary.
 A sentence containing but is handled by applying the following rule:
– The opinion orientation before but and after but are opposite to each other if
the opinion on one side cannot be determined.
 “not only but also” (needs to be handled separately).
 There are contrary words and phrases that do not always indicate an
opinion change
– “Audi is great, but Mercedes is better".
 Such cases need to be identified and dealt with separately.
Lexicon-based Approach
 3. Aggregating opinions: This step applies an opinion aggregation function to
the resulting opinion scores to determine the final orientation of the opinion
on each aspect in the sentence.
 Consider a sentence S, which contains a set of aspects {a1 … am} and a set of
opinion words or phrases {ow1 : : : own} with their opinion scores. The
opinion orientation for each aspect ai in S is
 𝑆𝑐𝑜𝑟𝑒 𝑎𝑖, 𝑆 = 𝑜𝑤 𝑗∈𝑆
𝑜𝑤 𝑗.𝑜𝑜
𝐷𝑖𝑠𝑡(𝑜𝑤 𝑗,𝑎 𝑖)
– where owj is an opinion word/phrase in s, dist (owj ; ai) is the distance between aspect
ai and opinion word owj in S.
– owj.oo is the opinion score of owj. Gives lower weights to opinion words that are far
away from aspect ai.
Simultaneous Opinion Lexicon Expansion
and Aspect Extraction
 Needs an initial set of opinion word seeds as the input (no seed aspects)
 Opinions almost always have targets and there are natural relations
connecting opinion words and targets in a sentence
– Opinion words have relations among themselves and so do targets among themselves.
 The opinion targets are aspects. Opinion words can be recognized by
identified aspects, and aspects can be identified by known opinion words.
– The extracted opinion words and aspects are utilized to identify new opinion words
and new aspects, which are used again to extract more opinion words and aspects.
– Propagation stops when no more opinion words or aspects can be found.
Dependency grammar
 Dependency grammar was adopted to describe the relations. The Algorithm
uses only direct dependencies to model the relations.
– A direct dependency indicates that one word depends on the other word without any
additional words in their dependency path or they both depend on a third word
directly.
 Some constraints are also imposed. Opinion words are considered to be
adjectives and aspects nouns or noun phrases.
– “Canon G3 produces great pictures”, the adjective “great” is parsed as directly
depending on the noun “pictures". “great" is an opinion word and given the rule `a
noun on which an opinion word directly depends is taken as an aspect', we can extract
“pictures” as an aspect. Similarly, “pictures” is an aspect, “great” as an opinion word
using a similar rule.
Mining Comparative Opinions
 A comparative sentence expresses a relation based on similarities or
differences of more than one entity.
– The comparison is usually conveyed using the comparative or superlative form of an
adjective or adverb.
 A comparative sentence typically states that one entity has more or less of a
certain attribute than another entity.
– A superlative sentence states that one entity has the most or least of a certain attribute
among a set of similar entities.
 A comparison can be between two or more entities, groups of entities, and
one entity and the rest of the entities. It can also be between versions.
Types of Comparatives and Superlatives
 Comparatives are usually formed by adding the suffix “-er” and superlatives
are formed by adding the suffix “-est” to their base adjectives and adverbs.
– “longer” in “The battery life of Camera-x is longer than that of Camera-y”, longest“ in
“The battery life of this camera is the longest",
– This type of comparatives and superlatives are called Type 1
 Some adjectives and adverbs form comparatives or superlatives by using
words like more, most, less and least before such words (more beautiful)
– These are type 2. Types 1 and 2 are called regular comparatives and superlatives
 Irregular comparatives and superlatives, i.e., more, less, least, better, best,
– Grouped under Type 1 (based on the behavior)
 Words like “superior”, “preferred” are also grouped under Type 1
Types of comparative relations
 Four types
 1. Non-equal gradable comparisons: Type “greater or less than” that express
an ordering of some entities with regard to some of their shared aspects
– “The Intel chip is faster than that of AMD”. “I prefer Intel to AMD”.
 2. Equative comparisons: Type equal to that state two or more entities are
equal with regard to some of their shared aspects
– “The performance of Samsung is about the same as that of LG.”
 3. Superlative comparisons: type greater or less than all others that rank one
entity over all others,
– “The Intel chip is the fastest”.
Types of comparative relations
 Comparative words used in non-equal gradable comparisons are categorized
into two groups according to whether they express increased or decreased
quantities, which are useful in opinion analysis.
– Increasing comparatives: Such a comparative expresses an increased quantity, e.g.,
more and longer.
– Decreasing comparatives: Such a comparative expresses a decreased quantity, e.g.,
less and fewer.
Types of comparative relations
 4. Non-gradable comparisons: Relations that compare aspects of two or more
entities, but do not grade them.
 There are three main sub-types:
 Entity A is similar to or different from entity B with regard to some of their
shared aspects, “Coke tastes differently from Pepsi.”
 Entity A has aspect a1, and entity B has aspect a2 (They are usually
substitutable), “Desktop PCs use external speakers but laptops use internal
speakers.”
 Entity A has aspect a, but entity B does not have, e.g., “Phone-x has an
earphone, but Phone-y does not have.”
Objective of mining comparative opinions
 Given a collection of opinionated documents D,
– discover in D all comparative opinion sextuples of the form
(E1;E2; A; PE; h; t)
– where E1 and E2 are the entity sets being compared based on their
shared aspects A
– Entities in E1 appear before entities in E2 in the sentence),
– PE(∈ {E1;E2}) is the preferred entity set of the opinion holder h,
– t is the time when the comparative opinion is expressed.
 These sextuples can be mined
Example
 “Ipad's display is better than those of Galaxy and Surface." written by Vish
in Feb 2016.
 The extracted comparative opinion is:
– ({Ipad}, {Galaxy, Surface}, {display}, preferred: {Ipad}, John, Feb 2016)
 The entity set E1 is {Ipad}, the entity set E2 is {Galaxy, Surface},
 Their shared aspect set A being compared is {display},
 The preferred entity set is {Ipad},
 The opinion holder h is John
 The time t when this comparative opinion was written is Feb 2016.
Case: Sentiment Analysis-Hybrid
Approach
 Combined rule-based classification, supervised learning and
machine learning to form a hybrid method.
 Tested on movie reviews, product reviews and MySpace comments.
 Hybrid classification can improve the classification effectiveness in
terms of micro- and macro-averaged F1.
 F1 is a measure that takes both the precision and recall of a
classifier’s effectiveness into account
Evaluation Metrics
 Precision(P) =
tp
tp+f p
; Recall(R) =
tp
tp+fn
;
 Accuracy(A) =
tp+tn
tp+tn+f p+fn
; F1 =
2·P·R
P+R
Machine says yes Machine says no
human says yes tp fn
human says no fp tn
Evaluation Metrics
 1. Micro averaging.
– Given a set of confusion tables, a new two-by-two contingency table is generated.
– Each cell in the new table represents the sum of the number of documents from
within the set of tables.
– Given the new table, the average performance of an automatic classifier, in terms of
its precision and recall, is measured.
 2. Macro averaging.
– Given a set of confusion tables, a set of values are generated.
– Each value represents the precision or recall of an automatic classifier
– Given these values, the average performance of an automatic classifier, in terms of its
precision and recall, is measured
Rule Based Classification
 A rule consists of an antecedent and its associated consequent that have an
‘if-then ’relation: antecedent  consequent
– An antecedent is a condition: one or more tokens concatenated by the ^ operator.
– A token can be a word, ‘?’ representing a proper noun, or ‘#’ representing a target term.
– A target term is a term that represents the context in which a set of documents occurs,
such as the name of a politician, a policy recommendation, a company name, a brand of
a product or a movie title.
 A consequent represents a sentiment that is either positive or negative, and is
the result of meeting the condition defined by the antecedent.
– {token1 ^ token2 ^ . . . ^ tokenn} =) {+|−}
+ is positive sentiment; - is negative sentiment
Comparative Statements
 1. Laptop-A is more expensive than Laptop-B.
 2. Laptop-A is more expensive than Laptop-C.
 Target word of these sentences is Laptop-A. The rule derived is:
– {# ^ more ^ expensive ^ than^?} =) {−}
– The target word, Laptop-A is less favorable than the other two laptops due to its
price. Focus is on the price attribute of the Laptop-A.
 Target words are Laptop-B and Laptop-C. The rule derived is:
– {? ^ more ^ expensive ^ than ^ #} =) {+}
– The two target words, Laptop-B and Laptop-C are more favorable than the Laptop-A
due to its price. Focus is on the price attribute of both the Laptop-B and Laptop-C.
 Target word is crucial factor in determining the sentiment of an antecedent
General Inquirer Based Classifier (GIBC)
 The first, simplest rule set was based on 3672 pre-classified words
found in the General Inquirer Lexicon (Stone et al. 1966),
 1598 of which were pre-classified as positive and 2074 of which
were pre-classified as negative.
 Here, each rule depends solely on one sentiment bearing word
representing an antecedent.
 A General Inquirer Based Classifier (GIBC) was implemented which
applied the rule set to classify document collections.
Calculation of “Closeness”
 1. Select 120 positive words, such as amazing, awesome, beautiful, and 120 negative
words, such as absurd, angry, anguish, from the General Inquirer Lexicon.
 2. Compose 240 search engine queries per antecedent; each query combines an antecedent
and a sentiment bearing word.
 3. Collect the hit counts of all queries by using the Google and Yahoo search engines. Two
search engines were used to determine whether the hit counts were influenced by the
coverage and accuracy level of a single search engine. For each query, the search engines
return the hit count of a number of Web pages that contains both the antecedent and a
sentiment bearing word. The proximity of the antecedent and word is at the page level.
 A better level of precision may be obtained if the proximity checking can be carried out at
the sentence level.
 This would lead to an ethical issue, however, because each page has to be downloaded and
stored locally for further analysis.
Calculation of “Closeness”
 4. Collect the hit counts of each sentiment-bearing word and each antecedent.
 5. Use 4 closeness measures to measure the closeness between each antecedent
and 120 positive words (S+) and between each antecedent and 120 negative
words (S−) based on all the hit counts collected.
 𝑆+
= 𝑖=1
120
𝑐𝑙𝑜𝑠𝑒𝑛𝑒𝑠𝑠 (𝑎𝑛𝑡𝑖𝑐𝑖𝑑𝑒𝑛𝑡, 𝑤𝑜𝑟𝑑𝑖
+
)
 𝑆−
= 𝑖=1
120
𝑐𝑙𝑜𝑠𝑒𝑛𝑒𝑠𝑠 (𝑎𝑛𝑡𝑖𝑐𝑖𝑑𝑒𝑛𝑡, 𝑤𝑜𝑟𝑑𝑖
−
)
 If the antecedent co-occurs more frequently with the 120 positive words (S+ >
S−), then this would mean that the antecedent has a positive consequent and
vice versa.
Measures of Closeness
 Document Frequency (DF). counts the number of Web pages containing a pair
of an antecedent and a sentiment bearing word, i.e., the hit count returned by a
search engine. The larger a DF value, the greater the association strength
between antecedent and word.
 The other measures of closeness are
 Mutual Information (MI) = 𝑙𝑜𝑔2
𝑃 𝑤𝑜𝑟𝑑,𝑎𝑛𝑡𝑒𝑐𝑒𝑛𝑑𝑒𝑛𝑡
𝑃 𝑤𝑜𝑟𝑑 .𝑃(𝐴𝑛𝑡𝑒𝑐𝑒𝑑𝑒𝑛𝑡)
 Chi-Square
 Log Likelihood Ratio
Classifiers Used
 General Inquirer Based Classifier (GIBC)
 Rule-Based Classifier (RBC)
 Statistics Based Classifier (SBC)
 Mutual Information (MI).
 Chi-square (χ2)
 Induction Rule Based Classifier (IRBC)
 Support Vector Machines
Sample Datasets
The evaluation results of the closeness measures – Based on Search Engine
 Questions?

Weitere ähnliche Inhalte

Was ist angesagt?

Basics of Electric circuit theory
Basics of Electric circuit theoryBasics of Electric circuit theory
Basics of Electric circuit theoryMohsin Mulla
 
Data mining :Concepts and Techniques Chapter 2, data
Data mining :Concepts and Techniques Chapter 2, dataData mining :Concepts and Techniques Chapter 2, data
Data mining :Concepts and Techniques Chapter 2, dataSalah Amean
 
Query Processing and Optimisation - Lecture 10 - Introduction to Databases (1...
Query Processing and Optimisation - Lecture 10 - Introduction to Databases (1...Query Processing and Optimisation - Lecture 10 - Introduction to Databases (1...
Query Processing and Optimisation - Lecture 10 - Introduction to Databases (1...Beat Signer
 
Fundamentals of data mining and its applications
Fundamentals of data mining and its applicationsFundamentals of data mining and its applications
Fundamentals of data mining and its applicationsSubrat Swain
 
5.3 mining sequential patterns
5.3 mining sequential patterns5.3 mining sequential patterns
5.3 mining sequential patternsKrish_ver2
 
introduction to data mining tutorial
introduction to data mining tutorial introduction to data mining tutorial
introduction to data mining tutorial Salah Amean
 
Data pre processing
Data pre processingData pre processing
Data pre processingpommurajopt
 

Was ist angesagt? (15)

Basics of Electric circuit theory
Basics of Electric circuit theoryBasics of Electric circuit theory
Basics of Electric circuit theory
 
Data mining :Concepts and Techniques Chapter 2, data
Data mining :Concepts and Techniques Chapter 2, dataData mining :Concepts and Techniques Chapter 2, data
Data mining :Concepts and Techniques Chapter 2, data
 
Data Preprocessing
Data PreprocessingData Preprocessing
Data Preprocessing
 
Query Processing and Optimisation - Lecture 10 - Introduction to Databases (1...
Query Processing and Optimisation - Lecture 10 - Introduction to Databases (1...Query Processing and Optimisation - Lecture 10 - Introduction to Databases (1...
Query Processing and Optimisation - Lecture 10 - Introduction to Databases (1...
 
dbms notes.ppt
dbms notes.pptdbms notes.ppt
dbms notes.ppt
 
Fundamentals of data mining and its applications
Fundamentals of data mining and its applicationsFundamentals of data mining and its applications
Fundamentals of data mining and its applications
 
Datawarehouse and OLAP
Datawarehouse and OLAPDatawarehouse and OLAP
Datawarehouse and OLAP
 
5.3 mining sequential patterns
5.3 mining sequential patterns5.3 mining sequential patterns
5.3 mining sequential patterns
 
Graph theory
Graph theoryGraph theory
Graph theory
 
Text mining
Text miningText mining
Text mining
 
Chapter 1: Introduction to Data Mining
Chapter 1: Introduction to Data MiningChapter 1: Introduction to Data Mining
Chapter 1: Introduction to Data Mining
 
5desc
5desc5desc
5desc
 
introduction to data mining tutorial
introduction to data mining tutorial introduction to data mining tutorial
introduction to data mining tutorial
 
Data pre processing
Data pre processingData pre processing
Data pre processing
 
OLAP technology
OLAP technologyOLAP technology
OLAP technology
 

Ähnlich wie Sentiment analysis and opinion mining

SENTIMENT ANALYSIS-AN OBJECTIVE VIEW
SENTIMENT ANALYSIS-AN OBJECTIVE VIEWSENTIMENT ANALYSIS-AN OBJECTIVE VIEW
SENTIMENT ANALYSIS-AN OBJECTIVE VIEWJournal For Research
 
Business intelligence analytics using sentiment analysis-a survey
Business intelligence analytics using sentiment analysis-a surveyBusiness intelligence analytics using sentiment analysis-a survey
Business intelligence analytics using sentiment analysis-a surveyIJECEIAES
 
Dictionary Based Approach to Sentiment Analysis - A Review
Dictionary Based Approach to Sentiment Analysis - A ReviewDictionary Based Approach to Sentiment Analysis - A Review
Dictionary Based Approach to Sentiment Analysis - A ReviewINFOGAIN PUBLICATION
 
Sentiment of Sentence in Tweets: A Review
Sentiment of Sentence in Tweets: A ReviewSentiment of Sentence in Tweets: A Review
Sentiment of Sentence in Tweets: A Reviewiosrjce
 
Aspect-Level Sentiment Analysis On Hotel Reviews
Aspect-Level Sentiment Analysis On Hotel ReviewsAspect-Level Sentiment Analysis On Hotel Reviews
Aspect-Level Sentiment Analysis On Hotel ReviewsKimberly Pulley
 
A NOVEL APPROACH FOR TWITTER SENTIMENT ANALYSIS USING HYBRID CLASSIFIER
A NOVEL APPROACH FOR TWITTER SENTIMENT ANALYSIS USING HYBRID CLASSIFIERA NOVEL APPROACH FOR TWITTER SENTIMENT ANALYSIS USING HYBRID CLASSIFIER
A NOVEL APPROACH FOR TWITTER SENTIMENT ANALYSIS USING HYBRID CLASSIFIERIRJET Journal
 
Multiple Methods and Techniques in Analyzing Computer-Supported Collaborative...
Multiple Methods and Techniques in Analyzing Computer-Supported Collaborative...Multiple Methods and Techniques in Analyzing Computer-Supported Collaborative...
Multiple Methods and Techniques in Analyzing Computer-Supported Collaborative...CITE
 
opinionminingkavitahyunduk00-110407113230-phpapp01.ppt
opinionminingkavitahyunduk00-110407113230-phpapp01.pptopinionminingkavitahyunduk00-110407113230-phpapp01.ppt
opinionminingkavitahyunduk00-110407113230-phpapp01.pptssuser059331
 
opinionminingkavitahyunduk00-110407113230-phpapp01.ppt
opinionminingkavitahyunduk00-110407113230-phpapp01.pptopinionminingkavitahyunduk00-110407113230-phpapp01.ppt
opinionminingkavitahyunduk00-110407113230-phpapp01.pptssuser059331
 
Design of Automated Sentiment or Opinion Discovery System to Enhance Its Perf...
Design of Automated Sentiment or Opinion Discovery System to Enhance Its Perf...Design of Automated Sentiment or Opinion Discovery System to Enhance Its Perf...
Design of Automated Sentiment or Opinion Discovery System to Enhance Its Perf...idescitation
 
Sentiment Analysis Using Hybrid Approach: A Survey
Sentiment Analysis Using Hybrid Approach: A SurveySentiment Analysis Using Hybrid Approach: A Survey
Sentiment Analysis Using Hybrid Approach: A SurveyIJERA Editor
 
Sentence level sentiment polarity calculation for customer reviews by conside...
Sentence level sentiment polarity calculation for customer reviews by conside...Sentence level sentiment polarity calculation for customer reviews by conside...
Sentence level sentiment polarity calculation for customer reviews by conside...eSAT Publishing House
 
Frame-based Sentiment Analysis with Sentilo
Frame-based Sentiment Analysis with SentiloFrame-based Sentiment Analysis with Sentilo
Frame-based Sentiment Analysis with SentiloValentina Presutti
 
Fake Product Review Monitoring & Removal and Sentiment Analysis of Genuine Re...
Fake Product Review Monitoring & Removal and Sentiment Analysis of Genuine Re...Fake Product Review Monitoring & Removal and Sentiment Analysis of Genuine Re...
Fake Product Review Monitoring & Removal and Sentiment Analysis of Genuine Re...Dr. Amarjeet Singh
 
An Improved sentiment classification for objective word.
An Improved sentiment classification for objective word.An Improved sentiment classification for objective word.
An Improved sentiment classification for objective word.IJSRD
 

Ähnlich wie Sentiment analysis and opinion mining (20)

Sentiment analysis
Sentiment analysisSentiment analysis
Sentiment analysis
 
SENTIMENT ANALYSIS-AN OBJECTIVE VIEW
SENTIMENT ANALYSIS-AN OBJECTIVE VIEWSENTIMENT ANALYSIS-AN OBJECTIVE VIEW
SENTIMENT ANALYSIS-AN OBJECTIVE VIEW
 
Business intelligence analytics using sentiment analysis-a survey
Business intelligence analytics using sentiment analysis-a surveyBusiness intelligence analytics using sentiment analysis-a survey
Business intelligence analytics using sentiment analysis-a survey
 
Dictionary Based Approach to Sentiment Analysis - A Review
Dictionary Based Approach to Sentiment Analysis - A ReviewDictionary Based Approach to Sentiment Analysis - A Review
Dictionary Based Approach to Sentiment Analysis - A Review
 
Sentiment of Sentence in Tweets: A Review
Sentiment of Sentence in Tweets: A ReviewSentiment of Sentence in Tweets: A Review
Sentiment of Sentence in Tweets: A Review
 
W01761157162
W01761157162W01761157162
W01761157162
 
H046025258
H046025258H046025258
H046025258
 
Aspect-Level Sentiment Analysis On Hotel Reviews
Aspect-Level Sentiment Analysis On Hotel ReviewsAspect-Level Sentiment Analysis On Hotel Reviews
Aspect-Level Sentiment Analysis On Hotel Reviews
 
A NOVEL APPROACH FOR TWITTER SENTIMENT ANALYSIS USING HYBRID CLASSIFIER
A NOVEL APPROACH FOR TWITTER SENTIMENT ANALYSIS USING HYBRID CLASSIFIERA NOVEL APPROACH FOR TWITTER SENTIMENT ANALYSIS USING HYBRID CLASSIFIER
A NOVEL APPROACH FOR TWITTER SENTIMENT ANALYSIS USING HYBRID CLASSIFIER
 
Multiple Methods and Techniques in Analyzing Computer-Supported Collaborative...
Multiple Methods and Techniques in Analyzing Computer-Supported Collaborative...Multiple Methods and Techniques in Analyzing Computer-Supported Collaborative...
Multiple Methods and Techniques in Analyzing Computer-Supported Collaborative...
 
opinionminingkavitahyunduk00-110407113230-phpapp01.ppt
opinionminingkavitahyunduk00-110407113230-phpapp01.pptopinionminingkavitahyunduk00-110407113230-phpapp01.ppt
opinionminingkavitahyunduk00-110407113230-phpapp01.ppt
 
opinionminingkavitahyunduk00-110407113230-phpapp01.ppt
opinionminingkavitahyunduk00-110407113230-phpapp01.pptopinionminingkavitahyunduk00-110407113230-phpapp01.ppt
opinionminingkavitahyunduk00-110407113230-phpapp01.ppt
 
Design of Automated Sentiment or Opinion Discovery System to Enhance Its Perf...
Design of Automated Sentiment or Opinion Discovery System to Enhance Its Perf...Design of Automated Sentiment or Opinion Discovery System to Enhance Its Perf...
Design of Automated Sentiment or Opinion Discovery System to Enhance Its Perf...
 
Web Opinion Mining
Web Opinion MiningWeb Opinion Mining
Web Opinion Mining
 
Sentiment Analysis Using Hybrid Approach: A Survey
Sentiment Analysis Using Hybrid Approach: A SurveySentiment Analysis Using Hybrid Approach: A Survey
Sentiment Analysis Using Hybrid Approach: A Survey
 
Sentence level sentiment polarity calculation for customer reviews by conside...
Sentence level sentiment polarity calculation for customer reviews by conside...Sentence level sentiment polarity calculation for customer reviews by conside...
Sentence level sentiment polarity calculation for customer reviews by conside...
 
Frame-based Sentiment Analysis with Sentilo
Frame-based Sentiment Analysis with SentiloFrame-based Sentiment Analysis with Sentilo
Frame-based Sentiment Analysis with Sentilo
 
Fake Product Review Monitoring & Removal and Sentiment Analysis of Genuine Re...
Fake Product Review Monitoring & Removal and Sentiment Analysis of Genuine Re...Fake Product Review Monitoring & Removal and Sentiment Analysis of Genuine Re...
Fake Product Review Monitoring & Removal and Sentiment Analysis of Genuine Re...
 
An Improved sentiment classification for objective word.
An Improved sentiment classification for objective word.An Improved sentiment classification for objective word.
An Improved sentiment classification for objective word.
 
Sentiment analysis on_unstructured_review-1
Sentiment analysis on_unstructured_review-1Sentiment analysis on_unstructured_review-1
Sentiment analysis on_unstructured_review-1
 

Mehr von Sumit Sony

Web structure mining
Web structure miningWeb structure mining
Web structure miningSumit Sony
 
Web content mining
Web content miningWeb content mining
Web content miningSumit Sony
 
Text mining introduction-1
Text mining   introduction-1Text mining   introduction-1
Text mining introduction-1Sumit Sony
 
Basic techniques in nlp
Basic techniques in nlpBasic techniques in nlp
Basic techniques in nlpSumit Sony
 
Web usage mining
Web usage miningWeb usage mining
Web usage miningSumit Sony
 

Mehr von Sumit Sony (7)

Web structure mining
Web structure miningWeb structure mining
Web structure mining
 
Web mining
Web miningWeb mining
Web mining
 
Web content mining
Web content miningWeb content mining
Web content mining
 
Text mining introduction-1
Text mining   introduction-1Text mining   introduction-1
Text mining introduction-1
 
Basic techniques in nlp
Basic techniques in nlpBasic techniques in nlp
Basic techniques in nlp
 
Web usage mining
Web usage miningWeb usage mining
Web usage mining
 
Deep learning
Deep learningDeep learning
Deep learning
 

Kürzlich hochgeladen

Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxolyaivanovalion
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...amitlee9823
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 

Kürzlich hochgeladen (20)

Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptx
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 

Sentiment analysis and opinion mining

  • 2. Introduction  Sentiment analysis or opinion mining is the computational study of people's opinions, appraisals, attitudes, and emotions toward entities, individuals, issues, events, topics and their attributes.  The task is technically challenging and practically very useful. – Businesses want to find public or consumer opinions about their products and services. – Potential customers want to know the opinions of existing users before they use a service or purchase a product.  With user generated content on social media (i.e., reviews, forum discussions, blogs and social networks) on the Web, individuals and organizations are increasingly using public opinions for their decision making.
  • 3. Need for Automated Sentiment Analysis  Finding and monitoring opinion sites on the Web and distilling the information in them remains a formidable task because of the proliferation of diverse sites.  Each site typically contains a huge volume of opinionated text that is not always easily deciphered in long forum postings and blogs.  The average human reader will have difficulty identifying relevant sites and accurately summarizing the information and opinions contained in them.  Human analysis of text information is subject to considerable biases, e.g., people often pay greater attention to opinions consistent with their own preferences. People have difficulty, in producing consistent results when the amount of information to be processed is large.  Automated opinion mining and summarization systems are needed, as subjective biases and mental limitations can be overcome with an objective sentiment analysis system.
  • 4. Levels of Analysis  Sentiment analysis is carried out at three levels:  Document level: The task at this level is to classify whether a whole opinion document expresses a positive or negative sentiment – Given a product review, the system determines whether the review expresses an overall positive or negative opinion about the product. This task is commonly known as document-level sentiment classification.  This level of analysis assumes that each document expresses opinions on a single entity (e.g., a single product).  It is not applicable to documents which evaluate or compare multiple entities.
  • 5. Levels of Analysis  Sentence level: The analysis goes to the sentences and determines whether each sentence expressed a positive, negative, or neutral opinion.  Neutral usually means no opinion. This level of analysis is closely related to subjectivity classification, which distinguishes sentences that express factual information from sentences (called objective sentences) Vs. that express subjective views and opinions(called subjective sentences).  Subjectivity is not equivalent to sentiment as many objective sentences can imply opinions, e.g., “We bought the car last month and the windshield wiper has fallen off.”  Analysis is done at clause level but the clause level is still not enough, e.g., “Apple is doing very well in this lousy economy.”
  • 6. Levels of Analysis  Entity or Aspect Level: Document level and the sentence level analyses do not discover what exactly people liked and did not like (Also called Feature Level)  Instead of looking at language constructs (documents, paragraphs, sentences, clauses or phrases), aspect level directly looks at the opinion itself. Idea is that an opinion consists of a sentiment (positive or negative) and a target  An opinion without its target being identified is of limited use. Realizing the importance of opinion targets also helps to understand the sentiment analysis problem better. – Example: “although the service is not that great, I still love this restaurant” – Has a positive tone, but cannot say it is entirely positive. In fact, the sentence is positive about the restaurant (emphasized), but negative about its service (not emphasized).
  • 7. Sentiment Lexicon  The most important indicators of sentiments are sentiment words, or opinion words. These are words that are commonly used to express positive or negative sentiments.  For example, good, wonderful, and amazing are positive sentiment words, and bad, poor, and terrible are negative sentiment words.  There are also phrases and idioms, e.g., cost someone an arm and a leg.  Sentiment words and phrases are instrumental to sentiment analysis.  A list of such words and phrases is called a sentiment lexicon (or opinion lexicon).  Over the years, researchers have designed numerous algorithms to compile such lexicons
  • 8. Issues  A positive or negative sentiment word may have opposite orientations in different application domains.  For example, “suck” usually indicates negative sentiment, e.g., “This battery sucks,” but it can also imply positive sentiment, e.g., “This vacuum cleaner really sucks (dirt).”  Sarcastic sentences with or without sentiment words are hard to deal with, e.g., “What a great car! It stopped working in two days.”  Sarcasms are not very common in consumer reviews about products and services, but are very common in other places, eg. political discussions,
  • 9. Issues  A sentence containing sentiment words may not express any sentiment.  This happens in Question (interrogative) sentences and conditional sentences e.g., “Can you tell me which Sony camera is good?” and “If I can find a good camera in the shop, I will buy it.”  These sentences contain the sentiment word “good”, but does not express a positive or negative opinion on any specific camera.  Not all conditional sentences or interrogative sentences express no sentiments, e.g., “Does anyone know how to repair this terrible printer”
  • 10. Issues  Many sentences without sentiment words can also imply opinions.  Many of these are objective sentences that are used to express some factual information.  “This washer uses a lot of water” implies a negative sentiment about the washer since it uses a lot of resource (water).  “After sleeping on the mattress for two days, a valley has formed in the middle” expresses a negative opinion about the mattress.  These sentences are objective as it states a fact. They have no sentiment words.
  • 11. Issues  “(1) I bought an iPhone a few days ago. (2) It was such a nice phone. (3) The touch screen was really cool. (4) The voice quality was clear too. (5) However, my mother was mad with me as I did not tell her before I bought it. (6) She also thought the phone was too expensive, and wanted me to return it to the shop …”  Sentences (2), (3) and (4) express some positive opinions  sentences (5) and (6) express negative opinions or emotions  Targets – The target of the opinion in sentence (2) is the iPhone as a whole, and the targets of the opinions in sentences (3) and (4) are touch screen" and voice quality“ – sentence (6) is the price of the iPhone – target of the opinion/emotion in sentence (5) is me", not iPhone  Holder of the opinions in sentences (2), (3) and (4) is the author of the review, but in sentences (5) and (6) it is “my mother”.
  • 12. Entity: Definition  An entity e is a product, service, person, event, organization, or topic. It is associated with a pair, e : (T;W), where T is a hierarchy of components (or parts), sub-components, and so on, and W is a set of attributes of e. Each component or sub-component also has its own set of attributes.  Samsung Galaxy is an entity. It has a set of components, – battery and screen, – It has a set of attributes, voice quality, size, and weight. – The battery has its own set of attributes, e.g., battery life, and battery size
  • 13. Entity and Attributes  Entity is represented as a tree or hierarchy. The root of the tree is the name of the entity.  Each non-root node is a component or sub-component of the entity.  Each link is a part-of relation.  Each node is associated with a set of attributes.  An opinion can be expressed on any node and any attribute of the node.  Both components and attributes are combined and called “Aspects”
  • 14. Opinion  An opinion is a positive or negative sentiment, attitude, emotion or appraisal about an entity or an aspect of the entity from an opinion holder. Positive, negative and neutral are called opinion orientations (also called sentiment orientations, semantic orientations, or polarities).  An opinion is a quintuple, (ei; aij ; ooijkl; hk; tl), where ei is the entity, aij is an aspect j of ei, ooijkl is the orientation of the opinion about aspect aij, hk is the opinion holder, and tl is the time when the opinion is expressed by hk. – ooijkl can be positive, negative or neutral, with different strength/intensity levels – quintuple can be regarded as a schema of a database table for analysis
  • 15. Opinion mining  Objective : Given a collection of opinion documents D, discover all opinion quintuples (ei; aij ; ooijkl; hk; tl) in D. 1. Extract all entity expressions in D, and group synonymous entity expressions into entity clusters. Each entity expression cluster indicates a unique entity ei. 2. Extract all aspect expressions of the entities, and group aspect expressions into clusters. Each aspect expression cluster of entity ei indicates a unique aspect aij 3. Extract opinion holder and time information from the text or unstructured data. 4. Determine whether each opinion on an aspect is positive, negative or neutral. 5. Produce all opinion quintuples (ei; aij ; ooijkl; hk; tl) expressed in D based on the results of the above
  • 16. Example of Extraction  bigXyz on Nov-4-2010:(1) I bought a Motorola phone and my girlfriend bought a Nokia phone yesterday. (2) We called each other when we got home. (3) The voice of my Moto phone was unclear, but the camera was good. (4) My girlfriend was quite happy with her phone, and its sound quality. (5) I want a phone with good voice quality. (6) So I probably will not keep it.  QUINTIPLES  (Motorola, voice quality, negative, bigXyz, Nov-4-2010)  (Motorola, camera, positive, bigXyz, Nov-4-2010)  (Nokia, GENERAL, positive, bigXyz's girlfriend, Nov-4-2010)  (Nokia, voice quality, positive, bigXyz's girlfriend, Nov-4-2010)
  • 17. Two more Definitions  An objective sentence (sentence 1&2) presents some factual information about the world, while a subjective sentence expresses some personal feelings, views or beliefs. – Subjective expressions come in many forms, e.g., opinions, allegations, desires, beliefs, suspicions, and speculations. – A subjective sentence may not contain an opinion (Sentence 5) – Not every objective sentence contains no opinion. “the earphone broke in two days", is an objective sentence but it implies a negative sentiment.
  • 18. Two more Definitions  Emotions are our subjective feelings and thoughts – There are 6 primary emotions, i.e., love, joy, surprise, anger, sadness, and fear, which can be sub-divided into many secondary and tertiary emotions. Each emotion can also have different intensities. – The concepts of emotions and opinions are not equivalent. – Many opinion sentences express no emotion (e.g., “the voice of this phone is clear”), which are called rational evaluation sentences – Many emotion sentences give no opinion, (e.g., “I am so surprised to see you”)
  • 19. Document Sentiment Classification  Given an opinionated document d evaluating an entity e, determine the opinion orientation oo on e, i.e., determine oo on aspect GENERAL in the quintuple (e;GENERAL; oo; h; t). e, h, and t are assumed known or irrelevant. – Also known as the document-level sentiment classification – Sentiment classification assumes that the opinion document d (e.g., a product review) expresses opinions on a single entity e and the opinions are from a single opinion holder h. – This assumption holds for customer reviews of products and services because each such review usually focuses on a single product and is written by a single reviewer.
  • 20. Classification based on Supervised Learning  Three classes, positive, negative and neutral.  Since each review already has a reviewer-assigned rating (e.g., 1-5 stars), training and testing data are readily available. – A review with 4 or 5 stars is a positive review, a review with 1 or 2 stars is a negative review and a review with 3 stars is a neutral review. – Naïve Bayesian classification, and support vector machines (SVM). – It was shown that using unigrams (a bag of individual words) as features in classification performed well with either naive Bayesian or SVM.
  • 21. Feature set for Classification  Terms and their frequency: individual words or word n-grams and their frequency counts. – word positions may also be important. – TF-IDF weighting scheme.  Opinion words and phrases: Used to express positive or negative sentiments. – beautiful, wonderful, good, and amazing are positive opinion words, and bad, poor, and terrible are negative – Many opinion words are adjectives and adverbs. Nouns (rubbish, junk, and crap) and verbs (hate and like) can also indicate opinions. – There are also opinion phrases and idioms, cost someone an arm and a leg. Opinion words and phrases are instrumental to sentiment analysis
  • 22. Feature set for Classification  Part of speech: adjectives are important indicators of opinions and treated as special features.  Negations: Negation words are important because their appearances often change the opinion orientation. – “I don't like this camera” is negative. – Negation words must be handled with care because not all occurrences of such words mean negation. – “not” in “not only but also” does not change the orientation direction  Syntactic dependency: Word dependency based features generated from parsing or dependency trees
  • 23. Feature set for Classification  Manually labeling training data can be time-consuming and label intensive.  Opinion words can be utilized in the training process.  Tan et al. used opinion words to label a portion of informative examples and then learn a new supervised classifier based on labeled ones.  Opinion words can be utilized to increase the sentiment classification accuracy.  Regression can be used for predicting Rating scores (e.g., 1-5 stars) – the rating scores are ordinal!  Domain Specificity: A classifier trained using one domain often performs poorly when it is applied or tested on another domain.
  • 24. Classification – Unsupervised Learning  Three Step Process  Step 1:  Phrases containing adjectives or adverbs are extracted as adjectives and adverbs are good indicators of opinions. – Context is important. “unpredictable" breaking distance of car vs. “unpredictable” ending of the mystery movie  The algorithm extracts two consecutive words, where one member of the pair is an adjective or adverb, and the other is a context word
  • 25. Classification – Unsupervised Learning  Step 2: Estimate the semantic orientation of the extracted phrases using the point-wise mutual information (PMI) measure 𝑃𝑀𝐼(𝑡1, 𝑡2 ) = 𝑙𝑜𝑔2 𝑃(𝑡1 ∩ 𝑡2) 𝑃 𝑡1 . 𝑃(𝑡2)  PMI is a measure of the degree of statistical dependence between t1 and t2 and log of this ratio is the amount of information that we acquire about the presence of one of the words when we observe the other  The semantic/opinion orientation (SO) of a phrase is computed based on its association with the positive reference word “excellent” and its association with the negative reference word “poor” SO(Phrase)=PMI(Phrase, “Excellent”) – PMI(Phrase, “Poor”)  The probabilities are calculated by issuing queries and collecting the number of hits. Searching the two terms together and separately, we can estimate the probabilities
  • 26. Classification – Unsupervised Learning  Step 3: The algorithm computes the average SO of all phrases in a review, and classifies the review as recommended if the average SO is positive – Final classification accuracies on reviews from various domains range from 84% for automobile reviews to 66% for movie reviews.  Advantage of document level sentiment classification: it provides a prevailing opinion on an entity, topic or event.  Shortcomings: – It does not give details on what people liked and/or disliked and – It is not easily applicable to non-reviews, e.g., forum and blog postings, because many such postings evaluate multiple entities and compare them.
  • 27. Sentence-level Sentiment Classification.  Document-level sentiment classification techniques can also be applied to individual sentences.  The task of classifying a sentence as subjective or objective is often called subjectivity classification  The resulting subjective sentences are also classified as expressing positive or negative opinions  1. Subjectivity classification: Determine whether s is a subjective sentence or an objective sentence  2. Sentence-level sentiment classification: If s is subjective, determine whether it expresses a positive, negative or neutral opinion.
  • 28. Assumption  The sentence expresses a single opinion from a single opinion holder.  This assumption is only appropriate for simple sentences with a single opinion, – “The picture quality of this camera is amazing.”  Compound and complex sentences, a single sentence may express more than one opinion. – “The picture quality of this camera is amazing and so is the battery life, but the view finder is too small for such a great camera"
  • 29. Opinion Lexicon Expansion  Opinion words: also known as opinion-bearing words or sentiment words.  Positive opinion words are used to express some desired states while negative opinion words are used to express some undesired states. – beautiful, wonderful, good, and amazing. – bad, poor, and terrible.  There are also opinion phrases and idioms: “Cost someone an arm and a leg”.  Collectively, they are called the opinion lexicon. Used for opinion mining.  Three Approaches: Manual, Dictionary-based, and Corpus-based. – The manual approach is time-consuming and not usually used alone, but combined with automated approaches as the check because automated methods make mistakes.
  • 30. Dictionary based approach  Bootstrapping using a small set of seed opinion words and an online dictionary, e.g., WordNet.  The strategy is to first collect a small set of opinion words manually with known orientations, and then to grow this set by searching for their synonyms and antonyms.  The newly found words are added to the seed list and the next iteration starts. The iterative process stops when no more new words are found.  After the process completes, manual inspection can be carried out to remove and/or correct errors.
  • 31. Dictionary based approach  Shortcoming: The approach is unable to find opinion words with domain and context specific orientations, which is quite common. – For example, for a speaker phone, if it is quiet, it is usually negative. However, for a car, if it is quiet, it is positive.  The corpus-based approach can help deal with this problem.
  • 32. Corpus-based approach  The methods rely on syntactic or co-occurrence patterns and also a seed list of opinion words to find other opinion words in a large corpus  The technique starts with a list of seed opinion adjectives, and uses them and a set of linguistic constraints or conventions on connectives to identify additional adjective opinion words and their orientations.  Conjunction “AND”: conjoined adjectives usually have the same orientation. – “This car is beautiful and spacious” – "This car is beautiful and difficult to drive“ (AND Conjunction is not usually used)  Rules or constraints are also designed for other connectives, OR, BUT, EITHER-OR, and NEITHER-NOR.  This idea is called sentiment consistency
  • 33. Corpus-based approach  Learning is applied to a large corpus to determine if two conjoined adjectives are of the same or different orientations.  Same and different-orientation links between adjectives are formed  Clustering is performed on these to produce two sets of words: positive and negative.  Inter-sentential consistency is the idea to neighboring sentences.  The same opinion orientation (positive or negative) is usually expressed in a few consecutive sentences.  Opinion changes are indicated by adversative expressions such as “but” and “however”.
  • 34. Corpus-based approach  Different orientations in different contexts even in the same domain. – Digital camera: “The battery life is long (+)” ; – “The time taken to auto-focus is long" (-).  Consider both possible opinion words and aspects together, and use  the pair (aspect, opinion word) as the opinion context, (battery life", long").  This determines opinion words and their orientations together with the aspects that they modify.  Can be used to analyze comparative sentences.  Many contexts can be more complex, consuming a large amount of resources.
  • 35. Aspect-Based Sentiment Analysis  In a typical opinionated document, the author writes both positive and negative aspects of the entity, although the general sentiment on the entity may be positive or negative. Document and sentence sentiment classification does not provide such information.  Aspect-based sentiment analysis needs to be used  At the aspect level, the mining objective is to discover every quintuple (ei; aij ; ooijkl; hk; tl) in a given document d.  To achieve the objective, five tasks need to be performed.
  • 36. Aspect extraction  Extract aspects that have been evaluated. – “The picture quality of this camera is amazing,” the aspect is “picture quality" of the entity represented by “this camera”. The evaluation is not about the camera as a whole, but about its picture quality. – The sentence “I love this camera” evaluates the camera as a whole, i.e., the GENERAL aspect of the entity represented by “this camera”.  Whenever we talk about an aspect, we must know which entity it belongs to.  It is a Two-step Process
  • 37. Aspect extraction  1. Find frequent nouns and noun phrases.  Nouns and noun phrases (or groups) are identified by a POS tagger; the frequencies are counted; and only the frequent ones are kept.  A frequency threshold can be decided experimentally.  When people comment on different aspects of a product, the vocabulary that they use usually converges. The nouns that are frequently talked about are usually genuine and important aspects.  Irrelevant contents in reviews are often diverse, i.e., they are quite different in different reviews. These are infrequent nouns
  • 38. Aspect extraction  2. Find infrequent aspects by exploiting the relationships between aspects and opinion words. – The previous step can miss many genuine aspect expressions which are infrequent. This step tries to find some of them.  The same opinion word can be used to describe or modify different aspects. Opinion words that modify frequent aspects can also modify infrequent aspects, and thus can be used to extract infrequent aspects. – For example, “picture” has been found to be a frequent aspect, and we have the sentence, “The pictures are absolutely amazing.” – “software“ can also be extracted as an aspect from the following sentence, “The software is amazing.”
  • 39. Aspect extraction  Point-wise mutual information (PMI) score between the phrase and some meronymy discriminators associated with the product class can be used.  The meronymy discriminators for the “scanner” class are, “of scanner”, “scanner has”, “scanner comes with”, etc., which are used  To find components or parts of scanners by searching the Web.  𝑃𝑀𝐼 𝑎, 𝑑 = ℎ𝑖𝑡𝑠(𝑎∩𝑑) ℎ𝑖𝑡𝑠 𝑎 .ℎ𝑖𝑡𝑠(𝑑)  If the PMI value of a candidate aspect is too low, it may not be a component of the product because a and d do not co-occur frequently.
  • 40. Aspect sentiment classification  Determine whether the opinions on different aspects are positive, negative or neutral. In the first example below, the opinion on the “picture quality" aspect is positive, and in the second example, the opinion on the GENERAL aspect is also positive.  “The picture quality of this camera is amazing," the aspect is “picture quality" of the entity represented by “this camera". does not indicate the GENERAL aspect because the evaluation is not about the camera as a whole, but about its picture quality.  “I love this camera" evaluates the camera as a whole, i.e., the GENERAL aspect of the entity represented by “this camera".
  • 41. Lexicon-based Approach  Uses an opinion lexicon, - a list of opinion words and phrases, and a set of rules to determine the orientations of opinions in a sentence  It also considers opinion shifters and “but-clauses”.  4 steps  1. Mark opinion words and phrases: Given a sentence that contains one or more aspects, this step marks all opinion words and phrases in the sentence. – Each positive word is assigned the opinion score of +1, each negative word is assigned the opinion score of -1
  • 42. Lexicon-based Approach  2. Handle opinion shifters: Opinion shifters are words and phrases that can shift or change opinion orientations. – Negation words like not, never, none, nobody, nowhere, neither and cannot are the most common type.  Sarcasm changes orientation – “What a great car, it failed to start the first day.”  Spotting them and handling them correctly in actual sentences by an automated system is not easy.  Not every appearance of an opinion shifter changes the opinion orientation – “not only … but also”
  • 43. Lexicon-based Approach  3. Handle but-clauses:  In English, but means contrary.  A sentence containing but is handled by applying the following rule: – The opinion orientation before but and after but are opposite to each other if the opinion on one side cannot be determined.  “not only but also” (needs to be handled separately).  There are contrary words and phrases that do not always indicate an opinion change – “Audi is great, but Mercedes is better".  Such cases need to be identified and dealt with separately.
  • 44. Lexicon-based Approach  3. Aggregating opinions: This step applies an opinion aggregation function to the resulting opinion scores to determine the final orientation of the opinion on each aspect in the sentence.  Consider a sentence S, which contains a set of aspects {a1 … am} and a set of opinion words or phrases {ow1 : : : own} with their opinion scores. The opinion orientation for each aspect ai in S is  𝑆𝑐𝑜𝑟𝑒 𝑎𝑖, 𝑆 = 𝑜𝑤 𝑗∈𝑆 𝑜𝑤 𝑗.𝑜𝑜 𝐷𝑖𝑠𝑡(𝑜𝑤 𝑗,𝑎 𝑖) – where owj is an opinion word/phrase in s, dist (owj ; ai) is the distance between aspect ai and opinion word owj in S. – owj.oo is the opinion score of owj. Gives lower weights to opinion words that are far away from aspect ai.
  • 45. Simultaneous Opinion Lexicon Expansion and Aspect Extraction  Needs an initial set of opinion word seeds as the input (no seed aspects)  Opinions almost always have targets and there are natural relations connecting opinion words and targets in a sentence – Opinion words have relations among themselves and so do targets among themselves.  The opinion targets are aspects. Opinion words can be recognized by identified aspects, and aspects can be identified by known opinion words. – The extracted opinion words and aspects are utilized to identify new opinion words and new aspects, which are used again to extract more opinion words and aspects. – Propagation stops when no more opinion words or aspects can be found.
  • 46. Dependency grammar  Dependency grammar was adopted to describe the relations. The Algorithm uses only direct dependencies to model the relations. – A direct dependency indicates that one word depends on the other word without any additional words in their dependency path or they both depend on a third word directly.  Some constraints are also imposed. Opinion words are considered to be adjectives and aspects nouns or noun phrases. – “Canon G3 produces great pictures”, the adjective “great” is parsed as directly depending on the noun “pictures". “great" is an opinion word and given the rule `a noun on which an opinion word directly depends is taken as an aspect', we can extract “pictures” as an aspect. Similarly, “pictures” is an aspect, “great” as an opinion word using a similar rule.
  • 47. Mining Comparative Opinions  A comparative sentence expresses a relation based on similarities or differences of more than one entity. – The comparison is usually conveyed using the comparative or superlative form of an adjective or adverb.  A comparative sentence typically states that one entity has more or less of a certain attribute than another entity. – A superlative sentence states that one entity has the most or least of a certain attribute among a set of similar entities.  A comparison can be between two or more entities, groups of entities, and one entity and the rest of the entities. It can also be between versions.
  • 48. Types of Comparatives and Superlatives  Comparatives are usually formed by adding the suffix “-er” and superlatives are formed by adding the suffix “-est” to their base adjectives and adverbs. – “longer” in “The battery life of Camera-x is longer than that of Camera-y”, longest“ in “The battery life of this camera is the longest", – This type of comparatives and superlatives are called Type 1  Some adjectives and adverbs form comparatives or superlatives by using words like more, most, less and least before such words (more beautiful) – These are type 2. Types 1 and 2 are called regular comparatives and superlatives  Irregular comparatives and superlatives, i.e., more, less, least, better, best, – Grouped under Type 1 (based on the behavior)  Words like “superior”, “preferred” are also grouped under Type 1
  • 49. Types of comparative relations  Four types  1. Non-equal gradable comparisons: Type “greater or less than” that express an ordering of some entities with regard to some of their shared aspects – “The Intel chip is faster than that of AMD”. “I prefer Intel to AMD”.  2. Equative comparisons: Type equal to that state two or more entities are equal with regard to some of their shared aspects – “The performance of Samsung is about the same as that of LG.”  3. Superlative comparisons: type greater or less than all others that rank one entity over all others, – “The Intel chip is the fastest”.
  • 50. Types of comparative relations  Comparative words used in non-equal gradable comparisons are categorized into two groups according to whether they express increased or decreased quantities, which are useful in opinion analysis. – Increasing comparatives: Such a comparative expresses an increased quantity, e.g., more and longer. – Decreasing comparatives: Such a comparative expresses a decreased quantity, e.g., less and fewer.
  • 51. Types of comparative relations  4. Non-gradable comparisons: Relations that compare aspects of two or more entities, but do not grade them.  There are three main sub-types:  Entity A is similar to or different from entity B with regard to some of their shared aspects, “Coke tastes differently from Pepsi.”  Entity A has aspect a1, and entity B has aspect a2 (They are usually substitutable), “Desktop PCs use external speakers but laptops use internal speakers.”  Entity A has aspect a, but entity B does not have, e.g., “Phone-x has an earphone, but Phone-y does not have.”
  • 52. Objective of mining comparative opinions  Given a collection of opinionated documents D, – discover in D all comparative opinion sextuples of the form (E1;E2; A; PE; h; t) – where E1 and E2 are the entity sets being compared based on their shared aspects A – Entities in E1 appear before entities in E2 in the sentence), – PE(∈ {E1;E2}) is the preferred entity set of the opinion holder h, – t is the time when the comparative opinion is expressed.  These sextuples can be mined
  • 53. Example  “Ipad's display is better than those of Galaxy and Surface." written by Vish in Feb 2016.  The extracted comparative opinion is: – ({Ipad}, {Galaxy, Surface}, {display}, preferred: {Ipad}, John, Feb 2016)  The entity set E1 is {Ipad}, the entity set E2 is {Galaxy, Surface},  Their shared aspect set A being compared is {display},  The preferred entity set is {Ipad},  The opinion holder h is John  The time t when this comparative opinion was written is Feb 2016.
  • 54. Case: Sentiment Analysis-Hybrid Approach  Combined rule-based classification, supervised learning and machine learning to form a hybrid method.  Tested on movie reviews, product reviews and MySpace comments.  Hybrid classification can improve the classification effectiveness in terms of micro- and macro-averaged F1.  F1 is a measure that takes both the precision and recall of a classifier’s effectiveness into account
  • 55. Evaluation Metrics  Precision(P) = tp tp+f p ; Recall(R) = tp tp+fn ;  Accuracy(A) = tp+tn tp+tn+f p+fn ; F1 = 2·P·R P+R Machine says yes Machine says no human says yes tp fn human says no fp tn
  • 56. Evaluation Metrics  1. Micro averaging. – Given a set of confusion tables, a new two-by-two contingency table is generated. – Each cell in the new table represents the sum of the number of documents from within the set of tables. – Given the new table, the average performance of an automatic classifier, in terms of its precision and recall, is measured.  2. Macro averaging. – Given a set of confusion tables, a set of values are generated. – Each value represents the precision or recall of an automatic classifier – Given these values, the average performance of an automatic classifier, in terms of its precision and recall, is measured
  • 57. Rule Based Classification  A rule consists of an antecedent and its associated consequent that have an ‘if-then ’relation: antecedent  consequent – An antecedent is a condition: one or more tokens concatenated by the ^ operator. – A token can be a word, ‘?’ representing a proper noun, or ‘#’ representing a target term. – A target term is a term that represents the context in which a set of documents occurs, such as the name of a politician, a policy recommendation, a company name, a brand of a product or a movie title.  A consequent represents a sentiment that is either positive or negative, and is the result of meeting the condition defined by the antecedent. – {token1 ^ token2 ^ . . . ^ tokenn} =) {+|−} + is positive sentiment; - is negative sentiment
  • 58. Comparative Statements  1. Laptop-A is more expensive than Laptop-B.  2. Laptop-A is more expensive than Laptop-C.  Target word of these sentences is Laptop-A. The rule derived is: – {# ^ more ^ expensive ^ than^?} =) {−} – The target word, Laptop-A is less favorable than the other two laptops due to its price. Focus is on the price attribute of the Laptop-A.  Target words are Laptop-B and Laptop-C. The rule derived is: – {? ^ more ^ expensive ^ than ^ #} =) {+} – The two target words, Laptop-B and Laptop-C are more favorable than the Laptop-A due to its price. Focus is on the price attribute of both the Laptop-B and Laptop-C.  Target word is crucial factor in determining the sentiment of an antecedent
  • 59. General Inquirer Based Classifier (GIBC)  The first, simplest rule set was based on 3672 pre-classified words found in the General Inquirer Lexicon (Stone et al. 1966),  1598 of which were pre-classified as positive and 2074 of which were pre-classified as negative.  Here, each rule depends solely on one sentiment bearing word representing an antecedent.  A General Inquirer Based Classifier (GIBC) was implemented which applied the rule set to classify document collections.
  • 60. Calculation of “Closeness”  1. Select 120 positive words, such as amazing, awesome, beautiful, and 120 negative words, such as absurd, angry, anguish, from the General Inquirer Lexicon.  2. Compose 240 search engine queries per antecedent; each query combines an antecedent and a sentiment bearing word.  3. Collect the hit counts of all queries by using the Google and Yahoo search engines. Two search engines were used to determine whether the hit counts were influenced by the coverage and accuracy level of a single search engine. For each query, the search engines return the hit count of a number of Web pages that contains both the antecedent and a sentiment bearing word. The proximity of the antecedent and word is at the page level.  A better level of precision may be obtained if the proximity checking can be carried out at the sentence level.  This would lead to an ethical issue, however, because each page has to be downloaded and stored locally for further analysis.
  • 61. Calculation of “Closeness”  4. Collect the hit counts of each sentiment-bearing word and each antecedent.  5. Use 4 closeness measures to measure the closeness between each antecedent and 120 positive words (S+) and between each antecedent and 120 negative words (S−) based on all the hit counts collected.  𝑆+ = 𝑖=1 120 𝑐𝑙𝑜𝑠𝑒𝑛𝑒𝑠𝑠 (𝑎𝑛𝑡𝑖𝑐𝑖𝑑𝑒𝑛𝑡, 𝑤𝑜𝑟𝑑𝑖 + )  𝑆− = 𝑖=1 120 𝑐𝑙𝑜𝑠𝑒𝑛𝑒𝑠𝑠 (𝑎𝑛𝑡𝑖𝑐𝑖𝑑𝑒𝑛𝑡, 𝑤𝑜𝑟𝑑𝑖 − )  If the antecedent co-occurs more frequently with the 120 positive words (S+ > S−), then this would mean that the antecedent has a positive consequent and vice versa.
  • 62. Measures of Closeness  Document Frequency (DF). counts the number of Web pages containing a pair of an antecedent and a sentiment bearing word, i.e., the hit count returned by a search engine. The larger a DF value, the greater the association strength between antecedent and word.  The other measures of closeness are  Mutual Information (MI) = 𝑙𝑜𝑔2 𝑃 𝑤𝑜𝑟𝑑,𝑎𝑛𝑡𝑒𝑐𝑒𝑛𝑑𝑒𝑛𝑡 𝑃 𝑤𝑜𝑟𝑑 .𝑃(𝐴𝑛𝑡𝑒𝑐𝑒𝑑𝑒𝑛𝑡)  Chi-Square  Log Likelihood Ratio
  • 63. Classifiers Used  General Inquirer Based Classifier (GIBC)  Rule-Based Classifier (RBC)  Statistics Based Classifier (SBC)  Mutual Information (MI).  Chi-square (χ2)  Induction Rule Based Classifier (IRBC)  Support Vector Machines
  • 65. The evaluation results of the closeness measures – Based on Search Engine
  • 66.
  • 67.

Hinweis der Redaktion

  1. Sentiment Analysis: A Combined Approach, Rudy Prabowo1, Mike Thelwall, Journal of Informetrics · March 2013