This document discusses linguistic phenomena including irony, sarcasm, and thwarting, and presents algorithms for detecting sarcasm and thwarting in text.
For sarcasm detection, a semi-supervised algorithm classifies sentences using pattern-based and punctuation-based features, achieving up to 81% accuracy.
For thwarting detection, sentiment is compared across the levels of a domain ontology using either a rule-based or a machine-learning approach, with the latter achieving up to 81% accuracy.
4. Verbal Irony
“An irony is a figure of speech which implicitly
displays that the utterance situation was surrounded
by an ironic environment.”
Situational irony also exists, as distinct from verbal irony.
6. Reasons for Expectation to Fail
Expectation E is caused by an action A:
1. E failed because A failed or could not be performed because of another action B
2. E failed because A was not performed
Expectation E is not caused by any action:
3. E failed because of an action B
4. E failed accidentally
Types 1 and 3 have victims.
Sarcasm is irony with definite victims and counterfeited emotions.
7. Properties of an Ironic Environment
An utterance implicitly displays all the three conditions for
ironic environment when it:
1. Alludes to the speaker's expectation E
2. Intentionally violates one of the pragmatic principles
3. Implies the speaker's emotional attitude toward the
failure of E
Irony is recognized if any 2 of these 3 are recognized.
Irony conveys the third unidentified property.
8. Allude to Speaker’s Expectation
Deepali baked a pizza to satisfy her hunger. She placed the pizza on the
table and in the meantime Sagar came and gobbled up the whole pizza.
Deepali said to Sagar:
a. I'm not hungry at all
b. Have you seen my pizza on the table?
c. I'll sleep well on a full stomach.
d. I'm really satisfied to eat the pizza.
e. Did you enjoy eating the pizza?
9. Violation of Pragmatic Principles
Sincerity:
● You make a statement you believe
● You ask a question whose answer you don’t know
● You offer advice which will benefit the receiver
● You thank when you are really grateful
Propositional content:
● You thank for something that has been done for you
Preparatory condition for an offer:
● You offer something that you can really give
Other principles:
● Maxim of relevance
● Politeness principle
● Maxim of quantity
10. Emotional Attitude
● Tone and expressions
● Interjections: “Oh! The weather is so nice”
● The context implies the emotional attitude of the speaker
12. Sarcasm
The activity of saying or writing the opposite of
what you mean, or of speaking in a way
intended to make someone feel stupid or show
that you are angry (Macmillan English
Dictionary)
13. Sarcasm manifests in other ways...
● “Love the cover” (book)
● “Be sure to save your purchase receipt”
(Smart Phone)
● “Great idea, now try again with a real
product development team” (e-reader)
● “Where am I?” (GPS device)
14. The Algorithm: Overview
1. Training Set: sentences manually assigned
scores from 1 to 5, where 5 means clearly
sarcastic and 1 means absence of sarcasm
2. Create feature vectors from the labelled
sentences
3. Use these feature vectors to build a model
and assign scores to unlabelled examples
15. Step 1: Preprocessing of Data
1. Replace each appearance of a
product/company/author by generalized
[product], [company], [author], etc.
2. Remove all HTML tags and special symbols
from review text.
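The two preprocessing steps can be sketched as follows. This is a minimal illustration, not the authors' implementation: the entity lists and the regular expressions used are assumptions.

```python
import re

def preprocess(review: str, products, companies, authors) -> str:
    """Generalize entity mentions, then strip HTML tags and special symbols."""
    # Step 1: replace known entity mentions with generalized tags.
    for name in products:
        review = review.replace(name, "[product]")
    for name in companies:
        review = review.replace(name, "[company]")
    for name in authors:
        review = review.replace(name, "[author]")
    # Step 2: remove HTML tags.
    review = re.sub(r"<[^>]+>", " ", review)
    # Drop special symbols, keeping words, whitespace, basic punctuation, tags.
    review = re.sub(r"[^\w\s.,!?'\"\[\]]", " ", review)
    # Collapse the whitespace introduced above.
    return re.sub(r"\s+", " ", review).strip()
```

For example, `preprocess("Garmin <b>nuvi</b> is great!", ["nuvi"], ["Garmin"], [])` yields `"[company] [product] is great!"`.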
16. Step 2: Creating Feature Vectors
Pattern Based Features:
1. Classify words into High Frequency Words (HFWs) and
Content Words (CWs)
All [product], [company] tags and punctuation marks are
HFWs.
2. A pattern is a sequence of HFWs with slots for CWs.
Example: “Garmin does not care about product quality or
customer support” has patterns “[company] does not CW
about CW CW” or “about CW CW or CW CW”, etc.
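Pattern extraction can be sketched as below. The size limits (2–6 HFWs and 1–6 CW slots per pattern) are an assumption for illustration, not taken from the slides.

```python
def extract_patterns(tokens, hfw):
    """Abstract a token list (HFWs kept verbatim, content words replaced by
    a 'CW' slot) and collect all windows that form a valid pattern.
    hfw: set of high-frequency words, including [product]/[company]/[author]
    tags and punctuation marks."""
    abstract = [t if t in hfw else "CW" for t in tokens]
    patterns = set()
    for i in range(len(abstract)):
        for j in range(i + 1, len(abstract) + 1):
            window = abstract[i:j]
            n_hfw = sum(1 for w in window if w != "CW")
            n_cw = len(window) - n_hfw
            # Assumed constraint: 2-6 HFWs and 1-6 CW slots per pattern.
            if 2 <= n_hfw <= 6 and 1 <= n_cw <= 6:
                patterns.add(" ".join(window))
    return patterns
```

On the slide's example sentence, this produces both “[company] does not CW about CW CW” and “about CW CW or CW CW”, among others.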
17. Pattern Matching
1. Exact Match: all pattern components appear in the sentence in order
2. Sparse Match: additional non-matching words can be inserted between pattern components
3. Incomplete Match: only n of N pattern components appear in the sentence, while some non-matching words can be inserted in between
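The three match types can be scored as below. Both the weighting scheme (exact → 1, sparse → alpha, incomplete with n of N components → gamma·n/N) and the greedy in-order matching are assumptions made for illustration.

```python
def pattern_score(pattern, abstract, alpha=0.1, gamma=0.1):
    """Score a pattern against an abstracted sentence (HFWs verbatim,
    content words as 'CW'). Assumed weights: exact -> 1.0,
    sparse -> alpha, incomplete (n of N components) -> gamma * n / N."""
    comps = pattern.split()
    # Exact match: components appear contiguously and in order.
    if any(abstract[i:i + len(comps)] == comps
           for i in range(len(abstract) - len(comps) + 1)):
        return 1.0
    # Greedy in-order matching, gaps allowed.
    matched, pos = 0, 0
    for c in comps:
        if c in abstract[pos:]:
            pos = abstract.index(c, pos) + 1
            matched += 1
    if matched == len(comps):
        return alpha                           # sparse match
    if matched:
        return gamma * matched / len(comps)    # incomplete match
    return 0.0
```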
18. Punctuation Based Features
● Sentence length in words
● Number of “!” characters
● Number of “?” characters
● Number of quotes
● Number of capitalized/all-capital words
Features are normalized to [0, 1] by dividing them by the
maximal observed value
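The punctuation-based features and their normalization can be sketched as:

```python
def punctuation_features(sentences):
    """Compute the punctuation-based features for each sentence and
    normalize each feature by its maximal observed value."""
    raw = []
    for s in sentences:
        words = s.split()
        raw.append([
            len(words),                                # sentence length in words
            s.count("!"),                              # number of "!"
            s.count("?"),                              # number of "?"
            s.count('"') + s.count("'"),               # number of quotes
            sum(1 for w in words if w[:1].isupper()),  # capitalized words
        ])
    # Normalize each feature column into [0, 1] (guard against all-zero columns).
    maxima = [max(col) or 1 for col in zip(*raw)]
    return [[v / m for v, m in zip(row, maxima)] for row in raw]
```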
19. Step 3: Data Enrichment
● For each sentence in the training set perform
a search engine query containing this
sentence
● Assign the label of the original sentence to each
newly extracted sentence.
20. Step 4: Classification
● Construct feature vectors for each sentence in the
training and test set
● Compute the Euclidean distance to each of the matching
vectors in the training set
Let t_i, i = 1..k, be the k vectors with the lowest Euclidean distance to v.
Then v is assigned a label as follows:
Count(l) = count of vectors in the training set with label l
Label(v) = Σ_{i=1..k} Count(l_i) · l_i / Σ_{i=1..k} Count(l_i), where l_i is the label of t_i
(i.e., a Count-weighted average of the k nearest labels)
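A sketch of the classification step, assuming the label of v is a Count-weighted average of its k nearest labels, rounded to the nearest integer score (the exact combination rule is an assumption here):

```python
import math
from collections import Counter

def classify(v, training, k=5):
    """training: list of (feature_vector, label) pairs with labels 1..5.
    Returns the Count-weighted average of the k nearest labels,
    rounded to the nearest integer score."""
    count = Counter(label for _, label in training)  # Count(l)
    # k training vectors with the lowest Euclidean distance to v.
    nearest = sorted(training, key=lambda tl: math.dist(v, tl[0]))[:k]
    num = sum(count[l] * l for _, l in nearest)
    den = sum(count[l] for _, l in nearest)
    return round(num / den)
```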
21. Star Sentiment Baseline
● From a set of negative reviews (with 1–3
stars), classify sentences with strong
positive sentiment as sarcastic.
● Positive sentiment words can be, e.g., “great”,
“best”, “top”, etc.
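A minimal sketch of this baseline; the positive-word lexicon is a placeholder, not the actual list used:

```python
# Assumed lexicon of strong positive sentiment words (placeholder).
POSITIVE_WORDS = {"great", "best", "top", "excellent", "perfect"}

def star_sentiment_baseline(review_sentences, stars):
    """In a negative (1-3 star) review, flag sentences containing a
    strong positive sentiment word as sarcastic."""
    if stars > 3:
        return [False] * len(review_sentences)
    return [any(w.strip(".,!?").lower() in POSITIVE_WORDS for w in s.split())
            for s in review_sentences]
```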
24. Thwarting?
“The actors were good, the story was great, the
screenplay was a marvel of perfection and the
music was good too, but the movie couldn’t
hold my attention...”
25. Detecting Thwarting: The Big Picture
● Ascertain attributes of entity using ontology
● Find sentiment of each attribute in ontology
and the overall entity
● If there is a contrast, conclude that thwarting has
occurred
26. Building the Domain Ontology
1. Identify key features of domain from a
corpus
2. Arrange them in a hierarchy
Notes:
● Very human-intensive
● One-time requirement
29. Rule-based Approach
1. Obtain a dependency parse and extract adjective–noun
dependencies
2. Identify polarities towards all nouns
3. Tag corresponding ontology nodes with
found polarities
4. If a contradiction across levels is found,
conclude that thwarting has taken place
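The rule-based contrast check can be sketched as follows. The specific contrast test used here (a node's polarity against the net polarity of its descendants) is a simplifying assumption for illustration:

```python
def detect_thwarting(tree, polarity):
    """tree: dict mapping node -> list of child nodes.
    polarity: dict mapping node -> +1/-1, as tagged from
    adjective-noun dependencies.
    Conclude thwarting when a node's polarity contradicts the
    net polarity of its descendants (assumed contrast rule)."""
    def descendants(node):
        out = []
        for child in tree.get(node, []):
            out.append(child)
            out.extend(descendants(child))
        return out

    for node, p in polarity.items():
        total = sum(polarity[d] for d in descendants(node) if d in polarity)
        if total != 0 and p * total < 0:
            return True
    return False
```

On the movie example that follows, the negative “Movie” root contradicts its mostly positive descendants, so thwarting is detected.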
30. Rule-based Approach: Example
Movie: negative
● Story Elements: positive
  ○ Main Story: positive
  ○ Dialogues: positive
● Acting: positive
  ○ Characters: positive
● Music: positive
  ○ Songs: positive
  ○ Background Score: negative
32. Learning Weights: Choices
1. Choices for loss function:
a. Linear loss
b. Hinge loss
2. Choices for percolation across ontology
levels:
a. Complete percolation
b. Controlled percolation
33. Classification: Features
● Convert document into a feature vector.
● Examples:
○ Document polarity
○ Number of sign flips
○ Longest contiguous subsequence of +ve values
○ Longest contiguous subsequence of -ve values
○ etc.
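The listed example features can be computed from the sequence of sentiment values obtained while traversing the ontology; a sketch:

```python
def thwarting_features(values):
    """Feature vector from a sequence of sentiment values
    (positive/negative numbers), covering the listed examples."""
    signs = [1 if v > 0 else -1 for v in values if v != 0]
    # Number of sign flips along the sequence.
    flips = sum(1 for a, b in zip(signs, signs[1:]) if a != b)

    def longest_run(seq, sign):
        """Length of the longest contiguous run of the given sign."""
        best = cur = 0
        for s in seq:
            cur = cur + 1 if s == sign else 0
            best = max(best, cur)
        return best

    return {
        "document_polarity": 1 if sum(values) > 0 else -1,
        "sign_flips": flips,
        "longest_pos_run": longest_run(signs, 1),
        "longest_neg_run": longest_run(signs, -1),
    }
```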
35. What’s the catch?
Requires sentiment as input!
Current System: Document with Sentiment Information → Thwarted or Not Thwarted
Ideal System: Document → Thwarted or Not Thwarted + Document Sentiment
36. Key Ideas
● Irony indicates the presence of an ironic environment,
which has 3 properties
● Recognizing 2 of those 3 properties is enough to
recognize irony
● Sarcasm is irony with definite victims and counterfeited
emotions
● A semi-supervised pattern-based algorithm detects
sarcasm well
● Thwarting is the phenomenon of polarity reversal at
a higher level of the ontology compared to the polarity
expressed at the lower levels
● Rule-based and machine-learning-based
approaches have been attempted for thwarting detection
37. References
● Akira Utsumi (1996). A unified theory of irony and its
computational formalization. In COLING, 962–967.
● Oren Tsur, Dmitry Davidov, Ari Rappoport (2010).
ICWSM – A Great Catchy Name: Semi-Supervised
Recognition of Sarcastic Sentences in Online Product
Reviews. In ICWSM, AAAI.
● Ankit Ramteke, Akshat Malu, Pushpak Bhattacharyya,
J. Saketha Nath (2013). Detecting Turnarounds in
Sentiment Analysis: Thwarting. In ACL 2013.