This presentation walks through the application of Python NLP techniques to analyze arguments during a debate and to define a strategy for deciding the winner of the debate on the basis of the strength and relevance of the arguments.
This is made for PyCon India 2015.
For details: https://in.pycon.org/cfp/pycon-india-2015/proposals/analyzing-arguments-during-a-debate-using-natural-language-processing-in-python/
Contact me: abhinav.gpt3@gmail.com
Analyzing Arguments during a Debate using Natural Language Processing in Python
1. Analyzing Arguments during a
Debate using Natural Language
Processing in Python
ABHINAV GUPTA
ADYASHA MAHARANA
PROF PLABAN KUMAR BHOWMICK
2. about.me
Machine Learning Journey
2013 – Andrew Ng Machine Learning Course: recommendation systems, anomaly detection, digit recognition
2014 – Internship at Zomato: review highlights, what dishes to order
2015 – Course on language processing in e-learning
2015 – Working in the Business Insights team at American Express
3. How may a debate proceed?
"This new movie 'Superman vs. Batman' is so cool! The winner has got to be Superman, with his mighty Kryptonian abilities and people's support. What do you think?!"
"I agree! Superman is definitely more capable than Batman."
"What are you saying? Batman is so much more technologically advanced!"
"Both Batman and Superman are powerful in their own ways. It will be a draw."
"Ben Affleck is so HOT!"
4. What will we discuss?
Why the Natural Language Toolkit?
Basic Natural Language Processing (NLP) techniques with NLTK
Our approach for analyzing an argument
Sentence Similarity
Semantic Similarity
Word Order Similarity
Sentiment Analysis
Backtrack
User Interface of the framework
Visualization of an argument
Calculation of Semantics and Sentiments
Overall Scoring Strategy
Challenges and Proposed Solutions
5. Why Natural Language Toolkit (NLTK)?
A platform for implementing natural language processing through Python programs
A huge collection of corpora and lexical resources with an easy interface
Built-in libraries of several text processing algorithms
Open source!
6. Starting with the Basics
"I do not feel very good about Monday mornings."
Tokenization: ['I', 'do', 'not', 'feel', 'very', 'good', 'about', 'Monday', 'mornings']
Parts-of-Speech Tagging:
'I' – Personal Pronoun
'do' – Verb
'not' – Adverb
'feel' – Verb
'very' – Adverb
'good' – Adjective
'about' – Preposition
'Monday' – Proper Noun
'mornings' – Plural Noun
8. Tokens [‘I’, ‘do’, ‘not’, ‘feel’, ‘very’, ‘good’, ‘about’, ‘Monday’, ‘mornings’]
Removal of Stop Words [‘I’, ‘feel’, ‘good’, ‘Monday’, ‘mornings’]
Stemmed Words [‘I’, ‘feel’, ‘good’, ‘Monday’, ‘morn’]
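The pipeline above can be sketched in plain Python. In practice NLTK provides these steps out of the box (nltk.word_tokenize, nltk.corpus.stopwords, nltk.stem.PorterStemmer); the tiny stop list and suffix rules below are illustrative stand-ins, not NLTK's actual resources.

```python
# Dependency-free sketch of tokenize -> stop-word removal -> stemming.
# The stop list and suffix rules are illustrative stand-ins for NLTK's
# stopwords corpus and PorterStemmer.

STOP_WORDS = {"do", "not", "very", "about"}   # tiny illustrative stop list
SUFFIXES = ("ings", "ing", "s")               # naive stemming rules

def tokenize(sentence):
    return sentence.rstrip(".").split()

def remove_stop_words(tokens):
    return [t for t in tokens if t.lower() in STOP_WORDS] and [] or \
           [t for t in tokens if t.lower() not in STOP_WORDS]

def stem(token):
    for suffix in SUFFIXES:
        if token.lower().endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

tokens = tokenize("I do not feel very good about Monday mornings.")
content = remove_stop_words(tokens)
stems = [stem(t) for t in content]
print(content)   # ['I', 'feel', 'good', 'Monday', 'mornings']
print(stems)     # ['I', 'feel', 'good', 'Monday', 'morn']
```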
9. What do we look for in an argument?
What is the stance taken by the debater in this argument?
Has the debater changed stance from the previous arguments?
Is the argument related to the debate or irrelevant?
Is the argument good enough?
10. Analysis of an Argument
SENTENCE SIMILARITY: Is the argument related to the debate?
SENTIMENT ANALYSIS: What is the polarity of the argument?
BACKTRACK: Has the debater changed stance? Is the argument good enough?
SCORING: Final score of the user
11. SENTENCE SIMILARITY
Well, your argument is not at all relevant to this topic.
12. Sentence Similarity
We will measure it on the basis of the following two criteria:
1. Semantic Similarity (Ss): a metric defined over a set of documents or terms, where the idea of distance between them is based on the likeness of their meaning or semantic content. [Source: Wikipedia]
Example:
I like that bachelor.
I like that unmarried man.
2. Word Order Similarity (Sr): measures the similarity between the order of words in two sentences.
Example:
A quick brown dog jumps over the lazy fox.
A quick brown fox jumps over the lazy dog.
[Diagram: Sentence Similarity = Semantics + Word Order]
13. Focusing on Semantic Similarity (Ss)
Semantic distance between words in context is the distance between their underlying senses or lexical concepts.
d(festival, celebration) < d(school, circus)
Semantic similarity is how close the lexical concepts of two units (word, sentence, paragraph) of language are.
d(Mangoes and bananas are fruits, Mangoes are sweeter than bananas) < d(Raj has a job at the hospital, Hospitals have a huge staff of doctors and nurses)
Lexical databases like WordNet group English words into sets of synonyms expressing a distinct concept and are used for calculating semantic similarity.
14. Semantic Similarity: How to calculate it?
Sentence 1: "A new NASA initiative will help lead the search for signs of life beyond our solar system"
Sentence 2: "The Nexus for Exoplanet System Science, or NExSS, will take a multidisciplinary approach to the hunt for alien life"
Processed Sentence 1: new NASA initiative help lead search signs life beyond solar system
Processed Sentence 2: Nexus Exoplanet System Science NExSS take multidisciplinary approach hunt alien life
Joint word set: new, NASA, initiative, help, lead, search, signs, life, beyond, solar, system, Nexus, Exoplanet, alien, Science, NExSS, take, multidisciplinary, approach, hunt
Each sentence gets a binary vector over the joint word set (1 if the word occurs in the sentence, else 0):
Sentence 1: 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0
Sentence 2: 0 0 0 0 0 0 0 1 0 0 1 1 1 1 1 1 1 1 1 1
15. WordNet-Based Semantic Similarity
[Figure: a fragment of the WordNet concept hierarchy]
Such a network forms the basis of several distance formulae for calculating semantic similarity.
16. WordNet-Based Semantic Similarity
Calculate similarity between:
1. Boy – Girl and Boy – Teacher
2. Boy – Teacher and Boy – Animal
Two important aspects to look for:
1. Path length: the shorter the path within the same hierarchy, the more similar the words are.
2. Hierarchical structure: words at upper layers of the hierarchy have more general semantics and less similarity between them, while words at lower layers have more concrete semantics and more similarity.
Reference: Li, Yuhua, et al. "Sentence similarity based on semantic nets and corpus statistics." Knowledge and Data Engineering, IEEE Transactions on 18.8 (2006): 1138-1150.
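These two aspects are combined in the cited Li et al. paper as sim(w1, w2) = exp(-alpha * l) * tanh(beta * h), where l is the path length between the words and h is the depth of their lowest common subsumer. A minimal sketch, assuming a toy hand-built taxonomy in place of WordNet:

```python
import math

# Word-to-word similarity in the spirit of Li et al. (2006): a short path
# and a deep common subsumer mean high similarity. The tiny taxonomy below
# is an illustrative assumption, not WordNet itself.
PARENT = {                       # child -> parent
    "boy": "child", "girl": "child",
    "child": "person", "teacher": "person",
    "person": "organism", "animal": "organism",
}

def ancestors(word):
    chain = [word]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def path_and_depth(w1, w2):
    a1, a2 = ancestors(w1), ancestors(w2)
    for i, node in enumerate(a1):
        if node in a2:
            l = i + a2.index(node)         # shortest path via the subsumer
            h = len(ancestors(node)) - 1   # depth of the subsumer (root = 0)
            return l, h
    raise ValueError("no common subsumer")

def word_similarity(w1, w2, alpha=0.2, beta=0.45):
    l, h = path_and_depth(w1, w2)
    return math.exp(-alpha * l) * math.tanh(beta * h)

print(round(word_similarity("boy", "girl"), 3))
print(round(word_similarity("boy", "teacher"), 3))
print(round(word_similarity("boy", "animal"), 3))
```

With this toy hierarchy the ordering matches the slide's intuition: boy–girl > boy–teacher > boy–animal.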
17. WordNet-Based Semantic Similarity
Processed Sentence 1: new NASA initiative help lead search signs life beyond solar system
Processed Sentence 2: Nexus Exoplanet System Science NExSS take multidisciplinary approach hunt alien life
Joint word set: new, NASA, initiative, help, lead, search, signs, life, beyond, solar, system, Nexus, Exoplanet, alien, Science, NExSS, take, multidisciplinary, approach, hunt
Semantic vectors over the joint word set (1 for an exact match, ? for entries still to be computed):
Sentence 1: 1 1 1 1 1 1 1 1 1 1 1 ? ? ? ? ? ? ? ? ?
Sentence 2: ? ? ? ? ? ? ? 1 ? ? 1 1 1 1 1 1 1 1 1 1
Case 1: If the word in joint word set appears
in the processed sentence, value is set to 1.
Case 2: If the word in joint word set is not
contained in the processed sentence, a
semantic similarity score is computed
between this word and each word in the
processed sentence.
Example:
d(Search, Hunt) = 0.8
d(Solar, Exoplanet) = 0.4
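The two cases above can be sketched as code: each joint-word entry is 1 on an exact match, otherwise the best word-to-word similarity against the sentence, and the semantic similarity Ss is the cosine of the two vectors. The pairwise scores below are the illustrative values from this slide, not real WordNet output:

```python
import math

# Hypothetical word-to-word scores (e.g. from a WordNet-based measure).
WORD_SIM = {
    ("search", "hunt"): 0.8,
    ("solar", "exoplanet"): 0.4,
}

def sim(w1, w2):
    if w1 == w2:
        return 1.0                                # Case 1: exact match
    return WORD_SIM.get((w1, w2), WORD_SIM.get((w2, w1), 0.0))

def semantic_vector(joint_words, sentence_words):
    # Case 2: best similarity against every word of the sentence
    return [max(sim(w, sw) for sw in sentence_words) for w in joint_words]

def cosine(v1, v2):
    dot = sum(a * b for a, b in zip(v1, v2))
    return dot / (math.sqrt(sum(a * a for a in v1)) *
                  math.sqrt(sum(b * b for b in v2)))

s1 = ["search", "solar"]
s2 = ["hunt", "exoplanet"]
joint = ["search", "solar", "hunt", "exoplanet"]
v1 = semantic_vector(joint, s1)   # [1.0, 1.0, 0.8, 0.4]
v2 = semantic_vector(joint, s2)   # [0.8, 0.4, 1.0, 1.0]
print(round(cosine(v1, v2), 3))
```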
20. Focusing on Word Order Similarity
Approach: construct a word order vector, which captures the basic structural information carried by a sentence.
Example:
T1: A quick brown dog jumps over the lazy fox.
T2: A quick brown fox jumps over the lazy dog.
Step 1: Create a joint word set
T: A quick brown dog jumps over the lazy fox.
Step 2: Create a word order vector (r1) for sentence T1 by comparing T with T1:
1. If the same word (w1) is present in T1, we fill the entry for this word in r1 with the corresponding index number from T1. Otherwise, we try to find the most similar word wj for w1 in T1.
2. If the similarity between w1 and wj is greater than a preset threshold, the entry of w1 in r1 is filled with the index number of wj in T1.
3. If both searches fail, the entry of w1 in r1 is 0.
21. Focusing on Word Order Similarity
Result :
r1 = { 1 2 3 4 5 6 7 8 9}
r2 = { 1 2 3 9 5 6 7 8 4}
Then the word order similarity of the two sentences is defined as:
Sr = 1 − ‖r1 − r2‖ / ‖r1 + r2‖
T1: A quick brown dog jumps over the lazy fox.
T2: A quick brown fox jumps over the lazy dog.
T: A quick brown dog jumps over the lazy fox.
Reference: Li, Yuhua, et al. "Sentence similarity based on semantic nets and corpus statistics." Knowledge and Data Engineering, IEEE Transactions on 18.8 (2006): 1138-1150.
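Following the Li et al. formula Sr = 1 − ‖r1 − r2‖ / ‖r1 + r2‖, the order vectors and the resulting score for this example can be sketched as follows (the similar-word fallback with a threshold is omitted here for brevity):

```python
import math

def order_vector(joint_words, sentence_words):
    # Exact-match case only; the paper also falls back to the most similar
    # word above a preset threshold, which this sketch omits.
    return [sentence_words.index(w) + 1 if w in sentence_words else 0
            for w in joint_words]

def word_order_similarity(r1, r2):
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(r1, r2)))
    total = math.sqrt(sum((a + b) ** 2 for a, b in zip(r1, r2)))
    return 1 - diff / total

t1 = "A quick brown dog jumps over the lazy fox".split()
t2 = "A quick brown fox jumps over the lazy dog".split()
joint = t1                       # here the joint word set T equals T1
r1 = order_vector(joint, t1)     # [1, 2, 3, 4, 5, 6, 7, 8, 9]
r2 = order_vector(joint, t2)     # [1, 2, 3, 9, 5, 6, 7, 8, 4]
print(round(word_order_similarity(r1, r2), 3))
```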
22. Back to Sentence Similarity
Both semantic and syntactic information (in terms of word order) play a role in conveying the meaning of sentences. Thus, the overall sentence similarity is defined as a combination of semantic similarity (Ss) and word order similarity (Sr):
S = δ · Ss + (1 − δ) · Sr
where δ decides the relative contributions of semantic and word order information to the overall similarity computation. Since syntax plays a subordinate role in the semantic processing of text, δ should be greater than 0.5.
23. Back to Sentence Similarity
The higher the semantic similarity between two sentences, the higher the overall score.
24. SENTIMENT ANALYSIS
Well, you have changed your stance during the debate. You are done !
25. Sentiment Analysis
It is used to determine:
Subjectivity/objectivity of a statement:
o Objective judgments concern matters of empirical and mathematical fact. Examples: "The moon has no atmosphere." "Two and two are four."
o Subjective judgments concern matters of value and preference. Examples: "Mozart is better than Bach." "Vanilla ice cream with ketchup is disgusting."
Polarity of a statement:
o The attitude of a speaker or writer with respect to some topic
o Varies from -1 to +1: -1 is negative sentiment, 0 is neutral, +1 is positive sentiment
It finds application in movie reviews, blogs, customer feedback, Twitter and other microblogging sources.
26. Sentiment Analysis in Python
NLP techniques are used to identify the sentiment content in a text.
Some of the available simple solutions:
o TextBlob: a Python (2 and 3) library for processing textual data. It provides a consistent API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, and more. Each sentence has a sentiment property which reflects the subjectivity and polarity of the sentence.
o text-processing.com: uses an NLTK 2.0.4-powered text classification process to tell you whether the text you enter expresses positive sentiment, negative sentiment, or is neutral. It also provides an API where you can make a POST request for the text in question and receive a JSON response object with the polarity.
The most popular approach:
o Train a Naïve Bayes classifier on problem-specific training data and use it to predict sentiments. This approach has achieved an F-score of 63% when predicting sentiments for tweets.
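To illustrate the Naïve Bayes approach, here is a minimal pure-Python classifier over unigram features with Laplace smoothing. NLTK's nltk.NaiveBayesClassifier plays this role in practice; the four-document training set is an illustrative stand-in for a real corpus:

```python
import math
from collections import Counter, defaultdict

# Toy training data: (text, label) pairs. Illustrative only.
TRAIN = [
    ("i support this excellent proposal", "pos"),
    ("a great and wise policy", "pos"),
    ("this is a terrible wasteful idea", "neg"),
    ("i oppose this awful bill", "neg"),
]

class NaiveBayes:
    def __init__(self, examples):
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter()
        self.vocab = set()
        for text, label in examples:
            self.label_counts[label] += 1
            for word in text.split():
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def log_prob(self, words, label):
        total = sum(self.word_counts[label].values())
        lp = math.log(self.label_counts[label] /
                      sum(self.label_counts.values()))
        for word in words:
            # Laplace smoothing over the vocabulary
            lp += math.log((self.word_counts[label][word] + 1) /
                           (total + len(self.vocab)))
        return lp

    def classify(self, text):
        words = text.split()
        return max(self.label_counts,
                   key=lambda lbl: self.log_prob(words, lbl))

nb = NaiveBayes(TRAIN)
print(nb.classify("an excellent wise bill"))
```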
29. Present Approach
Training corpus: the Congress presidential debate database
A Naïve Bayes classifier is trained on this corpus and outputs a probability score (Prob.) for each sentence in the test data:
Prob. > 0.6 → positive sentiment of sentence
0.4 < Prob. < 0.6 → argument is neutral
Prob. < 0.4 → negative sentiment of sentence
Features: a combination of the top 200 unigrams and bigram collocations
References:
1. Corpus: http://www.cs.cornell.edu/home/llee/data/convote.html
2. Features: http://streamhacker.com/2010/05/10/text-classification-sentiment-analysis-naive-bayes-classifier/
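The thresholding step above (0.4 and 0.6, as given on this slide) maps the classifier's probability to one of three labels:

```python
# Map the classifier's positive-class probability to a label,
# using the 0.4 / 0.6 thresholds from the slide.
def label_from_probability(prob):
    if prob > 0.6:
        return "positive"
    if prob < 0.4:
        return "negative"
    return "neutral"

print(label_from_probability(0.72))   # -> positive
```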
31. BACKTRACK
Finally, we are getting there!
33. Debating User Interface
Deployed the Django chat plugin by Sharan (https://github.com/sharan01/django-ajax-chat)
Modification: a statement number is prefixed to each argument when it is displayed
Advantages:
Easy to understand
All the chat messages can be easily accessed in JSON format
The chat messages can then be scraped and processed easily in Python
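Consuming the chat feed can be sketched with the standard library. The message shape below ({"user": ..., "text": ...}) is an assumption about the plugin's JSON output, not its documented format:

```python
import json

# Hypothetical JSON feed from the chat plugin (shape is an assumption).
raw = json.dumps([
    {"user": "User 1", "text": "I am in favor of this notion"},
    {"user": "User 2", "text": "I am against this notion"},
])

# Prefix a statement number to each argument, as described above.
messages = json.loads(raw)
numbered = ["%d: [%s] %s" % (i, m["user"], m["text"])
            for i, m in enumerate(messages, start=1)]
for line in numbered:
    print(line)
```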
34. Should we ban homework?
Manual Visualization of Debate
[Chat figure: arguments exchanged between User 1 and User 2]
"I am in favor of this notion."
"Homework has little educational worth…"
"Homework has a lot of educational value, …"
"…, there is a strong and positive relationship between homework and how well students do at school."
"International comparisons of older students have found no positive relationship…"
"I am against this notion."
"Homework encourages students to work…"
"Homework ensures that students practice what they are taught at school."
"I don't believe that."
35. Should we ban homework?
Visualization of Sentiments in Debate
[Chat figure: the same arguments, each annotated with its polarity (+) or (−)]
Stance summary:
User    Topic Stance    User Stance    Overall Stance
User 1  −               +              −
User 2  −               −              +
36. Tracking Sentiments
The sentiment of a fresh argument should match the sentiment of the user's overall stance.
The sentiment of a counter-argument should be opposite to the sentiment of the previous argument.
The sentiments of two consecutive arguments can be the same:
if they are by the same user, or
if the next user has changed his/her stance.
37. Tracking Semantics
The topic and fresh arguments should be semantically similar to each other.
An argument and its counter-argument should be semantically similar to each other.
A supporting argument should be semantically similar to the main argument.
38. A QUICK RECAP
SENTENCE SIMILARITY: gives the similarity score between two sentences (ss)
SENTIMENT ANALYSIS: gives the polarity of a sentence (se)
BACKTRACK: gives the two sentences between which similarity should be measured, and whether the user is firm on his/her stance
SCORING: we now use all three previous steps to arrive at a score
39. SCORING
Almost done!
40. Scoring Approach
Score of a particular argument with semantic score (ss) and sentiment score (se):
Sentence-level score:
If the argument stance matches the user stance:
Sentence-level score = 0.75 * ss + 0.25 * se
If the argument stance is opposite to the user stance:
Sentence-level score = ss * (-1)
Amplification factor:
The depth of an argument is the number of levels of arguments below it.
Example (argument tree):
3: "Homework has little educational worth…" — depth 2
4: "Professor Cooper of Duke University has…" — depth 1
5: "International comparisons of older students have found no positive…"
Final Score = (Amplification Factor + 1) * Sentence Score
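The scoring rules above can be sketched directly as code (the example numbers are illustrative, not from an actual debate run):

```python
# Sentence-level score from semantic score ss and sentiment score se,
# then amplified by the argument's depth, per the slide's formulas.
def sentence_score(ss, se, stance_matches):
    if stance_matches:
        return 0.75 * ss + 0.25 * se
    return -ss

def final_score(ss, se, stance_matches, depth):
    return (depth + 1) * sentence_score(ss, se, stance_matches)

# e.g. sentence 3 above has depth 2:
print(final_score(0.8, 0.6, True, depth=2))   # (2 + 1) * (0.6 + 0.15) = 2.25
```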
41. Present Challenges and Proposed Solutions
Every argument also has a strength associated with it, which is not captured.
Possible solution: check the strength of the fact and its validity by scraping the web.
The scoring approach has been tested manually on only a couple of use cases.
Possible solution: scrape complete debates from idebate.org and check the approach's validity.
The Naïve Bayes classifier is not designed to capture neutral arguments separately.
Possible solution: train a separate Naïve Bayes classifier with a tagged corpus of neutral arguments to judge neutrality.