This presentation walks through the application of Python NLP techniques to analyze arguments during a debate and to define a strategy for deciding the winner of the debate on the basis of the strength and relevance of the arguments.
This is made for PyCon India 2015.
For details: https://in.pycon.org/cfp/pycon-india-2015/proposals/analyzing-arguments-during-a-debate-using-natural-language-processing-in-python/
Contact me: abhinav.gpt3@gmail.com
Analyzing Arguments during a Debate using Natural Language Processing in Python
1. Analyzing Arguments during a
Debate using Natural Language
Processing in Python
ABHINAV GUPTA
ADYASHA MAHARANA
PROF PLABAN KUMAR BHOWMICK
2. about.me
Machine Learning Journey
2013 – Andrew Ng Machine Learning Course: recommendation systems, anomaly detection, digit recognition
2014 – Internship at Zomato: review highlights, what dishes to order
2015 – Course on language processing in e-learning
2015 – Working in the Business Insights team at American Express
3. How may a debate proceed?
"This new movie 'Superman vs. Batman' is so cool! The winner has got to be Superman, with his mighty Kryptonian abilities and people's support. What do you think?!"
"I agree! Superman is definitely more capable than Batman."
"What are you saying? Batman is so much more technologically advanced!"
"Both Batman and Superman are powerful in their own ways. It will be a draw."
"Ben Affleck is so HOT!"
4. What will we discuss?
Why the Natural Language Toolkit?
Basic Natural Language Processing (NLP) techniques with NLTK
Our approach for analyzing an argument
Sentence Similarity
Semantic Similarity
Word Order Similarity
Sentiment Analysis
Backtrack
User Interface of the framework
Visualization of an argument
Calculation of Semantics and Sentiments
Overall Scoring Strategy
Challenges and Proposed Solutions
5. Why Natural Language Toolkit (NLTK)?
A platform for implementing natural language processing through Python programs
A huge collection of corpora and lexical resources with an easy interface
Built-in libraries of several text processing algorithms
Open source!
6. Starting with the Basics
"I do not feel very good about Monday mornings."
Tokenization: ['I', 'do', 'not', 'feel', 'very', 'good', 'about', 'Monday', 'mornings']
Parts-of-Speech Tagging:
'I' – Personal Pronoun
'do' – Verb
'not' – Adverb
'feel' – Verb
'very' – Adverb
'good' – Adjective
'about' – Preposition
'Monday' – Proper Noun
'mornings' – Plural Noun
8. Tokens [‘I’, ‘do’, ‘not’, ‘feel’, ‘very’, ‘good’, ‘about’, ‘Monday’, ‘mornings’]
Removal of Stop Words [‘I’, ‘feel’, ‘good’, ‘Monday’, ‘mornings’]
Stemmed Words [‘I’, ‘feel’, ‘good’, ‘Monday’, ‘morn’]
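The pipeline above can be sketched in plain Python. In practice NLTK provides these steps out of the box (nltk.word_tokenize, nltk.corpus.stopwords, nltk.stem.PorterStemmer); the tiny stop list and suffix rules below are illustrative stand-ins, not NLTK's actual resources.

```python
# Dependency-free sketch of tokenize -> stop-word removal -> stemming.
# The stop list and suffix rules are illustrative stand-ins for NLTK's
# stopwords corpus and PorterStemmer.

STOP_WORDS = {"do", "not", "very", "about"}   # tiny illustrative stop list
SUFFIXES = ("ings", "ing", "s")               # naive stemming rules

def tokenize(sentence):
    return sentence.rstrip(".").split()

def remove_stop_words(tokens):
    return [t for t in tokens if t.lower() in STOP_WORDS] and [] or \
           [t for t in tokens if t.lower() not in STOP_WORDS]

def stem(token):
    for suffix in SUFFIXES:
        if token.lower().endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

tokens = tokenize("I do not feel very good about Monday mornings.")
content = remove_stop_words(tokens)
stems = [stem(t) for t in content]
print(content)   # ['I', 'feel', 'good', 'Monday', 'mornings']
print(stems)     # ['I', 'feel', 'good', 'Monday', 'morn']
```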
9. What do we look for in an argument?
What is the stance taken by the debater in this argument?
Has the debater changed stance from the previous arguments?
Is the argument related to the debate or irrelevant?
Is the argument good enough?
10. Analysis of an Argument
SENTENCE SIMILARITY: Is the argument related to the debate?
SENTIMENT ANALYSIS: What is the polarity of the argument?
BACKTRACK: Has the debater changed stance? Is the argument good enough?
SCORING: Final score of the user
11. SENTENCE SIMILARITY
Well, your argument is not at all relevant to this topic.
12. Sentence Similarity
We will measure it on the basis of the following two criteria:
1. Semantic Similarity (Ss): a metric defined over a set of documents or terms, where the idea of distance between them is based on the likeness of their meaning or semantic content. [Source: Wikipedia]
Example:
I like that bachelor.
I like that unmarried man.
2. Word Order Similarity (Sr): measures the similarity between the order of words in two sentences.
Example:
A quick brown dog jumps over the lazy fox.
A quick brown fox jumps over the lazy dog.
[Diagram: Sentence Similarity = Semantics + Word Order]
13. Focusing on Semantic Similarity (Ss)
Semantic distance between words in context is the distance between their underlying senses or lexical concepts.
d(festival, celebration) < d(school, circus)
Semantic similarity is how close the lexical concepts of two units (word, sentence, paragraph) of language are.
d(Mangoes and bananas are fruits, Mangoes are sweeter than bananas) < d(Raj has a job at the hospital, Hospitals have a huge staff of doctors and nurses)
Lexical databases like WordNet group English words into sets of synonyms expressing a distinct concept and are used for calculating semantic similarity.
14. Semantic Similarity: How to calculate it?
Sentence 1: "A new NASA initiative will help lead the search for signs of life beyond our solar system"
Sentence 2: "The Nexus for Exoplanet System Science, or NExSS, will take a multidisciplinary approach to the hunt for alien life"
Processed Sentence 1: new NASA initiative help lead search signs life beyond solar system
Processed Sentence 2: Nexus Exoplanet System Science NExSS take multidisciplinary approach hunt alien life
Joint word set: new, NASA, initiative, help, lead, search, signs, life, beyond, solar, system, Nexus, Exoplanet, alien, Science, NExSS, take, multidisciplinary, approach, hunt
Each sentence gets a binary vector over the joint word set (1 if the word occurs in the sentence, else 0):
Sentence 1: 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0
Sentence 2: 0 0 0 0 0 0 0 1 0 0 1 1 1 1 1 1 1 1 1 1
15. WordNet-Based Semantic Similarity
[Figure: a fragment of the WordNet concept hierarchy]
Such a network forms the basis of several distance formulae for calculating semantic similarity.
16. WordNet-Based Semantic Similarity
Calculate similarity between:
1. Boy – Girl and Boy – Teacher
2. Boy – Teacher and Boy – Animal
Two important aspects to look for:
1. Path length: the shorter the path within the same hierarchy, the more similar the words are.
2. Hierarchical structure: words at upper layers of the hierarchy have more general semantics and less similarity between them, while words at lower layers have more concrete semantics and more similarity.
Reference: Li, Yuhua, et al. "Sentence similarity based on semantic nets and corpus statistics." Knowledge and Data Engineering, IEEE Transactions on 18.8 (2006): 1138-1150.
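These two aspects are combined in the cited Li et al. paper as sim(w1, w2) = exp(-alpha * l) * tanh(beta * h), where l is the path length between the words and h is the depth of their lowest common subsumer. A minimal sketch, assuming a toy hand-built taxonomy in place of WordNet:

```python
import math

# Word-to-word similarity in the spirit of Li et al. (2006): a short path
# and a deep common subsumer mean high similarity. The tiny taxonomy below
# is an illustrative assumption, not WordNet itself.
PARENT = {                       # child -> parent
    "boy": "child", "girl": "child",
    "child": "person", "teacher": "person",
    "person": "organism", "animal": "organism",
}

def ancestors(word):
    chain = [word]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def path_and_depth(w1, w2):
    a1, a2 = ancestors(w1), ancestors(w2)
    for i, node in enumerate(a1):
        if node in a2:
            l = i + a2.index(node)         # shortest path via the subsumer
            h = len(ancestors(node)) - 1   # depth of the subsumer (root = 0)
            return l, h
    raise ValueError("no common subsumer")

def word_similarity(w1, w2, alpha=0.2, beta=0.45):
    l, h = path_and_depth(w1, w2)
    return math.exp(-alpha * l) * math.tanh(beta * h)

print(round(word_similarity("boy", "girl"), 3))
print(round(word_similarity("boy", "teacher"), 3))
print(round(word_similarity("boy", "animal"), 3))
```

With this toy hierarchy the ordering matches the slide's intuition: boy–girl > boy–teacher > boy–animal.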
17. WordNet-Based Semantic Similarity
Processed Sentence 1: new NASA initiative help lead search signs life beyond solar system
Processed Sentence 2: Nexus Exoplanet System Science NExSS take multidisciplinary approach hunt alien life
Joint word set: new, NASA, initiative, help, lead, search, signs, life, beyond, solar, system, Nexus, Exoplanet, alien, Science, NExSS, take, multidisciplinary, approach, hunt
Semantic vectors over the joint word set (1 for an exact match, ? for entries still to be computed):
Sentence 1: 1 1 1 1 1 1 1 1 1 1 1 ? ? ? ? ? ? ? ? ?
Sentence 2: ? ? ? ? ? ? ? 1 ? ? 1 1 1 1 1 1 1 1 1 1
Case 1: If the word in joint word set appears
in the processed sentence, value is set to 1.
Case 2: If the word in joint word set is not
contained in the processed sentence, a
semantic similarity score is computed
between this word and each word in the
processed sentence.
Example:
d(Search, Hunt) = 0.8
d(Solar, Exoplanet) = 0.4
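The two cases above can be sketched as code: each joint-word entry is 1 on an exact match, otherwise the best word-to-word similarity against the sentence, and the semantic similarity Ss is the cosine of the two vectors. The pairwise scores below are the illustrative values from this slide, not real WordNet output:

```python
import math

# Hypothetical word-to-word scores (e.g. from a WordNet-based measure).
WORD_SIM = {
    ("search", "hunt"): 0.8,
    ("solar", "exoplanet"): 0.4,
}

def sim(w1, w2):
    if w1 == w2:
        return 1.0                                # Case 1: exact match
    return WORD_SIM.get((w1, w2), WORD_SIM.get((w2, w1), 0.0))

def semantic_vector(joint_words, sentence_words):
    # Case 2: best similarity against every word of the sentence
    return [max(sim(w, sw) for sw in sentence_words) for w in joint_words]

def cosine(v1, v2):
    dot = sum(a * b for a, b in zip(v1, v2))
    return dot / (math.sqrt(sum(a * a for a in v1)) *
                  math.sqrt(sum(b * b for b in v2)))

s1 = ["search", "solar"]
s2 = ["hunt", "exoplanet"]
joint = ["search", "solar", "hunt", "exoplanet"]
v1 = semantic_vector(joint, s1)   # [1.0, 1.0, 0.8, 0.4]
v2 = semantic_vector(joint, s2)   # [0.8, 0.4, 1.0, 1.0]
print(round(cosine(v1, v2), 3))
```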
20. Focusing on Word Order Similarity
Approach: construct a word order vector, which captures the basic structural information carried by a sentence.
Example:
T1: A quick brown dog jumps over the lazy fox.
T2: A quick brown fox jumps over the lazy dog.
Step 1: Create a joint word set
T: A quick brown dog jumps over the lazy fox.
Step 2: Create a word order vector (r1) for sentence T1 by comparing T with T1:
1. If the same word (w1) is present in T1, we fill the entry for this word in r1 with the corresponding index number from T1. Otherwise, we try to find the most similar word wj for w1 in T1.
2. If the similarity between w1 and wj is greater than a preset threshold, the entry of w1 in r1 is filled with the index number of wj in T1.
3. If both searches fail, the entry of w1 in r1 is 0.
21. Focusing on Word Order Similarity
Result :
r1 = { 1 2 3 4 5 6 7 8 9}
r2 = { 1 2 3 9 5 6 7 8 4}
Then the word order similarity of the two sentences is defined as:
Sr = 1 − ‖r1 − r2‖ / ‖r1 + r2‖
T1: A quick brown dog jumps over the lazy fox.
T2: A quick brown fox jumps over the lazy dog.
T: A quick brown dog jumps over the lazy fox.
Reference: Li, Yuhua, et al. "Sentence similarity based on semantic nets and corpus statistics." Knowledge and Data Engineering, IEEE Transactions on 18.8 (2006): 1138-1150.
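Following the Li et al. formula Sr = 1 − ‖r1 − r2‖ / ‖r1 + r2‖, the order vectors and the resulting score for this example can be sketched as follows (the similar-word fallback with a threshold is omitted here for brevity):

```python
import math

def order_vector(joint_words, sentence_words):
    # Exact-match case only; the paper also falls back to the most similar
    # word above a preset threshold, which this sketch omits.
    return [sentence_words.index(w) + 1 if w in sentence_words else 0
            for w in joint_words]

def word_order_similarity(r1, r2):
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(r1, r2)))
    total = math.sqrt(sum((a + b) ** 2 for a, b in zip(r1, r2)))
    return 1 - diff / total

t1 = "A quick brown dog jumps over the lazy fox".split()
t2 = "A quick brown fox jumps over the lazy dog".split()
joint = t1                       # here the joint word set T equals T1
r1 = order_vector(joint, t1)     # [1, 2, 3, 4, 5, 6, 7, 8, 9]
r2 = order_vector(joint, t2)     # [1, 2, 3, 9, 5, 6, 7, 8, 4]
print(round(word_order_similarity(r1, r2), 3))
```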
22. Back to Sentence Similarity
Both semantic and syntactic information (in terms of word order) play a role in conveying the meaning of sentences. Thus, the overall sentence similarity is defined as a combination of semantic similarity (Ss) and word order similarity (Sr):
S = δ · Ss + (1 − δ) · Sr
where δ decides the relative contributions of semantic and word order information to the overall similarity computation. Since syntax plays a subordinate role in the semantic processing of text, δ should be greater than 0.5.
23. Back to Sentence Similarity
The higher the semantic similarity between two sentences, the higher the overall score.
24. SENTIMENT ANALYSIS
Well, you have changed your stance during the debate. You are done !
25. Sentiment Analysis
It is used to determine:
Subjectivity/objectivity of a statement:
o Objective judgments concern matters of empirical and mathematical fact. Examples: "The moon has no atmosphere." "Two and two are four."
o Subjective judgments concern matters of value and preference. Examples: "Mozart is better than Bach." "Vanilla ice cream with ketchup is disgusting."
Polarity of a statement:
o The attitude of a speaker or writer with respect to some topic
o Varies from -1 to +1: -1 is negative sentiment, 0 is neutral, +1 is positive sentiment
It finds application in movie reviews, blogs, customer feedback, Twitter and other microblogging sources.
26. Sentiment Analysis in Python
NLP techniques are used to identify the sentiment content in a text.
Some of the available simple solutions:
o TextBlob: a Python (2 and 3) library for processing textual data. It provides a consistent API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, and more. Each sentence has a sentiment property which reflects the subjectivity and polarity of the sentence.
o text-processing.com: uses an NLTK 2.0.4-powered text classification process to tell you whether the text you enter expresses positive sentiment, negative sentiment, or is neutral. It also provides an API where you can make a POST request for the text in question and receive a JSON response object with the polarity.
The most popular approach:
o Train a Naïve Bayes classifier on problem-specific training data and use it to predict sentiments. This approach has achieved an F-score of 63% when predicting sentiments for tweets.
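To illustrate the Naïve Bayes approach, here is a minimal pure-Python classifier over unigram features with Laplace smoothing. NLTK's nltk.NaiveBayesClassifier plays this role in practice; the four-document training set is an illustrative stand-in for a real corpus:

```python
import math
from collections import Counter, defaultdict

# Toy training data: (text, label) pairs. Illustrative only.
TRAIN = [
    ("i support this excellent proposal", "pos"),
    ("a great and wise policy", "pos"),
    ("this is a terrible wasteful idea", "neg"),
    ("i oppose this awful bill", "neg"),
]

class NaiveBayes:
    def __init__(self, examples):
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter()
        self.vocab = set()
        for text, label in examples:
            self.label_counts[label] += 1
            for word in text.split():
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def log_prob(self, words, label):
        total = sum(self.word_counts[label].values())
        lp = math.log(self.label_counts[label] /
                      sum(self.label_counts.values()))
        for word in words:
            # Laplace smoothing over the vocabulary
            lp += math.log((self.word_counts[label][word] + 1) /
                           (total + len(self.vocab)))
        return lp

    def classify(self, text):
        words = text.split()
        return max(self.label_counts,
                   key=lambda lbl: self.log_prob(words, lbl))

nb = NaiveBayes(TRAIN)
print(nb.classify("an excellent wise bill"))
```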
29. Present Approach
Training corpus: the Congress presidential debate database
A Naïve Bayes classifier is trained on this corpus and outputs a probability score (Prob.) for each sentence in the test data:
Prob. > 0.6 → positive sentiment of sentence
0.4 < Prob. < 0.6 → argument is neutral
Prob. < 0.4 → negative sentiment of sentence
Features: a combination of the top 200 unigrams and bigram collocations
References:
1. Corpus: http://www.cs.cornell.edu/home/llee/data/convote.html
2. Features: http://streamhacker.com/2010/05/10/text-classification-sentiment-analysis-naive-bayes-classifier/
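The thresholding step above (0.4 and 0.6, as given on this slide) maps the classifier's probability to one of three labels:

```python
# Map the classifier's positive-class probability to a label,
# using the 0.4 / 0.6 thresholds from the slide.
def label_from_probability(prob):
    if prob > 0.6:
        return "positive"
    if prob < 0.4:
        return "negative"
    return "neutral"

print(label_from_probability(0.72))   # -> positive
```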
31. BACKTRACK
Finally, we are getting there!
33. Debating User Interface
Deployed the Django chat plugin by Sharan (https://github.com/sharan01/django-ajax-chat)
Modification: a statement number is prefixed to each argument when it is displayed
Advantages:
Easy to understand
All the chat messages can be easily accessed in JSON format
The chat messages can then be scraped and processed easily in Python
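Consuming the chat feed can be sketched with the standard library. The message shape below ({"user": ..., "text": ...}) is an assumption about the plugin's JSON output, not its documented format:

```python
import json

# Hypothetical JSON feed from the chat plugin (shape is an assumption).
raw = json.dumps([
    {"user": "User 1", "text": "I am in favor of this notion"},
    {"user": "User 2", "text": "I am against this notion"},
])

# Prefix a statement number to each argument, as described above.
messages = json.loads(raw)
numbered = ["%d: [%s] %s" % (i, m["user"], m["text"])
            for i, m in enumerate(messages, start=1)]
for line in numbered:
    print(line)
```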
34. Should we ban homework?
Manual Visualization of Debate
[Chat figure: arguments exchanged between User 1 and User 2]
"I am in favor of this notion."
"Homework has little educational worth…"
"Homework has a lot of educational value, …"
"…, there is a strong and positive relationship between homework and how well students do at school."
"International comparisons of older students have found no positive relationship…"
"I am against this notion."
"Homework encourages students to work…"
"Homework ensures that students practice what they are taught at school."
"I don't believe that."
35. Should we ban homework?
Visualization of Sentiments in Debate
[Chat figure: the same arguments, each annotated with its polarity (+) or (−)]
Stance summary:
User    Topic Stance    User Stance    Overall Stance
User 1  −               +              −
User 2  −               −              +
36. Tracking Sentiments
The sentiment of a fresh argument should match the sentiment of the user's overall stance.
The sentiment of a counter-argument should be opposite to the sentiment of the previous argument.
The sentiments of two consecutive arguments can be the same:
if they are by the same user, or
if the next user has changed his/her stance.
37. Tracking Semantics
The topic and fresh arguments should be semantically similar to each other.
An argument and its counter-argument should be semantically similar to each other.
A supporting argument should be semantically similar to the main argument.
38. A QUICK RECAP
SENTENCE SIMILARITY: gives the similarity score between two sentences (ss)
SENTIMENT ANALYSIS: gives the polarity of a sentence (se)
BACKTRACK: gives the two sentences between which similarity should be measured, and whether the user is firm on his/her stance
SCORING: we now use all three previous steps to arrive at a score
39. SCORING
Almost done!
40. Scoring Approach
Score of a particular argument with semantic score (ss) and sentiment score (se):
Sentence-level score:
If the argument stance matches the user stance:
Sentence-level score = 0.75 * ss + 0.25 * se
If the argument stance is opposite to the user stance:
Sentence-level score = ss * (-1)
Amplification factor:
The depth of an argument is the number of levels of arguments below it.
Example (argument tree):
3: "Homework has little educational worth…" — depth 2
4: "Professor Cooper of Duke University has…" — depth 1
5: "International comparisons of older students have found no positive…"
Final Score = (Amplification Factor + 1) * Sentence Score
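The scoring rules above can be sketched directly as code (the example numbers are illustrative, not from an actual debate run):

```python
# Sentence-level score from semantic score ss and sentiment score se,
# then amplified by the argument's depth, per the slide's formulas.
def sentence_score(ss, se, stance_matches):
    if stance_matches:
        return 0.75 * ss + 0.25 * se
    return -ss

def final_score(ss, se, stance_matches, depth):
    return (depth + 1) * sentence_score(ss, se, stance_matches)

# e.g. sentence 3 above has depth 2:
print(final_score(0.8, 0.6, True, depth=2))   # (2 + 1) * (0.6 + 0.15) = 2.25
```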
41. Present Challenges and Proposed Solutions
Every argument also has a strength associated with it, which is not captured.
Possible solution: check the strength of the fact and its validity by scraping the web.
The scoring approach has been tested manually on only a couple of use cases.
Possible solution: scrape complete debates from idebate.org and check the approach's validity.
The Naïve Bayes classifier is not designed to capture neutral arguments separately.
Possible solution: train a separate Naïve Bayes classifier with a tagged corpus of neutral arguments to judge neutrality.