TEXT CLASSIFICATION using SUPPORT VECTOR MACHINES in R
by Sai Srinivas Kotni [14BM60083]
Under the guidance of Prof. Susmita Mukhopadhyay
Problem Statement
• Automated text classification is considered a vital method for managing and processing the vast, continuously growing volume of digital documents. In general, text classification plays an important role in information extraction and summarization, text retrieval, and question answering.
Objective:
• Create an efficient support vector machine (SVM) model for text classification/categorization
• Measure its performance
Introduction
Text classification (text categorization): assign documents to one or more predefined categories.
(Figure: example categories Sport, Science, Theory, Art)

Applications of Text Classification
• Organize web pages into hierarchies
• Domain-specific information extraction
• Sort email into different folders
• Find users' interests

Common Methods
• Manual classification
• Automatic document classification
• Supervised learning of a document-label assignment function, for example:
– Naive Bayes (simple, common method)
– k-Nearest Neighbors (simple, powerful)
– Support vector machines (newer, more powerful), and many more
Examples
Assign labels to each document or web page:
• Labels may be domain-specific binary
– e.g., "interesting-to-me" : "not-interesting-to-me"
– e.g., "spam" : "not-spam"
– e.g., "contains adult language" : "doesn't"
• LABELS = TOPICS
– "finance" / "sports" / "asia"
• LABELS = OPINION
– "like" / "hate" / "neutral"
• LABELS = AUTHOR
– "Shakespeare" / "Marlowe" / "Ben Jonson"
• Labels may be genres
– e.g., "editorials", "movie-reviews", "news"

Given:
• A description of an instance, x ∈ X, where X is the instance language or instance space (e.g., how to represent text documents).
• A fixed set of categories C = {c1, c2, …, cn}
Determine:
• The category of x: c(x) ∈ C, where c(x) is a categorization function whose domain is X and whose range is C.
Decision Tree model
• Decision Tree (DT):
– A tree where the root and each internal node are labeled with a question.
– The arcs represent each possible answer to the associated question.
– Each leaf node represents a prediction of a solution to the problem.
• A popular technique for classification: a leaf node indicates the class to which the corresponding tuple belongs.
• A decision tree model is a computational model consisting of three parts:
– The decision tree
– An algorithm to create the tree
– An algorithm that applies the tree to data
• Creating the tree is the most difficult part.
• Processing is basically a search similar to that in a binary search tree (although a DT need not be binary).
• The decision tree approach to classification divides the search space into rectangular regions; a tuple is classified based on the region into which it falls.
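To make "processing is a search from the root to a leaf" concrete, here is a minimal Python sketch of the third part only, the algorithm that applies an existing tree to data. The tree is hand-built and hypothetical (its question terms "goal" and "stock" and its labels are illustrative assumptions); creating the tree from data, the hard part, is not shown.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    label: str  # predicted class for documents that reach this leaf


@dataclass
class Node:
    term: str  # the question asked here: "does the document contain this term?"
    children: Dict[bool, "Tree"]  # one arc per possible answer


Tree = Union[Node, Leaf]


def classify(tree: Tree, document: str) -> str:
    """Walk from the root to a leaf, following the arc that matches each answer."""
    words = set(document.lower().split())
    while isinstance(tree, Node):
        tree = tree.children[tree.term in words]
    return tree.label


# Hypothetical hand-built tree (not learned from data).
example_tree = Node("goal", {
    True: Leaf("sports"),
    False: Node("stock", {True: Leaf("finance"), False: Leaf("other")}),
})

print(classify(example_tree, "A late goal won the match"))  # sports
```

Each question splits the space along one term axis, which is why the resulting class regions are rectangular.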
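Naive Bayes Algorithm
Formula: Naive Bayes works on conditional probability, i.e., Bayes' theorem:
P(Ck | x) = P(Ck) * P(x | Ck) / P(x)
Under the "naive" assumption that words are conditionally independent given the class, P(x | Ck) = P(x1 | Ck) * P(x2 | Ck) * … * P(xn | Ck).
Where:
• P(Ck | x): the probability that the tweet has positive/negative sentiment
• P(Ck): the prior probability of the positive/negative class in the training data frame
• P(x | Ck): the probability of the tweet's words given the positive or negative class
• k ∈ {positive, negative}
• P(xi | Ck): the probability of each word xi in the bag of words x = (x1, x2, x3, x4, …)
The sentiment with the highest probability is selected.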
Logic behind the Model
• Suppose we have trained the model using an Excel file containing 10 tweets: 3 positive and 7 negative.
• P(Positive) = 0.3, P(Negative) = 0.7
• Suppose our tweet is "I had an awesome experience", and its five words are represented by x1, x2, x3, x4, x5.
P(Pos | x1 x2 x3 x4 x5) ∝ P(Pos) * P(x1|Pos) * P(x2|Pos) * P(x3|Pos) * P(x4|Pos) * P(x5|Pos)   (1)
P(Neg | x1 x2 x3 x4 x5) ∝ P(Neg) * P(x1|Neg) * P(x2|Neg) * P(x3|Neg) * P(x4|Neg) * P(x5|Neg)   (2)
If (1) > (2), the tweet is classified as positive; otherwise, as negative. The shared denominator P(x) can be dropped because it is the same for both classes.
Each word probability is estimated with add-one (Laplace) smoothing:
P(x1 | Pos) = (Nk + 1) / (N + D)
Where:
• Nk: number of times x1 occurs in the positive data frame
• N: total number of words in the positive data frame, counting repeats
• D: total number of distinct words across the positive and negative data frames
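The scoring in (1) and (2) can be sketched in a few lines of Python. This is an illustrative sketch, not the author's R code: the ten training tweets are hypothetical (chosen only to match the 3 positive / 7 negative split above), tokenization is a plain lowercase whitespace split, and the smoothing follows (Nk + 1) / (N + D) with D counted over the training vocabulary.

```python
import math
from collections import Counter

# Hypothetical toy training set mirroring the slide: 10 tweets, 3 positive and 7 negative.
TRAINING_TWEETS = [
    ("awesome experience", "positive"),
    ("had an awesome day", "positive"),
    ("great experience", "positive"),
    ("bad day", "negative"),
    ("terrible service", "negative"),
    ("had a bad experience", "negative"),
    ("awful", "negative"),
    ("worst day ever", "negative"),
    ("not good", "negative"),
    ("terrible", "negative"),
]


def train(tweets):
    """Estimate class priors P(Ck), per-class word counts, and the vocabulary."""
    labels = [label for _, label in tweets]
    priors = {c: n / len(tweets) for c, n in Counter(labels).items()}
    counts = {c: Counter() for c in priors}
    for text, label in tweets:
        counts[label].update(text.lower().split())
    vocabulary = set().union(*counts.values())  # D = len(vocabulary)
    return priors, counts, vocabulary


def log_score(text, label, model):
    """log P(Ck) + sum of log P(xi | Ck), with P(xi | Ck) = (Nk + 1) / (N + D)."""
    priors, counts, vocabulary = model
    n = sum(counts[label].values())  # N: words in this class, repeats included
    d = len(vocabulary)              # D: distinct words across both classes
    score = math.log(priors[label])
    for word in text.lower().split():
        score += math.log((counts[label][word] + 1) / (n + d))  # Nk + 1 over N + D
    return score


def classify(text, model):
    """Pick the class with the highest score, i.e. compare (1) against (2)."""
    return max(model[0], key=lambda label: log_score(text, label, model))


model = train(TRAINING_TWEETS)
print(classify("I had an awesome experience", model))  # positive
```

Summing logarithms instead of multiplying probabilities avoids numerical underflow on long texts; because the logarithm is monotonic, the winning class is the same as when comparing (1) and (2) directly.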
Support Vector Machines
Main idea of SVMs
• Find the linear separating hyperplane that maximizes the margin, i.e., the optimal separating hyperplane (OSH)
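The "maximize the margin" idea is usually written as the following hard-margin optimization problem. The slide does not give it; the notation below is the standard textbook form, assuming training documents x_i with labels y_i ∈ {−1, +1} and a hyperplane w · x + b = 0. The margin between the two classes is 2 / ‖w‖, so minimizing ‖w‖ maximizes it; a soft-margin variant with slack variables handles data that is not perfectly separable.

```latex
\min_{\mathbf{w},\,b}\ \frac{1}{2}\lVert\mathbf{w}\rVert^{2}
\quad\text{subject to}\quad
y_i\,(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1,\qquad i = 1,\dots,n
```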
SVMs are a supervised learning method based on the Structural Risk Minimization principle from computational learning theory. The idea of structural risk minimization is to find a hypothesis h for which we can guarantee the lowest true error.
Why Should SVMs Work Well for Text Categorization?
• High-dimensional input space
• Document vectors are sparse
• Few irrelevant features
• Most text categorization problems are linearly separable
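A linear SVM can be trained by subgradient descent on the regularized hinge loss. The Python sketch below uses a Pegasos-style update; it is not what the R package e1071 does internally. Assumptions: a hypothetical 4-term vocabulary and toy corpus, term-frequency vectors, no bias term (the hyperplane passes through the origin), fixed example order instead of Pegasos' random sampling, and illustrative values for `lam` and `epochs`.

```python
from typing import List, Sequence, Tuple

# Hypothetical 4-term vocabulary; real text vectors have thousands of dimensions.
VOCAB = ["goal", "match", "stock", "market"]


def to_vector(text: str) -> List[float]:
    """Term-frequency vector: one component per vocabulary term."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(data: List[Tuple[List[float], int]],
                     lam: float = 0.1, epochs: int = 20) -> List[float]:
    """Subgradient descent on lam/2 * ||w||^2 + mean hinge loss (no bias term)."""
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        for x, y in data:  # fixed order instead of random sampling
            t += 1
            eta = 1.0 / (lam * t)                   # step size 1/(lam * t)
            violates = y * dot(w, x) < 1            # inside the margin or misclassified
            w = [(1 - eta * lam) * wi for wi in w]  # shrink: regularizer gradient step
            if violates:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]  # hinge-loss subgradient
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Which side of the hyperplane w . x = 0 the document falls on."""
    return 1 if dot(w, x) > 0 else -1


# Hypothetical toy corpus: +1 = sports, -1 = finance.
TRAIN = ([(to_vector(t), 1) for t in ["goal match", "goal goal", "goal match match"]]
         + [(to_vector(t), -1) for t in ["stock market", "stock stock", "stock market market"]])

weights = train_linear_svm(TRAIN)
print(predict(weights, to_vector("late goal")))  # 1
```

Words outside the vocabulary (such as "late") simply contribute nothing, which mirrors how sparse document vectors are mostly zeros.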
Methodology
Documents → Preprocessing → Indexing and feature selection → Applying SVM classification algorithm → Performance measure
Preprocessing: transform documents into a suitable representation for the classification task
• Remove HTML or other tags
• Remove stopwords
• Perform word stemming
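The three preprocessing steps can be sketched as follows. This is a minimal Python illustration under stated assumptions: the stopword list is a tiny hypothetical sample, tags are stripped with a simple regular expression, and `crude_stem` is a toy suffix stripper standing in for a real stemmer such as Porter's (it produces stems like "scor" that a proper stemmer would not).

```python
import re
from typing import List

STOPWORDS = {"a", "an", "the", "is", "are", "and", "of", "to", "in"}  # tiny illustrative list
SUFFIXES = ("ing", "ed", "es", "s")  # crude stand-in for a real stemmer


def strip_tags(text: str) -> str:
    """Remove HTML or other angle-bracket tags."""
    return re.sub(r"<[^>]+>", " ", text)


def crude_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(doc: str) -> List[str]:
    """Tags removed, lowercased, tokenized, stopwords dropped, words stemmed."""
    tokens = re.findall(r"[a-z]+", strip_tags(doc).lower())
    return [crude_stem(t) for t in tokens if t not in STOPWORDS]


print(preprocess("<p>The players are scoring goals</p>"))  # ['player', 'scor', 'goal']
```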
Indexing by different weighting schemes:
• Boolean weighting
• Word-frequency weighting
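The difference between the two weighting schemes shows up in a single document vector. A short Python sketch, with a hypothetical vocabulary and token list:

```python
from collections import Counter
from typing import List, Sequence


def boolean_weights(tokens: Sequence[str], vocab: Sequence[str]) -> List[int]:
    """1 if the term occurs in the document at all, else 0."""
    present = set(tokens)
    return [1 if term in present else 0 for term in vocab]


def frequency_weights(tokens: Sequence[str], vocab: Sequence[str]) -> List[int]:
    """Raw number of occurrences of each term in the document."""
    counts = Counter(tokens)
    return [counts[term] for term in vocab]


vocab = ["goal", "match", "stock"]    # hypothetical vocabulary
tokens = ["goal", "match", "goal"]    # hypothetical preprocessed document
print(boolean_weights(tokens, vocab))    # [1, 1, 0]
print(frequency_weights(tokens, vocab))  # [2, 1, 0]
```

Boolean weighting discards how often "goal" appears; frequency weighting keeps it, at the cost of letting long documents produce larger values (one reason vectors are later normalized).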
Feature selection: remove non-informative terms from documents to
• improve classification effectiveness
• reduce computational complexity
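The slides do not name a feature selection method. One simple, common choice is document-frequency thresholding, sketched below in Python as an assumption rather than the author's method: drop terms that occur in too few documents (rare, likely noise) or in nearly all of them (stopword-like). The thresholds `min_df` and `max_df_ratio` are illustrative.

```python
from collections import Counter
from typing import List, Sequence


def select_by_document_frequency(docs: Sequence[Sequence[str]],
                                 min_df: int = 2,
                                 max_df_ratio: float = 0.9) -> List[str]:
    """Keep terms found in at least min_df documents and at most max_df_ratio of them."""
    df = Counter(term for doc in docs for term in set(doc))  # documents containing each term
    limit = max_df_ratio * len(docs)
    return sorted(term for term, n in df.items() if min_df <= n <= limit)


docs = [  # hypothetical preprocessed documents
    ["goal", "match", "the"],
    ["goal", "the"],
    ["stock", "the"],
    ["stock", "market", "the"],
]
print(select_by_document_frequency(docs))  # ['goal', 'stock']
```

Here "the" is dropped for appearing in every document, and "match" and "market" for appearing in only one.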
Classification algorithms:
• K-Nearest-Neighbor algorithm (KNN)
• Decision Tree algorithm (DT)
• Naive Bayes algorithm (NB)
• Support Vector Machine (SVM)
Performance of the algorithm:
– Training time
– Testing time
– Classification accuracy
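The three performance measures can be collected by one helper that times training and testing and computes accuracy on held-out data. The Python sketch below is a hypothetical harness: `evaluate` accepts any train/predict pair, and the demo plugs in a trivial majority-class placeholder (not one of the algorithms above) so the example stays self-contained.

```python
import time
from collections import Counter


def evaluate(train_fn, predict_fn, train_set, test_set):
    """Return training time, testing time (seconds), and accuracy on the test set."""
    start = time.perf_counter()
    model = train_fn(train_set)
    training_time = time.perf_counter() - start

    start = time.perf_counter()
    predictions = [predict_fn(model, x) for x, _ in test_set]
    testing_time = time.perf_counter() - start

    correct = sum(p == y for p, (_, y) in zip(predictions, test_set))
    return {"training_time": training_time,
            "testing_time": testing_time,
            "accuracy": correct / len(test_set)}


# Placeholder classifier for the demo: always predicts the most frequent training label.
def train_majority(train_set):
    return Counter(y for _, y in train_set).most_common(1)[0][0]


def predict_majority(model, x):
    return model


train_set = [("goal", "sports"), ("match", "sports"), ("stock", "finance")]
test_set = [("goal match", "sports"), ("stock market", "finance"),
            ("final goal", "sports"), ("derby", "sports")]
result = evaluate(train_majority, predict_majority, train_set, test_set)
print(result["accuracy"])  # 0.75
```

A majority-class baseline like this is also a useful floor: an SVM that cannot beat it is not learning much from the text.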
• Each document is a vector, with one component for each term (= word).
• Vectors are normalized to unit length.
• High-dimensional vector space:
– Terms are axes
– 10,000+ dimensions, or even 100,000+
– Documents are vectors in this space
• Each training document is a point (vector) labeled by its topic (= class).
• Hypothesis: documents of the same class form a contiguous region of space.
• We define surfaces to delineate the classes in this space.
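Normalizing to unit length makes a short document and a long document with the same word proportions point in the same direction, so their similarity no longer depends on length. A minimal Python sketch, assuming dense term-count vectors for readability (real implementations use sparse vectors):

```python
import math
from typing import List, Sequence


def normalize(vector: Sequence[float]) -> List[float]:
    """Scale a document vector to unit (Euclidean) length; leave all-zero vectors as-is."""
    length = math.sqrt(sum(v * v for v in vector))
    return [v / length for v in vector] if length else list(vector)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of the unit-length versions of two document vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


print(normalize([3.0, 4.0]))  # [0.6, 0.8]
```

For example, the count vectors [2, 4, 0] and [1, 2, 0] describe a document and a half-length copy of it; their cosine similarity is 1 (up to floating-point rounding).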
• The set of records available for developing classification methods is divided into two disjoint subsets: a training set and a test set.
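A reproducible way to produce the two disjoint subsets is to shuffle once with a fixed seed and cut the list. The Python helper `split_records` below is hypothetical; the 30% test fraction and the seed are illustrative assumptions, not values from the slides.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_records(records: Sequence[T], test_fraction: float = 0.3,
                  seed: int = 42) -> Tuple[List[T], List[T]]:
    """Shuffle, then cut into two disjoint subsets: (training set, test set)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


train_set, test_set = split_records(range(10))
print(len(train_set), len(test_set))  # 7 3
```

Because every record lands in exactly one slice, no test document is ever seen during training, which keeps the measured accuracy honest.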
Process
SVM model implementation in R
Things to perform:
• Prepare the algorithm to classify the text documents
• Train and test the model
• Measure the performance of the SVM model
Packages to be used in R:
• RTextTools
• e1071 (SVM), rpart
• tm, stringr, plyr
• arules
LITERATURE REVIEW
1. Title: Text Categorization with Support Vector Machines: Learning with Many Relevant Features
   Author: Thorsten Joachims, University of Dortmund, Informatik LS8, Baroper Str. 301, 44221 Dortmund, Germany
   Learnings: the particular properties of learning with text data, and why SVMs are appropriate for it
   Link: PDF
2. Title: Automatic Text Categorization and Its Application to Text Retrieval
   Authors: Wai Lam, Miguel Ruiz, and Padmini Srinivasan
   Publication date: November/December 1999
   Learnings: the application of automatic categorization to text retrieval
   Link: PDF
3. Title: SVM Tutorial
   Author: Alexandre Kowalczyk
   Publication date: 23 November 2015
   Learnings: how to classify text in R
   Link: PDF
Thank You
