A support vector machine (SVM) is a supervised machine learning algorithm that can be used for classification or regression. It works by finding the hyperplane that separates the classes with the largest margin, that is, the largest distance to the nearest data points of either class. It can perform nonlinear classification by using the kernel trick to map data into a higher-dimensional space. SVM is effective for high-dimensional data, uses only a subset of the training points (the support vectors), and works well when there is a clear margin of separation between classes, though it does not directly provide probability estimates. It has applications in text categorization, image classification, and other domains.
1. Sentiment Analysis Using Support Vector Machine
Guide : Prof. S.B.Patil
Presented by : Shital M. Andhale
T120398502
Information Technology Dept, VIIT Pune
2. Contents
• What is sentiment analysis?
• Sentiment Analysis in Twitter or any other Social Media
• Sentiment Analysis Classification
• Sentiment Analysis using machine learning
• Types of Machine Learning
• Support Vector Machine Algorithm
• How does it work?
• Pros and cons of SVM
• Applications
• Conclusion
• References
3. What is Sentiment Analysis?
• Sentiment analysis is the process of determining a user's opinion about a topic or about the text under consideration.
• It is also known as opinion mining.
• In other words, it determines whether a piece of writing is positive, negative or neutral.
4. Sentiment Analysis in Social Media or Twitter
• Microblogging websites are social media sites (such as Twitter and Facebook) on which users make short, frequent posts.
• Twitter is one of the best-known microblogging services. Users can read and post messages that were originally limited to 140 characters (280 since 2017). Twitter messages are called tweets.
• We will use these tweets as raw data, together with a technique that automatically classifies them as positive, negative or neutral. With sentiment analysis, a customer can learn what others think of a product before making a purchase. Sentiment analysis is a type of natural language processing for tracking the mood of the public about a particular product or topic.
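Before a classifier can work with tweets, each tweet has to be turned into a vector of numbers. The slides do not say how features are extracted, so the following is only a minimal sketch, assuming a plain bag-of-words representation. The example tweets and the helper names `tokenize`, `build_vocabulary` and `vectorize` are illustrative and not taken from the presentation.

```python
import re
from collections import Counter

def tokenize(tweet):
    """Lowercase a tweet and split it into simple word tokens."""
    return re.findall(r"[a-z']+", tweet.lower())

def build_vocabulary(tweets):
    """Map every distinct token to a column index, in sorted order."""
    tokens = sorted({tok for tweet in tweets for tok in tokenize(tweet)})
    return {tok: i for i, tok in enumerate(tokens)}

def vectorize(tweet, vocabulary):
    """Return word counts as a fixed-length list; unknown words are ignored."""
    counts = Counter(tokenize(tweet))
    # Dicts keep insertion order, so columns follow the sorted vocabulary.
    return [counts.get(tok, 0) for tok in vocabulary]

# Hypothetical labelled tweets: +1 = positive, -1 = negative.
tweets = ["I love this phone", "I hate this phone", "love love it"]
labels = [1, -1, 1]

vocabulary = build_vocabulary(tweets)
features = [vectorize(t, vocabulary) for t in tweets]
```

Each row of `features` is one point in a space with one dimension per vocabulary word, which is the kind of input a classifier such as SVM expects.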
5. Classification of Sentiment Analysis
• Sentiment analysis
  • Machine learning approach
    • Supervised learning
      • Linear classifiers
        • Support Vector Machine
        • Neural network
      • Decision tree
      • Rule-based classifiers
      • Probabilistic classifiers
    • Unsupervised learning
6. SA using machine learning Approach
• Machine learning is a type of artificial intelligence (AI) that provides computers
with the ability to learn without being explicitly programmed.
• In short: a machine that teaches itself.
7. Types of machine learning
• Supervised Learning
Inferring a function from labelled training data. A supervised learning
algorithm analyses the training data (a list of inputs and their correct outputs) and
produces an appropriate function, which can be used for mapping new examples.
• Unsupervised Learning
Inferring a function to describe hidden structure from unlabelled data. No labels
are given to the learning algorithm, leaving it on its own to find structure in its
input.
8. Support Vector Machine Algorithm
What is a Support Vector Machine?
• SVM is a non-probabilistic binary linear classifier. It separates the classes linearly with as large a margin as possible. Combined with a kernel, SVM becomes one of the most powerful classifiers, capable of handling even infinite-dimensional feature vectors.
• A “Support Vector Machine” (SVM) is a supervised machine learning algorithm that can be used for both classification and regression problems. However, it is mostly used for classification.
• In this algorithm, we plot each data item as a point in n-dimensional space (where n is the number of features), with the value of each feature being the value of a particular coordinate. Then we perform classification by finding the hyper-plane that best separates the two classes.
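The idea of points in n-dimensional space split by a hyper-plane can be sketched in a few lines. A hyper-plane is the set of points x with w·x + b = 0, and the side a point falls on decides its class. The weights `w`, the offset `b` and the sample points below are made-up values for illustration only.

```python
def decision_value(w, b, x):
    """Signed score w.x + b; it is zero exactly on the hyper-plane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    """Return +1 for one side of the hyper-plane and -1 for the other."""
    return 1 if decision_value(w, b, x) >= 0 else -1

# Hypothetical hyper-plane x1 + x2 - 5 = 0 in a 2-feature space.
w, b = [1.0, 1.0], -5.0
star = [4.0, 3.0]    # 4 + 3 - 5 = 2, positive side
circle = [1.0, 2.0]  # 1 + 2 - 5 = -2, negative side
```

Training an SVM amounts to choosing `w` and `b`; the scenarios that follow describe what makes one choice better than another.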
10. Identify the right hyper-plane (Scenario-1):
• Here, we have three hyper-planes (A, B and C). Now, identify the right hyper-plane to classify the stars and circles.
• Remember this rule of thumb for identifying the right hyper-plane: “Select the hyper-plane which segregates the two classes better”. In this scenario, hyper-plane “B” does this job best.
11. Identify the right hyper-plane (Scenario-2):
Here, we have three hyper-planes (A, B and C), and all of them segregate the classes well. How can we identify the right one?
Maximizing the distance between the hyper-plane and the nearest data point (of either class) helps us decide. This distance is called the margin.
In this scenario, the margin for hyper-plane C is larger than for both A and B. Hence, we choose C as the right hyper-plane.
Another compelling reason for selecting the hyper-plane with the larger margin is robustness: if we select a hyper-plane with a small margin, there is a high chance of misclassification.
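The margin comparison can be made concrete with the point-to-hyper-plane distance |w·x + b| / ||w||. The sketch below uses made-up points and three hypothetical vertical hyper-planes named A, B and C to mirror the scenario; the actual positions in the slide figure are not known.

```python
import math

def distance_to_hyperplane(w, b, x):
    """Perpendicular distance |w.x + b| / ||w|| from point x to the hyper-plane."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return abs(score) / math.sqrt(sum(wi * wi for wi in w))

def margin(w, b, points):
    """Distance from the hyper-plane to the nearest point of either class."""
    return min(distance_to_hyperplane(w, b, x) for x in points)

# Hypothetical data: two stars on the right, two circles on the left.
points = [[4.0, 1.0], [5.0, 3.0], [1.0, 1.0], [0.0, 2.0]]

# Hypothetical candidates: the vertical lines x1 = 1.5, x1 = 3.5 and x1 = 2.5.
candidates = {
    "A": ([1.0, 0.0], -1.5),
    "B": ([1.0, 0.0], -3.5),
    "C": ([1.0, 0.0], -2.5),
}

margins = {name: margin(w, b, points) for name, (w, b) in candidates.items()}
best = max(margins, key=margins.get)
```

All three candidates separate the two groups, but C sits midway between them and therefore has the largest margin.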
12. Identify the right hyper-plane (Scenario-3)
• SVM selects the hyper-plane that classifies the classes accurately before it maximizes the margin. Here, hyper-plane B has a classification error, while A classifies all points correctly.
• Therefore, the right hyper-plane is A, even if B appears to have the larger margin.
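One way to express "accuracy before margin" in a single number is the signed margin, the smallest value of y·(w·x + b) / ||w|| over all points with labels y = ±1. It becomes negative as soon as one point lies on the wrong side, so a hyper-plane with an error can never beat one without. The data and the two hyper-planes below are hypothetical stand-ins for A and B.

```python
import math

def signed_margin(w, b, points, labels):
    """Smallest y * (w.x + b) / ||w||; negative if any point is misclassified."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )

# Hypothetical data: stars (+1) and circles (-1); one star sits near the circles.
points = [[5.0, 0.0], [6.0, 1.0], [3.0, 2.0], [1.0, 0.0], [2.0, 1.0]]
labels = [1, 1, 1, -1, -1]

hyperplane_a = ([1.0, 0.0], -2.5)  # x1 = 2.5: small margin, no errors
hyperplane_b = ([1.0, 0.0], -3.5)  # x1 = 3.5: far from most points, one star on the wrong side

margin_a = signed_margin(*hyperplane_a, points, labels)
margin_b = signed_margin(*hyperplane_b, points, labels)
```

Here `margin_a` is positive and `margin_b` is negative, so maximizing the signed margin picks A.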
13. Can we classify two classes (Scenario-4)?
Here, the two classes cannot be segregated with a straight line, because one star lies in the territory of the other (circle) class as an outlier.
14. Can we classify two classes (Scenario-4)
As already mentioned, the one star at the other end is an outlier for the star class. SVM can ignore outliers and find the hyper-plane that has the maximum margin. Hence, we can say that SVM is robust to outliers.
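In the standard formulation, this tolerance for outliers comes from the soft margin: each point may violate the margin at a cost, and a parameter C weighs those violations against the width of the margin. The slides do not name this mechanism, so the sketch below is an assumption about what "ignoring outliers" refers to. It evaluates the usual objective 0.5·||w||² + C·Σ max(0, 1 − y·(w·x + b)) for two hypothetical hyper-planes on made-up one-dimensional data.

```python
def soft_margin_objective(w, b, points, labels, C):
    """0.5 * ||w||^2 + C * sum of hinge losses max(0, 1 - y * (w.x + b))."""
    regulariser = 0.5 * sum(wi * wi for wi in w)
    hinge = sum(
        max(0.0, 1.0 - y * (sum(wi * xi for wi, xi in zip(w, x)) + b))
        for x, y in zip(points, labels)
    )
    return regulariser + C * hinge

# Hypothetical 1-D data: circles (-1) at 0 and 1, stars (+1) at 5 and 6,
# plus one outlying star at 1.5.
points = [[0.0], [1.0], [5.0], [6.0], [1.5]]
labels = [-1, -1, 1, 1, 1]

wide = ([0.5], -1.5)    # boundary at x = 3, margin edges at x = 1 and x = 5; ignores the outlier
narrow = ([4.0], -5.0)  # boundary at x = 1.25, squeezed between the circle at 1 and the outlier

cost_wide = soft_margin_objective(*wide, points, labels, C=1.0)
cost_narrow = soft_margin_objective(*narrow, points, labels, C=1.0)
```

With C = 1 the wide-margin hyper-plane has the lower cost even though it misclassifies the outlier. With a much larger C (for example 100) the ranking flips, which is how C controls how strongly outliers influence the result.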
15. Find the hyper-plane to segregate two classes (Scenario-5):
• In this scenario, no linear hyper-plane can separate the two classes, so how does SVM classify them? Until now, we have only looked at linear hyper-planes.
• SVM can solve this problem easily, by introducing an additional feature. Here, we add a new feature z = x^2 + y^2 and plot the data points on the x and z axes.
16. Find the hyper-plane to segregate two classes (Scenario-5):
In the plot of x against z, the points to consider are:
• All values of z are non-negative, because z is the sum of the squares of x and y.
• In the original plot, the red circles appear close to the origin of the x and y axes, leading to lower values of z, while the stars lie relatively far from the origin, resulting in higher values of z.
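The effect of the new feature can be checked numerically. The following sketch uses made-up circle and star coordinates (the values in the slide figure are not available) and the hypothetical helper `add_radial_feature`.

```python
def add_radial_feature(point):
    """Map (x, y) to (x, z) with z = x^2 + y^2, the squared distance from the origin."""
    x, y = point
    return (x, x * x + y * y)

# Hypothetical data: circles near the origin, stars further out in every direction.
circles = [(0.5, 0.0), (0.0, -1.0), (-0.5, 0.5)]
stars = [(3.0, 0.0), (0.0, -3.0), (-2.0, 2.0)]

circle_z = [add_radial_feature(p)[1] for p in circles]
star_z = [add_radial_feature(p)[1] for p in stars]

# In the (x, z) plane, a horizontal line z = threshold now separates the classes.
threshold = (max(circle_z) + min(star_z)) / 2
```

Every circle ends up below the threshold and every star above it, so a linear hyper-plane in the (x, z) plane does the job. In the original (x, y) plane the same boundary is the circle x^2 + y^2 = 4.5.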
17. Find the hyper-plane to segregate two classes (Scenario-5):
When we look at the hyper-plane in the original input space, it looks like a circle.
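Adding z by hand is an explicit feature map. The kernel trick mentioned in the overview reaches the same kind of higher-dimensional space without building the features: dot products are replaced by a kernel function. The sketch below uses the degree-2 polynomial kernel k(a, b) = (a·b)², chosen here purely for illustration since the slides do not name a kernel, and checks that it equals a dot product in an explicit 3-D space whose coordinates include x² and y².

```python
import math

def poly2_kernel(a, b):
    """Degree-2 polynomial kernel: the squared dot product of two 2-D points."""
    return (a[0] * b[0] + a[1] * b[1]) ** 2

def explicit_map(p):
    """Explicit 3-D feature map whose dot product equals poly2_kernel."""
    x, y = p
    return (x * x, math.sqrt(2) * x * y, y * y)

def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))

a, b = (1.0, 2.0), (3.0, -1.0)
implicit = poly2_kernel(a, b)                     # computed in the original 2-D space
explicit = dot(explicit_map(a), explicit_map(b))  # computed in the mapped 3-D space
```

Both values agree up to floating-point rounding, which is why a kernel SVM never has to construct the higher-dimensional vectors.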
18. Pros and cons of SVM
• Pros:
  • It works really well when there is a clear margin of separation.
  • It is effective in high-dimensional spaces.
  • It is effective in cases where the number of dimensions is greater than the number of samples.
  • It uses a subset of the training points (called support vectors) in the decision function, so it is also memory efficient.
• Cons:
  • It does not perform well on large data sets, because the required training time is high.
  • It also does not perform very well when the data set is noisy, i.e. when the target classes overlap.
  • SVM does not directly provide probability estimates; these are calculated using an expensive five-fold cross-validation.
19. Applications of SVM
• SVMs can be used to solve various real-world problems:
  • SVMs are helpful in text and hypertext categorization, as their application can significantly reduce the need for labeled training instances in both the standard inductive and transductive settings.
  • Classification of images can also be performed using SVMs. Experimental results show that SVMs achieve significantly higher search accuracy than traditional query refinement schemes after just three to four rounds of relevance feedback. This is also true of image segmentation systems, including those using a modified version of SVM that uses the privileged approach suggested by Vapnik.
  • Hand-written characters can be recognized using SVM.
20. Comparison of ML algorithms
Naïve Bayes | SVM | Maximum Entropy
Easy to implement | Harder to implement | Harder to implement
Less efficient; efficient due to working with large sets of words | Efficiency is maximum | Efficiency is moderate
Limited use | Versatile; used in computer vision, text categorization, IP | Hardly used
21. Conclusion
Machine learning can prove more efficient than traditional techniques for sentiment analysis.
The Support Vector Machine algorithm can be useful for sentiment analysis and text categorization.
22. References
• Mining Social Media Data for Understanding Students’ Learning Experiences, Xin Chen, Student Member, IEEE, Mihaela Vorvoreanu, and Krishna Madhavan.
• Machine Learning Algorithms for Opinion Mining and Sentiment Classification, Jayashri Khairnar, Mayura Kinikar [IJSRP].
• Managing Data in SVM Supervised Algorithm for Data Mining Technology, Sachin Bhaskar, Vijay Bahadur Singh, A. K. Nayak.
• Wikipedia and Internet