2014
Lexicon-Based Sentiment Analysis Using the Most-Mentioned Word Tree
Bo-Hyun Kim, Sr. Software Engineer
HP Big Data Business Unit
Oct 10th, 2014
#GHC14
What to Expect
▪ Sentiment Analysis
− What is it?
− Why is it interesting?
− How HP Vertica Pulse works
− Achieving greater accuracy
− Different point of view using the most-mentioned word tree
What I Expect
▪ A 5-star rating on the GHC app
▪ I just expect you to enjoy and learn!
Sentiment Analysis
▪ In plain English
− Sentiment analysis, the process of automatically detecting if a text segment contains emotional or opinionated content and determining its polarity (e.g., “thumbs up” or “thumbs down”), is a field of research that has received significant attention in recent years, both in academia and in industry. [Wright, 2009]
Gimme Examples!
▪ Also known as:
− Opinion Mining
− Text Mining
▪ Determine people’s general opinion
− “I just got a new car, and I’m loving it”
− “My new car isn’t as fast as I thought.”
Why are we interested?
▪ Web usage is increasing (every minute!)
− Articles
− Blogs
− Comments
▪ Power of Social Media
− Online Shopping
− Customer Reviews
− Recommended products on Amazon
− How other people feel about the product
Product Review
Data… Data… Data…
HP Vertica Pulse
How to Analyze?
▪ Lexicon-based approach – HP Labs [Zhang et al., 2011]
▪ Choose a product, person, event, organization, or topic [Hu and Liu, 2004] whose opinion you want to analyze
▪ Determine the Semantic Orientation score of opinion lexicons

Word       Semantic Orientation Value
Fabulous   +3
Good       +1
Bad        -1
Nasty      -3
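
The lexicon lookup above can be sketched in a few lines of Python. This is a minimal illustration, not HP Vertica Pulse code: the four Semantic Orientation values come from the table, while the helper names `semantic_orientation` and `raw_score` and the whitespace tokenizer are assumptions made for the example.

```python
# Semantic orientation (SO) values copied from the slide's table.
SO_LEXICON = {"fabulous": 3, "good": 1, "bad": -1, "nasty": -3}


def semantic_orientation(word: str) -> int:
    """Return the SO value of a word, or 0 if it is not an opinion word."""
    return SO_LEXICON.get(word.lower(), 0)


def raw_score(text: str) -> int:
    """Sum the SO values of all tokens in the text (hypothetical helper)."""
    tokens = (t.strip(".,!?") for t in text.split())
    return sum(semantic_orientation(t) for t in tokens)


# "fabulous" contributes +3 and "bad" contributes -1.
print(raw_score("The screen is fabulous but the battery is bad."))
```

Words that are not in the lexicon contribute nothing, which is why the quality of the lexicon drives the quality of the score.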
Sentiment Scoring
▪ Input: text or sentence
▪ Output: for each attribute or entity, a sentiment score ranging from -1 to 1
− -1: Negative sentiment
− 0: Neutral sentiment
− 1: Positive sentiment
▪ Entity-level lexicon-based sentiment scoring
Limitation
▪ Semantic Orientation value(‘missed’) = -1
▪ Gives more weight to opinion words located closer to the entity
▪ Accuracy can suffer
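
A minimal sketch of entity-level scoring, under assumptions that are not stated on the slides: each opinion word is weighted by the inverse of its token distance to the entity, and the weighted sum is normalized so the result stays within [-1, 1]. The function name `entity_score`, the weighting formula, and the normalization are illustrative only and are not the actual HP Vertica Pulse algorithm.

```python
# SO values from the slides; "missed" = -1 is the value quoted under "Limitation".
SO_LEXICON = {"fabulous": 3, "good": 1, "bad": -1, "nasty": -3, "missed": -1}


def entity_score(text: str, entity: str) -> float:
    """Distance-weighted sentiment score for one entity, in [-1, 1]."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    if entity.lower() not in tokens:
        return 0.0
    pos = tokens.index(entity.lower())
    weighted = 0.0      # signed, distance-weighted sum of SO values
    total_weight = 0.0  # same sum with absolute SO values, used to normalize
    for i, tok in enumerate(tokens):
        so = SO_LEXICON.get(tok, 0)
        if so == 0 or i == pos:
            continue
        weight = 1.0 / abs(i - pos)  # assumption: closer words count more
        weighted += so * weight
        total_weight += abs(so) * weight
    return weighted / total_weight if total_weight else 0.0


print(entity_score("I missed my good friend", "friend"))
print(entity_score("I missed you", "you"))
```

In the first sentence "good" sits next to the entity and outweighs the more distant "missed", giving a score of about 0.5. The second sentence scores -1.0 even though it is arguably affectionate, which is one way a fixed value for "missed" can hurt accuracy.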
Improve accuracy
▪ Accuracy is what we strive for!
▪ More robust pre-processing
− Prune data to fit different types of user opinion (e.g., Twitter vs. YouTube comments)
▪ Naïve Bayes Classifier Training
▪ Tune accordingly
Data Set
▪ Test dataset
− Collected by Stanford students
− In 2009
− Over 3 million tweets with tested scores
− 3,500 tweets analyzed
▪ Collected dataset
− HP Vertica Pulse Twitter Connector
− In 2014
− Total of 1.2 million tweets over 30 days
Data Pruning
▪ Remove
− Job postings
• #job, #jobs, #tweetmyjob
− Links
• http://this.is/nogood
− Duplicates
− Twitter-specific characters
• RT, @, #
− Emoticons
• “I hate my life :-)”: sarcasm is a widespread disease
▪ After pruning
− ~287,000 tweets, 24% of the 1.2 million tweets
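
The pruning rules above can be sketched with a few regular expressions. This is a rough illustration under assumptions: job postings are dropped as whole tweets, while links, emoticons, and Twitter markers are stripped from the text. The pattern names, the small emoticon pattern, and the `prune` function are hypothetical and are not the HP Vertica Pulse Twitter Connector's logic.

```python
import re

JOB_TAGS = re.compile(r"#(job|jobs|tweetmyjob)\b", re.IGNORECASE)
URL = re.compile(r"https?://\S+")
RETWEET = re.compile(r"\bRT\b")
# Crude pattern for a few ASCII emoticons such as :-) ;) =D (assumption).
EMOTICON = re.compile(r"[:;=]-?[)(DPp]")


def prune(tweets):
    """Drop job postings and duplicates; strip links, emoticons, RT, @ and #."""
    seen = set()
    kept = []
    for tweet in tweets:
        if JOB_TAGS.search(tweet):
            continue  # job postings are removed entirely
        text = URL.sub("", tweet)
        text = EMOTICON.sub("", text)
        text = RETWEET.sub("", text)
        text = re.sub(r"[@#]", "", text)
        text = " ".join(text.split())  # collapse leftover whitespace
        key = text.lower()
        if text and key not in seen:  # case-insensitive duplicate check
            seen.add(key)
            kept.append(text)
    return kept


sample = [
    "RT @bob: Loving the new #healthcare plan http://this.is/nogood",
    "Hiring now! #jobs #tweetmyjob",
    "I hate my life :-)",
    "i hate my life",
]
print(prune(sample))
```

Links are removed before emoticons so that the colon in `http://` is never examined by the emoticon pattern. The pattern is deliberately small and can still misfire on text such as `note:Please`, so a production filter would need a more careful rule.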
Naïve Bayes Classifier
▪ Supervised learning
− Probabilistic classifier based on Bayes’ theorem
− Requires only a small amount of training data
− Assumes the presence/absence of a particular feature of a class is unrelated to the presence/absence of any other feature
− Classifies an object based on the features it contains
\[ P(C_j \mid D) = \frac{P(D \mid C_j)\, P(C_j)}{P(D)} \]
− Open-source implementation available at nltk.org
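
To make the formula concrete, here is a small from-scratch sketch in pure Python. It is a stand-in, not the NLTK implementation mentioned above and not the classifier used in the talk: it uses word counts (a multinomial variant) with add-one smoothing, and the class name, training sentences, and labels are invented for illustration. The denominator P(D) is dropped because it is the same for every class.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Tiny multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_counts = Counter()            # documents seen per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        self.vocab = set()

    def train(self, labeled_docs):
        for words, label in labeled_docs:
            self.class_counts[label] += 1
            for w in words:
                self.word_counts[label][w] += 1
                self.vocab.add(w)

    def classify(self, words):
        total_docs = sum(self.class_counts.values())
        best_label, best_logp = None, -math.inf
        for label, n_docs in self.class_counts.items():
            logp = math.log(n_docs / total_docs)  # log P(C_j)
            n_words = sum(self.word_counts[label].values())
            for w in words:
                # Smoothed log P(w | C_j); features are treated as independent.
                count = self.word_counts[label][w]
                logp += math.log((count + 1) / (n_words + len(self.vocab)))
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


# Invented toy training data (assumption, not the Stanford dataset).
clf = NaiveBayes()
clf.train([
    ("i love this car".split(), "pos"),
    ("what a fabulous day".split(), "pos"),
    ("i hate this traffic".split(), "neg"),
    ("nasty weather and bad service".split(), "neg"),
])
print(clf.classify("i love this fabulous car".split()))
print(clf.classify("bad nasty traffic".split()))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied together.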
Naïve Bayes Classifier
▪ Results:
− Final accuracy: 0.788
Tuning Pulse
▪ Positive words
▪ Negative words
▪ Neutral words
▪ White lists
▪ Stop words
▪ Synonym mappings
Accuracy Comparison
▪ Sentiment scores generated for each phase

Keyword      Ideal     Original   Pruning   Training   Tuning
Healthcare   -0.1515   -0.0333    -0.0833   -0.1       -0.125
Obama         0.308     0.0944     0.1535    0.1535     0.1842
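
One way to read the table is to measure how far each phase is from the "Ideal" column. The sketch below assumes that "Ideal" is the reference score and uses the absolute difference as the error; the slides do not say which metric was used, so both the metric and the helper name `abs_errors` are assumptions.

```python
PHASES = ["Original", "Pruning", "Training", "Tuning"]
IDEAL = {"Healthcare": -0.1515, "Obama": 0.308}
# Scores copied from the table, in the same order as PHASES.
SCORES = {
    "Healthcare": [-0.0333, -0.0833, -0.1, -0.125],
    "Obama": [0.0944, 0.1535, 0.1535, 0.1842],
}


def abs_errors(keyword):
    """Absolute distance from the ideal score for each phase."""
    return [round(abs(s - IDEAL[keyword]), 4) for s in SCORES[keyword]]


for keyword in SCORES:
    for phase, err in zip(PHASES, abs_errors(keyword)):
        print(f"{keyword:<11}{phase:<9}{err:.4f}")
```

For Healthcare the error shrinks from 0.1182 (Original) to 0.0265 (Tuning), and for Obama from 0.2136 to 0.1238. The Training phase leaves the Obama score unchanged from Pruning.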
Trend/Targeted Analysis
▪ Targeted dataset analysis can help improve accuracy
▪ Identify the most-mentioned words
− Use the most-recurrent words to narrow the scope of analysis
▪ Find new trends
− Government healthcare (2009) vs. Obamacare (2014)
▪ Are we looking at the targeted data?
− “Solve healthcare challenges with technology!”
− “Healthcare After ObamaCare”
− “Get affordable healthcare at HealthCare.gov”
Generating Tree
▪ Increase the relevance of the sentiment score by running the sentiment analysis on the entity, as well as on the most-recurrent words, to identify:
− Homonyms that machines do not understand
− More accurate scores based on user interest
▪ Generate the tree using text search
− Merge words that share a stem (e.g., query, queries, querying…)
− Lucene (Apache open source)
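
A rough sketch of the first step, finding the most-mentioned words around an entity while merging stem variants. The talk names Lucene for text search; this Python version is only a stand-in, and the suffix stripper `crude_stem`, the stop-word set, the function `most_mentioned`, and the sample tweets are all assumptions made for illustration.

```python
import re
from collections import Counter

# Small illustrative stop-word set (assumption).
STOP_WORDS = {"the", "a", "at", "with", "after", "under", "and", "to", "of"}


def crude_stem(word):
    """Very rough suffix stripper; a stand-in for a real stemmer."""
    for suffix, replacement in (("ies", "y"), ("ing", ""), ("s", "")):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: len(word) - len(suffix)] + replacement
    return word


def most_mentioned(tweets, entity, top_n=3):
    """Count stemmed words that co-occur with the entity, most frequent first."""
    entity = crude_stem(entity.lower())
    counts = Counter()
    for tweet in tweets:
        stems = [crude_stem(t) for t in re.findall(r"[a-z]+", tweet.lower())]
        if entity not in stems:
            continue  # only tweets that mention the entity are counted
        counts.update(s for s in stems if s != entity and s not in STOP_WORDS)
    return counts.most_common(top_n)


tweets = [
    "Healthcare after Obamacare",
    "Obamacare healthcare queries",
    "Querying healthcare plans under Obamacare",
    "New phone today",
]
print(most_mentioned(tweets, "healthcare", top_n=2))
```

Here "queries" and "querying" are merged into the single stem "query", so "obamacare" and "query" become the top child candidates under "healthcare". Each of those words could then be scored as its own entity to build the next level of the tree.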
Tree View
[Figure: most-mentioned word tree for “healthcare”; recoverable node labels: obamacare, !(Obamacare), obama, !(Obama), !(health), health]
Thank you 
Questions?
bohyun@hp.com
bohyun.j.kim@gmail.com
Many thanks to*:
Tim Donar, Solution Engineer
Beth Favini, Tech Pubs Sr. Manager
Judith Plummer, Tech Pubs Editor in Chief
* In alphabetical order
Got Feedback?
Rate and review the session using the GHC Mobile App.
To download, visit www.gracehopper.org


Editor's notes

  1. Specifically clarify NLP -> part of it.
  2. Mention Twitter’s limitation → spend less time and effort to understand the specific dataset. Twitter – full outer join
  3. This is the last slide and must be included in the slide deck