The JDPA Sentiment Corpus
for the Automotive Domain
Miriam Eckert, Lyndsie Clark,
Nicolas Nicolov
J.D. Power and Associates
Jason S. Kessler
Indiana University
Overview
• 335 blog posts containing opinions about cars
– 223K tokens of blog data
• Goal of annotation project:
– Examples of how words interact to evaluate entities
– Annotations encode these interactions
• Entities are invoked physical objects and their
properties
– Not just cars, car parts
– People, locations, organizations, times
Excerpt from the corpus
“last night was nice. sean bought me caribou
and we went to my house to watch the baseball
game …
“… yesturday i helped me mom with brians
house and then we went and looked at a kia
spectra. it looked nice, but when we got up to it,
i wasn't impressed ...”
Outline
• Motivating example
• Overview of annotation types
– Some statistics
• Potential uses of corpus
• Comparison to other resources
[Figure: motivating example, shown in five builds that each add an annotation layer to the same passage]
“John recently purchased a Honda Civic. It had a great engine, a mildly disappointing stereo, and was very grippy. He also considered a BMW which, while highly priced, had a better stereo.”
• Mentions and semantic types: John, He (PERSON); Honda Civic, It, BMW (CAR); engine, stereo, stereo (CAR-PART); priced (CAR-FEATURE)
• REFERS-TO links between coreferent mentions
• TARGET links from sentiment expressions to the mentions they evaluate
• PART-OF and FEATURE-OF relations between entities
• Comparison annotation: DIMENSION, MORE, LESS
• Entity-level sentiment: positive for one car, mixed for the other
Outline
• Motivating example
• Overview of annotation types
– Some statistics
• Potential uses of corpus
• Comparison to other resources
Entity annotations
“John recently purchased a Civic. It had a great engine and was priced well.”
[Figure: John (PERSON); Civic and It (both REFERS-TO the same CAR entity); engine (CAR-PART); priced (CAR-FEATURE)]
• >20 semantic types from
– ACE Entity Mention Detection Task
– Generic automotive types
Entity-relation annotations
• Relations between entities
• Entity-level sentiment annotations
• Sentiment flow between entities through relations
– My car has a great engine.
– Honda, known for its high standards, made my car.
[Figure: engine (CAR-PART) PART-OF Civic (CAR); priced (CAR-FEATURE) FEATURE-OF Civic (CAR); entity-level sentiment of Civic: positive]
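One way to picture sentiment flowing through relations is the minimal sketch below. It is an illustration only: the class names, the +1/-1 polarity encoding, and the one-hop aggregation rule are assumptions made for this example, not the corpus format or the annotation guidelines.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    semantic_type: str  # e.g. "CAR", "CAR-PART", "CAR-FEATURE"
    # Hypothetical encoding: +1 / -1 for each sentiment expression targeting this entity.
    polarities: list = field(default_factory=list)


@dataclass
class Relation:
    kind: str       # "PART-OF" or "FEATURE-OF"
    child: Entity   # the part or feature
    parent: Entity  # the entity it belongs to


def entity_level_sentiment(entity, relations):
    """Summarize polarities on the entity and on its direct parts/features.

    Assumption: sentiment flows one hop, from child to parent.
    """
    collected = list(entity.polarities)
    for rel in relations:
        if rel.parent is entity:
            collected.extend(rel.child.polarities)
    has_pos = any(p > 0 for p in collected)
    has_neg = any(p < 0 for p in collected)
    if has_pos and has_neg:
        return "mixed"
    if has_pos:
        return "positive"
    if has_neg:
        return "negative"
    return "neutral"


civic = Entity("Civic", "CAR")
engine = Entity("engine", "CAR-PART", [+1])     # "great engine"
price = Entity("priced", "CAR-FEATURE", [+1])   # "priced well"
relations = [
    Relation("PART-OF", engine, civic),
    Relation("FEATURE-OF", price, civic),
]
print(entity_level_sentiment(civic, relations))
```

Adding a negative polarity to any part (for example a disappointing stereo) would turn the summary for the car into "mixed" under this rule.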
Entity annotation type: statistics
• Inter-annotator agreement
– Among mentions: 83%
– Refers-to: 68%
• 61K mentions in corpus and 43K entities
• 103 documents annotated by around 3 annotators
[Figure: two span comparisons between annotators A1 and A2 over “…Kia Rio…”; one pair is labeled MATCH, the other NOT A MATCH]
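A rough sketch of how pairwise mention agreement could be computed follows. The slides do not state the matching criterion or the agreement formula, so both are assumptions here: spans count as a match only when their character offsets are identical, and agreement is the Dice-style ratio 2·|A∩B| / (|A|+|B|). The function name and the sample offsets are hypothetical.

```python
def span_agreement(spans_a, spans_b):
    """Pairwise agreement between two annotators' mention spans.

    Assumption: exact (start, end) match; the corpus may use a different criterion.
    """
    a, b = set(spans_a), set(spans_b)
    if not a and not b:
        return 1.0  # nothing annotated by either annotator counts as full agreement
    return 2 * len(a & b) / (len(a) + len(b))


# Made-up character offsets for two annotators on one document.
a1 = [(0, 7), (12, 18), (30, 34)]
a2 = [(0, 7), (12, 18), (40, 45)]
print(round(span_agreement(a1, a2), 3))
```

Swapping the set intersection for an overlap test would give a more lenient criterion, which is the kind of choice the MATCH / NOT A MATCH figure illustrates.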
Sentiment expressions
• Evaluations
• Target mentions
• Prior polarity:
– Semantic orientation given target
– positive, negative, neutral, mixed
• Examples:
– great engine (prior polarity: positive)
– highly priced (prior polarity: negative)
– … a highly spec’ed (prior polarity: positive)
Sentiment expressions
• Occurrences in corpus: 10K
• 13% are multi-word
– like no other, get up and go
• 49% are headed by adjectives
• 22% nouns (damage, good amount)
• 20% verbs (likes, upset)
• 5% adverbs (highly)
Sentiment expressions
• 75% of sentiment expression occurrences
have non-evaluative uses in corpus
• “light”
– …the car seemed too light to be safe…
– …vehicles in the light truck category…
• 77% of sentiment expression occurrences are
positive
• Inter-annotator agreement:
– 75% spans, 66% targets, 95% prior polarity
Modifiers -> contextual polarity
NEGATORS
– not a good car
– not a very good car
INTENSIFIERS
– a very good car (UPWARD)
– a kind of good car (DOWNWARD)
NEUTRALIZERS
– if the car is good
– I hope the car is good
COMMITTERS
– I am sure the car is good (UPWARD)
– I suspect the car is good (DOWNWARD)
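The modifier categories above can be read as operations on a prior polarity. The sketch below is one hedged way to model that; the numeric scores, the 1.5 and 0.5 scaling factors, and the function name are assumptions for illustration, since the corpus annotates modifier categories rather than numbers.

```python
def contextual_polarity(prior, modifiers):
    """Apply modifiers, innermost first, to a prior polarity score.

    prior: +1.0 (positive) or -1.0 (negative).
    modifiers: list of (kind, direction) pairs; direction is "upward",
    "downward", or None. Scaling factors are illustrative assumptions.
    """
    score = float(prior)
    for kind, direction in modifiers:
        if kind == "negator":
            score = -score                      # "not a good car"
        elif kind == "neutralizer":
            score = 0.0                         # "if the car is good"
        elif kind in ("intensifier", "committer"):
            score *= 1.5 if direction == "upward" else 0.5
        else:
            raise ValueError(f"unknown modifier kind: {kind}")
    return score


print(contextual_polarity(+1, [("negator", None)]))            # not a good car
print(contextual_polarity(+1, [("intensifier", "upward")]))    # a very good car
print(contextual_polarity(+1, [("intensifier", "downward")]))  # a kind of good car
print(contextual_polarity(+1, [("neutralizer", None)]))        # I hope the car is good
print(contextual_polarity(+1, [("committer", "downward")]))    # I suspect the car is good
```

Treating committers like intensifiers is a simplification; a fuller model would likely track the author's degree of commitment separately from the strength of the evaluation.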
Other annotations
• Speech events (not sourced from author)
– John thinks the car is good.
• Comparisons:
– Car X has a better engine than car Y.
– Handles a variety of cases
Outline
• Motivating example
• Overview of annotation types
– Some statistics
• Potential uses of corpus
• Comparison to other resources
Possible tasks
• Detecting mentions, sentiment expressions,
and modifiers
• Identifying targets of sentiment expressions,
modifiers
• Coreference resolution
• Finding part-of, feature-of, etc. relations
• Identifying errors/inconsistencies in data
Possible tasks
• Exploring how elements interact:
– Some idiot thinks this is a good car.
• Evaluating unsupervised sentiment systems or
those trained on other domains
• How do relations between entities transfer
sentiment?
– The car’s paint job is flawless but the safety record
is poor.
• Solution to one task may be useful in solving
another.
But wait, there’s more!
• 180 digital camera blog posts were annotated
• Total of 223,001 + 108,593 = 331,594 tokens
Outline
• Motivating example
– Elements combine to render entity-level
sentiment
• Overview of annotation types
– Some statistics
• Potential uses of corpus
• Comparison to other resources
Other resources
• MPQA Version 2.0
– Wiebe, Wilson and Cardie (2005)
– Largely professionally written news articles
– Subjective expression
• “beliefs, emotions, sentiments, speculations, etc.”
– Attitude, contextual sentiment on subjective
expressions
– Target, source annotations
– 226K tokens (JDPA: 332K)
Other resources
• Data sets provided by Bing Liu (2004, 2008)
– Customer-written consumer electronics product
reviews
– Contextual sentiment toward mention of product
– Comparison annotations
– 130K tokens (JDPA: 332K)
Thank you!
• Obtaining the corpus:
– Research and educational purposes
– ICWSM.JDPA.corpus@gmail.com
– June 2010
– Annotation guidelines:
http://www.cs.indiana.edu/~jaskessl
• Thanks to: Prof. Michael Gasser, Prof. James
Martin, Prof. Martha Palmer, Prof. Michael
Mozer, William Headden
Top 20 annotations by type
Inter-annotator agreement
