Opinion Mining for Social Media and News Items in Romanian
Authors: Claudia Cârdei, Filip Manișor, Traian Rebedea (traian.rebedea@cs.pub.ro)
University Politehnica of Bucharest
Overview
• Introduction
• Previous Work
– English
– Romanian
• Proposed Solutions
• Opinionated Corpus
• Results and Comparisons
• Conclusions
Introduction
• Sentiment analysis and opinion mining research
has mainly concentrated on English and other
important languages (Spanish, Chinese, etc.)
– Various commercial and open-source solutions exist
mainly for English
– Corpora of opinionated texts and databases of
affective words (general or domain specific) also exist
for these languages
• Objective: develop an opinion mining solution for
Romanian texts gathered from a wide range of
online sources (mostly social media and news
items)
22.09.13
ICSCS 2013 . K-TEAMS 2013 Workshop
Introduction
• Popular research domain in recent years
• Sentiment, subjectivity, opinion, publicity
– Related, but somewhat different
• Sentiment or subjectivity in a text:
– Positive, negative or neutral
– Subjective or objective
• Opinionated text
– Opinion author
– Opinion target (subject)
– Opinion (affective) words
– Opinion polarity
E.g. President Obama declared that the US immigration system is broken.
Previous Work - English
• Lots of studies and corpora in different domains
• The movie reviews dataset – very popular
• Initial results using BoW, punctuation, etc.
– Accuracy ≈ 80%
• Improvement: finding relations/dependencies
between opinion targets and affective words
– Accuracy ≈ 84%
• Mining frequent dependency subtrees for
positive and negative reviews and using an SVM
with these subtrees as features
– Accuracy ≈ 88%
Previous Work - Romanian
• Use machine translation to generate English
texts, then apply opinion mining
• Translate affective word databases into
Romanian (e.g. WordNet Affect)
• Developing new affective words lists
• Training and evaluation on specific corpora in
Romanian
• Problems with NER, dependency parsing,
affective words scores
Proposed Solutions
• Supervised solution trained for several
different opinion subjects (entities)
• Three approaches
– Bag of words
– Affective words and dependency parsing
– N-grams probabilities
Bag of Words
• Bag of words model:
– Tokenization, diacritics restoration, lemmatization
– Distinct lemmas selected as features
– Improvements: POS filter, word n-grams filter
– Used both binary features and TF-IDF
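The two feature representations mentioned above can be sketched in a few lines of Python. This is a minimal illustration only: it assumes the texts are already tokenized and lemmatized, the toy documents are hypothetical, and the TF-IDF variant shown (relative term frequency times log inverse document frequency) is one common choice, not necessarily the weighting used by the authors.

```python
import math
from collections import Counter

def build_vocabulary(docs):
    """Collect the distinct lemmas across all documents, in sorted order."""
    return sorted({lemma for doc in docs for lemma in doc})

def binary_vector(doc, vocab):
    """1 if the lemma occurs in the document, 0 otherwise."""
    present = set(doc)
    return [1 if lemma in present else 0 for lemma in vocab]

def tfidf_vectors(docs, vocab):
    """Relative term frequency multiplied by log(N / document frequency)."""
    n_docs = len(docs)
    doc_freq = Counter(lemma for doc in docs for lemma in set(doc))
    idf = {lemma: math.log(n_docs / doc_freq[lemma]) for lemma in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append([tf[lemma] / len(doc) * idf[lemma] for lemma in vocab])
    return vectors

# Hypothetical, already lemmatized toy documents (non-empty by assumption).
docs = [
    ["banca", "avea", "serviciu", "bun"],
    ["banca", "avea", "comision", "mare"],
    ["bere", "bun", "bun"],
]
vocab = build_vocabulary(docs)
binary = [binary_vector(doc, vocab) for doc in docs]
tfidf = tfidf_vectors(docs, vocab)
```

A POS filter or n-gram filter would simply restrict which items enter `vocab` before the vectors are built.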
Affective Scores & Dependency Parsing
• Compute affective word scores in Romanian:
– Translate all the adjectives and adverbs from the English WordNet
into Romanian using Google Translate
– Uses the probability of each translation pair
• Several affective score databases have been translated:
SentiWordNet, SenticNet 2 and ANEW
• Used the UAIC Romanian FDG parser to identify dependencies
between the subject entity and adjectives or adverbs
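One way the translation-pair probabilities could be used is sketched below. The slides do not state how the probabilities enter the computation, so the probability-weighted average here is an assumption, and the scores, words, and probabilities are hypothetical toy values rather than entries from SentiWordNet, SenticNet 2, or ANEW.

```python
from collections import defaultdict

def project_scores(english_scores, translations):
    """Project English affective scores onto Romanian words.

    translations: iterable of (english_word, romanian_word, probability).
    Each Romanian word receives the probability-weighted average of the
    scores of the English words that translate to it (assumed scheme).
    """
    weighted_sum = defaultdict(float)
    total_weight = defaultdict(float)
    for english, romanian, prob in translations:
        if english not in english_scores:
            continue  # no affective score available for this source word
        weighted_sum[romanian] += prob * english_scores[english]
        total_weight[romanian] += prob
    return {
        word: weighted_sum[word] / total_weight[word]
        for word in weighted_sum
        if total_weight[word] > 0
    }

# Hypothetical scores in [-1, 1] and hypothetical translation probabilities.
english_scores = {"good": 0.75, "fine": 0.5, "bad": -0.625}
translations = [
    ("good", "bun", 0.6),
    ("fine", "bun", 0.2),
    ("bad", "rău", 0.5),
    ("bad", "prost", 0.25),
]
romanian_scores = project_scores(english_scores, translations)
```

In the approach described above, such scores would then be read only for the adjectives and adverbs that the dependency parser links to the subject entity.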
N-grams Probabilities
• Compute the conditional probability for each
n-gram in the corpus given that the document
is either positive or negative
• Then use the following score for each n-gram
(feature f):
• The score of a new text is computed by
summing the scores for each of the n-grams
existing in that text
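The scoring scheme can be sketched as follows. The score formula itself is not reproduced in this text, so the sketch assumes a log-ratio of the two conditional probabilities with add-one smoothing; the probabilities are estimated from n-gram counts per class, which is also an assumption, and the toy documents are hypothetical.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def train_scores(pos_docs, neg_docs, n=2):
    """Assumed score: log(P(f | positive) / P(f | negative)), add-one smoothed."""
    pos_counts = Counter(g for doc in pos_docs for g in ngrams(doc, n))
    neg_counts = Counter(g for doc in neg_docs for g in ngrams(doc, n))
    vocab = set(pos_counts) | set(neg_counts)
    pos_total = sum(pos_counts.values()) + len(vocab)
    neg_total = sum(neg_counts.values()) + len(vocab)
    scores = {}
    for gram in vocab:
        p_pos = (pos_counts[gram] + 1) / pos_total
        p_neg = (neg_counts[gram] + 1) / neg_total
        scores[gram] = math.log(p_pos / p_neg)
    return scores

def score_text(tokens, scores, n=2):
    """Sum the scores of the n-grams found in the text; unseen n-grams add 0."""
    return sum(scores.get(gram, 0.0) for gram in ngrams(tokens, n))

# Hypothetical tokenized training documents.
pos_docs = [["serviciu", "foarte", "bun"], ["foarte", "bun", "produs"]]
neg_docs = [["serviciu", "foarte", "slab"], ["produs", "foarte", "slab"]]
scores = train_scores(pos_docs, neg_docs, n=2)

positive_example = score_text(["foarte", "bun"], scores)
negative_example = score_text(["foarte", "slab"], scores)
```

Under this assumed score, a text with a sum above 0 would be labeled positive and one below 0 negative.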
Opinionated Corpus
• Corpus manually annotated by analysts for their
customers (created by Treeworks for their
product ZeList, www.zelist.ro)
• ZeList indexes most of the texts published in
Romanian on the most popular social networks, blogs,
online forums, news websites, etc.
• Used data for seven different entities (companies
or brands), ranging from banks and beer brands
to web publishers and media
corporations
• The names of the entities have been anonymized
Opinionated Corpus
• Problems:
– These texts are very noisy, very heterogeneous,
from a wide range of sources and with different
writing styles (e.g. Twitter vs. news items)
– Some of them also might express positive and
negative publicity rather than opinions
Opinionated Corpus
• Data about the first version of the corpus
• Data collection ranged from a couple of months to a couple of
years, depending on the entity
• The second version contained a larger export of data for each
entity
Entity  Total items  Neutral  Opinionated  Positive  Negative
Ent1           6055     5853          202        29       173
Ent2           2240     1961          279       222        57
Ent3            343      260           83        64        19
Ent4           1168      876          292       120       172
Ent5            539      520           19        17         2
Ent6           1025      570          455       330       125
Ent7           3787     3016          771       593       178
Results - Outline
• Results obtained for the first version of the corpus, for all
entities
• The positive-negative accuracy should be the more relevant measure
• Good results for entities with more data, poor results for the
ones with a small number of opinionated texts
Entity  Total items  Neutral  Opinionated  Accuracy opinion-neutral  Accuracy positive-negative
Ent1           6055     5853          202                    97.01%                      92.07%
Ent2           2240     1961          279                    91.79%                      87.81%
Ent3            343      260           83                    84.84%                      89.15%
Ent4           1168      876          292                    86.22%                      82.19%
Ent5            539      520           19                    97.40%                      57.89%
Ent6           1025      570          455                    76.20%                      84.17%
Ent7           3787     3016          771                    81.75%                      83.65%
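To put the positive-negative accuracies in context, they can be compared with a majority-class baseline, that is, the accuracy of always predicting the more frequent polarity. This baseline is not part of the slides; it is added here only as a point of reference, computed from the positive and negative counts of the first corpus version.

```python
# (entity, positive, negative, reported positive-negative accuracy in %)
# Counts and accuracies are copied from the corpus and results tables.
rows = [
    ("Ent1", 29, 173, 92.07),
    ("Ent2", 222, 57, 87.81),
    ("Ent3", 64, 19, 89.15),
    ("Ent4", 120, 172, 82.19),
    ("Ent5", 17, 2, 57.89),
    ("Ent6", 330, 125, 84.17),
    ("Ent7", 593, 178, 83.65),
]

def majority_baseline(positive, negative):
    """Accuracy (in %) of always predicting the more frequent class."""
    return 100.0 * max(positive, negative) / (positive + negative)

# Gain of the reported accuracy over the baseline, per entity.
gains = {
    name: accuracy - majority_baseline(pos, neg)
    for name, pos, neg, accuracy in rows
}

for name, pos, neg, accuracy in rows:
    baseline = majority_baseline(pos, neg)
    print(f"{name}: baseline {baseline:.2f}%, reported {accuracy:.2f}%, "
          f"gain {gains[name]:+.2f}")
```

For Ent5, which has only 19 opinionated texts, the baseline computed this way (17 of 19) is higher than the reported accuracy, which matches the remark about poor results for entities with few opinionated texts.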
Results - Comparison
• Comparison of the solutions presented above, using the
second (larger) version of the corpus
• Performed for a single entity, on a balanced dataset of 700
positive and 700 negative opinionated texts
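A balanced subset of this kind could be drawn as sketched below. The slides do not say how the 700 texts per class were selected, so uniform random sampling without replacement is an assumption; the function name, the label strings, and the toy corpus are hypothetical.

```python
import random
from collections import Counter

def balanced_sample(items, per_class, labels=("positive", "negative"), seed=0):
    """Draw per_class items for each label, uniformly without replacement.

    items: list of (text, label) pairs; other labels (e.g. neutral) are ignored.
    """
    rng = random.Random(seed)
    sample = []
    for label in labels:
        group = [item for item in items if item[1] == label]
        if len(group) < per_class:
            raise ValueError(f"only {len(group)} items labeled {label!r}")
        sample.extend(rng.sample(group, per_class))
    rng.shuffle(sample)  # avoid having all items of one class first
    return sample

# Hypothetical toy corpus; the real setting would use per_class=700.
corpus = (
    [(f"pos text {i}", "positive") for i in range(5)]
    + [(f"neg text {i}", "negative") for i in range(3)]
    + [(f"neutral text {i}", "neutral") for i in range(4)]
)
subset = balanced_sample(corpus, per_class=3)
label_counts = Counter(label for _, label in subset)
```

With balanced classes, the majority-class baseline is 50%, so the accuracies in the comparison can be read directly against chance.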
Method                                         Accuracy
BoW + POS filter                               81.31%
BoW only adj.                                  70.89%
BoW only adj. & adv.                           76.60%
Frequent bigrams                               80.88%
Frequent trigrams                              76.60%
Affective scores + dependency parsing          52.18%
Affective scores (comparison with 0 decision)  55.35%
Trigrams probabilities                         88.44%
Bigrams probabilities                          72.54%
Conclusions
• Several alternatives for determining the opinion
polarity have been evaluated on a corpus manually
annotated for different Romanian entities
• Best results obtained at this moment: BoW plus a POS
filter or a frequent bigrams approach + SVM classifier
• The Romanian FDG parser does not provide good
accuracy for the dependency parsing task, especially
for texts from social media
– Texts are somewhat freely written, with little regard for
usual form or structure
– Improvements to this method and to the affective words
database are still possible
Thank you!
• Questions?
• Discussions
22.09.13 CSCS 2013 – Bucharest, Romania

Weitere ähnliche Inhalte

Was ist angesagt?

Dynamic Topic Modeling via Non-negative Matrix Factorization (Dr. Derek Greene)
Dynamic Topic Modeling via Non-negative Matrix Factorization (Dr. Derek Greene)Dynamic Topic Modeling via Non-negative Matrix Factorization (Dr. Derek Greene)
Dynamic Topic Modeling via Non-negative Matrix Factorization (Dr. Derek Greene)Sebastian Ruder
 
LDA Beginner's Tutorial
LDA Beginner's TutorialLDA Beginner's Tutorial
LDA Beginner's TutorialWayne Lee
 
Admixture of Poisson MRFs: A New Topic Model with Word Dependencies
Admixture of Poisson MRFs: A New Topic Model with Word DependenciesAdmixture of Poisson MRFs: A New Topic Model with Word Dependencies
Admixture of Poisson MRFs: A New Topic Model with Word DependenciesDavid Inouye
 
Chinese Character Decomposition for Neural MT with Multi-Word Expressions
Chinese Character Decomposition for  Neural MT with Multi-Word ExpressionsChinese Character Decomposition for  Neural MT with Multi-Word Expressions
Chinese Character Decomposition for Neural MT with Multi-Word ExpressionsLifeng (Aaron) Han
 
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.Lifeng (Aaron) Han
 
Practical machine learning - Part 1
Practical machine learning - Part 1Practical machine learning - Part 1
Practical machine learning - Part 1Traian Rebedea
 
Topic Modeling for Learning Analytics Researchers LAK15 Tutorial
Topic Modeling for Learning Analytics Researchers LAK15 TutorialTopic Modeling for Learning Analytics Researchers LAK15 Tutorial
Topic Modeling for Learning Analytics Researchers LAK15 TutorialVitomir Kovanovic
 
Apply chinese radicals into neural machine translation: deeper than character...
Apply chinese radicals into neural machine translation: deeper than character...Apply chinese radicals into neural machine translation: deeper than character...
Apply chinese radicals into neural machine translation: deeper than character...Lifeng (Aaron) Han
 
Introduction to question answering for linked data & big data
Introduction to question answering for linked data & big dataIntroduction to question answering for linked data & big data
Introduction to question answering for linked data & big dataAndre Freitas
 
Tutorial on Question Answering Systems
Tutorial on Question Answering Systems Tutorial on Question Answering Systems
Tutorial on Question Answering Systems Saeedeh Shekarpour
 
Frontiers of Natural Language Processing
Frontiers of Natural Language ProcessingFrontiers of Natural Language Processing
Frontiers of Natural Language ProcessingSebastian Ruder
 
Question Answering - Application and Challenges
Question Answering - Application and ChallengesQuestion Answering - Application and Challenges
Question Answering - Application and ChallengesJens Lehmann
 
PubhD talk: MT serving the society
PubhD talk: MT serving the societyPubhD talk: MT serving the society
PubhD talk: MT serving the societyLifeng (Aaron) Han
 
Open domain Question Answering System - Research project in NLP
Open domain  Question Answering System - Research project in NLPOpen domain  Question Answering System - Research project in NLP
Open domain Question Answering System - Research project in NLPGVS Chaitanya
 
Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...
Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...
Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...Andre Freitas
 
Semantic Interpretation of User Query for Question Answering on Interlinked Data
Semantic Interpretation of User Query for Question Answering on Interlinked DataSemantic Interpretation of User Query for Question Answering on Interlinked Data
Semantic Interpretation of User Query for Question Answering on Interlinked DataSaeedeh Shekarpour
 
Incorporating Chinese Radicals Into Neural Machine Translation: Deeper Than C...
Incorporating Chinese Radicals Into Neural Machine Translation: Deeper Than C...Incorporating Chinese Radicals Into Neural Machine Translation: Deeper Than C...
Incorporating Chinese Radicals Into Neural Machine Translation: Deeper Than C...Lifeng (Aaron) Han
 
Lecture 2: Computational Semantics
Lecture 2: Computational SemanticsLecture 2: Computational Semantics
Lecture 2: Computational SemanticsMarina Santini
 

Was ist angesagt? (20)

Dynamic Topic Modeling via Non-negative Matrix Factorization (Dr. Derek Greene)
Dynamic Topic Modeling via Non-negative Matrix Factorization (Dr. Derek Greene)Dynamic Topic Modeling via Non-negative Matrix Factorization (Dr. Derek Greene)
Dynamic Topic Modeling via Non-negative Matrix Factorization (Dr. Derek Greene)
 
LDA Beginner's Tutorial
LDA Beginner's TutorialLDA Beginner's Tutorial
LDA Beginner's Tutorial
 
Admixture of Poisson MRFs: A New Topic Model with Word Dependencies
Admixture of Poisson MRFs: A New Topic Model with Word DependenciesAdmixture of Poisson MRFs: A New Topic Model with Word Dependencies
Admixture of Poisson MRFs: A New Topic Model with Word Dependencies
 
Chinese Character Decomposition for Neural MT with Multi-Word Expressions
Chinese Character Decomposition for  Neural MT with Multi-Word ExpressionsChinese Character Decomposition for  Neural MT with Multi-Word Expressions
Chinese Character Decomposition for Neural MT with Multi-Word Expressions
 
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
 
Practical machine learning - Part 1
Practical machine learning - Part 1Practical machine learning - Part 1
Practical machine learning - Part 1
 
Topic Modeling for Learning Analytics Researchers LAK15 Tutorial
Topic Modeling for Learning Analytics Researchers LAK15 TutorialTopic Modeling for Learning Analytics Researchers LAK15 Tutorial
Topic Modeling for Learning Analytics Researchers LAK15 Tutorial
 
Question answering
Question answeringQuestion answering
Question answering
 
Apply chinese radicals into neural machine translation: deeper than character...
Apply chinese radicals into neural machine translation: deeper than character...Apply chinese radicals into neural machine translation: deeper than character...
Apply chinese radicals into neural machine translation: deeper than character...
 
Introduction to question answering for linked data & big data
Introduction to question answering for linked data & big dataIntroduction to question answering for linked data & big data
Introduction to question answering for linked data & big data
 
Tutorial on Question Answering Systems
Tutorial on Question Answering Systems Tutorial on Question Answering Systems
Tutorial on Question Answering Systems
 
Frontiers of Natural Language Processing
Frontiers of Natural Language ProcessingFrontiers of Natural Language Processing
Frontiers of Natural Language Processing
 
Question Answering - Application and Challenges
Question Answering - Application and ChallengesQuestion Answering - Application and Challenges
Question Answering - Application and Challenges
 
PubhD talk: MT serving the society
PubhD talk: MT serving the societyPubhD talk: MT serving the society
PubhD talk: MT serving the society
 
Open domain Question Answering System - Research project in NLP
Open domain  Question Answering System - Research project in NLPOpen domain  Question Answering System - Research project in NLP
Open domain Question Answering System - Research project in NLP
 
Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...
Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...
Question Answering over Linked Data: Challenges, Approaches & Trends (Tutoria...
 
Semantic Interpretation of User Query for Question Answering on Interlinked Data
Semantic Interpretation of User Query for Question Answering on Interlinked DataSemantic Interpretation of User Query for Question Answering on Interlinked Data
Semantic Interpretation of User Query for Question Answering on Interlinked Data
 
Incorporating Chinese Radicals Into Neural Machine Translation: Deeper Than C...
Incorporating Chinese Radicals Into Neural Machine Translation: Deeper Than C...Incorporating Chinese Radicals Into Neural Machine Translation: Deeper Than C...
Incorporating Chinese Radicals Into Neural Machine Translation: Deeper Than C...
 
Information Extraction
Information ExtractionInformation Extraction
Information Extraction
 
Lecture 2: Computational Semantics
Lecture 2: Computational SemanticsLecture 2: Computational Semantics
Lecture 2: Computational Semantics
 

Ähnlich wie Opinion mining for social media and news items in Romanian

A tailor-made one-size-fits-all approach to sentiment analysis
A tailor-made one-size-fits-all approach to sentiment analysisA tailor-made one-size-fits-all approach to sentiment analysis
A tailor-made one-size-fits-all approach to sentiment analysisDiana Maynard
 
Online Forums vs. Social Networks: Two Case Studies to support eGovernment wi...
Online Forums vs. Social Networks: Two Case Studies to support eGovernment wi...Online Forums vs. Social Networks: Two Case Studies to support eGovernment wi...
Online Forums vs. Social Networks: Two Case Studies to support eGovernment wi...Timo Wandhoefer
 
Eric Mayer and Kathryn Eccles, Oxford Internet Institute
Eric Mayer and Kathryn Eccles, Oxford Internet InstituteEric Mayer and Kathryn Eccles, Oxford Internet Institute
Eric Mayer and Kathryn Eccles, Oxford Internet InstituteSarahFahmy
 
SoundSoftware: Software Sustainability for audio and Music Researchers
SoundSoftware: Software Sustainability for audio and Music Researchers SoundSoftware: Software Sustainability for audio and Music Researchers
SoundSoftware: Software Sustainability for audio and Music Researchers SoundSoftware ac.uk
 
Online08 stm market-outlook-vcamlek finalv1 (2)
Online08 stm market-outlook-vcamlek finalv1 (2)Online08 stm market-outlook-vcamlek finalv1 (2)
Online08 stm market-outlook-vcamlek finalv1 (2)rotciv
 
The Student's and Researcher's Guide to Discovery: Exploring Scientific Field...
The Student's and Researcher's Guide to Discovery: Exploring Scientific Field...The Student's and Researcher's Guide to Discovery: Exploring Scientific Field...
The Student's and Researcher's Guide to Discovery: Exploring Scientific Field...Open Knowledge Maps
 
Dictionary self-assessment test: a way to complete on-line dictionaries. Jor...
Dictionary self-assessment test: a way  to complete on-line dictionaries. Jor...Dictionary self-assessment test: a way  to complete on-line dictionaries. Jor...
Dictionary self-assessment test: a way to complete on-line dictionaries. Jor...TERMCAT
 
New Perspectives on Social Media: Putting Our ‘Known Unknowns’ on the Map
New Perspectives on Social Media: Putting Our ‘Known Unknowns’ on the MapNew Perspectives on Social Media: Putting Our ‘Known Unknowns’ on the Map
New Perspectives on Social Media: Putting Our ‘Known Unknowns’ on the MapAxel Bruns
 
20190527_Karen Hytteballe Ibanez _ The OPERA project
 20190527_Karen Hytteballe Ibanez _ The OPERA project 20190527_Karen Hytteballe Ibanez _ The OPERA project
20190527_Karen Hytteballe Ibanez _ The OPERA projectOpenAIRE
 
I vox presentation esomar conference innovate barcelona 2010
I vox presentation esomar conference innovate barcelona 2010I vox presentation esomar conference innovate barcelona 2010
I vox presentation esomar conference innovate barcelona 2010iVOX
 
A pedagogic assessment of mobile learning applications
A pedagogic assessment of mobile learning applicationsA pedagogic assessment of mobile learning applications
A pedagogic assessment of mobile learning applicationsAtlas Uned
 
Engagement handouts
Engagement handoutsEngagement handouts
Engagement handoutsSTIinnsbruck
 
Industry-Academia Communication In Empirical Software Engineering
Industry-Academia Communication In Empirical Software EngineeringIndustry-Academia Communication In Empirical Software Engineering
Industry-Academia Communication In Empirical Software EngineeringPer Runeson
 
9 wietse hermanns
9  wietse hermanns9  wietse hermanns
9 wietse hermannsFEST
 
Observations on Annotations – From Computational Linguistics and the World Wi...
Observations on Annotations – From Computational Linguistics and the World Wi...Observations on Annotations – From Computational Linguistics and the World Wi...
Observations on Annotations – From Computational Linguistics and the World Wi...Georg Rehm
 
The MyRI project presentation
The MyRI project presentationThe MyRI project presentation
The MyRI project presentationRos Pan
 
Best practices on co-design and research communication from finland
Best practices on co-design and research communication from finlandBest practices on co-design and research communication from finland
Best practices on co-design and research communication from finlandtyndallcentreuea
 
How to measure the impact of Research ?
How to measure the impact of Research ?How to measure the impact of Research ?
How to measure the impact of Research ?Le_GFII
 

Ähnlich wie Opinion mining for social media and news items in Romanian (20)

A tailor-made one-size-fits-all approach to sentiment analysis
A tailor-made one-size-fits-all approach to sentiment analysisA tailor-made one-size-fits-all approach to sentiment analysis
A tailor-made one-size-fits-all approach to sentiment analysis
 
Online Forums vs. Social Networks: Two Case Studies to support eGovernment wi...
Online Forums vs. Social Networks: Two Case Studies to support eGovernment wi...Online Forums vs. Social Networks: Two Case Studies to support eGovernment wi...
Online Forums vs. Social Networks: Two Case Studies to support eGovernment wi...
 
JISC-WW1
JISC-WW1JISC-WW1
JISC-WW1
 
Eric Mayer and Kathryn Eccles, Oxford Internet Institute
Eric Mayer and Kathryn Eccles, Oxford Internet InstituteEric Mayer and Kathryn Eccles, Oxford Internet Institute
Eric Mayer and Kathryn Eccles, Oxford Internet Institute
 
SoundSoftware: Software Sustainability for audio and Music Researchers
SoundSoftware: Software Sustainability for audio and Music Researchers SoundSoftware: Software Sustainability for audio and Music Researchers
SoundSoftware: Software Sustainability for audio and Music Researchers
 
Online08 stm market-outlook-vcamlek finalv1 (2)
Online08 stm market-outlook-vcamlek finalv1 (2)Online08 stm market-outlook-vcamlek finalv1 (2)
Online08 stm market-outlook-vcamlek finalv1 (2)
 
The Student's and Researcher's Guide to Discovery: Exploring Scientific Field...
The Student's and Researcher's Guide to Discovery: Exploring Scientific Field...The Student's and Researcher's Guide to Discovery: Exploring Scientific Field...
The Student's and Researcher's Guide to Discovery: Exploring Scientific Field...
 
Analyzing User Reviews in Tourism with Topic Models
Analyzing User Reviews in Tourism with Topic ModelsAnalyzing User Reviews in Tourism with Topic Models
Analyzing User Reviews in Tourism with Topic Models
 
Dictionary self-assessment test: a way to complete on-line dictionaries. Jor...
Dictionary self-assessment test: a way  to complete on-line dictionaries. Jor...Dictionary self-assessment test: a way  to complete on-line dictionaries. Jor...
Dictionary self-assessment test: a way to complete on-line dictionaries. Jor...
 
New Perspectives on Social Media: Putting Our ‘Known Unknowns’ on the Map
New Perspectives on Social Media: Putting Our ‘Known Unknowns’ on the MapNew Perspectives on Social Media: Putting Our ‘Known Unknowns’ on the Map
New Perspectives on Social Media: Putting Our ‘Known Unknowns’ on the Map
 
20190527_Karen Hytteballe Ibanez _ The OPERA project
 20190527_Karen Hytteballe Ibanez _ The OPERA project 20190527_Karen Hytteballe Ibanez _ The OPERA project
20190527_Karen Hytteballe Ibanez _ The OPERA project
 
I vox presentation esomar conference innovate barcelona 2010
I vox presentation esomar conference innovate barcelona 2010I vox presentation esomar conference innovate barcelona 2010
I vox presentation esomar conference innovate barcelona 2010
 
A pedagogic assessment of mobile learning applications
A pedagogic assessment of mobile learning applicationsA pedagogic assessment of mobile learning applications
A pedagogic assessment of mobile learning applications
 
Engagement handouts
Engagement handoutsEngagement handouts
Engagement handouts
 
Industry-Academia Communication In Empirical Software Engineering
Industry-Academia Communication In Empirical Software EngineeringIndustry-Academia Communication In Empirical Software Engineering
Industry-Academia Communication In Empirical Software Engineering
 
9 wietse hermanns
9  wietse hermanns9  wietse hermanns
9 wietse hermanns
 
Observations on Annotations – From Computational Linguistics and the World Wi...
Observations on Annotations – From Computational Linguistics and the World Wi...Observations on Annotations – From Computational Linguistics and the World Wi...
Observations on Annotations – From Computational Linguistics and the World Wi...
 
The MyRI project presentation
The MyRI project presentationThe MyRI project presentation
The MyRI project presentation
 
Best practices on co-design and research communication from finland
Best practices on co-design and research communication from finlandBest practices on co-design and research communication from finland
Best practices on co-design and research communication from finland
 
How to measure the impact of Research ?
How to measure the impact of Research ?How to measure the impact of Research ?
How to measure the impact of Research ?
 

Mehr von Traian Rebedea

AI @ Wholi - Bucharest.AI Meetup #5
AI @ Wholi - Bucharest.AI Meetup #5AI @ Wholi - Bucharest.AI Meetup #5
AI @ Wholi - Bucharest.AI Meetup #5Traian Rebedea
 
Deep neural networks for matching online social networking profiles
Deep neural networks for matching online social networking profilesDeep neural networks for matching online social networking profiles
Deep neural networks for matching online social networking profilesTraian Rebedea
 
Intro to Deep Learning for Question Answering
Intro to Deep Learning for Question AnsweringIntro to Deep Learning for Question Answering
Intro to Deep Learning for Question AnsweringTraian Rebedea
 
Propunere de dezvoltare a carierei universitare
Propunere de dezvoltare a carierei universitarePropunere de dezvoltare a carierei universitare
Propunere de dezvoltare a carierei universitareTraian Rebedea
 
Automatic plagiarism detection system for specialized corpora
Automatic plagiarism detection system for specialized corporaAutomatic plagiarism detection system for specialized corpora
Automatic plagiarism detection system for specialized corporaTraian Rebedea
 
Importanța algoritmilor pentru problemele de la interviuri
Importanța algoritmilor pentru problemele de la interviuriImportanța algoritmilor pentru problemele de la interviuri
Importanța algoritmilor pentru problemele de la interviuriTraian Rebedea
 
Automatic assessment of collaborative chat conversations with PolyCAFe - EC-T...
Automatic assessment of collaborative chat conversations with PolyCAFe - EC-T...Automatic assessment of collaborative chat conversations with PolyCAFe - EC-T...
Automatic assessment of collaborative chat conversations with PolyCAFe - EC-T...Traian Rebedea
 
Conclusions and Recommendations of the Romanian ICT RTD Survey
Conclusions and Recommendations of the Romanian ICT RTD SurveyConclusions and Recommendations of the Romanian ICT RTD Survey
Conclusions and Recommendations of the Romanian ICT RTD SurveyTraian Rebedea
 
Istoria Web-ului - part 2 - tentativ How to Web 2009
Istoria Web-ului - part 2 - tentativ How to Web 2009Istoria Web-ului - part 2 - tentativ How to Web 2009
Istoria Web-ului - part 2 - tentativ How to Web 2009Traian Rebedea
 
Istoria Web-ului - part 1 (2) - tentativ How to Web 2009
Istoria Web-ului - part 1 (2) - tentativ How to Web 2009Istoria Web-ului - part 1 (2) - tentativ How to Web 2009
Istoria Web-ului - part 1 (2) - tentativ How to Web 2009Traian Rebedea
 
Istoria Web-ului - part 1 - tentativ How to Web 2009
Istoria Web-ului - part 1 - tentativ How to Web 2009Istoria Web-ului - part 1 - tentativ How to Web 2009
Istoria Web-ului - part 1 - tentativ How to Web 2009Traian Rebedea
 
Algorithm Design and Complexity - Course 12
Algorithm Design and Complexity - Course 12Algorithm Design and Complexity - Course 12
Algorithm Design and Complexity - Course 12Traian Rebedea
 
Algorithm Design and Complexity - Course 11
Algorithm Design and Complexity - Course 11Algorithm Design and Complexity - Course 11
Algorithm Design and Complexity - Course 11Traian Rebedea
 
Algorithm Design and Complexity - Course 10
Algorithm Design and Complexity - Course 10Algorithm Design and Complexity - Course 10
Algorithm Design and Complexity - Course 10Traian Rebedea
 
Algorithm Design and Complexity - Course 9
Algorithm Design and Complexity - Course 9Algorithm Design and Complexity - Course 9
Algorithm Design and Complexity - Course 9Traian Rebedea
 
Algorithm Design and Complexity - Course 8
Algorithm Design and Complexity - Course 8Algorithm Design and Complexity - Course 8
Algorithm Design and Complexity - Course 8Traian Rebedea
 
Algorithm Design and Complexity - Course 7
Algorithm Design and Complexity - Course 7Algorithm Design and Complexity - Course 7
Algorithm Design and Complexity - Course 7Traian Rebedea
 
Algorithm Design and Complexity - Course 6
Algorithm Design and Complexity - Course 6Algorithm Design and Complexity - Course 6
Algorithm Design and Complexity - Course 6Traian Rebedea
 
Algorithm Design and Complexity - Course 5
Algorithm Design and Complexity - Course 5Algorithm Design and Complexity - Course 5
Algorithm Design and Complexity - Course 5Traian Rebedea
 

Opinion mining for social media and news items in Romanian

  • 1. Opinion Mining for Social Media and News Items in Romanian
      University Politehnica of Bucharest
      Authors: Claudia Cârdei, Filip Manișor, Traian Rebedea (traian.rebedea@cs.pub.ro)
  • 2. Overview
      • Introduction
      • Previous Work
          – English
          – Romanian
      • Proposed Solutions
      • Opinionated Corpus
      • Results and Comparisons
      • Conclusions
      22.09.13 Bachelor's Thesis Session - July 2012
  • 3. Introduction
      • Sentiment analysis and opinion mining research has mainly concentrated on English and other important languages (Spanish, Chinese, etc.)
          – Various commercial and open-source solutions exist, mainly for English
          – Corpora of opinionated texts and databases of affective words (general or domain specific) also exist for these languages
      • Objective: develop an opinion mining solution for Romanian texts gathered from a wide range of online sources (mostly social media and news items)
      22.09.13 ICSCS 2013 . K-TEAMS 2013 Workshop Opinion Mining for Social Media and News Items in Romanian
  • 4. Introduction
      • Popular research domain in recent years
      • Sentiment, subjectivity, opinion, publicity
          – Related, but somewhat different
      • Sentiment or subjectivity in a text:
          – Positive, negative or neutral
          – Subjective or objective
      • Opinionated text
          – Opinion author
          – Opinion target (subject)
          – Opinion (affective) words
          – Opinion polarity
      E.g. President Obama declared that the US immigration system is broken.
  • 5. Previous Work - English
  • 6. Previous Work - English
      • Many studies and corpora in different domains
      • The movie reviews dataset is very popular
      • Initial results using BoW, punctuation, etc.
          – Accuracy ≈ 80%
      • Improvement: find relations/dependencies between opinion targets and affective words
          – Accuracy ≈ 84%
      • Mining frequent dependency subtrees for positive and negative reviews and using an SVM with these subtrees as features
          – Accuracy ≈ 88%
  • 7. Previous Work - Romanian
      • Use machine translation to generate English texts, then apply opinion mining
      • Translate affective word databases into Romanian (e.g. WordNet Affect)
      • Develop new affective word lists
      • Train and evaluate on specific corpora in Romanian
      • Problems with NER, dependency parsing, affective word scores
  • 8. Proposed Solutions
      • Supervised solution trained for several different opinion subjects (entities)
      • Three approaches
          – Bag of words
          – Affective words and dependency parsing
          – N-gram probabilities
  • 9. Bag of Words
      • Bag of words model:
          – Tokenization, diacritics restoration, lemmatization
          – Distinct lemmas selected as features
          – Improvements: POS filter, word n-grams filter
          – Used both binary features and TF-IDF
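The two feature types named on the Bag of Words slide can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the documents are already tokenized and lemmatized, it uses one common TF-IDF variant (relative term frequency times log(N / document frequency)), and the example lemmas are hypothetical. The POS and n-gram filters are not covered.

```python
import math
from collections import Counter

def build_vocabulary(docs):
    """Assign a feature index to every distinct lemma in the training documents."""
    lemmas = sorted({lemma for doc in docs for lemma in doc})
    return {lemma: index for index, lemma in enumerate(lemmas)}

def binary_features(doc, vocab):
    """1 if the lemma occurs in the document, 0 otherwise."""
    vector = [0] * len(vocab)
    for lemma in doc:
        if lemma in vocab:
            vector[vocab[lemma]] = 1
    return vector

def tfidf_features(doc, vocab, docs):
    """Relative term frequency times log(N / document frequency)."""
    doc_freq = Counter(lemma for d in docs for lemma in set(d))
    term_freq = Counter(doc)
    vector = [0.0] * len(vocab)
    for lemma, count in term_freq.items():
        if lemma in vocab and doc_freq[lemma] > 0:
            idf = math.log(len(docs) / doc_freq[lemma])
            vector[vocab[lemma]] = (count / len(doc)) * idf
    return vector

# Hypothetical, already lemmatized documents (not taken from the corpus).
docs = [
    ["bancă", "bun", "serviciu"],
    ["bancă", "prost", "serviciu"],
    ["bere", "bun"],
]
vocab = build_vocabulary(docs)
binary = binary_features(docs[0], vocab)
tfidf = tfidf_features(docs[0], vocab, docs)
```

Either vector can then be passed to a classifier; the slides mention an SVM in the conclusions.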
  • 10. Affective Scores & Dependency Parsing
      • Compute affective word scores in Romanian:
          – Translate all the adjectives and adverbs from the English WordNet into Romanian using Google Translate
          – Use the probability of each translation pair
      • Several affective score databases have been translated: SentiWordNet, SenticNet 2 and ANEW
      • Used the UAIC Romanian FDG parser to identify dependencies between the subject entity and adjectives or adverbs
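The slide does not say how the translation probabilities enter the Romanian scores. One plausible reading, shown here purely as an assumption, is a probability-weighted average of the English scores over all translation pairs that map to the same Romanian word. The words, scores and probabilities below are hypothetical and not taken from SentiWordNet, SenticNet 2 or ANEW; the sign-based decision mirrors the "comparison with 0" variant listed in the results.

```python
from collections import defaultdict

def project_scores(english_scores, translation_pairs):
    """Assumed scheme: weighted average of English scores per Romanian word.

    translation_pairs: iterable of (english_word, romanian_word, probability).
    """
    weighted_sum = defaultdict(float)
    total_prob = defaultdict(float)
    for en_word, ro_word, prob in translation_pairs:
        if en_word in english_scores:
            weighted_sum[ro_word] += prob * english_scores[en_word]
            total_prob[ro_word] += prob
    return {ro: weighted_sum[ro] / total_prob[ro]
            for ro in weighted_sum if total_prob[ro] > 0}

def polarity(modifiers, romanian_scores):
    """Sum the scores of the adjectives/adverbs attached to the entity; compare with 0."""
    total = sum(romanian_scores.get(word, 0.0) for word in modifiers)
    return "positive" if total > 0 else "negative"

# Hypothetical English scores and translation probabilities.
english_scores = {"good": 0.75, "fine": 0.5, "bad": -0.625}
pairs = [
    ("good", "bun", 0.8),
    ("fine", "bun", 0.2),
    ("bad", "rău", 1.0),
]
ro_scores = project_scores(english_scores, pairs)
label = polarity(["bun", "rău", "bun"], ro_scores)
```

In the approach described on the slide, the list of modifiers would come from the dependency parser; here it is supplied by hand.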
  • 11. N-gram Probabilities
      • Compute the conditional probability for each n-gram in the corpus given that the document is either positive or negative
      • Then use the following score for each n-gram (feature f): [formula not preserved in this transcript]
      • The score of a new text is computed by summing the scores for each of the n-grams existing in that text
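The per-n-gram score did not survive in the transcript, so the sketch below substitutes a common choice: the log-ratio of the two conditional probabilities, log(P(f | positive) / P(f | negative)), with add-one smoothing. Both the score and the smoothing are assumptions, as are the example documents; only the overall procedure (estimate conditional probabilities, score each n-gram, sum the scores over a new text) comes from the slide.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def train_scores(pos_docs, neg_docs, n=2):
    """Score each n-gram by an assumed smoothed log-ratio of class-conditional probabilities."""
    # Count in how many documents of each class an n-gram occurs.
    pos_df = Counter(g for doc in pos_docs for g in set(ngrams(doc, n)))
    neg_df = Counter(g for doc in neg_docs for g in set(ngrams(doc, n)))
    scores = {}
    for gram in set(pos_df) | set(neg_df):
        p_pos = (pos_df[gram] + 1) / (len(pos_docs) + 2)  # add-one smoothing
        p_neg = (neg_df[gram] + 1) / (len(neg_docs) + 2)
        scores[gram] = math.log(p_pos / p_neg)
    return scores

def score_text(tokens, scores, n=2):
    """Sum the scores of the distinct n-grams found in the text; unseen n-grams add 0."""
    return sum(scores.get(gram, 0.0) for gram in set(ngrams(tokens, n)))

# Hypothetical training documents (token lists), not taken from the corpus.
pos_docs = [["foarte", "bun", "serviciu"], ["serviciu", "foarte", "bun"]]
neg_docs = [["foarte", "prost", "serviciu"], ["serviciu", "prost"]]
scores = train_scores(pos_docs, neg_docs, n=2)
positive_example = score_text(["foarte", "bun"], scores)
negative_example = score_text(["foarte", "prost"], scores)
```

With this convention a text would be labeled positive when its summed score is above zero and negative otherwise.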
  • 12. Opinionated Corpus
      • Corpus manually annotated by analysts for their customers (created by Treeworks for their product ZeList, www.zelist.ro)
      • ZeList indexes most of the texts published in Romanian on the most popular social networks, blogs, online forums, news websites, etc.
      • Used data for seven different entities (companies or brands), ranging from banks and beer brands to web publishers and media corporations
      • The names of the entities have been anonymized
  • 13. Opinionated Corpus
      • Problems:
          – The texts are very noisy and heterogeneous: they come from a wide range of sources and are written in different styles (e.g. Twitter vs. news items)
          – Some of them might express positive or negative publicity rather than opinions
  • 14. Opinionated Corpus
      • Statistics for the first version of the corpus (table below)
      • Data collection ranged from a couple of months to a couple of years, depending on the entity
      • The second version contained a larger export of data for each entity

      Entity  Total items  Neutral  Opinionated  Positive  Negative
      Ent1           6055     5853          202        29       173
      Ent2           2240     1961          279       222        57
      Ent3            343      260           83        64        19
      Ent4           1168      876          292       120       172
      Ent5            539      520           19        17         2
      Ent6           1025      570          455       330       125
      Ent7           3787     3016          771       593       178
  • 15. Results - Outline
      • Results obtained for the first version of the corpus, for all entities
      • The positive-negative accuracy should be the more relevant measure
      • Good results for entities with more data, poor results for the ones with a small number of opinionated texts

      Entity  Total items  Neutral  Opinionated  Accuracy opinion-neutral  Accuracy positive-negative
      Ent1           6055     5853          202                    97.01%                      92.07%
      Ent2           2240     1961          279                    91.79%                      87.81%
      Ent3            343      260           83                    84.84%                      89.15%
      Ent4           1168      876          292                    86.22%                      82.19%
      Ent5            539      520           19                    97.40%                      57.89%
      Ent6           1025      570          455                    76.20%                      84.17%
      Ent7           3787     3016          771                    81.75%                      83.65%
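One way to read these accuracies is to compare them with a majority-class baseline, i.e. the accuracy of always predicting the most frequent label. The baseline is not part of the original slides; it is added here as a reading aid and computed only from the counts in the two corpus tables (slides 14 and 15).

```python
# (total, neutral, opinionated, positive, negative,
#  reported accuracy opinion-neutral, reported accuracy positive-negative)
results = {
    "Ent1": (6055, 5853, 202, 29, 173, 97.01, 92.07),
    "Ent2": (2240, 1961, 279, 222, 57, 91.79, 87.81),
    "Ent3": (343, 260, 83, 64, 19, 84.84, 89.15),
    "Ent4": (1168, 876, 292, 120, 172, 86.22, 82.19),
    "Ent5": (539, 520, 19, 17, 2, 97.40, 57.89),
    "Ent6": (1025, 570, 455, 330, 125, 76.20, 84.17),
    "Ent7": (3787, 3016, 771, 593, 178, 81.75, 83.65),
}

def majority_baselines(row):
    """Accuracy (in %) of always predicting the most frequent class, for both tasks."""
    total, neutral, opinionated, positive, negative, _, _ = row
    opinion_neutral = 100.0 * max(neutral, opinionated) / total
    positive_negative = 100.0 * max(positive, negative) / opinionated
    return opinion_neutral, positive_negative

baselines = {entity: majority_baselines(row) for entity, row in results.items()}

for entity, row in results.items():
    base_on, base_pn = baselines[entity]
    print(f"{entity}: opinion-neutral {row[5]:.2f}% (baseline {base_on:.2f}%), "
          f"positive-negative {row[6]:.2f}% (baseline {base_pn:.2f}%)")
```

For Ent1 and Ent5, where over 96% of the items are neutral, the reported opinion-neutral accuracy is within one percentage point of this baseline, which is one possible reason to treat the positive-negative accuracy as the more informative figure.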
  • 16. Results - Comparison
      • Comparison of the solutions presented above, using the second (larger) version of the corpus
      • Run for a single entity only, on a balanced dataset of 700 positive and 700 negative opinionated texts extracted from its data

      Method                                          Accuracy
      BoW + POS filter                                81.31%
      BoW only adj.                                   70.89%
      BoW only adj. & adv.                            76.60%
      Frequent bigrams                                80.88%
      Frequent trigrams                               76.60%
      Affective scores + dependency parsing           52.18%
      Affective scores (comparison with 0 decision)   55.35%
      Trigram probabilities                           88.44%
      Bigram probabilities                            72.54%
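The slides do not describe how the balanced dataset was extracted. A simple option, assumed here, is random undersampling: draw the same number of texts from each class without replacement. The item list below is synthetic and only demonstrates the mechanics.

```python
import random

def balanced_sample(items, per_class, seed=0):
    """Draw per_class items for each label, without replacement, then shuffle.

    items: list of (text, label) pairs with label "positive" or "negative".
    """
    rng = random.Random(seed)  # fixed seed so the extraction is reproducible
    sample = []
    for label in ("positive", "negative"):
        pool = [item for item in items if item[1] == label]
        if len(pool) < per_class:
            raise ValueError(f"only {len(pool)} {label} items available")
        sample.extend(rng.sample(pool, per_class))
    rng.shuffle(sample)
    return sample

# Synthetic stand-in for one entity's opinionated texts.
items = [(f"text {i}", "positive") for i in range(1000)]
items += [(f"text {i}", "negative") for i in range(1000, 1800)]
dataset = balanced_sample(items, per_class=700)
```

Balancing the classes puts the majority-class baseline at 50%, so the accuracies in the table can be compared directly against chance.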
  • 17. Conclusions
      • Several alternatives for determining opinion polarity have been evaluated on a corpus manually annotated for different Romanian entities
      • Best results obtained so far: BoW with a POS filter, or frequent bigrams, combined with an SVM classifier
      • The Romanian FDG parser does not provide good accuracy for the dependency parsing task, especially for texts from social media
          – Texts are somewhat freely written, with little regard for usual form or structure
          – Improvements to this method and to the affective words database are still possible
  • 18. Thank you!
      • Questions?
      • Discussions
      22.09.13 CSCS 2013 – Bucharest, Romania