SlideShare ist ein Scribd-Unternehmen logo
1 von 1
Downloaden Sie, um offline zu lesen
{damiano, jcalbornoz, tmartin, enrique, julio}@lsi.uned.es, fginer3@alumno.uned.es
Acknowledgments. This research was partially supported by the Spanish Ministry of Education (FPU grant nr AP2009-0507 and FPI grant nr BES-2011-044328), the Spanish Ministry of Science and Innovation (Holopedia Project, TIN2010-21128-C02), the Regional
Government of Madrid and the ESF under MA2VICMR (S2009/TIC-1542) and the European Community's FP7 Programme under grant agreement nr 288024 (LiMoSINe).
UNED NLP & IR Group
Madrid, Spain
RepLab at CLEF 2013
Valencia, Spain. September 23-26, 2013
Filter Keywords
Instance-based learning + Heterogeneity-Based Ranking (HBR)
LDA-based Clustering
Term Clustering
Wikified Tweet Clustering
Approach Accuracy Reliability Sensitivity F(R,S)
Rank
(out of 76 runs)
RepLab 2013 Best System 0.91 0.73 0.45 0.49 1
Filter Keywords
(Tweet Classification Step)
0.86 0.43 0.38 0.34 19
RepLab 2013 Official Baseline 0.87 0.49 0.32 0.33 21
Instance-based Learning + HBR 0.87 0.47 0.33 0.30 27
Filter Keywords
(training: same entity)
0.84 0.67 0.26 0.25 42
Filter Keywords
(training: other entities)
0.50 0.17 0.29 0.14 61
Approach Reliability Sensitivity F(R,S) Accuracy
Rank
(out of 68 runs)
RepLab 2013 Best System 0.48 0.34 0.38 0.69 1
SentiSense
(training: same entity)
0.36 0.10 0.15 0.62 21
SentiSense +
Domain-specific Adaptation
(training: same entity)
0.33 0.11 0.14 0.62 22
Instance-based Learning + HBR 0.32 0.29 0.30 0.59 26
RepLab 2013 Official Baseline 0.32 0.29 0.30 0.58 28
SentiSense +
Domain-specific Adaptation
(training: same entity, balanced)
0.34 0.12 0.16 0.58 31
Approach Reliability Sensitivity F(R,S)
Rank
(out of 34 runs)
Wikified Tweet Clustering 0.46 0.32 0.33 1
LDA-based Clustering
(all entities background tweets)
0.30 0.22 0.24 5
Term Clustering 0.42 0.21 0.23 7
LDA-based Clustering
(entity-specific
background tweets)
0.34 0.16 0.21 16
Instance-based Learning + HBR 0.15 0.22 0.17 21
RepLab 2013 Official Baseline 0.15 0.22 0.17 22
Filtering Topic Detection Topic Priority F-1*
Rank
(out of 26 runs)
Filter Keywords
(Tweet Classification Step)
Wikified Tweet Clustering Baseline 0.19 1
Baseline
LDA-based Clustering
(all entities background tweets)
Baseline 0.18 2
Filter Keywords
(Tweet Classification Step)
Term Clustering Baseline 0.17 3
Baseline
LDA-based Clustering
(entity-specific
background tweets)
Baseline 0.17 4
Instance-based Learning + HBR Instance-based Learning + HBR Instance-based Learning + HBR 0.16 13
Filter Keywords
(training: other entities)
Wikified Tweet Clustering Instance-based Learning + HBR 0.12 14
Filter Keywords
(training: all entites)
Wikified Tweet Clustering Baseline
0.11 15
Filter Keywords
(training: all entities)
Term Clustering Baseline
0.11 16
UNED Online Reputation Monitoring Team at
RepLab 2013
Filtering Subtask
Polarity for Reputation Subtask
Topic Detection Subtask
Full Monitoring Task
Input: entity of interest + set of tweets + representative URL
Example: Apple Inc. + tweets containing “apple” + www.apple.com
- Filtering: Binary classification of tweets (related/unrelated)
- Polarity for Reputation: Classify each tweet according to its
polarity for reputation (positive /negative/neutral)
- Topic Detection: Group tweets by topics
- Topic Priority: Rank topics, reputation alerts go first
Output: Monitoring summary (ranking of topics) for the
reputation manager
RepLab 2013
Dataset
Monitoring Task
• 61 entities
• 4 domains: automotive, banking, universities, music
• For each entity: ~ 750 tweets for training
~1,500 tweets for test
• Languages: English and Spanish
• ~ 142,500 tweets
• ~ 372,800 manual annotations
Two-step classification algorithm
• Step 1: Automatic Keyword Discovery
Each term is classified as positive keyword / negative keyword / other
Step 2: Automatic Tweet Classification
Tweets containing keywords are used to feed a binary BoW classifier that classifies the remaining
tweets as related/unrelated
• Similar to the RepLab 2013 official baseline
• Each tweet in the test set is labeled as the most similar tweet in the training set
• Combination of rankings given by multiple text similarity measures
• Applicable to all the subtasks (Topic Detection, Polarity, Priority...)
• SentiSense
Affective Lexicon of 5,496 words and 2,190 synsets from WordNet labeled with emotional
categories
• Domain-specificic Lexicon Adaptation
For each domain, WordNet concepts are extracted from the training data. The graph is
generated upon semantic relations between concepts. Emotional categories are propagated
using SentiSense as seed.
• Polarity Classification
Tweets represented as a Vector of Emotional Intensities (VEI) feed a Machine Learning
classifier.
Semantic Graphs for Domain-specific Affective Lexicon Adaptation
• Representation: Tweets are linked to Wikipedia pages/entities
• Clustering: Jaccard similarity over Wikipedia entities
• Step 1: Term Clustering
- Learned similarity function (content-based, meta-data, time-aware features)
- Hierarchical Agglomerative Clustering
• Step 2: Tweet clustering
- Assigns tweets according to maximal term overlap (highest Jaccard similarity).
• Based on Twitter-LDA and Topics over Time models
• Transfer learning: target tweets + background tweets to establish the right number of clusters
* F-1 = Harmonic Mean( {R,S} x {Filtering, Topic Detection, Topic Priority} )
Conclusions
• Full Task. Large room for improvement. Filtering is crucial for the overall performance of a
monitoring system.
• Filtering. Use entity-specific training data when available: +78% F(R,S), +68% accuracy for
Filter Keywords.
• Polarity for Reputation. Different from traditional sentiment analysis. Domain-adaptive
affective lexicons less competitive than other RepLab submissions.
• Topic Detection. Three approaches perform competitively w.r.t. other RepLab submissions.
• Topic Priority (future work). Challenging due to the difficulty of combining dissimilar and
unperfect signals (computed automatically): polarity, novelty, centrality, etc.

Weitere ähnliche Inhalte

Ähnlich wie UNED Online Reputation Monitoring Team Report on Filtering, Polarity, Topic Detection and Priority Subtasks at RepLab 2013

Indexing and Searching Cross Media Content in a Social Network
Indexing and Searching Cross Media Content in a Social NetworkIndexing and Searching Cross Media Content in a Social Network
Indexing and Searching Cross Media Content in a Social NetworkPaolo Nesi
 
What happen after crawling big data?
What happen after crawling big data?What happen after crawling big data?
What happen after crawling big data?jcarpioc
 
Better Hackathon 2020 - Fraunhofer IAIS - Semantic geo-clustering with SANSA
Better Hackathon 2020 - Fraunhofer IAIS - Semantic geo-clustering with SANSABetter Hackathon 2020 - Fraunhofer IAIS - Semantic geo-clustering with SANSA
Better Hackathon 2020 - Fraunhofer IAIS - Semantic geo-clustering with SANSAPRBETTER
 
Extracting Relevant Questions to an RDF Dataset Using Formal Concept Analysis
Extracting Relevant Questions to an RDF Dataset Using Formal Concept AnalysisExtracting Relevant Questions to an RDF Dataset Using Formal Concept Analysis
Extracting Relevant Questions to an RDF Dataset Using Formal Concept AnalysisMathieu d'Aquin
 
Modular Documentation Joe Gelb Techshoret 2009
Modular Documentation Joe Gelb Techshoret 2009Modular Documentation Joe Gelb Techshoret 2009
Modular Documentation Joe Gelb Techshoret 2009Suite Solutions
 
SubTopic Detection of Tweets Related to an Entity
SubTopic Detection of Tweets Related to an EntitySubTopic Detection of Tweets Related to an Entity
SubTopic Detection of Tweets Related to an EntityAnkita Kumari
 
Automatic and unsupervised topic discovery in social networks
Automatic and unsupervised topic discovery in social networksAutomatic and unsupervised topic discovery in social networks
Automatic and unsupervised topic discovery in social networksAntonio Moreno
 
Feedable, Portable, Mashable, DITAble
Feedable, Portable, Mashable, DITAbleFeedable, Portable, Mashable, DITAble
Feedable, Portable, Mashable, DITAbleMichael Priestley
 
2015 syllabus 12_computer_science_new
2015 syllabus 12_computer_science_new2015 syllabus 12_computer_science_new
2015 syllabus 12_computer_science_new1oshane
 
TellMeFirst - A knowledge domain discovery framework
TellMeFirst - A knowledge domain discovery frameworkTellMeFirst - A knowledge domain discovery framework
TellMeFirst - A knowledge domain discovery frameworkgiuseppe_futia
 
Building a Learning Resource Exchange
Building a Learning Resource ExchangeBuilding a Learning Resource Exchange
Building a Learning Resource ExchangeDavid Massart
 
Questions about questions
Questions about questionsQuestions about questions
Questions about questionsmoresmile
 
Fed Stimulus & Oer Workshop
Fed Stimulus & Oer WorkshopFed Stimulus & Oer Workshop
Fed Stimulus & Oer WorkshopRuth Romi
 

Ähnlich wie UNED Online Reputation Monitoring Team Report on Filtering, Polarity, Topic Detection and Priority Subtasks at RepLab 2013 (20)

KMA Webinar: Managed Metadata Services in SharePoint 2010
KMA Webinar: Managed Metadata Services in SharePoint 2010KMA Webinar: Managed Metadata Services in SharePoint 2010
KMA Webinar: Managed Metadata Services in SharePoint 2010
 
Indexing and Searching Cross Media Content in a Social Network
Indexing and Searching Cross Media Content in a Social NetworkIndexing and Searching Cross Media Content in a Social Network
Indexing and Searching Cross Media Content in a Social Network
 
What happen after crawling big data?
What happen after crawling big data?What happen after crawling big data?
What happen after crawling big data?
 
Better Hackathon 2020 - Fraunhofer IAIS - Semantic geo-clustering with SANSA
Better Hackathon 2020 - Fraunhofer IAIS - Semantic geo-clustering with SANSABetter Hackathon 2020 - Fraunhofer IAIS - Semantic geo-clustering with SANSA
Better Hackathon 2020 - Fraunhofer IAIS - Semantic geo-clustering with SANSA
 
Extracting Relevant Questions to an RDF Dataset Using Formal Concept Analysis
Extracting Relevant Questions to an RDF Dataset Using Formal Concept AnalysisExtracting Relevant Questions to an RDF Dataset Using Formal Concept Analysis
Extracting Relevant Questions to an RDF Dataset Using Formal Concept Analysis
 
Modular Documentation Joe Gelb Techshoret 2009
Modular Documentation Joe Gelb Techshoret 2009Modular Documentation Joe Gelb Techshoret 2009
Modular Documentation Joe Gelb Techshoret 2009
 
SubTopic Detection of Tweets Related to an Entity
SubTopic Detection of Tweets Related to an EntitySubTopic Detection of Tweets Related to an Entity
SubTopic Detection of Tweets Related to an Entity
 
Introducation to metadata
Introducation to metadataIntroducation to metadata
Introducation to metadata
 
Automatic and unsupervised topic discovery in social networks
Automatic and unsupervised topic discovery in social networksAutomatic and unsupervised topic discovery in social networks
Automatic and unsupervised topic discovery in social networks
 
Ogier Virginia Tech's RIS Ecosystem
Ogier Virginia Tech's RIS EcosystemOgier Virginia Tech's RIS Ecosystem
Ogier Virginia Tech's RIS Ecosystem
 
Feedable, Portable, Mashable, DITAble
Feedable, Portable, Mashable, DITAbleFeedable, Portable, Mashable, DITAble
Feedable, Portable, Mashable, DITAble
 
2015 syllabus 12_computer_science_new
2015 syllabus 12_computer_science_new2015 syllabus 12_computer_science_new
2015 syllabus 12_computer_science_new
 
TellMeFirst - A knowledge domain discovery framework
TellMeFirst - A knowledge domain discovery frameworkTellMeFirst - A knowledge domain discovery framework
TellMeFirst - A knowledge domain discovery framework
 
Building a Learning Resource Exchange
Building a Learning Resource ExchangeBuilding a Learning Resource Exchange
Building a Learning Resource Exchange
 
Questions about questions
Questions about questionsQuestions about questions
Questions about questions
 
Planetdata simpda
Planetdata simpdaPlanetdata simpda
Planetdata simpda
 
PlanetData: Consuming Structured Data at Web Scale
PlanetData: Consuming Structured Data at Web ScalePlanetData: Consuming Structured Data at Web Scale
PlanetData: Consuming Structured Data at Web Scale
 
Fed Stimulus & Oer Workshop
Fed Stimulus & Oer WorkshopFed Stimulus & Oer Workshop
Fed Stimulus & Oer Workshop
 
Granules Are Forever
Granules Are ForeverGranules Are Forever
Granules Are Forever
 
KMA on Mms2010 nyc
KMA on Mms2010 nycKMA on Mms2010 nyc
KMA on Mms2010 nyc
 

Mehr von Damiano Spina

A Formal Account of Effectiveness Evaluation and Ranking Fusion
A Formal Account of Effectiveness Evaluation and Ranking FusionA Formal Account of Effectiveness Evaluation and Ranking Fusion
A Formal Account of Effectiveness Evaluation and Ranking FusionDamiano Spina
 
SpeakerLDA: Discovering Topics in Transcribed Multi-Speaker Audio Contents @ ...
SpeakerLDA: Discovering Topics in Transcribed Multi-Speaker Audio Contents @ ...SpeakerLDA: Discovering Topics in Transcribed Multi-Speaker Audio Contents @ ...
SpeakerLDA: Discovering Topics in Transcribed Multi-Speaker Audio Contents @ ...Damiano Spina
 
ORMA: A Semi-Automatic Tool for Online Reputation Monitoring in Twitter
ORMA: A Semi-Automatic Tool for Online Reputation Monitoring in TwitterORMA: A Semi-Automatic Tool for Online Reputation Monitoring in Twitter
ORMA: A Semi-Automatic Tool for Online Reputation Monitoring in TwitterDamiano Spina
 
Online Reputation Monitoring in Twitter from an Information Access Perspective
Online Reputation Monitoring in Twitter from an Information Access PerspectiveOnline Reputation Monitoring in Twitter from an Information Access Perspective
Online Reputation Monitoring in Twitter from an Information Access PerspectiveDamiano Spina
 
Towards an Active Learning System for Company Name Disambiguation in Microblo...
Towards an Active Learning System for Company Name Disambiguation in Microblo...Towards an Active Learning System for Company Name Disambiguation in Microblo...
Towards an Active Learning System for Company Name Disambiguation in Microblo...Damiano Spina
 
Identifying Entity Aspects in Microblog Posts
Identifying Entity Aspects in Microblog PostsIdentifying Entity Aspects in Microblog Posts
Identifying Entity Aspects in Microblog PostsDamiano Spina
 
Towards Real-Time Summarization of Scheduled Events from Twitter Streams
Towards Real-Time Summarization of Scheduled Events from Twitter StreamsTowards Real-Time Summarization of Scheduled Events from Twitter Streams
Towards Real-Time Summarization of Scheduled Events from Twitter StreamsDamiano Spina
 
A Corpus for Entity Profiling in Microblog Posts
A Corpus for Entity Profiling in Microblog PostsA Corpus for Entity Profiling in Microblog Posts
A Corpus for Entity Profiling in Microblog PostsDamiano Spina
 
Filter keywords and majority class strategies for company name disambiguation...
Filter keywords and majority class strategies for company name disambiguation...Filter keywords and majority class strategies for company name disambiguation...
Filter keywords and majority class strategies for company name disambiguation...Damiano Spina
 
Evaluación de sistemas de monitorización de contenidos generados por usuarios
Evaluación de sistemas de monitorización de contenidos generados por usuariosEvaluación de sistemas de monitorización de contenidos generados por usuarios
Evaluación de sistemas de monitorización de contenidos generados por usuariosDamiano Spina
 
Caracterización de una entidad basada en opiniones: un estudio de caso
Caracterización de una entidad basada en opiniones: un estudio de casoCaracterización de una entidad basada en opiniones: un estudio de caso
Caracterización de una entidad basada en opiniones: un estudio de casoDamiano Spina
 

Mehr von Damiano Spina (11)

A Formal Account of Effectiveness Evaluation and Ranking Fusion
A Formal Account of Effectiveness Evaluation and Ranking FusionA Formal Account of Effectiveness Evaluation and Ranking Fusion
A Formal Account of Effectiveness Evaluation and Ranking Fusion
 
SpeakerLDA: Discovering Topics in Transcribed Multi-Speaker Audio Contents @ ...
SpeakerLDA: Discovering Topics in Transcribed Multi-Speaker Audio Contents @ ...SpeakerLDA: Discovering Topics in Transcribed Multi-Speaker Audio Contents @ ...
SpeakerLDA: Discovering Topics in Transcribed Multi-Speaker Audio Contents @ ...
 
ORMA: A Semi-Automatic Tool for Online Reputation Monitoring in Twitter
ORMA: A Semi-Automatic Tool for Online Reputation Monitoring in TwitterORMA: A Semi-Automatic Tool for Online Reputation Monitoring in Twitter
ORMA: A Semi-Automatic Tool for Online Reputation Monitoring in Twitter
 
Online Reputation Monitoring in Twitter from an Information Access Perspective
Online Reputation Monitoring in Twitter from an Information Access PerspectiveOnline Reputation Monitoring in Twitter from an Information Access Perspective
Online Reputation Monitoring in Twitter from an Information Access Perspective
 
Towards an Active Learning System for Company Name Disambiguation in Microblo...
Towards an Active Learning System for Company Name Disambiguation in Microblo...Towards an Active Learning System for Company Name Disambiguation in Microblo...
Towards an Active Learning System for Company Name Disambiguation in Microblo...
 
Identifying Entity Aspects in Microblog Posts
Identifying Entity Aspects in Microblog PostsIdentifying Entity Aspects in Microblog Posts
Identifying Entity Aspects in Microblog Posts
 
Towards Real-Time Summarization of Scheduled Events from Twitter Streams
Towards Real-Time Summarization of Scheduled Events from Twitter StreamsTowards Real-Time Summarization of Scheduled Events from Twitter Streams
Towards Real-Time Summarization of Scheduled Events from Twitter Streams
 
A Corpus for Entity Profiling in Microblog Posts
A Corpus for Entity Profiling in Microblog PostsA Corpus for Entity Profiling in Microblog Posts
A Corpus for Entity Profiling in Microblog Posts
 
Filter keywords and majority class strategies for company name disambiguation...
Filter keywords and majority class strategies for company name disambiguation...Filter keywords and majority class strategies for company name disambiguation...
Filter keywords and majority class strategies for company name disambiguation...
 
Evaluación de sistemas de monitorización de contenidos generados por usuarios
Evaluación de sistemas de monitorización de contenidos generados por usuariosEvaluación de sistemas de monitorización de contenidos generados por usuarios
Evaluación de sistemas de monitorización de contenidos generados por usuarios
 
Caracterización de una entidad basada en opiniones: un estudio de caso
Caracterización de una entidad basada en opiniones: un estudio de casoCaracterización de una entidad basada en opiniones: un estudio de caso
Caracterización de una entidad basada en opiniones: un estudio de caso
 

Kürzlich hochgeladen

The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 

Kürzlich hochgeladen (20)

The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 

UNED Online Reputation Monitoring Team Report on Filtering, Polarity, Topic Detection and Priority Subtasks at RepLab 2013

  • 1. {damiano, jcalbornoz, tmartin, enrique, julio}@lsi.uned.es, fginer3@alumno.uned.es Acknowledgments. This research was partially supported by the Spanish Ministry of Education (FPU grant nr AP2009-0507 and FPI grant nr BES-2011-044328), the Spanish Ministry of Science and Innovation (Holopedia Project, TIN2010-21128-C02), the Regional Government of Madrid and the ESF under MA2VICMR (S2009/TIC-1542) and the European Community's FP7 Programme under grant agreement nr 288024 (LiMoSINe). UNED NLP & IR Group Madrid, Spain RepLab at CLEF 2013 Valencia, Spain. September 23-26, 2013 Filter Keywords Instance-based learning + Heterogeneity-Based Ranking (HBR) LDA-based Clustering Term Clustering Wikified Tweet Clustering Approach Accuracy Reliability Sensitivity F(R,S) Rank (out of 76 runs) RepLab 2013 Best System 0.91 0.73 0.45 0.49 1 Filter Keywords (Tweet Classification Step) 0.86 0.43 0.38 0.34 19 RepLab 2013 Official Baseline 0.87 0.49 0.32 0.33 21 Instance-based Learning + HBR 0.87 0.47 0.33 0.30 27 Filter Keywords (training: same entity) 0.84 0.67 0.26 0.25 42 Filter Keywords (training: other entities) 0.50 0.17 0.29 0.14 61 Approach Reliability Sensitivity F(R,S) Accuracy Rank (out of 68 runs) RepLab 2013 Best System 0.48 0.34 0.38 0.69 1 SentiSense (training: same entity) 0.36 0.10 0.15 0.62 21 SentiSense + Domain-specific Adaptation (training: same entity) 0.33 0.11 0.14 0.62 22 Instance-based Learning + HBR 0.32 0.29 0.30 0.59 26 RepLab 2013 Official Baseline 0.32 0.29 0.30 0.58 28 SentiSense + Domain-specific Adaptation (training: same entity, balanced) 0.34 0.12 0.16 0.58 31 Approach Reliability Sensitivity F(R,S) Rank (out of 34 runs) Wikified Tweet Clustering 0.46 0.32 0.33 1 LDA-based Clustering (all entities background tweets) 0.30 0.22 0.24 5 Term Clustering 0.42 0.21 0.23 7 LDA-based Clustering (entity-specific background tweets) 0.34 0.16 0.21 16 Instance-based Learning + HBR 0.15 0.22 0.17 21 RepLab 2013 Official Baseline 0.15 0.22 0.17 22 Filtering Topic Detection Topic Priority F-1* Rank (out of 26 runs) Filter Keywords (Tweet Classification Step) Wikified Tweet Clustering Baseline 0.19 1 Baseline LDA-based Clustering (all entities background tweets) Baseline 0.18 2 Filter Keywords (Tweet Classification Step) Term Clustering Baseline 0.17 3 Baseline LDA-based Clustering (entity-specific background tweets) Baseline 0.17 4 Instance-based Learning + HBR Instance-based Learning + HBR Instance-based Learning + HBR 0.16 13 Filter Keywords (training: other entities) Wikified Tweet Clustering Instance-based Learning + HBR 0.12 14 Filter Keywords (training: all entites) Wikified Tweet Clustering Baseline 0.11 15 Filter Keywords (training: all entities) Term Clustering Baseline 0.11 16 UNED Online Reputation Monitoring Team at RepLab 2013 Filtering Subtask Polarity for Reputation Subtask Topic Detection Subtask Full Monitoring Task Input: entity of interest + set of tweets + representative URL Example: Apple Inc. + tweets containing “apple” + www.apple.com - Filtering: Binary classification of tweets (related/unrelated) - Polarity for Reputation: Classify each tweet according to its polarity for reputation (positive /negative/neutral) - Topic Detection: Group tweets by topics - Topic Priority: Rank topics, reputation alerts go first Output: Monitoring summary (ranking of topics) for the reputation manager RepLab 2013 Dataset Monitoring Task • 61 entities • 4 domains: automotive, banking, universities, music • For each entity: ~ 750 tweets for training ~1,500 tweets for test • Languages: English and Spanish • ~ 142,500 tweets • ~ 372,800 manual annotations Two-step classification algorithm • Step 1: Automatic Keyword Discovery Each term is classified as positive keyword / negative keyword / other Step 2: Automatic Tweet Classification Tweets containing keywords are used to feed a binary BoW classifier that classifies the remaining tweets as related/unrelated • Similar to the RepLab 2013 official baseline • Each tweet in the test set is labeled as the most similar tweet in the training set • Combination of rankings given by multiple text similarity measures • Applicable to all the subtasks (Topic Detection, Polarity, Priority...) • SentiSense Affective Lexicon of 5,496 words and 2,190 synsets from WordNet labeled with emotional categories • Domain-specificic Lexicon Adaptation For each domain, WordNet concepts are extracted from the training data. The graph is generated upon semantic relations between concepts. Emotional categories are propagated using SentiSense as seed. • Polarity Classification Tweets represented as a Vector of Emotional Intensities (VEI) feed a Machine Learning classifier. Semantic Graphs for Domain-specific Affective Lexicon Adaptation • Representation: Tweets are linked to Wikipedia pages/entities • Clustering: Jaccard similarity over Wikipedia entities • Step 1: Term Clustering - Learned similarity function (content-based, meta-data, time-aware features) - Hierarchical Agglomerative Clustering • Step 2: Tweet clustering - Assigns tweets according to maximal term overlap (highest Jaccard similarity). • Based on Twitter-LDA and Topics over Time models • Transfer learning: target tweets + background tweets to establish the right number of clusters * F-1 = Harmonic Mean( {R,S} x {Filtering, Topic Detection, Topic Priority} ) Conclusions • Full Task. Large room for improvement. Filtering is crucial for the overall performance of a monitoring system. • Filtering. Use entity-specific training data when available: +78% F(R,S), +68% accuracy for Filter Keywords. • Polarity for Reputation. Different from traditional sentiment analysis. Domain-adaptive affective lexicons less competitive than other RepLab submissions. • Topic Detection. Three approaches perform competitively w.r.t. other RepLab submissions. • Topic Priority (future work). Challenging due to the difficulty of combining dissimilar and unperfect signals (computed automatically): polarity, novelty, centrality, etc.