Relevance-Based Ranking of Video Comments on YouTube
Authors: Andrei Șerbănoiu, Traian Rebedea (traian.rebedea@cs.pub.ro)
University Politehnica of Bucharest
Overview
• Introduction
• Motivation
• System architecture
• Classification of relevant comments
• Ranking of relevant comments
• Results
• Conclusions
22.09.13 Bachelor's Thesis Session - July 2012 2
Introduction
• Text classification and ranking for comments on
YouTube videos
– First: classify whether the comment is relevant or
not for the given video
– Second: rank the relevant comments
• Focus on identifying relevant information
• Comments are very short: sometimes fewer than
10 words, and on the order of tens of words on
average
• Relevance is evaluated with respect to the information
collected from other online sources about the video
22.09.13 CSCS 2013 – Bucharest, Romania 3
Existing research
• We have not been able to identify any previous
research in the direction of identifying relevant
comments
• YouTube research
– Identify the relevant features of community acceptance
(comments with many “likes”)
– Extract the sentiment orientation
– Differentiate between clean and noisy comments
• Other research
– Ranking Comments on the Social Web (uses Digg)
Motivation
• The Police – Every Breath You Take
Motivation
• Most commented video
• “10 questions that every intelligent Christian
must answer”
• 1,429,425 comments on 30 May 2013 (early
morning)
• How many of these comments are spam?
• Which ones would be most relevant to the
video?
Solution
• Ranking of the comments according to relevance
• Steps:
1. Automatically link video with other online sources
relevant to it
2. Filter comments to remove noisy comments
3. Rank the remaining comments according to
relevance computed using NLP techniques
• Our solution works for music videos
System architecture
Processing pipeline
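// Pipeline outline (pseudocode)
comments = fetchYouTubeComments();
comments = filterComments(comments);            // remove noisy comments
commentTopics = createCommentTopics(comments);  // topics for each comment
resources = getResources(wikipedia, allmusic, lyrics);
for (int i = 0; i < commentTopics.length; i++) {
    // score each comment's topics against the external resources
    computeRelevance(commentTopics[i], resources);
}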
Preprocessing
• Comments retrieved with the YouTube Data API
– Only the last 100 comments per video were used
• Comments not written in English were filtered out
using JLangDetect
• The main topics of each comment were extracted
using Mallet => 5 topics per comment
• The topics were expanded with synonyms and
hypernyms from WordNet
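The topic expansion step can be pictured with the small sketch below. It is only an illustration: the in-memory `related` map stands in for a WordNet lookup, and the class name, method name, and map entries are all hypothetical, not taken from the slides.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only: a plain map plays the role of WordNet here.
final class TopicExpander {
    // Returns the original topics plus any related words (synonyms or
    // hypernyms) found for them, without duplicates, in insertion order.
    static Set<String> expand(List<String> topics, Map<String, List<String>> related) {
        Set<String> expanded = new LinkedHashSet<>();
        for (String topic : topics) {
            expanded.add(topic);
            expanded.addAll(related.getOrDefault(topic, Collections.emptyList()));
        }
        return expanded;
    }
}
```

A topic with no entry in the map is kept unchanged, so the expansion never loses topics extracted from the comment.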
Pre-classification of comments
• Objective: to reduce the number of comments considered for
ranking by identifying noise
• Classification based on a neural network using a set of
simple linguistic features
• Multilayer Perceptron implemented in Weka
• Features
– Number of non-ASCII characters
– Number of capital letters
– Number of newlines
– Number of digits
– Number of trivial and swear-words
– Number of words in comment
– Average word size
– Number of punctuation marks
– Common text spam count
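As a rough illustration, the features listed above could be computed as follows before being passed to a classifier. This is a sketch under assumptions: the two word lists and the punctuation set are hypothetical placeholders (the slides do not say which lists were used), and the classifier itself is not shown.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Illustrative feature extractor; lists below are hypothetical placeholders.
final class CommentFeatures {
    static final List<String> TRIVIAL_OR_SWEAR = Arrays.asList("lol", "omg", "damn");
    static final List<String> SPAM_PHRASES =
            Arrays.asList("check out my channel", "thumbs up", "subscribe");

    // Feature order follows the slide: non-ASCII, capitals, newlines, digits,
    // trivial/swear words, word count, average word size, punctuation, spam count.
    static double[] extract(String comment) {
        int nonAscii = 0, capitals = 0, newlines = 0, digits = 0, punctuation = 0;
        for (int i = 0; i < comment.length(); i++) {
            char c = comment.charAt(i);
            if (c > 127) nonAscii++;
            if (Character.isUpperCase(c)) capitals++;
            if (c == '\n') newlines++;
            if (Character.isDigit(c)) digits++;
            if (".,;:!?'\"()-".indexOf(c) >= 0) punctuation++;
        }
        String lower = comment.toLowerCase(Locale.ROOT);
        String trimmed = lower.trim();
        String[] words = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
        int trivial = 0, totalChars = 0;
        for (String w : words) {
            // Strip punctuation so "lol!" still matches "lol".
            String bare = w.replaceAll("[^\\p{L}\\p{Nd}]", "");
            if (TRIVIAL_OR_SWEAR.contains(bare)) trivial++;
            totalChars += bare.length();
        }
        int spam = 0;
        for (String phrase : SPAM_PHRASES) {
            if (lower.contains(phrase)) spam++;
        }
        double avgWordSize = words.length == 0 ? 0.0 : (double) totalChars / words.length;
        return new double[] {nonAscii, capitals, newlines, digits, trivial,
                             words.length, avgWordSize, punctuation, spam};
    }
}
```

All features are cheap character or token counts, so the vector can be computed in a single pass over each comment.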
Pre-classification of comments
• Trained on a small corpus with 100 relevant
comments and 100 noisy comments
• Examples of noisy comments:
– "Step 1: Pause this video
Step 2: Google 'Rainymood'
Step 3: Click the first link
Step 4: Unpause this video
Step 5: Thumbs up this comment, enjoy and thank
me later"
– "Those 3,175 haters listen to 'Techno'."
– "IF YOU LIKE DIRTY DIANA SONG THE SINGER '' STEFANO GIORGINI '' DID A GREAT REMAKE
STEFANO IS A VERY GOOD SINGER SONGWRITER I THINK YOU WILL LIKE HIS VERSION JUST LOOK FOR
'' STEFANO GIORGINI '' DIRTY DIANA"
Pre-classification of comments
• Results of pre-classification stage
Type of Instances                  No. Instances   %
Correctly Classified Instances     174             87.46
Incorrectly Classified Instances   26              12.54
Total Number of Instances          200             -
Relevance scoring stage
• Initial approach
• Extract topics from comments as previously
mentioned (Mallet + WordNet)
• Fetch Wikipedia articles for artist and song name
• Score computed based on the number of appearances
of the topics from the comments in the articles
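A minimal sketch of this counting idea is given below. It assumes whole-word, case-insensitive matching against the article text; the slides do not specify how appearances were counted, and the class and method names are hypothetical.

```java
import java.util.List;
import java.util.Locale;

// Illustrative sketch of occurrence-based scoring.
final class OccurrenceScorer {
    // Sums, over all comment topics, how many times the topic word
    // appears as a whole token in the article text.
    static int score(List<String> commentTopics, String articleText) {
        String[] tokens = articleText.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+");
        int total = 0;
        for (String topic : commentTopics) {
            String wanted = topic.toLowerCase(Locale.ROOT);
            for (String token : tokens) {
                if (token.equals(wanted)) total++;
            }
        }
        return total;
    }
}
```

Because the score is a raw count, longer articles tend to produce higher scores, which is one reason the later approaches compare topics instead of raw text.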
Relevance scoring stage
• Second approach: topic-based scoring
• Similar to the previous one, but topics are also
extracted from the Wikipedia articles with
Mallet
• Scoring is done based on:
– Number of topics extracted from each comment
– Wikipedia topic matches for each comment
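One way to combine the two quantities above is the fraction of a comment's topics that also appear among the Wikipedia topics. The sketch below assumes that combination; the slides do not give the exact formula, and the names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch: share of comment topics matched by article topics.
final class TopicMatchScorer {
    static double score(Set<String> commentTopics, Set<String> articleTopics) {
        if (commentTopics.isEmpty()) return 0.0;
        Set<String> shared = new HashSet<>(commentTopics);
        shared.retainAll(articleTopics); // keep only topics present in both sets
        return (double) shared.size() / commentTopics.size();
    }
}
```

Dividing by the number of comment topics keeps the score between 0 and 1 regardless of how many topics were extracted.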
Relevance scoring stage
• Third approach
• Multiple-source topic-based scoring
• Additional sources added to the Wikipedia articles
– Information from allmusic.com website on artists and
songs
– Information from song lyrics
• Topics matched between comments and Wikipedia +
Allmusic articles, plus exact match of lyrics
• Final relevance score is a weighted sum of the previous
factors
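The weighted sum can be sketched as follows. The weights, the class name, and the way lyric matches are counted (a lyric line appearing verbatim, ignoring case, inside the comment) are assumptions made for illustration; the slides do not state the actual values.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustrative sketch of multiple-source scoring with hypothetical weights.
final class MultiSourceScorer {
    static final double W_WIKIPEDIA = 1.0;
    static final double W_ALLMUSIC = 1.0;
    static final double W_LYRICS = 2.0;

    // Number of topics shared by the two sets.
    static int matches(Set<String> a, Set<String> b) {
        Set<String> shared = new HashSet<>(a);
        shared.retainAll(b);
        return shared.size();
    }

    // Number of lyric lines quoted exactly (case-insensitively) in the comment.
    static int lyricMatches(String comment, List<String> lyricLines) {
        String text = comment.toLowerCase(Locale.ROOT);
        int count = 0;
        for (String line : lyricLines) {
            String needle = line.trim().toLowerCase(Locale.ROOT);
            if (!needle.isEmpty() && text.contains(needle)) count++;
        }
        return count;
    }

    static double score(String comment, Set<String> commentTopics,
                        Set<String> wikipediaTopics, Set<String> allmusicTopics,
                        List<String> lyricLines) {
        return W_WIKIPEDIA * matches(commentTopics, wikipediaTopics)
             + W_ALLMUSIC * matches(commentTopics, allmusicTopics)
             + W_LYRICS * lyricMatches(comment, lyricLines);
    }
}
```

Keeping each factor as a separate method makes it easy to change one weight and observe how the ranking shifts.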
Results
Comment | Relevance
maybe your friend should know that being english, have a picture in abbey road and "sing" all you need is love" won't make one direction a group like the beatles... | 662
my mom said she doesn't like the beatles and she said that john was only good to look at not to hear. my dad said, " haha so true!." i'm an orphan now. | 968
you shouldn't be listening to the beatles since these seem to turn your friends into enemies! beatles are all about peace! you are not getting their message! | 983
please read this ! hey i know u just wanna listen to the song but i still have to write this hoping someone will see it and that someone will care .i'm a young musician from croatia so this spam is my only chance to get noticed.please check out my channel and i promise u won't be sorry.i appreciate your time because music means everything to me, thank you! | 1309
i didn't mean fight other places. i meant focus on the hurt people in your own country first, then expand to the others. if people don't agree with peace that's an opinion. not a fact, and people often take offense to opinions. there isn't anything to take offense to, they say something that's all it is. they said it, don't put meaning to it. world peace - i meant the whole world having peace there | 1639
Results
• Difficult to assess the impact of the
relevance measure
• Interpreting the comments is subjective
– Need human annotators
• The order of the comments is completely
different from the one presented now on
YouTube (correlation lower than 0.031 for the
first 100 comments)
• Method 1 is also not correlated with the other
two methods
• Methods 2 and 3 have a higher correlation: 0.124
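The slides do not say which correlation measure was used to compare orderings. As one plausible choice, the sketch below computes Spearman's rank correlation between two rankings of the same comments, assuming no tied ranks; the class and method names are hypothetical.

```java
// Illustrative sketch: Spearman's rho for two rankings without ties.
final class RankCorrelation {
    // rankA[i] and rankB[i] are the positions (1..n) of comment i
    // in the two orderings being compared.
    static double spearman(int[] rankA, int[] rankB) {
        if (rankA.length != rankB.length || rankA.length < 2) {
            throw new IllegalArgumentException("need two rankings of equal length >= 2");
        }
        int n = rankA.length;
        long sumSquaredDiff = 0;
        for (int i = 0; i < n; i++) {
            long d = rankA[i] - rankB[i];
            sumSquaredDiff += d * d;
        }
        // rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))
        return 1.0 - (6.0 * sumSquaredDiff) / ((double) n * ((double) n * n - 1));
    }
}
```

A value near 0, like those reported above, means that knowing a comment's position in one ordering says almost nothing about its position in the other.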
Conclusions
• 2-stage method for ranking comments on YouTube
• The first stage removes noisy comments
• The second stage tries to link the comments with
information from other web pages relevant for the
video
• Relevance is computed based on topic-modeling with
Mallet
• Results are encouraging, but need to find a more
rigorous method of assessing them
• Results are better than the usual results provided by
YouTube, however the processing time for each video
should not be neglected
Thank you!
• Questions?
• Discussion
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 

Kürzlich hochgeladen (20)

Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 

Relevance based ranking of video comments on YouTube

  • 1. Relevance-Based Ranking of Video Comments on YouTube
      Authors: Andrei Șerbănoiu, Traian Rebedea (traian.rebedea@cs.pub.ro)
      University Politehnica of Bucharest
  • 2. Overview
      • Introduction
      • Motivation
      • System architecture
      • Classification of relevant comments
      • Ranking of relevant comments
      • Results
      • Conclusions
      22.09.13 Bachelor's Thesis Session - July 2012
  • 3. Introduction
      • Text classification and ranking for comments on YouTube videos
          – First: classify whether each comment is relevant to the given video
          – Second: rank the relevant comments
      • Focus on identifying relevant information
      • Comments contain very few words: sometimes fewer than 10, and on average on the order of tens
      • Relevance is evaluated with respect to information collected about the video from other online sources
      22.09.13 CSCS 2013 – Bucharest, Romania 3
  • 4. Existing research
      • We have not been able to identify any previous research in the direction of identifying relevant comments
      • YouTube research
          – Identify the relevant features of community acceptance (comments with many “likes”)
          – Extract the sentiment orientation
          – Differentiate between clean and noisy comments
      • Other research
          – Ranking Comments on the Social Web (uses Digg)
  • 5. Motivation
      • The Police – Every Breath You Take
  • 6. Motivation
      • Most commented video
      • “10 questions that every intelligent Christian must answer”
      • 1,429,425 comments on 30th May 2013 (early morning)
      • How many of these comments are spam?
      • Which ones would be most relevant to the video?
  • 7. Solution
      • Ranking of the comments according to relevance
      • Steps:
          1. Automatically link the video with other online sources relevant to it
          2. Filter comments to remove noisy comments
          3. Rank the remaining comments according to relevance computed using NLP techniques
      • Our solution works for music videos
  • 8. System architecture
  • 9. Processing pipeline
      comments = fetchYouTubeComments();
      comments = filterComments(comments);
      commentTopics = createCommentTopics(comments);
      resources = getResources(wikipedia, allmusic, lyrics);
      // Score every comment's topics against the external resources.
      for (int i = 0; i < commentTopics.length; i++) {
          computeRelevance(commentTopics[i], resources);
      }
  • 10. Preprocessing
      • Comments are retrieved with the YouTube Data API
          – Only the last 100 comments per video were used
      • Comments not written in English are filtered out using JLangDetect
      • The main topics of each comment are extracted using Mallet => 5 topics per comment
      • The topics are expanded with synonyms and hypernyms from WordNet
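The comment cap and the topic-expansion step can be illustrated with a minimal sketch. It is not the authors' code: the `TopicExpander` class, its methods, and the tiny hand-written lexicon are hypothetical stand-ins for a real WordNet lookup, and the topic words are assumed to have been produced already by the topic model. `lastN` also assumes the comments arrive in chronological order.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper that expands comment topics with related words. */
final class TopicExpander {
    // Stand-in for WordNet: maps a word to its synonyms and hypernyms.
    private final Map<String, List<String>> lexicon;

    TopicExpander(Map<String, List<String>> lexicon) {
        this.lexicon = lexicon;
    }

    /** Returns the lowercased topics followed by their related words, without duplicates. */
    Set<String> expand(List<String> topics) {
        Set<String> expanded = new LinkedHashSet<>();
        for (String topic : topics) {
            String key = topic.toLowerCase(Locale.ROOT);
            expanded.add(key);
            expanded.addAll(lexicon.getOrDefault(key, List.of()));
        }
        return expanded;
    }

    /** Keeps only the last {@code limit} items, mirroring the 100-comment cap. */
    static <T> List<T> lastN(List<T> comments, int limit) {
        int from = Math.max(0, comments.size() - limit);
        return new ArrayList<>(comments.subList(from, comments.size()));
    }

    public static void main(String[] args) {
        TopicExpander expander = new TopicExpander(Map.of(
                "song", List.of("track", "music"),
                "band", List.of("group", "ensemble")));
        System.out.println(expander.expand(List.of("Song", "band", "peace")));
    }
}
```

A `LinkedHashSet` keeps the original topics ahead of their expansions, which makes the result easier to inspect; a real system would query WordNet instead of the in-memory map.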
  • 11. Pre-classification of comments
      • Objective: reduce the number of comments considered for ranking by identifying noise
      • Classification is done by a neural network that uses a set of simple linguistic features
      • Multilayer perceptron implemented in Weka
      • Features
          – Number of non-ASCII characters
          – Number of capital letters
          – Number of newlines
          – Number of digits
          – Number of trivial words and swear words
          – Number of words in the comment
          – Average word size
          – Number of punctuation marks
          – Common text spam count
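Most of the features listed above are simple surface counts. The sketch below shows one way they could be computed; it is an illustration under assumptions, not the authors' extractor. The `CommentFeatures` class, the punctuation set, and the three-word `TRIVIAL_WORDS` list are hypothetical, and the "common text spam count" feature is left out because the slides do not define it.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;

/** Hypothetical extractor for the surface features of a comment. */
final class CommentFeatures {
    // Illustrative word list only; the list used by the authors is not given.
    private static final Set<String> TRIVIAL_WORDS = Set.of("lol", "omg", "wow");

    /**
     * Returns, in order: non-ASCII characters, capital letters, newlines, digits,
     * trivial words, words, average word length, punctuation marks.
     */
    static double[] extract(String comment) {
        int nonAscii = 0, capitals = 0, newlines = 0, digits = 0, punctuation = 0;
        for (int i = 0; i < comment.length(); i++) {
            char c = comment.charAt(i);
            if (c > 127) nonAscii++;
            if (Character.isUpperCase(c)) capitals++;
            if (c == '\n') newlines++;
            if (Character.isDigit(c)) digits++;
            if (",.;:!?'\"()-".indexOf(c) >= 0) punctuation++;
        }
        int words = 0, trivial = 0, letters = 0;
        for (String token : comment.split("\\s+")) {
            // Strip leading and trailing characters that are neither letters nor digits.
            String word = token.replaceAll("^[^\\p{L}\\p{N}]+|[^\\p{L}\\p{N}]+$", "");
            if (word.isEmpty()) continue;
            words++;
            letters += word.length();
            if (TRIVIAL_WORDS.contains(word.toLowerCase(Locale.ROOT))) trivial++;
        }
        double avgWordLength = words == 0 ? 0.0 : (double) letters / words;
        return new double[] {nonAscii, capitals, newlines, digits,
                             trivial, words, avgWordLength, punctuation};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(extract("WOW, check out 2 songs!\nlol")));
    }
}
```

The resulting vector is the kind of fixed-length numeric input a multilayer perceptron expects; converting it into Weka instances is not shown here.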
  • 12. Pre-classification of comments
      • Trained on a small corpus with 100 relevant comments and 100 noisy comments
      • Examples of noisy comments:
          – "Step 1: Pause this video Step 2: Google 'Rainymood' Step 3: Click the first link Step 4: Unpause this video Step 5: Thumbs up this comment, enjoy and thank me later"
          – "Those 3,175 haters listen to 'Techno'."
          – "IF YOU LIKE DIRTY DIANA SONG THE SINGER '' STEFANO GIORGINI '' DID A GREAT REMAKE STEFANO IS A VERY GOOD SINGER SONGWRITER I THINK YOU WILL LIKE HIS VERSION JUST LOOK FOR '' STEFANO GIORGINI '' DIRTY DIANA"
  • 13. Pre-classification of comments
      • Results of pre-classification stage

      Type of instances                  No. instances   %
      Correctly classified instances     174             87.46
      Incorrectly classified instances   26              12.54
      Total number of instances          200             -
  • 14. Relevance scoring stage
      • Initial approach
      • Extract topics from comments as previously mentioned (Mallet + WordNet)
      • Fetch the Wikipedia articles for the artist and the song name
      • The score is computed from the number of appearances of the comment's topics in the articles
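The initial approach amounts to counting occurrences. A minimal sketch follows, assuming whole-token, case-insensitive matching; the slides do not specify the matching rule, and `OccurrenceScorer` and `score` are hypothetical names.

```java
import java.util.List;
import java.util.Locale;

/** Hypothetical scorer for the occurrence-counting approach. */
final class OccurrenceScorer {
    /** Counts how often each topic word occurs as a whole token in the article text. */
    static int score(List<String> topics, String article) {
        // Split on anything that is not a letter or digit, so punctuation is ignored.
        String[] tokens = article.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
        int total = 0;
        for (String topic : topics) {
            String needle = topic.toLowerCase(Locale.ROOT);
            for (String token : tokens) {
                if (token.equals(needle)) total++;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        String article = "The Beatles sang about peace. Peace and love defined the Beatles.";
        System.out.println(score(List.of("beatles", "peace"), article));
    }
}
```

Because the score is a raw count, longer articles and comments with more topics tend to score higher, which is one reason such a measure may need normalisation.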
  • 15. Relevance scoring stage
      • Second approach: topic-based scoring
      • Similar to the previous one, but topics are also extracted from the Wikipedia articles with Mallet
      • Scoring is based on:
          – Number of topics extracted from each comment
          – Wikipedia topic matches for each comment
  • 16. Relevance scoring stage
      • Third approach: multiple-source topic-based scoring
      • Additional sources added to the Wikipedia articles
          – Information from the allmusic.com website on artists and songs
          – Information from song lyrics
      • Topics are matched between comments and the Wikipedia + Allmusic articles, plus exact matches of lyrics
      • The final relevance score is a weighted sum of the previous factors
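The weighted sum of the third approach can be sketched as below. This is only an illustration: the slides give neither the weights nor the exact factors, so the three weights, the `RelevanceScore` class, and the rule that a lyric line counts when it appears verbatim (ignoring case) in the comment are all assumptions.

```java
import java.util.List;
import java.util.Locale;

/** Hypothetical combination of per-source match counts into one relevance score. */
final class RelevanceScore {
    // Placeholder weights; the values used by the authors are not given.
    static final double W_WIKIPEDIA = 1.0;
    static final double W_ALLMUSIC = 1.0;
    static final double W_LYRICS = 2.0;

    /** Weighted sum of the topic matches per source and the exact lyric matches. */
    static double combine(int wikipediaMatches, int allmusicMatches, int lyricMatches) {
        return W_WIKIPEDIA * wikipediaMatches
                + W_ALLMUSIC * allmusicMatches
                + W_LYRICS * lyricMatches;
    }

    /** Counts the non-empty lyric lines that appear verbatim, ignoring case, in the comment. */
    static int exactLyricMatches(String comment, List<String> lyricLines) {
        String haystack = comment.toLowerCase(Locale.ROOT);
        int matches = 0;
        for (String line : lyricLines) {
            String needle = line.trim().toLowerCase(Locale.ROOT);
            if (!needle.isEmpty() && haystack.contains(needle)) matches++;
        }
        return matches;
    }

    public static void main(String[] args) {
        int lyrics = exactLyricMatches("all you need is love, always",
                List.of("All you need is love", "Love is all you need"));
        System.out.println(combine(3, 2, lyrics));
    }
}
```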
  • 17. Results

      Comment | Relevance
      maybe your friend should know that being english, have a picture in abbey road and "sing" all you need is love" won't make one direction a group like the beatles... | 662
      my mom said she doesn't like the beatles and she said that john was only good to look at not to hear. my dad said, " haha so true!." i'm an orphan now. | 968
      you shouldn't be listening to the beatles since these seem to turn your friends into enemies! beatles are all about peace! you are not getting their message! | 983
      please read this ! hey i know u just wanna listen to the song but i still have to write this hoping someone will see it and that someone will care .i'm a young musician from croatia so this spam is my only chance to get noticed.please check out my channel and i promise u won't be sorry.i appreciate your time because music means everything to me, thank you! | 1309
      i didn't mean fight other places. i meant focus on the hurt people in your own country first, then expand to the others. if people don't agree with peace that's an opinion. not a fact, and people often take offense to opinions. there isn't anything to take offense to, they say something that's all it is. they said it, don't put meaning to it. world peace - i meant the whole world having peace there | 1639
  • 18. Results
      • It is difficult to assess the impact of the relevance measure
      • Interpreting the comments is subjective
          – Human annotators are needed
      • The order of the comments is completely different from the one currently presented on YouTube (correlation lower than 0.031 for the first 100 comments)
      • Method 1 is also not correlated with the other two methods
      • Methods 2 and 3 have a higher correlation: 0.124
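The slides do not say which correlation coefficient was used to compare the orderings. As one plausible choice, the sketch below computes Spearman's rank correlation for two tie-free rankings of the same comments, using rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), where d_i is the difference between the two ranks of comment i. The `RankCorrelation` class and its method name are hypothetical.

```java
/** Hypothetical helper for comparing two rankings of the same comments. */
final class RankCorrelation {
    /**
     * Spearman's rho for two rankings without ties.
     * ranksA[i] and ranksB[i] are the ranks (1..n) of comment i in each ordering.
     */
    static double spearman(int[] ranksA, int[] ranksB) {
        if (ranksA.length != ranksB.length || ranksA.length < 2) {
            throw new IllegalArgumentException("need two rankings of equal length >= 2");
        }
        int n = ranksA.length;
        double sumSquaredDiff = 0.0;
        for (int i = 0; i < n; i++) {
            double d = ranksA[i] - ranksB[i];
            sumSquaredDiff += d * d;
        }
        return 1.0 - 6.0 * sumSquaredDiff / (n * ((double) n * n - 1.0));
    }

    public static void main(String[] args) {
        // Two orderings of the same three comments that differ in the top two places.
        System.out.println(spearman(new int[] {1, 2, 3}, new int[] {2, 1, 3}));
    }
}
```

Values near 1 mean the two methods order the comments almost identically, values near 0 mean the orderings are unrelated, and values near -1 mean one ordering is close to the reverse of the other.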
  • 19. Conclusions
      • Two-stage method for ranking comments on YouTube
      • The first stage removes noisy comments
      • The second stage tries to link the comments with information from other web pages relevant to the video
      • Relevance is computed based on topic modeling with Mallet
      • Results are encouraging, but a more rigorous method of assessing them is needed
      • Results are better than the usual results provided by YouTube; however, the processing time for each video should not be neglected
  • 20. Thank you!
      • Questions?
      • Discussion