Author / Scientific Supervisor
University Politehnica of Bucharest
Faculty of Automatic Control and Computers
Computer Science Department
Tweets Topic Modelling Across Different Countries
Ahmed ABDELWAHAB
Jose ROBLES
Costin-Gabriel CHIRU - costin.chiru@cs.pub.ro
Traian REBEDEA
Contents
• The Problem
• Twitter
• The Data
• Topic Modeling
• Results
• Conclusions
24.04.2014 eLSE 2014 2
The Problem
• Are there differences between the topics of interest of people around the world?
• Identify the topics of interest for people from different countries
• Tweets make it possible to identify both the topics of interest and the location of their authors → analysis of English tweets published in different countries
Twitter
• Used intensively by the data mining community:
– an enormous amount of data is available (more than 883 million users, 241 million of them active monthly, and about 500 million tweets sent per day on average [Edwards, 2013])
– news and hot stories spread very quickly on this micro-blogging network
• We extracted tweets containing both text and URLs to external articles, and modelled the topics of the tweet content and of the URL content independently
The Data (1)
• Only 2% of the tweets had a location stamp set
• The number of tweets differs from one country to another, and location-stamped tweets are distributed even less uniformly across countries
The Data (2)
• From these tweets, we kept only those written in English that also contained shared URLs
• Besides English-speaking countries such as the UK, USA, South Africa and Canada, the largest shares of English tweets were found in many European countries (e.g. Latvia, Serbia, Poland, Germany, Ukraine, Netherlands, Italy, France, Portugal, Spain)
• Of the initial 1 million tweets, only 50k met all the conditions
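The selection step above can be sketched as a simple filter. This is a hypothetical sketch, not the authors' code: it assumes tweets are dicts shaped like Twitter API v1.1 tweet objects, with `lang`, `coordinates`, `place` and `entities["urls"]` fields, and it treats a tweet as location-stamped when either `coordinates` or `place` is set.

```python
from typing import Any, Dict, Iterable, List

Tweet = Dict[str, Any]  # assumed: a Twitter API v1.1-style tweet object


def has_location(tweet: Tweet) -> bool:
    # Location-stamped: exact coordinates or a tagged place (assumed fields).
    return bool(tweet.get("coordinates") or tweet.get("place"))


def has_shared_url(tweet: Tweet) -> bool:
    # At least one URL entity attached to the tweet.
    return bool((tweet.get("entities") or {}).get("urls"))


def keep_tweet(tweet: Tweet) -> bool:
    # The three selection conditions: location stamp, English, shared URL.
    return has_location(tweet) and tweet.get("lang") == "en" and has_shared_url(tweet)


def filter_tweets(tweets: Iterable[Tweet]) -> List[Tweet]:
    return [t for t in tweets if keep_tweet(t)]


sample = [
    {"lang": "en", "place": {"country_code": "GB"},
     "entities": {"urls": [{"expanded_url": "http://example.com/a"}]}},
    {"lang": "en", "place": None, "coordinates": None,
     "entities": {"urls": [{"expanded_url": "http://example.com/b"}]}},
    {"lang": "de", "place": {"country_code": "DE"},
     "entities": {"urls": [{"expanded_url": "http://example.com/c"}]}},
]
print(len(filter_tweets(sample)))  # 1: only the English, located tweet with a URL
```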
Topic Modeling
• After extracting the data from the tweets:
– for the URLs, the web page is fetched and its HTML is parsed → the 10 main topics are extracted with Latent Dirichlet Allocation (LDA) [Blei, 2012]
– for the tweet content, we removed the # symbols and then used affinity propagation (AP) to cluster the tweets of each country; the 10 main topics were extracted from the resulting clusters using LDA (LDA was not applied directly because the texts are too short)
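The tweet-side pipeline (clean, cluster with AP, then run LDA on the clusters) can be sketched with the Python standard library alone. This is not the authors' implementation: the Jaccard word-overlap similarity, the median-similarity AP preference, the damping factor, and fitting LDA by collapsed Gibbs sampling with alpha = 0.1 and beta = 0.01 are all illustrative assumptions, and deleting '#' while keeping the hashtag word is one reading of "removed the # symbols". `affinity_propagation` expects at least two items.

```python
import random
import re
from statistics import median
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    # Drop the '#' symbols (keeping the hashtag word) and lowercase.
    return re.findall(r"[a-z']+", text.replace("#", "").lower())


def jaccard(a: List[str], b: List[str]) -> float:
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def affinity_propagation(sim: List[List[float]], damping: float = 0.5,
                         iterations: int = 200) -> List[int]:
    """Frey & Dueck message passing; returns one cluster label per item."""
    n = len(sim)
    s = [row[:] for row in sim]
    pref = median(s[i][k] for i in range(n) for k in range(n) if i != k)
    for i in range(n):
        s[i][i] = pref  # shared preference = median similarity (assumption)
    r = [[0.0] * n for _ in range(n)]
    a = [[0.0] * n for _ in range(n)]
    for _ in range(iterations):
        for i in range(n):  # responsibilities r(i, k)
            vals = [a[i][k] + s[i][k] for k in range(n)]
            best = max(range(n), key=vals.__getitem__)
            second = max(v for k, v in enumerate(vals) if k != best)
            for k in range(n):
                new = s[i][k] - (second if k == best else vals[best])
                r[i][k] = damping * r[i][k] + (1 - damping) * new
        for k in range(n):  # availabilities a(i, k)
            pos = [max(0.0, r[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                new = total if i == k else min(0.0, r[k][k] + total - pos[i])
                a[i][k] = damping * a[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if a[k][k] + r[k][k] > 0]
    if not exemplars:  # no exemplar emerged: fall back to a single cluster
        return [0] * n
    # Each item joins its most similar exemplar; exemplars label themselves.
    return [exemplars.index(i) if i in exemplars
            else max(range(len(exemplars)), key=lambda e: s[i][exemplars[e]])
            for i in range(n)]


def cluster_documents(tweets: List[str]) -> List[List[str]]:
    """Cluster tweets with AP and merge each cluster into one pseudo-document."""
    tokens = [tokenize(t) for t in tweets]
    sim = [[jaccard(x, y) for y in tokens] for x in tokens]
    docs: Dict[int, List[str]] = {}
    for label, toks in zip(affinity_propagation(sim), tokens):
        docs.setdefault(label, []).extend(toks)
    return list(docs.values())


def lda_top_words(docs: List[List[str]], num_topics: int, top_n: int = 10,
                  iterations: int = 200, alpha: float = 0.1,
                  beta: float = 0.01, seed: int = 0) -> List[List[str]]:
    """LDA fitted by collapsed Gibbs sampling; returns each topic's top words."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    idx = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    ndk = [[0] * K for _ in docs]      # topic counts per document
    nkw = [[0] * V for _ in range(K)]  # word counts per topic
    nk = [0] * K                       # total words per topic
    z = [[rng.randrange(K) for _ in d] for d in docs]  # random initial topics

    def update(d: int, k: int, wi: int, delta: int) -> None:
        ndk[d][k] += delta
        nkw[k][wi] += delta
        nk[k] += delta

    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            update(d, k, idx[w], 1)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for pos, w in enumerate(doc):
                wi = idx[w]
                update(d, z[d][pos], wi, -1)  # exclude the current token
                weights = [(ndk[d][j] + alpha) * (nkw[j][wi] + beta)
                           / (nk[j] + V * beta) for j in range(K)]
                z[d][pos] = rng.choices(range(K), weights=weights)[0]
                update(d, z[d][pos], wi, 1)
    return [[vocab[i] for i in sorted(range(V), key=lambda wi: -nkw[k][wi])[:top_n]]
            for k in range(K)]
```

For example, `lda_top_words(cluster_documents(country_tweets), num_topics=10)` would mirror the "10 main topics per country" step, where `country_tweets` is a hypothetical list of one country's tweet texts. The same `lda_top_words` could be applied to the text parsed from the fetched web pages.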
Results (1)
• Top 10 topics for the URL content: activities, business, career, cooking, fun, market, places, social, sports, twitter
• Top 10 topics for the tweet content: city, entertainment, fun, health, movies, places, request, restaurants, romance, travel
• The top 10 words for each topic are presented in the paper
Results (2)
• Correlation between the two distributions (topic $t_j$ for the URL content and topic $t_i$ for the tweets):
$$\mathrm{dissimilarity}(t_i, t_j) = \frac{\sum_{w_k \in t_i} O(w_k, t_j) + \sum_{w_h \in t_j} O(w_h, t_i)}{\lvert t_i \rvert + \lvert t_j \rvert}$$
• where
$$O(w, t) = \begin{cases} \mathrm{rank}(w, t), & \text{if } w \in t \\ \lvert t \rvert + 1, & \text{otherwise} \end{cases}$$
• This way, we took into account both the overlap between the words of the two topics and how representative these words are for their respective topics
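A direct transcription of the dissimilarity score into Python, as a sketch. It assumes each topic is represented by its list of top words ordered from most to least representative, and that rank(w, t) is 1-based (the top word has rank 1); the slides do not state either convention. Lower scores mean more similar topics: with 1-based ranks, two identical topics of n words score (n + 1)/2, while two disjoint ones score n + 1.

```python
from typing import List


def occurrence(word: str, topic: List[str]) -> int:
    """O(w, t): 1-based rank of w in topic t, or |t| + 1 if w is absent."""
    return topic.index(word) + 1 if word in topic else len(topic) + 1


def dissimilarity(t_i: List[str], t_j: List[str]) -> float:
    """Symmetric rank-based dissimilarity between two topics' top-word lists."""
    total = (sum(occurrence(w, t_j) for w in t_i)
             + sum(occurrence(w, t_i) for w in t_j))
    return total / (len(t_i) + len(t_j))


print(dissimilarity(["a", "b", "c"], ["a", "b", "c"]))  # 2.0 (identical)
print(dissimilarity(["a", "b", "c"], ["x", "y", "z"]))  # 4.0 (disjoint)
```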
Results (3)
• To couple the topics, we used a greedy algorithm and obtained the following pairs (tweet topic - URL topic):
– Entertainment - Social
– Places - Activities
– Restaurant - Career
– Request - Sports
– Travel - Market
– City - Twitter
– Health - Places
– Romance - Cooking
– Fun - Fun
– Movies - Business
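The slides do not specify the greedy criterion. One plausible reading, sketched below as an assumption, is to sort all (tweet topic, URL topic) pairs by their dissimilarity and accept each pair whose two topics are both still unpaired. Unlike an optimal assignment (e.g. the Hungarian algorithm), this does not guarantee the minimum total dissimilarity.

```python
from typing import Dict, List, Set, Tuple


def greedy_pairing(cost: Dict[Tuple[str, str], float]) -> List[Tuple[str, str]]:
    """Greedily couple tweet topics with URL topics by ascending dissimilarity.

    cost maps (tweet_topic, url_topic) to a dissimilarity score.
    """
    pairs: List[Tuple[str, str]] = []
    used_tweet: Set[str] = set()
    used_url: Set[str] = set()
    # Visit candidate pairs from most to least similar (ties broken by name).
    for (tweet_topic, url_topic), _ in sorted(cost.items(), key=lambda kv: (kv[1], kv[0])):
        if tweet_topic not in used_tweet and url_topic not in used_url:
            pairs.append((tweet_topic, url_topic))
            used_tweet.add(tweet_topic)
            used_url.add(url_topic)
    return pairs


cost = {("fun", "fun"): 1.0, ("fun", "sports"): 5.0,
        ("request", "fun"): 2.0, ("request", "sports"): 3.0}
print(greedy_pairing(cost))  # [('fun', 'fun'), ('request', 'sports')]
```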
Results (4)
• Country comparison:
– construct a new matrix with entries $N(C_i, \text{tweets}, t_j) \cdot P(C_i, \text{tweets}, t_j)$, where $i$ indexes the countries and $j$ the topics
– for both the URL and the tweet content, select the 5 most representative countries for each topic
– evaluate how similar two topics are with the following formula:
$$\mathrm{eval}(t_i, t_j) = \frac{\lvert \mathrm{Top5Countries}(t_i) \cap \mathrm{Top5Countries}(t_j) \rvert}{5}$$
– eval = 56%
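A sketch of the country comparison. The slides do not define N and P, so the code simply takes them as two country-by-topic tables and multiplies them element-wise (for example, N could be a count of tweets and P a topic probability; this is an assumption). Following the earlier convention, `t_i` is a tweet topic and `t_j` a URL topic. `score_matrix`, `top_countries` and `evaluate` are hypothetical names; ties in the ranking are broken by dictionary insertion order.

```python
from typing import Dict, Set

Scores = Dict[str, Dict[str, float]]  # country -> topic -> score


def score_matrix(n: Scores, p: Scores) -> Scores:
    """Element-wise product N(C_i, ., t_j) * P(C_i, ., t_j)."""
    return {c: {t: n[c][t] * p[c][t] for t in n[c]} for c in n}


def top_countries(scores: Scores, topic: str, k: int = 5) -> Set[str]:
    """The k countries with the highest score for the given topic."""
    ranked = sorted(scores, key=lambda c: scores[c].get(topic, 0.0), reverse=True)
    return set(ranked[:k])


def evaluate(tweet_scores: Scores, url_scores: Scores,
             t_i: str, t_j: str, k: int = 5) -> float:
    """eval(t_i, t_j): number of shared top-k countries divided by k."""
    common = top_countries(tweet_scores, t_i, k) & top_countries(url_scores, t_j, k)
    return len(common) / k
```

If, say, the top 5 countries for a tweet topic and for its paired URL topic share 4 countries, `evaluate` returns 0.8.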
Results (5)
• What different countries are talking about:
– USA: about other tweets 50% of the time, and about blogs and other social networks 10% of the time
– UK: about tweets less than 30% of the time, and about other social networks and blogs roughly 20% of the time
– Canada: about tweets 40% of the time, and about other social media 15% of the time
– South Africa: about tweets 25% of the time, and about other social networks another 20% of the time
• In all the countries, the share of discussion about the social networks and blogs topic equals the share of tweets about sports
Conclusions (1)
• Low matching between the topics discussed in URLs and in tweets (possibly because a tweet does not always describe or summarize the content of the shared URL)
• Analyzing the combined text of the tweets and of the shared web pages showed that the topics generated from tweets and from shared URLs have only a 56% match across different countries
• We expected somewhat similar country distributions for the computed topics, but the degree to which a topic is discussed is strongly influenced by the country → cultural differences between countries are at least partly responsible for this difference
Conclusions (2)
• Results should be interpreted carefully:
– not all countries have a representative number of tweets in our dataset
– the ratio of English tweets to the total number of tweets differs between countries (e.g. Brazil has a very low percentage of English tweets)
– only 2% of Twitter users set a location stamp
– tweet authors often disregard spelling and use words that are not in the English dictionary → parsing problems (some words are ignored)
Q&A
Thank you for your time!

Weitere ähnliche Inhalte

Ähnlich wie Tweets topic modelling across different countries prezentarea

Using Twitter Chats and Podcasts to Mobilize Engagement
Using Twitter Chats and Podcasts to Mobilize EngagementUsing Twitter Chats and Podcasts to Mobilize Engagement
Using Twitter Chats and Podcasts to Mobilize EngagementRPO America
 
New Approaches to Large-Scale Social Media Analytics: Investigating Twitter i...
New Approaches to Large-Scale Social Media Analytics: Investigating Twitter i...New Approaches to Large-Scale Social Media Analytics: Investigating Twitter i...
New Approaches to Large-Scale Social Media Analytics: Investigating Twitter i...Axel Bruns
 
WEEK 3: Class PowerPoint
WEEK 3: Class PowerPointWEEK 3: Class PowerPoint
WEEK 3: Class PowerPointSocialMediaUCLA
 
Twitter Is the Megaphone of Cross-platform Messaging on the White Helmets
 Twitter Is the Megaphone of Cross-platform Messaging on the White Helmets Twitter Is the Megaphone of Cross-platform Messaging on the White Helmets
Twitter Is the Megaphone of Cross-platform Messaging on the White HelmetsSameera Horawalavithana
 
IWMW 2003: Introduction
IWMW 2003: IntroductionIWMW 2003: Introduction
IWMW 2003: IntroductionIWMW
 
The Conversation, Ten Years On: Patterns of Engagement with The Conversation ...
The Conversation, Ten Years On: Patterns of Engagement with The Conversation ...The Conversation, Ten Years On: Patterns of Engagement with The Conversation ...
The Conversation, Ten Years On: Patterns of Engagement with The Conversation ...Axel Bruns
 
Social Media in Australia: The Case of Twitter
Social Media in Australia: The Case of TwitterSocial Media in Australia: The Case of Twitter
Social Media in Australia: The Case of TwitterAxel Bruns
 
Seda 10 dot nov 2014
Seda 10 dot nov 2014Seda 10 dot nov 2014
Seda 10 dot nov 2014Helen Webster
 
Social Media Analytics Research at the QUT Digital Media Research Centre
Social Media Analytics Research at the QUT Digital Media Research CentreSocial Media Analytics Research at the QUT Digital Media Research Centre
Social Media Analytics Research at the QUT Digital Media Research CentreAxel Bruns
 
11 2-15 twitter and social media
11 2-15 twitter and social media11 2-15 twitter and social media
11 2-15 twitter and social mediaSung Woo Yoo
 
04 05-16 social media and news
04 05-16 social media and news04 05-16 social media and news
04 05-16 social media and newsSung Woo Yoo
 
Key Events in Australian (Micro-)Blogging during 2010
Key Events in Australian (Micro-)Blogging during 2010Key Events in Australian (Micro-)Blogging during 2010
Key Events in Australian (Micro-)Blogging during 2010Axel Bruns
 
#mytweet via Instagram: Exploring User Behaviour Across Multiple Social Networks
#mytweet via Instagram: Exploring User Behaviour Across Multiple Social Networks#mytweet via Instagram: Exploring User Behaviour Across Multiple Social Networks
#mytweet via Instagram: Exploring User Behaviour Across Multiple Social NetworksBang Hui Lim
 
Presentation&workshop resultscb july 12 final
Presentation&workshop resultscb july 12 finalPresentation&workshop resultscb july 12 final
Presentation&workshop resultscb july 12 finalCrowdbrite
 
Online Forums vs. Social Networks: Two Case Studies to support eGovernment wi...
Online Forums vs. Social Networks: Two Case Studies to support eGovernment wi...Online Forums vs. Social Networks: Two Case Studies to support eGovernment wi...
Online Forums vs. Social Networks: Two Case Studies to support eGovernment wi...Timo Wandhoefer
 
Open Data: Analysis and Visualisation
Open Data: Analysis and VisualisationOpen Data: Analysis and Visualisation
Open Data: Analysis and VisualisationDr Muhammad Adnan
 
Introduction to software that can be used to capture and analyse Twitter data
Introduction to software that can be used to capture and analyse Twitter dataIntroduction to software that can be used to capture and analyse Twitter data
Introduction to software that can be used to capture and analyse Twitter dataDr Wasim Ahmed
 

Ähnlich wie Tweets topic modelling across different countries prezentarea (20)

Swdm15
Swdm15Swdm15
Swdm15
 
Using Twitter Chats and Podcasts to Mobilize Engagement
Using Twitter Chats and Podcasts to Mobilize EngagementUsing Twitter Chats and Podcasts to Mobilize Engagement
Using Twitter Chats and Podcasts to Mobilize Engagement
 
New Approaches to Large-Scale Social Media Analytics: Investigating Twitter i...
New Approaches to Large-Scale Social Media Analytics: Investigating Twitter i...New Approaches to Large-Scale Social Media Analytics: Investigating Twitter i...
New Approaches to Large-Scale Social Media Analytics: Investigating Twitter i...
 
WEEK 3: Class PowerPoint
WEEK 3: Class PowerPointWEEK 3: Class PowerPoint
WEEK 3: Class PowerPoint
 
Twitter Is the Megaphone of Cross-platform Messaging on the White Helmets
 Twitter Is the Megaphone of Cross-platform Messaging on the White Helmets Twitter Is the Megaphone of Cross-platform Messaging on the White Helmets
Twitter Is the Megaphone of Cross-platform Messaging on the White Helmets
 
IWMW 2003: Introduction
IWMW 2003: IntroductionIWMW 2003: Introduction
IWMW 2003: Introduction
 
Reference Rot and E-Theses: Threat and Remedy
Reference Rot and E-Theses: Threat and RemedyReference Rot and E-Theses: Threat and Remedy
Reference Rot and E-Theses: Threat and Remedy
 
The Conversation, Ten Years On: Patterns of Engagement with The Conversation ...
The Conversation, Ten Years On: Patterns of Engagement with The Conversation ...The Conversation, Ten Years On: Patterns of Engagement with The Conversation ...
The Conversation, Ten Years On: Patterns of Engagement with The Conversation ...
 
Social Media in Australia: The Case of Twitter
Social Media in Australia: The Case of TwitterSocial Media in Australia: The Case of Twitter
Social Media in Australia: The Case of Twitter
 
Seda 10 dot nov 2014
Seda 10 dot nov 2014Seda 10 dot nov 2014
Seda 10 dot nov 2014
 
Social Media Analytics Research at the QUT Digital Media Research Centre
Social Media Analytics Research at the QUT Digital Media Research CentreSocial Media Analytics Research at the QUT Digital Media Research Centre
Social Media Analytics Research at the QUT Digital Media Research Centre
 
11 2-15 twitter and social media
11 2-15 twitter and social media11 2-15 twitter and social media
11 2-15 twitter and social media
 
04 05-16 social media and news
04 05-16 social media and news04 05-16 social media and news
04 05-16 social media and news
 
Key Events in Australian (Micro-)Blogging during 2010
Key Events in Australian (Micro-)Blogging during 2010Key Events in Australian (Micro-)Blogging during 2010
Key Events in Australian (Micro-)Blogging during 2010
 
#mytweet via Instagram: Exploring User Behaviour Across Multiple Social Networks
#mytweet via Instagram: Exploring User Behaviour Across Multiple Social Networks#mytweet via Instagram: Exploring User Behaviour Across Multiple Social Networks
#mytweet via Instagram: Exploring User Behaviour Across Multiple Social Networks
 
Reference Rot: Threat and Remedy
Reference Rot: Threat and RemedyReference Rot: Threat and Remedy
Reference Rot: Threat and Remedy
 
Presentation&workshop resultscb july 12 final
Presentation&workshop resultscb july 12 finalPresentation&workshop resultscb july 12 final
Presentation&workshop resultscb july 12 final
 
Online Forums vs. Social Networks: Two Case Studies to support eGovernment wi...
Online Forums vs. Social Networks: Two Case Studies to support eGovernment wi...Online Forums vs. Social Networks: Two Case Studies to support eGovernment wi...
Online Forums vs. Social Networks: Two Case Studies to support eGovernment wi...
 
Open Data: Analysis and Visualisation
Open Data: Analysis and VisualisationOpen Data: Analysis and Visualisation
Open Data: Analysis and Visualisation
 
Introduction to software that can be used to capture and analyse Twitter data
Introduction to software that can be used to capture and analyse Twitter dataIntroduction to software that can be used to capture and analyse Twitter data
Introduction to software that can be used to capture and analyse Twitter data
 

Mehr von University Politehnica Bucharest

PhD Thesis - Influence of Repetitions on Discourse and Semantic Analysis
PhD Thesis - Influence of Repetitions on Discourse and Semantic AnalysisPhD Thesis - Influence of Repetitions on Discourse and Semantic Analysis
PhD Thesis - Influence of Repetitions on Discourse and Semantic AnalysisUniversity Politehnica Bucharest
 
Identification and Classification of the Most Important Moments in Students’ ...
Identification and Classification of the Most Important Moments in Students’ ...Identification and Classification of the Most Important Moments in Students’ ...
Identification and Classification of the Most Important Moments in Students’ ...University Politehnica Bucharest
 
Digital Services Development Using Statistics Tools to Emphasize Pollution Ph...
Digital Services Development Using Statistics Tools to Emphasize Pollution Ph...Digital Services Development Using Statistics Tools to Emphasize Pollution Ph...
Digital Services Development Using Statistics Tools to Emphasize Pollution Ph...University Politehnica Bucharest
 
Determine the time period when a text was written using time series analysis
Determine the time period when a text was written using time series analysisDetermine the time period when a text was written using time series analysis
Determine the time period when a text was written using time series analysisUniversity Politehnica Bucharest
 
Using machine learning to generate predictions based on the information extra...
Using machine learning to generate predictions based on the information extra...Using machine learning to generate predictions based on the information extra...
Using machine learning to generate predictions based on the information extra...University Politehnica Bucharest
 
Hearthstone helper using optical character recognition techniques for cards d...
Hearthstone helper using optical character recognition techniques for cards d...Hearthstone helper using optical character recognition techniques for cards d...
Hearthstone helper using optical character recognition techniques for cards d...University Politehnica Bucharest
 
Movie recommender system using the user's psychological profile
Movie recommender system using the user's psychological profileMovie recommender system using the user's psychological profile
Movie recommender system using the user's psychological profileUniversity Politehnica Bucharest
 
Tracing the paths between concepts in large bio medical corpora
Tracing the paths between concepts in large bio medical corporaTracing the paths between concepts in large bio medical corpora
Tracing the paths between concepts in large bio medical corporaUniversity Politehnica Bucharest
 
The collection and analysis of public data - Bucharest case study
The collection and analysis of public data - Bucharest case studyThe collection and analysis of public data - Bucharest case study
The collection and analysis of public data - Bucharest case studyUniversity Politehnica Bucharest
 
Unsupervised system for automatic grading of bachelor and master thesis
Unsupervised system for automatic grading of bachelor and master thesisUnsupervised system for automatic grading of bachelor and master thesis
Unsupervised system for automatic grading of bachelor and master thesisUniversity Politehnica Bucharest
 
Nlp based heuristics for assessing participants in cscl chats
Nlp based heuristics for assessing participants in cscl chatsNlp based heuristics for assessing participants in cscl chats
Nlp based heuristics for assessing participants in cscl chatsUniversity Politehnica Bucharest
 
2012 Presidential Elections on Twitter - An Analysis of How the US and French...
2012 Presidential Elections on Twitter - An Analysis of How the US and French...2012 Presidential Elections on Twitter - An Analysis of How the US and French...
2012 Presidential Elections on Twitter - An Analysis of How the US and French...University Politehnica Bucharest
 

Mehr von University Politehnica Bucharest (20)

PhD Thesis - Influence of Repetitions on Discourse and Semantic Analysis
PhD Thesis - Influence of Repetitions on Discourse and Semantic AnalysisPhD Thesis - Influence of Repetitions on Discourse and Semantic Analysis
PhD Thesis - Influence of Repetitions on Discourse and Semantic Analysis
 
Time series analysis for sales prediction
Time series analysis for sales predictionTime series analysis for sales prediction
Time series analysis for sales prediction
 
Identification and Classification of the Most Important Moments in Students’ ...
Identification and Classification of the Most Important Moments in Students’ ...Identification and Classification of the Most Important Moments in Students’ ...
Identification and Classification of the Most Important Moments in Students’ ...
 
Digital Services Development Using Statistics Tools to Emphasize Pollution Ph...
Digital Services Development Using Statistics Tools to Emphasize Pollution Ph...Digital Services Development Using Statistics Tools to Emphasize Pollution Ph...
Digital Services Development Using Statistics Tools to Emphasize Pollution Ph...
 
Identifying cyclic words with the help of google
Identifying cyclic words with the help of googleIdentifying cyclic words with the help of google
Identifying cyclic words with the help of google
 
Expression of Political Opinions in Press
Expression of Political Opinions in PressExpression of Political Opinions in Press
Expression of Political Opinions in Press
 
Determine the time period when a text was written using time series analysis
Determine the time period when a text was written using time series analysisDetermine the time period when a text was written using time series analysis
Determine the time period when a text was written using time series analysis
 
Using machine learning to generate predictions based on the information extra...
Using machine learning to generate predictions based on the information extra...Using machine learning to generate predictions based on the information extra...
Using machine learning to generate predictions based on the information extra...
 
Hearthstone helper using optical character recognition techniques for cards d...
Hearthstone helper using optical character recognition techniques for cards d...Hearthstone helper using optical character recognition techniques for cards d...
Hearthstone helper using optical character recognition techniques for cards d...
 
Movie recommender system using the user's psychological profile
Movie recommender system using the user's psychological profileMovie recommender system using the user's psychological profile
Movie recommender system using the user's psychological profile
 
Tracing the paths between concepts in large bio medical corpora
Tracing the paths between concepts in large bio medical corporaTracing the paths between concepts in large bio medical corpora
Tracing the paths between concepts in large bio medical corpora
 
The collection and analysis of public data - Bucharest case study
The collection and analysis of public data - Bucharest case studyThe collection and analysis of public data - Bucharest case study
The collection and analysis of public data - Bucharest case study
 
Archaisms and neologisms identification in texts
Archaisms and neologisms identification in textsArchaisms and neologisms identification in texts
Archaisms and neologisms identification in texts
 
Unsupervised system for automatic grading of bachelor and master thesis
Unsupervised system for automatic grading of bachelor and master thesisUnsupervised system for automatic grading of bachelor and master thesis
Unsupervised system for automatic grading of bachelor and master thesis
 
Sentiment based text segmentation
Sentiment based text segmentationSentiment based text segmentation
Sentiment based text segmentation
 
Creativity detection in texts
Creativity detection in textsCreativity detection in texts
Creativity detection in texts
 
Nlp based heuristics for assessing participants in cscl chats
Nlp based heuristics for assessing participants in cscl chatsNlp based heuristics for assessing participants in cscl chats
Nlp based heuristics for assessing participants in cscl chats
 
Detecting discourse creativity in chat conversations
Detecting discourse creativity in chat conversationsDetecting discourse creativity in chat conversations
Detecting discourse creativity in chat conversations
 
Metaphor detection
Metaphor detectionMetaphor detection
Metaphor detection
 
2012 Presidential Elections on Twitter - An Analysis of How the US and French...
2012 Presidential Elections on Twitter - An Analysis of How the US and French...2012 Presidential Elections on Twitter - An Analysis of How the US and French...
2012 Presidential Elections on Twitter - An Analysis of How the US and French...
 

Kürzlich hochgeladen

Forest laws, Indian forest laws, why they are important
Forest laws, Indian forest laws, why they are importantForest laws, Indian forest laws, why they are important
Forest laws, Indian forest laws, why they are importantadityabhardwaj282
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxkessiyaTpeter
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxpriyankatabhane
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trssuser06f238
 
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...lizamodels9
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfSELF-EXPLANATORY
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxSwapnil Therkar
 
Gas_Laws_powerpoint_notes.ppt for grade 10
Gas_Laws_powerpoint_notes.ppt for grade 10Gas_Laws_powerpoint_notes.ppt for grade 10
Gas_Laws_powerpoint_notes.ppt for grade 10ROLANARIBATO3
 
Transposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptTransposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptArshadWarsi13
 
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptxRESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptxFarihaAbdulRasheed
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PPRINCE C P
 
zoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzohaibmir069
 
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxTHE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxNandakishor Bhaurao Deshmukh
 
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRCall Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRlizamodels9
 
Twin's paradox experiment is a meassurement of the extra dimensions.pptx
Twin's paradox experiment is a meassurement of the extra dimensions.pptxTwin's paradox experiment is a meassurement of the extra dimensions.pptx
Twin's paradox experiment is a meassurement of the extra dimensions.pptxEran Akiva Sinbar
 
Solution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsSolution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsHajira Mahmood
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Nistarini College, Purulia (W.B) India
 

Kürzlich hochgeladen (20)

Engler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomyEngler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomy
 
Forest laws, Indian forest laws, why they are important
Forest laws, Indian forest laws, why they are importantForest laws, Indian forest laws, why they are important
Forest laws, Indian forest laws, why they are important
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptx
 
Neurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 trNeurodevelopmental disorders according to the dsm 5 tr
Neurodevelopmental disorders according to the dsm 5 tr
 
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
 
Gas_Laws_powerpoint_notes.ppt for grade 10
Gas_Laws_powerpoint_notes.ppt for grade 10Gas_Laws_powerpoint_notes.ppt for grade 10
Gas_Laws_powerpoint_notes.ppt for grade 10
 
Transposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptTransposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.ppt
 
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptxRESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C P
 
zoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistan
 
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxTHE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
 
Volatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -IVolatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -I
 
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRCall Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
 
Twin's paradox experiment is a meassurement of the extra dimensions.pptx
Twin's paradox experiment is a meassurement of the extra dimensions.pptxTwin's paradox experiment is a meassurement of the extra dimensions.pptx
Twin's paradox experiment is a meassurement of the extra dimensions.pptx
 
Solution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsSolution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutions
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...
 

Tweets topic modelling across different countries prezentarea

  • 7. Topic Modeling
    • After extracting the data from the tweets:
      – For the URLs, each webpage was fetched and its HTML parsed; the 10 main topics were then extracted using Latent Dirichlet Allocation (LDA) [Blei, 2012]
      – For the tweet content, we removed the # symbols and then used affinity propagation (AP) to cluster the tweets of each country. The 10 main topics were extracted from the resulting clusters using LDA (the tweets were too short to apply LDA to them directly)
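A minimal pure-Python sketch of the tweet-side pipeline on this slide: strip the "#" symbols, cluster one country's tweets with affinity propagation, and merge each cluster into one pseudo-document long enough for LDA. Several choices here are assumptions, not taken from the slides: tokenization by a simple regular expression, Jaccard overlap of token sets as the tweet similarity, the median similarity as the AP preference, damping 0.5, and concatenation as the way clusters become documents. The function names are hypothetical.

```python
import re
import statistics


def tokenize(tweet):
    """Drop '#' symbols (keeping the hashtag word), lowercase, split into word tokens."""
    return re.findall(r"[a-z0-9']+", tweet.replace("#", "").lower())


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def affinity_propagation(sim, damping=0.5, iterations=200):
    """Return a cluster label per item using responsibility/availability message passing."""
    n = len(sim)
    if n <= 1:
        return [0] * n
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        for i in range(n):
            vals = [A[i][k] + sim[i][k] for k in range(n)]
            best = max(range(n), key=vals.__getitem__)
            first = vals[best]
            second = max(v for k, v in enumerate(vals) if k != best)
            for k in range(n):
                # r(i, k) = s(i, k) - max over k' != k of [a(i, k') + s(i, k')]
                new = sim[i][k] - (second if k == best else first)
                R[i][k] = damping * R[i][k] + (1 - damping) * new
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]  # sum of max(0, r(i', k)) over i' != k
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if A[k][k] + R[k][k] > 0]
    if not exemplars:  # degenerate case: fall back to a single cluster
        exemplars = [max(range(n), key=lambda k: A[k][k] + R[k][k])]
    labels = []
    for i in range(n):
        # Exemplars label themselves; other items join their most similar exemplar.
        e = i if i in exemplars else max(exemplars, key=lambda x: sim[i][x])
        labels.append(exemplars.index(e))
    return labels


def cluster_pseudo_documents(tweets):
    """Cluster one country's tweets; return (one token list per cluster, labels)."""
    token_sets = [set(tokenize(t)) for t in tweets]
    n = len(tweets)
    sim = [[jaccard(token_sets[i], token_sets[k]) for k in range(n)] for i in range(n)]
    off_diag = [sim[i][k] for i in range(n) for k in range(n) if i != k]
    pref = statistics.median(off_diag) if off_diag else 0.0
    for k in range(n):
        sim[k][k] = pref  # preference: median similarity, a common default
    labels = affinity_propagation(sim)
    docs = {}
    for tweet, label in zip(tweets, labels):
        docs.setdefault(label, []).extend(tokenize(tweet))
    return [docs[label] for label in sorted(docs)], labels
```

On a handful of toy tweets about football and about travel, this should produce two clusters and therefore two pseudo-documents. In practice a library implementation (for example scikit-learn's `AffinityPropagation`) would replace the hand-written loop; it is spelled out here only to show the two message updates.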
  • 8. Results (1)
    • Top 10 topics for the URL content: activities, business, career, cooking, fun, market, places, social, sports, twitter
    • Top 10 topics for the tweets' content: city, entertainment, fun, health, movies, places, request, restaurants, romance, travel
    • The top 10 words for each topic are presented in the paper
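The topic labels and their top words come from LDA. The sketch below is a textbook collapsed Gibbs sampler for LDA, not necessarily the implementation the authors used; the hyperparameters `alpha` and `beta`, the iteration count, the seed and the function names are illustrative assumptions. `top_words` reads each topic's most frequent words from the topic-word counts, which is the usual way "top 10 words per topic" lists are produced.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=500, seed=0):
    """Fit LDA with collapsed Gibbs sampling; docs is a list of token lists."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    topic_word = [defaultdict(int) for _ in range(num_topics)]  # n(k, w)
    topic_total = [0] * num_topics                             # n(k)
    doc_topic = [[0] * num_topics for _ in docs]               # n(d, k)
    z = []                                                     # topic of each token
    for d, doc in enumerate(docs):
        z.append([rng.randrange(num_topics) for _ in doc])  # random initial topics
        for w, k in zip(doc, z[d]):
            topic_word[k][w] += 1
            topic_total[k] += 1
            doc_topic[d][k] += 1
    topics = range(num_topics)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for pos, w in enumerate(doc):
                k = z[d][pos]
                # Remove the token's current assignment from all counts.
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                doc_topic[d][k] -= 1
                # Unnormalized full conditional p(z = j | all other assignments).
                weights = [(doc_topic[d][j] + alpha) * (topic_word[j][w] + beta)
                           / (topic_total[j] + vocab_size * beta) for j in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][pos] = k
                topic_word[k][w] += 1
                topic_total[k] += 1
                doc_topic[d][k] += 1
    return topic_word, doc_topic


def top_words(topic_word, n=10):
    """Return the n highest-count words of each topic (ties broken alphabetically)."""
    result = []
    for counts in topic_word:
        ranked = sorted((item for item in counts.items() if item[1] > 0),
                        key=lambda item: (-item[1], item[0]))
        result.append([w for w, _ in ranked[:n]])
    return result
```

With `docs` set to one country's pseudo-documents (tweets) or to the parsed webpages (URLs) and `num_topics=10`, `top_words(topic_word)` would return ten word lists of the kind the slide refers to; naming each topic (e.g. "sports", "travel") remains a manual step.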
  • 9. Results (2)
    • To compare the two topic distributions (topic t_j for the URL content and topic t_i for the tweets), we used the following dissimilarity measure:
      $$\mathrm{dissimilarity}(t_i, t_j) = \frac{\sum_{w_k \in t_i} O(w_k, t_j) + \sum_{w_h \in t_j} O(w_h, t_i)}{|t_i| + |t_j|}$$
      where
      $$O(w, t) = \begin{cases} \mathrm{rank}(w, t), & \text{if } w \in t \\ |t| + 1, & \text{otherwise} \end{cases}$$
    • This way, we considered both the intersection of the words of the two topics and how representative these words were for the corresponding topics
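The dissimilarity on this slide can be computed directly from two ranked top-word lists: for every word of t_i, add its rank in t_j (or |t_j| + 1 if it is missing), do the same for the words of t_j against t_i, and divide by |t_i| + |t_j|. This sketch assumes that rank(w, t) is the 1-based position of w in the ranked word list of t; the function names are hypothetical.

```python
def rank_or_penalty(word, topic):
    """O(w, t): 1-based rank of `word` in the ranked list `topic`, or |t| + 1 if absent."""
    return topic.index(word) + 1 if word in topic else len(topic) + 1


def dissimilarity(t_i, t_j):
    """Rank-based dissimilarity between two ranked word lists; lower means more alike."""
    total = (sum(rank_or_penalty(w, t_j) for w in t_i)
             + sum(rank_or_penalty(w, t_i) for w in t_j))
    return total / (len(t_i) + len(t_j))
```

For two topics of length n, identical word lists give (n + 1) / 2, the smallest possible value, and disjoint lists give n + 1, the largest. A shared word that is ranked high in both topics therefore lowers the score more than a shared word buried near the bottom, which is the "how representative" part of the measure.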
  • 10. Results (3)
    • To couple the topics, we used a greedy algorithm and obtained the following pairs (tweet topic - URL topic):
      – Entertainment - Social
      – Places - Activities
      – Restaurants - Career
      – Request - Sports
      – Travel - Market
      – City - Twitter
      – Health - Places
      – Romance - Cooking
      – Fun - Fun
      – Movies - Business
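The slides do not specify the greedy rule. One plausible reading, sketched here purely as an assumption, is to sort all (tweet topic, URL topic) pairs by dissimilarity and repeatedly accept the cheapest pair whose two topics are both still unmatched, which yields a one-to-one coupling like the ten pairs listed. The function name and the toy dissimilarity values are made up for illustration.

```python
def greedy_coupling(dis):
    """Pair tweet topics with URL topics, cheapest dissimilarity first.

    `dis` maps (tweet_topic, url_topic) -> dissimilarity; every topic is used at most once.
    """
    pairs, used_tweet, used_url = [], set(), set()
    for (tweet_topic, url_topic), _ in sorted(dis.items(), key=lambda item: (item[1], item[0])):
        if tweet_topic not in used_tweet and url_topic not in used_url:
            pairs.append((tweet_topic, url_topic))
            used_tweet.add(tweet_topic)
            used_url.add(url_topic)
    return pairs


# Made-up dissimilarities for two tweet topics and two URL topics.
toy = {("travel", "market"): 1.2, ("travel", "sports"): 2.0,
       ("request", "market"): 1.5, ("request", "sports"): 1.8}
print(greedy_coupling(toy))  # [('travel', 'market'), ('request', 'sports')]
```

A greedy pass is simple but not guaranteed to minimize the total dissimilarity of the coupling; the Hungarian algorithm would give the optimal one-to-one assignment if that were the goal.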
  • 11. Results (4)
    • Country comparison:
      – We constructed a new matrix with entries $N(C_i, \text{tweets}, t_j) \cdot P(C_i, \text{tweets}, t_j)$, where i indexes the countries and j the topics
      – For both the URL and the tweets' content, we selected the 5 most representative countries for each topic
      – We used the following formula to evaluate how similar the coupled topics are:
        $$\mathrm{eval}(t_i, t_j) = \frac{|\mathrm{Top5Countries}(t_i) \cap \mathrm{Top5Countries}(t_j)|}{5}$$
      – eval = 56%
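A sketch of the country comparison, assuming the N·P products have already been computed and are stored in dictionaries keyed by (country, topic), one for the tweet topics and one for the URL topics. eval(t_i, t_j) is the fraction of the 5 top countries that the two coupled topics share. The slide reports a single value of 56% without saying how the per-pair values were aggregated; averaging over the coupled pairs is an assumption here, as are the function names and the toy scores.

```python
def top_countries(scores, topic, k=5):
    """The k countries with the largest score for `topic`; scores maps (country, topic) -> N*P."""
    countries = [c for (c, t) in scores if t == topic]
    countries.sort(key=lambda c: (-scores[(c, topic)], c))  # highest score first
    return set(countries[:k])


def eval_pair(tweet_scores, url_scores, tweet_topic, url_topic, k=5):
    """eval(t_i, t_j) = |TopK(t_i) & TopK(t_j)| / k, with k = 5 on the slide."""
    shared = top_countries(tweet_scores, tweet_topic, k) & top_countries(url_scores, url_topic, k)
    return len(shared) / k


def mean_eval(tweet_scores, url_scores, pairs, k=5):
    """Average eval over coupled (tweet topic, URL topic) pairs (aggregation is assumed)."""
    values = [eval_pair(tweet_scores, url_scores, tt, ut, k) for tt, ut in pairs]
    return sum(values) / len(values) if values else 0.0


# Toy scores for six hypothetical countries: the two rankings are reversed.
countries = ["A", "B", "C", "D", "E", "F"]
tweet_scores = {(c, "travel"): 6 - i for i, c in enumerate(countries)}
url_scores = {(c, "market"): i + 1 for i, c in enumerate(countries)}
print(eval_pair(tweet_scores, url_scores, "travel", "market"))  # 0.8
```

Even with fully reversed rankings the two top-5 sets still share four of six countries, so eval is only informative when each topic has scores for many more countries than 5.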
  • 12. Results (5)
    • What different countries are talking about:
      – USA: other tweets 50% of the time; blogs and other social networks 10% of the time
      – UK: tweets less than 30% of the time; other social networks and blogs about 20% of the time
      – Canada: tweets 40% of the time; other social media 15% of the time
      – South Africa: tweets 25% of the time; other social networks another 20% of the time
    • In all the countries, the share of discussion on the social and blogs topic equals the share of tweets about sports
  • 13. Conclusions (1)
    • Low matching between the topics discussed in the URLs and in the tweets (possibly because a tweet does not always describe or summarize the content of the shared URL)
    • Analyzing the text of the tweets and of the shared webpages showed that the topics generated from the tweets and from the shared URLs match only 56% across different countries
    • We expected somewhat similar country distributions for the computed topics, but the degree to which a topic is discussed is strongly influenced by the country; the cultural differences between countries are at least partly responsible for this difference
  • 14. Conclusions (2)
    • The results should be interpreted carefully:
      – not all countries have a representative number of tweets in our dataset
      – the ratio of English tweets to the total number of tweets varies by country (e.g. Brazil has a very low percentage of English tweets)
      – only 2% of Twitter users set a location stamp
      – tweet authors often disregard spelling and use words that are not in the English dictionary, which causes parsing problems (some words are ignored)
  • 15. Q&A: Thank you for your time!

Notes

  1. Edwards, J. 2013. Twitter's 'Dark Pool': IPO Doesn't Mention 651 Million Users Who Abandoned Twitter. Business Insider [online, 6.11.2013]. http://www.businessinsider.com/twitter-total-registered-users-v-monthly-active-users-2013-11
  2. Having English as their official language
  3. Blei, D. M. 2012. Probabilistic topic models. Communications of the ACM, 55(4): 77-84.