Semantic Modeling
Computational Framework for Generating Visual Summaries of Topical Clusters in Twitter Streams*
Authors: Miray Kas, Bongwon Suh
Presenter: Sebastian Alfers - HTW Berlin
* http://link.springer.com/chapter/10.1007%2F978-3-319-02993-1_9
Visual Summaries of Twitter Streams 
http://flowingdata.com/wp-content/uploads/2010/02/treemap-revised1.gif 
http://www.infobarrel.com/media/image/54054.jpg
Step 1: get & pre-process data
Step 2: construct graph & clustering
Step 3: extract keywords & summarize
Pipeline: Keywords → Stream Tweets → Preprocessing/Cleaning → Construct Graph → Clustering → Select Relevant Clusters → Extract Topical Keywords → Visual Cluster Summary
Input: Keywords 
• initial set of Keywords 
• similar to Twitter Search 
Step 1: Stream Tweets 
• HTTP-based API
- JSON, REST
• OAuth + HTTP
• here: Java library with Scala and Play Framework
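As a rough illustration of consuming such a stream, the sketch below filters JSON-encoded tweets by the input keywords. It is not the presenter's Scala/Play code and does no networking or OAuth: it assumes one JSON object per line with a "text" field, and the function name `matching_tweets` is hypothetical.

```python
import json


def matching_tweets(lines, keywords):
    """Yield the text of each JSON-encoded tweet that mentions any keyword.

    Assumption: every non-empty line is one JSON object with a "text" field.
    """
    wanted = [k.lower() for k in keywords]
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines between messages
            continue
        tweet = json.loads(line)
        text = tweet.get("text", "")
        if any(k in text.lower() for k in wanted):
            yield text


# Hypothetical sample input standing in for the HTTP response body.
sample_lines = [
    '{"text": "Reactive Streams in Scala"}',
    "",
    '{"text": "Lunch time"}',
    '{"id": 1}',
]
topical = list(matching_tweets(sample_lines, ["scala"]))
```

In a real client the `lines` argument would be the line iterator of the authenticated HTTP response rather than a list.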
Step 1: Preprocessing 
• transform Tweets 
- easy-to-analyze / clean format
• Process of cleaning:
1. lowercase
2. remove URLs, user mentions and stop words
• like @user, "a" or "123"
3. remove special characters (#,.)
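The cleaning steps above could be sketched as follows. This is a minimal illustration, not the presenter's implementation: the stop-word list is a tiny hypothetical sample, and keeping hyphens (so that "akka-http" stays one token) is an assumption.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real one is much larger.
STOP_WORDS = {"a", "an", "the", "is", "of", "to", "and", "with", "for", "in", "on", "at"}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
# Everything except lowercase letters, digits, whitespace and hyphens.
SPECIAL_RE = re.compile(r"[^a-z0-9\s-]")


def clean_tweet(text):
    """Return the cleaned tokens of one tweet."""
    text = text.lower()                # 1. lowercase
    text = URL_RE.sub(" ", text)       # 2. remove URLs ...
    text = MENTION_RE.sub(" ", text)   #    ... and user mentions
    text = SPECIAL_RE.sub(" ", text)   # 3. remove special characters (#, ., ...)
    tokens = [t.strip("-") for t in text.split()]
    # 2. (continued) drop stop words and purely numeric tokens such as "123"
    return [t for t in tokens if t and t not in STOP_WORDS and not t.isdigit()]


cleaned = clean_tweet(
    "Akka-HTTP based Reactive Streams with #Scala at @scaladays 123 http://t.co/abc"
)
```

URLs and mentions are removed before the special characters, because stripping "@", ":" or "/" first would leave fragments of them behind as ordinary words.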
Step 1: Preprocessing 
• Example Keywords: 
- SCALA
- Scala
- scala
- #scala
→ all normalized to: scala
• LingPipe library*
- remove tense and plurals
*http://alias-i.com/lingpipe/
Step 1: Preprocessing 
• Example Tweets 
- new york time reactive programming tool scala scale techrepublic
- akka-http based reactive stream scala scaladay
Step 2: Graph 
• Word Co-Occurrence Graph 
- Word = Node (Unigrams) 
- Tweet = Link between Nodes 
• Example 
akka-http based stream reactive scala scaladay 
[Figure: the words of the example tweet (based, akka-http, reactive, stream, scaladay, scala) drawn as nodes, with links between them]
Step 2: Graph 
• Co-Occurrence Graph 
- connect nodes (words) within and between tweets
- add strength (weight) and cost (distance)
• More frequently co-occurring words
- increase the strength
- decrease the cost
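A minimal sketch of such a graph, assuming that strength is the number of tweets in which two words co-occur and that cost is simply 1 / strength. The slides do not state the exact formula, so both choices, and the function name `build_cooccurrence_graph`, are assumptions.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_graph(tweets):
    """Map each unordered word pair to (strength, cost)."""
    strength = Counter()
    for tokens in tweets:
        # Each pair of distinct words in a tweet is linked once per tweet.
        for a, b in combinations(sorted(set(tokens)), 2):
            strength[(a, b)] += 1
    # Assumption: cost (distance) is the inverse of the strength.
    return {pair: (w, 1.0 / w) for pair, w in strength.items()}


tweets = [
    ["akka-http", "based", "reactive", "stream", "scala", "scaladay"],
    ["new", "york", "time", "reactive", "programming", "tool", "scala", "scale",
     "techrepublic"],
]
graph = build_cooccurrence_graph(tweets)
```

Here "reactive" and "scala" appear together in both tweets, so their link has strength 2 and cost 0.5, while pairs seen only once have strength 1 and cost 1.0.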
Step 2: Graph 
• Summary 
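[Figure: the graphs of individual tweets are combined (+) into one graph (=) with nodes such as reactive, scala, stream, based, …, uses, programming, …]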
Step 2: Clustering 
• Here: "complete link (max) clustering" algorithm*
- hierarchical clustering algorithm that forms clusters by merging subgroups
• Group words from tweets
- that frequently appear together on a topic
- cluster = topic 
* http://nlp.stanford.edu/IR-book/html/htmledition/single-link-and-complete-link-clustering-1.html
Step 2: Clustering 
• Here: "complete link (max) clustering" algorithm
• each node starts as an individual cluster
- Clusters = Nodes = Words in tweets
• close clusters are successively merged together
- distance between two clusters = highest cost between any two of their words (max)
- the two closest clusters are merged first
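The merging procedure can be sketched with a naive complete-link implementation. This is an illustration under assumptions, not the framework's code: the cost values in the example are hypothetical, missing word pairs are treated as infinitely far apart, and `max_cost` is a hypothetical stopping threshold.

```python
from itertools import combinations


def complete_link(words, cost, max_cost):
    """Naive complete-link agglomerative clustering.

    words:    list of words, each starting as its own cluster
    cost:     dict mapping frozenset({a, b}) to the distance between a and b
    max_cost: stop once the closest pair of clusters is farther apart than this
    Returns (clusters, merges); merges lists (cluster_a, cluster_b, distance).
    """
    def dist(c1, c2):
        # Complete link: the distance of two clusters is their farthest pair.
        return max(cost.get(frozenset((a, b)), float("inf")) for a in c1 for b in c2)

    clusters = [frozenset([w]) for w in words]
    merges = []
    while len(clusters) > 1:
        d, c1, c2 = min(
            ((dist(a, b), a, b) for a, b in combinations(clusters, 2)),
            key=lambda item: item[0],
        )
        if d > max_cost:
            break
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
        merges.append((c1, c2, d))
    return clusters, merges


words = ["reactive", "scala", "stream", "based"]
# Hypothetical costs: "reactive"/"scala" are close, every other pair is at 1.0.
cost = {frozenset(p): 1.0 for p in combinations(words, 2)}
cost[frozenset(("reactive", "scala"))] = 0.5

clusters, merges = complete_link(words, cost, max_cost=0.5)
```

With `max_cost=0.5` only "reactive" and "scala" are merged; raising the threshold to 1.0 merges all four words into one cluster.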
Step 2: Clustering 
[Figure: graph representation next to cluster representation of the words reactive, scala, stream, based, …; link labels: cost = distance = 0.5 and cost = distance = 1]
Step 2: Clustering
[Figure sequence: clusters are merged step by step, with link labels distance = 0.5, distance = 1 and distance = 2]
Step 2: Clustering 
• Final step: Dendrogram 
- tree diagram 
- represents the arrangement of hierarchical clusters 
• Why?
- easy to apply threshold metrics
Step 2: Clustering
• Final step: Dendrogram
- closer to the root = lower similarity
[Figure: dendrogram below the root with the leaves new, york, programming, …, akka-http, based, stream, scaladay, reactive, scala; "reactive scala" is the first cluster; thresholds are marked on the tree]
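Applying a threshold to a dendrogram can be sketched as follows. The dendrogram is represented here as a list of merges (one representative word from each merged subtree plus the merge distance). This representation, the distances in the example and the function name `cut_dendrogram` are all hypothetical.

```python
def cut_dendrogram(leaves, merges, threshold):
    """Return the clusters that remain when only merges at or below
    the threshold distance are applied."""
    parent = {w: w for w in leaves}

    def find(w):
        # Follow parent links to the representative of w's cluster.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for a, b, distance in merges:
        if distance <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for w in leaves:
        groups.setdefault(find(w), []).append(w)
    return sorted(sorted(g) for g in groups.values())


leaves = ["akka-http", "based", "stream", "scaladay", "reactive", "scala"]
# Hypothetical merge history, listed from the leaves towards the root.
merges = [
    ("reactive", "scala", 0.5),
    ("akka-http", "based", 1.0),
    ("stream", "scaladay", 1.0),
    ("akka-http", "stream", 2.0),
    ("reactive", "akka-http", 4.0),
]
clusters_at_1 = cut_dendrogram(leaves, merges, threshold=1.0)
```

A low threshold keeps only tightly related words together; a threshold at or above the root distance returns a single cluster.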
Step 3: Extract topical keywords 
• keywords 
- express a topic 
- frequently used 
- summarize the tweets' content
• Questions
- "What are the relevant keywords?"
- "In what clusters do they appear?"
Step 3: Extract topical keywords 
• How?
- "topical tweets" vs. "general tweets"
• frequent in topical tweets
- tweets matching the search keywords, e.g. "reactive scala"
• not frequent in general tweets
- general Twitter stream (all tweets)
Step 3: Extract topical keywords 
• Strength of a word 
- is a word relevant for that topical cluster? 
[Figure: 2×2 matrix of word frequency (low/high) in topical tweets vs. general tweets; one cell is marked ✔ "relevant for topic / cluster"]
Step 3: Extract topical keywords 
• Result 
- topical strength for each keyword 
- sort them by relevancy 
- select top 20 keywords
• choose clusters that contain these words
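One way to score and rank keywords is sketched below. The slides do not give the formula for topical strength, so the score used here (relative frequency in topical tweets divided by a smoothed relative frequency in general tweets) is an assumption, as are the function name and the sample data.

```python
from collections import Counter


def topical_strength(topical_tweets, general_tweets, top_n=20):
    """Rank words that are frequent in topical tweets but rare in general tweets."""
    topical = Counter(w for tweet in topical_tweets for w in tweet)
    general = Counter(w for tweet in general_tweets for w in tweet)
    n_topical = sum(topical.values())
    n_general = sum(general.values())
    vocabulary = len(set(topical) | set(general))

    scores = {}
    for word, count in topical.items():
        p_topical = count / n_topical
        # Add-one smoothing so that words unseen in general tweets do not
        # cause a division by zero.
        p_general = (general[word] + 1) / (n_general + vocabulary)
        scores[word] = p_topical / p_general
    # Sort by relevancy and keep the top_n keywords.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


# Hypothetical, already cleaned tweets.
topical_tweets = [["reactive", "scala", "stream"], ["reactive", "scala", "new"]]
general_tweets = [["new", "york", "time"], ["new", "day"], ["scala"]]
ranked = topical_strength(topical_tweets, general_tweets)
```

In this sample "reactive" ranks first because it is frequent in the topical tweets and absent from the general ones, while "new" ranks last because it is common in general tweets.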
Final Step 
• Combine clusters and keywords 
• create visual summary 
Final Step
• Keyword1
• Keyword2
• Keyword3
• Keyword4
• …
(keywords ordered from high relevancy to low relevancy)
Final Step 
• Treemap Visualisation 
- color = cluster 
- area of word = frequency of word
Final Step 
• Wordcloud Visualisation 
- color = cluster 
- size of word = frequency of word 
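The rule "area of word = frequency of word" can be illustrated with a simple two-level slice-and-dice treemap layout: one column per cluster, one row per word. This is only a sketch; the slides do not say which layout algorithm was used, and the cluster names and frequencies below are hypothetical.

```python
def treemap(clusters, width, height):
    """Lay out words as rectangles whose area is proportional to frequency.

    clusters: dict mapping cluster name -> dict mapping word -> frequency
    Returns a list of (cluster, word, x, y, w, h) tuples.
    """
    grand_total = sum(sum(words.values()) for words in clusters.values())
    rects = []
    x = 0.0
    for cluster, words in clusters.items():
        cluster_total = sum(words.values())
        # Column width is proportional to the cluster's total frequency.
        column_width = width * cluster_total / grand_total
        y = 0.0
        for word, freq in words.items():
            # Row height is proportional to the word's share of its cluster.
            row_height = height * freq / cluster_total
            rects.append((cluster, word, x, y, column_width, row_height))
            y += row_height
        x += column_width
    return rects


# Hypothetical clusters and word frequencies.
clusters = {"A": {"reactive": 4, "scala": 4}, "B": {"new": 2}}
rects = treemap(clusters, width=100.0, height=50.0)
areas = {word: w * h for _, word, _, _, w, h in rects}
```

Each rectangle would then be filled with its cluster's color. A word cloud uses the same frequencies but maps them to font size instead of area.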
Final Notes 
• 4. Million Topical Tweets 
• 15 Days 
• User Study 
- Treemap vs. Word Cloud 
Thank You! 
• Discussion 
- Losing precision while cleaning tweets
- Losing meaning while removing stop words like "not" (negation)
- Unigram vs. multigram?
- ?
