This document describes TweetProbe, a system for visualizing real-time microblog streams. It presents four visualization modes (sentiment map, retweet count ranking, emerging retweet ranking, hashtag ranking) to effectively visualize transient trending topics. The system architecture includes a back-end for processing tweet streams and a front-end for interactive visualization. Novel visual designs like rain drops and a logarithmic timeline are used to conceptualize the "continuum of discontinuity" in microblog data.
2. Motivation
• Microblogs are a Valuable Source
- To selectively consume news and information
- To analyze social dynamics
• Scalability Issue
- Difficult to assess and comprehend enormous data
• Temporal Topics
- Majority of contents in microblogs are transient topics
3. Research Question
• Novel visualization design on real-time microblog stream
- How to effectively visualize transient trending topics?
4. Related Work
• Social Stream Filtering and Detection
- ‘TwitterMonitor’ takes user feedback (Mathioudakis and Koudas 2010)
- Storyboard based Shared Media Curation (Milicic et al. 2013)
- ‘Catstream’ employs user profiling approach (Esparza et al. 2013)
• Real-time Visualization
- Network Intrusion Detection (Cyber-Security Situational Awareness)
- ‘Tweetping’ Geo-spatial mapping of real-time messages (http://tweeping.net)
- ‘We feel fine’ visualizes emotional web (Kamvar and Harris, 2011)
9. Design - Motivation
• Rain Drops
- Stream of Information : Flow in a Continuous Medium (Stream)
- Message : Discontinuous Element in a Flow
- “Continuum of Discontinuity”
Wallpaper image is an excerpt from http://www.paqoo.com
10. Logarithmic Timeline
• The Histomap of Evolution
- An earlier logarithmic timeline visualization of geologic and human history, by John B. Sparks (1932)
• Logarithmic Timeline
- The logarithmic time in TweetProbe shows the original posting time of messages, focusing more on recent events
12. Sentiment Map
• Rain-drop Like Message Visualization
- Tweet = Rain drop + Circular Wave
• Potential of Dissemination
- # Follower Defines Drop Size and Duration Time
• Visual Mapping
- Each Drop is Spatially Mapped to Grid
- Color-coded Sentiment Score
13. Real-time Ranking Visualization
EMERGING RETWEET: Transient emerging tweets
TOP-COUNT RETWEET: Top retweets in CDF
EMERGING HASHTAGS: Transient emerging #hashtags
• Main Visual Components
- Sliding animation to reveal emerging retweets
- Logarithmic timeline to show the ‘freshness’ of messages
14. Emerging vs Top Retweets (1)
• A shows n/∆t (Emerging Retweet Ranking)
- Easy to detect transient trending message
• B shows ∑n (Retweet Count Ranking)
- Cumulative Distribution Function (CDF) of (A)
[Figure: graphs A and B, each plotting #msg against Time, with a time t marked]
15. Emerging Retweet Ranking
• Top N emerging retweets
- n/∆t : Number of binned RT within a time-window
• Sliding animation shows transition in rank
• Color-coded with markers in timeline
16. Retweet Count Ranking
• Top N retweets
- ∑n : Shows top RTs (RT counts in CDF)
- Shows only alive retweets
• Incoming RTs in real-time
17. Hashtag Ranking
• Top N hashtags
- Emerging topics of messages
- Text size is mapped to its ratio of hashtag count
18. Example: #royalbaby
#royalbaby
22nd of July, 2013
- 900,000 hashtags (#royalbaby)
- 25,300 Tweets/min at peak
- More than 2 million mentions of the news
https://blog.twitter.com/en-gb/2013/royalbaby-0
20. Conclusion
• Novel visualization design
- Real-time visualization for trending microposts and topics
- Conceptualize ‘Continuum of Discontinuity’
- Metaphoric visual components such as rain drops
- Color-coded sentiment visualization
- Logarithmic timeline with sliding animation
- Catch transient emerging topics using 3 ranking view modes
21. Research Question Revisited
• Novel visualization design on real-time microblog stream
- How to effectively visualize transient trending topics?
- Multi-thread based visualization
- Identify emerging messages / hashtags with sentiments
- New visual components (rain drop and sliding window)
Good afternoon, everyone. I am Byungkyu Kang from UC Santa Barbara. I'm here today to talk about our paper, 'TweetProbe', a real-time microblog stream visualization framework. Before beginning my presentation, I would like to thank everyone for coming to listen to my talk.
As social networks become more important in our daily lives, their data becomes a valuable source for various practices. For example, many companies use social media to find consumer patterns for their products or services.
What about the individual users? They also need to selectively consume information that they want or need.
Then, what kind of issues do we encounter in those tasks?
Since an enormous amount of data is available on social media, we have a scalability issue. Given our limited resources and time, it is very difficult to assess and comprehend all of this data.
Moreover, most topics come and go very quickly. Therefore, we need to find these temporal topics in a timely manner. Here we call them 'emerging topics' or 'transient messages'.
Then, what can we do to deal with these challenging problems?
With the streaming service, we have access to a huge number of messages in real time. We thought about harnessing this resource with an effective visualization framework, so that users can easily find emerging information or news for a given keyword of interest.
In this paper, we propose a novel visualization design for real-time microblog streams. The main research question is 'How can we effectively visualize transient trending topics?'
Here is some work related to our approach.
Since our system has two different layers, we studied the literature in two parts.
First, Social Stream Filtering and Detection, which is what our back-end data processing does.
-‘TwitterMonitor’: uses the QueueBurst algorithm to detect bursty keywords in tweet texts. In contrast, the TweetProbe framework detects bursty retweets, focusing on ‘retweeting behavior’ instead of keyword-based patterns. We also believe that bursty trends can be found through the fourth view mode, which detects trending hashtags.
-Milicic et al. developed a storyboard-based approach to shared media curation. Their framework collects microposts shared on social platforms that contain media items matching a query. Their visual storyboard shows the results as a stream of images, clusters, or a timeline of items.
-Esparza et al. proposed a system called 'Catstream', a user profiling approach based on the topical categorization of the URLs that users post.
Second, Real-time Visualization
-In real-time visualization, most of the research in the literature has focused on network intrusion detection (IDS) or infrastructure monitoring. Since timely alerts are crucial in these systems, they visualize essential status information in a very simple visual structure.
-On the other hand, Kamvar and Harris developed a web-based framework called 'We feel fine', which visualizes the emotional web. This system processes blog postings in the back-end, extracts the sentiment of their contents, and visualizes them as aesthetic circular components in various colors. However, it is not a fully real-time approach.
-As another example, a recent web-based monitoring service called Tweetping was created by Paris-based developer Franck Ernewein. This service is the most recent real-time microblog visualization, and it shows the number of messages being posted in each country around the world. It tells us the distribution of messages in real time, regardless of the topic or category of each message.
Next, I want to talk briefly about the system architecture of the framework.
-As Twitter provides a streaming API, we can access micro-postings in real time. Through the network connection, we receive incoming data in JSON format as it arrives and parse it to get each message and its metadata.
-Once a message is parsed, it goes through a filter with a given query. In this process, we store only matching messages in a cache memory and discard the others. This loop runs for as long as we keep the connection to Twitter.
-As new data comes into the cache memory, it is passed to an array and sorted at the same time. Regular messages and retweets are treated differently.
-When each message is stored as an object, its text and metadata are interpreted to extract sentiment and other information.
-Up to this point, the back-end data processing layer is in charge.
-While the back-end process works, the front-end visualization layer runs independently, drawing new incoming messages on the screen.
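As a rough illustration of the back-end loop described above, the following Python sketch parses JSON lines, filters them by a query, and keeps regular messages and retweets in separate lists. It is not the TweetProbe implementation: the `Message` class, the `process_stream` function, and the JSON field names (`text`, `user.followers_count`, `retweeted_status`) are assumptions made for this example, and the sorting and sentiment steps are left out because the talk does not specify them.

```python
import json
from dataclasses import dataclass


@dataclass
class Message:
    text: str
    followers: int
    is_retweet: bool


def parse(raw_line):
    """Parse one JSON line into a Message (field names are assumed)."""
    data = json.loads(raw_line)
    return Message(
        text=data.get("text", ""),
        followers=data.get("user", {}).get("followers_count", 0),
        is_retweet="retweeted_status" in data,
    )


def process_stream(lines, query):
    """Keep only messages whose text contains the query; discard the rest."""
    regular, retweets = [], []
    for raw in lines:
        message = parse(raw)
        if query.lower() not in message.text.lower():
            continue  # non-matching messages are dropped
        # Regular messages and retweets are stored separately.
        (retweets if message.is_retweet else regular).append(message)
    return regular, retweets
```

In a real deployment, `lines` would be an iterator over the open streaming connection, so the loop would run for as long as the connection stays up.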
Now I'm going to talk about the design consideration of the framework.
We have four different view modes: Sentiment Map, Emerging retweet ranking, Retweet count ranking and Hashtag ranking.
Except for the first one, the sentiment map, the other three view modes show the real-time ranking visualization. I’ll talk more about this later.
Our design is inspired by rain drops.
When observing message-posting behavior on the stream, we could see that each message arrives irregularly. This random distribution of messages is analogous to that of rain drops.
On a rainy day, one can see rain drops falling here and there, making circular waves of different sizes on a puddle. We made an analogy between the size of the waves and the influencing power of each user. Therefore, the various sizes of the waves in our design represent differences in the dissemination ability of each message.
Here we interpret the stream of microposts as a flow in a continuous medium and each message as a discontinuous element in that flow.
In this framework, we apply a logarithmic timeline in order to show the original posting time of each message while focusing more on recent events.
With the log-scale timeline, both old and new messages can be seen together, with more detail shown for the latest ones.
This visualization technique was first introduced by the Histomap of Evolution by Sparks in 1932.
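One possible way to place messages on such a log-scale timeline is sketched below in Python. The function name, the one-day time span, and the 1000-unit axis width are assumptions chosen for illustration; the talk does not state the exact mapping used in TweetProbe.

```python
import math


def log_timeline_position(age_seconds, max_age_seconds=86400.0, width=1000.0):
    """Map a message's age to an x-position on a logarithmic timeline.

    Position 0 is "now" and position `width` is the oldest visible age.
    Ages outside [0, max_age_seconds] are clamped to the axis ends.
    """
    age = min(max(age_seconds, 0.0), max_age_seconds)
    # log1p keeps the mapping defined at age 0 (log1p(0) == 0).
    return width * math.log1p(age) / math.log1p(max_age_seconds)
```

With these assumed defaults, the most recent minute takes up more than a third of the axis while the remaining hours of the day share the rest, which is the "more detail for the latest messages" effect described above.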
Now let’s look at the individual view modes of our visualization.
The first view is dubbed the sentiment map. Here, each tweet is depicted as a rain drop along with its circular wave animation.
Since the number of followers of each user is considered the potential of its dissemination power in the network, we mapped it to the size of the drop and the duration of the circular animation.
Each drop is also spatially mapped to the grid on the screen, and it is color-coded according to its sentiment score. Red and blue represent positive and negative sentiments.
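The visual mapping of a single drop could look roughly like the Python sketch below. Everything numeric here is an assumption for illustration: the logarithmic scaling of follower counts, the radius and duration ranges, and the choice of red for non-negative scores and blue for negative ones are not given in the talk. The grid placement is omitted because the talk does not say how positions are chosen.

```python
import math


def drop_style(followers, sentiment, min_radius=4.0, max_radius=60.0,
               max_followers=1_000_000):
    """Return radius, animation duration, and color for one tweet's drop."""
    clamped = min(max(followers, 0), max_followers)
    # Log scaling (an assumption) keeps very popular accounts from dominating.
    scale = math.log10(1 + clamped) / math.log10(1 + max_followers)
    return {
        "radius": min_radius + (max_radius - min_radius) * scale,
        "duration_s": 1.0 + 4.0 * scale,  # assumed range: 1 to 5 seconds
        "color": "red" if sentiment >= 0 else "blue",  # assumed color assignment
    }
```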
The following three view modes are the real-time ranking visualizations.
In this visualization, sliding animation and the logarithmic timeline are the two main visual components. Sliding animation reveals the emerging retweets, and the logarithmic timeline shows the freshness of each message.
Because the two retweet rankings are similar, I am going to explain how they differ.
Here we have two different graphs.
Graph A shows the number of incoming messages in each time frame. Given a unit time 'delta t', we can see the variation in the retweet count of each message.
On the other hand, graph B shows the same data in a different way. Since it is drawn as a cumulative distribution function, we can see how many times each message has been retweeted since its birth as time passes.
As you might notice here, graph A is the derivative of graph B. Our emerging retweet ranking visualization corresponds to graph A, and the retweet count ranking visualization corresponds to graph B.
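The relationship between the two graphs can be written out as follows. The notation is an assumption introduced here: n_m(τ) is the number of retweets of message m arriving in time bin τ, B_m is the running total shown in graph B (an unnormalized cumulative count), and A_m is the windowed rate shown in graph A.

```latex
B_m(t) = \sum_{\tau \le t} n_m(\tau),
\qquad
A_m(t) = \frac{B_m(t) - B_m(t - \Delta t)}{\Delta t}
       = \frac{1}{\Delta t} \sum_{t - \Delta t < \tau \le t} n_m(\tau)
```

As the window ∆t shrinks, A_m(t) approaches the derivative of B_m(t), which matches the n/∆t and ∑n labels on the slide.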
The emerging retweet ranking view shows the top N emerging retweets. The default time window is set to 10 minutes, but it can be modified by the user.
Through the sliding animation, users can see how new emerging retweets take their place in real time.
In contrast, in the retweet count ranking, messages are sorted by their total retweet counts. Note that it shows only active retweets, since we rely on the currency of the information stream.
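The difference between the two rankings can be sketched in Python as below. The function names and the `(timestamp, message_id)` event format are hypothetical, and treating a message as "active" when it received at least one retweet inside the current window is an assumption, since the talk does not define the term.

```python
from collections import Counter


def emerging_ranking(events, now, window_s=600.0, top_n=3):
    """Rank messages by retweet rate n/dt within the last `window_s` seconds.

    `events` is a list of (timestamp_s, original_message_id) pairs,
    one pair per incoming retweet.
    """
    recent = Counter(mid for ts, mid in events if now - window_s < ts <= now)
    return [(mid, count / window_s) for mid, count in recent.most_common(top_n)]


def count_ranking(events, now, window_s=600.0, top_n=3):
    """Rank messages by total retweets so far, keeping only active ones."""
    totals = Counter(mid for ts, mid in events if ts <= now)
    # Assumption: "active" means retweeted at least once in the current window.
    active = {mid for ts, mid in events if now - window_s < ts <= now}
    ranked = [(mid, total) for mid, total in totals.most_common() if mid in active]
    return ranked[:top_n]
```

A message with many old retweets can lead the count ranking while a newer message with a short burst leads the emerging ranking, which is why the two views are shown separately.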
Lastly, the hashtag ranking.
In this visualization, we show the top N hashtags in the given time window, varying the size of each item. The size of both the text and the sliding box is mapped to the ratio of the hashtag count in the ranking. Since the hashtag is considered the topic of each message, we can interpret this as the trending topic.
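A minimal Python sketch of this size mapping is shown below. The function name, the regular expression, the point-size range, and the reading of "ratio" as each hashtag's share of the counts among the ranked items are all assumptions for illustration.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")


def hashtag_ranking(texts, top_n=5, min_pt=12.0, max_pt=48.0):
    """Return (hashtag, count, text_size) for the top N hashtags in `texts`."""
    counts = Counter(
        tag.lower() for text in texts for tag in HASHTAG.findall(text)
    )
    top = counts.most_common(top_n)
    total = sum(count for _, count in top)
    # Text size grows linearly with each hashtag's share of the ranked counts.
    return [
        (f"#{tag}", count, min_pt + (max_pt - min_pt) * count / total)
        for tag, count in top
    ]
```

Here `texts` would be the message texts that fall inside the current time window.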
In this example, we can see a possible application scenario for the TweetProbe system.
We monitored the emerging retweets about the royal baby on the 22nd of July. As can be seen here, people shared different postings on Twitter as time moved on. For example, at 8:40pm, the official information began to spread through the network with detailed numbers, such as the time of birth and the weight of the baby.
Then, at 9:00pm, people started sharing a photo with a link to the official source of the announcement.
This tells us that an emerging topic or message on microblogs can be replaced by another at any second.
Conclusion
TweetProbe is a visualization framework carefully designed for visualizing trending microposts and topics in real time.
In visual language, it tries to conceptualize the 'continuum of discontinuity' using metaphoric visual components such as rain drops.
The four visualization techniques comprise various components, such as sentiment visualization in a binary color scale, logarithmic time, and sliding animation.
Getting back to the research question, we thought about how to effectively visualize transient trending topics.
To answer this question, we have proposed a multi-thread based visualization technique for social streams, and shown how to identify emerging messages and sentiments in a massive amount of data in real time.
Also, we have shown our new visual components.
You can also see and try this visualization at the Art Show exhibition.
We welcome you to the Art Show Opening Session today at 6pm, room A705.
Thank you.