The document provides an architectural overview of the SIWOW SocialRank engagement system. It describes the core components including data mining APIs, real-time APIs, an attention monitor, attention archive, engagement database, content system, language/semantic analyzers, feeds list, topic curator, feed update system, and content archive. It then explains how a blog post flows through the system from initial publication to engagement tracking and storage. Finally, it discusses SocialRank scoring and normalization and provides examples of how the data services can be used.
2. Audience
• This whitepaper is intended for technically minded people who wish to peek behind the curtain at SIWOW. We’ll cover a high-level architectural overview of SIWOW Data Services and describe what the back-end systems do and how. Please forward your questions to stanley_do@hotmail.com for more details.
3. Introduction
• SIWOW SocialRank began as an RSS feed filtering service to help fight
information overload. RSS feed subscribers, inundated with dozens or
hundreds of stories per day, would share their subscriptions with SIWOW
and subscribe to a new version of their feeds. Each new proxied feed was
filtered based on how the online community interacted with the individual
posts. The online community would leave comments on posts, share posts
on various social sites like Digg and Twitter, or bookmark posts on sites like
Delicious. Each of these engagement events was an implicit vote for the
authority of that content. By collecting all of the engagement activities
surrounding a post, a very useful quality signal emerges—the SIWOW
Engagement Score. Today, SIWOW provides blog and media publishers,
corporate marketers and public relations agencies with critical tools and
data to understand how readers are interacting with content.
4. SIWOW SocialRank Architecture Overview
• The SIWOW SocialRank architecture combines a searchable RSS content
archive with real-time social media monitoring over an event-driven fabric.
The system computes reader engagement with content in real time.
Several of the event subscription points are available as Data Services
Real-time APIs. Our cloud-based deployment makes scaling out a regular
part of our process.
• The diagram on the following page provides an overall visual guide to the
primary architectural areas.
6. Components
• DATA MINING APIs:
Include: Feed, Top Posts, Engagement, Topic, SocialRank
Offer conventional request-response APIs over HTTP to query SocialRank
scores, engagement or content from the two-year archive of attention,
engagement and content data.
• REAL-TIME APIs:
Include: Content, Engagement
Events pushed to API subscribers are content-oriented or engagement-oriented.
Content-oriented events are new blog and mainstream media
news content enhanced with engagement, sentiment and language
metadata. Engagement-oriented events are notifications of significant changes
in engagement levels for individual posts or entire news feeds.
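The split between content-oriented and engagement-oriented events can be pictured as two streams on a publish/subscribe fabric, where each subscriber supplies its own filter configuration. Below is a minimal in-process Python sketch built on that assumption. `Fabric`, `ContentEvent`, `EngagementEvent` and their fields are hypothetical names. They are not the Data Services wire format or API.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable, DefaultDict, List, Tuple


# Hypothetical event shapes; the real Data Services payloads may differ.
@dataclass
class ContentEvent:
    post_url: str
    feed_url: str
    language: str
    sentiment: str      # "positive", "negative" or "neutral"
    engagement: float   # engagement metadata attached to the post


@dataclass
class EngagementEvent:
    target_url: str     # a post URL or a feed URL
    previous: float
    current: float


Filter = Callable[[object], bool]
Handler = Callable[[object], None]


@dataclass
class Fabric:
    """Toy in-process event bus: each stream keeps (filter, handler) pairs."""
    streams: DefaultDict[str, List[Tuple[Filter, Handler]]] = field(
        default_factory=lambda: defaultdict(list))

    def subscribe(self, stream: str, accept: Filter, handler: Handler) -> None:
        self.streams[stream].append((accept, handler))

    def publish(self, stream: str, event: object) -> None:
        # Deliver the event only to subscribers whose filter accepts it.
        for accept, handler in self.streams[stream]:
            if accept(event):
                handler(event)


# Usage: one subscriber wants English content, another wants feed-level changes.
fabric = Fabric()
english_posts: List[ContentEvent] = []
feed_alerts: List[EngagementEvent] = []
fabric.subscribe("content", lambda e: e.language == "en", english_posts.append)
fabric.subscribe("engagement", lambda e: e.target_url.endswith("/feed"),
                 feed_alerts.append)

fabric.publish("content", ContentEvent("http://example.com/p1",
                                       "http://example.com/feed", "en", "positive", 12.0))
fabric.publish("content", ContentEvent("http://example.com/p2",
                                       "http://example.com/feed", "fr", "neutral", 3.0))
fabric.publish("engagement", EngagementEvent("http://example.com/feed", 40.0, 95.0))
fabric.publish("engagement", EngagementEvent("http://example.com/p1", 12.0, 30.0))
```

Because filtering happens at delivery time, each subscriber receives only the slice of a stream it asked for, while publishers stay unaware of who is listening.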
• ATTENTION MONITOR: A collection of API adapters, each tailored to a specific
social networking site, that capture and store user interaction events with content.
Currently SIWOW tracks interactions at most of the popular social sites. As
new sites from around the world become popular, new adapters are
deployed and new attention metrics are captured. Engagement alerts occur
in real time. They are notifications of significant changes in engagement for
an individual story or feed. Feed engagement is the aggregate amount of
engagement from each of the individual posts in a feed.
• ATTENTION ARCHIVE: A searchable repository of individual attention
events mentioning any URL across all social hubs monitored by SIWOW.
• ENGAGEMENT DATABASE: A database of engagement values generated
at social sites over time, allowing time series analysis and reporting of post,
feed or topic scores.
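Here is a minimal sketch of the engagement alert logic described for the Attention Monitor. It assumes an alert fires only when engagement moves by both a minimum absolute amount and a minimum relative amount. Both thresholds and all function names are hypothetical, since the source does not define what counts as a significant change. Feed engagement is computed as the sum over the feed's posts, as stated above.

```python
from typing import Dict, List, Optional

# Hypothetical thresholds: the source does not define "significant".
MIN_ABSOLUTE_CHANGE = 10.0   # ignore small wiggles on low-traffic posts
MIN_RELATIVE_CHANGE = 0.5    # require at least a 50% move


def feed_engagement(post_engagement: Dict[str, float]) -> float:
    """Feed engagement is the aggregate engagement of the feed's individual posts."""
    return sum(post_engagement.values())


def engagement_alert(target: str, previous: float, current: float) -> Optional[str]:
    """Return an alert message if engagement changed significantly, else None."""
    delta = current - previous
    if abs(delta) < MIN_ABSOLUTE_CHANGE:
        return None
    # A rise from zero that clears the absolute floor always counts as significant.
    relative = abs(delta) / previous if previous > 0 else float("inf")
    if relative < MIN_RELATIVE_CHANGE:
        return None
    direction = "up" if delta > 0 else "down"
    return f"{target}: engagement {direction} from {previous:g} to {current:g}"


# Usage: compare two snapshots of a feed's posts.
before = {"p1": 20.0, "p2": 5.0}
after = {"p1": 45.0, "p2": 6.0}
alerts: List[str] = [
    msg for msg in (engagement_alert(p, before[p], after[p]) for p in after) if msg
]
feed_msg = engagement_alert("feed", feed_engagement(before), feed_engagement(after))
```

Requiring both an absolute and a relative change keeps low-volume posts from triggering alerts on tiny swings, and keeps large posts from triggering on ordinary growth.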
• CONTENT SYSTEM: A system for checking RSS feeds, normalizing the
source data and enhancing that data with language detection and semantic
analysis. Content API subscribers can tap into the content pipeline at
several points.
• LANGUAGE & SEMANTIC ANALYZERS: Connect to the content stream
and analyze post text for language and for emotional weight across
six dimensions: anger, disgust, fear, happiness, sadness and
surprise. They also provide an overall positive, negative or neutral score.
• FEEDS LIST: A database of over 1 million user-provided feed URLs with
associated metadata.
• TOPIC CURATOR: A human, wiki-style curation system powered by
siwow.com for classifying individual feeds into topics. Feeds can exist in
multiple topics simultaneously. Ranking and filtering of topics are dynamic
and updated in real time, based on the engagement activity collected for each feed.
• FEED UPDATE SYSTEM: Feeds on the master feeds list are checked periodically, at
a frequency based on each feed’s level of engagement and publishing volume.
Specific integrations with PubSubHubbub, RSS Cloud, ping servers and
other push protocols are also used to minimize the latency of gathering
newly published content. New and updated post events are broadcast over
the content stream.
• CONTENT ARCHIVE: A searchable content repository of news and blog
content with associated metadata, including full posts where available,
author, published dates, language, tags, and URLs.
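The Feed Update System checks feeds at a frequency driven by engagement and publishing volume. One plausible way to turn those two signals into a polling interval is sketched below. The formula, the interval bounds, and the relaxed polling for feeds with an active push subscription are all assumptions for illustration. They do not reflect SIWOW's actual schedule.

```python
from dataclasses import dataclass

# Hypothetical bounds on the polling interval, in minutes.
MIN_INTERVAL = 5.0
MAX_INTERVAL = 24 * 60.0


@dataclass
class FeedStats:
    weekly_engagement: float   # Feed Engagement over the last week
    posts_per_day: float       # recent publishing volume
    has_push: bool = False     # e.g. an active PubSubHubbub or RSS Cloud subscription


def poll_interval_minutes(stats: FeedStats) -> float:
    """Assumed heuristic: more engagement and more posts mean more frequent checks."""
    # Both factors are >= 1, so an idle feed falls back to MAX_INTERVAL.
    activity = (1.0 + stats.posts_per_day) * (1.0 + stats.weekly_engagement / 100.0)
    interval = MAX_INTERVAL / activity
    if stats.has_push:
        # Push protocols already deliver new posts quickly; polling is a safety net.
        interval *= 4
    return max(MIN_INTERVAL, min(MAX_INTERVAL, interval))


# Usage
quiet = FeedStats(weekly_engagement=0.0, posts_per_day=0.0)
busy = FeedStats(weekly_engagement=900.0, posts_per_day=9.0)
busy_with_push = FeedStats(weekly_engagement=900.0, posts_per_day=9.0, has_push=True)
```

Under these assumptions, an idle feed is polled once a day. A feed publishing nine posts a day with 900 units of weekly engagement is polled roughly every 15 minutes.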
10. Life of a Blog Post… According to SIWOW SocialRank
• 1 An author publishes a post.
• 2 The author’s publishing system makes the new post available in its RSS feed and
optionally notifies a ping or push service, eventually notifying SIWOW.
• 3 The SIWOW Feed update system checks the publisher’s feed for new posts since
the last check. The new post is found.
• 4 The new post is normalized, passed through language and semantic analyzers,
enriched with additional SocialRank metadata about the feed (engagement score,
tags, etc.) and enters the Content Stream to be delivered to Data Services Content
API subscribers based on filter configuration.
• 5 The new post is stored in the searchable SIWOW Content Archive.
• 6 Readers visit the publisher’s site or consume the RSS feed in RSS readers.
• 7 Readers interested in the post share, link or comment on various social sites. The
link may optionally pass through redirecting proxies, URL-shortening services, etc.
• 8 The SIWOW Attention Monitor tracks all mentions of the story in real-time via site-
specific APIs and polls for comments on the publisher’s site. Events are sent to the
Attention Stream and delivered to Data Services Engagement Real-time API
subscribers based on filter configuration.
• 9 The engagement events are stored in the Attention Archive, ready to be included in
a SocialRank calculation.
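Steps 3 to 5 amount to detecting posts that were not present at the previous check, then passing each new one downstream. The sketch below assumes each item carries a stable identifier (RSS `guid` or Atom `id`). Plain Python lists and dicts stand in for the Content Stream and Content Archive. Language and semantic enrichment are omitted, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass(frozen=True)
class FeedItem:
    guid: str    # RSS <guid> or Atom <id>, assumed stable for a given post
    title: str
    link: str


def new_items(items: Iterable[FeedItem], seen: Set[str]) -> List[FeedItem]:
    """Step 3: keep only items not found on earlier checks, and remember them."""
    fresh = []
    for item in items:
        if item.guid not in seen:
            seen.add(item.guid)
            fresh.append(item)
    return fresh


def ingest(items: Iterable[FeedItem], seen: Set[str],
           content_stream: List[Dict[str, str]],
           archive: Dict[str, Dict[str, str]]) -> None:
    """Steps 4-5 in miniature: normalize, publish to the stream, then archive."""
    for item in new_items(items, seen):
        record = {"guid": item.guid, "title": item.title.strip(), "link": item.link}
        content_stream.append(record)   # stands in for the Content Stream
        archive[item.guid] = record     # stands in for the Content Archive


# Usage: two consecutive checks of the same feed.
seen: Set[str] = set()
stream: List[Dict[str, str]] = []
archive: Dict[str, Dict[str, str]] = {}
ingest([FeedItem("a", " First post ", "http://example.com/a")], seen, stream, archive)
ingest([FeedItem("a", "First post", "http://example.com/a"),
        FeedItem("b", "Second post", "http://example.com/b")], seen, stream, archive)
```

On the second check, only item `b` is new, so the stream receives each post exactly once.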
12. Engagement
• SIWOW SocialRank is a measure of audience engagement with online
content. Usually that content is referenced as items in an RSS feed – but
today that can apply to almost any kind of content addressable by a
URL.
• In the late 1990s, links between static HTML pages were the critical insight
that led to the development of Google’s PageRank algorithm. Today we
interact with content using more modes than static links, often in real time.
Tracking engagement events like comments, links, shares and votes is
similar in spirit to what HTML page links were a decade ago: each of these
social gestures is counted as a vote for that content by SocialRank.
Google uses PageRank to drive search results; with SocialRank, you
define the applications that engagement drives by integrating with the Data
Services APIs.
• What is engagement? It’s a number representing the weighted sum of
attention events. We keep track of the number of times each type of event
happens for each post and then calculate the weighted sum to produce an
engagement value for a post.
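The weighted sum can be written as engagement = Σ (weight of event type × number of events of that type). A minimal Python sketch follows. The event-type names and weights are hypothetical placeholders, because the source states that the real values are subject to change.

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical weights per attention-event type. The source says the real
# values are subject to change; only "richer interaction, higher weight" matters.
WEIGHTS: Dict[str, float] = {
    "pageview": 1.0,
    "bookmark": 2.0,
    "tweet": 4.0,
    "comment": 8.0,
    "blog_link": 16.0,
}


def post_engagement(events: Iterable[str], weights: Dict[str, float] = WEIGHTS) -> float:
    """Count each event type for one post, then return the weighted sum.

    Event types missing from the weight table contribute nothing.
    """
    counts = Counter(events)
    return sum(weights.get(kind, 0.0) * n for kind, n in counts.items())


# Usage: 3 comments, 10 tweets and 40 pageviews.
events = ["comment"] * 3 + ["tweet"] * 10 + ["pageview"] * 40
score = post_engagement(events)
```

With these placeholder weights, the example post scores 3·8 + 10·4 + 40·1 = 104.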
• A post’s engagement value is an abstract number at a point in time, like 982.
It is most useful when benchmarked against itself over time or compared
with other posts in the same feed or even across feeds. The weight for each
source depends on the type of interaction at that source. Not all interactions
are equal, so interactions that imply higher levels of engagement have higher weights.
• Looking at all of the posts in a feed over a period of time, we can roll up the
Post Engagement scores into a Feed Engagement score to compare feeds
with one another or over time.
Feed Engagement (this week) = Post #1 Engagement (this week) + Post #2 Engagement (this week) + Post #3 Engagement (this week) + …
• The Feed Engagement value includes all the engagement for any stories
that were available on the analyzed site. SocialRank uses week-over-week
Feed Engagement values in its Topic ranking features.
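Rolling post engagement up into a weekly Feed Engagement value and comparing it week over week can be sketched as follows. The record shape (post, day, engagement earned that day) is an assumption about how time-series rows might look in the Engagement Database. The function names are hypothetical.

```python
from datetime import date, timedelta
from typing import List, Tuple

# Hypothetical row shape: (post id, day, engagement earned by that post that day).
Record = Tuple[str, date, float]


def feed_engagement_for_week(records: List[Record], week_start: date) -> float:
    """Feed Engagement for one week: the sum of every post's engagement in that week."""
    week_end = week_start + timedelta(days=7)
    return sum(value for _, day, value in records if week_start <= day < week_end)


def week_over_week(records: List[Record], week_start: date) -> float:
    """Relative change versus the previous week (0.5 means +50%)."""
    this_week = feed_engagement_for_week(records, week_start)
    last_week = feed_engagement_for_week(records, week_start - timedelta(days=7))
    if last_week == 0:
        # No baseline: report an unbounded rise, or no change if both weeks are empty.
        return float("inf") if this_week > 0 else 0.0
    return (this_week - last_week) / last_week


# Usage: two weeks of activity for a three-post feed.
records: List[Record] = [
    ("p1", date(2024, 1, 1), 10.0),
    ("p2", date(2024, 1, 3), 30.0),
    ("p1", date(2024, 1, 8), 20.0),
    ("p3", date(2024, 1, 10), 40.0),
]
change = week_over_week(records, date(2024, 1, 8))
```

In this example, the feed earns 40 in the first week and 60 in the second, a week-over-week change of +50%.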
15. 5 C’s of Engagement
• The value of the weights depends on the type of attention source. Different user
interactions imply different levels of engagement. A pageview is the lowest level of
engagement, since all it says is that a page was rendered in a browser (maybe), and
nothing more. Other interaction modes imply a greater expenditure of effort or
emotional attachment. For example, leaving a comment implies having read the
article (a pageview), given it some thought, and crafted a response to the author.
Such modes are weighted higher. The actual numerical values are subject to
change and are not essential to understanding the core concept.
• CREATING: The strongest form of engagement is demonstrated by using
an item as inspiration to create your own, for example, writing your own blog
post that responds to or refutes someone else’s post. Creation requires the
most thought and investment of time, actively generates conversation, and
therefore indicates a high level of engagement.
• CRITIQUING: Reading a blog post and then leaving a comment requires an
investment of time, thought and effort (or sometimes just typing and name-
calling...), and is a form of conversation. However, it requires less effort than
writing a whole blog post. So while it is an important action, it does not
indicate as much engagement as Creating.
• CHATTING: Sharing and discussing information can often be started with
one click, so it doesn’t require a major investment of effort. However, a
desire to share is a strong indication of relevance and expends some social
capital. The act of sharing and its ensuing discussion are acts of
conversation. Social media applications like Twitter encourage both
the sharing of information and the resulting conversations. As a result,
social media “chatting” indicates a good level of engagement.
• COLLECTING: Bookmarking items or submitting them to social sites are also
typically “one-click” actions. They are intentional acts of archiving something of
value for future reference and often sharing, but don’t require much time or
effort. However, the sharing that occurs often sparks conversations, so
Collecting does demonstrate some engagement.
• CLICKING: Activities like clicks and pageviews indicate lower engagement
because they’re passive interactions. Clicking a link to read a blog post
doesn’t require much work, and you’re not giving anything back except your
reading time. It is an intentional act, however, and thus indicates a mild level
of interest and engagement, which may grow after the item is read.
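The 5 C's define an ordering rather than fixed numbers. The Python sketch below encodes that ordering, using doubling integer values as stand-in weights. It also includes a hypothetical mapping from concrete interactions (tweets, bookmarks, and so on) to categories. Neither the values nor the mapping come from SIWOW.

```python
from enum import IntEnum
from typing import Dict


class C(IntEnum):
    """The 5 C's, from weakest to strongest engagement.

    The integer values are hypothetical weights; the source fixes only the order.
    """
    CLICKING = 1
    COLLECTING = 2
    CHATTING = 4
    CRITIQUING = 8
    CREATING = 16


# Hypothetical mapping from concrete interactions to the five categories.
CATEGORY: Dict[str, C] = {
    "pageview": C.CLICKING,
    "click": C.CLICKING,
    "bookmark": C.COLLECTING,
    "digg_submit": C.COLLECTING,
    "tweet": C.CHATTING,
    "share": C.CHATTING,
    "comment": C.CRITIQUING,
    "blog_link": C.CREATING,   # another blog post linking to the item
}


def weight(interaction: str) -> int:
    """Weight of a single interaction, taken from its category."""
    return int(CATEGORY[interaction])
```

Using `IntEnum` keeps the categories comparable and sortable, so the weakest-to-strongest order stays explicit even if the stand-in values change.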
18. SIWOW SocialRank Scores and Engagement Normalization
• When discussing ranking, context is enormously important. Five comments
on a hobby blog may be a lot, but a popular mainstream media site may get
hundreds of comments on average. SIWOW SocialRank values are a
normalization of Post Engagement into a 1.0-10.0 score that is easy for
humans to relate to.
• THEMATIC SocialRank: Thematic ranking is context-free, meaning
several posts can be compared directly with no notion of what is normal for
the feeds they come from. A collection of blog posts originating across
several blogs can be compared and ranked in that set (for example, ten
posts discussing the iPhone). In this example, all of the Post Engagement
values are retrieved, the median engagement value is found and assigned a
SocialRank value of 5.0. The higher post engagement values are then
extended out to 10.0 and the lower ones down to 1.0. The only normalizing
effect comes from the engagement of the posts included in the set.
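The source fixes three anchor points for Thematic SocialRank: the median maps to 5.0, and the highest and lowest values are stretched out to 10.0 and 1.0. The exact curve between those anchors is not stated. Assuming a piecewise-linear stretch, the normalization can be sketched as:

```python
from statistics import median
from typing import List


def thematic_socialrank(engagements: List[float]) -> List[float]:
    """Map a set's Post Engagement values onto 1.0-10.0 with the median at 5.0.

    Assumes a piecewise-linear stretch: [min, median] -> [1.0, 5.0] and
    [median, max] -> [5.0, 10.0]. The source states only the anchor points.
    """
    if not engagements:
        return []
    lo, mid, hi = min(engagements), median(engagements), max(engagements)
    scores = []
    for e in engagements:
        if e >= mid:
            span = hi - mid
            scores.append(5.0 + 5.0 * (e - mid) / span if span else 5.0)
        else:
            span = mid - lo   # > 0 here because lo <= e < mid
            scores.append(5.0 - 4.0 * (mid - e) / span)
    return scores
```

For engagement values [0, 10, 20, 30, 100], the median 20 maps to 5.0 and 10 maps to 3.0. The value 30 maps only to 5.625, because the upper half is stretched across the wider 20 to 100 range.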
• TOPICS: SocialRank Topics are named collections of news feeds. User Topics are
private topics managed by users, while Global Topics are public. Users manually
curate feeds into topics by deciding whether a feed fits a topic. Topic
curation is done using a Wikipedia-style model where any registered user can create
new topics and add any feed to topics. The topics are then available to everyone else.
Since feeds can exist in any number of topics, the topic names themselves become a
useful classification and provide tagging data for feeds. New topics are being created
all the time to map to news sources down the long tail.
In addition to the valuable classification metadata, users can consume aggregated
content from an entire topic and have those posts filtered by SocialRank scores.
Blogger discovery and topic coverage analysis is possible by looking at the week-
over-week rankings (Engagement Database) of bloggers in a topic. Bloggers that
generate engagement rise in the rankings while bloggers that don’t descend. This
transparent and meritocratic ranking system allows an unbiased view of who is
generating the most interest in a topic area and what they are writing about.
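Topic membership and the week-over-week rankings described above can be sketched with a simple mapping from topic names to sets of feeds. The topic and feed names are hypothetical, as is the rank-movement convention (positive means the feed climbed). Ranking here uses Feed Engagement only.

```python
from typing import Dict, List, Set

# Hypothetical topic membership; a feed may belong to several topics at once.
TOPICS: Dict[str, Set[str]] = {
    "mobile": {"feedA", "feedB", "feedC"},
    "apple": {"feedA", "feedC"},
}


def rank(topic: str, weekly_engagement: Dict[str, float]) -> List[str]:
    """A topic's feeds ordered by Feed Engagement, highest first (ties by name)."""
    return sorted(TOPICS[topic], key=lambda f: (-weekly_engagement.get(f, 0.0), f))


def rank_movement(topic: str, last_week: Dict[str, float],
                  this_week: Dict[str, float]) -> Dict[str, int]:
    """Places gained (positive) or lost (negative) by each feed since last week."""
    before = {f: i for i, f in enumerate(rank(topic, last_week))}
    after = {f: i for i, f in enumerate(rank(topic, this_week))}
    return {f: before[f] - after[f] for f in TOPICS[topic]}


def topics_for(feed: str) -> List[str]:
    """Topic names a feed belongs to, usable as tags for that feed."""
    return sorted(t for t, feeds in TOPICS.items() if feed in feeds)


# Usage
last_week = {"feedA": 50.0, "feedB": 30.0, "feedC": 10.0}
this_week = {"feedA": 20.0, "feedB": 40.0, "feedC": 60.0}
movement = rank_movement("mobile", last_week, this_week)
```

Here `feedC` climbs two places and `feedA` drops two. Because `feedA` belongs to two topics, it picks up both names as tags.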
• INFLUENCE SHARE: At any point in time SocialRank knows the total engagement
generated by all stories across all feeds within a specific topic (the total attention market).
Influence share is the fraction of that total attention market that an individual author or a
single feed commands in that topic. This influence share
provides critical insights into the nature of a topic. Is most of a topic’s engagement
dominated by a handful of bloggers? Or, is the engagement highly fragmented
across dozens?
• RSS CONTENT: It’s often said: “The best thing about standards is that there are so
many of them!” Nowhere is this more true than in the world of content syndication.
There are plenty of different RSS versions on the web today, plus Atom, RDF and
others. SIWOW has a world-class content archive and feed update system. All
formats of syndicated content are consumed, and their payloads are normalized
into a consistent format. The content items themselves are available via
our Feed Server or via streaming API (AMQP, Webhooks, etc.) for more efficient
delivery of high volumes of data.
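To illustrate payload normalization, the sketch below maps RSS 2.0 `<item>` and Atom 1.0 `<entry>` elements onto dicts with the same keys, using only Python's standard `xml.etree.ElementTree`. It is deliberately small. RSS 1.0 (RDF), date normalization and most optional elements are not handled, and the output keys are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

ATOM = "{http://www.w3.org/2005/Atom}"


def normalize_feed(xml_text: str) -> List[Dict[str, str]]:
    """Return RSS 2.0 or Atom 1.0 items as dicts with identical keys."""
    root = ET.fromstring(xml_text)
    items: List[Dict[str, str]] = []
    if root.tag == "rss":
        for item in root.iter("item"):
            items.append({
                "title": item.findtext("title", default=""),
                "link": item.findtext("link", default=""),
                "published": item.findtext("pubDate", default=""),
            })
    elif root.tag == ATOM + "feed":
        for entry in root.iter(ATOM + "entry"):
            link = entry.find(ATOM + "link")   # first <link>; rel is not checked
            items.append({
                "title": entry.findtext(ATOM + "title", default=""),
                "link": link.get("href", "") if link is not None else "",
                # Date strings are passed through; their formats still differ.
                "published": entry.findtext(ATOM + "published", default=""),
            })
    else:
        raise ValueError(f"unsupported feed root element: {root.tag}")
    return items


RSS_SAMPLE = """<rss version="2.0"><channel><title>Example</title>
<item><title>Hello</title><link>http://example.com/hello</link>
<pubDate>Mon, 01 Jan 2024 00:00:00 GMT</pubDate></item></channel></rss>"""

ATOM_SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom"><title>Example</title>
<entry><title>Hello</title><link href="http://example.com/hello"/>
<published>2024-01-01T00:00:00Z</published></entry></feed>"""
```

Both samples yield one item with the same title and link, so downstream code can treat the two formats identically.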
• LANGUAGE & SEMANTIC ANALYSIS: SIWOW can enhance a news content
stream with language detection and semantic analysis. Language detection uses
samples of words and phrases from the post content to determine the language
most likely used predominantly in the post. This automated detection is based on the
actual post data, not the possibly incorrect configuration of the blog platform.
Semantic analysis computes the overall tone as positive, negative or neutral based
on the content of the posts. SocialRank also computes detailed weights for six
emotional dimensions: anger, sadness, happiness, surprise, fear and disgust.
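As a deliberately simple stand-in for the semantic analyzer, the sketch below counts lexicon matches, reports the share of matched words for each emotion, and derives an overall tone. This lexicon-based counter is a swapped-in textbook technique with a tiny hypothetical word list. It is not SIWOW's analyzer, and it omits language detection.

```python
import re
from collections import Counter
from typing import Dict, Tuple

# Tiny hypothetical lexicon; a real analyzer would need far richer resources.
LEXICON: Dict[str, str] = {
    "angry": "anger", "furious": "anger",
    "gross": "disgust", "awful": "disgust",
    "afraid": "fear", "scary": "fear",
    "happy": "happiness", "love": "happiness", "great": "happiness",
    "sad": "sadness", "lost": "sadness",
    "wow": "surprise", "unexpected": "surprise",
}
EMOTIONS = ("anger", "disgust", "fear", "happiness", "sadness", "surprise")
POSITIVE = {"happiness"}
NEGATIVE = {"anger", "disgust", "fear", "sadness"}   # surprise counts as neutral here


def analyze(text: str) -> Tuple[Dict[str, float], str]:
    """Return per-emotion weights (share of matched words) and an overall tone."""
    words = re.findall(r"[a-z']+", text.lower())
    hits = Counter(LEXICON[w] for w in words if w in LEXICON)
    total = sum(hits.values())
    weights = {e: (hits[e] / total if total else 0.0) for e in EMOTIONS}
    pos = sum(hits[e] for e in POSITIVE)
    neg = sum(hits[e] for e in NEGATIVE)
    tone = "positive" if pos > neg else "negative" if neg > pos else "neutral"
    return weights, tone


# Usage
weights, tone = analyze("I love this phone, it is great. Wow!")
```

In the example, two happiness words and one surprise word give weights of 2/3 and 1/3 and an overall positive tone.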
23. Data Services Use Cases
• SIWOW Data Services power several applications around the web. The
following is a list of common use cases.
• Engagement Analytics: Real-time or archival views of off-site interaction
events with a site. Augment traditional pageview- or clickstream-oriented
data with next-generation social interaction events, the new driver of web
traffic.
• Feed Filtering: Read what matters. Not all posts in a feed are equal; some
are much more interesting and relevant than others. If you’re a big RSS
consumer, understand where you should be spending your time. Find out
where the conversation is today, and read what matters.
• Top Posts: Find publishers in a topic and what they are best known for.
Understanding a publisher’s most engaging posts provides unique insight
into the publisher and their community. As a publisher, understanding what
works and what doesn’t offers a reliable feedback loop on tone and topic.
• Quality Content Syndication and Aggregation: There are tens of millions
of news articles and blog posts being generated every day. Syndicating this
stream without a filter is a recipe for failure. How do you know the content
will be any good or on topic? This leads aggregators to stick to general,
commodity news sources that may be safe but aren’t specific to an area of
interest. Aggregate reader engagement is an effective signal for reducing
noise and finding articles of interest within any topic when looking through
large quantities of otherwise ambiguous content.
• News Discovery: What are the best news sources in a topic? If you know
about one blog can you find similar ones? How many blogs do you need to
get good coverage of a topic area? Topics and feed engagement ranking
help identify clusters of related blogs by topic and rank them according to
their recent performance with their readership.
• Influence Tracking: Who are the thought leaders in a topic? If there are 87
blogs in a topic, how is the total engagement distributed among them? Is
there a handful of influencers that own the space or is it a highly fragmented
space? These topic attributes can have a heavy influence on how blogger
outreach is performed or how content is consumed from them.
• Sentiment Analysis: What is the emotional tone of content in a topic
filtered by keywords? Is it generally positive, neutral or negative? Where are
we today versus the normal baseline of sentiment? This is highly useful for
brand monitoring.
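The influence share defined earlier, and the question of whether a handful of bloggers dominate a topic, can be sketched as follows. Influence share is each feed's fraction of the topic's total engagement. The top-N share used here to gauge concentration is an assumed, illustrative measure, not a documented SocialRank metric, and the feed names are hypothetical.

```python
from typing import Dict, List


def influence_share(engagement_by_feed: Dict[str, float]) -> Dict[str, float]:
    """Each feed's fraction of the topic's total engagement (its attention market)."""
    total = sum(engagement_by_feed.values())
    if total == 0:
        return {feed: 0.0 for feed in engagement_by_feed}
    return {feed: value / total for feed, value in engagement_by_feed.items()}


def top_n_share(engagement_by_feed: Dict[str, float], n: int) -> float:
    """Combined share of the n most engaging feeds; near 1.0 means a few feeds dominate."""
    shares: List[float] = sorted(influence_share(engagement_by_feed).values(), reverse=True)
    return sum(shares[:n])


# Usage: a four-feed topic where two feeds hold most of the attention.
topic = {"a": 500.0, "b": 250.0, "c": 125.0, "d": 125.0}
shares = influence_share(topic)
concentration = top_n_share(topic, 2)
```

In this example, the top two feeds command 75% of the topic's engagement, which suggests a concentrated rather than fragmented topic.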
26. SIWOW Future!
Revolution!
We are on the way!
Thank You!
Stanley Du
Mobile: 0086-15910916606
E-mail: stanley_do@hotmail.com