User Guide
Updated April 3, 2011
Living Document for Sweeper v0.3
http://swiftly.org/userguide
Table of Contents
I. Introduction
II. Using this Living Document
III. About the Sweeper Application
Suggested Uses
As a FeedReader
For Passive Data-Processing
For Active Content Filtering
For Real-time Social Media Curation
As a Vertical Content Dashboard
Terminology
IV. Explaining the Sweeper UI
Analytic Dashboard
Main Content Window
Admin Panel
View Tabs
Filter Panel
Refresh Staging Area
Rating Panel
Content Items
V. Overview of Plugins
Duplicate Content Filter
Google Language Services
Geo-Location (Yahoo)
Tagging
Ushahidi Push
Tag Clustering
Annotations*
Quiver/Bookmarking*
VI. Adding Sources
Email (IMAP)
Email (Gmail)
FrontlineSMS
News & Blog Search
RSS/ATOM
Flickr
SMS Gateways
Twitter
I. Introduction
Thanks for using the Sweeper application! Sweeper is meant to be fairly intuitive, but we’re well aware that
it can be a little overwhelming at first to get started and to know what’s possible. In this guide we
will walk you through using Sweeper and a handful of its native plugins. This is not a guide to installing
it (for that, look here); rather, this guide walks you through use of the Sweeper software and its various
plugins. If you are a developer seeking information on how to develop plugins, parsers or other
modules for Sweeper and other SwiftRiver applications, click here.
II. Using this Living Document
Because Sweeper is an open-source product whose code and feature set change quite frequently, this
user guide is a living document that serves only as a snapshot of what’s possible at the time it was last
updated. We invite you to revisit this link often. If you decide to print it, just be aware that as soon as it’s
transferred from bits to pulp, it has essentially become outdated.
Likewise, any copy of this document distributed in PDF, DOC or FLV form is likely outdated. The latest
version can always be found at http://swiftly.org/userguide/
III. About the Sweeper Application
Sweeper is an application that focuses on the aggregation, curation and filtering of real-time content.
It assumes the user knows exactly which sources they are tracking but needs an application to help
them prioritize their attention. For comparison, Sweeper is something like an open-source version of
TweetDeck or, to use a Google analogy, Google Reader. The user defines a number of sources to track,
and Sweeper offers a number of ways to filter and view the collected content.
Suggested Uses
What can Sweeper be used for? A number of things, but here are a few ideas...
As a FeedReader
Sweeper was designed for collecting large amounts of disparate real-time data and sweeping
through it quickly and efficiently, while processing that content along the way. So there is an emphasis
on speed and on summarizing large datasets, allowing the user to decide where to spend his
or her time delving deeper.
As mentioned above, one might consider using Sweeper as a substitute for a traditional feed reader.
However, unlike most feed readers, there are no restrictions on the type of data that can be aggregated,
and smart triggers can be applied to outgoing data, ex. "If I perform this function, content is affected
in this way." This functionality can be useful for setting up really advanced conditional tasking, which
we’ll cover later.
For Passive Data-Processing
Sweeper can also be configured as a passive data filter, meaning you can set it to aggregate
content and then automatically perform certain tasks on it. ex. Aggregate all tweets tagged
#hashtag that originate in the state of Maine, and send only that data to another platform.
When used in this way, Sweeper essentially becomes a smart cron tool equipped with geo-tagging,
natural language processing and other powerful contextual features.
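To make the passive workflow concrete, here is a minimal Python sketch of such a rule. It is not Sweeper’s own code: the ContentItem fields, the in_bbox and passive_rule helpers, the simple substring hashtag match and the rough Maine bounding box are all hypothetical, and the sketch assumes a geo-location plugin has already attached lat/lon to each item.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple

# Rough, illustrative bounding box for Maine: (min_lat, max_lat, min_lon, max_lon).
MAINE_BBOX: Tuple[float, float, float, float] = (43.0, 47.5, -71.1, -66.9)


@dataclass
class ContentItem:
    text: str
    channel: str
    lat: Optional[float] = None  # assumed to be set by a geo-location plugin
    lon: Optional[float] = None


def in_bbox(item: ContentItem, bbox: Tuple[float, float, float, float]) -> bool:
    """True if the item has coordinates that fall inside the bounding box."""
    if item.lat is None or item.lon is None:
        return False
    min_lat, max_lat, min_lon, max_lon = bbox
    return min_lat <= item.lat <= max_lat and min_lon <= item.lon <= max_lon


def passive_rule(items: Iterable[ContentItem], hashtag: str,
                 bbox: Tuple[float, float, float, float],
                 forward: Callable[[ContentItem], None]) -> int:
    """Forward each tweet that mentions `hashtag` and lies inside `bbox`; return how many were sent."""
    sent = 0
    for item in items:
        if (item.channel == "twitter"
                and hashtag.lower() in item.text.lower()
                and in_bbox(item, bbox)):
            forward(item)  # e.g. post to another platform; here we just collect
            sent += 1
    return sent


outbox: List[ContentItem] = []
items = [
    ContentItem("Storm update #hashtag", "twitter", 44.3, -69.8),  # inside the box
    ContentItem("Storm update #hashtag", "twitter", 40.7, -74.0),  # outside the box
    ContentItem("No tag here", "twitter", 44.3, -69.8),            # no hashtag
]
count = passive_rule(items, "#hashtag", MAINE_BBOX, outbox.append)
```

In a real deployment the rule would run on a schedule (hence the cron comparison) and `forward` would call the other platform instead of appending to a list.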
For Active Content Filtering
Users are also provided a number of utilities for quickly searching through content. Clicking on
a selection of tags lets the user see only the content carrying those tags. The cluster panel
groups similar content from various channels together. The user can also filter by assigned
score (which can represent the favor they have for some types of content over others) using
any range between 1 and 100. ex. show me only the content with a score of 40 or above, or only
content between 20 and 60.
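The score, tag and channel filters described above can be sketched in a few lines of Python. This is an illustration only: the ContentItem fields and the filter_items helper are hypothetical, and we assume that selecting several tags means an item must carry every selected tag (the UI could equally use any-of matching).

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Set


@dataclass
class ContentItem:
    text: str
    channel: str
    score: int                                   # assigned score, 1-100
    tags: Set[str] = field(default_factory=set)


def filter_items(items: Iterable[ContentItem], min_score: int = 1, max_score: int = 100,
                 tags: Optional[Set[str]] = None,
                 channel: Optional[str] = None) -> List[ContentItem]:
    """Keep items scored within [min_score, max_score] that carry every selected tag
    and, if given, arrived on the selected channel."""
    result = []
    for item in items:
        if not (min_score <= item.score <= max_score):
            continue
        if tags and not tags.issubset(item.tags):
            continue
        if channel and item.channel != channel:
            continue
        result.append(item)
    return result


items = [
    ContentItem("a", "twitter", 15, {"flood"}),
    ContentItem("b", "email", 45, {"flood", "rescue"}),
    ContentItem("c", "sms", 70, {"rescue"}),
]
high = filter_items(items, min_score=40)                  # "40 or above"
middle = filter_items(items, min_score=20, max_score=60)  # "between 20 and 60"
flood_rescue = filter_items(items, tags={"flood", "rescue"})
```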
For Real-time Social Media Curation
Sweeper can be used for real-time media curation across channels (Blogs, News, RSS/ATOM,
Twitter, SMS, Email) and across over 50 languages. For a journalist attempting to collect data
that’s rapidly unfolding across social media, this can save a great deal of time. Rather than
opening 50 different windows for different apps, the journalist can use Sweeper to mine and add
context to disparate content, entirely at the user’s whim. Perhaps even more interestingly, all this
aggregated data can be annotated, mapped, shared or exported in a number of ways once it’s been
structured as the user sees fit.
As a Vertical Content Dashboard
Perhaps you need to know what’s going on across various industries at all times. You
could enter the feeds of several well-known bloggers, the @twitternames of thought leaders in
that industry, a public-facing email address you control like sports@mynewsite.com, and a public-
facing shortcode (ex. 6060). That might just be your sports page. But when you replicate that
setup across Entertainment, World News, Food, Lifestyle etc., you end up with an equally rich,
immersive real-time data-mining tool across all those interests.
Terminology
Before we continue, it will help if you have a basic understanding of the terminology we use to discuss the
application.
Sweeper (capital ‘S’) - the name of a SwiftRiver application for aggregating and processing feeds of
content
sweeper (lowercase ‘s’) - generally, one who performs the function of sweeping through feeds of
content. However, in the Sweeper application the user role of sweeper is assigned to users who can edit
tags and process content but who don’t have administrative rights to the application.
sweep - to process data
channel - the distribution type used to deliver content. Twitter, Email, RSS/ATOM, SMS are all channels.
source - the place (or person) from which content originates. A person’s @twittername, email address,
blog or web URL, or phone number would all be considered sources. Several sources may be grouped to
reference a single identity, ex. this blog, this URL and this phone number all belong to the same person
content item - a single item of content collected from a feed, regardless of the channel it came in on or
the source it came from
tag - a layer of taxonomy applied to all content
lat/lon - geospatial coordinates; short for latitude and longitude
veracity - more accurately, the subjective favor the user (or users) has for content. The baseline of favor
expressed for certain types of content is used as a building block for a score applied to content. This
score is then used both for prioritizing sources and for recommending other content the user or users may
favor.
cluster - a collection of content items deemed to be statistically similar based on tags
editors - editors don’t have full administrative rights to the application but they can perform tasks that
sweepers can’t.
turbine - another word for plugins for SwiftRiver applications
impulse turbine - plugins that pre-process content (before the application receives it). Impulse Turbine
plugins affect how data is structured as part of the Swift object module.
reactor turbine - plugins that process content based on human interaction or assigned logic (after
the application has received it). Reactor Turbine plugins can be used to take structured data and do
something with it.
parsers - at the application architecture level, parsers are modules that can be written to create new
sources
trusted source - applies a default score of 100 to a source, so it starts with a high score that users can
vote down from there. ex. you have my trust now, but you could lose it over time
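To illustrate how the two kinds of turbines differ, here is a hypothetical Python sketch. It is not the SwiftRiver plugin API: the Pipeline class, add_tags and push_if_voted_up are invented for this example. Impulse turbines transform each item, in order, before it is stored. Reactor turbines run afterwards in response to an event such as a vote.

```python
from typing import Callable, Dict, List

ContentItem = Dict[str, object]  # simplified: a content item as a plain dict
ImpulseTurbine = Callable[[ContentItem], ContentItem]
ReactorTurbine = Callable[[ContentItem, str], None]


class Pipeline:
    def __init__(self) -> None:
        self.impulse: List[ImpulseTurbine] = []
        self.reactor: List[ReactorTurbine] = []
        self.store: List[ContentItem] = []

    def receive(self, item: ContentItem) -> None:
        # Impulse turbines run in order *before* the item is stored.
        for turbine in self.impulse:
            item = turbine(item)
        self.store.append(item)

    def react(self, item: ContentItem, event: str) -> None:
        # Reactor turbines run *after* storage, in response to an event such as a vote.
        for turbine in self.reactor:
            turbine(item, event)


def add_tags(item: ContentItem) -> ContentItem:
    """Hypothetical impulse turbine: turn #hashtags into lowercase tags."""
    words = str(item["text"]).split()
    return {**item, "tags": [w.lstrip("#").lower() for w in words if w.startswith("#")]}


pushed: List[ContentItem] = []


def push_if_voted_up(item: ContentItem, event: str) -> None:
    """Hypothetical reactor turbine: hand voted-up items to another system."""
    if event == "vote_up":
        pushed.append(item)


p = Pipeline()
p.impulse.append(add_tags)
p.reactor.append(push_if_voted_up)
p.receive({"text": "Bridge closed #Flood"})
p.react(p.store[0], "vote_up")
```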
IV. Explaining the Sweeper UI
Now that we’ve got the basics, we can walk you through the Sweeper user interface and its basic features
and functions. At first glance the application can be a little intimidating, so hopefully this guide takes the
edge off (like a martini!).
Analytic Dashboard
This dashboard offers a quick survey of the content being collected by Sweeper. Where is data mostly
being collected from? How much content is there in total? How much from each channel? The charts are
dynamic and update with each use of the application.
Main Content Window
This is the main content display window, where aggregated content can be viewed.
Admin Panel
This area contains five tabs: Login, Impulse Turbines, Reactor Turbines, Sources and Add User.
Login - as you might expect, this area allows users to log in to the application
Impulse Turbine - for enabling or disabling impulse turbine plugins
Reactor Turbine - for enabling or disabling reactor turbine plugins
Sources - this is the area where one can add sources to aggregate into Sweeper
Users - area for adding users and assigning their administrative rights
View Tabs
This area contains several tabs for altering the view of the main content window. The titles are fairly
self-explanatory: Dashboard, New content, Accurate, Inaccurate, Crosstalk and Irrelevant.
Dashboard - contains a collection of charts plotting various aspects of the content being
collected
New content - for viewing new content as it’s being collected
Accurate - shows all content voted up
Inaccurate - shows all content voted down
Crosstalk - shows content that is completely off-topic
Irrelevant - shows content that is on-topic but not relevant to the user’s specific needs
Filter Panel
Filters for changing the view of the main content window.
Veracity Slider - allows the user to set a range of anything between 1 and 100 to view content by
assigned score
Channels - view only the content that came in on a particular channel
Tags - view only the content containing a selection of tags
Refresh Staging Area
Reveals how much content has been aggregated since the main content window was last refreshed.
Rating Panel
The upper left part of the Rating Panel is for quickly determining information about content. Is this
a ‘trusted’ source or has it been rated as trusted by the people within your bounded (or unbounded) group
of users?
The upper right quadrant shows a score that represents the favor the user or their community has for the
associated source.
The lower quadrant has four buttons. Here is what they do:
Green (Up) - expresses favor for a content item while positively affecting its source’s score, so
that in the future content from the same source will be prioritized.
Red (Down) - expresses disapproval for a content item while negatively affecting its source’s
score, so that in the future content from the same source will be deprioritized.
Crosstalk - expresses that this content is not relevant because it has essentially been collected by
mistake and isn’t useful. Removes it from the main view without negatively affecting the
source score.
Irrelevant - expresses that this content is not germane to the task the user is trying to perform
and, more importantly, is somehow damaging or distracting. Removes the content from the main
view while negatively affecting the source score.
It’s important to note that these votes, whether up or down, are not the only things factored into the
scoring of content. We also factor in things like the tag profile of the content, the ratings of the
individual users who rated it, and other factors. For an in-depth explanation see the RiverID
System Guide.
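The effect of each button on a source’s score can be sketched as follows. This is a deliberately simplified, hypothetical model. The fixed STEP, the neutral starting score of 50 for untrusted sources and the apply_vote helper are assumptions. The real scoring also weighs the tag profile and rater reputation described above.

```python
from dataclasses import dataclass

STEP = 5            # hypothetical size of one vote's effect
NEUTRAL_SCORE = 50  # hypothetical starting score for untrusted sources

# Per the Rating Panel description: Crosstalk leaves the source score untouched,
# while Down and Irrelevant both lower it.
VOTE_EFFECT = {"up": STEP, "down": -STEP, "crosstalk": 0, "irrelevant": -STEP}


@dataclass
class Source:
    name: str
    trusted: bool = False
    score: int = NEUTRAL_SCORE

    def __post_init__(self) -> None:
        if self.trusted:
            self.score = 100  # trusted sources start at the maximum score


def apply_vote(source: Source, vote: str) -> int:
    """Adjust the source score for one vote, clamped to 1..100, and return it."""
    source.score = max(1, min(100, source.score + VOTE_EFFECT[vote]))
    return source.score


bob = Source("@bobsmith", trusted=True)
apply_vote(bob, "up")          # already at 100, so it stays there
apply_vote(bob, "crosstalk")   # no effect on the score
apply_vote(bob, "irrelevant")  # drops to 95
```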
Content Items
Content items are divided into three sub-sections: the Header, the Body and the Footer.
In the Header you’ll find an icon denoting what channel this content came in on: Twitter, Email, SMS, or
RSS/Atom. Clicking this icon will reveal more:
A pop-up display reveals information about the source and the content itself:
Source - the source of the content (a Twitter @name, email address, url or phone number)
Channel - the channel the content came in on (Twitter, Email, SMS, or RSS/Atom)
Source Score - the trust score associated with this source
Link - hyperlink to the original content
In the Body you’ll find a portion of the message (Twitter and SMS) or the headline/subject (articles,
blogs, email).
In the Footer you’ll find tags which add a layer of taxonomy to the content. You can quickly find other
content like this particular content item by clicking on the tags themselves. Users can also add their own
tags*, edit tags* or delete tags to help the system improve**.
* Adding and editing tags is not possible in v0.3.0 of the Sweeper UI. However, a slight modification of the code exposes this
feature and makes it available.
** There is an active learning element of our Tagging API that allows the system to learn from user feedback that will be available
soon. You can read more about this in the section on Impulse Reactor Plugins.
V. Overview of Plugins
A number of plugins ship with Sweeper, some enabled by default and others commonly used.
There are too many to list here, so in this section we’ll explain what a few of the available plugins are
and what they are used for.
You can always find more plugins for Swiftly applications at http://plugins.swiftly.org
Duplicate Content Filter
When activated, this plugin passes all content through the Duplication Filter API in the Swift Web Service
stack, effectively removing all duplicate content (like retweets) from a feed.
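The Duplication Filter API’s internals aren’t described here, so the following is only a sketch of one common approach. It assumes duplicates are detected by normalizing each message (stripping a leading "RT @user:" marker, lowercasing, collapsing whitespace) and hashing the result. The fingerprint and drop_duplicates names are hypothetical.

```python
import hashlib
import re
from typing import Iterable, List

# Matches a leading retweet marker such as "RT @bobsmith: ".
RT_PREFIX = re.compile(r"^\s*RT\s+@\w+:?\s*", re.IGNORECASE)


def fingerprint(text: str) -> str:
    """Normalize a message and hash it so near-identical copies collide."""
    text = RT_PREFIX.sub("", text)         # drop a leading "RT @user:" marker
    text = " ".join(text.lower().split())  # lowercase and collapse whitespace
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def drop_duplicates(messages: Iterable[str]) -> List[str]:
    """Keep the first occurrence of each message, dropping later duplicates."""
    seen = set()
    unique = []
    for msg in messages:
        fp = fingerprint(msg)
        if fp not in seen:
            seen.add(fp)
            unique.append(msg)
    return unique


feed = [
    "Road to the airport is closed",
    "RT @bobsmith: Road to the airport is closed",
    "road to the  airport is closed",
    "Airport reopened",
]
unique = drop_duplicates(feed)
```

Exact hashing only catches copies that normalize to the same text; a retweet with added commentary would slip through, which is one reason a dedicated service is useful.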
Google Language Services
When activated, this plugin passes all content through the Google Translate API. Google Translate will
automatically detect what language the content is in, translate it and send it back. This allows you to
aggregate content in multiple languages but only see the resulting English translation! This is a
huge time saver when doing international research.
But how do you know what content has been translated? When activated, additional info in the content
item’s header will let the user know what has been translated, and from what language. See the example
above.
If you expect large amounts of data, you may want to opt for the Google Enterprise Language Service
plugin instead. With this plugin the amount of content that can be translated is increased significantly.
It requires an API key from Google. If you need help getting Enterprise-level access, contact us at
support@swiftly.org
Geo-Location (Yahoo)
When activated, this plugin passes all content through the Yahoo Placemaker API, where we try to detect
the location the content most likely originated from. We then apply lat/lon coordinates to the
content, which are stored as part of the content’s meta info. When passed to other systems, this lat/lon
info can be used for geospatial reference.
To use this service, you’ll need to acquire a Yahoo Placemaker API key from Yahoo. If you need help
getting Enterprise level access, contact us at support@swiftly.org
Tagging
When activated, all content passing through Sweeper will be tagged by our natural language processing
API. Essentially, this service tries to extract what it thinks are the active keywords being used, and uses
that to help the user automatically sort content.
Tags are very important to SwiftRiver and we take a dual taxonomic and folksonomic approach in our
applications. Meaning, although these tags are machine generated, they can be edited and improved
upon by humans, which in turn helps to teach the algorithm how to tag content better.
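As a rough stand-in for the natural language processing API, the sketch below picks the most frequent non-trivial words in a message as tags. It is not the algorithm Sweeper uses. The stop-word list and the extract_tags helper are hypothetical, and a real NLP service does considerably more.

```python
import re
from collections import Counter
from typing import List

# Tiny illustrative stop-word list; a real NLP service would use far richer models.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on",
              "is", "are", "at", "for", "with"}


def extract_tags(text: str, max_tags: int = 3) -> List[str]:
    """Return up to `max_tags` of the most frequent non-stop-words as tags."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    # most_common breaks ties by first appearance in the text
    return [word for word, _ in counts.most_common(max_tags)]


tags = extract_tags("Flooding closes bridge; flooding expected to worsen near the bridge tonight")
```

In the folksonomic half of the approach, a human would then add, edit or delete entries in `tags`, and that feedback is what the learning system uses.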
Ushahidi Push
For users of Ushahidi or Crowdmap. This will take any content voted up in the Ratings panel and
automatically plot it on a designated Ushahidi deployment map as an approved report. This is a
significant time saver for large groups who want to use Sweeper to curate data, but use Ushahidi or
Crowdmap to visualize it.
Users will need to enter an API key for an Ushahidi deployment that they have administrative rights to.
ex. http://xxx.xxx.xx.xxx/ushahidi/
There are many variants of this plugin. One is called Ushahidi Passive Push and essentially it turns
Sweeper into a cron suite where content is automatically aggregated, structured, and passed along to
Ushahidi...mostly without any human operators!
Tag Clustering
When activated, this plugin allows the user to view content similar to any particular content item. The
clustering is done by using a statistical profile of the associated Tags for proximity matching. This gives
the user more control over alternative recommendation methods, because it can factor in the user’s own
tagging methods. For instance if I use unique identifiers or words unique to my organization, they too can
be used as part of the proximity matching algorithm!
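The exact statistical profile Sweeper uses isn’t described here, so this sketch swaps in a simple, well-known measure: Jaccard similarity between tag sets (shared tags divided by all distinct tags). The similar_items helper, the item IDs and the 0.3 threshold are hypothetical. Note how an organization-specific tag such as "acme-relief" counts toward the match just like any other tag.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of tags two items have in common (0.0 = none, 1.0 = identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similar_items(target: str, items: Dict[str, Set[str]],
                  threshold: float = 0.3) -> List[Tuple[str, float]]:
    """Other items whose tag overlap with `target` is at least `threshold`, most similar first."""
    base = items[target]
    scored = [(other, jaccard(base, tags)) for other, tags in items.items() if other != target]
    return sorted((s for s in scored if s[1] >= threshold), key=lambda s: s[1], reverse=True)


items = {
    "tweet-1": {"flood", "bridge", "acme-relief"},
    "email-7": {"flood", "acme-relief"},
    "sms-3": {"bridge", "traffic"},
    "rss-9": {"election"},
}
cluster = similar_items("tweet-1", items)
```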
Annotations*
Annotations offers the ability to annotate any content item. This can be used to leave individual notes for
reference, or to collaboratively converse around content with your team.
Quiver/Bookmarking*
Quiver is a bookmarklet that lets you quickly collect content from around the web and post it
to your Sweeper deployment (effectively adding it to your quiver). This can be useful for individually
collecting research, or if you have teams of contributors actively recommending content for you to then
apply all our contextual APIs to.
* These features will ship with the forthcoming release of Sweeper.
VI. Adding Sources
To begin using Sweeper at all, one must begin aggregating from predefined sources. Essentially this
is where you inform the system what you want to track. Sweeper currently only accepts inputs that are
updated streams of data - feeds - in XML/ATOM/RSS or JSON format.
To bring into Sweeper any type of content it doesn’t currently accept, all one needs to do is write a parser:
a few lines of code that tell the application how to structure data coming from that particular feed.
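To give a feel for what "a few lines of code" means, here is a hypothetical parser written in Python for an invented JSON feed. The feed layout (items, author, body, url), the ContentItem fields and the parse_example_feed name are assumptions. Sweeper’s actual parser interface is not shown here.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class ContentItem:
    source: str   # who produced the content
    channel: str  # how it arrived
    text: str
    link: str


def parse_example_feed(raw: str) -> List[ContentItem]:
    """Hypothetical parser: map one feed's JSON fields onto content items."""
    data = json.loads(raw)
    items = []
    for entry in data.get("items", []):
        items.append(ContentItem(
            source=entry.get("author", "unknown"),
            channel="example-json",
            text=entry.get("body", ""),
            link=entry.get("url", ""),
        ))
    return items


raw = ('{"items": [{"author": "@bobsmith", "body": "Market reopens today", '
       '"url": "http://example.com/1"}]}')
parsed = parse_example_feed(raw)
```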
The types of content natively supported are IMAP, Gmail, FrontlineSMS, GoogleNews, any RSS or
Atom feed, Flickr, other SMS gateways and Twitter.
Email (IMAP)
Sweeper will accept the IMAP details of any email account and begin pulling in content allowing you to
aggregate, translate, tag and cluster your email.
Email (Gmail)
Sweeper supports aggregating email from any Gmail account, pulling in content and allowing you to
aggregate, translate, tag and cluster your email. Although Gmail also supports IMAP, the native Gmail
aggregation is recommended.
FrontlineSMS
In combination with FrontlineSMS, Sweeper can become a powerful SMS curation service that
aggregates real-time content (SMS) even if there is no internet connection! There are two ways of
integrating FrontlineSMS with Sweeper: Remote and Local.
Remote - for users who have access to some type of network, either the Internet or just a LAN. Simply
enter the details of the FrontlineSMS deployment you want to pull data from. You will need to use this
in combination with the FrontlineFetch go-between servlet, which can be downloaded from
http://plugins.swiftly.org/?p=51.
Local - requires that the Sweeper deployment and FrontlineSMS be installed on the same machine
or server. This allows the Sweeper application to pull directly from the FrontlineSMS (FSMS) database and
will work even if there is no Internet connection.
News & Blog Search
This source module allows you to set up a keyword search, returning real-time search results from
Google News, Posterous, Blogger and Wordpress.com. The results will appear in the main content view,
translated if necessary.
RSS/ATOM
Self-explanatory: simply enter the URL of an RSS or Atom feed and
Sweeper will begin aggregating that content.
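For a sense of what an RSS/Atom source has to extract, here is a minimal Python sketch using the standard library’s xml.etree.ElementTree. It pulls (title, link) pairs from an RSS 2.0 or Atom 1.0 document. The parse_feed helper is hypothetical and is not how Sweeper itself is implemented.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom 1.0 namespace, in ElementTree's {uri}tag form


def parse_feed(xml_text: str) -> List[Tuple[str, str]]:
    """Return (title, link) pairs from an RSS 2.0 or Atom 1.0 document."""
    root = ET.fromstring(xml_text)
    entries: List[Tuple[str, str]] = []
    if root.tag == "rss":
        for item in root.iter("item"):
            entries.append((item.findtext("title", ""), item.findtext("link", "")))
    elif root.tag == ATOM + "feed":
        for entry in root.iter(ATOM + "entry"):
            link = entry.find(ATOM + "link")
            href = link.get("href", "") if link is not None else ""
            entries.append((entry.findtext(ATOM + "title", ""), href))
    return entries


rss = ("<rss version='2.0'><channel><title>Demo</title>"
       "<item><title>First post</title><link>http://example.com/1</link></item>"
       "</channel></rss>")
atom = ("<feed xmlns='http://www.w3.org/2005/Atom'><title>Demo</title>"
        "<entry><title>Hello</title><link href='http://example.com/a'/></entry></feed>")
```

The main difference between the two formats shows up in the link: RSS puts it in element text, while Atom puts it in an href attribute inside a namespace.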
Flickr
This service allows the user to aggregate content from the photo-sharing service Flickr.
The options are fairly simple. Tag Search will return results aggregated from Flickr based on a search
using a specific keyword ex. cats, dogs, Eiffel Tower. Tag Search with Location will only return geo-
tagged results, great when used in combination with a mapping platform like Crowdmap. Follow User is
for only returning the results from a specific user account.
SMS Gateways
We’ve included a generic SMS gateway aggregator. It’s set up to read from the HTTP posts commonly
used by services that don’t have APIs. However, it’s there largely to fork and modify: a head start on
integrating your own SMS service.
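Here is a minimal sketch of reading such an HTTP post, assuming the gateway sends a form-encoded body. The field names "from" and "message" are hypothetical. Every gateway names them differently, and those names are exactly what you would change when forking. The parse_sms_post helper is also hypothetical.

```python
from typing import Dict
from urllib.parse import parse_qs


def parse_sms_post(body: str, sender_field: str = "from",
                   message_field: str = "message") -> Dict[str, str]:
    """Turn a form-encoded POST body from a (hypothetical) gateway into a content item."""
    fields = parse_qs(body)  # decodes %XX escapes and '+' as space
    return {
        "channel": "sms",
        "source": fields.get(sender_field, [""])[0],
        "text": fields.get(message_field, [""])[0],
    }


item = parse_sms_post("from=%2B250788123456&message=Water+levels+rising")
```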
Twitter
Culling content from Twitter is easy. There are two options: Search and Follow User.
With Search, the user enters a name for the search (one that has relevance to you) followed by the
term(s) they would like to search for. These can be common words or hashtags, ex. ‘My Twitter Search’
and ‘#searchword’. There is no limit to the number of search queries one can have; however, the number of
results returned is limited by your individual access to the Twitter search API. If you’d like to increase this
access, contact Twitter to get white-listed or contact support@swiftly.org.
Note on Sources and Search: When using a Twitter search please note that the search itself is not a
source. In the Swiftly eco-system, content producers are sources. This means that we will identify all the
individual content producers and help you keep track of them. This allows one to monitor conversations
around keywords that might lead them to great content producers.
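The idea that a search collects sources, rather than being one, can be sketched as grouping search hits by author. The sources_from_search helper and the (author, text) tuples are hypothetical simplifications of a real Twitter search result.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def sources_from_search(results: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (author, text) search hits by author: each author is a source, the search is not."""
    by_source: Dict[str, List[str]] = defaultdict(list)
    for author, text in results:
        by_source[author].append(text)
    return dict(by_source)


hits = [
    ("@bobsmith", "#searchword update from Kigali"),
    ("@alice", "#searchword photos"),
    ("@bobsmith", "more on #searchword"),
]
sources = sources_from_search(hits)
most_active = max(sources, key=lambda s: len(sources[s]))
```

Once grouped this way, each author can carry their own score, which is how a keyword search can lead you to great content producers.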
With Follow User, the user can enter a unique name for the Twitter handle they want to follow, along with
the actual @name on Twitter. For example ‘Bob Smith, Rwanda’ alongside ‘@bobsmith’. This is helpful
because it lets you leave notes for yourself, or your team members, about who you are following.