Twet
Anca Antochi (aantochi@infoiasi.ro),
Lucian Pricop (lucian.gabriel.pricop@gmail.com),
Radu Sarghie (rsarghie@infoiasi.ro)
Abstract. By now, many people have become familiar with internet search
engines. Most internet users can easily find the information they need by
simply typing a word into a search engine and reading the search results.
However, the web today also offers a wide variety of specialized applications that
allow us to search specific domains of interest. These applications can be
combined in so-called mash-ups that group the search results provided by
several of these applications. Twet is an effort to combine the results of searches
in the Twitter micro-blogging network and the Flickr photo sharing service. To
make our application more user-friendly, the search is extended to the
synonyms of the search word (using the WordNet lexical database), and the
result is combined with the Yahoo mapping service to show the most recent
tweets and relevant photos about the topic in question.
Keywords: Twitter, mash-up, Flickr, WordNet, Yahoo Maps.
1 Introduction to the Technologies Used
1.1 Twet - Project Description
Twet is a search tool that combines posts from the Twitter micro-blogging
network, displayed as an overlay on Yahoo Maps, with the Flickr photo sharing
service to give you the most relevant tweets and photos about a search term. To make
the application more user-friendly, the search is extended to the synonyms of the
search word by using the WordNet lexical database.
The project consists of the main Twet web application, which can be deployed
on any ASP.NET-enabled server, and two PHP web services named Twet-Wordnet and
Twet-Twitter.
1.2 Twitter
According to Wikipedia, Twitter is a free social networking and microblogging
service that enables its users to send and read messages known as tweets. Tweets are
text-based posts of up to 140 characters displayed on the author's profile page and
delivered to the author's subscribers who are known as followers. Senders can restrict
delivery to those in their circle of friends or, by default, allow open access. Users can
send and receive tweets via the Twitter website, Short Message Service (SMS) or
external applications.
Since its creation in 2006, Twitter has gained notability and popularity worldwide.
It is sometimes described as the "SMS of the Internet" since the use of Twitter's
application programming interface for sending and receiving short text messages by
other applications often eclipses the direct use of Twitter.
[Figure: Example of Twitter posts]
Using Twitter
Twitter exposes its data via an Application Programming Interface (API). Useful
documentation about the Twitter API can be found at
http://apiwiki.twitter.com/Twitter-API-Documentation.
Searching on Twitter
Searches on Twitter can be performed by calling the search method, documented at
http://apiwiki.twitter.com/Twitter-Search-API-Method%3A-search.
The search URL is http://search.twitter.com/search.format, where format is the
response format (for example, search.atom, which Twet uses, or search.json).
The search parameters of interest are:
• q: The URL-encoded search query.
• rpp: Optional. The number of tweets to return per page, up to a maximum of 100.
In our case this is set to 10.
Example: http://search.twitter.com/search.atom?q=devo&rpp=10
• page: Optional. The page number (starting at 1) to return, up to a maximum of
roughly 1500 results. In our case this is always set to 1.
Usage Notes:
• Query strings should be URL-encoded.
• Queries are limited to 140 URL-encoded characters.
• Some users may be absent from search results.
• Applications must have a meaningful and unique User Agent when using this
method. An HTTP Referrer is expected but not required. Search traffic that
does not include a User Agent will be rate limited to fewer API calls per
hour than applications including a User Agent string.
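The parameters and usage notes above can be combined into a small request-building helper. The following is a minimal PHP sketch, assuming the search.atom endpoint described above; the function name buildTwitterSearchUrl is hypothetical, and the 140-character check simply mirrors the usage note rather than reproducing anything Twet is documented to do.

```php
<?php
// Minimal sketch: build a request URL for the Twitter search method.
// buildTwitterSearchUrl is a hypothetical helper name.
function buildTwitterSearchUrl(string $query, int $rpp = 10, int $page = 1): string
{
    // Usage note: query strings must be URL-encoded...
    $encoded = rawurlencode($query);
    // ...and may not exceed 140 URL-encoded characters
    if (strlen($encoded) > 140) {
        throw new InvalidArgumentException('Query exceeds 140 URL-encoded characters');
    }
    // rpp accepts at most 100 tweets per page
    $rpp = max(1, min($rpp, 100));
    return 'http://search.twitter.com/search.atom?q=' . $encoded
        . '&rpp=' . $rpp . '&page=' . $page;
}

echo buildTwitterSearchUrl('devo'), "\n";
// prints http://search.twitter.com/search.atom?q=devo&rpp=10&page=1
```

rawurlencode is used instead of http_build_query so that spaces become %20, which keeps the length check close to what the server will see.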
Finding Out Information about Twitter Users
To find out information about Twitter users, we can use the users/show method,
documented at
http://apiwiki.twitter.com/Twitter-REST-API-Method%3A-users%C2%A0show
One of the following parameters is required:
• id: The ID or screen name of a user.
• user_id: Specifies the ID of the user to return. Helpful for
disambiguating when a valid user ID is also a valid screen name.
• screen_name: Specifies the screen name of the user to return.
Usage Notes:
• Requests for protected users without credentials from 1) the user
requested or 2) a user that is following the protected user will omit
the nested status element. Only publicly available data will be returned
in this case.
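For the geotagging step described later, the field of interest in the users/show response is the user's textual location. The PHP sketch below rests on two assumptions not stated in the paper: that the XML variant of the method is called as http://twitter.com/users/show.xml?screen_name=..., and that the response is a `<user>` element with a `<location>` child. Both helper names are hypothetical.

```php
<?php
// Hypothetical helpers; the endpoint is an assumed XML variant of users/show.
function buildUserShowUrl(string $screenName): string
{
    // screen_name avoids the ambiguity of the generic id parameter
    return 'http://twitter.com/users/show.xml?screen_name=' . rawurlencode($screenName);
}

// Extract the free-text <location> field from an (assumed) users/show response.
// Returns null when the XML is invalid or the location is empty.
function extractUserLocation(string $userXml): ?string
{
    $previous = libxml_use_internal_errors(true); // suppress parser warnings
    $user = simplexml_load_string($userXml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if ($user === false) {
        return null;
    }
    $location = trim((string) $user->location);
    return $location === '' ? null : $location;
}

$sample = '<user><screen_name>example_user</screen_name>'
    . '<location>Iasi, Romania</location></user>';
echo extractUserLocation($sample), "\n"; // prints Iasi, Romania
```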
1.3 Flickr
Flickr is an image and video hosting website, web services suite, and online
community. In addition to being a popular website for users to share and embed
personal photographs, the service is widely used by bloggers to host images that they
embed in blogs and social media. As of October 2009, it claims to host more than 4
billion images.
Using Flickr
Several APIs allow interaction with Flickr. For the purposes of this project, the
Flickr.Net library was used (available at http://www.codeplex.com/FlickrNet).
To get started, you need an API key for use with Flickr. You can apply for
new keys and manage existing ones in the Your Keys section of the Flickr Services
web site at http://www.flickr.com/services/api/keys.
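Here is a small example of how to use Flickr.Net in C#:
using FlickrNet;
// Create the client with the API key obtained from the Your Keys page
Flickr flickr = new Flickr("YOUR_API_KEY");
PhotoSearchOptions searchOptions = new PhotoSearchOptions();
searchOptions.Tags = "Iasi";   // return photos tagged "Iasi"
Photos iasiPhotos = flickr.PhotosSearch(searchOptions);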
[Figure: Flickr photo results]
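Flickr.Net wraps Flickr's REST API. For readers working in PHP, the sketch below shows the underlying flickr.photos.search request and how a displayable image URL can be assembled from the farm, server, id and secret attributes of each returned `<photo>` element. The endpoint and the farm/static URL pattern follow Flickr's API documentation of that period and should be checked against the current documentation; YOUR_API_KEY is a placeholder and both function names are hypothetical.

```php
<?php
// Hypothetical helpers around Flickr's REST API (Twet itself uses Flickr.Net).
function buildFlickrSearchUrl(string $apiKey, string $tags, int $perPage = 10): string
{
    return 'http://api.flickr.com/services/rest/?' . http_build_query([
        'method'   => 'flickr.photos.search',
        'api_key'  => $apiKey,
        'tags'     => $tags,     // comma-separated list of tags
        'per_page' => $perPage,  // Twet shows 10 photos
    ]);
}

// Turn a REST response into a list of small ("_m") image URLs.
function flickrPhotoUrls(string $responseXml): array
{
    $rsp = simplexml_load_string($responseXml);
    if ($rsp === false || (string) $rsp['stat'] !== 'ok') {
        return []; // failed call, e.g. an invalid API key
    }
    $urls = [];
    foreach ($rsp->photos->photo as $photo) {
        $urls[] = sprintf('http://farm%s.static.flickr.com/%s/%s_%s_m.jpg',
            (string) $photo['farm'], (string) $photo['server'],
            (string) $photo['id'], (string) $photo['secret']);
    }
    return $urls;
}

echo buildFlickrSearchUrl('YOUR_API_KEY', 'Iasi'), "\n";
```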
1.4 WordNet
WordNet is a lexical database for the English language. It groups English words
into sets of synonyms called synsets, provides short, general definitions, and records
the various semantic relations between these synonym sets. The purpose is twofold: to
produce a combination of dictionary and thesaurus that is more intuitively usable, and
to support automatic text analysis and artificial intelligence applications. The database
and software tools have been released under a BSD-style license and can be
downloaded and used freely. The database can also be browsed online. WordNet was
created and is maintained at the Cognitive Science Laboratory of Princeton
University.
Using WordNet
WordNet provides an online service for searching word definitions at
http://wordnetweb.princeton.edu/perl/webwn. However, using the service for our
project proved difficult, because it was slow and because the RSS feed it returned
was difficult to parse in order to find the synset. Instead, we downloaded the WordNet
database (found at http://www.semantilog.org/wn2sql.html#synset) and exposed a
PHP web service to perform our searches.
[Figure: The WordNet search engine]
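The paper does not describe the schema of the downloaded database, so the following is only a sketch under an assumed, simplified WordNet-style schema: a words table (wordid, lemma) and a senses table linking each word to the synsets it belongs to (wordid, synsetid). Two words are synonyms when they share a synset, so a self-join on senses finds them. The example uses PDO with an in-memory SQLite database so that it runs standalone; the table names, helper names and sample rows are all hypothetical.

```php
<?php
// Assumed schema (hypothetical): words(wordid, lemma), senses(wordid, synsetid)
function findSynonyms(PDO $db, string $word): array
{
    $stmt = $db->prepare(
        'SELECT DISTINCT w2.lemma
           FROM words w1
           JOIN senses s1 ON s1.wordid = w1.wordid
           JOIN senses s2 ON s2.synsetid = s1.synsetid  -- same synset
           JOIN words w2 ON w2.wordid = s2.wordid
          WHERE w1.lemma = :word
          ORDER BY w2.lemma');
    $stmt->execute([':word' => strtolower($word)]);
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}

// Builds a tiny in-memory sample database, for demonstration only.
function createDemoDb(): PDO
{
    $db = new PDO('sqlite::memory:');
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $db->exec('CREATE TABLE words (wordid INTEGER PRIMARY KEY, lemma TEXT)');
    $db->exec('CREATE TABLE senses (wordid INTEGER, synsetid INTEGER)');
    $db->exec("INSERT INTO words VALUES (1,'bubble'),(2,'house of cards'),(3,'burp'),(4,'belch'),(5,'iasi')");
    // synset 100: bubble / house of cards; synset 200: bubble / burp / belch
    $db->exec('INSERT INTO senses VALUES (1,100),(2,100),(1,200),(3,200),(4,200),(5,300)');
    return $db;
}

print_r(findSynonyms(createDemoDb(), 'Bubble'));
```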
1.5 Yahoo Maps
The advent of web mapping can be regarded as a major new trend in cartography.
Previously, cartography was restricted to a few companies, institutes and mapping
agencies, requiring expensive and complex hardware and software as well as skilled
cartographers and geomatics engineers. With web mapping, freely available mapping
technologies and geodata potentially allow every skilled person to produce web maps,
although expensive geodata and technical complexity remain barriers.
Yahoo! Maps is a free online mapping portal provided by Yahoo.
Using Yahoo Maps
The Yahoo! Maps AJAX API lets developers add maps to their web sites using DHTML
and JavaScript. Maps are fully embeddable and scriptable using the JavaScript
programming language. Yahoo Maps has a built-in geocoder, which means that we
can specify either a physical address or latitude/longitude coordinates for the map's
location. The API documentation can be found at
http://developer.yahoo.com/maps/ajax/.
To use Yahoo Maps, an Application ID is needed. Yahoo provides such Application
IDs free of charge after you fill in a form at
https://developer.apps.yahoo.com/wsregapp/.
[Figure: Yahoo Maps control]
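Because the map is drawn in the browser, a server-side page only needs to emit the script that loads the API and places the markers. Twet does this from ASP.NET; the equivalent step is sketched here in PHP under assumptions the paper does not state: the script URL pattern http://api.maps.yahoo.com/ajaxymap?v=3.8&appid=... and the YMap, YGeoPoint, YMarker, addOverlay and drawZoomAndCenter names of the AJAX API, all of which should be checked against the API documentation linked above. renderYahooMapScript is a hypothetical helper and YOUR_APP_ID a placeholder.

```php
<?php
// Hypothetical helper: emit HTML/JavaScript that shows geotagged tweets as
// markers on a Yahoo! Maps AJAX map. API names are assumptions to verify.
function renderYahooMapScript(string $appId, array $tweets, string $containerId = 'map'): string
{
    // Load the AJAX API; "&amp;" because the URL sits inside an HTML attribute
    $html = '<script type="text/javascript" src="http://api.maps.yahoo.com/ajaxymap?v=3.8&amp;appid='
        . rawurlencode($appId) . '"></script>' . "\n";
    $html .= "<script type=\"text/javascript\">\n";
    $html .= 'var map = new YMap(document.getElementById('
        . json_encode($containerId) . "));\n";
    // Center on the first tweet (or 0,0 when there are none); the zoom
    // value 14 is an assumption to check against the API's zoom scale
    $lat = $tweets[0]['lat'] ?? 0.0;
    $lon = $tweets[0]['long'] ?? 0.0;
    $html .= sprintf("map.drawZoomAndCenter(new YGeoPoint(%F, %F), 14);\n", $lat, $lon);
    foreach ($tweets as $tweet) {
        // %F always uses a dot as decimal separator, as JavaScript expects
        $html .= sprintf("map.addOverlay(new YMarker(new YGeoPoint(%F, %F)));\n",
            $tweet['lat'], $tweet['long']);
    }
    return $html . "</script>\n";
}

echo renderYahooMapScript('YOUR_APP_ID', [['lat' => 47.1666667, 'long' => 27.6]]);
```

The page that embeds this output also needs a sized container element, for example a div with id "map", before the scripts run.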
2 Twet
2.1 Project Description
Twet is a mash-up that combines several technologies. Its purpose is to show
relevant Twitter posts (so-called "tweets") about a topic, grouped according to
the Twitter users' locations on Yahoo Maps. To make the application more user-friendly,
synonyms of the search word are also used (relying on the WordNet service), and the
result is combined with relevant pictures fetched from the Flickr photo sharing
service.
The workflow of a Twet search is the following:
1. The user types a search word into Twet and clicks "search".
2. Twet calls the Twet-Twitter service with the search term as a parameter.
3. The Twet-Twitter service calls the Twet-Wordnet service to get the
synonyms of the word.
4. Having the synonyms, the Twet-Twitter service queries Twitter for the 10 most
recent posts about the relevant terms.
5. The Twet-Twitter service returns to Twet the 10 most recent posts on Twitter (along
with meta-information such as the geo tags) and the synonyms.
6. Twet draws an overlay on Yahoo Maps showing the tweets.
7. Twet searches the Flickr photo sharing service for photos about the relevant
search terms.
8. Twet shows:
• the Yahoo map with the Twitter pushpins
• the list of tweets
• the relevant Flickr photos
[Figure: Twet workflow diagram, with the nodes Twet-Input, Twet, Twet-Twitter Service, Twet-Wordnet Service, Yahoo Maps, Flickr and Twet-Output]
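Steps 2 to 5 of this workflow can be sketched as follows. The paper does not say how the synonyms are combined into a single Twitter query; this sketch assumes they are joined with Twitter search's OR operator, with multi-word synonyms quoted as phrases. The two service calls are passed in as callables so that the orchestration can be shown without network access; twetSearch and its parameters are hypothetical.

```php
<?php
// Hypothetical sketch of steps 2-5 of the Twet workflow.
// $getSynonyms:   fn(string $term): string[]        (stands in for Twet-Wordnet)
// $searchTwitter: fn(string $query, int $n): array  (stands in for Twitter search)
function twetSearch(string $term, callable $getSynonyms, callable $searchTwitter, int $limit = 10): array
{
    // Step 3: expand the search term with its synonyms (keeping the term itself)
    $terms = array_values(array_unique(array_merge([$term], $getSynonyms($term))));
    // Assumption: combine terms with OR; quote multi-word synonyms as phrases
    $parts = array_map(
        fn(string $t): string => strpos($t, ' ') !== false ? '"' . $t . '"' : $t,
        $terms
    );
    $query = implode(' OR ', $parts);
    // Step 4: fetch the most recent posts for the combined query
    $tweets = array_slice($searchTwitter($query, $limit), 0, $limit);
    // Step 5: return the tweets together with the synonym list
    return ['synonyms' => $terms, 'query' => $query, 'tweets' => $tweets];
}
```

Because Twitter limits queries to 140 URL-encoded characters, a real implementation would have to drop synonyms, or split the search into several queries, when the combined query grows too long.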
2.2 The Twet Application
Twet is an ASP.NET web application that takes one or more search terms as input and
displays the 10 most recent tweets relevant to the search. The tweets are also plotted on
Yahoo Maps, and the result is combined with 10 relevant photos retrieved from the
Flickr photo sharing service.
It does so by calling the Twet-Twitter service and performing a search on Flickr.
[Figure: The synonym list and the tweets map]
2.3 ASP.NET vs. Yahoo Pipes
Yahoo Pipes is a web application from Yahoo! that provides a graphical user
interface for building data mashups that aggregate web feeds, web pages, and other
services, creating Web-based apps from various sources, and publishing those apps.
The application works by enabling users to "pipe" information from different sources
and then set up rules for how that content should be modified (for example, filtering).
Twet initially started as a Yahoo Pipes mashup. However, we gave up on using
Pipes because it gave us too little control over string operations. Also, Twitter limits
the number of requests made by Yahoo Pipes.
[Figure: Yahoo Pipes designer]
2.4 Twet-Twitter
Twet-Twitter is a PHP web service that returns a geotagged RSS feed with the 10
most relevant tweets that contain a search word (or its synonyms).
This service is currently hosted at http://lucianpricop.is-a-geek.net/twitter
To obtain the synset of the desired word, we simply call the Twet-Wordnet service
(described later in this document).
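On the client side, calling Twet-Wordnet amounts to an HTTP GET on the service URL given in the Twet-Wordnet section and reading the `<SYN>` elements of the returned `<SYNSET>` document. A minimal PHP sketch of that part follows; the function names are hypothetical, and the actual fetch (for example with the cURL code shown further on) is left out so that only URL building and parsing are shown.

```php
<?php
// Hypothetical client-side helpers for the Twet-Wordnet service.
function buildWordnetUrl(array $words): string
{
    // The service takes space-separated words in the "words" parameter
    return 'http://lucianpricop.is-a-geek.net/wordnet.php?words='
        . rawurlencode(implode(' ', $words));
}

// Parse <SYNSET><SYN>...</SYN>...</SYNSET> into a list of strings.
function parseSynset(string $xml): array
{
    $previous = libxml_use_internal_errors(true); // suppress parser warnings
    $synset = simplexml_load_string($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if ($synset === false) {
        return [];
    }
    $synonyms = [];
    foreach ($synset->SYN as $syn) {
        $synonyms[] = trim((string) $syn);
    }
    return $synonyms;
}
```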
After obtaining the synonym list, the simplest way to get the Twitter content
is the PHP file_get_contents function:
file_get_contents('http://search.twitter.com/search.atom?q=twitter');
However, this method requires the PHP configuration option allow_url_fopen to be
enabled, which allows reading data from remote files. Not all web hosts enable this
setting, for security reasons. Also, Twitter allows fewer requests to its web services
when they don't appear to originate from a browser, which it detects by looking at the
User-Agent header of the HTTP request. So we need a way to set this header to an
acceptable value before sending a request to Twitter.
PHP's cURL extension, built on the libcurl library, allows connections and
communication with many different types of servers using many different protocols.
libcurl currently supports the http, https, ftp, gopher, telnet, dict, file, and ldap
protocols. It also supports HTTPS certificates, HTTP POST, HTTP PUT, etc. In
particular, it allows us to set the User-Agent header. Here's how we use the cURL
functions to achieve our goal:
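$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://search.twitter.com/search.atom');
// http_build_query URL-encodes the query ($q holds the word and its synonyms)
curl_setopt($ch, CURLOPT_POSTFIELDS,
    http_build_query(array('lang' => 'en', 'rpp' => 10, 'q' => $q)));
curl_setopt($ch, CURLOPT_HEADER, false);          // return the body only
// Browser-like User-Agent, joined with . so no line break ends up in the header
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.1; '
    . 'en-GB; rv:1.9.2) Gecko/20100115 Firefox/3.6');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);   // curl_exec returns the response
$xml = curl_exec($ch);                            // false on failure
if ($xml === false) {
    error_log('Twitter search failed: ' . curl_error($ch));
}
curl_close($ch);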
Getting a geotagged RSS feed from Twitter
Twitter launched its geotagging API in November 2009, but users need to update their
profile settings in order to allow Twitter to geographically tag their posts.
Most users don't know about, and probably don't care about, this option, so they haven't
opted in to this feature; as a result, most tweets returned by the Twitter search API are
not geotagged. However, we considered the geo tag very important and decided to work
around this problem by using Twitter's user details API. This API allows us to get the
public details of users. These details include the textual location, which can be
translated to geographical latitude and longitude with the help of a web service
we found at http://www.geonames.org/export/geonames-search.html. This service
returns exactly what we need, so we can add the <geo:lat> and <geo:long> tags to
each tweet.
The only issue is with Twitter users who don't make their location public or who
enter fictitious locations. There's not much we can do about it, so we decided to
geotag these users' tweets to the middle of the Atlantic Ocean :)
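The location-to-coordinates step, including the Atlantic Ocean fallback, can be sketched as below. It assumes the XML flavour of the GeoNames search service, which returns `<geoname>` elements with `<lat>` and `<lng>` children, and takes the first (best-ranked) match. The request URL, the username parameter (required by the current GeoNames web services) and the exact fallback coordinates are assumptions, since the paper does not give them; the function names are hypothetical.

```php
<?php
// Assumed fallback point "in the middle of the Atlantic Ocean" (hypothetical values)
const ATLANTIC_FALLBACK = ['lat' => 30.0, 'long' => -40.0];

function buildGeonamesUrl(string $location, string $username): string
{
    return 'http://api.geonames.org/search?' . http_build_query([
        'q' => $location, 'maxRows' => 1, 'username' => $username,
    ]);
}

// Returns the coordinates of the first match, or the fallback point when the
// location is missing, the response is unparseable, or nothing was found.
function coordinatesFromGeonames(?string $responseXml): array
{
    if ($responseXml === null || $responseXml === '') {
        return ATLANTIC_FALLBACK;
    }
    $previous = libxml_use_internal_errors(true);
    $doc = simplexml_load_string($responseXml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if ($doc === false || !isset($doc->geoname[0])) {
        return ATLANTIC_FALLBACK;
    }
    $first = $doc->geoname[0];
    return ['lat' => (float) $first->lat, 'long' => (float) $first->lng];
}
```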
A geotagged entry of the resulting Twitter search feed looks like this:
<entry>
  <id>tag:search.twitter.com,2005:8191823850</id>
  <published>2010-01-25T13:39:59Z</published>
  <link type="text/html" href="http://twitter.com/Tudoor/statuses/8191823850" rel="alternate"/>
  <title>came back from school looking like an popsicle :-j ... -25 degrees Celcius in iasi :-ss</title>
  <content type="html">came back from school looking like an popsicle :-j ... -25 degrees Celcius in <b>iasi</b> :-ss</content>
  <updated>2010-01-25T13:39:59Z</updated>
  <link type="image/png" href="http://a3.twimg.com/profile_images/582822757/myface2_normal.jpg" rel="image"/>
  <twitter:geo>
  </twitter:geo>
  <twitter:source><a href="http://echofon.com/" rel="nofollow">Echofon</a></twitter:source>
  <twitter:lang>en</twitter:lang>
  <author>
    <name>Tudoor (Tudor Necula)</name>
    <uri>http://twitter.com/Tudoor</uri>
  </author>
  <geo:lat>47.1666667</geo:lat>
  <geo:long>27.6</geo:long>
</entry>
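Adding the coordinates to each entry can be done with PHP's DOM extension, as in the sketch below. The example entry shows the geo: prefix but not its namespace declaration; the sketch assumes the W3C Basic Geo (WGS84) vocabulary, http://www.w3.org/2003/01/geo/wgs84_pos#, which defines lat and long, and assumes the entries are in the standard Atom namespace. addGeoTags and its $lookup callback (author URI in, coordinates out) are hypothetical.

```php
<?php
const ATOM_NS = 'http://www.w3.org/2005/Atom';
const GEO_NS  = 'http://www.w3.org/2003/01/geo/wgs84_pos#'; // assumed geo: namespace

// Append <geo:lat>/<geo:long> to every Atom <entry> of the feed.
// $lookup receives the author's URI and returns ['lat' => ..., 'long' => ...].
function addGeoTags(string $atomXml, callable $lookup): string
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($atomXml)) {
        throw new RuntimeException('Invalid Atom feed');
    }
    foreach ($doc->getElementsByTagNameNS(ATOM_NS, 'entry') as $entry) {
        // The author's profile URI identifies whose location to look up
        $uris = $entry->getElementsByTagNameNS(ATOM_NS, 'uri');
        $authorUri = $uris->length > 0 ? $uris->item(0)->textContent : '';
        $coords = $lookup($authorUri);
        $entry->appendChild($doc->createElementNS(GEO_NS, 'geo:lat', (string) $coords['lat']));
        $entry->appendChild($doc->createElementNS(GEO_NS, 'geo:long', (string) $coords['long']));
    }
    return $doc->saveXML();
}
```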
2.5 Twet-Wordnet
The Twet-Wordnet web service takes a list of space-separated words and
returns, in XML format, a list of all the synonyms of these words, including the
given words themselves.
The service is hosted at http://lucianpricop.is-a-geek.net/wordnet.php.
For example, accessing http://lucianpricop.is-a-geek.net/wordnet.php?words=bubble
returns:
<SYNSET>
<SYN>bubble</SYN>
<SYN>house of cards</SYN>
<SYN>belch</SYN>
<SYN>burp</SYN>
<SYN>eruct</SYN>
<SYN>babble</SYN>
<SYN>burble</SYN>
<SYN>guggle</SYN>
<SYN>gurgle</SYN>
<SYN>ripple</SYN>
</SYNSET>
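On the server side, the service has to split the words parameter, collect the synonyms of each word, remove duplicates and serialize the result in the format above. A minimal PHP sketch follows, in which $lookup stands in for the actual database lookup that the paper does not detail; buildSynsetXml is a hypothetical name.

```php
<?php
// Hypothetical server-side sketch of Twet-Wordnet.
// $lookup: fn(string $word): string[] returns the synonyms of one word.
function buildSynsetXml(string $wordsParam, callable $lookup): string
{
    $words = preg_split('/\s+/', trim($wordsParam), -1, PREG_SPLIT_NO_EMPTY);
    $all = [];
    foreach ($words as $word) {
        // include the given word itself, then its synonyms
        foreach (array_merge([$word], $lookup($word)) as $syn) {
            $all[$syn] = true; // array keys remove duplicates and keep order
        }
    }
    $doc = new DOMDocument();
    $root = $doc->appendChild($doc->createElement('SYNSET'));
    foreach (array_keys($all) as $syn) {
        $node = $doc->createElement('SYN');
        // (string): numeric keys become ints; text nodes escape & and <
        $node->appendChild($doc->createTextNode((string) $syn));
        $root->appendChild($node);
    }
    return $doc->saveXML($root); // serialize without the XML declaration
}

// In the service itself (sketch):
// header('Content-Type: text/xml');
// echo buildSynsetXml($_GET['words'] ?? '', $lookup);
```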
At the beginning of this project, our service relied on another web service
provided by Mr Bernard Bou at http://jws-champo.ac-toulouse.fr:8080/wordnet-xml/servlet.
This service was called for each separate word, and from the resulting XML, all the
synonyms from each sense of each category of each part of speech were collected
and returned.
However, because that service was not reliable, we chose to download the
WordNet database from http://wordnet.princeton.edu/wordnet/download/ and
implement the data extraction ourselves.
References
1. "Aplicații hibride: mashup-uri" ("Hybrid applications: mashups", in Romanian), in
S. Buraga (ed.), "Programarea în Web 2.0" ("Programming in Web 2.0"), Polirom
Publishing House, Iași, 2007.
2. "Mashing Up Feeds Using Yahoo Pipes",
http://www.devlounge.net/code/mashing-up-feeds-using-yahoo-pipes
3. K. Cavanaugh, "Yahoo! Pipes: An Introduction",
http://www.communitymx.com/content/article.cfm?cid=86E4B
4. "Yahoo Maps geocoding API",
http://digitalcolony.com/2007/01/using-yahoo-maps-geocoding-api-in-c.aspx
5. "Twitter API Documentation", http://apiwiki.twitter.com/Twitter-API-Documentation
6. "WordNet Documentation", http://wordnet.princeton.edu/wordnet/documentation/