Tag Based Information Retrieval using Folksonomy
Term Paper-2
Submitted by
NIKESH.N
International School of Information Management
University of Mysore
2010
1.0 Introduction
Web 2.0 represents the collaborative web, which has revolutionized how information is created
and accessed over the Internet. Web 2.0 enables users to interact with the web more effectively
and provides new ways of accessing web information. Blogs, wikis, microblogging,
multimedia sharing services and content tagging services are some of the main constituents
that add richness to Web 2.0. Tagging in particular reflects a Web 2.0 mentality in which users
create not only content but also a richer, more adaptive and more responsive way to navigate
and search both existing and new media. Tagging promises better and more intuitive
information access through tag-based browsing and information retrieval [1].
A user's context affects how they interact with an information retrieval system, what type of
response they expect from the system and how they make decisions about the information
objects they retrieve [2]. The primary objective of an information retrieval (IR) system is to
retrieve content that is relevant to the user. Because relevance depends on who is searching
and why, information retrieval is context-dependent and subjective to the user and the
situation. In principle, an information retrieval system should therefore be context-aware [3],
incorporating the user's context into the retrieval process. This paper analyses the
importance of social tagging (folksonomy) in information retrieval and examines some of the
algorithms and methods that scholars have proposed for folksonomy-based information retrieval.
2.0 Relevance of Tags as a Metadata Source
Many researchers state that the conventional practice of having professionals formally assign
metadata is no longer the optimal way of annotating content, in terms of both efficiency and
user support. Macgregor & McCulloch [4] argue that, when applied to digital libraries and the
web, traditional metadata creation and indexing suffer from scalability problems and require
a substantial amount of resources. In an opinion piece on the web [5], Shirky argues that
“Users have a terrifically hard time guessing how something they want will have been
categorized in advance, unless they have been educated about those categories in advance as
well, and the bigger the user base, the more work that user education is.” In a nutshell, the
user tagging concept is based on two simple ideas. The first is that the keywords people use
to tag audio files, video files, web pages, photos, blog posts, etc. are similar to the keywords
they use to search for and retrieve information. Because tagging involves a human element,
tag-based search results make more sense than automated search results, which merely read
the metadata embedded in the web pages or content. The second basic foundation is
user-generated content, which has grown enormously in popularity over the past couple of
years. The Web 2.0 platform has taken user-generated content to a new level: blogging
platforms such as Blogger, WordPress and TypePad, and video sites such as YouTube and
Metacafe, allow users to tag content. The tags attached to this huge body of user-generated
content provide up-to-date information contributed by many users.
2.1 Social Filtering
Social filtering is a community-based approach and a promising complement to the existing
individual-based filtering approach. In collaborative tagging, users are motivated to
contribute tags in order to change the appearance of the tag cloud. On a website where many
users tag many items, this collection of tags becomes a folksonomy.
Tagging is not only an individual process of categorization; implicitly, it is also a social
process of indexing and of knowledge construction. Users share their resources together with
their tags, generating an aggregated tag index, the so-called folksonomy. The term
folksonomy was coined by Thomas Vander Wal on the AIfIA mailing list; it is a one-word
neologism formed from the words “folk” and “taxonomy” (Quintarelli, 2005). A folksonomy
allows anyone to access any web resource that has previously been tagged, based on two main
paradigms of information access: Information Filtering (IF) and Information Retrieval (IR).
In information filtering, the user plays a passive role, expecting the system to push
information of interest according to a previously defined profile. Social bookmarking tools
support a simple IF access model, in which a user can subscribe to a set of specific tags via
RSS/Atom syndication and thus be alerted whenever a new resource is indexed with this set
of tags.
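A minimal sketch of this subscription model is given below, assuming that a subscription to a set of tags fires only when a newly indexed resource carries all of the subscribed tags. The class and method names (TagFeed, subscribe, index) and the example URL are hypothetical; they stand in for the RSS/Atom feed machinery of a real social bookmarking service.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TagFeed:
    """Hypothetical tag subscription service illustrating the IF access model."""

    # user -> set of tags the user has subscribed to
    subscriptions: dict[str, frozenset[str]] = field(default_factory=dict)
    # resource -> set of tags the resource was indexed with
    indexed: dict[str, frozenset[str]] = field(default_factory=dict)

    def subscribe(self, user: str, tags: set[str]) -> None:
        """Register (or replace) a user's tag-set subscription."""
        self.subscriptions[user] = frozenset(tags)

    def index(self, resource: str, tags: set[str]) -> list[str]:
        """Index a newly tagged resource and return the users to alert.

        Assumed semantics: a user is alerted only if the resource carries
        every tag in that user's subscription."""
        self.indexed[resource] = frozenset(tags)
        return sorted(user for user, wanted in self.subscriptions.items()
                      if wanted <= self.indexed[resource])


feed = TagFeed()
feed.subscribe("alice", {"folksonomy", "ir"})
feed.subscribe("bob", {"music"})
print(feed.index("http://example.org/folkrank-notes", {"folksonomy", "ir", "ranking"}))
# prints ['alice']
```

After subscribing once, the user issues no further queries; the system pushes matching resources, which is what distinguishes this IF model from the IR model.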
In information retrieval, on the other hand, the user actively seeks information, pulling it by
means of querying or browsing. In tag querying, the user enters one or more tags in the search
box to obtain an ordered list of resources associated with these tags. While the user scans
this list, the system also provides a list of related tags (i.e., tags with a high degree of
co-occurrence with the original tag), allowing hypertext browsing. In such a system, a crawler
pulls the tags from popular websites that provide Application Programming Interfaces (APIs)
and, using linear interpolation, builds an index that can be sorted and ranked. Based on the
user's search query, the highest-ranked tags appear in a tag cloud, which gives the user a
visual overview of the data.
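The two IR operations described above, ranking resources for a tag query and suggesting related tags by co-occurrence, can be sketched as follows. This is an illustrative sketch, assuming the folksonomy is stored as a list of (user, tag, resource) triples and that a resource's score is simply the number of tag assignments matching the query. The function names and toy data are hypothetical, and the crawler and linear interpolation step are not modelled.

```python
from __future__ import annotations

from collections import Counter, defaultdict

# A toy folksonomy as (user, tag, resource) triples.
ASSIGNMENTS = [
    ("alice", "python", "r1"), ("bob", "python", "r1"), ("bob", "ir", "r1"),
    ("carol", "python", "r2"), ("carol", "web", "r2"),
    ("dave", "music", "r3"),
]


def tag_query(assignments: list[tuple[str, str, str]],
              query: set[str]) -> list[tuple[str, int]]:
    """Rank resources by the number of tag assignments that match a query tag."""
    scores = Counter(r for _, t, r in assignments if t in query)
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def related_tags(assignments: list[tuple[str, str, str]],
                 tag: str, k: int = 3) -> list[tuple[str, int]]:
    """Rank other tags by the number of resources on which they co-occur with `tag`."""
    tags_of = defaultdict(set)
    for _, t, r in assignments:
        tags_of[r].add(t)
    co = Counter()
    for tags in tags_of.values():
        if tag in tags:
            co.update(tags - {tag})
    return sorted(co.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


print(tag_query(ASSIGNMENTS, {"python"}))   # [('r1', 2), ('r2', 1)]
print(related_tags(ASSIGNMENTS, "python"))  # [('ir', 1), ('web', 1)]
```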
3.0 Folksonomy based IR
One of the first scientific publications about folksonomies was written by Adam Mathes [7] in
2004, introducing several concepts of bottom-up social annotation. Andreas Hotho et al. [8]
point out that folksonomy content can be searched textually using traditional information
retrieval. However, as the documents consist only of short text snippets (e.g., the web page
title and the tags themselves), ordinary ranking schemes such as TF-IDF are not feasible.
They therefore propose FolkRank, a ranking algorithm based on an adapted PageRank. In
order to employ a weight-spreading ranking scheme on folksonomies, FolkRank transforms
the hypergraph of users, tags and resources into an undirected graph. It then applies a
differential ranking approach that deals with the skewed structure of the network and the
undirectedness of folksonomies.
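For reference, the adapted PageRank on which FolkRank builds can be written as an iterative weight update. The notation follows our reading of Hotho et al. [8] and should be checked against the original paper: $A$ is the row-stochastic version of the adjacency matrix of the undirected graph $G_F$, $\vec{p}$ is a preference vector, and $\alpha$, $\beta$, $\gamma$ are non-negative constants summing to 1. The differential step subtracts a run without topic preference from a run whose preference vector favours the query.

```latex
\vec{w} \leftarrow \alpha\,\vec{w} + \beta\,A\,\vec{w} + \gamma\,\vec{p},
\qquad \alpha, \beta, \gamma \ge 0,\quad \alpha + \beta + \gamma = 1

\vec{w}_{\mathrm{FolkRank}} = \vec{w}^{(1)} - \vec{w}^{(0)}
```

Here $\vec{w}^{(0)}$ is the fixed point computed without a topic-specific preference and $\vec{w}^{(1)}$ is the fixed point computed with a preference vector that gives extra weight to the query tag. Resources with a large positive difference are those that become more important specifically because of the query, rather than because they are globally popular.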
3.1 Results for Adapted PageRank
Hotho and his team evaluated the adapted PageRank on the del.icio.us dataset. First, they
studied the speed of convergence: they set $\vec{p} := \vec{1}$ (the vector having 1 in all
components) and varied the parameter settings. In all settings, they found that
$\alpha \neq 0$ slows down the convergence rate. For instance, for $\alpha = 0.35$,
$\beta = 0.65$, $\gamma = 0$, 411 iterations were needed, while $\alpha = 0$, $\beta = 1$,
$\gamma = 0$ returned the same result in only 320 iterations. It turns out that using $\gamma$
as a damping factor, by spreading equal weight to each node in each iteration, speeds up
convergence. Convergence itself is guaranteed if each row of the matrix is normalized to 1 in
the 1-norm and there are no rank sinks; the latter holds trivially in the graph $G_F$.
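The following Python sketch implements the adapted PageRank iteration and the FolkRank differential on a small invented folksonomy, so that the roles of $\alpha$, $\beta$ and $\gamma$ can be explored. It is a simplified reading of Hotho et al. [8], not their implementation: the start vector is normalized to sum to 1 instead of being $\vec{1}$, the baseline run uses a uniform preference vector, and the names build_graph, adapted_pagerank and folkrank, as well as the default parameter values, are assumptions. The iteration counts it prints reflect only the toy graph and will not reproduce the del.icio.us figures.

```python
from collections import defaultdict


def build_graph(assignments):
    """Undirected graph G_F: each (user, tag, resource) triple adds 1 to the
    weights of the edges {u, t}, {t, r} and {u, r} (co-occurrence counts)."""
    w = defaultdict(lambda: defaultdict(float))
    for u, t, r in assignments:
        a, b, c = ("u", u), ("t", t), ("r", r)
        for x, y in ((a, b), (b, c), (a, c)):
            w[x][y] += 1.0
            w[y][x] += 1.0
    return w


def adapted_pagerank(w, p, alpha, beta, gamma, tol=1e-12, max_iter=10_000):
    """Iterate w <- alpha*w + beta*A*w + gamma*p until the L1 change is below tol.
    A spreads each node's weight to its neighbours in proportion to edge weight.
    Returns (weights, number_of_iterations)."""
    nodes = list(w)
    degree = {n: sum(w[n].values()) for n in nodes}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for iteration in range(1, max_iter + 1):
        spread = dict.fromkeys(nodes, 0.0)
        for n in nodes:
            for m, weight in w[n].items():
                spread[m] += rank[n] * weight / degree[n]
        new = {n: alpha * rank[n] + beta * spread[n] + gamma * p[n] for n in nodes}
        if sum(abs(new[n] - rank[n]) for n in nodes) < tol:
            return new, iteration
        rank = new
    return rank, max_iter


def folkrank(assignments, query_tag, beta=0.7, gamma=0.3):
    """Differential ranking: preference run minus uniform-preference baseline."""
    w = build_graph(assignments)
    uniform = {n: 1.0 / len(w) for n in w}
    baseline, _ = adapted_pagerank(w, uniform, 0.0, beta, gamma)
    # Preference: uniform plus one extra unit on the query tag, renormalized to sum 1.
    pref = dict(uniform)
    pref[("t", query_tag)] += 1.0
    total = sum(pref.values())
    pref = {n: x / total for n, x in pref.items()}
    topic, _ = adapted_pagerank(w, pref, 0.0, beta, gamma)
    return {n: topic[n] - baseline[n] for n in w}


data = [
    ("alice", "python", "r1"), ("bob", "python", "r1"), ("bob", "ir", "r1"),
    ("carol", "python", "r2"), ("carol", "web", "r2"), ("dave", "music", "r3"),
]
scores = folkrank(data, "python")
print(sorted((n for n in scores if n[0] == "r"), key=scores.get, reverse=True))

# Convergence behaviour for a few (alpha, beta, gamma) settings on the toy graph.
graph = build_graph(data)
start = {n: 1.0 / len(graph) for n in graph}
for a, b, c in [(0.35, 0.65, 0.0), (0.0, 1.0, 0.0), (0.0, 0.85, 0.15)]:
    _, iterations = adapted_pagerank(graph, start, a, b, c)
    print(a, b, c, iterations)
```

Because every node of this graph lies on at least one user-tag-resource triangle, every node has a positive degree, so no weight is lost and each iteration keeps the total weight at 1.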
4.0 GroupMe Folksonomy
Fabian Abel et al., in their paper “Analyzing Ranking Algorithms in Folksonomy Systems”,
introduce a concept named the ‘GroupMe! folksonomy’. GroupMe! is a resource sharing
system, like del.icio.us or BibSonomy, with the additional feature of grouping Web resources.
These groups can contain arbitrary multimedia resources such as websites, photos or videos,
which are visualized according to their media type: e.g., images are displayed as thumbnails,
and the headlines from RSS feeds are structured so that the most recent information is
accessible with just one click. With this convenient visualization strategy, the user can grasp
the content immediately without needing to visit the original Web resource. GroupMe!
motivates users to tag resources through a free-for-all tagging approach, which enables users
to tag not only their own resources but all resources within the GroupMe! system. In one
experiment, Abel et al. plotted, on a logarithmic scale (extended with zero), the number of tag
assignments on the y-axis and the number of resources having this number of tags assigned
on the x-axis. They observed a power-law distribution of tag assignments per resource, with
about 50% of all resources not having even a single tag assignment. They infer that 50% of all
resources in the GroupMe! system can hardly be found by known folksonomy-based search
and ranking algorithms.
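The measurement behind this observation can be reproduced on any folksonomy with a few lines of Python. The sketch below assumes the data are available as a list of resources plus a list of (user, tag, resource) triples; the function name and toy data are hypothetical. Resources with zero tag assignments are counted explicitly, which is why a plain logarithmic axis, on which zero cannot be shown, has to be extended with zero.

```python
from collections import Counter


def tag_assignment_stats(resources, assignments):
    """Return (histogram, untagged_share) for a folksonomy.

    histogram maps k -> number of resources with exactly k tag assignments;
    untagged_share is the fraction of resources with no tag assignment at all."""
    per_resource = Counter(r for _, _, r in assignments)
    # Counter returns 0 for missing keys, so untagged resources count as k = 0.
    histogram = Counter(per_resource[r] for r in resources)
    untagged_share = histogram[0] / len(resources) if resources else 0.0
    return dict(sorted(histogram.items())), untagged_share


resources = ["r1", "r2", "r3", "r4"]
assignments = [("alice", "python", "r1"), ("bob", "ir", "r1"), ("alice", "python", "r2")]
print(tag_assignment_stats(resources, assignments))
# ({0: 2, 1: 1, 2: 1}, 0.5)
```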
They proposed three new folksonomy-based algorithms:
GFolkRank: a graph-based ranking algorithm that extends FolkRank [8] and turns it into a
group-sensitive algorithm in order to exploit GroupMe! folksonomies.
Personalized SocialPageRank: an extension of SocialPageRank that allows topic-sensitive
rankings.
GRank: a search and ranking algorithm optimized for GroupMe! folksonomies.
4.1 GFolkRank
GFolkRank interprets groups as artificial, unique tags. If a user $u$ adds a resource $r$ to a
group $g$, GFolkRank interprets this as a tag assignment $(u, t_g, r, \varepsilon)$, where
$t_g \in T_G$ is the artificial tag that identifies the group. The folksonomy graph $G_F$ is
extended with additional vertices and edges. The set of vertices is expanded with the set of
artificial tags $T_G$: $V_G = V_F \cup T_G$. Furthermore, the set of edges $E_F$ is augmented to
$E_G = E_F \cup \{\{u, t_g\}, \{t_g, r\}, \{u, r\} \mid u \in U,\ t_g \in T_G,\ r \in \breve{R},\ u \text{ has added } r \text{ to group } g\}$.
The new edges are weighted with a constant value $w_c$, as a resource is usually added only
once to a certain group. The authors selected $w_c = 5.0 \cdot \max(|w(t, r)|)$ because they
believed that grouping a resource is, in general, more valuable than tagging it. GFolkRank is
consequently the FolkRank algorithm operating on $G_G = (V_G, E_G)$.
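The graph extension can be sketched as follows. This is a minimal sketch under several assumptions: tag assignments and group additions are given as (user, tag, resource) and (user, group, resource) triples, $\max(|w(t, r)|)$ is read as the largest weight of any tag-resource edge, and when a group edge coincides with an existing edge (for example $\{u, r\}$) the constant $w_c$ is added to the existing weight. The function name is hypothetical. FolkRank itself would then be run unchanged on the returned graph.

```python
from collections import defaultdict


def build_gfolkrank_graph(tag_assignments, group_additions, factor=5.0):
    """Build G_G = (V_G, E_G) as a nested dict of edge weights.

    tag_assignments: (user, tag, resource) triples, as in FolkRank.
    group_additions: (user, group, resource) triples: the user added the resource to the group.
    Each group g becomes an artificial tag node ("tg", g)."""
    w = defaultdict(lambda: defaultdict(float))

    def add_edge(x, y, weight):
        w[x][y] += weight
        w[y][x] += weight

    # Edges of G_F: co-occurrence counts from ordinary tag assignments.
    for u, t, r in tag_assignments:
        user, tag, res = ("u", u), ("t", t), ("r", r)
        add_edge(user, tag, 1.0)
        add_edge(tag, res, 1.0)
        add_edge(user, res, 1.0)

    # w_c = factor * max |w(t, r)| over existing tag-resource edges (assumed reading).
    tag_resource = [wt for x, nbrs in w.items() if x[0] == "t"
                    for y, wt in nbrs.items() if y[0] == "r"]
    w_c = factor * max(tag_resource, default=1.0)

    # Each group addition becomes a tag assignment (u, t_g, r) with constant weight w_c.
    for u, g, r in group_additions:
        user, group_tag, res = ("u", u), ("tg", g), ("r", r)
        add_edge(user, group_tag, w_c)
        add_edge(group_tag, res, w_c)
        add_edge(user, res, w_c)  # assumption: added on top of any existing {u, r} weight
    return w


graph = build_gfolkrank_graph(
    [("alice", "python", "r1"), ("bob", "python", "r1")],
    [("carol", "g1", "r2")],
)
print(graph[("tg", "g1")][("r", "r2")])  # 10.0
```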
4.2 Personalized SocialPageRank
SocialPageRank [10], introduced by Bao et al., is based on the observation that there is a
strong interdependency between the popularity of users, tags and resources within a
folksonomy. For example, resources become popular when they are annotated by many users
with popular tags, while tags, in turn, become popular when many users attach them to
popular resources. Personalized SocialPageRank extends SocialPageRank into a
topic-sensitive ranking algorithm. It emphasizes weights within the input matrices of
SocialPageRank so that preferences for a certain context can be taken into account.
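The mutual reinforcement idea can be sketched as a power iteration that pushes scores back and forth between resources, users and tags. The propagation order follows our reading of Bao et al. [10]. The L1 normalization, the fixed number of iterations and the way the optional preference argument multiplies the weights of preferred tags (a stand-in for the topic-sensitive weighting of Personalized SocialPageRank) are assumptions of this sketch, and all names and toy data are hypothetical.

```python
from collections import Counter, defaultdict


def _normalize(scores):
    """Scale scores so that they sum to 1 (L1 normalization)."""
    total = sum(scores.values()) or 1.0
    return {k: v / total for k, v in scores.items()}


def _propagate(scores, edges, reverse=False):
    """Push scores along weighted pairs (a, b): from a to b, or from b to a if reverse."""
    out = defaultdict(float)
    for (a, b), weight in edges.items():
        if reverse:
            out[a] += weight * scores.get(b, 0.0)
        else:
            out[b] += weight * scores.get(a, 0.0)
    return _normalize(out)


def social_pagerank(assignments, preference=None, iterations=50):
    """Mutual reinforcement between resource, user and tag popularity.

    preference (hypothetical personalization): tag -> factor that scales the
    user-tag and tag-resource weights of that tag."""
    preference = preference or {}
    user_res, user_tag, tag_res = Counter(), Counter(), Counter()
    for u, t, r in assignments:
        boost = preference.get(t, 1.0)
        user_res[(u, r)] += 1.0
        user_tag[(u, t)] += boost
        tag_res[(t, r)] += boost
    resources = _normalize({r: 1.0 for _, _, r in assignments})
    for _ in range(iterations):
        users = _propagate(resources, user_res, reverse=True)  # resources -> users
        tags = _propagate(users, user_tag)                     # users -> tags
        res2 = _propagate(tags, tag_res)                       # tags -> resources
        tags2 = _propagate(res2, tag_res, reverse=True)        # resources -> tags
        users2 = _propagate(tags2, user_tag, reverse=True)     # tags -> users
        resources = _propagate(users2, user_res)               # users -> resources
    return resources


data = [
    ("alice", "python", "r1"), ("bob", "python", "r1"), ("bob", "ir", "r1"),
    ("carol", "python", "r2"), ("carol", "web", "r2"), ("dave", "music", "r3"),
]
plain = social_pagerank(data)
music = social_pagerank(data, preference={"music": 10.0})
print(max(plain, key=plain.get), max(music, key=music.get))
```

With a strong preference for the tag music, the resource annotated with that tag moves to the top even though it is unpopular in the unpersonalized ranking.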
4.3 GroupMe! Ranking Algorithm (GRank)
GRank is a search and ranking algorithm optimized for GroupMe! folksonomies. It computes a
ranking for all resources that are related to a query tag $t_q$, taking into account the group
structure of GroupMe! folksonomies.
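The exact GRank procedure is not reproduced here. As an illustration of the underlying idea only, the hypothetical function below lets a resource inherit part of the query-tag weight of the other resources in its groups, so that even untagged resources (about half of all GroupMe! resources, as noted in Section 4.0) can appear in the result list. The scoring rule, the group_weight parameter and the toy data are assumptions, not part of GRank as published.

```python
from collections import defaultdict


def group_aware_rank(assignments, groups, query_tag, group_weight=0.5):
    """Hypothetical group-sensitive ranking (illustration only, NOT the published GRank).

    assignments: (user, tag, resource) triples.
    groups: group id -> set of resources contained in that group.
    A resource scores 1 per assignment of query_tag to it, plus group_weight
    times the query_tag score of the other members of every group it belongs to."""
    direct = defaultdict(float)
    for _, t, r in assignments:
        if t == query_tag:
            direct[r] += 1.0
    score = defaultdict(float, direct)
    for members in groups.values():
        group_total = sum(direct.get(r, 0.0) for r in members)
        for r in members:
            score[r] += group_weight * (group_total - direct.get(r, 0.0))
    ranked = [(r, s) for r, s in score.items() if s > 0]
    return sorted(ranked, key=lambda rs: (-rs[1], rs[0]))


assignments = [("alice", "python", "r1"), ("bob", "python", "r1"), ("carol", "python", "r2")]
groups = {"g1": {"r1", "r3"}}  # r3 carries no tags but shares a group with r1
print(group_aware_rank(assignments, groups, "python"))
# [('r1', 2.0), ('r2', 1.0), ('r3', 1.0)]
```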
4.4 Comparison of Algorithms
A study conducted by Fabian Abel et al. on the del.icio.us dataset shows that GFolkRank
performed better than FolkRank and SocialPageRank in terms of overlapping similarity.
They then tested the algorithms on untagged resources (since, in practice, many resources are
not tagged at all), where FolkRank showed a better result.
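Assuming that “overlapping similarity” refers to the usual top-n overlap measure for comparing two rankings (often called OSim), it can be computed as $|A \cap B| / n$, where $A$ and $B$ are the sets of the top $n$ items of the two rankings. The function name is hypothetical, and the measure deliberately ignores the order of items within the top $n$.

```python
def osim(ranking_a, ranking_b, n):
    """Top-n overlap similarity: |top_n(a) & top_n(b)| / n (order within the top n is ignored)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return len(set(ranking_a[:n]) & set(ranking_b[:n])) / n


print(osim(["r1", "r2", "r3"], ["r2", "r1", "r4"], n=3))  # 2/3 of the top 3 overlap
```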
5.0 Web page Recommender system using folksonomy
Satoshi Niwa et al. [11], in their paper “Web Page Recommender System based on
Folksonomy Mining”, describe algorithms for several aspects of folksonomy mining:
computing the affinity level between users and tags, computing the similarity between tags,
clustering tags, computing the affinity level between users and tag clusters, and calculating
recommended pages for each user.
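A strongly simplified version of this pipeline is sketched below. It covers only the first and last steps, user-tag affinity and page recommendation; tag similarity and tag clustering are omitted. The affinity definition (the share of a user's tag assignments that use a given tag), the scoring rule, the function names and the toy data are assumptions made for illustration, not the formulas of Niwa et al.

```python
from collections import Counter, defaultdict


def user_tag_affinity(assignments):
    """Hypothetical affinity: share of a user's tag assignments that use each tag."""
    per_user = defaultdict(Counter)
    for u, t, _ in assignments:
        per_user[u][t] += 1
    return {u: {t: c / sum(tags.values()) for t, c in tags.items()}
            for u, tags in per_user.items()}


def recommend(assignments, user, k=3):
    """Score pages the user has not bookmarked: for each tag on a page, add the
    user's affinity for that tag times how often the page received that tag."""
    affinity = user_tag_affinity(assignments).get(user, {})
    seen = {r for u, _, r in assignments if u == user}
    page_tags = defaultdict(Counter)
    for _, t, r in assignments:
        page_tags[r][t] += 1
    scores = {r: sum(affinity.get(t, 0.0) * c for t, c in tags.items())
              for r, tags in page_tags.items() if r not in seen}
    ranked = sorted(((r, s) for r, s in scores.items() if s > 0),
                    key=lambda rs: (-rs[1], rs[0]))
    return ranked[:k]


data = [
    ("alice", "python", "r1"), ("alice", "ir", "r1"),
    ("bob", "python", "r2"), ("bob", "music", "r3"), ("carol", "ir", "r4"),
]
print(recommend(data, "alice"))  # [('r2', 0.5), ('r4', 0.5)]
```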
6.0 Conclusion
The current trend shows an increasing presence of the real-time web in all areas, and the
amount of data generated by folksonomies is growing exponentially. Indexing and ranking
these data is therefore a great challenge for search engines. The algorithms described above
achieve only partial success in their experimental stages. As social bookmarking and
folksonomies are inevitably related to personalized human behavior and cognition, more
research is needed in this area, in collaboration with fields such as natural language processing.
7.0 References
1. Robert Graham, Brian Eoff, and James Caverlee, “Plurality: A Context-Aware Personalized
Tagging System”, WWW 2008, April 21-25, 2008, Beijing, China.
2. Fabio Crestani and Ian Ruthven, “Introduction to special issue on contextual information
retrieval systems”, Information Retrieval, Vol. 10, No. 2, pp. 111-113.
3. Massimo Melucci, “A basis for information retrieval in context”, ACM Transactions on
Information Systems, Vol. 26, No. 3, ACM Press, June 2008.
4. G. Macgregor and E. McCulloch, “Collaborative tagging as a knowledge organization and
resource discovery tool”, Library Review, 55 (5), pp. 291-300.
5. C. Shirky, “Ontologies are overrated: categories, links, and tags”, Clay Shirky’s writings
about the Internet. Retrieved from: http://www.shirky.com/writings/ontology_overrated.html
6. J. Trant, “Exploring the potential for social tagging and folksonomy in art museums:
Proof of concept”, New Review of Hypermedia and Multimedia, Volume 12 (1), June 2006,
pp. 83-105.
7. Adam Mathes, “Folksonomies: Cooperative Classification and Communication Through
Shared Metadata”, December 2004.
http://www.adammathes.com/academic/computer-mediated-communication/folksonomies.html
8. Andreas Hotho, Robert Jäschke, Christoph Schmitz, and Gerd Stumme, “Information
Retrieval in Folksonomies: Search and Ranking”, Proceedings of the 3rd European Semantic
Web Conference, Budva, Montenegro, pp. 411-426, 2006.
9. Andreas Hotho, Robert Jäschke, Christoph Schmitz, and Gerd Stumme, “Information
Retrieval in Folksonomies: Search and Ranking”.
10. S. Bao, G. Xue, X. Wu, Y. Yu, B. Fei, and Z. Su, “Optimizing Web Search using Social
Annotations”, In Proc. of the 16th Int. World Wide Web Conference (WWW '07),
pp. 501-510, ACM Press, 2007.