SEMANTIC
RECOMMENDATION SYSTEMS
FOR RESEARCH 2.0
OR
A Conceptual Prototype for a Twitter based Recommender
System for Research 2.0
by Patrick Thonhauser
Thursday, October 11, 12
OUTLINE
• Motivation
• Basics (Semantic Web, Recommender Systems, Natural
Language Processing)
• Conceptual Prototype
• Test results and Discussion
• Questions
MOTIVATION
• Is Twitter useful for discovering new connections between
researchers in similar subject areas (and why Twitter)?
• How much information can we extract from 140-character
strings?
• Is it possible to separate useful information from noise?
• Are there any appropriate classifiers and metrics to measure
the significance of Twitter users and Tweets?
SEMANTIC WEB
• Additional Layer of Information
• Linked Data (use URIs as names, use HTTP URIs, use
standards to provide Information, include links to other URIs)
• RDF (based on triples -> subject, predicate, object) is like
HTML for the classic web
• Nearly all semantic web standards are based on RDF (like
FOAF - Friend of a Friend Project)
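The triple model can be illustrated without any RDF library. The following is a minimal sketch, not part of the prototype: the example.org URIs and the helper names `objects` and `friends_of` are made up for illustration; only the FOAF namespace and its `name` and `knows` terms are real vocabulary.

```python
# Minimal in-memory triple store: every fact is (subject, predicate, object).
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical placeholder URIs, used only for this illustration.
ALICE = "http://example.org/people/alice"
BOB = "http://example.org/people/bob"

triples = {
    (ALICE, FOAF + "name", "Alice"),
    (BOB, FOAF + "name", "Bob"),
    (ALICE, FOAF + "knows", BOB),
}


def objects(subject, predicate):
    """Return all objects stored for a given subject and predicate."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def friends_of(person):
    """Return the URIs that `person` is linked to via foaf:knows."""
    return objects(person, FOAF + "knows")
```

Because subjects and objects are URIs, `friends_of(ALICE)` returns a name that can itself be looked up, which is the "include links to other URIs" rule in miniature.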
RECOMMENDER SYSTEMS
• Collaborative Filtering (user based/ item based)
• Content Based Recommendation
• Knowledge Based Recommendation
• Hybrid Recommendations
NATURAL LANGUAGE
PROCESSING (NLP)
• Classification of Microtext Artefacts (e.g. "This presentation is killer!")
• Applying NLP - Pipelines
• End of Sentence Detection
• Tokenization
• POS Tagging
• Chunking
• Extraction
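The first two pipeline stages can be sketched with regular expressions alone. This is a rough illustration under simplifying assumptions (sentences end at ".", "!" or "?" followed by whitespace), not the tooling used in the prototype; the function names are hypothetical. POS tagging and chunking need a trained tagger and are not shown.

```python
import re


def split_sentences(text):
    """Naive end-of-sentence detection: split after ., ! or ? plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Split a sentence into word tokens and single punctuation tokens.

    A leading @ or # is kept so that mentions and hashtags stay
    recognisable for later stages.
    """
    return re.findall(r"[@#]?\w+(?:'\w+)?|[^\w\s]", sentence)


sentences = split_sentences("This presentation is killer! See you soon.")
tokens = [tokenize(s) for s in sentences]
```

Abbreviations such as "e.g." would break this splitter, which is one reason real pipelines use trained sentence detectors.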
THE CONCEPT
OF THOUGHT
BUBBLES
Let’s imagine every Twitter
user belongs to several
different topic related Bubbles
LET’S SUMMARIZE
• A user is part of topic-related bubbles
• Twitter users within a topic-related bubble don’t necessarily
know each other
• Connections of the service user’s existing connections
lead to new information
• Non-bidirectional connections are preferred
So how can we find such potentially
interesting users?
PROOF OF CONCEPT SYSTEM
(1) Preselection of the user set, which will be analyzed in depth
(2) Apply NLP pipeline for measuring user similarity
(3) Categorize the top-n best scoring users according to the idea of
Thought Bubbles
(4) Recommend top-n best scoring users of a category to the user
(5) Analyze acceptance of recommendations
[Diagram: a user's Thought Bubbles (Sports, iOS Dev, Social Media), the
service user, the Twitter REST API, the Thought Bubbles API, and a server
with the components pre-filtering, NLP, clustering, categorisation,
analyze recs, and DB]
(1) PRE-SELECTION/FILTERING
Friends of Friends Twitter Accounts
-> Filter accounts that are already connected to you
-> Filter accounts where: follower_count < 300, status_count < 1000
-> Filter non-English-speaking accounts
-> Identify people by using a simple NLP pipeline
-> Set of Twitter accounts for further processing
• The set of friends-of-friends Twitter accounts changes from
iteration to iteration
• Filters are added after analyzing the acceptance of
recommendations
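The filter chain above can be sketched as a single pass over candidate accounts. This is an illustration under stated assumptions, not the prototype's code: the `Account` class and its `lang` field are hypothetical, accounts below either threshold are assumed to be the ones dropped, and the "identify people" step is left out because it needs the NLP pipeline.

```python
from dataclasses import dataclass


@dataclass
class Account:
    """Hypothetical stand-in for a Twitter user record."""
    screen_name: str
    follower_count: int
    status_count: int
    lang: str


MIN_FOLLOWERS = 300   # threshold from the slide
MIN_STATUSES = 1000   # threshold from the slide


def prefilter(candidates, already_following):
    """Keep friends-of-friends accounts that pass all three filters."""
    kept = []
    for acc in candidates:
        if acc.screen_name in already_following:
            continue  # already connected to the service user
        # Assumption: falling below either threshold removes the account.
        if acc.follower_count < MIN_FOLLOWERS or acc.status_count < MIN_STATUSES:
            continue
        if acc.lang != "en":
            continue  # non-English-speaking account
        kept.append(acc)
    return kept
```

Keeping the thresholds as named constants makes it easy to add or tighten filters after each round of analyzing recommendation acceptance.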
(2) NLP PIPELINE
Raw Tweets
-> Tokenization and stripping @mentions and URLs
-> POS tagging
-> POS tagged Tweets
-> Neglect 200 most used English words
-> Chunking
-> Mined nouns and phrases
-> Frequency Distribution
-> Set of frequency distributed mined nouns and phrases
-> Filter top n words
-> DB
Example:
Raw Tweet: @testuser The grand jury commented on a number of…
POS tagged Tweet: [('The', 'AT'), ('grand', 'JJ'), ('jury', 'NN'),
('commented', 'VBD'), ('on', 'IN'), ('a', 'AT'), ('number', 'NN'), ...
('.', '.')]
Mined nouns and phrases: [('jury', 'NN'), ('number', 'NN'),
('social dayly', 'NP'), ...]
Frequency distribution: [('jury', 34), ('social', 23),
('test case', 16), ...]
The 400 most recent Tweets of a potential recommendation are
used for calculating the similarity measure
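The frequency-distribution and similarity steps can be sketched with the standard library. Several things here are assumptions rather than the prototype's method: the slides do not name the similarity measure, so cosine similarity is used as a stand-in; `STOPWORDS` is a tiny placeholder for the 200 most used English words; and all remaining tokens are counted instead of only nouns and noun phrases, because POS tagging and chunking are skipped.

```python
import math
import re
from collections import Counter

# Placeholder for the list of the 200 most used English words.
STOPWORDS = {"the", "a", "on", "of", "and", "to", "is", "in"}


def clean(tweet):
    """Strip @mentions and URLs from a raw tweet."""
    return re.sub(r"@\w+|https?://\S+", " ", tweet)


def term_frequencies(tweets, top_n=50):
    """Count non-stopword tokens over all tweets and keep the top n."""
    counts = Counter()
    for tweet in tweets:
        for token in re.findall(r"[a-z]+", clean(tweet).lower()):
            if token not in STOPWORDS:
                counts[token] += 1
    return dict(counts.most_common(top_n))


def cosine_similarity(a, b):
    """Cosine similarity of two term-frequency dicts (assumed measure)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0
```

In use, `term_frequencies` would be applied to the most recent tweets of the service user and of each candidate, and the two resulting profiles compared with `cosine_similarity`.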
• Calculate top-n users by applying single-linkage
clustering
• Categorize users by whether they belong to user-specific bubbles
• Present recommendation lists to users
• Analyze acceptance of recommendations (connect
user accounts with FOAF) and add a new filter
predicate if necessary
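Single-linkage clustering merges the two clusters whose closest members are nearest to each other, and repeats. The slides do not spell out how similarity scores are fed into the clustering, so the following is only a generic sketch: the `distance` callback and the stopping `threshold` are assumptions, and the naive search is meant for small inputs.

```python
def single_linkage(items, distance, threshold):
    """Agglomerative single-linkage clustering with a distance cut-off.

    The distance between two clusters is the minimum pairwise distance
    between their members. Merging stops once the closest pair of
    clusters is further apart than `threshold`.
    """
    clusters = [[item] for item in items]
    while len(clusters) > 1:
        best = None  # (distance, index_i, index_j) of the closest pair
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(distance(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters


# Toy usage with numbers; for users, `distance` could be 1 - similarity.
groups = single_linkage([1, 2, 3, 10, 11], lambda a, b: abs(a - b), 1.5)
```

With user profiles, each resulting cluster would correspond to a group of accounts with similar vocabulary, which matches the idea of a topic-related bubble.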
UNSUPERVISED TEST RESULTS
The probability that a recommended item is relevant is
64.4%. Standard deviation: 31.5%
DISCUSSION
Twitter IS useful for discovering new information in the sense of
Research 2.0, but:
• Recommendations reflect the Twitter behavior of the user
• Automated tweets harm recommendation results (one
sentence gets an enormous weight because it occurs very
often)
• Twitter’s request limitation is a showstopper
• Comparison to similar systems (Content and collaborative
filtering)
THANK YOU!
ANY QUESTIONS?