Profiling User Interests on the Social Semantic Web

Ph.D. Viva
Fabrizio Orlandi

1 – Heterogeneous data sources
Sport
CEV Volleyball Cup
Music
Heavy Metal
Mastodon
Atlanta
…
Microblog? Social Networking Service?
Challenges

2 – Lack of provenance
Sport
CEV Volleyball Cup
Music
Heavy Metal
Mastodon
Atlanta
…
Who? What? Where? How?
Challenges

3 – Semantics of entities of interest
Sport
CEV Volleyball Cup
Music
Heavy Metal
Mastodon
Atlanta
…
Semantics?
Pragmatics?
Relevance?
Challenges

Research Questions
1. Aggregation of Social Web data:
How can we aggregate and represent user data distributed across
heterogeneous social media systems for profiling user interests?
2. Provenance of data for user profiling:
What is the role of provenance on the Social Web and on the Web of
Data, and how can we leverage its potential for user profiling?
3. Semantic enrichment of user profiles and personalisation:
How can we combine data from the Social and Semantic Web to enrich
user profiles of interests and deploy them in different
personalisation tasks?

Research Goal
How can we collect, represent, aggregate, mine, enrich and
deploy user profiles of interests on the Social Web for
multi-source personalisation?


Aggregation of Social Web Data
• Modelling solution for Social Web data and user profiles
• Based on SIOC, FOAF and extensions
• Experiments on wikis
[Orlandi, Passant. WikiSym. ACM. 2010.]
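A minimal sketch of how one post and its author could be described with FOAF and SIOC terms. The example.org URIs, the helper names and the sample text are hypothetical, and the extensions mentioned on the slide are not covered. Only the namespace URIs and term names come from the published FOAF and SIOC vocabularies.

```python
from dataclasses import dataclass

FOAF = "http://xmlns.com/foaf/0.1/"
SIOC = "http://rdfs.org/sioc/ns#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass(frozen=True)
class Literal:
    """Plain string literal, kept distinct from URI strings."""
    value: str


def describe_post(person_uri, account_uri, post_uri, text):
    """Return (subject, predicate, object) triples linking a person,
    one of their social media accounts, and a post made with it."""
    return [
        (person_uri, RDF_TYPE, FOAF + "Person"),
        (person_uri, FOAF + "account", account_uri),
        (account_uri, RDF_TYPE, SIOC + "UserAccount"),
        (account_uri, SIOC + "account_of", person_uri),
        (post_uri, RDF_TYPE, SIOC + "Post"),
        (post_uri, SIOC + "has_creator", account_uri),
        (post_uri, SIOC + "content", Literal(text)),
    ]


def to_ntriples(triples):
    """Serialise the triples as N-Triples lines (basic escaping only)."""
    lines = []
    for s, p, o in triples:
        if isinstance(o, Literal):
            escaped = (o.value.replace("\\", "\\\\")
                              .replace('"', '\\"')
                              .replace("\n", "\\n"))
            obj = '"' + escaped + '"'
        else:
            obj = "<" + o + ">"
        lines.append("<" + s + "> <" + p + "> " + obj + " .")
    return "\n".join(lines)


# Hypothetical usage: one person, one microblog account, one post.
triples = describe_post(
    "http://example.org/people/alice",
    "http://example.org/accounts/alice-microblog",
    "http://example.org/posts/1",
    "Can't wait to see them live again!",
)
print(to_ntriples(triples))
```

Keeping the person (foaf:Person) separate from each account (sioc:UserAccount) is what lets several accounts from different services point back to the same user.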

Example
Music, Heavy Metal, Mastodon, Atlanta, CEV Champions League, Volleyball, Semantic Web, RDF
“Mastodon is the best heavy metal band from Atlanta… Can’t wait to see them live again!”
“Trentino vs Lugano about to start - Diatec youngster to impress again in CEV Champions League #volleyball”
User likes RDF and SemanticWeb on Facebook
• Natural language processing tools for entity extraction (Zemanta & Spotlight)
• Frequency + time-decay weighting schemes

Aggregation and Mining of Interests
7 types of user profiling strategies:
2 types of DBpedia entities: Categories vs. Resources
2 types of weighting scheme for category-based methods
- Cat1: Interests Weight Propagation
- Cat2: Interests Weight Propagation w/ Cat. Discount
2 types of exponential Time Decay function
- Short mean lifetime: 120 days
- Long mean lifetime: 360 days
1 “bag-of-words” (Tag-based) state-of-the-art approach
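A minimal sketch of a frequency plus exponential time-decay weighting, assuming the standard decay form exp(-age / mean lifetime) and assuming that an entity's weight is the sum of the decayed contributions of its mentions. The slides name the two mean lifetimes (120 and 360 days) but do not give the formula, so the function name and the way frequency and decay are combined are illustrative.

```python
import math
from collections import defaultdict
from datetime import datetime


def decayed_weights(mentions, now, mean_lifetime_days):
    """Sum exp(-age / mean_lifetime) over every mention of each entity.

    mentions: iterable of (entity, datetime) pairs.
    A mention made today counts as 1.0; older mentions count for less.
    """
    weights = defaultdict(float)
    for entity, when in mentions:
        # Clamp at zero so a mention dated after `now` is not boosted.
        age_days = max(0.0, (now - when).total_seconds() / 86400.0)
        weights[entity] += math.exp(-age_days / mean_lifetime_days)
    return dict(weights)


# Hypothetical mentions: two of "Mastodon" (today and 120 days ago)
# and one of "RDF" (360 days ago).
now = datetime(2013, 1, 1)
mentions = [
    ("Mastodon", datetime(2013, 1, 1)),
    ("Mastodon", datetime(2012, 9, 3)),
    ("RDF", datetime(2012, 1, 7)),
]
short = decayed_weights(mentions, now, mean_lifetime_days=120)
long_ = decayed_weights(mentions, now, mean_lifetime_days=360)
for entity in sorted(short):
    print(entity, round(short[entity], 3), round(long_[entity], 3))
```

With the longer mean lifetime, old mentions keep more of their weight, so the profile changes more slowly.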

Aggregation and Mining of Interests
• Evaluation
• User study: 21 users rating their user profiles from Twitter & Facebook
• 210 ratings for each of the 7 different profiling methods
[Figure: P@10 and AVG Score for each profiling method, on a 0 to 1 scale]
• Key findings
• DBpedia resource-based profiles outperform DBpedia category-based and tag-based profiles.
• Best strategy: Resources + Frequency & Slow Time Decay weighting scheme
[Orlandi, Breslin, Passant. I-Semantics. ACM. 2012.]


Provenance of Data
Motivation: use of provenance information as core of the profiling heuristics to improve mining of user interests and semantic enrichment
• Data Provenance as the history, the origins and the evolution of data
Who created/modified it? When? What is the content? Where is it located?
How and Why was it created? Which tools and processes were used?
• Provenance as the “bridge” between Social Web and Web of Data, e.g. Wikipedia/DBpedia

Provenance on the Social Web for the Web of Data
Use Case: Provenance on Wikis
• A semantic model to represent provenance information in wikis
• A software architecture to extract provenance from Wikipedia
• An application that uses and exposes provenance data to compute measures and statistics on Wikipedia articles
[Orlandi, Champin, Passant. SWPM at ISWC. 2010.]

Provenance on the Social Web

Provenance on the Web of Data for the Social Web
Use Case: Provenance on DBpedia
• Using detailed provenance information extracted from Wikipedia, we are able to compute provenance for DBpedia resources as well.
• By analyzing the “diffs” between the revisions of Wikipedia articles and the users' contributions, we identify the edits on Wikipedia that resulted in a change in the related DBpedia resource.
• We built a model and an application that show provenance information for each DBpedia triple that results from users' edits on Wikipedia.
[Orlandi, Passant. Journal of Web Semantics. 2011]
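A minimal sketch of the diff idea, assuming each article revision has already been mapped to the set of triples it would yield (that extraction step is not shown). The `Revision` type, the record fields, the ordering by revision id and the sample data are hypothetical and are not the model or application described in the cited work.

```python
from typing import NamedTuple


class Revision(NamedTuple):
    rev_id: int          # assumed to increase chronologically
    editor: str
    timestamp: str
    triples: frozenset   # set of (subject, predicate, object) tuples


def triple_provenance(revisions):
    """Compare consecutive revisions and attribute every added or
    removed triple to the editor of the revision that changed it."""
    records = []
    previous = frozenset()
    for rev in sorted(revisions, key=lambda r: r.rev_id):
        for action, changed in (("added", rev.triples - previous),
                                ("removed", previous - rev.triples)):
            for triple in sorted(changed):
                records.append({
                    "triple": triple,
                    "action": action,
                    "editor": rev.editor,
                    "revision": rev.rev_id,
                    "timestamp": rev.timestamp,
                })
        previous = rev.triples
    return records


# Hypothetical revision history of one article.
hometown = ("db:Mastodon_(band)", "dbo:hometown", "db:Atlanta")
genre = ("db:Mastodon_(band)", "dbo:genre", "db:Heavy_metal_music")
history = [
    Revision(1, "alice", "2010-01-01T10:00:00Z", frozenset({hometown})),
    Revision(2, "bob", "2010-02-01T10:00:00Z", frozenset({hometown, genre})),
    Revision(3, "carol", "2010-03-01T10:00:00Z", frozenset({genre})),
]
for record in triple_provenance(history):
    print(record["revision"], record["editor"], record["action"], record["triple"])
```

Revisions that leave the extracted triples unchanged produce no records, which matches the idea of keeping only the edits that affected the related resource.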

Semantic provenance in DBpedia

Provenance for Profiling Interests
Different provenance features to support interest mining
• Not only: authorship and temporal features
• But also: social media source, object, type of action, …

Provenance for Profiling Interests
User study: 27 users on Twitter and Facebook
They evaluated their aggregated and provenance-aware user profiles
Type  Source  Social Feature     Score
E     FB      education          4.62
E     FB      workplace          4.60
I     TW      followees’ posts   4.03
I     FB      checkins           3.95
E     FB      interests          3.95
E     FB      likes              3.92
I     TW      favourite posts    3.76
I     TW      retweets           3.76
I     TW      posts              3.61
I     TW      replies            3.52
I     FB      status updates     3.50
I     FB      media actions      3.24
I     FB      comments           2.56
I     FB      direct posts       2.37
(E = explicit, I = implicit; FB = Facebook, TW = Twitter)
• AVG Scores from 1 to 5
• Locations, explicit profile info and also followees’ posts provide better accuracy for mining user interests
• Interests stated explicitly by users produce user profiles 20% more accurate than interests inferred implicitly
[Orlandi, Kapanipathi, Sheth, Passant. IEEE/ACM WI. 2013]

3. Semantic enrichment of user profiles and personalisation:
How can we combine data from the Social and Semantic Web to enrich
user profiles of interests and deploy them in different
personalisation tasks?

Semantic Enrichment
[Figure: DBpedia graph connecting db:Montreal, db:Quebec, db:Gilles_Villeneuve, db:Ferrari and db:Formula_1 through dbo:wikiPageWikiLink, dbo:birthPlace and dbp:largestcity]
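A minimal sketch of one-hop enrichment over a small graph like the one on this slide. The slide names the resources and properties, but which property links which pair is an assumption made here for illustration. The function name and the `damping` factor are hypothetical as well.

```python
def expand_interests(profile, graph, damping=0.5):
    """Add the direct neighbours of each interest to the profile.

    profile: {entity: weight}
    graph:   {entity: [(property, neighbour), ...]}
    Each neighbour receives damping * weight of the interest it came from.
    """
    enriched = dict(profile)
    for entity, weight in profile.items():
        for _prop, neighbour in graph.get(entity, []):
            enriched[neighbour] = enriched.get(neighbour, 0.0) + damping * weight
    return enriched


# Assumed edges (illustrative pairing of the slide's nodes and properties).
graph = {
    "db:Gilles_Villeneuve": [
        ("dbo:birthPlace", "db:Quebec"),
        ("dbo:wikiPageWikiLink", "db:Ferrari"),
        ("dbo:wikiPageWikiLink", "db:Formula_1"),
    ],
    "db:Quebec": [("dbp:largestcity", "db:Montreal")],
}
profile = {"db:Gilles_Villeneuve": 1.0}
enriched = expand_interests(profile, graph)
for entity in sorted(enriched):
    print(entity, enriched[entity])
```

Only direct neighbours are added in this sketch, so db:Montreal, two hops away from the original interest, is not reached.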

Example
Are all the extracted entities useful for personalisation?
How are concepts/entities being used on the Social Web? (Pragmatics)
Entities: Music, Heavy Metal, Mastodon (band), CEV Champions League, Volleyball, Semantic Web, RDF
Annotations on the entities:
- Very abstract, very popular
- Specific and time-dependent on events, etc. (twice)
- Abstract and not popular
- Abstract and popular
- Specific and not popular
- Very popular

Characterising Concepts of Interest
Novel measures for the characterisation and semantic expansion of concepts of interest
• Enrichment of entity-based user profiles for personalisation
• Popularity of concepts on the Social Web (using Twitter)
- How popular is an entity on the Social Web? How frequently is it mentioned/used at that point in time?
• Trend and temporal dynamics (using Wikipedia page views)
- The trend and evolution of the frequency of mentions of an entity on the Social Web (i.e. popularity over time)
• Specificity and categorisation of entities of interest (using LOD)
- The level of abstraction that an entity has in a common conceptual schema shared by humans
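A minimal sketch of one possible trend indicator computed from daily page-view counts. The slides do not define the measure, so the ratio of recent to earlier average views, the 7-day window and the function name are all assumptions made for illustration.

```python
from statistics import mean


def trend_ratio(daily_views, window=7):
    """Ratio between the mean of the last `window` days and the mean
    of the `window` days before them.

    Values above 1 suggest rising attention, values below 1 falling
    attention, and values near 1 a stable entity.
    """
    if len(daily_views) < 2 * window:
        raise ValueError("need at least 2 * window daily counts")
    recent = mean(daily_views[-window:])
    earlier = mean(daily_views[-2 * window:-window])
    if earlier == 0:
        # No earlier views: treat any recent views as an unbounded rise.
        return float("inf") if recent > 0 else 1.0
    return recent / earlier


# Hypothetical page-view series (oldest first).
event_driven = [100] * 7 + [300] * 7   # views tripled in the last week
steady = [50] * 14                     # no change
print(trend_ratio(event_driven), trend_ratio(steady))
```

A ratio like this needs only a short recent history, which suits the requirement that the dimensions be computable in real time.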

Requirements
Use case: real-time personalisation of Social Web streams
1. Real-time computation of the dimensions
2. Results constantly up to date with the real world
3. Knowledge base and domain independent approach

Characterising Concepts of Interest
Popularity? Trendy and Stable? Specificity?
[Orlandi, Kapanipathi, Sheth, Passant. IEEE/ACM, WI 2013]

Real-time Semantic Personalisation of Social Web Streams
“SPOTS”: A methodology for real-time personalisation of any large social stream
• Automatic dynamic generation of multi-source user profiles of interests.
• Semantic enrichment of concepts of interest with provenance and Linked Data info.
• Ranking and selection of the interests according to their relevance for the user and for the personalisation use case.
• Informativeness measures for posts to filter a large social stream.
• Evaluation of the approach on the public Twitter stream
- Against Twitter #Discover: from 192% increase in accuracy

Real-time Semantic Personalisation of Social Web Streams
[Kapanipathi, Orlandi, Sheth, Passant. SPIM at ISWC 2011.]

Evaluation on SPOTS
User study to evaluate the impact of the enrichment on a personalisation use case
27 users, 800 user ratings collected
Main outcome:
• Popularity and Temporal Dynamics are useful measures for real-time personalisation
SPOTS                     Improvement*
No Enrichment             ---
Trendy                    +29%
Not Stable                +26%
At Least 2 Features       +9%
Specific + Not Popular    +5%
* In recommendation accuracy over non-enriched profiles

Evaluation on User Profiles
User study to evaluate the impact of the enrichment on user profiles according to users’ judgement
27 users, 800 user ratings collected
Main outcome:
• Specificity is more useful than popularity measures according to user perception
User Profiles                  Improvement*
No Enrichment                  ---
Not Specific + Not Popular     +13%
Not Specific                   +8%
Not Popular                    +2%
Stable + Not Trendy            +1%
* In profile accuracy over non-enriched profiles

Summary
[Orlandi, UMAP 2012]

Summary
• We provide and evaluate a complete methodology for profiling user interests across multiple sources on the Social Web
- Collect, Represent, Aggregate, Mine, Enrich, Deploy
• Aggregation of user data:
- Semantic representation of Social Web content and user activities
• Provenance of data:
- Improves profiling accuracy and connects the Social Web and the Web of Data
• Mining of user interests:
- Provenance + Linked Data/Entity-based strategies + time decay outperform traditional “bag-of-words” strategies and facilitate enrichment
• Semantic enrichment:
- Improves profiling accuracy and is necessary for deploying the profiles in a personalisation use case
- Different types of personalisation need different entities of interest

Future Work
Federated Personal Data Manager
• Privacy-aware, interoperable, autonomous user profiling infrastructure
Provenance at Web Scale
• Techniques are needed for easier and less expensive tracking and management of provenance on the Social Semantic Web
Adaptive Profiling of User Interests
• Adaptation of the profiling algorithm and strategy according to the application and the context

Contributions & Dissemination
• Semantic Web modelling solutions for Social Web data, user profiles, provenance on the Social Web and Web of Data.
• A provenance computation framework
• Novel measures for characterising entities of interest
• A real-time personalisation system for large Social Web streams
• User studies for different profiling strategies, provenance features and personalisation use-cases
• A privacy-aware user profile management system
Publications
• 2 journal, 4 conference, 2 workshop papers
Thanks!

Context
User Modelling
• The process of representing a user or some of his/her characteristics (e.g. interests, workplace, location, etc.)
User Profile
• A characterisation of a user at a particular point in time

Experiment
6 types of user profiles evaluated:
2 types of DBpedia entities: Categories vs. Resources
2 types of weighting scheme for category-based methods
- Cat1: Interests Weight Propagation
- Cat2: Interests Weight Propagation w/ Cat. Discount
2 types of exponential Time Decay function
- Short mean lifetime: 120 days
- Long mean lifetime: 360 days

Experiment
6 types of user profiles evaluated:
- Res: Res-120, Res-360
- Cat / Cat1: Cat1-120, Cat1-360
- Cat / Cat2: Cat2-120, Cat2-360

User-based Evaluation
• We asked users to rate the top 10 interests generated for each of the 6 profiling strategies
• Question: “Please rate how relevant is each concept for representing your personal interests and context…”
• Rating: 0 (not at all or don't know), 1 (low), 2, 3, 4, 5 (high)
• Rating converted to a (0…10) scale
• Performance evaluated with:
- MRR (Mean Reciprocal Rank)
- P@10 (Precision at K = 10)
• Comparison with a Baseline
- A traditional approach based on “keyword frequency”
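A minimal sketch of the two metrics named above, computed over ranked lists of user ratings on the original 0 to 5 scale. The slides do not say which ratings count as relevant, so the threshold of 3 is an assumption, as are the function names and the sample ratings.

```python
def precision_at_k(ratings, k=10, threshold=3):
    """Fraction of the top-k ranked interests rated at or above threshold.
    Divides by k even when fewer than k interests were generated."""
    return sum(1 for r in ratings[:k] if r >= threshold) / k


def reciprocal_rank(ratings, threshold=3):
    """1 / rank of the first relevant interest, or 0.0 if there is none."""
    for rank, rating in enumerate(ratings, start=1):
        if rating >= threshold:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(ratings_per_user, threshold=3):
    """Average of the reciprocal ranks over all users."""
    ranks = [reciprocal_rank(r, threshold) for r in ratings_per_user]
    return sum(ranks) / len(ranks)


# Hypothetical ratings for the top 10 interests of two users,
# listed in the order the profiling strategy ranked them.
user_a = [5, 4, 0, 3, 1, 2, 5, 0, 4, 3]
user_b = [1, 0, 4, 2, 2, 5, 0, 1, 3, 0]
print(precision_at_k(user_a), precision_at_k(user_b))
print(mean_reciprocal_rank([user_a, user_b]))
```

P@10 rewards profiles with many relevant interests anywhere in the top 10, while MRR rewards placing a relevant interest near the top, so the two can rank strategies differently.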

Evaluation
On average for:
200 Tweets & 200 Facebook posts, and items.
~106 interests – DBpedia Resources
~720 interests – DBpedia Categories (~7 times)
Statistical significance for:
Resources vs. Categories (p<0.05)
Any method vs. Baseline (p<0.05)
Not for time decay (p~0.2) and Cat1 vs. Cat2
