Enhanced Semantic TV-Shows Representation for Personalized Electronic Program Guides - UMAP 2012 Conference - User Modeling, Adaptation, and Personalization - Montreal (Canada) - Joint work of the University of Bari and Philips Research, presented at the Industrial Track of the conference
1. UMAP 2012 - Industrial Track
Montréal (Canada), 19.07.2012
Cataldo Musto, Fedelucio Narducci, Pasquale Lops, Giovanni Semeraro, Marco de Gemmis (University of Bari Aldo Moro)
Mauro Barbieri, Jan Korst, Verus Pronk and Ramon Clout (Philips Research, Eindhoven, The Netherlands)
Enhanced Semantic TV-Show Representation
for Personalized Electronic Program Guides
2. exponential growth
of available TV assets
C. Musto, F. Narducci, G. Semeraro, P. Lops, M. de Gemmis, M. Barbieri, J. Korst, V. Pronk, R. Clout
Enhanced Semantic TV-Shows Representation for Personalized Electronic Program Guides - UMAP 2012 - 19.07.12
3. Some stats
4 hours watched every day
out of 3000 hours of broadcast TV shows
ratio
about 0.13% (4 / 3000)
source: Nielsen Survey, 2011
4. Information Overload
5. what TV shows should I watch?
6. industrial scenario
how does Philips cope with the
overload of TV shows?
7. solution
personalization.
8. recommender systems
9. content-based recommenders
key concepts
• Each item (TV show) has to be described through a set of features
• e.g., the description of the TV show, the plot of the movie, and so on
• Each user is described through the features that occur in the TV shows she watched (liked) in the past
• Recommendations are provided by calculating the overlap between the textual description of a TV show and the features stored in the user profile
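The three key concepts above can be sketched as a minimal keyword-overlap recommender. This is an illustration under stated assumptions, not the system used in this work: features are assumed to be lowercase word tokens (no stop-word removal), the profile simply counts how many liked shows contain each feature, and a candidate show is scored by summing the profile weights of the features it shares with the profile. All function names and example shows are hypothetical.

```python
import re
from collections import Counter


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric tokens (features)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_profile(liked_descriptions: list[str]) -> Counter:
    """Hypothetical user profile: for each feature, the number of liked shows it occurs in."""
    profile: Counter = Counter()
    for description in liked_descriptions:
        profile.update(tokenize(description))
    return profile


def overlap_score(profile: Counter, description: str) -> int:
    """Sum of the profile weights of the features the show shares with the profile."""
    return sum(profile[f] for f in tokenize(description) if f in profile)


def recommend(profile: Counter, candidates: dict[str, str], k: int = 3) -> list[str]:
    """Return the k candidate titles with the highest overlap score."""
    ranked = sorted(candidates, key=lambda t: overlap_score(profile, candidates[t]), reverse=True)
    return ranked[:k]


profile = build_profile(["basketball highlights of the week", "football match live"])
candidates = {
    "NBA Finals": "nba basketball finals live",
    "Nature Documentary": "wildlife documentary",
}
print(recommend(profile, candidates, k=2))  # ['NBA Finals', 'Nature Documentary']
```

The documentary scores 0 because it shares no feature with the profile: with pure keyword overlap, a show whose description uses different words than the profile cannot be matched at all.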
10. content-based recommenders
example: TV shows recommendations
user profile recommendations
♥
basketball nba (basketball)
♥
football documentary
11. content-based recommenders
example: TV shows recommendations
user profile recommendations
♥
basketball nba (basketball)
X
♥
football documentary
12. content-based recommenders
example: TV shows recommendations
user profile recommendations
♥
basketball nba (basketball)
X
♥
football documentary
13. personal channels
‘in vitro’ experiments
concept
Idea: combining boolean filters to filter TV shows
and recommenders to rank them.
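The personal-channel idea (boolean filters select, a recommender ranks) can be sketched as follows. This is a hypothetical illustration: the `Show` fields, the filters and the scores are invented, and the relevance score is assumed to be already produced by some recommender.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Show:
    title: str
    program_type: str
    start_hour: int  # hour of the day the show starts (0-23)
    score: float     # relevance assumed to come from some recommender


ShowFilter = Callable[[Show], bool]


def personal_channel(shows: list[Show], filters: list[ShowFilter]) -> list[Show]:
    """Keep the shows that pass every boolean filter, then rank them by score."""
    selected = [s for s in shows if all(f(s) for f in filters)]
    return sorted(selected, key=lambda s: s.score, reverse=True)


def is_sport(s: Show) -> bool:
    return s.program_type == "Sport"


def before_22(s: Show) -> bool:
    return s.start_hour < 22


shows = [
    Show("Football Highlights", "Sport", 18, 0.81),
    Show("MotoGP Race", "Sport", 14, 0.93),
    Show("Evening Movie", "Movie", 20, 0.77),
    Show("Late Night Tennis", "Sport", 23, 0.95),
]
channel = personal_channel(shows, [is_sport, before_22])
print([s.title for s in channel])  # ['MotoGP Race', 'Football Highlights']
```

Keeping filtering and ranking separate means the filters only decide membership, while the order inside the channel is left entirely to the recommender.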
14. Watchmi plug-in
developed by Aprico.tv
‘in vitro’ experiments
15. problem
descriptions of TV shows are often
too short or not informative enough
to feed a content-based
recommendation algorithm
16. solution
feature generation techniques
based on open knowledge sources
17. solution
feature generation techniques
based on open knowledge sources
18. explicit semantic analysis
• Explicit Semantic Analysis (ESA) (Gabrilovich and Markovitch, 2006)
• Goals
• To introduce a methodology for representing the knowledge stored in Wikipedia
• To define a relationship between terms in natural language and Wikipedia articles
• Insights
• ESA provides a vector-space representation for each term
• Terms are represented as rows in a matrix (called the ESA matrix) where each column is a Wikipedia concept (article)
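A sparse ESA matrix of this shape can be built from article texts as sketched below. Assumptions: a toy set of three "articles" stands in for Wikipedia, and the weighting is plain tf × log(N/df), a simplification; the exact weighting and pruning used to build the ESA matrix in this work are not specified on these slides.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def build_esa_matrix(articles: dict[str, str]) -> dict[str, dict[str, float]]:
    """Sparse ESA matrix as {term: {concept: weight}}.

    Rows are terms, columns are Wikipedia concepts (article titles); each
    weight is the TF-IDF of the term in the article text.
    """
    tf = {concept: Counter(tokenize(text)) for concept, text in articles.items()}
    df: Counter = Counter()
    for counts in tf.values():
        df.update(counts.keys())  # document frequency: number of articles containing the term
    n_articles = len(articles)
    matrix: dict[str, dict[str, float]] = {}
    for concept, counts in tf.items():
        for term, freq in counts.items():
            idf = math.log(n_articles / df[term])
            if idf > 0:  # a term occurring in every article carries no signal
                matrix.setdefault(term, {})[concept] = freq * idf
    return matrix


articles = {  # hypothetical stand-ins for Wikipedia articles
    "MotoGP": "motogp superbike grand prix valentino rossi superbike",
    "Basketball": "basketball nba court grand",
    "Politics": "election parliament vote",
}
matrix = build_esa_matrix(articles)
print(matrix["grand"])  # the row of "grand": one weight per article it occurs in
```

Storing only non-zero cells as nested dictionaries mirrors the sparsity of the real matrix, where each term occurs in a small fraction of the articles.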
19. ESA representation
term/document matrix
[Matrix: terms t1 to t4 as rows, documents a1 to a9 as columns; ✔ marks the documents each term occurs in]
20. ESA representation
term/Wikipedia articles matrix
[Matrix: terms t1 to t4 as rows; Wikipedia articles a1, a2, MotoGP, a4 to a9 as columns; ✔ marks the articles each term occurs in]
21. ESA representation
Every Wikipedia article is a concept.
Each concept is represented through the TF-IDF scores of the terms that occur in the article.
MotoGP: superbike (0.92), grand prix (0.76), valentino rossi (0.59)
Second example concept: Cat [0.92], Leopard [0.84], Roar [0.77]
22. ESA representation
term/Wikipedia Articles matrix
[Matrix: terms Superbike, t2, t3, t4 as rows; Wikipedia articles Politics, MotoGP, Basketball, M.Biaggi, V.Rossi as columns; ✔ marks the articles each term occurs in]
23. ESA representation
Each term can be defined in terms of the Wikipedia concepts it occurs in
Superbike: MotoGP (0.92), Max Biaggi (0.63), Bridgestone (0.43)
Cat: Cat [0.95], Panthera [0.92], Jane Fonda [0.07]
“the semantics of a term is the vector of its associations with Wikipedia articles”
this whole vector is called the
Semantic Interpretation Vector
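A semantic interpretation vector is simply a term's row of the ESA matrix, and comparing two such rows with the cosine gives ESA's semantic relatedness between terms. In this sketch the ESA rows are hand-written: the superbike and cat weights loosely follow the example above, while the motorbike row is invented for illustration.

```python
import math

# Illustrative ESA rows: term -> {Wikipedia concept: weight}.
ESA = {
    "superbike": {"MotoGP": 0.92, "Max Biaggi": 0.63, "Bridgestone": 0.43},
    "motorbike": {"MotoGP": 0.80, "Max Biaggi": 0.30},  # hypothetical row
    "cat": {"Cat": 0.95, "Panthera": 0.92, "Jane Fonda": 0.07},
}


def interpretation_vector(term: str) -> dict[str, float]:
    """Semantic interpretation vector of a term: its row of the ESA matrix."""
    return ESA.get(term.lower(), {})


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(c, 0.0) for c, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def relatedness(t1: str, t2: str) -> float:
    """ESA-style relatedness: cosine between the two interpretation vectors."""
    return cosine(interpretation_vector(t1), interpretation_vector(t2))


print(round(relatedness("superbike", "motorbike"), 2))  # 0.91
print(relatedness("superbike", "cat"))  # 0.0: no concept in common
```

Two terms that never co-occur in a text can still be related here, because relatedness is measured through the concepts they share rather than through the words themselves.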
24. ESA representation
semantics of text fragments
mouse: Mouse (rodent) [0.91], Mouse (computing) [0.89], Mickey Mouse [0.81], John Steinbeck [0.17]
button: Dick Button [0.93], Button [0.84], Mouse (computing) [0.81], Game Controller [0.32]
mouse button: Mouse (computing) [0.85], Mouse (rodent) [0.46], IBM PS/2 [0.35], Drag-and-drop [0.32]
The vector of a text fragment is calculated as the centroid of the semantic
interpretation vectors of the terms that compose the fragment
25. ESA has already been adopted for text
classification, information retrieval and
semantic relatedness computation
Research Question
How can we exploit ESA to perform
feature generation for the
personalization of EPGs?
26. From BOW to eBOW
Given a description of a TV show, we exploit ESA to
obtain an enhanced representation
The original set of features is enriched with the set of
Wikipedia articles most related to the TV show
27. From BOW to eBOW
algorithm
[Figure: centroid vector of the BOW: Concept 1 [0.85], Concept 47 [0.46], Concept 50 [0.35], Concept n [0.32]]
• The centroid vector of the whole description of the TV show is calculated
• The n most related Wikipedia concepts are extracted
• These concepts are added to the original BOW to obtain an enhanced BOW (eBOW)
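The enrichment step can be sketched as follows, assuming the centroid vector of the description has already been computed. The centroid weights are hypothetical, and appending concept titles as plain extra features is a simplifying choice; the slides do not say how added concepts are weighted.

```python
import heapq


def to_ebow(bow: list[str], centroid: dict[str, float], n: int) -> list[str]:
    """Enhanced BOW: the original features plus the n concepts with the
    highest weight in the centroid vector of the description."""
    top = heapq.nlargest(n, centroid.items(), key=lambda item: item[1])
    return bow + [concept for concept, _ in top]


bow = ["rad", "an", "rad", "die", "besten", "duelle", "der", "motogp"]
centroid = {  # hypothetical centroid weights
    "motogp": 0.85,
    "valentino rossi": 0.46,
    "max biaggi": 0.35,
    "rad (heraldik)": 0.32,
    "scuderia ferrari": 0.10,
}
print(to_ebow(bow, centroid, n=3)[-3:])  # ['motogp', 'valentino rossi', 'max biaggi']
```

In a real system one might prefix concept features (e.g. `wiki:motogp`) so that an added concept never collides with an identical word already in the BOW.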
28. From BOW to eBOW
example
TV SHOW: Rad an Rad - Die besten Duelle der MotoGP (Wheel to wheel - The best duels in the MotoGP)
Wikipedia articles: großer preis von italien (motorrad), großer preis von malaysia (motorrad), großer preis von tschechien (motorrad), scuderia ferrari, valentino rossi, motorrad-wm-saison 2005, motorrad-wm-saison 2006, max biaggi, großer preis der usa (motorrad), motorrad-wm-saison 2008, rad (heraldik), loris capirossi, shin'ya nakano, motogp
29. what about the
advantages?
30. example
BOW representation
user profile: sports, motorbike, ..., competition
tv show: motogp, 2012, Superbike, Italian Grand Prix
31. example
BOW representation
user profile: sports, motorbike, ..., competition
tv show: motogp, 2012, Superbike, Italian Grand Prix
X No matching!
32. example
eBOW representation
user profile: superbike, sports, motorbike, formula 1, ..., competition
tv show: motogp, 2012, Superbike, Italian Grand Prix
33. example
eBOW representation
user profile: superbike, sports, motorbike, formula 1, ..., competition
tv show: motogp, 2012, Superbike, Italian Grand Prix
✔ Matching!
34. ESA advantages
knowledge is fluid.
35. ESA advantages
knowledge is fluid.
it is necessary to exploit open and
always updated knowledge sources
36. example
concept: ‘American Politics’
Year Enrichment
2000 Clinton
2005 Bush
2011 Obama
37. (counter)example
concept: ‘Italian Politics’
Year Enrichment
2000 Berlusconi
2005 Berlusconi
2011 Berlusconi
38. experiments.
39. design of the experiments
task
• Retrieval task
• Given a set of program types and a repository of TV shows, we want to retrieve the shows that belong to a specific program type (e.g., Movie)
40. dataset
Aprico.tv data
• Dataset
• 47 German-language Channels provided by Axel Springer
• 133k TV Shows, 17 program types
• Textual features: title, synopsis, description, program type
• Explicit Semantic Analysis
• Wikipedia dump: October 2010
• 814,013 terms (rows) and 484,218 articles (columns)
41. design of the experiments
learning methods
• Two state-of-the-art learning methods have been compared
• Random Indexing
• Vector Space Model (VSM)-based representation
• Incremental technique that effectively compresses (reduces the dimensionality of) the representation
• Both TV shows and user profiles are points in a vector space
• Logistic Regression
• Supervised learning method, state of the art for text classification
• Each TV show is classified as relevant or not relevant for the user, according to the user profile
• TV shows can be ranked according to their probability scores
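The Random Indexing idea (each feature gets a sparse random index vector; shows and profiles are built incrementally by summing them, then compared by cosine) can be sketched as follows. This is a generic sketch, not the configuration used in the experiments: the dimensionality `DIM`, the sparsity `NONZERO` and the example features are assumptions. Logistic Regression, the other compared method, would instead be trained on labelled relevant/non-relevant shows and is not shown here.

```python
import math
import random
from collections import defaultdict

DIM = 512    # assumed dimensionality of the reduced space
NONZERO = 8  # assumed number of non-zero (+1/-1) entries per index vector


def index_vector(feature: str) -> dict[int, float]:
    """Sparse ternary random index vector, seeded by the feature so it is stable across calls."""
    rng = random.Random(feature)
    positions = rng.sample(range(DIM), NONZERO)
    return {p: rng.choice((-1.0, 1.0)) for p in positions}


def embed(features: list[str]) -> dict[int, float]:
    """Incrementally sum the index vectors of the features (of a show or of a profile)."""
    vec: dict[int, float] = defaultdict(float)
    for feature in features:
        for p, v in index_vector(feature).items():
            vec[p] += v
    return vec


def cosine(a: dict[int, float], b: dict[int, float]) -> float:
    dot = sum(w * b.get(p, 0.0) for p, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


profile = embed(["motogp", "superbike", "grand", "prix"])
shows = {
    "MotoGP Race": embed(["motogp", "grand", "prix"]),
    "Cooking Show": embed(["cooking", "recipes", "kitchen"]),
}
ranking = sorted(shows, key=lambda t: cosine(profile, shows[t]), reverse=True)
print(ranking)
```

Because index vectors are nearly orthogonal, vectors that share features keep a high cosine even though the space has a fixed, small dimensionality, and new features can be added without rebuilding anything.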
42. design of the experiments
research questions
1. Which learning method provides
the best recommendations?
2. Does enriching the BOWs with
ESA improve the accuracy of
the suggestions?
43. experiment 1
results
[Chart: Logistic Regression vs. Random Indexing, measured at P@5%, P@10%, P@25%, P@50%, P@75% and P@100%; y-axis from 50 to 100]
44. experiment 2
results
[Chart: BOW vs. eBOW (+20), eBOW (+40) and eBOW (+60), measured at P@5%, P@10%, P@25%, P@50%, P@75% and P@100%; y-axis from 74 to 95]
45. experiment 2
results
[Chart: BOW vs. eBOW (+20), eBOW (+40) and eBOW (+60), measured at P@5%, P@10%, P@25%, P@50%, P@75% and P@100%; y-axis from 74 to 95]
Differences between BOW and eBOW (+40, +60) are statistically significant (Mann-Whitney test, p < 0.005)
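The significance claim above relies on a Mann-Whitney U test. A stdlib-only sketch of that test is given below for illustration. Assumptions: the samples actually compared in the experiments are not listed on the slide, so the example data is synthetic, and the p-value uses the normal approximation without tie or continuity correction, which is reasonable only for moderately large samples. For real analyses, `scipy.stats.mannwhitneyu` is the usual choice.

```python
import math


def mann_whitney_u(x: list[float], y: list[float]) -> tuple[float, float]:
    """Mann-Whitney U test: returns (U, two-sided p-value).

    The p-value uses the normal approximation (no tie or continuity correction).
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    u = min(u_x, n1 * n2 - u_x)
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u, p


# Synthetic, fully separated samples: every value of y exceeds every value of x.
u, p = mann_whitney_u([float(v) for v in range(10)], [float(v) for v in range(10, 20)])
print(u, p < 0.005)  # 0.0 True
```

The test is rank-based, so it does not assume the precision values are normally distributed, which is a common reason to prefer it over a t-test for such comparisons.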
46. Recap
• Content-based Personalization Techniques for
Electronic Program Guides
• Joint work: Philips Research - Aprico.tv - University of Bari
• Feature generation to enrich textual descriptions of TV shows
• Exploitation of ESA: Explicit Semantic Analysis
• Introducing eBOW for content representation
• BOW + Wikipedia concepts related to the textual description
47. Conclusions
• Logistic Regression can provide good accuracy in retrieving related TV shows
• Almost 90% in precision
• Feature generation techniques based on Wikipedia can improve the precision of a content-based recommendation approach
• The eBOW representation outperforms the classical BOW representation
• Good results: 94% in precision
48. questions?