3. WP6 Objectives
• O.6.1: External data service design
  • Analysis of candidate sources
  • Analysis of data extracted
• O.6.2: External data service employed
  • Enrich the EPG data
  • Enrich feature extraction data
  • Discover links between programs for novel recommendations
• O.6.3: Publish data to the Linked Open Data cloud
The external data service supports the recommendation process
by exposing connections between TV programs
that do not surface in the standard EPG metadata.
5. External Data Service
Diagram components: EPG, DWH, Language Detection, Concept tagging
Fields tagged: Title, Synopsis, Credits
Output: DBpedia:<concept>; DBpedia:<LABEL> (rdfs:label LABEL, dc:subject)
Example:
• po:long_synopsis: "In this episode, Larry meets two veterans who each lost a limb in World War 2 to ask how differently we treat today's injured soldiers. Plus a look back at the iconic Green Cross Code films. With Stuart Hall and Miriam Stoppard"
• synopsis concepts: "World War II", "Television Program", "Green Cross Code", "Tom Stoppard", "David Prowse"
• po:credit: "Larry Lamb", "Miriam Stoppard", "Stuart Hall"
• DBpedia URIs matching the credits:
  "http://dbpedia.org/resource/Larry_Lamb_(newspaper_editor)"
  "http://dbpedia.org/resource/Larry_Lamb_(actor)"
  "http://dbpedia.org/resource/Miriam_stoppard"
  "http://dbpedia.org/resource/Stuart_Hall_(boxer)"
  "http://dbpedia.org/resource/Stuart_Hall_(presenter)"
  "http://dbpedia.org/resource/Stuart_Hall_(cultural_theorist)"
  "http://dbpedia.org/resource/Stuart_Hall_(musician)"
6. Zattoo Data Service: RDF
Example program description (labels from the RDF graph diagram):
• po:pid "9966901"
• dc:title "Die allerbeste Sebastian Winkler Show"
• zattoo:episode_title "mit Motsi Mabuse, Lady Bitch Ray und Sarah Brendel"
• po:long_synopsis "(Premiere in Einsfestival)"
• rdf:type po:episode
• po:masterbrand
• po:category
• po:credit: po:role "guest", po:alias "Sarah Brendel"
• po:credit: po:role "guest", po:alias "Motsi Mabuse"
• po:credit: po:role "guest", po:alias "Lady Bitch Ray"
Namespaces:
po="http://purl.org/ontology/po/"
zattoo="http://zattoo.com/"
dc="http://purl.org/dc/elements/1.1/"
rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
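As a rough illustration of how the labels on this slide map to RDF statements, the sketch below expands the prefixed property names with the slide's namespace declarations and prints N-Triples. The subject URI `http://zattoo.com/program/9966901` and the helper names (`expand`, `literal`, `iri`, `statement`) are assumptions made for this example; the slide does not show how programs are identified.

```python
# Namespace declarations as given on the slide.
PREFIXES = {
    "po": "http://purl.org/ontology/po/",
    "zattoo": "http://zattoo.com/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def expand(curie):
    """Turn a prefixed name such as 'po:pid' into a full URI."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


def literal(text):
    """Format a plain string literal for N-Triples."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def iri(value):
    """Wrap a full URI in angle brackets."""
    return f"<{value}>"


def statement(subject, predicate, obj):
    """Build one N-Triples line; obj must already be formatted."""
    return f"{iri(subject)} {iri(expand(predicate))} {obj} ."


# Hypothetical subject URI: not shown on the slide.
EPISODE = "http://zattoo.com/program/9966901"

triples = [
    statement(EPISODE, "rdf:type", iri(expand("po:episode"))),
    statement(EPISODE, "po:pid", literal("9966901")),
    statement(EPISODE, "dc:title",
              literal("Die allerbeste Sebastian Winkler Show")),
    statement(EPISODE, "zattoo:episode_title",
              literal("mit Motsi Mabuse, Lady Bitch Ray und Sarah Brendel")),
]

print("\n".join(triples))
```

The credits could be added the same way, with one intermediate node per po:credit carrying po:role and po:alias.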
17. LOD for recommendations
• LOD datasets provide additional information which can be used to
provide novel TV recommendations
• The challenge is to identify the links that are most useful for the
recommendation process.
• We have started analyzing the datasets to identify features that help
select the right links to use
19. Current & Future Work
1. Continuously adding new sources
2. Continuous improvement of EPG enrichment quality
• complementary services
• crowdsourcing
3. Defining a LOD-based notion of serendipity
4. Further studies on the LOD patterns and their
suitability for recommendations
5. Applying the approach in other domains, e.g. books
20. 1. Adding new sources
Dataset        Objects    Triples    Links to ...
DBpedia        3.77 mil   400 mil    27.2 mil
Freebase       23 mil     337 mil    3.9 mil
BBC            -          60 mil     43.237
BBC music      -          20 mil     23.000
NYT            10.467     345.889    23.400
MusicBrainz    -          178 mil    855.754
Flickr         1.95 mil   5.61 mil   3.400.000
LinkedMDB      503.242    6 mil      162 756
GeoNames       8 mil      94 mil     0
LinkedGeoData  1 bil      20 bil     53204
21. 2. Data cleaning
Following the grandeur of Baroque, Rococo
art is often dismissed as frivolous and
unserious, but Waldemar Januszczak
disagrees. […] The first episode is about
travel in the 18th century and how it impacted
greatly on some of the finest art ever made.
The world was getting smaller and took on
new influences shown in the glorious
Bavarian pilgrimage architecture, Canaletto's
romantic Venice and the blossoming of exotic
designs and tastes all over Europe. The
Rococo was art expressing itself in new,
exciting ways.
Enrichment errors:
• Type mis-classification: “Canaletto” annotated as ontology:Location
• URI mis-annotation: “Rococo” linked to dbpedia:Rococo_(band)
✓ Integration of the results of different text annotators
✓ Validation through crowdsourcing tasks
Collaboration with:
Silvia Giannini
22. 2. Data cleaning
extractor   label       DBpedia ontology class                DBpedia URI
-           Canaletto   ontology:Location                     dbpedia:Canaletto
TextRazor   Canaletto   dbpedia-owl:[Artist, Agent, Person]   dbpedia:Canaletto
-           Canaletto   dbpedia-owl:[Artist, Agent, Person]   dbpedia:Canaletto
-           Canaletto   dbpedia-owl:[Artist, Agent, Person]   dbpedia:Canaletto
Output fields per annotator:
• Label, NERD ontology class, sameAs link
• Label, DBpedia ontology class, DBpedia URI
• Label, DBpedia category, Wikipedia page
• Label, DBpedia ontology class, DBpedia URI
Type & URI alignment
Voting system: <Canaletto, dbpedia-owl:[Artist, Agent, Person], dbpedia:Canaletto> 3/4
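The 3/4 vote for Canaletto can be reproduced with a simple majority count over the aligned (type, URI) pairs. This is a minimal sketch, not the project's implementation: the annotator names other than TextRazor are placeholders, and returning `None` on a tie is an assumption, since the slides leave tie handling open.

```python
from collections import Counter

# Aligned annotator outputs for the label "Canaletto", as in the table.
# Only TextRazor is named on the slide; the other names are placeholders.
annotations = [
    ("annotator_1", "ontology:Location", "dbpedia:Canaletto"),
    ("TextRazor", "dbpedia-owl:[Artist, Agent, Person]", "dbpedia:Canaletto"),
    ("annotator_3", "dbpedia-owl:[Artist, Agent, Person]", "dbpedia:Canaletto"),
    ("annotator_4", "dbpedia-owl:[Artist, Agent, Person]", "dbpedia:Canaletto"),
]


def majority_vote(annotations):
    """Return (winning (type, uri) pair or None on a tie, votes, total)."""
    if not annotations:
        raise ValueError("no annotations to vote on")
    votes = Counter((onto_type, uri) for _, onto_type, uri in annotations)
    ranked = votes.most_common()
    winner, count = ranked[0]
    # Assumption: a tie for first place yields no winner.
    if len(ranked) > 1 and ranked[1][1] == count:
        return None, count, len(annotations)
    return winner, count, len(annotations)


winner, count, total = majority_vote(annotations)
print(winner, f"{count}/{total}")
```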
23. Validate:
• Labels relevance
• Relevant labels types
Diagram labels: Program synopsis; Automatic integration of text annotators for enrichment; Aggregated enrichment (based on majority vote); results integration
Analysis of collected data for:
• Voting system validation (also URIs)
• Parameters tuning (e.g., complementarity handling)
What if:
• there is a tie?
• the majority of annotators are wrong?
• more granular alignment ontologies are adopted to avoid a missing type
(or the type owl:Thing)?
28. 4. LOD-based Patterns for Diversity
LOD-based method for increasing
diversity in recommendations
• extracts all the patterns from an
RDF dataset → clusters generated
& measured for diversity
• fed into two statistical models
• to determine which semantic
patterns can extract subsets of
Linked Data to improve diversity
in recommendations
• data characterization step to
choose model
• diversity measures, e.g. entropy
& semantic similarity
• IMDB & DBpedia
• noisiness, size & sparsity of LOD
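Entropy is named above as one of the diversity measures. A minimal sketch of Shannon entropy over the categories found in a cluster is given below. The slides do not say which feature the clusters are measured on, so the genre labels here are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (in bits) of the label distribution in a cluster."""
    counts = Counter(labels)
    total = sum(counts.values())
    # p * log2(1/p) summed over the observed labels.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical clusters of programs, described by genre.
homogeneous = ["drama", "drama", "drama", "drama"]
mixed = ["drama", "news", "sport", "music"]

print(shannon_entropy(homogeneous))  # single genre: no diversity
print(shannon_entropy(mixed))        # four equally likely genres
```

Higher values indicate a more even spread of categories, so a cluster with a single genre scores zero and one with four equally frequent genres scores two bits.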
30. References
• Valentina Maccatrozzo, Lora Aroyo and Willem Robert van Hage, Crowdsourced
Evaluation of Semantic Patterns for Recommendations, User Modeling, Adaptation, and
Personalization, Rome, Italy, July 10-14, 2013.
• Valentina Maccatrozzo, Davide Ceolin and Lora Aroyo, LOD Enrichment of TV Programs,
in W3C Italy Event: Linked Open Data: where are we?, Rome, Italy, February 20-21, 2014.
• Valentina Maccatrozzo, Davide Ceolin, Lora Aroyo and Paul Groth, Semantic Pattern-
based Recommender, Extended Semantic Web Conference (ESWC2014), Heraclion,
Greece, May 25-29, 2014.
• Ceolin, Davide, Moreau, Luc, O'Hara, Kieron, Fokkink, Wan, Van Hage, Willem Robert,
Maccatrozzo, Valentina, Sackley, Alistair, Schreiber, Guus and Shadbolt, Nigel (2014) Two
procedures for analyzing the reliability of open government data. Information
Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU'2014),
Montpellier, FR, 15 Jul 2014.