1. Building a structured catalog for educational
datasets
Stefan Dietze
04/07/13 1Stefan Dietze
2. Linked Open (educational) Data
LOD: 300+ datasets, 32 billion
distinct RDF statements
DataHub: 6000+ open datasets
LinkedUp: FP7-ICT-2012-8, CSA
(http://linkedup-project.eu)
Goal: enabling large-scale take-up of (Linked) Open Data
(education as application context)
3. Linked Open (educational) Data
http://datahub.io/dataset/bbc
60,000,000 triples
Using/exploiting Linked Data in education?
Lack of reliable dataset metadata about:
  Resource types
  Topics & disciplines
  Quality, currentness & availability
  Provenance
Lack of links and cross-dataset references
Lack of scalable query methods
Example dataset description
4. Linked Data "Observatory" – Processing Chain
Processing steps:
  Endpoint Retrieval & Graph Extraction
  Schema Extraction and Mapping
  Sample Graph Extraction (per dataset)
  NER & NED (per resource)
  Interlinking & Co-Resolution (cross-dataset)
  Category Mapping, Normalisation, Filtering
Outputs: Dataset Catalog/Index; Links/Cross-references
Dataset metadata (RDF/VoID):
  Schema mappings (types, properties)
  Entities & categories
  Topic relevance scores
  Availability, currentness data (tbc)
Disambiguation example: does rdfs:label "…ECB…." refer to dbpedia:European_Central_Bank (dbpedia:Finance) or to dbpedia:England-Wales-Cricket-Board (dbpedia:Sports)?
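The "ECB" ambiguity on this slide can be illustrated with a toy disambiguation step. The following is only a sketch under stated assumptions, not the NED method used in the processing chain: the candidate table, the context word lists and the function name `disambiguate` are hypothetical, and the overlap heuristic is a deliberately simple stand-in.

```python
# Hypothetical candidate table: surface form -> candidate DBpedia entities.
# Entities and categories are taken from the slide; the context words are
# invented for illustration only.
CANDIDATES = {
    "ECB": [
        {
            "entity": "dbpedia:European_Central_Bank",
            "category": "dbpedia:Finance",
            "context": {"bank", "euro", "interest", "monetary", "inflation"},
        },
        {
            "entity": "dbpedia:England-Wales-Cricket-Board",
            "category": "dbpedia:Sports",
            "context": {"cricket", "match", "team", "season", "player"},
        },
    ]
}


def disambiguate(mention, label_text):
    """Return (entity, category) for the candidate whose context words
    overlap most with the label text, or None if there is no evidence."""
    tokens = set(label_text.lower().split())
    candidates = CANDIDATES.get(mention, [])
    if not candidates:
        return None
    best = max(candidates, key=lambda c: len(c["context"] & tokens))
    if not best["context"] & tokens:
        return None  # no overlapping context word: leave the mention unresolved
    return best["entity"], best["category"]
```

A label mentioning "interest" and "euro" would resolve to the finance reading, one mentioning "cricket" to the sports reading, and a label without any context word stays unresolved.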
Goals:
  RDF catalog of datasets, i.e. a dataset of datasets (classification of datasets according to, e.g., represented types, disciplines/topics, data quality, accessibility)
  Links and coreferences => unified view on data => Linked Education Graph
  Infrastructure & APIs for federated queries
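To make the federated-query goal more concrete, the sketch below composes a SPARQL 1.1 federated query that sends the same graph pattern to several endpoints via `SERVICE` clauses. It is an illustration only: the function name `federated_query` and the endpoint URLs in the usage note are hypothetical, and nothing here describes the project's actual infrastructure or API.

```python
def federated_query(endpoints, pattern, limit=10):
    """Build a SPARQL 1.1 query that evaluates `pattern` on every endpoint
    (one SERVICE block per endpoint) and unions the results."""
    if not endpoints:
        raise ValueError("at least one endpoint is required")
    # One group per endpoint: { SERVICE <url> { pattern } }
    blocks = [f"  {{ SERVICE <{url}> {{ {pattern} }} }}" for url in endpoints]
    body = "\n  UNION\n".join(blocks)
    return f"SELECT * WHERE {{\n{body}\n}}\nLIMIT {limit}"
```

For example, `federated_query(["http://example.org/a/sparql", "http://example.org/b/sparql"], "?s a ?type")` yields a query with two `SERVICE` blocks joined by `UNION`. In a catalog-driven setting the endpoint list would presumably come from the dataset catalog rather than being hard-coded.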
5. Linked Data "Observatory" – Processing Chain
[WEBSCI13] Assessing the Educational Linked Data Landscape, d'Aquin, M., Adamou, A., Dietze, S., ACM Web Science 2013 (WebSci2013), Paris, France, May 2013.
[DEXA13] Complex Matching of RDF Datatype Properties, Nunes, B. P., Mera, A., Casanova, M. A., Fetahu, B., Paes Leme, L., Dietze, S., 24th International Conference on Database and Expert Systems Applications (DEXA 2013), Prague, Czech Republic, August 2013.
[ESWC13] Combining a co-occurrence-based and a semantic measure for entity linking, Nunes, B. P., Dietze, S., Casanova, M. A., Kawase, R., Fetahu, B., Nejdl, W., ESWC 2013 (10th Extended Semantic Web Conference), May 2013.
[ISWC13?] Indexing of Linked Data, What's all the data about, Fetahu, B., Adamou, A., Dietze, S., d'Aquin, M., Nunes, B. P., ISWC 2013 (12th International Semantic Web Conference), under review.
[TKDE12] A Probabilistic Scheme for Keyword-Based Incremental Query Construction, Demidova, E., Zhou, X., Nejdl, W., IEEE Transactions on Knowledge and Data Engineering, 24(3):426-439, 2012.
6. Challenge: data heterogeneity
<yov:Lecture8748720>
<yov:title>Pluto &amp; the Dwarf Planets</yov:title>
…
</yov:Lecture8748720>
Online Lecture
<ss:SlideSet-2139393292>
<title>Planetary motion &amp; gravity</title>
…
</ss:SlideSet-2139393292>
Lecture Slideset
<po:Programme519215>
<po:Series>Wonders of the Solar System</po:Series>
<po:Episode>Emp. of the Sun</po:Episode>
<po:Actor>Brian Cox</po:Actor>
</po:Programme519215>
Video Documentary
Relatedness of resources/entities? (types, semantics)
Metadata about datasets?
Assessing the Educational Linked Data Landscape, d'Aquin, M., Adamou, A., Dietze, S., ACM Web Science 2013 (WebSci2013), Paris, France, May 2013.
Combining a co-occurrence-based and a semantic measure for entity linking, Nunes, B. P., Dietze, S., Casanova, M. A., Kawase, R., Fetahu, B., Nejdl, W., ESWC 2013 (10th Extended Semantic Web Conference), May 2013.
7. Data disambiguation, linking & annotation
Which entities do the mentions in the Online Lecture and the Video Documentary refer to: Brian Cox? Sun? Pluto?
Source: Nunes et al., ESWC 2013.
8. Data disambiguation, linking & annotation
The Online Lecture, Lecture Slideset and Video Documentary are annotated with DBpedia entities and categories: db:Pluto (Dwarf Planet), db:Sun, db:Astronomical Objects, db:Astronomy
Source: Nunes et al., ESWC 2013.
9. Data disambiguation, linking & annotation
Entities and categories: db:Pluto (Dwarf Planet), db:Astronomical Objects, db:Astronomy
Source: Nunes et al., ESWC 2013.
Computation of connectivity scores
between resources/entities
Method: combination of a
(i) semantic (graph-based) connectivity
score (SCS) with
(ii) a Web co-occurrence-based measure
(CBM) (similar to NGD)
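The slide only states that CBM is "similar to NGD", so the sketch below shows the standard Normalized Google Distance formula as background, not the CBM itself. The function name `ngd` and all hit counts in the usage note are assumptions made for illustration.

```python
import math


def ngd(fx, fy, fxy, n):
    """Normalized Google Distance for two terms x and y.

    fx, fy: number of documents containing x and y respectively
    fxy:    number of documents containing both x and y
    n:      total number of indexed documents
    Smaller values mean the terms co-occur more often than expected.
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        return math.inf  # terms never (co-)occur: treat as maximally distant
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))
```

With made-up counts, `ngd(1000, 100, 10, 10**6)` gives 0.5, and two terms that always occur together, as in `ngd(100, 100, 100, 10**4)`, give 0.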
For (i): adaptation of Katz-Index from SNA
for (linked) data graphs (considering path
number and path lengths of transversal
properties)
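A Katz-style score rewards many short connections between two nodes: each connection of length l contributes beta^l, with 0 < beta < 1. The following is a minimal sketch of that idea, assuming an undirected toy graph and counting walks up to a maximum length. The function name `katz_score`, the damping value and the edges in the usage note are hypothetical, and the published SCS may differ in which paths and properties it considers.

```python
def katz_score(graph, source, target, beta=0.5, max_len=4):
    """Truncated Katz index: sum over l = 1..max_len of
    beta**l * (number of walks of length l from source to target).

    graph: dict mapping each node to an iterable of neighbour nodes.
    """
    walks = {source: 1}  # walks[node] = number of walks of the current length
    score = 0.0
    for length in range(1, max_len + 1):
        nxt = {}
        for node, count in walks.items():
            for neighbour in graph.get(node, ()):
                nxt[neighbour] = nxt.get(neighbour, 0) + count
        walks = nxt
        score += (beta ** length) * walks.get(target, 0)
    return score
```

On a toy graph in which db:Pluto and db:Sun are each connected to db:Astronomy and db:Astronomical_Objects (edges invented for this example), there are two walks of length 2 between db:Pluto and db:Sun, so `max_len=2` with `beta=0.5` gives 0.5; raising `max_len` to 4 adds eight walks of length 4 and gives 1.0.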
Data linking
Dataset categorisation: computation of
normalised (DBpedia) category relevance
scores for datasets
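One simple way to obtain normalised category relevance scores for a dataset is to count how often each category is attached to the dataset's resources and divide by the total number of annotations. This is a sketch under that assumption only; the slide does not specify the normalisation actually used, and the function name `category_relevance` and the sample annotations are hypothetical.

```python
from collections import Counter


def category_relevance(resource_categories):
    """Map each category to its share of all category annotations.

    resource_categories: one list of category identifiers per resource
    in the dataset. The returned scores sum to 1 (or the dict is empty).
    """
    counts = Counter(cat for cats in resource_categories for cat in cats)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cat: n / total for cat, n in counts.items()}
```

For three resources annotated with `["db:Astronomy", "db:Astronomical_Objects"]`, `["db:Astronomy"]` and `["db:Astronomy", "db:Finance"]`, db:Astronomy receives 0.6 and the other two categories 0.2 each.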
db:Sun
SCS = 0.32
CBM = 0.24
10. Data disambiguation, linking & annotation
Source: Nunes et al., ESWC 2013.
Evaluation based on USA Today news items (80,000 entity pairs)
Manually created gold standard (1,000 entity pairs)
Baseline: Explicit Semantic Analysis (ESA)
=> CBM/SCS: "relatedness"; ESA: "similarity"
Precision/Recall/F1 for SCS, CBM, ESA.
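For reference, precision, recall and F1 against a gold standard of entity pairs can be computed as sketched below. The function name `precision_recall_f1` and the sample pairs are illustrative assumptions; no figures from the evaluation are reproduced here.

```python
def precision_recall_f1(predicted, gold):
    """Compare predicted related pairs with gold-standard pairs.

    predicted, gold: iterables of hashable pairs, e.g. (entity_a, entity_b).
    Returns (precision, recall, f1); each is 0.0 when undefined.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Each measure (SCS, CBM, ESA) would be scored by passing the pairs it marks as related together with the same gold standard.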
11. Enhanced dataset descriptions
on the DataHub
Dataset RDF graph: correlations
based on semantic annotations (categories)
Dataset classification: expanded dataset catalog & graph
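An enhanced dataset description could be expressed with the VoID vocabulary, which the processing chain names as the metadata format. The sketch below emits a small Turtle document using real VoID and Dublin Core terms (`void:Dataset`, `void:triples`, `void:sparqlEndpoint`, `dcterms:subject`). The function name `void_description`, the title and the subject URI in the usage note are assumptions, and the actual catalog entries may be structured differently.

```python
def void_description(dataset_uri, title, triples, sparql_endpoint=None, subjects=()):
    """Return a minimal VoID description of one dataset as a Turtle string."""
    safe_title = title.replace("\\", "\\\\").replace('"', '\\"')
    lines = [
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "",
        f"<{dataset_uri}> a void:Dataset ;",
        f'    dcterms:title "{safe_title}" ;',
    ]
    if sparql_endpoint:
        lines.append(f"    void:sparqlEndpoint <{sparql_endpoint}> ;")
    for subject in subjects:
        # Topic annotation, e.g. a DBpedia category assigned to the dataset
        lines.append(f"    dcterms:subject <{subject}> ;")
    lines.append(f"    void:triples {triples} .")
    return "\n".join(lines)
```

For the BBC dataset mentioned earlier in the deck, a call might look like `void_description("http://datahub.io/dataset/bbc", "BBC", 60000000, subjects=["http://dbpedia.org/resource/Category:Astronomy"])`, where the title and the subject category are placeholders.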
http://linkedup-project.eu
http://data.linkededucation.org/linkedup/catalog/
Assessing the Educational Linked Data Landscape,
D’Aquin, M., Adamou, A., Dietze, S., ACM Web Science
2013 (WebSci2013), Paris, France, May 2013.