Publishing Linked Data using Schema.org

Development and management
of e-Repositories – OTA

IODE, Oostende, Belgium,
April 11th, 2013

An introduction to the project of
Mr. Aditya Kakodkar by
Christophe.Dupriez@destin-informatique.com

LinkedData, Why?
● External/Internal (Reference) Data
use and reuse
● (Meta) Data encoded and published according to
standardized, perennial and documented
measurement systems and categories
● Massive international efforts for tools and
interlinked repositories development
● Opportunity to become a General Reference on
the Web for a specific domain
● Your work becomes discoverable
and well positioned by Search Engines

Data to be linked ?
● Metadata provides the context,
links to a MODEL
● Observed Data: source, measure/range, unit...
● Manually entered Data: validation rules
● Aggregated Data:
Which indicator for which decision?
● Published Data: exact? complete? perennial?
● Reference Data: comparability with other data?
● Open Data is (not) Public Data! http://opendatacommons.org
● Personal Data: protection? anonymisation?
● Big Data: dangers? opportunities?

Linking Data in order to...
● Denote a “real life” object,
a concept, a transaction...
– often not uniquely enough: see sameAs.org
● Document (explain, contextualize)
the data to the user (HTML document page)
● Enrich, linking to other data ...
(RDF data page)

LinkedData semiotic triangle

https://twitter.com/kidehen/status/312571324742651908

RDF: Resource Description
Framework
● A standard to provide (meta)data on the Web
● Based on a very simple model of triples:
subject – property – object
● Everything is a URI; the object can also be a
“constant value” (a text, a number, a date...),
optionally suffixed by an indication of the language
● Example:
dbpedia:European_Herring_Gull rdfs:label “Goéland argenté”@fr
where “dbpedia:” stands for URI prefix:
http://dbpedia.org/resource/
and “rdfs:” stands for URI prefix:
http://www.w3.org/2000/01/rdf-schema#
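
As a rough illustration of the triple model (a sketch, not part of the original slides), the example above can be written as a plain Python tuple with the two prefixes expanded to full URIs. The names PREFIXES, Literal and expand are hypothetical helpers chosen for this sketch; a real project would more likely rely on an RDF library.

```python
from typing import NamedTuple

# Prefix table copied from the slide; the helper names are illustrative only.
PREFIXES = {
    "dbpedia": "http://dbpedia.org/resource/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


class Literal(NamedTuple):
    """A constant value with an optional language tag."""
    value: str
    lang: str = ""


def expand(curie: str) -> str:
    """Expand 'prefix:local' into a full URI using PREFIXES."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


# subject, property, object
triple = (
    expand("dbpedia:European_Herring_Gull"),
    expand("rdfs:label"),
    Literal("Goéland argenté", "fr"),
)

for part in triple:
    print(part)
```

The subject and the property are URIs, while the object here is a constant value carrying the language tag "fr".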

Being a Gull is not Dull !
● http://en.wikipedia.org/wiki/European_Herring_Gull
● http://dbpedia.org/resource/European_Herring_Gull
which redirects to the document
(HTML for human consumption):
http://dbpedia.org/page/European_Herring_Gull
● Data (for machine consumption) is generated separately in
different formats (N3, Turtle, XML, JSON...) :
http://dbpedia.org/data/European_Herring_Gull.n3
● The browser negotiates the suitable format
(HTTP content negotiation)...
● What is validated there? What are the rules?
● Can it be a reference to take decisions?
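
The three URL patterns listed above, and the idea of asking for a format through the HTTP Accept header, can be sketched as follows. This is a minimal sketch: the function names are hypothetical, the request is only prepared and never sent, and whether the server actually answers a "text/turtle" request with Turtle is an assumption that is not checked here.

```python
from urllib.request import Request

BASE = "http://dbpedia.org"


def dbpedia_urls(name: str) -> dict:
    """Return the resource, HTML page and N3 data URLs for one DBpedia name."""
    return {
        "resource": f"{BASE}/resource/{name}",
        "page": f"{BASE}/page/{name}",
        "data_n3": f"{BASE}/data/{name}.n3",
    }


def negotiated_request(name: str, media_type: str = "text/turtle") -> Request:
    """Prepare (but do not send) a request that asks for a given format."""
    return Request(dbpedia_urls(name)["resource"],
                   headers={"Accept": media_type})


urls = dbpedia_urls("European_Herring_Gull")
req = negotiated_request("European_Herring_Gull")
print(urls["page"])
print(urls["data_n3"])
print(req.full_url, req.get_header("Accept"))
```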

Using a single page?
● RDFa and MicroData are two standards to MERGE
an HTML document (made for humans) and the data
a machine may wish to extract from it
● Example from a page in OceanExpert.net:
<h1>Details of <span itemprop="name">
<span itemprop="familyName">Dupriez</span>,
<span itemprop="givenName">Christophe</span>
</span></h1>
● ANY23.org, an Open Source tool that collects data
embedded in a Web page, will be demonstrated later
on OceanExpert.net...
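
To give an idea of what such an extractor does, here is a minimal sketch using only Python's standard html.parser module. It is not ANY23 and does not implement the full Microdata algorithm: it simply collects the text found inside every element that carries an itemprop attribute. The class and function names are hypothetical.

```python
from html.parser import HTMLParser

# Elements without a closing tag; they are never pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class ItempropCollector(HTMLParser):
    """Collect the text content of elements that have an itemprop attribute.

    Limitations of this sketch: it assumes well-nested HTML and ignores
    itemprop values carried by void elements (for instance <meta content>).
    """

    def __init__(self):
        super().__init__()
        self.stack = []    # one entry per open element: itemprop name or None
        self.values = {}   # itemprop name -> list of text pieces

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        prop = dict(attrs).get("itemprop")
        self.stack.append(prop)
        if prop is not None:
            self.values.setdefault(prop, [])

    def handle_endtag(self, tag):
        if tag not in VOID and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Text belongs to every itemprop element that is currently open.
        for prop in self.stack:
            if prop is not None:
                self.values[prop].append(data)


def extract_itemprops(html: str) -> dict:
    """Return {itemprop name: whitespace-normalized text}."""
    parser = ItempropCollector()
    parser.feed(html)
    parser.close()
    return {name: " ".join("".join(pieces).split())
            for name, pieces in parser.values.items()}


sample = ('<h1>Details of <span itemprop="name">'
          '<span itemprop="familyName">Dupriez</span>, '
          '<span itemprop="givenName">Christophe</span>'
          '</span></h1>')
print(extract_itemprops(sample))
```

Because "name" wraps the two inner spans, its text is the concatenation of both nested values.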

Data Model
● Which processes do we need to automate?
(use cases)
● Which entities (real objects, concepts,
transactions/events) have to be represented?
● How do those entities interrelate?
● What measures (properties) are made about
each type of entity?
● Reuse: who else will align on the same model?
What may Google do with my data?

Schema.org
● Schema.org is a modelling initiative of
Google / Microsoft / Yahoo to standardize URIs for RDF
properties
● Common model for data published as documents
harvestable on the web
● Their goal is to collect the data in our pages.
Those pages are then better indexed.
What else? (A.I.?)
● Schema.org models are far from exhaustive
(for instance, insufficient for CVs)
but a “/extension” mechanism exists
● Examples on the site http://schema.org

Google RichSnippets
● Google Spider extracts data tagged using RDFa
or MicroData
● Pages with such data are promoted...
● Google Search Engine enriches results using
this data
● Example “Apollo Theatre”:
place, events, reviews...
● Google RichSnippets tool validates a web page:
http://www.google.com/webmasters/tools/richsnippets

Data Search Engine
● ANY23 is used to feed SINDICE,
the Search Engine for RDF data
● Example:
http://www.sindice.com/search?q=apollo+theatre
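
The example URL above follows the usual query-string encoding, which can be reproduced with the standard library. This is only a sketch of how the link is built: the function name is hypothetical and no request is sent.

```python
from urllib.parse import urlencode

SINDICE_SEARCH = "http://www.sindice.com/search"


def sindice_query_url(terms: str) -> str:
    """Build the search URL for the given terms (spaces become '+')."""
    return SINDICE_SEARCH + "?" + urlencode({"q": terms})


print(sindice_query_url("apollo theatre"))
```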
