Linked Open Data means making data more interoperable with other datasets on the web by using URIs as identifiers and triples as atomic building blocks. URIs are assigned to every term and concept, and triples are used to connect terms and represent facts about entities. This allows different machines to understand the relationships between data in a consistent way. Publishing data as Linked Open Data according to these principles can make it easier to query and integrate with other datasets.
Basics of Open Data: what you need to know by Wouter Degadt & Pieter Colpaert
1. Data → open and linked
Wouter Degadt & Pieter Colpaert
wouter.degadt@leiedal.be & pieter.colpaert@okfn.org
2. Programme
1. The basics
Data → Open Data → Linked Data
2. Linked Open Data
How to publish data?
3. Data
Wikipedia says:
English (disambiguation): data is uninterpreted information
English (computing): data is any sequence of symbols given meaning by specific acts of interpretation
Dutch: data is the plural of datum, which is an observation of a fact
6. Interoperability between 2 datasets
process: Can the data governance of both datasets be merged?
legal: Are you legally allowed to merge the 2 datasets?
technical: Can you connect the communication channels? E.g., merging a dataset published on CD with a dataset published on floppy disk
syntactic: How interoperable are the serialisation formats? E.g., JSON vs. PDF
object: What can you request from the server?
semantic: Do the words in one dataset mean the same as the words in the other?
↓
querying: How easy is it to ask questions across the borders of the datasets?
7. Open Data
Because non-personal data increases in value when
others reuse it
8. Data on the web
reuse is allowed | reuse in a gray zone | unauthorised reuse
10. How can we find open data?
It’s made available through open data portals
http://data.gov.uk,
http://datahub.io,
http://open-data.europa.eu,
http://data.gent.be,
…
Via links in existing datasets
e.g., http://dbpedia.org/resource/Ghent
11. Linked Data
Because it is impossible to store all the world’s
knowledge on one machine
12. Table / CSV / Spreadsheet
name | type | same as | location
iMinds | company | IBBT | Gaston Crommenlaan 8
JSON
{
  "iMinds" : {
    "type" : "company",
    "same as" : "IBBT",
    "location" : "Gaston Crommenlaan 8"
  }
}
XML
<iMinds>
  <type>company</type>
  <sameas>IBBT</sameas>
  <location>Gaston Crommenlaan 8</location>
</iMinds>
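The three serialisations on this slide carry the same record. As a minimal sketch (the helper names from_csv, from_json and from_xml are illustrative, and the CSV text is an assumed rendering of the table), the Python below parses each one with the standard library and ends up with the same dictionary. It also shows that the XML needs a manual mapping from "sameas" to "same as": syntactic interoperability alone does not settle what the field names mean.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# The same record in three serialisations (CSV layout is an assumption).
CSV_TEXT = "name,type,same as,location\niMinds,company,IBBT,Gaston Crommenlaan 8\n"
JSON_TEXT = (
    '{"iMinds": {"type": "company", "same as": "IBBT", '
    '"location": "Gaston Crommenlaan 8"}}'
)
XML_TEXT = (
    "<iMinds><type>company</type><sameas>IBBT</sameas>"
    "<location>Gaston Crommenlaan 8</location></iMinds>"
)


def from_csv(text):
    """Read the first data row of the CSV as a dictionary."""
    return dict(next(csv.DictReader(io.StringIO(text))))


def from_json(text):
    """The JSON uses the name as the outer key; flatten it into the record."""
    (name, fields), = json.loads(text).items()
    return {"name": name, **fields}


def from_xml(text):
    """The XML uses the name as root tag and 'sameas' without a space."""
    root = ET.fromstring(text)
    record = {"name": root.tag}
    for child in root:
        key = "same as" if child.tag == "sameas" else child.tag
        record[key] = (child.text or "").strip()
    return record


records = [from_csv(CSV_TEXT), from_json(JSON_TEXT), from_xml(XML_TEXT)]
print(records[0] == records[1] == records[2])
```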
14. World Wide Web
Machine 1, Machine 2 and Machine 3 each publish statements:
iMinds → same as → IBBT
iMinds → is a → company
IBBT → located at → Gaston Crommenlaan 8
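The idea of this slide can be sketched in a few lines of Python: each machine holds only some statements, and a question such as "where is iMinds located?" can only be answered after merging them and following the "same as" link. Which machine holds which statement is an assumption made for illustration, and the functions merge, aliases and lookup are hypothetical helpers, not part of any library.

```python
# Each machine publishes only the statements it knows about
# (the assignment of statements to machines is an assumption).
machine_1 = {("iMinds", "same as", "IBBT")}
machine_2 = {("iMinds", "is a", "company")}
machine_3 = {("IBBT", "located at", "Gaston Crommenlaan 8")}


def merge(*sources):
    """Union the statements of several machines into one graph."""
    merged = set()
    for source in sources:
        merged |= source
    return merged


def aliases(graph, name):
    """Return name plus everything linked to it by 'same as', in either direction."""
    known = {name}
    changed = True
    while changed:
        changed = False
        for s, p, o in graph:
            if p != "same as":
                continue
            if s in known and o not in known:
                known.add(o)
                changed = True
            elif o in known and s not in known:
                known.add(s)
                changed = True
    return known


def lookup(graph, name, predicate):
    """Find all objects for the predicate, for the name or any of its aliases."""
    names = aliases(graph, name)
    return {o for s, p, o in graph if s in names and p == predicate}


web = merge(machine_1, machine_2, machine_3)
print(lookup(web, "iMinds", "located at"))
```

With only machine_1 and machine_2 merged, the same lookup returns an empty set: the answer exists only across the borders of the individual datasets.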
15. Problem
semantic interoperability
The word company is ambiguous. How can we make
sure that machines understand each other?
What about “is a”?
and what about “iMinds”?
16. Solution
Uniform Resource Identifiers (URIs)
iMinds → http://data.kbodata.be/organisation/0866_386_380#id
is a → http://www.w3.org/1999/02/22-rdf-syntax-ns#type
Company → http://www.w3.org/ns/regorg#RegisteredOrganization
A triple is an atomic piece of data (a datum
or a fact) that cannot be misunderstood at
machine level in a Web context
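To illustrate why URIs remove the ambiguity, here is a minimal Python sketch. It assumes two datasets that describe the same fact with different local labels: an English one ("is a", "Company") and a Dutch one ("is een", "Bedrijf"). The lookup table URIS and the function to_uri_triple are hypothetical; the URIs themselves are the ones listed on the slide.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
REGISTERED_ORG = "http://www.w3.org/ns/regorg#RegisteredOrganization"

# Hypothetical mapping from local labels to the URIs on the slide.
URIS = {
    "iMinds": "http://data.kbodata.be/organisation/0866_386_380#id",
    "is a": RDF_TYPE,
    "is een": RDF_TYPE,
    "Company": REGISTERED_ORG,
    "Bedrijf": REGISTERED_ORG,
}


def to_uri_triple(subject, predicate, obj):
    """Replace every label in a (subject, predicate, object) statement by its URI."""
    return (URIS[subject], URIS[predicate], URIS[obj])


english_labels = ("iMinds", "is a", "Company")
dutch_labels = ("iMinds", "is een", "Bedrijf")

# As plain labels the two statements differ; as URI triples they are identical.
print(english_labels == dutch_labels)
print(to_uri_triple(*english_labels) == to_uri_triple(*dutch_labels))
```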
17. iMinds → is a → company
iMinds → http://data.kbodata.be/organisation/0866_386_380#id
is a → http://www.w3.org/1999/02/22-rdf-syntax-ns#type
Company → http://www.w3.org/ns/regorg#RegisteredOrganization
20. Linked Open Data cloud: the collection
of billions of triples published via the
Web
21. Summary
New terms: data quality, data interoperability, triples, open
data, linked open data cloud
Linked Open Data means: making your data more
interoperable with other datasets on the web by using URIs
as identifiers and triples as atomic building blocks
22. Data publishing
iMinds → http://data.kbodata.be/organisation/0866_386_380#id
is a → http://www.w3.org/1999/02/22-rdf-syntax-ns#type
Company → http://www.w3.org/ns/regorg#RegisteredOrganization
23. Linked Data principles
1. Use a URI for every term
2. Dereference these URIs over HTTP
3. Return useful information
4. Add links towards useful sources
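Principle 2 can be sketched with Python's standard library. A client looking up a URI drops the fragment (the part after #, which is never sent to the server) and asks for a machine-readable representation through the Accept header. The function name build_lookup is hypothetical, and whether a given server answers with Turtle is an assumption, so the request is only built here, not sent.

```python
from urllib.parse import urldefrag
from urllib.request import Request


def build_lookup(uri, accept="text/turtle"):
    """Prepare an HTTP GET for the document that describes the given URI.

    Returns the prepared request and the fragment that identifies the
    thing inside that document.
    """
    document_url, fragment = urldefrag(uri)
    request = Request(document_url, headers={"Accept": accept})
    return request, fragment


request, fragment = build_lookup(
    "http://data.kbodata.be/organisation/0866_386_380#id"
)
print(request.full_url)
print(request.get_header("Accept"))
print(fragment)

# Sending it would be urllib.request.urlopen(request); that step is left out
# because it depends on the server being reachable and supporting this format.
```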
24. E.g., I’m launching a new company
{mynewcompany} → http://{mynewcompany}.be/#org
is a → http://www.w3.org/1999/02/22-rdf-syntax-ns#type
Company → http://www.w3.org/ns/regorg#RegisteredOrganization
An identifier for your company, and
you are in charge of its meaning.
26. E.g., I’m launching a new company
{mynewcompany} → http://{mynewcompany}.be/#org
is a → http://www.w3.org/1999/02/22-rdf-syntax-ns#type
Company → http://www.w3.org/ns/regorg#RegisteredOrganization
{mynewcompany} → http://{mynewcompany}.be/#org
has a home page → http://xmlns.com/foaf/0.1/homepage
http://{mynewcompany}.be/
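The two statements above could be published as N-Triples, one of the standard plain-text RDF serialisations. The sketch below is one possible way to generate them. The company name "example" is a placeholder for {mynewcompany}, and company_triples and to_ntriples are hypothetical helpers rather than functions from an RDF library.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
REGISTERED_ORG = "http://www.w3.org/ns/regorg#RegisteredOrganization"
FOAF_HOMEPAGE = "http://xmlns.com/foaf/0.1/homepage"


def company_triples(name):
    """Mint a URI on the company's own domain and describe the company with it."""
    homepage = f"http://{name}.be/"
    org = f"{homepage}#org"
    return [
        (org, RDF_TYPE, REGISTERED_ORG),
        (org, FOAF_HOMEPAGE, homepage),
    ]


def to_ntriples(triples):
    """Serialise triples whose three terms are all URIs, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


# "example" stands in for {mynewcompany}; it is a placeholder, not a real company.
output = to_ntriples(company_triples("example"))
print(output)
```

Because the URI lives on a domain the company controls, the company decides what a lookup of that URI returns, which is what being in charge of the meaning comes down to in practice.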
How to reuse data is covered after the break, using concrete examples
What exactly does the word data mean?
Is this good data? Why? Any suggestions to improve the data?
https://github.com/datasets/employment-us/blob/master/archive/aat1.txt
2 kinds of feedback:
On the structure of the data → how quickly is it usable for my use case
On the content of the data → how close is it to reality
Categorisations of interoperability between 2 datasets differ enormously depending on the context. Whenever you read about interoperability, check carefully what exactly is meant.
This categorisation is brought together from several sources in the literature:
ISA
Rezaei
All data that is already on your website should be reusable for other purposes. That is how we build a decentralised knowledge base. E.g., you organise an event, you have contact details of your employees, you publish arrival times of buses, and so on. Let others build on top of your website.