From the Feb 19 2014 NISO Virtual Conference: The Semantic Web Coming of Age: Technologies and Implementations
The Web of Data - Ralph Swick, Domain Lead of the Information and Knowledge Domain at W3C
1. The Web of Data
NISO Virtual Conference
19 February 2014
Ralph Swick, W3C
2. Agenda
• Data is changing our lives
• W3C’s traditional focus
• Expanding scope of W3C’s data activities
3. The Web has transformed our relationship
to computers and to data
• A computer in every pocket
• Apps leveraging context
– geolocation and other sensors
– social context (“I’m at the conference, too!”)
• Change in the use of search
– people search for answers, not sites
– answers from aggregated data
(Siri, Google Now, Wolfram Alpha)
4. Apps are using data from many
sources
• Social networking
• Mobile devices
• Sensors
• Open data
5. Imagine…
• A “Web” where
– documents are available for download
on the Internet
– but there would be no hyperlinks
among them
6. Data on the Web is not enough…
• We need a proper infrastructure for a
real Web of Data where:
– data are available on the Web
• accessible via standard Web technologies
– data are interlinked over the Web
– data can be integrated over the Web
• This is Linked Data
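A minimal sketch of why interlinking matters for integration, assuming two hypothetical datasets held as sets of (subject, predicate, object) tuples. All example.org URIs are illustrative; only the rdfs:label property URI is a real one.

```python
# Two hypothetical datasets, each a set of (subject, predicate, object) triples.
CITY = "http://example.org/id/city/kolkata"  # illustrative URI, not a real identifier
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

dataset_a = {
    (CITY, RDFS_LABEL, "Kolkata"),
}
dataset_b = {
    ("http://example.org/id/library/1", "http://example.org/vocab/locatedIn", CITY),
}

def merge(*graphs):
    """Integrating triple sets is a set union: shared URIs line up automatically."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged

def describe(graph, uri):
    """Return every triple that mentions the URI as subject or object."""
    return sorted(t for t in graph if t[0] == uri or t[2] == uri)

merged = merge(dataset_a, dataset_b)
about_city = describe(merged, CITY)
for triple in about_city:
    print(triple)
```

Because both sources use the same URI for the city, no key mapping or join logic is needed; the link itself does the integration.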
7. Agenda
• Data is changing our lives
• W3C’s traditional focus
• Expanding scope of W3C’s data activities
8. Semantic Web Core
• RDF: data model
• RDF Schema: vocabulary design
• RDB2RDF: relational DB export
• SPARQL: query
• SKOS: vocabulary description
• OWL: ontological inference
• RIF: rules interchange
• LDP: read-write Web of Data
• POWDER: description resources
• GRDDL: app-specific XML
9. Need for RDF schemas
• First step towards the “extra knowledge”:
– define the terms we can use
– what restrictions apply
– what extra relationships are there?
• “RDF Vocabulary Description Language”
– the term “Schema” is retained for historical
reasons…
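The "extra relationships" idea can be sketched with two well-known RDFS rules: subClassOf is transitive, and an instance of a class is also an instance of its superclasses. This is a hedged illustration only; the example.org vocabulary and the item URI are hypothetical, and a real RDFS reasoner covers more rules than these two.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
EX = "http://example.org/vocab/"          # hypothetical vocabulary namespace
ITEM = "http://example.org/id/item42"     # hypothetical resource

triples = {
    (EX + "Novel", RDFS_SUBCLASS, EX + "Book"),
    (EX + "Book", RDFS_SUBCLASS, EX + "Document"),
    (ITEM, RDF_TYPE, EX + "Novel"),
}

def rdfs_closure(graph):
    """Apply the two rules repeatedly until no new triple appears."""
    inferred = set(graph)
    while True:
        new = set()
        for (s, p, o) in inferred:
            for (s2, p2, o2) in inferred:
                # Only pair a triple with a subClassOf statement about its object.
                if p2 != RDFS_SUBCLASS or o != s2:
                    continue
                if p == RDFS_SUBCLASS:
                    new.add((s, RDFS_SUBCLASS, o2))   # transitivity
                elif p == RDF_TYPE:
                    new.add((s, RDF_TYPE, o2))        # type inheritance
        if new <= inferred:
            return inferred
        inferred |= new

closure = rdfs_closure(triples)
print((ITEM, RDF_TYPE, EX + "Document") in closure)
```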
10. Vocabularies
• There is a need for “languages” that can
– define such vocabularies
– assign clear “semantics” on how new
relationships can be deduced
11. SKOS
• SKOS provides a simple bridge
between the “print world” and the
(Semantic) Web
• Thesauri, glossaries, etc., from the
library community can be made
available
• SKOS can also be used to organize,
e.g., tags, annotate other vocabularies,
…
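A small sketch of how a thesaurus fragment might be expressed with SKOS, assuming a hypothetical three-term hierarchy under example.org. Only skos:prefLabel and skos:broader are used, and the helper assumes each concept has at most one broader concept and no cycles.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
EX = "http://example.org/thesaurus/"   # hypothetical thesaurus namespace

# concept URI -> (preferred label, broader concept URI or None)
concepts = {
    EX + "science": ("Science", None),
    EX + "physics": ("Physics", EX + "science"),
    EX + "optics": ("Optics", EX + "physics"),
}

def to_triples(concept_table):
    """Turn the table into SKOS triples (prefLabel and broader)."""
    triples = []
    for uri, (label, broader) in concept_table.items():
        triples.append((uri, SKOS + "prefLabel", label))
        if broader is not None:
            triples.append((uri, SKOS + "broader", broader))
    return triples

def ancestors(triples, uri):
    """Follow skos:broader links upward from a concept."""
    broader = {s: o for (s, p, o) in triples if p == SKOS + "broader"}
    chain = []
    while uri in broader:
        uri = broader[uri]
        chain.append(uri)
    return chain

skos_triples = to_triples(concepts)
print(ancestors(skos_triples, EX + "optics"))
```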
12. Semantic Web/Linked Data Today
• Standards are mature
– some level of maintenance work is always needed
• Server-side applications dominate
• Commercial applications exist, e.g.:
– direct integration/usage of linked data on the Web
– consumption of other formats converted internally to a
common format (RDF)
13. Challenge: leverage data in
interoperable apps
• Public, private, behind enterprise firewalls
• From informal to highly curated
• From machine readable to human readable
– HTML tables, twitter feeds, local vocabularies,
spreadsheets, …
• Expressed in diverse data models
– tree, graph, table, …
• Serialized in many ways
– XML, CSV, RDF, PDF, JSON, HTML Tables,…
15. Linked Data Principles
Is your data 5 Star?
• 1 star: Available on the Web in some format (i.e., use a URI to access the
data)
• 2 stars: Available as machine-readable structured data (e.g., Excel instead
of an image scan)
• 3 stars: As before, but using a non-proprietary format (e.g., CSV instead of
Excel)
• 4 stars: All the above, plus use open standards (RDF & Co.) to identify
things, so that people could point at your stuff
• 5 stars: All the above, plus link your data to other people’s data to provide
context
17. The importance of Linked Data
• Provide a core set of data that
applications can build on
– stable references for “things”,
• e.g., http://dbpedia.org/resource/Kolkata/
– many many relationships that applications
may reuse
– a “nucleus” for a larger, semantically
enabled Web!
18. Linked Data Platform (LDP)
• Define an HTTP/RESTful based
infrastructure to publish, read, write, or
modify linked data
– typical usage: data intensive application in a
browser, application integration using shared
data…
• The infrastructure should be easy to
implement and install
– provides an “entry point” for Linked Data
applications!
• The work is nearing completion
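A hedged sketch of what creating a resource over HTTP might look like in this style of infrastructure. It only builds the request and does not send it. The container URL, slug, and Turtle body are hypothetical; the Slug and Link headers follow the LDP specification as I understand it, so check them against the final document.

```python
from urllib.request import Request

def build_ldp_create(container_url, turtle_body, slug):
    """Build (but do not send) a POST asking a container to create a resource."""
    return Request(
        container_url,
        data=turtle_body.encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "text/turtle",
            # Hint for the new resource's URI; a server may ignore it.
            "Slug": slug,
            # Declares the kind of resource being created.
            "Link": '<http://www.w3.org/ns/ldp#Resource>; rel="type"',
        },
    )

# Hypothetical payload: one triple about the resource being created.
turtle = '<> <http://purl.org/dc/terms/title> "NISO talk notes" .'
req = build_ldp_create("http://example.org/container/", turtle, "notes")

print(req.get_method(), req.full_url)
for name, value in req.header_items():
    print(name + ":", value)
```

Reading, updating, and deleting would use the ordinary GET, PUT, and DELETE methods on the resource's own URI.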
19. RDF with HTML: RDFa
• By adding some “meta” information, the
same source can be reused
– typical example: your personal information,
like address, should be readable for humans
and processable by machines
• Some solutions have emerged:
– add extra statements in microdata or RDFa
that can be converted to RDF
• microdata can be used for a (useful) subset of RDF
• RDFa is, essentially, a complete serialization of
RDF
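To illustrate the "readable for humans, processable by machines" point, here is a deliberately simplified sketch that pulls `property` values out of an HTML fragment. It is not a conforming RDFa processor: it ignores vocab and prefix resolution, nesting, `typeof`, and `resource`. The markup and the person in it are made up.

```python
from html.parser import HTMLParser

class PropertyExtractor(HTMLParser):
    """Collect (property, text) pairs from elements with a `property` attribute."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._current = None   # property whose text is being collected
        self._text = []

    def handle_starttag(self, tag, attrs):
        prop = dict(attrs).get("property")
        if prop is not None:
            self._current = prop
            self._text = []

    def handle_data(self, data):
        if self._current is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Simplification: the first closing tag ends the current property.
        if self._current is not None:
            self.pairs.append((self._current, "".join(self._text).strip()))
            self._current = None

# Hypothetical page fragment using schema.org terms in RDFa attributes.
page = """
<div vocab="http://schema.org/" typeof="Person">
  <span property="name">Jane Example</span>,
  <span property="jobTitle">Librarian</span>
</div>
"""

extractor = PropertyExtractor()
extractor.feed(page)
pairs = extractor.pairs
print(pairs)
```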
20. schema.org
• Schema.org is a cooperation of search engines
(Bing, Google, Yahoo!, and Yandex)
• It is a large vocabulary that they all understand
• The terms are extracted from
HTML5+microdata or HTML5+RDFa
– the various partners use it for different purposes
– it can be used by anyone outside of the search
world!
22. Some things to remember when
you publish data
• Publish your data first, do user interfaces later!
– the “raw data” can become useful in its own right
and others may use it
– you can add value later by providing nice
user access
• If possible, publish your data in RDF but if you
cannot, others may help you in conversions
– trust the community…
• Add links to other data. “Just” publishing isn’t
enough…
23. Some things to remember when
you publish data (2)
• Think about persistence and versioning
– others may depend on the data you publish…
• Be thoughtful about the URIs you choose
• Try to avoid reinventing the wheel when
choosing vocabularies
24. Some things to remember when
you publish data (3)
• Document your data, i.e., provide
metadata
– there are vocabularies to do this
• Data Catalog Vocabulary (DCAT)
• Vocabulary of Interlinked Datasets (VoID)
• DCTERMS
• vocabularies for licensing (Open Data Commons,
government licenses)
– this area is still very much in development…
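A minimal sketch of dataset metadata using DCAT and DCTERMS terms, written out as N-Triples. The dataset, license, and download URIs are hypothetical placeholders; the DCAT and DCTERMS property names are real, but which ones a publisher needs is a judgment call.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"

def iri(value):
    return "<%s>" % value

def literal(value):
    # Escape the characters that N-Triples requires escaping in plain literals.
    escaped = (value.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n"))
    return '"%s"' % escaped

def describe_dataset(dataset_uri, title, license_uri, csv_url):
    """Return N-Triples describing a dataset with one CSV distribution."""
    dist = dataset_uri + "#csv"   # hypothetical URI pattern for the distribution
    triples = [
        (iri(dataset_uri), iri(RDF_TYPE), iri(DCAT + "Dataset")),
        (iri(dataset_uri), iri(DCT + "title"), literal(title)),
        (iri(dataset_uri), iri(DCT + "license"), iri(license_uri)),
        (iri(dataset_uri), iri(DCAT + "distribution"), iri(dist)),
        (iri(dist), iri(RDF_TYPE), iri(DCAT + "Distribution")),
        (iri(dist), iri(DCAT + "downloadURL"), iri(csv_url)),
    ]
    return "\n".join("%s %s %s ." % t for t in triples)

ntriples = describe_dataset(
    "http://example.org/dataset/1",
    "Example holdings list",
    "http://example.org/licenses/open",
    "http://example.org/dataset/1.csv",
)
print(ntriples)
```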
25. Agenda
• Data is changing our lives
• W3C’s work on data integration
• Expanding scope of W3C’s data activities
26. New work underway
• CSV on the Web
• Data on the Web Best Practices
• Vocabulary management
27. What we are hearing
• CSV is everywhere
– can be huge data sets, not easily readable in a spreadsheet
or Google Refine
– meaning of data not in machine-readable form
– data is not necessarily used for web-scale integration but
rather immediate usage
• Metadata is essential
• Conversion is an issue
• European Commission Study on business models
for Linked Open Government Data (BM4LOGD)
28. Linked Data Benefits (BM4LOGD)
• Flexible data integration
– Streamlined internal processes
– Where working relationships already exist, much easier to
share
– Linking reference collections; discovery of new relationships
• Increase in data quality
– More use of data internally brings errors to light
– Use of open standards increases quality of system
• New services
• Cost reduction
– Increased efficiency
– Increase in data usage due to LOD enrichment
29. CSV on the Web
• How W3C can help
– metadata vocabulary to describe CSV data (structure,
reference to access rights, annotations, etc.)
– metadata discovery (e.g., part of an HTTP header, special
rows and columns, packaging formats…)
– mapping content to RDF, JSON, XML
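The three ideas above (describe the columns, find the description, map the content) can be sketched as follows. The metadata structure here is purely hypothetical and is not the vocabulary the working group is defining; the CSV rows are made-up sample data.

```python
import csv
import io
import json

# Made-up sample data.
csv_text = "id,name,founded\n1,Example Library,1901\n2,Sample Archive,1955\n"

# Hypothetical metadata: which column identifies a row, and how each
# remaining column maps to a property URI and a datatype.
metadata = {
    "base": "http://example.org/org/",
    "id_column": "id",
    "columns": {
        "name": {"property": "http://schema.org/name", "datatype": str},
        "founded": {"property": "http://example.org/vocab/foundedYear", "datatype": int},
    },
}

def rows_to_records(text, meta):
    """Map each CSV row to a JSON-friendly record keyed by property URI."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        record = {"@id": meta["base"] + row[meta["id_column"]]}
        for column, spec in meta["columns"].items():
            record[spec["property"]] = spec["datatype"](row[column])
        records.append(record)
    return records

records = rows_to_records(csv_text, metadata)
print(json.dumps(records, indent=2))
```

Keeping the description separate from the CSV file means the data itself stays untouched and immediately usable, which matches the "immediate usage" pattern noted earlier.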
30. Best practices
• Document best practices for the data publishers
– URI design, management of persistence, versioning
– business models
– use of core metadata vocabularies (provenance, access
control, ownership)
• Specific vocabularies
– quality, application descriptions, …
31. Vocabulary management:
challenge
• Interoperable vocabularies are key for (meta)data
• At the moment, it is a fairly chaotic world…
– many, possibly overlapping vocabularies
– difficult to locate the one that is needed
– vocabularies may not be properly managed, maintained,
versioned, provided persistence…
32. Vocabulary management: how
W3C can help
• Provide a space where
– communities can develop vocabularies (through, e.g.,
CGs, possibly WGs)
– host vocabularies at W3C if requested
– annotate vocabularies with a proper set of metadata terms
– establish a vocabulary directory
• The exact structure is still being discussed
33. Summary
• Data-driven smart apps are one of the major growth
engines for the worldwide software market.
• We need to meet developers where they are.
• 5 Star Benefits of LOD
– Greater efficiency, better provision of the task
– Greater flexibility leads to lower costs for future projects
– New services, new connections, new discoveries
– Improved navigation within and between datasets
– Others can build apps based on your data
34. Available specifications:
Primers, Guides
• Primers:
– RDF Primer
– OWL Guide
– SKOS Primer
– GRDDL Primer
– RDFa Primer
• The W3C Semantic Web Activity Wiki has links to all
the specifications
35. These slides are on the Web at
http://www.w3.org/2014/Talks/0219-NISO-RRS
with thanks to Ivan Herman, W3C
and Phil Archer, W3C