The current status of Linked Open Data (LOD) shows evidence of many datasets available on the Web in RDF. In the meantime, organizations still face many challenges on their journey towards publishing five-star datasets on the Web. Those challenges are not only technical but also organizational. At a moment when connectionist AI is riding a wave of popularity with many applications, LOD needs to go beyond the guarantee of the FAIR principles. One direction is to build a sustainable LOD ecosystem with FAIR-S principles. In parallel, LOD should serve as a catalyst for solving societal issues (LOD for Social Good) and for personal empowerment through data (Social Linked Data).
1. THE FUTURE OF LINKED OPEN DATA
Ghislain Atemezing, PhD
Director R&D - MONDECA
@gatemezing
ESSnet Linked Open Statistics - Sofia, Bulgaria - 28th May 2019
2. AGENDA
❖ Current status of LOD
❖ Challenges
➢ LOD is NOT (only) about Technology
❖ Signs of Hope
❖ Towards a sustainable LOD ecosystem - FAIRS (FAIR + Sustainable)
3. RDF: Simple or hard to use?
“RDF is hard to sell”
“RDF is heavy” - Eoin MacCuirc
“RDF is simple enough that you can build a complex system”
“It’s difficult to standardize vocabularies because of many egos”
“The Semantic Web is . . . an extension of the current one, in which information is given well-defined meaning.” “Meaning is expressed by RDF.”
Is RDF hard to use? Why?
4. Google Trends - RDF vs LOD - Last five years
LOD is more popular than RDF as a search term
LOD & RDF searches have been decreasing since 2014
5. LOD Evolution in the last decade
March 2008: 34 datasets
March 2014: 570 datasets, 2,909 links
March 2019: 1,239 datasets, 16,147 links
In the last five years:
- ~2X more datasets available
- ~5X more links in the LOD
6. LOD Stats by 2024 - Predictions
LOD will contain at least:
- 2,688 datasets
- 88,808 links
Is this realistic or not?
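The 2024 figures above come from a simple linear extrapolation: apply the rounded 2014-to-2019 growth factors (about 2.17x for datasets and 5.5x for links) to the 2019 counts once more. A minimal sketch of that arithmetic:

```python
# 2019 counts from the LOD Cloud, as on the previous slide.
datasets_2019, links_2019 = 1_239, 16_147

# Rounded 2014 -> 2019 growth factors (1,239/570 ~ 2.17; as used on the slide, 5.5 for links).
datasets_2024 = int(datasets_2019 * 2.17)  # -> 2688
links_2024 = int(links_2019 * 5.5)         # -> 88808

print(datasets_2024, links_2024)
```

Whether growth stays linear is exactly the "is this realistic?" question the slide asks.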
8. What’s up LOD? - State of the LOD Cloud in 2019
- How many datasets by domain are available?
- How many vocabularies per dataset?
- What are the most-used predicates for interlinking, by category?
- How many linked datasets?
- How many datasets use the Data Cube vocabulary?
- How many broken links?
In 2019, you can’t simply get an answer by looking at the LOD Cloud.
10. Publishers, do they ever know who is consuming their datasets?
Not always... why?
Are we building towers of knowledge?
How can we know who is consuming our datasets?
What are the incentives for the publishers?
11. Are we (really) data driven ORGs?
Many use cases of semantic technologies in industry
Why and how are people still sceptical about RDF?
The problem is maybe NOT about the technology
ORGs should show the path through massive data generation on the Web
12. (Some) Challenges to create LOD
Shared vocabulary management
Ontology creation: no clear methodology / lack of internal expertise
Mappings to ontologies are not trivial
Links to external datasets (which ones? Default: DBpedia?)
Pan-national interpretation and comparison is particularly challenging
13. More Challenges
Maintenance of tools: we can’t trust tools built by PhDs / interns
Versioning of datasets in the LOD cloud
Annual review of datasets (by whom?)
General commitment / finding a real business value
14. Organizational challenges: where is the CDO?
Lack of data governance in our ORGs
Minimal data sharing within the ORG
No existing practice for documenting knowledge
Lack of vision on harmonizing different “data lakes”
15. Challenges - Metadata / Versioning
Frequent releases of datasets in the LOD cloud
Managing versions and tracking diffs of datasets
Proper use of metadata to track changes / check data consistency
Data quality and provenance attached to datasets
Licensing issues (how to properly cite and reuse datasets)
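One pragmatic way to decide when a release is really a new version is to fingerprint the dataset's content and compare fingerprints across dumps. A minimal Python sketch, where the URIs and the DCAT-style record keys are illustrative, not a standard:

```python
import hashlib

def dataset_fingerprint(triples):
    """Order-independent hash over N-Triples lines, so reordered
    dumps of identical data produce the same fingerprint."""
    digest = hashlib.sha256()
    for line in sorted(triples):
        digest.update(line.encode("utf-8"))
    return digest.hexdigest()[:16]

v1 = [
    '<http://example.org/ds> <http://purl.org/dc/terms/title> "Census 2018" .',
    '<http://example.org/ds> <http://purl.org/dc/terms/license> <http://example.org/cc0> .',
]
v2 = v1 + ['<http://example.org/ds> <http://purl.org/dc/terms/issued> "2019-05-28" .']

changed = dataset_fingerprint(v1) != dataset_fingerprint(v2)

# Bump the version only when the content actually changed; the metadata
# record mimics DCAT/OWL versioning terms in a plain dict.
record = {
    "dcat:dataset": "http://example.org/ds",
    "dct:issued": "2019-05-28",
    "owl:versionInfo": "1.1" if changed else "1.0",
}
print(record)
```

Attaching such a fingerprint to the release metadata gives consumers a cheap consistency check without downloading the full dump.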
16. Signs of Hope
Many semantic technology advances are reducing the barriers to querying billions of triples, even on a normal laptop
Photo by Ron Smith on Unsplash
17. Democratizing Access to LOD
“Fernández, J. D., Beek, W., Martínez-Prieto, M. A., and Arias, M. LOD-a-lot: A Queryable Dump of the LOD Cloud (2017). http://purl.org/HDT/lod-a-lot.”
28 billion unique triples from 650K datasets. All of LOD on a medium-size laptop: 524 GB of disk space; 15.7 GB of RAM
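LOD-a-lot builds on HDT, a compressed binary RDF format that stores every term once in a dictionary and represents triples as tuples of integer IDs; that compression is a large part of why billions of triples fit on one machine. A toy Python sketch of the dictionary-encoding idea (not the real HDT layout):

```python
class TinyHDT:
    """Toy dictionary-encoded triple store in the spirit of HDT."""

    def __init__(self):
        self.dict = {}      # term -> integer id
        self.terms = []     # integer id -> term
        self.triples = []   # (subject_id, predicate_id, object_id)

    def _id(self, term):
        if term not in self.dict:
            self.dict[term] = len(self.terms)
            self.terms.append(term)
        return self.dict[term]

    def add(self, s, p, o):
        self.triples.append((self._id(s), self._id(p), self._id(o)))

    def search(self, s=None, p=None, o=None):
        """Triple-pattern lookup; None acts as a wildcard."""
        want = tuple(None if t is None else self.dict.get(t, -1)
                     for t in (s, p, o))
        for t in self.triples:
            if all(w is None or w == v for w, v in zip(want, t)):
                yield tuple(self.terms[i] for i in t)

store = TinyHDT()
store.add("ex:Alice", "foaf:knows", "ex:Bob")
store.add("ex:Alice", "foaf:name", '"Alice"')
hits = list(store.search(s="ex:Alice", p="foaf:knows"))
print(hits)
```

The real format adds bitmap-compressed triple indexes on top of the dictionary, which is what makes pattern lookups over 28 billion triples feasible on commodity hardware.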
18. Google Data Search or Schema.org In Action
Google Dataset Search launched in 2018
Based on schema.org (cf. https://toolbox.google.com/datasetsearch/search?query=Site%3Adata.gouv.fr)
Uses DCAT and other structured metadata to discover open datasets
One DCAT file per dataset / Googlebot is not smart enough
Link: https://toolbox.google.com/datasetsearch
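Dataset Search discovers datasets through schema.org `Dataset` markup embedded in web pages as JSON-LD. A minimal Python sketch that generates such a record; all the values (name, URLs, license) are illustrative:

```python
import json

# Property names follow the schema.org Dataset type, which crawlers
# such as Googlebot pick up when the JSON-LD is embedded in a page.
dataset = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "name": "Linked Open Statistics demo",  # illustrative values
    "description": "Example statistical dataset published as LOD.",
    "license": "https://creativecommons.org/publicdomain/zero/1.0/",
    "distribution": {
        "@type": "DataDownload",
        "encodingFormat": "text/turtle",
        "contentUrl": "https://example.org/data/demo.ttl",
    },
}

# Serialize for embedding in a <script type="application/ld+json"> tag.
jsonld = json.dumps(dataset, indent=2)
print(jsonld)
```

Publishing one such record per dataset page is the low-effort path to discoverability that the slide contrasts with richer DCAT catalogs.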
20. Beyond FAIR Principles
Findable: unique IDs that are resolvable
Accessible: common access methods
Interoperable: shared vocabularies & taxonomies
Reusable: provenance, license
FAIR + Sustainable => FAIRS
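The four principles above can be read as a concrete checklist against a dataset's metadata. A toy Python self-check, where the record keys are illustrative rather than any standard schema:

```python
# Which metadata fields back each FAIR principle (illustrative mapping).
REQUIRED = {
    "Findable":      ["identifier"],            # resolvable, unique ID
    "Accessible":    ["access_url"],            # common access method
    "Interoperable": ["vocabularies"],          # shared vocabularies
    "Reusable":      ["license", "provenance"],
}

def fair_report(meta):
    """True per principle iff all its backing fields are present and non-empty."""
    return {principle: all(meta.get(key) for key in keys)
            for principle, keys in REQUIRED.items()}

meta = {
    "identifier": "https://example.org/id/ds1",
    "access_url": "https://example.org/sparql",
    "vocabularies": ["dcat", "qb"],
    "license": "CC0-1.0",
    # provenance missing -> not fully Reusable
}
report = fair_report(meta)
print(report)
```

Sustainability (the "S" in FAIRS) is harder to automate: it is about who keeps these fields true over time, not whether they are filled in today.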
21. Wikidata - A Community for Wikibase
56M items, 700M statements, 400 languages, 20K active contributors per month, 900M edits, 8.5M daily SPARQL queries
A healthy community that helps write SPARQL queries, showing that the technology is mature
60M links to DBpedia, 7.7 billion triples
Many applications in chatbots (Apple Siri, research, scientists, etc.)
SPARQL is affordable and usable
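Those 8.5M daily queries hit the public Wikidata Query Service. A minimal Python sketch that builds such a request with the standard library; the endpoint is the real one, but the query itself is a stock example (items that are an instance of "house cat") and is not from the deck:

```python
from urllib.parse import urlencode

# Public Wikidata SPARQL endpoint.
ENDPOINT = "https://query.wikidata.org/sparql"

# wdt:P31 = "instance of", wd:Q146 = "house cat"; the label service
# resolves English labels for the results.
query = """SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
} LIMIT 5"""

# Build the GET request URL; fetching it (e.g. with urllib.request)
# returns the results as JSON.
request_url = ENDPOINT + "?" + urlencode({"query": query, "format": "json"})
print(request_url)
```

The point of the slide stands either way: the barrier to entry is a single HTTP GET, which is what "SPARQL is affordable and usable" means in practice.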
22. The future starts today: data is infinite, the Web is here to stay, semantic technologies are mature.
23. Towards a Sustainable LOD Ecosystem
Work on having a board of committed people from different areas of expertise (tech, academia, industry, government, etc.)
Gather and promote LOD tools and applications based on past experience
Learn from the errors of the past
Create a real community of publishers and consumers beyond the W3C
Liaise with the W3C to create a community group?
24. LOD for Social Good: Killer Apps ?
Develop and use LOD for solving societal issues
Apps that achieve any of the 17 Sustainable Development Goals (SDGs) included in the 2030 Agenda for Sustainable Development
LOD to enhance advances on misinformation issues on the Web
25. Solutions for future LOD
Create a forum with different stakeholders to discuss LOD issues and maintenance
Create a new way to manage and maintain datasets in the LOD cloud (a W3C community? A mix of communities? A foundation à la Apache?)
New enforcement rules for the LOD management life-cycle
More use cases of datasets with probabilistic and temporal models
26. Graph of Linked “Insights” Datasets?
Statistical models also find “insights” over datasets
Data scientists spend hours understanding the underlying data to generate reports, dashboards or applications
How do we model that knowledge and publish it on the Web?
“Current insights after data analysis get stored in a spreadsheet and then get lost. We want to create a graph of insights, link them and generate new insights” - Lambert Hogenhout, UN #kgc2019
https://twitter.com/juansequeda/status/1126144558683885569
27. Takeaway message
The maturity of semantic technologies is fully demonstrated in many real-world applications
The Web is a precious means of exchanging information, both for humans and machines
Versioning and dataset updates are still challenging
For exchanging knowledge, LOD is probably the (only) solution - the only way to make “AI intelligent”
New applications will combine LOD with AI (autonomous agents, chatbots, etc.)
28. The more you publish datasets as LOD, the more you are preparing the next generation of “prescriptive” autonomous agents. Classical (predictive) AI alone (neural networks, machine learning) can’t make this happen.
29. “The future [of the Web] is still so much bigger than the past.” - Tim BL (2018)
So will be the future of Linked Open Data…
Just publish and share your assets on the Web
https://inrupt.com/blog/one-small-step-for-the-web
Speaker notes
Results for why RDF is not easier for middle 30% of developers?
Popularity of searching terms RDF vs LOD since 2014. LOD search is more popular than RDF. (beware of RDF = Rwanda Defence Force)
RDF search decreasing since 2014. Same for LOD.
March 2019: 1,239 datasets with 16,147 links
March 2008: 34 datasets
March 2014: 570 datasets and 2,909 linkage relationships between the datasets
March 2024 (prediction): 1,239 × 2.17 → 2,688+ datasets; 16,147 × 5.5 → 88,808 links
“Data, Scientific; Astell, Mathias (2017): Benefits of Open Research Data Infographic. figshare. Figure” https://doi.org/10.6084/m9.figshare.5179006.v3
Figshare from Open Science community could be a way to go…
How many of you have ontologists team?
More Data governance from publishers is needed
Many tools built during a project are not maintained anymore after completion
Who are the culprits?
By using two technologies, the RDF binary format (HDT) and Linked Data Fragments, you can even deploy and run all of LOD on a medium-size laptop.
May 2019: 14M datasets from 3k repositories. Crawls DCAT and schema.org metadata
Link guidelines Google dataset search: https://developers.google.com/search/docs/data-types/dataset#sitemap
You can use one single DCAT file for all the datasets in your domain
Build a community for LOD as Wikidata did for Wikibase. Active users / an active community
See Grafana
https://grafana.wikimedia.org/d/000000489/wikidata-query-service?refresh=1m&orgId=1&from=now-1y&to=now
We need to review the current challenges to make a more sustainable LOD ecosystem
I really like the idea of having workshops on this topic of SDGs, see a call at ISWC 2019 https://sw4sg2019.github.io/iswc2019/
Todo: best practices for data aggregation and scale for publishing
This is similar to the idea of Eurostat to create a KG of explained statistics.
A need to find new ways to maintain and update data in the LOD cloud.