Exploring Opportunities of Linked Open Innovation Data: Part 1
Presenters: Dolores Modic, Alan Johnson, Miha Vučkovič
With: Ana Hafner, Borut Lužar, Borut Rožac, Einar Rasmussen
NIPO workshop, February 2020
Linked open data
From linking documents to linking data & link as a
bearer of meaning
Four rules of linked data (Berners-Lee, 2006):
1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
4. Include links to other URIs so that they can discover more things.
Connecting & Aligning
Data in N-Triples: Subject - Predicate - Object
• Subject: specifies the entity under consideration, e.g., a publication (“Mapping the human brain”);
• Predicate: specifies a property type for the entity under consideration, e.g., “authored by”, “published in”, “has date”, “has impact factor 2018”;
• Object: specifies a value for the property type, e.g., “Dolores Modic”, “Science and Public Policy”, “03 Jun 2017”, “1.575”.
Simple example:
Subject → predicate → object
Mapping the Human Brain → published in → Science and Public Policy
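As a rough illustration of the subject, predicate, object pattern, the simple example could be written out as a single N-Triples line. This is a minimal sketch: the `http://example.org/` URIs and the helper names are hypothetical placeholders, not identifiers used by EP LOD or any other dataset mentioned in these slides.

```python
def escape_literal(value: str) -> str:
    """Escape the characters that N-Triples requires to be escaped in string literals."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_ntriple(subject_uri: str, predicate_uri: str, obj: str, obj_is_uri: bool = False) -> str:
    """Serialize one triple as an N-Triples line: <s> <p> <o-or-literal> ."""
    obj_term = f"<{obj}>" if obj_is_uri else f'"{escape_literal(obj)}"'
    return f"<{subject_uri}> <{predicate_uri}> {obj_term} ."


# Hypothetical namespace, used only for illustration.
BASE = "http://example.org/"

line = to_ntriple(
    BASE + "publication/mapping-the-human-brain",  # subject
    BASE + "publishedIn",                          # predicate
    "Science and Public Policy",                   # object (a plain literal)
)
print(line)
```

In a real dataset the object would more likely be another URI (set `obj_is_uri=True`), which is what makes the data linkable.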
The promise of Linked Open Data
The vision is that all data on the World Wide Web can be treated
and researched as one database, using a machine-readable
format to share and reuse existing data (Khusro, 2014).
Further, the availability of Linked Open Data from large, credible institutions has grown dramatically in recent years, e.g., the European Patent Office, the Korean Patent Office, and national governments.
… but the problems with Linked Open Data
• Linked open data is oriented towards machine-readability, hence human-readable browsability can be a problem.
• The availability and accessibility of many LOD sources is a problem; many are unstable and difficult to discover.
• Most LOD sources are not adequately interlinked.
• It is difficult to identify objects that refer to the same real-world entity across different LOD sources (or even within the same source).
“Users of linked data today are generally programmers and developers who are comfortable working directly with what is under the hood of this new technology. The rest of us are impatiently waiting for the user-friendly interface that will let us easily make use of linked data.” (Coyle, 2012)
Clouds and subclouds of the LOD universe
• LOD cloud: DBpedia as the hub
• A limited number of sub-clouds, i.e., maps (e.g., the Bio2RDF cloud), or aggregators (e.g., LOD-a-LOT)
• IP LOD Map: an innovation-oriented map with EP LOD as the hub
• Why? To increase the discoverability and reusability of EP LOD data by integrating them into a sub-cloud (map) that showcases the linkability of this data with other LOD datasets.
• How? By not relying purely on machine support, but using a diligent scientific approach.
03/12/2019 7
THE LINKED OPEN DATA CLOUD
• The cloud currently contains 1,239 datasets.
• The datasets are spread across several categories, e.g., Government.
• It is evolving rapidly in terms of newly included datasets; the first version, in 2007, had 12 datasets.
• But only rudimentary information on the datasets is available, and many have dead links.
Source: https://lod-cloud.net/ 2017
IP Centered cloud
...but it is not easy to get to this...
Descriptive Exploratory Plots
Anscombe's quartet
Data ≈ Experience
Wise researchers conduct descriptive
exploratory analyses of their data before
fitting statistical models.
- Judith D. Singer & John B. Willett
Experience without theory is blind, but
theory without experience is mere
intellectual play.
- Immanuel Kant
Each panel displays a scatter plot of 11
observations that have the same descriptive
statistics:
1. Mean: x = 9; y = 7.5
2. Variance: x = 11; y = 4.125
3. Correlation at .816
4. Regression coefficient: y = 3 + .5x
5. R-squared at .67
Johnson, Masyn, & McKelvie, (2020), Anscombe (1973), Singer & Willett (2003),
Figure (https://en.wikipedia.org/wiki/File:Anscombe%27s_quartet_3.svg)
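The shared descriptive statistics listed above can be checked directly. The sketch below uses the published Anscombe (1973) values and plain Python; the function and variable names are our own choices for illustration.

```python
from math import sqrt

# Anscombe's quartet (Anscombe, 1973). Panels 1 to 3 share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    1: (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    2: (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    3: (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    4: ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
        [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def describe(xs, ys):
    """Return means, sample variances, correlation, and the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {
        "mean_x": mean_x,
        "mean_y": mean_y,
        "var_x": sxx / (n - 1),
        "var_y": syy / (n - 1),
        "r": sxy / sqrt(sxx * syy),
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
        "r_squared": sxy ** 2 / (sxx * syy),
    }


summaries = {panel: describe(xs, ys) for panel, (xs, ys) in QUARTET.items()}
for panel, stats in summaries.items():
    print(panel, {name: round(value, 3) for name, value in stats.items()})
```

The four summaries agree to two or three decimal places even though the scatter plots look nothing alike, which is the argument for plotting before modelling.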
[Figure: Anscombe's quartet, four scatter plots of y against x, panels (1) to (4)]
Name Disambiguation Steps
1. Cleaning and parsing
2. Blocking
3. Choose auxiliary variables
4. Compute potential matches using similarity scores
5. Create unique entities
Diligent alignment and meaningful connections
between the LOD databases are key
[Figure panels, y against x: Overfit; Parsimony]
Johnson, Little, Masyn, McKelvie (2020), Hastie, Tibshirani, & Friedman (2009),
Figure(https://towardsdatascience.com/bias-variance-tradeoff-e8995c42b55b)
Training Data and Predictive Validity
Variance and Bias
[Figure: variance and bias axes, each running between low and high; panel, y against x: Underfit]
All models are wrong, but some are useful.
- George E. P. Box
Machine learning models
overfit to training data
produce excess variance when
applied to new data. Similarly,
underfit models produce
excess bias with new data.
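A small simulation can make the underfit, parsimony, overfit contrast concrete. This is a sketch under our own assumptions: the data are synthetic (a noisy straight line), a constant prediction stands in for the underfit model, a least-squares line for the parsimonious one, and a lookup of the nearest memorized training point for the overfit one. None of these choices come from the slides.

```python
import random


def fit_mean(xs, ys):
    """Underfit: ignore x and always predict the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Parsimonious: ordinary least-squares straight line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_memorizer(xs, ys):
    """Overfit: memorize every training point and return the nearest one."""
    points = list(zip(xs, ys))
    return lambda x: min(points, key=lambda p: abs(p[0] - x))[1]


def mse(model, xs, ys):
    """Mean squared prediction error of a fitted model."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Synthetic data (an assumption for illustration): y = 3 + 0.5x plus Gaussian noise.
rng = random.Random(0)
xs = list(range(40))
ys = [3 + 0.5 * x + rng.gauss(0, 2) for x in xs]
train_x, train_y = xs[0::2], ys[0::2]
test_x, test_y = xs[1::2], ys[1::2]

fitters = {"underfit": fit_mean, "parsimony": fit_line, "overfit": fit_memorizer}
errors = {}
for name, fitter in fitters.items():
    model = fitter(train_x, train_y)
    errors[name] = (mse(model, train_x, train_y), mse(model, test_x, test_y))
    print(f"{name:10s} train MSE {errors[name][0]:8.3f}  test MSE {errors[name][1]:8.3f}")
```

The memorizer reproduces the training data perfectly but carries the training noise into new data, while the constant model misses the trend in both samples.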
Machine Learning and Training Data
Data Split Ratio
More parsimonious models need less
data to validate and tune
Training data is a sample from the data used
to fit a parsimonious predictive model.
Validation data is another sample from the
data used to provide an unbiased evaluation
of the predictive model from the previous
step, followed by some tuning.
Test data is a third sample (without replacement) from the data, used to provide an unbiased evaluation of a ‘final’ model, refined in the previous steps, without further adjustment.
Figure (https://towardsdatascience.com/train-validation-and-test-sets-72cb40cba9e7): data split into Train, Validate, and Test
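The three-way split described above could be sketched as follows. The 60/20/20 ratio and the fixed seed are assumptions for illustration only; the slides do not prescribe a particular ratio.

```python
import random


def split_dataset(records, ratios=(0.6, 0.2, 0.2), seed=42):
    """Shuffle once, then cut into train / validate / test without replacement."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * ratios[0])
    n_validate = round(len(shuffled) * ratios[1])
    train = shuffled[:n_train]
    validate = shuffled[n_train:n_train + n_validate]
    test = shuffled[n_train + n_validate:]  # everything that is left
    return train, validate, test


train, validate, test = split_dataset(range(100))
print(len(train), len(validate), len(test))
```

Because the three slices come from one shuffled list, no record can appear in more than one sample.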
Name Disambiguation Steps | IP LodB naïve | IP LodB intermediate | Torvik & Smalheiser (2009) | Li, …, Torvik, et al. (2014) | Pezzoni, Lissoni, et al. (2014)
1. Cleaning and parsing
a. Find relevant fields in (LOD) source and extract ✔ ✔ ✔ ✔ ✔
b. Extract “family name” and “given name” from “author name” strings ✔ ✔ ✔ ✔ ✔
c. Remove punctuation, accents, and double spaces (normalization) ✔ ✔ ✔ ✔ ✔
d. Convert to same format, e.g., ASCII ✔ ✔ ✔ ✔ ✔
e. Remove redundant strings (and tokenize), e.g., c/o IBM ✔ ✔
Alignment procedure: Literature Review (1)
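Steps 1b to 1d (extract family and given name, normalize, convert to ASCII) might look roughly like the sketch below. The function names and the "Family, Given" versus "Given Family" parsing rule are our assumptions, not the procedure used by any of the cited approaches.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Strip accents, punctuation and repeated spaces; return lowercase ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    no_punctuation = re.sub(r"[^\w\s]", " ", ascii_only)
    return re.sub(r"\s+", " ", no_punctuation).strip().lower()


def parse_author_name(raw: str) -> tuple[str, str]:
    """Split an author string into (family name, given name), both normalized.

    Assumes either "Family, Given" or "Given Family" ordering.
    """
    if "," in raw:
        family, given = raw.split(",", 1)
    else:
        parts = raw.split()
        if not parts:
            return "", ""
        family, given = parts[-1], " ".join(parts[:-1])
    return normalize(family), normalize(given)


print(parse_author_name("Vučkovič, Miha"))
print(parse_author_name("Dolores  Modic."))
```

Real author strings are messier (particles, suffixes, initials only), so a production parser would need more rules than this.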
Name Disambiguation Steps | IP LodB naïve | IP LodB intermediate | Torvik & Smalheiser (2009) | Li, …, Torvik, et al. (2014) | Pezzoni, Lissoni, et al. (2014)
2. Blocking
a. Parse “author name” strings into a “family name” and “given name” and use alphabetic order ✔ ✔ ✔
b. Parse all “author name” strings into “tokens” and use lexical distance ✔ ✔
c. Block records using parsed “author name” strings, i.e., a prima facie list of author-document “match” candidates. Similar records are collected in one block, e.g., those with the same normalized family name. In subsequent steps, the algorithm only processes records within each block. ✔ ✔ ✔ ✔ ✔
Alignment procedure: Literature Review (2)
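Blocking on the normalized family name (step 2c) can be sketched as below. The record layout and the sample records are hypothetical; the point is only that candidate pairs are generated within a block rather than across the whole dataset.

```python
from collections import defaultdict
from itertools import combinations


def build_blocks(records):
    """Group record ids by normalized family name (the blocking key)."""
    blocks = defaultdict(list)
    for record in records:
        blocks[record["family"]].append(record["id"])
    return dict(blocks)


def candidate_pairs(blocks):
    """Only records inside the same block are compared in later steps."""
    pairs = []
    for ids in blocks.values():
        pairs.extend(combinations(sorted(ids), 2))
    return pairs


# Hypothetical, already-normalized records.
records = [
    {"id": "r1", "family": "modic", "given": "dolores"},
    {"id": "r2", "family": "modic", "given": "d"},
    {"id": "r3", "family": "johnson", "given": "alan"},
    {"id": "r4", "family": "johnson", "given": "a"},
    {"id": "r5", "family": "luzar", "given": "borut"},
]
blocks = build_blocks(records)
pairs = candidate_pairs(blocks)
print(pairs)
```

Here five records yield two candidate pairs instead of the ten that an all-against-all comparison would need.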
Name Disambiguation Steps | IP LodB naïve | IP LodB intermediate | Torvik & Smalheiser (2009) | Li, …, Torvik, et al. (2014) | Pezzoni, Lissoni, et al. (2014)
3. Choose auxiliary variables
a. Select independent “auxiliary variables”, e.g., co-authors, organization affiliation ✔ ✔ ✔ ✔ ✔
b. Extend “auxiliary variables” using parsing techniques described above, e.g., keywords, address ✔ ✔
c. Create entity cards containing records that are a priori author-document “match” candidates, using similarity score values based on “auxiliary variables” ✔ ✔
d. Refine entity card creation using a multi-dimensional vector based on “auxiliary variables” ✔
Alignment procedure: Literature Review (3)
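One simple way to turn auxiliary variables into a multi-dimensional vector is to compute a set-overlap score per variable. The sketch below uses Jaccard similarity, which is our choice for illustration; the slides do not state which measure the cited approaches use, and the field names and sample values are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Share of common elements; defined as 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity_vector(rec_a, rec_b, variables=("coauthors", "affiliations", "keywords")):
    """One similarity score per auxiliary variable, in the order given."""
    return tuple(
        jaccard(set(rec_a.get(name, ())), set(rec_b.get(name, ())))
        for name in variables
    )


# Hypothetical candidate records from the same block.
rec_a = {"coauthors": {"a", "b"}, "affiliations": {"org 1"}, "keywords": {"patents"}}
rec_b = {"coauthors": {"b", "c"}, "affiliations": {"org 1"}, "keywords": {"vaccines"}}
vector = similarity_vector(rec_a, rec_b)
print(vector)
```

A later step can then weight or threshold the components of this vector when deciding whether two records describe the same person.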
Name Disambiguation Steps | IP LodB naïve | IP LodB intermediate | Torvik & Smalheiser (2009) | Li, …, Torvik, et al. (2014) | Pezzoni, Lissoni, et al. (2014)
4. Compute potential matches using similarity scores
a. Refine “match” candidates using similarity scores based on parsed “family names” and “given names”, i.e., exceeding a user-defined threshold ✔ ✔ ✔ ✔ ✔
b. Compute similarity scores using trial-and-error values from “auxiliary variables” for each pair of “match” candidates ✔ ✔
c. Refine “match” candidates using estimated similarity scores from auxiliary variables looked up in a multi-dimensional vector ✔ ✔ ✔
d. Create a joint entity for pairs of author-document records that exceed a user-defined similarity score threshold ✔ ✔ ✔ ✔ ✔
e. Correct triplet violations using 3-degrees of separation ✔ ✔ ✔ ✔
f. Repeat the process over the block as long as some change is made within each step ✔ ✔ ✔ ✔
g. Adjust and apply the process on the joined datasets, i.e., EP LOD and SN SciGraph ✔
Alignment procedure: Literature Review (4)
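Steps 4a and 4d (score candidate pairs on their parsed names and keep those above a user-defined threshold) could be sketched with the standard-library `difflib.SequenceMatcher`. The similarity measure, the 0.85 threshold, and the sample block are assumptions for illustration; the cited approaches use their own scores.

```python
from difflib import SequenceMatcher
from itertools import combinations


def name_similarity(a: str, b: str) -> float:
    """Ratio in [0, 1]; 1.0 means the two strings are identical."""
    return SequenceMatcher(None, a, b).ratio()


def match_pairs(block, threshold=0.85):
    """Return id pairs within one block whose name similarity meets the threshold."""
    matches = []
    for (id_a, name_a), (id_b, name_b) in combinations(sorted(block.items()), 2):
        if name_similarity(name_a, name_b) >= threshold:
            matches.append((id_a, id_b))
    return matches


# Hypothetical block: record id -> normalized "family given" string.
block = {
    "r1": "modic dolores",
    "r2": "modic d",
    "r3": "modic dolores m",
}
matches = match_pairs(block)
print(matches)
```

With this threshold the abbreviated record "r2" is not matched on names alone, which is where the auxiliary variables from step 3 would come in.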
Name Disambiguation Steps | IP LodB naïve | IP LodB intermediate | Torvik & Smalheiser (2009) | Li, …, Torvik, et al. (2014) | Pezzoni, Lissoni, et al. (2014)
5. Create unique entities
a. Create a new entity from all remaining entities in a block (some joined during the process, some the same as before the process) with a union of the properties of all entities joined into it ✔ ✔ ✔ ✔
b. Create unique entities based on EP LOD and SN SciGraph alignment ✔
c. Show its details on the webpage (www.iplod.io) ✔
Alignment procedure: Literature Review (5)
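Step 5a (one entity per group of matched records, carrying the union of their properties) can be sketched with a small union-find structure. This is one possible implementation, not the one used by the project; the record ids, property names, and values are hypothetical.

```python
def merge_entities(records, matches):
    """Merge matched records into entities holding the union of their properties.

    records: dict mapping record id -> dict of property name -> set of values
    matches: iterable of (id_a, id_b) pairs judged to be the same entity
    """
    parent = {record_id: record_id for record_id in records}

    def find(record_id):
        # Follow parent links to the representative, compressing the path.
        while parent[record_id] != record_id:
            parent[record_id] = parent[parent[record_id]]
            record_id = parent[record_id]
        return record_id

    for id_a, id_b in matches:
        root_a, root_b = find(id_a), find(id_b)
        if root_a != root_b:
            # Keep the smaller id as representative so results are deterministic.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    entities = {}
    for record_id, properties in records.items():
        merged = entities.setdefault(find(record_id), {})
        for name, values in properties.items():
            merged.setdefault(name, set()).update(values)
    return entities


# Hypothetical records and matches.
records = {
    "r1": {"affiliation": {"org 1"}},
    "r2": {"affiliation": {"org 2"}, "coauthor": {"a"}},
    "r3": {"affiliation": {"org 3"}},
    "r4": {"affiliation": {"org 4"}},
}
entities = merge_entities(records, [("r1", "r2"), ("r2", "r3")])
print(entities)
```

Because r1 matches r2 and r2 matches r3, all three end up in one entity even though r1 and r3 were never compared directly; records without a match (r4) remain entities of their own.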
Some use cases...
CASE 2: additional information on technology - connecting EP LOD to UNIPROT
CASE 1: Additional information on inventor
CASE 1: On EP LOD you found a patent you are interested in: EP1097195, entitled “Screening of neisserial vaccine candidates and vaccines against pathogenic Neisseria”, where the applicant was the University of Nottingham. You see that the first-named inventor is Ala Aldeen Dlawer. You wish to know whether you could cooperate with him. Knowing that Twitter is a common science communication channel, you look up his Twitter account on Wikidata and, surprisingly, find out that he went into politics. You will need to target some other expert on this.
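The Wikidata lookup in Case 1 could be expressed as a SPARQL query. The sketch below only builds the query and the request URL and does not send it. It assumes the public Wikidata Query Service endpoint and the Wikidata property P2002 (Twitter username); the label match is an assumption too, since the inventor's label on Wikidata may be spelled differently from the name on the patent record.

```python
from urllib.parse import urlencode

WIKIDATA_SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def twitter_lookup_query(person_label: str, lang: str = "en") -> str:
    """Build a SPARQL query for the Twitter username of a person with this exact label."""
    safe_label = person_label.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT ?person ?twitter WHERE {\n"
        f'  ?person rdfs:label "{safe_label}"@{lang} ;\n'
        "          wdt:P2002 ?twitter .\n"
        "}"
    )


def request_url(query: str) -> str:
    """URL that would return the query results as JSON."""
    return WIKIDATA_SPARQL_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


# Name as written on the slide; the Wikidata label is an assumption.
query = twitter_lookup_query("Ala Aldeen Dlawer")
print(query)
print(request_url(query))
```

An exact label match is brittle, which is one reason the name disambiguation steps above matter when connecting EP LOD to other sources.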
CASE 3: Connecting to additional data on patents - the case for SEP patents
SEP Figures: Lorenz Brachtendorf, Fabian Gaessler, Dietmar Harhoff (2019) Approximating the Standard Essentiality of Patents – A Semantics-based Analysis
Thank you
for your
attention.
We gratefully acknowledge that this
work has been co-sponsored by the
Academic Research Programme of
the European Patent Office.
The research results and views
contained inside these materials or
during the workshop are those of
the researchers only. They do not
necessarily represent the views of
the EPO.
We also thank the NIPO and Nord
University for their support for
preparing and organizing this
workshop event.

VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 

Linked Open Data (LOD) part 1

  • 1. Exploring Opportunities of Linked Open Innovation Data: Part 1. Presenters: Dolores Modic, Alan Johnson, Miha Vučkovič. With: Ana Hafner, Borut Lužar, Borut Rožac, Einar Rasmussen. NIPO workshop, February 2020.
  • 2. Linked open data: from linking documents to linking data, with the link as a bearer of meaning. Four rules of linked data (Berners-Lee, 2006): 1. Use URIs as names for things. 2. Use HTTP URIs so that people can look up those names. 3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL). 4. Include links to other URIs so that they can discover more things.
  • 4. Data in N-Triples: Subject - Predicate - Object. Subject: specifies the entity under consideration, e.g., a publication (“Mapping the Human Brain”). Predicate: specifies property types for the entity under consideration, e.g., “authored by”, “published in”, “has date”, “has impact factor 2018”. Object: specifies a value for the property type, e.g., “Dolores Modic”, “Science and Public Policy”, “03 Jun 2017”, “1.575”. Simple example: Subject → predicate → object: Mapping the Human Brain → published in → Science and Public Policy.
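The subject, predicate, object structure from slide 4 can be sketched in Python. This is only an illustration: the `http://example.org/...` URIs and the `Triple` and `to_ntriples` names are hypothetical and do not come from EP LOD or any other dataset mentioned in the slides.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str                  # URI naming the entity
    predicate: str                # URI naming the property type
    obj: str                      # URI or literal value
    obj_is_literal: bool = False  # True when obj is a plain string value


def to_ntriples(t: Triple) -> str:
    """Serialize one triple as a single N-Triples line."""
    if t.obj_is_literal:
        # Escape backslashes and double quotes inside the literal.
        escaped = t.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    else:
        obj = f"<{t.obj}>"
    return f"<{t.subject}> <{t.predicate}> {obj} ."


# Hypothetical URIs, used for illustration only.
PUB = "http://example.org/publication/mapping-the-human-brain"
triples = [
    Triple(PUB, "http://example.org/publishedIn", "Science and Public Policy", True),
    Triple(PUB, "http://example.org/authoredBy", "http://example.org/person/dolores-modic"),
]

for t in triples:
    print(to_ntriples(t))
```

Each printed line is one statement, so a file of such lines can be read one triple at a time.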
  • 5. The promise of Linked Open Data. The vision is that all data on the World Wide Web can be treated and researched as one database, using a machine-readable format to share and reuse existing data (Khusro, 2014). Further, the availability of Linked Open Data from large, credible institutions has grown dramatically in recent years, e.g., European Patent Office, Korean Patent Office, national governments, etc.
  • 6. … but the problem with Linked Open Data: • Linked open data is oriented towards machine-readability, hence human-readable browsability can be a problem. • The availability and accessibility of many LOD sources is a problem, with many unstable and difficult to discover. • Most LOD sources are not adequately interlinked. • It is difficult to identify objects referring to the same real-world entity across different LOD sources (or even in the same source). “Users of linked data today are generally programmers and developers who are comfortable working directly with what is under the hood of this new technology. The rest of us are impatiently waiting for the user-friendly interface that will let us easily make use of linked data.” (Coyle, 2012)
  • 7. Clouds and subclouds of the LOD universe. • LOD cloud: DBpedia as the hub. • A limited number of sub-clouds (e.g., the Bio2RDF cloud), i.e., maps; or aggregators (e.g., LOD-a-LOT). • IP LOD Map: an innovation-oriented map with EP LOD as the hub. • Why? To increase the discoverability and reusability of EP LOD data by integrating them into a sub-cloud (map) that showcases the linkability of this data with other LOD datasets. • How? Not relying purely on machine support, but utilizing a diligent scientific approach. (03/12/2019)
  • 8. THE LINKED OPEN DATA CLOUD. • The LOD cloud currently contains 1,239 datasets. The datasets are widely distributed across several categories, e.g., Government. It is evolving rapidly in terms of newly included datasets; the first version, in 2007, had 12 datasets. However, only rudimentary information on the datasets is available, and many have dead links. Source: https://lod-cloud.net/ 2017
  • 9. IP-centered cloud ... but it is not easy to get to this ...
  • 10. Descriptive Exploratory Plots: Anscombe's quartet. Data ≈ Experience. “Wise researchers conduct descriptive exploratory analyses of their data before fitting statistical models.” - Judith D. Singer & John B. Willett. “Experience without theory is blind, but theory without experience is mere intellectual play.” - Immanuel Kant. Each panel displays a scatter plot of 11 observations that have the same descriptive statistics: 1. Mean: x = 9; y = 7.5. 2. Variance: x = 11; y = 4.125. 3. Correlation: .816. 4. Regression line: y = 3 + .5x. 5. R-squared: .67. Sources: Johnson, Masyn, & McKelvie (2020), Anscombe (1973), Singer & Willett (2003). Figure: four scatter plots, panels (1) to (4), each with axes x and y (https://en.wikipedia.org/wiki/File:Anscombe%27s_quartet_3.svg).
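The descriptive statistics listed in slide 10 can be checked with a few lines of Python. The sketch below uses the first of Anscombe's (1973) four datasets as it is commonly reproduced; the `describe` helper is a hypothetical name introduced here, and the results agree with the slide only up to rounding.

```python
from statistics import mean, variance

# First dataset of Anscombe's quartet (Anscombe, 1973).
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]


def describe(xs, ys):
    """Return the summary statistics listed on the slide for one panel."""
    mx, my = mean(xs), mean(ys)
    vx, vy = variance(xs), variance(ys)  # sample variances (n - 1)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / (len(xs) - 1)
    r = cov / (vx * vy) ** 0.5           # Pearson correlation
    slope = cov / vx                     # least-squares slope
    intercept = my - slope * mx          # least-squares intercept
    return {
        "mean_x": mx, "mean_y": my,
        "var_x": vx, "var_y": vy,
        "correlation": r,
        "slope": slope, "intercept": intercept,
        "r_squared": r ** 2,
    }


stats = describe(x, y)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

Running the same function on the other three panels gives nearly identical numbers, which is why plotting the data first matters.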
  • 11. Name Disambiguation Steps: 1. Cleaning and parsing. 2. Blocking. 3. Choose auxiliary variables. 4. Compute potential matches using similarity scores. 5. Create unique entities. Diligent alignment and meaningful connections between the LOD databases are key.
  • 12. Training Data and Predictive Validity: Variance and Bias. “All models are wrong but some are useful.” - George E. P. Box. Machine learning models that are overfit to training data produce excess variance when applied to new data. Similarly, underfit models produce excess bias with new data. Figure: three x-y panels labelled Underfit, Parsimony, and Overfit, arranged along a scale from high bias and low variance to low bias and high variance (https://towardsdatascience.com/bias-variance-tradeoff-e8995c42b55b). Sources: Johnson, Little, Masyn, McKelvie (2020), Hastie, Tibshirani, & Friedman (2009).
  • 13. Machine Learning and Training Data: Data Split Ratio. More parsimonious models need less data to validate and tune. Training data is a sample from the data, used to fit a parsimonious predictive model. Validation data is another sample from the data, used to provide an unbiased evaluation of the predictive model from the previous step, followed by some tuning. Test data is a third sample (without replacement) from the data, used to provide an unbiased evaluation of a ‘final’ model, refined in the previous steps, without further adjustment. Figure: the data split into Train, Validate, and Test portions (https://towardsdatascience.com/train-validation-and-test-sets-72cb40cba9e7).
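A three-way split like the one described in slide 13 might look as follows in plain Python. The 60/20/20 ratio, the fixed seed, and the `split_data` name are assumptions made for this sketch; the slides do not state which ratio was used.

```python
import random


def split_data(records, train=0.6, validate=0.2, seed=0):
    """Shuffle once, then cut into train, validation, and test samples.

    Sampling is without replacement: every record lands in exactly one part.
    The ratios are illustrative defaults, not values from the slides.
    """
    items = list(records)
    random.Random(seed).shuffle(items)  # seeded for reproducibility
    n_train = round(len(items) * train)
    n_validate = round(len(items) * validate)
    train_part = items[:n_train]
    validate_part = items[n_train:n_train + n_validate]
    test_part = items[n_train + n_validate:]  # whatever remains
    return train_part, validate_part, test_part


train_set, validate_set, test_set = split_data(range(100))
print(len(train_set), len(validate_set), len(test_set))
```

The test portion is set aside until the end, so its evaluation is not influenced by any tuning done on the validation portion.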
  • 14. Alignment procedure: Literature Review (1). Name Disambiguation Steps, with columns: IP LodB naïve; IP LodB intermediate; Torvik & Smalheiser (2009); Li, …, Torvik, et al. (2014); Pezzoni, Lissoni, et al. (2014). 1. Cleaning and parsing. a. Find relevant fields in (LOD) source and extract ✔ ✔ ✔ ✔ ✔ b. Extract “family name” and “given name” from “author name” strings ✔ ✔ ✔ ✔ ✔ c. Remove punctuation, accents, and double spaces (normalization) ✔ ✔ ✔ ✔ ✔ d. Convert to same format, e.g., ASCII ✔ ✔ ✔ ✔ ✔ e. Remove redundant strings (and tokenize), e.g., c/o IBM ✔ ✔
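Steps 1b to 1d in slide 14 (extracting family and given names, normalizing, converting to ASCII) can be sketched with the Python standard library. This is not the authors' implementation: the `normalize` and `parse_name` names are hypothetical, and the assumption that names arrive either as "Family, Given" or as "Given Family" is made only for illustration.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Strip accents and punctuation, collapse spaces, and upper-case."""
    # Decompose accented letters, then drop the non-ASCII combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    no_punct = re.sub(r"[^\w\s]", " ", ascii_only)  # punctuation -> space
    return re.sub(r"\s+", " ", no_punct).strip().upper()


def parse_name(author_name: str):
    """Split an author name string into (family name, given name).

    Assumes "Family, Given" when a comma is present, otherwise treats the
    last token as the family name. Real sources need more careful rules.
    """
    if "," in author_name:
        family, given = author_name.split(",", 1)
    else:
        parts = author_name.split()
        if not parts:
            return "", ""
        family, given = parts[-1], " ".join(parts[:-1])
    return normalize(family), normalize(given)


print(parse_name("Vučkovič, Miha"))
print(parse_name("Borut  Lužar"))
```

After this step, spelling variants that differ only in accents, case, punctuation, or spacing map to the same string.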
  • 15. Alignment procedure: Literature Review (2). Name Disambiguation Steps, with columns: IP LodB naïve; IP LodB intermediate; Torvik & Smalheiser (2009); Li, …, Torvik, et al. (2014); Pezzoni, Lissoni, et al. (2014). 2. Blocking. a. Parse “author name” strings into a “family name” and “given name” and use alphabetic order ✔ ✔ ✔ b. Parse all “author name” strings into “tokens” and use lexical distance ✔ ✔ c. Block records using parsed “author name” strings, i.e., a prima facie list of author-document “match” candidates. In one block, similar records are collected, e.g., those having the same normalized family name. In subsequent steps, the algorithm only processes records within each block. ✔ ✔ ✔ ✔ ✔
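Blocking on the normalized family name, as described in step 2c of slide 15, can be sketched as below. The record layout (dictionaries with `family`, `given`, and `doc` keys) and the function names are assumptions for this example only.

```python
from collections import defaultdict
from itertools import combinations


def block_by_family_name(records):
    """Group records that share the same normalized family name."""
    blocks = defaultdict(list)
    for record in records:
        blocks[record["family"]].append(record)
    return dict(blocks)


def candidate_pairs(blocks):
    """Yield record pairs to compare, only ever within a single block."""
    for members in blocks.values():
        yield from combinations(members, 2)


# Hypothetical author-document records, already cleaned and parsed.
records = [
    {"family": "MODIC", "given": "DOLORES", "doc": "doc-1"},
    {"family": "MODIC", "given": "D", "doc": "doc-2"},
    {"family": "JOHNSON", "given": "ALAN", "doc": "doc-3"},
    {"family": "MODIC", "given": "DOLORES", "doc": "doc-4"},
]

blocks = block_by_family_name(records)
pairs = list(candidate_pairs(blocks))
print(len(pairs), "candidate pairs instead of", len(records) * (len(records) - 1) // 2)
```

Restricting comparisons to pairs inside a block is what keeps the later similarity step tractable on large sources.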
  • 16. Alignment procedure: Literature Review (3). Name Disambiguation Steps, with columns: IP LodB naïve; IP LodB intermediate; Torvik & Smalheiser (2009); Li, …, Torvik, et al. (2014); Pezzoni, Lissoni, et al. (2014). 3. Choose auxiliary variables. a. Select independent “auxiliary variables”, e.g., co-authors, organization affiliation ✔ ✔ ✔ ✔ ✔ b. Extend “auxiliary variables” using parsing techniques described above, e.g., keywords, address ✔ ✔ c. Create entity cards containing records that are a priori author-document “match” candidates, using similarity score values based on “auxiliary variables” ✔ ✔ d. Refine entity card creation using a multi-dimensional vector based on “auxiliary variables” ✔
  • 17. Alignment procedure: Literature Review (4). Name Disambiguation Steps, with columns: IP LodB naïve; IP LodB intermediate; Torvik & Smalheiser (2009); Li, …, Torvik, et al. (2014); Pezzoni, Lissoni, et al. (2014). 4. Compute potential matches using similarity scores. a. Refine “match” candidates using similarity scores based on parsed “family names” and “given names”, i.e., exceeding a user-defined threshold ✔ ✔ ✔ ✔ ✔ b. Compute similarity scores using trial-and-error values from “auxiliary variables” for each pair of “match” candidates ✔ ✔ c. Refine “match” candidates using estimated similarity scores from auxiliary variables looked up in a multi-dimensional vector ✔ ✔ ✔ d. Create a joint entity for pairs of author-document records that exceed a user-defined similarity score threshold ✔ ✔ ✔ ✔ ✔ e. Correct triplet violations using 3 degrees of separation ✔ ✔ ✔ ✔ f. Repeat the process over the block as long as some change is made within each step ✔ ✔ ✔ ✔ g. Adjust and apply the process on the joined datasets, i.e., EP LOD and SN SciGraph ✔
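Steps 4a and 4d in slide 17 (score candidate pairs, then join those above a user-defined threshold) can be illustrated with a short sketch. The similarity measure here is `difflib.SequenceMatcher`, chosen only because it ships with Python; the slides do not say which measure the project uses. The union-find grouping is likewise a stand-in: it keeps matches transitive (if A matches B and B matches C, all three end up together), which is one simple way to avoid the triplet violations mentioned in step 4e, not necessarily the authors' way. The 0.85 threshold is arbitrary.

```python
from difflib import SequenceMatcher
from itertools import combinations


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; a stand-in for a real name-matching score."""
    return SequenceMatcher(None, a, b).ratio()


def join_entities(names, threshold=0.85):
    """Group names in one block whose pairwise similarity meets the threshold."""
    parent = list(range(len(names)))  # union-find forest over record indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(names)), 2):
        if name_similarity(names[i], names[j]) >= threshold:
            parent[find(i)] = find(j)      # merge the two groups

    groups = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return sorted(groups.values())


# Hypothetical normalized names from a single block.
names = ["MODIC DOLORES", "MODIC DOLORES", "MODIC D", "JOHNSON ALAN"]
for group in join_entities(names):
    print(group)
```

Lowering the threshold merges more records at the cost of more false matches, which is why the slides describe it as user-defined.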
  • 18. Alignment procedure: Literature Review (5). Name Disambiguation Steps, with columns: IP LodB naïve; IP LodB intermediate; Torvik & Smalheiser (2009); Li, …, Torvik, et al. (2014); Pezzoni, Lissoni, et al. (2014). 5. Create unique entities. a. Create a new entity from all remaining entities in a block (some joined through the process, some the same as before the process) with a union of the properties of all entities being joined into it ✔ ✔ ✔ ✔ b. Create unique entities based on EP LOD and SN SciGraph alignment ✔ c. Show its details in the webpage (www.iplod.io) ✔
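The "union of the properties" in step 5a of slide 18 can be pictured as a set union per property. In this sketch each record is a dictionary mapping a property name to a set of values; that layout, the `merge_entities` name, and the sample values are all hypothetical.

```python
def merge_entities(records):
    """Build one entity whose properties are the union of all joined records."""
    merged = {}
    for record in records:
        for prop, values in record.items():
            merged.setdefault(prop, set()).update(values)
    return merged


# Two hypothetical records judged to refer to the same person.
joined = [
    {"name": {"MODIC DOLORES"}, "affiliation": {"EXAMPLE ORG A"}},
    {"name": {"MODIC D"}, "coauthor": {"JOHNSON ALAN"}},
]

entity = merge_entities(joined)
for prop in sorted(entity):
    print(prop, sorted(entity[prop]))
```

Keeping every observed value, rather than picking one, preserves the name variants and auxiliary variables for later alignment rounds.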
  • 19. Some use cases... CASE 1: Additional information on inventor. You found on EP LOD a patent you are interested in: EP1097195, entitled “Screening of neisserial vaccine candidates and vaccines against pathogenic Neisseria”, where the applicant was the University of Nottingham. You see that the first-named inventor is Ala Aldeen Dlawer. You wish to know if you could cooperate with him. Knowing that Twitter is a common science communication channel, you simply look up his Twitter account on Wikidata and, surprisingly, find out he went into politics. You will need to target some other expert on this. CASE 2: Additional information on technology - connecting EP LOD to UNIPROT. CASE 3: Connecting to additional data on patents - the case for SEP patents. SEP figures: Lorenz Brachtendorf, Fabian Gaessler, Dietmar Harhoff (2019), Approximating the Standard Essentiality of Patents - A Semantics-based Analysis.
  • 20. Thank you for your attention. We gratefully acknowledge that this work has been co-sponsored by the Academic Research Programme of the European Patent Office. The research results and views contained in these materials or expressed during the workshop are those of the researchers only. They do not necessarily represent the views of the EPO. We also thank the NIPO and Nord University for their support for preparing and organizing this workshop event.