Knowledge Graph Engineering
Keynote at Summer School on AI for Industry 4.0
Armin Haller
Associate Professor, ANU
Knowledge Graphs (KGs)
“A Knowledge Graph is a graph of data intended to accumulate and
convey knowledge of the real world, whose nodes represent entities of
interest and whose edges represent relations between these entities.”
[Hogan et al., 2020]
• Knowledge graphs are created collaboratively by many users
• Information can be added in a relatively arbitrary manner as
structural constraints are few
Closed KGs (~2019) [Noy et al., 2019]
Microsoft ~2bn entities, ~55bn facts
Google ~1bn entities, ~70bn assertions
Facebook ~50m entities, ~500m assertions
eBay ~1bn triples
IBM ~100m entities, 5bn relationships
Open KGs (April 2021)
DBpedia ~4.58m entities, ~9.25GB
Yago4 ~50m entities, ~18.4GB
Wikidata ~93m entities, ~99GB
Knowledge Graphs (KGs) (cont’d)
• Graphs: a natural way of structuring and presenting knowledge
• Heterogeneous: knowledge from different sources can be integrated and/or interlinked
• Schema-later: the schema is often not decided until later and does not impose integrity constraints
Schema in KGs
Ontologies as schemas in KGs
An ontology is an “explicit specification of a conceptualization consisting of a set of
objects, and the describable relationships among them”
[Gruber, 1993]
Components of an Ontology
• Classes: abstract groups (sets) of objects that are defined by properties that all their members share (e.g., Person, Organisation, Event)
• Attributes: characteristics or parameters that objects (and classes) can have (e.g., date of birth, longitude, latitude, timestamp)
• Relationships: ways in which classes and individuals can be related to one another
(e.g., role, attributed to, observed by)
• Individuals: Concrete objects that are inherent to the domain of discourse, such as
specific people, organisations or abstract individuals such as numbers (e.g., g, π)
RDF Knowledge Graphs
[Figure: TBox (schema) vs. ABox (data) in an RDF KG, illustrated with Wikidata. Labels: Generic (applies to many) vs. Specific (applies to few); Limited (many entities) vs. Comprehensive (fewer entities). Examples: Q58043963; Q76 Barack Obama (3,947 axioms); Armin Haller (189 axioms); P361 partOf; Q35120 Entity; P1872 minimum no. of players; Chess; Person; Q73145133.]
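The TBox/ABox distinction can be illustrated with a minimal Python sketch. It assumes triples are plain (subject, predicate, object) tuples of prefixed names and that schema triples can be recognised by a hypothetical, non-exhaustive list of RDFS/OWL predicates; the ex: names are made up.

```python
# Sketch: separate schema (TBox) from data (ABox) triples by predicate.
# Assumption: the predicate list is illustrative, not exhaustive
# (Wikidata, for instance, expresses "subclass of" as P279).
SCHEMA_PREDICATES = {
    "rdfs:subClassOf",
    "rdfs:subPropertyOf",
    "rdfs:domain",
    "rdfs:range",
    "owl:equivalentClass",
    "owl:disjointWith",
}


def split_tbox_abox(triples):
    """Return (tbox, abox); class membership (rdf:type) is treated as ABox."""
    tbox, abox = [], []
    for s, p, o in triples:
        (tbox if p in SCHEMA_PREDICATES else abox).append((s, p, o))
    return tbox, abox


triples = [
    ("ex:Person", "rdfs:subClassOf", "ex:Agent"),  # schema axiom
    ("ex:knows", "rdfs:domain", "ex:Person"),       # schema axiom
    ("ex:alice", "rdf:type", "ex:Person"),          # instance assertion
    ("ex:alice", "ex:knows", "ex:bob"),             # instance assertion
]
tbox, abox = split_tbox_abox(triples)
print(len(tbox), len(abox))  # 2 2
```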
Meta-modelling issues in KGs
Without enforced (upfront-designed) schemas, KGs suffer from issues such as:
• Inconsistent modelling of classes/instances
<Q1412680> <P279> <Q28100368> | <Beef Wellington> <subclass of> <Beef Dish>
<Q6497852> <P31> <Q28100665> | <Wiener Schnitzel> <instance of> <Veal Dish>
• Subclassing of disjoint super-classes
<Q190928> <P279> <Q124282> | <shipyard> <subclass of> <dock>
<Q190928> <P279> <Q4830453> | <shipyard> <subclass of> <business>
<Q124282> <P279> <Q7184903> | <shipyard> <subclass of> <abstract object>
<Q190928> <P279> <Q223557> | <shipyard> <subclass of> <physical object>
• Instance of relations between first-order classes
<Q12156> <P31> <Q12136> | <Malaria> <instance of> <Disease>
<Q12156> <P279> <Q12136> | <Malaria> <subclass of> <Disease>
• Redundant/circular inheritances between first-order classes
<Q18557307> <P279> <Q692536> | <muscle tissue disease> <subclass of> <muscular disease>
<Q692536> <P279> <Q18557307> | <muscular disease> <subclass of> <muscle tissue disease>
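Two of these issues, an entity asserted as both instance of (P31) and subclass of (P279) the same class, and circular subclass chains, can be detected mechanically. A minimal Python sketch, assuming the triples are held in memory as (subject, property, object) tuples of Wikidata IDs; the function names are hypothetical, and a real check would run against a dump or SPARQL endpoint.

```python
from collections import defaultdict

INSTANCE_OF, SUBCLASS_OF = "P31", "P279"  # Wikidata property IDs used above


def instance_and_subclass(triples):
    """Return (s, o) pairs asserted both as s P31 o and s P279 o."""
    inst = {(s, o) for s, p, o in triples if p == INSTANCE_OF}
    sub = {(s, o) for s, p, o in triples if p == SUBCLASS_OF}
    return sorted(inst & sub)


def subclass_cycles(triples):
    """Return the classes that can reach themselves via P279 edges."""
    graph = defaultdict(set)
    for s, p, o in triples:
        if p == SUBCLASS_OF:
            graph[s].add(o)
    on_cycle = set()
    for start in list(graph):
        stack, seen = list(graph[start]), set()
        while stack:  # iterative depth-first search from start
            node = stack.pop()
            if node == start:
                on_cycle.add(start)
                break
            if node not in seen:
                seen.add(node)
                stack.extend(graph.get(node, ()))
    return on_cycle


sample = [
    ("Q12156", "P31", "Q12136"), ("Q12156", "P279", "Q12136"),          # Malaria / Disease
    ("Q18557307", "P279", "Q692536"), ("Q692536", "P279", "Q18557307"),  # circular
]
print(instance_and_subclass(sample))    # [('Q12156', 'Q12136')]
print(sorted(subclass_cycles(sample)))  # ['Q18557307', 'Q692536']
```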
Types of Schemas (Ontologies)
[Figure: ontology types ordered by level of abstraction, from most general (highest reusability) to most specific (lowest reusability)]
• Upper Ontologies, e.g., Cyc, SUMO, DOLCE, BFO
• Mid-Level Ontologies, e.g., PROV-O, FOAF, ORG, SOSA/SSN, AGRIF
• Domain Ontologies, e.g., GO, ChEBI, DO, BTO
• Use-Case Ontologies
[Haller & Polleres, 2020a]
KG Engineering
[Figure: KG engineering cycle]
• KG Creation: extract data from existing resources
• KG Linking: add instance assertions
• KG Curation: add schema assertions
• KG Usage
KG Creation – Develop Schema
• Top-Down: schema (TBox) first, data (ABox) later
• Bottom-Up: data (ABox) first, schema (TBox) later
• Middle-Out
KG Creation
Bottom-Up KG Creation
• Schema is not defined upfront; data is added organically and manually using tools such as:
– OntoWiki [Frischmuth et al., 2015]
– Semantic MediaWiki [Krötzsch et al., 2006]
– Wikibase
– Schímatos [Wright et al., 2020]
Top-Down KG Creation
• Schema is created upfront; existing data is mapped to the schema using languages/tools such as:
– R2RML
– SPARQL Generate [Lefrançois et al., 2017]
– SHACL Rules
– TARQL
– NLP/NER from unstructured text
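As a rough illustration of the top-down approach, the sketch below maps rows of a CSV file onto a fixed schema and emits N-Triples, in the spirit of TARQL or R2RML but written in plain Python rather than in either mapping language. The example.org namespace, the column names and the choice of FOAF properties are assumptions for illustration.

```python
import csv
import io

# Hypothetical target schema: every row becomes a foaf:Person.
EX = "http://example.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"
COLUMN_TO_PROPERTY = {"name": FOAF + "name", "nick": FOAF + "nick"}


def literal(value):
    """Encode a string as an N-Triples literal, escaping backslash, quote and newline."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def csv_to_ntriples(text):
    """Map each CSV row (with an 'id' column) to N-Triples lines."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{EX}person/{row['id']}>"
        lines.append(f"{subject} <{RDF_TYPE}> <{FOAF}Person> .")
        for column, prop in COLUMN_TO_PROPERTY.items():
            if row.get(column):  # skip missing or empty cells
                lines.append(f"{subject} <{prop}> {literal(row[column])} .")
    return lines


for line in csv_to_ntriples("id,name,nick\n1,Ada Lovelace,ada\n"):
    print(line)
```

Keeping the mapping in a separate table (here `COLUMN_TO_PROPERTY`) mirrors the top-down idea: the schema is fixed first and the data is fitted to it.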
Middle-Out KG Creation [Sure et al., 2004]
• Schema is partly defined upfront, with mappings added later once the data defines the semantics
• Use-case data is provided upfront
KG Curation
Correctness
– Evaluation
Accessibility, Accuracy, Consistency, Conciseness, Trustability,
Dynamicity, Representationality [Zaveri et al., 2016]
– Correction
Evaluating data quality (SHACL, ShEx)
• Syntactic errors
• Semantic errors
Completeness
– KG Completion [Paulheim, 2017]
Using structural information observed in triples
• Classification
• Probabilistic and Statistical Methods
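Shape-based validation with SHACL or ShEx checks, among other things, how often a property occurs on each focus node. The sketch below mimics only the sh:minCount/sh:maxCount part of that idea in plain Python; it is not a SHACL engine, and the shape dictionary format and ex: names are hypothetical.

```python
from collections import Counter

# Hypothetical shape: every ex:Person needs exactly one ex:name; ex:email is optional.
SHAPE = {
    "target_class": "ex:Person",
    "properties": {"ex:name": (1, 1), "ex:email": (0, None)},  # (min, max); None = unbounded
}


def validate(triples, shape):
    """Return (focus node, property, message) tuples for cardinality violations."""
    focus_nodes = {s for s, p, o in triples
                   if p == "rdf:type" and o == shape["target_class"]}
    counts = Counter((s, p) for s, p, o in triples if s in focus_nodes)
    violations = []
    for node in sorted(focus_nodes):
        for prop, (min_count, max_count) in shape["properties"].items():
            n = counts[(node, prop)]
            if n < min_count:
                violations.append((node, prop, f"expected at least {min_count}, found {n}"))
            if max_count is not None and n > max_count:
                violations.append((node, prop, f"expected at most {max_count}, found {n}"))
    return violations


data = [
    ("ex:a", "rdf:type", "ex:Person"), ("ex:a", "ex:name", '"A"'),
    ("ex:b", "rdf:type", "ex:Person"), ("ex:b", "ex:name", '"B1"'), ("ex:b", "ex:name", '"B2"'),
    ("ex:c", "rdf:type", "ex:Person"),
]
for violation in validate(data, SHAPE):
    print(violation)
```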
KG Linking
Linked Data Principles [Berners-Lee, 2006]
• LDP1: Use URIs as identifiers for things;
• LDP2: Use HTTP URIs so those identifiers can be
dereferenced;
• LDP3: return useful information upon dereferencing
of those URIs using a standard format (typically,
RDF);
• LDP4: include links using externally
dereferenceable URIs
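LDP2 and LDP3 can be illustrated with Python's standard library: the first function checks that an identifier is an HTTP(S) URI, and the second builds, without sending, a request that asks the server for RDF via HTTP content negotiation. The helper names and the preferred media types are assumptions; whether a given server honours the Accept header varies.

```python
from urllib.parse import urlparse
from urllib.request import Request


def is_http_uri(uri):
    """LDP2: an http(s) URI with a host part (it may still fail to dereference)."""
    parsed = urlparse(uri)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def rdf_request(uri):
    """LDP3: a GET request that prefers Turtle, then RDF/XML (not sent here)."""
    return Request(uri, headers={"Accept": "text/turtle, application/rdf+xml;q=0.9"})


req = rdf_request("http://dbpedia.org/resource/Wolfgang_Amadeus_Mozart")
print(is_http_uri(req.full_url), req.get_header("Accept"))
# urllib.request.urlopen(req) would actually perform the dereferencing.
```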
KG Linking – Linking Issues [Haller et al., 2020b]
• References to many inaccessible URIs (i.e., broken links) may
render a KG largely useless
• Changes in linked external KGs are out of control of the KG
publisher
• Previously, there was no definition of what constitutes a “link”, specifically “internal links”,
i.e., links between parts of one coherent KG, and “external links”, i.e., links between different KGs
– A triple is a link if it contains a URI in a namespace other than the
authoritative namespace URI of the dataset/KG where the triple is
defined. [Haller et al., 2020b]
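The namespace-based definition lends itself to a direct check. In this minimal Python sketch, triples are tuples of full IRIs, literals start with a double quote and blank nodes with `_:`; exempting the RDF, RDFS, OWL and XSD vocabularies (so that rdf:type alone does not make a triple a link) is an assumption of the sketch, not necessarily part of the definition above.

```python
# Assumption: terms from these W3C vocabularies are not counted as links.
BUILTIN_NS = (
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "http://www.w3.org/2000/01/rdf-schema#",
    "http://www.w3.org/2002/07/owl#",
    "http://www.w3.org/2001/XMLSchema#",
)


def is_iri(term):
    """Literals are quoted and blank nodes start with '_:' in this sketch."""
    return not term.startswith(('"', "_:"))


def is_link(triple, authoritative_ns, exempt_ns=BUILTIN_NS):
    """True if the triple contains an IRI outside the KG's authoritative namespace."""
    return any(
        is_iri(term)
        and not term.startswith(authoritative_ns)
        and not term.startswith(exempt_ns)
        for term in triple
    )


MOZART = "http://dbpedia.org/resource/Wolfgang_Amadeus_Mozart"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
print(is_link((MOZART, OWL_SAME_AS, "http://www.wikidata.org/entity/Q254"), "http://dbpedia.org/"))  # True
print(is_link((MOZART, "http://dbpedia.org/ontology/birthYear", '"1756"'), "http://dbpedia.org/"))   # False
```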
KG Linking – Link Types
• Ontology links [Haller et al., 2020b]
– class link
t:[dbo:Person, rdfs:subClassOf, foaf:Person]
– instance typing link
t:[dbr:Wolfgang_Amadeus_Mozart, rdf:type, foaf:Person]
– property link
t:[dbr:Wolfgang_Amadeus_Mozart, foaf:name, "Wolfgang Amadeus Mozart"@en]
– instance role link
t:[dbr:Wolfgang_Amadeus_Mozart, foaf:knows, wd:Q51088] (Antonio Salieri)
• Instance link
t:[dbr:Wolfgang_Amadeus_Mozart, owl:sameAs, wd:Q254]
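A simplified, heuristic classifier for these link types, keyed on the predicate and on which terms are external. It is a sketch under stated assumptions (triples as prefixed-name strings, a hypothetical set of local prefixes, built-in vocabularies exempt), not the exact characterization in [Haller et al., 2020b]. Unlike the list above, it can assign more than one type: the foaf:knows triple comes out as both a property link and an instance role link.

```python
BUILTIN = ("rdf:", "rdfs:", "owl:", "xsd:")  # assumption: built-in vocabularies are not links


def link_types(triple, local_prefixes):
    """Return the set of link types for a triple written with prefixed names."""
    _, p, o = triple

    def external(term):
        # Literals (quoted) and local or built-in terms are not external.
        return not term.startswith('"') and not term.startswith(local_prefixes + BUILTIN)

    if p == "owl:sameAs" and external(o):
        return {"instance link"}
    if p == "rdfs:subClassOf" and external(o):
        return {"class link"}
    if p == "rdf:type" and external(o):
        return {"instance typing link"}
    types = set()
    if external(p):
        types.add("property link")
    if external(o):
        types.add("instance role link")
    return types


LOCAL = ("dbr:", "dbo:")  # hypothetical: prefixes owned by the publishing KG
for t in [
    ("dbo:Person", "rdfs:subClassOf", "foaf:Person"),
    ("dbr:Wolfgang_Amadeus_Mozart", "rdf:type", "foaf:Person"),
    ("dbr:Wolfgang_Amadeus_Mozart", "foaf:name", '"Wolfgang Amadeus Mozart"@en'),
    ("dbr:Wolfgang_Amadeus_Mozart", "foaf:knows", "wd:Q51088"),
    ("dbr:Wolfgang_Amadeus_Mozart", "owl:sameAs", "wd:Q254"),
]:
    print(t[1], sorted(link_types(t, LOCAL)))
```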
KG Linking in the wild
• Crawl of the LODcloud [Abele et al., 2017] plus historical LODcloud datasets cached in the LOD Laundromat
• 430 linked datasets in the resulting corpus, each encoded in HDT, for a total size of 51 GB (3.3bn triples)
Datasets | # | % of total | Available | Available as % of total
Total # of datasets | 1,359 | 100% | |
SPARQL endpoint | 459 | 33.5% | 125 | 9.1%
Available as download | 890 | 65.4% | 226 | 16.6%
Characteristic | Median | Mean
Number of triples | 4,478 | 17,860,436
Number of unique subjects | 613 | 1,774,578
Number of unique predicates | 31 | 65.4%
Number of unique objects | 2,245 | 5,296,390
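The gap between the median (4,478) and the mean (17,860,436) number of triples points to a heavily skewed size distribution: a few very large KGs pull the mean up. A toy Python illustration with made-up counts (not the corpus data):

```python
from statistics import mean, median

# Hypothetical triple counts for five datasets (illustrative only):
# one very large KG dominates the mean, while the median stays at a typical size.
triple_counts = [100, 1_000, 4_478, 10_000, 99_984_422]
print(median(triple_counts))  # 4478
print(mean(triple_counts))    # 20000000
```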
KG Linking in the wild (cont’d)
Class Links
http://vivo.iu.edu 119,538
http://vivo.scripps.edu 63,128
http://www.imagesnippets.com 12,874
http://core.kmi.open.ac.uk 9,143
http://commons.wikimedia.org 8,258
http://vivo.psm.edu 8,036
http://datos.bne.es 2,778
http://dbpedia.org 1,614
http://www.productontology.org 1,000
http://vivoweb.org 84
http://commons.wikimedia.org 4,995
http://datos.bne.es 1,255
http://vivo.iu.edu 510
http://vivo.psm.edu 481
http://vivoweb.org 386
http://vivo.scripps.edu 187
http://semanticscience.org 168
http://www.iupac.org 102
http://dbpedia.org 101
http://tkm.kiom.re.kr 60
Property Links
Median 0
Mean 1,299
% above 0 44%
Median 0
Mean 47
% above 0 18%
Instance Typing Links
KG Linking in the wild (cont’d)
Instance Links
http://webisa.webdatacommons.org 101,491,507
http://commons.wikimedia.org 100,022,186
http://lod.b3kat.de 40,674,519
http://lod.hebis.de 39,160,423
http://d-nb.info 20,096,228
http://datos.bne.es 7,419,630
http://data.ordnancesurvey.co.uk 5,653,997
http://data.europeana.eu 4,987,332
http://id.loc.gov 1,570,877
http://data.bibsys.no 1,440,011
http://ld.zdb-services.de 398,381,851
http://commons.wikimedia.org 319,988,690
http://d-nb.info 14,160,649
http://data.ordnancesurvey.co.uk 13,277,718
https://data.gov.cz 3,081,559
http://core.kmi.open.ac.uk 1,696,618
http://lod.hebis.de 1,624,579
http://id.loc.gov 1,143,545
http://data.europeana.eu 687,735
http://spraakbanken.gu.se 451,081
http://www.imagesnippets.com 214,362
http://data.coi.cz 34,277
Median 206
Mean 1,967,570
% above 0 97%
Median 206
Mean 4,240,890
% above 0 72%
KG Linking in the wild (cont’d)
• Selected predicates used in links
owl:sameAs owl:differentFrom rdfs:seeAlso owl:AllDifferent
Median 0 0 0 0
Mean 503,859 581 2,735 0
% above 0 53% <1% 14% 0
P90% 1,460 0 1 0
1st
1st #
http://commons.wikimedia.org N/A
40,636,493 103,439 324,659
2nd
2nd #
http://ld.zdb-services.de
18,049,155
N/A http://stitch.cs.vu.nl N/A
3rd
3rd #
http://d-nb.info
17,410,586
N/A http://data.nobelprize.org N/A
KG Linking in the wild (cont’d)
Total Links
http://ld.zdb-services.de 421,206,061
http://commons.wikimedia.org 420,024,129
http://webisa.webdatacommons.org 101,491,507
http://lod.hebis.de 40,785,002
http://lod.b3kat.de 40,677,795
http://d-nb.info 34,256,877
http://data.ordnancesurvey.co.uk 18,931,817
http://datos.bne.es 7,428,111
http://data.europeana.eu 5,675,067
https://data.gov.cz 3,958,043
Median 416
Mean 6,209,808
% above 0 96%
Broken Class URIs and Broken Property URIs, by HTTP code
HTTP code | Class URIs, Prefix.cc crawl # | % | Class URIs, LOD corpus # | % | Property URIs, Prefix.cc crawl # | % | Property URIs, LOD corpus # | %
200 | 7,175 | 12.3% | 2,579 | 12.8% | 814 | 44.7% | 58,108 | 40.9%
301 | 18,598 | 31.8% | 2,610 | 12.9% | 442 | 24.3% | 1,137 | 0.8%
302 | 4,331 | 7.4% | 925 | 0.5% | 194 | 10.7% | 1,391 | 1.0%
303 | 12,805 | 21.9% | 3,903 | 19.3% | 108 | 5.9% | 5,247 | 3.7%
40x | 12,054 | 20.6% | 8,664 | 42.9% | 130 | 7.1% | 73,366 | 51.7%
50x | 66 | <0.1% | 111 | <0.1% | 4 | <0.1% | 362 | 0.3%
No response | 146,145 | 5.9% | 1,425 | 7% | 129 | 7.1% | 2,332 | 1.6%
Total | 204,616 | 100% | 20,217 | 100% | 1,821 | 100% | 141,943 | 100%
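The table groups dereferencing outcomes by HTTP status code. A plain-Python sketch of that bucketing and of the percentage columns; the bucket labels follow the table, while the function names, the extra "other" bucket and the use of None for "no response" are assumptions. Obtaining the status codes themselves (issuing requests, deciding whether to follow redirects) is left out.

```python
from collections import Counter


def bucket(status):
    """Map an HTTP status code, or None for no response, to a row of the table."""
    if status is None:
        return "No response"
    if status in (200, 301, 302, 303):
        return str(status)
    if 400 <= status < 500:
        return "40x"
    if 500 <= status < 600:
        return "50x"
    return "other"


def summarise(statuses):
    """Return {row: (count, percentage of all URIs, one decimal)}."""
    counts = Counter(bucket(s) for s in statuses)
    total = sum(counts.values())
    return {row: (n, round(100 * n / total, 1)) for row, n in counts.items()}


print(summarise([200, 200, 200, 303, 404, 410, None]))
```

Fed with the LOD-corpus property counts from the table (58,108 × 200, 1,137 × 301, 1,391 × 302, 5,247 × 303, 73,366 × 40x, 362 × 50x, 2,332 × no response), it reproduces the reported 40.9% for code 200 and 51.7% for 40x.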
KG Linking in the wild – Wikidata
• Wikidata is by far the largest openly available KG and the only one truly built bottom-up → cause of many modelling errors/inconsistencies
• Wikidata is not part of the LODCloud and was therefore not included in [Haller et al., 2020b]; however, we analysed the 9 March 2020 Wikidata dump (HDT file, 49.4 GB compressed)
Characteristic | Value
Number of triples | 3,381,623,911
Number of unique subjects | 1,327,447,995
Number of predicates | 32,713
Number of unique objects | 2,010,015,636
Number of shared subject-object | 1,173,987,281
Unique individuals | 75,261,968
Class links | 375,351,770
Property links | 2,723,834
– of which sameAs links | 2,723,834
Instance typing links | 77,479,623
# of classes | 1,045,455
# of properties | 74,746
Ratio of properties to classes | 1/14
# of unique properties | 7,259
KG Linking in the wild (cont’d)
• Ontologies are reused widely
– Only a few KGs define their own ontology → a large number of ontologies already exist that cover many domains
• Ubiquity of broken Class and Property links
– Alarming number of broken links, i.e., more than half of all class and
property URIs
– Data publishers should consider replicating linked ontologies
• Lack of Instance Links
– Many (28% of all) KGs do not use any Instance Links, and owl:sameAs is not particularly popular (other than in Wikidata); possible reasons:
1. these links are expensive to establish manually,
2. they are expensive to maintain, and
3. even if they exist, there is no incentive to publish them openly.
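Because owl:sameAs is symmetric and transitive, a set of instance links implicitly defines groups of co-referent identifiers, and a single new link can merge two such groups. A minimal union-find sketch in Python that computes these groups; the function name and the input pairs (including the ex:mozart IRI) are hypothetical.

```python
def same_as_clusters(pairs):
    """Group IRIs connected by owl:sameAs into equivalence classes (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra  # merge the two groups
    clusters = {}
    for x in parent:
        clusters.setdefault(find(x), set()).add(x)
    return sorted(clusters.values(), key=sorted)


links = [
    ("dbr:Wolfgang_Amadeus_Mozart", "wd:Q254"),
    ("wd:Q254", "ex:mozart"),                 # hypothetical third identifier
    ("dbr:Antonio_Salieri", "wd:Q51088"),
]
for group in same_as_clusters(links):
    print(sorted(group))
```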
KG Usage
• Knowledge Management, Knowledge
Discovery
• Training of ML models with KGs
• Conversational Agents
– Q&A
– Personal Assistants
– Chatbots
• Open Data
Building the AGRIF KG
Australian Government Records Interoperability Framework
• Address discovery and semantic interoperability needs in Australian
Government
• Combine records/archives/information management with contemporary
data science
• Emphasise business benefit to the creators of information
• Make sure it does not require an entirely new skillset for everyone involved
• Build proof-of-concept KG for two use case agencies
Building the AGRIF KG (cont’d)
[Figure: AGRIF KG engineering steps]
• Learning graph shapes from KG
• KG Usage
• Adding schema links to external KG
• Develop AGRIF ontology
• Map from source metadata to JSON objects
• Map from JSON objects to RDFS/OWL
• Extract data from unstructured sources using NLP/NER
• KG Curation (e.g., entity reconciliation)
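The JSON-to-RDF step can be sketched in a few lines of Python. This is not J2RM [Méndez et al., 2020] or its mapping format; the key-to-property table, the ex:record_ IRIs and the ex:Record class are hypothetical, with Dublin Core terms standing in for AGRIF ontology properties.

```python
import json

# Hypothetical mapping from JSON metadata keys to ontology properties.
KEY_TO_PROPERTY = {
    "title": "dcterms:title",
    "created": "dcterms:created",
    "author": "dcterms:creator",
}


def json_to_triples(record_json):
    """Turn one JSON metadata object into (subject, predicate, object) triples."""
    record = json.loads(record_json)
    subject = f"ex:record_{record['id']}"
    triples = [(subject, "rdf:type", "ex:Record")]
    for key, prop in KEY_TO_PROPERTY.items():
        value = record.get(key)
        if value is None:
            continue  # key absent in this record
        values = value if isinstance(value, list) else [value]  # multi-valued keys
        for v in values:
            triples.append((subject, prop, json.dumps(str(v))))  # quoted string literal
    return triples


doc = '{"id": 42, "title": "Annual report", "author": ["A. Smith", "B. Jones"]}'
for triple in json_to_triples(doc):
    print(triple)
```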
Building the AGRIF KG – Architecture
[Figure: AGRIF KG architecture. Components: Metadata Extractor; Document Store (CouchDB); Triple Store (Virtuoso); NLP/NER-Toolkit; Schímatos Platform; SHACL Learner; Active Knowledge Graph Completion; J2RM; KG-I; Protégé; APIs. Data formats: JSON, RDF. Input files: .pdf, .docx, .msg, .xlsx, .csv, … Users: End User, Domain Expert.]
AGRIF KG tools
• Schema
– AGRIF Ontology
http://reference.data.gov.au/def/ont/agrif
• Open-source software
– Metadata Extractor & Loader (MEL)
– JSON to RDF Mappings (J2RM) [Méndez et al., 2020]
– SHACLearner [Omran et al., 2020]
– Schímatos [Wright et al., 2020]
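SHACLearner learns shapes from the KG itself. As a much simpler illustration of the general idea (and not SHACLearner's approach), the Python sketch below proposes sh:minCount 1 candidates: for each class, the properties used by at least a given fraction of its instances. The function name, the support threshold and the ex: data are hypothetical.

```python
from collections import defaultdict


def learn_min_counts(triples, support=1.0):
    """For each class, list properties used by at least `support` (a fraction) of its
    instances; each is a candidate for sh:minCount 1 in a learned shape."""
    instances = defaultdict(set)  # class -> its instances
    used = defaultdict(set)       # instance -> properties it uses
    for s, p, o in triples:
        if p == "rdf:type":
            instances[o].add(s)
        else:
            used[s].add(p)
    shapes = {}
    for cls, members in instances.items():
        counts = defaultdict(int)
        for m in members:
            for p in used[m]:
                counts[p] += 1
        shapes[cls] = sorted(p for p, n in counts.items() if n / len(members) >= support)
    return shapes


data = [
    ("ex:a", "rdf:type", "ex:Person"), ("ex:a", "ex:name", '"A"'), ("ex:a", "ex:email", '"a@x"'),
    ("ex:b", "rdf:type", "ex:Person"), ("ex:b", "ex:name", '"B"'),
]
print(learn_min_counts(data))               # {'ex:Person': ['ex:name']}
print(learn_min_counts(data, support=0.5))  # {'ex:Person': ['ex:email', 'ex:name']}
```

Each learned entry could then be written out as `sh:property [ sh:path ex:name ; sh:minCount 1 ]` on a shape targeting the class, for a domain expert to review.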
Conclusions
• Stronger focus on the end user needed
– Tools/methods needed for creating/maintaining KGs
– Tools/methods needed to support querying/analysing KG schemas
• Improved NLP/NER-based learning techniques (e.g., distant supervision) needed that build subject-predicate-object (s-p-o) relations from unstructured text [Mintz et al., 2009]
• Permanent distributed querying/replication of data/schema
References
• Hogan, A., et al.: Knowledge Graphs. ACM Computing Surveys (to appear), 2021.
• Noy, N., Gao, Y., Jain, A., Narayanan, A., Patterson, A., Taylor, J.: Industry-scale Knowledge Graphs: Lessons and Challenges. ACM Queue 17(2), 2019.
• Gruber, T.: A Translation Approach to Portable Ontology Specifications. Knowledge Acquisition 5(2): 199-220, 1993.
• Frischmuth, P., Martin, M., Tramp, S., Riechert, T., Auer, S.: OntoWiki – An Authoring, Publication and Visualization Interface for the Data Web. Semantic Web 6(3): 215-240, 2015.
• Krötzsch, M., Vrandečić, D., Völkel, M.: Semantic MediaWiki. The Semantic Web – ISWC 2006.
• Wright, J., Méndez, S. J. R., Haller, A., Taylor, K., Omran, P. G.: Schímatos: a SHACL-based Web-Form Generator for Knowledge Graph Editing. The Semantic Web – ISWC 2020.
• Lefrançois, M., Zimmermann, A., Bakerally, N.: A SPARQL Extension for Generating RDF from Heterogeneous Formats. ESWC (1), 2017.
• Zaveri, A., Rula, A., Maurino, A., Pietrobon, R., Lehmann, J., Auer, S.: Quality assessment for linked data: A survey. Semantic Web 7(1): 63-93, 2016.
• Paulheim, H.: Knowledge graph refinement: A survey of approaches and evaluation methods. Semantic Web 8(3): 489-508, 2017.
• Berners-Lee, T.: Linked Data. W3C Design Issues. URL: http://www.w3.org/DesignIssues/LinkedData.html, 2006.
• Haller, A., Polleres, A.: Are we better off with just one ontology on the Web? Semantic Web 11(1): 87-99, 2020a.
• Sure, Y., Staab, S., Studer, R.: On-To-Knowledge Methodology (OTKM). Handbook on Ontologies, pp. 117-132, 2004.
• Haller, A., Fernández, J. D., Kamdar, M. R., Polleres, A.: What Are Links in Linked Open Data? A Characterization and Evaluation of Links between Knowledge Graphs on the Web. ACM J. Data Inf. Qual. 12(2): 9:1-9:34, 2020b.
• Abele, A., McCrae, J. P., Buitelaar, P., Jentzsch, A., Cyganiak, R.: Linking open data cloud diagram. URL: http://lod-cloud.net. Insight-Centre, 2017.
• Méndez, S. J. R., Haller, A., Omran, P. G., Wright, J., Taylor, K.: J2RM: An ontology-based JSON-to-RDF Mapping tool. ISWC (Demos/Industry), 2020.
• Omran, P. G., Taylor, K., Méndez, S. J. R., Haller, A.: Towards SHACL Learning from Knowledge Graphs. ISWC (Demos/Industry), 2020.
• Mintz, M., Bills, S., Snow, R., Jurafsky, D.: Distant supervision for relation extraction without labeled data. Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP (ACL ’09), 2009.

Data-Analysis for Chicago Crime Data 2023ymrp368
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...shivangimorya083
 

Kürzlich hochgeladen (20)

Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
 

Knowledge Graph Engineering

  • 1. Knowledge Graph Engineering
    Keynote at Summer School on AI for Industry 4.0
    Armin Haller, Associate Professor, ANU
  • 2. Knowledge Graphs (KGs)
    “A Knowledge Graph is a graph of data intended to accumulate and convey knowledge of the real world, whose nodes represent entities of interest and whose edges represent relations between these entities.” [Hogan et al., 2020]
    • Knowledge graphs are created collaboratively by many users
    • Information can be added in a relatively arbitrary manner, as structural constraints are few
    Closed KGs (~2019) [Noy et al., 2019]
    – Microsoft: ~2bn entities, ~55bn facts
    – Google: ~1bn entities, ~70bn assertions
    – Facebook: ~50m entities, ~500m assertions
    – eBay: ~1bn triples
    – IBM: ~100m entities, 5bn relationships
    Open KGs (April 2021)
    – DBpedia: ~4.58m entities, ~9.25GB
    – Yago4: ~50m entities, ~18.4GB
    – Wikidata: ~93m entities, ~99GB
  • 3. Knowledge Graphs (KGs)
    • Graphs: a natural way of structuring and presenting knowledge
    • Heterogeneous: knowledge from different sources can be integrated and/or interlinked
    • Schema-later: the schema is often not decided until later and does not impose integrity constraints
  • 4. Schema in KGs
    Ontologies as schemas in KGs
    An ontology is an “explicit specification of a conceptualization consisting of a set of objects, and the describable relationships among them” [Gruber, 1993]
    Components of an Ontology
    • Classes: abstract groups (sets) of objects that are defined by properties that all their members share (e.g., Person, Organisation, Event)
    • Attributes: characteristics or parameters that objects (and classes) can have (e.g., date of birth, longitude, latitude, timestamp)
    • Relationships: ways in which classes and individuals can be related to one another (e.g., role, attributed to, observed by)
    • Individuals: concrete objects that are inherent to the domain of discourse, such as specific people or organisations, or abstract individuals such as numbers (e.g., g, π)
  • 5. RDF Knowledge Graphs
    [Figure: TBox (schema) vs. ABox (data). The schema ranges from Generic (applies to many) to Specific (applies to few); the data ranges from Limited (many entities) to Comprehensive (fewer entities). Wikidata examples: Q76 Barack Obama (3,947 axioms), Armin Haller (189 axioms), Q58043963, Q73145133, Q35120, P361 (partOf), P1872 (minimum no. of players), Entity, Person, Chess.]
  • 6. Meta-modelling issues in KGs
    Without enforced (upfront designed) schemas, KGs suffer from, e.g.:
    • Inconsistent modelling of classes/instances
      <Q1412680> <P279> <Q28100368> | <Beef Wellington> <subclass of> <Beef Dish>
      <Q6497852> <P31> <Q28100665> | <Wiener Schnitzel> <instance of> <Veal Dish>
    • Subclassing of disjoint super-classes
      <Q190928> <P279> <Q124282> | <shipyard> <subclass of> <dock>
      <Q190928> <P279> <Q4830453> | <shipyard> <subclass of> <business>
      <Q124282> <P279> <Q7184903> | <shipyard> <subclass of> <abstract object>
      <Q190928> <P279> <Q223557> | <shipyard> <subclass of> <physical object>
    • Instance-of relations between first-order classes
      <Q12156> <P31> <Q12136> | <Malaria> <instance of> <Disease>
      <Q12156> <P279> <Q12136> | <Malaria> <subclass of> <Disease>
    • Redundant/circular inheritance between first-order classes
      <Q18557307> <P279> <Q692536> | <muscle tissue disease> <subclass of> <muscular disease>
      <Q692536> <P279> <Q18557307> | <muscular disease> <subclass of> <muscle tissue disease>
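    Two of the patterns above can be detected mechanically. The following is a minimal sketch in plain Python, assuming the KG is available as (subject, predicate, object) tuples of Wikidata IDs rather than queried from a live endpoint; the function names are hypothetical. It flags pairs asserted with both P31 (instance of) and P279 (subclass of), and classes that lie on a P279 cycle. Detecting subclassing of disjoint super-classes would additionally need disjointness axioms, which are not modelled here.

```python
from collections import defaultdict

# Triples from the slide, as Wikidata IDs. P31 = "instance of", P279 = "subclass of".
TRIPLES = [
    ("Q1412680", "P279", "Q28100368"),   # Beef Wellington subclass of Beef Dish
    ("Q6497852", "P31", "Q28100665"),    # Wiener Schnitzel instance of Veal Dish
    ("Q12156", "P31", "Q12136"),         # Malaria instance of Disease
    ("Q12156", "P279", "Q12136"),        # Malaria subclass of Disease
    ("Q18557307", "P279", "Q692536"),    # muscle tissue disease subclass of muscular disease
    ("Q692536", "P279", "Q18557307"),    # muscular disease subclass of muscle tissue disease
]

def instance_and_subclass(triples):
    """Pairs (s, o) asserted both as `s P31 o` and as `s P279 o`."""
    p31 = {(s, o) for s, p, o in triples if p == "P31"}
    p279 = {(s, o) for s, p, o in triples if p == "P279"}
    return sorted(p31 & p279)

def subclass_cycles(triples):
    """Classes that can reach themselves by following P279 edges."""
    graph = defaultdict(set)
    for s, p, o in triples:
        if p == "P279":
            graph[s].add(o)
    cyclic = set()
    for start in list(graph):
        seen, frontier = set(), list(graph[start])
        while frontier:
            node = frontier.pop()
            if node == start:          # walked back to the starting class
                cyclic.add(start)
                break
            if node not in seen:
                seen.add(node)
                frontier.extend(graph.get(node, ()))
    return cyclic

print(instance_and_subclass(TRIPLES))    # the Malaria/Disease pair
print(sorted(subclass_cycles(TRIPLES)))  # the two muscle-disease classes
```

    The per-class search is quadratic in the worst case; for a hierarchy the size of Wikidata's, a strongly-connected-components algorithm such as Tarjan's would be the more efficient choice.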
  • 7. Types of Schemas (Ontologies) [Haller & Polleres, 2020a]
    Ordered by level of abstraction, from most general (highest reusability) to most specific (lowest reusability):
    • Upper ontologies, e.g., Cyc, SUMO, DOLCE, BFO
    • Mid-level ontologies, e.g., PROV-O, FOAF, ORG, SOSA/SSN, AGRIF
    • Domain ontologies, e.g., GO, ChEBI, DO, BTO
    • Use-case ontologies
  • 8. KG Engineering
    [Figure: KG engineering lifecycle with the phases KG Creation (extract data from existing resources), KG Curation, KG Linking and KG Usage; further annotations: "add instance assertions", "add schema assertions".]
  • 9. KG Creation – Develop Schema
    • Top-down: schema first, data later
    • Bottom-up: data first, schema later
    • Middle-out
    [Figure: TBox (schema) and ABox (data)]
  • 10. KG Creation
    Bottom-Up KG Creation
    • Schema is not defined; data is added organically and manually using tools such as:
      – OntoWiki [Frischmuth et al., 2015]
      – Semantic MediaWiki [Krötzsch et al., 2006]
      – Wikibase
      – Schímatos [Wright et al., 2020]
    Top-Down KG Creation
    • Schema is created upfront; existing data is mapped to the schema using languages/tools such as:
      – R2RML
      – SPARQL Generate [Lefrançois et al., 2017]
      – SHACL Rules
      – TARQL
      – NLP/NER from unstructured text
    Middle-Out KG Creation [Sure et al., 2004]
    • Schema is partly defined upfront, with mappings added later when data defines semantics
    • Use case data is provided upfront
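    The top-down approach can be illustrated with a minimal sketch in the spirit of TARQL or R2RML, though it uses neither tool: the schema, here a fixed column-to-property mapping, is decided first, and every CSV row is then mapped onto it as N-Triples. The `example.org` namespaces, the column names and the `Organisation` class are hypothetical placeholders.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespaces, for illustration only.
EX = "http://example.org/resource/"
SCHEMA = "http://example.org/schema#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# The schema is fixed before any data is seen: CSV column -> property URI.
COLUMN_TO_PROPERTY = {
    "name": SCHEMA + "name",
    "founded": SCHEMA + "foundingYear",
}

def literal(value):
    """Serialise a string as an N-Triples plain literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'

def csv_to_ntriples(text, id_column="id", rdf_class=SCHEMA + "Organisation"):
    """Map every CSV row onto the fixed schema, returning N-Triples lines."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{EX}{quote(row[id_column])}>"  # percent-encode the row id
        lines.append(f"{subject} <{RDF_TYPE}> <{rdf_class}> .")
        for column, prop in COLUMN_TO_PROPERTY.items():
            value = (row.get(column) or "").strip()
            if value:  # skip empty cells instead of emitting empty literals
                lines.append(f"{subject} <{prop}> {literal(value)} .")
    return lines
```

    For instance, `csv_to_ntriples("id,name,founded\norg1,Acme Pty Ltd,1999\n")` yields three lines for `org1`: its type, its name and its founding year. Skipping empty cells is one reasonable design choice; a mapping language would let the KG engineer decide this per column.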
  • 11. KG Curation
    Correctness
    – Evaluation: accessibility, accuracy, consistency, conciseness, trustability, dynamicity, representationality [Zaveri et al., 2016]
    – Correction: evaluating data quality (SHACL, ShEx)
      • Syntactic errors
      • Semantic errors
    Completeness
    – KG completion [Paulheim, 2017] using structural information observed in triples
      • Classification
      • Probabilistic and statistical methods
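    To make shape-based correction concrete, here is a minimal sketch of what a validator checks for one node shape. It is not a SHACL implementation (a real deployment would run a SHACL engine over RDF); the `PropertyShape` class and `validate` function are hypothetical, and their fields only mirror `sh:minCount`, `sh:maxCount` and `sh:pattern`. Triples are assumed to be plain string tuples, with literals as bare strings.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

RDF_TYPE = "rdf:type"

@dataclass
class PropertyShape:
    """Hypothetical, simplified analogue of a SHACL property shape."""
    path: str                        # like sh:path
    min_count: int = 0               # like sh:minCount
    max_count: Optional[int] = None  # like sh:maxCount
    pattern: Optional[str] = None    # like sh:pattern, but full-match here

def validate(triples, target_class, shapes):
    """Return (focus_node, path, message) tuples for every violation."""
    values = defaultdict(list)
    focus_nodes = set()
    for s, p, o in triples:
        if p == RDF_TYPE and o == target_class:
            focus_nodes.add(s)
        values[(s, p)].append(o)
    report = []
    for node in sorted(focus_nodes):
        for shape in shapes:
            vals = values.get((node, shape.path), [])
            if len(vals) < shape.min_count:
                report.append((node, shape.path, f"{len(vals)} values, at least {shape.min_count} required"))
            if shape.max_count is not None and len(vals) > shape.max_count:
                report.append((node, shape.path, f"{len(vals)} values, at most {shape.max_count} allowed"))
            if shape.pattern is not None:
                for v in vals:
                    if not re.fullmatch(shape.pattern, v):
                        report.append((node, shape.path, f"value {v!r} does not match {shape.pattern}"))
    return report

# Hypothetical example data: one valid person, one with a badly formatted
# duplicate birth date, and one with no birth date at all.
DATA = [
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:alice", "ex:birthDate", "1990-05-17"),
    ("ex:bob", "rdf:type", "ex:Person"),
    ("ex:bob", "ex:birthDate", "17/05/1990"),
    ("ex:bob", "ex:birthDate", "1990-05-17"),
    ("ex:carol", "rdf:type", "ex:Person"),
]
BIRTH_DATE = PropertyShape("ex:birthDate", min_count=1, max_count=1, pattern=r"\d{4}-\d{2}-\d{2}")

for violation in validate(DATA, "ex:Person", [BIRTH_DATE]):
    print(violation)
```

    The cardinality checks catch semantic errors (a missing or duplicated value), while the pattern check catches a syntactic one (a date in the wrong format).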
  • 12. KG Linking
    Linked Data Principles [Berners-Lee, 2006]
    • LDP1: Use URIs as identifiers for things;
    • LDP2: Use HTTP URIs so those identifiers can be dereferenced;
    • LDP3: Return useful information upon dereferencing of those URIs using a standard format (typically, RDF);
    • LDP4: Include links using externally dereferenceable URIs.
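    LDP2 and LDP3 together mean that a client can look up an HTTP URI and ask for RDF through content negotiation. The following is a minimal standard-library sketch; the preferred media types and their q-values are our assumption, and whether a given server honours them (for example by answering with a 303 redirect to an RDF document) depends on that server.

```python
import urllib.request
from urllib.parse import urlsplit

# Preferred RDF serialisations; the order and q-values are illustrative choices.
ACCEPT = "text/turtle, application/rdf+xml;q=0.9, application/n-triples;q=0.8"

def is_dereferenceable_candidate(uri):
    """LDP2: only http(s) URIs with a host can be looked up over HTTP."""
    parts = urlsplit(uri)
    return parts.scheme in ("http", "https") and bool(parts.netloc)

def build_request(uri):
    """LDP3: ask the server for an RDF serialisation via content negotiation."""
    if not is_dereferenceable_candidate(uri):
        raise ValueError(f"not an HTTP(S) URI: {uri}")
    return urllib.request.Request(uri, headers={"Accept": ACCEPT})

def dereference(uri, timeout=10):
    """Fetch the URI, following redirects, and return (final_url, content_type, body)."""
    with urllib.request.urlopen(build_request(uri), timeout=timeout) as resp:
        return resp.geturl(), resp.headers.get("Content-Type", ""), resp.read()
```

    `dereference("http://dbpedia.org/resource/Wolfgang_Amadeus_Mozart")` would return the final URL after any redirects, the media type the server chose, and the raw bytes; parsing them into triples is left to an RDF library.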
  • 13. KG Linking
    Linking Issues [Haller et al., 2020b]
    • References to many inaccessible URIs (i.e., broken links) may render a KG largely useless
    • Changes in linked external KGs are out of the control of the KG publisher
    • Previously, there was no definition of what constitutes a “link”, specifically of “internal links” (i.e., links between parts of one coherent KG) and “external links” (i.e., links between different KGs)
      – A triple is a link if it contains a URI in a namespace other than the authoritative namespace URI of the dataset/KG where the triple is defined. [Haller et al., 2020b]
  • 14. KG Linking – Link Types
    • Ontology links [Haller et al., 2020b]
      – Class link: t:[dbo:Person, rdfs:subClassOf, foaf:Person]
      – Instance typing link: t:[dbr:Wolfgang_Amadeus_Mozart, rdf:type, foaf:Person]
      – Property link: t:[dbr:Wolfgang_Amadeus_Mozart, foaf:name, "Wolfgang Amadeus Mozart"@en]
      – Instance role link: t:[dbr:Wolfgang_Amadeus_Mozart, foaf:knows, wd:Q51088] (Antonio Salieri)
    • Instance link: t:[dbr:Wolfgang_Amadeus_Mozart, owl:sameAs, wd:Q254]
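    The link definition and the link types above can be turned into a small classifier. This is a simplified sketch, not the exact procedure of [Haller et al., 2020b]: terms are prefixed names expanded through a hypothetical prefix table, literals are strings starting with a double quote, and, as our own assumption, the RDF, RDFS and OWL built-in vocabularies do not by themselves make a triple a link.

```python
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dbo": "http://dbpedia.org/ontology/",
    "dbr": "http://dbpedia.org/resource/",
    "wd": "http://www.wikidata.org/entity/",
}
# Assumption: built-in vocabulary terms are treated as neutral, not external.
BUILT_IN = tuple(PREFIXES[p] for p in ("rdf", "rdfs", "owl"))

def expand(term):
    """Expand 'prefix:local' to a full URI; quoted literals are returned unchanged."""
    if term.startswith('"'):
        return term
    prefix, _, local = term.partition(":")
    return PREFIXES[prefix] + local

def is_external(term, authority):
    """True for a URI outside the KG's authoritative namespace and the built-ins."""
    return (not term.startswith('"')
            and not term.startswith(authority)
            and not term.startswith(BUILT_IN))

def classify(triple, authority="http://dbpedia.org/"):
    """Return the link type of a triple, or None if it is not a link."""
    s, p, o = (expand(t) for t in triple)
    if not any(is_external(t, authority) for t in (s, p, o)):
        return None
    if p == PREFIXES["owl"] + "sameAs":
        return "instance link"
    if p == PREFIXES["rdfs"] + "subClassOf":
        return "class link"
    if p == PREFIXES["rdf"] + "type":
        return "instance typing link"
    if o.startswith('"'):
        return "property link"
    return "instance role link"

print(classify(("dbr:Wolfgang_Amadeus_Mozart", "owl:sameAs", "wd:Q254")))
```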
  • 15. KG Linking in the wild
    • Crawl of the LODcloud [Abele et al., 2017] plus historical datasets from the LODcloud that were cached in the LOD Laundromat
    • 430 linked datasets in the resulting corpus, each encoded in HDT, for a total size of 51 GB (3.3bn triples)
    Dataset access | # | % of total | Available | Available as % of total
    Total # of datasets | 1,359 | 100%
    SPARQL endpoint | 459 | 33.5% | 125 | 9.1%
    Available as download | 890 | 65.4% | 226 | 16.6%
    Characteristic | Median | Mean
    Number of triples | 4,478 | 17,860,436
    Number of unique subjects | 613 | 1,774,578
    Number of unique predicates | 31 | 65.4%
    Number of unique objects | 2,245 | 5,296,390
  • 16. KG Linking in the wild (cont’d)
    Class Links
    http://vivo.iu.edu | 119,538
    http://vivo.scripps.edu | 63,128
    http://www.imagesnippets.com | 12,874
    http://core.kmi.open.ac.uk | 9,143
    http://commons.wikimedia.org | 8,258
    http://vivo.psm.edu | 8,036
    http://datos.bne.es | 2,778
    http://dbpedia.org | 1,614
    http://www.productontology.org | 1,000
    http://vivoweb.org | 84
    Median 0 | Mean 1,299 | % above 0: 44%
    Property Links
    http://commons.wikimedia.org | 4,995
    http://datos.bne.es | 1,255
    http://vivo.iu.edu | 510
    http://vivo.psm.edu | 481
    http://vivoweb.org | 386
    http://vivo.scripps.edu | 187
    http://semanticscience.org | 168
    http://www.iupac.org | 102
    http://dbpedia.org | 101
    http://tkm.kiom.re.kr | 60
    Median 0 | Mean 47 | % above 0: 18%
  • 17. KG Linking in the wild (cont’d)
    Instance Typing Links
    http://webisa.webdatacommons.org | 101,491,507
    http://commons.wikimedia.org | 100,022,186
    http://lod.b3kat.de | 40,674,519
    http://lod.hebis.de | 39,160,423
    http://d-nb.info | 20,096,228
    http://datos.bne.es | 7,419,630
    http://data.ordnancesurvey.co.uk | 5,653,997
    http://data.europeana.eu | 4,987,332
    http://id.loc.gov | 1,570,877
    http://data.bibsys.no | 1,440,011
    Median 206 | Mean 1,967,570 | % above 0: 97%
    Instance Links
    http://ld.zdb-services.de | 398,381,851
    http://commons.wikimedia.org | 319,988,690
    http://d-nb.info | 14,160,649
    http://data.ordnancesurvey.co.uk | 13,277,718
    https://data.gov.cz | 3,081,559
    http://core.kmi.open.ac.uk | 1,696,618
    http://lod.hebis.de | 1,624,579
    http://id.loc.gov | 1,143,545
    http://data.europeana.eu | 687,735
    http://spraakbanken.gu.se | 451,081
    http://www.imagesnippets.com | 214,362
    http://data.coi.cz | 34,277
    Median 206 | Mean 4,240,890 | % above 0: 72%
  • 18. KG Linking in the wild (cont’d)
    • Selected predicates used in links
    Statistic | owl:sameAs | owl:differentFrom | rdfs:seeAlso | owl:AllDifferent
    Median | 0 | 0 | 0 | 0
    Mean | 503,859 | 581 | 2,735 | 0
    % above 0 | 53% | <1% | 14% | 0
    P90 | 1,460 | 0 | 1 | 0
    Top-ranked datasets:
    1st: http://commons.wikimedia.org, N/A, 40,636,493, 103,439, 324,659
    2nd: http://ld.zdb-services.de, 18,049,155, N/A, http://stitch.cs.vu.nl, N/A
    3rd: http://d-nb.info, 17,410,586, N/A, http://data.nobelprize.org, N/A
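    Since `owl:sameAs` is reflexive, symmetric and transitive under OWL semantics, a consumer that merges instance links has to compute their closure, i.e. clusters of co-referent URIs. A union-find (disjoint-set) structure handles this efficiently. The sketch below is illustrative only: the class name is hypothetical, and the sameAs pairs in the usage lines are hypothetical assertions apart from the Mozart pair shown on the link-types slide.

```python
class SameAsClusters:
    """Union-find over URIs. owl:sameAs is reflexive, symmetric and transitive,
    so its closure partitions URIs into clusters of co-referent names."""

    def __init__(self):
        self.parent = {}

    def find(self, uri):
        """Return the representative of the cluster containing `uri`."""
        self.parent.setdefault(uri, uri)
        root = uri
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: re-point every node on the path directly at the root.
        while self.parent[uri] != root:
            nxt = self.parent[uri]
            self.parent[uri] = root
            uri = nxt
        return root

    def add_same_as(self, a, b):
        """Record one `a owl:sameAs b` triple by merging the two clusters."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def clusters(self):
        """Return all clusters as a list of sets of URIs."""
        groups = {}
        for uri in list(self.parent):
            groups.setdefault(self.find(uri), set()).add(uri)
        return list(groups.values())

# Hypothetical sameAs assertions (the first mirrors the instance-link example).
links = SameAsClusters()
links.add_same_as("dbr:Wolfgang_Amadeus_Mozart", "wd:Q254")
links.add_same_as("wd:Q254", "ex:mozart")
links.add_same_as("dbr:Antonio_Salieri", "wd:Q51088")
print(links.clusters())
```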
  • 19. KG Linking in the wild (cont’d)
    Total Links
    http://ld.zdb-services.de | 421,206,061
    http://commons.wikimedia.org | 420,024,129
    http://webisa.webdatacommons.org | 101,491,507
    http://lod.hebis.de | 40,785,002
    http://lod.b3kat.de | 40,677,795
    http://d-nb.info | 34,256,877
    http://data.ordnancesurvey.co.uk | 18,931,817
    http://datos.bne.es | 7,428,111
    http://data.europeana.eu | 5,675,067
    https://data.gov.cz | 3,958,043
    Median 416 | Mean 6,209,808 | % above 0: 96%
    Broken URIs by HTTP response code (# / %)
    HTTP code | Broken Class URIs, Prefix.cc crawl | Broken Class URIs, LOD corpus | Broken Property URIs, Prefix.cc crawl | Broken Property URIs, LOD corpus
    200 | 7,175 / 12.3% | 2,579 / 12.8% | 814 / 44.7% | 58,108 / 40.9%
    301 | 18,598 / 31.8% | 2,610 / 12.9% | 442 / 24.3% | 1,137 / 0.8%
    302 | 4,331 / 7.4% | 925 / 0.5% | 194 / 10.7% | 1,391 / 1.0%
    303 | 12,805 / 21.9% | 3,903 / 19.3% | 108 / 5.9% | 5,247 / 3.7%
    40x | 12,054 / 20.6% | 8,664 / 42.9% | 130 / 7.1% | 73,366 / 51.7%
    50x | 66 / <0.1% | 111 / <0.1% | 4 / <0.1% | 362 / 0.3%
    No response | 146,145 / 5.9% | 1,425 / 7% | 129 / 7.1% | 2,332 / 1.6%
    Total | 204,616 / 100% | 20,217 / 100% | 1,821 / 100% | 141,943 / 100%
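    The HTTP-code breakdown above can be reproduced for individual URIs with a probe like the following minimal sketch. It sends a HEAD request with redirects disabled, so that 301/302/303 responses are recorded rather than followed, and maps the result onto the table's rows. The bucket names follow the table; sending HEAD rather than GET is our assumption, and some servers answer HEAD with 405 even when GET would succeed.

```python
import urllib.error
import urllib.request

def bucket(status):
    """Map an HTTP status code (None = no response) onto the table's rows."""
    if status is None:
        return "No response"
    if status in (200, 301, 302, 303):
        return str(status)
    if 400 <= status < 500:
        return "40x"
    if 500 <= status < 600:
        return "50x"
    return "other"  # e.g. 307/308 or other 2xx codes, not broken out in the table

class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Refuse to follow redirects, so 3xx responses surface as HTTPError."""
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None

# Passing the subclass replaces urllib's default redirect handler.
OPENER = urllib.request.build_opener(NoRedirect)

def probe(uri, timeout=10):
    """Send one HEAD request to `uri` and return its bucket."""
    request = urllib.request.Request(uri, method="HEAD")
    try:
        with OPENER.open(request, timeout=timeout) as response:
            return bucket(response.status)
    except urllib.error.HTTPError as err:  # 3xx (not followed), 4xx, 5xx
        return bucket(err.code)
    except OSError:                        # DNS failure, refused connection, timeout
        return bucket(None)
```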
  • 20. KG Linking in the wild – Wikidata
    • Wikidata is by far the largest openly available KG and the only one truly built bottom-up → cause of many modelling errors/inconsistencies
    • Wikidata is not part of the LODCloud and was therefore not included in [Haller et al., 2020b]; however, we analysed the 9 March 2020 Wikidata dump (HDT file, 49.4GB compressed)
    Number of triples | 3,381,623,911
    Number of unique subjects | 1,327,447,995
    Number of predicates | 32,713
    Number of unique objects | 2,010,015,636
    Number of shared subject-object | 1,173,987,281
    Unique individuals | 75,261,968
    Class links | 375,351,770
    Property links | 2,723,834
    of which sameAs links | 2,723,834
    Instance typing links | 77,479,623
    # of classes | 1,045,455
    # of properties | 74,746
    Ratio (properties : classes) | 1/14
    # of unique properties | 7,259
  • 21. KG Linking in the wild (cont’d)
    • Ontologies are reused widely
      – Only a few KGs define their own ontology → a large number of ontologies already exist that cover many domains
    • Ubiquity of broken class and property links
      – Alarming number of broken links, i.e., more than half of all class and property URIs
      – Data publishers should consider replicating linked ontologies
    • Lack of instance links
      – Many KGs (28% of all) do not use any instance links, and owl:sameAs is not particularly popular at all (other than in Wikidata), since (1) these links are expensive to establish manually, (2) they are expensive to maintain, and (3) even where they exist, there is no incentive to publish them openly.
  • 22. KG Usage
    • Knowledge management, knowledge discovery
    • Training of ML models with KGs
    • Conversational agents
      – Q&A
      – Personal assistants
      – Chatbots
    • Open Data
  • 23. Building the AGRIF KG
    Australian Government Records Interoperability Framework
    • Address discovery and semantic interoperability needs in the Australian Government
    • Combine records/archives/information management with contemporary data science
    • Emphasise business benefit to the creators of information
    • Make sure it does not require an entirely new skillset for everyone involved
    • Build a proof-of-concept KG for two use case agencies
  • 24. Building the AGRIF KG
    [Figure: AGRIF KG engineering steps: develop the AGRIF ontology; map from source metadata to JSON objects; map from JSON objects to RDFS/OWL; extract data from unstructured sources using NLP/NER; KG curation (e.g., entity reconciliation); add schema links to external KGs; learn graph shapes from the KG; KG usage.]
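    Learning graph shapes from a KG can be approximated, very naively, by counting how each class's instances use each property. The sketch below is a simple stand-in, not the method of SHACLearner [Omran et al., 2020]: it proposes a minimum count of 1 when at least a chosen fraction of instances (the hypothetical `min_support` parameter) use the property, and reports the largest observed number of values as a candidate maximum. The `ex:` records are hypothetical.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"

def learn_cardinalities(triples, min_support=0.9):
    """Propose naive cardinality constraints per (class, property).

    min_count is 1 if at least `min_support` of a class's instances use the
    property, else 0; max_count is the largest value count observed.
    """
    instances_of = defaultdict(set)  # class -> set of instances
    value_count = defaultdict(int)   # (subject, property) -> number of values
    for s, p, o in triples:
        if p == RDF_TYPE:
            instances_of[o].add(s)
        else:
            value_count[(s, p)] += 1

    properties_of = defaultdict(set)  # subject -> properties it uses
    for s, p in value_count:
        properties_of[s].add(p)

    shapes = {}
    for cls, instances in instances_of.items():
        used = set().union(*(properties_of[i] for i in instances))
        for prop in sorted(used):
            counts = [value_count.get((i, prop), 0) for i in instances]
            support = sum(1 for c in counts if c > 0) / len(instances)
            shapes[(cls, prop)] = {
                "support": support,
                "min_count": 1 if support >= min_support else 0,
                "max_count": max(counts),
            }
    return shapes

EXAMPLE = [
    ("ex:a", "rdf:type", "ex:Record"), ("ex:a", "ex:title", '"A"'), ("ex:a", "ex:creator", "ex:x"),
    ("ex:b", "rdf:type", "ex:Record"), ("ex:b", "ex:title", '"B"'),
    ("ex:c", "rdf:type", "ex:Record"), ("ex:c", "ex:title", '"C"'), ("ex:c", "ex:title", '"C (alt)"'),
]

for (cls, prop), stats in sorted(learn_cardinalities(EXAMPLE).items()):
    print(cls, prop, stats)
```

    Observed maxima overfit small samples, so candidate shapes like these would still need review by a domain expert before being used for validation.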
  • 25. Building the AGRIF KG: Architecture
    [Figure: architecture with a Metadata Extractor (inputs .pdf, .docx, .msg, .xlsx, .csv, …), a Document Store (CouchDB) holding JSON, an NLP/NER toolkit, J2RM producing RDF, a Triple Store (Virtuoso), the SHACL Learner, Active Knowledge Graph Completion, the Schímatos platform, KG-I and Protégé, with APIs serving end users and domain experts.]
  • 26. AGRIF KG tools
    • Schema
      – AGRIF Ontology http://reference.data.gov.au/def/ont/agrif
    • Open-source software
      – Metadata Extractor & Loader (MEL)
      – JSON to RDF Mappings (J2RM) [Méndez et al., 2020]
      – SHACLearner [Omran et al., 2020]
      – Schímatos [Wright et al., 2020]
  • 27. Conclusions
    • Stronger focus on the end user needed
      – Tools/methods needed for creating/maintaining KGs
      – Tools/methods needed to support querying/analysing KG schemas
    • Improved NLP/NER-based learning techniques (distant supervision) needed that build s-p-o relations from unstructured text [Mintz et al., 2009]
    • Permanent distributed querying/replication of data/schema
  • 28. References
    • Hogan, A., et al.: Knowledge Graphs. ACM Computing Surveys (to appear), 2021.
    • Noy, N., Gao, Y., Jain, A., Narayanan, A., Patterson, A., Taylor, J.: Industry-scale Knowledge Graphs: Lessons and Challenges. ACM Queue 17(2), 2019.
    • Gruber, T.: A Translation Approach to Portable Ontology Specifications. Knowledge Acquisition 5(2): 199-220, 1993.
    • Frischmuth, P., Martin, M., Tramp, S., Riechert, T., Auer, S.: OntoWiki – An Authoring, Publication and Visualization Interface for the Data Web. Semantic Web 6(3): 215-240, 2015.
    • Krötzsch, M., Vrandečić, D., Völkel, M.: Semantic MediaWiki. The Semantic Web – ISWC 2006.
    • Wright, J., Méndez, S. J. R., Haller, A., Taylor, K., Omran, P. G.: Schímatos: a SHACL-based Web-Form Generator for Knowledge Graph Editing. The Semantic Web – ISWC 2020.
    • Lefrançois, M., Zimmermann, A., Bakerally, N.: A SPARQL Extension for Generating RDF from Heterogeneous Formats. ESWC (1), 2017.
    • Zaveri, A., Rula, A., Maurino, A., Pietrobon, R., Lehmann, J., Auer, S.: Quality assessment for linked data: A survey. Semantic Web 7(1): 63-93, 2016.
    • Paulheim, H.: Knowledge graph refinement: A survey of approaches and evaluation methods. Semantic Web 8(3): 489-508, 2017.
    • Berners-Lee, T.: Linked Data. W3C Design Issues. URL: http://www.w3.org/DesignIssues/LinkedData.html, 2006.
    • Haller, A., Polleres, A.: Are we better off with just one ontology on the Web? Semantic Web 11(1): 87-99, 2020a.
    • Sure, Y., Staab, S., Studer, R.: On-To-Knowledge Methodology (OTKM). Handbook on Ontologies, pp. 117-132, 2004.
    • Haller, A., Fernández, J. D., Kamdar, M. R., Polleres, A.: What Are Links in Linked Open Data? A Characterization and Evaluation of Links between Knowledge Graphs on the Web. ACM J. Data Inf. Qual. 12(2): 9:1-9:34, 2020b.
    • Abele, A., McCrae, J. P., Buitelaar, P., Jentzsch, A., Cyganiak, R.: Linking open data cloud diagram. URL: http://lod-cloud.net. Insight Centre, 2017.
    • Méndez, S. J. R., Haller, A., Omran, P. G., Wright, J., Taylor, K.: J2RM: An ontology-based JSON-to-RDF Mapping tool. ISWC (Demos/Industry), 2020.
    • Omran, P. G., Taylor, K., Méndez, S. J. R., Haller, A.: Towards SHACL Learning from Knowledge Graphs. ISWC (Demos/Industry), 2020.
    • Mintz, M., Bills, S., Snow, R., Jurafsky, D.: Distant supervision for relation extraction without labeled data. Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP (ACL '09), 2009.