Collaboratively building
Knowledge Graphs on the Web
Armin Haller
Associate Professor, ANU
Data deluge
Impossible to manually process even a fraction of this information …
… we need to prepare for a post-big data world.
Machine Learning/AI
ML/AI approaches are performing extremely well in dealing
with such massive amounts of data on tasks such as:
– Image Recognition
– Speech Recognition
– Product recommendations
– Question & Answering
– Spam filtering
… and for none of these applications do we need an
explanation of the learned facts.
Machine Learning/AI and its
limitations
However, when it comes to:
– Self driving cars
– Medical diagnosis
– Drug design
– Robot interactions
– Military applications
– etc.
Humans need to understand the rationale of a decision.
– Facebook employs nearly 15,000 people to moderate posts
deemed inappropriate by ML/AI
eXplainable AI
XAI requires
• Encoding of context (Who, What, How,
When...)
• Encoding the semantics of inputs,
outputs and their properties
• Encoding of common sense knowledge
(e.g., one sits on a chair and eats on a
table)
Knowledge Graphs (KGs)
• Performance and explainability of ML
improves when data is given a context
– a Knowledge Graph increases the informative value
of the collected data that is given to the model
Knowledge Graphs [Paulheim 2017]
– describe real-world entities and their interrelations
– define possible classes and relations of entities in a
schema (ontology)
– allow for interrelating arbitrary entities with each
other
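As a minimal sketch of the three properties above, a KG can be held as a list of (subject, relation, object) triples next to a small schema. All identifiers here (`alice`, `anu`, `worksFor`) are hypothetical toy data, not taken from any real KG.

```python
from collections import defaultdict

# Hypothetical schema (ontology): the possible classes and relations of entities.
SCHEMA = {
    "classes": {"Person", "Organisation"},
    "relations": {"worksFor": ("Person", "Organisation")},  # relation -> (domain, range)
}

# Hypothetical data: real-world entities and their interrelations as triples.
TRIPLES = [
    ("alice", "type", "Person"),
    ("anu", "type", "Organisation"),
    ("alice", "worksFor", "anu"),
]


def neighbours(triples, entity):
    """Return every (relation, object) pair whose subject is `entity`."""
    index = defaultdict(list)
    for s, p, o in triples:
        index[s].append((p, o))
    return index[entity]


print(neighbours(TRIPLES, "alice"))
```

Because any subject can be paired with any object, nothing in this structure stops arbitrary entities from being interrelated; the schema is advisory unless it is checked separately.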
Knowledge Graphs (KGs)
• Knowledge graphs are (generally) created collaboratively by many
users
• Information can be added in a relatively arbitrary manner as
structural constraints are few
Closed KGs (~2019) [Noy et al., 2019]
Microsoft ~2bn entities, ~55bn facts
Google ~1bn entities, ~70bn assertions
Facebook ~50m entities, ~500m assertions
eBay ~1bn triples
IBM ~100m entities, 5bn relationships
Open KGs (April 2021)
DBpedia ~4.58m entities, ~9.25GB
Yago4 ~50m entities, ~18.4GB
Wikidata ~93m entities, ~99GB
Knowledge Graphs (KGs)
• Graphs: a natural way of structuring and presenting knowledge
• Heterogeneous: knowledge from different sources can be integrated and/or interlinked
• Schema-later: the schema is often not decided until later, and does not impose integrity constraints
Schema in KGs
Ontologies as schemas in KGs
An ontology is an “explicit specification of a conceptualization consisting of a set of
objects, and the describable relationships among them”
[Gruber, 1993]
Components of an Ontology
• Classes: abstract groups (sets) of objects that are defined by properties that all their
members share (e.g., Person, Organisation, Event)
• Attributes: characteristics or parameters that objects (and classes) can have (e.g.,
date of birth, longitude, latitude, timestamp)
• Relationships: ways in which classes and individuals can be related to one another
(e.g., role, attributed to, observed by)
• Individuals: Concrete objects that are inherent to the domain of discourse, such as
specific people, organisations or abstract individuals such as numbers (e.g., g, π)
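The four components can be sketched as a small data structure. This is an illustrative layout only, not a standard ontology API; the field names and the `undeclared_terms` helper are assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Toy container for the four components listed above (hypothetical layout)."""
    classes: set[str] = field(default_factory=set)
    attributes: dict[str, str] = field(default_factory=dict)  # attribute -> owning class
    relationships: dict[str, tuple[str, str]] = field(default_factory=dict)  # name -> (domain, range)
    individuals: dict[str, str] = field(default_factory=dict)  # individual -> class


def undeclared_terms(onto: Ontology) -> list[str]:
    """List class names used by other components but never declared as classes."""
    used = set(onto.attributes.values()) | set(onto.individuals.values())
    for domain, range_ in onto.relationships.values():
        used.update((domain, range_))
    return sorted(used - onto.classes)


onto = Ontology(
    classes={"Person", "Event"},
    attributes={"dateOfBirth": "Person"},
    relationships={"observedBy": ("Event", "Person"), "role": ("Person", "Organisation")},
    individuals={"armin": "Person"},
)
print(undeclared_terms(onto))  # "Organisation" is used by `role` but never declared
```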
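KG modelling detail
[Figure: modelling detail in a KG, shown for both Data and Schema. The figure contrasts Limited (many entities) with Comprehensive (fewer entities), and Generic (applies to many) with Specific (applies to few). Labels in the figure: Q58043963, Q76, Barack Obama (3,947 axioms), Armin Haller (189 axioms), P361 (partOf), Q35120 (Entity), P1872 (minimum no of players), Chess, Person, Q73145133.]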
Types of Schemas (Ontologies)
Level of abstraction: most general to most specific. Reusability: highest to lowest.
• Upper Ontologies, e.g., Cyc, SUMO, DOLCE, BFO
• Mid-Level Ontologies, e.g., PROV-O, FOAF, ORG, SOSA/SSN, AGRIF
• Domain Ontologies, e.g., GO, ChEBI, DO, BTO
• Use-Case Ontologies
[Haller & Polleres, 2020a]
KG Engineering
KG Creation
Extract data
from existing
resources
KG Usage
KG Linking
Add instance
assertions
KG Curation
Add schema
assertions
KG Creation
• Top-Down: schema first, data later
• Bottom-Up: data first, schema later
• Middle-Out
[Figure labels: Data, Schema]
KG Creation (cont’d)
Bottom-Up KG Creation
• Schema is not defined, and data is added organically and manually using tools such as:
– OntoWiki [Frischmuth et al., 2015]
– Semantic MediaWiki [Krötzsch et al., 2006]
– Wikibase
– Schímatos [Wright et al., 2020]
Top-Down KG Creation
• Schema is created upfront, existing data mapped to schema using languages/tools such as:
– R2RML
– SPARQL Generate [Lefrançois et al., 2017]
– SHACL Rules
– TARQL
– Metadata Extractor & Loader (MEL) [Méndez et al., 2021]
– JSON to RDF Mappings (J2RM) [Méndez et al., 2020]
Middle-Out KG Creation [Sure et al., 2004]
• Schema is partly defined upfront based on use cases, with mappings added later when data
defines semantics
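The top-down idea (schema first, then map existing data onto it) can be sketched in plain Python. This is not R2RML or any of the tools listed above; the `MAPPING` structure and the `example.org` subject template are hypothetical, and only the FOAF and RDF URIs are real vocabulary terms.

```python
# Hypothetical, schema-first mapping: column name -> predicate URI.
MAPPING = {
    "subject_template": "http://example.org/person/{id}",
    "class": "http://xmlns.com/foaf/0.1/Person",
    "columns": {"name": "http://xmlns.com/foaf/0.1/name"},
}

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def rows_to_triples(rows, mapping):
    """Apply a fixed mapping to each record and emit (s, p, o) triples."""
    triples = []
    for row in rows:
        subject = mapping["subject_template"].format(**row)
        triples.append((subject, RDF_TYPE, mapping["class"]))
        for column, predicate in mapping["columns"].items():
            if row.get(column) is not None:  # skip missing values rather than invent them
                triples.append((subject, predicate, row[column]))
    return triples


rows = [{"id": 1, "name": "Ada"}, {"id": 2, "name": None}]
for triple in rows_to_triples(rows, MAPPING):
    print(triple)
```

In a bottom-up setting there would be no `MAPPING` fixed in advance; predicates would instead be chosen as each statement is entered.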
Collaboratively building KGs
• Biggest KGs on the Web are built, collaboratively, bottom-up:
– Schema.org Ontology and KG
• Over 10 million sites use Schema.org to mark up their web pages and email messages
– Wikidata Ontology and KG
• Wikipedia for Data, 149GB
schema.org vs. Wikidata
Availability
• schema.org: ontology highly available; data availability depends on the publisher
• Wikidata: ontology highly available; data highly available
Discoverability
• schema.org: ontology easy; instances very difficult
• Wikidata: ontology relatively difficult; instances very easy
Completeness & Adaptability
• schema.org: domain specific (E-Commerce); community extensions available
• Wikidata: (all of) human knowledge
Maintenance & Versioning
• schema.org: continuous curation; versions are not made explicit
• Wikidata: continuous curation; explicit entity versions + version history
Modularization
• schema.org: fully distributed, easily accessible ontology; fully distributed, difficult to access data
• Wikidata: fully distributed, relatively difficult to access ontology; fully distributed, easy to access data
Quality
• schema.org: high quality ontology; low quality data
• Wikidata: high quality ontology; high quality data
Meta-modelling issues
Without enforced (upfront designed) schemas, KGs suffer from, e.g.:
• Inconsistent modelling of classes/instances
<Q1412680> <P279> <Q28100368> | <Beef Wellington> <subclass of> <Beef Dish>
<Q6497852> <P31> <Q28100665> | <Wiener Schnitzel> <instance of> <Veal Dish>
• Subclassing of disjoint super-classes
<Q190928> <P279> <Q124282> | <shipyard> <subclass of> <dock>
<Q190928> <P279> <Q4830453> | <shipyard> <subclass of> <business>
<Q124282> <P279> <Q7184903> | <shipyard> <subclass of> <abstract object>
<Q190928> <P279> <Q223557> | <shipyard> <subclass of> <physical object>
• Instance of relations between first-order classes
<Q12156> <P31> <Q12136> | <Malaria> <instance of> <Disease>
<Q12156> <P279> <Q12136> | <Malaria> <subclass of> <Disease>
• Redundant/circular inheritances between first-order classes
<Q18557307> <P279> <Q692536> | <muscle tissue disease> <subclass of> <muscular disease>
<Q692536> <P279> <Q18557307> | <muscular disease> <subclass of> <muscle tissue disease>
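Two of the issues above can be detected mechanically. The sketch below assumes the statements are available as (subject, property, object) tuples of Wikidata identifiers; the function names are illustrative, and the sample data is copied from the examples above.

```python
P31, P279 = "P31", "P279"  # Wikidata properties: instance of, subclass of


def instance_and_subclass(triples):
    """Pairs (x, c) where x is both an instance of and a subclass of c."""
    inst = {(s, o) for s, p, o in triples if p == P31}
    sub = {(s, o) for s, p, o in triples if p == P279}
    return sorted(inst & sub)


def subclass_cycles(triples):
    """Classes that can reach themselves by following subclass-of edges."""
    parents = {}
    for s, p, o in triples:
        if p == P279:
            parents.setdefault(s, set()).add(o)
    in_cycle = set()
    for start in parents:
        stack, seen = list(parents[start]), set()
        while stack:
            node = stack.pop()
            if node == start:
                in_cycle.add(start)
                break
            if node not in seen:
                seen.add(node)
                stack.extend(parents.get(node, ()))
    return sorted(in_cycle)


triples = [
    ("Q12156", P31, "Q12136"),      # Malaria instance of Disease
    ("Q12156", P279, "Q12136"),     # Malaria subclass of Disease
    ("Q18557307", P279, "Q692536"),
    ("Q692536", P279, "Q18557307"),
]
print(instance_and_subclass(triples))
print(subclass_cycles(triples))
```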
KG Curation
Correctness
– Evaluation
Accessibility, Accuracy, Consistency, Conciseness, Trustability,
Dynamicity, Representationality [Zaveri et al., 2016]
– Correction
Evaluating data quality (SHACL, ShEx)
• Syntactic errors
• Semantic errors
Completeness
– KG Completion [Paulheim, 2017]
Using structural information observed in triples
• Classification
• Probabilistic and Statistical Methods
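The kind of check that shape languages such as SHACL or ShEx express can be imitated in plain Python. The sketch below is a stand-in, not SHACL or ShEx syntax; `PERSON_SHAPE` and its property names are hypothetical.

```python
from datetime import date

# Hypothetical shape: exactly one name (str) and at most one birthDate (date).
PERSON_SHAPE = {
    "name": {"min": 1, "max": 1, "type": str},
    "birthDate": {"min": 0, "max": 1, "type": date},
}


def validate(node, shape):
    """Return readable violations for one node given as {property: [values]}."""
    violations = []
    for prop, rule in shape.items():
        values = node.get(prop, [])
        if not rule["min"] <= len(values) <= rule["max"]:
            violations.append(
                f"{prop}: expected {rule['min']}..{rule['max']} values, found {len(values)}"
            )
        for value in values:
            if not isinstance(value, rule["type"]):
                violations.append(f"{prop}: {value!r} is not of type {rule['type'].__name__}")
    return violations


print(validate({"name": ["Ada"], "birthDate": ["1815-12-10"]}, PERSON_SHAPE))
print(validate({"birthDate": [date(1815, 12, 10)]}, PERSON_SHAPE))
```

A check like this only flags violations; deciding how to correct them remains a separate curation step.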
KG Linking
Internal vs. External links [Haller et al., 2020b]
– internal links: links between parts of one coherent KG, i.e., edges linking
nodes within the graph
• Link prediction techniques are used to learn those new links
– external links: links between different KGs, i.e., edges between nodes from
different graphs, or reusing edges from a different graph to link nodes in one KG
Linking Issues [Haller et al., 2020b]
• References to many inaccessible URIs (i.e., broken links) may render
a KG largely useless
• Changes in linked external KGs are outside the control of the KG publisher
KG Linking
• Ontology links [Haller et al., 2020b]
– class link
t:[dbo:Person, rdfs:subClassOf, foaf:Person]
– instance typing link
t:[dbr:Wolfgang_Amadeus_Mozart, rdf:type, foaf:Person]
– property link
t:[dbr:Wolfgang_Amadeus_Mozart, foaf:name, "Wolfgang Amadeus Mozart"@en]
– instance role link
t:[dbr:Wolfgang_Amadeus_Mozart, foaf:knows, wd:Q51088] (Antonio Salieri)
• Instance link
t:[dbr:Wolfgang_Amadeus_Mozart, owl:sameAs, wd:Q254]
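The five example triples suggest a simple rule of thumb for telling the link types apart. The sketch below is inferred only from those examples, treating `dbr:` and `dbo:` as the local namespaces; the formal definitions in [Haller et al., 2020b] may differ, and the helper names are hypothetical.

```python
CORE = {"rdf", "rdfs", "owl"}  # core vocabularies, treated as neither local nor external


def prefix(term):
    """Namespace prefix of a prefixed name such as 'dbr:X'; None for quoted literals."""
    if term.startswith('"') or ":" not in term:
        return None
    return term.split(":", 1)[0]


def is_external(term, local_prefixes):
    pre = prefix(term)
    return pre is not None and pre not in local_prefixes and pre not in CORE


def classify(triple, local_prefixes):
    """Assumed rule of thumb mapping a triple to one of the link types above."""
    s, p, o = triple
    if p == "owl:sameAs" and is_external(o, local_prefixes):
        return "instance link"
    if p == "rdfs:subClassOf" and is_external(o, local_prefixes):
        return "class link"
    if p == "rdf:type" and is_external(o, local_prefixes):
        return "instance typing link"
    if is_external(p, local_prefixes):
        # external property: literal object vs. entity object
        return "property link" if prefix(o) is None else "instance role link"
    return "internal"


LOCAL = {"dbr", "dbo"}
mozart = "dbr:Wolfgang_Amadeus_Mozart"
examples = [
    ("dbo:Person", "rdfs:subClassOf", "foaf:Person"),
    (mozart, "rdf:type", "foaf:Person"),
    (mozart, "foaf:name", '"Wolfgang Amadeus Mozart"@en'),
    (mozart, "foaf:knows", "wd:Q51088"),
    (mozart, "owl:sameAs", "wd:Q254"),
]
for t in examples:
    print(classify(t, LOCAL))
```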
KG Linking in Wikidata
• Wikidata is by far the largest openly available KG, truly built bottom-up,
both schema (ontology) and data
• Wikidata dump (in HDT) from 3rd of March 2021, 53GB (149GB
uncompressed).
General Statistics
# Triples (Facts) 1,693,668,039
# Subjects 1,625,057,179
# Predicates (edges) 38,867
# Unique objects 2,538,585,808
# Unique entities 89,120,227
# Unique Classes 2,522,595
# Unique Properties 74,309
Links
# Class Links 3,955 (0.001 per class)
# Property Links 835 (0.01 per property)
# Instance Typing Links 0
# Instance Links 173,177,045 (1.94 per entity)
• Exact Match (P2888) 3,268,021
• Said to be the Same (P460) 2
• Inverse Property (P1696) 0
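The per-class, per-property and per-entity figures follow from dividing each link count by the corresponding total in the general statistics. A small check of that arithmetic, using only the numbers reported above (the dictionary keys are illustrative names):

```python
STATS = {
    "entities": 89_120_227,
    "classes": 2_522_595,
    "properties": 74_309,
    "class_links": 3_955,
    "property_links": 835,
    "instance_links": 173_177_045,
}

ratios = {
    "class links per class": STATS["class_links"] / STATS["classes"],
    "property links per property": STATS["property_links"] / STATS["properties"],
    "instance links per entity": STATS["instance_links"] / STATS["entities"],
}

for name, value in ratios.items():
    print(f"{name}: {value:.4f}")
```

The results come out at roughly 0.0016, 0.0112 and 1.94, so the reported 0.001 per class appears to be truncated rather than rounded.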
KG Linking in Wikidata
(cont’d)
• Wikidata ontology includes links to other ontologies,
but relatively few class and property links
compared to other open KGs on the Web
• Wikidata defines an extensive ontology (schema)
that is used to define entities within its KG
• Wikidata links to other KGs, but uses relatively
fewer instance links than other KGs on the Web
– Does not (yet) include many similarity relations even
though it should not be the authoritative source for many
of its entities
KG Usage
• Knowledge Management, Knowledge
Discovery
• Training of ML models with KGs
• Conversational Agents
– Q&A
– Personal Assistants
– Chatbots
• Open Data
Conclusions
• Stronger focus needed on KG contributors and end users
– Tools/methods needed for creating/maintaining KGs
– Tools/methods needed to support querying/analysing KG Schemas
• KGs need to be more strongly interlinked, e.g., link prediction
techniques need to be deployed between KGs rather than just
on a single KG
• Improved NLP/NER-based learning techniques needed (distant
supervision) that build s-p-o relations from unstructured text [Mintz et
al., 2009]
• Permanent Distributed querying/replication of data/schema
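The distant supervision idea referred to above [Mintz et al., 2009] can be illustrated in a few lines: any sentence that mentions both entities of a known KG fact is taken as a (noisy) training example for that fact's relation. The facts, sentences and plain substring matching below are simplifying assumptions; a real pipeline would use NER and entity linking.

```python
# Hypothetical KG facts: (subject, object) -> relation.
KNOWN_FACTS = {("Mozart", "Salzburg"): "bornIn"}


def distant_labels(sentences, facts):
    """Label each sentence mentioning both entities of a known fact with its relation."""
    examples = []
    for sentence in sentences:
        for (subj, obj), relation in facts.items():
            if subj in sentence and obj in sentence:
                examples.append((sentence, subj, relation, obj))
    return examples


sentences = [
    "Mozart was born in Salzburg.",
    "Mozart returned to Salzburg.",  # matched too: a noisy label
    "Salieri lived in Vienna.",
]
for example in distant_labels(sentences, KNOWN_FACTS):
    print(example)
```

The second sentence shows the known weakness of the heuristic: co-occurrence of two entities does not guarantee that the sentence expresses the relation.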
References
• Hogan, A., et al.: Knowledge Graphs. ACM Computing Surveys (to appear), 2021.
• Noy, N., Gao, Y., Jain, A., Narayanan, A., Patterson, A., Taylor, J.: Industry-scale Knowledge Graphs: Lessons and Challenges. ACM Queue 17(2), 2019.
• Gruber, T.: A Translation Approach to Portable Ontology Specifications. Knowledge Acquisition, 5(2):199-220, 1993.
• Frischmuth, P., Martin, M., Tramp, S., Riechert, T., Auer, S.: OntoWiki – An Authoring, Publication and Visualization Interface for the Data Web. Semantic Web, vol. 6, no. 3, pp. 215-240, 2015.
• Krötzsch, M., Vrandečić, D., Völkel, M.: Semantic MediaWiki. The Semantic Web – ISWC 2006.
• Wright, J., Méndez, S. J. R., Haller, A., Taylor, K., Omran, P. G.: Schímatos: a SHACL-based Web-Form Generator for Knowledge Graph Editing. The Semantic Web – ISWC 2020.
• Lefrançois, M., Zimmermann, A., Bakerally, N.: A SPARQL Extension for Generating RDF from Heterogeneous Formats. ESWC (1), 2017.
• Zaveri, A., Rula, A., Maurino, A., Pietrobon, R., Lehmann, J., Auer, S.: Quality assessment for linked data: A survey. Semantic Web 7 (1), 63-93, 2016.
• Paulheim, H.: Knowledge graph refinement: A survey of approaches and evaluation methods. Semantic Web 8(3): 489-508, 2017.
• Berners-Lee, T.: Linked Data. W3C Design Issues. URL: http://www.w3.org/DesignIssues/LinkedData.html, 2006.
• Haller, A., Polleres, A.: Are we better off with just one ontology on the Web? Semantic Web 11(1): 87-99, 2020a.
• Sure, Y., Staab, S., Studer, R.: On-To-Knowledge Methodology (OTKM). Handbook on Ontologies, pp. 117-132, 2004.
• Haller, A., Fernández, J. D., Kamdar, M. R., Polleres, A.: What Are Links in Linked Open Data? A Characterization and Evaluation of Links between Knowledge Graphs on the Web. ACM J. Data Inf. Qual. 12(2): 9:1-9:34, 2020b.
• Abele, A., McCrae, J. P., Buitelaar, P., Jentzsch, A., Cyganiak, R.: Linking open data cloud diagram. URL: http://lod-cloud.net. Insight-Centre. 2017.
• Méndez, S. J. R., Haller, A., Omran, P. G., Wright, J., Taylor, K.: J2RM: An ontology-based JSON-to-RDF Mapping tool. ISWC (Demos/Industry) 2020.
• Méndez, S. J. R., Haller, A., Omran, P. G., Taylor, K.: MEL: Metadata Extractor & Loader. ISWC (Posters/Demos/Industry) 2021.
• Omran, P. G., Taylor, K., Méndez, S. J. R., Haller, A.: Towards SHACL Learning from Knowledge Graphs. ISWC (Demos/Industry) 2020.
• Mintz, M., Bills, S., Snow, R., Jurafsky, D.: Distant supervision for relation extraction without labeled data. Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP (ACL '09), 2009.

Weitere ähnliche Inhalte

Was ist angesagt?

The Future of AI is Generative not Discriminative 5/26/2021
The Future of AI is Generative not Discriminative 5/26/2021The Future of AI is Generative not Discriminative 5/26/2021
The Future of AI is Generative not Discriminative 5/26/2021Steve Omohundro
 
Benefiting from Semantic AI along the data life cycle
Benefiting from Semantic AI along the data life cycleBenefiting from Semantic AI along the data life cycle
Benefiting from Semantic AI along the data life cycleMartin Kaltenböck
 
Databricks Fundamentals
Databricks FundamentalsDatabricks Fundamentals
Databricks FundamentalsDalibor Wijas
 
An Introduction to Generative AI
An Introduction  to Generative AIAn Introduction  to Generative AI
An Introduction to Generative AICori Faklaris
 
[DSC DACH 23] ChatGPT and Beyond: How generative AI is Changing the way peopl...
[DSC DACH 23] ChatGPT and Beyond: How generative AI is Changing the way peopl...[DSC DACH 23] ChatGPT and Beyond: How generative AI is Changing the way peopl...
[DSC DACH 23] ChatGPT and Beyond: How generative AI is Changing the way peopl...DataScienceConferenc1
 
Debunking some “RDF vs. Property Graph” Alternative Facts
Debunking some “RDF vs. Property Graph” Alternative FactsDebunking some “RDF vs. Property Graph” Alternative Facts
Debunking some “RDF vs. Property Graph” Alternative FactsNeo4j
 
Machine Learning with PyCarent + MLflow
Machine Learning with PyCarent + MLflowMachine Learning with PyCarent + MLflow
Machine Learning with PyCarent + MLflowDatabricks
 
AI Transformation
AI TransformationAI Transformation
AI TransformationLiming Zhu
 
Natural Language Search with Knowledge Graphs (Haystack 2019)
Natural Language Search with Knowledge Graphs (Haystack 2019)Natural Language Search with Knowledge Graphs (Haystack 2019)
Natural Language Search with Knowledge Graphs (Haystack 2019)Trey Grainger
 
Cloud AI GenAI Overview.pptx
Cloud AI GenAI Overview.pptxCloud AI GenAI Overview.pptx
Cloud AI GenAI Overview.pptxSahithiGurlinka
 
From Data Science to MLOps
From Data Science to MLOpsFrom Data Science to MLOps
From Data Science to MLOpsCarl W. Handlin
 
Understanding GenAI/LLM and What is Google Offering - Felix Goh
Understanding GenAI/LLM and What is Google Offering - Felix GohUnderstanding GenAI/LLM and What is Google Offering - Felix Goh
Understanding GenAI/LLM and What is Google Offering - Felix GohNUS-ISS
 
KFServing - Serverless Model Inferencing
KFServing - Serverless Model InferencingKFServing - Serverless Model Inferencing
KFServing - Serverless Model InferencingAnimesh Singh
 
Large Language Models - Chat AI.pdf
Large Language Models - Chat AI.pdfLarge Language Models - Chat AI.pdf
Large Language Models - Chat AI.pdfDavid Rostcheck
 
Vertex AI: Pipelines for your MLOps workflows
Vertex AI: Pipelines for your MLOps workflowsVertex AI: Pipelines for your MLOps workflows
Vertex AI: Pipelines for your MLOps workflowsMárton Kodok
 
Introduction to AWS Greengrass on IoT
Introduction to AWS Greengrass on IoTIntroduction to AWS Greengrass on IoT
Introduction to AWS Greengrass on IoTAmazon Web Services
 
Machine Learning Model Deployment: Strategy to Implementation
Machine Learning Model Deployment: Strategy to ImplementationMachine Learning Model Deployment: Strategy to Implementation
Machine Learning Model Deployment: Strategy to ImplementationDataWorks Summit
 

Was ist angesagt? (20)

The Future of AI is Generative not Discriminative 5/26/2021
The Future of AI is Generative not Discriminative 5/26/2021The Future of AI is Generative not Discriminative 5/26/2021
The Future of AI is Generative not Discriminative 5/26/2021
 
Benefiting from Semantic AI along the data life cycle
Benefiting from Semantic AI along the data life cycleBenefiting from Semantic AI along the data life cycle
Benefiting from Semantic AI along the data life cycle
 
Databricks Fundamentals
Databricks FundamentalsDatabricks Fundamentals
Databricks Fundamentals
 
MLOps in action
MLOps in actionMLOps in action
MLOps in action
 
RDF and OWL
RDF and OWLRDF and OWL
RDF and OWL
 
An Introduction to Generative AI
An Introduction  to Generative AIAn Introduction  to Generative AI
An Introduction to Generative AI
 
[DSC DACH 23] ChatGPT and Beyond: How generative AI is Changing the way peopl...
[DSC DACH 23] ChatGPT and Beyond: How generative AI is Changing the way peopl...[DSC DACH 23] ChatGPT and Beyond: How generative AI is Changing the way peopl...
[DSC DACH 23] ChatGPT and Beyond: How generative AI is Changing the way peopl...
 
Debunking some “RDF vs. Property Graph” Alternative Facts
Debunking some “RDF vs. Property Graph” Alternative FactsDebunking some “RDF vs. Property Graph” Alternative Facts
Debunking some “RDF vs. Property Graph” Alternative Facts
 
Machine Learning with PyCarent + MLflow
Machine Learning with PyCarent + MLflowMachine Learning with PyCarent + MLflow
Machine Learning with PyCarent + MLflow
 
AI Transformation
AI TransformationAI Transformation
AI Transformation
 
Natural Language Search with Knowledge Graphs (Haystack 2019)
Natural Language Search with Knowledge Graphs (Haystack 2019)Natural Language Search with Knowledge Graphs (Haystack 2019)
Natural Language Search with Knowledge Graphs (Haystack 2019)
 
What is MLOps
What is MLOpsWhat is MLOps
What is MLOps
 
Cloud AI GenAI Overview.pptx
Cloud AI GenAI Overview.pptxCloud AI GenAI Overview.pptx
Cloud AI GenAI Overview.pptx
 
From Data Science to MLOps
From Data Science to MLOpsFrom Data Science to MLOps
From Data Science to MLOps
 
Understanding GenAI/LLM and What is Google Offering - Felix Goh
Understanding GenAI/LLM and What is Google Offering - Felix GohUnderstanding GenAI/LLM and What is Google Offering - Felix Goh
Understanding GenAI/LLM and What is Google Offering - Felix Goh
 
KFServing - Serverless Model Inferencing
KFServing - Serverless Model InferencingKFServing - Serverless Model Inferencing
KFServing - Serverless Model Inferencing
 
Large Language Models - Chat AI.pdf
Large Language Models - Chat AI.pdfLarge Language Models - Chat AI.pdf
Large Language Models - Chat AI.pdf
 
Vertex AI: Pipelines for your MLOps workflows
Vertex AI: Pipelines for your MLOps workflowsVertex AI: Pipelines for your MLOps workflows
Vertex AI: Pipelines for your MLOps workflows
 
Introduction to AWS Greengrass on IoT
Introduction to AWS Greengrass on IoTIntroduction to AWS Greengrass on IoT
Introduction to AWS Greengrass on IoT
 
Machine Learning Model Deployment: Strategy to Implementation
Machine Learning Model Deployment: Strategy to ImplementationMachine Learning Model Deployment: Strategy to Implementation
Machine Learning Model Deployment: Strategy to Implementation
 

Ähnlich wie Knowledge graphs on the Web

DITA's New Thang: Going Mapless!
DITA's New Thang: Going Mapless!DITA's New Thang: Going Mapless!
DITA's New Thang: Going Mapless!dclsocialmedia
 
Measurement and modeling of the web and related data sets
Measurement and modeling of the web and related data setsMeasurement and modeling of the web and related data sets
Measurement and modeling of the web and related data setsMark J. Feldman
 
Semantic technologies for the Internet of Things
Semantic technologies for the Internet of Things Semantic technologies for the Internet of Things
Semantic technologies for the Internet of Things PayamBarnaghi
 
Grid Computing July 2009
Grid Computing July 2009Grid Computing July 2009
Grid Computing July 2009Ian Foster
 
Semantics in Financial Services -David Newman
Semantics in Financial Services -David NewmanSemantics in Financial Services -David Newman
Semantics in Financial Services -David NewmanPeter Berger
 
Elasticsearch - basics and beyond
Elasticsearch - basics and beyondElasticsearch - basics and beyond
Elasticsearch - basics and beyondErnesto Reig
 
Knowledge Graph Engineering
Knowledge Graph EngineeringKnowledge Graph Engineering
Knowledge Graph EngineeringArmin Haller
 
Big Data e tecnologie semantiche - Utilizzare i Linked data come driver d'int...
Big Data e tecnologie semantiche - Utilizzare i Linked data come driver d'int...Big Data e tecnologie semantiche - Utilizzare i Linked data come driver d'int...
Big Data e tecnologie semantiche - Utilizzare i Linked data come driver d'int...giuseppe_futia
 
Computation and Knowledge
Computation and KnowledgeComputation and Knowledge
Computation and KnowledgeIan Foster
 
Object-Oriented Database Model For Effective Mining Of Advanced Engineering M...
Object-Oriented Database Model For Effective Mining Of Advanced Engineering M...Object-Oriented Database Model For Effective Mining Of Advanced Engineering M...
Object-Oriented Database Model For Effective Mining Of Advanced Engineering M...cscpconf
 
eScience: A Transformed Scientific Method
eScience: A Transformed Scientific MethodeScience: A Transformed Scientific Method
eScience: A Transformed Scientific MethodDuncan Hull
 
Cyberinfrastructure and Applications Overview: Howard University June22
Cyberinfrastructure and Applications Overview: Howard University June22Cyberinfrastructure and Applications Overview: Howard University June22
Cyberinfrastructure and Applications Overview: Howard University June22marpierc
 
USING ONTOLOGIES TO OVERCOMING DRAWBACKS OF DATABASES AND VICE VERSA: A SURVEY
USING ONTOLOGIES TO OVERCOMING DRAWBACKS OF DATABASES AND VICE VERSA: A SURVEYUSING ONTOLOGIES TO OVERCOMING DRAWBACKS OF DATABASES AND VICE VERSA: A SURVEY
USING ONTOLOGIES TO OVERCOMING DRAWBACKS OF DATABASES AND VICE VERSA: A SURVEYcseij
 
Lecture 7: Semantic Technologies and Interoperability
Lecture 7: Semantic Technologies and InteroperabilityLecture 7: Semantic Technologies and Interoperability
Lecture 7: Semantic Technologies and InteroperabilityPayamBarnaghi
 
And Then There Are Algorithms
And Then There Are AlgorithmsAnd Then There Are Algorithms
And Then There Are AlgorithmsInfluxData
 
BIAM 410 Final Paper - Beyond the Buzzwords: Big Data, Machine Learning, What...
BIAM 410 Final Paper - Beyond the Buzzwords: Big Data, Machine Learning, What...BIAM 410 Final Paper - Beyond the Buzzwords: Big Data, Machine Learning, What...
BIAM 410 Final Paper - Beyond the Buzzwords: Big Data, Machine Learning, What...Thomas Rones
 
How to Find a Needle in the Haystack
How to Find a Needle in the HaystackHow to Find a Needle in the Haystack
How to Find a Needle in the HaystackAdrian Stevenson
 

Ähnlich wie Knowledge graphs on the Web (20)

DITA's New Thang: Going Mapless!
DITA's New Thang: Going Mapless!DITA's New Thang: Going Mapless!
DITA's New Thang: Going Mapless!
 
Measurement and modeling of the web and related data sets
Measurement and modeling of the web and related data setsMeasurement and modeling of the web and related data sets
Measurement and modeling of the web and related data sets
 
Semantic technologies for the Internet of Things
Semantic technologies for the Internet of Things Semantic technologies for the Internet of Things
Semantic technologies for the Internet of Things
 
Grid Computing July 2009
Grid Computing July 2009Grid Computing July 2009
Grid Computing July 2009
 
Semantics in Financial Services -David Newman
Semantics in Financial Services -David NewmanSemantics in Financial Services -David Newman
Semantics in Financial Services -David Newman
 
Elasticsearch - basics and beyond
Elasticsearch - basics and beyondElasticsearch - basics and beyond
Elasticsearch - basics and beyond
 
Knowledge Graph Engineering
Knowledge Graph EngineeringKnowledge Graph Engineering
Knowledge Graph Engineering
 
Big Data e tecnologie semantiche - Utilizzare i Linked data come driver d'int...
Big Data e tecnologie semantiche - Utilizzare i Linked data come driver d'int...Big Data e tecnologie semantiche - Utilizzare i Linked data come driver d'int...
Big Data e tecnologie semantiche - Utilizzare i Linked data come driver d'int...
 
Computation and Knowledge
Computation and KnowledgeComputation and Knowledge
Computation and Knowledge
 
Object-Oriented Database Model For Effective Mining Of Advanced Engineering M...
Object-Oriented Database Model For Effective Mining Of Advanced Engineering M...Object-Oriented Database Model For Effective Mining Of Advanced Engineering M...
Object-Oriented Database Model For Effective Mining Of Advanced Engineering M...
 
It's all semantics! -The premises and promises of the semantic web
It's all semantics! -The premises and promises of the semantic webIt's all semantics! -The premises and promises of the semantic web
It's all semantics! -The premises and promises of the semantic web
 
eScience: A Transformed Scientific Method
eScience: A Transformed Scientific MethodeScience: A Transformed Scientific Method
eScience: A Transformed Scientific Method
 
Cyberinfrastructure and Applications Overview: Howard University June22
Cyberinfrastructure and Applications Overview: Howard University June22Cyberinfrastructure and Applications Overview: Howard University June22
Cyberinfrastructure and Applications Overview: Howard University June22
 
USING ONTOLOGIES TO OVERCOMING DRAWBACKS OF DATABASES AND VICE VERSA: A SURVEY
USING ONTOLOGIES TO OVERCOMING DRAWBACKS OF DATABASES AND VICE VERSA: A SURVEYUSING ONTOLOGIES TO OVERCOMING DRAWBACKS OF DATABASES AND VICE VERSA: A SURVEY
USING ONTOLOGIES TO OVERCOMING DRAWBACKS OF DATABASES AND VICE VERSA: A SURVEY
 
Lecture 7: Semantic Technologies and Interoperability
Lecture 7: Semantic Technologies and InteroperabilityLecture 7: Semantic Technologies and Interoperability
Lecture 7: Semantic Technologies and Interoperability
 
And Then There Are Algorithms
And Then There Are AlgorithmsAnd Then There Are Algorithms
And Then There Are Algorithms
 
G9_Cognition-Knowledge Rep and Reas_H6
G9_Cognition-Knowledge Rep and Reas_H6G9_Cognition-Knowledge Rep and Reas_H6
G9_Cognition-Knowledge Rep and Reas_H6
 
BIAM 410 Final Paper - Beyond the Buzzwords: Big Data, Machine Learning, What...
BIAM 410 Final Paper - Beyond the Buzzwords: Big Data, Machine Learning, What...BIAM 410 Final Paper - Beyond the Buzzwords: Big Data, Machine Learning, What...
BIAM 410 Final Paper - Beyond the Buzzwords: Big Data, Machine Learning, What...
 
How to Find a Needle in the Haystack
How to Find a Needle in the HaystackHow to Find a Needle in the Haystack
How to Find a Needle in the Haystack
 
Keynote at AImWD
Keynote at AImWDKeynote at AImWD
Keynote at AImWD
 

Kürzlich hochgeladen

+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...Health
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样wsppdmt
 
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...
Sealdah % High Class Call Girls Kolkata - 450+ Call Girl Cash Payment 8005736...HyderabadDolls
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowgargpaaro
 
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...HyderabadDolls
 
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptxRESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptxronsairoathenadugay
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRajesh Mondal
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1ranjankumarbehera14
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...nirzagarg
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...Bertram Ludäscher
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Knowledge graphs on the Web

  • 1. Collaboratively building Knowledge Graphs on the Web Armin Haller Associate Professor, ANU
  • 2. Data deluge Impossible to manually process even a fraction of this information … we need to prepare for a post-big-data world.
  • 3. Machine Learning/AI ML/AI approaches perform extremely well on such massive amounts of data for tasks such as: – Image Recognition – Speech Recognition – Product recommendations – Question Answering – Spam filtering … and for none of these applications do we need an explanation of the learned facts.
  • 4. Machine Learning/AI and its limitations However, when it comes to: – Self-driving cars – Medical diagnosis – Drug design – Robot interactions – Military applications – etc. humans need to understand the rationale behind a decision. – Facebook employs nearly 15,000 people to moderate posts deemed inappropriate by ML/AI
  • 5. eXplainable AI XAI requires • Encoding of context (Who, What, How, When...) • Encoding of the semantics of inputs, outputs and their properties • Encoding of common sense knowledge (e.g., one sits on a chair and eats at a table)
  • 6. Knowledge Graphs (KGs) • Performance and explainability of ML improve when data is given a context – a Knowledge Graph increases the informative value of the collected data that is given to the model Knowledge Graphs [Paulheim 2017] – describe real-world entities and their interrelations – define possible classes and relations of entities in a schema (ontology) – allow for interrelating arbitrary entities with each other
  • 7. Knowledge Graphs (KGs) • Knowledge graphs are (generally) created collaboratively by many users • Information can be added in a relatively arbitrary manner as structural constraints are few
    Closed KGs (~2019) [Noy et al., 2019]:
      Microsoft: ~2bn entities, ~55bn facts
      Google: ~1bn entities, ~70bn assertions
      Facebook: ~50m entities, ~500m assertions
      eBay: ~1bn triples
      IBM: ~100m entities, 5bn relationships
    Open KGs (April 2021):
      DBpedia: ~4.58m entities, ~9.25GB
      Yago4: ~50m entities, ~18.4GB
      Wikidata: ~93m entities, ~99GB
  • 8. Knowledge Graphs (KGs) • Graphs: a natural way of structuring and presenting knowledge • Heterogeneous: knowledge from different sources can be integrated and/or interlinked • Schema-later: the schema is often not decided until later, and does not impose integrity constraints
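The three properties on slide 8 can be illustrated with a minimal sketch. It assumes a KG is simply a set of (subject, predicate, object) triples; every identifier under the `ex:` prefix is hypothetical and chosen only for illustration.

```python
# A toy knowledge graph: a set of (subject, predicate, object) triples.
# All ex: identifiers are made up for this example.
Triple = tuple[str, str, str]

source_a: set[Triple] = {
    ("ex:ArminHaller", "ex:worksFor", "ex:ANU"),
    ("ex:ANU", "ex:locatedIn", "ex:Canberra"),
}
source_b: set[Triple] = {
    ("ex:ANU", "ex:label", "Australian National University"),
    ("ex:Canberra", "ex:capitalOf", "ex:Australia"),
}

# Heterogeneous sources are integrated by a plain set union:
# no shared table layout has to be agreed on beforehand.
graph = source_a | source_b


def describe(graph: set[Triple], node: str) -> list[tuple[str, str]]:
    """Return the outgoing (predicate, object) pairs of a node, sorted."""
    return sorted((p, o) for s, p, o in graph if s == node)


def predicates(graph: set[Triple]) -> list[str]:
    """Derive the predicates in use after the fact ('schema-later')."""
    return sorted({p for _, p, _ in graph})
```

Here `describe(graph, "ex:ANU")` collects statements about the same node from both sources, and `predicates(graph)` recovers a rudimentary schema from the data rather than the other way round.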
  • 9. Schema in KGs Ontologies as schemas in KGs An ontology is an “explicit specification of a conceptualization consisting of a set of objects, and the describable relationships among them” [Gruber, 1993] Components of an Ontology • Classes: abstract groups (sets) of objects that are defined by properties that all their members share (e.g., Person, Organisation, Event) • Attributes: characteristics or parameters that objects (and classes) can have (e.g., date of birth, longitude, latitude, timestamp) • Relationships: ways in which classes and individuals can be related to one another (e.g., role, attributed to, observed by) • Individuals: concrete objects that are inherent to the domain of discourse, such as specific people, organisations or abstract individuals such as numbers (e.g., g, π)
  • 10. [Figure: KG modelling detail. Schema axis: Generic (applies to many) to Specific (applies to few). Data axis: Limited (many entities) to Comprehensive (fewer entities). Labelled items: Barack Obama (Q76; 3,947 axioms), Armin Haller (189 axioms), Entity (Q35120), Person, Chess, partOf (P361), minimum no of players (P1872). Further identifiers shown: Q58043963, Q73145133.]
  • 11. Types of Schemas (Ontologies) [Haller & Polleres, 2020a] Ordered from most general (highest reusability) to most specific (lowest reusability): • Upper Ontologies, e.g., Cyc, SUMO, DOLCE, BFO • Mid-Level Ontologies, e.g., PROV-O, FOAF, ORG, SOSA/SSN, AGRIF • Domain Ontologies, e.g., GO, ChEBI, DO, BTO • Use-Case Ontologies
  • 12. KG Engineering [Figure: the stages KG Creation, KG Curation, KG Linking and KG Usage, annotated with: extract data from existing resources, add instance assertions, add schema assertions.]
  • 13. KG Creation • Top-Down: schema first, data later • Bottom-Up: data first, schema later • Middle-Out: between data and schema
  • 14. KG Creation (cont’d) Bottom-Up KG Creation • Schema is not defined, and data is added organically and manually using tools such as: – OntoWiki [Frischmuth et al., 2015] – Semantic MediaWiki [Krötzsch et al., 2006] – Wikibase – Schímatos [Wright et al., 2020] Top-Down KG Creation • Schema is created upfront, existing data mapped to schema using languages/tools such as: – R2RML – SPARQL Generate [Lefrançois et al., 2017] – SHACL Rules – TARQL – Metadata Extractor & Loader (MEL) [Méndez et al., 2021] – JSON to RDF Mappings (J2RM) [Méndez et al., 2020] Middle-Out KG Creation [Sure et al., 2004] • Schema is partly defined upfront based on use cases, with mappings added later when data defines semantics
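As a rough illustration of the top-down approach on slide 14, the sketch below maps existing tabular rows onto a schema that was fixed upfront. It is written in the spirit of mapping languages such as R2RML but does not use R2RML syntax; the `MAPPING` structure, the `rows_to_triples` helper and all `ex:` names are hypothetical.

```python
# Hypothetical schema-first mapping: the target class and the predicate for
# each source column are decided before any data is loaded.
MAPPING = {
    "subject_template": "ex:person/{id}",
    "class": "ex:Person",
    "columns": {"name": "ex:name", "employer": "ex:worksFor"},
}


def rows_to_triples(rows, mapping):
    """Turn dict-shaped rows into (subject, predicate, object) triples."""
    triples = []
    for row in rows:
        subject = mapping["subject_template"].format(**row)
        # Every row becomes an instance of the class named in the mapping.
        triples.append((subject, "rdf:type", mapping["class"]))
        for column, predicate in mapping["columns"].items():
            value = row.get(column)
            if value is not None:  # missing values produce no triple
                triples.append((subject, predicate, value))
    return triples


rows = [
    {"id": 1, "name": "Alice", "employer": "ex:org/acme"},
    {"id": 2, "name": "Bob", "employer": None},
]
triples = rows_to_triples(rows, MAPPING)
```

In a bottom-up setting the `MAPPING` step would be absent: contributors would add triples directly and the set of predicates would emerge from usage.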
  • 15. Collaboratively building KGs • Biggest KGs on the Web are built collaboratively, bottom-up: – Schema.org Ontology and KG • Over 10 million sites use Schema.org to mark up their web pages and email messages – Wikidata Ontology and KG • Wikipedia for Data, 149GB
    Comparison of schema.org and Wikidata:
      Availability
        schema.org: ontology highly available; data availability depends on the publisher
        Wikidata: ontology highly available; data highly available
      Discoverability
        schema.org: ontology → easy; instances → very difficult
        Wikidata: ontology → relatively difficult; instances → very easy
      Completeness & Adaptability
        schema.org: domain specific (E-Commerce); community extensions available
        Wikidata: (all of) human knowledge
      Maintenance & Versioning
        schema.org: continuous curation; versions are not made explicit
        Wikidata: continuous curation; explicit entity versions + version history
      Modularization
        schema.org: fully distributed, easily accessible ontology; fully distributed, difficult to access data
        Wikidata: fully distributed, relatively difficult to access ontology; fully distributed, easy to access data
      Quality
        schema.org: high quality ontology; low quality data
        Wikidata: high quality ontology; high quality data
  • 16. Meta-modelling issues Without enforced (upfront designed) schemas, KGs suffer from, e.g.:
    • Inconsistent modelling of classes/instances
      <Q1412680> <P279> <Q28100368> | <Beef Wellington> <subclass of> <Beef Dish>
      <Q6497852> <P31> <Q28100665> | <Wiener Schnitzel> <instance of> <Veal Dish>
    • Subclassing of disjoint super-classes
      <Q190928> <P279> <Q124282> | <shipyard> <subclass of> <dock>
      <Q190928> <P279> <Q4830453> | <shipyard> <subclass of> <business>
      <Q124282> <P279> <Q7184903> | <shipyard> <subclass of> <abstract object>
      <Q190928> <P279> <Q223557> | <shipyard> <subclass of> <physical object>
    • Instance of relations between first-order classes
      <Q12156> <P31> <Q12136> | <Malaria> <instance of> <Disease>
      <Q12156> <P279> <Q12136> | <Malaria> <subclass of> <Disease>
    • Redundant/circular inheritances between first-order classes
      <Q18557307> <P279> <Q692536> | <muscle tissue disease> <subclass of> <muscular disease>
      <Q692536> <P279> <Q18557307> | <muscular disease> <subclass of> <muscle tissue disease>
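Two of the issues on slide 16 can be detected mechanically. The sketch below is one possible check, not a method taken from the talk: it flags entities that are both an instance and a subclass of the same class, and classes that sit on a circular subclass chain. The triples reuse the Wikidata identifiers from the slide; the function names are hypothetical.

```python
from collections import defaultdict

INSTANCE_OF, SUBCLASS_OF = "P31", "P279"

# Triples taken from the examples on the slide.
triples = [
    ("Q1412680", "P279", "Q28100368"),  # Beef Wellington subclass of Beef Dish
    ("Q12156", "P31", "Q12136"),        # Malaria instance of Disease
    ("Q12156", "P279", "Q12136"),       # Malaria subclass of Disease
    ("Q18557307", "P279", "Q692536"),   # muscle tissue disease -> muscular disease
    ("Q692536", "P279", "Q18557307"),   # muscular disease -> muscle tissue disease
]


def instance_and_subclass(triples):
    """Pairs (entity, cls) where entity is both instance and subclass of cls."""
    inst = {(s, o) for s, p, o in triples if p == INSTANCE_OF}
    sub = {(s, o) for s, p, o in triples if p == SUBCLASS_OF}
    return sorted(inst & sub)


def classes_on_subclass_cycles(triples):
    """Classes that can reach themselves by following subclass-of edges."""
    parents = defaultdict(set)
    for s, p, o in triples:
        if p == SUBCLASS_OF:
            parents[s].add(o)

    def reaches_itself(start):
        stack, seen = list(parents[start]), set()
        while stack:
            node = stack.pop()
            if node == start:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(parents.get(node, ()))
        return False

    return sorted(c for c in list(parents) if reaches_itself(c))
```

Checks of this kind only report candidates; deciding which of the conflicting statements is wrong still requires a curator.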
  • 17. KG Curation Correctness – Evaluation: Accessibility, Accuracy, Consistency, Conciseness, Trustability, Dynamicity, Representationality [Zaveri et al., 2016] – Correction: evaluating data quality (SHACL, ShEx) • Syntactic errors • Semantic errors Completeness – KG Completion [Paulheim, 2017]: using structural information observed in triples • Classification • Probabilistic and Statistical Methods
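Slide 17 names SHACL and ShEx for evaluating data quality. The sketch below imitates the basic idea of a shape with cardinality constraints in plain Python; it does not use SHACL or ShEx syntax or any validation library, and the `SHAPE` structure, the `validate` helper and the `ex:` names are hypothetical.

```python
# Hypothetical shape: every ex:Person needs exactly one ex:name
# and at most one ex:birthDate, given as (min_count, max_count).
SHAPE = {
    "target_class": "ex:Person",
    "properties": {"ex:name": (1, 1), "ex:birthDate": (0, 1)},
}

data = [
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:bob", "rdf:type", "ex:Person"),
    ("ex:carol", "rdf:type", "ex:Person"),
    ("ex:carol", "ex:name", "Carol"),
    ("ex:carol", "ex:birthDate", "1990-01-01"),
    ("ex:carol", "ex:birthDate", "1991-01-01"),
]


def validate(triples, shape):
    """Return (node, property, count) for every cardinality violation."""
    targets = sorted(
        {s for s, p, o in triples if p == "rdf:type" and o == shape["target_class"]}
    )
    violations = []
    for node in targets:
        for prop, (lo, hi) in shape["properties"].items():
            count = sum(1 for s, p, _ in triples if s == node and p == prop)
            if not lo <= count <= hi:
                violations.append((node, prop, count))
    return violations
```

On this data the check reports that `ex:bob` lacks a name and that `ex:carol` has two birth dates, which are the kinds of errors a shape-based validator is meant to surface.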
  • 18. KG Linking Internal vs. External links [Haller et al., 2020b] – internal links: links between parts of one coherent KG, i.e., edges linking nodes within the graph • Link prediction techniques are used to learn new internal links – external links: links between different KGs, i.e., edges between nodes from different graphs, or edges reused from a different graph to link nodes in one KG Linking Issues [Haller et al., 2020b] • References to many inaccessible URIs (i.e., broken links) may render a KG largely useless • Changes in linked external KGs are outside the control of the KG publisher
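The internal/external distinction on slide 18 can be approximated by looking at namespaces. The sketch below is a simplified heuristic and not the characterization used in [Haller et al., 2020b]: it assumes that terms are prefixed names and that the KG under study owns the prefixes listed in `LOCAL_NAMESPACES`. The example triples are illustrative only.

```python
# Assumption: these prefixes belong to the KG being analysed.
LOCAL_NAMESPACES = {"dbr", "dbo"}


def prefix(term: str) -> str:
    """Return the namespace prefix of a prefixed name such as 'dbr:Salzburg'."""
    return term.split(":", 1)[0]


def classify(triple, local=LOCAL_NAMESPACES) -> str:
    """Label a triple as internal or as one of two kinds of external link."""
    _, p, o = triple
    if prefix(o) not in local:
        return "external (node in another graph)"
    if prefix(p) not in local:
        return "external (edge reused from another graph)"
    return "internal"


examples = [
    ("dbr:Wolfgang_Amadeus_Mozart", "dbo:birthPlace", "dbr:Salzburg"),
    ("dbr:Wolfgang_Amadeus_Mozart", "owl:sameAs", "wd:Q254"),
    ("dbr:Wolfgang_Amadeus_Mozart", "foaf:knows", "dbr:Antonio_Salieri"),
]
labels = [classify(t) for t in examples]
```

A real analysis would also need to handle literals and decide how to treat the core RDF, RDFS and OWL vocabularies, which this heuristic would count as reused external edges.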
  • 19. KG Linking • Ontology links [Haller et al., 2020b] – class link t:[dbo:Person, rdfs:subClassOf, foaf:Person] – instance typing link t:[dbr:Wolfgang_Amadeus_Mozart, rdf:type, foaf:Person] – property link t:[dbr:Wolfgang_Amadeus_Mozart, foaf:name, "Wolfgang Amadeus Mozart"@en] – instance role link t:[dbr:Wolfgang_Amadeus_Mozart, foaf:knows, wd:Q51088] (Antonio Salieri) • Instance link t:[dbr:Wolfgang_Amadeus_Mozart, owl:sameAs, wd:Q254]
  • 20. KG Linking in Wikidata • Wikidata is by far the largest openly available KG, with both schema (ontology) and data built truly bottom-up • Wikidata dump (in HDT) from 3 March 2021: 53GB (149GB uncompressed)
    General Statistics
      # Triples (Facts): 1,693,668,039
      # Subjects: 1,625,057,179
      # Predicates (edges): 38,867
      # Unique objects: 2,538,585,808
      # Unique entities: 89,120,227
      # Unique Classes: 2,522,595
      # Unique Properties: 74,309
    Links
      # Class Links: 3,955 (0.001 per class)
      # Property Links: 835 (0.01 per property)
      # Instance Typing Links: 0
      # Instance Links: 173,177,045 (1.94 per entity)
        Exact Match (P2888): 3,268,021
        Said to be the Same (P460): 2
        Inverse Property (P1696): 0
  • 21. KG Linking in Wikidata (cont’d) • The Wikidata ontology includes links to other ontologies, but relatively fewer class and property links than other open KGs on the Web • Wikidata defines an extensive ontology (schema) that is used to define entities within its KG • Wikidata links to other KGs, but uses relatively fewer instance links than other KGs on the Web – It does not (yet) include many similarity relations, even though it should not be the authoritative source for many of its entities
  • 22. KG Usage • Knowledge Management, Knowledge Discovery • Training of ML models with KGs • Conversational Agents – Q&A – Personal Assistants – Chatbots • Open Data
  • 23. Conclusions • A stronger focus on KG contributors and end users is needed – Tools/methods needed for creating/maintaining KGs – Tools/methods needed to support querying/analysing KG schemas • KGs need to be more strongly interlinked, e.g., link prediction techniques need to be deployed between KGs rather than just on a single KG • Improved NLP/NER-based learning techniques (distant supervision) are needed that build s-p-o relations from unstructured text [Mintz et al., 2009] • Permanent distributed querying/replication of data/schema
  • 24. References
    • Hogan, A., et al.: Knowledge Graphs. ACM Computing Surveys (to appear), 2021.
    • Noy, N., Gao, Y., Jain, A., Narayanan, A., Patterson, A., Taylor, J.: Industry-scale Knowledge Graphs: Lessons and Challenges. ACM Queue 17(2), 2019.
    • Gruber, T.: A Translation Approach to Portable Ontology Specifications. Knowledge Acquisition 5(2): 199-220, 1993.
    • Frischmuth, P., Martin, M., Tramp, S., Riechert, T., Auer, S.: OntoWiki – An Authoring, Publication and Visualization Interface for the Data Web. Semantic Web 6(3): 215-240, 2015.
    • Krötzsch, M., Vrandečić, D., Völkel, M.: Semantic MediaWiki. The Semantic Web – ISWC 2006.
    • Wright, J., Méndez, S. J. R., Haller, A., Taylor, K., Omran, P. G.: Schímatos: a SHACL-based Web-Form Generator for Knowledge Graph Editing. The Semantic Web – ISWC 2020.
    • Lefrançois, M., Zimmermann, A., Bakerally, N.: A SPARQL Extension for Generating RDF from Heterogeneous Formats. ESWC (1), 2017.
    • Zaveri, A., Rula, A., Maurino, A., Pietrobon, R., Lehmann, J., Auer, S.: Quality assessment for linked data: A survey. Semantic Web 7(1): 63-93, 2016.
    • Paulheim, H.: Knowledge graph refinement: A survey of approaches and evaluation methods. Semantic Web 8(3): 489-508, 2017.
    • Berners-Lee, T.: Linked Data. W3C Design Issues. URL: http://www.w3.org/DesignIssues/LinkedData.html, 2006.
    • Haller, A., Polleres, A.: Are we better off with just one ontology on the Web? Semantic Web 11(1): 87-99, 2020a.
    • Sure, Y., Staab, S., Studer, R.: On-To-Knowledge Methodology (OTKM). Handbook on Ontologies, pp. 117-132, 2004.
    • Haller, A., Fernández, J. D., Kamdar, M. R., Polleres, A.: What Are Links in Linked Open Data? A Characterization and Evaluation of Links between Knowledge Graphs on the Web. ACM J. Data Inf. Qual. 12(2): 9:1-9:34, 2020b.
    • Abele, A., McCrae, J. P., Buitelaar, P., Jentzsch, A., Cyganiak, R.: Linking open data cloud diagram. URL: http://lod-cloud.net. Insight-Centre, 2017.
    • Méndez, S. J. R., Haller, A., Omran, P. G., Wright, J., Taylor, K.: J2RM: An ontology-based JSON-to-RDF Mapping tool. ISWC (Demos/Industry), 2020.
    • Méndez, S. J. R., Haller, A., Omran, P. G., Taylor, K.: MEL: Metadata Extractor & Loader. ISWC (Posters/Demos/Industry), 2021.
    • Omran, P. G., Taylor, K., Méndez, S. J. R., Haller, A.: Towards SHACL Learning from Knowledge Graphs. ISWC (Demos/Industry), 2020.
    • Mintz, M., Bills, S., Snow, R., Jurafsky, D.: Distant supervision for relation extraction without labeled data. Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP (ACL '09), 2009.