Knowledge Graph Research and Innovation Challenges

Sören Auer
Symposium of the Knowledge Graph IG at the
Alan Turing Institute
June 17, 2022

• Fabric of concept, class, property, relationships, entity descriptions
• Uses a knowledge representation formalism
(typically RDF, RDF-Schema, OWL)
• Holistic knowledge (multi-domain, source, granularity):
• instance data (ground truth),
• open (e.g. DBpedia, WikiData), private (e.g. supply chain data),
closed data (product models),
• derived, aggregated data,
• schema data (vocabularies, ontologies)
• meta-data (e.g. provenance, versioning, documentation, licensing)
• comprehensive taxonomies to categorize entities
• links between internal and external data
• mappings to data stored in other systems and databases
Knowledge Graphs – A definition

Industry Knowledge Graph Adoption
https://www.slideshare.net/Frank.van.Harmelen/adoption-of-knowledge-graphs-late-2019
Eccenca aims at making
KGs a commodity

Comparison of various enterprise data integration paradigms

Paradigm | Data model | Integr. strategy | Conceptual/operational | Heterogeneous data | Intern./extern. data | No. of sources | Type of integr. | Domain coverage | Semantic repres.
XML Schema | DOM trees | LaV | operational | ✓ | ✓ | medium | both | medium | high
Data Warehouse | relational | GaV | operational | - | partially | medium | physical | small | medium
Data Lake | various | LaV | operational | ✓ | ✓ | large | physical | high | medium
MDM | UML | GaV | conceptual | - | - | small | physical | small | medium
PIM / PCS | trees | GaV | operational | partially | partially | - | physical | medium | medium
Enterprise search | document | - | operational | ✓ | partially | large | virtual | high | low
EKG | RDF | LaV | both | ✓ | ✓ | medium | both | high | very high
[1] M. Galkin, S. Auer, M.-E. Vidal, S. Scerri: Enterprise Knowledge Graphs: A Semantic Approach for Knowledge
Management in the Next Generation of Enterprise Information Systems. ICEIS (2) 2017: 88-98
KGs are pretty much established for Data Integration, but what about real Knowledge?

1. Integrate KGs with ML - Neuro-symbolic AI
2. Extend the concept of KGs
3. Establish true Human-Machine Collaboration
From KGs for Data Integration to KGs for
Knowledge Integration

Integrate KGs with ML -
Neuro-symbolic AI

How can we combine ML and KG?
ML researcher: We can learn on graphs (GNN)
KG researcher: We can use ML for KG completion (KG embedding)

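Towards Neuro-Symbolic Perception
[Figure: Input → Output via a small knowledge graph with these statements]
• Horse hasLegs 4
• Horse has Tail
• Pony subClassOf Horse
• Pony size small
• Zebra subClassOf Horse
• Zebra has Stripes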
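
The small graph on this slide can be written down as subject-predicate-object triples, with a lookup that follows subClassOf so that Pony and Zebra inherit what is stated for Horse. This is a minimal sketch in plain Python: the assignment of edge labels to nodes is read off the figure and is an assumption, and the names `superclasses` and `properties_of` are hypothetical helpers, not part of the talk.

```python
# Toy knowledge graph read off the slide (edge assignments are an assumption).
TRIPLES = [
    ("Horse", "hasLegs", 4),
    ("Horse", "has", "Tail"),
    ("Pony", "subClassOf", "Horse"),
    ("Pony", "size", "small"),
    ("Zebra", "subClassOf", "Horse"),
    ("Zebra", "has", "Stripes"),
]


def superclasses(cls, triples):
    """Return cls followed by all of its transitive superclasses."""
    chain, frontier = [cls], [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == "subClassOf" and o not in chain:
                chain.append(o)
                frontier.append(o)
    return chain


def properties_of(cls, triples):
    """Collect (predicate, object) pairs stated for cls or inherited from superclasses."""
    result = set()
    for c in superclasses(cls, triples):
        for s, p, o in triples:
            if s == c and p != "subClassOf":
                result.add((p, o))
    return result


if __name__ == "__main__":
    # A Zebra inherits the tail and the four legs from Horse and adds stripes.
    print(sorted(properties_of("Zebra", TRIPLES), key=str))
```

Background knowledge of this kind is what a perception model could consult to tell a zebra from a pony once it has detected stripes or size.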

What do we need?
1. Use KGs as contextual/background knowledge for ML in addition to raw data → Causal reasoning
2. Use ML to extend and revise KGs
3. Integrate human and machine intelligence

Synergistic Combination of Human & Machine Intelligence leveraging Knowledge Graphs
[Figure: a Cognitive Knowledge Graph between Machine Intelligence and Human Intelligence]
• Concept: KG nodes/graphlets
• Machine Intelligence: connecting KG graphlets with ML models
• Human Intelligence: KG graphlet authoring, curation, validation

KGs are proven to capture factual knowledge
Research Challenge: Manage
• Uncertainty & disagreement
• Varying semantic granularity
• Emergence, evolution & provenance
• Integrating existing domain models
But maintain flexibility and simplicity
Cognitive Knowledge Graphs
for scholarly knowledge
Towards Cognitive
Knowledge Graphs
• Fabric of knowledge molecules (graphlets) –
compact, relatively simple, structured units of knowledge
• Can be incrementally enriched, annotated, interlinked …

KG Graphlets initial working definition
Formally a CKG graphlet is a tuple of sets of classes and properties (C, P), where
1. ∀ p ∈ P the domain (either explicitly defined or implicitly inferred from a concrete CKG) includes at least one of the types c ∈ C: domain(p) ⊂ C, and
2. all classes in C are connected via a property chain in P: ∀ c_1, c_2 ∈ C ∃ p_1, ..., p_j, ..., p_n ∈ P: domain(p_1) = c_1, range(p_j) = domain(p_{j+1}), range(p_n) = c_2.
Alternatively, a graphlet can be seen as (a) a special type of connected graph pattern, where variables occur in the positions of concrete instances and literals, or (b) a specific set of SHACL shapes.
Graphlets can serve as a structuring element between entity/resource descriptions and whole ontologies/KGs → KG management (e.g. reasoning, querying, completion etc.) can be adapted to KG graphlet handling
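
The two conditions of the working definition can be checked mechanically. The following is a rough sketch under simplifying assumptions that are not in the talk: every property has exactly one domain class and one range, and the class and property names in the example are invented for illustration. With `directed=True` the check follows condition 2 literally (a chain from every class to every other class); `directed=False` is a relaxed reading that ignores edge direction.

```python
from collections import deque
from typing import NamedTuple


class Property(NamedTuple):
    name: str
    domain: str  # simplification: exactly one domain class per property
    range: str   # a class name, or anything else (e.g. a literal datatype)


def reachable(start, edges):
    """Breadth-first search: all classes reachable from start via property chains."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def is_graphlet(classes, properties, directed=True):
    """Check conditions 1 and 2 of the working definition for (C, P)."""
    classes = set(classes)
    # Condition 1: the domain of every property lies in C.
    if any(p.domain not in classes for p in properties):
        return False
    edges = {c: set() for c in classes}
    for p in properties:
        if p.range in classes:  # ranges outside C (e.g. literals) add no link
            edges[p.domain].add(p.range)
            if not directed:
                edges[p.range].add(p.domain)
    # Condition 2: every class reaches every other class via a property chain.
    return all(reachable(c, edges) == classes for c in classes)


# Hypothetical example, loosely inspired by a scholarly contribution.
CLASSES = {"Contribution", "Problem", "Method"}
PROPS = [
    Property("addresses", "Contribution", "Problem"),
    Property("uses", "Contribution", "Method"),
    Property("label", "Problem", "string"),
]

if __name__ == "__main__":
    print(is_graphlet(CLASSES, PROPS, directed=True))
    print(is_graphlet(CLASSES, PROPS, directed=False))
```

The example passes only under the relaxed reading, because no property chain leads from Problem back to Contribution. Whether the definition intends the strict or the relaxed reading is left open here.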

Graphlet Example "Scholarly Contribution"

Graphlet Example "Security Advice"

From Factual to Cognitive Knowledge Graphs

Aspect | Factual (today) | Cognitive (needed for SKG)
Base entities | Real world | Conceptual
Granularity | Atomic entities | Interlinked descriptions (molecules) with annotations (provenance)
Evolution | Addition/deletion of facts | Concept drift, varying aggregation levels
Collaboration | Fact enrichment | Emergent semantics

Organizing Scholarly
Communication with
Knowledge Graphs

How did information flows change
in the digital era?

How does it work today?
The World of Publishing & Communication has profoundly changed
• New means adapted to the new possibilities were developed, e.g. "zooming", dynamics
• Business models changed completely
• More focus on data, interlinking of data / services and
search in the data
• Integration, crowdsourcing, data curation play an
important role

What about
Scholarly
Communication?

Scholarly Communication has not changed
(much)
17th century 19th century 20th century 21st century

Challenges we are facing:
Digitalisation of Science
• Data integration and analysis
• Digital collaboration
Monopolisation by commercial actors
• Publisher lock-in effects
• Maximization of profits [1]
Reproducibility Crisis
• Majority of experiments are hard or not reproducible [2]
Proliferation of publications
• Publication output doubled within a decade
• Continues to rise [3]
Deficiency of Peer Review
• Deteriorating quality [4]
• Predatory publishing
We need to rethink the way research is represented and communicated
[1] http://thecostofknowledge.com, https://www.projekt-deal.de
[2] M. Baker: 1,500 scientists lift the lid on reproducibility, Nature, 2016.
[3] Science and Engineering Publication Output Trends, National Science Foundation, 2018.
[4] J. Couzin-Frankel: Secretive and Subjective, Peer Review Proves Resistant to Study. Science, 2013.

Root Cause – Deficiency of Scholarly Communication?
Lack of…
• Transparency: information is hidden in text
• Integratability: fitting different research results together
• Machine assistance: unstructured content is hard to process
• Identifiability: of concepts beyond metadata
• Collaboration: one brain barrier
• Overview: scientists look for the needle in the haystack

How good is CRISPR (wrt. precision, safety, cost)?
What are the specifics of genome editing in insects?
Who has applied it to butterflies?
Search for CRISPR: > 238,000 results
Source: https://scholar.google.de/scholar?hl=de&as_sdt=0%2C5&q=CRISPR&btnG=, 04.2019

Mathematics
• Definitions
• Theorems
• Proofs
• Methods
• …
Physics
• Experiments
• Data
• Models
• …
Chemistry
• Substances
• Structures
• Reactions
• …
Computer Science
• Concepts
• Implementations
• Evaluations
• …
Technology
• Standards
• Processes
• Elements
• Units, Sensor data
Architecture
• Regulations
• Elements
• Models
• …
Concepts
Domain specific Concepts (per discipline, see above)
Overarching Concepts
• Research problems
• Definitions
• Research approaches
• Methods
Artefacts
• Publications
• Data
• Software
• Image/Audio/Video
• Knowledge Graphs / Ontologies

Chemistry Example: CRISPR Genome Editing
Source: https://cacm.acm.org/system/assets/0002/2618/021116_Google_KnowledgeGraph.large.jpg?1476779500&1455222197

Chemistry Example: Populating the Graph
1. Original Publication
2. Adaptive Graph Curation & Completion
Author: Robert Reed
Research Problem: Genome editing in Lepidoptera
Methods: CRISPR / cas9
Applied on: Lepidoptera
Experimental Data: https://doi.org/10.5281/zenodo.896916
3. Graph representation
Nodes:
• Paper: CRISPR / cas9 editing in Lepidoptera (https://doi.org/10.1101/130344)
• Author: Robert Reed (https://orcid.org/0000-0002-6065-6728)
• Research problem: Genome editing in Lepidoptera
• Method: CRISPR/cas9
• Experimental Data (https://doi.org/10.5281/zenodo.896916)
• Genome editing (https://www.wikidata.org/wiki/Q24630389)
Edge labels: addresses, isEvaluatedWith
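
To make the graph representation concrete, the statements about this paper can be serialized as N-Triples lines. This is only a sketch. The DOI, ORCID and Zenodo identifiers come from the slide, but the `http://example.org/sketch/` namespace, the predicates `hasAuthor` and `usesMethod`, and the choice of which nodes `addresses` and `isEvaluatedWith` connect are assumptions made for illustration and are not the ORKG vocabulary.

```python
EX = "http://example.org/sketch/"  # hypothetical namespace, not an ORKG vocabulary
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

PAPER = "https://doi.org/10.1101/130344"
AUTHOR = "https://orcid.org/0000-0002-6065-6728"
DATA = "https://doi.org/10.5281/zenodo.896916"
PROBLEM = EX + "GenomeEditingInLepidoptera"
METHOD = EX + "CRISPR-cas9"

# Only "addresses" and "isEvaluatedWith" appear on the slide; the other
# predicates and all subject/object assignments are assumptions.
TRIPLES = [
    (PAPER, EX + "hasAuthor", AUTHOR),
    (PAPER, EX + "addresses", PROBLEM),
    (PAPER, EX + "usesMethod", METHOD),
    (PAPER, EX + "isEvaluatedWith", DATA),
    (PROBLEM, RDFS_LABEL, "Genome editing in Lepidoptera"),
]


def term(value):
    """Render an IRI as <...> and anything else as an escaped string literal."""
    if value.startswith(("http://", "https://")):
        return f"<{value}>"
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples, one N-Triples statement per line."""
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


if __name__ == "__main__":
    print(to_ntriples(TRIPLES))
```

Using persistent identifiers (DOI, ORCID) as node IRIs is what lets such statements link to external data without further mapping.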

Exploration and Question Answering
Research Challenge:
• Intuitive exploration leveraging the rich semantic representations
• Answer natural language questions
Question parsing → Named Entity Recognition (NER) & Linking (NEL) → Relation extraction → Query construction → Query execution → Result rendering
Q: How do different genome editing techniques compare?
PREFIX : <http://example.org/>  # placeholder namespace
SELECT ?approach ?feature WHERE {
  ?approach :addresses :GenomeEditing .
  ?approach :hasFeature ?feature .
}
[1] K. Singh, S. Auer et al.: Why Reinvent the Wheel? Let's Build Question Answering Systems Together. The Web Conference (WWW 2018).
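
The query execution step for a query of this shape amounts to matching a list of triple patterns against the graph and joining the variable bindings. The sketch below does this naively in plain Python over a handful of toy triples. The data and the names `match_pattern` and `query` are illustrative assumptions, and a real system would hand the query to a SPARQL engine instead.

```python
# Toy data (assumption): two approaches address genome editing, one does not.
TRIPLES = [
    ("CRISPR/cas9", "addresses", "GenomeEditing"),
    ("CRISPR/cas9", "hasFeature", "Site-specificity"),
    ("ZFN", "addresses", "GenomeEditing"),
    ("ZFN", "hasFeature", "Safety"),
    ("OtherMethod", "hasFeature", "Speed"),
]

# The two triple patterns of the query; terms starting with "?" are variables.
PATTERNS = [
    ("?approach", "addresses", "GenomeEditing"),
    ("?approach", "hasFeature", "?feature"),
]


def match_pattern(pattern, triple, bindings):
    """Extend bindings so that pattern matches triple; return None on a conflict."""
    new = dict(bindings)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):
            # Bind the variable, or check it against its earlier binding.
            if new.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return new


def query(patterns, triples):
    """Evaluate the patterns left to right, joining on shared variables."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for bindings in solutions
            for triple in triples
            if (extended := match_pattern(pattern, triple, bindings)) is not None
        ]
    return solutions


if __name__ == "__main__":
    for row in query(PATTERNS, TRIPLES):
        print(row["?approach"], row["?feature"])
```

Because `?approach` occurs in both patterns, only approaches that address genome editing contribute features to the result.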

Result: Automatic Generation of Comparisons / Surveys
Q: How do different genome editing techniques compare?

Engineered Nucleases | Site-specificity | Safety | Ease-of-use / costs / speed
zinc finger nucleases (ZFN) | ++ (9-18 nt) | + | -- ($$$: screening, testing to define efficiency)
transcription activator-like effector nucleases (TALENs) | +++ (9-16 nt) | ++ | ++ (easy to engineer; 1 week / few hundred dollars)
engineered meganucleases | +++ (12-40 nt) | 0 | -- ($$$: protein engineering, high-throughput screening)
CRISPR system/cas9 | ++ (5-12 nt) | - | +++ (easy to engineer; few days / less than 200 dollars)
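
A comparison of this kind can be generated by pivoting (approach, property, value) statements into one row per approach. The sketch below uses the ratings from the slide as input data. The function names `build_comparison` and `render` and the flat statement layout are assumptions for illustration and do not describe how the ORKG implements comparisons.

```python
# Ratings taken from the slide; the flat statement layout is an assumption.
STATEMENTS = [
    ("ZFN", "Site-specificity", "++"),
    ("ZFN", "Safety", "+"),
    ("ZFN", "Ease-of-use", "--"),
    ("TALENs", "Site-specificity", "+++"),
    ("TALENs", "Safety", "++"),
    ("TALENs", "Ease-of-use", "++"),
    ("Meganucleases", "Site-specificity", "+++"),
    ("Meganucleases", "Safety", "0"),
    ("Meganucleases", "Ease-of-use", "--"),
    ("CRISPR/cas9", "Site-specificity", "++"),
    ("CRISPR/cas9", "Safety", "-"),
    ("CRISPR/cas9", "Ease-of-use", "+++"),
]


def build_comparison(statements):
    """Pivot statements into (ordered property list, {approach: {property: value}})."""
    properties = []  # first-seen order becomes the column order
    rows = {}
    for approach, prop, value in statements:
        if prop not in properties:
            properties.append(prop)
        rows.setdefault(approach, {})[prop] = value
    return properties, rows


def render(properties, rows, missing="n/a"):
    """Render the pivoted data as pipe-separated text, one line per approach."""
    lines = [" | ".join(["Approach", *properties])]
    for approach, values in rows.items():
        cells = [values.get(p, missing) for p in properties]
        lines.append(" | ".join([approach, *cells]))
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(*build_comparison(STATEMENTS)))
```

Approaches that lack a statement for some property get a placeholder cell, so partially described papers can still appear in the same comparison.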

The Open Research
Knowledge Graph

Establish true Human-Machine Collaboration

To create a scholarly knowledge graph, a transformation from unstructured
to structured knowledge should happen
ORKG | Knowledge transformation
Unstructured knowledge Structured knowledge
Can we use Natural Language Processing (NLP) for
the transformation process?

● NLP techniques are not sufficiently accurate to perform this task
autonomously
● But we can intertwine machine intelligence with human intelligence
to get a synergy → the best of both worlds!
74% × 84% × 78% ≈ 48% (error propagation)
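
The error propagation figure follows from multiplying the per-stage accuracies, under the assumption that a result is only correct if every stage is correct and that the stages fail independently. The slide does not name the three stages, so the sketch below treats them as an anonymous list; `pipeline_accuracy` is a hypothetical helper name.

```python
import math


def pipeline_accuracy(stage_accuracies):
    """Chance that all stages are correct, assuming independent errors."""
    return math.prod(stage_accuracies)


if __name__ == "__main__":
    stages = [0.74, 0.84, 0.78]  # per-stage accuracies from the slide
    print(f"{pipeline_accuracy(stages):.0%}")
```

Each individual stage looks usable on its own, yet fewer than half of the end-to-end results would be correct, which is the argument for keeping a human in the process.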

Gradations of automation
● Manual data entry: human adds paper manually
● Machine-in-the-loop: human is assisted by a machine
● Human-in-the-loop: machine is assisted by a human
● Fully automated: machine adds paper automatically
Scalability improves from manual data entry towards full automation

1. Add paper wizard (machine-in-the-loop): main entry point of adding new papers to the ORKG
2. Paper annotator (machine-in-the-loop): annotation of key sentences in scholarly PDF articles
3. TinyGenius (human-in-the-loop): microtasks to validate NLP generated statements

Machine-in-the-loop | Add paper wizard | Step 1
● Collect metadata of the paper
● Fetched automatically if a DOI is available
● Manual entry possible

Machine-in-the-loop | Add paper wizard | Step 2
● Selection of a research field
● Shows the ORKG research field taxonomy

Machine-in-the-loop | Add paper wizard | Step 3
● The third step is the description of contribution data
● This includes the possibility to annotate the abstract
● The user is in charge and makes the final decision on whether the automatically generated data is added or not (i.e., machine-in-the-loop)
● Annotations can be added or removed
● A confidence slider hides suggestions with a low score
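
The confidence slider can be thought of as a threshold filter over the machine-generated annotations. The following minimal sketch assumes each suggestion carries a score between 0 and 1. The `Suggestion` type, its fields, and the sample values are hypothetical and are not taken from the ORKG code base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suggestion:
    text: str     # annotated span from the abstract
    label: str    # suggested class for the span
    score: float  # model confidence in [0, 1]


def visible_suggestions(suggestions, threshold):
    """Keep suggestions at or above the slider threshold, highest score first."""
    kept = [s for s in suggestions if s.score >= threshold]
    return sorted(kept, key=lambda s: s.score, reverse=True)


if __name__ == "__main__":
    # Invented sample values for illustration only.
    sample = [
        Suggestion("CRISPR/cas9", "Method", 0.91),
        Suggestion("Lepidoptera", "Material", 0.62),
        Suggestion("we show", "Result", 0.18),
    ]
    for s in visible_suggestions(sample, threshold=0.5):
        print(f"{s.score:.2f} {s.label}: {s.text}")
```

The filter only hides low-scoring suggestions; nothing is added to the graph until the user accepts it, which keeps the human in charge.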

Machine-in-the-loop | Add paper wizard
Try it yourself!
https://www.orkg.org/orkg/add-paper

Machine-in-the-loop | Paper annotator
● Goal: annotate key sentences in scholarly articles with discourse classes
● Two machine-in-the-loop approaches: sentence highlighting and class recommendations

Sentence highlighting
● Highlights potentially interesting sentences within the article
● Can be ignored by users

Class recommendations
Recommends potentially relevant classes based on the
selected sentence, called “Smart suggestions”

Machine-in-the-loop | Paper annotator
Try it yourself!
https://www.orkg.org/orkg/pdf-text-annotation

Machine-in-the-loop takeaways
● The human takes the lead, the machine assists where possible
● The user interface integration plays a key role
● The machine provides non-intrusive suggestions; wrong suggestions can easily be ignored
● Indicate to users that suggestions are based on AI (for example by using a dedicated color scheme)

Human-in-the-loop | TinyGenius
● Leverage existing NLP tools to process large quantities of scholarly data
● Ask any user/visitor to validate the statements using simple tasks (aka microtasks)
● Users that are normally “content consumers” can become “content creators” as microtasks lower the entrance barrier to contribute significantly

Human-in-the-loop | TinyGenius | NLP tasks
● Use question templates to ask relevant questions for a variety of NLP tasks:
  ● Summarization (Hugging Face)
  ● Entity Linking (Ambiverse NLU)
  ● Open Information Extraction (ORKG abstract annotator & ORKG title parser)
  ● Topic Modeling (CSO Classifier)

Show only validated statements by default
Human-in-the-loop | TinyGenius | Prototype
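
The microtask idea can be sketched as two small pieces: a question template that turns an NLP-generated statement into a yes/no question, and a rule that decides when a statement counts as validated and is therefore shown by default. The template wording, the vote labels, and the rule (at least three votes with a strict "yes" majority) are assumptions for illustration and do not reflect the actual TinyGenius implementation.

```python
from collections import Counter

# Hypothetical question templates, one per NLP task.
TEMPLATES = {
    "entity_linking": 'Is "{object}" a topic of the paper "{paper}"?',
    "summarization": 'Is this a fitting summary of "{paper}"? {object}',
}


def render_question(task, paper, obj):
    """Fill the template for the given task with a paper title and a generated value."""
    return TEMPLATES[task].format(paper=paper, object=obj)


def is_validated(votes, min_votes=3):
    """Validated once enough users voted and 'yes' holds a strict majority (assumed rule)."""
    counts = Counter(votes)
    return len(votes) >= min_votes and counts["yes"] > len(votes) / 2


def default_view(statements):
    """Return only validated statements, mirroring 'show only validated by default'."""
    return [statement for statement, votes in statements if is_validated(votes)]


if __name__ == "__main__":
    print(render_question("entity_linking", "CRISPR/cas9 editing in Lepidoptera", "genome editing"))
    statements = [
        ("topic: genome editing", ["yes", "yes", "no"]),
        ("topic: astronomy", ["no", "no", "yes"]),
        ("topic: butterflies", ["yes"]),
    ]
    print(default_view(statements))
```

Requiring a minimum number of votes keeps a single click from promoting an unchecked machine-generated statement into the default view.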

1. Neuro-symbolic AI – combination of knowledge graphs and machine learning
2. Extend the concept of KGs (e.g. with graphlets)
3. Integration of Human and Machine Intelligence (e.g. with crowdsourcing)
The grand KG challenges

The Team
Prof. (Univ. S. Bolivar)
Dr. Maria Esther Vidal
Software Development
Dr. Kemele Endris
Collaborators TIB Scientific Data Mgmt.
Group Leaders PostDocs
Project Management
Doctoral Researchers
Dr. Markus Stocker Dr. Gábor Kismihók Dr. Javad Chamanara Dr. Jennifer D’Souza
Allard Oelen Yaser Jaradeh Manuel Prinz
Alex Garatzogianni
Collaborators InfAI Leipzig / AKSW
Dr. Michael Martin Natanael Arndt
Dr. Lars Vogt
Vitalis Wiens Kheir Eddine Farfar
Muhammad Haris
Administration
Katja Bartel Simone Matern

https://de.linkedin.com/in/soerenauer
https://twitter.com/soerenauer
https://www.xing.com/profile/Soeren_Auer
http://www.researchgate.net/profile/Soeren_Auer
TIB & Leibniz University of Hannover
auer@tib.eu
Prof. Dr. Sören Auer
