An overview of some challenges in combining machine learning and knowledge graph technologies, and of the vision of Cognitive Knowledge Graphs that consist of graphlets instead of mere entity descriptions.
4. Page 4
Comparison of various enterprise data integration paradigms
Columns: Paradigm | Data model | Integration strategy | Conceptual/operational | Heterogeneous data | Internal/external data | No. of sources | Type of integration | Domain coverage | Semantic representation
XML Schema: DOM trees, LaV, operational, medium, both, medium, high
Data Warehouse: relational, GaV, operational, -, partially, medium, physical, small, medium
Data Lake: various, LaV, operational, large, physical, high, medium
MDM: UML, GaV, conceptual, -, -, small, physical, small, medium
PIM / PCS: trees, GaV, operational, partially, partially, -, physical, medium, medium
Enterprise search: document, -, operational, partially, large, virtual, high, low
EKG: RDF, LaV, both, medium, both, high, very high
[1] M. Galkin, S. Auer, M.-E. Vidal, S. Scerri: Enterprise Knowledge Graphs: A Semantic Approach for Knowledge
Management in the Next Generation of Enterprise Information Systems. ICEIS (2) 2017: 88-98
KGs are pretty much established for Data Integration, but what about real Knowledge?
5. Page 5
1. Integrate KGs with ML - Neuro-symbolic AI
2. Extend the concept of KGs
3. Establish true Human-Machine Collaboration
From KGs for Data Integration to KGs for
Knowledge Integration
7. Page 7
How can we combine ML and KG?
ML researcher: We can learn on graphs (GNN)
KG researcher: We can use ML for KG completion (KG embedding)
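To make "KG completion via KG embedding" concrete, here is a minimal sketch of how an embedding model scores candidate triples. It assumes a TransE-style score (the slide names no specific model), and the two-dimensional vectors are hand-picked toy values, not learned embeddings.

```python
import math

def transe_score(head, relation, tail):
    """TransE-style plausibility: negative L2 distance between head + relation and tail."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))

# Toy, hand-picked embeddings (assumption: a real system would learn these).
entities = {
    "Pony": [1.0, 0.0],
    "Horse": [1.0, 1.0],
    "Stripes": [3.0, -1.0],
}
relations = {"subClassOf": [0.0, 1.0]}

def best_tail(head, relation, candidates):
    """Rank candidate tails for the incomplete triple (head, relation, ?)."""
    return max(
        candidates,
        key=lambda c: transe_score(entities[head], relations[relation], entities[c]),
    )

print(best_tail("Pony", "subClassOf", ["Horse", "Stripes"]))
```

The highest-scoring candidate would be proposed as a new fact, which a human or a rule check could then accept or reject.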
8. Page 8
Towards Neuro-Symbolic Perception
[Figure: an input is mapped to an output with the help of a small knowledge graph. Horse: hasLegs 4, has Tail. Pony: size small, subClassOf Horse. Zebra: has Stripes, subClassOf Horse.]
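The small knowledge graph on this slide can be written down as triples. The following is a minimal sketch, assuming that Pony and Zebra are both subclasses of Horse (the flattened slide text does not state the edge targets) and that properties are inherited along subClassOf. The helper name `describe` is hypothetical.

```python
# Triples read off the slide; the subClassOf targets are an assumption.
TRIPLES = [
    ("Horse", "hasLegs", 4),
    ("Horse", "has", "Tail"),
    ("Pony", "size", "small"),
    ("Pony", "subClassOf", "Horse"),
    ("Zebra", "has", "Stripes"),
    ("Zebra", "subClassOf", "Horse"),
]

def describe(cls, triples=TRIPLES):
    """Collect (property, value) pairs of a class, including inherited ones."""
    facts, seen, stack = set(), set(), [cls]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        for s, p, o in triples:
            if s != current:
                continue
            if p == "subClassOf":
                stack.append(o)  # follow the class hierarchy upwards
            else:
                facts.add((p, o))
    return facts

print(sorted(describe("Zebra"), key=str))
```

A perception model that outputs "Zebra" could use such background knowledge to check its prediction, for example by expecting stripes, four legs and a tail.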
9. Page 9
What do we need?
1. Use KGs as contextual/background knowledge for ML in addition to raw data → causal reasoning
2. Use ML to extend and revise KGs
3. Integrate human and machine intelligence
10. Page 10
Synergistic Combination of Human & Machine Intelligence leveraging Knowledge Graphs
[Diagram: the Cognitive Knowledge Graph concept (KG nodes/graphlets) sits between Machine Intelligence and Human Intelligence]
• Machine Intelligence: connecting KG graphlets with ML models
• Human Intelligence: KG graphlet authoring, curation, validation
12. Page 12
KGs are proven to capture factual knowledge
Research Challenge: Manage
• Uncertainty & disagreement
• Varying semantic granularity
• Emergence, evolution & provenance
• Integrating existing domain models
But maintain flexibility and simplicity
Cognitive Knowledge Graphs
for scholarly knowledge
Towards Cognitive
Knowledge Graphs
• Fabric of knowledge molecules (graphlets) –
compact, relatively simple, structured units of knowledge
• Can be incrementally enriched, annotated, interlinked …
13. Page 13
KG Graphlets initial working definition
Formally, a CKG graphlet is a tuple (C, P) of a set of classes C and a set of properties P, where
1. ∀ p ∈ P, the domain of p (either explicitly defined or implicitly inferred from a concrete CKG) includes at least one of the types c ∈ C: domain(p) ⊂ C, and
2. all classes in C are connected via a property chain in P: ∀ c_1, c_2 ∈ C ∃ p_1, ..., p_j, ..., p_n ∈ P: domain(p_1) = c_1, range(p_j) = domain(p_{j+1}), range(p_n) = c_2.
Alternatively, graphlets can be defined (a) as a special type of connected graph patterns, where variables occur in the positions of concrete instances and literals, or (b) as specific sets of SHACL shapes.
Graphlets can serve as a structuring element between entity/resource descriptions and whole ontologies/KGs. KG management (e.g. reasoning, querying, completion etc.) can be adapted to KG graphlet handling.
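The two conditions of the working definition can be checked mechanically. The sketch below is one lenient reading, not the authors' implementation: condition 1 is read as "the domain shares at least one class with C", and condition 2 is read as plain connectivity, with property chains followed in either direction. The function name and the example classes are hypothetical.

```python
def is_graphlet(classes, properties):
    """Check the two graphlet conditions under the lenient reading stated above.

    classes:    set of class names C
    properties: dict mapping property name -> (domain classes, range classes)
    """
    # Condition 1: every property has at least one domain class in C.
    for domain, _range in properties.values():
        if not (set(domain) & classes):
            return False

    # Condition 2: all classes are linked through properties (either direction).
    neighbours = {c: set() for c in classes}
    for domain, range_ in properties.values():
        for d in set(domain) & classes:
            for r in set(range_) & classes:
                neighbours[d].add(r)
                neighbours[r].add(d)

    if not classes:
        return True
    start = next(iter(classes))
    reached, stack = {start}, [start]
    while stack:
        for nxt in neighbours[stack.pop()]:
            if nxt not in reached:
                reached.add(nxt)
                stack.append(nxt)
    return reached == classes

C = {"Paper", "Author", "Method"}
P = {
    "hasAuthor": ({"Paper"}, {"Author"}),
    "usesMethod": ({"Paper"}, {"Method"}),
}
print(is_graphlet(C, P))
```

A stricter reading (directed chains between every ordered pair of classes) would replace the undirected search with a reachability check per pair.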
16. Page 16
From Factual Knowledge Graphs
 | Factual (today)
Base entities | Real world
Granularity | Atomic entities
Evolution | Addition/deletion of facts
Collaboration | Fact enrichment
17. Page 17
From Factual to Cognitive Knowledge Graphs
 | Factual (today) | Cognitive (needed for SKG)
Base entities | Real world | Conceptual
Granularity | Atomic entities | Interlinked descriptions (molecules) with annotations (provenance)
Evolution | Addition/deletion of facts | Concept drift, varying aggregation levels
Collaboration | Fact enrichment | Emergent semantics
19. Page 19
How did information flows change
in the digital era?
20. Page 20
How does it work today?
The World of Publishing & Communication has profoundly changed
• New means adapted to the new possibilities were developed, e.g. "zooming", dynamics
• Business models changed completely
• More focus on data, interlinking of data / services and search in the data
• Integration, crowdsourcing, data curation play an important role
23. Page 23
Challenges we are facing:
We need to rethink how research is represented and communicated
• Digitalisation of Science: data integration and analysis, digital collaboration
• Monopolisation by commercial actors: publisher lock-in effects, maximization of profits [1]
• Reproducibility Crisis: majority of experiments are hard or not reproducible [2]
• Proliferation of publications: publication output doubled within a decade and continues to rise [3]
• Deficiency of Peer Review: deteriorating quality [4], predatory publishing
[1] http://thecostofknowledge.com, https://www.projekt-deal.de
[2] M. Baker: 1,500 scientists lift the lid on reproducibility, Nature, 2016.
[3] Science and Engineering Publication Output Trends, National Science Foundation, 2018.
[4] J. Couzin-Frankel: Secretive and Subjective, Peer Review Proves Resistant to Study. Science, 2013.
24. Page 24
Root Cause – Deficiency of Scholarly Communication?
Lack of…
• Transparency: information is hidden in text
• Integratability: fitting different research results together
• Machine assistance: unstructured content is hard to process
• Identifiability of concepts beyond metadata
• Collaboration: one brain barrier
• Overview: scientists look for the needle in the haystack
25. Page 25
How good is CRISPR (wrt. precision, safety, cost)?
What are the specifics of genome editing in insects?
Who has applied it to butterflies?
Search for CRISPR: > 238,000 results
Source: https://scholar.google.de/scholar?hl=de&as_sdt=0%2C5&q=CRISPR&btnG=, 04.2019
29. Page 29
Chemistry Example: Populating the Graph
1. Original Publication
2. Adaptive Graph Curation & Completion
   Author: Robert Reed
   Research Problem: Genome editing in Lepidoptera
   Methods: CRISPR / cas9
   Applied on: Lepidoptera
   Experimental Data: https://doi.org/10.5281/zenodo.896916
3. Graph representation
   Nodes:
   • CRISPR / cas9 editing in Lepidoptera (https://doi.org/10.1101/130344)
   • Robert Reed (https://orcid.org/0000-0002-6065-6728)
   • Genome editing in Lepidoptera
   • Experimental Data (https://doi.org/10.5281/zenodo.896916)
   • CRISPR/cas9
   • Genome editing (https://www.wikidata.org/wiki/Q24630389)
   Edge labels: adresses, isEvaluatedWith
30. Page 30
Research Challenge:
• Intuitive exploration leveraging the
rich semantic representations
• Answer natural language questions
Exploration and Question Answering
Question parsing → Named Entity Recognition (NER) & Linking (NEL) → Relation extraction → Query construction → Query execution → Result rendering
Q: How do different
genome editing techniques
compare?
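# The namespace below is a placeholder, not taken from the slide.
PREFIX : <http://example.org/>
SELECT ?approach ?feature WHERE {
  ?approach :adresses :GenomEditing .
  ?approach :hasFeature ?feature }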
[1] K. Singh, S. Auer et al: Why Reinvent
the Wheel? Let's Build Question
Answering Systems Together. The Web
Conference (WWW 2018).
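The query-execution step for the question above can be illustrated without a triple store. This is a minimal sketch on hypothetical toy data. Only the predicate names `adresses` and `hasFeature` and the class name `GenomEditing` come from the slide; the function name and the facts are made up for illustration.

```python
# Hypothetical toy data; a real system would query the knowledge graph.
TRIPLES = [
    ("CRISPR/cas9", "adresses", "GenomEditing"),
    ("CRISPR/cas9", "hasFeature", "site-specificity"),
    ("TALENs", "adresses", "GenomEditing"),
    ("TALENs", "hasFeature", "safety"),
    ("PCR", "adresses", "Amplification"),
    ("PCR", "hasFeature", "speed"),
]

def compare_approaches(triples, problem="GenomEditing"):
    """Return (approach, feature) pairs for all approaches addressing `problem`."""
    # First triple pattern: ?approach adresses <problem>
    approaches = {s for s, p, o in triples if p == "adresses" and o == problem}
    # Second triple pattern, joined on ?approach: ?approach hasFeature ?feature
    return sorted(
        (s, o) for s, p, o in triples if p == "hasFeature" and s in approaches
    )

for approach, feature in compare_approaches(TRIPLES):
    print(approach, feature)
```

The resulting rows are what the result-rendering step would turn into a comparison table.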
31. Page 31
Engineered nucleases | Site-specificity | Safety | Ease-of-use / costs / speed
Zinc finger nucleases (ZFN) | ++ (9-18 nt) | + | -- ($$$: screening, testing to define efficiency)
Transcription activator-like effector nucleases (TALENs) | +++ (9-16 nt) | ++ | ++ (easy to engineer; 1 week / few hundred dollars)
Engineered meganucleases | +++ (12-40 nt) | 0 | -- ($$$: protein engineering, high-throughput screening)
CRISPR system/cas9 | ++ (5-12 nt) | - | +++ (easy to engineer; few days / less than 200 dollars)
Result:
Automatic Generation of Comparisons / Surveys
Q: How do different genome editing techniques compare?
40. To create a scholarly knowledge graph, a transformation from unstructured
to structured knowledge should happen
ORKG | Knowledge transformation
Unstructured knowledge Structured knowledge
Can we use Natural Language Processing (NLP) for
the transformation process?
41. ● NLP techniques are not sufficiently accurate to perform this task
autonomously
● But we can intertwine machine intelligence with human intelligence
to get a synergy → the best of both worlds!
ORKG | Knowledge transformation
Can we use Natural Language Processing (NLP) for
the transformation process?
Error propagation: 74% × 84% × 78% ≈ 48%
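The error-propagation figure follows from multiplying the per-stage accuracies. A minimal sketch, assuming the three stages run in sequence and fail independently; the slide text does not name the stages, so they are left unnamed here.

```python
from math import prod

def pipeline_accuracy(stage_accuracies):
    """Overall accuracy of a sequential pipeline with independent stage errors."""
    return prod(stage_accuracies)

stages = [0.74, 0.84, 0.78]  # per-stage accuracies from the slide
overall = pipeline_accuracy(stages)
print(f"{overall:.0%}")
```

Even though each stage looks reasonably accurate on its own, fewer than half of the results survive all three stages, which motivates keeping a human in the process.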
42. Gradations of automation (in order of increasing automation; better scalable)
Manual data entry: human adds paper manually
Machine-in-the-loop: human is assisted by a machine
Human-in-the-loop: machine is assisted by a human
Fully automated: machine adds paper automatically
43. Gradations of automation (in order of increasing automation; better scalable)
Manual data entry: human adds paper manually
Machine-in-the-loop: human is assisted by a machine
Human-in-the-loop: machine is assisted by a human
Fully automated: machine adds paper automatically
Highlighted: Machine-in-the-loop (human is assisted by a machine) and Human-in-the-loop (machine is assisted by a human)
44. Gradations of automation
Machine-in-the-loop
1. Add paper wizard: main entry point of adding new papers to the ORKG
2. Paper annotator: annotation of key sentences in scholarly PDF articles
Human-in-the-loop
3. TinyGenius: microtasks to validate NLP generated statements
46. Machine-in-the-loop | Add paper wizard | Step 1
● Collect metadata of paper
● Fetched automatically if a DOI is available
● Manual entry possible
47. Machine-in-the-loop | Add paper wizard | Step 2
● Selection of a research field
● Shows the ORKG research field taxonomy
48. Machine-in-the-loop | Add paper wizard | Step 3
The third step is the description of contribution data
Machine-in-the-loop
49. Add paper wizard - Step 3
● The third step is the description of contribution data
● This includes the possibility to annotate the abstract
● The user is in charge and makes the final decision on whether the automatically generated data is added or not (i.e., machine-in-the-loop)
● Annotations can be added or removed
● A confidence slider hides suggestions with a low score
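The interplay of confidence slider and user decision can be sketched as two small filters. This is an illustrative sketch only, not the ORKG implementation; the `Suggestion` type, the field names and the example values are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Suggestion:
    text: str     # annotated span in the abstract
    label: str    # suggested class for the span
    score: float  # model confidence in [0, 1]

def visible_suggestions(suggestions, threshold):
    """Confidence slider: hide suggestions whose score is below the threshold."""
    return [s for s in suggestions if s.score >= threshold]

def accept(suggestions, chosen_texts):
    """The user has the final say: keep only explicitly chosen suggestions."""
    return [s for s in suggestions if s.text in chosen_texts]

sugs = [
    Suggestion("CRISPR", "method", 0.9),
    Suggestion("butterflies", "material", 0.3),
]
shown = visible_suggestions(sugs, threshold=0.5)
print([s.text for s in shown])
print([s.text for s in accept(shown, {"CRISPR"})])
```

Nothing is added to the graph unless the user selects it, so a wrong suggestion costs no more than ignoring it.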
52. Machine-in-the-loop | Paper annotator
● Goal: annotate key sentences in scholarly articles with discourse classes
● Two machine-in-the-loop approaches: sentence highlighting and class recommendations
56. Machine-in-the-loop | Add paper wizard
Try it yourself!
https://www.orkg.org/orkg/pdf-text-annotation
57. ● The human takes the lead, the machine assists where possible
● The user interface integration plays a key role
● The machine provides non-intrusive suggestions; wrong suggestions can easily be ignored
● Indicate to users that suggestions are based on AI (for example by using a dedicated color scheme)
Machine-in-the-loop takeaways
59. ● Leverage existing NLP tools to process large quantities of scholarly data
● Ask any user/visitor to validate the statements using simple tasks (aka microtasks)
● Users who are normally “content consumers” can become “content creators”, as microtasks significantly lower the entry barrier to contributing
Human-in-the-loop | TinyGenius
60. ● Use question templates to ask relevant questions for a variety of NLP tasks:
Summarization (Hugging Face)
Entity Linking (Ambiverse NLU)
Open Information Extraction (ORKG abstract annotator & ORKG title parser)
Topic Modeling (CSO Classifier)
Human-in-the-loop | TinyGenius | NLP tasks
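A question template turns one NLP-generated statement into a microtask. The sketch below is hypothetical: the template wording, the answer options and the function name are assumptions, and only the task types come from the slide.

```python
# Hypothetical question templates, one per NLP task type named on the slide.
TEMPLATES = {
    "summarization": 'Does this sentence summarize the paper well? "{value}"',
    "entity_linking": 'Is the concept "{value}" mentioned in this paper?',
    "information_extraction": 'Is "{value}" a correct statement about this paper?',
    "topic_modeling": 'Is this paper about the topic "{value}"?',
}

def make_microtask(task, paper_title, value):
    """Build a simple question for one NLP-generated statement."""
    return {
        "paper": paper_title,
        "task": task,
        "question": TEMPLATES[task].format(value=value),
        "options": ["yes", "no", "not sure"],  # assumed answer options
    }

task = make_microtask("topic_modeling", "Example paper", "semantic web")
print(task["question"])
```

Keeping each question closed and short is what lets casual visitors answer it without domain training.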
61. Show only validated statements by default
Human-in-the-loop | TinyGenius | Prototype
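Showing only validated statements by default requires a rule for when a statement counts as validated. A minimal sketch, assuming a simple vote threshold; the numbers (at least three votes, two thirds agreement) and the function names are illustrative assumptions, not the prototype's actual rule.

```python
from collections import Counter

def is_validated(votes, min_votes=3, min_agreement=0.66):
    """A statement counts as validated once enough users answered 'yes'."""
    if len(votes) < min_votes:
        return False
    counts = Counter(votes)
    return counts["yes"] / len(votes) >= min_agreement

def default_view(statements):
    """Return only the statements that pass validation."""
    return [text for text, votes in statements if is_validated(votes)]

statements = [
    ("Paper uses CRISPR/cas9", ["yes", "yes", "no"]),
    ("Paper is about astronomy", ["no", "no", "yes"]),
    ("Paper studies Lepidoptera", ["yes", "yes"]),  # too few votes so far
]
print(default_view(statements))
```

Statements that are not yet validated stay available for further microtasks instead of being discarded.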
63. Page 63
1. Neuro-symbolic AI – combination of knowledge graphs and machine learning
2. Extend the concept of KGs (e.g. with graphlets)
3. Integration of Human and Machine Intelligence (e.g. with crowdsourcing)
The grand KG challenges
64. Page 64
The Team
Prof. (Univ. S. Bolivar)
Dr. Maria Esther Vidal
Software Development
Dr. Kemele Endris
Collaborators TIB Scientific Data Mgmt.
Group Leaders PostDocs
Project Management
Doctoral Researchers
Dr. Markus Stocker Dr. Gábor Kismihók Dr. Javad Chamanara Dr. Jennifer D’Souza
Allard Oelen Yaser Jaradeh Manuel Prinz
Alex Garatzogianni
Collaborators InfAI Leipzig / AKSW
Dr. Michael Martin Natanael Arndt
Dr. Lars Vogt
Vitalis Wiens Kheir Eddine Farfar
Muhammad Haris
Administration
Katja Bartel Simone Matern