8/11/21 Heiko Paulheim 1
Using Knowledge Graphs in Data Science –
From Symbolic to Latent Representations
and a few Steps Back
Heiko Paulheim
University of Mannheim
Brief Introduction
[Figure: timeline 2006–2017 (2006, 2008, 2011, 2013, 2014, 2017) spanning pre-PhD years, PhD years, postdoc years, assistant professor, and full professor, with the projects SDType, rdf2vec, ReNewRS, Kare§KoKI, and MELT]
Knowledge Graphs: At a Glance
• Graph-shaped knowledge representation
– nodes: entities
– edges: relations
[Figure: example knowledge graph with the entities Heiko Paulheim, DWS Group, University of Mannheim, Mannheim, Baden-Württemberg, and Germany, connected by the relations employer, affiliation, residence, state, and part of]
Knowledge Graphs in Organizations
• Knowledge Graphs are used…
• …in companies and organizations
– collect, organize, and integrate knowledge
– link isolated information sources
– make information searchable and findable
Masuch et al., 2016
Public Knowledge Graphs
• Knowledge Graphs are used…
• …as (free), public resources
– collect common knowledge
– general purpose, not task specific
– make it easy to build knowledge-intensive applications
Usage of Public Knowledge Graphs
OK, Google, when will the final season of Money Heist be on Netflix?
The fifth season of Money Heist will be released on September 3rd.
[Figure: graph excerpt in which the series is linked via has part to two parts, each with a release date (2020-04-03 and 2021-09-03)]
OK, Google, when will the final season of Money Heist be on Netflix?
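The question can be answered by a two-step traversal: follow has part from the series to its parts, then take the latest release date. The following Python sketch shows this on a toy triple list; the identifiers `Money_Heist`, `Part_4`, and `Part_5`, the assignment of the two dates to them, and reading "final" as "latest release date" are illustrative assumptions, not a real knowledge graph API.

```python
from datetime import date

# Toy knowledge graph as (subject, predicate, object) triples.
# Identifiers and the part/date assignment are hypothetical.
TRIPLES = [
    ("Money_Heist", "has part", "Part_4"),
    ("Money_Heist", "has part", "Part_5"),
    ("Part_4", "release date", date(2020, 4, 3)),
    ("Part_5", "release date", date(2021, 9, 3)),
]


def objects(subject, predicate, triples=TRIPLES):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def final_release_date(series):
    """Follow 'has part' to the parts, then return the latest 'release date'."""
    dates = [d for part in objects(series, "has part")
             for d in objects(part, "release date")]
    return max(dates) if dates else None


print(final_release_date("Money_Heist"))  # 2021-09-03
```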
[Figure: graph excerpt with has part and release date edges (2020-04-03, 2021-09-03), extended by creator and cast edges that link the series to people and to other series]
Are there any other series by the same creator?
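The follow-up question needs a different path: from the series to its creator, and back along creator edges to other series. A minimal Python sketch on hypothetical triples; `Series_X`, `Series_Y`, `Creator_A`, and `Creator_B` are placeholders, not data from the slide.

```python
# Toy triples; all names are hypothetical placeholders.
TRIPLES = [
    ("Money_Heist", "creator", "Creator_A"),
    ("Series_X", "creator", "Creator_A"),
    ("Series_Y", "creator", "Creator_B"),
]


def objects(subject, predicate, triples=TRIPLES):
    """Entities reachable from `subject` via an outgoing `predicate` edge."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def subjects(predicate, obj, triples=TRIPLES):
    """Entities that point to `obj` via `predicate` (incoming edges)."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def other_series_by_same_creator(series):
    """Series -> creator -> series, excluding the series we started from."""
    result = set()
    for creator in objects(series, "creator"):
        result |= subjects("creator", creator)
    result.discard(series)
    return result


print(sorted(other_series_by_same_creator("Money_Heist")))  # ['Series_X']
```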
History: Cyc
• The beginning
– Encyclopedic collection of knowledge
– Started by Douglas Lenat in 1984
– Estimation: 350 person years and 250,000 rules should do the job of collecting the essence of the world’s knowledge
• The present (as of June 2017)
– ~1,000 person years, $120M total development cost
– 21M axioms and rules
– Declared “ready to use” in 2017
History: Freebase
• The 2000s
– Freebase: collaborative editing
– Schema not fixed
• Present
– Acquired by Google in 2010
– Powered the first version of Google’s Knowledge Graph
– Shut down in 2016
– Partly lives on in Wikidata (see in a minute)
History: Wikidata
• The 2010s
– Wikidata: launched 2012
– Goal: centralize data from the different Wikipedia language editions
– Collaborative
– Imports other datasets
• Present
– One of the largest public knowledge graphs
– Includes rich provenance
History: DBpedia & co.
• The 2000s
– DBpedia: launched 2007
– YAGO: launched 2008
– Extraction from Wikipedia using mappings & heuristics
• Present
– Two of the most used knowledge graphs
– ...with Wikidata catching up
History: NELL
• The 2010s
– NELL: Never-Ending Language Learner
– Input: ontology, seed examples, text corpus
– Output: facts, text patterns
– Large degree of automation, occasional human feedback
• Until 2018
– Continuously ran for ~8 years
– New release every few days
http://rtw.ml.cmu.edu/rtw/overview
Knowledge Graph Creation
• Sources for generating knowledge graphs:
– Manual curation (also: crowdsourcing)
• Cyc, Freebase, Wikidata, ...
– (Semi-)structured knowledge (Wikis, databases, …)
• DBpedia, YAGO, BabelNet, ...
– Unstructured text or web page collections
• NELL, DeepDive, ReVerb, …
Knowledge Graph Creation – Ongoing Projects
• WebIsA & WebIsALOD
– 400M hypernymy relations extracted from a web crawl
Seitner et al. (2016): A Large DataBase of Hypernymy Relations Extracted from the Web
• DBkWik
– Harvesting data from 400k Wikis
Paulheim & Hertling (2018): DBkWik: A consolidated knowledge graph from thousands of Wikis
• CaLiGraph
– Learning analogies, e.g., from lists
Heist (2018): Towards Knowledge Graph Construction from Entity Co-occurrence
Use Cases for Knowledge Graphs
• Background Knowledge
– e.g., company data (address, CEO, branch, …)
→ SAP CRM (BSc thesis 2019)
– e.g., geographic regions (demographics)
→ for example, sales data prediction
– data interpretation (e.g., Excel tables, business models)
→ PhD thesis under supervision
• Data Integration
– unified view of different data sources
– relating business entities in different systems
– cross-source data visualization and analytics
Knowledge Graphs in Data Science
• Typical cases:
– predictive modeling, information retrieval, recommendation, …
• For all of those, there are sophisticated implementations
– but…
Wanted: A Bridge between Both Worlds
• Data Science tools for prediction etc.
– Python, Weka, R, RapidMiner, …
– Algorithms that work on vectors, not graphs
• Bridges built over the past years:
– FeGeLOD (Weka, 2012), RapidMiner LOD Extension (2015), Python KG Extension (2021)
• Transformation strategies (aka propositionalization)
– e.g., types: type_horror_movie=true
– e.g., data values: year=2011
– e.g., aggregates: nominations=7
• Observations with simple propositionalization strategies
– Even simple features (e.g., add all numbers and types) can help on many problems
– More sophisticated features often bring additional improvements
• Combinations of relations and individuals
– e.g., movies directed by Steven Spielberg
• Combinations of relations and types
– e.g., movies directed by Oscar-winning directors
• …
– But
• The search space is enormous!
• Generate first, filter later does not scale well
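The transformation strategies above, plus a relation-individual combination, can be illustrated on a single entity. A minimal Python sketch, assuming a hypothetical movie described by a handful of triples; the feature naming scheme (`type_…`, `count_…`, `<relation>_<individual>`) is an assumption and not the output format of FeGeLOD, the RapidMiner LOD Extension, or the Python KG Extension.

```python
from collections import Counter

# Hypothetical triples about one movie; names are illustrative only.
TRIPLES = [
    ("Movie_1", "type", "HorrorMovie"),
    ("Movie_1", "year", 2011),
    ("Movie_1", "nomination", "Award_A"),
    ("Movie_1", "nomination", "Award_B"),
    ("Movie_1", "director", "Steven_Spielberg"),
]


def propositionalize(entity, triples=TRIPLES):
    """Flatten the outgoing triples of `entity` into a feature dictionary."""
    features = {}
    relation_counts = Counter()
    for s, p, o in triples:
        if s != entity:
            continue
        if p == "type":
            # types -> boolean features, e.g. type_HorrorMovie=True
            features[f"type_{o}"] = True
        elif isinstance(o, (int, float)):
            # data values -> copied as-is, e.g. year=2011
            features[p] = o
        else:
            # relation + individual, e.g. director_Steven_Spielberg=True
            features[f"{p}_{o}"] = True
            relation_counts[p] += 1
    # aggregates -> number of related individuals per relation
    for p, n in relation_counts.items():
        features[f"count_{p}"] = n
    return features


print(propositionalize("Movie_1"))
```

Even this single toy entity yields seven features. Because relation-individual features grow with the number of distinct individuals in the graph, this is where the search space explodes.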
• Excursion: word embeddings
– word2vec proposed by Mikolov et al. (2013)
– predict a word from its context or vice versa
• Idea: similar words appear in similar contexts, like
– Jobs, Wozniak, and Wayne founded Apple Computer Company in April 1976
– Google was officially founded as a company in September 1998
– usually trained on large text corpora
• The weights of the projection layer are the embedding vectors
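To make "predict a word from its context or vice versa" concrete: the skip-gram variant turns each sentence into (target, context) training pairs within a fixed window. A minimal Python sketch; the window size and whitespace tokenization are simplifying assumptions, and word2vec itself additionally subsamples frequent words, varies the effective window, and trains with negative sampling or a hierarchical softmax.

```python
def skipgram_pairs(tokens, window=2):
    """All (target, context) pairs where context is within `window` positions of target."""
    pairs = []
    for i, target in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs


sentence = "Jobs Wozniak and Wayne founded Apple Computer Company".split()
for target, context in skipgram_pairs(sentence):
    if target == "Apple":
        print(target, "->", context)
```

Words that occur in many of the same pairs, such as company names next to "founded", tend to receive similar vectors.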
From Word Embeddings to Graph Embeddings
• Basic idea:
– extract random walks from an RDF graph:
Mulholland Dr. → director → David Lynch → nationality → US
– feed walks into word2vec algorithm
• Order of magnitude (e.g., DBpedia)
– ~6M entities (“words”)
– start up to 500 random walks per entity, length up to 8
→ corpus of >20B tokens
• Result:
– node embeddings
– most often outperform other propositionalization techniques
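The walk extraction step can be sketched in a few lines of Python. The toy graph below contains only the example path; the adjacency-dict representation, the function name, and dropping duplicate walks (one reading of "up to 500") are assumptions. Depth counts hops, so each walk alternates entity and predicate tokens.

```python
import random

# Toy RDF graph as adjacency lists: subject -> [(predicate, object), ...]
GRAPH = {
    "Mulholland_Dr.": [("director", "David_Lynch")],
    "David_Lynch": [("nationality", "US")],
}


def random_walks(graph, start, n_walks=500, depth=8, seed=None):
    """Up to `n_walks` distinct walks of at most `depth` hops starting at `start`."""
    rng = random.Random(seed)
    walks = {}  # dict keys keep insertion order and drop duplicate walks
    for _ in range(n_walks):
        walk, node = [start], start
        for _ in range(depth):
            edges = graph.get(node)
            if not edges:
                break  # dead end: stop this walk early
            predicate, node = rng.choice(edges)
            walk += [predicate, node]
        walks[tuple(walk)] = None
    return [list(w) for w in walks]


corpus = [w for entity in GRAPH for w in random_walks(GRAPH, entity, seed=42)]
for walk in corpus:
    print(" → ".join(walk))
```

Each walk plays the role of a sentence; a list like `corpus` can then be passed as the `sentences` argument of a word2vec implementation such as gensim's `Word2Vec` to obtain one vector per entity (and per predicate).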
A First Glance at RDF2vec Embeddings
• Observation: similar entities are projected close to each other
Random vs. non-random
• Maybe random walks are not such a good idea
– They may give too much weight to less-known entities and facts
• Strategies:
– Prefer edges with more frequent predicates
– Prefer nodes with higher indegree
– Prefer nodes with higher PageRank
– …
– They may cover less-known entities and facts too little
• Strategies:
– The opposite of all of the above strategies
• External signals (e.g., human notions of importance) generally work better than graph-internal signals
Cochez et al. (2017): Biased Graph Walks for RDF Graph Embeddings
Al Taweel and Paulheim (2020): Towards Exploiting Implicit Human Feedback for Improving RDF2vec Embeddings
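The first strategy, preferring edges with more frequent predicates, amounts to replacing the uniform choice of the next edge by a weighted one. A minimal Python sketch of that single strategy only (Cochez et al. examine many more weighting schemes); the toy graph, the function names, and using inverse frequency for the "opposite" strategy are assumptions.

```python
import random
from collections import Counter


def predicate_frequencies(graph):
    """How often each predicate occurs over all edges of the graph."""
    return Counter(p for edges in graph.values() for p, _ in edges)


def biased_walk(graph, start, freq, depth=4, prefer_frequent=True, rng=random):
    """One walk whose next edge is drawn with probability proportional to the
    predicate frequency, or to its inverse when prefer_frequent is False."""
    walk, node = [start], start
    for _ in range(depth):
        edges = graph.get(node)
        if not edges:
            break
        weights = [freq[p] if prefer_frequent else 1.0 / freq[p] for p, _ in edges]
        predicate, node = rng.choices(edges, weights=weights, k=1)[0]
        walk += [predicate, node]
    return walk


GRAPH = {
    "A": [("common", "B"), ("rare", "C")],
    "B": [("common", "D")],
    "D": [("common", "A")],
}
FREQ = predicate_frequencies(GRAPH)  # common: 3, rare: 1

rng = random.Random(0)
first_steps = Counter(biased_walk(GRAPH, "A", FREQ, depth=1, rng=rng)[1] for _ in range(1000))
print(first_steps)
```

With three "common" edges and one "rare" edge in the graph, the first step from `A` follows "common" about three times as often; setting `prefer_frequent=False` reverses the preference, in the spirit of the "opposite" strategies above.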
Local Embeddings
• Recap: order of magnitude (e.g., DBpedia)
– ~6M entities (“words”)
– start up to 500 random walks per entity, length up to 8
→ corpus of >20B tokens
– “Train once, reuse often”
• In some cases, only a small subset (of 6M) is of interest
– RDF2vec light: “train when needed”
– Runtime: minutes instead of days
Portisch et al. (2020): RDF2Vec Light – A Lightweight Approach for Knowledge Graph Embeddings
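The "train when needed" idea can be sketched by generating walks only for the entities of interest instead of for all entities, so that the corpus, and with it the training time, grows with the size of the subset. A simplified Python sketch: it only starts forward walks at the entities of interest, whereas the published RDF2vec Light walk strategy may differ; the function name and the synthetic ring graph are assumptions.

```python
import random


def walks_for_subset(graph, entities_of_interest, n_walks=10, depth=4, seed=0):
    """Walks starting only at the given entities, instead of at every graph node."""
    rng = random.Random(seed)
    corpus = []
    for entity in entities_of_interest:
        for _ in range(n_walks):
            walk, node = [entity], entity
            for _ in range(depth):
                edges = graph.get(node)
                if not edges:
                    break
                predicate, node = rng.choice(edges)
                walk += [predicate, node]
            corpus.append(walk)
    return corpus


# Synthetic graph: a ring of 1,000 entities e0 -> e1 -> ... -> e999 -> e0
GRAPH = {f"e{i}": [("next", f"e{(i + 1) % 1000}")] for i in range(1000)}

full = walks_for_subset(GRAPH, GRAPH)            # every entity as a start node
light = walks_for_subset(GRAPH, ["e0", "e500"])  # only two entities of interest
print(len(full), len(light))  # 10000 20
```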
RDF2vec: Example Applications
• Data Model Matching with WebIsA and RDF2vec
Portisch et al. (2019): Evaluating ontology matchers on real-world financial services data models.
• Entity disambiguation: linking texts to a knowledge graph
Türker et al. (2019): Knowledge-Based Short Text Categorization Using Entity and Category Embedding
• Finding related research papers on COVID-19
Steenwinckel et al. (2020): Facilitating COVID-19 Meta-analysis Through a Literature Knowledge Graph
• Table search by keyword
Zhang and Balog (2018): Ad Hoc Table Retrieval using Semantic Similarity.
• Predicting biological interactions
Sousa et al. (2021): Supervised Semantic Similarity.
• Zero-Shot Image Classification
Hascoet et al. (2017): Semantic Web and Zero-Shot Learning of Large Scale Visual Classes.
Embeddings for Link Prediction
• RDF2vec example
– similar instances form clusters, and the direction of a relation is roughly stable
– link prediction by analogy reasoning (Japan – Tokyo ≈ China – Beijing)
Ristoski & Paulheim: RDF2vec: RDF Graph Embeddings for Data Mining. ISWC, 2016
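Analogy reasoning of this kind is plain vector arithmetic: Japan − Tokyo ≈ China − Beijing means that Tokyo − Japan + China should lie close to Beijing. A NumPy sketch with hand-made 2-dimensional vectors, constructed (as an assumption) so that the "capital" offset is exactly constant; real RDF2vec vectors are higher-dimensional and the offset is only approximately stable. Euclidean distance is used for the nearest-neighbor search; cosine similarity is a common alternative.

```python
import numpy as np

# Hand-made 2-d vectors (hypothetical), built so that capital - country = (0, 1)
EMB = {
    "Japan": np.array([1.0, 0.0]),
    "Tokyo": np.array([1.0, 1.0]),
    "China": np.array([3.0, 0.2]),
    "Beijing": np.array([3.0, 1.2]),
    "Germany": np.array([5.0, -0.1]),
    "Berlin": np.array([5.0, 0.9]),
}


def analogy(a, b, c, emb=EMB):
    """Solve a : b = c : ? by returning the entity closest to b - a + c,
    excluding the three query entities themselves."""
    target = emb[b] - emb[a] + emb[c]
    candidates = [k for k in emb if k not in (a, b, c)]
    return min(candidates, key=lambda k: np.linalg.norm(emb[k] - target))


print(analogy("Japan", "Tokyo", "China"))  # Beijing
```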
• In RDF2vec, relation preservation is a by-product
• TransE (and its descendants): direct modeling
– Formulates RDF embedding as an optimization problem
– Find a mapping of entities and relations to ℝⁿ so that
• across all triples ⟨s,p,o⟩, Σ ||s + p − o|| is minimized
• existing triples obtain a smaller error than non-existing ones
Bordes et al. (2013): Translating Embeddings for Modeling Multi-relational Data. NIPS 2013.
Fan et al. (2016): Learning Embedding Representations for Knowledge Inference on Imperfect and Incomplete Repositories. WI 2016
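The objective can be made concrete: TransE scores a triple by the distance ||s + p − o|| and, following Bordes et al. (2013), trains with a margin-based ranking loss that pushes corrupted triples (head or tail replaced by a random entity) to be at least a margin further away than existing ones. A NumPy sketch of the score, the corruption step, and the loss; the toy vectors, the margin of 1.0, and the L2 norm are assumptions, and neither the training loop nor the entity normalization of the original is shown. Corrupted triples are not filtered, so one may accidentally coincide with an existing triple.

```python
import numpy as np


def transe_distance(s, p, o):
    """TransE dissimilarity ||s + p - o|| (L2 norm); small for plausible triples."""
    return np.linalg.norm(s + p - o, axis=-1)


def corrupt(triples, n_entities, rng):
    """Negative sampling: replace the head or the tail of each (s, p, o) index
    triple by a random entity; the relation is kept."""
    corrupted = triples.copy()
    replace_head = rng.random(len(triples)) < 0.5
    random_entities = rng.integers(0, n_entities, size=len(triples))
    corrupted[replace_head, 0] = random_entities[replace_head]
    corrupted[~replace_head, 2] = random_entities[~replace_head]
    return corrupted


def margin_loss(d_pos, d_neg, margin=1.0):
    """Margin ranking loss: sum of max(0, margin + d(existing) - d(corrupted))."""
    return float(np.maximum(0.0, margin + d_pos - d_neg).sum())


rng = np.random.default_rng(0)
E = rng.normal(size=(5, 3))   # 5 entity vectors in R^3
R = rng.normal(size=(1, 3))   # 1 relation vector
E[1] = E[0] + R[0]            # make <e0, r0, e1> a perfectly translated triple

pos = np.array([[0, 0, 1]])
neg = corrupt(pos, len(E), rng)
d_pos = transe_distance(E[pos[:, 0]], R[pos[:, 1]], E[pos[:, 2]])
d_neg = transe_distance(E[neg[:, 0]], R[neg[:, 1]], E[neg[:, 2]])
print(d_pos, d_neg, margin_loss(d_pos, d_neg))
```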
Link Prediction vs. Node Embedding
• Hypothesis:
– Embeddings for link prediction also cluster similar entities
– Node embeddings can also be used for link prediction
Portisch et al. (under review): Knowledge Graph Embedding for Data Mining vs. Knowledge Graph Embedding for Link Prediction - Two Sides of the Same Coin?
Similarity vs. Relatedness
• The 10 closest entities to Angela Merkel in different vector spaces
Portisch et al. (under review): Knowledge Graph Embedding for Data Mining vs. Knowledge Graph Embedding for Link Prediction - Two Sides of the Same Coin?
• (s-)RDF2vec allows an explicit trade-off with different walk strategies
[Figure: knowledge graph excerpt with the entities Adler Mannheim, SAP Arena, Reiss-Engelhorn-Museum, Mannheim, Baden-Württemberg, and Germany, connected by the relations city, stadium, location, federal state, and country]
Walk generation
“Classic” RDF2vec walks:
Adler_Mannheim → city → Mannheim → country → Germany
Adler_Mannheim → stadium → SAP_Arena → location → Mannheim
SAP_Arena → location → Mannheim → country → Germany
...
s-RDF2vec walks:
city → Mannheim → country
stadium → SAP_Arena → location
location → Mannheim → country
...
[Figure: vectors from RDF2vec “classic”, RDF2vec “edge”, and RDF2vec “union walks” are concatenated and reduced with a global PCA, or, for a task-specific subset of test cases, with a (weighted, w1 and w2) local PCA]
Portisch et al. (under review): s-RDF2vec: Injecting Knowledge Graph Structure Into RDF2vec Entity Embeddings.
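Reading the two walk lists side by side suggests a simple derivation: each s-RDF2vec walk keeps an interior entity of a classic walk together with its incoming and outgoing predicate. This Python sketch reproduces the example walks from the slide that way; it illustrates the example and is not necessarily how s-RDF2vec generates its walks. The function name is hypothetical.

```python
def s_walks(classic_walks):
    """For each interior entity of a classic walk (entity, predicate, entity, ...),
    keep only that entity and its incoming and outgoing predicate."""
    result = []
    for walk in classic_walks:
        # entities sit at even positions; skip the first and the last entity,
        # which lack a predicate on one side
        for i in range(2, len(walk) - 1, 2):
            result.append(walk[i - 1:i + 2])
    return result


CLASSIC = [
    ["Adler_Mannheim", "city", "Mannheim", "country", "Germany"],
    ["Adler_Mannheim", "stadium", "SAP_Arena", "location", "Mannheim"],
    ["SAP_Arena", "location", "Mannheim", "country", "Germany"],
]

for walk in s_walks(CLASSIC):
    print(" → ".join(walk))
```

Trained on such walks, an entity's context consists of predicates only, so entities tend to end up close when they play the same structural role (similarity), while classic walks also put neighboring entities into the context (relatedness).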
• s-RDF2vec
– using different walk strategies
– combining different vector spaces (weighted combinations are possible)
• 10 closest neighbors to Mannheim:
Portisch et al. (under review): s-RDF2vec: Injecting Knowledge Graph Structure Into RDF2vec Entity Embeddings.
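Combining different vector spaces can be sketched as a weighted concatenation followed by PCA. A NumPy sketch under stated assumptions: the two input matrices are random stand-ins for trained RDF2vec "classic" and "edge" vectors, the weights w1 and w2 are applied by scaling each block before concatenation (one plausible reading of the figure), and PCA is computed via SVD. Cosine similarity is used to list nearest neighbors, as in the neighbor comparisons above; entity names are hypothetical.

```python
import numpy as np


def combine_spaces(space_a, space_b, w1=1.0, w2=1.0, n_components=2):
    """Weighted concatenation of two embedding matrices whose rows describe the
    same entities, followed by PCA (via SVD) down to `n_components` dimensions."""
    combined = np.hstack([w1 * space_a, w2 * space_b])
    centered = combined - combined.mean(axis=0)
    # rows of vt are the principal axes, ordered by explained variance
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def nearest_neighbors(vectors, names, query, k=10):
    """The k entities with the highest cosine similarity to `query`."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = unit @ unit[names.index(query)]
    ranked = [names[i] for i in np.argsort(-sims) if names[i] != query]
    return ranked[:k]


rng = np.random.default_rng(0)
names = [f"entity_{i}" for i in range(6)]   # hypothetical entities
classic = rng.normal(size=(6, 4))           # stand-in for RDF2vec "classic"
edge = rng.normal(size=(6, 4))              # stand-in for RDF2vec "edge"
reduced = combine_spaces(classic, edge, w1=0.7, w2=0.3, n_components=3)
print(nearest_neighbors(reduced, names, "entity_0", k=3))
```

Passing only the rows of a task-specific subset to `combine_spaces` would correspond to the local PCA variant in the figure; passing all rows would correspond to the global PCA.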
• Recap word embeddings:
– Jobs, Wozniak, and Wayne founded Apple Computer Company in April 1976
– Google was officially founded as a company in September 1998
• Graph walks:
– Hamburg → country → Germany → leader → Angela_Merkel
– Germany → leader → Angela_Merkel → birthPlace → Hamburg
– Hamburg → leader → Peter_Tschentscher → residence → Hamburg
[Figure: graph excerpt with the entities Germany, Angela_Merkel, Hamburg, and Peter_Tschentscher, connected by the relations country, leader, birthPlace, and residence]
• Surrounding entities indicate relatedness
– Hamburg → country → Germany → leader → Angela_Merkel
– Germany → leader → Angela_Merkel → birthPlace → Hamburg
• Same entities in similar positions indicate similarity
– Germany → leader → Angela_Merkel → birthPlace → Hamburg
– Hamburg → leader → Peter_Tschentscher → residence → Hamburg
• Someone is a leader vs. something has a leader
• Solution approach: use an embedding approach that respects positions
– CWINDOW / Structured Skip-ngram
Portisch and Paulheim (2021): Putting RDF2vec in Order.
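The effect of respecting positions shows up in the training contexts. Without positions, Germany and Angela_Merkel share the context token leader; once the relative offset is kept, Germany sees leader at +1 (it has a leader) while Angela_Merkel and Peter_Tschentscher both see it at −1 (they are leaders). A minimal Python sketch that only builds these contexts; the actual CWINDOW and structured skip-gram models additionally learn separate weights per position, which is not shown, and the function names are hypothetical.

```python
def context_pairs(walk, window=2):
    """(target, context, offset) for every context token within `window` positions."""
    triples = []
    for i, target in enumerate(walk):
        for j in range(max(0, i - window), min(len(walk), i + window + 1)):
            if j != i:
                triples.append((target, walk[j], j - i))
    return triples


def contexts(walks, entity, positional, window=2):
    """Contexts an entity is trained against: bare tokens, or (token, offset) pairs."""
    result = set()
    for walk in walks:
        for target, ctx, offset in context_pairs(walk, window):
            if target == entity:
                result.add((ctx, offset) if positional else ctx)
    return result


WALKS = [
    ["Germany", "leader", "Angela_Merkel", "birthPlace", "Hamburg"],
    ["Hamburg", "leader", "Peter_Tschentscher", "residence", "Hamburg"],
]

for positional in (False, True):
    germany = contexts(WALKS, "Germany", positional)
    merkel = contexts(WALKS, "Angela_Merkel", positional)
    tschentscher = contexts(WALKS, "Peter_Tschentscher", positional)
    print(positional, germany & merkel, merkel & tschentscher)
```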
• Why bother?
– Use case: table interpretation (a special case of entity disambiguation)
[Figure: table interpretation example contrasting related and similar entities]
Back to Interpretability
• Hot topic: Explainable AI
– Knowledge Graphs are a favorable ingredient
– Human/machine interpretable knowledge → explainable systems
• However:
– Embeddings replace interpretable axioms with numeric vectors over non-interpretable dimensions
– Where did the semantics go?
Paulheim (2018): Make Embeddings Semantic Again!
Towards Semantic Vector Space Embeddings
[Figure: embedding space with regions labeled cartoon and superhero]
Paulheim (2018): Make Embeddings Semantic Again!
• Approach 1: learn interpretation function
• Each dimension of the embedding model is a target for a separate learning problem
• Learn a function to explain the dimension
• E.g.: y ≈ −|∃character.Superhero|
• Just an approximation used for explanations and justifications
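Approach 1 treats each dimension as a regression target. The slide does not fix the learner; as an assumption, this Python sketch uses ordinary least squares over hand-made interpretable features and reports the most influential feature. The feature names, the data, and reading |∃character.Superhero| as the number of superhero characters are illustrative assumptions.

```python
import numpy as np


def explain_dimension(features, feature_names, dimension_values):
    """Fit dimension ≈ X·w + b by least squares and return the feature with the
    largest absolute weight, together with that weight."""
    X = np.column_stack([features, np.ones(len(features))])  # intercept column
    coef, *_ = np.linalg.lstsq(X, dimension_values, rcond=None)
    weights = coef[:-1]  # drop the intercept
    best = int(np.argmax(np.abs(weights)))
    return feature_names[best], float(weights[best])


# Hypothetical interpretable features per entity (rows = entities)
FEATURE_NAMES = ["num_superhero_characters", "is_cartoon"]
FEATURES = np.array([[0, 1], [1, 0], [2, 1], [3, 0], [0, 0]], dtype=float)
# Toy embedding dimension, constructed to behave like -|∃character.Superhero|
DIMENSION = -FEATURES[:, 0]

print(explain_dimension(FEATURES, FEATURE_NAMES, DIMENSION))
```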
• Approach 2: learn inherently interpretable embeddings
• Step 1: learn typical patterns that exist in a knowledge graph
– e.g., graph pattern learning
– e.g., Horn clauses
• Step 2a: use those patterns as embedding dimensions
– probably not low-dimensional
• Step 2b: compact the space
– e.g., use dimensions for mutually exclusive patterns
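Steps 2a and 2b can be illustrated with binary pattern features. A minimal NumPy sketch: the patterns are hand-written stand-ins for mined graph patterns or Horn clauses, and encoding two mutually exclusive patterns as +1 and −1 on a single dimension is one possible way to compact the space, assumed here for illustration.

```python
import numpy as np

# Hypothetical entities and hand-written stand-ins for learned patterns
ENTITIES = [
    {"type": "Cartoon", "has_superhero": True},
    {"type": "LiveAction", "has_superhero": True},
    {"type": "Cartoon", "has_superhero": False},
]
PATTERNS = [
    lambda e: e["type"] == "Cartoon",
    lambda e: e["type"] == "LiveAction",
    lambda e: e["has_superhero"],
]


def pattern_matrix(entities, patterns):
    """Step 2a: one binary dimension per pattern (1 if the entity matches it)."""
    return np.array([[int(p(e)) for p in patterns] for e in entities])


def mutually_exclusive(matrix, i, j):
    """True if no entity matches both pattern i and pattern j."""
    return not np.any(matrix[:, i] & matrix[:, j])


def merge(matrix, i, j):
    """Step 2b: put two mutually exclusive patterns on one dimension (+1 / -1 / 0)."""
    merged = matrix[:, i] - matrix[:, j]
    rest = np.delete(matrix, [i, j], axis=1)
    return np.column_stack([merged, rest])


M = pattern_matrix(ENTITIES, PATTERNS)
if mutually_exclusive(M, 0, 1):
    M = merge(M, 0, 1)
print(M)
```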
• Different angle: learn interpretation for similarity function
[Figure: similarity explained by interpretable components, e.g., ~similar type, ~same country, ~connected to same entity]
Summary
• Knowledge Graphs are a versatile ingredient for AI
– Integrated view on data
– Large-scale free source of background knowledge
• Knowledge Graph Embeddings
– Effective processing of large-scale knowledge sources
– Encoding of similarity and/or relatedness
• RDF2vec: explicit trade-off is possible!
– Additional insights that are not explicit in the graph
• aka latent semantics
More on RDF2vec
• Collection of
– Implementations
– Pre-trained models
– >40 use cases in various domains
Thank you!
http://www.heikopaulheim.com
@heikopaulheim
Fast Approximate A-box Consistency Checking using Machine LearningHeiko Paulheim
 
Serving DBpedia with DOLCE - More Than Just Adding a Cherry on Top
Serving DBpedia with DOLCE - More Than Just Adding a Cherry on TopServing DBpedia with DOLCE - More Than Just Adding a Cherry on Top
Serving DBpedia with DOLCE - More Than Just Adding a Cherry on TopHeiko Paulheim
 
Combining Ontology Matchers via Anomaly Detection
Combining Ontology Matchers via Anomaly DetectionCombining Ontology Matchers via Anomaly Detection
Combining Ontology Matchers via Anomaly DetectionHeiko Paulheim
 
Gathering Alternative Surface Forms for DBpedia Entities
Gathering Alternative Surface Forms for DBpedia EntitiesGathering Alternative Surface Forms for DBpedia Entities
Gathering Alternative Surface Forms for DBpedia EntitiesHeiko Paulheim
 
What the Adoption of schema.org Tells about Linked Open Data
What the Adoption of schema.org Tells about Linked Open DataWhat the Adoption of schema.org Tells about Linked Open Data
What the Adoption of schema.org Tells about Linked Open DataHeiko Paulheim
 
Linked Open Data enhanced Knowledge Discovery
Linked Open Data enhanced  Knowledge DiscoveryLinked Open Data enhanced  Knowledge Discovery
Linked Open Data enhanced Knowledge DiscoveryHeiko Paulheim
 
Mining the Web of Linked Data with RapidMiner
Mining the Web of Linked Data with RapidMinerMining the Web of Linked Data with RapidMiner
Mining the Web of Linked Data with RapidMinerHeiko Paulheim
 
Data Mining with Background Knowledge from the Web - Introducing the RapidMin...
Data Mining with Background Knowledge from the Web - Introducing the RapidMin...Data Mining with Background Knowledge from the Web - Introducing the RapidMin...
Data Mining with Background Knowledge from the Web - Introducing the RapidMin...Heiko Paulheim
 
Detecting Incorrect Numerical Data in DBpedia
Detecting Incorrect Numerical Data in DBpediaDetecting Incorrect Numerical Data in DBpedia
Detecting Incorrect Numerical Data in DBpediaHeiko Paulheim
 
Identifying Wrong Links between Datasets by Multi-dimensional Outlier Detection
Identifying Wrong Links between Datasets by Multi-dimensional Outlier DetectionIdentifying Wrong Links between Datasets by Multi-dimensional Outlier Detection
Identifying Wrong Links between Datasets by Multi-dimensional Outlier DetectionHeiko Paulheim
 
Type Inference on Noisy RDF Data
Type Inference on Noisy RDF DataType Inference on Noisy RDF Data
Type Inference on Noisy RDF DataHeiko Paulheim
 

Mehr von Heiko Paulheim (20)

Knowledge Graph Generation from Wikipedia in the Age of ChatGPT: Knowledge ...
Knowledge Graph Generation  from Wikipedia in the Age of ChatGPT:  Knowledge ...Knowledge Graph Generation  from Wikipedia in the Age of ChatGPT:  Knowledge ...
Knowledge Graph Generation from Wikipedia in the Age of ChatGPT: Knowledge ...
 
From Wikis to Knowledge Graphs
From Wikis to Knowledge GraphsFrom Wikis to Knowledge Graphs
From Wikis to Knowledge Graphs
 
Beyond DBpedia and YAGO – The New Kids on the Knowledge Graph Block
Beyond DBpedia and YAGO – The New Kids  on the Knowledge Graph BlockBeyond DBpedia and YAGO – The New Kids  on the Knowledge Graph Block
Beyond DBpedia and YAGO – The New Kids on the Knowledge Graph Block
 
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...
 
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge Graph
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge GraphFrom Wikipedia to Thousands of Wikis – The DBkWik Knowledge Graph
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge Graph
 
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...
 
Make Embeddings Semantic Again!
Make Embeddings Semantic Again!Make Embeddings Semantic Again!
Make Embeddings Semantic Again!
 
How much is a Triple?
How much is a Triple?How much is a Triple?
How much is a Triple?
 
Weakly Supervised Learning for Fake News Detection on Twitter
Weakly Supervised Learning for Fake News Detection on TwitterWeakly Supervised Learning for Fake News Detection on Twitter
Weakly Supervised Learning for Fake News Detection on Twitter
 
Fast Approximate A-box Consistency Checking using Machine Learning
Fast Approximate  A-box Consistency Checking using Machine LearningFast Approximate  A-box Consistency Checking using Machine Learning
Fast Approximate A-box Consistency Checking using Machine Learning
 
Serving DBpedia with DOLCE - More Than Just Adding a Cherry on Top
Serving DBpedia with DOLCE - More Than Just Adding a Cherry on TopServing DBpedia with DOLCE - More Than Just Adding a Cherry on Top
Serving DBpedia with DOLCE - More Than Just Adding a Cherry on Top
 
Combining Ontology Matchers via Anomaly Detection
Combining Ontology Matchers via Anomaly DetectionCombining Ontology Matchers via Anomaly Detection
Combining Ontology Matchers via Anomaly Detection
 
Gathering Alternative Surface Forms for DBpedia Entities
Gathering Alternative Surface Forms for DBpedia EntitiesGathering Alternative Surface Forms for DBpedia Entities
Gathering Alternative Surface Forms for DBpedia Entities
 
What the Adoption of schema.org Tells about Linked Open Data
What the Adoption of schema.org Tells about Linked Open DataWhat the Adoption of schema.org Tells about Linked Open Data
What the Adoption of schema.org Tells about Linked Open Data
 
Linked Open Data enhanced Knowledge Discovery
Linked Open Data enhanced  Knowledge DiscoveryLinked Open Data enhanced  Knowledge Discovery
Linked Open Data enhanced Knowledge Discovery
 
Mining the Web of Linked Data with RapidMiner
Mining the Web of Linked Data with RapidMinerMining the Web of Linked Data with RapidMiner
Mining the Web of Linked Data with RapidMiner
 
Data Mining with Background Knowledge from the Web - Introducing the RapidMin...
Data Mining with Background Knowledge from the Web - Introducing the RapidMin...Data Mining with Background Knowledge from the Web - Introducing the RapidMin...
Data Mining with Background Knowledge from the Web - Introducing the RapidMin...
 
Detecting Incorrect Numerical Data in DBpedia
Detecting Incorrect Numerical Data in DBpediaDetecting Incorrect Numerical Data in DBpedia
Detecting Incorrect Numerical Data in DBpedia
 
Identifying Wrong Links between Datasets by Multi-dimensional Outlier Detection
Identifying Wrong Links between Datasets by Multi-dimensional Outlier DetectionIdentifying Wrong Links between Datasets by Multi-dimensional Outlier Detection
Identifying Wrong Links between Datasets by Multi-dimensional Outlier Detection
 
Type Inference on Noisy RDF Data
Type Inference on Noisy RDF DataType Inference on Noisy RDF Data
Type Inference on Noisy RDF Data
 

Kürzlich hochgeladen

JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...Bert Jan Schrijver
 
Effectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryErrorEffectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryErrorTier1 app
 
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdfPros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdfkalichargn70th171
 
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdfSteve Caron
 
Zer0con 2024 final share short version.pdf
Zer0con 2024 final share short version.pdfZer0con 2024 final share short version.pdf
Zer0con 2024 final share short version.pdfmaor17
 
Key Steps in Agile Software Delivery Roadmap
Key Steps in Agile Software Delivery RoadmapKey Steps in Agile Software Delivery Roadmap
Key Steps in Agile Software Delivery RoadmapIshara Amarasekera
 
eSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration toolseSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration toolsosttopstonverter
 
Strategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsStrategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsJean Silva
 
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...OnePlan Solutions
 
Understanding Plagiarism: Causes, Consequences and Prevention.pptx
Understanding Plagiarism: Causes, Consequences and Prevention.pptxUnderstanding Plagiarism: Causes, Consequences and Prevention.pptx
Understanding Plagiarism: Causes, Consequences and Prevention.pptxSasikiranMarri
 
Best Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh ITBest Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh ITmanoharjgpsolutions
 
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfEnhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfRTS corp
 
Osi security architecture in network.pptx
Osi security architecture in network.pptxOsi security architecture in network.pptx
Osi security architecture in network.pptxVinzoCenzo
 
Advantages of Cargo Cloud Solutions.pptx
Advantages of Cargo Cloud Solutions.pptxAdvantages of Cargo Cloud Solutions.pptx
Advantages of Cargo Cloud Solutions.pptxRTS corp
 
SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?Alexandre Beguel
 
2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shardsChristopher Curtin
 
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics
 
Introduction to Firebase Workshop Slides
Introduction to Firebase Workshop SlidesIntroduction to Firebase Workshop Slides
Introduction to Firebase Workshop Slidesvaideheekore1
 
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingOpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingShane Coughlan
 
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jGraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jNeo4j
 

Kürzlich hochgeladen (20)

JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
 
Effectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryErrorEffectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryError
 
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdfPros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
Pros and Cons of Selenium In Automation Testing_ A Comprehensive Assessment.pdf
 
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
[ CNCF Q1 2024 ] Intro to Continuous Profiling and Grafana Pyroscope.pdf
 
Zer0con 2024 final share short version.pdf
Zer0con 2024 final share short version.pdfZer0con 2024 final share short version.pdf
Zer0con 2024 final share short version.pdf
 
Key Steps in Agile Software Delivery Roadmap
Key Steps in Agile Software Delivery RoadmapKey Steps in Agile Software Delivery Roadmap
Key Steps in Agile Software Delivery Roadmap
 
eSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration toolseSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration tools
 
Strategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsStrategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero results
 
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
Tech Tuesday Slides - Introduction to Project Management with OnePlan's Work ...
 
Understanding Plagiarism: Causes, Consequences and Prevention.pptx
Understanding Plagiarism: Causes, Consequences and Prevention.pptxUnderstanding Plagiarism: Causes, Consequences and Prevention.pptx
Understanding Plagiarism: Causes, Consequences and Prevention.pptx
 
Best Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh ITBest Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh IT
 
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfEnhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
 
Osi security architecture in network.pptx
Osi security architecture in network.pptxOsi security architecture in network.pptx
Osi security architecture in network.pptx
 
Advantages of Cargo Cloud Solutions.pptx
Advantages of Cargo Cloud Solutions.pptxAdvantages of Cargo Cloud Solutions.pptx
Advantages of Cargo Cloud Solutions.pptx
 
SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?
 
2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards
 
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
 
Introduction to Firebase Workshop Slides
Introduction to Firebase Workshop SlidesIntroduction to Firebase Workshop Slides
Introduction to Firebase Workshop Slides
 
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingOpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
 
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jGraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
 

Using Knowledge Graphs in Data Science - From Symbolic to Latent Representations (and a Few Steps Back)

  • 1. 8/11/21 Heiko Paulheim 1 Using Knowledge Graphs in Data Science – From Symbolic to Latent Representations and a few Steps Back. Heiko Paulheim, University of Mannheim
  • 2. Brief Introduction (Timeline figure, 2006 to 2017: pre-PhD years, PhD years, postdoc years, assistant professor, full professor; projects shown: SDType, rdf2vec, ReNewRS, Kare§KoKI, MELT.)
  • 3. Knowledge Graphs: At a Glance • Graph-shaped knowledge representation – nodes: entities – edges: relations (Figure: example graph with the nodes University of Mannheim, Mannheim, Baden-Württemberg, Germany, Heiko Paulheim, and DWS Group, connected by edges labeled employer, affiliation, part of, residence, and state.)
  • 4. Knowledge Graphs in Organizations • Knowledge graphs are used… • …in companies and organizations to – collect, organize, and integrate knowledge – link isolated information sources – make information searchable and findable (Masuch et al., 2016)
  • 5. Public Knowledge Graphs • Knowledge graphs are used… • …as (free) public resources – collect common knowledge – general purpose, not task-specific – make it easy to build knowledge-intensive applications
  • 6. Usage of Public Knowledge Graphs • “OK, Google, when will the final season of Money Heist be on Netflix?” • “The fifth season of Money Heist will be released on September 3rd.”
  • 7. Usage of Public Knowledge Graphs • “OK, Google, when will the final season of Money Heist be on Netflix?” (Figure: graph excerpt with “has part” and “release date” edges; release dates 2020-04-03 and 2021-09-03.)
  • 8. Usage of Public Knowledge Graphs • “Are there any other series by the same creator?” (Figure: the graph excerpt from the previous slide extended with “creator” and “cast” edges; release dates 2020-04-03 and 2021-09-03.)
  • 9. History: Cyc • The beginning – Encyclopedic collection of knowledge – Started by Douglas Lenat in 1984 – Estimate: 350 person-years and 250,000 rules should do the job of collecting the essence of the world’s knowledge • The present (as of June 2017) – ~1,000 person-years, $120M total development cost – 21M axioms and rules – Declared “ready to use” in 2017
  • 10. History: Freebase • The 2000s – Freebase: collaborative editing – Schema not fixed • Present – Acquired by Google in 2010 – Powered the first version of Google’s Knowledge Graph – Shut down in 2016 – Partly lives on in Wikidata (see in a minute)
  • 11. History: Wikidata • The 2010s – Wikidata: launched 2012 – Goal: centralize data from the different Wikipedia language editions – Collaborative – Imports other datasets • Present – One of the largest public knowledge graphs – Includes rich provenance
  • 12. History: DBpedia & co. • The late 2000s – DBpedia: launched 2007 – YAGO: launched 2008 – Extraction from Wikipedia using mappings & heuristics • Present – Two of the most used knowledge graphs – ...with Wikidata catching up
  • 13. History: NELL • The 2010s – NELL: Never-Ending Language Learner – Input: ontology, seed examples, text corpus – Output: facts, text patterns – Large degree of automation, occasional human feedback • Until 2018 – Ran continuously for ~8 years – New release every few days (http://rtw.ml.cmu.edu/rtw/overview)
  • 14. Knowledge Graph Creation • Sources for generating knowledge graphs: – Manual curation (also crowdsourcing) • Cyc, Freebase, Wikidata, ... – (Semi-)structured knowledge (wikis, databases, …) • DBpedia, YAGO, BabelNet, ... – Unstructured text or web page collections • NELL, DeepDive, ReVerb, …
  • 15. Knowledge Graph Creation – Ongoing Projects • WebIsA & WebIsALOD – 400M hypernymy relations extracted from a web crawl. Seitner et al. (2016): A Large DataBase of Hypernymy Relations Extracted from the Web
  • 16. Knowledge Graph Creation – Ongoing Projects • DBkWik – Harvesting data from 400k wikis. Paulheim & Hertling (2018): DBkWik: A consolidated knowledge graph from thousands of Wikis
  • 17. Knowledge Graph Creation – Ongoing Projects • CaLiGraph – Learning analogies, e.g., from lists. Heist (2018): Towards Knowledge Graph Construction from Entity Co-occurrence
  • 18. Use Cases for Knowledge Graphs • Background knowledge – e.g., company data (address, CEO, branch, …) → SAP CRM (BSc thesis 2019) – e.g., geographic regions (demographics) → for example, sales data prediction – data interpretation (e.g., Excel tables, business models) → ongoing PhD thesis • Data integration – unified view of different data sources – relating business entities in different systems – cross-source data visualization and analytics
  • 19. Knowledge Graphs in Data Science • Typical cases: – predictive modeling, information retrieval, recommendation, … • For all of those, there are sophisticated implementations – but…
  • 20. Wanted: A Bridge between Both Worlds
  • 21. Wanted: A Bridge between Both Worlds • Data science tools for prediction etc. – Python, Weka, R, RapidMiner, … – Algorithms that work on vectors, not graphs • Bridges built over the past years: – FeGeLOD (Weka, 2012), RapidMiner LOD Extension (2015), Python KG Extension (2021)
  • 22. Wanted: A Bridge between Both Worlds • Transformation strategies (aka propositionalization) – e.g., types: type_horror_movie=true – e.g., data values: year=2011 – e.g., aggregates: nominations=7
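The three transformation strategies on slide 22 (types, data values, aggregates) can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used by FeGeLOD or the RapidMiner LOD Extension: it assumes the graph is available as a list of (subject, predicate, object) triples, and the toy movies, predicates, and the helper `propositionalize` are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy graph as (subject, predicate, object) triples.
TRIPLES = [
    ("Movie_A", "type", "horror_movie"),
    ("Movie_A", "year", 2011),
    ("Movie_A", "nomination", "Award_1"),
    ("Movie_A", "nomination", "Award_2"),
    ("Movie_B", "type", "comedy"),
    ("Movie_B", "year", 2015),
]

def propositionalize(triples, entities):
    """Turn each entity's outgoing triples into one flat feature dict (one row per entity)."""
    rows = {e: {} for e in entities}
    counts = defaultdict(int)
    for s, p, o in triples:
        if s not in rows:
            continue  # only describe the entities we were asked about
        if p == "type":
            # Strategy 1: one boolean feature per type.
            rows[s][f"type_{o}"] = True
        elif isinstance(o, (int, float)):
            # Strategy 2: numeric literal values become numeric features.
            rows[s][p] = o
        else:
            # Strategy 3: aggregate relations to other entities, here by counting them.
            counts[(s, p)] += 1
    for (s, p), n in counts.items():
        rows[s][f"{p}_count"] = n
    return rows

features = propositionalize(TRIPLES, ["Movie_A", "Movie_B"])
print(features["Movie_A"])
```

Every new type or predicate adds a column, which hints at why the feature space grows so quickly once combinations of relations, individuals, and types are added (slide 23).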
  • 23. Wanted: A Bridge between Both Worlds • Observations with simple propositionalization strategies – Even simple features (e.g., adding all numbers and types) can help on many problems – More sophisticated features often bring additional improvements • Combinations of relations and individuals – e.g., movies directed by Steven Spielberg • Combinations of relations and types – e.g., movies directed by Oscar-winning directors • … – But • The search space is enormous! • Generating first and filtering later does not scale well
  • 24. Wanted: A Bridge between Both Worlds • Excursion: word embeddings – word2vec, proposed by Mikolov et al. (2013) – predict a word from its context or vice versa • Idea: similar words appear in similar contexts, like – Jobs, Wozniak, and Wayne founded Apple Computer Company in April 1976 – Google was officially founded as a company in September 1998 – usually trained on large text corpora • projection layer: embedding vectors
  • 25. From Word Embeddings to Graph Embeddings • Basic idea: – extract random walks from an RDF graph, e.g., Mulholland Dr. → director → David Lynch → nationality → US – feed the walks into the word2vec algorithm • Order of magnitude (e.g., DBpedia) – ~6M entities (“words”) – start up to 500 random walks per entity, length up to 8 → corpus of >20B tokens • Result: – node embeddings – most often outperform other propositionalization techniques
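A minimal sketch of the walk extraction step described on slide 25, assuming an in-memory adjacency-list graph; the toy graph and the function `random_walks` are hypothetical, and duplicate walks are not removed. The resulting list of token lists is the kind of input a word2vec implementation expects (e.g., gensim's `Word2Vec(sentences=corpus)`), which is not imported here to keep the example dependency-free.

```python
import random

# Hypothetical toy graph: subject -> list of outgoing (predicate, object) edges.
GRAPH = {
    "Mulholland_Dr.": [("director", "David_Lynch")],
    "David_Lynch": [("nationality", "US")],
    "Twin_Peaks": [("creator", "David_Lynch")],
}

def random_walks(graph, start, n_walks, depth, rng):
    """Generate n_walks random walks of at most `depth` hops starting at `start`.

    Each walk alternates entity and predicate tokens, e.g.
    ['Mulholland_Dr.', 'director', 'David_Lynch', 'nationality', 'US'].
    """
    walks = []
    for _ in range(n_walks):
        walk, node = [start], start
        for _ in range(depth):
            edges = graph.get(node, [])
            if not edges:
                break  # dead end: end this walk early
            predicate, node = rng.choice(edges)
            walk += [predicate, node]
        walks.append(walk)
    return walks

rng = random.Random(42)
# One "sentence" per walk; every subject in the graph is used as a start entity.
corpus = [w for e in GRAPH for w in random_walks(GRAPH, e, n_walks=2, depth=8, rng=rng)]
print(" -> ".join(corpus[0]))
```

At DBpedia scale, about 6M start entities times up to 500 walks gives up to 3 billion walks, so the >20B tokens on the slide correspond to an average of at least about 6.7 tokens per walk.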
  • 26. A First Glance at RDF2vec Embeddings • Observation: similar entities are projected close to each other
  • 27. Random vs. Non-Random Walks • Maybe random walks are not such a good idea – They may give too much weight to less-known entities and facts • Strategies: – Prefer edges with more frequent predicates – Prefer nodes with higher indegree – Prefer nodes with higher PageRank – … – They may cover less-known entities and facts too little • Strategies: – The opposite of all of the above strategies • External signals (e.g., human notions of importance) – generally work better than graph-internal signals. Cochez et al. (2017): Biased Graph Walks for RDF Graph Embeddings; Al Taweel and Paulheim (2020): Towards Exploiting Implicit Human Feedback for Improving RDF2vec Embeddings
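One way to implement the biased strategies from slide 27 is to replace the uniform choice of the next edge with a weighted one. The sketch below assumes weights proportional to the global predicate frequency, to the indegree of the target node, or to the inverse predicate frequency for the opposite bias; the toy triples and `next_hop` are hypothetical, and the PageRank variant and external signals are not shown. Cochez et al. (2017) describe the strategies that were actually evaluated.

```python
import random
from collections import Counter

# Hypothetical toy graph as (subject, predicate, object) triples.
TRIPLES = [
    ("Mannheim", "country", "Germany"),
    ("Hamburg", "country", "Germany"),
    ("Mannheim", "twinTown", "Swansea"),
    ("Adler_Mannheim", "city", "Mannheim"),
]

out_edges = {}
for s, p, o in TRIPLES:
    out_edges.setdefault(s, []).append((p, o))

predicate_freq = Counter(p for _, p, _ in TRIPLES)  # how often each predicate occurs
indegree = Counter(o for _, _, o in TRIPLES)        # how many triples point to each node

def next_hop(node, strategy, rng):
    """Pick the next (predicate, object) edge from `node`, or None at a dead end."""
    edges = out_edges.get(node, [])
    if not edges:
        return None
    if strategy == "uniform":
        weights = [1.0] * len(edges)
    elif strategy == "predicate_frequency":
        # Prefer edges whose predicate is frequent in the whole graph.
        weights = [predicate_freq[p] for p, _ in edges]
    elif strategy == "object_indegree":
        # Prefer target nodes that many triples point to.
        weights = [indegree[o] for _, o in edges]
    elif strategy == "inverse_predicate_frequency":
        # The opposite bias: favor rare predicates, i.e., less-known facts.
        weights = [1.0 / predicate_freq[p] for p, _ in edges]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return rng.choices(edges, weights=weights, k=1)[0]
```

A full walk generator would call `next_hop` repeatedly, exactly where a uniform random walker would call `rng.choice`.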
  • 28. Local Embeddings • Recap: order of magnitude (e.g., DBpedia) – ~6M entities (“words”) – start up to 500 random walks per entity, length up to 8 → corpus of >20B tokens – “Train once, reuse often” • In some cases, only a small subset of the 6M entities is of interest – RDF2vec Light: “train when needed” – Runtime: minutes instead of days. Portisch et al. (2020): RDF2Vec Light – A Lightweight Approach for Knowledge Graph Embeddings
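The “train when needed” idea on slide 28 can be sketched by generating walks only around the entities a task needs instead of from all ~6M entities. The walk shape below, which extends each walk backwards along incoming edges and forwards along outgoing edges so that the entity of interest sits inside the walk, is an assumption made for illustration; the toy triples and the functions `centered_walk` and `light_corpus` are hypothetical, and Portisch et al. (2020) describe the actual RDF2vec Light walk generation.

```python
import random

# Hypothetical toy triples; only a few entities matter for the downstream task.
TRIPLES = [
    ("Adler_Mannheim", "city", "Mannheim"),
    ("SAP_Arena", "location", "Mannheim"),
    ("Mannheim", "country", "Germany"),
    ("Hamburg", "country", "Germany"),
    ("Hamburg", "leader", "Peter_Tschentscher"),
]
out_edges, in_edges = {}, {}
for s, p, o in TRIPLES:
    out_edges.setdefault(s, []).append((p, o))
    in_edges.setdefault(o, []).append((s, p))

def centered_walk(entity, depth, rng):
    """Walk up to `depth` hops forward and up to `depth` hops backward from `entity`."""
    forward, node = [], entity
    for _ in range(depth):
        if node not in out_edges:
            break
        p, node = rng.choice(out_edges[node])
        forward += [p, node]
    backward, node = [], entity
    for _ in range(depth):
        if node not in in_edges:
            break
        s, p = rng.choice(in_edges[node])
        backward = [s, p] + backward  # prepend so the walk still reads s -> p -> o
        node = s
    return backward + [entity] + forward

def light_corpus(entities_of_interest, n_walks, depth, seed=0):
    """Build a walk corpus only for the given entities (hypothetical helper)."""
    rng = random.Random(seed)
    return [centered_walk(e, depth, rng) for e in entities_of_interest for _ in range(n_walks)]

corpus = light_corpus(["Mannheim"], n_walks=3, depth=2)
```

Entities that cannot be reached from the entities of interest within the walk depth (here Hamburg) never enter the corpus, which keeps the corpus, and therefore training, small.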
  • 29. RDF2vec: Example Applications • Data model matching with WebIsA and RDF2vec. Portisch et al. (2019): Evaluating ontology matchers on real-world financial services data models.
  • 30. RDF2vec: Example Applications • Entity disambiguation: linking texts to a knowledge graph. Türker et al. (2019): Knowledge-Based Short Text Categorization Using Entity and Category Embedding
  • 31. RDF2vec: Example Applications • Finding related research papers on COVID-19. Steenwinckel et al. (2020): Facilitating COVID-19 Meta-analysis Through a Literature Knowledge Graph
  • 32. RDF2vec: Example Applications • Table search by keyword. Zhang and Balog (2018): Ad Hoc Table Retrieval using Semantic Similarity.
  • 33. RDF2vec: Example Applications • Predicting biological interactions. Sousa et al. (2021): Supervised Semantic Similarity.
  • 34. RDF2vec: Example Applications • Zero-shot image classification. Tristan Hascoet et al. (2017): Semantic Web and Zero-Shot Learning of Large Scale Visual Classes.
  • 35. Embeddings for Link Prediction • RDF2vec example – similar instances form clusters, and the direction of a relation is roughly stable – link prediction by analogy reasoning (Japan – Tokyo ≈ China – Beijing). Ristoski & Paulheim: RDF2vec: RDF Graph Embeddings for Data Mining. ISWC, 2016
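Link prediction by analogy (slide 35) can be illustrated with vector arithmetic: if Japan – Tokyo ≈ China – Beijing, then Tokyo – Japan + China should lie close to Beijing. The 2-D vectors below are made up for the example (real RDF2vec vectors are learned and much higher-dimensional), and ranking candidates by cosine similarity is one common choice, not necessarily the one used by Ristoski & Paulheim.

```python
import math

# Hypothetical 2-D embeddings, chosen so that "capital of" is roughly the same offset.
EMB = {
    "Japan":   [1.0, 0.0],
    "Tokyo":   [1.1, 1.0],
    "China":   [3.0, 0.2],
    "Beijing": [3.1, 1.2],
    "Germany": [5.0, 0.1],
    "Berlin":  [5.2, 1.1],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def analogy(a, b, c, emb):
    """Solve a : b = c : ? via ? ≈ b - a + c, excluding the three query entities."""
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    candidates = [e for e in emb if e not in (a, b, c)]
    return max(candidates, key=lambda e: cosine(emb[e], target))

print(analogy("Japan", "Tokyo", "China", EMB))
```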
  • 36. Embeddings for Link Prediction • In RDF2vec, relation preservation is a by-product • TransE (and its descendants): direct modeling – Formulates RDF embedding as an optimization problem – Find a mapping of entities and relations to ℝⁿ so that • across all triples ⟨s,p,o⟩, Σ ||s + p − o|| is minimized • existing triples obtain a smaller error than non-existing ones. Bordes et al.: Translating Embeddings for Modeling Multi-relational Data. NIPS 2013. Fan et al.: Learning Embedding Representations for Knowledge Inference on Imperfect and Incomplete Repositories. WI 2016
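The TransE objective on slide 36 can be sketched as margin-based training in plain Python. This is a simplified variant for illustration, not the reference implementation of Bordes et al.: it uses the squared L2 distance, corrupts only the object, skips the entity normalization step, and trains with plain SGD; the toy triples and all function names are hypothetical.

```python
import random

def score(s, p, o):
    """Squared TransE distance ||s + p - o||^2; lower means more plausible."""
    return sum((si + pi - oi) ** 2 for si, pi, oi in zip(s, p, o))

def train_transe(triples, dim=4, epochs=500, lr=0.05, margin=1.0, seed=0):
    rng = random.Random(seed)
    entities = sorted({x for s, _, o in triples for x in (s, o)})
    relations = sorted({p for _, p, _ in triples})
    E = {e: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for e in entities}
    R = {r: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for r in relations}
    known = set(triples)
    for _ in range(epochs):
        for s, p, o in triples:
            # Corrupt the object to obtain a (presumably) false triple.
            o_neg = rng.choice(entities)
            if (s, p, o_neg) in known:
                continue
            loss = margin + score(E[s], R[p], E[o]) - score(E[s], R[p], E[o_neg])
            if loss <= 0:
                continue  # the true triple already beats the corrupted one by the margin
            # Gradients of the two squared distances with respect to (s + p).
            g_pos = [2 * (a + b - c) for a, b, c in zip(E[s], R[p], E[o])]
            g_neg = [2 * (a + b - c) for a, b, c in zip(E[s], R[p], E[o_neg])]
            for i in range(dim):
                E[s][i] -= lr * (g_pos[i] - g_neg[i])
                R[p][i] -= lr * (g_pos[i] - g_neg[i])
                E[o][i] += lr * g_pos[i]        # pull the true object towards s + p
                E[o_neg][i] -= lr * g_neg[i]    # push the corrupted object away
    return E, R

TRIPLES = [("Japan", "capital", "Tokyo"), ("China", "capital", "Beijing"),
           ("Germany", "capital", "Berlin"), ("France", "capital", "Paris")]
E, R = train_transe(TRIPLES)
```

After training, each true capital should be closer to country + capital than the other capitals are, which is the “smaller error for existing triples” criterion from the slide.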
  • 37. Link Prediction vs. Node Embedding • Hypotheses: – Embeddings for link prediction also cluster similar entities – Node embeddings can also be used for link prediction. Portisch et al. (under review): Knowledge Graph Embedding for Data Mining vs. Knowledge Graph Embedding for Link Prediction - Two Sides of the Same Coin?
  • 38. Similarity vs. Relatedness • 10 closest entities to Angela Merkel in different vector spaces. Portisch et al. (under review): Knowledge Graph Embedding for Data Mining vs. Knowledge Graph Embedding for Link Prediction - Two Sides of the Same Coin?
  • 39. Similarity vs. Relatedness • (s-)RDF2vec allows an explicit trade-off with different walk strategies (Figure: pipeline from an example knowledge graph around Mannheim, with the nodes Adler Mannheim, SAP Arena, Reiss-Engelhorn-Museum, Baden-Württemberg, and Germany and the edges location, federal state, country, city, and stadium, through walk generation to embeddings. Example “classic” RDF2vec walks: Adler_Mannheim → city → Mannheim → country → Germany; Adler_Mannheim → stadium → SAP_Arena → location → Mannheim; SAP_Arena → location → Mannheim → country → Germany. Example s-RDF2vec walks: city → Mannheim → country; stadium → SAP_Arena → location; location → Mannheim → country. The RDF2vec “classic”, RDF2vec “edge”, and RDF2vec “union walks” vectors are concatenated and reduced either by a global PCA or by a (weighted, w1/w2) local PCA over the task-specific subset of test cases.) Portisch et al. (under review): s-RDF2vec: Injecting Knowledge Graph Structure Into RDF2vec Entity Embeddings.
  • 40. Similarity vs. Relatedness • s-RDF2vec – using different walk strategies – combining different vector spaces (weighted combinations are possible) • 10 closest neighbors to Mannheim (figure). Portisch et al. (under review): s-RDF2vec: Injecting Knowledge Graph Structure Into RDF2vec Entity Embeddings.
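The weighted combination of vector spaces mentioned on slide 40 can be sketched as a weighted concatenation of normalized vectors followed by a nearest-neighbor query. The two toy spaces stand in for a relatedness-oriented (“classic”) and a similarity-oriented embedding and are made up; the PCA step from the pipeline on slide 39 is omitted, and all names are hypothetical.

```python
import math

# Hypothetical toy vectors for two embedding spaces of the same entities.
RELATED = {"Mannheim": [1.0, 0.1], "Adler_Mannheim": [0.9, 0.2], "Hamburg": [0.1, 1.0]}
SIMILAR = {"Mannheim": [1.0, 0.0], "Adler_Mannheim": [0.0, 1.0], "Hamburg": [0.95, 0.1]}

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def combine(vec_a, vec_b, w_a, w_b):
    """Weighted concatenation of two L2-normalized embeddings of one entity."""
    return [w_a * x for x in normalize(vec_a)] + [w_b * x for x in normalize(vec_b)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def build_space(w_rel, w_sim):
    return {e: combine(RELATED[e], SIMILAR[e], w_rel, w_sim) for e in RELATED}

def nearest(query, space, k=1):
    """Return the k entities with the highest cosine similarity to `query`."""
    others = [e for e in space if e != query]
    return sorted(others, key=lambda e: -cosine(space[query], space[e]))[:k]

print(nearest("Mannheim", build_space(1.0, 0.0)))
print(nearest("Mannheim", build_space(0.0, 1.0)))
```

Shifting the weight from the relatedness space to the similarity space moves Mannheim's nearest neighbor from Adler_Mannheim (a related entity) to Hamburg (a similar one).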
  • 41. Similarity vs. Relatedness • Recap word embeddings: – Jobs, Wozniak, and Wayne founded Apple Computer Company in April 1976 – Google was officially founded as a company in September 1998 • Graph walks: – Hamburg → country → Germany → leader → Angela_Merkel – Germany → leader → Angela_Merkel → birthPlace → Hamburg – Hamburg → leader → Peter_Tschentscher → residence → Hamburg (Figure: the underlying graph with the nodes Germany, Angela_Merkel, Hamburg, and Peter_Tschentscher and the edges birthPlace, country, leader, and residence.)
  • 42. Similarity vs. Relatedness
    • Surrounding entities indicate relatedness
      – Hamburg → country → Germany → leader → Angela_Merkel
      – Germany → leader → Angela_Merkel → birthPlace → Hamburg
    • The same entities in similar positions indicate similarity
      – Germany → leader → Angela_Merkel → birthPlace → Hamburg
      – Hamburg → leader → Peter_Tschentscher → residence → Hamburg
    • Position matters: someone is a leader vs. something has a leader
    • Solution approach: use an embedding approach that respects positions
      – CWINDOW / Structured Skip-ngram
    Portisch and Paulheim (2021): Putting RDF2vec in Order.
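The difference between a position-agnostic and a position-aware objective already shows up in the training pairs generated from a walk. The following minimal sketch uses walks from this slide. Plain skip-gram emits (target, context) pairs, while a structured-skip-gram-style variant also records the relative offset. In the actual model, that offset selects a position-specific output matrix, which is not implemented here. The window size of 1 is an arbitrary choice, and the function names are hypothetical.

```python
def skipgram_pairs(walk, window):
    """Plain skip-gram: (target, context) pairs; the position is discarded."""
    pairs = []
    for i, target in enumerate(walk):
        for j in range(max(0, i - window), min(len(walk), i + window + 1)):
            if j != i:
                pairs.append((target, walk[j]))
    return pairs

def structured_pairs(walk, window):
    """Position-aware variant: (target, context, offset) triples."""
    pairs = []
    for i, target in enumerate(walk):
        for j in range(max(0, i - window), min(len(walk), i + window + 1)):
            if j != i:
                pairs.append((target, walk[j], j - i))
    return pairs

def contexts(pairs, token):
    """Collect what `token` was paired with (context, plus offset if present)."""
    return {pair[1:] for pair in pairs if pair[0] == token}

walk_a = ["Hamburg", "country", "Germany", "leader", "Angela_Merkel"]
walk_b = ["Hamburg", "leader", "Peter_Tschentscher", "residence", "Hamburg"]

print(contexts(skipgram_pairs(walk_a, 1), "Angela_Merkel"))
print(contexts(skipgram_pairs(walk_b, 1), "Hamburg"))
print(contexts(structured_pairs(walk_a, 1), "Angela_Merkel"))
print(contexts(structured_pairs(walk_b, 1), "Hamburg"))
```

With plain skip-gram, Angela_Merkel (who is a leader) and Hamburg (which has a leader) both get "leader" as context. With offsets, the contexts ("leader", -1) and ("leader", +1) stay distinct.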
  • 43. Similarity vs. Relatedness
    • Why bother?
      – Use case: table interpretation (a special case of entity disambiguation)
    [Figure: table interpretation example contrasting "related" and "similar" entities]
  • 44. Back to Interpretability
    • Hot topic: Explainable AI
      – Knowledge Graphs are a favorable ingredient
      – Human- and machine-interpretable knowledge → explainable systems
    • However:
      – Embeddings replace interpretable axioms with numeric vectors over non-interpretable dimensions
      – Where did the semantics go?
    Paulheim (2018): Make Embeddings Semantic Again!
  • 45. Towards Semantic Vector Space Embeddings
    [Figure: example vector space with the labels "cartoon" and "superhero"]
    Paulheim (2018): Make Embeddings Semantic Again!
  • 46. Towards Semantic Vector Space Embeddings
    • Approach 1: learn an interpretation function
      – Each dimension of the embedding model is the target of a separate learning problem
      – Learn a function that explains the dimension, e.g., $y \approx -|\exists\,\mathit{character}.\mathit{Superhero}|$
      – Just an approximation, used for explanations and justifications
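Approach 1 can be sketched with a deliberately simple learner. This sketch swaps in ordinary least-squares regression on hand-picked count features in place of whatever learner the cited work uses; the example formula suggests class expressions over the graph. The feature names, entities, and values are hypothetical. Each dimension is treated as its own regression target, and the feature with the lowest squared error is reported as that dimension's explanation.

```python
def fit_line(xs, ys):
    """Ordinary least squares for ys ≈ slope * xs + intercept; also returns the SSE."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var = sum((x - mean_x) ** 2 for x in xs)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov / var
    intercept = mean_y - slope * mean_x
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, sse

def explain_dimensions(vectors, features):
    """For each embedding dimension, return (feature, slope, intercept, sse) of the best fit."""
    explanations = []
    for d in range(len(vectors[0])):
        target = [v[d] for v in vectors]  # one learning problem per dimension
        best = None
        for name, xs in features.items():
            if len(set(xs)) < 2:
                continue  # a constant feature cannot explain any variation
            slope, intercept, sse = fit_line(xs, target)
            if best is None or sse < best[3]:
                best = (name, slope, intercept, sse)
        explanations.append(best)
    return explanations

# Hypothetical data: four cartoons, two embedding dimensions, two candidate features.
vectors = [[0.0, 0.3], [-1.0, 0.1], [-2.1, 0.4], [-2.9, 0.2]]
features = {
    "num_superhero_characters": [0, 1, 2, 3],
    "num_episodes": [10, 50, 20, 40],
}
for d, (name, slope, intercept, sse) in enumerate(explain_dimensions(vectors, features)):
    print(f"dim {d}: y ≈ {slope:.2f} * {name} + {intercept:.2f} (SSE {sse:.3f})")
```

In this toy data, dimension 0 is explained by a slope close to -1 on the number of superhero characters, which mirrors the shape of the example on the slide. As the slide notes, such a fit is only an approximation used for explanations.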
  • 47. Towards Semantic Vector Space Embeddings
    • Approach 2: learn inherently interpretable embeddings
    • Step 1: learn typical patterns that exist in a knowledge graph
      – e.g., graph pattern learning
      – e.g., Horn clauses
    • Step 2a: use those patterns as embedding dimensions
      – the resulting space is probably not low-dimensional
    • Step 2b: compact the space
      – e.g., let mutually exclusive patterns share one dimension
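Steps 2a and 2b can be sketched as follows. Everything here is a hypothetical illustration. The patterns are hand-written tests on (relation, object type) facts rather than learned graph patterns or Horn clauses, and mutual exclusivity is only checked on the observed entities. Patterns that never fire together are greedily merged into one categorical dimension: 0 means no pattern, 1 means the first pattern of the group, 2 the second, and so on.

```python
def has(relation, obj_type):
    """Hypothetical pattern: the entity has a `relation` edge to something of `obj_type`."""
    return lambda facts: (relation, obj_type) in facts

PATTERNS = [
    ("character.Superhero", has("character", "Superhero")),
    ("character.Animal", has("character", "Animal")),
    ("type.AnimatedSeries", has("type", "AnimatedSeries")),
    ("type.LiveActionSeries", has("type", "LiveActionSeries")),
]

# Hypothetical toy facts per entity.
ENTITIES = {
    "Batman_TAS": {("character", "Superhero"), ("type", "AnimatedSeries")},
    "Tom_and_Jerry": {("character", "Animal"), ("type", "AnimatedSeries")},
    "Daredevil": {("character", "Superhero"), ("type", "LiveActionSeries")},
}

def pattern_vector(facts):
    """Step 2a: one binary dimension per pattern."""
    return [1 if test(facts) else 0 for _, test in PATTERNS]

def group_exclusive(binary_vectors):
    """Step 2b: greedily group patterns that never fire together on any entity."""
    groups = []
    for p in range(len(PATTERNS)):
        for group in groups:
            if all(not (v[p] and v[q]) for v in binary_vectors for q in group):
                group.append(p)
                break
        else:
            groups.append([p])  # conflicts with every existing group
    return groups

def compact_vector(binary_vector, groups):
    """One categorical value per group: 0 if no pattern fires, else 1 + its index."""
    out = []
    for group in groups:
        value = 0
        for i, p in enumerate(group):
            if binary_vector[p]:
                value = i + 1
        out.append(value)
    return out

binary = {e: pattern_vector(f) for e, f in ENTITIES.items()}
groups = group_exclusive(list(binary.values()))
for entity, vec in binary.items():
    print(entity, vec, "->", compact_vector(vec, groups))
```

Here the four binary dimensions collapse into two readable ones (kind of character, kind of series). The codes are categorical, so they should not be compared numerically as they are.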
  • 48. Towards Semantic Vector Space Embeddings
    • Different angle: learn an interpretation of the similarity function
      – e.g., ~similar type, ~same country, ~connected to same entity
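The explanation types on this slide can be sketched as symbolic checks on a pair of entities. This sketch only computes candidate explanations from the graph. Learning which of them approximates the embedding similarity, as the slide proposes, is not shown. An exact type match stands in for "similar type". The triples and the relation names type, country, leader, and federalState are hypothetical toy data.

```python
from collections import defaultdict

# Hypothetical toy triples.
TRIPLES = [
    ("Hamburg", "type", "City"),
    ("Mannheim", "type", "City"),
    ("Hamburg", "country", "Germany"),
    ("Mannheim", "country", "Germany"),
    ("Hamburg", "leader", "Peter_Tschentscher"),
    ("Mannheim", "federalState", "Baden-Württemberg"),
]

def index(triples):
    """Outgoing (relation, object) pairs and undirected neighbors (type edges excluded)."""
    out = defaultdict(set)
    nbrs = defaultdict(set)
    for s, p, o in triples:
        out[s].add((p, o))
        if p != "type":
            nbrs[s].add(o)
            nbrs[o].add(s)
    return out, nbrs

def explain_similarity(triples, a, b):
    """List human-readable reasons why entities `a` and `b` might be similar."""
    out, nbrs = index(triples)

    def values(entity, relation):
        return {o for p, o in out[entity] if p == relation}

    reasons = []
    for t in sorted(values(a, "type") & values(b, "type")):
        reasons.append(f"same type: {t}")
    for c in sorted(values(a, "country") & values(b, "country")):
        reasons.append(f"same country: {c}")
    for n in sorted((nbrs[a] & nbrs[b]) - {a, b}):
        reasons.append(f"connected to same entity: {n}")
    return reasons

print(explain_similarity(TRIPLES, "Hamburg", "Mannheim"))
```

Germany appears twice because the shared country is also a shared neighbor; a learned interpretation would have to decide how such overlapping explanations are weighted.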
  • 49. Summary
    • Knowledge Graphs are a versatile ingredient for AI
      – Integrated view on data
      – Large-scale, free source of background knowledge
    • Knowledge Graph Embeddings
      – Effective processing of large-scale knowledge sources
      – Encoding of similarity and/or relatedness
        • RDF2vec: an explicit trade-off is possible!
      – Additional insights that are not explicit in the graph, aka latent semantics
  • 50. More on RDF2vec
    • Collection of
      – Implementations
      – Pre-trained models
      – >40 use cases in various domains
  • 51. Thank you! http://www.heikopaulheim.com @heikopaulheim