Semantic Similarity for Faster Knowledge Graph Delivery at Scale
Connected Data London, October 2019
making sense of text and data
Why Knowledge Graphs?
“Cross-industry studies show that on average, less than half of an
organization’s structured data is actively used in making decisions—and
less than 1% of its unstructured data is analyzed or used at all”
Leandro DalleMule and Thomas H. Davenport, “What’s Your Data Strategy?”, Harvard Business Review
Top 5 USA Banks
Presentation Outline
o Enterprise Knowledge Graphs
o Smart Graphs with Embeddings
o Implementing Knowledge Graphs
What is a Knowledge Graph?
Graph, Semantics, Smart, Alive
Multiple Enterprise Data Management Systems
KG platforms combine capabilities of several enterprise systems:
o Master and reference data management
o Corporate/Enterprise Taxonomy
o Data warehouse
o Metadata management
o Digital asset management
o Enterprise search
Challenges in Enterprise Semantic Integration
IMDB
Type            Titles
TV Episodes     4’044’529
Short film        681’067
Feature film      516’726
Video             164’061
TV series         164’061
TV movies         126’206
…                       …
Total*          5’838’514

Wikidata
Type                Titles
film               235’707
silent short film   16’377
television film     15’345
short film          11’225
animated film        3’785
…                        …
Total              289’650

* The later tests use only 5K crawled datasets.
Challenges in Enterprise Semantic Integration
Multiple levels of inconsistencies:
o Types: film vs. “TV movie”
o Metadata: “science fiction”, “military science fiction” vs. “Sci-Fi”
o Reference data: “US” vs. “United States”
o Manually curated cross-links (!) for testing purposes only
A Classical Approach
o Start with string matching of the titles:
“Harry Potter and the Deathly Hallows: Part II” vs. “Harry Potter and the Deathly Hallows – Part 2”
“Perfume: The Story of a Murderer” vs. “Perfume”
“Pirate Radio” vs. “The Boat That Rocked”
“Avatar” vs. “Avatar” (4 movies)
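A minimal sketch of the title-matching step, for illustration only: it uses Python's standard `difflib.SequenceMatcher` as the string metric (the deck does not say which metric was used), and the normalization rules and the `ROMAN` table are hypothetical. It shows which of the pairs above plain string matching can and cannot resolve.

```python
import re
import unicodedata
from difflib import SequenceMatcher

# Hypothetical mapping for sequel numbering; applied only to the token right after "part".
ROMAN = {"i": "1", "ii": "2", "iii": "3", "iv": "4", "v": "5"}

def normalize_title(title: str) -> str:
    """Lower-case, drop accents and punctuation, turn 'part ii' into 'part 2'."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.lower())  # ':' and dashes become spaces
    tokens = text.split()
    for i in range(1, len(tokens)):
        if tokens[i - 1] == "part" and tokens[i] in ROMAN:
            tokens[i] = ROMAN[tokens[i]]
    return " ".join(tokens)

def title_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] between the normalized titles."""
    return SequenceMatcher(None, normalize_title(a), normalize_title(b)).ratio()
```

Normalization makes the two Harry Potter titles identical, but “Pirate Radio” vs. “The Boat That Rocked” scores low, “Perfume” vs. its full title scores low, and the four “Avatar” films all score a perfect 1.0, which is exactly the ambiguity the slide points out.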
A Classical Approach with extra Rules
o Add release date matching
Loses 10% of the matches due to bad dates
o Ambiguity is greatly reduced, but many candidates remain:
tt0238520, 16 October 1995, 50 min
tt1125875, 11 April 1995, 48 min
tt0238520, 23 June 1995, 1h 21 min
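The extra rule can be sketched as a filter over title-equal pairs. This is an illustration under assumptions: the `Record` type, the one-year `max_days` tolerance, and all sample IDs and dates are hypothetical. It shows both effects from the slide: a missing or bad date silently drops a pair, and a date window prunes but does not eliminate candidates.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple

@dataclass(frozen=True)
class Record:
    iri: str
    title: str
    released: Optional[date]  # None models a missing or unparseable release date

def dates_compatible(a: Record, b: Record, max_days: int = 365) -> bool:
    """True only if both dates are known and at most max_days apart."""
    if a.released is None or b.released is None:
        return False  # a bad date drops the pair: this is where matches get lost
    return abs((a.released - b.released).days) <= max_days

def candidate_matches(left: List[Record], right: List[Record],
                      max_days: int = 365) -> List[Tuple[str, str]]:
    """(left.iri, right.iri) pairs with equal case-folded titles and compatible dates."""
    return [(a.iri, b.iri) for a in left for b in right
            if a.title.casefold() == b.title.casefold() and dates_compatible(a, b, max_days)]

# Hypothetical records: three same-titled films on one side, one on the other.
imdb = [
    Record("imdb:hyp1", "Avatar", date(2009, 12, 18)),
    Record("imdb:hyp2", "Avatar", date(2004, 1, 1)),
    Record("imdb:hyp3", "Avatar", None),
]
wikidata = [Record("wd:hyp1", "Avatar", date(2009, 12, 10))]
pairs = candidate_matches(imdb, wikidata)
```

The tolerance is a trade-off: a wide window keeps more true matches when sources disagree on release dates (premiere vs. theatrical release), while a narrow one reduces ambiguity.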
Presentation Outline: Smart Graphs with Embeddings
What is Knowledge Graph Embedding?
o Predict similar graph nodes or properties
o Require no input training data
o Mathematical representation of graph nodes as vectors:
[Figure: films as vectors on the axes duration, drama and comedy; The Godfather (2h 58m) vs. American Pie (1h 15 min)]
o For each film, include all actors, the director and the country of origin
o The result is a vast matrix of entities and literals
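The figure's idea, films as points on the axes duration, drama and comedy, can be sketched with cosine similarity. The durations come from the figure; the genre weights and the `MAX_HOURS` scaling cap are hypothetical, chosen so that duration does not dominate the comparison.

```python
import math
from typing import Sequence, Tuple

MAX_HOURS = 3.0  # hypothetical cap used to scale duration into [0, 1]

def film_vector(hours: float, drama: float, comedy: float) -> Tuple[float, float, float]:
    """Embed a film on the (duration, drama, comedy) axes with duration scaled to [0, 1]."""
    return (min(hours / MAX_HOURS, 1.0), drama, comedy)

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

godfather = film_vector(2 + 58 / 60, drama=1.0, comedy=0.0)    # 2h 58m per the figure
american_pie = film_vector(1 + 15 / 60, drama=0.0, comedy=1.0)  # 1h 15 min per the figure
```

Without scaling, the raw hour values would dominate and the two films would look similar merely because both have a positive duration.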
Knowledge Graph Embedding Example
Term-document matrix: rows are documents (movies), columns are terms:
[Actor] “Adam LeFevre”, [Actor] “Anthony Anderson”, [Actor] “Mia Farrow”, [Country] “France”, [Country] “US”, [Country] “United states”, [Director] “Luc Besson”, …
wd:Q550232       1 1 1 1 1
imdb:tt0344854   1 1 1 1
...              …
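A minimal sketch of how such a matrix can be derived from triples, assuming each (predicate, object) pair becomes one term column. The `:actor` triples mirror the deck's example. The country triples are hypothetical and show why “US” and “United States” end up as separate columns unless the reference data is aligned first.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]
Term = Tuple[str, str]  # (predicate, object) pair used as a matrix column

def incidence_matrix(triples: Iterable[Triple]) -> Tuple[List[str], List[Term], List[List[int]]]:
    """Rows are subjects (documents), columns are (predicate, object) terms, cells are 0/1."""
    triples = list(triples)
    docs = sorted({s for s, _, _ in triples})
    terms = sorted({(p, o) for _, p, o in triples})
    row = {d: i for i, d in enumerate(docs)}
    col = {t: j for j, t in enumerate(terms)}
    matrix = [[0] * len(terms) for _ in docs]
    for s, p, o in triples:
        matrix[row[s]][col[(p, o)]] = 1
    return docs, terms, matrix

triples = [
    ("wd:Q550232", ":actor", "Adam LeFevre"),
    ("imdb:tt0344854", ":actor", "Adam LeFevre"),
    ("wd:Q550232", ":country", "United States"),  # hypothetical
    ("imdb:tt0344854", ":country", "US"),         # hypothetical
]
docs, terms, matrix = incidence_matrix(triples)
```

At the deck's scale, a dense list-of-lists like this is infeasible, which is what motivates the dimension reduction on the next slide.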
Random Indexing (RI) Algorithm
o Reduces the matrix dimension with elemental vectors
For each term w, calculate a context vector S(w) by summing the elemental (index) vectors of all elements x appearing in the context of w: S(w) = Σ e(x) over x in context(w)
o Lightweight and fast
(250K x 1.45M matrix in < 5 min)
o Fast sub-second searches; requires limited RAM
Reduced matrix with elemental vectors:
Movie            Adam LeFevre   Anthony Anderson   Mia Farrow
wd:Q550232       1              1                  1
imdb:tt0344854   1              0                  1
...              …              …                  …
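A pure-Python sketch of Random Indexing, not the plugin's implementation. Assumptions: each document gets a sparse ternary elemental vector, S(w) sums the elemental vectors of the documents containing w, and a document vector sums its terms' context vectors. The parameters `dim=1024`, `nonzeros=10` and the `ex:unrelated` document are hypothetical.

```python
import math
import random
from typing import Dict, List, Set, Tuple

Vector = List[float]

def elemental_vector(dim: int, nonzeros: int, rng: random.Random) -> Vector:
    """Sparse ternary random vector: `nonzeros` positions set to +1 or -1, the rest 0."""
    vec = [0.0] * dim
    for pos in rng.sample(range(dim), nonzeros):
        vec[pos] = rng.choice((1.0, -1.0))
    return vec

def add_into(acc: Vector, vec: Vector) -> None:
    """In-place element-wise sum acc += vec."""
    for i, x in enumerate(vec):
        acc[i] += x

def random_indexing(docs: Dict[str, Set[str]], dim: int = 1024, nonzeros: int = 10,
                    seed: int = 0) -> Tuple[Dict[str, Vector], Dict[str, Vector]]:
    """Return (term vectors, document vectors) in one reduced dim-dimensional space."""
    rng = random.Random(seed)
    elemental = {doc: elemental_vector(dim, nonzeros, rng) for doc in docs}
    terms: Dict[str, Vector] = {}
    for doc, doc_terms in docs.items():
        for term in doc_terms:
            # S(w): sum of the elemental vectors of every document (context) containing w
            add_into(terms.setdefault(term, [0.0] * dim), elemental[doc])
    films: Dict[str, Vector] = {}
    for doc, doc_terms in docs.items():
        acc = [0.0] * dim
        for term in doc_terms:
            add_into(acc, terms[term])  # a document is the sum of its terms' context vectors
        films[doc] = acc
    return terms, films

def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

docs = {
    "wd:Q550232": {"Adam LeFevre", "Anthony Anderson", "Mia Farrow"},
    "imdb:tt0344854": {"Adam LeFevre", "Mia Farrow"},
    "ex:unrelated": {"Actor X", "Actor Y"},  # hypothetical
}
terms, films = random_indexing(docs)
```

Because random sparse vectors in high dimensions are nearly orthogonal, `dim` can stay fixed while the number of documents grows. This is the memory saving the slide refers to, paid for with a small amount of noise in the similarity scores.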
Random Indexing (RI) Algorithm #2
o Supports similarity searches for:
Document to Document – similar movies
Document to Term – specific actors/directors
Term to Term – similar actors/directors
Term to Document – movies specific to an actor/director
o Has all the properties of a vector space model:
o Partial matching, weights, ranking and context-sensitive semantic search
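Because terms and documents live in one vector space, the four search directions reduce to the same nearest-neighbour ranking. In this sketch, one-hot elemental vectors replace random ones so that the scores are exact. The films and casts are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple

Vector = List[float]

def build_space(docs: Dict[str, Set[str]]) -> Tuple[Dict[str, Vector], Dict[str, Vector]]:
    """Terms and documents in one space, using one-hot elemental vectors (exact, no randomness)."""
    names = list(docs)
    terms: Dict[str, Vector] = {}
    for i, doc in enumerate(names):
        for term in docs[doc]:
            terms.setdefault(term, [0.0] * len(names))[i] += 1.0  # S(w) = sum of e(doc)
    films = {doc: [sum(terms[t][i] for t in docs[doc]) for i in range(len(names))]
             for doc in names}  # document = sum of its terms' context vectors
    return terms, films

def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def search(query: Vector, candidates: Dict[str, Vector], k: int = 3) -> List[Tuple[str, float]]:
    """One ranking function serves all four directions; only query and candidates change."""
    scored = [(name, cosine(query, vec)) for name, vec in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]

docs = {  # hypothetical films and casts
    "film:A": {"Mia Farrow", "Adam LeFevre"},
    "film:B": {"Mia Farrow", "Anthony Anderson"},
    "film:C": {"Luc Besson"},
}
terms, films = build_space(docs)
doc_to_doc = search(films["film:A"], films)
doc_to_term = search(films["film:A"], terms)
term_to_term = search(terms["Mia Farrow"], terms)
term_to_doc = search(terms["Luc Besson"], films)
```

Ranking by score rather than filtering by exact match is what gives the partial matching and ranking behaviour listed above.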
Presentation Outline: Implementing Knowledge Graphs
Reference Software Architecture
o Easy consumption of data
o No backend development
o Flexible data processing tools
o Standard and open interfaces
[Diagram components: KG Consumers, Ontotext Platform, GraphDB, GQL query, GQL mutation, GQL Federation, SPARQL, Similarity Plugin, RDF / structured data]
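The SPARQL interface in the diagram follows the standard SPARQL 1.1 Protocol. The sketch below builds, but does not send, a query-via-GET request. Assumptions: a GraphDB instance on its default port 7200, a hypothetical repository named `movies`, and a hypothetical `:actor` vocabulary.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumptions: local GraphDB on port 7200 and a hypothetical repository "movies".
ENDPOINT = "http://localhost:7200/repositories/movies"

# Hypothetical vocabulary: the ":actor" predicate and its namespace are illustrative only.
QUERY = """
PREFIX : <http://example.org/movies#>
SELECT ?film WHERE { ?film :actor "Adam LeFevre" }
"""

def sparql_get_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL 1.1 Protocol query-via-GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"}, method="GET")

request = sparql_get_request(ENDPOINT, QUERY)
# Against a running server: json.load(urllib.request.urlopen(request))
```

App developers in this architecture would more likely go through the GraphQL layer. The raw SPARQL endpoint is shown because it is the standard interface underneath.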
Transform CSV to RDF
o Perform standard ETL tasks
o Trim spaces, parse numbers and dates
o Parse IMDB IDs from links for testing
o Map table data to RDF
o SPARQL over tabular data
o Split multi-valued fields like “Action|Thriller”
o Schema-level alignment not yet applied
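A minimal sketch of these ETL steps in plain Python, for illustration only. Assumptions: the CSV column names (`movie_title`, `movie_imdb_link`, `duration`, `genres`, `actor_1_name`, `actor_2_name`), the `:title`/`:duration`/`:genre`/`:actor` predicates and the sample row are hypothetical, not taken from the deck.

```python
import csv
import io
import re
from typing import Iterator, Tuple, Union

Triple = Tuple[str, str, Union[str, int]]
IMDB_ID = re.compile(r"/title/(tt\d+)")

def row_to_triples(row: dict) -> Iterator[Triple]:
    """Map one CSV row to RDF-style triples, cleaning values on the way."""
    match = IMDB_ID.search(row["movie_imdb_link"])  # parse the IMDB id from the link
    if match is None:
        return  # no parsable IMDB id: skip the row
    subject = "imdb:" + match.group(1)
    yield subject, ":title", row["movie_title"].strip()  # trim spaces (incl. non-breaking)
    duration = row["duration"].strip()
    if duration.isdigit():
        yield subject, ":duration", int(duration)  # parse numbers
    for genre in row["genres"].split("|"):  # split multi-valued fields
        if genre.strip():
            yield subject, ":genre", genre.strip()
    for column in ("actor_1_name", "actor_2_name"):
        name = row[column].strip()
        if name:
            yield subject, ":actor", name

# Hypothetical sample row with untrimmed values and an empty actor column.
SAMPLE = (
    "movie_title,movie_imdb_link,duration,genres,actor_1_name,actor_2_name\n"
    "Example Film  ,http://www.imdb.com/title/tt0000001/?ref_=fn_tt_tt_1, 98 ,Action|Thriller,Actor One,\n"
)
triples = [t for row in csv.DictReader(io.StringIO(SAMPLE)) for t in row_to_triples(row)]
```

Consistent with the last bullet, this maps columns to source-specific predicates only. Aligning them with Wikidata's schema is a separate step.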
Similarity Plugin API
subject          predicate   object
wd:Q550232       :actor      “Adam LeFevre”
imdb:tt0344854   :actor      “Adam LeFevre”
…                …           …
o Accepts a graph described by <s, p, o> triples
o Indexes any RDF types
o Works with virtual overlays like:
[Figure: imdb:tt0344854 reaches “Adam LeFevre” via imdb:actor_2_name; wd:Q550232 reaches it via wdt:P161 → wd:Q2702964 → rdfs:label]
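The overlay in the figure can be sketched as a function that derives virtual `:actor` triples from two differently shaped sources. In SPARQL, the Wikidata side corresponds to the property path `?film wdt:P161/rdfs:label ?name`, where P161 is Wikidata's “cast member”. The function itself is a hypothetical illustration, not the plugin's API.

```python
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]

def actor_overlay(triples: Iterable[Triple]) -> Set[Triple]:
    """Derive virtual :actor triples from two differently shaped sources."""
    triples = list(triples)
    labels: Dict[str, List[str]] = {}
    for s, p, o in triples:
        if p == "rdfs:label":
            labels.setdefault(s, []).append(o)
    overlay: Set[Triple] = set()
    for s, p, o in triples:
        if p == "imdb:actor_2_name":      # IMDB: the name literal sits on the film itself
            overlay.add((s, ":actor", o))
        elif p == "wdt:P161":             # Wikidata cast member: follow the entity to its label
            for label in labels.get(o, []):
                overlay.add((s, ":actor", label))
    return overlay

source = [
    ("imdb:tt0344854", "imdb:actor_2_name", "Adam LeFevre"),
    ("wd:Q550232", "wdt:P161", "wd:Q2702964"),
    ("wd:Q2702964", "rdfs:label", "Adam LeFevre"),
]
virtual = actor_overlay(source)
```

The result reproduces the two rows of the table above without materializing new data in either source.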
Specify KG Embeddings – Select Predicates
o The similarity plugin expects <s, p, o> triples
Specify KG Embeddings – Align Schema
o Define a translation table for the predicates
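Predicate selection and schema alignment can be combined into one lookup. Only predicates listed in the table are kept, and each one is renamed to a shared predicate. The table entries are hypothetical. The Wikidata properties P57 (“director”), P495 (“country of origin”) and the entity Q30 (United States) are real identifiers, but their objects are entity IRIs that would still need a label join.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical translation table: source predicate -> shared predicate.
# Predicates missing from the table are dropped, so the table also selects what gets indexed.
TRANSLATION: Dict[str, str] = {
    "imdb:actor_2_name": ":actor",
    "imdb:director_name": ":director",
    "imdb:country": ":country",
    "wdt:P57": ":director",  # Wikidata "director"; objects are still entity IRIs
    "wdt:P495": ":country",  # Wikidata "country of origin"
}

def select_and_translate(triples: Iterable[Triple], table: Dict[str, str]) -> List[Triple]:
    """Keep only triples whose predicate is in the table, renaming it to the shared predicate."""
    return [(s, table[p], o) for s, p, o in triples if p in table]

source = [
    ("imdb:tt0344854", "imdb:director_name", "Director X"),  # hypothetical value
    ("imdb:tt0344854", "imdb:color", "Color"),               # not selected: dropped
    ("wd:Q550232", "wdt:P495", "wd:Q30"),
]
aligned = select_and_translate(source, TRANSLATION)
```

Keeping the table as data, rather than code, makes it easy to add predicates incrementally and re-index.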
Results
o Find similar RDF resources to “Pirate Radio”
o Even a limited set of predicates returns acceptable results
o An important, independent alternative for entity matching
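The “Pirate Radio” case illustrates why shared features can succeed where title matching fails: the film was released as “The Boat That Rocked” in the UK. In this sketch, the feature sets are an illustrative subset (director Richard Curtis and part of the cast), the candidate keys are hypothetical, and plain set cosine stands in for the Random Indexing scores.

```python
import math
from typing import Dict, List, Set, Tuple

def cosine_sets(a: Set[str], b: Set[str]) -> float:
    """Cosine similarity of two binary feature vectors given as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

def most_similar(query: Set[str], candidates: Dict[str, Set[str]],
                 k: int = 3) -> List[Tuple[str, float]]:
    """Rank candidate resources by feature overlap with the query resource."""
    scored = [(name, cosine_sets(query, feats)) for name, feats in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]

# IMDB side: the record titled "Pirate Radio" (illustrative subset of its features).
pirate_radio = {"director:Richard Curtis", "actor:Philip Seymour Hoffman",
                "actor:Bill Nighy", "actor:Rhys Ifans"}
# Wikidata side: hypothetical keys; the first is the same film under its UK title.
wikidata = {
    "wd:hypothetical_boat_that_rocked": {"director:Richard Curtis", "actor:Philip Seymour Hoffman",
                                         "actor:Bill Nighy", "actor:Nick Frost"},
    "wd:hypothetical_other_film": {"director:Director X", "actor:Actor Y"},
}
ranking = most_similar(pirate_radio, wikidata)
```

The two sides list different subsets of the cast, yet the match still ranks first. This is the partial-matching behaviour that lets a limited set of predicates return acceptable results.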
Important Design Considerations
o Prefer RDF over property graphs
o Much richer technology ecosystem (schema, datasets, reasoning, strings vs. things)
o Virtualization versus consolidation
o Virtualization works only for simple lookup queries, not for real data integration
o Push result federation to the GraphQL data consumption layer
o Integrate Random Indexing in the KG database
o Push heavy computation as close to the data as possible
o Choose GraphQL over SPARQL for app developers
Questions & Answers

Semantic similarity for faster Knowledge Graph delivery at scale

  • 1. making sense of text and data October, 2019 Connected Data London Semantic Similarity for Faster Knowledge Graph Delivery at Scale
  • 2. Why Knowledge Graphs? “Cross-industry studies show that on average, less than half of an organization’s structured data is actively used in making decisions—and less than 1% of its unstructured data is analyzed or used at all” What’s Your Data Strategy? Leandro DalleMule and Thomas H. Davenport, Harvard Business Review Top 5 USA Banks
  • 3. Presentation Outline: Enterprise Knowledge Graphs; Smart Graphs with Embeddings; Implementing Knowledge Graphs
  • 4. What is a Knowledge Graph? Graph, Semantics, Smart, Alive
  • 5. Multiple Enterprise Data Management Systems KG platforms combine the capabilities of several enterprise systems: o Master and reference data management o Corporate/Enterprise Taxonomy o Data warehouse o Metadata management o Digital asset management o Enterprise search
  • 6. Challenges in Enterprise Semantic Integration. IMDB titles by type: TV Episodes 4’044’529; Short film 681’067; Feature film 516’726; Video 164’061; TV series 164’061; TV movies 126’206; …; Total* 5’838’514. WikiData titles by type: film 235’707; silent short film 16’377; television film 15’345; short film 11’225; animated film 3’785; …; Total 289’650. * Later tests use only the 5K crawled datasets.
  • 7. Challenges in Enterprise Semantic Integration Multiple levels of inconsistencies: o Types: “film” vs. “TV movie” o Metadata: “science fiction”, “military science fiction” vs. “Sci-Fi” o Reference data: “US” vs. “United States” o Manually curated cross-links (!) for testing purposes only
  • 8. A Classical Approach o Start with string matching of the titles: “Harry Potter and the Deathly Hallows: Part II” vs. “Harry Potter and the Deathly Hallows – Part 2”; “Perfume: The Story of a Murderer” vs. “Perfume”; “Pirate Radio” vs. “The Boat That Rocked”; “Avatar” vs. “Avatar” (4 movies)
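A minimal sketch of the title string-matching baseline. It uses Python's standard `difflib.SequenceMatcher` as a stand-in similarity metric; the deck does not say which string metric was actually used, so treat that choice as an assumption. It reproduces the failure modes listed above: near-identical titles score high, retitled releases score low, and identical titles score a perfect 1.0 even when they denote different films.

```python
from difflib import SequenceMatcher


def title_similarity(a: str, b: str) -> float:
    """Share of matching characters, in [0, 1], after lower-casing both titles."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Title pairs taken from the slide.
pairs = [
    ("Harry Potter and the Deathly Hallows: Part II",
     "Harry Potter and the Deathly Hallows – Part 2"),
    ("Perfume: The Story of a Murderer", "Perfume"),
    ("Pirate Radio", "The Boat That Rocked"),
    ("Avatar", "Avatar"),
]

for left, right in pairs:
    print(f"{title_similarity(left, right):.2f}  {left!r} vs. {right!r}")
```

A high score alone cannot tell the four "Avatar" films apart, which is why extra rules are needed.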
  • 9. A Classical Approach with Extra Rules o Add release date matching: 10% of the matches are lost due to bad dates o Ambiguity is greatly reduced, but many ambiguous candidates remain: tt0238520, 16 October 1995, 50 min; tt1125875, 11 April 1995, 48 min; tt0238520, 23 June 1995, 1h 21 min
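The extra release-date rule can be sketched as a conjunction of a title-similarity threshold and a date window. The thresholds (`min_ratio=0.9`, `max_days=365`), the `Title` record and the sample records are all hypothetical, chosen only to illustrate the trade-off: the date check removes ambiguous candidates, but any pair with a missing or unparseable date is rejected too, which is how such a rule loses true matches.

```python
from dataclasses import dataclass
from datetime import date
from difflib import SequenceMatcher
from typing import Optional


@dataclass(frozen=True)
class Title:
    uid: str
    name: str
    released: Optional[date]  # None stands for a missing or unparseable date


def is_match(a: Title, b: Title, min_ratio: float = 0.9, max_days: int = 365) -> bool:
    """Hypothetical rule: titles must be similar AND release dates must be close."""
    if SequenceMatcher(None, a.name.lower(), b.name.lower()).ratio() < min_ratio:
        return False
    if a.released is None or b.released is None:
        # A bad date rejects the pair outright, even if it is a true match.
        return False
    return abs((a.released - b.released).days) <= max_days


# Hypothetical records: three candidates share the title "Avatar".
left = Title("left:1", "Avatar", date(2009, 12, 18))
candidates = [
    Title("right:1", "Avatar", date(2009, 12, 10)),
    Title("right:2", "Avatar", date(2004, 1, 1)),
    Title("right:3", "Avatar", None),
]
print([c.uid for c in candidates if is_match(left, c)])
```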
  • 10. Presentation Outline: Enterprise Knowledge Graphs; Smart Graphs with Embeddings; Implementing Knowledge Graphs
  • 11. What is Knowledge Graph Embedding? o Predicts similar graph nodes or properties o Requires no input training data o Represents graph nodes mathematically as vectors [Figure: The Godfather (2h 58m) vs. American Pie (1h 15 min), plotted along the axes duration, drama and comedy]
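The vector picture on this slide can be made concrete with cosine similarity. The three axes (duration, drama, comedy) and the two runtimes come from the slide figure; the genre weights and the third film are hypothetical. This is a hand-built toy, not a learned embedding.

```python
import math

# Axes from the slide figure: (duration in hours, drama, comedy).
# Genre weights and the third film are hypothetical, for illustration only.
FILMS = {
    "The Godfather": [2 + 58 / 60, 1.0, 0.0],
    "American Pie": [1 + 15 / 60, 0.0, 1.0],
    "Hypothetical long drama": [2.5, 1.0, 0.0],
}


def cosine(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


query = FILMS["The Godfather"]
for name, vec in FILMS.items():
    if name != "The Godfather":
        print(f"{name}: {cosine(query, vec):.3f}")
```

Because duration is on a larger scale than the genre flags, it dominates the score here; an embedding method learns its dimensions instead of having them picked by hand.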
  • 12. Knowledge Graph Embedding Example o For each film, include all actors, the director and the country of origin o The result is a vast matrix of entities and literals [Table: documents (rows) wd:Q550232, imdb:tt0344854, … against terms (columns) [Actor] “Adam LeFevre”, [Actor] “Anthony Anderson”, [Actor] “Mia Farrow”, [Country] “France”, [Country] “US”, [Country] “United states”, [Director] “Luc Besson”, …; a cell is 1 when the film has that term, with five such cells in the wd:Q550232 row and four in the imdb:tt0344854 row]
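A small sketch of how such a document-term matrix is assembled: one row per film, one column per distinct typed term, 1 where the film has the term. The film IDs and their term assignments are hypothetical; only the term labels mirror the slide's column headers.

```python
from itertools import chain

# Hypothetical films; the typed terms mirror the slide's column headers.
films = {
    "film:A": ["[Actor] Adam LeFevre", "[Actor] Mia Farrow", "[Country] US"],
    "film:B": ["[Actor] Adam LeFevre", "[Country] United states"],
    "film:C": ["[Director] Luc Besson", "[Country] France"],
}

terms = sorted(set(chain.from_iterable(films.values())))  # one column per distinct term
docs = sorted(films)                                        # one row per film
matrix = [[1 if term in films[doc] else 0 for term in terms] for doc in docs]

for doc, row in zip(docs, matrix):
    print(doc, row)
```

Note that "[Country] US" and "[Country] United states" end up as separate columns, and most cells are 0. At the scale quoted on the next slide (250K × 1.45M), a dense matrix like this is impractical, which motivates a dimension-reduction step.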
  • 13. Random Indexing (RI) Algorithm o Reduces the matrix dimension with random elemental (index) vectors o For each term w, calculate a context vector S(w) by summing the elemental vectors of all items x appearing in the context of w o Light-weight and fast (a 250K × 1.45M matrix in < 5 min) o Fast, sub-second searches that require limited RAM [Table: movies × actors (Adam LeFevre, Anthony Anderson, Mia Farrow), labelled “Elemental vectors”; wd:Q550232 → 1 1 1; imdb:tt0344854 → 1 0 1; …]
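A minimal Random Indexing sketch, assuming the common document-as-context variant: each movie gets a sparse random ternary elemental vector, and a term's context vector S(w) is the sum of the elemental vectors of the movies it appears in. The dimension, the number of non-zero entries and the sample data are hypothetical; this is not the GraphDB plugin's implementation. Each term vector has a fixed length `dim`, however many movies there are, which is where the dimension reduction comes from.

```python
import random
from collections import defaultdict


def elemental_vector(dim, nonzeros, rng):
    """Sparse ternary random vector: `nonzeros` random positions set to +1 or -1."""
    vec = [0] * dim
    for pos in rng.sample(range(dim), nonzeros):
        vec[pos] = rng.choice((1, -1))
    return vec


def random_indexing(docs, dim=100, nonzeros=4, seed=0):
    """Return (elemental vector per document, context vector S(w) per term)."""
    rng = random.Random(seed)
    # Assumption: each document (movie) is one context with its own elemental vector.
    elemental = {d: elemental_vector(dim, nonzeros, rng) for d in sorted(docs)}
    # S(w): sum of the elemental vectors of every context in which term w appears.
    context = defaultdict(lambda: [0] * dim)
    for d in sorted(docs):
        for w in set(docs[d]):
            acc = context[w]
            for i, x in enumerate(elemental[d]):
                acc[i] += x
    return elemental, dict(context)


# Hypothetical movie -> actor/director data.
docs = {
    "film:A": ["Adam LeFevre", "Mia Farrow"],
    "film:B": ["Adam LeFevre", "Anthony Anderson"],
    "film:C": ["Luc Besson"],
}
elemental, context = random_indexing(docs, dim=50)
print(len(context), "terms, each a", len(context["Adam LeFevre"]), "dimensional vector")
```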
  • 14. Random Indexing (RI) Algorithm #2 o Supports similarity searches for: Document to Document – similar movies; Document to Term – specific actors/directors; Term to Term – similar actors/directors; Term to Document – movies specific to this actor/director o Has all the properties of a vector space model o Partial matching, weights, ranking, plus context-sensitive semantic search
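The four search directions work because terms and documents can live in one vector space. A sketch under stated assumptions: term vectors are built as in standard Random Indexing, and each document vector is taken to be the sum of its term vectors (one common choice, assumed here, not confirmed by the deck). All four searches then reduce to a cosine ranking with a hypothetical `nearest` helper.

```python
import math
import random


def build_space(docs, dim=256, nonzeros=4, seed=0):
    """Hypothetical RI space holding both term and document vectors."""
    rng = random.Random(seed)
    elemental = {}
    for d in sorted(docs):
        vec = [0] * dim
        for pos in rng.sample(range(dim), nonzeros):
            vec[pos] = rng.choice((1, -1))
        elemental[d] = vec
    # Term vector: sum of the elemental vectors of the documents containing the term.
    terms = {}
    for d in sorted(docs):
        for w in set(docs[d]):
            acc = terms.setdefault(w, [0] * dim)
            for i, x in enumerate(elemental[d]):
                acc[i] += x
    # Document vector: sum of its term vectors, so documents and terms share one space.
    documents = {}
    for d in sorted(docs):
        acc = [0] * dim
        for w in set(docs[d]):
            for i, x in enumerate(terms[w]):
                acc[i] += x
        documents[d] = acc
    return documents, terms


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(query, candidates, k=3, exclude=None):
    """Rank candidate vectors by cosine similarity to the query vector."""
    scored = [(name, cosine(query, vec)) for name, vec in candidates.items() if name != exclude]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


docs = {  # hypothetical movie -> actor/director/country data
    "film:A": ["Adam LeFevre", "Mia Farrow"],
    "film:B": ["Adam LeFevre", "Anthony Anderson"],
    "film:C": ["Luc Besson", "France"],
}
documents, terms = build_space(docs)

print(nearest(documents["film:A"], documents, exclude="film:A"))     # document to document
print(nearest(documents["film:A"], terms))                           # document to term
print(nearest(terms["Adam LeFevre"], terms, exclude="Adam LeFevre"))  # term to term
print(nearest(terms["Luc Besson"], documents))                       # term to document
```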
  • 15. Presentation Outline: Enterprise Knowledge Graphs; Smart Graphs with Embeddings; Implementing Knowledge Graphs
  • 16. Reference Software Architecture o Easy consumption of data o No backend development o Flexible data processing tools o Standard and open interfaces [Diagram components: KG Consumers; GQL query, GQL mutation and GQL Federation; Ontotext Platform; SPARQL; GraphDB with the Similarity Plugin; RDF / Structured data]
  • 17. Transform CSV to RDF o Perform standard ETL tasks o Trim spaces, parse numbers and dates o Parse IMDB IDs from links (for testing) o Map table data to RDF o Run SPARQL over tabular data o Split multi-valued fields like “Action|Thriller” o Schema-level alignment not yet applied
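The ETL steps above can be sketched in plain Python that emits N-Triples. Everything specific here is an assumption: the `http://example.org/` namespace, the column names (`movie_title`, `duration`, `genres`, `movie_imdb_link`) and the sample row are hypothetical, and the deck's own pipeline ran SPARQL over tabular data rather than a script like this. Date parsing is omitted to keep the sketch short.

```python
import csv
import io
import re
from typing import Optional

EX = "http://example.org/"  # hypothetical namespace; the deck does not show the real IRIs
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"
IMDB_ID = re.compile(r"/title/(tt\d+)")  # IMDB IDs: "tt" followed by digits


def lit(value: str, datatype: Optional[str] = None) -> str:
    """N-Triples literal with backslashes, quotes and newlines escaped."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"^^<{datatype}>' if datatype else f'"{escaped}"'


def row_to_triples(row: dict) -> list:
    # Trim spaces in headers and values (DictReader keeps surrounding whitespace).
    row = {(k or "").strip(): (v or "").strip() for k, v in row.items()}
    match = IMDB_ID.search(row.get("movie_imdb_link", ""))
    if match is None:
        return []  # no parsable IMDB ID, so no stable subject IRI
    subject = f"<{EX}imdb/{match.group(1)}>"
    triples = [f"{subject} <{EX}title> {lit(row.get('movie_title', ''))} ."]
    if row.get("duration", "").isdigit():  # parse numbers; skip malformed values
        triples.append(f"{subject} <{EX}duration> {lit(str(int(row['duration'])), XSD_INTEGER)} .")
    for genre in row.get("genres", "").split("|"):  # split multi-valued fields
        if genre.strip():
            triples.append(f"{subject} <{EX}genre> {lit(genre.strip())} .")
    return triples


# A made-up row with stray spaces and a multi-valued genre field.
SAMPLE = (
    "movie_title,duration,genres,movie_imdb_link\n"
    "  Example Movie ,118,Action|Thriller,http://www.imdb.com/title/tt0000000/\n"
)
for row in csv.DictReader(io.StringIO(SAMPLE)):
    print("\n".join(row_to_triples(row)))
```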
  • 18. Similarity Plugin API o Accepts a graph described by <s, p, o> triples [Table (subject, predicate, object): wd:Q550232 :actor “Adam LeFevre”; imdb:tt0344854 :actor “Adam LeFevre”; …] o Indexes any RDF types o Works with virtual overlays like: [Diagram: wd:Q550232 wdt:P161 wd:Q2702964; wd:Q2702964 rdfs:label “Adam LeFevre”; imdb:tt0344854 imdb:actor_2_name “Adam LeFevre”]
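One way to read the "virtual overlay" idea is as a derived view: a Wikidata cast-member path (movie → `wdt:P161` → actor item → `rdfs:label`) and an IMDB literal column are both exposed as `:actor` triples without rewriting the sources. The sketch below is a hypothetical illustration of that idea, not the Similarity Plugin's API; `wd:actor_item` is a made-up placeholder for the actor's Wikidata item. It then groups the triples into the document-to-terms form that an RI index consumes.

```python
from collections import defaultdict

# Raw triples modelled on the slide's overlay diagram; "wd:actor_item" is a placeholder.
raw = [
    ("wd:Q550232", "wdt:P161", "wd:actor_item"),
    ("wd:actor_item", "rdfs:label", "Adam LeFevre"),
    ("imdb:tt0344854", "imdb:actor_2_name", "Adam LeFevre"),
]


def virtual_actor_triples(triples):
    """Lazily derive (movie, ':actor', name) triples from both source shapes."""
    labels = {s: o for s, p, o in triples if p == "rdfs:label"}
    for s, p, o in triples:
        if p == "wdt:P161" and o in labels:  # Wikidata: cast member -> item -> label
            yield (s, ":actor", labels[o])
        elif p == "imdb:actor_2_name":       # IMDB: the actor name is already a literal
            yield (s, ":actor", o)


def to_documents(triples):
    """Group <s, p, o> into subject -> {"predicate object"} terms."""
    docs = defaultdict(set)
    for s, p, o in triples:
        docs[s].add(f"{p} {o}")
    return dict(docs)


docs = to_documents(virtual_actor_triples(raw))
print(docs)
```

Both movies now share the term ":actor Adam LeFevre", which is what lets a similarity index relate them.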
  • 19. Specify KG Embeddings – Select Predicates o The similarity plugin expects <s, p, o> triples
  • 20. Specify KG Embeddings – Align Schema o Define a translation table that maps the source predicates to shared ones
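Predicate selection and schema alignment can both be expressed with one translation table: predicates not in the table are dropped, and the rest are renamed to a shared predicate. The table below is hypothetical. The `imdb:` predicate names are assumptions, while `wdt:P161`, `wdt:P57` and `wdt:P495` are Wikidata's "cast member", "director" and "country of origin" properties. Objects are assumed to be already resolved to labels.

```python
# Hypothetical translation table: source predicate -> shared predicate.
# Only predicates listed here are selected for the embedding.
TRANSLATION = {
    "imdb:actor_2_name": ":actor",
    "imdb:director_name": ":director",
    "imdb:country": ":country",
    "wdt:P161": ":actor",     # Wikidata "cast member"
    "wdt:P57": ":director",   # Wikidata "director"
    "wdt:P495": ":country",   # Wikidata "country of origin"
}


def align(triples, translation=TRANSLATION):
    """Drop unselected predicates and rename the selected ones to the shared schema."""
    return [(s, translation[p], o) for s, p, o in triples if p in translation]


# Objects are assumed to be already resolved to labels (e.g. via rdfs:label).
triples = [
    ("wd:Q550232", "wdt:P161", "Adam LeFevre"),
    ("imdb:tt0344854", "imdb:actor_2_name", "Adam LeFevre"),
    ("imdb:tt0344854", "imdb:num_voted_users", "12345"),  # not selected, so dropped
]
print(align(triples))
```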
  • 21. Results o Find RDF resources similar to “Pirate Radio” o Even a limited set of predicates returns acceptable results o An important independent alternative for entity matching
  • 22. Important Design Considerations o Prefer RDF over property graphs o Much richer technology ecosystem (schema, datasets, reasoning, strings vs. things) o Virtualization versus consolidation o Virtualization works only for simple lookup queries, not for real data integration o Push result federation to the GraphQL data consumption layer o Integrate Random Indexing in the KG database o Push heavy computation as close to the data as possible o Choose GraphQL over SPARQL for app developers: