What makes a linked data pattern interesting?
Szymon Klarman
Department of Computer Science
Brunel University London
June 7, 2016
Connected Data London
#ConnectedData2016
Linked Data
• data/knowledge represented in the W3C standards OWL and RDF(S)
• flexible, unrestrictive, extensible
• machine (and human) accessible
• connected into a global Web of Data
• (open) and reusable (and when combined, great things might happen!)
• perfectly functional in closed environments, too
RDF(S) = graph structure + logical inference
[Figure: an example RDF graph: node b (type Regulation) is linked by has participant A to node p (type Protein); the graph also carries the label "GRB2 regulates GAB1" and the entity id UniProt:P34723.]
[Figure: the same graph with an ontology added: Regulation subclass of Molecular interaction, Molecular interaction subclass of Biological event; has participant A and has participant B subproperty of has participant; has participant has domain Biological event and range Chemical.]
[Figure: after inference, the graph additionally contains b has participant p, b type Molecular interaction, b type Biological event, and p type Chemical.]
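The inference step on this slide can be sketched as forward chaining until nothing new is derived. This is a minimal illustration, not a complete RDFS reasoner. It assumes that triples are plain (subject, predicate, object) string tuples. It also assumes that only four RDFS entailment rules matter: subclass, subproperty, domain and range. The short names `type`, `subClassOf` and so on stand in for the real rdf:/rdfs: IRIs, and the domain and range triples are as read from the figure.

```python
# A minimal forward-chaining sketch of RDF(S) inference. It implements only
# four RDFS entailment rules (rdfs2 domain, rdfs3 range, rdfs7 subproperty,
# rdfs9 subclass) over plain string triples; it is not a complete reasoner.
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# Short stand-ins for rdf:type, rdfs:subClassOf, etc. (an assumption).
TYPE, SUBCLASS, SUBPROP, DOMAIN, RANGE = (
    "type", "subClassOf", "subPropertyOf", "domain", "range")


def saturate(triples: Set[Triple]) -> Set[Triple]:
    """Apply the four rules repeatedly until no new triple is derived."""
    closure = set(triples)
    while True:
        derived: Set[Triple] = set()
        for (s, p, o) in closure:
            for (s2, p2, o2) in closure:
                if p == TYPE and p2 == SUBCLASS and s2 == o:
                    derived.add((s, TYPE, o2))   # rdfs9: x type C, C sub D => x type D
                if p2 == SUBPROP and s2 == p:
                    derived.add((s, o2, o))      # rdfs7: x P y, P sub Q => x Q y
                if p2 == DOMAIN and s2 == p:
                    derived.add((s, TYPE, o2))   # rdfs2: x P y, P domain C => x type C
                if p2 == RANGE and s2 == p:
                    derived.add((o, TYPE, o2))   # rdfs3: x P y, P range C => y type C
        if derived <= closure:                   # fixpoint reached
            return closure
        closure |= derived


# The example graph and ontology from the slides.
data = {
    ("b", TYPE, "Regulation"),
    ("p", TYPE, "Protein"),
    ("b", "has participant A", "p"),
}
ontology = {
    ("Regulation", SUBCLASS, "Molecular interaction"),
    ("Molecular interaction", SUBCLASS, "Biological event"),
    ("has participant A", SUBPROP, "has participant"),
    ("has participant B", SUBPROP, "has participant"),
    ("has participant", DOMAIN, "Biological event"),
    ("has participant", RANGE, "Chemical"),
}

# Keep only the triples that inference added.
inferred = saturate(data | ontology) - data - ontology
for triple in sorted(inferred):
    print(triple)
```

Subclass chains need no separate transitivity rule here, because repeated application of rdfs9 already walks up the hierarchy one step per iteration.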
Querying:
( ?z has participant ?y ), ( ?z type Biological event ), ( ?y type Chemical )
[Figure: this query pattern evaluated over the graph and ontology above.]
Linked data mining
Emerging field: the Workshop on Knowledge Discovery and Data Mining Meets Linked Open Data has run since 2012 (plus the Linked Data Mining Challenge).
Problems:
• finding novel/surprising/interesting linked data patterns
• identifying relevant semantic connections
• predicting facts/links in knowledge graphs
The most modest yet fundamental task:
What’s in that linked data set?
• The Web of Data will soon contain a lot of significant answers (42!)...
• ...so we need to know how to ask the right questions...
• ...so we need to understand what’s in these data sets.
Examples are from the Big Mechanism project (http://52.26.26.74/).
So what’s in that linked data set?
Too much, too noisy...
So what’s in that linked data set?
No structure...
Ontologies on the Web of Data
Concept & property hierarchies + type assertions make up most of the Web of Data.
B. Glimm, A. Hogan, M. Krötzsch, A. Polleres: "OWL: Yet to arrive on the Web of Data?", 2012
Typical ontologies don’t reflect the actual graph structure of data...
[Figure: a typical ontology, with classes Biological event, Molecular interaction, Chemical, Event, Statement, Article, Journal and Submitter and properties represents / is represented by, is extracted from, published in, has participant and has submitter.]
The actual "conceptual data model":
[Figure: GRB2_regulates_GAB1 (type Regulation) has participant A GRB2_MOUSE and has participant B GAB1_MOUSE (type Protein); statement_1 (type Statement) represents it, has submitter NaCTeM (type Submitter) and is extracted from PMC123456 (type Article).]
Linked data pattern
[Figure: the same graph with the individuals replaced by the variables ?z, ?u, ?x, ?y, ?v and ?w. It keeps the edges has participant A/B, represents / is represented by, has submitter and extracted from. It also keeps the types Regulation, Protein, Statement, Article, Submitter and Biological event.]
Linked data pattern ≈ conjunctive query / graph query
A query is a set of triple patterns of the form:
( ?x type Concept )
( ?x Property ?y )
Linked data mining ≈ search through the query space
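A pattern read as a conjunctive query is answered by finding every assignment of its variables that maps all of its triples into the graph. The following is a naive backtracking sketch in Python, not how a production engine works; a real system would presumably use an indexed SPARQL store. It assumes triples are string tuples and that variables are strings starting with `?`, as on the slides. The inferred triples are added by hand here rather than computed.

```python
# Naive evaluation of a linked data pattern read as a conjunctive query:
# find every variable assignment that maps all triple patterns into the graph.
from typing import Dict, Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    """Variables are written ?x, ?y, ... as on the slides."""
    return term.startswith("?")


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` maps onto `triple`, or return None."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # Bind a fresh variable, or check consistency with an earlier binding.
            if extended.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return extended


def answers(query: List[Triple], graph: Set[Triple],
            binding: Optional[Binding] = None) -> Iterator[Binding]:
    """Backtracking search: satisfy the first pattern, then recurse on the rest."""
    binding = {} if binding is None else binding
    if not query:
        yield binding
        return
    for triple in graph:
        extended = match(query[0], triple, binding)
        if extended is not None:
            yield from answers(query[1:], graph, extended)


# The querying example: ?z has participant ?y, ?z is a Biological event, ?y a Chemical.
query = [("?z", "has participant", "?y"),
         ("?z", "type", "Biological event"),
         ("?y", "type", "Chemical")]

raw_graph = {("b", "type", "Regulation"), ("p", "type", "Protein"),
             ("b", "has participant A", "p")}
# The same graph after RDF(S) inference (inferred triples added by hand here).
inferred_graph = raw_graph | {("b", "has participant", "p"),
                              ("b", "type", "Molecular interaction"),
                              ("b", "type", "Biological event"),
                              ("p", "type", "Chemical")}

print(list(answers(query, raw_graph)))       # no match without inference
print(list(answers(query, inferred_graph)))  # one match: ?z = b, ?y = p
```

The contrast between the two calls is the point of the earlier slides: over the raw graph the pattern has no match, and only the inferred triples make it answerable.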
When is a linked data pattern interesting?
Two evaluation criteria:
• Frequency: the pattern has relatively many matches in the data set;
• Semantic content: the pattern carries relatively much information.
Frequency is the central criterion in the related problem of frequent subgraph mining in graph and multi-relational data.
⇢ linked data is graph data.
The semantic content criterion originates in logical/semantic theories of information and is used in inductive logic programming.
⇢ linked data is grounded in logic.
There is an inherent trade-off between the two criteria.
Frequency
The most frequent linked data patterns will always be the most general ones:
• "X is something...": ( ?x type owl:Thing )
• "Something is somehow related to something else...": ( ?x type owl:Thing ), ( ?x owl:topObjectProperty ?y ), ( ?y type owl:Thing )
X is an event of type...?
Semantic content
[Figure: the concept hierarchy regulation ⊑ molecular interaction ⊑ biological event ⊑ owl:Thing.]
The more possibilities you exclude, the more you say.
Semantic content
The linked data pattern with the most semantic content is the entire RDF graph...
Pattern Q1 has more semantic content than pattern Q2 (over ontology O) if Q1 (together with O) logically entails Q2, but not vice versa. For example:
Q1 = ( ?z has participant A ?y ), ( ?z type Regulation ), ( ?y type Protein )
Q2 = ( ?z has participant ?y ), ( ?z type Biological event ), ( ?y type Chemical )
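The entailment test that orders patterns by semantic content can be checked mechanically in this simple setting. The approach is the classic "freezing" test for conjunctive query containment: turn Q1's variables into fresh constants, saturate them with O, and check whether Q2 matches the result. This is a sketch under stated assumptions. It treats patterns as existentially closed conjunctions and covers only the four RDFS rules used above. The ontology triples are as read from the figure, and the `const:` prefix is an arbitrary choice.

```python
# Sketch: does pattern Q1 (with ontology O) entail pattern Q2? Freeze Q1's
# variables into constants, saturate with O, and test whether Q2 matches.
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def saturate(triples: Set[Triple]) -> Set[Triple]:
    """Forward-chain four RDFS rules (subclass, subproperty, domain, range)."""
    closure = set(triples)
    while True:
        derived = set()
        for (s, p, o) in closure:
            for (s2, p2, o2) in closure:
                if p == "type" and p2 == "subClassOf" and s2 == o:
                    derived.add((s, "type", o2))
                if s2 == p and p2 == "subPropertyOf":
                    derived.add((s, o2, o))
                if s2 == p and p2 == "domain":
                    derived.add((s, "type", o2))
                if s2 == p and p2 == "range":
                    derived.add((o, "type", o2))
        if derived <= closure:
            return closure
        closure |= derived


def has_match(query: List[Triple], graph: Set[Triple], binding: Dict[str, str]) -> bool:
    """True if some assignment of the variables maps every pattern into the graph."""
    if not query:
        return True
    for triple in graph:
        extended, ok = dict(binding), True
        for q_term, g_term in zip(query[0], triple):
            if q_term.startswith("?"):
                ok = extended.setdefault(q_term, g_term) == g_term
            else:
                ok = q_term == g_term
            if not ok:
                break
        if ok and has_match(query[1:], graph, extended):
            return True
    return False


def entails(q1: List[Triple], q2: List[Triple], ontology: Set[Triple]) -> bool:
    """Q1 with O entails Q2 iff Q2 matches the saturated, frozen copy of Q1."""
    frozen = {tuple(t.replace("?", "const:", 1) if t.startswith("?") else t
                    for t in atom) for atom in q1}
    return has_match(q2, saturate(frozen | ontology), {})


O = {("Regulation", "subClassOf", "Molecular interaction"),
     ("Molecular interaction", "subClassOf", "Biological event"),
     ("has participant A", "subPropertyOf", "has participant"),
     ("has participant B", "subPropertyOf", "has participant"),
     ("has participant", "domain", "Biological event"),
     ("has participant", "range", "Chemical")}
Q1 = [("?z", "has participant A", "?y"), ("?z", "type", "Regulation"), ("?y", "type", "Protein")]
Q2 = [("?z", "has participant", "?y"), ("?z", "type", "Biological event"), ("?y", "type", "Chemical")]

print(entails(Q1, Q2, O))  # Q1 with O entails Q2
print(entails(Q2, Q1, O))  # but not the other way round
```

Because the entailment only holds in one direction, Q1 has strictly more semantic content than Q2 in the sense of the slide.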
Trade-off
VALUE(Q) = weighted sum of FREQ(Q) and CONT(Q), where
FREQ(Q) = #answers / #possible answers
CONT(Q) = 1 − Prob(Q is true a priori)
[Chart: curves labelled Value, Freq and Cont; y axis from 0 to 1.2, x axis from 0 to 900.]
Q1 = textual_entity(x)
Q2 = statement(x)
Q3 = event(x)
Q4 = journal_article(x), published_in(x, u), journal(u),
is_extracted_from(w, x), statement(w), contained_in(w, y),
table(y), represents(w, v), negative_regulation(v),
has_submitter(y, z), submitter(z), [...] (10 variables)
Q5 = table(x), has_submitter(x, z), submitter(z), contains_statement(x, y), statement(y), contained_in(y, x)
Q6 = positive_regulation(z), is_represented_by(z, y), statement(y), represents(y, z), contained_in(y, x),
table(x), has_submitter(x, v), submitter(v), contains_statement(x, y).
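The VALUE(Q) trade-off can be written out as a few small functions. This is a minimal sketch under stated assumptions. The slides do not say how #possible answers or the prior probability of Q are obtained, so both are plain inputs here. The `weight` parameter is a hypothetical knob with an arbitrary default of 0.5. The three example patterns use toy numbers made up purely for illustration, not results from the Big Mechanism data.

```python
# Sketch of the VALUE(Q) trade-off: a weighted sum of frequency and content.
from dataclasses import dataclass


@dataclass
class PatternStats:
    n_answers: int     # #answers of Q in the data set
    n_possible: int    # #possible answers (must be > 0)
    prior_true: float  # Prob(Q is true a priori), supplied by the caller


def freq(s: PatternStats) -> float:
    """FREQ(Q) = #answers / #possible answers."""
    return s.n_answers / s.n_possible


def cont(s: PatternStats) -> float:
    """CONT(Q) = 1 - Prob(Q is true a priori)."""
    return 1.0 - s.prior_true


def value(s: PatternStats, weight: float = 0.5) -> float:
    """weight * FREQ + (1 - weight) * CONT; `weight` is a hypothetical knob."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * freq(s) + (1.0 - weight) * cont(s)


# Toy numbers, made up purely for illustration.
general = PatternStats(n_answers=900, n_possible=1000, prior_true=0.95)
specific = PatternStats(n_answers=50, n_possible=1000, prior_true=0.2)
balanced = PatternStats(n_answers=400, n_possible=1000, prior_true=0.4)

for name, stats in [("general", general), ("specific", specific), ("balanced", balanced)]:
    print(name, round(freq(stats), 3), round(cont(stats), 3), round(value(stats), 3))
```

With equal weights, the very general pattern is penalized for saying little and the very specific one for matching rarely. The balanced pattern scores highest, which mirrors the trade-off the chart depicts.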
Algorithm
The space of all patterns over realistic linked data sets is virtually infinite.
But there are some good search heuristics:
• use precomputed "promising" building blocks;
• "climb up" from the most successful queries found so far (but use a restart rule to avoid getting stuck in a local optimum).
[Chart: curves labelled Value, Freq and Cont; y axis from 0 to 1.2, x axis from 0 to 900.]
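The two heuristics can be sketched as greedy hill climbing with random restarts. Everything in this example is hypothetical and not taken from the author's implementation:
- the building blocks;
- the neighbourhood, which adds one block or drops one atom;
- the restart rule, which climbs again from a random one-atom seed;
- `toy_value`, a made-up stand-in for VALUE(Q) that depends only on pattern size.

A real miner would presumably compute VALUE(Q) from the data set and ontology and keep patterns connected.

```python
# Sketch of the search heuristics: greedy hill climbing over patterns built
# from precomputed building blocks, with random restarts.
import random
from typing import Callable, FrozenSet, List, Optional, Sequence, Tuple

Atom = Tuple[str, str, str]
Query = FrozenSet[Atom]


def neighbours(q: Query, blocks: Sequence[Atom]) -> List[Query]:
    """Patterns one step away: add one building block, or drop one atom."""
    grown = [q | {b} for b in blocks if b not in q]
    shrunk = [q - {a} for a in q] if len(q) > 1 else []
    return grown + shrunk


def hill_climb(start: Query, blocks: Sequence[Atom],
               score: Callable[[Query], float]) -> Tuple[Query, float]:
    """Move to the best-scoring neighbour until no neighbour improves."""
    current, current_score = start, score(start)
    while True:
        candidates = neighbours(current, blocks)
        if not candidates:
            return current, current_score
        best = max(candidates, key=score)
        if score(best) <= current_score:
            return current, current_score  # local optimum reached
        current, current_score = best, score(best)


def search(blocks: Sequence[Atom], score: Callable[[Query], float],
           restarts: int = 5, seed: int = 0) -> Tuple[Optional[Query], float]:
    """Restart rule: climb from several random one-atom seeds, keep the best."""
    rng = random.Random(seed)
    best: Optional[Query] = None
    best_score = float("-inf")
    for _ in range(restarts):
        q, s = hill_climb(frozenset([rng.choice(blocks)]), blocks, score)
        if s > best_score:
            best, best_score = q, s
    return best, best_score


# Hypothetical building blocks, loosely modelled on the slides' pattern.
BLOCKS: List[Atom] = [
    ("?z", "type", "Regulation"),
    ("?z", "has participant A", "?x"),
    ("?x", "type", "Protein"),
    ("?u", "represents", "?z"),
    ("?u", "type", "Statement"),
]


def toy_value(q: Query) -> float:
    """Made-up stand-in for VALUE(Q): frequency falls and content grows with size."""
    n = len(q)
    return 0.5 * 0.8 ** n + 0.5 * (1 - 0.6 ** n)


best, best_score = search(BLOCKS, toy_value)
print(len(best), round(best_score, 3))
```

Hill climbing only guarantees a local optimum, which is why the restart rule matters once the scoring landscape is less regular than this toy one.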
What’s next...
The question "what’s in that linked data set?" is perhaps not the major one, but the suggested notion of interestingness might well be:
• the "frequency vs. semantic content" trade-off reflects the dual (graphical and logical) nature of the RDF(S) representation model;
• many linked data mining tasks can be described as: given Q2, find an interesting Q1 such that Q1 ⇢ Q2;
• other, more abstract criteria might also be necessary.
Linked data mining requires novel principles and foundational approaches.


GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 

Recently uploaded (20)

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 

What makes a linked data pattern interesting?

  • 1. What makes a linked data pattern interesting? Szymon Klarman, Department of Computer Science, Brunel University London. June 7, 2016, Connected Data London (#ConnectedData2016).
  • 2. Linked Data  data/knowledge represented in W3C standards OWL/RDF(S)  flexible, unrestrictive, extendible  machine (and human) accessible  connected into a global Web of Data  (open) and reusable (and when combined great things might happen!)  perfectly functional also in closed environments
  • 3. RDF(S) = graph structure + logical inference. [Diagram: node b (type Regulation, label "GRB2 regulates GAB1") is linked by has participant A to node p (type Protein, has entity id UniProt:P34723).]
  • 4. RDF(S) = graph structure + logical inference. [Diagram: b has participant A p; b type Regulation; p type Protein.]
  • 5. RDF(S) = graph structure + logical inference. [Diagram: the same graph plus an ontology: Regulation subclass of Molecular interaction; Molecular interaction subclass of Biological event; has participant A (and has participant B) subproperty of has participant; has participant with domain Biological event and range Chemical.]
  • 6. RDF(S) = graph structure + logical inference. [Diagram: the same graph and ontology with the inferred facts added: b has participant p; b type Molecular interaction and Biological event; p type Chemical.]
  • 7. RDF(S) = graph structure + logical inference. Querying: ?z has participant ?y; ?z type Biological event; ?y type Chemical. [Diagram: the query next to the graph and ontology from the previous slides.] The query matches ?z = b, ?y = p, but only through the inferred facts.
  • 8. Linked data mining. Emerging field: Workshop on Knowledge Discovery and Data Mining Meets Linked Open Data since 2012 (+ Linked Data Mining Challenge). Problems: finding novel/surprising/interesting linked data patterns; identifying relevant semantic connections; predicting facts/links in knowledge graphs. Most modest yet fundamental task: what's in that linked data set? The Web of Data will soon contain a lot of significant answers (42!)... so we need to know how to ask the right question... so we need to understand what's in these data sets. Examples are from the Big Mechanism project (http://52.26.26.74/).
  • 9. So what’s in that linked data set?
  • 10. So what’s in that linked data set?
  • 11. So what's in that linked data set? Too much, too noisy...
  • 12. So what’s in that linked data set?
  • 13. So what’s in that linked data set? No structure...
  • 14. Ontologies on the Web of Data. Concept and property hierarchies plus type assertions make up most of the Web of Data (B. Glimm, A. Hogan, M. Krötzsch, A. Polleres: "OWL: Yet to arrive on the Web of Data?", 2012). Typical ontologies don't reflect the actual graph structure of data...
  • 15. The actual "conceptual data model" of the data set. [Diagram: classes Biological event / Event, Molecular interaction, Chemical, Statement, Article, Journal and Submitter, connected by has participant, type, represents / is represented by, is extracted from, published in and has submitter.]
  • 16. Linked data pattern. [Diagram: GRB2_regulates_GAB1 (type Regulation, Biological event) has participant A GRB2_MOUSE and has participant B GAB1_MOUSE (both type Protein); it is represented by statement_1 (type Statement), which represents it and is extracted from PMC123456 (type Article); NaCTeM (type Submitter) is attached via has submitter.]
  • 17. Linked data pattern. [Diagram: the graph from slide 16 with every individual replaced by a variable (?z, ?u, ?x, ?y, ?v, ?w), keeping the properties (has participant A/B, represents, is represented by, is extracted from, has submitter) and the classes (Regulation, Biological event, Protein, Statement, Article, Submitter).]
  • 18. Linked data pattern ≈ conjunctive query / graph query. [Diagram: the variable pattern from slide 17.] A query is a set of triples of the form (?x type Concept) and (?x Property ?y). Linked data mining ≈ search through the query space.
  • 19. When is a linked data pattern interesting? Two evaluation criteria: Frequency: the pattern has relatively many matches in the data set; Semantic content: the pattern contains relatively much information. Frequency is the central criterion for the related problem of frequent subgraph mining in the graph and multi-relational data setting (⇢ linked data is graph data). The semantic content criterion originates in logical/semantic theories of information and is used in inductive logic programming (⇢ linked data is grounded in logic). There is an inherent trade-off between the two criteria.
  • 20. Frequency. The most frequent linked data patterns out there will always be: "X is something..." (?x type owl:Thing) and "Something is somehow related to something else..." (?x owl:topObjectProperty ?y, with ?x and ?y of type owl:Thing). "X is an event of type..."?
  • 21. Semantic content. The more possibilities you exclude, the more you say. [Diagram: regulation nested inside molecular interaction, inside biological event, inside owl:Thing.]
  • 22. Semantic content. The linked data pattern with the most semantic content is the entire RDF graph... Pattern Q1 has more semantic content than pattern Q2 (over ontology O) if Q1 (together with O) logically entails Q2. Example: Q1 = {?z has participant A ?y; ?z type Regulation; ?y type Protein} entails Q2 = {?z has participant ?y; ?z type Biological event; ?y type Chemical}.
  • 23. Trade-off. FREQ(Q) = #answers / #possible answers; CONT(Q) = 1 - Prob(Q is true a priori); VALUE(Q) = weighted sum of FREQ(Q) and CONT(Q). [Chart: Value, Freq and Cont curves; y-axis 0 to 1.2, x-axis 0 to 900.]
  • 24. Trade-off. [Chart: Value, Freq and Cont for sample patterns, as on slide 23.] Q1 = textual_entity(x); Q2 = statement(x); Q3 = event(x); Q4 = journal_article(x), published_in(x, u), journal(u), is_extracted_from(w, x), statement(w), contained_in(w, y), table(y), represents(w, v), negative_regulation(v), has_submitter(y, z), submitter(z), [...] (10 variables); Q5 = table(x), has_submitter(x, z), submitter(z), contains_statement(x, y), statement(y), contained_in(y, x); Q6 = positive_regulation(z), is_represented_by(z, y), statement(y), represents(y, z), contained_in(y, x), table(x), has_submitter(x, v), submitter(v), contains_statement(x, y).
  • 25. Algorithm. The space of all patterns over realistic linked data sets is virtually infinite, but there are some good search heuristics: use precomputed "promising" building blocks; "climb up" over the most successful queries so far (but use a restart rule to avoid getting stuck locally). [Chart: Value, Freq and Cont, as on slide 23.]
  • 26. What's next... The question "what's in that linked data set?" is perhaps not the major one, but the suggested notion of interestingness might well be: the "frequency vs. semantic content" trade-off reflects the dual (graphical and logical) nature of the RDF(S) representation model; many of the linked data mining tasks can be described as: given Q2, find an interesting Q1 such that Q1 ⇢ Q2; other, more abstract criteria might also be necessary. Linked data mining requires novel principles and foundational approaches.
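Slides 5 and 6 show how a few RDF(S) ontology axioms turn the asserted graph into a richer one. The sketch below illustrates that inference in Python. It forward-chains five standard RDFS entailment rules (rdfs2, rdfs3, rdfs7, rdfs9, rdfs11) until nothing new can be derived. The plain-tuple encoding, the short property strings and the names `rdfs_closure`, `TYPE` and `SUBCLASS` are illustrative assumptions, not taken from the talk. It also assumes that has participant B is a subproperty of has participant, which is how the diagram appears to read.

```python
# Minimal RDFS-style forward chaining over the slide 3-6 example (a sketch).
Triple = tuple[str, str, str]

TYPE, SUBCLASS, SUBPROP, DOMAIN, RANGE = (
    "type", "subClassOf", "subPropertyOf", "domain", "range")


def rdfs_closure(triples: set[Triple]) -> set[Triple]:
    """Apply five RDFS entailment rules until no new triple can be derived."""
    closed = set(triples)
    while True:
        new: set[Triple] = set()
        for s, p, o in closed:
            for s2, p2, o2 in closed:
                if s2 == o and p2 == SUBCLASS and p == TYPE:
                    new.add((s, TYPE, o2))      # rdfs9: inherit superclass membership
                if s2 == o and p2 == SUBCLASS and p == SUBCLASS:
                    new.add((s, SUBCLASS, o2))  # rdfs11: subClassOf is transitive
                if s2 == p and p2 == SUBPROP:
                    new.add((s, o2, o))         # rdfs7: lift to the superproperty
                if s2 == p and p2 == DOMAIN:
                    new.add((s, TYPE, o2))      # rdfs2: subject typed by the domain
                if s2 == p and p2 == RANGE:
                    new.add((o, TYPE, o2))      # rdfs3: object typed by the range
        if new <= closed:                       # fixpoint reached
            return closed
        closed |= new


# Asserted graph (slide 4) and ontology (slide 5), hand-encoded.
data = {
    ("b", "has participant A", "p"),
    ("b", TYPE, "Regulation"),
    ("p", TYPE, "Protein"),
}
ontology = {
    ("Regulation", SUBCLASS, "Molecular interaction"),
    ("Molecular interaction", SUBCLASS, "Biological event"),
    ("has participant A", SUBPROP, "has participant"),
    ("has participant B", SUBPROP, "has participant"),  # assumed from the diagram
    ("has participant", DOMAIN, "Biological event"),
    ("has participant", RANGE, "Chemical"),
}

inferred = rdfs_closure(data | ontology)
for triple in sorted(inferred - data - ontology):
    print(triple)
```

The pairwise loop is quadratic in each round. That is fine for a toy graph, but a real triple store would evaluate the rules with indexes instead.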
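Slides 7 and 18 treat a linked data pattern as a conjunctive (graph) query: a set of triple patterns containing ?variables. The small backtracking matcher below illustrates what it means for a query to have answers in a graph. It is a sketch under stated assumptions. The helper names `answers` and `unify` are hypothetical, and the graph is a hand-written version of the slide 6 graph with the inferred triples already materialised.

```python
from typing import Optional

Triple = tuple[str, str, str]
Binding = dict[str, str]


def is_var(term: str) -> bool:
    """Variables are written with a leading '?', as on the slides."""
    return term.startswith("?")


def unify(atom: Triple, fact: Triple, binding: Binding) -> Optional[Binding]:
    """Extend binding so that atom matches fact, or return None if impossible."""
    extended = dict(binding)
    for term, value in zip(atom, fact):
        if is_var(term):
            if extended.setdefault(term, value) != value:
                return None           # variable already bound to something else
        elif term != value:
            return None               # constant mismatch
    return extended


def answers(query: list[Triple], graph: set[Triple],
            binding: Optional[Binding] = None) -> list[Binding]:
    """Return every variable binding under which all query atoms occur in graph."""
    binding = binding or {}
    if not query:
        return [binding]
    first, rest = query[0], query[1:]
    results: list[Binding] = []
    for fact in graph:
        extended = unify(first, fact, binding)
        if extended is not None:
            results.extend(answers(rest, graph, extended))
    return results


# Slide 6 graph, asserted plus inferred triples (hand-encoded).
graph = {
    ("b", "has participant A", "p"),
    ("b", "has participant", "p"),
    ("b", "type", "Regulation"),
    ("b", "type", "Molecular interaction"),
    ("b", "type", "Biological event"),
    ("p", "type", "Protein"),
    ("p", "type", "Chemical"),
}
# Slide 7 query.
query = [
    ("?z", "has participant", "?y"),
    ("?z", "type", "Biological event"),
    ("?y", "type", "Chemical"),
]
print(answers(query, graph))  # [{'?z': 'b', '?y': 'p'}]
```

If you run the same query against only the three asserted triples of slide 4, it returns no answers. This is the point of slide 7: the match exists only after inference.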
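Slides 16 and 17 turn a concrete subgraph into a pattern by replacing individuals with variables. The sketch below shows one simple way to do this. It assumes that every object of a `type` triple is a class name that should be kept and that everything else in subject or object position is an individual. It uses a hypothetical `?x0`, `?x1`, ... naming scheme instead of the slides' ?z, ?u, and so on. The instance triples are a partial, hand-made encoding of the slide 16 graph (the submitter edge is left out), and the edge directions are a guess.

```python
Triple = tuple[str, str, str]


def generalize(instance: set[Triple]) -> set[Triple]:
    """Turn an instance graph into a pattern by replacing individuals with variables.

    Assumption: objects of 'type' triples are class names and are kept as-is;
    every other subject/object gets a fresh variable. Properties are always kept.
    """
    classes = {o for _, p, o in instance if p == "type"}
    variables: dict[str, str] = {}

    def term(t: str) -> str:
        if t in classes:
            return t
        if t not in variables:                    # first occurrence of this individual
            variables[t] = f"?x{len(variables)}"  # hypothetical naming scheme
        return variables[t]

    # Iterating in sorted order makes the variable numbering deterministic.
    return {(term(s), p, term(o)) for s, p, o in sorted(instance)}


# Partial encoding of the slide 16 graph (illustrative).
instance = {
    ("GRB2_regulates_GAB1", "type", "Regulation"),
    ("GRB2_regulates_GAB1", "has participant A", "GRB2_MOUSE"),
    ("GRB2_regulates_GAB1", "has participant B", "GAB1_MOUSE"),
    ("GRB2_MOUSE", "type", "Protein"),
    ("GAB1_MOUSE", "type", "Protein"),
    ("statement_1", "type", "Statement"),
    ("statement_1", "represents", "GRB2_regulates_GAB1"),
    ("statement_1", "is extracted from", "PMC123456"),
    ("PMC123456", "type", "Article"),
}

pattern = generalize(instance)
for triple in sorted(pattern):
    print(triple)
```

The result is the most specific pattern that still matches this instance. Slide 18 frames mining as a search that starts from patterns like this one and relaxes or extends them.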
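Slide 22 orders patterns by semantic content: Q1 has more content than Q2 (over O) if Q1 together with O entails Q2. For the small RDFS rule set used here, a standard way to test this is to "freeze" Q1 into a graph, turning each variable into a fresh constant. The frozen graph is then saturated with O, and the test checks whether Q2 has a match in the result. The sketch treats both queries as Boolean, with variables existentially quantified. The ontology encoding is an assumption based on slide 5, and all function names are hypothetical.

```python
from itertools import product
from typing import Optional

Triple = tuple[str, str, str]


def closure(triples: set[Triple]) -> set[Triple]:
    """RDFS rules rdfs2, rdfs3, rdfs7, rdfs9 and rdfs11, applied to a fixpoint."""
    closed = set(triples)
    while True:
        new: set[Triple] = set()
        for (s, p, o), (s2, p2, o2) in product(closed, repeat=2):
            if s2 == o and p2 == "subClassOf" and p in ("type", "subClassOf"):
                new.add((s, p, o2))            # rdfs9 and rdfs11
            if s2 == p and p2 == "subPropertyOf":
                new.add((s, o2, o))            # rdfs7
            if s2 == p and p2 == "domain":
                new.add((s, "type", o2))       # rdfs2
            if s2 == p and p2 == "range":
                new.add((o, "type", o2))       # rdfs3
        if new <= closed:
            return closed
        closed |= new


def has_match(query: list[Triple], graph: set[Triple],
              binding: Optional[dict[str, str]] = None) -> bool:
    """True if some assignment of the query's ?variables maps every atom into graph."""
    binding = binding or {}
    if not query:
        return True
    for fact in graph:
        extended, ok = dict(binding), True
        for t, value in zip(query[0], fact):
            ok = (extended.setdefault(t, value) == value) if t.startswith("?") else (t == value)
            if not ok:
                break
        if ok and has_match(query[1:], graph, extended):
            return True
    return False


def freeze(query: list[Triple]) -> set[Triple]:
    """Canonical graph of a query: each ?variable becomes a fresh constant."""
    return {tuple(("frozen:" + t[1:]) if t.startswith("?") else t for t in atom)
            for atom in query}


def entails(q1: list[Triple], q2: list[Triple], ontology: set[Triple]) -> bool:
    """Q1 (with O) entails Q2 if Q2 matches the saturated canonical graph of Q1."""
    return has_match(q2, closure(freeze(q1) | ontology))


ONTOLOGY = {
    ("Regulation", "subClassOf", "Molecular interaction"),
    ("Molecular interaction", "subClassOf", "Biological event"),
    ("has participant A", "subPropertyOf", "has participant"),
    ("has participant", "domain", "Biological event"),
    ("has participant", "range", "Chemical"),
}
Q1 = [("?z", "has participant A", "?y"), ("?z", "type", "Regulation"), ("?y", "type", "Protein")]
Q2 = [("?z", "has participant", "?y"), ("?z", "type", "Biological event"), ("?y", "type", "Chemical")]

print(entails(Q1, Q2, ONTOLOGY), entails(Q2, Q1, ONTOLOGY))  # True False
```

Entailment holds only in one direction here, so Q1 is strictly more informative than Q2. The same check, run with Q1 and Q2 swapped, is what slide 26's "given Q2, find an interesting Q1 such that Q1 ⇢ Q2" would need as a building block.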
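Slide 23 scores a pattern by combining FREQ(Q) = #answers / #possible answers with CONT(Q) = 1 - Prob(Q is true a priori). The slide does not say how the a priori probability is estimated or which weights the sum uses. The sketch therefore takes the probability as an input and assumes a single weight in [0, 1], defaulting to 0.5. The numbers in the usage lines are invented for illustration. The first one mimics the slide 20 case of (?x type owl:Thing), which matches everything and so is assumed to have an a priori probability of 1.

```python
def freq(num_answers: int, num_possible_answers: int) -> float:
    """FREQ(Q) = #answers / #possible answers."""
    if num_possible_answers <= 0:
        raise ValueError("num_possible_answers must be positive")
    return num_answers / num_possible_answers


def cont(prior_probability: float) -> float:
    """CONT(Q) = 1 - Prob(Q is true a priori); the probability is supplied by the caller."""
    if not 0.0 <= prior_probability <= 1.0:
        raise ValueError("prior_probability must lie in [0, 1]")
    return 1.0 - prior_probability


def value(freq_q: float, cont_q: float, weight: float = 0.5) -> float:
    """VALUE(Q) as a weighted sum; 'weight' and its default are assumptions."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * freq_q + (1.0 - weight) * cont_q


# Made-up numbers for illustration only.
most_general = value(freq(1000, 1000), cont(1.0))  # like (?x type owl:Thing)
specific = value(freq(300, 1000), cont(0.4))        # a rarer, more informative pattern
print(most_general, specific)
```

The trade-off shows up directly. A pattern that matches everything maxes out FREQ but scores zero on CONT, so its VALUE depends only on the weight given to frequency.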
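Slide 25 suggests climbing from precomputed building blocks towards better-scoring queries, with a restart rule to escape local optima. The sketch below is a generic steepest-ascent hill climber with random restarts, not the talk's algorithm. The toy building blocks, the neighbour function (add or drop one block) and the stand-in score are hypothetical substitutes for real triple patterns and VALUE(Q).

```python
import random
from typing import Callable, Iterable, Optional, TypeVar

P = TypeVar("P")


def hill_climb_with_restarts(
    seeds: list[P],
    neighbours: Callable[[P], Iterable[P]],
    score: Callable[[P], float],
    restarts: int = 5,
    max_steps: int = 50,
    rng: Optional[random.Random] = None,
) -> tuple[P, float]:
    """Steepest-ascent hill climbing, restarted from randomly chosen seeds."""
    if restarts < 1 or not seeds:
        raise ValueError("need at least one restart and one seed")
    rng = rng or random.Random(0)
    best, best_score = seeds[0], float("-inf")
    for _ in range(restarts):
        current = rng.choice(seeds)              # restart from a random building block
        current_score = score(current)
        for _ in range(max_steps):
            candidates = list(neighbours(current))
            if not candidates:
                break
            nxt = max(candidates, key=score)     # best neighbouring pattern
            nxt_score = score(nxt)
            if nxt_score <= current_score:       # local optimum reached
                break
            current, current_score = nxt, nxt_score
        if current_score > best_score:
            best, best_score = current, current_score
    return best, best_score


BLOCKS = ["A", "B", "C", "D", "E"]               # hypothetical building blocks (atoms)
TARGET = frozenset({"A", "C", "D"})              # hypothetical "ideal" pattern


def toy_score(pattern: frozenset) -> float:
    # Stand-in for VALUE(Q): reward atoms in TARGET, penalise the others.
    return len(pattern & TARGET) - 0.5 * len(pattern - TARGET)


def toy_neighbours(pattern: frozenset) -> Iterable[frozenset]:
    for block in BLOCKS:
        yield pattern ^ {block}                  # add the block if absent, drop it if present


seeds = [frozenset({b}) for b in BLOCKS]
best, best_score = hill_climb_with_restarts(seeds, toy_neighbours, toy_score)
print(sorted(best), best_score)  # ['A', 'C', 'D'] 3.0
```

This toy score is separable, so every restart reaches the same optimum. Restarts only pay off when the score landscape has several local maxima.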