Invited Talk at the 8th International Conference on Scalable Uncertainty Management (SUM)
The talk outlines applications of supervised structured machine learning and presents a specific refinement operator based approach for RDF/OWL. It also outlines how similar ideas can be used in other (formal) languages, in particular link specifications.
Machine Learning Methods for Analysing and Linking RDF Data
1. Machine Learning Methods for Analysing and Linking RDF Data
Jens Lehmann
September 16, 2014
Jens Lehmann (AKSW, Uni Leipzig) Analysing and Linking RDF Data September 16, 2014 1 / 35
2. Structured Machine Learning
How to analyse
structured data?
4. Detecting Crime Patterns: Series Finder
Constructs the "modus operandi" of criminals; identified 9 new crime
patterns in Cambridge, MA, USA
Wang, Tong, et al. "Detecting Patterns of Crime with Series Finder." AAAI 2013.
5. Discovery of Laws of Physics
Background data generated using experiments
Mathematical functions on input variables form hypothesis space
Schmidt, Lipson. "Distilling free-form natural laws from experimental data." Science 2009.
6. Protein Interaction
Rules learned via Inductive Logic Programming (ProGolem) are
understandable by experts and competitive with statistical learners
Possible benefits: better drug design and reduction of side effects
Santos et al. "Automated identification of protein-ligand interaction features using Inductive
Logic Programming: a hexose binding case study." BMC Bioinformatics 2012.
7. Background Knowledge
8. RDF and the Linked Data Principles
RDF Triple:
Example:
Subject: http://cs.ox.ac.uk/John
Predicate: http://cs.ox.ac.uk/studies
Object: http://cs.ox.ac.uk/CS
The term Linked Data refers to a set of best practices for publishing and
interlinking structured data on the Web.
Linked Data principles (simplified version):
1 Use RDF and URLs as identifiers
2 Include links to other datasets
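As a rough illustration of the triple structure on this slide, the following Python sketch stores the example statement as a (subject, predicate, object) record and writes it in N-Triples style. The `Triple` type and the `to_ntriples` helper are hypothetical names chosen for this sketch; they are not part of any RDF library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement; all three positions are URIs in this sketch."""
    subject: str
    predicate: str
    obj: str


def to_ntriples(t: Triple) -> str:
    # N-Triples puts each URI in angle brackets and ends the statement with " ."
    return f"<{t.subject}> <{t.predicate}> <{t.obj}> ."


john_studies_cs = Triple(
    "http://cs.ox.ac.uk/John",
    "http://cs.ox.ac.uk/studies",
    "http://cs.ox.ac.uk/CS",
)
line = to_ntriples(john_studies_cs)
```

Because every position is a URI, the object of one statement can be the subject of another, possibly in a different dataset, which is what the second Linked Data principle relies on.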
12. OWL Ontologies
Web Ontology Language (OWL) builds on RDF and Description
Logics
Objects
Specific resources (constants)
Examples: MARIA, LEIPZIG
Classes
Sets of objects (unary predicates)
Examples: Student, Car, Country
Properties
Connections between objects (binary predicates)
Examples: hasChild, partOf
Can be combined to complex concepts (OWL Class Expressions), e.g.:
$\text{Child} \sqcap \exists\,\text{hasParent}.\text{Professor}$
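To make the class expression above concrete, here is a minimal Python sketch that computes its instances over a tiny hand-made dataset. The individuals (`anna`, `ben`, `carl`, `dora`), the nested-tuple encoding, and the `instances` function are assumptions made for illustration. The sketch only looks at explicitly stated facts, whereas an OWL reasoner would also draw inferences under the open world assumption.

```python
# Toy interpretation (hypothetical individuals): class name -> set of instances.
classes = {
    "Child": {"anna", "ben"},
    "Professor": {"carl"},
}
# Property name -> set of (subject, object) pairs.
properties = {
    "hasParent": {("anna", "carl"), ("ben", "dora")},
}


def instances(expr):
    """Return the individuals satisfying a class expression.

    Expressions are nested tuples:
      "Name"                  named class
      ("and", e1, e2)         intersection
      ("some", prop, filler)  existential restriction
    """
    if isinstance(expr, str):
        return set(classes.get(expr, set()))
    op = expr[0]
    if op == "and":
        return instances(expr[1]) & instances(expr[2])
    if op == "some":
        fillers = instances(expr[2])
        return {s for (s, o) in properties.get(expr[1], set()) if o in fillers}
    raise ValueError(f"unsupported constructor: {op!r}")


child_of_professor = ("and", "Child", ("some", "hasParent", "Professor"))
result = instances(child_of_professor)
```

In this toy dataset only `anna` is a child with a parent known to be a professor, so she is the single instance of the expression.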
15. Learning OWL Class Expressions - Definition
Given:
Background Knowledge (OWL ontologies and RDF datasets)
Positive and negative examples (objects in datasets)
Goal:
Find OWL class expression describing positive but not negative
examples
16. Application Example: Therapy Response Prediction
0.5-1% of population affected by Rheumatoid Arthritis
Anti-TNF not effective for several million persons for unknown reasons
17. Learning OWL Class Expressions - Approaches
Least common subsumers
Cohen et al. Computing least common subsumers in description
logics. AAAI 1992
Terminological decision trees
Fanizzi et al. Induction of concepts in web ontologies through
terminological decision trees. ECML PKDD 2010
Rule-based
Fanizzi et al. DL-FOIL concept learning in description logics. ILP
2008
Genetic Programming
Lehmann, Jens. Hybrid learning of ontology classes. MLDM 2007
Refinement operators
Lehmann et al. Concept learning in description logics using refinement
operators. ML 2010
Iannone et al. An algorithm based on counterfactuals for concept
learning in the semantic web. AI 2007
18. Refinement Operators - Definitions
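Given a DL $\mathcal{L}$, consider the quasi-ordered space $\langle C(\mathcal{L}), \sqsubseteq_T \rangle$ over
concepts of $\mathcal{L}$
$\rho : C(\mathcal{L}) \to 2^{C(\mathcal{L})}$ is a downward $\mathcal{L}$ refinement operator if for any
$C \in C(\mathcal{L})$:
$D \in \rho(C)$ implies $D \sqsubseteq_T C$
Notation: Write $C \rightsquigarrow_\rho D$ instead of $D \in \rho(C)$
Example refinement chain:
$\text{Person} \rightsquigarrow_\rho \text{Man} \rightsquigarrow_\rho \text{Man} \sqcap \exists\,\text{hasChild}.\top$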
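The definition can be illustrated with a deliberately small downward refinement operator in Python. Everything here is an assumption made for the sketch: the class hierarchy in `SUBCLASSES`, the single property in `PROPERTIES`, and the encoding of a concept as a tuple of conjuncts. It is not the operator used in the talk, only an operator with the same defining property (every refinement is subsumed by its input).

```python
# Hypothetical toy hierarchy: class -> direct subclasses.
SUBCLASSES = {"Top": ["Person", "Car"], "Person": ["Man", "Woman"]}
PROPERTIES = ["hasChild"]


def refine(concept):
    """Toy downward refinement: every result is subsumed by `concept`.

    A concept is a tuple of conjuncts, e.g. ("Man", "some hasChild.Top").
    """
    refinements = []
    # 1) Replace a named class by one of its direct subclasses.
    for i, conjunct in enumerate(concept):
        for sub in SUBCLASSES.get(conjunct, []):
            refinements.append(concept[:i] + (sub,) + concept[i + 1:])
    # 2) Add an existential restriction as a further conjunct.
    for prop in PROPERTIES:
        extra = f"some {prop}.Top"
        if extra not in concept:
            refinements.append(concept + (extra,))
    return refinements


step1 = refine(("Person",))
step2 = refine(("Man",))
```

With this toy operator, `("Man",)` is among the refinements of `("Person",)`, and `("Man", "some hasChild.Top")` is the refinement of `("Man",)`, which mirrors the example chain on the slide.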
19. Learning using Refinement Operators
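[Figure: search tree; heuristic score of each node in parentheses]
$\top$ (0.45)
    Car (too weak)
    Person (0.73)
        $\text{Person} \sqcap \exists\,\text{attends}.\top$ (0.78)
            $\text{Person} \sqcap \exists\,\text{attends}.\text{Talk}$ (0.97)
        . . .
Start with most general concept (top down)
Heuristic evaluates using pos/neg examples
Operator specialises
Continue until termination criterion met
= Learning Algorithm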
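The four steps on this slide can be sketched as a best-first search in Python. This is a simplified illustration under stated assumptions, not the algorithm implemented in DL-Learner: the individuals in `FACTS`, the lists `ADDABLE` and `SPECIALISE`, the use of plain accuracy as the heuristic, and the tuple encoding of concepts (the empty tuple stands for the most general concept) are all made up for the example.

```python
import heapq

# Hypothetical toy data: individual -> atomic descriptions it satisfies.
FACTS = {
    "alice": {"Person", "attends.Top", "attends.Talk"},
    "bob": {"Person", "attends.Top", "attends.Talk"},
    "carol": {"Person", "attends.Top"},
    "dave": {"Person"},
    "herbie": {"Car"},
    "kitt": {"Car", "attends.Top", "attends.Talk"},
}
POS = {"alice", "bob"}
NEG = {"carol", "dave", "herbie", "kitt"}

ADDABLE = ["Person", "Car", "attends.Top"]      # conjuncts that may be added
SPECIALISE = {"attends.Top": ["attends.Talk"]}  # conjunct -> more specific ones


def covers(concept, individual):
    """A concept is a tuple of conjuncts; the empty tuple plays the role of Top."""
    return all(c in FACTS[individual] for c in concept)


def accuracy(concept):
    true_pos = sum(covers(concept, i) for i in POS)
    true_neg = sum(not covers(concept, i) for i in NEG)
    return (true_pos + true_neg) / (len(POS) + len(NEG))


def refine(concept):
    """Toy downward refinement: specialise one conjunct or add a new one."""
    children = set()
    for i, c in enumerate(concept):
        for s in SPECIALISE.get(c, []):
            children.add(tuple(sorted(concept[:i] + (s,) + concept[i + 1:])))
    for a in ADDABLE:
        if a not in concept:
            children.add(tuple(sorted(concept + (a,))))
    return children


def learn(max_expansions=50):
    start = ()                               # start with the most general concept
    frontier = [(-accuracy(start), start)]   # max-heap via negated score
    seen = {start}
    best = start
    for _ in range(max_expansions):
        if not frontier:
            break
        _, concept = heapq.heappop(frontier)
        if accuracy(concept) > accuracy(best):
            best = concept
        if accuracy(best) == 1.0:            # termination criterion
            break
        if not any(covers(concept, i) for i in POS):
            continue                         # "too weak": covers no positive example
        for child in refine(concept):
            if child not in seen:
                seen.add(child)
                heapq.heappush(frontier, (-accuracy(child), child))
    return best


learned = learn()
```

On this toy data the search ends with the conjunction of `Person` and `attends.Talk`. A node that covers no positive example is not expanded, since a downward refinement can only cover fewer individuals.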
23. Properties of Refinement Operators
An $\mathcal{L}$ downward refinement operator $\rho$ is called
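Finite iff $\rho(C)$ is finite for any concept $C \in C(\mathcal{L})$
Redundant iff there exist two different $\rho$ refinement chains from a
concept $C$ to a concept $D$.
Proper iff for $C, D \in C(\mathcal{L})$, $C \rightsquigarrow_\rho D$ implies $C \not\equiv_T D$
Complete iff for $C, D \in C(\mathcal{L})$ with $D \sqsubset_T C$ there is a concept $E$ with
$E \equiv_T D$ and a refinement chain $C \rightsquigarrow_\rho \dots \rightsquigarrow_\rho E$
Weakly complete iff for any concept $C$ with $C \sqsubset_T \top$ we can reach a
concept $E$ with $E \equiv_T C$ from $\top$ by $\rho$.
[Figures: small refinement diagrams illustrating each property]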
27. Properties of Refinement Operators
Properties indicate how suitable a refinement operator is for solving
the learning problem:
Incomplete operators may miss solutions
Redundant operators may lead to duplicate concepts in the search tree
Improper operators may produce equivalent concepts (which cover the
same examples)
For infinite operators it may not be possible to compute all refinements
of a given concept
Key question: Which properties can be combined?
28. Theorem: Properties of L Refinement Operators
Theorem
Maximum sets of combinable properties of $\mathcal{L}$ refinement operators for
$\mathcal{L} \in \{\mathcal{ALC}, \mathcal{ALCN}, \mathcal{SHOIN}, \mathcal{SROIQ}\}$ are:
1 {weakly complete, complete, finite}
2 {weakly complete, complete, proper}
3 {weakly complete, non-redundant, finite}
4 {weakly complete, non-redundant, proper}
5 {non-redundant, finite, proper}
Concept Learning in Description Logics Using Refinement Operators; Lehmann, Hitzler; Machine
Learning journal, 2010
Foundations of Refinement Operators for Description Logics; Lehmann, Hitzler; ILP conference,
2008
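29. Definition of $\rho_\downarrow$
\[
\rho_\downarrow(C) =
\begin{cases}
\{\bot\} \cup \rho_\top(C) & \text{if } C = \top \\
\rho_\top(C) & \text{otherwise}
\end{cases}
\]
\[
\rho_B(C) =
\begin{cases}
\emptyset & \text{if } C = \bot \\
\{C_1 \sqcup \dots \sqcup C_n \mid C_i \in M_B\ (1 \le i \le n)\} & \text{if } C = \top \\
\{A' \mid A' \in sh_\downarrow(A)\} \cup \{A \sqcap D \mid D \in \rho_B(\top)\} & \text{if } C = A\ (A \in N_C) \\
\{\neg A' \mid A' \in sh_\uparrow(A)\} \cup \{\neg A \sqcap D \mid D \in \rho_B(\top)\} & \text{if } C = \neg A\ (A \in N_C) \\
\{\exists r.E \mid A = ar(r),\ E \in \rho_A(D)\} \cup \{\exists r.D \sqcap E \mid E \in \rho_B(\top)\} \cup \{\exists s.D \mid s \in sh_\downarrow(r)\} & \text{if } C = \exists r.D \\
\{\forall r.E \mid A = ar(r),\ E \in \rho_A(D)\} \cup \{\forall r.D \sqcap E \mid E \in \rho_B(\top)\} \cup \{\forall r.\bot \mid D = A \in N_C \text{ and } sh_\downarrow(A) = \emptyset\} \cup \{\forall s.D \mid s \in sh_\downarrow(r)\} & \text{if } C = \forall r.D \\
\{C_1 \sqcap \dots \sqcap C_{i-1} \sqcap D \sqcap C_{i+1} \sqcap \dots \sqcap C_n \mid D \in \rho_B(C_i),\ 1 \le i \le n\} & \text{if } C = C_1 \sqcap \dots \sqcap C_n\ (n \ge 2) \\
\{C_1 \sqcup \dots \sqcup C_{i-1} \sqcup D \sqcup C_{i+1} \sqcup \dots \sqcup C_n \mid D \in \rho_B(C_i),\ 1 \le i \le n\} \cup \{(C_1 \sqcup \dots \sqcup C_n) \sqcap D \mid D \in \rho_B(\top)\} & \text{if } C = C_1 \sqcup \dots \sqcup C_n\ (n \ge 2)
\end{cases}
\]
Base Operator (Excerpt)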
31. Definition of $\rho_\downarrow$
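$\{\exists r.E \mid A = ar(r),\ E \in \rho_A(D)\} \cup \{\exists r.D \sqcap E \mid E \in \rho_B(\top)\} \cup \{\exists s.D \mid s \in sh_\downarrow(r)\}$ if $C = \exists r.D$
Examples (refinements of $\exists\,\text{takesPartIn}.\text{SocialEvent}$, one per set above):
$\exists\,\text{takesPartIn}.\text{Meeting}$
$\text{Student} \sqcap \exists\,\text{takesPartIn}.\text{SocialEvent}$
$\exists\,\text{leads}.\text{SocialEvent}$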
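34. Properties of $\rho_\downarrow$
$\rho_\downarrow$ is complete
$\rho_\downarrow$ is infinite, e.g. there are infinitely many refinement steps of the
form:
$\top \rightsquigarrow_{\rho_\downarrow} C_1 \sqcup C_2 \sqcup C_3 \sqcup \dots$
$\rho_\downarrow^{cl}$ is proper
$\rho_\downarrow$ is redundant, e.g. two different chains lead to the same concept:
$\forall r_1.A_1 \sqcup \forall r_2.A_1 \rightsquigarrow_{\rho_\downarrow} \forall r_1.(A_1 \sqcap A_2) \sqcup \forall r_2.A_1 \rightsquigarrow_{\rho_\downarrow} \forall r_1.(A_1 \sqcap A_2) \sqcup \forall r_2.(A_1 \sqcap A_2)$
$\forall r_1.A_1 \sqcup \forall r_2.A_1 \rightsquigarrow_{\rho_\downarrow} \forall r_1.A_1 \sqcup \forall r_2.(A_1 \sqcap A_2) \rightsquigarrow_{\rho_\downarrow} \forall r_1.(A_1 \sqcap A_2) \sqcup \forall r_2.(A_1 \sqcap A_2)$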
“DL-Learner: Learning Concepts in Description Logics”,
Jens Lehmann, Journal of Machine Learning Research (JMLR), 2009
35. Learning using Refinement Operators
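[Figure: animated search tree; scores and bracketed values change across the animation steps, only the first score of each node is kept here]
$\top$ (0.45) [0 1]
    Car (too weak)
    Person (0.73) [0 1 2 3 4 5]
        $\text{Person} \sqcap \exists\,\text{attends}.\top$ (0.78) [4 5]
            $\text{Person} \sqcap \exists\,\text{attends}.\text{Talk}$ (0.97) [4]
        . . .
Redundancy elimination technique with polynomial complexity wrt. search tree size
Length of children limited by expansion value
Infinite operator applicable
he used by heuristic (bias towards short concepts, Occam's Razor)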
43. Scalability
Refinement operator should build coherent concepts
Inference:
Sound and complete reasoning vs. approximation
Open World Assumption (OWA) vs. Closed World Assumption (CWA)
Stochastic coverage computation
Pick random example → perform instance check → compute
confidence interval (e.g. via Wald method) wrt. objective function
(e.g. F-measure)
Up to 99% fewer instance checks in test examples
Low influence on accuracy shown for 380 learning tasks using 7
ontologies (0.2% ± 0.4% F-measure difference)
Fragment extraction for application on large knowledge bases
Class Expression Learning for Ontology Engineering; Jens Lehmann, Sören Auer, Lorenz
Bühmann, Sebastian Tramp; Journal of Web Semantics (JWS), 2011
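The stochastic coverage idea on this slide can be sketched in Python as follows. This is a simplified illustration, not the procedure from the cited paper: it estimates only the fraction of covered examples rather than an interval on the F-measure, and the function names, the stopping width `max_width`, the minimum sample size `min_checks`, and the `instance_check` callback are assumptions made for the sketch.

```python
import math
import random


def wald_interval(successes, trials, z=1.96):
    """Wald (normal-approximation) confidence interval for a proportion."""
    p = successes / trials
    half_width = z * math.sqrt(p * (1 - p) / trials)
    return max(0.0, p - half_width), min(1.0, p + half_width)


def estimate_coverage(examples, instance_check, max_width=0.1, min_checks=10, seed=0):
    """Check randomly ordered examples until the interval is narrow enough."""
    order = list(examples)
    if not order:
        raise ValueError("need at least one example")
    random.Random(seed).shuffle(order)
    covered = 0
    for checked, example in enumerate(order, start=1):
        covered += bool(instance_check(example))   # the expensive reasoner call
        low, high = wald_interval(covered, checked)
        if checked >= min_checks and high - low <= max_width:
            break                                  # remaining checks are skipped
    return covered / checked, (low, high), checked


estimate, interval, checks_used = estimate_coverage(range(1000), lambda e: True)
```

The minimum sample size matters because the Wald interval collapses to zero width whenever all checks so far agree, so without it the loop could stop after a single check.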
47. Carcinogenesis
Goal: predict whether substance causes cancer
Why:
Each year 1000 new substances developed
Substances can often only be validated using time-consuming and
expensive experiments with mice → prioritise those with high risk
Background knowledge:
Database of the US National Toxicology Program (NTP)
“Obtaining accurate structural alerts for the causes of chemical cancers is
a problem of great scientific and humanitarian value.” (A. Srinivasan, R.D.
King, S.H. Muggleton, M.J.E. Sternberg 1997)
48. Knowledge Base Enrichment
Pattern Based Knowledge Base Enrichment; Lorenz Bühmann, Jens Lehmann; International
Semantic Web Conference (ISWC) 2013
Universal OWL Axiom Enrichment for Large Knowledge Bases; Lorenz Bühmann, Jens
Lehmann; Knowledge Engineering and Knowledge Management (EKAW) 2012
49. Protégé Plugin
Support for ontology creation and maintenance
50. Ontology Debugging: ORE
ORE - A Tool for Repairing and Enriching Knowledge Bases; Lehmann, Bühmann; International
Semantic Web Conference (ISWC) 2010
51. Data Quality Measurement: RDFUnit
Test-driven Evaluation of Linked Data Quality; World Wide Web Conference (WWW),
ACM, 2014; Dimitris Kontokostas, Patrick Westphal, Sören Auer, Sebastian Hellmann, Jens
Lehmann, Roland Cornelissen, Amrapali J. Zaveri
52. Robot Scientists Adam & Eve
Abduction to form hypotheses and 1,000 experiments per day
12 new scientific discoveries regarding functions of genes in yeast
King, Ross D et al. The automation of science. Science 324 (2009): 85-89.
53. Link Discovery - Motivation
Links are backbone of traditional WWW and Data Web
Links are central for data integration, deduplication, cross-ontology
question answering, reasoning, federated queries . . .
Central problem for many large IT companies
Automated tools (LIMES, SILK) can create a high number of links
between RDF resources by using heuristics
55. Link Discovery - Definition
Definition (Link Discovery)
Given sets S and T of resources and relation R (often owl:sameAs)
Common approach: Find $M = \{(s, t) \in S \times T : \delta(s, t) \le \theta\}$
S: DBpedia
rdfs:label: African Elephant
T: BBC Wildlife
dc:title: African Bush Elephant
dbpedia:AfricanElephant owl:sameAs bbc:hfzw82929 ?
$\delta$ = levenshtein(S.rdfs:label, T.dc:title)
$\delta$(dbpedia:AfricanElephant, bbc:hfzw82929) = 5
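The distance in this example can be reproduced with a short Python sketch. The `levenshtein` function is a standard dynamic-programming implementation; `discover_links`, the dictionaries holding one label per resource, and the threshold value 5 are assumptions made for illustration, and the naive all-pairs loop is not how scalable tools such as LIMES compute links.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insert, delete and substitute."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # delete ch_a
                               current[j - 1] + 1,       # insert ch_b
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def discover_links(source, target, theta):
    """Return all (s, t) pairs whose label distance is at most theta."""
    return [(s, t)
            for s, s_label in source.items()
            for t, t_label in target.items()
            if levenshtein(s_label, t_label) <= theta]


source = {"dbpedia:AfricanElephant": "African Elephant"}
target = {"bbc:hfzw82929": "African Bush Elephant"}
distance = levenshtein("African Elephant", "African Bush Elephant")
links = discover_links(source, target, theta=5)
```

The two labels differ by the inserted substring "Bush " (five characters), which gives the distance of 5 shown on the slide. With a threshold of 5 the pair is accepted, with 4 it is not.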
60. Example: Link Specification
f(trigrams(:name, :label), 0.5) $\sqcup$ f(edit(:socId, :socId), 0.5)
61. Link Specification Syntax and Semantics
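\[
\begin{array}{ll}
LS & [[LS]] \\
f(m, \theta, M) & \{(s, t, r) \mid (s, t, r) \in M \wedge m(s, t) \ge \theta\} \\
LS_1 \sqcap LS_2 & \{(s, t, r) \mid (s, t, r_1) \in [[LS_1]] \wedge (s, t, r_2) \in [[LS_2]] \wedge r = \min(r_1, r_2)\} \\
LS_1 \sqcup LS_2 & \left\{ (s, t, r) \;\middle|\;
\begin{cases}
r = r_1 & \text{if } \exists (s, t, r_1) \in [[LS_1]] \wedge \neg(\exists r_2 : (s, t, r_2) \in [[LS_2]]), \\
r = r_2 & \text{if } \exists (s, t, r_2) \in [[LS_2]] \wedge \neg(\exists r_1 : (s, t, r_1) \in [[LS_1]]), \\
r = \max(r_1, r_2) & \text{if } (s, t, r_1) \in [[LS_1]] \wedge (s, t, r_2) \in [[LS_2]]
\end{cases}
\right\}
\end{array}
\]
Syntax and semantics make it possible to define an ordering similar to
subsumption (more specific specs generate fewer links)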
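The semantics in the table can be mirrored by a small Python sketch. A mapping is represented as a dictionary from (s, t) pairs to a score r. The similarity tables `NAME_SIM` and `ID_SIM`, the resource names, and all function names are made up for illustration, and the sketch assumes that each input mapping already carries a score for every candidate pair.

```python
def atomic(measure, theta, mapping):
    """[[f(m, theta, M)]]: keep the triples of M whose pair reaches theta under m."""
    return {pair: r for pair, r in mapping.items() if measure(*pair) >= theta}


def conjunction(left, right):
    """[[LS1 AND LS2]]: pairs in both mappings, scored with the minimum."""
    return {pair: min(left[pair], right[pair]) for pair in left.keys() & right.keys()}


def disjunction(left, right):
    """[[LS1 OR LS2]]: pairs in either mapping, scored with the maximum if in both."""
    merged = dict(left)
    for pair, r in right.items():
        merged[pair] = max(merged[pair], r) if pair in merged else r
    return merged


# Hypothetical precomputed similarities between source and target resources.
NAME_SIM = {("s1", "t1"): 0.9, ("s1", "t2"): 0.3, ("s2", "t2"): 0.6}
ID_SIM = {("s1", "t1"): 1.0, ("s1", "t2"): 0.0, ("s2", "t2"): 0.4}


def trigrams(s, t):
    return NAME_SIM[(s, t)]


def edit(s, t):
    return ID_SIM[(s, t)]


by_name = atomic(trigrams, 0.5, NAME_SIM)
by_id = atomic(edit, 0.5, ID_SIM)
both = conjunction(by_name, by_id)
either = disjunction(by_name, by_id)
```

Here the conjunction yields a subset of the links produced by the disjunction, which is the kind of ordering between more specific and more general specifications that the slide refers to.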
62. Link Specification Refinement Operator
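\[
\rho(LS) =
\begin{cases}
\{f(m_1, 1, \top) \sqcap \dots \sqcap f(m_n, 1, \top) \mid m_i \in SM,\ 1 \le i \le n,\ n \le 2^{|SM|}\} & \text{if } LS = \bot \\
f(m, dt(\theta), M) \cup LS \sqcup f(m', 1, M) \quad (m \in SM,\ m \ne m') & \text{if } LS = f(m, \theta, M) \text{ (atomic)} \\
LS_1 \sqcap \dots \sqcap LS_{i-1} \sqcap LS' \sqcap LS_{i+1} \sqcap \dots \sqcap LS_n \text{ with } LS' \in \rho(LS_i) & \text{if } LS = LS_1 \sqcap \dots \sqcap LS_n\ (n \ge 2) \\
LS_1 \sqcup \dots \sqcup LS_{i-1} \sqcup LS' \sqcup LS_{i+1} \sqcup \dots \sqcup LS_n \text{ with } LS' \in \rho(LS_i) \cup LS \sqcup f(m, 1, M) \quad (m \in SM,\ m \text{ not used in } LS) & \text{if } LS = LS_1 \sqcup \dots \sqcup LS_n\ (n \ge 2)
\end{cases}
\]
Upward refinement operator
Positive: Weakly complete, finite
Negative: Not complete, redundant, not proper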
63. Refinement Chain Example
f(edit(:socId, :socId), 1.0)
$\rightsquigarrow$ f(edit(:socId, :socId), 0.5)
$\rightsquigarrow$ f(trigrams(:name, :label), 1.0) $\sqcup$ f(edit(:socId, :socId), 0.5)
$\rightsquigarrow$ f(trigrams(:name, :label), 0.5) $\sqcup$ f(edit(:socId, :socId), 0.5)
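The chain above can be replayed with a reduced Python sketch of an upward refinement step for link specifications. It covers only atomic specifications and flat disjunctions, and several parts are assumptions made for illustration: the tuple encoding of specifications, the fixed step of 0.5 in the threshold-decrease function `dt`, and the rule that a threshold is never lowered to 0.

```python
SM = ["edit(:socId, :socId)", "trigrams(:name, :label)"]  # similarity measures


def dt(theta, step=0.5):
    """Assumed threshold-decrease function: lower theta by a fixed step."""
    lowered = round(theta - step, 6)
    return lowered if lowered > 0 else None


def measures_in(spec):
    if spec[0] == "atom":
        return {spec[1]}
    return set().union(*(measures_in(child) for child in spec[1]))


def refine(spec):
    """Return more general specs (atomic and disjunction cases only)."""
    out = []
    if spec[0] == "atom":
        _, m, theta = spec
        lowered = dt(theta)
        if lowered is not None:
            out.append(("atom", m, lowered))             # lower the threshold
        out += [("or", (("atom", other, 1.0), spec))     # add another measure
                for other in SM if other != m]
    elif spec[0] == "or":
        children = spec[1]
        for i, child in enumerate(children):
            for refined in refine(child):
                if refined[0] == "atom":                 # keep disjunctions flat here
                    out.append(("or", children[:i] + (refined,) + children[i + 1:]))
        used = measures_in(spec)
        out += [("or", (("atom", m, 1.0),) + children) for m in SM if m not in used]
    return out


s0 = ("atom", "edit(:socId, :socId)", 1.0)
s1 = ("atom", "edit(:socId, :socId)", 0.5)
s2 = ("or", (("atom", "trigrams(:name, :label)", 1.0), s1))
s3 = ("or", (("atom", "trigrams(:name, :label)", 0.5), s1))
chain_ok = s1 in refine(s0) and s2 in refine(s1) and s3 in refine(s2)
```

Each step either lowers a threshold or adds a disjunct, so every specification in the chain accepts at least the links of its predecessor.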
67. Projects: DL-Learner and LIMES
DL-Learner
Open-source project: http://dl-learner.org
Extensible Platform for concept learning algorithms
Supports all RDF/OWL serialisations and major reasoners
Several thousand downloads
LIMES (http://aksw.org/Projects/LIMES.html)
Highly scalable engine (fastest RDF link discovery tool)
Several machine learning approaches integrated (including the one
presented)
“DL-Learner: Learning Concepts in Description Logics”,
Jens Lehmann, Journal of Machine Learning Research (JMLR), 2009
68. Summary & Conclusions
Many interesting applications of structured machine learning (therapy
response prediction, disease prediction, protein folding, data quality
measurement, ontology debugging)
Still few machine learning tools for working with RDF/OWL although
more and more data available
Refinement operators make it possible to apply supervised machine learning to
complex background knowledge
They can also be applied to other languages such as link specifications