SlideShare ist ein Scribd-Unternehmen logo
1 von 64
Downloaden Sie, um offline zu lesen
Motivation Approach Evaluation Conclusion and Future Work
DEER
RDF Data Extraction and Enrichment Framework
Mohamed Ahmed Sherif
September 15, 2015
Mohamed Ahmed Sherif ā€” DEER 1/31
Motivation Approach Evaluation Conclusion and Future Work
Outline
1 Motivation
2 Approach
3 Evaluation
4 Conclusion and Future Work
Mohamed Ahmed Sherif ā€” DEER 2/31
Motivation Approach Evaluation Conclusion and Future Work
Outline
1 Motivation
2 Approach
3 Evaluation
4 Conclusion and Future Work
Mohamed Ahmed Sherif ā€” DEER 3/31
Motivation Approach Evaluation Conclusion and Future Work
RDF Extraction & Enrichment
Need for enriched datasets
Tourism
Question Answering
Enhanced Reality
...
RDF extraction and enrichment
Triples to be added to the original
KB and/or
Triples to be deleted from the
original KB
Mohamed Ahmed Sherif ā€” DEER 4/31
Motivation Approach Evaluation Conclusion and Future Work
Example
GeoSuPhat
mo:MusicArtist
... artist name of George (Pete) Peterson. Born and raised
near the shores of Lake Michigan and its cold windy winters
he was Inļ¬‚uenced by bands like Pink Floyd, Kraftwerk, Todd
Rundgren, ELP, Hawkwind and his love of computers and
synthesis. After serving in the military and a few years
going to college in Colorado where he studied electronics
and computer programming. George moved to Europe and
settled down in Germany where he had met his wife
Christine. ...
jamendo:336993
foaf:name
a
mo:biography
Mohamed Ahmed Sherif ā€” DEER 5/31
Motivation Approach Evaluation Conclusion and Future Work
DEER Enrichment Functions
Idea
Generate enrichment data based on existing data
Input/output are RDF datasets
Current Enrichment Functions
Dereferencing
Linking
NLP
Conformation
Filter
Mohamed Ahmed Sherif ā€” DEER 6/31
Motivation Approach Evaluation Conclusion and Future Work
DEER Enrichment Operators
Idea
Create complex workļ¬‚ows by connecting modules
Input/output are RDF datasets
Current Enrichment Operators
Clone
Merge
Mohamed Ahmed Sherif ā€” DEER 7/31
Motivation Approach Evaluation Conclusion and Future Work
DEER Speciļ¬cation Paradigm
d1
Dereferencing
d2
cloned3
NLP
d4
Filter
d5 d6Merge
d7
AuthorityConform
d8
Mohamed Ahmed Sherif ā€” DEER 8/31
Motivation Approach Evaluation Conclusion and Future Work
Manual DEER Speciļ¬cation
1 @ p r e f i x : <http :// geoknow . org / s p e c s o n t o l o g y/> .
2 @ p r e f i x r d f s : <http ://www. w3 . org /2000/01/ rdfāˆ’schema#> .
3 @ p r e f i x geo : <http ://www. w3 . org /2003/01/ geo/ wgs84 pos#> .
4 : d1 a : Dataset ;
5 : hasUri <http :// dbpedia . org / r e s o u r c e / B e r l i n> ;
6 : fromEndPoint <http :// dbpedia . org / sparql> .
7 : d2 a : Dataset .
8 : d3 a : Dataset .
9 : d4 a : Dataset .
10 : d5 a : Dataset .
11 : d6 a : Dataset .
12 : d7 a : Dataset .
13 : d8 a : Dataset ;
14 : o u t p u t F i l e ā€ D e e r B e r l i n . t t l ā€ ;
15 : outputFormat ā€ T u r t l e ā€ .
16 : d e r e f a : Module , : DereferencingModule ;
17 r d f s : l a b e l ā€ D e r e f e r e n c i n g moduleā€ ;
18 : hasInput : d1 ;
19 : hasOutput : d2 ;
20 : hasParameter : derefParam1 .
21 : derefParam1 a : ModuleParameter , : DereferencingModuleParameter ;
22 : hasKey ā€ i n p u t P r o p e r t y 1 ā€ ;
23 : hasValue geo : l a t .
24 : c l o n e a : Operator , : CloneOperator ;
25 r d f s : l a b e l ā€ Clone o p e r a t o r ā€ ;
26 : hasInput : d2 ;
27 : hasOutput : d3 , : d4 .
28 : nlp a : Module , : NLPModule ;
29 r d f s : l a b e l ā€NLP moduleā€ ;
30 : hasInput : d3 ;
31 : hasOutput : d5 ;
32 : hasParameter : nlpPram1 , : nlpPram2 .Mohamed Ahmed Sherif ā€” DEER 9/31
Motivation Approach Evaluation Conclusion and Future Work
Manual KB Enrichment
Manual customized enrichment pipelines
āŠ• Leads to the expected results
Time consuming
Cannot be ported easily to other datasets
Mohamed Ahmed Sherif ā€” DEER 10/31
Motivation Approach Evaluation Conclusion and Future Work
Automatic KB Enrichment
Enrichment pipeline M : K ā†’ K that maps KB K to an
enriched KB K with K = M(K).
M is an ordered list of atomic enrichment functions
m āˆˆ M
M =
Ļ† if K = K ,
(m1, . . . , mn), where mi āˆˆ M, 1 ā‰¤ i ā‰¤ n otherwise.
Research questions
1 How to create self-conļ¬guring atomic enrichment
functions m āˆˆ M?
2 How to automatically generate an enrichment pipeline M?
Mohamed Ahmed Sherif ā€” DEER 11/31
Motivation Approach Evaluation Conclusion and Future Work
Automatic KB Enrichment
Enrichment pipeline M : K ā†’ K that maps KB K to an
enriched KB K with K = M(K).
M is an ordered list of atomic enrichment functions
m āˆˆ M
M =
Ļ† if K = K ,
(m1, . . . , mn), where mi āˆˆ M, 1 ā‰¤ i ā‰¤ n otherwise.
Research questions
1 How to create self-conļ¬guring atomic enrichment
functions m āˆˆ M?
2 How to automatically generate an enrichment pipeline M?
Mohamed Ahmed Sherif ā€” DEER 11/31
Motivation Approach Evaluation Conclusion and Future Work
Outline
1 Motivation
2 Approach
3 Evaluation
4 Conclusion and Future Work
Mohamed Ahmed Sherif ā€” DEER 12/31
Motivation Approach Evaluation Conclusion and Future Work
Running Example
Dataset DrugBank
Goal Gather information about companies related to drugs for
a market study
:Aspirin
:Paracetamol
:Ibuprofen
:Quinine
:Drug
a
a
a
a
Mohamed Ahmed Sherif ā€” DEER 13/31
Motivation Approach Evaluation Conclusion and Future Work
Running Example
Dataset DrugBank
Goal Gather information about companies related to drugs for
a market study
:Aspirin
:Paracetamol
:Ibuprofen
:Quinine
db:Ibuprofen
db:Aspirin
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drug
a
a
a
a
owl:sameAs
owl:sameAs
rdfs:comment
Mohamed Ahmed Sherif ā€” DEER 13/31
Motivation Approach Evaluation Conclusion and Future Work
Atomic Enrichment Functions
I. Dereferencing atomic enrichment function
Datasets are linked (e.g., using owl:sameAs)
Deferences pre-speciļ¬ed set of predicates
Adds found predicates to source the dataset
:Aspirin
:Paracetamol
:Ibuprofen
:Quinine
:Drug
a
a
a
a
Mohamed Ahmed Sherif ā€” DEER 14/31
Motivation Approach Evaluation Conclusion and Future Work
Atomic Enrichment Functions
I. Dereferencing atomic enrichment function
Datasets are linked (e.g., using owl:sameAs)
Deferences pre-speciļ¬ed set of predicates
Adds found predicates to source the dataset
:Aspirin
:Paracetamol
:Ibuprofen
:Quinine
db:Ibuprofen
db:Aspirin
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drug
a
a
a
a
owl:sameAs
owl:sameAs
rdfs:comment
Mohamed Ahmed Sherif ā€” DEER 14/31
Motivation Approach Evaluation Conclusion and Future Work
Atomic Enrichment Functions
I. Dereferencing atomic enrichment function
Datasets are linked (e.g., using owl:sameAs)
Deferences pre-speciļ¬ed set of predicates
Adds found predicates to source the dataset
:Aspirin
:Paracetamol
:Ibuprofen
:Quinine
db:Ibuprofen
db:Aspirin
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drug
a
a
a
a
owl:sameAs
owl:sameAs
rdfs:commentrdfs:comment
Mohamed Ahmed Sherif ā€” DEER 14/31
Motivation Approach Evaluation Conclusion and Future Work
Atomic Enrichment Functions
I. Dereferencing atomic enrichment function
Datasets are linked (e.g., using owl:sameAs)
Deferences pre-speciļ¬ed set of predicates
Adds found predicates to source the dataset
:Aspirin
:Paracetamol
:Ibuprofen
:Quinine
db:Ibuprofen
db:Aspirin
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drug
a
a
a
a
owl:sameAs
owl:sameAs
rdfs:commentrdfs:comment
rdfs:comment
Mohamed Ahmed Sherif ā€” DEER 14/31
Motivation Approach Evaluation Conclusion and Future Work
Self-Conļ¬guration
I. Dereferencing Enrichment Functions
Finds the set of predicates Dp from the enriched CBDs
that are missing from source CBDs
Non-enriched CBD of Ibuprofen
:Ibuprofendb:Ibuprofen :Drugaowl:sameAs
Enriched CBD of Ibuprofen
:Ibuprofendb:Ibuprofen
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drug
:BootsCompany
a
:relatedCompany
owl:sameAs
rdfs:comment
Dp = {:relatedCompany, rdfs:comment}
Mohamed Ahmed Sherif ā€” DEER 15/31
Motivation Approach Evaluation Conclusion and Future Work
Self-Conļ¬guration
I. Dereferencing Enrichment Functions
Finds the set of predicates Dp from the enriched CBDs
that are missing from source CBDs
Non-enriched CBD of Ibuprofen
:Ibuprofendb:Ibuprofen :Drugaowl:sameAs
Enriched CBD of Ibuprofen
:Ibuprofendb:Ibuprofen
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drug
:BootsCompany
a
:relatedCompany
owl:sameAs
rdfs:comment
Dp = {:relatedCompany, rdfs:comment}
Mohamed Ahmed Sherif ā€” DEER 15/31
Motivation Approach Evaluation Conclusion and Future Work
Self-Conļ¬guration
I. Dereferencing Enrichment Functions
Finds the set of predicates Dp from the enriched CBDs
that are missing from source CBDs
Non-enriched CBD of Ibuprofen
:Ibuprofendb:Ibuprofen :Drugaowl:sameAs
Enriched CBD of Ibuprofen
:Ibuprofendb:Ibuprofen
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drug
:BootsCompany
a
:relatedCompany
owl:sameAs
rdfs:comment
Dp = {:relatedCompany, rdfs:comment}
Mohamed Ahmed Sherif ā€” DEER 15/31
Motivation Approach Evaluation Conclusion and Future Work
Self-Conļ¬guration
I. Dereferencing Enrichment Functions
Finds the set of predicates Dp from the enriched CBDs
that are missing from source CBDs
Non-enriched CBD of Ibuprofen
:Ibuprofendb:Ibuprofen :Drugaowl:sameAs
Enriched CBD of Ibuprofen
:Ibuprofendb:Ibuprofen
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drug
:BootsCompany
a
:relatedCompany
owl:sameAs
rdfs:comment
Dp = {:relatedCompany, rdfs:comment}
Mohamed Ahmed Sherif ā€” DEER 15/31
Motivation Approach Evaluation Conclusion and Future Work
Self-Conļ¬guration
I. Dereferencing Enrichment Functions
Dereferences Dp = {:relatedCompany, rdfs:comment}
CBD of Ibuprofen
:Aspirin
:Paracetamol
:Ibuprofen
:Quinine
db:Ibuprofen
db:Aspirin
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drug
a
a
a
a
owl:sameAs
owl:sameAs
rdfs:comment
Finds only rdfs:comment, adds it to the source dataset
Dereferencing enriched CBD of Ibuprofen
:Ibuprofendb:Ibuprofen
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drugaowl:sameAs
rdfs:comment
Mohamed Ahmed Sherif ā€” DEER 16/31
Motivation Approach Evaluation Conclusion and Future Work
Self-Conļ¬guration
I. Dereferencing Enrichment Functions
Dereferences Dp = {:relatedCompany, rdfs:comment}
CBD of Ibuprofen
:Aspirin
:Paracetamol
:Ibuprofen
:Quinine
db:Ibuprofen
db:Aspirin
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drug
a
a
a
a
owl:sameAs
owl:sameAs
rdfs:comment
Finds only rdfs:comment, adds it to the source dataset
Dereferencing enriched CBD of Ibuprofen
:Ibuprofendb:Ibuprofen
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drugaowl:sameAs
rdfs:comment
Mohamed Ahmed Sherif ā€” DEER 16/31
Motivation Approach Evaluation Conclusion and Future Work
Self-Conļ¬guration
I. Dereferencing Enrichment Functions
Dereferences Dp = {:relatedCompany, rdfs:comment}
CBD of Ibuprofen
:Aspirin
:Paracetamol
:Ibuprofen
:Quinine
db:Ibuprofen
db:Aspirin
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drug
a
a
a
a
owl:sameAs
owl:sameAs
rdfs:comment
Finds only rdfs:comment, adds it to the source dataset
Dereferencing enriched CBD of Ibuprofen
:Ibuprofendb:Ibuprofen
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drugaowl:sameAs
rdfs:comment
Mohamed Ahmed Sherif ā€” DEER 16/31
Motivation Approach Evaluation Conclusion and Future Work
Self-Conļ¬guration
I. Dereferencing Enrichment Functions
Dereferences Dp = {:relatedCompany, rdfs:comment}
CBD of Ibuprofen
:Aspirin
:Paracetamol
:Ibuprofen
:Quinine
db:Ibuprofen
db:Aspirin
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drug
a
a
a
a
owl:sameAs
owl:sameAs
rdfs:comment
Finds only rdfs:comment, adds it to the source dataset
Dereferencing enriched CBD of Ibuprofen
:Ibuprofendb:Ibuprofen
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drugaowl:sameAs
rdfs:comment
Mohamed Ahmed Sherif ā€” DEER 16/31
Motivation Approach Evaluation Conclusion and Future Work
Atomic Enrichment Functions
II. NLP atomic enrichment function
Datatype objects contain unstructured information
Uses Named Entity Recognition to extract implicit data
Adds extracted entities to the source datasets
:Ibuprofendb:Ibuprofen
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drugaowl:sameAs
rdfs:comment
Mohamed Ahmed Sherif ā€” DEER 17/31
Motivation Approach Evaluation Conclusion and Future Work
Atomic Enrichment Functions
II. NLP atomic enrichment function
Datatype objects contain unstructured information
Uses Named Entity Recognition to extract implicit data
Adds extracted entities to the source datasets
:Ibuprofendb:Ibuprofen
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drugaowl:sameAs
rdfs:comment
Mohamed Ahmed Sherif ā€” DEER 17/31
Motivation Approach Evaluation Conclusion and Future Work
Atomic Enrichment Functions
II. NLP atomic enrichment function
Datatype objects contain unstructured information
Uses Named Entity Recognition to extract implicit data
Adds extracted entities to the source datasets
:Ibuprofendb:Ibuprofen
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drugaowl:sameAs
rdfs:comment
Mohamed Ahmed Sherif ā€” DEER 17/31
Motivation Approach Evaluation Conclusion and Future Work
Atomic Enrichment Functions
II. NLP atomic enrichment function
Datatype objects contain unstructured information
Uses Named Entity Recognition to extract implicit data
Adds extracted entities to the source datasets
:Ibuprofendb:Ibuprofen
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drug
:BootsCompany
a
fox:relatedTo
owl:sameAs
rdfs:comment
Mohamed Ahmed Sherif ā€” DEER 17/31
Motivation Approach Evaluation Conclusion and Future Work
Self-Conļ¬guration
II. NLP Enrichment Function
Extracts all possible named entity types
Adds extracted entities to the source dataset
NLP enriched CBD of Ibuprofen
:Ibuprofendb:Ibuprofen
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drugaowl:sameAs
rdfs:comment
Mohamed Ahmed Sherif ā€” DEER 18/31
Motivation Approach Evaluation Conclusion and Future Work
Self-Conļ¬guration
II. NLP Enrichment Function
Extracts all possible named entity types
Adds extracted entities to the source dataset
NLP enriched CBD of Ibuprofen
:Ibuprofendb:Ibuprofen
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drugaowl:sameAs
rdfs:comment
Mohamed Ahmed Sherif ā€” DEER 18/31
Motivation Approach Evaluation Conclusion and Future Work
Self-Conļ¬guration
II. NLP Enrichment Function
Extracts all possible named entity types
Adds extracted entities to the source dataset
NLP enriched CBD of Ibuprofen
:Ibuprofendb:Ibuprofen
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drug
:BootsCompany
a
fox:relatedTo
owl:sameAs
rdfs:comment
Mohamed Ahmed Sherif ā€” DEER 18/31
Motivation Approach Evaluation Conclusion and Future Work
Atomic Enrichment Functions
III. Predicate conformation atomic enrichment function
Enriched datasets may contain diverse ontologies
Predicate conformation maps a set of a pre-speciļ¬ed
predicates to a target ontology
:Ibuprofendb:Ibuprofen
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drug
:BootsCompany
a
fox:relatedTo
owl:sameAs
rdfs:comment
Mohamed Ahmed Sherif ā€” DEER 19/31
Motivation Approach Evaluation Conclusion and Future Work
Atomic Enrichment Functions
III. Predicate conformation atomic enrichment function
Enriched datasets may contain diverse ontologies
Predicate conformation maps a set of a pre-speciļ¬ed
predicates to a target ontology
:Ibuprofendb:Ibuprofen
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drug
:BootsCompany
a
fox:relatedTo
owl:sameAs
rdfs:comment
Mohamed Ahmed Sherif ā€” DEER 19/31
Motivation Approach Evaluation Conclusion and Future Work
Atomic Enrichment Functions
III. Predicate conformation atomic enrichment function
Enriched datasets may contain diverse ontologies
Predicate conformation maps a set of a pre-speciļ¬ed
predicates to a target ontology
:Ibuprofendb:Ibuprofen
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drug
:BootsCompany
a
fox:relatedTo:relatedCompany
owl:sameAs
rdfs:comment
Mohamed Ahmed Sherif ā€” DEER 19/31
Motivation Approach Evaluation Conclusion and Future Work
Self-Conļ¬guration
III. Predicate conformation Enrichment Function
Finds list of predicates Ps and Pt from the source resp.
target datasets with the same subject and objects
Changes each Ps with its respective Pt
NLP enriched CBD of Ibuprofen
:Ibuprofendb:Ibuprofen
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drug
:BootsCompany
a
fox:relatedTo
owl:sameAs
rdfs:comment
Enriched CBD of Ibuprofen (positive example target)
:Ibuprofendb:Ibuprofen
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drug
:BootsCompany
a
:relatedCompany
owl:sameAs
rdfs:comment
Mohamed Ahmed Sherif ā€” DEER 20/31
Motivation Approach Evaluation Conclusion and Future Work
Self-Conļ¬guration
III. Predicate conformation Enrichment Function
Finds list of predicates Ps and Pt from the source resp.
target datasets with the same subject and objects
Changes each Ps with its respective Pt
NLP enriched CBD of Ibuprofen
:Ibuprofendb:Ibuprofen
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drug
:BootsCompany
a
fox:relatedTo
owl:sameAs
rdfs:comment
Enriched CBD of Ibuprofen (positive example target)
:Ibuprofendb:Ibuprofen
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drug
:BootsCompany
a
:relatedCompany
owl:sameAs
rdfs:comment
Mohamed Ahmed Sherif ā€” DEER 20/31
Motivation Approach Evaluation Conclusion and Future Work
Self-Conļ¬guration
III. Predicate conformation Enrichment Function
Finds list of predicates Ps and Pt from the source resp.
target datasets with the same subject and objects
Changes each Ps with its respective Pt
NLP enriched CBD of Ibuprofen
:Ibuprofendb:Ibuprofen
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drug
:BootsCompany
a
fox:relatedTo:relatedCompany
owl:sameAs
rdfs:comment
Enriched CBD of Ibuprofen (positive example target)
:Ibuprofendb:Ibuprofen
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drug
:BootsCompany
a
:relatedCompany
owl:sameAs
rdfs:comment
Mohamed Ahmed Sherif ā€” DEER 20/31
Motivation Approach Evaluation Conclusion and Future Work
KB Enrichment Reļ¬nement Operator
Input
Set of atomic enrichment functions M
Set of positive examples E
Reļ¬nement Operator
Ļ(M) =
āˆ€māˆˆM
M ++ m ( ++ is the list append operator)
Output
Enrichment pipeline M
Mohamed Ahmed Sherif ā€” DEER 21/31
Motivation Approach Evaluation Conclusion and Future Work
KB Enrichment Reļ¬nement Operator
Input
Set of atomic enrichment functions M
Set of positive examples E
Reļ¬nement Operator
Ļ(M) =
āˆ€māˆˆM
M ++ m ( ++ is the list append operator)
Output
Enrichment pipeline M
Mohamed Ahmed Sherif ā€” DEER 21/31
Motivation Approach Evaluation Conclusion and Future Work
KB Enrichment Reļ¬nement Operator
Input
Set of atomic enrichment functions M
Set of positive examples E
Reļ¬nement Operator
Ļ(M) =
āˆ€māˆˆM
M ++ m ( ++ is the list append operator)
Output
Enrichment pipeline M
Mohamed Ahmed Sherif ā€” DEER 21/31
Motivation Approach Evaluation Conclusion and Future Work
Positive Example
:Ibuprofendb:Ibuprofen :Drugaowl:sameAs
Non-enriched CBD of Ibuprofen
:Ibuprofendb:Ibuprofen
Ibuprofen was extracted by the research arm
of Boots company during the 1960s ...
:Drug
:BootsCompany
a
:relatedCompany
owl:sameAs
rdfs:comment
Enriched CBD of Ibuprofen
Mohamed Ahmed Sherif ā€” DEER 22/31
Motivation Approach Evaluation Conclusion and Future Work
Learning Algorithm
1 Start by empty enrichment pipeline M = āŠ„
2 Self-conļ¬gure all mi āˆˆ M, add as child to āŠ„
3 Select most promising node
4 Expand most promising node
āŠ„
Mohamed Ahmed Sherif ā€” DEER 23/31
Motivation Approach Evaluation Conclusion and Future Work
Learning Algorithm
1 Start by empty enrichment pipeline M = āŠ„
2 Self-conļ¬gure all mi āˆˆ M, add as child to āŠ„
3 Select most promising node
4 Expand most promising node
āŠ„
(m1) (m2) (m3)
Mohamed Ahmed Sherif ā€” DEER 23/31
Motivation Approach Evaluation Conclusion and Future Work
Learning Algorithm
1 Start by empty enrichment pipeline M = āŠ„
2 Self-conļ¬gure all mi āˆˆ M, add as child to āŠ„
3 Select most promising node
4 Expand most promising node
āŠ„
(m1) (m2) (m3)
Mohamed Ahmed Sherif ā€” DEER 23/31
Motivation Approach Evaluation Conclusion and Future Work
Learning Algorithm
1 Start by empty enrichment pipeline M = āŠ„
2 Self-conļ¬gure all mi āˆˆ M, add as child to āŠ„
3 Select most promising node
4 Expand most promising node
āŠ„
(m1) (m2) (m3)
(m1, m2) (m1, m3)
Mohamed Ahmed Sherif ā€” DEER 23/31
Motivation Approach Evaluation Conclusion and Future Work
Learning Algorithm
1 Start by empty enrichment pipeline M = āŠ„
2 Self-conļ¬gure all mi āˆˆ M, add as child to āŠ„
3 Select most promising node
4 Expand most promising node
āŠ„
(m1) (m2) (m3)
(m1, m2) (m1, m3)
Mohamed Ahmed Sherif ā€” DEER 23/31
Motivation Approach Evaluation Conclusion and Future Work
Learning Algorithm
1 Start by empty enrichment pipeline M = āŠ„
2 Self-conļ¬gure all mi āˆˆ M, add as child to āŠ„
3 Select most promising node
4 Expand most promising node
āŠ„
(m1) (m2) (m3)
(m1, m2) (m1, m3) (m3, m1) (m3, m2)
Mohamed Ahmed Sherif ā€” DEER 23/31
Motivation Approach Evaluation Conclusion and Future Work
Learning Algorithm
1 Start by empty enrichment pipeline M = āŠ„
2 Self-conļ¬gure all mi āˆˆ M, add as child to āŠ„
3 Select most promising node
4 Expand most promising node
āŠ„
(m1) (m2) (m3)
(m1, m2) (m1, m3) (m3, m1) (m3, m2)
Mohamed Ahmed Sherif ā€” DEER 23/31
Motivation Approach Evaluation Conclusion and Future Work
Learning Algorithm
1 Start by empty enrichment pipeline M = āŠ„
2 Self-conļ¬gure all mi āˆˆ M, add as child to āŠ„
3 Select most promising node
4 Expand most promising node
āŠ„
(m1) (m2) (m3)
(m1, m2) (m1, m3) (m3, m1) (m3, m2)
(m3, m2, m1)
Mohamed Ahmed Sherif ā€” DEER 23/31
Motivation Approach Evaluation Conclusion and Future Work
Most Promising Node Selection
Node complexity c(n)
Linear combination of the nodeā€™s children count and level
Node ļ¬tness f (n)
Diļ¬€erence between nodeā€™s enrichment pipeline F-measure
and weighted complexity, f (n) = F(n) āˆ’ Ļ‰.c(n)
Ļ‰ controls the tradeoļ¬€ between
Greedy search (Ļ‰ = 0)
Search strategies closer to breadth-ļ¬rst search (Ļ‰ > 0).
Most promising node
The leaf node with the maximum ļ¬tness through the
whole reļ¬nement tree
Mohamed Ahmed Sherif ā€” DEER 24/31
Motivation Approach Evaluation Conclusion and Future Work
Most Promising Node Selection
Node complexity c(n)
Linear combination of the nodeā€™s children count and level
Node ļ¬tness f (n)
Diļ¬€erence between nodeā€™s enrichment pipeline F-measure
and weighted complexity, f (n) = F(n) āˆ’ Ļ‰.c(n)
Ļ‰ controls the tradeoļ¬€ between
Greedy search (Ļ‰ = 0)
Search strategies closer to breadth-ļ¬rst search (Ļ‰ > 0).
Most promising node
The leaf node with the maximum ļ¬tness through the
whole reļ¬nement tree
Mohamed Ahmed Sherif ā€” DEER 24/31
Motivation Approach Evaluation Conclusion and Future Work
Most Promising Node Selection
Node complexity c(n)
Linear combination of the nodeā€™s children count and level
Node ļ¬tness f (n)
Diļ¬€erence between nodeā€™s enrichment pipeline F-measure
and weighted complexity, f (n) = F(n) āˆ’ Ļ‰.c(n)
Ļ‰ controls the tradeoļ¬€ between
Greedy search (Ļ‰ = 0)
Search strategies closer to breadth-ļ¬rst search (Ļ‰ > 0).
Most promising node
The leaf node with the maximum ļ¬tness through the
whole reļ¬nement tree
Mohamed Ahmed Sherif ā€” DEER 24/31
Motivation Approach Evaluation Conclusion and Future Work
Outline
1 Motivation
2 Approach
3 Evaluation
4 Conclusion and Future Work
Mohamed Ahmed Sherif ā€” DEER 25/31
Motivation Approach Evaluation Conclusion and Future Work
Experimental Setup
Datasets
1 manual experimental enrichment pipelines for Jamendo
2 manual experimental enrichment pipelines for DrugBank
5 manual experimental enrichment pipelines for DBpedia
(AdministrativeRegion)
Learning Algorithm
6 atomic enrichment functions
Termination criterion:
Maximum number of iterations of 10
Optimal enrichment pipeline found (F-score = 1)
Mohamed Ahmed Sherif ā€” DEER 26/31
Motivation Approach Evaluation Conclusion and Future Work
Experimental Setup
Datasets
1 manual experimental enrichment pipelines for Jamendo
2 manual experimental enrichment pipelines for DrugBank
5 manual experimental enrichment pipelines for DBpedia
(AdministrativeRegion)
Learning Algorithm
6 atomic enrichment functions
Termination criterion:
Maximum number of iterations of 10
Optimal enrichment pipeline found (F-score = 1)
Mohamed Ahmed Sherif ā€” DEER 26/31
Motivation Approach Evaluation Conclusion and Future Work
Conļ¬guration of the Search Strategy
Node ļ¬tness
f (n) = F(n) āˆ’ Ļ‰.c(n)
Ļ‰ controls the tradeoļ¬€ between
Greedy search (Ļ‰ = 0)
Search strategies closer to
breadth ļ¬rst search (Ļ‰ > 0).
Result: Ļ‰ = 0.75 leads to the
best results
Ļ‰ P R F
0 1.0 0.99 0.99
0.25 1.0 0.99 0.99
0.50 1.0 0.99 0.99
0.75 1.0 1.0 1.0
1.0 1.0 0.99 0.99
Mohamed Ahmed Sherif ā€” DEER 27/31
Motivation Approach Evaluation Conclusion and Future Work
Eļ¬€ect of Positive Examples
Manual Examples Size of Time Size of Time Learn Iterations
M count M M(KB) learned
M
M (KB) Time count F-score
M1
DBpedia
1 1 0.2 1 1.6 1.3 1 1.0
2 1 0.2 1 1.8 1.3 1 1.0
M2
DBpedia
1 2 23.3 1 0.1 0.2 1 0.99
2 2 15 2 17 0.3 9 0.99
M3
DBpedia
1 3 14.7 3 15.2 6.1 9 0.99
2 3 15 2 15.1 0.1 9 0.99
M4
DBpedia
1 4 0.4 2 0.1 0.7 2 0.99
2 4 0.6 2 0.3 0.9 2 0.99
M5
DBpedia
1 5 22 2 0.1 0.7 2 1.0
2 5 25.5 2 0.2 0.9 2 1.0
M1
DrugBank
1 2 3.5 1 4.1 0.1 10 0.99
2 2 3.6 1 3.4 0.1 10 0.99
M2
DrugBank
1 3 25.2 1 0.1 0.1 10 0.99
2 3 22.8 1 0.1 0.1 10 0.99
M1
Jamendo
1 1 10.9 2 10.6 0.1 2 0.99
2 1 10.4 2 10.4 0.1 1 0.99
Mohamed Ahmed Sherif ā€” DEER 28/31
Motivation Approach Evaluation Conclusion and Future Work
Outline
1 Motivation
2 Approach
3 Evaluation
4 Conclusion and Future Work
Mohamed Ahmed Sherif ā€” DEER 29/31
Motivation Approach Evaluation Conclusion and Future Work
Conclusion and Future Work
Conclusion
Introduced DEER
Presented self-conļ¬guring atomic enrichment functions
Presented an approach for learning enrichment pipelines
based on a reļ¬nement operator
Future Work
Parallelize the algorithm on several CPUs as well as load
balancing
Support directed acyclic graphs as enrichment
speciļ¬cations by allowing to split and merge datasets
Pro-active enrichment strategies and active learning
Implement more enrichment function and operators
Mohamed Ahmed Sherif ā€” DEER 30/31
Motivation Approach Evaluation Conclusion and Future Work
Conclusion and Future Work
Conclusion
Introduced DEER
Presented self-conļ¬guring atomic enrichment functions
Presented an approach for learning enrichment pipelines
based on a reļ¬nement operator
Future Work
Parallelize the algorithm on several CPUs as well as load
balancing
Support directed acyclic graphs as enrichment
speciļ¬cations by allowing to split and merge datasets
Pro-active enrichment strategies and active learning
Implement more enrichment function and operators
Mohamed Ahmed Sherif ā€” DEER 30/31
Motivation Approach Evaluation Conclusion and Future Work
Thank You! Questions?
Mohamed Sherif
Augustusplatz 10
D-04109 Leipzig
sherif@informatik.uni-leipzig.de
http://aksw.org/MohamedSherif
http://aksw.org/Projects/DEER
Automating RDF Dataset Transformation and Enrichment by Mohamed Ahmed Sherif, Axel-Cyrille Ngonga
Ngomo, and Jens Lehmann in 12th Extended Semantic Web Conference, Portoroz, Slovenia, 2015
Mohamed Ahmed Sherif ā€” DEER 31/31

Weitere Ƥhnliche Inhalte

Andere mochten auch

Apuntes segundo parcial
Apuntes segundo parcialApuntes segundo parcial
Apuntes segundo parcialluisin777
Ā 
DEER - Automating RDF Dataset Transformation and Enrichment
DEER - Automating RDF Dataset Transformation and EnrichmentDEER - Automating RDF Dataset Transformation and Enrichment
DEER - Automating RDF Dataset Transformation and EnrichmentMohamed Sherif
Ā 
Gerencia de proyectos
Gerencia de proyectosGerencia de proyectos
Gerencia de proyectosGONZALO MENDOZA
Ā 
CĆ”ncer de prĆ³stata
CĆ”ncer de prĆ³stataCĆ”ncer de prĆ³stata
CĆ”ncer de prĆ³statafont Fawn
Ā 
Feeding Systems for Group-Housed Dairy Calves- Dr. Mark Thomas
Feeding Systems for Group-Housed Dairy Calves- Dr. Mark ThomasFeeding Systems for Group-Housed Dairy Calves- Dr. Mark Thomas
Feeding Systems for Group-Housed Dairy Calves- Dr. Mark ThomasDAIReXNET
Ā 
A guide to dairy calf feeding and management
A guide to dairy calf feeding and managementA guide to dairy calf feeding and management
A guide to dairy calf feeding and managementvincentwambua
Ā 
Power point Presentation on SEVAI - COW PROJECT, .
Power point Presentation on SEVAI - COW PROJECT, .Power point Presentation on SEVAI - COW PROJECT, .
Power point Presentation on SEVAI - COW PROJECT, .sevaingo
Ā 
Natural Parks in the U.S.
Natural Parks in the U.S.Natural Parks in the U.S.
Natural Parks in the U.S.Mallorca Vegas
Ā 
Avoiding Disease in Dairy Calves
Avoiding Disease in Dairy CalvesAvoiding Disease in Dairy Calves
Avoiding Disease in Dairy CalvesDAIReXNET
Ā 

Andere mochten auch (10)

Apuntes segundo parcial
Apuntes segundo parcialApuntes segundo parcial
Apuntes segundo parcial
Ā 
DEER - Automating RDF Dataset Transformation and Enrichment
DEER - Automating RDF Dataset Transformation and EnrichmentDEER - Automating RDF Dataset Transformation and Enrichment
DEER - Automating RDF Dataset Transformation and Enrichment
Ā 
Gerencia de proyectos
Gerencia de proyectosGerencia de proyectos
Gerencia de proyectos
Ā 
CĆ”ncer de prĆ³stata
CĆ”ncer de prĆ³stataCĆ”ncer de prĆ³stata
CĆ”ncer de prĆ³stata
Ā 
Endogamia e heterose
Endogamia e heteroseEndogamia e heterose
Endogamia e heterose
Ā 
Feeding Systems for Group-Housed Dairy Calves- Dr. Mark Thomas
Feeding Systems for Group-Housed Dairy Calves- Dr. Mark ThomasFeeding Systems for Group-Housed Dairy Calves- Dr. Mark Thomas
Feeding Systems for Group-Housed Dairy Calves- Dr. Mark Thomas
Ā 
A guide to dairy calf feeding and management
A guide to dairy calf feeding and managementA guide to dairy calf feeding and management
A guide to dairy calf feeding and management
Ā 
Power point Presentation on SEVAI - COW PROJECT, .
Power point Presentation on SEVAI - COW PROJECT, .Power point Presentation on SEVAI - COW PROJECT, .
Power point Presentation on SEVAI - COW PROJECT, .
Ā 
Natural Parks in the U.S.
Natural Parks in the U.S.Natural Parks in the U.S.
Natural Parks in the U.S.
Ā 
Avoiding Disease in Dairy Calves
Avoiding Disease in Dairy CalvesAvoiding Disease in Dairy Calves
Avoiding Disease in Dairy Calves
Ā 

Ƅhnlich wie DEER - RDF Data Extraction and Enrichment Framework

Unit 5 research structure template
Unit 5 research structure templateUnit 5 research structure template
Unit 5 research structure templateRob Fogerty
Ā 
Unit 5 research structure template
Unit 5 research structure templateUnit 5 research structure template
Unit 5 research structure templateRob Fogerty
Ā 
Unit 5 research structure template referral 1/2
Unit 5 research structure template referral 1/2 Unit 5 research structure template referral 1/2
Unit 5 research structure template referral 1/2 Rob Fogerty
Ā 
Unit 5 research structure template referral
Unit 5 research structure template referralUnit 5 research structure template referral
Unit 5 research structure template referralRob Fogerty
Ā 
The Search and Hyperlinking Task at MediaEval 2014
The Search and Hyperlinking Task at MediaEval 2014The Search and Hyperlinking Task at MediaEval 2014
The Search and Hyperlinking Task at MediaEval 2014multimediaeval
Ā 
Efficient source selection for sparql endpoint federation
Efficient source selection for sparql endpoint federationEfficient source selection for sparql endpoint federation
Efficient source selection for sparql endpoint federationMuhammad Saleem
Ā 
Federated Query Formulation and Processing through BioFed
Federated Query Formulation and Processing through BioFedFederated Query Formulation and Processing through BioFed
Federated Query Formulation and Processing through BioFedSyed Muhammad Ali Hasnain
Ā 
Federated Query Formulation and Processing Through BioFed
Federated Query Formulation and Processing Through BioFedFederated Query Formulation and Processing Through BioFed
Federated Query Formulation and Processing Through BioFedMuhammad Saleem
Ā 
Alex Tellez, Deep Learning Applications
Alex Tellez, Deep Learning ApplicationsAlex Tellez, Deep Learning Applications
Alex Tellez, Deep Learning ApplicationsSri Ambati
Ā 
Search and Hyperlinking Overview @MediaEval2014
Search and Hyperlinking Overview @MediaEval2014Search and Hyperlinking Overview @MediaEval2014
Search and Hyperlinking Overview @MediaEval2014Maria Eskevich
Ā 
Metadata Provenance
Metadata ProvenanceMetadata Provenance
Metadata ProvenanceKai Eckert
Ā 
Entity Summarization with User Feedback (ESWC 2020)
Entity Summarization with User Feedback (ESWC 2020)Entity Summarization with User Feedback (ESWC 2020)
Entity Summarization with User Feedback (ESWC 2020)Qingxia Liu
Ā 
Entities, Graphs, and Crowdsourcing for better Web Search
Entities, Graphs, and Crowdsourcing for better Web SearchEntities, Graphs, and Crowdsourcing for better Web Search
Entities, Graphs, and Crowdsourcing for better Web SearcheXascale Infolab
Ā 
Market Research Meets Big Data Analytics for Business Transformation
Market Research Meets Big Data Analytics  for Business Transformation Market Research Meets Big Data Analytics  for Business Transformation
Market Research Meets Big Data Analytics for Business Transformation Sally Sadosky
Ā 
Federated SPARQL Query Processing ISWC2015 Tutorial
Federated SPARQL Query Processing ISWC2015 TutorialFederated SPARQL Query Processing ISWC2015 Tutorial
Federated SPARQL Query Processing ISWC2015 TutorialMuhammad Saleem
Ā 
Extending faceted search to the general web
Extending faceted search to the general webExtending faceted search to the general web
Extending faceted search to the general webWei-Yuan Chang
Ā 
Logistics Management 214 - Effective searching
Logistics Management 214 - Effective searchingLogistics Management 214 - Effective searching
Logistics Management 214 - Effective searchingpvhead123
Ā 
Me4DCAP V0.1: a Method for the Development of Dublin Core Application Profiles
Me4DCAP V0.1: a Method for the Development of Dublin Core Application ProfilesMe4DCAP V0.1: a Method for the Development of Dublin Core Application Profiles
Me4DCAP V0.1: a Method for the Development of Dublin Core Application ProfilesMariana Curado Malta
Ā 
Data council sf amundsen presentation
Data council sf    amundsen presentationData council sf    amundsen presentation
Data council sf amundsen presentationTao Feng
Ā 

Ƅhnlich wie DEER - RDF Data Extraction and Enrichment Framework (20)

Unit 5 research structure template
Unit 5 research structure templateUnit 5 research structure template
Unit 5 research structure template
Ā 
Unit 5 research structure template
Unit 5 research structure templateUnit 5 research structure template
Unit 5 research structure template
Ā 
Unit 5 research structure template referral 1/2
Unit 5 research structure template referral 1/2 Unit 5 research structure template referral 1/2
Unit 5 research structure template referral 1/2
Ā 
Unit 5 research structure template referral
Unit 5 research structure template referralUnit 5 research structure template referral
Unit 5 research structure template referral
Ā 
The Search and Hyperlinking Task at MediaEval 2014
The Search and Hyperlinking Task at MediaEval 2014The Search and Hyperlinking Task at MediaEval 2014
The Search and Hyperlinking Task at MediaEval 2014
Ā 
Efficient source selection for sparql endpoint federation
Efficient source selection for sparql endpoint federationEfficient source selection for sparql endpoint federation
Efficient source selection for sparql endpoint federation
Ā 
Federated Query Formulation and Processing through BioFed
Federated Query Formulation and Processing through BioFedFederated Query Formulation and Processing through BioFed
Federated Query Formulation and Processing through BioFed
Ā 
Federated Query Formulation and Processing Through BioFed
Federated Query Formulation and Processing Through BioFedFederated Query Formulation and Processing Through BioFed
Federated Query Formulation and Processing Through BioFed
Ā 
Alex Tellez, Deep Learning Applications
Alex Tellez, Deep Learning ApplicationsAlex Tellez, Deep Learning Applications
Alex Tellez, Deep Learning Applications
Ā 
Search and Hyperlinking Overview @MediaEval2014
Search and Hyperlinking Overview @MediaEval2014Search and Hyperlinking Overview @MediaEval2014
Search and Hyperlinking Overview @MediaEval2014
Ā 
Metadata Provenance
Metadata ProvenanceMetadata Provenance
Metadata Provenance
Ā 
Entity Summarization with User Feedback (ESWC 2020)
Entity Summarization with User Feedback (ESWC 2020)Entity Summarization with User Feedback (ESWC 2020)
Entity Summarization with User Feedback (ESWC 2020)
Ā 
Entities, Graphs, and Crowdsourcing for better Web Search
Entities, Graphs, and Crowdsourcing for better Web SearchEntities, Graphs, and Crowdsourcing for better Web Search
Entities, Graphs, and Crowdsourcing for better Web Search
Ā 
Market Research Meets Big Data Analytics for Business Transformation
Market Research Meets Big Data Analytics  for Business Transformation Market Research Meets Big Data Analytics  for Business Transformation
Market Research Meets Big Data Analytics for Business Transformation
Ā 
Federated SPARQL Query Processing ISWC2015 Tutorial
Federated SPARQL Query Processing ISWC2015 TutorialFederated SPARQL Query Processing ISWC2015 Tutorial
Federated SPARQL Query Processing ISWC2015 Tutorial
Ā 
Extending faceted search to the general web
Extending faceted search to the general webExtending faceted search to the general web
Extending faceted search to the general web
Ā 
Logistics Management 214 - Effective searching
Logistics Management 214 - Effective searchingLogistics Management 214 - Effective searching
Logistics Management 214 - Effective searching
Ā 
Computer Scientists Retrieval - PDF Report
Computer Scientists Retrieval - PDF ReportComputer Scientists Retrieval - PDF Report
Computer Scientists Retrieval - PDF Report
Ā 
Me4DCAP V0.1: a Method for the Development of Dublin Core Application Profiles
Me4DCAP V0.1: a Method for the Development of Dublin Core Application ProfilesMe4DCAP V0.1: a Method for the Development of Dublin Core Application Profiles
Me4DCAP V0.1: a Method for the Development of Dublin Core Application Profiles
Ā 
Data council sf amundsen presentation
Data council sf    amundsen presentationData council sf    amundsen presentation
Data council sf amundsen presentation
Ā 

KĆ¼rzlich hochgeladen

Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girlsssuser7cb4ff
Ā 
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEINFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEroselinkalist12
Ā 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...asadnawaz62
Ā 
TechTACĀ® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTACĀ® CFD Report Summary: A Comparison of Two Types of Tubing Anchor CatchersTechTACĀ® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTACĀ® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catcherssdickerson1
Ā 
Earthing details of Electrical Substation
Earthing details of Electrical SubstationEarthing details of Electrical Substation
Earthing details of Electrical Substationstephanwindworld
Ā 
8251 universal synchronous asynchronous receiver transmitter
8251 universal synchronous asynchronous receiver transmitter8251 universal synchronous asynchronous receiver transmitter
8251 universal synchronous asynchronous receiver transmitterShivangiSharma879191
Ā 
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncWhy does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncssuser2ae721
Ā 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxKartikeyaDwivedi3
Ā 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024hassan khalil
Ā 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfAsst.prof M.Gokilavani
Ā 
Piping Basic stress analysis by engineering
Piping Basic stress analysis by engineeringPiping Basic stress analysis by engineering
Piping Basic stress analysis by engineeringJuanCarlosMorales19600
Ā 
Class 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm SystemClass 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm Systemirfanmechengr
Ā 
Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHC Sai Kiran
Ā 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...VICTOR MAESTRE RAMIREZ
Ā 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxk795866
Ā 
Study on Air-Water & Water-Water Heat Exchange in a Finned ļ»æTube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned ļ»æTube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned ļ»æTube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned ļ»æTube ExchangerAnamika Sarkar
Ā 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AIabhishek36461
Ā 

KĆ¼rzlich hochgeladen (20)

Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girls
Ā 
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEINFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
Ā 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...
Ā 
TechTACĀ® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTACĀ® CFD Report Summary: A Comparison of Two Types of Tubing Anchor CatchersTechTACĀ® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTACĀ® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
Ā 
Design and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdfDesign and analysis of solar grass cutter.pdf
Design and analysis of solar grass cutter.pdf
Ā 
Earthing details of Electrical Substation
Earthing details of Electrical SubstationEarthing details of Electrical Substation
Earthing details of Electrical Substation
Ā 
8251 universal synchronous asynchronous receiver transmitter
8251 universal synchronous asynchronous receiver transmitter8251 universal synchronous asynchronous receiver transmitter
8251 universal synchronous asynchronous receiver transmitter
Ā 
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncWhy does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Ā 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptx
Ā 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024
Ā 
young call girls in Rajiv ChowkšŸ” 9953056974 šŸ” Delhi escort Service
young call girls in Rajiv ChowkšŸ” 9953056974 šŸ” Delhi escort Serviceyoung call girls in Rajiv ChowkšŸ” 9953056974 šŸ” Delhi escort Service
young call girls in Rajiv ChowkšŸ” 9953056974 šŸ” Delhi escort Service
Ā 
young call girls in Green ParkšŸ” 9953056974 šŸ” escort Service
young call girls in Green ParkšŸ” 9953056974 šŸ” escort Serviceyoung call girls in Green ParkšŸ” 9953056974 šŸ” escort Service
young call girls in Green ParkšŸ” 9953056974 šŸ” escort Service
Ā 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
Ā 
Piping Basic stress analysis by engineering
Piping Basic stress analysis by engineeringPiping Basic stress analysis by engineering
Piping Basic stress analysis by engineering
Ā 
Class 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm SystemClass 1 | NFPA 72 | Overview Fire Alarm System
Class 1 | NFPA 72 | Overview Fire Alarm System
Ā 
Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECH
Ā 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...
Ā 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptx
Ā 
Study on Air-Water & Water-Water Heat Exchange in a Finned ļ»æTube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned ļ»æTube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned ļ»æTube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned ļ»æTube Exchanger
Ā 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AI
Ā 

DEER - RDF Data Extraction and Enrichment Framework

  • 1. Motivation Approach Evaluation Conclusion and Future Work DEER RDF Data Extraction and Enrichment Framework Mohamed Ahmed Sherif September 15, 2015 Mohamed Ahmed Sherif ā€” DEER 1/31
  • 2. Motivation Approach Evaluation Conclusion and Future Work Outline 1 Motivation 2 Approach 3 Evaluation 4 Conclusion and Future Work Mohamed Ahmed Sherif ā€” DEER 2/31
  • 3. Motivation Approach Evaluation Conclusion and Future Work Outline 1 Motivation 2 Approach 3 Evaluation 4 Conclusion and Future Work Mohamed Ahmed Sherif ā€” DEER 3/31
  • 4. Motivation Approach Evaluation Conclusion and Future Work RDF Extraction & Enrichment Need for enriched datasets Tourism Question Answering Enhanced Reality ... RDF extraction and enrichment Triples to be added to the original KB and/or Triples to be deleted from the original KB Mohamed Ahmed Sherif ā€” DEER 4/31
  • 5. Motivation Approach Evaluation Conclusion and Future Work Example GeoSuPhat mo:MusicArtist ... artist name of George (Pete) Peterson. Born and raised near the shores of Lake Michigan and its cold windy winters he was Inļ¬‚uenced by bands like Pink Floyd, Kraftwerk, Todd Rundgren, ELP, Hawkwind and his love of computers and synthesis. After serving in the military and a few years going to college in Colorado where he studied electronics and computer programming. George moved to Europe and settled down in Germany where he had met his wife Christine. ... jamendo:336993 foaf:name a mo:biography Mohamed Ahmed Sherif ā€” DEER 5/31
  • 6. Motivation Approach Evaluation Conclusion and Future Work DEER Enrichment Functions Idea Generate enrichment data based on existing data Input/output are RDF datasets Current Enrichment Functions Dereferencing Linking NLP Conformation Filter Mohamed Ahmed Sherif ā€” DEER 6/31
  • 7. Motivation Approach Evaluation Conclusion and Future Work DEER Enrichment Operators Idea Create complex workļ¬‚ows by connecting modules Input/output are RDF datasets Current Enrichment Operators Clone Merge Mohamed Ahmed Sherif ā€” DEER 7/31
  • 8. Motivation Approach Evaluation Conclusion and Future Work DEER Speciļ¬cation Paradigm d1 Dereferencing d2 cloned3 NLP d4 Filter d5 d6Merge d7 AuthorityConform d8 Mohamed Ahmed Sherif ā€” DEER 8/31
  • 9. Motivation Approach Evaluation Conclusion and Future Work Manual DEER Speciļ¬cation 1 @ p r e f i x : <http :// geoknow . org / s p e c s o n t o l o g y/> . 2 @ p r e f i x r d f s : <http ://www. w3 . org /2000/01/ rdfāˆ’schema#> . 3 @ p r e f i x geo : <http ://www. w3 . org /2003/01/ geo/ wgs84 pos#> . 4 : d1 a : Dataset ; 5 : hasUri <http :// dbpedia . org / r e s o u r c e / B e r l i n> ; 6 : fromEndPoint <http :// dbpedia . org / sparql> . 7 : d2 a : Dataset . 8 : d3 a : Dataset . 9 : d4 a : Dataset . 10 : d5 a : Dataset . 11 : d6 a : Dataset . 12 : d7 a : Dataset . 13 : d8 a : Dataset ; 14 : o u t p u t F i l e ā€ D e e r B e r l i n . t t l ā€ ; 15 : outputFormat ā€ T u r t l e ā€ . 16 : d e r e f a : Module , : DereferencingModule ; 17 r d f s : l a b e l ā€ D e r e f e r e n c i n g moduleā€ ; 18 : hasInput : d1 ; 19 : hasOutput : d2 ; 20 : hasParameter : derefParam1 . 21 : derefParam1 a : ModuleParameter , : DereferencingModuleParameter ; 22 : hasKey ā€ i n p u t P r o p e r t y 1 ā€ ; 23 : hasValue geo : l a t . 24 : c l o n e a : Operator , : CloneOperator ; 25 r d f s : l a b e l ā€ Clone o p e r a t o r ā€ ; 26 : hasInput : d2 ; 27 : hasOutput : d3 , : d4 . 28 : nlp a : Module , : NLPModule ; 29 r d f s : l a b e l ā€NLP moduleā€ ; 30 : hasInput : d3 ; 31 : hasOutput : d5 ; 32 : hasParameter : nlpPram1 , : nlpPram2 .Mohamed Ahmed Sherif ā€” DEER 9/31
  • 10. Motivation Approach Evaluation Conclusion and Future Work Manual KB Enrichment Manual customized enrichment pipelines āŠ• Leads to the expected results Time consuming Cannot be ported easily to other datasets Mohamed Ahmed Sherif ā€” DEER 10/31
  • 11. Motivation Approach Evaluation Conclusion and Future Work Automatic KB Enrichment Enrichment pipeline M : K ā†’ K that maps KB K to an enriched KB K with K = M(K). M is an ordered list of atomic enrichment functions m āˆˆ M M = Ļ† if K = K , (m1, . . . , mn), where mi āˆˆ M, 1 ā‰¤ i ā‰¤ n otherwise. Research questions 1 How to create self-conļ¬guring atomic enrichment functions m āˆˆ M? 2 How to automatically generate an enrichment pipeline M? Mohamed Ahmed Sherif ā€” DEER 11/31
  • 12. Motivation Approach Evaluation Conclusion and Future Work Automatic KB Enrichment Enrichment pipeline M : K ā†’ K that maps KB K to an enriched KB K with K = M(K). M is an ordered list of atomic enrichment functions m āˆˆ M M = Ļ† if K = K , (m1, . . . , mn), where mi āˆˆ M, 1 ā‰¤ i ā‰¤ n otherwise. Research questions 1 How to create self-conļ¬guring atomic enrichment functions m āˆˆ M? 2 How to automatically generate an enrichment pipeline M? Mohamed Ahmed Sherif ā€” DEER 11/31
  • 13. Motivation Approach Evaluation Conclusion and Future Work Outline 1 Motivation 2 Approach 3 Evaluation 4 Conclusion and Future Work Mohamed Ahmed Sherif ā€” DEER 12/31
  • 14. Motivation Approach Evaluation Conclusion and Future Work Running Example Dataset DrugBank Goal Gather information about companies related to drugs for a market study :Aspirin :Paracetamol :Ibuprofen :Quinine :Drug a a a a Mohamed Ahmed Sherif ā€” DEER 13/31
  • 15. Motivation Approach Evaluation Conclusion and Future Work Running Example Dataset DrugBank Goal Gather information about companies related to drugs for a market study :Aspirin :Paracetamol :Ibuprofen :Quinine db:Ibuprofen db:Aspirin Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drug a a a a owl:sameAs owl:sameAs rdfs:comment Mohamed Ahmed Sherif ā€” DEER 13/31
  • 16. Motivation Approach Evaluation Conclusion and Future Work Atomic Enrichment Functions I. Dereferencing atomic enrichment function Datasets are linked (e.g., using owl:sameAs) Deferences pre-speciļ¬ed set of predicates Adds found predicates to source the dataset :Aspirin :Paracetamol :Ibuprofen :Quinine :Drug a a a a Mohamed Ahmed Sherif ā€” DEER 14/31
  • 17. Motivation Approach Evaluation Conclusion and Future Work Atomic Enrichment Functions I. Dereferencing atomic enrichment function Datasets are linked (e.g., using owl:sameAs) Deferences pre-speciļ¬ed set of predicates Adds found predicates to source the dataset :Aspirin :Paracetamol :Ibuprofen :Quinine db:Ibuprofen db:Aspirin Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drug a a a a owl:sameAs owl:sameAs rdfs:comment Mohamed Ahmed Sherif ā€” DEER 14/31
  • 18. Motivation Approach Evaluation Conclusion and Future Work Atomic Enrichment Functions I. Dereferencing atomic enrichment function Datasets are linked (e.g., using owl:sameAs) Deferences pre-speciļ¬ed set of predicates Adds found predicates to source the dataset :Aspirin :Paracetamol :Ibuprofen :Quinine db:Ibuprofen db:Aspirin Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drug a a a a owl:sameAs owl:sameAs rdfs:commentrdfs:comment Mohamed Ahmed Sherif ā€” DEER 14/31
  • 19. Motivation Approach Evaluation Conclusion and Future Work Atomic Enrichment Functions I. Dereferencing atomic enrichment function Datasets are linked (e.g., using owl:sameAs) Deferences pre-speciļ¬ed set of predicates Adds found predicates to source the dataset :Aspirin :Paracetamol :Ibuprofen :Quinine db:Ibuprofen db:Aspirin Ibuprofen was extracted by the research arm of Boots company during the 1960s ... Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drug a a a a owl:sameAs owl:sameAs rdfs:commentrdfs:comment rdfs:comment Mohamed Ahmed Sherif ā€” DEER 14/31
  • 20. Motivation Approach Evaluation Conclusion and Future Work Self-Conļ¬guration I. Dereferencing Enrichment Functions Finds the set of predicates Dp from the enriched CBDs that are missing from source CBDs Non-enriched CBD of Ibuprofen :Ibuprofendb:Ibuprofen :Drugaowl:sameAs Enriched CBD of Ibuprofen :Ibuprofendb:Ibuprofen Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drug :BootsCompany a :relatedCompany owl:sameAs rdfs:comment Dp = {:relatedCompany, rdfs:comment} Mohamed Ahmed Sherif ā€” DEER 15/31
  • 21. Motivation Approach Evaluation Conclusion and Future Work Self-Conļ¬guration I. Dereferencing Enrichment Functions Finds the set of predicates Dp from the enriched CBDs that are missing from source CBDs Non-enriched CBD of Ibuprofen :Ibuprofendb:Ibuprofen :Drugaowl:sameAs Enriched CBD of Ibuprofen :Ibuprofendb:Ibuprofen Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drug :BootsCompany a :relatedCompany owl:sameAs rdfs:comment Dp = {:relatedCompany, rdfs:comment} Mohamed Ahmed Sherif ā€” DEER 15/31
  • 22. Motivation Approach Evaluation Conclusion and Future Work Self-Conļ¬guration I. Dereferencing Enrichment Functions Finds the set of predicates Dp from the enriched CBDs that are missing from source CBDs Non-enriched CBD of Ibuprofen :Ibuprofendb:Ibuprofen :Drugaowl:sameAs Enriched CBD of Ibuprofen :Ibuprofendb:Ibuprofen Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drug :BootsCompany a :relatedCompany owl:sameAs rdfs:comment Dp = {:relatedCompany, rdfs:comment} Mohamed Ahmed Sherif ā€” DEER 15/31
  • 23. Motivation Approach Evaluation Conclusion and Future Work Self-Conļ¬guration I. Dereferencing Enrichment Functions Finds the set of predicates Dp from the enriched CBDs that are missing from source CBDs Non-enriched CBD of Ibuprofen :Ibuprofendb:Ibuprofen :Drugaowl:sameAs Enriched CBD of Ibuprofen :Ibuprofendb:Ibuprofen Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drug :BootsCompany a :relatedCompany owl:sameAs rdfs:comment Dp = {:relatedCompany, rdfs:comment} Mohamed Ahmed Sherif ā€” DEER 15/31
  • 24. Motivation Approach Evaluation Conclusion and Future Work Self-Conļ¬guration I. Dereferencing Enrichment Functions Dereferences Dp = {:relatedCompany, rdfs:comment} CBD of Ibuprofen :Aspirin :Paracetamol :Ibuprofen :Quinine db:Ibuprofen db:Aspirin Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drug a a a a owl:sameAs owl:sameAs rdfs:comment Finds only rdfs:comment, adds it to the source dataset Dereferencing enriched CBD of Ibuprofen :Ibuprofendb:Ibuprofen Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drugaowl:sameAs rdfs:comment Mohamed Ahmed Sherif ā€” DEER 16/31
  • 25. Motivation Approach Evaluation Conclusion and Future Work Self-Conļ¬guration I. Dereferencing Enrichment Functions Dereferences Dp = {:relatedCompany, rdfs:comment} CBD of Ibuprofen :Aspirin :Paracetamol :Ibuprofen :Quinine db:Ibuprofen db:Aspirin Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drug a a a a owl:sameAs owl:sameAs rdfs:comment Finds only rdfs:comment, adds it to the source dataset Dereferencing enriched CBD of Ibuprofen :Ibuprofendb:Ibuprofen Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drugaowl:sameAs rdfs:comment Mohamed Ahmed Sherif ā€” DEER 16/31
  • 26. Motivation Approach Evaluation Conclusion and Future Work Self-Conļ¬guration I. Dereferencing Enrichment Functions Dereferences Dp = {:relatedCompany, rdfs:comment} CBD of Ibuprofen :Aspirin :Paracetamol :Ibuprofen :Quinine db:Ibuprofen db:Aspirin Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drug a a a a owl:sameAs owl:sameAs rdfs:comment Finds only rdfs:comment, adds it to the source dataset Dereferencing enriched CBD of Ibuprofen :Ibuprofendb:Ibuprofen Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drugaowl:sameAs rdfs:comment Mohamed Ahmed Sherif ā€” DEER 16/31
  • 27. Motivation Approach Evaluation Conclusion and Future Work Self-Conļ¬guration I. Dereferencing Enrichment Functions Dereferences Dp = {:relatedCompany, rdfs:comment} CBD of Ibuprofen :Aspirin :Paracetamol :Ibuprofen :Quinine db:Ibuprofen db:Aspirin Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drug a a a a owl:sameAs owl:sameAs rdfs:comment Finds only rdfs:comment, adds it to the source dataset Dereferencing enriched CBD of Ibuprofen :Ibuprofendb:Ibuprofen Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drugaowl:sameAs rdfs:comment Mohamed Ahmed Sherif ā€” DEER 16/31
  • 28. Motivation Approach Evaluation Conclusion and Future Work Atomic Enrichment Functions II. NLP atomic enrichment function Datatype objects contain unstructured information Uses Named Entity Recognition to extract implicit data Adds extracted entities to the source datasets :Ibuprofendb:Ibuprofen Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drugaowl:sameAs rdfs:comment Mohamed Ahmed Sherif ā€” DEER 17/31
  • 29. Motivation Approach Evaluation Conclusion and Future Work Atomic Enrichment Functions II. NLP atomic enrichment function Datatype objects contain unstructured information Uses Named Entity Recognition to extract implicit data Adds extracted entities to the source datasets :Ibuprofendb:Ibuprofen Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drugaowl:sameAs rdfs:comment Mohamed Ahmed Sherif ā€” DEER 17/31
  • 30. Motivation Approach Evaluation Conclusion and Future Work Atomic Enrichment Functions II. NLP atomic enrichment function Datatype objects contain unstructured information Uses Named Entity Recognition to extract implicit data Adds extracted entities to the source datasets :Ibuprofendb:Ibuprofen Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drugaowl:sameAs rdfs:comment Mohamed Ahmed Sherif ā€” DEER 17/31
  • 31. Motivation Approach Evaluation Conclusion and Future Work Atomic Enrichment Functions II. NLP atomic enrichment function Datatype objects contain unstructured information Uses Named Entity Recognition to extract implicit data Adds extracted entities to the source datasets :Ibuprofendb:Ibuprofen Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drug :BootsCompany a fox:relatedTo owl:sameAs rdfs:comment Mohamed Ahmed Sherif ā€” DEER 17/31
  • 32. Motivation Approach Evaluation Conclusion and Future Work Self-Conļ¬guration II. NLP Enrichment Function Extracts all possible named entity types Adds extracted entities to the source dataset NLP enriched CBD of Ibuprofen :Ibuprofendb:Ibuprofen Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drugaowl:sameAs rdfs:comment Mohamed Ahmed Sherif ā€” DEER 18/31
  • 33. Motivation Approach Evaluation Conclusion and Future Work Self-Conļ¬guration II. NLP Enrichment Function Extracts all possible named entity types Adds extracted entities to the source dataset NLP enriched CBD of Ibuprofen :Ibuprofendb:Ibuprofen Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drugaowl:sameAs rdfs:comment Mohamed Ahmed Sherif ā€” DEER 18/31
  • 34. Motivation Approach Evaluation Conclusion and Future Work Self-Conļ¬guration II. NLP Enrichment Function Extracts all possible named entity types Adds extracted entities to the source dataset NLP enriched CBD of Ibuprofen :Ibuprofendb:Ibuprofen Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drug :BootsCompany a fox:relatedTo owl:sameAs rdfs:comment Mohamed Ahmed Sherif ā€” DEER 18/31
  • 35. Motivation Approach Evaluation Conclusion and Future Work Atomic Enrichment Functions III. Predicate conformation atomic enrichment function Enriched datasets may contain diverse ontologies Predicate conformation maps a set of a pre-speciļ¬ed predicates to a target ontology :Ibuprofendb:Ibuprofen Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drug :BootsCompany a fox:relatedTo owl:sameAs rdfs:comment Mohamed Ahmed Sherif ā€” DEER 19/31
  • 36. Motivation Approach Evaluation Conclusion and Future Work Atomic Enrichment Functions III. Predicate conformation atomic enrichment function Enriched datasets may contain diverse ontologies Predicate conformation maps a set of a pre-speciļ¬ed predicates to a target ontology :Ibuprofendb:Ibuprofen Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drug :BootsCompany a fox:relatedTo owl:sameAs rdfs:comment Mohamed Ahmed Sherif ā€” DEER 19/31
  • 37. Motivation Approach Evaluation Conclusion and Future Work Atomic Enrichment Functions III. Predicate conformation atomic enrichment function Enriched datasets may contain diverse ontologies Predicate conformation maps a set of a pre-speciļ¬ed predicates to a target ontology :Ibuprofendb:Ibuprofen Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drug :BootsCompany a fox:relatedTo:relatedCompany owl:sameAs rdfs:comment Mohamed Ahmed Sherif ā€” DEER 19/31
  • 38. Motivation Approach Evaluation Conclusion and Future Work Self-Conļ¬guration III. Predicate conformation Enrichment Function Finds list of predicates Ps and Pt from the source resp. target datasets with the same subject and objects Changes each Ps with its respective Pt NLP enriched CBD of Ibuprofen :Ibuprofendb:Ibuprofen Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drug :BootsCompany a fox:relatedTo owl:sameAs rdfs:comment Enriched CBD of Ibuprofen (positive example target) :Ibuprofendb:Ibuprofen Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drug :BootsCompany a :relatedCompany owl:sameAs rdfs:comment Mohamed Ahmed Sherif ā€” DEER 20/31
  • 39. Motivation Approach Evaluation Conclusion and Future Work Self-Conļ¬guration III. Predicate conformation Enrichment Function Finds list of predicates Ps and Pt from the source resp. target datasets with the same subject and objects Changes each Ps with its respective Pt NLP enriched CBD of Ibuprofen :Ibuprofendb:Ibuprofen Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drug :BootsCompany a fox:relatedTo owl:sameAs rdfs:comment Enriched CBD of Ibuprofen (positive example target) :Ibuprofendb:Ibuprofen Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drug :BootsCompany a :relatedCompany owl:sameAs rdfs:comment Mohamed Ahmed Sherif ā€” DEER 20/31
  • 40. Motivation Approach Evaluation Conclusion and Future Work Self-Conļ¬guration III. Predicate conformation Enrichment Function Finds list of predicates Ps and Pt from the source resp. target datasets with the same subject and objects Changes each Ps with its respective Pt NLP enriched CBD of Ibuprofen :Ibuprofendb:Ibuprofen Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drug :BootsCompany a fox:relatedTo:relatedCompany owl:sameAs rdfs:comment Enriched CBD of Ibuprofen (positive example target) :Ibuprofendb:Ibuprofen Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drug :BootsCompany a :relatedCompany owl:sameAs rdfs:comment Mohamed Ahmed Sherif ā€” DEER 20/31
  • 41. Motivation Approach Evaluation Conclusion and Future Work KB Enrichment Reļ¬nement Operator Input Set of atomic enrichment functions M Set of positive examples E Reļ¬nement Operator Ļ(M) = āˆ€māˆˆM M ++ m ( ++ is the list append operator) Output Enrichment pipeline M Mohamed Ahmed Sherif ā€” DEER 21/31
  • 42. Motivation Approach Evaluation Conclusion and Future Work KB Enrichment Reļ¬nement Operator Input Set of atomic enrichment functions M Set of positive examples E Reļ¬nement Operator Ļ(M) = āˆ€māˆˆM M ++ m ( ++ is the list append operator) Output Enrichment pipeline M Mohamed Ahmed Sherif ā€” DEER 21/31
  • 43. Motivation Approach Evaluation Conclusion and Future Work KB Enrichment Reļ¬nement Operator Input Set of atomic enrichment functions M Set of positive examples E Reļ¬nement Operator Ļ(M) = āˆ€māˆˆM M ++ m ( ++ is the list append operator) Output Enrichment pipeline M Mohamed Ahmed Sherif ā€” DEER 21/31
  • 44. Motivation Approach Evaluation Conclusion and Future Work Positive Example :Ibuprofendb:Ibuprofen :Drugaowl:sameAs Non-enriched CBD of Ibuprofen :Ibuprofendb:Ibuprofen Ibuprofen was extracted by the research arm of Boots company during the 1960s ... :Drug :BootsCompany a :relatedCompany owl:sameAs rdfs:comment Enriched CBD of Ibuprofen Mohamed Ahmed Sherif ā€” DEER 22/31
  • 45. Motivation Approach Evaluation Conclusion and Future Work Learning Algorithm 1 Start by empty enrichment pipeline M = āŠ„ 2 Self-conļ¬gure all mi āˆˆ M, add as child to āŠ„ 3 Select most promising node 4 Expand most promising node āŠ„ Mohamed Ahmed Sherif ā€” DEER 23/31
  • 46. Motivation Approach Evaluation Conclusion and Future Work Learning Algorithm 1 Start by empty enrichment pipeline M = āŠ„ 2 Self-conļ¬gure all mi āˆˆ M, add as child to āŠ„ 3 Select most promising node 4 Expand most promising node āŠ„ (m1) (m2) (m3) Mohamed Ahmed Sherif ā€” DEER 23/31
  • 47. Motivation Approach Evaluation Conclusion and Future Work Learning Algorithm 1 Start by empty enrichment pipeline M = āŠ„ 2 Self-conļ¬gure all mi āˆˆ M, add as child to āŠ„ 3 Select most promising node 4 Expand most promising node āŠ„ (m1) (m2) (m3) Mohamed Ahmed Sherif ā€” DEER 23/31
  • 48. Motivation Approach Evaluation Conclusion and Future Work Learning Algorithm 1 Start by empty enrichment pipeline M = āŠ„ 2 Self-conļ¬gure all mi āˆˆ M, add as child to āŠ„ 3 Select most promising node 4 Expand most promising node āŠ„ (m1) (m2) (m3) (m1, m2) (m1, m3) Mohamed Ahmed Sherif ā€” DEER 23/31
  • 49. Motivation Approach Evaluation Conclusion and Future Work Learning Algorithm 1 Start by empty enrichment pipeline M = āŠ„ 2 Self-conļ¬gure all mi āˆˆ M, add as child to āŠ„ 3 Select most promising node 4 Expand most promising node āŠ„ (m1) (m2) (m3) (m1, m2) (m1, m3) Mohamed Ahmed Sherif ā€” DEER 23/31
  • 50. Motivation Approach Evaluation Conclusion and Future Work Learning Algorithm 1 Start by empty enrichment pipeline M = āŠ„ 2 Self-conļ¬gure all mi āˆˆ M, add as child to āŠ„ 3 Select most promising node 4 Expand most promising node āŠ„ (m1) (m2) (m3) (m1, m2) (m1, m3) (m3, m1) (m3, m2) Mohamed Ahmed Sherif ā€” DEER 23/31
  • 51. Motivation Approach Evaluation Conclusion and Future Work Learning Algorithm 1 Start by empty enrichment pipeline M = āŠ„ 2 Self-conļ¬gure all mi āˆˆ M, add as child to āŠ„ 3 Select most promising node 4 Expand most promising node āŠ„ (m1) (m2) (m3) (m1, m2) (m1, m3) (m3, m1) (m3, m2) Mohamed Ahmed Sherif ā€” DEER 23/31
  • 52. Motivation Approach Evaluation Conclusion and Future Work Learning Algorithm 1 Start by empty enrichment pipeline M = āŠ„ 2 Self-conļ¬gure all mi āˆˆ M, add as child to āŠ„ 3 Select most promising node 4 Expand most promising node āŠ„ (m1) (m2) (m3) (m1, m2) (m1, m3) (m3, m1) (m3, m2) (m3, m2, m1) Mohamed Ahmed Sherif ā€” DEER 23/31
  • 53. Motivation Approach Evaluation Conclusion and Future Work Most Promising Node Selection Node complexity c(n) Linear combination of the nodeā€™s children count and level Node ļ¬tness f (n) Diļ¬€erence between nodeā€™s enrichment pipeline F-measure and weighted complexity, f (n) = F(n) āˆ’ Ļ‰.c(n) Ļ‰ controls the tradeoļ¬€ between Greedy search (Ļ‰ = 0) Search strategies closer to breadth-ļ¬rst search (Ļ‰ > 0). Most promising node The leaf node with the maximum ļ¬tness through the whole reļ¬nement tree Mohamed Ahmed Sherif ā€” DEER 24/31
  • 54. Motivation Approach Evaluation Conclusion and Future Work Most Promising Node Selection Node complexity c(n) Linear combination of the nodeā€™s children count and level Node ļ¬tness f (n) Diļ¬€erence between nodeā€™s enrichment pipeline F-measure and weighted complexity, f (n) = F(n) āˆ’ Ļ‰.c(n) Ļ‰ controls the tradeoļ¬€ between Greedy search (Ļ‰ = 0) Search strategies closer to breadth-ļ¬rst search (Ļ‰ > 0). Most promising node The leaf node with the maximum ļ¬tness through the whole reļ¬nement tree Mohamed Ahmed Sherif ā€” DEER 24/31
  • 55. Motivation Approach Evaluation Conclusion and Future Work Most Promising Node Selection Node complexity c(n) Linear combination of the nodeā€™s children count and level Node ļ¬tness f (n) Diļ¬€erence between nodeā€™s enrichment pipeline F-measure and weighted complexity, f (n) = F(n) āˆ’ Ļ‰.c(n) Ļ‰ controls the tradeoļ¬€ between Greedy search (Ļ‰ = 0) Search strategies closer to breadth-ļ¬rst search (Ļ‰ > 0). Most promising node The leaf node with the maximum ļ¬tness through the whole reļ¬nement tree Mohamed Ahmed Sherif ā€” DEER 24/31
  • 56. Motivation Approach Evaluation Conclusion and Future Work Outline 1 Motivation 2 Approach 3 Evaluation 4 Conclusion and Future Work Mohamed Ahmed Sherif ā€” DEER 25/31
  • 57. Motivation Approach Evaluation Conclusion and Future Work Experimental Setup Datasets 1 manual experimental enrichment pipelines for Jamendo 2 manual experimental enrichment pipelines for DrugBank 5 manual experimental enrichment pipelines for DBpedia (AdministrativeRegion) Learning Algorithm 6 atomic enrichment functions Termination criterion: Maximum number of iterations of 10 Optimal enrichment pipeline found (F-score = 1) Mohamed Ahmed Sherif ā€” DEER 26/31
  • 58. Motivation Approach Evaluation Conclusion and Future Work Experimental Setup Datasets 1 manual experimental enrichment pipelines for Jamendo 2 manual experimental enrichment pipelines for DrugBank 5 manual experimental enrichment pipelines for DBpedia (AdministrativeRegion) Learning Algorithm 6 atomic enrichment functions Termination criterion: Maximum number of iterations of 10 Optimal enrichment pipeline found (F-score = 1) Mohamed Ahmed Sherif ā€” DEER 26/31
  • 59. Motivation Approach Evaluation Conclusion and Future Work Conļ¬guration of the Search Strategy Node ļ¬tness f (n) = F(n) āˆ’ Ļ‰.c(n) Ļ‰ controls the tradeoļ¬€ between Greedy search (Ļ‰ = 0) Search strategies closer to breadth ļ¬rst search (Ļ‰ > 0). Result: Ļ‰ = 0.75 leads to the best results Ļ‰ P R F 0 1.0 0.99 0.99 0.25 1.0 0.99 0.99 0.50 1.0 0.99 0.99 0.75 1.0 1.0 1.0 1.0 1.0 0.99 0.99 Mohamed Ahmed Sherif ā€” DEER 27/31
  • 60. Motivation Approach Evaluation Conclusion and Future Work Eļ¬€ect of Positive Examples Manual Examples Size of Time Size of Time Learn Iterations M count M M(KB) learned M M (KB) Time count F-score M1 DBpedia 1 1 0.2 1 1.6 1.3 1 1.0 2 1 0.2 1 1.8 1.3 1 1.0 M2 DBpedia 1 2 23.3 1 0.1 0.2 1 0.99 2 2 15 2 17 0.3 9 0.99 M3 DBpedia 1 3 14.7 3 15.2 6.1 9 0.99 2 3 15 2 15.1 0.1 9 0.99 M4 DBpedia 1 4 0.4 2 0.1 0.7 2 0.99 2 4 0.6 2 0.3 0.9 2 0.99 M5 DBpedia 1 5 22 2 0.1 0.7 2 1.0 2 5 25.5 2 0.2 0.9 2 1.0 M1 DrugBank 1 2 3.5 1 4.1 0.1 10 0.99 2 2 3.6 1 3.4 0.1 10 0.99 M2 DrugBank 1 3 25.2 1 0.1 0.1 10 0.99 2 3 22.8 1 0.1 0.1 10 0.99 M1 Jamendo 1 1 10.9 2 10.6 0.1 2 0.99 2 1 10.4 2 10.4 0.1 1 0.99 Mohamed Ahmed Sherif ā€” DEER 28/31
  • 61. Motivation Approach Evaluation Conclusion and Future Work Outline 1 Motivation 2 Approach 3 Evaluation 4 Conclusion and Future Work Mohamed Ahmed Sherif ā€” DEER 29/31
  • 62. Motivation Approach Evaluation Conclusion and Future Work Conclusion and Future Work Conclusion Introduced DEER Presented self-conļ¬guring atomic enrichment functions Presented an approach for learning enrichment pipelines based on a reļ¬nement operator Future Work Parallelize the algorithm on several CPUs as well as load balancing Support directed acyclic graphs as enrichment speciļ¬cations by allowing to split and merge datasets Pro-active enrichment strategies and active learning Implement more enrichment function and operators Mohamed Ahmed Sherif ā€” DEER 30/31
  • 63. Motivation Approach Evaluation Conclusion and Future Work Conclusion and Future Work Conclusion Introduced DEER Presented self-conļ¬guring atomic enrichment functions Presented an approach for learning enrichment pipelines based on a reļ¬nement operator Future Work Parallelize the algorithm on several CPUs as well as load balancing Support directed acyclic graphs as enrichment speciļ¬cations by allowing to split and merge datasets Pro-active enrichment strategies and active learning Implement more enrichment function and operators Mohamed Ahmed Sherif ā€” DEER 30/31
  • 64. Motivation Approach Evaluation Conclusion and Future Work Thank You! Questions? Mohamed Sherif Augustusplatz 10 D-04109 Leipzig sherif@informatik.uni-leipzig.de http://aksw.org/MohamedSherif http://aksw.org/Projects/DEER Automating RDF Dataset Transformation and Enrichment by Mohamed Ahmed Sherif, Axel-Cyrille Ngonga Ngomo, and Jens Lehmann in 12th Extended Semantic Web Conference, Portoroz, Slovenia, 2015 Mohamed Ahmed Sherif ā€” DEER 31/31