Detailed Provenance Capture of Data Processing
Ben De Meester, Anastasia Dimou,
Ruben Verborgh, and Erik Mannens
Ghent University – imec – IDLab, Belgium
Outline
Linked Data Generation
Problem
Solution
Results
Linked Data comes from
data
Unstructured data
Semi-structured data
Structured data
…
Linked Data comes from
processed data
Unstructured data: parse
Semi-structured data: extract
Structured data: add schema annotations
…
Going from data to linked data
Data
Schema transformations
Data transformations
Linked Data
Linked Data generation =
schema + data transformations
dbr:Barney_Gumble
    dbo:birthDate     "1954-4-20"
    dbp:voiceactor    dbr:Dan_Castellaneta
    dbp:gender        "Male"
    dbp:name          "Barney Gumble"@en
    …                 …
dbr:Hawaii
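The split between schema and data transformations can be sketched in a few lines of Python. This is only an illustration under assumptions: the raw record, its field names, and the helper `generate_triples` are hypothetical and are not taken from the DBpedia pipeline; only the predicates and the resulting values come from the slide.

```python
# Hypothetical raw record; the field names are assumptions for illustration.
raw = {"name": "Barney Gumble", "gender": "male"}

# Schema transformation: which predicate each raw field maps to.
SCHEMA = {"name": "dbp:name", "gender": "dbp:gender"}

# Data transformation: how each raw value becomes an RDF term.
DATA = {
    "name": lambda value: f'"{value}"@en',
    "gender": lambda value: f'"{value.capitalize()}"',
}


def generate_triples(subject, record):
    """Apply the schema and data transformations field by field."""
    triples = []
    for field, value in record.items():
        predicate = SCHEMA[field]      # schema transformation
        obj = DATA[field](value)       # data transformation
        triples.append((subject, predicate, obj))
    return triples


triples = generate_triples("dbr:Barney_Gumble", raw)
for triple in triples:
    print(" ".join(triple))
```

Keeping the two lookup tables separate mirrors the slide's point: a wrong predicate is a schema problem, a wrong value is a data problem, and each can be traced on its own.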
Problem: there’s always a drunk Barney
Data
Schema processing
Data processing
Linked Data
Knowing where the data comes from is as important as the data itself
Oh Yeah?
dbr:Barney_Gumble
    dbo:birthDate     "1954-4-20"
    dbp:voiceactor    dbr:Dan_Castellaneta
    dbp:gender        "Male"
    dbp:name          "Barney Gumble"@en
    …                 …
dbr:Hawaii
Linked Data re-generation?
Provenance of those transformations
How it’s done for data processing
Provenance log:
Person A used Software B, on System C
Problem: how to reproduce?
Provenance log:
Person A used Software B, on System C
Software B offline?
System C not booting?
What do we need?
Fine-grained provenance
for schema transformations
for data transformations
Independent of the implementation
How can we tell where the data comes from, without depending on the system?
Outline
Linked Data Generation
Problem: Data Processing Provenance
Solution
Results
What do we want?
Term-level,
implementation-independent provenance
for schema transformations
for data transformations
Generated automatically
Declarative generation process
Steps
Align schema and data transformations in a declarative document
Generate provenance based on declarative schema transformations
Generate provenance based on declarative data transformations
Declarative generation process? Solved!
Align schema and data transformations in a declarative document
RML + FnO for DBpedia EF
Declarative data transformations for Linked Data generation: the case of DBpedia
De Meester, B., Maroy, W., Dimou, A., Verborgh, R., and Mannens, E.
Sustainable Linked Data Generation: The Case of DBpedia
Maroy, W., Dimou, A., Kontokostas, D., De Meester, B., Verborgh, R., Lehmann, J., Mannens, E., and Hellmann, S.
Schema transformations provenance? Solved!
Generate provenance based on declarative mapping document
RML + PROV
Automated metadata generation for Linked Data generation and publishing workflows
Dimou, A., De Nies, T., Verborgh, R., Mannens, E., and Van de Walle, R.
Data transformations provenance?
Generate provenance based on declarative data transformations
Outline
Linked Data Generation
Problem: Data Processing Provenance
Solution
Declarative generation
FnO and PROV
Results
FnO: Function
DBpedia_date_parser (fno:Function)
    expects: inputString (predicate)
    output: outputString (predicate)
FnO: Execution
parseExecution (fno:Execution)
    executes: DBpedia_date_parser (Function)
    inputString: "April 20th 1954"
    outputString: "1954-04-20"
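The execution above can be mimicked with a minimal sketch. The Python function `parse_date` is a hypothetical stand-in for DBpedia_date_parser (the real parser is not shown in the slides), and the dictionary layout returned by `execute` is an assumption that simply reuses the labels from the diagram.

```python
import re

MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]


def parse_date(input_string):
    """Hypothetical stand-in: turn e.g. 'April 20th 1954' into an ISO date."""
    match = re.fullmatch(
        r"([A-Za-z]+) (\d{1,2})(?:st|nd|rd|th)? (\d{4})", input_string.strip()
    )
    if match is None or match.group(1) not in MONTHS:
        raise ValueError(f"unsupported date format: {input_string!r}")
    month = MONTHS.index(match.group(1)) + 1
    return f"{match.group(3)}-{month:02d}-{int(match.group(2)):02d}"


def execute(function_name, function, input_string):
    """Run the function and describe the run with the labels of the diagram."""
    return {
        "type": "fno:Execution",
        "executes": function_name,
        "inputString": input_string,
        "outputString": function(input_string),
    }


parse_execution = execute("DBpedia_date_parser", parse_date, "April 20th 1954")
print(parse_execution["outputString"])
```

The record names the function rather than embedding its code, which is what keeps the description independent of any one implementation.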
FnO: General Execution
Function
Input
Data Transformation
Output
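Aligning FnO and PROV
Output (prov:Entity)
Input (prov:Entity)
Function (prov:Entity)
Data Transformation (prov:Activity)
Tool (prov:Agent)
Output wasGeneratedBy Data Transformation
Data Transformation used Function
Data Transformation used Input
Data Transformation wasAssociatedWith Tool
Output wasAttributedTo Tool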
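As a rough sketch of this alignment, the statements for a single data transformation can be written out as plain (subject, predicate, object) tuples. The `ex:` identifiers and the helper `provenance_triples` are hypothetical; the PROV classes and properties are the ones named on the slide, with edge directions following PROV-O.

```python
def provenance_triples(output, transformation, function, input_value, tool):
    """Describe one data transformation with PROV classes and properties."""
    return [
        # Typing of the five resources involved.
        (output, "rdf:type", "prov:Entity"),
        (input_value, "rdf:type", "prov:Entity"),
        (function, "rdf:type", "prov:Entity"),
        (transformation, "rdf:type", "prov:Activity"),
        (tool, "rdf:type", "prov:Agent"),
        # Relations between them.
        (output, "prov:wasGeneratedBy", transformation),
        (transformation, "prov:used", function),
        (transformation, "prov:used", input_value),
        (transformation, "prov:wasAssociatedWith", tool),
        (output, "prov:wasAttributedTo", tool),
    ]


# Hypothetical identifiers for one execution of the date parser.
triples = provenance_triples(
    output="ex:output1",
    transformation="ex:parseExecution",
    function="ex:DBpedia_date_parser",
    input_value="ex:input1",
    tool="ex:FunctionProcessor",
)
for s, p, o in triples:
    print(s, p, o)
```

Because the function is itself an entity that the activity used, the description stays valid even if the tool that ran it is replaced.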
Uncool thing:
It’s big
With provenance generation included, every processed term adds 10 triples
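To get a feel for the size, the overhead can be estimated with simple arithmetic. The figure of 10 triples per term is from the slide; the dataset size of 1,000 terms and the helper `provenance_overhead` are hypothetical.

```python
TRIPLES_PER_TERM = 10  # figure quoted on the slide


def provenance_overhead(processed_terms, triples_per_term=TRIPLES_PER_TERM):
    """Number of extra provenance triples for a given number of processed terms."""
    if processed_terms < 0:
        raise ValueError("processed_terms must be non-negative")
    return processed_terms * triples_per_term


# Hypothetical dataset with 1,000 processed terms.
extra = provenance_overhead(1_000)
print(extra)
```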
Cool thing #1:
System details are complementary
[Diagram: Output, Function, and Input (prov:Entity); Data Transformation (prov:Activity); Tool (prov:Agent); edges wasGeneratedBy, used, wasAssociatedWith, wasAttributedTo]
Cool thing #2:
Alignment with RML is complementary
[Diagram: Output, Function, and Input (prov:Entity); Data Transformation and Schema Transformation (prov:Activity); Tool (prov:Agent); edges wasGeneratedBy, used, wasAssociatedWith, wasAttributedTo, wasInformedBy]
Cool thing #3:
It actually works
RMLMapper
https://github.com/RMLio/RML-Mapper
FunctionProcessor
https://github.com/FnOio/function-processor-java
DBpedia Extraction Sample
https://fno.io/prov/dbpedia/
How can we find a drunk Barney?
Query for long-lasting processes
Query all outputs of a certain function/tool
Query all input-output pairs
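The three queries can be sketched over in-memory provenance records. This is an illustration only: the record layout, the timestamps, the second function name, and the helpers are hypothetical, and a real setup would more likely query the generated provenance triples directly.

```python
from datetime import datetime, timedelta

# Hypothetical provenance records; only the first input/output pair is from the slides.
records = [
    {"function": "DBpedia_date_parser", "tool": "FunctionProcessor",
     "input": "April 20th 1954", "output": "1954-04-20",
     "started": datetime(2017, 1, 1, 12, 0, 0),
     "ended": datetime(2017, 1, 1, 12, 0, 1)},
    {"function": "hypothetical_name_parser", "tool": "FunctionProcessor",
     "input": "Barney Gumble", "output": '"Barney Gumble"@en',
     "started": datetime(2017, 1, 1, 12, 0, 1),
     "ended": datetime(2017, 1, 1, 12, 0, 31)},
]


def long_lasting(recs, threshold):
    """Executions that ran longer than the given threshold."""
    return [r for r in recs if r["ended"] - r["started"] > threshold]


def outputs_of(recs, function=None, tool=None):
    """Outputs produced by a certain function and/or tool."""
    return [r["output"] for r in recs
            if (function is None or r["function"] == function)
            and (tool is None or r["tool"] == tool)]


def input_output_pairs(recs):
    """All (input, output) pairs, e.g. for spotting suspicious results."""
    return [(r["input"], r["output"]) for r in recs]


slow = long_lasting(records, timedelta(seconds=10))
dates = outputs_of(records, function="DBpedia_date_parser")
pairs = input_output_pairs(records)
print([r["function"] for r in slow], dates, pairs)
```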
What to do with a drunk Barney?
Performance evaluation
Qualitative comparison
Iterative improvement
(only changing what is needed!)

SemSci2017 - Detailed Provenance Capture of Data Processing

Editor's notes

  1. Not just a person, could be buggy software as well
  2. Explain term-level (example)
  3. Because it’s declarative, it _can_ be generated automatically. The declarative description explains the complete generation workflow without the implementation
  4. Ideal because declarative and in RDF
  5. In summary, we propose a fully declarative generation process and applied this by aligning FnO to PROV. There’s a lot of cool things about this, but there’s one uncool thing…
  6. Any schema transformation
  7. Provenance that provides more insight into the generation of a dataset, and thus helps the evaluation, comparison, and improvement of dataset generation