Towards Query Generation for PROV-O Data
Jun Zhao¹, Honghan Wu² and Jeff Z. Pan²
¹Lancaster University
@junszhao | j.zhao5 at lancaster.ac.uk
²University of Aberdeen
honghan.wu | jeff.z.pan at abdn.ac.uk
Outline
• Motivation
• Profile-driven query generation
– K-Drive
– ProvQ
• Result discussion
• Future work
The Big Picture of PROV: A Motivation Scenario
http://www.w3.org/2005/Incubator/prov/wiki/images/3/38/Content-b.png
The Big Picture of PROV: A Motivation Scenario
Adapted from:
http://www.w3.org/2005/Incubator/prov/wiki/images/3/38/Content-b.png
Provenance information
The Big Picture of PROV: A Motivation Scenario
http://www.w3.org/2005/Incubator/prov/wiki/images/b/b8/Use-b.png
Provenance in the Wild vs. ProvBench
Taverna-PROV
Vistrails PROV
Wings PROV
Wikipedia-PROV
Twitter-PROV
OBIAMA (social simulation)
Workflow / scientific domain
Web resources
Social domain
• 11 repositories so far
• Various representations
• Across different domains
• Openly accessible under different open licenses
https://github.com/provbench
https://sites.google.com/site/provbench/home
Next Step: Access PROV Datasets
Taverna-PROV
Vistrails PROV
Wings PROV
Wikipedia-PROV
Twitter-PROV
OBIAMA (social simulation)
Can we query across them?
Can we learn something by querying across them?
What can we do with them?
……
Query Generation: A Bottom-up Approach
Taverna-PROV
Wings PROV
Wikipedia-PROV
OBIAMA (social simulation)
Provenance Data Profile Generator
Provenance Query Builder
SPARQL queries for PROV-O datasets
Example profiles:
• Class associations
• Property associations
Query Generation: A First Step
A PROV Dataset
Provenance Data Profile Generator
Provenance Query Builder
SPARQL queries for the PROV-O dataset
Example profiles:
• Class associations
• Property associations
Big City:
Big Road:
Slide credit: Dr Wu at Scottish Linked Data Workshop 2014
http://www.kdrive-project.eu EU FP7 Marie-Curie 286348
Pan et al. Query generation for semantic datasets. K-CAP 2013, pp. 113-116
• University of Aberdeen
• A generic query generation tool for semantic web data
• Find key sub-graphs in the RDF data
– Big City: The most instantiated concepts in the data
– Big Road: The most frequent relations connecting those big cities
K-Drive Query Generation
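The Big City / Big Road idea can be illustrated with a minimal sketch. This is not the K-Drive implementation: the function names `big_cities` and `big_roads`, the in-memory triple tuples, and the toy data are all assumptions made for illustration. It simply counts typed instances per class and then counts the properties that link instances of the top classes.

```python
from collections import Counter

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PROV = "http://www.w3.org/ns/prov#"


def big_cities(triples, top_n=2):
    """Return the top_n classes ranked by number of typed instances."""
    counts = Counter(o for s, p, o in triples if p == RDF_TYPE)
    return [cls for cls, _ in counts.most_common(top_n)]


def big_roads(triples, cities, top_n=3):
    """Return the most frequent (class, property, class) links between big cities."""
    # Map each instance to the big-city classes it belongs to.
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE and o in cities:
            types.setdefault(s, set()).add(o)
    roads = Counter()
    for s, p, o in triples:
        # Only count relations whose subject and object are both big-city instances.
        if p != RDF_TYPE and s in types and o in types:
            for subject_class in types[s]:
                for object_class in types[o]:
                    roads[(subject_class, p, object_class)] += 1
    return roads.most_common(top_n)


# Hypothetical toy dataset, not taken from ProvBench.
triples = [
    ("e1", RDF_TYPE, PROV + "Entity"),
    ("e2", RDF_TYPE, PROV + "Entity"),
    ("e3", RDF_TYPE, PROV + "Entity"),
    ("a1", RDF_TYPE, PROV + "Activity"),
    ("a2", RDF_TYPE, PROV + "Activity"),
    ("ag1", RDF_TYPE, PROV + "Agent"),
    ("e1", PROV + "wasGeneratedBy", "a1"),
    ("e2", PROV + "wasGeneratedBy", "a1"),
    ("a2", PROV + "used", "e1"),
    ("a1", PROV + "wasAssociatedWith", "ag1"),
]

cities = big_cities(triples)
roads = big_roads(triples, cities)
```

With this toy data the two big cities are `prov:Entity` and `prov:Activity`, and the busiest road between them is `prov:wasGeneratedBy`. The link to `prov:Agent` is ignored because Agent is not among the top classes.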
K-Drive Generator
Live demo:
http://homepages.abdn.ac.uk/honghan.wu/pages/prov2/index.html
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?Generation ?x4_1 ?x3_1 ?x0_1
WHERE {
?Generation rdf:type <http://www.w3.org/ns/prov#Generation> .
?Generation <http://www.w3.org/ns/prov#activity> ?x4_1 .
?Generation <http://www.w3.org/ns/prov#hadRole> ?x3_1 .
?x0_1 <http://www.w3.org/ns/prov#qualifiedGeneration> ?Generation .
}
K-Drive Generator
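A query of the shape shown on this slide (one typed variable, plus properties leaving and entering it) could be assembled from a class profile roughly as follows. This is a hedged sketch only: `class_query` and its parameters are hypothetical names, the variable naming differs from K-Drive's own, and the code assumes hash-style namespace IRIs so that the local name can serve as a variable name.

```python
def class_query(class_iri, outgoing, incoming):
    """Build a SELECT query for one class plus its outgoing and incoming properties."""
    # Assumption: the IRI uses a '#' namespace, so the fragment is a valid variable name.
    var = "?" + class_iri.rsplit("#", 1)[-1]
    select = [var]
    patterns = [f"{var} a <{class_iri}> ."]
    for i, prop in enumerate(outgoing):
        # Property whose subject is an instance of the class.
        select.append(f"?o{i}")
        patterns.append(f"{var} <{prop}> ?o{i} .")
    for i, prop in enumerate(incoming):
        # Property whose object is an instance of the class.
        select.append(f"?s{i}")
        patterns.append(f"?s{i} <{prop}> {var} .")
    body = "\n".join("  " + p for p in patterns)
    return f"SELECT {' '.join(select)}\nWHERE {{\n{body}\n}}"


PROV = "http://www.w3.org/ns/prov#"
query = class_query(
    PROV + "Generation",
    outgoing=[PROV + "activity", PROV + "hadRole"],
    incoming=[PROV + "qualifiedGeneration"],
)
```

Calling it with the class and properties from the slide yields a query with the same triple patterns as the K-Drive output, using `a` as shorthand for `rdf:type`.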
ProvQ: Property Association Mining
A PROV Dataset
Provenance Data Profile Generator
Provenance Query Builder
SPARQL queries for the PROV-O dataset
Discover properties that are used together with each PROV-O property
Expand a set of “seed” PROV-O queries using the discovered associated properties
https://github.com/junszhao/ProvQ
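The property-discovery step can be sketched as a simple co-occurrence count. This is an illustration under stated assumptions, not the ProvQ code: `property_associations` is a hypothetical name, triples are plain tuples, and the score is assumed to be the fraction of subjects carrying the seed property that also carry the other property.

```python
from collections import defaultdict


def property_associations(triples, seed):
    """Score each property by how often it co-occurs with `seed` on the same subject."""
    props_by_subject = defaultdict(set)
    for s, p, _ in triples:
        props_by_subject[s].add(p)
    # Keep only the subjects that use the seed property at least once.
    seed_subjects = [ps for ps in props_by_subject.values() if seed in ps]
    if not seed_subjects:
        return {}
    counts = defaultdict(int)
    for ps in seed_subjects:
        for p in ps - {seed}:
            counts[p] += 1
    # Score = share of seed-bearing subjects that also carry the property.
    return {p: n / len(seed_subjects) for p, n in counts.items()}


# Hypothetical toy data using prefixed names for brevity.
triples = [
    ("a1", "prov:used", "e1"),
    ("a1", "rdfs:label", "step 1"),
    ("a1", "prov:startedAtTime", "t1"),
    ("a2", "prov:used", "e2"),
    ("a2", "rdfs:label", "step 2"),
    ("e1", "prov:wasGeneratedBy", "a0"),
]

scores = property_associations(triples, "prov:used")
```

Restricting the counting to subjects that carry a seed property keeps the search space small compared with mining associations over all property pairs, which is in the spirit of the advantage claimed for the approach.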
ProvQ: Property Association Mining
• Advantages
– Reduce the performance challenge usually faced in association rule mining
– Produce provenance-centric queries
• Disadvantages
– Could miss queries that are not related to PROV-O terms at all
Expanding Starting Queries
Approach Walk-Through
• Given a seed atomic query, take its seed property
• Find all properties used together with it, for example:
– http://purl.org/wf4ever/wfprov#describedByParameter
– http://purl.org/wf4ever/wfprov#wasOutputFrom
– http://www.w3.org/ns/prov#qualifiedGeneration
• Return the resulting conjunctive SPARQL query
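The walk-through above can be sketched as a small query builder. The function name `expand_seed_query`, the `required_at` threshold, and the rule that lower-scoring properties become OPTIONAL patterns are assumptions of this sketch rather than documented ProvQ behaviour; the scores in the example are illustrative.

```python
PREFIXES = (
    "PREFIX prov: <http://www.w3.org/ns/prov#>\n"
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
)


def expand_seed_query(seed, associations, required_at=1.0, limit=100):
    """Expand the atomic query `?s seed ?o` with its associated properties."""
    lines = [f"?s {seed} ?o ."]
    # Strongest associations first; sorted() is stable, so ties keep input order.
    ranked = sorted(associations.items(), key=lambda kv: -kv[1])
    for i, (prop, score) in enumerate(ranked, start=1):
        pattern = f"?s {prop} ?o{i} ."
        if score >= required_at:
            lines.append(pattern)  # always co-occurs: required pattern
        else:
            lines.append(f"OPTIONAL {{ {pattern} }}")  # weaker: optional pattern
    body = "\n".join("  " + line for line in lines)
    return f"{PREFIXES}SELECT DISTINCT * WHERE {{\n{body}\n}} LIMIT {limit}"


query = expand_seed_query(
    "prov:used",
    {
        "rdfs:label": 1.0,
        "prov:qualifiedUsage": 1.0,
        "<http://purl.org/wf4ever/wfprov#describedByProcess>": 0.98,
    },
)
```

Making the weaker associations optional keeps the query from returning nothing when a property is missing on some subjects, at the cost of sparser result rows.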
Results Comparison
• K-Drive Generator
– 7 queries
– 3 of them are not exactly provenance queries
– Probably easier to understand because classes are included in the queries
– But queries can be complex
• ProvQ
– 7 queries
– 1 not returned by K-Drive (prov:wasDerivedFrom)
– Only provenance queries are returned
– Queries are simple, based on property associations starting from “seed” PROV-O properties
https://github.com/junszhao/ProvQ/blob/master/results/query-analysis.txt
Future Work
• Define and evaluate usefulness
• Test against more datasets
• Experiment with reasoning
• Query generation across multiple datasets
Thank you!
These slides have been created by Jun Zhao
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License
http://creativecommons.org/licenses/by-nc-sa/3.0/


Editor's notes

  1. wasGeneratedBy, startedAtTime, endedAtTime, wasAssociatedWith, wasAttributedTo, actedOnBehalfOf, wasInformedBy

  2. From prov:wasGeneratedBy:
     Select distinct * where {
       ?s prov:wasGeneratedBy ?o .
       optional {?s <http://purl.org/wf4ever/wfprov#describedByParameter> ?o1 .}
       optional {?s <http://purl.org/wf4ever/wfprov#wasOutputFrom> ?o3 .}
       optional {?s <http://www.w3.org/ns/prov#qualifiedGeneration> ?o4 .}
     } limit 100

     2. From prov:used
     <http://purl.org/wf4ever/wfprov#usedInput>; 1
     rdfs:label; 1
     prov:endedAtTime; 1
     prov:startedAtTime; 1
     prov:qualifiedAssociation; 1
     prov:qualifiedUsage; 1
     <http://purl.org/wf4ever/wfprov#describedByProcess>; 0.98
     <http://purl.org/wf4ever/wfprov#wasPartOfWorkflowRun>; 0.98
     Select distinct * where {
       ?s prov:used ?o .
       ?s <http://purl.org/wf4ever/wfprov#usedInput> ?o1 .
       ?s rdfs:label ?o2 .
       ?s prov:endedAtTime ?o3 .
       ?s prov:startedAtTime ?o4 .
       ?s prov:qualifiedAssociation ?o5 .
       ?s prov:qualifiedUsage ?o6 .
       optional {?s <http://purl.org/wf4ever/wfprov#describedByProcess> ?o7 .}
       optional {?s <http://purl.org/wf4ever/wfprov#wasPartOfWorkflowRun> ?o8 .}
     } limit 100

     3. From prov:wasDerivedFrom
     <http://ns.taverna.org.uk/2012/tavernaprov/errorMessage>; 1
     <http://ns.taverna.org.uk/2012/tavernaprov/stackTrace>; 1
     Select distinct * where {
       ?s prov:wasDerivedFrom ?o .
       ?s <http://ns.taverna.org.uk/2012/tavernaprov/errorMessage> ?o1 .
       ?s <http://ns.taverna.org.uk/2012/tavernaprov/stackTrace> ?o2 .
     } limit 100

     4. From prov:startedAtTime and prov:endedAtTime (will produce a similar result to query 2)
     rdfs:label; 1
     prov:endedAtTime; 1
     prov:qualifiedAssociation; 1
     <http://purl.org/wf4ever/wfprov#describedByProcess>; 0.97
     <http://purl.org/wf4ever/wfprov#wasPartOfWorkflowRun>; 0.97
     prov:qualifiedUsage; 0.90
     prov:used; 0.90
     <http://purl.org/wf4ever/wfprov#usedInput>; 0.90
     Select distinct * where {
       ?s prov:startedAtTime ?o .
       ?s rdfs:label ?o1 .
       ?s prov:endedAtTime ?o2 .
       ?s prov:qualifiedAssociation ?o3 .
       optional {?s <http://purl.org/wf4ever/wfprov#describedByProcess> ?o4 .}
       optional {?s <http://purl.org/wf4ever/wfprov#wasPartOfWorkflowRun> ?o5 .}
       optional {?s <http://purl.org/wf4ever/wfprov#usedInput> ?o6 .}
       optional {?s prov:qualifiedUsage ?o7 .}
       optional {?s prov:used ?o8 .}
     } limit 100

  3. 3 queries were largely the same, 3 queries were only returned by K-Drive, and the rest had different degrees of overlap. 1 query not returned