Generating Executable Mappings from RDF
Data Cube Data Structure Definitions
Christophe Debruyne, Dave Lewis, Declan O’Sullivan
Trinity College Dublin
2018-10-23 @ ODBASE
The ADAPT Centre is funded under the SFI Research Centres Programme (Grant 13/RC/2106) and is co-funded under the European Regional Development Fund.
www.adaptcentre.ie
Introduction
• Data processing is increasingly the subject of various
internal and external regulations – e.g., GDPR.
• Datasets are created and used for a particular purpose.
E.g., sending newsletters or using the purchase history of
users to suggest recommendations. In the context of GDPR,
these purposes require a user’s informed consent.
• Can we generate datasets for a particular purpose “just in
time” that comply with informed consent?
Introduction
• R2RML is a convenient way to transform (relational) non-RDF
data into RDF to create these datasets.
• One can create mappings from databases to vocabularies,
ontologies, etc. for data processing activities.
• We, however, chose to adopt the RDF Data Cube
Vocabulary (QB) for representing datasets.
Introduction
• QB is an ontology for multi-dimensional datasets.
A Data Structure Definition (DSD) prescribes how a Dataset and
its Observations are structured. An Observation is identified
by its Dimensions and captures a value for a Measure.
• QB’s foundations are rooted in a schema for statistical
datasets and the ontology is seemingly complicated, but the
RDF vocabulary is useful for other types of datasets as well.
• Our choice was also influenced by projects in the health
domain where statistical processing of data is key*
*AVERT project: https://www.tcd.ie/medicine/thkc/avert/index.php/
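To make the QB terminology concrete, here is a minimal Python sketch (not part of the original slides) of the idea that an observation is identified by its dimension values and carries a value for a measure. The class names, field names, and the sample values are hypothetical and serve only as illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataStructureDefinition:
    """Hypothetical stand-in for a qb:DataStructureDefinition."""
    dimensions: tuple  # names of the dimension properties
    measures: tuple    # names of the measure properties


@dataclass(frozen=True)
class Observation:
    """Hypothetical stand-in for a qb:Observation."""
    values: tuple  # (property name, value) pairs

    def key(self, dsd):
        # The dimension values together identify the observation.
        lookup = dict(self.values)
        return tuple(lookup[d] for d in dsd.dimensions)


dsd = DataStructureDefinition(
    dimensions=("refArea", "refPeriod", "sex"),
    measures=("lifeExpectancy",),
)
obs = Observation(values=(
    ("refArea", "Newport"),
    ("refPeriod", "2004-2006"),
    ("sex", "Male"),
    ("lifeExpectancy", 76.7),  # illustrative value only
))
print(obs.key(dsd))
```

The measure value is deliberately not part of the key: two observations with the same dimension values would describe the same cell of the cube.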
Research Question
• From: “Can we generate datasets for a particular purpose
‘just in time’ that comply with informed consent?”
• To: “If we have a DSD for a particular purpose, how can we
create an executable R2RML mapping to generate a dataset
that complies with that DSD’s structure?”
• A solution could subsequently be extended to take policies
into account so as to generate a mapping that is compliant,
in other words: “policy-aware”. To be reported.
Approach
• R2DQB – pronounced R-2-D-cube
• Data Structure Definitions
• Dimensions
• Measures
• Attributes
• References to tables
• References to columns
• Transformation functions
• …
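[Figure: overview of the approach. (1) Data Structure Definitions are extended with annotations; (2) the Mapping Engine generates an R2RML Mapping; (3) the R2RML Processor executes it to produce a Data Cube Dataset according to the DSD; (4) Validation; (5) Provenance Information is captured.]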
Approach
Step 1: annotating DSDs
• May be done in a separate graph (separation of concerns)
• We chose to reuse R2RML to assess the feasibility in this
study. A bespoke vocabulary may be considered in the
future.
(example from the RDF Data Cube Vocabulary Recommendation)
The DSD (prefixes omitted for brevity):

@base <http://www.example.org/> .

<#refPeriod> a rdf:Property, qb:DimensionProperty ;
    rdfs:subPropertyOf sdmx-dimension:refPeriod .
<#refArea> a rdf:Property, qb:DimensionProperty ;
    rdfs:subPropertyOf sdmx-dimension:refArea .
<#lifeExpectancy> a rdf:Property, qb:MeasureProperty ;
    rdfs:subPropertyOf sdmx-measure:obsValue ;
    rdfs:range xsd:decimal .
sdmx-dimension:sex a rdf:Property, qb:DimensionProperty .

<#dsd-le> a qb:DataStructureDefinition ;
    # The dimensions
    qb:component [ qb:dimension <#refArea> ] ;
    qb:component [ qb:dimension <#refPeriod> ] ;
    qb:component [ qb:dimension sdmx-dimension:sex ] ;
    # The measure(s)
    qb:component [ qb:measure <#lifeExpectancy> ] .

The annotations (prefixes omitted for brevity):

@base <http://www.example.org/> .

<#refPeriod> rr:column "period" .
<#refArea> rr:column "area" .
<#lifeExpectancy> rr:column "lifeexpectancy" .
sdmx-dimension:sex rr:column "sex" .
<#dsd-le> rr:tableName "statssimple" .
Approach
Step 2: Generating the R2RML mapping
• Adopting a declarative approach with SPARQL CONSTRUCT
queries:
1. Generating a triples map for each DSD
2. Generating a subject map for each DSD and a predicate
object map for linking observations to the dataset.
The subject map is based on the dimensions, as
observations are identified by those.
3. Generating predicate object maps from measures
4. Generating predicate object maps from dimensions
5. Generating a link between dataset and DSD
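The steps above can be mirrored in a few lines of imperative Python. This is only an illustrative analogue, not the authors' implementation: the slides generate the mapping declaratively with SPARQL CONSTRUCT queries. The function name `generate_mapping` and its parameters are hypothetical, the output is Turtle text with prefixes omitted, and step 5 (linking the dataset to the DSD) is left out of the sketch.

```python
def generate_mapping(dsd_iri, table, dimensions, measures):
    """Return R2RML (Turtle, prefixes omitted) for one annotated DSD.

    dimensions and measures map a property (as written in Turtle) to the
    column name it was annotated with via rr:column.
    """
    # Step 2: observations are identified by their dimensions.
    template = "-".join("{%s}" % col for col in dimensions.values())
    lines = [
        # Step 1: one triples map per DSD.
        "[ pam:correspondsWith %s ;" % dsd_iri,
        '  rr:logicalTable [ rr:tableName "%s" ] ;' % table,
        "  rr:subjectMap [",
        "    rr:class qb:Observation ;",
        "    rr:termType rr:BlankNode ;",
        '    rr:template "%s"' % template,
        "  ] ;",
        # Step 2 (continued): link each observation to the dataset.
        "  rr:predicateObjectMap [",
        "    rr:predicate qb:dataSet ;",
        "    rr:object <%s>" % table,
        "  ] ;",
    ]
    # Steps 3 and 4: one predicate object map per measure and dimension.
    for prop, col in list(measures.items()) + list(dimensions.items()):
        lines += [
            "  rr:predicateObjectMap [",
            "    rr:predicate %s ;" % prop,
            '    rr:objectMap [ rr:column "%s" ]' % col,
            "  ] ;",
        ]
    lines[-1] = "  ]"  # the last property takes no trailing semicolon
    lines.append("] .")
    return "\n".join(lines)


mapping = generate_mapping(
    "<http://www.example.org/#dsd-le>",
    "statssimple",
    dimensions={
        "<http://www.example.org/#refArea>": "area",
        "<http://www.example.org/#refPeriod>": "period",
        "sdmx-dimension:sex": "sex",
    },
    measures={"<http://www.example.org/#lifeExpectancy>": "lifeexpectancy"},
)
print(mapping)
```

The sketch relies on Python dictionaries preserving insertion order, so the subject template lists the dimension columns in the order they were given.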
CONSTRUCT {
  ?tm rr:subjectMap [
    rr:class qb:Observation ;
    rr:termType rr:BlankNode ;
    rr:template ?x
  ] .
  ?tm rr:predicateObjectMap [
    rr:predicate qb:dataSet ;
    rr:object ?ds
  ] .
} WHERE {
  ?tm pam:correspondsWith ?dsd ;
      rr:logicalTable [ rr:tableName ?t ] .
  BIND(IRI(?t) AS ?ds)
  {
    SELECT
      (CONCAT("{", GROUP_CONCAT(?c; SEPARATOR="}-{"), "}") AS ?x) {
      ?dsd qb:component ?component .
      { ?component qb:dimension [ rr:column ?c ] }
      UNION
      # OMITTED FOR CLARITY (SEE PAPER)
    } GROUP BY ?dsd
  }
}
Constructing a subject map for observations
and a predicate object map for linking
observations to a dataset.
All queries can be found in the paper.
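The string expression in the subquery may be easier to read in procedural form. The following Python sketch, with the hypothetical function name `subject_templates`, mimics what `GROUP BY ?dsd` combined with `CONCAT` and `GROUP_CONCAT` computes: one template per DSD, built from the columns of its dimensions. SPARQL does not define the order in which `GROUP_CONCAT` joins values, so the sketch sorts the columns alphabetically; that ordering is an assumption made here only to keep the output reproducible.

```python
from collections import defaultdict


def subject_templates(bindings):
    """Mimic the subquery: group columns per DSD, then build the template.

    bindings is a list of (dsd, column) pairs, one per matched dimension.
    """
    columns = defaultdict(list)
    for dsd, column in bindings:
        columns[dsd].append(column)
    # CONCAT("{", GROUP_CONCAT(?c; SEPARATOR="}-{"), "}")
    return {
        dsd: "{" + "}-{".join(sorted(cols)) + "}"
        for dsd, cols in columns.items()
    }


bindings = [
    ("#dsd-le", "period"),
    ("#dsd-le", "area"),
    ("#dsd-le", "sex"),
]
print(subject_templates(bindings))
```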
Approach
Step 2: Generating the R2RML mapping
Result of the CONSTRUCT query on the previous slide:

[ pam:correspondsWith <http://www.example.org/#dsd-le> ;
  rr:logicalTable [ rr:tableName "statssimple" ] ;
  rr:predicateObjectMap [
    rr:objectMap [ rr:column "area" ] ;
    rr:predicate <http://www.example.org/#refArea>
  ] ;
  # Omitted
  rr:predicateObjectMap [
    rr:object <statssimple> ;
    rr:predicate qb:dataSet
  ] ;
  rr:subjectMap [
    rr:class qb:Observation ;
    rr:template "{area}-{period}-{sex}" ;
    rr:termType rr:BlankNode
  ]
] .
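As an illustration of how the generated subject map is used at execution time, the Python sketch below expands the template "{area}-{period}-{sex}" for a single row. It is a simplification rather than a conforming R2RML processor: it ignores escaped braces and does not treat NULL column values specially. The function name `expand_template` and the sample row are hypothetical.

```python
import re


def expand_template(template, row):
    """Replace each {column} placeholder with the row's value."""
    return re.sub(r"\{([^{}]+)\}", lambda m: str(row[m.group(1)]), template)


# Hypothetical row of the "statssimple" table.
row = {"area": "Newport", "period": "2004-2006", "sex": "Male",
       "lifeexpectancy": 76.7}
label = expand_template("{area}-{period}-{sex}", row)
print(label)  # Newport-2004-2006-Male
```

Because the term type is rr:BlankNode, the expanded string serves as a blank node identifier: rows with the same dimension values yield the same observation node.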
Approach
Step 3: Executing the R2RML mapping – straightforward
We used R2RML-F, our own implementation of R2RML, which extends
the specification with JavaScript functions.
Step 4: Validating the generated RDF
Using the integrity constraints specified by the RDF Data Cube
Vocabulary Recommendation
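The Recommendation expresses its integrity constraints as SPARQL queries over the generated RDF. As a rough illustration of what two of them check (all dimensions required, and no duplicate observations), here is a Python sketch over plain dictionaries. It is an analogue under simplifying assumptions, not the validator used in the work; the function name and the sample data are hypothetical.

```python
def check_observations(observations, dimensions):
    """Return a list of human-readable violations (empty if none)."""
    violations = []
    seen = {}
    for index, obs in enumerate(observations):
        # Every observation needs a value for every dimension.
        missing = [d for d in dimensions if obs.get(d) is None]
        if missing:
            violations.append(
                "observation %d lacks %s" % (index, ", ".join(missing)))
            continue
        # No two observations may share all dimension values.
        key = tuple(obs[d] for d in dimensions)
        if key in seen:
            violations.append(
                "observation %d duplicates observation %d" % (index, seen[key]))
        else:
            seen[key] = index
    return violations


observations = [
    {"area": "Newport", "period": "2004-2006", "sex": "Male", "lifeexpectancy": 76.7},
    {"area": "Newport", "period": "2004-2006", "sex": "Male", "lifeexpectancy": 80.7},
    {"area": "Cardiff", "period": "2004-2006", "lifeexpectancy": 78.7},
]
for problem in check_observations(observations, ["area", "period", "sex"]):
    print(problem)
```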
Approach
Step 5: Provenance Information
Keep track of activities and intermediate results with PROV-O.
This will become key for a posteriori compliance analysis in
future work.
[Figure: class hierarchy relating the pam: classes to PROV-O. PROV-O classes shown: owl:Thing, prov:Entity, prov:Agent, prov:SoftwareAgent, prov:Activity. pam: classes shown: pam:DSD_Document, pam:R2RML_Mapping, pam:Validation_Report, pam:Generate_Mapping, pam:Execute_Mapping, pam:Validate_Dataset, pam:Mapping_Generator, pam:R2RML_Processor, pam:Validator.]
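A minimal sketch of how such provenance could be recorded, using the PROV-O properties prov:used, prov:wasGeneratedBy and prov:wasAssociatedWith. The helper `record_activity` and all `ex:` identifiers are hypothetical; the slides do not show how the actual implementation emits its provenance statements.

```python
def record_activity(log, activity, agent, used, generated):
    """Append PROV-O style statements about one pipeline step to log."""
    log.append((activity, "rdf:type", "prov:Activity"))
    log.append((activity, "prov:wasAssociatedWith", agent))
    for entity in used:
        log.append((activity, "prov:used", entity))
    for entity in generated:
        log.append((entity, "prov:wasGeneratedBy", activity))


log = []
record_activity(log, "ex:generate-1", "ex:mapping-generator",
                used=["ex:dsd.ttl"], generated=["ex:mapping.ttl"])
record_activity(log, "ex:execute-1", "ex:r2rml-processor",
                used=["ex:mapping.ttl"], generated=["ex:dataset.ttl"])
record_activity(log, "ex:validate-1", "ex:validator",
                used=["ex:dataset.ttl"], generated=["ex:report.ttl"])
for s, p, o in log:
    print(s, p, o, ".")
```

Chaining the steps through shared entities (the mapping generated by one activity is used by the next) is what would later allow a dataset to be traced back to the DSD it was generated from.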
Features
Mapping values onto URIs, and
Inclusion of data transformation functions
• Mapping languages such as D2R had so-called translation tables,
which mapped elements of one set to elements of another. They are
ideal for mapping values to IRIs. R2RML has no such functionality.
That is why we chose to adopt R2RML-F, where such
“translation tables” can be written as a JavaScript function.
• R2RML-F also allows transformation functions to be written
when the underlying database technology has no support for
them.
Possibility to interlink with external datasets provided by R2RML
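A translation table is essentially a lookup from raw column values to IRIs. In R2RML-F it would be written as a JavaScript function; the Python sketch below only illustrates the idea. The table contents and the function name `translate` are assumptions made for this example, with IRIs assumed to follow the SDMX code list used alongside QB.

```python
# Hypothetical translation table: raw database value -> IRI.
SEX_CODES = {
    "M": "http://purl.org/linked-data/sdmx/2009/code#sex-M",
    "F": "http://purl.org/linked-data/sdmx/2009/code#sex-F",
}


def translate(value, table=SEX_CODES):
    """Map a raw column value to an IRI, or None if there is no entry."""
    return table.get(value.strip().upper())


print(translate("m"))
print(translate("unknown"))
```

Returning None for unmapped values leaves it to the caller to decide whether to skip the triple or report an error.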
Related Work
Work on generating R2RML mappings is, to the best of our
knowledge, limited.
• Skjaeveland et al. 2015 proposed a method to generate an
ontology, rules and a mapping from one description
• TabLinker and CSV2DataCube are two tools for generating
QB graphs from Excel files (in a certain format) and CSV
data respectively
• The Open Cube Toolkit has a built-in R2RML-compliant D2R
server, but it relies on a bespoke XML format that maps the
source to the DSD.
Conclusions
• We argued that datasets are used for a purpose and that
datasets should be built to suit that purpose, including
any policies they should comply with.
• Before we can do the latter, we investigated the former by
trying to answer the question: “Can we generate an R2RML
mapping from a data structure definition?”
• The answer is yes and we presented the R2DQB approach
showing how. We strived for a declarative approach using
SPARQL CONSTRUCT queries. A demonstration of the
approach is presented in the paper.
Future work
Tackling the problem of policy-aware mapping, which would
complement research on post-hoc compliance analysis (e.g.,
Harsh et al. 2017). To be reported.
The Metadata Vocabulary for Tabular Data (W3C Rec.). A
vocabulary for describing the “schemas” of tabular data,
including constraints. This might be another representation
worth considering (future work)

Weitere ähnliche Inhalte

Was ist angesagt?

Multidimensional DB design, revolving TPC-H benchmark into OLAP bench
Multidimensional DB design, revolving TPC-H benchmark into OLAP benchMultidimensional DB design, revolving TPC-H benchmark into OLAP bench
Multidimensional DB design, revolving TPC-H benchmark into OLAP benchRim Moussa
 
Employing Graph Databases as a Standardization Model towards Addressing Heter...
Employing Graph Databases as a Standardization Model towards Addressing Heter...Employing Graph Databases as a Standardization Model towards Addressing Heter...
Employing Graph Databases as a Standardization Model towards Addressing Heter...Dippy Aggarwal
 
Giving MongoDB a Way to Play with the GIS Community
Giving MongoDB a Way to Play with the GIS CommunityGiving MongoDB a Way to Play with the GIS Community
Giving MongoDB a Way to Play with the GIS CommunityMongoDB
 
Graph Databases introduction to rug-b
Graph Databases introduction to rug-bGraph Databases introduction to rug-b
Graph Databases introduction to rug-bPere Urbón-Bayes
 
Graph Databases, The Web of Data Storage Engines
Graph Databases, The Web of Data Storage EnginesGraph Databases, The Web of Data Storage Engines
Graph Databases, The Web of Data Storage EnginesPere Urbón-Bayes
 
A Big-Data Process Consigned Geographically by Employing Mapreduce Frame Work
A Big-Data Process Consigned Geographically by Employing Mapreduce Frame WorkA Big-Data Process Consigned Geographically by Employing Mapreduce Frame Work
A Big-Data Process Consigned Geographically by Employing Mapreduce Frame WorkIRJET Journal
 
Graph computation
Graph computationGraph computation
Graph computationSigmoid
 
IRJET- Big Data-A Review Study with Comparitive Analysis of Hadoop
IRJET- Big Data-A Review Study with Comparitive Analysis of HadoopIRJET- Big Data-A Review Study with Comparitive Analysis of Hadoop
IRJET- Big Data-A Review Study with Comparitive Analysis of HadoopIRJET Journal
 
Big data | Hadoop | components of hadoop |Rahul Gulab Sing
Big data | Hadoop | components of hadoop |Rahul Gulab SingBig data | Hadoop | components of hadoop |Rahul Gulab Sing
Big data | Hadoop | components of hadoop |Rahul Gulab SingRahul Singh
 
ArangoML Pipeline Cloud - Managed Machine Learning Metadata
ArangoML Pipeline Cloud - Managed Machine Learning MetadataArangoML Pipeline Cloud - Managed Machine Learning Metadata
ArangoML Pipeline Cloud - Managed Machine Learning MetadataArangoDB Database
 
Challenge And Evolution Of Data Orchestration at Rakuten Data System
Challenge And Evolution Of Data Orchestration at Rakuten Data SystemChallenge And Evolution Of Data Orchestration at Rakuten Data System
Challenge And Evolution Of Data Orchestration at Rakuten Data SystemAlluxio, Inc.
 
Big Data Summer training presentation
Big Data Summer training presentationBig Data Summer training presentation
Big Data Summer training presentationHarshitaKamboj
 
Web Oriented FIM for large scale dataset using Hadoop
Web Oriented FIM for large scale dataset using HadoopWeb Oriented FIM for large scale dataset using Hadoop
Web Oriented FIM for large scale dataset using Hadoopdbpublications
 
[2D3]TurboGraph- Ultrafast graph analystics engine for billion-scale graphs i...
[2D3]TurboGraph- Ultrafast graph analystics engine for billion-scale graphs i...[2D3]TurboGraph- Ultrafast graph analystics engine for billion-scale graphs i...
[2D3]TurboGraph- Ultrafast graph analystics engine for billion-scale graphs i...NAVER D2
 
Survey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization MethodsSurvey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization Methodspaperpublications3
 

Was ist angesagt? (18)

Multidimensional DB design, revolving TPC-H benchmark into OLAP bench
Multidimensional DB design, revolving TPC-H benchmark into OLAP benchMultidimensional DB design, revolving TPC-H benchmark into OLAP bench
Multidimensional DB design, revolving TPC-H benchmark into OLAP bench
 
Working with Scientific Data in MATLAB
Working with Scientific Data in MATLABWorking with Scientific Data in MATLAB
Working with Scientific Data in MATLAB
 
parallel OLAP
parallel OLAPparallel OLAP
parallel OLAP
 
Employing Graph Databases as a Standardization Model towards Addressing Heter...
Employing Graph Databases as a Standardization Model towards Addressing Heter...Employing Graph Databases as a Standardization Model towards Addressing Heter...
Employing Graph Databases as a Standardization Model towards Addressing Heter...
 
Giving MongoDB a Way to Play with the GIS Community
Giving MongoDB a Way to Play with the GIS CommunityGiving MongoDB a Way to Play with the GIS Community
Giving MongoDB a Way to Play with the GIS Community
 
Graph Databases introduction to rug-b
Graph Databases introduction to rug-bGraph Databases introduction to rug-b
Graph Databases introduction to rug-b
 
Graph Databases, The Web of Data Storage Engines
Graph Databases, The Web of Data Storage EnginesGraph Databases, The Web of Data Storage Engines
Graph Databases, The Web of Data Storage Engines
 
A Big-Data Process Consigned Geographically by Employing Mapreduce Frame Work
A Big-Data Process Consigned Geographically by Employing Mapreduce Frame WorkA Big-Data Process Consigned Geographically by Employing Mapreduce Frame Work
A Big-Data Process Consigned Geographically by Employing Mapreduce Frame Work
 
Graph computation
Graph computationGraph computation
Graph computation
 
IRJET- Big Data-A Review Study with Comparitive Analysis of Hadoop
IRJET- Big Data-A Review Study with Comparitive Analysis of HadoopIRJET- Big Data-A Review Study with Comparitive Analysis of Hadoop
IRJET- Big Data-A Review Study with Comparitive Analysis of Hadoop
 
Big data | Hadoop | components of hadoop |Rahul Gulab Sing
Big data | Hadoop | components of hadoop |Rahul Gulab SingBig data | Hadoop | components of hadoop |Rahul Gulab Sing
Big data | Hadoop | components of hadoop |Rahul Gulab Sing
 
ArangoML Pipeline Cloud - Managed Machine Learning Metadata
ArangoML Pipeline Cloud - Managed Machine Learning MetadataArangoML Pipeline Cloud - Managed Machine Learning Metadata
ArangoML Pipeline Cloud - Managed Machine Learning Metadata
 
Challenge And Evolution Of Data Orchestration at Rakuten Data System
Challenge And Evolution Of Data Orchestration at Rakuten Data SystemChallenge And Evolution Of Data Orchestration at Rakuten Data System
Challenge And Evolution Of Data Orchestration at Rakuten Data System
 
Advancing Scientific Data Support in ArcGIS
Advancing Scientific Data Support in ArcGISAdvancing Scientific Data Support in ArcGIS
Advancing Scientific Data Support in ArcGIS
 
Big Data Summer training presentation
Big Data Summer training presentationBig Data Summer training presentation
Big Data Summer training presentation
 
Web Oriented FIM for large scale dataset using Hadoop
Web Oriented FIM for large scale dataset using HadoopWeb Oriented FIM for large scale dataset using Hadoop
Web Oriented FIM for large scale dataset using Hadoop
 
[2D3]TurboGraph- Ultrafast graph analystics engine for billion-scale graphs i...
[2D3]TurboGraph- Ultrafast graph analystics engine for billion-scale graphs i...[2D3]TurboGraph- Ultrafast graph analystics engine for billion-scale graphs i...
[2D3]TurboGraph- Ultrafast graph analystics engine for billion-scale graphs i...
 
Survey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization MethodsSurvey on Performance of Hadoop Map reduce Optimization Methods
Survey on Performance of Hadoop Map reduce Optimization Methods
 

Ähnlich wie Generating Executable Mappings from RDF Data Cube Data Structure Definitions

RDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival dataRDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival dataGiorgos Santipantakis
 
OLAP Cubes in Datawarehousing
OLAP Cubes in DatawarehousingOLAP Cubes in Datawarehousing
OLAP Cubes in DatawarehousingPrithwis Mukerjee
 
IRJET- Data Retrieval using Master Resource Description Framework
IRJET- Data Retrieval using Master Resource Description FrameworkIRJET- Data Retrieval using Master Resource Description Framework
IRJET- Data Retrieval using Master Resource Description FrameworkIRJET Journal
 
Building a Big Data platform with the Hadoop ecosystem
Building a Big Data platform with the Hadoop ecosystemBuilding a Big Data platform with the Hadoop ecosystem
Building a Big Data platform with the Hadoop ecosystemGregg Barrett
 
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALA
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALADATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALA
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALASaikiran Panjala
 
SQL vs NoSQL, an experiment with MongoDB
SQL vs NoSQL, an experiment with MongoDBSQL vs NoSQL, an experiment with MongoDB
SQL vs NoSQL, an experiment with MongoDBMarco Segato
 
BDAS RDD study report v1.2
BDAS RDD study report v1.2BDAS RDD study report v1.2
BDAS RDD study report v1.2Stefanie Zhao
 
Data processing with spark in r &amp; python
Data processing with spark in r &amp; pythonData processing with spark in r &amp; python
Data processing with spark in r &amp; pythonMaloy Manna, PMP®
 
Which NoSQL Database to Combine with Spark for Real Time Big Data Analytics?
Which NoSQL Database to Combine with Spark for Real Time Big Data Analytics?Which NoSQL Database to Combine with Spark for Real Time Big Data Analytics?
Which NoSQL Database to Combine with Spark for Real Time Big Data Analytics?IJCSIS Research Publications
 
Towards Generating Policy-compliant Datasets (poster)
Towards GeneratingPolicy-compliant Datasets (poster)Towards GeneratingPolicy-compliant Datasets (poster)
Towards Generating Policy-compliant Datasets (poster)Christophe Debruyne
 
Going for GOLD - Adventures in Open Linked Geospatial Metadata
Going for GOLD - Adventures in Open Linked Geospatial MetadataGoing for GOLD - Adventures in Open Linked Geospatial Metadata
Going for GOLD - Adventures in Open Linked Geospatial MetadataEDINA, University of Edinburgh
 
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...Debraj GuhaThakurta
 
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...Debraj GuhaThakurta
 
UnifiedViews: Towards ETL Tool for Simple yet Powerful RDF Data Management.
UnifiedViews: Towards ETL Tool for Simple yet Powerful RDF Data Management.UnifiedViews: Towards ETL Tool for Simple yet Powerful RDF Data Management.
UnifiedViews: Towards ETL Tool for Simple yet Powerful RDF Data Management.tomasknap
 
An efficient data mining framework on hadoop using java persistence api
An efficient data mining framework on hadoop using java persistence apiAn efficient data mining framework on hadoop using java persistence api
An efficient data mining framework on hadoop using java persistence apiJoão Gabriel Lima
 
Agile Data Warehouse Modeling: Introduction to Data Vault Data Modeling
Agile Data Warehouse Modeling: Introduction to Data Vault Data ModelingAgile Data Warehouse Modeling: Introduction to Data Vault Data Modeling
Agile Data Warehouse Modeling: Introduction to Data Vault Data ModelingKent Graziano
 

Ähnlich wie Generating Executable Mappings from RDF Data Cube Data Structure Definitions (20)

RDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival dataRDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival data
 
BICOD-2017
BICOD-2017BICOD-2017
BICOD-2017
 
Big Data Processing
Big Data ProcessingBig Data Processing
Big Data Processing
 
OLAP Cubes in Datawarehousing
OLAP Cubes in DatawarehousingOLAP Cubes in Datawarehousing
OLAP Cubes in Datawarehousing
 
IRJET- Data Retrieval using Master Resource Description Framework
IRJET- Data Retrieval using Master Resource Description FrameworkIRJET- Data Retrieval using Master Resource Description Framework
IRJET- Data Retrieval using Master Resource Description Framework
 
Building a Big Data platform with the Hadoop ecosystem
Building a Big Data platform with the Hadoop ecosystemBuilding a Big Data platform with the Hadoop ecosystem
Building a Big Data platform with the Hadoop ecosystem
 
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALA
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALADATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALA
DATA WAREHOUSE IMPLEMENTATION BY SAIKIRAN PANJALA
 
SQL vs NoSQL, an experiment with MongoDB
SQL vs NoSQL, an experiment with MongoDBSQL vs NoSQL, an experiment with MongoDB
SQL vs NoSQL, an experiment with MongoDB
 
BDAS RDD study report v1.2
BDAS RDD study report v1.2BDAS RDD study report v1.2
BDAS RDD study report v1.2
 
Data processing with spark in r &amp; python
Data processing with spark in r &amp; pythonData processing with spark in r &amp; python
Data processing with spark in r &amp; python
 
Which NoSQL Database to Combine with Spark for Real Time Big Data Analytics?
Which NoSQL Database to Combine with Spark for Real Time Big Data Analytics?Which NoSQL Database to Combine with Spark for Real Time Big Data Analytics?
Which NoSQL Database to Combine with Spark for Real Time Big Data Analytics?
 
Towards Generating Policy-compliant Datasets (poster)
Towards GeneratingPolicy-compliant Datasets (poster)Towards GeneratingPolicy-compliant Datasets (poster)
Towards Generating Policy-compliant Datasets (poster)
 
Going for GOLD - Adventures in Open Linked Geospatial Metadata
Going for GOLD - Adventures in Open Linked Geospatial MetadataGoing for GOLD - Adventures in Open Linked Geospatial Metadata
Going for GOLD - Adventures in Open Linked Geospatial Metadata
 
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
TDWI Accelerate, Seattle, Oct 16, 2017: Distributed and In-Database Analytics...
 
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
TWDI Accelerate Seattle, Oct 16, 2017: Distributed and In-Database Analytics ...
 
UnifiedViews: Towards ETL Tool for Simple yet Powerful RDF Data Management.
UnifiedViews: Towards ETL Tool for Simple yet Powerful RDF Data Management.UnifiedViews: Towards ETL Tool for Simple yet Powerful RDF Data Management.
UnifiedViews: Towards ETL Tool for Simple yet Powerful RDF Data Management.
 
NUIG LOSD tools
NUIG LOSD toolsNUIG LOSD tools
NUIG LOSD tools
 
An efficient data mining framework on hadoop using java persistence api
An efficient data mining framework on hadoop using java persistence apiAn efficient data mining framework on hadoop using java persistence api
An efficient data mining framework on hadoop using java persistence api
 
The Ariadne Project
The Ariadne ProjectThe Ariadne Project
The Ariadne Project
 
Agile Data Warehouse Modeling: Introduction to Data Vault Data Modeling
Agile Data Warehouse Modeling: Introduction to Data Vault Data ModelingAgile Data Warehouse Modeling: Introduction to Data Vault Data Modeling
Agile Data Warehouse Modeling: Introduction to Data Vault Data Modeling
 

Mehr von Christophe Debruyne

One year of DALIDA Data Literacy Workshops for Adults: a Report
One year of DALIDA Data Literacy Workshops for Adults: a ReportOne year of DALIDA Data Literacy Workshops for Adults: a Report
One year of DALIDA Data Literacy Workshops for Adults: a ReportChristophe Debruyne
 
Projet TOXIN : Des graphes de connaissances pour la recherche en toxicologie
Projet TOXIN : Des graphes de connaissances pour la recherche en toxicologieProjet TOXIN : Des graphes de connaissances pour la recherche en toxicologie
Projet TOXIN : Des graphes de connaissances pour la recherche en toxicologieChristophe Debruyne
 
Knowledge Graphs: Concept, mogelijkheden en aandachtspunten
Knowledge Graphs: Concept, mogelijkheden en aandachtspuntenKnowledge Graphs: Concept, mogelijkheden en aandachtspunten
Knowledge Graphs: Concept, mogelijkheden en aandachtspuntenChristophe Debruyne
 
Reusable SHACL Constraint Components for Validating Geospatial Linked Data
Reusable SHACL Constraint Components for Validating Geospatial Linked DataReusable SHACL Constraint Components for Validating Geospatial Linked Data
Reusable SHACL Constraint Components for Validating Geospatial Linked DataChristophe Debruyne
 
Hidden Amongst the Data: the Beyond 2022 Knowledge Graph
Hidden Amongst the Data: the Beyond 2022 Knowledge GraphHidden Amongst the Data: the Beyond 2022 Knowledge Graph
Hidden Amongst the Data: the Beyond 2022 Knowledge GraphChristophe Debruyne
 
Facilitating Data Curation: a Solution Developed in the Toxicology Domain
Facilitating Data Curation: a Solution Developed in the Toxicology DomainFacilitating Data Curation: a Solution Developed in the Toxicology Domain
Facilitating Data Curation: a Solution Developed in the Toxicology DomainChristophe Debruyne
 
Using Maps for Interlinking Geospatial Linked Data
Using Maps for Interlinking Geospatial Linked DataUsing Maps for Interlinking Geospatial Linked Data
Using Maps for Interlinking Geospatial Linked DataChristophe Debruyne
 
Linked Data Publication and Interlinking Research within the SFI funded ADAPT...
Linked Data Publication and Interlinking Research within the SFI funded ADAPT...Linked Data Publication and Interlinking Research within the SFI funded ADAPT...
Linked Data Publication and Interlinking Research within the SFI funded ADAPT...Christophe Debruyne
 
Towards Generating Policy-compliant Datasets
Towards Generating Policy-compliant DatasetsTowards Generating Policy-compliant Datasets
Towards Generating Policy-compliant DatasetsChristophe Debruyne
 
Uplift – Generating RDF datasets from non-RDF data with R2RML
Uplift – Generating RDF datasets from non-RDF data with R2RMLUplift – Generating RDF datasets from non-RDF data with R2RML
Uplift – Generating RDF datasets from non-RDF data with R2RMLChristophe Debruyne
 
A Lightweight Approach to Explore, Enrich and Use Data with a Geospatial Dime...
A Lightweight Approach to Explore, Enrich and Use Data with a Geospatial Dime...A Lightweight Approach to Explore, Enrich and Use Data with a Geospatial Dime...
A Lightweight Approach to Explore, Enrich and Use Data with a Geospatial Dime...Christophe Debruyne
 
Client-side Processing of GeoSPARQL Functions with Triple Pattern Fragments
Client-side Processing of GeoSPARQL Functions with Triple Pattern FragmentsClient-side Processing of GeoSPARQL Functions with Triple Pattern Fragments
Client-side Processing of GeoSPARQL Functions with Triple Pattern FragmentsChristophe Debruyne
 
Serving Ireland's Geospatial Information as Linked Data
Serving Ireland's Geospatial Information as Linked DataServing Ireland's Geospatial Information as Linked Data
Serving Ireland's Geospatial Information as Linked DataChristophe Debruyne
 
Serving Ireland's Geospatial Information as Linked Data (ISWC 2016 Poster)
Serving Ireland's Geospatial Information as Linked Data (ISWC 2016 Poster)Serving Ireland's Geospatial Information as Linked Data (ISWC 2016 Poster)
Serving Ireland's Geospatial Information as Linked Data (ISWC 2016 Poster)Christophe Debruyne
 
R2RML-F: Towards Sharing and Executing Domain Logic in R2RML Mappings
R2RML-F: Towards Sharing and Executing Domain Logic in R2RML MappingsR2RML-F: Towards Sharing and Executing Domain Logic in R2RML Mappings
R2RML-F: Towards Sharing and Executing Domain Logic in R2RML MappingsChristophe Debruyne
 
Towards a Project Centric Metadata Model and Lifecycle for Ontology Mapping G...
Towards a Project Centric Metadata Model and Lifecycle for Ontology Mapping G...Towards a Project Centric Metadata Model and Lifecycle for Ontology Mapping G...
Towards a Project Centric Metadata Model and Lifecycle for Ontology Mapping G...Christophe Debruyne
 
Creating and Consuming Metadata from Transcribed Historical Vital Records for...
Creating and Consuming Metadata from Transcribed Historical Vital Records for...Creating and Consuming Metadata from Transcribed Historical Vital Records for...
Creating and Consuming Metadata from Transcribed Historical Vital Records for...Christophe Debruyne
 
Using Semantic Technologies to Create Virtual Families from Historical Vital ...
Using Semantic Technologies to Create Virtual Families from Historical Vital ...Using Semantic Technologies to Create Virtual Families from Historical Vital ...
Using Semantic Technologies to Create Virtual Families from Historical Vital ...Christophe Debruyne
 
2014 06-04-presentation-mdn-2014
2014 06-04-presentation-mdn-20142014 06-04-presentation-mdn-2014
2014 06-04-presentation-mdn-2014Christophe Debruyne
 

Generating Executable Mappings from RDF Data Cube Data Structure Definitions

  • 1. Generating Executable Mappings from RDF Data Cube Data Structure Definitions Christophe Debruyne, Dave Lewis, Declan O’Sullivan Trinity College Dublin 2018-10-23 @ ODBASE The ADAPT Centre is funded under the SFI Research Centres Programme (Grant 13/RC/2106) and is co-funded under the European Regional Development Fund.
  • 2. Introduction
      • Data processing is increasingly subject to internal and external regulations, e.g., GDPR.
      • Datasets are created and used for a particular purpose, e.g., sending newsletters or using the purchase history of users to suggest recommendations. In the context of GDPR, these purposes require a user’s informed consent.
      • Can we generate datasets for a particular purpose “just in time” that comply with informed consent?
  • 3. Introduction
      • R2RML is a convenient way to transform (relational) non-RDF data into RDF to create these datasets.
      • One can create mappings from databases to vocabularies, ontologies, etc. for data processing activities.
      • We, however, chose to adopt the RDF Data Cube Vocabulary (QB) for representing datasets.
  • 4. Introduction
      • QB is an ontology for multi-dimensional datasets. A Data Structure Definition (DSD) prescribes how a Dataset and its Observations are structured. An Observation is identified by its Dimensions and captures a value for a Measure.
      • QB is rooted in a schema for statistical datasets and the ontology may seem complicated, but the RDF vocabulary is useful for other types of datasets as well.
      • Our choice was also influenced by projects in the health domain, where statistical processing of data is key.*
      * AVERT project: https://www.tcd.ie/medicine/thkc/avert/index.php/
  • 5. Research Question
      • From: “Can we generate datasets for a particular purpose ‘just in time’ that comply with informed consent?”
      • To: “If we have a DSD for a particular purpose, how can we create an executable R2RML mapping to generate a dataset that complies with that DSD’s structure?”
      • A solution could subsequently be extended to take policies into account so as to generate a mapping that is compliant, in other words “policy-aware”. To be reported.
  • 6. Approach
      • R2DQB, pronounced R-2-D-cube
      [Diagram of the five-step pipeline: (1) Data Structure Definitions (dimensions, measures, attributes) are extended with references to tables, references to columns, transformation functions, …; (2) a Mapping Engine generates an R2RML Mapping; (3) an R2RML Processor produces a Data Cube Dataset according to the DSD; (4) Validation; (5) Provenance Information is captured.]
  • 7. Approach
      Step 1: Annotating DSDs
      • May be done in a separate graph (separation of concerns).
      • We chose to reuse R2RML to assess the feasibility in this study. A bespoke vocabulary may be considered in the future.
      (The example is taken from the RDF Data Cube Vocabulary Recommendation.)
  • 8. The DSD:

        @base <http://www.example.org/> .

        <#refPeriod> a rdf:Property, qb:DimensionProperty;
            rdfs:subPropertyOf sdmx-dimension:refPeriod .

        <#refArea> a rdf:Property, qb:DimensionProperty;
            rdfs:subPropertyOf sdmx-dimension:refArea .

        <#lifeExpectancy> a rdf:Property, qb:MeasureProperty;
            rdfs:subPropertyOf sdmx-measure:obsValue;
            rdfs:range xsd:decimal .

        sdmx-dimension:sex a rdf:Property, qb:DimensionProperty .

        <#dsd-le> a qb:DataStructureDefinition;
            # The dimensions
            qb:component [ qb:dimension <#refArea> ];
            qb:component [ qb:dimension <#refPeriod> ];
            qb:component [ qb:dimension sdmx-dimension:sex ];
            # The measure(s)
            qb:component [ qb:measure <#lifeExpectancy> ] .

      The annotations:

        @base <http://www.example.org/> .

        <#refPeriod> rr:column "period" .
        <#refArea> rr:column "area" .
        <#lifeExpectancy> rr:column "lifeexpectancy" .
        sdmx-dimension:sex rr:column "sex" .

        <#dsd-le> rr:tableName "statssimple" .

      Note: prefixes omitted for brevity.
  • 9. Approach
      Step 2: Generating the R2RML mapping
      • Adopting a declarative approach with SPARQL CONSTRUCT queries:
        1. Generating a triples map for each DSD
        2. Generating a subject map for each DSD and a predicate object map for linking observations to the dataset. The subject map is based on the dimensions, as observations are identified by those.
        3. Generating predicate object maps from measures
        4. Generating predicate object maps from dimensions
        5. Generating a link between dataset and DSD
  • 10. Constructing a subject map for observations and a predicate object map for linking observations to a dataset. All queries can be found in the paper.

        CONSTRUCT {
          ?tm rr:subjectMap [
            rr:class qb:Observation ;
            rr:termType rr:BlankNode ;
            rr:template ?x ;
          ] .
          ?tm rr:predicateObjectMap [
            rr:predicate qb:dataSet ;
            rr:object ?ds ;
          ] .
        } WHERE {
          ?tm pam:correspondsWith ?dsd ;
              rr:logicalTable [ rr:tableName ?t ] .
          BIND(IRI(?t) AS ?ds)
          {
            SELECT
              (CONCAT("{", GROUP_CONCAT(?c; SEPARATOR="}-{"), "}") AS ?x) {
              ?dsd qb:component ?component .
              { ?component qb:dimension [ rr:column ?c ] }
              UNION
              # OMITTED FOR CLARITY (SEE PAPER)
            } GROUP BY ?dsd
          }
        }
  • 11. Approach
      Step 2: Generating the R2RML mapping
      Result of the CONSTRUCT query on the previous slide:

        [ pam:correspondsWith <http://www.example.org/#dsd-le> ;
          rr:logicalTable [ rr:tableName "statssimple" ] ;
          rr:predicateObjectMap [
            rr:objectMap [ rr:column "area" ] ;
            rr:predicate <http://www.example.org/#refArea>
          ] ;
          # Omitted
          rr:predicateObjectMap [
            rr:object <statssimple> ;
            rr:predicate qb:dataSet
          ] ;
          rr:subjectMap [
            rr:class qb:Observation ;
            rr:template "{area}-{period}-{sex}" ;
            rr:termType rr:BlankNode
          ]
        ] .
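The way the subject map template on slides 10 and 11 is assembled from the annotated dimensions can be sketched in plain Python. This is only an illustration of the idea, not the authors' SPARQL-based implementation: the dictionaries, `subject_template` and `build_mapping` are hypothetical names, and the columns are sorted here to get a stable order, which `GROUP_CONCAT` itself does not guarantee.

```python
# Hypothetical stand-ins for the annotated DSD on slide 8
# (property -> column name); not an API of any R2RML tool.
dimension_columns = {
    "<#refArea>": "area",
    "<#refPeriod>": "period",
    "sdmx-dimension:sex": "sex",
}
measure_columns = {"<#lifeExpectancy>": "lifeexpectancy"}


def subject_template(columns):
    """Join column names as "{a}-{b}-{c}", mirroring CONCAT/GROUP_CONCAT."""
    # Sorting is an assumption made here for a deterministic result.
    return "{" + "}-{".join(sorted(columns)) + "}"


def build_mapping(table, dimensions, measures):
    """Return a dict that loosely mirrors the generated triples map."""
    predicate_object_maps = [
        {"predicate": prop, "column": col}
        for prop, col in {**dimensions, **measures}.items()
    ]
    return {
        "logicalTable": table,
        "subjectMap": {
            "class": "qb:Observation",
            "termType": "rr:BlankNode",
            "template": subject_template(dimensions.values()),
        },
        "dataSet": table,  # observations are linked to the dataset via qb:dataSet
        "predicateObjectMaps": predicate_object_maps,
    }


mapping = build_mapping("statssimple", dimension_columns, measure_columns)
print(mapping["subjectMap"]["template"])
```

Because observations are identified by their dimensions, only the dimension columns feed the template; measures contribute predicate object maps only.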
  • 12. Approach
      Step 3: Executing the R2RML mapping (straightforward)
      We used our own implementation of R2RML, called R2RML-F, which extends the specification with JavaScript functions.
      Step 4: Validating the generated RDF
      Using the integrity constraints specified by the RDF Data Cube Vocabulary Recommendation.
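To give a feel for what Step 4 checks, the sketch below applies two conditions in the spirit of the Recommendation's integrity constraints to observations held as plain dictionaries: every observation has a value for each dimension, and no two observations share the same dimension values. The Recommendation expresses its constraints as SPARQL queries over the RDF graph; `check_observations` and the sample rows are hypothetical and only illustrate the logic.

```python
DIMENSIONS = ["area", "period", "sex"]


def check_observations(observations, dimensions):
    """Return a list of violations; an empty list means both checks pass."""
    violations = []
    seen = {}  # dimension-value tuple -> index of first observation using it
    for index, obs in enumerate(observations):
        missing = [d for d in dimensions if obs.get(d) is None]
        if missing:
            violations.append(f"observation {index} lacks {', '.join(missing)}")
            continue
        key = tuple(obs[d] for d in dimensions)
        if key in seen:
            violations.append(
                f"observation {index} duplicates observation {seen[key]}"
            )
        else:
            seen[key] = index
    return violations


# Made-up sample rows, for illustration only.
rows = [
    {"area": "A", "period": "2004", "sex": "M", "lifeexpectancy": 1.0},
    {"area": "A", "period": "2004", "sex": "M", "lifeexpectancy": 2.0},
    {"area": "B", "period": "2004", "lifeexpectancy": 3.0},
]
violations = check_observations(rows, DIMENSIONS)
print(violations)
```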
  • 13. Approach
      Step 5: Provenance Information
      Keep track of activities and intermediate results with PROV-O. This will become key for a posteriori compliance analysis in future work.
      [Diagram: class hierarchy under owl:Thing. PROV-O classes shown: prov:Entity, prov:Agent, prov:SoftwareAgent, prov:Activity. pam classes shown: pam:DSD_Document, pam:R2RML_Mapping, pam:Validation_Report, pam:Generate_Mapping, pam:Execute_Mapping, pam:Validate_Dataset, pam:Mapping_Generator, pam:R2RML_Processor, pam:Validator.]
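A minimal sketch of how one step of the pipeline might be recorded, assuming triples are kept as plain Python tuples. The properties `prov:used`, `prov:wasGeneratedBy` and `prov:wasAssociatedWith` are PROV-O terms and the `pam:` class names come from the slide; the `ex:` identifiers and the `record_activity` helper are hypothetical and not part of the authors' tooling.

```python
def record_activity(triples, activity, activity_type, agent, used, generated):
    """Append PROV-O style statements about one activity to `triples`."""
    triples.append((activity, "rdf:type", activity_type))
    triples.append((activity, "prov:wasAssociatedWith", agent))
    for entity in used:
        triples.append((activity, "prov:used", entity))
    for entity in generated:
        triples.append((entity, "prov:wasGeneratedBy", activity))
    return triples


# Hypothetical identifiers for one run of the mapping generation step.
log = record_activity(
    [],
    activity="ex:generate-1",
    activity_type="pam:Generate_Mapping",
    agent="ex:mapping-generator",
    used=["ex:dsd-document"],
    generated=["ex:r2rml-mapping"],
)
for triple in log:
    print(*triple)
```

Chaining such records (the mapping generated here is later used by the execution step, and so on) is what would make the intermediate results traceable.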
  • 14. Features
      Mapping values onto URIs, and inclusion of data transformation functions
      • Mapping languages such as D2R had so-called translation tables, which mapped elements of one set to elements of another. These are ideal for mapping values to IRIs. R2RML has no such functionality, which is why we chose to adopt R2RML-F, where such “translation tables” can be written as a JavaScript function.
      • R2RML-F also allows transformation functions to be written when the underlying database technology has no support for them.
      Possibility to interlink with external datasets, as provided by R2RML
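The idea of a translation table can be illustrated with a small lookup function. R2RML-F expresses such functions in JavaScript inside the mapping; the Python below is only a sketch of the same logic, and the table contents, the example.org IRIs and the `translate` helper are hypothetical.

```python
# Hypothetical translation table from raw column values to code IRIs.
SEX_CODES = {
    "M": "http://www.example.org/code/sex-M",
    "F": "http://www.example.org/code/sex-F",
}


def translate(value, table, default=None):
    """Map a raw value to an IRI, tolerating whitespace and letter case."""
    if value is None:
        return default
    return table.get(value.strip().upper(), default)


print(translate(" m ", SEX_CODES))
```

Returning a default for unknown values leaves it to the caller to decide whether an unmapped value should be skipped or reported.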
  • 15. Related Work
      To the best of our knowledge, work on the generation of R2RML mappings is limited.
      • Skjaeveland et al. 2015 proposed a method to generate an ontology, rules and a mapping from one description.
      • TabLinker and CSV2DataCube are two tools for generating QB graphs from Excel files (in a certain format) and CSV data, respectively.
      • The Open Cube Toolkit has a built-in R2RML-compliant D2R server, but it relies on a bespoke XML format that maps source and DSD.
  • 16. Conclusions
      • We argued that datasets are used for a purpose and that datasets should be built to suit that purpose, including any policies they should comply with.
      • Before we can do the latter, we investigated the former by trying to answer the question: “Can we generate an R2RML mapping from a data structure definition?”
      • The answer is yes, and we presented the R2DQB approach showing how. We strived for a declarative approach using SPARQL CONSTRUCT queries. A demonstration of the approach is presented in the paper.
  • 17. Future work
      • Tackling the problem of policy-aware mapping, which would complement research on post-hoc compliance analysis (e.g., Harsh et al. 2017). To be reported.
      • The Metadata Vocabulary for Tabular Data (W3C Rec.): a vocabulary for describing the “schemas” of tabular data, including constraints. This might be another representation worth considering.