The RDFIndex approach | MTSR 2013

Leveraging semantics to represent and compute
quantitative indexes.
The RDFIndex approach.
Michalis Vafopoulos (speaker)
and
{Jose María Álvarez-Rodríguez, Patricia Ordoñez De Pablos and
Jose Emilio Labra-Gayo}
MTSR 2013 | 7th Metadata and Semantics Research Conference
Main Track


1. Introduction
2. Related Work
3. Main Contributions
4. The RDFIndex approach
5. Evaluation and Discussion
6. Conclusions and Future Work
7. Metadata and Information

Introduction

The Motivating example...
Let’s suppose that...
1. We want to create a “Health index”...
2. ...to know which is the most “healthy” country.
3. ...to know how the health expenditure performs.
4. ...to collect a set of indicators in just one value.
5. ...to make some new policy.

The Motivating example...
It is not a big deal because...
We have the “Health Index” by an international network of physicians
and researchers (http://www.healthindex.com/).
...or the “Health Index” by the United Nations
(http://hdrstats.undp.org/en/indicators/72206.html).
...or the “Ocean Health Index” by an international collaborative
effort (http://www.oceanhealthindex.org/).
...or the indicators in the World Bank
(http://data.worldbank.org/topic/health).
...


The Motivating example...

Benefits of using an index...
1. Creation of valuable data and information.
2. Generation of know-how to make some policy.
3. Re-use of a great effort and commitment by experts in some area.
4. Ranking of entities according to a quantitative value.
5. ...


The Motivating example...
Drawbacks of existing indexes...
1. Data heterogeneity: different data sources, formats and access protocols.
2. Structure: math models to aggregate some indicators that can change over time.
3. Computation process: observations are gathered and processed, somehow, generating a final value.
4. Documentation: multilingual and multicultural character of information.
5. ...

...that imply the need for...
1. Accessing data and information under a common and shared data model.
2. Representing the evolving structure of the index.
3. Computing the index to improve transparency.
4. Providing context-aware documentation: user-profile.
5. ... Exploiting valuable data and metadata.


...but...Is it a common problem?
...some indexes (per domain)...
1. Bibliography: the JCR and JSR, etc.
2. Government: the GDP, etc.
3. Web: the Webindex, etc.
4. Health: (the aforementioned ones).
5. Cloud: the CSC Cloud Usage Index, the VMWare index, the SMI index, etc.
6. ...to name a few per domain and creators.

A quantitative index is...
1. It is a technique to collect a set of indicators in just one value.
2. It can be divided into: index, component and indicator.
3. An index is calculated by aggregating $n$ components using an OWA operator (*).
4. A component is also calculated by aggregating $n$ indicators using an OWA operator.
5. Observations are aligned to an indicator including some metadata such as: location, measure, value, etc.

(*) Ordered weighted averaging operator: $\sum_{i=1}^{n} w_i a_i$, where $w_i$ is the weight of the aggregated element $a_i$.
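As a small illustration of the aggregation formula above, the following Python sketch computes $\sum_{i=1}^{n} w_i a_i$ for a list of elements. The function name `aggregate` and the sample scores are hypothetical, and the sketch applies the weights positionally, exactly as the formula is written on the slide, without reordering the values first.

```python
from typing import Sequence


def aggregate(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum: the sum of w_i * a_i over all aggregated elements."""
    if len(values) != len(weights):
        raise ValueError("each aggregated element needs exactly one weight")
    return sum(w * a for w, a in zip(weights, values))


# Hypothetical scores of two components, combined with weights 0.4 and 0.6.
index_value = aggregate([0.5, 1.0], [0.4, 0.6])
print(index_value)
```

The same function can be reused at every level of the hierarchy: indicators are aggregated into a component, and components into the index.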

Graphical view of a quantitative index...

Related Work

Statistics and the Web of Data
Vocabularies
1. The Statistical Core Vocabulary [5] (SCOVO), a former standard to describe statistical information in the Web of Data (2009).
2. The RDF Data Cube Vocabulary [2], an adaptation of the ISO standard SDMX (Statistical Data and Metadata eXchange) (2013).
3. “DDI-RDF discovery vocabulary. A metadata vocabulary for documenting research and survey data.” (2013) [1].

Preliminary evaluation...
Existing RDF-based vocabularies make it possible to model and represent statistical data.

Statistics and the Web of Data
Statistics and Linked Data
1. “Defining and Executing Assessment Tests on Linked Data for Statistical Analysis” (2011) [8].
2. “Publishing Statistical Data on the Web” (2012) [7].
3. “Publishing open statistical data: the Spanish census” (2011) [4].
4. “Publishing Statistical Data following the Linked Open Data Principles: The Web Index Project” (2012) [6].
5. “Linked Open Data Statistics: Collection and Exploitation” (2013) [3].
6. Some works in the “RDF Validation Workshop 2013” (http://www.w3.org/2012/12/rdf-val/).

Preliminary evaluation...
All of the approaches are/were focused on data publishing/consumption...but...
Validation of statistical data and/or structure...and
the Computation process are still open issues.
Main Contributions

Main Contributions

1-Representation
A high-level model on top of the RDF Data Cube Vocabulary for
representing quantitative indexes.

2-Computation
A Java-SPARQL based processor to exploit the meta-information,
validate and compute the new index values.

The RDFIndex approach

Example: Building “The World Bank Naive Index”.
Components: “Aid Effectiveness” ($c_1$) and “Health” ($c_2$).
Indicators: “Life Expectancy” ($in_1$) and “Health expenditure, total (% of GDP)” ($in_2$).
The index, $i$, is calculated through the ordered weighted averaging (OWA) operator: $\sum_{i=1}^{n} w_i c_i$, where $w_i$ is the weight of the component $c_i$.
All observations must be normalized using the z-score before computing intermediate and final values for each indicator, component and index.
It is necessary to populate the average age by country and year to create a new “derived” indicator of “Life Expectancy” without the “sex” dimension.
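The z-score normalization required above can be sketched in Python as follows. This is a minimal sketch, not the authors' processor: the function name `z_scores` and the sample values are hypothetical, and it assumes the population standard deviation (dividing by $n$), which is consistent with the z-score query shown at the end of the deck, where the sum of squared deviations is divided by the count.

```python
from math import sqrt
from typing import List


def z_scores(values: List[float]) -> List[float]:
    """Normalize values to zero mean and unit (population) standard deviation."""
    n = len(values)
    if n == 0:
        raise ValueError("at least one observation is required")
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    stddev = sqrt(variance)
    if stddev == 0:
        raise ValueError("z-score is undefined when all values are equal")
    return [(v - mean) / stddev for v in values]


# Hypothetical raw observations of a single indicator.
normalized = z_scores([78.0, 79.0, 83.0, 85.0])
print(normalized)
```

After normalization, observations of indicators measured in different units become comparable, which is what allows them to be combined in a weighted sum.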

Example of indicator observations from the World Bank.

Description | Year | Country | Value | Status
Life Expectancy Male | 2010 | Spain | 79 | Normal
Life Expectancy Male | 2011 | Spain | 79 | Normal
Life Expectancy Female | 2010 | Spain | 85 | Normal
Life Expectancy Female | 2011 | Spain | 85 | Normal
Life Expectancy Male | 2010 | Greece | 78 | Normal
Life Expectancy Male | 2011 | Greece | 79 | Normal
Life Expectancy Female | 2010 | Greece | 83 | Normal
Life Expectancy Female | 2011 | Greece | 83 | Normal
Health expenditure, total (% of GDP) | 2010 | Spain | 74.2 | Normal
Health expenditure, total (% of GDP) | 2011 | Spain | 73.6 | Normal
Health expenditure, total (% of GDP) | 2010 | Spain | 61.5 | Normal
Health expenditure, total (% of GDP) | 2011 | Spain | 61.2 | Normal
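To make the derived “Life Expectancy” indicator concrete, the sketch below removes the “sex” dimension from the life expectancy rows of the table by averaging the male and female values per country and year. The function name `drop_sex_dimension` is hypothetical, and the unweighted mean is an assumption: the slides only say that an average by country and year is populated.

```python
from collections import defaultdict
from statistics import mean

# (description, year, country, value) rows copied from the table above.
rows = [
    ("Life Expectancy Male", 2010, "Spain", 79),
    ("Life Expectancy Male", 2011, "Spain", 79),
    ("Life Expectancy Female", 2010, "Spain", 85),
    ("Life Expectancy Female", 2011, "Spain", 85),
    ("Life Expectancy Male", 2010, "Greece", 78),
    ("Life Expectancy Male", 2011, "Greece", 79),
    ("Life Expectancy Female", 2010, "Greece", 83),
    ("Life Expectancy Female", 2011, "Greece", 83),
]


def drop_sex_dimension(observations):
    """Average male and female values for each (country, year) pair."""
    grouped = defaultdict(list)
    for description, year, country, value in observations:
        if description.startswith("Life Expectancy"):
            grouped[(country, year)].append(value)
    return {key: mean(values) for key, values in grouped.items()}


derived = drop_sex_dimension(rows)
for (country, year), value in sorted(derived.items()):
    print(country, year, value)
```

The resulting observations are described only by country and year, so they share the dimensions of the other indicator and can be aggregated with it.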


Representing and computing data with the RDFIndex (summary)
Steps
1. Define the structure and computation of the index with the RDFIndex vocabulary.
2. Run the Java SPARQL based processor:
   - Load the RDFIndex ontology to have access to common definitions.
   - Create a kind of Abstract Syntax Tree (AST) containing the defined meta-data (our index).
   - Validate the structure with an AST walker (structure and RDF Data Cube normalization).
   - Create the SPARQL queries to compute the index (another AST walker).
   - Load the observations and run the SPARQL queries to generate new observations.
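The idea of an AST plus walkers can be sketched as follows. This is only an illustration in Python, not the authors' Java processor: the `Node` class, the `computation_order` walker and the tree itself (including which indicators hang under which component) are hypothetical. The walker visits the tree in post-order, so every aggregated element is listed after the elements it aggregates, which is the order in which the generated queries would have to run.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One element of the index tree: index, component or indicator."""
    name: str
    weight: float = 1.0
    children: List["Node"] = field(default_factory=list)


def computation_order(node: Node) -> List[str]:
    """Post-order walk: children come before the node that aggregates them."""
    order: List[str] = []
    for child in node.children:
        order.extend(computation_order(child))
    if node.children:  # leaves hold raw observations and need no query
        order.append(node.name)
    return order


# Hypothetical tree loosely based on the running example.
index = Node("TheWorldBankNaiveIndex", children=[
    Node("AidEffectiveness", weight=0.4),
    Node("Health", weight=0.6, children=[
        Node("LifeExpectancy"),
        Node("HealthExpenditure"),
    ]),
])
print(computation_order(index))
```

A validation walker would follow the same traversal pattern, checking each node instead of collecting its name.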

Graphical view of the RDFIndex workflow...


Underlying definitions
Observation $o$
It is a tuple $\{v, m, s\}$, where $v$ is a numerical value for the measure $m$ with a status $s$, that belongs to only one dataset of observations $O$.

Dataset $q$
It is a tuple $\{O, m, D, A, T\}$, where $O$ is a set of observations for only one measure $m$ that is described under a set of dimensions $D$ and a set of annotations $A$. Additionally, some attributes can be defined in the set $T$ for structure enrichment.

Aggregated dataset $aq$
It is an aggregation of $n$ datasets $q_i$ (identified by the set $Q$) whose set of observations $O$ is derived by applying an OWA operator $p$ to the observations $O_{q_i}$.


Scope and Consequences
Necessary condition for the computation process
An aggregated dataset $aq$ defined by means of the set of dimensions $D_{aq}$ can be computed iff $\forall q_j \in Q : D_{aq} \subseteq D_{q_j}$. Furthermore, the OWA operator $p$ can only aggregate values belonging to the same measure $m$.
- The set of dimensions $D$, annotations $A$ and attributes $T$ for a given dataset $q$ is always the same, with the aim of describing all observations under the same context.
- An index $i$ and a component $c$ are aggregated datasets. Nevertheless, this restriction is relaxed if observations can be directly mapped to these elements without any computation process.
- An indicator $in$ can be either a dataset or an aggregated dataset.
- All elements in definitions must be uniquely identified.
- An aggregated dataset is also a dataset.
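The necessary condition on dimensions translates directly into a subset test. The sketch below is a minimal Python illustration under stated assumptions: the function names `can_compute` and `blocking_datasets` and the dimension names are hypothetical, and the check on the measure $m$ is left out.

```python
from typing import Dict, List, Set


def can_compute(aggregated_dims: Set[str], part_dims: Dict[str, Set[str]]) -> bool:
    """True iff D_aq is a subset of D_qj for every dataset q_j in Q."""
    return all(aggregated_dims <= dims for dims in part_dims.values())


def blocking_datasets(aggregated_dims: Set[str],
                      part_dims: Dict[str, Set[str]]) -> List[str]:
    """Names of the datasets that lack at least one required dimension."""
    return sorted(name for name, dims in part_dims.items()
                  if not aggregated_dims <= dims)


# Hypothetical dimension sets for an aggregated dataset and its parts.
d_aq = {"ref-area", "ref-year"}
parts = {
    "LifeExpectancy": {"ref-area", "ref-year", "sex"},
    "HealthExpenditure": {"ref-area", "ref-year"},
}
print(can_compute(d_aq, parts))
```

In this example the extra “sex” dimension of one part does not prevent the computation, because the condition only requires that the dimensions of the aggregated dataset are present in every part.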


Mapping the RDFIndex to the RDF Data Cube Vocabulary

Concept | Vocabulary element | Comments
Observation $o$ | qb:Observation | Enrichment through annotations
Numerical value $v$ | xsd:double | Restriction to numerical values
Measure $m$ | qb:MeasureProperty, sdmx-measure:obsValue | Restriction to one measure
Status $s$ | sdmx-concept:obsStatus | Defined by the SDMX-RDF vocabulary
Dataset $q$ | qb:DataSet and qb:DataStructureDefinition | Metadata of the qb:DataSet
Dimension $d_i \in D$ | qb:DimensionProperty | Context of observations
Annotation $a_i \in A$ | owl:AnnotationProperty | Intensive use of Dublin Core
Attribute $at_i \in T$ | qb:AttributeProperty | Link to existing datasets such as DBpedia
OWA operator $p$ | SPARQL 1.1 aggregation operators | Other extensions depend on the RDF repository
Index, Component and Indicator | skos:Concept | SKOS taxonomy (logical structure)


Example of the “World Bank Naive index” structure in RDF.

    @prefix rdfindex: <http://purl.org/rdfindex/ontology/> .
    @prefix rdfindex-wb: <http://purl.org/rdfindex/wb/resource/> .
    @prefix rdfindex-wbont: <http://purl.org/rdfindex/wb/ontology/> .
    # Standard prefixes needed to parse the snippet.
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix qb: <http://purl.org/linked-data/cube#> .
    @prefix sdmx-attribute: <http://purl.org/linked-data/sdmx/2009/attribute#> .

    rdfindex-wb:TheWorldBankNaiveIndex
        a rdfindex:Index ;
        rdfs:label "The World Bank Naive Index"@en ;
        rdfindex:type rdfindex:Quantitative ;
        rdfindex:aggregates [
            rdfindex:aggregation-operator rdfindex:OWA ;
            rdfindex:part-of [
                rdfindex:dataset rdfindex-wb:AidEffectiveness ;
                rdfindex:weight 0.4 ] ;
            rdfindex:part-of [
                rdfindex:dataset rdfindex-wb:Health ;
                rdfindex:weight 0.6 ] ;
        ] ;
        # More meta-data properties...
        qb:structure rdfindex-wb:TheWorldBankNaiveIndexDSD .

    rdfindex-wb:TheWorldBankNaiveIndexDSD
        a qb:DataStructureDefinition ;
        qb:component
            [ qb:dimension rdfindex-wbont:ref-area ] ,
            [ qb:dimension rdfindex-wbont:ref-year ] ,
            [ qb:measure rdfindex:value ] ,
            [ qb:attribute sdmx-attribute:unitMeasure ] ;
        # More meta-data properties...
        .


SPARQL template for building aggregated observations.

    SELECT (d_i ∈ D) [(sum(?w * ?measure) as ?newvalue) | OWA(?measure)]
    WHERE {
        q rdfindex:aggregates ?parts .
        ?parts rdfindex:part-of ?partof .
        ?partof rdfindex:dataset q_i .
        FILTER (?partof ∈ Q) .
        ?observation rdf:type qb:Observation .
        ?part rdfindex:weight ?defaultw .
        OPTIONAL { ?partof rdfindex:weight ?aggregationw . } .
        BIND (if(BOUND(?aggregationw), ?aggregationw, ?defaultw) AS ?w)
        ?observation m ?measure .
        ?observation ?dim ?dimRef .
        FILTER (?dim ∈ D) .
    } GROUP BY (d_i ∈ D)
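One way to picture how such a template is instantiated is plain string generation. The following Python sketch is an assumption about the mechanics, not the authors' Java implementation: the function name `build_aggregation_query` and its parameters are hypothetical, prefix declarations are left to the caller, and only the weighted-sum branch of the template is produced.

```python
from typing import Sequence


def build_aggregation_query(aggregated: str, parts: Sequence[str],
                            measure: str, dimensions: Sequence[str]) -> str:
    """Instantiate the aggregation template for one aggregated dataset."""
    if not parts or not dimensions:
        raise ValueError("at least one part and one dimension are required")
    # One result variable per dimension of the aggregated dataset.
    dim_vars = [f"?dim{k}" for k in range(len(dimensions))]
    part_filter = " || ".join(f"(?part = {part})" for part in parts)
    lines = [
        f"SELECT {' '.join(dim_vars)} (SUM(?w * ?measure) AS ?newvalue)",
        "WHERE {",
        f"  {aggregated} rdfindex:aggregates ?parts .",
        "  ?parts rdfindex:part-of ?partof .",
        "  ?partof rdfindex:dataset ?part .",
        f"  FILTER ({part_filter}) .",
        "  ?observation qb:dataSet ?part .",
        "  ?part rdfindex:weight ?defaultw .",
        "  OPTIONAL { ?partof rdfindex:weight ?aggregationw . } .",
        "  BIND (IF(BOUND(?aggregationw), ?aggregationw, ?defaultw) AS ?w)",
        f"  ?observation {measure} ?measure .",
    ]
    lines += [f"  ?observation {dim} {var} ."
              for dim, var in zip(dimensions, dim_vars)]
    lines.append("} GROUP BY " + " ".join(dim_vars))
    return "\n".join(lines)


query = build_aggregation_query(
    "rdfindex-wb:TheWorldBankNaiveIndex",
    ["rdfindex-wb:AidEffectiveness", "rdfindex-wb:Health"],
    "rdfindex:value",
    ["rdfindex-wbont:ref-area", "rdfindex-wbont:ref-year"],
)
print(query)
```

The set memberships of the template (the parts in $Q$ and the dimensions in $D$) become an explicit FILTER and one triple pattern per dimension.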


Example of generated SPARQL query.

    PREFIX rdfindex: <http://purl.org/rdfindex/ontology/>
    PREFIX rdfindex-wb: <http://purl.org/rdfindex/wb/resource/>
    PREFIX rdfindex-wbont: <http://purl.org/rdfindex/wb/ontology/>
    PREFIX qb: <http://purl.org/linked-data/cube#>

    SELECT ?dim0 ?dim1 (SUM(?w * ?measure) AS ?newvalue)
    WHERE {
        rdfindex-wb:TheWorldBankNaiveIndex rdfindex:aggregates ?parts .
        ?parts rdfindex:part-of ?partof .
        ?partof rdfindex:dataset ?part .
        FILTER ((?part = rdfindex-wb:AidEffectiveness) ||
                (?part = rdfindex-wb:Health)) .
        ?observation qb:dataSet ?part .
        ?part rdfindex:weight ?defaultw .
        OPTIONAL { ?partof rdfindex:weight ?aggregationw . } .
        # Prefer the weight given in the aggregation; fall back to the default one.
        BIND (IF(BOUND(?aggregationw), ?aggregationw, ?defaultw) AS ?w)
        ?observation rdfindex:value ?measure .
        ?observation rdfindex-wbont:ref-area ?dim0 .
        ?observation rdfindex-wbont:ref-year ?dim1 .
    } GROUP BY ?dim0 ?dim1


Partial example of a populated observation for “The World
Bank Naive Index”.

    rdfindex-wb:o680_810085_1579
        a qb:Observation ;
        qb:dataSet rdfindex-wb:TheWorldBankNaiveIndex ;
        rdfindex-wbont:ref-area dbpedia-res:Spain ;
        rdfindex-wbont:ref-year
            <http://reference.data.gov.uk/id/gregorian-interval/2010-01-01T00:00:00/P1Y> ;
        ...
        # rdfs:{label, comment} {literal} ;
        # dcterms:{issued, date, contributor, author,
        #          publisher, identifier}
        #         {resource, literal} ;
        rdfindex:md5-hash "6cdda76088cd161d766809d6a78d35f6" ;
        sdmx-concept:obsStatus sdmx-code:obsStatus-E ;
        rdfindex:value "0.707"^^xsd:double .

Evolution: joint effort between the RDFIndex and Computex
Computex
1. Vocabulary for statistical computations.
2. Extends RDF Data Cube.
3. Ontology at: http://purl.org/weso/computex.
4. Data and structure validation using SPARQL queries.
5. Some terms: cex:Concept, cex:Indicator, cex:Computation, cex:WeightSchema, qb:Observation.
6. Learn more: http://computex.herokuapp.com/

Preliminary Evaluation...
Similar approaches that were merged between July and September 2013.

Demo of the RDFIndex/Computex...

http://computex.herokuapp.com/

On-going working examples
The Cloud Index
- Compilation of Key Performance Indicators for cloud services.
- Creation of a Cloud Index to measure the Quality of Service of cloud providers.
- Funded by the RELATE-ITN (FP7-PEOPLE-2010-ITN, 264840) project and developed in the subproject “Quality Management in Service-based Systems and Cloud Applications”.
- Demo: http://cloudindex-doc.herokuapp.com/

The Web Index 2012 and 2013
- Compilation of several indicators (85) from 61 countries (100 in 2013).
- Creation of the Web Index to measure the impact of the Web around the world.
- Funded by the Web Foundation.
- Demo: http://thewebindex.org

Evaluation and Discussion

Advantages
Data Sources
Data management applying the Linked Data principles.

Structure
Automatic validation and verification of the structure and metadata of
quantitative indexes.

Computation process
A native approach using SPARQL queries that can help to boost
transparency.

Documentation
Implicit multilingual and multicultural support.

Detailed advantages (I)
Feature: Data sources
Main advantages:
- Common and shared data model, RDF.
- Description of data providers (provenance and trust).
- A formal query language to query data, SPARQL.
- Use of Internet protocols, HTTP.
- Data enrichment and validation (domain and range).
- Unique identification of entities, concepts, etc. through (HTTP) URIs.
- Possibility of publishing new data under the aforementioned characteristics.
- Standardization and integration of data sources.

Feature: Structure
Main advantages:
- Meta-description of index structure (validation).
- Re-use of existing semantic web vocabularies.
- Re-use of existing datasets to enrich meta-data.
- Context-aware definitions.
- Underlying logic formalism.
- Orthogonal and flexible.


Detailed advantages (II)
Feature: Computation process
Main advantages:
- Meta-description of datasets aggregation.
- Validation of composed datasets.
- OWA operators support.
- Direct translation to SPARQL queries.

Feature: Documentation
Main advantages:
- Multilingual support to describe datasets, etc.
- Easy generation with existing tools.

Feature: Cross-Domain Features
Main advantages:
- Separation of concerns and responsibilities: data and meta-data (structure and computation).
- Standardization (putting into action specs from bodies such as the W3C).
- Declarative and adaptive approach.
- No vendor lock-in (format, access and computation).
- Integration, Interoperability and Transparency.
- Alignment with existing trends (data management: quality and filtering).
- Easy integration with third-party services such as visualization.
- Contribution to the Web of Data.

Restrictions
SPARQL 1.1 support
1. Some built-in functions are not standardized:
   - Example: z-score employs standard deviation.
   - It requires a built-in function: sqrt (only available in some SPARQL implementations).
2. Ranking of values is not obvious:
   - GROUP_CONCAT
   - Check a value against all the other values.
   - ...neither solution is efficient.

Limitations of the RDFIndex expressivity
The working examples show its applicability, but some missing terms/relationships should be expected.
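The cost of ranking by checking each value against all the others can be seen in a small Python sketch run outside the SPARQL engine. Both function names and the sample scores are hypothetical. The first function mirrors the pairwise comparison, which is quadratic in the number of entities; the second obtains the same ranking from a single sort.

```python
from typing import Dict


def rank_by_comparison(scores: Dict[str, float]) -> Dict[str, int]:
    """Rank = 1 + number of entities with a strictly higher score (O(n^2))."""
    return {
        name: 1 + sum(1 for other in scores.values() if other > value)
        for name, value in scores.items()
    }


def rank_by_sorting(scores: Dict[str, float]) -> Dict[str, int]:
    """Same ranking, computed from one sort (O(n log n))."""
    ranks: Dict[str, int] = {}
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    for position, (name, value) in enumerate(ordered, start=1):
        previous = ordered[position - 2] if position > 1 else None
        # Tied scores share the rank of the first entity in the tie.
        if previous is not None and previous[1] == value:
            ranks[name] = ranks[previous[0]]
        else:
            ranks[name] = position
    return ranks


# Hypothetical index values for three entities, two of them tied.
scores = {"A": 0.7, "B": 0.9, "C": 0.7}
print(rank_by_comparison(scores))
print(rank_by_sorting(scores))
```

Both functions assign tied entities the same rank, so their results can be compared directly.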
Conclusions and Future Work

Conclusions
1. The use of quantitative indexes is a common practice to compile and rank indicators for making decisions.
2. Existing indexes are completely opaque (PDF, HTML, etc.).
3. It is necessary to improve access to data/metadata and to boost transparency.
4. The RDFIndex/Computex vocabulary seeks to provide the appropriate building blocks to represent and compute indexes.
5. However:
   - SPARQL limitations prevent the full development of a “pedantic” Linked Data approach.
   - The vocabulary likely does not cover all the elements required to represent any index.


Future Work

1. Extend the RDFIndex/Computex vocabulary.
2. Externalize the computation with a hybrid approach R+SPARQL.
3. Parallelization of the computation process.

End of the presentation...

Metadata and Information

Roster...
Dr. Jose María Alvarez-Rodríguez
SEERC (until August, 2013) and Carlos III
University of Madrid, Spain
E-mail: josemaria.alvarez@uc3m.es
WWW: http://www.josemalvarez.es

Prof. Dr. Patricia Ordoñez De Pablos
University of Oviedo, Spain
E-mail: patriop@uniovi.es

Prof. Dr. Jose Emilio Labra-Gayo
University of Oviedo, Spain
E-mail: labra@uniovi.es
WWW: http://www.di.uniovi.es/~labra
Some SPARQL queries

Z-score normalization using SPARQL...

    PREFIX afn: <http://jena.hpl.hp.com/ARQ/function#>
    PREFIX rdfindex: <http://purl.org/rdfindex/ontology/>

    SELECT ((?measure - ?mean) / ?stddev AS ?zscore)
    WHERE {
      ...
      ?observation rdfindex:value ?measure
      {
        # Standard deviation: square root of the mean squared deviation.
        SELECT ?mean (afn:sqrt((SUM((?measure - ?mean) *
                     (?measure - ?mean)) / ?count)) AS ?stddev)
        WHERE {
          ?observation rdfindex:value ?measure
          {
            # Number of observations and their mean.
            SELECT (COUNT(?measure) AS ?count) (AVG(?measure) AS ?mean)
            WHERE {
              ?observation rdfindex:value ?measure
            } GROUP BY ?count ?mean LIMIT 1
          }
        } GROUP BY ?mean ?count LIMIT 1
      }
    }
References

[1] T. Bosch, R. Cyganiak, A. Gregory, and J. Wackerow. DDI-RDF discovery vocabulary. A metadata vocabulary for documenting research and survey data. In 6th Workshop on Linked Data on the Web (LDOW2013), 2013.
[2] R. Cyganiak and D. Reynolds. The RDF Data Cube Vocabulary. Working Draft, W3C, 2013. http://www.w3.org/TR/vocab-data-cube/.
[3] I. Ermilov, M. Martin, J. Lehmann, and S. Auer. Linked Open Data Statistics: Collection and Exploitation. In KESW, pages 242–249, 2013.
[4] J. D. Fernández, M. A. Martínez-Prieto, and C. Gutiérrez. Publishing open statistical data: the Spanish census. In DG.O, pages 20–25, 2011.
[5] M. Hausenblas, W. Halb, Y. Raimond, L. Feigenbaum, and D. Ayers. SCOVO: Using Statistics on the Web of Data. In L. A. et al., editor, The Semantic Web: Research and Applications, volume 5554 of Lecture Notes in Computer Science, pages 708–722. Springer Berlin Heidelberg, 2009.
[6] J. M. A. Rodríguez, J. Clement, J. E. L. Gayo, H. Farhan, and P. Ordoñez De Pablos. Publishing Statistical Data following the Linked Open Data Principles: The Web Index Project, pages 199–226. IGI Global, 2013.
[7] P. E. R. Salas, F. M. D. Mota, K. K. Breitman, M. A. Casanova, M. Martin, and S. Auer. Publishing Statistical Data on the Web. Int. J. Semantic Computing, 6(4):373–388, 2012.
[8] B. Zapilko and B. Mathiak. Defining and Executing Assessment Tests on Linked Data for Statistical Analysis. In COLD, 2011.

Oop principles a good book
 
[Conference] Cognitive Graph Analytics on Company Data and News
[Conference] Cognitive Graph Analytics on Company Data and News[Conference] Cognitive Graph Analytics on Company Data and News
[Conference] Cognitive Graph Analytics on Company Data and News
 
Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven Recipes
Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven RecipesReasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven Recipes
Reasoning with Big Knowledge Graphs: Choices, Pitfalls and Proven Recipes
 
Research Objects for improved sharing and reproducibility
Research Objects for improved sharing and reproducibilityResearch Objects for improved sharing and reproducibility
Research Objects for improved sharing and reproducibility
 
Open Research Knowledge Graph (ORKG) - an overview
Open Research Knowledge Graph (ORKG) - an overview   Open Research Knowledge Graph (ORKG) - an overview
Open Research Knowledge Graph (ORKG) - an overview
 
Integrating Covid-19 Bioassays in the Open Research Knowledge Graph
Integrating Covid-19 Bioassays in the Open Research Knowledge GraphIntegrating Covid-19 Bioassays in the Open Research Knowledge Graph
Integrating Covid-19 Bioassays in the Open Research Knowledge Graph
 
(Big) Data (Science) Skills
(Big) Data (Science) Skills(Big) Data (Science) Skills
(Big) Data (Science) Skills
 
Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...
Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...
Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...
 
Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data
Introduction to the Data Web, DBpedia and the Life-cycle of Linked DataIntroduction to the Data Web, DBpedia and the Life-cycle of Linked Data
Introduction to the Data Web, DBpedia and the Life-cycle of Linked Data
 
Finding and consuming (Linked) Open Data
Finding and consuming (Linked) Open DataFinding and consuming (Linked) Open Data
Finding and consuming (Linked) Open Data
 

Andere mochten auch (6)

Simple Presentation for Slideshare
Simple Presentation for SlideshareSimple Presentation for Slideshare
Simple Presentation for Slideshare
 
Continuous testing
Continuous testingContinuous testing
Continuous testing
 
Búsqueda y Creación de Mashups (paso a paso)
Búsqueda y Creación de Mashups (paso a paso)Búsqueda y Creación de Mashups (paso a paso)
Búsqueda y Creación de Mashups (paso a paso)
 
SKOS intro
SKOS introSKOS intro
SKOS intro
 
MOLDEAS at City College
MOLDEAS at City CollegeMOLDEAS at City College
MOLDEAS at City College
 
Map/Reduce intro
Map/Reduce introMap/Reduce intro
Map/Reduce intro
 

Ähnlich wie Representing and Computing Quantitative Indexes in RDF

Datajalostamo-seminaari 5.6.2014: Tutkimusdatan avoimuus – globaalit tutkimus...
Datajalostamo-seminaari 5.6.2014: Tutkimusdatan avoimuus – globaalit tutkimus...Datajalostamo-seminaari 5.6.2014: Tutkimusdatan avoimuus – globaalit tutkimus...
Datajalostamo-seminaari 5.6.2014: Tutkimusdatan avoimuus – globaalit tutkimus...Digitalmikkeli
 
Implementing Open Access: Effective Management of Your Research Data
Implementing Open Access: Effective Management of Your Research DataImplementing Open Access: Effective Management of Your Research Data
Implementing Open Access: Effective Management of Your Research DataMartin Hamilton
 
Open Access to Research Data: Challenges and Solutions
Open Access to Research Data: Challenges and SolutionsOpen Access to Research Data: Challenges and Solutions
Open Access to Research Data: Challenges and SolutionsMartin Donnelly
 
Paving the way to open and interoperable research data service workflows
Paving the way to open and interoperable research data service workflowsPaving the way to open and interoperable research data service workflows
Paving the way to open and interoperable research data service workflowsThe University of Edinburgh
 
Tutorial Data Management and workflows
Tutorial Data Management and workflowsTutorial Data Management and workflows
Tutorial Data Management and workflowsSSSW
 
Supporting Research Data Management in UK Universities: the Jisc Managing Res...
Supporting Research Data Management in UK Universities: the Jisc Managing Res...Supporting Research Data Management in UK Universities: the Jisc Managing Res...
Supporting Research Data Management in UK Universities: the Jisc Managing Res...L Molloy
 
FAIR data: what it means, how we achieve it, and the role of RDA
FAIR data: what it means, how we achieve it, and the role of RDAFAIR data: what it means, how we achieve it, and the role of RDA
FAIR data: what it means, how we achieve it, and the role of RDASarah Jones
 
Mining Social Media Data for Understanding Drugs Usage
Mining Social Media Data for Understanding Drugs  UsageMining Social Media Data for Understanding Drugs  Usage
Mining Social Media Data for Understanding Drugs UsageIRJET Journal
 
Paving the way to open and interoperable research data service workflows Prog...
Paving the way to open and interoperable research data service workflows Prog...Paving the way to open and interoperable research data service workflows Prog...
Paving the way to open and interoperable research data service workflows Prog...ResearchSpace
 
Research Data Alliance .. The Why, How, What ...
Research Data Alliance .. The Why, How, What ... Research Data Alliance .. The Why, How, What ...
Research Data Alliance .. The Why, How, What ... Research Data Alliance
 
Forms of cooperation between National Statistical Institutes and data archive...
Forms of cooperation between National Statistical Institutes and data archive...Forms of cooperation between National Statistical Institutes and data archive...
Forms of cooperation between National Statistical Institutes and data archive...Arhiv družboslovnih podatkov
 
Data discovery and sharing at UCLH
Data discovery and sharing at UCLHData discovery and sharing at UCLH
Data discovery and sharing at UCLHJisc
 
INVESTIGATE,IDENTIFY AND ESTIMATE THE TECHNICAL DEBT: A SYSTEMATIC MAPPING STUDY
INVESTIGATE,IDENTIFY AND ESTIMATE THE TECHNICAL DEBT: A SYSTEMATIC MAPPING STUDYINVESTIGATE,IDENTIFY AND ESTIMATE THE TECHNICAL DEBT: A SYSTEMATIC MAPPING STUDY
INVESTIGATE,IDENTIFY AND ESTIMATE THE TECHNICAL DEBT: A SYSTEMATIC MAPPING STUDYijseajournal
 

Ähnlich wie Representing and Computing Quantitative Indexes in RDF (20)

Datajalostamo-seminaari 5.6.2014: Tutkimusdatan avoimuus – globaalit tutkimus...
Datajalostamo-seminaari 5.6.2014: Tutkimusdatan avoimuus – globaalit tutkimus...Datajalostamo-seminaari 5.6.2014: Tutkimusdatan avoimuus – globaalit tutkimus...
Datajalostamo-seminaari 5.6.2014: Tutkimusdatan avoimuus – globaalit tutkimus...
 
Summary of workshop.ppt
Summary of workshop.pptSummary of workshop.ppt
Summary of workshop.ppt
 
Implementing Open Access: Effective Management of Your Research Data
Implementing Open Access: Effective Management of Your Research DataImplementing Open Access: Effective Management of Your Research Data
Implementing Open Access: Effective Management of Your Research Data
 
Open Access to Research Data: Challenges and Solutions
Open Access to Research Data: Challenges and SolutionsOpen Access to Research Data: Challenges and Solutions
Open Access to Research Data: Challenges and Solutions
 
Paving the way to open and interoperable research data service workflows
Paving the way to open and interoperable research data service workflowsPaving the way to open and interoperable research data service workflows
Paving the way to open and interoperable research data service workflows
 
RDA Update
RDA UpdateRDA Update
RDA Update
 
Tutorial Data Management and workflows
Tutorial Data Management and workflowsTutorial Data Management and workflows
Tutorial Data Management and workflows
 
Supporting Research Data Management in UK Universities: the Jisc Managing Res...
Supporting Research Data Management in UK Universities: the Jisc Managing Res...Supporting Research Data Management in UK Universities: the Jisc Managing Res...
Supporting Research Data Management in UK Universities: the Jisc Managing Res...
 
FAIR data: what it means, how we achieve it, and the role of RDA
FAIR data: what it means, how we achieve it, and the role of RDAFAIR data: what it means, how we achieve it, and the role of RDA
FAIR data: what it means, how we achieve it, and the role of RDA
 
Mining Social Media Data for Understanding Drugs Usage
Mining Social Media Data for Understanding Drugs  UsageMining Social Media Data for Understanding Drugs  Usage
Mining Social Media Data for Understanding Drugs Usage
 
ESRA presentation of ADP SORS cooperation
ESRA presentation of ADP SORS cooperationESRA presentation of ADP SORS cooperation
ESRA presentation of ADP SORS cooperation
 
Paving the way to open and interoperable research data service workflows Prog...
Paving the way to open and interoperable research data service workflows Prog...Paving the way to open and interoperable research data service workflows Prog...
Paving the way to open and interoperable research data service workflows Prog...
 
Research Data Alliance .. The Why, How, What ...
Research Data Alliance .. The Why, How, What ... Research Data Alliance .. The Why, How, What ...
Research Data Alliance .. The Why, How, What ...
 
Dive deep into your Data Pools
Dive deep into your Data PoolsDive deep into your Data Pools
Dive deep into your Data Pools
 
Data literacy
Data literacyData literacy
Data literacy
 
2013.05 - LDOW 2013 @ WWW 2013
2013.05 - LDOW 2013 @ WWW 20132013.05 - LDOW 2013 @ WWW 2013
2013.05 - LDOW 2013 @ WWW 2013
 
Bosch, Wackerow: Linked data on the web
Bosch, Wackerow: Linked data on the web Bosch, Wackerow: Linked data on the web
Bosch, Wackerow: Linked data on the web
 
Forms of cooperation between National Statistical Institutes and data archive...
Forms of cooperation between National Statistical Institutes and data archive...Forms of cooperation between National Statistical Institutes and data archive...
Forms of cooperation between National Statistical Institutes and data archive...
 
Data discovery and sharing at UCLH
Data discovery and sharing at UCLHData discovery and sharing at UCLH
Data discovery and sharing at UCLH
 
INVESTIGATE,IDENTIFY AND ESTIMATE THE TECHNICAL DEBT: A SYSTEMATIC MAPPING STUDY
INVESTIGATE,IDENTIFY AND ESTIMATE THE TECHNICAL DEBT: A SYSTEMATIC MAPPING STUDYINVESTIGATE,IDENTIFY AND ESTIMATE THE TECHNICAL DEBT: A SYSTEMATIC MAPPING STUDY
INVESTIGATE,IDENTIFY AND ESTIMATE THE TECHNICAL DEBT: A SYSTEMATIC MAPPING STUDY
 

Mehr von CARLOS III UNIVERSITY OF MADRID

Engineering 4.0: Digitization through task automation and reuse
Engineering 4.0:  Digitization through task automation and reuseEngineering 4.0:  Digitization through task automation and reuse
Engineering 4.0: Digitization through task automation and reuseCARLOS III UNIVERSITY OF MADRID
 
LOTAR-PDES: Engineering digitalization through task automation and reuse in t...
LOTAR-PDES: Engineering digitalization through task automation and reuse in t...LOTAR-PDES: Engineering digitalization through task automation and reuse in t...
LOTAR-PDES: Engineering digitalization through task automation and reuse in t...CARLOS III UNIVERSITY OF MADRID
 
Sailing the V: Engineering digitalization through task automation and reuse i...
Sailing the V: Engineering digitalization through task automation and reuse i...Sailing the V: Engineering digitalization through task automation and reuse i...
Sailing the V: Engineering digitalization through task automation and reuse i...CARLOS III UNIVERSITY OF MADRID
 
AI4SE: Challenges and opportunities in the integration of Systems Engineering...
AI4SE: Challenges and opportunities in the integration of Systems Engineering...AI4SE: Challenges and opportunities in the integration of Systems Engineering...
AI4SE: Challenges and opportunities in the integration of Systems Engineering...CARLOS III UNIVERSITY OF MADRID
 
Challenges in the integration of Systems Engineering and the AI/ML model life...
Challenges in the integration of Systems Engineering and the AI/ML model life...Challenges in the integration of Systems Engineering and the AI/ML model life...
Challenges in the integration of Systems Engineering and the AI/ML model life...CARLOS III UNIVERSITY OF MADRID
 
OSLC KM: Elevating the meaning of data and operations within the toolchain
OSLC KM: Elevating the meaning of data and operations within the toolchainOSLC KM: Elevating the meaning of data and operations within the toolchain
OSLC KM: Elevating the meaning of data and operations within the toolchainCARLOS III UNIVERSITY OF MADRID
 
OSLC KM (Knowledge Management): elevating the meaning of data and operations ...
OSLC KM (Knowledge Management): elevating the meaning of data and operations ...OSLC KM (Knowledge Management): elevating the meaning of data and operations ...
OSLC KM (Knowledge Management): elevating the meaning of data and operations ...CARLOS III UNIVERSITY OF MADRID
 
Systems and Software Architecture: an introduction to architectural modelling
Systems and Software Architecture: an introduction to architectural modellingSystems and Software Architecture: an introduction to architectural modelling
Systems and Software Architecture: an introduction to architectural modellingCARLOS III UNIVERSITY OF MADRID
 
Detection of fraud in financial blockchain-based transactions through big dat...
Detection of fraud in financial blockchain-based transactions through big dat...Detection of fraud in financial blockchain-based transactions through big dat...
Detection of fraud in financial blockchain-based transactions through big dat...CARLOS III UNIVERSITY OF MADRID
 
News headline generation with sentiment and patterns: A case study of sports ...
News headline generation with sentiment and patterns: A case study of sports ...News headline generation with sentiment and patterns: A case study of sports ...
News headline generation with sentiment and patterns: A case study of sports ...CARLOS III UNIVERSITY OF MADRID
 

Mehr von CARLOS III UNIVERSITY OF MADRID (20)

Proyecto IVERES-UC3M
Proyecto IVERES-UC3MProyecto IVERES-UC3M
Proyecto IVERES-UC3M
 
RTVE: Sustainable Development Goal Radar
RTVE: Sustainable Development Goal  RadarRTVE: Sustainable Development Goal  Radar
RTVE: Sustainable Development Goal Radar
 
Engineering 4.0: Digitization through task automation and reuse
Engineering 4.0:  Digitization through task automation and reuseEngineering 4.0:  Digitization through task automation and reuse
Engineering 4.0: Digitization through task automation and reuse
 
LOTAR-PDES: Engineering digitalization through task automation and reuse in t...
LOTAR-PDES: Engineering digitalization through task automation and reuse in t...LOTAR-PDES: Engineering digitalization through task automation and reuse in t...
LOTAR-PDES: Engineering digitalization through task automation and reuse in t...
 
SESE 2021: Where Systems Engineering meets AI/ML
SESE 2021: Where Systems Engineering meets AI/MLSESE 2021: Where Systems Engineering meets AI/ML
SESE 2021: Where Systems Engineering meets AI/ML
 
Sailing the V: Engineering digitalization through task automation and reuse i...
Sailing the V: Engineering digitalization through task automation and reuse i...Sailing the V: Engineering digitalization through task automation and reuse i...
Sailing the V: Engineering digitalization through task automation and reuse i...
 
Deep Learning Notes
Deep Learning NotesDeep Learning Notes
Deep Learning Notes
 
H2020-AHTOOLS Use Case 3 Functional Design
H2020-AHTOOLS Use Case 3 Functional DesignH2020-AHTOOLS Use Case 3 Functional Design
H2020-AHTOOLS Use Case 3 Functional Design
 
AI4SE: Challenges and opportunities in the integration of Systems Engineering...
AI4SE: Challenges and opportunities in the integration of Systems Engineering...AI4SE: Challenges and opportunities in the integration of Systems Engineering...
AI4SE: Challenges and opportunities in the integration of Systems Engineering...
 
INCOSE IS 2019: AI and Systems Engineering
INCOSE IS 2019: AI and Systems EngineeringINCOSE IS 2019: AI and Systems Engineering
INCOSE IS 2019: AI and Systems Engineering
 
Challenges in the integration of Systems Engineering and the AI/ML model life...
Challenges in the integration of Systems Engineering and the AI/ML model life...Challenges in the integration of Systems Engineering and the AI/ML model life...
Challenges in the integration of Systems Engineering and the AI/ML model life...
 
Blockchain en la Industria Musical
Blockchain en la Industria MusicalBlockchain en la Industria Musical
Blockchain en la Industria Musical
 
OSLC KM: Elevating the meaning of data and operations within the toolchain
OSLC KM: Elevating the meaning of data and operations within the toolchainOSLC KM: Elevating the meaning of data and operations within the toolchain
OSLC KM: Elevating the meaning of data and operations within the toolchain
 
Blockchain y sector asegurador
Blockchain y sector aseguradorBlockchain y sector asegurador
Blockchain y sector asegurador
 
OSLC KM (Knowledge Management): elevating the meaning of data and operations ...
OSLC KM (Knowledge Management): elevating the meaning of data and operations ...OSLC KM (Knowledge Management): elevating the meaning of data and operations ...
OSLC KM (Knowledge Management): elevating the meaning of data and operations ...
 
Systems and Software Architecture: an introduction to architectural modelling
Systems and Software Architecture: an introduction to architectural modellingSystems and Software Architecture: an introduction to architectural modelling
Systems and Software Architecture: an introduction to architectural modelling
 
Detection of fraud in financial blockchain-based transactions through big dat...
Detection of fraud in financial blockchain-based transactions through big dat...Detection of fraud in financial blockchain-based transactions through big dat...
Detection of fraud in financial blockchain-based transactions through big dat...
 
News headline generation with sentiment and patterns: A case study of sports ...
News headline generation with sentiment and patterns: A case study of sports ...News headline generation with sentiment and patterns: A case study of sports ...
News headline generation with sentiment and patterns: A case study of sports ...
 
Blockchain y la industria musical
Blockchain y la industria musicalBlockchain y la industria musical
Blockchain y la industria musical
 
Preparing your Big Data start-up pitch
Preparing your Big Data start-up pitchPreparing your Big Data start-up pitch
Preparing your Big Data start-up pitch
 

Kürzlich hochgeladen

Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 

Kürzlich hochgeladen (20)

Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 

Representing and Computing Quantitative Indexes in RDF

  • 1. The RDFIndex approach | MTSR 2013. Leveraging semantics to represent and compute quantitative indexes: the RDFIndex approach. Michalis Vafopoulos (speaker), Jose María Álvarez-Rodríguez, Patricia Ordoñez De Pablos and Jose Emilio Labra-Gayo. MTSR 2013 | 7th Metadata and Semantics Research Conference, Main Track.
  • 2. (1) Introduction; (2) Related Work; (3) Main Contributions; (4) The RDFIndex approach; (5) Evaluation and Discussion; (6) Conclusions and Future Work; (7) Metadata and Information.
  • 3. Introduction. The motivating example. Let’s suppose that: (1) we want to create a “Health index”... (2) ...to know which is the most “healthy” country; (3) ...to know how the health expenditure performs; (4) ...to collect a set of indicators in just one value; (5) ...to make some new policy.
  • 4. Introduction. The motivating example. It is not a big deal because we have the “Health Index” by an international network of physicians and researchers (http://www.healthindex.com/), or the “Health Index” by the United Nations (http://hdrstats.undp.org/en/indicators/72206.html), or the “Ocean Health Index” by an international collaborative effort (http://www.oceanhealthindex.org/), or the indicators in the World Bank (http://data.worldbank.org/topic/health), ...
  • 5. Introduction. The motivating example. Benefits of using an index: (1) creation of valuable data and information; (2) generation of know-how to make some policy; (3) re-use of a great effort and commitment by experts in some area; (4) ranking of entities according to a quantitative value; (5) ...
  • 6. Introduction. The motivating example. Drawbacks of existing indexes: (1) Data heterogeneity: different data sources, formats and access protocols. (2) Structure: math models that aggregate indicators and can change over time. (3) Computation process: observations are gathered and processed, somehow, to generate a final value. (4) Documentation: multilingual and multicultural character of the information. (5) ... These drawbacks imply the need for: (1) accessing data and information under a common and shared data model; (2) representing the evolving structure of the index; (3) computing the index in a way that improves transparency; (4) providing context-aware documentation (user profile); (5) ... Exploiting valuable data and metadata.
  • 8. Introduction: Is it a common problem? Some indexes per domain: (1) Bibliography: the JCR, the JSR, etc. (2) Government: the GDP, etc. (3) Web: the Webindex, etc. (4) Health: the aforementioned ones. (5) Cloud: the CSC Cloud Usage Index, the VMWare index, the SMI index, etc. These are only a few examples per domain and creator.
  • 9. Introduction: A quantitative index... (1) It is a technique to collect a set of indicators in a single value. (2) It can be divided into index, component and indicator. (3) An index is calculated by aggregating n components using an OWA operator. (4) A component is likewise calculated by aggregating n indicators using an OWA operator. (5) Observations are aligned to an indicator and carry metadata such as location, measure, value, etc. Footnote: the ordered weighted averaging (OWA) operator is $\sum_{i=1}^{n} w_i a_i$, where $w_i$ is the weight of the aggregated element $a_i$.
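The aggregation in the footnote can be illustrated with a few lines of Python. This is only a sketch of the weighted sum exactly as the slide defines it (no reordering step is applied); the function name `weighted_aggregate` and the component values are hypothetical and chosen purely for illustration.

```python
def weighted_aggregate(values, weights):
    """Return sum(w_i * a_i), the aggregation given in the footnote."""
    if len(values) != len(weights):
        raise ValueError("each aggregated element needs exactly one weight")
    return sum(w * a for w, a in zip(weights, values))


# Hypothetical, already normalized component values c1 and c2.
component_values = [0.5, 1.0]
component_weights = [0.4, 0.6]
index_value = weighted_aggregate(component_values, component_weights)
```

The same function would serve for both levels of the hierarchy: indicators are aggregated into a component, and components into the index.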
  • 10. Introduction: [Figure: graphical view of a quantitative index.]
  • 11. Related Work: Statistics and the Web of Data. Vocabularies: (1) The Statistical Core Vocabulary (SCOVO) [5], a former standard to describe statistical information in the Web of Data (2009). (2) The RDF Data Cube Vocabulary [2], an adaptation of the ISO standard SDMX (Statistical Data and Metadata Exchange) (2013). (3) “DDI-RDF discovery vocabulary. A metadata vocabulary for documenting research and survey data” (2013) [1]. Preliminary evaluation: existing RDF-based vocabularies make it possible to model and represent statistical data.
  • 13. Related Work: Statistics and the Web of Data. Statistics and Linked Data: (1) “Defining and Executing Assessment Tests on Linked Data for Statistical Analysis” (2011) [8]. (2) “Publishing Statistical Data on the Web” (2012) [7]. (3) “Publishing open statistical data: the Spanish census” (2011) [4]. (4) “Publishing Statistical Data following the Linked Open Data Principles: The Web Index Project” (2012) [6]. (5) “Linked Open Data Statistics: Collection and Exploitation” (2013) [3]. (6) Some works in the “RDF Validation Workshop 2013” (http://www.w3.org/2012/12/rdf-val/). Preliminary evaluation: all of these approaches focus on data publishing and consumption, whereas the validation of statistical data and structure, and the computation process, are still open issues.
  • 15. Main Contributions. (1) Representation: a high-level model on top of the RDF Data Cube Vocabulary for representing quantitative indexes. (2) Computation: a Java-SPARQL based processor to exploit the meta-information, validate the index and compute the new index values.
  • 17. The RDFIndex approach: Example: building “The World Bank Naive Index”. Components: “Aid Effectiveness” ($c_1$) and “Health” ($c_2$). Indicators: “Life Expectancy” ($in_1$) and “Health expenditure, total (% of GDP)” ($in_2$). The index $i$ is calculated through the ordered weighted averaging (OWA) operator: $i = \sum_{i=1}^{n} w_i c_i$, where $w_i$ is the weight of the component $c_i$. All observations must be normalized using the z-score before computing intermediate and final values for each indicator, component and index. It is also necessary to compute the average life expectancy by country and year in order to create a new “derived” indicator of “Life Expectancy” without the “sex” dimension.
  • 18. The RDFIndex approach: Example of indicator observations from the World Bank.
      Description                            Year  Country  Value  Status
      Life Expectancy Male                   2010  Spain    79     Normal
      Life Expectancy Male                   2011  Spain    79     Normal
      Life Expectancy Female                 2010  Spain    85     Normal
      Life Expectancy Female                 2011  Spain    85     Normal
      Life Expectancy Male                   2010  Greece   78     Normal
      Life Expectancy Male                   2011  Greece   79     Normal
      Life Expectancy Female                 2010  Greece   83     Normal
      Life Expectancy Female                 2011  Greece   83     Normal
      Health expenditure, total (% of GDP)   2010  Spain    74.2   Normal
      Health expenditure, total (% of GDP)   2011  Spain    73.6   Normal
      Health expenditure, total (% of GDP)   2010  Spain    61.5   Normal
      Health expenditure, total (% of GDP)   2011  Spain    61.2   Normal
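The “Life Expectancy” indicator without the “sex” dimension can be obtained by averaging the male and female observations per country and year. The following Python sketch does this for the life expectancy rows of the table; the tuple layout and the function name `derive_without_sex` are assumptions made for illustration and do not reflect how the RDFIndex processor stores observations.

```python
from collections import defaultdict

# (sex, year, country, value) rows copied from the table of observations.
life_expectancy = [
    ("Male", 2010, "Spain", 79), ("Male", 2011, "Spain", 79),
    ("Female", 2010, "Spain", 85), ("Female", 2011, "Spain", 85),
    ("Male", 2010, "Greece", 78), ("Male", 2011, "Greece", 79),
    ("Female", 2010, "Greece", 83), ("Female", 2011, "Greece", 83),
]


def derive_without_sex(rows):
    """Average the values that share the same (country, year) pair."""
    groups = defaultdict(list)
    for _sex, year, country, value in rows:
        groups[(country, year)].append(value)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


derived = derive_without_sex(life_expectancy)
# derived[("Spain", 2010)] is 82.0 and derived[("Greece", 2010)] is 80.5
```

Dropping a dimension in this way is what lets the derived indicator share the same dimensions (country and year) as the other indicator before aggregation.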
  • 19. The RDFIndex approach: Representing and computing data with the RDFIndex (summary). Steps: (1) Define the structure and computation of the index with the RDFIndex vocabulary. (2) Run the Java SPARQL based processor, which will: (a) load the RDFIndex ontology to have access to common definitions; (b) create a kind of Abstract Syntax Tree (AST) containing the defined meta-data (our index); (c) validate the structure with an AST walker (structure and RDF Data Cube normalization); (d) create the SPARQL queries to compute the index (another AST walker); (e) load the observations and run the SPARQL queries to generate new observations.
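The idea of walking a tree of index, components and indicators, first to validate it and then to decide the order of computation, can be sketched in Python as follows. This is not the Java processor described above: the `Node` class, the rule that child weights must sum to 1, and the placement and weights of the two indicators under “Health” are all assumptions introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    weight: float = 1.0  # weight of this node inside its parent
    children: List["Node"] = field(default_factory=list)


def validate(node, errors=None):
    """Walker 1: collect structural errors (assumed rule: weights sum to 1)."""
    errors = [] if errors is None else errors
    if node.children:
        total = sum(child.weight for child in node.children)
        if abs(total - 1.0) > 1e-9:
            errors.append(f"{node.name}: child weights sum to {total}")
        for child in node.children:
            validate(child, errors)
    return errors


def computation_order(node):
    """Walker 2: post-order traversal, so parts are computed before wholes."""
    order = []
    for child in node.children:
        order.extend(computation_order(child))
    order.append(node.name)
    return order


# Hypothetical tree: indicator placement and the 0.5 weights are illustrative.
index = Node("TheWorldBankNaiveIndex", children=[
    Node("AidEffectiveness", 0.4),
    Node("Health", 0.6, children=[
        Node("LifeExpectancy", 0.5),
        Node("HealthExpenditure", 0.5),
    ]),
])
```

Calling `validate(index)` returns an empty list for this tree, and `computation_order(index)` lists the indicators before “Health” and the index last, which is the order in which aggregation queries would have to run.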
  • 20. The RDFIndex approach: [Figure: graphical view of the RDFIndex workflow.]
  • 21. The RDFIndex approach: Underlying definitions. Observation $o$: a tuple $\{v, m, s\}$, where $v$ is a numerical value for the measure $m$ with a status $s$, that belongs to only one dataset of observations $O$. Dataset $q$: a tuple $\{O, m, D, A, T\}$, where $O$ is a set of observations for only one measure $m$ that is described under a set of dimensions $D$ and a set of annotations $A$. Additionally, some attributes can be defined in the set $T$ for structure enrichment. Aggregated dataset $aq$: an aggregation of $n$ datasets $q_i$ (identified by the set $Q$) whose set of observations $O$ is derived by applying an OWA operator $p$ to the observations $O_{q_i}$.
  • 22. The RDFIndex approach: Scope and consequences. Necessary condition for the computation process: an aggregated dataset $aq$ defined by means of the set of dimensions $D_{aq}$ can be computed iff $\forall q_j \in Q : D_{aq} \subseteq D_{q_j}$. Furthermore, the OWA operator $p$ can only aggregate values belonging to the same measure $m$. Consequences: the set of dimensions $D$, annotations $A$ and attributes $T$ for a given dataset $q$ is always the same, so that all observations are described under the same context. An index $i$ and a component $c$ are aggregated datasets; this restriction is relaxed if observations can be directly mapped to these elements without any computation process. An indicator $in$ can be either a dataset or an aggregated dataset. All elements in the definitions must be uniquely identified. An aggregated dataset is also a dataset.
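The necessary condition on dimensions is a plain subset test, which the following Python sketch makes concrete. The function name `can_aggregate` and the dimension labels (in particular `sex`) are hypothetical; only the condition itself comes from the slide.

```python
def can_aggregate(aggregated_dims, dims_by_dataset):
    """True iff every dataset q_j in Q satisfies D_aq <= D_qj."""
    target = set(aggregated_dims)
    return all(target <= set(dims) for dims in dims_by_dataset.values())


# Hypothetical dimension sets for the two indicators of the running example.
parts = {
    "LifeExpectancy": {"ref-area", "ref-year", "sex"},
    "HealthExpenditure": {"ref-area", "ref-year"},
}

ok = can_aggregate({"ref-area", "ref-year"}, parts)               # True
not_ok = can_aggregate({"ref-area", "ref-year", "sex"}, parts)    # False
```

The second call fails because “HealthExpenditure” has no `sex` dimension, which is why the running example first derives a “Life Expectancy” indicator without it.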
  • 23. The RDFIndex approach: Mapping the RDFIndex to the RDF Data Cube Vocabulary.
      Concept                         Vocabulary element                          Comments
      Observation o                   qb:Observation                              Enrichment through annotations
      Numerical value v               xsd:double                                  Restriction to numerical values
      Measure m                       qb:MeasureProperty, sdmx-measure:obsValue   Restriction to one measure
      Status s                        sdmx-concept:obsStatus                      Defined by the SDMX-RDF vocabulary
      Dataset q                       qb:DataSet and qb:DataStructureDefinition   Metadata of the qb:DataSet
      Dimension d_i ∈ D               qb:DimensionProperty                        Context of observations
      Annotation a_i ∈ A              owl:AnnotationProperty                      Intensive use of Dublin Core
      Attribute at_i ∈ T              qb:AttributeProperty                        Link to existing datasets such as DBPedia
      OWA operator p                  SPARQL 1.1 aggregation operators            Other extensions depend on the RDF repository
      Index, Component and Indicator  skos:Concept                                SKOS taxonomy (logical structure)
  • 24. The RDFIndex approach: Example of the “World Bank Naive Index” structure in RDF.
      @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
      @prefix qb: <http://purl.org/linked-data/cube#> .
      @prefix sdmx-attribute: <http://purl.org/linked-data/sdmx/2009/attribute#> .
      @prefix rdfindex: <http://purl.org/rdfindex/ontology/> .
      @prefix rdfindex-wb: <http://purl.org/rdfindex/wb/resource/> .
      @prefix rdfindex-wbont: <http://purl.org/rdfindex/wb/ontology/> .

      rdfindex-wb:TheWorldBankNaiveIndex a rdfindex:Index ;
          rdfs:label "The World Bank Naive Index"@en ;
          rdfindex:type rdfindex:Quantitative ;
          rdfindex:aggregates [
              rdfindex:aggregation-operator rdfindex:OWA ;
              rdfindex:part-of [ rdfindex:dataset rdfindex-wb:AidEffectiveness ; rdfindex:weight 0.4 ] ;
              rdfindex:part-of [ rdfindex:dataset rdfindex-wb:Health ; rdfindex:weight 0.6 ] ;
          ] ;
          # More meta-data properties...
          qb:structure rdfindex-wb:TheWorldBankNaiveIndexDSD ;
          .

      rdfindex-wb:TheWorldBankNaiveIndexDSD a qb:DataStructureDefinition ;
          qb:component [ qb:dimension rdfindex-wbont:ref-area ] ,
                       [ qb:dimension rdfindex-wbont:ref-year ] ,
                       [ qb:measure rdfindex:value ] ,
                       [ qb:attribute sdmx-attribute:unitMeasure ] ;
          # More meta-data properties...
          .
  • 25. The RDFIndex approach: SPARQL template for building aggregated observations.
      SELECT (d_i ∈ D) [ (sum(?w*?measure) as ?newvalue) | OWA(?measure) ]
      WHERE {
          q rdfindex:aggregates ?parts .
          ?parts rdfindex:part-of ?partof .
          ?partof rdfindex:dataset q_i .
          FILTER (?partof ∈ Q) .
          ?observation rdf:type qb:Observation .
          ?part rdfindex:weight ?defaultw .
          OPTIONAL { ?partof rdfindex:weight ?aggregationw . } .
          BIND (if(BOUND(?aggregationw), ?aggregationw, ?defaultw) AS ?w)
          ?observation m ?measure .
          ?observation ?dim ?dimRef .
          FILTER (?dim ∈ D) .
      } GROUP BY (d_i ∈ D)
  • 26. The RDFIndex approach: Example of generated SPARQL query.
      prefix qb: <http://purl.org/linked-data/cube#>
      prefix rdfindex: <http://purl.org/rdfindex/ontology/>
      prefix rdfindex-wb: <http://purl.org/rdfindex/wb/resource/>
      prefix rdfindex-wbont: <http://purl.org/rdfindex/wb/ontology/>
      SELECT ?dim0 ?dim1 (sum(?w*?measure) as ?newvalue)
      WHERE {
          rdfindex-wb:TheWorldBankNaiveIndex rdfindex:aggregates ?parts .
          ?parts rdfindex:part-of ?partof .
          ?partof rdfindex:dataset ?part .
          FILTER ((?part = rdfindex-wb:AidEffectiveness) || (?part = rdfindex-wb:Health)) .
          ?observation qb:dataSet ?part .
          ?part rdfindex:weight ?defaultw .
          OPTIONAL { ?partof rdfindex:weight ?aggregationw . } .
          BIND (if(BOUND(?aggregationw), ?aggregationw, ?defaultw) AS ?w)
          ?observation rdfindex:value ?measure .
          ?observation rdfindex-wbont:ref-area ?dim0 .
          ?observation rdfindex-wbont:ref-year ?dim1 .
      } GROUP BY ?dim0 ?dim1
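How such a query might be produced from the template can be sketched with simple string assembly in Python. This is an illustration under assumptions, not the generator used by the Java processor: the function name `build_aggregation_query` and its parameters are hypothetical, and prefix declarations are left out for brevity.

```python
def build_aggregation_query(dataset, parts, dimensions, measure="rdfindex:value"):
    """Instantiate the aggregation template for one aggregated dataset."""
    dim_vars = [f"?dim{i}" for i in range(len(dimensions))]
    part_filter = " || ".join(f"(?part = {part})" for part in parts)
    lines = [
        f"SELECT {' '.join(dim_vars)} (SUM(?w * ?measure) AS ?newvalue)",
        "WHERE {",
        f"  {dataset} rdfindex:aggregates ?parts .",
        "  ?parts rdfindex:part-of ?partof .",
        "  ?partof rdfindex:dataset ?part .",
        f"  FILTER ({part_filter}) .",
        "  ?observation qb:dataSet ?part .",
        "  ?part rdfindex:weight ?defaultw .",
        "  OPTIONAL { ?partof rdfindex:weight ?aggregationw . }",
        "  BIND (IF(BOUND(?aggregationw), ?aggregationw, ?defaultw) AS ?w)",
        f"  ?observation {measure} ?measure .",
    ]
    # One triple pattern and one grouping variable per dimension in D.
    lines += [f"  ?observation {dim} {var} ." for dim, var in zip(dimensions, dim_vars)]
    lines += ["}", f"GROUP BY {' '.join(dim_vars)}"]
    return "\n".join(lines)


query = build_aggregation_query(
    "rdfindex-wb:TheWorldBankNaiveIndex",
    ["rdfindex-wb:AidEffectiveness", "rdfindex-wb:Health"],
    ["rdfindex-wbont:ref-area", "rdfindex-wbont:ref-year"],
)
```

Each dimension of the aggregated dataset becomes both a projected variable and a `GROUP BY` key, which is how the template's `d_i ∈ D` placeholder is expanded.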
  • 27. The RDFIndex approach: Partial example of a populated observation for “The World Bank Naive Index”.
      rdfindex-wb:o680_810085_1579 a qb:Observation ;
          qb:dataSet rdfindex-wb:TheWorldBankNaiveIndex ;
          rdfindex-wbont:ref-area dbpedia-res:Spain ;
          rdfindex-wbont:ref-year <http://reference.data.gov.uk/id/gregorian-interval/2010-01-01T00:00:00/P1Y> ;
          # ...
          # rdfs:{label, comment} {literal} ;
          # dcterms:{issued, date, contributor, author, publisher, identifier} {resource, literal} ;
          rdfindex:md5-hash "6cdda76088cd161d766809d6a78d35f6" ;
          sdmx-concept:obsStatus sdmx-code:obsStatus-E ;
          rdfindex:value "0.707"^^xsd:double .
  • 28. The RDFIndex approach: Evolution: joint effort between the RDFIndex and Computex. Computex: (1) is a vocabulary for statistical computations; (2) extends RDF Data Cube; (3) ontology at http://purl.org/weso/computex; (4) validates data and structure using SPARQL queries; (5) some terms: cex:Concept, cex:Indicator, cex:Computation, cex:WeightSchema, qb:Observation; (6) learn more at http://computex.herokuapp.com/. Preliminary evaluation: the two approaches are similar and were merged between July and September 2013.
  • 29. The RDFIndex approach: Demo of the RDFIndex/Computex: http://computex.herokuapp.com/
  • 30. The RDFIndex approach: On-going working examples. The Cloud Index: compilation of Key Performance Indicators for cloud services; creation of a Cloud Index to measure the Quality of Service of cloud providers; funded by the RELATE-ITN (FP7-PEOPLE-2010-ITN, 264840) project and developed in the subproject “Quality Management in Service-based Systems and Cloud Applications”; demo: http://cloudindex-doc.herokuapp.com/. The Web Index 2012 and 2013: compilation of 85 indicators from 61 countries (100 in 2013); creation of the Web Index to measure the impact of the Web around the world; funded by The Webfoundation; demo: http://thewebindex.org
  • 32. Evaluation and Discussion: Advantages. Data sources: data management applying the Linked Data principles. Structure: automatic validation and verification of the structure and metadata of quantitative indexes. Computation process: a native approach using SPARQL queries that can help to boost transparency. Documentation: implicit multilingual and multicultural support.
  • 36. Evaluation and Discussion: Detailed advantages (I).
      Data sources: common and shared data model (RDF); description of data providers (provenance and trust); a formal query language to query data (SPARQL); use of Internet protocols (HTTP); data enrichment and validation (domain and range); unique identification of entities, concepts, etc. through (HTTP) URIs; possibility of publishing new data under the aforementioned characteristics; standardization and integration of data sources.
      Structure: meta-description of the index structure (validation); re-use of existing semantic web vocabularies; re-use of existing datasets to enrich meta-data; context-aware definitions; underlying logic formalism; orthogonal and flexible.
  • 37. Evaluation and Discussion: Detailed advantages (II).
      Computation process: meta-description of dataset aggregation; validation of composed datasets; support for OWA operators; direct translation to SPARQL queries.
      Documentation: multilingual support to describe datasets, etc.; easy generation with existing tools.
      Cross-domain features: separation of concerns and responsibilities between data and meta-data (structure and computation); standardization (putting into action specifications from bodies such as the W3C); declarative and adaptive approach; no vendor lock-in (format, access and computation); integration, interoperability and transparency; alignment with existing trends (data management: quality and filtering); easy integration with third-party services such as visualization; contribution to the Web of Data.
  • 38. Evaluation and Discussion: Restrictions. SPARQL 1.1 support: (1) Some built-in functions are not standardized. Example: the z-score employs the standard deviation, which requires the built-in function sqrt (only available in some SPARQL implementations). (2) Ranking of values is not obvious: it can be done with GROUP_CONCAT or by checking a value against all the other values, but neither solution is efficient. Limitations of the RDFIndex expressivity: the working examples show its applicability, but some missing terms and relationships should be expected.
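To see why “checking a value against all the other values” is costly, the following Python sketch contrasts it with a sort-based ranking performed outside SPARQL. Both functions and the sample values are hypothetical illustrations; ties share the best rank (competition ranking), which is an assumption and not something the slides specify.

```python
def rank_by_comparison(values):
    """Rank = 1 + number of strictly greater values; quadratic in len(values)."""
    return {
        key: 1 + sum(1 for other in values.values() if other > value)
        for key, value in values.items()
    }


def rank_by_sorting(values):
    """Same ranking after a single sort; ties share the first position."""
    ordered = sorted(values.items(), key=lambda item: item[1], reverse=True)
    ranks = {}
    for position, (key, value) in enumerate(ordered, start=1):
        previous_key, previous_value = ordered[position - 2] if position > 1 else (None, None)
        ranks[key] = ranks[previous_key] if value == previous_value else position
    return ranks


# Hypothetical index values for four entities.
scores = {"A": 0.9, "B": 0.5, "C": 0.5, "D": 0.1}
```

Both functions return the same ranking for `scores`; the first mirrors the pairwise comparison a pure SPARQL solution performs, while the second shows what becomes possible once the computation is moved to a general-purpose language.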
  • 40. Conclusions and Future Work: Conclusions. (1) The use of quantitative indexes is a common practice to compile and rank indicators for decision making. (2) Existing indexes are completely opaque (PDF, HTML, etc.). (3) It is necessary to improve access to data and metadata and to boost transparency. (4) The RDFIndex/Computex vocabulary seeks to provide the appropriate building blocks to represent and compute indexes. (5) However, SPARQL limitations prevent the full development of a “pedantic” Linked Data approach, and the vocabulary likely does not cover all the elements required to represent any index.
  • 41. Conclusions and Future Work: Future Work. (1) Extend the RDFIndex/Computex vocabulary. (2) Externalize the computation with a hybrid R+SPARQL approach. (3) Parallelize the computation process.
  • 42. The RDFIndex approach | MTSR 2013 End of the presentation... 42 / 43
  • 43. Metadata and Information: Roster.
      Dr. Jose María Alvarez-Rodríguez, SEERC (until August 2013) and Carlos III University of Madrid, Spain. E-mail: josemaria.alvarez@uc3m.es. WWW: http://www.josemalvarez.es
      Prof. Dr. Patricia Ordoñez De Pablos, University of Oviedo, Spain. E-mail: patriop@uniovi.es
      Prof. Dr. Jose Emilio Labra-Gayo, University of Oviedo, Spain. E-mail: labra@uniovi.es. WWW: http://www.di.uniovi.es/~labra
  • 44. Some SPARQL queries: Z-score normalization using SPARQL.
      prefix afn: <http://jena.hpl.hp.com/ARQ/function#>
      SELECT ((?measure - ?mean)/?stddev as ?zscore)
      WHERE {
          ...
          ?observation rdfindex:value ?measure
          {
              SELECT ?mean (afn:sqrt((SUM((?measure - ?mean)*(?measure - ?mean))/?count)) as ?stddev)
              WHERE {
                  ?observation rdfindex:value ?measure
                  {
                      SELECT (COUNT(?measure) as ?count) (AVG(?measure) as ?mean)
                      WHERE { ?observation rdfindex:value ?measure }
                      GROUP BY ?count ?mean LIMIT 1
                  }
              }
              GROUP BY ?mean ?count LIMIT 1
          }
      }
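For comparison, the same normalization is only a few lines in a general-purpose language. The Python sketch below follows the arithmetic of the query above (population standard deviation, i.e. the sum of squared deviations divided by the count); the function name `z_scores` and the sample values are hypothetical.

```python
import math


def z_scores(values):
    """Return (v - mean) / stddev for each value, using the population stddev."""
    count = len(values)
    if count == 0:
        raise ValueError("at least one observation is required")
    mean = sum(values) / count
    stddev = math.sqrt(sum((v - mean) ** 2 for v in values) / count)
    if stddev == 0:
        raise ValueError("z-score is undefined when all values are equal")
    return [(v - mean) / stddev for v in values]


# Hypothetical measures: mean is 5.0 and the population stddev is 2.0.
normalized = z_scores([2, 4, 4, 4, 5, 5, 7, 9])
# normalized == [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

The square root that needs a vendor-specific function in SPARQL is a standard library call here, which is one motivation for externalizing part of the computation.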
  • 45. References
      [1] T. Bosch, R. Cyganiak, A. Gregory, and J. Wackerow. DDI-RDF discovery vocabulary. A metadata vocabulary for documenting research and survey data. In 6th Workshop on Linked Data on the Web (LDOW2013), 2013.
      [2] R. Cyganiak and D. Reynolds. The RDF Data Cube Vocabulary. Working Draft, W3C, 2013. http://www.w3.org/TR/vocab-data-cube/.
      [3] I. Ermilov, M. Martin, J. Lehmann, and S. Auer. Linked Open Data Statistics: Collection and Exploitation. In KESW, pages 242–249, 2013.
      [4] J. D. Fernández, M. A. Martínez-Prieto, and C. Gutiérrez. Publishing open statistical data: the Spanish census. In DG.O, pages 20–25, 2011.
      [5] M. Hausenblas, W. Halb, Y. Raimond, L. Feigenbaum, and D. Ayers. SCOVO: Using Statistics on the Web of Data. In L. A. et al., editor, The Semantic Web: Research and Applications, volume 5554 of Lecture Notes in Computer Science, pages 708–722. Springer Berlin Heidelberg, 2009.
      [6] J. M. A. Rodríguez, J. Clement, J. E. L. Gayo, H. Farhan, and P. Ordoñez De Pablos. Publishing Statistical Data following the Linked Open Data Principles: The Web Index Project, pages 199–226. IGI Global, 2013.
      [7] P. E. R. Salas, F. M. D. Mota, K. K. Breitman, M. A. Casanova, M. Martin, and S. Auer. Publishing Statistical Data on the Web. Int. J. Semantic Computing, 6(4):373–388, 2012.
      [8] B. Zapilko and B. Mathiak. Defining and Executing Assessment Tests on Linked Data for Statistical Analysis. In COLD, 2011.