The document describes an approach called RDFIndex for representing and computing quantitative indexes using semantic web technologies. The main contributions are a high-level model built on top of the RDF Data Cube Vocabulary for representing indexes, and a Java-SPARQL based processor to exploit metadata, validate indexes, and compute new index values. An example index called the "World Bank Naive Index" is used to illustrate how the RDFIndex approach can represent the structure of an index and its components/indicators in RDF, and compute the index values.
Representing and Computing Quantitative Indexes in RDF
1. The RDFIndex approach | MTSR 2013
Leveraging semantics to represent and compute
quantitative indexes.
The RDFIndex approach.
Michalis Vafopoulos (speaker)
and
{Jose María Álvarez-Rodríguez, Patricia Ordoñez De Pablos and
Jose Emilio Labra-Gayo}
MTSR 2013 | 7th Metadata and Semantics Research Conference
Main Track
1 / 43
2. The RDFIndex approach | MTSR 2013
1
Introduction
2
Related Work
3
Main Contributions
4
The RDFIndex approach
5
Evaluation and Discussion
6
Conclusions and Future Work
7
Metadata and Information
2 / 43
3. The RDFIndex approach | MTSR 2013
Introduction
The Motivating example...
Let’s suppose that...
1
We want to create a “Health
index”...
2
...to know which is the most
“healthy” country.
3
...to know which is the
performance of the health
expenditure .
4
...to collect in just one value a
set of indicators.
5
...to make some new policy.
3 / 43
4. The RDFIndex approach | MTSR 2013
Introduction
The Motivating example...
It is not a big deal because...
We have the “Health Index” by an international network of physicians
and researchers (http://www.healthindex.com/).
...or the “Health Index” by the United Nations
(http://hdrstats.undp.org/en/indicators/72206.html).
...or the “Ocean Health Index” by an international collaborative
effort (http://www.oceanhealthindex.org/).
...or the indicators in the World Bank
(http://data.worldbank.org/topic/health).
...
4 / 43
5. The RDFIndex approach | MTSR 2013
Introduction
The Motivating example...
Benefits of using an index...
1
Creation of valuable data and information.
2
Generation of know-how to make some policy.
3
Re-use of a great effort and commitment by experts in some area.
4
Rank entities according to a quantitative value.
5
...
5 / 43
6. The RDFIndex approach | MTSR 2013
Introduction
The Motivating example...
Drawbacks of existing indexes...
1
Data heterogeneity: different datasources, formats and access protocols.
2
Structure: math models to aggregate some indicators that can change over time.
3
Computation process: observations are gathered and processed, somehow, generating a
final value.
4
Documentation: multilingual and multicultural character of information.
5
...
..that imply the necessity of...
1
Accessing data and information under a common and shared data model.
2
Representing the evolving structure of the index.
3
Computing the index to improve transparency.
4
Providing context-aware documentation: user-profile.
5
... Exploiting valuable data and metadata.
6 / 43
7. The RDFIndex approach | MTSR 2013
Introduction
The Motivating example...
Drawbacks of existing indexes...
1
Data heterogeneity: different datasources, formats and access protocols.
2
Structure: math models to aggregate some indicators that can change over time.
3
Computation process: observations are gathered and processed, somehow, generating a
final value.
4
Documentation: multilingual and multicultural character of information.
5
...
..that imply the necessity of...
1
Accessing data and information under a common and shared data model.
2
Representing the evolving structure of the index.
3
Computing the index to improve transparency.
4
Providing context-aware documentation: user-profile.
5
... Exploiting valuable data and metadata.
7 / 43
8. The RDFIndex approach | MTSR 2013
Introduction
...but...Is it a common problem?
...some indexes (per domain)...
1
Bibliography: the JCR and JSR,
etc.
2
Government: the GDP, etc.
3
Web: the Webindex, etc.
4
Health: (the aforementioned
ones).
5
Cloud: the CSC Cloud Usage
Index, the VMWare index, the
SMI index, etc.
6
...to name a few per domain and
creators.
8 / 43
9. The RDFIndex approach | MTSR 2013
Introduction
A quantitative index is...
1
It is technique to collect in just one value a set of indicators.
2
It can be divided into: index, component and indicator.
3
An index is calculated by aggregating n components using an OWA
operator 1 .
4
A component is also calculated by aggregating n indicators using an
OWA operator.
5
Observations are aligned to an indicator including some metadata such
as: location, measure, value, etc.
1
Ordered weighted averaging operator:
aggregated element ai .
n
i =1
wi ai , where wi is the weight of the
9 / 43
10. The RDFIndex approach | MTSR 2013
Introduction
Graphical view of a quantitative index...
10 / 43
11. The RDFIndex approach | MTSR 2013
Related Work
Statistics and the Web of Data
Vocabularies
1
The Statistical Core Vocabulary [5] (SCOVO), a former standard to
describe statistical information in the Web of Data (2009).
2
The RDF Data Cube Vocabulary [2], an adaptation of the ISO
standard SDMX (Statistical Data and Metadata Exchange
Vocabulary) (2013).
3
“DDI-RDF discovery vocabulary. a metadata vocabulary for
documenting research and survey data.” (2013) [1] .
Preliminary evaluation...
Existing RDF-based vocabularies enable us the possibility of modelling and
representing statistical data.
11 / 43
12. The RDFIndex approach | MTSR 2013
Related Work
Statistics and the Web of Data
Vocabularies
1
The Statistical Core Vocabulary [5] (SCOVO), a former standard to
describe statistical information in the Web of Data (2009).
2
The RDF Data Cube Vocabulary [2], an adaptation of the ISO
standard SDMX (Statistical Data and Metadata Exchange
Vocabulary) (2013).
3
“DDI-RDF discovery vocabulary. a metadata vocabulary for
documenting research and survey data.” (2013) [1] .
Preliminary evaluation...
Existing RDF-based vocabularies enable us the possibility of modelling and
representing statistical data.
12 / 43
13. The RDFIndex approach | MTSR 2013
Related Work
Statistics and the Web of Data
Statistics and Linked Data
1
“Defining and Executing Assessment Tests on Linked Data for Statistical Analysis”
(2011) [8].
2
“Publishing Statistical Data on the Web” (2012) [7].
3
“Publishing open statistical data: the Spanish census” (2011) [4].
4
“Publishing Statistical Data following the Linked Open Data Principles: The Web Index
Project” (2012) [6].
5
“Linked Open Data Statistics: Collection and Exploitation” (2013) [3].
6
Some works in the “RDF Validation Workshop 2013”
(http://www.w3.org/2012/12/rdf-val/).
Preliminary evaluation...
All of the approaches are/were focused on data publishing/consumption...but...
Validation of statistical data and/or structure...and
the Computation process are still open issues.
13 / 43
14. The RDFIndex approach | MTSR 2013
Related Work
Statistics and the Web of Data
Statistics and Linked Data
1
“Defining and Executing Assessment Tests on Linked Data for Statistical Analysis”
(2011) [8].
2
“Publishing Statistical Data on the Web” (2012) [7].
3
“Publishing open statistical data: the Spanish census” (2011) [4].
4
“Publishing Statistical Data following the Linked Open Data Principles: The Web Index
Project” (2012) [6].
5
“Linked Open Data Statistics: Collection and Exploitation” (2013) [3].
6
Some works in the “RDF Validation Workshop 2013”
(http://www.w3.org/2012/12/rdf-val/).
Preliminary evaluation...
All of the approaches are/were focused on data publishing/consumption...but...
Validation of statistical data and/or structure...and
the Computation process are still open issues.
14 / 43
15. The RDFIndex approach | MTSR 2013
Main Contributions
Main Contributions
1-Representation
A high-level model on top of the RDF Data Cube Vocabulary for
representing quantitative indexes.
2-Computation
A Java-SPARQL based processor to exploit the meta-information,
validate and compute the new index values.
15 / 43
16. The RDFIndex approach | MTSR 2013
Main Contributions
Main Contributions
1-Representation
A high-level model on top of the RDF Data Cube Vocabulary for
representing quantitative indexes.
2-Computation
A Java-SPARQL based processor to exploit the meta-information,
validate and compute the new index values.
16 / 43
17. The RDFIndex approach | MTSR 2013
The RDFIndex approach
Example: Building the “The World Bank Naive Index”.
Components: “Aid Efectiveness” (c1 ) and “Health” (c2 ).
Indicators: “Life Expectancy” (in1 ) and “Health expenditure, total
( %) of GDP” (in2 ).
The index, i, is calculated through the ordered weighted averaging
(OWA) operator: n wi ci , where wi is the weight of the component
i=1
ci
All observations must be normalized using the z-score before
computing intermediate and final values for each indicator, component
and index.
It is necessary to populate the average age by country and year to
create a new “derivate” indicator of “Life Expectancy” without the
“sex” dimension.
17 / 43
18. The RDFIndex approach | MTSR 2013
The RDFIndex approach
Example of indicator observations from the WorldBank.
Description
Year
Country
Value
Status
Life Expectancy Male
2010
Spain
79
Normal
Life Expectancy Male
2011
Spain
79
Normal
Life Expectancy Female
2010
Spain
85
Normal
Life Expectancy Female
2011
Spain
85
Normal
Life Expectancy Male
2010
Greece
78
Normal
Life Expectancy Male
2011
Greece
79
Normal
Life Expectancy Female
2010
Greece
83
Normal
Life Expectancy Female
2011
Greece
83
Normal
Health expenditure, total ( % of GDP)
2010
Spain
74,2
Normal
Health expenditure, total ( % of GDP)
2011
Spain
73,6
Normal
Health expenditure, total ( % of GDP)
2010
Spain
61,5
Normal
Health expenditure, total ( % of GDP)
2011
Spain
61,2
Normal
18 / 43
19. The RDFIndex approach | MTSR 2013
The RDFIndex approach
Representing and computing data with the RDFIndex
(summary)
Steps
1
2
Define the structure and computation of the index with the RDFIndex
vocabulary.
Run the Java SPARQL based processor:
Load the RDFIndex ontology to have access to common definitions.
Create a kind of Abstract Syntax Tree (AST) containing the defined
meta-data (our index).
Validate the structure with an AST walker (structure and RDF Data
Cube normalization).
Create the SPARQL queries to compute the index (other AST walker).
Load the observations and run the SPARQL queries to generate new
observations.
19 / 43
20. The RDFIndex approach | MTSR 2013
The RDFIndex approach
Graphical view of the RDFIndex workflow...
20 / 43
21. The RDFIndex approach | MTSR 2013
The RDFIndex approach
Underlying definitions
Observation-o
It is a tuple {v , m, s}, where v is a numerical value for the measure m with an status s that
belongs to only one dataset of observations O.
Dataset-q
It is a tuple {O, m, D, A, T } where O is a set of observations for only one measure m that is
described under a set of dimensions D and a set of annotations A. Additionally, some attributes
can be defined in the set T for structure enrichment.
Aggregated dataset-aq
It is an aggregation of n datasets qi (identified by the set Q) which set of observations O is
derivate by applying an OWA operator p to the observations Oqi .
21 / 43
22. The RDFIndex approach | MTSR 2013
The RDFIndex approach
Scope and Consequences
Necessary condition for the computation process
An aggregated dataset aq defined by means of the set of dimensions Daq can be computed iif
∀qj ∈ Q : Daq ⊆ Dqj . Furthermore the OWA operator p can only aggregate values belonging to
the same measure m.
The set of dimensions D, annotations A and attributes T for a given dataset Q is always
the same with the aim of describing all observations under the same context.
An index i and a component c are aggregated datasets. Nevertheless this restriction is
relaxed if observations can be directly mapped to these elements without any computation
process.
An indicator in can be both dataset or aggregated dataset.
All elements in definitions must be uniquely identified.
An aggregated dataset is also a dataset.
22 / 43
23. The RDFIndex approach | MTSR 2013
The RDFIndex approach
Mapping the RDFIndex to the RDF Data Cube Vocabulary
Concept
Vocabulary element
Comments
Observation o
qb:Observation
Enrichment through annotations
Numerical value v
xsd:double
Restriction to numerical values
Measure m
qb:MeasureProperty
sdmx-measure:obsValue
Restriction to one measure
Status s
sdmx-concept:obsStatus
Defined by the SDMX-RDF vocabulary
Dataset q
qb:dataSet
and
qb:qb:DataStructureDefinition
Metadata of the qb:dataSet
Dimension di ∈ D
qb:DimensionProperty
Context of observations
Annotation ai ∈ A
owl:AnnotationProperty
Intensive use of Dublin Core
Attribute ati ∈ T
qb:AttributeProperty
Link to existing datasets such as
DBPedia
OWA operator p
SPARQL 1.1 aggregation operators
Other extensions depend on the
RDF repository
Index, Component
and Indicator
skos:Concept
SKOS taxonomy (logical structure)
23 / 43
24. The RDFIndex approach | MTSR 2013
The RDFIndex approach
Example of the “World Bank Naive index” structure in RDF.
§
@prefix
@prefix
@prefix
¤
r d f i n d e x : < h t t p : // p u r l . o r g / r d f i n d e x / o n t o l o g y /> .
r d f i n d e x −wb: < h t t p : // p u r l . o r g / r d f i n d e x /wb/ r e s o u r c e /> .
r d f i n d e x −w b o n t : < h t t p : // p u r l . o r g / r d f i n d e x /wb/ o n t o l o g y /> .
r d f i n d e x −w b : T h e W o r l d B a n k N a i v e I n d e x
a rdfindex:Index ;
r d f s : l a b e l " The ␣ World ␣ Bank ␣ N a i v e ␣ I n d e x " @en ;
rdfindex:type rdfindex:Quantitative ;
rdfindex:aggregates [
r d f i n d e x : a g g r e g a t i o n −o p e r a t o r r d f i n d e x : O W A ;
r d f i n d e x : p a r t −o f [
r d f i n d e x : d a t a s e t r d f i n d e x −w b : A i d E f f e c t i v e n e s s ;
rdfindex:weight 0.4];
r d f i n d e x : p a r t −o f [ r d f i n d e x : d a t a s e t r d f i n d e x −w b : H e a l t h ;
rdfindex:weight 0.6];
];
#More meta−d a t a p r o p e r t i e s . . .
qb:structure
r d f i n d e x −wb:TheWorldBankNaiveIndexDSD ; .
r d f i n d e x −wb:TheWorldBankNaiveIndexDSD
a qb:DataStructureDefinition ;
qb:component
[ q b : d i m e n s i o n r d f i n d e x −w b o n t : r e f −a r e a ] ,
[ q b : d i m e n s i o n r d f i n d e x −w b o n t : r e f −y e a r ] ,
[ qb:measure
rdfindex:value ] ,
[ q b : a t t r i b u t e sdmx−a t t r i b u t e : u n i t M e a s u r e ] ;
#More meta−d a t a p r o p e r t i e s . . .
.
¦
¥
24 / 43
25. The RDFIndex approach | MTSR 2013
The RDFIndex approach
SPARQL template for building aggregated observations.
§
¤
SELECT ( di ∈ D ) [ ( sum ( ?w∗? m e a s u r e ) a s ? n e w v a l u e ) | OWA( ? m e a s u r e ) ]
WHERE{
q rdfindex : aggregates ? parts .
? p a r t s r d f i n d e x : p a r t −o f ? p a r t o f .
? p a r t o f r d f i n d e x : d a t a s e t qi .
FILTER ( ?partof ∈ Q ) .
? o b s e r v a t i o n r d f : t y p e qb : O b s e r v a t i o n .
? part r d f i n d e x : weight ? defaultw .
OPTIONAL {? p a r t o f r d f i n d e x : w e i g h t ? a g g r e g a t i o n w . } .
BIND ( i f ( BOUND( ? a g g r e g a t i o n w ) , ? a g g r e g a t i o n w , ? d e f a u l t w ) AS ?w)
? o b s e r v a t i o n m ? measure .
? o b s e r v a t i o n ? dim ? dimR ef .
FILTER ( ?dim ∈ D ) .
}GROUP BY ( di ∈ D )
¦
¥
25 / 43
26. The RDFIndex approach | MTSR 2013
The RDFIndex approach
Example of generated SPARQL query.
§
¤
p r e f i x r d f i n d e x : h t t p : / / p u r l . o r g / r d f i n d e x / o n t o l o g y /
SELECT ? dim0 ? dim1 ( sum ( ?w∗? m e a s u r e ) a s ? n e w v a l u e )
WHERE{
r d f i n d e x −wb : T h e W o r l d B a n k N a i v e I n d e x
rdfindex : aggregates ? parts .
? p a r t s r d f i n d e x : p a r t −o f ? p a r t o f .
? partof rdfindex : dataset ? part .
FILTER ( ( ? p a r t =r d f i n d e x −wb : A i d E f f e c t i v e n e s s ) | |
( ? p a r t =r d f i n d e x −wb : H e a l t h ) ) .
? o b s e r v a t i o n qb : d a t a S e t ? p a r t .
? part r d f i n d e x : weight ? defaultw .
OPTIONAL {? p a r t o f r d f i n d e x : w e i g h t ? a g g r e g a t i o n w . } .
BIND ( i f ( BOUND( ? a g g r e g a t i o n w ) , ? a g g r e g a t i o n w , ? d e f a u l t w ) AS ?w)
? o b s e r v a t i o n r d f i n d e x : v a l u e ? measure .
? o b s e r v a t i o n r d f i n d e x −wbont : r e f −a r e a ? dim0 .
? o b s e r v a t i o n r d f i n d e x −wbont : r e f −y e a r ? dim1 .
} GROUP BY ? dim0 ? dim1
¦
¥
26 / 43
27. The RDFIndex approach | MTSR 2013
The RDFIndex approach
Partial example of a populated observation for “The World
Bank Naive Index”.
§
r d f i n d e x −wb : o680 810085 1579
a
qb : o b s e r v a t i o n ;
qb : d a t a S e t r d f i n d e x −wb : T h e W o r l d B a n k N a i v e I n d e x
;
r d f i n d e x −wbont : r e f −a r e a d b p e d i a −r e s : S p a i n ;
r d f i n d e x −wbont : r e f −y e a r
h t t p : / / r e f e r e n c e . d a t a . gov . uk / i d /
g r e g o r i a n − i n t e r v a l /2010−01−01T00 : 0 0 : 0 0 / P1Y ;
...
#r d f s : { l a b e l , comment} { l i t e r a l } ;
#d c t e r m s : { i s s u e d , d a t e , c o n t r i b u t o r , a u t h o r ,
#p u b l i s h e r , i d e n t i f i e r }
#{ r e s o u r c e , l i t e r a l } ;
r d f i n d e x : md5−h a s h 6 c d d a 7 6 0 8 8 c d 1 6 1 d 7 6 6 8 0 9 d 6 a 7 8 d 3 5 f 6 ;
sdmx−c o n c e p t : o b s S t a t u s
sdmx−c o d e : o b s S t a t u s −E ;
r d f i n d e x : v a l u e 0 . 7 0 7 ^^ x s d : d o u b l e .
¦
¤
¥
27 / 43
28. The RDFIndex approach | MTSR 2013
The RDFIndex approach
Evolution: joint effort between the RDFIndex and Computex
Computex
1
Vocabulary for statistical computations.
2
Extends RDF Data Cube.
3
Ontology at: http://purl.org/weso/computex.
4
Data and structure validation using SPARQL Queries.
5
Some terms:
cex:Concept
cex:Indicator
cex:Computation
cex:WeightSchema
qb:Observation
6
Learn more: http://computex.herokuapp.com/
Preliminary Evaluation...
Similar approaches that have been merged between July and September, 2013.
28 / 43
29. The RDFIndex approach | MTSR 2013
The RDFIndex approach
Demo of the RDFIndex/Computex...
http://computex.herokuapp.com/
29 / 43
30. The RDFIndex approach | MTSR 2013
The RDFIndex approach
On-going working examples
The Cloud Index
Compilation of Key Performance Indicators for cloud services.
Creation of a Cloud Index to measure Quality of Service of cloud providers.
Funded by the RELATE-ITN (FP7-PEOPLE-2010-ITN, 264840) project and developed in
the subproject “Quality Management in Service-based Systems and Cloud
Applications”.
Demo: http://cloudindex-doc.herokuapp.com/
The Web Index 2012 and 2013
Compilation of several indicators (85) from 61 countries (100 in 2013).
Creation of the Web Index to measure the web impact over the world.
Funded by The Webfoundation.
Demo: http://thewebindex.org
30 / 43
31. The RDFIndex approach | MTSR 2013
The RDFIndex approach
On-going working examples
The Cloud Index
Compilation of Key Performance Indicators for cloud services.
Creation of a Cloud Index to measure Quality of Service of cloud providers.
Funded by the RELATE-ITN (FP7-PEOPLE-2010-ITN, 264840) project and developed in
the subproject “Quality Management in Service-based Systems and Cloud
Applications”.
Demo: http://cloudindex-doc.herokuapp.com/
The Web Index 2012 and 2013
Compilation of several indicators (85) from 61 countries (100 in 2013).
Creation of the Web Index to measure the web impact over the world.
Funded by The Webfoundation.
Demo: http://thewebindex.org
31 / 43
32. The RDFIndex approach | MTSR 2013
Evaluation and Discussion
Advantages
Data Sources
Data management applying the Linked Data principles.
Structure
Automatic validation and verification of the structure and metadata of
quantitative indexes.
Computation process
A native approach using SPARQL queries that can help to boost
transparency.
Documentation
Implicit multilingual and multicultural support.
32 / 43
33. The RDFIndex approach | MTSR 2013
Evaluation and Discussion
Advantages
Data Sources
Data management applying the Linked Data principles.
Structure
Automatic validation and verification of the structure and metadata of
quantitative indexes.
Computation process
A native approach using SPARQL queries that can help to boost
transparency.
Documentation
Implicit multilingual and multicultural support.
33 / 43
34. The RDFIndex approach | MTSR 2013
Evaluation and Discussion
Advantages
Data Sources
Data management applying the Linked Data principles.
Structure
Automatic validation and verification of the structure and metadata of
quantitative indexes.
Computation process
A native approach using SPARQL queries that can help to boost
transparency.
Documentation
Implicit multilingual and multicultural support.
34 / 43
35. The RDFIndex approach | MTSR 2013
Evaluation and Discussion
Advantages
Data Sources
Data management applying the Linked Data principles.
Structure
Automatic validation and verification of the structure and metadata of
quantitative indexes.
Computation process
A native approach using SPARQL queries that can help to boost
transparency.
Documentation
Implicit multilingual and multicultural support.
35 / 43
36. The RDFIndex approach | MTSR 2013
Evaluation and Discussion
Detailed advantages (I)
Feature
Main advantages
Data sources
Common and shared data model, RDF.
Description of data providers (provenance and trust).
A formal query language to query data, SPARQL.
Use of Internet protocols, HTTP.
Data enrichment and validation (domain and range).
Unique identification of entities, concepts, etc. through (HTTP) URIs.
Possibility of publishing new data under the aforementioned characteristics.
Standardization and integration of data sources.
Structure
Meta-description of index structure (validation).
Re-use of existing semantic web vocabularies.
Re-use of existing datasets to enrich meta-data.
Context-aware definitions.
Underlying logic formalism.
Orthogonal and flexible.
36 / 43
37. The RDFIndex approach | MTSR 2013
Evaluation and Discussion
Detailed advantages (II)
Computation
cess
proMeta-description of datasets aggregation.
Validation of composed datasets.
OWA operators support.
Direct translation to SPARQL queries.
Documentation
Multilingual support to describe datasets, etc.
Easy generation with existing tools.
Cross-Domain Features
Separation of concerns and responsibilities: data and meta-data (structure
and computation).
Standardization (put in action specs from organisms such as W3C).
Declarative and adaptive approach.
Non-vendor lock-in (format, access and computation).
Integration, Interoperability and Transparency.
Align to existing trend (data management: quality and filtering).
Easy integration with third-party services such as visualization.
Contribution to the Web of Data.
37 / 43
38. The RDFIndex approach | MTSR 2013
Evaluation and Discussion
Restrictions
SPARQL 1.1 support
1
Some built-in functions are not standardized:
Example: z-score employs standard deviation.
It requires built-in function: sqrt (only available in some SPARQL
implementations).
2
Ranking of values is not obvious:
GROUP_CONCAT
Check a value against all the other values.
...neither solution is efficient.
Limitations of the RDFIndex expressivity
The working examples show its applicability but some lack of
terms/relationships should be expected.
38 / 43
39. The RDFIndex approach | MTSR 2013
Evaluation and Discussion
Restrictions
SPARQL 1.1 support
1
Some built-in functions are not standardized:
Example: z-score employs standard deviation.
It requires built-in function: sqrt (only available in some SPARQL
implementations).
2
Ranking of values is not obvious:
GROUP_CONCAT
Check a value against all the other values.
...neither solution is efficient.
Limitations of the RDFIndex expressivity
The working examples show its applicability but some lack of
terms/relationships should be expected.
39 / 43
40. The RDFIndex approach | MTSR 2013
Conclusions and Future Work
Conclusions
1
The use of quantitative indexes is a common practice to compile and
rank indicators for making decisions.
2
Existing indexes are completely opaque (PDF, HTML, etc.)
3
It is necessary to improve the access to data/metadata and to
boost transparency.
4
The RDFIndex/Computex vocabulary seeks for providing the
appropriate building blocks to represent and compute indexes.
However:
5
SPARQL limitations avoid a fully development of “pedantic” Linked
Data approach.
The vocabulary likely does not cover all required elements to represent
any index.
40 / 43
41. The RDFIndex approach | MTSR 2013
Conclusions and Future Work
Future Work
1
Extend the RDFIndex/Computex vocabulary.
2
Externalize the computation with a hybrid approach R+SPARQL.
3
Parallelization of the computation process.
41 / 43
43. The RDFIndex approach | MTSR 2013
Metadata and Information
Roster...
Dr. Jose María Alvarez-Rodríguez
SEERC (until August, 2013) and Carlos III
University of Madrid, Spain
E-mail: josemaria.alvarez@uc3m.es
WWW: http://www.josemalvarez.es
Prof. Dr. Patricia Ordoñez De Pablos
University of Oviedo, Spain
E-mail: patriop@uniovi.es
Prof. Dr. Jose Emilio Labra-Gayo
University of Oviedo, Spain
E-mail: labra@uniovi.es
WWW: http://www.di.uniovi.es/~labra
43 / 43
44. The RDFIndex approach | MTSR 2013
Some SPARQL queries
Z-score normalization using SPARQL...
§
¤
prefix afn : http :// jena . hpl . hp . com / ARQ / function #
SELECT ( (? measure -? mean )/? stddev as ? zscore )
WHERE {
...
? observation rdfindex : value ? measure
{
SELECT ? mean ( afn : sqrt (( SUM ((? measure -? mean )*
(? measure -? mean ))/? count )) as ? stddev )
WHERE {
? observation rdfindex : value ? measure
{
SELECT ( COUNT (? measure ) as ? count ) ( AVG (? measure ) as ? mean )
WHERE {
? observation rdfindex : value ? measure
} GROUP BY ? count ? mean LIMIT 1
}
} GROUP BY ? mean ? count LIMIT 1
}
}
¦
¥
44 / 43
45. The RDFIndex approach | MTSR 2013
References
T. Bosch, R. Cyganiak, A. Gregory, and J. Wackerow.
DDI-RDF discovery vocabulary. a metadata vocabulary for documenting research and survey data.
In 6th Workshop on Linked Data on the Web (LDOW2013), 2013.
R. Cyganiak and D. Reynolds.
The RDF Data Cube Vocabulary.
Working Draft, W3C, 2013.
http://www.w3.org/TR/vocab-data-cube/.
I. Ermilov, M. Martin, J. Lehmann, and S. Auer.
Linked Open Data Statistics: Collection and Exploitation.
In KESW, pages 242–249, 2013.
J. D. Fernández, M. A. Martínez-Prieto, and C. Gutiérrez.
Publishing open statistical data: the spanish census.
In DG.O, pages 20–25, 2011.
M. Hausenblas, W. Halb, Y. Raimond, L. Feigenbaum, and D. Ayers.
SCOVO: Using Statistics on the Web of Data.
In L. A. et al., editor, The Semantic Web: Research and Applications, volume 5554 of Lecture Notes in
Computer Science, pages 708–722. Springer Berlin Heidelberg, 2009.
J. M. A. Rodríguez, J. Clement, J. E. L. Gayo, H. Farhan, and P. Ordoñez De Pablos.
Publishing Statistical Data following the Linked Open Data Principles: The Web Index Project, pages
199–226.
IGI Global, 2013.
P. E. R. Salas, F. M. D. Mota, K. K. Breitman, M. A. Casanova, M. Martin, and S. Auer.
Publishing Statistical Data on the Web.
Int. J. Semantic Computing, 6(4):373–388, 2012.
45 / 43
46. The RDFIndex approach | MTSR 2013
References
B. Zapilko and B. Mathiak.
Defining and Executing Assessment Tests on Linked Data for Statistical Analysis.
In COLD, 2011.
46 / 43
47. The RDFIndex approach | MTSR 2013
References
Leveraging semantics to represent and compute
quantitative indexes.
The RDFIndex approach.
Michalis Vafopoulos (speaker)
and
{Jose María Álvarez-Rodríguez, Patricia Ordoñez De Pablos and
Jose Emilio Labra-Gayo}
MTSR 2013 | 7th Metadata and Semantics Research Conference
Main Track
47 / 43