Governments, public agencies and institutions, and companies produce a large amount of statistical data every year. Many of these data are released as Open Data and published on the Web, although usually as documents, not as Linked Data. In this talk I'll introduce RDF Data Cube (QB), a W3C standard for publishing multidimensional data, such as statistics, on the Web in such a way that they can be linked to other datasets and concepts. However, QB leaves it largely open how users should model dimensions and codes (QB jargon for variables and values), which hampers the reuse of existing ones. To address this, I'll show you LSD Dimensions, a web-based application that monitors the usage of dimensions and codes over five hundred public SPARQL endpoints.
10. RDF Data Cube
• 4-star LSD (Linked Statistical Data): use URIs to denote
(statistical) things
• 5-star LSD: link your own (statistical) things to
other (statistical) things
“There are many situations where it would be useful to
be able to publish multi-dimensional data, such as
statistics, on the web in such a way that they can be
linked to related data sets and concepts.”
13. RDF Data Cube vocabulary (QB)
• SDMX compatible
• Defines cubes as a set of observations that consist of
dimensions, measures and attributes
• Dimensions: time period, region, sex (qb:DimensionProperty)
• Measure: population life expectancy (qb:MeasureProperty)
• Attribute: unit of measure = years, metadata status =
measured (qb:AttributeProperty)
Observation: “the measured life expectancy of males in
Newport in the period 2004-2006 is 76.7 years”
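As a rough illustration of the observation above, the following sketch serializes it as Turtle using plain string formatting. The qb: and sdmx-* namespaces are the published ones; the eg: namespace and every name under it (eg:o1, eg:dataset-le1, eg:refPeriod, eg:refArea, eg:newport, eg:lifeExpectancy, eg:years, eg:status, eg:measured) are hypothetical placeholders, and the helper observation_to_turtle is not part of any library.

```python
# Namespace prefixes. The eg: namespace is a hypothetical placeholder.
PREFIXES = {
    "qb": "http://purl.org/linked-data/cube#",
    "sdmx-dimension": "http://purl.org/linked-data/sdmx/2009/dimension#",
    "sdmx-code": "http://purl.org/linked-data/sdmx/2009/code#",
    "sdmx-attribute": "http://purl.org/linked-data/sdmx/2009/attribute#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "eg": "http://example.org/ns#",
}


def observation_to_turtle(obs_id, dataset, dimensions, measures, attributes):
    """Serialize one qb:Observation as a Turtle snippet (illustrative helper)."""
    header = "\n".join(f"@prefix {p}: <{uri}> ." for p, uri in PREFIXES.items())
    # Every observation is typed and attached to its dataset, then carries
    # one value per dimension, measure and attribute.
    pairs = [("a", "qb:Observation"), ("qb:dataSet", dataset)]
    pairs += list(dimensions.items())
    pairs += list(measures.items())
    pairs += list(attributes.items())
    body = " ;\n".join(f"    {prop} {value}" for prop, value in pairs)
    return f"{header}\n\n{obs_id}\n{body} ."


turtle = observation_to_turtle(
    obs_id="eg:o1",
    dataset="eg:dataset-le1",
    dimensions={
        "eg:refPeriod": '"2004-2006"',            # time period
        "eg:refArea": "eg:newport",               # region
        "sdmx-dimension:sex": "sdmx-code:sex-M",  # sex
    },
    measures={"eg:lifeExpectancy": '"76.7"^^xsd:decimal'},
    attributes={
        "sdmx-attribute:unitMeasure": "eg:years",  # unit of measure
        "eg:status": "eg:measured",                # metadata status (hypothetical)
    },
)
print(turtle)
```

A real dataset would also declare each property as a qb:DimensionProperty, qb:MeasureProperty or qb:AttributeProperty in a data structure definition; that part is left out here to keep the sketch short.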
15. Are we done?
• P1: Comparability? Can we arbitrarily
combine any pair of these
datasets/dimensions?
• P2: Reusability? How often are dimensions
reused? Can we reuse dimensions created by
others?
• P3: Discoverability? How to discover
dimensions created by others?
• P4: Relevance? What’s the size of LSD?
16. P1: Comparability of LSD: SSCLSDA
Sarven Capadisli, Albert Meroño-Peñuela, Sören Auer, Reinhard Riedl. “Semantic Similarity
and Correlation of Linked Statistical Data Analysis”. 2nd Int. Workshop on Semantic Statistics
(SemStats) ISWC 2014.
17. P2+P3+P4: LSD Dimensions
Need for an intelligent system that helps us (1) discover,
(2) reuse, and (3) analyze dimensions in LSD
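To make the discovery step concrete, here is a minimal sketch of how dimensions could be listed on a single SPARQL endpoint. It is not taken from the LSD Dimensions code base: the query, the helper names build_request_url and parse_dimensions, the endpoint URL and the sample response are all assumptions for illustration. It relies only on the standard SPARQL protocol (a `query` parameter on a GET request) and the standard SPARQL JSON results layout, and it does not perform any network call.

```python
import json
from urllib.parse import urlencode

# Ask an endpoint for everything typed as a QB dimension, with an optional label.
DIMENSIONS_QUERY = """
PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?dimension ?label WHERE {
  ?dimension a qb:DimensionProperty .
  OPTIONAL { ?dimension rdfs:label ?label }
}
"""


def build_request_url(endpoint, query=DIMENSIONS_QUERY):
    """Build a SPARQL-protocol GET URL (assumes the endpoint URL has no query string)."""
    return endpoint + "?" + urlencode({"query": query.strip()})


def parse_dimensions(results_json):
    """Extract (uri, label) pairs from a SPARQL JSON results document."""
    data = json.loads(results_json)
    found = []
    for binding in data["results"]["bindings"]:
        uri = binding["dimension"]["value"]
        # The label is optional, so it may be missing from a binding.
        label = binding.get("label", {}).get("value")
        found.append((uri, label))
    return found


# Hypothetical endpoint and hand-written sample response, for illustration only.
url = build_request_url("http://example.org/sparql")
sample_response = json.dumps({
    "head": {"vars": ["dimension", "label"]},
    "results": {"bindings": [
        {"dimension": {"type": "uri", "value": "http://example.org/ns#refArea"},
         "label": {"type": "literal", "value": "Reference area"}},
        {"dimension": {"type": "uri", "value": "http://example.org/ns#refPeriod"}},
    ]},
})
dimensions = parse_dimensions(sample_response)
print(url)
print(dimensions)
```

In practice the request would be sent with an `Accept: application/sparql-results+json` header, and the same pattern would be repeated for each monitored endpoint.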
23. Are we done?
• P1: Comparability? Can we arbitrarily combine
any pair of these datasets/dimensions?
– Unclear
• P2: Reusability? How often are dimensions
reused? Can we reuse dimensions created by
others?
– Logarithmic law / Probably yes
• P3: Discoverability? How to discover dimensions
created by others?
– LSD Dimensions
• P4: Relevance? What’s the size of LSD?
– ~8.5% of the LOD cloud
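One simple way to look at the reuse question (P2) is to count, for each dimension, how many endpoints use it. The sketch below does that with hypothetical data; the endpoint names and the helper reuse_counts are made up for illustration and do not reproduce the figures reported in the talk.

```python
from collections import Counter


def reuse_counts(dimensions_by_endpoint):
    """Count in how many endpoints each dimension URI appears."""
    counts = Counter()
    for dims in dimensions_by_endpoint.values():
        # Use a set so a dimension counts at most once per endpoint.
        counts.update(set(dims))
    return counts


SDMX_DIM = "http://purl.org/linked-data/sdmx/2009/dimension#"

# Hypothetical crawl result: endpoint name -> dimension URIs found there.
crawl = {
    "endpoint-a": [SDMX_DIM + "refArea", SDMX_DIM + "refPeriod", SDMX_DIM + "sex"],
    "endpoint-b": [SDMX_DIM + "refArea", SDMX_DIM + "sex", SDMX_DIM + "sex"],
    "endpoint-c": [SDMX_DIM + "sex", "http://example.org/ns#ageGroup"],
}

counts = reuse_counts(crawl)
for uri, n in sorted(counts.items(), key=lambda item: (-item[1], item[0])):
    print(n, uri)
```

Sorting the counts in descending order gives the reuse distribution; a few widely shared dimensions followed by a long tail of dimensions used by a single endpoint would indicate limited reuse.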
24. Future Work
• Monitor additional metadata
(rdfs:subPropertyOf, rdfs:range)
• Generate PROV during crawling
• Modeling of formulas in RDF Data Cube
• Plug into LOD Laundromat
• Crawl dimensions and codes from
qb:Observation
• SPARQL endpoint and API
– Suggest dimensions and codes to users
25. Thank you
Questions, suggestions, comments most
welcome
@albertmeronyo
http://lsd-dimensions.org/
https://github.com/albertmeronyo/LSD-Dimensions
https://github.com/csarven/sense-of-lsd-analysis
http://www.cedar-project.nl