Authors: Sébastien Martin, Muriel Foulonneau, Slim Turki
Paris VIII University, France
Public Research Centre Henri Tudor, Luxembourg
http://link.springer.com/chapter/10.1007%2F978-3-319-03437-9_24
Presented during MTSR 2013 / 7th Metadata and Semantics Research Conference
http://mtsr2013.teithe.gr/
Abstract. The development of open data requires better reusability of data. Indeed, catalogues listing data dispersed across different countries play a crucial role. However, the degree of openness is also a key success factor for open data. In this paper, we study the PublicData.eu catalogue, which provides access to open datasets from European countries, and analyse the metadata recorded for each dataset. The objectives are to (i) assess the quality of a sample of metadata properties that are critical to enable data reuse and (ii) study the stated level of data openness. The study uses Tim Berners-Lee’s five-star evaluation scale.
1-5 stars: Metadata on the Openness Level of Open Data Sets in Europe
1. 1-5 stars: Metadata on the Openness Level of Open Data Sets in Europe
Sébastien Martin, Muriel Foulonneau, Slim Turki
2. Context & Objectives
• Level of reuse of open data is still disappointing.
• Development of open data requires a better reusability of data.
• Degree of openness is a key success factor.
• Catalogs listing data have a crucial role.
Analyse PublicData.eu catalogue
(i) identify the quality of a sample of metadata properties, which are critical to enable data reuse
(ii) study the stated level of data openness.
21/11/2013
3. PublicData.eu
• Many local and national portals to provide access to public sector open datasets - 114 EU catalogues on datacatalogs.org
• Gather datasets across geographic and institutional boundaries
PublicData.eu
• pan-European catalogue launched under the FP7 LOD2 project.
• aggregates data from CKAN open data catalogues all over Europe.
• collects data from 26 sources
• 1st to be published in Europe in 2011
• data beyond the European Union, e.g., Serbian datasets.
• not exhaustive, it represents a unique aggregation of European datasets.
• 17.027 datasets
• UK: largest provider
4. Methodology
Descriptions of datasets collected in May 2013
236 distinct dataset properties identified, partially due to
• linguistic diversity; some providers adapt property names in their language
• problems of consistency in naming (upper / lower case, spaces / underscore for a single field).
Major challenge to understand the content of the PublicData.eu catalogue
Data collected and analysed to identify information made available on data openness and reusability, in particular the licensing conditions and the data formats.
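The naming inconsistencies listed above (upper / lower case, spaces / underscore) can be reduced with a simple normalisation step. The following is a minimal sketch, not the procedure used in the study: the function names and the sample property names are hypothetical. It only merges spelling variants of the same name; properties translated into another language, or spelled "licence" versus "license", would still need a manual mapping.

```python
import re
from collections import defaultdict


def normalise_property_name(name: str) -> str:
    """Lower-case a property name and unify runs of spaces, hyphens and underscores."""
    collapsed = re.sub(r"[\s_\-]+", "_", name.strip())
    return collapsed.lower()


def group_variants(names):
    """Map each normalised name to the set of raw spellings observed for it."""
    groups = defaultdict(set)
    for raw in names:
        groups[normalise_property_name(raw)].add(raw)
    return dict(groups)


# Hypothetical raw property names, for illustration only.
raw_names = ["License_url", "license url", "LICENSE_URL", "licence_url", "Content Type"]
variants = group_variants(raw_names)

for normalised, spellings in sorted(variants.items()):
    print(normalised, sorted(spellings))
```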
5. Tim Berners-Lee’s evaluation scale
★ Available on the web (whatever format) but with an open license, to be Open Data
★★ Available as machine-readable structured data
★★★ 2 + non-proprietary format
★★★★ 3 + Use open standards from W3C (RDF and SPARQL) to identify things
★★★★★ 4 + Link your data to other people’s data to provide context
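The scale is cumulative: each level presupposes all the levels below it. A minimal sketch of that rule follows, assuming each criterion has already been decided as a yes/no value. The function name and parameter names are illustrative and do not come from the study.

```python
def star_level(open_licence: bool, machine_readable: bool,
               non_proprietary: bool, w3c_standard: bool, linked: bool) -> int:
    """Return 0-5 stars; a level only counts if every lower level is also met."""
    criteria = [open_licence, machine_readable, non_proprietary, w3c_standard, linked]
    stars = 0
    for met in criteria:
        if not met:
            break  # a missing lower level blocks all higher ones
        stars += 1
    return stars


# An openly licensed spreadsheet in a proprietary format stops at two stars,
# even if later criteria were somehow satisfied.
print(star_level(True, True, False, True, True))
```

In particular, a dataset without an open licence scores zero regardless of its format.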
6. ★ Data Licences
13.535 / 17.027 datasets have at least 1 license indication
12.470 datasets (73,24%) can be considered as having some form of open license
769 datasets have a Creative Commons license
Significant number of datasets have a national license:
• apie v2 to publish information created by French public authorities
• UK-crown which “covers material created by civil servants, ministers and government departments and agencies” in the UK
• UK Open Government License
128 datasets with an explicitly closed license
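The percentages behind these counts follow directly from the catalogue total of 17.027 datasets. The short calculation below uses only the figures given on this slide; the dictionary labels and the helper name are illustrative. The open-licence share reproduces the 73,24% stated above.

```python
TOTAL_DATASETS = 17027  # catalogue size reported in the study

# Counts as given on the slide.
licence_counts = {
    "any licence indication": 13535,
    "open licence": 12470,
    "Creative Commons": 769,
    "explicitly closed": 128,
}


def share(count: int, total: int = TOTAL_DATASETS) -> float:
    """Percentage of the catalogue, rounded to two decimals."""
    return round(100 * count / total, 2)


shares = {label: share(count) for label, count in licence_counts.items()}
for label, pct in shares.items():
    print(f"{label}: {pct:.2f}%")
```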
7. ★★ Machine readable format
• Facilitates data reusability
• 4.051 / 17.027 datasets with a content_TYPE property
• 11.285 with at least one indication about format
• 56 datasets in RDF
• Dominant proportion of spreadsheet-type formats
Figure: Distribution of formats
40% of datasets not in a machine-readable format
34% of datasets available in a machine-readable format
Machine readability is a condition for openness levels of ★★ and above.
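Deciding machine readability from free-text format values requires some tokenising, since one value may name several formats. The sketch below is one possible approach, not the study's method. The two format sets are assumptions made for illustration (the slide does not list which formats were counted on which side), and values that match neither set are reported as unknown.

```python
import re

# Assumed classification, for illustration only.
MACHINE_READABLE = {"csv", "xls", "xlsx", "xml", "json", "rdf", "kml"}
NOT_MACHINE_READABLE = {"pdf", "doc", "docx", "jpg", "png"}


def format_tokens(value: str) -> set:
    """Split a free-text format value into lower-case alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", value.lower()))


def machine_readability(value: str) -> str:
    """Classify a format value; any machine-readable token is enough."""
    tokens = format_tokens(value)
    if tokens & MACHINE_READABLE:
        return "machine-readable"
    if tokens & NOT_MACHINE_READABLE:
        return "not machine-readable"
    return "unknown"


for value in ["linked data api, rdf json", "PDF", "application/vnd.ms-excel", ""]:
    print(repr(value), "->", machine_readability(value))
```

A media type such as "application/vnd.ms-excel" falls through to unknown here, which illustrates why vague or unusual values are hard to classify automatically.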
8. ★★★ Use of non-proprietary formats
Creates ambiguities, as the openness of a format can be debated in some cases:
• Certain formats are proprietary but their specifications are open.
• Some formats have been open at a certain point in time but additions and further evolutions remain proprietary.
In many cases, the value of the property was too vague to determine whether the format was proprietary or not.
It was possible to identify:
• For 49% of the datasets, a non-proprietary format
• For 21%, a proprietary format.
Use of proprietary formats is a critical issue for improving the level of openness of datasets.
9. ★★★★ Use of open standards from W3C
Including HTML, XML, and RDF in particular.
• XML-based formats may be entirely independent from W3C (e.g. KML)
Availability in W3C standards: 9,5% of datasets
Availability in XML-based formats: 10%
Information remains unknown in most cases
10. ★★★★★ Linked data
Linked data are only mentioned in the description of a single
dataset (Brandweer Amsterdam-Amstelland Uitrukberichten)
for which the format is described as “linked data api, rdf json”.
58 datasets mention RDF (or RDFa) as a format or content type,
i.e., 0,34%.
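A count such as "datasets mentioning RDF or RDFa" can be obtained with a word-boundary match over the format and content-type values. The sketch below is an assumption about how such a check could be written, not the study's code. The record keys "format" and "content_type" are hypothetical stand-ins for the catalogue's property names. The final line reproduces the 0,34% share from the figures on this slide.

```python
import re

# Matches "rdf" or "rdfa" as a whole word, in any letter case.
RDF_PATTERN = re.compile(r"\brdfa?\b", re.IGNORECASE)


def mentions_rdf(record: dict) -> bool:
    """True if the format or content-type value names RDF or RDFa."""
    fields = (record.get("format", ""), record.get("content_type", ""))
    return any(RDF_PATTERN.search(value or "") for value in fields)


rdf_share = round(100 * 58 / 17027, 2)  # 58 of 17.027 datasets

print(mentions_rdf({"format": "linked data api, rdf json"}))
print(f"{rdf_share:.2f}%")
```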
11. Level of openness (1/2)
6.891 / 17.027 datasets provide at least one piece of information about their degree of openness.
All come from Data.gov.uk (8.689 datasets)
For a majority of datasets, the level of openness is unknown.
• Consistent with the lack of licensing information, without which it is impossible to conclude on even a ★ openness level.
Figure: Distribution of openness levels in UK datasets
12. Level of openness (2/2)
Approximate level of openness derived from licensing and format properties
• 73,24% of the datasets should have ★ or above.
• A ★★★★★ rating has to take linkages into consideration, which cannot be inferred from dataset metadata.
Figure: Level of openness according to format- and license-related properties
Data openness is mainly related to the 1st level of compliance: the licensing issue.
• Data providers have clearly not focused on publication of data in reusable formats.
13. Conclusion
• Limited openness of datasets advertised as open data
• Heterogeneity of associated metadata
Difficulty for reusers to (i) discover datasets, despite the creation of large catalogues of datasets, and to (ii) effectively reuse machine-readable and contextualized data.
★ may be sufficient to ensure transparency of government action, but facilitating reuse of data through services is not served below ★★.
Confirmed risks regarding major challenges that data providers have to face: (i) language barrier and (ii) lack of consistency of metadata.
Harmonization of practices, training and tools are necessary to ensure that datasets are available in relevant formats.
14. 1-5 stars: Metadata on the Openness Level of Open Data Sets in Europe
Sébastien Martin, Muriel Foulonneau, Slim Turki
Contact:
muriel.foulonneau@tudor.lu
Editor's notes
The study uses Tim Berners-Lee’s five-star evaluation scale.
The one-star openness level depends upon data licenses. Licensing information can be found in 10 distinct metadata properties, i.e., licence, License, licence_url, License_details, License_ID, License_summary, License_title, License_uri, License_url, and mandate.
The two-star level depends upon the format in which the data is made available.