Dimitris Skoutas presents the OpenDataMonitor
Workshop title: Open Science Monitor
Workshop overview:
What are the measurable components of Open Science? How do we build a trustworthy, global open science monitor? This workshop will discuss a potential framework to measure Open Science, including the path from the publication of an open policy (registries of policies and how these are represented or machine-read), to the use of open methodologies, and the opening up of research results, their recording and measurement.
DAY 2 - PARALLEL SESSION 5
OSFair2017 Workshop | OpenDataMonitor
1. OpenDataMonitor
Monitoring, Analysis and Visualisation of Open Data Catalogues, Hubs and Repositories
Collaborative Project
FP7-ICT-2013.4.3 SME initiative on analytics
http://opendatamonitor.eu
Dimitris Skoutas
IMIS, R.C. Athena
dskoutas@imis.athena-innovation.gr
Open Science Fair 2017
7/9/2017, Athens
3. Landscape & Challenges
In a nutshell…
Numerous organisations and public bodies already
publish open data, or are starting to do so, but use
different systems and vocabularies.
Open data of relevant stakeholders is stored at
local, regional, national or pan-European level.
Entered metadata is often incomplete or inaccurate.
At all levels the open data situation appears to be
very fragmented and hard to monitor. This leads to
planning problems and raises questions such as:
▪ Which catalogues/datasets are available?
▪ What is the quality of the available resources?
▪ Which gaps reduce re-use of open data?
4. OpenDataMonitor 1st Review Meeting 4
Stakeholders Survey
Research question
• What roles do stakeholders in the open data ecosystem play and what are their
interests in open data?
Participant selection
• Two-fold strategy:
• contact information extracted from open data portals across Europe:
1. downloaded the metadata of more than 10,000 open data sets stored in about 50
open data repositories of 18 European countries
2. extracted email addresses from these metadata
3. used those addresses to invite their owners to the survey
• partners forwarded the survey invitation through their mailing lists and newsletters,
and posted Facebook messages and tweets via the OpenDataMonitor project
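The address-extraction step of the first strategy can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the record layout, the field names (author_email, maintainer_email, extras) and the sample values are assumptions, and the regular expression is a deliberately simple pattern rather than a full address validator.

```python
import re

# Simple pattern for illustration; it does not cover every valid address form.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def iter_strings(value):
    """Yield every string found anywhere inside a nested dict/list structure."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_strings(item)
    elif isinstance(value, (list, tuple)):
        for item in value:
            yield from iter_strings(item)

def extract_emails(records):
    """Return the sorted, de-duplicated, lower-cased addresses in the records."""
    found = set()
    for record in records:
        for text in iter_strings(record):
            found.update(match.lower() for match in EMAIL_RE.findall(text))
    return sorted(found)

# Hypothetical metadata records; real catalogues use varying field names.
sample = [
    {"title": "Bus stops", "author_email": "Data.Team@example.org"},
    {"title": "Budget 2014",
     "extras": [{"key": "contact", "value": "mailto:finance@example.org"}]},
    {"title": "Air quality", "maintainer_email": "data.team@example.org"},
]
print(extract_emails(sample))
```

Searching every string value, rather than a fixed list of fields, reflects the point that catalogues do not agree on where contact details are stored.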
5. Participant Statistics
Participant types
• 63% of all participants came from Germany, Spain and the United Kingdom
6. Survey Results
Q: "Please indicate the extent to which each of the following issues influence your
company’s decision to use open data."
7. Survey Results
Stakeholder interest differs by content type
• “Transport and Traffic”, “Environment and Climate”, “Finance and Budget”
attract the highest interest
• Stakeholders distinguish noticeably between the different kinds of open data
8. Survey Results
Topical interests vary between stakeholder groups:
• self-identified activists are more concerned with data in the
FOI/transparency tradition (e.g. Politics and Elections data, Public Sector data)
• stakeholders in public administration favour data that is less politicised
9. Survey Results
Stakeholders at the policy level:
• Businesses, politicians and public managers are perceived as least supportive
• Activists rank highest as supporters
10. Survey Results
“We take data from over 330 different publishers [...] not one of
them does the same thing as the next one and most of them don't
do the same thing month to month. I’ve got 170 councils in our
dataset [...] Some publish virtually nothing, some publish a lot. The
variance in quality of the data is incredibly difficult. Data quality is
a big issue.”
- Ian Makgill, Spend Network
11. Survey Results
“..If there’s too much [data], you can’t find it. There [are]
different places to find different bits, you've got data.gov, you've
got all these different websites, all these different agencies.
They’ve all grown organically and separate from each other and I
know everybody would love one enormous data place where
you go to get all your data.”
- Rod Plummer, Shoothill
12. Statistics of Monitored Catalogues
31 Countries
173 Data Catalogues
213,730 Datasets Harvested
158,165 Unique Datasets
588,303 Total Distributions
1,400+ GB Total Distribution Size
12,523 Unique Publishers (Organisations and Aut
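As a quick piece of arithmetic on the figures above, the gap between harvested and unique datasets can be worked out as below. The interpretation is an assumption: the slide does not say why the two counts differ, so treating the difference as datasets harvested more than once (for example, listed in several catalogues) is a guess.

```python
# Figures taken from the slide.
harvested = 213_730
unique = 158_165

# Assumed reading: the difference is datasets harvested more than once.
repeated = harvested - unique
repeated_share = repeated / harvested

print(f"{repeated:,} harvested entries are repeats ({repeated_share:.1%})")
```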
13. Qualitative Metrics
• Open licence: total count of open licences over total count of distributions
with a licence
• Machine readable: the portion of datasets provided in a machine-readable
format
• Open formats: the portion of dataset distributions with a non-proprietary
format
• Metadata completeness: the frequency of missing metadata for each
attribute
• Availability: the portion of datasets without broken links
• Discoverability: an estimate of how important a catalogue is on the web,
based on two ranking systems: Google and Alexa
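The ratio-style metrics on this slide can be illustrated with a small sketch. It follows the definitions as worded above, but everything else is assumed for the example: the record layout, the link_ok flag, and the sets of open licences and formats are placeholders, not the lists the ODM system uses. Discoverability is left out because it depends on external ranking services.

```python
# Illustrative vocabularies only; not the lists used by OpenDataMonitor.
OPEN_LICENCES = {"cc-by", "cc0", "odc-pddl"}
MACHINE_READABLE = {"csv", "json", "xml", "xls"}
OPEN_FORMATS = {"csv", "json", "xml"}  # treated here as non-proprietary
ATTRIBUTES = ("title", "description", "publisher")

def ratio(numerator, denominator):
    """Safe division that returns 0.0 for an empty denominator."""
    return numerator / denominator if denominator else 0.0

def quality_metrics(datasets):
    dists = [d for ds in datasets for d in ds["distributions"]]
    licensed = [d for d in dists if d.get("licence")]
    n = len(datasets)
    return {
        # open licences over distributions that state any licence
        "open_licence": ratio(
            sum(d["licence"] in OPEN_LICENCES for d in licensed), len(licensed)),
        # datasets offering at least one machine-readable distribution
        "machine_readable": ratio(
            sum(any(d["format"] in MACHINE_READABLE for d in ds["distributions"])
                for ds in datasets), n),
        # distributions in a non-proprietary format
        "open_formats": ratio(
            sum(d["format"] in OPEN_FORMATS for d in dists), len(dists)),
        # per attribute, the share of datasets where it is missing or empty
        "missing_metadata": {
            a: ratio(sum(not ds.get(a) for ds in datasets), n) for a in ATTRIBUTES},
        # datasets whose distribution links all resolve
        "availability": ratio(
            sum(all(d["link_ok"] for d in ds["distributions"]) for ds in datasets), n),
    }

# Hypothetical harvested records.
datasets = [
    {"title": "Bus stops", "description": "Stops", "publisher": "City A",
     "distributions": [
         {"format": "csv", "licence": "cc-by", "link_ok": True},
         {"format": "pdf", "licence": "cc-by", "link_ok": True}]},
    {"title": "Budget", "description": "", "publisher": "City B",
     "distributions": [
         {"format": "xls", "licence": "other-closed", "link_ok": False}]},
    {"title": "Air quality", "description": "Hourly", "publisher": None,
     "distributions": [
         {"format": "json", "licence": None, "link_ok": True}]},
]
metrics = quality_metrics(datasets)
print(metrics)
```

Note that the open-licence ratio uses only distributions that declare a licence as its denominator, as in the definition above, so a catalogue with few declared licences can still score highly.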
14. Quantitative Metrics
• Number of datasets: the total number of available datasets in a catalogue
• Total distribution size: the total size of all resources, regardless of their
format, for every dataset in a catalogue
• Number of distributions: the average number of distributions per dataset
• Number of unique publishers: the total number of unique publishing
organisations of a specific catalogue
• Number of catalogues: number of catalogues harvested per country
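A rough sketch of how these counts could be aggregated from harvested records is given below. The flat record layout (country, catalogue, publisher, distribution_sizes in bytes) and the sample values are assumptions made for the example, not the ODM data model.

```python
from collections import defaultdict

def quantitative_metrics(records):
    """Aggregate per-catalogue counts and the number of catalogues per country."""
    catalogues = defaultdict(lambda: {"datasets": 0, "distributions": 0,
                                      "size_bytes": 0, "publishers": set()})
    by_country = defaultdict(set)
    for rec in records:
        stats = catalogues[rec["catalogue"]]
        stats["datasets"] += 1
        stats["distributions"] += len(rec["distribution_sizes"])
        stats["size_bytes"] += sum(rec["distribution_sizes"])
        stats["publishers"].add(rec["publisher"])
        by_country[rec["country"]].add(rec["catalogue"])
    per_catalogue = {
        name: {
            "datasets": s["datasets"],
            "total_size_bytes": s["size_bytes"],
            "avg_distributions": s["distributions"] / s["datasets"],
            "unique_publishers": len(s["publishers"]),
        }
        for name, s in catalogues.items()
    }
    per_country = {country: len(names) for country, names in by_country.items()}
    return per_catalogue, per_country

# Hypothetical harvested records, one per dataset.
records = [
    {"country": "GR", "catalogue": "cat-a", "publisher": "City A",
     "distribution_sizes": [1000, 2500]},
    {"country": "GR", "catalogue": "cat-a", "publisher": "City A",
     "distribution_sizes": [400]},
    {"country": "GR", "catalogue": "cat-b", "publisher": "Agency B",
     "distribution_sizes": [50, 50, 50]},
    {"country": "DE", "catalogue": "cat-c", "publisher": "Land C",
     "distribution_sizes": []},
]
per_catalogue, per_country = quantitative_metrics(records)
print(per_catalogue["cat-a"], per_country)
```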
16. Technical Perspective - The ODM System
The ODM system consists of two main parts:
Metadata collection and processing:
• collects metadata from open data catalogues
• performs metadata cleaning and harmonization
• computes metrics and provides results via an API
Demonstration site:
• visualises results for monitoring
• allows for search and browsing
• produces charts and reports
• includes information about
methodology and usage
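The cleaning and harmonisation step in the first part can be pictured with a small sketch. The slide does not describe how ODM performs it, so the approach shown here (mapping source field names and format labels onto one common vocabulary) and all names in the mapping tables are assumptions chosen for illustration.

```python
# Hypothetical alias tables; a real system would maintain far larger ones.
FIELD_ALIASES = {
    "title": ("title", "name", "dct:title"),
    "licence": ("licence", "license", "license_id"),
    "format": ("format", "mimetype"),
}
FORMAT_SYNONYMS = {"text/csv": "csv", ".csv": "csv", "application/json": "json"}

def harmonise(raw):
    """Map one raw metadata record onto a common set of field names."""
    clean = {}
    for target, aliases in FIELD_ALIASES.items():
        # take the first alias that is present and non-empty
        value = next((raw[a] for a in aliases if raw.get(a)), None)
        clean[target] = value.strip() if isinstance(value, str) else value
    if isinstance(clean["format"], str):
        fmt = clean["format"].lower()
        clean["format"] = FORMAT_SYNONYMS.get(fmt, fmt)
    return clean

print(harmonise({"name": " Bus stops ", "license_id": "cc-by", "mimetype": "text/CSV"}))
print(harmonise({"title": "Budget", "format": "XLS"}))
```

Fields that cannot be found are kept as None rather than dropped, so a later metadata-completeness count can still see them as missing.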