The document outlines a workshop on quality assessment and improvements for open data portals, including results from requirements elicitation, proposed data quality metrics, and plans for the ADEQUATe project, which aims to improve data quality through assessment, algorithms, linked data principles, and community involvement. The workshop agenda covers user requirements, best practices, and an open discussion on data quality issues.
Presentation ADEQUATe Project: Workshop on Quality Assessment and Improvements in Open Data (Catalogues)
1. www.adequate.at
Workshop on Quality Assessment and Improvements on Open Data (Portals)
opendata.ch conference, 14.6.2016, 12:45 - 14:00 CEST
Lausanne, Casino de Montbenon, Allée Ernest-Ansermet 3
Slides published CC-BY AT 3.0
Jürgen Umbrich
Vienna University of
Economics and Business
juergen.umbrich@wu.ac.at
Johann Höchtl
Donau-Universität Krems
johann.hoechtl@donau-uni.ac.at
Martin Kaltenböck
Semantic Web Company
m.kaltenboeck@semantic-web.at
2. www.adequate.at
Agenda
20’ incl. Q&A: Welcome & Introduction, Martin Kaltenböck (SWC)
● WS Objectives, Agenda & WS Team
● Participants
● The ADEQUATe project: basics, objectives, status & outlook
20’ incl. Q&A: Results of Requirements Elicitation, DQ Metrics and Interaction Items, Johann Höchtl (DUK)
● What do the users want?
● Which requirements are the most “important” ones? Which metrics specifically target openness?
● Why offer data quality interaction items to end users of data portals, and what do we plan to do in ADEQUATe?
20’ incl. Q&A: Best Practice & the ADEQUATe OD Framework, Jürgen Umbrich (WU)
● Data on the Web & CSV on the Web working group recommendations (W3C)
● ADEQUATe Framework: architecture & components
15’ open discussion: Interactive & open discussion on DQ issues, moderated by the WS Team
● Requirements for DQ in Open Data
● What is in place or planned for DQ
3. www.adequate.at
FFG Project
http://www.adequate.at
The “ADEQUATe” project is funded by the Austrian Federal Ministry for Transport, Innovation and Technology (Bundesministerium für Verkehr, Innovation und Technologie) within the RTI programme “IKT der Zukunft” (ICT of the Future) and is administered by the Austrian Research Promotion Agency (Österreichische Forschungsförderungsgesellschaft, FFG) [project number: 849982].
4. www.adequate.at
What is ADEQUATe?
ADEQUATe Open Data: Analytics & Data Enrichment to improve the QUAliTy of Open Data builds on two observations:
● An increasing amount of Open Data is becoming available as an important resource for emerging businesses.
● Integrating such open, freely reusable data sources into organisations’ data warehouses and data management systems is seen as a key success factor for competitive advantage in a data-driven economy.
The project identifies two crucial issues that have to be tackled to fully exploit the value of open data and to integrate it efficiently with other data sources:
● the overall quality issues with metadata and the data itself
● the lack of interoperability between data sources
The project's approach is to address these points at an early stage, when the open data is first published by governmental organisations or others.
6. www.adequate.at
What is ADEQUATe?
✓ 3 Partners:
1. Semantic Web Company
2. Danube University Krems
3. Vienna University of Economics and Business
✓ 30-month project duration, Oct. 2015 - March 2018
✓ 2 Use Case Partners: data.gv.at & opendataportal.at
✓ Objective: Improvement of Data Quality through:
○ Quality Assessment and Monitoring
○ Automatic Algorithms
○ Making use of Linked Data principles
○ Improvements of the data by the user (community)
7. www.adequate.at
Project Structure & Schedule
WP1 - Requirements & Specification
WP2 - Quality Improvement & Monitoring Framework
WP3 - Algorithms & Tools for Quality Improvements
WP4 - Data Linkage
WP5 - Community driven Quality Improvements
WP6 - Use Case Integration
WP7 - Project Management & Dissemination
8. www.adequate.at
Outlook & Timing of Results
M30 (03/2018)
Evaluation, Refinements, Improvements
M21 (06/2017)
Quality improvements, Use case connection
M15 (12/2016)
Quality monitoring framework, Data linkage
M10 (07/2016)
Architecture Blueprint
M9 (06/2016)
Quality metrics, Requirements
9. www.adequate.at
Concrete Outputs & Outlook
✓ End of June 2016: 3 Deliverables
○ State of the Art
○ Requirements Elicitation
○ Quality Metrics
✓ End of July 2016: 1 Deliverable
○ Architecture Blueprint
○ All components specified
✓ End of 2016: ADEQUATe Framework - 1st release
○ Assessment & Monitoring Framework
○ Data Quality Algorithms & Tools
○ Linked Data Mechanisms
○ 1st set of user driven Mechanisms
✓ Early 2017: Connect to ODP & data.gv.at
12. www.adequate.at
Results of Requirements Elicitation
Contents and Formats
○ I would really prefer to have the data themselves consistent. [...] metadata does not match; standards regarding the representation of their content
○ It would be really great if we could shift somehow to UTF-8
○ meta data for CSV files were incomplete [...] header for CSV was missing
○ no static identifiers for objects in data sets. This in turn leads to problems if you want to track changes related to these objects over time
13. www.adequate.at
Results of Requirements Elicitation
Communication
○ central communication point for exchanging experiences and issues
○ Meta data should be written in English language
Reliability
○ Servers are restarted every day [...] hosted data becomes unavailable
14. www.adequate.at
DQ metrics (1)
Completeness
● Metadata completeness: How many (mandatory) metadata keys have values?
● Table completeness: How many (CSV) cells have non-null values?
Timeliness
● Tau of Data: How “outdated” are datasets, based on the promised update frequency?
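The completeness and timeliness metrics above could be computed roughly as follows. This is a minimal sketch and not the project's definition: the function names, the treatment of empty strings as missing values, and the `staleness` ratio (elapsed time divided by the promised update interval) are assumptions made for illustration, and the sample values are invented.

```python
from datetime import datetime, timedelta


def metadata_completeness(metadata, mandatory_keys):
    """Share of mandatory metadata keys that carry a non-empty value."""
    if not mandatory_keys:
        return 1.0
    filled = sum(
        1 for key in mandatory_keys
        if metadata.get(key) not in (None, "", [], {})
    )
    return filled / len(mandatory_keys)


def table_completeness(rows):
    """Share of cells that are neither None nor blank strings."""
    cells = [cell for row in rows for cell in row]
    if not cells:
        return 1.0
    non_null = sum(
        1 for cell in cells
        if cell is not None and str(cell).strip() != ""
    )
    return non_null / len(cells)


def staleness(last_modified, update_interval, now):
    """Time since the last update, in units of the promised interval.

    Assumed interpretation: values above 1.0 mean the dataset is overdue.
    """
    return (now - last_modified) / update_interval


# Invented sample values, for illustration only.
metadata = {"title": "Exhibitions", "license": "", "publisher": "City of Vienna"}
mandatory = ["title", "license", "publisher", "update_frequency"]
rows = [["1", "Vienna"], ["2", ""], ["3", None]]

print(metadata_completeness(metadata, mandatory))
print(table_completeness(rows))
print(staleness(datetime(2016, 5, 1), timedelta(days=30), datetime(2016, 6, 30)))
```

Keeping each metric as a ratio makes scores comparable across datasets of different sizes.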
15. www.adequate.at
DQ metrics (2)
Machine readability
● Regularity of CSV-files (CSV-Lint), RDF, ...
● Structural consistency - variations in structure of CSV files
Openness
● Open formats - there is no well-defined definition of what constitutes an open format
● Open licenses - opendefinition.org appears to have them all covered
Persistence
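To make the "structural consistency" metric listed under machine readability concrete, one possible measure is the share of rows that have the most common number of columns. This is a sketch under that assumption, not the CSV-Lint algorithm; the function names and the sample file are made up.

```python
import csv
import io
from collections import Counter


def column_count_profile(csv_text, delimiter=","):
    """Count how many non-empty rows exist for each column count."""
    reader = csv.reader(io.StringIO(csv_text, newline=""), delimiter=delimiter)
    return Counter(len(row) for row in reader if row)


def structural_consistency(csv_text, delimiter=","):
    """Share of rows whose column count equals the most frequent one."""
    profile = column_count_profile(csv_text, delimiter)
    total = sum(profile.values())
    if total == 0:
        return 1.0
    most_common_rows = profile.most_common(1)[0][1]
    return most_common_rows / total


# Invented sample: the third line has one field too many.
sample = "id,city\n1,Vienna\n2,Krems,extra\n3,Linz\n"
print(column_count_profile(sample))
print(structural_consistency(sample))
```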
19. www.adequate.at
Contributors to DQ Improvements (1/2)
● Providers
○ Correctness and Completeness of Data and Metadata
○ SLAs governing availability
○ Readiness for feedback, discussion and interaction
● Algorithms
○ Automated improvements
■ Availability checks and reporting
■ Missing information, outliers
■ Check of format (valid UTF-8?), size
■ Data format conversions: CSV → CSV on the Web specification
○ Semi-automated Improvements and Enhancements
■ Identification of related data sets
■ Mapping of (data) attributes, ...
● Interaction with the Data Community
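One of the automated checks listed above, "valid UTF-8?" plus size, can be sketched in a few lines. The function names and the 50 MB size limit are assumptions for illustration; the slides do not specify thresholds.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def check_resource(data: bytes, max_bytes: int = 50_000_000) -> dict:
    """Summarise the encoding and size checks for one downloaded resource.

    max_bytes is an arbitrary illustrative limit, not a project setting.
    """
    return {
        "valid_utf8": is_valid_utf8(data),
        "size_bytes": len(data),
        "within_size_limit": len(data) <= max_bytes,
    }


# The same text encoded twice: once as UTF-8, once as Latin-1.
utf8_report = check_resource("Höchtl".encode("utf-8"))
latin1_report = check_resource("Höchtl".encode("latin-1"))
print(utf8_report)
print(latin1_report)
```

A failed decode only shows that the file is not UTF-8; guessing the actual encoding would need a separate step.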
20. www.adequate.at
Interaction: Data Community
● Control the results of automated enhancements
○ Interlinking
○ format conversions
○ encodings
● Correct and report mistakes
● Data enrichment and transformations
26. www.adequate.at
The ADEQUATe Framework
● The ADEQUATe framework offers:
○ quality assessment and monitoring
○ a set of data quality improvement algorithms
○ a set of algorithms to create and maintain a knowledge graph and to “link” data into this graph
■ Think about shared identifiers for addresses, companies, departments, parties, ...
○ community involvement (e.g., data editors, feedback loops, forking & merging)
● Main objectives:
○ all developed components will be Open Source (see the ADEQUATe GitHub repo)
○ components should be usable as standalone components
■ Use only what you need
27. www.adequate.at
The ADEQUATe Framework
● Core Components
1. Data monitoring
2. Knowledge Vault
3. Quality Assessment
4. Quality Improvement
5. Data Linkage
6. Community Improvement
7. UI, API & User authentication
[Architecture diagram. Labels: Users; Clients; Authentication / Load Balancing / UI; Public API; Orchestration / API; (Meta)Data Monitor; Knowledge Vault; Quality Assessment; Quality Improvement; Linkage; Community Improvement; catalogs data.gv.at and ODP. Legend: RESTful API, Component, Data.]
28. www.adequate.at
W3C CSV on the Web & ADEQUATe
One core feature in ADEQUATe will be to use the CSV on the Web metadata standard, which makes it possible to:
➢ describe CSV files
○ used dialect & encoding
○ table & column descriptions (with language tags)
○ data types and value ranges for columns
➢ add semantics to them
○ primary & foreign key, URIs, entity types, ...
➢ validate CSV files against a predefined schema
➢ specify the transformation
○ CSV -> JSON or RDF
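As an illustration of the points above, a CSV on the Web metadata document can be assembled as a plain dictionary and serialised to JSON. The property names (`@context`, `url`, `dialect`, `tableSchema`, `primaryKey`, `datatype`) come from the W3C CSVW vocabulary; the file name `exhibitions.csv`, the column definitions, and the helper name are hypothetical.

```python
import json


def build_csvw_metadata(csv_url):
    """Build a minimal CSVW metadata description for one CSV file.

    The columns below are illustrative assumptions, not a real dataset.
    """
    return {
        "@context": ["http://www.w3.org/ns/csvw", {"@language": "en"}],
        "url": csv_url,
        # Dialect and encoding used by the CSV file.
        "dialect": {"encoding": "utf-8", "delimiter": ",", "header": True},
        "tableSchema": {
            "columns": [
                {
                    "name": "exhibition_id",
                    "titles": "Exhibition Identifier",
                    # Data type with a value range for the column.
                    "datatype": {"base": "integer", "minimum": 1},
                    "required": True,
                },
                {"name": "city", "titles": "City", "datatype": "string"},
            ],
            # Semantics: which column identifies a row.
            "primaryKey": "exhibition_id",
        },
    }


metadata_doc = build_csvw_metadata("exhibitions.csv")
metadata_json = json.dumps(metadata_doc, indent=2, ensure_ascii=False)
print(metadata_json)
```

By CSVW convention such a document is typically published next to the data file, for example as `exhibitions.csv-metadata.json`.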
34. www.adequate.at
W3C CSV on the Web: Example (JSON-LD) 3/3
"tableSchema": {
  "columns": [{
    "name": "exhibition_id",
    "titles": "Exhibition Identifier",
    "dc:description": "A unique identifier for the exhibition.",
    "datatype": "integer",
    "required": true
  }, {
    "name": "city",
    "titles": "City",
    "dc:description": "The city in which the exhibition took place (no language defined, mostly in German).",
    "datatype": "string"
  }]
}
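A schema like the one in this example can drive simple row validation. The sketch below only checks the `required` flag and the `integer` datatype and is far from a complete CSV on the Web validator; the function name and the sample rows are assumptions for illustration.

```python
# Column definitions mirroring the tableSchema excerpt from the slide.
COLUMNS = [
    {"name": "exhibition_id", "datatype": "integer", "required": True},
    {"name": "city", "datatype": "string"},
]


def validate_row(row, columns):
    """Return a list of error messages for one row (dict of column -> string)."""
    errors = []
    for col in columns:
        name = col["name"]
        value = row.get(name, "")
        if value == "":
            # Empty cells are only an error for required columns.
            if col.get("required", False):
                errors.append(f"{name}: required value is missing")
            continue
        if col.get("datatype") == "integer":
            try:
                int(value)
            except ValueError:
                errors.append(f"{name}: {value!r} is not an integer")
    return errors


# Invented sample rows.
print(validate_row({"exhibition_id": "12", "city": "Wien"}, COLUMNS))
print(validate_row({"exhibition_id": "", "city": "Wien"}, COLUMNS))
print(validate_row({"exhibition_id": "abc", "city": ""}, COLUMNS))
```

Python's `int()` is more lenient than the CSVW integer format (it accepts surrounding whitespace, for instance), so a production validator would need stricter parsing.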
36. www.adequate.at
CSV on the Web Summary
● Don’t publish CSV on the Web only for humans; publish it for machines as well
○ e.g., human-oriented Excel exports
● Follow RFC 4180
● Encoding
○ Use UTF-8, don’t mix encodings
● File extension: .csv
● Content-type: text/csv
Optional, but a big improvement:
● Ideally, publish CSV metadata alongside your CSV file
● Avoid acronyms or coded values (e.g., sex=1,2,3)
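The publishing recommendations above (RFC 4180 layout, a single UTF-8 encoding, a header row) map directly onto Python's standard `csv` module. This is a minimal sketch; the function name and the sample rows are made up for illustration.

```python
import csv
import io


def to_rfc4180_bytes(header, rows):
    """Serialise rows as RFC 4180-style CSV (CRLF line ends), encoded as UTF-8."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, lineterminator="\r\n", quoting=csv.QUOTE_MINIMAL)
    writer.writerow(header)   # always include a header row
    writer.writerows(rows)    # fields with commas or quotes are quoted
    return buffer.getvalue().encode("utf-8")


# Invented sample rows; the second city contains a comma and quotes.
header = ["exhibition_id", "city"]
rows = [[1, "Wien"], [2, 'St. Pölten, "Landesmuseum"']]
payload = to_rfc4180_bytes(header, rows)
print(payload.decode("utf-8"))
```

When serving the result over HTTP, the `.csv` extension and the `text/csv` content type from the list above still have to be set by the web server.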
37. www.adequate.at
CSV on the Web Summary
● CSV URLs
● CSVs link to other CSVs
● CSVs link to other resources
● RDF and JSON conversion
REFERENCES
● CSV on the Web Working Group
● CSV on the Web Community Group
● CSV on the Web GitHub Repository
● Tabular Data on the Web - An Introduction to CSV on the Web (Slides)
● Implementing CSV on the Web (Gregg Kellogg)
39. www.adequate.at
Contact
Jürgen Umbrich
Vienna University of Economics and Business
juergen.umbrich@wu.ac.at
Short CV: https://www.wu.ac.at/en/infobiz/team/umbrich/
Johann Höchtl
Donau-Universität Krems
johann.hoechtl@donau-uni.ac.at
Short CV: https://at.linkedin.com/in/johannhoechtl
http://adequate.at/ http://vienna.theodi.org
Martin Kaltenböck
Semantic Web Company
m.kaltenboeck@semantic-web.at
Short CV: https://www.linkedin.com/in/martinkaltenboeck