Presentation ADEQUATe Project: Workshop on Quality Assessment and Improvements in Open Data (Catalogues)

Presentation of the ADEQUATe project, given at the Workshop on Quality Assessment and Improvements in Open Data (Catalogues) during the annual Swiss open data conference (14 June 2016, Lausanne; see http://www.opendata.ch).
Workshop speakers / facilitators: Johann Höchtl (Danube University Krems), Jürgen Umbrich (University of Economics, Vienna), Martin Kaltenböck (Semantic Web Company).
More information: http://www.adequate.at

  1. Workshop on Quality Assessment and Improvements on Open Data (Portals)
     opendata.ch conference, 14.6.2016, 12:45 - 14:00 CEST
     Lausanne, Casino de Montbenon, Allée Ernest-Ansermet 3
     www.adequate.at
     Slides published under CC-BY AT 3.0
     Jürgen Umbrich, Vienna University of Economics and Business, juergen.umbrich@wu.ac.at
     Johann Höchtl, Donau-Universität Krems, johann.hoechtl@donau-uni.ac.at
     Martin Kaltenböck, Semantic Web Company, m.kaltenboeck@semantic-web.at
  2. Agenda
     ● 20' incl. Q&A: Welcome & Introduction (Martin Kaltenböck, SWC)
       ○ WS objectives, agenda & WS team
       ○ Participants
       ○ The ADEQUATe project: basics, objectives, status & outlook
     ● 20' incl. Q&A: Results of Requirements Elicitation, DQ Metrics and Interaction Items (Johann Höchtl, DUK)
       ○ What do the users want?
       ○ What are the most “important” ones? What are metrics specifically targeting openness?
       ○ Why data portal quality interaction items with end users, and what do we plan to do in ADEQUATe?
     ● 20' incl. Q&A: Best Practice & the ADEQUATe OD Framework (Jürgen Umbrich, WU)
       ○ Data & CSV on the Web working group recommendations (W3C)
       ○ AD Framework: architecture & components
     ● 15' open discussion: interactive & open discussion on DQ issues (moderated by the WS team)
       ○ Requirements for DQ in Open Data
       ○ What is in place or planned for DQ
  3. FFG Project (http://www.adequate.at)
     The ADEQUATe project is funded by the Federal Ministry for Transport, Innovation and Technology within the RTI programme “ICT of the Future” and is administered by the Austrian Research Promotion Agency (FFG) [project number: 849982].
  4. What is ADEQUATe?
     ADEQUATe Open Data: Analytics & Data Enrichment to improve the QUAliTy of Open Data builds on two observations: an increasing amount of Open Data is becoming available as an important resource for emerging businesses, and the integration of such open, freely re-usable data sources into organisations’ data warehouse and data management systems is seen as a key success factor for competitive advantage in a data-driven economy.
     The project identifies crucial issues that have to be tackled to fully exploit the value of open data and to integrate it efficiently with other data sources:
     ● the overall quality issues with metadata and the data itself
     ● the lack of interoperability between data sources
     The project's approach is to address these points at an early stage, when the open data is freshly provided by governmental organisations or others.
  5. What is ADEQUATe?
  6. What is ADEQUATe?
     ✓ 3 partners:
       1. Semantic Web Company
       2. Danube University Krems
       3. University of Economics Vienna
     ✓ 30 months project duration, Oct. 2015 - March 2018
     ✓ 2 use case partners: data.gv.at & opendataportal.at
     ✓ Objective: improvement of data quality through:
       ○ Quality assessment and monitoring
       ○ Automatic algorithms
       ○ Making use of Linked Data principles
       ○ Improvements of the data by the user (community)
  7. Project Structure & Schedule
     ● WP1 - Requirements & Specification
     ● WP2 - Quality Improvement & Monitoring Framework
     ● WP3 - Algorithms & Tools for Quality Improvements
     ● WP4 - Data Linkage
     ● WP5 - Community driven Quality Improvements
     ● WP6 - Use Case Integration
     ● WP7 - Project Management & Dissemination
  8. Outlook & Timing of Results
     ● M9 (06/2016): Quality metrics, Requirements
     ● M10 (07/2016): Architecture Blueprint
     ● M15 (12/2016): Quality monitoring framework, Data linkage
     ● M21 (06/2017): Quality improvements, Use case connection
     ● M30 (03/2018): Evaluation, Refinements, Improvements
  9. Concrete Outputs & Outlook
     ✓ End of June 2016: 3 deliverables
       ○ State of the Art
       ○ Requirements Elicitation
       ○ Quality Metrics
     ✓ End of July 2016: 1 deliverable
       ○ Architecture Blueprint
       ○ All components specified
     ✓ End of 2016: ADEQUATe Framework - 1st release
       ○ Assessment & Monitoring Framework
       ○ Data Quality Algorithms & Tools
       ○ Linked Data Mechanisms
       ○ 1st set of user driven Mechanisms
     ✓ Early 2017: Dock onto ODP & data.gv.at
  10. Requirements Elicitation, DQ metrics and Interaction items
  11. Results of Requirements Elicitation
  12. Results of Requirements Elicitation: Contents and Formats
     ○ I would really prefer to have the data themselves consistent. [...] metadata does not match; standards regarding the representation of their content
     ○ It would be really great if we could shift somehow to UTF-8
     ○ meta data for CSV files were incomplete [...] header for CSV was missing
     ○ no static identifiers for objects in data sets. This in turn leads to problems if you want to track changes related to these objects over time
  13. Results of Requirements Elicitation: Communication and Reliability
     Communication
     ○ central communication point for exchanging experiences and issues
     ○ Meta data should be written in English language
     Reliability
     ○ Servers are restarted every day [...] hosted data becomes unavailable
  14. DQ metrics (1)
     Completeness
     ● Metadata completeness: How many (mandatory) metadata keys have values?
     ● Table completeness: How many (CSV) cells have non-null values?
     Timeliness
     ● Tau of Data: How “outdated” are datasets, based on the promised update frequency?
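
The completeness and timeliness metrics on this slide could be approximated as follows. This is only a minimal sketch: the function names, the list of mandatory keys, the null markers, and the simple "missed the promised interval" check are assumptions for illustration. The slides do not give the exact formulas used in ADEQUATe, in particular not for "Tau of Data".

```python
import csv
import io
from datetime import date, timedelta

def metadata_completeness(metadata, mandatory_keys):
    """Share of mandatory metadata keys that carry a non-empty value."""
    if not mandatory_keys:
        return 1.0
    empty_values = (None, "", [], {})
    filled = sum(1 for key in mandatory_keys if metadata.get(key) not in empty_values)
    return filled / len(mandatory_keys)

def table_completeness(csv_text, null_markers=("", "NA", "null")):
    """Share of CSV data cells (header row excluded) that are non-null.

    The null markers are an assumption; real portals use many variants.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))[1:]
    cells = [cell.strip() for row in rows for cell in row]
    if not cells:
        return 0.0
    return sum(1 for cell in cells if cell not in null_markers) / len(cells)

def is_outdated(last_modified, update_every_days, today):
    """True if the dataset has missed its promised update interval."""
    return today > last_modified + timedelta(days=update_every_days)
```

For example, a metadata record with two of four mandatory keys filled scores 0.5, and a table with one empty cell out of four scores 0.75.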
  15. DQ metrics (2)
     Machine readability
     ● Regularity of CSV files (CSV-Lint), RDF, ...
     ● Structural consistency: variations in structure of CSV files
     Openness
     ● Open formats: no well-defined definition of what constitutes an open format
     ● Open licenses: opendefinition.org seems to have them all covered
     Persistence
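
One simple way to look at structural consistency is to count how many rows share the most common column count. The sketch below is an assumption about how such a check might look; it is not the CSV-Lint algorithm and not necessarily the metric ADEQUATe uses.

```python
import csv
import io
from collections import Counter

def structural_consistency(csv_text, delimiter=","):
    """Return (share of rows with the most common column count, that count).

    Blank lines are ignored; a score of 1.0 means every row has the same width.
    """
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    rows = [row for row in reader if row]
    if not rows:
        return 0.0, 0
    widths = Counter(len(row) for row in rows)
    expected_width, matching_rows = widths.most_common(1)[0]
    return matching_rows / len(rows), expected_width
```

A file with three 3-column rows and one 2-column row would score 0.75 with an expected width of 3.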
  16. DQ Metrics - Persistence?
  17. ADEQUATe: 11 Dimensions & 46 Metrics
  18. Contributors to DQ improvement: Publishers, Community, Algorithms & Linked Data
  19. Contributors to DQ Improvements (1/2)
     ● Providers
       ○ Correctness and completeness of data and metadata
       ○ SLAs governing availability
       ○ Readiness for feedback, discussion and interaction
     ● Algorithms
       ○ Automated improvements
         ■ Availability checks and reporting
         ■ Missing information, outliers
         ■ Check of format (valid UTF-8?), size
         ■ Data format conversions: CSV → CSV on the Web specification
       ○ Semi-automated improvements and enhancements
         ■ Identification of related data sets
         ■ Mapping of (data) attributes, ...
     ● Interaction with the Data Community
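
The "valid UTF-8?" check listed under automated improvements can be sketched in a few lines of Python. The function names and the Latin-1 fallback are assumptions for illustration; a real pipeline would need proper encoding detection rather than a fixed fallback.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

def to_utf8(data: bytes, fallback_encoding: str = "latin-1") -> bytes:
    """Re-encode to UTF-8, assuming the fallback encoding if the input is not UTF-8.

    The fallback is a guess (hypothetical default); it is not detected from the data.
    """
    if is_valid_utf8(data):
        return data
    return data.decode(fallback_encoding).encode("utf-8")
```

Applied to a Latin-1 encoded file, `to_utf8` returns bytes that decode as UTF-8 with the umlauts intact, provided the Latin-1 assumption actually holds.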
  20. Interaction: Data Community
     ● Control the results of automated enhancements
       ○ Interlinking
       ○ Format conversions
       ○ Encodings
     ● Correct and report mistakes
     ● Data enrichment and transformations
  21. Interaction: Data Community
     https://open.wien.gv.at/site/riesenbaum-in-wien-entdeckt/#more-87184
  22. Interaction: Forking: Identify - Improve - Share
  23. Interaction: Forking: Identify - Improve - Share
  24. Making results tangible
     Government Procurement Queries project: US Government contracts 2000 - 2016 (USAspending.gov)
     https://github.com/antontarasenko/gpq/blob/master/notebooks/contracts_intro.ipynb
  25. The ADEQUATe OD Framework & publishing CSVs for humans and machines
  26. The ADEQUATe Framework
     ● The ADEQUATe framework offers:
       ○ quality assessment and monitoring
       ○ a set of data quality improvement algorithms
       ○ a set of algorithms to create and maintain a knowledge graph and to “link” data into this graph
         ■ Think about shared identifiers for addresses, companies, departments, parties, ...
       ○ community involvement (e.g., data editors, feedback loops, forking & merging)
     ● Main objectives:
       ○ all developed components will be Open Source (see the ADEQUATe Github Repo)
       ○ components should be usable standalone
         ■ Use only what you need
  27. The ADEQUATe Framework
     ● Core components
       1. Data monitoring
       2. Knowledge Vault
       3. Quality Assessment
       4. Quality Improvement
       5. Data Linkage
       6. Community Improvement
       7. UI, API & user authentication
     Architecture diagram labels: Users; (Meta)Data Monitor; Knowledge Vault; Quality Assessment; Orchestration / API; Quality Improvement; Linkage; Community Improvement; Authentication / Load Balancing / UI; Public API; catalog; data.gv.at; ODP; Clients; RESTful API; Component; Data
  28. W3C CSV on the Web & ADEQUATe
     One core feature of ADEQUATe will be the use of the CSV on the Web metadata standard, which makes it possible to:
     ➢ describe CSV files
       ○ used dialect & encoding
       ○ table & column descriptions (with language tags)
       ○ data types and value ranges for columns
     ➢ add semantics to them
       ○ primary & foreign keys, URIs, entity types, ...
     ➢ validate CSV files against a predefined schema
     ➢ specify the transformation
       ○ CSV -> JSON or RDF
  29. W3C CSV on the Web: Metadata standard
  30. W3C CSV on the Web: Metadata standard
  31. W3C CSV on the Web: Metadata standard
  32. W3C CSV on the Web: Example (JSON-LD) 1/3
     {
       "@context": ["http://www.w3.org/ns/csvw", {"@language": "en"}],
       "url": "http://data.mumok.at/exhibition.csv",
       "dc:title": "Exhibitions for objects from the mumok collection",
       "dcat:keyword": ["art", "museum", "exhibition"],
       "dc:publisher": {
         "schema:name": "mumok - museum moderner kunst stiftung ludwig wien",
         "schema:url": {"@id": "http://www.mumok.at"}
       },
       "dc:license": {"@id": "https://creativecommons.org/licenses/by/3.0/at/legalcode"},
       "dc:modified": {"@value": "2015-07-04", "@type": "xsd:date"},
       ….
  33. W3C CSV on the Web: Example (JSON-LD) 2/3
       "dialect": {
         "encoding": "utf-8",
         "lineTerminators": ["\r\n", "\n"],
         "quoteChar": "\"",
         "doubleQuote": true,
         "skipRows": 0,
         "commentPrefix": "#",
         "header": true,
         "headerRowCount": 1,
         "delimiter": ",",
         "skipColumns": 0,
         "skipBlankRows": false,
         "skipInitialSpace": false,
         "trim": false
       },
  34. W3C CSV on the Web: Example (JSON-LD) 3/3
       "tableSchema": {
         "columns": [{
           "name": "exhibition_id",
           "titles": "Exhibition Identifier",
           "dc:description": "A unique identifier for the exhibition.",
           "datatype": "integer",
           "required": true
         }, {
           "name": "city",
           "titles": "City",
           "dc:description": "The city in which the exhibition took place (no language defined, mostly in German).",
           "datatype": "string"
         }
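
To illustrate what validating a CSV file against such a table schema could mean, here is a minimal Python sketch. It checks only the `required` flag and the `integer` datatype for the two columns shown on the slide and matches columns by position. `TABLE_SCHEMA`, `validate_rows`, and the sample data are hypothetical names and values for illustration; a full CSV on the Web validator covers far more of the standard.

```python
import csv
import io

# Column definitions copied from the slide's tableSchema (descriptions omitted).
TABLE_SCHEMA = {
    "columns": [
        {"name": "exhibition_id", "titles": "Exhibition Identifier",
         "datatype": "integer", "required": True},
        {"name": "city", "titles": "City", "datatype": "string"},
    ]
}

def _is_integer(value):
    """Return True if the cell text parses as an integer."""
    try:
        int(value)
        return True
    except ValueError:
        return False

def validate_rows(csv_text, schema):
    """Yield (line_number, column_name, message) for each violation found."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the single header row ("headerRowCount": 1)
    columns = schema["columns"]
    for line_number, row in enumerate(reader, start=2):
        for column, value in zip(columns, row):
            if value == "":
                if column.get("required"):
                    yield line_number, column["name"], "missing required value"
            elif column.get("datatype") == "integer" and not _is_integer(value):
                yield line_number, column["name"], "not an integer"
```

Running `list(validate_rows(text, TABLE_SCHEMA))` on a file with an empty or non-numeric `exhibition_id` reports the affected line and column, which is the kind of feedback a publisher could act on.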
  35. W3C CSV on the Web: Discovery
     ● Registered content type: application/csvm+json
     ● 3 discovery mechanisms
       ○ File extension
         ■ http://data.mumok.at/exhibition.csv -> http://data.mumok.at/exhibition.csv-metadata.json
       ○ Well-known location
         ■ /.well-known/csvm
       ○ Link HTTP header
     » curl -I http://data.mumok.at/exhibition.csv
     HTTP/1.1 200 OK
     Date: Thu, 26 Nov 2015 22:18:47 GMT
     Server: Apache/2.2.22 (Debian)
     ….
     Content-Length: 112723
     Content-Type: text/csv; charset=utf-8; header=present
     Link: </exhibition.csv-metadata.json>;rel=describedBy; type=application/csvm+json
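
A client could resolve the metadata location from the Link header or from the file-name rule roughly as sketched below. The function names are hypothetical, and the header parsing is deliberately naive (it splits on commas, which breaks for link targets that contain commas), so treat it as an illustration rather than a complete Link header parser.

```python
import re
from urllib.parse import urljoin

def metadata_url_from_link_header(csv_url, link_header):
    """Return the absolute URL of the rel=describedBy link target, or None."""
    for part in link_header.split(","):
        match = re.match(r"\s*<([^>]*)>(.*)", part, re.S)
        if not match:
            continue
        target, params = match.groups()
        if re.search(r'rel\s*=\s*"?describedby"?', params, re.I):
            # Relative targets are resolved against the URL of the CSV file.
            return urljoin(csv_url, target)
    return None

def default_metadata_url(csv_url):
    """File-name rule from the slide: append '-metadata.json' to the CSV URL."""
    return csv_url + "-metadata.json"
```

With the header from the curl output above, both functions point to http://data.mumok.at/exhibition.csv-metadata.json.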
  36. CSV on the Web Summary
     ● Don't publish CSV on the Web only for humans; publish it for machines as well
       ○ e.g., Excel exports
     ● RFC 4180
     ● Encoding
       ○ Use UTF-8, don't mix encodings
     ● File extension: .csv
     ● Content-type: text/csv
     Optional, but a big improvement:
     ● Ideally, publish CSV metadata along with your CSV file
     ● Avoid acronyms or encodings (e.g., sex=1,2,3)
  37. CSV on the Web Summary
     ● CSV URLs
     ● CSVs link to other CSVs
     ● CSVs link to other resources
     ● RDF and JSON conversion
     REFERENCES
     ● CSV on the Web Working Group
     ● CSV on the Web Community Group
     ● CSV on the Web Github Repository
     ● Tabular Data on the Web - A Introduction to CSV on the Web (Slides)
     ● Implementing CSV on the Web (Gregg Kellogg)
  38. Announcements & Pointers
     ● @adequate_od
     ● 17-19 May 2017, Danube University Krems
     ● 30.8.-02.09.2016, Helsinki
  39. Contact
     ● Jürgen Umbrich, Vienna University of Economics and Business, juergen.umbrich@wu.ac.at
       Short CV: https://www.wu.ac.at/en/infobiz/team/umbrich/
     ● Johann Höchtl, Donau-Universität Krems, johann.hoechtl@donau-uni.ac.at
       Short CV: https://at.linkedin.com/in/johannhoechtl
     ● Martin Kaltenböck, Semantic Web Company, m.kaltenboeck@semantic-web.at
       Short CV: https://www.linkedin.com/in/martinkaltenboeck
     http://adequate.at/
     http://vienna.theodi.org
