Are we there yet?

782 views

An Open Data Metadata quality checker

Published in: Education, Technology
  • Hi rossdjones, I chose a brute-force approach for now: if the API takes too long and times out, I retry up to three times when fetching data. Some queries seem to take a while and need the database or engine to warm up, but this way I could reliably get all the data. Thank you for your advice!
  • I don't know if this would help you work around the API timeouts, but at data.gov.uk we provide a JSON data dump every week: http://data.gov.uk/data/dumps/. Perhaps this would make the analysis easier?
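The brute-force approach from the comments (retry a timed-out fetch at most three times) could look roughly like the Go sketch below. It is not taken from the ogdat sources: `fetchWithRetry`, `errTimeout`, and the simulated `slowPortal` are hypothetical names, and the fetch function is passed in so the example runs without contacting a real CKAN portal.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTimeout stands in for a timed-out API call (hypothetical sentinel).
var errTimeout = errors.New("request timed out")

// fetchWithRetry calls fetch up to maxAttempts times and returns the first
// successful result. It sleeps for pause between failed attempts.
func fetchWithRetry(maxAttempts int, pause time.Duration, fetch func() ([]byte, error)) ([]byte, error) {
	if maxAttempts < 1 {
		return nil, errors.New("maxAttempts must be at least 1")
	}
	var lastErr error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		data, err := fetch()
		if err == nil {
			return data, nil
		}
		lastErr = err
		if attempt < maxAttempts {
			time.Sleep(pause)
		}
	}
	return nil, fmt.Errorf("giving up after %d attempts: %w", maxAttempts, lastErr)
}

func main() {
	calls := 0
	// Simulated portal that only answers on the third call,
	// mimicking an engine that needs to warm up.
	slowPortal := func() ([]byte, error) {
		calls++
		if calls < 3 {
			return nil, errTimeout
		}
		return []byte(`{"result": []}`), nil
	}

	data, err := fetchWithRetry(3, 10*time.Millisecond, slowPortal)
	fmt.Println(string(data), err, calls)
}
```

In a real watcher the injected function would wrap an HTTP request made with a client that has a timeout set; a weekly JSON dump, where a portal offers one, would avoid the retries altogether.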

Are we there yet?

  1. Are we there yet?
  2. What?
     An Open (Govt.) Data Monitoring Tool
     – Metadata quality and consistency
     – Benchmarking: who fixed what and how fast?
     – Is the data still there?
  3. Why?
     ● Dangling URLs into Nirvana
       – Data is meant to stay
     ● (Meta-)data is required to be consistent in order to be useful
     ● Tendency to give without monitoring
       – Metadata decoupled from data
       – Question of responsibility
  4. How?
     ● Watcher
       – Get all metadata from the CKAN data portal (legacy API calls)
       – Analyse metadata and URLs
       – Write the result into a staging database (SQL)
       – Watch for new / changed datasets
     ● Analyser
       – Perform analysis on the staging area (partly long-running and tedious), write the result into Redis
         ● Who has released the most data? EASY!
         ● Who uploaded which datasets, and when?
         ● Who fixed the most mistakes during the last week?
         ● Who has the longest outstanding bugs?
         ● Which datasets are no longer available?
  5. How? ctd.
     ● Presentation
       – Make some fancy display from the Redis results
       – Data drill-down
       – What else?
  6. Architecture
     ● Heroku PaaS
     ● PostgreSQL data store
     ● Redis for ephemeral data
     ● Application logic in Go
     ● Front-end using Bootstrap & AngularJS
  7. What's there
     ● Metadata spec, machine readable: http://htmlpreview.github.io/?https://github.com/the42/ogdat/blob/master/ppogdatspec/ogdat_s (automated conversion process from PDF [sic!])
     ● Watcher: stable
     ● Analyser: work in progress
     ● Presentation layer: HELP
  8. Show me and I believe
     ● Uhm … nothing fancy yet
     ● Business logic & server processes
     ● Source: https://github.com/the42/ogdat/
  9. Lessons learned
     ● There are many (minor) issues with metadata
     ● Heroku is easy to get going
     ● Go as a novel language is easy to develop in
       – Built-in concurrency features come in handy when checking e.g. URLs in parallel
     ● CKAN API@data.gv.at is not that fast and times out
  10. Contact
      Johann Höchtl
      johann.hoechtl@gmail.com
      @myprivate42
      http://www.slideshare.net/jhoechtl/
      https://www.facebook.com/myprivate42
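
The "Lessons learned" slide notes that Go's built-in concurrency helps when checking URLs in parallel. A minimal sketch of that idea, assuming a fixed pool of worker goroutines fed through a channel, follows. The names `checkAll`, `headCheck`, and `errDangling` are made up for illustration and do not come from the ogdat repository; the demo in `main` talks to a local test server, not to a real data portal.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
	"time"
)

// errDangling marks a URL that could not be resolved (hypothetical sentinel).
var errDangling = errors.New("dangling URL")

// headCheck returns a checker that issues a HEAD request and reports
// errDangling on transport errors or on status codes of 400 and above.
func headCheck(client *http.Client) func(string) error {
	return func(url string) error {
		resp, err := client.Head(url)
		if err != nil {
			return fmt.Errorf("%w: %v", errDangling, err)
		}
		resp.Body.Close()
		if resp.StatusCode >= 400 {
			return fmt.Errorf("%w: status %d", errDangling, resp.StatusCode)
		}
		return nil
	}
}

// checkAll runs check on every URL using the given number of worker
// goroutines and returns the outcome per URL (nil means reachable).
func checkAll(urls []string, workers int, check func(string) error) map[string]error {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan string)
	results := make(map[string]error, len(urls))
	var mu sync.Mutex // guards results
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for u := range jobs {
				err := check(u)
				mu.Lock()
				results[u] = err
				mu.Unlock()
			}
		}()
	}
	for _, u := range urls {
		jobs <- u
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	// Local stand-in for a data portal: /gone answers 404, everything else 200.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path == "/gone" {
			http.NotFound(w, r)
			return
		}
		w.WriteHeader(http.StatusOK)
	}))
	defer srv.Close()

	client := &http.Client{Timeout: 5 * time.Second}
	urls := []string{srv.URL + "/data.csv", srv.URL + "/gone"}
	for u, err := range checkAll(urls, 4, headCheck(client)) {
		fmt.Println(u, err)
	}
}
```

Passing the checker in as a function keeps the worker pool independent of HTTP, so the same pool could run other per-dataset checks. Some servers reject HEAD requests, so a real checker might need to fall back to GET.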
