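Putting Open Data on Open Source Platforms: Development Experience with CKAN, the Open Source Data Portal Platform
1. Putting Open Data on Open Source Platforms
Development Experience with CKAN, the Open Source Data Portal Platform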
@ FOSS and Project Collaboration (Spring 2015)
This work is licensed under a
Creative Commons Attribution-ShareAlike 3.0 Taiwan License.
Presenter: 李承錱 Cheng-Jen Lee (Sol)
Email: cjlee AT iis.sinica.edu.tw
2. 2
About Me
●
Sol, @u10313335
●
Institute of Information Science, Academia Sinica
●
https://about.me/SolLee
●
Python / R / Java
●
Focused Areas
– CMS
– Data Repository
– Open Data
– *nix System Administration
3. 3
Agenda
●
Open Data and Open Data Portals
●
About CKAN
●
CKAN and 5 Open Data★
●
Experiences
●
Contribution: What and How?
4. 4
Open Data and Open Data Portals
●
Open Data
– The idea that certain data should be freely available to
everyone to use and republish as they wish, without
restrictions from copyright, patents or other
mechanisms of control [1].
●
Open Data Portals
– Facilitate access to and re-use of public sector
information [2].
– “Infrastructure” of open data
1. Wikipedia: open data https://en.wikipedia.org/wiki/Open_data
2. Open Data Portals - Digital Agenda for Europe
http://ec.europa.eu/digital-agenda/en/open-data-portals
8. 8
The Most Popular Platform for
Open Data
116 instances
around the world
in March 2015
http://ckan.org/instances
9. 9
The Most Popular Platform for
Open Data
●
Widely used in government data portals
– In EU member states, 30% of open data portals have adopted
CKAN (OpenDataMonitor [1], March 2015)
●
Workflow support for publishing data
●
Data Visualization
●
100+ Extensions
●
Powerful APIs
●
Open-sourced (AGPLv3)
1. http://www.opendatamonitor.eu
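The "Powerful APIs" bullet refers to CKAN's Action API. Below is a minimal sketch of how a client might build a package_search request and read the response envelope. The portal address and the sample payload are made up for illustration, and no network request is sent.

```python
import json
from urllib.parse import urlencode

# Hypothetical portal address, for illustration only.
PORTAL = "https://demo.example.org"


def package_search_url(portal, query, rows=10):
    """Build the URL of a package_search call on CKAN's Action API (v3)."""
    params = urlencode({"q": query, "rows": rows})
    return f"{portal}/api/3/action/package_search?{params}"


def dataset_names(response_text):
    """Extract dataset names from a package_search JSON response."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise RuntimeError("CKAN reported an error")
    return [pkg["name"] for pkg in payload["result"]["results"]]


# Made-up response in the general shape the Action API returns.
SAMPLE = json.dumps({
    "success": True,
    "result": {"count": 1, "results": [{"name": "example-dataset"}]},
})

url = package_search_url(PORTAL, "water quality", rows=5)
names = dataset_names(SAMPLE)
```

In real use the URL would be fetched with an HTTP client and the response body passed to dataset_names.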
33. 33
CKAN and 5 Open Data★ [1]
1. Tim Berners-Lee, “Linked Data”
http://www.w3.org/DesignIssues/LinkedData.html
34. 34
CKAN and 5 Open Data★
●
★ Make your stuff available on the Web (whatever
format) under an open license
Customizable licenses
35. 35
CKAN and 5 Open Data★
●
★★ Make it available as structured data (e.g., Excel
instead of image scan of a table)
★★★ Use non-proprietary formats (e.g., CSV instead of
Excel)
– Upload any data format
– Data API
●
Get records from
structured data
Data API
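As a rough illustration of "get records from structured data": resources stored in CKAN's DataStore can be queried through the datastore_search action. The sketch below only builds the request URL and parses a response; the portal address, resource id and filter column are hypothetical.

```python
import json
from urllib.parse import urlencode


def datastore_search_url(portal, resource_id, limit=5, filters=None):
    """Build a datastore_search URL for one structured resource."""
    params = {"resource_id": resource_id, "limit": limit}
    if filters:
        # Filters are passed as a JSON object, e.g. {"year": 2015}.
        params["filters"] = json.dumps(filters, sort_keys=True)
    return f"{portal}/api/3/action/datastore_search?{urlencode(params)}"


def records(response_text):
    """Return the list of rows from a datastore_search JSON response."""
    return json.loads(response_text)["result"]["records"]


# Hypothetical portal and resource id, for illustration only.
plain_url = datastore_search_url("https://demo.example.org", "abc-123")
filtered_url = datastore_search_url(
    "https://demo.example.org", "abc-123", filters={"year": 2015})
```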
36. 36
CKAN and 5 Open Data★
●
★★★★ Use open standards from W3C (RDF and SPARQL) to
identify things, so that people can point at your stuff
●
★★★★★ Link your data to other data to provide context
– Built-in RDF exporting capabilities
– Expose or consume metadata from other catalogs using RDF
(DCAT) documents [1]
●
ckanext-qa [2]: Check the openness of datasets or resources
1. Supported by ckanext-dcat extension
2. https://github.com/ckan/ckanext-qa
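To make the five levels concrete, here is a deliberately simplified rating function. It is an illustration of the scale described on these slides, not the scoring logic of ckanext-qa; the format sets and the links_to_other_data flag are assumptions.

```python
# Simplified illustration of the 5-star scale; NOT the algorithm used by ckanext-qa.
STRUCTURED = {"xls", "xlsx", "csv", "json", "xml", "rdf", "ttl"}
NON_PROPRIETARY = {"csv", "json", "xml", "rdf", "ttl"}
RDF_FORMATS = {"rdf", "ttl"}


def star_rating(fmt, open_license, links_to_other_data=False):
    """Return 0-5 stars for a resource format under the 5-star scheme."""
    fmt = fmt.lower()
    if not open_license:
        return 0  # every level requires an open license
    stars = 1  # on the Web under an open license, whatever the format
    if fmt in STRUCTURED:
        stars = 2  # structured data
    if fmt in NON_PROPRIETARY:
        stars = 3  # non-proprietary format
    if fmt in RDF_FORMATS:
        stars = 4  # W3C open standards
        if links_to_other_data:
            stars = 5  # linked to other data for context
    return stars
```

For example, a scanned PDF under an open license rates one star here, an Excel file two and a CSV file three.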
40. 40
Customizations for Taijiang.tw
●
Custom Metadata
●
Data Visualization
●
Custom filters
●
Harvest
●
Localization
●
Source Code Released under AGPLv3 (On GitHub: u10313335)
– ckanext-taijiang
– ckanext-spatial
– taijiang-ckan-translations
– taijiang-bulk-uploader
41. 41
Custom Metadata
●
Extension: ckanext-scheming [1]
– Configure and share CKAN schemas using a JSON
schema description.
– Custom template snippets for editing and displaying
fields.
Template Name          Function
text.html              a simple text field for free-form text
large_text.html        a larger text field
date.html              a date widget
markdown.html          a markdown field
select.html            a select box
multiple_choice.html   a group of checkboxes
repeating.html         a group of repeating fields
1. https://github.com/open-data/ckanext-scheming, only for CKAN 2.3+
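A sketch of what such a JSON schema description might look like, built here as a Python dict and serialized to JSON. The top-level keys follow the ckanext-scheming layout as far as I know it; the specific fields (survey_date, theme) are hypothetical and not taken from Taijiang.tw.

```python
import json

# Illustrative schema; field names below are hypothetical examples.
schema = {
    "scheming_version": 1,
    "dataset_type": "dataset",
    "dataset_fields": [
        {"field_name": "title", "label": "Title",
         "form_snippet": "text.html"},
        {"field_name": "notes", "label": "Description",
         "form_snippet": "markdown.html"},
        {"field_name": "survey_date", "label": "Survey date",
         "form_snippet": "date.html"},
        {"field_name": "theme", "label": "Theme",
         "form_snippet": "select.html",
         "choices": [
             {"value": "history", "label": "History"},
             {"value": "ecology", "label": "Ecology"},
         ]},
    ],
    "resource_fields": [
        {"field_name": "url", "label": "URL"},
        {"field_name": "name", "label": "Name"},
    ],
}

# The JSON text is what would be saved as the schema file.
schema_json = json.dumps(schema, indent=2)
```

Each form_snippet value picks one of the template snippets listed in the table above.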
44. 44
Validator and Converter
●
Validator
– Validate user inputs
– Ex. json_validator
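    import json
    from ckan.plugins.toolkit import Invalid

    def json_validator(value, context):
        # An empty value is accepted as-is
        if value == '':
            return value
        try:
            json.loads(value)
        except ValueError:
            # Reject anything that does not parse as JSON
            raise Invalid('Invalid JSON')
        return value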
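To try the validator outside CKAN, the sketch below swaps in a stand-in Invalid exception (an assumption made so the code runs on its own) and wraps the call in a hypothetical check helper that reports the error message.

```python
import json


class Invalid(Exception):
    """Stand-in for CKAN's Invalid exception so the sketch runs without CKAN."""


def json_validator(value, context):
    if value == '':
        return value
    try:
        json.loads(value)
    except ValueError:
        raise Invalid('Invalid JSON')
    return value


def check(value):
    """Hypothetical helper: None if the value passes, else the error message."""
    try:
        json_validator(value, context={})
    except Invalid as err:
        return str(err)
    return None
```

Here check('["a", "b"]') passes, while a malformed string such as '{oops' is rejected with the message "Invalid JSON".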
45. 45
Validator and Converter
●
Converter
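– Convert data for storage
– Ex. duplicate_validator
    import json

    def duplicate_validator(key, data, errors, context):
        # Skip conversion if earlier validators reported errors
        if errors[key]:
            return
        value = json.loads(data[key])
        # Drop duplicates (element order is not preserved)
        unduplicated = list(set(value))
        data[key] = json.dumps(unduplicated)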
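Because set() does not keep element order, the stored list may come back shuffled. If order matters, one possible variant (a sketch, not the code used on Taijiang.tw) removes duplicates with dict.fromkeys instead. The field name and input below are hypothetical.

```python
import json


def dedupe_converter(key, data, errors, context):
    """Variant of the converter above that keeps first-seen order."""
    if errors[key]:
        return
    values = json.loads(data[key])
    # dict.fromkeys keeps the first occurrence of each element, in order.
    data[key] = json.dumps(list(dict.fromkeys(values)))


key = ('keywords',)  # hypothetical field key
data = {key: '["map", "history", "map"]'}
errors = {key: []}
dedupe_converter(key, data, errors, context={})
```

Like the set-based version, this assumes the list elements are hashable (strings or numbers).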
46. 46
Data Visualization
●
There is no viewer for some GIS formats
– WMTS services
– ESRI Shapefile (*.shp and *.dbf)
●
Do It Ourselves!
– wmts_view
– shp_view
47. 47
Write a CKAN Plugin
●
PyUtilib Component Architecture (PCA)
●
Inherits from
– ckan.plugins.SingletonPlugin
●
Implements
– one (or several) ckan.plugins.* interfaces
48. 48
To Build a "viewer"
●
We need more…
– View template (Jinja template engine)
– JavaScript module
●
Ex. Shapefile preview includes shp2geojson.js [1].
1. http://gipong.github.io/shp2geojson.js/ (Released under MIT license)
49. 49
Example: Plugin for SHP Preview
Python Plugin:
    from ckan import plugins as p

    class SHPView(p.SingletonPlugin):
        p.implements(p.IResourceView, inherit=True)

        # self.SHP (accepted formats), self.same_domain and
        # self.proxy_is_enabled are defined elsewhere in the plugin
        # and are not shown on the slide.

        def info(self):
            # Metadata CKAN uses to register and label the view
            return {'name': 'shp_view',
                    'title': 'shp',
                    'icon': 'map-marker',
                    'iframed': True,
                    'default_title': 'SHP',
                    }

        def can_view(self, data_dict):
            resource = data_dict['resource']
            format_lower = resource['format'].lower()
            if format_lower in self.SHP:
                return self.same_domain or self.proxy_is_enabled
            return False

        def view_template(self, context, data_dict):
            # Body not shown on the slide; the template name is assumed
            # from the "View Template (shp.html)" label.
            return 'shp.html'

View Template (shp.html):
    <div data-module="shppreview" id="data-preview"
         data-module-map_config="{{ h.dump_json(map_config) }}"></div>

JS Module (shp_view.js):
    // shapefile preview module
    ckan.module('shppreview', function (jQuery, _) {
      return {
        initialize: function () {
          …
        },
        showPreview: function (url, data) {
          …
        }
      };
    });
52. 52
Custom Filters
●
Find Datasets by
– Time period
– Self-defined categories
●
A New Plugin
– For Time Search
●
Implement IPackageController.before_search
– For Self-defined Categories
●
Implement IPackageController.before_index and
IFacets.dataset_facets
– Both need new field definitions in the Solr schema
53. 53
Example: Plugin for Time Search
Python Plugin:
    from ckan import plugins as p

    class TaijiangDatasets(p.SingletonPlugin):
        p.implements(p.IPackageController, inherit=True)
        p.implements(p.IFacets)

        def before_search(self, search_params):
            # …
            # parse_date is a helper that is not shown on the slide
            begin = parse_date(search_params['extras']['ext_begin_date'])
            end = parse_date(search_params['extras']['ext_end_date'])
            # ...
            # Match datasets whose time span overlaps [begin, end]
            query = ("(start_time: [* TO {0}Z] AND end_time: [{0}Z TO *]) OR "
                     "(start_time: [{0}Z TO {1}Z] AND end_time: [{0}Z TO *])")
            query = query.format(begin.isoformat(), end.isoformat())
            search_params['q'] = query
            return search_params

        def dataset_facets(self, facets_dict, package_type):
            # Add a facet with a translatable label
            facets_dict['date_facet'] = p.toolkit._('Date of Dataset')
            return facets_dict

Solr Schema:
    <dynamicField
        name="*_time"
        type="date"
        indexed="true"
        stored="true"
        multiValued="false"/>
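The two-clause query in before_search selects datasets whose [start_time, end_time] span overlaps the requested period. Assuming begin is not later than end, the same condition can be written as a single clause: the dataset starts on or before end and finishes on or after begin. A standalone sketch of that simplification follows; the function name is hypothetical and naive datetimes are assumed to be in UTC.

```python
from datetime import datetime


def overlap_query(begin, end):
    """Build a Solr range query for spans overlapping [begin, end].

    Assumes begin <= end and that both are naive UTC datetimes.
    """
    b = begin.isoformat()
    e = end.isoformat()
    return f"start_time:[* TO {e}Z] AND end_time:[{b}Z TO *]"


query = overlap_query(datetime(2015, 1, 1), datetime(2015, 12, 31))
```

The resulting string could be assigned to search_params['q'] in the same way as the original query.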
63. 63
What to Contribute?
●
CKAN Core Features
– Time and spatial search for private datasets
– Publish datasets as a catalogue service (Ex. CSW)
– Web interface for bulk uploads
– A simplified deployment process
– Issues on GitHub: https://github.com/ckan/ckan/issues
– More ideas:
https://github.com/ckan/ideas-and-roadmap
65. 65
What to Contribute?
●
More Functions for Using Data in Web Browser
– Audio and video playback (Ex. integrate plyr.io)
– Link to third-party services [1], like Shiny [2] (R-based) or
IPython Notebook (Python-based)
1. http://www.data.gov/meta/open-apps/
2. https://github.com/ckan/ideas-and-roadmap/issues/35
66. 66
What to Contribute?
●
Rebuild data.g0v.tw with CKAN?
● data.g0v.tw (零時資料中心, the g0v data center)
– Built with DKAN (A CKAN clone for Drupal)
●
Problems of DKAN
– Development is much slower than CKAN
– Lacks features introduced in later versions of CKAN
●
Ex. Multiple persistent views of data (In CKAN 2.3)
– Most gov sites in TW use (or will use) CKAN instead of
DKAN
67. 67
How to Contribute?
●
CKAN Core: ckan/ckan (GitHub)
●
Most plugins are also available on GitHub
– http://extensions.ckan.org/
●
Development Discussions (Mailing List)
– https://lists.okfn.org/mailman/listinfo/ckan-dev
●
Contributing Guide
– http://docs.ckan.org/en/latest/contributing/index.html
68. 68
Thanks for your attention!
Any Q?
Email: cjlee AT iis.sinica.edu.tw
Profile: http://about.me/sollee
Google Groups: CKAN Taiwan Interest Group
Editor's Notes
Geographic information map data cloud service platform
Supported by the Open Knowledge Foundation
Published URL
Store the raw data and metadata. Visualise structured data with interactive tables, graphs and maps.