Putting Open Data on Open Source Platforms
Sharing Our Development Experience with CKAN, an Open Source Data Portal Platform
@ FOSS and Project Collaboration (Spring 2015)
This work is licensed under a
Creative Commons Attribution-ShareAlike 3.0 Taiwan License.
Presenter: 李承錱 Cheng-Jen Lee (Sol)
Email: cjlee AT iis.sinica.edu.tw
About Me
● Sol, @u10313335
● Institute of Information Science, Academia Sinica
● https://about.me/SolLee
● Python / R / Java
● Focused Areas
– CMS
– Data Repository
– Open Data
– *nix System Administration
Agenda
● Open Data and Open Data Portals
● About CKAN
● CKAN and 5★ Open Data
● Experiences
● Contribution: What and How?
Open Data and Open Data Portals
● Open Data
– The idea that certain data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents, or other mechanisms of control[1].
● Open Data Portals
– Facilitate access to and re-use of public sector information[2].
– The “infrastructure” of open data
1. Wikipedia: Open data, https://en.wikipedia.org/wiki/Open_data
2. Open Data Portals, Digital Agenda for Europe, http://ec.europa.eu/digital-agenda/en/open-data-portals
About CKAN
CKAN
● The Comprehensive Knowledge Archive Network
● A powerful data management system for
– Publishing
– Sharing
– Finding
– Using data
Screenshot
The Most Popular Platform for Open Data
116 instances around the world as of March 2015
http://ckan.org/instances
The Most Popular Platform for Open Data
● Widely used for government data portals
– In EU member states, 30% of open data portals have adopted CKAN (OpenDataMonitor[1], March 2015)
● Workflow support for publishing data
● Data visualization
● 100+ extensions
● Powerful APIs
● Open source (AGPLv3)
1. http://www.opendatamonitor.eu
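The "Powerful APIs" point refers mainly to CKAN's Action API: each operation is exposed at /api/3/action/<action name>, and every JSON response uses the same envelope, a success flag plus either result or error. The Python sketch below (standard library only) builds an Action API URL and unwraps a response. The site URL, the dataset name my-dataset, and the sample response text are placeholders, and nothing is sent over the network.

```python
import json
from urllib.parse import urlencode


class CKANAPIError(Exception):
    """Raised when an Action API response reports success == false."""


def action_url(site_url, action, **params):
    # Every Action API call lives under /api/3/action/<action_name>
    url = '{0}/api/3/action/{1}'.format(site_url.rstrip('/'), action)
    if params:
        url += '?' + urlencode(params)
    return url


def unwrap(response_text):
    # Shared envelope: {"success": true, "result": ...}
    # or {"success": false, "error": {...}}
    body = json.loads(response_text)
    if not body.get('success'):
        raise CKANAPIError(body.get('error'))
    return body['result']


# Usage with a hand-written response (placeholder values)
print(action_url('https://demo.ckan.org/', 'package_show', id='my-dataset'))
# https://demo.ckan.org/api/3/action/package_show?id=my-dataset
sample = '{"help": "...", "success": true, "result": {"name": "my-dataset"}}'
print(unwrap(sample)['name'])  # my-dataset
```

Keeping URL building and response unwrapping separate means any HTTP client can sit in between.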
United Kingdom: DATA.GOV.UK
United States: DATA.GOV
Japan: DATA.GO.JP
European Union: PUBLICDATA.EU
Tainan City: DATA.TAINAN.GOV.TW
Nantou County: DATA.NANTOU.GOV.TW
Hsinchu City: OPENDATA.HCCG.GOV.TW
Taipei City: DATA.TAIPEI
Taijiang Inner Sea Research Datasets (台江內海研究資料集): TAIJIANG.TW
Demo Site: demo.ckan.org
Publish Datasets
① Add dataset information
② Add data under the dataset
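Both publishing steps can also be done in a single API call, package_create, with the resources listed inside the dataset. The following is a minimal sketch, assuming CKAN 2.x, which accepts the user's API key in the Authorization header. The site URL, API key, organization, and dataset values are placeholders, and the request is only built, not sent.

```python
import json
import urllib.request


def build_package_create(site_url, api_key, dataset):
    """Prepare (but do not send) a package_create request."""
    url = site_url.rstrip('/') + '/api/3/action/package_create'
    return urllib.request.Request(
        url,
        data=json.dumps(dataset).encode('utf-8'),
        headers={'Content-Type': 'application/json',
                 # CKAN 2.x reads the user's API key from this header
                 'Authorization': api_key},
        method='POST',
    )


# Step 1 (dataset information) and step 2 (data under the dataset) together;
# every value here is a placeholder.
dataset = {
    'name': 'example-dataset',  # URL slug: lowercase letters, digits, - and _
    'title': 'Example Dataset',
    'owner_org': 'example-org',
    'notes': 'Description shown on the dataset page.',
    'resources': [
        {'name': 'Data (CSV)', 'url': 'http://example.com/data.csv',
         'format': 'CSV'},
    ],
}
req = build_package_create('http://localhost:5000', 'MY-API-KEY', dataset)
# urllib.request.urlopen(req) would actually submit it
```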
Find Datasets
● By keyword
● By location
● By filters
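The three search modes map onto package_search parameters: q for keywords, fq (a Solr filter query) for filters such as res_format or tags, and, for location, the ext_bbox parameter (minx,miny,maxx,maxy) accepted by ckanext-spatial's spatial search. This is a sketch that assumes ckanext-spatial is installed for the bounding-box part. The site URL and coordinates are made up.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def package_search_url(site_url, q='', filters=None, bbox=None, rows=10):
    """Build a package_search URL for keyword, filter and location search."""
    params = {'q': q, 'rows': rows}
    if filters:
        # Each (field, value) pair becomes one Solr filter clause
        params['fq'] = ' AND '.join(
            '{0}:"{1}"'.format(field, value) for field, value in filters)
    if bbox:
        # ckanext-spatial bounding box: minx,miny,maxx,maxy (lon/lat)
        params['ext_bbox'] = ','.join(str(c) for c in bbox)
    return (site_url.rstrip('/') + '/api/3/action/package_search?'
            + urlencode(params))


url = package_search_url('https://data.example.org', q='water',
                         filters=[('res_format', 'CSV')],
                         bbox=(120.0, 22.9, 120.3, 23.2))
params = parse_qs(urlparse(url).query)
print(params['fq'])        # ['res_format:"CSV"']
print(params['ext_bbox'])  # ['120.0,22.9,120.3,23.2']
```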
Data Preview and Visualization
● recline_view (CSV, XLS)
– Grid
– Graph
– Lat/Long fields (map)
● wms_preview
● geojson_preview
● Documents: recline_view, text_view, json_view, pdf_view, webpage_view, officedocs_view...
● Pictures: image_view
● And more!
Authorization
● Organizations, e.g., http://opendata.hccg.gov.tw/organization
Data Exchange
● Harvest and federation
CKAN and 5★ Open Data[1]
1. Tim Berners-Lee, “Linked Data”, http://www.w3.org/DesignIssues/LinkedData.html
CKAN and 5★ Open Data
● ★ Make your stuff available on the Web (whatever format) under an open license
– CKAN: customizable licenses
CKAN and 5★ Open Data
● ★★ Make it available as structured data (e.g., Excel instead of an image scan of a table)
● ★★★ Use non-proprietary formats (e.g., CSV instead of Excel)
– CKAN: upload data in any format
– CKAN: Data API, which gets records from structured data
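The Data API is the DataStore's datastore_search action. Once a CSV or Excel resource has been loaded into the DataStore, its rows can be queried by resource_id, with optional exact-match filters and a limit. The sketch below builds such a URL and reads the column names and records out of a response. It assumes the DataStore extension is enabled; the resource id and the sample response are invented.

```python
import json
from urllib.parse import urlencode


def datastore_search_url(site_url, resource_id, filters=None, limit=5):
    """Build a datastore_search URL (DataStore extension must be enabled)."""
    params = {'resource_id': resource_id, 'limit': limit}
    if filters:
        # Exact-match filters are passed as a JSON object
        params['filters'] = json.dumps(filters)
    return (site_url.rstrip('/') + '/api/3/action/datastore_search?'
            + urlencode(params))


def records_from(response_text):
    """Return (column names, rows) from a datastore_search response."""
    result = json.loads(response_text)['result']
    columns = [field['id'] for field in result['fields']]
    return columns, result['records']


# Invented sample response for illustration
sample = json.dumps({'success': True, 'result': {
    'fields': [{'id': '_id', 'type': 'int'}, {'id': 'site', 'type': 'text'}],
    'records': [{'_id': 1, 'site': 'Anping'}],
    'total': 1}})
columns, rows = records_from(sample)
print(columns)  # ['_id', 'site']
```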
CKAN and 5★ Open Data
● ★★★★ Use open standards from W3C (RDF and SPARQL) to identify things, so that people can point at your stuff
● ★★★★★ Link your data to other data to provide context
– Built-in RDF export
– Expose or consume metadata from other catalogs as RDF (DCAT) documents[1]
● ckanext-qa[2]: checks the openness of datasets or resources
1. Supported by the ckanext-dcat extension
2. https://github.com/ckan/ckanext-qa
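To make the star levels concrete, here is a toy scorer. The format-to-stars table is a hypothetical simplification written for this illustration. It is not ckanext-qa's rule set, which should be consulted for real scoring.

```python
# Hypothetical, simplified format -> stars table (illustration only)
STARS_BY_FORMAT = {
    'pdf': 1, 'jpg': 1, 'png': 1,     # on the Web, but not structured
    'xls': 2, 'xlsx': 2,              # structured, proprietary format
    'csv': 3, 'json': 3, 'xml': 3,    # structured, non-proprietary format
    'rdf': 4, 'ttl': 4, 'jsonld': 4,  # W3C open standards (RDF)
}


def openness_stars(fmt, open_license, links_to_other_data=False):
    """Toy 0-5 score following the five-star scheme."""
    if not open_license:
        return 0  # even one star requires an open license
    stars = STARS_BY_FORMAT.get(fmt.lower(), 1)  # unknown formats: one star
    if stars == 4 and links_to_other_data:
        stars = 5  # RDF that links to other data earns the fifth star
    return stars


print(openness_stars('CSV', open_license=True))   # 3
print(openness_stars('CSV', open_license=False))  # 0
```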
Experiences
System Architecture
Installation
● Official documentation:
– http://docs.ckan.org/en/latest/
● Installation notes (in Chinese):
– https://ckan-docs-tw.readthedocs.org/
Customizations for Taijiang.tw
● Custom metadata
● Data visualization
● Custom filters
● Harvest
● Localization
● Source code released under AGPLv3 (GitHub: u10313335)
– ckanext-taijiang
– ckanext-spatial
– taijiang-ckan-translations
– taijiang-bulk-uploader
Custom Metadata
● Extension ckanext-scheming[1]
– Configure and share CKAN schemas using a JSON schema description
– Custom template snippets for editing and displaying fields:

Template Name          Function
text.html              a simple text field for free-form text
large_text.html        a larger text field
date.html              a date widget
markdown.html          a Markdown field
select.html            a select box
multiple_choice.html   a group of checkboxes
repeating.html         a repeating field

1. https://github.com/open-data/ckanext-scheming (CKAN 2.3+ only)
Custom Metadata – Example

{
  "field_name": "data_type",
  "label": {"en": "Data Type", "zh_TW": "資料類型"},
  "preset": "select",
  "form_attrs": {"data-module": "autocomplete"},
  "choices": [{"value": "statistics", "label": "Statistics"}]
}

{
  "field_name": "ref",
  "preset": "repeating_text",
  "label": {"en": "Reference", "zh_TW": "參考來源"},
  "form_blanks": 3
}

(Screenshots: select, repeating_text)
Validator and Converter
● Ensure data quality
Validator and Converter
● Validator
– Validates user inputs
– e.g., json_validator

import json
from ckan.plugins.toolkit import Invalid

def json_validator(value, context):
    # Empty input is allowed; anything else must parse as JSON
    if value == '':
        return value
    try:
        json.loads(value)
    except ValueError:
        raise Invalid('Invalid JSON')
    return value
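Validator and Converter
● Converter
– Converts data for storage
– e.g., duplicate_validator

import json

def duplicate_validator(key, data, errors, context):
    # Skip if earlier validators already reported errors for this field
    if errors[key]:
        return
    # The field holds a JSON-encoded list; drop duplicate entries
    value = json.loads(data[key])
    unduplicated = list(set(value))  # note: set() does not keep the original order
    data[key] = json.dumps(unduplicated)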
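To try the two functions outside CKAN, here is a self-contained version. Invalid is a stand-in exception for ckan.plugins.toolkit.Invalid, and the small dictionaries in the usage part only approximate the flattened data and errors structures that CKAN passes to converters.

```python
import json


class Invalid(Exception):
    """Stand-in for ckan.plugins.toolkit.Invalid so the sketch runs without CKAN."""


def json_validator(value, context):
    # Empty strings pass through; anything else must parse as JSON
    if value == '':
        return value
    try:
        json.loads(value)
    except ValueError:
        raise Invalid('Invalid JSON')
    return value


def duplicate_validator(key, data, errors, context):
    # Leave the field untouched if an earlier validator already flagged it
    if errors[key]:
        return
    value = json.loads(data[key])
    data[key] = json.dumps(list(set(value)))


# Simplified stand-ins for CKAN's flattened data/errors dicts
key = ('ref',)
data = {key: json.dumps(['a', 'b', 'a'])}
errors = {key: []}
duplicate_validator(key, data, errors, context={})
print(sorted(json.loads(data[key])))  # ['a', 'b']
```

Because set() does not preserve order, the comparison above sorts the result first.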
Data Visualization
● CKAN had no viewer for some GIS formats:
– WMTS services
– ESRI Shapefile (*.shp and *.dbf)
● Do it ourselves!
– wmts_view
– shp_view
Write a CKAN Plugin
● PyUtilib Component Architecture (PCA)
● Inherits from
– ckan.plugins.SingletonPlugin
● Implements
– one (or several) ckan.plugins.* interfaces
To Build a "Viewer"
● We also need:
– A view template (Jinja template engine)
– A JavaScript module
● e.g., the Shapefile preview includes shp2geojson.js[1].
1. http://gipong.github.io/shp2geojson.js/ (released under the MIT license)
Example: Plugin for SHP Preview

Python plugin:

from ckan import plugins as p


class SHPView(p.SingletonPlugin):
    p.implements(p.IResourceView, inherit=True)

    def info(self):
        return {'name': 'shp_view',
                'title': 'shp',
                'icon': 'map-marker',
                'iframed': True,
                'default_title': 'SHP',
                }

    def can_view(self, data_dict):
        # self.SHP, self.same_domain and self.proxy_is_enabled are defined
        # elsewhere in the plugin (not shown on the slide)
        resource = data_dict['resource']
        format_lower = resource['format'].lower()
        if format_lower in self.SHP:
            return self.same_domain or self.proxy_is_enabled
        return False

    def view_template(self, context, data_dict):
        return 'shp.html'

View template (shp.html):

<div data-module="shppreview" id="data-preview"
     data-module-map_config="{{ h.dump_json(map_config) }}"></div>

JS module (shp_view.js):

// shapefile preview module
ckan.module('shppreview', function (jQuery, _) {
  return {
    initialize: function () {
      …
    },
    showPreview: function (url, data) {
      …
    }
  };
});
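The can_view decision can be tried without CKAN. The sketch below is a hypothetical stand-alone version: SHP_FORMATS stands in for the plugin's self.SHP (not shown on the slide), and "same domain" is approximated by comparing the host of the resource URL with the site URL. The check exists because the JS module downloads the file in the browser, so a file on another domain needs the resource proxy to get around cross-origin restrictions. The URLs are placeholders.

```python
from urllib.parse import urlparse

# Hypothetical stand-in for the plugin's self.SHP (not shown on the slide)
SHP_FORMATS = {'shp', 'shapefile'}


def can_view_shp(resource, site_url, proxy_is_enabled):
    """Mimic SHPView.can_view: right format, and the browser can fetch it."""
    if resource.get('format', '').lower() not in SHP_FORMATS:
        return False
    # Same host: the JS module can download the file directly
    same_domain = urlparse(resource['url']).netloc == urlparse(site_url).netloc
    # Otherwise the resource proxy is needed to avoid cross-origin errors
    return same_domain or proxy_is_enabled


site = 'http://taijiang.tw'
local = {'format': 'SHP', 'url': 'http://taijiang.tw/uploads/roads.shp'}
remote = {'format': 'SHP', 'url': 'http://example.com/roads.shp'}
print(can_view_shp(local, site, proxy_is_enabled=False))   # True
print(can_view_shp(remote, site, proxy_is_enabled=False))  # False
```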
Result: wmts_view
http://taijiang.tw/dataset/tainangis-wmts
Result: shp_view (side by side with QGIS)
http://taijiang.tw/dataset/proj4-29
Custom Filters
● Find datasets by
– Time period
– Self-defined categories
● A new plugin
– For time search
● Implement IPackageController.before_search
– For self-defined categories
● Implement IPackageController.before_index and IFacets.dataset_facets
– Both need new definitions in the Solr schema
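For the self-defined categories, before_index receives the dictionary about to be sent to Solr and can add keys to it, and dataset_facets registers a field as a search facet. This is a sketch with hypothetical names: a category field stored as a JSON list, and a multi-valued Solr field category_facet that is assumed to be declared in the Solr schema. The hooks are written as plain functions so they run without CKAN; in the plugin they would be methods.

```python
import json


def before_index(pkg_dict):
    """Copy a JSON-list 'category' field into a multi-valued Solr field."""
    raw = pkg_dict.get('category')
    if raw:
        try:
            values = json.loads(raw)
        except ValueError:
            values = raw  # not JSON: treat the plain string as one category
        if not isinstance(values, list):
            values = [values]
        pkg_dict['category_facet'] = values
    return pkg_dict


def dataset_facets(facets_dict, package_type):
    # Show the new field in the search sidebar (a real plugin would wrap the
    # label in p.toolkit._ for translation)
    facets_dict['category_facet'] = 'Category'
    return facets_dict


indexed = before_index({'name': 'demo', 'category': '["ecology", "history"]'})
print(indexed['category_facet'])  # ['ecology', 'history']
```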
Example: Plugin for Time Search

Python plugin:

from ckan import plugins as p


class TaijiangDatasets(p.SingletonPlugin):
    p.implements(p.IPackageController, inherit=True)
    p.implements(p.IFacets)

    def before_search(self, search_params):
        …
        begin = parse_date(search_params['extras']['ext_begin_date'])
        end = parse_date(search_params['extras']['ext_end_date'])
        ...
        # Match datasets whose [start_time, end_time] overlaps [begin, end]
        query = ("(start_time:[* TO {0}Z] AND end_time:[{0}Z TO *]) OR "
                 "(start_time:[{0}Z TO {1}Z] AND end_time:[{0}Z TO *])")
        query = query.format(begin.isoformat(), end.isoformat())
        search_params['q'] = query
        return search_params

    def dataset_facets(self, facets_dict, package_type):
        facets_dict['date_facet'] = p.toolkit._('Date of Dataset')
        return facets_dict

Solr schema:

<dynamicField name="*_time" type="date" indexed="true" stored="true" multiValued="false"/>
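The two-clause query in before_search selects datasets whose [start_time, end_time] span overlaps the requested [begin, end]. Assuming begin <= end, it reduces to start_time <= end AND end_time >= begin. The sketch below checks that equivalence by brute force on small integer intervals and builds the shorter query string. It assumes begin and end are datetime objects, so isoformat() plus "Z" yields the date format Solr expects.

```python
from datetime import datetime
from itertools import product


def overlap_query(begin, end):
    # Overlap of [start_time, end_time] with [begin, end]
    return 'start_time:[* TO {1}Z] AND end_time:[{0}Z TO *]'.format(
        begin.isoformat(), end.isoformat())


def slide_condition(s, e, b, q_end):
    # The two-clause condition used in before_search above
    return (s <= b and e >= b) or (b <= s <= q_end and e >= b)


def overlap_condition(s, e, b, q_end):
    return s <= q_end and e >= b


# Brute-force check over small integer intervals with s <= e and b <= q_end
equivalent = all(
    slide_condition(s, e, b, q) == overlap_condition(s, e, b, q)
    for s, e, b, q in product(range(6), repeat=4)
    if s <= e and b <= q)
print(equivalent)  # True
print(overlap_query(datetime(2015, 1, 1), datetime(2015, 12, 31)))
# start_time:[* TO 2015-12-31T00:00:00Z] AND end_time:[2015-01-01T00:00:00Z TO *]
```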
Result
Harvest
● ckanext-harvest
– Remote harvesting extension
– https://github.com/okfn/ckanext-harvest
● Source types
– CKAN
– CSW* (Catalog Service for the Web)
– WAF* (Web Accessible Folder)
– Custom (CSV, XLS, websites, etc.)
*Provided by ckanext-spatial
Harvest
Job Dashboard
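Harvest
Background Process
● Manually
– (pyenv) $ paster --plugin=ckanext-harvest harvester gather_consumer -c /etc/ckan/default/production.ini
– (pyenv) $ paster --plugin=ckanext-harvest harvester fetch_consumer -c /etc/ckan/default/production.ini
– (pyenv) $ paster --plugin=ckanext-harvest harvester run -c /etc/ckan/default/production.ini
● Automatically
– Supervisor (for the gather and fetch consumers)
– Cron (for run)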
Harvest
The Harvesting Interface

from ckanext.harvest.harvesters.base import HarvesterBase


class SRDAHarvester(HarvesterBase):
    def _set_config(self, config_str):
        ...

    def info(self):
        ...

    def gather_stage(self, harvest_job):
        ...

    def fetch_stage(self, harvest_object):
        ...

    def import_stage(self, harvest_object):
        ...

See http://goo.gl/ZMnND7 for details.
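The three stages can be illustrated with a toy harvester. This is a hypothetical in-memory sketch of the gather, fetch, and import pattern only, not the ckanext-harvest API: the real stages receive HarvestJob and HarvestObject instances, run in separate queue consumers, and import_stage is responsible for creating or updating the dataset in CKAN. The record identifiers and fields are invented.

```python
import json


class InMemoryHarvester:
    """Toy gather/fetch/import pipeline over a dict standing in for a remote source."""

    def __init__(self, remote_records):
        self.remote = remote_records

    def gather_stage(self):
        # Decide what to harvest: one identifier per remote record
        return sorted(self.remote)

    def fetch_stage(self, guid):
        # Retrieve the raw content for one identifier (here: serialize to JSON)
        return json.dumps(self.remote[guid])

    def import_stage(self, guid, content):
        # Turn the raw content into a CKAN-style dataset dict
        record = json.loads(content)
        return {'name': guid.lower(),
                'title': record['title'],
                'notes': record.get('abstract', '')}


def run(harvester):
    """Run all three stages in sequence (the real extension uses queues)."""
    return [harvester.import_stage(guid, harvester.fetch_stage(guid))
            for guid in harvester.gather_stage()]


remote = {'SRDA-002': {'title': 'Survey B', 'abstract': 'Second survey'},
          'SRDA-001': {'title': 'Survey A'}}
for dataset in run(InMemoryHarvester(remote)):
    print(dataset['name'], dataset['title'])
# srda-001 Survey A
# srda-002 Survey B
```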
Localization
● Translation for UI
– Gettext-style i18n
– Babel (*.po and *.mo files)
● In Python: p.toolkit._('String')
● In Jinja templates: {{ _('String') }}
● Transifex: Open Knowledge / CKAN
– Jed (for JavaScript modules)
● _('String')
Localization
● Translation for extensions
– opendatatrentino/ckan-custom-translations (GitHub)
● Translation for metadata
– Defined in the JSON schema
– "label": {"en": "Data Type", "zh_TW": "資料類型"}
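At display time, the label for the current UI language has to be picked from such a dictionary. ckanext-scheming provides a template helper for this (scheming_language_text); the function below is a hypothetical stand-alone equivalent with an English fallback, shown only to illustrate the lookup.

```python
def localized_label(label, lang, fallback='en'):
    """Return the label for lang, falling back to English, then any value."""
    if isinstance(label, str):
        return label  # a plain string label is not translated
    if lang in label:
        return label[lang]
    if fallback in label:
        return label[fallback]
    return next(iter(label.values()), '')


label = {"en": "Data Type", "zh_TW": "資料類型"}
print(localized_label(label, 'zh_TW'))  # 資料類型
print(localized_label(label, 'ja'))     # Data Type
```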
Localization
● Chinese search
– Solr + mmseg4j[1] (a Java tokenizer)
– Maximum matching algorithm[2] (by Dr. Chih-Hao Tsai)
– Copy it to the Solr folder and modify the Solr schema
– Ref: http://is.gd/2Vpzgb
1. https://github.com/chenlb/mmseg4j-solr (released under the Apache 2.0 license)
2. http://technology.chtsai.org/mmseg/
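The core idea behind MMSEG is maximum matching: scan the text and take the longest dictionary word at each position. The sketch below implements only plain forward maximum matching with a tiny made-up dictionary. MMSEG and mmseg4j add chunk-based disambiguation rules on top of this, so it is not a replacement for the tokenizer.

```python
def forward_maximum_matching(text, dictionary):
    """Greedy left-to-right segmentation: take the longest dictionary word at
    each position; characters not covered by the dictionary become
    single-character tokens."""
    max_len = max((len(w) for w in dictionary), default=1)
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


words = {'台江', '內海', '研究', '資料', '資料集'}
print(forward_maximum_matching('台江內海研究資料集', words))
# ['台江', '內海', '研究', '資料集']
```

The longest match wins, so 資料集 is kept as one token rather than being split into 資料 plus 集.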
Contribution: What and How?
What to Contribute?
● CKAN core features
– Time and spatial search for private datasets
– Publishing datasets as a catalogue service (e.g., CSW)
– A web interface for bulk uploads
– A simplified deployment process
– Issues on GitHub: https://github.com/ckan/ckan/issues
– More ideas: https://github.com/ckan/ideas-and-roadmap
What to Contribute?
● i18n
– Non-ASCII filenames
– Translating JS modules (e.g., Recline.js)
– UI translation (Transifex)
What to Contribute?
● More functions for using data in the web browser
– Audio and video playback (e.g., integrating plyr.io)
– Links to third-party services[1], such as Shiny[2] (R-based) or IPython Notebook (Python-based)
1. http://www.data.gov/meta/open-apps/
2. https://github.com/ckan/ideas-and-roadmap/issues/35
What to Contribute?
● Rebuild data.g0v.tw with CKAN?
● data.g0v.tw (零時資料中心, the g0v data center)
– Built with DKAN (a CKAN clone for Drupal)
● Problems with DKAN
– Development is much slower than CKAN's
– Lacks features introduced in later versions of CKAN
● e.g., multiple persistent views of data (in CKAN 2.3)
– Most government sites in Taiwan use (or will use) CKAN instead of DKAN
How to Contribute?
● CKAN core: ckan/ckan (GitHub)
● Most plugins are also available on GitHub
– http://extensions.ckan.org/
● Development discussions (mailing list)
– https://lists.okfn.org/mailman/listinfo/ckan-dev
● Contributing guide
– http://docs.ckan.org/en/latest/contributing/index.html
Thanks for your attention!
Any questions?
Email: cjlee AT iis.sinica.edu.tw
Profile: http://about.me/sollee
Google Groups: CKAN Taiwan Interest Group

Weitere ähnliche Inhalte

Was ist angesagt?

Spark SQL - 10 Things You Need to Know
Spark SQL - 10 Things You Need to KnowSpark SQL - 10 Things You Need to Know
Spark SQL - 10 Things You Need to Know
Kristian Alexander
 
Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...
Databricks
 

Was ist angesagt? (20)

Introduction to oracle
Introduction to oracleIntroduction to oracle
Introduction to oracle
 
SPARQL and Linked Data Benchmarking
SPARQL and Linked Data BenchmarkingSPARQL and Linked Data Benchmarking
SPARQL and Linked Data Benchmarking
 
Spark SQL - 10 Things You Need to Know
Spark SQL - 10 Things You Need to KnowSpark SQL - 10 Things You Need to Know
Spark SQL - 10 Things You Need to Know
 
Apache Tajo on Swift: Bringing SQL to the OpenStack World
Apache Tajo on Swift: Bringing SQL to the OpenStack WorldApache Tajo on Swift: Bringing SQL to the OpenStack World
Apache Tajo on Swift: Bringing SQL to the OpenStack World
 
Apache Spark — Fundamentals and MLlib
Apache Spark — Fundamentals and MLlibApache Spark — Fundamentals and MLlib
Apache Spark — Fundamentals and MLlib
 
Introduction to oracle
Introduction to oracleIntroduction to oracle
Introduction to oracle
 
Introduction to oracle(2)
Introduction to oracle(2)Introduction to oracle(2)
Introduction to oracle(2)
 
Deep Dive : Spark Data Frames, SQL and Catalyst Optimizer
Deep Dive : Spark Data Frames, SQL and Catalyst OptimizerDeep Dive : Spark Data Frames, SQL and Catalyst Optimizer
Deep Dive : Spark Data Frames, SQL and Catalyst Optimizer
 
Introduction to Globus: Research Data Management Software at the ALCF
Introduction to Globus: Research Data Management Software at the ALCFIntroduction to Globus: Research Data Management Software at the ALCF
Introduction to Globus: Research Data Management Software at the ALCF
 
Understanding Hadoop
Understanding HadoopUnderstanding Hadoop
Understanding Hadoop
 
Machine learning at Scale with Apache Spark
Machine learning at Scale with Apache SparkMachine learning at Scale with Apache Spark
Machine learning at Scale with Apache Spark
 
The evolution of Netflix's S3 data warehouse (Strata NY 2018)
The evolution of Netflix's S3 data warehouse (Strata NY 2018)The evolution of Netflix's S3 data warehouse (Strata NY 2018)
The evolution of Netflix's S3 data warehouse (Strata NY 2018)
 
Tuning and Debugging in Apache Spark
Tuning and Debugging in Apache SparkTuning and Debugging in Apache Spark
Tuning and Debugging in Apache Spark
 
Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...Easy, scalable, fault tolerant stream processing with structured streaming - ...
Easy, scalable, fault tolerant stream processing with structured streaming - ...
 
Oracle training-in-hyderabad
Oracle training-in-hyderabadOracle training-in-hyderabad
Oracle training-in-hyderabad
 
Apache spark - Spark's distributed programming model
Apache spark - Spark's distributed programming modelApache spark - Spark's distributed programming model
Apache spark - Spark's distributed programming model
 
Making Structured Streaming Ready for Production
Making Structured Streaming Ready for ProductionMaking Structured Streaming Ready for Production
Making Structured Streaming Ready for Production
 
Orcale dba training
Orcale dba trainingOrcale dba training
Orcale dba training
 
Introducing Datawave
Introducing DatawaveIntroducing Datawave
Introducing Datawave
 
Datasets and tools_from_ncbi_and_elsewhere_for_microbiome_research_v_62817
Datasets and tools_from_ncbi_and_elsewhere_for_microbiome_research_v_62817Datasets and tools_from_ncbi_and_elsewhere_for_microbiome_research_v_62817
Datasets and tools_from_ncbi_and_elsewhere_for_microbiome_research_v_62817
 

Andere mochten auch

CKAN 應用介紹 - 以台江計畫為例
CKAN 應用介紹 - 以台江計畫為例CKAN 應用介紹 - 以台江計畫為例
CKAN 應用介紹 - 以台江計畫為例
Chengjen Lee
 

Andere mochten auch (10)

CKAN 技術介紹 (基礎篇)
CKAN 技術介紹 (基礎篇)CKAN 技術介紹 (基礎篇)
CKAN 技術介紹 (基礎篇)
 
CKAN 中文簡介
CKAN 中文簡介CKAN 中文簡介
CKAN 中文簡介
 
CKAN 技術介紹 (開發篇)
CKAN 技術介紹 (開發篇)CKAN 技術介紹 (開發篇)
CKAN 技術介紹 (開發篇)
 
從open data角度談網站api應用
從open data角度談網站api應用從open data角度談網站api應用
從open data角度談網站api應用
 
CKAN 應用介紹 - 以台江計畫為例
CKAN 應用介紹 - 以台江計畫為例CKAN 應用介紹 - 以台江計畫為例
CKAN 應用介紹 - 以台江計畫為例
 
Open LY - 「立法院開放資料服務平台建置案」服務建議書(部分)
Open LY - 「立法院開放資料服務平台建置案」服務建議書(部分)Open LY - 「立法院開放資料服務平台建置案」服務建議書(部分)
Open LY - 「立法院開放資料服務平台建置案」服務建議書(部分)
 
Google Analytics 網站分析: 學習心得分享
Google Analytics 網站分析: 學習心得分享Google Analytics 網站分析: 學習心得分享
Google Analytics 網站分析: 學習心得分享
 
2015 Google Analytics (分析) 認證考試 考古題
2015 Google Analytics (分析) 認證考試 考古題2015 Google Analytics (分析) 認證考試 考古題
2015 Google Analytics (分析) 認證考試 考古題
 
大數據時代的必備工具-Google Analytics
大數據時代的必備工具-Google Analytics大數據時代的必備工具-Google Analytics
大數據時代的必備工具-Google Analytics
 
Google analytics教學手冊
Google analytics教學手冊Google analytics教學手冊
Google analytics教學手冊
 

Ähnlich wie 將 Open Data 放上 Open Source Platforms: 開源資料入口平台 CKAN 開發經驗分享

DataFinder concepts and example: General (20100503)
DataFinder concepts and example: General (20100503)DataFinder concepts and example: General (20100503)
DataFinder concepts and example: General (20100503)
Data Finder
 

Ähnlich wie 將 Open Data 放上 Open Source Platforms: 開源資料入口平台 CKAN 開發經驗分享 (20)

Enterprise guide to building a Data Mesh
Enterprise guide to building a Data MeshEnterprise guide to building a Data Mesh
Enterprise guide to building a Data Mesh
 
The Quest for an Open Source Data Science Platform
 The Quest for an Open Source Data Science Platform The Quest for an Open Source Data Science Platform
The Quest for an Open Source Data Science Platform
 
Apache Calcite Tutorial - BOSS 21
Apache Calcite Tutorial - BOSS 21Apache Calcite Tutorial - BOSS 21
Apache Calcite Tutorial - BOSS 21
 
Enabling Secure Data Discoverability (SC21 Tutorial)
Enabling Secure Data Discoverability (SC21 Tutorial)Enabling Secure Data Discoverability (SC21 Tutorial)
Enabling Secure Data Discoverability (SC21 Tutorial)
 
Hot tutorials
Hot tutorialsHot tutorials
Hot tutorials
 
Google cloud Dataflow & Apache Flink
Google cloud Dataflow & Apache FlinkGoogle cloud Dataflow & Apache Flink
Google cloud Dataflow & Apache Flink
 
Introduction to DataFusion An Embeddable Query Engine Written in Rust
Introduction to DataFusion  An Embeddable Query Engine Written in RustIntroduction to DataFusion  An Embeddable Query Engine Written in Rust
Introduction to DataFusion An Embeddable Query Engine Written in Rust
 
The Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMeshThe Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
 
R- Introduction
R- IntroductionR- Introduction
R- Introduction
 
Supercharge your data analytics with BigQuery
Supercharge your data analytics with BigQuerySupercharge your data analytics with BigQuery
Supercharge your data analytics with BigQuery
 
DataFinder concepts and example: General (20100503)
DataFinder concepts and example: General (20100503)DataFinder concepts and example: General (20100503)
DataFinder concepts and example: General (20100503)
 
query_tuning.pdf
query_tuning.pdfquery_tuning.pdf
query_tuning.pdf
 
Elasticsearch an overview
Elasticsearch   an overviewElasticsearch   an overview
Elasticsearch an overview
 
Making sense of your data jug
Making sense of your data   jugMaking sense of your data   jug
Making sense of your data jug
 
DataFinder: A Python Application for Scientific Data Management
DataFinder: A Python Application for Scientific Data ManagementDataFinder: A Python Application for Scientific Data Management
DataFinder: A Python Application for Scientific Data Management
 
Data Science on Google Cloud Platform
Data Science on Google Cloud PlatformData Science on Google Cloud Platform
Data Science on Google Cloud Platform
 
Oracle by Muhammad Iqbal
Oracle by Muhammad IqbalOracle by Muhammad Iqbal
Oracle by Muhammad Iqbal
 
Big data analysis in python @ PyCon.tw 2013
Big data analysis in python @ PyCon.tw 2013Big data analysis in python @ PyCon.tw 2013
Big data analysis in python @ PyCon.tw 2013
 
Organizing the Data Chaos of Scientists
Organizing the Data Chaos of ScientistsOrganizing the Data Chaos of Scientists
Organizing the Data Chaos of Scientists
 
SplunkLive! Tampa: Splunk Ninjas: New Features, Pivot, and Search Dojo
SplunkLive! Tampa: Splunk Ninjas: New Features, Pivot, and Search Dojo SplunkLive! Tampa: Splunk Ninjas: New Features, Pivot, and Search Dojo
SplunkLive! Tampa: Splunk Ninjas: New Features, Pivot, and Search Dojo
 

Mehr von Chengjen Lee (8)

Preserving Collaborative Documents in Contemporary Events
Preserving Collaborative Documents in Contemporary EventsPreserving Collaborative Documents in Contemporary Events
Preserving Collaborative Documents in Contemporary Events
 
Retooling a Research Data Repository: data.depositar.io
Retooling a Research Data Repository: data.depositar.ioRetooling a Research Data Repository: data.depositar.io
Retooling a Research Data Repository: data.depositar.io
 
跨領域區域研究資料集 (data.depositar.io): CKAN 應用介紹
跨領域區域研究資料集 (data.depositar.io): CKAN 應用介紹跨領域區域研究資料集 (data.depositar.io): CKAN 應用介紹
跨領域區域研究資料集 (data.depositar.io): CKAN 應用介紹
 
Report 140227
Report 140227Report 140227
Report 140227
 
Report 140213
Report 140213Report 140213
Report 140213
 
Introduction to Pelican
Introduction to PelicanIntroduction to Pelican
Introduction to Pelican
 
ckan 2.0: a deeper look
ckan 2.0: a deeper lookckan 2.0: a deeper look
ckan 2.0: a deeper look
 
ckan 2.0 Introduction
ckan 2.0 Introductionckan 2.0 Introduction
ckan 2.0 Introduction
 

Kürzlich hochgeladen

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Kürzlich hochgeladen (20)

Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 

將 Open Data 放上 Open Source Platforms: 開源資料入口平台 CKAN 開發經驗分享

  • 1. 將 Open Data 放上 Open Source Platforms 開源資料入口平台 CKAN 開發經驗分享 @ FOSS and Project Collaboration (Spring 2015) This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Taiwan License. Presenter: 李承錱 Cheng-Jen Lee (Sol) Email: cjlee AT iis.sinica.edu.tw
  • 2. 2 About Me ● Sol, @u10313335 ● Institute of Information Science, Academia Sinica ● https://about.me/SolLee ● Python / R / Java ● Focused Areas – CMS – Data Repository – Open Data – *nix System Administration
  • 3. 3 Agenda ● Open Data and Open Data Portals ● About CKAN ● CKAN and 5 Open Data★ ● Experiences ● Contribution: What and How?
  • 4. 4 Open Data and Open Data Portals ● Open Data – The idea that certain data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control1 . ● Open Data Portals – Facilitate access to and re-use of public sector information2 . – “Infrastruction” of open data 1. Wikipeida: open data https://en.wikipedia.org/wiki/Open_data 2. Open Data Portals - Digital Agenda for Europe http://ec.europa.eu/digital-agenda/en/open-data-portals
  • 6. 6 CKAN ● The Comprehensive Knowledge Archive Network ● A powerful data management system – Publishing – Sharing – Finding – Using Data
  • 8. 8 The Most Popular Platform for Open Data 116 instances around the world in March 2015 http://ckan.org/instances
  • 9. 9 The Most Popular Platform for Open Data ● Widely used in government data portal – In EU member states, 30% open data portals adopted CKAN (OpenDataMonitor1 , March 2015) ● Workflow support for publishing data ● Data Visualization ● 100+ Extensions ● Powerful APIs ● Open-sourced (AGPLv3) 1. http://www.opendatamonitor.eu
  • 20. 20 Publish Datasets ① Add Dataset Information
  • 21. 21 Publish Datasets ② Add Data under the Dataset
  • 25. 25 Data Preview and Visualization recline_view (csv, xls) Grid
  • 26. 26 Data Preview and Visualization recline_view (csv, xls) Graph
  • 27. 27 Data Preview and Visualization recline_view (csv, xls) Lat/Long fields
  • 28. 28 Data Preview and Visualization wms_preview
  • 29. 29 Data Preview and Visualization geojson_preview
  • 30. 30 Data Preview and Visualization ● Docs: recline_view, text_view, json_view, pdf_view, webpage_view, officedocs_view... ● Pics: image_view ● And more!
  • 33. 33 CKAN and 5 Open Data★ 1 1. Tim Berners-Lee, “Linked Data” http://www.w3.org/DesignIssues/LinkedData.html
  • 34. 34 CKAN and 5 Open Data★ ● ★ Make your stuff available on the Web (whatever format) under an open license Customizable licenses
  • 35. 35 CKAN and 5 Open Data★ ● ★★ Make it available as structured data (e.g., Excel instead of image scan of a table) ★★★ Use non-proprietary formats (e.g., CSV instead of Excel) – Upload any data format – Data API ● Get records from structured data Data API
  • 36. 36 CKAN and 5 Open Data★ ● ★★★★ Use open standards from W3C (RDF and SPARQL) to identify things, so that people can point at your stuff ● ★★★★★ Link your data to other data to provide context – Built-in RDF exporting capabilities – Expose or consume metadata from other catalogs using RDF (DCAT) docs1 ● ckanext-qa2 : Check the openess of datasets or resources 1. Supported by ckanext-dcat extension 2. https://github.com/ckan/ckanext-qa
  • 39. 39 Installation ● Official Documents: – http://docs.ckan.org/en/latest/ ● Installation Notes (In Chinese): – https://ckan-docs-tw.readthedocs.org/
  • 40. 40 Customizations for Taijiang.tw ● Custom Metadata ● Data Visualization ● Custom filters ● Harvest ● Localization ● Source Code Released under AGPLv3 (On GitHub: u10313335) – ckanext-taijiang – ckanext-spatial – taijiang-ckan-translations – taijiang-bulk-uploader
  • 41. 41 Custom Metadata ● Extension ckanext-scheming1 – Configure and share CKAN schemas using a JSON schema description. – Custom template snippets for editing and display fields. Template Name Function text.html a simple text field for free-form text large_text.html a larger text field date.html a date widget markdown.html a markdown field select.html a select box multiple_choice.html a group of checkboxes repeating.html a repeating fields 1. https://github.com/open-data/ckanext-scheming, only for CKAN 2.3+
  • 42. 42 Custom Metadata – Example { "field_name": "data_type", "label": {"en": "Data Type", "zh_TW": " 資料類型 "}, "preset": "select", "form_attrs": {"data-module": "autocomplete"}, "choices": [{"value": "statistics", "label": Statistics"}] } { "field_name": "ref", "preset": "repeating_text", "label": {"en": "Reference", "zh_TW": " 參考來源 "}, "form_blanks": 3 } select repeating_text
  • 44. 44 Validator and Converter ● Validator – Validate user inputs – Ex. json_validator def json_validator(value, context): if value == '': return value try: json.loads(value) except ValueError: raise Invalid('Invalid JSON') return value
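  • 45. Validator and Converter ● Converter – Converts data before it is stored – Ex. duplicate_validator (removes duplicate entries from a JSON list):

      import json

      def duplicate_validator(key, data, errors, context):
          # Skip if an earlier validator already reported an error, or if
          # the field is empty (json.loads('') would raise ValueError).
          if errors[key] or not data.get(key):
              return
          value = json.loads(data[key])
          # set() drops duplicates but does not keep the original order.
          unduplicated = list(set(value))
          data[key] = json.dumps(unduplicated)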
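CKAN calls validators and converters with flattened data. A field's value lives in `data` under a tuple key such as `('ref',)`, and `errors` maps the same key to a list of messages. The standalone sketch below re-creates the two functions from these slides using only the standard library, so you can try the calling convention outside CKAN. The local `Invalid` class stands in for `ckan.plugins.toolkit.Invalid`. The name `dedupe_converter` is hypothetical, and its order-preserving dedupe is a choice made for this sketch.

```python
import json


class Invalid(Exception):
    """Stand-in for ckan.plugins.toolkit.Invalid."""


def json_validator(value, context):
    # Empty input is allowed; anything else must parse as JSON.
    if value == '':
        return value
    try:
        json.loads(value)
    except ValueError:
        raise Invalid('Invalid JSON')
    return value


def dedupe_converter(key, data, errors, context):
    # Skip if an earlier validator failed or the field is empty.
    if errors.get(key) or not data.get(key):
        return
    seen = set()
    unique = []
    for item in json.loads(data[key]):  # assumes hashable items (e.g. strings)
        if item not in seen:
            seen.add(item)
            unique.append(item)
    data[key] = json.dumps(unique)


# Flattened data shaped the way CKAN passes it for a "ref" field.
key = ('ref',)
data = {key: json_validator('["doi:a", "doi:b", "doi:a"]', {})}
errors = {key: []}
dedupe_converter(key, data, errors, {})
print(data[key])  # ["doi:a", "doi:b"]
```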
  • 46. Data Visualization ● There is no viewer for some GIS formats – WMTS services – ESRI Shapefile (*.shp and *.dbf) ● Do it ourselves! – wmts_view – shp_view
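A shapefile is a set of sidecar files, and a viewer needs at least the `.shp` geometry and the `.dbf` attributes. This sketch assumes the parts arrive together as one zip upload. Under that assumption, a plugin could check that both parts are present before offering a preview. The function names, and the choice to require only the two extensions named above, are assumptions of this sketch.

```python
import io
import os
import zipfile

REQUIRED_PARTS = {".shp", ".dbf"}  # the two parts named on the slide


def missing_shapefile_parts(zip_bytes):
    """Return the required extensions missing from a zipped shapefile."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        found = {os.path.splitext(name)[1].lower() for name in archive.namelist()}
    return sorted(REQUIRED_PARTS - found)


def make_zip(names):
    # Build a small in-memory zip for the usage example.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in names:
            archive.writestr(name, b"")
    return buffer.getvalue()


print(missing_shapefile_parts(make_zip(["roads.shp", "roads.dbf"])))  # []
print(missing_shapefile_parts(make_zip(["roads.SHP"])))               # ['.dbf']
```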
  • 47. Write a CKAN Plugin ● Based on the PyUtilib Component Architecture (PCA) ● A plugin inherits from – ckan.plugins.SingletonPlugin ● and implements – one or more ckan.plugins.* interfaces
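Subclassing `SingletonPlugin` is not enough on its own; CKAN also has to find the plugin. Plugins are registered as setuptools entry points in the `ckan.plugins` group and enabled by name in the `ckan.plugins` setting of the config file. Below is a sketch of the relevant part of `setup.py`. The package and module paths are hypothetical.

```python
# setup.py (excerpt). Package and module paths below are hypothetical.
ENTRY_POINTS = {
    "ckan.plugins": [
        # "<plugin name> = <module path>:<plugin class>"
        "shp_view = ckanext.taijiang.plugin:SHPView",
        "taijiang_datasets = ckanext.taijiang.plugin:TaijiangDatasets",
    ],
}

# In the real setup.py this dict is passed on:
#   setup(name="ckanext-taijiang", ..., entry_points=ENTRY_POINTS)
# and the plugins are enabled by name in production.ini:
#   ckan.plugins = ... shp_view taijiang_datasets
```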
  • 48. To Build a "Viewer" ● Besides the plugin, we also need – a view template (Jinja template engine) – a JavaScript module ● Ex. The Shapefile preview includes shp2geojson.js¹. 1. http://gipong.github.io/shp2geojson.js/ (released under the MIT license)
  • 49. Example: Plugin for SHP Preview ● Python plugin:

      from ckan import plugins as p

      class SHPView(p.SingletonPlugin):
          p.implements(p.IResourceView, inherit=True)

          # self.SHP (accepted format names), self.same_domain and
          # self.proxy_is_enabled are defined elsewhere in the plugin
          # (not shown on the slide).

          def info(self):
              return {'name': 'shp_view',
                      'title': 'shp',
                      'icon': 'map-marker',
                      'iframed': True,
                      'default_title': 'SHP',
                      }

          def can_view(self, data_dict):
              resource = data_dict['resource']
              format_lower = resource.get('format', '').lower()
              if format_lower in self.SHP:
                  # The browser must be able to fetch the file: it is either
                  # on the same domain or served through the resource proxy.
                  return self.same_domain or self.proxy_is_enabled
              return False

          def view_template(self, context, data_dict):
              return 'shp.html'

    ● View template (shp.html):

      <div data-module="shppreview" id="data-preview"
           data-module-map_config="{{ h.dump_json(map_config) }}"></div>

    ● JS module (shp_view.js):

      // shapefile preview module
      ckan.module('shppreview', function (jQuery, _) {
        return {
          initialize: function () { /* … */ },
          showPreview: function (url, data) { /* … */ }
        };
      });
  • 52. Custom Filters ● Find datasets by – Time period – Self-defined categories ● A new plugin – For time search ● Implement IPackageController.before_search – For self-defined categories ● Implement IPackageController.before_index and IFacets.dataset_facets – Both need new field definitions in the Solr schema
  • 53. Example: Plugin for Time Search ● Python plugin:

      from ckan import plugins as p

      class TaijiangDatasets(p.SingletonPlugin):
          p.implements(p.IPackageController, inherit=True)
          # inherit=True supplies defaults for the other IFacets methods
          # (group_facets, organization_facets).
          p.implements(p.IFacets, inherit=True)

          def before_search(self, search_params):
              ...  # elided on the slide; parse_date is also defined elsewhere
              # ext_* request parameters arrive in search_params['extras'].
              begin = parse_date(search_params['extras']['ext_begin_date'])
              end = parse_date(search_params['extras']['ext_end_date'])
              ...
              # Match datasets whose [start_time, end_time] overlaps [begin, end].
              query = ("(start_time:[* TO {0}Z] AND end_time:[{0}Z TO *]) OR "
                       "(start_time:[{0}Z TO {1}Z] AND end_time:[{0}Z TO *])")
              query = query.format(begin.isoformat(), end.isoformat())
              search_params['q'] = query
              return search_params

          def dataset_facets(self, facets_dict, package_type):
              facets_dict['date_facet'] = p.toolkit._('Date of Dataset')
              return facets_dict

    ● Solr schema:

      <dynamicField name="*_time" type="date" indexed="true" stored="true" multiValued="false"/>
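The query on the slide keeps datasets whose span [start_time, end_time] touches the search window [begin, end]. Assume every dataset has start_time <= end_time and the window has begin <= end. Under those assumptions, the two OR-ed clauses reduce to the standard interval-overlap test: start_time <= end AND end_time >= begin. The stdlib sketch below builds that shorter Solr query and checks the equivalence by brute force. `build_overlap_query` is a hypothetical helper. Like the slide, it appends `Z` to naive datetimes that are assumed to be in UTC.

```python
from datetime import datetime
from itertools import product


def build_overlap_query(begin, end):
    """Solr query for datasets whose [start_time, end_time] overlaps [begin, end]."""
    return "start_time:[* TO {1}Z] AND end_time:[{0}Z TO *]".format(
        begin.isoformat(), end.isoformat())


def slide_condition(start, stop, begin, end):
    # The two-clause condition from the slide, written as plain Python.
    return ((start <= begin and stop >= begin)
            or (begin <= start <= end and stop >= begin))


def overlaps(start, stop, begin, end):
    # Standard interval-overlap test.
    return start <= end and stop >= begin


# Brute-force check, with small integers standing in for timestamps.
assert all(slide_condition(*args) == overlaps(*args)
           for args in product(range(4), repeat=4)
           if args[0] <= args[1] and args[2] <= args[3])

print(build_overlap_query(datetime(2015, 1, 1), datetime(2015, 3, 31)))
# start_time:[* TO 2015-03-31T00:00:00Z] AND end_time:[2015-01-01T00:00:00Z TO *]
```

The slide assigns the clause to `q`. Putting it in the `fq` filter query instead is a design choice that could leave the user's free-text query untouched.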
  • 55. Harvest ● ckanext-harvest – Remote harvesting extension – https://github.com/okfn/ckanext-harvest ● Source types – CKAN – CSW* (Catalog Service for the Web) – WAF* (Web Accessible Folder) – Custom (CSV, XLS, websites, etc.) *Provided by ckanext-spatial
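  • 57. Harvest Background Processes ● Manually (inside the CKAN virtualenv):

      (pyenv) $ paster --plugin=ckanext-harvest harvester gather_consumer -c /etc/ckan/default/production.ini
      (pyenv) $ paster --plugin=ckanext-harvest harvester fetch_consumer -c /etc/ckan/default/production.ini
      (pyenv) $ paster --plugin=ckanext-harvest harvester run -c /etc/ckan/default/production.ini

    ● Automatically – Supervisor (keeps the gather and fetch consumers running) – Cron (periodically calls run)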
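  • 58. Harvest ● The harvesting interface:

      from ckanext.harvest.harvesters.base import HarvesterBase

      class SRDAHarvester(HarvesterBase):
          def _set_config(self, config_str):
              ...
          def info(self):
              ...  # name, title and description of this harvester type
          def gather_stage(self, harvest_job):
              ...  # list remote records; return the ids of new harvest objects
          def fetch_stage(self, harvest_object):
              ...  # fetch one record's content; return True on success
          def import_stage(self, harvest_object):
              ...  # create or update the dataset; return True on success

    See http://goo.gl/ZMnND7 for details.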
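To show how the three stages divide the work, here is a toy model of the flow that does not need CKAN. Gather decides what to harvest, fetch retrieves each record, and import turns the record into a dataset dict. Everything in it is hypothetical: `ToyHarvester`, the in-memory `remote_records`, and the `default_org` config key. Real harvesters receive `HarvestJob` and `HarvestObject` instances and persist them through ckanext-harvest, which this sketch does not attempt.

```python
import json


class ToyHarvester(object):
    """Toy stand-in for a ckanext-harvest harvester (hypothetical)."""

    def __init__(self, remote_records):
        # remote_records: {remote_id: raw JSON string}, standing in for a remote API.
        self.remote_records = remote_records
        self.config = {}

    def _set_config(self, config_str):
        # A harvest source may carry an optional JSON configuration string.
        self.config = json.loads(config_str) if config_str else {}

    def gather_stage(self):
        # Decide what to harvest: one work item per remote record.
        return [{"guid": rid, "content": None} for rid in sorted(self.remote_records)]

    def fetch_stage(self, obj):
        # Retrieve one record's content; report failure with False.
        content = self.remote_records.get(obj["guid"])
        if content is None:
            return False
        obj["content"] = content
        return True

    def import_stage(self, obj):
        # Turn the fetched content into a CKAN-style dataset dict.
        record = json.loads(obj["content"])
        return {"name": obj["guid"],
                "title": record.get("title", obj["guid"]),
                "owner_org": self.config.get("default_org")}


harvester = ToyHarvester({"srda-002": '{"title": "Survey B"}',
                          "srda-001": '{"title": "Survey A"}'})
harvester._set_config('{"default_org": "srda"}')
datasets = [harvester.import_stage(obj) for obj in harvester.gather_stage()
            if harvester.fetch_stage(obj)]
print([d["title"] for d in datasets])  # ['Survey A', 'Survey B']
```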
  • 59. Localization ● Translation for the UI – Gettext-style i18n – Babel (*.po & *.mo) ● In Python: p.toolkit._('String') ● In Jinja templates: {{ _('String') }} ● Transifex project: Open Knowledge / CKAN – Jed (for JavaScript modules) ● _('String')
  • 60. Localization ● Translation for extensions – opendatatrentino/ckan-custom-translations (GitHub) ● Translation for metadata – Defined in the JSON schema – "label": {"en": "Data Type", "zh_TW": "資料類型"}
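Each metadata label is a dict keyed by locale, so a template has to pick the entry for the current language and fall back when a translation is missing. ckanext-scheming has its own helper for this; the function below is not that helper. `pick_label` is a hypothetical stand-in. Its fallback order (exact locale, then base language, then English, then any entry) is an assumption of this sketch.

```python
def pick_label(label, locale, default_locale="en"):
    """Choose a display string from a {locale: text} dict (hypothetical helper)."""
    if isinstance(label, str):
        return label  # a plain, untranslated label
    base = locale.split("_")[0]  # "zh_TW" -> "zh"
    for candidate in (locale, base, default_locale):
        if candidate in label:
            return label[candidate]
    # Last resort: any available translation, in a stable order.
    if label:
        return label[sorted(label)[0]]
    return ""


LABEL = {"en": "Data Type", "zh_TW": "資料類型"}
print(pick_label(LABEL, "zh_TW"))  # 資料類型
print(pick_label(LABEL, "ja"))     # Data Type
```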
  • 61. Localization ● Chinese Search – Solr + mmseg4j¹ (a Java tokenizer) – Maximum Matching Algorithm² (by Dr. Chih-Hao Tsai) – Copy mmseg4j into the Solr folder and modify the Solr schema – Ref: http://is.gd/2Vpzgb 1. https://github.com/chenlb/mmseg4j-solr (released under the Apache 2.0 license) 2. http://technology.chtsai.org/mmseg/
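Chinese text has no spaces between words, so the tokenizer must decide where words begin and end. The core idea of maximum matching is to take the longest dictionary word at each position. The sketch below implements only the simple greedy forward variant. Tsai's MMSEG, which mmseg4j implements, goes further: it compares three-word chunks and applies additional disambiguation rules, all of which are omitted here. The toy dictionary is made up for the example.

```python
def forward_max_match(text, dictionary):
    """Greedy forward maximum matching: at each position take the longest
    dictionary word, falling back to a single character."""
    max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


DICTIONARY = {"開放", "資料", "開放資料", "平台"}
print(forward_max_match("開放資料平台", DICTIONARY))  # ['開放資料', '平台']
```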
  • 63. What to Contribute? ● CKAN core features – Time and spatial search for private datasets – Publish datasets as a catalogue service (e.g. CSW) – Web interface for bulk uploads – A simplified deployment process – Issues on GitHub: https://github.com/ckan/ckan/issues – More ideas: https://github.com/ckan/ideas-and-roadmap
  • 64. What to Contribute? ● i18n – Non-ASCII filenames – Translate JS modules (e.g. Recline.js) – UI translation (Transifex)
  • 65. What to Contribute? ● More functions for using data in the web browser – Audio and video playback (e.g. integrating plyr.io) – Links to third-party services¹, such as Shiny² (R-based) or IPython Notebook (Python-based) 1. http://www.data.gov/meta/open-apps/ 2. https://github.com/ckan/ideas-and-roadmap/issues/35
  • 66. What to Contribute? ● Rebuild data.g0v.tw with CKAN? ● data.g0v.tw (零時資料中心) – Built with DKAN (a CKAN clone for Drupal) ● Problems with DKAN – Development is much slower than CKAN's – It lacks features introduced in later versions of CKAN ● Ex. Multiple persistent views of data (in CKAN 2.3) – Most government sites in Taiwan use (or will use) CKAN instead of DKAN
  • 67. How to Contribute? ● CKAN core: ckan/ckan (GitHub) ● Most plugins are also available on GitHub – http://extensions.ckan.org/ ● Development discussions (mailing list) – https://lists.okfn.org/mailman/listinfo/ckan-dev ● Contributing guide – http://docs.ckan.org/en/latest/contributing/index.html
  • 68. Thanks for your attention! Any questions? Email: cjlee AT iis.sinica.edu.tw Profile: http://about.me/sollee Google Groups: CKAN Taiwan Interest Group

Speaker Notes

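  1. Geographic Information Map Data Cloud Service Platform (地理資訊圖資雲服務平台)
  2. Supported by the Open Knowledge Foundation
  3. Published URL
  4. Supported by the Open Knowledge Foundation
  5. Supported by the Open Knowledge Foundation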
  6. Store the raw data and metadata. Visualise structured data with interactive tables, graphs and maps.
  7. Geographic Information Map Data Cloud Service Platform (地理資訊圖資雲服務平台)