Use of Open Data in Hong Kong
Sammy Fung
sammy.hk
Incu-Lab ICE in StartMeUpHK - Open Data Initiative Gathering
2013/12/04
We want a better life with
public data.
We want an easier way to
access the public data.
Agenda
● What is Open Data ?
● Use of Open Source Software in web crawling.
● Starting new Open Source project hk0weather to create Open Weather Data.
Sammy Fung
● Software Developer
  – uses and develops open source software.
  – Perl → PHP → Python.
  – interested in Data Mining / Web Crawling.
  – owns a web and mobile technology startup.
Sammy Fung
● 15+ years in Open Source Communities.
  – Founding Chairman, Hong Kong Linux User Group.
  – Founding Chairman, Open Source Hong Kong.
  – Member, GNOME Asia committee.
  – Mozilla Representative.
  – Member, program committee at COSCUP.
    ● Conference for Open Source Coders, Users and Developers.
    ● Largest open source conference in Taiwan.
What is Open Data ?
Open Data
Three Laws of Open Government Data by David Eaves.
1. If it can't be spidered or indexed, it doesn't exist.
2. If it isn't available in open and machine readable format, it can't engage.
3. If a legal framework doesn't allow it to be repurposed, it doesn't empower.
http://eaves.ca/2009/09/30/three-law-of-open-government-data/
Open Data
● Tim Berners-Lee, the inventor of the Web.
  – 5stardata.info
  – 5 star deployment scheme of Open Data.
* One Star - make your stuff available on the Web (whatever format) under an open license.
** Two Star - make it available as structured data (e.g., Excel instead of image scan of a table).
*** Three Star - use non-proprietary formats (e.g., CSV instead of Excel).
**** Four Star - use URIs to denote things, so that people can point at your stuff.
***** Five Star - link your data to other data to provide context.
5stardata.info by Tim Berners-Lee, the inventor of the Web.
Open Data in Hong Kong
Open Data in Hong Kong
● Data.One
  – http://www.gov.hk/en/theme/psi
  – released on 2011/3/31.
  – First App Competition on Data.One
    ● Call for Submission now till 2014/02/28.
Weather Information in Hong Kong
● Hong Kong Observatory
  – Hourly Hong Kong Weather Report
  – Regional Weather in Hong Kong (10 min updates)
  – Weather Forecast and Weekly Weather Forecast
  – Typhoon Report and Forecast
Hong Kong Observatory RSS
Weather at Data.One
● I posted a blog 'Progress of Open Government Data in Hong Kong' on 2013/01/17.
● Weather at Data.One provides 7 dataset URLs, which return RSS (XML) format (English / Traditional Chinese / Simplified Chinese).
  – One word: Useless.
  – The Data.One dataset (RSS) is completely different from HKO's own paid service (XML).
Weather at Data.One
● Example - Current local weather report:
● Plain text report in RSS.
● Only the tags wrapping the report content differ:
  – Website: a pair of HTML tags, e.g. <PRE>....</PRE>.
  – Data.One: a pair of RSS description tags, <description>....</description>.
● Other weather data is missing, e.g. regional temperature updates every 12 mins.
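To illustrate why a plain-text report inside <description> is awkward for programs, here is a minimal sketch. The feed snippet, its wording and the two regular expressions are made up for illustration and are not the actual Data.One feed; the sketch only shows that a program has to parse the XML first and then pattern-match free text to get at the numbers.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical RSS snippet: it mimics a plain-text report wrapped in
# <description>; it is NOT copied from the real Data.One feed.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Current Weather Report (sample)</title>
    <item>
      <title>Bulletin (sample)</title>
      <description>Air temperature : 17 degrees Celsius
Relative Humidity : 80 per cent</description>
    </item>
  </channel>
</rss>"""

# Assumed wording of the report; a change in wording breaks the patterns.
TEMPERATURE = re.compile(r"Air temperature\s*:\s*(-?\d+)")
HUMIDITY = re.compile(r"Relative Humidity\s*:\s*(\d+)")


def parse_report(rss_text):
    """Dig temperature and humidity out of the free-text description."""
    root = ET.fromstring(rss_text)
    description = root.findtext("./channel/item/description", default="")
    temperature = TEMPERATURE.search(description)
    humidity = HUMIDITY.search(description)
    return {
        "temperature": int(temperature.group(1)) if temperature else None,
        "humidity": int(humidity.group(1)) if humidity else None,
    }


print(parse_report(SAMPLE_RSS))
```

If the feed carried the values as separate fields, the regular expressions would not be needed at all.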
Weather at Data.One
● Weather at Data.One is a 'report', not 'data'.
● The weather RSS was already released by HKO before the launch of Data.One.
● Technically, JSON/XML formats are easier for computer programs to read.
Data.One
● In November 2013, 43 datasets are available.
  – JSON/XML = 18
  – RSS = 10
  – XLS = 6
  – CSV = 4
  – JPG/PNG = 3
  – HTML/MDB = 2
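As a quick check on these figures, the short sketch below sums the per-format counts and estimates the share published in structured, non-proprietary formats. Which groups count as "open and structured" is an assumption made for this sketch (JSON/XML, RSS and CSV); counting RSS is generous, since the weather RSS carries plain-text reports.

```python
# Dataset counts per format, copied from the slide (November 2013).
DATASETS_BY_FORMAT = {
    "JSON/XML": 18,
    "RSS": 10,
    "XLS": 6,
    "CSV": 4,
    "JPG/PNG": 3,
    "HTML/MDB": 2,
}

# Assumption for this sketch: these groups are treated as structured,
# non-proprietary formats (three stars or better in the 5-star scheme).
OPEN_STRUCTURED = {"JSON/XML", "RSS", "CSV"}


def summarize(counts, open_formats):
    """Return (total datasets, open datasets, open share in percent)."""
    total = sum(counts.values())
    open_total = sum(n for fmt, n in counts.items() if fmt in open_formats)
    return total, open_total, round(100.0 * open_total / total, 1)


total, open_total, open_share = summarize(DATASETS_BY_FORMAT, OPEN_STRUCTURED)
print(total, open_total, open_share)
```

The counts add up to the 43 datasets quoted on the slide.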
Data.One
● JSON/XML (18 datasets)
  – Air Pollution.
    ● Past 24-hour Air Pollution Index from stations.
  – Approved Charitable Fund-raising Activities.
  – Restaurant and Food Licences.
  – Details of facility locations.
  – Reward Notices from Police Force.
  – Marine Traffic (Arrival/Departure).
  – Traffic Speed and special news.
  – EventHK information.
Data.One
● RSS (10 datasets)
  – Weather Information (7 datasets)
  – Beach Water Quality (1 dataset)
  – Current Air Pollution Index range and forecast (2 datasets)
Data.One
● JPG/PNG (3 datasets)
  – Exhibition gallery of government building projects.
  – Speed map panels.
  – Traffic snapshot images.
Data.One
● CSV
  – Locations of Public Facility and GovWifi
  – Past Record of Air Pollution Index
  – Marine Shipping directory of HK
● HTML
  – HTML version of Marine Traffic.
● XLS, MDB
  – 2011 Population Census.
  – Property Market Statistics.
  – Monthly Digested Stats and Registers of Auth Persons from Building Dept.
  – Routes and fares of public transport.
Data.One
● Many departments do not release their useful data, and only release information already available on their websites.
  – Few of them make open data available on their own.
● Most of them do not understand what 'real' open data is.
  – Open data formats instead of proprietary data formats.
  – Data instead of information.
  – Usefulness of the data.
● Some departments should manage their open data in a better data structure.
Legco Meeting Minutes
and Voting Results
● In October 2013, LegCo started to publish voting results of the House Committee in XML.
● It is not a part of the Data.One project.
● My open source software for the LegCo vote result XML:
  – http://github.com/sammyfung/legcovotes
Open Data is important to citizens.
Use of Open Source
Software in web
crawling
Web Scraping
● a computer software technique of extracting information from websites. (Wikipedia)
● for business, hobbies, research purposes.
Web Scraping
● Look for the right URLs to scrape.
● Look for the right content in the webpages.
● Save the data into a data store.
● When to run the web scraping program ?
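The four steps above can be sketched with the Python standard library alone. This is only an illustration under stated assumptions: the URL is a placeholder, the page layout ("<station name> <number> degrees") and the table schema are invented, and the download function is defined but not called so the sketch runs offline.

```python
import re
import sqlite3
from urllib.request import urlopen

# Step 1: the URL to scrape. Hypothetical placeholder, not a real endpoint.
REPORT_URL = "http://example.org/weather/current.html"

# Step 2: the pattern that locates the content. The line layout assumed
# here ("<station name> <number> degrees") is invented for this sketch.
READING = re.compile(
    r"^(?P<station>[A-Za-z' ]+?)\s+(?P<temp>-?\d+)\s+degrees", re.M)


def fetch(url):
    """Download a page and return its text (not called in the demo)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def extract(page_text):
    """Pull (station, temperature) pairs out of the page text."""
    return [(m.group("station").strip(), int(m.group("temp")))
            for m in READING.finditer(page_text)]


def save(connection, readings):
    """Step 3: store the readings in a data store (SQLite here)."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS readings "
        "(station TEXT, temperature INTEGER)")
    connection.executemany("INSERT INTO readings VALUES (?, ?)", readings)
    connection.commit()


# Step 4 (when to run) is left to a scheduler such as cron.

# Offline demo on a made-up page instead of a live download.
SAMPLE_PAGE = "<pre>\nHong Kong Observatory 17 degrees\nKing's Park 16 degrees\n</pre>"

db = sqlite3.connect(":memory:")
save(db, extract(SAMPLE_PAGE))
rows = db.execute(
    "SELECT station, temperature FROM readings ORDER BY station").fetchall()
print(rows)
```

In a real crawler, `extract(fetch(REPORT_URL))` would replace the sample page, and the pattern would have to follow the actual page layout.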
Use of Open Source Software in
Web Crawling
● Use Open Source Tools to collect useful and meaningful machine-readable data.
● No need to wait for the provider to release data in a machine-readable format.
Open Source Tools
● Python programming language
● with Regular Expression library
● Scrapy web crawling framework
Why python + scrapy ?
● python: my favourite programming language for the past few years.
● scrapy: web crawling framework written in Python.
What is Scrapy ?
● An open source web scraping framework for Python.
● Scrapy is a fast high-level screen scraping and web crawling framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.
Scrapy Features
● define the data you want to scrape
● write a spider to extract the data
● Built-in: selecting and extracting data from HTML and XML
● Built-in: JSON, CSV, XML output
● Interactive shell console
● Built-in: web service, telnet console, logging
● Others
Programme List of Paid TVs in 2004
● I wanted to know which channel was showing a live football match.
● Paid TV web site = M$ + IIS + ASP + Flash
● Slow....... Very Slow...... Extremely Slow!
● Couldn't connect during peak hours!
● Wrote my first web crawler in PHP in 2004.
Public Transportation in 2006-2010
● Kowloon Motor Bus (KMB)
  – No map view for a bus route
● Public Transportation Enquiry System (PTES)
  – Extremely poor, ugly (or much worse) map UI on PTES.
HK Observatory and Joint Typhoon
Warning Center
● Is any typhoon coming to Hong Kong ? And when will it come ?
● No easy data exchange format.
● No RSS or Atom.
● We don't check websites every day.
My Products
● WeatherHK ← ← ←
● TCTrack
WeatherHK
● http://twitter.com/weatherhk
● hourly current weather report
● weather forecast report
● tropical cyclone warning signal
WeatherHK
● Backend: Python + Scrapy + Database + Twitter + NNTP......
● Frontend: Twitter + Newsgroup
WeatherHK
● http://twitter.com/weatherhk
● Interview by MetroPop in 2009.
My Products
● WeatherHK
● TCTrack ← ← ←
TCTrack
● http://sammy.hk/projects/tctrack/tctrack.php
● Plot TC current and forecast tracks over Google Map.
● Source:
  – JTWC
  – HKO
TCTrack
● http://sammy.hk/projects/tctrack/tctrack.php
● Probably the first TC track map in HK using Google Maps.
● Use of Google Maps: TCTrack -> Weather Underground Hong Kong -> HKO
TCTrack
● http://twitter.com/tctrack
● Tweet JTWC updates for Northwest Pacific.
Releases information to citizens
in a better presentation.
Starting new Open
Source project
hk0weather to create
Open Weather Data.
Starting new Open Source projects
to create Open Data
● Develop an open source project.
● Release data in a standard machine-readable data format.
hk0weather
● https://github.com/sammyfung/hk0weather
● Open Source Hong Kong Weather Project.
● converts HKO webpages to JSON data.
● python + scrapy
● 1st version: extracts temperature and humidity for 20+ weather stations from the current weather report and exports them in JSON format.
hk0weather
● https://github.com/sammyfung/hk0weather
● $ virtualenv hk0weatherenv
● $ source hk0weatherenv/bin/activate
● $ pip install scrapy
● $ git clone https://github.com/sammyfung/hk0weather.git
● $ cd hk0weather
● $ scrapy crawl currwx -t json -o testresult
hk0weather
● Python
  – import re
● Scrapy
  – web crawling framework written in Python.
  – HtmlXPathSelector.
  – built-in JSON, CSV, XML output.
hk0weather
[{"humidity": 80, "station": "hko", "temperture": 17, "time": 1360785720},
{"station": "kingspark", "temperture": 16, "time": 1360785720},
{"station": "wongchukhang", "temperture": 17, "time": 1360785720},
{"station": "takwuling", "temperture": 16, "time": 1360785720},
{"station": "laufaushan", "temperture": 15, "time": 1360785720},
{"station": "taipo", "temperture": 16, "time": 1360785720},
{"station": "shatin", "temperture": 17, "time": 1360785720},
{"station": "tuenmun", "temperture": 17, "time": 1360785720},
{"station": "tseungkwano", "temperture": 16, "time": 1360785720},
{"station": "saikung", "temperture": 16, "time": 1360785720},
{"station": "cheungchau", "temperture": 17, "time": 1360785720},
{"station": "cheungchau", "temperture": 17, "time": 1360785720},
{"station": "tsingyi", "temperture": 17, "time": 1360785720},
{"station": "shekkong", "temperture": 15, "time": 1360785720},
{"station": "tsuenwanhokoon", "temperture": 15, "time": 1360785720},
{"station": "tsuenwanshingmunvalley", "temperture": 17, "time": 1360785720},
{"station": "hongkongpark", "temperture": 17, "time": 1360785720},
{"station": "shaukeiwan", "temperture": 16, "time": 1360785720},
{"station": "kowlooncity", "temperture": 16, "time": 1360785720},
{"station": "happyvalley", "temperture": 18, "time": 1360785720},
{"station": "wongtaisin", "temperture": 17, "time": 1360785720},
{"station": "stanley", "temperture": 16, "time": 1360785720},
{"station": "kwuntong", "temperture": 15, "time": 1360785720},
{"station": "shamshuipo", "temperture": 17, "time": 1360785720}]
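Once the readings are JSON, consuming them takes only a few lines. The sketch below loads four records copied from the output above and derives some simple figures. Reading "time" as Unix seconds is an assumption based on the sample values, and the field name "temperture" is kept exactly as it appears in the output.

```python
import json
from datetime import datetime, timezone

# Four records copied from the hk0weather output shown above
# (the field name "temperture" is spelled as in that output).
SAMPLE = """[
 {"humidity": 80, "station": "hko", "temperture": 17, "time": 1360785720},
 {"station": "kingspark", "temperture": 16, "time": 1360785720},
 {"station": "laufaushan", "temperture": 15, "time": 1360785720},
 {"station": "happyvalley", "temperture": 18, "time": 1360785720}
]"""

records = json.loads(SAMPLE)

# Simple questions that are awkward to answer from a plain-text report.
warmest = max(records, key=lambda r: r["temperture"])
coolest = min(records, key=lambda r: r["temperture"])
average = sum(r["temperture"] for r in records) / len(records)

# Not every station reports humidity, so check for the key first.
with_humidity = [r["station"] for r in records if "humidity" in r]

# Assumption: "time" holds seconds since the Unix epoch.
observed_utc = datetime.fromtimestamp(records[0]["time"], tz=timezone.utc)

print(warmest["station"], coolest["station"], average)
print(with_humidity, observed_utc.isoformat())
```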
Items.py
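from scrapy.item import Item, Field


class Hk0WeatherItem(Item):
    # One reading from one weather station.
    time = Field()         # observation time
    station = Field()      # station code, e.g. 'hko'
    temperture = Field()   # air temperature (name spelled as in the project)
    humidity = Field()     # relative humidity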
Currwx.py
start_urls = (
    'http://www.weather.gov.hk/wxinfo/currwx/currentc.htm',
)
Currwx.py
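def parse(self, response):
    laststation = ''
    temperture = int()
    stations = []
    # HtmlXPathSelector is the selector class of older Scrapy releases.
    hxs = HtmlXPathSelector(response)
    # The current weather report sits in the <div id="ming"> element.
    report = hxs.select('//div[@id="ming"]')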
libhk0
class hk0:
    stations = [
        (u' 天 文 台 ', 'hko'),
        (u' 京 士 柏 ', 'kingspark'),
        (u' 黃 竹 坑 ', 'wongchukhang'),
        (u' 打 鼓 嶺 ', 'takwuling'),
        (u' 流 浮 山 ', 'laufaushan'),
        # … (remaining stations are not shown on the slide)
    ]
libhk0
class hk0:
    def gettime(self, report):
        …
    def hk0current(self, report):
        …
Agenda
● What is Open Data ?
● Use of Open Source Software in web crawling.
● Starting new Open Source project hk0weather to create Open Weather Data.
We want an easier way to
access the public data.
We want a better life with
public data.
Thank You!
sammy.hk


Ice dec04-04-sammy

  • 1. Use of Open Data in Hong Kong Sammy Fung sammy.hk Incu-Lab ICE in StartMeUpHK - Open Data Initiative Gathering 2013/12/04
  • 2. We want a better life with public data.
  • 3. We want an easier way to access the public data.
  • 4. Agenda ● What is Open Data ? ● Use of Open Source Software in web crawling. ● Starting new Open Source project hk0weather to create Open Weather Data.
  • 5. Sammy Fung ● Software Developer – uses and develops open source software. – Perl → PHP → Python. – interested in Data Mining / Web Crawling. – owns a web and mobile technology startup.
  • 6. Sammy Fung ● 15+ years in Open Source Communities. – Founding Chairman, Hong Kong Linux User Group. – Founding Chairman, Open Source Hong Kong. – Member, GNOME Asia committee. – Mozilla Representative – Member, program committee at COSCUP ● Conference for Open Source Coders, Users and Developers. ● Largest open source conference in Taiwan.
  • 7. What is Open Data ?
  • 8. Open Data: Three Laws of Open Government Data by David Eaves. 1. If it can't be spidered or indexed, it doesn't exist. 2. If it isn't available in open and machine readable format, it can't engage. 3. If a legal framework doesn't allow it to be repurposed, it doesn't empower. http://eaves.ca/2009/09/30/three-law-of-open-government-data/
  • 9. Open Data ● Tim Berners-Lee, the inventor of the Web. – 5stardata.info – 5 star deployment scheme of Open Data.
  • 10. * One Star - Open Data 1. make your stuff available on the Web (whatever format) under an open license. 2. make it available as structured data (e.g., Excel instead of image scan of a table) 3. use non-proprietary formats (e.g., CSV instead of Excel) 4. use URIs to denote things, so that people can point at your stuff. 5. link your data to other data to provide context. 5stardata.info by Tim Berners-Lee, the inventor of the Web.
  • 11. ** Two Star - Open Data 1. make your stuff available on the Web (whatever format) under an open license. 2. make it available as structured data (e.g., Excel instead of image scan of a table) 3. use non-proprietary formats (e.g., CSV instead of Excel) 4. use URIs to denote things, so that people can point at your stuff. 5. link your data to other data to provide context. 5stardata.info by Tim Berners-Lee, the inventor of the Web.
  • 12. *** Three Star - Open Data 1. make your stuff available on the Web (whatever format) under an open license. 2. make it available as structured data (e.g., Excel instead of image scan of a table) 3. use non-proprietary formats (e.g., CSV instead of Excel) 4. use URIs to denote things, so that people can point at your stuff. 5. link your data to other data to provide context. 5stardata.info by Tim Berners-Lee, the inventor of the Web.
  • 13. **** Four Star - Open Data 1. make your stuff available on the Web (whatever format) under an open license. 2. make it available as structured data (e.g., Excel instead of image scan of a table) 3. use non-proprietary formats (e.g., CSV instead of Excel) 4. use URIs to denote things, so that people can point at your stuff. 5. link your data to other data to provide context. 5stardata.info by Tim Berners-Lee, the inventor of the Web.
  • 14. ***** Five Star - Open Data 1. make your stuff available on the Web (whatever format) under an open license. 2. make it available as structured data (e.g., Excel instead of image scan of a table) 3. use non-proprietary formats (e.g., CSV instead of Excel) 4. use URIs to denote things, so that people can point at your stuff. 5. link your data to other data to provide context. 5stardata.info by Tim Berners-Lee, the inventor of the Web.
  • 15. Open Data in Hong Kong
  • 16. Open Data in Hong Kong ● Data.One – http://www.gov.hk/en/theme/psi – released on 2011/3/31. – First App Competition on Data.One ● Call for Submission now till 2014/02/28.
  • 17. Weather Information in Hong Kong ● Hong Kong Observatory – Hourly Hong Kong Weather Report – Regional Weather in Hong Kong (10 min updates) – Weather Forecast and Weekly Weather Forecast – Typhoon Report and Forecast
  • 20. Weather at Data.One ● I posted a blog entry, 'Progress of Open Government Data in Hong Kong', on 2013/01/17. ● Weather at Data.One provides 7 dataset URLs, which return RSS (XML) in English, Traditional Chinese and Simplified Chinese. – One word: Useless. – The Data.One dataset (RSS) is completely different from HKO's own paid service (XML).
  • 21. Weather at Data.One ● Example: current local weather report. ● Plain-text report in RSS. ● Difference in how the report content is wrapped: – Website: a pair of HTML tags, e.g. <PRE>....</PRE>. – Data.One: a pair of RSS description tags, <description>....</description>. ● Other weather data is missing, e.g. regional temperature updates every 12 mins.
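
The point of this slide can be sketched in a few lines of Python: pulling the report out of the RSS feed is easy, but what comes out is still one block of plain text. The feed below is a made-up sample shaped like the slide describes (the element layout, titles and report wording are assumptions, not the real Data.One feed), and `report_text` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS payload: the whole weather report is plain text
# inside a single <description> element, as the slide describes.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Current Weather Report (sample)</title>
    <item>
      <title>Sample bulletin</title>
      <description>Air temperature : 17 degrees Celsius
Relative Humidity : 80 per cent</description>
    </item>
  </channel>
</rss>
"""


def report_text(rss_xml):
    """Return the plain-text report held in the first item's <description>."""
    root = ET.fromstring(rss_xml)
    description = root.find("./channel/item/description")
    if description is None or description.text is None:
        raise ValueError("no <description> element found in the feed")
    return description.text.strip()


text = report_text(SAMPLE_RSS)
print(text)
```

Getting individual values (temperature, humidity) out of `text` would still need string parsing, which is why the feed reads as a 'report' rather than 'data'.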
  • 22. Weather at Data.One ● Weather at Data.One is a 'report', not 'data'. ● Weather RSS was already released by HKO before the launch of Data.One. ● Technically, JSON/XML formats are easier for computer programs to read.
  • 23. Data.One ● In November 2013, 43 datasets are available. – JSON/XML = 18 – RSS = 10 – XLS = 6 – CSV = 4 – JPG/PNG = 3 – HTML/MDB = 2
  • 24. Data.One ● JSON/XML (18 datasets) – Air Pollution. ● Past 24-hour Air Pollution Index from stations. – Approved Charitable Fund-raising Activities – Restaurant and Food Licences. – Details of facility locations. – Reward Notices from Police Force. – Marine Traffic (Arrival/Departure). – Traffic Speed and special news. – EventHK information.
  • 25. Data.One ● RSS (10 datasets) – Weather Information (7 datasets) – Beach Water Quality (1 dataset) – Current Air Pollution Index range and forecast (2 datasets)
  • 26. Data.One ● JPG/PNG (3 datasets) – Exhibition gallery of government building projects. – Speed map panels. – Traffic snapshot images.
  • 27. Data.One ● CSV – Past Record of Air Pollution Index – Locations of Public Facility and GovWifi – Marine Shipping directory of HK ● HTML – HTML version of Marine Traffic. ● XLS, MDB – 2011 Population Census. – Property Market Statistics. – Monthly Digested Stats and Registers of Auth Persons from Building Dept. – Routes and fares of public transport.
  • 28. Data.One ● Many departments do not release their useful data, and only release information already available on their websites. – Few of them keep open data available on their own. ● Most of them do not understand what 'real' open data is. – Data instead of information. – Open data formats instead of proprietary data formats. – Usefulness of data. ● Some departments should manage their open data in a better data structure.
  • 29. Legco Meeting Minutes and Voting Results
  • 30. Legco Meeting Minutes and Voting Results
  • 31. Legco Meeting Minutes and Voting Results ● In October 2013, LegCo started to publish voting results of the House Committee in XML. ● It is not part of the Data.One project. ● My open source software for the LegCo vote result XML: – http://github.com/smamyfung/legcovotes
  • 32. Open Data is important to citizens.
  • 33. Use of Open Source Software in web crawling
  • 34. Web Scraping ● a computer software technique of extracting information from websites. (Wikipedia) ● for business, hobbies, research purposes.
  • 35. Web Scraping ● Look for the right URLs to scrape. ● Look for the right content in the webpages. ● Save the data into a data store. ● When to run the web scraping program?
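
The middle two steps above can be sketched with nothing but the Python standard library. This is a minimal illustration, not the crawler used in the talk: the HTML fragment, the table layout, the `readings` table and the helper names are all assumptions made for the example. Step 1 (choosing and downloading URLs) is replaced by a hard-coded string, and step 4 (when to run) would typically be handled outside the program, for example by a scheduler.

```python
import re
import sqlite3

# Hypothetical page fragment; a real crawler would download this from a URL.
SAMPLE_HTML = """
<table>
  <tr><td>King's Park</td><td>16</td></tr>
  <tr><td>Sha Tin</td><td>17</td></tr>
</table>
"""

# One table row: a station name followed by a whole-number temperature.
ROW_PATTERN = re.compile(
    r"<tr><td>(?P<station>[^<]+)</td><td>(?P<temp>\d+)</td></tr>"
)


def extract_rows(html):
    """Step 2: pick the wanted content out of the page."""
    return [
        (m.group("station"), int(m.group("temp")))
        for m in ROW_PATTERN.finditer(html)
    ]


def save_rows(conn, rows):
    """Step 3: save the extracted data into a data store."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings (station TEXT, temperature INTEGER)"
    )
    conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database for the example
rows = extract_rows(SAMPLE_HTML)
save_rows(conn, rows)
stored = conn.execute(
    "SELECT station, temperature FROM readings ORDER BY station"
).fetchall()
print(stored)
```

Regular expressions are fragile against layout changes; a framework with XPath selectors, such as Scrapy, is usually a sturdier choice for real pages.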
  • 36. Use of Open Source Software in Web Crawling ● Use Open Source tools to collect useful and meaningful machine-readable data. ● No need to wait for the provider to release data in a machine-readable format.
  • 37. Open Source Tools ● Python programming language ● with Regular Expression library ● Scrapy web crawling framework
  • 38. Why python + scrapy ? ● python: my favourite programming language for the past few years. ● scrapy: web crawling framework written in Python.
  • 39. What is Scrapy ? ● An open source web scraping framework for Python. ● Scrapy is a fast high-level screen scraping and web crawling framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.
  • 40. Scrapy Features ● define the data you want to scrape ● write a spider to extract the data ● Built-in: selecting and extracting data from HTML and XML ● Built-in: JSON, CSV, XML output ● Interactive shell console ● Built-in: web service, telnet console, logging ● Others
  • 41. Programme List of Paid TVs in 2004
  • 42. Programme List of Paid TVs in 2004 ● I wanted to know which channel was showing a live football match. ● Paid TV web site = M$ + IIS + ASP + Flash ● Slow....... Very Slow...... Extremely Slow! ● Couldn't connect during peak hours! ● Wrote my first web crawler in PHP in 2004.
  • 43. Public Transportation in 2006-2010 ● Kowloon Motor Bus (KMB) – No map view for a bus route. ● Public Transportation Enquiry System (PTES) – Extremely poor, ugly (or much worse) map UI on PTES.
  • 44. HK Observatory and Joint Typhoon Warning Center ● Is any typhoon coming to Hong Kong? And when will it come? ● No easy data exchange format. ● No RSS nor ATOM. ● We don't check websites every day.
  • 45. My Products ● WeatherHK ● TCTrack
  • 46. WeatherHK ● http://twitter.com/weatherhk ● hourly current weather report ● weather forecast report ● tropical signal warning
  • 47. WeatherHK ● Backend: Python + Scrapy + Database + Twitter + NNTP...... ● Frontend: Twitter + Newsgroup
  • 50. TCTrack ● http://sammy.hk/projects/tctrack/tctrack.php ● Plot TC current and forecast tracks over Google Map. ● Source: – JTWC – HKO
  • 51. TCTrack ● http://sammy.hk/projects/tctrack/tctrack.php ● Probably the first TC track map in HK using Google Maps. ● Use of GMap: TCTrack -> Weather Underground Hong Kong -> HKO
  • 53. Release information to citizens in a better presentation.
  • 54. Starting new Open Source project hk0weather to create Open Weather Data.
  • 55. Starting new Open Source projects to create Open Data ● Develop an open source project. ● Release data in a standard machine-readable data format.
  • 56. hk0weather ● https://github.com/sammyfung/hk0weather ● Open Source Hong Kong Weather Project. ● converts HKO webpages into JSON data. ● python + scrapy ● 1st version: extracts temperature and humidity for 20+ weather stations from the current weather report and exports them in JSON format.
  • 57. hk0weather ● https://github.com/sammyfung/hk0weather ● $ virtualenv hk0weatherenv ● $ source hk0weatherenv/bin/activate ● $ pip install scrapy ● $ git clone https://github.com/sammyfung/hk0weather.git ● $ cd hk0weather ● $ scrapy crawl currwx -t json -o testresult
  • 58. hk0weather ● Python – import re ● Scrapy – web crawling framework written in Python. – HtmlXPathSelector. – built-in JSON, CSV, XML output.
  • 59. hk0weather [{"humidity": 80, "station": "hko", "temperture": 17, "time": 1360785720}, {"station": "kingspark", "temperture": 16, "time": 1360785720}, {"station": "wongchukhang", "temperture": 17, "time": 1360785720}, {"station": "takwuling", "temperture": 16, "time": 1360785720}, {"station": "laufaushan", "temperture": 15, "time": 1360785720}, {"station": "taipo", "temperture": 16, "time": 1360785720}, {"station": "shatin", "temperture": 17, "time": 1360785720}, {"station": "tuenmun", "temperture": 17, "time": 1360785720}, {"station": "tseungkwano", "temperture": 16, "time": 1360785720}, {"station": "saikung", "temperture": 16, "time": 1360785720}, {"station": "cheungchau", "temperture": 17, "time": 1360785720}, {"station": "cheungchau", "temperture": 17, "time": 1360785720}, {"station": "tsingyi", "temperture": 17, "time": 1360785720}, {"station": "shekkong", "temperture": 15, "time": 1360785720}, {"station": "tsuenwanhokoon", "temperture": 15, "time": 1360785720}, {"station": "tsuenwanshingmunvalley", "temperture": 17, "time": 1360785720}, {"station": "hongkongpark", "temperture": 17, "time": 1360785720}, {"station": "shaukeiwan", "temperture": 16, "time": 1360785720}, {"station": "kowlooncity", "temperture": 16, "time": 1360785720}, {"station": "happyvalley", "temperture": 18, "time": 1360785720}, {"station": "wongtaisin", "temperture": 17, "time": 1360785720}, {"station": "stanley", "temperture": 16, "time": 1360785720}, {"station": "kwuntong", "temperture": 15, "time": 1360785720}, {"station": "shamshuipo", "temperture": 17, "time": 1360785720}]
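
Because the output is plain JSON, a consumer needs only the standard library to use it. The sketch below loads three records copied from the sample above; reading `time` as Unix epoch seconds is an assumption, and the variable names are chosen for the example.

```python
import json
from datetime import datetime, timezone

# Three records copied from the sample output on the slide.
SAMPLE_OUTPUT = """[
 {"humidity": 80, "station": "hko", "temperture": 17, "time": 1360785720},
 {"station": "kingspark", "temperture": 16, "time": 1360785720},
 {"station": "laufaushan", "temperture": 15, "time": 1360785720}
]"""

records = json.loads(SAMPLE_OUTPUT)

# Index the records by station code for direct lookup.
by_station = {record["station"]: record for record in records}

# "temperture" is the key spelling used in the project's output.
coldest = min(records, key=lambda record: record["temperture"])

# Assumption: "time" holds Unix epoch seconds.
observed = datetime.fromtimestamp(records[0]["time"], tz=timezone.utc)

print(coldest["station"], coldest["temperture"])
print(by_station["hko"].get("humidity"))
# In this sample only the "hko" record carries humidity, so this is None.
print(by_station["kingspark"].get("humidity"))
print(observed.isoformat())
```

Using `.get("humidity")` rather than indexing keeps the consumer working for stations whose records omit that field.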
  • 60. Items.py
      class Hk0WeatherItem(Item):
          time = Field()
          station = Field()
          temperture = Field()
          humidity = Field()
  • 62. Currwx.py
      def parse(self, response):
          laststation = ''
          temperture = int()
          stations = []
          hxs = HtmlXPathSelector(response)
          report = hxs.select('//div[@id="ming"]')
  • 63. libhk0
      class hk0:
          stations = [
              (u'天文台', 'hko'),
              (u'京士柏', 'kingspark'),
              (u'黃竹坑', 'wongchukhang'),
              (u'打鼓嶺', 'takwuling'),
              (u'流浮山', 'laufaushan'),
  • 64. libhk0
      class hk0:
          def gettime(self, report): …
          def hk0current(self, report): …
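
The slides elide the method bodies, so the following is only a guess at the general technique, not the project's actual code: a lookup table of Chinese station names and station codes, combined with a regular expression, turns report text into records. The five name/code pairs are the ones visible on the slide; the report layout ("name, number, 度"), the `parse_report` function and its parameters are assumptions made for illustration.

```python
import re

# The five name/code pairs visible on the slide.
STATIONS = [
    ("天文台", "hko"),
    ("京士柏", "kingspark"),
    ("黃竹坑", "wongchukhang"),
    ("打鼓嶺", "takwuling"),
    ("流浮山", "laufaushan"),
]

# Hypothetical report text; the real HKO page layout may differ.
SAMPLE_REPORT = """天文台 17 度
京士柏 16 度
流浮山 15 度
"""


def parse_report(report, timestamp):
    """Return one record per known station that appears in the report."""
    items = []
    for name, code in STATIONS:
        # Station name, optional whitespace, digits, then the degree word.
        match = re.search(re.escape(name) + r"\s*(\d+)\s*度", report)
        if match:
            items.append(
                {
                    "station": code,
                    "temperture": int(match.group(1)),  # key spelling as in the project
                    "time": timestamp,
                }
            )
    return items


items = parse_report(SAMPLE_REPORT, 1360785720)
print(items)
```

Stations missing from the report are skipped rather than reported as errors, so a partial bulletin still yields usable records.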
  • 65. Agenda ● What is Open Data ? ● Use of Open Source Software in web crawling. ● Starting new Open Source project hk0weather to create Open Weather Data.
  • 66. We want an easier way to access the public data.
  • 67. We want a better life with public data.