3. Sammy Fung
● Software Developer
– to use and develop open source
software.
– Perl → PHP → Python.
– Interests in data mining / web
scraping.
– Consultant in web technology.
4. Sammy Fung
● 15+ years in Open Source Communities.
– Founding Chairman, Hong Kong Linux User Group.
– Founding Chairman, Open Source Hong Kong.
– Member, GNOME Asia committee.
– Mozilla Representative in Hong Kong.
– Organize, speak and participate in open source
conferences in East Asia and the U.S. in recent years –
Taiwan, Korea, Japan, Malaysia, Singapore, and
Bay Area, CA.
5. Agenda
● Local Weather Information
– From Local Meteorological Observatories to Open
Data.
● GNOME Shell Extension
– Weather Widgets
14. Open Data
Three Laws of Open Government Data by David Eaves.
1. If it can't be spidered or indexed, it doesn't exist.
2. If it isn't available in open and machine readable
format, it can't engage.
3. If a legal framework doesn't allow it to be
repurposed, it doesn't empower.
http://eaves.ca/2009/09/30/three-law-of-open-government-data/
15. Open Data
● Tim Berners-Lee
– the inventor of the
Web.
– 5stardata.info
● 5 star deployment
scheme of Open Data
suggested by Tim
Berners-Lee.
16. Five Star Open Data - 5stardata.info
1. make your stuff available on the Web (whatever format)
under an open license.
2. make it available as structured data (e.g., Excel instead of
image scan of a table)
3. use non-proprietary formats (e.g., CSV instead of Excel)
4. use URIs to denote things, so that people can point at your
stuff.
5. link your data to other data to provide context.
5stardata.info by Tim Berners-Lee, the inventor of the Web.
18. Open Data in Hong Kong
● Data.One
– http://www.gov.hk/en/theme/psi
– released on 2011/3/31.
– First App Competition on Data.One
● Call for Submission now till 2014/02/28.
19. Weather Information in Hong Kong
● Hong Kong Observatory
– Hourly Hong Kong Weather Report
– Regional Weather in Hong Kong (10 min updates)
– Weather Forecast and Weekly Weather Forecast
– Typhoon Report and Forecast
22. Weather at Data.One
● I posted a blog entry, 'Progress of Open
Government Data in Hong Kong', on
2013/01/17.
● Weather at Data.One provides 7 dataset URLs,
returning RSS (XML) in English, Traditional Chinese
and Simplified Chinese.
– One word: Useless.
– The Data.One dataset (RSS) is completely different
from HKO's own paid service (XML).
23. Weather at Data.One
● Example - Current local weather report:
● Plain text report in RSS.
● The only difference is the tag that wraps the report content:
– Website: a pair of HTML tags, e.g. <PRE>....</PRE>.
– Data.One: a pair of RSS description tags,
<description>....</description>.
● Other weather data is missing, e.g. regional
temperature updates every 12 mins.
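A minimal sketch of what a program has to do with such a feed. The RSS sample and the report wording below are made up for illustration and are not copied from Data.One; the function names are hypothetical too. The XML parser only gets as far as the description tag, and the actual values still have to be dug out of free text with regular expressions.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical RSS sample; the real Data.One feed content may differ.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Current Weather Report</title>
    <item>
      <title>Bulletin updated at 14:02</title>
      <description>Air temperature : 23 degrees Celsius
Relative Humidity : 80 per cent</description>
    </item>
  </channel>
</rss>"""


def report_text(rss_xml):
    """Return the plain-text report held in the first item's <description>."""
    root = ET.fromstring(rss_xml)
    return root.findtext("./channel/item/description")


def parse_report(text):
    """Pull numbers out of the free-text report with regular expressions."""
    temperature = re.search(r"Air temperature\s*:\s*(-?\d+)", text)
    humidity = re.search(r"Relative Humidity\s*:\s*(\d+)", text)
    return {
        "temperature_c": int(temperature.group(1)) if temperature else None,
        "humidity_pct": int(humidity.group(1)) if humidity else None,
    }


report = report_text(SAMPLE_RSS)
print(parse_report(report))
```

If the wording of the report changes, the regular expressions silently stop matching, which is why a report wrapped in XML is still not data.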
24. Weather at Data.One
● Weather at Data.One is a 'report', not 'data'.
● Weather RSS was already released by HKO
before the launch of Data.One.
● Technically, JSON/XML formats are easier
for computer programs to read.
25. Digital21 Strategy
Public Consultation Document
(G) Public Sector Information (PSI) as Default
"34. Through different channels (like press releases, publications, websites, etc.), the
Government releases a lot of information in different areas. However, most of such
information can only be read but cannot be used. In view of the immense benefits
of widening access to PSI for free and easy re-use, we propose to make all
Government information released for public consumption machine-readable by
default. Where appropriate, datasets will be released with application programming
interfaces (APIs), providing predefined functions to make their retrieval easier."
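Chinese version (translated):
(G) Making Public Sector Information Widely Available
"34. The Government releases a large amount of information in different areas through
various channels (such as press releases, publications and websites). However, most of
this information can only be read and cannot be used. In view of the huge benefits of
opening up public sector information for free re-use, we propose that all government
information released for public use be compiled in digital format. Where applicable,
application programming interfaces will be released together with the data, providing
predefined functions so that the public can retrieve the data easily."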
26. Digital21 Strategy
Public Consultation Document
"33. PSI datasets can be used and meshed together to create
innovative new applications, as demonstrated by the creative and
useful products and services developed from PSI in Hong Kong and
around the world. For example, using PSI datasets on traffic snapshot
images, a number of mobile apps have been developed to provide
real-time traffic situation for users to avoid traffic jams in planning
their traffic routes. Experience from other developed economies
shows that widening access to PSI datasets can open up lucrative
business opportunities and bring social benefits. By tapping the
creativity of the community and entrepreneurs, the use of PSI can
lead to positive social outcomes. For instance, in some cities in the
United States, application of PSI on hygiene inspections has led to a
significant drop in food poisoning incidents."
28. Digital21 Strategy
Public Consultation Document
"35. Apart from Government data, there are vast amounts of PSI
handled, collected and disseminated by public organisations,
which are equally useful for the development of innovative
services and products. Therefore, we propose to encourage
public organisations (e.g. public utilities and transport operators)
to release data owned by them in machine-readable format."
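Chinese version (translated):
"35. Apart from government information, Hong Kong also has a large amount of
public sector information handled, collected and disseminated by public
organisations, which is equally useful for developing innovative services and
products. We therefore propose to encourage public organisations (e.g. public
utilities and transport operators) to release information compiled in digital format."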
29. Open Data in Taiwan
● Open Weather Data from CWB.
● Community – g0v.tw
34. g0v.tw
● Promote information transparency.
● Develop information platform and tools for a
society of citizen participation.
● Open Source model.
● Stack Overflow-like Q&A system for the public to
ask for the data they are looking for.
35. g0v.tw
● Established after Taiwan Yahoo! Open Hack
Day in October 2012.
● Hackers, Professors, NGO/NPO, Students,
Writers, Visual Media, Legal Professionals.
● Has organized 5+ bi-monthly hackathons since
December 2012.
37. Air Pollution Index
● http://g0v.github.io/twgeojson/air.html
● Developed a web-based map that visualizes air
pollution.
● Use Open Data provided by Environmental
Protection Administration
(opendata.epa.gov.tw)
● Air Pollution Indexes and Data from different
stations.
39. Moedict 萌典
● Raw data from Ministry of Education (edu.tw)
● Community-built web-based Chinese
dictionary with 160,000 Chinese entries and
other entries.
● Supports auto-completion, searching and
offline versions.
● Source code, other platforms and data are
available on 3du.tw (hackpad).
42. Programme List of Paid TVs in 2004
● I wanted to know which channel was
showing a live football match.
● Paid TV web site = M$ + IIS + ASP + Flash
● Slow....... Very Slow...... Extremely Slow!
● Couldn't connect during peak hours!
● Wrote my first web crawler in PHP in 2004.
43. Public Transportation in 2006-2010
● Kowloon Motor Bus (KMB)
– No map view for a bus route
● Public Transportation Enquiry System (PTES)
– Extremely poor, ugly (or much worse) map UI on
PTES.
44. HK Observatory and Joint Typhoon
Warning Center
● Is any typhoon coming to Hong Kong? And
when will it come?
● No easy data exchange format.
● No RSS or Atom feeds.
● We don't check websites every day.
53. Web Scraping
● a computer software technique of extracting
information from websites. (Wikipedia)
● for business, hobbies, research purposes.
54. Web Scraping
● Look for the right URLs to scrape.
● Look for the right content in the webpages.
● Save the data into a data store.
● When to run the web scraping program?
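The four steps above can be sketched with the Python standard library alone. Everything named here is an assumption made for illustration: the URL is a placeholder, the page content is a made-up sample, and `fetch` is stubbed so that the sketch runs offline.

```python
import re
import sqlite3
from datetime import datetime, timezone

# Hypothetical page content standing in for a fetched web page.
SAMPLE_PAGE = "<html><body><pre>Air temperature : 23 degrees Celsius</pre></body></html>"


def fetch(url):
    """Step 1: get the page for the chosen URL.

    Stubbed for this sketch; a real scraper could use
    urllib.request.urlopen(url).read().decode() instead.
    """
    return SAMPLE_PAGE


def extract(html):
    """Step 2: find the right content in the page."""
    match = re.search(r"Air temperature\s*:\s*(-?\d+)", html)
    return int(match.group(1)) if match else None


def save(conn, url, value):
    """Step 3: save the extracted value into a data store."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings "
        "(url TEXT, fetched_at TEXT, temperature_c INTEGER)"
    )
    conn.execute(
        "INSERT INTO readings VALUES (?, ?, ?)",
        (url, datetime.now(timezone.utc).isoformat(), value),
    )
    conn.commit()


def run_once(conn, url):
    """Step 4: one run; a scheduler such as cron decides when to call this."""
    value = extract(fetch(url))
    if value is not None:
        save(conn, url, value)
    return value


conn = sqlite3.connect(":memory:")
run_once(conn, "https://example.org/weather/current.htm")
print(conn.execute("SELECT url, temperature_c FROM readings").fetchall())
```

Keeping each step in its own function means the run schedule can follow how often the source page updates, without touching the extraction code.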
56. Use of Open Source Software in
Web Crawling
● Use Open Source Tools to collect useful and
meaningful machine-readable data.
● No need to wait for the provider to release data
in machine-readable format.
57. Open Source Tools
● Python programming language
● with Regular Expression library
● Scrapy web crawling framework
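As a rough illustration of the regular expression part, the sketch below turns a plain-text table of station readings into a list of dictionaries. The report layout, station values and field names are assumptions made up for this example; the real HKO report may be laid out differently.

```python
import re

# Hypothetical report text; the layout of the real HKO report may differ.
REPORT = """\
King's Park        23 degrees ;  80 per cent
Sha Tin            22 degrees ;  85 per cent
Cheung Chau        24 degrees ;  78 per cent
"""

# Station name, then at least two spaces, then temperature and humidity.
LINE = re.compile(
    r"^(?P<station>.+?)\s{2,}(?P<temp>-?\d+) degrees ;\s+(?P<humidity>\d+) per cent$",
    re.MULTILINE,
)

stations = [
    {
        "station": m.group("station"),
        "temperature_c": int(m.group("temp")),
        "humidity_pct": int(m.group("humidity")),
    }
    for m in LINE.finditer(REPORT)
]

for row in stations:
    print(row)
```

Named groups keep the pattern readable and make it obvious which part of a line feeds which field.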
58. Why Python + Scrapy ?
● Python: my favourite programming
language for the past few years.
● Scrapy: a web crawling framework written in
Python.
59. What is Scrapy ?
● An open source web scraping framework for
Python.
● Scrapy is a fast high-level screen scraping and
web crawling framework, used to crawl
websites and extract structured data from
their pages. It can be used for a wide range of
purposes, from data mining to monitoring
and automated testing.
60. Scrapy Features
● Define the data you want to scrape
● Write a spider to extract the data
● Built-in: selecting and extracting data from HTML
and XML
● Built-in: JSON, CSV, XML output
● Interactive shell console
● Built-in: web service, telnet console, logging
● Others
62. hk0weather
● Open Source.
● Scrapes the HKO website.
● Outputs data in standard machine-readable
data formats – JSON, XML.
● https://github.com/sammyfung/hk0weather
● python + scrapy
63. hk0weather
● 1st version:
– from hourly weather report
– extracting temperature and humidity from 20+
weather stations, exported in JSON format.
● 2nd version:
– From 10-minute update regional weather report.
– Including wind directions, wind speeds, max
gusts.
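To show what machine-readable output can look like, here is a small sketch that exports station readings as JSON and as XML with the Python standard library. The station values, the field names and the XML element names are assumptions for illustration only and are not hk0weather's actual schema.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical readings; station values are illustrative only.
readings = [
    {"station": "King's Park", "temperature_c": 23, "humidity_pct": 80},
    {"station": "Sha Tin", "temperature_c": 22, "humidity_pct": 85},
]


def to_json(rows):
    """Serialize the readings as a JSON array of objects."""
    return json.dumps(rows, indent=2)


def to_xml(rows):
    """Serialize the readings as <stations><station name="...">...</station></stations>."""
    root = ET.Element("stations")
    for row in rows:
        station = ET.SubElement(root, "station", name=row["station"])
        ET.SubElement(station, "temperature_c").text = str(row["temperature_c"])
        ET.SubElement(station, "humidity_pct").text = str(row["humidity_pct"])
    return ET.tostring(root, encoding="unicode")


print(to_json(readings))
print(to_xml(readings))
```

Either output can be loaded back by a program in one call, with no regular expressions needed on the consumer's side.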
70. GNOME Shell
● Core user interface functions for GNOME
– Two screenshots follow this slide.
● Switching to windows
● Launching applications
● Panel at the top of the screen
● the Activities Overview
● Message Tray at the bottom of the screen.
71. (Screenshot: GNOME Shell)
72. (Screenshot: GNOME Shell)
73. GNOME Shell Extensions
● Small pieces of code
● Written by third party developers
– That means it could be most of you!
● Modify the way GNOME works.
● Similar to Chrome extensions or Firefox add-ons.
● Extensions can be found and installed from
extensions.gnome.org.
74. What can GNOME Shell Extensions
do ?
● Extensions may make small changes.
– like moving your clock to the right-hand side of
the screen
● Or make big changes
– like arranging the windows in the Activities
Overview in a different way.
77. Installation of GNOME Shell
Extensions
● Make sure the "GNOME Shell Integration" plugin is
installed and enabled in your browser preferences.
● Go to extensions.gnome.org, find and install.
– Whitelist this website or turn off the click-to-play
feature in your browser.
● Make sure Unzip is installed.
79. Weather (by Neroth)
● A simple extension for displaying weather
information from several cities in GNOME
Shell
● https://github.com/Neroth/gnome-shell-extension-weather
82. OpenWeather (by jens)
● Weather extension to display weather
information from OpenWeatherMap for many
cities in GNOME Shell.
● https://github.com/jenslody/gnome-shell-extension-openweather
84. Developing next Weather Widget
● With data from city/town-level weather
stations.
– Yahoo Weather ?
– OpenWeatherMap ?
– Open ''Weather'' Data ?
● From Observatories
● Web Scraping