Open Source Weather
Information Project with
OpenStack Object Storage
Sammy Fung
blog.linuxharbour.com
sammy.hk
OpenStack Summit 2013
Welcome to
Hong Kong!
Sammy Fung
● Software Developer
  – Uses and develops open source software.
  – Perl → PHP → Python.
  – Startup work on an online job board, job research and web crawling.
  – Consultancy work at an internet service company, 43 Global, to deploy an OpenStack cloud service.
Sammy Fung
● Open Source Community Leader.
  – Founding Chairman, Hong Kong Linux User Group.
  – Community Manager, opensource.hk.
  – GNOME Asia committee member.
  – Mozilla Rep.
  – Program committee member of COSCUP, the largest Open Source conference in Taiwan.
● Blogger at sammy.hk.
About this presentation
● I have presented my hk0weather project at various open source events and conferences in Hong Kong and Asia this year.
● Weather information is a personal interest of mine.
● Started the open source project hk0weather.
● Traditional database or object storage?
OpenStack at ISP
●

Compute: nova

●

Block storage: cinder

●

Networking: quantum / neutron

●

Dashboard: Horizon
BUT
OpenStack is not just
a platform of
Virtual Servers
OpenStack is a platform of

cloud services.
We should do some education.
So, in this talk, I cover the use of object storage.
Agenda
● What is Open Data ?
● Use of Open Source Software in web crawling.
● Starting new Open Source project hk0weather to create Open Weather Data.
● Use of OpenStack Object Storage
What is Open Data ?
Open Data
Three Laws of Open Government Data by David Eaves.
1. If it can't be spidered or indexed, it doesn't exist.
2. If it isn't available in open and machine readable format, it can't engage.
3. If a legal framework doesn't allow it to be repurposed, it doesn't empower.
http://eaves.ca/2009/09/30/three-law-of-open-government-data/
Open Data
●

Tim Berners-Lee, the inventor of the Web.

●

5stardata.info - 5 star deployment scheme of Open Data.
1. make your stuff available on the Web (whatever format) under an open license.
2. make it available as structured data (e.g., Excel instead of image scan of a table)
3. use non-proprietary formats (e.g., CSV instead of Excel)
4. use URIs to denote things, so that people can point at your stuff.
5. link your data to other data to provide context.
Legco Meeting Minutes
and Voting Results
Weather Information in Hong Kong
●

Hong Kong Observatory
–

Hourly Hong Kong Weather Report

–

Regional Weather in Hong Kong (10 min updates)

–

Weather Forecast and Weekly Weather Forecast

–

Typhoon Report and Forecast

–

Weather Maps and Images
Weather Chart
Weather Radar Image
Hong Kong Observatory RSS
Weather at Data.One
● My Chinese blog post 'Progress of Open Government Data in Hong Kong' on 2013/1/17.
● Data.One was released on 2011/3/31.
● Weather at Data.One provides 7 dataset URLs, which return RSS (XML) format (Eng/TChi/SChi).
  – One word: Useless.
  – The Data.One dataset (RSS) is completely different from HKO's own paid service (XML).
Weather at Data.One
● Example - Current local weather report:
● Plain text report in RSS.
● Difference in how the report content is quoted:
  – Website: a pair of HTML tags, e.g. <PRE>....</PRE>.
  – Data.One: a pair of RSS description tags, <description>....</description>.
● Other weather data is missing, e.g. regional temperature updates every 12 mins.
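A minimal sketch of what a consumer has to do with a plain-text report wrapped in <description>: the XML part parses cleanly, but the values still need regular expressions. The sample feed and the wording of the report are made up for illustration and are not actual Data.One output.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample feed: the RSS wrapper is machine readable,
# but the weather values are buried in free text.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Current Weather Report</title>
    <item>
      <title>Bulletin (sample)</title>
      <description>Air temperature : 17 degrees Celsius
Relative Humidity : 80 per cent</description>
    </item>
  </channel>
</rss>"""


def parse_report(rss_text):
    """Pull numbers out of the free-text description with regular expressions."""
    root = ET.fromstring(rss_text)
    description = root.findtext("./channel/item/description") or ""
    temperature = re.search(r"Air temperature\s*:\s*(-?\d+)", description)
    humidity = re.search(r"Relative Humidity\s*:\s*(\d+)", description)
    return {
        "temperature": int(temperature.group(1)) if temperature else None,
        "humidity": int(humidity.group(1)) if humidity else None,
    }


print(parse_report(SAMPLE_RSS))
```

Any change in the wording of the report would break these patterns, which is the practical difference between a 'report' and 'data'.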
Weather at Data.One
● Weather at Data.One is a 'report', not 'data'.
● Weather RSS was already released by HKO before the launch of Data.One.
● Technically, JSON/XML formats are easier for computer programs to read.
Open Data is important to citizens.
Use of Open Source
Software in web
crawling
Web Scraping
●

a computer software technique of extracting
information from websites. (Wikipedia)

●

for business, hobbies, research purposes.
Web Scraping
●

Look for the right URLs to scrape.

●

Look for the right content in the webpages.

●

Saving data into data store.

●

When to run the web scraping program ?
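The steps above can be sketched with the Python standard library alone. This is only an illustration: the markup pattern (<td class="temp">) is a hypothetical example rather than any real site's HTML, and fetch() is defined but not called so that the demonstration runs offline.

```python
import json
import re
from urllib.request import urlopen


def fetch(url):
    """Step 1: download the page at a URL worth scraping."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def extract(html):
    """Step 2: pick the wanted content out of the page.

    Assumes hypothetical markup like <td class="temp">17</td>.
    """
    return [int(v) for v in re.findall(r'<td class="temp">(-?\d+)</td>', html)]


def save(values, path):
    """Step 3: write the extracted data to a simple data store (a JSON file)."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(values, handle)


# Offline demonstration with an inline page instead of a live request.
sample_html = '<table><tr><td class="temp">17</td><td class="temp">16</td></tr></table>'
print(extract(sample_html))
```

The last question, when to run the program, is usually answered outside the code, for example with a scheduler such as cron.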
Use of Open Source Software in
Web Crawling
● Use Open Source Tools to collect useful and meaningful machine-readable data.
● No need to wait for the provider to release data in machine-readable format.
Open Source Tools
●

Python programming language

●

with Regular Expression library

●

Scrapy web crawling framework
Why python + scrapy ?
● python: my favourite programming language for the past few years.
● scrapy: web crawling framework written in Python.
What is Scrapy ?
● An open source web scraping framework for Python.
● Scrapy is a fast high-level screen scraping and web crawling framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.
Scrapy Features
●

define the data you want to scrape

●

write spider to extract data

●

Built-in: selecting and extracting data from HTML
and XML

●

Built-in: JSON, CSV, XML output

●

Interactive shell console

●

Built-in: web service, telnet console, logging

●

Others
Programme List of Paid TVs in 2004
●

I wanted to know which channel was showing a
live football match.

●

Paid TV web site = M$ + IIS + ASP + Flash

●

Slow....... Very Slow...... Extremely Slow!

●

Couldn't connect at any peak hours!

●

Wrote my first web crawler in PHP in 2004.
Public Transportation in 2006-2010
● Kowloon Motor Bus (KMB)
  – No map view for a bus route
● Public Transportation Enquiry System (PTES)
  – Extremely poor, ugly (or much worse) map UI on PTES.
HK Observatory and Joint Typhoon
Warning Center
●

Any typhoon is coming to Hong Kong ? And
When will it come ?

●

No easy data exchange format.

●

No RSS nor ATOM.

●

We don't check websites every day.
My Products
●

WeatherHK ← ← ←

●

TCTrack
WeatherHK
●

http://twitter.com/weatherhk

●

hourly current weather report

●

weather forecast report

●

tropical cyclone warning signal
WeatherHK
● Backend: Python + Scrapy + Database + Twitter + NNTP......
● Frontend: Twitter + Newsgroup
WeatherHK
●

http://twitter.com/weatherhk

●

Interviewed by MetroPop in 2009.
My Products
●

WeatherHK

●

TCTrack ← ← ←
TCTrack
● http://sammy.hk/projects/tctrack/tctrack.php
● Plot TC current and forecast tracks over Google Map.
● Source:
  – JTWC
  – HKO
TCTrack
● http://sammy.hk/projects/tctrack/tctrack.php
● Probably the first TC track map in HK using Google Maps.
● Use of Google Maps: TCTrack -> Weather Underground Hong Kong -> HKO
TCTrack
●

http://twitter.com/tctrack

●

Tweet JTWC updates for Northwest Pacific.
Releases information to citizens
in a better presentation.
Starting new Open
Source project
hk0weather to create
Open Weather Data.
Starting new Open Source projects
to create Open Data
● Develop an open source project.
● Release data in standard machine-readable data format.
hk0weather
●

https://github.com/sammyfung/hk0weather

●

Open Source Hong Kong Weather Project.

●

converts HKO webpages to JSON data.

●

python + scrapy

●

1st version: from the current weather report,
extracts temperature and humidity from 20+
weather stations, exports in JSON format.
hk0weather
●

https://github.com/sammyfung/hk0weather

●

$ virtualenv hk0weatherenv

●

$ source hk0weatherenv/bin/activate

●

$ pip install scrapy

●

$ git clone https://github.com/sammyfung/hk0weather.git

●

$ cd hk0weather

●

$ scrapy crawl currwx -t json -o testresult
hk0weather
●

Python
–

●

import re

Scrapy
–

web crawling framework written in Python.

–

HtmlXPathSelector.

–

built-in JSON, CSV, XML output.
hk0weather
[{"humidity": 80, "station": "hko", "temperture": 17, "time": 1360785720},
{"station": "kingspark", "temperture": 16, "time": 1360785720},
{"station": "wongchukhang", "temperture": 17, "time": 1360785720},
{"station": "takwuling", "temperture": 16, "time": 1360785720},
{"station": "laufaushan", "temperture": 15, "time": 1360785720},
{"station": "taipo", "temperture": 16, "time": 1360785720},
{"station": "shatin", "temperture": 17, "time": 1360785720},
{"station": "tuenmun", "temperture": 17, "time": 1360785720},
{"station": "tseungkwano", "temperture": 16, "time": 1360785720},
{"station": "saikung", "temperture": 16, "time": 1360785720},
{"station": "cheungchau", "temperture": 17, "time": 1360785720},
{"station": "cheungchau", "temperture": 17, "time": 1360785720},
{"station": "tsingyi", "temperture": 17, "time": 1360785720},
{"station": "shekkong", "temperture": 15, "time": 1360785720},
{"station": "tsuenwanhokoon", "temperture": 15, "time": 1360785720},
{"station": "tsuenwanshingmunvalley", "temperture": 17, "time": 1360785720},
{"station": "hongkongpark", "temperture": 17, "time": 1360785720},
{"station": "shaukeiwan", "temperture": 16, "time": 1360785720},
{"station": "kowlooncity", "temperture": 16, "time": 1360785720},
{"station": "happyvalley", "temperture": 18, "time": 1360785720},
{"station": "wongtaisin", "temperture": 17, "time": 1360785720},
{"station": "stanley", "temperture": 16, "time": 1360785720},
{"station": "kwuntong", "temperture": 15, "time": 1360785720},
{"station": "shamshuipo", "temperture": 17, "time": 1360785720}]
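Because the output is plain JSON, any program can consume it without scraping. A small sketch that loads a shortened copy of the output above and summarises it; the summarize() helper is hypothetical, and the field name "temperture" is kept exactly as the project spells it.

```python
import json

# Three records copied from the output above.
RAW = """[{"humidity": 80, "station": "hko", "temperture": 17, "time": 1360785720},
{"station": "kingspark", "temperture": 16, "time": 1360785720},
{"station": "happyvalley", "temperture": 18, "time": 1360785720}]"""


def summarize(records):
    """Return basic statistics; humidity is optional in each record."""
    temps = [r["temperture"] for r in records]
    return {
        "count": len(records),
        "min": min(temps),
        "max": max(temps),
        "mean": sum(temps) / len(temps),
        "stations_with_humidity": [r["station"] for r in records if "humidity" in r],
    }


records = json.loads(RAW)
print(summarize(records))
```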
Items.py
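from scrapy.item import Item, Field

class Hk0WeatherItem(Item):
    # One reading from one weather station.
    time = Field()        # report time, e.g. 1360785720 in the sample output
    station = Field()     # station key, e.g. 'hko'
    temperture = Field()  # field name spelled as in the project
    humidity = Field()    # only present for some stations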
Currwx.py
start_urls = (
    'http://www.weather.gov.hk/wxinfo/currwx/currentc.htm',
)
Currwx.py
def parse(self, response):
    laststation = ''
    temperture = int()
    stations = []
    hxs = HtmlXPathSelector(response)
    report = hxs.select('//div[@id="ming"]')
libhk0
class hk0:
    stations = [
        (u' 天 文 台 ', 'hko'),
        (u' 京 士 柏 ', 'kingspark'),
        (u' 黃 竹 坑 ', 'wongchukhang'),
        (u' 打 鼓 嶺 ', 'takwuling'),
        (u' 流 浮 山 ', 'laufaushan'),
        …
    ]
libhk0
class hk0:
    def gettime(self, report):
        …
    def hk0current(self, report):
        …
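The method bodies are elided on the slide, so the following is not the project's implementation. It is a hypothetical sketch of how a (Chinese name, key) station table can turn one line of a report into an item. The assumed line format (station name followed by a number) and the parse_line() helper are illustrations only.

```python
import re

# Station table in the same (display name, key) shape as the slide.
# Names are written without padding here; the real table may differ.
STATIONS = [
    (u"天文台", "hko"),
    (u"京士柏", "kingspark"),
    (u"黃竹坑", "wongchukhang"),
]


def normalize(text):
    """Drop all whitespace so spaced-out names compare equal."""
    return re.sub(r"\s+", "", text)


def parse_line(line, timestamp):
    """Return an item dict for 'name ... number', or None if no station matches."""
    compact = normalize(line)
    for name, key in STATIONS:
        if compact.startswith(name):
            match = re.search(r"(-?\d+)", compact[len(name):])
            if match:
                return {
                    "station": key,
                    "temperture": int(match.group(1)),
                    "time": timestamp,
                }
    return None


print(parse_line(u"京 士 柏   16 度", 1360785720))
```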
Data Store
●

Scrapy
–

MySQL

–

SQLite
Solution 1 – MySQL / SQLite
●

Develop:
–

Web Crawler: Scrapy with MySQL/SQLite client

–

Backend: Handling query request with Django

–

Frontend: UI/UX design, query to backend

●

Image Files ?

●

Redundancy ?
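A minimal sketch of Solution 1 with SQLite, using only the standard library. The table name, column names and helper functions are assumptions for illustration; in a real Scrapy project the save step would more likely live in an item pipeline.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the data store; one row per station and report time."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reading ("
        "station TEXT NOT NULL, time INTEGER NOT NULL, "
        "temperture INTEGER, humidity INTEGER, "
        "PRIMARY KEY (station, time))"
    )
    return conn


def save_readings(conn, items):
    """Crawler side: insert items, replacing any duplicate (station, time)."""
    rows = [{"temperture": None, "humidity": None, **item} for item in items]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO reading (station, time, temperture, humidity) "
            "VALUES (:station, :time, :temperture, :humidity)",
            rows,
        )


def latest(conn, station):
    """Backend side: newest (temperture, humidity, time) for a station."""
    return conn.execute(
        "SELECT temperture, humidity, time FROM reading "
        "WHERE station = ? ORDER BY time DESC LIMIT 1",
        (station,),
    ).fetchone()


conn = open_store()
save_readings(conn, [
    {"humidity": 80, "station": "hko", "temperture": 17, "time": 1360785720},
    {"station": "kingspark", "temperture": 16, "time": 1360785720},
])
print(latest(conn, "hko"))
```

This covers structured readings well, but it does not answer the two open questions on the slide: where image files go, and how the database is made redundant.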
Infrastructure as a Service
● Public Cloud:
  – Rackspace, AWS.....
● Private Cloud:
  – OpenStack
● Object Services on IaaS:
  – Amazon S3 (Simple Storage Service)
  – Open Source: OpenStack Swift
Use of OpenStack
Data Storage
Application Software = Front-end + Back-end
Web:
Front-end = UI/UX at Web Browser
Back-end = Handling JSON, REST......
Mobile:
Front-end = UI/UX at Mobile App
Back-end = Handling JSON, REST......
Solution 2 – OpenStack Swift
● Develop:
  – Web Crawler: Scrapy with Swift client
  – Backend: Handling query request with Swift
  – Frontend: UI/UX design, query to backend
● Image Files ? Redundancy ?
  – Both are solved: OpenStack or the provider provides Object Services.
Swift – OpenStack Object Storage
● Object types range from standard data (int or string) to image / video files.
● Supports S3 API
● REST API (Get / Put / Delete) to access stored data through HTTP.

●

Easily add capacity unlike RAID resize

●

Data Replication

●

No central database, RAID not required

●

Memcached (Fast Data Caching)
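A sketch of what the REST access looks like at the HTTP level: each object has a URL of the form storage URL / container / object name, and the request carries an X-Auth-Token header. The storage URL, token, container and object name below are placeholders, and the requests are only built, not sent.

```python
from urllib.parse import quote
from urllib.request import Request


def object_request(method, storage_url, token, container, name,
                   body=None, content_type=None):
    """Build (but do not send) a Swift object request."""
    url = "{}/{}/{}".format(storage_url.rstrip("/"), quote(container), quote(name))
    headers = {"X-Auth-Token": token}
    if content_type:
        headers["Content-Type"] = content_type
    return Request(url, data=body, headers=headers, method=method)


# Placeholder values; real ones come from the authentication step.
STORAGE_URL = "https://swift.example.com/v1/AUTH_demo"
TOKEN = "example-token"

put = object_request("PUT", STORAGE_URL, TOKEN, "weather", "currwx/hko.json",
                     body=b'{"temperture": 17}', content_type="application/json")
get = object_request("GET", STORAGE_URL, TOKEN, "weather", "currwx/hko.json")
delete = object_request("DELETE", STORAGE_URL, TOKEN, "weather", "currwx/hko.json")

for request in (put, get, delete):
    print(request.get_method(), request.full_url)
```

Passing one of these objects to urllib.request.urlopen() would perform the call against a real endpoint.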
Some Swift Clients in Python
●

OpenStack python-swiftclient

●

Ceph: ceph object gateway (swift-compatible)

●

Rackspace pyrax (most OpenStack works)

●

Others
Solution 2 – OpenStack Swift
● Web Crawler: Storing Data in Scrapy
  – Connection to Swift, Account Authentication.
  – Create Objects (Data) in a Container.
  – Store data as json object.
  – Store image files as image object.
● Backend: Handling query request with Swift
  – Connection to Swift, Account Authentication.
  – Retrieve Objects from Containers.
  – Return Object URL.
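The crawler and backend steps above can be sketched against a connection object that offers put_container(), put_object() and get_object(). Those three method names follow python-swiftclient's Connection class, but the InMemorySwift stand-in, the helper functions and the object naming are assumptions made so the example runs without a Swift cluster.

```python
import json


class InMemorySwift:
    """Offline stand-in with the same three calls used below."""

    def __init__(self):
        self.containers = {}

    def put_container(self, container):
        self.containers.setdefault(container, {})

    def put_object(self, container, obj, contents, content_type=None):
        self.containers[container][obj] = (content_type, contents)

    def get_object(self, container, obj):
        content_type, contents = self.containers[container][obj]
        return {"content-type": content_type}, contents


def store_item(conn, container, item):
    """Crawler side: store one reading as a JSON object, return its name."""
    name = "{station}/{time}.json".format(**item)
    conn.put_container(container)
    conn.put_object(container, name, contents=json.dumps(item, sort_keys=True),
                    content_type="application/json")
    return name


def load_item(conn, container, name):
    """Backend side: fetch the object and decode the JSON body."""
    headers, body = conn.get_object(container, name)
    if isinstance(body, bytes):
        body = body.decode("utf-8")
    return json.loads(body)


conn = InMemorySwift()
name = store_item(conn, "weather", {"station": "hko", "temperture": 17, "time": 1360785720})
print(name, load_item(conn, "weather", name))
```

With python-swiftclient installed, a swiftclient.client.Connection(authurl=..., user=..., key=...) object is expected to accept the same calls in place of InMemorySwift; check this against the installed version before relying on it.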
Solution 2 – OpenStack Swift
● Advantages:
  – Replace MySQL with Swift object storage for part or all of the data queries.
  – OpenStack Public Cloud
    ● No database maintenance needed; it is handled by the public cloud provider.
  – OpenStack Private Cloud
    ● Use your own server farm without configuring a replicated database.
Solution 2 – OpenStack Swift
● Disadvantages:
  – Difficult to run complicated queries on the data; data should be stored in Swift in a well-defined structure.
    ● Define a filename syntax for JSON data and image files.
  – OpenStack Public Cloud
    ● Learn how to access data on Swift.
  – OpenStack Private Cloud
    ● Installing, configuring and maintaining OpenStack with Swift.
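One way to make up for the lack of complex queries is a predictable object naming scheme. The layout below (kind/station/YYYY/MM/DD/HHMM.extension, in UTC) is a hypothetical example, not the project's convention.

```python
from datetime import datetime, timezone


def object_name(kind, station, epoch, extension):
    """Build a name such as currwx/hko/2013/02/13/2002.json from a Unix time."""
    stamp = datetime.fromtimestamp(epoch, tz=timezone.utc).strftime("%Y/%m/%d/%H%M")
    return "{}/{}/{}.{}".format(kind, station, stamp, extension)


def day_prefix(kind, station, epoch):
    """Prefix shared by every object of one station on one UTC day."""
    day = datetime.fromtimestamp(epoch, tz=timezone.utc).strftime("%Y/%m/%d")
    return "{}/{}/{}/".format(kind, station, day)


print(object_name("currwx", "hko", 1360785720, "json"))
print(object_name("radar", "hk", 1360785720, "png"))
print(day_prefix("currwx", "hko", 1360785720))
```

With names like these, a question such as "all readings of one station on one day" becomes a container listing filtered by prefix, which Swift supports, rather than a database query.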
Lastly
Local OpenStack Workshop ?
● Thanks to the HK OpenStack users for introducing their solutions and deployments.
● To educate and extend the use of OpenStack in HK, should we organize a local hands-on OpenStack workshop where local app developers and companies can learn to use OpenStack ?
Thank You!
blog.linuxharbour.com
sammy.hk

 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 

Open Source Weather Information Project with OpenStack Object Storage

  • 1. Open Source Weather Information Project with OpenStack Object Storage Sammy Fung blog.linuxharbour.com sammy.hk OpenStack Summit 2013
  • 3. Sammy Fung ● Software Developer – Uses and develops open source software. – Perl → PHP → Python. – Startup work on online job board, job research and web crawling. – Consultancy work at an internet service company, 43 Global, to deploy OpenStack cloud service.
  • 4. Sammy Fung ● Open Source Community Leader. – Founding Chairman, Hong Kong Linux User Group. – Community Manager, opensource.hk. – GNOME Asia committee member. – Mozilla Rep. – Program committee member of COSCUP, the largest Open Source conference in Taiwan. ● Blogger at sammy.hk.
  • 5. About this presentation ● I have presented my hk0weather project at different open source events and conferences in Hong Kong and Asia this year. ● Weather information is one of my personal interests. ● Started the open source project hk0weather. ● Traditional Database or Object Storage ?
  • 6. OpenStack at ISP ● Compute: nova ● Block storage: cinder ● Networking: quantum / neutron ● Dashboard: Horizon
  • 7. BUT
  • 8. OpenStack is not just a platform of Virtual Servers
  • 9. OpenStack is a platform of cloud services.
  • 10. We should do some education. So, this talk is about the use of object storage.
  • 11. Agenda ● What is Open Data ? ● Use of Open Source Software in web crawling. ● Starting a new Open Source project, hk0weather, to create Open Weather Data. ● Use of OpenStack Object Storage.
  • 12. What is Open Data ?
  • 13. Open Data Three Laws of Open Government Data by David Eaves. 1. If it can't be spidered or indexed, it doesn't exist. 2. If it isn't available in open and machine readable format, it can't engage. 3. If a legal framework doesn't allow it to be repurposed, it doesn't empower. http://eaves.ca/2009/09/30/three-law-of-open-government-data/
  • 14. Open Data ● Tim Berners-Lee, the inventor of the Web. ● 5stardata.info - 5 star deployment scheme of Open Data. 1. make your stuff available on the Web (whatever format) under an open license. 2. make it available as structured data (e.g., Excel instead of image scan of a table) 3. use non-proprietary formats (e.g., CSV instead of Excel) 4. use URIs to denote things, so that people can point at your stuff. 5. link your data to other data to provide context.
  • 15. Legco Meeting Minutes and Voting Results
  • 16. Legco Meeting Minutes and Voting Results
  • 17. Weather Information in Hong Kong ● Hong Kong Observatory – Hourly Hong Kong Weather Report – Regional Weather in Hong Kong (10 min updates) – Weather Forecast and Weekly Weather Forecast – Typhoon Report and Forecast – Weather Maps and Images
  • 22. Weather at Data.One ● My Chinese blog post 'Progress of Open Government Data in Hong Kong' on 2013/1/17. ● Data.One released on 2011/3/31. ● Weather at Data.One provides 7 dataset URLs, returning RSS (XML) format (Eng/TChi/SChi). – One word: Useless. – The Data.One dataset (RSS) is completely different from HKO's own paid service (XML).
  • 23. Weather at Data.One ● Example - Current local weather report: ● Plain text report in RSS. ● Difference in how the report content is quoted: – Website: a pair of HTML tags, e.g. <PRE>....</PRE>. – Data.One: a pair of RSS description tags, <description>....</description>. ● Other weather data is missing, e.g. regional temperature updates every 12 mins.
  • 24. Weather at Data.One ● Weather at Data.One is a 'report', not 'data'. ● Weather RSS was already released by HKO before the launch of Data.One. ● Technically, JSON/XML formats are more easily read by computer programs.
  • 25. Open Data is important to citizens.
  • 26. Use of Open Source Software in web crawling
  • 27. Web Scraping ● a computer software technique of extracting information from websites. (Wikipedia) ● for business, hobbies, research purposes.
  • 28. Web Scraping ● Look for the right URLs to scrape. ● Look for the right content in webpages. ● Save data into a data store. ● When to run the web scraping program ?
  • 29. Use of Open Source Software in Web Crawling ● Use Open Source tools to collect useful and meaningful machine-readable data. ● No need to wait for the provider to release data in a machine-readable format.
  • 30. Open Source Tools ● Python programming language ● with Regular Expression library ● Scrapy web crawling framework
  • 31. Why python + scrapy ? ● python: my favourite programming language for the past few years. ● scrapy: web crawling framework written in Python.
  • 32. What is Scrapy ? ● An open source web scraping framework for Python. ● Scrapy is a fast high-level screen scraping and web crawling framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.
  • 33. Scrapy Features ● define the data you want to scrape ● write a spider to extract data ● Built-in: selecting and extracting data from HTML and XML ● Built-in: JSON, CSV, XML output ● Interactive shell console ● Built-in: web service, telnet console, logging ● Others
  • 34. Programme List of Paid TVs in 2004
  • 35. Programme List of Paid TVs in 2004 ● I wanted to know which channel a live football match was showing on. ● Paid TV web site = M$ + IIS + ASP + Flash ● Slow....... Very Slow...... Extremely Slow! ● Couldn't connect at any peak hours! ● Wrote my first web crawler in PHP in 2004.
  • 36. Public Transportation in 2006-2010 ● Kowloon Motor Bus (KMB) – No map view for a bus route. ● Public Transportation Enquiry System (PTES) – Extremely poor, ugly (or much worse) map UI on PTES.
  • 37. HK Observatory and Joint Typhoon Warning Center ● Is any typhoon coming to Hong Kong ? And when will it come ? ● No easy data exchange format. ● No RSS nor ATOM. ● We don't check websites every day.
  • 38. My Products ● WeatherHK ● TCTrack
  • 39. WeatherHK ● http://twitter.com/weatherhk ● hourly current weather report ● weather forecast report ● tropical signal warning
  • 40. WeatherHK ● Backend: Python + Scrapy + Database + Twitter + NNTP...... ● Frontend: Twitter + Newsgroup
  • 43. TCTrack ● http://sammy.hk/projects/tctrack/tctrack.php ● Plot TC current and forecast tracks over Google Maps. ● Source: – JTWC – HKO
  • 44. TCTrack ● http://sammy.hk/projects/tctrack/tctrack.php ● Probably the first tctrack map in HK using Google Maps. ● Use of GMap: TCTrack -> Weather Underground Hong Kong -> HKO
  • 46. Release information to citizens in a better presentation.
  • 47. Starting new Open Source project hk0weather to create Open Weather Data.
  • 48. Starting new Open Source projects to create Open Data ● Develop an open source project. ● Release data in a standard machine-readable data format.
  • 49. hk0weather ● https://github.com/sammyfung/hk0weather ● Open Source Hong Kong Weather Project. ● Converts HKO webpages to JSON data. ● python + scrapy ● 1st version: extracts temperature and humidity for 20+ weather stations from the current weather report and exports them in JSON format.
  • 50. hk0weather ● https://github.com/sammyfung/hk0weather ● $ virtualenv hk0weatherenv ● $ source hk0weatherenv/bin/activate ● $ pip install scrapy ● $ git clone https://github.com/sammyfung/hk0weather.git ● $ cd hk0weather ● $ scrapy crawl currwx -t json -o testresult
  • 51. hk0weather ● Python – import re ● Scrapy – web crawling framework written in Python. – HtmlXPathSelector. – built-in JSON, CSV, XML output.
  • 52. hk0weather [{"humidity": 80, "station": "hko", "temperture": 17, "time": 1360785720}, {"station": "kingspark", "temperture": 16, "time": 1360785720}, {"station": "wongchukhang", "temperture": 17, "time": 1360785720}, {"station": "takwuling", "temperture": 16, "time": 1360785720}, {"station": "laufaushan", "temperture": 15, "time": 1360785720}, {"station": "taipo", "temperture": 16, "time": 1360785720}, {"station": "shatin", "temperture": 17, "time": 1360785720}, {"station": "tuenmun", "temperture": 17, "time": 1360785720}, {"station": "tseungkwano", "temperture": 16, "time": 1360785720}, {"station": "saikung", "temperture": 16, "time": 1360785720}, {"station": "cheungchau", "temperture": 17, "time": 1360785720}, {"station": "cheungchau", "temperture": 17, "time": 1360785720}, {"station": "tsingyi", "temperture": 17, "time": 1360785720}, {"station": "shekkong", "temperture": 15, "time": 1360785720}, {"station": "tsuenwanhokoon", "temperture": 15, "time": 1360785720}, {"station": "tsuenwanshingmunvalley", "temperture": 17, "time": 1360785720}, {"station": "hongkongpark", "temperture": 17, "time": 1360785720}, {"station": "shaukeiwan", "temperture": 16, "time": 1360785720}, {"station": "kowlooncity", "temperture": 16, "time": 1360785720}, {"station": "happyvalley", "temperture": 18, "time": 1360785720}, {"station": "wongtaisin", "temperture": 17, "time": 1360785720}, {"station": "stanley", "temperture": 16, "time": 1360785720}, {"station": "kwuntong", "temperture": 15, "time": 1360785720}, {"station": "shamshuipo", "temperture": 17, "time": 1360785720}]
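The JSON array on slide 52 can be consumed with nothing more than Python's standard library. The following is a minimal sketch, assuming the crawl result is a JSON array shaped like the one above; only three records are reproduced, and the helper name `latest_by_station` is hypothetical rather than part of hk0weather.

```python
import json

# A shortened copy of the crawler output shown on the slide.
SAMPLE = '''[
 {"humidity": 80, "station": "hko", "temperture": 17, "time": 1360785720},
 {"station": "kingspark", "temperture": 16, "time": 1360785720},
 {"station": "happyvalley", "temperture": 18, "time": 1360785720}
]'''


def latest_by_station(records):
    """Return {station: record}, keeping the newest record per station."""
    latest = {}
    for rec in records:
        current = latest.get(rec["station"])
        if current is None or rec["time"] > current["time"]:
            latest[rec["station"]] = rec
    return latest


records = json.loads(SAMPLE)
by_station = latest_by_station(records)

# "temperture" is the field name used in the project's output, kept as-is.
warmest = max(by_station.values(), key=lambda r: r["temperture"])
print(warmest["station"], warmest["temperture"])

# Humidity is optional: only some stations report it.
print(by_station["kingspark"].get("humidity"))
```

Keying by station also collapses repeated entries, such as the station that appears twice in the slide's output.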
  • 53. Items.py class Hk0WeatherItem(Item): time = Field() station = Field() temperture = Field() humidity = Field()
  • 55. Currwx.py def parse(self, response): laststation = '' temperture = int() stations = [] hxs = HtmlXPathSelector(response) report = hxs.select('//div[@id="ming"]')
  • 56. libhk0 class hk0: stations = [ (u'天文台', 'hko'), (u'京士柏', 'kingspark'), (u'黃竹坑', 'wongchukhang'), (u'打鼓嶺', 'takwuling'), (u'流浮山', 'laufaushan'),
  • 57. libhk0 class hk0: def gettime(self, report): … def hk0current(self, report): …
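The slides elide the bodies of `gettime` and `hk0current`, so the following is not the project's implementation. It is only a sketch of the general technique the talk names (Python's `re` module plus a table mapping station names to identifiers), assuming a hypothetical plain-text report with one "name, temperature, 度" entry per line. The sample text, the pattern, and the function name `parse_temperatures` are all illustrative assumptions.

```python
import re

# Two station names from the slide's table; the rest are omitted here.
STATIONS = [
    (u'天文台', 'hko'),
    (u'京士柏', 'kingspark'),
]

# Hypothetical report text; the real HKO page layout may differ.
REPORT = u"""天文台 17 度
京士柏 16 度
未知站 20 度
"""

# One station per line: a name, whitespace, an integer, then the degree word.
LINE_RE = re.compile(r'^(?P<name>\S+)\s+(?P<temp>-?\d+)\s*度', re.MULTILINE)


def parse_temperatures(report):
    """Extract readings for known stations; unknown names are skipped."""
    names = dict(STATIONS)
    results = []
    for match in LINE_RE.finditer(report):
        station = names.get(match.group('name'))
        if station is not None:
            results.append({
                'station': station,
                # Field name spelled as in the project's JSON output.
                'temperture': int(match.group('temp')),
            })
    return results


for reading in parse_temperatures(REPORT):
    print(reading['station'], reading['temperture'])
```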
  • 59. Solution 1 – MySQL / SQLite ● Develop: – Web Crawler: Scrapy with MySQL/SQLite client – Backend: Handling query request with Django – Frontend: UI/UX design, query to backend ● Image Files ? ● Redundancy ?
  • 60. Infrastructure as a Service ● Public Cloud: – Rackspace, AWS..... ● Private Cloud: – OpenStack ● Object Services on IaaS: – Amazon S3 (Simple Storage Service) – Open Source: OpenStack Swift
  • 62. Application Software = Front-end + Back-end
  • 63. Web: Front-end = UI/UX at Web Browser Back-end = Handling JSON, REST...... Mobile: Front-end = UI/UX at Mobile App Back-end = Handling JSON, REST......
  • 64. Solution 1 – MySQL / SQLite ● Develop: – Web Crawler: Scrapy with MySQL/SQLite client – Backend: Handling query request with Django – Frontend: UI/UX design, query to backend ● Image Files ? ● Redundancy ?
  • 65. Solution 2 – OpenStack Swift ● Develop: – Web Crawler: Scrapy with Swift client – Backend: Handling query request with Swift – Frontend: UI/UX design, query to backend ● Image Files ? Redundancy ? – Both are solved: OpenStack or the provider provides Object Services.
  • 66. Swift – OpenStack Object Storage ● Object types range from standard data (int or string) to image / video files. ● Supports S3 API. ● REST API (Get / Put / Delete) to access stored data through HTTP. ● Easily add capacity, unlike RAID resize. ● Data Replication ● No central database, RAID not required ● Memcached (Fast Data Caching)
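To make the Get / Put / Delete point concrete, here is a sketch of how the three object requests could be built with Python's standard library. It assumes Swift's usual object path layout (storage URL, then container, then object name) and token header (`X-Auth-Token`). The storage URL, token, container and object names below are placeholders, and the requests are only constructed, never sent.

```python
import json
from urllib.parse import quote
from urllib.request import Request

# Placeholder values: a real storage URL and token come from authentication.
STORAGE_URL = "https://swift.example.com/v1/AUTH_demo"
TOKEN = "example-token"


def object_request(method, container, name, body=None, content_type=None):
    """Build (but do not send) an HTTP request for one Swift object."""
    url = "{}/{}/{}".format(STORAGE_URL, quote(container), quote(name))
    headers = {"X-Auth-Token": TOKEN}
    if content_type:
        headers["Content-Type"] = content_type
    return Request(url, data=body, headers=headers, method=method)


payload = json.dumps(
    {"station": "hko", "temperture": 17, "time": 1360785720}
).encode("utf-8")

put = object_request("PUT", "weather", "currwx/hko.json", payload,
                     "application/json")
get = object_request("GET", "weather", "currwx/hko.json")
delete = object_request("DELETE", "weather", "currwx/hko.json")

for req in (put, get, delete):
    print(req.get_method(), req.full_url)
```

Passing any of these to `urllib.request.urlopen` would perform the call against a real endpoint; a client library such as python-swiftclient wraps the same operations.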
  • 67. Some Swift Clients in Python ● OpenStack python-swiftclient ● Ceph: ceph object gateway (swift-compatible) ● Rackspace pyrax (most OpenStack works) ● Others
  • 68. Solution 2 – OpenStack Swift ● Web Crawler: Storing Data in Scrapy – Connection to Swift, Account Authentication. – Create Objects (Data) in a Container. – Store data as json object. – Store image files as image object. ● Backend: Handling query request with Swift – Connection to Swift, Account Authentication. – Retrieve Objects from Containers. – Return Object URL.
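The store-and-retrieve flow on this slide can be sketched as follows. The functions take any connection object exposing `put_container`, `put_object` and `get_object`, which are method names python-swiftclient's `Connection` class provides. So that the example runs without a Swift cluster, an in-memory `FakeConnection` stands in for the real client; that class, the container name and the object naming are assumptions made for illustration.

```python
import json


class FakeConnection:
    """In-memory stand-in for the three client methods used below."""

    def __init__(self):
        self.containers = {}

    def put_container(self, container):
        self.containers.setdefault(container, {})

    def put_object(self, container, obj, contents, content_type=None):
        self.containers[container][obj] = (content_type, contents)

    def get_object(self, container, obj):
        content_type, contents = self.containers[container][obj]
        return {"content-type": content_type}, contents


def store_reading(conn, container, reading):
    """Crawler side: store one reading as a JSON object, return its name."""
    name = "{}-{}.json".format(reading["station"], reading["time"])
    conn.put_container(container)
    conn.put_object(container, name, contents=json.dumps(reading),
                    content_type="application/json")
    return name


def load_reading(conn, container, name):
    """Backend side: fetch an object and decode it back into a dict."""
    _headers, body = conn.get_object(container, name)
    return json.loads(body)


conn = FakeConnection()
reading = {"station": "hko", "temperture": 17, "time": 1360785720}
name = store_reading(conn, "weather", reading)
print(name, load_reading(conn, "weather", name))
```

With a real deployment, `conn` would instead be an authenticated client connection, and image files would be stored the same way with an image content type.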
  • 69. Solution 2 – OpenStack Swift ● Advantages: – Replace MySQL with Swift object storage for part / all of data queries. – OpenStack Public Cloud ● No database maintenance needed; it is handled by the public cloud provider. – OpenStack Private Cloud ● Use own server farm without configuring a replicated database.
  • 70. Solution 2 – OpenStack Swift ● Disadvantages: – Difficult to run complicated queries; data should be stored in a well-defined structure in Swift. ● Define syntax of filename for json data and image files. – OpenStack Public Cloud ● Learn how to access data on Swift. – OpenStack Private Cloud ● Install, configure and maintain OpenStack with Swift.
  • 72. Local OpenStack Workshop ? ● Thanks to HK OpenStack Users for introducing their solutions and deployments. ● To educate and extend the use of OpenStack in HK, should we organize a local hands-on OpenStack workshop where local app developers and companies can learn to use OpenStack ?
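One way to work around the lack of complex queries is a predictable object naming scheme, so that a simple prefix lookup replaces a date or type filter. The layout below (`kind/YYYY/MM/DD/station-epoch.ext`) is a hypothetical convention, not one taken from hk0weather. It assumes the `time` field is a Unix timestamp interpreted in UTC. The prefix filter runs locally to keep the sketch self-contained; Swift container listings accept a similar prefix parameter.

```python
from datetime import datetime, timezone


def object_name(kind, station, epoch, ext="json"):
    """Build a name such as 'currwx/2013/02/13/hko-1360785720.json'."""
    day = datetime.fromtimestamp(epoch, tz=timezone.utc).strftime("%Y/%m/%d")
    return "{}/{}/{}-{}.{}".format(kind, day, station, epoch, ext)


def filter_by_prefix(names, prefix):
    """Local stand-in for a prefix listing of a container."""
    return sorted(n for n in names if n.startswith(prefix))


names = [
    object_name("currwx", "hko", 1360785720),
    object_name("currwx", "kingspark", 1360785720),
    # One day later (86400 seconds) for the same station.
    object_name("currwx", "hko", 1360785720 + 86400),
    # An image file follows the same convention with another extension.
    object_name("radar", "hko", 1360785720, ext="png"),
]

# All current-weather objects recorded on 2013-02-13 (UTC).
for name in filter_by_prefix(names, "currwx/2013/02/13/"):
    print(name)
```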