Parse Weather Data from Any Site with Ruby and Hpricot

The document discusses using the Ruby programming language and tools like Hpricot and XPath to parse HTML documents, highlighting how Hpricot can be used to easily extract information like country names and URLs from a weather website that has a poorly structured table layout. Steps provided include inspecting the site with Firebug to get element XPaths and then parsing the HTML using Hpricot to retrieve the desired data.

Linux Creative Group

Hpricot – Dig The Impossible With Ruby

By: Subhransu Behera
arya.subhransu@gmail.com

Ruby !!! What's Special?

So … Let's See!
• Dynamic
• Easy to Learn
• Easy to maintain and grow
• Convenient Shortcuts
  Ex: str = "Linux Creative Group"
      str_join = str.split(" ").join("+")   # => "Linux+Creative+Group"
• Transparent, code faster
• Fewer Syntax Errors, Fewer Bugs
• It's Fun

Ruby Gems
• Package Management System for Ruby Applications and Libraries
• Resolves Dependencies
• Provides a Central Repository of Software
• One Command Rules:
  - gem install <gem_name>
• Can Have Your Own Local Gem Server
  - gem install <gem_name> --source http://<gem_server_ip>:<port>
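Because Hpricot arrives as a gem, a script that depends on it can check for it before calling `require` and print an install hint instead of failing. A minimal sketch, assuming RubyGems is loaded (it ships with Ruby); `gem_installed?` is a hypothetical helper name built on `Gem::Specification.find_all_by_name`:

```ruby
require "rubygems"

# Hypothetical helper: true if at least one version of the named gem
# is installed locally.
def gem_installed?(name)
  !Gem::Specification.find_all_by_name(name).empty?
end

# Print the one-command fix rather than crashing with a LoadError.
unless gem_installed?("hpricot")
  puts "Hpricot is missing; install it with: gem install hpricot"
end
```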

Hpricot Makes It Easy to Parse

Hpricot
• Pull information from virtually any website.
• Search by element ID, tags, or CSS selectors.
• Parse HTML, including broken HTML.
• Update HTML.
• Use this data anywhere and any way you want!
• Parse by XPath to reach an element directly.
• Let's see how it works …
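A sketch of the same kinds of lookups (by ID, by tag, by XPath) plus an in-place update. Hpricot is no longer maintained and its C extension may not build on current Rubies, so this uses REXML, which is distributed with Ruby, as a stand-in. Two differences matter: REXML only accepts well-formed markup, and it has no CSS selectors. The HTML fragment is hypothetical. In Hpricot the lookups would instead go through calls such as `doc.at("#main")` and `doc/"a"`.

```ruby
require "rexml/document"

# Hypothetical, well-formed fragment. REXML rejects broken markup,
# unlike Hpricot, so this only illustrates the lookups themselves.
html = <<~HTML
  <html>
    <body>
      <div id="header"><h1>Weather Links</h1></div>
      <div id="main">
        <a href="a.htm">Alpha</a>
        <a href="b.htm">Beta</a>
      </div>
    </body>
  </html>
HTML

doc = REXML::Document.new(html)

# Search by element ID, expressed as an XPath attribute predicate.
main = REXML::XPath.first(doc, "//*[@id='main']")

# Search by tag name, scoped to the element found above.
links = REXML::XPath.match(main, ".//a")
names = links.map(&:text)
hrefs = links.map { |a| a.attributes["href"] }

# Update the document in place, then serialize it back to a string.
heading = REXML::XPath.first(doc, "//h1")
heading.text = "Weather Links (updated)"
links.first.add_attribute("title", "first link")

updated = +""
doc.write(updated)

puts names.zip(hrefs).map { |n, h| "#{n} -> #{h}" }
```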

Let's Parse a Badly Designed Site!!
• http://www.worldweather.org
• It's a site that provides weather information for different locations across the globe.
• Its main page uses a badly nested table structure!!
• An ideal web developer would have put the data in divs with meaningful IDs.
• But let's face the truth and parse the country names and their URLs.

Easy Steps – 1. Open the Site

Easy Steps – 2. Inspect with Firebug

Easy Steps – 3. Copy the XPath of the Element
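One pitfall worth checking at this step (a general observation, not something the slides cover): browsers add `<tbody>` elements to tables when they build the DOM, so an XPath copied from Firebug can include `tbody` steps that are not in the raw HTML a Ruby script downloads, and the copied path may then match nothing. A minimal sketch that strips those steps; `strip_tbody` and the sample path are hypothetical:

```ruby
# Hypothetical helper: drop "tbody" steps, with or without an index such as
# tbody[1], that a browser inserted into the DOM but that may be missing from
# the raw HTML fetched by a script.
def strip_tbody(xpath)
  xpath.gsub(%r{/tbody(\[\d+\])?(?=/|\z)}, "")
end

# Example of a path as a browser might copy it (hypothetical).
copied = "/html/body/table/tbody/tr[2]/td/table/tbody/tr/td/a"
puts strip_tbody(copied) # prints /html/body/table/tr[2]/td/table/tr/td/a
```

Whether the tbody steps need removing depends on the actual page source, so it is worth comparing the copied path against the HTML the script really receives.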

Easy Steps – 4. Parse by XPath Using Hpricot
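A sketch of this step under stated assumptions: the HTML below is a small, hypothetical stand-in for a nested-table layout (it is not the real markup of www.worldweather.org), and REXML from Ruby's standard distribution stands in for Hpricot so the example runs without extra gems. It shows why a positional path is used: a generic `//a` query also returns unrelated links, while the absolute path reaches only the country cells. With the hpricot gem installed, the same path would be passed to `search` on a document created with `Hpricot(html)`; positional steps are worth verifying against the live page.

```ruby
require "rexml/document"

# Hypothetical nested-table page; not the real markup of the site.
html = <<~HTML
  <html>
    <body>
      <table>
        <tr><td><a href="about.htm">About</a></td></tr>
        <tr>
          <td>
            <table>
              <tr>
                <td><a href="country/france.htm">France</a></td>
                <td><a href="country/india.htm">India</a></td>
              </tr>
            </table>
          </td>
        </tr>
      </table>
    </body>
  </html>
HTML

doc = REXML::Document.new(html)

# A generic tag search returns every link, navigation included.
all_links = REXML::XPath.match(doc, "//a").map(&:text)

# An absolute, positional path of the kind copied from the browser:
# second row of the outer table, then every cell of the inner table.
xpath = "/html/body/table/tr[2]/td/table/tr/td/a"
countries = REXML::XPath.match(doc, xpath).map do |a|
  [a.text, a.attributes["href"]]
end

countries.each { |name, href| puts "#{name}: #{href}" }
```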

Use Some Logic & You'll Get
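The slide does not spell out the "logic", so what follows is only one plausible post-processing pass, sketched under assumptions: the scraped `[name, href]` pairs and their relative link format are hypothetical, and the base URL is the site address from the slides. It trims names, drops cells without text, skips duplicates, and resolves relative links with `URI.join` from Ruby's standard library:

```ruby
require "uri"

# Base URL from the slides; the pairs below are hypothetical scrape results.
BASE_URL = "http://www.worldweather.org/"

pairs = [
  ["France", "country/france.htm"],
  ["  India ", "country/india.htm"],
  ["", "spacer.htm"],               # empty cell text: not a country
  ["France", "country/france.htm"]  # duplicate from a repeated cell
]

# Trim names, drop blanks, resolve relative links, keep the first occurrence.
countries = pairs.each_with_object({}) do |(name, href), acc|
  name = name.to_s.strip
  next if name.empty? || acc.key?(name)
  acc[name] = URI.join(BASE_URL, href).to_s
end

countries.each { |name, url| puts "#{name}: #{url}" }
```

A Hash keyed by country name keeps insertion order in Ruby, so the output follows the page order while duplicates collapse to one entry.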

Just Try It Out

Questions?

References
• Ruby Programming Language: http://www.ruby-lang.org/en/
• Hpricot: http://code.whytheluckystiff.net/hpricot/
• XPath: http://en.wikipedia.org/wiki/XPath
• Firebug: http://getfirebug.com/

Thanks
