SlideShare ist ein Scribd-Unternehmen logo
1 von 14
Simple web-scraping with Mechanize and Nokogiri Nov 8 th  2011 Muriel Salvan Open Source Lead developer and architect X-Aeon Solutions http://x-aeon.com
The need ,[object Object],[object Object]
Parse  HTML pages (DOM) =>  Nokogiri
Installation ,[object Object],[object Object]
! Version 1.0.0 can be more stable than 2.x.x for some complex queries (" gem install mechanize -v 1.0.0 " to enforce it).
Basic example require   'mechanize' agent = Mechanize. new page = agent. get ( 'http://rivierarb.fr' ) element = page. root . css ( 'h1.logo' ) . first element . content =>   "Riviera.rb"
Main usage ,[object Object]
Use the agent to  perform HTTP(S) requests  (get, post). Each request gives a Nokogiri page.
Parse the page  using CSS selectors, XPath, DOM iterators.
Fill and post forms  using intuitive helpers.
Common requests page = agent. get ( 'http://rivierarb.fr' ) page2 = page. links_with ( :text   =>   'Green King' ) . first . click page3 = agent. back agent. user_agent  =  'My user agent'
Common parsing Selectors page. root . css ( 'body div.myclass' ) . each   {   | element |  …  } page. root . xpath ( '//h3/a[@class="l"]' ) . eac h   {   | element |  …  }
Common parsing Elements < div > < a   href = &quot; http://www.google.com &quot; > Click here < img   src = &quot; http://www.google.com/favicon.ico &quot; / > < / a > < / div > element [ 'href' ] =>   &quot;http: // www.google.com&quot; element. content =>   &quot;    Click here       &quot; element. children . second . name =>   &quot;img&quot; element. parent . name =>   &quot;div&quot; element
Filling and submitting forms Basic example Google search form =  agent. get ( 'http://www.google.com' ) . forms . first form. q  =  'Rivierarb' results_page = form. submit

Weitere ähnliche Inhalte

Was ist angesagt?

A Universal Automation Framework based on BDD Cucumber and Ruby on Rails - Ph...
A Universal Automation Framework based on BDD Cucumber and Ruby on Rails - Ph...A Universal Automation Framework based on BDD Cucumber and Ruby on Rails - Ph...
A Universal Automation Framework based on BDD Cucumber and Ruby on Rails - Ph...Ho Chi Minh City Software Testing Club
 
Haml in 5 minutes
Haml in 5 minutesHaml in 5 minutes
Haml in 5 minutescameronbot
 
Evolution of API With Blogging
Evolution of API With BloggingEvolution of API With Blogging
Evolution of API With BloggingTakatsugu Shigeta
 
Enabling agile devliery through enabling BDD in PHP projects
Enabling agile devliery through enabling BDD in PHP projectsEnabling agile devliery through enabling BDD in PHP projects
Enabling agile devliery through enabling BDD in PHP projectsKonstantin Kudryashov
 
WordPress as a Content Management System
WordPress as a Content Management SystemWordPress as a Content Management System
WordPress as a Content Management SystemValent Mustamin
 
Add row in asp.net Gridview on button click using C# and vb.net
Add row in asp.net Gridview on button click using C# and vb.netAdd row in asp.net Gridview on button click using C# and vb.net
Add row in asp.net Gridview on button click using C# and vb.netVijay Saklani
 
Dicas de palestra
Dicas de palestraDicas de palestra
Dicas de palestraFabio Akita
 
How to Create simple One Page site
How to Create simple One Page siteHow to Create simple One Page site
How to Create simple One Page siteMoneer kamal
 
Gadgets Intro (Plus Mapplets)
Gadgets Intro (Plus Mapplets)Gadgets Intro (Plus Mapplets)
Gadgets Intro (Plus Mapplets)Pamela Fox
 
Rugalytics | Ruby Manor Nov 2008
Rugalytics | Ruby Manor Nov 2008Rugalytics | Ruby Manor Nov 2008
Rugalytics | Ruby Manor Nov 2008Rob
 
シックス・アパート・フレームワーク
シックス・アパート・フレームワークシックス・アパート・フレームワーク
シックス・アパート・フレームワークTakatsugu Shigeta
 
Components are the Future of the Web: It’s Going To Be Okay
Components are the Future of the Web: It’s Going To Be OkayComponents are the Future of the Web: It’s Going To Be Okay
Components are the Future of the Web: It’s Going To Be OkayFITC
 
Building high-fidelity interactive prototypes with jQuery
Building high-fidelity interactive prototypes with jQueryBuilding high-fidelity interactive prototypes with jQuery
Building high-fidelity interactive prototypes with jQueryDavid Park
 
Building a Blogging System -- Rapidly using Alpha Five v10 with Codeless AJAX...
Building a Blogging System -- Rapidly using Alpha Five v10 with Codeless AJAX...Building a Blogging System -- Rapidly using Alpha Five v10 with Codeless AJAX...
Building a Blogging System -- Rapidly using Alpha Five v10 with Codeless AJAX...Richard Rabins
 
HTML5 Forms - KISS time - Fronteers
HTML5 Forms - KISS time - FronteersHTML5 Forms - KISS time - Fronteers
HTML5 Forms - KISS time - FronteersRobert Nyman
 

Was ist angesagt? (20)

Changing Template Engine
Changing Template EngineChanging Template Engine
Changing Template Engine
 
A Universal Automation Framework based on BDD Cucumber and Ruby on Rails - Ph...
A Universal Automation Framework based on BDD Cucumber and Ruby on Rails - Ph...A Universal Automation Framework based on BDD Cucumber and Ruby on Rails - Ph...
A Universal Automation Framework based on BDD Cucumber and Ruby on Rails - Ph...
 
Haml in 5 minutes
Haml in 5 minutesHaml in 5 minutes
Haml in 5 minutes
 
Evolution of API With Blogging
Evolution of API With BloggingEvolution of API With Blogging
Evolution of API With Blogging
 
Enabling agile devliery through enabling BDD in PHP projects
Enabling agile devliery through enabling BDD in PHP projectsEnabling agile devliery through enabling BDD in PHP projects
Enabling agile devliery through enabling BDD in PHP projects
 
WordPress as a Content Management System
WordPress as a Content Management SystemWordPress as a Content Management System
WordPress as a Content Management System
 
Capybara
CapybaraCapybara
Capybara
 
Add row in asp.net Gridview on button click using C# and vb.net
Add row in asp.net Gridview on button click using C# and vb.netAdd row in asp.net Gridview on button click using C# and vb.net
Add row in asp.net Gridview on button click using C# and vb.net
 
Ajax
AjaxAjax
Ajax
 
Dicas de palestra
Dicas de palestraDicas de palestra
Dicas de palestra
 
How to Create simple One Page site
How to Create simple One Page siteHow to Create simple One Page site
How to Create simple One Page site
 
Gadgets Intro (Plus Mapplets)
Gadgets Intro (Plus Mapplets)Gadgets Intro (Plus Mapplets)
Gadgets Intro (Plus Mapplets)
 
Rugalytics | Ruby Manor Nov 2008
Rugalytics | Ruby Manor Nov 2008Rugalytics | Ruby Manor Nov 2008
Rugalytics | Ruby Manor Nov 2008
 
BDD with cucumber
BDD with cucumberBDD with cucumber
BDD with cucumber
 
シックス・アパート・フレームワーク
シックス・アパート・フレームワークシックス・アパート・フレームワーク
シックス・アパート・フレームワーク
 
Components are the Future of the Web: It’s Going To Be Okay
Components are the Future of the Web: It’s Going To Be OkayComponents are the Future of the Web: It’s Going To Be Okay
Components are the Future of the Web: It’s Going To Be Okay
 
Building high-fidelity interactive prototypes with jQuery
Building high-fidelity interactive prototypes with jQueryBuilding high-fidelity interactive prototypes with jQuery
Building high-fidelity interactive prototypes with jQuery
 
Building a Blogging System -- Rapidly using Alpha Five v10 with Codeless AJAX...
Building a Blogging System -- Rapidly using Alpha Five v10 with Codeless AJAX...Building a Blogging System -- Rapidly using Alpha Five v10 with Codeless AJAX...
Building a Blogging System -- Rapidly using Alpha Five v10 with Codeless AJAX...
 
Symfony2
Symfony2Symfony2
Symfony2
 
HTML5 Forms - KISS time - Fronteers
HTML5 Forms - KISS time - FronteersHTML5 Forms - KISS time - Fronteers
HTML5 Forms - KISS time - Fronteers
 

Ähnlich wie Mechanize at the Ruby Drink-up of Sophia, November 2011

jQuery for Sharepoint Dev
jQuery for Sharepoint DevjQuery for Sharepoint Dev
jQuery for Sharepoint DevZeddy Iskandar
 
10 Things You're Not Doing [IBM Lotus Notes Domino Application Development]
10 Things You're Not Doing [IBM Lotus Notes Domino Application Development]10 Things You're Not Doing [IBM Lotus Notes Domino Application Development]
10 Things You're Not Doing [IBM Lotus Notes Domino Application Development]Chris Toohey
 
IBM Lotus Notes Domino XPages and XPages for Mobile
IBM Lotus Notes Domino XPages and XPages for MobileIBM Lotus Notes Domino XPages and XPages for Mobile
IBM Lotus Notes Domino XPages and XPages for MobileChris Toohey
 
Getting the Most Out of OpenSocial Gadgets
Getting the Most Out of OpenSocial GadgetsGetting the Most Out of OpenSocial Gadgets
Getting the Most Out of OpenSocial GadgetsAtlassian
 
Component and Event-Driven Architectures in PHP
Component and Event-Driven Architectures in PHPComponent and Event-Driven Architectures in PHP
Component and Event-Driven Architectures in PHPStephan Schmidt
 
Building Complex GUI Apps The Right Way. With Ample SDK - SWDC2010
Building Complex GUI Apps The Right Way. With Ample SDK - SWDC2010Building Complex GUI Apps The Right Way. With Ample SDK - SWDC2010
Building Complex GUI Apps The Right Way. With Ample SDK - SWDC2010Sergey Ilinsky
 
Javascript: Ajax & DOM Manipulation v1.2
Javascript: Ajax & DOM Manipulation v1.2Javascript: Ajax & DOM Manipulation v1.2
Javascript: Ajax & DOM Manipulation v1.2borkweb
 
Yahoo Mobile Widgets
Yahoo Mobile WidgetsYahoo Mobile Widgets
Yahoo Mobile WidgetsJose Palazon
 
Flex For Flash Developers Ff 2006 Final
Flex For Flash Developers Ff 2006 FinalFlex For Flash Developers Ff 2006 Final
Flex For Flash Developers Ff 2006 Finalematrix
 
Flash Templates- Joomla!Days NL 2009 #jd09nl
Flash Templates- Joomla!Days NL 2009 #jd09nlFlash Templates- Joomla!Days NL 2009 #jd09nl
Flash Templates- Joomla!Days NL 2009 #jd09nlJoomla!Days Netherlands
 
Flash templates for Joomla!
Flash templates for Joomla!Flash templates for Joomla!
Flash templates for Joomla!Herman Peeren
 
Mashups as Collection of Widgets
Mashups as Collection of WidgetsMashups as Collection of Widgets
Mashups as Collection of Widgetsgiurca
 
Lecture 6 - Comm Lab: Web @ ITP
Lecture 6 - Comm Lab: Web @ ITPLecture 6 - Comm Lab: Web @ ITP
Lecture 6 - Comm Lab: Web @ ITPyucefmerhi
 
Master pages ppt
Master pages pptMaster pages ppt
Master pages pptIblesoft
 
HTML5 Overview
HTML5 OverviewHTML5 Overview
HTML5 Overviewreybango
 

Ähnlich wie Mechanize at the Ruby Drink-up of Sophia, November 2011 (20)

jQuery for Sharepoint Dev
jQuery for Sharepoint DevjQuery for Sharepoint Dev
jQuery for Sharepoint Dev
 
10 Things You're Not Doing [IBM Lotus Notes Domino Application Development]
10 Things You're Not Doing [IBM Lotus Notes Domino Application Development]10 Things You're Not Doing [IBM Lotus Notes Domino Application Development]
10 Things You're Not Doing [IBM Lotus Notes Domino Application Development]
 
IBM Lotus Notes Domino XPages and XPages for Mobile
IBM Lotus Notes Domino XPages and XPages for MobileIBM Lotus Notes Domino XPages and XPages for Mobile
IBM Lotus Notes Domino XPages and XPages for Mobile
 
ASP_NET Features
ASP_NET FeaturesASP_NET Features
ASP_NET Features
 
SlideShare Instant
SlideShare InstantSlideShare Instant
SlideShare Instant
 
SlideShare Instant
SlideShare InstantSlideShare Instant
SlideShare Instant
 
Getting the Most Out of OpenSocial Gadgets
Getting the Most Out of OpenSocial GadgetsGetting the Most Out of OpenSocial Gadgets
Getting the Most Out of OpenSocial Gadgets
 
Vb.Net Web Forms
Vb.Net  Web FormsVb.Net  Web Forms
Vb.Net Web Forms
 
Component and Event-Driven Architectures in PHP
Component and Event-Driven Architectures in PHPComponent and Event-Driven Architectures in PHP
Component and Event-Driven Architectures in PHP
 
Building Complex GUI Apps The Right Way. With Ample SDK - SWDC2010
Building Complex GUI Apps The Right Way. With Ample SDK - SWDC2010Building Complex GUI Apps The Right Way. With Ample SDK - SWDC2010
Building Complex GUI Apps The Right Way. With Ample SDK - SWDC2010
 
Javascript: Ajax & DOM Manipulation v1.2
Javascript: Ajax & DOM Manipulation v1.2Javascript: Ajax & DOM Manipulation v1.2
Javascript: Ajax & DOM Manipulation v1.2
 
BluePrint Mobile Framework
BluePrint Mobile FrameworkBluePrint Mobile Framework
BluePrint Mobile Framework
 
Yahoo Mobile Widgets
Yahoo Mobile WidgetsYahoo Mobile Widgets
Yahoo Mobile Widgets
 
Flex For Flash Developers Ff 2006 Final
Flex For Flash Developers Ff 2006 FinalFlex For Flash Developers Ff 2006 Final
Flex For Flash Developers Ff 2006 Final
 
Flash Templates- Joomla!Days NL 2009 #jd09nl
Flash Templates- Joomla!Days NL 2009 #jd09nlFlash Templates- Joomla!Days NL 2009 #jd09nl
Flash Templates- Joomla!Days NL 2009 #jd09nl
 
Flash templates for Joomla!
Flash templates for Joomla!Flash templates for Joomla!
Flash templates for Joomla!
 
Mashups as Collection of Widgets
Mashups as Collection of WidgetsMashups as Collection of Widgets
Mashups as Collection of Widgets
 
Lecture 6 - Comm Lab: Web @ ITP
Lecture 6 - Comm Lab: Web @ ITPLecture 6 - Comm Lab: Web @ ITP
Lecture 6 - Comm Lab: Web @ ITP
 
Master pages ppt
Master pages pptMaster pages ppt
Master pages ppt
 
HTML5 Overview
HTML5 OverviewHTML5 Overview
HTML5 Overview
 

Mehr von rivierarb

Ruby 2.0 at the Ruby drink-up of Sophia, February 2013
Ruby 2.0 at the Ruby drink-up of Sophia, February 2013Ruby 2.0 at the Ruby drink-up of Sophia, February 2013
Ruby 2.0 at the Ruby drink-up of Sophia, February 2013rivierarb
 
Ruby object model at the Ruby drink-up of Sophia, January 2013
Ruby object model at the Ruby drink-up of Sophia, January 2013Ruby object model at the Ruby drink-up of Sophia, January 2013
Ruby object model at the Ruby drink-up of Sophia, January 2013rivierarb
 
Ruby and Twitter at the Ruby drink-up of Sophia, January 2013
Ruby and Twitter at the Ruby drink-up of Sophia, January 2013Ruby and Twitter at the Ruby drink-up of Sophia, January 2013
Ruby and Twitter at the Ruby drink-up of Sophia, January 2013rivierarb
 
PoBot at the Ruby drink-up of Sophia, July 2012
PoBot at the Ruby drink-up of Sophia, July 2012PoBot at the Ruby drink-up of Sophia, July 2012
PoBot at the Ruby drink-up of Sophia, July 2012rivierarb
 
Ruby C extensions at the Ruby drink-up of Sophia, April 2012
Ruby C extensions at the Ruby drink-up of Sophia, April 2012Ruby C extensions at the Ruby drink-up of Sophia, April 2012
Ruby C extensions at the Ruby drink-up of Sophia, April 2012rivierarb
 
Pry at the Ruby Drink-up of Sophia, February 2012
Pry at the Ruby Drink-up of Sophia, February 2012Pry at the Ruby Drink-up of Sophia, February 2012
Pry at the Ruby Drink-up of Sophia, February 2012rivierarb
 
Piloting processes through std IO at the Ruby Drink-up of Sophia, January 2012
Piloting processes through std IO at the Ruby Drink-up of Sophia, January 2012Piloting processes through std IO at the Ruby Drink-up of Sophia, January 2012
Piloting processes through std IO at the Ruby Drink-up of Sophia, January 2012rivierarb
 
DRb at the Ruby Drink-up of Sophia, December 2011
DRb at the Ruby Drink-up of Sophia, December 2011DRb at the Ruby Drink-up of Sophia, December 2011
DRb at the Ruby Drink-up of Sophia, December 2011rivierarb
 
FPM at the Ruby Drink-up of Sophia, September 2011
FPM at the Ruby Drink-up of Sophia, September 2011FPM at the Ruby Drink-up of Sophia, September 2011
FPM at the Ruby Drink-up of Sophia, September 2011rivierarb
 

Mehr von rivierarb (9)

Ruby 2.0 at the Ruby drink-up of Sophia, February 2013
Ruby 2.0 at the Ruby drink-up of Sophia, February 2013Ruby 2.0 at the Ruby drink-up of Sophia, February 2013
Ruby 2.0 at the Ruby drink-up of Sophia, February 2013
 
Ruby object model at the Ruby drink-up of Sophia, January 2013
Ruby object model at the Ruby drink-up of Sophia, January 2013Ruby object model at the Ruby drink-up of Sophia, January 2013
Ruby object model at the Ruby drink-up of Sophia, January 2013
 
Ruby and Twitter at the Ruby drink-up of Sophia, January 2013
Ruby and Twitter at the Ruby drink-up of Sophia, January 2013Ruby and Twitter at the Ruby drink-up of Sophia, January 2013
Ruby and Twitter at the Ruby drink-up of Sophia, January 2013
 
PoBot at the Ruby drink-up of Sophia, July 2012
PoBot at the Ruby drink-up of Sophia, July 2012PoBot at the Ruby drink-up of Sophia, July 2012
PoBot at the Ruby drink-up of Sophia, July 2012
 
Ruby C extensions at the Ruby drink-up of Sophia, April 2012
Ruby C extensions at the Ruby drink-up of Sophia, April 2012Ruby C extensions at the Ruby drink-up of Sophia, April 2012
Ruby C extensions at the Ruby drink-up of Sophia, April 2012
 
Pry at the Ruby Drink-up of Sophia, February 2012
Pry at the Ruby Drink-up of Sophia, February 2012Pry at the Ruby Drink-up of Sophia, February 2012
Pry at the Ruby Drink-up of Sophia, February 2012
 
Piloting processes through std IO at the Ruby Drink-up of Sophia, January 2012
Piloting processes through std IO at the Ruby Drink-up of Sophia, January 2012Piloting processes through std IO at the Ruby Drink-up of Sophia, January 2012
Piloting processes through std IO at the Ruby Drink-up of Sophia, January 2012
 
DRb at the Ruby Drink-up of Sophia, December 2011
DRb at the Ruby Drink-up of Sophia, December 2011DRb at the Ruby Drink-up of Sophia, December 2011
DRb at the Ruby Drink-up of Sophia, December 2011
 
FPM at the Ruby Drink-up of Sophia, September 2011
FPM at the Ruby Drink-up of Sophia, September 2011FPM at the Ruby Drink-up of Sophia, September 2011
FPM at the Ruby Drink-up of Sophia, September 2011
 

Kürzlich hochgeladen

The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 

Kürzlich hochgeladen (20)

The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 

Mechanize at the Ruby Drink-up of Sophia, November 2011

  • 1. Simple web-scraping with Mechanize and Nokogiri Nov 8 th 2011 Muriel Salvan Open Source Lead developer and architect X-Aeon Solutions http://x-aeon.com
  • 2.
  • 3. Parse HTML pages (DOM) => Nokogiri
  • 4.
  • 5. ! Version 1.0.0 can be more stable than 2.x.x for some complex queries (&quot; gem install mechanize -v 1.0.0 &quot; to enforce it).
  • 6. Basic example require 'mechanize' agent = Mechanize. new page = agent. get ( 'http://rivierarb.fr' ) element = page. root . css ( 'h1.logo' ) . first element . content => &quot;Riviera.rb&quot;
  • 7.
  • 8. Use the agent to perform HTTP(S) requests (get, post). Each request gives a Nokogiri page.
  • 9. Parse the page using CSS selectors, XPath, DOM iterators.
  • 10. Fill and post forms using intuitive helpers.
  • 11. Common requests page = agent. get ( 'http://rivierarb.fr' ) page2 = page. links_with ( :text => 'Green King' ) . first . click page3 = agent. back agent. user_agent = 'My user agent'
  • 12. Common parsing Selectors page. root . css ( 'body div.myclass' ) . each { | element | … } page. root . xpath ( '//h3/a[@class=&quot;l&quot;]' ) . eac h { | element | … }
  • 13. Common parsing Elements < div > < a href = &quot; http://www.google.com &quot; > Click here < img src = &quot; http://www.google.com/favicon.ico &quot; / > < / a > < / div > element [ 'href' ] => &quot;http: // www.google.com&quot; element. content => &quot; Click here &quot; element. children . second . name => &quot;img&quot; element. parent . name => &quot;div&quot; element
  • 14. Filling and submitting forms Basic example Google search form = agent. get ( 'http://www.google.com' ) . forms . first form. q = 'Rivierarb' results_page = form. submit
  • 15. Filling and submitting forms Fields When your HTML form has < input … name = &quot;myfield&quot; >...< / input > you can write form. myfield = 'The field value' form. field_with ( :name => 'myfield' ) . value = 'The field value' form. checkboxfield = '1' form. selectfield = '5'
  • 16. Filling and submitting forms Buttons ! Mechanize does not add the value of the button being clicked ! If the web server cares for buttons values in POST data, add them manually. < input type = &quot;submit&quot; name = &quot;btn1&quot; value = &quot;Clicked&quot;>...< / input > form. add_field ! ( 'btn1' , 'Clicked' ) b utton = form. button_with ( :name => 'btn1' ) page = form. click_button ( button )
  • 17.
  • 21. Use HTML parsers other than Nokogiri
  • 22. Does not have JavaScript engine (therefore no Ajax)
  • 23.
  • 26. Nokogiri element API This presentation is available under CC-BY license by Muriel Salvan
  • 27. Q/A