SlideShare ist ein Scribd-Unternehmen logo
1 von 10
Downloaden Sie, um offline zu lesen
Crawlers in Ruby




   Sapna Solutions

   Akshay Gupta
What is Crawler?
   When you are hungry, you prefer to go a
restaurant that can serve you with delicious food, of
your interest.

Technically:
             Restaurant : Site/Base_url
              Food : Data/Information
           Interest : Relevant knowledge
Synonym:
                      ● Spider
                      ● Robot

                      ● Bot
How to make Robot?



● Website
● DOM (Document Object Model)

● Library (depending upon language)
How to make in Ruby?


Libraries:
              Rubyfulsoup
                Hpricot
             WWW::Mechanize
               ScRUBYt
                 Watir
Hpricot

● Best to use for simple text-extraction
● Clear API

● Fast and better than Rubyfulsoup

● Methods like parent and child, sibling as in JS,

makes life easier
Is something missing?



        What you think?
        Is it really easy
               and
makes scraping fast and efficient?
Firebug :-)


Firebug integrates with Firefox to put a wealth of web
development tools at your fingertips while you browse.
You can edit, debug, and monitor CSS, HTML, and
JavaScript live in any web page.

● Firebug (http://www.getfirebug.com/)
● This makes life easier. Do learn to use it
Enough...where is the code??


● Build Doc = Hpricot(open(url-name))
● To walk through DOM: (Doc/”#header”)

● More: (Doc/”.love_class”), (Doc/”a/ul/li[4]”)

● Doc.search(“[@href]”).first[:href]
References


●http://www.rubyrailways.com/data-extraction-for-web-20-
●http://www.google.com

●http://wiki.github.com/why/hpricot
Thanks :-)


Questions???

Weitere ähnliche Inhalte

Andere mochten auch

Andere mochten auch (9)

Canada videogame industry
Canada videogame industryCanada videogame industry
Canada videogame industry
 
Robot spiders and hippos
Robot spiders and hipposRobot spiders and hippos
Robot spiders and hippos
 
Keywlker crawler
Keywlker crawlerKeywlker crawler
Keywlker crawler
 
Resource Opportunities March 2009 1
Resource Opportunities March 2009 1Resource Opportunities March 2009 1
Resource Opportunities March 2009 1
 
Spider Resources 2009 Annual Report
Spider Resources 2009 Annual ReportSpider Resources 2009 Annual Report
Spider Resources 2009 Annual Report
 
Audited financials-apr-08-2010
Audited financials-apr-08-2010Audited financials-apr-08-2010
Audited financials-apr-08-2010
 
Mda may-31-2010
Mda may-31-2010Mda may-31-2010
Mda may-31-2010
 
Mda apr-08-2010
Mda apr-08-2010Mda apr-08-2010
Mda apr-08-2010
 
Design of Quadruped Walking Robot with Spherical Shell
Design of Quadruped Walking Robot with Spherical ShellDesign of Quadruped Walking Robot with Spherical Shell
Design of Quadruped Walking Robot with Spherical Shell
 

Ähnlich wie Crawler, Robots

Developing OpenResty Framework
Developing OpenResty FrameworkDeveloping OpenResty Framework
Developing OpenResty FrameworkOpenRestyCon
 
Libraries Frameworks And Cms
Libraries Frameworks And CmsLibraries Frameworks And Cms
Libraries Frameworks And CmsMark Casias
 
DiUS Computing Lca Rails Final
DiUS  Computing Lca Rails FinalDiUS  Computing Lca Rails Final
DiUS Computing Lca Rails FinalRobert Postill
 
Yahoo Pipes Middleware In The Cloud
Yahoo Pipes Middleware In The CloudYahoo Pipes Middleware In The Cloud
Yahoo Pipes Middleware In The CloudConSanFrancisco123
 
Scraping the web with Laravel, Dusk, Docker, and PHP
Scraping the web with Laravel, Dusk, Docker, and PHPScraping the web with Laravel, Dusk, Docker, and PHP
Scraping the web with Laravel, Dusk, Docker, and PHPPaul Redmond
 
Understanding APIs.pptx introduction chk
Understanding APIs.pptx introduction chkUnderstanding APIs.pptx introduction chk
Understanding APIs.pptx introduction chknooreen nayyar syeda
 
CouchDB and Rails on the Cloud
CouchDB and Rails on the CloudCouchDB and Rails on the Cloud
CouchDB and Rails on the Cloudrockyjaiswal
 
Mash-Up Personal Learning Environments (MUPPLE)
Mash-Up Personal Learning Environments (MUPPLE)Mash-Up Personal Learning Environments (MUPPLE)
Mash-Up Personal Learning Environments (MUPPLE)Hannes Ebner
 
Search Engine Spiders
Search Engine SpidersSearch Engine Spiders
Search Engine SpidersCJ Jenkins
 
Wordpress in 2,3... languages
Wordpress in 2,3... languagesWordpress in 2,3... languages
Wordpress in 2,3... languagesMartin Linkov
 
Rails Vs CakePHP
Rails Vs CakePHPRails Vs CakePHP
Rails Vs CakePHPGautam Rege
 
Beyond web services: supporting mashup artists at Yahoo!
Beyond web services: supporting mashup artists at Yahoo!Beyond web services: supporting mashup artists at Yahoo!
Beyond web services: supporting mashup artists at Yahoo!Chad Dickerson
 
Intro to advanced web development
Intro to advanced web developmentIntro to advanced web development
Intro to advanced web developmentStevie T
 
Ruby application based on http
Ruby application based on httpRuby application based on http
Ruby application based on httpRichard Huang
 
Introduction to ArangoDB (nosql matters Barcelona 2012)
Introduction to ArangoDB (nosql matters Barcelona 2012)Introduction to ArangoDB (nosql matters Barcelona 2012)
Introduction to ArangoDB (nosql matters Barcelona 2012)ArangoDB Database
 
On-page SEO for Drupal
On-page SEO for DrupalOn-page SEO for Drupal
On-page SEO for DrupalSvilen Sabev
 
Mashup University 4: Intro To Mashups
Mashup University 4: Intro To MashupsMashup University 4: Intro To Mashups
Mashup University 4: Intro To MashupsJohn Herren
 

Ähnlich wie Crawler, Robots (20)

Developing OpenResty Framework
Developing OpenResty FrameworkDeveloping OpenResty Framework
Developing OpenResty Framework
 
Libraries Frameworks And Cms
Libraries Frameworks And CmsLibraries Frameworks And Cms
Libraries Frameworks And Cms
 
Os Bowkett
Os BowkettOs Bowkett
Os Bowkett
 
DiUS Computing Lca Rails Final
DiUS  Computing Lca Rails FinalDiUS  Computing Lca Rails Final
DiUS Computing Lca Rails Final
 
Yahoo Pipes Middleware In The Cloud
Yahoo Pipes Middleware In The CloudYahoo Pipes Middleware In The Cloud
Yahoo Pipes Middleware In The Cloud
 
Scraping the web with Laravel, Dusk, Docker, and PHP
Scraping the web with Laravel, Dusk, Docker, and PHPScraping the web with Laravel, Dusk, Docker, and PHP
Scraping the web with Laravel, Dusk, Docker, and PHP
 
Understanding APIs.pptx
Understanding APIs.pptxUnderstanding APIs.pptx
Understanding APIs.pptx
 
Understanding APIs.pptx introduction chk
Understanding APIs.pptx introduction chkUnderstanding APIs.pptx introduction chk
Understanding APIs.pptx introduction chk
 
CouchDB and Rails on the Cloud
CouchDB and Rails on the CloudCouchDB and Rails on the Cloud
CouchDB and Rails on the Cloud
 
Mash-Up Personal Learning Environments (MUPPLE)
Mash-Up Personal Learning Environments (MUPPLE)Mash-Up Personal Learning Environments (MUPPLE)
Mash-Up Personal Learning Environments (MUPPLE)
 
Search Engine Spiders
Search Engine SpidersSearch Engine Spiders
Search Engine Spiders
 
Wordpress in 2,3... languages
Wordpress in 2,3... languagesWordpress in 2,3... languages
Wordpress in 2,3... languages
 
Rails Vs CakePHP
Rails Vs CakePHPRails Vs CakePHP
Rails Vs CakePHP
 
Beyond web services: supporting mashup artists at Yahoo!
Beyond web services: supporting mashup artists at Yahoo!Beyond web services: supporting mashup artists at Yahoo!
Beyond web services: supporting mashup artists at Yahoo!
 
Intro to advanced web development
Intro to advanced web developmentIntro to advanced web development
Intro to advanced web development
 
Ruby application based on http
Ruby application based on httpRuby application based on http
Ruby application based on http
 
Introduction to ArangoDB (nosql matters Barcelona 2012)
Introduction to ArangoDB (nosql matters Barcelona 2012)Introduction to ArangoDB (nosql matters Barcelona 2012)
Introduction to ArangoDB (nosql matters Barcelona 2012)
 
Scalable talk notes
Scalable talk notesScalable talk notes
Scalable talk notes
 
On-page SEO for Drupal
On-page SEO for DrupalOn-page SEO for Drupal
On-page SEO for Drupal
 
Mashup University 4: Intro To Mashups
Mashup University 4: Intro To MashupsMashup University 4: Intro To Mashups
Mashup University 4: Intro To Mashups
 

Kürzlich hochgeladen

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 

Kürzlich hochgeladen (20)

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 

Crawler, Robots