2. What is a Crawler?
When you are hungry, you prefer to go to a
restaurant that serves delicious food that
matches your interest.
Technically:
Restaurant : Site/Base_url
Food : Data/Information
Interest : Relevant knowledge
Synonyms:
● Spider
● Robot
● Bot
3. How to make a Robot?
● Website
● DOM (Document Object Model)
● Library (depending upon language)
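As a rough illustration of how these three ingredients fit together, here is a minimal sketch. It assumes a hypothetical base URL and a small, well-formed page held in a string instead of a live download, and it uses REXML and URI from Ruby's standard distribution as a stand-in for a scraping library. Real-world HTML is often not well-formed, which is one reason dedicated libraries exist.

```ruby
require 'rexml/document'
require 'uri'

# Hypothetical site (the "restaurant") and page body, standing in for
# a page that a real crawler would download.
BASE_URL = 'http://example.com/menu/'
PAGE_MARKUP = <<~HTML
  <html>
    <body>
      <div id="header"><h1>Menu</h1></div>
      <ul>
        <li><a href="starters.html">Starters</a></li>
        <li><a href="/drinks/tea.html">Tea</a></li>
        <li><a href="http://example.org/about">About</a></li>
      </ul>
    </body>
  </html>
HTML

# Parse the markup into a DOM tree and return every link as an
# absolute URL, resolved against the base URL.
def extract_links(markup, base_url)
  doc = REXML::Document.new(markup)
  REXML::XPath.match(doc, '//a').map do |anchor|
    URI.join(base_url, anchor.attributes['href']).to_s
  end
end

puts extract_links(PAGE_MARKUP, BASE_URL)
```

A crawler would typically feed the returned URLs back into its queue of pages to visit.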
4. How to make one in Ruby?
Libraries:
Rubyfulsoup
Hpricot
WWW::Mechanize
ScRUBYt
Watir
5. Hpricot
● Best suited to simple text extraction
● Clear API
● Fast and better than Rubyfulsoup
● Methods such as parent, child, and sibling, as in JS,
make life easier
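To show what parent, child, and sibling navigation looks like in practice, here is a small sketch. It deliberately uses REXML from Ruby's standard distribution rather than Hpricot, so it runs without installing a gem; Hpricot's own method names may differ. The markup and the describe_neighbours helper are made up for illustration.

```ruby
require 'rexml/document'

# Hypothetical, well-formed snippet used only for this example.
MENU_MARKUP = <<~HTML
  <ul id="menu">
    <li>Soup</li>
    <li class="love_class">Pasta</li>
    <li>Cake</li>
  </ul>
HTML

# Find one node by XPath, then look around it in the tree:
# up to its parent, sideways to its siblings, and across all
# children of the same parent.
def describe_neighbours(markup, xpath)
  doc = REXML::Document.new(markup)
  node = REXML::XPath.first(doc, xpath)
  {
    parent: node.parent.name,
    previous: node.previous_element&.text,
    following: node.next_element&.text,
    siblings: node.parent.elements.to_a.map(&:text)
  }
end

p describe_neighbours(MENU_MARKUP, "//li[@class='love_class']")
```

Starting from one known element and stepping to its neighbours is often simpler than writing a separate selector for each piece of data.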
6. Is something missing?
What do you think?
Is it really that easy,
and
does it make scraping fast and efficient?
7. Firebug :-)
Firebug integrates with Firefox to put a wealth of web
development tools at your fingertips while you browse.
You can edit, debug, and monitor CSS, HTML, and
JavaScript live in any web page.
● Firebug (http://www.getfirebug.com/)
● This makes life easier. Do learn to use it.
8. Enough...where is the code??
● First: require 'hpricot' and require 'open-uri'
● Build: doc = Hpricot(open(url_name))
● To walk through the DOM: (doc/"#header")
● More: (doc/".love_class"), (doc/"a/ul/li[4]")
● doc.search("[@href]").first[:href]