Getting started with Scrapy in Python

Web Scraping with Scrapy
Virendra Rajput

Hacker @Markitty

Agenda
● What is web scraping and why it's fun
● My experiments with web scraping
● Getting started with Scrapy
● How Scrapy works and a quick Demo
● Why Scrapy
● Questions

What is Web Scraping?
● Extracting information from websites
● Problem:
○ Static websites
○ No access to APIs to extract the data you need
○ Need to extract data periodically
● Manual solution: go to the website and copy the required data
● Smarter solution: Web Scraping

Web Scraping in Python
● Download webpage with urllib2, requests

● Parse the page with BeautifulSoup/lxml

● Select with XPath or CSS selectors
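
The download, parse, and select steps above can be sketched with the standard library alone. This is a minimal illustration, not the approach from the talk: it swaps in `urllib.request` (the Python 3 successor of urllib2) for the download and `html.parser` for BeautifulSoup/lxml, so it runs without extra installs. The names `download`, `LinkCollector`, `extract_links`, and `SAMPLE_HTML` are hypothetical, chosen for this example.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


def download(url):
    """Step 1: fetch a page and return its body as text (needs network access)."""
    with urlopen(url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class LinkCollector(HTMLParser):
    """Steps 2 and 3: parse the HTML and select every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []      # collected (href, text) pairs
        self._href = None    # href of the <a> tag currently open, if any
        self._text = []      # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return a list of (href, link text) pairs found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


# Inline sample so the example runs offline; a real run would call download(url).
SAMPLE_HTML = """
<html><body>
  <a href="/docs">Documentation</a>
  <a href="/blog">Blog</a>
</body></html>
"""

if __name__ == "__main__":
    print(extract_links(SAMPLE_HTML))
```

With lxml or BeautifulSoup installed, the hand-written parser class would typically shrink to a single selector such as the XPath `//a/@href`, which is the main reason those libraries are the usual choice.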

Scrapy: a fast, high-level screen scraping and web crawling framework
● Pick a website
● Define the data you want to scrape
● Write the spider to extract the data
● Run the spider
● Store the Data

Why Scrapy
● Simplicity
● Fast
● Productive/ Extensible
● Portable
● Well documented & healthy community
● Commercial Support

Advanced Features (built in)
● Interactive shell for trying out XPaths (useful for debugging)
● Selecting and extracting data from HTML sources
● Cleaning and sanitizing the scraped data
● Generating feed exports (JSON, CSV)
● Media pipeline for downloading files and images
● Middlewares for cookies, HTTP compression, caching, user-agent spoofing, etc.
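
Several of the features above are switched on through project settings, and cleaning is usually done in an item pipeline. The sketch below puts both in one block for brevity; in a real project they would live in separate `pipelines.py` and `settings.py` files. The project name `myproject`, the class `CleanTextPipeline`, the user-agent string, and the file paths are hypothetical. The setting names themselves are Scrapy's own, and `FEEDS` assumes Scrapy 2.1 or later.

```python
# --- pipelines.py (hypothetical module) ---

class CleanTextPipeline:
    """Cleaning step: collapse runs of whitespace in every string field."""

    def process_item(self, item, spider):
        for key, value in list(item.items()):
            if isinstance(value, str):
                item[key] = " ".join(value.split())
        return item


# --- settings.py (hypothetical fragment) ---

# Feed exports: write scraped items to JSON and CSV files.
FEEDS = {
    "items.json": {"format": "json"},
    "items.csv": {"format": "csv"},
}

# Item pipelines: the cleaning step above, then the built-in files pipeline.
ITEM_PIPELINES = {
    "myproject.pipelines.CleanTextPipeline": 300,
    "scrapy.pipelines.files.FilesPipeline": 400,
}
FILES_STORE = "downloads"  # directory for downloaded media (assumed path)

# Built-in downloader middleware switches.
COOKIES_ENABLED = True
COMPRESSION_ENABLED = True
HTTPCACHE_ENABLED = True
USER_AGENT = "example-bot/0.1 (+https://example.com/bot)"  # assumed value
```

The interactive shell mentioned in the first bullet is started with `scrapy shell <url>`, which opens a Python prompt with the fetched page available as `response`.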
