YouTube Link: https://youtu.be/5o9lucMaQLc
** Python Certification Training: https://www.edureka.co/data-science-python-certification-course **
This Edureka video on 'Scrapy Tutorial' will help you understand how to build a simple web crawler using Python and Scrapy and store the extracted data in a file. The following topics are discussed:
What Is Scrapy?
What Is A Web Crawler?
How To Install Scrapy?
Starting Your First Scrapy Project
Making Your First Spider
Extracting Data
Storing The Extracted Data
4. What Is Scrapy?
www.edureka.co/python
Scrapy is a free and open-source web-crawling framework
written in Python.
It was originally designed for web scraping but can also be
used to extract data through APIs or as a general-purpose
web crawler.
It is maintained by Zyte (formerly Scrapinghub Ltd.) and many other contributors.
6. What Is A Web Crawler?
• A web crawler is also known as a web spider,
automatic indexer or simply a crawler.
• It is an internet bot that helps with web indexing.
• Web crawlers help collect information from
web pages and the links those pages contain.
• They also help validate HTML code and
hyperlinks.
• They crawl one page at a time until all the
pages are indexed.
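The crawling behaviour described above can be sketched in a few lines of plain Python. This is only an illustration, not how Scrapy is implemented: the PAGES dictionary is a hypothetical in-memory "website" standing in for real HTTP requests, and LinkExtractor and crawl are names made up for this example. The sketch visits one page at a time, follows each new link once, and records links that point to pages that do not exist.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical in-memory "website" (URL path -> HTML), used instead of real requests.
PAGES = {
    "/": '<a href="/about">About</a> <a href="/blog">Blog</a>',
    "/about": '<a href="/">Home</a>',
    "/blog": '<a href="/about">About</a> <a href="/missing">Broken</a>',
}


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start, pages):
    """Visit pages breadth-first, one at a time, following each link once."""
    visited, broken = [], []
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        html = pages.get(url)
        if html is None:
            # The link points to a page that does not exist.
            broken.append(url)
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited, broken


visited, broken = crawl("/", PAGES)
print(visited)  # ['/', '/about', '/blog']
print(broken)   # ['/missing']
```

The seen set is what stops the crawler from visiting the same page twice when pages link back to each other.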
8. How To Install Scrapy?
To install Scrapy, run the following command in
the command prompt or terminal. Alternatively, you
can add the package from the project interpreter settings in your IDE.
pip install scrapy
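One way to confirm that the installation worked is to ask Python which version of the package it can see. The sketch below uses only the standard library (Python 3.8 or later); installed_version is a helper name made up for this example.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if Scrapy is installed, otherwise None.
print(installed_version("scrapy"))
```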
12. Making Your First Spider
Spiders are classes that you define and that Scrapy
uses to scrape information from a website.
They must subclass scrapy.Spider and define the
initial requests to make.
They also define how to parse the downloaded page
content to extract data.
To create a spider, simply create a Python file in the
spiders directory of your project.
14. Extracting Data
The best way to learn how to extract data with Scrapy is to try
selectors in the Scrapy shell. Scrapy selector expressions support
both XPath and CSS.
Run the following command in the terminal to start the Scrapy shell.
scrapy shell "websitelocation"