SlideShare ist ein Scribd-Unternehmen logo
1 von 14
WEB SCRAPING TO COLLECT DATA FROM ETL WITH PIPELINE
TEAM MEMBERS REGISTER NUMBER
PREETHA.K 19TD1526
SHIFANA FARVEEN. I 19TD1535
SUBASRI.S 19TD1538
Under the guidance of
Mrs.S.SARANYA…,
Assitant Professor
Department of CSE
RAAKCET.
O bjectives:
Web scraping
large amounts of data
websites
unstructured data-converted
structured data
spreadsheet -database
2
G oal with Project title:
 Enormous benefits-marketing field
 Finance
 Price monitoring
 Company Information
3
“
Domain of the Project:
4
Domain explanation in general:
 Develops-maintains-large-scale data processing systems.
 Preparing structured-unstructured data-analytic modelling.
 Data warehouse
 Manage overall pipeline orchestration
 Huge amounts of data-businesses.
5
Archietecture:
◎
6
S ignificance of Proposed model:
 Mainly deals with SQL checks on data to ensure that the data flowing in
and flowing out are inline with the organizational requirements.
 Data quality
 Reduced data loss
 Provides timely access
 Using ETL a splendid future is growing exponentially
 Generate reproducible code
 Distributed “Big Data” computation
 Scaling a working pipeline
7
Technique:
 Brings structure to your information as well as contributes to its clarity,
completeness, quality, and velocity
 Data formats changing over time
 Broken data connections
 Contradictions between systems
 Addressing the issues of different ETL components with the same
technology
 Not considering data scaling
 Failing to anticipate future data needs
8
9
10
Extract:
 Most companies and businesses acquire data from a variety of sources, such as
CRM files, ERP files, emails, Excel sheets, Word documents, log files data.
 During extraction, the ETL tool uses various connectors to extract relevant raw data
from their respective sources.
 Even though it is possible to manually extract data, it is a time-consuming and error-
prone process. With the ETL tool, this extraction stage is made easier and faster.
Tranform:
 After data extraction, create APIs to tranform them into a format of a destination
system as input.
 Cleaning
11
 Deduplication
 Format revision
 Key restructuring,etc,…
Load:
 Data loading is the process where the newly transformed data is collectively
loaded into a new location.
 Full load — loading entire data from source to destination. Suitable for smaller
source data
 Incremental/Delta Load — loading only the data from source which are not
available in destination. Suitable if soure datasize is huge. Usually it is
implemented based on date.
Steps on Proposed work:
 Web Scraping-extracting valuable and intersting information from web
pages
 Mainly targeting task are about automated web data extraction.
 Data can be extracted through the various source links.
 Inspector-parsing HTML entails identifying HTML elements and
associated tags.
 Parsing HTML Links with Beautiful soup
 Creates a parse tree for parsed pages that can be used to extract data from
HTML
 Extract data using Pandas and Requests
12
13
 Get extracted data with Request and Pandas
 Cleaning and Merging Scraped Data With Pandas
 Scrapping Data for Multiple Seasons and Teams with a Loop
 Final data Results and DataFrame
Thank you!
14

Weitere ähnliche Inhalte

Ähnlich wie project_phrase I.pptx

Steering Away from Bolted-On Analytics
Steering Away from Bolted-On AnalyticsSteering Away from Bolted-On Analytics
Steering Away from Bolted-On AnalyticsConnexica
 
Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...
Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...
Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...AboutYouGmbH
 
Innovate with the data you have with UiPath and Snowflake.pdf
Innovate with the data you have with UiPath and Snowflake.pdfInnovate with the data you have with UiPath and Snowflake.pdf
Innovate with the data you have with UiPath and Snowflake.pdfCristina Vidu
 
CWIN17 India / Bigdata architecture yashowardhan sowale
CWIN17 India / Bigdata architecture  yashowardhan sowaleCWIN17 India / Bigdata architecture  yashowardhan sowale
CWIN17 India / Bigdata architecture yashowardhan sowaleCapgemini
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
 
Denodo DataFest 2016: Comparing and Contrasting Data Virtualization With Data...
Denodo DataFest 2016: Comparing and Contrasting Data Virtualization With Data...Denodo DataFest 2016: Comparing and Contrasting Data Virtualization With Data...
Denodo DataFest 2016: Comparing and Contrasting Data Virtualization With Data...Denodo
 
Fbdl enabling comprehensive_data_services
Fbdl enabling comprehensive_data_servicesFbdl enabling comprehensive_data_services
Fbdl enabling comprehensive_data_servicesCindy Irby
 
Accenture-Cloud-Data-Migration-POV-Final.pdf
Accenture-Cloud-Data-Migration-POV-Final.pdfAccenture-Cloud-Data-Migration-POV-Final.pdf
Accenture-Cloud-Data-Migration-POV-Final.pdfRajvir Kaushal
 
52023374-5ab1-4b99-8b31-bdc4ee5a7d89.pdf
52023374-5ab1-4b99-8b31-bdc4ee5a7d89.pdf52023374-5ab1-4b99-8b31-bdc4ee5a7d89.pdf
52023374-5ab1-4b99-8b31-bdc4ee5a7d89.pdfvitm11
 
Unit-IV-Introduction to Data Warehousing .pptx
Unit-IV-Introduction to Data Warehousing .pptxUnit-IV-Introduction to Data Warehousing .pptx
Unit-IV-Introduction to Data Warehousing .pptxHarsha Patel
 
Deconstructing Lambda
Deconstructing LambdaDeconstructing Lambda
Deconstructing Lambdadarach
 
Future of Data Strategy (ASEAN)
Future of Data Strategy (ASEAN)Future of Data Strategy (ASEAN)
Future of Data Strategy (ASEAN)Denodo
 
Data Virtualization: An Introduction
Data Virtualization: An IntroductionData Virtualization: An Introduction
Data Virtualization: An IntroductionDenodo
 
Vikram Andem Big Data Strategy @ IATA Technology Roadmap
Vikram Andem Big Data Strategy @ IATA Technology Roadmap Vikram Andem Big Data Strategy @ IATA Technology Roadmap
Vikram Andem Big Data Strategy @ IATA Technology Roadmap IT Strategy Group
 
Advanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data VirtualizationAdvanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data VirtualizationDenodo
 
Extract, Transform and Load.pptx
Extract, Transform and Load.pptxExtract, Transform and Load.pptx
Extract, Transform and Load.pptxJesusaEspeleta
 

Ähnlich wie project_phrase I.pptx (20)

Steering Away from Bolted-On Analytics
Steering Away from Bolted-On AnalyticsSteering Away from Bolted-On Analytics
Steering Away from Bolted-On Analytics
 
Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...
Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...
Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...
 
Innovate with the data you have with UiPath and Snowflake.pdf
Innovate with the data you have with UiPath and Snowflake.pdfInnovate with the data you have with UiPath and Snowflake.pdf
Innovate with the data you have with UiPath and Snowflake.pdf
 
End User Informatics
End User InformaticsEnd User Informatics
End User Informatics
 
PowerApps community call-March 2019
PowerApps community call-March 2019PowerApps community call-March 2019
PowerApps community call-March 2019
 
CWIN17 India / Bigdata architecture yashowardhan sowale
CWIN17 India / Bigdata architecture  yashowardhan sowaleCWIN17 India / Bigdata architecture  yashowardhan sowale
CWIN17 India / Bigdata architecture yashowardhan sowale
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
 
Denodo DataFest 2016: Comparing and Contrasting Data Virtualization With Data...
Denodo DataFest 2016: Comparing and Contrasting Data Virtualization With Data...Denodo DataFest 2016: Comparing and Contrasting Data Virtualization With Data...
Denodo DataFest 2016: Comparing and Contrasting Data Virtualization With Data...
 
Fbdl enabling comprehensive_data_services
Fbdl enabling comprehensive_data_servicesFbdl enabling comprehensive_data_services
Fbdl enabling comprehensive_data_services
 
Accenture-Cloud-Data-Migration-POV-Final.pdf
Accenture-Cloud-Data-Migration-POV-Final.pdfAccenture-Cloud-Data-Migration-POV-Final.pdf
Accenture-Cloud-Data-Migration-POV-Final.pdf
 
52023374-5ab1-4b99-8b31-bdc4ee5a7d89.pdf
52023374-5ab1-4b99-8b31-bdc4ee5a7d89.pdf52023374-5ab1-4b99-8b31-bdc4ee5a7d89.pdf
52023374-5ab1-4b99-8b31-bdc4ee5a7d89.pdf
 
Unit-IV-Introduction to Data Warehousing .pptx
Unit-IV-Introduction to Data Warehousing .pptxUnit-IV-Introduction to Data Warehousing .pptx
Unit-IV-Introduction to Data Warehousing .pptx
 
Deconstructing Lambda
Deconstructing LambdaDeconstructing Lambda
Deconstructing Lambda
 
Future of Data Strategy (ASEAN)
Future of Data Strategy (ASEAN)Future of Data Strategy (ASEAN)
Future of Data Strategy (ASEAN)
 
Data Virtualization: An Introduction
Data Virtualization: An IntroductionData Virtualization: An Introduction
Data Virtualization: An Introduction
 
Vikram Andem Big Data Strategy @ IATA Technology Roadmap
Vikram Andem Big Data Strategy @ IATA Technology Roadmap Vikram Andem Big Data Strategy @ IATA Technology Roadmap
Vikram Andem Big Data Strategy @ IATA Technology Roadmap
 
Resume
ResumeResume
Resume
 
Advanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data VirtualizationAdvanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data Virtualization
 
Extract, Transform and Load.pptx
Extract, Transform and Load.pptxExtract, Transform and Load.pptx
Extract, Transform and Load.pptx
 
TSE_Pres12.pptx
TSE_Pres12.pptxTSE_Pres12.pptx
TSE_Pres12.pptx
 

Kürzlich hochgeladen

Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSKurinjimalarL3
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Christo Ananth
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college projectTonystark477637
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingrakeshbaidya232001
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Christo Ananth
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performancesivaprakash250
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINESIVASHANKAR N
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations120cr0395
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130Suhani Kapoor
 

Kürzlich hochgeladen (20)

Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
 

project_phrase I.pptx

  • 1. WEB SCRAPING TO COLLECT DATA FROM ETL WITH PIPELINE TEAM MEMBERS REGISTER NUMBER PREETHA.K 19TD1526 SHIFANA FARVEEN. I 19TD1535 SUBASRI.S 19TD1538 Under the guidance of Mrs.S.SARANYA…, Assitant Professor Department of CSE RAAKCET.
  • 2. O bjectives: Web scraping large amounts of data websites unstructured data-converted structured data spreadsheet -database 2
  • 3. G oal with Project title:  Enormous benefits-marketing field  Finance  Price monitoring  Company Information 3
  • 4. “ Domain of the Project: 4
  • 5. Domain explanation in general:  Develops-maintains-large-scale data processing systems.  Preparing structured-unstructured data-analytic modelling.  Data warehouse  Manage overall pipeline orchestration  Huge amounts of data-businesses. 5
  • 7. S ignificance of Proposed model:  Mainly deals with SQL checks on data to ensure that the data flowing in and flowing out are inline with the organizational requirements.  Data quality  Reduced data loss  Provides timely access  Using ETL a splendid future is growing exponentially  Generate reproducible code  Distributed “Big Data” computation  Scaling a working pipeline 7
  • 8. Technique:  Brings structure to your information as well as contributes to its clarity, completeness, quality, and velocity  Data formats changing over time  Broken data connections  Contradictions between systems  Addressing the issues of different ETL components with the same technology  Not considering data scaling  Failing to anticipate future data needs 8
  • 9. 9
  • 10. 10 Extract:  Most companies and businesses acquire data from a variety of sources, such as CRM files, ERP files, emails, Excel sheets, Word documents, log files data.  During extraction, the ETL tool uses various connectors to extract relevant raw data from their respective sources.  Even though it is possible to manually extract data, it is a time-consuming and error- prone process. With the ETL tool, this extraction stage is made easier and faster. Tranform:  After data extraction, create APIs to tranform them into a format of a destination system as input.  Cleaning
  • 11. 11  Deduplication  Format revision  Key restructuring,etc,… Load:  Data loading is the process where the newly transformed data is collectively loaded into a new location.  Full load — loading entire data from source to destination. Suitable for smaller source data  Incremental/Delta Load — loading only the data from source which are not available in destination. Suitable if soure datasize is huge. Usually it is implemented based on date.
  • 12. Steps on Proposed work:  Web Scraping-extracting valuable and intersting information from web pages  Mainly targeting task are about automated web data extraction.  Data can be extracted through the various source links.  Inspector-parsing HTML entails identifying HTML elements and associated tags.  Parsing HTML Links with Beautiful soup  Creates a parse tree for parsed pages that can be used to extract data from HTML  Extract data using Pandas and Requests 12
  • 13. 13  Get extracted data with Request and Pandas  Cleaning and Merging Scraped Data With Pandas  Scrapping Data for Multiple Seasons and Teams with a Loop  Final data Results and DataFrame