project_phrase I.pptx

  1. WEB SCRAPING TO COLLECT DATA FROM ETL WITH PIPELINE. TEAM MEMBERS (REGISTER NUMBER): PREETHA.K 19TD1526, SHIFANA FARVEEN.I 19TD1535, SUBASRI.S 19TD1538. Under the guidance of Mrs. S. SARANYA…, Assistant Professor, Department of CSE, RAAKCET.
  2. Objectives: Web scraping collects large amounts of data from websites; the unstructured data is converted into structured data such as a spreadsheet or database.
  3. Goal of the Project:  Enormous benefits in the marketing field  Finance  Price monitoring  Company information
  4. Domain of the Project:
  5. Domain explanation in general:  Develops and maintains large-scale data processing systems  Prepares structured and unstructured data for analytic modelling  Data warehousing  Manages overall pipeline orchestration  Handles huge amounts of data for businesses
  6. Architecture: (architecture diagram)
  7. Significance of Proposed model:  Mainly deals with SQL checks on the data to ensure that the data flowing in and flowing out is in line with organizational requirements  Data quality  Reduced data loss  Provides timely access  The use of ETL is growing exponentially  Generates reproducible code  Distributed “Big Data” computation  Scaling a working pipeline
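A minimal sketch of the kind of SQL check mentioned above, using Python's sqlite3 module. The database file warehouse.db, the table scraped_matches, and the columns team, score, and match_date are hypothetical placeholders for illustration, not the project's actual schema.

import sqlite3

# Hypothetical warehouse database and table, used only to illustrate an SQL quality check.
conn = sqlite3.connect("warehouse.db")
cur = conn.cursor()

# Count rows where a required column is missing.
nulls = cur.execute(
    "SELECT COUNT(*) FROM scraped_matches WHERE team IS NULL OR score IS NULL"
).fetchone()[0]

# Count duplicate rows on the assumed business key (team + match_date).
dupes = cur.execute(
    """SELECT COUNT(*) FROM (
           SELECT team, match_date, COUNT(*) AS n
           FROM scraped_matches
           GROUP BY team, match_date
           HAVING n > 1
       )"""
).fetchone()[0]

conn.close()
if nulls or dupes:
    raise ValueError(f"Quality check failed: {nulls} incomplete rows, {dupes} duplicated keys")

Checks like these can run between the transform and load stages so that bad records are rejected before they reach the destination.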
  8. Technique:  Brings structure to your information as well as contributing to its clarity, completeness, quality, and velocity  Pitfalls to address: data formats changing over time, broken data connections, contradictions between systems, addressing the issues of different ETL components with the same technology, not considering data scaling, and failing to anticipate future data needs
  10. Extract:  Most companies and businesses acquire data from a variety of sources, such as CRM files, ERP files, emails, Excel sheets, Word documents, and log file data.  During extraction, the ETL tool uses various connectors to extract relevant raw data from the respective sources.  Even though it is possible to extract data manually, it is a time-consuming and error-prone process. With an ETL tool, this extraction stage is made easier and faster. Transform:  After data extraction, APIs transform the data into the format that the destination system expects as input.  Cleaning
  11.  Deduplication  Format revision  Key restructuring, etc. Load:  Data loading is the process where the newly transformed data is collectively loaded into a new location.  Full load: loading the entire data set from source to destination; suitable for smaller source data.  Incremental/Delta load: loading only the data from the source that is not yet available in the destination; suitable if the source data size is huge. It is usually implemented based on date.
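A minimal sketch contrasting a full load with an incremental/delta load, using pandas and SQLite. The file source.csv, the date column, and the matches table are assumptions made for illustration only.

import sqlite3
import pandas as pd

# Extract: read the raw source (hypothetical CSV with a "date" column).
df = pd.read_csv("source.csv", parse_dates=["date"])

# Transform: basic cleaning and deduplication before loading.
df = df.dropna().drop_duplicates()

conn = sqlite3.connect("warehouse.db")

# Full load: replace the destination table with the entire cleaned data set.
df.to_sql("matches", conn, if_exists="replace", index=False)

# Incremental/delta load (the alternative, used on later runs): append only
# rows newer than the latest date already present in the destination.
last = pd.read_sql("SELECT MAX(date) AS last_date FROM matches", conn)["last_date"].iloc[0]
new_rows = df[df["date"] > pd.to_datetime(last)]
new_rows.to_sql("matches", conn, if_exists="append", index=False)

conn.close()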
  12. Steps in the Proposed work:  Web scraping: extracting valuable and interesting information from web pages  The main targeted task is automated web data extraction  Data can be extracted through various source links  Inspector: parsing HTML entails identifying HTML elements and their associated tags  Parsing HTML links with Beautiful Soup  Creates a parse tree for parsed pages that can be used to extract data from HTML  Extract data using Pandas and Requests
  13.  Get extracted data with Requests and Pandas  Cleaning and merging scraped data with Pandas  Scraping data for multiple seasons and teams with a loop  Final data results and DataFrame
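A minimal sketch of the scraping flow outlined on slides 12 and 13, using Requests, Beautiful Soup, and pandas. The URL pattern, the seasons looped over, and the table id are hypothetical placeholders rather than the project's actual targets.

import requests
import pandas as pd
from bs4 import BeautifulSoup

frames = []
# Loop over multiple seasons (hypothetical URL pattern and years).
for season in [2020, 2021, 2022]:
    url = f"https://example.com/stats/{season}"
    html = requests.get(url).text

    # Beautiful Soup builds a parse tree; inspecting elements and tags locates the table.
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"id": "matches"})

    # pandas parses the HTML table into a DataFrame.
    df = pd.read_html(str(table))[0]
    df["season"] = season
    frames.append(df)

# Clean and merge the scraped data into a single DataFrame.
data = pd.concat(frames, ignore_index=True).dropna().drop_duplicates()
print(data.head())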
  14. Thank you!