SlideShare ist ein Scribd-Unternehmen logo
1 von 13
WHAT IS ETL?
“EXTRACT … TRANSFORM … LOAD”



Eng. Ismail El Gayar
Software Engineer
WHY ETL?

 Companies need a way to analyze their data
  for critical business decisions.
 Transactional Database can’t answer
  complex business questions.
 A data warehouse provide a common data
  repository.
 ETL provide a method of moving the data
  from various source into a data warehouse.
ETL CONCEPT
 A Company data may be scattered in
  different locations and in different formats.
 ETL Allows you to:
     Migrate the data into a data warehouse.
     Convert the various formats and types to adhere
      to one consistent system.
   ETL is a predefined process for access and
    manipulate source data and loading it into a
    target database.
ETL REQUIREMENTS
   Any ETL Architecture must meet the following
    requirements:
       Business Requirement
       Compliance Requirement
       Data Profiling
       Data Security
       Data Integration
       Right Data at Right Time
       Archiving & Uneage
       Final End User Delivery Interface
       Available Skills
       Legacy License
       Alignment with overall Enterprise Architecture
THE ETL PROCESS



                                                    Load
                                                 The process of
                            Transform            writing data into
                                                    the target
                            The process of
                                                     database
                          converting data from
                          one form to another
         Extract
      The process of
    reading data from a
         database
EXTRACT

   Gathering the data
     Raw  data that was written directly into the disk
     Data written to flat files or relational tables from
      structured source systems
     Data can be read multiple times, if needed.

   Cleansing the data
     Eliminateduplicates or fragmented data
     Exclude unwanted / unneeded information
TRANSFORM

 Preparing the data to be housed in the data
  warehouse.
 Converting the extracted data
     Using  rules and lookup tables
     Combining data

     Verification/Validity checks

     Standardization
LOAD

   Storing the transformed data in the data
    warehouse.

   Batch/Real-time processing

   Can follow star schema and snowflake
    schema
ETL FLOW
ADVANTAGE OF ETL TOOL
 Simple, faster and cheaper development
 Most ETL tools provide a metadata
  repository, synchronizing metadata from
  various sources.
 Most ETL tools deliver good performance,
  even for very large dataset.
 Most ETL tools provide impact analysis tools
  for any proposed schema changes.
 Most ETL tools have built-in connectors for
  all the major RDBMS systems
ADVANTAGE OF ETL TOOL
 Most ETL tools allow reuse of the existing
  complex programs.
 Several ETL tools offers visual Development
  Environment.
 Most ETL tools offers built-in scheduler
  sequencers and documentation.
 Several ETL tools offer various performance
  optimization options such as (parallel
  processing, complex load balancing etc)
POPULAR ETL TOOLS

                 Tools                             Company
Infosphere Datastage               IBM
Informatica                        Informatica Corp
DT/Studio                          Embarcadero Technologies
Ab Inito                           Ab Inito Software Corp
Oracle Warehouse Builder           ORACLE
Microsoft SQL Server Integration   Microsoft
Transformation Manager             ETL Solutions
THANK YOU

Weitere ähnliche Inhalte

Was ist angesagt?

Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureJames Serra
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
 
Is the traditional data warehouse dead?
Is the traditional data warehouse dead?Is the traditional data warehouse dead?
Is the traditional data warehouse dead?James Serra
 
Data warehouse architecture
Data warehouse architectureData warehouse architecture
Data warehouse architecturepcherukumalla
 
Introduction to ETL process
Introduction to ETL process Introduction to ETL process
Introduction to ETL process Omid Vahdaty
 
ETL Testing Training Presentation
ETL Testing Training PresentationETL Testing Training Presentation
ETL Testing Training PresentationApurba Biswas
 
Introduction To Data Warehousing
Introduction To Data WarehousingIntroduction To Data Warehousing
Introduction To Data WarehousingAlex Meadows
 
Building a modern data warehouse
Building a modern data warehouseBuilding a modern data warehouse
Building a modern data warehouseJames Serra
 
Data Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookData Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookJames Serra
 
Introduction to Data Warehouse
Introduction to Data WarehouseIntroduction to Data Warehouse
Introduction to Data WarehouseShanthi Mukkavilli
 
Building Data Lakehouse.pdf
Building Data Lakehouse.pdfBuilding Data Lakehouse.pdf
Building Data Lakehouse.pdfLuis Jimenez
 
Reshape Data Lake (as of 2020.07)
Reshape Data Lake (as of 2020.07)Reshape Data Lake (as of 2020.07)
Reshape Data Lake (as of 2020.07)Eric Sun
 
Dw & etl concepts
Dw & etl conceptsDw & etl concepts
Dw & etl conceptsjeshocarme
 

Was ist angesagt? (20)

Data warehouse
Data warehouseData warehouse
Data warehouse
 
Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse Architecture
 
DATA WAREHOUSING AND DATA MINING
DATA WAREHOUSING AND DATA MININGDATA WAREHOUSING AND DATA MINING
DATA WAREHOUSING AND DATA MINING
 
ETL Process
ETL ProcessETL Process
ETL Process
 
OLAP v/s OLTP
OLAP v/s OLTPOLAP v/s OLTP
OLAP v/s OLTP
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
Is the traditional data warehouse dead?
Is the traditional data warehouse dead?Is the traditional data warehouse dead?
Is the traditional data warehouse dead?
 
Data warehouse architecture
Data warehouse architectureData warehouse architecture
Data warehouse architecture
 
Introduction to ETL process
Introduction to ETL process Introduction to ETL process
Introduction to ETL process
 
ETL Testing Training Presentation
ETL Testing Training PresentationETL Testing Training Presentation
ETL Testing Training Presentation
 
Introduction To Data Warehousing
Introduction To Data WarehousingIntroduction To Data Warehousing
Introduction To Data Warehousing
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
Snowflake Overview
Snowflake OverviewSnowflake Overview
Snowflake Overview
 
Building a modern data warehouse
Building a modern data warehouseBuilding a modern data warehouse
Building a modern data warehouse
 
Data Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookData Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future Outlook
 
Introduction to Data Warehouse
Introduction to Data WarehouseIntroduction to Data Warehouse
Introduction to Data Warehouse
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
Building Data Lakehouse.pdf
Building Data Lakehouse.pdfBuilding Data Lakehouse.pdf
Building Data Lakehouse.pdf
 
Reshape Data Lake (as of 2020.07)
Reshape Data Lake (as of 2020.07)Reshape Data Lake (as of 2020.07)
Reshape Data Lake (as of 2020.07)
 
Dw & etl concepts
Dw & etl conceptsDw & etl concepts
Dw & etl concepts
 

Ähnlich wie What is ETL?

Building the DW - ETL
Building the DW - ETLBuilding the DW - ETL
Building the DW - ETLganblues
 
ETL Tools Ankita Dubey
ETL Tools Ankita DubeyETL Tools Ankita Dubey
ETL Tools Ankita DubeyAnkita Dubey
 
Etl with talend (data integeration)
Etl with talend (data integeration)Etl with talend (data integeration)
Etl with talend (data integeration)pomishra
 
Chapter 4-ETL
Chapter 4-ETLChapter 4-ETL
Chapter 4-ETLteenoooo
 
Why shift from ETL to ELT?
Why shift from ETL to ELT?Why shift from ETL to ELT?
Why shift from ETL to ELT?HEXANIKA
 
oracle_soultion_oracledataintegrator_goldengate_2021
oracle_soultion_oracledataintegrator_goldengate_2021oracle_soultion_oracledataintegrator_goldengate_2021
oracle_soultion_oracledataintegrator_goldengate_2021ssuser8ccb5a
 
CERN_DIS_ODI_OGG_final_oracle_golde.pptx
CERN_DIS_ODI_OGG_final_oracle_golde.pptxCERN_DIS_ODI_OGG_final_oracle_golde.pptx
CERN_DIS_ODI_OGG_final_oracle_golde.pptxcamyla81
 
Basha_ETL_Developer
Basha_ETL_DeveloperBasha_ETL_Developer
Basha_ETL_Developerbasha shaik
 
Datawarehousing & DSS
Datawarehousing & DSSDatawarehousing & DSS
Datawarehousing & DSSDeepali Raut
 
Data junction tool
Data junction toolData junction tool
Data junction toolSara shall
 
ETL VS ELT.pdf
ETL VS ELT.pdfETL VS ELT.pdf
ETL VS ELT.pdfBOSupport
 
To Study E T L ( Extract, Transform, Load) Tools Specially S Q L Server I...
To Study  E T L ( Extract, Transform, Load) Tools Specially  S Q L  Server  I...To Study  E T L ( Extract, Transform, Load) Tools Specially  S Q L  Server  I...
To Study E T L ( Extract, Transform, Load) Tools Specially S Q L Server I...Shahzad
 
Should ETL Become Obsolete
Should ETL Become ObsoleteShould ETL Become Obsolete
Should ETL Become ObsoleteJerald Burget
 
A Comparitive Study Of ETL Tools
A Comparitive Study Of ETL ToolsA Comparitive Study Of ETL Tools
A Comparitive Study Of ETL ToolsRhonda Cetnar
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Test labs 2016. Тестирование data warehouse
Test labs 2016. Тестирование data warehouse Test labs 2016. Тестирование data warehouse
Test labs 2016. Тестирование data warehouse Sasha Soleev
 
Datastage to ODI
Datastage to ODIDatastage to ODI
Datastage to ODINagendra K
 
Big data analytics beyond beer and diapers
Big data analytics   beyond beer and diapersBig data analytics   beyond beer and diapers
Big data analytics beyond beer and diapersKai Zhao
 

Ähnlich wie What is ETL? (20)

ETL Technologies.pptx
ETL Technologies.pptxETL Technologies.pptx
ETL Technologies.pptx
 
Building the DW - ETL
Building the DW - ETLBuilding the DW - ETL
Building the DW - ETL
 
ETL Tools Ankita Dubey
ETL Tools Ankita DubeyETL Tools Ankita Dubey
ETL Tools Ankita Dubey
 
Etl with talend (data integeration)
Etl with talend (data integeration)Etl with talend (data integeration)
Etl with talend (data integeration)
 
Chapter 4-ETL
Chapter 4-ETLChapter 4-ETL
Chapter 4-ETL
 
Why shift from ETL to ELT?
Why shift from ETL to ELT?Why shift from ETL to ELT?
Why shift from ETL to ELT?
 
oracle_soultion_oracledataintegrator_goldengate_2021
oracle_soultion_oracledataintegrator_goldengate_2021oracle_soultion_oracledataintegrator_goldengate_2021
oracle_soultion_oracledataintegrator_goldengate_2021
 
CERN_DIS_ODI_OGG_final_oracle_golde.pptx
CERN_DIS_ODI_OGG_final_oracle_golde.pptxCERN_DIS_ODI_OGG_final_oracle_golde.pptx
CERN_DIS_ODI_OGG_final_oracle_golde.pptx
 
Basha_ETL_Developer
Basha_ETL_DeveloperBasha_ETL_Developer
Basha_ETL_Developer
 
Datawarehousing & DSS
Datawarehousing & DSSDatawarehousing & DSS
Datawarehousing & DSS
 
Data junction tool
Data junction toolData junction tool
Data junction tool
 
ETL VS ELT.pdf
ETL VS ELT.pdfETL VS ELT.pdf
ETL VS ELT.pdf
 
ETL (1).ppt
ETL (1).pptETL (1).ppt
ETL (1).ppt
 
To Study E T L ( Extract, Transform, Load) Tools Specially S Q L Server I...
To Study  E T L ( Extract, Transform, Load) Tools Specially  S Q L  Server  I...To Study  E T L ( Extract, Transform, Load) Tools Specially  S Q L  Server  I...
To Study E T L ( Extract, Transform, Load) Tools Specially S Q L Server I...
 
Should ETL Become Obsolete
Should ETL Become ObsoleteShould ETL Become Obsolete
Should ETL Become Obsolete
 
A Comparitive Study Of ETL Tools
A Comparitive Study Of ETL ToolsA Comparitive Study Of ETL Tools
A Comparitive Study Of ETL Tools
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Test labs 2016. Тестирование data warehouse
Test labs 2016. Тестирование data warehouse Test labs 2016. Тестирование data warehouse
Test labs 2016. Тестирование data warehouse
 
Datastage to ODI
Datastage to ODIDatastage to ODI
Datastage to ODI
 
Big data analytics beyond beer and diapers
Big data analytics   beyond beer and diapersBig data analytics   beyond beer and diapers
Big data analytics beyond beer and diapers
 

Mehr von Ismail El Gayar

Why computer engineering
Why computer engineeringWhy computer engineering
Why computer engineeringIsmail El Gayar
 
Geographic Information System for Egyptian Railway System(GIS)
Geographic Information System for Egyptian Railway System(GIS)Geographic Information System for Egyptian Railway System(GIS)
Geographic Information System for Egyptian Railway System(GIS)Ismail El Gayar
 
System science documentation
System science documentationSystem science documentation
System science documentationIsmail El Gayar
 
Parallel architecture &programming
Parallel architecture &programmingParallel architecture &programming
Parallel architecture &programmingIsmail El Gayar
 
Object oriented methodology & unified modeling language
Object oriented methodology & unified modeling languageObject oriented methodology & unified modeling language
Object oriented methodology & unified modeling languageIsmail El Gayar
 

Mehr von Ismail El Gayar (7)

Neural Networks
Neural NetworksNeural Networks
Neural Networks
 
Why computer engineering
Why computer engineeringWhy computer engineering
Why computer engineering
 
Geographic Information System for Egyptian Railway System(GIS)
Geographic Information System for Egyptian Railway System(GIS)Geographic Information System for Egyptian Railway System(GIS)
Geographic Information System for Egyptian Railway System(GIS)
 
System science documentation
System science documentationSystem science documentation
System science documentation
 
Prolog & lisp
Prolog & lispProlog & lisp
Prolog & lisp
 
Parallel architecture &programming
Parallel architecture &programmingParallel architecture &programming
Parallel architecture &programming
 
Object oriented methodology & unified modeling language
Object oriented methodology & unified modeling languageObject oriented methodology & unified modeling language
Object oriented methodology & unified modeling language
 

Kürzlich hochgeladen

CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 

Kürzlich hochgeladen (20)

CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 

What is ETL?

  • 1. WHAT IS ETL? “EXTRACT … TRANSFORM … LOAD” Eng. Ismail El Gayar Software Engineer
  • 2. WHY ETL?  Companies need a way to analyze their data for critical business decisions.  Transactional Database can’t answer complex business questions.  A data warehouse provide a common data repository.  ETL provide a method of moving the data from various source into a data warehouse.
  • 3. ETL CONCEPT  A Company data may be scattered in different locations and in different formats.  ETL Allows you to:  Migrate the data into a data warehouse.  Convert the various formats and types to adhere to one consistent system.  ETL is a predefined process for access and manipulate source data and loading it into a target database.
  • 4. ETL REQUIREMENTS  Any ETL Architecture must meet the following requirements:  Business Requirement  Compliance Requirement  Data Profiling  Data Security  Data Integration  Right Data at Right Time  Archiving & Uneage  Final End User Delivery Interface  Available Skills  Legacy License  Alignment with overall Enterprise Architecture
  • 5. THE ETL PROCESS Load The process of Transform writing data into the target The process of database converting data from one form to another Extract The process of reading data from a database
  • 6. EXTRACT  Gathering the data  Raw data that was written directly into the disk  Data written to flat files or relational tables from structured source systems  Data can be read multiple times, if needed.  Cleansing the data  Eliminateduplicates or fragmented data  Exclude unwanted / unneeded information
  • 7. TRANSFORM  Preparing the data to be housed in the data warehouse.  Converting the extracted data  Using rules and lookup tables  Combining data  Verification/Validity checks  Standardization
  • 8. LOAD  Storing the transformed data in the data warehouse.  Batch/Real-time processing  Can follow star schema and snowflake schema
  • 10. ADVANTAGE OF ETL TOOL  Simple, faster and cheaper development  Most ETL tools provide a metadata repository, synchronizing metadata from various sources.  Most ETL tools deliver good performance, even for very large dataset.  Most ETL tools provide impact analysis tools for any proposed schema changes.  Most ETL tools have built-in connectors for all the major RDBMS systems
  • 11. ADVANTAGE OF ETL TOOL  Most ETL tools allow reuse of the existing complex programs.  Several ETL tools offers visual Development Environment.  Most ETL tools offers built-in scheduler sequencers and documentation.  Several ETL tools offer various performance optimization options such as (parallel processing, complex load balancing etc)
  • 12. POPULAR ETL TOOLS Tools Company Infosphere Datastage IBM Informatica Informatica Corp DT/Studio Embarcadero Technologies Ab Inito Ab Inito Software Corp Oracle Warehouse Builder ORACLE Microsoft SQL Server Integration Microsoft Transformation Manager ETL Solutions