SlideShare a Scribd company logo
1 of 23
ETL
with talend
POOJA B. MISHRA
What is ETL?
Extract is the process of reading data from a database
Transform is the process of converting the extracted data from its
previous form into the form it needs to be in so that it can be placed
into another database. Transformation occurs by using rules or
lookup tables or by combining the data with other data
Load is the process of writing the data into the target database
Process flow
Terms closely related and managed by
ETL processes
 data migration
data management
 data cleansing
data synchronization
 data consolidation.
.
Different ETL tools
•Oracle ETL
•Ab Initio
•Pentaho Data Integration -Kettle Project (open source ETL)
•SAS ETL studio
•Cognos Decisionstream
•Business Objects Data Integrator (BODI)
•Microsoft SQL Server Integration Services (SSIS)
•Informatica PowerCenter
•Talend
Prerequisites
Talend Open Studio for Data Integration
◦ http://www.talend.com/download
VirtualBox
◦ https://www.virtualbox.org/wiki/Downloads
Hortonworks Sandbox VM
◦ http://hortonworks.com/products/hortonworks-
sandbox/#install
How to set up – Step 1
Step 2
Step 3
Step 4
Step 5
Talend Interface
Workspace
Repository tree
Component configuration
Palette
WorkspaceRepository
tree
Palette
Repository
tree
Workspace
Palette
Component
configuration
What are supported data input formats?
•SQL
•MySQL
•PostgreSQL
•Sybase
•Teradata
•MSSQL
•Netezza
•Greenplum
•Access
•DB2
•Hive
What kinds of datasets can be loaded?
Talend Studio offers nearly comprehensive connectivity to:
Packaged applications (ERP, CRM, etc.), databases, mainframes, files, Web Services, and so on to
address the growing disparity of sources.
Data warehouses, data marts, OLAP applications - for analysis, reporting, dashboarding,
scorecarding, and so on.
Built-in advanced components for ETL, including string manipulations, Slowly Changing
Dimensions, automatic lookup handling, bulk loads support, and so on.
What are data export formats?
Talend - ELT DB Components – Step 1
Step 2
Step 3
Step 4
Step 5
Step 6
Challenges involved
Data volumes are growing exponentially
Data velocity is moving faster
As information systems grow in complexity, the disparity of
sources is growing as well
All these target structures have different data transformation
requirements and different tolerances in terms of latency
Transformations involved in ETL processes can be highly complex
Thank You!!

More Related Content

What's hot

Talend AS A Product
Talend AS A ProductTalend AS A Product
Talend AS A Product
Abdul Manaf
 
Open Source ETL vs Commercial ETL
Open Source ETL vs Commercial ETLOpen Source ETL vs Commercial ETL
Open Source ETL vs Commercial ETL
Jonathan Levin
 
ETL Practices for Better or Worse
ETL Practices for Better or WorseETL Practices for Better or Worse
ETL Practices for Better or Worse
Eric Sun
 
Radical Speed for SQL Queries on Databricks: Photon Under the Hood
Radical Speed for SQL Queries on Databricks: Photon Under the HoodRadical Speed for SQL Queries on Databricks: Photon Under the Hood
Radical Speed for SQL Queries on Databricks: Photon Under the Hood
Databricks
 
KnowIT, semantic informatics knowledge base
KnowIT, semantic informatics knowledge baseKnowIT, semantic informatics knowledge base
KnowIT, semantic informatics knowledge base
Laurent Alquier
 

What's hot (20)

TaLend Online Training
TaLend Online TrainingTaLend Online Training
TaLend Online Training
 
Introduction To Pentaho Kettle
Introduction To Pentaho KettleIntroduction To Pentaho Kettle
Introduction To Pentaho Kettle
 
Kettle – Etl Tool
Kettle – Etl ToolKettle – Etl Tool
Kettle – Etl Tool
 
Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)
Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)
Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)
 
Talend Big Data Capabilities - 2014
Talend Big Data Capabilities - 2014Talend Big Data Capabilities - 2014
Talend Big Data Capabilities - 2014
 
Business Intelligence Open Source
Business Intelligence Open SourceBusiness Intelligence Open Source
Business Intelligence Open Source
 
Talend AS A Product
Talend AS A ProductTalend AS A Product
Talend AS A Product
 
ETL Metadata Injection with Pentaho Data Integration
ETL Metadata Injection with Pentaho Data IntegrationETL Metadata Injection with Pentaho Data Integration
ETL Metadata Injection with Pentaho Data Integration
 
From Raw Data to Analytics with No ETL
From Raw Data to Analytics with No ETLFrom Raw Data to Analytics with No ETL
From Raw Data to Analytics with No ETL
 
Open Source ETL vs Commercial ETL
Open Source ETL vs Commercial ETLOpen Source ETL vs Commercial ETL
Open Source ETL vs Commercial ETL
 
Advanced Analytics using Apache Hive
Advanced Analytics using Apache HiveAdvanced Analytics using Apache Hive
Advanced Analytics using Apache Hive
 
ETL Practices for Better or Worse
ETL Practices for Better or WorseETL Practices for Better or Worse
ETL Practices for Better or Worse
 
Pentaho etl-tool
Pentaho etl-toolPentaho etl-tool
Pentaho etl-tool
 
Pentaho Data Integration: Extrayendo, integrando, normalizando y preparando m...
Pentaho Data Integration: Extrayendo, integrando, normalizando y preparando m...Pentaho Data Integration: Extrayendo, integrando, normalizando y preparando m...
Pentaho Data Integration: Extrayendo, integrando, normalizando y preparando m...
 
Radical Speed for SQL Queries on Databricks: Photon Under the Hood
Radical Speed for SQL Queries on Databricks: Photon Under the HoodRadical Speed for SQL Queries on Databricks: Photon Under the Hood
Radical Speed for SQL Queries on Databricks: Photon Under the Hood
 
What is ETL?
What is ETL?What is ETL?
What is ETL?
 
How The Weather Company Uses Apache Spark to Serve Weather Data Fast at Low Cost
How The Weather Company Uses Apache Spark to Serve Weather Data Fast at Low CostHow The Weather Company Uses Apache Spark to Serve Weather Data Fast at Low Cost
How The Weather Company Uses Apache Spark to Serve Weather Data Fast at Low Cost
 
Datastage ppt
Datastage pptDatastage ppt
Datastage ppt
 
KnowIT, semantic informatics knowledge base
KnowIT, semantic informatics knowledge baseKnowIT, semantic informatics knowledge base
KnowIT, semantic informatics knowledge base
 
Data stage interview questions and answers|DataStage FAQS
Data stage interview questions and answers|DataStage FAQSData stage interview questions and answers|DataStage FAQS
Data stage interview questions and answers|DataStage FAQS
 

Viewers also liked

Setting the Record Straight: Drupal as an Enterprise Web Content Management S...
Setting the Record Straight: Drupal as an Enterprise Web Content Management S...Setting the Record Straight: Drupal as an Enterprise Web Content Management S...
Setting the Record Straight: Drupal as an Enterprise Web Content Management S...
Acquia
 
Data Preparation vs. Inline Data Wrangling in Data Science and Machine Learning
Data Preparation vs. Inline Data Wrangling in Data Science and Machine LearningData Preparation vs. Inline Data Wrangling in Data Science and Machine Learning
Data Preparation vs. Inline Data Wrangling in Data Science and Machine Learning
Kai Wähner
 

Viewers also liked (11)

Talend For Big Data : Secret Key to Hadoop
Talend For Big Data  : Secret Key to HadoopTalend For Big Data  : Secret Key to Hadoop
Talend For Big Data : Secret Key to Hadoop
 
Meet David - ETL / Informatica Consultant
Meet David - ETL / Informatica ConsultantMeet David - ETL / Informatica Consultant
Meet David - ETL / Informatica Consultant
 
Setting the Record Straight: Drupal as an Enterprise Web Content Management S...
Setting the Record Straight: Drupal as an Enterprise Web Content Management S...Setting the Record Straight: Drupal as an Enterprise Web Content Management S...
Setting the Record Straight: Drupal as an Enterprise Web Content Management S...
 
Talend Community Use Group Bristol: Preparing your business for mastering dat...
Talend Community Use Group Bristol: Preparing your business for mastering dat...Talend Community Use Group Bristol: Preparing your business for mastering dat...
Talend Community Use Group Bristol: Preparing your business for mastering dat...
 
Big Linked Data ETL Benchmark on Cloud Commodity Hardware
Big Linked Data ETL Benchmark on Cloud Commodity HardwareBig Linked Data ETL Benchmark on Cloud Commodity Hardware
Big Linked Data ETL Benchmark on Cloud Commodity Hardware
 
Talend winter 2017 overview webinar
Talend winter 2017 overview webinarTalend winter 2017 overview webinar
Talend winter 2017 overview webinar
 
Simplifying Big Data ETL with Talend
Simplifying Big Data ETL with TalendSimplifying Big Data ETL with Talend
Simplifying Big Data ETL with Talend
 
Présentation de Talend Winter 2017
Présentation de Talend Winter 2017 Présentation de Talend Winter 2017
Présentation de Talend Winter 2017
 
Talend Big Data Capabilities Overview
Talend Big Data Capabilities OverviewTalend Big Data Capabilities Overview
Talend Big Data Capabilities Overview
 
Scalable ETL with Talend and Hadoop, Cédric Carbone, Talend.
Scalable ETL with Talend and Hadoop, Cédric Carbone, Talend.Scalable ETL with Talend and Hadoop, Cédric Carbone, Talend.
Scalable ETL with Talend and Hadoop, Cédric Carbone, Talend.
 
Data Preparation vs. Inline Data Wrangling in Data Science and Machine Learning
Data Preparation vs. Inline Data Wrangling in Data Science and Machine LearningData Preparation vs. Inline Data Wrangling in Data Science and Machine Learning
Data Preparation vs. Inline Data Wrangling in Data Science and Machine Learning
 

Similar to Etl with talend (data integeration)

oracle_soultion_oracledataintegrator_goldengate_2021
oracle_soultion_oracledataintegrator_goldengate_2021oracle_soultion_oracledataintegrator_goldengate_2021
oracle_soultion_oracledataintegrator_goldengate_2021
ssuser8ccb5a
 
CERN_DIS_ODI_OGG_final_oracle_golde.pptx
CERN_DIS_ODI_OGG_final_oracle_golde.pptxCERN_DIS_ODI_OGG_final_oracle_golde.pptx
CERN_DIS_ODI_OGG_final_oracle_golde.pptx
camyla81
 
Building the DW - ETL
Building the DW - ETLBuilding the DW - ETL
Building the DW - ETL
ganblues
 
final_proj_Implementation of the ETL system
final_proj_Implementation of the ETL systemfinal_proj_Implementation of the ETL system
final_proj_Implementation of the ETL system
R-uturaj R-aval
 
Basha_ETL_Developer
Basha_ETL_DeveloperBasha_ETL_Developer
Basha_ETL_Developer
basha shaik
 

Similar to Etl with talend (data integeration) (20)

Etl with talend (data integeration)
Etl with talend (data integeration)Etl with talend (data integeration)
Etl with talend (data integeration)
 
ETL Tools Ankita Dubey
ETL Tools Ankita DubeyETL Tools Ankita Dubey
ETL Tools Ankita Dubey
 
oracle_soultion_oracledataintegrator_goldengate_2021
oracle_soultion_oracledataintegrator_goldengate_2021oracle_soultion_oracledataintegrator_goldengate_2021
oracle_soultion_oracledataintegrator_goldengate_2021
 
CERN_DIS_ODI_OGG_final_oracle_golde.pptx
CERN_DIS_ODI_OGG_final_oracle_golde.pptxCERN_DIS_ODI_OGG_final_oracle_golde.pptx
CERN_DIS_ODI_OGG_final_oracle_golde.pptx
 
ETL Technologies.pptx
ETL Technologies.pptxETL Technologies.pptx
ETL Technologies.pptx
 
To Study E T L ( Extract, Transform, Load) Tools Specially S Q L Server I...
To Study  E T L ( Extract, Transform, Load) Tools Specially  S Q L  Server  I...To Study  E T L ( Extract, Transform, Load) Tools Specially  S Q L  Server  I...
To Study E T L ( Extract, Transform, Load) Tools Specially S Q L Server I...
 
ETL Process
ETL ProcessETL Process
ETL Process
 
Building the DW - ETL
Building the DW - ETLBuilding the DW - ETL
Building the DW - ETL
 
ETL VS ELT.pdf
ETL VS ELT.pdfETL VS ELT.pdf
ETL VS ELT.pdf
 
Where to Start ETL Developer Career
Where to Start ETL Developer CareerWhere to Start ETL Developer Career
Where to Start ETL Developer Career
 
ETL (1).ppt
ETL (1).pptETL (1).ppt
ETL (1).ppt
 
Why shift from ETL to ELT?
Why shift from ETL to ELT?Why shift from ETL to ELT?
Why shift from ETL to ELT?
 
Etl overview training
Etl overview trainingEtl overview training
Etl overview training
 
What is ETL and Zero ETL | Extract, Transform, Load
What is ETL and Zero ETL | Extract, Transform, LoadWhat is ETL and Zero ETL | Extract, Transform, Load
What is ETL and Zero ETL | Extract, Transform, Load
 
final_proj_Implementation of the ETL system
final_proj_Implementation of the ETL systemfinal_proj_Implementation of the ETL system
final_proj_Implementation of the ETL system
 
Basha_ETL_Developer
Basha_ETL_DeveloperBasha_ETL_Developer
Basha_ETL_Developer
 
StreamHorizon overview
StreamHorizon overviewStreamHorizon overview
StreamHorizon overview
 
ETL DW-RealTime
ETL DW-RealTimeETL DW-RealTime
ETL DW-RealTime
 
Introduction to Conductor
Introduction to ConductorIntroduction to Conductor
Introduction to Conductor
 
Hadoop etl
Hadoop etlHadoop etl
Hadoop etl
 

Recently uploaded

JustNaik Solution Deck (stage bus sector)
JustNaik Solution Deck (stage bus sector)JustNaik Solution Deck (stage bus sector)
JustNaik Solution Deck (stage bus sector)
Max Lee
 
Mastering Windows 7 A Comprehensive Guide for Power Users .pdf
Mastering Windows 7 A Comprehensive Guide for Power Users .pdfMastering Windows 7 A Comprehensive Guide for Power Users .pdf
Mastering Windows 7 A Comprehensive Guide for Power Users .pdf
mbmh111980
 
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
Alluxio, Inc.
 

Recently uploaded (20)

5 Reasons Driving Warehouse Management Systems Demand
5 Reasons Driving Warehouse Management Systems Demand5 Reasons Driving Warehouse Management Systems Demand
5 Reasons Driving Warehouse Management Systems Demand
 
INGKA DIGITAL: Linked Metadata by Design
INGKA DIGITAL: Linked Metadata by DesignINGKA DIGITAL: Linked Metadata by Design
INGKA DIGITAL: Linked Metadata by Design
 
SQL Injection Introduction and Prevention
SQL Injection Introduction and PreventionSQL Injection Introduction and Prevention
SQL Injection Introduction and Prevention
 
Tree in the Forest - Managing Details in BDD Scenarios (live2test 2024)
Tree in the Forest - Managing Details in BDD Scenarios (live2test 2024)Tree in the Forest - Managing Details in BDD Scenarios (live2test 2024)
Tree in the Forest - Managing Details in BDD Scenarios (live2test 2024)
 
JustNaik Solution Deck (stage bus sector)
JustNaik Solution Deck (stage bus sector)JustNaik Solution Deck (stage bus sector)
JustNaik Solution Deck (stage bus sector)
 
Mastering Windows 7 A Comprehensive Guide for Power Users .pdf
Mastering Windows 7 A Comprehensive Guide for Power Users .pdfMastering Windows 7 A Comprehensive Guide for Power Users .pdf
Mastering Windows 7 A Comprehensive Guide for Power Users .pdf
 
Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...
Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...
Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...
 
Secure Software Ecosystem Teqnation 2024
Secure Software Ecosystem Teqnation 2024Secure Software Ecosystem Teqnation 2024
Secure Software Ecosystem Teqnation 2024
 
Entropy, Software Quality, and Innovation (presented at Princeton Plasma Phys...
Entropy, Software Quality, and Innovation (presented at Princeton Plasma Phys...Entropy, Software Quality, and Innovation (presented at Princeton Plasma Phys...
Entropy, Software Quality, and Innovation (presented at Princeton Plasma Phys...
 
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAGAI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
 
10 Essential Software Testing Tools You Need to Know About.pdf
10 Essential Software Testing Tools You Need to Know About.pdf10 Essential Software Testing Tools You Need to Know About.pdf
10 Essential Software Testing Tools You Need to Know About.pdf
 
A Guideline to Zendesk to Re:amaze Data Migration
A Guideline to Zendesk to Re:amaze Data MigrationA Guideline to Zendesk to Re:amaze Data Migration
A Guideline to Zendesk to Re:amaze Data Migration
 
Optimizing Operations by Aligning Resources with Strategic Objectives Using O...
Optimizing Operations by Aligning Resources with Strategic Objectives Using O...Optimizing Operations by Aligning Resources with Strategic Objectives Using O...
Optimizing Operations by Aligning Resources with Strategic Objectives Using O...
 
What need to be mastered as AI-Powered Java Developers
What need to be mastered as AI-Powered Java DevelopersWhat need to be mastered as AI-Powered Java Developers
What need to be mastered as AI-Powered Java Developers
 
AI/ML Infra Meetup | Perspective on Deep Learning Framework
AI/ML Infra Meetup | Perspective on Deep Learning FrameworkAI/ML Infra Meetup | Perspective on Deep Learning Framework
AI/ML Infra Meetup | Perspective on Deep Learning Framework
 
Odoo vs Shopify: Why Odoo is Best for Ecommerce Website Builder in 2024
Odoo vs Shopify: Why Odoo is Best for Ecommerce Website Builder in 2024Odoo vs Shopify: Why Odoo is Best for Ecommerce Website Builder in 2024
Odoo vs Shopify: Why Odoo is Best for Ecommerce Website Builder in 2024
 
CompTIA Security+ (Study Notes) for cs.pdf
CompTIA Security+ (Study Notes) for cs.pdfCompTIA Security+ (Study Notes) for cs.pdf
CompTIA Security+ (Study Notes) for cs.pdf
 
architecting-ai-in-the-enterprise-apis-and-applications.pdf
architecting-ai-in-the-enterprise-apis-and-applications.pdfarchitecting-ai-in-the-enterprise-apis-and-applications.pdf
architecting-ai-in-the-enterprise-apis-and-applications.pdf
 
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
 
How to install and activate eGrabber JobGrabber
How to install and activate eGrabber JobGrabberHow to install and activate eGrabber JobGrabber
How to install and activate eGrabber JobGrabber
 

Etl with talend (data integeration)

  • 2. What is ETL? Extract is the process of reading data from a database Transform is the process of converting the extracted data from its previous form into the form it needs to be in so that it can be placed into another database. Transformation occurs by using rules or lookup tables or by combining the data with other data Load is the process of writing the data into the target database
  • 4. Terms closely related and managed by ETL processes  data migration data management  data cleansing data synchronization  data consolidation. .
  • 5. Different ETL tools •Oracle ETL •Ab Initio •Pentaho Data Integration -Kettle Project (open source ETL) •SAS ETL studio •Cognos Decisionstream •Business Objects Data Integrator (BODI) •Microsoft SQL Server Integration Services (SSIS) •Informatica PowerCenter •Talend
  • 6. Prerequisites Talend Open Studio for Data Integration ◦ http://www.talend.com/download VirtualBox ◦ https://www.virtualbox.org/wiki/Downloads Hortonworks Sandbox VM ◦ http://hortonworks.com/products/hortonworks- sandbox/#install
  • 7. How to set up – Step 1
  • 12. Talend Interface Workspace Repository tree Component configuration Palette WorkspaceRepository tree Palette Repository tree Workspace Palette Component configuration
  • 13. What are supported data input formats? •SQL •MySQL •PostgreSQL •Sybase •Teradata •MSSQL •Netezza •Greenplum •Access •DB2 •Hive
  • 14. What kinds of datasets can be loaded? Talend Studio offers nearly comprehensive connectivity to: Packaged applications (ERP, CRM, etc.), databases, mainframes, files, Web Services, and so on to address the growing disparity of sources. Data warehouses, data marts, OLAP applications - for analysis, reporting, dashboarding, scorecarding, and so on. Built-in advanced components for ETL, including string manipulations, Slowly Changing Dimensions, automatic lookup handling, bulk loads support, and so on.
  • 15. What are data export formats?
  • 16. Talend - ELT DB Components – Step 1
  • 22. Challenges involved Data volumes are growing exponentially Data velocity is moving faster As information systems grow in complexity, the disparity of sources is growing as well All these target structures have different data transformation requirements and different tolerances in terms of latency Transformations involved in ETL processes can be highly complex