2. Introduction
ETL (EXTRACT, TRANSFORM, LOAD)
• The ETL process collects raw data from various data sources (your CRM, ad accounts, ERP,
email servers, …) and saves it to a staging area.
• Before data is loaded into the target data warehouse or database of your choice, it
undergoes extensive transformations.
• Depending on your business logic, you might mask sensitive personal information, remove
outliers, or aggregate metrics to make your analysts’ lives easier, before finally loading the data into
the target data store.
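The three ETL steps above can be sketched in a few lines of Python. This is a minimal illustration, not the workflow of any particular ETL tool: it assumes hypothetical CRM records held in memory, an illustrative outlier threshold and masking rule, and an in-memory SQLite database standing in for the target warehouse. The key point is that masking and outlier removal happen before anything reaches the target table.

```python
import sqlite3


def extract():
    """Extract: return raw records (a hypothetical CRM export held in memory)."""
    return [
        {"email": "ana@example.com", "region": "north", "order_value": 120.0},
        {"email": "ben@example.com", "region": "north", "order_value": 80.0},
        {"email": "cy@example.com", "region": "south", "order_value": 95.0},
        {"email": "dee@example.com", "region": "south", "order_value": 99999.0},  # outlier
    ]


def mask_email(email):
    """Keep the first character of the local part and the domain; hide the rest."""
    local, _, domain = email.partition("@")
    return local[:1] + "***@" + domain


def transform(rows, max_value=10_000.0):
    """Transform (in staging, before loading): drop outliers, then mask PII."""
    return [
        (mask_email(r["email"]), r["region"], r["order_value"])
        for r in rows
        if r["order_value"] <= max_value  # illustrative outlier threshold
    ]


def load(conn, rows):
    """Load: only already-transformed rows reach the target table."""
    conn.execute("CREATE TABLE orders (email TEXT, region TEXT, order_value REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
print(conn.execute("SELECT email, region, order_value FROM orders ORDER BY rowid").fetchall())
# → [('a***@example.com', 'north', 120.0), ('b***@example.com', 'north', 80.0), ('c***@example.com', 'south', 95.0)]
```

The raw, unmasked emails and the outlier row never exist in the target system, which is what distinguishes this flow from ELT.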
ELT (EXTRACT, LOAD, TRANSFORM)
• A variant of ETL wherein the extracted data is first loaded into the target system.
• Transformations are performed after the data is loaded into the data warehouse.
• ELT typically works well when the target system is powerful enough to handle transformations.
• Analytical databases such as Amazon Redshift and Google BigQuery are often used in ELT pipelines
because they are highly efficient at performing transformations.
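For contrast, here is a minimal ELT sketch. It assumes an in-memory SQLite database as a stand-in for a warehouse such as Redshift or BigQuery; the table names, columns and outlier threshold are hypothetical. The raw rows are loaded untouched, and the transformation (outlier filtering plus aggregation) runs afterwards as SQL inside the target system.

```python
import sqlite3

raw_rows = [  # hypothetical raw extract, loaded as-is
    ("ana@example.com", "north", 120.0),
    ("ben@example.com", "north", 80.0),
    ("cy@example.com", "south", 95.0),
    ("dee@example.com", "south", 99999.0),
]

conn = sqlite3.connect(":memory:")

# Load: the full raw data set goes straight into the target system.
conn.execute("CREATE TABLE raw_orders (email TEXT, region TEXT, order_value REAL)")
conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", raw_rows)

# Transform: runs later, inside the target, using the target's own SQL engine.
conn.execute(
    """
    CREATE TABLE region_sales AS
    SELECT region, SUM(order_value) AS total
    FROM raw_orders
    WHERE order_value <= 10000  -- illustrative outlier rule
    GROUP BY region
    """
)
conn.commit()

print(conn.execute("SELECT region, total FROM region_sales ORDER BY region").fetchall())
# → [('north', 200.0), ('south', 95.0)]
```

Because `raw_orders` still holds every original row, analysts can later define other transformations over it without re-extracting. It also means the unmasked emails now live in the target system.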
ETL VS ELT
Differences
1) Support for Data Warehouse
• ETL: Yes, ETL is the traditional process for transforming and integrating structured or relational data into a cloud-based or on-premises data warehouse.
• ELT: Yes, ELT is the modern process for transforming and integrating structured or unstructured data into a cloud-based data warehouse.
2) Support for Data Lake/Mart/Lakehouse
• ETL: No, ETL is not an appropriate process for data lakes, data marts or data lakehouses.
• ELT: Yes, the ELT process is tailored to provide a data pipeline for data lakes, data marts or data lakehouses.
3) Size/type of data set
• ETL: ETL is most appropriate for processing smaller, relational data sets which require complex transformations and have been predetermined as being relevant to the analysis goals.
• ELT: ELT can handle any size or type of data and is well suited for processing both structured and unstructured big data. Since the entire data set is loaded, analysts can choose at any time which data to transform and use for analysis.
4) Implementation
• ETL: The ETL process has been around for decades, and there is a mature ecosystem of ETL tools and experts readily available to help with implementation.
• ELT: The ELT process is a newer approach, and the ecosystem of tools and experts needed to implement it is still growing.
5) Transformation
• ETL: In the ETL process, data transformation is performed in a staging area outside of the data warehouse, and the entire data set must be transformed before loading. As a result, transforming larger data sets can take a long time up front, but analysis can take place immediately once the ETL process is complete.
• ELT: In the ELT process, data transformation is performed on an as-needed basis in the target system itself. As a result, the transformation step takes little time but can slow down querying and analysis if there is not sufficient processing power.
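The as-needed transformation described for ELT can be illustrated with a SQL view over raw data. This is a sketch only, again using SQLite as a hypothetical stand-in for the target system, with made-up table and column names. Creating the view is nearly instant because no rows are processed; the aggregation is recomputed every time the view is queried, which is where insufficient processing power would show up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_events (user_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO raw_events VALUES (?, ?)",
    [(1, 10.0), (1, 5.0), (2, 7.5)],  # hypothetical raw data, loaded untransformed
)

# The "transformation" is only a stored query definition: no rows are computed yet.
conn.execute(
    "CREATE VIEW user_totals AS "
    "SELECT user_id, SUM(amount) AS total FROM raw_events GROUP BY user_id"
)

# Raw data arriving after the view was defined is still picked up,
# because the aggregation runs at query time.
conn.execute("INSERT INTO raw_events VALUES (2, 2.5)")

print(conn.execute("SELECT user_id, total FROM user_totals ORDER BY user_id").fetchall())
# → [(1, 15.0), (2, 10.0)]
```

In an ETL flow, the equivalent totals would be computed once in staging and stored, so querying them is cheap, but they must be recomputed when new data arrives.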
6) Loading
• ETL: The ETL loading step requires data to be loaded into a staging area before being loaded into the target system. This multi-step process takes longer than the ELT process.
• ELT: In ELT, the full data set is loaded directly into the target system. Since there is only one step, and it happens only once, loading in the ELT process is faster than in ETL.
7) Cost
• ETL: ETL can be cost-prohibitive for many small and medium businesses.
• ELT: ELT benefits from a robust ecosystem of cloud-based platforms that offer much lower costs and a variety of plan options to store and process data.
8) Compliance
• ETL: ETL is better suited for compliance with GDPR, HIPAA, and CCPA regulations, given that users can omit any sensitive data before loading it into the target system.
• ELT: ELT carries more risk of exposing private data and not complying with GDPR, HIPAA, and CCPA regulations, given that all data is loaded into the target system.
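The compliance argument for ETL rests on removing sensitive fields before anything reaches the target. A minimal sketch, assuming a hypothetical allowlist of non-sensitive columns and an in-memory SQLite table as the target; which fields actually count as sensitive under GDPR, HIPAA, or CCPA is a legal question this code does not answer.

```python
import sqlite3

# Hypothetical allowlist: only these columns may leave the staging area.
ALLOWED_COLUMNS = ("customer_id", "signup_date", "country")


def strip_sensitive(record):
    """Return a tuple containing only allowlisted fields, in a fixed order."""
    return tuple(record[col] for col in ALLOWED_COLUMNS)


staged = [  # hypothetical staged records that still contain sensitive fields
    {"customer_id": 1, "signup_date": "2024-01-05", "country": "DE",
     "email": "ana@example.com", "phone": "+49 30 000000"},
    {"customer_id": 2, "signup_date": "2024-01-06", "country": "FR",
     "email": "ben@example.com", "phone": "+33 1 00000000"},
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, signup_date TEXT, country TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", [strip_sensitive(r) for r in staged])
conn.commit()

# The target table has no column that could hold the omitted fields.
print([row[1] for row in conn.execute("PRAGMA table_info(customers)")])
# → ['customer_id', 'signup_date', 'country']
```

An allowlist is used rather than a denylist so that a new sensitive field added upstream is dropped by default instead of being loaded.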
Use Case
TRANSFORM TECHNOLOGIES
• ETL: Scripting languages, SQL procedures
• ELT: Data-warehouse-specific solutions
PHYSICAL SPACE REQUIRED TO STORE DATA
• ETL: Lower
• ELT: Higher
MATURITY
• ETL: Tested and proven
• ELT: Novel and (sometimes) experimental
ENGINEERING EXPERTISE REQUIRED
• ETL: Medium
• ELT: High
DATA TYPE
• ETL: All, but best for structured (relational) data
• ELT: All, but excels at unstructured data
PROS
• ETL: Simpler to deploy and maintain. A lot of (human and technical) resources available.
• ELT: Can handle massive amounts of data. Best for unstructured data.
CONS
• ETL: Scaling becomes increasingly complex for large data deployments.
• ELT: Needs a higher level of expertise to deploy and maintain. Edge cases are not always polished for reliability.
8. AWS GLUE
AWS Glue is a serverless data integration service that makes it easy for analytics users to discover, prepare, move,
and integrate data from multiple sources. You can use it for analytics, machine learning, and application
development. It also includes additional productivity and data ops tooling for authoring, running jobs, and
implementing business workflows.
With AWS Glue, you can discover and connect to more than 70 diverse data sources and manage your data in a
centralized data catalog. You can visually create, run, and monitor extract, transform, and load (ETL) pipelines to
load data into your data lakes. Also, you can immediately search and query cataloged data using Amazon Athena,
Amazon EMR, and Amazon Redshift Spectrum.
AWS GLUE COMPONENTS:
AWS Glue console
AWS Glue Data Catalog
AWS Glue crawlers and classifiers
AWS Glue ETL operations
Streaming ETL in AWS Glue
The AWS Glue jobs system
10. AZURE DATA FACTORY
Azure Data Factory (ADF) is a cloud-based data integration service in Microsoft Azure.
It orchestrates and automates the movement and transformation of data.
Because data comes from a number of different products, we need a powerful tool to store and
analyze all of it, and Azure Data Factory helps us do this.
How does ADF help us?
Storing the data with the help of Azure Data Lake Storage
Analyzing the data
Transforming the data with the help of pipelines
Publishing the organized data
Visualizing the data with third-party applications, and processing it with frameworks such as Apache Spark and Hadoop
12. BUSINESS OBJECTS DATA SERVICES (BODS)
SAP BODS is an ETL tool for extracting data from disparate systems, transforming it into
meaningful information, and loading it into a data warehouse. It is designed to deliver enterprise-class
solutions for data integration, data quality, data processing and data profiling. SAP BODS stands for
SAP BusinessObjects Data Services.
• Repository, Management Console, Designer, Job Server and Access Server are important
components of the SAP BODS architecture.
• SAP BusinessObjects offers better profiling, a capability strengthened by its many acquisitions of other companies.
13. Conclusion
We need to examine the business and technical problems, decide on our reference
data model architecture, and then build a roadmap toward it.
• ETL is best suited for fast analytics in small-to-medium data environments,
where the source data and data operations are well controlled and do not
evolve constantly (that is, they do not need flexibility).
• ELT, in contrast, is best suited for working with semi-structured or
unstructured data in big data environments, where changing data
operation requirements call for a lot of flexibility.
Editor's notes
We are building a base of integrated expertise through data transfer. A key use case: using the information-sharing warehouse, we can easily reuse data in other systems, thereby enabling collaboration with various ecosystems and empowering customer-centric thinking within the Southern Water IT landscape. To do this, a digital-first approach is key.
Essentially, we are trying to establish an ecosystem with a foundation for sharing data.