3. ETL Overview
Extraction Transformation Loading – ETL
ETL gets data out of the source systems and loads it into the
data warehouse.
Data is extracted from an OLTP database,
transformed to match the data warehouse
schema, and loaded into the data warehouse
database.
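A minimal end-to-end sketch of the three steps, using Python's built-in sqlite3 module. The table names (orders, fact_orders), the column names, and the cents-to-currency rule are assumptions made up for illustration; they do not come from the slides.

```python
import sqlite3


def run_etl(source, warehouse):
    """Copy orders from a hypothetical OLTP table into a warehouse table."""
    # Extract: read the raw rows from the source system.
    rows = source.execute(
        "SELECT order_id, customer, amount_cents FROM orders"
    ).fetchall()

    # Transform: reshape each row to match the (assumed) warehouse schema:
    # trimmed, upper-cased customer name and amount in currency units.
    transformed = [
        (order_id, customer.strip().upper(), amount_cents / 100.0)
        for order_id, customer, amount_cents in rows
    ]

    # Load: write the transformed rows into the warehouse table.
    warehouse.executemany(
        "INSERT INTO fact_orders (order_id, customer_name, amount) "
        "VALUES (?, ?, ?)",
        transformed,
    )
    warehouse.commit()
    return len(transformed)


def demo():
    """Build two in-memory databases and run the ETL once."""
    source = sqlite3.connect(":memory:")
    source.execute(
        "CREATE TABLE orders (order_id INTEGER, customer TEXT, amount_cents INTEGER)"
    )
    source.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, " alice ", 1250), (2, "Bob", 300)],
    )
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute(
        "CREATE TABLE fact_orders (order_id INTEGER, customer_name TEXT, amount REAL)"
    )
    run_etl(source, warehouse)
    return warehouse.execute(
        "SELECT order_id, customer_name, amount FROM fact_orders ORDER BY order_id"
    ).fetchall()
```

Calling demo() returns the rows as they appear in the warehouse table after one run. A real ETL job would add the cleaning step and error handling, both omitted here.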
5. Why???
As data sources change, the data warehouse must be
periodically updated.
Also, as the business changes, the DW system needs
to change in order to maintain its value as a tool
for decision makers. As a result, the ETL
also changes and evolves. The ETL processes
must be designed for ease of modification. A
solid, well-designed, and documented ETL
system is necessary for the success of a data
warehouse project.
An ETL system consists of three consecutive
functional
steps: extraction, transformation, and loading.
7. Extract Process
The Extract step covers the data extraction from
the source system and makes it accessible for
further processing. The main objective of the
extract step is to retrieve all the required data
from the source system with as few resources as
possible.
There are several ways to perform the extract:
1. Update notification
2. Incremental extract
3. Full extract
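The difference between a full and an incremental extract can be sketched as follows. The slides do not define the methods, so this assumes the common reading: a full extract returns everything, while an incremental extract returns only rows changed since the last run. The updated_at field and the watermark value are hypothetical. Update notification (the source system reporting its own changes) is not shown.

```python
def full_extract(rows):
    """Full extract: return every row, regardless of when it changed."""
    return list(rows)


def incremental_extract(rows, last_extracted_at):
    """Incremental extract: return only rows changed after the watermark.

    `updated_at` is an assumed ISO-8601 date string; strings in that
    format compare correctly with plain string comparison.
    """
    changed = [r for r in rows if r["updated_at"] > last_extracted_at]
    # The new watermark is the latest change seen, or the old watermark
    # if nothing changed since the previous run.
    new_watermark = max(
        (r["updated_at"] for r in changed), default=last_extracted_at
    )
    return changed, new_watermark
```

The caller stores the returned watermark and passes it to the next run, so each run reads only what changed since the previous one.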
8. Clean
The cleaning step is one of
the most important as it
ensures the quality of the data
in the data warehouse.
Cleaning should apply basic
data unification rules, such as:
1. Make identifiers unique
2. Convert null values into a
standardized form
3. Convert phone numbers and
ZIP codes to a standardized
form
4. Validate address fields and
convert them to proper
naming, e.g.
Street/St/St./Str./Str
5. Validate address fields
against each other.
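A few of the rules above can be sketched in Python. The set of null markers, the digits-only phone format, and the choice of "Street" as the canonical spelling are assumptions for illustration. The street rule is deliberately naive: it would also rewrite the "St." in a name such as "St. John".

```python
import re

# Assumed spellings that should all be treated as "no value".
NULL_MARKERS = {"", "n/a", "na", "null", "none", "-"}

# Matches Street, Str, St (any case), with an optional trailing period.
STREET_VARIANTS = re.compile(r"\b(?:street|str|st)\b\.?", re.IGNORECASE)


def clean_null(value):
    """Map the various null spellings to None; otherwise trim whitespace."""
    if value is None or value.strip().lower() in NULL_MARKERS:
        return None
    return value.strip()


def clean_phone(value):
    """Keep only the digits of a phone number."""
    return re.sub(r"\D", "", value)


def clean_street(value):
    """Replace Street/St/St./Str./Str with the single form 'Street'."""
    return STREET_VARIANTS.sub("Street", value)
```

For example, clean_street("12 Main St.") and clean_street("12 Main Str") both yield "12 Main Street", and clean_null(" N/A ") yields None.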
9. Transformation
The transformation step applies a set of rules
to transform the data
from the source to the
target.
This includes
converting any
measured data to the
same dimension using
the same units so that
they can later be
joined.
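As a small illustration of converting measured data to the same units, the sketch below normalizes weights to kilograms. The choice of kilograms as the target unit and the list of supported source units are assumptions; the slides do not name any specific units.

```python
# Conversion factors to the assumed target unit (kilograms).
TO_KILOGRAMS = {"kg": 1.0, "g": 0.001, "lb": 0.45359237}


def to_kilograms(value, unit):
    """Convert a weight in `unit` to kilograms."""
    try:
        factor = TO_KILOGRAMS[unit.lower()]
    except KeyError:
        # Fail loudly instead of loading data in an unknown unit.
        raise ValueError(f"unknown weight unit: {unit!r}")
    return value * factor
```

Once every source reports weight in kilograms, rows from different sources can be compared and joined on that measure without further conversion.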
10. Problems???
Classes of conflicts
and problems can
be distinguished at
two levels: the
schema level and the
instance level.
Instance-level problems
are either record-level
or value-level:
1. Schema-level
problems.
2. Record-level
problems.
3. Value-level
problems.
11. Solution…
To deal with such
issues, the integration
and transformation
tasks involve a wide
variety of functions,
such as normalizing,
de-normalizing,
reformatting,
recalculating,
summarizing, merging
data from multiple
sources, modifying key
structures, adding an
element of time,
identifying default
values, supplying
decision commands to
choose between
12. Loading
Loading data to the
target
multidimensional
structure is the final
ETL step. In this step,
extracted and
transformed data is
written into the
dimensional
structures actually
accessed by the end
users and application
systems. The loading step
includes both loading
dimension tables and