1. Top Answers to ETL Interview Questions
1.What is ETL Process?
ETL is the process of Extraction, Transforming and Loading.
2.How many steps are there in ETL process?
In ETL process data is extracted from source such as the database servers and it is used to generate business roll.
3.What are the steps involved in ETL process?
The steps involved are defining the source; target, creating the mapping, creating the session, and creating the workflow.
4.Can there be sub steps of ETL steps?
Each of the steps involved in ETL has several sub steps. The transform step has most sub steps.
5.What is initial load and what is full load?
In ETL the initial load is the process for populating all data warehousing tables for very first time. Full load means when the
data is loaded for the first time all set records are loaded at a stretch depending on its volume. It would erase all contents in
the table and would reload fresh data.
6.What is meant by incremental load?
Incremental load refers to applying the dynamic changes as and when required in a specific period and predefined schedules.
7.What are three tier systems in ETL?
The data warehouse is considered to be the three tier system in ETL.
8.What are the three tiers in ETL?
Middle layer in ETL provides the data that is usable in a secure way to end users. Other two layers are on the other side of the
middle tier, the end user and back end data storage.
9.What are the names of the layers in ETL?
The first layer in ETL is the source layer and it is the layer where data lands. Second layer is integration layer where data is
stored after transformation. Third layer is the dimension layer where actual presentation layer stands.
10.What is meant by snapshots?
Snapshots are the copies of read only data that is stored in the master table.
11.What are the characteristics of Snapshots?
Snapshots are located on remote node and refreshed periodically so that the changes in master table can be recorded. They are
also replica of tables.
12.What are views?
Views are built using the attributes of one or more tables. View with single tables can be updated but those with multiple
tables cannot be updated.
13.What is meant by materialized view log?
Materialized view log is the pre-computed table with aggregated or joined data from the fact tables as well as the dimension
tables.
14.What is a materialized view?
Materialized view is an aggregate table.
15.What is the difference between power center and power mart?
Task accomplished by Power Center is processing large volumes of data. Power Mart processes low volumes of data.
16.With which apps can Power Center be connected?
Power Center can be connected with ERP source like the SAP, Oracle Apps, and the People Soft etc.
17.Which partition is used to improve the performances of ETL transactions?
To improve the performances of ETL transactions the session partition is used.
18.Does Power Mart provide connections to ERP sources?
No! Power Mart does not provide connection to any of the ERP sources. It also does not allow sessions partition.
19.What is meant by partitioning in ETL?
Partitioning in ETL refers to sub division of the transactions in order to improve their performances.
20.What is the benefit of increasing number of partitions in ETL?
Increase in the number of partitions enables the informatics Server to create multiple connections to a host of sources.
2. 21.What are the types of partitions in ETL?
Types of partitions in ETL are Round-Robin partition and Hash partition.
22.What is Round Robin partitioning?
In Round Robin partitioning the data is evenly distributed by the informatica among all the partitions. It is used when the
number of rows in process in each of the partitions is nearly the same.
23.What is Hash partitioning?
In Hash partitioning the informatica server would apply a hash function in order to partition keys to group data among the
partitions. It is used to ensure the processing of group of rows with the same partitioning key in same partition.
24.What is mapping in ETL?
Mapping refers to flow of data from source to the destination.
25.What is session in ETL?
Session is a set of instructions that describes the data movement from the source to the destination.
26.What is meant by Worklet in ETL?
Worklet is the set of tasks in ETL. It can be any set of tasks in the program.
27.What is workflow in ETL?
Workflow is a set of instruction that specifies the way of executing the tasks to the informatica.
28.What is referred by Mapplet In ETL?
Mapplet in ETL is used for the purpose of creation as well as configuration of a group of transformations.
29.What is meant by operational data store?
Operational data store is the repository that exists between the staging area and the data warehouse. Data stored in ODS has
low granularity.
30.How operational data store works?
Aggregated data is loaded into the EDW after it is populated in operational data store or ODS. Basically ODS is also semi
DWH helping analysis of business data. Data persistence period in ODS is usually in the range of 30-45 days and not more.
31.What the ODS in ETL generates?
ODS in ETL generates primary keys, takes care of the error and also rejects just like the DWH.
32.When are the tables in ETL analyzed?
Use of ANALYZE statement allows validation and computing of statistics for either the index table or the cluster.
33.How are the tables analyzed in ETL?
Statistics generated by the ANALYZE statement use is reused by cost based optimizer in order to calculate the most efficient
plan for data retrieval. ANALYZE statement can support validation of structures of objects as well as space management in
the system. Operations include COMPUTER, ESTIMATE, and DELETE.
34.How can the mapping be fine tuned in ETL?
Steps for fine tuning the mapping involves using condition for filter in source qualifying the data without use of filter;
utilizing persistence as well as cache store in look up t/r; using the aggregations t/r in sorted i/p group by different ports, using
operators in expressions instead of functions, and increase the cache size and commit interval.
35.What are differences between connected and unconnected look up in ETL?
Connected look up is used for mapping and returns multiple values. It can be connected to another transformation and also
returns a value. Unconnected look up is used when look up is not available in main flow and it returns only single output. It
also cannot be connected to other transformation but are reusable.