2. B2/6/2 Vashi ,Navi Mumbai, Contact:09892900103/9892900173
datastagetraining.vibranttechnologies.co.in
enquiry@vibrantgroup.co.in
Introduction
DataStage is a tool used to design, develop and execute applications that
populate tables in a data warehouse or data marts.
It is a program for Windows servers that extracts data from databases and
loads it into data warehouses. It has become an essential part of IBM
What is the difference between "Compiled" and "Validated OK" in
DataStage?
- Validating a job means running the job in "Check only" mode.
- The following checks are performed:
- Connections are established to the data sources or data warehouse
- The SELECT statements are prepared
- Intermediate files are opened in Hashed File, UniVerse or ODBC stages
DataStage offers numerous data types, easy metadata management and a
dedicated OSH (Orchestrate Shell) to run high-volume jobs in a reduced
timeframe. Because of its underlying parallelism, ETL transformations that
consume more hardware resources and considerable time in other environments
show a remarkable improvement in both resources and time when implemented
in DataStage. The tool integrates fully with platforms such as Linux, UNIX
and Hadoop, and with well-proven scripting languages such as shell and Perl.
It also provides a separate interface for web-based Java and can even chain
web services and XML.
Look at the course content
DataStage Architecture
DataStage Clients
Designer
Director
Administrator
DataStage Workflow
Types of DataStage Job
Parallel Jobs
Server Jobs
Job Sequences
Setting up DataStage Environment
DataStage Administrator Properties
Defining Environment Variables
Importing Table Definitions
Creating Parallel Jobs
Design a simple Parallel job in Designer
Compile your job
Run your job in Director
View the job log
Command Line Interface (dsjob)
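As a rough illustration of the command line interface topic, the sketch below assembles a `dsjob -run` command in Python. The project name, job name and parameter are hypothetical, and the options shown (`-mode`, `-jobstatus`, `-param`) should be checked against the dsjob documentation for your installed version before use. The helper only builds the argument list; it does not run anything.

```python
import shlex

# Run modes assumed for this sketch; VALIDATE corresponds to a "check only" run.
VALID_MODES = {"NORMAL", "RESET", "VALIDATE"}


def build_dsjob_run(project, job, params=None, mode="NORMAL", wait=True):
    """Return the argument list for a `dsjob -run` invocation (hypothetical helper)."""
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported mode: {mode}")
    argv = ["dsjob", "-run", "-mode", mode]
    if wait:
        # -jobstatus makes dsjob wait for the job and report its status on exit.
        argv.append("-jobstatus")
    # Sort parameters so the generated command is reproducible.
    for name, value in sorted((params or {}).items()):
        argv += ["-param", f"{name}={value}"]
    argv += [project, job]
    return argv


# Hypothetical project, job and parameter names.
cmd = build_dsjob_run("dev_project", "load_customers", {"RUN_DATE": "2024-01-31"})
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run` on a machine where dsjob is installed and on the PATH.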
Accessing Sequential Data
Sequential File stage
Data Set stage
Complex Flat File stage
Create jobs that read from and write to sequential files
Read from multiple files using file patterns
Use multiple readers
Null handling in Sequential File Stage
Platform Architecture
Describe parallel processing architecture
Describe pipeline & partition parallelism
List and describe partitioning and collecting algorithms
Describe configuration files
Explain OSH & Score
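To make the partitioning topic concrete, here is a minimal Python sketch of two common partitioning ideas: round robin and hash by key. It is a conceptual model only, not DataStage's own implementation; the function names and the use of CRC32 as the hash are assumptions chosen for illustration.

```python
import zlib


def round_robin_partition(rows, n):
    """Deal rows out to n partitions in turn, giving near-equal sizes."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def hash_partition(rows, key, n):
    """Send rows with the same key value to the same partition."""
    parts = [[] for _ in range(n)]
    for row in rows:
        # CRC32 is used here only because it is deterministic across runs.
        h = zlib.crc32(str(row[key]).encode("utf-8"))
        parts[h % n].append(row)
    return parts


rows = [{"cust": c, "amt": a} for c, a in [(1, 10), (2, 20), (1, 5), (3, 7), (2, 1), (1, 9)]]
print([len(p) for p in round_robin_partition(rows, 3)])
print([len(p) for p in hash_partition(rows, "cust", 3)])
```

Round robin balances row counts but scatters equal keys, while hash partitioning keeps equal keys together, which key-based operations such as joins and aggregations rely on.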
Combining Data
Combine data using the Lookup stage
Combine data using the Merge stage
Combine data using the Join stage
Combine data using the Funnel stage
Sorting and Aggregating Data
Sort data using in-stage sorts and Sort stage
Aggregate data using the Aggregator stage
Remove Duplicates stage
Transforming Data
Understand the ways DataStage allows you to transform data
Create column derivations using user-defined code and system functions
Filter records based on business criteria
Control data flow based on data conditions
Repository Functions
Perform a simple Find
Perform an Advanced Find
Perform an impact analysis
Compare the differences between two Table Definitions and Jobs
Some interview questions
What steps should be taken to improve the performance of DataStage jobs?
To improve the performance of DataStage jobs, first establish baselines.
Second, do not rely on a single flow for performance testing. Third, work in
increments. Next, evaluate data skew, then isolate and solve problems one at
a time. After that, distribute the file systems to remove any bottlenecks.
Do not include the RDBMS in the initial testing phase. Finally, understand
and assess the available tuning knobs.
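One simple way to evaluate data skew is to compare the busiest partition with the average partition. The Python sketch below assumes you already have per-partition row counts (for example, read from a job monitor); the function name and the ratio used are illustrative choices, not a DataStage feature.

```python
def partition_skew(row_counts):
    """Return max/mean of per-partition row counts; 1.0 means perfectly even."""
    if not row_counts:
        raise ValueError("row_counts must not be empty")
    mean = sum(row_counts) / len(row_counts)
    if mean == 0:
        return 0.0  # no rows at all, so nothing is skewed
    return max(row_counts) / mean


# Hypothetical row counts for a four-node run.
print(partition_skew([250_000, 250_000, 250_000, 250_000]))
print(partition_skew([700_000, 100_000, 100_000, 100_000]))
```

A ratio well above 1.0 suggests that one partition is doing most of the work, which usually points to a poorly distributed partitioning key.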
Differentiate between Join, Merge and Lookup stage?
The three stages differ in how they use memory, in their input
requirements and in how they treat records. Join and Merge need less
memory than the Lookup stage, because Lookup holds its reference data
in memory.
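The memory difference can be sketched in Python. The first function mimics a lookup by loading the whole reference input into a dictionary; the second mimics a join over inputs that are already sorted on the key, holding only one reference row at a time. Both are simplified inner joins written for illustration (the merge version assumes unique keys on the reference side) and are not how the stages are implemented internally.

```python
def lookup_join(stream, reference, key):
    """Lookup style: the whole reference input is held in memory."""
    table = {ref[key]: ref for ref in reference}
    for row in stream:
        match = table.get(row[key])
        if match is not None:
            yield {**row, **match}


def sorted_merge_join(left, right, key):
    """Join style: both inputs sorted on key; only one right-hand row is held."""
    right_iter = iter(right)
    current = next(right_iter, None)
    for row in left:
        # Advance the reference side until it catches up with the current key.
        while current is not None and current[key] < row[key]:
            current = next(right_iter, None)
        if current is not None and current[key] == row[key]:
            yield {**row, **current}


# Hypothetical sample data, already sorted on "cust".
orders = [{"cust": 1, "amt": 10}, {"cust": 2, "amt": 20},
          {"cust": 2, "amt": 5}, {"cust": 4, "amt": 7}]
customers = [{"cust": 1, "name": "A"}, {"cust": 2, "name": "B"}, {"cust": 3, "name": "C"}]

print(list(lookup_join(orders, customers, "cust")))
print(list(sorted_merge_join(orders, customers, "cust")))
```

Both calls produce the same matched rows; the trade-off is that the lookup version needs the reference data to fit in memory, while the sorted version needs its inputs sorted first.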
Explain QualityStage?
QualityStage is also known as Integrity stage. It
assists in integrating different types of data
from various sources.
Define Job control?
Job control is best performed using Job Control Language (JCL). It is used
to execute multiple jobs simultaneously, without using any kind of loop.
Differentiate between Symmetric Multiprocessing and Massively Parallel
Processing?
In Symmetric Multiprocessing, the hardware resources are shared by the
processors. The processors run under one operating system and communicate
through shared memory. In Massively Parallel Processing, each processor has
exclusive access to its own hardware resources.
What are the steps required to kill a job in DataStage?
To kill a job in DataStage, we have to kill its process ID.
Differentiate between Validated and Compiled in
DataStage?
In DataStage, validating a job means executing it in check-only mode.
While validating, the DataStage engine verifies whether all the required
properties are provided. While compiling a job, the DataStage engine
verifies whether all the given properties are valid.
Define APT_CONFIG in DataStage?
APT_CONFIG (in full, APT_CONFIG_FILE) is the environment variable that
identifies the *.apt configuration file in DataStage. That file stores the
node information, disk storage information and scratch disk information.
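As an illustration of what such a configuration file holds, the Python sketch below renders a small two-node file in the usual node / fastname / resource layout. The host name and directory paths are hypothetical, and the exact syntax should be confirmed against the parallel engine documentation for your version.

```python
def render_apt_config(nodes):
    """Render a parallel configuration file as text (illustrative helper)."""
    lines = ["{"]
    for n in nodes:
        lines.append(f'  node "{n["name"]}"')
        lines.append("  {")
        lines.append(f'    fastname "{n["fastname"]}"')
        lines.append('    pools ""')
        # Disk resource: where persistent data sets are stored.
        lines.append(f'    resource disk "{n["disk"]}" {{pools ""}}')
        # Scratch disk resource: temporary space, e.g. for sorts.
        lines.append(f'    resource scratchdisk "{n["scratch"]}" {{pools ""}}')
        lines.append("  }")
    lines.append("}")
    return "\n".join(lines) + "\n"


# Hypothetical host and paths for a two-node layout on one server.
nodes = [
    {"name": "node1", "fastname": "etl-host", "disk": "/data/ds1", "scratch": "/scratch/ds1"},
    {"name": "node2", "fastname": "etl-host", "disk": "/data/ds2", "scratch": "/scratch/ds2"},
]
text = render_apt_config(nodes)
print(text)
```

The rendered text would be saved to a file, and the environment variable pointed at that file's path.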
Define Repository tables in Datastage?
In Datastage, the Repository is another name for a data warehouse. It
can be centralized as well as distributed.
Where to Get More Information
Vibrant Group:
www.vibrantgroup.co.in
Vibrant Technologies & Computers
www.vibranttechnologies.co.in/technologies.vibrantgroup.co.in
Vibrant HR Team
www.hr.vibrangroup.co.in