Informatica provides the market's leading data integration platform. Tested on nearly 500,000 combinations of platforms and applications, the platform interoperates with the broadest possible range of disparate standards, systems, and applications. This unbiased and universal view sets Informatica apart in today's data integration market. It also makes Informatica a strong strategic platform for companies looking to solve data integration issues of any size.
4. Slide 4 www.edureka.co/informatica
Objectives
At the end of this module, you will be able to:
» Understand Informatica & Informatica Product Suite
» Explain the Error Handling In Informatica
» Understand Informatica Domain & Repository Management
» Understand Informatica Recovery Concepts
» Understand PowerCenter Log Management
5. Slide 5 www.edureka.co/informatica
Informatica – A Product Company
Informatica Corp. provides data integration software and services for various businesses, industries and government
organizations including telecommunication, health care, financial and insurance services
6. Slide 6 www.edureka.co/informatica
Informatica Products & Their Functionalities
The Informatica product suite offers a wide range of products that help satisfy data
integration requirements within the enterprise and beyond
Informatica's product portfolio is focused on Data Integration:
» Data Integration & ETL
» Information Lifecycle Management
» Complex Event Processing
» Data Masking
» Data Quality
» Data Replication
» Data Virtualization
» Master Data Management
» Ultra Messaging
Currently at version 9.6, these components form a toolset for establishing and maintaining enterprise-wide data
warehouses
9. Slide 9 www.edureka.co/informatica
Informatica Products & Their Functionalities (Contd.)
PowerCenter - A fully integrated, end-to-end data integration platform. Informatica PowerCenter
Enterprise converts raw data into information to drive analysis, daily operations, and data
governance initiatives
Information Lifecycle Management - Informatica's Information Lifecycle Management software
helps IT organizations cost-effectively handle data growth, safely retire legacy
systems and applications, optimize test data management, and protect sensitive data
Complex Event Processing - Informatica RulePoint is complex event processing software that
delivers real-time alerts and insight into pertinent information, helping organizations
operate in a smarter, faster, more efficient and more competitive way
Data Masking - Informatica Data Masking products dynamically mask sensitive production data
against unauthorized access and permanently, irreversibly mask nonproduction data, helping
IT organizations comply with data privacy regulations and organization-wide data privacy
mandates, and reduce the risk of a data breach
10. Slide 10 www.edureka.co/informatica
Informatica Products & Their Functionalities (Contd.)
Data Quality - Informatica Data Quality provides clean, high-quality data to the business regardless of size, data format,
platform, or technology. It helps with validating and improving address information, profiling and
cleansing business data, and implementing a data governance practice, and ensures that data quality requirements
are met
Data Replication - Informatica Data Replication is database-agnostic, real-time transaction replication software
that is highly scalable, reliable, and non-disruptive to the performance of operational source systems
Data Virtualization - Informatica Data Services provides a single scalable architecture for both data integration
and data federation, creating a data virtualization layer that hides and handles the complexity of accessing
underlying data sources, all while insulating them from change
Master Data Management - The Informatica Master Data Management (MDM) product family delivers
consolidated and reliable business-critical data (also known as master data) to the applications that employees
rely on every day
Ultra Messaging - Informatica Ultra Messaging is a family of next-generation, low-latency messaging middleware
products. With very high throughput and 24x7 reliability, they deliver extremely low-latency application messaging
over both network-based and shared-memory (inter-process) based transports
11. Slide 11 www.edureka.co/informatica
Informatica Resources
Informatica Corporate Website
Informatica University
Customer Portal
Product Documentation
Knowledge Base
Technical Support
Informatica Product Certification
12. Slide 12 www.edureka.co/informatica
Introduction to PowerCenter
PowerCenter:
It is a single, unified enterprise data integration platform that allows companies and government
organizations of all sizes to access, discover and integrate data from virtually any business system, in any
format, and deliver that data throughout the enterprise at any speed
It is an ETL (Extract, Transform and Load) tool
PowerCenter can read from a variety of sources and write to a variety of targets, transforming
data in between
The main advantages of PowerCenter over other ETL tools, and hence the reasons for its popularity,
are as follows:
» It is robust, and can be used on both Windows and UNIX based systems
» It is high performing yet simple to develop with, maintain and administer
13. Slide 13 www.edureka.co/informatica
Versions of PowerCenter
PowerCenter Version History:
The current version of PowerCenter is Informatica PowerCenter 9.6.1 HF2 (as of Feb ’15)
From version 9.x onwards, PowerCenter has become service oriented, with each server component being
identified as a service. (Ex.: Repository service, Integration service etc.)
Earlier versions of PowerCenter are no longer in common use or supported by Informatica
For more information please visit www.informatica.com
14. Slide 14 www.edureka.co/informatica
PowerCenter Architecture - SOA
The architecture of Informatica PowerCenter (version 9.x onwards) is based on the Service Oriented Architecture
(SOA) concept
A service oriented architecture (SOA) can be defined as a group of services that communicate with each other.
The communication can involve either simple data passing or two or more services
coordinating some activity
Informatica 9.x represents a major change in the architecture of the product line
Aim: Its main aim is to provide improved performance and high availability
Approach: By reengineering, the underlying architecture has been made even more service-based
17. Slide 17 www.edureka.co/informatica
Error Handling In Informatica
Error handling is a must-have component of any Data Warehouse or Data Integration project.
When such a project starts, business users typically come up with a set of exceptions
to be handled in the ETL process. This section looks at how to handle these user-defined errors.
Identifying errors and creating an error handling strategy is very important.
The two types of errors in an ETL process are Data Errors and Process Errors.
Data Errors : To handle data errors, we can use the Row Error Logging feature. The errors are captured in the
error tables, where we can then analyse, correct and reprocess them.
Process Errors : To handle process errors, we can configure an Email task to send a notification when a session fails.
18. Slide 18 www.edureka.co/informatica
Error Handling In Informatica
INFORMATICA FUNCTIONS USED
Informatica PowerCenter provides two functions that we can use to define our user-defined error capture logic.
ERROR() : Causes the PowerCenter Integration Service to skip a row and issue an
error message that you define. The error message is displayed in the session log or written to the
error log tables, depending on the error logging type configured in the session.
ABORT() : Stops the session and issues a specified error message, which is written to the session log file or
to the error log tables, depending on the error logging type configured in the session. When the
PowerCenter Integration Service encounters an ABORT function, it stops transforming data at that
row. It processes any rows read before the session aborts.
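The difference between the two functions can be sketched outside PowerCenter. The Python below is only a simulation of the behaviour described above, not Informatica expression syntax; the exception classes, the validation rules and the sample rows are all hypothetical.

```python
from dataclasses import dataclass, field


class RowError(Exception):
    """Stands in for ERROR(): skip the current row and log a message."""


class SessionAbort(Exception):
    """Stands in for ABORT(): stop the session at the current row."""


@dataclass
class SessionResult:
    loaded: list = field(default_factory=list)
    error_log: list = field(default_factory=list)
    aborted: bool = False


def validate(row):
    # Hypothetical validation rules, chosen only for illustration.
    if row.get("customer_id") is None:
        raise SessionAbort("customer_id is missing; cannot continue")
    if row.get("amount", 0) < 0:
        raise RowError("negative amount")
    return row


def run_session(rows):
    result = SessionResult()
    for row in rows:
        try:
            result.loaded.append(validate(row))
        except RowError as exc:
            # ERROR-like behaviour: record the message, skip the row, carry on.
            result.error_log.append((row, str(exc)))
        except SessionAbort as exc:
            # ABORT-like behaviour: record the message and stop transforming.
            result.error_log.append((row, str(exc)))
            result.aborted = True
            break
    return result


rows = [
    {"customer_id": 1, "amount": 10},
    {"customer_id": 2, "amount": -5},     # skipped, like ERROR()
    {"customer_id": None, "amount": 7},   # stops the run, like ABORT()
    {"customer_id": 4, "amount": 3},      # never reached
]
result = run_session(rows)
```

In this sketch the first row is loaded, the second is skipped and logged, and the third stops the run, so the fourth row is never transformed.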
20. Slide 20 www.edureka.co/informatica
Error Handling In Informatica
INFORMATICA ERROR TABLES
Once the error logging configuration is specified, Informatica PowerCenter creates four tables for error
logging. The tables are described below (the ETL_ prefix shown here is the table name prefix chosen in that configuration).
ETL_PMERR_DATA :- Stores data about a transformation row error and its corresponding source row.
ETL_PMERR_MSG :- Stores metadata about an error and the error message.
ETL_PMERR_SESS :- Stores metadata about the session.
ETL_PMERR_TRANS :- Stores metadata about the source and transformation ports when an error occurs.
With this, we are done with the settings required to capture user-defined errors. Any data record that
violates our data validation checks will be captured in the PMERR tables mentioned above.
21. Slide 21 www.edureka.co/informatica
Error Handling In Informatica
REPORT THE ERROR DATA
Now that the error data is stored in the error tables, we can pull an error report using an SQL query.
The query can be extended to retrieve more information about each error.
-- Replace each <...> placeholder with the actual value as a quoted literal.
SELECT
    sess.FOLDER_NAME     AS "Folder Name",
    sess.WORKFLOW_NAME   AS "WorkFlow Name",
    sess.TASK_INST_PATH  AS "Session Name",
    data.SOURCE_ROW_DATA AS "Source Data",
    msg.ERROR_MSG        AS "Error MSG"
FROM
    ETL_PMERR_SESS sess
    LEFT OUTER JOIN ETL_PMERR_DATA data
        ON  data.WORKFLOW_RUN_ID = sess.WORKFLOW_RUN_ID
        AND data.SESS_INST_ID    = sess.SESS_INST_ID
    LEFT OUTER JOIN ETL_PMERR_MSG msg
        ON  msg.WORKFLOW_RUN_ID = sess.WORKFLOW_RUN_ID
        AND msg.SESS_INST_ID    = sess.SESS_INST_ID
WHERE
    sess.FOLDER_NAME = <Project Folder Name>
    AND sess.WORKFLOW_NAME = <Workflow Name>
    AND sess.TASK_INST_PATH = <Session Name>
    AND sess.SESS_START_TIME = <Session Run Time>
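As a rough illustration of running such a report from a script, the sketch below uses Python's built-in sqlite3 module with bound parameters in place of the placeholders. It makes several assumptions: SQLite stands in for the real error-log database, the tables contain only the columns named in the query above (the real PMERR tables have more), and the sample rows and the `error_report` helper are invented for the example.

```python
import sqlite3

# In-memory stand-in for the error-log database (assumption: the real
# tables live in the relational database configured for error logging).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ETL_PMERR_SESS (
    WORKFLOW_RUN_ID INTEGER, SESS_INST_ID INTEGER,
    FOLDER_NAME TEXT, WORKFLOW_NAME TEXT,
    TASK_INST_PATH TEXT, SESS_START_TIME TEXT
);
CREATE TABLE ETL_PMERR_DATA (
    WORKFLOW_RUN_ID INTEGER, SESS_INST_ID INTEGER, SOURCE_ROW_DATA TEXT
);
CREATE TABLE ETL_PMERR_MSG (
    WORKFLOW_RUN_ID INTEGER, SESS_INST_ID INTEGER, ERROR_MSG TEXT
);
INSERT INTO ETL_PMERR_SESS VALUES
    (101, 7, 'SALES', 'wf_load_sales', 's_m_load_sales', '2015-02-01 02:00:00');
INSERT INTO ETL_PMERR_DATA VALUES (101, 7, '2|-5');
INSERT INTO ETL_PMERR_MSG VALUES (101, 7, 'negative amount');
""")

REPORT_SQL = """
SELECT sess.FOLDER_NAME, sess.WORKFLOW_NAME, sess.TASK_INST_PATH,
       dat.SOURCE_ROW_DATA, msg.ERROR_MSG
FROM ETL_PMERR_SESS sess
LEFT OUTER JOIN ETL_PMERR_DATA dat
  ON dat.WORKFLOW_RUN_ID = sess.WORKFLOW_RUN_ID
 AND dat.SESS_INST_ID = sess.SESS_INST_ID
LEFT OUTER JOIN ETL_PMERR_MSG msg
  ON msg.WORKFLOW_RUN_ID = sess.WORKFLOW_RUN_ID
 AND msg.SESS_INST_ID = sess.SESS_INST_ID
WHERE sess.FOLDER_NAME = ?
  AND sess.WORKFLOW_NAME = ?
  AND sess.TASK_INST_PATH = ?
  AND sess.SESS_START_TIME = ?
"""


def error_report(connection, folder, workflow, session, start_time):
    """Return the error rows for one session run (hypothetical helper)."""
    cursor = connection.execute(
        REPORT_SQL, (folder, workflow, session, start_time)
    )
    return cursor.fetchall()


report = error_report(
    conn, "SALES", "wf_load_sales", "s_m_load_sales", "2015-02-01 02:00:00"
)
```

Binding the four filter values as parameters avoids pasting literals into the SQL text, which is the main design point of the sketch.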
25. Slide 25 www.edureka.co/informatica
Server Components of PowerCenter
The PowerCenter server components comprise the following services:
Repository service: The Repository service manages the repository. It retrieves, inserts, and updates metadata
in the repository database tables
Integration service: The Integration service runs sessions and workflows
SAP BW service: The SAP BW service listens for RFC requests from SAP BW and initiates workflows to extract
data from, or load data into, SAP BW
Web services hub: The Web services hub receives requests from web service clients and exposes PowerCenter
workflows as services
27. Slide 27 www.edureka.co/informatica
Informatica - Domain & Nodes
The salient features of a Domain are as follows:
A Domain is a logical collection or set of nodes and services
The PowerCenter Domain is the fundamental administrative unit of PowerCenter
A Domain can be a single PowerCenter installation, or it can consist of multiple PowerCenter installations
The salient features of a node are as follows:
A node is a logical representation of a physical machine. It has physical attributes such as a hostname and a port number
Each node runs a service manager which is responsible for the application and core services
A node can be a gateway node or a worker node, but it can belong to only one Domain
28. Slide 28 www.edureka.co/informatica
Gateway Node
A gateway node can be described as follows:
The gateway node is the node where all core services are meant to run
The primary function of a gateway node is to route all service requests from the PowerCenter client to other
available nodes
Only one node within the Domain can act as the gateway at any given point in time. If the gateway node is
unavailable, the Domain cannot accept any service requests
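The relationship between a Domain, its nodes and the gateway can be pictured with a small model. This is a conceptual sketch only: the class names, the routing rule (first available worker) and the host and port values are assumptions made for illustration and do not reflect how PowerCenter is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    host: str
    port: int
    is_gateway: bool = False
    available: bool = True


@dataclass
class Domain:
    name: str
    nodes: list = field(default_factory=list)

    def add_node(self, node):
        self.nodes.append(node)

    def acting_gateway(self):
        # Only one node acts as the gateway at a time: here, the first
        # available node that is flagged as a gateway.
        for node in self.nodes:
            if node.is_gateway and node.available:
                return node
        return None

    def route(self, request):
        gateway = self.acting_gateway()
        if gateway is None:
            raise RuntimeError("no gateway available; domain cannot accept requests")
        # Simplified routing rule: pick the first available worker node,
        # falling back to the gateway itself.
        workers = [n for n in self.nodes if n.available and n is not gateway]
        target = workers[0] if workers else gateway
        return f"{request} -> {target.name} (via {gateway.name})"


domain = Domain("Domain_Dev")
gateway = Node("node01", "host01", 6005, is_gateway=True)
worker = Node("node02", "host02", 6005)
domain.add_node(gateway)
domain.add_node(worker)
routed = domain.route("run workflow")
```

Marking the gateway as unavailable in this model makes `route` raise an error, mirroring the point that a Domain without a gateway cannot accept service requests.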
31. Slide 31 www.edureka.co/informatica
Informatica Repository Management
#1. A repository is, generically, a container or place where something is stored.
#2. The Informatica repository is a set of database tables in which Informatica stores its metadata. Metadata is data
that describes other data, that is, data about data.
#3. The Informatica repository holds metadata about different types of objects,
for example mappings, transformations, folders, connections and user privileges.
#4. In the industry, the Informatica repository metadata tables are also called OPB tables/views or REP tables/views.
#5. The repository is managed with the client tool "Informatica PowerCenter Repository Manager", which is
useful for admin activities:
1 You can create, edit and delete folders.
2 You can manage object and user permissions.
3 You can back up the repository to a local machine and restore it to another server.
4 You can create deployment groups.
5 You can view objects and their locks, and release write-intent locks on objects locked by you.
6 You can import and export objects.
7 You can copy objects from one folder to another.
34. Slide 34 www.edureka.co/informatica
Informatica Recovery Concepts
# Workflow Recovery
• Workflow recovery allows you to continue processing the workflow and workflow tasks from the point of
interruption.
• During the workflow recovery process, the Integration Service accesses the workflow state of operation, which is stored in memory or on
disk depending on the recovery configuration.
• The workflow state of operation includes the status of tasks in the workflow and workflow variable values.
• Recovery involves the following steps:
1. Workflow Configuration for Recovery
2. Session and Tasks Configuration for Recovery
3. Recovering the Workflow from Failure
35. Slide 35 www.edureka.co/informatica
Informatica Recovery Concepts
1. Workflow Configuration for Recovery
To configure a workflow for recovery, we must enable the workflow for recovery or configure the workflow to suspend on task error.
Enable Recovery : When you enable a workflow for recovery, the Integration Service saves the workflow state of operation in a shared location. You can recover the workflow if it terminates, stops, or aborts. The workflow does not have to be running.
36. Slide 36 www.edureka.co/informatica
Informatica Recovery Concepts
1. Workflow Configuration for Recovery
Suspend : When you configure a workflow to suspend on error, the Integration Service stores the workflow state of operation in memory. You can recover the suspended workflow if a task fails: fix the task error, then recover the workflow.
If the workflow cannot recover automatically from failure within the maximum allowed number of attempts, it goes to the 'suspended' state.
37. Slide 37 www.edureka.co/informatica
Informatica Recovery Concepts
2. Session and Tasks Configuration for Recovery
Each session or task in a workflow has its own recovery strategy. When the Integration Service recovers a workflow, it recovers tasks based on the recovery strategy specified for each task or session.
Three different options are available:
1. Restart task
2. Fail task and continue workflow
3. Resume from the last checkpoint
38. Slide 38 www.edureka.co/informatica
Informatica Recovery Concepts
1. Restart task : This recovery strategy is available for all types of workflow tasks. When the
Integration Service recovers a workflow, it restarts each recoverable task that is configured with
a restart strategy. You can configure Session and Command tasks with a restart recovery
strategy. All other tasks have a restart recovery strategy by default.
2. Fail task and continue workflow : This recovery strategy is only available for session and
command tasks. When the Integration Service recovers a workflow, it does not recover the task.
The task status becomes failed, and the Integration Service continues running the workflow.
Configure a fail recovery strategy if you want to complete the workflow, but you do not want to
recover the task.
3. Resume from the last checkpoint : This recovery strategy is only available for session tasks. The
Integration Service saves the session state of operation and maintains target recovery tables. If
the session aborts, stops, or terminates, the Integration Service uses the saved recovery
information to resume the session from the point of interruption.
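The three strategies and the task types they apply to can be summarised in a few lines of code. This is a simplified model of the rules stated above, not PowerCenter behaviour: the task-type strings, the "recovered"/"failed" status labels and the row-offset idea for checkpoints are assumptions made for the example.

```python
from enum import Enum


class RecoveryStrategy(Enum):
    RESTART = "Restart task"
    FAIL_AND_CONTINUE = "Fail task and continue workflow"
    RESUME_FROM_CHECKPOINT = "Resume from the last checkpoint"


# Strategies each task type may use, following the descriptions above.
ALLOWED = {
    "session": {
        RecoveryStrategy.RESTART,
        RecoveryStrategy.FAIL_AND_CONTINUE,
        RecoveryStrategy.RESUME_FROM_CHECKPOINT,
    },
    "command": {
        RecoveryStrategy.RESTART,
        RecoveryStrategy.FAIL_AND_CONTINUE,
    },
}
# All other task types only restart.
DEFAULT_ALLOWED = {RecoveryStrategy.RESTART}


def allowed_strategies(task_type):
    return ALLOWED.get(task_type, DEFAULT_ALLOWED)


def recover_task(task_type, strategy, last_checkpoint=0):
    """Return (status, first_row_to_process) for a task being recovered."""
    if strategy not in allowed_strategies(task_type):
        raise ValueError(f"{strategy.value!r} is not available for {task_type} tasks")
    if strategy is RecoveryStrategy.FAIL_AND_CONTINUE:
        # The task is not recovered; the workflow simply moves on.
        return ("failed", None)
    if strategy is RecoveryStrategy.RESUME_FROM_CHECKPOINT:
        # Pick up where the saved state says the session stopped.
        return ("recovered", last_checkpoint)
    # RESTART: run the task again from the beginning.
    return ("recovered", 0)
```

For example, `recover_task("session", RecoveryStrategy.RESUME_FROM_CHECKPOINT, last_checkpoint=500)` continues from row 500 in this model, while asking for the same strategy on a non-session task raises an error.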
39. Slide 39 www.edureka.co/informatica
Informatica Recovery Concepts
3. Recovering the Workflow from Failure
A workflow can be recovered either automatically or manually, depending on the workflow recovery strategy
Recovering Automatically
If you have a High Availability (HA) licence and the workflow is configured to recover automatically as described above,
the Integration Service automatically attempts to recover the workflow based on the recovery strategy set for each session or
task in the workflow. If the workflow cannot recover automatically from failure within the maximum allowed
number of attempts, it goes to the 'suspended' state, from which it can be recovered manually.
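The automatic recovery behaviour amounts to a bounded retry loop. The sketch below shows that idea only; `run_workflow` is a hypothetical callable standing in for one recovery attempt, and the state names are chosen for the example.

```python
def recover_automatically(run_workflow, max_attempts):
    """Retry a failed workflow up to max_attempts times.

    run_workflow is a hypothetical callable that takes the attempt
    number and returns True when the recovery attempt succeeds.
    """
    for attempt in range(1, max_attempts + 1):
        if run_workflow(attempt):
            return ("succeeded", attempt)
    # Out of attempts: the workflow is left for manual recovery.
    return ("suspended", max_attempts)


def flaky_workflow(attempt):
    # Pretend the underlying problem clears up on the third attempt.
    return attempt >= 3


within_limit = recover_automatically(flaky_workflow, max_attempts=5)
over_limit = recover_automatically(flaky_workflow, max_attempts=2)
```

With a limit of five attempts the simulated workflow recovers on the third try; with a limit of two it ends up suspended.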
Recovering Manually
If you do not have a High Availability (HA) licence, you can manually recover the workflow, or individual tasks within a
workflow separately. The recovery options are available from the Workflow Manager or from the
Workflow Monitor.
40. Slide 40 www.edureka.co/informatica
Informatica Recovery Concepts
3. Recovering the Workflow from Failure
Recovering Manually
Recover workflow :- Continue processing the workflow from the point of interruption.
Recover Task :- Recover a session but not the rest of the workflow.
Recover workflow from a task :- Recover a session and continue processing the workflow.
42. Slide 42 www.edureka.co/informatica
Informatica Log Management
The Integration Service generates two logs when a workflow runs:
1) Session log -- Has the details of the task, session errors and load statistics.
2) Workflow log -- Has the details of the workflow processing and workflow errors.
The workflow log is generated when the workflow starts, and the session log is generated once the session
is initiated.
43. Slide 43 www.edureka.co/informatica
Informatica Log Management
The following steps take place when the workflow is initiated:
1. The Integration Service writes binary log files on the node. It sends information about the sessions and workflows to the
Log Manager.
2. The Log Manager stores information about workflow and session logs in the domain configuration database. The domain
configuration database stores information such as the path to the log file location, the node that contains the log, and the
Integration Service that created the log.
3. When you view a session or workflow in the Log Events window, the Log Manager retrieves the information from the
domain configuration database to determine the location of the session or workflow logs.
4. The Log Manager dispatches a Log Agent to retrieve the log events on each node to display in the Log Events window.
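The four steps above can be traced with a toy model in which dictionaries stand in for the domain configuration database and for the binary log files on each node. Everything here (class names, method names, paths and sample events) is hypothetical and only mirrors the flow described in the steps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    log_path: str
    node: str
    integration_service: str


class LogManager:
    def __init__(self):
        self._config_db = {}   # stand-in for the domain configuration database
        self._node_files = {}  # stand-in for binary log files: (node, path) -> events

    def register_run(self, run_id, record, events):
        # Steps 1-2: the log file is written on the node, and the Log
        # Manager stores where it lives and which service created it.
        self._node_files[(record.node, record.log_path)] = list(events)
        self._config_db[run_id] = record

    def view_log(self, run_id):
        # Step 3: look up the log location in the configuration database.
        record = self._config_db[run_id]
        # Step 4: dispatch a "Log Agent" to fetch the events from that node.
        return self._log_agent(record.node, record.log_path)

    def _log_agent(self, node, path):
        return self._node_files[(node, path)]


manager = LogManager()
manager.register_run(
    "wf_load_sales#101",
    LogRecord("/logs/wf_load_sales.bin", "node02", "IS_Dev"),
    ["workflow started", "session s_m_load_sales started"],
)
events = manager.view_log("wf_load_sales#101")
```

The point of the model is that the configuration database holds only metadata about the log, while the log events themselves stay on the node until someone views them.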
44. Slide 44 www.edureka.co/informatica
Informatica Log Management
When a workflow is invoked the Integration Service creates the following output files:
Workflow log :The Integration Service process creates a workflow log for each workflow it runs. It writes information in the
workflow log such as initialization of processes, workflow task run information, errors encountered, and workflow run
summary.
Session log : The Integration Service process creates a session log for each session it runs. It writes information in the
session log such as initialization of processes, session validation, creation of SQL commands for reader and writer threads,
errors encountered, and load summary.
Session detail : When you run a session, the Workflow Manager creates session details that provide load statistics for each
target in the mapping.
Performance Detail : Performance details provide transformation-by-transformation information on
the flow of data through the session.
Reject Files : By default, the Integration Service process creates a reject file for each target in the session. The reject file
contains rows of data that the writer does not write to targets.
45. Slide 45 www.edureka.co/informatica
Informatica Log Management
Row Error Logs : When a row error occurs, the Integration Service process logs error information that allows you to
determine the cause and source of the error.
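Recovery Tables Files : The Integration Service process creates recovery tables on the target database system when it runs a
session enabled for recovery. When you run a session in recovery mode, the Integration Service process uses the information in the recovery tables to complete the session.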
Indicator File : If you use a flat file as a target, you can configure the Integration Service to create an indicator file for target
row type information. For each target row, the indicator file contains a number to indicate whether the row was marked for
insert, update, delete, or reject.
Cache Files : When the Integration Service process creates memory cache, it also creates cache files. The Integration Service
process creates cache files for the following mapping objects: Aggregator transformation, Joiner transformation, Rank
transformation, Lookup transformation, Sorter transformation, XML target.
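To make the indicator file entry above more concrete, the sketch below tallies row types from indicator values. Two things are assumptions here and should be checked against the product documentation: the number-to-row-type mapping (0 insert, 1 update, 2 delete, 3 reject) and the one-number-per-line layout. The function name and sample input are hypothetical.

```python
from collections import Counter

# Assumed mapping of indicator numbers to row types; verify against the
# PowerCenter documentation before relying on it.
ROW_TYPES = {0: "insert", 1: "update", 2: "delete", 3: "reject"}


def summarize_indicators(lines):
    """Count how many target rows were marked with each row type."""
    counts = Counter()
    for line in lines:
        text = line.strip()
        if not text:
            continue  # ignore blank lines
        counts[ROW_TYPES[int(text)]] += 1
    return dict(counts)


summary = summarize_indicators(["0", "0", "1", "3", "", "2", "0"])
```

A summary like this is a quick way to confirm that a load inserted, updated, deleted and rejected the number of rows you expected.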
46. Slide 46 www.edureka.co/informatica
Informatica Log Management
Informatica Logs - Different Types of Tracing Levels In Informatica
Tracing levels can be configured at the transformation and/or session level in Informatica. There are four
tracing levels, plus a None setting that applies only at the session level. They are listed below:
Tracing levels:
•None: Applicable only at session level. The Integration Service uses the tracing levels configured in the mapping.
•Terse: The Integration Service logs initialization information, error messages, and notification of rejected data in the session log file.
•Normal: The Integration Service logs initialization and status information, errors encountered, and rows skipped due to
transformation row errors. It summarizes session results, but not at the level of individual rows.
•Verbose Initialization: In addition to normal tracing, the Integration Service logs additional initialization details,
names of index and data files used, and detailed transformation statistics.
•Verbose Data: In addition to verbose initialization tracing, the Integration Service logs each row that passes into
the mapping. It also notes where the Integration Service truncates string data to fit the precision of a column and
provides detailed transformation statistics. When you configure the tracing level to verbose data, the Integration
Service writes row data for all rows in a block when it processes a transformation.
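Because each level adds detail to the one before it, the levels can be modelled as an ordered enumeration. The sketch below is illustrative only: the numeric values are arbitrary, and the rule that a session-level setting overrides the transformation-level one is an assumption, with Python's `None` standing in for the session-level None setting.

```python
from enum import IntEnum


class TracingLevel(IntEnum):
    # Ordered from least to most detail; the numbers are arbitrary.
    TERSE = 1
    NORMAL = 2
    VERBOSE_INITIALIZATION = 3
    VERBOSE_DATA = 4


def effective_level(session_override, transformation_level):
    """Pick the level in force for a transformation.

    session_override=None models the session-level "None" setting, in
    which case the level configured in the mapping is used.
    """
    if session_override is None:
        return transformation_level
    return session_override


def logs_row_data(level):
    # Only Verbose Data writes individual rows to the session log.
    return level >= TracingLevel.VERBOSE_DATA


from_mapping = effective_level(None, TracingLevel.NORMAL)
overridden = effective_level(TracingLevel.VERBOSE_DATA, TracingLevel.NORMAL)
```

Since Verbose Data logs every row, it is usually worth enabling only while debugging a specific transformation.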