Informatica's Data Virtualization Solution addresses the problems organizations face in getting business data to users in a timely manner. Today it takes weeks or months, on average, to integrate new data sources, create reports, or change data hierarchies. Data Virtualization creates a common access layer across data sources so data can be accessed and analyzed without movement. It provides reusable data services, advanced transformations, and real-time data profiling and quality checks, helping organizations access clean, trusted data more quickly and directly. Data Virtualization is a key part of building an agile data platform that leverages existing investments and infrastructure.
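The common-access-layer idea can be sketched in miniature: a view that merges two sources at query time, so consumers see one logical table while the data stays where it is. This is a hedged illustration only; SQLite and all the table and column names below are stand-ins, not Informatica's product or API.

```python
import sqlite3

# Minimal sketch of query-time federation: two separate "source"
# databases are attached, and a temporary view plays the role of the
# virtual table, merging them without copying data into a warehouse.
con = sqlite3.connect(":memory:")            # plays the role of the DV layer
con.execute("ATTACH ':memory:' AS crm")      # source 1: CRM system
con.execute("ATTACH ':memory:' AS ordersdb") # source 2: order system

con.execute("CREATE TABLE crm.accounts (id INTEGER, name TEXT)")
con.execute("CREATE TABLE ordersdb.orders (account_id INTEGER, amount REAL)")
con.executemany("INSERT INTO crm.accounts VALUES (?, ?)",
                [(1, "Acme"), (2, "Globex")])
con.executemany("INSERT INTO ordersdb.orders VALUES (?, ?)",
                [(1, 250.0), (1, 120.0), (2, 80.0)])

# The "virtual table": a logical view over both sources, evaluated on read.
# (TEMP view, because SQLite only lets temp objects span attached databases.)
con.execute("""
    CREATE TEMP VIEW customer AS
    SELECT a.id, a.name, SUM(o.amount) AS total_orders
    FROM crm.accounts a
    JOIN ordersdb.orders o ON o.account_id = a.id
    GROUP BY a.id, a.name
""")

print(con.execute("SELECT * FROM customer ORDER BY id").fetchall())
# [(1, 'Acme', 370.0), (2, 'Globex', 80.0)]
```

Consumers query `customer` like any table; the merge happens at read time, which is the "access without movement" property the deck describes.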
Problem Statement: It takes too long to get the business the data it needs

Source: 2011 TDWI BI Benchmark Report, Organizational and Performance Metrics for Business Intelligence Teams

On average, how long does it take to:
  (a) add a new source of data to your data warehouse?
  (b) create a complex report or dashboard with about 20 dimensions, 12 measures, and 6 user access roles?
  (c) change a hierarchy (e.g. a new way of classifying products or organizing sales regions)?

                      (a)     (b)     (c)
  1 week              11%     15%     25%
  2 weeks              7%      9%     13%
  3 weeks              7%     13%      8%
  1 month             22%     16%     25%
  2 months            20%     16%     10%
  3 months            14%     16%      6%
  4-6 months          12%      9%      8%
  6 months or more     7%      5%      4%
Why Does It Take So Long?

• It takes too long to explain requirements
• It takes months to change a DW / add new critical data
• It takes many iterations to get the right data / reports
• Changes break integrations & impact applications
• Directly accessing operational systems is not possible / ideal

[Diagram: as-is value stream map of the typical data integration process (Design, Change, Integrate, Unit Test, Validate, Deploy), showing a lot of wait and waste: IT has a huge backlog, and the business is involved too late.]
What Is Needed?

• Create a common access layer across data sources
• Quickly & directly access data without movement
• Profile and cleanse data so it can be readily trusted
• Deliver reusable data services to consumers

[Diagram: Data Virtualization, built on lean principles, sits between enterprise data sources and data consumers (BI tools, portals, composite apps), presenting a logical view of all underlying data, e.g. CUSTOMER, PRODUCT, ORDER.]
Informatica Proprietary/Confidential. Informational Purposes Only. No Representation, Warranty or
Commitment regarding Future Functionality. Not to be Relied Upon in Making Purchasing Decision.
Agile Data Platform

Business and IT collaborate over common metadata, with role-specific tools for the business manager, the analyst/steward, and the developer/architect:

1. MODEL: define a virtual table, e.g. Customer (Name, Address, Category, Orders)
2. ACCESS & MERGE: combine sources such as CRM accounts into the virtual table, replicating a source first when direct access is not possible
3. PROFILE IN RT: profile the virtual table in real time
4. TRANSFORM IN RT: apply advanced transformations, data quality, and data masking
5. REUSE INSTANTLY: expose the virtual table through the query engine and the web services server, for batch and web-service consumers
6. MOVE OR FEDERATE: e.g. move accounts and call-center data into the DW, or federate them in place
7. SCALE & PERFORM: optimizations and caching
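The TRANSFORM IN RT step above can be illustrated with a tiny, hypothetical example: a cleansing rule and a masking rule applied per row as it flows through the virtual table, rather than in a separate batch job. The rules and field names are assumptions for illustration, not Informatica transformations.

```python
import hashlib

def cleanse_name(name):
    # Illustrative cleansing rule: normalize whitespace and casing.
    return " ".join(name.split()).title()

def mask_account(number):
    # Illustrative masking rule: deterministic hash prefix plus the
    # last 4 digits, so the value is unidentifiable but still joinable.
    digest = hashlib.sha256(number.encode()).hexdigest()[:6]
    return f"{digest}-{number[-4:]}"

def transform(row):
    # Applied per row at read time ("in real time"), not staged.
    return {"name": cleanse_name(row["name"]),
            "account": mask_account(row["account"])}

row = transform({"name": "  acme   CORP ", "account": "4111111111111111"})
print(row["name"])          # Acme Corp
print(row["account"][-4:])  # 1111
```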
Data Virtualization: A Piece of the Agile Data Platform Puzzle

• Provides a semantic access layer atop a variety of data sources
• Data needs to be clean, masked, etc.
• Pre-built library of advanced data transformations, e.g. merge
• Integrated real-time, on-the-fly data profiling & data quality

[Diagram: access and merge data from sources into a virtual view, then deliver to BI or a DW. Prototype first, with early business involvement; analyze & profile data and logic anytime; apply advanced transformations and data quality; then move to the DW or instantly reuse as SQL/web services.]
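The on-the-fly profiling mentioned above can be sketched as a single pass over federated rows that accumulates per-column statistics, instead of staging the data and profiling it later. A minimal sketch; the function and field names are hypothetical, not Informatica's API.

```python
def profile(rows, columns):
    """Per-column null count, distinct count, and min/max, in one pass."""
    stats = {c: {"nulls": 0, "values": set()} for c in columns}
    for row in rows:
        for c in columns:
            v = row.get(c)
            if v is None:
                stats[c]["nulls"] += 1
            else:
                stats[c]["values"].add(v)
    return {
        c: {"nulls": s["nulls"],
            "distinct": len(s["values"]),
            "min": min(s["values"]) if s["values"] else None,
            "max": max(s["values"]) if s["values"] else None}
        for c, s in stats.items()
    }

# Rows as they might come back from a federated virtual view.
rows = [
    {"name": "Acme",   "total_orders": 370.0},
    {"name": "Globex", "total_orders": 80.0},
    {"name": None,     "total_orders": 80.0},  # a quality issue caught early
]
report = profile(rows, ["name", "total_orders"])
print(report["name"]["nulls"], report["total_orders"]["distinct"])  # 1 2
```

Surfacing the null `name` at this stage, before any warehouse load, is the "get it right the first time" point the deck makes.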
Key Considerations: data virtualization vs. the alternatives

• Enabling rapid development: a model- and metadata-driven environment, vs. 1000s of lines of code that become a maintenance nightmare to sustain & maintain. (TIME, COST)
• Analyzing & profiling: profile data AND logic anywhere, to get it right the first time, vs. source-only profiling that needs extra processing and leads to many iterations & mistakes. (TIME, COST, RISK)
• Integrating with quality: leverage pre-built logic, including data quality, to bake quality into the virtual table, vs. hand-coded SQL, XQuery, and web services that cannot do advanced transforms: limited rules, simple cleansing only, no data quality. (TIME, COST, RISK)
• Leveraging investments: naturally extend your infrastructure and re-purpose logic & skills, vs. re-inventing the wheel by re-working, re-deploying & re-training every time. (TIME, COST)
• Scaling with flexibility: virtualize or physically materialize in one tool, so you can prototype first and then scale, vs. overburdening data virtualization with EII optimizations alone and non-integrated technologies. (TIME, COST, RISK)
Gartner Magic Quadrant for Data Integration Tools, 2011

"The ability to switch seamlessly and transparently between delivery modes (bulk/batch vs. granular real-time vs. federation) with minimal rework will be key for IT organizations seeking to develop a successful data integration strategy."
  Ted Friedman, VP Distinguished Analyst, Gartner

The Forrester Wave: Data Virtualization, Q1 2012 (Leveraging the Power of the Platform)

"With v9, Informatica advanced its capabilities with on-the-fly data quality and profiling, a model-driven approach to provisioning data services, performance enhancements, cloud integration, common metadata, and role-specific tools."
The problem is that today it takes too long to deliver new critical data or reports to the business. You can see this in the results of the 2011 TDWI BI Benchmark Report: on average, it takes months to add a new source of data to a data warehouse.
Let's see why it takes so long. As a first step, consider a typical data integration process, which is by nature multi-step and involves the business too late. At that point, if the business wants changes, needs other data, or identifies inaccuracies, getting IT's help means going back into a queue and waiting while IT works through its backlog of requests. The reasons for this delay in delivering new data and reports are manifold:
• It takes too long for the business to explain requirements to IT
• It takes months for IT to change a DW / add new critical data
• It takes many iterations between business and IT to get the right data / reports
• Any changes in the underlying data sources break integrations and impact consuming applications
• Directly accessing operational systems is not possible / ideal
Finally, Informatica enables business and IT to deliver a current, complete, and trusted view of the business within days instead of months. It does this by:
• Creating a common logical data access layer across all data sources. If it is not possible or desirable to hit an operational system directly, data replication can create a replica that then serves as a source. This step can be done by the analyst, without waiting for IT's help.
• Accessing and merging diverse data into a virtual view without physically moving the data. This step can also be done by the analyst, without waiting for IT's help.
• Involving the analyst to analyze and profile the federated data or the virtual view, which means no staging and no further processing.
• Applying advanced transformations, including data quality, in real time to the federated data or virtual view.
• Delivering data services or virtual views that can be instantly reused across projects.
All these capabilities are available as a single package called PowerCenter Data Virtualization Edition. Enterprises can reuse existing Informatica skills and data integration logic to deliver BI projects up to five times faster and at a third of the cost.
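The "instantly reused across projects" point above can be sketched as one view definition served through two delivery modes, batch and per-request, in the spirit of the SQL/web-service dual delivery the notes describe. All names here are hypothetical, chosen for illustration only.

```python
# A stand-in for the logical view's underlying data.
CUSTOMERS = {1: {"name": "Acme",   "total_orders": 370.0},
             2: {"name": "Globex", "total_orders": 80.0}}

def customer_view():
    """The single logical view definition, written once."""
    return [{"id": k, **v} for k, v in sorted(CUSTOMERS.items())]

def batch_extract():
    """Batch delivery: a full snapshot, e.g. for loading a warehouse."""
    return customer_view()

def service_lookup(customer_id):
    """Service delivery: one record on demand, e.g. behind a web service."""
    for row in customer_view():
        if row["id"] == customer_id:
            return row
    return None

print(len(batch_extract()), service_lookup(2)["name"])  # 2 Globex
```

Both delivery paths reuse `customer_view()` unchanged, which is the property that lets one definition serve SQL consumers and web-service consumers alike.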