2. The following is intended to outline our general
product direction. It is intended for information
purposes only, and may not be incorporated into any
contract. It is not a commitment to deliver any
material, code, or functionality, and should not be
relied upon in making purchasing decisions.
The development, release, and timing of any
features or functionality described for Oracle’s
products remains at the sole discretion of Oracle.
4. Agenda
• Data Warehouse Problem Space (Data Intg. Focus)
• Ancient Pre-History of Data Warehouse
• “The Good Old Days” of Data Warehouse
• Revival Period for Data Warehouse
• Data Integration for Modern Data Warehousing
• Old Generation: Hub & Spoke with Invasive Capture
• New Generation: Agent-based with Non-invasive Capture
• Drive Business Value with Data Integration
• Why Replace? Isn’t my Old _____ Good Enough?
• The Oracle Solution for Data Integration
• Oracle GoldenGate
• Oracle Data Integrator
• Oracle Data Quality
6. Data Warehouse Ancient History
• 1985 – 1995 “Controlled Chaos”
• Fragmented Strategy for Marts vs. Warehouse
• No practical notion of “Enterprise Data Warehouse”
• Data Integration:
• Hand-coded Scripts (External to DB)
• Not Optimized
• Procedural Transformations (PL/SQL etc)
• Few Data Integration Tools
• No Formal Methodology, Metrics or Governance
7. Data Warehouse Good Old Days
• 1995 – 2005 “Formal Methods and Discipline”
• Strategy Choices for Marts vs. Warehouse
• Top-down (Inmon) vs. Bottom-up (Kimball)
• Formal notion of “Enterprise Data Warehouse”
• Data Integration:
• Tool-based Data Integration Solutions
• Optimized, Parallel Server-based Transforms
• Formal Methodology, Metrics and Governance
• Reduced Reliance on Hand-coded Scripts and
Procedural Transformations (PL/SQL etc)
8. Data Warehouse Revival Period
• 2005 – 2015 “Specialized Warehouse Solutions”
• Technology-driven Choices for High-end DW’s
• Commodity H/W vs. Optimized Appliances
• Relational/Star vs. Columnar (vs. Cubes/OLAP)
• Database + BI vs. Distributed Analytic Apps (Hadoop etc)
• EDW as a “source of truth” vision morphs and
expands to MDM as a distinct problem domain
• Data Integration is still stuck in the “Good Old Days”
  Good Old Days            | Modern Alternative
  Hub-based Runtime        | Agent-based Runtime
  Centralized ETL Server   | Optimized E-LT (DW Appliance)
  Mainly Batch             | Mainly Real Time / Trickle Feed
10. Modern Data Integration Approach
Heterogeneous, Real-time, Non-Invasive, High Performance E-LT
Traditional ETL + CDC
• Invasive Capture on OLTP systems using complex Adapters
• Transformations in ETL engine on expensive middle-tier servers
• Bulk load to the data warehouse with large nightly/daily batch

Modern E-LT + Real-time
• Continuous feeds from operational systems
• Non-invasive data capture
• Thin middle tier with transformations on the database platform (target)
• Mini-batches throughout the day or bulk processing nightly

[Diagram: trickle-feed and bulk extract paths through lightweight agents, with transforms, lookups, and staging/load into a heterogeneous target]
11. Good Old Days of ETL Batch Integration
• Good Tools, but:
• Expensive Environments, Performance Bottlenecks, Too Many Data Hops, Proprietary Skills w/Vendor Lock-in, and Heavy Optimization in Complex Situations
• Won't scale w/ new Generation of DWs

[Diagram: Sources → Extract → Transform → Load → Stage Data → Lookups/Calcs → Transform → Load → Prod; the ETL engine(s) hold lookup data and ETL metadata, require big H/W and heavy parallel tuning, and are replicated across Development, QA, and System (etc.) environments]
12. Modern Agent-based E-LT Processing
• Same Good Tools you Expect, plus:
• Reduce Data Center Costs, De-commission Servers
• Open Frameworks, Non-Proprietary SQL Skills
• Deploys Seamlessly Alone or within SOA Servers
• Scales Linearly with Modern DW Appliances
[Diagram: Sources → Agent (data movement) → Stage Data → Data Transformation → Prod, spanning Development, QA, and System (etc.) environments; set-based SQL E-LT transforms inside the DB are typically faster, and SQL loads are always faster]
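The E-LT pattern the slide describes, load raw data first and then transform with one set-based SQL statement inside the target database, can be sketched with plain Python and SQLite. This is illustrative only: the table names and rows are invented, and a real deployment would use the warehouse's own bulk-load and SQL engine rather than SQLite.

```python
import sqlite3

# Minimal E-LT sketch (illustrative; not Oracle Data Integrator code).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stage_orders (order_id INTEGER, amount_cents INTEGER, region TEXT);
    CREATE TABLE dw_region_sales (region TEXT PRIMARY KEY, total_amount REAL);
""")

# "E" + "L": bulk-load extracted source rows straight into staging, no transform yet.
source_rows = [(1, 1000, "EMEA"), (2, 2500, "EMEA"), (3, 400, "APAC")]
conn.executemany("INSERT INTO stage_orders VALUES (?, ?, ?)", source_rows)

# "T": a single set-based SQL transform runs inside the database engine,
# instead of row-by-row processing in a separate ETL server.
conn.execute("""
    INSERT INTO dw_region_sales (region, total_amount)
    SELECT region, SUM(amount_cents) / 100.0 FROM stage_orders GROUP BY region
""")

totals = dict(conn.execute("SELECT region, total_amount FROM dw_region_sales"))
```

The point of the pattern is that the aggregate runs where the data already sits, so no middle-tier hardware is needed for the transform step.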
13. Good Old Days of Real Time Replication
• Good Tools, but:
• Arcane capture process, sometimes invasive
• Okay for Changed Data Capture in Data Integration, but:
• not used for Active-Active / ZDT Migrations
• not used for High Availability or Disaster Recovery
[Diagram: Sources → CDC Hub(s) with Transaction Mgmt and Apply Server → ETL Engine(s) with lookups → Stage Data → Prod]
14. Agent-based Real Time Replication
• Same Good Tools you Expect, but:
• Not dependent on hardware for replication
• Capable of Heterogeneous, Active-Active Deployments
• Suitable for Zero Downtime Migrations
• Point-in-time Recovery
[Diagram: Capture Agent at the Sources → data movement → Replicat Agent applying into Stage Data and Prod]
15. Data Capture Architecture Options
• Next Generation Capabilities
• Non-invasive, heterogeneous, disk-based log access
• Suitable for CDC + High Availability & Active-Active
• Bi-directional and high performance
• Check-pointing and Simple Trail/Queue Management
[Diagram: two capture options: triggers writing inserts/updates/deletes to log tables, versus reading on-disk transaction logs; supported across Oracle, IBM DB2, MSFT SQL Server, Sybase, Teradata, and Enscribe]
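The two capture options can be contrasted concretely. Trigger-based capture, the older invasive path, fires work inside the source database on every change and writes it to log tables; log-based capture instead reads the on-disk transaction logs without touching the tables. A minimal sketch of the trigger-based variant, using SQLite with invented table names:

```python
import sqlite3

# Trigger-based CDC sketch (illustrative of the invasive option the slide
# contrasts with non-invasive log reading; all names are invented).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    -- Change-log table populated by triggers on the source table.
    CREATE TABLE customers_log (op TEXT, id INTEGER, name TEXT);
    CREATE TRIGGER trg_ins AFTER INSERT ON customers
        BEGIN INSERT INTO customers_log VALUES ('I', NEW.id, NEW.name); END;
    CREATE TRIGGER trg_upd AFTER UPDATE ON customers
        BEGIN INSERT INTO customers_log VALUES ('U', NEW.id, NEW.name); END;
    CREATE TRIGGER trg_del AFTER DELETE ON customers
        BEGIN INSERT INTO customers_log VALUES ('D', OLD.id, OLD.name); END;
""")

# Every DML statement now pays the extra cost of the trigger firing.
conn.execute("INSERT INTO customers VALUES (1, 'Acme')")
conn.execute("UPDATE customers SET name = 'Acme Corp' WHERE id = 1")
conn.execute("DELETE FROM customers WHERE id = 1")

changes = conn.execute(
    "SELECT op, id, name FROM customers_log ORDER BY rowid").fetchall()
```

The captured change stream is the same either way; the difference is that triggers add load and schema objects to the OLTP system, which is why the slide favors disk-based log access.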
16. Good Old Days of Data Integration
• Monolithic & Expensive Environments
• Fragile, Hard to Manage
• Difficult to Tune or Optimize

[Diagram: the combined old-generation stack; Sources → Extract → Transform → Load → Stage Data → Lookups/Calcs → Transform → Load → Prod through ETL engine(s) that hold lookup data and ETL metadata and require big H/W and heavy parallel tuning, plus CDC Hub(s) with Transaction Mgmt and Apply Server, replicated across Development, QA, and System (etc.) environments]
17. Modern Data Integration Architecture
• Lightweight, Inexpensive Environments – Agents
• Resilient, Easy to Manage – Non-Invasive
• Easy to Optimize and Tune – uses DBMS power
[Diagram: the modern stack; Sources → bulk data movement → Stage Data → in-database Data Transformation → Prod, with Capture and Replicat Agents handling data movement; set-based SQL E-LT transforms inside the DB are typically faster, and SQL loads are always faster; the Development, QA, and System (etc.) environments run on lightweight agents]
19. Business Drivers for Data Integration
Add Value to the Core Business Lines
1. Do More with Less: Design metadata-driven integration; Leverage skills & dictate patterns
2. Compete Globally 24X7: Ensure continuous uptime; Access data in real time
3. Use Data for Competitive Advantage: Ensure the quality of your data; Actively govern your most valuable asset
4. Automate and Adapt Business Processes: Expose data services for reuse; Orchestrate processes using SOA
20. Project Drivers for Data Integration
Essential Ingredient for Information Agility
Strategic Value of Data Integration
• Consistency for major enterprise initiatives like BI, DW, & MDM
• Common technical foundation platform across data silos
• Central point for data governance, availability and controls
Key Data Integration Use Cases
• BI, DW, and OLTP Data Integration & Replication
• SOA, Enterprise Integration & Modernization
• Migrations and Master Data Management
22. Why Replace _______?
• We often hear, "My company has already standardized
on __________, why should I replace it?"
Answer:
Save Money on Data Center Costs
Accelerate Project Delivery / TTM
Supply Real Time Intelligence to the Business
Reduce Batch Windows on Data Warehouse
Unify Data Integration with SOA Plans
23. Save Money on Hardware/Data Center
E-LT runs on Small Commodity Servers as an Agent Process
Typical: Separate ETL Server
• Proprietary ETL Engine, Poor Performance
• High Costs for Separate Standalone Server

Next Generation Architecture (E-LT: No New Servers)
• Lower Cost: Leverage Compute Resources & Partition Workload efficiently
• Efficient: Exploits Database Optimizer
• Fast: Exploits Native Bulk Load & Other Database Interfaces
• Scalable: Scales as you add Processors to Source or Target

Benefits
• Optimal Performance & Scalability
• Better Hardware Leverage
• Easier to Manage & Lower Cost

[Diagram: conventional ETL architecture (Extract → Transform on a separate server → Load) versus E-LT (Extract → Load, then Transform inside the source/target databases)]
24. Speed Project Delivery/Time to Market
E-LT uses Declarative SQL-style Design + Simple Runtime
• Development Productivity: 40% Efficiency Gains
• Environment Setup (ex: BI Apps): 33-50% Less Complex

  E-LT: 7 Setup Steps, 1 Server, 3 Connections
  vs. conventional ETL: 10 Setup Steps, 3 Servers, 7 Connections
25. Supply Real Time Business Intelligence
Non-invasive Capture + E-LT Processing
[Diagram: the application feeds Real Time BI from a data copy; E-LT mini-batches with transforms build Analytic BI (facts & dims) within a consistency window]
26. Reduce Consistency Windows w/E-LT
Fewer Steps, Faster Xform, and Faster Loads vs. typical ETL
• Main driver for the batch window is data integrity & consistency; once lookup & calc functions begin, the DW typically goes offline

[Diagram: the conventional ETL batch window (Sources → Extract → Transform → Load → Stage Data → Lookups/Calcs → Transform → Load → Prod, through ETL engines that require big H/W and heavy parallel tuning) compared with a shorter E-LT batch window in which the DW stays online; uptime gains come from set-based SQL transforms inside the DB (typically faster) and SQL loads (always faster)]
27. What About "Pushdown Processing"?
• Pushdown Processing is what the ETL vendors do to
compensate for bad performance – push the transformation
processing to the Database
• Both Pushdown & E-LT have in common:
• use the power of your Data Warehouse for maximum performance
• can combine engine-based operations with DB-based transformations to accomplish any level of data transformation complexity
• can scale to any multi-TB level using parallel processing
• Only E-LT can claim:
• performance optimized for your Database – whichever DB you use
• operate without any new IT Hardware costs
• 100% Java-based
• easily embedded within your existing or planned SOA infrastructure
• is not a glorified scheduler that relies on PL/SQL or other custom-coded DB scripts to achieve maximal performance
• can entirely eliminate needless network-hops for remote data joins
• can operate with no additional energy drain in your Datacenter
28. Unify E-LT Agent with SOA Runtime
Best of Breed Data Integration as a Shared SOA Service
Unified Management + Monitoring
• Common Runtime – 100% Java
• Common Monitoring

Example Use Cases
• Bulk Data Transformation (any2any)
• XML/EDI Large File Handling
• SOA-driven Business Intelligence
• Load DW from SOA
• Unified Data Steward Workflow (ETL Error Hospital w/BPEL PM)
• ERP Migration, Replication / Loading
• Query Offloading & Zero Downtime

[Diagram: high-performance ETL & replication from any data source into the Data Warehouse & OLAP]

E-LT Frameworks are optimal architectures for:
• Embedded Applications
• Application Integration
• Middleware Servers
• Business Intelligence
• Performance Management
• Database & OLAP
30. Oracle Data Integration Solution
Best-in-class Heterogeneous Platform for Data Integration
Oracle Applications • Custom Applications • MDM Applications • Business Intelligence • Activity Monitoring • SOA Platforms

Comprehensive Data Integration Solution
SOA Abstraction Layer: Process Manager • Service Bus • Data Services • Data Federation

Oracle Data Integrator: ELT/ETL • Data Transformation • Bulk Data Movement • Data Lineage
Oracle GoldenGate: Real-time Data • Log-based CDC • Bi-directional Replication • Data Verification
Oracle Data Quality: Data Profiling • Data Parsing • Data Cleansing • Match and Merge

Storage • Data Warehouse/Data Mart • OLTP System • OLAP Cube • Flat Files • Web 2.0 • Web and Event Services, SOA
31. Key Data Integration Products
• Heterogeneous E-LT & ETL
• High-speed Transformations
• OLAP Data Loading
• Data Warehouse Loading
• Real Time Data Replication
• Changed Data Capture
• DBMS High Availability
• Disaster Tolerance
• Comprehensive Integration
• ELT/ETL for Bulk Data
• Service Bus
• Process Orchestration
• Human Workflow
• Data Grid
• Data Service Modeling
• Query Federation
• Business Data / Metadata
• Data Redaction
• Service Data Objects
• Time Series Reporting
• Statistical Analysis
• Cleansing & Parsing
• De-duplication
• Integrated Data Quality
• High Performance
• Integrated w/ODI
32. Oracle Data Integrator Enterprise Edition
Optimized E-LT for improved Performance, Productivity and Lower TCO
• E-LT Transformation vs. E-T-L
• Declarative Set-based design
• Change Data Capture
• Hot-pluggable Architecture
• Pluggable Knowledge Modules

[Diagram: Legacy, Application, and OLTP DB sources flow through ODI into any Data Warehouse or Planning System]
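"Declarative set-based design" means the developer declares what each target column should contain and a generator emits one set-based SQL statement, rather than hand-coding procedural row-by-row logic. A toy sketch of the idea, with all table and column names invented (ODI's actual Knowledge Modules are far more general):

```python
# Declarative mapping: target columns mapped to set-based SQL expressions.
# Illustrative only; every name here is invented.
mapping = {
    "target": "dw_customers",
    "source": "src_customers",
    "columns": {
        "customer_id": "id",
        "full_name": "UPPER(first_name || ' ' || last_name)",
        "country": "COALESCE(country, 'UNKNOWN')",
    },
}

def generate_sql(m):
    """Turn a declarative mapping into one INSERT ... SELECT statement."""
    cols = ", ".join(m["columns"])            # target column list
    exprs = ", ".join(m["columns"].values())  # set-based source expressions
    return f"INSERT INTO {m['target']} ({cols}) SELECT {exprs} FROM {m['source']}"

sql = generate_sql(mapping)
```

Because the output is ordinary SQL, the database optimizer (not a proprietary engine) decides how to execute it, which is the crux of the E-LT argument above.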
33. Oracle GoldenGate Overview
Enterprise-wide Solution for Real Time Data Needs
• Standardize on Single Technology for Multiple Needs
• Deploy for Continuous Availability and Real-time Data Access for Reporting / BI
• Highly Flexible, Heterogeneous
• Fast Deployments
• Lower TCO & Improved ROI

[Diagram: OGG performs log-based, real-time change data capture from heterogeneous source systems, feeding a reporting database for operational reporting, an ODS and EDW via ETL for real-time BI, and an open & active standby for disaster recovery and data protection; related uses include zero-downtime migration and upgrades, query offloading, and data distribution]
34. How Oracle GoldenGate Works
Modular De-Coupled Architecture
Capture: committed transactions are captured (and can be filtered) as they occur by reading the transaction logs.
Trail: stages and queues data for routing.
Pump: distributes data for routing to target(s).
Route: data is compressed and encrypted for routing to target(s).
Delivery: applies data with transaction integrity, transforming the data as required.
[Diagram: Source Database(s) → Capture → Trail → Pump → LAN/WAN/Internet (TCP/IP) → Trail → Delivery → Target Database(s); bi-directional]
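The de-coupled stages above can be mimicked with in-memory queues. This is purely a sketch of the pattern (capture filters to committed work, trails buffer between stages, delivery applies in order), not GoldenGate code; all names and record shapes are invented:

```python
from collections import deque

def capture(transactions, trail):
    # Capture: pick up only committed transactions, with optional filtering.
    for txn in transactions:
        if txn["committed"]:
            trail.append(txn)

def pump(source_trail, target_trail):
    # Pump: move staged records toward the target; routing, compression,
    # and encryption would happen here in a real deployment.
    while source_trail:
        target_trail.append(source_trail.popleft())

def deliver(trail, target_table):
    # Delivery: apply changes in arrival order, preserving integrity.
    while trail:
        txn = trail.popleft()
        target_table[txn["key"]] = txn["value"]

source_trail, target_trail = deque(), deque()
target = {}
capture([{"committed": True, "key": "a", "value": 1},
         {"committed": False, "key": "b", "value": 2}], source_trail)
pump(source_trail, target_trail)
deliver(target_trail, target)
```

Because each stage talks only to the trail in front of it, stages can run, restart, and fail independently, which is what "modular de-coupled architecture" buys you.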
35. Govern Data Better with Data Quality
• Data Profiling
  – Statistical Analysis
  – Rule-based Validation
  – Monitoring & Timeslice
  – Fine-grained Auditing
• Data Movement
  – E-LT & ETL
  – Data Transformation
  – Change Data Capture
  – Data Access
  – Data Services
• Data Cleansing
  – Data Validation during ETL
  – Data Standardization
  – Address Matching & Dedup
  – Error Hospital / Workflow

[Diagram: Data Quality (profiling, cleansing) and Data Integration (data movement) working together]
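As a toy illustration of the "Address Matching & Dedup" step, the sketch below merges records whose normalized names are similar above a threshold. Real data-quality tools use parsing, standardization, and much richer matching rules; every record and the threshold here are invented:

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Crude name similarity: normalized string ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def dedupe(records, threshold=0.85):
    """Match-and-merge: fold near-duplicate records into the first match."""
    merged = []
    for rec in records:
        match = next((m for m in merged
                      if similarity(m["name"], rec["name"]) >= threshold), None)
        if match:
            # Merge: keep the first-seen record, fill in any missing fields.
            for key, value in rec.items():
                match.setdefault(key, value)
        else:
            merged.append(dict(rec))
    return merged

customers = [{"name": "ACME Corp."}, {"name": "acme corp"}, {"name": "Globex"}]
unique = dedupe(customers)
```

The interesting design choices in a real tool are the normalization rules and the survivorship policy (which record's values win on merge); here both are deliberately trivial.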
37. Modern Data Integration Approach
Heterogeneous, Real-time, Non-Invasive, High Performance E-LT
Traditional ETL + CDC
• Invasive Capture on OLTP systems using complex Adapters
• Transformations in ETL engine on expensive middle-tier servers
• Bulk load to the data warehouse with large nightly/daily batch

Modern E-LT + Real-time
• Continuous feeds from operational systems
• Non-invasive data capture
• Thin middle tier with transformations on the database platform (target)
• Mini-batches throughout the day or bulk processing nightly

[Diagram: trickle-feed and bulk extract paths through lightweight agents, with transforms, lookups, and staging/load into a heterogeneous target]
40. The preceding is intended to outline our general
product direction. It is intended for information
purposes only, and may not be incorporated into any
contract. It is not a commitment to deliver any
material, code, or functionality, and should not be
relied upon in making purchasing decisions.
The development, release, and timing of any
features or functionality described for Oracle’s
products remains at the sole discretion of Oracle.