Providing a real-time BI solution for its global customers and operations department is a necessity for IFPI, the International Federation of the Phonographic Industry, whose primary objective is to safeguard the rights of record producers through various anti-piracy strategies.
For the data warehousing team at IFPI, using Oracle Streams and Oracle Warehouse Builder (OWB) for real-time data replication and integration was becoming a challenge. The solution was difficult to maintain and overall throughput was degrading as data volumes increased. The need for greater stability and performance led IFPI to implement Oracle GoldenGate and Oracle Data Integrator.
Co-presented with Nick Hurt at Rittman Mead BI Forum 2014 and KScope14.
2. www.rittmanmead.com inquiries@rittmanmead.com @rittmanmead
Introduction
•Michael Rainey (Rittman Mead)
‣Principal Consultant - Oracle Data Integration expert
•Nick Hurt (IFPI)
‣Solutions Developer - working on Oracle solutions since 2002
•IFPI - the International Federation of the Phonographic Industry
‣“the voice” of the recording industry worldwide
‣represents the interests of 1,300 record companies from across the globe
‣acquired funding for OBIEE in 2010
‣required near real-time anti-piracy analytics
Agenda
•IFPI data - the good, the challenging, the ugly
•Pre-upgrade
‣Environment
‣Challenges
•Overview of GoldenGate and Oracle Data Integrator
•Upgrade - planning, migration steps
•Post-upgrade results
•Closing remarks on real-time warehousing
Challenges of IFPI data
•The good
‣Seek & destroy infringing URLs
•The challenging
‣Velocity - 1 million+ upserts per day
‣Volatility depth - indefinite retrospective updates
‣Large wide product dimension - 12 million rows
•The ugly
‣Multiple redundant updates
‣Back-dated corrections
‣Multiple sources of information (data consistency & quality)
-heavy data cleansing - identifying duplicates
-inconsistencies (error-tolerant/error-correction ETL)
Link Lifecycle
[Figure: link lifecycle timeline]
•t0 - Link Found: infringing URL detected
•t0+tn - Link Correction: deleted / matching
•t1 = t0+tn - Link Actioned: cease & desist
•t2 = t1+tn - Link Removed: take-down
•Legend: primary event, optional events
Process Flow / Dataset
[Figure: process flow over time] Event Detected → ETL (cleansing, de-duping) → Summaries → Dashboards
Fact table representation
Time Found | Link | New Unique File | Unique Link | Actioned | Taken-down
4/10/14 2:50 PM | www.4shared.com/rar/-6ebvl89/Justin_Bieber_-_All_Around_The.html | 1 | 1 | 0 | 0
4/15/14 11:44 AM | www.4shared.com/mp3/-2J4lahU/Nickel_Back_-_If_Everyone_Care.htm | 1 | 1 | 0 | 0
4/15/14 2:50 PM | www.4shared.com/rar/-6ebvl89 | 0 | 1 | 0 | 0
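The two uniqueness flags in the fact table can be illustrated with a small sketch. It assumes, purely for illustration, that a file is identified by the host plus the first two path segments of the link; the slides do not state IFPI's actual matching rule, and `file_key`, `flag_links` and `LINKS` are hypothetical names.

```python
from urllib.parse import urlsplit

# Links taken from the fact table above, in the order they were found.
LINKS = [
    "www.4shared.com/rar/-6ebvl89/Justin_Bieber_-_All_Around_The.html",
    "www.4shared.com/mp3/-2J4lahU/Nickel_Back_-_If_Everyone_Care.htm",
    "www.4shared.com/rar/-6ebvl89",
]


def file_key(link: str) -> str:
    """Hypothetical file identity: host plus the first two path segments."""
    parts = urlsplit("//" + link)  # the links carry no scheme, so add "//"
    segments = [s for s in parts.path.split("/") if s]
    return "/".join([parts.netloc] + segments[:2])


def flag_links(links):
    """Return one row per link with the New Unique File / Unique Link flags."""
    seen_files, seen_links = set(), set()
    rows = []
    for link in links:
        key = file_key(link)
        rows.append({
            "link": link,
            "new_unique_file": int(key not in seen_files),
            "unique_link": int(link not in seen_links),
        })
        seen_files.add(key)
        seen_links.add(link)
    return rows


FLAGGED = flag_links(LINKS)
for row in FLAGGED:
    print(row["new_unique_file"], row["unique_link"], row["link"])
```

Under this assumption the third link is a new link to a file that was already seen, which is why it counts as a unique link but not as a new unique file.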
Process Flow / Dataset
[Figure: process flow over time] Event Detected → ETL upserts → Summaries → Dashboards
Fact table representation
Time Found | Link | New Unique File | Unique Link | Actioned | Taken-down
4/10/14 2:50 PM | www.4shared.com/rar/-6ebvl89/Justin_Bieber_-_All_Around_The.html | 1 | 1 | 1 | 1
4/15/14 11:44 AM | www.4shared.com/mp3/-2J4lahU/Nickel_Back_-_If_Everyone_Care.htm | 1 | 1 | 1 | 0
4/15/14 2:50 PM | www.4shared.com/rar/-6ebvl89 | 0 | 1 | 1 | 1
4/15/14 11:01 PM | www.4shared.com/mp3/-qXkFru8/Kanye_West__Jay-Z_Bingo_Player.html | 1 | | |
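The retrospective updates to the Actioned and Taken-down flags can be sketched as an upsert keyed on the link. This is a minimal illustration using an in-memory SQLite table, not the ODI interface used at IFPI; the table name `link_fact` and the helper `upsert_link` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE link_fact (
           link       TEXT PRIMARY KEY,
           time_found TEXT,
           actioned   INTEGER NOT NULL DEFAULT 0,
           taken_down INTEGER NOT NULL DEFAULT 0
       )"""
)


def upsert_link(conn, link, time_found=None, actioned=None, taken_down=None):
    """Insert the link if it is new, then apply whichever flags the event carries."""
    conn.execute(
        "INSERT OR IGNORE INTO link_fact (link, time_found) VALUES (?, ?)",
        (link, time_found),
    )
    # COALESCE keeps the stored value when the event does not supply a flag.
    conn.execute(
        """UPDATE link_fact
              SET actioned   = COALESCE(?, actioned),
                  taken_down = COALESCE(?, taken_down)
            WHERE link = ?""",
        (actioned, taken_down, link),
    )


LINK = "www.4shared.com/rar/-6ebvl89"
upsert_link(conn, LINK, time_found="2014-04-15 14:50")  # link found
upsert_link(conn, LINK, actioned=1)                      # later: link actioned
upsert_link(conn, LINK, taken_down=1)                    # later: link removed

ROW_COUNT = conn.execute("SELECT COUNT(*) FROM link_fact").fetchone()[0]
FLAGS = conn.execute(
    "SELECT actioned, taken_down FROM link_fact WHERE link = ?", (LINK,)
).fetchone()
print(ROW_COUNT, FLAGS)
```

Each later event updates the existing fact row rather than adding a new one, which is the behaviour the before and after tables show.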
Oracle Data Integrator 11g
•Oracle’s strategic product for data integration
•Uses ELT (Extract, Load, Transform) approach
‣No middle ETL engine necessary
‣Uses the power of the target database to perform transformations
•Supports heterogeneous data sources
•Declarative design - separation of business and technical integration
•Data integrity controls create a “data firewall”
•Extensible through “Knowledge Modules”
ODI 11g Journalizing (CDC)
•Oracle Data Integrator Change Data Capture (CDC) delivered via Journalizing
‣Identify, capture, and deliver changes made to source data
‣Journalizing Knowledge Module (JKM) performs setup and creates infrastructure
•ODI CDC Framework
‣Capture Process - mechanism for capturing changed data from the source database (e.g. Oracle GoldenGate)
‣Journals - tables (J$) hold references to changed records and the change type (insert / update / delete)
‣Journalizing Views - (JV$, JV$D) provide access to changed data, used by IKM / LKM in mappings
‣Subscribers - used to allow consumption of changed data at different intervals, for multiple applications, etc.
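The relationship between the capture process, journals and subscribers can be sketched as a toy model. It is a simplified illustration of the concepts listed above, not ODI's actual J$ table layout or generated code; the class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class JournalRow:
    subscriber: str
    pk: int            # reference to the changed source record
    change_type: str   # "insert", "update" or "delete"
    consumed: bool = False


class Journal:
    """Toy journal: one entry per change per subscriber."""

    def __init__(self, subscribers):
        self.subscribers = list(subscribers)
        self.rows = []

    def capture(self, pk, change_type):
        """Capture process: record the change once for every subscriber."""
        for subscriber in self.subscribers:
            self.rows.append(JournalRow(subscriber, pk, change_type))

    def consume(self, subscriber):
        """Mark the subscriber's pending rows as consumed and return them."""
        pending = [r for r in self.rows
                   if r.subscriber == subscriber and not r.consumed]
        for row in pending:
            row.consumed = True
        return [(r.pk, r.change_type) for r in pending]

    def purge(self, subscriber):
        """Drop rows this subscriber has already consumed."""
        self.rows = [r for r in self.rows
                     if not (r.subscriber == subscriber and r.consumed)]


journal = Journal(["DWH", "AUDIT"])
journal.capture(1, "insert")
journal.capture(2, "delete")
DWH_FIRST = journal.consume("DWH")   # the warehouse load runs frequently
journal.purge("DWH")
journal.capture(3, "update")
DWH_SECOND = journal.consume("DWH")  # sees only the new change
AUDIT_ALL = journal.consume("AUDIT") # a slower subscriber sees all three
print(DWH_FIRST, DWH_SECOND, AUDIT_ALL)
```

Because each subscriber tracks its own consumed rows, the two consumers can read the same changes at different intervals without interfering with each other.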
GoldenGate and ODI Integration
•JKM Oracle to Oracle Consistent (OGG) Knowledge Module
‣ODI Metadata used to generate GoldenGate parameter files (extract, pump, replicat) and configuration files
‣Delivered with ODI
•ODI CDC Framework generated
‣Staging table - replica of source
‣J$ (journal) table - change rows
•Journalized data used in transformations (via JV$ views)
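The idea of generating parameter files from metadata can be sketched as follows. This is a heavily simplified illustration, not the output of the JKM: the group names, trail path, definitions file path and schema names are hypothetical, and real parameter files also need login credentials and further options.

```python
def extract_params(group, trail, tables):
    """Build a minimal extract parameter file from (schema, table) metadata."""
    lines = [f"EXTRACT {group}", f"EXTTRAIL {trail}"]
    lines += [f"TABLE {schema}.{table};" for schema, table in tables]
    return "\n".join(lines)


def replicat_params(group, source_defs, tables, target_schema):
    """Build a minimal replicat parameter file mapping each table to staging."""
    lines = [f"REPLICAT {group}", f"SOURCEDEFS {source_defs}"]
    lines += [
        f"MAP {schema}.{table}, TARGET {target_schema}.{table};"
        for schema, table in tables
    ]
    return "\n".join(lines)


# Hypothetical metadata: tables placed under CDC in the source schema.
TABLES = [("APP", "LINKS"), ("APP", "PRODUCTS")]

EXTRACT_TEXT = extract_params("EXT1", "./dirdat/aa", TABLES)
REPLICAT_TEXT = replicat_params("REP1", "./dirdef/app.def", TABLES, "STG")
print(EXTRACT_TEXT)
print(REPLICAT_TEXT)
```

Driving the files from one table list keeps the extract and replicat in step when a new table is added to CDC.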
Migration Decisions / Upgrade Planning
•ODI Master repository location
•GoldenGate considerations
‣Installation and configuration (RAC is trickier)
‣Classic vs Integrated capture (requires EE for both source & target)
‣How to use it? Product built for migration and/or replication
‣Naming conventions
•OWB mappings to ODI interfaces
‣Various migration approaches
•Control, Monitoring & Alerting (no free lunch)
•Testing & Go-live approach
Migration Steps
•Migrate OLTP applications to RAC
‣GoldenGate RAC target kept in-sync during application migration
•Performance tuning & ODI KM Modifications
‣Retain existing CDC framework objects when adding new tables
‣Update column mapping in replicat
‣Remove unnecessary code in Integration Knowledge Module
•Generate GoldenGate extract, pump and replicat
‣ODI Journalizing Knowledge Module
‣Source definitions file recommended
•Migrate OWB mappings to ODI interfaces
‣3Rs: re-assess, replicate and refine existing mappings
•Test the migration
‣Run both systems in parallel and compare results
‣Trends, aggregates, row counts
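The parallel-run comparison can be sketched as a check of row counts and aggregates between the two loads. This is a minimal example using in-memory SQLite tables; the table names `fact_owb` and `fact_odi` and the helper functions are hypothetical stand-ins for the old and new fact tables.

```python
import sqlite3


def profile(conn, table, measures):
    """Row count plus the sum of each measure column for one table."""
    sums = ", ".join(f"COALESCE(SUM({m}), 0)" for m in measures)
    row = conn.execute(f"SELECT COUNT(*), {sums} FROM {table}").fetchone()
    return dict(zip(["row_count"] + list(measures), row))


def compare_loads(conn, old_table, new_table, measures):
    """Return only the metrics that differ, as (old, new) pairs."""
    old = profile(conn, old_table, measures)
    new = profile(conn, new_table, measures)
    return {name: (old[name], new[name]) for name in old if old[name] != new[name]}


conn = sqlite3.connect(":memory:")
rows = [("a", 1, 1), ("b", 1, 0), ("c", 0, 0)]
for table in ("fact_owb", "fact_odi"):
    conn.execute(
        f"CREATE TABLE {table} (link TEXT, actioned INTEGER, taken_down INTEGER)"
    )
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", rows)

MEASURES = ["actioned", "taken_down"]
MATCHING = compare_loads(conn, "fact_owb", "fact_odi", MEASURES)

# Simulate one system applying a take-down that the other missed.
conn.execute("UPDATE fact_odi SET taken_down = 1 WHERE link = 'b'")
DRIFTED = compare_loads(conn, "fact_owb", "fact_odi", MEASURES)
print(MATCHING, DRIFTED)
```

An empty result means the two systems agree on these metrics; the same profile could be grouped by day to compare trends.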
Control, Alerting and Monitoring
•GoldenGate status and lag
•ODI Agent monitoring
•ETL throughput / health: ODI session tables
•Enterprise Manager job scheduler to control ETL process
•Monitoring dashboard
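A status and lag check might be scripted along these lines. The sample text only imitates the general layout of GGSCI `INFO ALL` output and was not captured from IFPI's environment; the group names and the 60-second threshold are assumptions.

```python
# Illustrative sample only; not real output from the IFPI system.
SAMPLE_INFO_ALL = """\
Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING
EXTRACT     RUNNING     EXT1        00:00:02      00:00:05
EXTRACT     RUNNING     PMP1        00:00:00      00:00:03
REPLICAT    ABENDED     REP1        00:12:40      00:30:11
"""


def lag_seconds(hms: str) -> int:
    """Convert an HH:MM:SS lag value to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def find_alerts(info_all_text: str, max_lag_seconds: int = 60):
    """Return (group, status, lag) for processes that are down or lagging."""
    alerts = []
    for line in info_all_text.splitlines():
        fields = line.split()
        if len(fields) < 5 or fields[0] not in ("EXTRACT", "REPLICAT"):
            continue  # skip the header, blank lines and the manager row
        _program, status, group, lag = fields[:4]
        lag_s = lag_seconds(lag)
        if status != "RUNNING" or lag_s > max_lag_seconds:
            alerts.append((group, status, lag_s))
    return alerts


ALERTS = find_alerts(SAMPLE_INFO_ALL)
print(ALERTS)
```

The resulting list could feed an alerting email or the monitoring dashboard mentioned above.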
Upgrade Results
•Reduced lag
‣From 5-15 minutes to <1 minute
•Stabilised fact mapping with equivalent load volumes
‣Pre-upgrade: 2 minutes to hours
‣Post-upgrade: 10 - 25 seconds
•Reduced ETL downtime
‣From 2+ days per month to minutes per month
•Simpler to extend tables under CDC
•Purging audit information takes <1 hour rather than days
Upgrade Effects
•Faster troubleshooting & diagnosis times
•Shorter maintenance & development times
•Focus on performance and streamlining processes
•Investigation into excessive redo volumes
•MDM project kick-off
•Contemplation of The Reference Architecture…
Reference Architecture - Hybrid Layer
•Staging Data Layer
‣Buffers reception for right-time distribution
‣Apply business rules to make the data clean, consistent and complete
‣Retain rejected data for manual/automatic correction
•Performance Layer
‣Dimensional model - star schema
‣Permanent & non-volatile data (traditionally speaking)
•Hybrid Layer
‣Caters for deeply volatile data by persisting historic and real-time facts
‣Combines elements of staging and performance layers
‣Facilitates agile de-coupled ETL processes