Ronalao termpresent
1. ETL and OLAP Cube Reporting Using the NetFlix OLTP Database By: Rona Charlene Lao
2. Introduction This project builds a Data Warehouse database from the Netflix database created in the first week's assignment. Objectives: To provide an end-to-end solution to upload transactional data into the Data Warehouse. To provide dynamic reports for NetFlix showing various representations of their aggregated data based on Rental, Shipment, Payment and DVD Inventory. To demonstrate how OLAP is used to provide dynamic multidimensional reports.
3. Scope To create mock-up data to be uploaded into the Data Warehouse. To build a complete end-to-end ETL solution. To use SQL*Loader, stored procedures and triggers to implement business transformation rules from the Staging Area to the Target Area. To create canned reports and demonstrate how Data Warehouses can provide dynamic multidimensional reports.
4. Out of Scope To build the OLTP database from scratch. To code all business and functional rules related to Netflix data storage and operational requirements.
7. Process Flow - Extract SQL Queries: SQL queries were run against the NetFlix OLTP Database to extract the data for the dimension tables. The extracts were saved as CSV files. SQL*Loader: This tool was used to upload the CSV files into the Staging Area of the DW database. Stored Procedures: Used to extract data for the Member and DVD dimension tables and for the fact tables. The fact table stored procedures take two parameters, startdt and enddt.
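The date-window extract described above can be sketched as follows. This is only an illustration in Python with an in-memory SQLite database, not the project's actual Oracle stored procedure: the table names `rental` and `stg_rental_f`, their columns, and the function `extract_rentals` are assumptions made for the example. Only the two parameters, startdt and enddt, come from the slides.

```python
import sqlite3


def extract_rentals(conn, startdt, enddt):
    """Copy rentals dated within [startdt, enddt] into the staging table.

    Hypothetical stand-in for a fact-table extract procedure; dates are
    ISO-8601 strings, so string comparison matches date order.
    """
    conn.execute(
        "INSERT INTO stg_rental_f (rental_id, member_id, rental_date) "
        "SELECT rental_id, member_id, rental_date FROM rental "
        "WHERE rental_date BETWEEN ? AND ?",
        (startdt, enddt),
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM stg_rental_f").fetchone()[0]


# Assumed miniature OLTP table and staging table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE rental (rental_id INTEGER PRIMARY KEY, member_id INTEGER, rental_date TEXT);
    CREATE TABLE stg_rental_f (rental_id INTEGER, member_id INTEGER, rental_date TEXT);
    INSERT INTO rental VALUES (1, 10, '2010-01-05'), (2, 11, '2010-01-20'), (3, 10, '2010-02-02');
""")

# Extract only the January rentals; the February row stays behind.
staged = extract_rentals(conn, "2010-01-01", "2010-01-31")
staged_ids = [r[0] for r in conn.execute("SELECT rental_id FROM stg_rental_f ORDER BY rental_id")]
```

Passing the window as parameters lets the same procedure serve both the initial load and later incremental loads.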
9. Process Flow - Transform After the stored procedure for the DVD extract executes, the V_DVD materialized view is refreshed (force). T_STAR_DIM is also updated automatically through a trigger once the STG_MOVIEPERSONROLE_DIM table is populated. The T_STAR_DIM table is a denormalized version of the MOVIEPERSONROLE table. T_MEMBER_DIM is also a denormalized version of a source table.
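A trigger-driven denormalization of this kind might look like the sketch below. It uses SQLite through Python rather than Oracle, and everything beyond the two table names from the slide is assumed: the `person` and `role` lookup tables, all column names, and the trigger name `trg_fill_star_dim` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed lookup tables and column layouts, for illustration only.
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE role (role_id INTEGER PRIMARY KEY, role_name TEXT);
    CREATE TABLE stg_moviepersonrole_dim (movie_id INTEGER, person_id INTEGER, role_id INTEGER);
    CREATE TABLE t_star_dim (movie_id INTEGER, person_name TEXT, role_name TEXT);

    -- Each staged row is written to the target with its ids resolved to names.
    CREATE TRIGGER trg_fill_star_dim
    AFTER INSERT ON stg_moviepersonrole_dim
    BEGIN
        INSERT INTO t_star_dim (movie_id, person_name, role_name)
        SELECT NEW.movie_id, p.name, r.role_name
        FROM person p, role r
        WHERE p.person_id = NEW.person_id AND r.role_id = NEW.role_id;
    END;

    INSERT INTO person VALUES (1, 'Person A'), (2, 'Person B');
    INSERT INTO role VALUES (1, 'Actor'), (2, 'Director');

    -- Populating the staging table fires the trigger once per row.
    INSERT INTO stg_moviepersonrole_dim VALUES (100, 1, 1), (100, 2, 2);
""")

star_rows = conn.execute(
    "SELECT movie_id, person_name, role_name FROM t_star_dim ORDER BY person_name"
).fetchall()
```

The target rows carry the names directly, so reports can read T_STAR_DIM without joining back to the lookup tables.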
10. Process Flow - Load The stored procedure POP_TARGET_SP moves the data from the Staging Area (STG_) to its corresponding table in the Target Area (T_) within the DW database. It only takes the records that are not already in the Target Area. This ensures that the process runs on only a subset of the data while preserving historical data in the Target Fact Tables (T_*_F). It uses NOT IN statements to ensure that there is no duplication. The loads are listed in sequence to preserve and abide by the integrity constraints set up in the Target Area.
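The NOT IN load pattern can be illustrated with a minimal sketch, again using SQLite through Python instead of the Oracle procedure itself. The function `pop_target`, the tables `stg_rental_f` and `t_rental_f`, and their columns are assumptions for the example and are not taken from POP_TARGET_SP.

```python
import sqlite3


def pop_target(conn):
    """Move staged rows into the target table, skipping keys already loaded.

    Returns the number of rows added by this run.
    """
    before = conn.execute("SELECT COUNT(*) FROM t_rental_f").fetchone()[0]
    conn.execute(
        "INSERT INTO t_rental_f (rental_id, amount) "
        "SELECT rental_id, amount FROM stg_rental_f "
        "WHERE rental_id NOT IN (SELECT rental_id FROM t_rental_f)"
    )
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM t_rental_f").fetchone()[0]
    return after - before


conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed miniature staging and target fact tables.
    CREATE TABLE stg_rental_f (rental_id INTEGER, amount REAL);
    CREATE TABLE t_rental_f (rental_id INTEGER PRIMARY KEY, amount REAL);
    INSERT INTO t_rental_f VALUES (1, 4.99);                      -- historical row
    INSERT INTO stg_rental_f VALUES (1, 4.99), (2, 3.99), (3, 5.99);
""")

first_run = pop_target(conn)   # only rentals 2 and 3 are new
second_run = pop_target(conn)  # rerunning adds nothing
total = conn.execute("SELECT COUNT(*) FROM t_rental_f").fetchone()[0]
```

Because rows already in the target are skipped, the load can be rerun safely and historical rows are never overwritten. One caveat with this pattern: if the subquery can return a NULL key, NOT IN matches no rows, so the key column should be non-nullable, as the primary key is here.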
20. Incremental Load Created mock-up data. Performed CSV extracts. Ran SQL*Loader. Ran the stored procedures for the population of the Staging Area. Ran the stored procedure for the population of the Target Area. Refreshed the online cubes. Recreated the offline cubes.
21. Demo Please see the demo.avi file in the ronalao_term.zip file