1. Introduction to Data Warehousing
December 20, 2012
Tameem Ahmad
M.Tech. (F)
ZHCET, AMU, Aligarh
12/27/2012 Tameem Ahmad 2
3. Plan for the Presentation
• Necessity of Data Warehousing (Why is it needed?)
• What is Data Warehousing?
• Architecture
• Schema
• How to build a Data Warehouse (components)
• Data Warehousing Tools
4. Why a Data Warehouse?
Necessity is the mother of invention…
5. Scenario
• ABC Pvt. Ltd. is a company with branches in Mumbai, Delhi, Chennai and Bangalore. The Sales Manager wants a quarterly sales report, but each branch has a separate operational system.
6. Scenario: ABC Pvt. Ltd.
Diagram: the Mumbai, Delhi, Chennai and Bangalore branch systems are separate, and the Sales Manager needs sales per item type per branch for the first quarter.
7. Solution: ABC Pvt. Ltd.
Extract sales information from each
database.
Store the information in a common
repository at a single site.
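The extract-and-store step above can be sketched in Python with the standard sqlite3 module. This is a minimal illustration under stated assumptions, not the author's implementation. Each branch database is simulated as an in-memory SQLite database with a hypothetical `sales(item_type, sale_date, units)` table, and all sales figures are invented. The common repository is another SQLite database that tags every row with its branch. With that in place, the Sales Manager's first-quarter report becomes a single query.

```python
import sqlite3

# Hypothetical branch data: each branch keeps its own operational database.
BRANCH_SALES = {
    "Mumbai":    [("Pen", "2012-01-15", 120), ("Book", "2012-02-03", 40)],
    "Delhi":     [("Pen", "2012-03-10", 80), ("Book", "2012-04-02", 25)],
    "Chennai":   [("Book", "2012-01-20", 60)],
    "Bangalore": [("Pen", "2012-02-28", 95)],
}

def make_branch_db(rows):
    """Create an in-memory operational database for one branch."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sales (item_type TEXT, sale_date TEXT, units INTEGER)")
    db.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    return db

def build_warehouse(branch_dbs):
    """Extract sales from every branch and store them in one common repository."""
    wh = sqlite3.connect(":memory:")
    wh.execute("CREATE TABLE sales (branch TEXT, item_type TEXT, sale_date TEXT, units INTEGER)")
    for branch, db in branch_dbs.items():
        rows = db.execute("SELECT item_type, sale_date, units FROM sales").fetchall()
        # Tag each extracted row with the branch it came from.
        wh.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)",
                       [(branch, *row) for row in rows])
    return wh

branch_dbs = {name: make_branch_db(rows) for name, rows in BRANCH_SALES.items()}
warehouse = build_warehouse(branch_dbs)

# Sales per item type per branch for the first quarter (January to March).
report = warehouse.execute("""
    SELECT branch, item_type, SUM(units)
    FROM sales
    WHERE sale_date BETWEEN '2012-01-01' AND '2012-03-31'
    GROUP BY branch, item_type
    ORDER BY branch, item_type
""").fetchall()
for row in report:
    print(row)
```

Delhi's April sale of books falls outside the first quarter, so it is excluded from the report.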
8. Solution: ABC Pvt. Ltd.
Diagram: the Mumbai, Delhi, Chennai and Bangalore systems feed a data warehouse. Query and analysis tools read the warehouse and produce a report for the Sales Manager.
9. Data Warehousing…
• Definition
A data warehouse is a
» subject-oriented,
» integrated,
» time-variant, and
» nonvolatile
collection of data in support of management’s decision-making process.
10. Subject-oriented
• A data warehouse is organized around major subjects such as sales, product and customer.
• It focuses on modeling and analyzing data for decision makers.
• It excludes data that is not useful in the decision-support process.
11. Integration
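• A data warehouse is constructed by integrating multiple heterogeneous sources.
• Data preprocessing techniques (data cleaning and data integration) are applied to ensure consistency, e.g. in naming conventions, encoding structures and units of measure.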
Diagram: data from an RDBMS, a legacy system and flat files passes through data processing and data transformation into the data warehouse.
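Integration can be sketched with two hypothetical sources that follow conflicting conventions. The first supplies rows from a relational DBMS, with ISO dates, M/F gender codes and amounts in rupees. The second is a legacy flat file, with day/month/year dates, 1/0 gender codes and amounts in paise. The field names, the assumed mapping of 1 to M, and the sample values are all invented for illustration. The point is that each source gets its own transformation into one shared format before loading.

```python
import csv
import io
from datetime import datetime

# Hypothetical source 1: rows already fetched from a relational DBMS.
rdbms_rows = [
    {"cust": "C01", "gender": "M", "sale_date": "2012-03-05", "amount_inr": 250.0},
]

# Hypothetical source 2: a legacy flat file with different conventions
# (day/month/year dates, 1/0 gender codes, amounts stored in paise).
legacy_file = io.StringIO("C02,1,07/03/2012,18000\nC03,0,15/03/2012,9950\n")

def from_rdbms(row):
    """Map an RDBMS row onto the shared warehouse format."""
    return {"customer": row["cust"],
            "gender": row["gender"],
            "date": datetime.strptime(row["sale_date"], "%Y-%m-%d").date(),
            "amount_inr": round(row["amount_inr"], 2)}

def from_legacy(fields):
    """Map a legacy flat-file record onto the same shared format."""
    cust, gender_code, date_text, paise = fields
    return {"customer": cust,
            "gender": "M" if gender_code == "1" else "F",  # assumed coding
            "date": datetime.strptime(date_text, "%d/%m/%Y").date(),
            "amount_inr": round(int(paise) / 100, 2)}

integrated = [from_rdbms(r) for r in rdbms_rows]
integrated += [from_legacy(f) for f in csv.reader(legacy_file)]
for rec in integrated:
    print(rec)
```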
12. Time-variant
• Provides information from a historical perspective, e.g. over the past 5 to 10 years.
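Time variance can be pictured with a hypothetical sales table in which every row carries explicit year and quarter columns. Because history is kept rather than overwritten, a multi-year trend is an ordinary query. The table, columns and figures are invented for this sketch.

```python
import sqlite3

wh = sqlite3.connect(":memory:")
# Every row carries an explicit time element (here, year and quarter).
wh.execute("CREATE TABLE sales (year INTEGER, quarter INTEGER, branch TEXT, units INTEGER)")
wh.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    (2008, 1, "Delhi", 300), (2010, 1, "Delhi", 340),
    (2012, 1, "Delhi", 410), (2012, 1, "Mumbai", 500),
])

# Historical trend for one branch across the stored years.
trend = wh.execute("""
    SELECT year, SUM(units) FROM sales
    WHERE branch = 'Delhi' AND year BETWEEN 2008 AND 2012
    GROUP BY year ORDER BY year
""").fetchall()
print(trend)  # [(2008, 300), (2010, 340), (2012, 410)]
```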
13. Nonvolatile
• Data, once recorded, is not updated.
• A data warehouse requires only two operations for data access:
– Initial loading of data
– Access of data
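The two-operation rule can be enforced mechanically. The sketch below is one possible approach, not a standard warehouse feature. It uses SQLite triggers with `RAISE(ABORT, ...)` to reject UPDATE and DELETE on a hypothetical `sales` table, so that only loading (INSERT) and access (SELECT) remain possible. The table and the data are assumptions made for the illustration.

```python
import sqlite3

wh = sqlite3.connect(":memory:")
# Triggers reject in-place changes, leaving only INSERT (load) and SELECT (access).
wh.executescript("""
CREATE TABLE sales (period TEXT, branch TEXT, units INTEGER);
CREATE TRIGGER no_update BEFORE UPDATE ON sales
BEGIN SELECT RAISE(ABORT, 'warehouse data is read-only'); END;
CREATE TRIGGER no_delete BEFORE DELETE ON sales
BEGIN SELECT RAISE(ABORT, 'warehouse data is read-only'); END;
""")

# Operation 1: initial loading of data.
wh.executemany("INSERT INTO sales VALUES (?, ?, ?)",
               [("2012-Q1", "Delhi", 410), ("2012-Q1", "Mumbai", 500)])

# Operation 2: access of data.
total = wh.execute("SELECT SUM(units) FROM sales").fetchone()[0]
print(total)  # 910

# Any attempt to modify recorded data is rejected by the trigger.
rejected = False
try:
    wh.execute("UPDATE sales SET units = 0 WHERE branch = 'Delhi'")
except sqlite3.DatabaseError as err:
    rejected = True
    print("rejected:", err)
```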
15. Data Warehousing Architecture (Contd.)
• Data warehouse server
– almost always a relational DBMS, rarely flat files
• OLAP servers
– to support and operate on multidimensional data structures
• Clients
– Query and reporting tools
– Analysis tools
– Data mining tools
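A toy in-memory cube in plain Python shows what operating on multidimensional data means. It is not how an OLAP server is implemented. The dictionary layout, the dimension names and the figures are assumptions for the sketch. It demonstrates two typical OLAP operations. A slice fixes one dimension to a single value. A roll-up by dimension reduction aggregates one or more dimensions away.

```python
from collections import defaultdict

# Hypothetical cube cells: (product, quarter, region) -> units sold.
cube = {
    ("Pen",  "Q1", "North"): 120, ("Pen",  "Q1", "South"): 95,
    ("Pen",  "Q2", "North"): 80,  ("Book", "Q1", "North"): 40,
    ("Book", "Q2", "South"): 60,
}
DIMS = ("product", "quarter", "region")

def slice_cube(cube, dim, value):
    """Slice: keep only the cells where one dimension equals a given value."""
    i = DIMS.index(dim)
    return {k: v for k, v in cube.items() if k[i] == value}

def roll_up(cube, keep):
    """Roll-up: sum over every dimension not listed in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    out = defaultdict(int)
    for key, units in cube.items():
        out[tuple(key[i] for i in idx)] += units
    return dict(out)

print(roll_up(cube, ["product"]))                               # {('Pen',): 295, ('Book',): 100}
print(roll_up(slice_cube(cube, "quarter", "Q1"), ["region"]))   # {('North',): 160, ('South',): 95}
```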
17. Measures & Dimensions
• Measures: the numeric values being analyzed, e.g. units sold, amount.
• Dimensions: the perspectives along which measures are analyzed, e.g. product, time, region.
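The distinction can be sketched with a hypothetical `SaleFact` record. Dimension attributes (product, time, region) are used to group the data. Measures (units sold, amount) are the values that get summed. The record type and the sample values are invented.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SaleFact:
    # Dimensions: the perspectives used to group and filter.
    product: str
    time: str
    region: str
    # Measures: the numeric values that are aggregated.
    units_sold: int
    amount: float

facts = [
    SaleFact("Pen", "2012-Q1", "North", 120, 600.0),
    SaleFact("Pen", "2012-Q1", "South", 95, 475.0),
    SaleFact("Book", "2012-Q1", "North", 40, 2000.0),
]

def totals_by(facts, dimension):
    """Sum both measures for each value of the chosen dimension."""
    out = {}
    for f in facts:
        key = getattr(f, dimension)
        units, amount = out.get(key, (0, 0.0))
        out[key] = (units + f.units_sold, amount + f.amount)
    return out

print(totals_by(facts, "region"))  # {'North': (160, 2600.0), 'South': (95, 475.0)}
```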
18. Star Schema
• A single, large, central fact table and one table for each dimension.
• Every fact points to one tuple in each of the dimensions and has additional attributes (the measures).
• Does not capture hierarchies directly.
19. Star Schema (Contd.)
• Fact Table: Store Key, Product Key, Period Key, Units, Price
• Store Dimension: Store Key, Store Name, City, State, Region
• Time Dimension: Period Key, Year, Quarter, Month
• Product Dimension: Product Key, Product Desc
Benefits: easy to understand, easy to define hierarchies, reduces the number of physical joins.
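The star schema can be expressed as SQLite DDL and queried from Python's sqlite3 module. Table and column names follow the slide, converted to snake_case. The store names, cities, states, regions and sales figures are hypothetical. City, State and Region sit together in one denormalized store dimension, so a report needs only one join per dimension.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE store_dim   (store_key INTEGER PRIMARY KEY, store_name TEXT,
                          city TEXT, state TEXT, region TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_desc TEXT);
CREATE TABLE time_dim    (period_key INTEGER PRIMARY KEY, year INTEGER,
                          quarter INTEGER, month INTEGER);
-- Central fact table: one foreign key per dimension plus the measures.
CREATE TABLE sales_fact  (store_key INTEGER REFERENCES store_dim,
                          product_key INTEGER REFERENCES product_dim,
                          period_key INTEGER REFERENCES time_dim,
                          units INTEGER, price REAL);

INSERT INTO store_dim VALUES (1, 'Store A', 'Mumbai', 'Maharashtra', 'West'),
                             (2, 'Store B', 'Chennai', 'Tamil Nadu', 'South');
INSERT INTO product_dim VALUES (10, 'Pen'), (11, 'Book');
INSERT INTO time_dim VALUES (100, 2012, 1, 1), (101, 2012, 1, 2);
INSERT INTO sales_fact VALUES (1, 10, 100, 120, 5.0), (1, 11, 101, 40, 50.0),
                              (2, 10, 101, 95, 5.0);
""")

# Star join: the fact table joins directly to each dimension it needs.
rows = db.execute("""
SELECT s.region, p.product_desc, SUM(f.units * f.price) AS revenue
FROM sales_fact f
JOIN store_dim s   ON f.store_key = s.store_key
JOIN product_dim p ON f.product_key = p.product_key
JOIN time_dim t    ON f.period_key = t.period_key
WHERE t.year = 2012 AND t.quarter = 1
GROUP BY s.region, p.product_desc
ORDER BY s.region, p.product_desc
""").fetchall()
print(rows)  # [('South', 'Pen', 475.0), ('West', 'Book', 2000.0), ('West', 'Pen', 600.0)]
```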
20. Snowflake Schema
• Variant of the star schema model.
• A single, large, central fact table and one or more tables for each dimension.
• Dimension tables are normalized, i.e. dimension table data is split into additional tables.
21. Snowflake Schema (Contd.)
• Fact Table: Store Key, Product Key, Period Key, Units, Price
• Store Dimension: Store Key, Store Name, City Key
• City Dimension: City Key, City, State, Region
• Time Dimension: Period Key, Year, Quarter, Month
• Product Dimension: Product Key, Product Desc
Drawbacks: time-consuming joins, slower report generation.
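The same kind of data can be put in snowflake form, again as a hypothetical SQLite sketch. City, State and Region move into a separate city dimension, which the store dimension references through City Key. The Mumbai city row is stored once and shared by two stores, which removes redundancy. The cost is that a query by region now needs an extra join from the store dimension to the city dimension, as the drawbacks on the slide note. All names and figures are invented.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
-- City attributes are normalized out of the store dimension.
CREATE TABLE city_dim    (city_key INTEGER PRIMARY KEY, city TEXT,
                          state TEXT, region TEXT);
CREATE TABLE store_dim   (store_key INTEGER PRIMARY KEY, store_name TEXT,
                          city_key INTEGER REFERENCES city_dim);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_desc TEXT);
CREATE TABLE time_dim    (period_key INTEGER PRIMARY KEY, year INTEGER,
                          quarter INTEGER, month INTEGER);
CREATE TABLE sales_fact  (store_key INTEGER, product_key INTEGER,
                          period_key INTEGER, units INTEGER, price REAL);

INSERT INTO city_dim VALUES (1, 'Mumbai', 'Maharashtra', 'West'),
                            (2, 'Chennai', 'Tamil Nadu', 'South');
INSERT INTO store_dim VALUES (1, 'Store A', 1), (2, 'Store B', 2), (3, 'Store C', 1);
INSERT INTO product_dim VALUES (10, 'Pen');
INSERT INTO time_dim VALUES (100, 2012, 1, 1);
INSERT INTO sales_fact VALUES (1, 10, 100, 120, 5.0), (2, 10, 100, 95, 5.0),
                              (3, 10, 100, 30, 5.0);
""")

# Reaching "region" now needs an extra hop: fact -> store -> city.
rows = db.execute("""
SELECT c.region, SUM(f.units) AS units
FROM sales_fact f
JOIN store_dim s ON f.store_key = s.store_key
JOIN city_dim c  ON s.city_key = c.city_key
JOIN time_dim t  ON f.period_key = t.period_key
WHERE t.year = 2012 AND t.quarter = 1
GROUP BY c.region
ORDER BY c.region
""").fetchall()
print(rows)  # [('South', 95), ('West', 150)]
```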
22. Building the Data Warehouse
• Data Selection
• Data Pre-processing
– Fill missing values
– Remove inconsistency
• Data Transformation & Integration
• Data Loading
Data in the warehouse is stored in the form of fact tables and dimension tables.
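The four build steps can be sketched end to end in plain Python. Everything here is hypothetical. That includes the raw records, the `clerk` field dropped during selection, and the price list used as a second source. The choice of mean imputation for filling missing values, and of name normalization for removing inconsistency, is also an assumption. Loading assigns surrogate keys to dimension members and writes fact rows that reference those keys.

```python
from statistics import mean

# Hypothetical raw records extracted from branch systems.
raw = [
    {"branch": "Delhi",  "item": "pen ", "units": 80,   "clerk": "X1"},
    {"branch": "Delhi",  "item": "Book", "units": None, "clerk": "X2"},
    {"branch": "Mumbai", "item": "PEN",  "units": 120,  "clerk": "Y7"},
]
PRICE_LIST = {"Pen": 5.0, "Book": 50.0}  # hypothetical second source

# 1. Data selection: keep only the attributes needed for analysis.
selected = [{k: r[k] for k in ("branch", "item", "units")} for r in raw]

# 2. Data pre-processing.
known = [r["units"] for r in selected if r["units"] is not None]
for r in selected:
    if r["units"] is None:                 # fill missing values with the mean
        r["units"] = round(mean(known))
    r["item"] = r["item"].strip().title()  # remove inconsistent spellings

# 3. Transformation & integration: derive amount from the price list.
for r in selected:
    r["amount"] = r["units"] * PRICE_LIST[r["item"]]

# 4. Data loading: dimension tables with surrogate keys, plus a fact table.
branch_dim, item_dim, fact_table = {}, {}, []
for r in selected:
    b_key = branch_dim.setdefault(r["branch"], len(branch_dim) + 1)
    i_key = item_dim.setdefault(r["item"], len(item_dim) + 1)
    fact_table.append((b_key, i_key, r["units"], r["amount"]))

print(branch_dim)   # {'Delhi': 1, 'Mumbai': 2}
print(item_dim)     # {'Pen': 1, 'Book': 2}
print(fact_table)   # [(1, 1, 80, 400.0), (1, 2, 100, 5000.0), (2, 1, 120, 600.0)]
```

In a real warehouse these steps would typically be handled by an ETL tool, with the loading step writing to database tables rather than Python dictionaries.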
23. Data Warehousing Tools
• Data Warehouse
– SQL Server 2000 DTS
– Oracle 8i Warehouse Builder
• ETL tools
– Ab Initio
– Informatica
• OLAP tools
– SQL Server Analysis Services
– Oracle Express Server
• Reporting tools
– MS Excel Pivot Chart
– VB Applications
– Cognos
– MicroStrategy
– Hyperion