IBM Software Group
© 2007 IBM Corporation
IBM Software Group | WebSphere software
06/15/14 TCS Confidential 3
Course Roadmap
• Why we use Data warehousing
• Difference between Operational System and Data Warehouse
• Introduction to Data warehousing
• Data Warehousing Approaches
• Data Warehouse Technical Architecture
• Data Modelling concepts
• Operational Data Store
• Schema Design of Data warehouse
• Data Acquisition
• ETL Products
• Project Life Cycle
Why We Need Data Warehousing?
• Better business intelligence for end-users
• Reduction in time to locate, access, and analyze information
• Consolidation of disparate information sources
• Storage of large volumes of historical detail data from mission-critical applications
• Strategic advantage over competitors
• Faster time-to-market for products and services
• Replacement of older, less-responsive decision support systems
• Reduction in demand on IS to generate reports
What Is an Operational System?
• Operational systems are just what their name implies: they are the systems that help us run day-to-day enterprise operations.
• These are the backbone systems of any enterprise, such as order entry and inventory.
• The classic examples are airline reservations, credit-card authorizations, and ATM withdrawals.
Characteristics of Operational Systems
• Continuous availability
• Predefined access paths
• Transaction integrity
• Volume of transaction - High
• Data volume per query - Low
• Used by operational staff
• Supports day to day control operations
• Large number of users
OLTP Vs Data Warehouse
Operational System | Data Warehouse
Transaction processing | Query processing
Predictable CPU usage | Random CPU usage
Time sensitive | History oriented
Operator view | Managerial view
Normalized, efficient design for TP | Denormalized design for query processing
OLTP Vs Warehouse
Operational System | Data Warehouse
Designed for Atomicity, Consistency, Isolation and Durability | Designed for a quiet or static database
Organized by transactions (Order, Input, Inventory) | Organized by subject (Customer, Product)
Relatively smaller database | Large database size
Many concurrent users | Relatively few concurrent users
Volatile data | Non-volatile data
Operational System | Data Warehouse
Stores all data | Stores relevant data
Performance sensitive | Less sensitive to performance
Not flexible | Flexible
Efficiency | Effectiveness
What Is a Data Warehouse?
A Data Warehouse is a
• Subject-Oriented
• Integrated
• Time-Variant
• Non-volatile
collection of data in support of management's decision-making process.
W. H. Inmon, regarded as the father of data warehousing
Subject Oriented Analysis
Transactional Storage (process oriented): Entry, Sales Rep, Quantity Sold, Part Number, Date, Customer Name, Product Description, Unit Price, Mail Address
Data Warehouse Storage (subject oriented): Sales, Customers, Products
Integration of Data
Issue | Transactional Storage | Data Warehouse Storage
Encoding | Appl. A: M, F; Appl. B: 1, 0; Appl. C: X, Y | M, F
Unit of attributes | Appl. A: pipeline cm; Appl. B: pipeline inches; Appl. C: pipeline mcf | pipeline cm
Physical attributes | Appl. A: balance dec(13,2); Appl. B: balance PIC 9(9)V99; Appl. C: balance float | balance dec(13,2)
Naming conventions | Appl. A: bal-on-hand; Appl. B: current_balance; Appl. C: balance | balance
Data consistency | Appl. A: date (Julian); Appl. B: date (yymmdd); Appl. C: date (absolute) | date (Julian)
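The integration step on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slide does not say which source code corresponds to which gender, so the mappings below (1 = M, X = M) are hypothetical, as are the record layout and the `integrate` function. The mcf unit of Appl. C is left out because it is not a length.

```python
# Hypothetical source encodings; the slide only says that the three
# applications encode the same attribute differently.
GENDER_MAPS = {
    "A": {"M": "M", "F": "F"},
    "B": {"1": "M", "0": "F"},   # assumption: 1 means M
    "C": {"X": "M", "Y": "F"},   # assumption: X means M
}
CM_PER_INCH = 2.54


def integrate(app, record):
    """Return the record in the warehouse's common encoding and unit (cm)."""
    gender = GENDER_MAPS[app][record["gender"]]
    length = record["pipeline"]
    if app == "B":  # Appl. B measures the pipeline in inches
        length = length * CM_PER_INCH
    return {"gender": gender, "pipeline_cm": round(length, 2)}


rows = [
    integrate("A", {"gender": "F", "pipeline": 100.0}),
    integrate("B", {"gender": "1", "pipeline": 10.0}),
]
```

After this step both rows carry the same gender codes and the same unit, which is what the "Data Warehouse Storage" side of the slide shows.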
Volatility of Data
Transactional Storage (volatile): record-by-record data manipulation (Insert, Change, Delete, Access)
Data Warehouse Storage (non-volatile): mass load / access of data (Load, Access)
Time Variant Data Analysis
Transactional Storage: current data
Data Warehouse Storage: historical data
[Chart: Sales (in lakhs) by region (East, West, North) for January, February and March; Year 97, 1st Qtr]
Data Warehouse - Differences from Operational Systems
Operational systems database: constant change (Insert, Update, Delete)
• Updated constantly
• Data changes according to need, not a fixed schedule
Data warehouse: consistent points in time (Initial Load, Incremental Load, Load/Update)
• Added to regularly, but loaded data is rarely directly changed
• Does NOT mean the data warehouse is never updated or never changes!
Difference Between OLTP and OLAP
DW Implementation Approaches
• Top Down
• Bottom-up
• Combination of both
• Choices depend on:
  current infrastructure
  resources
  architecture
  ROI
  implementation speed
EDW - "Top Down" Approach
Diagram layers:
• Heterogeneous source systems: Source 1, Source 2, Source 3
• Staging: common staging interface layer
• Enterprise Data Warehouse
• Data mart bus architecture layer: incremental architected data marts DM 1, DM 2, DM 3
EDW - "Bottom Up" Approach
Diagram layers:
• Heterogeneous source systems: Source 1, Source 2, Source 3
• Staging: common staging interface layer
• Data mart bus architecture layer: incremental architected data marts DM 1, DM 2, DM 3
• Enterprise Data Warehouse
Ralph Kimball's Approach
Independent Data Marts: Ralph Kimball's Ideology
Flow: Source Systems (Extract) -> Data Staging Area (Load) -> Presentation Area (Access) -> Data Access Tools
Data Staging Area
• Services: transform from source to target; maintain conformed dimensions; no user query support
• Data store: flat files or relational tables
• Design goals: staging throughput, integrity/consistency
Presentation Area
• Data Mart #1: dimensional; atomic AND summary data; business process centric
• Design goals: ease of use, query performance
• Data Mart #2, Data Mart #...
• Data Mart Bus: conformed facts and dims
Data Access Tools
• Ad hoc query tools
• Report writers
• Analytic applications
• Modeling: forecasting, scoring, data mining
Bottom Up Approach
Diagram: Staging Data Store, Data Marts, Data Warehouse
• E/R design or flat file
• Retain history needed for regular processing
• No end user access

• Dimensional
• Transaction & summary data
• Data mart: single subject area (i.e. fact table)
• Multiple marts may exist in a single database instance

• Integrated data
• Timely user access
• Conformed dimensions
• Single process to build dimension
Bill Inmon's Approach
Dependent Data Marts: Bill Inmon's Ideology
Flow: Source Systems (Extract) -> Data Staging Area (ETL, Load) -> Presentation Area (Access) -> Data Access Tools
"Enterprise Data Warehouse" (DWH)
• Normalized tables
• Atomic data
• User query support to atomic data
Data marts
• Data Mart #1: dimensional; summary data; departmental centric
• Data Mart #2, Data Mart #...
Top Down Approach
Diagram: Flat File, Staging Data Store, Data Warehouse, Data Marts
• Raw input data
• E/R model
• Subject areas
• Transaction level detail
• Historical persistency as justified; archive for retrieval if needed

• Most are dimensional
• Data mart design by business function
• Summary level data

• Integrated data
• Timely user access
• Single process to build dimension
DW Implementation Approaches
Top Down
• More planning and design initially
• Involve people from different workgroups, departments
• Data marts may be built later from Global DW
• Overall data model to be decided up-front
Bottom Up
• Can plan initially without waiting for global infrastructure
• Built incrementally
• Can be built before or in parallel with Global DW
• Less complexity in design
DW Implementation Approaches
Top Down
• Consistent data definition and enforcement of business rules across enterprise
• High cost, lengthy process, time consuming
• Works well when there is a centralized IS department responsible for all H/W and resources
Bottom Up
• Data redundancy and inconsistency between data marts may occur
• Integration requires great planning
• Less cost of H/W and other resources
• Faster pay-back
DW Architectures
[Diagram: data warehouse architecture, summarized by column]
Data Sources
• Transaction data: Prod, Mkt, HR, Fin, Acctg (IBM IMS, VSAM, Oracle, Sybase)
• Other internal data: ERP (SAP), Clickstream (Informix), Web data
• External data: Demographic (Harte-Hanks)
ETL Software
• Extract, Clean/Scrub, Transform, Load (Ascential, Sagent, SAS, Firstlogic, DataStage)
• Staging Area
• Operational Data Store
Data Stores
• Data Warehouse (Teradata, IBM)
• Meta Data
• Data Marts: Finance, Marketing, Sales (Essbase, Microsoft)
Data Analysis Tools and Applications
• SQL, Cognos, SAS, MicroStrategy, Siebel, Business Objects, Web Browser
• Queries, Reporting, DSS/EIS, Data Mining
Users
• Analysts, Managers, Executives, Operational Personnel, Customers/Suppliers
Benefits of DWH
To formulate effective business, marketing
and sales strategies.
To precisely target promotional activity.
To discover and penetrate new markets.
To successfully compete in the marketplace
from a position of informed strength.
To build predictive rather than retrospective models.
Data Modeling
Data Modeling
 WHAT IS A DATA MODEL?
A data model is an abstraction of some aspect of the real
world (system).
 WHY A DATA MODEL?
• Helps to visualize the business
• A model is a means of communication.
• Models help elicit and document requirements.
• Models reduce the cost of change.
• Model is the essence of DW architecture based on which
DW will be implemented
STEPS in DATA MODELING
Problem & scope definition
Requirement Gathering
Analysis
Logical Database Design
Deciding Database
Physical Database design
Schema Generation
Levels of modeling
 Conceptual modeling
Describe data requirements from a
business point of view without technical
details
 Logical modeling
Refine conceptual models
Data structure oriented, platform
independent
 Physical modeling
Detailed specification of what is physically
implemented using specific technology
Modeling Techniques
 Entity-Relationship Modeling
Traditional modeling technique
Technique of choice for OLTP
Suited for corporate data warehouse
 Dimensional Modeling
Analyzing business measures in the specific business context
Helps visualize very abstract business questions
End users can easily understand and navigate the data
structure
Entity-Relationship Modeling - Basic Concepts
• Relationship
  Relationship between entities: structural interaction and association, described by a verb
  Cardinality: 1-1, 1-M, M-M
  Example: Books belong to Printed Media
Entity-Relationship Modeling - Basic Concepts
• Attributes
  Characteristics and properties of entities
  Example: Book Id, Description, book category are attributes of entity "Book"
  Attribute name should be unique and self-explanatory
  Primary Key, Foreign Key, Constraints are defined on Attributes
Review of Logical Modeling Terms & Symbols
 Entities define specific groups of information
Sales Organization
Sales Org ID
Distribution Channel
Entity
Review of Logical Modeling Terms & Symbols
 One or more attribute uniquely identifies an instance of an
entity
Sales Organization
Sales Org ID
Distribution Channel
Identifier
Review of Logical Modeling Terms & Symbols
 The logical model identifies relationships between
entities
Sales Detail
Sales Record ID
Sales Rep
Sales Rep ID
Relationship
Logical Data Model
Sales Detail
Sales Record ID
Customer
Customer ID
Product
Product SKU
Suppliers
Supplier ID
Manufacturing Group
Manufacturing Org ID
Factory
Factory ID
Sales Organization
Sales Org ID
Distribution Channel
Sales Rep
Sales Rep ID
Retail
Market
Product Sales Plan
Plan ID
Wholesale
Industry
Examples: ER Model
Limitations of E-R Modeling
 Poor Performance
 Tend to be very complex and difficult to navigate.
Dimensional Modeling
Dimensional Modeling
• Dimensional modeling uses three basic concepts: measures, facts, dimensions.
• Is powerful in representing the requirements of the business user in the context of database tables.
• Focuses on numeric data, such as values, counts, weights, balances and occurrences.
What Is a Fact?
 A fact is a collection of related data items, consisting of measures
and context data.
 Each fact typically represents a business item, a business
transaction, or an event that can be used in analyzing the business
or business process.
 Facts are measured, “continuously valued”, rapidly changing
information. Can be calculated and/or derived.
 Granularity
The level of detail of data contained in the data warehouse
e.g. Daily item totals by product, by store
Types of Facts
• Additive
  Able to add the facts along all the dimensions
  Discrete numerical measures, e.g. retail sales in $
• Semi-Additive
  Snapshot, taken at a point in time
  Measures of intensity
  Not additive along the time dimension, e.g. account balance, inventory balance
  Added and divided by the number of time periods to get a time-average
• Non-Additive
  Numeric measures that cannot be added across any dimension
  Intensity measures averaged across all dimensions, e.g. room temperature
  Textual facts - AVOID THEM
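The difference between additive and semi-additive facts can be shown with a small Python sketch. The snapshot rows, account names and amounts are made up for illustration; the point is only that deposits can be summed across every dimension, while balances can be summed across accounts but must be averaged across time.

```python
# Hypothetical daily snapshot rows: (day, account, balance, deposits).
snapshots = [
    ("2000-01-01", "acct-1", 100.0, 20.0),
    ("2000-01-01", "acct-2", 50.0, 5.0),
    ("2000-01-02", "acct-1", 120.0, 20.0),
    ("2000-01-02", "acct-2", 40.0, 0.0),
]

# Additive fact: deposits can be summed over accounts and over days.
total_deposits = sum(row[3] for row in snapshots)

# Semi-additive fact: balances may be summed across accounts within a day...
balance_by_day = {}
for day, _account, balance, _deposits in snapshots:
    balance_by_day[day] = balance_by_day.get(day, 0.0) + balance

# ...but across time they are added and divided by the number of periods.
average_balance = sum(balance_by_day.values()) / len(balance_by_day)
```

Summing `balance_by_day` over both days would give 310.0, a figure that describes no real state of the accounts, which is why the time-average is used instead.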
Dimensions
 A dimension is a collection of members or units of the same type
of views.
 Dimensions determine the contextual background for the facts.
 Dimensions represent the way business people talk about the
data resulting from a business process, e.g., who, what, when,
where, why, how
Dimensional Hierarchy
Geography Dimension
World Level: World
Continent Level: America, Europe, Asia
Country Level: USA, Canada, Argentina
State Level: FL, GA, VA, CA, WA
City Level: Tampa, Miami, Orlando, Naples
Each node is a dimension member / business entity; a parent relation links it to the level above.
Attributes: Population, Tourist’s Place
Dimension Types
• Conformed Dimension
• Junk Dimension
• Fast Changing Dimension
• Role Playing Dimension
• ‘Garbage’ Dimension
• Slowly Changing Dimension
• Degenerate Dimension
What is a Slowly Changing Dimension?
 Although dimension tables are typically static lists, most dimension tables do change over
time.
 Since these changes are smaller in magnitude compared to changes in fact tables, these
dimensions are known as slowly growing or slowly changing dimensions.
Slowly Changing Dimension -Classification
Slowly changing dimensions are classified into three different
types
 TYPE I
 TYPE II
 TYPE III
Slowly Changing Dimensions Type I
Before the change
Source: Emp id 1001 | Name Shane | Email Shane@xyz.com
Target: Emp id 1001 | Name Shane | Email Shane@xyz.com
After the change (the old value Shane@xyz.com is overwritten)
Source: Emp id 1001 | Name Shane | Email Shane@abc.co.in
Target: Emp id 1001 | Name Shane | Email Shane@abc.co.in
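A Type I change simply overwrites the dimension row, so no history survives. A minimal Python sketch of that behaviour follows; the dictionary layout and the function name `apply_type1` are illustrative assumptions, not part of any ETL tool.

```python
def apply_type1(target, source_row):
    """Overwrite the dimension row in place; the previous values are lost."""
    target[source_row["emp_id"]] = dict(source_row)
    return target


# Target dimension keyed by the natural key (Emp id), as on the slide.
target = {1001: {"emp_id": 1001, "name": "Shane", "email": "Shane@xyz.com"}}
apply_type1(target, {"emp_id": 1001, "name": "Shane", "email": "Shane@abc.co.in"})
```

After the call the target still holds one row for employee 1001, now carrying the new email address.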
Slowly Changing Dimensions Type II
Source
Emp id | Name | Email
10 | Shane | Shane@xyz.com
Target
PM_PRIMARYKEY | Emp id | Name | Email | PM_VERSION_NUMBER
1000 | 10 | Shane | Shane@xyz.com | 0
Slowly Changing Dimensions - Versioning
Source
Emp id | Name | Email
10 | Shane | Shane@abc.co.in
Target
PM_PRIMARYKEY | Emp id | Name | Email | PM_VERSION_NUMBER
1000 | 10 | Shane | Shane@xyz.com | 0
1001 | 10 | Shane | Shane@abc.co.in | 1
Slowly Changing Dimensions - Versioning
Source
Emp id | Name | Email
10 | Shane | Shane@abc.com
Target
PM_PRIMARYKEY | Emp id | Name | Email | PM_VERSION_NUMBER
1000 | 10 | Shane | Shane@xyz.com | 0
1001 | 10 | Shane | Shane@abc.co.in | 1
1003 | 10 | Shane | Shane@abc.com | 2
Slowly Changing Dimensions Type II - Flag
Source
Emp id | Name | Email
10 | Shane | Shane@xyz.com
Target
PM_PRIMARYKEY | Emp id | Name | Email | PM_CURRENT_FLAG
1000 | 10 | Shane | Shane@xyz.com | Y
Slowly Changing Dimensions - Flag Current
Source
Emp id | Name | Email
10 | Shane | Shane@abc.co.in
Target
PM_PRIMARYKEY | Emp id | Name | Email | PM_CURRENT_FLAG
1000 | 10 | Shane | Shane@xyz.com | N
1001 | 10 | Shane | Shane@abc.co.in | Y
Slowly Changing Dimensions - Flag Current
Source
Emp id | Name | Email
10 | Shane | Shane@abc.com
Target
PM_PRIMARYKEY | Emp id | Name | Email | PM_CURRENT_FLAG
1000 | 10 | Shane | Shane@xyz.com | N
1001 | 10 | Shane | Shane@abc.co.in | N
1003 | 10 | Shane | Shane@abc.com | Y
Slowly Changing Dimensions Type II
Source
Emp id | Name | Email
10 | Shane | Shane@xyz.com
Target
PM_PRIMARYKEY | Emp id | Name | Email | PM_BEGIN_DATE | PM_END_DATE
1000 | 10 | Shane | Shane@xyz.com | 01/01/00 | (empty)
Slowly Changing Dimensions - Effective Date
Source
Emp id | Name | Email
10 | Shane | Shane@abc.co.in
Target
PM_PRIMARYKEY | Emp id | Name | Email | PM_BEGIN_DATE | PM_END_DATE
1000 | 10 | Shane | Shane@xyz.com | 01/01/00 | 03/01/00
1001 | 10 | Shane | Shane@abc.co.in | 03/01/00 | (empty)
Slowly Changing Dimensions - Effective Date
Source
Emp id | Name | Email
10 | Shane | Shane@abc.com
Target
PM_PRIMARYKEY | Emp id | Name | Email | PM_BEGIN_DATE | PM_END_DATE
1000 | 10 | Shane | Shane@xyz.com | 01/01/00 | 03/01/00
1001 | 10 | Shane | Shane@abc.co.in | 03/01/00 | 05/02/00
1003 | 10 | Shane | Shane@abc.com | 05/02/00 | (empty)
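The effective-date variant of Type II can be sketched as follows: the currently open row is closed with an end date and a new open row is appended under a fresh surrogate key. The function `apply_type2`, the row layout and the sequential key generator are assumptions made for this sketch; real tools generate surrogate keys in their own way, so the keys need not match the ones on the slides.

```python
from itertools import count


def apply_type2(target, source_row, change_date, keys):
    """Close the open row for this employee, then append a new open row."""
    current = next(
        (row for row in target
         if row["emp_id"] == source_row["emp_id"] and row["end_date"] is None),
        None,
    )
    if current is not None:
        if current["email"] == source_row["email"]:
            return target  # nothing changed, keep the open row as it is
        current["end_date"] = change_date
    target.append({"pk": next(keys), **source_row,
                   "begin_date": change_date, "end_date": None})
    return target


keys = count(1000)  # hypothetical surrogate key generator
target = []
apply_type2(target, {"emp_id": 10, "name": "Shane", "email": "Shane@xyz.com"},
            "01/01/00", keys)
apply_type2(target, {"emp_id": 10, "name": "Shane", "email": "Shane@abc.co.in"},
            "03/01/00", keys)
```

The versioning and current-flag variants follow the same pattern; only the bookkeeping column differs (a version counter or a Y/N flag instead of the date pair).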
Slowly Changing Dimensions Type III
Source
Emp id | Name | Email
10 | Shane | Shane@xyz.com
Target
PM_PRIMARYKEY | Emp id | Name | Email | PM_Prev_ColumnName | PM_EFFECT_DATE
1 | 10 | Shane | Shane@xyz.com | (empty) | 01/01/00
Slowly Changing Dimensions Type III
Source
Emp id | Name | Email
10 | Shane | Shane@abc.co.in
Target
PM_PRIMARYKEY | Emp id | Name | Email | PM_Prev_ColumnName | PM_EFFECT_DATE
1 | 10 | Shane | Shane@abc.co.in | Shane@xyz.com | 01/02/00
Slowly Changing Dimensions Type III
Source
Emp id | Name | Email
10 | Shane | Shane@abc.com
Target
PM_PRIMARYKEY | Emp id | Name | Email | PM_Prev_ColumnName | PM_EFFECT_DATE
1 | 10 | Shane | Shane@abc.com | Shane@abc.co.in | 01/03/00
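Type III keeps exactly one previous value in an extra column of the same row. A minimal Python sketch, in which the function name `apply_type3` and the column names are assumptions chosen to mirror the slides:

```python
def apply_type3(row, new_email, effect_date):
    """Shift the current value into the 'previous' column, then overwrite it."""
    if new_email != row["email"]:
        row["prev_email"] = row["email"]
        row["email"] = new_email
        row["effect_date"] = effect_date
    return row


row = {"pk": 1, "emp_id": 10, "name": "Shane",
       "email": "Shane@xyz.com", "prev_email": None, "effect_date": "01/01/00"}
apply_type3(row, "Shane@abc.co.in", "01/02/00")
apply_type3(row, "Shane@abc.com", "01/03/00")
```

After the second change the original address Shane@xyz.com is gone: only the current and the immediately preceding value are retained.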
Degenerate Dimension
 Dimension keys in fact table without corresponding dimension tables are
called Degenerate Dimensions
 Purpose of Degenerate Dimensions
1. Generally used when each record in fact represents transaction line item
2. Useful for grouping transaction line items belonging to a single
transaction
Fast Changing Dimension
A fast changing dimension is a dimension whose attribute or
attributes for a record (row) change rapidly over time.
1. Example: Age of associates, Income, Daily balance etc.
2. Technique to handle fast changing dimension: Create band
tables
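A band table replaces a rapidly changing value with the key of the range it falls into, so the dimension row changes only when the value crosses a band boundary. The sketch below uses made-up income bands; the band limits, labels and the function `band_key` are illustrative assumptions.

```python
import bisect

# Hypothetical band table: (band_key, inclusive lower bound, label).
INCOME_BANDS = [
    (1, 0, "0-24,999"),
    (2, 25_000, "25,000-49,999"),
    (3, 50_000, "50,000-99,999"),
    (4, 100_000, "100,000+"),
]
LOWER_BOUNDS = [low for _key, low, _label in INCOME_BANDS]


def band_key(income):
    """Return the key of the band whose range contains the given income."""
    if income < 0:
        raise ValueError("income must be non-negative")
    index = bisect.bisect_right(LOWER_BOUNDS, income) - 1
    return INCOME_BANDS[index][0]
```

With these bands, an income moving from 30,000 to 31,000 stays in band 2, so nothing in the dimension needs to change.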
Role Playing Dimension
A single dimension which is expressed differently in a fact table using
views is called a role-playing dimension. This can be achieved by
creating views on dimension table.
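As a sketch of the idea, the example below uses Python's built-in `sqlite3` module to create one physical date dimension and two views over it, one per role. The table, view and column names are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE date_dim (date_key INTEGER PRIMARY KEY, calendar_date TEXT);
CREATE TABLE order_fact (order_id INTEGER, order_date_key INTEGER,
                         ship_date_key INTEGER);

-- One physical dimension, two roles exposed as views.
CREATE VIEW order_date_dim AS
    SELECT date_key AS order_date_key, calendar_date AS order_date FROM date_dim;
CREATE VIEW ship_date_dim AS
    SELECT date_key AS ship_date_key, calendar_date AS ship_date FROM date_dim;

INSERT INTO date_dim VALUES (1, '2000-01-01'), (2, '2000-01-05');
INSERT INTO order_fact VALUES (500, 1, 2);
""")

# The fact table joins to the same dimension twice, once per role.
row = con.execute("""
    SELECT f.order_id, o.order_date, s.ship_date
    FROM order_fact f
    JOIN order_date_dim o ON o.order_date_key = f.order_date_key
    JOIN ship_date_dim s ON s.ship_date_key = f.ship_date_key
""").fetchone()
```

Only `date_dim` has to be loaded and maintained; the views give each role its own column names for queries and reports.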
Conformed Dimension
A conformed dimension means the same thing to each fact table to which it can be joined.
Typically, dimension tables that are referenced or are likely to be referenced by multiple fact tables (multiple dimensional models) are called conformed dimensions.
Conformed Dimension Option #1
• Identical dimensions with same keys, labels, definitions and values
Sales Schema
  SALES Facts: DATE KEY, PRODUCT KEY, STORE KEY, PROMO KEY
  Product dimension: PRODUCT KEY, Product Desc, Brand Desc, Category Desc
Inventory Schema
  INVENTORY Facts: DATE KEY, PRODUCT KEY, STORE KEY
  Product dimension: PRODUCT KEY, Product Desc, Brand Desc, Category Desc
Conformed Dimension Option #2
Subset of base dimension with common labels, definitions and values
Sales Schema
  SALES $ fact: DATE KEY, PRODUCT KEY, STORE KEY, PROMO KEY
  Product dimension: PRODUCT KEY, Product Desc, Brand Desc, Category Desc
  Date dimension: DATE KEY, Day-of-week, Week Desc, Month Desc
Forecast Schema
  SALES $ fact: MONTH KEY, BRAND KEY
  Brand dimension: BRAND KEY, Brand Desc, Category Desc
  Month dimension: MONTH KEY, Month Desc
Example rows
PROD KEY | Prod Desc | Brand Desc | Category Desc
12345 | Cherriors 10 | Cherriors | Cereal
BRAND KEY | Brand Desc | Category Desc
12345 | Cherriors | Cereal
‘Garbage’ Dimension
A garbage dimension is a dimension that consists of low-cardinality columns such as codes, indicators, and status flags.
Approaches to handling these attributes:
• Put the new attributes into existing dimension tables.
• Put the new attributes into the fact table.
• Create new separate dimension tables.
• Create a separate ‘Garbage Dimension’ table.
Junk Dimensions
• Whether to use a junk dimension
  5 indicators, each with 3 values -> 243 (3^5) rows
  5 indicators, each with 100 values -> 10 billion (100^5) rows
• When to insert rows in the dimension
Factless Fact Tables
The two types of factless fact tables are:
 Coverage tables
 Event tracking tables
Factless Fact Tables - Coverage Tables
Coverage tables are required when a primary fact
table is sparse
Example: Tracking products in a store that did not sell
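A coverage table lists every key combination that could have produced a fact, so the rows missing from the sparse fact table can be found by subtraction. The sketch below assumes a (date, store, product) grain and made-up data:

```python
# Coverage table: products on offer in the store that day (no measures).
coverage = {
    ("2000-01-01", "store-1", "cereal"),
    ("2000-01-01", "store-1", "milk"),
    ("2000-01-01", "store-1", "soap"),
}

# Sparse sales fact table: only products that actually sold, with quantity.
sales_fact = {
    ("2000-01-01", "store-1", "milk"): 12,
}

# Products that were on offer but did not sell.
did_not_sell = sorted(coverage - set(sales_fact))
```

Without the coverage table the question "what did not sell?" cannot be answered from the sales fact table alone, because unsold products leave no row there.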
Factless Fact Tables - Event Tracking
These tables are used for tracking an event.
Example: Tracking student attendance
Fact Constellation
• Fact constellation: multiple fact tables share dimension tables. Viewed as a collection of stars, it is therefore called a galaxy schema or fact constellation.
What Is a Data Mart?
• A data mart is a decentralized subset of data, found either in a data warehouse or as a standalone subset, designed to support the unique business unit requirements of a specific decision-support system.
• Data marts have specific business-related purposes, such as measuring the impact of marketing promotions or measuring and forecasting sales performance.
Data Mart
Data Mart
Enterprise
Data Warehouse
Data Marts - Main Features
• Low cost
• Controlled locally rather than centrally, conferring power on the user group
• Contain less information than the warehouse
• Rapid response
• More easily understood and navigated than an enterprise data warehouse
• Within the range of divisional or departmental budgets
Advantages of Datamart over Datawarehouse
• Typically single subject area and fewer dimensions
• Limited feeds
• Very quick time to market (30-120 days to pilot)
• Quick impact on bottom line problems
• Focused user needs
• Limited scope
• Optimum model for DW construction
• Demonstrates ROI
• Allows prototyping
Disadvantages of Data Mart
• Does not provide an integrated view of business information.
• Uncontrolled proliferation of data marts results in redundancy.
• A larger number of data marts is complex to maintain.
• Scalability issues for large numbers of users and increased data volume.
Data marts
• Embedded data marts are marts that are stored within
the central DW. They can be stored relationally as files or
cubes.
• Dependent data marts are marts that are fed directly by
the DW, sometimes supplemented with other feeds, such
as external data.
• Independent data marts are marts that are fed directly
by external sources and do not use the DW.
DM - Types
The Operational Data Store
Why We Need Operational Data Store?
Need
 To obtain a “system of record” that contains the best data that
exists in a legacy environment as a source of information
 Best here implies data to be
Complete
Up to date
Accurate
 In conformance with the organization’s information model
Operational Data Store - Insulated from OLTP
• ODS data resolves data integration issues
• Data physically separated from production environment to insulate it from the processing demands of reporting and analysis
• Access to current data facilitated.
Diagram: OLTP Server, ODS, Tactical Analysis
Operational Data Store - Data
• Detailed data
• Records of business events (e.g. orders capture)
• Data from heterogeneous sources
• Does not store summary data
• Contains current data
ODS- Benefits
 Integrates the data
 Synchronizes the structural differences in data
 High transaction performance
 Serves the operational and DSS environment
 Transaction level reporting on current data
Flat
files
Relational
Database
Operational
Data Store
60,5.2,”JOHN”
72,6.2,”DAVID”
Excel files
Operational Data Store - Update Schedule
ODS data
• Update schedule: daily or less time frequency
• Detail of data is mostly between 30 and 90 days
• Addresses operational needs
Data warehouse data
• Weekly or greater time frequency
• Potentially infinite history
• Addresses strategic needs
OLTP Vs ODS Vs DWH
Characteristic | OLTP | ODS | Data Warehouse
Data redundancy | Non-redundant within system; unmanaged redundancy among systems | Somewhat redundant with operational databases | Managed redundancy
Data stability | Dynamic | Somewhat dynamic | Static
Data update | Field by field | Field by field | Controlled batch
Data usage | Highly structured, repetitive | Somewhat structured, some analytical | Highly unstructured, heuristic or analytical
Database size | Moderate | Moderate | Large to very large
Database structure stability | Stable | Somewhat stable | Dynamic
Star Schema Design
Single fact table surrounded by denormalized dimension tables
The fact table primary key is the composite of the foreign keys (primary keys of dimension tables)
Fact table contains transaction type information.
Many star schemas in a data mart
Easily understood by end users, more disk storage required
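A tiny star schema can be sketched with Python's built-in `sqlite3` module. The table names, columns and sample rows below are hypothetical; the sketch only illustrates a fact table whose primary key is the composite of the dimension keys, queried through a join to a denormalized dimension.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
-- Denormalized dimensions: brand is kept on the product row itself.
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY,
                          product_desc TEXT, brand_desc TEXT);
CREATE TABLE store_dim (store_key INTEGER PRIMARY KEY, city TEXT);

-- Fact table: primary key is the composite of the dimension foreign keys.
CREATE TABLE sales_fact (
    product_key INTEGER REFERENCES product_dim(product_key),
    store_key INTEGER REFERENCES store_dim(store_key),
    sales_amount REAL,
    PRIMARY KEY (product_key, store_key)
);

INSERT INTO product_dim VALUES (1, 'Corn flakes', 'BrandA'), (2, 'Oat bran', 'BrandB');
INSERT INTO store_dim VALUES (10, 'Tampa'), (11, 'Miami');
INSERT INTO sales_fact VALUES (1, 10, 5.0), (1, 11, 7.5), (2, 10, 4.0);
""")

# A typical star query: one join per dimension, then aggregate the measure.
totals = con.execute("""
    SELECT p.brand_desc, SUM(f.sales_amount)
    FROM sales_fact f
    JOIN product_dim p ON p.product_key = f.product_key
    GROUP BY p.brand_desc
    ORDER BY p.brand_desc
""").fetchall()
```

Repeating `brand_desc` on every product row is the extra disk storage the slide mentions; in exchange, the query needs only a single join to reach the brand.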
EXAMPLE OF STAR SCHEMA
Snowflake Schema
Single fact table surrounded by normalized dimension tables
Normalizes dimension tables to save data storage space
Used when dimensions become very large
Less intuitive, slower performance due to joins
May want to use both approaches, especially if supporting multiple end-user tools.
Example of Snowflake Schema
Snowflake - Disadvantages
 Normalization of dimension makes it difficult for user to
understand
 Decreases the query performance because it involves more
joins
 Dimension tables are normally smaller than fact tables - space
may not be a major issue to warrant snowflaking
Data Acquisition
 Data Extraction
 Data Transformation
 Data Loading
Representative DW Tools
Tool Category | Products
ETL Tools | ETI Extract, Informatica, IBM Visual Warehouse, Oracle Warehouse Builder
OLAP Server | Oracle Express Server, Hyperion Essbase, IBM DB2 OLAP Server, Microsoft SQL Server OLAP Services, Seagate HOLOS, SAS/MDDB
OLAP Tools | Oracle Express Suite, Business Objects, Web Intelligence, SAS, Cognos Powerplay/Impromptu, KALIDO, MicroStrategy, Brio Query, MetaCube
Data Warehouse | Oracle, Informix, Teradata, DB2/UDB, Sybase, Microsoft SQL Server, RedBricks
Data Mining & Analysis | SAS Enterprise Miner, IBM Intelligent Miner, SPSS/Clementine, TCS Tools
ETL PRODUCTS
 CODE BASED ETL TOOLS
 GUI BASED ETL TOOLS
CODE BASED ETL TOOLS
 SAS ACCESS
 SAS BASE
 TERADATA ETL TOOLS
1. BTEQ
2. TPUMP
3. FAST LOAD
4. MULTI LOAD
GUI BASED ETL TOOLS
• Informatica
• DT/Studio
• DataStage
• Business Objects Data Integrator (BODI)
• Ab Initio
• Data Junction
• Oracle Warehouse Builder
• Microsoft SQL Server Integration Services
• IBM DB2 Warehouse Center
Extraction Types
Extraction Types
Extraction
• Full Extract
• Periodic/Incremental Extract
Full Extract
Source System
Full Extract
Data Mart
New data
Incremental Extract
Data Mart
Source System
Incremental Extract
Existing data
Incremental
Data
Incremental Extract
Data Mart
Source System
Incremental Extract
New data
Changed data
Existing data
Incremental
Data
Incremental Extract
Data Mart
Source System
Incremental Extract
New data
Changed data
Existing data updated
using changed data
Incremental
Data
Incremental addition
to data mart
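The incremental extract shown in these slides amounts to an upsert: changed rows update existing data and new rows are added. A minimal Python sketch, assuming rows are keyed by a business key and that the extract already contains only new and changed rows (the function name and sample data are illustrative):

```python
def incremental_load(mart, incremental_rows):
    """Update rows whose key already exists in the mart, insert the rest."""
    inserted, updated = 0, 0
    for key, row in incremental_rows.items():
        if key in mart:
            updated += 1
        else:
            inserted += 1
        mart[key] = row
    return inserted, updated


# Existing data in the data mart, keyed by a hypothetical customer id.
mart = {
    1: {"name": "Shane", "city": "Tampa"},
    2: {"name": "Ana", "city": "Miami"},
}
# Incremental data from the source: one changed row and one new row.
incremental = {
    2: {"name": "Ana", "city": "Orlando"},
    3: {"name": "Lee", "city": "Naples"},
}
counts = incremental_load(mart, incremental)
```

Row 1 is untouched because it was not part of the incremental data, which is what distinguishes this from a full extract that reloads everything.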
DATA WAREHOUSE LOADING
Types of Data warehouse Loading
 Target update types
Insert
Update
Types of Data Warehouse Updates
Insert
Full Replace
Selective Replace
Update plus Retain History
Update
Point in Time
Snapshots
New Data
Changed Data
Data Warehouse
Source data Data Staging
New Data and Point-In-Time Data Insert
Source data
New data
OR
Point-in-Time
Snapshot
(e.g.. Monthly)
New Data Added to
Existing Data
Changed Data Insert
Source data
Changed Data Added to
Existing Data
Changed
data
Data Warehouse Life Cycle
[Diagram: Business Requirement; Map Data Sources; Reverse Engg.; Map Req. to OLTP; OLTP System; Logical Modeling; Refine Model; External Data Storage; ETL; Data Warehouse; Enterprise Data Warehouse; Info Access (Reporting tools, Web Browsers, OLAP, Mining)]
Project Life Cycle
 Software Requirement Specification
 High-Level Design (HLD)
 Low-Level Design (LLD)
 Development
 Unit Testing
 System Integration Testing
 Peer Review
 User Acceptance Testing
 Production
 Maintenance
Meta Data in a Data Warehouse
What is Metadata?
• Data about data and the processes
• Metadata is stored in a data dictionary and repository.
• Insulates the data warehouse from changes in the schema of operational systems.
• It serves to identify the contents and location of data in the data warehouse.
Why Do You Need Meta Data?
 Share resources
 Users
 Tools
 Document system
 Without meta data
 Not sustainable
 Not able to fully utilize resources
The Role of Meta Data in the Data Warehouse
Meta Data enables data to become information, because with it you
 Know what data you have and
 Can trust it!
Meta Data Answers…
How have business definitions and terms changed over time?
How do product lines vary across organizations?
What business assumptions have been made?
How do I find the data I need?
What is the original source of the data?
How was this summarization created?
What queries are available to access the data?
Meta Data Process
 Integrated with entire process and data flow
 Populated from beginning to end
 Begin population at design phase of project
 Dedicated resources throughout
 Build
 Maintain
[Diagram: Meta Data and System Monitoring span every stage of the process]
• Design, Mapping
• Extract, Scrub, Transform
• Load, Index, Aggregation
• Replication, Data Set Distribution
• Access & Analysis, Resource Scheduling & Distribution
Types of ETL Meta Data
ETL Meta Data
• Technical Meta Data
• Operational Meta Data
Classification of ETL Meta Data
 Data Warehouse Meta Data
This Meta Data stores descriptive information about the physical implementation details of the data warehouse.
 Source Meta Data
This Meta Data stores information about the source data and the mapping of source data to data warehouse data.
ETL Meta Data
 Transformations & Integrations
This Meta Data holds comprehensive information about transformation and loading.
 Processing Information
This Meta Data stores information about the activities involved in processing the data, such as scheduling and archiving.
 End User Information
This Meta Data records information about user profiles and security.
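
As a rough sketch of how such meta data might be recorded, the example below keeps source-to-target mappings (which would count as technical meta data) next to job-run records (operational meta data). All class names, field names and sample values are hypothetical. They are not taken from the slides or from any metadata repository product.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class ColumnMapping:
    """Source meta data: where a warehouse column comes from."""
    source_table: str
    source_column: str
    target_table: str
    target_column: str
    transformation: str = "copy"


@dataclass
class JobRun:
    """Processing information: one execution of a load job."""
    job_name: str
    started: datetime
    finished: datetime
    rows_loaded: int


@dataclass
class EtlMetadata:
    mappings: list = field(default_factory=list)
    runs: list = field(default_factory=list)

    def lineage(self, target_table, target_column):
        """Answer 'What is the original source of the data?' for one column."""
        return [
            (m.source_table, m.source_column, m.transformation)
            for m in self.mappings
            if m.target_table == target_table
            and m.target_column == target_column
        ]


meta = EtlMetadata()
meta.mappings.append(
    ColumnMapping("ORDERS", "ORD_AMT", "SALES_FACT", "DAILY_SALES", "sum per day")
)
meta.runs.append(
    JobRun("load_sales_fact", datetime(2007, 6, 1, 2, 0),
           datetime(2007, 6, 1, 2, 15), rows_loaded=1200)
)
source_of_sales = meta.lineage("SALES_FACT", "DAILY_SALES")
```

The `lineage` lookup shows how recorded mappings let a user trace a warehouse column back to its source and see how a summarization was created.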
ETL - Planning for the Movement
The following may be helpful when planning the movement:
 Develop an ETL plan
 Specifications
 Implementation
Dwh basics datastage online training

  • 1. ® IBM Software Group © 2007 IBM Corporation
  • 2. ® IBM Software Group © 2007 IBM Corporation
  • 3. IBM Software Group | WebSphere software 3 06/15/14 TCS Confidential 3
  • 4. IBM Software Group | WebSphere software 4 Course Roadmap • Why we use Data warehousing • Difference between Operational System and Data Warehouse • Introduction to Data warehousing • Data Warehousing Approaches • Data Warehouse Technical Architecture • Data Modelling concepts • Operational Data Store • Schema Design of Data warehouse • Data Acquisation • ETL Products • Project Life Cycle
  • 5. IBM Software Group | WebSphere software 5 Why We Need Data Warehousing ?  Better business intelligence for end-users  Reduction in time to locate, access, and analyze information  Consolidation of disparate information sources  To Store Large Volumes of Historical Detail Data from Mission Critical Applications  Strategic advantage over competitors  Faster time-to-market for products and services  Replacement of older, less-responsive decision support systems  Reduction in demand on IS to generate reports
  • 6. IBM Software Group | WebSphere software 6 What is an Operational System?  Operational systems are just what their name implies; they are the systems that help us run the day-to-day enterprise operations.  These are the backbone systems of any enterprise, such as order entry inventory etc.  The classic examples are airline reservations, credit-card authorizations, and ATM withdrawals etc.,
  • 7. IBM Software Group | WebSphere software 7 Characteristics of Operational Systems • Continuous availability • Predefined access paths • Transaction integrity • Volume of transaction - High • Data volume per query - Low • Used by operational staff • Supports day to day control operations • Large number of users
  • 8. IBM Software Group | WebSphere software 8 OLTP Vs Data Warehouse Operational System Data Warehouse Transaction Processing Query Processing Predictable CPU Usage Random CPU Usage Time Sensitive History Oriented Operator View Managerial View Normalized Efficient Design for TP Denormalized Design for Query Processing
  • 9. IBM Software Group | WebSphere software 9 OLTP Vs Warehouse Operational System Data Warehouse Designed for Atmocity, Consistency, Isolation and Durability Designed for quite or static database Organized by transactions (Order, Input, Inventory) Organized by subject (Customer, Product) Relatively smaller database Large database size Many concurrent users Relatively few concurrent users Volatile Data Non Volatile Data
  • 10. IBM Software Group | WebSphere software 10 Operational System Data Warehouse Stores all data Stores relevant data Performance Sensitive Less Sensitive to performance Not Flexible Flexible Efficiency Effectiveness
  • 11. IBM Software Group | WebSphere software 11 What is a Data Warehouse ?  Data WarehouseData Warehouse is a  Subject-Oriented  Integrated  Time-Variant  Non-volatile WH Inmon - Regarded As Father Of Data WarehousingWH Inmon - Regarded As Father Of Data Warehousing
  • 12. ® IBM Software Group © 2007 IBM Corporation
  • 13. IBM Software Group | WebSphere software 13 13 Subject Oriented Analysis Data Warehouse StorageTransactional Storage SalesSales CustomersCustomers ProductsProducts Entry Sales Rep Quantity Sold Part Number Date Customer Name Product Description Unit Price Mail Address Process Oriented Subject Oriented
  • 14. IBM Software Group | WebSphere software 14 14 Integration of Data Data Warehouse StorageTransactional Storage Appl. A - M, F Appl. B - 1, 0 Appl. C - X, Y Appl. A - pipeline cm. Appl. B - pipeline inches Appl. C - pipeline mcf Appl. A - balance dec(13,2) Appl. B - balance PIC 9(9)V99 Appl. C - balance float Appl. A - bal-on-hand Appl. B - current_balance Appl. C - balance Appl. A - date (Julian) Appl. B - date (yymmdd) Appl. C - date (absolute) M, F pipeline cm balance dec(13, 2) balance date (Julian) Integration Encoding Unit of Attributes Physical Attributes Naming Conventions Data Consistency
  • 15. IBM Software Group | WebSphere software 15 15 Load Access Mass Load / Access of DataRecord-by-Record Data Manipulation Insert Access Insert Change Delete Change Volatile Non-Volatile Volatility of Data Data Warehouse StorageTransactional Storage
  • 16. IBM Software Group | WebSphere software 16 16 Time Variant Data Analysis Data Warehouse StorageTransactional Storage Current Data Historical Data 0 5 10 15 20 Sales ( in lakhs ) January February March Year97 Sales ( Region , Year - Year 97 - 1st Qtr) East West North
  • 17. IBM Software Group | WebSphere software Load/ Update Consistent Points in Time Updated constantly Data changes according to need, not a fixed schedule Added to regularly, but loaded data is rarely directly changed Does NOT mean the Data warehouse is never updated or never changes!! Constant Change Operational systems Database Data warehouse Datawarehouse- Differences from Operational Systems Insert Insert Update Initial Load Incremental Load Incremental Load Update Delete
  • 18. Difference Between OLTP and OLAP
  • 19. DW Implementation Approaches: top-down; bottom-up; combination of both. The choice depends on: current infrastructure, resources, architecture, ROI, implementation speed.
  • 20. EDW "Top Down" Approach. [Diagram components: heterogeneous source systems (Source 1, Source 2, Source 3); common staging interface layer (staging); enterprise data warehouse; data mart bus architecture layer with incremental architected data marts (DM 1, DM 2, DM 3)]
  • 21. EDW "Bottom Up" Approach. [Diagram components: heterogeneous source systems (Source 1, Source 2, Source 3); common staging interface layer (staging); data mart bus architecture layer with incremental architected data marts (DM 1, DM 2, DM 3); enterprise data warehouse]
  • 22. Ralph Kimball's Approach (independent data marts). Source systems -> (extract) -> data staging area -> (load) -> presentation area -> (access) -> data access tools.
      Data staging area. Services: transform from source to target; maintain conformed dimensions; no user query support. Data store: flat files or relational tables. Design goals: staging throughput, integrity/consistency.
      Presentation area. Data Mart #1, Data Mart #2, ...: dimensional; atomic AND summary data; business-process centric. Design goals: ease of use, query performance. Data mart bus: conformed facts and dimensions.
      Data access tools: ad hoc query tools, report writers, analytic applications. Modeling: forecasting, scoring, data mining.
  • 23. Bottom Up Approach. Staging data store: E/R design or flat file; retains history needed for regular processing; no end-user access. Data marts: dimensional; transaction and summary data; each data mart covers a single subject area (i.e. fact table); multiple marts may exist in a single database instance. Overall: integrated data; timely user access; conformed dimensions; single process to build dimensions.
  • 25. Bill Inmon's Approach (dependent data marts). Source systems -> (extract) -> data staging area -> (load) -> presentation area -> (access) -> data access tools.
      Presentation area: "Enterprise Data Warehouse" (DWH) with normalized tables, atomic data and user query support to atomic data; ETL from the DWH feeds Data Mart #1, Data Mart #2, ..., which are dimensional, hold summary data and are department centric.
  • 26. Top Down Approach. Staging data store (flat file): raw input data. Data warehouse: E/R model; subject areas; transaction-level detail; historical persistency as justified, archived for retrieval if needed. Data marts: most are dimensional; designed by business function; summary-level data. Overall: integrated data; timely user access; single process to build dimensions.
  • 27. DW Implementation Approaches.
      Top Down: more planning and design initially; involves people from different work-groups and departments; data marts may be built later from the global DW; overall data model to be decided up-front.
      Bottom Up: can plan initially without waiting for global infrastructure; built incrementally; can be built before or in parallel with the global DW; less complexity in design.
  • 28. DW Implementation Approaches (continued).
      Top Down: consistent data definition and enforcement of business rules across the enterprise; high cost, lengthy and time-consuming process; works well when a centralized IS department is responsible for all hardware and resources.
      Bottom Up: data redundancy and inconsistency between data marts may occur; integration requires great planning; lower cost of hardware and other resources; faster pay-back.
  • 29. DW Architectures
  • 30. [Architecture diagram, by layer]
      Data sources: transaction data (Prod, Mkt, HR, Fin, Acctg; IBM IMS, VSAM, Oracle, Sybase); other internal data (ERP, SAP, Clickstream, Informix, Web data); external data (Demographic, Harte-Hanks).
      ETL software (extract, clean/scrub, transform, load): Ascential, Sagent, SAS, Firstlogic, DataStage.
      Data stores: staging area; operational data store; data warehouse (Teradata, IBM); meta data; data marts (Finance, Marketing, Sales; Essbase, Microsoft).
      Data analysis tools and applications: SQL, Cognos, SAS, MicroStrategy, Siebel, Business Objects, Web browser; queries, reporting, DSS/EIS, data mining.
      Users: analysts, managers, executives, operational personnel, customers/suppliers.
  • 31. Benefits of DWH: to formulate effective business, marketing and sales strategies; to precisely target promotional activity; to discover and penetrate new markets; to compete successfully in the marketplace from a position of informed strength; to build predictive rather than retrospective models.
  • 32. IBM Software Group | WebSphere software 32 Data Modeling
  • 33. IBM Software Group | WebSphere software 33 Data Modeling  WHAT IS A DATA MODEL? A data model is an abstraction of some aspect of the real world (system).  WHY A DATA MODEL? • Helps to visualize the business • A model is a means of communication. • Models help elicit and document requirements. • Models reduce the cost of change. • Model is the essence of DW architecture based on which DW will be implemented
  • 34. Steps in Data Modeling: (1) problem and scope definition; (2) requirement gathering; (3) analysis; (4) logical database design; (5) deciding the database; (6) physical database design; (7) schema generation.
  • 35. IBM Software Group | WebSphere software 35 Levels of modeling  Conceptual modeling Describe data requirements from a business point of view without technical details  Logical modeling Refine conceptual models Data structure oriented, platform independent  Physical modeling Detailed specification of what is physically implemented using specific technology
  • 36. IBM Software Group | WebSphere software 36 Modeling Techniques  Entity-Relationship Modeling Traditional modeling technique Technique of choice for OLTP Suited for corporate data warehouse  Dimensional Modeling Analyzing business measures in the specific business context Helps visualize very abstract business questions End users can easily understand and navigate the data structure
  • 37. Entity-Relationship Modeling - Basic Concepts. Relationship: a relationship between entities is a structural interaction and association, described by a verb. Cardinality: 1-1, 1-M, M-M. Example: Books belong to Printed Media.
  • 38. Entity-Relationship Modeling - Basic Concepts. Attributes: characteristics and properties of entities. Example: Book Id, Description and book category are attributes of the entity "Book". Attribute names should be unique and self-explanatory. Primary keys, foreign keys and constraints are defined on attributes.
  • 39. IBM Software Group | WebSphere software Review of Logical Modeling Terms & Symbols  Entities define specific groups of information Sales Organization Sales Org ID Distribution Channel Entity
  • 40. Review of Logical Modeling Terms & Symbols. One or more attributes uniquely identify an instance of an entity (the identifier). Example: entity Sales Organization with identifier Sales Org ID and attribute Distribution Channel.
  • 41. IBM Software Group | WebSphere software Review of Logical Modeling Terms & Symbols  The logical model identifies relationships between entities Sales Detail Sales Record ID Sales Rep Sales Rep ID Relationship {
  • 43. Logical Data Model. [Diagram entities and identifiers: Sales Detail (Sales Record ID); Customer (Customer ID); Product (Product SKU); Suppliers (Supplier ID); Manufacturing Group (Manufacturing Org ID); Factory (Factory ID); Sales Organization (Sales Org ID, Distribution Channel); Sales Rep (Sales Rep ID); Product Sales Plan (Plan ID); Retail Market; Wholesale Industry]
  • 44. IBM Software Group | WebSphere software 44 44 Examples: ER Model
  • 45. IBM Software Group | WebSphere software 45 Limitations of E-R Modeling  Poor Performance  Tend to be very complex and difficult to navigate.
  • 47. IBM Software Group | WebSphere software 47 47 Dimensional Modeling
  • 48. Dimensional Modeling. Dimensional modeling uses three basic concepts: measures, facts and dimensions. It is powerful in representing the requirements of the business user in the context of database tables. It focuses on numeric data such as values, counts, weights, balances and occurrences.
  • 49. What is a Fact? A fact is a collection of related data items, consisting of measures and context data. Each fact typically represents a business item, a business transaction, or an event that can be used in analyzing the business or business process. Facts are measured, "continuously valued", rapidly changing information; they can be calculated and/or derived. Granularity: the level of detail of the data contained in the data warehouse, e.g. daily item totals by product, by store.
  • 50. Types of Facts.
      Additive: the fact can be added along all dimensions; discrete numerical measures, e.g. retail sales in $.
      Semi-additive: a snapshot taken at a point in time; a measure of intensity; not additive along the time dimension, e.g. account balance, inventory balance; added and then divided by the number of time periods to get a time average.
      Non-additive: numeric measures that cannot be added across any dimension; intensity measures averaged across all dimensions, e.g. room temperature. Textual facts: AVOID THEM.
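The difference between additive and semi-additive facts can be shown with a few rows. This is a minimal sketch with made-up figures: sales amounts are summed across every dimension, while balances are summed across accounts within a period and then averaged over the periods, as the slide describes.

```python
from collections import defaultdict

# Illustrative data only: (period, other dimension member, measure).
sales = [("Jan", "East", 100.0), ("Jan", "West", 50.0),
         ("Feb", "East", 70.0), ("Feb", "West", 30.0)]
balances = [("Jan", "acct1", 200.0), ("Jan", "acct2", 100.0),
            ("Feb", "acct1", 400.0), ("Feb", "acct2", 100.0)]


def total(rows):
    """Additive fact: summing across all dimensions is meaningful."""
    return sum(value for _, _, value in rows)


def time_average(rows):
    """Semi-additive fact: sum within each period, then average over periods."""
    per_period = defaultdict(float)
    for period, _, value in rows:
        per_period[period] += value
    return sum(per_period.values()) / len(per_period)
```

Here `total(balances)` would give 800, a number with no business meaning, whereas `time_average(balances)` gives the average total balance held per month.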
  • 51. IBM Software Group | WebSphere software 51 Dimensions  A dimension is a collection of members or units of the same type of views.  Dimensions determine the contextual background for the facts.  Dimensions represent the way business people talk about the data resulting from a business process, e.g., who, what, when, where, why, how
  • 52. Dimensional Hierarchy (Geography dimension). Levels, linked by parent relations: World level (World); Continent level (America, Europe, Asia); Country level (USA, Canada, Argentina); State level (FL, GA, VA, CA, WA); City level (Tampa, Miami, Orlando, Naples). Each node is a dimension member / business entity. Attributes: Population, Tourist's Place.
  • 53. Dimension Types: Conformed Dimension; Junk Dimension; Fast Changing Dimension; Role Playing Dimension; 'Garbage' Dimension; Slowly Changing Dimension; Degenerate Dimension.
  • 54. IBM Software Group | WebSphere software 54 What is a Slowly Changing Dimension?  Although dimension tables are typically static lists, most dimension tables do change over time.  Since these changes are smaller in magnitude compared to changes in fact tables, these dimensions are known as slowly growing or slowly changing dimensions.
  • 55. IBM Software Group | WebSphere software 55 Slowly Changing Dimension -Classification Slowly changing dimensions are classified into three different types  TYPE I  TYPE II  TYPE III
  • 56. Slowly Changing Dimensions Type I.
      Before the change: Source (Emp id 1001, Name Shane, Email Shane@xyz.com); Target (Emp id 1001, Name Shane, Email Shane@xyz.com).
      After the change: Source (Emp id 1001, Name Shane, Email Shane@abc.co.in); Target (Emp id 1001, Name Shane, Email Shane@abc.co.in). The old value Shane@xyz.com is overwritten.
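A Type I change can be sketched as a plain overwrite keyed on the business key. The function and field names below are hypothetical; the point is only that no history survives.

```python
def apply_type1(target, source_row, key="emp_id"):
    """Type I: overwrite the target row in place (or insert it if new)."""
    target[source_row[key]] = dict(source_row)
    return target


# Target dimension held as {business key: row}; values taken from the slide.
dimension = {1001: {"emp_id": 1001, "name": "Shane", "email": "Shane@xyz.com"}}
apply_type1(dimension, {"emp_id": 1001, "name": "Shane",
                        "email": "Shane@abc.co.in"})
```

After the call the dimension still has one row for employee 1001, and the earlier address can no longer be reported on.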
  • 57. Slowly Changing Dimensions Type II (versioning).
      Source: Emp id 10, Name Shane, Email Shane@xyz.com.
      Target (PM_PRIMARYKEY, Emp id, Name, Email, PM_VERSION_NUMBER): 1000, 10, Shane, Shane@xyz.com, 0.
  • 59. Slowly Changing Dimensions - Versioning.
      Source: Emp id 10, Name Shane, Email Shane@abc.co.in.
      Target (PM_PRIMARYKEY, Emp id, Name, Email, PM_VERSION_NUMBER):
        1000, 10, Shane, Shane@xyz.com, 0
        1001, 10, Shane, Shane@abc.co.in, 1
  • 60. Slowly Changing Dimensions - Versioning.
      Source: Emp id 10, Name Shane, Email Shane@abc.com.
      Target (PM_PRIMARYKEY, Emp id, Name, Email, PM_VERSION_NUMBER):
        1000, 10, Shane, Shane@xyz.com, 0
        1001, 10, Shane, Shane@abc.co.in, 1
        1003, 10, Shane, Shane@abc.com, 2
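The versioning variant of Type II can be sketched as "append a new row with the next version number whenever a tracked attribute changes". The column names mirror the slides (`PM_PRIMARYKEY`, `PM_VERSION_NUMBER`); the function itself and the sequential key generator are assumptions for illustration. The slide's third surrogate key is 1003, whereas this sketch produces 1002; surrogate keys only need to be unique.

```python
from itertools import count


def apply_type2_version(target, source_row, key_gen, tracked=("name", "email")):
    """Type II (versioning): keep every version, numbered from 0."""
    versions = [r for r in target if r["emp_id"] == source_row["emp_id"]]
    if versions:
        latest = max(versions, key=lambda r: r["pm_version_number"])
        if all(latest[col] == source_row[col] for col in tracked):
            return target  # nothing changed, no new version
        next_version = latest["pm_version_number"] + 1
    else:
        next_version = 0
    target.append({"pm_primarykey": next(key_gen), **source_row,
                   "pm_version_number": next_version})
    return target


dim_versions = []
surrogate_keys = count(1000)  # assumed surrogate key sequence
for email in ("Shane@xyz.com", "Shane@abc.co.in", "Shane@abc.com"):
    apply_type2_version(dim_versions,
                        {"emp_id": 10, "name": "Shane", "email": email},
                        surrogate_keys)
```

The flag variant shown on the later slides follows the same pattern, except that the previous row's current flag is set to N and the new row is inserted with Y.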
  • 61. Slowly Changing Dimensions Type II - Flag.
      Source: Emp id 10, Name Shane, Email Shane@xyz.com.
      Target (PM_PRIMARYKEY, Emp id, Name, Email, PM_CURRENT_FLAG): 1000, 10, Shane, Shane@xyz.com, Y.
  • 62. Slowly Changing Dimensions - Flag Current.
      Source: Emp id 10, Name Shane, Email Shane@abc.co.in.
      Target (PM_PRIMARYKEY, Emp id, Name, Email, PM_CURRENT_FLAG):
        1000, 10, Shane, Shane@xyz.com, N
        1001, 10, Shane, Shane@abc.co.in, Y
  • 63. Slowly Changing Dimensions - Flag Current.
      Source: Emp id 10, Name Shane, Email Shane@abc.com.
      Target (PM_PRIMARYKEY, Emp id, Name, Email, PM_CURRENT_FLAG):
        1000, 10, Shane, Shane@xyz.com, N
        1001, 10, Shane, Shane@abc.co.in, N
        1003, 10, Shane, Shane@abc.com, Y
  • 65. Slowly Changing Dimensions Type II (effective date).
      Source: Emp id 10, Name Shane, Email Shane@xyz.com.
      Target (PM_PRIMARYKEY, Emp id, Name, Email, PM_BEGIN_DATE, PM_END_DATE): 1000, 10, Shane, Shane@xyz.com, 01/01/00, (empty).
  • 66. Slowly Changing Dimensions - Effective Date.
      Source: Emp id 10, Name Shane, Email Shane@abc.co.in.
      Target (PM_PRIMARYKEY, Emp id, Name, Email, PM_BEGIN_DATE, PM_END_DATE):
        1000, 10, Shane, Shane@xyz.com, 01/01/00, 03/01/00
        1001, 10, Shane, Shane@abc.co.in, 03/01/00, (empty)
  • 67. Slowly Changing Dimensions - Effective Date.
      Source: Emp id 10, Name Shane, Email Shane@abc.com.
      Target (PM_PRIMARYKEY, Emp id, Name, Email, PM_BEGIN_DATE, PM_END_DATE):
        1000, 10, Shane, Shane@xyz.com, 01/01/00, 03/01/00
        1001, 10, Shane, Shane@abc.co.in, 03/01/00, 05/02/00
        1003, 10, Shane, Shane@abc.com, 05/02/00, (empty)
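The effective-date variant can be sketched as "close the open row, then open a new one". Column names follow the slides (`PM_BEGIN_DATE`, `PM_END_DATE`); the function name, the use of `None` for an open end date and the date strings are illustrative assumptions.

```python
from itertools import count


def apply_type2_dates(target, source_row, load_date, key_gen,
                      tracked=("name", "email")):
    """Type II (effective date): end-date the open row and add a new one."""
    current = next((r for r in target
                    if r["emp_id"] == source_row["emp_id"]
                    and r["pm_end_date"] is None), None)
    if current is not None:
        if all(current[col] == source_row[col] for col in tracked):
            return target  # unchanged, keep the open row
        current["pm_end_date"] = load_date  # close the old version
    target.append({"pm_primarykey": next(key_gen), **source_row,
                   "pm_begin_date": load_date, "pm_end_date": None})
    return target


dim_dated = []
dated_keys = count(1000)  # assumed surrogate key sequence
for load_date, email in (("01/01/00", "Shane@xyz.com"),
                         ("03/01/00", "Shane@abc.co.in"),
                         ("05/02/00", "Shane@abc.com")):
    apply_type2_dates(dim_dated,
                      {"emp_id": 10, "name": "Shane", "email": email},
                      load_date, dated_keys)
```

Exactly one row per employee has an empty end date at any time, which is how a query would pick out the current version.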
  • 69. Slowly Changing Dimensions Type III.
      Source: Emp id 10, Name Shane, Email Shane@xyz.com.
      Target (PM_PRIMARYKEY, Emp id, Name, Email, PM_Prev_ColumnName, PM_EFFECT_DATE): 1, 10, Shane, Shane@xyz.com, (empty), 01/01/00.
  • 70. Slowly Changing Dimensions Type III.
      Source: Emp id 10, Name Shane, Email Shane@abc.co.in.
      Target (PM_PRIMARYKEY, Emp id, Name, Email, PM_Prev_ColumnName, PM_EFFECT_DATE): 1, 10, Shane, Shane@abc.co.in, Shane@xyz.com, 01/02/00.
  • 71. Slowly Changing Dimensions Type III.
      Source: Emp id 10, Name Shane, Email Shane@abc.com.
      Target (PM_PRIMARYKEY, Emp id, Name, Email, PM_Prev_ColumnName, PM_EFFECT_DATE): 1, 10, Shane, Shane@abc.com, Shane@abc.co.in, 01/03/00.
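Type III keeps a single row and remembers only the previous value in an extra column. A minimal sketch, with a hypothetical function name and the column `pm_prev_email` standing in for the slide's `PM_Prev_ColumnName`:

```python
def apply_type3(row, new_email, effect_date):
    """Type III: shift the current value into the 'previous' column."""
    if row["email"] != new_email:
        row["pm_prev_email"] = row["email"]
        row["email"] = new_email
        row["pm_effect_date"] = effect_date
    return row


employee = {"pm_primarykey": 1, "emp_id": 10, "name": "Shane",
            "email": "Shane@xyz.com", "pm_prev_email": None,
            "pm_effect_date": "01/01/00"}
apply_type3(employee, "Shane@abc.co.in", "01/02/00")
apply_type3(employee, "Shane@abc.com", "01/03/00")
```

After the second change the original address Shane@xyz.com is gone, which matches the last Type III slide: only one level of history is retained.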
  • 72. IBM Software Group | WebSphere software 72 Degenerate Dimension  Dimension keys in fact table without corresponding dimension tables are called Degenerate Dimensions  Purpose of Degenerate Dimensions 1. Generally used when each record in fact represents transaction line item 2. Useful for grouping transaction line items belonging to a single transaction
  • 73. IBM Software Group | WebSphere software 73 Fast Changing Dimension A fast changing dimension is a dimension whose attribute or attributes for a record (row) change rapidly over time. 1. Example: Age of associates, Income, Daily balance etc. 2. Technique to handle fast changing dimension: Create band tables
  • 74. IBM Software Group | WebSphere software 74 Role Playing Dimension A single dimension which is expressed differently in a fact table using views is called a role-playing dimension. This can be achieved by creating views on dimension table.
  • 75. IBM Software Group | WebSphere software 75 Conformed Dimension A conformed dimension means the same thing to each fact table to which it can be joined. Typically, dimension tables that are referenced or are likely to be referenced by multiple fact tables (multiple dimensional models) are called conformed dimensions .
  • 76. Conformed Dimension Option #1: identical dimensions with the same keys, labels, definitions and values. Sales schema: SALES facts (DATE KEY, PRODUCT KEY, STORE KEY, PROMO KEY) joined to a product dimension (PRODUCT KEY, Product Desc, Brand Desc, Category Desc). Inventory schema: INVENTORY facts (DATE KEY, PRODUCT KEY, STORE KEY) joined to the same product dimension (PRODUCT KEY, Product Desc, Brand Desc, Category Desc).
  • 77. Conformed Dimension Option #2: a subset of the base dimension with common labels, definitions and values. Sales schema: SALES $ fact (DATE KEY, PRODUCT KEY, STORE KEY, PROMO KEY) with a product dimension (PRODUCT KEY, Product Desc, Brand Desc, Category Desc) and a date dimension (DATE KEY, Day-of-week, Week Desc, Month Desc). Forecast schema: SALES $ fact (MONTH KEY, BRAND KEY) with a brand dimension (BRAND KEY, Brand Desc, Category Desc) and a month dimension (MONTH KEY, Month Desc). Example row values on the slide: brand table (BRAND KEY, Brand Desc, Category Desc): 12345, Cherriors, Cereal; product table (PROD KEY, Prod Desc, Brand Desc, Category Desc): 12345, Cherriors, 10, Cherriors, Cereal.
  • 78. 'Garbage' Dimension. A garbage dimension is a dimension that consists of low-cardinality columns such as codes, indicators and status flags. Approaches to handling such attributes: put the new attributes into existing dimension tables; put the new attributes into the fact table; create new separate dimension tables; create a single separate 'Garbage Dimension' table.
  • 79. Junk Dimensions. Whether to use a junk dimension: 5 indicators with 3 values each give 3^5 = 243 rows; 5 indicators with 100 values each give 100^5 = 10 billion rows. When to insert rows in the dimension.
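The row counts on the junk dimension slide are simply the size of the Cartesian product of the indicator values. The sketch below, with hypothetical indicator names, computes the size and also materialises the small case. Note that 100^5 works out to 10,000,000,000 (10 billion), which is why a fully pre-populated junk dimension is impractical for high-cardinality indicators.

```python
from itertools import product


def junk_dimension_size(n_indicators, values_each):
    """Number of rows if every combination of indicator values is stored."""
    return values_each ** n_indicators


def junk_dimension_rows(indicators):
    """Materialise every combination of the given indicator values."""
    names = list(indicators)
    return [dict(zip(names, combo)) for combo in product(*indicators.values())]


# Five hypothetical 3-valued indicators.
flags = {f"indicator_{i}": ("Y", "N", "U") for i in range(1, 6)}
all_rows = junk_dimension_rows(flags)
```

For the large case, rows would typically be inserted only for combinations that actually occur in the source data.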
  • 80. IBM Software Group | WebSphere software 80 Factless Fact Tables The two types of factless fact tables are:  Coverage tables  Event tracking tables
  • 81. IBM Software Group | WebSphere software 81 Factless Fact Tables - Coverage Tables Coverage tables are required when a primary fact table is sparse Example: Tracking products in a store that did not sell
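The coverage-table example (products in a store that did not sell) reduces to a set difference between what was on offer and what appears in the sales fact table. The store and product names below are made up for illustration.

```python
# Coverage table: every (store, product) pair that was on offer.
coverage = {("Store1", "Soap"), ("Store1", "Cereal"), ("Store2", "Soap")}

# Sparse primary fact table: only pairs with at least one sale appear.
sales_facts = {("Store1", "Soap")}


def unsold(coverage_rows, sales_rows):
    """Pairs that were on offer but have no row in the sales fact table."""
    return sorted(coverage_rows - sales_rows)
```

Without the coverage table the question cannot be answered, because the sales fact table records nothing for products that did not sell.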
  • 83. Factless Fact Tables - Event Tracking. These tables are used for tracking an event. Example: tracking student attendance.
  • 84. IBM Software Group | WebSphere software 84 Fact Constellation  Fact constellations: Multiple fact tables share dimension tables,viewed as a collection of stars, therefore called galaxy schema or fact constellation
  • 85. What is a Data Mart? A data mart is a decentralized subset of data, found either within a data warehouse or as a standalone subset, designed to support the unique business unit requirements of a specific decision-support system. Data marts have specific business-related purposes such as measuring the impact of marketing promotions, or measuring and forecasting sales performance. [Diagram: data marts drawn from the enterprise data warehouse]
  • 86. Data Marts - Main Features: low cost; controlled locally rather than centrally, conferring power on the user group; contain less information than the warehouse; rapid response; more easily understood and navigated than an enterprise data warehouse; within the range of divisional or departmental budgets.
  • 88. Advantages of Data Mart over Data Warehouse: typically a single subject area and fewer dimensions; limited feeds; very quick time to market (30-120 days to pilot); quick impact on bottom-line problems; focused user needs; limited scope; optimum model for DW construction; demonstrates ROI; allows prototyping.
  • 89. Disadvantages of Data Mart: does not provide an integrated view of business information; uncontrolled proliferation of data marts results in redundancy; a large number of data marts is complex to maintain; scalability issues for large numbers of users and increased data volume.
  • 90. Data Mart Types. Embedded data marts are stored within the central DW; they can be stored relationally, as files, or as cubes. Dependent data marts are fed directly by the DW, sometimes supplemented with other feeds such as external data. Independent data marts are fed directly by external sources and do not use the DW.
  • 91. ® IBM Software Group © 2007 IBM Corporation The Operational Data Store
  • 93. Why We Need an Operational Data Store. Need: to obtain a "system of record" that contains the best data that exists in a legacy environment as a source of information. "Best" here implies data that is complete, up to date, accurate, and in conformance with the organization's information model.
  • 94. Operational Data Store - Insulated from OLTP. ODS data resolves data integration issues. Data is physically separated from the production environment to insulate it from the processing demands of reporting and analysis. Access to current data is facilitated. [Diagram: OLTP server, ODS, tactical analysis]
  • 95. Operational Data Store - Data: detailed data; records of business events (e.g. order capture); data from heterogeneous sources; does not store summary data; contains current data.
  • 97. ODS - Benefits: integrates the data; synchronizes the structural differences in data; high transaction performance; serves the operational and DSS environments; transaction-level reporting on current data. [Diagram: flat files (e.g. 60,5.2,"JOHN" and 72,6.2,"DAVID"), relational database and Excel files feeding the operational data store]
  • 98. Operational Data Store - Update Schedule.
      ODS data: updated daily or more frequently; detailed data is mostly kept for between 30 and 90 days; addresses operational needs.
      Data warehouse data: updated weekly or less frequently; potentially infinite history; addresses strategic needs.
  • 100. OLTP vs ODS vs DWH
      Characteristic | OLTP | ODS | Data Warehouse
      Data redundancy | Non-redundant within system; unmanaged redundancy among systems | Somewhat redundant with operational databases | Managed redundancy
      Data stability | Dynamic | Somewhat dynamic | Static
      Data update | Field by field | Field by field | Controlled batch
      Data usage | Highly structured, repetitive | Somewhat structured, some analytical | Highly unstructured, heuristic or analytical
      Database size | Moderate | Moderate | Large to very large
      Database structure stability | Stable | Somewhat stable | Dynamic
  • 101. Star Schema Design: a single fact table surrounded by denormalized dimension tables; the fact table's primary key is the composite of the foreign keys (the primary keys of the dimension tables); the fact table contains transaction-type information; there can be many star schemas in a data mart; easily understood by end users, but more disk storage is required.
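A star schema along these lines can be sketched with Python's built-in `sqlite3` module. The table and column names (`dim_date`, `dim_product`, `fact_sales`) and the sample rows are assumptions for illustration; the structure follows the slide: one fact table whose primary key is the composite of the dimension foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, brand TEXT);
CREATE TABLE fact_sales (
    date_key     INTEGER REFERENCES dim_date(date_key),
    product_key  INTEGER REFERENCES dim_product(product_key),
    sales_amount REAL,
    PRIMARY KEY (date_key, product_key)   -- composite of the foreign keys
);
INSERT INTO dim_date    VALUES (1, 'Jan'), (2, 'Jan'), (3, 'Feb');
INSERT INTO dim_product VALUES (10, 'BrandA'), (11, 'BrandB');
INSERT INTO fact_sales  VALUES (1, 10, 5.0), (2, 10, 7.0),
                               (3, 10, 1.0), (3, 11, 2.0);
""")

# A typical star join: one hop from the fact table to each dimension.
sales_by_month_brand = conn.execute("""
    SELECT d.month, p.brand, SUM(f.sales_amount)
    FROM fact_sales f
    JOIN dim_date d    ON d.date_key = f.date_key
    JOIN dim_product p ON p.product_key = f.product_key
    GROUP BY d.month, p.brand
    ORDER BY d.month, p.brand
""").fetchall()
```

In a snowflake design the brand would move into its own table referenced from `dim_product`, adding one more join to the same query.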
  • 102. IBM Software Group | WebSphere software 102 EXAMPLE OF STAR SCHEMA
  • 103. Snowflake Schema: a single fact table surrounded by normalized dimension tables; normalizes the dimension tables to save data storage space; used when dimensions become very large; less intuitive, with slower performance due to joins. You may want to use both approaches, especially if supporting multiple end-user tools.
  • 104. IBM Software Group | WebSphere software 104 Example of Snow flake schema
  • 105. IBM Software Group | WebSphere software 105 Snowflake - Disadvantages  Normalization of dimension makes it difficult for user to understand  Decreases the query performance because it involves more joins  Dimension tables are normally smaller than fact tables - space may not be a major issue to warrant snowflaking
  • 106. Data Acquisition: data extraction; data transformation; data loading.
  • 107. Representative DW Tools
      Tool Category | Products
      ETL Tools | ETI Extract, Informatica, IBM Visual Warehouse, Oracle Warehouse Builder
      OLAP Server | Oracle Express Server, Hyperion Essbase, IBM DB2 OLAP Server, Microsoft SQL Server OLAP Services, Seagate HOLOS, SAS/MDDB
      OLAP Tools | Oracle Express Suite, Business Objects, Web Intelligence, SAS, Cognos PowerPlay/Impromptu, KALIDO, MicroStrategy, Brio Query, MetaCube
      Data Warehouse | Oracle, Informix, Teradata, DB2/UDB, Sybase, Microsoft SQL Server, Red Brick
      Data Mining & Analysis | SAS Enterprise Miner, IBM Intelligent Miner, SPSS/Clementine, TCS Tools
  • 108. IBM Software Group | WebSphere software 108 ETL PRODUCTS  CODE BASED ETL TOOLS  GUI BASED ETL TOOLS 108
  • 109. IBM Software Group | WebSphere software 109 CODE BASED ETL TOOLS  SAS ACCESS  SAS BASE  TERADATA ETL TOOLS 1. BTEQ 2. TPUMP 3. FAST LOAD 4. MULTI LOAD
  • 110. GUI-Based ETL Tools: Informatica; DT/Studio; DataStage; Business Objects Data Integrator (BODI); Ab Initio; Data Junction; Oracle Warehouse Builder; Microsoft SQL Server Integration Services; IBM DB2 Warehouse Center.
  • 111. ® IBM Software Group © 2007 IBM Corporation Extraction Types
  • 112. IBM Software Group | WebSphere software 112 Extraction Types Extraction Full Extract Periodic/ Incremental Extract
  • 113. Full Extract. [Diagram: source system -> full extract -> data mart; all data arrives as new data]
  • 114. Incremental Extract. [Diagram: source system -> incremental extract -> incremental data; the data mart holds existing data]
  • 115. Incremental Extract. [Diagram: the incremental data consists of new data and changed data; the data mart holds existing data]
  • 116. Incremental Extract. [Diagram: new data is an incremental addition to the data mart; existing data is updated using the changed data]
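The incremental extract slides can be sketched as two steps: pick up only the rows touched since the last extract, then insert the new ones and update the existing ones in the mart. The `updated_at` column and both function names are assumptions; the slides do not say how changes are detected.

```python
def incremental_extract(source_rows, last_extract_time):
    """Select only rows created or changed since the previous extract."""
    return [r for r in source_rows if r["updated_at"] > last_extract_time]


def apply_increment(mart, increment):
    """Insert new rows and overwrite existing ones; return (inserted, updated)."""
    inserted = updated = 0
    for row in increment:
        if row["id"] in mart:
            updated += 1
        else:
            inserted += 1
        mart[row["id"]] = dict(row)
    return inserted, updated


# Illustrative data: timestamps are plain integers for brevity.
source = [{"id": 1, "updated_at": 5, "value": "a"},
          {"id": 2, "updated_at": 12, "value": "b2"},   # changed row
          {"id": 3, "updated_at": 15, "value": "c"}]    # new row
data_mart = {1: {"id": 1, "updated_at": 5, "value": "a"},
             2: {"id": 2, "updated_at": 3, "value": "b"}}
delta = incremental_extract(source, last_extract_time=10)
counts = apply_increment(data_mart, delta)
```

A full extract would instead pass every source row through `apply_increment`, or rebuild the mart from scratch.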
  • 117. Data Warehouse Loading
  • 119. IBM Software Group | WebSphere software 120 Types of Data warehouse Loading  Target update types Insert Update
  • 120. Types of Data Warehouse Updates. Source data passes through data staging into the data warehouse. Insert: full replace; selective replace; point-in-time snapshots; new data. Update: update plus retain history; changed data.
  • 121. New Data and Point-in-Time Data Insert. Source data supplies new data OR a point-in-time snapshot (e.g. monthly); the new data is added to the existing data.
  • 122. Changed Data Insert. Source data supplies changed data; the changed data is added to the existing data.
  • 123. Data Warehouse Life Cycle. [Diagram components: business requirement; map data sources; reverse engineering; map requirements to OLTP; OLTP system; logical modeling; refine model; external data; storage; ETL; data warehouse / enterprise data warehouse; info access: reporting tools, web browsers, OLAP, mining]
  • 124. IBM Software Group | WebSphere software 125 Project Life Cycle  Software Requirement Specification  High level Design(HLD)  Low level Design(LLD)  Development  Unit Testing  System Integration Testing  Peer Review  User Acceptance Testing  Production  Maintenance 125
  • 125. ® IBM Software Group © 2007 IBM Corporation Meta Data in a Data Warehouse
  • 126. What is Metadata? Data about data and the processes. Metadata is stored in a data dictionary and repository. It insulates the data warehouse from changes in the schema of operational systems. It serves to identify the contents and location of data in the data warehouse.
  • 127. Why Do You Need Meta Data? To share resources (users, tools) and to document the system. Without meta data the warehouse is not sustainable and resources cannot be fully utilized.
  • 128. The Role of Meta Data in the Data Warehouse. Meta data enables data to become information, because with it you know what data you have and you can trust it.
  • 129. Meta Data Answers: How have business definitions and terms changed over time? How do product lines vary across organizations? What business assumptions have been made? How do I find the data I need? What is the original source of the data? How was this summarization created? What queries are available to access the data?
  • 130. Meta Data Process: integrated with the entire process and data flow; populated from beginning to end; begin population at the design phase of the project; dedicated resources throughout (build, maintain). [Diagram: meta data and system monitoring span every stage: design and mapping; extract, scrub and transform; load, index and aggregation; replication and data set distribution; access and analysis, resource scheduling and distribution]
  • 131. Types of ETL Meta Data: technical meta data; operational meta data.
  • 132. Classification of ETL Meta Data. Data warehouse meta data: stores descriptive information about the physical implementation details of the data warehouse. Source meta data: stores information about the source data and the mapping of source data to data warehouse data.
  • 133. ETL Meta Data. Transformations and integrations: describes comprehensive information about the transformation and loading. Processing information: stores information about the activities involved in the processing of data, such as scheduling and archives. End-user information: records information about the user profile and security.
  • 134. ETL - Planning for the Movement. The following may be helpful for planning the movement: develop an ETL plan; specifications; implementation.