Designing A Data Warehouse With Sql 2008
- 1. DESIGNING A DATA WAREHOUSE WITH SQL SERVER 2008
Joy Mundy, joy@kimballgroup.com
Introductions and Background
Presenter
Joy Mundy, Kimball Group
Kimball Group
The authors of the Data Warehouse Toolkit series of
books, including the Microsoft Data Warehouse Toolkit
Kimball University DW / BI courses
DW / BI strategic consulting
© 2005-2009 Kimball Group. All rights reserved. Page 1
- 2. Agenda
The “Right” reason to build a DW/BI system
The Kimball Architecture
The Kimball Method and Lifecycle
Business Requirements
Technology Track
Data Track
BI Applications
Operations, Maintenance, and Growth
Some Possible Reasons
A. The CIO told us to
B. It sounds like fun
C. It’s a great opportunity for us to add significant
value to the business
D. We’re not building a DW/BI system, just an
Executive Dashboard
Which one of these is “Right”?
- 3. Answer: C.
It is a great opportunity to add business value
It is also a great opportunity to:
Work with senior management
Advance your career
Play with fun, new technology
However, there are a few risks…
DW / BI System Risks
High profile
Success (and failure) is visible to senior management
Business driven – can be hard for technologists
Technology focus is rarely successful
“Build it and they will come” doesn't work
Dashboards are appropriate for mature DW/BI systems,
but are not a starting point
Data quality and integration are hard problems, even if
the technology works well
The project is complex and politically challenging
Follow a proven approach
Architecture Principles
Business requirements determine architecture
Listen to business requirements and translate them
into functional components
This means your DW/BI system architecture will not
be the same as your neighbor’s
Do not build major DW/BI components because you
are supposed to
- 5. Architectural Approaches
Build reports directly from the transaction systems
Standalone marts
Normalized data warehouse feeding downstream
marts
Kimball dimensional data warehouse
Standalone Marts
Pros
• Marts reflect business requirements
• Get business value this year
Cons
• Multiple extracts of the same data
• Multiple transforms
• Inconsistent versions of the same data
• 10th mart takes as long to build as first
Sources: HR, Projects, Siebel, Skills Dtb, CustSat files, Sales, SAP, FeedWrx, Business lists, Many others
Marts shown in the diagram: Sales Mart, KPI View, Sales+, CSAT+, RoB, CSAT, EMR, Capacity Planning, PCD, DIM, ESRT, CFR, Others
- 6. Normalized DW and Downstream Marts
Pros
• Data extracted and consolidated only once
• Marts reflect business requirements
Cons
• Takes too long to build a new mart
• Too many business rules between EDW & marts; we still get inconsistencies
• EDW is by (and for) IT, using its language and structures
• Marts are for the business
Enterprise Data Warehouse (not dimensional)
• Integrated
• Historical
• Design reflects source systems
Sources: HR, Projects, Siebel, Skills Dtb, CustSat files, Sales, SAP, FeedWrx, Business lists, Many others
Marts shown in the diagram: Sales+, KPI View, CSAT+, RoB, Capacity Planning, DIM, ESRT, Others
Kimball Dimensional Data Warehouse
Pros
• Data extracted and consolidated only once
• DW design meets business requirements
• Data is structured to support easy analytic use with good perf
• Data and terms are consistent
• Once data is in the DW, building new KPIs or BI applications is much easier
Cons
• Takes longer to get biz value than simply throwing together a mart
Kimball-style Dimensional Enterprise Data Warehouse
• Integrated & historical
• Design reflects analytic requirements
• Built incrementally
• Contains the most detailed data possible
• Fact data hooks together via shared (conformed) dimensions
• Presentation area is relational or OLAP
• OLAP is recommended for Msft platform
• (Still need relational DW)
User applications
• Most “marts” become views into the enterprise system
• Ad hoc use is supported and encouraged
Mart A, Mart B: We may supplement the main DW/BI system with a handful of custom BI apps that meet specific needs. These are the exception.
Sources: HR, Projects, Siebel, Skills Dtb, CustSat files, Sales, SAP, FeedWrx, Business lists, Many others
- 7. Summary of Architectures
Approach | Trxn system burden | Ease of use | Time to market
Report directly from trxn systems | Very high | Very poor | Poor
Departmental marts | Moderate | Good until you need something new. Navigation challenges | “90 days”, no economies of scale
Normalized DW + marts | Low | DW = poor. Marts = good until you need something new. Navigation challenges | Huge up-front investment. Marts are “60 days”
Kimball dimensional DW | Low | Very good | Large up-front investment. Excellent economies of scale.
The Microsoft DW/BI Technical Architecture
[Diagram. Recoverable labels: Source Systems; Business/Extract Rules; Data Quality; Dimensionalization; RDBMS; OLAP; Metadata; Business Users]
Front-end tools shown:
• SharePoint
• Report Builder
• PerformancePoint
Kimball Method Basic Principles
Business driven
Iterative Lifecycle
Dimensional model for data delivery
Enterprise data framework
Bus Matrix
Conformed dimensions
Full solution from extracts to business value
- 9. The Kimball DW/BI Lifecycle
Project Planning -> Business Requirements Definition -> three parallel tracks -> Deployment -> Maintenance and Growth
Technology track: Technical Architecture Design -> Product Selection & Installation
Data track: Dimensional Modeling -> Physical Design -> ETL Design & Development
BI application track: BI Application Specification -> BI Application Development
Project Management spans the whole Lifecycle
Key Concepts:
- Business centric
- Full solution
- Enterprise aware
- Dimensional delivery
- Iterative process
- Incremental growth
- 10. Business Requirements (1)
Interview key people across the org
Ask “What do you do?” not “What do you want?”
It is our job to design the solution, not theirs
Look for common analytic themes
Better promotion response rate
Improve sales performance
Break themes down into business processes that
generate needed data
Promotions Responses Orders
Business Requirements (2)
Design the data warehouse Enterprise Bus Matrix
Prioritize themes with senior management
Summarize findings in a Requirements Document
Identify and recruit good business sponsor(s)
Visionary
Influential
Reasonable
- 11. Profile the Data
Early and often
Does the data exist to support the required
analysis?
Where are the problems affecting ETL design?
Primary keys
Referential integrity
NULL values
Junk values
The dreaded “Notes” field
SSIS 2008 has useful data profiling functionality
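The checks listed above can be sketched in a few lines of plain Python. This is only an illustration, not the SSIS 2008 data profiling functionality; the table contents, column names, and the list of "junk" values are hypothetical.

```python
from collections import Counter

def profile_column(rows, column, junk_values=("", "N/A", "?", "UNKNOWN")):
    """Return simple profile counts for one column of a list of dict rows."""
    values = [row.get(column) for row in rows]
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(counts),
        # Extra occurrences beyond the first; non-zero rules out a primary key.
        "duplicates": sum(c - 1 for c in counts.values() if c > 1),
        # junk_values is an assumed list; compare in upper case.
        "junk": sum(1 for v in non_null if str(v).strip().upper() in junk_values),
    }

def orphan_keys(child_rows, fk_column, parent_rows, pk_column):
    """Return foreign-key values in child_rows that have no parent row."""
    parent_keys = {row[pk_column] for row in parent_rows}
    return sorted({row[fk_column] for row in child_rows
                   if row[fk_column] is not None
                   and row[fk_column] not in parent_keys})

# Hypothetical source extracts.
customers = [
    {"customer_id": 1, "name": "Ann"},
    {"customer_id": 2, "name": "N/A"},
    {"customer_id": 2, "name": None},
]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 3},
]

id_profile = profile_column(customers, "customer_id")   # primary key check
name_profile = profile_column(customers, "name")        # NULL and junk check
orphans = orphan_keys(orders, "customer_id", customers, "customer_id")
```

Here the duplicate `customer_id`, the NULL and "N/A" names, and the order pointing at a missing customer are each the kind of finding that changes the ETL design.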
Requirements Prioritization Based on Value and Feasibility
[Chart: business processes plotted by Business Value / Impact (vertical axis, Low to High) against Feasibility (horizontal axis, Low to High)]
Business processes plotted: Customer Profitability, Orders, Promotions, Product Profitability, Orders Forecast, Shipping, Call Tracking, Returns, Manufacturing Costs, Exchange Rates
Key Concepts:
Created in a meeting with Senior Mgmt.
Relative value is a business decision
Boxes come from Bus. Requirements
Relative feasibility needs IT input
- 12. Enterprise Bus Matrix
Adventure Works Data Warehouse Bus Matrix
Key Concepts:
• The high level DW/BI data architecture
• Rows = Business Processes
• Columns = Conformed Dimensions
• DW/BI system implemented row by row based on business priority
Conformed dimensions (columns): Date (Order, Start, Ship), Internet Registered User, End Customer, Promotion, Employee, Problem, Reseller, Product, Shipper, Vendor, Page, Part
Business Process | Business Priority | Dimensions marked (x)
Orders Forecasting | 2 | 5
Reseller Orders | 1 | 5
Internet Orders | 1 | 6
Purchasing | - | 7
Parts Inventory | - | 5
Manufacturing | 6 | 3
Finished Goods Inv. | - | 3
Shipping | 3 | 7
Returns | 5 | 6
Customer Calls | 4 | 8
Web Support | 4 | 8
[The column position of each x was lost in extraction; only the count of marked dimensions per row is recoverable.]
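As a small illustration of "implemented row by row based on business priority", the sketch below orders the business processes by the priority numbers shown on the slide. It assumes that 1 is the highest priority and that rows without a number are not yet scheduled; the slide does not state either.

```python
# Priorities copied from the slide; None marks rows shown without a priority.
bus_matrix_priority = {
    "Orders Forecasting": 2,
    "Reseller Orders": 1,
    "Internet Orders": 1,
    "Purchasing": None,
    "Parts Inventory": None,
    "Manufacturing": 6,
    "Finished Goods Inv.": None,
    "Shipping": 3,
    "Returns": 5,
    "Customer Calls": 4,
    "Web Support": 4,
}

def build_order(priorities):
    """Return prioritized business processes, lowest priority number first.

    Assumption: a lower number means the row is implemented earlier.
    Ties are broken alphabetically only to make the output deterministic.
    """
    ranked = [(rank, name) for name, rank in priorities.items() if rank is not None]
    return [name for rank, name in sorted(ranked)]

implementation_order = build_order(bus_matrix_priority)
unscheduled = sorted(name for name, rank in bus_matrix_priority.items() if rank is None)
```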
- 13. Microsoft Technology for the DW Back Room
Integration Services is a competitive ETL tool
Great performance, solid toolbox
Relational Database is strong BI platform
Key BI-related features, including partitioning,
compression, and star join optimization
Analysis Services is OLAP market leader
Dimensional design is flexible
More scalable and manageable
Data Mining – strong mining platform, leverages AS
for speed; good integration
Relational vs. OLAP (Why OLAP?)
Relational strengths
Data management
Flexibility
OLAP strengths
Analytic language
Ad hoc query performance
Metadata layer
Security, especially for ad hoc queries
- 14. Microsoft Technology for the DW/BI Front Room
Reporting Services
Good enterprise platform
Programmer-oriented report designer
Limited ad hoc query
Data presentation
Office, SharePoint, [ProClarity]
Integrated development (VS) and management environments
Scale – technology can scale to multi-TBs
Plan to spend more time and $, including on significant consulting
expertise.
Real-time features
- 15. The Dimensional Model (the Target)
Based on top business priority data area
Fact table = measurement of business events
Dimension tables = objects that participate in business
events (Customer, Product, Date, …)
Surrogate keys (meaningless integer)
Slowly changing dimensions
Type 1 = Overwrite old values with new
Type 2 = Add a new row when values change
Identify data quality issues now
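The difference between Type 1 and Type 2 changes can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module rather than SQL Server; the table, the choice of which attribute is Type 1 or Type 2, and the `is_current` flag are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_customer (
        customer_key INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        customer_id  TEXT NOT NULL,                      -- source system key
        email        TEXT,                               -- treated as Type 1
        city         TEXT,                               -- treated as Type 2
        is_current   INTEGER NOT NULL DEFAULT 1          -- assumed housekeeping flag
    )
""")

def apply_type1(conn, customer_id, email):
    """Type 1: overwrite the old value on every row for this customer."""
    conn.execute("UPDATE dim_customer SET email = ? WHERE customer_id = ?",
                 (email, customer_id))

def apply_type2(conn, customer_id, city):
    """Type 2: expire the current row and add a new row with the new value."""
    row = conn.execute(
        "SELECT customer_key, email, city FROM dim_customer "
        "WHERE customer_id = ? AND is_current = 1", (customer_id,)).fetchone()
    if row is None or row[2] == city:
        return  # unknown customer, or nothing changed
    conn.execute("UPDATE dim_customer SET is_current = 0 WHERE customer_key = ?",
                 (row[0],))
    conn.execute(
        "INSERT INTO dim_customer (customer_id, email, city) VALUES (?, ?, ?)",
        (customer_id, row[1], city))

conn.execute("INSERT INTO dim_customer (customer_id, email, city) "
             "VALUES ('C1', 'old@example.com', 'Seattle')")
apply_type1(conn, "C1", "new@example.com")  # history is restated
apply_type2(conn, "C1", "Portland")         # history is tracked

history = conn.execute(
    "SELECT customer_key, email, city, is_current FROM dim_customer "
    "ORDER BY customer_key").fetchall()
```

After both changes the dimension holds two rows for customer C1: the Seattle row is kept for history, and both rows carry the overwritten email.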
Relational Dimensional Model
Sales Fact (fact table)
Product Key
Customer Key
Date Key
… other keys
Sales Amount
Sales Quantity
… other measures
Dimension tables joined to the fact table: Date, Product, Customer, other dims…
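A star schema like the one pictured can be sketched with Python's built-in sqlite3 module. The column names beyond the keys and measures on the slide (`calendar_year`, `category`, `region`) and all data values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_date     (date_key INTEGER PRIMARY KEY, calendar_year INTEGER);
    CREATE TABLE dim_product  (product_key INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, region TEXT);

    -- The fact table holds only dimension keys and numeric measures.
    CREATE TABLE fact_sales (
        date_key       INTEGER REFERENCES dim_date(date_key),
        product_key    INTEGER REFERENCES dim_product(product_key),
        customer_key   INTEGER REFERENCES dim_customer(customer_key),
        sales_amount   REAL,
        sales_quantity INTEGER
    );

    INSERT INTO dim_date VALUES (1, 2008), (2, 2009);
    INSERT INTO dim_product VALUES (1, 'Bikes'), (2, 'Helmets');
    INSERT INTO dim_customer VALUES (1, 'West');
    INSERT INTO fact_sales VALUES
        (1, 1, 1, 500.0, 1),
        (1, 2, 1,  40.0, 2),
        (2, 1, 1, 700.0, 1);
""")

# A typical analytic query: join the fact table to the dimensions it needs,
# group by dimension attributes, and sum the measures.
sales_by_year_category = conn.execute("""
    SELECT d.calendar_year, p.category,
           SUM(f.sales_amount), SUM(f.sales_quantity)
    FROM fact_sales AS f
    JOIN dim_date    AS d ON d.date_key = f.date_key
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY d.calendar_year, p.category
    ORDER BY d.calendar_year, p.category
""").fetchall()
```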
- 16. Surrogate Keys
Dimension PKs should be surrogate (meaningless)
keys
Managed by the DW
Usually an integer type
Usually populated via IDENTITY keyword in dimension
table definition
Why?
Small (int) keys are vital for performance
The source system will re-use keys. They swear they
won’t. But they will.
Enables dimension attribute change tracking
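A minimal sketch of DW-managed surrogate keys follows. It uses SQLite's `AUTOINCREMENT` through Python's sqlite3 module as a stand-in for the SQL Server `IDENTITY` keyword named above; the table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_product (
        product_key  INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key, owned by the DW
        source_sku   TEXT NOT NULL,                      -- source system key, a plain attribute
        product_name TEXT NOT NULL
    )
""")

def add_product(conn, source_sku, product_name):
    """Insert a dimension row and return the surrogate key the database assigned."""
    cursor = conn.execute(
        "INSERT INTO dim_product (source_sku, product_name) VALUES (?, ?)",
        (source_sku, product_name))
    return cursor.lastrowid

first_key = add_product(conn, "SKU-100", "Road Bike")

# Later the source system re-uses SKU-100 for a different product.
# Because the DW assigns product_key itself, both rows can coexist.
second_key = add_product(conn, "SKU-100", "Mountain Bike")

rows_for_sku = conn.execute(
    "SELECT product_key, product_name FROM dim_product "
    "WHERE source_sku = 'SKU-100' ORDER BY product_key").fetchall()
```

Had `source_sku` been the primary key, the second insert would have failed or overwritten the first product.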
Surrogate Keys and ETL
Dimensions
Carry source system key(s) as non-key attributes in the
dimension
New rows automatically get a new surrogate key
Facts
Fact table usually does not contain source system keys
Final step of fact processing is to exchange the source
system keys for DW surrogate keys
Lookup to dimension tables based on source key,
returning surrogate key
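The key exchange in the final step of fact processing can be sketched in plain Python. The row layouts and column names are hypothetical, and mapping an unmatched source key to an "unknown member" key of -1 is an assumption added for the example; the slide does not say how misses are handled.

```python
UNKNOWN_KEY = -1  # hypothetical surrogate key for an "unknown" dimension row

def build_key_lookup(dimension_rows, source_column, surrogate_column):
    """Map each source system key to its DW surrogate key."""
    return {row[source_column]: row[surrogate_column] for row in dimension_rows}

def exchange_keys(staged_rows, lookups):
    """Replace source keys with surrogate keys; measures pass through unchanged.

    lookups maps a source-key column name to (surrogate column name, lookup dict).
    """
    facts = []
    for staged in staged_rows:
        fact = {}
        for column, value in staged.items():
            if column in lookups:
                surrogate_column, lookup = lookups[column]
                fact[surrogate_column] = lookup.get(value, UNKNOWN_KEY)
            else:
                fact[column] = value
        facts.append(fact)
    return facts

# Hypothetical dimension rows (surrogate key plus source system key).
dim_product = [{"product_key": 1, "source_sku": "SKU-100"},
               {"product_key": 2, "source_sku": "SKU-200"}]
dim_customer = [{"customer_key": 7, "source_customer_id": "C1"}]

# Staged fact rows still carry source system keys.
staged_sales = [
    {"source_sku": "SKU-200", "source_customer_id": "C1", "sales_amount": 40.0},
    {"source_sku": "SKU-999", "source_customer_id": "C1", "sales_amount": 15.0},
]

fact_sales = exchange_keys(staged_sales, {
    "source_sku": ("product_key",
                   build_key_lookup(dim_product, "source_sku", "product_key")),
    "source_customer_id": ("customer_key",
                           build_key_lookup(dim_customer, "source_customer_id",
                                            "customer_key")),
})
```

The resulting fact rows contain only surrogate keys and measures, matching the point that the fact table usually does not contain source system keys.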
- 17. Conformed Dimensions
One master dimension table that all fact tables subscribe to
Get agreement organization-wide on:
What the dimensions are called
Which hierarchies you have
Similar-but-different attributes and hierarchies have different names
Which attributes are managed by restating history and which by
tracking history
Create two sets of attributes if you need it both ways
Why?
Single version of the truth
Flexibility of basic design
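One way to see why a shared dimension matters: two fact tables that subscribe to the same product dimension can be summarized separately and then combined on a common attribute. The sketch below uses Python's built-in sqlite3 module; the tables, columns, and numbers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One master product dimension shared by both fact tables.
    CREATE TABLE dim_product  (product_key INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE fact_orders  (product_key INTEGER, order_quantity INTEGER);
    CREATE TABLE fact_returns (product_key INTEGER, return_quantity INTEGER);

    INSERT INTO dim_product VALUES (1, 'Bikes'), (2, 'Helmets');
    INSERT INTO fact_orders VALUES (1, 10), (1, 5), (2, 20);
    INSERT INTO fact_returns VALUES (1, 3), (2, 1), (2, 1);
""")

def total_by_category(conn, fact_table, measure):
    """Sum one measure by the shared product category."""
    # fact_table and measure come from the fixed calls below, never from user input.
    query = (f"SELECT p.category, SUM(f.{measure}) FROM {fact_table} AS f "
             "JOIN dim_product AS p ON p.product_key = f.product_key "
             "GROUP BY p.category")
    return dict(conn.execute(query).fetchall())

ordered = total_by_category(conn, "fact_orders", "order_quantity")
returned = total_by_category(conn, "fact_returns", "return_quantity")

# Combine the two result sets on the conformed attribute.
return_rate = {category: returned.get(category, 0) / ordered[category]
               for category in ordered}
```

Because both fact tables use the same `category` values, the two result sets line up without any mapping table or renaming step.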
- 18. The Need for BI Applications
Approximately 10% of your user population will
learn how to build ad hoc queries
They must learn the tool AND the data
This means you must build applications to provide
access to the other 90%
Structured
Flexible (parameters, pick lists, formats)
Well organized
BI Application Steps
1. BI Application design and specs
Right after business requirements
Template, mock-ups, specs, navigation framework
2. BI Application development
Can’t start until data and tools are available
Pull out your specs and get to work
Best to do this as part of Beta testing
- 19. Standard Reports
We recommend going live with a modest number of
reports (8-12)
Enlist business users in creating and QA-ing reports
Users don’t know what they want until you show them
something
Lots of reports are “theme and variations” –
parameterize them!
Build a BI portal to host the reports
Brand it with the DW/BI logo
Add useful info about operations, contents, and help
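The "theme and variations" advice can be illustrated with a single report definition that takes parameters instead of one report per year and region. This is a plain Python and sqlite3 sketch, not a Reporting Services report; the table, columns, and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, sales_year INTEGER, amount REAL);
    INSERT INTO sales VALUES
        ('West', 2008, 100.0),
        ('East', 2008,  50.0),
        ('West', 2009,  70.0),
        ('West', 2008,  25.0);
""")

def sales_report(conn, sales_year, region=None):
    """One report: total sales by region for a year, optionally for one region."""
    sql = "SELECT region, SUM(amount) FROM sales WHERE sales_year = ? "
    params = [sales_year]
    if region is not None:
        sql += "AND region = ? "
        params.append(region)
    sql += "GROUP BY region ORDER BY region"
    # Values are bound as parameters rather than pasted into the SQL text.
    return conn.execute(sql, params).fetchall()

all_regions_2008 = sales_report(conn, 2008)
west_2008 = sales_report(conn, 2008, region="West")
all_regions_2009 = sales_report(conn, 2009)
```

Three "reports" here are really one definition with different parameter values, which keeps the go-live set small.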
Advanced BI Applications
Planning and forecasting applications
You need a decent history of fairly accurate data before you can
plan / forecast
Planning and forecasting activities are highly analytic, with a little
bit of writeback
Heavy emphasis on “what-if”
Data mining
Collection of statistical techniques to identify trends and
correlations
Requires detailed (atomic) data
Can be the most valuable thing you do with your DW/BI system
Advanced BI apps are not Phase 1 projects
Deployment, Maintenance, and Growth
Deployment has two major components
Software and data availability (dev, test, prod)
User preparedness (training, documentation, and
support)
Maintenance
Monitor usage and performance
Anticipate problems
Growth
Iterate back through the Lifecycle with the next priority
business process
- 21. Session Summary
The DW/BI system can be high value, but it is
definitely high risk
Reduce risk by using an approach based on
business requirements
a flexible data architecture
delivering the full solution
Microsoft SQL Server 2008 provides the full
technology stack for DW/BI systems
SQL Server 2008 is well suited for the Kimball
Method
Next Steps
Learn about your business
Strategies, challenges, opportunities, terms
Industry, competition, trends
Learn the Kimball Method
Learn about adding business value
Learn the Lifecycle approach
Learn the Microsoft SQL Server 2008 DW/BI toolset
Get started!
Do a high level requirements definition and prioritization
- 22. For More Information…
Kimball University
Next 4-day Microsoft class on 3/31 (Chicago). Stockholm in May.
Other classes in modeling, lifecycle, and ETL throughout the year
Websites
www.kimballgroup.com: articles and design tips
forum.kimballgroup.com: the Kimball Forum
Kimball Books
The Microsoft Data Warehouse Toolkit, Joy Mundy and Warren
Thornthwaite with Ralph Kimball, Wiley, 2006 (the Microsoft book)
The Data Warehouse Toolkit 2nd Edition, Ralph Kimball and Margy Ross,
Wiley, 2002 (the modeling book)
The Data Warehouse Lifecycle Toolkit 2nd Edition, Kimball, Ross and
Thornthwaite, Wiley, 2008 (how to build a DW)
The Data Warehouse ETL Toolkit, Kimball and Caserta, Wiley, 2004 (ETL
theory and practice)