Data Warehouse - Business Intelligence Lifecycle Overview by Warren Thornthwaite
This slide deck describes the Kimball approach from the best-selling Data Warehouse Toolkit, 2nd Edition. It was presented to the Bay Area Microsoft Business Intelligence User Group in October 2012.
Starting with business requirements and project definition, the lifecycle branches out into three tracks: Technical, Data and Applications. You will learn:
* The major steps in the Lifecycle and what needs to happen in each one.
* Why business requirements are so important and how they influence all major decisions across the entire DW/BI system.
* Key tools for prioritizing business requirements and creating an enterprise information framework.
* How to break up a DW/BI system into doable increments that add real business value and can be completed in a reasonable time frame.
Microsoft Data Warehouse Business Intelligence Lifecycle - The Kimball Approach
1. THE DW/BI SYSTEM LIFECYCLE
OVERVIEW
The Kimball Approach
Warren Thornthwaite
2. Acknowledgments
Course materials adapted from...
The Microsoft Data Warehouse Toolkit
J. Mundy, W. Thornthwaite (Wiley 2006)
The Data Warehouse Lifecycle Toolkit, 2nd Ed.
R. Kimball, M. Ross, W. Thornthwaite,
J. Mundy, B. Becker (Wiley 2008)
The Data Warehouse Toolkit, 2nd Ed.
R. Kimball, M. Ross (Wiley 2002)
Kimball University
Course materials
Design Tips and Intelligent Enterprise articles at
www.KimballUniversity.com
3. Acknowledgments
Course materials adapted from...
The Data Warehouse Lifecycle Toolkit, 2nd Ed.
R. Kimball, M. Ross, W. Thornthwaite,
J. Mundy, B. Becker (Wiley 2008)
The Data Warehouse Toolkit, 2nd Ed.
R. Kimball, M. Ross (Wiley 2002)
The Microsoft Data Warehouse Toolkit, 2nd Ed.
J. Mundy, W. Thornthwaite (Wiley 2011)
Kimball Group / Kimball University
Course materials
Design Tips and Intelligent Enterprise articles at
www.KimballGroup.com
4. Session Agenda
DW/BI Lifecycle business context
DW/BI System Lifecycle overview
Planning and managing the project/program
Defining business requirements
Creating the dimensional model (the data track)
Designing the DW/BI system architecture (the technology track)
Building the ETL system
Building BI applications (the applications track)
Rollout and repeat
6. The Business Context to the Lifecycle
Business people need information to make plans and assess
results, and this need continues to grow
Data is captured by complex systems structured to support
specific transaction requirements
Business people find it difficult to get business information from
data in transaction systems
Therefore, our job is to create a system that will:
reliably take data out of the source systems,
restructure its form and content as appropriate for business analysis,
and provide it to the business people via tools they can actually use.
7. Strengthen Your Awareness of the
Broader Context
Ask yourself “What am I doing?”
Writing a program
Building a database
Creating a DW/BI system
Solving a set of high value, difficult problems
The broader you think, the more effective you will be
in addressing business problems and delivering real
business value.
9. The Kimball Approach
Understand business requirements and deliver business value
Follow a proven methodology: the DW Lifecycle
Build and deliver incrementally (by business process) within an
enterprise data framework (Bus Matrix and conformed dimensions)
Design the data sets for flexibility, usability and performance
(Business Process Dimensional Model)
Provide the complete solution, including reports, query tools, portals,
documentation, training, and support
10. Business Requirements Are the
Foundation of Success
The more you focus your efforts on information-based
business opportunities that are high value and
relatively easy to implement, the more likely you will
be to succeed.
Regardless of which ETL or BI tools you use
Regardless of which database you use
Regardless of your technical skills
11. The DW/BI Lifecycle
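[Figure: the DW/BI Lifecycle diagram]
Project Planning leads to Business Requirements Definition, which feeds three parallel tracks:
Technology track: Technical Architecture Design, then Product Selection & Installation
Data track: Dimensional Modeling, then Physical Design, then ETL Design & Development
Applications track: BI Application Specification, then BI Application Development
The tracks converge at Deployment, followed by Maintenance and Growth
Project Management runs beneath the entire Lifecycle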
12. Planning and Managing the
Project/Program
[Figure: the DW/BI Lifecycle diagram, with the Project Planning and Project Management boxes highlighted]
13. Project Planning & Management
Highlights
Assess readiness and determine starting point
Define the program / project – (2 phased startup)
Phase 1 program level: Enterprise business requirements
Prioritization / Business justification
Phase 2 project scope: Initial business process lifecycle iteration
Plan the project
Team roles and responsibilities
Detailed project plan
Manage the project
Control scope creep
Communication to manage expectations
14. Data Warehouse
Readiness Factors
1. Strong business management sponsor(s) (60%)
Vision of value, Politically capable, Realistic
2. Compelling business motivation (15%)
Generates urgency and supplies justification
3. Feasibility (15%)
Data feasibility
4. Other organizational Issues (10%)
IT/Business partnership
Current analytic culture
Requirements definition and prioritization are best tools to
address shortfalls
Proof of concept demo is generally a bad idea
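The percentages above can be read as weights in a simple weighted score. The sketch below is only an illustration of that arithmetic: the weights come from the slide, while the 1-to-5 rating scale, the function name and the sample ratings are assumptions, not part of the Kimball method.

```python
# Weights are the percentages from the slide; everything else is hypothetical.
READINESS_WEIGHTS = {
    "sponsorship": 0.60,
    "business_motivation": 0.15,
    "feasibility": 0.15,
    "organizational_issues": 0.10,
}


def readiness_score(ratings):
    """Return the weighted average of the per-factor ratings.

    `ratings` maps each factor name to a number on whatever scale the
    team agrees on (1 to 5 is assumed in the sample below).
    """
    missing = READINESS_WEIGHTS.keys() - ratings.keys()
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    return sum(weight * ratings[factor] for factor, weight in READINESS_WEIGHTS.items())


# Hypothetical assessment: strong motivation and data, but a weak sponsor.
sample = {
    "sponsorship": 2,
    "business_motivation": 5,
    "feasibility": 4,
    "organizational_issues": 3,
}
score = readiness_score(sample)
print(f"readiness: {score:.2f} out of 5")
```

Because sponsorship carries 60% of the weight, a weak sponsor drags the overall score down even when every other factor looks healthy.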
15. Defining the Project:
the Two Phased Startup
Phase 1: Enterprise Requirements definition
Phase 2: Project requirements focused on top priority
business process
[Figure: the two-phase startup]
Phase 1, Enterprise (Horizontal): Initial Project Scope, Business Requirement Definition, Requirements Prioritization
Phase 2, Project (Vertical): Project Planning, Business Process Requirement Definition
Project Management spans both phases
16. Planning the Project:
the Phase 2 Detailed Project Plan
Assign roles and responsibilities
Leverage existing project planning tools
List end-to-end tasks for entire Lifecycle
Integrated and detailed
Key team members should develop estimates for their
tasks
User acceptance after major tasks & deliverables
Keep unique characteristics in mind
Cross-functional, high visibility, iterative
Data problems will happen – identify them early!
17. Estimating Guidelines for
Project Planning / Management
Phase 1 requirements and prioritization
Key determinants: Readiness / sponsor scenario
Rule of thumb: Three weeks to several months or more
Developing the project plan
Rule of thumb: Less than two weeks
On-going project management
Key determinants: Organizational complexities, number of
players, number of issues, political realities, ...
Rule of thumb: Often dedicated to DW/BI team
18. Defining Business Requirements
[Figure: the DW/BI Lifecycle diagram, with the Business Requirements Definition box highlighted]
19. Defining Business Requirements:
Overall Process
Interviews are preferable
Three phases
Preparation (do your homework)
Interviews (including data source experts)
Documentation
Two passes
Enterprise
Project
20. Defining Business Requirements: the
Interviews
Assign roles and be ready
Must ask the right question
NOT “What do you want?”
Ask “What do you do?”
(“What are your roles and responsibilities? What could you do better
with improved access to information?”)
Cover key areas and listen
Take notes
Debrief with team immediately after
Common themes / opportunities
Required data (business processes)
Do-ability
Areas requiring clarification
User analytical / technical sophistication
21. Defining Business Requirements:
Interview Results
You must do the formal documentation
Validation
Reference material
Individual interview write-ups
Summary, not transcript
Business Objectives
Analytic opportunities and info requirements
Project Success Criteria
Consolidated findings document
Main content is list and descriptions of analytic opportunities
Includes the initial data warehouse bus matrix
22. The Data Warehouse Bus Matrix is the
Enterprise Data Architecture Framework
Matrix of business processes and
conformed dimensions
Business <--- Dimensions --->
Processes Date Product Dist Ctr Vendor Shipper Store Customer Promo
Purchase Orders X X X X
Dist Ctr Delivery X X X X X
Dist Ctr Inventory X X X
Store Deliveries X X X X X
Store Inventory X X X
Store Sales X X X X X
Returns X X X X X
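A bus matrix can also be held as a small data structure, which makes it easy to see which dimensions must be conformed and what each new iteration has to build. The sketch below is an illustration only: the process and dimension names come from the matrix above, but which dimensions each process uses is an assumption made for the example, and the function names are hypothetical.

```python
# Business process -> dimensions it uses. The assignments are illustrative
# assumptions; only the row and column names are taken from the slide.
BUS_MATRIX = {
    "Purchase Orders": {"Date", "Product", "Dist Ctr", "Vendor"},
    "Dist Ctr Inventory": {"Date", "Product", "Dist Ctr"},
    "Store Inventory": {"Date", "Product", "Store"},
    "Store Sales": {"Date", "Product", "Store", "Customer", "Promo"},
}


def conformed_dimensions(matrix):
    """Return the dimensions shared by more than one business process."""
    usage = {}
    for process, dims in matrix.items():
        for dim in dims:
            usage.setdefault(dim, []).append(process)
    return {dim: procs for dim, procs in usage.items() if len(procs) > 1}


def new_dimensions_needed(matrix, built_processes, next_process):
    """Return the dimensions the next iteration must build from scratch."""
    already_built = set().union(*(matrix[p] for p in built_processes))
    return matrix[next_process] - already_built


shared = conformed_dimensions(BUS_MATRIX)
for dim in sorted(shared):
    print(f"{dim}: shared by {len(shared[dim])} processes")

print(new_dimensions_needed(BUS_MATRIX, ["Store Sales"], "Purchase Orders"))
```

Reading the matrix this way shows why later iterations tend to go faster: once Store Sales is delivered, Store Inventory needs no new dimensions at all in this example.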
23. Requirements Prioritization Session
Facilitated session with Business and IT management
Agenda:
Confirm requirements
Prioritize analytic info groups
Evaluate business impact / benefit
Evaluate feasibility
Outcomes:
Mgmt education on feasibility
“Right” opportunities
Consensus
Ownership / Sponsorship
Roadmap for growth
[Figure: prioritization grid plotting opportunities A through H by Potential Business Impact (Low to High) against Feasibility (Low to High)]
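The prioritization grid amounts to ranking opportunities on two scores. The following sketch shows one way to shortlist the high-impact, high-feasibility quadrant; the 1-to-10 scale, the threshold, the tie-breaking rule and the sample scores are all assumptions for illustration, since in practice the placement comes out of the facilitated session.

```python
def prioritize(opportunities, threshold=5):
    """Return opportunities in the high-impact, high-feasibility quadrant.

    Results are ordered best first by combined score, with business
    impact breaking ties (an assumed rule, not one stated in the deck).
    """
    shortlist = [
        o for o in opportunities
        if o["impact"] >= threshold and o["feasibility"] >= threshold
    ]
    return sorted(
        shortlist,
        key=lambda o: (o["impact"] + o["feasibility"], o["impact"]),
        reverse=True,
    )


# Hypothetical scores agreed in the session (1 = low, 10 = high).
opportunities = [
    {"name": "A", "impact": 9, "feasibility": 8},
    {"name": "B", "impact": 8, "feasibility": 3},
    {"name": "C", "impact": 2, "feasibility": 9},
    {"name": "D", "impact": 3, "feasibility": 2},
    {"name": "E", "impact": 6, "feasibility": 7},
]
ranked = [o["name"] for o in prioritize(opportunities)]
print(ranked)
```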
24. Defining Business Requirements
Summary
Understanding business requirements is CRITICAL to
successful DW/BI system
Don’t overlook the up-front preparation
Focus on listening
Document what you’ve heard
Analytic requirements
Enterprise bus matrix
Prioritize with senior management
25. Designing the Dimensional Model
The Data Track
[Figure: the DW/BI Lifecycle diagram, with the data track boxes (Dimensional Modeling, Physical Design, ETL Design & Development) highlighted]
26. Designing the Business
Process Dimensional Model
Basic dimensional modeling concepts
Slowly changing dimensions
The dimensional modeling process
Data profiling and data stewardship
27. Terminology: Business Process
Dimensional Model (or Star Schema)
Normalized fact table (business event) for a single business
process at atomic detail level (the grain)
Denormalized dimensions (entities/objects) with all attributes
and one active row per occurrence of the object
Benefits:
Easier to understand
Better performance
Pre-joined dimensions
Star join optimization
Dimensional engine
Extensible to handle change
[Figure: star schema with a central fact table (Date KEY, Product KEY, Store KEY, Promo KEY, Facts) joined to the Date, Product, Store and Promo dimensions, each holding its key and attributes]
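A minimal sketch of the star schema described above, assuming Python's built-in sqlite3 module as a stand-in for the relational DBMS. Table names, column names and sample rows are hypothetical, and the Promo dimension is left out to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Denormalized dimensions: one table per entity, all attributes inline.
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT, brand TEXT);
CREATE TABLE dim_store   (store_key INTEGER PRIMARY KEY, store_name TEXT, region TEXT);

-- Fact table. Declared grain: one row per product per store per day.
CREATE TABLE fact_sales (
    date_key     INTEGER REFERENCES dim_date(date_key),
    product_key  INTEGER REFERENCES dim_product(product_key),
    store_key    INTEGER REFERENCES dim_store(store_key),
    unit_sales   INTEGER,
    dollar_sales REAL
);

INSERT INTO dim_date VALUES (1, '2012-10-01', '2012-10'), (2, '2012-10-02', '2012-10');
INSERT INTO dim_product VALUES (1, 'Widget', 'Acme'), (2, 'Gadget', 'Zenith');
INSERT INTO dim_store VALUES (1, 'Downtown', 'West'), (2, 'Airport', 'East');
INSERT INTO fact_sales VALUES
    (1, 1, 1, 10, 100.0), (1, 2, 1, 5, 75.0),
    (2, 1, 2, 4, 40.0),   (2, 2, 2, 1, 15.0);
""")

# A typical star join: constrain and group by dimension attributes,
# then sum the facts.
rows = conn.execute("""
    SELECT p.brand, s.region, SUM(f.dollar_sales) AS dollars
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    JOIN dim_store   AS s ON s.store_key   = f.store_key
    GROUP BY p.brand, s.region
    ORDER BY p.brand, s.region
""").fetchall()

for brand, region, dollars in rows:
    print(brand, region, dollars)
```

Every business question follows the same shape: join the fact table to the dimensions it needs, filter and group on dimension attributes, and aggregate the facts.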
28. Terminology:
Slowly Changing Dimension
Techniques for handling changes to dimension attributes
Type 1: overwrite attribute values
Common default, appropriate for corrections
Type 2: create a new dimension row when attribute value
changes
Flexible technique, critical for accurately tracking behavior over time
Hybrid combinations of 1 and 2 are most common
Integration Services has basic Slowly Changing Dimension
management built in
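The difference between the two techniques is easiest to see on a single dimension row. The sketch below uses plain Python dictionaries rather than Integration Services; the column names (row_start, row_end, is_current), the helper names and the sample customer are assumptions chosen for illustration.

```python
from datetime import date


def apply_type1(rows, business_key, attribute, new_value):
    """Type 1: overwrite the attribute on every row for this business key."""
    for row in rows:
        if row["business_key"] == business_key:
            row[attribute] = new_value


def apply_type2(rows, business_key, attribute, new_value, effective):
    """Type 2: expire the current row and append a new current row."""
    current = next(
        r for r in rows if r["business_key"] == business_key and r["is_current"]
    )
    current["is_current"] = False
    current["row_end"] = effective

    # Copy the expired row, change the tracked attribute, and give the
    # copy a fresh surrogate key and an open-ended validity range.
    new_row = dict(current, **{attribute: new_value})
    new_row.update(
        surrogate_key=max(r["surrogate_key"] for r in rows) + 1,
        row_start=effective,
        row_end=None,
        is_current=True,
    )
    rows.append(new_row)
    return new_row


customer = [{
    "surrogate_key": 1, "business_key": "C-100",
    "name": "Pat Lee", "city": "Oakland",
    "row_start": date(2010, 1, 1), "row_end": None, "is_current": True,
}]

apply_type1(customer, "C-100", "name", "Pat Li")                       # spelling correction
apply_type2(customer, "C-100", "city", "San Jose", date(2012, 10, 1))  # customer moved

for row in customer:
    print(row["surrogate_key"], row["name"], row["city"], row["is_current"])
```

After both calls the corrected name appears on every row, while the move leaves the Oakland row in place so that older facts still roll up to the city that was true at the time.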
29. Dimensional Modeling Process
Develop the Data Warehouse Bus matrix
Start with the 4-step method to identify facts and dimensions
Step 1: Identify the business process (what row on the matrix should we
start with?)
Step 2: Declare the grain
Step 3: Choose the dimensions
Step 4: Choose the facts
Diagram the dimensional model
Fill in the dimension and fact attributes (Step 5)
Use business requirements + source docs + data profiling
Follow naming standards (understandable to business)
Try the dimensional modeling spreadsheet from the book’s web site:
http://www.kimballgroup.com/html/booksMDWTtools.html
30. Creating Conformed Dimensions
(Step 5)
All fact tables that share dimensions must use the same
dimension with the same key
Agree on column names and definitions
Identify best source
Assign surrogate key to every dimension row
Combine all attributes into Master dimension table
Use the Master dimension to map the business key in the fact rows
to the surrogate key for each business process that uses the dimension
[Figure: Product master dimension. Surrogate Key: Product KEY. Business Key: Product Code. Marketing attributes: Description, Brand, Category. Logistics attributes: Height, Width, Weight. Cost Acctg. attribute: Standard Cost]
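The steps above can be sketched as a small merge routine: combine each department's attributes by business key, assign a surrogate key to every resulting row, and derive the business-key-to-surrogate-key map that fact loads will use. This is an illustration under assumptions; the function name, the attribute values and the choice to number keys in business-key order are hypothetical.

```python
def build_master_dimension(sources, start_key=1):
    """Merge attribute sets by business key and assign surrogate keys.

    `sources` is a list of {business_key: {attribute: value}} mappings,
    one per contributing department or system.
    """
    merged = {}
    for source in sources:
        for business_key, attributes in source.items():
            merged.setdefault(business_key, {}).update(attributes)

    master = []
    for surrogate_key, business_key in enumerate(sorted(merged), start=start_key):
        master.append({
            "product_key": surrogate_key,   # surrogate key
            "product_code": business_key,   # business key
            **merged[business_key],
        })
    return master


# Hypothetical departmental sources for the Product dimension.
marketing = {
    "P-2": {"description": "Gadget", "brand": "Zenith"},
    "P-1": {"description": "Widget", "brand": "Acme"},
}
logistics = {"P-1": {"weight": 1.5}, "P-2": {"weight": 0.4}}
cost_accounting = {"P-1": {"standard_cost": 4.25}}

product_dim = build_master_dimension([marketing, logistics, cost_accounting])

# Every business process that uses Product looks up keys in the same map.
key_map = {row["product_code"]: row["product_key"] for row in product_dim}
print(key_map)
```

The missing standard cost for P-2 is the kind of gap that surfaces during this merge and has to be resolved with the business before the dimension is usable.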
31. Dealing with Data Quality
Data Profiling
Data exploration to determine data feasibility
Understand data structures, relationships and business rules
Identify (and document) data problems
Tools
Simple: SQL, BI tool, RS project (see kimballgroup.com)
Advanced: Data Profiling tool
Data Stewardship
Identify people on the business side who care
Enroll them in data exploration
Include source systems managers
Agree on names, definitions, business rules, etc.
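At the simple end of the tool range, a first profiling pass is little more than counting nulls, distinct values and the most frequent values per column. The sketch below does this over rows held as Python dictionaries; the function name, the treatment of empty strings as missing, and the sample data are assumptions for illustration.

```python
from collections import Counter


def profile_column(rows, column):
    """Return basic profile statistics for one column.

    None and the empty string are both counted as missing values.
    """
    values = [row.get(column) for row in rows]
    non_null = [v for v in values if v not in (None, "")]
    counts = Counter(non_null)
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(counts),
        "most_common": counts.most_common(3),
    }


# Hypothetical extract from a customer source table.
customers = [
    {"customer_id": "C1", "state": "CA"},
    {"customer_id": "C2", "state": "ca"},
    {"customer_id": "C3", "state": ""},
    {"customer_id": "C3", "state": "CA"},
    {"customer_id": "C4", "state": None},
]

state_profile = profile_column(customers, "state")
id_profile = profile_column(customers, "customer_id")
print(state_profile)
print(id_profile)
```

Even this small sample surfaces three issues to document: missing states, inconsistent casing ("CA" and "ca"), and a customer_id that is not unique.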
32. Dimensional Modeling Summary
Enterprise perspective / roadmap
Enterprise Data Warehouse Bus Matrix
Presentation area must be dimensional
Ease of use
Query performance
Start with atomic detail, not just summary
Conform dimensions for consistency
Apply SCD techniques for handling attribute changes
Engage business to define names, content, business rules, and deal with data
quality
Process
4-step approach
1) Business process, 2) grain, 3) dimensions, 4) facts
Fill in the attributes and measures (Step 5)
33. Designing the DW/BI System
Architecture
The Technology Track
[Figure: the DW/BI Lifecycle diagram, with the technology track boxes (Technical Architecture Design, Product Selection & Installation) highlighted]
34. Architecture Principles
The DW/BI System architecture is the set of
components and functionality needed to meet the
business requirements
Business requirements determine architecture
Most of the tools include only core functionality. You
will have to write code for your specific issues.
This means your DW/BI system architecture will not
be the same as your neighbor’s
Draw it out and write it down!
35. The Goal: A Conformed DW/BI System
[Figure: source systems (Inventory Processing, Orders Processing, Billing / Returns) feed the ETL System, which loads the Business Process Dimensional Models (Inventory, Orders, Billing, Returns) into the relational DBMS and Analysis Services, with aggregates held in Analysis Svcs. or the relational DB; business groups such as Logistics, Sales and Marketing query the models]
Models contain atomic-level detail
with aggregates for performance
and transparent aggregate navigation
Includes both relational dimensional model and OLAP dimensional model
36. What Goes Into a Typical
Warehouse Architecture?
High Level Warehouse Technical Architecture Model
[Figure: architecture diagram divided into a Back Room and a Front Room, with Metadata spanning the top and Infrastructure along the bottom]
Back Room
Source Systems: Operational, ODS, Desktop tools, XML / Flat files, MDM system, External, ...
ETL Services: Extract, Cleaning & Conforming, Preparing for Presentation, Delivery, ETL Management Services
ETL Metadata
Presentation Server
Atomic level business process dimensional models (each a fact table surrounded by Dim 1 through Dim 4)
Aggregates for Performance
Enterprise Bus Matrix
Front Room
BI Applications: direct access query and reporting tools, Standard Reports, Analytic Applications, Dashboards & Scorecards, Operational BI
BI Portal
Other labels recoverable from the diagram: Data Quality Workbench, Dimension Maintenance, Real Time Layer, Data Access Services, Aggregate Navigator, Metadata Repository, Security Mgmt.
37. Sample Architecture Plan Document
Outline
Executive Overview
Architecture Implications of Business Requirements
Architecture Overview
Back Room and Front Room Services
Data Stores (Source, Staging, Presentation Servers)
Metadata Strategy
ETL System Strategy and Details
BI Applications System Strategy and Details
Infrastructure
Architecture Implementation Phases and Timing
Technology Evaluation Process
Architecture Models
38. The ETL System
Populating the data warehouse
[Figure: the DW/BI Lifecycle diagram, with the ETL Design & Development box highlighted]
39. ETL Startup
Create an ETL Plan
Based on dimensional model docs, data quality, and
additional research
Map source tables to each target and identify required
transformations
Each target flow corresponds to an ETL package
Setup development environment
40. The ETL Functions
Understand the core functions common to most ETL
systems (there are 34 of them)
They fall into four categories:
Extract: get the data out of the source and into the DW
system
Transformation: clean the data and conform it to
standard definitions and contents
Prepare the data for presentation: “dimensionalization”
Manage all the above functions in a coherent system
41. Populating Dimension Tables
Recreating Type 2 change history can be a challenge
Cleaning and conforming can be complex
Integrating multiple sources and de-duplicating is a
process unique to your business
Integration Services’ tools including Fuzzy Lookup can
help for simple problems
Complex problems require a third party tool or service
Universal dimension function is handling changes in
dimension attributes (SCDs)
42. Slowly Changing Dimensions
Dimension attributes will change over time
Business users determine what must be tracked
[Figure: Slowly Changing Dimension processing flow]
Compare each source file row with the master dimension, then:
1. New source rows: assign surrogate keys, then simple insert into the master dimension
2. Type 2 changed rows: assign surrogate keys, insert new current rows, and update Current_Flags and dates
3. Type 1 changed rows: update the value in the current row
Rows with no change: ignore
(Optional) Replace the Most-Recent-Key Map with the current rows
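The compare step that routes each source row into one of these paths can be sketched as follows. This is an illustration, not the Integration Services implementation: the function name, the row layout and the rule that a Type 2 change takes precedence when both kinds of attribute changed are assumptions.

```python
def classify_changes(source_rows, current_dimension, type1_attrs, type2_attrs):
    """Route each source row to the new, Type 2, Type 1 or unchanged path.

    `current_dimension` holds only the current row for each business key.
    """
    current_by_key = {row["business_key"]: row for row in current_dimension}
    result = {"new": [], "type2": [], "type1": [], "unchanged": []}

    for src in source_rows:
        cur = current_by_key.get(src["business_key"])
        if cur is None:
            result["new"].append(src)
        elif any(src[a] != cur[a] for a in type2_attrs):
            # The new row will carry any Type 1 changes along with it.
            result["type2"].append(src)
        elif any(src[a] != cur[a] for a in type1_attrs):
            result["type1"].append(src)
        else:
            result["unchanged"].append(src)
    return result


# Hypothetical current dimension rows and the latest source extract.
current = [
    {"business_key": "C1", "name": "Pat Lee", "city": "Oakland"},
    {"business_key": "C2", "name": "Sam Roy", "city": "Fresno"},
    {"business_key": "C3", "name": "Ana Diaz", "city": "Napa"},
]
source = [
    {"business_key": "C1", "name": "Pat Lee", "city": "San Jose"},  # moved
    {"business_key": "C2", "name": "Sam Roye", "city": "Fresno"},   # name corrected
    {"business_key": "C3", "name": "Ana Diaz", "city": "Napa"},     # no change
    {"business_key": "C4", "name": "Lou Kim", "city": "Chico"},     # new customer
]

routed = classify_changes(source, current, type1_attrs=["name"], type2_attrs=["city"])
for path, rows in routed.items():
    print(path, [r["business_key"] for r in rows])
```

Which attributes go in each list is the business decision noted above: business users determine what must be tracked.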
43. Populating Fact Tables
Populate the initial historical load
Different source systems, data structures, formats over time
History missing
Must build historical Slowly Changing Dimensions first
Can take a long time
Develop incremental load logic
Usually different packages from the historical load
Push vs. Pull (extract ownership)
Identify new / changed rows
Key substitution is big task
Catch up history and start incremental loads
Validate data at each step
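When the source system cannot flag its own changes, one common way to identify new and changed rows for the incremental load is to compare a hash of each row against the hash saved from the previous load. The deck does not prescribe this technique; the sketch below is an assumption-laden illustration, with hypothetical function names, column names and sample orders.

```python
import hashlib


def row_hash(row, columns):
    """Hash the tracked columns of a row into a fixed-length fingerprint."""
    joined = "\x1f".join(str(row[c]) for c in columns)  # unit separator between values
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def detect_changes(extract, previous_hashes, key, columns):
    """Split an extract into new and changed rows.

    Returns (new_rows, changed_rows, current_hashes); save current_hashes
    and pass it back in as previous_hashes on the next load.
    """
    new_rows, changed_rows, current_hashes = [], [], {}
    for row in extract:
        fingerprint = row_hash(row, columns)
        current_hashes[row[key]] = fingerprint
        if row[key] not in previous_hashes:
            new_rows.append(row)
        elif previous_hashes[row[key]] != fingerprint:
            changed_rows.append(row)
    return new_rows, changed_rows, current_hashes


tracked = ["status", "amount"]

# First load: everything is new.
monday = [
    {"order_id": 1, "status": "open", "amount": 50.0},
    {"order_id": 2, "status": "open", "amount": 20.0},
]
new_mon, changed_mon, hashes = detect_changes(monday, {}, "order_id", tracked)

# Next load: order 1 shipped, order 3 arrived, order 2 is untouched.
tuesday = [
    {"order_id": 1, "status": "shipped", "amount": 50.0},
    {"order_id": 2, "status": "open", "amount": 20.0},
    {"order_id": 3, "status": "open", "amount": 5.0},
]
new_tue, changed_tue, hashes = detect_changes(tuesday, hashes, "order_id", tracked)

print([r["order_id"] for r in new_tue], [r["order_id"] for r in changed_tue])
```

This sketch does not detect rows deleted at the source; that would need a separate comparison of the key sets.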
44. Fact Table ETL:
Surrogate Key Lookup (Pipeline)
Replace production keys in the fact table extract with surrogate
keys from the dimensions
Maintain and ensure referential integrity!
Watch for fact table key collisions
[Figure: surrogate key lookup pipeline]
Fact table records with production IDs: time_ID, product_ID, store_ID, promotion_ID, dollar_sales, unit_sales, dollar_cost
Four lookups in sequence, each against a most-recent key map:
Most Recent Time Key Map: replace time_ID with surrogate time_key
Most Recent Product Key Map: replace product_ID with surrogate product_key
Most Recent Store Key Map: replace store_ID with surrogate store_key
Most Recent Promotion Key Map: replace promo_ID with surrogate promo_key
Referential integrity failures and key collisions are diverted from the pipeline
Fact table records with surrogate keys: time_key, product_key, store_key, promotion_key, dollar_sales, unit_sales, dollar_cost
Load fact table records into DBMS
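The pipeline can be sketched as a loop that swaps each production ID for its surrogate key and sets aside any row whose ID is missing from a key map. This is an illustration only: the function name, the shape of the key maps and the sample rows are hypothetical, and key collision handling is not shown.

```python
def substitute_keys(fact_rows, key_maps):
    """Replace production IDs with surrogate keys.

    `key_maps` maps each production ID column to a pair of
    (surrogate key column, {production ID: surrogate key}).
    Returns (loadable_rows, failures); a failure pairs the original row
    with the list of lookups that found no match.
    """
    loadable, failures = [], []
    for row in fact_rows:
        out = dict(row)
        missing = []
        for id_column, (key_column, mapping) in key_maps.items():
            production_id = out.pop(id_column)
            if production_id in mapping:
                out[key_column] = mapping[production_id]
            else:
                missing.append((id_column, production_id))
        if missing:
            failures.append((row, missing))  # referential integrity failure
        else:
            loadable.append(out)
    return loadable, failures


# Hypothetical most-recent key maps built from the dimension tables.
key_maps = {
    "product_ID": ("product_key", {"P-1": 1, "P-2": 2}),
    "store_ID": ("store_key", {"S-9": 7}),
}
extract = [
    {"product_ID": "P-1", "store_ID": "S-9", "dollar_sales": 100.0},
    {"product_ID": "P-3", "store_ID": "S-9", "dollar_sales": 12.0},  # unknown product
]

loadable, failures = substitute_keys(extract, key_maps)
print(loadable)
print(failures)
```

Holding failed rows aside rather than loading them keeps referential integrity intact; what to do with them afterwards (reject, or load against a placeholder dimension row) is a design decision for each system.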
45. ETL System Summary
Develop a plan and setup development environment
Build out historical dimensions, including Type 2
attribute changes
Build out historical facts based on historical dimension
key substitution
Design and build Analysis Services cube(s)
Create incremental load packages
46. BI Applications
Delivering Value, not just data
[Figure: the DW/BI Lifecycle diagram, with the applications track boxes (BI Application Specification, BI Application Development) highlighted]
47. BI Application
Concepts
Role and definition
Application design
Templates
Navigation
Applications development
Additional Value
Data validation
Performance tuning
Character development
48. Role and Definition of BI
Applications
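[Figure: BI application roles by nature of use, from strategic to operational, with a migration path between each level]
Strategic use
Consumer type: ad hoc power users
Information interface: desktop tools for do-it-yourself queries
Value of BI applications: reporting / analysis examples; assured reference points
Middle tier
Consumer type: push-button knowledge workers
Information interface: BI applications
Value of BI applications: low effort; current business view; flexible
Operational use
Consumer type: standard report consumers
Information interface: operational reporting environment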
49. The BI Application Continuum
[Figure: a continuum running from Standard Reports to Analytic Applications]
Simple, fixed format, pre-run reports
Standard, flexible format, parameter driven reports
Complex analytic applications with domain expertise, embedded algorithms and operational system feedback loops
50. Design and Spec BI Applications
Create and prioritize a candidate report list
Develop a standard report template
Mandatory content (descriptions, titles, etc.)
Output look and feel
Develop mock-ups for top N candidate BI
applications
Report/dashboard layouts with parameters
Document data sets, business rules, calculations, etc.
End user navigation
Structured path through templates and reports
This becomes the core of the BI Portal
51. Sample BI Application Mock-up
[Mock-up of a report page; angle brackets and braces mark variable content]
Global BI System
We’re here to help
<Geography Name>   (callout: from the Geography Dimension)
Topline Performance Report
<Period> Compared to <Previous Period>   (callouts: variable time period, on each)
<<Product Line>>   Sales Units   YA Sales Units   Sales Index   Market Share   % Var Prev Share
xxxxxxxxxx         xxx,xxx       xxx,xxx          xx.x          xx.xx          x.x
xxxxxxxxxx         xxx,xxx       xxx,xxx          xx.x          xx.xx          x.x
xxxxxxxxxx         xxx,xxx       xxx,xxx          xx.x          xx.xx          x.x
xxxxxxxxxx         xxx,xxx       xxx,xxx          xx.x          xx.xx          x.x
xxxxxxxxxx         xxx,xxx       xxx,xxx          xx.x          xx.xx          x.x
(callout on <<Product Line>>: Product Lines that meet the constraint criteria; may have drill down capability)
Report Information
Report Category: {Sales Analysis}
Report Name: {Topline Performance Report – current vs. prior period by Geography}
Source: {DW - Sales Performance} Run on: {Run_Date} Page { 1}
54. Developing BI Applications
Begin development when...
Data is ready
Front end tool installed and environment set-up
Pull out report specifications
Build reports
Validate tool calculations and drill-paths
Validation data
Performance Tuning
Ongoing maintenance and enhancement resources
Character Development
55. BI Applications Summary
BI Applications play a critical role
High value reporting
Broad audience
Data Warehouse team learning opportunity
Design Application standards, specs and navigation up
front
Develop Applications when data is ready
Validate tool capabilities
Check data quality
Identify query performance problems
Make sure you have dedicated resources to maintain and
enhance your BI offerings
56. Rollout and Repeat
Security, Deployment, Operations, and Growth
[Figure: the DW/BI Lifecycle diagram, with the Deployment, Maintenance and Growth boxes highlighted]
57. The Thankless Tasks that Must Be
Done
Security
Deployment process
User support
Training, desktop, support, documentation
System deployment
Dev, Test, Production
Maintenance
System
User support
58. Growth
The Lifecycle is an iterative process
Revisit opportunities with business and select the next
top priority
Build additional dimensions
Load facts for this business process
(fill out the bus matrix row by row)
Build and deliver the BI applications
Rollout and repeat!
59. Start the Business Dimensional Lifecycle All
Over Again!
[Figure: the complete DW/BI Lifecycle diagram, from Project Planning through Deployment, Maintenance and Growth, with Project Management throughout]
60. Conclusion
The Lifecycle is a proven DW/BI methodology
Keys to success:
Business value focused
Short, iterative delivery cycles
In an Enterprise framework
Full, end-to-end solution
61. Contact Info
warren@kimballgroup.com
Visit www.kimballgroup.com for
Articles
Design tips (149 and counting)
Whitepapers
Forum
All of the concepts discussed are expanded on in the
Kimball Toolkit series of books