Given at Oracle OpenWorld 2011. Not to be confused with Oracle Database Vault (a commercial database security product), Data Vault Modeling is a specific data modeling technique for designing highly flexible, scalable, and adaptable data structures for enterprise data warehouse repositories. It has been in use globally for over 10 years but is not widely known. This presentation gives an overview of the features of a Data Vault modeled EDW that distinguish it from the more traditional third normal form (3NF) or dimensional (i.e., star schema) modeling approaches used in most shops today. Topics include dealing with evolving data requirements in an EDW (i.e., model agility), partitioning of data elements based on rate of change (and how that affects load speed and storage requirements), and where Data Vault fits in a typical Oracle EDW architecture. See more content like this on my blog, http://kentgraziano.com, or follow me on Twitter @kentgraziano.
1. Why Data Vault?
Kent Graziano
Data Vault Master and Oracle ACE
TrueBridge Resources
OOW 2011
Session #28782
2. My Bio
• Kent Graziano
– Certified Data Vault Master
– Oracle ACE (BI/DW)
– Data Architecture and Data Warehouse Specialist
• 30 years in IT
• 20 years of Oracle-related work
• 15+ years of data warehousing experience
– Co-Author of
• The Business of Data Vault Modeling (2008)
• The Data Model Resource Book (1st Edition)
• Oracle Designer: A Template for Developing an Enterprise
Standards Document
– Past-President of Oracle Development Tools User Group
(ODTUG) and Rocky Mountain Oracle User Group
– Co-Chair BIDW SIG for ODTUG
3. Data Vault Definition
The Data Vault is a detail oriented, historical
tracking and uniquely linked set of normalized
tables that support one or more functional areas
of business.
It is a hybrid approach encompassing the best of
breed between 3rd normal form (3NF) and star
schema. The design is flexible, scalable, consistent,
and adaptable to the needs of the enterprise. It is a
data model that is architected specifically to meet
the needs of today’s enterprise data warehouses.
Dan Linstedt: Defining the Data Vault
TDAN.com Article
(C) TeachDataVault.com
4. Where does a Data Vault Fit?
5. Where does a Data Vault Fit?
Oracle’s Next Generation Data Warehouse Reference Architecture
Data Vault goes here
(C) Oracle Corp
6. Why Bother With Something New?
Old Chinese proverb:
'Unless you change direction, you're
apt to end up where you're headed.'
7. Why do we need it?
• We have seen issues in constructing (and
managing) an enterprise data warehouse model
using 3rd normal form, or Star Schema.
– 3NF – Complex PKs with cascading snapshot
dates (time-driven PKs)
– Star – difficult to re-engineer fact tables for
granularity changes
• These issues lead to breakdowns in
flexibility, adaptability, and even scalability
(C) Kent Graziano
8. Data Vault Time Line
• Mid 60’s: Dimension & Fact Modeling presented by General Mills and Dartmouth University
• E.F. Codd invented relational modeling
• Chris Date and Hugh Darwen maintained and refined modeling
• Early 70’s: Bill Inmon began discussing Data Warehousing
• Mid 70’s: AC Nielsen popularized Dimension & Fact terms
• 1976: Dr Peter Chen created E-R Diagramming
• Mid 80’s: Bill Inmon popularizes Data Warehousing
• Mid to late 80’s: Dr Kimball popularizes Star Schema
• Late 80’s: Barry Devlin and Dr Kimball release “Business Data Warehouse”
• 1990: Dan Linstedt begins R&D on Data Vault Modeling
• 2000: Dan Linstedt releases first 5 articles on Data Vault Modeling
15. Bringing the Data Vault to Your Project
HOW DOES IT WORK?
16. Key: Flexibility (Agility)
• Goes beyond standard 3NF
• Hyper normalized
• Hubs and Links hold only keys and metadata
• Satellites split by rate of change and/or source
• Enables Agile data modeling
• Easy to add to model without having to change existing structures
and load routines
• Relationships (links) can be dropped and created on-demand.
• No more reloading history because of a missed requirement
• Based on natural business keys
• Not system surrogate keys
• Allows for integrating data across functions and source
systems more easily
• All data relationships are key driven.
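The three table types named above can be sketched as plain Python records. This is a minimal illustration, not the presenter's implementation: the class names, field names (load_date, record_source), and sample keys such as CUST-1001 are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Hub:
    """One row per natural business key; keys and metadata only."""
    business_key: str      # natural key from the business (assumed example value)
    load_date: datetime    # metadata: when the key was first loaded
    record_source: str     # metadata: which system supplied it


@dataclass(frozen=True)
class Link:
    """A relationship between hubs, expressed only through their keys."""
    hub_keys: tuple
    load_date: datetime
    record_source: str


@dataclass(frozen=True)
class SatelliteRow:
    """Descriptive attributes for one hub key, versioned by load date."""
    business_key: str
    load_date: datetime
    record_source: str
    attributes: tuple      # (name, value) pairs


loaded = datetime(2011, 10, 3)
customer = Hub("CUST-1001", loaded, "CRM")
order = Hub("ORD-77", loaded, "ORDERS")

# The link carries no descriptive data, only the keys it connects.
placed = Link((customer.business_key, order.business_key), loaded, "ORDERS")

# Two satellites on the same hub, split by source; adding the second one
# later requires no change to the hub, the link, or the first satellite.
customer_name_sat = [SatelliteRow("CUST-1001", loaded, "CRM", (("name", "Acme Corp"),))]
customer_credit_sat = [SatelliteRow("CUST-1001", loaded, "BILLING", (("credit_limit", 5000),))]
```

Because every structure refers to the hub only by its business key, a new satellite or link is an additive change, which is the agility argument the slide makes.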
17. Key: Flexibility (Agility)
Adding new components to the EDW has NEAR ZERO impact to:
• Existing Loading Processes
• Existing Data Model
• Existing Reporting & BI Functions
• Existing Source Systems
• Existing Star Schemas and Data Marts
18. Split and Merge ON DEMAND!
2 weeks from now
6 months from now
19. Case In Point:
The flexibility of the Data Vault model
allowed them to merge 3 companies in 90
days: that is ALL systems, ALL DATA!
20. Key: Scalability in Architecture
Scaling is easy; it's based on the following principles:
• Hub and spoke design
• MPP Shared-Nothing Architecture
• Scale Free Networks
• Can be partitioned vertically and horizontally to meet performance demands
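The last point, vertical and horizontal partitioning, might look like the following sketch over in-memory rows. The function names, column names, and the choice of load year as the horizontal bucket are assumptions for illustration only.

```python
KEY_COLUMNS = ("business_key", "load_date")


def split_vertically(rows, fast_columns):
    """Split one wide satellite into a slow-changing and a fast-changing one.

    Both halves keep the key columns so they can be rejoined on demand.
    """
    slow, fast = [], []
    for row in rows:
        keys = {c: row[c] for c in KEY_COLUMNS}
        fast.append({**keys, **{c: v for c, v in row.items() if c in fast_columns}})
        slow.append({**keys, **{c: v for c, v in row.items()
                                if c not in fast_columns and c not in KEY_COLUMNS}})
    return slow, fast


def partition_horizontally(rows, bucket_of):
    """Group rows into buckets (for example by load year) without changing columns."""
    buckets = {}
    for row in rows:
        buckets.setdefault(bucket_of(row), []).append(row)
    return buckets


rows = [
    {"business_key": "CUST-1001", "load_date": "2010-12-31", "name": "Acme", "balance": 90.0},
    {"business_key": "CUST-1001", "load_date": "2011-10-03", "name": "Acme", "balance": 120.0},
]
slow, fast = split_vertically(rows, fast_columns={"balance"})
by_year = partition_horizontally(rows, lambda r: r["load_date"][:4])
```

In a real load, each split satellite would receive a new row only when its own columns change, so the slow-changing half stops accumulating versions caused by the fast-changing columns.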
21. Perhaps You Wish To Split For
Performance Reasons?
FROM THIS
TO THIS!
22. Case In Point:
The result of this scalability was a Data
Vault model that scaled to 3 petabytes in
size, and is still growing today!
23. Key: Scalability in Team Size
You should be able to SCALE your TEAM as well!
With the Data Vault methodology, you can:
Scale your team when desired, at different points in the project!
24. Case In Point: (Dutch Tax Authority)
The team scaled by adding ETL
developers for each new source system
and reassigning them once that system was
completely loaded into the Data Vault
25. Key: Productivity
Increasing Productivity requires a reduction in complexity.
The Data Vault Model simplifies all of the following:
• ETL Loading Routines
• Real-Time Ingestion of Data
• Data Modeling for the EDW
• Enhancing and Adapting for Change to the Model
• Monitoring, managing, and optimizing processes
26. Key: Productivity
• Standardized modeling rules
• Highly repeatable and learnable modeling
technique
• Can standardize load routines
• Delta Driven process
• Re-startable, consistent loading patterns.
• Can standardize extract routines
• Rapid build of new or revised Data Marts
• Can be automated
• RapidACE (www.rapidace.com)
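A delta-driven, re-startable load routine could be sketched as below. Comparing a hash of the attributes against the latest stored version is one common way to detect a delta, but it is an assumption here, not something the slides specify; the function and field names are likewise hypothetical.

```python
import hashlib
from datetime import datetime


def row_hash(attributes):
    """Stable fingerprint of a row's descriptive attributes (assumed technique)."""
    payload = "|".join(f"{k}={attributes[k]}" for k in sorted(attributes))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def load_satellite(satellite, staged, load_date):
    """Append a new version only when the staged attributes differ from the latest.

    Returns the number of rows inserted. Re-running the same batch inserts
    nothing, which is what makes the routine safe to restart.
    """
    inserted = 0
    for business_key, attributes in staged.items():
        history = satellite.setdefault(business_key, [])
        new_hash = row_hash(attributes)
        if history and history[-1]["hash"] == new_hash:
            continue  # no delta: skip
        history.append({"load_date": load_date, "hash": new_hash,
                        "attributes": dict(attributes)})
        inserted += 1
    return inserted


satellite = {}
first = load_satellite(satellite, {"CUST-1001": {"name": "Acme"}}, datetime(2011, 10, 1))
rerun = load_satellite(satellite, {"CUST-1001": {"name": "Acme"}}, datetime(2011, 10, 1))
changed = load_satellite(satellite, {"CUST-1001": {"name": "Acme Corp"}}, datetime(2011, 10, 2))
```

The same pattern applies to every satellite regardless of its columns, which is why such routines lend themselves to standardization and generation.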
27. Key: Productivity
• The Data Vault holds granular historical relationships.
• Holds all history for all time, allowing any source
system feeds to be reconstructed on-demand
• Easy generation of Audit Trails for data lineage and
compliance.
• Data Mining can discover new relationships between
elements
• Patterns of change emerge from the historical
pictures and linkages.
• The Data Vault can be accessed by power-users
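Because every version is kept, the state of a record at any past moment can be rebuilt from its satellite history. The sketch below assumes each version is a dict with load_date and attributes fields; those names and the sample data are illustrative only.

```python
from datetime import datetime


def as_of(history, point_in_time):
    """Return the attributes in effect at point_in_time, or None if none existed yet."""
    current = None
    for version in sorted(history, key=lambda v: v["load_date"]):
        if version["load_date"] <= point_in_time:
            current = version["attributes"]
        else:
            break
    return current


history = [
    {"load_date": datetime(2011, 1, 15), "attributes": {"name": "Acme"}},
    {"load_date": datetime(2011, 9, 1), "attributes": {"name": "Acme Corp"}},
]

mid_year = as_of(history, datetime(2011, 6, 1))
latest = as_of(history, datetime(2011, 10, 3))
before_first_load = as_of(history, datetime(2010, 12, 31))
```

Listing the versions in load-date order, together with their record source, gives the audit trail mentioned above.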
28. Case in Point:
Result of Productivity was: 2 people in 2
weeks merged 3 systems, built a full Data
Vault EDW, 5 star schemas and 3 reports.
These individuals generated:
• 90% of the ETL code for moving the data set
• 100% of the Staging Data Model
• 75% of the finished EDW data Model
• 75% of the star schema data model
29. The Competing Bid?
The competition bid this with 15 people
and 3 months to completion, at a cost of
$250k! (They bid a very complex system.)
Actual total cost? $30k and 2 weeks!
30. Other Benefits of a Data Vault
• Modeling it as a DV forces integration of the Business Keys upfront.
• Good for organizational alignment.
• An integrated data set with raw data extends its value beyond BI:
• Source for data quality projects
• Source for master data
• Source for data mining
• Source for Data as a Service (DaaS) in an SOA (Service Oriented Architecture).
• Upfront Hub integration simplifies the data integration routines
required to load data marts.
• Helps divide the work a bit.
• It is much easier to implement security on these granular pieces.
• Granular, re-startable processes enable pin-point failure correction.
• It is designed and optimized for real-time loading in its core
architecture (without any tweaks or mods).
31. Conclusion?
Changing the direction of the river takes
less effort than stopping the flow of water
32. The Experts Say…
“The Data Vault is the optimal choice for
modeling the EDW in the DW 2.0
framework.” Bill Inmon
“The Data Vault is foundationally
strong and exceptionally scalable
architecture.” Stephen Brobst
“The Data Vault is a technique which some industry
experts have predicted may spark a revolution as the
next big thing in data modeling for enterprise
warehousing....” Doug Laney
33. More Notables…
“This enables organizations to take control of their
data warehousing destiny, supporting better and
more relevant data warehouses in less time than
before.” Howard Dresner
“[The Data Vault] captures a practical body of
knowledge for data warehouse development which
both agile and traditional practitioners will benefit
from.” Scott Ambler
35. Growing Adoption…
• The number of Data Vault users in the US
surpassed 500 in 2010 and grows rapidly
(http://danlinstedt.com/about/dv-customers/)
36. In Review…
• Data Vault provides you with the tools you need to
succeed in your DW/BI projects
• Flexibility
• Enabling rapid change on a massive scale without
downstream impacts!
• Scalability
• Providing no foreseeable barrier to increased size and
scope
• Productivity
• Enabling low complexity systems with high value output at
a rapid pace
38. Where To Learn More
The Technical Modeling Book: http://LearnDataVault.com
On YouTube: http://www.youtube.com/LearnDataVault
On Facebook: www.facebook.com/learndatavault
Dan’s Blog: www.danlinstedt.com
The Discussion Forums: http://LinkedIn.com – Data Vault Discussions
Worldwide User Group (Free): http://dvusergroup.com
The Business of Data Vault Modeling
by Dan Linstedt, Kent Graziano, Hans Hultgren
(available at www.lulu.com )
40. Contact Information
Kent Graziano
Kent.graziano@att.net
Want more Data Vault?
Session # 05923: Introduction to Data Vault Modeling
Thursday, 4:00 PM, Moscone South Rm 303