This document discusses Data Vault fundamentals and best practices. It introduces Data Vault modeling, which involves modeling hubs, links, and satellites to create an enterprise data warehouse that can integrate data sources, provide traceability and history, and adapt incrementally. The document recommends using data virtualization rather than physical data marts to distribute data from the Data Vault. It also provides recommendations for further reading on Data Vault, Ensemble modeling, data virtualization, and certification programs.
Data Vault Fundamentals & Best Practices Overview
1. Data Vault Fundamentals &
Best Practices
Erik Fransen, managing consultant
+31 6 159 444 76
@erikfransen
2. Agenda
• Introduction
• Data Vault Basics
• Benefits & Challenges
• Best practices: Automation & Data
Virtualization
• Recommended reading
3. • Founded in 1998, The Hague, NL
• 40+ consultants
• Business Intelligence, Data Vault, Datawarehousing,
Datawarehouse Automation, Big Data, Data Virtualization
• Business & technical consultancy, end-to-end
implementation projects of Data Vault EDW, audits,
training, certification
• Wide range of customers (profit, non-profit) across various
industries
• Since 2009 Genesee Academy partner for Data Vault Day
and Data Vault Certification in NL, B & D
• Implementation partner of Cisco, MapR, Qlik & Tableau
4. The Data Vault modeling approach
Data Vault is a data modeling approach
…so it fits into the family of modeling approaches:
[Diagram: three modeling approaches side by side: 3rd Normal Form, Ensemble Modeling, Dimensional]
• While 3rd Normal Form is optimal for operational systems
…and Dimensional is optimal for data marts
…Ensemble Modeling is optimal for the data warehouse
• And Data Vault is the leading form of Ensemble Modeling
6. Why do we use Data Vault for DWH?
• When we need a DWH that supports:
– Integration
– Traceability
– History
– Incremental Build
– Agility
• Gracefully Adapts to New Sources
• Full Auditability - Source to Mart
• Enterprise View of Central Data
• Ready for Automation
Data Vault is specifically designed for modelling the EDW
7. The Data Vault Ensemble
• The Data Vault Ensemble conforms to a single key – embodied in the
Hub construct
• The parts for the Data Vault Ensemble only include:
– Hubs: the natural business keys
– Links: the natural business relationships
– Satellites: all context, descriptive data and history of links and hubs
“Separating things that change from things that don’t change”
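The split between the three parts can be illustrated with a small in-memory sketch. The `MiniVault` class, its method names and the sample keys are hypothetical and only serve the illustration; a real Data Vault would be implemented as database tables. Hub keys and links are stored once, while satellites collect a new row each time the descriptive context changes.

```python
from datetime import date


class MiniVault:
    """Hypothetical in-memory stand-in for hubs, links and satellites."""

    def __init__(self):
        self.hubs = {}        # hub name -> set of natural business keys
        self.links = set()    # (link name, tuple of business keys)
        self.satellites = []  # (parent name, parent key, load date, attributes)

    def add_hub_key(self, hub, key):
        # A business key is registered once; repeated loads change nothing.
        self.hubs.setdefault(hub, set()).add(key)

    def add_link(self, link, keys):
        # A relationship is a unique combination of business keys.
        self.links.add((link, tuple(keys)))

    def add_context(self, parent, key, attributes, load_date):
        # Only a change in context produces a new satellite row (history).
        if self.current(parent, key) != attributes:
            self.satellites.append((parent, key, load_date, dict(attributes)))

    def current(self, parent, key):
        """Return the most recently loaded attributes, or None."""
        rows = [r for r in self.satellites if r[0] == parent and r[1] == key]
        return rows[-1][3] if rows else None


vault = MiniVault()
vault.add_hub_key("H_Cust", "C-001")
vault.add_hub_key("H_Store", "S-010")
vault.add_link("L_Cust_Store", ["C-001", "S-010"])
vault.add_context("H_Cust", "C-001", {"city": "The Hague"}, date(2016, 1, 1))
vault.add_context("H_Cust", "C-001", {"city": "The Hague"}, date(2016, 1, 2))  # unchanged
vault.add_context("H_Cust", "C-001", {"city": "Amsterdam"}, date(2016, 1, 3))  # changed
```

After these loads the hub still holds a single customer key, while the satellite holds two rows: the original context and the changed one.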
8. The Data Vault modeling approach
• As the scope of the EDW is expanded and new data sources added, the
Data Vault can adapt to these changes without impacting the existing
model
• This is what allows the EDW to be built incrementally and to adapt to
change without the need for re-engineering.
[Diagram: hubs H_Cust, H_Sale, H_Empl, H_Store and H_Car, with a new area absorbed into the existing model]
Tools for DWH Automation update the Data Vault
EDW (model + data) in a fast, agile & consistent way
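A minimal sketch of absorbing a new area, using Python's built-in `sqlite3` module. The hub names H_Cust, H_Sale and H_Car come from the diagram; the link tables, column names and the `schema` helper are assumptions made for the illustration. The point is that the new area is added with new tables only, so the definitions of the existing tables stay untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Existing model: two hubs and one link (column layout is an assumption).
conn.executescript("""
CREATE TABLE H_Cust (cust_key TEXT PRIMARY KEY, load_ts TEXT, record_source TEXT);
CREATE TABLE H_Sale (sale_key TEXT PRIMARY KEY, load_ts TEXT, record_source TEXT);
CREATE TABLE L_Cust_Sale (
    cust_key TEXT REFERENCES H_Cust,
    sale_key TEXT REFERENCES H_Sale,
    load_ts TEXT,
    record_source TEXT,
    PRIMARY KEY (cust_key, sale_key)
);
""")


def schema(connection):
    """Return {table name: CREATE statement} for all tables."""
    rows = connection.execute(
        "SELECT name, sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return dict(rows.fetchall())


before = schema(conn)

# New area: a car hub and its link to sales. Only CREATE statements are needed.
conn.executescript("""
CREATE TABLE H_Car (car_key TEXT PRIMARY KEY, load_ts TEXT, record_source TEXT);
CREATE TABLE L_Sale_Car (
    sale_key TEXT REFERENCES H_Sale,
    car_key TEXT REFERENCES H_Car,
    load_ts TEXT,
    record_source TEXT,
    PRIMARY KEY (sale_key, car_key)
);
""")

after = schema(conn)
existing_unchanged = all(after[name] == sql for name, sql in before.items())
new_tables = sorted(set(after) - set(before))
```

Here `existing_unchanged` is true and `new_tables` lists only the added hub and link, which is the property that lets the EDW grow without re-engineering.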
9. Data Vault Benefits
• Business benefits
– Ability to adapt quickly to new business needs
– Data is traceable, allowing for a fully auditable, integrated data store
– Allows the EDW to absorb all data, all of the time
– Easily adapts to new data sources and changing business rules without expensive re-engineering
– Results in a data warehouse with lower total cost of ownership (TCO)
– Automation: short time to market, consistent quality
• Project/development benefits
– Ideal for agile development techniques, resulting in lower project risk and more frequent deliverables
– Can be built incrementally without compromising the core architecture
– Automation: fast and incremental sprints, predictable costs
• Architectural benefits
– Parallel loading
– Data architecture that supports future expanded scope
– Can scale to virtually any size
– Ready for automation: forces standardization
10. Data Vault Modeling Process
The Modeling Process for creating a Data Vault
model includes three primary steps:
1) Identify and Model the Core Business Concepts
• Business interviews are at the heart of this step:
What do you do? What are the main things you work with?
• Also find the best (target) natural business key
2) Identify and Model the Natural Business Relationships
• Specific Unique Relationships
3) Analyze and Design the Context Satellites
• Consider Rate of Change, Type of Data and also the Sources of
your data during design process
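Step 3 can be sketched as a simple grouping exercise. The attribute inventory, the "slow"/"fast" rate labels and the `S_<hub>_<source>_<rate>` naming pattern below are all hypothetical; the sketch only shows one possible way to let source and rate of change drive the satellite split, and it leaves the type-of-data criterion out.

```python
from collections import defaultdict

# Hypothetical attribute inventory for a customer hub:
# (attribute name, source system, rate of change)
attributes = [
    ("name", "CRM", "slow"),
    ("birth_date", "CRM", "slow"),
    ("segment", "CRM", "fast"),
    ("credit_limit", "ERP", "fast"),
    ("last_login", "Weblogs", "fast"),
]


def design_satellites(hub, attribute_inventory):
    """Group attributes into one satellite per (source, rate of change)."""
    groups = defaultdict(list)
    for name, source, rate in attribute_inventory:
        groups[f"S_{hub}_{source}_{rate}"].append(name)
    return dict(groups)


satellites = design_satellites("Cust", attributes)
for satellite_name, columns in satellites.items():
    print(satellite_name, columns)
```

Keeping fast-changing attributes apart from slow-changing ones limits how many new satellite rows a single change produces, and splitting by source keeps each satellite loadable from one system.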
Ideally the data vault is modelled based
on business processes and business
concepts
11. Getting data out of the Data Vault
• Problem:
– The Data Vault EDW is about data decomposition, data
registration and data integration
– Data Vault is not intended, designed or optimized for
data distribution and data consumption downstream of the
EDW
– This typically leads to many complex physical data marts
(high maintenance, high cost)
• Solution:
– Start thinking differently: focus on creating functional data
products for the business
– Stop loading and replicating data physically, start using
data virtualization
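The idea of a virtual data mart can be approximated with an ordinary database view. This is a deliberate simplification: a plain SQLite view stands in for a data virtualization tool, and the table names, columns and sample rows are assumptions made for the sketch. The view presents the latest satellite row per hub key as a dimension-like structure without copying any data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE H_Cust (cust_key TEXT PRIMARY KEY);
CREATE TABLE S_Cust (
    cust_key TEXT,
    load_ts TEXT,
    city TEXT,
    PRIMARY KEY (cust_key, load_ts)
);
INSERT INTO H_Cust VALUES ('C-001'), ('C-002');
INSERT INTO S_Cust VALUES
    ('C-001', '2016-01-01', 'The Hague'),
    ('C-001', '2016-02-01', 'Amsterdam'),
    ('C-002', '2016-01-15', 'Diegem');

-- Virtual "dimension": latest satellite row per hub key, no data copied.
CREATE VIEW Dim_Customer AS
SELECT h.cust_key, s.city, s.load_ts AS valid_from
FROM H_Cust h
JOIN S_Cust s ON s.cust_key = h.cust_key
WHERE s.load_ts = (
    SELECT MAX(load_ts) FROM S_Cust WHERE cust_key = h.cust_key
);
""")

query = "SELECT cust_key, city FROM Dim_Customer ORDER BY cust_key"
rows_before = conn.execute(query).fetchall()

# A new satellite row is visible through the view immediately, without a reload.
conn.execute("INSERT INTO S_Cust VALUES ('C-002', '2016-03-01', 'Utrecht')")
rows_after = conn.execute(query).fetchall()
```

Because the view is evaluated at query time, consumers see the new city for C-002 as soon as the satellite is loaded; there is no separate mart load to schedule or maintain.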
12. Eliminate the need for physical data marts
• No data replication needed
• Real-time data refreshment
• No redundant data storage
• Simple updates of data models
• Simple queries
• Short time to market
• Automatic updates
• Lower storage costs
• High performance
• Ready for Big Data
[Diagram: source systems (CRM, ERP, Weblogs, Production, …) are copied into the Data Vault EDW ("Data Copy"); a Data Virtualization Tool plus Data Abstraction Layers delivers steering information via SQL ("No Data Copy at all")]
14. Wrap up
• Data Vault Basics:
– Hubs, Links, Satellites
– Integration, history, incremental modelling, agility
• Benefits:
– Business, project, architecture
– Make use of automation tools for fast, agile and consistent
delivery
• Challenges:
– Getting data downstream of the Data Vault EDW
– Solution: use virtual data marts and automate SuperNova
data models for reporting & analytics
18. Recommended reading on Data Virtualization
Data Virtualization in Business Intelligence Architectures
• First independent book on data virtualization that
explains in a product-independent way how data
virtualization technology works.
• Illustrates concepts using examples developed with
commercially available products.
• Shows you how to solve common data integration
challenges such as data quality, system
interference, and overall performance by following
practical guidelines on using data virtualization.
• Apply data virtualization right away with three
chapters full of practical implementation guidance.
• Understand the big picture of data virtualization
and its relationship with data governance and
information management.
19. Data Vault Training & Certification
• CDVDM: March 31, April 1 2016 Amsterdam
• DVD: March 2, 2016 Diegem
• www.centennium-opleidingen.nl
• For all questions: opleidingen@centennium.nl
20. A short history on Data Vault
• 2002: First papers published by Dan Linstedt
• 2006: Start CDVDM certification program by Genesee
Academy
• 2007: Start of Data Vault EDW implementations
– Primarily in Europe (NL, S), some in USA
• 2008-2015: Several books published on Data Vault by Dan
Linstedt, Hans Hultgren and others
• 2013: Data Vault on the radar in B, DACH, UK, USA, AUS,
NZ, Asia
• 2013: Data Vault EDW implementations going worldwide
• 2015: Over 900 CDVDM professionals and 750+ Data Vault
EDWs worldwide