10/6/2011 LearnDataVault.com 1
Data Vault Modeling Methodology: A Primer… © Dan Linstedt 2009-2012 All Rights Reserved http://LearnDataVault.com
A bit about me…
Author, Inventor, Speaker – and part time photographer…
25+ years in the IT industry
Worked in DoD, US Gov’t, Fortune 50, and so on…
Find out more about the Data Vault: http://www.youtube.com/LearnDataVault http://LearnDataVault.com
Full profile on http://www.LinkedIn.com/dlinstedt
What IS a Data Vault? (Business Definition)
Data Vault Model:
Detail oriented
Historical traceability
Uniquely linked set of normalized tables
Supports one or more functional areas of business
CMMI Level 5 Project Plan
Risk, Governance, Versioning
Peer Reviews, Release Cycles
Repeatable, Consistent, Optimized
Complete with Best Practices for BI/DW
Business Keys Span / Cross Lines of Business
Functional Areas: Sales, Contracts, Planning, Delivery, Finance, Operations, Procurement
What Does One Look Like?
[Diagram: Customer, Product, and Order Hubs, each with Satellites (Sat), joined by a Link that records a history of the interaction]
Elements:
Hub
Link
Satellite
Hub = List of Unique Business Keys
Link = List of Relationships, Associations
Satellites = Descriptive Data
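
The three elements above can be sketched as plain Python classes. This is a minimal illustration rather than the author's implementation: the class layout, method names, and sample keys (`C-100`, `P-7`) are assumptions made for the example, and Satellites are attached only to Hubs here to keep it short.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Hub:
    """A list of unique business keys (for example, customer numbers)."""
    name: str
    keys: set = field(default_factory=set)

    def add(self, business_key: str) -> bool:
        """Register a key; return True only if it was not already present."""
        if business_key in self.keys:
            return False
        self.keys.add(business_key)
        return True


@dataclass
class Link:
    """A list of relationships between business keys held in Hubs."""
    name: str
    hubs: tuple
    rows: set = field(default_factory=set)

    def relate(self, *business_keys: str) -> None:
        # Expect one key per participating Hub, each already known to that Hub.
        if len(business_keys) != len(self.hubs):
            raise ValueError("expected one business key per hub")
        for hub, key in zip(self.hubs, business_keys):
            if key not in hub.keys:
                raise KeyError(f"{key!r} is not in hub {hub.name!r}")
        self.rows.add(business_keys)


@dataclass
class Satellite:
    """Descriptive data about one Hub key, kept as an append-only history."""
    name: str
    parent: Hub
    rows: list = field(default_factory=list)

    def record(self, business_key: str, loaded_at: datetime, **attrs) -> None:
        if business_key not in self.parent.keys:
            raise KeyError(f"{business_key!r} is not in hub {self.parent.name!r}")
        self.rows.append((business_key, loaded_at, attrs))


# Hypothetical sample data.
customer = Hub("customer")
product = Hub("product")
customer.add("C-100")
product.add("P-7")

order = Link("order", (customer, product))
order.relate("C-100", "P-7")

customer_details = Satellite("customer_details", customer)
customer_details.record("C-100", datetime(2011, 10, 6), name="Acme", region="West")
```

Keeping keys, relationships, and descriptive attributes in separate structures is what lets each one grow without touching the others.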
Who’s Using It?
The PAIN!! Issues in Current EDW Projects
EDW Architecture: Generation 1
[Diagram labels: Enterprise BI Solution; sources Sales, Finance, Contracts; (batch) Staging; Complex Business Rules; (EDW) Star Schemas; Conformed Dimensions, Junk Tables, Helper Tables, Factless Facts, Staging + History; Complex Business Rules + Dependencies]
Kick-Starting Data Warehousing
1. HR asks IT to build the FIRST Data Warehouse / Prototype
2. IT says… OK: $125k and 90 days…
3. HR says: Great! Get started
Everyone’s Happy!
4. IT delivers. On time & in budget!
5. HR says: Thank you! We’re happy!
[Diagram: First Star! Fact tables Fact_ABC, Fact_DEF, Fact_PDQ, Fact_MYFACT surrounded by Customer dimensions with columns Customer_ID, Customer_Name, Customer_Addr, Customer_Addr1, Customer_City, Customer_State, Customer_Zip, Customer_Phone, Customer_Tag, Customer_Score, Customer_Region, Customer_Stats, Customer_Type]
So Where’s the PAIN?
The PAIN is RIGHT HERE!!
1. Contracts sees success, wants the same for their systems.
2. IT says… OK, but… it won’t be $125k and 90 days… Because we have to “merge it” with HR, it will be $250k and 180 days.
3. Contracts says: Ouch! That’s not reasonable, but we need it, so go ahead…
And HERE….
Finance, Sales, and Marketing want in….
IT says… OK, but… it won’t be $250k and 180 days… Because we have to “merge it” with HR and Contracts, it will be $350k and 250 days.
And this continues….
Business says… “Can’t you just make a copy of the Star Schema, and give me my own for cheaper & less time?”
Silo Building / IT Non-Agility
[Diagram: the First Star alongside separately built departmental silos]
SALES: We built our own because IT costs too much
FINANCE: We built our own because IT took too long
MARKETING: We built our own because we need customized dimension data
Why is this happening? What’s causing this problem?
Root Cause of Pain: Re-Engineering!
IT is forced to re-engineer ETL loading code + SQL BI queries WHENEVER:
New systems are introduced
1. Adding fields to Dimensions
(causing ETL loading to change, and forcing engineers to RELOAD existing data)
2. Adding fields to Facts
3. Adding Dimensions to Facts
[Diagram: the First Star schema, with Customer dimensions around Fact_ABC, Fact_DEF, Fact_PDQ, Fact_MYFACT]
Why Re-Engineering?
Adding fields to a conformed dimension….
Adding fields to a shared fact….
Changing code to match new business rules…
Require adding/changing fields in target tables!
Require Re-Engineering!
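
For contrast with the re-engineering described above, here is a rough sketch of how the Hub and Satellite layout introduced earlier might absorb a new field: the new attribute goes into a new Satellite table, and the existing tables are neither altered nor reloaded. The table names, columns, and sample values are hypothetical, and SQLite stands in for a real warehouse platform purely for convenience.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Existing model: one Hub and one Satellite (hypothetical names and data).
conn.executescript("""
    CREATE TABLE hub_customer (customer_key TEXT PRIMARY KEY);
    CREATE TABLE sat_customer_address (
        customer_key TEXT REFERENCES hub_customer(customer_key),
        load_date    TEXT,
        city         TEXT,
        PRIMARY KEY (customer_key, load_date)
    );
    INSERT INTO hub_customer VALUES ('C-100');
    INSERT INTO sat_customer_address VALUES ('C-100', '2011-10-06', 'Denver');
""")


def table_sql(name):
    """Return the stored CREATE statement for a table."""
    row = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?",
        (name,),
    ).fetchone()
    return row[0]


before = table_sql("sat_customer_address")

# A new source system delivers a customer score: add a NEW Satellite
# instead of adding a column to an existing table.
conn.executescript("""
    CREATE TABLE sat_customer_score (
        customer_key TEXT REFERENCES hub_customer(customer_key),
        load_date    TEXT,
        score        INTEGER,
        PRIMARY KEY (customer_key, load_date)
    );
    INSERT INTO sat_customer_score VALUES ('C-100', '2011-10-07', 42);
""")

after = table_sql("sat_customer_address")

# Both Satellites hang off the same Hub key, so a query can combine them.
combined = conn.execute("""
    SELECT h.customer_key, a.city, s.score
    FROM hub_customer h
    JOIN sat_customer_address a ON a.customer_key = h.customer_key
    JOIN sat_customer_score   s ON s.customer_key = h.customer_key
""").fetchall()
```

The existing Satellite's definition and rows are the same before and after the change, which is the property the re-engineering slides say a shared dimension lacks.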
Other Pains?
Dimension-Itis?
IT – Non-Agility?
Deformed Dimensions?
What about the “data” you don’t see?
What about the “BAD” data left in the source systems?
The Solution: Go the Data Vault Route!
EDW Architecture: Generation 2
[Diagram labels: Enterprise BI Solution; SOA; sources Sales, Finance, Contracts; (real-time) and (batch) feeds; Staging; DV EDW; Star Schemas; Error Marts; Report Collections]
Business Rules Downstream! (the Lens Filter)
Unstructured Data And Data Vault
[Diagram labels: Unstructured Data Sets (Docs, Images, Movies, Sound); Ontologies/Taxonomies; Unstructured Processing Engine; On-Demand Cubes; Data Vault EDW]
Joins through LINK Structures
IT Agility
[Diagram labels: Source; Staging; Data Vault (EDW); RAW “what-is” Star Schemas; ETL-T; Complex Business Rules; Business Driven Star Schemas]
1. Fast Load & Fast Integration
2. Business Gap Analysis
Business Requirements
Start new phase
3. IT Implementation of Business Rules
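
The split between a fast raw load and business rules applied downstream can be sketched roughly as follows. This is an illustration under stated assumptions, not the deck's method: the list `raw_vault`, the functions `load_raw` and `build_mart`, the two rules, and the sample records are all hypothetical.

```python
raw_vault = []  # stand-in for the Data Vault tables


def load_raw(source, record):
    """Step 1: store the source record unchanged, tagged with its origin."""
    raw_vault.append({"source": source, **record})


def build_mart(rule):
    """Step 3: apply a business rule downstream, at mart-build time."""
    return [out for row in raw_vault if (out := rule(row)) is not None]


def parse_amount(text):
    """Return the amount as a float, or None if it cannot be parsed."""
    try:
        return float(text)
    except ValueError:
        return None


def rule_v1(row):
    # First version of the rule: treat unparseable amounts as zero.
    amount = parse_amount(row["amount"])
    return {"customer": row["customer"], "amount": 0.0 if amount is None else amount}


def rule_v2(row):
    # Revised rule: leave unparseable amounts out of the mart entirely.
    amount = parse_amount(row["amount"])
    if amount is None:
        return None
    return {"customer": row["customer"], "amount": amount}


# Hypothetical source records, including one "bad" value.
load_raw("sales", {"customer": "C-100", "amount": "125.50"})
load_raw("finance", {"customer": "C-100", "amount": "n/a"})

mart_v1 = build_mart(rule_v1)
mart_v2 = build_mart(rule_v2)
```

Because the raw rows are never modified, switching from `rule_v1` to `rule_v2` only means rebuilding the mart; nothing has to be reloaded from the source systems.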
What are the Facts, Jack?
Generation 1 EDWs tried to provide “One version of the truth”
Generation 2 (Data Vaults) provide… “One version of the facts, for each point in time.”
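
“One version of the facts, for each point in time” can be illustrated with a lookup over an append-only Satellite history. The sketch below assumes the history for one business key is kept sorted by load date; the `history` rows and the function name `as_of` are hypothetical.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical Satellite history for one customer, sorted by load date.
history = [
    (date(2011, 1, 1), {"region": "West"}),
    (date(2011, 6, 1), {"region": "East"}),
    (date(2011, 9, 15), {"region": "North"}),
]


def as_of(rows, when):
    """Return the attributes in effect on `when`, or None if nothing was loaded yet."""
    load_dates = [loaded for loaded, _ in rows]
    # Position of the first row loaded strictly after `when`.
    pos = bisect_right(load_dates, when)
    return rows[pos - 1][1] if pos else None
```

Since earlier rows are never overwritten, a question asked about July and the same question asked about October can each be answered from the same table.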
Business Gap Analysis
The way the business perceives its business to be running
Gap Analysis: Operational Reports
Gap Analysis: Dynamic Cubes (Data Marts)
The way the source systems see the business running.
Secured/Protected Information Systems
[Diagram: a Non-Classified DV and a Classified Data Vault, each built from Hubs, Links, and Sats, connected by a Data Copy and a Model Copy; Yellow = New Tables]
Classified world can add all their own structures while maintaining congruence with standard unclassified Data Vault
Where’s the Solution?
Re-Engineering: Handle Changes Wherever… Whenever… with EASE!
The Three Vehicles… Pros and Cons of the Modeling Methodologies
3rd Normal Form Pros/Cons as an EDW
PROS (as 3NF)
Many to many linkages
Handle lots of information
Tightly integrated information
Highly structured
Conducive to near-real time loads
Relatively easy to extend
CONS (as EDW)
Time driven PK issues
Parent-child complexities
Cascading change impacts
Difficult to load
Not conducive to BI tools
Not conducive to drill-down
Difficult to architect for an enterprise
Not conducive to spiral/scope controlled implementation
Physical design usually doesn’t follow business processes
Star Schema Pros/Cons as an EDW
PROS (as Data Mart)
Good for multi-dimensional analysis
Subject oriented answers
Excellent for aggregation points
Rapid development / deployment
Great for some historical storage
CONS (as EDW)
Not cross-business functional
Use of junk / helper tables
Trouble with VLDW
Unable to provide integrated enterprise information
Can’t handle ODS or exploration warehouse requirements
Trouble with data explosion in near-real-time environments
Trouble with updates to type 2 dimension primary keys
Trouble with late arriving data in dimensions to support real-time arriving transactions
Not granular enough information to support real-time data integration
Data Vault Pros/Cons as an EDW
PROS (as EDW)
Supports near-real time and batch feeds
Supports functional business linking
Extensible / flexible
Provides rapid build / delivery of star schemas
Supports VLDB / VLDW
Designed for EDW
Supports data mining and AI
Provides granular detail
Incrementally built
CONS (as EDW)
Not conducive to OLAP processing
Requires business analysis to be firm
Introduces many join operations
The Three Vehicles…
Which would you use to win a race?
Which would you use to move a house?
Would you adapt the truck and enter a race with Porsches and expect to win?
#1 complaint about DV architecture: So you want to deal with Joins, do you?
Joins, Everywhere!
Yes, the DV is full of joins, but…
These are highly normalized tables (thin & narrow), reducing the I/Os needed to read large numbers of rows, at high speed, in parallel.
Joins occur in RAM instead of on disk.
The Optimizer is given a chance to “drop tables” from the join that aren’t necessary.
When Parallelism is too much…
Not enough rows being queried (the overhead of starting the threads takes longer than an original scan).
End Result? The DV scales to the petabyte levels when necessary…
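
The point about thread start-up overhead can be made concrete with a deliberately simple cost model. It assumes a fixed start-up cost and perfectly linear speedup, and it ignores contention and skew; the scan rate, thread count, and start-up time used below are made-up figures, not measurements from the deck.

```python
def sequential_seconds(rows, rows_per_second):
    """Time for a single-threaded scan."""
    return rows / rows_per_second


def parallel_seconds(rows, rows_per_second, threads, startup_seconds):
    """Time for a parallel scan: fixed start-up cost plus an evenly split scan."""
    return startup_seconds + rows / (rows_per_second * threads)


def break_even_rows(rows_per_second, threads, startup_seconds):
    """Row count at which the parallel and sequential times are equal."""
    if threads < 2:
        raise ValueError("parallel execution needs at least two threads")
    # rows / r = s + rows / (r * t)  =>  rows = s * r * t / (t - 1)
    return startup_seconds * rows_per_second * threads / (threads - 1)


# Hypothetical figures: 1M rows/s per thread, 4 threads, 0.5 s start-up.
RATE, THREADS, STARTUP = 1_000_000, 4, 0.5

small = 100_000      # below break-even: the sequential scan wins
large = 10_000_000   # above break-even: the parallel scan wins

small_seq = sequential_seconds(small, RATE)
small_par = parallel_seconds(small, RATE, THREADS, STARTUP)
large_seq = sequential_seconds(large, RATE)
large_par = parallel_seconds(large, RATE, THREADS, STARTUP)
threshold = break_even_rows(RATE, THREADS, STARTUP)
```

Under these assumed figures the crossover sits at roughly 670,000 rows; a query touching fewer rows than that finishes sooner without parallelism.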
Mathematics Behind the Data Vault Model
*** The Data Vault is BACKED by Mathematical Principles ***
Parallel versus sequential execution models
Set Logic
I/O Bandwidth & Throughput
Compression (for query performance gains)
Process Repeatability (tuning & predictability measurements)
RAM versus electromagnetic disk (Solid-State Drives are not measured)
http://osl.cs.uiuc.edu/docs/IPDPS-TR04/TCA_TR04.pdf
Know when to hold ‘em, know when to fold ‘em: When to use DV, and when not…
The Challenger….
The challenger says:
I don’t have volume problems…
I don’t have compliance/auditability problems…
I don’t have real-time problems…
My system produces matching results across lines of business…
I’ve never had to “re-state” the data in the warehouse…
I can still build new marts, and conform dimensions in 30 days or less…
My business doesn’t acquire new systems often (if ever)

DevEX - reference for building teams, processes, and platforms
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 

IRM UK - 2009: DV Modeling And Methodology

  • 2. Data Vault Modeling Methodology: A Primer… © Dan Linstedt 2009-2012 All Rights Reserved http://LearnDataVault.com
  • 3. A bit about me… Author, Inventor, Speaker, and part-time photographer. 25+ years in the IT industry. Worked in DoD, US Gov’t, Fortune 50, and so on. Find out more about the Data Vault: http://www.youtube.com/LearnDataVault , http://LearnDataVault.com Full profile: http://www.LinkedIn.com/dlinstedt
  • 5. CMMI Level 5 Project Plan
  • 9. Complete with Best Practices for BI/DW. Business Keys Span / Cross Lines of Business. Functional Area: Sales, Contracts, Planning, Delivery, Finance, Operations, Procurement.
  • 11. Link
  • 12. Satellite. Hub = List of Unique Business Keys. Link = List of Relationships, Associations. Satellites = Descriptive Data.
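The three element types defined above can be sketched as tables. This is a minimal illustration using Python's built-in `sqlite3` module, not the author's reference design: the table names, the column names, and the `load_dts` / `record_source` columns are all assumptions added for the example.

```python
import sqlite3

# Hypothetical names throughout; the slides only define the three element types.
DDL = """
CREATE TABLE hub_customer (            -- Hub: list of unique business keys
    customer_sk   INTEGER PRIMARY KEY, -- generated key (assumed)
    customer_bk   TEXT NOT NULL UNIQUE,
    load_dts      TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE hub_product (
    product_sk    INTEGER PRIMARY KEY,
    product_bk    TEXT NOT NULL UNIQUE,
    load_dts      TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE link_customer_product (   -- Link: a relationship between hubs
    link_sk       INTEGER PRIMARY KEY,
    customer_sk   INTEGER NOT NULL REFERENCES hub_customer(customer_sk),
    product_sk    INTEGER NOT NULL REFERENCES hub_product(product_sk),
    load_dts      TEXT NOT NULL,
    record_source TEXT NOT NULL,
    UNIQUE (customer_sk, product_sk)
);
CREATE TABLE sat_customer (            -- Satellite: descriptive data over time
    customer_sk   INTEGER NOT NULL REFERENCES hub_customer(customer_sk),
    load_dts      TEXT NOT NULL,
    name          TEXT,
    city          TEXT,
    record_source TEXT NOT NULL,
    PRIMARY KEY (customer_sk, load_dts)
);
"""


def build_vault() -> sqlite3.Connection:
    """Create the sketch schema in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    return conn


def table_names(conn: sqlite3.Connection) -> list:
    """Return the names of all tables, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [r[0] for r in rows]


vault = build_vault()
print(table_names(vault))
```

The `UNIQUE` constraint on `customer_bk` is what makes the hub a list of unique business keys, and the satellite's composite key of hub key plus load date lets descriptive data accumulate as history rather than being overwritten.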
  • 13. Who’s Using It?
  • 14. The PAIN!! Issues in Current EDW Projects
  • 15. EDW Architecture: Generation 1. [Diagram labels: Enterprise BI Solution (batch); Sales; Staging; (EDW) Star Schemas; Complex Business Rules; Finance; Conformed Dimensions; Junk Tables; Helper Tables; Factless Facts; Staging + History; Contracts; Complex Business Rules + Dependencies]
  • 16. Kick-Starting Data Warehousing. 1. HR asks IT to build the FIRST Data Warehouse / Prototype. 2. IT says: OK, $125k and 90 days… 3. HR says: Great! Get started.
  • 17. Everyone’s Happy! 4. IT delivers, on time and in budget! 5. HR says: Thank you! We’re happy! [Diagram: the first star, with fact tables Fact_ABC, Fact_DEF, Fact_PDQ, Fact_MYFACT and customer dimension tables with columns Customer_ID, Customer_Name, Customer_Addr, Customer_Addr1, Customer_City, Customer_State, Customer_Zip, Customer_Phone, Customer_Tag, Customer_Score, Customer_Region, Customer_Stats, Customer_Type]
  • 18. So Where’s the PAIN?
  • 19. The PAIN is RIGHT HERE!! 1. Contracts sees success, wants the same for their systems. 2. IT says: OK, but it won’t be $125k and 90 days. Because we have to “merge it” with HR, it will be $250k and 180 days. 3. Contracts says: Ouch! That’s not reasonable, but we need it, so go ahead…
  • 20. And HERE…. Finance, Sales, and Marketing want in. IT says: OK, but it won’t be $250k and 90 days. Because we have to “merge it” with HR and Contracts, it will be $350k and 250 days. And this continues… Business says: “Can’t you just make a copy of the Star Schema, and give me my own for cheaper and in less time?”
  • 21. Silo Building / IT Non-Agility. First Star. SALES: We built our own because IT costs too much. FINANCE: We built our own because IT took too long. MARKETING: We built our own because we need customized dimension data. Why is this happening? What’s causing this problem?
  • 24. (causing ETL loading to change, and forcing engineers to RELOAD existing data) 2. Adding fields to Facts. 3. Adding Dimensions to Facts. [Diagram: a star schema with fact tables Fact_ABC, Fact_DEF, Fact_PDQ, Fact_MYFACT surrounded by customer dimension tables with columns Customer_ID through Customer_Type]
  • 25. Why Re-Engineering? Adding fields to a conformed dimension… Adding fields to a shared fact… Changing code to match new business rules… All require adding/changing fields in target tables! All require re-engineering!
  • 26. Other Pains? Dimension-Itis? IT Non-Agility? Deformed Dimensions? What about the “data” you don’t see? What about the “BAD” data left in the source systems?
  • 27. The Solution: Go the Data Vault Route!
  • 28. EDW Architecture: Generation 2. [Diagram labels: SOA; Enterprise BI Solution; Star Schemas; (real-time); Sales; (batch); DV EDW; (batch); Staging; Error Marts; Finance; Contracts; Report Collections; Business Rules Downstream! (the Lens Filter)]
  • 30. Docs
  • 33. Sound. On-Demand Cubes. Joins through LINK Structures. Data Vault EDW.
  • 36. Start new phase. 1. Fast Load & Fast Integration 3. IT Implementation of Business Rules
  • 37. What are the Facts, Jack? Generation 1 EDWs tried to provide “One version of the truth.” Generation 2 (Data Vaults) provide “One version of the facts, for each point in time.”
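"One version of the facts, for each point in time" can be illustrated with a small as-of lookup. This is a sketch under assumptions: the satellite is modeled as an insert-only list of rows, and the row layout, the sample data, and the `as_of` helper are hypothetical names invented for the example.

```python
from datetime import date

# Hypothetical satellite rows: (hub key, load date, descriptive attributes).
# Rows are only ever appended, so earlier facts stay queryable.
SAT_CUSTOMER = [
    (1, date(2011, 1, 1), {"city": "Denver"}),
    (1, date(2011, 6, 1), {"city": "Boston"}),
    (2, date(2011, 3, 15), {"city": "Austin"}),
]


def as_of(rows, hub_key, when):
    """Return the attributes recorded for hub_key as of `when`, or None."""
    candidates = [r for r in rows if r[0] == hub_key and r[1] <= when]
    if not candidates:
        return None
    # The latest row loaded on or before `when` is the fact for that date.
    return max(candidates, key=lambda r: r[1])[2]


print(as_of(SAT_CUSTOMER, 1, date(2011, 3, 1)))    # {'city': 'Denver'}
print(as_of(SAT_CUSTOMER, 1, date(2011, 12, 31)))  # {'city': 'Boston'}
```

Because nothing is updated in place, the same question asked for two different dates returns two different but equally valid answers.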
  • 38. Business Gap Analysis. The way the business perceives its business to be running. Gap Analysis. Operational Reports. Gap Analysis. Dynamic Cubes (Data Marts). The way the source systems see the business running.
  • 41. Where’s the Solution? Re-Engineering. Handle changes wherever… whenever… with EASE!
  • 42. The Three Vehicles… Pros and Cons of the Modeling Methodologies
  • 43. 3rd Normal Form Pros/Cons as an EDW. PROS (as 3NF): many-to-many linkages; handles lots of information; tightly integrated information; highly structured; conducive to near-real-time loads; relatively easy to extend. CONS (as EDW): time-driven PK issues; parent-child complexities; cascading change impacts; difficult to load; not conducive to BI tools; not conducive to drill-down; difficult to architect for an enterprise; not conducive to spiral/scope-controlled implementation; physical design usually doesn’t follow business processes.
  • 44. Star Schema Pros/Cons as an EDW. PROS (as Data Mart): good for multi-dimensional analysis; subject-oriented answers; excellent for aggregation points; rapid development / deployment; great for some historical storage. CONS (as EDW): not cross-business functional; use of junk / helper tables; trouble with VLDW; unable to provide integrated enterprise information; can’t handle ODS or exploration warehouse requirements; trouble with data explosion in near-real-time environments; trouble with updates to type 2 dimension primary keys; trouble with late-arriving data in dimensions to support real-time arriving transactions; information not granular enough to support real-time data integration.
  • 45. Data Vault Pros/Cons as an EDW. PROS (as EDW): supports near-real-time and batch feeds; supports functional business linking; extensible / flexible; provides rapid build / delivery of star schemas; supports VLDB / VLDW; designed for EDW; supports data mining and AI; provides granular detail; incrementally built. CONS (as EDW): not conducive to OLAP processing; requires business analysis to be firm; introduces many join operations.
  • 46. The Three Vehicles… Which would you use to win a race? Which would you use to move a house? Would you adapt the truck and enter a race with Porsches and expect to win?
  • 47. #1 complaint about DV architecture: So you want to deal with joins, do you?
  • 49. Not enough rows being queried (the overhead of starting the threads takes longer than an original scan). End result? The DV scales to the petabyte levels when necessary…
  • 50. Mathematics Behind the Data Vault Model. *** The Data Vault is BACKED by Mathematical Principles *** Parallel versus sequential execution models; set logic; I/O bandwidth & throughput; compression (for query performance gains); process repeatability (tuning & predictability measurements); RAM versus electromagnetic disk (solid-state drives are not measured). http://osl.cs.uiuc.edu/docs/IPDPS-TR04/TCA_TR04.pdf
  • 51. Know when to hold ‘em, know when to fold ‘em: when to use DV, and when not…
  • 53. I don’t have volume problems…
  • 54. I don’t have compliance/auditability problems…
  • 55. I don’t have real-time problems…
  • 56. My system produces matching results across lines of business…
  • 57. I’ve never had to “re-state” the data in the warehouse…
  • 58. I can still build new marts, and conform dimensions in 30 days or less…
  • 59. My business doesn’t acquire new systems often (if ever)
  • 60. My incoming data sets don’t change. I say… That’s wonderful, don’t fix what isn’t broken. Have a nice day. Oh, but call me when or if you ever run into these problems…
  • 65. IT and Business Accountability
  • 71. Step 1: Identify your business processes, followed by your business keys (the keys used to identify the data that flows through the business processes). NOTE: Along the way, document your assumptions, document your reasons for choosing keys and modeling designs, and develop a list of questions to be answered by business users.
  • 72. Step 2: Identify the issues/problems that might be carried with the identified business keys, annotate the risks, and mitigate each one.
  • 73. Step 3: Identify the units of work, the associations (LINK tables), where keys combine to form a notion, a concept, and a relationship.
  • 74. Step 4: Identify the descriptive data that belongs to SINGLE Hub keys; ensure that the data doesn’t represent or rely on a relationship.
  • 75. Step 5: Identify the Satellite data that depends on relationships and move it to the appropriate LINK table. HINT: If you “want” to put a Foreign Key in a Satellite, that is a clear sign that the Satellite is in the WRONG place and needs to be assigned to a LINK table rather than a HUB.
  • 76. Step 6: Scope the model down to a manageable chunk. Implement the first two Hubs, Hub Satellites, and first Link. BUILD IN INCREMENTS!
  • 77. Step 7: Set up the key generation load routines, set up the staging area, and begin loading data.
  • 78. Step 8: Review any “truncation” errors or data-type conversion problems, fix the staging area, and remove duplicates.
  • 79. Step 9: Begin loading the Data Vault. Load all Hubs, then all Hub Satellites, then all Links, and finish with all Link Satellites.
  • 80. Step 10: Reconcile the Data Vault to the source system, then build a first data mart from the results. Bring business value FAST!
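Steps 7 through 10 can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the author's load routines: staged rows are hypothetical `(customer_bk, customer_name, product_bk)` tuples, keys are generated from a simple counter, the satellite keeps no history, and Link Satellites are omitted for brevity.

```python
from itertools import count


def dedupe(staged_rows):
    """Step 8: drop exact duplicate staging rows, keeping first-seen order."""
    seen, unique = set(), []
    for row in staged_rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


def load_hub(hub, business_keys, key_gen):
    """Steps 7 and 9: give each new business key the next generated key."""
    for bk in business_keys:
        if bk not in hub:
            hub[bk] = next(key_gen)
    return hub


def load_vault(staged_rows):
    """Step 9 order: Hubs first, then Hub Satellites, then Links."""
    rows = dedupe(staged_rows)
    hub_customer = load_hub({}, (r[0] for r in rows), count(1))
    hub_product = load_hub({}, (r[2] for r in rows), count(1))
    # Simplified satellite: one descriptive record per hub key, no history.
    sat_customer = {hub_customer[r[0]]: {"name": r[1]} for r in rows}
    link = {(hub_customer[r[0]], hub_product[r[2]]) for r in rows}
    return hub_customer, hub_product, sat_customer, link


def reconcile(staged_rows, hub_customer, hub_product, link):
    """Step 10: distinct source keys and pairs must match the vault counts."""
    return (
        len({r[0] for r in staged_rows}) == len(hub_customer)
        and len({r[2] for r in staged_rows}) == len(hub_product)
        and len({(r[0], r[2]) for r in staged_rows}) == len(link)
    )


STAGED = [
    ("C-1", "Acme", "P-9"),
    ("C-1", "Acme", "P-9"),  # exact duplicate, removed in Step 8
    ("C-2", "Bolt", "P-9"),
]
hub_customer, hub_product, sat_customer, link = load_vault(STAGED)
print(hub_customer)  # {'C-1': 1, 'C-2': 2}
print(reconcile(STAGED, hub_customer, hub_product, link))  # True
```

Hubs load first because the satellites and links need the generated hub keys; the link load would fail with a `KeyError` if a hub key were missing, which mirrors the ordering in Step 9.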
  • 81. Instructor-led lab
  • 82. 10 minutes to find the Hubs…
  • 83. Possible Hubs From Northwind
  • 84. 10 minutes to find the Links…
  • 85. Possible Links From Northwind
  • 86. 10 minutes to find the Satellites…
  • 87. Possible Satellites From Northwind
  • 88. What did we learn? We often deal with more than one system at a time; this was a lab with only one model. We didn’t have any business requirements telling us which questions we might need to answer, but doesn’t that reflect real life? The data set is extremely dirty (you never have that in your systems, right?). Time-zone-based data can be a problem. Lack of metadata causes integration issues and affects modeling decisions.
  • 89. The Experts Say… “The Data Vault is the optimal choice for modeling the EDW in the DW 2.0 framework.” (Bill Inmon) “The Data Vault is foundationally strong and exceptionally scalable architecture.” (Stephen Brobst) “The Data Vault is a technique which some industry experts have predicted may spark a revolution as the next big thing in data modeling for enterprise warehousing…” (Doug Laney)
  • 90. More Notables… “This enables organizations to take control of their data warehousing destiny, supporting better and more relevant data warehouses in less time than before.” (Howard Dresner) “[The Data Vault] captures a practical body of knowledge for data warehouse development which both agile and traditional practitioners will benefit from.” (Scott Ambler)
  • 91. Where To Learn More. The Technical Modeling Book: http://LearnDataVault.com The Discussion Forums & events: http://LinkedIn.com (Data Vault Discussions). Contact me: http://DanLinstedt.com (web site), DanLinstedt@gmail.com (email). World wide User Group (Free): http://dvusergroup.com