Introduction
Data Vault Model & Methodology
© Dan Linstedt, 2011-2012, all rights reserved
 Prepared for: DAMA Oregon, July 2012
Who’s Using It?




The Experts Say…
   “The Data Vault is the optimal choice
   for modeling the EDW in the DW 2.0
   framework.” Bill Inmon

   “The Data Vault is foundationally
   strong and exceptionally scalable
   architecture.”   Stephen Brobst



        “The Data Vault is a technique which some
        industry experts have predicted may spark a
        revolution as the next big thing in data modeling
        for enterprise warehousing....” Doug Laney


More Notables…

   “This enables organizations to take control of
   their data warehousing destiny, supporting
   better and more relevant data warehouses in
   less time than before.” Howard Dresner



  “[The Data Vault] captures a practical body of
  knowledge for data warehouse development
  which both agile and traditional practitioners
  will benefit from..” Scott Ambler




Agenda
•   Introduce Yourselves…
•   What is a Data Vault? Where does it come from?
•   Pros & Cons of Data Modeling for EDW
•   Current EDW Issues & Pains
•   Consequences of Implementing the Pains…
•   How do we “Fix” This?
•   Keys to Success
•   When “NOT” to use a Data Vault
•   Ontologies and Data Vault
•   A Working Example
•   Query Performance (PIT & Bridge)
•   Conclusion (break)
•   Live Demo

Introduce Yourselves
•   Your Expectations?
•   Your Questions?
•   Your Background?
•   Areas of Interest?
•   What are the top 3 pains your
    EDW/BI solution is experiencing?

• About Me…
    o http://www.LinkedIn.com/dlinstedt


• Learn More Data Vault on-line at:
    o http://LearnDataVault.com/training




Where did it come from?
      What is it?
     Defining the Data Vault Space




Data Warehousing Time Line




             The Data Vault Model & Methodology
             took 10 years of R&D to become
             consistent, flexible, and scalable.

What IS a Data Vault?                           (Business Definition)


• Data Vault Model
   o Detail oriented
   o Historical traceability
   o Uniquely linked set of normalized tables
   o Supports one or more functional areas of business

• Data Vault Methodology
   – CMMI, Project Plan
   – Risk, Governance, Versioning
   – Peer Reviews, Release Cycles
   – Repeatable, Consistent, Optimized
   – Complete with Best Practices for BI/DW

• Data Vault Architecture
   – 3 Tier Architecture (for including Batch & Unstructured Data)
   – 2 Tier Architecture (for Real-Time only)
The Data Vault Model
[Diagram: three Hubs (Customer, Product, Order), each surrounded by its Satellites (Sat). A central Link, with its own Satellite, connects the Hubs and records a history of the interaction. The connections are labeled F(x).]

Elements:
• Hub
• Link
• Satellite

Hub = List of Unique Business Keys
Link = List of Relationships, Associations
Satellites = Descriptive Data
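The three element types above can be sketched as plain Python records. This is a minimal in-memory illustration, not the author's implementation: the `load_date` and `record_source` fields, the helper names `describe` and `latest`, and all sample keys and values are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Hub:
    """One row per unique business key (e.g. a customer number)."""
    business_key: str
    load_date: datetime      # assumed audit column
    record_source: str       # assumed audit column


@dataclass(frozen=True)
class Link:
    """One row per relationship between two or more hub keys."""
    hub_keys: tuple
    load_date: datetime
    record_source: str


@dataclass(frozen=True)
class Satellite:
    """Descriptive data for a hub or link, versioned by load date."""
    parent_key: object       # a hub business key or a link's hub_keys
    load_date: datetime
    record_source: str
    attributes: tuple        # sorted (name, value) pairs


def describe(parent_key, load_date, record_source, **attrs):
    """Build a satellite row from keyword attributes (hypothetical helper)."""
    return Satellite(parent_key, load_date, record_source,
                     tuple(sorted(attrs.items())))


def latest(satellites, parent_key):
    """Return the most recent satellite row for a parent, or None."""
    rows = [s for s in satellites if s.parent_key == parent_key]
    return max(rows, key=lambda s: s.load_date) if rows else None


t1, t2 = datetime(2012, 7, 1), datetime(2012, 7, 15)
customer = Hub("CUST-001", t1, "sales")
product = Hub("PROD-042", t1, "sales")
order = Hub("ORD-9001", t1, "sales")

# The link records that this customer, product, and order interacted.
purchase = Link((customer.business_key, product.business_key,
                 order.business_key), t1, "sales")

# A change in descriptive data is appended as a new row, never overwritten.
customer_sats = [
    describe("CUST-001", t1, "sales", city="Portland"),
    describe("CUST-001", t2, "sales", city="Salem"),
]

current = latest(customer_sats, "CUST-001")
print(dict(current.attributes))
```

Both satellite rows remain in `customer_sats`, so the earlier city is still available for historical queries while `latest` returns the current one.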
Data Vault Methodology
Follows: SEI/CMMI Level 5, PMP, Six Sigma, TQM, and Agile elements


Level 5: Optimized business processes, repeatable, scalable, fault-tolerant. Automatable (generatable)

Level 4: Metrics, Estimates vs Actuals, Function Point Analysis, Identification of broken processes

Level 3: Defined Business Processes, Defined Goals, Defined Objectives

Level 2: Risk assessments / analysis, managed processes, basic alignment efforts

Level 1: Process unpredictable and poorly controlled
Data Vault Architecture
[Diagram: source systems (Sales, Finance, Contracts, and Unstructured Data on Hadoop / NoSQL) feed the Enterprise BI Solution. Feeds are labeled (batch) into Staging and on to the EDW (Data Vault), and (real-time) from the SOA. Downstream of the EDW, Complex Business Rules produce Star Schemas, Error Marts, and Report Collections.]

FUNDAMENTAL GOALS
• Repeatable
• Consistent
• Fault-tolerant
• Supports phased release
• Scalable
• Auditable

The business rules are moved closer to the business, improving IT reaction time, reducing cost and minimizing impacts to the enterprise data warehouse (EDW).
Star Schemas, 3NF,
        Data Vault:
       Pros & Cons
         Defining the Data Vault Space
     Why NOT use Star Schemas as an EDW?
         Why NOT use 3NF as an EDW?
Why NOT use Data Vault as a Data Delivery Model?



Star Schema Pros/Cons as an EDW
PROS
• Good for multi-dimensional analysis
• Subject oriented answers
• Excellent for aggregation points
• Rapid development / deployment
• Great for some historical storage

CONS
• Not cross-business functional
• Use of junk / helper tables
• Trouble with VLDW
• Unable to provide integrated enterprise information
• Can’t handle ODS or exploration warehouse requirements
• Trouble with data explosion in near-real-time environments
• Trouble with updates to type 2 dimension primary keys
• Trouble with late arriving data in dimensions to support real-time arriving transactions
• Not granular enough information to support real-time data integration
3NF Pros/Cons as an EDW
PROS
• Many to many linkages
• Handle lots of information
• Tightly integrated information
• Highly structured
• Conducive to near-real time loads
• Relatively easy to extend

CONS
• Time driven PK issues
• Parent-child complexities
• Cascading change impacts
• Difficult to load
• Not conducive to BI tools
• Not conducive to drill-down
• Difficult to architect for an enterprise
• Not conducive to spiral/scope controlled implementation
• Physical design usually doesn’t follow business processes
Data Vault Pros/Cons as an EDW

PROS
• Supports near-real time and batch feeds
• Supports functional business linking
• Extensible / flexible
• Provides rapid build / delivery of star schemas
• Supports VLDB / VLDW
• Designed for EDW
• Supports data mining and AI
• Provides granular detail
• Incrementally built

CONS
• Not conducive to OLAP processing
• Requires business analysis to be firm
• Introduces many join operations
Analogy: The Porsche, the SUV and the Big Rig




•   Which would you use to win a race?
•   Which would you use to move a house?
•   Would you adapt the truck and enter a race with Porsches and expect to win?
Current EDW Issues and
         Pains
  Business Rule Processing, Lack of Agility, and
       Future proofing your new solution




Current EDW Project Issues
This is NOT what you want happening to your project!

[Figure label: THE GAP!!]
2 Tier EDW Architecture
[Diagram: Enterprise BI Solution. Sales, Finance, and Contracts sources are loaded in batch through Complex Business Rules + Dependencies into Staging (EDW), which holds Staging + History. Complex Business Rules #2 then build the Star Schemas: Conformed Dimensions, Junk Tables, Helper Tables, Factless Facts.]

• Quality routines
• Cross-system dependencies
• Source data filtering
• In-process data manipulation

• High risk of incorrect data aggregation
• Larger system = increased impact
• Often re-engineered at the SOURCE
• History can be destroyed (completely re-computed)
#1 Cause of BI Initiative Failure




         Let’s take a look at one example…


Re-Engineering
[Diagram: Data Flow (Mapping). Current Sources (Sales: Customer; Finance: Customer Transactions) pass through a Source Join and then Business Rules. A ** NEW SYSTEM ** (Customer Purchases) is added to the sources.]
Federated Star Schema Inhibiting Agility
[Chart: Effort & Cost (Low to High) against Time, from Start through the point where the Maintenance Cycle Begins. The curve climbs through Data Mart 1, Data Mart 2, and Data Mart 3.]

Changing and adjusting conformed dimensions causes an exponential rise in the cost curve over time.
RESULT: Business builds their own Data Marts!


The main drivers are the maintenance costs and the re-engineering of the existing
system that occurs for each new “federated/conformed” effort. This increases
delivery time, difficulty, and maintenance costs.
What are the ROOT Causes?

     The root causes of RE-ENGINEERING are:




Consequences of
Implementing the Pains…
Business rules up-stream of your EDW and Conforming
             Dimensions to store ALL history




Deformed Dimensions
•    Deformity: The URGE to continue “slamming data” into an existing conformed
     dimension until it simply cannot sustain any further changes. The result: a
     deformed dimension and a HUGE re-engineering cost / nightmare.

     Business Wants a Change!
     Business said: Just add that to the existing Dimension, it will be easy right?

[Figure: each Business Change produces a larger version of the dimension and another Complex Load.]
• V1: 90 days, $125k
• V2: 120 days, $200k
• V3: 180 days, $275k

     Re-Engineering the Load Processes EACH TIME!
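As a quick worked check of the figures on this slide, the three changes can be totaled. The numbers come from the slide; the variable names are illustrative only.

```python
# (version, days to deliver, cost in dollars) as quoted on the slide
changes = [("V1", 90, 125_000), ("V2", 120, 200_000), ("V3", 180, 275_000)]

total_days = sum(days for _, days, _ in changes)
total_cost = sum(cost for _, _, cost in changes)

# How much more the third change cost and took, compared with the first
cost_growth = changes[-1][2] / changes[0][2]
days_growth = changes[-1][1] / changes[0][1]

print(total_days, total_cost, cost_growth, days_growth)
```

Taken together, the three changes add up to 390 days and $600k, and the third change costs 2.2 times as much as the first and takes twice as long.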
Dimension-itis
•   DimensionItis: Incurable Disease. The symptoms are the creation of new
    dimensions because the cost and time to conform existing dimensions
    with new attributes rise beyond the business’s ability to pay…

[Figure: a growing sprawl of copied dimension tables.]

    Business Says: Avoid the re-engineering costs, just “copy” the dimensions and create a new one for OUR department…

    What happens when we (IT) give in to this? …
Result: Silo Data Junkyards!
• Business Says: Take the dimension you have, copy it, and change
  it… This should be cheap, and easy right?
Business Change: To Modify Existing Star = 180 days, $275k

[Figure: the First Star (a customer dimension plus Fact_ABC, Fact_DEF, Fact_PDQ, Fact_MYFACT) is copied into separate department stars. Each copy repeats the same customer columns: Customer_ID, Customer_Name, Customer_Addr, Customer_Addr1, Customer_City, Customer_State, Customer_Zip, Customer_Phone, Customer_Tag, Customer_Score, Customer_Region, Customer_Stats, Customer_Type.]

• SALES: We built our own because IT costs too much…
• FINANCE: We built our own because IT took too long…
• MARKETING: We built our own because we needed customized dimension data…
Accountability In Question?
Corporate Fraud Accountability Title XI consists of seven sections. Section 1101
recommends a name for this title as “Corporate Fraud Accountability Act of 2002”. It
identifies corporate fraud and records tampering as criminal offenses and joins
those offenses to specific penalties. It also revises sentencing guidelines and
strengthens their penalties. This enables the SEC to temporarily freeze large or
unusual payments.
[Diagram: Source 1, Source 2, and Source 3 pass through business rules (“Business Rules Change Data!”) into Staging, which feeds the HR Mart, Sales Mart, and Finance Mart.]

      Are changes to data ON THE WAY IN to the EDW
      equivalent to records tampering?

                                                                                   29
How do we “fix” this?
Answer: Move the business rules downstream, AND no longer
           be forced to conform dimensions.




                                                            30
It’s Not Just a Data Model…




                              31
Move the Business Rules Downstream
• No “Conforming” of Dimensions on the way in to the EDW
• Hold on… We do distinguish between HARD and SOFT business
  rules…




                                                          32
Hard & Soft Business Rules
Hard Business Rules
• Data Domain Alignment (Data Type Matching)
• Normalization (where necessary)
• System Column Computation

Soft Business Rules
• Any requirement the business user states that, when applied,
  CHANGES the data or CHANGES the meaning of the data (the grain
  or interpretation)
• Simple example that will knock your socks off!
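
The hard/soft distinction can be sketched in a few lines of Python. This is only an illustration: the column names, the ZIP-to-region rule, and both function names are hypothetical and do not come from the deck. The hard rule aligns a data type and adds system columns without altering what the source said; the soft rule derives a new value, so it belongs downstream of the EDW.

```python
from datetime import datetime
from decimal import Decimal


def apply_hard_rules(raw: dict, record_source: str, load_dts: datetime) -> dict:
    """Hard rules only: data type alignment plus computed system columns."""
    return {
        "customer_id": raw["customer_id"],
        "customer_score": Decimal(raw["customer_score"]),  # string -> numeric type
        "customer_zip": raw["customer_zip"],
        "load_dts": load_dts,            # system column
        "record_source": record_source,  # system column
    }


def apply_soft_rules(row: dict) -> dict:
    """Soft rule (hypothetical): derive a region from the ZIP code.

    This changes the meaning of the data, so it runs on the way OUT
    of the EDW (toward a mart), never on the way in.
    """
    out = dict(row)
    out["customer_region"] = "WEST" if row["customer_zip"].startswith("9") else "OTHER"
    return out


raw_row = {"customer_id": "C-1", "customer_score": "7.50", "customer_zip": "97201"}
edw_row = apply_hard_rules(raw_row, "SALES.CRM", datetime(2012, 7, 1))
mart_row = apply_soft_rules(edw_row)
```

Because the soft rule returns a copy, the row stored in the EDW stays exactly as the source delivered it, which is the auditability argument made on the earlier slides.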




                                                        33
Progressive Agility and Responsiveness of IT

[Chart: Effort & Cost (Low to High) over Time, beginning at Start. Labeled phases: Initial DV Build Out, Foundational Base Built, New Functional Areas Added, Maintenance Cycle Begins]

           Re-Engineering does NOT occur with a Data Vault Model.
           This keeps costs down, and maintenance easy. It also reduces
           complexity of the existing architecture.


                                                                                        34
NO Re-Engineering
Current Sources → Data Vault
• Sales (Customer) → Stage Copy → Hub Customer
• Finance (Customer Transactions) → Stage Copy → Link Transaction

** NEW SYSTEM **
• Customer Purchases → Stage Copy → Hub Acct, Hub Product

NO IMPACT!!! NO RE-ENGINEERING!




                                                                     35
Keys to Success
Bringing the Data Vault to Your Project




                                          36
Key: Flexibility




Adding new components to the EDW has NEAR ZERO impact to:
• Existing Loading Processes
• Existing Data Model
• Existing Reporting & BI Functions
• Existing Source Systems
• Existing Star Schemas and Data Marts
                                                            37
Case In Point:
     The flexibility of the Data Vault Model
     allowed them to merge 3 companies in 90 days –
     that is ALL systems, ALL DATA!




                                                      38
Key: Scalability in Architecture




     Scaling is easy; it's based on the following principles:
     • Hub and spoke design
     • MPP Shared-Nothing Architecture
     • Scale Free Networks


                                                              39
Case In Point:


    Result of scalability was to produce a Data
    Vault model that scaled to 3 Petabytes in
    size, and is still growing today!




                                                  40
Key: Scalability in Team Size




        You should be able to SCALE your TEAM as well!
           With the Data Vault methodology, you can:
 Scale your team when desired, at different points in the project!


                                                                     41
Case In Point:
                             (Dutch Tax Authority)




 As a result of this scalability, ETL developers were added for
 each new source system and reassigned once that
 system was completely loaded into the Data Vault.




                                                            42
Key: Productivity




Increasing Productivity requires a reduction in complexity.
The Data Vault Model simplifies all of the following:
• ETL Loading Routines
• Real-Time Ingestion of Data
• Data Modeling for the EDW
• Enhancing and Adapting for Change to the Model
• Monitoring, managing and optimizing processes

                                                              43
Case in Point:
   Result of Productivity was: 2 people in 2 weeks
   merged 3 systems, built a full Data Vault EDW, 5
   star schemas and 3 reports.




      These individuals generated:
      • 90% of the ETL code for moving the data set
      • 100% of the Staging Data Model
      • 75% of the finished EDW data Model
      • 75% of the star schema data model

                                                      44
The Competing Bid?
The competition bid this with 15 people
and 3 months to completion, at a cost of
$250k! (they bid a Very complex system)




Our total cost? $30k and 2 weeks!

                                           45
Results?




Changing the direction of the river takes less
   effort than stopping the flow of water


                                                 46
When NOT to use the Data Vault
A review of some reasons why not to use a Data Vault Model




                                                             48
When NOT to Use the Data Vault
• You have:
  o   a small set of point solution requirements
  o   a very short time-frame for delivery
  o   a need to use the data one time, then throw it away
  o   a single source system, single source application
  o   a single business analyst in the entire company

• You do NOT have:
  o   audit requirements forcing you to keep history
  o   multiple data center consolidation efforts
  o   near-real-time to worry about
  o   massive batch data to integrate
  o   external data feeds outside your control
  o   requirements to do trend analysis of all your data
  o   pain that forces you to reengineer every time you ask for a
      change to your current data warehousing systems

                                                                      49
Ontologies & Data Vault
     Hub, Link, Satellite - Definitions




                                          50
Business Keys = Ontology
Business Keys should be arranged in an ontology
in order to learn the dependencies of the data set.

Firm Name
   Drug Listing
       Product Number
       Dose Form Code
       NDA Application #
            Drug Label Code
                  Patent Number
                     Patent Use Code

NOTE: Different ontologies represent different business views of the data!




                                                                                  51
Associations = Ontological Hooks

[Diagram: business keys connected by named associations]
• Firm Name
• Firms Generate Product Listings: Drug Listing
• Firms Manufacture Products: Product Number
• Listings for Products are in NDA Applications: NDA Application #

Business Keys are associated by many linking factors; these links
comprise the associations in the hierarchy.


                                                                 52
Descriptors = Context

[Diagram: descriptors attached to business keys and associations]
• Firm Name: Firm Locations
• Drug Listing (Firms Generate Product Listings): Listing Formulation
• Product Number (Firms Manufacture Products): Product Ingredients
• Firms Manufacture Products: Start & End of manufacturing

Descriptors provide the context at a specific point in time – they are
the warehousing portion of the Data Vault
                                                         53
A Working Example
National Drug Codes + Orange Book of Drug Patent Applications

http://www.accessdata.fda.gov/scripts/cder/ndc/default.cfm
http://www.fda.gov/Drugs/InformationOnDrugs/ucm129662.htm



                                                             54
Hub Table Structures




               SQN = Sequence (insertion order)
   LDTS = Load Date (when the Warehouse first sees the data)
RSRC = Record Source (System + App where the data ORIGINATED)
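
A minimal sketch of a Hub and its load, using Python's built-in sqlite3 module. The table layout is inferred from the legend above (SQN, LDTS, RSRC plus the business key); the table name `hub_product`, the helper `load_hub`, the product numbers, and the record source names are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE hub_product (
        sqn            INTEGER PRIMARY KEY,   -- SQN: sequence (insertion order)
        product_number TEXT NOT NULL UNIQUE,  -- the business key
        ldts           TEXT NOT NULL,         -- LDTS: when the warehouse first saw the key
        rsrc           TEXT NOT NULL          -- RSRC: where the key originated
    )
""")


def load_hub(conn, business_keys, ldts, rsrc):
    """Insert only business keys the Hub has not seen before."""
    conn.executemany(
        "INSERT OR IGNORE INTO hub_product (product_number, ldts, rsrc) "
        "VALUES (?, ?, ?)",
        [(key, ldts, rsrc) for key in business_keys],
    )
    conn.commit()


# Two loads from two (hypothetical) sources; P-200 arrives twice.
load_hub(conn, ["P-100", "P-200"], "2012-07-01", "NDC")
load_hub(conn, ["P-200", "P-300"], "2012-07-02", "ORANGE_BOOK")

hub_rows = conn.execute(
    "SELECT sqn, product_number, ldts, rsrc FROM hub_product ORDER BY sqn"
).fetchall()
```

The second arrival of P-200 is ignored, so its LDTS and RSRC still describe the first time and place the warehouse saw that key.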

                                                                55
Link Table Structures




Note: A Link is really no different than a factless fact!


                                                            56
Satellite Table Structures




           SQN = Sequence (parent identity number)
   LDTS = Load Date (when the Warehouse first sees the data)
         LEDTS = End of lifecycle for superseded record
RSRC = Record Source (System + App where the data ORIGINATED)
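
The Satellite columns above suggest a delta-driven load: insert a row only when the descriptive data changes, and end-date the row it supersedes. The sketch below assumes that LEDTS is set to the load date of the superseding row; that convention, the `SatRow` and `load_satellite` names, the "CRM" record source, and the 2000-10-20 reload are assumptions for illustration. The names and the other dates follow the customer-name data used in the PIT example later in the deck.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SatRow:
    sqn: int              # SQN: parent Hub sequence
    ldts: str             # LDTS: load date (ISO format)
    ledts: Optional[str]  # LEDTS: load end date; None marks the current row
    rsrc: str             # RSRC: record source
    name: str             # descriptive attribute


def load_satellite(sat: List[SatRow], sqn: int, name: str, ldts: str, rsrc: str) -> None:
    """Append a new row only if the descriptor changed; end-date the old row."""
    current = next((r for r in sat if r.sqn == sqn and r.ledts is None), None)
    if current is not None and current.name == name:
        return  # no delta, nothing to store
    if current is not None:
        current.ledts = ldts  # close the superseded record
    sat.append(SatRow(sqn, ldts, None, rsrc, name))


sat_cust_name: List[SatRow] = []
load_satellite(sat_cust_name, 1, "Dan L", "2000-10-14", "CRM")
load_satellite(sat_cust_name, 1, "Dan L", "2000-10-20", "CRM")  # unchanged: skipped
load_satellite(sat_cust_name, 1, "Dan Linedt", "2000-11-01", "CRM")
load_satellite(sat_cust_name, 1, "Dan Linstedt", "2000-12-31", "CRM")
```

Existing descriptor values are never overwritten; only the end date of a superseded row is filled in, so the full history stays queryable.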

                                                                57
In Review…
• Data Vault is…
   o   A Data Warehouse Model & Methodology
   o   Hub and Spoke Design
   o   Simple, Easy, Repeatable Structures
   o   Comprised of Standards, Rules & Procedures
   o   Made up of Ontological Metadata
   o   AUTOMATABLE!!!

• Hubs = Business Keys
• Links = Associations / Transactions
• Satellites = Descriptors




                                                    58
Why do we build Links this way?



                             59
History Teaches Us…

Model without a Link: Hub Portfolio (1) to (M) Hub Customer

• Today:            Portfolio (1) to (M) Customer
• 5 years from now: Portfolio (M) to (M) Customer
• 10 years ago:     Portfolio (M) to (1) Customer

The EDW is designed to handle TODAY’S relationship; as soon as
history is loaded, it breaks the model!

This situation forces re-engineering of the model, load routines,
and queries!

                                                                              60
History Teaches Us…

Model with a Link: Hub Portfolio (1) to (M) LNK Cust-Port (M) to (1) Hub Customer

• Today:            Portfolio (1) to (M) Customer
• 5 years from now: Portfolio (M) to (M) Customer
• 10 years ago:     Portfolio (M) to (1) Customer

This design is flexible: it handles past, present, and future
relationship changes with NO RE-ENGINEERING!
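
To make the point concrete, here is a small Python sketch in which a Link is just a set of (portfolio SQN, customer SQN) pairs. The sequence numbers and the `cardinality` helper are invented for illustration. All three eras from the slide fit the same structure; only the data differs.

```python
from collections import defaultdict


def cardinality(link_rows):
    """Report the Portfolio:Customer cardinality actually present in the rows."""
    customers_per_portfolio = defaultdict(set)
    portfolios_per_customer = defaultdict(set)
    for portfolio_sqn, customer_sqn in link_rows:
        customers_per_portfolio[portfolio_sqn].add(customer_sqn)
        portfolios_per_customer[customer_sqn].add(portfolio_sqn)
    # Portfolio side is "M" when some customer has several portfolios.
    portfolio_side = "M" if any(len(p) > 1 for p in portfolios_per_customer.values()) else "1"
    # Customer side is "M" when some portfolio has several customers.
    customer_side = "M" if any(len(c) > 1 for c in customers_per_portfolio.values()) else "1"
    return f"{portfolio_side}:{customer_side}"


# (portfolio_sqn, customer_sqn) pairs; the Link structure never changes.
ten_years_ago = {(1, 10), (2, 10)}            # many portfolios, one customer
today = {(1, 10), (1, 11)}                    # one portfolio, many customers
five_years_on = {(1, 10), (1, 11), (2, 10)}   # many to many

link_cust_port = ten_years_ago | today | five_years_on  # all history, one table
```

Loading history into `link_cust_port` changes the cardinality the data exhibits, but no table, load routine, or join has to be redesigned to hold it.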


                                                                        61
Applying the Data Vault to Global DW

[Diagram: three connected groups of Hubs, Links, and Satellites, labeled "Base EDW Created in Corporate Financials in USA", "Manufacturing EDW in China", and "Planning in Brazil"]



                                                                              62
Query Performance
Point-in-time and Bridge Tables, overcoming query issues




                                                           63
PIT Table Architecture
Satellite: Point In Time
 PARENT SEQUENCE             (Primary Key)
 LOAD DATE                   (Primary Key)
 {Satellite 1 Load Date}
 {Satellite 2 Load Date}
 {Satellite 3 Load Date}
 {…}
 {Satellite N Load Date}

[Diagram: Hub Customer with Sat 1 to Sat 4 and a PIT Sat; Hub Order with Sat 1 to Sat 4 and a PIT Sat; Hub Product with Sat 1 and Sat 2; Link Line Item with Satellite Line Item]
                                                                          64
PIT Table Example
SAT_CUST_CONTACT_NAME                   SAT_CUST_CONTACT_CELL                 SAT_CUST_CONTACT_ADDR
 SQN   LOAD_DTS     NAME                SQN   LOAD_DTS     CELL               SQN   LOAD_DTS     ADDR
 1     10-14-2000   Dan L               1     10-14-2000   999-555-1212       1     08-01-2000   26 Prospect
 1     11-01-2000   Dan Linedt          1     10-15-2000   999-111-1234       1     09-29-2000   26 Prosp St.
 1     12-31-2000   Dan Linstedt        1     10-16-2000   999-252-2834       1     12-17-2000   28 November
                                        1     10-17-2000   999.257-2837       1     01-01-2001   26 Prospect St
                                        1     10-18-2000   999-273-5555




                    SQN   LOAD_DTS     SAT_NAME_LDTS       SAT_CELL_LDTS   SAT_ADDR_LDTS
                    1     08-01-2000   NULL                NULL            08-01-2000
                    1     09-01-2000   NULL                NULL            08-01-2000
                    1     10-01-2000   NULL                NULL            09-29-2000
                    1     11-01-2000   11-01-2000          10-18-2000      09-29-2000
                    1     12-01-2000   11-01-2000          10-18-2000      09-29-2000
                    1     01-01-2001   12-31-2000          10-18-2000      01-01-2001

                    (In the PIT table, LOAD_DTS is the Snapshot Date.)
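
The PIT rows above can be reproduced with a short Python sketch: for each snapshot date, take the most recent load date in each Satellite that is on or before the snapshot, or NULL (None) if the Satellite has no row yet. The load dates are copied from the three Satellites in this example; the function names `latest_as_of` and `build_pit` are hypothetical.

```python
from datetime import date

# Load dates for customer SQN 1, taken from the three Satellites above.
sat_name = [date(2000, 10, 14), date(2000, 11, 1), date(2000, 12, 31)]
sat_cell = [date(2000, 10, 14), date(2000, 10, 15), date(2000, 10, 16),
            date(2000, 10, 17), date(2000, 10, 18)]
sat_addr = [date(2000, 8, 1), date(2000, 9, 29), date(2000, 12, 17), date(2001, 1, 1)]

snapshots = [date(2000, 8, 1), date(2000, 9, 1), date(2000, 10, 1),
             date(2000, 11, 1), date(2000, 12, 1), date(2001, 1, 1)]


def latest_as_of(load_dates, snapshot):
    """Most recent load date on or before the snapshot, else None (NULL)."""
    eligible = [d for d in load_dates if d <= snapshot]
    return max(eligible) if eligible else None


def build_pit(sqn, snapshot_dates, **satellites):
    """One PIT row per snapshot date, one load-date column per Satellite."""
    return [
        {
            "sqn": sqn,
            "load_dts": snap,
            **{col: latest_as_of(dates, snap) for col, dates in satellites.items()},
        }
        for snap in snapshot_dates
    ]


pit = build_pit(1, snapshots,
                sat_name_ldts=sat_name,
                sat_cell_ldts=sat_cell,
                sat_addr_ldts=sat_addr)
```

With the PIT table in place, a query for a given snapshot can join each Satellite on an exact (SQN, load date) match instead of searching a date range in every Satellite.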




                                                                                                           65
Bridge Table Architecture
Satellite: Bridge
 UNIQUE SEQUENCE             (Primary Key)
 LOAD DATE
 {Hub 1 Sequence #}
 {Hub 2 Sequence #}
 {Hub 3 Sequence #}
 {Link 1 Sequence #}
 {Link 2 Sequence #}
 {…}
 {Link N Sequence #}
 {Hub 1 Business Key}
 {Hub 2 Business Key}
 {…}
 {Hub N Business Key}

[Diagram: Hub Seller (with Sat 1 to Sat 4), Link (with Satellite), Hub Product, Link (with Satellite), Hub Parts, and a Bridge table across them]

                                                                                 66
Bridge Table Data Example


 Bridge Table: Seller by Product by Part
 SQN   LOAD_DTS     SELL_SQN   SELL_ID   PROD_SQN   PROD_NUM     PART_SQN   PART_NUM
 1     08-01-2000   15         NY*1      2756       ABC-123-9K   525        JK*2*4
 2     09-01-2000   16         CO*24     2654       DEF-847-0L   324        MN*5-2
 3     10-01-2000   16         CO*24     82374      PPA-252-2A   9938       DD*2*3
 4     11-01-2000   24         AZ*25     25222      UIF-525-88   7          UF*9*0
 5     12-01-2000   99         NM*5      81         DAN-347-7F   16         KI*9-2
 6     01-01-2001   99         NM*5      81         DAN-347-7F   24         DL*0-5


   (In the Bridge table, LOAD_DTS is the Snapshot Date.)
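
A Bridge row is essentially a stored join across Hubs and Links. The Python sketch below rebuilds the first three rows of the table above from small in-memory Hubs and Links. The Hub contents are taken from those rows; the Link pairs, the `build_bridge` name, and the use of a single load date for the whole batch (the slide shows a different snapshot date per row) are assumptions for illustration.

```python
# Hubs: sequence number -> business key (values from the first three Bridge rows).
hub_seller = {15: "NY*1", 16: "CO*24"}
hub_product = {2756: "ABC-123-9K", 2654: "DEF-847-0L", 82374: "PPA-252-2A"}
hub_part = {525: "JK*2*4", 324: "MN*5-2", 9938: "DD*2*3"}

# Links: pairs of Hub sequence numbers (assumed; the slide shows only the result).
link_seller_product = [(15, 2756), (16, 2654), (16, 82374)]
link_product_part = [(2756, 525), (2654, 324), (82374, 9938)]


def build_bridge(load_dts):
    """Walk Seller -> Product -> Part once and store the flattened result."""
    rows = []
    for sell_sqn, prod_sqn in link_seller_product:
        for linked_prod_sqn, part_sqn in link_product_part:
            if linked_prod_sqn != prod_sqn:
                continue
            rows.append({
                "sqn": len(rows) + 1,  # the Bridge's own unique sequence
                "load_dts": load_dts,
                "sell_sqn": sell_sqn, "sell_id": hub_seller[sell_sqn],
                "prod_sqn": prod_sqn, "prod_num": hub_product[prod_sqn],
                "part_sqn": part_sqn, "part_num": hub_part[part_sqn],
            })
    return rows


bridge = build_bridge("2000-08-01")
```

Queries that need Seller, Product, and Part together can then read the Bridge directly, since both the sequence numbers and the business keys are already on each row.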




                                                                                       67
Conclusions



              68
Where To Learn More
• The Technical Modeling Book:
  http://LearnDataVault.com/

• On-Line Training direct from me:
  http://LearnDataVault.com/training

• The Discussion Forums & events:
  http://LinkedIn.com – Data Vault Discussions

• Contact me:
  http://DanLinstedt.com - web site
  DanLinstedt@gmail.com - email


                                                 69
LIVE DEMONSTRATION
Physical Demonstration, Loading Processes and Execution




                                                          70

Organizational Transformation Lead with Culture
 
Russian Call Girls In Gurgaon ❤️8448577510 ⊹Best Escorts Service In 24/7 Delh...
Russian Call Girls In Gurgaon ❤️8448577510 ⊹Best Escorts Service In 24/7 Delh...Russian Call Girls In Gurgaon ❤️8448577510 ⊹Best Escorts Service In 24/7 Delh...
Russian Call Girls In Gurgaon ❤️8448577510 ⊹Best Escorts Service In 24/7 Delh...
 
Call Girls Hebbal Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Hebbal Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Hebbal Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Hebbal Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Call Girls Electronic City Just Call 👗 7737669865 👗 Top Class Call Girl Servi...
Call Girls Electronic City Just Call 👗 7737669865 👗 Top Class Call Girl Servi...Call Girls Electronic City Just Call 👗 7737669865 👗 Top Class Call Girl Servi...
Call Girls Electronic City Just Call 👗 7737669865 👗 Top Class Call Girl Servi...
 
The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Commun...
The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Commun...The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Commun...
The Path to Product Excellence: Avoiding Common Pitfalls and Enhancing Commun...
 
Cracking the Cultural Competence Code.pptx
Cracking the Cultural Competence Code.pptxCracking the Cultural Competence Code.pptx
Cracking the Cultural Competence Code.pptx
 

Introduction To Data Vault - DAMA Oregon 2012

  • 1. Introduction Data Vault Model & Methodology © Dan Linstedt, 2011-2012 all rights reserved Prepared for: DAMA Oregon, July 2012 1
  • 3. The Experts Say… “The Data Vault is the optimal choice for modeling the EDW in the DW 2.0 framework.” Bill Inmon “The Data Vault is foundationally strong and exceptionally scalable architecture.” Stephen Brobst “The Data Vault is a technique which some industry experts have predicted may spark a revolution as the next big thing in data modeling for enterprise warehousing....” Doug Laney 3
  • 4. More Notables… “This enables organizations to take control of their data warehousing destiny, supporting better and more relevant data warehouses in less time than before.” Howard Dresner “[The Data Vault] captures a practical body of knowledge for data warehouse development which both agile and traditional practitioners will benefit from..” Scott Ambler 4
  • 5. Agenda • Introduce Yourselves… • What is a Data Vault? Where does it come from? • Pros & Cons of Data Modeling for EDW • Current EDW Issues & Pains • Consequences of Implementing the Pains… • How do we “Fix” This? • Keys to Success • When “NOT” to use a Data Vault • Ontologies and Data Vault • A Working Example • Query Performance (PIT & Bridge) • Conclusion (break) • Live Demo 5
  • 6. Introduce Yourselves • Your Expectations? • Your Questions? • Your Background? • Areas of Interest? • What are the top 3 pains your EDW/BI solution is experiencing? • About Me… o http://www.LinkedIn.com/dlinstedt • Learn More Data Vault on-line at: o http://LearnDataVault.com/training 6
  • 7. Where did it come from? What is it? Defining the Data Vault Space 7
  • 8. Data Warehousing Time Line The Data Vault Model & Methodology took 10 years of R&D to become consistent, flexible, and scalable. 8
  • 9. What IS a Data Vault? (Business Definition) • Data Vault Model o Detail oriented o Historical traceability o Uniquely linked set of normalized tables o Supports one or more functional areas of business • Data Vault Methodology – CMMI, Project Plan – Risk, Governance, Versioning – Peer Reviews, Release Cycles – Repeatable, Consistent, Optimized – Complete with Best Practices for BI/DW • Data Vault Architecture – 3 Tier Architecture (for including Batch & Unstructured Data) – 2 Tier Architecture (for Real-Time only)
  • 10. The Data Vault Model Elements: •Hub •Link •Satellite Hub = List of Unique Business Keys Link = List of Relationships, Associations Satellites = Descriptive Data [Diagram labels: Customer, Product, and Order hubs with satellites; a link with satellites, captioned "Records a history of the interaction"]
  • 11. Data Vault Methodology Follows: SEI/CMMI Level 5, PMP, Six Sigma, TQM, and Agile elements. Level 5: Optimized business processes, repeatable, scalable, fault-tolerant. Automatable (generatable). Level 4: Metrics, Estimates vs Actuals, Function Point Analysis, Identification of broken processes. Level 3: Defined Business Processes, Defined Goals, Defined Objectives. Level 2: Risk assessments / analysis, managed processes, basic alignment efforts. Level 1: Process unpredictable and poorly controlled.
  • 12. Data Vault Architecture [Diagram labels: Enterprise BI Solution; SOA; Sales; Finance; Contracts; Staging; EDW (Data Vault); Complex Business Rules; Star Schemas; Error Marts; Report Collections; Unstructured Data (Hadoop NoSQL); feeds marked (batch) and (real-time)] FUNDAMENTAL GOALS •Repeatable •Consistent •Fault-tolerant •Supports phased release •Scalable •Auditable The business rules are moved closer to the business, improving IT reaction time, reducing cost and minimizing impacts to the enterprise data warehouse (EDW)
  • 13. Star Schemas, 3NF, Data Vault: Pros & Cons Defining the Data Vault Space Why NOT use Star Schemas as an EDW? Why NOT use 3NF as an EDW? Why NOT use Data Vault as a Data Delivery Model? 13
  • 14. Star Schema Pros/Cons as an EDW PROS • Good for multi-dimensional analysis • Subject oriented answers • Excellent for aggregation points • Rapid development / deployment • Great for some historical storage CONS • Not cross-business functional • Use of junk / helper tables • Trouble with VLDW • Unable to provide integrated enterprise information • Can’t handle ODS or exploration warehouse requirements • Trouble with data explosion in near-real-time environments • Trouble with updates to type 2 dimension primary keys • Trouble with late arriving data in dimensions to support real-time arriving transactions • Not granular enough information to support real-time data integration
  • 15. 3NF Pros/Cons as an EDW PROS • Many to many linkages • Handle lots of information • Tightly integrated information • Highly structured • Conducive to near-real time loads • Relatively easy to extend CONS • Time driven PK issues • Parent-child complexities • Cascading change impacts • Difficult to load • Not conducive to BI tools • Not conducive to drill-down • Difficult to architect for an enterprise • Not conducive to spiral/scope controlled implementation • Physical design usually doesn’t follow business processes
  • 16. Data Vault Pros/Cons as an EDW PROS • Supports near-real time and batch feeds • Supports functional business linking • Extensible / flexible • Provides rapid build / delivery of star schemas • Supports VLDB / VLDW • Designed for EDW • Supports data mining and AI • Provides granular detail • Incrementally built CONS • Not conducive to OLAP processing • Requires business analysis to be firm • Introduces many join operations
  • 17. Analogy: The Porsche, the SUV and the Big Rig • Which would you use to win a race? • Which would you use to move a house? • Would you adapt the truck and enter a race with Porsches and expect to win?
  • 18. Current EDW Issues and Pains Business Rule Processing, Lack of Agility, and Future proofing your new solution 18
  • 19. Current EDW Project Issues This is NOT what you want happening to your project! THE GAP!! 19
  • 20. 2 Tier EDW Architecture [Diagram labels: Enterprise BI Solution; Sales, Finance, Contracts (batch); Complex Business Rules + Dependencies; Staging (EDW), Staging + History; Complex Business Rules #2; Star Schemas: Conformed Dimensions, Junk Tables, Helper Tables, Factless Facts] •Quality routines •Cross-system dependencies •Source data filtering •In-process data manipulation •High risk of incorrect data aggregation •Larger system = increased impact •Often re-engineered at the SOURCE •History can be destroyed (completely re-computed)
  • 21. #1 Cause of BI Initiative Failure Let’s take a look at one example… 21
  • 22. Re-Engineering [Diagram labels: Current Sources (Sales Customer, Finance Customer Transactions); Source Join; Business Rules; Data Flow (Mapping); ** NEW SYSTEM ** (Customer Purchases)]
  • 23. Federated Star Schema Inhibiting Agility [Chart: Effort & Cost (Low to High) over Time, from Start to Maintenance Cycle Begins, with Data Mart 1, Data Mart 2, Data Mart 3] Changing and adjusting conformed dimensions causes an exponential rise in the cost curve over time. RESULT: Business builds their own Data Marts! The main driver for this is the maintenance costs, and re-engineering of the existing system which occurs for each new “federated/conformed” effort. This increases delivery time, difficulty, and maintenance costs.
  • 24. What are the ROOT Causes? The root causes of RE-ENGINEERING are: 24
  • 25. Consequences of Implementing the Pains… Business rules up-stream of your EDW and Conforming Dimensions to store ALL history 25
  • 26. Deformed Dimensions • Deformity: The URGE to continue “slamming data” into an existing conformed dimension until it simply cannot sustain any further changes. The result: a deformed dimension and a HUGE re-engineering cost / nightmare. Business Wants a Change! Business said: Just add that to the existing Dimension, it will be easy, right? [Diagram: three successive business changes, each with a more complex load. V1: 90 days, $125k. V2: 120 days, $200k. V3: 180 days, $275k.] Re-Engineering the Load Processes EACH TIME!
  • 27. Dimension-itis • DimensionItis: Incurable Disease, the symptoms are the creation of new dimensions because the cost and time to conform existing dimensions with new attributes rises beyond the business ability to pay… Business Says: Avoid the re-engineering costs, just “copy” the dimensions and create a new one for OUR department… What happens when we (IT) give in to this?
  • 28. Result: Silo Data Junkyards! • Business Says: Take the dimension you have, copy it, and change it… This should be cheap, and easy right? Business Change To Modify Existing Star = 180 days, $275k. [Diagram: the First Star's customer dimension (Customer_ID, Customer_Name, Customer_Addr, Customer_Addr1, Customer_City, Customer_State, Customer_Zip, Customer_Phone, Customer_Tag, Customer_Score, Customer_Region, Customer_Stats, Customer_Type) repeated in separate stars with fact tables Fact_ABC, Fact_DEF, Fact_PDQ, Fact_MYFACT] SALES: We built our own because IT costs too much… FINANCE: We built our own because IT took too long… MARKETING: We built our own because we needed customized dimension data…
  • 29. Accountability In Question? Corporate Fraud Accountability: Title XI consists of seven sections. Section 1101 recommends a name for this title as “Corporate Fraud Accountability Act of 2002”. It identifies corporate fraud and records tampering as criminal offenses and joins those offenses to specific penalties. It also revises sentencing guidelines and strengthens their penalties. This enables the SEC to temporarily freeze large or unusual payments. [Diagram labels: three Sources; Staging; Business Rules (Change Data!); HR Mart 1; Sales Mart 2; Finance Mart 3] Are changes to data ON THE WAY IN to the EDW equivalent to records tampering?
  • 30. How do we “fix” this? Answer: Move the business rules downstream, AND no-longer be forced to conform dimensions. 30
  • 31. It’s Not Just a Data Model… 31
  • 32. Move the Business Rules Downstream • No “Conforming” of Dimensions on the way in to the EDW • Hold on… We do distinguish between HARD and SOFT business rules… 32
  • 33. Hard & Soft Business Rules Hard Business Rules • Data Domain Alignment (Data Type Matching) • Normalization (where necessary) • System Column Computation Soft Business Rules • Any requirement the business user states that, when applied, CHANGES the data or CHANGES the meaning of the data (the grain or interpretation) • Simple example that will knock the socks off your feet!
  • 34. Progressive Agility and Responsiveness of IT [Chart: Effort & Cost (Low to High) over Time, from Start to Maintenance Cycle Begins; phases: Initial DV Build Out, Foundational Base Built, New Functional Areas Added] Re-Engineering does NOT occur with a Data Vault Model. This keeps costs down, and maintenance easy. It also reduces complexity of the existing architecture.
  • 35. NO Re-Engineering [Diagram labels: Current Sources (Sales Customer, Finance Customer Transactions), each with a Stage Copy; Data Vault (Hub Customer, Link Transaction, Hub Acct, Hub Product); ** NEW SYSTEM ** (Customer Purchases) with its own Stage Copy] NO IMPACT!!! NO RE-ENGINEERING!
  • 36. Keys to Success Bringing the Data Vault to Your Project 36
  • 37. Key: Flexibility Adding new components to the EDW has NEAR ZERO impact to: • Existing Loading Processes • Existing Data Model • Existing Reporting & BI Functions • Existing Source Systems • Existing Star Schemas and Data Marts 37
  • 38. Case In Point: Result of flexibility of the Data Vault Model allowed them to merge 3 companies in 90 days – that is ALL systems, ALL DATA! 38
  • 39. Key: Scalability in Architecture Scaling is easy; it’s based on the following principles: • Hub and spoke design • MPP Shared-Nothing Architecture • Scale Free Networks
  • 40. Case In Point: Result of scalability was to produce a Data Vault model that scaled to 3 Petabytes in size, and is still growing today! 40
  • 41. Key: Scalability in Team Size You should be able to SCALE your TEAM as well! With the Data Vault methodology, you can: Scale your team when desired, at different points in the project! 41
  • 42. Case In Point: (Dutch Tax Authority) Result of scalability was to increase ETL developers for each new source system, and reassign them when the system was completely loaded to the Data Vault 42
  • 43. Key: Productivity Increasing Productivity requires a reduction in complexity. The Data Vault Model simplifies all of the following: • ETL Loading Routines • Real-Time Ingestion of Data • Data Modeling for the EDW • Enhancing and Adapting for Change to the Model • Ease of Monitoring, managing and optimizing processes 43
  • 44. Case in Point: Result of Productivity was: 2 people in 2 weeks merged 3 systems, built a full Data Vault EDW, 5 star schemas and 3 reports. These individuals generated: • 90% of the ETL code for moving the data set • 100% of the Staging Data Model • 75% of the finished EDW data Model • 75% of the star schema data model 44
  • 45. The Competing Bid? The competition bid this with 15 people and 3 months to completion, at a cost of $250k! (they bid a Very complex system) Our total cost? $30k and 2 weeks! 45
  • 46. Results? Changing the direction of the river takes less effort than stopping the flow of water 46
  • 47. 47
  • 48. When NOT to use the Data Vault A review of some reasons why not to use a Data Vault Model 48
  • 49. When NOT to Use the Data Vault • You have: o a small set of point solution requirements o a very short time-frame for delivery o To use the data one-time, then throw it away o a single source system, single source application o A single business analyst in the entire company • You do NOT have: o audit requirements forcing you to keep history o multiple data center consolidation efforts o near-real-time to worry about o massive batch data to integrate o External data feeds outside your control o Requirements to do trend analysis of all your data o Pain – that forces you to reengineer every time you ask for a change to your current data warehousing systems 49
  • 50. Ontologies & Data Vault Hub, Link, Satellite - Definitions 50
  • 51. Business Keys = Ontology Business Keys should be arranged in an ontology in order to learn the dependencies of the data set. NOTE: Different Ontologies represent different business views of the data! [Hierarchy shown: Firm Name; Drug Listing; Product Number; Dose Form Code; NDA Application #; Drug Label Code; Patent Number; Patent Use Code]
  • 52. Associations = Ontological Hooks [Hierarchy shown: Firm Name; Drug Listing; Product Number; NDA Application #. Associations: Firms Generate Product Listings; Firms Manufacture Products; Listings for Products are in NDA Applications] Business Keys are associated by many linking factors; these links comprise the associations in the hierarchy.
  • 53. Descriptors = Context [Hierarchy shown: Firm Name; Drug Listing; Product Number. Associations: Firms Generate Product Listings; Firms Manufacture Products. Descriptors: Firm Locations; Listing Formulation; Product Ingredients; Start & End of manufacturing] Descriptors provide the context at a specific point in time – they are the warehousing portion of the Data Vault
  • 54. A working Example National Drug Codes + Orange Book of Drug Patent Applications http://www.accessdata.fda.gov/scripts/cder/ndc/default.cfm http://www.fda.gov/Drugs/InformationOnDrugs/ucm129662.htm 54
  • 55. Hub Table Structures SQN = Sequence (insertion order) LDTS = Load Date (when the Warehouse first sees the data) RSRC = Record Source (System + App where the data ORIGINATED) 55
  • 56. Link Table Structures Note: A Link is really no different than a factless fact! 56
  • 57. Satellite Table Structures SQN = Sequence (parent identity number) LDTS = Load Date (when the Warehouse first sees the data) LEDTS = End of lifecycle for superseded record RSRC = Record Source (System + App where the data ORIGINATED) 57
  • 58. In Review… • Data Vault is… o A Data Warehouse Model & Methodology o Hub and Spoke Design o Simple, Easy, Repeatable Structures o Comprised of Standards, Rules & Procedures o Made up of Ontological Metadata o AUTOMATABLE!!! • Hubs = Business Keys • Links = Associations / Transactions • Satellites = Descriptors 58
  • 59. Why do we build Links this way? 59
  • 60. History Teaches Us… Portfolio The EDW is designed to handle TODAY’S 1 Today: relationship, as soon as history is loaded, it M breaks the model! Customer Hub Portfolio 1 Portfolio 5 years M From now M M Customer Hub Customer Portfolio M 10 Years ago 1 Customer This situation forces re-engineering of the model, load routines, and queries! 60
  • 61. History Teaches Us… Portfolio 1 Today: M Hub Portfolio Customer 1 M Portfolio 5 years LNK M from now Cust-Port M M Customer 1 Hub Customer Portfolio M 10 Years ago This design is flexible, handles 1 past, present, and future relationship changes Customer with NO RE-ENGINEERING! 61
  • 62. Applying the Data Vault to Global DW Manufacturing EDW Planning in Brazil in China Hub Hub Link Sat Sat Link Sat Sat Link Hub Link Hub Hub Sat Sat Sat Sat Sat Sat Sat Sat Base EDW Created in Corporate Financials in USA 62
  • 63. Query Performance Point-in-time and Bridge Tables, overcoming query issues 63
  • 64. PIT Table Architecture Satellite: Point In Time. Primary Key: PARENT SEQUENCE, LOAD DATE. Columns: {Satellite 1 Load Date} {Satellite 2 Load Date} {Satellite 3 Load Date} {…} {Satellite N Load Date} [Diagram labels: Hub Customer, Hub Order, Hub Product with Sat 1 to Sat 4 and PIT Sat tables; Link Line Item; Satellite Line Item]
  • 65. PIT Table Example
      SAT_CUST_CONTACT_NAME
        SQN  LOAD_DTS    NAME
        1    10-14-2000  Dan L
        1    11-01-2000  Dan Linedt
        1    12-31-2000  Dan Linstedt
      SAT_CUST_CONTACT_CELL
        SQN  LOAD_DTS    CELL
        1    10-14-2000  999-555-1212
        1    10-15-2000  999-111-1234
        1    10-16-2000  999-252-2834
        1    10-17-2000  999.257-2837
        1    10-18-2000  999-273-5555
      SAT_CUST_CONTACT_ADDR
        SQN  LOAD_DTS    ADDR
        1    08-01-2000  26 Prospect
        1    09-29-2000  26 Prosp St.
        1    12-17-2000  28 November
        1    01-01-2001  26 Prospect St
      PIT table (LOAD_DTS is the Snapshot Date)
        SQN  LOAD_DTS    SAT_NAME_LDTS  SAT_CELL_LDTS  SAT_ADDR_LDTS
        1    08-01-2000  NULL           NULL           08-01-2000
        1    09-01-2000  NULL           NULL           08-01-2000
        1    10-01-2000  NULL           NULL           09-29-2000
        1    11-01-2000  11-01-2000     10-18-2000     09-29-2000
        1    12-01-2000  11-01-2000     10-18-2000     09-29-2000
        1    01-01-2001  12-31-2000     10-18-2000     01-01-2001
  • 66. Bridge Table Architecture Satellite: Bridge. Primary Key: UNIQUE SEQUENCE, LOAD DATE. Columns: {Hub 1 Sequence #} {Hub 2 Sequence #} {Hub 3 Sequence #} {Link 1 Sequence #} {Link 2 Sequence #} {…} {Link N Sequence #} {Hub 1 Business Key} {Hub 2 Business Key} {…} {Hub N Business Key} [Diagram labels: Hub Seller, Hub Product, Hub Parts; two Links; Sat 1 to Sat 4; Satellites; Bridge]
  • 67. Bridge Table Data Example
      Bridge Table: Seller by Product by Part (LOAD_DTS is the Snapshot Date)
        SQN  LOAD_DTS    SELL_SQN  SELL_ID  PROD_SQN  PROD_NUM    PART_SQN  PART_NUM
        1    08-01-2000  15        NY*1     2756      ABC-123-9K  525       JK*2*4
        2    09-01-2000  16        CO*24    2654      DEF-847-0L  324       MN*5-2
        3    10-01-2000  16        CO*24    82374     PPA-252-2A  9938      DD*2*3
        4    11-01-2000  24        AZ*25    25222     UIF-525-88  7         UF*9*0
        5    12-01-2000  99        NM*5     81        DAN-347-7F  16        KI*9-2
        6    01-01-2001  99        NM*5     81        DAN-347-7F  24        DL*0-5
  • 69. Where To Learn More • The Technical Modeling Book: http://LearnDataVault.com/ • On-Line Training direct from me: http://LearnDataVault.com/training • The Discussion Forums & events: http://LinkedIn.com – Data Vault Discussions • Contact me: http://DanLinstedt.com - web site, DanLinstedt@gmail.com - email
  • 70. LIVE DEMONSTRATION Physical Demonstration, Loading Processes and Execution 70
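
Slides 55 to 57 describe the Hub, Link, and Satellite structures through their system columns (SQN, LDTS, LEDTS, RSRC). A minimal sketch of those structures follows, using Python's built-in sqlite3 module. The table names, business-key columns, and loader functions are illustrative assumptions, not taken from the slides, and the end-dating logic is one plausible reading of "LEDTS = End of lifecycle for superseded record".

```python
import sqlite3

# Illustrative schema only: table names and business-key columns are assumptions.
DDL = """
CREATE TABLE hub_customer (
    sqn      INTEGER PRIMARY KEY,   -- SQN: sequence (insertion order)
    cust_num TEXT NOT NULL UNIQUE,  -- business key, stored once
    ldts     TEXT NOT NULL,         -- LDTS: when the warehouse first saw the key
    rsrc     TEXT NOT NULL          -- RSRC: system + app where the data originated
);
CREATE TABLE hub_product (
    sqn      INTEGER PRIMARY KEY,
    prod_num TEXT NOT NULL UNIQUE,
    ldts     TEXT NOT NULL,
    rsrc     TEXT NOT NULL
);
CREATE TABLE lnk_cust_prod (        -- association between two hubs
    sqn      INTEGER PRIMARY KEY,
    cust_sqn INTEGER NOT NULL REFERENCES hub_customer(sqn),
    prod_sqn INTEGER NOT NULL REFERENCES hub_product(sqn),
    ldts     TEXT NOT NULL,
    rsrc     TEXT NOT NULL,
    UNIQUE (cust_sqn, prod_sqn)
);
CREATE TABLE sat_customer_name (    -- descriptive data over time
    sqn   INTEGER NOT NULL REFERENCES hub_customer(sqn),  -- parent identity number
    ldts  TEXT NOT NULL,
    ledts TEXT,                     -- LEDTS: NULL while the row is current
    rsrc  TEXT NOT NULL,
    name  TEXT,
    PRIMARY KEY (sqn, ldts)
);
"""


def load_hub_customer(conn, cust_num, ldts, rsrc):
    """Insert the business key only if it is new; return its sequence."""
    conn.execute(
        "INSERT OR IGNORE INTO hub_customer (cust_num, ldts, rsrc) VALUES (?, ?, ?)",
        (cust_num, ldts, rsrc),
    )
    row = conn.execute(
        "SELECT sqn FROM hub_customer WHERE cust_num = ?", (cust_num,)
    ).fetchone()
    return row[0]


def load_sat_customer_name(conn, sqn, ldts, rsrc, name):
    """Add a satellite row when the value changed, end-dating the old one."""
    current = conn.execute(
        "SELECT name FROM sat_customer_name WHERE sqn = ? AND ledts IS NULL", (sqn,)
    ).fetchone()
    if current is not None and current[0] == name:
        return  # no change, nothing to record
    conn.execute(
        "UPDATE sat_customer_name SET ledts = ? WHERE sqn = ? AND ledts IS NULL",
        (ldts, sqn),
    )
    conn.execute(
        "INSERT INTO sat_customer_name (sqn, ldts, ledts, rsrc, name) "
        "VALUES (?, ?, NULL, ?, ?)",
        (sqn, ldts, rsrc, name),
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    sqn = load_hub_customer(conn, "C-1", "2000-10-14", "SALES")
    load_sat_customer_name(conn, sqn, "2000-10-14", "SALES", "Dan L")
    load_sat_customer_name(conn, sqn, "2000-11-01", "SALES", "Dan Linedt")
    print(conn.execute("SELECT * FROM sat_customer_name ORDER BY ldts").fetchall())
```

Because the link table holds one row per pair of hub sequences, it can store one-to-many and many-to-many relationships without a schema change, which is the point slides 60 and 61 make about links.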

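The PIT table example on slide 65 can be reproduced with a short sketch: for each snapshot date, take the most recent load date in each satellite that falls on or before the snapshot, or NULL (None here) when the satellite has no row yet. The function and variable names are assumptions for illustration; the dates are the ones listed on the slide.

```python
from datetime import date


def d(mdy: str) -> date:
    """Parse the slide's MM-DD-YYYY date format."""
    month, day, year = (int(part) for part in mdy.split("-"))
    return date(year, month, day)


# Satellite load dates for customer SQN 1, as listed on slide 65.
SAT_LOAD_DATES = {
    "SAT_NAME_LDTS": [d("10-14-2000"), d("11-01-2000"), d("12-31-2000")],
    "SAT_CELL_LDTS": [
        d("10-14-2000"), d("10-15-2000"), d("10-16-2000"),
        d("10-17-2000"), d("10-18-2000"),
    ],
    "SAT_ADDR_LDTS": [
        d("08-01-2000"), d("09-29-2000"), d("12-17-2000"), d("01-01-2001"),
    ],
}


def latest_on_or_before(load_dates, snapshot):
    """Return the newest load date not after the snapshot, or None."""
    candidates = [ld for ld in load_dates if ld <= snapshot]
    return max(candidates) if candidates else None


def build_pit_row(sqn, snapshot, sat_load_dates=SAT_LOAD_DATES):
    """Build one PIT row: the parent sequence, the snapshot date, and one
    load-date pointer per satellite."""
    row = {"SQN": sqn, "LOAD_DTS": snapshot}
    for column, load_dates in sat_load_dates.items():
        row[column] = latest_on_or_before(load_dates, snapshot)
    return row


if __name__ == "__main__":
    for snap in ["08-01-2000", "09-01-2000", "10-01-2000",
                 "11-01-2000", "12-01-2000", "01-01-2001"]:
        print(build_pit_row(1, d(snap)))
```

With the pointers precomputed, a query can join each satellite on the parent sequence and the stored load date rather than searching each satellite for its latest row as of the snapshot.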
Editor's Notes

  1. You’re not the first, nor will you be the last one to use it. Some of the world’s biggest companies are implementing Data Vaults. From Daimler Motors to Lockheed Martin, to the Department of Defense. JPMorgan and Chase used the Data Vault model to merge 3 companies in 90 days!
  2. Beginning: 5 advanced ETL developers. By the 1st month, they had 5 advanced and 15 basic/intro. By the 6th month, they had 5 advanced but 50 basic. By the end of the 8th month they went to production with 10 MF sources, and their team size was 12 people (5 advanced, 7 basic, for support).