Agile Data Warehouse Design with Big Data
John DiPietro & Jim Stagnitto
Agenda
• Introduction / a2c Overview
• Modeling for End Users
• Role of Dimensional Models in Big Data
• Example: eCommerce
• Structured Data: Sales
• Semi-structured Data: Clickstream
• Agile Dimensional Modeling Overview
• Case Study Review
• Q&A
Introduction
• a2c
• Boutique EDM (Enterprise Data Management) consultancy firm:
• Data Warehousing
• Master Data Management
• Closed Loop Analytics and Visualization
• Data & Application Architecture
• John DiPietro
• Principal, Chief Technology Officer
• Jim Stagnitto
• Data Warehouse & MDM Architect
a2c Corporate Overview & Industry Experience
Company Overview
• Technology Solution Consultancy headquartered in Philadelphia, with regional offices in New York and Boston
• Serving the Healthcare, Life Science, Telecom and Financial Services industries; we recently obtained our GSA schedule in order to pursue Federal Government opportunities
• Consultant base of over 2,500 proven IT professionals throughout the Northeast, with a recruiting network that provides national coverage
• Flexible approach to helping our clients with their initiatives:
• Project-based Solutions
• Staff Augmentation
• Managed Service Offerings – “On-Shore QA, Development & Application Support”
• Executive & Professional Search
Competitive Advantage
• Founders of a2c were part of the fastest-growing privately held IT consulting and staff augmentation firm in the US from 1994 to 2002. Our Executive Management Team has over 100 years of collective experience and has been responsible for delivering over half a billion dollars of IT consulting and staff augmentation revenue from 1994 to the present day.
• a2c’s Recruiting Engine and Methodology is one of the best in the industry, capable of producing quality results on demand for our clients
• Resource Managers continually “silo” disciplines with available candidates who have proven their abilities with us over the last 10 years
• Our solutions organization is closely involved in the screening and selection process to ensure that candidates submitted to our clients are an ideal match
• a2c’s culture enables us to attract and retain the best talent in the industry, and fosters creativity, integrity, growth and teamwork
• a2c offers clients an alternative to a “Big 4” consultancy, at substantial savings, for projects between $500K and $5M, thanks to our flexibility, agility and focus
Representative Clients
03/19/12
a2c Solution Engagement Structures
• Technology Strategy & Roadmap Formulation
• Needs & Readiness Assessment
• Package & Platform Selections
• Proof of Concept Implementation
• Requirements Discovery & Specifications
• Program/Project Management
• Full Life Cycle & Application Development
• Infrastructure & Facilities Initiatives
• Managed Services & Maintenance Support
a2c Solutions Capabilities
• Enterprise Data Management Practice helps clients manage their complete information lifecycle, from their online transactional systems to their data warehousing, enterprise reporting, data migration, and backup and recovery strategies (See Slide 7)
• Business Architecture & Optimization Practice uses “Six Sigma Lean” methodologies to analyze, re-engineer and automate our clients’ business processes, leveraging human workflow and business rules engine technologies to create efficiencies and to give business unit owners the metrics they need to continually improve performance
• Program Management Office oversees all aspects of solutions planning and delivery across client engagement teams, and provides methodology and frameworks based on PMI® industry standards
• Application Development & Managed Services Practice helps clients architect, implement and deploy the latest Microsoft and Enterprise Java-based applications, built on proven frameworks and architectures for the enterprise
• a2c's SDLC Delivery Model comprises over 20 years of collective best practices and industry-proven methodologies that allow our delivery teams to rapidly design, develop and implement solutions. Our SDLC model is designed to complement our project management methodology, using iterative development cycles that enable project teams to provide consistently high-quality, on-time deliverables, regardless of technology platform
Agile DW Design Overview
Modeling for End Users
• How do we design to answer business questions?
• Think about how questions are articulated
• And how the answers should be delivered
• Identify a common question framework
• Design an architecture that embraces and leverages this common question framework
• Use the best designs and technologies to:
• (a) derive the answers
• (b) present them in compelling ways that lead to the next interesting question!
How Do We Ask Questions?
“How do this quarter’s sales by sales rep of electronic products that we promoted to retail customers in the east compare with last year’s?”
(Who: sales rep, retail customers; What: sales, electronic products; When: this quarter, last year; Where: the east; Why: promoted)
How Do We Ask Questions?
• Events / Transactions
• e.g. Sale
• an immutable “fact” that occurs at a point in time and (typically) in a place
• Interrogatives:
• Who, What, When, Where, Why
• Descriptive context that fully describes the event
• a set of “dimensions” that describe events
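To make the split between events and interrogatives concrete, here is a minimal Python sketch of one sale recorded as an immutable fact that carries its 7W context plus its "how many" measures. The `SaleEvent` class, its field names and the sample values are illustrative assumptions, not a schema from the slides; a physical dimensional model would normally store surrogate keys into dimension tables rather than raw strings.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical event record: frozen=True makes each recorded event immutable.
@dataclass(frozen=True)
class SaleEvent:
    who: str         # customer or sales rep
    what: str        # product
    when: date       # date the sale occurred
    where: str       # store or region
    why: str         # promotion (or "No Promotion")
    how: str         # channel, e.g. "Web"
    quantity: int    # measure: how many
    revenue: float   # measure: how much

DIMENSION_FIELDS = ("who", "what", "when", "where", "why", "how")

def dimensions(event: SaleEvent) -> dict:
    """Return the descriptive context (the dimensions) of an event."""
    return {name: getattr(event, name) for name in DIMENSION_FIELDS}

def measures(event: SaleEvent) -> dict:
    """Return the numeric facts of an event."""
    return {"quantity": event.quantity, "revenue": event.revenue}

sale = SaleEvent("Retail customer 42", "Laptop", date(2012, 1, 15),
                 "East", "Spring Promo", "Web", 2, 1799.98)
print(dimensions(sale)["where"])  # East
print(measures(sale))             # {'quantity': 2, 'revenue': 1799.98}
```

Keeping measures and context in separate groups mirrors the fact/dimension split: the measures are what get summed, and the context is what questions filter and group by.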
Dimensional Value Proposition
• It makes sense to present answers to people using the same taxonomy of events and interrogatives (aka facts and dimensions: dimensional structure) that they use when forming questions
• Events are instances of processes:
• It’s best to present information in dimensional form to the people who will ask the system questions
• This is true regardless of the type of information being interrogated, its source, or IT stuff (like the database technologies used)
• It’s best to model this presentation layer based on the events (aka business processes) that underlie the questions
[Figure: the 7Ws: Who, What, When, Where, Why, How and How Many]
Scenarios
• A brief discussion of how and where dimensional modeling and/or databases fit within common and emerging “big data” data warehousing architectures
Kimball Dimensional DW
• Dimensional BI Semantic Layer
• Dimensional Data Warehouse
• Data Movement / Integration
• Source Data (Structured)
Kimball with Big Data
• Dimensional BI Semantic Layer
• Dimensional Data Warehouse
• Big Data Capture (e.g. HDFS)
• Big Data Discovery (e.g. MapReduce)
• Data Movement / Integration Tier over a Source Data Tier (Un/Semi-Structured)
• Data Movement / Integration Tier over a Source Data Tier (Structured)
Corporate Information Factory (CIF)
• Dimensional BI Semantic Layer
• Dimensional Tier (Virtual or Physical)
• Corporate Information Factory 3NF DW
• Data Movement / Integration
• Source Data (Structured)
CIF with Big Data
• Dimensional BI Semantic Layer
• Dimensional Tier (Virtual or Physical)
• Corporate Information Factory 3NF DW
• Big Data Capture (e.g. HDFS)
• Big Data Discovery (e.g. MapReduce)
• Data Movement / Integration Tier over a Source Data Tier (Un/Semi-Structured)
• Data Movement / Integration Tier over a Source Data Tier (Structured)
Data Vault
• Dimensional BI Semantic Layer
• Dimensional Tier (Virtual or Physical)
• Data Vault
• Data Movement / Integration
• Source Data (Structured)
Data Vault with Big Data
• Dimensional BI Semantic Layer
• Dimensional Tier (Virtual or Physical)
• Data Vault
• Big Data Capture (e.g. HDFS)
• Big Data Discovery (e.g. MapReduce)
• Data Movement / Integration Tier over a Source Data Tier (Un/Semi-Structured)
• Data Movement / Integration Tier over a Source Data Tier (Structured)
Etc.
Common Framework
• Dimensional BI Semantic Layer
• Dimensional Tier [Physical (Kimball) or Virtual (CIF or Data Vault)]
• Un/semi-structured path: Un/Semi-Structured Source Data, Un/Semi-Structured Data Movement, Persistent Un/Semi-Structured Staging Area, Unstructured -> Structured Data Discovery Processing
• Structured path: Structured Source Data, Structured Data Movement, Persistent Structured Data Repository (not needed for Kimball)
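The "Unstructured -> Structured Data Discovery Processing" step (Big Data Discovery, e.g. MapReduce, in the architecture diagrams) can be pictured as a map/reduce job that parses raw records into structured, countable rows. The sketch below simulates the map and reduce phases in plain Python on one machine; the raw line format (`<ISO timestamp> <user id> <URL>`) and the daily page-view count output are hypothetical, and a production job would run on a cluster framework rather than a list in memory.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw records: "<ISO timestamp> <user id> <requested URL>".
RAW_LOG = [
    "2012-03-19T10:00:05 u1 /home",
    "2012-03-19T10:01:10 u1 /product/42",
    "garbage line that does not parse",
    "2012-03-19T11:15:00 u2 /home",
]

def map_line(line):
    """Map phase: parse one raw line into [((day, url), 1)]; skip lines that do not parse."""
    parts = line.split()
    if len(parts) != 3:
        return []
    try:
        ts = datetime.fromisoformat(parts[0])
    except ValueError:
        return []
    return [((ts.date().isoformat(), parts[2]), 1)]

def reduce_pairs(pairs):
    """Reduce phase: sum the counts per key, yielding structured (day, url, views) rows."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return [(day, url, views) for (day, url), views in sorted(totals.items())]

mapped = [pair for line in RAW_LOG for pair in map_line(line)]
for row in reduce_pairs(mapped):
    print(row)
# ('2012-03-19', '/home', 2)
# ('2012-03-19', '/product/42', 1)
```

Unparseable input is dropped rather than failing the job; in practice such rows would more likely be routed to the persistent staging area for later inspection.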
Common Framework: Kitchen and Dining Room
(the Common Framework tiers, plus Insight Generation / Data Mining)
Kitchen
• Off limits to end users
• Data professionals only, please
• Dangerous / inhospitable environment
• Data assets “not ready for primetime”
• Structured variably for data processing
Dining Room
• Readily accessible to end users (and BI developers)
• Safe, hospitable environment
• Data assets “ready for primetime”
• Dimensionally structured
eCommerce Example: Clickstream Data and eCommerce Sale
eCommerce Example: Clickstream
Raw Clickstream Data
[Sample: about 30 rows of unlabeled numeric identifiers, one record per line, e.g. “25 52 164 240 274 328 368 448 538 561 630 687 730 775 825 834”]
Semi-Structured
• A recording of every page request made by a user
• Includes some structural elements, such as when the request was made and who the user is
• Requires significant prep work to fit into a traditional row-based relational database
• Apples and oranges: pre-sessionized page visits, detailed product views, catalogue requests, shopping cart adds / deletes / abandons, etc.
• Needs to be converted into separate-but-relatable dimensional facts, with many shared (conformed) dimensions
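One common piece of that prep work is sessionization: grouping each user's page requests into visits so that they can land in a page-visit fact with session, view start/end and previous/next page context. The sketch below assumes a 30-minute inactivity timeout (a widespread convention, not something the slides specify), and the `PageRequest` shape and output column names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumption: a session ends after 30 minutes without a request.
SESSION_TIMEOUT = timedelta(minutes=30)

@dataclass
class PageRequest:  # one semi-structured clickstream record (hypothetical shape)
    user_id: str
    ts: datetime
    url: str

def sessionize(requests):
    """Turn raw page requests into page-visit fact rows with session and previous/next page context."""
    by_user = {}
    for req in sorted(requests, key=lambda r: (r.user_id, r.ts)):
        by_user.setdefault(req.user_id, []).append(req)

    rows = []
    for user, reqs in by_user.items():
        sessions = [[reqs[0]]]
        for prev, cur in zip(reqs, reqs[1:]):
            if cur.ts - prev.ts > SESSION_TIMEOUT:  # inactivity gap: start a new session
                sessions.append([])
            sessions[-1].append(cur)
        for session_no, session in enumerate(sessions, start=1):
            for i, req in enumerate(session):
                nxt = session[i + 1] if i + 1 < len(session) else None
                rows.append({
                    "user": user,
                    "session": session_no,
                    "page": req.url,
                    "view_start": req.ts,
                    # a view ends at the next request; unknown for the last page of a session
                    "view_end": nxt.ts if nxt else None,
                    "previous_page": session[i - 1].url if i > 0 else None,
                    "next_page": nxt.url if nxt else None,
                })
    return rows

t0 = datetime(2012, 3, 19, 10, 0)
raw = [
    PageRequest("u1", t0, "/home"),
    PageRequest("u1", t0 + timedelta(minutes=2), "/product/42"),
    PageRequest("u1", t0 + timedelta(hours=2), "/home"),  # after a long gap
]
rows = sessionize(raw)
print([(r["session"], r["page"], r["next_page"]) for r in rows])
# [(1, '/home', '/product/42'), (1, '/product/42', None), (2, '/home', None)]
```

The same sorted, sessionized stream could then be split by activity type (product views, cart adds, and so on) into the separate-but-relatable facts described above.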
Typical Clickstream “Page View” Dimensional Model
[Diagram: a page view fact table with What, Why, Who and When dimensions]
eCommerce Example: Web Sales
• Fully Structured
• The Sale Transaction typically carries all fundamental dimensions:
• Time
• Customer
• Referring URL / Search Phrase
• Product
• Purchase and/or Shipment (Geo or URL) Locations
• Promotion / Campaign
• Etc.
• And “How Many” Measures
• Unit and Price Quantities / Amounts
• Discount Amounts
• Etc.
eCommerce Dimensionality
Facts (rows) & Dimensions (columns). Dimensions: Time (When), Customer (Who), Web Page (Where), Product (What), Referring URL (Where), Promotion / Campaign (Why), Activity Type (How)
• Page Visit: Time = View Start, View End, Session Start, Session End; Customer = Visitor; Web Page = Current, Previous, Next; other dimensions: ✔
• Detailed Product View: Time = View Start, View End, Session Start, Session End; Customer = Prospect; Web Page = Current, Previous, Next; other dimensions: ✔ ✔
• Shopping Cart Activity: Time = Activity Start, Activity End; Customer = Prospect; other dimensions: ✔ ✔ ✔ ✔
• Sale (Checkout): Time = Sale Start, Sale End; Customer = Customer; other dimensions: ✔ ✔ ✔ ✔
• Shipment / Delivery: Time = Shipment, Delivery; Customer = Customer, Delivery Recipient; other dimensions: ✔
Agile DW Design Overview
The first dimensional modeler: R.K.
Ralph Kimball? Rudyard Kipling
–Rudyard Kipling
“I keep six honest serving-men
(They taught me all I knew);
Their names are What and Why and When
And How and Where and Who…”
Who
What
When
Where
Why
How
How Many
The 7Ws Framework: Who, What, When, Where, Why, How, How Many
DW Architectures: A Brief History
How did we get here?
• Corporate Information Factory: Data-Driven Analysis
• Undisciplined Dimensional: Report-Driven Analysis
• Dimensional Bus Architecture: Process-Driven Analysis
7Ws Dimensional Model
• How (Facts): Much, Many, Often (£ $ €)
• Where: Location, Geographic, Store, Ship To, Hospital
• Who: Customer, Employee, Third Party, Organization
• What: Product, Service, Transactions
• When: Time, Day, Month, Fiscal Period
• Why: Causal, Promotion, Reason, Weather, Competition
• How: ??
BEAM✲: Business Event Analysis & Modeling
How do you design a data warehouse?
Tech Design Artifacts?
CALENDAR: Date Key; Date, Day, Day in Week, Day in Month, Day in Qtr, Day in Year, Month, Qtr, Year, Weekday Flag, Holiday Flag
PRODUCT: Product Key; Product Code, Product Description, Product Type, Brand, Subcategory, Category
PROMOTION: Promotion Key; Promotion Code, Promotion Name, Promotion Type, Discount Type, Ad Type
SALES FACT: Date Key, Product Key, Store Key, Promotion Key; Quantity Sold, Revenue, Cost, Basket Count
STORE: Store Key; Store Code, Store Name, URL, Store Manager, Region, Country
OK, now validate with...
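A star schema like the one above can be instantiated and queried in a few lines, which is one cheap way to get something concrete in front of stakeholders. The sketch below builds a cut-down version of the schema in an in-memory SQLite database with Python's standard `sqlite3` module and answers a question in the style of the earlier sales example (promoted electronics revenue in the East, by year). The column subsets, snake_case names, sample rows and query are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE calendar  (date_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT, qtr TEXT, year INTEGER);
CREATE TABLE product   (product_key INTEGER PRIMARY KEY, product_code TEXT, category TEXT);
CREATE TABLE store     (store_key INTEGER PRIMARY KEY, store_name TEXT, region TEXT);
CREATE TABLE promotion (promotion_key INTEGER PRIMARY KEY, promotion_name TEXT);
-- Fact table: foreign keys to each dimension plus the additive measures.
CREATE TABLE sales_fact (
    date_key      INTEGER REFERENCES calendar(date_key),
    product_key   INTEGER REFERENCES product(product_key),
    store_key     INTEGER REFERENCES store(store_key),
    promotion_key INTEGER REFERENCES promotion(promotion_key),
    quantity_sold INTEGER, revenue REAL, cost REAL, basket_count INTEGER
);
INSERT INTO calendar  VALUES (20110115, '2011-01-15', '2011-01', '2011-Q1', 2011),
                             (20120115, '2012-01-15', '2012-01', '2012-Q1', 2012);
INSERT INTO product   VALUES (1, 'TV-01', 'Electronics'), (2, 'TS-01', 'Apparel');
INSERT INTO store     VALUES (1, 'Boston', 'East'), (2, 'Denver', 'West');
INSERT INTO promotion VALUES (0, 'No Promotion'), (1, 'Spring Sale');
INSERT INTO sales_fact VALUES
    (20110115, 1, 1, 1, 2,  800.0, 600.0, 2),
    (20120115, 1, 1, 1, 3, 1200.0, 900.0, 3),
    (20120115, 2, 2, 0, 5,  100.0,  40.0, 4);
""")

# Constrain and group the facts by dimension attributes (what, where, why, when).
query = """
SELECT c.year, SUM(f.revenue) AS revenue
FROM sales_fact f
JOIN calendar  c ON f.date_key = c.date_key
JOIN product   p ON f.product_key = p.product_key
JOIN store     s ON f.store_key = s.store_key
JOIN promotion m ON f.promotion_key = m.promotion_key
WHERE p.category = 'Electronics' AND s.region = 'East'
  AND m.promotion_name <> 'No Promotion'
GROUP BY c.year
ORDER BY c.year
"""
print(conn.execute(query).fetchall())  # [(2011, 800.0), (2012, 1200.0)]
```

Every question of this shape follows the same join-filter-group pattern against the fact table, which is much of why the dimensional form suits end users and BI tools.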
Why Agile Data Warehousing?
Waterfall BI/DW
• Analysis → Design → Development → Test → Release, with limited stakeholder interaction
• Stakeholder input, requirements, data model, ETL and BI all designed up front (BDUF: big design up front); VALUE? only at the end, from this year into next year
Agile DW/BI Development
• Iteration 1, Iteration 2, Iteration 3, … Iteration n, with stakeholder interaction throughout
• Each iteration covers ADM, ETL, BI prototyping, review and release, with just enough design up front (JEDUF)
• Each iteration delivers VALUE!, starting this year
State of the DW Field
Solid:
• Dimensional data warehouse design is mature
• Proven design patterns exist for common requirements
Hit or Miss:
• Collecting unambiguous and thorough requirements
• Slotting requirements into proven design patterns
• End-user ownership and validation
Too often: snatching defeat from the jaws of victory
Quick, Inclusive, Interactive, Fun Modelstorming
Data Modeler and BI Stakeholders: a structured, non-technical, collaborative working conversation directly with BI users
• BI users’ business process, organizational, hierarchical and data knowledge
• Focused data profiling
• Logical and physical (Kimball-esque) dimensional data models
• Example data
• Detailed and testable ETL specification
• Instantiated DW prototype
BEAM✲ Methodology
Data Modeler and BI Stakeholders: Requirements = Design
Collaboration at Every Step
Agile Data Modeling Requirements
• Techniques for encouraging interaction
• Must use simple, inclusive notation and tools
• Must be quick: hours rather than days – modelstorming
• Balance ‘just in time’ (JIT) and ‘just enough design up
front’ (JEDUF) to reduce design rework
• DW designers must embrace data model change, allow models
to evolve, avoid generic data models; need design patterns they
can trust to represent tomorrow’s BI requirements tomorrow
• ETL and BI developers must embrace database change; need
tool support
What kind of model?
Modeling by Abstraction vs. Modeling by Example, illustrated with:
• the star schema shown earlier (CALENDAR, PRODUCT, PROMOTION and STORE dimensions around SALES FACT)
• a hierarchy chart around Sales Fact: Customer, Country, Customer Type, Product Type, Category, Product, Month, Calendar, Holiday Type, Store Type, Store, City
Agile DW Design Process
Who does what?
Subject-Verb-Object: “Customers buy products”
BEAM✲ Modeler and BI Users: Collaborative / Conversational Design
Design Using Natural Language
• Verbs – Events – Relationships – Fact Tables
• Nouns – Details – Entities – Dimensions
• Main Clause – Subject-Verb-Object
• Prepositions – connect additional details to the
main clause
• Interrogatives – The 7Ws – Dimension Types
• Business Vocabulary - no IT-Speak
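The natural-language mapping above (verbs to events and fact tables, nouns to dimensions, interrogatives to dimension types, quantities to measures) can be sketched as a tiny function that turns a subject-verb-object story into a starter event-table spec. BEAM✲ itself is a collaborative whiteboard technique, not a code library: `event_table`, its arguments, and the assumption that the subject answers "who" and the object answers "what" are all hypothetical simplifications.

```python
# The 7Ws, used here as the allowed detail types.
SEVEN_WS = ("who", "what", "when", "where", "how many", "why", "how")

def event_table(subject, verb, obj, details):
    """Sketch a starter event-table spec from a subject-verb-object story.

    subject, obj: nouns in the main clause (assumed to answer who / what).
    details: (noun, interrogative) pairs that prepositions attach to the
             main clause, e.g. ("Order Date", "when") for "... on Order Date".
    """
    columns = [(subject, "who"), (obj, "what"), *details]
    dims, facts = {}, []
    for name, w in columns:
        if w not in SEVEN_WS:
            raise ValueError(f"{name!r}: {w!r} is not one of the 7Ws")
        if w == "how many":
            facts.append(name)                   # quantities become measures
        else:
            dims.setdefault(w, []).append(name)  # nouns become dimensions, typed by W
    return {"event": f"{subject} {verb} {obj}", "dimensions": dims, "facts": facts}

spec = event_table("Customer", "buys", "Product",
                   [("Order Date", "when"), ("Store", "where"),
                    ("Quantity", "how many"), ("Revenue", "how many"),
                    ("Promotion", "why")])
print(spec["event"])       # Customer buys Product
print(spec["facts"])       # ['Quantity', 'Revenue']
print(spec["dimensions"])  # {'who': ['Customer'], 'what': ['Product'], 'when': ['Order Date'], 'where': ['Store'], 'why': ['Promotion']}
```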
“Spreadsheet”-like Models
• Event Table Name (filled in later)
• Subject Column Name, Verb, Object Column Name
• Details, each tagged with an Interrogative
• Example Data (4-6 rows)
Straightforward Methodology
Work through the 7Ws: Who, What, When, Where, How (many), Why, How
• Declare Event Type
• Subject-Verb-Object
• Quantities - Facts
• Sufficient Detail
• Fact Granularity
• Initial Data Examples
Capture Example Data
• Engage business users
• Clarify definitions / Conform Dimensions
• Illustrate exceptions
• Drive out uniqueness
• “Show and tell”
Thoughtful Example Data
Event story template: SUBJECT verb OBJECT on/at/every EVENT DATE
For each detail column ([who] [what] [when] [where] [how many] [why] [how]), supply example rows that are:
• Typical (Typical/Popular for what, Typical/Average for how many, Typical/Normal for why and how)
• Different
• Repeat
• Missing
• Group: Group; Multiple/Bundle; Multi-Level; Multiple Values
• Old / Low: Old, Low; Old, Low Value; Oldest needed; Near; Min, Negative, 0
• New / High: New, High; New, High; Most Recent, Future; Far; Max, Precision; Exceptional; Exceptional
Detailed ETL Specification
Identify Event Type Early
Adjust Conversation Based on Event Type
• Discrete Event - Transaction
• Instantaneous/short duration, irregularly occurring events or
transactions
• Recurring Event - Periodic Snapshot – measurement
• Regularly occurring events or ongoing processes; typically used to measure the cumulative effect of discrete events
• Evolving Event - Accumulating Snapshot – timeline
• Non-instantaneous/longer duration, irregularly occurring events or
transactions
• Represents current status - reflects adjustments
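The practical difference between a discrete event and an evolving event shows up in how rows are maintained: a transaction fact only ever gains new rows, while an accumulating snapshot keeps one row per process instance and updates it as each milestone happens. The sketch below models one order's accumulating snapshot row in Python; the milestone names (`shipped`, `delivered`), the status values and the lag measure are hypothetical examples.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class OrderSnapshot:
    """One accumulating snapshot row: updated in place as the order evolves."""
    order_id: str
    order_date: date
    ship_date: Optional[date] = None
    delivery_date: Optional[date] = None
    status: str = "Ordered"

    def record(self, milestone: str, on: date) -> None:
        """Apply a milestone so the row reflects the order's current status."""
        if milestone == "shipped":
            self.ship_date, self.status = on, "Shipped"
        elif milestone == "delivered":
            self.delivery_date, self.status = on, "Delivered"
        else:
            raise ValueError(f"unknown milestone: {milestone}")

    def days_to_deliver(self) -> Optional[int]:
        """Lag measure between milestones; None until the order is delivered."""
        if self.delivery_date is None:
            return None
        return (self.delivery_date - self.order_date).days

snap = OrderSnapshot("A-1001", date(2012, 3, 1))
snap.record("shipped", date(2012, 3, 2))
snap.record("delivered", date(2012, 3, 5))
print(snap.status, snap.days_to_deliver())  # Delivered 4
```

A discrete-event (transaction) fact for the same order would instead append separate order, shipment and delivery rows and never modify them; a recurring-event (periodic snapshot) fact would write one row per order per period.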
Capture When Details
BEAM✲ Modeler: “When do Customers order Products?”
BI Users: “On the Order Date.”
BEAM✲ Modeler: “Any other Whens?” “Any other Whos?”
And so on...
Model How Many Measures
• Additive – can be summed over any combination of dimensions; no special rules
• Non-additive – cannot be summed over any dimension, e.g. unit price or temperature
• Must be aggregated in other ways, e.g. average, min, max
• Degenerate Dimensions – transaction #, timestamps, flags
• Semi-additive – cannot be summed across at least one dimension, e.g. balances cannot be summed over time
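The three measure types can be illustrated with a few hypothetical rows: quantity sold is additive, an account balance is semi-additive (it can be summed across accounts on one day, but not across days), and unit price is non-additive (it has to be averaged, here weighted by quantity). All data and variable names below are made up for the illustration.

```python
from statistics import mean

# Hypothetical daily account-balance snapshot rows: (day, account, balance).
balances = [
    ("2012-03-01", "A", 100.0), ("2012-03-01", "B", 50.0),
    ("2012-03-02", "A", 120.0), ("2012-03-02", "B", 30.0),
]
# Hypothetical sales transaction rows: (day, product, quantity, unit_price).
sales = [("2012-03-01", "P1", 2, 10.0), ("2012-03-02", "P1", 3, 12.0)]

# Additive: quantity can be summed across any dimension.
total_qty = sum(qty for _, _, qty, _ in sales)

# Semi-additive: balances can be summed across accounts for a single day...
total_on_day = sum(bal for day, _, bal in balances if day == "2012-03-02")
# ...but summing them across days double-counts and is meaningless.
naive_sum_over_time = sum(bal for _, _, bal in balances)
# Over time, use the period-end balance or the average of the daily totals.
days = sorted({day for day, _, _ in balances})
closing = {acct: bal for day, acct, bal in balances if day == days[-1]}
avg_daily_total = mean(sum(bal for d, _, bal in balances if d == day) for day in days)

# Non-additive: unit price is never summed; average it, weighted by quantity.
weighted_price = sum(qty * price for _, _, qty, price in sales) / total_qty

print(total_qty, total_on_day, naive_sum_over_time, closing, avg_daily_total, round(weighted_price, 2))
# 5 150.0 300.0 {'A': 120.0, 'B': 30.0} 150.0 11.2
```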
Modeling Dimensions
Annotate with Targeted Data Profiling
Proceed Through the Business Process Value Chain
Collaborative Dimension Conformance
• Dimensions: Time, Shipper, Customer, Plant, Response, Product, Promotion
• Events: Sales, Campaigns
Identify Hierarchy Types
• Balanced
• Complex
• Simple
• Ragged
• Variable Depth
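Ragged and variable-depth hierarchies cannot be flattened into a fixed set of level columns, so one common Kimball-style approach is a bridge (closure) table holding every ancestor/descendant pair with its depth. The sketch below builds such rows from a child-to-parent map and uses them to roll leaf-level facts up to any node; the org structure and revenue figures are invented for the example.

```python
# Hypothetical ragged org hierarchy: child -> parent (None marks the root).
parent_of = {
    "Sales": None,
    "East": "Sales",
    "West": "Sales",
    "Online": "Sales",   # leaf at depth 1, so the hierarchy is ragged
    "Boston": "East",
    "New York": "East",
    "Denver": "West",
}

def bridge_rows(parent_of):
    """Return sorted (ancestor, descendant, depth) rows; each node is its own ancestor at depth 0."""
    rows = []
    for node in parent_of:
        ancestor, depth, seen = node, 0, set()
        while ancestor is not None:
            if ancestor in seen:
                raise ValueError(f"cycle detected at {ancestor!r}")
            seen.add(ancestor)
            rows.append((ancestor, node, depth))
            ancestor, depth = parent_of[ancestor], depth + 1
    return sorted(rows)

bridge = bridge_rows(parent_of)

# Hypothetical revenue facts recorded at whichever level a unit sits.
revenue = {"Boston": 100.0, "New York": 80.0, "Denver": 40.0, "Online": 30.0}

def rollup(ancestor):
    """Total revenue of every descendant of `ancestor`, whatever its depth."""
    return sum(revenue.get(desc, 0.0) for anc, desc, _ in bridge if anc == ancestor)

print(rollup("East"), rollup("Sales"))  # 180.0 250.0
```

In a warehouse the same bridge would be a table joined between the dimension and the fact table, so BI tools can roll up a ragged hierarchy with an ordinary join.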
Graphically Depict Hierarchies
Visualize The Hierarchies
Paint The Organization
Prototype! Not “Data Model Review”
Recap
• Collaborative and Agile
• Data Modeling
• Data Sourcing
• Data Conformance
• Requirements = Design
• Slots directly into proven and mature dimensional data warehousing
design patterns
• Validation through Prototyping
• Semi-automated build of dimensional data warehouse
• Perfect complement to Agile BI Tools and Methods (e.g. Pentaho)
If you have been affected by any of the issues raised in this presentation:
Agile Data Warehouse Design, Lawrence Corr, Jim Stagnitto, DecisionOne Press, November 2011
Questions / Comments

Product Design Trends in 2024 | Teenage Engineerings
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 

Agile Data Warehouse Design for Big Data Presentation

  • 1. Agile Data Warehouse Design with Big Data: John DiPietro & Jim Stagnitto
  • 2. Agenda • Introduction / a2c Overview • Modeling for End Users • Role of Dimensional Models in Big Data • Example: eCommerce • Structured Data: Sales • Semi-structured Data: Clickstream • Agile Dimensional Modeling Overview • Case Study Review • Q&A
  • 3. Introduction • a2c • Boutique EDM (Enterprise Data Management) consultancy firm: • Data Warehousing • Master Data Management • Closed Loop Analytics and Visualization • Data & Application Architecture • John DiPietro • Principal, Chief Technology Officer • Jim Stagnitto • Data Warehouse & MDM Architect
  • 4. a2c Corporate Overview & Industry Experience
  • 5. Company Overview • Technology solution consultancy headquartered in Philadelphia, with regional offices in New York and Boston • Serving the Healthcare, Life Science, Telecom and Financial Services industries; having recently obtained a GSA schedule, a2c is also pursuing Federal Government opportunities • Consultant base of over 2,500 proven IT professionals throughout the Northeast, with a recruiting network that provides national coverage • Flexible approach to helping our clients with their initiatives: • Project-based Solutions • Staff Augmentation • Managed Service Offerings – “On-Shore QA, Development & Application Support” • Executive & Professional Search
  • 6. Competitive Advantage • The founders of a2c were part of the fastest-growing privately held IT consulting and staff augmentation firm in the US from 1994 to 2002. Our Executive Management Team has over 100 years of collective experience and has been responsible for delivering over half a billion dollars of IT consulting and staff augmentation revenue from 1994 to the present day. • a2c’s Recruiting Engine and Methodology is one of the best in the industry, capable of producing quality results on demand for our clients • Resource Managers continually “Silo” disciplines with available candidates who have proven their abilities with us over the last 10 years • Our solutions organization is closely involved in the screening and selection process to ensure that candidates submitted to our clients are an ideal match • a2c’s culture attracts and retains the best talent in the industry and fosters creativity, integrity, growth and teamwork • Thanks to our flexibility, agility and focus, a2c offers clients an alternative to a “Big 4” consultancy, at substantial savings, for projects between $500K and $5M
  • 8. a2c Solution Engagement Structures • Technology Strategy & Roadmap Formulation • Needs & Readiness Assessment • Package & Platform Selections • Proof of Concept Implementation • Requirements Discovery & Specifications • Program/Project Management • Full Life Cycle & Application Development • Infrastructure & Facilities Initiatives • Managed Services & Maintenance Support
  • 9. a2c Solutions Capabilities • Enterprise Data Management Practice helps clients manage their complete information lifecycle, from their online transactional systems to their data warehousing, enterprise reporting, data migration, and backup and recovery strategies (see Slide 7) • Business Architecture & Optimization Practice uses “Six Sigma Lean” methodologies to analyze, re-engineer and automate our clients’ business processes, leveraging human workflow and business rules engine technologies to create efficiencies and to give business unit owners the metrics they need to continually improve performance • Program Management Office oversees all aspects of solutions planning and delivery across client engagement teams, and provides methodology and frameworks based on PMI® industry standards • Application Development & Managed Services Practice helps clients architect, implement and deploy the latest Microsoft and Enterprise Java applications, built on proven frameworks and architectures for the enterprise • a2c's SDLC Delivery Model draws on over 20 years of collective best practices and industry-proven methodologies that allow our delivery teams to rapidly design, develop and implement solutions. The model is designed to complement our project management methodology, using iterative development cycles that enable project teams to provide consistently high-quality, on-time deliverables regardless of technology platform
  • 11. Modeling for End Users • How to Design to Answer Business Questions? • Think about how questions are articulated • And how the answers should be delivered • Identify a common question framework • Design an architecture that embraces and leverages this common question framework • Utilize the best designs and technologies to: • (a) derive the answers • (b) present them in compelling ways that lead to the next interesting question!
  • 12. How Do We Ask Questions? “How do this quarter’s sales by sales rep of electronic products that we promoted to retail customers in the east compare with last year’s?” (phrases annotated What, Who, When, Where, Why)
  • 13. How Do We Ask Questions? • Events / Transactions • e.g. Sale • An immutable “fact” that occurs at a point in time and (typically) at a place • Interrogatives: • Who, What, When, Where, Why • Descriptive context that fully describes the event • A set of “dimensions” that describe events
  • 14. Dimensional Value Proposition • It makes sense to present answers to people using the same taxonomy of events and interrogatives (aka facts and dimensions: dimensional structure) that they use when forming questions • Events are instances of processes: • It’s best to present information to people who will ask the system questions in dimensional form • This is true regardless of the type of information being interrogated, its source, or IT stuff (like the database technologies utilized) • It’s best to model this presentation layer based on the events (aka business processes) that underlie the questions
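To make the facts-and-dimensions idea concrete, here is a minimal Python sketch, assuming a flattened fact row in which each field is either a dimension value or the single measure. The `SaleFact` fields, the quarter codes and the sample rows are hypothetical. The sketch only shows that the slide 12 question reduces to constraining some dimensions (what, who, where, why), grouping by another (who), and comparing across a third (when).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SaleFact:
    """One immutable sale event: dimension values plus a single measure (hypothetical fields)."""
    quarter: str        # When
    sales_rep: str      # Who
    category: str       # What
    promoted: bool      # Why
    customer_type: str  # Who
    region: str         # Where
    amount: float       # How much: the fact's measure


def sales_by_rep(facts, quarter):
    """Total amount per sales rep for promoted electronics sold to retail customers in the East."""
    totals = defaultdict(float)
    for f in facts:
        if (f.quarter == quarter and f.category == "Electronics" and f.promoted
                and f.customer_type == "Retail" and f.region == "East"):
            totals[f.sales_rep] += f.amount
    return dict(totals)


def compare_with_last_year(facts, this_quarter, same_quarter_last_year):
    """Pair this quarter's total with last year's for every rep appearing in either period."""
    now = sales_by_rep(facts, this_quarter)
    before = sales_by_rep(facts, same_quarter_last_year)
    return {rep: (now.get(rep, 0.0), before.get(rep, 0.0))
            for rep in sorted(set(now) | set(before))}


facts = [
    SaleFact("2024Q1", "Ann", "Electronics", True, "Retail", "East", 100.0),
    SaleFact("2024Q1", "Bob", "Electronics", True, "Retail", "East", 50.0),
    SaleFact("2023Q1", "Ann", "Electronics", True, "Retail", "East", 80.0),
    SaleFact("2024Q1", "Ann", "Furniture", True, "Retail", "East", 999.0),  # filtered out: not electronics
]
print(compare_with_last_year(facts, "2024Q1", "2023Q1"))  # {'Ann': (100.0, 80.0), 'Bob': (50.0, 0.0)}
```

Every phrase in the business question maps onto either a filter or a grouping over dimension values. That correspondence is why the slides argue for presenting answers in the same dimensional form.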
  • 16. Scenarios • A brief discussion of how and where dimensional modeling and/or databases fit within common and emerging “big data” data warehousing architectures
  • 17. Kimball Dimensional DW • Tiers, top to bottom: Dimensional BI Semantic Layer • Dimensional Data Warehouse • Data Movement / Integration • Source Data (Structured)
  • 18. Kimball with Big Data • Dimensional BI Semantic Layer • Dimensional Data Warehouse, fed by two paths: • Structured: Source Data Tier (Structured) → Data Movement / Integration Tier • Un/Semi-Structured: Source Data Tier (Un/Semi-Structured) → Big Data Capture (e.g. HDFS) → Big Data Discovery (e.g. MR) → Data Movement / Integration Tier
  • 19. Corporate Information Factory (CIF) • Tiers, top to bottom: Dimensional BI Semantic Layer • Dimensional Tier (Virtual or Physical) • Corporate Information Factory 3NF DW • Data Movement / Integration • Source Data (Structured)
  • 20. CIF with Big Data • Dimensional BI Semantic Layer • Dimensional Tier (Virtual or Physical) • Corporate Information Factory 3NF DW, fed by two paths: • Structured: Source Data Tier (Structured) → Data Movement / Integration Tier • Un/Semi-Structured: Source Data Tier (Un/Semi-Structured) → Big Data Capture (e.g. HDFS) → Big Data Discovery (e.g. MR) → Data Movement / Integration Tier
  • 21. Data Vault • Tiers, top to bottom: Dimensional BI Semantic Layer • Dimensional Tier (Virtual or Physical) • Data Vault • Data Movement / Integration • Source Data (Structured)
  • 22. Data Vault with Big Data • Dimensional BI Semantic Layer • Dimensional Tier (Virtual or Physical) • Data Vault, fed by two paths: • Structured: Source Data Tier (Structured) → Data Movement / Integration Tier • Un/Semi-Structured: Source Data Tier (Un/Semi-Structured) → Big Data Capture (e.g. HDFS) → Big Data Discovery (e.g. MR) → Data Movement / Integration Tier
  • 24. Common Framework • Dimensional BI Semantic Layer • Dimensional Tier [Physical (Kimball) or Virtual (CIF or Data Vault)] • Un/Semi-Structured path: Un/Semi-Structured Source Data → Un/Semi-Structured Data Movement → Persistent Un/Semi-Structured Staging Area → Unstructured-to-Structured Data Discovery Processing • Structured path: Structured Source Data → Structured Data Movement → Persistent Structured Data Repository (not needed for Kimball) • Insight Generation / Data Mining
  • 25. Kitchen: Off Limits to End Users; Data Professionals Only, Please • Dangerous / Inhospitable Environment • Data Assets “Not Ready for Primetime” • Structured Variably for Data Processing • Dining Room: Readily Accessible to End Users (and BI Developers) • Safe, Hospitable Environment • Data Assets “Ready for Primetime” • Dimensionally Structured • (Overlaid on the Common Framework diagram, with the eCommerce example's Clickstream Data and eCommerce Sale sources marked)
  • 26. eCommerce Example: Clickstream • [Raw clickstream data sample] • Semi-structured • Recording of every page request made by a user • Includes some structural elements, such as when the request was made and who the user is • Requires significant prep work in order to fit into a traditional row-based relational database • Apples and oranges: pre-sessionized page visits, detailed product views, catalogue requests, shopping cart adds / deletes / abandons, etc. • Needs to be converted into separate-but-relatable dimensional facts, with many shared (conformed) dimensions
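One piece of the "significant prep work" is turning raw page requests into page-view fact rows that carry session and previous/next-page context. The Python sketch below shows one way to do that. The record layout is hypothetical, and the 30-minute inactivity timeout is a common convention assumed here, not something stated on the slide.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

# Assumption: a gap of more than 30 minutes between requests starts a new session.
SESSION_TIMEOUT = timedelta(minutes=30)


@dataclass(frozen=True)
class Click:
    """One raw page request: who asked for which URL, and when (hypothetical layout)."""
    visitor_id: str
    ts: datetime
    url: str


@dataclass(frozen=True)
class PageViewFact:
    """One row of a hypothetical page-view fact table."""
    session_id: str
    visitor_id: str
    url: str
    view_start: datetime
    view_end: Optional[datetime]   # None when no later request in the session closes the view
    previous_url: Optional[str]
    next_url: Optional[str]


def sessionize(clicks: List[Click]) -> List[PageViewFact]:
    # Group requests per visitor in time order (dicts preserve insertion order).
    by_visitor = {}
    for c in sorted(clicks, key=lambda c: (c.visitor_id, c.ts)):
        by_visitor.setdefault(c.visitor_id, []).append(c)

    facts = []
    for visitor, seq in by_visitor.items():
        # Split each visitor's requests into sessions at gaps longer than the timeout.
        sessions = []
        for c in seq:
            if sessions and c.ts - sessions[-1][-1].ts <= SESSION_TIMEOUT:
                sessions[-1].append(c)
            else:
                sessions.append([c])
        # Emit one fact per request, with previous/next pages taken from the same session.
        for n, sess in enumerate(sessions, start=1):
            for i, c in enumerate(sess):
                nxt = sess[i + 1] if i + 1 < len(sess) else None
                facts.append(PageViewFact(
                    session_id=f"{visitor}-{n}",
                    visitor_id=visitor,
                    url=c.url,
                    view_start=c.ts,
                    view_end=nxt.ts if nxt is not None else None,
                    previous_url=sess[i - 1].url if i > 0 else None,
                    next_url=nxt.url if nxt is not None else None,
                ))
    return facts


clicks = [
    Click("v1", datetime(2024, 1, 1, 10, 0), "/home"),
    Click("v1", datetime(2024, 1, 1, 10, 5), "/product/42"),
    Click("v1", datetime(2024, 1, 1, 12, 0), "/home"),  # more than 30 minutes later: new session
]
for f in sessionize(clicks):
    print(f.session_id, f.url, f.previous_url, f.next_url)
```

Separating other event kinds (product views, cart adds/deletes) would follow the same pattern: classify each request, then emit rows into different fact tables that share the visitor, time and page dimensions.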
  • 27. Typical Clickstream “Page View” Dimensional Model (diagram annotated with What, Why, Who, When)
  • 28. eCommerce Example: Web Sales • Fully Structured • The Sale Transaction typically carries all fundamental dimensions: • Time • Customer • Referring URL / Search Phrase • Product • Purchase and/or Shipment (Geo or URL) Locations • Promotion / Campaign • Etc. • And “How Many” Measures • Unit and Price Quantities / Amounts • Discount Amounts • Etc.
  • 29. eCommerce Dimensionality: Facts (rows) & Dimensions (columns) • Dimensions: Time (When), Customer (Who), Web Page (Where), Product (What), Referring URL (Where), Promotion / Campaign (Why), Activity Type (How) • Page Visit: Time = View Start, View End, Session Start, Session End; Customer = Visitor; Web Page = Current, Previous, Next; ✔ • Detailed Product View: Time = View Start, View End, Session Start, Session End; Customer = Prospect; Web Page = Current, Previous, Next; ✔ ✔ • Shopping Cart Activity: Time = Activity Start, Activity End; Customer = Prospect; ✔ ✔ ✔ ✔ • Sale (Checkout): Time = Sale Start, Sale End; Customer = Customer; ✔ ✔ ✔ ✔ • Shipment / Delivery: Time = Shipment, Delivery; Customer = Customer, Delivery Recipient; ✔
  • 31. The first dimensional modeler, R.K.: Ralph Kimball? Rudyard Kipling!
  • 32. –Rudyard Kipling I keep six honest serving-men
 (They taught me all I knew);
 Their names are What and Why and When
 And How and Where and Who…
  • 42. How did we get here?
  • 43. DW Architectures: A Brief History • Corporate Information Factory: Data-Driven Analysis • Undisciplined Dimensional: Report-Driven Analysis • Dimensional Bus Architecture: Process-Driven Analysis
  • 44. 7Ws Dimensional Model • How (Facts): How Much, How Many, How Often (£ $ €) • Where: Location, Geographic, Store, Ship To, Hospital • Who: Customer, Employee, Third Party, Organization • What: Product, Service, Transactions • When: Time, Day, Month, Fiscal Period • Why: Causal, Promotion, Reason, Weather, Competition • ??
  • 46. How do you design a data warehouse?
  • 47. Tech Design Artifacts? (star schema diagram) • CALENDAR: Date Key, Date, Day, Day in Week, Day in Month, Day in Qtr, Day in Year, Month, Qtr, Year, Weekday Flag, Holiday Flag • PRODUCT: Product Key, Product Code, Product Description, Product Type, Brand, Subcategory, Category • PROMOTION: Promotion Key, Promotion Code, Promotion Name, Promotion Type, Discount Type, Ad Type • SALES FACT: Quantity Sold, Revenue, Cost, Basket Count, Date Key, Product Key, Store Key, Promotion Key • STORE: Store Key, Store Code, Store Name, URL, Store Manager, Region, Country
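The star schema on this slide can be sketched as runnable DDL using Python's built-in `sqlite3` module. This is an illustration, not the authors' design: only a subset of the listed attributes is included, the column types and sample rows are assumptions, and snake_case names stand in for the slide's labels.

```python
import sqlite3

# Subset of the slide's attributes; types and names are illustrative assumptions.
DDL = """
CREATE TABLE calendar (
    date_key INTEGER PRIMARY KEY, calendar_date TEXT, month TEXT, qtr TEXT, year INTEGER,
    weekday_flag INTEGER, holiday_flag INTEGER);
CREATE TABLE product (
    product_key INTEGER PRIMARY KEY, product_code TEXT, product_description TEXT,
    brand TEXT, subcategory TEXT, category TEXT);
CREATE TABLE store (
    store_key INTEGER PRIMARY KEY, store_code TEXT, store_name TEXT, region TEXT, country TEXT);
CREATE TABLE promotion (
    promotion_key INTEGER PRIMARY KEY, promotion_code TEXT, promotion_name TEXT, promotion_type TEXT);
-- The fact table holds the measures plus one foreign key per dimension.
CREATE TABLE sales_fact (
    date_key INTEGER REFERENCES calendar (date_key),
    product_key INTEGER REFERENCES product (product_key),
    store_key INTEGER REFERENCES store (store_key),
    promotion_key INTEGER REFERENCES promotion (promotion_key),
    quantity_sold INTEGER, revenue REAL, cost REAL, basket_count INTEGER);
"""


def build_demo():
    """Create the schema in memory and load a few hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    conn.execute("INSERT INTO calendar VALUES (20240105, '2024-01-05', 'Jan', 'Q1', 2024, 1, 0)")
    conn.executemany("INSERT INTO product VALUES (?, ?, ?, ?, ?, ?)", [
        (1, "P1", "10-inch tablet", "BrandA", "Tablets", "Electronics"),
        (2, "P2", "Oak desk", "BrandB", "Desks", "Furniture"),
    ])
    conn.execute("INSERT INTO store VALUES (1, 'S1', 'Main Street', 'East', 'US')")
    conn.execute("INSERT INTO promotion VALUES (1, 'NONE', 'No Promotion', 'None')")
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?, ?, ?, ?)", [
        (20240105, 1, 1, 1, 2, 600.0, 400.0, 1),
        (20240105, 2, 1, 1, 1, 250.0, 150.0, 1),
        (20240105, 1, 1, 1, 1, 300.0, 200.0, 1),
    ])
    return conn


def revenue_by_category_region(conn):
    # A typical star join: group on dimension attributes, sum a fact measure.
    return conn.execute("""
        SELECT p.category, s.region, SUM(f.revenue)
        FROM sales_fact AS f
        JOIN product AS p ON p.product_key = f.product_key
        JOIN store AS s ON s.store_key = f.store_key
        GROUP BY p.category, s.region
        ORDER BY p.category""").fetchall()


print(revenue_by_category_region(build_demo()))  # [('Electronics', 'East', 900.0), ('Furniture', 'East', 250.0)]
```

Every query against this shape follows the same pattern: join the fact to the dimensions it needs, constrain or group on descriptive attributes, and aggregate the measures.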
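  • 50. Waterfall BI/DW (timeline diagram, This Year → Next Year): Analysis → Design → Development → Test → Release • Stakeholder Input only at Requirements, followed by Big Design Up Front (BDUF) of the Data Model, ETL and BI • Limited stakeholder interaction • Result: DATA, but VALUE?
  • 51. Agile DW/BI Development (timeline diagram, This Year → Next Year): Iteration 1, Iteration 2, Iteration 3, … Iteration n, each cycling through ADM, ETL, BI Prototyping, Review and Release, with continuous stakeholder interaction • Just Enough Design Up Front (JEDUF) • VALUE? after the first iteration, then VALUE! with each subsequent iteration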
  • 52. State of the DW Field • Solid: • Dimensional data warehouse design is mature • Proven design patterns exist for common requirements • Hit or Miss: • Collecting unambiguous and thorough requirements • Slotting requirements into proven design patterns • End-user ownership and validation • Too Often: • Snatching defeat from the jaws of victory
  • 54. BEAM✲ Methodology (diagram: Data Modeler working with BI Stakeholders) • Structured, non-technical, collaborative working conversation directly with BI Users • BI Users’ business process, organizational, hierarchical, and data knowledge • Focused data profiling • Logical and physical (Kimball-esque) dimensional data models • Example data • Detailed and testable ETL specification • Instantiated DW prototype
  • 57. Agile Data Modeling Requirements • Techniques for encouraging interaction • Must use simple, inclusive notation and tools • Must be quick: hours rather than days (modelstorming) • Balance ‘just in time’ (JIT) and ‘just enough design up front’ (JEDUF) to reduce design rework • DW designers must embrace data model change, allow models to evolve, and avoid generic data models; they need design patterns they can trust to represent tomorrow’s BI requirements tomorrow • ETL and BI developers must embrace database change; they need tool support
  • 65. Who does what? Subject – Verb – Object: “Customers buy products” • Collaborative / Conversational Design between the BEAM✲ Modeler and BI Users
  • 66. Design Using Natural Language • Verbs – Events – Relationships – Fact Tables • Nouns – Details – Entities – Dimensions • Main Clause – Subject-Verb-Object • Prepositions – connect additional details to the main clause • Interrogatives – The 7Ws – Dimension Types • Business Vocabulary – no IT-Speak
  • 67. “Spreadsheet”-like Models (annotated table) • Event Table Name (filled in later) • Subject Column Name • Verb • Object Column Name • Details • Interrogative for each column • Example Data (4-6 rows)
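The spreadsheet-like event table can be captured as a small data structure. The Python sketch below is hypothetical: the class names, fields and the `story()` helper are illustrations, not part of BEAM✲ or any tool. Each column is typed by one of the 7Ws, prepositions attach extra details to the subject-verb-object clause, and example rows must supply one value per column.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Tuple


class W(Enum):
    """The 7Ws used to type each detail (column) of an event."""
    WHO = "who"
    WHAT = "what"
    WHEN = "when"
    WHERE = "where"
    HOW_MANY = "how many"
    WHY = "why"
    HOW = "how"


@dataclass
class Detail:
    name: str  # business vocabulary, e.g. "CUSTOMER"
    w: W


@dataclass
class EventTable:
    subject: Detail
    verb: str
    obj: Detail
    # Prepositional details attached to the main clause, e.g. ("on", ORDER DATE).
    details: List[Tuple[str, Detail]] = field(default_factory=list)
    # A handful of example rows, one value per column.
    rows: List[tuple] = field(default_factory=list)
    name: Optional[str] = None  # event table name, filled in later

    def columns(self) -> List[Detail]:
        return [self.subject, self.obj] + [d for _, d in self.details]

    def story(self) -> str:
        """Read the model back as the sentence the BI users told."""
        parts = [self.subject.name, self.verb, self.obj.name]
        parts += [f"{prep} {d.name}" for prep, d in self.details]
        return " ".join(parts)

    def add_example(self, *values) -> None:
        if len(values) != len(self.columns()):
            raise ValueError(f"expected {len(self.columns())} values, got {len(values)}")
        self.rows.append(values)


orders = EventTable(
    subject=Detail("CUSTOMER", W.WHO),
    verb="orders",
    obj=Detail("PRODUCT", W.WHAT),
    details=[("on", Detail("ORDER DATE", W.WHEN)), ("at", Detail("STORE", W.WHERE))],
)
orders.add_example("C1", "Tablet", "2024-05-18", "Main Street")
print(orders.story())  # CUSTOMER orders PRODUCT on ORDER DATE at STORE
```

Keeping `name` optional mirrors the slide's note that the event table name is filled in later, after the story and example data have been agreed.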
  • 69. Capture Example Data • Engage business users • Clarify definitions / Conform Dimensions • Illustrate exceptions • Drive out uniqueness • “Show and tell” • Event story: SUBJECT verb OBJECT on/at/every EVENT DATE, with columns typed [who] [what] [when] [where] [how many] [why] [how] • Example rows for every column: Typical (Typical/Popular for [what], Typical/Average for [how many], Typical/Normal for [why] and [how]), Different, Repeat, Missing • Further example themes where they apply: Group, Multiple/Bundle, Multi-Level, Multiple Values; Old, Low / Old, Low Value / Oldest needed / Near / Min, Negative, 0; New, High / New, High / Most Recent, Future / Far / Max, Precision; Exceptional
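Two of these goals, illustrating Repeat/Missing examples and driving out uniqueness, are mechanical enough to check in code. The Python sketch below uses hypothetical helper names and made-up rows. It reports per column whether the example data already shows a repeated and a missing value, and finds the smallest column combination that is unique across the examples, which is a candidate key for the modeler to challenge.

```python
from itertools import combinations


def theme_report(columns, rows):
    """Per column: does the example data already include a Repeat and a Missing (None) value?"""
    report = {}
    for i, col in enumerate(columns):
        values = [row[i] for row in rows]
        present = [v for v in values if v is not None]
        report[col] = {
            "repeat": len(present) != len(set(present)),
            "missing": len(present) < len(values),
        }
    return report


def smallest_unique_key(columns, rows):
    """Smallest column combination whose values are unique across the example rows, or None."""
    for size in range(1, len(columns) + 1):
        for combo in combinations(range(len(columns)), size):
            keys = [tuple(row[i] for i in combo) for row in rows]
            if len(keys) == len(set(keys)):
                return tuple(columns[i] for i in combo)
    return None


columns = ("CUSTOMER", "PRODUCT", "ORDER DATE", "QUANTITY")
rows = [
    ("C1", "Tablet", "2024-05-18", 1),    # Typical
    ("C2", "Laptop", "2024-05-18", 2),    # Different customer and product, Repeat date
    ("C1", "Tablet", "2024-06-01", 1),    # Repeat customer and product
    ("C3", "Phone", "2024-06-01", None),  # Missing quantity
]
print(theme_report(columns, rows)["QUANTITY"])  # {'repeat': True, 'missing': True}
print(smallest_unique_key(columns, rows))       # ('CUSTOMER', 'ORDER DATE')
```

The result is only as good as the examples. Adding a Repeat row in which C1 orders a second product on 2024-05-18 would break the ('CUSTOMER', 'ORDER DATE') candidate, and posing that kind of challenge to the BI users is the point of driving out uniqueness.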
  • 70. Thoughtful Example Data → Detailed ETL Specification
  • 72. Adjust Conversation Based on Event Type • Discrete Event – Transaction • Instantaneous / short-duration, irregularly occurring events or transactions • Recurring Event – Periodic Snapshot – measurement • Regularly occurring events or ongoing processes, typically used to measure the cumulative effect of discrete events • Evolving Event – Accumulating Snapshot – timeline • Non-instantaneous / longer-duration, irregularly occurring events or transactions • Represents current status and reflects adjustments
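The three event types can be contrasted with a small Python sketch built on hypothetical account transactions and order milestones; none of this data comes from the slides. The discrete events are the raw transaction rows. The periodic snapshot derives one row per day from them, including days with no activity. The accumulating snapshot keeps one row per order and updates it as each milestone arrives.

```python
from datetime import date, timedelta

# Discrete events: hypothetical account transactions (day, account, amount).
transactions = [
    (date(2024, 1, 1), "A1", 100.0),
    (date(2024, 1, 1), "A1", -30.0),
    (date(2024, 1, 3), "A1", 50.0),
]


def daily_balance_snapshot(txns, account, start, days):
    """Recurring event: one row per day with the running balance, even on days with no activity."""
    rows, balance = [], 0.0
    for offset in range(days):
        day = start + timedelta(days=offset)
        balance += sum(amt for d, acct, amt in txns if d == day and acct == account)
        rows.append((day, account, balance))
    return rows


MILESTONES = ("ordered", "shipped", "delivered")


def accumulating_snapshot(milestone_events):
    """Evolving event: one row per order, updated in place as each milestone date arrives."""
    snapshot = {}
    for order_id, milestone, day in milestone_events:
        row = snapshot.setdefault(order_id, {m: None for m in MILESTONES})
        row[milestone] = day
    for row in snapshot.values():
        # Current status = the latest milestone reached so far.
        row["status"] = next((m for m in reversed(MILESTONES) if row[m] is not None), None)
    return snapshot


print(daily_balance_snapshot(transactions, "A1", date(2024, 1, 1), 3))
events = [
    ("O1", "ordered", date(2024, 1, 1)),
    ("O1", "shipped", date(2024, 1, 2)),
    ("O2", "ordered", date(2024, 1, 2)),
]
print({k: v["status"] for k, v in accumulating_snapshot(events).items()})  # {'O1': 'shipped', 'O2': 'ordered'}
```

The snapshot on 2024-01-02 repeats the previous balance even though nothing happened that day. That regular, dense grain is what distinguishes a periodic snapshot from the sparse transaction table it is derived from.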
  • 73. Capture When Details • BEAM✲ Modeler: “When do Customers order Products?” • BI Users: “On the Order Date”
  • 77. Model How Many Measures • Additive – can be summed over any combination of dimensions; no special rules • Non-additive – cannot be summed over any dimension, e.g. unit price or temperature • Must be aggregated in other ways, e.g. average, min, max • Degenerate Dimensions – transaction #, timestamps, flags • Semi-additive – cannot be summed across at least one dimension, e.g. balances cannot be summed over time
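The additivity rules decide which aggregate a query may safely apply. The Python sketch below uses hypothetical sales and balance rows to show each case. Quantity is summed freely. Unit price is averaged instead of summed; the quantity-weighted average is one reasonable choice, assumed here. The balance is summed across accounts on a single day but averaged across days.

```python
from statistics import mean

# Hypothetical transaction facts: (day, product, quantity, unit_price)
sales = [
    ("2024-01-01", "Tablet", 2, 300.0),
    ("2024-01-01", "Phone", 1, 500.0),
    ("2024-01-02", "Tablet", 3, 280.0),
]
# Hypothetical periodic-snapshot facts: (day, account, balance)
balances = [
    ("2024-01-01", "A1", 100.0),
    ("2024-01-01", "A2", 50.0),
    ("2024-01-02", "A1", 120.0),
    ("2024-01-02", "A2", 40.0),
]


def total_quantity(rows):
    """Additive: summing over any combination of dimensions is valid."""
    return sum(q for _, _, q, _ in rows)


def avg_unit_price(rows):
    """Non-additive: summing prices is meaningless, so use a quantity-weighted average."""
    return sum(q * p for _, _, q, p in rows) / total_quantity(rows)


def total_balance_on(rows, day):
    """Semi-additive: summing balances across accounts for one day is valid..."""
    return sum(b for d, _, b in rows if d == day)


def avg_daily_balance(rows, account):
    """...but across time, average (or take the latest) rather than sum."""
    return mean(b for _, a, b in rows if a == account)


print(total_quantity(sales))                   # 6
print(total_balance_on(balances, "2024-01-02"))  # 160.0
print(avg_daily_balance(balances, "A1"))       # 110.0
```

Summing A1's balance over both days would report 220.0, a number no account ever held. That is exactly the mistake the semi-additive label is meant to prevent.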
  • 79. Annotate with Targeted Data Profiling
  • 80. Proceed Through the Business Process Value Chain
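Targeted profiling means checking only the columns the model actually uses, so that model annotations (nullable, cardinality, value ranges) rest on real source data. The Python sketch below covers a few common statistics. The column names and rows are hypothetical. The sketch assumes each column holds values of one comparable type, so that `min`/`max` work.

```python
from collections import Counter


def profile(rows, columns):
    """Per column: null count, distinct count, min/max of non-null values, and the most common value."""
    result = {}
    for i, col in enumerate(columns):
        values = [row[i] for row in rows]
        present = [v for v in values if v is not None]
        result[col] = {
            "nulls": len(values) - len(present),
            "distinct": len(set(present)),
            "min": min(present) if present else None,
            "max": max(present) if present else None,
            "top": Counter(present).most_common(1)[0][0] if present else None,
        }
    return result


columns = ("customer_id", "order_date", "quantity")
rows = [
    ("C1", "2024-05-18", 1),
    ("C2", "2024-05-18", None),
    ("C1", "2024-06-01", 3),
]
for col, stats in profile(rows, columns).items():
    print(col, stats)
```

A null count above zero, for example, is a prompt to ask the BI users how a Missing value should be handled before the ETL specification is finalized.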
  • 86. Prototype! Not “Data Model Review”
  • 87. Recap • Collaborative and Agile • Data Modeling • Data Sourcing • Data Conformance • Requirements = Design • Slots directly into proven and mature dimensional data warehousing design patterns • Validation through Prototyping • Semi-automated build of the dimensional data warehouse • Perfect complement to Agile BI Tools and Methods (e.g. Pentaho)
  • 88. If you have been affected by
 any of the issues raised
 in this presentation
  • 89. Agile Data Warehouse Design
 Lawrence Corr, Jim Stagnitto, Decision Press, November 2011