Data Warehouse Design & Best Practices
About me 
 Project Manager @ 
 12 years professional experience 
 .NET Web Development MCPD 
 SQL Server 2012 (MCSA) 
 Business Interests 
 Web Development, SOA, Integration 
 Security & Performance Optimization
 Horizon2020, Open BIM, GIS, Mapping 
 Contact me 
 ivelin.andreev@icb.bg 
 www.linkedin.com/in/ivelin 
 www.slideshare.net/ivoandreev 
About me 
 Senior Developer @ 
 .NET Web Development MCPD 
 Business Interests 
 Web Development, WCF, Integration 
 SQL Server – Query Optimization and Tuning 
 Data Warehousing 
 Contact me 
 georgi.mishev@icb.bg 
 www.linkedin.com/in/georgimishev
Agenda 
 Why Data Warehouse 
 Main DW Architectures 
 Dimensional Modeling 
 Patterns & Practices
 DW Maintenance 
 ETL Process 
 SSIS Demo
Lots of Data Everywhere 
 Can’t find data? 
 Data scattered over the network 
 Can’t get data? 
 Need an expert to get the data 
 Can’t understand data? 
 Data poorly documented 
 Can’t use data found? 
 Data needs to be transformed
Data Warehouse? 
Def: Central repository where data is organized, cleansed, and stored in a standardized format.
 Integrated
 Heterogeneous sources
 Data cleansing and conversion ($, €, 元)
 Focus on subject
 e.g. Customer, Sale, Product
 Time variant 
 Timestamp every key 
 Historical data (10+ years)
Different Problems - Different Solutions 
                 OLTP Database                Data Warehouse
Users            Customer                     Knowledge worker
Design           Normalized, Data Integrity   Denormalized
Function         Daily operation              Decision making
Data             Current, Detailed            Historical, Aggregated
Usage            Real time                    Ad-hoc
Access           Short R/W transactions       Complex R/O queries
Data accessed    Comparatively lower          Large amounts
# Records        x100                         x1’000’000
# Users          x1’000                       x10
DB Size          x10 GB                       x100 GB - TB
Different DW Architectures
B. Inmon Model
Top-Down Approach
 Warehouse (3NF)
 Data Mart & OLAP (MD)
http://sqlschoolgr.files.wordpress.com/2012/03/clip_image003_thumb.png?w=640&h=368
R. Kimball Model
Bottom-Up Approach
 Data Marts (3NF or MD)
 Warehouse & OLAP (MD)
http://sqlschoolgr.files.wordpress.com/2012/03/clip_image005_thumb.png?w=640&h=369
Data Vault (by Dan Linstedt) 
 Hubs 
 List of unique business keys 
 Links 
 Unique relationships between keys 
 Satellites 
 Hub and Link details and history
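A minimal sketch of how hubs, links and satellites might map to tables, using Python's built-in sqlite3 module so that it runs without a server. All table and column names (HubCustomer, LinkPurchase, SatCustomer, LoadDate, RecordSource) are hypothetical and are not taken from the slides.

```python
import sqlite3

# Hypothetical Data Vault layout; names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE HubCustomer (
    CustomerHK   INTEGER PRIMARY KEY,    -- key used by links and satellites
    CustomerBK   TEXT NOT NULL UNIQUE,   -- hub: list of unique business keys
    LoadDate     TEXT NOT NULL,
    RecordSource TEXT NOT NULL
);
CREATE TABLE HubProduct (
    ProductHK    INTEGER PRIMARY KEY,
    ProductBK    TEXT NOT NULL UNIQUE,
    LoadDate     TEXT NOT NULL,
    RecordSource TEXT NOT NULL
);
CREATE TABLE LinkPurchase (              -- link: unique relationship between keys
    PurchaseHK   INTEGER PRIMARY KEY,
    CustomerHK   INTEGER NOT NULL REFERENCES HubCustomer(CustomerHK),
    ProductHK    INTEGER NOT NULL REFERENCES HubProduct(ProductHK),
    LoadDate     TEXT NOT NULL,
    UNIQUE (CustomerHK, ProductHK)
);
CREATE TABLE SatCustomer (               -- satellite: details and their history
    CustomerHK   INTEGER NOT NULL REFERENCES HubCustomer(CustomerHK),
    LoadDate     TEXT NOT NULL,
    Name         TEXT,
    City         TEXT,
    PRIMARY KEY (CustomerHK, LoadDate)
);
""")

conn.execute("INSERT INTO HubCustomer VALUES (1, 'C-001', '2014-01-01', 'CRM')")
# A change in the source adds a new satellite row; the hub row is untouched.
conn.execute("INSERT INTO SatCustomer VALUES (1, '2014-01-01', 'Ann', 'Sofia')")
conn.execute("INSERT INTO SatCustomer VALUES (1, '2014-06-01', 'Ann', 'Plovdiv')")

history = conn.execute(
    "SELECT LoadDate, City FROM SatCustomer WHERE CustomerHK = 1 ORDER BY LoadDate"
).fetchall()
```

Here `history` holds one row per load date, which is how the satellite keeps the full history while the hub stays a plain list of business keys.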
It is irrelevant which camp you belong to,
as long as you understand why!
Making Your Choice
• Kimball (MD)
+ Start small, scale big
+ Faster ROI
+ Analytical tools
- Low reusability
• Inmon (3NF)
+ Structured
+ Easy to maintain
+ Easier data mining
- Time-consuming to build
• Data Vault (backend data warehouse)
+ Multiple sources; Full history; Incremental build
- Up-front work; Long-term payoff; Many joins
Dimensional modeling as de-facto standard
Dimensions 
Def: The object of BI interest 
 Keys 
 Surrogate key 
 Business key 
 Hierarchical attributes 
 Analysis and Drill Down 
 Member properties 
 Presentation labels 
 Auditing information (not for end users)
Slowly Changing Dimensions 
Def: Scheme for recording changes over time 
 Type 1 - Overwrite 
 Type 2 – Multiple Records
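The difference between the two types can be sketched in plain Python. This is only an illustration: the row layout (surrogate_key, business_key, valid_from, valid_to) and the function names are assumptions, since the slide itself says only "Overwrite" and "Multiple Records".

```python
from datetime import date

def apply_type1(dim_rows, business_key, **changes):
    """Type 1: overwrite the attributes in place; no history is kept."""
    for row in dim_rows:
        if row["business_key"] == business_key:
            row.update(changes)

def apply_type2(dim_rows, business_key, change_date, **changes):
    """Type 2: close the current row and append a new version of it."""
    current = next(
        r for r in dim_rows
        if r["business_key"] == business_key and r["valid_to"] is None
    )
    current["valid_to"] = change_date
    new_row = {
        **current,
        **changes,
        "surrogate_key": max(r["surrogate_key"] for r in dim_rows) + 1,
        "valid_from": change_date,
        "valid_to": None,  # open-ended: this is now the current version
    }
    dim_rows.append(new_row)
    return new_row

# Hypothetical customer dimension with a single current row.
customers = [{
    "surrogate_key": 1, "business_key": "C-001", "name": "Ann",
    "city": "Sofia", "valid_from": date(2010, 1, 1), "valid_to": None,
}]
apply_type2(customers, "C-001", date(2014, 6, 1), city="Plovdiv")
```

After the call the dimension holds two rows for the same business key, each with its own surrogate key, so facts loaded before and after the move point at different versions.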
Facts 
Def: Measurement of a business process
 Keys
 FKs to all dimension tables (in the star)
 PK - Composite (usually) or Surrogate
 Measures
 Numeric columns that are of interest to the business
 Additive, Non-additive, Semi-additive 
 Factless facts 
 Auditing information (optional)
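The three kinds of measures behave differently under aggregation. A small sketch with made-up account snapshot data (the rows and variable names are hypothetical):

```python
# Hypothetical daily account snapshot rows: (date_key, account, balance, deposits)
snapshot = [
    (20140101, "A", 500.0, 100.0),
    (20140101, "B", 200.0, 50.0),
    (20140102, "A", 530.0, 30.0),
    (20140102, "B", 220.0, 20.0),
]

# Additive: deposits can be summed across every dimension, including time.
total_deposits = sum(dep for _, _, _, dep in snapshot)

# Semi-additive: balances add up across accounts, but not across dates.
# Summing over time would double-count, so only the last date is used.
last_date = max(d for d, _, _, _ in snapshot)
closing_balance = sum(bal for d, _, bal, _ in snapshot if d == last_date)

# Non-additive: a ratio is recomputed from additive parts, never summed.
deposit_ratio = total_deposits / closing_balance
```

Summing the balance column over all four rows would give 1450, a figure with no business meaning, which is why semi-additive measures need a different aggregate along the time dimension.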
Practices and Design Patterns
Data Warehouse Pitfalls
 Admit it is not as simple as it seems
 You need education
 Find what is of business value
 Rather than focusing on performance
 Expect to spend a lot of time on Extract-Transform-Load
 Homogenize data from different sources 
 Find (and resolve) problems in source systems
Prepare your Sources 
 Data integrity 
 Avoid redundancy 
 Data quality 
 Master data source 
 Data validation 
 Auditing 
 CreatedDate / CreatedBy 
 ChangedDate / ChangedBy 
 Nightly jobs
Dimension Design 
 Business key with non-clustered index 
 Include date (if dimension has history) 
 Surrogate key 
 The smallest possible integer 
 Clustered index 
 FK constraints 
 Do not enforce (WITH NOCHECK) 
 Document the relation 
 Faster load 
 Data validation 
 Task for the Source system
Conformed Dimensions 
Def: Dimensions that have the same meaning and content
when referenced from multiple fact tables
 Date Dimension
 Best candidate for partitioning
 Granularity
 Do not store hourly rows when reporting is daily
 Avoid surrogate keys 
 Saves lookup and joins 
 Integer representing date (yyyyMMdd, days after 1/1/1900)
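Both integer encodings mentioned above can be sketched in a few lines of Python. The function names are hypothetical; the point is that the key is derived from the date itself, so no lookup against the date dimension is needed during the load.

```python
from datetime import date

def date_to_key(d):
    """Encode a date as an integer in yyyyMMdd form."""
    return d.year * 10000 + d.month * 100 + d.day

def key_to_date(key):
    """Decode a yyyyMMdd integer back into a date."""
    return date(key // 10000, key // 100 % 100, key % 100)

def days_since_1900(d):
    """Alternative encoding: number of days after 1900-01-01."""
    return (d - date(1900, 1, 1)).days

order_key = date_to_key(date(2014, 3, 9))
```

The yyyyMMdd form stays readable in ad-hoc queries and sorts chronologically, while the day-count form makes date arithmetic a plain subtraction.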
Pre-join Hierarchies 
 Recursive relationships 
 Fast drill and report 
 Pre-computed aggregations 
Hierarchy Bridge 
 For each dimension row 
 1 association with self 
 1 row for each subordinate
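A hierarchy bridge of this shape could be generated from a recursive parent-child relationship roughly as follows. This is a sketch: the function name and the (ancestor, descendant, depth) row layout are assumptions, and the input is assumed to contain no cycles.

```python
def build_bridge(parent_of):
    """Return sorted (ancestor, descendant, depth) rows for a parent-child map.

    Every member gets one row pointing at itself (depth 0) and one row
    under each of its ancestors, so each ancestor ends up with one row
    per subordinate.
    """
    rows = []
    for node in parent_of:
        ancestor, depth = node, 0
        while ancestor is not None:
            rows.append((ancestor, node, depth))
            ancestor = parent_of[ancestor]
            depth += 1
    return sorted(rows)

# Hypothetical three-level organisation; None marks the top of the hierarchy.
org = {"CEO": None, "VP": "CEO", "Dev": "VP"}
bridge = build_bridge(org)
```

A report for any manager then becomes a single join from the bridge to the fact table filtered on the ancestor column, with no recursion at query time.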
Determine the Facts 
The center of a Star schema 
 Identify subject areas 
 Identify key business events 
 Identify dimensions 
 Start from OLTP logical model 
 Identify historical requirements 
 Identify attributes
The Grain 
Def: The level of detail of a fact table 
 What is the business objective? 
 Fine grain - behaviour and frequency analysis 
 Coarse grain - overall and trend analysis 
 Aggregates 
 DO NOT summarize prematurely 
 DO NOT mix detail and summary 
 DO use “summary tables”
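One way to follow these rules is to keep the fact table at its finest grain and derive a separate summary table from it. A minimal sketch using sqlite3; the table names FactSales and FactSalesDaily and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Detail fact table: one row per sale (fine grain).
CREATE TABLE FactSales (DateKey INTEGER, ProductKey INTEGER, Amount REAL);
INSERT INTO FactSales VALUES
    (20140101, 1, 10.0),
    (20140101, 2, 5.0),
    (20140102, 1, 7.5);

-- Summary table: one row per day, kept apart from the detail rows.
CREATE TABLE FactSalesDaily AS
    SELECT DateKey, SUM(Amount) AS Amount, COUNT(*) AS SaleCount
    FROM FactSales
    GROUP BY DateKey;
""")

daily = conn.execute(
    "SELECT DateKey, Amount, SaleCount FROM FactSalesDaily ORDER BY DateKey"
).fetchall()
```

Because the summary lives in its own table, trend queries can read the small daily table while behaviour analysis still has every individual sale available.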
C-3PO is fluent in 6M forms of communication.
What about your customers?
Multinational DW 
 What parts need translation? 
 Where to store various language versions? 
 How to support future languages? 
 Dimensions 
 Add language attribute 
 Include text data in the dimension 
 Problem 1: The dimension key? 
 Replicate PK for every language 
Fact.DimId = Dim.Id AND Dim.Lang=[Lang] 
 Problem 2: Storage = [Dim] x [Lang] 
 Sub-dimension with language attributes 
TxtId  Attr1  Attr2  LangId
1      large  Yes    En
2      small  No     En
1      stor   Ja     No
2      liten  Nei    No
3      …      …      …
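The sub-dimension approach can be sketched with sqlite3 as follows. The text rows mirror the sample above; the table names (DimSize, DimSizeText, FactSales), the key values and the sales amounts are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The dimension keeps one row per member and points at its text entry.
CREATE TABLE DimSize (Id INTEGER PRIMARY KEY, TxtId INTEGER NOT NULL);

-- The sub-dimension holds the translated attributes, one row per language.
CREATE TABLE DimSizeText (
    TxtId  INTEGER NOT NULL,
    Attr1  TEXT,
    Attr2  TEXT,
    LangId TEXT NOT NULL,
    PRIMARY KEY (TxtId, LangId)
);
CREATE TABLE FactSales (DimId INTEGER REFERENCES DimSize(Id), Amount REAL);

INSERT INTO DimSize VALUES (10, 1), (20, 2);
INSERT INTO DimSizeText VALUES
    (1, 'large', 'Yes', 'En'), (2, 'small', 'No',  'En'),
    (1, 'stor',  'Ja',  'No'), (2, 'liten', 'Nei', 'No');
INSERT INTO FactSales VALUES (10, 100.0), (20, 40.0), (10, 25.0);
""")

def sales_by_label(lang):
    """Total sales per dimension member, labelled in the requested language."""
    return conn.execute("""
        SELECT t.Attr1, SUM(f.Amount)
        FROM FactSales f
        JOIN DimSize d     ON f.DimId = d.Id
        JOIN DimSizeText t ON t.TxtId = d.TxtId AND t.LangId = ?
        GROUP BY t.TxtId, t.Attr1
        ORDER BY t.TxtId
    """, (lang,)).fetchall()

english = sales_by_label("En")
norwegian = sales_by_label("No")
```

The fact rows and the dimension keys are stored once; supporting another language means adding rows to the text table only.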
Data warehouse maintenance
How Large is “Large” 
Is big really big?
Partitioning 
 Why 
 Faster index maintenance 
 Faster load 
 Faster queries 
 When 
 Tables 10GB+ 
 How 
 Do not partition dimension tables 
 Partition by date (most analyses are time-based)
 Eliminate partitions (WHERE [PartitionKey]=…) 
 Avoid split and merge of existing partitions 
 Can cause inefficient log generation
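Partition elimination is easiest to see in a toy model. The class below is not how SQL Server implements partitioning; it is a hypothetical in-memory stand-in that only shows why a filter on the partition key lets a query skip whole partitions.

```python
from collections import defaultdict

class PartitionedFact:
    """Toy fact table split into monthly partitions keyed by yyyyMM."""

    def __init__(self):
        self.partitions = defaultdict(list)

    def insert(self, date_key, amount):
        # date_key is an integer in yyyyMMdd form; dropping the day gives yyyyMM.
        self.partitions[date_key // 100].append((date_key, amount))

    def total(self, first_month, last_month):
        """Sum amounts, reading only partitions inside the month range."""
        touched = [m for m in self.partitions if first_month <= m <= last_month]
        amount = sum(a for m in touched for _, a in self.partitions[m])
        return amount, len(touched)

fact = PartitionedFact()
for key, amount in [(20140105, 10.0), (20140220, 5.0),
                    (20140221, 2.5), (20140301, 1.0)]:
    fact.insert(key, amount)

february_total, partitions_read = fact.total(201402, 201402)
```

Only one of the three partitions is read for the February query, which is the effect a WHERE clause on the partition key is meant to achieve.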
Columnstore Index 
 Non-clustered in SQL 2012 
 Clustered in SQL 2014 
 Pros 
 Better data compression 
 High performance on table scan 
 Clustered CSI Limitations 
 No other indexes allowed 
 Little advantage on seek operations 
 No XML, computed column or replication
Extract-Transform-Load 
 Extract data from OLTP 
 Data transformations 
 Data loads 
 DW maintenance
Efficient Load Process 
 Use simple recovery model during data load 
 Staging 
 Avoid indexing 
 Populate in parallel 
 Maintain DW 
 Disable indexes on load 
 Rebuild manually after load 
 Automatic statistics updates slow down SQL Server
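The load pattern can be sketched with sqlite3, which has no way to disable an index, so the sketch drops and recreates it instead. On SQL Server the corresponding statements would be ALTER INDEX ... DISABLE before the load and ALTER INDEX ... REBUILD after it. The table name, index name and generated rows are hypothetical.

```python
import sqlite3

def bulk_load(conn, rows):
    """Load rows with the index removed, then rebuild it once at the end."""
    conn.execute("DROP INDEX IF EXISTS IX_FactSales_DateKey")
    with conn:  # one transaction for the whole batch
        conn.executemany(
            "INSERT INTO FactSales (DateKey, Amount) VALUES (?, ?)", rows
        )
    conn.execute("CREATE INDEX IX_FactSales_DateKey ON FactSales (DateKey)")
    conn.execute("ANALYZE")  # refresh optimizer statistics manually after the load

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FactSales (DateKey INTEGER, Amount REAL)")
conn.execute("CREATE INDEX IX_FactSales_DateKey ON FactSales (DateKey)")

# 1000 generated rows spread over 28 days of January 2014.
bulk_load(conn, [(20140101 + i % 28, float(i)) for i in range(1000)])
row_count = conn.execute("SELECT COUNT(*) FROM FactSales").fetchone()[0]
```

Maintaining the index once after the load, rather than row by row during it, is the saving the slide refers to.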
To SSIS, or not to SSIS ? 
Pros
 Little to no coding
 Extensive support for various data sources
 Parallel execution of migration tasks
 Better organization of the ETL process
Cons
 Another way of thinking
 Hidden options
 A T-SQL developer would do it much faster
 Auto-generated flows need optimization
 Sometimes it simply does not work (e.g. Sort by GUID)
Takeaways 
 Books 
 The Data Warehouse Toolkit (3rd ed), Ralph Kimball 
 Implementing DW with Microsoft SQL Server 2012 
 Data Warehousing Fundamentals, Paulraj Ponniah 
 Articles 
 Best Practices in Data Warehouse (Hanover Research Council) 
 http://www.kimballgroup.com/category/design-tips/ 
 http://sqlmag.com/business-intelligence 
 Resources 
 http://www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-techniques/dimensional-modeling-techniques/
 http://www.databaseanswers.org/data_models/index.htm