Data Warehouse Best Practices
Dr. Eduardo Castro, MSc
ecastro@simsasys.com
http://ecastrom.blogspot.com
http://comunidadwindows.org
http://tiny.cc/comwindows
Facebook: ecastrom
Twitter: edocastro

Sources
This presentation is based on the following sources:
- Datawarehouse, Ravi RanJan
- Top 10 Best Practices for Building a Large Scale Relational Data Warehouse, SQL CAT

Complexities of Creating a Data Warehouse
- Incomplete errors
  - Missing fields
  - Records or fields that, by design, are not being recorded
- Incorrect errors
  - Wrong calculations, aggregations
  - Duplicate records
  - Wrong information entered into the source system
Source: Datawarehouse, Ravi RanJan

Data Warehouse Pitfalls
- You are going to spend much time extracting, cleaning, and loading data.
- You are going to find problems with the systems feeding the data warehouse.
- You will find the need to store and validate data that no existing system captures or validates.
- Large scale data warehousing can become an exercise in data homogenizing.

Data Warehouse Pitfalls (continued)
- The time it takes to load the warehouse will expand to fill the available load window... and then some.
- You are building a HIGH maintenance system.
- You will fail if you concentrate on resource optimization and neglect project, data, and customer management issues, or lose sight of what adds value for the customer.

Best Practices
- Complete requirements and design.
- Prototyping is key to business understanding.
- Utilize proper aggregations and detailed data.
- Training is an ongoing process.
- Build data integrity checks into your system.

Top 10 Best Practices for Building a Large Scale Relational Data Warehouse
Building a large scale relational data warehouse is a complex task. This section describes some design techniques that can help in architecting an efficient large scale relational data warehouse with SQL Server. Most large scale data warehouses use table and index partitioning, and therefore many of the recommendations here involve partitioning. Most of these tips are based on experience building large data warehouses on SQL Server.
Source: Top 10 Best Practices for Building a Large Scale Relational Data Warehouse, SQL CAT

Consider partitioning large fact tables
- Consider partitioning fact tables that are 50 to 100 GB or larger.
- Partitioning can provide manageability and often performance benefits:
  - Faster, more granular index maintenance.
  - More flexible backup / restore options.
  - Faster data loading and deleting.
- Faster queries when restricted to a single partition.
- Typically partition the fact table on the date key:
  - Enables sliding window.
  - Enables partition elimination.

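As a rough illustration of partitioning on the date key, the following Python sketch computes monthly boundary values as integer date keys and renders them into a `CREATE PARTITION FUNCTION` statement. The function name `pfFactSalesByMonth`, the monthly grain, and the `RANGE RIGHT` choice are assumptions made for the example, not something the slides prescribe.

```python
from datetime import date


def monthly_boundaries(start: date, months: int) -> list[int]:
    """Return yyyymmdd integer keys for the first day of consecutive months."""
    keys = []
    year, month = start.year, start.month
    for _ in range(months):
        keys.append(year * 10000 + month * 100 + 1)
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return keys


def partition_function_sql(name: str, boundaries: list[int]) -> str:
    """Render a RANGE RIGHT partition function over integer date keys."""
    values = ", ".join(str(b) for b in boundaries)
    return (
        f"CREATE PARTITION FUNCTION {name} (int) "
        f"AS RANGE RIGHT FOR VALUES ({values});"
    )


# Hypothetical example: four monthly boundaries starting January 2006.
boundaries = monthly_boundaries(date(2006, 1, 1), 4)
statement = partition_function_sql("pfFactSalesByMonth", boundaries)
print(statement)
```

With `RANGE RIGHT`, each boundary value belongs to the partition on its right, so a boundary of 20060201 starts the February partition.
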
Build clustered index on the date key of the fact table
- This supports efficient queries to populate cubes or retrieve a historical data slice.
- If you load data in a batch window, use the options ALLOW_ROW_LOCKS = OFF and ALLOW_PAGE_LOCKS = OFF for the clustered index on the fact table.
- This helps speed up table scan operations during query time and helps avoid excessive locking activity during large updates.

Build clustered index on the date key of the fact table (continued)
- Build nonclustered indexes for each foreign key. This helps "pinpoint queries" extract rows based on a selective dimension predicate.
- Use filegroups for administration requirements such as backup / restore, partial database availability, etc.

Choose partition grain carefully
- Most customers use month, quarter, or year.
- For efficient deletes, you must delete one full partition at a time.
- It is faster to load a complete partition at a time.
- Daily partitions for daily loads may be an attractive option. However, keep in mind that a table can have a maximum of 1000 partitions (the limit in SQL Server 2005 and 2008; SQL Server 2012 and later allow up to 15,000).
- Partition grain affects query parallelism.

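The trade-off between grain and the partition limit is simple arithmetic. The sketch below is one way to check it; the `spare` argument (two empty partitions, one at each end of the range) and the 365-day year are simplifying assumptions, and the default limit of 1000 is the figure quoted on the slide.

```python
# Approximate number of partitions created per year for each grain.
PARTITIONS_PER_YEAR = {"year": 1, "quarter": 4, "month": 12, "day": 365}


def partitions_needed(grain: str, years_online: int, spare: int = 2) -> int:
    """Partitions required to keep `years_online` years of data online."""
    return PARTITIONS_PER_YEAR[grain] * years_online + spare


def fits_limit(grain: str, years_online: int, limit: int = 1000) -> bool:
    """True when the design stays within the per-table partition limit."""
    return partitions_needed(grain, years_online) <= limit


for grain in ("month", "day"):
    for years in (2, 3, 10):
        print(grain, years, partitions_needed(grain, years), fits_limit(grain, years))
```

Under these assumptions, daily partitions fit for two years of history but not for three, while monthly partitions stay far below the limit even for ten years.
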
Choose partition grain carefully (continued)
For SQL Server 2005:
- Queries touching a single partition can parallelize up to MAXDOP (maximum degree of parallelism).
- Queries touching multiple partitions use one thread per partition up to MAXDOP.

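The SQL Server 2005 rule can be written down as a small function, which makes it easier to see why queries touching only two or three partitions leave threads idle. This is a model of the rule as stated on the slide, not a query against the server; the function name is made up for the example.

```python
def max_threads_sql2005(partitions_touched: int, maxdop: int) -> int:
    """Upper bound on parallel threads under the SQL Server 2005 rule."""
    if partitions_touched < 1 or maxdop < 1:
        raise ValueError("partitions_touched and maxdop must be at least 1")
    if partitions_touched == 1:
        # A single partition can be scanned by up to MAXDOP threads.
        return maxdop
    # Several partitions: one thread per partition, capped at MAXDOP.
    return min(partitions_touched, maxdop)


for touched in (1, 2, 3, 12):
    print(touched, max_threads_sql2005(touched, maxdop=8))
```

With MAXDOP set to 8, a query over two partitions is limited to two threads under this rule, while a query over one partition or over twelve partitions can use all eight.
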
Choose partition grain carefully (continued)
For SQL Server 2008:
- Parallel threads up to MAXDOP are distributed proportionally to scan partitions, and multiple threads per partition may be used even when several partitions must be scanned.
- Avoid a partition design where only 2 or 3 partitions are touched by frequent queries, if you need MAXDOP parallelism (assuming MAXDOP = 4 or larger).

Design dimension tables appropriately
- Use integer surrogate keys for all dimensions, other than the Date dimension.
- Use the smallest possible integer for the dimension surrogate keys. This helps to keep the fact table narrow.
- Use a meaningful date key of integer type derivable from the DATETIME data type (for example: 20060215).
- Don't use a surrogate key for the Date dimension.

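The two key rules above lend themselves to a short sketch: deriving the integer date key from a date (and back), and picking the smallest SQL Server integer type that can hold a dimension's surrogate keys. The helper names are hypothetical, and the type selection assumes keys are positive and start at 1.

```python
from datetime import date, datetime


def to_date_key(d: date) -> int:
    """Derive the yyyymmdd integer key, e.g. 2006-02-15 -> 20060215."""
    return d.year * 10000 + d.month * 100 + d.day


def from_date_key(key: int) -> date:
    """Recover the calendar date from a yyyymmdd integer key."""
    return datetime.strptime(str(key), "%Y%m%d").date()


# SQL Server integer types with their maximum positive values.
INTEGER_TYPES = [
    ("tinyint", 255),
    ("smallint", 32_767),
    ("int", 2_147_483_647),
    ("bigint", 9_223_372_036_854_775_807),
]


def smallest_integer_type(expected_members: int) -> str:
    """Smallest type whose positive range covers the expected key count."""
    for name, max_value in INTEGER_TYPES:
        if expected_members <= max_value:
            return name
    raise ValueError("no integer type is large enough")


print(to_date_key(date(2006, 2, 15)))
print(from_date_key(20060215))
print(smallest_integer_type(200), smallest_integer_type(50_000))
```

When sizing a key this way, it is worth estimating the number of members over the life of the warehouse rather than today's count, since widening a key later means rewriting the fact table.
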
Design dimension tables appropriately (continued)
- Build a clustered index on the surrogate key for each dimension table.
- Build a nonclustered index on the business key (potentially combined with a row-effective-date) to support surrogate key lookups during loads.
- Build nonclustered indexes on other frequently searched dimension columns.
- Avoid partitioning dimension tables.

Design dimension tables appropriately (continued)
- Avoid enforcing foreign key relationships between the fact and the dimension tables, to allow faster data loads.
- You can create foreign key constraints with NOCHECK to document the relationships, but don't enforce them.
- Ensure data integrity through Transform Lookups, or perform the data integrity checks at the source of the data.

Write effective queries for partition elimination
- Whenever possible, place a query predicate (WHERE condition) directly on the partitioning key (Date dimension key) of the fact table.

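To see what a predicate on the partitioning key buys, the sketch below maps date keys to partition numbers and lists the partitions a range predicate would have to read. It assumes `RANGE RIGHT` boundaries on an integer date key; the boundary values and function names are illustrative only.

```python
from bisect import bisect_right


def partition_number(boundaries: list[int], key: int) -> int:
    """1-based partition holding `key` under RANGE RIGHT semantics."""
    # Each boundary value belongs to the partition on its right.
    return bisect_right(boundaries, key) + 1


def partitions_touched(boundaries: list[int], low: int, high: int) -> list[int]:
    """Partitions a predicate `low <= key <= high` must read."""
    first = partition_number(boundaries, low)
    last = partition_number(boundaries, high)
    return list(range(first, last + 1))


# Hypothetical monthly boundaries: five partitions in total.
boundaries = [20060101, 20060201, 20060301, 20060401]

print(partitions_touched(boundaries, 20060201, 20060228))
print(partitions_touched(boundaries, 20060115, 20060310))
```

A predicate covering February alone maps to a single partition out of five, whereas a query with no predicate on the date key has nothing to eliminate and must consider every partition.
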
Use Sliding Window technique to maintain data
- Maintain a rolling time window for online access to the fact tables. Load newest data, unload oldest data.
- Always keep empty partitions at both ends of the partition range to guarantee that the partition split (before loading new data) and partition merge (after unloading old data) do not incur any data movement.
- Avoid split or merge of populated partitions. Splitting or merging populated partitions can be extremely inefficient, as this may cause as much as 4 times more log generation, and also cause severe locking.

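The sliding window cycle can be modelled with a toy class that tracks boundaries and per-partition row counts and refuses any split or merge that would move data. This is only a simulation of the bookkeeping under `RANGE RIGHT` boundaries; the class and method names are invented for the example and no SQL Server calls are made.

```python
from bisect import bisect_right


class SlidingWindow:
    """Toy model of a partitioned table: boundaries plus row counts."""

    def __init__(self, boundaries):
        self.boundaries = list(boundaries)
        # There is always one more partition than there are boundaries.
        self.rows = [0] * (len(self.boundaries) + 1)

    def load(self, key, count):
        """Add `count` rows to the partition that holds `key`."""
        self.rows[bisect_right(self.boundaries, key)] += count

    def split_right(self, new_boundary):
        """Add a boundary at the top; the last partition must be empty."""
        if new_boundary <= self.boundaries[-1]:
            raise ValueError("new boundary must exceed the current maximum")
        if self.rows[-1] != 0:
            raise RuntimeError("split would move data")
        self.boundaries.append(new_boundary)
        self.rows.append(0)

    def switch_out_oldest(self):
        """Empty the oldest populated partition, as a SWITCH out would."""
        removed, self.rows[1] = self.rows[1], 0
        return removed

    def merge_left(self):
        """Drop the lowest boundary; both neighbours must be empty."""
        if self.rows[0] != 0 or self.rows[1] != 0:
            raise RuntimeError("merge would move data")
        self.boundaries.pop(0)
        self.rows.pop(0)


w = SlidingWindow([20060101, 20060201, 20060301])
w.load(20060115, 100)        # January
w.load(20060210, 120)        # February
w.split_right(20060401)      # make room before loading March
w.load(20060305, 90)         # March
unloaded = w.switch_out_oldest()
w.merge_left()               # January boundary removed, no data moved
print(w.boundaries, w.rows, unloaded)
```

After one cycle the window holds February and March, with an empty partition still in place at each end for the next split and merge.
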
Use Sliding Window technique to maintain data (continued)
- Create the load staging table in the same filegroup as the partition you are loading.
- Create the unload staging table in the same filegroup as the partition you are deleting.
- It is fastest to load the newest full partition at one time, but this is only possible when the partition size is equal to the data load frequency (for example, you have one partition per day, and you load data once per day).

Use Sliding Window technique to maintain data (continued)
- If the partition size doesn't match the data load frequency, incrementally load the latest partition.
- Various options for loading bulk data into a partitioned table are discussed in the whitepaper http://www.microsoft.com/technet/prodtechnol/sql/bestpractice/loading_bulk_data_partitioned_table.mspx.
- Always unload one partition at a time.

Efficiently load the initial data
- Use SIMPLE or BULK LOGGED recovery model during the initial data load.
- Create the partitioned fact table with the clustered index.
- Create non-indexed staging tables for each partition, and separate source data files for populating each partition.
- Populate the staging tables in parallel.
- Use multiple BULK INSERT, BCP or SSIS tasks.

Efficiently load the initial data (continued)
- Create as many load scripts to run in parallel as there are CPUs, if there is no IO bottleneck. If IO bandwidth is limited, use fewer scripts in parallel.
- Use 0 batch size in the load.
- Use 0 commit size in the load.
- Use TABLOCK.
- Use BULK INSERT if the sources are flat files on the same server. Use BCP or SSIS if data is being pushed from remote machines.

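One way to drive "as many load scripts as CPUs, fewer if IO is the bottleneck" from a single controller is a thread pool sized accordingly. In the sketch below, `load_partition` is a hypothetical placeholder that only returns a message; in practice it would launch BULK INSERT, BCP, or an SSIS task for one source file. The `io_limit` parameter is likewise an assumption used to cap parallelism.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def load_partition(source_file: str) -> str:
    """Hypothetical placeholder for one BULK INSERT / BCP / SSIS load."""
    return f"loaded {source_file}"


def run_parallel_loads(source_files, io_limit=None):
    """Run one loader per source file, bounded by CPUs and IO capacity."""
    workers = os.cpu_count() or 1
    if io_limit is not None:
        # Limited IO bandwidth: run fewer loads at the same time.
        workers = min(workers, io_limit)
    workers = max(1, min(workers, len(source_files)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the order of the input files.
        return list(pool.map(load_partition, source_files))


files = ["sales_200601.dat", "sales_200602.dat", "sales_200603.dat"]
results = run_parallel_loads(files, io_limit=2)
print(results)
```

Threads are sufficient here because each worker would spend its time waiting on an external load process rather than computing in Python.
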
Efficiently load the initial data (continued)
- Build a clustered index on each staging table, then create appropriate CHECK constraints.
- SWITCH all partitions into the partitioned table.
- Build nonclustered indexes on the partitioned table.
- It is possible to load 1 TB in under an hour on a 64-CPU server with a SAN capable of 14 GB/sec throughput (non-indexed table).

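The CHECK constraint and SWITCH steps are repetitive across partitions, so they are often generated by script. The following sketch renders the two statements for one staging table; every table, column, and constraint name is hypothetical, and the generated T-SQL should be reviewed against your own schema before use.

```python
def switch_in_statements(staging, fact, column, low, high, partition, constraint):
    """Render the CHECK constraint and SWITCH statements for one partition."""
    # The constraint proves every staged row falls inside [low, high),
    # which the SWITCH requires. IS NOT NULL covers nullable key columns.
    check = (
        f"ALTER TABLE {staging} WITH CHECK ADD CONSTRAINT {constraint} "
        f"CHECK ({column} IS NOT NULL AND {column} >= {low} AND {column} < {high});"
    )
    switch = f"ALTER TABLE {staging} SWITCH TO {fact} PARTITION {partition};"
    return [check, switch]


statements = switch_in_statements(
    staging="dbo.Stg_FactSales_200602",
    fact="dbo.FactSales",
    column="DateKey",
    low=20060201,
    high=20060301,
    partition=3,
    constraint="ck_Stg_FactSales_200602_DateKey",
)
for statement in statements:
    print(statement)
```

The staging table also needs the same columns, clustered index, and filegroup as the target partition for the switch to succeed.
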
Efficiently delete old data
- Use partition switching whenever possible.
- To delete millions of rows from nonpartitioned, indexed tables:
  - Avoid DELETE FROM ... WHERE ...
    - Huge locking and logging issues
    - Long rollback if the delete is canceled
  - It is usually faster to:
    - INSERT the records to keep into a non-indexed table
    - Create index(es) on the table
    - Rename the new table to replace the original

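The copy, index, and rename sequence can be tried end to end with Python's built-in `sqlite3` module. SQLite stands in for SQL Server here purely so the sketch is runnable; the table and index names are made up, and the SQL Server equivalent would use its own rename mechanism and a minimally logged insert.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sales (date_key INTEGER, amount REAL)")
conn.execute("CREATE INDEX ix_fact_sales_date ON fact_sales (date_key)")

# 500 old rows (2005) and 500 rows to keep (2006).
rows = [(20050101, 1.0)] * 500 + [(20060101, 1.0)] * 500
conn.executemany("INSERT INTO fact_sales VALUES (?, ?)", rows)

cutoff = 20060101

# 1. Copy the rows to keep into a new table that has no indexes yet.
conn.execute("CREATE TABLE fact_sales_new (date_key INTEGER, amount REAL)")
conn.execute(
    "INSERT INTO fact_sales_new "
    "SELECT date_key, amount FROM fact_sales WHERE date_key >= ?",
    (cutoff,),
)

# 2. Build the index once, after the data is in place.
conn.execute("CREATE INDEX ix_fact_sales_new_date ON fact_sales_new (date_key)")

# 3. Replace the original table with the new one.
conn.execute("DROP TABLE fact_sales")
conn.execute("ALTER TABLE fact_sales_new RENAME TO fact_sales")
conn.commit()

remaining = conn.execute(
    "SELECT COUNT(*), MIN(date_key) FROM fact_sales"
).fetchone()
print(remaining)
```

The approach pays off when the rows to keep are a small share of the table, because only those rows are written and the index is built in a single pass.
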
Efficiently delete old data (continued)
- As an alternative, "trickle" the deletes by running the following repeatedly in a loop:
    DELETE TOP (1000) ... ;
    COMMIT
- Another alternative is to update the rows to mark them as deleted, then delete them later during non-critical time.

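The trickle-delete loop can be sketched with `sqlite3` as a stand-in for SQL Server. SQLite has no `DELETE TOP (n)`, so the batch is selected by `rowid` with a `LIMIT` instead; the table name, cutoff, and batch size are assumptions for the example. Each batch is committed on its own, which keeps individual transactions short.

```python
import sqlite3


def trickle_delete(conn, cutoff, batch_size=1000):
    """Delete rows older than `cutoff` in small, separately committed batches."""
    total = 0
    while True:
        cursor = conn.execute(
            "DELETE FROM fact_sales WHERE rowid IN ("
            "SELECT rowid FROM fact_sales WHERE date_key < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()  # one short transaction per batch
        if cursor.rowcount == 0:
            break      # nothing left to delete
        total += cursor.rowcount
    return total


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sales (date_key INTEGER, amount REAL)")
rows = [(20050101, 1.0)] * 2500 + [(20060101, 1.0)] * 500
conn.executemany("INSERT INTO fact_sales VALUES (?, ?)", rows)
conn.commit()

deleted = trickle_delete(conn, cutoff=20060101)
remaining = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
print(deleted, remaining)
```

If the loop is interrupted, only the batch in flight is rolled back, in contrast to the long rollback a single large DELETE can incur.
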
Manage statistics manually
- Statistics on partitioned tables are maintained for the table as a whole.
- Manually update statistics on large fact tables after loading new data.
- Manually update statistics after rebuilding the index on a partition.
- If you regularly update statistics after periodic loads, you may turn off autostats on that table.

Manage statistics manually (continued)
- This is important for optimizing queries that may need to read only the newest data.
- Updating statistics on small dimension tables after incremental loads may also help performance.
- Use the FULLSCAN option on update statistics on dimension tables for more accurate query plans.

Consider efficient backup strategies
- Backing up the entire database may take a significant amount of time for a very large database. For example, backing up a 2 TB database to a 10-spindle RAID-5 disk on a SAN may take 2 hours (at a rate of 275 MB/sec).
- Snapshot backup using SAN technology is a very good option.
- Reduce the volume of data to back up regularly.

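The two-hour figure follows directly from size divided by throughput, and the same arithmetic shows what reducing the regular backup volume is worth. The sketch below assumes 1 TB = 1024 × 1024 MB; the 200 GB read/write portion is a made-up figure used only to illustrate the comparison.

```python
def backup_hours(size_tb: float, rate_mb_per_sec: float) -> float:
    """Estimated backup duration in hours at a sustained throughput."""
    size_mb = size_tb * 1024 * 1024
    return size_mb / rate_mb_per_sec / 3600


full = backup_hours(2, 275)
# Hypothetical: only 200 GB of read/write filegroups backed up regularly.
partial = backup_hours(200 / 1024, 275)

print(f"full database: {full:.2f} h")
print(f"read/write filegroups only: {partial:.2f} h")
```

At 275 MB/sec the full 2 TB backup takes a little over two hours, which matches the slide, while the hypothetical 200 GB subset would take roughly a tenth of that.
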
Consider efficient backup strategies (continued)
- The filegroups for the historical partitions can be marked as READ ONLY.
- Perform a filegroup backup once when a filegroup becomes read-only.
- Perform regular backups only on the read / write filegroups.
- Note that RESTOREs of the read-only filegroups cannot be performed in parallel.


Weitere ähnliche Inhalte

Was ist angesagt?

How to Optimize Sales Analytics Using 10x the Data at 1/10th the Cost
How to Optimize Sales Analytics Using 10x the Data at 1/10th the CostHow to Optimize Sales Analytics Using 10x the Data at 1/10th the Cost
How to Optimize Sales Analytics Using 10x the Data at 1/10th the CostAtScale
 
Role of MySQL in Data Analytics, Warehousing
Role of MySQL in Data Analytics, WarehousingRole of MySQL in Data Analytics, Warehousing
Role of MySQL in Data Analytics, WarehousingVenu Anuganti
 
Roland bouman modern_data_warehouse_architectures_data_vault_and_anchor_model...
Roland bouman modern_data_warehouse_architectures_data_vault_and_anchor_model...Roland bouman modern_data_warehouse_architectures_data_vault_and_anchor_model...
Roland bouman modern_data_warehouse_architectures_data_vault_and_anchor_model...Roland Bouman
 
Designing high performance datawarehouse
Designing high performance datawarehouseDesigning high performance datawarehouse
Designing high performance datawarehouseUday Kothari
 
Big Data with SQL Server
Big Data with SQL ServerBig Data with SQL Server
Big Data with SQL ServerMark Kromer
 
Design Principles for a Modern Data Warehouse
Design Principles for a Modern Data WarehouseDesign Principles for a Modern Data Warehouse
Design Principles for a Modern Data WarehouseRob Winters
 
How To Buy Data Warehouse
How To Buy Data WarehouseHow To Buy Data Warehouse
How To Buy Data WarehouseEric Sun
 
Enterprise Data Lake - Scalable Digital
Enterprise Data Lake - Scalable DigitalEnterprise Data Lake - Scalable Digital
Enterprise Data Lake - Scalable Digitalsambiswal
 
Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureJames Serra
 
Anatomy of a data driven architecture - Tamir Dresher
Anatomy of a data driven architecture - Tamir Dresher   Anatomy of a data driven architecture - Tamir Dresher
Anatomy of a data driven architecture - Tamir Dresher Tamir Dresher
 
Testing the Data Warehouse―Big Data, Big Problems
Testing the Data Warehouse―Big Data, Big ProblemsTesting the Data Warehouse―Big Data, Big Problems
Testing the Data Warehouse―Big Data, Big ProblemsTechWell
 
Data Vault Vs Data Lake
Data Vault Vs Data LakeData Vault Vs Data Lake
Data Vault Vs Data LakeCalum Miller
 
Data lake analytics for the admin
Data lake analytics for the adminData lake analytics for the admin
Data lake analytics for the adminTillmann Eitelberg
 
Microsoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMicrosoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMark Kromer
 
Cloud Storage Spring Cleaning: A Treasure Hunt
Cloud Storage Spring Cleaning: A Treasure HuntCloud Storage Spring Cleaning: A Treasure Hunt
Cloud Storage Spring Cleaning: A Treasure HuntSteven Moy
 
Building a Data Lake - An App Dev's Perspective
Building a Data Lake - An App Dev's PerspectiveBuilding a Data Lake - An App Dev's Perspective
Building a Data Lake - An App Dev's PerspectiveGeekNightHyderabad
 
From Hadoop to Enterprise Data Warehouse
From Hadoop to Enterprise Data WarehouseFrom Hadoop to Enterprise Data Warehouse
From Hadoop to Enterprise Data WarehouseBui Ha
 
Agile Data Warehouse Modeling: Introduction to Data Vault Data Modeling
Agile Data Warehouse Modeling: Introduction to Data Vault Data ModelingAgile Data Warehouse Modeling: Introduction to Data Vault Data Modeling
Agile Data Warehouse Modeling: Introduction to Data Vault Data ModelingKent Graziano
 

Was ist angesagt? (20)

Data warehouse
Data warehouseData warehouse
Data warehouse
 
Data lake
Data lakeData lake
Data lake
 
How to Optimize Sales Analytics Using 10x the Data at 1/10th the Cost
How to Optimize Sales Analytics Using 10x the Data at 1/10th the CostHow to Optimize Sales Analytics Using 10x the Data at 1/10th the Cost
How to Optimize Sales Analytics Using 10x the Data at 1/10th the Cost
 
Role of MySQL in Data Analytics, Warehousing
Role of MySQL in Data Analytics, WarehousingRole of MySQL in Data Analytics, Warehousing
Role of MySQL in Data Analytics, Warehousing
 
Roland bouman modern_data_warehouse_architectures_data_vault_and_anchor_model...
Roland bouman modern_data_warehouse_architectures_data_vault_and_anchor_model...Roland bouman modern_data_warehouse_architectures_data_vault_and_anchor_model...
Roland bouman modern_data_warehouse_architectures_data_vault_and_anchor_model...
 
Designing high performance datawarehouse
Designing high performance datawarehouseDesigning high performance datawarehouse
Designing high performance datawarehouse
 
Big Data with SQL Server
Big Data with SQL ServerBig Data with SQL Server
Big Data with SQL Server
 
Design Principles for a Modern Data Warehouse
Design Principles for a Modern Data WarehouseDesign Principles for a Modern Data Warehouse
Design Principles for a Modern Data Warehouse
 
How To Buy Data Warehouse
How To Buy Data WarehouseHow To Buy Data Warehouse
How To Buy Data Warehouse
 
Enterprise Data Lake - Scalable Digital
Enterprise Data Lake - Scalable DigitalEnterprise Data Lake - Scalable Digital
Enterprise Data Lake - Scalable Digital
 
Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse Architecture
 
Anatomy of a data driven architecture - Tamir Dresher
Anatomy of a data driven architecture - Tamir Dresher   Anatomy of a data driven architecture - Tamir Dresher
Anatomy of a data driven architecture - Tamir Dresher
 
Testing the Data Warehouse―Big Data, Big Problems
Testing the Data Warehouse―Big Data, Big ProblemsTesting the Data Warehouse―Big Data, Big Problems
Testing the Data Warehouse―Big Data, Big Problems
 
Data Vault Vs Data Lake
Data Vault Vs Data LakeData Vault Vs Data Lake
Data Vault Vs Data Lake
 
Data lake analytics for the admin
Data lake analytics for the adminData lake analytics for the admin
Data lake analytics for the admin
 
Microsoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMicrosoft Azure Big Data Analytics
Microsoft Azure Big Data Analytics
 
Cloud Storage Spring Cleaning: A Treasure Hunt
Cloud Storage Spring Cleaning: A Treasure HuntCloud Storage Spring Cleaning: A Treasure Hunt
Cloud Storage Spring Cleaning: A Treasure Hunt
 
Building a Data Lake - An App Dev's Perspective
Building a Data Lake - An App Dev's PerspectiveBuilding a Data Lake - An App Dev's Perspective
Building a Data Lake - An App Dev's Perspective
 
From Hadoop to Enterprise Data Warehouse
From Hadoop to Enterprise Data WarehouseFrom Hadoop to Enterprise Data Warehouse
From Hadoop to Enterprise Data Warehouse
 
Agile Data Warehouse Modeling: Introduction to Data Vault Data Modeling
Agile Data Warehouse Modeling: Introduction to Data Vault Data ModelingAgile Data Warehouse Modeling: Introduction to Data Vault Data Modeling
Agile Data Warehouse Modeling: Introduction to Data Vault Data Modeling
 

Ähnlich wie Data Warehouse Best Practices

Large scale sql server best practices
Large scale sql server   best practicesLarge scale sql server   best practices
Large scale sql server best practicesmprabhuram
 
Optimize access
Optimize accessOptimize access
Optimize accessAla Esmail
 
http://www.hfadeel.com/Blog/?p=151
http://www.hfadeel.com/Blog/?p=151http://www.hfadeel.com/Blog/?p=151
http://www.hfadeel.com/Blog/?p=151xlight
 
How to Fine-Tune Performance Using Amazon Redshift
How to Fine-Tune Performance Using Amazon RedshiftHow to Fine-Tune Performance Using Amazon Redshift
How to Fine-Tune Performance Using Amazon RedshiftAWS Germany
 
The thinking persons guide to data warehouse design
The thinking persons guide to data warehouse designThe thinking persons guide to data warehouse design
The thinking persons guide to data warehouse designCalpont
 
The High Performance DBA Optimizing Databases For High Performance
The High Performance DBA Optimizing Databases For High PerformanceThe High Performance DBA Optimizing Databases For High Performance
The High Performance DBA Optimizing Databases For High PerformanceEmbarcadero Technologies
 
Myth busters - performance tuning 102 2008
Myth busters - performance tuning 102 2008Myth busters - performance tuning 102 2008
Myth busters - performance tuning 102 2008paulguerin
 
Tips for Database Performance
Tips for Database PerformanceTips for Database Performance
Tips for Database PerformanceKesavan Munuswamy
 
MySQL 8 Server Optimization Swanseacon 2018
MySQL 8 Server Optimization Swanseacon 2018MySQL 8 Server Optimization Swanseacon 2018
MySQL 8 Server Optimization Swanseacon 2018Dave Stokes
 
Sql interview question part 5
Sql interview question part 5Sql interview question part 5
Sql interview question part 5kaashiv1
 
MySQL 8 Tips and Tricks from Symfony USA 2018, San Francisco
MySQL 8 Tips and Tricks from Symfony USA 2018, San FranciscoMySQL 8 Tips and Tricks from Symfony USA 2018, San Francisco
MySQL 8 Tips and Tricks from Symfony USA 2018, San FranciscoDave Stokes
 
Perl and Elasticsearch
Perl and ElasticsearchPerl and Elasticsearch
Perl and ElasticsearchDean Hamstead
 
Crystal xcelsius best practices and workflows for building enterprise solut...
Crystal xcelsius   best practices and workflows for building enterprise solut...Crystal xcelsius   best practices and workflows for building enterprise solut...
Crystal xcelsius best practices and workflows for building enterprise solut...Yogeeswar Reddy
 
SQL Server 2008 Development for Programmers
SQL Server 2008 Development for ProgrammersSQL Server 2008 Development for Programmers
SQL Server 2008 Development for ProgrammersAdam Hutson
 
World-class Data Engineering with Amazon Redshift
World-class Data Engineering with Amazon RedshiftWorld-class Data Engineering with Amazon Redshift
World-class Data Engineering with Amazon RedshiftLars Kamp
 
Applyinga blockcentricapproach
Applyinga blockcentricapproachApplyinga blockcentricapproach
Applyinga blockcentricapproachoracle documents
 
PostgreSQL Table Partitioning / Sharding
PostgreSQL Table Partitioning / ShardingPostgreSQL Table Partitioning / Sharding
PostgreSQL Table Partitioning / ShardingAmir Reza Hashemi
 

Ähnlich wie Data Warehouse Best Practices (20)

Large scale sql server best practices
Large scale sql server   best practicesLarge scale sql server   best practices
Large scale sql server best practices
 
Optimize access
Optimize accessOptimize access
Optimize access
 
http://www.hfadeel.com/Blog/?p=151
http://www.hfadeel.com/Blog/?p=151http://www.hfadeel.com/Blog/?p=151
http://www.hfadeel.com/Blog/?p=151
 
How to Fine-Tune Performance Using Amazon Redshift
How to Fine-Tune Performance Using Amazon RedshiftHow to Fine-Tune Performance Using Amazon Redshift
How to Fine-Tune Performance Using Amazon Redshift
 
Managing SQLserver
Managing SQLserverManaging SQLserver
Managing SQLserver
 
The thinking persons guide to data warehouse design
The thinking persons guide to data warehouse designThe thinking persons guide to data warehouse design
The thinking persons guide to data warehouse design
 
The High Performance DBA Optimizing Databases For High Performance
The High Performance DBA Optimizing Databases For High PerformanceThe High Performance DBA Optimizing Databases For High Performance
The High Performance DBA Optimizing Databases For High Performance
 
Myth busters - performance tuning 102 2008
Myth busters - performance tuning 102 2008Myth busters - performance tuning 102 2008
Myth busters - performance tuning 102 2008
 
Tips for Database Performance
Tips for Database PerformanceTips for Database Performance
Tips for Database Performance
 
MySQL 8 Server Optimization Swanseacon 2018
MySQL 8 Server Optimization Swanseacon 2018MySQL 8 Server Optimization Swanseacon 2018
MySQL 8 Server Optimization Swanseacon 2018
 
Ebook5
Ebook5Ebook5
Ebook5
 
Sql interview question part 5
Sql interview question part 5Sql interview question part 5
Sql interview question part 5
 
MySQL 8 Tips and Tricks from Symfony USA 2018, San Francisco
MySQL 8 Tips and Tricks from Symfony USA 2018, San FranciscoMySQL 8 Tips and Tricks from Symfony USA 2018, San Francisco
MySQL 8 Tips and Tricks from Symfony USA 2018, San Francisco
 
Mysql For Developers
Mysql For DevelopersMysql For Developers
Mysql For Developers
 
Perl and Elasticsearch
Perl and ElasticsearchPerl and Elasticsearch
Perl and Elasticsearch
 
Crystal xcelsius best practices and workflows for building enterprise solut...
Crystal xcelsius   best practices and workflows for building enterprise solut...Crystal xcelsius   best practices and workflows for building enterprise solut...
Crystal xcelsius best practices and workflows for building enterprise solut...
 
SQL Server 2008 Development for Programmers
SQL Server 2008 Development for ProgrammersSQL Server 2008 Development for Programmers
SQL Server 2008 Development for Programmers
 
World-class Data Engineering with Amazon Redshift
World-class Data Engineering with Amazon RedshiftWorld-class Data Engineering with Amazon Redshift
World-class Data Engineering with Amazon Redshift
 
Applyinga blockcentricapproach
Applyinga blockcentricapproachApplyinga blockcentricapproach
Applyinga blockcentricapproach
 
PostgreSQL Table Partitioning / Sharding
PostgreSQL Table Partitioning / ShardingPostgreSQL Table Partitioning / Sharding
PostgreSQL Table Partitioning / Sharding
 

Mehr von Eduardo Castro

Introducción a polybase en SQL Server
Introducción a polybase en SQL ServerIntroducción a polybase en SQL Server
Introducción a polybase en SQL ServerEduardo Castro
 
Creando tu primer ambiente de AI en Azure ML y SQL Server
Creando tu primer ambiente de AI en Azure ML y SQL ServerCreando tu primer ambiente de AI en Azure ML y SQL Server
Creando tu primer ambiente de AI en Azure ML y SQL ServerEduardo Castro
 
Seguridad en SQL Azure
Seguridad en SQL AzureSeguridad en SQL Azure
Seguridad en SQL AzureEduardo Castro
 
Azure Synapse Analytics MLflow
Azure Synapse Analytics MLflowAzure Synapse Analytics MLflow
Azure Synapse Analytics MLflowEduardo Castro
 
SQL Server 2019 con Windows Server 2022
SQL Server 2019 con Windows Server 2022SQL Server 2019 con Windows Server 2022
SQL Server 2019 con Windows Server 2022Eduardo Castro
 
Novedades en SQL Server 2022
Novedades en SQL Server 2022Novedades en SQL Server 2022
Novedades en SQL Server 2022Eduardo Castro
 
Introduccion a SQL Server 2022
Introduccion a SQL Server 2022Introduccion a SQL Server 2022
Introduccion a SQL Server 2022Eduardo Castro
 
Machine Learning con Azure Managed Instance
Machine Learning con Azure Managed InstanceMachine Learning con Azure Managed Instance
Machine Learning con Azure Managed InstanceEduardo Castro
 
Novedades en sql server 2022
Novedades en sql server 2022Novedades en sql server 2022
Novedades en sql server 2022Eduardo Castro
 
Sql server 2019 con windows server 2022
Sql server 2019 con windows server 2022Sql server 2019 con windows server 2022
Sql server 2019 con windows server 2022Eduardo Castro
 
Introduccion a databricks
Introduccion a databricksIntroduccion a databricks
Introduccion a databricksEduardo Castro
 
Pronosticos con sql server
Pronosticos con sql serverPronosticos con sql server
Pronosticos con sql serverEduardo Castro
 
Data warehouse con azure synapse analytics
Data warehouse con azure synapse analyticsData warehouse con azure synapse analytics
Data warehouse con azure synapse analyticsEduardo Castro
 
Que hay de nuevo en el Azure Data Lake Storage Gen2
Que hay de nuevo en el Azure Data Lake Storage Gen2Que hay de nuevo en el Azure Data Lake Storage Gen2
Que hay de nuevo en el Azure Data Lake Storage Gen2Eduardo Castro
 
Introduccion a Azure Synapse Analytics
Introduccion a Azure Synapse AnalyticsIntroduccion a Azure Synapse Analytics
Introduccion a Azure Synapse AnalyticsEduardo Castro
 
Seguridad de SQL Database en Azure
Seguridad de SQL Database en AzureSeguridad de SQL Database en Azure
Seguridad de SQL Database en AzureEduardo Castro
 
Python dentro de SQL Server
Python dentro de SQL ServerPython dentro de SQL Server
Python dentro de SQL ServerEduardo Castro
 
Servicios Cognitivos de de Microsoft
Servicios Cognitivos de de Microsoft Servicios Cognitivos de de Microsoft
Servicios Cognitivos de de Microsoft Eduardo Castro
 
Script de paso a paso de configuración de Secure Enclaves
Script de paso a paso de configuración de Secure EnclavesScript de paso a paso de configuración de Secure Enclaves
Script de paso a paso de configuración de Secure EnclavesEduardo Castro
 
Introducción a conceptos de SQL Server Secure Enclaves
Introducción a conceptos de SQL Server Secure EnclavesIntroducción a conceptos de SQL Server Secure Enclaves
Introducción a conceptos de SQL Server Secure EnclavesEduardo Castro
 

Mehr von Eduardo Castro (20)

Introducción a polybase en SQL Server
Introducción a polybase en SQL ServerIntroducción a polybase en SQL Server
Introducción a polybase en SQL Server
 
Creando tu primer ambiente de AI en Azure ML y SQL Server
Creando tu primer ambiente de AI en Azure ML y SQL ServerCreando tu primer ambiente de AI en Azure ML y SQL Server
Creando tu primer ambiente de AI en Azure ML y SQL Server
 
Seguridad en SQL Azure
Seguridad en SQL AzureSeguridad en SQL Azure
Seguridad en SQL Azure
 
Azure Synapse Analytics MLflow
Azure Synapse Analytics MLflowAzure Synapse Analytics MLflow
Azure Synapse Analytics MLflow
 
SQL Server 2019 con Windows Server 2022
SQL Server 2019 con Windows Server 2022SQL Server 2019 con Windows Server 2022
SQL Server 2019 con Windows Server 2022
 
Novedades en SQL Server 2022
Novedades en SQL Server 2022Novedades en SQL Server 2022
Novedades en SQL Server 2022
 
Data Warehouse Best Practices

  • 1. Data Warehouse Best Practices. Dr. Eduardo Castro, MSc. ecastro@simsasys.com, http://ecastrom.blogspot.com, http://comunidadwindows.org, http://tiny.cc/comwindows. Facebook: ecastrom. Twitter: edocastro
  • 2. Sources: This presentation is based on the following sources: Datawarehouse (Ravi RanJan); Top 10 Best Practices for Building a Large Scale Relational Data Warehouse (SQL CAT).
  • 3. Complexities of Creating a Data Warehouse: Incomplete errors: missing fields; records or fields that, by design, are not being recorded. Incorrect errors: wrong calculations and aggregations; duplicate records; wrong information entered into the source system. Source. Datawarehouse. Ravi RanJan
  • 4. Data Warehouse Pitfalls: You are going to spend a lot of time extracting, cleaning, and loading data. You are going to find problems with the systems feeding the data warehouse. You will find the need to store and validate data that no existing system captures or validates. Large scale data warehousing can become an exercise in data homogenizing.
  • 5. Data Warehouse Pitfalls…: The time it takes to load the warehouse will expand to fill the available window... and then some. You are building a HIGH maintenance system. You will fail if you concentrate on resource optimization to the neglect of project, data, and customer management issues and an understanding of what adds value to the customer.
  • 6. Best Practices: Complete the requirements and design. Prototyping is key to business understanding. Use proper aggregations and detailed data. Training is an ongoing process. Build data integrity checks into your system.
  • 7. Top 10 Best Practices for Building a Large Scale Relational Data Warehouse: Building a large scale relational data warehouse is a complex task. This section describes some design techniques that can help in architecting an efficient large scale relational data warehouse with SQL Server. Most large scale data warehouses use table and index partitioning, and therefore many of the recommendations here involve partitioning. Most of these tips are based on experiences building large data warehouses on SQL Server. Source. Top 10 Best Practices for Building Large Scale Relational Data Warehouse SQL CAT
  • 8. Consider partitioning large fact tables: Consider partitioning fact tables that are 50 to 100 GB or larger. Partitioning can provide manageability and often performance benefits: faster, more granular index maintenance; more flexible backup / restore options; faster data loading and deleting; faster queries when restricted to a single partition. Typically, partition the fact table on the date key. This enables the sliding window technique and partition elimination.
  • 9. Build clustered index on the date key of the fact table: This supports efficient queries to populate cubes or retrieve a historical data slice. If you load data in a batch window, use the options ALLOW_ROW_LOCKS = OFF and ALLOW_PAGE_LOCKS = OFF for the clustered index on the fact table. This helps speed up table scan operations during query time and helps avoid excessive locking activity during large updates.
  • 10. Build clustered index on the date key of the fact table: Build nonclustered indexes for each foreign key. This helps 'pinpoint queries' to extract rows based on a selective dimension predicate. Use filegroups for administration requirements such as backup / restore, partial database availability, etc.
  • 11. Choose partition grain carefully: Most customers use month, quarter, or year. For efficient deletes, you must delete one full partition at a time. It is faster to load a complete partition at a time. Daily partitions for daily loads may be an attractive option. However, keep in mind that a table can have a maximum of 1000 partitions (the limit in SQL Server 2005 and 2008; SQL Server 2012 and later allow up to 15,000). Partition grain affects query parallelism.
  • 12. Choose partition grain carefully: For SQL Server 2005: Queries touching a single partition can parallelize up to MAXDOP (maximum degree of parallelism). Queries touching multiple partitions use one thread per partition, up to MAXDOP.
  • 13. Choose partition grain carefully: For SQL Server 2008: Parallel threads up to MAXDOP are distributed proportionally to scan partitions, and multiple threads per partition may be used even when several partitions must be scanned. Avoid a partition design where only 2 or 3 partitions are touched by frequent queries, if you need MAXDOP parallelism (assuming MAXDOP = 4 or larger).
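
The trade-off between partition grain and the 1000-partition limit quoted on the slides can be checked with simple date arithmetic. The following is a minimal Python sketch, not part of the original deck; the function name `partitions_needed` and the three-year retention window are assumptions chosen for illustration.

```python
from datetime import date

# Limit quoted on the slides (SQL Server 2005/2008).
MAX_PARTITIONS = 1000


def partitions_needed(start: date, end: date, grain: str) -> int:
    """Count the partitions needed to cover start..end (inclusive) at a grain."""
    if grain == "day":
        return (end - start).days + 1
    if grain == "month":
        return (end.year - start.year) * 12 + (end.month - start.month) + 1
    if grain == "quarter":
        first_q = start.year * 4 + (start.month - 1) // 3
        last_q = end.year * 4 + (end.month - 1) // 3
        return last_q - first_q + 1
    if grain == "year":
        return end.year - start.year + 1
    raise ValueError(f"unknown grain: {grain}")


# Hypothetical retention window: three calendar years of fact data.
window_start, window_end = date(2006, 1, 1), date(2008, 12, 31)
for grain in ("day", "month", "quarter", "year"):
    count = partitions_needed(window_start, window_end, grain)
    verdict = "within" if count <= MAX_PARTITIONS else "over"
    print(f"{grain}: {count} partitions ({verdict} the limit)")
```

Under these assumptions, daily partitions over three years already exceed 1000, while monthly partitions stay far below it.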
  • 14. Design dimension tables appropriately: Use integer surrogate keys for all dimensions other than the Date dimension. Use the smallest possible integer type for the dimension surrogate keys. This helps keep the fact table narrow. Use a meaningful date key of integer type derivable from the DATETIME data type (for example: 20060215). Don't use a surrogate key for the Date dimension.
  • 15. Design dimension tables appropriately: Build a clustered index on the surrogate key for each dimension table. Build a nonclustered index on the business key (potentially combined with a row-effective-date) to support surrogate key lookups during loads. Build nonclustered indexes on other frequently searched dimension columns. Avoid partitioning dimension tables.
  • 16. Design dimension tables appropriately: Avoid enforcing foreign key relationships between the fact and the dimension tables, to allow faster data loads. You can create foreign key constraints with NOCHECK to document the relationships, but don't enforce them. Ensure data integrity through Transform Lookups, or perform the data integrity checks at the source of the data.
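
The meaningful integer date key recommended for the Date dimension (for example, 20060215) can be derived from a date with plain arithmetic. The sketch below is an illustration in Python rather than anything taken from the deck; `to_date_key` and `from_date_key` are hypothetical helper names.

```python
from datetime import date


def to_date_key(d: date) -> int:
    """Encode a date as a YYYYMMDD integer, e.g. 2006-02-15 -> 20060215."""
    return d.year * 10000 + d.month * 100 + d.day


def from_date_key(key: int) -> date:
    """Decode a YYYYMMDD integer back into a date."""
    year, rest = divmod(key, 10000)
    month, day = divmod(rest, 100)
    return date(year, month, day)


key = to_date_key(date(2006, 2, 15))
print(key)                 # 20060215
print(from_date_key(key))  # 2006-02-15
```

Because the key sorts in the same order as the underlying date, range predicates on the key behave like range predicates on the date itself.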
  • 17. Write effective queries for partition elimination: Whenever possible, place a query predicate (WHERE condition) directly on the partitioning key (Date dimension key) of the fact table.
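
To see why a predicate on the partitioning key matters, the following Python sketch models which partitions a date-key range can touch. It is a simplified simulation, not SQL Server's implementation; it assumes RANGE RIGHT boundaries (each boundary value belongs to the partition on its right), and the monthly boundary values are made up for the example.

```python
from bisect import bisect_right


def partition_number(boundaries, key):
    """1-based partition number for a key, assuming RANGE RIGHT boundaries."""
    return bisect_right(boundaries, key) + 1


def partitions_touched(boundaries, low=None, high=None):
    """Partitions that can hold rows with low <= key <= high.

    A missing bound models a query with no predicate on that side, so the
    scan has to extend to the first or last partition.
    """
    first = 1 if low is None else partition_number(boundaries, low)
    last = len(boundaries) + 1 if high is None else partition_number(boundaries, high)
    return list(range(first, last + 1))


# Hypothetical monthly boundaries on an integer date key.
boundaries = [20060101, 20060201, 20060301, 20060401]

print(partitions_touched(boundaries, 20060210, 20060220))  # [3]
print(partitions_touched(boundaries))                      # [1, 2, 3, 4, 5]
```

With the range predicate, only one partition qualifies; without it, all five have to be considered.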
  • 18. Use Sliding Window technique to maintain data: Maintain a rolling time window for online access to the fact tables: load the newest data, unload the oldest data. Always keep empty partitions at both ends of the partition range to guarantee that the partition split (before loading new data) and partition merge (after unloading old data) do not incur any data movement. Avoid splitting or merging populated partitions. This can be extremely inefficient, as it may cause as much as 4 times more log generation and also cause severe locking.
  • 19. Use Sliding Window technique to maintain data: Create the load staging table in the same filegroup as the partition you are loading. Create the unload staging table in the same filegroup as the partition you are deleting. It is fastest to load the newest full partition at one time, but this is only possible when the partition size is equal to the data load frequency (for example, you have one partition per day, and you load data once per day).
  • 20. Use Sliding Window technique to maintain data: If the partition size doesn't match the data load frequency, incrementally load the latest partition. Various options for loading bulk data into a partitioned table are discussed in the whitepaper http://www.microsoft.com/technet/prodtechnol/sql/bestpractice/loading_bulk_data_partitioned_table.mspx. Always unload one partition at a time.
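
The order of operations in one sliding-window cycle can be sketched as follows. This Python helper only assembles T-SQL statements as strings and does not connect to a server. All object names (`pf_date`, `ps_date`, `dbo.FactSales`, the staging tables, the filegroup) and the boundary and partition numbers are hypothetical, and the sketch assumes a RANGE RIGHT partition function with an empty partition at each end, as the slides recommend.

```python
def sliding_window_statements(
    table,
    load_staging,
    unload_staging,
    partition_function,
    partition_scheme,
    next_filegroup,
    new_boundary,
    load_partition,
    oldest_boundary,
    oldest_partition,
):
    """Return the T-SQL statements for one sliding-window cycle, in order."""
    return [
        # 1. Name the filegroup that the next new partition will use.
        f"ALTER PARTITION SCHEME {partition_scheme} NEXT USED [{next_filegroup}];",
        # 2. Split the empty partition at the newest end (no data movement).
        f"ALTER PARTITION FUNCTION {partition_function}() SPLIT RANGE ({new_boundary});",
        # 3. Switch the fully loaded staging table into the new partition.
        f"ALTER TABLE {load_staging} SWITCH TO {table} PARTITION {load_partition};",
        # 4. Switch the oldest populated partition out to the unload staging table.
        f"ALTER TABLE {table} SWITCH PARTITION {oldest_partition} TO {unload_staging};",
        # 5. Merge away the now-empty oldest partition (no data movement).
        f"ALTER PARTITION FUNCTION {partition_function}() MERGE RANGE ({oldest_boundary});",
    ]


# Hypothetical monthly layout: boundaries 20060101 .. 20070101, so partition 1
# (before 20060101) and partition 14 (from 20070101 on) start out empty.
statements = sliding_window_statements(
    table="dbo.FactSales",
    load_staging="dbo.FactSales_Load",
    unload_staging="dbo.FactSales_Unload",
    partition_function="pf_date",
    partition_scheme="ps_date",
    next_filegroup="FG_2007_02",
    new_boundary=20070201,
    load_partition=14,
    oldest_boundary=20060101,
    oldest_partition=2,
)
for statement in statements:
    print(statement)
```

Both the split and the merge act on empty partitions here, which is the condition the slides give for avoiding data movement.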
  • 21. Efficiently load the initial data: Use the SIMPLE or BULK_LOGGED recovery model during the initial data load. Create the partitioned fact table with the clustered index. Create non-indexed staging tables for each partition, and separate source data files for populating each partition. Populate the staging tables in parallel. Use multiple BULK INSERT, BCP, or SSIS tasks.
  • 22. Efficiently load the initial data: Create as many load scripts to run in parallel as there are CPUs, if there is no IO bottleneck. If IO bandwidth is limited, use fewer scripts in parallel. Use 0 batch size in the load. Use 0 commit size in the load. Use TABLOCK. Use BULK INSERT if the sources are flat files on the same server. Use BCP or SSIS if data is being pushed from remote machines.
  • 23. Efficiently load the initial data: Build a clustered index on each staging table, then create appropriate CHECK constraints. SWITCH all partitions into the partitioned table. Build nonclustered indexes on the partitioned table. It is possible to load 1 TB in under an hour on a 64-CPU server with a SAN capable of 14 GB/sec throughput (non-indexed table).
  • 24. Efficiently delete old data: Use partition switching whenever possible. To delete millions of rows from nonpartitioned, indexed tables, avoid DELETE FROM ... WHERE ..., which causes huge locking and logging issues and a long rollback if the delete is canceled. It is usually faster to INSERT the records to keep into a non-indexed table, create the index(es) on that table, and rename the new table to replace the original.
  • 25. Efficiently delete old data: As an alternative, 'trickle' deletes by running the following repeatedly in a loop: DELETE TOP (1000) ... ; COMMIT. Another alternative is to update the row to mark it as deleted, then delete it later during a non-critical time.
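
The 'trickle' delete loop can be illustrated with a small runnable example. This sketch uses Python's built-in `sqlite3` module instead of SQL Server so that it runs anywhere; SQLite has no `DELETE TOP`, so each batch is selected with a `rowid` subquery and `LIMIT`, which is a substitute for the T-SQL form on the slide. The table name `fact_sales`, its columns, and the row counts are made up for the demonstration.

```python
import sqlite3


def trickle_delete(conn, cutoff_key, batch_size=1000):
    """Delete rows older than cutoff_key in small batches, committing each one."""
    total = 0
    while True:
        cursor = conn.execute(
            "DELETE FROM fact_sales WHERE rowid IN ("
            " SELECT rowid FROM fact_sales WHERE date_key < ? LIMIT ?)",
            (cutoff_key, batch_size),
        )
        conn.commit()  # short transactions keep locking and rollback cost small
        if cursor.rowcount == 0:
            break
        total += cursor.rowcount
    return total


# Demo data: 2500 old rows to remove and 500 current rows to keep.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sales (date_key INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?)",
    [(20050101, 1.0)] * 2500 + [(20060101, 1.0)] * 500,
)
conn.commit()

deleted = trickle_delete(conn, cutoff_key=20060101)
remaining = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
print(deleted, remaining)  # 2500 500
```

Each pass removes at most `batch_size` rows and commits, so a canceled run loses only the batch in flight.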
  • 26. Manage statistics manually: Statistics on partitioned tables are maintained for the table as a whole. Manually update statistics on large fact tables after loading new data. Manually update statistics after rebuilding the index on a partition. If you regularly update statistics after periodic loads, you may turn off autostats on that table.
  • 27. Manage statistics manually: This is important for optimizing queries that may need to read only the newest data. Updating statistics on small dimension tables after incremental loads may also help performance. Use the FULLSCAN option when updating statistics on dimension tables for more accurate query plans.
  • 28. Consider efficient backup strategies: Backing up the entire database may take a significant amount of time for a very large database. For example, backing up a 2 TB database to a 10-spindle RAID-5 disk on a SAN may take 2 hours (at a rate of 275 MB/sec). Snapshot backup using SAN technology is a very good option. Reduce the volume of data to back up regularly.
  • 29. Consider efficient backup strategies: The filegroups for the historical partitions can be marked as READ ONLY. Perform a filegroup backup once when a filegroup becomes read-only. Perform regular backups only on the read / write filegroups. Note that RESTOREs of the read-only filegroups cannot be performed in parallel.
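
The backup-time figure quoted on the slides (2 TB at 275 MB/sec taking about 2 hours) can be reproduced with simple arithmetic, and the same calculation shows the effect of backing up only the read / write filegroups. This is a rough Python sketch: it assumes decimal units (1 TB = 1,000,000 MB), and the 200 GB read / write volume is a hypothetical figure chosen only for illustration.

```python
def backup_hours(size_tb, rate_mb_per_sec):
    """Estimated backup duration in hours at a sustained throughput."""
    size_mb = size_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    return size_mb / rate_mb_per_sec / 3600


full = backup_hours(2.0, 275)        # whole 2 TB database, as on the slide
read_write = backup_hours(0.2, 275)  # hypothetical 200 GB of read / write filegroups

print(f"full database: {full:.2f} h")
print(f"read / write filegroups only: {read_write * 60:.0f} min")
```

Under these assumptions the full backup takes roughly two hours, while the regular backup of the read / write filegroups takes minutes.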