Data Warehousing with FastTrack and PDW



 Assaf Fraenkel, Lead Architect, MCS
 Oded Shihor, Senior Solution Architect, HP
Which car should you buy?
       Is it a question of price?!
Agenda

Motivation
Fast Track Offering
 – Balanced Architecture Approach for DW
 – Example FastTrack Reference Architectures
 – Optimizing Storage, Load and Maintenance
 – Case Studies
Parallel Data Warehouse Offering Overview
Some SQL Data Warehouses today

Big SAN
Big SMP Server
Connected together




              What’s wrong with this picture?
Answer: system out of balance

 This server can consume 16 GB/Sec of IO, but the SAN can
 only deliver 2 GB/Sec
 – Even when the SAN is dedicated to the SQL Data Warehouse, which
   it often isn’t
 – Lots of disks for random IOPS, BUT
 – Limited controllers → limited IO bandwidth
 System is typically IO bound
 Queries are slow


   Result: significant investment, not delivering performance
You can get more sophisticated…

Realize that queries performing complex calculations,
format conversions, multi-dimension hash joins, etc. will be
more CPU-intensive than others
 – Complex queries will consume data at a slower per-core rate
   than simpler queries
Alternative: Measure per-core data consumption for a
variety of queries, and take the weighted average
 – A standard approach to capacity planning
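The weighted average described on this slide can be sketched as follows. The query classes, rates, and workload shares are hypothetical placeholders, not measurements from the presentation.

```python
# Hypothetical measurements: (query class, per-core MB/sec, share of workload).
measurements = [
    ("simple scan", 300.0, 0.5),
    ("aggregation", 200.0, 0.3),
    ("complex join", 100.0, 0.2),
]


def weighted_consumption_rate(samples):
    """Return the workload-weighted average per-core consumption rate in MB/sec."""
    total_weight = sum(weight for _, _, weight in samples)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(rate * weight for _, rate, weight in samples) / total_weight


average_rate = weighted_consumption_rate(measurements)
print(f"Weighted per-core consumption: {average_rate:.0f} MB/sec")
```

Replace the sample tuples with rates measured on your own queries.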
Or you can leave it to us…

We’ve measured a mix of TPC-H queries that reflect a
‘prototype’ Data Warehouse workload
Concluded that SQL Server 2008 R2 on current x64 cores
consumes ~200 MB/Sec per core on average for this
workload
We use this as a basis for the published reference
architectures
Your mileage will vary!
 – For precise system sizing, measure your own workload
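A minimal sizing sketch based on the ~200 MB/Sec per core figure. The 80-core server is a hypothetical count chosen so that demand matches the 16 GB/Sec server from the earlier slide; the 2 GB/Sec storage figure is the SAN from that same slide.

```python
PER_CORE_MB_PER_SEC = 200  # from the slide: ~200 MB/Sec per core for this workload


def required_io_mb_per_sec(cores, per_core=PER_CORE_MB_PER_SEC):
    """IO bandwidth the CPUs can consume, in MB/sec."""
    return cores * per_core


def balance_ratio(cores, storage_mb_per_sec, per_core=PER_CORE_MB_PER_SEC):
    """Fraction of CPU demand the storage can feed; 1.0 or more means balanced."""
    return storage_mb_per_sec / required_io_mb_per_sec(cores, per_core)


cores = 80             # hypothetical core count
san_mb_per_sec = 2000  # the SAN that "can only deliver 2 GB/Sec"

print(required_io_mb_per_sec(cores), "MB/sec needed")
print(f"Storage covers {balance_ratio(cores, san_mb_per_sec):.1%} of CPU demand")
```

As the slide says, your mileage will vary: substitute a per-core rate measured on your own workload.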
Potential Performance Bottlenecks


 [Diagram: the IO path from CPU to disk, with the rate to check at each component]
 SERVER (CPU cores, Windows, SQL Server, cache) → FC HBA A/B → FC switch A/B → storage controller A/B (cache) → LUNs → disks

 Rates along the path: CPU Feed Rate; SQL Server Read Ahead Rate; HBA Port Rate; Switch Port Rate; SP Port Rate; LUN Read Rate; Disk Feed Rate
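One way to use the rates named on this slide: end-to-end throughput is capped by the slowest stage. The stage names come from the slide; every number below is a hypothetical placeholder.

```python
# Stage names from the slide; all rates (MB/sec) are hypothetical.
stage_rates_mb_per_sec = {
    "CPU feed rate": 3200,
    "SQL Server read-ahead rate": 3000,
    "HBA port rate": 2800,
    "Switch port rate": 3200,
    "SP port rate": 2400,
    "LUN read rate": 2600,
    "Disk feed rate": 2500,
}


def find_bottleneck(rates):
    """Return (stage, rate) for the slowest stage, which limits the whole path."""
    stage = min(rates, key=rates.get)
    return stage, rates[stage]


stage, rate = find_bottleneck(stage_rates_mb_per_sec)
print(f"Bottleneck: {stage} at {rate} MB/sec")
```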
The Alternative: A Balanced System
 Design a server + storage configuration that can deliver all the IO
 bandwidth that CPUs can consume when executing a SQL Relational
 DW workload

 Avoid sharing storage devices among servers

 Avoid overinvesting in disk drives
  – Focus on scan performance, not IOPS


 Layout and manage data to maximize range scan performance and
 minimize fragmentation
Microsoft Data Warehousing – Product Offering
 [Chart: four offerings plotted against Scale, Complexity, HA by default, and SW-HW integration]
 1  SQL Server 2008 R2: Minimal HW tune-up/optimization. Supports mixed workloads.
 2  SQL Server 2008 R2 with Fast Track Reference Architecture: Balanced solution for mostly scan-centric workloads.
 3  PDW: Max HW tune-up for most DW scenarios.
 4  PDW with Hub-and-spoke: Most flexible architecture for handling all DW scenarios.
Agenda

Motivation
Fast Track Offering
 – Balanced Architecture Approach for DW
 – Example FastTrack Reference Architectures
 – Optimizing Storage, Load and Maintenance
Parallel Data Warehouse Offering Overview
SQL Server Fast Track Data Warehouse
Solution to help customers and partners accelerate their data warehouse deployments




 A methodology for designing a cost-effective, balanced system for Data
 Warehouse workloads
 Reference hardware configurations developed in conjunction with
 hardware partners using this method
 Best practices for data layout, loading and management



                Relational Database Only – Not SSAS, IS, RS
Fast Track Scope
    [Diagram: Fast Track scope within a BI environment]
    Supporting Systems: Integration Services ETL; SAN, Storage Array
    BI Data Storage Systems: Data Warehouse; Data Staging, Bulk Loading; Subject Area Data Marts; Analysis Services Cubes
    Presentation Layer Systems: Web Analytic Tools; Reporting Services; SharePoint Services; Microsoft Office SharePoint, PerformancePoint, Excel Services
    Flows: Data Path; Presentation Data
    Reference Architecture Scope (dashed)
HP Fast Track DL785 G6 Demo
Fast Track SQL DW Architecture vs. Traditional DW
Traditional SQL DW Architecture
 – Shared Infrastructure
 – Enterprise Shared SAN Storage
 – Shared Network Bandwidth
 – OLTP Applications on the same infrastructure
Fast Track SQL DW Architecture
 – Dedicated DW Infrastructure
 – Architecture modeled after DW Appliances
 – Scalability from 4TB to 80TB
 – Dedicated Network Bandwidth
 – SQL 2008 Data Warehouse, 4 Processor 16+ Core Server
 – Dedicated Low Cost SAN Arrays, 1 for every 4 CPU Cores
Benefits:
 – Lower TCO
 – Balanced CPU to I/O Channel Optimized for DW
 – Modular Building Block Approach
 – Scale Out or Up within limits of Server and SAN
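The "1 array for every 4 CPU cores" building block can be expressed as a small helper. Rounding up for core counts that are not a multiple of four is an assumption; the slide does not cover that case.

```python
import math

CORES_PER_ARRAY = 4  # from the slide: one dedicated SAN array per 4 CPU cores


def arrays_needed(cpu_cores, cores_per_array=CORES_PER_ARRAY):
    """Return the number of dedicated SAN arrays for a given core count."""
    if cpu_cores <= 0:
        raise ValueError("cpu_cores must be positive")
    # Assumption: round up so no core is left without array bandwidth.
    return math.ceil(cpu_cores / cores_per_array)


print(arrays_needed(16))  # the 16-core server on the slide -> 4
```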
HP SQL Server Fast Track Data Warehousing
Fast Track G7 Configurations (coming soon)
 Scales from SMB to Enterprise
  – Prescriptive guidance and optimized methodology for deploying a data warehouse
  – Targeted at query workloads patterned for large sequential data reads
  – Balanced hardware approach

 HP provides
  – Configurations, tested performance, and guidance
  – Best practices for deploying/operating/managing
  – Packaged and custom support

 Tier         Capacity     Server and storage
 Basic        8 – 16TB     DL38x G7 w/ MSA2000 G3
 Mainstream   8 – 16TB     DL38x G7 w/ MSA P2000 G3
 Mainstream   20 – 60TB    DL58x G7 w/ MSA P2000 G3
 Premium      40 – 80TB    DL980 G7 w/ MSA P2000 G3
HP SQL Server Fast Track Data Warehousing
Fast Track G7 configurations in test (coming soon)

 Small SMP: 2-Socket Processor Configuration
  – Server: HP ProLiant DL380 G7 with 2x 6-core Intel Xeon (2p; 12 core, 64-192GB RAM)
  – Storage: HP P2000 G3
  – Scalability: 8 – 16TB

 Medium SMP: 4-Socket Processor Configuration
  – Server: HP ProLiant DL580 G7 with 4x 8-core Intel Xeon (4p; 32 core, 144-512GB RAM)
  – Storage: HP P2000 G3
  – Scalability: 20 – 40TB

 Large SMP: 8-Socket Processor Configuration
  – Server: HP ProLiant DL980 G7 with 8x 8-core Intel Xeon (8p; 64 core, 2TB RAM)
  – Storage: HP P2000 G3
  – Scalability: 40 – 80TB
Fast Track Component Architecture



 [Diagram: Fast Track components along the IO path]
 Server: SQL Server; Windows Server OS; CPU; Host Storage Adaptor
 Storage Interconnect
 Storage Enclosure: Storage Processor; Disk Array
Core Evaluation Metrics

 These metrics are used to both validate and position Fast
 Track Reference Architectures
 – Maximum Consumption Rate – Ability of SQL Server to process data for a
   specific CPU and Server combination and a standard SQL query.
 – Benchmark Consumption Rate – Ability of SQL Server to process data for a
   specific CPU and Server combination and a user workload or query.
 – User Data Capacity – Maximum available SQL Server storage for a specific
   Fast Track RA assuming 2.5:1 page compression factor.
Scaling the IO stack
 [Diagram: a server with eight CPU sockets (4 cores each) and eight HBAs, connected through a fiber switch to eight storage enclosures. Each enclosure has two storage processors and five RAID-1 groups.]
User Data Capacity
 UDC is the data capacity required
 – Plan for projected growth
    • Based on your projections
    • Needs to be allocated up-front


 – Allocate for data management needs
    • Staging database requirements
    • Temporary objects


 – Allocate for TempDB
    • Typically 20-30% of primary data space
    • Tempdb is not compressed
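A rough allocation sketch that combines the points on this slide with the 2.5:1 page compression factor from the Core Evaluation Metrics slide. All input values are hypothetical, and treating the staging figure as already-sized space is an assumption.

```python
def plan_storage_tb(current_data_tb, annual_growth, years, staging_tb,
                    tempdb_fraction=0.25, compression=2.5):
    """Estimate up-front storage allocation in TB.

    tempdb_fraction: slide guidance is typically 20-30% of primary data space.
    compression: 2.5:1 page compression factor assumed for user data only;
    TempDB is not compressed.
    """
    projected = current_data_tb * (1 + annual_growth) ** years
    primary = projected / compression
    tempdb = primary * tempdb_fraction
    return {
        "projected_user_data_tb": projected,
        "primary_tb": primary,
        "staging_tb": staging_tb,
        "tempdb_tb": tempdb,
        "total_tb": primary + staging_tb + tempdb,
    }


# Hypothetical inputs: 10 TB today, 50% yearly growth, 2-year horizon, 1 TB staging.
plan = plan_storage_tb(10, 0.5, 2, staging_tb=1.0)
for name, value in plan.items():
    print(f"{name}: {value:.2f}")
```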
Storage Layout Implications for SQL Server


 [Diagram: database files spread across LUN 1 to LUN 16]
 Permanent_DB, Permanent FG: Permanent_1.ndf, Permanent_2.ndf, Permanent_3.ndf, … Permanent_16.ndf (one file per LUN)
 Stage Database, Stage FG: Stage_1.ndf, Stage_2.ndf, Stage_3.ndf, … Stage_16.ndf (one file per LUN)
 TempDB: TempDB.mdf (25GB), TempDB_02.ndf (25GB), TempDB_03.ndf (25GB), … TempDB_16.ndf (25GB)
 Local Drive 1
 Log LUN 1: Permanent DB Log; Stage DB Log
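The one-file-per-LUN pattern in this layout can be enumerated with a short helper. The function and its output shape are illustrative only; the file name pattern follows the slide.

```python
def filegroup_layout(prefix, lun_count=16):
    """Map each LUN to one data file, e.g. LUN 1 -> Permanent_1.ndf."""
    return {f"LUN {n}": f"{prefix}_{n}.ndf" for n in range(1, lun_count + 1)}


permanent_files = filegroup_layout("Permanent")
stage_files = filegroup_layout("Stage")

for lun in ("LUN 1", "LUN 2", "LUN 16"):
    print(lun, permanent_files[lun], stage_files[lun])
```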
Sequential Scan Components
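 [Diagram: eight volumes across four arrays, each labeled 4MB and holding one data file]
 ARY01D1v01, 4MB, DB1-1.ndf      ARY01D2v02, 4MB, DB1-2.ndf
 ARY02D1v03, 4MB, DB1-3.ndf      ARY02D2v04, 4MB, DB1-4.ndf
 ARY03D1v05, 4MB, DB1-5.ndf      ARY03D2v06, 4MB, DB1-6.ndf
 ARY04D1v07, 4MB, DB1-7.ndf      ARY04D2v08, 4MB, DB1-8.ndf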




 Contiguous allocation, data striping, pre-fetch, and read-ahead work to create efficient
 Sequential IO
 – Data stripe width is balanced against read-ahead “Depth”
 – Combined, these elements provide effective access to the full data stripe from a single thread
 Each element is necessary to maximize efficiency
Loading

One of the important topics
I hope you saw the session yesterday
If not, you can watch the video
    OR
See the appendix to this presentation
Minimizing File fragmentation

 Pre-allocate database files
 • Size files correctly to prevent growth
 • Do not shrink files
 Do not use NTFS file fragmentation tools
 – Rebuild table to ensure disk block level optimal organization
 Writing data
 – Concurrent load operations to the same file will induce fragmentation
 – DML change operations (Update/Delete) may induce fragmentation
 Use Filegroups and Partitioning to manage concurrent writes
 for large tables
What’s next?
My car is too small
Agenda
Motivation
Fast Track Offering
 – Balanced Architecture Approach for DW
 – Example FastTrack Reference Architectures
 – Optimizing Storage, Load and Maintenance
Parallel Data Warehouse Offering Overview
 – Scale Out Architecture Approach for DW
 – SQL Server in Scale Out Story
HP Enterprise Data Warehouse Appliance
Transforming today’s SQL
      BEFORE                     AFTER




                           The world’s most scalable,
                           easy-to-manage enterprise
                           data warehousing solution
HP Enterprise Data Warehouse Appliance




COMPLETE      SIMPLIFIED      FOR ANY SCALE
HP Enterprise Data Warehouse Appliance
Description
          Scale-out of SQL Server: 10s TB ► 100s TB ► PB
          Uses massively parallel processing (MPP)
          Highly optimized for DW workloads at each layer of the stack
          Uses an index-light design
          Delivers predictable performance at low cost
          Simplified deployment and maintenance via appliance model
          Integration with existing SQL Server 2008 DW via Hub & Spoke architecture
          Lower total cost of ownership
HP Parallel Data Warehouse Appliance - Hardware Architecture
 [Diagram: Control Rack and Data Rack on a private network, linked by dual Infiniband; database nodes reach storage over dual Fibre Channel]

 Control Rack
  – Control node (HP ProLiant DL, active/passive)
     • Where client apps connect
     • MPP engine runs here
     • Controls DMS on all nodes
     • Central point for all HW monitoring
  – Management node (management servers)
     • S/W upgrades and patch deployment staging place
     • Holds S/W images in case a node needs reimaging
  – Landing Zone
     • Staging place for data loading
     • Accessible to outside world
  – Backup node
     • Backup file storage
     • Accessible to outside world

 Data Rack
  – Database (compute) nodes (HP ProLiant DL, each running SQL)
     • Store user data
     • Perform local query processing
     • Run data movement service
     • Not accessible to outside world
  – Storage nodes (HP MSA P2000 G3)
  – Spare database node

 Corporate Network: client drivers; data center monitoring; ETL load interface; corporate backup solution
Symmetric Multi-Processing vs. Massively Parallel Processing
 SMP (SQL Server, Fast Track): OLTP, Transactional, Data Warehousing
 MPP (PDW): Parallel Data Warehousing (esp. VLDB, complex workloads)
HP Enterprise Parallel Data Warehouse – Impressive live demo
 – Massive parallel query processing
 – 106 billion rows; 10 TB table
 – High performance report without indexing and aggregations
Data Distribution with replication
 [Diagram: star schema in one database. The Store Sales fact table is split into SS[1], SS[2], SS[3], SS[4]; the dimension tables are shown whole.]
 Store Sales: Ss_sold_date_sk, Ss_item_sk, Ss_customer_sk, Ss_cdemo_sk, Ss_store_sk, Ss_promo_sk, Ss_quantity, …
 Date Dim: D_DATE_SK, D_DATE_ID, D_DATE, D_MONTH, …
 Customer: C_CUSTOMER_SK, C_CUSTOMER_ID, C_CURRENT_ADDR, …
 Customer Demographics: CD_DEMO_SK, CD_GENDER, CD_MARITAL_STATUS, CD_EDUCATION, …
 Item: I_ITEM_SK, I_ITEM_ID, I_REC_START_DATE, I_ITEM_DESC, …
 Promotion: P_PROMO_SK, P_PROMO_ID, P_START_DATE_SK, P_END_DATE_SK, …
 Store: S_STORE_SK, S_STORE_ID, S_REC_START_DATE, S_REC_END_DATE, S_STORE_NAME, …
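A toy model of the idea on this slide: the large fact table is distributed across nodes while small dimension tables are replicated to every node, so each node can join locally. The modulo "hash", the choice of ss_item_sk as the distribution column, and the sample rows are assumptions for illustration; the slides do not describe PDW's actual hash function.

```python
NODE_COUNT = 4  # matches SS[1]..SS[4] on the slide


def distribute(rows, key, node_count=NODE_COUNT):
    """Split fact rows across nodes using a stand-in hash (modulo) of the key."""
    slices = [[] for _ in range(node_count)]
    for row in rows:
        slices[row[key] % node_count].append(row)
    return slices


def replicate(rows, node_count=NODE_COUNT):
    """Give every node its own full copy of a small dimension table."""
    return [list(rows) for _ in range(node_count)]


def local_join(sales_slice, item_copy):
    """Join one node's sales slice to its local item copy; no rows move between nodes."""
    items = {item["i_item_sk"]: item for item in item_copy}
    return [
        {
            "ss_item_sk": sale["ss_item_sk"],
            "ss_quantity": sale["ss_quantity"],
            "i_item_desc": items[sale["ss_item_sk"]]["i_item_desc"],
        }
        for sale in sales_slice
    ]


# Hypothetical sample data using (lowercased) column names from the slide.
item = [{"i_item_sk": k, "i_item_desc": f"item {k}"} for k in range(1, 7)]
store_sales = [{"ss_item_sk": (n % 6) + 1, "ss_quantity": n} for n in range(12)]

sales_by_node = distribute(store_sales, "ss_item_sk")
item_by_node = replicate(item)
joined = [
    row
    for node in range(NODE_COUNT)
    for row in local_join(sales_by_node[node], item_by_node[node])
]
print(len(joined), "joined rows produced across", NODE_COUNT, "nodes")
```

Because every node holds the full Item table, the union of the per-node joins equals the join over the whole fact table.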
Distributed Data Warehouse Architecture

 [Diagram: hub-and-spoke layout around a Central Enterprise DW Hub]
 Labels around the hub: Departmental Reporting; Regional Reporting; MS Office 2010; ETL Tools
 Enterprise data can be maintained on a PDW hub
 Hub = unified EDW
 Spoke = federated data marts
Distributed Data Warehouse Approach
Hub & Spoke model
 Enables DW architecture to more closely match the
 structure of large enterprises.
 Separates user and data workloads eliminating traditional
 process and resource conflicts
 Integrate both SMP and MPP systems with “Shared
 Nothing”
 All systems connect via a dedicated high speed network
 Dual high speed Infiniband
 Supports multiple workloads on different systems
Resources
 SQL Server Fast Track DW Home Page
 – http://www.microsoft.com/sqlserver/2008/en/us/fasttrack.aspx


 Fast Track DW 2.0 Architecture Whitepaper
 – http://msdn.microsoft.com/en-us/library/dd459178.aspx


 Use minimally logged bulk operations (trace flag -T610)
 – http://msdn.microsoft.com/en-us/library/dd425070.aspx
Perspectives: 2010
Feedback and Facebook

                  Merav - to be completed
Let's Party!

 Dinner – between 18:30 and 20:30
 Transport to the party – shuttles from 20:30
 Entry wristbands - handed out in the envelopes at room check-in
Alternatives for loading
Use a heap
 – Practical if queries need to scan whole partitions
or…Use a batchsize = 0
 – Fine if no parallelism is needed during load
or…Use a Two-Step Load
  1. Load to a Staging Table (heap) with constraint for Deltas
  2. INSERT-SELECT from Staging Table into Target CI
  Resulting rows are not fragmented
  Can use Parallelism in step 1 – essential for large data volumes
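The shape of the two-step load can be shown with a small runnable analogy. It uses Python's built-in sqlite3 module as a stand-in, not SQL Server: the table names and rows are hypothetical, and SQLite models neither fragmentation, parallel loading, nor the delta constraint. It only shows the order of operations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Step 0: a staging table with no index (stand-in for the heap)
# and a target table keyed on sale_id (stand-in for the clustered index).
conn.execute("CREATE TABLE staging (sale_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE target (sale_id INTEGER PRIMARY KEY, amount REAL)")

# Step 1: load incoming rows into staging in arrival order.
incoming = [(3, 30.0), (1, 10.0), (2, 20.0)]
conn.executemany("INSERT INTO staging VALUES (?, ?)", incoming)

# Step 2: a single INSERT-SELECT moves the rows into the target in key order.
conn.execute(
    "INSERT INTO target (sale_id, amount) "
    "SELECT sale_id, amount FROM staging ORDER BY sale_id"
)
conn.commit()

loaded = conn.execute("SELECT sale_id, amount FROM target ORDER BY sale_id").fetchall()
print(loaded)
```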
Two-Step Load Variations
 To achieve high parallelism during historical load
 – Typically into a partitioned table
 – Use a Staging Table (heap) that is partitioned identically to the Target
   Table
 – Use multiple concurrent streams to load the Staging Table with
   moderate batchsize (SSIS, Bulk Insert, etc)
 – INSERT-SELECT separate partitions into the Target Table –
   potentially in parallel
    • Use ALTER TABLE <table> SET (LOCK_ESCALATION = AUTO)
 – Note: If memory is limited, TempDB could be heavily used for sorting
Two-Step Load Variations (cont.)

To avoid most TempDB space and TempDB IO during load
 – Use a partitioned Staging Table that is also indexed identically to
   Target Table
 – Load Staging Table using moderate batchsize (< 1M rows)
 – Final INSERT-SELECTs will avoid any sort!
    • However the staging loads will be logged
 – Note: Parallelism will be limited if load batches overlap
Loading Data
Goal: maximize read performance
 – Minimizes Disk head movement
 – Maintains high average request size (Think ~400k not 8k)
 – Sustain high average scan rates
Key considerations for a Fast Track data load
 – Data Architecture: Destination table, partitioning, and filegroup
 – Source Data: Format & size
 – System Resources: CPU & Memory
Use minimally logged bulk operations (trace flag -T610)
 – http://msdn.microsoft.com/en-us/library/dd425070.aspx

Weitere ähnliche Inhalte

Was ist angesagt?

Bi303 data warehousing with fast track and pdw - Assaf Fraenkel

  • 1. Data Warehousing with FastTrack and PDW Assaf Fraenkel Oded Shihor Lead Architect, MCS Senior Solution Architect, HP
  • 2. Which car should you buy? Is it a question of price?!
  • 3. Agenda Motivation Fast Track Offering – Balanced Architecture Approach for DW – Example FastTrack Reference Architectures – Optimizing Storage, Load and Maintenance – Case Studies Parallel Data Warehouse Offering Overview
  • 4. Some SQL Data Warehouses today Big SAN Big SMP Server Connected together What’s wrong with this picture?
  • 5. Answer: system out of balance This server can consume 16 GB/Sec of IO, but the SAN can only deliver 2 GB/Sec – Even when the SAN is dedicated to the SQL Data Warehouse, which it often isn’t – Lots of disks for Random IOPS BUT – Limited controllers → Limited IO bandwidth System is typically IO bound Queries are slow Result: significant investment, not delivering performance
  • 6. You can get more sophisticated… Realize that queries performing complex calculations, format conversions, multi-dimension hash joins, etc. will be more cpu-intensive than others – Complex queries will consume data at a slower per-core rate than simpler queries Alternative: Measure per-core data consumption for a variety of queries, and take the weighted average – A standard approach to capacity planning
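The weighted-average approach on this slide can be sketched as follows. This is a minimal illustration, not the presenters' tooling: the function name, the three query classes, and all rates and weights are hypothetical values chosen only to show the arithmetic.

```python
def weighted_consumption_rate(measurements):
    """Return the weighted average per-core consumption rate in MB/s.

    measurements: list of (mb_per_sec_per_core, weight) pairs, where the
    weight is how often that query class occurs in the workload.
    """
    total_weight = sum(weight for _, weight in measurements)
    if total_weight <= 0:
        raise ValueError("total weight must be positive")
    return sum(rate * weight for rate, weight in measurements) / total_weight


# Hypothetical workload: simple scans consume data faster per core than
# complex, CPU-intensive queries. The numbers are made up for illustration.
workload = [
    (300.0, 5),  # simple scans, 5 of every 10 queries
    (200.0, 3),  # moderate joins, 3 of every 10 queries
    (100.0, 2),  # complex calculations, 2 of every 10 queries
]
average_rate = weighted_consumption_rate(workload)
```

With your own measurements in place of the sample values, the result is the per-core figure to carry into system sizing.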
  • 7. Or you can leave it to us… We’ve measured a mix of TPC-H queries that reflect a ‘prototype’ Data Warehouse workload Concluded that SQL Server 2008 R2 on current x64 cores consumes ~200 MB/Sec per core on average for this workload We use this as a basis for the published reference architectures Your mileage will vary! – For precise system sizing, measure your own workload
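As a rough sketch of how the ~200 MB/Sec per-core figure feeds a balance check, the snippet below compares what the cores can consume with what the storage can deliver. The function names, the 16-core server, and the 2000 MB/s storage figure are assumptions for illustration; only the 200 MB/s default comes from the slide.

```python
PER_CORE_MB_PER_SEC = 200  # average per-core consumption quoted on the slide


def required_io_mb_per_sec(cores, per_core_rate=PER_CORE_MB_PER_SEC):
    """IO bandwidth the CPUs can consume, in MB/s."""
    return cores * per_core_rate


def balance_report(cores, storage_mb_per_sec, per_core_rate=PER_CORE_MB_PER_SEC):
    """Compare deliverable storage bandwidth with what the cores can consume."""
    required = required_io_mb_per_sec(cores, per_core_rate)
    return {
        "required_mb_per_sec": required,
        "delivered_mb_per_sec": storage_mb_per_sec,
        "io_bound": storage_mb_per_sec < required,
        # Fraction of the CPUs' appetite that the storage can actually feed.
        "cpu_feed_fraction": min(1.0, storage_mb_per_sec / required),
    }


# Hypothetical example: a 16-core server attached to storage that
# sustains 2000 MB/s of sequential reads.
report = balance_report(cores=16, storage_mb_per_sec=2000)
```

In this made-up case the cores could consume 3200 MB/s, so the system is IO bound and the CPUs are fed at about 62% of what they could process.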
  • 8. Potential Performance Bottlenecks [Diagram: the IO path between the disks and the server, labeled with the rate at each stage: Disk Feed Rate, LUN Read Rate, SP Port Rate, Switch Port Rate, HBA Port Rate, SQL Server Read Ahead Rate, CPU Feed Rate. Legible components: disks, LUNs, storage controller, cache, FC HBAs, server.]
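One way to read the bottleneck diagram: end-to-end scan throughput is capped by the slowest stage in the path. The sketch below uses the stage names from the slide, but every rate is a hypothetical number and the helper name is an assumption.

```python
def find_bottleneck(stage_rates):
    """Return (stage_name, rate) for the slowest stage in the IO path.

    stage_rates maps a stage name to its aggregate sustainable rate in MB/s.
    """
    name = min(stage_rates, key=stage_rates.get)
    return name, stage_rates[name]


# Hypothetical aggregate rates in MB/s for each stage named on the slide.
stages = {
    "Disk Feed Rate": 4000,
    "LUN Read Rate": 3600,
    "SP Port Rate": 1600,
    "Switch Port Rate": 3200,
    "HBA Port Rate": 3200,
    "SQL Server Read Ahead Rate": 3400,
    "CPU Feed Rate": 3200,
}
slowest_stage, slowest_rate = find_bottleneck(stages)
```

Here the storage processor ports would cap the whole system at 1600 MB/s, however fast the disks and CPUs are.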
  • 9. The Alternative: A Balanced System Design a server + storage configuration that can deliver all the IO bandwidth that CPUs can consume when executing a SQL Relational DW workload Avoid sharing storage devices among servers Avoid overinvesting in disk drives – Focus on scan performance, not IOPS Lay out and manage data to maximize range scan performance and minimize fragmentation
  • 10. Microsoft Data Warehousing – Product Offering [Diagram: four numbered offerings positioned by scale and complexity: SQL Server 2008 R2; SQL Server 2008 R2 with Fast Track Reference Architecture; PDW; PDW with Hub-and-spoke. Callouts: Most flexible architecture for handling all DW scenarios. Balanced solution for mostly scan centric workloads. Max HW tune up for most DW scenarios. Minimal HW tune up/optimization. SW-HW integration. Supports HA by default. Mixed workloads.]
  • 11. Agenda Motivation Fast Track Offering – Balanced Architecture Approach for DW – Example FastTrack Reference Architectures – Optimizing Storage, Load and Maintenance Parallel Data Warehouse Offering Overview
  • 12. SQL Server Fast Track Data Warehouse Solution to help customers and partners accelerate their data warehouse deployments A method for designing a cost-effective, balanced system for Data Warehouse workloads Reference hardware configurations developed in conjunction with hardware partners using this method Best practice guidelines for data layout, loading and management Relational Database Only – Not SSAS, IS, RS
  • 13. Fast Track Scope [Diagram: the BI data path from supporting systems through Integration Services (ETL), data staging and bulk loading, the data warehouse on SAN/storage array, subject area data marts and Analysis Services cubes, to the presentation layer systems (Reporting Services, SharePoint Services, Microsoft Office SharePoint, PerformancePoint, Excel Services, web analytic tools). The reference architecture scope is marked with a dashed outline.]
  • 14. HP Fast Track DL785 G6 Demo
  • 15. Fast Track SQL DW Architecture vs. Traditional DW. Traditional SQL DW Architecture: Shared Infrastructure; Enterprise Shared SAN Storage; Shared Network Bandwidth; OLTP Applications. Fast Track SQL DW Architecture: Dedicated DW Infrastructure; Architecture modeled after DW Appliances; Scalability from 4TB to 80TB; Dedicated Network Bandwidth; SQL 2008 Data Warehouse on a 4 Processor, 16+ Core Server; Dedicated Low Cost SAN Arrays, 1 for every 4 CPU Cores. Benefits: Lower TCO; Balanced CPU to I/O Channel Optimized for DW; Modular Building Block Approach; Scale Out or Up within limits of Server and SAN
  • 16. HP SQL Server Fast Track Data Warehousing: Fast Track G7 Configurations (coming soon). Scales from SMB to Enterprise – Prescriptive guidance and optimized methodology for deploying a data warehouse – Targeted at query workloads patterned for large sequential data reads – Balanced hardware approach. HP provides – Configurations, tested performance, guidance and – Best practices for deploying/operating/managing – Packaged and custom support. Configurations: Basic: 8–16TB, DL38x G7 w/ MSA2000 G3. Mainstream: 8–16TB, DL38x G7 w/ MSA P2000 G3. Mainstream: 20–60TB, DL58x G7 w/ MSA P2000 G3. Premium: 40–80TB, DL980 G7 w/ MSA P2000 G3.
  • 17. HP SQL Server Fast Track Data Warehousing: Fast Track G7 configurations in test (coming soon). Small SMP, 2-socket configuration: Server: HP ProLiant DL380 G7 with 2x 6-core Intel Xeon processors; Storage: HP P2000 G3; Scalability: 8–16TB; 2p, 12 core, 64-192GB RAM. Medium SMP, 4-socket configuration: Server: HP ProLiant DL580 G7 with 4x 8-core Intel Xeon processors; Storage: HP P2000 G3; Scalability: 20–40TB; 4p, 32 core, 144-512GB RAM. Large SMP, 8-socket configuration: Server: HP ProLiant DL980 G7 with 8x 8-core Intel Xeon processors; Storage: HP P2000 G3; Scalability: 40–80TB; 8p, 64 core, 2TB RAM.
  • 18. Fast Track Component Architecture [Diagram labels: SQL Server, Windows Server OS, Server, CPU, Host Storage Adaptor, Storage Interconnect, Storage Enclosure, Storage Processor, Disk Array]
  • 19. Core Evaluation Metrics These metrics are used to both validate and position Fast Track Reference Architectures – Maximum Consumption Rate – Ability of SQL Server to process data for a specific CPU and Server combination and a standard SQL query. – Benchmark Consumption Rate – Ability of SQL Server to process data for a specific CPU and Server combination and a user workload or query. – User Data Capacity – Maximum available SQL Server storage for a specific Fast Track RA assuming 2.5:1 page compression factor.
  • 20. Scaling the IO stack [Diagram: a server with multiple CPU sockets (4 cores each) and multiple HBAs, connected through a fiber switch to multiple storage enclosures; each enclosure has two storage processors and several RAID-1 disk groups.]
  • 21. User Data Capacity UDC is the data capacity required – Plan for projected growth • Based on your projections • Needs to be allocated up-front – Allocate for data management needs • Staging database requirements • Temporary objects – Allocate for TempDB • Typically 20-30% of primary data space • Tempdb is not compressed
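The allocation items above can be turned into a rough sizing calculation. The sketch below is an illustration under stated assumptions: the 2.5:1 page compression factor is the one quoted for User Data Capacity earlier in the deck, the 25% TempDB fraction is the midpoint of the 20-30% range, "primary data space" is interpreted as the compressed primary data, and the function name, growth factor, and input sizes are hypothetical.

```python
def storage_to_allocate_tb(
    uncompressed_data_tb,
    growth_factor,
    compression_ratio=2.5,   # 2.5:1 page compression assumed by the deck
    staging_tb=0.0,          # staging database and temporary objects
    tempdb_fraction=0.25,    # midpoint of the 20-30% guideline (assumption)
):
    """Estimate the storage to allocate up-front, in TB."""
    # Primary data is compressed; projected growth is allocated up-front.
    primary_tb = uncompressed_data_tb * growth_factor / compression_ratio
    # TempDB is sized relative to primary data space and is not compressed.
    tempdb_tb = primary_tb * tempdb_fraction
    return {
        "primary_tb": primary_tb,
        "staging_tb": staging_tb,
        "tempdb_tb": tempdb_tb,
        "total_tb": primary_tb + staging_tb + tempdb_tb,
    }


# Hypothetical example: 20 TB of uncompressed data, 50% projected growth,
# and 1 TB reserved for staging.
plan = storage_to_allocate_tb(20.0, growth_factor=1.5, staging_tb=1.0)
```

For these sample inputs the estimate is 12 TB of primary space, 3 TB of TempDB and 16 TB in total; actual compression ratios vary with the data.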
  • 22.
  • 23. Storage Layout Implications for SQL Server LUN 1 LUN 2 LUN 3 LUN 16 Permanent FG Permanent_DB Permanent_1.ndf Permanent_2.ndf Permanent_3.ndf Permanent_16.ndf Stage FG Database Stage Stage_1.ndf Stage_2.ndf Stage_3.ndf Stage_16.ndf Local Drive 1 TempDB TempDB.mdf (25GB) TempDB_02.ndf (25GB) TempDB_03.ndf (25GB) TempDB_16.ndf (25GB) Log LUN 1 Permanent DB Log Stage DB Log
  • 24. Sequential Scan Components ARY01D1v01 ARY02D1v03 ARY03D1v05 ARY04D1v07 4MB 4MB 4MB 4MB DB1-1.ndf DB1-3.ndf DB1-5.ndf DB1-7.ndf ARY01D2v02 ARY02D2v04 ARY03D2v06 ARY04D2v08 4MB 4MB 4MB 4MB DB1-2.ndf DB1-4.ndf DB1-6.ndf DB1-8.ndf Contiguous allocation, data striping, pre-fetch, and read-ahead work to create efficient Sequential IO – Data stripe width is balanced against read-ahead “Depth” – Combined, these elements provide effective access to the full data stripe from a single thread Each element is necessary to maximize efficiency
  • 25. Loading: one of the important topics. I hope you saw yesterday's session. If not, you can watch the video, or see the appendix to this presentation.
  • 26. Minimizing File fragmentation Pre-allocate database files • Size files correctly to prevent growth • Do not shrink files Do not use NTFS file fragmentation tools – Rebuild table to ensure disk block level optimal organization Writing data – Concurrent load operations to the same file will induce fragmentation – DML change operations (Update/Delete) may induce fragmentation Use Filegroups and Partitioning to manage concurrent writes for large tables
  • 27. What’s next? My car is too small
  • 29. Agenda Motivation Fast Track Offering – Balanced Architecture Approach for DW – Example FastTrack Reference Architectures – Optimizing Storage, Load and Maintenance Parallel Data Warehouse Offering Overview – Scale Out Architecture Approach for DW – SQL Server in Scale Out Story
  • 30. HP Enterprise Data Warehouse Appliance Transforming today’s SQL BEFORE AFTER The world’s most scalable, easy-to-manage enterprise data warehousing solution
  • 31. HP Enterprise Data Warehouse Appliance COMPLETE SIMPLIFIED FOR ANY SCALE
  • 32. HP Enterprise Data Warehouse Appliance Description Scale-Out of SQL Server: 10s TB ►100s TB ►PB Uses massively parallel processing (MPP) Highly optimised for DW workload at each layer of the stack Uses index-Light Deliver predictable performance at low cost Simplified deployment and maintenance via appliance model Integration with existing SQL Server 2008 DW via Hub & Spoke Architecture Lower total cost of ownership
  • 33. HP Parallel Data Warehouse Appliance - Hardware Architecture. Control Rack: Control Nodes (HP ProLiant DL, Active / Passive): where client apps connect; MPP engine runs here; controls DMS on all nodes; central point for all HW monitoring. Management Servers (management node): S/W upgrades and patch deployment staging place; holds S/W images in case a node needs reimaging. Landing Zone: ETL load interface; staging place for data loading; accessible to outside world. Backup Node: backup file storage; accessible to outside world. Data Rack: Database Nodes (compute nodes, HP ProLiant DL): store user data; perform local query processing; run data movement service; not accessible to outside world. Storage Nodes: HP MSA P2000 G3. Spare Database Node. Interconnect: Dual Fiber Channel, Dual Infiniband. External connections: client drivers, data center monitoring, ETL, corporate backup solution. Networks: Corporate Network, Private Network.
  • 34. Symmetric Multi-Processing vs. Massively Parallel Processing. SMP (SQL Server, Fast Track): OLTP, Transactional, Data Warehousing. MPP (PDW): Parallel Data Warehousing (esp. VLDB, complex workloads)
  • 35. HP Enterprise Parallel Data Warehouse – Impressive live demo Massive parallel query processing 106 billion rows; 10 TB table High performance report without indexing and aggregations
  • 36. Agenda Motivation Fast Track Offering – Balanced Architecture Approach for DW – Example FastTrack Reference Architectures – Optimizing Storage, Load and Maintenance Parallel Data Warehouse Offering Overview – Scale Out Architecture Approach for DW – SQL Server in Scale Out Story
  • 37. Data Distribution with replication [Diagram: database schema. Date Dim: D_DATE_SK, D_DATE_ID, D_DATE, D_MONTH, … Customer: C_CUSTOMER_SK, C_CUSTOMER_ID, C_CURRENT_ADDR, … Item: I_ITEM_SK, I_ITEM_ID, I_REC_START_DATE, I_ITEM_DESC, … Store Sales (shown as SS[1], SS[2], SS[3], SS[4]): Ss_sold_date_sk, Ss_item_sk, Ss_customer_sk, Ss_cdemo_sk, Ss_store_sk, Ss_promo_sk, Ss_quantity, … Promotion: P_PROMO_SK, P_PROMO_ID, P_START_DATE_SK, P_END_DATE_SK, … Customer Demographics: CD_DEMO_SK, CD_GENDER, CD_MARITAL_STATUS, CD_EDUCATION, … Store: S_STORE_SK, S_STORE_ID, S_REC_START_DATE, S_REC_END_DATE, S_STORE_NAME, …]
  • 38. Distributed Data Warehouse Architecture Departmental Reporting MS Office 2010 Regional Reporting Enterprise data Central Enterprise can be maintained DW Hub on a PDW hub Hub= unified EDW ETL Tools Spoke= Federated data marts
  • 39. Distributed Data Warehouse Approach Hub & Spoke model Enables DW architecture to more closely match the structure of large enterprises. Separates user and data workloads eliminating traditional process and resource conflicts Integrate both SMP and MPP systems with “Shared Nothing” All systems connect via a dedicated high speed network Dual high speed Infiniband Supports multiple workloads on different systems
  • 40.
  • 41. Microsoft Data Warehousing – Product Offering [Diagram: four numbered offerings positioned by scale and complexity: SQL Server 2008 R2; SQL Server 2008 R2 with Fast Track Reference Architecture; PDW; PDW with Hub-and-spoke. Callouts: Most flexible architecture for handling all DW scenarios. Balanced solution for mostly scan centric workloads. Max HW tune up for most DW scenarios. Minimal HW tune up/optimization. SW-HW integration. Supports HA by default. Mixed workloads.]
  • 42. Resources SQL Server Fast Track DW Home Page – http://www.microsoft.com/sqlserver/2008/en/us/fasttrack.aspx Fast Track DW 2.0 Architecture Whitepaper – http://msdn.microsoft.com/en-us/library/dd459178.aspx Use minimal logged BULK operation (Trace Flag –T 610) – http://msdn.microsoft.com/en-us/library/dd425070.aspx
  • 44. Feedback and Facebook. Merav: to be completed
  • 45. Let’s Party! Dinner: between 18:30 and 20:30. Transportation to the party: shuttles starting at 20:30. Entry wristbands: provided in the envelopes at room check-in.
  • 46.
  • 47. Alternatives for loading Use a heap – Practical if queries need to scan whole partitions or… Use a batchsize = 0 – Fine if no parallelism is needed during load or… Use a Two-Step Load 1. Load to a Staging Table (heap) with constraint for Deltas 2. INSERT-SELECT from Staging Table into Target CI Resulting rows are not fragmented Can use Parallelism in step 1 – essential for large data volumes
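The two-step load pattern above can be illustrated with a small runnable sketch. Note the substitution: this uses Python's built-in `sqlite3` module as a stand-in, not SQL Server, so it shows only the shape of the pattern (unindexed staging table, then an ordered INSERT-SELECT into a key-organized target). The table names, columns, and sample rows are hypothetical, and the delta constraint and parallel streams mentioned on the slide are not modeled.

```python
import sqlite3


def two_step_load(conn, rows):
    """Load rows via an unindexed staging table, then INSERT-SELECT into the target."""
    cur = conn.cursor()
    # Step 1: bulk-load into a staging table with no index (the heap analogue).
    cur.execute("CREATE TABLE IF NOT EXISTS staging (sale_id INTEGER, amount REAL)")
    cur.executemany("INSERT INTO staging (sale_id, amount) VALUES (?, ?)", rows)
    # Step 2: copy into the target in key order. In SQLite an INTEGER PRIMARY KEY
    # table is stored in key order, which stands in for the clustered index here.
    cur.execute(
        "CREATE TABLE IF NOT EXISTS target (sale_id INTEGER PRIMARY KEY, amount REAL)"
    )
    cur.execute(
        "INSERT INTO target (sale_id, amount) "
        "SELECT sale_id, amount FROM staging ORDER BY sale_id"
    )
    # Empty the staging table so it can be reused for the next batch.
    cur.execute("DELETE FROM staging")
    conn.commit()
    return cur.execute("SELECT COUNT(*) FROM target").fetchone()[0]


conn = sqlite3.connect(":memory:")
loaded = two_step_load(conn, [(3, 30.0), (1, 10.0), (2, 20.0)])
```

On SQL Server the same two steps would be a bulk load into a heap followed by an INSERT-SELECT into the clustered-index table; the exact statements and hints depend on the version and are not shown here.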
  • 48. Two-Step Load Variations To achieve high parallelism during historical load – Typically into a partitioned table – Use a Staging Table (heap) that is partitioned identically to the Target Table – Use multiple concurrent streams to load the Staging Table with moderate batchsize (SSIS, Bulk Insert, etc) – INSERT-SELECT separate partitions into the Target Table – potentially in parallel • Use ALTER TABLE <table> SET (LOCK_ESCALATION = AUTO) – Note: If memory is limited, TempDB could be heavily used for sorting
  • 49. Two-Step Load Variations (cont.) To avoid most TempDB space and TempDB IO during load – Use a partitioned Staging Table that is also indexed identically to Target Table – Load Staging Table using moderate batchsize (< 1M rows) – Final INSERT-SELECTs will avoid any sort! • However the staging loads will be logged – Note: Parallelism will be limited if load batches overlap
  • 50. Loading Data Goal: maximize read performance – Minimizes Disk head movement – Maintains high average request size (Think ~400k not 8k) – Sustain high average scan rates Key considerations for a Fast Track data load – Data Architecture: Destination table, partitioning, and filegroup – Source Data: Format & size – System Resources: CPU & Memory Use minimal logged BULK operation (Trace Flag –T 610) – http://msdn.microsoft.com/en-us/library/dd425070.aspx