SlideShare ist ein Scribd-Unternehmen logo
1 von 30
Nauzad Kapadia
Quartz Systems
nauzadk@quartzsystems.com
Key Takeaways
 How to design your cubes efficiently
 How to effectively partition your facts
 How to optimize cube and query processing
 How to ensure that your solution scales well
Agenda
 Cube Design
 Storage and Partitioning
 Aggregations
 Processing
 Scalability
Tips for Designing Dimensions and Facts

 Base fact data sources on views
    Can use query hints
    Can facilitate write-back partitions for measure
    groups containing semi-additive measures
 Avoid Linked Dimensions
 Use the Unknown Member
Tips for designing Attributes
 Avoid unnecessary attributes
 Use AttributeHierarchyEnabled property with
 care
 Use Key Columns appropriately
Query performance
   Dimension storage access is faster
   Produces more optimal execution plans
Aggregation design
   Enables aggregation design algorithm to produce effective
   set of aggregations
Dimension security
   DeniedSet = {State.WA} should deny cities and customers
   in WA—requires attribute relationships
How attribute relationships affect
performance
After adding attribute relationships…
Don’t forget to remove redundant relationships!
All attributes have implicit relationship to key
Examples:
   Customer  City (not redundant)
   Customer  State (redundant)
   Customer  Country (redundant)
   Date  Month (not redundant)
   Date  Quarter (redundant)
   Date  Year (redundant)
User Defined Hierarchies
 Pre-defined navigation paths through
 dimensional space defined by attributes
 Why create user defined hierarchies?
    Guide end users to interesting navigation paths
    Existing client tools are not “attribute-aware”
    Performance
      Optimize navigation path at processing time
      Materialization of hierarchy tree on disk
      Aggregation designer favors user defined hierarchies
Natural Hierarchies
 1:M relation (via attribute relationships)
 between every pair of adjacent levels
 Examples:
    Country-State-City-Customer (natural)
    Country-City (natural)
    State-Customer (natural)
    Age-Gender-Customer (unnatural)
    Year-Quarter-Month (depends on key columns)
      How many quarters and months?
      4 & 12 across all years (unnatural)
      4 & 12 for each year (natural)
Natural Hierarchies
 Performance implications
    Only natural hierarchies are materialized on disk
    during processing
    Unnatural hierarchies are built on the fly during queries
    (and cached in memory)
 Create natural hierarchies where possible
    Using attribute relationships
    Not always appropriate (e.g., Age-Gender)
Benefits of Partitioning
 Partitions can be added, processed,
 deleted independently
    Update to last month’s data does not affect prior
    months’ partitions
    Sliding window scenario easy to implement
            e.g., 24 month window  add June 2006 partition
    and delete June 2004
 Partitions can have different storage settings
    Storage mode (MOLAP, ROLAP, HOLAP)
    Aggregation design
    Alternate disk drive
    Remote server
Benefits of Partitioning
 Partitions can be processed and
 queried in parallel
    Better utilization of server resources
    Reduced data warehouse load times
 Queries are isolated to relevant
 partitions  less data to scan
    SELECT … FROM … WHERE *Time+.*Year+.*2006]
    Queries only 2006 partitions
 Bottom line  partitions enable:
    Manageability
    Performance
    Scalability
Best Practices for Partitioning
 No more than 20M rows per partition
 Specify partition / data slice
    Optional (but still recommended) for MOLAP: server auto-
    detects the slice and validates against user-specified slice (if
    any)
    Should reflect, as closely as possible, the data in the
    partition
    Must be specified for ROLAP
 Remote partitions for scale out
Best Practices for Designing Partitions

  Design from the start
  Partition boundary and intervals
  Determine what storage model and aggregation
  level fits best
     Frequently queried  MOLAP with lots of aggs
     Periodically queried  MOLAP with less or no aggs
     Real-time ROLAP with no aggs
  Pick efficient data types in fact table
What is Proactive Caching
 Benefits of Proactive caching
 Considerations for using proactive caching
    Use correct latency and silence settings
    Useful in a transaction-oriented system in which
    changes are unpredictable
What are Aggregations
 Benefits of Aggregations
 Aggregating data in partitions
Aggregation Design Algorithm
 Evaluate cost/benefit of aggregations
     Relative to other aggregations
     Designed in “waves” from top of pyramid
     Cost is related to aggregation size
     Benefit is related to “distance”
     from another aggregation
 Storage design wizard
     Assumes all combinations of
     attributes are equally likely
     Can be done before you know                  Fact Table
     the query load
 Usage based optimization wizard
     Assumes query pattern resembles your selection from the query log
     Representative history is needed
Aggregation Design Algorithm
 Examines the AggregationUsage property to
 build list of candidate attributes
    Full: Every agg must include the attribute
    None: No agg can include the attribute
    Unrestricted: No restrictions on the algorithm
    Default: Unrestricted if attribute is All, key or
    belongs to a natural hierarchy, None otherwise
 Builds the attribute lattice
How to Monitor Aggregation Usage?

                                 

                     
                           Hit


                                 


                                     

                       
                Profiler
How to Monitor Aggregation Usage?

                            
                            


                   Miss
                             
                            




                Profiler
Best Practices for Aggregations
 Define all possible attribute relationships
 Set accurate attribute member counts and fact table
 counts
 Set AggregationUsage to guide agg designer
    Set rarely queried attributes to None
    Set commonly queried attributes to Unrestricted
 Do not build too many aggregations
    In the 100s, not 1000s!
 Do not build aggregations larger than 30% of fact table
 size (aggregation design algorithm doesn’t)
Best Practices for Aggregations
 Aggregation design cycle
    Use Storage Design Wizard (~20% perf gain)
    to design initial set of aggregations
    Enable query log and run pilot workload (beta
    test with limited set of users)
    Use Usage Based Optimization (UBO) Wizard
    to refine aggregations
    Use larger perf gain (70-80%)
    Reprocess partitions for new aggregations to
    take effect
    Periodically use UBO to refine aggregations
Processing Options
 ProcessFull
    Fully processes the object from scratch
 ProcessClear
    Clears all data—brings object to unprocessed state
 ProcessData
    Reads and stores fact data only (no aggs or indexes)
 ProcessIndexes
    Builds aggs and indexes
 ProcessUpdate
    Incremental update of dimension (preserves fact data)
 ProcessAdd
    Adds new rows to dimension or partition
Best Practices for Processing
 Use XMLA scripts in large production systems
    Automation (e.g., using ascmd)
    Finer control over parallelism, transactions,
    memory usage, etc.
    Don’t just process the entire database!
 Dimension processing
    Performance is limited by attribute relationships
    Key attribute is a big bottleneck
    Define all possible attribute relationships
    Eliminate redundant relationships—especially on key!
    Bind Dimension data sources to views instead of tables or
    named queries
Best Practices for Processing
 Partition processing
    Split ProcessFull into ProcessData + ProcessIndexes for large
    partitions—consumes less memory
    Monitor aggregation processing spilling to disk (perfmon
    counters for temp file usage)
       Add memory, turn on /3GB, move to x64/ia64
    Fully process partitions periodically
       Achieves better compression over repeated incremental processing
 Data sources
    Avoid using .NET data sources—OLEDB is order of
    magnitude faster for processing
Improving multi-user performance
 Increase Query parallelism
 Block long running queries
 Use a load balancing cluster
© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should
 not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS,
                                                                           IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

Weitere ähnliche Inhalte

Ähnlich wie Analysis Services Best Practices From Large Deployments

Exploring Oracle Database Performance Tuning Best Practices for DBAs and Deve...
Exploring Oracle Database Performance Tuning Best Practices for DBAs and Deve...Exploring Oracle Database Performance Tuning Best Practices for DBAs and Deve...
Exploring Oracle Database Performance Tuning Best Practices for DBAs and Deve...
Aaron Shilo
 
High performance coding practices code project
High performance coding practices code projectHigh performance coding practices code project
High performance coding practices code project
Pruthvi B Patil
 

Ähnlich wie Analysis Services Best Practices From Large Deployments (20)

Accelerate Database Development and Testing with Amazon Aurora (DAT313) - AWS...
Accelerate Database Development and Testing with Amazon Aurora (DAT313) - AWS...Accelerate Database Development and Testing with Amazon Aurora (DAT313) - AWS...
Accelerate Database Development and Testing with Amazon Aurora (DAT313) - AWS...
 
Level 3 Certification: Setting up Sumo Logic - Oct 2018
Level 3 Certification: Setting up Sumo Logic - Oct  2018Level 3 Certification: Setting up Sumo Logic - Oct  2018
Level 3 Certification: Setting up Sumo Logic - Oct 2018
 
Advanced Analytics using Apache Hive
Advanced Analytics using Apache HiveAdvanced Analytics using Apache Hive
Advanced Analytics using Apache Hive
 
Self-serve analytics journey at Celtra: Snowflake, Spark, and Databricks
Self-serve analytics journey at Celtra: Snowflake, Spark, and DatabricksSelf-serve analytics journey at Celtra: Snowflake, Spark, and Databricks
Self-serve analytics journey at Celtra: Snowflake, Spark, and Databricks
 
Best Practices for Building Robust Data Platform with Apache Spark and Delta
Best Practices for Building Robust Data Platform with Apache Spark and DeltaBest Practices for Building Robust Data Platform with Apache Spark and Delta
Best Practices for Building Robust Data Platform with Apache Spark and Delta
 
Frequently Bought Together Recommendations Based on Embeddings
Frequently Bought Together Recommendations Based on EmbeddingsFrequently Bought Together Recommendations Based on Embeddings
Frequently Bought Together Recommendations Based on Embeddings
 
Best practice adoption (and lack there of)
Best practice adoption (and lack there of)Best practice adoption (and lack there of)
Best practice adoption (and lack there of)
 
Elastic Stack: Using data for insight and action
Elastic Stack: Using data for insight and actionElastic Stack: Using data for insight and action
Elastic Stack: Using data for insight and action
 
Automated Analytics at Scale
Automated Analytics at ScaleAutomated Analytics at Scale
Automated Analytics at Scale
 
Cost Based Optimizer - Part 2 of 2
Cost Based Optimizer - Part 2 of 2Cost Based Optimizer - Part 2 of 2
Cost Based Optimizer - Part 2 of 2
 
ADV Slides: Comparing the Enterprise Analytic Solutions
ADV Slides: Comparing the Enterprise Analytic SolutionsADV Slides: Comparing the Enterprise Analytic Solutions
ADV Slides: Comparing the Enterprise Analytic Solutions
 
Teradata Aggregate Join Indices And Dimensional Models
Teradata Aggregate Join Indices And Dimensional ModelsTeradata Aggregate Join Indices And Dimensional Models
Teradata Aggregate Join Indices And Dimensional Models
 
Mastering MongoDB Atlas: Essentials of Diagnostics and Debugging in the Cloud...
Mastering MongoDB Atlas: Essentials of Diagnostics and Debugging in the Cloud...Mastering MongoDB Atlas: Essentials of Diagnostics and Debugging in the Cloud...
Mastering MongoDB Atlas: Essentials of Diagnostics and Debugging in the Cloud...
 
Maximizing Database Tuning in SAP SQL Anywhere
Maximizing Database Tuning in SAP SQL AnywhereMaximizing Database Tuning in SAP SQL Anywhere
Maximizing Database Tuning in SAP SQL Anywhere
 
SAS Training session - By Pratima
SAS Training session  -  By Pratima SAS Training session  -  By Pratima
SAS Training session - By Pratima
 
BI Environment Technical Analysis
BI Environment Technical AnalysisBI Environment Technical Analysis
BI Environment Technical Analysis
 
Exploring Oracle Database Performance Tuning Best Practices for DBAs and Deve...
Exploring Oracle Database Performance Tuning Best Practices for DBAs and Deve...Exploring Oracle Database Performance Tuning Best Practices for DBAs and Deve...
Exploring Oracle Database Performance Tuning Best Practices for DBAs and Deve...
 
In Search of Database Nirvana: Challenges of Delivering HTAP
In Search of Database Nirvana: Challenges of Delivering HTAPIn Search of Database Nirvana: Challenges of Delivering HTAP
In Search of Database Nirvana: Challenges of Delivering HTAP
 
Setting Up Sumo Logic - Apr 2017
Setting Up Sumo Logic - Apr 2017Setting Up Sumo Logic - Apr 2017
Setting Up Sumo Logic - Apr 2017
 
High performance coding practices code project
High performance coding practices code projectHigh performance coding practices code project
High performance coding practices code project
 

Mehr von rsnarayanan

Kevin Ms Web Platform
Kevin Ms Web PlatformKevin Ms Web Platform
Kevin Ms Web Platform
rsnarayanan
 
Harish Understanding Aspnet
Harish Understanding AspnetHarish Understanding Aspnet
Harish Understanding Aspnet
rsnarayanan
 
Harish Aspnet Dynamic Data
Harish Aspnet Dynamic DataHarish Aspnet Dynamic Data
Harish Aspnet Dynamic Data
rsnarayanan
 
Harish Aspnet Deployment
Harish Aspnet DeploymentHarish Aspnet Deployment
Harish Aspnet Deployment
rsnarayanan
 
Whats New In Sl3
Whats New In Sl3Whats New In Sl3
Whats New In Sl3
rsnarayanan
 
Silverlight And .Net Ria Services – Building Lob And Business Applications Wi...
Silverlight And .Net Ria Services – Building Lob And Business Applications Wi...Silverlight And .Net Ria Services – Building Lob And Business Applications Wi...
Silverlight And .Net Ria Services – Building Lob And Business Applications Wi...
rsnarayanan
 
Advanced Silverlight
Advanced SilverlightAdvanced Silverlight
Advanced Silverlight
rsnarayanan
 
Occasionally Connected Systems
Occasionally Connected SystemsOccasionally Connected Systems
Occasionally Connected Systems
rsnarayanan
 
Developing Php Applications Using Microsoft Software And Services
Developing Php Applications Using Microsoft Software And ServicesDeveloping Php Applications Using Microsoft Software And Services
Developing Php Applications Using Microsoft Software And Services
rsnarayanan
 
Build Mission Critical Applications On The Microsoft Platform Using Eclipse J...
Build Mission Critical Applications On The Microsoft Platform Using Eclipse J...Build Mission Critical Applications On The Microsoft Platform Using Eclipse J...
Build Mission Critical Applications On The Microsoft Platform Using Eclipse J...
rsnarayanan
 
J Query The Write Less Do More Javascript Library
J Query   The Write Less Do More Javascript LibraryJ Query   The Write Less Do More Javascript Library
J Query The Write Less Do More Javascript Library
rsnarayanan
 
Ms Sql Business Inteligence With My Sql
Ms Sql Business Inteligence With My SqlMs Sql Business Inteligence With My Sql
Ms Sql Business Inteligence With My Sql
rsnarayanan
 
Windows 7 For Developers
Windows 7 For DevelopersWindows 7 For Developers
Windows 7 For Developers
rsnarayanan
 
What Is New In Wpf 3.5 Sp1
What Is New In Wpf 3.5 Sp1What Is New In Wpf 3.5 Sp1
What Is New In Wpf 3.5 Sp1
rsnarayanan
 
Ux For Developers
Ux For DevelopersUx For Developers
Ux For Developers
rsnarayanan
 
A Lap Around Internet Explorer 8
A Lap Around Internet Explorer 8A Lap Around Internet Explorer 8
A Lap Around Internet Explorer 8
rsnarayanan
 

Mehr von rsnarayanan (20)

Walther Aspnet4
Walther Aspnet4Walther Aspnet4
Walther Aspnet4
 
Walther Ajax4
Walther Ajax4Walther Ajax4
Walther Ajax4
 
Kevin Ms Web Platform
Kevin Ms Web PlatformKevin Ms Web Platform
Kevin Ms Web Platform
 
Harish Understanding Aspnet
Harish Understanding AspnetHarish Understanding Aspnet
Harish Understanding Aspnet
 
Walther Mvc
Walther MvcWalther Mvc
Walther Mvc
 
Harish Aspnet Dynamic Data
Harish Aspnet Dynamic DataHarish Aspnet Dynamic Data
Harish Aspnet Dynamic Data
 
Harish Aspnet Deployment
Harish Aspnet DeploymentHarish Aspnet Deployment
Harish Aspnet Deployment
 
Whats New In Sl3
Whats New In Sl3Whats New In Sl3
Whats New In Sl3
 
Silverlight And .Net Ria Services – Building Lob And Business Applications Wi...
Silverlight And .Net Ria Services – Building Lob And Business Applications Wi...Silverlight And .Net Ria Services – Building Lob And Business Applications Wi...
Silverlight And .Net Ria Services – Building Lob And Business Applications Wi...
 
Advanced Silverlight
Advanced SilverlightAdvanced Silverlight
Advanced Silverlight
 
Netcf Gc
Netcf GcNetcf Gc
Netcf Gc
 
Occasionally Connected Systems
Occasionally Connected SystemsOccasionally Connected Systems
Occasionally Connected Systems
 
Developing Php Applications Using Microsoft Software And Services
Developing Php Applications Using Microsoft Software And ServicesDeveloping Php Applications Using Microsoft Software And Services
Developing Php Applications Using Microsoft Software And Services
 
Build Mission Critical Applications On The Microsoft Platform Using Eclipse J...
Build Mission Critical Applications On The Microsoft Platform Using Eclipse J...Build Mission Critical Applications On The Microsoft Platform Using Eclipse J...
Build Mission Critical Applications On The Microsoft Platform Using Eclipse J...
 
J Query The Write Less Do More Javascript Library
J Query   The Write Less Do More Javascript LibraryJ Query   The Write Less Do More Javascript Library
J Query The Write Less Do More Javascript Library
 
Ms Sql Business Inteligence With My Sql
Ms Sql Business Inteligence With My SqlMs Sql Business Inteligence With My Sql
Ms Sql Business Inteligence With My Sql
 
Windows 7 For Developers
Windows 7 For DevelopersWindows 7 For Developers
Windows 7 For Developers
 
What Is New In Wpf 3.5 Sp1
What Is New In Wpf 3.5 Sp1What Is New In Wpf 3.5 Sp1
What Is New In Wpf 3.5 Sp1
 
Ux For Developers
Ux For DevelopersUx For Developers
Ux For Developers
 
A Lap Around Internet Explorer 8
A Lap Around Internet Explorer 8A Lap Around Internet Explorer 8
A Lap Around Internet Explorer 8
 

Kürzlich hochgeladen

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Kürzlich hochgeladen (20)

Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 

Analysis Services Best Practices From Large Deployments

  • 2. Key Takeaways How to design your cubes efficiently How to effectively partition your facts How to optimize cube and query processing How to ensure that your solution scales well
  • 3. Agenda Cube Design Storage and Partitioning Aggregations Processing Scalability
  • 4. Tips for Designing Dimensions and Facts Base fact data sources on views Can use query hints Can facilitate write-back partitions for measure groups containing semi-additive measures Avoid Linked Dimensions Use the Unknown Member
  • 5. Tips for designing Attributes Avoid unnecessary attributes Use AttributeHierarchyEnabled property with care Use Key Columns appropriately
  • 6. Query performance Dimension storage access is faster Produces more optimal execution plans Aggregation design Enables aggregation design algorithm to produce effective set of aggregations Dimension security DeniedSet = {State.WA} should deny cities and customers in WA—requires attribute relationships
  • 7. How attribute relationships affect performance
  • 8. After adding attribute relationships… Don’t forget to remove redundant relationships! All attributes have implicit relationship to key Examples: Customer  City (not redundant) Customer  State (redundant) Customer  Country (redundant) Date  Month (not redundant) Date  Quarter (redundant) Date  Year (redundant)
  • 9. User Defined Hierarchies Pre-defined navigation paths through dimensional space defined by attributes Why create user defined hierarchies? Guide end users to interesting navigation paths Existing client tools are not “attribute-aware” Performance Optimize navigation path at processing time Materialization of hierarchy tree on disk Aggregation designer favors user defined hierarchies
  • 10. Natural Hierarchies 1:M relation (via attribute relationships) between every pair of adjacent levels Examples: Country-State-City-Customer (natural) Country-City (natural) State-Customer (natural) Age-Gender-Customer (unnatural) Year-Quarter-Month (depends on key columns) How many quarters and months? 4 & 12 across all years (unnatural) 4 & 12 for each year (natural)
  • 11. Natural Hierarchies Performance implications Only natural hierarchies are materialized on disk during processing Unnatural hierarchies are built on the fly during queries (and cached in memory) Create natural hierarchies where possible Using attribute relationships Not always appropriate (e.g., Age-Gender)
  • 12. Benefits of Partitioning Partitions can be added, processed, deleted independently Update to last month’s data does not affect prior months’ partitions Sliding window scenario easy to implement e.g., 24 month window  add June 2006 partition and delete June 2004 Partitions can have different storage settings Storage mode (MOLAP, ROLAP, HOLAP) Aggregation design Alternate disk drive Remote server
  • 13. Benefits of Partitioning Partitions can be processed and queried in parallel Better utilization of server resources Reduced data warehouse load times Queries are isolated to relevant partitions  less data to scan SELECT … FROM … WHERE *Time+.*Year+.*2006] Queries only 2006 partitions Bottom line  partitions enable: Manageability Performance Scalability
  • 14. Best Practices for Partitioning No more than 20M rows per partition Specify partition / data slice Optional (but still recommended) for MOLAP: server auto- detects the slice and validates against user-specified slice (if any) Should reflect, as closely as possible, the data in the partition Must be specified for ROLAP Remote partitions for scale out
  • 15. Best Practices for Designing Partitions Design from the start Partition boundary and intervals Determine what storage model and aggregation level fits best Frequently queried  MOLAP with lots of aggs Periodically queried  MOLAP with less or no aggs Real-time ROLAP with no aggs Pick efficient data types in fact table
  • 16. What is Proactive Caching Benefits of Proactive caching Considerations for using proactive caching Use correct latency and silence settings Useful in a transaction-oriented system in which changes are unpredictable
  • 17. What are Aggregations Benefits of Aggregations Aggregating data in partitions
  • 18. Aggregation Design Algorithm Evaluate cost/benefit of aggregations Relative to other aggregations Designed in “waves” from top of pyramid Cost is related to aggregation size Benefit is related to “distance” from another aggregation Storage design wizard Assumes all combinations of attributes are equally likely Can be done before you know Fact Table the query load Usage based optimization wizard Assumes query pattern resembles your selection from the query log Representative history is needed
  • 19. Aggregation Design Algorithm Examines the AggregationUsage property to build list of candidate attributes Full: Every agg must include the attribute None: No agg can include the attribute Unrestricted: No restrictions on the algorithm Default: Unrestricted if attribute is All, key or belongs to a natural hierarchy, None otherwise Builds the attribute lattice
  • 20. How to Monitor Aggregation Usage?   Hit    Profiler
  • 21. How to Monitor Aggregation Usage?   Miss   Profiler
  • 22. Best Practices for Aggregations Define all possible attribute relationships Set accurate attribute member counts and fact table counts Set AggregationUsage to guide agg designer Set rarely queried attributes to None Set commonly queried attributes to Unrestricted Do not build too many aggregations In the 100s, not 1000s! Do not build aggregations larger than 30% of fact table size (aggregation design algorithm doesn’t)
  • 23. Best Practices for Aggregations Aggregation design cycle Use Storage Design Wizard (~20% perf gain) to design initial set of aggregations Enable query log and run pilot workload (beta test with limited set of users) Use Usage Based Optimization (UBO) Wizard to refine aggregations Use larger perf gain (70-80%) Reprocess partitions for new aggregations to take effect Periodically use UBO to refine aggregations
  • 24. Processing Options ProcessFull Fully processes the object from scratch ProcessClear Clears all data—brings object to unprocessed state ProcessData Reads and stores fact data only (no aggs or indexes) ProcessIndexes Builds aggs and indexes ProcessUpdate Incremental update of dimension (preserves fact data) ProcessAdd Adds new rows to dimension or partition
  • 25. Best Practices for Processing Use XMLA scripts in large production systems Automation (e.g., using ascmd) Finer control over parallelism, transactions, memory usage, etc. Don’t just process the entire database! Dimension processing Performance is limited by attribute relationships Key attribute is a big bottleneck Define all possible attribute relationships Eliminate redundant relationships—especially on key! Bind Dimension data sources to views instead of tables or named queries
  • 26. Best Practices for Processing Partition processing Split ProcessFull into ProcessData + ProcessIndexes for large partitions—consumes less memory Monitor aggregation processing spilling to disk (perfmon counters for temp file usage) Add memory, turn on /3GB, move to x64/ia64 Fully process partitions periodically Achieves better compression over repeated incremental processing Data sources Avoid using .NET data sources—OLEDB is order of magnitude faster for processing
  • 27. Improving multi-user performance Increase Query parallelism Block long running queries Use a load balancing cluster
  • 28.
  • 29.
  • 30. © 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.