In-Memory Columnstore Indexes-Make Your Data Warehouse Fly
Joe D’Antoni
Philadelphia SQL Server Business Intelligence
Group
19 November 2013
About Me
Solution Architect, Anexinet
@jdanton – Twitter
jdanton1@yahoo.com
Joedantoni.wordpress.com – Blog, Slides
Agenda
Indexes—a basic overview
Columnstore—an introduction
Report Performance—Demo
2012 and 2014—What’s Changing?

2014—Demo
Questions
Indexes
• Data Structure that allows us to
speed data retrieval, by
maintaining an extra copy of
data
• Can be filtered
• Can be function based, or
ordered
• Penalty is that writes become
more expensive
• More storage required
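A minimal sketch of the idea above, using Python's built-in `sqlite3` module rather than SQL Server (an assumption made only so the example is self-contained). The table `sales` and the index `idx_sales_customer` are hypothetical names. The index is an extra, separately stored structure that the engine can search instead of scanning the whole table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales ("
    "sale_id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO sales (customer_id, amount) VALUES (?, ?)",
    [(i % 100, i) for i in range(1000)],
)

# The index is an additional structure: every later write must maintain it too.
conn.execute("CREATE INDEX idx_sales_customer ON sales (customer_id)")

# Ask the engine how it would run a lookup on the indexed column.
plan_rows = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM sales WHERE customer_id = 42"
).fetchall()
plan = " ".join(row[-1] for row in plan_rows)  # last field is the plan text
conn.close()
```

The plan text names the index, which shows the lookup is served from the extra copy rather than a full table scan.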
Indexes in SQL Server
• Clustered vs Nonclustered
• Non-clustered index: “just
an index”
Clustered Index
• Data is stored in pages, ordered
by the clustered index key
• Data in clustered index is only
stored on disk once (it’s the data
from the tables)
• Table without a clustered index is
called a heap—no order at all
Non-Clustered Index
• Duplicate copy of the indexed columns
• Provides a pointer from the index to the
table data
• Index entries are ordered by the index key; the table data keeps its own order
So Why All This Talk About Indexes?
Data Warehouse Queries
• Data Warehouses have a lot of
data
• Querying lots of data can
take a really long time
• Processing data row by row—
may not be the most efficient
way to perform aggregations
Traditional Approaches To Improving
Performance
• Partitioned Tables
• Indexed Views
• Data Compression
Introducing Columnstore Indexes (SQL
2012)
• Data is stored in columns, as
opposed to rows
• This allows a much higher rate of
compression
• Columns not used in a query are
simply not scanned, nor returned
• Recommended practice is to add
most columns in a table to an index
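The points above can be sketched in a few lines of Python. This is an illustration only: the rows are made-up data, and run-length encoding is used as a simple stand-in for compression, since the slides do not describe the actual scheme SQL Server uses.

```python
from itertools import groupby

# Hypothetical fact rows: (order_id, region, quantity).
rows = [
    (1, "East", 5),
    (2, "East", 3),
    (3, "East", 7),
    (4, "West", 2),
    (5, "West", 9),
]

def to_columns(rows, names):
    """Pivot row tuples into one list per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}

def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]

columns = to_columns(rows, ["order_id", "region", "quantity"])

# Repeated values sit side by side within a column, so they compress well.
region_rle = run_length_encode(columns["region"])

# A sum over quantity reads one column; order_id and region are never touched.
total_quantity = sum(columns["quantity"])
```

Five `region` values collapse to two pairs here, and the aggregate touches only the one column it needs.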
Columnar Data
Storage
Columnstore 2012
Demo
So How Is It So Much Faster?
• Very good compression ratio for Column oriented
data
• Better use of Memory
• Segment Elimination Skips Large Chunks of Data
• Batch Mode
• Processes data in “batches” of about 1,000 rows
rather than row by row
• 7-40x CPU savings with batch mode
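Segment elimination can be sketched as follows, assuming each segment keeps the minimum and maximum value of a column as metadata. The segment layout, field names, and data are hypothetical and far simpler than the real storage format.

```python
# Hypothetical segments: each stores min/max metadata for the "year" column.
segments = [
    {"min_year": 2010, "max_year": 2011,
     "year": [2010, 2010, 2011], "amount": [10, 20, 30]},
    {"min_year": 2012, "max_year": 2012,
     "year": [2012, 2012, 2012], "amount": [40, 50, 60]},
    {"min_year": 2013, "max_year": 2013,
     "year": [2013, 2013, 2013], "amount": [70, 80, 90]},
]

def sum_amount_for_year(segments, year):
    """Sum amounts for one year, skipping segments whose range excludes it."""
    total = 0
    scanned = 0
    for seg in segments:
        if year < seg["min_year"] or year > seg["max_year"]:
            continue  # eliminated using metadata alone; no rows are read
        scanned += 1
        total += sum(a for y, a in zip(seg["year"], seg["amount"]) if y == year)
    return total, scanned

result = sum_amount_for_year(segments, 2012)
```

Only one of the three segments is read for the 2012 query; the other two are skipped from their metadata.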

“The key to getting the best
performance is to make sure
your queries process the
large majority of data in
Columnstore All The Things?
• Awesome performance—so what’s
the negative?
• Can’t update/insert in 2012
• Can only be nonclustered index—
so we are storing more data on
disk
• Data types are somewhat limited
• One index per table
• Can’t be a sorted index
So Where To Use Columnstore
Indexes?
• Only on Large Tables—Fact
tables and Dimension Tables >
3 Million Rows
• Include Every Column
• Structure Queries as star joins
with grouping and aggregation
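The query shape recommended above (one large fact table joined to a small dimension, then grouped and aggregated) might look like the sketch below. It runs on Python's built-in `sqlite3`, which has no columnstore indexes, so it shows only the shape of the query. The tables `fact_sales` and `dim_product` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE fact_sales (sale_id INTEGER PRIMARY KEY,
                             product_id INTEGER, amount INTEGER);
    INSERT INTO dim_product VALUES (1, 'Bikes'), (2, 'Helmets');
    INSERT INTO fact_sales VALUES (1, 1, 500), (2, 1, 700), (3, 2, 40);
""")

# Star join: the fact table joins to a dimension, then groups and aggregates.
star_query = """
    SELECT d.category, SUM(f.amount) AS total_amount
    FROM fact_sales AS f
    INNER JOIN dim_product AS d ON d.product_id = f.product_id
    GROUP BY d.category
    ORDER BY d.category
"""
totals = conn.execute(star_query).fetchall()
conn.close()
```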

More details here
Columnstore 2014
Columnstore in 2014
• Fewer Data Type Limitations
• Updateable
• Can be Clustered Index

• New Archival Compression Mode
• Batch Mode Improvements
Columnstore Updates (2014)

[Diagram: updates to the index are collected until they reach 1000 rows; the tuple mover then moves them into the index]
Columnstore Updates (2014)
• Bulk Inserts go through
special API
• Updates are processed
as inserts and deletes,
so they are an expensive
operation
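A toy model of why an update is expensive, assuming compressed rows are never changed in place: the old row is marked as deleted and the new version is inserted separately. The class and method names are hypothetical, and the real structures differ in detail.

```python
class ToyColumnstore:
    """Toy model: compressed rows are immutable; changes are tracked aside."""

    def __init__(self, compressed_rows):
        self.compressed = list(compressed_rows)  # never modified after load
        self.deleted = set()                     # positions marked as deleted
        self.delta = []                          # newly inserted rows

    def insert(self, row):
        self.delta.append(row)

    def delete(self, position):
        self.deleted.add(position)

    def update(self, position, new_row):
        # An update costs two operations: a delete plus an insert.
        self.delete(position)
        self.insert(new_row)

    def scan(self):
        """Return live rows: undeleted compressed rows, then inserted rows."""
        live = [r for i, r in enumerate(self.compressed) if i not in self.deleted]
        return live + self.delta

store = ToyColumnstore([("a", 1), ("b", 2)])
store.update(0, ("a", 10))
visible_rows = store.scan()
```

After the update, the original row is still physically present in `store.compressed`; it is only hidden from scans, which is why frequent updates are costly.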
Columnstore 2014
Demo
What Do We Do Differently in 2014
• Best Practices are mostly the
same
• Batch mode gets enhanced and
gains more query types
• No need to worry about dropping
and rebuilding indexes—just
append data
• Still focus on large tables where
data is not frequently updated
• Archival Compression Good for
old unused data
Questions
Contact
jdanton1@yahoo.com
Joedantoni.wordpress.com
@jdanton


Editor's Notes

  1. Clustered indexes sort and store the data rows in the table or view based on their key values. These are the columns included in the index definition. There can be only one clustered index per table, because the data rows themselves can be sorted in only one order. The only time the data rows in a table are stored in sorted order is when the table contains a clustered index. When a table has a clustered index, the table is called a clustered table. If a table has no clustered index, its data rows are stored in an unordered structure called a heap.
  2. Generally, nonclustered indexes are created to improve the performance of frequently used queries not covered by the clustered index or to locate rows in a table without a clustered index (called a heap). You can create multiple nonclustered indexes on a table or indexed view.
  3. The columnstore index in SQL Server employs Microsoft’s patented Vertipaq™ technology, which it shares with SQL Server Analysis Services and PowerPivot. SQL Server columnstore indexes don’t have to fit in main memory, but they can effectively use as much memory as is available on the server. Portions of columns are moved in and out of memory on demand.
  4. What data types cannot be used in a columnstore index? The following data types cannot be used in a columnstore index: decimal or numeric with precision > 18, datetimeoffset with precision > 2, binary, varbinary, image, text, ntext, varchar(max), nvarchar(max), cursor, hierarchyid, timestamp, uniqueidentifier, sql_variant, xml. The SQL Server 2012 implementation did not support a number of data types such as numeric beyond precision 18, datetimeoffset beyond precision 2, GUID and binary columns. The upcoming version adds support for all the above data types. It also introduces support for storing short strings by value instead of converting all strings to a 32-bit ID within a dictionary. This removes the extra overhead associated with the dictionary and helps improve the columnstore compression even further.
  5. Include every column of the table in the columnstore index. If you don't, then a query that references a column not included in the index will not benefit from the columnstore index much or at all. Structure your queries as star joins with grouping and aggregation as much as possible. Avoid joining pairs of large tables. Join a single large fact table to one or more smaller dimensions using standard inner joins. Use a dimensional modeling approach for your data as much as possible to allow you to structure your queries this way. Use best practices for statistics management and query design. This is independent of columnstore technology. Use good statistics and avoid query design pitfalls to get the best performance. See the white paper on SQL Server statistics for guidance. In particular, see the section "Best Practices for Managing Statistics."