VertiPaq-ing With SQL 2012 Columnstore Indexes for Astonishing Results
Abstract
Significant work has gone into SQL 2012. The column store has changed the storage paradigm for data warehousing. Column stores, supported by vector-based query execution and substantial progress in data compression, have emerged as potential game changers. Microsoft is targeting next-generation technologies, with increasing use of in-memory optimized techniques in SQL 2012. One of the most anticipated implementations is the xVelocity Column Store Index. This paper contains a detailed discussion of the column store, touching on the basics, limitations, and design considerations, with demonstration examples. "Columnar" and "column store" will be used interchangeably in the discussion below.
www.aditi.com
2. Audiences
This paper targets IT Planners, and Memory are highly
Architects, BI Users, CTO‟s and dependent on the method of
CIO‟s evaluating the SQL 2012 storage and querying. Queries
answering their large data can be seen as (1) read-only
grounded business needs. It also workloads which are mostly
targets the enthusiasts of 2012 to reporting and DW systems and
provide new dimensions and out (2) the read-write workloads
of box thinking to the mostly OLTP systems. The
organization to maintain data potential game changer in read-
using SQL 2012. only workloads is the storage
method to minimize I/O and
Overview Memory based operations
Data is growing exponentially where-as conventional RDBMS
and performance is becoming a stores data in row based storage
recurring cost for the system. Based on the columnar
organizations. Performance design a gain or speedup in
impact can be broadly factorized queries can be seen from 10X up
as (1) I/O based operations (2) to 1000X.
Memory based operations & (3)
Operations to transfer data in For instance suppose the
N/W or other peripherals. I/O employee information looks like
…
ID      Name            Street Address
32498   Diamond John    Crouse Manson
45298   Mary Anglos     Wilson Street
Acronyms
RDBMS  Relational Database Management System
I/O    Input/Output
N/W    Network
OLTP   Online Transaction Processing
DW     Data Warehouse
ETL    Extract, Transform & Load
HCC    Hybrid Columnar Compression
FCC    Full Columnar Compression
LOB    Large Objects
OLAP   Online Analytical Processing
CSI    Column Store Indexes
In a conventional RDBMS (e.g. SQL Server 2008) it will be stored in a row-by-row fashion, as shown in Diagram-1 below, whereas columnar storage (e.g. SQL Server 2012 using a column store index) will store it in a column-wise fashion, as shown in Diagram-2 below.
Diagram-1
Diagram-2
VertiPaq-ing with xVelocity Columnar Indexes

Why Columnar Indexes
There is a great debate around the columnar structure. Below are the benefits of using columnar indexes, specific to SQL Server 2012.
Astonishing Results
The thought is to start with a result-driven discussion. Below are the graphs for the query performance results. We started with 12.5 million rows and doubled the count each time, up to 400 million records, to get total sales across products. The comparison of query times with "Column Store Index" vs. "Conventional Indexes" is exceptionally revealing in favor of column store indexes. Graph-3 and Graph-4 show a big gap of hundreds of seconds. The results show that the warm cache takes comparatively very little time. Graph-5 below shows the gain as an "X" number of times in warm and cold cache. The results really encourage the use of CSI.
Graph-3: Query Execution - Cold Cache (query execution time in seconds, Column Store Indexes vs. Conventional Indexes, over 12.5 to 400 million rows)

Graph-4: Query Execution - Warm Cache (query execution time in seconds, Column Store Indexes vs. Conventional Indexes, over 12.5 to 400 million rows)

Graph-5: Gain in query performance - Warm vs. Cold Cache (performance gain in number of times, over 12.5 to 400 million rows)
VertiPaq-ing & Apollo
VertiPaq-ing is vertical partitioning of the data, or in other words, storing the data in a column-wise fashion. Diagram-3 shows the difference between the row and column store data layouts in terms of pages, the basic unit of storage. For a detailed discussion refer to the Basics Behind the Scenes section below. The goal behind it is to accelerate common DW queries.

Apollo is the code name for this new feature available in SQL 2012.
xVelocity
xVelocity is the term used by the SQL Server family to define next-generation technologies. These technologies target surprisingly high query performance on modern hardware. They are optimized to use multiple cores and high memory. More use of these techniques is found in Analysis Services and PowerPivot. Portions of data are moved in and out of memory based on the memory available on the machine. Concisely, they are highly optimized in-memory operations. Below is a screenshot of CPU utilization by the xVelocity technologies, taken during column store index creation.
Graph-7
Basics Behind the Scenes

Full Column Store & Hybrid Column Store
SQL 2012 uses full columnar storage, where each column is compressed and stored together. This technique has its own advantages, but it may negatively impact performance when accessing more columns or performing a small number of updates (although SQL 2012 indexes are read-only). Refer to Diagram-2 and Diagram-3 for details. On the other hand, the hybrid column store uses both rows and columns to store data. The hybrid technique creates a column vector for each column, compresses it, and stores it in data blocks. The compression unit contains more than one data block and contains all columns for a row; the rows span multiple data blocks. Diagram-5 shows the detail of the concept. This way a large amount of compression is achieved, and the performance issues of full columnar databases are also mitigated.
Graph-8
For the warehousing scenario, the HCC approach often performs worse for several reasons.
Segments & Dictionaries
The columnar indexes are physically stored in the form of segments. Typically, the data per column is broken into one million rows per segment (a.k.a. row groups), for each column. The segments are stored as LOBs and can contain multiple pages. The index build process runs in parallel and creates as many full segments as possible, but some of the segments can be comparatively small. These segments store highly compressed values, because the values within a segment are of the same data type. Even for large repeated data the compression is better still, as a unique small symbol is stored for each duplicate value, which reduces the size by a large degree. Segments also have header records containing max_data_id, min_data_id, etc. This header information is used to omit a complete partition, commonly known as segment estimation. The Anti-Patterns part details even more about segments.

A value reference links to an entry in one of up to two hash dictionaries. Dictionaries are kept in memory, and the data value ids from the segment are resolved from these dictionaries, but this process is deferred as long as possible for performance reasons. Simply put, for a table with one partition, every column added to the column store index will be added as a row in the segment.
Batch Mode Processing & Row Processing
Query processing can be done either in row mode or in batch mode. Taking a join as an example, the physical join operation takes two sets as input parameters and produces the output set based on the join conditions. In row processing, each of these sets is processed in row-by-row mode (e.g. nested loop join), and a large amount of CPU is used. Most of the time, operating on a large amount of data also spills over to disk (mostly in hash joins), which can be checked via tempdb usage, and also increases the memory used for processing.

Vector processing was one of the biggest revolutions that brought the fundamentals of batch processing. The physical operators for batch query processing take a batch of rows in the form of an array (of the same type) and process the data. Here a batch typically consists of 1000 rows of data, and each column within the batch is stored as a vector in memory, which is known as vector-based query processing. It uses the latest algorithms to utilize multicore CPUs and the latest hardware. Batch processing works on the compressed data when possible and thus reduces the CPU overhead on join operations, filters, etc. (only some of the operators).
Demonstration Example
For demonstration purposes, the Contoso Retail DW made available by Microsoft is being used.

Creation of Columnar Index
The index can be created using T-SQL or the index creation wizard; ultimately both are the same. The basic T-SQL index statement is as below, and details can be captured from the "Column Store Indexes Vs. Conventional Indexes" query here.
Creation of Columnar Index – Code Block 1
CREATE [ NONCLUSTERED ] COLUMNSTORE INDEX index_name ON <tablename> ( column [ ,...n ] )
[ WITH ( <column_index_option> [ ,...n ] ) ]
[ ON {
{ partition_scheme_name ( column_name ) }
| filegroup_name
| "default"
}
]
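As a concrete sketch of the syntax above (column choices here are illustrative and assume the standard ContosoRetailDW schema; the index name mirrors the one used later in this paper):

```sql
-- Hypothetical instance of Code Block 1 against the demo database.
-- Only dimension keys and measures are included, per the design guidance below.
CREATE NONCLUSTERED COLUMNSTORE INDEX csiFactOnlineSales
ON dbo.FactOnlineSales
(
    OnlineSalesKey, DateKey, StoreKey, ProductKey,
    CustomerKey, SalesQuantity, SalesAmount
);
```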
The column store index can also be created from the index creation wizard by following the steps there.
Performance Observations
The performance check was done on the table dbo.FactOnlineSales from ContosoRetailDW, having 12.6 million records. More facts and limitations are detailed with examples in the Anti-Patterns section below.
Graph-10
Design Considerations

Candidates for Column Store Index
DW scenarios most commonly follow the pattern of read-only data, where data is appended periodically, commonly using the sliding-window pattern. They seldom have updates. Data is retained for a long time, at least 8 to 10 years, resulting in huge volumes of data: gigabytes, terabytes or even petabytes in some scenarios. The DW data is mostly divided in either a star or snowflake pattern, where the fact table contains millions or billions of records ready to be aggregated in different fashions. All these schemas are typically queried using star join queries for grouping aggregations. Column store indexes are designed to accelerate queries satisfying the above criteria. This makes CSI an absolutely perfect fit for DW scenarios, so the rule of thumb says large fact tables are the candidates for CSI. Security of the data is not a big concern, because CSI also supports Transparent Data Encryption (TDE).

Another question is which columns need to be added to the CSI. The answer seems considerably easy: all the columns can be included, as long as they follow the prerequisites quoted in the Anti-Patterns section. This holds for small or medium scale DWs, because audit columns or some of the text columns in the fact tables do not take considerably large space. Although the algorithm is designed to compress at large scale, as a best practice we should include only the dimension keys and measures from the table. Fact-less fact tables and multivalued dimensions are not always a perfect fit, because they will not gain the benefit of batch processing, but the advantages of compression, parallel reads and segment estimation will definitely be there. Below is an example of choosing candidate tables for CSI. This selection is based mostly on the number of rows, and mostly they will be fact tables only.
Candidates for Column Store Index - Code Block 1
Diagram-9
--Choose candidate tables for CSI
SELECT O.name TableName ,SUM(P.rows) CountOfRows
FROM sys.partitions P
JOIN sys.objects O ON P.object_id = O.object_id
WHERE O.type = 'U' --user tables
GROUP BY O.name
ORDER BY 2 DESC
Graph-11
Below is an example of choosing candidate columns for the fact table, using FactOnlineSales. The mark is used to show the selection of the columns for the dimensions. Along with these, all the measures will also be included while creating the CSI. We'll ignore the audit and degenerate dimension columns here, e.g. SalesOrderNumber, SalesOrderLineNumber, ETLLoadID, LoadDate, UpdateDate. OnlineSalesKey has the primary key defined on it, so it will automatically be added to the CSI if not mentioned in the column list. Corresponding SQL code refers to "Candidates for Column Store Index – Code Block 2" in the SQL file.
Graph-12
SQL code "Candidates for Column Store Index – Code Block 3" in the corresponding SQL file contains an example of a star join query whose results are accelerated to within a second. Both star and snowflake schema queries are benefited by CSI. Snowflake may have issues if any of the primary or secondary snowflake dimensions is too large to support batch mode.
Anti-Patterns
Design considerations always live within defined boundaries. Anti-patterns and limitations provide the foundations for deciding on those boundaries, making them the leading voice in design decisions.
• Only one CSI can be created on a table. It returns the below error.
Msg 35339, Level 16, State 1, Line 1
Multiple nonclustered columnstore indexes are not supported.
• The key column concept is not relevant in CSI, because data is stored in columnar fashion and each column is stored in its own way. Having a clustered key makes a difference only while creating the column store index, in terms of reads and ordering, but there is no impact on query performance.
• The base table on which the index is created becomes read-only, i.e. it can't be updated or altered. Managing updates is covered below.
• Interestingly, the order of the columns in the CREATE INDEX statement has no impact either on creating the index or on query performance.
• Only limited data types are allowed for CSI, i.e.:
int, bigint, smallint, tinyint, money, smallmoney, bit, float, real, char(n), varchar(n), nchar(n), nvarchar(n), date, datetime, datetime2, smalldatetime, time, datetimeoffset with precision <= 2, decimal/numeric with precision <= 18
• CSI can have at most 1024 columns and doesn't support:
- Sparse & Computed Columns
- Indexed Views or Views
- Filtered & Clustered Indexes
- The INCLUDE, ASC, DESC, FORCESEEK keywords
- Page and row compression, and the vardecimal storage format
- Replication, Change Tracking, Change Data Capture & FILESTREAM
• CSI can simply be ignored using the IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX hint. This option means the user does not have to know the other index names. It is even more helpful when index names were left to automatic naming by SQL Server and it is difficult to know the name while writing queries.
Anti-Patterns - Code Block 1
SELECT P.BrandName Product ,SUM(SalesAmount) Sales
FROM dbo.FactOnlineSales S
JOIN dbo.DimProduct P ON S.ProductKey = P.ProductKey
GROUP BY P.BrandName
OPTION (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX)
• Only the operators below support batch mode processing, and therefore the full benefit of CSI.
- Filter
- Project
- Scan
- Local hash (partial) aggregation
- Hash inner join
- (Batch) hash table build
• Outer joins, MAXDOP 1, NOT IN, and UNION ALL are not supported for batch mode execution. Instead, we can tweak the existing query to the same effect. Some examples are below.
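For instance, one such tweak (a sketch on the ContosoRetailDW tables; the paper's exact rewrites live in the accompanying SQL file) is replacing NOT IN with a left anti-semi join, which the optimizer can run as a hash join in batch mode:

```sql
-- NOT IN form (blocks batch mode, per the limitation above):
-- SELECT SUM(SalesAmount) FROM dbo.FactOnlineSales
-- WHERE ProductKey NOT IN (SELECT ProductKey FROM dbo.DimProduct WHERE BrandName = N'Contoso');

-- LEFT JOIN ... IS NULL form, eligible for a batch mode hash join.
-- (Equivalent to NOT IN only when DimProduct.ProductKey is non-nullable.)
SELECT SUM(S.SalesAmount) AS TotalSales
FROM dbo.FactOnlineSales S
LEFT JOIN dbo.DimProduct P
    ON P.ProductKey = S.ProductKey AND P.BrandName = N'Contoso'
WHERE P.ProductKey IS NULL;
```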
• Although filters on CSI are pushed down to the segments to get the benefit of segment estimation, string columns do not have max or min values in the segment headers and hence cannot utilize these filters. So string filters or joins should be avoided on CSI.
• It was observed during the above investigation that after partition switching, the first query compilation takes a lot of time. Similar behavior was not found during insertion or deletion of data from the table. It may be because of estimation changes due to the partition switching, mainly in large-data scenarios. It is recommended to warm the cache after partition switching.
Managing Column Store Indexes

Memory Considerations
Column Store Indexes (CSI) is a technology created with modern hardware in mind, multiple CPUs and high memory, operating on large amounts of data, especially terabytes. For CSI, memory is used at creation time and at execution time; we'll discuss them separately.

Creating a CSI is a parallel operation. It depends on the available CPUs and the MaxDOP setting restrictions. For large data, creation of a CSI takes comparatively more time than B-Tree indexes. Before the creation, a memory estimate is made for the query execution, and that memory grant is provided. There may be cases where the initial memory request is not granted and error code 8657 or 8658 occurs. This can be resolved by granting enough memory to the server and the corresponding workload group. There can be requests for more memory at a later point of execution, and if enough memory is not there, an insufficient-memory error can flash, i.e. error code 701 or 802.
The latter error codes come at run time during execution, whereas the former come at the start of query execution. The solution is to change the memory grant for the workload group or increase memory in the server. 8657 or 8658 can sometimes occur because of the SQL Server configuration of "min server memory" and "max server memory": suppose the minimum memory needed for the CSI is 3 GB and SQL Server has taken only 1 GB due to the min server memory configuration; then this can happen. The resolution can be either to run a COUNT(*) query on any of the large tables before the index creation, or to set the min and max server memory values to the same number. This will help SQL Server take the required memory at start time. The same resolution applies to the other two errors, 701 and 802. As a concluding remark, a CSI can't be created if enough memory is not there in the system. One of the easiest solutions for such memory considerations is vertical partitioning of the existing table, i.e. breaking the existing table into two or more tables.

CSI uses batch mode processing for execution. Typically a batch consists of 1000 rows stored in a vector. This type of processing is optimized to use modern hardware and provides better parallelism. Batch operators can work on compressed data, resulting in a high degree of processing in a small amount of memory. A considerable amount of memory is needed to execute batch mode query processing; if the memory is not present, the optimizer changes the query plan to use row mode. We can check the use of batch mode processing in the query plan as below. Batch mode processing always uses memory, so whenever there is a disk spill due to large data, row-by-row processing replaces the batches; this is mostly seen during hash joins on large tables. Another reason for row-by-row processing is out-of-date statistics, which in turn spill the data to disk, resulting in row-by-row operation. To check this, the extended event "batch_hash_table_build_bailout" can be configured. The warning "Operator used tempdb to spill data during execution" also flashes for this kind of behavior.
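A minimal sketch of configuring that extended event (session and target names here are illustrative; the event name is as shipped with SQL Server 2012):

```sql
-- Create and start an Extended Events session that records hash-join
-- bailouts from batch mode back to row mode.
CREATE EVENT SESSION BatchModeBailout ON SERVER
ADD EVENT sqlserver.batch_hash_table_build_bailout
ADD TARGET package0.ring_buffer;
GO
ALTER EVENT SESSION BatchModeBailout ON SERVER STATE = START;
```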
Graph-13
Add & Modify Data in Column Store Index
A table with a CSI is read-only, i.e. we can't perform operations like INSERT, UPDATE, DELETE or MERGE. These operations fail with an error message, e.g.:
Msg 35330, Level 15, State 1, Line 1
UPDATE statement failed because data cannot be updated in a table with a columnstore index. Consider disabling the
columnstore index before issuing the UPDATE statement, then rebuilding the columnstore index after UPDATE is complete.
Considering this, we have the below options or workarounds for these operations.
• Have staging/work tables without a CSI (in most cases these are drop-and-recreate tables). Create the CSI and switch it into an empty partition of the table. We have to make sure we have an empty partition, because if there is data in the partition and a CSI has been created on the table, we can't split it. Below is an example code segment for the same. Corresponding SQL code refers to "Add & Modify Data – Code Block 2" in the SQL file.
Add & Modify Data - Code Block 1
ALTER INDEX csiFactOnlineSales ON dbo.FactOnlineSales DISABLE
GO
UPDATE dbo.FactOnlineSales
SET SalesAmount = SalesAmount * 2
GO
ALTER INDEX csiFactOnlineSales ON dbo.FactOnlineSales REBUILD
• Switch a partition from the table to an empty staging table. Drop the CSI from the staging table, perform the updates, inserts, etc., rebuild the CSI, and switch the staging table into the (now empty from the previous switch) partition. Corresponding SQL code refers to "Add & Modify Data – Code Block 3" in the SQL file.
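That switch-out / modify / switch-in pattern can be sketched as follows (object, column, and partition names here are hypothetical; it assumes dbo.FactOnlineSales is partitioned and the staging table has an identical schema on the same filegroup):

```sql
-- 1. Switch the target partition out to the staging table (metadata-only).
ALTER TABLE dbo.FactOnlineSales SWITCH PARTITION 3 TO dbo.FactOnlineSales_Staging;

-- 2. The staging table has no CSI, so normal DML works here.
UPDATE dbo.FactOnlineSales_Staging SET SalesAmount = SalesAmount * 2;

-- 3. Rebuild the CSI on the staging table to match the main table's indexes.
CREATE NONCLUSTERED COLUMNSTORE INDEX csiStaging
ON dbo.FactOnlineSales_Staging (OnlineSalesKey, DateKey, ProductKey, SalesAmount);

-- 4. Switch the modified data back into the now-empty partition.
ALTER TABLE dbo.FactOnlineSales_Staging SWITCH TO dbo.FactOnlineSales PARTITION 3;
```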
• We can choose to create different underlying tables representing one fact table and access all of them using UNION ALL views. Just disable the index on the most recent table, which will receive the updates, and rebuild/recreate the CSI. We can always get the data from those UNION ALL views.
• Put the data into a staging table, create the CSI on the staging table, and then just drop the existing table and rename the staging table to the original name (better to do both operations in a transaction; note that both of them are metadata-only operations). This takes more processing time but ensures high availability. This option should be chosen only when there is a relatively small or medium amount of data in the table.
Size of Column Store Index
The size of a CSI is based on the size of the segments and dictionaries. Most of the space is used by the segments. We can get the size in a more simplified manner; below are both simple and actual size estimation queries.

Statistics are another valuable consideration. We have statistics for the base table having the CSI, but not for the CSI in particular. A statistics object is created for the CSI, but SHOW_STATISTICS shows null for the CSI while showing values for the clustered index. The statistics object for the CSI is used for database cloning (a DB clone is a statistics-only copy of the database, used for investigating query plan issues). Corresponding SQL code refers to "Size of Column Store Index – Code Block 1" in the SQL file.
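A simple size estimate of the kind described can be sketched by summing the on-disk size of segments and dictionaries from the catalog views (a sketch; the paper's own Code Block 1 is in the SQL file):

```sql
-- Approximate CSI footprint per table: segment bytes and dictionary bytes.
SELECT OBJECT_NAME(P.object_id) AS TableName,
       SUM(S.on_disk_size) / 1048576.0 AS SegmentsMB
FROM sys.column_store_segments S
JOIN sys.partitions P ON P.hobt_id = S.hobt_id
GROUP BY P.object_id;

SELECT OBJECT_NAME(P.object_id) AS TableName,
       SUM(D.on_disk_size) / 1048576.0 AS DictionariesMB
FROM sys.column_store_dictionaries D
JOIN sys.partitions P ON P.hobt_id = D.hobt_id
GROUP BY P.object_id;
```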
Column Store Indexes Vs. Conventional Indexes

Column Store Index vs. Clustered Indexes
CSI is different from all other conventional indexes; they are utilities for different types of scenarios. So far we have seen that CSI is a lot faster than conventional indexes. Below is an example where the CSI takes almost 99% of the relative cost in the query plan. Here we are using a highly selective query, i.e. only a few records are being queried, against both of the indexes. Please note that for an apples-to-apples comparison we are using only those columns that were used in the CSI creation. The comparison among indexes is always based on the nature of use of the data, i.e. the queries. SQL Server automatically figures out the highly utilized query; moreover, plan guides can also be pinned for abnormal behavior of queries.
Column Store Index vs. Clustered Indexes – Code Block 1
SELECT SalesAmount ,ProductKey FROM dbo.FactOnlineSales S WITH (INDEX(PK_FactOnlineSales_SalesKey))
WHERE OnlineSalesKey IN (32188091,23560484,31560484,27560484)
SELECT SalesAmount ,ProductKey FROM dbo.FactOnlineSales S WITH (INDEX(csiFactOnlineSales))
WHERE OnlineSalesKey IN (32188091,23560484,31560484,27560484)
Graph-14
Column Store Index vs. Covering Indexes vs. One Index per Column
"Covering index" is a commonly used technique to achieve high-performing queries. Creation of a covering index is always a cautious decision. It is very difficult to create indexes which cover all the queries, particularly in data warehousing scenarios where users are open to using any kind of query. A covering index can be achieved either by adding the columns into the index, i.e. a composite index, or by pinning them to the B-Tree using the INCLUDE keyword. A very detailed description can be referred to from here.

CSI or the covering index: again, the decision depends on the amount of data, the query, and the memory. On the same notes, CSI uses compression as well as batch mode processing, hence faster scans. If we have an entire star schema in our DW, the CSI is best to use for aggregation queries. It also reduces index design and maintenance time, and one index shows all of the magic. Selecting one more column can make a covering index ineffective, which is not the case with a normal index. Creating one index per column will not be useful when selecting multiple columns. Moreover, all the covering or other indexes capture a relatively larger footprint on disk, being multiple copies of the same data, resulting in more maintenance and sometimes adding downtime to the application.
On the other hand, here is another example which shows that CSI does not benefit query execution time much, because of large hash joins and batch execution falling back to row-by-row execution. Here we'll just create an example table joined with FactOnlineSales; both tables will have the same cardinality. We can easily see a warning message and warning icon in the actual query plan. Corresponding SQL code refers to "Column store index Vs. Covering Indexes – Code Block 1" in the SQL file.
Graph-15
Graph-16
Performance Tuning Considerations

Analyzing TempDB Uses
TempDB is at the core of all the temp operations for which memory is not granted. SQL Server uses tempdb extensively, and even if users have read-only permissions on a database, read-write permissions on tempdb are still ensured. The point of analysis is tempdb usage while creating and querying a CSI. Surprisingly, tempdb was not used during creation or at querying time. Tempdb will be used when execution is done using row-by-row operation instead of batches, meaning data is spilled to disk, i.e. tempdb is used; above is the example showing this behavior. Corresponding SQL code refers to "Analyzing TempDB Uses – Code Block 1" in the SQL file.
Maximizing Segment Estimation
The data of a CSI is divided into segments, and this information is stored in the sys.column_store_segments system view. The columns relevant to understanding segment estimation are in the query below.
Maximizing Segment Estimation - Code Block 1
SELECT S.column_id ,S.segment_id ,S.min_data_id ,S.max_data_id
FROM sys.column_store_segments S
Each segment stores its min and max values, and if the filter value does not belong to a segment, the scan for that segment is skipped; this is called segment estimation. E.g., if we write a query which says "OnlineSalesKey > 30000000", the second segment will be ignored.
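Such a query might look like the sketch below (the threshold comes from the example above; the aggregate itself is illustrative):

```sql
-- Segments whose max_data_id falls below the predicate value can be
-- skipped entirely during the scan.
SELECT COUNT(*) AS MatchingRows
FROM dbo.FactOnlineSales
WHERE OnlineSalesKey > 30000000;
```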
Graph-16
In this example we see that the min and max values are skewed. This is not ideal for segment estimation, because only one segment is eliminated. Here we need to find how to arrange the data so that we have the maximum number of partitions and the values are aligned properly to the segments. We can use the below techniques.
Maximizing Segment Estimation - Code Block 2
SELECT G.ContinentName ,SUM(S.SalesAmount) TotalSales
FROM dbo.FactOnlineSales S
JOIN dbo.DimCustomer C ON C.CustomerKey = S.CustomerKey
JOIN dbo.DimGeography G ON G.GeographyKey = C.GeographyKey
WHERE S.DateKey BETWEEN '2012-01-01' AND '2012-12-30'
GROUP BY G.ContinentName
Graph-17
Graph-18
On running the above query again, we find that the scan skips the crossed partitions and thus segment estimation is maximized. It is nice to use this approach, but it is very hard to manage these kinds of partitions, and it may end up becoming a tool in itself. Moreover, adding multiple other dimensions will add similar complexity to the partitions. We also should have enough data in each partition so that the segments are utilized; if we have fewer than 1 million records, we may end up crash-landing, and queries may not help as expected.

Ensuring Batch Mode Execution
Batch mode vector-based execution helps the query a lot. The MAXDOP configuration helps check this behavior: ensure that none of the queries is using the MAXDOP 1 option. The below example shows the difference in the execution plan: the query plan under MAXDOP 1 shows no use of parallel and batch operations. Moreover, the cost of the CSI scan is also higher for MAXDOP 1.
Graph-19
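The contrast can be reproduced with a pair of queries like the sketch below (illustrative; any large aggregate over the fact table will do):

```sql
-- Eligible for parallel, batch mode execution:
SELECT ProductKey, SUM(SalesAmount) AS Sales
FROM dbo.FactOnlineSales
GROUP BY ProductKey;

-- Forcing serial execution disables batch mode in SQL Server 2012:
SELECT ProductKey, SUM(SalesAmount) AS Sales
FROM dbo.FactOnlineSales
GROUP BY ProductKey
OPTION (MAXDOP 1);
```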
Batch mode processing is not supported for outer joins in this release of SQL Server. To get the benefit of batch processing, we need to change the queries a bit. One typical example of changing the query is as below, where we first get the inner join values and then join them back to the dimension table for the outer join records. The query plan shows all the different results where batch mode and row mode are used along with parallelism. It also shows that the alternate query takes just 12% of the relative cost. These examples show that we need to redesign our conventional queries to take advantage of batch mode. The bottom line is that we have to take time and keep a close eye on each query being written against a CSI. Query plans should be monitored closely for further changes, not only in development but also in production environments. Corresponding SQL code refers to "Ensuring Batch Mode Execution – Code Block 1" in the SQL file.
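The rewrite described (inner aggregate first, then the outer join back to the dimension) can be sketched as follows (a hedged reconstruction, not the paper's exact Code Block 1):

```sql
-- Outer-join form: the LEFT JOIN itself cannot run in batch mode in SQL 2012.
-- SELECT P.ProductKey, SUM(S.SalesAmount)
-- FROM dbo.DimProduct P
-- LEFT JOIN dbo.FactOnlineSales S ON S.ProductKey = P.ProductKey
-- GROUP BY P.ProductKey;

-- Rewritten: the inner aggregate is batch mode eligible; the small outer
-- join back to the dimension runs afterwards.
SELECT P.ProductKey, A.Sales
FROM dbo.DimProduct P
LEFT JOIN
(
    SELECT ProductKey, SUM(SalesAmount) AS Sales
    FROM dbo.FactOnlineSales
    GROUP BY ProductKey
) A ON A.ProductKey = P.ProductKey;
```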
Using Partitioning
Partitions are another key performance factor which still works with columnar indexes; having said that, every non-partitioned table always has one physical partition. We can create partitions on the table and then create the CSI. The CSI must be partition-aligned with the table partitions: the CSI uses the same partition scheme as the base table, and an equal number of partitions will be created. We can also switch a partition in or out, and that will partition the corresponding CSI. Segments are created on the partitions, so if the partitions are skewed, we can see a larger number of segments. Modifying a CSI using partitioning is detailed here.

Every partition is compressed separately and has its own dictionaries. These dictionaries are shared across the segments within the same partition. We can easily find which segments and dictionaries belong to any specific partition; thus, partition switching is still a metadata operation. We already created partitions on the table while exploring CSI management above and will use the same to explore more. The example below shows that we have 2 dictionaries for each partition, irrespective of segments, except partition 5, which is without dictionaries. Exploring this behavior is left to the reader.

Using Partitioning - Code Block 1
Diagram-21
/*Exploring Partitions*/
SELECT * FROM sys.partition_schemes
SELECT * FROM sys.partition_functions
SELECT * FROM sys.partition_range_values
SELECT * FROM sys.partition_parameters

SELECT P.partition_number ,COUNT(*) Segment#
FROM sys.column_store_segments S
JOIN sys.partitions P ON P.hobt_id = S.hobt_id
WHERE P.object_id = OBJECT_ID('[dbo].[FactOnlineSales]')
GROUP BY P.partition_number

SELECT P.partition_number ,COUNT(*) Dictionary#
FROM sys.column_store_dictionaries D
JOIN sys.partitions P ON P.hobt_id = D.hobt_id
WHERE P.object_id = OBJECT_ID('[dbo].[FactOnlineSales]')
GROUP BY P.partition_number

Conclusions
Columnar indexes are a breakthrough innovation with the capability to push the envelope and improve the overall performance of ETL and DW workloads. The current 2012 version has some limitations, with the expectation of improvement in future versions, especially with respect to the addition and modification of data. It is recommended to use best practices, understand case studies, and fit the defined business problem to the columnar solution pattern.
About Aditi
Aditi helps product companies, web businesses and enterprises leverage the power of cloud, social and mobile to drive competitive advantage. We are one of the top 3 Platform-as-a-Service solution providers globally and one of the top 5 Microsoft technology partners in the US.

We are passionate about emerging technologies and are focused on custom development. We provide innovative solutions in 4 domains:
Digital Marketing solutions that enable online businesses to increase customer acquisition
Cloud Solutions that help companies build for traffic and computation surge
Enterprise Social that enables enterprises to enhance collaboration and productivity
Product Engineering services that help ISVs accelerate time-to-market

www.aditi.com
https://www.facebook.com/AditiTechnologies
http://www.linkedin.com/company/aditi-technologies
http://adititechnologiesblog.blogspot.in/
https://twitter.com/WeAreAditi