SQL Server 2012 introduced columnstore indexes, which provide significant performance improvements for data warehouse and analytics queries against large datasets. Columnstore indexes store data by column rather than by row, so a query reads only the columns it needs. This results in lower I/O and higher data compression than row storage. Columnstore indexes also use a new batch processing execution mode, which can further improve query performance by processing many rows at once in memory rather than row by row. In SQL Server 2012, a table with a columnstore index becomes read-only, but the index provides an easy way to boost query performance for analytics workloads by 10x to 100x without needing separate data marts or cubes.
2. Columnstore indexes
• Column Store vs. Row Store
• Columnstore benefits
• Columnstore indexes
• Columnstore index internals
• Adding data to Columnstore index
3. Row Store and Column Store
In a row store, data is stored tuple by tuple.
In a column store, data is stored column by column.
4. Row Store and Column Store
Most queries do not process all the attributes of a particular relation.
[Diagram: a Customers row with columns id, name, address, city, state, age; the query below uses only a few of them]
SELECT c.name, c.address
FROM Customers c
WHERE c.region = 'Moscow'
5. Row Store and Column Store
Row Store
(+) Easy to add/modify a record
(-) Might read in unnecessary data
Column Store
(+) Only need to read in relevant data
(-) Tuple writes require multiple accesses
So column stores are suitable for read-mostly, read-intensive, large data repositories
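The trade-off above can be illustrated with a small sketch. This is a toy in-memory model, not how SQL Server lays out pages: the sample rows, the `region` column values and the function names are all assumptions made up for illustration. It runs the same name/address-by-region lookup against both layouts and counts how many values each one has to touch.

```python
# Hypothetical in-memory model of the two layouts; not SQL Server's storage format.
rows = [
    {"id": 1, "name": "Ann", "address": "1 Main St", "region": "Moscow", "age": 31},
    {"id": 2, "name": "Bob", "address": "2 Oak Ave", "region": "Kyiv", "age": 45},
    {"id": 3, "name": "Eve", "address": "3 Elm Rd", "region": "Moscow", "age": 28},
]

def to_column_store(row_store):
    """Pivot a list of row dicts into one list per column."""
    return {col: [r[col] for r in row_store] for col in row_store[0]}

def query_row_store(row_store):
    """Row layout: every field of every row is read, even though only three are needed."""
    values_read = 0
    result = []
    for r in row_store:
        values_read += len(r)
        if r["region"] == "Moscow":
            result.append((r["name"], r["address"]))
    return result, values_read

def query_column_store(col_store):
    """Column layout: only the region, name and address columns are read."""
    needed = ("region", "name", "address")
    values_read = sum(len(col_store[c]) for c in needed)
    result = [
        (n, a)
        for reg, n, a in zip(col_store["region"], col_store["name"], col_store["address"])
        if reg == "Moscow"
    ]
    return result, values_read

columns = to_column_store(rows)
row_result, row_reads = query_row_store(rows)
col_result, col_reads = query_column_store(columns)
print(row_result == col_result, row_reads, col_reads)
```

Both layouts return the same rows, but the column layout touches fewer values because the `id` and `age` columns are never read. Adding a record, on the other hand, means appending to every column list, which is the "multiple accesses" cost listed for column stores.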
6. Compression
Trades I/O for CPU
Higher data value locality in column stores
Techniques such as run-length encoding are far more useful
Schemes
Null Suppression
Dictionary encoding
Run Length encoding
Bit-Vector encoding
Heavyweight schemes
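Two of the schemes listed above, run-length encoding and dictionary encoding, can be sketched in a few lines. This is a simplified illustration only; the function names and the sample `state_column` are assumptions, and the actual on-disk encodings used by SQL Server are not described in these slides.

```python
def run_length_encode(values):
    """Collapse consecutive repeats into [value, count] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs

def run_length_decode(runs):
    """Expand [value, count] pairs back into the original sequence."""
    return [v for v, count in runs for _ in range(count)]

def dictionary_encode(values):
    """Replace each value with a small integer id into a list of distinct values."""
    dictionary = []
    ids = {}
    codes = []
    for v in values:
        if v not in ids:
            ids[v] = len(dictionary)
            dictionary.append(v)
        codes.append(ids[v])
    return dictionary, codes

# Hypothetical column with high value locality, as is typical within one column.
state_column = ["CA", "CA", "CA", "NY", "NY", "TX", "TX", "TX", "TX"]
runs = run_length_encode(state_column)
dictionary, codes = dictionary_encode(state_column)
print(runs)
print(dictionary, codes)
```

Nine values collapse into three runs here because equal values sit next to each other. The same values interleaved with other attributes in a row layout would produce far fewer repeats, which is why these techniques pay off more in column stores.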
9. Improved Data Warehouse Query performance
Columnstore indexes provide an easy way to significantly improve data warehouse and decision support query performance against very large data sets
Performance improvements for “typical” data warehouse queries from 10x to 100x
Ideal candidates include queries against star schemas that use filtering, aggregations and grouping against very large fact tables
10. Good Candidates for Columnstore Indexing
Table candidates:
Very large fact tables (for example, billions of rows)
Larger dimension tables (millions of rows) with compression-friendly column data
If unsure, it is easy to create a columnstore index and test the impact on your query workload
Query candidates (against a table with a columnstore index):
Scan versus seek (columnstore indexes don’t support seek operations)
Aggregated results far smaller than table size
Joins to smaller dimension tables
Filtering on fact / dimension tables (star schema pattern)
Sub-set of columns (being selective in columns versus returning ALL columns)
12. Defining the Columnstore Index
Columnstore index is nonclustered (secondary)
Base table can be clustered index or heap
One CS index per table
Multiple other nonclustered (B-tree) indexes allowed
But may not be needed
CS index must be partition-aligned if table is partitioned
[Diagram: base table (clustered index OR heap) with nonclustered indexes and one nonclustered columnstore index; an indexed view and a filtered index are also shown]
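13. Column Segments and Dictionaries
[Diagram: a table with columns C1 to C6 divided into sets of about 1M rows (segment 1 … segment N); within each set, every column is stored as its own column segment, with associated dictionaries]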
14. Memory management
• Memory management is automatic
• Columnstore is persisted on disk
• Needed columns fetched into memory
• Columnstore segments flow between disk and memory
SELECT C2, SUM(C4)
FROM T
GROUP BY C2;
[Diagram: column segments for T.C1, T.C2, T.C3 and T.C4 persisted on disk; only the segments of the needed columns (T.C2 and T.C4 for this query) are fetched into memory]
16. xVelocity
Microsoft SQL Server family of memory-optimized and in-memory technologies
xVelocity In-Memory Analytics Engine
xVelocity Memory-Optimized Columnstore Indexes
The xVelocity engine is designed with 3 principles in mind:
Performance, Performance, Performance!
17. How Are These Performance Gains Achieved?
Two complementary technologies:
Storage
Data is stored in a compressed columnar data format (stored by column) instead of row store format (stored by row).
New “batch mode” execution
Vector-based query execution capability
Data can then be processed in batches versus row-by-row
Depending on filtering and other factors, a query may also benefit from “segment elimination”: bypassing million-row chunks (segments) of data, further reducing I/O
18. Batch mode processing
Process ~1000 rows at a time
Vector operators implemented
Greatly reduced CPU time (7 to 40X)
[Diagram: a batch object made up of column vectors and a bitmap of qualifying rows]
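The batch object described above (column vectors plus a bitmap of qualifying rows) can be modelled roughly as follows. This is only a sketch of the data structure under assumed names (`batches`, `filter_batch`, `sum_batch`, and the `qty` and `price` columns are all hypothetical); plain Python loops do not reproduce the CPU savings of real vector operators.

```python
BATCH_SIZE = 1000  # the slide gives roughly 1000 rows per batch

def batches(columns, batch_size=BATCH_SIZE):
    """Yield batches; each batch holds one vector (list slice) per column."""
    n = len(next(iter(columns.values())))
    for start in range(0, n, batch_size):
        yield {name: vec[start:start + batch_size] for name, vec in columns.items()}

def filter_batch(batch, column, predicate):
    """A filter does not copy rows; it produces a bitmap of qualifying rows."""
    return [predicate(v) for v in batch[column]]

def sum_batch(batch, column, bitmap):
    """An aggregate consumes a column vector together with the bitmap."""
    return sum(v for v, keep in zip(batch[column], bitmap) if keep)

# Hypothetical table stored column by column.
table = {
    "qty": list(range(2500)),
    "price": [2] * 2500,
}

total = 0
batch_count = 0
for batch in batches(table):
    bitmap = filter_batch(batch, "qty", lambda q: q >= 2000)
    total += sum_batch(batch, "price", bitmap)
    batch_count += 1

print(batch_count, total)
```

The point of the design is that operators hand whole batches to each other, so per-row overhead such as function calls and bookkeeping is paid once per batch rather than once per row.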
19. Segment Elimination
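select Date, count(*)
from dbo.Purchase
where Date >= 20120201
group by Date
column_id  segment_id  min_data_id  max_data_id
1          1           20120101     20120131
1          2           20120115     20120215
1          3           20120201     20120228
Segment 1 can be skipped for this query because its max_data_id (20120131) is below the filter value 20120201.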
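The elimination decision for the slide's query can be sketched as a check against the per-segment minimum and maximum values. The metadata values are taken from the slide; the function name `segments_to_scan` and the dictionary layout are assumptions for illustration, not an actual SQL Server interface.

```python
# Per-segment min/max metadata for the Date column, as listed on the slide.
segments = [
    {"segment_id": 1, "min_data_id": 20120101, "max_data_id": 20120131},
    {"segment_id": 2, "min_data_id": 20120115, "max_data_id": 20120215},
    {"segment_id": 3, "min_data_id": 20120201, "max_data_id": 20120228},
]

def segments_to_scan(segment_metadata, lower_bound):
    """Keep only segments that may contain a value >= lower_bound."""
    return [
        s["segment_id"]
        for s in segment_metadata
        if s["max_data_id"] >= lower_bound
    ]

# Predicate from the slide: where Date >= 20120201
to_scan = segments_to_scan(segments, 20120201)
eliminated = [s["segment_id"] for s in segments if s["segment_id"] not in to_scan]
print(to_scan, eliminated)
```

Segment 2 still has to be read even though part of its range falls below the filter value, because the metadata only says that some of its rows may qualify.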
20. Columnstore format + batch mode Variations
Columnstore indexing alone + traditional row mode in Query Processor
Columnstore indexing + batch mode in Query Processor
Columnstore indexing + hybrid of batch and traditional row mode in Query Processor
21. Plan operators supported in batch mode
Filter
Project
Scan
Local hash (partial) aggregation
Hash inner join
(Batch) hash table build
23. Maintaining Data in a Columnstore Index
Once built, the table becomes “read-only” and INSERT/UPDATE/DELETE/MERGE is no longer allowed
ALTER INDEX REBUILD / REORGANIZE not allowed
How can I modify index data?
Drop columnstore index / make modifications / add columnstore index
UNION ALL (but be sure to validate performance)
Partition switches (IN and OUT)
25. Summary
SQL Server 2012 offers significantly faster query performance for data warehouse and decision support scenarios
10x to 100x performance improvement depending on the schema and query
I/O reduction and memory savings through columnstore compressed storage
CPU reduction with batch versus row processing, further I/O reduction if segment elimination occurs
Easy to deploy and requires less management than some legacy ROLAP or OLAP methods
No need to create intermediate tables, aggregates, pre-processing and cubes
Interoperability with partitioning
26. Resources
Columnar Storage in SQL Server 2012 (PDF)
SQL Server Columnstore Performance Tuning
Inside the SQL Server 2012 Columnstore Indexes
24 HOP Russia 2013 – Dmitry Pilyugin (video - rus)
SQL Server Columnstore Performance Tuning (video)
27. SQL SERVER 2012 - COLUMNSTORE INDEXES
Denis Reznik
Senior Database Architect at The Frayman Group
Microsoft SQL Server MVP
denisreznik@live.ru
@denisreznik
http://reznik.uneta.com.ua