This document proposes techniques to optimize multi-column sorting in column-oriented databases. The state-of-the-art approach sorts each column individually, but this is inefficient for queries that sort on multiple columns. The proposed techniques manipulate bits across columns to improve parallelism, such as "bit borrowing" where bits are borrowed from one column to another. Experiments show the techniques provide up to 5.5x speedup over the state-of-the-art approach for analytic queries on real and benchmark datasets.
1. Fast Multi-Column Sorting in Main-Memory Column-Stores
Wenjian Xu†, Ziqiang Feng†, Eric Lo‡
†The Hong Kong Polytechnic University
‡The Chinese University of Hong Kong
13. Option 1: Stitch Together
[Figure: columns X (24-bit, sample values 0xEEEEEE and 0x000000) and Y (20-bit, sample values 0xAAAAA and 0xCCCCC). Column-at-a-Time sorts each column in a 32-bit bank (SIMD-sort, 8x parallelism), with a LOOKUP step between rounds. Stitching X and Y gives a 44-bit supercolumn, which needs a 64-bit bank (SIMD-sort, 4x parallelism).]
• Stitching X and Y: lower data parallelism
• Any alternatives other than stitching X and Y in this example?
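The stitching step can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function name `stitch` and the four sample rows are assumptions, and plain Python integers stand in for SIMD banks.

```python
def stitch(x_codes, y_codes, y_bits):
    """Concatenate each X code with its Y code: X in the high bits, Y in the low bits."""
    return [(x << y_bits) | y for x, y in zip(x_codes, y_codes)]

X_BITS, Y_BITS = 24, 20                  # code widths of the slide example
x = [0xEEEEEE, 0x000000, 0xEEEEEE, 0x000000]
y = [0xCCCCC, 0xAAAAA, 0xAAAAA, 0xCCCCC]

supercolumn = stitch(x, y, Y_BITS)       # every code is now 44 bits wide

# A single sort on the supercolumn orders the row ids by (X, Y).
order = sorted(range(len(x)), key=supercolumn.__getitem__)
print(order)
```

Because X occupies the high bits, comparing two stitched codes compares X first and falls back to Y only on ties, which is why one round of sorting is enough. The price is the wider code: 44 bits no longer fit a 32-bit bank.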
14. Option 2: Bit Borrowing
[Figure: columns X (24-bit) and Y (20-bit), compared against Option 1 (Stitch Together) and Column-at-a-Time, where each column is sorted in a 32-bit bank (SIMD-sort, 8x parallelism) with a LOOKUP step between rounds. Bit borrowing shifts X left by 4 bits and moves the top 4 bits of Y into it (e.g. 0xEEEEEE and 0xAAAAA give 0xEEEEEEA). X becomes 28-bit and still fits a 32-bit bank (SIMD-sort, 8x parallelism); the remaining 16 bits of Y fit a 16-bit bank (SIMD-sort, 16x parallelism).]
• Borrowing bits from Y to X: improved parallelism
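A minimal Python sketch of bit borrowing for the 24-bit / 20-bit example follows. It is not the authors' code: the helper names `borrow_bits` and `two_round_sort` and the five sample rows are assumptions, and ordinary Python sorting replaces the SIMD sort.

```python
from itertools import groupby

def borrow_bits(x_codes, y_codes, y_bits, k):
    """Move the k most significant bits of each Y code onto the low end of its X code."""
    rest = y_bits - k
    x_wide = [(x << k) | (y >> rest) for x, y in zip(x_codes, y_codes)]
    y_narrow = [y & ((1 << rest) - 1) for y in y_codes]
    return x_wide, y_narrow

def two_round_sort(first, second):
    """Sort row ids by `first`, then sort each group of tied values by `second`."""
    ids = sorted(range(len(first)), key=first.__getitem__)
    out = []
    for _, group in groupby(ids, key=first.__getitem__):
        out.extend(sorted(group, key=second.__getitem__))
    return out

x = [0xEEEEEE, 0x000000, 0xEEEEEE, 0x000000, 0xEEEEEE]
y = [0xCCCCC, 0xAAAAA, 0xAAAAA, 0xCCCCC, 0xAAAA9]

x28, y16 = borrow_bits(x, y, y_bits=20, k=4)   # X: 24 -> 28 bits, Y: 20 -> 16 bits
order = two_round_sort(x28, y16)
print(order)
```

The widened X still fits a 32-bit bank, while the shortened Y drops into a 16-bit bank, so the second round can use twice as many lanes. The final order is the same as sorting by (X, Y), because the borrowed bits are the most significant bits of Y.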
15. Optimal Plan
• Given 3 columns with 11-bit, 14-bit, and 21-bit to be sorted: Stitch together? Bit borrowing? Split into more rounds?
• Num. of possible plans: 2^(11+14+21)
• In the paper:
  • Cost model
  • Plan enumeration and search
16. Experiments
• Setup
Intel Xeon E5 10-core & Intel i7 quad-core
AVX2 instruction set (256 bits)
• Data sets
TPC-H
TPC-H Skew
TPC-DS
Real data (Airline Origin and Destination Survey)
20. Summary
• First work to pinpoint and tackle the issue of multi-column sorting
• Our technique: manipulate the bits across input columns
• Up to 5.5X speedup in query execution.
Editor's Notes
Joint work with Ziqiang Feng from PolyU and Eric Lo from CUHK.
This work is all about analytic databases.
In such databases we deal with read-mostly queries.
To support real-time query processing, we try to keep all the data in memory.
We use a column-oriented store.
Furthermore, data columns are encoded for memory efficiency.
And we use de-normalization techniques to eliminate joins.
Sorting is a crucial operation in main-memory column-stores (MMCS), as it can be used to implement SQL operators…
--In particular, sorting can utilize the SIMD features offered by modern CPUs.
--Take a column encoded with 44 bits: its values are loaded into SIMD registers for sorting.
--A current SIMD register is usually 256 bits long, much wider than a normal CPU register.
--In such a register, each operand, or bank, can be 8, 16, 32 or 64 bits wide.
--A 44-bit code has to use a 64-bit bank (need to mention that 32 bits is not enough!).
--During sorting, each SIMD instruction can process 4 column values in parallel. Compared to scalar sorting, SIMD sort could achieve…
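The relation between code width, bank size and parallelism can be worked out with a small Python sketch. The names `bank_for` and `lanes` are made up for this illustration; the 256-bit register and the four bank sizes are the ones given in the notes.

```python
REGISTER_BITS = 256             # width of one SIMD register
BANK_SIZES = (8, 16, 32, 64)    # possible operand (bank) widths in bits

def bank_for(code_bits):
    """Smallest bank that can hold one code of `code_bits` bits."""
    for bank in BANK_SIZES:
        if code_bits <= bank:
            return bank
    raise ValueError("code does not fit in any bank")

def lanes(code_bits):
    """Number of codes one SIMD instruction can process in parallel."""
    return REGISTER_BITS // bank_for(code_bits)

for width in (44, 28, 24, 20, 16):
    print(f"{width}-bit code -> {bank_for(width)}-bit bank, {lanes(width)}x parallelism")
```

A 44-bit code is too wide for a 32-bit bank, so it lands in a 64-bit bank with 256 / 64 = 4 values per instruction, while 24-bit and 20-bit codes each fit a 32-bit bank with 8 values per instruction.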
The audience may not be familiar with *code width* => mention it as *one column is encoded with 36 bits* and *another column has 16 bits as its code width*
Floating point numbers with limited precision can be scaled to integers by multiplication with a certain factor
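As a small illustration of this scaling, here is a Python sketch. The function name `scale_to_int` and the sample prices are invented for the example; the only assumption is that every value has at most `decimals` digits after the decimal point.

```python
def scale_to_int(values, decimals):
    """Encode fixed-precision floats as integers by multiplying with 10**decimals."""
    factor = 10 ** decimals
    # round() absorbs the tiny binary floating-point error of the multiplication
    return [round(v * factor) for v in values]

prices = [19.99, 5.50, 100.00]            # made-up values with 2 decimals
codes = scale_to_int(prices, decimals=2)
code_width = max(codes).bit_length()       # bits needed for the largest code
print(codes, code_width)
```

The integer codes keep the order of the original values, so sorting the codes gives the same result as sorting the floating point numbers.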
Explain more why a 16-bit bank is used: 8 bits is not enough, 32 bits is too wasteful
How to bring up multi-column sorting:
The two columns are stored separately. A traditional column-store first sorts column order_date, then sorts column retail_price within each group of tied order_date values
Red part represents the time spent for multi-column sorting; blue part refers to time spent for other operations
Next, we turn to column Y.
Note that, according to the ORDER BY clause, column Y must be ordered based on the ordering of column X.
Before sorting column Y, we have to re-order it according to the sorted column X.
This can be achieved by a sequence of lookup operations through the object ids of column X.
Now we get column Y ordered by X.
Next, we identify that there are two groups in column X, where each group contains tied values of X; correspondingly, the second round of sorting is performed within each group of column Y.
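The column-at-a-time steps just described can be sketched in Python as follows. This is a scalar illustration under assumptions, not the SIMD implementation from the paper: the function name `column_at_a_time` and the sample data are made up.

```python
def column_at_a_time(x, y):
    """Order rows by X, then by Y within each group of tied X values."""
    # Round 1: sort column X, keeping the object id (oid) of every value.
    oids = sorted(range(len(x)), key=x.__getitem__)
    x_sorted = [x[i] for i in oids]
    # Lookup: re-order column Y so that it follows the sorted column X.
    y_by_x = [y[i] for i in oids]
    # Round 2: find each run of tied X values and sort Y inside that run.
    start = 0
    for end in range(1, len(oids) + 1):
        if end == len(oids) or x_sorted[end] != x_sorted[start]:
            run = sorted(zip(y_by_x[start:end], oids[start:end]))
            y_by_x[start:end] = [value for value, _ in run]
            oids[start:end] = [oid for _, oid in run]
            start = end
    return oids, x_sorted, y_by_x

x = [2, 1, 2, 1]
y = [0xC, 0xA, 0xA, 0xF]
oids, x_sorted, y_sorted = column_at_a_time(x, y)
print(oids, x_sorted, y_sorted)
```

In this toy input there are two groups of tied X values, so the second round runs twice, once per group.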
We obtain the same result as the column-at-a-time solution.
Essentially, the stitching strategy sorts the two columns in one go, thus eliminating one round of sorting.
However, the reduced data parallelism may offset that benefit and make the stitching strategy inferior to the column-at-a-time solution.
Our cost model can accurately quantify the cost of each plan. In the model, we divide the process of multi-column sorting (MCS) into detailed steps and run calibration experiments to improve the accuracy of plan costs on specific platforms.
As for plan search, we devise pruning rules to make sure that the search process itself does not become a bottleneck.