1. Unlock Potential
Columnar Databases:
Data Does the Twist and Analytics Shout
William McKnight, President, McKnight Consulting Group
Copyright © 2011 McKnight Consulting Group, LLC All Rights Reserved – Confidential and Proprietary Slide 1
2. William McKnight,
www.mcknightcg.com
Helping organizations adopt business-effective information
management practices and technologies.
3. Agenda
• Row-Wise Design
• Columnar Storage
• Materialization
• Wrap-Up
4. Unlock Potential
Row-Wise Design
© McKnight Consulting Group, 2010
5. DBMS Design over the years
RDBMS design is virtually unchanged, except for
parallelism
Hardware, however:
Disk capacity has increased tremendously
(and become far cheaper)
CPU performance has improved too, but…
Transfer rates and seek times have improved
only modestly
6. L2 Cache Misses
CPU
L1
L2
Memory
Disk
7. Row-Wise DBMS Stores Data in Rows
CustomerID | CompanyName | ContactFirstName | ContactLastName | ContactTitle | PhoneNumber
1119 | m4ii | dhamotharan | achaiyan | solutions architect | 91222507176
1120 | Aris | Doug | Johnson | Practice Director | 206-676-5636
1121 | Stolt Offshore MS Ltd | Craig | Lennox | Mr | +66 1226 712519
1122 | Medtronic, Inc. | Mark | Kohls | Principle Database Administrator | 763.516.2557
1123 | Beckman Coulter | Tim | Parsons | Business Systems Manager | +61 22 996 0963
1124 | Banco de Bogotá | José Alfredo | López Arias | Administrador DWH | 5713320032
1126 | The Boeing Company | Mike | Roberts | Senior Business Process Architect | (206)655-7155
1127 | IT/1 Consulting | Leif B. | Soerensen | Data Warehouse Consultant | +65 26236691
1128 | Banco de Bogotá | JOSE ALFREDO | LOPEZ ARIAS | Administrador DWH | 5713320032
1133 | The HArtford | Jimmy | Chen | Business System Analyst | 215-653-2662
1134 | CGI Group | Terry | Petherick | Senior Consultant | 613-236-2155
1135 | Metavante Corporation | Ron | Kundinger | Assistant Vice President | 616-577-9227
1138 | CP Associates | Wilson | Mak | Consultant | 252-92593731
1142 | PRSB | Ming Long | Wu | Assistant Administrator | 226-2-23931261 ext 719
1143 | aft | greg | tanner | cto | 303.233.6122
1144 | Zamba Solutions | Jeff | McCall | Executive Vice President | 602-626-6125
1146 | MR Consultancy | Mukesh | Rughani | Mr | +66 (0)1379 662219
1147 | Intellor Group | Robin | Martin | Project Coordinator | 301-202-6766
1148 | Banco de Bogotá | José Alfredo | López Arias | Administrador DWH | 5713320032
8. Data Page Layout
Page Header
Records
1120 | Aris | Doug | Johnson | Practice Director | 206-676-5636 | doug.johnson@aris.com
1121 | Stolt Offshore MS Ltd | Craig | Lennox | Mr | +66 1226 71269 | craig.lennox@stoltoffshore.com
1122 | Medtronic, Inc. | Mark | Kohls | Principle Database Administrator | 763.516.2557 | mark.kohls@medtronic.com
Row IDs
Page Footer
9. Traditional databases
Calculate the average sales for the “A” stores in “NY”
Traditional approach:
• Data stored by row using small data pages (4K or 8K)
• For queries, select a ‘filter’
- Build B-tree index for filters,
- BUT if filter is not selective enough then scan the table
- Go to selected pages and add up sales numbers
- Randomly distributed data will result in most pages being read
- Still have to read irrelevant data in each page
Date | Store # | State | Class | Sales | Category | …
3/1/2010 | 32 | NY | A | 6 | Gen
3/1/2010 | 35 | CT | A | 9 | Spec
3/1/2010 | 36 | CT | C | 11 | Gen
3/1/2010 | 39 | SD | D | 8 | Gen
3/1/2010 | 42 | KY | A | 5 | Spec
3/1/2010 | 43 | VT | C | 14 | Spec
3/1/2010 | 47 | GA | A | 31 | Gen
3/1/2010 | 51 | MD | A | 4 | Sub
3/1/2010 | 55 | DC | D | 16 | Gen
3/1/2010 | 59 | NY | B | 7 | Gen
3/1/2010 | 62 | NJ | C | 9 | Spec
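The cost of the traditional approach can be sketched in Python. This is an illustration only, not how any particular RDBMS is implemented: rows are whole tuples taken from the sample table, `ROWS_PER_PAGE` is a made-up page capacity, and the scan touches every field of every row even though the query needs only State, Class and Sales.

```python
from statistics import mean

# Sample rows from the slide: (date, store, state, class, sales, category)
ROWS = [
    ("3/1/2010", 32, "NY", "A", 6, "Gen"),
    ("3/1/2010", 35, "CT", "A", 9, "Spec"),
    ("3/1/2010", 36, "CT", "C", 11, "Gen"),
    ("3/1/2010", 39, "SD", "D", 8, "Gen"),
    ("3/1/2010", 42, "KY", "A", 5, "Spec"),
    ("3/1/2010", 43, "VT", "C", 14, "Spec"),
    ("3/1/2010", 47, "GA", "A", 31, "Gen"),
    ("3/1/2010", 51, "MD", "A", 4, "Sub"),
    ("3/1/2010", 55, "DC", "D", 16, "Gen"),
    ("3/1/2010", 59, "NY", "B", 7, "Gen"),
    ("3/1/2010", 62, "NJ", "C", 9, "Spec"),
]

ROWS_PER_PAGE = 4  # hypothetical page capacity, chosen only for illustration


def to_pages(rows, rows_per_page):
    """Group whole rows into fixed-size pages, as a row store would."""
    return [rows[i:i + rows_per_page] for i in range(0, len(rows), rows_per_page)]


def average_sales_row_store(pages, state, store_class):
    """Full scan: every page and every field is read, relevant or not."""
    pages_read = 0
    fields_read = 0
    matches = []
    for page in pages:
        pages_read += 1
        for row in page:
            fields_read += len(row)
            _, _, row_state, row_class, sales, _ = row
            if row_state == state and row_class == store_class:
                matches.append(sales)
    return mean(matches), pages_read, fields_read


avg, pages_read, fields_read = average_sales_row_store(
    to_pages(ROWS, ROWS_PER_PAGE), "NY", "A"
)
print(avg, pages_read, fields_read)
```

Only one sample row qualifies (store 32), yet all pages and all six fields per row are read, which is the overhead the last two bullets describe.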
10. Unlock Potential
Columnar Storage
11. Columnar DBMS Stores Data in Columns
CustomerID: 1119 | 1120 | 1121 | 1122 | 1123 | 1124 | 1126 | 1127 | 1128 | 1133 | 1134
CompanyName: m4ii | Aris | Stolt Offshore MS Ltd | Medtronic, Inc. | Beckman Coulter | Banco de Bogotá | The Boeing Company | IT/1 Consulting | Banco de Bogotá | The HArtford | CGI Group
ContactFirstName: dhamotharan | Doug | Craig | Mark | Tim | José Alfredo | Mike | Leif B. | JOSE ALFREDO | Jimmy | Terry
ContactLastName: achaiyan | Johnson | Lennox | Kohls | Parsons | López Arias | Roberts | Soerensen | LOPEZ ARIAS | Chen | Petherick
ContactTitle: solutions architect | Practice Director | Mr | Principle Database Administrator | Business Systems Manager | Administrador DWH | Senior Business Process Architect | Data Warehouse Consultant | Administrador DWH | Business System Analyst | Senior Consultant
PhoneNumber: 91222507176 | 206-676-5636 | +66 1226 712519 | 763.516.2557 | +61 22 996 0963 | 5713320032 | (206)655-7155 | +65 26236691 | 5713320032 | 215-653-2662 | 613-236-2155
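As a rough sketch of the same idea in Python, the snippet below pivots a few of the customer rows into one list per column. The helper names `to_columns` and `get_row` are hypothetical, and a real columnar DBMS stores columns in pages on disk rather than in Python lists.

```python
COLUMNS = (
    "CustomerID", "CompanyName", "ContactFirstName",
    "ContactLastName", "ContactTitle", "PhoneNumber",
)

# First three customer rows from the slide
ROWS = [
    (1119, "m4ii", "dhamotharan", "achaiyan", "solutions architect", "91222507176"),
    (1120, "Aris", "Doug", "Johnson", "Practice Director", "206-676-5636"),
    (1121, "Stolt Offshore MS Ltd", "Craig", "Lennox", "Mr", "+66 1226 712519"),
]


def to_columns(rows, names):
    """Pivot row tuples into one list per column (hypothetical helper)."""
    return {name: list(values) for name, values in zip(names, zip(*rows))}


def get_row(store, position):
    """Rebuild one row by reading the same position from every column."""
    return tuple(store[name][position] for name in store)


store = to_columns(ROWS, COLUMNS)
print(store["CompanyName"])  # a query on one column touches only this list
print(get_row(store, 1))     # reassembling a full row touches all six lists
```

The row position is the only link between the columns, so reading a single column is cheap while rebuilding a whole row requires one lookup per column.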
12. Columnar Data Page Layout
Page Header
Records: 1120, 1121, 1122, 1123, 1124, 1125, …
Page Footer
13. Vertical Partitioning of Data
Columnar - Columns are stored independently
Date | Store # | State | Class | Sales | Category | …
3/1/2010 | 32 | NY | A | 6 | Gen
3/1/2010 | 35 | CT | A | 9 | Spec
3/1/2010 | 36 | CT | C | 11 | Gen
3/1/2010 | 39 | SD | D | 8 | Gen
3/1/2010 | 42 | KY | A | 5 | Spec
3/1/2010 | 43 | VT | C | 14 | Spec
3/1/2010 | 47 | GA | A | 31 | Gen
3/1/2010 | 51 | MD | A | 4 | Sub
3/1/2010 | 55 | DC | D | 16 | Gen
3/1/2010 | 59 | NY | B | 7 | Gen
3/1/2010 | 62 | NJ | C | 9 | Spec
Benefits:
• Consistent data types are easy to compress
• Resulting storage size is typically less than 50% the
size of the raw data
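One common way a column of repeated values can be compressed is dictionary encoding, sketched below in Python on the Category column. This is an assumption chosen for illustration; the slides do not say which compression scheme lies behind the 50% figure, and the function names are hypothetical.

```python
def dictionary_encode(values):
    """Replace each distinct value with a small integer code."""
    dictionary = []   # code -> value
    index = {}        # value -> code
    codes = []
    for value in values:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index[value])
    return dictionary, codes


def dictionary_decode(dictionary, codes):
    """Recover the original column from the dictionary and the codes."""
    return [dictionary[code] for code in codes]


# Category column from the sample table
category = ["Gen", "Spec", "Gen", "Gen", "Spec", "Spec",
            "Gen", "Sub", "Gen", "Gen", "Spec"]

dictionary, codes = dictionary_encode(category)
print(dictionary)
print(codes)
```

Because every value in a column has the same type and a limited set of distinct values, the codes need far fewer bits than the original strings. The same trick works poorly on a row, where neighbouring fields have unrelated types.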
14. Columnar Storage Options
Decomposed Storage Model
Positional Representation
Modified B-Tree/Run-Length Encoding
Bitmap
15. Modified B-Tree/Run-Length Encoding
Qtr | Store# | Sales
Q1 | 32 | 6
Q1 | 35 | 9
Q1 | 36 | 11
Q1 | 39 | 8
Q1 | 42 | 5
Q1 | 43 | 14
Q2 | 32 | 31
Q2 | 35 | 4
Q2 | 36 | 16
Q2 | 39 | 7
Q2 | 42 | 9
Encoded columns, as (Value, StartPosition, Count):
Qtr: (Q1, 1, 500) (Q2, 501, 999) (Q3, 1000, 1498)
Store#: (32, 1, 1) (35, 2, 2) (36, 3, 3)
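A minimal Python sketch of run-length encoding follows, using the (Value, StartPosition, Count) triple named on the slide with 1-based positions. The function names are hypothetical, and the input is the 11-row Qtr column from the sample table rather than the larger table implied by the slide's encoded values.

```python
from itertools import groupby


def rle_encode(column):
    """Collapse each run of equal values into (value, start_position, count)."""
    runs = []
    position = 1  # 1-based start position of the current run
    for value, group in groupby(column):
        count = sum(1 for _ in group)
        runs.append((value, position, count))
        position += count
    return runs


def rle_decode(runs):
    """Expand the triples back into the original column."""
    column = []
    for value, _start, count in runs:
        column.extend([value] * count)
    return column


# Qtr column of the sample table: six Q1 rows followed by five Q2 rows
qtr = ["Q1"] * 6 + ["Q2"] * 5

runs = rle_encode(qtr)
print(runs)
```

Run-length encoding pays off when a column is sorted or has long runs, as Qtr does here. A column such as Sales, where neighbouring values differ, would produce one triple per row and gain nothing.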
16. Workload Splitting
[Figure: the customer table (CustomerID, CompanyName, ContactFirstName, ContactLastName, ContactTitle, PhoneNumber) stored twice, once in a row-based structure and once in a columnar structure]
Same data in both structures
Optimizer or user determines which to use
17. The Value of Performance
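“How many MALES are NOT INSURED in CALIFORNIA?”
RDBMS: 10M rows, 800 Bytes/Row, 16K Page
800 Bytes x 10M / 16K Page = 500,000 I/Os
Gender | State | Insured
M | NY | Y
M | CA | Y
F | CT | N
M | MA | Y
M | CA | N
- Process large amounts of unused data
- Often requires full table scan
Bit vectors: 10M Bits per column, 16K Page
10M Bits x 3 col / 8 / 16K Page = 235 I/Os
Row | Gender | Insured | State
1 | M | Y | CA
2 | M | N | CA
3 | F | Y | NY
4 | M | N | CA
Bits per row for (Gender = M, Insured = N, State = CA):
Row 1: 1 0 1
Row 2: 1 1 1
Row 3: 0 0 0
Row 4: 1 1 1
Gender + Insured + State = 2 rows with all three bits set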
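The arithmetic and the bit-vector intersection on this slide can be checked with a short Python sketch. It assumes that "16K" means 16,000 bytes, which is the value that reproduces the slide's 500,000 and 235, and it uses Python integers as stand-ins for bitmaps. The `bitmap` helper is hypothetical.

```python
import math

PAGE_BYTES = 16_000   # assumption: "16K page" read as 16,000 bytes
ROW_COUNT = 10_000_000
ROW_BYTES = 800

# Row store: every 800-byte row must be read
row_store_ios = math.ceil(ROW_BYTES * ROW_COUNT / PAGE_BYTES)

# Bit vectors: one bit per row for each of the three predicates
bitmap_ios = math.ceil(ROW_COUNT * 3 / 8 / PAGE_BYTES)

# The four sample rows from the slide
gender = ["M", "M", "F", "M"]
insured = ["Y", "N", "Y", "N"]
state = ["CA", "CA", "NY", "CA"]


def bitmap(column, wanted):
    """Set bit i when row i holds the wanted value (hypothetical helper)."""
    bits = 0
    for i, value in enumerate(column):
        if value == wanted:
            bits |= 1 << i
    return bits


# Intersect the three predicates with a bitwise AND, then count set bits
hits = bitmap(gender, "M") & bitmap(insured, "N") & bitmap(state, "CA")
match_count = bin(hits).count("1")
matching_rows = [i + 1 for i in range(len(gender)) if hits >> i & 1]

print(row_store_ios, bitmap_ios, match_count, matching_rows)
```

The answer comes from three small bit vectors and one AND, without reading any of the other columns in the 800-byte rows.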
18. Unlock Potential
Materialization
19. Materialization Strategies
Function of ‘projection’
• Row-stores = removes unneeded columns from result set
• Column-stores = decides when to GLUE columns back into rows
Early Materialization
• Construct rows before processing
• Decompress all compressed columns first
Late Materialization
• Wait until the end of the operation to construct rows
20. Early Materialization
SELECT custID, price
FROM Sales
WHERE (prodID = 4) AND (storeID = 1)
Stored columns:
prodID: (4,1,4)
storeID: 2, 1, 3, 1
custID: 2, 3, 3, 3
price: 7, 13, 42, 80
Materialize (prodID, storeID, custID, price):
4 2 2 7
4 1 3 13
4 3 3 42
4 1 3 80
Selection (where):
4 1 3 13
4 1 3 80
Projection (select):
3 13
3 80
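The early-materialization plan on this slide can be traced in Python as a rough sketch. It assumes the prodID column is stored as the run-length triple (4,1,4) and is decompressed first, as the previous slide describes. The function names are hypothetical.

```python
# Stored columns from the slide
prod_id_rle = [(4, 1, 4)]   # (value, start position, count)
store_id = [2, 1, 3, 1]
cust_id = [2, 3, 3, 3]
price = [7, 13, 42, 80]


def decompress(runs):
    """Expand run-length triples into a plain list of values."""
    values = []
    for value, _start, count in runs:
        values.extend([value] * count)
    return values


def early_materialization():
    # Step 1: decompress, then glue all four columns into full rows
    prod_id = decompress(prod_id_rle)
    rows = list(zip(prod_id, store_id, cust_id, price))
    # Step 2: selection (WHERE prodID = 4 AND storeID = 1) on whole rows
    selected = [row for row in rows if row[0] == 4 and row[1] == 1]
    # Step 3: projection (SELECT custID, price)
    return rows, [(row[2], row[3]) for row in selected]


rows, result = early_materialization()
print(rows)
print(result)
```

All four rows are built in full before any filtering happens, so two of them are constructed only to be discarded.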
21. Late Materialization
SELECT custID, price
FROM Sales
WHERE (prodID = 4) AND (storeID = 1)
Stored columns:
prodID: (4,1,4)
storeID: 2, 1, 3, 1
Select prodID = 4: 1, 1, 1, 1
Select storeID = 1: 0, 1, 0, 1
AND: 0, 1, 0, 1
Construct (custID, price):
3 13
3 80
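A comparable Python sketch of the late-materialization plan follows. The custID and price values are the ones shown on the Early Materialization slide, and the function name is hypothetical. Each predicate is evaluated on its own column, the resulting bit lists are ANDed, and only the surviving positions are used to fetch custID and price.

```python
# Stored columns (custID and price taken from the Early Materialization slide)
prod_id_rle = [(4, 1, 4)]   # (value, start position, count)
store_id = [2, 1, 3, 1]
cust_id = [2, 3, 3, 3]
price = [7, 13, 42, 80]


def late_materialization():
    # Evaluate prodID = 4 directly on the run-length triples: no decompression
    prod_bits = []
    for value, _start, count in prod_id_rle:
        prod_bits.extend([1 if value == 4 else 0] * count)
    # Evaluate storeID = 1 on its own column
    store_bits = [1 if value == 1 else 0 for value in store_id]
    # AND the two bit lists and keep the positions that survive
    and_bits = [p & s for p, s in zip(prod_bits, store_bits)]
    positions = [i for i, bit in enumerate(and_bits) if bit]
    # Construct rows only now, and only from the columns the query returns
    return and_bits, [(cust_id[i], price[i]) for i in positions]


and_bits, result = late_materialization()
print(and_bits)
print(result)
```

The result matches the early plan, but no full row is ever built and custID and price are read only at the two qualifying positions.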
22. Unlock Potential
Wrap-Up
23. Summary: Column Databases
Is an alternative to row storage
Is seeing more adoption – vendors/customers
Stores each column independently
Addresses idle CPUs and disk bottlenecks
Is great for compression
Is best when there is a lot of data, long rows and
when you can isolate the loads
Is great for high column selectivity queries
Takes longer to load
24. Columnar Databases: Data Does the
Twist and Analytics Shout
Presented by:
William McKnight
President
McKnight Consulting Group LLC
(214) 514-1444
wmcknight@mcknightcg.com
www.mcknightcg.com
Twitter @williammcknight