© 2009 IBM Corporation
Architectures for Massive Parallel Data Base Clusters
providing Linear Scale-Out and Fault Tolerance on
Commodity Hardware for OLTP Workloads
Lightning Talk: XLDB Workshop 2013 @CERN, 28.05.2013
Romeo Kienzler, IBM Innovation Center Zurich
Shared Disk vs. Shared Nothing
Shared Disk                         | Shared Nothing
Centralized Locking                 | Distributed Locking
Compute Node Fault Tolerance        | Partition Replication
Ad-Hoc Load Balancing               | Data Partitioning, Data Skew
Resource-Starvation on Disk System  | Linear Scale-Out for Writes
Write-Limited                       | Write-Limited for Distributed Two Phase Commit
Requires Distributed Buffering      | Effectiveness of Local Buffer Pools
Inherent Data-Shipping support      | Performance Impact on Data-Shipping
Show-Stopper for Shared-Nothing
Partition-Skew for Random Access Patterns
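One way partition skew can arise is sketched below. This is an illustration, not material from the talk: keys are statically assigned to shared-nothing nodes, and a skewed access pattern is replayed against them. The function name `partition_load`, the modulo placement, the Zipf-like weights, and all sizes are assumptions chosen only to make the effect visible.

```python
import random
from collections import Counter


def partition_load(num_nodes, num_keys, num_requests, seed=0):
    """Return per-node request counts for a skewed access pattern."""
    rng = random.Random(seed)
    keys = list(range(num_keys))
    # Zipf-like popularity (assumption): key k is requested with weight 1 / (k + 1).
    weights = [1.0 / (k + 1) for k in keys]
    requests = rng.choices(keys, weights=weights, k=num_requests)
    # Static partitioning: every key is owned by exactly one node.
    load = Counter(key % num_nodes for key in requests)
    return [load.get(node, 0) for node in range(num_nodes)]


if __name__ == "__main__":
    load = partition_load(num_nodes=8, num_keys=10_000, num_requests=100_000)
    mean = sum(load) / len(load)
    print("requests per node:", load)
    print("hottest node vs. mean: %.2fx" % (max(load) / mean))
```

The node that owns the hottest keys receives noticeably more requests than the average node, and adding nodes does not help as long as those keys stay on one partition.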
BUT
Large-Scale Shared-Disk Systems introduce Bottlenecks
IDEA
Cluster File System
GPFS Declustered RAID
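The general idea behind declustered RAID is that stripes are spread over all disks rather than confined to fixed disk groups, so the reads needed to rebuild a failed disk are shared by many survivors. The sketch below compares the two layouts. It is a simplified model under stated assumptions: the random placement is a stand-in and not the actual GPFS placement algorithm, and `rebuild_reads` and its parameters are hypothetical names.

```python
import random
from collections import Counter


def rebuild_reads(num_disks, stripe_width, num_stripes, declustered,
                  failed=0, seed=0):
    """Count the strips each surviving disk must read to rebuild `failed`."""
    rng = random.Random(seed)
    groups = num_disks // stripe_width
    reads = Counter()
    for stripe in range(num_stripes):
        if declustered:
            # Assumption: each stripe picks its own pseudo-random set of disks.
            disks = rng.sample(range(num_disks), stripe_width)
        else:
            # Conventional layout: a stripe stays inside one fixed disk group.
            group = stripe % groups
            disks = list(range(group * stripe_width,
                               (group + 1) * stripe_width))
        if failed in disks:
            # Every other strip of an affected stripe has to be read.
            for disk in disks:
                if disk != failed:
                    reads[disk] += 1
    return reads


if __name__ == "__main__":
    for declustered in (False, True):
        reads = rebuild_reads(20, 5, 4000, declustered)
        print("declustered" if declustered else "conventional",
              "disks involved:", len(reads),
              "max reads on one disk:", max(reads.values()))
```

In the conventional layout only the other members of the failed disk's group take part in the rebuild, while in the declustered layout the same work is divided among nearly all surviving disks.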
GPFS - Example
IDEA
Compute Nodes without Disks
Problem: No Data Locality
200K Disks => 60 ms
IDEA
Point-To-Point Connections
Switching Fabric
Network Bottleneck Problem Solved
IDEA
Centralized Lock Management
Centralized Locking
InfiniBand
Low Latency
Up to 60 Gbit/s
RDMA
Source: http://thetechjournal.com
Source: http://www.mellanox.co.jp
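Centralized locking means that one node holds the only lock table, so no distributed lock protocol is needed between compute nodes. The toy lock manager below sketches that idea. The class name `CentralLockManager`, its methods, and the shared/exclusive modes are illustrative assumptions; in a real cluster the requests would travel over a low-latency interconnect such as RDMA, whereas here they are plain method calls.

```python
class CentralLockManager:
    """Toy lock table held by a single central node (hypothetical API)."""

    def __init__(self):
        # page id -> (mode, set of member ids currently holding the lock)
        self._locks = {}

    def acquire(self, member, page, mode):
        """Try to grant `mode` ('S' or 'X') on `page`; return True if granted."""
        if mode not in ("S", "X"):
            raise ValueError("mode must be 'S' or 'X'")
        held = self._locks.get(page)
        if held is None:
            self._locks[page] = (mode, {member})
            return True
        held_mode, holders = held
        # Shared locks are compatible with each other.
        if mode == "S" and held_mode == "S":
            holders.add(member)
            return True
        # A sole holder may re-acquire or upgrade its own lock.
        if holders == {member}:
            new_mode = "X" if "X" in (mode, held_mode) else "S"
            self._locks[page] = (new_mode, holders)
            return True
        return False

    def release(self, member, page):
        """Drop `member`'s lock on `page`, if it holds one."""
        held = self._locks.get(page)
        if held is None:
            return
        _, holders = held
        holders.discard(member)
        if not holders:
            del self._locks[page]


if __name__ == "__main__":
    lm = CentralLockManager()
    print(lm.acquire("member1", 7, "S"))  # first reader
    print(lm.acquire("member2", 7, "S"))  # second reader, compatible
    print(lm.acquire("member3", 7, "X"))  # writer conflicts with readers
```

Because every decision is made against one table, a conflict is detected with a single round trip to the lock node, which is why the latency of the interconnect matters so much for this design.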
Centralized Buffer Pool
IDEA
Centralized Lock Management
Switching Fabric
Compute Nodes
Clients
Cluster File System
Centralized Buffer Pool
DB2 pureScale – General Concepts
Based on the DB2 for z/OS Parallel Sysplex concept¹
Shared disk concept
Multiple DB2 worker nodes
Single GPFS file system
Centralized buffer pool and lock management
¹For example, Toronto Dominion Bank (TD Bank) has had 100 percent availability of customer information for 10 consecutive years, including two DB2 for z/OS upgrades during that timeframe.
DB2 pureScale – Operation Model
InfiniBand, RDMA
InfiniBand, 10 Gbit/s Ethernet, 8 Gbit/s SAN
DB2 pureScale – Fault Tolerance
Active-active concept
Clean pages don't need to be recovered -> GPFS reliability
Dirty pages are known to the CF (cluster caching facility)
CF locks dirty pages
Recovery DB2 instance flushes dirty pages to GPFS
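The recovery steps listed above can be sketched as a small simulation. This is a toy model under assumptions, not the DB2 pureScale implementation: the class `CachingFacility`, its method names, and the dictionary standing in for GPFS are hypothetical, and the toy keeps page contents inside the CF object only to keep the example short.

```python
class CachingFacility:
    """Toy stand-in for the CF: tracks dirty pages per member."""

    def __init__(self):
        self.dirty = {}      # page id -> (owning member, latest content)
        self.locked = set()  # pages blocked while recovery is in progress

    def write(self, member, page, content):
        """Record a dirty page; refused while the page is under recovery."""
        if page in self.locked:
            raise RuntimeError(f"page {page} is locked for recovery")
        self.dirty[page] = (member, content)

    def fail_member(self, member):
        """Lock every dirty page of the failed member and return their ids."""
        pages = {p for p, (owner, _) in self.dirty.items() if owner == member}
        self.locked |= pages
        return pages

    def recover(self, pages, storage):
        """Flush the given dirty pages to shared storage, then unlock them."""
        for page in sorted(pages):
            _, content = self.dirty.pop(page)
            storage[page] = content
            self.locked.discard(page)


if __name__ == "__main__":
    storage = {1: "a0", 2: "b0", 3: "c0"}  # stand-in for pages on GPFS
    cf = CachingFacility()
    cf.write("member1", 1, "a1")
    cf.write("member2", 2, "b1")
    pages = cf.fail_member("member1")  # only page 1 becomes locked
    cf.recover(pages, storage)         # page 1 is flushed and unlocked
    print(storage)
```

Only the failed member's dirty pages are blocked during recovery. Clean pages, such as page 3 in the usage example, need no work because shared storage already holds their current content, and other members' pages stay accessible throughout.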
DB2 pureScale – Recovery Performance
DB2 pureScale - Scale-Out
[Chart: axis ticks 0 to 12 and 0 to 15; axis labels and data points not preserved]
Summary
● Linear Scale-Out
● Fault Tolerance
● Commodity Hardware
● Support for OLTP Workloads