Servers, storage, and networking have all been virtualized; the next big wave is the database. SQL databases are the one thing in the cloud that still requires single, dedicated instances. Database virtualization changes all of this, enabling full elasticity without sacrificing functionality.
2. 2
Agenda
• Big Data: A Moving Target
• Common Understanding of Virtualization
• Database Virtualization Challenge
• Alternative 1: NoSQL
• Alternative 2: Sharding
• Introducing Database Virtualization
• Narrowing the Gap Between Databases and Big Data
3. 3
Big Data: A Moving Target
• Definition: Too much data to handle in a traditional database
• Big Data tools leverage scale-out architectures, e.g. Hadoop
• Technology advances make Big Data a moving target
• Databases adopting scale-out, virtual database architectures
(Chart: Data Volume vs. Time, with a region labeled "BIG Data")
5. 5
The Dedicated Server
(Chart: Server Utilization for a single server, showing average usage of 10%, a usage spike, and headroom kept free to avoid failure)
6. 6
The Virtualized App Server
Shared among many customers
Plenty of room for usage peaks
Virtualization enables cloud providers to sell 3-4 times more servers than they actually own. This is how they make money.
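The arithmetic behind that claim can be sketched roughly as follows. The 10% average utilization comes from the dedicated-server slide; the 35% target utilization for a shared host and the count of 100 customers are assumptions chosen only to illustrate how a 3-4x figure could arise, not numbers from the talk.

```python
import math

def oversubscription_ratio(avg_util: float, target_util: float) -> float:
    """Virtual servers that can be sold per physical server owned."""
    return target_util / avg_util

def physical_servers_needed(customers: int, avg_util: float, target_util: float) -> int:
    """Hosts required when each customer's virtual server averages avg_util."""
    return math.ceil(customers * avg_util / target_util)

# Assumed inputs: 10% average use per customer, hosts run at 35% on average
# so that there is still room for usage peaks.
ratio = oversubscription_ratio(avg_util=0.10, target_util=0.35)
hosts = physical_servers_needed(customers=100, avg_util=0.10, target_util=0.35)
print(f"{ratio:.1f}x oversubscription; {hosts} hosts for 100 customers")
```

The gap between the target utilization and 100% is the shared headroom: it absorbs usage spikes as long as the customers do not all peak at once.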
7. 7
Database Virtualization Challenges
• No coordination between databases (data & locking)
(Diagram, You vs. Bank: starting from Bank Balance = $10M, three uncoordinated requests (Withdraw $10M, Wire $8M, Wire $8M) are all accepted, ending at Bank Balance = -$16M)
• Requires a distributed locking solution
• Distributed locking is fairly easy to build…
• …but building it to perform well is extremely hard
• It took Oracle RAC 10 years …70 “cloud years”
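The bank example on this slide can be reproduced with a small sketch. The class names are hypothetical, and an in-process `threading.Lock` merely stands in for a distributed lock manager; a real one has to coordinate nodes over a network, which is where the performance difficulty comes from.

```python
import threading

class UncoordinatedNode:
    """A database node that checks only its own (stale) copy of the balance."""
    def __init__(self, balance):
        self.local_balance = balance

    def debit(self, amount):
        if amount <= self.local_balance:
            self.local_balance -= amount
            return True
        return False

class SharedAccount:
    """One authoritative balance guarded by a lock (stand-in for a lock manager)."""
    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()

    def debit(self, amount):
        with self._lock:  # the check and the update happen as one atomic step
            if amount <= self.balance:
                self.balance -= amount
                return True
            return False

requests = [10, 8, 8]  # in $M: withdraw 10, wire 8, wire 8

# No coordination: each request lands on a different node with its own copy.
nodes = [UncoordinatedNode(10) for _ in requests]
approved = [amt for node, amt in zip(nodes, requests) if node.debit(amt)]
true_balance = 10 - sum(approved)

# With locking: every request is checked against the same balance.
account = SharedAccount(10)
results = [account.debit(amt) for amt in requests]

print(true_balance)              # -16
print(results, account.balance)  # [True, False, False] 0
```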
9. 9
Alternative 1: NoSQL
Moves functionality to the application tier…more work for you
Your Application
Cons:
1. Non-relational (build this into your app)
2. Reduces consistency: different users/different answers
3. Removes transactions (build this into your app)
4. Less functionality e.g. joins (build these into your app)
(Diagram: SQL vs. NoSQL stacks, each with Apps on top of the DBMS; labels: "You buy this part" and "You build & maintain this part")
Pros:
1. Scalability
2. Elastic = high utilization
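To make "build these into your app" concrete, here is a minimal sketch of a join written in application code over two key-value style collections. The data, field names, and function name are invented for illustration; they do not refer to any particular NoSQL product.

```python
from collections import defaultdict

# Hypothetical collections as an application might fetch them from a NoSQL store.
customers = {
    1: {"name": "Ada"},
    2: {"name": "Grace"},
}
orders = [
    {"order_id": 100, "customer_id": 1, "total": 25.0},
    {"order_id": 101, "customer_id": 2, "total": 40.0},
    {"order_id": 102, "customer_id": 1, "total": 15.0},
]

def total_by_customer(orders, customers):
    """App-tier equivalent of a SQL inner join plus GROUP BY customer name."""
    totals = defaultdict(float)
    for order in orders:
        customer = customers.get(order["customer_id"])
        if customer is None:
            continue  # inner-join semantics: drop orders with no matching customer
        totals[customer["name"]] += order["total"]
    return dict(totals)

result = total_by_customer(orders, customers)
print(result)  # {'Ada': 40.0, 'Grace': 40.0}
```

In a relational database this is a single query; here the application owns the join logic, its edge cases, and its performance.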
10. 10
Alternative 2: SQL Sharding
Masters
Slaves
EACH server must handle the peak for ITS data
Cons:
1. Not elastic = no bursting across servers
2. Rigid partitioning model
3. Requires slaves for fail-over (vs. high-availability)
4. You have to build & maintain routing code
Pros:
1. Relational
2. Consistent data (ACID)
3. Transactional
4. Full functionality
No elasticity means no bursting across servers, which forces low utilization.
Not highly available; relies on fail-over.
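The routing code mentioned under the cons might look roughly like this. It is a sketch assuming simple hash-based partitioning; the shard names and key format are hypothetical, and real deployments often use range- or directory-based schemes instead.

```python
import hashlib
from collections import Counter

SHARDS = ["shard-0", "shard-1", "shard-2"]  # hypothetical master servers

def shard_for(key: str, shards=SHARDS) -> str:
    """Map a key to one shard; the same key always lands on the same server."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]

# A usage spike on one customer's data: 90 of 100 requests hit the same key.
requests = ["customer:42"] * 90 + [f"customer:{i}" for i in range(10)]
load = Counter(shard_for(key) for key in requests)
hot_shard = shard_for("customer:42")

print(hot_shard, load[hot_shard], dict(load))
```

Because the partitioning is fixed, the hot shard absorbs the whole spike while the other servers sit idle, which is why each server must be sized for the peak of its own data.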
11. 11
Introducing Database Virtualization
Highly-available data tier shared across multiple database clusters
Database Tier (CPU)
Storage Tier (I/O)
Virtualizes & Shares Storage Tier across Elastic Database Clusters
Shared among many customers
Plenty of room for usage peaks
Pros:
1. Relational
2. Consistent data (ACID)
3. Transactional
4. Full functionality
5. Elastic
6. No slaves
12. 12
Introducing Database Virtualization
Processed at the storage tier; only results are sent back to the database
Database Tier (CPU)
Storage Tier (I/O)
Distributed Parallel Process Across Storage Servers
Query:
What were my sales last month?
• Distributed Parallel Processing: Similar to Map-Reduce & Oracle Exadata
• This Narrows the Gap between Databases and Big Data
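The pattern on this slide resembles a map-reduce: each storage server filters and aggregates its own rows, and only the partial results travel back to the database tier. The sketch below illustrates that idea with threads standing in for storage servers; the sample rows, dates, and function names are invented, and this is not a description of how ScaleDB or Exadata implement it.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical rows (sale date, amount) spread across three storage servers.
storage_servers = [
    [("2024-03-02", 120.0), ("2024-04-01", 999.0)],
    [("2024-03-15", 80.0)],
    [("2024-02-28", 50.0), ("2024-03-31", 200.0)],
]

def partial_sales(rows, month_prefix):
    """Runs 'at' one storage server: filter and sum locally."""
    return sum(amount for date, amount in rows if date.startswith(month_prefix))

def sales_for_month(servers, month_prefix):
    """Database tier: fan the query out, then combine the small partial results."""
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        partials = list(pool.map(lambda rows: partial_sales(rows, month_prefix), servers))
    return sum(partials)

total = sales_for_month(storage_servers, "2024-03")
print(total)  # 400.0
```

Only three numbers cross from the storage tier to the database tier here, regardless of how many rows each server scanned.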
13. 13
Database Virtualization Enables DBaaS
Processing shared across database nodes
Highly-available data tier shared across multiple database clusters
Database Tier (CPU)
Storage Tier (I/O)
Virtualizes & Shares Storage Tier across Elastic Database Clusters
16. 16
Performance: ScaleDB vs. InnoDB
Performance tests running on DL380 servers, large data set
Operations per Second (bar chart):
• MariaDB + InnoDB: 550
• ScaleDB 1-Node: 1238
• ScaleDB 2-Nodes: 1884
• ScaleDB 3-Nodes: 2236
Benchmark Details: YCSB Workload A, 1:1 Read/Write Ratio, Database Size: 200M Rows, MariaDB V5.3.5
17. 17
Performance: ScaleDB vs. InnoDB
Performance tests running on HP Cloud (Read:Write Ratio = 1:1)
Operations per Second (bar chart):
• MySQL + InnoDB: 544
• ScaleDB 1-Node: 3542
• ScaleDB 2-Nodes: 4668
Benchmark Details: YCSB Workload A, 1:1 Read/Write Ratio, Database Size: 40M Rows, MySQL V5.1.42
18. 18
Performance: ScaleDB vs. InnoDB
Performance tests running on HP Cloud (Read-Only)
Operations per Second (bar chart):
• MySQL + InnoDB: 930
• ScaleDB 1-Node: 6117
• ScaleDB 2-Nodes: 11920
Benchmark Details: YCSB Workload A, 1:0 Read/Write Ratio, Database Size: 40M Rows, MySQL V5.1.42
19. 19
Performance: ScaleDB vs. InnoDB
Sysbench benchmark running on HP Cloud (Read-Only)
Transactions per Second (bar chart):
• MySQL + InnoDB: 7
• ScaleDB 1-Node: 134
• ScaleDB 2-Nodes: 250
Benchmark Details: Sysbench, Read-Only, Database Size: 500M Rows, MySQL V5.1.42
20. 20
Performance: ScaleDB vs. InnoDB
Sysbench benchmark running on HP Cloud (10% Write)
Transactions per Second (bar chart):
• MySQL + InnoDB: 3
• ScaleDB 1-Node: 50
• ScaleDB 2-Nodes: 79
Benchmark Details: Sysbench, 10% Write, Database Size: 500M Rows, MySQL V5.1.42
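For readers who want the relative gains rather than raw throughput, the speedups over the InnoDB baseline can be computed from the chart values on the five performance slides. The pairing of each value with its bar is inferred from the order of the chart labels, so treat it as an assumption; the dictionary keys are just descriptive labels.

```python
# Chart values from the performance slides, paired with bars by label order (assumed).
benchmarks = {
    "YCSB 1:1, DL380 (ops/s)": {"InnoDB": 550, "ScaleDB 1-Node": 1238,
                                "ScaleDB 2-Nodes": 1884, "ScaleDB 3-Nodes": 2236},
    "YCSB 1:1, HP Cloud (ops/s)": {"InnoDB": 544, "ScaleDB 1-Node": 3542,
                                   "ScaleDB 2-Nodes": 4668},
    "YCSB read-only, HP Cloud (ops/s)": {"InnoDB": 930, "ScaleDB 1-Node": 6117,
                                         "ScaleDB 2-Nodes": 11920},
    "Sysbench read-only, HP Cloud (tps)": {"InnoDB": 7, "ScaleDB 1-Node": 134,
                                           "ScaleDB 2-Nodes": 250},
    "Sysbench 10% write, HP Cloud (tps)": {"InnoDB": 3, "ScaleDB 1-Node": 50,
                                           "ScaleDB 2-Nodes": 79},
}

def speedups(series, baseline="InnoDB"):
    """Throughput of each configuration relative to the baseline, rounded to 2 places."""
    base = series[baseline]
    return {name: round(value / base, 2)
            for name, value in series.items() if name != baseline}

for title, series in benchmarks.items():
    print(title, speedups(series))
```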
21. 21
Summary
• Database Scale-out & Parallelization Address Big Data
• Scaling-out SQL Database Problem: Distributed Locking
• Alternative 1: NoSQL
• Alternative 2: Sharding
• Both Shift Functionality to the Application Tier
• Introducing Database Virtualization…with Performance!
• Closing the Gap Between Databases and Big Data
Average server utilization runs at about 10%, which enables your IT department or your cloud provider to use or sell the unused capacity.
Companies no longer have to
Distributed locking is easy to build: you simply lock the other nodes while one is writing... but then your performance is terrible. How hard is it to build a distributed lock manager that performs well? It took Oracle 10 years to get it right with RAC. 10 years... that's 70 cloud years. Who has time for that?
Mitigating Factors: “It depends”
• Distribution of data/load
• Use of slaves to handle read load
ScaleDB virtualizes the database, splitting it into a database tier and a storage tier. The storage tier provides a pool of cache that is shared among the various clusters, so I/O peaks are spread across multiple nodes. The database tier, in turn, achieves very high utilization because it expands elastically to handle peaks. The only con to this architecture is that it takes the developer a long time to build… but we’ve done that!