The last few years have seen huge demand for distributed applications, which has led to a proliferation of data-driven apps in data centers. The strategy of sizing servers to handle peak workloads has left a large number of underutilized bare metal servers scattered around the data center. Today’s data centers need an effective consolidation strategy that lets multiple data applications share the same set of hardware resources.
But concerns about performance predictability still loom large. How do I address noisy neighbor situations? How do I guarantee performance SLAs for my most data-intensive applications?
3. TRADITIONAL IT- CAPACITY VS USAGE
CONFIDENTIAL – RESTRICTED DISTRIBUTION
Attributed to AWS - https://www.slideshare.net/AmazonWebServices/aws-101-cloud-computing-seminar-2012/10-On_and_Off_Fast_GrowthVariable
6. PROLIFERATION OF DATA APPLICATIONS: IMPACT ON SIZING
› Standalone systems are seldom sized correctly and do not automatically shrink or grow
› Usually “configured for high water”
› Basic requirement is to be big enough
› Efficient use of resources is a secondary consideration
› While these apps are good at meeting peak demands, overall utilization of system resources is low
7. TOP 5 DATA CENTER MANAGEMENT CHALLENGES
› Low Server Utilization
› Hardware Sprawl
› OS Sprawl
› High Energy Usage & Costs
› High Operational Overhead
8. CONSOLIDATE DISPARATE SYSTEMS
Key is to maximize productive use of resources and eliminate overheads
Unconsolidated
› OS overhead replicated for each application
› Competing workloads interfere with each other
› Net throughput is limited and gridlock is possible
Consolidated
› Single, shared set of overheads
› Effective resource management
› Greatly reduced competition between workloads
› Improved throughput and good response time
9. SHARE “PEAKS AND TROUGHS”
[Figure: two panels, Unconsolidated and Consolidated; legend: SQL, NoSQL, Hadoop]
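The idea of sharing peaks and troughs can be illustrated with a small calculation. This is only a sketch: the hourly demand figures are made up for illustration and do not come from the deck. It compares sizing each workload for its own peak with sizing one shared pool for the peak of the combined load.

```python
from typing import Dict, List

# Hypothetical hourly demand (arbitrary resource units) for three workloads.
# These numbers are illustrative assumptions, not measurements.
demand: Dict[str, List[int]] = {
    "SQL":    [80, 60, 20, 10, 30, 70],
    "NoSQL":  [10, 30, 70, 90, 40, 20],
    "Hadoop": [20, 20, 40, 30, 90, 30],
}


def unconsolidated_capacity(demand: Dict[str, List[int]]) -> int:
    """Each workload gets its own servers, sized for its own peak."""
    return sum(max(series) for series in demand.values())


def consolidated_capacity(demand: Dict[str, List[int]]) -> int:
    """All workloads share one pool, sized for the peak of the combined load."""
    combined = [sum(hour) for hour in zip(*demand.values())]
    return max(combined)


separate = unconsolidated_capacity(demand)
shared = consolidated_capacity(demand)
print(f"unconsolidated: {separate} units, consolidated: {shared} units")
```

Because the workloads in this example peak at different hours, the shared pool needs less capacity than the sum of the individual peaks. If all workloads peaked at the same hour, the two numbers would be equal.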
10. PERFORMANCE CONCERNS IN A CONSOLIDATED ENVIRONMENT…
› How do I ensure performance SLAs are met?
› Can I support ad hoc analytics without worrying about performance?
› Can I run multiple applications without worrying about noisy neighbors?
› Can I run applications with different performance priority on the same setup?
› How do I handle sudden spikes in workload?
11. … FOR INADEQUATE INFRASTRUCTURE OPTIONS
WHAT ARE THE OPTIONS?
Virtual Machines
› Easy to provision machines
› Unpredictable performance
Bare Metal Servers
› Good performance
› Poor agility, low utilization
Integrated Systems
› Very expensive (e.g. Exadata)
› Application specific
Cloud
› Easy to provision machines
› Unpredictable performance
Current infrastructure solutions are focused on IT cost management
13. WHAT IS A CONTAINER?
Containers are an operating system-level virtualization method for running multiple isolated Linux systems on a single control host.
› Not a virtual machine
› Provides a virtual environment
› Own CPU, memory, block I/O, network, etc.
14. DEPLOYMENT CHOICES
BARE METAL
[Diagram: host OS, with bins/libs and app processes for each of three apps]
• No Isolation
• No Performance overhead
• Not Portable
VIRTUAL MACHINE
[Diagram: host OS / hypervisor, with a guest OS, bins/libs, and app processes for each of three apps]
• Full Isolation
• Performance overhead
• Partially Portable
CONTAINERS
[Diagram: OS, with bins/libs and app processes for each of three apps]
• Run Time Isolation
• No Performance overhead
• Portable
15. HOW IS ROBIN DEPLOYED?
Run & Manage Applications, not Containers or Virtual Machines
Your Commodity Hardware + Robin AVP Software
[Diagram, top to bottom]
Applications: Big Data Apps, NoSQL Apps, RDBMS, Other, Custom Apps
Push-button application lifecycle management: Deploy, Scale, Failover, Snapshot, Clone, QoS
Container-based, application-aware virtualization
Application-aware scale-out block storage
Your commodity hardware: Storage Nodes, Compute Nodes, Converged Nodes
16. REACTIVE PERFORMANCE MANAGEMENT
§ How do I handle sudden spikes in workload?
§ Can I run multiple applications without worrying about noisy neighbors?
17. HOW DO I HANDLE SUDDEN SPIKES IN WORKLOAD?
› On-demand instant scale-out
› Helps to right-size your cluster with growing demand
[Diagram: application cluster of four Cassandra nodes]
18. HOW DO I HANDLE SUDDEN SPIKES IN WORKLOAD?
› On-demand instant scale-up
› No data redistribution overhead
› No need to stop the cluster
› Ideal to meet temporary or seasonal demand
› Scale-out isn’t always the solution
› Results in data redistribution, which is expensive and time consuming
› In some cases a non-reversible operation
[Diagram: application cluster of three Cassandra nodes]
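To see why scale-out implies data redistribution, consider a toy consistent-hash ring. This is a simplified sketch, not Cassandra's actual partitioner: the `Ring` class, the MD5-based `_hash` helper, the node names, and the virtual-node count are all assumptions made for illustration. Adding a fourth node changes the owner of some keys, and each such key has to be streamed to the new node. Scaling up an existing node changes no ownership at all.

```python
import bisect
import hashlib
from typing import List


def _hash(value: str) -> int:
    """Map a string to a position on the ring (illustrative hash choice)."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class Ring:
    """Toy consistent-hash ring with virtual nodes (hypothetical helper)."""

    def __init__(self, nodes: List[str], vnodes: int = 64) -> None:
        self._tokens = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._positions = [token for token, _ in self._tokens]

    def owner(self, key: str) -> str:
        # The owner is the first token clockwise from the key's position.
        idx = bisect.bisect(self._positions, _hash(key)) % len(self._positions)
        return self._tokens[idx][1]


keys = [f"row-{i}" for i in range(10_000)]
before = Ring(["c1", "c2", "c3"])
after = Ring(["c1", "c2", "c3", "c4"])  # scale-out: one node added

moved = [k for k in keys if before.owner(k) != after.owner(k)]
print(f"{len(moved)} of {len(keys)} keys change owner after scale-out")
```

Every key that changes owner lands on the new node `c4`, so the cost of scale-out grows with the amount of data the new node takes over.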
19. SOLVING NOISY-NEIGHBOR PROBLEM
[Diagram: Postgres and Hadoop on the ROBIN app-aware compute layer, with data volumes v1 to v4 on the ROBIN container-aware data layer]
Apps using multiple data volumes: very common for most data apps (Hadoop, Cassandra, Oracle, …)
Non-Robin Solution: throttle max IOPs for each volume separately (FAIL)
1. The app could still generate IOPs at a rate equal to the sum of the max IOPs of each volume it uses
2. Arbitrarily capping each volume prevents active volumes from utilizing available capacity when other volumes are quiet
Robin Solution: throttle max IOPs per App
1. Configure max IOPs SLA per App (not individual data volumes)
2. Tag IOs across each volume with the App ID
3. Enforce max IOPs SLA per App ID
Because Robin controls all IOs originating on the compute host, it can do IO tagging and app-aware IO scheduling
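The per-app throttling steps can be sketched with a token bucket keyed by app ID. This is an illustrative sketch only and not Robin's implementation: the `TokenBucket` and `AppThrottle` classes, the method names, and the 100 IOPs limit are hypothetical. The point is that the limit follows the app tag on each IO, so the volume an IO targets does not matter.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class TokenBucket:
    """Admits up to `rate` IOs per second, with bursts of at most `rate` IOs."""
    rate: float          # max IOPs for the app
    tokens: float = 0.0  # IOs that may be admitted right now
    last: float = 0.0    # time of the last refill, in seconds

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, never above one second's worth.
        self.tokens = min(self.rate, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class AppThrottle:
    """Enforces a max-IOPs limit per app ID (hypothetical helper)."""

    def __init__(self, max_iops_per_app: Dict[str, float]) -> None:
        self.buckets = {
            app: TokenBucket(rate=rate, tokens=rate)
            for app, rate in max_iops_per_app.items()
        }

    def submit(self, app_id: str, volume: str, now: float) -> bool:
        # `volume` is deliberately ignored: the IO is charged to its app tag.
        return self.buckets[app_id].allow(now)


throttle = AppThrottle({"hadoop": 100.0})
volumes = ["v1", "v2", "v3", "v4"]
# 150 IOs arrive at t=0, spread across four volumes of the same app.
admitted = sum(throttle.submit("hadoop", volumes[i % 4], now=0.0) for i in range(150))
print(f"admitted {admitted} of 150 IOs")
```

With one bucket per volume instead, the same app could reach four times the intended rate, and a single busy volume could not borrow the budget of idle ones.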
20. PROACTIVE PERFORMANCE MANAGEMENT
§ How do I ensure performance SLAs are met?
§ Can I run applications with different performance priority on the same setup?
§ Can I support ad hoc analytics without worrying about performance?
21. APP-TO-DISK PERFORMANCE SLA
Application-centric QoS
› Max IOPs to throttle usage
› Min IOPs to guarantee performance
› Relative weights to prioritize apps according to business needs
[Diagram: node.js, Postgres, MongoDB, and Cassandra on the ROBIN app-aware compute layer, sending IO to the ROBIN container-aware data layer; callouts mark where MAX is enforced, where MIN is guaranteed, and where priorities are enforced]
Because Robin controls the entire IO pipeline (App-to-Disk), it can truly enforce QoS
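One way the three QoS controls could interact is sketched here. This is an assumption-laden illustration, not Robin's scheduler: the `allocate_iops` function, the app limits, and the 10,000 IOPs capacity are hypothetical. Each app first receives its minimum, then the remaining capacity is split by weight, and any share an app cannot use because of its maximum is redistributed to the others. Weights are assumed to be positive.

```python
from typing import Dict, Tuple

# app name -> (min IOPs, max IOPs, relative weight); values are hypothetical.
AppLimits = Dict[str, Tuple[float, float, float]]


def allocate_iops(capacity: float, apps: AppLimits) -> Dict[str, float]:
    """Grant minimums first, then split the rest by weight up to each maximum."""
    alloc = {name: float(lo) for name, (lo, _hi, _w) in apps.items()}
    remaining = capacity - sum(alloc.values())
    if remaining < 0:
        raise ValueError("capacity cannot cover the minimum guarantees")

    active = {name for name, (lo, hi, _w) in apps.items() if hi > lo}
    while remaining > 1e-9 and active:
        total_weight = sum(apps[name][2] for name in active)
        granted = 0.0
        for name in list(active):
            _lo, hi, weight = apps[name]
            share = remaining * weight / total_weight
            give = min(share, hi - alloc[name])
            alloc[name] += give
            granted += give
            if hi - alloc[name] <= 1e-9:
                active.discard(name)  # app reached its max; stop giving it more
        remaining -= granted
    return alloc


apps: AppLimits = {
    "postgres":  (2000, 5000, 3),
    "mongodb":   (1000, 3000, 1),
    "cassandra": (1000, 10000, 1),
}
alloc = allocate_iops(10_000, apps)
for name, iops in alloc.items():
    print(f"{name}: {iops:.0f} IOPs")
```

In this example Postgres is capped at its maximum, and the capacity it cannot use is divided between the other two apps according to their weights.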
22. QUEUES FOR MINIMUM I/O GUARANTEE
› For a given arrival rate, the number of requests in the system is proportional to the average service time
› Little’s law:
N = λT
where
N: average number of requests in the system
λ: arrival rate (requests per unit time)
T: average service time (time in the system)
› Example:
› TSA on your boarding pass
› HOV lanes on the highway
[Diagram: servers fed by a dedicated production queue and a combined clone queue]
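Little's law is easy to check with numbers. The following sketch uses hypothetical figures (10,000 requests per second and 2 ms in the system) and hypothetical function names. It computes N from λ and T, and rearranges the law to get T from N and λ.

```python
def requests_in_system(arrival_rate_per_s: float, avg_time_in_system_s: float) -> float:
    """Little's law: N = lambda * T."""
    return arrival_rate_per_s * avg_time_in_system_s


def time_in_system(avg_requests_in_system: float, arrival_rate_per_s: float) -> float:
    """Little's law rearranged: T = N / lambda."""
    return avg_requests_in_system / arrival_rate_per_s


# Hypothetical workload: 10,000 requests/s, each spending 2 ms in the system.
n = requests_in_system(10_000, 0.002)
t = time_in_system(n, 10_000)
print(f"average requests in the system: {n:.1f}")
print(f"average time in the system: {t * 1000:.1f} ms")
```

Read the other way, at a fixed arrival rate a longer queue means each request spends more time in the system, which is the motivation for giving production IO its own queue.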
23. TRADITIONAL STORAGE: THIN CLONE PRODUCTION DBS
[Diagram: the disk IOPS capacity of one SSD drive shared by PROD and its thin clones]
PROD: read & write
CLONE1: read for original data; read & write for delta changes
CLONE2: read for original data; read & write for delta changes
. . .
Bad design: production DB performance is sacrificed to serve clone IOs
24. TRADITIONAL STORAGE: WORKAROUND FOR THIN CLONING
[Diagram: PROD is fully copied to a Test Master, from which Dev & Test DBs are thin cloned]
Challenges with using Test Master:
• 2x storage
• Long data copy
• Manual configuration
• Manual data refresh
25. ROBIN IO CAGING: ZERO-IMPACT, PRODUCTION DB THIN CLONES
[Diagram: the disk IOPS capacity of one SSD drive shared by PROD and its thin clones, with an IO caging limit applied to all clones]
PROD: read & write
CLONE1: read for original data; read & write for delta changes
CLONE2: read for original data; read & write for delta changes
. . .
IO caging limit for ALL clones
Guaranteed IOPs for Prod DB
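The effect of a cage can be shown with simple arithmetic. This is a sketch under stated assumptions: the `cage_clones` function, the proportional sharing inside the cage, and all IOPs figures are hypothetical and are not taken from Robin's product. All clones together are held to one limit, so production is always left with at least the disk capacity minus the cage.

```python
from typing import Dict, Tuple


def cage_clones(
    disk_iops: float,
    cage_limit: float,
    prod_demand: float,
    clone_demands: Dict[str, float],
) -> Tuple[float, Dict[str, float]]:
    """Cap the combined clone IOPs at `cage_limit`; production uses what is left."""
    clone_total = sum(clone_demands.values())
    # Assumption: clones that exceed the cage are scaled back proportionally.
    scale = min(1.0, cage_limit / clone_total) if clone_total else 0.0
    clones = {name: demand * scale for name, demand in clone_demands.items()}
    prod = min(prod_demand, disk_iops - sum(clones.values()))
    return prod, clones


# Hypothetical SSD with 50,000 IOPs; all clones together are caged at 10,000.
prod, clones = cage_clones(
    disk_iops=50_000,
    cage_limit=10_000,
    prod_demand=40_000,
    clone_demands={"clone1": 15_000, "clone2": 5_000},
)
print(f"prod: {prod:.0f} IOPs, clones: {clones}")
```

However many clones are created, their combined IO stays inside the cage, so the production floor in this example remains 40,000 IOPs.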
26. ANALYTICS SANDBOX PROVISIONING – FOR RDBMS
Thinly-Provisioned Sandbox
• Full Dataset available for analysis
[Diagram: a star schema (one FACT table surrounded by DIM tables) is thin cloned into a sandbox copy, which can later be destroyed]