3. Cloud compute services offer overwhelming choices
EC2 instance costs range from $3.4 to $19,482 per month (on demand)
https://www.slideshare.net/AmazonWebServices/deep-dive-on-amazon-ec2-instances-performance-optimization-best-practices-cmp307r1-aws-reinvent-2018
4. Cloud storage services provide various price and performance points
https://www.slideshare.net/AmazonWebServices/deep-dive-on-amazon-elastic-block-storage-amazon-ebs-stg310r1-aws-reinvent-2018
EBS cost ranges from $0.025 to $0.125 per GB-month + provisioned IOPS
5. Cloud compute instances and storage types are interdependent
https://www.slideshare.net/AmazonWebServices/deep-dive-on-amazon-elastic-block-storage-amazon-ebs-stg310r1-aws-reinvent-2018
The EC2-to-EBS network can become a bottleneck that limits actual volume performance (e.g. IOPS)
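A minimal sketch of why the interdependency matters, assuming the throughput an application sees is bounded by the slower of the two components. The function name and all numbers are hypothetical and are not taken from AWS documentation.

```python
def effective_iops(volume_iops: int, instance_ebs_iops_limit: int) -> int:
    """Return the IOPS the application can actually reach.

    Assumption: the slower of the volume and the instance's EBS link
    is the bound, so the result is simply the minimum of the two.
    """
    return min(volume_iops, instance_ebs_iops_limit)


# Hypothetical numbers, for illustration only.
provisioned = 20_000    # IOPS provisioned (and paid for) on the EBS volume
instance_limit = 8_000  # maximum EBS IOPS the chosen instance can drive

usable = effective_iops(provisioned, instance_limit)
unreachable = provisioned - usable
print(f"usable IOPS: {usable}, provisioned but unreachable: {unreachable}")
```

Under this assumption, provisioning more volume IOPS than the instance can drive adds cost without adding performance, which is why instance and volume have to be sized together.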
7. The model-based approach, aka cloud rightsizing recommendations
https://cloud.google.com/compute/docs/instances/apply-sizing-recommendations-for-instances
8. The experimental approach, aka load test your app
“There is no substitute for measuring the performance of your entire application, because application performance can be impacted by the underlying infrastructure or by software and architectural limitations.
We recommend application-level testing, including the use of application profiling and load testing tools and services”
https://aws.amazon.com/ec2/instance-types/
9. A bigger problem: same specs, different performance across different cloud providers
“CockroachDB 2.1 achieves 40% more throughput (tpmC) on TPC-C when tested on AWS using c5d.4xlarge than on GCP via n1-standard-16.
We were shocked that AWS offered such superior performance”
Cockroach Labs
https://www.cockroachlabs.com/blog/2018_cloud_report/
10. Why can’t current approaches assure optimal application performance and low costs?
● May not consider end to end application performance
● May not capture hidden bottlenecks
● May not capture unique application / workload behaviour
● May not factor in cloud-specific platforms and implementations (e.g. hypervisors, CPU architectures)
● Can’t scale to the sheer complexity of cloud options
15. The use case
Goal
Minimize price/performance of a MongoDB database hosted on AWS
Performance is the throughput of the database (queries/sec); price is the monthly AWS price for the provisioned resources
Scenario
Akamas drives automated optimization, including application load tests
A workflow provisions AWS EC2 and EBS resources as suggested by the AI engine
Optimization scope
AWS EC2 instances and EBS storage volumes powering MongoDB
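The experiment loop described in this use case can be sketched as follows. This is not Akamas's algorithm: plain random sampling stands in for the AI engine, `run_load_test` is a placeholder for provisioning plus a real load test, and every name, price and throughput figure is invented for illustration.

```python
import random
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Config:
    instance_type: str
    volume_type: str
    volume_gb: int


# All figures are invented; they are not AWS prices or measurements.
INSTANCE_PRICE = {"small": 40.0, "medium": 60.0, "large": 120.0}
VOLUME_PRICE_PER_GB = {"hdd": 0.05, "ssd": 0.10}
BASE_QPS = {"small": 1000.0, "medium": 2500.0, "large": 4000.0}


def monthly_price(cfg: Config) -> float:
    """Monthly price of instance plus volume (hypothetical price list)."""
    return INSTANCE_PRICE[cfg.instance_type] + VOLUME_PRICE_PER_GB[cfg.volume_type] * cfg.volume_gb


def run_load_test(cfg: Config) -> float:
    """Placeholder for 'provision, then load test'; returns queries/sec."""
    boost = 1.5 if cfg.volume_type == "ssd" else 1.0
    return BASE_QPS[cfg.instance_type] * boost


def price_performance(cfg: Config) -> float:
    """The goal to minimize: monthly price per unit of throughput."""
    return monthly_price(cfg) / run_load_test(cfg)


def optimize(candidates: list[Config], n_experiments: int, seed: int = 0) -> Config:
    """Try n_experiments randomly chosen configurations and keep the best one."""
    rng = random.Random(seed)
    tried = rng.sample(candidates, k=min(n_experiments, len(candidates)))
    return min(tried, key=price_performance)


candidates = [Config(i, v, 70) for i, v in product(INSTANCE_PRICE, VOLUME_PRICE_PER_GB)]
best = optimize(candidates, n_experiments=len(candidates))
print(best, round(price_performance(best), 4))
```

With a real search space, each experiment costs a full load test, so the point of an AI engine is to reach a good configuration in far fewer experiments than exhaustive or random sampling would need.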
16. Modeling the cloud cost-optimization problem
EC2 instance type, e.g. c5d.2xlarge
Instance family: c
Instance generation: 5
Additional capabilities: d
Instance size: 2xlarge
EBS volume, e.g. io1, 70 GB, 1000 IOPS
Volume type: io1
Volume size: 70 GB
Volume IOPS: 1000
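The parameters above can be treated as dimensions of a search space. The sketch below splits an instance type name into its parts and enumerates a small space. The regular expression is a simplification that covers names such as c5d.2xlarge but not every AWS instance name, and the candidate lists are an illustrative subset built from values that appear in this deck.

```python
import re
from itertools import product
from typing import NamedTuple


class InstanceType(NamedTuple):
    family: str
    generation: int
    capabilities: str
    size: str


# Simplified pattern: letters, digits, optional letters, a dot, then the size.
_PATTERN = re.compile(r"([a-z]+)(\d+)([a-z]*)\.([a-z0-9]+)")


def parse_instance_type(name: str) -> InstanceType:
    """Split a name such as 'c5d.2xlarge' into its optimization parameters."""
    match = _PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognized instance type name: {name!r}")
    family, generation, capabilities, size = match.groups()
    return InstanceType(family, int(generation), capabilities, size)


# Illustrative subset of the search space (not the full set used in the study).
instance_types = ["r4.large", "m5d.large", "m5a.large", "c5d.2xlarge"]
volume_types = ["gp2", "io1"]
volume_sizes_gb = [70, 114]

search_space = list(product(instance_types, volume_types, volume_sizes_gb))

print(parse_instance_type("c5d.2xlarge"))
print(f"{len(search_space)} candidate configurations")
```

Even this toy space multiplies quickly as dimensions are added, which is the scaling problem the earlier slides point to.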
18. AI-driven price-performance optimization results
Baseline configuration: price/performance of r4.large, gp2 70GB
Best configuration: -68% price/performance after 18 experiments, or approx. 22 hours
19. Best configuration: for the same price, 3x throughput and -90% latency
Price: -2.9%, 65.52 (best) vs 67.48 (baseline) €/month
Throughput: +205%, 7605 (best) vs 2493 (baseline) queries/sec
Latency (avg): -90%, 1330 (best) vs 14575 (baseline) milliseconds
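The percentages on this slide, and the -68% price/performance figure from the previous one, can be reproduced from the raw numbers. A small sketch, assuming price/performance is computed as monthly price divided by throughput; the helper names are illustrative.

```python
def pct_change(new: float, old: float) -> float:
    """Relative change of new versus old, in percent."""
    return (new - old) / old * 100.0


def price_performance(cfg: dict) -> float:
    """Monthly price per query/sec (assumed definition of price/performance)."""
    return cfg["price"] / cfg["throughput"]


# Figures as reported on the slide.
baseline = {"price": 67.48, "throughput": 2493.0, "latency_ms": 14575.0}
best = {"price": 65.52, "throughput": 7605.0, "latency_ms": 1330.0}

price_delta = pct_change(best["price"], baseline["price"])
throughput_delta = pct_change(best["throughput"], baseline["throughput"])
latency_delta = pct_change(best["latency_ms"], baseline["latency_ms"])
pp_delta = pct_change(price_performance(best), price_performance(baseline))

print(f"price: {price_delta:.1f}%")
print(f"throughput: {throughput_delta:.0f}%")
print(f"latency: {latency_delta:.1f}%")
print(f"price/performance: {pp_delta:.0f}%")
```

The price, throughput and price/performance changes match the slides; the latency change computes to roughly -91%, which the slide reports as -90%.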
20. How did AI achieve that? A look at the best configuration
The best configuration for this workload is m5d.large
HW specs comparison
Instance Name | Use cases | vCPUs | Memory (GiB) | Instance Storage | Block Storage (EBS)
r4.large (baseline) | Memory optimized | 2 x Intel Xeon E5-2686 | 15.25 | - | gp2 70GB
m5d.large (best) | General purpose | 2 x Custom Intel Xeon Platinum 8175M | 8 | 1 x 150 GB NVMe SSD | n/a
21. AI can find unusual configurations: AMD CPUs with half the memory can cut costs and still improve throughput
The cheapest configuration for this workload is m5a.large: -24% cost with +12% throughput
HW specs comparison
Instance Name | Use cases | vCPUs | Memory (GiB) | Instance Storage | Block Storage (EBS)
r4.large (baseline) | Memory optimized | 2 x Intel Xeon E5-2686 | 15.25 | - | gp2 70 GB
m5a.large (cheapest) | General purpose | 2 x AMD EPYC | 8 | - | gp2 114 GB
Searching instances with EBS storage
Top 5 best configurations
24. Takeaways
● Technology landscape is becoming more and more complex
● Traditional approaches are not effective and can’t scale: significant optimization opportunities are left on the table
● AI-driven IT optimization is required and can deliver previously unthinkable benefits, beyond what human experts can achieve
● In the cloud, 70% price/performance improvements are possible by properly exploiting the choices we have
● Cloud rightsizing recommendations may suggest higher price options