Getting to Profitability discusses how companies can optimize costs on AWS to achieve profitability. It recommends focusing on three key things: increasing revenue, decreasing unit costs, and increasing margins. It describes how AWS pricing models like reserved instances and spot instances can help reduce costs. It also emphasizes the importance of cost-aware architecting techniques like offloading static content to S3 and CloudFront for caching to optimize compute costs. With the right optimizations, the document shows how one company achieved a 54% reduction in unit costs through various AWS services and pricing strategies.
10. An example
Enterprise software provider in APAC
Focused on SaaS for storage, security, collaboration, etc.
Backed by leading VCs in the region
Strong growth – winning customers globally
Focused on profitability & reducing unit costs
Worked closely with the AWS team to optimize its architecture
11. “Based on a True Story”
Margin, Growth
54% reduction in unit costs
• -20%: price drop in S3
• -10%: RI purchase
• -22%: migration from Cassandra to Dynamo
• -18%: price drop in S3 of 25%
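The four step reductions on this slide sum to 70%, not 54%. The headline figure does match them if each step is applied to the unit cost remaining after the previous step. A minimal sketch of that arithmetic, assuming the steps compound multiplicatively (the slide does not say so explicitly):

```python
# Step reductions read off the slide, as fractions of the then-current unit cost.
# Assumption: each step applies to what is left after the previous step.
step_reductions = [0.20, 0.10, 0.22, 0.18]


def compound_reduction(steps):
    """Return the overall fractional reduction after applying each step in turn."""
    remaining = 1.0
    for reduction in steps:
        remaining *= 1.0 - reduction
    return 1.0 - remaining


total = compound_reduction(step_reductions)
print(f"Overall reduction in unit costs: {total:.1%}")
```

Under this assumption the result rounds to the 54% shown on the slide.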
14. Cost Optimization using different purchase models
• Free Tier: Get started on AWS with free usage & no commitment. For POCs and getting started.
• On-Demand: Pay for compute capacity by the hour with no long-term commitments. For spiky workloads, or to define needs.
• Reserved: Make a low, one-time payment and receive a significant discount on the hourly charge. For committed utilization.
• Spot: Bid for unused capacity, charged at a Spot Price which fluctuates based on supply and demand. For time-insensitive or transient workloads.
16. Reserved Instance Pricing
Make a low, one-time payment and receive a
significant discount on the hourly charge
For committed utilization
3 Versions
• Light Utilization RI
• Medium Utilization RI
• Heavy Utilization RI
2 Terms
• 1-year
• 3-year
17. Reserved Instance Pricing
Utilization    RI option                Savings over On-Demand
<10%           On-Demand                n/a
10% - 40%      Light Utilization RI     Up to 56%
40% - 75%      Medium Utilization RI    Up to 66%
>75%           Heavy Utilization RI     Up to 71%
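The utilization bands on this slide can be read as a simple lookup. The sketch below is illustrative only: the function name is hypothetical, utilization is expressed as a fraction of hours in use, and the handling of the exact boundary values (10%, 40%, 75%) is an assumption, since the slide does not say which band they belong to.

```python
def recommend_purchase_option(utilization):
    """Map expected utilization (0.0 to 1.0) to the option listed on the slide."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0.0 and 1.0")
    if utilization < 0.10:
        return "On-Demand"
    if utilization < 0.40:
        return "Light Utilization RI"
    if utilization <= 0.75:
        return "Medium Utilization RI"
    return "Heavy Utilization RI"


for share in (0.05, 0.25, 0.60, 0.90):
    print(f"{share:.0%} utilization: {recommend_purchase_option(share)}")
```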
19. • Most traffic happens in the afternoons and evenings, so they reduce the number of instances at night by 40%.
• At peak traffic, $52 an hour is spent on EC2; at night, during off-peak, the spend is as little as $15 an hour. Saving per hour = 71%
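The 71% figure follows directly from the two hourly spends quoted on the slide. A small sketch of the calculation (the function name is illustrative):

```python
peak_spend_per_hour = 52.0      # USD, from the slide
off_peak_spend_per_hour = 15.0  # USD, from the slide


def hourly_saving(peak, off_peak):
    """Fraction of the peak hourly spend that is saved during off-peak hours."""
    return (peak - off_peak) / peak


saving = hourly_saving(peak_spend_per_hour, off_peak_spend_per_hour)
print(f"Off-peak saving per hour: {saving:.0%}")
```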
20. Save more money by using Spot Instances
Spot market for underutilized capacity
Requested Bid Price and
Pay as you go
Spot Price < On-Demand Price
Up to 85% savings over On Demand pricing
21. Use Cases for Spot Pricing
Use Case: Types of Applications
• Batch Processing: Generic background processing (scale out computing)
• Hadoop: Hadoop/MapReduce processing type jobs (e.g. Search, Big Data, etc.)
• Scientific Computing: Scientific trials/simulations/analysis in chemistry, physics, and biology
• Video and Image Processing/Rendering: Transform videos into specific formats
• Testing: Provide testing of software, web sites, etc.
• Web/Data Crawling: Analyzing data and processing it
• Financial: Hedge fund analytics, energy trading, etc.
• HPC: Utilize HPC servers to do embarrassingly parallel jobs
• Cheap Compute: Backend servers for Facebook games
23. Optimizing Video Transcoding Workloads for a FREEMIUM model
Free Offering
• Optimize for reducing cost
• Acceptable delay limits
Implementation
– Leverage spot pricing
– Maximum bid price < On-Demand rate
– Use On-Demand instances, if delay
Get strongly reduced price for your workload
Premium Offering
• Optimized for faster response
• No delays
Implementation
– Invest in Reserved Instances
– Use On-Demand for elasticity
Get instant capacity for higher price
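The free/premium split on this slide amounts to a small decision rule per transcoding job. The sketch below is one possible reading, not the presenter's implementation: the `TranscodeJob` type, the `choose_capacity` function, and the returned labels are all hypothetical, and no AWS API is called.

```python
from dataclasses import dataclass


@dataclass
class TranscodeJob:
    tier: str              # "free" or "premium"
    waited_minutes: float  # how long the job has been queued


def choose_capacity(job, spot_price, on_demand_price, max_bid, max_delay_minutes):
    """Pick a capacity type for one job, following the free/premium split."""
    if max_bid >= on_demand_price:
        raise ValueError("max_bid should stay below the On-Demand rate")
    if job.tier == "premium":
        # Premium: no delays, so use capacity that is available immediately
        # (Reserved Instances for the baseline, On-Demand for elasticity).
        return "reserved-or-on-demand"
    if spot_price <= max_bid:
        # Free tier: Spot is currently within the bid, so take the cheap capacity.
        return "spot"
    if job.waited_minutes >= max_delay_minutes:
        # Free tier, but the acceptable delay has been used up.
        return "on-demand"
    return "wait"


job = TranscodeJob(tier="free", waited_minutes=3)
print(choose_capacity(job, spot_price=0.04, on_demand_price=0.12,
                      max_bid=0.08, max_delay_minutes=30))
```

Keeping the maximum bid below the On-Demand rate means a free-tier job never pays more on Spot than it would by falling back to On-Demand.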
25. “Give me 4 fault tolerant algorithms and I can pick
the best one almost with my eyes closed.
If you then ask me which one is best for the
business, in terms of dollar costs, I would be
clueless...”
Werner Vogels, CTO, Amazon
26. Cost optimization through ‘Cost Aware Architecting’
Reduce cost of… by leveraging:
• Compute
  1. S3 & CloudFront for Caching & Offloading
  2. Auto-Scaling done Right
• Storage
  3. Storing derivative objects in S3 ‘Reduced Redundancy’
• Database
  4. Read Replicas and/or ElastiCache
• Test & Dev
  5. Rapid proto-typing & Lean Dev/Test
27. Cost Aware Architecting to Reduce costs of EC2
1. S3 & CloudFront for Caching & Offloading
• Reduce your compute demand and costs
• Improve end-user experience
• Increase reliability and durability
33. Cost Aware Architecting to Reduce costs of EC2
2. Auto-Scaling done Right with Real Time reaction response
• Elastic Load Balancing and (event-driven) Auto Scaling
• Notification of pending news flash (with audible alarm)
• On-demand ramp up of capacity (6 mins.)
• Subscriber alert push delivered
• Mass response traffic handled (followed by ramp down)
35. Cost Aware Architecting to Reduce costs of EC2
2. Auto-Scaling done Right with Real Time reaction response
Buuuk
Straits Times
40. Cost Aware Architecting to Reduce costs of S3
3. Storing derivative objects in S3 ‘Reduced Redundancy’
• Original vs. derived assets: 33% savings
• Single reference and consistency
• Control, accurate logs and tracking
Reduced Redundancy Storage (‘RRS’)
41. Cost Aware Architecting to Reduce costs of DB
4. Read Replicas and/or ElastiCache (‘Database Smarts’)
• Scale out and share work
• Optimal performance, minimize load
• Enhance reliability, ensure data safety
• Cost reduction
45. Cost Aware Architecting to Reduce costs of Test/Dev
5. Rapid proto-typing & Lean Dev/Test
• Inexpensive idea validation
• Seamless switch over and versioning
• Rapid dev / test agility
48. Traditional HW / Hosting
WASTE
On and Off
Fast Growth
Variable peaks
Predictable peaks
CUSTOMER DISSATISFACTION
49. AWS = Elastic Capacity
On and Off
Fast Growth
Variable peaks
Predictable peaks
50. When calculating TCO…
#1 Start by understanding your use cases & usage patterns
#2 Apples to Apples – Take all the fixed costs into consideration
53. When calculating TCO…
#1 Start by understanding your use cases & usage patterns
#2 Apples to Apples – Take all the fixed costs into consideration
#3 Leverage ‘Cost Aware Architecting’ to reduce resources
54. Traditional Hosting vs AWS
[Bar chart. Vertical axis: # of (virtual) servers, 0 to 60. Bars: Hosting; Hosting; Offload to S3; Caching with CF; AutoScaling; Etc.]
55. When calculating TCO…
#1 Start by understanding your use cases & usage patterns
#2 Apples to Apples – Take all the fixed costs into consideration
#3 Leverage ‘Cost Aware Architecting’ to reduce resources
#4 Include pricing models (RI, Spot) and economies of scale
56. “Based on a True Story”
Margin, Growth
54% reduction in unit costs
• -20%: price drop in S3
• -10%: RI purchase
• -22%: migration from Cassandra to Dynamo
• -18%: price drop in S3 of 25%
57. When calculating TCO…
#1 Start by understanding your use cases & usage patterns
#2 Apples to Apples – Take all the fixed costs into consideration
#3 Leverage ‘Cost Aware Architecting’ to reduce resources
#4 Include pricing models (RI, Spot) and economies of scale
#5 Take a look at what’s included: Intangible Cost Savings !
58. Did you know?
Free Usage Tier
• New customers: Amazon EC2, Amazon RDS, Amazon ELB, Amazon S3, Amazon EBS
• For all customers: Amazon SQS/SNS, Amazon DynamoDB, Amazon SES, Amazon SWF, and more…
Free Services (no charge for)
• AWS Elastic Beanstalk, AWS CloudFormation, AWS IAM, Auto Scaling, Consolidated Billing
Data Transfer
• Inbound data transfer
• Data transfer between instances within an Availability Zone
62. So what does this mean in terms of costs?
Standard Architecture (per month)
• Medium EC2 instances: 4, $485
• AWS Data Transfer Out, 1 TB: $194
• TOTAL: $679
Optimized Architecture (per month)
• Medium EC2 instances: 1, $121
• CloudFront Data Transfer Out, 1 TB: $168
• CloudFront Requests: $1.89
• TOTAL: $291
57% lower cost – 6 x faster
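The monthly figures on this slide can be checked by summing the line items for each architecture. In the sketch below, the pairing of amounts with line items is a reconstruction from the slide layout; the two totals it produces agree with the $679 and $291 shown.

```python
# Monthly line items in USD, as read from the slide.
standard = {
    "4 medium EC2 instances": 485.00,
    "AWS data transfer out, 1 TB": 194.00,
}
optimized = {
    "1 medium EC2 instance": 121.00,
    "CloudFront data transfer out, 1 TB": 168.00,
    "CloudFront requests": 1.89,
}

standard_total = sum(standard.values())
optimized_total = sum(optimized.values())
saving = 1 - optimized_total / standard_total

print(f"Standard:  ${standard_total:,.2f} per month")
print(f"Optimized: ${optimized_total:,.2f} per month")
print(f"Saving:    {saving:.0%}")
```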
TIME TO MARKET
• Need to launch the business quickly
• Long development cycles and high costs
• Inability to experiment and test the hypotheses that underpin the business
SCALABILITY
• Unpredictable demand
• Need to deal with spiky traffic or sudden increase in users
• Need to scale out to cover new markets / regions
COST & REVENUE
• No CAPEX budget
• Inability to forecast demand & commit to long-term contracts
• Need to run a lean business & focus on generating revenue
Let us see how AWS helps you scale your web application to support tens of millions of users. Start small and grow big: build an architecture that scales at each progressive stage.
We have a variety of purchase options that allow you to match your workload to the right model, and we’re happy to help you optimize your bill by working with you to choose the right mix of several of these.
• One of the fastest growing sites in history. Cites AWS for making it possible to handle 18 million visitors in March, a 50% increase from the previous month, with very little IT infrastructure.
• 12 employees as of last December. Using the cloud, a site can grow dramatically while maintaining a very small team. Looks like 31 employees as of now.
Many companies still keep their static content, e.g. pictures, on their own servers. This means that every user request costs computational effort, because the pictures are served from the server.
A better way is to offload all your static content to S3 and have it delivered from our CDN, called CloudFront. This reduces the number of calls to your server, so you can run fewer servers and lower your costs.
CloudFront is fully automated: you don’t have to write detailed configs, and your users simply get the content from the nearest of our 43 global CloudFront POPs. Lower latency means an improved end-user experience.
Durability goes up with S3: 11 9’s. Consistency improves too, as all your data is in one location rather than spread across all your servers (DRY).
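On the application side, offloading usually comes down to generating different URLs for static and dynamic content. The sketch below illustrates the idea only: the CloudFront domain, the application domain, the extension list, and the `asset_url` helper are all hypothetical placeholders, and uploading the files to S3 is not shown.

```python
from urllib.parse import urljoin

# Hypothetical values: replace with your own distribution and site domains.
CDN_BASE = "https://d111111abcdef8.cloudfront.net/"
APP_BASE = "https://www.example.com/"
STATIC_EXTENSIONS = (".png", ".jpg", ".gif", ".css", ".js")


def asset_url(path):
    """Return a CDN URL for static assets and an app-server URL for everything else."""
    is_static = path.lower().endswith(STATIC_EXTENSIONS)
    base = CDN_BASE if is_static else APP_BASE
    return urljoin(base, path.lstrip("/"))


print(asset_url("/img/logo.png"))  # served from the CDN
print(asset_url("/api/offers"))    # still served by the application servers
```

Requests for the first kind of URL never reach the application servers, which is where the reduction in compute demand comes from.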
Perx = mobile loyalty program. iPhone app for loyalty in restaurants, bars, etc. Location-based: it tells you where to get a deal as you walk around. Logos for all the restaurants are static content. When they moved to S3 + CloudFront, the user experience improved and users loved it. It is also easier to manage, as they only have to manage changes, new logos, etc. ONCE.
Then we started offering CloudFront for dynamic content. For Perx, that works for all the different offers that restaurants put out on a weekly basis: “What is the best deal today at Subway or Starbucks?” These can now be cached at the edge as well. Offload dynamic calls from your server, thereby again lowering the load on your servers, and your costs!
Buuuk is the mobile app partner of Singapore Press Holdings (SPH), the publisher of The Straits Times and much more. They distribute their mobile app to millions and millions of users. They have Breaking News Alerts, which obviously drive an immediate surge in users and hence in required capacity.
They react to this in real time. When SPH tells the system that there is a Breaking News announcement, the Buuuk system automatically issues a command to increase the number of EC2 instances and delays the news by 5 minutes. After 5 minutes, the system is fully scaled up and the announcement goes out. People receive it, and what do they do? They visit the news site by the millions at the same time. The surge in users can then be handled easily, which means customer satisfaction.
From experience, they know this peak normally lasts less than an hour, so within an hour they scale down again and only pay for 1 hour of excess capacity.
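The sequence described here (scale up first, hold the alert until capacity is ready, scale down after the peak) can be written as a small schedule. This is a sketch under stated assumptions, not Buuuk's actual system: the function name and action labels are hypothetical, the 5-minute warm-up and 60-minute peak window come from these notes, and measuring the hour from the moment the alert is sent is an assumption.

```python
from datetime import datetime, timedelta


def breaking_news_plan(notified_at, warmup_minutes=5, peak_minutes=60):
    """Return (time, action) pairs for one breaking-news alert."""
    send_at = notified_at + timedelta(minutes=warmup_minutes)
    scale_down_at = send_at + timedelta(minutes=peak_minutes)
    return [
        (notified_at, "scale up EC2 capacity"),
        (send_at, "push alert to subscribers"),
        (scale_down_at, "scale back down"),
    ]


plan = breaking_news_plan(datetime(2013, 1, 1, 12, 0))
for when, action in plan:
    print(when.strftime("%H:%M"), action)
```

Delaying the push until the warm-up has elapsed is what lets the extra capacity be paid for only around the surge rather than kept running all day.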
For much content, such as media files, derivatives are created: for example thumbnails, or versions for iOS, Android, etc. These files can be regenerated from the original, or you have the source files elsewhere. In that case, you can consider S3 Reduced Redundancy. Not 11 9’s, but 4 9’s: still 99.99% durability, which is 400 times more than a normal hard drive, but 33% cheaper than standard S3. You also get improved consistency and logging/tracking, as you know exactly where the content is, who calls it, how often, etc.
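To see what the 33% discount means for a mixed bucket, the sketch below prices originals at the standard rate and derivatives at the Reduced Redundancy rate. The function name is illustrative, and the per-GB price used in the example call is a made-up placeholder, not an AWS list price.

```python
def monthly_storage_cost(original_gb, derived_gb, standard_price_per_gb,
                         rrs_discount=0.33):
    """Cost with originals in standard S3 and derivatives in Reduced Redundancy."""
    rrs_price_per_gb = standard_price_per_gb * (1 - rrs_discount)
    return original_gb * standard_price_per_gb + derived_gb * rrs_price_per_gb


# Hypothetical example: 100 GB of originals, 100 GB of thumbnails and
# device-specific versions, at a placeholder price of $0.10 per GB-month.
all_standard = monthly_storage_cost(200, 0, 0.10)
split = monthly_storage_cost(100, 100, 0.10)
print(f"All standard: ${all_standard:.2f}, with RRS for derivatives: ${split:.2f}")
```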
If you add a number of read replicas, you can offload tasks such as reporting. You do not need to size your Master for peak load, and you can reduce the size of your entire database fleet. You can further offload certain activities to other services such as DynamoDB or ElastiCache. You can even make your EC2 instances smart, so they know the read replicas are there and when to query them.
A first option is read replicas to deal with read calls to the database. All reads go there, not to the Master, so you can keep the Master from needing to grow and grow. When you are really smart, you can even auto-scale the read replicas so that you only have them when usage increases.
An additional offload is ElastiCache. Sometimes 90% of calls can be offloaded to ElastiCache because the calls are the same. CloudWatch can tell you the CPU utilization of your RDS instance, so if it’s low, you can reduce the size.
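A common way to offload repeated reads is the cache-aside pattern: check the cache first, and only query the database on a miss. The sketch below uses an in-memory dictionary as a stand-in for ElastiCache and a plain function as a stand-in for the database query; the class and its names are hypothetical, and expiry and invalidation are left out for brevity.

```python
class CacheAsideReader:
    """Serve repeated reads from a cache and fall back to the database on a miss."""

    def __init__(self, load_from_db):
        self._load_from_db = load_from_db  # function: key -> value (hits the database)
        self._cache = {}                   # stand-in for ElastiCache
        self.db_calls = 0

    def get(self, key):
        if key in self._cache:
            return self._cache[key]
        self.db_calls += 1
        value = self._load_from_db(key)
        self._cache[key] = value
        return value


reader = CacheAsideReader(lambda key: f"offer for {key}")
results = [reader.get("starbucks") for _ in range(10)]
print(f"{reader.db_calls} database call(s) for {len(results)} identical reads")
```

With ten identical reads, only the first reaches the database, which matches the kind of 90% offload mentioned in the note.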
Copy-paste your entire infrastructure and try out both versions. Leverage CloudFormation to describe a stack and create templates that automate the process of spinning up new stacks. You can copy-paste, then change a few things in the copied environment to test whether it works better. This allows for rapid dev & test and is often used to optimize performance and conversion metrics: A/B testing. Used by Obama in his campaign, but also extensively in gaming.
Each of these examples is typified by wasted IT resources. Where you planned correctly, the IT resources will be over-provisioned so that services are not impacted and customers are not lost during high demand. In the worst cases, that capacity will not be enough, and customer dissatisfaction will result. Most businesses have a mix of differing patterns at play, and much time and resource is dedicated to planning and management to ensure services are always available. And when a new online service is really successful, you often can't ship in new capacity fast enough. Some say that's a nice problem to have, but those who have lived through it will tell you otherwise!
You control how and when your service scales, so you can closely match increasing load in small increments, scale up fast when needed, and cool off and reduce the resources being used at any time of day. Even the most variable and complex demand patterns can be matched with the right amount of capacity - all automatically handled by AWS.