© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Jarrod Spiga - Solutions Architect, Amazon Web Services
John Matheson – Cloud & Automation Platform Manager, News Corp Australia
September 5, 2017
Spot On!
Replacing Reserved Instances with Spot
Spot 101
Customers can bid on unused AWS capacity.
The tradeoff is that if capacity becomes constrained, your Spot
Instances may be terminated after a two-minute warning.
Workloads running on Spot therefore need to be fault-tolerant, or
able to be restarted at a later time.
Spot 101
Each instance family, each instance size, each availability zone in
each region is a separate spot market.
The Spot market is where the price of compute fluctuates based on
supply and demand.
You’ll never pay more than your bid - you’ll only ever pay the market
price. When the market price exceeds your bid, you get two minutes
to wrap up.
A Review of Spot Fleet
Launch many spot instances with one call.
Select whether you want instances in the cheapest market, or opt to
diversify to reduce impact of market variability.
Weighting allows you to scale based on cores, memory, latency,
etc.
This diversification option is what we are using to maximise availability!
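As a rough illustration of the diversification and weighting options, here is a minimal Python sketch that builds a Spot Fleet request configuration. The function names, the role ARN, the AMI ID, the subnets and the bids are all placeholders and not values from this deck; only the request field names come from the EC2 Spot Fleet API.

```python
def build_spot_fleet_config(fleet_role_arn, image_id, target_capacity, pools):
    """Build a SpotFleetRequestConfig dict that spreads capacity across pools.

    `pools` is a list of (instance_type, subnet_id, weight, bid) tuples.
    Every value passed in is a placeholder chosen by the caller.
    """
    launch_specs = [
        {
            "ImageId": image_id,
            "InstanceType": instance_type,
            "SubnetId": subnet_id,
            # How many capacity units one instance of this type contributes.
            "WeightedCapacity": weight,
            # Maximum price for this pool, passed as a string.
            "SpotPrice": str(bid),
        }
        for instance_type, subnet_id, weight, bid in pools
    ]
    return {
        "IamFleetRole": fleet_role_arn,
        "TargetCapacity": target_capacity,
        # "diversified" spreads instances across all pools;
        # "lowestPrice" would pick the cheapest pool instead.
        "AllocationStrategy": "diversified",
        "LaunchSpecifications": launch_specs,
    }


def request_fleet(config):
    """Submit the request with one call. Needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above runs without it

    return boto3.client("ec2").request_spot_fleet(SpotFleetRequestConfig=config)
```

Keeping the builder separate from the API call means the configuration can be inspected or unit tested without touching AWS.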
Challenge 1 – ELB Registration
Q: How do we register a Spot Fleet Instance to an ELB/ALB?
A: Use EC2 User Data.
instance_id=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
aws elb register-instances-with-load-balancer \
    --load-balancer-name my-loadbalancer --instances "$instance_id"
As there is no mechanism to automatically register a Spot Instance provisioned via a Spot Fleet Request
with an ELB/ALB, this needs to be implemented to distribute load across the fleet.
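The CLI call above targets a Classic ELB. A possible Python equivalent that also covers an ALB target group might look like the sketch below. The function names are hypothetical, the load balancer name and target group ARN are supplied by the caller, and the metadata lookup assumes the same unauthenticated metadata endpoint used elsewhere in this deck.

```python
import urllib.request

METADATA_URL = "http://169.254.169.254/latest/meta-data/instance-id"


def get_instance_id(url=METADATA_URL, timeout=2):
    """Read this instance's ID from the EC2 instance metadata service."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode().strip()


def classic_elb_params(load_balancer_name, instance_id):
    """Arguments for the Classic ELB register_instances_with_load_balancer call."""
    return {
        "LoadBalancerName": load_balancer_name,
        "Instances": [{"InstanceId": instance_id}],
    }


def alb_params(target_group_arn, instance_id):
    """Arguments for the ALB (elbv2) register_targets call."""
    return {"TargetGroupArn": target_group_arn, "Targets": [{"Id": instance_id}]}


def register(instance_id, load_balancer_name=None, target_group_arn=None):
    """Register with an ALB target group if given, else with a Classic ELB."""
    import boto3  # needs boto3 and an instance role that allows registration

    if target_group_arn:
        boto3.client("elbv2").register_targets(
            **alb_params(target_group_arn, instance_id)
        )
    else:
        boto3.client("elb").register_instances_with_load_balancer(
            **classic_elb_params(load_balancer_name, instance_id)
        )
```

From EC2 User Data this would be invoked as `register(get_instance_id(), load_balancer_name="my-loadbalancer")`.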
Challenge 2 – EIP Attachment
Q: How do we associate an Elastic IP Address to a Spot Fleet Instance?
A: Use EC2 User Data.
eip_allocation_id=$(aws ec2 allocate-address --domain vpc \
    --query AllocationId --output text)
aws ec2 associate-address --instance-id "$instance_id" \
    --allocation-id "$eip_allocation_id"
This will be required for services that need direct connectivity to the Internet such as NAT hosts and
proxy servers.
But it’s a bit more complicated…
An alternative is to enable automatic public IP addressing, but this is a subnet-level setting that applies to every instance launched there.
This use case has been raised with the service team and a feature request has been created.
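The allocate-then-associate sequence could also be written in Python, for example as below. This is only a sketch: the function name is made up for illustration, and the optional `ec2_client` argument exists so the logic can be exercised without AWS access.

```python
def attach_new_eip(instance_id, ec2_client=None):
    """Allocate a VPC Elastic IP and associate it with the given instance.

    Returns the allocation ID so the caller can release the address later.
    """
    if ec2_client is None:
        import boto3  # needs boto3 and AWS credentials

        ec2_client = boto3.client("ec2")
    # allocate_address returns the AllocationId needed for the association.
    allocation_id = ec2_client.allocate_address(Domain="vpc")["AllocationId"]
    ec2_client.associate_address(
        InstanceId=instance_id, AllocationId=allocation_id
    )
    return allocation_id
```

Note that an address allocated this way is not released automatically when the Spot Instance is terminated, so something has to release or reuse it.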
Challenge 3 – De-registration from an ELB
Q: How do we de-register a terminated Spot Instance from an ELB/ALB?
A: Run a script that polls the instance metadata for a termination time.
while true
do
    # The endpoint only returns a timestamp once the instance is marked for termination.
    if curl -s http://169.254.169.254/latest/meta-data/spot/termination-time |
        grep -q '.*T.*Z'
    then
        instance_id=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
        aws elb deregister-instances-from-load-balancer \
            --load-balancer-name my-load-balancer \
            --instances "$instance_id"
        # Deregistration is done; stop polling.
        break
    else
        sleep 5
    fi
done
We need a mechanism to ensure that ELB traffic is not routed through to an instance that’s about to be
terminated.
This is a well documented process.
Challenge 3 – De-registration from an ELB
A (v2.0): Use a Lambda function to deregister the instance upon termination.
The previous solution does not cater for instances that are manually terminated. Also, it cannot be tested.
Finally, we can “think bigger” and build a solution that applies to ALL instances – on-demand or spot.
When an EC2 instance is terminated, an EC2 Instance State-change Notification event is raised in
CloudWatch Events. When an EC2 Instance changes state, a Lambda function can be executed.
This function would first check that the instance was terminated, then would de-register the instance.
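A minimal sketch of such a Lambda function in Python could look like this. The handler and helper names, the default load balancer name and the injectable `elb_client` are assumptions made for illustration; the event fields are those of the EC2 Instance State-change Notification event.

```python
def terminated_instance_id(event):
    """Return the instance ID if the event reports a terminated instance, else None."""
    if event.get("detail-type") != "EC2 Instance State-change Notification":
        return None
    detail = event.get("detail", {})
    if detail.get("state") != "terminated":
        return None
    return detail.get("instance-id")


def handler(event, context, elb_client=None, load_balancer_name="my-load-balancer"):
    """Lambda entry point: deregister the instance once it has been terminated."""
    instance_id = terminated_instance_id(event)
    if instance_id is None:
        # Some other state change (pending, running, stopping...): nothing to do.
        return {"deregistered": None}
    if elb_client is None:
        import boto3  # available in the Lambda Python runtime

        elb_client = boto3.client("elb")
    elb_client.deregister_instances_from_load_balancer(
        LoadBalancerName=load_balancer_name,
        Instances=[{"InstanceId": instance_id}],
    )
    return {"deregistered": instance_id}
```

Because the handler reacts to any terminated instance, it works the same way for On-Demand and Spot, and it can be tested by passing a hand-written event.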
Q: But how do we ensure that requests don’t get directed to an instance that’s
about to be terminated?
Utilise existing health check functionality within the ELB/ALB.
When our termination notice is posted to our Spot Instance, poison the health check!
Alternatively, you can use a scheduled CloudWatch Event to routinely initiate the Lambda function.
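One way to poison the health check is to have the health endpoint consult the termination notice before answering. The sketch below shows only that decision logic; the function names are hypothetical, and wiring `health_status` into the application's actual health check route is left to the workload.

```python
import re
import urllib.error
import urllib.request

TERMINATION_URL = "http://169.254.169.254/latest/meta-data/spot/termination-time"
# A posted notice is a UTC timestamp such as 2015-01-05T18:02:00Z.
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z$")


def is_termination_notice(body):
    """True only if the metadata response body is a termination timestamp."""
    return bool(TIMESTAMP.match(body.strip()))


def fetch_termination_time(url=TERMINATION_URL, timeout=1):
    """Return the metadata response body, or '' if there is no notice (404) or no answer."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode()
    except (urllib.error.URLError, OSError):
        return ""


def health_status(termination_body, app_healthy=True):
    """HTTP status the health check should return to the ELB/ALB."""
    if is_termination_notice(termination_body) or not app_healthy:
        return 503  # poisoned: the load balancer stops routing here
    return 200
```

A health check route would call `health_status(fetch_termination_time())` and return the resulting status code.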
Challenge 4 – Spot Price Variability
Q: What happens if the Spot market price goes up beyond our bid price?
A: Handle the outage or run on-demand instances in parallel.
This needs to be considered if we are to have any guarantee of service, especially for production
environments.
Deploying diversified Spot Fleets helps greatly here, but there is still a chance that ALL markets in the
fleet could be outbid… Running on-demand instances in parallel is the typical answer.
But surely there is a better way?
Challenge 4 – Spot Price Variability
A (v2.0): Pre-empt the market.
Each Spot Fleet request has an associated CloudWatch Metric called “EligibleInstancePoolCount”, which
reports how many pools the Fleet Request can currently fulfil capacity from.
We can configure a CloudWatch Alarm that triggers when the number of eligible pools drops below a
certain threshold, say 2 pools.
Our On-Demand instances running in parallel can now be replaced with an AutoScaling group that
typically has no instances running. When the alarm triggers, a Lambda function is invoked to manipulate
the AutoScaling group configuration and provision On-Demand instances.
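The alarm and the scaling action could be sketched in Python as follows. The alarm name, the one-minute period, the choice of the Minimum statistic and the use of `set_desired_capacity` are assumptions made for this example; the deck does not say how the Lambda function changes the group. The alarm action ARN is whatever forwards the alarm to the Lambda function, for example an SNS topic.

```python
def pool_count_alarm_params(fleet_request_id, alarm_action_arn, min_pools=2):
    """Arguments for CloudWatch put_metric_alarm on EligibleInstancePoolCount."""
    return {
        "AlarmName": "spot-fleet-low-pools-" + fleet_request_id,
        "Namespace": "AWS/EC2Spot",
        "MetricName": "EligibleInstancePoolCount",
        "Dimensions": [{"Name": "FleetRequestId", "Value": fleet_request_id}],
        "Statistic": "Minimum",
        "Period": 60,
        "EvaluationPeriods": 1,
        # Alarm when fewer than `min_pools` pools remain eligible.
        "Threshold": min_pools,
        "ComparisonOperator": "LessThanThreshold",
        "AlarmActions": [alarm_action_arn],
    }


def scale_on_demand(asg_name, desired, autoscaling_client=None):
    """Raise (or lower) the On-Demand AutoScaling group's desired capacity.

    The group's MaxSize must already allow `desired` instances.
    """
    if autoscaling_client is None:
        import boto3  # needs boto3 and AWS credentials

        autoscaling_client = boto3.client("autoscaling")
    autoscaling_client.set_desired_capacity(
        AutoScalingGroupName=asg_name,
        DesiredCapacity=desired,
        HonorCooldown=False,
    )
```

The alarm itself would be created with `boto3.client("cloudwatch").put_metric_alarm(**pool_count_alarm_params(...))`.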
Challenge 5 – T2 Instances
Q: What happens if my workload runs on T2 Instance types?
A: Use the larger instance types
There are currently no Spot markets for T2 instance types, meaning that workloads may have to run on
m3.medium or larger instance types to take advantage of Spot.
Deploying Spot Fleets across a diverse range of pools means that we don’t need to be constrained by a
particular instance type. It’s very rare that a workload has adverse performance with more resources!
Challenge 5 – T2 Instances
Let’s look at an example. A t2.small instance type
has an hourly rate of $0.032. This instance type
features a single (burstable) vCPU and 2GiB of
memory.
Compare this with an m3.medium instance.
This instance type gives you a single (dedicated)
vCPU and 3.75GiB of memory.
In all three AZs in the Sydney region, the market
price of an m3.medium instance type has not
exceeded $0.020 over the last 3 months...
… and our average hourly rate in the most expensive AZ still ended up being $0.0128.
More predictable performance and more memory at 40% of the cost!
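The 40% figure follows directly from the two hourly rates quoted on this slide, as this small Python calculation shows. The function and constant names are just labels chosen for the example.

```python
def cost_ratio(spot_hourly, on_demand_hourly):
    """Spot cost expressed as a fraction of the On-Demand cost."""
    return spot_hourly / on_demand_hourly


T2_SMALL_ON_DEMAND = 0.032      # USD per hour, from the slide
M3_MEDIUM_SPOT_AVERAGE = 0.0128  # USD per hour, most expensive AZ, from the slide

ratio = cost_ratio(M3_MEDIUM_SPOT_AVERAGE, T2_SMALL_ON_DEMAND)
print(f"m3.medium on Spot costs {ratio:.0%} of a t2.small On-Demand")
# prints: m3.medium on Spot costs 40% of a t2.small On-Demand
```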
Challenge 5 – T2 Instances
Now compare a t2.small with an m3.large instance.
The m3.large instance type gives you two
(dedicated) vCPUs and 7.50GiB of memory.
In all three AZs in the Sydney region, the market
price of an m3.large instance type has not
exceeded $0.0318 over the last 3 months...
We can further diversify our fleet to also utilize m3.large instances in the spot market, and still make
savings over what would have been charged if we were running t2.small instances!
Challenge 6 – Automation
Q: How do we automate all of this?
A: In steps the automation team
The advantage to using CloudWatch Events and Alarms as triggers to Lambda functions is that the
Lambda functions should be able to be implemented in a single account and invoked by each stack.
That said, the team have spent a LOT of time working through the complexities of building Spot Fleet
requests into CloudFormation stacks.
Development teams are still provided with the same baseline CloudFormation templates (utilizing ASGs) that
have been provided in the past. However, a new tool has been written by the Automation team that takes
these baseline templates and converts them for use with Spot Fleets.
Given the recent launch of CloudFormation StackSets, we’re about to start looking at ways in which this
can be further simplified.
The Solution Going Forward: Deploy
1. A deployment plan is initiated. We start with a standard ASG-based template (ASG, ELB, baked AMI ID,
Security Groups, Subnets, etc.).
2. A Lambda function (the conversion tool) converts this ASG template into a skeleton Spot
Fleet resource request.
3. A second Lambda function (InstanceTypes & Bids List) is triggered, generating a list of instance types
similar to the one provided. It also calculates appropriate bid prices for those instance types.
4. The list is pushed to a third Lambda function (Dynamic Spot Fleet Template), which dynamically
creates the Spot Fleet template and uploads it to S3.
5. The application CloudFormation stack is created.
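The conversion tool itself is not shown in this deck. Purely as an illustration of the idea, the Python sketch below turns an ASG plus LaunchConfiguration pair into an AWS::EC2::SpotFleet resource. It assumes a very simple template shape (a `Ref` to the launch configuration, literal subnet and security group IDs), and the function name and the `instance_bids` mapping are hypothetical.

```python
def asg_to_spot_fleet(template, fleet_role_arn, instance_bids):
    """Derive a Spot Fleet resource from the first ASG in a CloudFormation template.

    `instance_bids` maps instance type -> bid price (USD per hour).
    Assumes literal subnet and security group IDs in the template.
    """
    resources = template["Resources"]
    asg = next(
        r for r in resources.values()
        if r["Type"] == "AWS::AutoScaling::AutoScalingGroup"
    )["Properties"]
    launch_config = resources[asg["LaunchConfigurationName"]["Ref"]]["Properties"]

    launch_specs = []
    for instance_type, bid in sorted(instance_bids.items()):
        spec = {
            "ImageId": launch_config["ImageId"],
            "InstanceType": instance_type,
            "SpotPrice": str(bid),
            # Spot Fleet accepts several subnets as one comma-separated string.
            "SubnetId": ",".join(asg["VPCZoneIdentifier"]),
            "SecurityGroups": [
                {"GroupId": g} for g in launch_config.get("SecurityGroups", [])
            ],
        }
        if "UserData" in launch_config:
            spec["UserData"] = launch_config["UserData"]
        launch_specs.append(spec)

    return {
        "Type": "AWS::EC2::SpotFleet",
        "Properties": {
            "SpotFleetRequestConfigData": {
                "IamFleetRole": fleet_role_arn,
                "TargetCapacity": int(asg["DesiredCapacity"]),
                "AllocationStrategy": "diversified",
                "LaunchSpecifications": launch_specs,
            }
        },
    }
```

A real tool would also need to handle intrinsic functions, parameters and load balancer wiring, which this sketch ignores.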
The Solution Going Forward: Stack
1. The CloudFormation stack provisions an ELB/ALB and a Spot Fleet
Request is made.
2. The Spot Fleet Request is fulfilled and Spot Instances register with their
ELB via EC2 User Data.
3. If the market price for a Spot Instance exceeds the bid price, the Instance
is flagged for termination. The health check on the host is poisoned and the
Instance is marked as offline by the ELB.
4. After two minutes, the Spot Instance is terminated. A scheduled CloudWatch Event
(1 minute Scheduled Rule) is triggered, which initiates a Lambda function
(TerminateEC2Instance) that ensures that unhealthy instances are terminated and
deregisters terminated instances from the ELB.
5. A replacement Spot Instance is provisioned. Again, this Spot Instance
registers itself with the ELB.
6. If the number of pools that Spot Fleet can fulfil instances from gets low…
7. … a CloudWatch Alarm (EligibleInstancePoolCount) will trigger a Lambda function
(ModifyOnDemandCapacity) that manipulates an On-Demand AutoScaling group, which will
commence provisioning On-Demand EC2 Instances to maintain capacity for the workload.
8. If a Spot Instance has not been able to be provisioned for more than 5
minutes (CloudWatch Alarm: Pending Capacity > 0 for > 5 min), an On-Demand instance
is also added.
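The decision made by the scheduled clean-up function in step 4 can be sketched as a pure Python function. The name `plan_cleanup` and the two input mappings are assumptions made for this example; in practice they would be filled from the ELB `describe_instance_health` and EC2 `describe_instances` calls.

```python
def plan_cleanup(elb_states, ec2_states):
    """Decide which instances to terminate and which to deregister.

    elb_states: {instance_id: "InService" | "OutOfService" | "Unknown"}
    ec2_states: {instance_id: "running" | "shutting-down" | "terminated" | ...}
    An instance missing from ec2_states is assumed to be gone already.
    """
    to_terminate, to_deregister = [], []
    for instance_id, health in sorted(elb_states.items()):
        state = ec2_states.get(instance_id, "terminated")
        if state in ("shutting-down", "terminated"):
            # Already going away: just remove it from the load balancer.
            to_deregister.append(instance_id)
        elif health == "OutOfService" and state == "running":
            # Unhealthy but still running: terminate it so it gets replaced.
            to_terminate.append(instance_id)
    return to_terminate, to_deregister
```

A freshly registered instance is also reported as out of service until its first health checks pass, so a real function would need some grace period before terminating.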
Development Recommendations
Build Stateless Applications
If you can’t, persist state outside of the EC2 Instance using services such as DynamoDB,
RDS, Aurora, ElastiCache, EFS or S3.
Poison Application Health Checks within the 2-minute warning period
Detect when a Spot Instance is scheduled for termination and cause the ELB/ALB to
think that the workload is out of service. That server will then be removed from the pool
of Healthy servers, allowing your application to gracefully handle a termination event.
Set your bid price appropriately
A bid price that is too low will introduce volatility into your workload and reduce the
number of spot markets that you can draw instances from.
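To see how the bid price limits the markets available to a fleet, consider this small Python sketch. The function name is made up and the market prices in the usage note are invented purely for illustration; they are not figures from this deck.

```python
def eligible_pools(bid, market_prices):
    """Return the pools whose current market price is at or below the bid.

    market_prices: {(instance_type, availability_zone): price in USD per hour}
    """
    return sorted(pool for pool, price in market_prices.items() if price <= bid)
```

With hypothetical prices of 0.013 and 0.019 for m3.medium in two AZs and 0.030 for m3.large in a third, a bid of 0.015 leaves a single eligible pool, while a bid of 0.032 keeps all three available.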
Reference Material
• EC2 Spot Instances - http://aws.amazon.com/ec2/spot/
• Spot Bid Advisor - http://aws.amazon.com/ec2/spot/bid-advisor/
• Getting Started with Spot - http://aws.amazon.com/ec2/spot/getting-started/
• Spot FAQs - http://aws.amazon.com/ec2/spot/faqs/
• Spot Testimonials - http://aws.amazon.com/ec2/spot/testimonials/
• Documentation: Using Spot Instances - http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-spot-instances.html
• Documentation: Spot Fleet - http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-fleet.html
Any Questions?

Weitere ähnliche Inhalte

Was ist angesagt?

AWS Cost Optimisation Best Practices Webinar
AWS Cost Optimisation Best Practices WebinarAWS Cost Optimisation Best Practices Webinar
AWS Cost Optimisation Best Practices WebinarAmazon Web Services
 
AWS Webcast - Total Cost of (Non) Ownership
AWS Webcast - Total Cost of (Non) Ownership  AWS Webcast - Total Cost of (Non) Ownership
AWS Webcast - Total Cost of (Non) Ownership Amazon Web Services
 
AWS Partner Webcast - Improving Your AWS Cost Efficiency with Cloudability
AWS Partner Webcast - Improving Your AWS Cost Efficiency with CloudabilityAWS Partner Webcast - Improving Your AWS Cost Efficiency with Cloudability
AWS Partner Webcast - Improving Your AWS Cost Efficiency with CloudabilityAmazon Web Services
 
AWS Summit Auckland 2014 | Moving to the Cloud. What does it Mean to your Bus...
AWS Summit Auckland 2014 | Moving to the Cloud. What does it Mean to your Bus...AWS Summit Auckland 2014 | Moving to the Cloud. What does it Mean to your Bus...
AWS Summit Auckland 2014 | Moving to the Cloud. What does it Mean to your Bus...Amazon Web Services
 
AWS Summit Sydney 2014 | Moving to the Cloud. What does it Mean to your Business
AWS Summit Sydney 2014 | Moving to the Cloud. What does it Mean to your BusinessAWS Summit Sydney 2014 | Moving to the Cloud. What does it Mean to your Business
AWS Summit Sydney 2014 | Moving to the Cloud. What does it Mean to your BusinessAmazon Web Services
 
The Total Cost of Ownership of Cloud Storage (TCO) - AWS Cloud Storage for th...
The Total Cost of Ownership of Cloud Storage (TCO) - AWS Cloud Storage for th...The Total Cost of Ownership of Cloud Storage (TCO) - AWS Cloud Storage for th...
The Total Cost of Ownership of Cloud Storage (TCO) - AWS Cloud Storage for th...Amazon Web Services
 
Risk Management and Particle Accelerators: Innovating with New Compute Platfo...
Risk Management and Particle Accelerators: Innovating with New Compute Platfo...Risk Management and Particle Accelerators: Innovating with New Compute Platfo...
Risk Management and Particle Accelerators: Innovating with New Compute Platfo...Amazon Web Services
 
Proactive Cost Management for AWS Cloud
Proactive Cost Management for AWS CloudProactive Cost Management for AWS Cloud
Proactive Cost Management for AWS CloudNutanix Beam
 
Managing Amazon AWS Costs
Managing Amazon AWS CostsManaging Amazon AWS Costs
Managing Amazon AWS CostsJoe Kinsella
 
Using the AWS TCO Calculator - Rogers
Using the AWS TCO Calculator - RogersUsing the AWS TCO Calculator - Rogers
Using the AWS TCO Calculator - RogersAmazon Web Services
 
Cost Optimize EC2 with Amazon EC2 Spot Instances
Cost Optimize EC2 with Amazon EC2 Spot InstancesCost Optimize EC2 with Amazon EC2 Spot Instances
Cost Optimize EC2 with Amazon EC2 Spot InstancesAmazon Web Services
 
Smart Manufacturing: CAE in the Cloud
Smart Manufacturing: CAE in the CloudSmart Manufacturing: CAE in the Cloud
Smart Manufacturing: CAE in the CloudWolfgang Gentzsch
 
AWS Cloud Kata 2013 | Singapore - Achieving Profitability on AWS
AWS Cloud Kata 2013 | Singapore - Achieving Profitability on AWSAWS Cloud Kata 2013 | Singapore - Achieving Profitability on AWS
AWS Cloud Kata 2013 | Singapore - Achieving Profitability on AWSAmazon Web Services
 
APN Partner Webinar - Having Effective and Critical TCO Conversations
APN Partner Webinar - Having Effective and Critical TCO ConversationsAPN Partner Webinar - Having Effective and Critical TCO Conversations
APN Partner Webinar - Having Effective and Critical TCO ConversationsAmazon Web Services
 
Optimizing Your AWS Apps & Usage to Reduce Costs - IP Expo
Optimizing Your AWS Apps & Usage to Reduce Costs - IP ExpoOptimizing Your AWS Apps & Usage to Reduce Costs - IP Expo
Optimizing Your AWS Apps & Usage to Reduce Costs - IP ExpoAmazon Web Services
 
AWS Cost optimization at scale
AWS Cost optimization at scaleAWS Cost optimization at scale
AWS Cost optimization at scaleBrett Pollak
 
Cost Optimization on AWS (REPEAT)
Cost Optimization on AWS (REPEAT)Cost Optimization on AWS (REPEAT)
Cost Optimization on AWS (REPEAT)Amazon Web Services
 
Disaster Recovery of on-premises IT infrastructure with AWS
Disaster Recovery of on-premises IT infrastructure with AWS Disaster Recovery of on-premises IT infrastructure with AWS
Disaster Recovery of on-premises IT infrastructure with AWS Amazon Web Services
 

Was ist angesagt? (19)

AWS Cost Optimization in 5 Perspective
AWS Cost Optimization in 5 PerspectiveAWS Cost Optimization in 5 Perspective
AWS Cost Optimization in 5 Perspective
 
AWS Cost Optimisation Best Practices Webinar
AWS Cost Optimisation Best Practices WebinarAWS Cost Optimisation Best Practices Webinar
AWS Cost Optimisation Best Practices Webinar
 
AWS Webcast - Total Cost of (Non) Ownership
AWS Webcast - Total Cost of (Non) Ownership  AWS Webcast - Total Cost of (Non) Ownership
AWS Webcast - Total Cost of (Non) Ownership
 
AWS Partner Webcast - Improving Your AWS Cost Efficiency with Cloudability
AWS Partner Webcast - Improving Your AWS Cost Efficiency with CloudabilityAWS Partner Webcast - Improving Your AWS Cost Efficiency with Cloudability
AWS Partner Webcast - Improving Your AWS Cost Efficiency with Cloudability
 
AWS Summit Auckland 2014 | Moving to the Cloud. What does it Mean to your Bus...
AWS Summit Auckland 2014 | Moving to the Cloud. What does it Mean to your Bus...AWS Summit Auckland 2014 | Moving to the Cloud. What does it Mean to your Bus...
AWS Summit Auckland 2014 | Moving to the Cloud. What does it Mean to your Bus...
 
AWS Summit Sydney 2014 | Moving to the Cloud. What does it Mean to your Business
AWS Summit Sydney 2014 | Moving to the Cloud. What does it Mean to your BusinessAWS Summit Sydney 2014 | Moving to the Cloud. What does it Mean to your Business
AWS Summit Sydney 2014 | Moving to the Cloud. What does it Mean to your Business
 
The Total Cost of Ownership of Cloud Storage (TCO) - AWS Cloud Storage for th...
The Total Cost of Ownership of Cloud Storage (TCO) - AWS Cloud Storage for th...The Total Cost of Ownership of Cloud Storage (TCO) - AWS Cloud Storage for th...
The Total Cost of Ownership of Cloud Storage (TCO) - AWS Cloud Storage for th...
 
Risk Management and Particle Accelerators: Innovating with New Compute Platfo...
Risk Management and Particle Accelerators: Innovating with New Compute Platfo...Risk Management and Particle Accelerators: Innovating with New Compute Platfo...
Risk Management and Particle Accelerators: Innovating with New Compute Platfo...
 
Proactive Cost Management for AWS Cloud
Proactive Cost Management for AWS CloudProactive Cost Management for AWS Cloud
Proactive Cost Management for AWS Cloud
 
Managing Amazon AWS Costs
Managing Amazon AWS CostsManaging Amazon AWS Costs
Managing Amazon AWS Costs
 
Using the AWS TCO Calculator - Rogers
Using the AWS TCO Calculator - RogersUsing the AWS TCO Calculator - Rogers
Using the AWS TCO Calculator - Rogers
 
Cost Optimize EC2 with Amazon EC2 Spot Instances
Cost Optimize EC2 with Amazon EC2 Spot InstancesCost Optimize EC2 with Amazon EC2 Spot Instances
Cost Optimize EC2 with Amazon EC2 Spot Instances
 
Smart Manufacturing: CAE in the Cloud
Smart Manufacturing: CAE in the CloudSmart Manufacturing: CAE in the Cloud
Smart Manufacturing: CAE in the Cloud
 
AWS Cloud Kata 2013 | Singapore - Achieving Profitability on AWS
AWS Cloud Kata 2013 | Singapore - Achieving Profitability on AWSAWS Cloud Kata 2013 | Singapore - Achieving Profitability on AWS
AWS Cloud Kata 2013 | Singapore - Achieving Profitability on AWS
 
APN Partner Webinar - Having Effective and Critical TCO Conversations
APN Partner Webinar - Having Effective and Critical TCO ConversationsAPN Partner Webinar - Having Effective and Critical TCO Conversations
APN Partner Webinar - Having Effective and Critical TCO Conversations
 
Optimizing Your AWS Apps & Usage to Reduce Costs - IP Expo
Optimizing Your AWS Apps & Usage to Reduce Costs - IP ExpoOptimizing Your AWS Apps & Usage to Reduce Costs - IP Expo
Optimizing Your AWS Apps & Usage to Reduce Costs - IP Expo
 
AWS Cost optimization at scale
AWS Cost optimization at scaleAWS Cost optimization at scale
AWS Cost optimization at scale
 
Cost Optimization on AWS (REPEAT)
Cost Optimization on AWS (REPEAT)Cost Optimization on AWS (REPEAT)
Cost Optimization on AWS (REPEAT)
 
Disaster Recovery of on-premises IT infrastructure with AWS
Disaster Recovery of on-premises IT infrastructure with AWS Disaster Recovery of on-premises IT infrastructure with AWS
Disaster Recovery of on-premises IT infrastructure with AWS
 

Ähnlich wie AWS Cost Opt Meetup 2 - News corp - Spot On deep dive

An introduction to Spot Instances and AWS Fleet - Webinar
An introduction to Spot Instances and AWS Fleet - WebinarAn introduction to Spot Instances and AWS Fleet - Webinar
An introduction to Spot Instances and AWS Fleet - WebinarCMPUTE
 
AWS Atlanta Meetup -AWS Spot Blocks and Spot Fleet
AWS Atlanta Meetup -AWS Spot Blocks and Spot FleetAWS Atlanta Meetup -AWS Spot Blocks and Spot Fleet
AWS Atlanta Meetup -AWS Spot Blocks and Spot FleetAdam Book
 
AWS re:Invent 2016: Save up to 90% and Run Production Workloads on Spot - Fea...
AWS re:Invent 2016: Save up to 90% and Run Production Workloads on Spot - Fea...AWS re:Invent 2016: Save up to 90% and Run Production Workloads on Spot - Fea...
AWS re:Invent 2016: Save up to 90% and Run Production Workloads on Spot - Fea...Amazon Web Services
 
Coding Apps in the Cloud to reduce costs up to 90% - September 2016 Webinar S...
Coding Apps in the Cloud to reduce costs up to 90% - September 2016 Webinar S...Coding Apps in the Cloud to reduce costs up to 90% - September 2016 Webinar S...
Coding Apps in the Cloud to reduce costs up to 90% - September 2016 Webinar S...Amazon Web Services
 
AWS re:Invent 2016: Lessons Learned from a Year of Using Spot Fleet (CMP205)
AWS re:Invent 2016: Lessons Learned from a Year of Using Spot Fleet (CMP205)AWS re:Invent 2016: Lessons Learned from a Year of Using Spot Fleet (CMP205)
AWS re:Invent 2016: Lessons Learned from a Year of Using Spot Fleet (CMP205)Amazon Web Services
 
Getting Started with EC2 Spot - November 2016 Webinar Series
Getting Started with EC2 Spot - November 2016 Webinar SeriesGetting Started with EC2 Spot - November 2016 Webinar Series
Getting Started with EC2 Spot - November 2016 Webinar SeriesAmazon Web Services
 
AWS Cost Control
AWS Cost ControlAWS Cost Control
AWS Cost ControlBob Brown
 
Cloudreach Voices EC2 Making Sense of the Cost Options
Cloudreach Voices EC2 Making Sense of the Cost Options  Cloudreach Voices EC2 Making Sense of the Cost Options
Cloudreach Voices EC2 Making Sense of the Cost Options Cloudreach
 
(CMP311) This One Weird API Request Will Save You Thousands
(CMP311) This One Weird API Request Will Save You Thousands(CMP311) This One Weird API Request Will Save You Thousands
(CMP311) This One Weird API Request Will Save You ThousandsAmazon Web Services
 
AWS Interview Questions And Answers | AWS Solution Architect Interview Questi...
AWS Interview Questions And Answers | AWS Solution Architect Interview Questi...AWS Interview Questions And Answers | AWS Solution Architect Interview Questi...
AWS Interview Questions And Answers | AWS Solution Architect Interview Questi...Edureka!
 
Optimize Content Processing in the Cloud with GPU and Spot Instances
Optimize Content Processing in the Cloud with GPU and Spot InstancesOptimize Content Processing in the Cloud with GPU and Spot Instances
Optimize Content Processing in the Cloud with GPU and Spot InstancesAmazon Web Services
 
Cut AWS Costs: Using Spot Instances for More Than Batch
Cut AWS Costs: Using Spot Instances for More Than BatchCut AWS Costs: Using Spot Instances for More Than Batch
Cut AWS Costs: Using Spot Instances for More Than BatchRightScale
 
Introduction to Amazon EC2 Spot
Introduction to Amazon EC2 Spot Introduction to Amazon EC2 Spot
Introduction to Amazon EC2 Spot Amazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
This One Weird API Request Will Save You Thousands
This One Weird API Request Will Save You ThousandsThis One Weird API Request Will Save You Thousands
This One Weird API Request Will Save You ThousandsAmazon Web Services
 
AWS Compute Evolved Week: Cost Optimize EC2 with Amazon EC2 Spot Instances
AWS Compute Evolved Week: Cost Optimize EC2 with Amazon EC2 Spot InstancesAWS Compute Evolved Week: Cost Optimize EC2 with Amazon EC2 Spot Instances
AWS Compute Evolved Week: Cost Optimize EC2 with Amazon EC2 Spot InstancesAmazon Web Services
 
Run Your CI/CD and Test Workloads for 90% Less with Amazon EC2 Spot - CMP317 ...
Run Your CI/CD and Test Workloads for 90% Less with Amazon EC2 Spot - CMP317 ...Run Your CI/CD and Test Workloads for 90% Less with Amazon EC2 Spot - CMP317 ...
Run Your CI/CD and Test Workloads for 90% Less with Amazon EC2 Spot - CMP317 ...Amazon Web Services
 

Ähnlich wie AWS Cost Opt Meetup 2 - News corp - Spot On deep dive (20)

An introduction to Spot Instances and AWS Fleet - Webinar
An introduction to Spot Instances and AWS Fleet - WebinarAn introduction to Spot Instances and AWS Fleet - Webinar
An introduction to Spot Instances and AWS Fleet - Webinar
 
AWS Atlanta Meetup -AWS Spot Blocks and Spot Fleet
AWS Atlanta Meetup -AWS Spot Blocks and Spot FleetAWS Atlanta Meetup -AWS Spot Blocks and Spot Fleet
AWS Atlanta Meetup -AWS Spot Blocks and Spot Fleet
 
AWS re:Invent 2016: Save up to 90% and Run Production Workloads on Spot - Fea...
AWS re:Invent 2016: Save up to 90% and Run Production Workloads on Spot - Fea...AWS re:Invent 2016: Save up to 90% and Run Production Workloads on Spot - Fea...
AWS re:Invent 2016: Save up to 90% and Run Production Workloads on Spot - Fea...
 
Coding Apps in the Cloud to reduce costs up to 90% - September 2016 Webinar S...
Coding Apps in the Cloud to reduce costs up to 90% - September 2016 Webinar S...Coding Apps in the Cloud to reduce costs up to 90% - September 2016 Webinar S...
Coding Apps in the Cloud to reduce costs up to 90% - September 2016 Webinar S...
 
AWS re:Invent 2016: Lessons Learned from a Year of Using Spot Fleet (CMP205)
AWS re:Invent 2016: Lessons Learned from a Year of Using Spot Fleet (CMP205)AWS re:Invent 2016: Lessons Learned from a Year of Using Spot Fleet (CMP205)
AWS re:Invent 2016: Lessons Learned from a Year of Using Spot Fleet (CMP205)
 
Reduce Your Cloud Spending With AWS Spot Instances
Reduce Your Cloud Spending With AWS Spot InstancesReduce Your Cloud Spending With AWS Spot Instances
Reduce Your Cloud Spending With AWS Spot Instances
 
Getting Started with EC2 Spot - November 2016 Webinar Series
Getting Started with EC2 Spot - November 2016 Webinar SeriesGetting Started with EC2 Spot - November 2016 Webinar Series
Getting Started with EC2 Spot - November 2016 Webinar Series
 
AWS Cost Control
AWS Cost ControlAWS Cost Control
AWS Cost Control
 
Cloudreach Voices EC2 Making Sense of the Cost Options
Cloudreach Voices EC2 Making Sense of the Cost Options  Cloudreach Voices EC2 Making Sense of the Cost Options
Cloudreach Voices EC2 Making Sense of the Cost Options
 
(CMP311) This One Weird API Request Will Save You Thousands
(CMP311) This One Weird API Request Will Save You Thousands(CMP311) This One Weird API Request Will Save You Thousands
(CMP311) This One Weird API Request Will Save You Thousands
 
AWS Interview Questions And Answers | AWS Solution Architect Interview Questi...
AWS Interview Questions And Answers | AWS Solution Architect Interview Questi...AWS Interview Questions And Answers | AWS Solution Architect Interview Questi...
AWS Interview Questions And Answers | AWS Solution Architect Interview Questi...
 
Optimize Content Processing in the Cloud with GPU and Spot Instances
Optimize Content Processing in the Cloud with GPU and Spot InstancesOptimize Content Processing in the Cloud with GPU and Spot Instances
Optimize Content Processing in the Cloud with GPU and Spot Instances
 
Cut AWS Costs: Using Spot Instances for More Than Batch
Cut AWS Costs: Using Spot Instances for More Than BatchCut AWS Costs: Using Spot Instances for More Than Batch
Cut AWS Costs: Using Spot Instances for More Than Batch
 
Amazon EC2 Spot Instances
Amazon EC2 Spot InstancesAmazon EC2 Spot Instances
Amazon EC2 Spot Instances
 
Introduction to Amazon EC2 Spot
Introduction to Amazon EC2 Spot Introduction to Amazon EC2 Spot
Introduction to Amazon EC2 Spot
 
Introduction to Amazon EC2 Spot
Introduction to Amazon EC2 SpotIntroduction to Amazon EC2 Spot
Introduction to Amazon EC2 Spot
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
This One Weird API Request Will Save You Thousands
This One Weird API Request Will Save You ThousandsThis One Weird API Request Will Save You Thousands
This One Weird API Request Will Save You Thousands
 
AWS Compute Evolved Week: Cost Optimize EC2 with Amazon EC2 Spot Instances
AWS Compute Evolved Week: Cost Optimize EC2 with Amazon EC2 Spot InstancesAWS Compute Evolved Week: Cost Optimize EC2 with Amazon EC2 Spot Instances
AWS Compute Evolved Week: Cost Optimize EC2 with Amazon EC2 Spot Instances
 
Run Your CI/CD and Test Workloads for 90% Less with Amazon EC2 Spot - CMP317 ...
Run Your CI/CD and Test Workloads for 90% Less with Amazon EC2 Spot - CMP317 ...Run Your CI/CD and Test Workloads for 90% Less with Amazon EC2 Spot - CMP317 ...
Run Your CI/CD and Test Workloads for 90% Less with Amazon EC2 Spot - CMP317 ...
 

AWS Cost Opt Meetup 2 - News corp - Spot On deep dive

  • 1. © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Jarrod Spiga - Solutions Architect, Amazon Web Services John Matheson – Cloud & Automation Platform Manager, News Corp Australia September 5, 2017 Spot On! Replacing Reserved Instances with Spot
  • 2. Spot 101 Customers can bid on unused AWS capacity. The tradeoff is that if capacity becomes constrained, your Spot Instances may be terminated after a two-minute warning. Workloads running on Spot therefore need to be fault-tolerant, or able to be restarted at a later time.
  • 3. Spot 101 Each combination of instance family, instance size and Availability Zone in each region is a separate Spot market. The Spot market is where the price of compute fluctuates based on supply and demand. You’ll never pay more than your bid; you only ever pay the market price. When the market price exceeds your bid, you get two minutes to wrap up.
  • 6. A Review of Spot Fleet Launch many spot instances with one call. Select whether you want instances in the cheapest market, or opt to diversify to reduce impact of market variability. Weighting allows you to scale based on cores, memory, latency, etc. This diversification option is what we are using to maximise availability!
  • 7. Challenge 1 – ELB Registration
    Q: How do we register a Spot Fleet Instance to an ELB/ALB?
    A: Use EC2 User Data.
        aws elb register-instances-with-load-balancer --load-balancer-name my-loadbalancer --instances $instance_id;
    As there is no mechanism to automatically register a Spot Instance provisioned via a Spot Fleet Request with an ELB/ALB, this needs to be implemented to distribute load across the fleet.
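The User Data step above can be sketched a little more fully. This is a minimal sketch, not the team's actual script: it assumes the instance metadata service answers plain (IMDSv1-style) requests as in the slides, that the instance role is allowed to register instances with the load balancer, and the function names `get_instance_id` and `register_with_elb` are hypothetical.

```shell
# Sketch of a User Data step that registers this instance with a Classic ELB.
METADATA_URL="http://169.254.169.254/latest/meta-data"

get_instance_id() {
  # Ask the instance metadata service which instance we are running on.
  curl -s "${METADATA_URL}/instance-id"
}

register_with_elb() {
  # $1: name of the Classic ELB to join (e.g. my-loadbalancer from the slide).
  local lb_name="$1"
  local instance_id
  instance_id="$(get_instance_id)"
  if [ -z "$instance_id" ]; then
    echo "could not determine instance id" >&2
    return 1
  fi
  aws elb register-instances-with-load-balancer \
    --load-balancer-name "$lb_name" \
    --instances "$instance_id"
}

# In User Data this would be called once at boot, for example:
# register_with_elb "my-loadbalancer"
```

For an ALB the equivalent CLI call is `aws elbv2 register-targets` against a target group ARN rather than a load balancer name.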
  • 8. Challenge 2 – EIP Attachment
    Q: How do we associate an Elastic IP Address to a Spot Fleet Instance?
    A: Use EC2 User Data.
        aws ec2 allocate-address --domain vpc;
        aws ec2 associate-address --instance-id $instance_id --allocation-id $eip_allocation_id;
    This will be required for services that need direct connectivity to the Internet such as NAT hosts and proxy servers. But it’s a bit more complicated…
    An alternative is to enable automatic public IP addressing, but this is a VPC-wide setting. This use case has been raised with the service team and a feature request has been created.
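The two commands on this slide leave implicit how `$eip_allocation_id` gets its value. One way to wire them together is sketched below, assuming the instance role permits both EC2 calls; the function name `associate_new_eip` is hypothetical.

```shell
# Sketch: allocate a fresh Elastic IP and attach it to the given instance.
associate_new_eip() {
  # $1: ID of the instance that should receive the address.
  local instance_id="$1"
  local allocation_id
  # --query/--output text reduce the JSON response to just the allocation ID.
  allocation_id="$(aws ec2 allocate-address --domain vpc \
    --query 'AllocationId' --output text)"
  if [ -z "$allocation_id" ]; then
    echo "address allocation failed" >&2
    return 1
  fi
  aws ec2 associate-address \
    --instance-id "$instance_id" \
    --allocation-id "$allocation_id"
}

# Example call from User Data (instance ID looked up from metadata beforehand):
# associate_new_eip "$instance_id"
```

An address allocated this way stays allocated to the account after the Spot Instance is terminated until it is explicitly released, so a real implementation would also need to release or reuse addresses.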
  • 9. Challenge 3 – De-registration from an ELB
    Q: How do we de-register a terminated Spot Instance from an ELB/ALB?
    A: Run a script that polls the instance metadata for a termination time.
        while true; do
          # termination-time only returns a timestamp once the instance is marked for termination
          if curl -s http://169.254.169.254/latest/meta-data/spot/termination-time | grep -q '.*T.*Z'; then
            instance_id=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
            aws elb deregister-instances-from-load-balancer --load-balancer-name my-load-balancer --instances "$instance_id"
            break  # de-register once, then stop polling
          else
            sleep 5
          fi
        done
    We need a mechanism to ensure that ELB traffic is not routed through to an instance that’s about to be terminated. This is a well-documented process.
  • 10. Challenge 3 – De-registration from an ELB
    A (v2.0): Use a Lambda function to deregister the instance upon termination.
    The previous solution does not cater for instances that are manually terminated. Also, it cannot be tested. Finally, we can “think bigger” and build a solution that applies to ALL instances, On-Demand or Spot.
    When an EC2 instance is terminated, an EC2 Instance State-change Notification event is raised in CloudWatch Events. When an EC2 instance changes state, a Lambda function can be executed. This function would first check that the instance was terminated, then de-register the instance. Alternatively, you can use a scheduled CloudWatch Event to routinely initiate the Lambda function.
    Q: But how do we ensure that requests don’t get directed to an instance that’s about to be terminated?
    A: Utilise existing health check functionality within the ELB/ALB. When our termination notice is posted to our Spot Instance, poison the health check!
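Poisoning the health check can be as simple as removing whatever the ELB/ALB health check fetches. The sketch below assumes the health check targets a static file served by the web server; the path in `HEALTH_FILE` and the function names are hypothetical, and the metadata endpoint is assumed to answer plain requests as in the earlier slide.

```shell
# Sketch: fail the load balancer health check once a termination notice appears.
METADATA_URL="http://169.254.169.254/latest/meta-data"
# Hypothetical file that the ELB/ALB health check requests.
HEALTH_FILE="${HEALTH_FILE:-/var/www/html/healthcheck.html}"

termination_pending() {
  # The endpoint returns a bare timestamp (e.g. 2017-09-05T01:02:03Z) only
  # once the instance is marked for termination; anything else means "not yet".
  curl -s "${METADATA_URL}/spot/termination-time" | grep -q '^[0-9-]*T[0-9:]*Z$'
}

poison_health_check() {
  # With the file gone, the health check starts failing and the load
  # balancer takes the instance out of service.
  rm -f "$HEALTH_FILE"
}

watch_for_termination() {
  until termination_pending; do
    sleep 5
  done
  poison_health_check
}

# Typically started in the background from User Data:
# watch_for_termination &
```

How quickly traffic stops depends on the health check interval and unhealthy threshold configured on the load balancer, so these need to fit comfortably inside the two-minute warning.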
  • 11. Challenge 4 – Spot Price Variability
    Q: What happens if the Spot market price goes up beyond our bid price?
    A: Handle the outage or run On-Demand instances in parallel.
    This needs to be considered if we are to have any guarantee of service, especially for production environments. Deploying diversified Spot Fleets helps greatly here, but there is still a chance that ALL markets in the fleet could be outbid… Running On-Demand instances in parallel is the typical answer. But surely there is a better way?
  • 12. Challenge 4 – Spot Price Variability
    A (v2.0): Pre-empt the market.
    Each Spot Fleet request has an associated CloudWatch metric called “EligibleInstancePoolCount”, which counts the pools from which the fleet request could be fulfilled. We can configure a CloudWatch Alarm that triggers when the number of eligible pools drops below a certain threshold, say 2 pools.
    Our On-Demand instances running in parallel can now be replaced with an Auto Scaling group that typically has no instances running. When the alarm triggers, a Lambda function is invoked to manipulate the Auto Scaling group configuration and provision On-Demand instances.
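The alarm and the scale-out call can be sketched with the AWS CLI. This is an illustration rather than the team's Lambda code: it assumes the Spot Fleet metrics are published in the `AWS/EC2Spot` namespace with a `FleetRequestId` dimension, that the alarm action is an ARN (for example an SNS topic) that ends up invoking the Lambda function, and the function names, alarm name and one-minute period are hypothetical choices.

```shell
# Sketch: alarm when the number of eligible Spot pools falls below a threshold.
create_pool_alarm() {
  local fleet_request_id="$1"   # Spot Fleet request ID
  local min_pools="$2"          # alarm when fewer pools than this are eligible
  local action_arn="$3"         # ARN notified when the alarm fires
  aws cloudwatch put-metric-alarm \
    --alarm-name "spot-fleet-low-pools-${fleet_request_id}" \
    --namespace "AWS/EC2Spot" \
    --metric-name "EligibleInstancePoolCount" \
    --dimensions "Name=FleetRequestId,Value=${fleet_request_id}" \
    --statistic Minimum \
    --period 60 \
    --evaluation-periods 1 \
    --threshold "$min_pools" \
    --comparison-operator LessThanThreshold \
    --alarm-actions "$action_arn"
}

# Sketch: the call a ModifyOnDemandCapacity-style function would make to
# bring up On-Demand instances in the normally empty Auto Scaling group.
scale_on_demand() {
  local asg_name="$1"
  local desired="$2"
  aws autoscaling set-desired-capacity \
    --auto-scaling-group-name "$asg_name" \
    --desired-capacity "$desired"
}
```

The desired capacity has to stay within the group's configured minimum and maximum size, so the otherwise empty group needs a maximum large enough to cover the workload.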
  • 13. Challenge 5 – T2 Instances
    Q: What happens if my workload runs on T2 Instance types?
    A: Use the larger instance types.
    There are currently no Spot markets for T2 instance types, meaning that workloads may have to run on m3.medium or larger instance types to take advantage of Spot. Deploying Spot Fleets across a diverse range of pools means that we don’t need to be constrained by a particular instance type. It’s very rare that a workload has adverse performance with more resources!
  • 14. Challenge 5 – T2 Instances
    Let’s look at an example. A t2.small instance type has an hourly rate of $0.032 and features a single (burstable) vCPU and 2 GiB of memory.
    Compare this with an m3.medium instance, which gives you a single (dedicated) vCPU and 3.75 GiB of memory. In all three AZs in the Sydney region, the Spot market price of an m3.medium instance type has not exceeded $0.020 over the last 3 months, and our average hourly rate in the most expensive AZ still ended up being $0.0128.
    More predictable performance and more memory at 40% of the cost!
  • 15. Challenge 5 – T2 Instances
    Now compare a t2.small with an m3.large instance. The m3.large instance type gives you two (dedicated) vCPUs and 7.5 GiB of memory. In all three AZs in the Sydney region, the Spot market price of an m3.large instance type has not exceeded $0.0318 over the last 3 months.
    We can further diversify our fleet to also utilize m3.large instances in the Spot market, and still make savings over what would have been charged if we were running t2.small instances!
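The comparisons on the two T2 slides are simple ratios against the t2.small rate of $0.032 per hour. A small helper (the name `percent_of` is hypothetical) makes the arithmetic explicit using the prices quoted above:

```shell
# Express a Spot price as a percentage of an On-Demand price.
percent_of() {
  # $1: Spot price per hour, $2: On-Demand price per hour
  awk -v spot="$1" -v od="$2" 'BEGIN { printf "%.1f\n", (spot / od) * 100 }'
}

percent_of 0.0128 0.032   # average m3.medium Spot rate vs t2.small -> 40.0
percent_of 0.020  0.032   # highest m3.medium Spot price vs t2.small -> 62.5
percent_of 0.0318 0.032   # highest m3.large Spot price vs t2.small  -> 99.4
```

So the "40% of the cost" figure comes from the average m3.medium rate, while even the highest m3.large price quoted stays just under the t2.small rate.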
  • 16. Challenge 6 – Automation
    Q: How do we automate all of this?
    A: In steps the automation team.
    This needs to be considered if we are to have any guarantee of service, especially for production environments. The advantage of using CloudWatch Events and Alarms as triggers for Lambda functions is that the Lambda functions should be able to be implemented in a single account and invoked by each stack.
    That said, the team have spent a LOT of time working through the complexities of building Spot Fleet requests into CloudFormation stacks. Development teams are still provided with the same baseline CloudFormation templates (utilizing ASGs) that have been provided in the past. However, a new tool has been written by the Automation team that takes these baseline templates and converts them for use with Spot Fleets.
    Given the recent launch of CloudFormation StackSets, we’re about to start looking at ways in which this can be further simplified.
  • 17. The Solution Going Forward: Deploy. Step 1: A deployment plan has been initiated. We start with a standard ASG-based template (ASG, ELB, baked AMI ID, Security Groups, Subnets, etc.).
  • 18. The Solution Going Forward: Deploy. Step 2: A Lambda function is used to convert this ASG template into a skeleton Spot Fleet resource request. [Diagram: Lambda Function (Conversion tool)]
  • 19. The Solution Going Forward: Deploy. Step 3: A Lambda function is triggered, generating a list of instance types similar to the one provided. It also calculates appropriate bid prices for those instance types. [Diagram: Lambda Function (Conversion tool); Lambda Function (InstanceTypes & Bids List)]
  • 20. The Solution Going Forward: Deploy. Step 4: The provided list is pushed to a third Lambda function, which dynamically creates the Spot Fleet template and uploads it to S3. [Diagram: Lambda Function (Conversion tool); Lambda Function (InstanceTypes & Bids List); Lambda Function (Dynamic Spot Fleet Template)]
  • 21. The Solution Going Forward: Deploy. Step 5: The application CloudFormation stack is created. [Diagram: Lambda Function (Conversion tool); Lambda Function (InstanceTypes & Bids List); Lambda Function (Dynamic Spot Fleet Template); CloudFormation (Application Stack)]
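The slides do not say how the bid prices in step 3 are calculated. Purely as an illustration, one plausible rule is to take the highest Spot price seen over a recent window and add a margin. The sketch below assumes that rule, the `Linux/UNIX` product description, and hypothetical function names; it is not the team's actual Lambda logic.

```shell
# Highest price in a whitespace-separated list of prices read from stdin.
max_price() {
  tr -s '[:space:]' '\n' |
    awk 'NF && ($1 + 0 > max) { max = $1 + 0 } END { printf "%.4f\n", max }'
}

# Highest Spot price recorded for one instance type since start_time
# (ISO 8601, e.g. 2017-06-05T00:00:00Z), across all Availability Zones.
max_spot_price_since() {
  local instance_type="$1"
  local start_time="$2"
  aws ec2 describe-spot-price-history \
    --instance-types "$instance_type" \
    --product-descriptions "Linux/UNIX" \
    --start-time "$start_time" \
    --query 'SpotPriceHistory[].SpotPrice' \
    --output text | max_price
}

# Assumed bid rule: historical maximum plus a safety margin in percent.
suggest_bid() {
  local max="$1"
  local margin_pct="$2"
  awk -v m="$max" -v p="$margin_pct" 'BEGIN { printf "%.4f\n", m * (1 + p / 100) }'
}

# Example: bid 10% above the highest m3.medium price since the start date.
# suggest_bid "$(max_spot_price_since m3.medium 2017-06-05T00:00:00Z)" 10
```

A higher margin keeps more pools eligible at the cost of a higher worst-case hourly rate, which is the trade-off behind the "set your bid price appropriately" recommendation.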
  • 22. The Solution Going Forward: Stack. Step 1: The CloudFormation stack provisions an ELB/ALB and a Spot Fleet Request is made. [Diagram: Spot Fleet; Elastic Load Balancer]
  • 23. The Solution Going Forward: Stack. Step 2: The Spot Fleet Request is fulfilled and Spot Instances register with their ELB via EC2 User Data.
  • 24. The Solution Going Forward: Stack. Step 3: If the market price for a Spot Instance exceeds the bid price, the instance is flagged for termination. The health check on the host is poisoned and the instance is marked as offline by the ELB.
  • 25. The Solution Going Forward: Stack. Step 4: After two minutes, the Spot Instance is terminated. A scheduled CloudWatch Event is triggered, which initiates a Lambda function that ensures that unhealthy instances are terminated and deregisters terminated instances from the ELB. [Diagram adds: CloudWatch Event (1 minute Scheduled Rule); Lambda Function (TerminateEC2Instance)]
  • 26. The Solution Going Forward: Stack. Step 5: A replacement Spot Instance is provisioned. Again, this Spot Instance registers itself with the ELB.
  • 27. The Solution Going Forward: Stack. Step 6: If the number of pools that Spot Fleet can fulfil instances from gets low…
  • 28. The Solution Going Forward: Stack. Step 7: … a CloudWatch Alarm will trigger a Lambda function that manipulates an On-Demand Auto Scaling group, which will commence provisioning On-Demand EC2 Instances to maintain capacity for the workload. [Diagram adds: On-Demand Fleet; CloudWatch Alarm (EligibleInstancePoolCount); Lambda Function (ModifyOnDemandCapacity)]
  • 29. The Solution Going Forward: Stack. Step 8: If a Spot Instance has not been able to be provisioned for more than 5 minutes, an On-Demand instance is also added. [Diagram adds: CloudWatch Alarm (Pending Capacity > 0 for > 5 min)]
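Step 4 of the stack walkthrough (the one-minute scheduled rule and the TerminateEC2Instance function) can be approximated with CLI calls. This is a sketch of the behaviour the slide describes, not the actual Lambda code: it assumes a Classic ELB, and the function names, rule name and target ID are hypothetical.

```shell
# Sketch: terminate and de-register every instance the ELB reports as unhealthy.
cleanup_unhealthy() {
  local lb_name="$1"
  local ids id
  ids="$(aws elb describe-instance-health \
    --load-balancer-name "$lb_name" \
    --query "InstanceStates[?State=='OutOfService'].InstanceId" \
    --output text)"
  for id in $ids; do
    # Text output can print "None" for an empty result; skip it.
    [ "$id" = "None" ] && continue
    aws ec2 terminate-instances --instance-ids "$id"
    aws elb deregister-instances-from-load-balancer \
      --load-balancer-name "$lb_name" --instances "$id"
  done
}

# Sketch: a one-minute scheduled rule that targets the cleanup Lambda function.
schedule_cleanup() {
  local rule_name="$1"
  local lambda_arn="$2"
  aws events put-rule \
    --name "$rule_name" \
    --schedule-expression "rate(1 minute)"
  aws events put-targets \
    --rule "$rule_name" \
    --targets "Id=cleanup-target,Arn=${lambda_arn}"
}
```

The rule also needs permission to invoke the function (a resource policy on the Lambda function), which is omitted here.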
  • 30. Development Recommendations
    Build stateless applications: If you can’t, persist state outside of the EC2 Instance using services such as DynamoDB, RDS, Aurora, ElastiCache, EFS or S3.
    Poison application health checks within the 2-minute warning period: Detect when a Spot Instance is scheduled for termination and cause the ELB/ALB to think that the workload is out of service. That server will then be removed from the pool of healthy servers, allowing your application to gracefully handle a termination event.
    Set your bid price appropriately: A bid price that is too low will introduce volatility into your workload and reduce the number of Spot markets that you can draw instances from.
  • 31. Reference Material
    • EC2 Spot Instances - http://aws.amazon.com/ec2/spot/
    • Spot Bid Advisor - http://aws.amazon.com/ec2/spot/bid-advisor/
    • Getting Started with Spot - http://aws.amazon.com/ec2/spot/getting-started/
    • Spot FAQs - http://aws.amazon.com/ec2/spot/faqs/
    • Spot Testimonials - http://aws.amazon.com/ec2/spot/testimonials/
    • Documentation: Using Spot Instances - http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-spot-instances.html
    • Documentation: Spot Fleet - http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-fleet.html
    Any Questions?