This document describes how Amazon EC2 Spot instances can help save up to 90% on batch processing costs by running jobs on instances available at lower prices. It explains how batch processing applications can take advantage of Spot instances for simulations, molecular modeling, video processing, and more, while maintaining reliability through SQS queues and S3 storage.
Save up to 90% in production environments with Spot instances
1. Save up to 90% in production
environments with Spot instances
Ivan Salazar, Enterprise Solutions Architect
2. AWS EC2 consumption models
On-Demand
Pay for compute capacity by the hour with no long-term commitments.
For variable workloads, or while you define your needs.
Reserved
Make a low one-time payment and receive a significant discount on the hourly charge.
For committed utilization.
Spot
Bid on spare capacity, charged at an auction price that fluctuates with supply and demand.
For workloads that are not time-sensitive.
3. What are Amazon EC2 Spot instances?
They are spare EC2 instances that you can bid on to run your applications in the cloud. Spot instances are available at lower prices than On-Demand instances, so you can significantly reduce the cost of running your applications, grow their capacity on the same budget, and enable new types of applications for the cloud.
4. With Spot the rules are simple
Markets where the price of compute changes based on supply and demand.
You will never pay more than your bid. When the market price exceeds your bid, you have two minutes to wrap up your work.
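The two rules above can be sketched as a small simulation: you are billed only the market price, and the instance is interrupted (after the two-minute warning) whenever the market rises above your bid. This is an illustrative sketch, not the actual EC2 billing logic, and the hourly prices are made up:

```python
def simulate_spot(bid, market_prices):
    """Simulate Spot billing for one market: each entry in market_prices is
    one hour's market price. You pay the market price while it stays at or
    below your bid; when it rises above the bid, the instance is interrupted."""
    paid = 0.0
    interruptions = 0
    running = True
    for price in market_prices:
        if price <= bid:
            paid += price           # rule 2: you pay the market price, never your bid
            running = True          # capacity reacquired once the market drops again
        else:
            if running:
                interruptions += 1  # market exceeded bid -> two-minute warning
            running = False
    return paid, interruptions

# Hypothetical week of hourly prices for one market (illustrative numbers only)
prices = [0.20, 0.21, 0.19, 0.45, 0.22, 0.20, 0.18]
cost, stops = simulate_spot(bid=0.40, market_prices=prices)
```

With this bid, the fourth hour's spike above $0.40 causes exactly one interruption, and the total cost is just the sum of the hours the instance ran.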
5. Why use Spot? Customer examples
39 years of pharmaceutical research processed, using more than 80,000 cores, in 9 hours for $4,232 USD
- A peak of approximately 87,000 compute cores
- An estimated 39 years of computational chemistry performed in 9 hours
- Three candidate compounds successfully identified
6. “By using AWS Spot instances we have managed to save 75% per month simply by changing four lines of code. It makes perfect sense for saving money when you are running continuous integration or pipeline workloads.” - Matthew Leventi, Lead Engineer, Lyft
Why use Spot? Customer examples
7. Show me the markets!
Each instance family, each instance size, and each Availability Zone, in every region, is a separate Spot market.
Example: C3 family Spot prices per Availability Zone vs. On-Demand:
Size | 1a    | 1b    | 1c    | On-Demand
8XL  | $0.50 | $0.27 | $0.29 | $1.76
4XL  | $0.21 | $0.30 | $0.16 | $0.88
2XL  | $0.08 | $0.07 | $0.08 | $0.44
XL   | $0.04 | $0.05 | $0.04 | $0.22
L    | $0.01 | $0.01 | $0.04 | $0.11
8. Bid price vs. market price
[Chart: bids at 25%, 50%, and 75% of the On-Demand price plotted against the fluctuating market price. In every case, you pay the market price.]
9. Spot Bid Advisor
1) We make bidding easy.
2) With deliberate pool selection and bidding, you will keep your instances as long as you need.
3) And with new features like the diversified Spot Fleet, we take care of the heavy lifting…
10. The EC2 Spot console
An easy-to-use interface that lets you launch spare instances in seconds.
It helps you select a bid for EC2 instances that meets your application's requirements.
An easy-to-use dashboard that lets you modify and manage your applications' compute capacity.
11. Spot fleet helps you
Launch thousands of Spot instances
With just one RequestSpotFleet call
Get the best price
Find capacity at the best price that works for you.
or
Diversify your resources
Diversify your fleet. Grow availability.
and
Apply a custom distribution
Create your own unit of capacity based on your application's needs
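One way to picture a diversified RequestSpotFleet call is as a single request whose config carries a target capacity, an allocation strategy, and one launch specification per instance type. A sketch of building such a config as a plain dict, with placeholder AMI, subnet, and role values; only the commented-out boto3 call at the end would actually touch AWS:

```python
def build_spot_fleet_config(ami_id, instance_types, subnet_id,
                            target_capacity, spot_price, iam_fleet_role):
    """Build a diversified Spot fleet request config: one launch spec per
    instance type, so the fleet can spread capacity across Spot markets."""
    return {
        "IamFleetRole": iam_fleet_role,
        "SpotPrice": spot_price,              # max bid per instance-hour
        "TargetCapacity": target_capacity,    # in capacity units (instances here)
        "AllocationStrategy": "diversified",  # spread across the listed pools
        "LaunchSpecifications": [
            {"ImageId": ami_id, "InstanceType": t, "SubnetId": subnet_id}
            for t in instance_types
        ],
    }

# Illustrative values only; the IDs and role ARN are placeholders
cfg = build_spot_fleet_config(
    ami_id="ami-12345678",
    instance_types=["c3.large", "m3.large", "m4.large", "r3.large", "c4.large"],
    subnet_id="subnet-12345678",
    target_capacity=1000,
    spot_price="0.10",
    iam_fleet_role="arn:aws:iam::123456789012:role/fleet-role",
)
# import boto3
# boto3.client("ec2").request_spot_fleet(SpotFleetRequestConfig=cfg)
```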
13. Diversify with EC2 Spot fleet
Multiple EC2 Spot instance types selected
Multiple Availability Zones selected
Choose instances with similar performance and characteristics, e.g. c3.large, m3.large, m4.large, r3.large, c4.large.
14. EC2 Spot Blocks
With just one additional parameter
Run continuously for up to 6 hours
Save up to 50% off the On-Demand price
15. What's in 6 hours?
~21% live less than 1 hour
~35% live less than 2 hours
~40% live less than 3 hours
In total, around 50% of all instances live less than 6 hours
19. Where to store session state?
For Web applications, in DynamoDB.
• Data is replicated across AZs.
You can also choose other databases to hold state in your architecture:
• Amazon RDS in Multi-AZ mode
• Amazon ElastiCache
20. Capitalizing on the 2-minute warning
When the market price exceeds your bid price, the instance receives a warning two minutes in advance.
Check for the two-minute notification every 5 seconds using a script invoked at instance launch.
21. Sample script
1) Check for the two-minute warning
2) If it EXISTS, remove the instance from the ELB
3) OTHERWISE, do nothing
4) Wait 5 seconds

while true; do
  # The termination-time key only returns a timestamp once the warning is active
  if curl -s http://169.254.169.254/latest/meta-data/spot/termination-time | grep -q ".*T.*Z"; then
    instance_id=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
    # Deregister this instance so the ELB stops sending it traffic
    aws elb deregister-instances-from-load-balancer \
      --load-balancer-name my-load-balancer \
      --instances "$instance_id"
    /env/bin/flushsessiontoDBonterminationscript.sh
    break
  fi
  sleep 5
done
22. Two Auto Scaling groups
• On-Demand + Reserved for regular usage
• Add an additional Auto Scaling group with Spot
Both groups behind the same Elastic Load Balancer.
Use the Bid Advisor to select the right instance type for your application.
If you are using Auto Scaling groups
23. Web application architecture with Spot
[Diagram: Elastic Load Balancing in front of stateless web servers across Availability Zones A and B. An On-Demand Auto Scaling group runs alongside Spot Auto Scaling groups of stateless web servers (Spot) in each zone, with session state data stored outside the instances.]
24. Web application architecture with Spot
[Diagram: the same architecture, with the groups labeled "On-Demand ASG" and "Spot ASG".]
27. Core nodes
[Diagram: a Hadoop cluster with a master node (master instance group) and a core instance group of HDFS nodes.]
You can add core nodes for:
More CPU
More memory
More HDFS space
28. Task nodes – the Spot opportunity
[Diagram: the Hadoop cluster with the master node, a core instance group running HDFS, and a task instance group without HDFS.]
No HDFS
Provides compute resources:
CPU
Memory
29. Task nodes – multiple instance types
[Diagram: the Hadoop cluster with the master node and core instance group.]
You can add and remove task nodes
c3.8xl, r3.8xl, r3.4xl, etc.
The opportunity
30. But what about HDFS?
[Diagram: the Hadoop cluster with the master node, CORE (HDFS) nodes, and task nodes.]
You can add and remove task nodes
cc2.8xl, r3.8xl, d2.4xl, etc.
Spot Blocks? EMR/S3?
31. EMRFS - Amazon S3 as HDFS
• No need to scale HDFS
  – Capacity
  – Replication for durability
• Amazon S3 scales along with your data
  – In IOPS and storage
  – Massively parallel
Spot Blocks for HDFS
• If the HDFS cluster lives for less than 6 hours
32. Hadoop on EC2 Spot – lessons learned
Your job:
Run task nodes separately with EC2 Spot fleet
Consider Spot Blocks for core/HDFS nodes
What EC2 Spot fleet does for you:
Saves you money
Heterogeneous instance management
Scale in the unit that matters to you
Accelerates results (time is money)
33. Batch processing with Amazon EC2 Spot
Batch-oriented applications can take advantage of on-demand processing using EC2 Spot to save up to 90%:
Monte Carlo simulation
Molecular modeling
Video processing
High-energy simulations
34. Batch processing with Amazon EC2 Spot
[Diagram: objects uploaded to an input S3 bucket trigger a Lambda function that puts jobs in SQS and DynamoDB. An EC2 instance worker fleet (an On-Demand Auto Scaling group plus Spot Auto Scaling groups 1 and 2, spread across Availability Zones A and B, sharing EFS) checks for jobs in the Job SQS queue, updates job status (start time, SLA end time, etc.) in DynamoDB, and writes results to an output S3 bucket. The Auto Scaling groups scale up based on queue depth and scale down based on CPU utilization CloudWatch metrics.]
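The S3-triggered step in the diagram above could look roughly like the sketch below: a handler that turns each uploaded object into a job record for DynamoDB and a message for SQS. The record shape, queue URL, and table name are assumptions for illustration, and the boto3 calls are left as comments so the pure logic stands alone:

```python
import json
import time

JOB_QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/jobs"  # placeholder
JOB_TABLE = "batch-jobs"  # placeholder table name

def make_job_record(bucket, key, now, sla_seconds=3600):
    """Build the job status record the workers will later update in DynamoDB."""
    return {
        "job_id": f"{bucket}/{key}",
        "status": "PENDING",
        "start_time": now,
        "sla_end_time": now + sla_seconds,
    }

def handler(event, context=None):
    """Sketch of a Lambda handler for S3 put events: one job per uploaded object."""
    jobs = []
    for rec in event["Records"]:
        bucket = rec["s3"]["bucket"]["name"]
        key = rec["s3"]["object"]["key"]
        job = make_job_record(bucket, key, now=int(time.time()))
        jobs.append(job)
        # boto3.resource("dynamodb").Table(JOB_TABLE).put_item(Item=job)
        # boto3.client("sqs").send_message(QueueUrl=JOB_QUEUE_URL,
        #                                  MessageBody=json.dumps(job))
    return jobs
```

Keeping the record-building logic pure makes it easy to unit test without touching AWS.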
35. About Lambda and SQS
AWS Lambda is a compute service that runs your code in response to events and automatically manages the compute resources for you, making it very easy to build applications that respond quickly to new information.
Amazon Simple Queue Service (SQS) is a fast, reliable, scalable, fully managed message queuing service for decoupling components.
Depending on the application's needs, multiple SQS queues may be required for different functions and priorities.
36. More automation?
Use a Lambda function to dynamically manage Auto Scaling groups based on the Spot market.
• The Lambda function can periodically invoke the EC2 Spot API to evaluate market prices and availability, and respond by creating new Launch Configurations and groups automatically.
• This function can also delete any Spot Auto Scaling group and Launch Configuration that has no instances.
AWS Data Pipeline can be used to invoke the Lambda function via the AWS CLI at regular intervals by scheduling pipelines.
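The price-evaluation step described above can be reduced to a small pure function: given current prices for a set of Spot pools and your bid, keep only the pools you could afford, cheapest first. The pool names and prices below are made up; a real function would fetch prices via the EC2 Spot price history API before deciding which groups to create:

```python
def eligible_pools(market_prices, bid):
    """Return (pool, price) pairs at or below the bid, cheapest first.
    market_prices maps a pool id (instance type + AZ) to its current price."""
    affordable = [(pool, price) for pool, price in market_prices.items()
                  if price <= bid]
    return sorted(affordable, key=lambda pair: pair[1])

# Hypothetical snapshot of a few Spot markets (illustrative numbers only)
prices = {
    "c3.large/us-east-1a": 0.021,
    "c3.large/us-east-1b": 0.055,
    "m3.large/us-east-1a": 0.018,
    "m3.large/us-east-1b": 0.047,
}
best = eligible_pools(prices, bid=0.05)
```

A scheduled Lambda could run a function like this, then create or delete Spot Auto Scaling groups to match the resulting list.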
37. Automated batch architecture with Spot
[Diagram: the same batch architecture as slide 34. Uploads to the input S3 bucket trigger a Lambda function that puts jobs in DynamoDB and SQS. Workers (an On-Demand Auto Scaling group plus Spot Auto Scaling groups 1 and 2, across Availability Zones A and B, sharing EFS) check the Job SQS queue for jobs, update job status (start time, SLA end time, etc.) in DynamoDB, and write results to an output S3 bucket. The Auto Scaling groups scale up based on queue depth and scale down based on CPU utilization CloudWatch metrics. Additionally, Data Pipeline invokes a Lambda function on a schedule to manage the Auto Scaling groups based on the Spot market.]
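The queue-depth scale-up rule in the diagram above can be sketched as a simple target-capacity calculation. The jobs-per-worker ratio and bounds are assumptions for illustration; a real setup would typically express this as a CloudWatch alarm on the SQS queue-depth metric driving an Auto Scaling policy:

```python
import math

def desired_workers(queue_depth, jobs_per_worker=10, min_workers=1, max_workers=100):
    """Pick a worker count so each worker has roughly jobs_per_worker
    messages to process, clamped to the group's min/max size."""
    wanted = math.ceil(queue_depth / jobs_per_worker)
    return max(min_workers, min(max_workers, wanted))
```

Scaling down on CPU utilization rather than queue depth, as the diagram notes, lets workers finish in-flight jobs before the group shrinks.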
Slide: AWS Purchase Models
As shown by the previous slide, it is possible to launch significant amounts of compute power for a low cost. Customers have several models available when using Amazon EC2.
- Cover the three pricing models on the slide
On demand is the easiest way to get started with AWS. No commitment, pay as you go.
Reserved instances provide a significant discount in exchange for a commitment to use the services for some period of time, either 1 or 3 years. Reserved instances also come with an actual capacity reservation, which can be important for large enterprises who need a high level of assurance that computing resources will be available when they are needed.
Spot instances are a unique and powerful pricing model, in particular for HPC. With Spot, customers can bid on unused AWS capacity and are often able to launch instances on the cloud for as little as 10% of the equivalent on-demand rate. The tradeoff with Spot is that if other customers are willing to pay more than you for the same AWS instance type, or capacity of that type becomes constrained, your running jobs may be terminated with only a two-minute warning. Jobs running on Spot therefore need to be fault-tolerant, or able to be restarted again at a later time.
Amazon EC2 Spot instances are spare EC2 instances that you can bid on to run your cloud computing applications. Since Spot instances are often available at a lower price, you can significantly reduce the cost of running your applications, grow your application’s compute capacity and throughput for the same budget, and enable new types of cloud computing applications.
Now, when I say spare capacity, it is important to understand what spare capacity looks like at scale.
AWS has more than a million active customers in 190 countries.
Amazon EC2 instance usage has increased 93% YoY, comparing Q4 2014 and Q4 2013, not including Amazon use.
Amazon S3 holds trillions of objects and regularly peaks at millions of requests per second.
So with EC2 Spot the rules are actually really simple.
Rule 1: The Spot market is where the price of compute fluctuates based on supply and demand.
Rule 2: You’ll never pay more than your bid, in fact you’ll only ever pay the market price. When the market price exceeds your bid you get 2 minutes to wrap up.
Market price is on average 85% lower than On-Demand prices
Lyft: saving $15K per month with 4 lines of code. After using Spot in CI/CD, Lyft recognized the stability of the platform, and the opportunity arose to leverage it as part of their Hadoop stack (run by Qubole). They've since been able to shift more than a third of their Qubole-managed Hadoop cluster onto EC2 Spot, saving even further.
What is in a market? This is one of the most important, and unfortunately misunderstood, elements of how the Spot market works. While we say "Spot market," there are actually hundreds of Spot markets available to all our customers. AWS has 11 (?) regions around the world; in each region there are multiple Availability Zones, multiple instance families, and multiple instance sizes per family. (START CLICK THROUGH and READ) E.g. c3; e.g. large, xlarge, 8xlarge; e.g. US-West-2a, US-West-2b; e.g. Dublin Region, Oregon Region, Sydney Region.
Now that we understand what a Spot market is and that there are many, I'll explain how we acquire the capacity. I'm going to pick just one market to highlight this. There are two numbers you care about with Spot.
Bid price. Think of this as the cap, the maximum you're willing to pay for a given instance per hour.
Market price. This is the price you pay. Market price is set by periodic auctions.
The r3.4xlarge costs $1.40 per hour under our On-Demand purchasing option.
See it in action via 3 bids: 25%, 50%, and 75%, in a single zone.
At 25% you kept your instance for almost 7 days, being impacted only during a few short periods. However, you only paid the market price, which was 86% off, just under 20c per hour during the last week, only 14% of the OD price.
At 50% you would have been interrupted just once, for a very short period of time during the sixth day. Your average discount during the week was 85%, just 21c per hour, paying just 15% of OD.
At 75% you would not have been interrupted even once, achieving an average discount of 85%, just 21c an hour, again paying just 15% of OD.
1st, check out the Spot Bid Advisor, which we launched earlier this year to guide customers in finding the resources, discount, and instance lifetime they need.
The Bid Advisor has helped many new customers discover what some already knew: that with deliberate instance pool selection, it can be straightforward to begin using Spot.
This is a snapshot I took from the tool last week, and it shows that even at a 50% max bid there were many different Spot markets that would have gone uninterrupted for over a week, while getting an average discount of 80-90%!
Now you might realize: wouldn't it be great if I could automate using all the pools that suit my application? Let's not get ahead of ourselves. First we need to understand: what is a Spot market?
Hopefully many of you have come across the EC2 Spot fleet API. This one weird API makes it easy to:
Launch 1, 2, or 3,000 Spot instances with one API call.
You can select whether you'd like to put your capacity into the single cheapest market,
or opt to diversify to minimize the impact of any individual Spot market.
Finally, by introducing weights you can now scale based on the metric that matters most to you. It might be cores, memory, instances, latency… it's your call.
We will first run through the 'best practices' for EC2 Spot. While these are not strictly necessary, they're what the most sophisticated customers do to get high performance, high availability, and low costs.
Standard practice
Stateless
Fault tolerant
Multi-AZ
SOA/Loosely coupled design
Spot Practice
Be instance flexible
This can mean c3.large, c3.xlarge,..r3.large
Or m3.large, r3.large, c3.large (ELB)
No seriously, your application can work with other instance types (use an example; drive this message home hard).
You use c3.xlarge and you can't use c3.2xlarge at all? Really? Really? Even if we give you 70% off for twice the c3.xlarge specs?
Some additional considerations I’ll cover briefly.
Options for shifting state off web/app servers
Load balancing a fault tolerant application with ELB
Capitalizing on the Two Minute Warning
I won’t spend long explaining how you can shift state away from your web servers, as there are many different ways. While there are many options (Cassandra, Redis, or traditional database technologies such as MySQL), DynamoDB is a common choice for users to maximize performance and minimize management overhead while replicating across multiple Availability Zones. This last piece, using multiple AZs, is why Spot customers love DynamoDB: you can use any AZ to launch your Spot nodes and still get low-latency access to DynamoDB. Also, because DynamoDB is a regional service, there is no data transfer charge.
Cross-zone load balancing - Cross-zone load balancing reduces the need to maintain equivalent numbers of back-end instances in each Availability Zone, and improves your application's ability to handle the loss of one or more back-end instances. However, we still recommend that you maintain approximately equivalent numbers of instances in each Availability Zone for higher fault tolerance.
Because there is a 2 minute warning on Spot we recommend establishing a timeout of 90 seconds for connection draining.
You must manually attach Spot instances with user data in the current iteration of fleet.
We’ve already architected the application to be resilient to instance termination. However, while we might have minimized the impact of an instance termination, we can use the two-minute warning to take it a step further. As I mentioned, we can capitalize on the two-minute warning by detaching the instance from an ELB set to drain connections. To do that, we recommend checking the instance metadata regularly, about every 5 seconds, for the two-minute warning. Then…
Here is a simple sample of what some customers will bake into their AMI or bootstrap actions. This small script checks for an instance termination notice (a 404 will be returned if you aren't in the two-minute warning) and then detaches the instance from the ELB if that two-minute warning is active.
If you’re currently using ASGs and would like to scale your capacity using Spot, you’re not alone. There are many reasons to use Spot as part of a broader ASG strategy including RIs and OD: e.g. to scale to meet peak load, or to address a marginal cost vs. benefit equation, such as when you have 2 servers that strain under heavy load but the load doesn't warrant the cost of a 3rd running on-demand.
It is actually as easy as a tick box in the console during setup, but first we check the EC2 Spot Bid Advisor, then select our bid price. I've selected the EU-West-1 region and placed a bid at the on-demand price. You can see the 3 markets (one per AZ) for c3.large are approximately 83% cheaper than OD. You can also copy another ASG config and simply add 'spot' to it.
A method for approximating complex mathematical expressions.
Batch has long been in the wheelhouse for Spot usage. Customers have been using Spot for:
Monte Carlo simulations in risk analytics for insurance and finserv (Ufora)
Molecular modeling (Novartis)
Media rendering: animation and FX rendering, and batch image processing pipelines (FinDesign)
High energy simulations (Brookhaven)
They’ve found it valuable to accelerate processing and results, to run simulations that are otherwise cost prohibitive, to train algorithms at the lowest possible price, and to achieve the scale they need. For example, an engineer running electromagnetic simulations could run larger numbers of parametric sweeps than would otherwise be practical, by using very large numbers of Amazon EC2 Spot Instances (and/or OD instances) and using automation to launch independent and parallel simulation jobs.