Optimize Content Processing in the Cloud with GPU and Spot Instances
1. Optimize Content Processing in the Cloud with GPU and Spot Instances
Chad Schmutzer | Solutions Architect – EC2 Spot
Amazon Web Services
2. What are we going to do today?
… build a transcoding pipeline with GPUs
… learn about EC2 Spot
… while saving up to 90% on your EC2 bill
… using AWS CloudFormation, in about 10 minutes
3. AWS EC2 Consumption Models
On-Demand: Pay for compute capacity by the hour with no long-term commitments. For spiky workloads, or to define needs.
Reserved: Make a low, one-time payment and receive a significant discount on the hourly charge. For committed utilization.
Spot: Bid for unused capacity, charged at a Spot Price which fluctuates based on supply and demand. For time-insensitive or transient workloads.
4. Spare capacity at scale
AWS has more than a million active customers in 190 countries.
Amazon EC2 instance usage has increased 93% YoY, comparing Q4 2014 and Q4 2013, not including Amazon use.
7. Show me the markets!
C3 Spot prices by instance size and Availability Zone, with the On-Demand price for comparison:
Size   1b      1c      1a      On-Demand
8XL    $0.27   $0.29   $0.50   $1.76
4XL    $0.30   $0.16   $0.21   $0.88
2XL    $0.07   $0.08   $0.08   $0.44
XL     $0.05   $0.04   $0.04   $0.22
L      $0.01   $0.04   $0.01   $0.11
Each instance family
Each instance size
Each Availability Zone
In every region
Is a separate Spot Market
9. Amazon EC2 Spot – in the wild
1) We make this easy using the Spot bid advisor
2) With deliberate pool selection and bidding, you will keep your Spot instance as long as you need to.
3) And with new features like Spot fleet diversified we do the heavy lifting for you...
13. Why use Spot – customer examples
39 years of drug research re-processed, using over 80,000 cores, in 9 hours for $4,232
- Approximately 87,000 compute cores at peak
- Estimated 39 years of computational chemistry performed in 9 hours
- Three candidate compounds successfully identified
14. Why use Spot – customer examples
“By using AWS Spot instances, we've been able to save 75% a month simply by changing four lines of code. It makes perfect sense for saving money when you're running continuous integration workloads or pipeline processing.” - Matthew Leventi, Lead Engineer, Lyft
15. Why use Spot – customer examples
The $9 Billion Experiment
16. Why use Spot – customer examples
Scaling up as many as 1000 Spot instances a day to handle real time ad delivery
Petabyte-Scale Data Pipelines with Docker, Luigi and Elastic Spot Instances
17. Why use Spot – customer examples
A large scale POC for animation rendering on AWS:
•Cloud Rendering at Walt Disney Animation Studios (available on SlideShare)
•Automated environment leveraging Spot Fleet
•Launched 40K cores in 20 min at less than $0.02 per core-hour
19. Spot fleet helps you
Launch Thousands of Spot Instances
with one RequestSpotFleet call.
Get Best Price
Find the lowest priced horsepower that works for you.
or
Get Diversified Resources
Diversify your fleet. Grow your availability.
And
Apply Custom Weighting
Create your own capacity unit based on your application needs
20. Diversification with EC2 Spot fleet
Multiple EC2 Spot instances selected
Multiple Availability Zones selected
Pick the instances with similar performance characteristics, e.g. c3.large, m3.large, m4.large, r3.large, c4.large.
21. Results - Grid
Requested 1000 vCores over 30 days
Minimum 960 vCores
Mode 1024 vCores
Average 1012 vCores
Average Price of $0.012 per vCore
Savings of over 80%
22. Walt Disney Animation Studios
Core Count
./aws_spot_fleet_request -p reinvent --cpu 8 --ram 64 -m 4.7 -c 1500
25. EC2 Spot Console
An easy to use interface that lets you launch spare EC2 instances in seconds
Helps you select and bid on the EC2 instances that meet your application’s requirements
Simple to use dashboard lets you modify and manage your application’s compute capacity
26. EC2 Spot block
Using a single additional parameter
Run continuously for up to 6 hours
Save up to 50% off On-Demand pricing
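The slide does not name the "single additional parameter". The sketch below assumes it is `BlockDurationMinutes` on the EC2 `RequestSpotInstances` API, which takes a multiple of 60 up to 360. The helper name, the AMI ID and the instance type are hypothetical placeholders, and the function only builds the request parameters rather than calling AWS.

```python
def spot_block_request(instance_type, image_id, hours):
    """Build RequestSpotInstances parameters for a fixed-duration Spot block."""
    if not isinstance(hours, int) or not 1 <= hours <= 6:
        raise ValueError("Spot blocks run for a whole number of hours, 1 to 6")
    return {
        "InstanceCount": 1,
        # Assumed to be the one additional parameter the slide refers to.
        "BlockDurationMinutes": hours * 60,
        "LaunchSpecification": {
            "ImageId": image_id,          # placeholder AMI ID
            "InstanceType": instance_type,
        },
    }


params = spot_block_request("c3.large", "ami-EXAMPLE", hours=6)
print(params["BlockDurationMinutes"])  # 360
```

With boto3 installed and credentials configured, the dict could be passed as `boto3.client("ec2").request_spot_instances(**params)`; that call is left out so the sketch runs anywhere.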
27. What’s in 6 hours?
~ 21% less than 1 hour
~ 35% less than 2 hours
~ 40% less than 3 hours
In total roughly 50% of all instances live less than 6 hours
28. Capitalizing on two minute warning
When the Spot price exceeds your bid price, the instance will receive a two-minute warning
Check for the 2 minute spot instance termination notification every 5 seconds leveraging a script invoked at instance launch
29. Sample script – two minutes left!
1) Check for 2 minute warning
2) If YES, detach instance from ELB
3) OTHERWISE, do nothing
4) Sleep for 5 seconds
$ if curl -s http://169.254.169.254/latest/meta-data/spot/termination-time | grep -q '.*T.*Z'; then
    # Termination notice present: deregister from the ELB, then flush sessions.
    instance_id=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
    aws elb deregister-instances-from-load-balancer \
      --load-balancer-name my-load-balancer \
      --instances "$instance_id"
    /env/bin/flushsessiontoDBonterminationscript.sh
  fi
31. Batch Processing with Amazon EC2 Spot
Batch oriented applications can leverage on-demand processing using EC2 Spot to save up to 90% cost:
Monte Carlo simulation
Molecular modeling
Media processing
High energy simulations
33. Scaling to 50,000 cores
EC2 Spot fleet to set up a heterogeneous, scalable “grid” of EC2 Spot instances with multiple capacity pools as worker nodes
EC2 Spot blocks for less flexible jobs that must run continuously.
35. Disney Animation Renderfarm
[Diagram: renderfarms, each with an Avere FXT cluster, at the WDAS Data Center and two remote data centers (sites labeled San Francisco, Los Angeles and Burbank), plus Storage and Artists, connected by redundant 10Gb links.]
36. Disney Animation Renderfarm
[Diagram: the same renderfarm layout as the previous slide, extended with a virtual private cloud in Oregon labeled Spot Instances, Avere vFXT and EFS, connected over a 10Gb primary link with a 1Gb backup.]
37. Live/VOD 360 OTT on EC2 Spot
[Architecture diagram. Labels: Mez, Archival, Backup Origin, Primary Origin, Ingest Bucket, S3 Events, SQS Queue, Source Encoder, SPOT or On-Demand, G2, M4, Diversified SPOT Fleet, Egress Bucket, Edge Cache Fleet, Failover, ALB, CloudFront, Viewers, Direct Connect. Callouts: Multi Tenancy, Multi CDN, Container Encoding, Full OTT CMS / DRM, GPU / CPU.]
38. Media Pipeline
Ingest, Store, Transform, Process
PUSH OR PULL MEZ, LIVE & VOD
CREATE A CENTRALIZED CONTENT LAKE ON S3
MEDIA DELIVERY AND/OR HANDS-ON POSTPRODUCTION
SCALE OUT ON ELASTIC CAPACITY FOR ALL PROCESSING
Content production and post-production companies are leveraging AWS to accelerate and streamline creative, editing, compositing and streaming delivery workloads with highly scalable cloud computing and storage.
Slide: AWS Purchase Models
As shown by the previous slide, it is possible to launch significant amounts of compute power for a low cost. Customers have several models available when using Amazon EC2.
- Cover the three pricing models on the slide
On demand is the easiest way to get started with AWS. No commitment, pay as you go.
Reserved instances provide a significant discount in exchange for a commitment to use the services for some period of time, either 1 or 3 years. Reserved instances also come with an actual capacity reservation, which can be important for large enterprises who need a high level of assurance that computing resources will be available when they are needed.
Spot instances are a unique and powerful pricing model, in particular for HPC. With Spot, customers can bid on unused AWS capacity and are often able to launch instances on the cloud for as little as 10% of the equivalent on-demand rate. The tradeoff for Spot is that if other customers are willing to pay more than you for the same AWS instance type, or capacity of that type becomes constrained, your running jobs may be terminated with only a two-minute warning. Jobs running on Spot therefore need to be fault-tolerant, or able to be restarted at a later time.
What spare capacity looks like at scale.
AWS has more than a million active customers in 190 countries.
Amazon EC2 instance usage has increased 93% YoY, comparing Q4 2014 and Q4 2013, not including Amazon use.
Amazon S3 holds trillions of objects and regularly peaks at millions of requests per second.
So with EC2 Spot the rules are actually really simple.
Rule 1: The Spot market is where the price of compute fluctuates based on supply and demand.
Rule 2: You’ll never pay more than your bid; in fact you’ll only ever pay the market price. When the market price exceeds your bid you get 2 minutes to wrap up.
Market price is on average 85% lower than On-Demand prices.
What is in a market? This is one of the most important, and unfortunately misunderstood, elements of how the Spot market works. While we say Spot market, there are actually hundreds of Spot markets available to all our customers. AWS has 11 (?) regions around the world; in each region there are multiple Availability Zones, multiple instance families, and multiple instance sizes per family. (START CLICK THROUGH and READ.) E.g. family: c3; size: large, xlarge, 8xlarge; Availability Zone: US-West-2a, US-West-2b; region: Dublin, Oregon, Sydney.
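To make the counting concrete, here is a minimal sketch that treats each combination of Availability Zone, instance family and instance size as its own Spot market. The zone, family and size lists are small illustrative samples based on the examples in the notes, not a full catalogue of what AWS offers.

```python
from itertools import product

# Illustrative sample only, not AWS's real catalogue.
ZONES_BY_REGION = {
    "us-west-2": ["us-west-2a", "us-west-2b"],   # Oregon
    "eu-west-1": ["eu-west-1a", "eu-west-1b"],   # Dublin
}
FAMILIES = ["c3", "r3"]
SIZES = ["large", "xlarge", "8xlarge"]


def spot_markets(zones_by_region, families, sizes):
    """Return one (zone, instance_type) pair per separate Spot market."""
    markets = []
    for zones in zones_by_region.values():
        for zone, family, size in product(zones, families, sizes):
            markets.append((zone, f"{family}.{size}"))
    return markets


markets = spot_markets(ZONES_BY_REGION, FAMILIES, SIZES)
print(len(markets))  # 4 zones x 2 families x 3 sizes = 24
```

Even this tiny sample yields 24 separately priced markets, which is why the real number runs into the hundreds.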
Now that we understand what a Spot market is, and that there are many, I’ll explain how we acquire the capacity. I’m going to pick just one market to highlight this. There are two numbers you care about with Spot.
Bid price. Think of this as the cap, the maximum you’re willing to pay for a given instance per hour.
Market price. This is the price you pay. Market price is set by periodic auctions.
The r3.4xlarge costs $1.40 per hour under our On-Demand purchasing option.
See it in action via 3 bids: 25%, 50%, 75% of the On-Demand price. Single zone.
At 25% you kept your instance for almost 7 days, being impacted during a few short periods. However, you only paid the market price, which was 86% off: just less than 20c per hour during the last week, only 14% of the OD price.
At 50% you would have been interrupted just once, for a very short period of time during the sixth day. Your average discount during the week is 85%: just 21c per hour, paying just 15% of OD.
At 75% you would not once have been interrupted, achieving an average discount of 85%: just 21c an hour, again paying just 15% of OD.
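The interplay of bid and market price can be sketched as a small simulation. The hourly price series below is made up for illustration; only the $1.40 On-Demand price comes from the notes. The model is deliberately simplified: an hour in which the market price exceeds the bid counts as interrupted, and every other hour is billed at the market price rather than the bid.

```python
def simulate_bid(market_prices, bid):
    """Apply the two Spot rules to a series of hourly market prices.

    Returns (hours_run, hours_interrupted, total_cost).
    """
    hours_run = 0
    hours_interrupted = 0
    total_cost = 0.0
    for price in market_prices:
        if price > bid:
            # Market price exceeds the bid: the instance does not run this hour.
            hours_interrupted += 1
        else:
            # You pay the market price, never the bid.
            hours_run += 1
            total_cost += price
    return hours_run, hours_interrupted, total_cost


ON_DEMAND = 1.40  # r3.4xlarge On-Demand price quoted in the notes

# Hypothetical hourly market prices for one pool (made up for illustration).
prices = [0.20, 0.21, 0.19, 0.40, 0.22, 0.80, 0.20, 0.21]

for fraction in (0.25, 0.50, 0.75):
    ran, interrupted, cost = simulate_bid(prices, ON_DEMAND * fraction)
    print(f"bid {fraction:.0%} of OD: ran {ran}h, interrupted {interrupted}h, "
          f"avg ${cost / ran:.3f}/h")
```

A higher bid buys fewer interruptions, and the average price paid rises only because the extra hours that run are the more expensive ones; the bid itself is never charged.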
1st - Check out the Spot Bid Advisor, which we launched earlier this year to guide customers in finding the resources, discount and instance lifetime they need.
The bid advisor has helped many new customers discover what some already knew: that with deliberate instance pool selection it can be straightforward to begin using Spot.
Take this snapshot I took from the tool last week. It shows that even at a 50% max bid there are many different Spot markets that would have gone uninterrupted for over a week, while getting an average discount of 80-90%!
Now you might ask, wouldn't it be great if I could automate using all the pools that suit my application? Let's not get ahead of ourselves. First we need to understand, what is a Spot market?
We will first run through what the ‘best practices’ are for EC2. While these are not necessary, they’re what the most sophisticated customers do to get high performance, high availability and low costs.
Standard practice
Stateless
Fault tolerant
Multi-AZ
SOA/Loosely coupled design
Spot Practice
Be instance flexible
This can mean c3.large, c3.xlarge,..r3.large
Or m3.large, r3.large, c3.large (ELB)
No seriously, your application can work with other instances (use example, drive this message home hard).
You use c3.xlarge and you can’t AT all use c3.2xlarge? Really? Really? Even if we give you 70% off for twice the c3.xlarge specs?
Lyft: Savings of $15K per month with 4 lines of code. After using Spot in CI/CD, Lyft recognized the stability of the platform, and the opportunity arose to leverage it as part of their Hadoop stack (run by Qubole). They’ve since been able to shift more than a third of their Qubole-managed Hadoop cluster onto EC2 Spot, saving even further.
Brookhaven Labs: the ATLAS experiment needed instances to live for as much as 24 hours in order to add value. Some software simply cannot checkpoint. They needed the equivalent of 50,000 physical cores to meet the demand for resources from 1,500 scientific researchers. It takes a trillion proton collisions in the collider to produce evidence of a single Higgs boson particle’s decay. Over 5 days, less than 1% of instances were terminated, leaving them with a significant margin of safety. Instead of building a 50,000-core data center they were able to successfully use AWS Spot for 5 days and pay just $45,000.
ATLAS - The experiment is designed to take advantage of the unprecedented energy available at the LHC and observe phenomena that involve highly massive particles which were not observable using earlier lower-energy accelerators. It is hoped that it will shed light on new theories of particle physics beyond the Standard Model.
Spot is a powerful economic reward for fault tolerant, cloud first architectures. How powerful? Examples
Novartis: 39 years of drug research re-processed, using over 80,000 cores, in 9 hours for $4,232.
Lyft: Savings $15K per month with 4 lines of code
Adroll: Petabyte-Scale Data Pipelines with Docker, Luigi and Elastic Spot Instances
Hopefully many of you have come across the EC2 Spot fleet API. This one weird API makes it easy to:
Launch 1, 2 or 3,000 Spot instances with one API call.
You can select whether you’d like to put your capacity into the single cheapest market,
or opt to diversify to minimize the impact of any individual Spot market.
Finally, by introducing weights you can now scale based on the metric that matters most to you. It might be cores, memory, instances, latency. It is your call.
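As a rough illustration of those three options, the sketch below builds a `SpotFleetRequestConfig` dictionary of the shape accepted by the EC2 `RequestSpotFleet` API. The helper name, AMI ID, subnet IDs and IAM role ARN are hypothetical placeholders, and the weights assume the capacity unit is one vCPU. It only constructs the request; it does not call AWS.

```python
def build_spot_fleet_config(instance_weights, subnet_ids, target_capacity,
                            max_price, diversified=True):
    """Build a SpotFleetRequestConfig dict with one launch spec per pool."""
    launch_specs = []
    for instance_type, weight in instance_weights.items():
        for subnet_id in subnet_ids:
            launch_specs.append({
                "ImageId": "ami-EXAMPLE",            # placeholder AMI ID
                "InstanceType": instance_type,
                "SubnetId": subnet_id,               # one subnet per Availability Zone
                "WeightedCapacity": float(weight),   # capacity units this type provides
            })
    return {
        # Placeholder role ARN; a real request needs a role the fleet can assume.
        "IamFleetRole": "arn:aws:iam::123456789012:role/example-fleet-role",
        "AllocationStrategy": "diversified" if diversified else "lowestPrice",
        "TargetCapacity": target_capacity,           # expressed in capacity units
        "SpotPrice": str(max_price),                 # maximum price you will pay
        "LaunchSpecifications": launch_specs,
    }


# Weights chosen as vCPU counts, so TargetCapacity reads as "1000 vCPUs".
config = build_spot_fleet_config(
    instance_weights={"c3.2xlarge": 8, "c3.4xlarge": 16, "r3.2xlarge": 8},
    subnet_ids=["subnet-EXAMPLE-a", "subnet-EXAMPLE-b"],
    target_capacity=1000,
    max_price=0.05,
)
print(config["AllocationStrategy"], len(config["LaunchSpecifications"]))
```

With boto3 and credentials in place, the dict could be passed as `boto3.client("ec2").request_spot_fleet(SpotFleetRequestConfig=config)`. Passing `diversified=False` selects the lowest-price strategy instead.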
Why does it have to be different from a normal ASG? As we’ve discussed, there are multiple independent markets available in Spot. These markets are NOT correlated. Customers have for a long time followed a diversification strategy for time-sensitive, mission-critical workloads. With Spot fleet we’ve made it easy to do this at scale. E.g. if you can use the 5 instance types above across 2 Availability Zones, you have 10 pools, so any one price fluctuation will only impact 1/10 of your capacity, or 10%. Much like an index fund.
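The diversification arithmetic is simple enough to sketch: if capacity is spread evenly over n pools, where a pool is one instance type in one Availability Zone, a price spike in a single pool touches 1/n of the fleet. The zone names below are assumed for illustration, and the even spread is a simplification.

```python
def share_per_pool(instance_types, availability_zones):
    """Fraction of capacity held in each pool when spread evenly."""
    pools = [(itype, az) for itype in instance_types for az in availability_zones]
    return 1 / len(pools)


four_types = ["c3.large", "m3.large", "r3.large", "c4.large"]
five_types = four_types + ["m4.large"]
zones = ["us-west-2a", "us-west-2b"]  # assumed zone names

print(f"{share_per_pool(four_types, zones):.1%}")  # 8 pools  -> 12.5%
print(f"{share_per_pool(five_types, zones):.1%}")  # 10 pools -> 10.0%
```

Each extra instance type or zone the application can tolerate shrinks the share of capacity exposed to any one market.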
1000 vCores, at an average saving of 80% off On-Demand. While some capacity fluctuated we had our desired capacity of 1000 for over 98% of the time. During the 30 days we were never more than 4%, or 40 cores below our desired capacity while maintaining an average of 1012 cores.
Instances used - c3.2xlarge c3.4xlarge c3.8xlarge cc2.8xlarge cr1.8xlarge r3.2xlarge r3.4xlarge r3.8xlarge in All AZs
Just 5 months ago we launched fleet and have continued the AWS trend of rapid innovation based on customer feedback. We’ve launched 4 major features for Spot fleet over the last 5 months and we’re nowhere near finished. We’ve also made it so easy!
We’ve already architected the application to be resilient to instance termination. However, while we might have minimized the impact of an instance termination, we can use the two-minute warning to take it a step further. As I mentioned, we can capitalize on the two-minute warning by detaching the instance from an ELB set to drain connections. To do that we recommend checking the instance metadata regularly, about every 5 seconds, for the two-minute warning.
Here is a simple sample of what some customers will bake into their AMI or bootstrap actions. This small script checks for an instance termination notice (a 404 will be returned if you aren’t in the two-minute warning), then detaches the instance from the ELB if that two-minute warning is active.
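The check-then-sleep cycle described here can be sketched as a small polling loop. This is one possible arrangement, not the script from the slide: the function and parameter names are hypothetical, and the metadata lookup is injected as `fetch` so the loop can be exercised without a real instance. On an instance, `fetch` would request `http://169.254.169.254/latest/meta-data/spot/termination-time` and return `None` on a 404, and `on_notice` would deregister the instance from the ELB.

```python
import re
import time

# Shape of the timestamp returned once a termination notice exists,
# e.g. 2015-01-05T18:02:00Z.
TERMINATION_TIME = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z")


def wait_for_termination_notice(fetch, on_notice, interval=5.0,
                                sleep=time.sleep, max_checks=None):
    """Poll `fetch` until it returns a termination timestamp, then call `on_notice`.

    `fetch` returns the body of the termination-time metadata path, or None
    when there is no notice yet. Returns the timestamp, or None if
    `max_checks` polls pass without a notice.
    """
    checks = 0
    while max_checks is None or checks < max_checks:
        body = fetch()
        checks += 1
        if body and TERMINATION_TIME.fullmatch(body.strip()):
            on_notice(body.strip())
            return body.strip()
        sleep(interval)
    return None


# Demo with a fake metadata endpoint: two "no notice" answers, then a notice.
responses = iter([None, None, "2015-01-05T18:02:00Z"])
handled = []
result = wait_for_termination_notice(
    fetch=lambda: next(responses),
    on_notice=handled.append,
    sleep=lambda seconds: None,  # skip real sleeping in the demo
)
print(result)
```

Keeping `fetch` and `on_notice` as parameters means the same loop can drain an ELB, flush sessions or checkpoint a job, depending on what the application needs in its two minutes.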
Batch has long been in the wheelhouse for Spot usage. Customers have been using Spot in:
Monte Carlo simulations in risk analytics for insurance and finserv (Ufora)
Molecular modeling (Novartis)
Media rendering: animation and FX rendering, and batch image processing pipelines (FinDesign)
High energy simulations (Brookhaven)
They’ve found it valuable to accelerate processing and results, to run simulations that are otherwise cost prohibitive, to train algorithms at the lowest possible price, and to achieve the scale they need. For example, an engineer running electromagnetic simulations could run larger numbers of parametric sweeps than would otherwise be practical, by using very large numbers of Amazon EC2 Spot Instances (and/or OD instances), and using automation to launch independent and parallel simulation jobs.
There are numerous batch oriented applications in place today that can leverage this style of on-demand processing, including claims processing, large scale transformation, media processing and multi-part data processing work. There are many different architectures for batch processing, so while the components here are certainly useful as a guide, there are lots of different approaches. However, at a high level there are some common methods using AWS services.
- these are common processes for content production across pp/p/finishing/etc
Some additional considerations I’ll cover briefly.
Options for shifting state off web/app servers
Load balancing a fault tolerant application with ELB
Capitalizing on the Two Minute Warning
I mentioned Novartis at the beginning. Back in 2013, they ran a project that involved virtually screening 10 million compounds against a common cancer target in less than a week. They calculated that it would take 50,000 cores and close to a $40 million investment if they wanted to run the experiment internally. Partnering with Cycle Computing and Amazon Web Services (AWS), Novartis built a platform leveraging Amazon Simple Storage Service (Amazon S3), Amazon Elastic Block Store (Amazon EBS), and four Availability Zones. The project ran across 10,600 Spot Instances (approximately 87,000 compute cores) and allowed Novartis to conduct 39 years of computational chemistry in 9 hours for a cost of $4,232. Out of the 10 million compounds screened, three were successfully identified.
Schrodinger, in their quest for better solar power through computational chemistry, stood up a 156,314-core cluster. The estimated computation time to process the 205,000 organic compounds was 264 years, but the run completed in 18 hours. They achieved 1.21 petaFLOPS (Rpeak) for just $33,000, or 16¢ per molecule.
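The headline figures in these two examples can be cross-checked with a little arithmetic. The sketch below assumes that "years of computation" means single-core-equivalent compute time and uses 365.25-day years; both are my assumptions, so treat the per-hour results as rough orders of magnitude rather than billed rates.

```python
HOURS_PER_YEAR = 365.25 * 24  # 8766 hours, an assumed convention


def cost_per_compute_hour(total_cost, compute_years):
    """Dollars per hour of single-core-equivalent computation."""
    return total_cost / (compute_years * HOURS_PER_YEAR)


def speedup(compute_years, wall_clock_hours):
    """How many times faster than running the work serially."""
    return compute_years * HOURS_PER_YEAR / wall_clock_hours


novartis = cost_per_compute_hour(4232, 39)
schrodinger = cost_per_compute_hour(33000, 264)
per_molecule = 33000 / 205000

print(f"Novartis: ${novartis:.4f} per compute-hour, {speedup(39, 9):,.0f}x")
print(f"Schrodinger: ${schrodinger:.4f} per compute-hour, {speedup(264, 18):,.0f}x")
print(f"Schrodinger: ${per_molecule:.2f} per molecule")
```

Under these assumptions both runs come out at a little over a cent per compute-hour, and the per-molecule figure reproduces the 16¢ quoted in the notes.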