This document provides an overview and agenda for a workshop on deploying a deep learning framework on Amazon ECS and Spot Instances. The workshop will:
- Introduce MXNet, an open source deep learning framework, and how it can be used to define, train, and deploy neural networks.
- Discuss containers and how they can increase infrastructure utilization and make it easy to deploy diverse applications on shared hardware.
- Provide an overview of Amazon ECS for managing Docker containers, Amazon ECR for storing container images, and Spot Instances for running containers on unused EC2 capacity.
- Include hands-on labs to set up the environment, build an MXNet Docker image,
2. What to Expect From This Workshop
• Introduce MXNet
• Containers
• Overview of Amazon ECS + Amazon ECR
• Overview of AWS CloudFormation
• Overview of Amazon EC2 Spot Fleet / Spot Instances
• Hands-on workshop
3. • Open source, deep learning framework -
https://github.com/dmlc/mxnet
• Define, train, and deploy deep neural
networks
• Highly scalable – single/multiple hosts,
CPU/GPU support
• Support for multiple languages
What Is MXNet?
OutputInput
4. Why Containers?
• Increase infrastructure utilization
• Environment isolation and fidelity
• Run diverse applications on shared hardware
• Changes are tracked
• Easy to deploy
6. Amazon ECS Benefits
Cluster management
made easy
Flexible scheduling Integrated and
extensible
Security Performance at scale
7. Amazon ECS Architecture
Docker
Task
Container instance
Amazon
ECS
Container
ECS agent
ELB
Internet
ELB
User /
Scheduler
API
Cluster Management Engine
Task
Container
Docker
Task
Container instance
Container
ECS agent
Task
Container
Docker
Task
Container instance
Container
ECS agent
Task
Container
AZ 1 AZ 2
Key/Value Store
Agent Communication Service
8. What Is Amazon ECR?
Amazon EC2 Container Registry (Amazon ECR) is a fully-
managed Docker container registry that makes it easy for
developers to store, manage, and deploy Docker container
images.
Amazon ECR is integrated with Amazon EC2 Container
Service (Amazon ECS), simplifying your development to
production workflow.
Learn more: https://aws.amazon.com/ecr/
9. How Does Amazon ECS Use Amazon ECR?
Amazon
ECR
Subnet
Amazon
ECS
Spot Instance
14. Why Do Customers Use CloudFormation?
Developers/DevOps teams value CloudFormation for its ability to treat
infrastructure as code, allowing them to apply software engineering principles,
such as SOA, revision control, code reviews, integration testing to
infrastructure.
IT Admins and MSPs value CloudFormation as a platform to enable
standardization, managed consumption, and role-specialization.
ISVs value CloudFormation for its ability to support scaling out of multi-tenant
SaaS products by quickly replicating or updating stacks. ISVs also value
CloudFormation as a way to package and deploy their software in their
customer accounts on AWS.
16. Choosing the Right Amazon EC2 Instance
EC2 Instance types are optimized for different use cases & come in
multiple sizes. This allows you to optimally scale resources to your
workload requirements.
AWS utilizes Intel® Xeon® processors for EC2 Instances providing
customers with high performance and value.
Consider the following when choosing your instances: Core count,
Memory size, Storage size & type, Network performance, & CPU
technologies.
Hurry Up & Go Idle - A larger compute instance can save you time and
money, therefore paying more per hour for a shorter amount of time
can be less expensive.
17. Get the Intel® Advantage
Intel’s latest 22nm Haswell microarchitecture on new C4 instances,
with custom Intel® Xeon® v3 processors, provides new features:
Haswell microarchitecture has better branch prediction; greater
efficiency at prefetching instructions and data; along with other
improvements that can boost existing applications’ performance by
30% or more.
P state and C state control provides the ability to individually tune each
cores performance and sleep states to improve application
performance.
Intel® AVX2.0 instructions can double the floating-point performance for
compute-intensive workloads over Intel® AVX, and provide additional
instructions useful for compression and encryption.
18. Intel® Processor Technologies
Intel® AVX – Get dramatically better performance for highly
parallel HPC workloads such as life science engineering, data
mining, financial analysis, or other technical computing
applications. AVX also enhances image, video, and audio
processing.
Intel® AES-NI – Enhance your security with these new
encryption instructions that reduce the performance penalty
associated with encrypting/decrypting data.
Intel® Turbo Boost Technology – Get more computing power
when you need it with performance that adapts to spikes in your
workload with Intel® Turbo Boost Technology 2.0
21. On-Demand
Pay for compute
capacity by the hour
with no long-term
commitments
For spiky workloads,
or to define needs
Amazon EC2 Consumption Models
Reserved
Make a low, one-time
payment and receive
a significant discount
on the hourly charge
For committed
utilization
Spot
Bid for unused
capacity, charged at a
Spot Price that
fluctuates based on
supply and demand
For time-insensitive
or transient
workloads
22. With Spot, the Rules Are Simple
Markets where the price of
compute changes based on
supply and demand.
You’ll never pay more than your
bid. When the market exceeds
your bid, you get 2 minutes to
wrap up your work.
23. $0.27 $0.29$0.50
1b 1c1a
8XL
$0.30 $0.16$0.214XL
$0.07 $0.08$0.082XL
$0.05 $0.04$0.04XL
$0.01 $0.04$0.01L
C4
$1.76
On
Demand
$0.88
$0.44
$.22
$0.11
Show Me the Markets!
Each instance family
Each instance size
Each Availability Zone
In every region
Is a separate Spot Market
25. Spot Fleet Helps You…
Launch thousands of Spot Instances
with one single API call
Get the best price
Find the lowest-priced horsepower that works for you
Or
Get diversified resources
Diversify your fleet and grow your availability
Apply custom weighting
Create your own capacity unit based on your application
needs
26. Diversification with Amazon EC2 Spot Fleet
Multiple Spot Instances selected
Multiple Availability Zones
selected
Pick the instances with similar
performance characteristics, e.g.,
c3.large, m3.large, m4.large,
r3.large, c4.large
28. Overall Architecture
Amazon
S3
AWS CLI
Public subnet – AZ #1 Public subnet – AZ #2
Amazon
ECS
Spot Instance Spot Instance
Spot Fleet
>_ SSH
Amazon
ECR
AWS Management
Console
RunTask
29. Amazon
S3
Public subnet – AZ #1 Public subnet – AZ #2
Amazon
ECS
Spot Instance Spot Instance
Spot Fleet
Amazon
ECR
AWS
CloudFormation
Lab 1: Set up the Workshop Environment
36. Related Sessions
• Deep Dive into Apache MXNet on AWS (BDA401)
• An Introduction to Amazon Rekognition, for Deep
Learning-based Computer Vision (BDA301)
• Getting the Most Bang for Your Buck with #EC2
#Winning (SRV301)
• Save 90% on Your Containerized Workload (DEM07)