We’ll briefly touch on each of the technologies we’re using in today’s workshop and why they’re important. We’ll keep it brief so you can focus on the hands-on labs.
Flexible:
Supports both imperative and symbolic programming
Portable:
Runs on CPUs or GPUs, on clusters, servers, desktops, or mobile phones
Multiple Languages:
Supports multiple languages, including C++, Python, R, Scala, Julia, MATLAB, and JavaScript, all with the same amazing performance.
Auto-Differentiation:
Calculates the gradient automatically for training a model
Distributed on Cloud:
Supports distributed training on multiple CPU/GPU machines, including AWS, GCE, Azure, and Yarn clusters
Performance:
Optimized C++ backend engine parallelizes both I/O and computation
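To make the auto-differentiation bullet concrete, here is a minimal sketch of the underlying idea in plain Python, using forward-mode dual numbers. This is not MXNet's actual API (MXNet has its own autograd machinery); it only illustrates what "calculates the gradient automatically" means.

```python
# Illustrative sketch of automatic differentiation with forward-mode
# "dual numbers" -- NOT MXNet's API, just the core idea behind it.
class Dual:
    def __init__(self, value, deriv=0.0):
        self.value = value   # f(x)
        self.deriv = deriv   # df/dx, propagated by the chain rule

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule: (u * v)' = u' * v + u * v'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__

def gradient(f, x):
    """Evaluate f at x and return df/dx, computed automatically."""
    return f(Dual(x, 1.0)).deriv

# d/dx (x^2 + 3x) at x = 2 is 2*2 + 3 = 7
print(gradient(lambda x: x * x + 3 * x, 2.0))  # 7.0
```

A framework like MXNet does the same bookkeeping (usually in reverse mode) over entire neural networks, which is why you never have to derive gradients by hand when training a model.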
Amazon EC2 Container Service is a highly scalable, high performance container management service that supports Docker containers and allows you to easily run distributed applications on a managed cluster of Amazon EC2 instances. Amazon EC2 Container Service lets you launch and stop container-enabled applications with simple API calls, allows you to query the state of your cluster from a centralized service, and gives you access to many familiar Amazon EC2 features like security groups, EBS volumes and IAM roles. You can use EC2 Container Service to schedule the placement of containers across your cluster based on your resource needs, isolation policies, and availability requirements. Amazon EC2 Container Service eliminates the need for you to operate your own cluster management and configuration management systems or worry about scaling your management infrastructure.
Cluster Management Made Easy:
Amazon EC2 Container Service allows you to make containers a foundational building block for your applications. The service eliminates the need for you to run and manage a cluster manager or configuration management system by providing you programmatic access to the full state of your cluster and allowing you to schedule containers based on your application requirements.
Flexible Scheduling:
Amazon EC2 Container Service schedules containers to help find a balance between your resource needs and availability requirements. The service provides you complete cluster state information allowing you to integrate your own custom scheduler as well as open source schedulers to meet your specific business and application requirements.
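As an illustration of the "bring your own scheduler" point, here is a toy first-fit placement function. The instance and task shapes are invented for this example; a real custom scheduler would read cluster state through the ECS APIs rather than from hard-coded dicts.

```python
# Toy first-fit scheduler over ECS-style cluster state.
# The dict shapes are made up for illustration; a real scheduler would
# pull instance state from the ECS APIs before making a decision.
def place_task(instances, task):
    """Return the id of the first instance that fits the task, or None."""
    for inst in instances:
        if (inst["cpu_free"] >= task["cpu"] and
                inst["mem_free"] >= task["mem"]):
            inst["cpu_free"] -= task["cpu"]   # reserve the resources
            inst["mem_free"] -= task["mem"]
            return inst["id"]
    return None  # no instance has enough headroom

cluster = [
    {"id": "i-1", "cpu_free": 512,  "mem_free": 900},
    {"id": "i-2", "cpu_free": 2048, "mem_free": 4096},
]
print(place_task(cluster, {"cpu": 1024, "mem": 2048}))  # i-2
print(place_task(cluster, {"cpu": 256,  "mem": 512}))   # i-1
```

Because ECS exposes the full cluster state, logic like this (or an off-the-shelf scheduler such as Mesos) can make the placement decision while ECS handles the actual launches.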
Blog post about Mesos integration with ECS
https://aws.amazon.com/blogs/compute/cluster-management-with-amazon-ecs/
Integrated and extensible:
Amazon EC2 Container Service can easily be integrated or extended through simple APIs. Amazon EC2 Container Service provides complete visibility into and control over your AWS resources, so you can easily integrate and use your own container scheduler or connect EC2 Container Service into your existing software delivery process (e.g., continuous integration and delivery systems). List of partners that provide integration with ECS/ECR: Weave (Service Discovery/Monitoring); Datadog and Ruxit (Monitoring); CircleCI, Cloudbees, Codeship, Shippable, Solano Labs, Wercker (Continuous Integration/Deployment), Twistlock (Security), Convox (PaaS).
Security:
Amazon EC2 Container Service launches your containers on your own EC2 instances, so that you do not share resources with other customers, places your clusters in a VPC, and allows you to use your own VPC security groups and network ACLs. These features provide you a high level of isolation and help you use EC2 Container Service to build secure and reliable applications.
You own the EC2 instances that your containers run on.
Amazon EC2 works in conjunction with Amazon VPC to provide security and robust networking functionality for your compute resources.
Your compute instances are located in a Virtual Private Cloud (VPC) with an IP range that you specify. You decide which instances are exposed to the Internet and which remain private.
Security groups and network ACLs allow you to control inbound and outbound network access to and from your instances.
You can connect your existing IT infrastructure to resources in your VPC using industry-standard encrypted IPsec VPN connections.
You can provision your EC2 resources as Dedicated Instances. Dedicated Instances are Amazon EC2 Instances that run on hardware dedicated to a single customer for additional isolation.
Performance at Scale:
Amazon EC2 Container Service is built on technology developed from many years of experience running highly scalable services. Using EC2 Container Service you can launch clusters with thousands of instances and schedule tens of thousands of containers in seconds.
To sum up, ECS reduces the amount of code you need to go from idea to implementation when building distributed systems.
So, rather than having Mesos or other cluster management software manage a set of machines directly, ECS manages your instances for you.
Much of the undifferentiated heavy lifting and housekeeping has been abstracted behind a set of APIs.
The ability to run multiple tasks on a shared pool of resources can also lead to higher utilization and faster task completion than if compute resources are statically partitioned.
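A toy calculation of that utilization point, with made-up numbers: six one-hour tasks split unevenly between two teams finish sooner on a shared pool than on statically partitioned workers.

```python
# Toy illustration: 6 one-hour tasks (5 for team A, 1 for team B)
# finish sooner on a shared pool of 4 workers than on two statically
# partitioned pools of 2 workers each. Numbers are invented.
import math

def makespan(num_tasks, workers):
    # hours until all tasks finish, assuming each task takes 1 hour
    return math.ceil(num_tasks / workers)

static = max(makespan(5, 2), makespan(1, 2))  # 2 dedicated workers per team
shared = makespan(5 + 1, 4)                   # one shared pool of 4 workers
print(static, shared)  # 3 2
```

Team B's idle capacity is wasted in the static split; the shared pool absorbs team A's backlog, which is exactly the effect described above.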
Common use cases include running applications and services or batch processes
CI/CD: Docker is a great image versioning system. Code on a laptop is the same as in the test and prod environments. You can easily spin up large testing environments on shared servers. Developers can share app containers with the operations team, and the operations team can share base images (OS + security patches) and utility containers (logging, Redis, MySQL, etc.) with the developers. This allows easy configuration and easy deployments.
Containers also provide great application isolation. You can use containers for microservices by breaking apart your application into independent processes. For example, you can have separate containers for your webserver, application server, and message queue. Containers are ideal for running single tasks or processes, so you can use containers as the base unit for a task when scaling up and scaling down. Each task of your application can be upgraded independently of the other tasks because there are no library conflicts.
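The webserver / application server / message queue split described above can be sketched as an ECS-style task definition. The field names follow the ECS task-definition schema, but this is a trimmed-down illustration with hypothetical image names, not a complete, registerable definition.

```python
# Simplified sketch of an ECS task definition: one container per
# process, mirroring the microservices split described above.
# Field names follow the ECS task-definition schema, but this is a
# trimmed illustration; "myorg/app:1.0" is a hypothetical image.
import json

task_definition = {
    "family": "webapp",
    "containerDefinitions": [
        {"name": "webserver", "image": "nginx:latest",  "cpu": 256, "memory": 512},
        {"name": "appserver", "image": "myorg/app:1.0", "cpu": 512, "memory": 1024},
        {"name": "queue",     "image": "redis:latest",  "cpu": 256, "memory": 512},
    ],
}

# Each container can be upgraded independently by registering a new
# revision of the task definition -- no shared libraries to conflict.
print(json.dumps(task_definition, indent=2))
```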
The speed of containers and the ability to deploy them in mixed environments allow them to also be easily used to run ETL and batch jobs. You can schedule batch jobs when compute resources free up on your cluster.
ECS enables you to run your applications, services, and batch jobs at scale
This slide lists the key benefits of ECR. The following slides in the presentation go into more detail on each benefit.
Fully Managed:
Amazon EC2 Container Registry eliminates the need to operate and scale the infrastructure required to power your container registry. There is no software to install and manage or infrastructure to scale. Just push your container images to Amazon ECR and pull the images when you need to deploy.
No registry software to install and manage
Hundreds of concurrent pulls
Secure:
Amazon EC2 Container Registry transfers your container images over HTTPS and automatically encrypts your images at rest. You can configure policies to manage permissions and control access to your images using AWS Identity and Access Management (IAM) users and roles without having to manage credentials directly on your EC2 instances.
IAM resource-based policies
Transfer via HTTPS
Image encryption at rest
Highly Available:
Amazon EC2 Container Registry has a highly scalable, redundant, and durable architecture. Your container images are highly available and accessible, allowing you to reliably deploy new containers for your applications.
Backed by Amazon S3
Simplified Workflow:
Amazon EC2 Container Registry integrates with Amazon ECS and the Docker CLI, allowing you to simplify your development and production workflows. You can easily push your container images to Amazon ECR using the Docker CLI from your development machine, and Amazon ECS can pull them directly for production deployments.
Tight integration with Amazon ECS
Use Docker CLI commands (e.g., pull, push, list, tag)
AWS CloudFormation gives developers and systems administrators an easy way to create and manage a collection of related AWS resources, provisioning and updating them in an orderly and predictable fashion.
You can use AWS CloudFormation’s sample templates or create your own templates to describe the AWS resources, and any associated dependencies or runtime parameters, required to run your application. You don’t need to figure out the order for provisioning AWS services or the subtleties of making those dependencies work. CloudFormation takes care of this for you. After the AWS resources are deployed, you can modify and update them in a controlled and predictable way, in effect applying version control to your AWS infrastructure the same way you do with your software.
You can deploy and update a template and its associated collection of resources (called a stack) by using the AWS Management Console, AWS Command Line Interface, or APIs. CloudFormation is available at no additional charge, and you pay only for the AWS resources needed to run your applications.
So if you look at the components behind CloudFormation, it starts off with a template.
This is a JSON- or YAML-formatted file that handles things like parameter definitions, which make a template user-driven, such as the name of my database.
It deals with resource creation, so the creation of AWS components such as EC2 instances or RDS databases.
And it deals with the configuration actions I wish to apply to those resources, which might be installing software or creating an SQS queue, for example.
Then that template is deployed into the CloudFormation framework, and the framework handles what we call stack creation, updates, and any error detection and rollback required in the creation of a stack.
So a stack is a collection of resources that you want to manage together, and the resulting artifact is what we call a stack of configured AWS services. This could be an Elastic Load Balancer, an Auto Scaling group with EC2 instances, and an RDS database.
The stack is service-event aware: stack creation actions, or changes to that environment, can be fed back into CloudFormation and trigger actions within the CloudFormation template.
And it is also customizable, so once you have created a stack you can of course access the underlying resources and change or modify them as you wish.
Now the error detection and rollback is an interesting point. If at any time during stack creation a problem is detected, the default behaviour of CloudFormation is to roll back the creation of all resources and put you back in a consistent, known state. So you always know whether your stack is working or has been rolled back.
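A minimal sketch of the template pieces just described: a user-driven parameter feeding a resource. It is shown as a Python dict for readability (you would normally author this directly in JSON or YAML), and a real AWS::RDS::DBInstance needs more properties (credentials, engine version, etc.) than shown here.

```python
# Minimal sketch of a CloudFormation template showing the pieces just
# described: a user-driven Parameter and a Resource that consumes it.
# Shown as a Python dict for readability; a real RDS resource needs
# additional required properties (credentials, etc.) omitted here.
import json

template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Parameters": {
        "DBName": {
            "Type": "String",
            "Default": "workshopdb",
            "Description": "Name of my database (a user-driven parameter)",
        }
    },
    "Resources": {
        "Database": {
            "Type": "AWS::RDS::DBInstance",
            "Properties": {
                "DBName": {"Ref": "DBName"},  # resolved at stack creation
                "Engine": "mysql",
                "DBInstanceClass": "db.t2.micro",
                "AllocatedStorage": "20",
            },
        }
    },
}

print(json.dumps(template, indent=2))
```

Deploying this through CloudFormation would create the stack, and any failure would trigger the default rollback behaviour described above.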
Developers/DevOps teams value CloudFormation for its ability to treat infrastructure as code, allowing them to apply software engineering principles, such as SOA, revision control, code reviews, and integration testing, to infrastructure.
IT Admins and MSPs value CloudFormation as a platform to enable standardization, managed consumption, and role-specialization.
ISVs value CloudFormation for its ability to support scaling out of multi-tenant SaaS products by quickly replicating or updating stacks. ISVs also value CloudFormation as a way to package and deploy their software in their customer accounts on AWS.
Slide: AWS Purchase Models
As shown by the previous slide, it is possible to launch significant amounts of compute power for a low cost. Customers have several models available when using Amazon EC2.
- Cover the three pricing models on the slide
On demand is the easiest way to get started with AWS. No commitment, pay as you go.
Reserved instances provide a significant discount in exchange for a commitment to use the services for some period of time, either 1 or 3 years. Reserved instances also come with an actual capacity reservation, which can be important for large enterprises who need a high level of assurance that computing resources will be available when they are needed.
Spot instances are a unique and powerful pricing model, in particular for HPC. With Spot, customers can bid on unused AWS capacity and are often able to launch instances on the cloud for as little as 10% of the equivalent on-demand rate. The tradeoff for Spot is if other customers are willing to pay more than you for the same AWS instance type, or capacity of that type becomes constrained, your running jobs may be terminated without warning. Jobs running on Spot therefore need to be fault-tolerant, or able to be restarted again at a later time.
So with EC2 Spot the rules are actually really simple.
Rule 1: The Spot market is where the price of compute fluctuates based on supply and demand.
Rule 2: You’ll never pay more than your bid; in fact, you’ll only ever pay the market price. When the market price exceeds your bid, you get 2 minutes to wrap up.
Market price is on average 85% lower than On-Demand prices
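The two rules can be sketched as a toy billing function (the prices here are made up for illustration):

```python
# Toy illustration of the two Spot rules above: you pay the market
# price while it is at or below your bid; when it rises above your
# bid you get a two-minute warning and the instance is reclaimed.
def spot_billing(bid, market_prices):
    """Return (hours run, total paid) given hourly market prices."""
    hours, paid = 0, 0.0
    for price in market_prices:
        if price > bid:          # outbid: 2-minute warning, then reclaimed
            break
        hours += 1
        paid += price            # you pay the market price, not your bid
    return hours, round(paid, 4)

# Bid $0.10/hr; three cheap hours, then the market spikes above the bid.
print(spot_billing(0.10, [0.03, 0.04, 0.05, 0.12]))  # (3, 0.12)
```

Note the instance ran for 3 hours at a total of $0.12, well under what 3 On-Demand hours would have cost.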
What is in a market? This is one of the most important, and unfortunately misunderstood, elements of how the Spot market works. While we say "Spot market," there are actually hundreds of Spot markets available to all our customers. AWS has 11 (?) regions around the world; in each region there are multiple Availability Zones, and there are multiple instance families with multiple instance sizes per family. (START CLICK THROUGH and READ) E.g. c3; e.g. large, xlarge, 8xlarge; e.g. US-West-2a, US-West-2b; e.g. Dublin Region, Oregon Region, Sydney Region.
Hopefully many of you have come across the EC2 Spot fleet API. This one weird API makes it easy to:
Launch 1, 2, or 3,000 Spot instances with one API call
You can select whether you’d like to put your capacity into the single cheapest market,
Or opt to diversify to minimize the impact of any individual Spot market
Finally, by introducing weights you can now scale based on the metric that matters most to you. It might be cores, memory, instances, latency... It is your call.
Why does it have to be different from a normal ASG? As we’ve discussed, there are multiple independent markets available in Spot. These markets are NOT correlated. Customers have for a long time followed a diversification strategy for time-sensitive, mission-critical workloads. With fleet you can scale. With Spot fleet we’ve made it easy. E.g., if we can use the 5 instance types above across 2 Availability Zones, any one price fluctuation will only impact 1/10 of our capacity, or 10%. Much like an index fund.
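The diversification math can be sketched as follows; the specific counts are illustrative:

```python
# Back-of-the-envelope diversification math for Spot fleet: spreading
# capacity evenly across independent markets (instance type x AZ)
# bounds the impact of any single price spike. Counts are illustrative.
def impact_of_one_market(instance_types, availability_zones):
    markets = instance_types * availability_zones
    return 1.0 / markets  # fraction of capacity in each market

# e.g. 5 instance types across 2 AZs -> 10 markets,
# so one price spike touches at most 10% of the fleet's capacity
print(impact_of_one_market(5, 2))  # 0.1
```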
Here’s what you’ll be working with today. We’re going to take you on a journey starting with the MXNet library on GitHub. You’ll containerize MXNet and enable your users to interact with MXNet in a variety of ways, so you can cater to different teams, skill sets, and levels of access. For example, you might want to give your data science team access to Jupyter, which is a web-based interactive notebook that allows users to step through code. You’ll experiment with this in Lab 4. Devs with a higher level of access might want to attach to running containers to develop and run scripts on a running MXNet container. Or perhaps you want to bundle MXNet tasks and use the ECS RunTask capability to further abstract user access. But first…next slide
Let’s set up the foundation, which we’ll use CloudFormation to help with. Next slide…
After the image has been created, you could run the image locally on the instance to test. Instead…next slide
We’ll use ECS to deploy the container to a running host. Once that’s done…next slide
You can now run through examples of training and prediction.
Now that everything is working, you’ll wrap the image classification training and prediction in an ECS Task that you can call with RunTask.