This webinar will provide an overview of the AWS High Performance Computing (HPC) tool CfnCluster. We will cover the basics of what CfnCluster is and how it can help with the migration of traditional HPC applications to the cloud. The webinar will also provide guidance on how to install and configure CfnCluster in a way that will allow you to scale to thousands of cores in just a few minutes on AWS.
2. Webinar Highlights
• What is CfnCluster and when to use it
• Architecture guidance to fit your security models
• How to install and configure CfnCluster
• Demo: review of CfnCluster and managing compute at scale
3. Introduction to CfnCluster
• AWS CloudFormation + Cluster = CfnCluster
• Simple to install, easy to manage
• Everything you need to get a cluster up and running in minutes
• Head node with scheduler
• Shared NFS Storage
• /home
• /shared
• OpenMPI
• Compute nodes that grow and shrink on demand
4. Workloads Well Suited for CfnCluster
• Computational Fluid Dynamics
• Semiconductor Design
• Weather Modeling
• Genomics and Molecular Simulation
• Seismic and reservoir simulations
• 3D rendering and visualizations
• … anything that uses a traditional HPC scheduler
5. Cluster HPC and Grid HPC
Cluster HPC: tightly coupled, latency-sensitive applications. Use larger EC2 compute instances, placement groups, and Enhanced Networking.
Grid HPC: loosely coupled, pleasingly parallel workloads that require very little node-to-node interaction.
Grids of Clusters: use a grid strategy on the cloud to run a group of parallel, individually clustered HPC jobs.
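For the tightly coupled case, the relevant knobs live in the cluster section of the CfnCluster config file. A minimal sketch, assuming the documented compute_instance_type, placement_group, and placement parameters (the section name "default" is illustrative):

```ini
[cluster default]
# Larger compute instances with Enhanced Networking for latency-sensitive MPI jobs
compute_instance_type = c4.8xlarge
# DYNAMIC asks CfnCluster to create and manage a placement group for the cluster
placement_group = DYNAMIC
# "cluster" puts both master and compute nodes into the placement group
placement = cluster
```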
6. Computational Fluid Dynamics
ANSYS Fluent
• Amazon EC2 c4.8xlarge instances
• 140M cells
• F1 car CFD benchmark
http://www.ansys-blog.com/simulation-on-the-cloud/
9. Many AWS services to tie it all together
• CloudFormation manages the state of the cluster
• Amazon CloudWatch and Auto Scaling let the compute fleet grow and shrink on demand
• Amazon SQS and Amazon SNS allow compute nodes to signal the master when they come online
• AWS Identity and Access Management (IAM) allows for fine-grained access control
• Amazon S3 stores the CloudFormation templates
11. Isolated CfnCluster
[Architecture diagram: a VPC with a public subnet containing a NAT gateway and a bastion server, and a private subnet containing the master server and the Auto Scaling compute fleet. An Internet Gateway (IGW) provides outbound access; Amazon S3, DynamoDB, Amazon SQS, CloudWatch, and CloudFormation support the cluster.]
Private Subnet Route Table
VPC Traffic -> Local
0.0.0.0/0 -> NAT Gateway
Public Subnet Route Table
VPC Traffic -> Local
0.0.0.0/0 -> Internet Gateway
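This isolated layout maps onto the vpc section of the CfnCluster config. A sketch with placeholder IDs, assuming your CfnCluster version supports the documented compute_subnet_id and use_public_ips parameters:

```ini
[vpc isolated]
# Placeholder IDs; substitute your own VPC and subnets
vpc_id = vpc-xxxxxx
# Master and compute both live in the private subnet behind the NAT gateway
master_subnet_id = subnet-aaaaaa
compute_subnet_id = subnet-aaaaaa
# No public IPs on the instances; access goes through the bastion server
use_public_ips = false
```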
12. Isolated CfnCluster w/ VPN
[Architecture diagram: the same VPC layout as slide 11 — public subnet with a NAT gateway, private subnet with the master server and Auto Scaling compute fleet, supported by Amazon S3, DynamoDB, Amazon SQS, CloudWatch, and CloudFormation — plus a VPN connection from the corporate data center for engineer access.]
Private Subnet Route Table
VPC Traffic -> Local
Corp IP Range -> VPN
0.0.0.0/0 -> NAT Gateway
Public Subnet Route Table
VPC Traffic -> Local
Corp IP Range -> VPN
0.0.0.0/0 -> Internet Gateway
13. Private CfnCluster w/ VPN & Proxy
[Architecture diagram: a private subnet containing the master server and the Auto Scaling compute fleet. Amazon S3, DynamoDB, Amazon SQS, CloudWatch, and CloudFormation are reached through a proxy server in the corporate data center, which provides the Internet connection over the VPN.]
Private Subnet Route Table
VPC Traffic -> Local
Corp IP Range -> VPN
0.0.0.0/0 -> VPN
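If your CfnCluster version supports the proxy_server cluster parameter, outbound HTTPS traffic from the nodes can be pointed at the corporate proxy. A sketch; the hostname and port are illustrative:

```ini
[cluster default]
# Route node traffic to AWS service endpoints through the corporate proxy
# (address and port are hypothetical)
proxy_server = https://proxy.corp.example.com:8080
```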
14. Creating an IAM User
• Create an IAM user with administrative privileges
• Fine-grained access controls can be applied later
• Generate an access key and secret key, and keep them safe
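Those credentials feed the aws section of the CfnCluster config (aws_access_key_id, aws_secret_access_key, and aws_region_name are documented parameters; the values below are placeholders):

```ini
[aws]
aws_access_key_id = AKIAXXXXXXXXXXXXXXXX
aws_secret_access_key = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
aws_region_name = us-east-1
```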
15. Create an SSH Key
• Generate or import the key you’ll use for user login
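If you prefer generating the key locally and importing it, something like the following works; the key file name is illustrative, and the EC2 import step (shown commented out) requires configured AWS credentials:

```shell
# Generate a 4096-bit RSA key pair with no passphrase; file name is illustrative
ssh-keygen -t rsa -b 4096 -N "" -f cfncluster-key -q
# Import the public half into EC2 (needs AWS credentials, so commented out here):
# aws ec2 import-key-pair --key-name cfncluster-key \
#     --public-key-material file://cfncluster-key.pub
ls -l cfncluster-key cfncluster-key.pub
```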
17. Creating the Base Configuration
• First, create the base config required to start a cluster.
$ cfncluster configure
18. Edit the configuration file to meet your needs
• Reference the configuration docs
• http://cfncluster.readthedocs.io/en/latest/configuration.html
$ vim ~/.cfncluster/config
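cfncluster configure writes only a minimal file. A fuller example of what a hand-edited config can look like, using documented parameters from the configuration docs with placeholder names and IDs:

```ini
[global]
cluster_template = default
update_check = true
sanity_check = true

[aws]
aws_region_name = us-east-1

[cluster default]
# EC2 key pair used for SSH login to the master
key_name = mykey
vpc_settings = public
# Fleet starts at 2 compute nodes and can scale to 10
initial_queue_size = 2
max_queue_size = 10
compute_instance_type = c4.8xlarge
# SGE is the default scheduler
scheduler = sge

[vpc public]
# Placeholder IDs; substitute your own VPC and subnet
vpc_id = vpc-xxxxxx
master_subnet_id = subnet-xxxxxx
```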
19. Launch the Cluster
$ cfncluster create mycluster
• Cluster creation usually takes ~15 minutes
• Completely managed by CloudFormation
20. Submit your first job
[ec2-user@ip-10-0-0-17 ~]$ cat hw.qsub
#!/bin/bash
#
#$ -cwd
#$ -j y
#$ -pe mpi 2
#$ -S /bin/bash
#
module load openmpi-x86_64
mpirun -np 2 hostname
[ec2-user@ip-10-0-0-17 ~]$ qsub hw.qsub
Your job 1 ("hw.qsub") has been submitted
[ec2-user@ip-10-0-0-17 ~]$ qstat
job-ID prior name user state submit/start at queue slots ja-task-ID
------------------------------------------------------------------------------------------------
1 0.55500 hw.qsub ec2-user r 02/01/2015 05:57:25 all.q@ip-10-0-0-44.ap-southeas 2
[ec2-user@ip-10-0-0-17 ~]$ ls -l
total 8
-rw-rw-r-- 1 ec2-user ec2-user 110 Feb 1 05:57 hw.qsub
-rw-r--r-- 1 ec2-user ec2-user 26 Feb 1 05:57 hw.qsub.o1
[ec2-user@ip-10-0-0-17 ~]$ cat hw.qsub.o1
ip-10-0-0-44
ip-10-0-0-45
21. EBS Snapshots for Software & Storage Management
• Install your applications and store any working data to /shared
• Create a snapshot of that volume
• Re-use that snapshot every time you launch your cluster
ebs_snapshot_id = snap-xxxxx
[Diagram: the master server carries a root and home volume (/ and /home) plus an NFS shared volume (/shared) restored from the Amazon EBS snapshot (snap-xxxxx).]
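In the config file, the snapshot goes into the cluster section. A sketch; the snapshot ID stays a placeholder as on the slide, and the volume size is a hypothetical value (volume_size is a documented parameter):

```ini
[cluster default]
# Rebuild /shared from the saved snapshot on every cluster launch
ebs_snapshot_id = snap-xxxxx
# Size in GB of the shared volume (hypothetical value)
volume_size = 100
```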
22. Upgrading Hardware is Easy!
• Simple upgrade from Ivy Bridge to Haswell
1. Let all compute nodes stop
2. Edit ~/.cfncluster/config and change
compute_instance_type = c3.8xlarge
to
compute_instance_type = c4.8xlarge
3. Update the cluster
$ cfncluster update mycluster