Learning Objectives:
- How to scale out your batch workflows on AWS with minimal effort
- How to think about container and job management within managed, high-throughput workflows
- How to build a scalable orchestration framework with AWS Step Functions
2. What we will cover
• What Are High-Throughput Workflows?
• Architecture Overview
• Service Overview – AWS Batch
• Service Overview – AWS Step Functions
• Architecture Deep Dive
3. What are high-throughput workflows?
[Diagram: a linear workflow — Start → Pre-processing → Long-running operation → Post-processing → Copy results to S3 → End]
Now run this same workflow for thousands
of inputs while also:
• Starting each step at the right time
• Running each step on appropriate
compute resources
• Managing concurrency
• Scaling infrastructure up and down
• Handling errors
• Providing notifications
• Accelerating workflow development
[Diagram labels: the steps have differing resource profiles — network I/O and CPU; disk I/O and large memory; GPU-accelerated; network I/O]
4. High-throughput workflows are everywhere
• Media & Entertainment
• Transportation & Logistics
• Manufacturing & Design
• Financial Services
• Life Sciences
• Earth Sciences & Geospatial Analytics
7. Introducing AWS Batch
• Fully Managed: No software to install or servers to manage. AWS Batch provisions and scales your infrastructure.
• Integrated with AWS: AWS Batch jobs can easily and securely interact with services such as Amazon S3, DynamoDB, and Rekognition.
• Cost-Efficient: AWS Batch launches compute resources tailored to your jobs and can provision Amazon EC2 and EC2 Spot instances.
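Submitting work to AWS Batch is a single API call per job. As a hedged sketch (the job name, queue, and definition below are hypothetical placeholders), the payload for boto3's `batch.submit_job` can be built separately from the call itself, which keeps it easy to inspect and test without AWS credentials:

```python
import json

def build_submit_job_request(job_name, job_queue, job_definition, command,
                             vcpus=2, memory_mib=4096):
    """Build the keyword arguments for boto3's batch.submit_job call."""
    return {
        "jobName": job_name,
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        "containerOverrides": {
            "command": command,
            # Per-job resource overrides let one job definition serve
            # workflow steps with different CPU/memory profiles.
            "resourceRequirements": [
                {"type": "VCPU", "value": str(vcpus)},
                {"type": "MEMORY", "value": str(memory_mib)},
            ],
        },
    }

# With AWS credentials configured, the job would be submitted like this:
#   import boto3
#   batch = boto3.client("batch")
#   response = batch.submit_job(**build_submit_job_request(...))
request = build_submit_job_request(
    "preprocess-0001", "my-job-queue", "my-job-def:1",
    ["python", "preprocess.py"],
)
print(json.dumps(request, indent=2))
```

Separating payload construction from the API call is also what makes the batch layer easy to automate for thousands of inputs: loop over inputs, build one request each, submit.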
12. AWS Step Functions makes it easy to coordinate the components of distributed applications using visual workflows.
13. Application Lifecycle in AWS Step Functions
• Define in JSON
• Visualize in the Console
• Monitor Executions
14. Seven State Types
• Task: A single unit of work
• Choice: Adds branching logic
• Parallel: Fork and join the data across tasks
• Wait: Delay for a specified time
• Fail: Stops an execution and marks it as a failure
• Succeed: Stops an execution successfully
• Pass: Passes its input to its output
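The state types above compose into a JSON state machine written in the Amazon States Language. A minimal sketch combining Task, Choice, Succeed, and Fail states might look like this (the Lambda ARN and state names are hypothetical):

```python
import json

# Hypothetical Lambda ARN; in a real account this would point at your function.
TASK_ARN = "arn:aws:lambda:us-east-1:123456789012:function:ProcessInput"

definition = {
    "Comment": "Minimal workflow exercising Task, Choice, Succeed, and Fail",
    "StartAt": "ProcessInput",
    "States": {
        # Task: a single unit of work.
        "ProcessInput": {
            "Type": "Task",
            "Resource": TASK_ARN,
            "Next": "CheckStatus",
        },
        # Choice: branch on the task's output.
        "CheckStatus": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.status", "StringEquals": "OK", "Next": "Done"}
            ],
            "Default": "JobFailed",
        },
        "Done": {"Type": "Succeed"},
        "JobFailed": {"Type": "Fail", "Error": "ProcessingError"},
    },
}

asl_json = json.dumps(definition, indent=2)
print(asl_json)
```

This is the same JSON you would paste into the Step Functions console, where it renders as the visual workflow shown on the next slide.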
15. Build Visual Workflows Using State Types
[Diagram: an example state machine in the AWS Step Functions console combining Task, Choice, Parallel, and Fail states, with branches labeled Mountains, People, and Snow]
20. Considerations for Batch Layer: Data Sharing
Consideration: Jobs are managed at the container level, not the instance level, so there is no guarantee that consecutive containers in a workflow will run on the same instance.
Solution: Stage all data in Amazon S3, and read and write everything from there. This is also important for traceability, logging, etc.
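One way to make S3 staging systematic is a deterministic key convention, so every step knows where to read its input and write its output without any instance-local state. The prefix layout below is a hypothetical convention, not an AWS requirement:

```python
def s3_prefix(bucket, workflow_id, step_name):
    """Deterministic S3 prefix so every step in a workflow run knows
    where to read and write without sharing local disk."""
    return f"s3://{bucket}/workflows/{workflow_id}/{step_name}/"

# A step reads from the previous step's prefix and writes to its own:
inp = s3_prefix("my-batch-bucket", "run-0042", "pre-processing")
out = s3_prefix("my-batch-bucket", "run-0042", "long-running-operation")
# The actual transfer would use boto3, e.g.:
#   boto3.client("s3").download_file("my-batch-bucket", key, local_path)
print(inp)
print(out)
```

A side benefit for traceability: every intermediate artifact of every run is addressable under one prefix, which makes logging and debugging a failed run straightforward.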
21. Considerations for Batch Layer: Multitenancy
Consideration: Multiple containers may run batch processes on the same instance in the same base working directory.
Solution: Within the scratch directory, each batch process creates a subfolder with a unique ID and writes all scratch data to that subdirectory.
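A minimal sketch of that isolation pattern, assuming a shared scratch volume mounted into each container (here simulated with a temporary directory):

```python
import os
import tempfile
import uuid

def make_scratch_subdir(base_scratch_dir):
    """Create a uniquely named subdirectory so co-located containers
    sharing the same base scratch directory never collide."""
    subdir = os.path.join(base_scratch_dir, uuid.uuid4().hex)
    os.makedirs(subdir)
    return subdir

base = tempfile.mkdtemp()  # stand-in for the instance's scratch volume
a = make_scratch_subdir(base)
b = make_scratch_subdir(base)
print(a != b)  # each process gets its own isolated directory
```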
22. Considerations for Batch Layer: Volume Reuse
Consideration: Scratch data should live only as long as the job using it, in order to optimize instance and Amazon EBS storage costs.
Solution: Within the scratch directory, each batch process creates a subfolder with a unique ID and writes all scratch data to that subdirectory, then deletes the subdirectory at the end of the job.
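The create-use-delete lifecycle maps naturally onto a context manager, which guarantees cleanup even if the job's work raises an exception. A sketch under the same shared-scratch-volume assumption:

```python
import contextlib
import os
import shutil
import tempfile
import uuid

@contextlib.contextmanager
def scratch_space(base_scratch_dir):
    """Yield a unique scratch subdirectory and delete it when the job
    ends, so storage on the (possibly reused) volume is reclaimed."""
    subdir = os.path.join(base_scratch_dir, uuid.uuid4().hex)
    os.makedirs(subdir)
    try:
        yield subdir
    finally:
        shutil.rmtree(subdir, ignore_errors=True)

base = tempfile.mkdtemp()  # stand-in for the instance scratch volume
with scratch_space(base) as work_dir:
    # The job writes its intermediate files here...
    open(os.path.join(work_dir, "intermediate.dat"), "w").close()
    existed = os.path.isdir(work_dir)
print(existed, os.listdir(base))  # subdirectory is gone once the job ends
```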
25. A Flexible Workflow Deployment Model
• Decouple batch engine and workflow orchestration
• Workflow creation is now done in JSON
• Easier to deploy
• Easier to automate
• Easier to test
• Can integrate non-Batch applications as well
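One payoff of workflows-as-JSON is that they can be checked in CI before deployment. As a hedged sketch (this checks only basic Amazon States Language structure, not the full specification):

```python
import json

def validate_workflow(definition_json):
    """Lightweight pre-deployment check for a Step Functions definition:
    verifies it parses and that StartAt names a defined state."""
    d = json.loads(definition_json)
    errors = []
    if "StartAt" not in d or "States" not in d:
        errors.append("definition must contain StartAt and States")
    elif d["StartAt"] not in d["States"]:
        errors.append(f"StartAt state {d['StartAt']!r} is not defined")
    return errors

workflow = json.dumps(
    {"StartAt": "Step1", "States": {"Step1": {"Type": "Succeed"}}}
)
print(validate_workflow(workflow))  # an empty list means the checks pass
```

Validated definitions can then be deployed programmatically (e.g. via boto3's Step Functions client), which is what makes this model easy to automate.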