In this Dev Lounge Express Edition breakfast session, we look at how AWS Step Functions makes it easier to build distributed applications whose complex business workflows are implemented across multiple microservices. We walk through creating a state machine with AWS Step Functions in the AWS console, then use AWS CloudFormation to build more complex state machines whose tasks are implemented as AWS Lambda functions. We explore Step Functions features such as exception handling, choices and manual steps, and use AWS X-Ray to gain insight into the performance of our distributed application.
DEV LOUNGE
LEARNING OBJECTIVES
• Distributed complexity and the challenges it introduces
• Orchestrating async state machines with AWS Step Functions
• Throwing and handling exceptions in AWS Step Functions
• Triggering state machine workflows in response to events
• Using AWS X-Ray to analyse the calls and performance of Step Functions workflows
• Lots of code and demos!
Serverless microservices
AWS Lambda + Amazon API Gateway
• Highly available
• Scalable
• Fault tolerant
• Cost effective
• Secure
Microservice challenges:
• Stateless
• Discrete / isolated data stores
• Distributed / asynchronous
Challenge: Transactional Integrity
• How do we handle transactional integrity?
• Polyglot persistence generally translates into eventual consistency
• Asynchronous calls are non-blocking, but return data (state) still needs to be propagated
• What about failures and retries?
[Diagram: ERROR, STATE?, ROLLBACK?]
Challenge: Multi-stage long-running tasks
• Tasks that require co-ordination across multiple systems through multiple states
• Tasks that may require manual intervention
• Tasks that are expected to be long-running and take hours/days/months to resolve
• Tasks that are expected to periodically fail and need to be retried as a matter of course
Building applications out of distributed functions
Coordination of asynchronous functions
• “I want to sequence functions”
• “I want to run functions in parallel”
• “I want to select functions based on input data or current state”
• “I want to retry functions with backoff”
• “I want try/catch/finally”
• “I have code that runs for hours or needs manual intervention”
Coordination by function chaining
[Diagram: Lambda functions A, B, C, D, E, F and G chained together]
Coordination by database
[Diagram: AWS Lambda functions coordinating through Amazon DynamoDB]
Coordination by queues
[Diagram: AWS Lambda functions coordinating through Amazon SQS]
What would an orchestration solution look like?
Coordination must-haves
• Scales out
• Doesn’t lose state
• Deals with errors/timeouts/retries
• Easy to build & operate: declarative, not code-based
• Automatable
• Auditable
• Visible and traceable
AWS Step Functions
• Fully managed service making it easy to coordinate the components of distributed applications and microservices using visual workflows
• You construct your application’s flows as a state machine: a series of steps that together capture the behavior of the application
AWS Step Functions: State types
[Diagram panels: Parallel Steps, Choice State, Catch Failure, Retry Failure, Wait State]
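Of these, the Choice state is the one that answers “I want to select functions based on input data or current state”. The Python sketch below evaluates a Choice state locally, assuming a hypothetical image-classification input with `label` and `confidence` fields. `StringEquals`, `NumericGreaterThan`, `Default` and the `States.NoChoiceMatched` error are real Amazon States Language names; the state names and the simplified matching (top-level fields only, and a missing field simply fails to match, where the service would raise a runtime error) are assumptions of this sketch.

```python
# Hypothetical Choice state: route dogs one way, confident non-dogs another,
# and everything else to a manual decision.
CHOICE_STATE = {
    "Type": "Choice",
    "Choices": [
        {"Variable": "$.label", "StringEquals": "Dog", "Next": "ProcessDog"},
        {"Variable": "$.confidence", "NumericGreaterThan": 90, "Next": "ProcessOther"},
    ],
    "Default": "ManualDecision",
}


def _is_number(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(value, (int, float)) and not isinstance(value, bool)


# Only two of the many ASL comparison operators are modelled here.
OPERATORS = {
    "StringEquals": lambda actual, expected: isinstance(actual, str) and actual == expected,
    "NumericGreaterThan": lambda actual, expected: _is_number(actual) and actual > expected,
}


def next_state(choice_state: dict, state_input: dict) -> str:
    """Return the Next of the first matching rule, else the Default state."""
    for rule in choice_state["Choices"]:
        field = rule["Variable"][2:]        # "$.label" -> "label" (top-level only)
        actual = state_input.get(field)     # missing field: no match in this sketch
        for op, test in OPERATORS.items():
            if op in rule and test(actual, rule[op]):
                return rule["Next"]
    if "Default" not in choice_state:
        raise RuntimeError("States.NoChoiceMatched")
    return choice_state["Default"]
```

Rules are tried in order, so a confident “Dog” still lands in `ProcessDog` rather than `ProcessOther`.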
Example: “Calculator” State Machine
Quick anatomy of a state machine
• Each state is named uniquely but arbitrarily
• StartAt: the entry point
• Each state has a type: Choice, Pass, Parallel, Fail, Wait, Task…
• Every non-terminal state has a Next state (a Choice state routes through its Choices and Default instead)
• A terminal state is denoted by End:true or Type:Fail (or Type:Succeed)
• Task states have a Resource attribute that identifies the work to perform, such as a Lambda function or activity ARN
• Tasks can declare a Retry clause based on the type of application-defined error that has occurred
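To make these rules concrete, here is a minimal Python sketch that checks a definition (as a parsed dict) against a few of them. It is not the service’s validator, which checks far more; the function name and messages are hypothetical.

```python
from typing import List

TERMINAL_TYPES = {"Succeed", "Fail"}


def anatomy_problems(definition: dict) -> List[str]:
    """Report violations of a few anatomy rules; an empty list means none found."""
    states = definition.get("States", {})
    problems = []
    if definition.get("StartAt") not in states:
        problems.append("StartAt does not name a state")
    for name, state in states.items():
        kind = state.get("Type")
        if kind == "Task" and "Resource" not in state:
            problems.append(f"{name}: Task without Resource")
        if kind == "Choice" or kind in TERMINAL_TYPES:
            continue  # Choice routes via Choices/Default; Succeed/Fail end the run
        if state.get("End") is True:
            continue  # explicitly terminal
        if state.get("Next") not in states:
            problems.append(f'{name}: needs a valid Next or "End": true')
    return problems
```

A two-state definition such as `{"StartAt": "Add", "States": {"Add": {"Type": "Pass", "Result": 7, "Next": "Done"}, "Done": {"Type": "Succeed"}}}` passes with no problems reported.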
“I want to retry functions”
We get transient errors from a RESTful service we depend on, once every four or five times we call it. But if we keep retrying, it eventually works.
AWS Step Functions: Error Handling & Retries
{
  "Comment": "Call out to a RESTful service",
  "StartAt": "Call out",
  "States": {
    "Call out": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:ap-southeast-2:123456789012:function:RESTCall",
      "Retry": [
        { "ErrorEquals": [ "MyTransientError" ], "MaxAttempts": 10 }
      ],
      "End": true
    }
  }
}
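The Retry clause above only sets MaxAttempts, so Step Functions falls back to its defaults for the other fields: an IntervalSeconds of 1 and a BackoffRate of 2.0. The Python sketch below is a local simulation, not the service, showing the resulting schedule: with MaxAttempts of 10 the waits double from 1 to 512 seconds, roughly 17 minutes in total. `MyTransientError` mirrors the error name in the definition; a Python Lambda function that raises an exception of that class reports it under that name. Jitter and MaxDelaySeconds are ignored here.

```python
import time
from typing import Callable, Iterable, List


class MyTransientError(Exception):
    """Stand-in for the application-defined error named in ErrorEquals."""


def retry_delays(interval_seconds: float = 1.0, max_attempts: int = 3,
                 backoff_rate: float = 2.0) -> List[float]:
    """Wait (seconds) before each retry: interval * backoff_rate ** n."""
    return [interval_seconds * backoff_rate ** n for n in range(max_attempts)]


def run_with_retry(task: Callable[[], object], error_names: Iterable[str],
                   interval_seconds: float = 1.0, max_attempts: int = 3,
                   backoff_rate: float = 2.0, sleep=time.sleep):
    """Call task(), retrying while it raises an error whose class name is listed.

    MaxAttempts counts retries, so task() runs at most max_attempts + 1 times.
    """
    names = set(error_names)
    for delay in retry_delays(interval_seconds, max_attempts, backoff_rate):
        try:
            return task()
        except Exception as exc:
            if type(exc).__name__ not in names:
                raise  # not covered by this retrier: fail straight away
            sleep(delay)
    return task()  # last attempt: any error now propagates (to Catch, if present)
```

Passing `sleep=waits.append` instead of the real `time.sleep` makes it easy to inspect the schedule without waiting.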
“I want to run functions in parallel”
We want to send the captured image to three OCR providers and take the result with the highest confidence value.
AWS Step Functions: Parallelism
"Send for OCR": {
  "Type": "Parallel",
  "Next": "Pick Result",
  "Branches": [
    {
      "StartAt": "Prepare1",
      "States": {
        "Prepare1": {
          "Type": "Pass",
          "Result": { "inputList": [ "OCR Provider 1" ] },
          "Next": "Execute1"
        },
        "Execute1": {
          "Type": "Task",
          "Resource": "arn:aws:lambda:ap-southeast-2:123456789012:function:OCR1",
          "End": true
        }
      }
      ...
    }]
},
"Pick Result": { ... }
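A Parallel state runs every branch on the same input and hands its Next state an array of the branch outputs, in branch order; if any branch fails, the whole Parallel state fails. The Python sketch below mimics that locally with a thread pool and implements “Pick Result” as a simple maximum. The OCR providers are hypothetical stand-ins for functions like OCR1, each assumed to return a dict with a `confidence` field.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

Branch = Callable[[dict], dict]


def run_parallel(branches: List[Branch], state_input: dict) -> List[dict]:
    """Run each branch on the same input; return outputs in branch order."""
    with ThreadPoolExecutor(max_workers=len(branches)) as pool:
        futures = [pool.submit(branch, state_input) for branch in branches]
        # result() re-raises a branch's exception, failing the whole step
        return [f.result() for f in futures]


def pick_result(branch_outputs: List[dict]) -> dict:
    """The "Pick Result" step: keep the output with the highest confidence."""
    return max(branch_outputs, key=lambda out: out["confidence"])
```

Keeping branch order stable matters: the next state can rely on position to know which provider produced which result, even though the branches finish in any order.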
“I want to perform a multi-step task and handle errors”
We want to classify images based on their content, and if the images don’t contain the right content, we ignore the uploaded image. We need to distinguish between the types of errors that can be generated.
[Diagram: Amazon S3 (trigger on upload) → AWS Lambda → Amazon Rekognition]
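One way to wire up the “trigger on upload” step is a small Lambda function that turns each S3 event record into a Step Functions StartExecution call. The Python sketch below assumes boto3’s `start_execution` API; the `STATE_MACHINE_ARN` environment variable and the `{"bucket": ..., "key": ...}` execution input are hypothetical choices for illustration. The client can be injected so the handler runs locally without AWS access.

```python
import json
import os
from urllib.parse import unquote_plus


def build_execution_inputs(event: dict) -> list:
    """Extract one execution input per S3 record in the notification event."""
    inputs = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        inputs.append({
            "bucket": s3["bucket"]["name"],
            # S3 notifications URL-encode object keys (spaces arrive as '+')
            "key": unquote_plus(s3["object"]["key"]),
        })
    return inputs


def handler(event, context, sfn_client=None):
    """Lambda entry point; start one state machine execution per uploaded object."""
    if sfn_client is None:
        import boto3  # imported lazily so the module loads without boto3
        sfn_client = boto3.client("stepfunctions")
    arns = []
    for execution_input in build_execution_inputs(event):
        response = sfn_client.start_execution(
            stateMachineArn=os.environ["STATE_MACHINE_ARN"],  # hypothetical variable
            input=json.dumps(execution_input),                # input must be a JSON string
        )
        arns.append(response["executionArn"])
    return {"executionArns": arns}
```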
AWS Step Functions: Error handling
"state.process.Type.Image.Dog": {
  "Type": "Task",
  "Resource": "arn:aws:lambda:xxxxx",
  "Next": "state.process.Complete",
  "Catch": [
    {
      "ErrorEquals": ["devlounge.exceptions.FileProcessingException"],
      "Next": "state.error.FileProcessingException"
    },
    {
      "ErrorEquals": ["States.ALL"],
      "Next": "state.error.GeneralException"
    }
  ]
}
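Catchers are evaluated in order: the first one whose ErrorEquals list contains the error name, or the `States.ALL` wildcard, decides the Next state, which is why the specific FileProcessingException catcher must come before the catch-all. The Python sketch below reproduces that lookup locally. It is simplified: it compares exact names only and ignores the few terminal errors that `States.ALL` does not catch.

```python
from typing import List, Optional

# The two catchers from the state above.
CATCHERS = [
    {"ErrorEquals": ["devlounge.exceptions.FileProcessingException"],
     "Next": "state.error.FileProcessingException"},
    {"ErrorEquals": ["States.ALL"], "Next": "state.error.GeneralException"},
]


def next_state_for_error(error_name: str, catchers: List[dict]) -> Optional[str]:
    """Return the Next state of the first matching catcher, or None if uncaught."""
    for catcher in catchers:
        names = catcher["ErrorEquals"]
        if error_name in names or "States.ALL" in names:
            return catcher["Next"]
    return None  # uncaught: the execution would fail
```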
“I want to allow for manual decisions”
We want to classify images based on their content, and if a confident decision cannot be made automatically, we want an operator to be prompted to intervene.
Tasks, Activities and Lambda functions
• A task is a unit of work
• Tasks can be implemented by an AWS Lambda function or by an activity, which is a placeholder that any compute engine can implement, on-cloud or off-cloud
• An activity task must be resolved by calling either the SendTaskSuccess or the SendTaskFailure API
• By implementing a task as an activity, you can implement manual steps in the state machine. A Lambda function won’t be called automatically for an activity task.
AWS Step Functions: Activities
"state.process.Type.Unknown": {
  "Type": "Task",
  "Resource": "arn:aws:states:ap-southeast-2:123456789012:activity:ManuallyDecide",
  "TimeoutSeconds": 3600,
  "HeartbeatSeconds": 60,
  "Next": "ContinueTaskAfterManualStep"
}
If HeartbeatSeconds is provided, the activity worker must call SendTaskHeartbeat() within that interval, or the task fails.
Example: Waiting for a manual activity to complete
• state.process.Type.ManualDecisionRequired is an activity task
• A polling agent periodically checks for activity tasks and, via a call to stepfunctions::getActivityTask(), obtains a task token that refers to the activity
• An email with ‘manual decision’ links is sent to an operator
• When clicked, a link resolves the task as successful or not
• Implemented by a Lambda function behind API Gateway
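The polling agent can be sketched as a scheduled Python function around boto3’s `get_activity_task`. `APPROVAL_URL`, the `notify` callback (standing in for the Amazon SES email) and the “dog”/“cat” choices are hypothetical. Because GetActivityTask long-polls for up to 60 seconds, a Lambda running this would need a timeout comfortably above that.

```python
import json
from urllib.parse import urlencode

# Hypothetical API Gateway endpoint that resolves the task when a link is clicked.
APPROVAL_URL = "https://example.execute-api.ap-southeast-2.amazonaws.com/prod/decide"


def decision_links(task_token: str, choices=("dog", "cat")) -> dict:
    """Build one link per choice; each carries the task token back to API Gateway."""
    return {c: APPROVAL_URL + "?" + urlencode({"taskToken": task_token, "decision": c})
            for c in choices}


def poll_once(sfn_client, activity_arn: str, notify) -> bool:
    """Fetch at most one activity task and email the operator; True if one was found."""
    task = sfn_client.get_activity_task(activityArn=activity_arn,
                                        workerName="manual-decision-poller")
    token = task.get("taskToken")
    if not token:  # no task arrived during the long poll
        return False
    notify(json.loads(task["input"]), decision_links(token))
    return True
```

Real task tokens are long opaque strings, so URL-encoding them with `urlencode` rather than pasting them into the link by hand avoids corrupting characters such as `+` or `/`.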
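Manual Approval via Email Notification
[Diagram: a scheduled Amazon CloudWatch event invokes an AWS Lambda function that calls getActivityTask() on AWS Step Functions and emails the operator via Amazon SES; the operator’s “It’s a Dog!” / “It’s a Cat!” links go through Amazon API Gateway to a second AWS Lambda function, which calls sendTaskSuccess() with:]
{
  "output": "\"cat\"",
  "taskToken": "xxxx"
}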
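The Lambda behind API Gateway only has to turn a clicked link into a SendTaskSuccess call. The Python sketch below assumes a Lambda proxy integration, so the link’s query string arrives in `event["queryStringParameters"]`; the parameter names and the “dog”/“cat” choices are hypothetical. SendTaskSuccess requires `output` to be a JSON string, which is why “cat” is sent as `"\"cat\""`.

```python
import json


def approval_handler(event, context, sfn_client=None):
    """Resolve the waiting activity task with the operator's decision."""
    if sfn_client is None:
        import boto3  # imported lazily so the module loads without boto3
        sfn_client = boto3.client("stepfunctions")
    params = event.get("queryStringParameters") or {}
    token, decision = params.get("taskToken"), params.get("decision")
    if not token or decision not in ("dog", "cat"):  # hypothetical choices
        return {"statusCode": 400, "body": "Missing or invalid parameters"}
    # json.dumps("cat") == '"cat"': the output must itself be valid JSON
    sfn_client.send_task_success(taskToken=token, output=json.dumps(decision))
    return {"statusCode": 200, "body": "Recorded: " + decision}
```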
AWS Step Functions: Execution Input State
Raw input:
{
  "title": "Numbers to add",
  "numbers": [ 3, 4 ]
}
State spec:
{
  "Type": "Task",
  "InputPath": "$.numbers",
  "Resource": "arn:aws:lambda…"
  …
Task input:
[ 3, 4 ]
AWS Step Functions: Execution State Input Processing
Q: InputPath not provided?
A: State gets raw input as-is.
Q: InputPath is null?
A: State gets an empty JSON object: {}
Q: InputPath produces plural output?
A: State gets it wrapped in a JSON array.
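The first two answers, plus the `$.numbers` example, fit in a few lines of Python. This sketch supports only `$` and simple dotted paths such as `$.numbers`; real InputPath accepts full JsonPath, including expressions that match several nodes (the plural case above), which this sketch does not implement. The `MISSING` sentinel stands for “InputPath not provided”.

```python
MISSING = object()  # sentinel: InputPath not specified at all


def select(path: str, document):
    """Resolve '$' or a dotted path such as '$.a.b' against document."""
    if path == "$":
        return document
    if not path.startswith("$."):
        raise ValueError("only '$' and '$.field...' paths are supported here")
    node = document
    for field in path[2:].split("."):
        node = node[field]
    return node


def effective_input(raw_input, input_path=MISSING):
    """What the state actually receives, given its InputPath."""
    if input_path is MISSING:   # not provided: raw input as-is
        return raw_input
    if input_path is None:      # explicit null: empty JSON object
        return {}
    return select(input_path, raw_input)
```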
AWS Step Functions: Execution State Result Placement
Raw input:
{
  "title": "Numbers to add",
  "numbers": [ 3, 4 ]
}
State spec:
{
  "Type": "Task",
  "InputPath": "$.numbers",
  "ResultPath": "$.sum",
  …
Output:
{
  "title": "Numbers to add",
  "numbers": [ 3, 4 ],
  "sum": 7
}
AWS Step Functions: Execution State Result Placement
Q: ResultPath not provided?
A: Input discarded, raw output used.
Q: ResultPath is null?
A: Result discarded; state input is state output.
Q: ResultPath produces plural output?
A: Not allowed; the validator won’t accept it.
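Here is the same “Numbers to add” example as a local Python sketch, with the Lambda replaced by a plain `sum()`. Only the default (`$`), null, and single-node dotted paths such as `$.sum` are handled; the wildcard check stands in for the validator’s rejection of plural paths, and creating missing intermediate objects is a choice of this sketch.

```python
import copy

MISSING = object()  # sentinel: ResultPath not specified at all


def place_result(state_input, result, result_path=MISSING):
    """Combine a state's input and its task result according to ResultPath."""
    if result_path is MISSING or result_path == "$":
        return result                      # default: result replaces the input
    if result_path is None:
        return state_input                 # result discarded, input passed through
    if not result_path.startswith("$.") or "*" in result_path:
        raise ValueError("only single-node paths like '$.sum' are supported")
    output = copy.deepcopy(state_input)    # leave the caller's input untouched
    node = output
    *parents, leaf = result_path[2:].split(".")
    for field in parents:
        node = node.setdefault(field, {})  # this sketch creates intermediate objects
    node[leaf] = result
    return output
```

With InputPath `$.numbers` the task sees `[3, 4]` and returns 7; ResultPath `$.sum` then grafts that 7 back onto the original input rather than replacing it.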
AWS X-Ray
• Analyze and debug distributed applications
• Calls made through the AWS SDK are captured automatically, or you can inject your own custom segments
• End-to-end tracing: a cross-service view of requests made to your application
Custom data: Annotations vs Metadata
Annotations
• Key-value pairs with string, number, or Boolean values
• Indexed for use with filter expressions
• Use annotations to record data that you want to use to group traces in the console, or when calling the GetTraceSummaries API
Metadata
• Key-value pairs that can have values of any type
• Not indexed for use with filter expressions
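A simple rule of thumb follows from the lists above: scalar values you may want to filter on become annotations, and everything else becomes metadata. The Python sketch below applies that rule to any object offering `put_annotation(key, value)` and `put_metadata(key, value, namespace)`, as the aws-xray-sdk Subsegment does; inside a Lambda function the subsegment would typically come from `xray_recorder.in_subsegment(...)`. The `devlounge` namespace and the field names are hypothetical.

```python
from typing import Any, Dict, Tuple

SCALARS = (str, int, float, bool)  # value types that annotations accept


def split_custom_data(data: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Separate filterable scalar values from everything else."""
    annotations = {k: v for k, v in data.items() if isinstance(v, SCALARS)}
    metadata = {k: v for k, v in data.items() if k not in annotations}
    return annotations, metadata


def record_custom_data(subsegment, data: Dict[str, Any], namespace: str = "devlounge") -> None:
    """Attach custom data to an X-Ray subsegment."""
    annotations, metadata = split_custom_data(data)
    for key, value in annotations.items():
        subsegment.put_annotation(key, value)           # indexed, usable in filters
    for key, value in metadata.items():
        subsegment.put_metadata(key, value, namespace)  # any type, not indexed
```

For example, a classification label and its confidence would be searchable annotations, while the full list of labels returned by Amazon Rekognition would be kept as metadata.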
Distributed workflows with AWS Step Functions
• Creating/debugging state machines with ‘Pass’
• Throwing and handling exceptions in AWS Step Functions
• Running state machine workflows in response to events
• Using AWS X-Ray to analyse the calls and performance of Step Functions workflows
• Automating deployment of AWS Step Functions using AWS CloudFormation
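For the CloudFormation demo, one lightweight approach is to generate the template from the same definition dict you test locally. `AWS::StepFunctions::StateMachine` with `DefinitionString` and `RoleArn` is a real resource type, and `Ref` on it returns the state machine ARN; the logical IDs, the `ExecutionRoleArn` parameter and the single Pass state are hypothetical placeholders.

```python
import json


def state_machine_template(definition: dict) -> dict:
    """Wrap an ASL definition in a minimal CloudFormation template."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Parameters": {
            "ExecutionRoleArn": {"Type": "String"},  # IAM role Step Functions assumes
        },
        "Resources": {
            "DevLoungeStateMachine": {
                "Type": "AWS::StepFunctions::StateMachine",
                "Properties": {
                    "DefinitionString": json.dumps(definition),
                    "RoleArn": {"Ref": "ExecutionRoleArn"},
                },
            },
        },
        "Outputs": {
            # Ref on a state machine resource yields its ARN
            "StateMachineArn": {"Value": {"Ref": "DevLoungeStateMachine"}},
        },
    }


# A one-state definition, handy for "creating/debugging with Pass".
HELLO = {
    "StartAt": "Hello",
    "States": {"Hello": {"Type": "Pass", "Result": "Hello, Dev Lounge", "End": True}},
}

if __name__ == "__main__":
    print(json.dumps(state_machine_template(HELLO), indent=2))
```

The printed JSON could then be saved as `template.json` and deployed with `aws cloudformation deploy --template-file template.json --stack-name <name> --parameter-overrides ExecutionRoleArn=<role-arn>`.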