Learning Objectives:
- Learn how to troubleshoot issues or performance problems in your serverless application
- Learn how to use distributed stack tracing to troubleshoot issues in your application
- Understand the basics of AWS X-Ray
When you're running a large, multi-faceted serverless application with AWS Lambda, how can you effectively diagnose issues in your application's performance? In this session, we'll show you how to peek under the hood of your function executions so you can determine where performance issues are occurring in your serverless application. We'll give a live demo of a new service, AWS X-Ray, a distributed tracing service that helps developers analyze and debug distributed applications. We'll show you how to use X-Ray to diagnose your serverless applications using dynamic stack tracing, call analysis, and the X-Ray service graph, which visually depicts the service calls made to your application.
3. Event Driven Computing maps well to Microservices
Microservices Architecture
GET /pets
PUT /pets
DELETE /pets
GET /describe/pet/$id
PUT /describe/pet/$id
Events
4. Event Driven Compute maps well to FaaS
GET /pets
PUT /pets
DELETE /pets
GET /describe/pet/$id
PUT /describe/pet/$id
Events
Functions As A Service
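A minimal Python sketch of that mapping, assuming one Lambda function per route behind an API Gateway Lambda proxy integration. The handler names, the in-memory `PETS` dictionary (standing in for storage that lives elsewhere) and the `ROUTES` table are hypothetical; `dispatch` only simulates locally the routing that API Gateway would perform. The slide's `$id` path variable is written here in API Gateway's `{id}` resource syntax.

```python
import json

# Hypothetical stand-in for permanent storage, which in a real deployment
# lives outside the function (functions keep no state between invocations).
PETS = {"1": {"id": "1", "name": "Rex"}}


def response(status, body):
    # Shape of a Lambda proxy-integration response returned to API Gateway.
    return {"statusCode": status, "body": json.dumps(body)}


def list_pets(event, context):
    # Hypothetical function behind GET /pets.
    return response(200, list(PETS.values()))


def describe_pet(event, context):
    # Hypothetical function behind GET /describe/pet/{id}.
    pet = PETS.get(event["pathParameters"]["id"])
    return response(200, pet) if pet else response(404, {"error": "not found"})


# Each (method, resource) pair maps to its own function, mirroring a setup
# where every API Gateway route targets a separate Lambda function.
ROUTES = {
    ("GET", "/pets"): list_pets,
    ("GET", "/describe/pet/{id}"): describe_pet,
}


def dispatch(event, context=None):
    """Local simulation of API Gateway invoking the function for a route."""
    fn = ROUTES.get((event["httpMethod"], event["resource"]))
    if fn is None:
        return response(405, {"error": "no function for this route"})
    return fn(event, context)
```

In a deployment, each entry in `ROUTES` would be a separately deployed function, so each route scales and reports its Lambda metrics independently.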
8. Monitoring a serverless application
Traditional host-based monolithic App
• Agents passively monitor the “availability” of hosts
• We have a “pulse” we can monitor
• A lack of data often signals a problem
• The signal from individual application components is hard to separate from the noise of the whole
Serverless App
• Resources come and go on demand
• Resources only exist during execution time, so there is no pulse
• We only get data during an execution, aka no “idle”
• Signal is specific to an event execution and the resources touched as part of it, so we can more easily measure Event A vs. Event B
9. The serverless compute manifesto
Functions are the unit of deployment and scaling.
No machines, VMs, or containers visible in the programming model.
Permanent storage lives elsewhere.
Scales per request. Users cannot over- or under-provision capacity.
Never pay for idle (no cold servers/containers or their costs).
Implicitly fault-tolerant because functions can run anywhere.
BYOC – Bring your own code.
Metrics and logging are a universal right.
11. Metrics and logging are a universal right!
CloudWatch Metrics:
• 6 built-in metrics for Lambda
• Invocation Count, Invocation Duration, Invocation Errors, Throttled Invocations, Iterator Age, DLQ Errors
• Can call “put-metric-data” from your function code for custom metrics
• 7 built-in metrics for API Gateway
• API Calls Count, Latency, 4XXs, 5XXs, Integration Latency, Cache Hit Count, Cache Miss Count
• Error and Cache metrics now support averages and percentiles
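A minimal sketch of publishing a custom metric from function code with boto3's CloudWatch `put_metric_data` (the operation behind the `put-metric-data` CLI command). The `PetStore` namespace, the `PetsReturned` metric name and the `FunctionName` dimension are hypothetical choices; the client is passed in so the helper can be exercised without AWS credentials.

```python
def build_metric_datum(name, value, unit="Count", function_name=None):
    datum = {"MetricName": name, "Value": float(value), "Unit": unit}
    if function_name:
        # A dimension lets the custom metric be sliced per function.
        datum["Dimensions"] = [{"Name": "FunctionName", "Value": function_name}]
    return datum


def put_custom_metric(cloudwatch, namespace, name, value, unit="Count", function_name=None):
    # cloudwatch is a boto3 CloudWatch client (or any object with the same method).
    return cloudwatch.put_metric_data(
        Namespace=namespace,
        MetricData=[build_metric_datum(name, value, unit, function_name)],
    )


def handler(event, context):
    import boto3  # bundled with the Lambda Python runtimes

    cloudwatch = boto3.client("cloudwatch")
    pets = event.get("pets", [])  # hypothetical event field
    put_custom_metric(cloudwatch, "PetStore", "PetsReturned", len(pets),
                      function_name=context.function_name)
    return {"count": len(pets)}
```

Calling the API inside the handler adds a synchronous request to every invocation; building metrics from log filters, as listed under CloudWatch Logs, is one way to avoid that.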
12. Metrics and logging are a universal right!
CloudWatch Logs:
• API Gateway Logging
• 2 levels of logging, ERROR and INFO
• Optionally log method request/body content
• Set globally in stage, or override per method
• Lambda Logging
• Logging directly from your code with your language’s equivalent of console.log()
• Basic request information included
• Log Pivots
• Build metrics based on log filters
• Jump to logs that generated metrics
• Export logs to Amazon Elasticsearch Service or S3
• Explore with Kibana or Athena/QuickSight
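A minimal sketch combining two of the points above: structured (JSON) log lines written from a Python Lambda function, and a CloudWatch Logs metric filter that turns the ERROR lines into a metric. The log field names, the filter name, the `AppErrors` metric and the `PetStore` namespace are hypothetical; `put_metric_filter` is the boto3 CloudWatch Logs call, and the client is passed in so the code runs without AWS access.

```python
import json

# CloudWatch Logs JSON filter pattern: match events whose "level" is ERROR.
ERROR_FILTER_PATTERN = '{ $.level = "ERROR" }'


def log_event(context, level, message, **fields):
    record = {
        "level": level,
        "message": message,
        # The request ID ties every line to a single invocation (one event).
        "request_id": context.aws_request_id,
        "function": context.function_name,
        **fields,
    }
    line = json.dumps(record, sort_keys=True)
    print(line)  # Lambda forwards stdout to the function's log group
    return line


def create_error_metric(logs, function_name):
    # Lambda's default log group name is /aws/lambda/<function name>.
    return logs.put_metric_filter(
        logGroupName=f"/aws/lambda/{function_name}",
        filterName=f"{function_name}-errors",
        filterPattern=ERROR_FILTER_PATTERN,
        metricTransformations=[{
            "metricName": "AppErrors",
            "metricNamespace": "PetStore",
            "metricValue": "1",  # each matching log line adds 1
        }],
    )
```

Because every line carries `request_id`, a metric spike can be pivoted back to the exact invocations that produced it.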
13. But, what happens when?
Amazon API Gateway
Amazon DynamoDB
AWS Lambda
14. But, what happens when?
Amazon API Gateway
Amazon DynamoDB
AWS Lambda
• Correlate logs from API Gateway and Lambda via time stamp?
• Instrument Lambda to log a ton of extra output around my DynamoDB call?
16. But, what happens when?
Amazon API Gateway
Amazon DynamoDB
AWS Lambda
AWS Lambda
Amazon DynamoDB
Amazon SNS
18. Where we’re going we don’t need roads
OK so:
• We can’t install a traditional monitoring/metrics/APM agent on any “server”
• We could correlate logs across various systems ourselves, but that is tedious
• We could hand-craft an application-level event correlation system and maintain it until we die
Or
• We could use AWS X-Ray
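Rather than matching timestamps across systems, X-Ray correlates services through a trace header that travels with the request (`X-Amzn-Trace-Id`); inside a traced Lambda function the same value is exposed in the `_X_AMZN_TRACE_ID` environment variable. The parser below is a hypothetical helper, sketched for illustration, that extracts the root trace ID, for example to stamp your own log lines with it.

```python
import os


def parse_trace_header(header):
    # Example header:
    # "Root=1-5759e988-bd862e3fe1be46a994272793;Parent=53995c3f42cd8ad8;Sampled=1"
    fields = {}
    for part in header.split(";"):
        key, sep, value = part.strip().partition("=")
        if sep:  # skip empty or malformed parts
            fields[key] = value
    return fields


def current_trace_id(environ=os.environ):
    # Returns the root trace ID, or None when the function is not traced.
    header = environ.get("_X_AMZN_TRACE_ID")
    return parse_trace_header(header).get("Root") if header else None
```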
22. X-Ray Key Concepts
Trace: End-to-end data related to a single request across services
Segments: Portions of the trace that correspond to a single service
Sub-segments: Remote call or local compute sections within a service
Annotations: Business data that can be used to filter traces
Metadata: Business data that can be added to the trace but not used for filtering traces
Errors: Normalized error message and stack trace
Sampling: Percentage of requests to your application to capture as traces
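To show how these concepts fit together, here is a minimal sketch that builds one segment document with a nested subsegment, annotations and metadata. Field names follow the X-Ray segment document format; the service name `pets-api`, the `dynamodb` subsegment and the annotation and metadata keys are hypothetical, and the IDs are generated locally.

```python
import os
import time


def new_trace_id(now=None):
    # Trace ID: version "1", 8 hex digits of epoch seconds, 24 random hex digits.
    epoch = int(now if now is not None else time.time())
    return f"1-{epoch:08x}-{os.urandom(12).hex()}"


def new_entity_id():
    # Segment and subsegment IDs are 16 hex digits.
    return os.urandom(8).hex()


def build_segment(trace_id, start, end, pet_id, error=False):
    return {
        "name": "pets-api",        # segment: this service's part of the trace
        "id": new_entity_id(),
        "trace_id": trace_id,      # trace: shared by every segment of one request
        "start_time": start,
        "end_time": end,
        # Annotations are indexed, so traces can be filtered on them.
        "annotations": {"pet_id": pet_id},
        # Metadata is stored with the trace but is not indexed for filtering.
        "metadata": {"debug": {"handler_version": "v2"}},
        "error": error,            # flags a client-side (4xx) error
        "subsegments": [{
            "name": "dynamodb",    # sub-segment: a remote call within the service
            "id": new_entity_id(),
            "namespace": "aws",
            "start_time": start + 0.01,
            "end_time": end - 0.01,
        }],
    }
```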
28. X-Ray SDK w/ Lambda
Available for Java, Node.js, Python (Beta)
Adds filters to automatically capture metadata for calls to:
• AWS services using the AWS SDK
• Non-AWS services over HTTP and HTTPS
• Databases (MySQL, PostgreSQL, and Amazon DynamoDB)
• Queues (Amazon SQS)
Enables you to get started quickly without having to manually instrument your application code to log metadata about requests
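A minimal sketch of the Python SDK (`aws-xray-sdk`) in a Lambda handler, assuming active tracing is enabled on the function so Lambda creates the segment and the SDK adds subsegments to it. `patch_all()`, `xray_recorder.capture` and `put_annotation` are SDK calls; the `pets` table, the `pet_id` annotation and the `lookup_pet` subsegment name are hypothetical. The `ImportError` fallback exists only so the module can be imported where the SDK is not installed.

```python
try:
    from aws_xray_sdk.core import patch_all, xray_recorder

    # Patch supported libraries (e.g. boto3/botocore, requests) so their calls
    # are recorded as subsegments without changes to the calling code.
    patch_all()
    capture = xray_recorder.capture
except ImportError:
    xray_recorder = None

    def capture(name):
        # No-op stand-in for xray_recorder.capture when the SDK is missing.
        def decorator(fn):
            return fn
        return decorator


def pet_id_from(event):
    # API Gateway proxy events carry path variables in "pathParameters".
    return (event.get("pathParameters") or {}).get("id")


@capture("lookup_pet")
def lookup_pet(dynamodb, pet_id):
    if xray_recorder is not None:
        # Annotations on the current subsegment become filterable in X-Ray.
        xray_recorder.current_subsegment().put_annotation("pet_id", pet_id)
    # With patch_all() applied, this boto3 call is traced automatically.
    return dynamodb.get_item(TableName="pets", Key={"id": {"S": pet_id}})


def handler(event, context):
    import boto3

    item = lookup_pet(boto3.client("dynamodb"), pet_id_from(event))
    return {"statusCode": 200 if "Item" in item else 404}
```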
32. X-Ray API
Raw trace data is available using batch get APIs.
X-Ray provides a set of APIs to enable you to send, filter, and retrieve trace data.
You can send trace data directly to the service without having to use our SDKs (e.g., you can write your own SDKs for languages/frameworks not currently supported).
PutTraceSegments: Uploads segment documents to AWS X-Ray
BatchGetTraces: Retrieves a list of traces specified by ID
GetServiceGraph: Retrieves a document that describes services in your application and their connections
GetTraceSummaries: Retrieves IDs and metadata for traces available for a specified time frame using an optional filter
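A minimal sketch of calling these APIs through boto3's `xray` client: upload a segment document, find trace IDs with an annotation filter expression, then fetch the raw traces. The `pet_id` annotation key is hypothetical, and the per-call limit in `BATCH_SIZE` is an assumption to check against the current API reference. The client is passed in so the logic runs without AWS access.

```python
import json

# Assumption: BatchGetTraces accepts at most 5 trace IDs per call.
BATCH_SIZE = 5


def send_segment(xray, segment):
    # PutTraceSegments takes a list of JSON-encoded segment documents.
    resp = xray.put_trace_segments(TraceSegmentDocuments=[json.dumps(segment)])
    return resp.get("UnprocessedTraceSegments", [])


def find_trace_ids(xray, start, end, key, value):
    # GetTraceSummaries accepts a filter expression over annotations.
    expr = f'annotation.{key} = "{value}"'
    ids, token = [], None
    while True:
        kwargs = {"StartTime": start, "EndTime": end, "FilterExpression": expr}
        if token:
            kwargs["NextToken"] = token
        resp = xray.get_trace_summaries(**kwargs)
        ids.extend(s["Id"] for s in resp.get("TraceSummaries", []))
        token = resp.get("NextToken")
        if not token:
            return ids


def fetch_traces(xray, trace_ids):
    # BatchGetTraces returns the raw segment documents for each trace.
    traces = []
    for i in range(0, len(trace_ids), BATCH_SIZE):
        kwargs = {"TraceIds": trace_ids[i:i + BATCH_SIZE]}
        while True:
            resp = xray.batch_get_traces(**kwargs)
            traces.extend(resp.get("Traces", []))
            if not resp.get("NextToken"):
                break
            kwargs["NextToken"] = resp["NextToken"]
    return traces
```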
33. Rackspace Fleece
An open source Python SDK for X-Ray!
Repo: https://github.com/racker/fleece
Blog post: https://blog.rackspace.com/instrumenting-lambda-functions-x-ray
From the blog post:
• Automatically wrap all boto3 (AWS SDK) calls, and annotate metrics that help the X-Ray console visualizations.
• Automatically wrap all HTTP requests made through “requests” (HTTP library for Python)
• A decorator that can be applied to any function or method to report how long that particular block of code took to execute.
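The third bullet describes a timing decorator. The sketch below illustrates only the general idea and is NOT fleece's actual API: the `timed` name and the `TIMINGS` list (a hypothetical sink where a real tool would emit an X-Ray subsegment) are placeholders.

```python
import functools
import time

# Hypothetical sink for (label, seconds) pairs.
TIMINGS = []


def timed(name=None):
    def decorator(fn):
        label = name or fn.__qualname__

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                # Record the duration even when the function raises.
                TIMINGS.append((label, time.perf_counter() - start))
        return wrapper
    return decorator
```

Recording in a `finally` block means slow calls that fail still show up, which is often exactly what you are troubleshooting.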
34. X-Ray “gotchas”
• Currently no Active API Gateway support
• Still needs more and deeper database integration
• More AWS service integration
• Step Functions integration!
• Official C# SDKs
• NOT an APM replacement!!
• No monitoring
• No alarming
• No long-term/historical trend analysis
• Have to weigh the cost of running it constantly vs. turning it on only for specific troubleshooting exercises
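Sampling is the main knob in that cost trade-off. X-Ray's documented default rule records the first request each second plus 5% of additional requests; the class below is a simplified local model of that reservoir-plus-rate behaviour, sketched for illustration, not the SDK's implementation.

```python
import random


class ReservoirSampler:
    def __init__(self, fixed_target=1, rate=0.05, rng=None):
        self.fixed_target = fixed_target  # requests always traced per second
        self.rate = rate                  # fraction of the remaining requests
        self.rng = rng or random.Random()
        self._second = None
        self._used = 0

    def should_trace(self, now):
        second = int(now)
        if second != self._second:
            # New one-second window: refill the reservoir.
            self._second, self._used = second, 0
        if self._used < self.fixed_target:
            self._used += 1
            return True
        return self.rng.random() < self.rate
```

Keeping `rate` low during normal operation and raising it while investigating a specific problem is one way to act on the trade-off above.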