© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Distributed Stack Tracing and
Monitoring for Serverless Applications
Hint: The “serverless” part makes it different
Balaji Iyer, Cloud Infrastructure Architect
AWS Professional Services
Development transformation at Amazon: 2001+
Before 2001: monolithic application + teams
After 2001: microservices + 2 pizza teams
Event Driven Computing maps well to Microservices
Microservices Architecture
GET /pets
PUT /pets
DELETE /pets
GET /describe/pet/$id
PUT /describe/pet/$id
Events
Event Driven Compute maps well to FaaS
GET /pets
PUT /pets
DELETE /pets
GET /describe/pet/$id
PUT /describe/pet/$id
Events
Functions As A Service
Serverless application
EVENT SOURCE: changes in data state, requests to endpoints, changes in resource state
FUNCTION: Node.js, Python, Java, C#
SERVICES (ANYTHING)
Let’s get to the “Stack Tracing
and Monitoring” part!
I’ll just install some agents on the ser...
Monitoring a serverless application
Traditional host-based monolithic App
• Agents passively monitoring “availability” of hosts
• We have a “pulse” we can monitor
• The lack of data often signals a problem
• Signal to noise ratio of individual application components to the whole becomes hard to break apart
Serverless App
• Resources come and go on demand
• Resources only exist during execution time, so no pulse
• We only get data during an execution aka no “idle”
• Signal specific to event execution and the resources touched as part of that. Can more easily measure Event A vs. Event B
The serverless compute manifesto
Functions are the unit of deployment and scaling.
No machines, VMs, or containers visible in the programming model.
Permanent storage lives elsewhere.
Scales per request. Users cannot over- or under-provision capacity.
Never pay for idle (no cold servers/containers or their costs).
Implicitly fault-tolerant because functions can run anywhere.
BYOC – Bring your own code.
Metrics and logging are a universal right.
Metrics and logging are a universal right!
CloudWatch Metrics:
• 6 built-in metrics for Lambda
  • Invocation Count, Invocation Duration, Invocation Errors, Throttled Invocations, Iterator Age, DLQ Errors
  • Can call “put-metric-data” from your function code for custom metrics
• 7 built-in metrics for API Gateway
  • API Calls Count, Latency, 4XXs, 5XXs, Integration Latency, Cache Hit Count, Cache Miss Count
  • Error and Cache metrics now support averages and percentiles
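The “put-metric-data” bullet can be sketched as follows. This is a minimal sketch, not the presenter's code: it builds the `MetricData` entry that the CloudWatch `PutMetricData` call expects and sends it through an injected client. The namespace `SamDance/Custom`, the metric name `DynamoDBQueryTime`, and the `RecordingClient` stand-in are assumptions made up for illustration; in a real function the client would be `boto3.client("cloudwatch")`.

```python
import datetime


def build_metric_datum(function_name, metric_name, value, unit="Milliseconds"):
    """Build one entry for the MetricData list that PutMetricData expects."""
    return {
        "MetricName": metric_name,
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Timestamp": datetime.datetime.now(datetime.timezone.utc),
        "Value": float(value),
        "Unit": unit,
    }


def publish_custom_metric(cloudwatch, namespace, datum):
    """Send a single datum; `cloudwatch` is a boto3 CloudWatch client or a stand-in."""
    return cloudwatch.put_metric_data(Namespace=namespace, MetricData=[datum])


class RecordingClient:
    """Offline stand-in (assumption) so the sketch runs without AWS credentials."""

    def __init__(self):
        self.calls = []

    def put_metric_data(self, **kwargs):
        self.calls.append(kwargs)
        return {}


client = RecordingClient()
datum = build_metric_datum("GetSAMPartyCount", "DynamoDBQueryTime", 12.5)
publish_custom_metric(client, "SamDance/Custom", datum)
print(client.calls[0]["Namespace"], datum["MetricName"], datum["Value"])
```

Passing the client in as a parameter keeps the handler testable offline; each `put_metric_data` call is also a network round trip, so it may be worth batching several data points per call.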
Metrics and logging are a universal right!
CloudWatch Logs:
• API Gateway Logging
  • 2 levels of logging, ERROR and INFO
  • Optionally log method request/body content
  • Set globally in stage, or override per method
• Lambda Logging
  • Logging directly from your code with your language’s equivalent of console.log()
  • Basic request information included
• Log Pivots
  • Build metrics based on log filters
  • Jump to logs that generated metrics
  • Export logs to Amazon Elasticsearch Service or S3
  • Explore with Kibana or Athena/QuickSight
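The Lambda logging and log-filter bullets can be sketched like this. It is a minimal sketch assuming a Python function: anything printed to stdout ends up in CloudWatch Logs, and writing one JSON object per line makes the entries easier to match with a metric filter. The handler body, the field names, and `FakeContext` are hypothetical; `aws_request_id` is the attribute the Lambda Python context object exposes.

```python
import json
import time


def log_event(level, message, **fields):
    """Print one JSON object per line; Lambda forwards stdout to CloudWatch Logs."""
    record = {"level": level, "message": message, **fields}
    line = json.dumps(record, sort_keys=True)
    print(line)
    return line


def handler(event, context):
    """Hypothetical Lambda handler that times a unit of work and logs the result."""
    started = time.time()
    count = len(event.get("items", []))  # stand-in for real work, e.g. a DynamoDB call
    elapsed_ms = round((time.time() - started) * 1000, 3)
    log_event("INFO", "party count computed",
              request_id=context.aws_request_id, count=count, elapsed_ms=elapsed_ms)
    return {"statusCode": 200, "body": json.dumps({"count": count})}


class FakeContext:
    """Offline stand-in (assumption) for the context object Lambda passes in."""
    aws_request_id = "992ef7c3-3f13-11e7-a916-69715be648e7"


response = handler({"items": ["a", "b", "c"]}, FakeContext())
```

With JSON lines like these, a CloudWatch Logs metric filter pattern such as `{ $.level = "ERROR" }` could count error entries. Logging the request ID in every line is what makes manual correlation possible at all.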
But, what happens when?
Diagram: Amazon API Gateway, AWS Lambda, Amazon DynamoDB
• Correlate logs from API Gateway and Lambda via time stamp?
• Tool Lambda to log a ton of extra output around my DynamoDB call?
But, what happens when?
Diagram (extended): Amazon API Gateway, AWS Lambda, Amazon DynamoDB, AWS Lambda, Amazon DynamoDB, Amazon SNS
UGH.
Where we’re going we don’t need roads
OK so:
• We can’t install a traditional monitoring/metrics/APM agent on any “server”
• We could correlate logs across various systems ourselves but that is tedious
• We could hand craft an application level event correlation system and maintain it until we die
Or
• We could use AWS X-Ray
How X-Ray works
{
  "Document": {
    "id": "1002f461023e348d",
    "name": "sam-dance-serverless-stack-GetSAMPartyCount",
    "start_time": 1495473906.507,
    "end_time": 1495473906.515,
    "http": {
      "response": {
        "status": 200
      }
    },
    "aws": {
      "request_id": "992ef7c3-3f13-11e7-a916-69715be648e7"
    },
    "trace_id": "1-59231ef2-66ca280188e8d5696973ea32",
    "origin": "AWS::Lambda",
    "resource_arn": "arn:aws:lambda:us-east-1:1234567890:function:sam-dance-serverless-stack-GetSAMPartyCount"
  },
  "Id": "1002f461023e348d"
}
Callouts on the segment document: Timing, Outcome, Overall Trace, Origin of Trace, Resource, Segment
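To make the callouts concrete, here is a minimal sketch that reads the segment document from this slide and pulls out the timing, outcome, and origin. The helper name `summarize_segment` is an assumption. The sketch also decodes the trace ID: X-Ray trace IDs have the form version, hex epoch seconds of the original request, and a 24-hex-digit identifier, so the middle part gives the start time of the overall trace.

```python
import datetime
import json

# Trimmed copy of the segment document shown on the slide.
SEGMENT_JSON = """
{
  "Document": {
    "id": "1002f461023e348d",
    "name": "sam-dance-serverless-stack-GetSAMPartyCount",
    "start_time": 1495473906.507,
    "end_time": 1495473906.515,
    "http": {"response": {"status": 200}},
    "trace_id": "1-59231ef2-66ca280188e8d5696973ea32",
    "origin": "AWS::Lambda"
  },
  "Id": "1002f461023e348d"
}
"""


def summarize_segment(segment):
    """Extract timing, outcome, and origin from one segment (hypothetical helper)."""
    doc = segment["Document"]
    if isinstance(doc, str):  # the raw API returns Document as a JSON string
        doc = json.loads(doc)
    _version, epoch_hex, _unique = doc["trace_id"].split("-")
    return {
        "name": doc["name"],
        "origin": doc["origin"],
        "status": doc.get("http", {}).get("response", {}).get("status"),
        "duration_ms": round((doc["end_time"] - doc["start_time"]) * 1000, 3),
        "trace_started": datetime.datetime.fromtimestamp(
            int(epoch_hex, 16), datetime.timezone.utc),
    }


summary = summarize_segment(json.loads(SEGMENT_JSON))
print(summary)
```

The decoded trace start falls in the same second as the segment's `start_time`, which is a quick sanity check when reading raw trace data by hand.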
X-Ray Key Concepts
Trace: End-to-end data related to a single request across services
Segments: Portions of the trace that correspond to a single service
Sub-segments: Remote call or local compute sections within a service
Annotations: Business data that can be used to filter traces
Metadata: Business data that can be added to the trace but not used for filtering traces
Errors: Normalized error message and stack trace
Sampling: Percentage of requests to your application to capture as traces
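These concepts map onto fields of a segment document. The sketch below builds one by hand, under stated assumptions: the helper names, the segment name `party-service`, and the annotation and metadata values are made up, while the field names (`annotations`, `metadata`, `subsegments`, `namespace`) follow the X-Ray segment document format.

```python
import json
import secrets
import time


def new_trace_id(now=None):
    """Trace ID format: version 1, hex epoch seconds, 24 random hex digits."""
    epoch = int(now if now is not None else time.time())
    return "1-{:08x}-{}".format(epoch, secrets.token_hex(12))


def new_segment(name, trace_id, start, end, annotations=None, metadata=None):
    """One segment: the work a single service did for this request."""
    return {
        "name": name,
        "id": secrets.token_hex(8),          # 16 hex digits
        "trace_id": trace_id,
        "start_time": start,
        "end_time": end,
        "annotations": annotations or {},    # indexed, usable for filtering traces
        "metadata": metadata or {},          # stored with the trace, not filterable
        "subsegments": [],
    }


def add_subsegment(segment, name, start, end, namespace="aws"):
    """One subsegment: a remote call or local compute section inside the segment."""
    sub = {"name": name, "id": secrets.token_hex(8),
           "start_time": start, "end_time": end, "namespace": namespace}
    segment["subsegments"].append(sub)
    return sub


trace_id = new_trace_id(1495473906)
segment = new_segment("party-service", trace_id, 1495473906.507, 1495473906.515,
                      annotations={"party": "sam-dance"},
                      metadata={"debug": {"attempt": 1}})
add_subsegment(segment, "DynamoDB", 1495473906.509, 1495473906.513)
print(json.dumps(segment, indent=2))
```

The practical rule of thumb is to put values you want to search on in annotations and everything else in metadata.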
A look at a Service Map
{
  "Duration": 0.026000022888183594,
  "Id": "1-59238e78-b3942ccc530e037059499e46",
  "Segments": [
    {
      "Document": {
        "id": "29d3d9056fbadc37",
        "name": "sam-dance-serverless-stack-GetSAMPartyCount-10UUDT1RTN0OV",
        "start_time": 1495502456.046,
        "end_time": 1495502456.056,
        "parent_id": "65126414694bafc0",
        "aws": {
          "function_arn": "arn:aws:lambda:us-east-1:1234567890:function:sam-dance-serverless-stack-GetSAMPartyCount-10UUDT1RTN0OV",
          "resource_names": [
            "sam-dance-serverless-stack-GetSAMPartyCount-10UUDT1RTN0OV"
          ],
          "account_id": "102901597367"
        },
        "trace_id": "1-59238e78-b3942ccc530e037059499e46",
        "origin": "AWS::Lambda::Function"
      },
      "Id": "29d3d9056fbadc37"
    },
    {
      "Document": {
        "id": "65126414694bafc0",
        "name": "sam-dance-serverless-stack-GetSAMPartyCount-10UUDT1RTN0OV",
        "start_time": 1495502456.035,
        "end_time": 1495502456.061,
        "http": {
          "response": {
            "status": 200
          }
        },
        "aws": {
          "request_id": "120787e2-3f56-11e7-856b-2d9012b4fffd"
        },
        "trace_id": "1-59238e78-b3942ccc530e037059499e46",
        "origin": "AWS::Lambda",
        "resource_arn": "arn:aws:lambda:us-east-1:1234567890:function:sam-dance-serverless-stack-GetSAMPartyCount-10UUDT1RTN0OV"
      },
      "Id": "65126414694bafc0"
    }
  ]
}
Callouts: AWS::Lambda::Function segment, AWS::Lambda segment, whole trace
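The two segments in this trace are linked by `parent_id`: the AWS::Lambda::Function segment points at the AWS::Lambda segment. A minimal sketch of rebuilding that tree and printing per-segment durations follows. The function names are assumptions, and the data is a trimmed copy of the trace on this slide.

```python
import json

# Trimmed copy of the trace on the slide: ids, parent link, origin, timing.
TRACE = {
    "Id": "1-59238e78-b3942ccc530e037059499e46",
    "Segments": [
        {"Document": {"id": "29d3d9056fbadc37", "parent_id": "65126414694bafc0",
                      "origin": "AWS::Lambda::Function",
                      "start_time": 1495502456.046, "end_time": 1495502456.056}},
        {"Document": {"id": "65126414694bafc0", "origin": "AWS::Lambda",
                      "start_time": 1495502456.035, "end_time": 1495502456.061}},
    ],
}


def load_documents(trace):
    """Index segment documents by id; parse them if they arrive as JSON strings."""
    docs = {}
    for segment in trace["Segments"]:
        doc = segment["Document"]
        if isinstance(doc, str):
            doc = json.loads(doc)
        docs[doc["id"]] = doc
    return docs


def render_trace(trace):
    """Return indented lines, parents first, with each segment's duration in ms."""
    docs = load_documents(trace)
    children, roots = {}, []
    for doc in docs.values():
        parent = doc.get("parent_id")
        if parent in docs:
            children.setdefault(parent, []).append(doc["id"])
        else:
            roots.append(doc["id"])

    lines = []

    def walk(node_id, depth):
        doc = docs[node_id]
        ms = round((doc["end_time"] - doc["start_time"]) * 1000)
        lines.append("{}{} {} {} ms".format("  " * depth, doc["origin"], doc["id"], ms))
        for child in children.get(node_id, []):
            walk(child, depth + 1)

    for root in roots:
        walk(root, 0)
    return lines


lines = render_trace(TRACE)
print("\n".join(lines))
```

The gap between the outer and inner durations is time spent in the Lambda service around the function's own execution, which is the kind of detail that is hard to get from logs alone.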
A look at a Service Map
I broke it.
What did I break?
X-Ray SDK w/ Lambda
Available for Java, Node.js, Python (Beta)
Adds filters to automatically capture metadata for calls to:
• AWS services using the AWS SDK
• Non-AWS services over HTTP and HTTPS
• Databases (MySQL, PostgreSQL, and Amazon DynamoDB)
• Queues (Amazon SQS)
Enables you to get started quickly without having to manually instrument your application code to log metadata about requests
Application instrumentation (Node.js)
X-Ray API
Raw trace data is available using batch get APIs.
X-Ray provides a set of APIs to enable you to send, filter, and retrieve trace data.
You can send trace data directly to the service without having to use our SDKs (e.g. you can write your own SDKs for languages/frameworks not currently supported).
PutTraceSegments: Uploads segment documents to AWS X-Ray
BatchGetTraces: Retrieves a list of traces specified by ID
GetServiceGraph: Retrieves a document that describes services in your application and their connections
GetTraceSummaries: Retrieves IDs and metadata for traces available for a specified time frame using an optional filter
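A minimal sketch of how these operations fit together from Python follows. The method and parameter names match the boto3 X-Ray client (`put_trace_segments`, `get_trace_summaries`, `batch_get_traces`). The helper names, the `FakeXRay` stand-in, and the batch size of 5 are assumptions, so check the API reference for the actual per-call limit. In real use the client would be `boto3.client("xray")`.

```python
import json


def send_segment(xray, segment_doc):
    """PutTraceSegments takes a list of JSON strings, one per segment document."""
    return xray.put_trace_segments(TraceSegmentDocuments=[json.dumps(segment_doc)])


def find_trace_ids(xray, start, end, filter_expression=None):
    """Page through GetTraceSummaries and collect the matching trace IDs."""
    kwargs = {"StartTime": start, "EndTime": end}
    if filter_expression:
        kwargs["FilterExpression"] = filter_expression
    ids = []
    while True:
        page = xray.get_trace_summaries(**kwargs)
        ids.extend(summary["Id"] for summary in page.get("TraceSummaries", []))
        token = page.get("NextToken")
        if not token:
            return ids
        kwargs["NextToken"] = token


def fetch_traces(xray, trace_ids, batch_size=5):
    """Fetch full traces in small batches (batch size is an assumption)."""
    traces = []
    for i in range(0, len(trace_ids), batch_size):
        page = xray.batch_get_traces(TraceIds=trace_ids[i:i + batch_size])
        traces.extend(page.get("Traces", []))
    return traces


class FakeXRay:
    """Offline stand-in (assumption) so the sketch runs without AWS credentials."""

    def __init__(self):
        self.sent = []

    def put_trace_segments(self, TraceSegmentDocuments):
        self.sent.extend(TraceSegmentDocuments)
        return {"UnprocessedTraceSegments": []}

    def get_trace_summaries(self, **kwargs):
        if "NextToken" not in kwargs:
            return {"TraceSummaries": [{"Id": "1-59238e78-b3942ccc530e037059499e46"}],
                    "NextToken": "page2"}
        return {"TraceSummaries": [{"Id": "1-59231ef2-66ca280188e8d5696973ea32"}]}

    def batch_get_traces(self, TraceIds):
        return {"Traces": [{"Id": trace_id, "Segments": []} for trace_id in TraceIds]}


xray = FakeXRay()
send_segment(xray, {"name": "party-service", "id": "1002f461023e348d",
                    "trace_id": "1-59231ef2-66ca280188e8d5696973ea32",
                    "start_time": 1495473906.507, "end_time": 1495473906.515})
trace_ids = find_trace_ids(xray, 1495473900, 1495502500, filter_expression="error")
traces = fetch_traces(xray, trace_ids)
print(len(xray.sent), trace_ids, len(traces))
```

The usual pattern is two steps: summaries first to find interesting trace IDs (optionally with a filter expression), then the batch call to pull the raw segments for just those IDs.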
Rackspace Fleece
An open source Python SDK for X-Ray!
Repo: https://github.com/racker/fleece
Blog post: https://blog.rackspace.com/instrumenting-lambda-functions-x-ray
From the blog post:
• Automatically wrap all boto3 (AWS SDK) calls, and annotate metrics that help the X-Ray console visualizations.
• Automatically wrap all HTTP requests made through “requests” (HTTP library for Python)
• A decorator that can be applied to any function or method to report how long that particular block of code took to execute.
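The decorator idea in the last bullet can be illustrated with the standard library alone. This is not Fleece's API; it is a minimal sketch of the general technique, with a hypothetical `timed_subsegment` decorator that records a subsegment-like dict into an in-memory list instead of sending it to X-Ray.

```python
import functools
import time

RECORDED = []  # hypothetical in-memory sink; a real SDK would emit subsegments to X-Ray


def timed_subsegment(name=None):
    """Record how long the wrapped function took and whether it raised."""
    def decorator(func):
        label = name or func.__name__

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            record = {"name": label, "start_time": time.time(), "error": False}
            try:
                return func(*args, **kwargs)
            except Exception:
                record["error"] = True
                raise
            finally:
                # Runs on both success and failure, so every call is recorded.
                record["end_time"] = time.time()
                RECORDED.append(record)
        return wrapper
    return decorator


@timed_subsegment("count_party_guests")
def count_party_guests(guests):
    return len(guests)


@timed_subsegment()
def always_fails():
    raise ValueError("broken on purpose")


result = count_party_guests(["sam", "alex"])
try:
    always_fails()
except ValueError:
    pass
print(result, [(r["name"], r["error"]) for r in RECORDED])
```

Recording in the `finally` block is the key design choice: a failed call still produces a timed record, which is usually the one you most want to see in a trace.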
X-Ray “gotchas”
• Currently no Active API Gateway support
• Still needs more and deeper database integration
• More AWS service integration
• Step Functions integration!
• Official C# SDKs
• NOT an APM replacement!!
  • No monitoring
  • No alarming
  • No long-term/historical trend analysis
• Have to weigh the cost of it constantly running vs. turning it on for specific troubleshooting exercises
DEMO!
aws.amazon.com/serverless
Thank you!
Balaji Iyer
balaiyer@amazon.com
@balajit
?
https://secure.flickr.com/photos/dullhunk/202872717/

Distributed Stack Tracing and Monitoring for Serverless Applications - August 2017 AWS Online Tech Talks
