
Troubleshooting serverless applications

In this talk, we will discuss some tips for alerting around your serverless application, and different approaches to troubleshooting issues in your serverless application: using first-party tools from AWS; using custom-built solutions; or using a serverless monitoring solution.

Published in: Technology

Troubleshooting serverless applications

  1. Serverless Application Troubleshooting
  2. I watch a lot of TV shows…
  3. protagonist is shot
  4. 3 hours earlier… protagonist is shot
  5. 3 hours earlier… protagonist is shot
  6. 3 hours earlier… protagonist emerges victoriously protagonist is shot
  7. happened
  8. happened user impact
  9. happened system repaired user impact
  10. happened system repaired user impact goal: to fail without users noticing
  11. happened system repaired user impact reduce MTTR
  12. Yan Cui http://theburningmonk.com @theburningmonk Developer Advocate @ Independent Consultant AWS user since 2009 since 2018 yan@lumigo.io
  13. What do you mean by ‘serverless’?
  14. “Serverless”
  15. Gojko Adzic: “It is serverless the same way WiFi is wireless.” http://bit.ly/2yQgwwb
  16. Serverless means… don’t pay for it if no-one uses it; don’t need to worry about scaling; don’t need to provision and manage servers
  17. in other words, it’s a lot like taking a cab
  18. Ownership Fuel Navigate To get there! Focus on getting there!
  19. HW Ownership OS Runtime & Scale Code Focus on getting there! Physical Servers Virtual Machines Containers Serverless
  20. Nano Services Self Managed Cost Paradigm Change Async Dynamic agile env
  21. happened system repaired user impact reduce MTTR
  22. Identify & Resolve Issues; Understanding costs; Visibility
  23. Identify & Resolve Issues; Understanding costs; Visibility
  24. happened system repaired user impact MTTDiscovery
  25. “What alerts should I have?”
  26. It depends on what you’re building…
  27. But, this is a good starting point
  28. Lambda: error rate %, throttle count, DLR error count, iterator age, regional concurrency
  29. Lambda: error rate %, throttle count, DLR error count, iterator age, regional concurrency. API Gateway: p90/95/99 latency, success rate %, 4xx rate %, 5xx rate %
  30. API Gateway: p90/95/99 latency, success rate %, 4xx rate %, 5xx rate %. SQS: message age. Lambda: error rate %, throttle count, DLR error count, iterator age, regional concurrency
  31. API Gateway: p90/95/99 latency, success rate %, 4xx rate %, 5xx rate %. SQS: message age. Step Functions: failed count, throttle count, timed out count. Lambda: error rate %, throttle count, DLR error count, iterator age, regional concurrency
  32. SQS: message age. Step Functions: failed count, throttle count, timed out count. API Gateway: p90/95/99 latency, success rate %, 4xx rate %, 5xx rate %. Lambda: error rate %, throttle count, DLR error count, iterator age, regional concurrency
  33. “Can’t you codify these?”
  34. Identify & Resolve Issues; Understanding costs; Visibility
  35. happened system repaired user impact finding root cause
  36. option 1: CloudWatch & friends
  37. https://lumigo.io/blog/getting-the-most-out-of-cloudwatch-logs/
  38. Pros: Out of the box, No overhead, Comparatively cheap, AWS support
  39. Pros: Out of the box, No overhead, Comparatively cheap, AWS support. Cons: Complicated
  40. https://lumigo.io/blog/serverless-applications-automate-chores-cloudwatch-logs/
  41. Pros: Out of the box, No overhead, Comparatively cheap, AWS support. Cons: Complicated, Hard to query*. * Insights improved things drastically, but still a gap to ELK
  42. https://lumigo.io/blog/how-to-monitor-lambda-with-cloudwatch-metrics/
  43. Pros: Out of the box, Source of truth, No overhead*, Comparatively cheap, AWS support. * unless you record custom metrics synchronously
  44. Pros: Out of the box, Source of truth, No overhead*, Comparatively cheap, AWS support. Cons: Missing metrics**, Lambda percentile latencies don’t work, Only granular to 1 min, No query language. * unless you record custom metrics synchronously. ** can compensate with custom metrics/metric filters, etc.
  45. Pros: Out of the box, SDK, No overhead, Comparatively cheap, AWS support
  46. Pros: Out of the box, SDK, No overhead, Comparatively cheap, AWS support. Cons: Poor async support
  47. Pros: Out of the box, SDK, No overhead, Comparatively cheap, AWS support. Cons: Poor async support, No auto-instrumentation, Bad DX (for node.js), Poor documentation
  48. option 2: custom built solutions
  49. https://github.com/getndazn/dazn-lambda-powertools
  50. Structured Logging
  51. Structured Logging, Sampling
  52. Structured Logging, Sampling, Correlation IDs
  53. Structured Logging, Sampling, Correlation IDs, Auto “instrumentation”
  54. Structured Logging, Sampling, Correlation IDs, Auto “instrumentation”, Support async events
  55. enrich the usefulness of your logs
  56. https://theburningmonk.com/2017/08/centralised-logging-for-aws-lambda/
  57. https://theburningmonk.com/2018/07/centralised-logging-for-aws-lambda-revised-2018/
  58. Pros: Tailor fit, Free!
  59. Pros: Tailor fit, Free! Cons: Very high-touch, Not all services are supported equally, Tailor fit (for someone else…)
  60. option 3: serverless monitoring solutions
  61. Pros: SaaS, Serverless focus, More than just tracing, Very low touch. Cons: Yet another 3rd party, More than just tracing
  62. Takeaways: Serverless is a game-changer; Serverless has challenges; Options for troubleshooting serverless applications
  63. https://info.lumigo.io/serverless-consulting Start off on the right foot
  64. @theburningmonk theburningmonk.com github.com/theburningmonk yan@lumigo.io
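
Slide 33 asks whether the starting-point alerts from slides 28 to 32 can be codified. The slides do not show how, so the following is only a minimal sketch of one way to do it: describe each alert as data and generate CloudWatch alarm definitions from it. The helper names (`build_alarm`, `lambda_alarms`, `api_gateway_alarms`) and all thresholds are hypothetical placeholders, and mapping "DLR error count" to the `DeadLetterErrors` metric is an assumption. The sketch alarms on raw counts; the percentage rates on the slides would need metric math, which is left out here.

```python
from typing import Any, Dict, List, Tuple

# (namespace, metric name, statistic, threshold).
# Thresholds are illustrative placeholders, not recommendations from the talk.
MetricSpec = Tuple[str, str, str, float]

LAMBDA_METRICS: List[MetricSpec] = [
    ("AWS/Lambda", "Errors", "Sum", 1),
    ("AWS/Lambda", "Throttles", "Sum", 1),
    ("AWS/Lambda", "DeadLetterErrors", "Sum", 1),       # assumed match for "DLR error count"
    ("AWS/Lambda", "IteratorAge", "Maximum", 60_000),   # milliseconds
]

API_GATEWAY_METRICS: List[MetricSpec] = [
    ("AWS/ApiGateway", "Latency", "p99", 1000),         # milliseconds
    ("AWS/ApiGateway", "5XXError", "Sum", 1),
]


def build_alarm(spec: MetricSpec, dimension_name: str, dimension_value: str) -> Dict[str, Any]:
    """Turn one metric spec into keyword arguments for a CloudWatch alarm."""
    namespace, metric, statistic, threshold = spec
    alarm: Dict[str, Any] = {
        "AlarmName": f"{dimension_value}-{metric}",
        "Namespace": namespace,
        "MetricName": metric,
        "Dimensions": [{"Name": dimension_name, "Value": dimension_value}],
        "Period": 60,
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
    }
    # CloudWatch takes percentiles via ExtendedStatistic, everything else via Statistic.
    key = "ExtendedStatistic" if statistic.startswith("p") else "Statistic"
    alarm[key] = statistic
    return alarm


def lambda_alarms(function_name: str) -> List[Dict[str, Any]]:
    """Alarm definitions for one Lambda function."""
    return [build_alarm(spec, "FunctionName", function_name) for spec in LAMBDA_METRICS]


def api_gateway_alarms(api_name: str) -> List[Dict[str, Any]]:
    """Alarm definitions for one API Gateway REST API."""
    return [build_alarm(spec, "ApiName", api_name) for spec in API_GATEWAY_METRICS]
```

Each generated dictionary uses parameter names accepted by the CloudWatch `PutMetricAlarm` API, so with boto3 installed it could be passed along as `cloudwatch.put_metric_alarm(**alarm)`. Keeping the specs as plain data makes it easy to apply the same baseline to every function in an account.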

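Slides 50 to 54 list structured logging, sampling and correlation IDs as the building blocks of a custom-built solution. The sketch below shows roughly how the three fit together. It is not the API of dazn-lambda-powertools (which targets Node.js); the class `StructuredLogger`, the `x-correlation-id` header name and the 1% default sample rate are all assumptions made for illustration.

```python
import json
import random
import uuid
from typing import Any, Callable, Dict, Optional


def correlation_id_from_event(event: Dict[str, Any]) -> Optional[str]:
    """Read a correlation ID from an API Gateway style event (hypothetical header name)."""
    headers = event.get("headers") or {}
    return headers.get("x-correlation-id")


class StructuredLogger:
    """Writes one JSON object per log line and samples debug logs per invocation."""

    def __init__(
        self,
        correlation_id: Optional[str] = None,
        debug_sample_rate: float = 0.01,
        rng: Callable[[], float] = random.random,
    ) -> None:
        # Reuse the caller's ID if there is one, otherwise start a new chain.
        self.correlation_id = correlation_id or str(uuid.uuid4())
        # Decide once per invocation, so a sampled invocation keeps all its debug logs.
        self.debug_enabled = rng() < debug_sample_rate

    def _emit(self, level: str, message: str, fields: Dict[str, Any]) -> str:
        record = {"level": level, "message": message, "correlation_id": self.correlation_id}
        record.update(fields)
        line = json.dumps(record)
        print(line)  # in Lambda, stdout ends up in CloudWatch Logs
        return line

    def info(self, message: str, **fields: Any) -> str:
        return self._emit("INFO", message, fields)

    def debug(self, message: str, **fields: Any) -> Optional[str]:
        if not self.debug_enabled:
            return None
        return self._emit("DEBUG", message, fields)
```

A handler would typically build the logger from the incoming event, for example `StructuredLogger(correlation_id_from_event(event))`, and forward `logger.correlation_id` on any outgoing call so the next function can pick it up.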