This deck explains how different observability tools each give visibility into a system from a different perspective and, taken together, give a holistic view of what is happening. It walks through scenarios showing what each tool (Real User Monitoring, Synthetics, Tracing, Application and Infrastructure Monitoring, and Logs) would and would not detect. The key message is that combining the tools is what gets you to complete, or "100%", visibility, because each one shows what went wrong from a different angle.
Gain Maximum Visibility into Your Applications - DEM04 - Atlanta AWS Summit
1. GAIN MAXIMUM VISIBILITY
HOLISTICALLY VIEWING SYSTEMS
2. AMBIGUOUS CYLINDERS
PERSPECTIVE MATTERS
3. BENJAMIN SMITH
SR. SOFTWARE ENGINEER WORKING ON OUR ENG TOOLS TEAM BUILDING CI & CD SOLUTIONS
DAD TO 3 KIDS, 2 DOGS, 2 CATS AND LOTS OF SOFTWARE BUGS
LIKES TO DRIVE FASTER THAN MOST.
GH: BENJAMINWS
EM: BENJAMIN.SMITH@DATADOGHQ.COM
4. DATADOG
SAAS-BASED MONITORING
TRILLIONS OF POINTS/DAY
OPEN SOURCE CITIZENS
@datadoghq
5. WHERE ARE WE GETTING VISIBILITY?
11. INFRASTRUCTURE VISIBILITY
The Data
• Metrics
• Logs
The Tools
• Infrastructure Monitoring
• Log Management
12. VALUE-BASED DATA
WHAT IS A METRIC?
13. METRICS
• Often combined or aggregated
• Useful for spotting trends/patterns
• Send alerts from metrics
• Help catch known unknowns
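As a rough illustration of the bullets above, the sketch below aggregates raw samples into one value per time window and alerts when that value crosses a threshold. The sample data, the `aggregate` and `should_alert` names, and the 500 ms threshold are hypothetical, not taken from the talk.

```python
from statistics import mean

def aggregate(samples, window_seconds=60):
    """Group (timestamp, value) samples into fixed windows and average each one."""
    buckets = {}
    for timestamp, value in samples:
        window_start = int(timestamp // window_seconds) * window_seconds
        buckets.setdefault(window_start, []).append(value)
    return {start: mean(values) for start, values in sorted(buckets.items())}

def should_alert(aggregated, threshold_ms=500):
    """Return the window start times whose average exceeds the threshold."""
    return [start for start, avg in aggregated.items() if avg > threshold_ms]

# Hypothetical request latencies: (seconds since start, latency in ms)
samples = [(0, 120), (15, 140), (45, 130), (60, 700), (75, 900), (110, 650)]
per_minute = aggregate(samples)   # one averaged value per 60-second window
alerts = should_alert(per_minute) # windows that breach the threshold
```

The threshold encodes a failure mode someone thought of in advance, which is why metrics are good at catching known unknowns.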
14. LOGS
• Event-based
• Easy to read & grep or parse
• Ideally verbose & structured
• Useful for finding details of an event
• Help catch unknown unknowns
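A minimal sketch of what "verbose & structured" can look like in practice: one JSON object per event, which stays greppable as text and parseable as data. The field names (`user_id`, `photo_id`, `status`) and the helper functions are assumptions made up for this example.

```python
import json

def log_event(stream, level, message, **fields):
    """Append one JSON object per line so the log can be grepped or parsed."""
    record = {"level": level, "message": message, **fields}
    stream.append(json.dumps(record, sort_keys=True))

def find_events(stream, **criteria):
    """Parse each line and keep the events whose fields match all criteria."""
    events = (json.loads(line) for line in stream)
    return [e for e in events if all(e.get(k) == v for k, v in criteria.items())]

log_lines = []  # stands in for a log file or a log management pipeline
log_event(log_lines, "info", "photo served", user_id=42, photo_id="pup-7", status=200)
log_event(log_lines, "error", "photo fetch failed", user_id=42, photo_id="pup-9", status=503)
log_event(log_lines, "info", "photo served", user_id=7, photo_id="pup-7", status=200)

failures = find_events(log_lines, level="error")  # details of each failed event
user_42 = find_events(log_lines, user_id=42)      # everything one user experienced
```

Because every event keeps its full context, questions nobody anticipated (the unknown unknowns) can still be answered after the fact by filtering on any field.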
15. BACKEND VISIBILITY
The Data
• Metrics
• Logs
• Traces
The Tools
• Application Monitoring
• Log Management
• APM
16. TRACES
• Request-based
• Follow activity from a request across function and service calls
• Useful for following code to answer “Where?” and “How long?”
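To make "Where?" and "How long?" concrete, here is a toy tracer that records a span for each timed block along with its parent. It is a simplified sketch, not how any particular APM product is implemented. The `Tracer` class, the span names, and the sleep durations are invented for illustration.

```python
import time
from contextlib import contextmanager

class Tracer:
    """Toy tracer: records a span (name, parent, duration) for each timed block."""

    def __init__(self):
        self.spans = []
        self._stack = []  # names of the spans that are currently open

    @contextmanager
    def span(self, name):
        parent = self._stack[-1] if self._stack else None
        self._stack.append(name)
        start = time.perf_counter()
        try:
            yield
        finally:
            duration = time.perf_counter() - start
            self._stack.pop()
            self.spans.append({"name": name, "parent": parent, "seconds": duration})

tracer = Tracer()

def load_profile():
    with tracer.span("db.query"):
        time.sleep(0.005)  # pretend database call

def resize_photo():
    with tracer.span("image.resize"):
        time.sleep(0.05)  # pretend call to another service

with tracer.span("GET /pups"):  # the request being traced
    load_profile()
    resize_photo()

# "Where?" is the span name, "How long?" is its duration.
slowest_child = max(
    (s for s in tracer.spans if s["parent"] is not None),
    key=lambda s: s["seconds"],
)
```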
17. FRONTEND VISIBILITY
The Data
• Metrics
The Tools
• Real-User Monitoring (RUM)
• Synthetics
18. PEOPLE & ROBOTS
• RUM & Synthetics work best together
• RUM provides insight into how users actually use a product
• Synthetics operate independently of users
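The "robot" half can be sketched as a scripted probe that runs on a schedule whether or not any real user is online. In the sketch below, `run_synthetic_check`, the example URLs, and the `fake_fetch` stand-in for an HTTP client are all hypothetical. A real check would make an actual request.

```python
import time

def run_synthetic_check(fetch, url, max_seconds=1.0):
    """Probe one URL the way a scheduled robot would and report pass or fail."""
    start = time.perf_counter()
    try:
        status = fetch(url)
    except Exception as exc:  # network errors count as failures
        return {"url": url, "ok": False, "reason": f"error: {exc}"}
    elapsed = time.perf_counter() - start
    if status != 200:
        return {"url": url, "ok": False, "reason": f"status {status}"}
    if elapsed > max_seconds:
        return {"url": url, "ok": False, "reason": "too slow"}
    return {"url": url, "ok": True, "reason": "ok"}

# Hypothetical stand-in for a real HTTP client so the sketch runs offline.
def fake_fetch(url):
    if "cdn" in url:
        raise ConnectionError("CDN unreachable")
    return 200

results = [
    run_synthetic_check(fake_fetch, "https://example.com/login"),
    run_synthetic_check(fake_fetch, "https://cdn.example.com/pup-7.jpg"),
]
failing = [r["url"] for r in results if not r["ok"]]
```

A check like this can fail at 3 a.m. with no traffic at all, while RUM data only exists when people are actually using the product, which is one reason the two complement each other.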
19. DATE-A-DOG
WHAT’S IT ALL MEAN?
TINDER FOR PUPS
20. THIS APP IS GREAT!
WHO’S A GOOD BOY?!?
21. I GOTTA TELL MY FRIENDS ABOUT THIS APP!
THEY’RE SO CUTE!!!
22. AND MY FRIENDS ARE GONNA TELL THEIR FRIENDS…
AAAWWWWWWW!!!
23. WHAT JUST HAPPENED?!?
WHERE’D THE PUPPIES GO?
24. HOW DO WE KNOW SOMETHING WENT WRONG?
USERS ARE HAVING A HORRIBLE EXPERIENCE
26. REAL-USER MONITORING
HOW DO WE KNOW?
27. REAL-USER MONITORING
HOW DO WE KNOW SOMETHING WENT WRONG?
29. SYNTHETICS
HOW DO WE KNOW SOMETHING WENT WRONG?
30. SCENARIO: THIRD-PARTY CDN OUTAGE
The app pulls puppy photos directly from a CDN, but that provider suffers a massive DDoS attack.
• RUM & Synthetics: Will alert and can show which assets are slow or are not being served.
• APM, Application and Infrastructure Monitoring: No alerts. Everything is fine!
31. TRACING (APM)
HOW DO WE KNOW?
32. TRACING (APM)
HOW DO WE KNOW SOMETHING WENT WRONG?
33. TRACING (APM)
HOW DO WE KNOW SOMETHING WENT WRONG?
34. TRACING (APM)
HOW DO WE KNOW WHAT WENT WRONG?
35. TRACING (APM)
HOW DO WE KNOW WHAT WENT WRONG?
36. TRACING (APM)
HOW DO WE KNOW WHAT WENT WRONG?
37. SCENARIO: SERVICE OUTAGE
We use an image resizing/optimizing service that resizes images asynchronously. It goes down. Users only see placeholder images.
• RUM & Synthetics: Will alert if images are not delivered. We have a symptom, but not a cause.
• APM: Can alert on latency and show where in the code calls are failing.
• Application Monitoring: May alert, depending on the impact on custom metrics. May or may not be able to help identify why.
• Infrastructure Monitoring: No alerts. Everything is fine!
38. APPLICATION + INFRASTRUCTURE MONITORING
HOW DO WE KNOW?
39. SCENARIO: DEV DEPLOYS BAD CODE
A developer accidentally deploys code that improperly verifies password hashes, so all user logins fail.
• RUM & Synthetics, APM: Unsuccessful logins reported (on tests that require login)!
• Application Monitoring: May alert on the impact on custom metrics and may help identify why.
• Infrastructure Monitoring: No alerts. Everything is fine!
40. APPLICATION MONITORING
HOW DO WE KNOW SOMETHING WENT WRONG?
41. INFRASTRUCTURE MONITORING
HOW DO WE KNOW?
42. SCENARIO: WE’RE TOO POPULAR
Everyone loves puppies and we’re completely out of resources.
• RUM & Synthetics, APM, Application Monitoring: Alerts that latency is high. Will not be able to help identify why.
• Infrastructure Monitoring: Alerts on high resource use and may be able to trigger automatic remediation.
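The four scenarios can be collected into a small coverage table to see which tool combinations leave blind spots. The table values restate the bullets from the scenario slides. The `blind_spots` helper and the three-level encoding ("alerts", "may alert", "silent") are assumptions made for this sketch.

```python
ALERTS, MAYBE, SILENT = "alerts", "may alert", "silent"

TOOLS = ["RUM & Synthetics", "APM", "Application Monitoring", "Infrastructure Monitoring"]

# One row per scenario, one column per tool, in the order given by TOOLS.
COVERAGE = {
    "Third-party CDN outage": [ALERTS, SILENT, SILENT, SILENT],
    "Service outage": [ALERTS, ALERTS, MAYBE, SILENT],
    "Dev deploys bad code": [ALERTS, ALERTS, MAYBE, SILENT],
    "We're too popular": [ALERTS, ALERTS, ALERTS, ALERTS],
}

def blind_spots(tools_in_use):
    """Return the scenarios where none of the given tools is certain to alert."""
    missed = []
    for scenario, outcomes in COVERAGE.items():
        by_tool = dict(zip(TOOLS, outcomes))
        if not any(by_tool[tool] == ALERTS for tool in tools_in_use):
            missed.append(scenario)
    return missed

infra_only = blind_spots(["Infrastructure Monitoring"])
backend_only = blind_spots(["APM", "Application Monitoring", "Infrastructure Monitoring"])
everything = blind_spots(TOOLS)
```

The table only captures whether a tool notices the problem. As the scenarios point out, noticing a symptom is not the same as finding the cause, so a tool that alerts everywhere still needs the others for diagnosis.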
43. ANOMALY DETECTION
HOW DO WE KNOW SOMETHING WENT WRONG?
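The slide does not say which anomaly detection technique is meant. One simple approach, shown here purely as an assumption, is to flag a point when it sits several standard deviations away from the mean of the preceding window. The `find_anomalies` name, the window size, the threshold, and the sample data are all hypothetical.

```python
from statistics import mean, stdev

def find_anomalies(values, window=5, threshold=3.0):
    """Flag indexes whose value is far from the mean of the preceding window."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            continue  # a perfectly flat window gives no scale to score against
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies

# Hypothetical requests per minute with one sudden spike at index 7.
requests_per_minute = [100, 102, 98, 101, 99, 100, 103, 400, 101, 99]
spikes = find_anomalies(requests_per_minute)
```

Unlike a fixed threshold, this compares each point with recent behavior, so it needs no advance guess about what "too high" means.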
44. HOW DO WE KNOW WHAT WENT WRONG?
45. UNTIL YOU FIND THE CAUSES
RECURSE RECURSE RECURSE
46. UNTIL YOU FIND THE CAUSES
RECURSE RECURSE RECURSE
47. LOGS
EXPLORING WHAT WENT WRONG
48. HOW TO GET 100% VISIBILITY?
• Think about your system as a whole
• Get multiple perspectives
• Consider all 5 observability tools:
• RUM
• Synthetics
• Tracing
• Application+Infrastructure Monitoring
• Logs