Visibility into your applications is essential for guarding against errors, maintaining uptime, and ensuring performance. In this session, we show how to achieve that visibility by using DevOps to build better systems and to draw on the perspectives of multiple teams. This session is presented by AWS partner Datadog.
100% visibility into your applications - DEM03 - Sao Paulo Summit
1. 100% VISIBILITY
HOLISTICALLY VIEWING SYSTEMS
2. AMBIGUOUS CYLINDERS
PERSPECTIVE MATTERS
3. ALEXANDRE FONSECA
CURRENTLY:
BACKEND SOFTWARE ENGINEER @ DATADOG
FORMERLY:
FULL STACK ENGINEER, DISTRIBUTED SYSTEMS, DEVOPS & SYSADMIN
WWW: alexjf.net
EM: alexandre.fonseca@datadoghq.com
4. DATADOG
SAAS-BASED MONITORING
TRILLIONS OF POINTS/DAY
OPEN SOURCE CITIZENS
WE’RE HIRING:
www.datadoghq.com/careers
TW: @datadoghq
5. VISIBILITY?
WHERE ARE WE GETTING
11. INFRASTRUCTURE VISIBILITY
The Data
• Metrics
• Logs
The Tools
• Infrastructure Monitoring
• Log Management
12. VALUE-BASED DATA
WHAT IS A METRIC?
13. METRICS
• Historical data & correlation
• Useful for spotting trends/patterns
• Send alerts from metrics
• Help catch known unknowns
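A minimal sketch of "send alerts from metrics", assuming a metric is a stream of timestamped values and that an alert fires when the average over a recent window crosses a fixed threshold. `MetricPoint`, `evaluate_threshold`, and the averaging rule are hypothetical illustrations, not Datadog's monitor API.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class MetricPoint:
    """One value-based data point: a timestamp in seconds and a numeric value."""
    timestamp: float
    value: float


def evaluate_threshold(points, now, window_s, threshold):
    """Hypothetical rule: alert when the mean of the points that fall inside
    [now - window_s, now] is strictly above `threshold`."""
    recent = [p.value for p in points if now - window_s <= p.timestamp <= now]
    if not recent:
        # No data in the window; this sketch simply does not alert.
        return False
    return mean(recent) > threshold


if __name__ == "__main__":
    # Hypothetical request-latency metric, in milliseconds.
    latency_ms = [MetricPoint(t, v) for t, v in [(0, 120), (10, 130), (20, 900), (30, 950)]]
    print(evaluate_threshold(latency_ms, now=30, window_s=15, threshold=500))  # True
```

Keeping the historical points around is also what makes the trend spotting and correlation on this slide possible; the window length trades noise against detection delay.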
14. LOGS
• Event-based
• Easy to read & grep or parse
• Ideally verbose & structured
• Useful for finding details of an event
• Help catch unknown unknowns
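"Verbose & structured" logs can be produced with Python's standard `logging` module by emitting one JSON object per line, so the same event stays easy to read, grep, and parse. This is a minimal sketch; the `context` field convention and the `date-a-dog` logger name are hypothetical choices for illustration.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object (one event per line)."""

    def format(self, record):
        payload = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Hypothetical convention: structured fields arrive via extra={"context": {...}}.
        context = getattr(record, "context", None)
        if isinstance(context, dict):
            payload.update(context)
        return json.dumps(payload)


def make_logger(name="date-a-dog"):
    """Build a logger that writes JSON lines to stderr."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = make_logger()
    log.warning("login failed", extra={"context": {"user_id": 42, "reason": "bad_hash"}})
```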
15. The Data
• Metrics
• Logs
• Traces
The Tools
• Application Monitoring
• Log Management
• APM
BACKEND VISIBILITY
16. TRACES
• Request-based
• Follow a request’s activity across function and service calls.
• Useful for following code to answer “Where?” and “How long?”
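To show how a trace answers "Where?" and "How long?", here is a minimal hand-rolled tracer: each span records its name, duration, and parent, and all spans of one request share a trace ID. This is an illustrative sketch, not the API of Datadog's tracing libraries; `span`, `FINISHED_SPANS`, and the service names are hypothetical.

```python
import time
import uuid
from contextlib import contextmanager
from contextvars import ContextVar

# Finished spans; a production tracer would export these to a backend instead.
FINISHED_SPANS = []

# The span active in the current execution context (None at the root).
_current_span = ContextVar("current_span", default=None)


@contextmanager
def span(name):
    """Time a unit of work and link it to its parent span and trace."""
    parent = _current_span.get()
    record = {
        "name": name,
        "span_id": uuid.uuid4().hex[:16],
        "trace_id": parent["trace_id"] if parent else uuid.uuid4().hex,
        "parent_id": parent["span_id"] if parent else None,
    }
    token = _current_span.set(record)
    start = time.perf_counter()
    try:
        yield record
    finally:
        record["duration_s"] = time.perf_counter() - start
        _current_span.reset(token)
        FINISHED_SPANS.append(record)


def resize_image():
    with span("image-service.resize"):
        time.sleep(0.01)  # stand-in for a remote service call


def handle_request():
    with span("web.request"):
        with span("db.query"):
            time.sleep(0.005)  # stand-in for a database call
        resize_image()


if __name__ == "__main__":
    handle_request()
    for s in FINISHED_SPANS:
        print(s["name"], s["parent_id"], round(s["duration_s"], 3))
```

Children finish before their parent, so the root span appears last; its duration bounds those of the spans nested inside it.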
17. The Data
• Metrics
The Tools
• Real-User Monitoring (RUM)
• Synthetics
FRONTEND VISIBILITY
18. PEOPLE & ROBOTS
• RUM & Synthetics work best together
• RUM provides insight into how users actually use a product
• Synthetics operate independently of users
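On the RUM side, the raw material is timing data reported by real browsers. A minimal sketch of summarizing it, assuming each event is a dict with hypothetical `view` and `load_ms` keys, computes a nearest-rank percentile per page so that a slow page stands out even when the average looks fine.

```python
from collections import defaultdict


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list; `pct` is an int in 1..100."""
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # ceil(pct * n / 100), integer math
    return ordered[rank - 1]


def summarize_rum(events, pct=75):
    """Group real-user page-load events by view and return each view's
    `pct`-th percentile load time in milliseconds."""
    load_times = defaultdict(list)
    for event in events:
        load_times[event["view"]].append(event["load_ms"])
    return {view: percentile(times, pct) for view, times in load_times.items()}
```

Because RUM only sees pages that real users actually load, it complements synthetic probes, which keep running when there is no traffic at all.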
19. DATE-A-DOG
WHAT’S IT ALL MEAN?
TINDER FOR PUPS
20. THIS APP IS GREAT!
WHO’S A GOOD BOY?!?
21. I GOTTA TELL MY FRIENDS ABOUT THIS APP!
THEY’RE SO CUTE!!!
22. AND MY FRIENDS ARE GONNA TELL THEIR FRIENDS…
AAAWWWWWWW!!!
23. WHAT JUST HAPPENED?!?
WHERE’D THE PUPPIES GO?
24. HOW DO WE KNOW SOMETHING WENT WRONG?
USERS ARE HAVING A HORRIBLE EXPERIENCE
25.
26. REAL-USER MONITORING
HOW DO WE KNOW?
27. REAL-USER MONITORING
HOW DO WE KNOW SOMETHING WENT WRONG?
29. SYNTHETICS
HOW DO WE KNOW SOMETHING WENT WRONG?
30. SCENARIO: THIRD-PARTY CDN OUTAGE
The app pulls puppy photos directly from a CDN, but that provider suffers a massive DDoS attack.
• RUM & Synthetics: Will alert and can show which assets are slow or are not being served.
• APM, Application and Infrastructure Monitoring: No
alerts. Everything is fine!
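In this scenario only the frontend view notices the problem. A minimal synthetic check can be sketched as a scripted probe that requests each asset the way a user's browser would and reports the ones that fail or are slow. `check_assets`, the 1-second latency budget, and the `cdn.example` URLs are hypothetical; the fetch function is injectable so the check can be exercised without network access.

```python
import time
import urllib.request


def fetch_status(url, timeout_s=5.0):
    """Fetch `url` and return its HTTP status (urllib raises on 4xx/5xx and network errors)."""
    with urllib.request.urlopen(url, timeout=timeout_s) as response:
        return response.status


def check_assets(urls, fetch=fetch_status, max_latency_s=1.0):
    """Probe each asset like a scripted user would; return {url: problem} for bad ones."""
    problems = {}
    for url in urls:
        start = time.perf_counter()
        try:
            status = fetch(url)
        except Exception as exc:  # timeouts, DNS failures, HTTP errors, refused connections
            problems[url] = f"not served: {exc!r}"
            continue
        elapsed = time.perf_counter() - start
        if status != 200:
            problems[url] = f"not served: HTTP {status}"
        elif elapsed > max_latency_s:
            problems[url] = f"slow: {elapsed:.2f}s"
    return problems


if __name__ == "__main__":
    def fake_fetch(url):
        # Simulated CDN under attack: one image missing, one timing out.
        if "missing" in url:
            return 404
        if "down" in url:
            raise TimeoutError("timed out")
        return 200

    urls = ["https://cdn.example/ok.jpg", "https://cdn.example/missing.jpg", "https://cdn.example/down.jpg"]
    print(check_assets(urls, fetch=fake_fetch))
```

Backend metrics and traces stay green here because the failing requests never reach the app's servers, which is exactly why the scenario needs an outside-in probe.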
31. TRACING (APM)
HOW DO WE KNOW?
32. TRACING (APM)
HOW DO WE KNOW SOMETHING WENT WRONG?
33. TRACING (APM)
HOW DO WE KNOW SOMETHING WENT WRONG?
34. TRACING (APM)
HOW DO WE KNOW WHAT WENT WRONG?
35. TRACING (APM)
HOW DO WE KNOW WHAT WENT WRONG?
36. TRACING (APM)
HOW DO WE KNOW WHAT WENT WRONG?
37. SCENARIO: SERVICE OUTAGE
We use an image resizing/optimizing service that resizes images
asynchronously. It goes down. Users only see placeholder images.
• RUM & Synthetics: Will alert if images are not delivered. We have a symptom, but not a cause.
• APM: Can alert on latency and show where in the code you are
making the API calls.
• Application Monitoring: May alert, depending on the impact on custom metrics, but might not help identify why.
• Infrastructure Monitoring: No alerts. Everything is fine!
38. APPLICATION MONITORING
HOW DO WE KNOW?
39. SCENARIO: DEV DEPLOYS BAD CODE
A developer accidentally deploys code that improperly checks password hashes, so all new user logins fail.
• RUM & Synthetics, APM: Report unsuccessful logins (for tests that require login).
• Application Monitoring: May alert on the impact on custom metrics and may help identify why.
• Infrastructure Monitoring: No alerts. Everything is fine!
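Application monitoring can help identify *why* when the app emits custom metrics that carry business meaning. A minimal in-process sketch, assuming hypothetical login counters tagged with a failure reason and a failure-rate alert that only fires after a minimum number of attempts (to avoid paging on a handful of typos):

```python
from collections import Counter


class LoginMetrics:
    """Hypothetical custom metrics: login outcomes plus a breakdown of failure reasons."""

    def __init__(self):
        self.outcomes = Counter()
        self.failure_reasons = Counter()

    def record(self, success, reason=None):
        if success:
            self.outcomes["success"] += 1
        else:
            self.outcomes["failure"] += 1
            self.failure_reasons[reason or "unknown"] += 1

    def attempts(self):
        return self.outcomes["success"] + self.outcomes["failure"]

    def failure_rate(self):
        total = self.attempts()
        return self.outcomes["failure"] / total if total else 0.0


def should_alert(metrics, max_failure_rate=0.2, min_attempts=10):
    """Alert when enough logins were attempted and too many of them failed."""
    return metrics.attempts() >= min_attempts and metrics.failure_rate() > max_failure_rate


if __name__ == "__main__":
    m = LoginMetrics()
    for _ in range(5):
        m.record(True)
    for _ in range(15):
        m.record(False, reason="hash_mismatch")
    print(should_alert(m), m.failure_reasons.most_common(1))
```

The reason tag is what turns "logins are failing" into "the hash check is failing", pointing straight at the bad deploy.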
40. APPLICATION MONITORING
HOW DO WE KNOW SOMETHING WENT WRONG?
41. INFRASTRUCTURE MONITORING
HOW DO WE KNOW?
42. SCENARIO: WE’RE TOO POPULAR
Everyone loves puppies and we’re completely out of
resources.
• RUM & Synthetics, APM, Application Monitoring: Alert that latency is high, but will not be able to help identify why.
• Infrastructure Monitoring: Alerts on high resource use
and may be able to trigger automatic remediation.
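Automatic remediation can be sketched as a check over host resource samples that calls a scaling hook when the fleet is saturated. Everything here is an assumption for illustration: the `HostSample` shape, the 90% limits, the "more than half of hosts saturated" rule, and `scale_out`, which stands in for whatever autoscaling mechanism you actually use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostSample:
    """One resource sample per host, as percentages of capacity."""
    host: str
    cpu_pct: float
    mem_pct: float


def check_and_remediate(samples, scale_out, cpu_limit=90.0, mem_limit=90.0):
    """Return the saturated hosts; if more than half the fleet is saturated,
    call the hypothetical `scale_out(n)` hook with the number of saturated hosts."""
    saturated = [s.host for s in samples if s.cpu_pct > cpu_limit or s.mem_pct > mem_limit]
    if samples and len(saturated) * 2 > len(samples):
        scale_out(len(saturated))
    return saturated


if __name__ == "__main__":
    fleet = [HostSample("web-1", 97.0, 60.0), HostSample("web-2", 45.0, 93.0), HostSample("web-3", 30.0, 40.0)]
    print(check_and_remediate(fleet, scale_out=lambda n: print(f"scaling out by {n}")))
```

Requiring a majority keeps a single noisy host from triggering a scale-out; a lone hot host is better handled by an alert and investigation.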
43. INFRASTRUCTURE MONITORING
HOW DO WE KNOW SOMETHING WENT WRONG?
44. ANOMALY DETECTION
HOW DO WE KNOW SOMETHING WENT WRONG?
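Anomaly detection flags values that break from a metric's recent behavior instead of a fixed threshold. The sketch below uses a simple rolling z-score, which is a swapped-in textbook technique and not a description of Datadog's anomaly algorithms; the window size and the threshold of 3 standard deviations are assumptions.

```python
from statistics import mean, pstdev


def rolling_zscore_anomalies(values, window=5, z_threshold=3.0):
    """Return indexes whose value deviates by more than `z_threshold` standard
    deviations from the mean of the preceding `window` values."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = pstdev(history)
        if sigma == 0:
            # Perfectly flat history: any change at all counts as anomalous.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > z_threshold:
            anomalies.append(i)
    return anomalies


if __name__ == "__main__":
    requests_per_s = [10, 11, 10, 12, 11, 10, 11, 50, 11, 10]
    print(rolling_zscore_anomalies(requests_per_s))  # [7]
```

Note that the spike inflates the standard deviation of the following windows, so the return to normal at indexes 8 and 9 is not flagged.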
45. HOW DO WE KNOW WHAT WENT WRONG?
46. UNTIL YOU FIND THE CAUSES
RECURSE RECURSE RECURSE
47. UNTIL YOU FIND THE CAUSES
RECURSE RECURSE RECURSE
48. LOGS
EXPLORING WHAT WENT WRONG
49. HOW TO GET 100% VISIBILITY?
• Think about your system as a whole
• Get multiple perspectives
• Consider all 5 observability tools:
• RUM
• Synthetics
• Tracing
• Application+Infrastructure Monitoring
• Logs