The document discusses foundational practices for achieving effective observability in serverless solutions. It recommends centralizing telemetry data to allow correlation across distributed systems. It also suggests leveraging native metrics from cloud services and taking a holistic view across all components involved in requests. Key practices include using structured metadata, pushing data rather than scraping, looking for patterns to define alerts, and using observability insights to automate limits and quotas. Tracing is identified as critically important for serverless applications.
1. Observability in
serverless solutions
Foundational practices on instrumenting and
achieving the true potential of serverless
Leonardo Murillo
CTO @ Qwinix, Inc
Founder @ Cloud Native Architects, Inc
DevOps Institute Ambassador
2. About DevOps Institute
DevOps Institute’s mission is to advance the
human elements of DevOps by creating a safe
and interactive environment where our
members can network, gain knowledge, grow
their careers, support enterprise transformation
and celebrate professional achievements.
We connect and enable the global DevOps
community to drive change in the digital age. Become a professional member at
www.devopsinstitute.com
3. Learn how observability and monitoring enable organizations
to realize the huge potential of serverless solutions.
In this webinar we will share insights and actionable
advice on:
Agenda
○ The serverless advantage
○ Understanding the serverless mindset
○ Instrumenting distributed and ephemeral
systems
○ Observability as a basis for decision making
Leonardo Murillo
Wide-ranging industry perspective, with over 20 years of experience
building technology and leading teams, from startups to
Fortune 500 companies.
Passionate about cloud native technologies, organizational
transformation and the open-source community. A believer in human
potential and the transformative power of technology, Leo focuses on
exploring leading-edge technologies hands-on and reflecting on
technology strategy.
leonardomurillo
murillodigital
5. Lack of foundational
knowledge and expertise may
magnify otherwise small
issues
Data and systems design must
fit the characteristics of
serverless; scale should be
deliberately designed for, not
accidentally achieved.
Understanding and
troubleshooting your solution
during development and
production requires new skills
and workflows
Serverless requires a new mindset
7. Approaching serverless
Functions
as a Service!
Wait…
now we need to direct
requests to them
Huh…
Some of these are asynchronous
and driven by events, need a queue
Data…
We need to store some data
Ugh, static assets…
Let’s put them in some bucket
How to deliver assets?
Of course, we need a CDN
The performance and reliability of your
serverless solution depends on the interplay
between many moving parts
8. How do we troubleshoot a complex, distributed solution?
How do we extract the most performance out of infinite scale?
“Knowledge is power”
10. Serverless telemetry requires a specific
approach
• Solution state is distributed across a variety of
systems.
• Many data sources mean many different schemas,
differing time-series resolutions and fragmented context.
• Resources are ephemeral; telemetry data must be
stored before transactions complete (usually within milliseconds).
• Requests traverse many different services, so
traceability becomes critical.
11. Serverless is the way to go, and you want to build observability into your solution
Foundational Practices for effective
serverless observability
12. Practice 1: Centralize your telemetry data, making
sure you can correlate, and that no data gets lost
Define and use
structured metadata
• correlation identifiers
• system identifiers
• request identifiers
• process milestone
• context, entity, domain
Push, don’t scrape
• Producers of telemetry data are
ephemeral; scraping will usually result
in data loss because the scrape interval
often exceeds the compute lifecycle
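A minimal sketch of both ideas on this slide, assuming a hypothetical function handler and a `push` callable that stands in for whatever central telemetry endpoint is used. The field names mirror the metadata listed above; the values (`order-service`, `checkout`) are illustrative only.

```python
import json
import time
import uuid
from typing import Callable, Dict


def make_event(correlation_id: str, system_id: str, request_id: str,
               milestone: str, context: Dict[str, str]) -> Dict[str, object]:
    """Build one telemetry event carrying the structured metadata fields."""
    return {
        "timestamp": time.time(),
        "correlation_id": correlation_id,  # ties events across systems together
        "system_id": system_id,            # which component emitted the event
        "request_id": request_id,          # this specific invocation
        "milestone": milestone,            # where in the process we are
        "context": context,                # context, entity, domain
    }


def handler(payload: Dict[str, str], push: Callable[[str], None]) -> Dict[str, str]:
    """Hypothetical handler: pushes telemetry itself, before it returns."""
    # Reuse the caller's correlation id if present, otherwise start a new one.
    correlation_id = payload.get("correlation_id") or str(uuid.uuid4())
    request_id = str(uuid.uuid4())
    context = {"entity": "order", "domain": "checkout"}

    push(json.dumps(make_event(correlation_id, "order-service",
                               request_id, "received", context)))
    result = {"status": "ok", "correlation_id": correlation_id}
    # The final event is pushed before returning, because the compute
    # instance may be gone by the time anything could scrape it.
    push(json.dumps(make_event(correlation_id, "order-service",
                               request_id, "completed", context)))
    return result


# Usage: a plain list stands in for the central telemetry store.
sent = []
response = handler({"correlation_id": "abc-123"}, sent.append)
```

Passing the correlation identifier along in the response lets the next component in the request path attach it to its own events.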
13. Practice 2: Leverage the native metrics
provided by cloud managed services
The cloud takes care of a lot of the system-specific
heavy lifting; leverage the metrics provided
natively by the managed service.
Relate, cross-reference, augment – you are looking
for a strong query language for your telemetry data
and efficient ways to extract process-oriented
insights from it.
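One way to picture "relate, cross-reference, augment" is a join between application events and a native metric on a shared time bucket. The sketch below is an assumption about how that could look in plain Python; in practice the query language of the telemetry platform would do this work, and the record shapes and the `queue_depth` name are hypothetical.

```python
from typing import Dict, List, Optional


def bucket(timestamp: float, width: int = 60) -> int:
    """Round a timestamp (seconds) down to the start of its time bucket."""
    return int(timestamp // width) * width


def augment(events: List[Dict], native_points: List[Dict],
            metric_name: str, width: int = 60) -> List[Dict]:
    """Attach the native metric value from the same time bucket to each event."""
    by_bucket: Dict[int, float] = {
        bucket(p["timestamp"], width): p["value"] for p in native_points
    }
    augmented = []
    for event in events:
        value: Optional[float] = by_bucket.get(bucket(event["timestamp"], width))
        # Copy the event so the original telemetry is left untouched.
        augmented.append({**event, metric_name: value})
    return augmented


# Usage: per-minute queue depth reported by a managed queue service,
# cross-referenced with events emitted by the application.
native_queue_depth = [
    {"timestamp": 0, "value": 5},
    {"timestamp": 60, "value": 40},
]
app_events = [
    {"timestamp": 12.5, "request_id": "r1"},
    {"timestamp": 75.0, "request_id": "r2"},
    {"timestamp": 130.0, "request_id": "r3"},
]
augmented = augment(app_events, native_queue_depth, "queue_depth")
```

Events with no matching native data point get `None`, which keeps gaps in the native metric visible instead of hiding them.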
14. Practice 3: Integrated, holistic and global
visibility
It’s about processes, not systems - think holistically
Code path becomes request path; consider
all the components that participate in
fulfilling a given request and build insights
from their aggregated telemetry data
Single pane of glass – build dashboards on
top of your consolidated data.
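As an illustration of "code path becomes request path", the sketch below groups consolidated events by request and reconstructs which components each request touched and how long it took end to end. The event shape (`request_id`, `system_id`, `timestamp_ms`) is an assumption made for this example.

```python
from collections import defaultdict
from typing import Dict, List


def summarize_requests(events: List[Dict]) -> Dict[str, Dict]:
    """Rebuild the request path and end-to-end duration for each request."""
    grouped = defaultdict(list)
    for event in events:
        grouped[event["request_id"]].append(event)

    summary = {}
    for request_id, items in grouped.items():
        ordered = sorted(items, key=lambda e: e["timestamp_ms"])
        path = []
        for event in ordered:
            # Record a component each time the request moves to it.
            if not path or path[-1] != event["system_id"]:
                path.append(event["system_id"])
        summary[request_id] = {
            "path": path,
            "duration_ms": ordered[-1]["timestamp_ms"] - ordered[0]["timestamp_ms"],
            "events": len(ordered),
        }
    return summary


# Usage: events from several components, already centralized.
events = [
    {"request_id": "r1", "system_id": "cdn", "timestamp_ms": 0},
    {"request_id": "r1", "system_id": "function", "timestamp_ms": 50},
    {"request_id": "r2", "system_id": "function", "timestamp_ms": 100},
    {"request_id": "r1", "system_id": "queue", "timestamp_ms": 200},
    {"request_id": "r1", "system_id": "function", "timestamp_ms": 450},
]
summary = summarize_requests(events)
```

A dashboard built on this kind of aggregate shows the process (the request) rather than any single system.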
15. Practice 4: Look for patterns to define your
alerts, not just finite state
• Performance-related patterns
• Process-related patterns
End to end request time to fulfillment
Queue growth over time
Data growth over time
Number of events involved in end-to-end request
Time to milestone
• Track cloud expense over time
for predictability and efficient
cost management
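To make "patterns, not just finite state" concrete, here is a sketch that alerts on queue growth over time instead of on a fixed depth. It fits a least-squares slope to recent samples; the sample data and the threshold of 5 messages per minute are assumptions chosen for illustration.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (minute, queue depth)


def growth_rate(samples: List[Sample]) -> float:
    """Least-squares slope of queue depth over time (depth per minute)."""
    n = len(samples)
    if n < 2:
        return 0.0
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    numerator = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    denominator = sum((t - mean_t) ** 2 for t, _ in samples)
    return numerator / denominator if denominator else 0.0


def should_alert(samples: List[Sample], max_growth_per_minute: float) -> bool:
    """Alert when the queue is growing faster than the allowed rate."""
    return growth_rate(samples) > max_growth_per_minute


# Usage: a deep but stable queue versus a shallow but growing one.
steady = [(0, 100), (1, 98), (2, 101), (3, 99), (4, 100)]
growing = [(0, 10), (1, 22), (2, 29), (3, 41), (4, 50)]

steady_alert = should_alert(steady, max_growth_per_minute=5)
growing_alert = should_alert(growing, max_growth_per_minute=5)
```

A static threshold of, say, 60 messages would fire on the steady queue and miss the growing one; the slope-based check does the opposite.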
16. Practice 5: Use observability insight to define
guardrails, quotas, limits
Automate peace of mind
Liberate developers (safely)
Keep the CFO happy
Define concurrency and capacity limits
Alert on patterns before issues arise
Simplify troubleshooting and debugging
Enable autonomous provisioning safely
Efficient budget forecasting
Educated billing alerts
Continuous cost visibility
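A sketch of how observed telemetry might feed a concurrency limit: take a high percentile of observed concurrent executions and add headroom. The percentile, the headroom factor and the sample values are assumptions for illustration, not recommendations from the talk.

```python
import math
from typing import List


def percentile(values: List[int], pct: int) -> int:
    """Nearest-rank percentile of a non-empty list of observations."""
    if not values:
        raise ValueError("at least one observation is required")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def suggest_concurrency_limit(observed: List[int], pct: int = 99,
                              headroom: float = 1.5) -> int:
    """Suggest a limit: observed high-percentile concurrency plus headroom."""
    return math.ceil(percentile(observed, pct) * headroom)


# Usage: concurrent executions sampled from consolidated telemetry.
observed = [12, 15, 14, 20, 18, 35, 16, 17, 19, 13]
limit = suggest_concurrency_limit(observed, pct=90, headroom=1.5)
```

Using a percentile instead of the maximum keeps a single spike (35 in this sample) from inflating the limit, while the headroom leaves room for normal growth.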
17. Key takeaways
Tracing is critically important
Don’t look at systems in isolation, think process
Enrich your telemetry with meaningful metadata
Use tooling that allows you to query and
integrate data effectively
Consolidate your telemetry data