The evolution from monoliths to cloud-native microservices has driven a shift from monitoring to observability. OpenTelemetry, formed by merging the OpenTracing and OpenCensus projects, is enabling Observability 2.0. This talk covers the fundamental concepts of observability and then demonstrates how to instrument your applications using the OpenTelemetry libraries.
2. Our Agenda
● What is observability?
● How does OpenTelemetry relate to observability?
● What concepts do I need to use OpenTelemetry?
● How do I record data using OpenTelemetry?
● Where can I send my data?
3. Level Setting
● Are you responsible for writing software?
● Are you responsible for operating software?
● Have you used distributed tracing before?
● Have you used OpenCensus?
● Have you used OpenTracing?
4. Who am I?
● Kevin Brockhoff - Senior Consultant, Daugherty Business Solutions
○ Solving difficult cloud adoption challenges for Daugherty's Fortune 500 clients
○ OpenTelemetry committer since the early stages of the project
○ GitHub: https://github.com/kbrockhoff
○ LinkedIn: https://www.linkedin.com/in/kevin-brockhoff-a557877/
6. Why observability?
● Microservices create complex interactions.
● Failures don't exactly repeat.
● Debugging multi-tenancy is painful.
● Monitoring can no longer help us.
[Figure: Cynefin Framework (Complex domain)]
7. What is observability?
● We need to answer questions about our systems.
○ What characteristics did the queries that timed out at 500ms have in common? Service versions? Browser plugins?
● Instrumentation produces data.
● Querying data answers our questions.
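A minimal sketch of how querying instrumentation data can answer the timeout question above, assuming telemetry arrives as structured events. The field names (`duration_ms`, `service_version`, `browser_plugin`) and all sample values are hypothetical and exist only for illustration.

```python
from collections import Counter

# Hypothetical structured events; field names and values are illustrative only.
EVENTS = [
    {"duration_ms": 120, "service_version": "1.4.0", "browser_plugin": "none"},
    {"duration_ms": 500, "service_version": "1.5.0", "browser_plugin": "adblock"},
    {"duration_ms": 500, "service_version": "1.5.0", "browser_plugin": "none"},
    {"duration_ms": 80, "service_version": "1.5.0", "browser_plugin": "none"},
    {"duration_ms": 500, "service_version": "1.5.0", "browser_plugin": "adblock"},
]

TIMEOUT_MS = 500


def shared_characteristics(events, field):
    """Count the values of `field` among the events that hit the timeout."""
    timed_out = [e for e in events if e["duration_ms"] >= TIMEOUT_MS]
    return Counter(e[field] for e in timed_out)


by_version = shared_characteristics(EVENTS, "service_version")
by_plugin = shared_characteristics(EVENTS, "browser_plugin")
```

In this made-up data set every timed-out query ran on one service version, which is the kind of shared characteristic the question is looking for.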
8. Telemetry aids observability
● Telemetry data isn't observability itself.
● Instrumentation code is how we get telemetry.
● Telemetry data can include traces, logs, and/or metrics.
All different views into the same underlying truth.
9. Metrics, logs, and traces, oh my!
● Metrics
○ Aggregated summary statistics.
● Logs
○ Detailed debugging information emitted by processes.
● Distributed Tracing
○ Provides insight into the full lifecycle of requests to a system (their traces), allowing you to pinpoint failures and performance issues.
Structured data can be transmuted into any of these!
10. Metrics Concepts
● Gauges
○ Instantaneous point-in-time value (e.g. CPU utilization)
● Cumulative counters
○ Cumulative sums of data since process start (e.g. request counts)
● Cumulative histograms
○ Grouped counters for a range of buckets (e.g. 0-10ms, 11-20ms)
● Rates
○ Typically the derivative of a counter (e.g. requests per second)
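The relationship between counters, rates, and histogram buckets can be sketched in a few lines. This is an illustrative sketch, not any SDK's implementation: the function names, the bucket bounds, and the sample numbers are all assumptions.

```python
from bisect import bisect_left


def rate_per_second(samples):
    """Derive rates from consecutive (timestamp_seconds, cumulative_count) samples."""
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        rates.append((c1 - c0) / (t1 - t0))
    return rates


# Hypothetical inclusive upper bounds in milliseconds; one extra overflow bucket follows.
BUCKET_BOUNDS_MS = [10, 20, 50]


def histogram(durations_ms, bounds=BUCKET_BOUNDS_MS):
    """Group observations into one counter per bucket."""
    counts = [0] * (len(bounds) + 1)
    for duration in durations_ms:
        counts[bisect_left(bounds, duration)] += 1
    return counts


# A cumulative request counter scraped at t=0s, t=10s and t=20s.
REQUEST_RATES = rate_per_second([(0, 0), (10, 50), (20, 170)])
LATENCY_BUCKETS = histogram([3, 10, 11, 20, 35, 80])
```

The counter itself only ever grows; the rate is recovered afterwards by differencing neighbouring samples, which is why backends usually store the counter and compute the rate at query time.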
11. Tracing Concepts
● Span
○ Represents a single unit of work in a system.
● Trace
○ Defined implicitly by its spans. A trace can be thought of as a directed acyclic graph of spans where the edges between spans are defined as parent/child relationships.
● Distributed Context
○ Contains the tracing identifiers, tags, and options that are propagated from parent to child spans.
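To make "a trace is defined implicitly by its spans" concrete, the sketch below rebuilds the parent/child structure from a flat list of spans. The `Span` class here is a hypothetical stand-in with only the fields needed for the illustration, and the span names and ids are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    span_id: str
    parent_id: Optional[str]  # None marks the root span
    name: str


def children_by_parent(spans):
    """Index spans by parent id; these are the edges of the trace graph."""
    index = {}
    for span in spans:
        index.setdefault(span.parent_id, []).append(span)
    return index


def render(spans):
    """Return the trace as indented lines, one per span, parents before children."""
    index = children_by_parent(spans)
    lines = []

    def walk(parent_id, depth):
        for span in index.get(parent_id, []):
            lines.append("  " * depth + span.name)
            walk(span.span_id, depth + 1)

    walk(None, 0)
    return lines


SPANS = [
    Span("a", None, "GET /checkout"),
    Span("b", "a", "get_account"),
    Span("c", "a", "charge_card"),
    Span("d", "c", "INSERT payments"),
]
TREE = render(SPANS)
```

No separate "trace" object is needed: the parent ids carried by each span are enough to recover the whole structure.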
12. Add more context to traces with Span Events
● Span Events are context-aware logging.
● An event contains timestamped information added to a span. You can think of this as a structured log, or a way to annotate your spans with specific details about what happened along the way.
○ Contains:
■ the name of the event
■ one or more attributes
■ a timestamp
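A span event can be modeled as a small record attached to its span. The sketch below uses hypothetical `SpanEvent` and `Span` classes (not the OpenTelemetry API) to show the three parts listed above: name, attributes, and timestamp.

```python
import time
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SpanEvent:
    name: str
    attributes: dict
    timestamp_ns: int


@dataclass
class Span:
    name: str
    events: list = field(default_factory=list)

    def add_event(self, name, attributes=None, timestamp_ns=None):
        """Attach a timestamped, structured annotation to this span."""
        if timestamp_ns is None:
            timestamp_ns = time.time_ns()
        event = SpanEvent(name, dict(attributes or {}), timestamp_ns)
        self.events.append(event)
        return event


span = Span("compute_invoice")
span.add_event("cache_miss", {"cache.key": "invoice:42"})
span.add_event("End Computation", {"result": 0})
```

Because the events live on the span, they inherit its trace context for free, which is what makes them "context-aware" compared with a free-standing log line.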
13. But how do I implement these?
● You need an instrumentation framework!
● and a place to send the data!
● and a way to visualize the data!
15. OpenCensus + OpenTracing = OpenTelemetry
● OpenCensus:
○ Provides APIs and instrumentation that allow you to collect application metrics and distributed traces.
● OpenTracing:
○ Provides APIs and instrumentation for distributed tracing.
● OpenTelemetry:
○ An effort to combine distributed tracing, metrics, and logging into a single set of system components and language-specific libraries.
16. OpenTelemetry Project
● Specification
○ API (for application developers)
○ SDK Implementations
○ Transport Protocol (Protobuf)
● Collector (middleware)
● SDKs (various stages of maturity)
○ C++
○ C# (Auto-instrument/Manual)
○ Erlang
○ Go
○ JavaScript (Browser/Node)
○ Java (Auto-instrument/Manual)
■ Android compatibility
○ PHP
○ Python (Auto-instrument/Manual)
○ Ruby
○ Rust
○ Swift
17. OTel API - packages, methods, & when to call
● Tracer
○ A Tracer is responsible for tracking the currently active span.
● Meter
○ A Meter is responsible for accumulating a collection of statistics.
● BaggageManager
○ A BaggageManager is responsible for propagating key-value pairs across systems.
You can have more than one. Why? It ensures uniqueness of name prefixes.
23. Resource SDK Concepts
● Resource is an immutable representation of the entity producing telemetry.
○ Populated on executable startup.
○ Created by either SDK or Collector.
○ Different Resource objects can be merged.
○ By default, read from the environment variable OTEL_RESOURCE_ATTRIBUTES.
○ One copy of Resource is passed to exporters with each batch.
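A rough sketch of reading and merging resources, assuming the comma-separated `key=value` format used by OTEL_RESOURCE_ATTRIBUTES. The function names are hypothetical, and the collision rule in `merge` (the second argument wins) is an assumption made for this sketch; check the SDK you use for its actual precedence.

```python
import os
from types import MappingProxyType


def parse_resource_attributes(raw):
    """Parse 'key1=value1,key2=value2' into a dict, skipping malformed items."""
    attributes = {}
    for item in raw.split(","):
        key, separator, value = item.partition("=")
        if separator and key.strip():
            attributes[key.strip()] = value.strip()
    return attributes


def resource_from_environment(environ=os.environ):
    """Build a read-only resource from OTEL_RESOURCE_ATTRIBUTES, if it is set."""
    raw = environ.get("OTEL_RESOURCE_ATTRIBUTES", "")
    return MappingProxyType(parse_resource_attributes(raw))


def merge(base, updating):
    """Return a new read-only resource; on collision `updating` wins (assumption)."""
    return MappingProxyType({**base, **updating})


# A plain dict stands in for the process environment to keep the example deterministic.
env_resource = resource_from_environment(
    {"OTEL_RESOURCE_ATTRIBUTES": "service.name=billing,service.version=1.2.3"}
)
sdk_resource = MappingProxyType({"telemetry.sdk.language": "python"})
resource = merge(sdk_resource, env_resource)
```

Neither input is modified by `merge`; a new mapping is returned, which mirrors the immutability described above.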
25. Baggage API Concepts
● Provides context and information to trace, metric, and log telemetry
○ Set of name/value pairs
○ Contains info such as user, session, A/B testing parameters
○ Usually modeled as an immutable Context, with each added attribute creating a new Context
○ Can be cleared if sending to an untrusted process
● May be propagated across process boundaries
○ Using the header otcorrelations until the W3C Baggage specification reaches recommendation status
● BaggageManager creates new Baggage objects based on the current context
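The "immutable Context, new Context per added attribute" model can be sketched as follows. The `Baggage` class and its method names are hypothetical and are not the OpenTelemetry API; the entry names and values are made up.

```python
from types import MappingProxyType


class Baggage:
    """Immutable set of name/value pairs; every change returns a new object."""

    def __init__(self, entries=None):
        self._entries = MappingProxyType(dict(entries or {}))

    def with_entry(self, name, value):
        """Return a new Baggage that also carries name=value."""
        return Baggage({**self._entries, name: value})

    def cleared(self):
        """Return an empty Baggage, e.g. before calling an untrusted process."""
        return Baggage()

    def get(self, name, default=None):
        return self._entries.get(name, default)

    def __len__(self):
        return len(self._entries)


empty = Baggage()
with_user = empty.with_entry("user.id", "u-123")
with_experiment = with_user.with_entry("ab.variant", "B")
outbound = with_experiment.cleared()
```

Because earlier objects are never mutated, code holding `with_user` keeps seeing exactly one entry no matter what downstream code adds or clears.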
27. Tracing API Concepts
● TracerProvider is the entry point of the API. It provides access to Tracers.
○ Stateful object holding configuration with a global provider and possibly additional ones.
● Tracer is the class responsible for creating Spans.
○ Named and optionally versioned with each instrumentation library using values guaranteed to be globally-unique.
○ Delegates getting active Span and marking a given Span as active to the Context.
● Span is the API to trace an operation.
○ Immutable SpanContext represents the serialized and propagated portion of a Span.
28. Span Fields
● Name - Concisely identifies the work
● SpanContext - Immutable unique id
○ TraceId (16 bytes) globally unique
○ SpanId (8 bytes) unique within trace
○ TraceFlags - Currently only sampled
○ TraceState - Vendor-specific info
● ParentSpan - Span which spawned this span
● SpanKind - Relationship to other spans
○ [ CLIENT | SERVER | PRODUCER | CONSUMER | INTERNAL ]
● StartTimestamp - Unix nanos
● EndTimestamp - Unix nanos
● Attributes collection
○ Key-value pairs describing the operational context
○ OTel semantic conventions define common keys
● Links collection
○ Associated non-parent spans
● Events collection
○ Significant data points during span
● Status - Success or failure details
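The field list above maps naturally onto a pair of records. The following is a simplified sketch with hypothetical class and function names; it omits TraceState, links, events, and status, and it only illustrates the id sizes and the parent/child relationship.

```python
import secrets
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class SpanContext:
    trace_id: bytes  # 16 bytes, shared by every span in the trace
    span_id: bytes   # 8 bytes, unique within the trace
    sampled: bool = True


def new_root_context():
    """Start a new trace with random identifiers."""
    return SpanContext(secrets.token_bytes(16), secrets.token_bytes(8))


def new_child_context(parent):
    """Keep the parent's trace id and sampling flag, but mint a new span id."""
    return SpanContext(parent.trace_id, secrets.token_bytes(8), parent.sampled)


@dataclass
class Span:
    name: str
    context: SpanContext
    parent: Optional[SpanContext] = None
    kind: str = "INTERNAL"
    start_ns: int = field(default_factory=time.time_ns)  # Unix nanos
    end_ns: Optional[int] = None
    attributes: dict = field(default_factory=dict)

    def end(self):
        self.end_ns = time.time_ns()


root_ctx = new_root_context()
root = Span("get_account", root_ctx, kind="SERVER")
child = Span("SELECT accounts", new_child_context(root_ctx), parent=root_ctx, kind="CLIENT")
child.attributes["db.system"] = "mysql"
child.end()
root.end()
```

Rendered as hex, the trace id is 32 characters and the span id is 16, which is the form these identifiers usually take in headers and UIs.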
29. Span Names
Span Name                 Guidance
get                       Too general
get_account/42            Too specific
get_account               Good, and account_id=42 would make a nice Span attribute
get_account/{accountId}   Also good (using the “HTTP route”)
30. Span Attribute Examples
HTTP Server
http.method            PUT
http.scheme            https
http.server_name       api22.opentelemetry.io
net.host.port          443
http.target            /blog/posts/774356
http.client_ip         2001:506:71f0:16e::1
http.user_agent        Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36
RDBMS Client
db.system              mysql
db.connection_string   Server=shopdb.example.com;Database=ShopDb;Uid=billing_user;TableCache=true;UseCompression=True;MinimumPoolSize=10;MaximumPoolSize=50;
net.host.ip            192.0.3.122
net.host.port          51306
net.peer.ip            192.0.2.12
net.peer.port          3306
db.statement           SELECT * FROM orders WHERE order_id = :oid
31. Tracing SDK Concepts
● Sampler controls the number of traces collected and sent to the backend.
○ Sampling decision values [ DROP | RECORD_ONLY | RECORD_AND_SAMPLE ]
○ SDK built-in samplers [ AlwaysOn | AlwaysOff | TraceIdRatioBased ]
○ Use AlwaysOn if doing tail-based sampling
● SpanProcessor enables hooks for span start and end method invocations.
○ SDK built-in processors [ ExportFormatConverter | Batching ]
○ Only invoked when IsRecording is true
● SpanExporter defines the interface for protocol-specific exporters.
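One way a ratio-based sampler can reach a consistent decision is to compare part of the trace id against a threshold, so every service seeing the same trace id decides the same way. The sketch below assumes that approach; the class name is hypothetical, and the exact comparison used by a given SDK may differ.

```python
DROP = "DROP"
RECORD_AND_SAMPLE = "RECORD_AND_SAMPLE"


class TraceIdRatioSampler:
    """Sample roughly `ratio` of all traces, deterministically per trace id."""

    def __init__(self, ratio):
        if not 0.0 <= ratio <= 1.0:
            raise ValueError("ratio must be between 0.0 and 1.0")
        # Threshold over the 64-bit space taken from the end of the trace id.
        self._bound = int(ratio * (1 << 64))

    def should_sample(self, trace_id):
        low_64_bits = int.from_bytes(trace_id[-8:], "big")
        return RECORD_AND_SAMPLE if low_64_bits < self._bound else DROP


# Fixed, made-up trace ids keep the example deterministic.
LOW_ID = bytes(15) + b"\x01"
HIGH_ID = b"\xff" * 16

half = TraceIdRatioSampler(0.5)
always_on = TraceIdRatioSampler(1.0)
always_off = TraceIdRatioSampler(0.0)
```

With a ratio of 1.0 or 0.0 this degenerates into the AlwaysOn and AlwaysOff behaviours, and no random number is drawn at decision time.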
32. Tracing Code - Java
Tracer tracer = OpenTelemetry.getTracer("instrumentation-library-name", "semver:1.0.0");
Span span = tracer.spanBuilder("my span").startSpan();
span.setAttribute("http.method", "GET");
span.setAttribute("http.url", url.toString());
try (Scope scope = tracer.withSpan(span)) {
    // your use case
    Attributes eventAttributes = Attributes.of(
        "key", AttributeValue.stringAttributeValue("value"),
        "result", AttributeValue.longAttributeValue(0L));
    span.addEvent("End Computation", eventAttributes);
} catch (Throwable t) {
    Status status = Status.INTERNAL.withDescription(t.getMessage());
    span.setStatus(status);
} finally {
    span.end(); // closing the scope does not end the span, this has to be done manually
}
33. Tracing Code - Typescript (Web)
import { SimpleSpanProcessor } from '@opentelemetry/tracing';
import { WebTracerProvider } from '@opentelemetry/web';
import { CollectorTraceExporter } from '@opentelemetry/exporter-collector';

const collectorOptions = {
  // url is optional and can be omitted - default is http://localhost:55681/v1/trace
  url: '<opentelemetry-collector-url>',
  // an optional object containing custom headers to be sent with each request
  headers: {},
};
const provider = new WebTracerProvider();
const exporter = new CollectorTraceExporter(collectorOptions);
provider.addSpanProcessor(new SimpleSpanProcessor(exporter));
provider.register();
34. Tracing Code - Python
# tracing.py
from opentelemetry import trace
from opentelemetry.exporter.otlp.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchExportSpanProcessor

trace.set_tracer_provider(TracerProvider())
otlp_exporter = OTLPSpanExporter(endpoint="localhost:55680")
trace.get_tracer_provider().add_span_processor(
    BatchExportSpanProcessor(otlp_exporter)
)
tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("foo"):
    with tracer.start_as_current_span("bar"):
        with tracer.start_as_current_span("baz"):
            print("Hello world from OpenTelemetry Python!")
35. Metrics API Concepts
● MeterProvider is the entry point of the API. It provides access to Meters.
○ Stateful object holding configuration with a global provider and possibly additional ones.
● Meter is the class responsible for creating and managing Instruments.
○ Named and optionally versioned with each instrumentation library using values guaranteed to be globally-unique.
○ Provides a means of capturing a batch of measurements in an atomic way.
● Instrument is a device for capturing raw measurements.
○ Identified by a LabelSet (set of key-value pairs)
○ Bound instruments have a pre-associated set of labels
○ SDK provides a means of aggregating measurements captured by an instrument before export.
36. Metric Instrument Types
Name                Instrument Kind                    Function(argument)   Default Aggregation
Counter             Synchronous additive monotonic     Add(increment)       Sum
UpDownCounter       Synchronous additive               Add(increment)       Sum
ValueRecorder       Synchronous                        Record(value)        MinMaxSumCount / DDSketch
SumObserver         Asynchronous additive monotonic    Observe(sum)         Sum
UpDownSumObserver   Asynchronous additive              Observe(sum)         Sum
ValueObserver       Asynchronous                       Observe(value)       MinMaxSumCount / DDSketch
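The difference between the three synchronous instrument kinds comes down to what they accept and how they are aggregated. The classes below are a hypothetical sketch, not the OpenTelemetry API; in particular, raising on a negative increment is a choice made for this illustration to show what "monotonic" means.

```python
class Counter:
    """Synchronous, additive, monotonic; default aggregation is a sum."""

    def __init__(self):
        self.sum = 0

    def add(self, increment):
        if increment < 0:
            raise ValueError("Counter is monotonic; negative increments are rejected")
        self.sum += increment


class UpDownCounter:
    """Synchronous, additive, not monotonic; default aggregation is a sum."""

    def __init__(self):
        self.sum = 0

    def add(self, increment):
        self.sum += increment


class ValueRecorder:
    """Synchronous, not additive; aggregated here as MinMaxSumCount."""

    def __init__(self):
        self.min = None
        self.max = None
        self.sum = 0
        self.count = 0

    def record(self, value):
        self.min = value if self.min is None else min(self.min, value)
        self.max = value if self.max is None else max(self.max, value)
        self.sum += value
        self.count += 1


processed_jobs = Counter()
processed_jobs.add(123)
processed_jobs.add(7)

queue_depth = UpDownCounter()
queue_depth.add(5)
queue_depth.add(-2)

latency_ms = ValueRecorder()
for observed in (12, 48, 7):
    latency_ms.record(observed)
```

A sum is meaningful for the two counters, while for recorded values such as latencies the distribution summary (min, max, sum, count) is the useful default.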
37. Metrics SDK Concepts
● Aggregator aggregates one or more measurements in a useful way.
● Accumulator receives metric events from the API and computes one Accumulation per active Instrument and Label Set pair.
● Processor receives Accumulations from the Accumulator and transforms them into an ExportRecordSet.
● Exporter receives the ExportRecordSet, transforms it into some protocol, and sends it somewhere.
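The Accumulator, Processor, and Exporter stages above can be sketched as a small pipeline. All class and function names here are hypothetical, only a sum aggregation is shown, and the exporter simply keeps records in a list rather than speaking a real protocol.

```python
from collections import defaultdict


class Accumulator:
    """Keeps one running sum per (instrument name, label set) pair."""

    def __init__(self):
        self._sums = defaultdict(int)

    def record(self, instrument, labels, value):
        key = (instrument, frozenset(labels.items()))
        self._sums[key] += value

    def collect(self):
        """Hand over the current accumulations and start a fresh interval."""
        accumulations, self._sums = dict(self._sums), defaultdict(int)
        return accumulations


def process(accumulations):
    """Processor: turn accumulations into a sorted list of export records."""
    records = [
        {"name": name, "labels": dict(labels), "value": total}
        for (name, labels), total in accumulations.items()
    ]
    return sorted(records, key=lambda r: (r["name"], sorted(r["labels"].items())))


class ListExporter:
    """Exporter stand-in that stores records in memory."""

    def __init__(self):
        self.exported = []

    def export(self, records):
        self.exported.extend(records)


accumulator = Accumulator()
accumulator.record("processed_jobs", {"Key": "SomeWork"}, 123)
accumulator.record("processed_jobs", {"Key": "SomeWork"}, 7)
accumulator.record("processed_jobs", {"Key": "OtherWork"}, 1)

exporter = ListExporter()
exporter.export(process(accumulator.collect()))
```

Two measurements with the same label set collapse into one record, while a different label set produces its own record, which is the "per Instrument and Label Set pair" rule in action.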
38. Metrics Code - Java
Meter meter = OpenTelemetry.getMeter("instrumentation-library-name", "semver:1.0.0");

// Build counter e.g. LongCounter
LongCounter counter = meter.longCounterBuilder("processed_jobs")
    .setDescription("Processed jobs")
    .setUnit("1")
    .build();

// It is recommended that the API user keep a reference to a Bound Counter for the entire time or
// call unbind when no longer needed.
BoundLongCounter someWorkCounter = counter.bind(Labels.of("Key", "SomeWork"));

// Record data
someWorkCounter.add(123);

// Alternatively, the user can use the unbound counter and explicitly
// specify the label set at call time:
counter.add(123, Labels.of("Key", "SomeWork"));

// Build observer e.g. LongObserver
LongObserver observer = meter.observerLongBuilder("cpu_usage")
    .setDescription("CPU Usage")
    .setUnit("ms")
    .build();

observer.setCallback(
    new LongObserver.Callback<LongObserver.ResultLongObserver>() {
        @Override
        public void update(ResultLongObserver result) {
            // long getCpuUsage()
            result.observe(getCpuUsage(), Labels.of("Key", "SomeWork"));
        }
    });
39. Metrics Code - Javascript (Node)
const { MeterProvider } = require('@opentelemetry/metrics');
const { CollectorMetricExporter } = require('@opentelemetry/exporter-collector');

const collectorOptions = {
  serviceName: 'basic-service',
  // url is optional and can be omitted - default is http://localhost:55681/v1/metrics
  url: '<opentelemetry-collector-url>',
};
const exporter = new CollectorMetricExporter(collectorOptions);

// Register the exporter
const meter = new MeterProvider({
  exporter,
  interval: 60000,
}).getMeter('example-meter');

// Now, start recording data
const counter = meter.createCounter('metric_name');
counter.add(10, { 'key': 'value' });
40. Metrics Code - Python
from opentelemetry import metrics
from opentelemetry.sdk.metrics import Counter, MeterProvider
from opentelemetry.sdk.metrics.export import ConsoleMetricsExporter
from opentelemetry.sdk.metrics.export.controller import PushController

metrics.set_meter_provider(MeterProvider())
meter = metrics.get_meter(__name__, True)
# Export collected metrics to the console every 5 seconds
controller = PushController(meter, ConsoleMetricsExporter(), 5)

requests_counter = meter.create_metric(
    name="requests",
    description="number of requests",
    unit="1",
    value_type=int,
    metric_type=Counter,
    label_keys=("environment",),
)

# Labels for this measurement; the value "staging" is assumed from the variable name
staging_labels = {"environment": "staging"}
requests_counter.add(25, staging_labels)
41. Logging Concepts
● No API; use existing logging libraries.
● Log events are enhanced with key-value pairs from SpanContext and Baggage by either an SDK or a Collector processor.
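With an existing logging library, enrichment can be as simple as a filter that copies identifiers from the current context onto each record. The sketch below uses Python's standard `logging` and `contextvars` modules; the `CURRENT_SPAN_IDS` variable, the filter class, and the id values are hypothetical stand-ins for whatever the SDK's context actually provides.

```python
import contextvars
import io
import logging

# Hypothetical holder for the active span's identifiers.
CURRENT_SPAN_IDS = contextvars.ContextVar("current_span_ids", default=None)


class SpanContextFilter(logging.Filter):
    """Copy trace and span ids from the current context onto each log record."""

    def filter(self, record):
        ids = CURRENT_SPAN_IDS.get() or {}
        record.trace_id = ids.get("trace_id", "-")
        record.span_id = ids.get("span_id", "-")
        return True


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(
    logging.Formatter("%(levelname)s trace_id=%(trace_id)s span_id=%(span_id)s %(message)s")
)
handler.addFilter(SpanContextFilter())

logger = logging.getLogger("otel_logging_sketch")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)

# Simulate an active span, log inside it, then log outside of any span.
token = CURRENT_SPAN_IDS.set(
    {"trace_id": "4bf92f3577b34da6a3ce929d0e0e4736", "span_id": "00f067aa0ba902b7"}
)
logger.info("charging card")
CURRENT_SPAN_IDS.reset(token)
logger.info("idle")

LOG_LINES = stream.getvalue().splitlines()
```

Application code keeps calling the logger as before; only the handler configuration changes, which is why no separate logging API is required.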
43. Exporter Wiring
● Export to console logging or in-memory
○ Best solution for unit tests
● Export directly to observability frontend
○ Simple architecture, but adds telemetry processing inside the application
● Export to otel-collector running as agent/sidecar, which forwards to main otel-collector
○ Static configuration because exporter always points to localhost
● Export to otel-collector running as an aggregator
○ All telemetry processing done in isolation from user-facing functionality
Copyright 2020, The OpenTelemetry Authors
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.