SlideShare verwendet Cookies, um die Funktionalität und Leistungsfähigkeit der Webseite zu verbessern und Ihnen relevante Werbung bereitzustellen. Wenn Sie diese Webseite weiter besuchen, erklären Sie sich mit der Verwendung von Cookies auf dieser Seite einverstanden. Lesen Sie bitte unsere Nutzervereinbarung und die Datenschutzrichtlinie.
SlideShare verwendet Cookies, um die Funktionalität und Leistungsfähigkeit der Webseite zu verbessern und Ihnen relevante Werbung bereitzustellen. Wenn Sie diese Webseite weiter besuchen, erklären Sie sich mit der Verwendung von Cookies auf dieser Seite einverstanden. Lesen Sie bitte unsere unsere Datenschutzrichtlinie und die Nutzervereinbarung.
The Hurricane's Butterfly: Debugging pathologically performing systems
The Hurricane’s Butterﬂy
Debugging pathologically performing systems
Debugging system failure
• Failures are easiest to debug when they are explicit and fatal
• A system that fails fatally stops: it ceases to make forward
progress, leaving behind a snapshot of its state — a core dump
• Unfortunately, these are not all problems…
• A broad class of problems are non-fatal: the system continues
to operate despite having failed, often destroying evidence
• Worst of all are those non-fatal failures that are also implicit
Implicit, non-fatal failure
• The most difﬁcult, time-consuming bugs to debug are those in
which the system failure is unbeknownst to the system itself
• The system does the wrong thing or returns the wrong result or
has pathological side effects (e.g., resource leaks)
• Of these, the gnarliest class are those failures that are not
strictly speaking failure at all: the system is operating correctly,
but is failing to operate in a timely or efﬁcient fashion
• That is, it just… sucks
The stack of abstraction
• Our software systems are built as stacks of abstraction
• These stacks allow us to stand on the shoulders of history — to
reuse components without rebuilding them
• We can do this because of the software paradox: software is
both information and machine, exhibiting properties of both
• Our stacks are higher and run deeper than we can see or know:
software is silent and opaque; the nature of abstraction is to
seal us from what runs beneath!
• They run so deep as to challenge our deﬁnition of software…
• When the stack of abstraction performs pathologically, its power
transmogriﬁes to peril: layering ampliﬁes performance
pathologies but hinders insight
• Work ampliﬁes as we go down the stack
• Latency ampliﬁes as we go up the stack
• Seemingly minor issues in one layer can cascade into systemic
• These are the butterﬂies that cause hurricanes
Butterﬂy III: Kernel page-table isolation
Data courtesy Scaleway, running a PHP workload with KPTI patches for Linux. Thank you Edouard Bonlieu and team!
• With pathologically performing systems, we are faced with
Leventhal’s Conundrum: given a hurricane, ﬁnd the butterﬂies!
• This is excruciatingly difﬁcult:
• Symptoms are often far removed from root cause
• There may not be a single root cause but several
• The system is dynamic and may change without warning
• Improvements to the system are hard to model and verify
• Emphatically, this is not “tuning” — it is debugging
• When we think of it as debugging, we can stop pretending that
understanding (and rectifying) pathological system performance
is rote or mechanical — or easy
• We can resist the temptation to be guided by folklore: just
because someone heard about something causing a problem
once doesn’t mean it’s the problem now!
• We can resist the temptation to change the system before
understanding it: just as you wouldn’t (or shouldn’t!) debug by
just changing code, you shouldn’t debug a pathologically
performing system by randomly altering it!
How do we debug?
• To debug methodically, we must resist the temptation to quick
hypotheses, focusing rather on questions and observations
• Iterating between questions and observations gathers the facts
that will constrain future hypotheses
• These facts can be used to disconﬁrm hypotheses!
• How do we ask questions?
• How do we make observations?
• For performance debugging, the initial question formulation is
particularly challenging: where does one start?
• Resource-centric methodologies like the USE Method
(Utilization/Saturation/Errors) can be excellent starting points…
• But keep these methodologies in their context: they provide
initial questions to ask — they are not recipes for debugging
arbitrary performance pathologies!
• Questions are answered through observation
• The observability of the system is paramount
• If the system cannot be observed, one is reduced to guessing,
making changes, and drawing inferences
• If it must be said, drawing inferences based only on change is
highly ﬂawed: correlation does not imply causation!
• To be observable, systems must be instrumentable: they must
be able to be altered to emit a datum in the desired condition
Observability through instrumentation
• Static instrumentation modiﬁes source to provide semantically
relevant information, e.g., via logging or counters
• Dynamic instrumentation allows for the system to be changed
while running to emit data, e.g. DTrace, OpenTracing
• Both mechanisms of instrumentation are essential!
• Static instrumentation provides the observations necessary for
early question formulation…
• Dynamic instrumentation answers deeper, ad hoc questions
Aside: Monitoring vs. observability
• Monitoring is an essential operational activity that can indicate a
pathologically performing system and provide initial questions
• But monitoring alone is often insufﬁcient to completely debug a
pathologically performing system, because the questions that it
can answer are limited to that which is monitored
• As we increasingly deploy developed systems rather than
received ones, it is a welcome (and unsurprising!) development
to see the focus of monitoring expand to observability!
• When instrumenting the system, it can become overwhelmed
with the overhead of instrumentation
• Aggregation is essential for scalable, non-invasive
instrumentation — and is a ﬁrst-class primitive in (e.g.) DTrace
• But aggregation also eliminates important dimensions of data,
especially with respect to time; some questions may only be
answered with disaggregated data!
• Use aggregation for performance debugging — but also
understand its limits!
• The visual cortex is unparalleled at detecting patterns
• The value of visualizing data is not merely providing answers,
but also (and especially) provoking new questions
• Our systems are so large, complicated and abstract that there is
not one way to visualize them, but many
• The visualization of systems and their representations is an
essential skill for performance debugging!
• Graphs are terriﬁc — so much so that we should not restrict
ourselves to the captive graphs found in bundled software!
• An ad hoc plotting tool is essential for performance debugging;
and Gnuplot is an excellent (if idiosyncratic) one
• Gnuplot is easily combined with workhorses like awk or perl
• That Gnuplot is an essential tool helps to set expectation
around performance debugging tools: they are not magicians!
• Especially when trying to understand interplay between different
entities, it can be useful to visualize their state over time
• Time is the critical element here!
• We are experimenting with statemaps whereby state transitions
are instrumented (e.g., with DTrace) and then visualized
• This is not necessarily a new way of visualizing the system
(e.g., early thread debuggers often showed thread state over
time), but with a new focus on post hoc visualization
• Primordial implementation: https://github.com/joyent/statemap
The hurricane’s butterﬂy
• Finding the source(s) of pathologically performing systems must
be thought of as debugging — albeit the hardest kind
• Debugging isn’t about making guesses; it’s about asking
questions and answering them with observations
• We must enshrine observability to assure debuggability!
• Debugging rewards persistence, grit, and resilience more than
intuition or insight — it is more perspiration than inspiration!
• We must have the faith that our systems are — in the end —
purely synthetic; we can ﬁnd the hurricane’s butterﬂy!