@xaprb
Why Nobody Cares About Your Anomaly Detection
Baron Schwartz - November 2017
https://www.flickr.com/photos/muelebius/14113267399
Skepticism From John Allspaw
“… your attempts to detect anomalies perfectly, at the right time, is not possible…”
https://www.kitchensoap.com/2015/05/01/openlettertomonitoringproducts/
…And Ewaschuk and Beyer
“In general, Google has trended toward simpler and faster monitoring
systems, with better tools for post hoc analysis. We avoid ‘magic’ systems
that try to learn thresholds or automatically detect causality.”

— The Google SRE book: Monitoring Distributed Systems Chapter
… But Not This Vendor
What Good Is Anomaly Detection?
• How does it work?

• Why is it so hard?

• What’s it good for anyway?
A Rose By Any Other Name
• “Machine Learning”

• “Dynamic Baselining”

• “Automatic Thresholds”

• “Adaptive Self-Learning Serverless IoT Big Data Blockchain”
How Anomaly Detection Works
• An anomaly is usually defined as “something abnormal.”

• Normal is usually defined by a mathematical model.

• Anomaly detection, in this sense, is really prediction/forecasting.
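A minimal sketch of that idea, assuming the simplest seasonal model there is (a “seasonal naive” forecast that predicts each point equals the value one season earlier) and a fixed `tolerance`. The function names, the model, and the data are illustrative assumptions, not what any particular product does.

```python
from typing import List


def seasonal_naive_forecast(history: List[float], season: int) -> float:
    """Predict the next value as the value observed exactly one season earlier."""
    if len(history) < season:
        raise ValueError("need at least one full season of history")
    return history[-season]


def is_anomalous(history: List[float], observed: float, season: int, tolerance: float) -> bool:
    """'Anomaly' here just means: the forecast missed by more than `tolerance`."""
    predicted = seasonal_naive_forecast(history, season)
    return abs(observed - predicted) > tolerance


# Hypothetical hourly request counts: 150/hour from 09:00 to 17:59, 100/hour otherwise, for one week.
history = [150.0 if 9 <= h % 24 < 18 else 100.0 for h in range(24 * 7)]

print(is_anomalous(history, 102.0, season=24, tolerance=20.0))  # False: forecast is 100.0
print(is_anomalous(history, 160.0, season=24, tolerance=20.0))  # True: off by 60
```

Swap in a better model and the detector changes with it; every interesting decision here (the model, the season length, the tolerance) is a forecasting decision.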
What’s Normal?
• Most people answer this question reflexively, with lots of unconscious
biases.

• The answer is usually “if a measurement is ± two standard deviations…”

• What’s implicit/assumed is:

• What’s the model that produces the forecast?

• What assumptions does it make about the data?

• What’s the cost/benefit of correct/incorrect predictions?
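The “± two standard deviations” rule is easy to write down, which makes its hidden assumptions easy to miss. The sketch below (function names are illustrative) computes the band and, separately, how often a perfectly normal value lands outside it *if* the data were independent and Gaussian, which real metrics usually are not.

```python
import math
import statistics
from typing import List, Tuple


def two_sigma_band(samples: List[float]) -> Tuple[float, float]:
    """Return (mean - 2*stdev, mean + 2*stdev).

    Implicit assumptions: the samples are stationary (no trend or seasonality),
    roughly Gaussian, and independent of one another.
    """
    mu = statistics.fmean(samples)
    sigma = statistics.stdev(samples)
    return mu - 2 * sigma, mu + 2 * sigma


def gaussian_exceedance(k: float) -> float:
    """Probability that a Gaussian value lies more than k standard deviations from the mean."""
    return math.erfc(k / math.sqrt(2))


samples = [10.0, 12.0, 11.0, 13.0, 9.0, 11.0, 10.0, 12.0]
low, high = two_sigma_band(samples)
print(f"band: {low:.2f} .. {high:.2f}")

# Even under ideal Gaussian assumptions, roughly 4.6% of normal points fall outside the band.
rate = gaussian_exceedance(2.0)
print(f"outside-band rate: {rate:.4f}")
print(f"expected alerts/day at one check per minute: {1440 * rate:.0f}")
```

At one check per minute that is around 65 alerts a day from a single metric that never misbehaved, and that figure assumes the independent-Gaussian model actually holds.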
The Ad Nauseam Anomaly Picture
Pretty pictures with shaded bands! :-)
A More Useful Definition of Anomaly
An anomaly is an event that has impact greater than the cost of remediation,
and which is actionable by a person.

Restated: people always think they want to know what’s abnormal/weird, but
they really want to know what’s wrong and what to fix.
They don’t realize this till they experience being notified of abnormalities.
Why Is It Hard?
#1: Real-Time Often Isn’t
• We often assume anomaly detection “in real time” is possible/desirable.

• But what does that mean? People’s definitions vary wildly.

“Why checking your KPI several times a day? To detect problems as fast
as possible.”
#2: Real-Time Data Is Noisy
The beautiful charts always seem to come from long timescales, on the order
of days or weeks. At the 1-second time scale, systems are incredibly noisy.
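One way to see why long-timescale charts look so clean: averaging shrinks noise. This sketch assumes synthetic data with independent noise, in which case averaging n points cuts the standard deviation by roughly √n. Real per-second data is often bursty and correlated, so the reduction is often smaller and the 1-second picture messier still.

```python
import random
import statistics
from typing import List


def downsample_mean(values: List[float], window: int) -> List[float]:
    """Average consecutive, non-overlapping windows of `window` points (a partial tail is dropped)."""
    return [statistics.fmean(values[i:i + window])
            for i in range(0, len(values) - window + 1, window)]


def coefficient_of_variation(values: List[float]) -> float:
    """Standard deviation divided by the mean: a unitless measure of noisiness."""
    return statistics.pstdev(values) / statistics.fmean(values)


rng = random.Random(42)
# Hypothetical metric: one hour of 1-second samples around 100 req/s with independent Gaussian noise.
per_second = [max(0.0, rng.gauss(100.0, 30.0)) for _ in range(3600)]
per_minute = downsample_mean(per_second, 60)

print(f"1-second samples: {len(per_second)}, CV = {coefficient_of_variation(per_second):.3f}")
print(f"1-minute samples: {len(per_minute)}, CV = {coefficient_of_variation(per_minute):.3f}")
```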
#3: Cost/Benefit Asymmetry
• What’s the benefit of a true positive or true negative? What’s the cost?

• The sensitivity/specificity tradeoff is very unbalanced.

• And because your systems are much noisier than you think, you’re
probably wrong about the number of false positives/negatives you’ll get.

• The signal-to-noise ratio turns out to be really poor.

• Even if the anomaly detection isn’t wrong, if it’s not actionable, it’s still
damaging.
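A quick base-rate calculation shows why the signal-to-noise ratio disappoints. All the numbers below are hypothetical: 1,000 metrics checked once a minute, a detector that is 99% sensitive and 99% specific, and real anomalies in 1 of every 1,000 checks.

```python
def alert_precision(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Fraction of alerts that are real anomalies (positive predictive value, via Bayes' rule)."""
    true_pos = prevalence * sensitivity
    false_pos = (1.0 - prevalence) * (1.0 - specificity)
    return true_pos / (true_pos + false_pos)


def false_alarms_per_day(metrics: int, checks_per_day: int,
                         prevalence: float, specificity: float) -> float:
    """Expected number of alerts per day raised on data that was actually normal."""
    return metrics * checks_per_day * (1.0 - prevalence) * (1.0 - specificity)


precision = alert_precision(prevalence=0.001, sensitivity=0.99, specificity=0.99)
noise = false_alarms_per_day(metrics=1000, checks_per_day=1440, prevalence=0.001, specificity=0.99)
print(f"share of alerts that are real: {precision:.1%}")  # about 9%
print(f"false alarms per day: {noise:,.0f}")              # about 14,400
```

Even with a detector far better than most, roughly nine out of ten alerts would be false, and that is before asking whether the true ones are actionable.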
#4: Results Aren’t Interpretable
• Most anomaly detection techniques use complex models that are black
boxes combining many moving pieces, many of which are
nondeterministic.

• It’s often nearly impossible to agree or disagree with the outcome.

• Even a simple exponential moving average can be hard to audit.
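To see why even an EWMA is hard to audit, consider what a single output value depends on. In this sketch (names and parameters are illustrative), a nonzero reading 20 points after a spike is purely an echo of that spike, and the 20 most recent points carry only about two-thirds of the weight; the rest comes from older history that the number itself doesn't show.

```python
from typing import List


def ewma(values: List[float], alpha: float) -> List[float]:
    """Exponentially weighted moving average: s[t] = alpha * x[t] + (1 - alpha) * s[t-1].

    Seeded with the first observation; other tools may seed or normalize differently,
    which is one more reason their outputs can disagree.
    """
    if not values:
        return []
    smoothed = [values[0]]
    for x in values[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


def weight_of_recent(alpha: float, n: int) -> float:
    """Share of the EWMA's weight carried by the n most recent observations."""
    return 1 - (1 - alpha) ** n


# A single spike of 100 in an otherwise all-zero series, followed by 20 quiet points.
series = [0.0] * 10 + [100.0] + [0.0] * 20
print(f"EWMA 20 points after the spike: {ewma(series, 0.05)[-1]:.2f}")
print(f"weight on the last 20 points: {weight_of_recent(0.05, 20):.0%}")
```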
#5: High Cognitive Load
• Systems that abstract/process data and present black-box outcomes are
difficult for engineering teams to act on.

• In firefights, uncertainty, stress, time pressure, and consequences are all
at very high levels.

• Engineers generally will work to reduce these factors, which means they
ignore abstract, non-auditable conclusions they aren’t sure whether to
trust.

• Engineers usually want interpretable, raw data.
#6: Highly Dynamic Systems
• Most systems exhibit periodicity on the scale of weeks, which takes weeks of
data to train, but many such systems have useful lifetimes on the order of
hours or days before the underlying model disappears or changes.

• This means a lot of anomaly detection techniques are obsolete before
they’re even usable.
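The mismatch is easy to quantify. This sketch assumes a seasonal model needs about two full seasons of history before it is usable (a common rule of thumb for initializing models such as Holt-Winters; the exact requirement depends on the method) and compares that with how long a system actually lives. The lifetimes are hypothetical.

```python
from datetime import timedelta


def training_coverage(lifetime: timedelta, season: timedelta, seasons_needed: int = 2) -> float:
    """Fraction of the required training history a system accumulates before it is replaced."""
    return lifetime / (season * seasons_needed)


weekly = timedelta(weeks=1)
for name, lifetime in [("container", timedelta(hours=6)),
                       ("deployment", timedelta(days=3)),
                       ("long-lived host", timedelta(weeks=8))]:
    coverage = training_coverage(lifetime, weekly)
    print(f"{name:>16}: {coverage:6.1%} of the needed history")
```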
#7: Stored Baselines
• If a product calculates “baselines,” should it store them or calculate on-
the-fly?

• If stored, they become obsolete if the system’s parameters/model
changes, or if the algorithm is upgraded.

• If derived, they’re often not practically computable, or unavailable for use
in many popular tools that can only read “real” metrics from storage.
#8: Anomalies Skew Forecasts
• Most feasible models predict things like trend and seasonality.

• Anomalies will perturb these models and cause them to forecast repeated
anomalies.

• Compensating for these factors makes the models a lot less feasible and
understandable.
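A minimal illustration, assuming the simplest seasonal forecast (copy the previous season) on a toy series with a season of 4 points: one spike in the last season is faithfully forecast to happen again. Taking the median of the same slot across several past seasons is one simple way to resist that. It is a swapped-in technique for illustration, and it gives up some responsiveness to genuine level shifts.

```python
import statistics
from typing import List


def naive_seasonal(history: List[float], season: int) -> List[float]:
    """Forecast the next season by copying the most recent one, anomalies included."""
    return history[-season:]


def median_seasonal(history: List[float], season: int, cycles: int) -> List[float]:
    """Forecast each slot as the median of that slot over the last `cycles` seasons."""
    if len(history) < season * cycles:
        raise ValueError("not enough history for the requested number of cycles")
    forecast = []
    for pos in range(season):
        # The same slot in each of the last `cycles` seasons (c = 0 is the most recent).
        same_slot = [history[len(history) - season * (c + 1) + pos] for c in range(cycles)]
        forecast.append(statistics.median(same_slot))
    return forecast


history = [10.0, 20.0, 30.0, 20.0] * 3
history[-2] = 300.0  # a one-off spike in the most recent season

print(naive_seasonal(history, 4))      # [10.0, 20.0, 300.0, 20.0]: the spike is forecast again
print(median_seasonal(history, 4, 3))  # [10.0, 20.0, 30.0, 20.0]
```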
#9: Vendor Hype
When the vendor obviously uses Holt-Winters Forecasting, but calls it
“machine learning” (presumably ML is used to choose params?)…

When a familiar technique like K-Means Clustering is called Artificial
Intelligence…

… we all lose confidence and credibility in the eyes of users.

… and our users have expectations we can’t realistically meet.
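For a sense of scale: additive Holt-Winters is three exponential-smoothing recurrences (level, trend, and seasonal) plus a forecast formula. The sketch below uses one common textbook form with a simple initialization from the first two seasons and hand-picked smoothing parameters; it is not any vendor's implementation. Choosing `alpha`, `beta`, and `gamma` well is the genuinely hard part, and plausibly where a “machine learning” label would come in.

```python
from typing import List


def holt_winters_additive(series: List[float], season: int, alpha: float,
                          beta: float, gamma: float, horizon: int) -> List[float]:
    """Additive Holt-Winters: exponentially smoothed level, trend, and seasonal components."""
    if len(series) < 2 * season:
        raise ValueError("need at least two full seasons to initialize")
    # Simple initialization from the first two seasons (one of several common choices).
    first, second = series[:season], series[season:2 * season]
    level = sum(first) / season
    trend = (sum(second) - sum(first)) / season ** 2
    seasonals = [y - level for y in first]  # indexed by t % season

    for t in range(season, len(series)):
        y = series[t]
        prev_seasonal = seasonals[t % season]
        prev_level = level
        level = alpha * (y - prev_seasonal) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonals[t % season] = gamma * (y - level) + (1 - gamma) * prev_seasonal

    n = len(series)
    # h-step-ahead forecast: level + h * trend + the seasonal term for that future slot.
    return [level + h * trend + seasonals[(n - 1 + h) % season] for h in range(1, horizon + 1)]


# Hypothetical noise-free series: a 4-point season repeated five times.
pattern = [10.0, 20.0, 30.0, 20.0] * 5
print(holt_winters_additive(pattern, season=4, alpha=0.3, beta=0.1, gamma=0.2, horizon=4))
```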
What’s It Good For?
First - Why Do People Want It?
1. They’ve got a LOT of metrics and can’t look at them all.

2. Vendors and conference thought-leaders told them anomaly detection
worked well.

3. They’ve had problems, noticed a metric spiking, and thought “if only
we’d known sooner about that.”

4. They’re engineers, so they think “this has to be a solvable problem.”
#1: Very Specific, Targeted Uses
• You have an absolutely critical, sensitive high-level KPI like pageviews

• Fast-moving data that’s extremely predictable and consistent

• You have validated the exact behavior and expect it to be immortal
#2: Capacity Planning
• This is forecasting, not anomaly detection.

• This is an important use case for Netflix,
Twitter, and others.

• Question: is a Christmas spike an anomaly?
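Capacity planning often starts with something as plain as a trend line. A minimal sketch, assuming hypothetical daily disk-usage samples and a straight-line (least-squares) fit. A straight line deliberately ignores seasonal peaks such as a Christmas spike, which is exactly why that question matters for planning.

```python
from typing import List, Optional, Tuple


def linear_fit(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_until_full(days: List[float], used: List[float], capacity: float) -> Optional[float]:
    """Extrapolate the trend; return days after the last sample until full, or None if not growing."""
    slope, intercept = linear_fit(days, used)
    if slope <= 0:
        return None
    return (capacity - intercept) / slope - days[-1]


# Hypothetical disk usage in GB, sampled once a day, on a 500 GB volume.
days = [0.0, 1.0, 2.0, 3.0, 4.0]
used = [100.0, 110.0, 120.0, 130.0, 140.0]
print(days_until_full(days, used, capacity=500.0))  # 36.0
```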
#3: You Have A Team Of Data Scientists
It’s not a coincidence that many of the anomaly detection success stories
have dedicated, full-time data science teams. With PhDs.
#4: Context, Not Detection
• When you’re troubleshooting an incident, and you see a spike in a metric,
a great question is “what does this metric normally do?”

• On-the-fly calculation and visualization of that answer can be helpful.

• The mistake is to take it one step too far and think “I wish I could set an
alert on this…”
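Answering “what does this metric normally do?” on the fly can be as simple as summarizing the same time of day on previous days, computed when someone asks rather than stored or wired to an alert. A sketch, assuming regularly spaced samples aligned to day boundaries; the function name and data are hypothetical.

```python
import statistics
from typing import Dict, List


def same_time_summary(history: List[float], points_per_day: int,
                      slot: int, days: int) -> Dict[str, float]:
    """Min/median/max of the metric at `slot` (index within the day) over the last `days` complete days."""
    complete_days = len(history) // points_per_day
    if complete_days == 0:
        raise ValueError("need at least one complete day of history")
    first_day = max(0, complete_days - days)
    samples = [history[d * points_per_day + slot] for d in range(first_day, complete_days)]
    return {"min": min(samples), "median": statistics.median(samples), "max": max(samples)}


# Hypothetical metric with 4 samples per day (every 6 hours), three days of history.
history = [1.0, 2.0, 3.0, 4.0,
           2.0, 3.0, 4.0, 5.0,
           3.0, 4.0, 5.0, 6.0]
print(same_time_summary(history, points_per_day=4, slot=2, days=3))
# {'min': 3.0, 'median': 4.0, 'max': 5.0}
```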
“What Does This Metric Normally Do?”
[Charts: 1 Hour, 12 Hours]
#5: You Have A Specific Question
In my experience, a lot of the ills have come from thinking anomaly detection
is an answer, when the question/problem isn’t clear yet.
#6: If You Can’t Get It Any Other Way
Are you sure you need anomaly detection?

• Scenario: “Our rate of new-account signups per minute is a business KPI,
and we want to know if it’s broken for any reason. It’s highly cyclical and
predictable.”

• Solution 1: “This sounds ideal for time-series prediction, maybe with Holt-
Winters, and anomaly detection when there’s a deviation from the
prediction.”

• Solution 2: “Divide the signups series by the pageviews series to get the
pageview-to-signup conversion rate, and alert if it drops below a static threshold.” (See also the next slide.)
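Solution 2 needs no model at all. A minimal sketch, with hypothetical hourly numbers and a hypothetical 1% threshold: both raw series swing by an order of magnitude, but their ratio stays nearly flat, which is what makes a static threshold workable.

```python
from typing import List, Optional


def conversion_rates(signups: List[float], pageviews: List[float]) -> List[Optional[float]]:
    """Divide two aligned series point by point; None where there were no pageviews."""
    return [s / p if p > 0 else None for s, p in zip(signups, pageviews)]


def below_threshold(rates: List[Optional[float]], threshold: float) -> List[int]:
    """Indexes where the conversion rate dropped below a static threshold."""
    return [i for i, r in enumerate(rates) if r is not None and r < threshold]


pageviews = [1000.0, 5000.0, 20000.0, 8000.0, 1200.0]
signups = [20.0, 100.0, 410.0, 30.0, 25.0]  # hour 3 is "broken"

rates = conversion_rates(signups, pageviews)
print([f"{r:.2%}" for r in rates if r is not None])
print(below_threshold(rates, threshold=0.01))  # [3]
```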
Ask A 2-Dimensional Question
Instead of “what’s this metric’s behavior?” you’re asking “what’s this metric’s relationship to another?”
https://www.vividcortex.com/blog/correlating-metrics
Perl Monitoring Problems
$problems =~ s/regular expressions?/anomaly detection/gi
https://xkcd.com/1171/
A War Story
At VividCortex, we have (had) two kinds of anomaly detection.

• First, we built adaptive fault detection. It applies anomaly detection to a model
based on Little’s Law and queueing theory. It assigns specific meaning to a few
specific metrics that have an underlying physical basis. 

• The outcome has a well defined meaning too: “work is queueing up.”

• It turned out to be really hard to get the false positive rate down, even in this well-
controlled setting. It requires machine learning (!!).

• The result is still more difficult for customers to interpret than we’d like. “Can I set
my own threshold? What does it mean for this one to be bigger than that one?
What does the score really mean? What should I do about these? Can’t you just…”
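Little’s Law, which the first bullet mentions, is L = λW: in a stable system, the average number of requests in the system equals throughput times mean response time. The sketch below shows only that arithmetic plus one hypothetical check; it is not VividCortex’s fault-detection algorithm, and treating CPU cores as queueing “servers” is a simplifying assumption for illustration.

```python
def concurrency(throughput_per_sec: float, mean_latency_sec: float) -> float:
    """Little's Law: mean number in the system L = arrival rate (lambda) * mean time in system (W)."""
    return throughput_per_sec * mean_latency_sec


def concurrency_from_totals(total_response_time_sec: float, interval_sec: float) -> float:
    """Same quantity from counters: lambda = n / T and W = total / n, so L = total / T."""
    return total_response_time_sec / interval_sec


def is_queueing(mean_in_system: float, servers: int) -> bool:
    """At most `servers` requests can be in service, so L > servers means some are waiting.

    This is a sufficient sign of queueing, not a necessary one: queues can form below it.
    """
    return mean_in_system > servers


# Hypothetical database on 8 cores (each treated as one "server"), serving 1,000 queries/s.
normal = concurrency(1000.0, 0.005)   # 5 ms per query
slow = concurrency(1000.0, 0.020)     # 20 ms per query, same throughput
print(normal, is_queueing(normal, servers=8))
print(slow, is_queueing(slow, servers=8))
```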
Traditional Dynamic Baselines
At VividCortex we also built limited “dynamic baselining” on top of modified
Holt-Winters prediction.

• We baselined latency and error rate of the most frequent and time-
consuming queries in the system.

• Customers don’t use it, even though it remains a constant hypothetical
request (“I’d like to be alerted when important queries have significant
latency spikes.”)

• This is probably a case of customers asking for a faster horse. It’s also
possible that we just didn’t implement it well enough.
Okay, There Was A Third…
• The brilliant CEO built “Baggins” anomaly detection, then turned it off in
horror at the spam it generated.

• The cleverest thing about it was the name.

Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girlsMonica Sydney
 
Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...gajnagarg
 
20240507 QFM013 Machine Intelligence Reading List April 2024.pdf
20240507 QFM013 Machine Intelligence Reading List April 2024.pdf20240507 QFM013 Machine Intelligence Reading List April 2024.pdf
20240507 QFM013 Machine Intelligence Reading List April 2024.pdfMatthew Sinclair
 
"Boost Your Digital Presence: Partner with a Leading SEO Agency"
"Boost Your Digital Presence: Partner with a Leading SEO Agency""Boost Your Digital Presence: Partner with a Leading SEO Agency"
"Boost Your Digital Presence: Partner with a Leading SEO Agency"growthgrids
 

Kürzlich hochgeladen (20)

Trump Diapers Over Dems t shirts Sweatshirt
Trump Diapers Over Dems t shirts SweatshirtTrump Diapers Over Dems t shirts Sweatshirt
Trump Diapers Over Dems t shirts Sweatshirt
 
best call girls in Hyderabad Finest Escorts Service 📞 9352988975 📞 Available ...
best call girls in Hyderabad Finest Escorts Service 📞 9352988975 📞 Available ...best call girls in Hyderabad Finest Escorts Service 📞 9352988975 📞 Available ...
best call girls in Hyderabad Finest Escorts Service 📞 9352988975 📞 Available ...
 
Nagercoil Escorts Service Girl ^ 9332606886, WhatsApp Anytime Nagercoil
Nagercoil Escorts Service Girl ^ 9332606886, WhatsApp Anytime NagercoilNagercoil Escorts Service Girl ^ 9332606886, WhatsApp Anytime Nagercoil
Nagercoil Escorts Service Girl ^ 9332606886, WhatsApp Anytime Nagercoil
 
一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样
一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样
一比一原版(Flinders毕业证书)弗林德斯大学毕业证原件一模一样
 
Indian Escort in Abu DHabi 0508644382 Abu Dhabi Escorts
Indian Escort in Abu DHabi 0508644382 Abu Dhabi EscortsIndian Escort in Abu DHabi 0508644382 Abu Dhabi Escorts
Indian Escort in Abu DHabi 0508644382 Abu Dhabi Escorts
 
75539-Cyber Security Challenges PPT.pptx
75539-Cyber Security Challenges PPT.pptx75539-Cyber Security Challenges PPT.pptx
75539-Cyber Security Challenges PPT.pptx
 
20240509 QFM015 Engineering Leadership Reading List April 2024.pdf
20240509 QFM015 Engineering Leadership Reading List April 2024.pdf20240509 QFM015 Engineering Leadership Reading List April 2024.pdf
20240509 QFM015 Engineering Leadership Reading List April 2024.pdf
 
一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样
一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样
一比一原版(Curtin毕业证书)科廷大学毕业证原件一模一样
 
原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查
原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查
原版制作美国爱荷华大学毕业证(iowa毕业证书)学位证网上存档可查
 
20240510 QFM016 Irresponsible AI Reading List April 2024.pdf
20240510 QFM016 Irresponsible AI Reading List April 2024.pdf20240510 QFM016 Irresponsible AI Reading List April 2024.pdf
20240510 QFM016 Irresponsible AI Reading List April 2024.pdf
 
Russian Escort Abu Dhabi 0503464457 Abu DHabi Escorts
Russian Escort Abu Dhabi 0503464457 Abu DHabi EscortsRussian Escort Abu Dhabi 0503464457 Abu DHabi Escorts
Russian Escort Abu Dhabi 0503464457 Abu DHabi Escorts
 
Vip Firozabad Phone 8250092165 Escorts Service At 6k To 30k Along With Ac Room
Vip Firozabad Phone 8250092165 Escorts Service At 6k To 30k Along With Ac RoomVip Firozabad Phone 8250092165 Escorts Service At 6k To 30k Along With Ac Room
Vip Firozabad Phone 8250092165 Escorts Service At 6k To 30k Along With Ac Room
 
在线制作约克大学毕业证(yu毕业证)在读证明认证可查
在线制作约克大学毕业证(yu毕业证)在读证明认证可查在线制作约克大学毕业证(yu毕业证)在读证明认证可查
在线制作约克大学毕业证(yu毕业证)在读证明认证可查
 
Real Men Wear Diapers T Shirts sweatshirt
Real Men Wear Diapers T Shirts sweatshirtReal Men Wear Diapers T Shirts sweatshirt
Real Men Wear Diapers T Shirts sweatshirt
 
20240508 QFM014 Elixir Reading List April 2024.pdf
20240508 QFM014 Elixir Reading List April 2024.pdf20240508 QFM014 Elixir Reading List April 2024.pdf
20240508 QFM014 Elixir Reading List April 2024.pdf
 
PowerDirector Explination Process...pptx
PowerDirector Explination Process...pptxPowerDirector Explination Process...pptx
PowerDirector Explination Process...pptx
 
Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girls
Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girlsRussian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girls
Russian Call girls in Abu Dhabi 0508644382 Abu Dhabi Call girls
 
Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Dindigul [ 7014168258 ] Call Me For Genuine Models ...
 
20240507 QFM013 Machine Intelligence Reading List April 2024.pdf
20240507 QFM013 Machine Intelligence Reading List April 2024.pdf20240507 QFM013 Machine Intelligence Reading List April 2024.pdf
20240507 QFM013 Machine Intelligence Reading List April 2024.pdf
 
"Boost Your Digital Presence: Partner with a Leading SEO Agency"
"Boost Your Digital Presence: Partner with a Leading SEO Agency""Boost Your Digital Presence: Partner with a Leading SEO Agency"
"Boost Your Digital Presence: Partner with a Leading SEO Agency"
 

Influx/Days 2017 San Francisco | Baron Schwartz

  • 1. @xaprb Why Nobody Cares About Your Anomaly Detection. Baron Schwartz, November 2017. Photo: https://www.flickr.com/photos/muelebius/14113267399
  • 2. Skepticism From John Allspaw “… your attempts to detect anomalies perfectly, at the right time, is not possible…” https://www.kitchensoap.com/2015/05/01/openlettertomonitoringproducts/
  • 3. …And Ewaschuk and Beyer “In general, Google has trended toward simpler and faster monitoring systems, with better tools for post hoc analysis. We avoid ‘magic’ systems that try to learn thresholds or automatically detect causality.” (The Google SRE book, “Monitoring Distributed Systems” chapter)
  • 4. … But Not This Vendor
  • 5. What Good Is Anomaly Detection? • How does it work? • Why is it so hard? • What’s it good for anyway?
  • 6. A Rose By Any Other Name • “Machine Learning” • “Dynamic Baselining” • “Automatic Thresholds” • “Adaptive Self-Learning Serverless IoT Big Data Blockchain”
  • 7. How Anomaly Detection Works • An anomaly is usually defined as “something abnormal.” • Normal is usually defined by a mathematical model. • Anomaly detection, in this sense, is really prediction/forecasting.
  • 8. What’s Normal? • Most people answer this question reflexively, with lots of unconscious biases. • The answer is usually “if a measurement is ± two standard deviations…” • What’s implicit/assumed is: • What’s the model that produces the forecast? • What assumptions does it make about the data? • What’s the cost/benefit of correct/incorrect predictions?
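A minimal Python sketch of the “± two standard deviations” rule makes those implicit assumptions visible. The function name, the k=2 default, and the sample data are hypothetical. The rule silently assumes three things: the data are stationary, they are roughly normally distributed, and the history itself contains no anomalies.

```python
from statistics import mean, stdev

def two_sigma_anomalies(history, new_points, k=2.0):
    """Flag new points outside mean ± k sample standard deviations of `history`.

    Hypothetical helper. Implicitly assumes stationary, roughly normal data
    and an anomaly-free `history` (needs at least 2 points for stdev).
    """
    mu = mean(history)
    sigma = stdev(history)
    lower, upper = mu - k * sigma, mu + k * sigma
    return [x for x in new_points if x < lower or x > upper]

# Hypothetical data: mean 100, sample standard deviation 2, so the band is 96..104.
history = [100, 102, 98, 101, 99, 100, 103, 97]
print(two_sigma_anomalies(history, [101, 110, 90]))  # prints [110, 90]
```

Nothing in this code says what the model is, whether the data fit it, or what a wrong answer costs. Those are exactly the unstated questions listed above.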
  • 9. The Ad Nauseam Anomaly Picture: Pretty pictures with shaded bands! :-)
  • 10. A More Useful Definition of Anomaly: An anomaly is an event that has impact greater than the cost of remediation, and which is actionable by a person. Restated: people always think they want to know what’s abnormal/weird, but they really want to know what’s wrong and what to fix. They don’t realize this until they experience being notified of abnormalities.
  • 11. Why Is It Hard?
  • 12. #1: Real-Time Often Isn’t • We often assume anomaly detection “in real time” is possible/desirable. • But what does that mean? People’s definitions vary wildly. “Why checking your KPI several times a day? To detect problems as fast as possible.”
  • 13. #2: Real-Time Data Is Noisy The beautiful charts always seem to come from long timescales, on the order of days or weeks. At the 1-second time scale, systems are incredibly noisy.
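This sketch shows why timescale matters. It simulates one hour of hypothetical 1-second samples and rolls them up into 1-minute averages. The simulated metric is a flat level of 100 plus Gaussian noise, which is an assumption; real metrics are usually burstier and autocorrelated. Averaging n independent samples shrinks the noise by roughly √n. As a result, the minute-level series looks far smoother than the second-level data it came from.

```python
import random
from statistics import mean, pstdev

random.seed(42)  # fixed seed so the simulation is repeatable

# Hypothetical metric: one hour of 1-second samples, a flat level of 100
# plus Gaussian noise (an assumption; real metrics are burstier).
per_second = [100 + random.gauss(0, 20) for _ in range(3600)]

# The same hour rolled up into 1-minute averages (60 samples each).
per_minute = [mean(per_second[i:i + 60]) for i in range(0, 3600, 60)]

print(f"1-second std dev: {pstdev(per_second):.1f}")
print(f"1-minute std dev: {pstdev(per_minute):.1f}")
```

A band that looks tight on a daily chart of minute averages can be many times too narrow for the raw per-second stream.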
  • 14. #3: Cost/Benefit Asymmetry • What’s the benefit of a true positive or true negative? What’s the cost? • The sensitivity/specificity tradeoff is very unbalanced. • And because your systems are much noisier than you think, you’re probably wrong about the number of false positives/negatives you’ll get. • The signal-to-noise ratio turns out to be really poor. • Even if the anomaly detection isn’t wrong, if it’s not actionable, it’s still damaging.
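The asymmetry is easy to underestimate, so here is the arithmetic as a short Python sketch using Bayes’ rule. All three input numbers are hypothetical. Suppose only 1 in 1,000 checks coincides with a real problem. A detector that catches 99% of problems and misfires on only 1% of normal checks still produces alerts that are mostly false.

```python
def alert_precision(base_rate, sensitivity, specificity):
    """Fraction of alerts that correspond to real problems (Bayes' rule).

    base_rate:   fraction of checks where a real problem exists
    sensitivity: P(alert | real problem)
    specificity: P(no alert | no problem)
    """
    true_alerts = base_rate * sensitivity
    false_alerts = (1 - base_rate) * (1 - specificity)
    return true_alerts / (true_alerts + false_alerts)

# Hypothetical numbers: rare problems and a seemingly excellent detector.
p = alert_precision(base_rate=0.001, sensitivity=0.99, specificity=0.99)
print(f"{p:.1%} of alerts are real")  # prints 9.0% of alerts are real
```

Real problems are rare, so the false-alarm term (1 − base_rate) × (1 − specificity) dominates the denominator. Noisier data lowers the effective specificity and makes this worse.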
  • 15. #4: Results Aren’t Interpretable • Most anomaly detection techniques use complex models that are black boxes combining many moving pieces, many of which are nondeterministic. • It’s often nearly impossible to agree or disagree with the outcome. • Even a simple exponential moving average can be hard to audit.
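Even the simplest smoother illustrates the auditing problem. Below is a Python sketch of an exponential moving average. The smoothing factor alpha and the choice to seed with the first observation are assumptions, and implementations differ on both. Each output blends the current point with every earlier one, so explaining a single value means replaying the whole history with the same parameters.

```python
def ema(values, alpha):
    """Exponential moving average: s_t = alpha * x_t + (1 - alpha) * s_(t-1)."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    s = values[0]  # seed with the first observation (one common convention)
    for x in values:
        s = alpha * x + (1 - alpha) * s
        smoothed.append(s)
    return smoothed

print(ema([10, 10, 20, 10], alpha=0.5))  # prints [10.0, 10.0, 15.0, 12.5]
```

The final 12.5 is neither the latest reading (10) nor the spike (20). Whether it counts as “normal” depends on alpha and on the seeding convention.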
  • 16. #5: High Cognitive Load • Systems that abstract/process data and present black-box outcomes are difficult for engineering teams to act on. • In firefights, uncertainty, stress, time pressure, and consequences are all at very high levels. • Engineers generally will work to reduce these factors, which means they ignore abstract, non-auditable conclusions they aren’t sure whether to trust. • Engineers usually want interpretable, raw data.
  • 17. #6: Highly Dynamic Systems • Most systems exhibit trainable periodicity on the scale of weeks, but many such systems have useful lifetimes on the order of hours or days before the underlying model disappears or changes. • This means a lot of anomaly detection techniques are obsolete before they’re even usable.
  • 18. #7: Stored Baselines • If a product calculates “baselines,” should it store them or calculate them on the fly? • If stored, they become obsolete if the system’s parameters/model changes, or if the algorithm is upgraded. • If derived, they’re often not practically computable, or unavailable for use in many popular tools that can only read “real” metrics from storage.
  • 19. #8: Anomalies Skew Forecasts • Most feasible models predict things like trend and seasonality. • Anomalies will perturb these models and cause them to forecast repeated anomalies. • Compensating for these factors makes the models a lot less feasible and understandable.
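A deliberately simple stand-in shows the mechanism. The sketch below uses a seasonal naive forecast (“tomorrow looks like today”), not Holt-Winters, and a hypothetical 4-sample “day”. Smoother trend/seasonality models dampen the effect but are pulled in the same direction. A one-off spike in the training window becomes part of the forecast, so a perfectly normal next day shows up as a deviation.

```python
def seasonal_naive_forecast(series, period):
    """Forecast the next `period` points by repeating the most recent season."""
    return series[-period:]

# Hypothetical data: a 4-sample "day".
day1 = [10, 50, 80, 40]
day2 = [10, 500, 80, 40]   # one-off spike in the second slot
forecast_day3 = seasonal_naive_forecast(day1 + day2, period=4)

day3 = [10, 50, 80, 40]    # a completely normal day
errors = [actual - predicted for actual, predicted in zip(day3, forecast_day3)]
print(forecast_day3)  # prints [10, 500, 80, 40]
print(errors)         # prints [0, -450, 0, 0]
```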
  • 20. #9: Vendor Hype When the vendor obviously uses Holt-Winters Forecasting, but calls it “machine learning” (presumably ML is used to choose params?)… When a familiar technique like K-Means Clustering is called Artificial Intelligence… … we all lose confidence and credibility in the eyes of users. … and our users have expectations we can’t realistically meet.
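For readers who haven’t met it, Holt-Winters forecasting is triple exponential smoothing. It tracks a level, a trend, and a repeating seasonal pattern, each with its own smoothing parameter. The Python sketch below is one common additive textbook variant, initialized with a simple heuristic based on the first two seasons. It is not any vendor’s implementation, and alpha, beta, and gamma are parameters that someone (or something) has to choose.

```python
def holt_winters_additive(series, m, alpha, beta, gamma, horizon):
    """Additive Holt-Winters (triple exponential smoothing), one textbook variant.

    series: evenly spaced observations, oldest first; at least two seasons.
    m:      season length in samples (e.g. 24 for hourly data with a daily cycle).
    alpha, beta, gamma: smoothing factors for level, trend, and season, in [0, 1].
    Returns forecasts for the `horizon` steps after the last observation.
    """
    if len(series) < 2 * m:
        raise ValueError("need at least two full seasons of data")

    # Initialize from the first two seasons (a simple heuristic, one of several).
    first = sum(series[:m]) / m
    second = sum(series[m:2 * m]) / m
    level = first
    trend = (second - first) / m
    seasonals = [series[i] - first for i in range(m)]  # ring buffer indexed by t % m

    for t in range(m, len(series)):
        y = series[t]
        last_season = seasonals[t % m]  # s_(t-m)
        last_level = level
        level = alpha * (y - last_season) + (1 - alpha) * (level + trend)
        trend = beta * (level - last_level) + (1 - beta) * trend
        seasonals[t % m] = gamma * (y - level) + (1 - gamma) * last_season

    n = len(series)
    # Each forecast uses the latest seasonal estimate for its slot in the cycle.
    return [level + h * trend + seasonals[(n - 1 + h) % m]
            for h in range(1, horizon + 1)]

# Hypothetical data: a perfectly repeating pattern with season length 3.
history = [10, 20, 30] * 4
print(holt_winters_additive(history, m=3, alpha=0.5, beta=0.1, gamma=0.1, horizon=3))
```

On a perfectly repeating series, the forecast simply continues the pattern. Turning this into anomaly detection still requires choosing a tolerance band around each forecast, and that choice brings back the cost/benefit questions raised earlier.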
  • 23. First: Why Do People Want It? 1. They’ve got a LOT of metrics and can’t look at them all. 2. Vendors and conference thought-leaders told them anomaly detection worked well. 3. They’ve had problems, noticed a metric spiking, and thought “if only we’d known sooner about that.” 4. They’re engineers, so they think “this has to be a solvable problem.”
  • 24. #1: Very Specific, Targeted Uses • You have an absolutely critical, sensitive high-level KPI like pageviews • Fast-moving data that’s extremely predictable and consistent • You have validated the exact behavior and expect it to be immortal
  • 25. #2: Capacity Planning • This is forecasting, not anomaly detection. • This is an important use case for Netflix, Twitter, and others. • Question: is a Christmas spike an anomaly?
  • 26. #3: You Have A Team Of Data Scientists It’s not a coincidence that many of the anomaly detection success stories have dedicated, full-time data science teams. With PhDs.
  • 27. #4: Context, Not Detection • When you’re troubleshooting an incident, and you see a spike in a metric, a great question is “what does this metric normally do?” • On-the-fly calculation and visualization of that answer can be helpful. • The mistake is to take it one step too far and think “I wish I could set an alert on this…”
  • 28. “What Does This Metric Normally Do?” (Chart panels: 1 Hour, 12 Hours)
  • 29. #5: You Have A Specific Question In my experience, a lot of the ills have come from thinking anomaly detection is an answer, when the question/problem isn’t clear yet.
  • 30. #6: If You Can’t Get It Any Other Way Are you sure you need anomaly detection? • Scenario: “Our rate of new-account signups per minute is a business KPI, and we want to know if it’s broken for any reason. It’s highly cyclical and predictable.” • Solution 1: “This sounds ideal for time-series prediction, maybe with Holt-Winters, and anomaly detection when there’s a deviation from the prediction.” • Solution 2: “Calculate the pageview:signup conversion rate by dividing two series, and alert if it drops, using a static threshold.” (See also the next slide.)
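Solution 2 needs no model at all. In this Python sketch, the function name, the per-interval counts, and the 1% threshold are hypothetical. Pageviews and signups rise and fall together over the daily cycle, so their ratio stays roughly flat. A single static threshold on the ratio can therefore catch a broken signup flow, whereas a fixed threshold on either raw, cyclical series is hard to set.

```python
def conversion_alerts(pageviews, signups, min_rate):
    """Return indices of intervals whose signups/pageviews ratio falls below min_rate."""
    alerts = []
    for i, (views, joins) in enumerate(zip(pageviews, signups)):
        if views == 0:
            continue  # no traffic: the ratio is undefined, so skip rather than alert
        if joins / views < min_rate:
            alerts.append(i)
    return alerts

# Hypothetical per-interval counts; the signup flow breaks in the last interval.
pageviews = [1000, 4000, 2500, 3000]
signups = [20, 82, 49, 3]
print(conversion_alerts(pageviews, signups, min_rate=0.01))  # prints [3]
```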
  • 31. Ask A 2-Dimensional Question: Instead of “what’s this metric’s behavior?” you’re asking “what’s this metric’s relationship to another?” https://www.vividcortex.com/blog/correlating-metrics
  • 32. Perl Monitoring Problems $problems =~ s/regular expressions?/anomaly detection/gi https://xkcd.com/1171/
  • 33. A War Story At VividCortex, we have (had) two kinds of anomaly detection. • First, we built adaptive fault detection. It applies anomaly detection to a model based on Little’s Law and queueing theory. It assigns specific meaning to a few specific metrics that have an underlying physical basis. • The outcome has a well-defined meaning too: “work is queueing up.” • It turned out to be really hard to get the false positive rate down, even in this well-controlled setting. It requires machine learning (!!). • The result is still more difficult for customers to interpret than we’d like. “Can I set my own threshold? What does it mean for this one to be bigger than that one? What does the score really mean? What should I do about these? Can’t you just…”
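Little’s Law itself is simple. In steady state, the average number of requests in a system (L) equals their arrival rate (λ) times the average time each spends in it (W). The Python sketch below applies the law to one measurement window. The function name, window length, and counts are hypothetical, and this is only the underlying law, not VividCortex’s adaptive fault detection model. When concurrency rises while throughput stays similar, that is one way “work is queueing up” shows itself.

```python
def mean_concurrency(completions, window_seconds, total_response_time):
    """Little's Law, L = lambda * W, over one measurement window (hypothetical helper).

    completions:          requests that finished during the window
    window_seconds:       length of the window in seconds
    total_response_time:  sum of those requests' response times, in seconds
    Uses throughput as the arrival rate, which assumes a steady state.
    """
    if completions == 0:
        return 0.0
    throughput = completions / window_seconds          # lambda, requests per second
    mean_latency = total_response_time / completions   # W, seconds per request
    return throughput * mean_latency                   # L, average requests in flight

# Hypothetical windows: same throughput (100 req/s), but latency grows sixfold.
normal = mean_concurrency(1000, 10, 50)
queued = mean_concurrency(1000, 10, 300)
print(f"{normal:.1f} -> {queued:.1f} requests in flight")  # prints 5.0 -> 30.0 requests in flight
```

Algebraically, L here reduces to total_response_time / window_seconds. The sketch keeps λ and W separate so that each factor can be inspected on its own.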
  • 34. Traditional Dynamic Baselines At VividCortex we also built limited “dynamic baselining” on top of modified Holt-Winters prediction. • We baselined latency and error rate of the most frequent and time-consuming queries in the system. • Customers don’t use it, even though it remains a constant hypothetical request (“I’d like to be alerted when important queries have significant latency spikes.”) • This is probably a case of customers asking for a faster horse. It’s also possible that we just didn’t implement it well enough.
  • 35. Okay, There Was A Third… • The brilliant CEO built “Baggins” anomaly detection, then turned it off in horror at the spam it generated. • The cleverest thing about it was the name.