4.
TYPES OF ANOMALIES IN STREAMING DATA
• Point anomalies
• Temporal anomalies (contextual/conditional)
5.
ANOMALY DETECTION TECHNIQUES
• Traditional techniques
• Classification-based
• Clustering & nearest-neighbor
• Statistical techniques
• Chandola et al., “Anomaly Detection: A Survey”
• In streaming we typically see a collection of statistical techniques
• time-series modeling and forecasting models (e.g. ARIMA)
• change point detection
• outlier tests (e.g., ESD, k-sigma)
• Most techniques not suitable for streaming data
• new approaches needed
• non-streaming benchmarks aren't very useful
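As a concrete illustration of the simplest technique in the list above, a k-sigma outlier test flags points that fall more than k standard deviations from the mean. A minimal sketch (the data and window-free formulation are illustrative):

```python
import statistics

def k_sigma_outliers(values, k=3.0):
    """Flag values more than k standard deviations from the mean."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [x for x in values if abs(x - mu) > k * sigma]

# A stable signal around 10 with one extreme excursion.
data = [10, 11, 9, 10, 12, 10, 11, 9, 10, 11,
        12, 9, 10, 11, 10, 9, 12, 10, 11, 100]
print(k_sigma_outliers(data))  # [100]
```

Note that this batch formulation recomputes statistics over the whole series; a streaming variant would have to maintain the mean and deviation incrementally, which is part of why such tests need rethinking for streaming data.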
6.
WHY CREATE A BENCHMARK?
• A benchmark consists of:
• Labeled data files
• Scoring mechanism
• Versioning system
• Most existing benchmarks are designed for batch data, not
streaming data
• We saw a need for a benchmark that is designed to test anomaly
detection algorithms on real-time, streaming data
• Hard to find benchmarks containing real world data labeled with
anomalies
• Impact of published techniques suffers because researchers use different data and/or completely artificial data
• A standard community benchmark could spur innovation in real-time anomaly detection algorithms
7.
NUMENTA ANOMALY BENCHMARK (NAB)
• NAB: a rigorous benchmark for anomaly
detection in streaming applications
• Real-world benchmark dataset
• 58 labeled data streams
(47 real-world, 11 artificial streams)
• Total of 365,551 data points
• Scoring mechanism
• Custom scoring function
• Reward early detection
• Anomaly windows
• Different “application profiles”
• Open resource
• AGPL repository contains data, source code,
and documentation
• github.com/numenta/NAB
11.
HOW SHOULD WE SCORE ANOMALIES?
• The perfect detector
• Detects every anomaly
• Detects anomalies as soon as possible
• there is tremendous value in detecting anomalies as early as possible
• Provides detections in real time
• Triggers no false alarms
• Requires no parameter tuning
• can’t manually tune parameters when running potentially thousands of models
• Automatically adapts to changing statistics
• e.g., servers get new software
12.
HOW SHOULD WE SCORE ANOMALIES?
• Scoring methods in traditional benchmarks are insufficient
• Precision, recall, and F1-score do not incorporate the value of time
• early detections are not rewarded
• Artificial separation into training and test sets does not handle continuous learning
• Batch data files allow look ahead and multiple passes through the data
• this is unrealistic for real-world use
15.
SCORING FUNCTION
• Effect of each detection is scaled relative to its position within the anomaly window
• Detections outside the window are false positives (scored negatively)
• Multiple detections within a window are ignored (only the earliest counts)
• Total score is the sum of scaled detections plus a weighted sum of missed detections
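The position-dependent weighting above can be sketched as a scaled sigmoid over a detection's relative position y (y = −1 at the left edge of the window, 0 at the right edge, positive after it). The scale factor 5 and the cutoff here are illustrative assumptions loosely following the NAB paper:

```python
import math

def scaled_sigmoid(y):
    """Weight a detection by its relative position y: early detections
    inside the window earn close to +1; detections after the window
    decay toward the full false-positive penalty of -1."""
    if y > 3.0:
        # Far outside any window: full false-positive penalty.
        return -1.0
    return 2.0 / (1.0 + math.exp(5.0 * y)) - 1.0

# Earliest possible detection in the window scores near +1 ...
print(round(scaled_sigmoid(-1.0), 3))   # 0.987
# ... while a detection well after the window scores -1.
print(scaled_sigmoid(4.0))              # -1.0
```

The smooth decay is what rewards early detection: a detection at the start of the window is worth almost the full credit, while one at the very end is worth nothing.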
16.
OTHER DETAILS
• Application profiles
• Application profiles assign different weightings based on the tradeoff between false
positives and false negatives.
• EKG monitoring of a cardiac patient favors FPs over FNs: a missed anomaly is far costlier than a false alarm.
• IT / DevOps professionals hate FPs.
• Three application profiles: standard, favor low false positives, favor low false negatives.
• NAB emulates practical real-time scenarios
• Look ahead not allowed for algorithms. Detections must be made on the fly.
• No separation between training and test files. Invoke model, start streaming, and go.
• No batch, per data file, parameter tuning. Must be fully automated with single set of
parameters across data files. Any further parameter tuning must be done on the fly.
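The profile idea can be sketched as weight sets applied to detection counts; the weight values below are illustrative assumptions, not necessarily NAB's published constants:

```python
# Each application profile weights true positives, false positives,
# and false negatives differently (illustrative values).
PROFILES = {
    "standard": {"tp": 1.0, "fp": -0.11, "fn": -1.0},
    "low_fp":   {"tp": 1.0, "fp": -0.22, "fn": -1.0},
    "low_fn":   {"tp": 1.0, "fp": -0.11, "fn": -2.0},
}

def profile_score(tp, fp, fn, profile="standard"):
    """Combine detection counts under a given application profile."""
    w = PROFILES[profile]
    return w["tp"] * tp + w["fp"] * fp + w["fn"] * fn

# The same detector output ranks differently under each profile.
counts = {"tp": 8, "fp": 5, "fn": 2}
for name in PROFILES:
    print(name, round(profile_score(**counts, profile=name), 2))
```

A detector that triggers many false alarms is punished harder under the low-FP profile, while one that misses windows is punished harder under the low-FN profile.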
17.
TESTING ALGORITHMS WITH NAB
• NAB is a community effort
• The goal is to have researchers independently evaluate a large number of algorithms
• Very easy to plug in and test new algorithms
• Seed results with three algorithms:
• Hierarchical Temporal Memory
• Numenta’s open source streaming anomaly detection algorithm
• Models temporal sequences in data, continuously learning
• Etsy Skyline
• Popular open source anomaly detection technique
• Mixture of statistical experts, continuously learning
• Twitter AnomalyDetection
• Open source anomaly detection released earlier this year
• Robust outlier statistics + piecewise approximation
19.
DETECTION RESULTS: CPU USAGE ON PRODUCTION SERVER
[Figure: CPU usage on a production server with detections by Etsy Skyline, Numenta HTM, and Twitter ADVec; red denotes a false positive. All three algorithms detect a simple spike; a shift in usage also occurs.]
20.
DETECTION RESULTS: MACHINE TEMPERATURE READINGS
[Figure: machine temperature readings with detections by Etsy Skyline, Numenta HTM, and Twitter ADVec; red denotes a false positive. All three detect the catastrophic failure, while only HTM detects a purely temporal anomaly.]
21.
DETECTION RESULTS: TEMPORAL CHANGES IN BEHAVIOR OFTEN PRECEDE A LARGER SHIFT
[Figure: detections by Etsy Skyline, Numenta HTM, and Twitter ADVec; red denotes a false positive. HTM detects the anomaly 3 hours earlier than the other detectors.]
22.
SUMMARY
• Anomaly detection is the most common application for streaming analytics
• NAB is a community benchmark for streaming anomaly detection
• Includes a labeled dataset with real data
• Scoring methodology designed for practical real-time applications
• Fully open source codebase
• What can you get out of NAB?
• Test and improve your algorithms
• Contribute and improve NAB
• Learn about streaming anomaly detection
23.
SUMMARY
• What’s next for NAB?
• We hope to see researchers test additional algorithms
• We hope to spark improved algorithms for streaming
• More data sets!
• Could incorporate the UC Irvine dataset and the Yahoo Labs dataset (not open source)
• Would love to get more labeled streaming datasets from you
• Add support for multivariate anomaly detection
• Any changes that affect the results will be released with v2.0
24.
NAB RESOURCES
Repository: github.com/numenta/NAB
Paper:
A. Lavin and S. Ahmad, “Evaluating Real-time Anomaly Detection Algorithms –
the Numenta Anomaly Benchmark,” to appear in 14th International Conference
on Machine Learning and Applications (IEEE ICMLA’15), 2015.
Preprint available: arxiv.org/abs/1510.03336
Presentation from MLConf:
https://www.youtube.com/watch?v=SxtsCrTHz-4
Contact info:
nab@numenta.org
alavin@numenta.com, sahmad@numenta.com
26.
NUMENTA RESOURCES
• “Properties of Sparse Distributed Representations and their Application to
Hierarchical Temporal Memory”: http://arxiv.org/abs/1503.07469
• “Why Neurons Have Thousands of Synapses, A Theory of Sequence
Memory in Neocortex”: http://arxiv.org/abs/1511.00083
• NuPIC: Numenta Platform for Intelligent Computing open source repo
• https://github.com/numenta/nupic
• http://numenta.org/
• Numenta
• http://numenta.com/
• HTM Whitepaper:
http://numenta.com/learn/hierarchical-temporal-memory-white-paper.html
27.
NAB EXAMPLES
• Figs. 1, 2, 5 from the paper: plot.ly/~alavin/3767
• Fig. 4 from the paper: plot.ly/~alavin/3753
• Fig. 6 from the paper: plot.ly/~alavin/3706
• Subtle change in CPU utilization that precedes a much larger anomaly:
plot.ly/~alavin/3720
• An anomaly preceding a much larger drop in CPU utilization: plot.ly/~alavin/3717
• All three detectors get the two TPs, but in different orders: plot.ly/~alavin/3741
• Good detections by HTM, but a lot of FPs: plot.ly/~alavin/3711
• Noisy, difficult CPU utilization data: plot.ly/~alavin/3761
• Temporal anomalies in spiking social media data: plot.ly/~alavin/3815
• No true anomalies, but FP detections in CPU utilization data: https://plot.ly/~alavin/3723
29.
SCALED SIGMOID SCORING FUNCTION
• Scoring example:
a) FP before the window
b) TP in the window
c) additional TP (not counted)
d) FP soon after the window
e) FP long after the window
→ total score = -1.809
• Missing a window completely (i.e., a FN) penalizes the score by the FN weight (-1.0)
[Figure: example anomaly window with detections (a)–(e) marked along the data stream.]
30.
ANOMALY DETECTION WITH HTM
• How do we turn a data stream into anomaly scores?
• Pipeline: Data → Encoder → SDR → HTM Algorithms → Predictions → Raw anomaly score → Anomaly likelihood
31.
CALCULATING RAW ANOMALY SCORE
• Raw anomaly score is the fraction of active columns that were not predicted.
• This is high when the spatial or temporal patterns deviate from the norm.
rawAnomalyScore = |A_t − (P_{t−1} ∩ A_t)| / |A_t|
where P_t = predicted columns at time t, A_t = active columns at time t
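This computation can be sketched directly with Python sets (the column indices below are illustrative):

```python
def raw_anomaly_score(active, predicted_prev):
    """Fraction of currently active columns that were not predicted
    at the previous timestep: |A_t - (P_{t-1} ∩ A_t)| / |A_t|."""
    if not active:
        return 0.0
    unexpected = active - (predicted_prev & active)
    return len(unexpected) / len(active)

# 40 active columns, 30 of which were predicted -> score 0.25.
active = set(range(40))
predicted = set(range(30))
print(raw_anomaly_score(active, predicted))  # 0.25

# A perfectly predicted input scores 0; a complete surprise scores 1.
print(raw_anomaly_score(active, active))     # 0.0
print(raw_anomaly_score(active, set()))      # 1.0
```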
33.
CALCULATING ANOMALY LIKELIHOOD
• Compute a normal distribution over a history of raw anomaly scores
• Compute the probability of each new point relative to that distribution
µ = Σ x·P(x),  σ² = E[(X − µ)²]
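A minimal sketch of this step, assuming raw scores are modeled with a single Gaussian over a rolling history (NAB's actual implementation additionally smooths recent scores with a moving average before thresholding):

```python
import math

def anomaly_likelihood(history, current):
    """Likelihood that `current` is anomalous given a history of raw
    anomaly scores: one minus the Gaussian upper-tail probability."""
    n = len(history)
    mu = sum(history) / n
    var = sum((x - mu) ** 2 for x in history) / n
    sigma = math.sqrt(var) or 1e-9  # guard against zero variance
    # Upper-tail probability of the normal distribution via erfc.
    tail = 0.5 * math.erfc((current - mu) / (sigma * math.sqrt(2.0)))
    return 1.0 - tail

# Mostly small raw scores, then a large one: likelihood near 1.
history = [0.02, 0.03, 0.01, 0.05, 0.02, 0.04, 0.03, 0.02]
print(round(anomaly_likelihood(history, 0.9), 3))  # 1.0
```

Thresholding this likelihood, rather than the raw score itself, makes the detector robust to streams that are inherently noisy: a raw score that is unusual for a quiet stream may be routine for a noisy one.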
34.
CALCULATING ANOMALY LIKELIHOOD
[Figure: probability distribution fitted to raw anomaly scores (mean 0.0201, std. dev. 0.1237), plotted against the raw anomaly score axis from 0 to 1.]