3. ABOUT THE AUTHOR
• Chief Application Architect at MapR Technologies
• Ph.D. in computing science from the University of Sheffield
• Committer on Mahout, Drill, ZooKeeper …
• http://tdunning.blogspot.de/
• @ted_dunning
5. ANOMALY DETECTION
[Diagram: input signal x → online summarizer (t-digest) → 99.9%-ile threshold t → test x > t? → alert (!)]
• The t-digest algorithm was developed by Ted Dunning and is available in Apache Mahout
• With the t-digest algorithm, one can accurately estimate quantiles of very large data sets with limited memory use
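The thresholding loop from this slide can be sketched in a few lines of Python. This is only an illustration: the class ExactQuantileSummary, the function detect, and the warmup parameter are hypothetical names, and the summary keeps every value in a sorted list instead of using t-digest, so memory is not bounded here. A real deployment would swap in a t-digest implementation behind the same add/quantile interface.

```python
import bisect
import random


class ExactQuantileSummary:
    """Stand-in for t-digest: stores all values sorted (unbounded memory)."""

    def __init__(self):
        self._values = []

    def add(self, x):
        bisect.insort(self._values, x)

    def quantile(self, q):
        if not self._values:
            raise ValueError("no data yet")
        idx = min(len(self._values) - 1, int(q * len(self._values)))
        return self._values[idx]


def detect(signal, q=0.999, warmup=1000):
    """Return a list of (index, x, t) for samples above the running q-quantile t."""
    summary = ExactQuantileSummary()
    alerts = []
    for i, x in enumerate(signal):
        if i >= warmup:                 # wait until the estimate is stable
            t = summary.quantile(q)     # threshold from data seen so far
            if x > t:
                alerts.append((i, x, t))
        summary.add(x)                  # update the summary after testing
    return alerts


random.seed(0)
signal = [random.gauss(0.0, 1.0) for _ in range(5000)]
signal[4000] = 25.0                     # injected spike (synthetic data)
alerts = detect(signal)
```

Each sample is compared with the threshold before it is added to the summary, so an anomaly cannot raise the threshold it is tested against.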
9. WHAT IS NORMAL?
• We need to have a model of what is normal
• Everything that doesn't fit the model is an anomaly
• For simple signals, we can simply assume a normal distribution
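For the simple case, the model of "normal" can be a mean and a standard deviation fitted to reference data. The sketch below assumes a z-score cutoff; fit_normal, is_anomaly, the cutoff z_max=4.0, and the sample values are all made up for illustration.

```python
import statistics


def fit_normal(samples):
    """Estimate the parameters of a normal distribution from reference data."""
    mu = statistics.fmean(samples)
    sigma = statistics.stdev(samples)   # sample standard deviation
    return mu, sigma


def is_anomaly(x, mu, sigma, z_max=4.0):
    """Flag x if it lies more than z_max standard deviations from the mean."""
    return abs(x - mu) / sigma > z_max


# Synthetic reference measurements (assumed to be anomaly-free).
training = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 9.9]
mu, sigma = fit_normal(training)
flags = [is_anomaly(x, mu, sigma) for x in (10.05, 14.0)]
```

With this data, 10.05 fits the model and 14.0 does not. The cutoff trades false alarms against missed anomalies and has to be chosen per application.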
12. WINDOWS
• A set of windowed signals serves as a model of the original signal
• Clustering can find the prototypes
• The result is a dictionary of shapes
• New signals can be encoded by shifting, scaling and adding
shapes from the dictionary
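The windowing and clustering steps can be sketched as follows. This is a simplified illustration, not the method from the talk: it uses plain Lloyd's k-means, encodes each window with a single scaled shape (no shifting or adding of several shapes), and treats the reconstruction error as the anomaly score. All function names, the window width, the step, and k are assumptions.

```python
import math
import random


def windows(signal, width, step):
    """Cut the signal into overlapping windows of equal width."""
    return [signal[i:i + width] for i in range(0, len(signal) - width + 1, step)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns k prototype shapes (the dictionary)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            groups[j].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the old center if a cluster went empty
                centers[j] = [sum(col) / len(g) for col in zip(*g)]
    return centers


def encode(window, dictionary):
    """Best single scaled shape: returns (index, scale, squared error)."""
    best = None
    for j, shape in enumerate(dictionary):
        energy = sum(s * s for s in shape)
        dot = sum(w * s for w, s in zip(window, shape))
        scale = dot / energy if energy else 0.0   # least-squares scale
        err = sq_dist(window, [scale * s for s in shape])
        if best is None or err < best[2]:
            best = (j, scale, err)
    return best


# Synthetic periodic signal used as the "normal" training data.
signal = [math.sin(2 * math.pi * i / 50) for i in range(2000)]
dictionary = kmeans(windows(signal, width=32, step=8), k=8)

normal = signal[100:132]
odd = list(normal)
odd[16] += 5.0                      # injected glitch
_, _, err_normal = encode(normal, dictionary)
_, _, err_odd = encode(odd, dictionary)
```

A window that the dictionary reconstructs poorly (large error) does not look like anything seen in training, which is what makes the error usable as an anomaly score.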
23. READ MORE?
• A New Look At Anomaly Detection
• This is the second book in the series Practical Machine Learning by Ted Dunning & Ellen Friedman
• FREE download from www.mapr.com