2. What we'll cover
• What is Anomaly Detection?
• What's an anomaly?
• Detecting Anomalies
• Methods and Applications
3. What is Anomaly Detection?
• Trying to find patterns in data that are different from the expected.
• Some applications:
  ◦ credit card fraud
  ◦ insurance fraud
  ◦ image processing
  ◦ intrusion detection (cybersecurity)
  ◦ text analysis
  ◦ sensor networks
  ◦ insider threats
  ◦ industrial damage
11. Detecting Anomalies
So, how would we detect some of these? Let's take a naive approach.
1. Define a "normal" region.
2. Observations not in the "normal" region are anomalies.
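A minimal sketch of the naive approach above, assuming one-dimensional data and assuming the "normal" region is the mean plus or minus k sample standard deviations. The function names, the value of k, and the readings are all hypothetical.

```python
from statistics import mean, stdev

def fit_normal_region(values, k=3.0):
    """Step 1: define a 'normal' region as mean +/- k sample standard deviations."""
    mu = mean(values)
    sigma = stdev(values)
    return mu - k * sigma, mu + k * sigma

def find_anomalies(values, region):
    """Step 2: every observation outside the region is reported as an anomaly."""
    low, high = region
    return [v for v in values if v < low or v > high]

# Hypothetical readings that are assumed to be normal
training = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]
region = fit_normal_region(training, k=3.0)

new_observations = [10.1, 9.9, 14.5, 10.0]
anomalies = find_anomalies(new_observations, region)
print(region, anomalies)
```

Even in this toy form, the choice of k and of the training window decides where the boundary falls.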
12. Will this work?
• Boundary is hard to define
• Definitions change over time
• Definitions are domain-dependent
• Labeled training data is hard to find
• Training data is often heavily imbalanced
13. Types of Data
• Collection of data instances
• A data instance has a set of attributes
• Attributes can be of different types
  ◦ binary
  ◦ categorical
  ◦ continuous
14. • The attributes help determine the detection method.
• The relationship between data instances is important.
• Most existing anomaly detection techniques don't assume any particular relationship between the data instances. We have to identify relationships.
15. Types of input data
• Sequential
  ◦ time-series, sequences of symbols
• Spatial
  ◦ each data instance is related to its neighbors
  ◦ images, vehicular traffic
• Graph
  ◦ data instances are nodes in a graph or network
16. Three Types of Anomalies
• There are only three.
• No, that doesn't make it any easier to detect them.
• Point anomaly
• Contextual anomaly
• Collective anomaly
17. Point Anomaly
• Generally a single data instance.
• Anomalous compared to the entirety of the data
• Most research focuses on point anomalies
• Can occur in any dataset
18. Contextual Anomaly
• Anomalous in relation to a specific context
• Context comes from how data is structured
• Context has to be specified as a part of the problem formulation
• Each data instance can be defined using two sets of attributes:
  ◦ contextual: determines the context (e.g. lat/long or time)
  ◦ behavioral: non-contextual characteristics of an instance
19. • Anomalous behavior is determined by the behavioral attributes within a specific context
• A data instance might be a contextual anomaly in a given context, but a data instance with identical behavioral attributes could be considered normal in a different context.
20. • Contextual anomalies are generally found in time-series data. Example:
  ◦ Avg. monthly temperature of an area over the last few years.
  ◦ 35 degrees F in winter might be normal
  ◦ 35 degrees F in summer in the same place is anomalous
22. • Another example: Credit card fraud
  ◦ Contextual attribute: time of purchase.
  ◦ $100 average weekly shopping bill, except during the Christmas week, when it reaches $1000.
  ◦ A new purchase of $1000 in July would be considered a contextual anomaly, since it's unusual for July.
  ◦ The same amount spent during Christmas week will be considered normal.
23. Collective Anomaly
• A group of data instances is anomalous
• They need not be anomalies by themselves
• Again, the relationship between the data matters
• (Point or collective problem) + context = contextual problem
24. Three Types of Anomaly Detection Methods
• Supervised
  ◦ Use labeled training data to build a predictive model
  ◦ Imbalanced data (many normal, few anomalies)
• Semi-Supervised
  ◦ Only need normal data
  ◦ Model learns how to classify normal data
• Unsupervised (no labeled data)
26. Credit Card Fraud
Data used
• user ID
• amount spent
• time between consecutive card usage
Credit card companies have complete, labeled data and user profiles
27. Kinds of anomalies
• Point anomalies in transaction records
  ◦ high payments
  ◦ items never before purchased by the user
  ◦ high rate of purchase
• Contextual anomalies
  ◦ User defines the context
    ▪ Each credit card user is profiled based on card usage history.
    ▪ Each new transaction is compared to the user profile and flagged if it doesn't match
  ◦ Location defines the context
    ▪ Detects anomalies among transactions at a specific geographic location.
28. Cellphone Fraud
Data used
• Call data records (CDRs)
• CDR = vector of features
  ◦ continuous (e.g., CALL-DURATION)
  ◦ discrete (e.g., CALLING-CITY)
Kinds of anomalies
• Point anomalies from aggregated CDR data
  ◦ aggregated by time, user, or area
  ◦ high volume of calls
  ◦ calls made to unlikely destinations
29. Insider Trading
Data used
• Option trading data
• Stock trading data
• News
• Data is time-series or otherwise temporally sequenced.
30. Medical
• Patient records
  ◦ Electronic Health Records (EHRs)
    ▪ demographics, medical history, medication and allergies, immunization status, laboratory test results, radiology images, vital signs, personal statistics like age and weight, and billing information
  ◦ Electrocardiograms (ECG) and Electroencephalograms (EEG)
• Temporal and/or spatial data
31. Types of anomalies
• Point anomalies
  ◦ e.g., abnormal patient condition, instrumentation errors, recording errors
• Contextual
  ◦ Disease outbreaks can be contextual anomalies (e.g. geo-temporal pattern of viral infections)
• Collective
33. • False negatives can cost $$$ and lives
• A colleague (David Gilmore) said:
  ◦ "Precision saves money, recall saves lives."
35. Classification
• Train a model from labeled data (supervised)
• Use the model to classify other data
• Many different ways to do this
  ◦ SVMs, PGMs, Rules
  ◦ Neural nets have shown much promise
    ▪ LSTMs learn features across a sequence
    ▪ Autoencoders reconstruct the data; the reconstruction error tells you if the data is anomalous
36. Recurrent Neural Nets and LSTMs
Now we'll look at a method or two for time-series data.
• Method needs to learn patterns present in the sequence
• Sequences can have patterns of unknown length
• Recurrent neural networks (RNNs)[1][2] let you address sequences of data
37. • Detect deviations from normalcy
• Steps
  ◦ Train the NN to predict several time steps into the future
  ◦ Each point in the sequence has several corresponding predicted values made at different points in the past, resulting in multiple error values.
  ◦ Compute the error distribution
• More generally, to detect anomalies in a time series
  ◦ Anomalous if prediction error is larger than expected
  ◦ Can pick an error threshold, e.g. 2 std. dev. from the mean
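A rough sketch of the error-threshold idea, under loud assumptions: the trained RNN/LSTM is replaced by a trivial stand-in forecaster (the median of the last three points), only one step ahead is predicted, and the threshold is the mean error plus 2 standard deviations measured on a series assumed to be normal. All names and data are hypothetical.

```python
from statistics import mean, median, stdev

WARMUP = 3  # points needed before the stand-in forecaster can predict

def predict_next(history):
    """Stand-in forecaster (assumption): median of the last 3 points.
    In the approach described above this would be a trained RNN/LSTM."""
    return median(history[-WARMUP:])

def prediction_errors(series):
    """Absolute one-step-ahead prediction error for each point after the warm-up."""
    return [abs(series[t] - predict_next(series[:t])) for t in range(WARMUP, len(series))]

def fit_threshold(normal_series, n_std=2.0):
    """Threshold = mean error + n_std standard deviations of the error distribution."""
    errors = prediction_errors(normal_series)
    return mean(errors) + n_std * stdev(errors)

def flag_anomalies(series, threshold):
    """Return the time indices whose prediction error exceeds the threshold."""
    errors = prediction_errors(series)
    return [t for t, e in zip(range(WARMUP, len(series)), errors) if e > threshold]

# Hypothetical series: the first is assumed normal, the second contains one spike
normal = [10, 11, 10, 12, 11, 10, 11, 12, 10, 11, 12, 11]
threshold = fit_threshold(normal)

test = [11, 10, 11, 12, 25, 11, 10, 11]
flagged = flag_anomalies(test, threshold)
print(threshold, flagged)
```

The median is used here only because a single spike does not drag the following predictions with it; a learned model would replace it entirely.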
39. • Train the autoencoder.
• If the data is sequential, you can incorporate RNNs or LSTMs.
• Use the model to reconstruct the input.
• If the reconstruction error is above some threshold, label it as an anomaly
40. Nearest-Neighbor Methods
Assumption
• Normal data are close together, while anomalies are far away
Two Methods
1. Anomaly score is the distance to the kth nearest neighbor.
2. Anomaly score is the density of the neighborhood of each point
• Distance metric affects computational complexity
• Easy to adapt to a different problem domain: just define the distance metric
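The first method (distance to the kth nearest neighbor) can be sketched as below. This is a brute-force illustration under assumptions: Euclidean distance as the default metric, k=2, and made-up 2-D points. The `distance` parameter shows where a domain-specific metric would plug in.

```python
import math

def euclidean(a, b):
    """Default distance metric; swap in another function for a different domain."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kth_neighbor_score(point, data, k, distance=euclidean):
    """Anomaly score = distance from `point` to its k-th nearest neighbor in `data`."""
    distances = sorted(distance(point, other) for other in data if other is not point)
    return distances[k - 1]

# Hypothetical data: a tight cluster plus one far-away point
points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2), (8.0, 8.0)]
scores = [kth_neighbor_score(p, points, k=2) for p in points]
most_anomalous = points[scores.index(max(scores))]
print(most_anomalous)
```

Comparing every pair of points is quadratic in the number of instances, which is one reason the cost of the metric matters.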
41. Statistical Methods
• Assumption
  ◦ Normal data lies in high probability regions, anomalies in low probability regions
• Parametric and non-parametric methods
42. Parametric
• Assumes normal data is distributed according to a parametric distribution
• Anomaly score is the inverse of the PDF
• Or, use a hypothesis test. The anomaly score can be the test statistic
43. Examples:
• Gaussian models => maximum likelihood estimation (MLE), Grubbs' test and variants
• Regression models => ARIMA, ARMA
• Mixtures of models
  ◦ Assume each data point has prob. p of being an anomaly
  ◦ N = PDF of normal data
  ◦ A = PDF of anomalies (assumed to be uniform)
  ◦ D = PDF of all the data = pA + (1-p)N
  ◦ Start with all points in N
  ◦ Anomaly score comes from how much the distributions change if you move a point to A.
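One way to read the mixture idea above, sketched under several assumptions: N is a one-dimensional Gaussian fitted by MLE, A is uniform over the observed data range, p is fixed at 0.05, and the score of a point is the change in total log-likelihood when that point alone is moved from N to A. The names and data are hypothetical.

```python
import math
from statistics import mean, pstdev

def gaussian_log_pdf(x, mu, sigma):
    """Log density of a Gaussian with mean mu and standard deviation sigma."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def log_likelihood(normal, anomalous, p, uniform_density):
    """Log-likelihood of the split under D = p*A + (1-p)*N.
    N is refitted by MLE (mean and population std) to the current normal set."""
    mu, sigma = mean(normal), pstdev(normal)
    ll = len(normal) * math.log(1 - p)
    ll += sum(gaussian_log_pdf(x, mu, sigma) for x in normal)
    ll += len(anomalous) * (math.log(p) + math.log(uniform_density))
    return ll

def mixture_scores(data, p=0.05):
    """Score each point by how much the log-likelihood changes if it moves to A."""
    uniform_density = 1.0 / (max(data) - min(data))  # A is uniform on the data range
    base = log_likelihood(data, [], p, uniform_density)  # start with all points in N
    scores = []
    for i, x in enumerate(data):
        rest = data[:i] + data[i + 1:]
        scores.append(log_likelihood(rest, [x], p, uniform_density) - base)
    return scores

# Hypothetical measurements with one suspicious value at the end
data = [4.9, 5.1, 5.0, 5.2, 4.8, 5.0, 5.1, 4.9, 9.5]
scores = mixture_scores(data)
print(scores.index(max(scores)))
```

A large positive score means the data is explained much better once that point is treated as an anomaly.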
44. Non-parametric
• Histogram models
  ◦ Does the test instance fit into an existing bin?
  ◦ Or, determine the score from the bin in which it lands
• Kernel methods estimate the data PDF and are similar to parametric methods
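A small sketch of the histogram idea, assuming fixed-width bins and a hypothetical score of 1 / (1 + count in the bin), so that an instance landing in an empty bin scores 1.0. The bin width and the data are made up.

```python
from collections import Counter

def build_histogram(values, bin_width):
    """Count training values per bin; the bin index is floor(value / bin_width)."""
    return Counter(int(v // bin_width) for v in values)

def histogram_score(value, histogram, bin_width):
    """Score in (0, 1]: 1.0 when the bin is empty, smaller when the bin is well populated."""
    count = histogram.get(int(value // bin_width), 0)
    return 1.0 / (1 + count)

# Hypothetical training data
training = [1.2, 1.7, 2.1, 2.4, 2.8, 3.1, 2.2, 2.9]
histogram = build_histogram(training, bin_width=1.0)

print(histogram_score(2.5, histogram, 1.0))  # lands in the busiest bin
print(histogram_score(7.3, histogram, 1.0))  # lands in an empty bin
```

The result depends heavily on the bin width: bins that are too narrow make everything look rare, bins that are too wide hide anomalies.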
45. Spectral Methods
Assumption
• "Data can be embedded into a lower dimensional subspace in which normal instances and anomalies appear significantly different." - Anomaly Detection: A Survey
Main idea:
Find a subspace where the anomalies are easy to see and project data onto it.
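To illustrate the main idea, here is a sketch for the 2-D case only, assuming the useful subspace is the lowest-variance principal direction of data assumed to be normal. The closed-form angle of the principal axis is used instead of a general eigen-solver, and the names and points are hypothetical.

```python
import math
from statistics import mean

def fit_minor_axis(points):
    """Return (center, unit vector) of the lowest-variance principal direction of 2-D points."""
    cx = mean(x for x, _ in points)
    cy = mean(y for _, y in points)
    sxx = mean((x - cx) ** 2 for x, _ in points)
    syy = mean((y - cy) ** 2 for _, y in points)
    sxy = mean((x - cx) * (y - cy) for x, y in points)
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)  # angle of the highest-variance axis
    return (cx, cy), (-math.sin(theta), math.cos(theta))  # perpendicular to it

def projection_score(point, center, direction):
    """Anomaly score = size of the projection onto the low-variance direction."""
    dx, dy = point[0] - center[0], point[1] - center[1]
    return abs(dx * direction[0] + dy * direction[1])

# Hypothetical normal data lying close to the line y = x
normal = [(0.0, 0.0), (1.0, 1.1), (2.0, 1.9), (3.0, 3.0), (4.0, 4.1), (5.0, 4.9)]
center, direction = fit_minor_axis(normal)

score_on_line = projection_score((3.5, 3.4), center, direction)
score_off_line = projection_score((1.0, 4.0), center, direction)
print(score_on_line, score_off_line)
```

Both test points have unremarkable x and y values on their own; only the projection separates them.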
46. Methods
• Unsupervised or semi-supervised
• PCA
  ◦ Project data along low variance principal components. Anomaly projections will be high
  ◦ For graphs, run PCA on the graph's adjacency matrix at different points in time; differences in the principal components determine anomaly status
• Errors in the Compact Matrix Decomposition (CMD) of a graph's adjacency matrix indicate an anomalous graph
• PCA can be expensive
47. Contextual Anomalies
Contextual attributes are key
• Sequential: position in sequence is the context
  ◦ time-series
  ◦ event data (timestamped)
    ▪ inter-arrival time between events can be uneven
• Spatial: location is the context
• Graphs: the edges between data instances (the nodes) are the context
• Profiles (user defines context, like for credit card fraud)
48. Contextual Methods
• Convert to a point anomaly problem
  1. Identify a context for a data instance
  2. Compute the anomaly score within the context with a point anomaly method
• Use the structure of the data when breaking data into contexts is hard (time-series and sequences)
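The two-step reduction can be sketched with the credit card example from earlier in the deck. Assumptions: the context is a label per week ("ordinary" or "christmas"), the point anomaly method inside each context is a plain z-score, and the spending history is invented for illustration.

```python
from collections import defaultdict
from statistics import mean, stdev

def fit_contexts(records):
    """records: iterable of (context, value). Returns {context: (mean, std)}."""
    by_context = defaultdict(list)
    for context, value in records:
        by_context[context].append(value)
    return {c: (mean(v), stdev(v)) for c, v in by_context.items()}

def contextual_score(context, value, profiles):
    """Step 1: look up the context. Step 2: score the value with a z-score inside it."""
    mu, sigma = profiles[context]
    return abs(value - mu) / sigma

# Hypothetical weekly spending history for one card user
history = [
    ("ordinary", 95), ("ordinary", 110), ("ordinary", 100),
    ("ordinary", 90), ("ordinary", 105),
    ("christmas", 950), ("christmas", 1050), ("christmas", 1000),
]
profiles = fit_contexts(history)

july_score = contextual_score("ordinary", 1000, profiles)        # $1000 in a July week
christmas_score = contextual_score("christmas", 1000, profiles)  # $1000 at Christmas
print(july_score, christmas_score)
```

The same behavioral value gets a very different score depending only on the contextual attribute.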
49. • Time-series
  ◦ regression, RNNs
• Sequences
  ◦ Use events occurring before a particular time to predict the event occurring at that time.
  ◦ If the prediction doesn't match the actual event, it's labeled rare.
  ◦ Finite State Automata (FSA) and Hidden Markov Models (HMMs) compute conditional probabilities for events in the sequence based on previous events.
  ◦ Model the event sequence as a Poisson process
• Graphs
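For the sequence case, a first-order Markov chain is about the simplest model that computes conditional probabilities of an event given the previous one. The sketch below assumes that simplification (it is not an FSA or HMM), a hypothetical probability cutoff of 0.1, and made-up event names.

```python
from collections import Counter, defaultdict

def fit_transitions(sequence):
    """Count how often each event follows each previous event."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(sequence, sequence[1:]):
        counts[prev][nxt] += 1
    return counts

def conditional_probability(counts, prev, nxt):
    """Estimate P(next event | previous event) from the transition counts."""
    following = counts.get(prev, Counter())
    total = sum(following.values())
    return following[nxt] / total if total else 0.0

def rare_events(sequence, counts, min_prob=0.1):
    """Return (index, event) pairs whose conditional probability is below min_prob."""
    flagged = []
    for i in range(1, len(sequence)):
        prev, nxt = sequence[i - 1], sequence[i]
        if conditional_probability(counts, prev, nxt) < min_prob:
            flagged.append((i, nxt))
    return flagged

# Hypothetical training log and a new session to check
training = ["login", "read", "write", "logout"] * 5
counts = fit_transitions(training)

session = ["login", "read", "write", "logout", "login", "write", "logout"]
flagged = rare_events(session, counts)
print(flagged)
```

Here "write" is a perfectly common event; it is flagged only because it never followed "login" in the training log.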
50. Collective Anomalies
• Hardest to detect because the anomalous behavior is collective.
• Relationship between data points is important
  ◦ Sequential => find an anomalous subsequence
    ▪ lots of research here because there is lots of time-series and event sequence data in the wild
  ◦ Spatial => find an anomalous subregion
    ▪ image/video processing
  ◦ Graph => find an anomalous subgraph
• The task is to find an anomalous subset
51. Detecting Collective Sequential Anomalies
Reduce to point anomaly problem:
• Transform subsequences and then use a point anomaly method
• FSA, Markov Models, HMMs, CRFs for symbols
Neural Nets would be powerful here
• RNNs + LSTMs + Autoencoders: Could use a sequence to sequence model on the subsequences and compute reconstruction error
• For every example we've looked at that used FSA or HMMs, you could use neural nets instead
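The reduction to a point anomaly problem can be sketched as follows. Assumptions: each subsequence is a fixed-width sliding window treated as one point, and the point anomaly method is the distance to the nearest window seen in a reference series assumed to be normal. The width, cutoff, and data are hypothetical.

```python
import math

def windows(series, width):
    """Transform a series into overlapping subsequences, each treated as one point."""
    return [tuple(series[i:i + width]) for i in range(len(series) - width + 1)]

def nearest_distance(window, reference_windows):
    """Point anomaly score: Euclidean distance to the closest reference window."""
    return min(
        math.sqrt(sum((a - b) ** 2 for a, b in zip(window, ref)))
        for ref in reference_windows
    )

# Hypothetical reference series (assumed normal) that alternates between 1 and 3
reference = [1, 3] * 6
# Test series with a flat run of 2s; each 2 lies inside the normal value range
test = [1, 3, 1, 3, 1, 3, 2, 2, 2, 2, 2, 1, 3, 1, 3]

width = 4
reference_windows = set(windows(reference, width))
scores = [nearest_distance(w, reference_windows) for w in windows(test, width)]
flagged_starts = [i for i, s in enumerate(scores) if s > 1.9]
print(flagged_starts)
```

No single value in the flat run is unusual; only the windows made entirely of 2s stand out, which is what makes the anomaly collective.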
52. Detecting Collective Spatial Anomalies
• Most work here has been on images
• Anomaly detection in videos would likely be a combination of techniques for spatial and sequential anomalies (collective or otherwise).
  ◦ Video = sequence of images + an audio stream
• Convolutional neural networks (CNNs) have been used for anomaly detection in images
  ◦ Fully Convolutional Neural Network for Fast Anomaly Detection in Crowded Scenes (2016): https://arxiv.org/abs/1609.00866
53. Most important thing…
• Understand your problem before picking a method.
• Just because a method is the most accurate doesn't automatically make it the best solution for your problem.