2nd edition | July 4-6, 2022
BigML, Inc #DutchMLSchool
Shallow and Deep Methods for
Anomaly Detection
Thomas G. Dietterich
Chief Scientist, BigML
Outline
• Anomaly Detection Use Cases
• Four Basic Methods for Anomaly Detection with Engineered Features
• Benchmarking Study
• Incorporating Feedback
• Deep Versions of the Four Basic Methods
• Classifier-Based Anomaly Detection using the Max Logit Score
• Familiarity Hypothesis
• Challenges for the Future
Anomaly Detection Use Cases
Use Cases
• Data Cleaning
  • Remove corrupted data from the training data
  • Example: Typos in feature values, feature values interchanged, test results from two patients combined
• Fault Detection, Fraud Detection, Cyber Attack
  • At training or test time, faulty or illegal behavior creates anomalous data
• Open Category Detection
  • At test time, the classifier is given an instance of a novel category
  • Example: Self-driving car (trained in Europe) encounters a kangaroo (in Australia)
• Out-of-Distribution Detection
  • At test time, the classifier is given an instance collected in a different way
  • Example: Chest X-Ray classifier trained only on front views is shown a side view
  • Example: Self-driving car trained in clear conditions must operate during rainy conditions
Protecting a Classifier
• Claim: Every deployed ML classifier should include an anomaly detector to detect queries that lie outside the region of competence of the classifier
• Also useful as a performance indicator to detect that you need to retrain the classifier
[Diagram: training examples $(x_i, y_i)$ feed the classifier $f$ and the anomaly detector. For a query $x_q$, the detector tests $A(x_q) > \tau$: if yes, reject; if no, output $\hat{y} = f(x_q)$]
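The reject-or-predict flow in the diagram can be sketched in a few lines. This is a minimal illustration, not the speaker's implementation: `guarded_predict`, the toy anomaly score, the toy classifier, and the threshold value are all hypothetical stand-ins.

```python
from typing import Callable, Optional, Sequence

def guarded_predict(
    x_q: Sequence[float],
    anomaly_score: Callable[[Sequence[float]], float],
    classify: Callable[[Sequence[float]], str],
    tau: float,
) -> Optional[str]:
    """Return the classifier's prediction, or None (reject) if A(x_q) > tau."""
    if anomaly_score(x_q) > tau:
        return None  # query lies outside the region of competence
    return classify(x_q)

# Hypothetical stand-ins: training points near the origin, a sign-based classifier.
train = [(0.0, 0.1), (0.2, -0.1), (-0.1, 0.0)]

def toy_score(x):
    # Distance to the nearest training point (one possible choice of A).
    return min(sum((a - b) ** 2 for a, b in zip(x, t)) ** 0.5 for t in train)

def toy_classifier(x):
    return "positive" if x[0] >= 0 else "negative"

accepted = guarded_predict((0.1, 0.0), toy_score, toy_classifier, tau=1.0)
rejected = guarded_predict((5.0, 5.0), toy_score, toy_classifier, tau=1.0)
```

Returning `None` for a rejected query keeps the reject decision explicit, so the caller cannot mistake it for a class label.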
Anomaly Detection Definitions
• Definition: An “anomaly” is a data point generated by a process that is different than the process generating the “nominal” data
• Let $D_0$ be the probability distribution of the nominal process
• Let $D_a$ be the probability distribution of the anomaly process
• Two formal settings
  • Clean training data
  • Contaminated training data
Clean Training Data
• Given:
  • Training data: $x_1, x_2, \ldots, x_N$
  • All data come from $D_0$, the “nominal” distribution
  • Test data: $x_{N+1}, \ldots, x_{N+M}$ from a mixture of $D_0$ and $D_a$ (the anomaly distribution)
• Find:
  • The data points in the test data that belong to $D_a$
• Examples:
  • Protecting a classifier
  • Detecting manufacturing defects / equipment failure
Contaminated Training Data
• Given:
  • Training data: $x_1, x_2, \ldots, x_N$ from a mixture of $D_0$ and $D_a$ (the anomaly distribution)
• Find:
  • The data points in the training data that belong to $D_a$
• Use Cases:
  • Data cleaning
  • Fraud detection, Insider Threat detection
• These two cases can be combined
  • Contaminated training data + Separate contaminated test data
Four Basic Methods for Anomaly
Detection with Engineered Features
Theoretical Approaches to Anomaly Detection
• Distance-Based Methods
  • Anomaly score: $A(x_q) = \min_{x \in D} \|x_q - x\|$
• Density Estimation Methods
  • Surprise: $A(x_q) = -\log P_D(x_q)$
  • Model the joint distribution $P_D(x)$ of the input data points $x_1, \ldots \in D$
• Quantile Methods
  • Find a smooth function $f$ such that $\{x : f(x) \ge 0\}$ contains $1 - \alpha$ of the training data
  • Anomaly score: $A(x) = -f(x)$
• Reconstruction Methods
  • Train an auto-encoder: $x \approx D(E(x))$, where $E$ is the encoder and $D$ is the decoder
  • Anomaly score: $A(x_q) = \|x_q - D(E(x_q))\|$
Approach 1: Distance-Based Methods
• Define a distance $d(x_i, x_j)$
• $A(x_q) = \min_{x \in D} d(x_q, x)$
• Requires a good distance metric
[Figure: query points labeled $x_q$]
Isolation Forest [Liu, Ting, Zhou, 2011]
• Approximates L1 (Manhattan) Distance
  • (Guha, et al., ICML 2016)
• Construct a fully random binary tree
  • choose attribute $j$ at random
  • choose splitting threshold $\theta$ uniformly from $[\min(x_{\cdot j}), \max(x_{\cdot j})]$
  • until every data point is in its own leaf
  • let $d(x_i)$ be the depth of point $x_i$
• repeat $L$ times
  • let $\bar{d}(x_i)$ be the average depth of $x_i$
  • $A(x_i) = 2^{-\bar{d}(x_i) / r(x_i)}$
  • $r(x_i)$ is the expected depth
[Figure: a random tree with split nodes $x_{\cdot j} > \theta$, $x_{\cdot 2} > \theta_2$, $x_{\cdot 8} > \theta_3$, $x_{\cdot 3} > \theta_4$, $x_{\cdot 1} > \theta_5$ and a leaf containing $x_i$]
Approach 2: Density Estimation
• Given a data set $x_1, \ldots, x_N$ where $x_i \in \mathbb{R}^d$
• We assume the data have been drawn iid from an unknown probability density: $x_i \sim P(x_i)$
• Goal: Estimate $P$
• Anomaly Score: $A(x_q) = -\log P(x_q)$
  • “surprisal” from information theory
• Why density estimation?
  • Gives a more global view by combining distances to all data points
Example: LODA (Pevny, 2015)
• Introduce sparse random projections $\Pi_l$ into 1-dimensional space
• Fit a density estimator $P_l(\Pi_l(x))$ in each 1-d space
• $A(x_q) = \frac{1}{L} \sum_{l=1}^{L} -\log P_l(\Pi_l(x_q))$
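A rough sketch of this scheme, assuming histograms as the 1-d density estimators and roughly $\sqrt{d}$ non-zero Gaussian entries per projection vector. Both choices, the density floor for empty bins, and all names are assumptions made for illustration, not details taken from the slide.

```python
import numpy as np

def fit_loda(data, n_projections=50, n_bins=20, seed=0):
    """Fit one histogram density per sparse random 1-d projection."""
    rng = np.random.default_rng(seed)
    d = data.shape[1]
    k = max(1, int(round(np.sqrt(d))))           # non-zero entries per projection (assumption)
    models = []
    for _ in range(n_projections):
        w = np.zeros(d)
        idx = rng.choice(d, size=k, replace=False)
        w[idx] = rng.normal(size=k)
        density, edges = np.histogram(data @ w, bins=n_bins, density=True)
        models.append((w, density, edges))
    return models

def loda_score(x_q, models, floor=1e-12):
    """Average of -log P_l(Pi_l(x_q)) over the projections."""
    total = 0.0
    for w, density, edges in models:
        v = x_q @ w
        b = np.searchsorted(edges, v, side="right") - 1
        if v == edges[-1]:                       # right edge belongs to the last bin
            b -= 1
        p = density[b] if 0 <= b < len(density) else 0.0
        total += -np.log(max(p, floor))          # floor avoids log(0) outside the data range
    return total / len(models)

rng = np.random.default_rng(0)
nominal = rng.normal(0.0, 1.0, size=(500, 4))
models = fit_loda(nominal)
inlier = loda_score(np.zeros(4), models)
outlier = loda_score(np.full(4, 6.0), models)
```

A single projection can miss an anomaly whose projected value happens to look typical, which is why the score averages over many projections.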
Approach 3: Quantile Methods
• Vapnik’s principle: We only need to estimate the “decision boundary” between nominal and anomalous
• Surround the data by a function $f$ that captures $1 - \epsilon$ of the training data
• One-Class Support Vector Machine (OCSVM)
  • $f$ is a hyperplane in “kernel space”
• Support Vector Data Description (SVDD)
  • $f$ is a sphere in “kernel space”
• Issue
  • Need to choose $\epsilon$ at learning time rather than run time
Approach 4: Reconstruction Methods
• NavLab self-driving van (Pomerleau, 1992)
  • Primary head: Predict steering angle from input image
  • Secondary head: Predict the input image (“auto-encoder”)
  • $A(x_q) = \|x_q - \hat{x}_q\|$
  • If reconstruction is poor, this suggests that the steering angle should not be trusted
• Principle: Anomaly Detection through Failure
  • Define a task on which the learned system should fail for anomalies
Pomerleau, NIPS 1992
Application: Finding Unusual Chemical Spectra
• NASA Mars Science Laboratory ChemCam instrument
  • Collects 6144 spectral bands on rock samples from 7 m distance using laser stimulation
  • Goal: active learning to find interesting spectra
• DEMUD
  • Incremental PCA applied to samples one at a time
  • Fit only to the samples labeled as “uninteresting” by the user
  • Show the user the most un-uninteresting sample (sample with highest PCA reconstruction error)
  • Rapidly discovers interesting samples
  • Wagstaff, et al. (2013)
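PCA reconstruction error, the quantity used to rank samples here, can be sketched as below. Note the swap: this uses ordinary batch PCA computed with an SVD, not the incremental PCA that DEMUD applies one sample at a time, and the function names and toy data are hypothetical.

```python
import numpy as np

def fit_pca(data, n_components):
    """Return the data mean and the top principal directions (as rows)."""
    mean = data.mean(axis=0)
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    return mean, vt[:n_components]

def reconstruction_error(x_q, mean, components):
    """Norm of the part of x_q that the principal subspace cannot reproduce."""
    centered = x_q - mean
    reconstructed = components.T @ (components @ centered)
    return float(np.linalg.norm(centered - reconstructed))

# Toy "uninteresting" samples lying close to the line y = 2x.
rng = np.random.default_rng(0)
t = rng.normal(size=(300, 1))
data = np.hstack([t, 2.0 * t]) + 0.01 * rng.normal(size=(300, 2))
mean, components = fit_pca(data, n_components=1)
on_line = reconstruction_error(np.array([1.0, 2.0]), mean, components)
off_line = reconstruction_error(np.array([2.0, -1.0]), mean, components)
```

A sample that follows the structure of the fitted data reconstructs well, while one that departs from it leaves a large residual and would be shown to the user first.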
Benchmarking Study [Andrew Emmott, 2015, 2020]
• Distance-Based Methods
  • k-NN: Mean distance to $k$-nearest neighbors
  • LOF: Local Outlier Factor (Breunig, et al., 2000)
  • ABOD: kNN Angle-Based Outlier Detector (Kriegel, et al., 2008)
  • IFOR: Isolation Forest (Liu, et al., 2008)
• Density-Based Approaches
  • RKDE: Robust Kernel Density Estimation (Kim & Scott, 2008)
  • EGMM: Ensemble Gaussian Mixture Model (our group)
  • LODA: Lightweight Online Detector of Anomalies (Pevny, 2016)
• Quantile-Based Methods
  • OCSVM: One-class SVM (Schoelkopf, et al., 1999)
  • SVDD: Support Vector Data Description (Tax & Duin, 2004)
Benchmarking Methodology
• Select 19 data sets from UC Irvine repository
• Choose one or more classes to be “anomalies”; the rest are “nominals”
• Manipulate
  • Relative frequency
  • Point difficulty
  • Irrelevant features
  • Clusteredness
• 20 replicates of each configuration
• Result: 11,888 Non-trivial Benchmark Datasets
Analysis of Variance
• Linear ANOVA
  • $\log \frac{AUC}{1 - AUC} \sim rf + pd + cl + ir + pset + algo$
    • rf: relative frequency
    • pd: point difficulty
    • cl: normalized clusteredness
    • ir: irrelevant features
    • pset: “Parent” set
    • algo: anomaly detection algorithm
• Assess the algo effect while controlling for all other factors
• $AUC$: area under the ROC curve for the nominal vs. anomaly binary decision
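The response variable of this model, the logit of the AUC, can be computed as in the following sketch. The AUC is evaluated directly as the fraction of (anomaly, nominal) pairs ranked correctly; the scores are made-up numbers for illustration.

```python
import math

def auc(nominal_scores, anomaly_scores):
    """Probability that a random anomaly outscores a random nominal (ties count 1/2)."""
    wins = 0.0
    for a in anomaly_scores:
        for n in nominal_scores:
            wins += 1.0 if a > n else 0.5 if a == n else 0.0
    return wins / (len(anomaly_scores) * len(nominal_scores))

def logit(p):
    """log(p / (1 - p)); undefined when p is exactly 0 or 1."""
    return math.log(p / (1.0 - p))

value = auc([0.1, 0.2, 0.4, 0.3], [0.35, 0.9])   # 7 of 8 pairs ranked correctly
response = logit(value)
```

The logit maps AUC values from the unit interval onto the whole real line, which suits a linear model better than the raw AUC.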
Benchmarking Study Results
• 19 UCI Datasets
• 9 Leading “feature-based” algorithms
• 11,888 non-trivial benchmark datasets
• Mean AUC effect for “nominal” vs. “anomaly” decisions
• Controlling for
  • Parent data set
  • Difficulty of individual queries
  • Fraction of anomalies
  • Irrelevant features
  • Clusteredness of anomalies
• Baseline method: Distance to nominal mean (“tmd”)
• Best methods: K-nearest neighbors and Isolation Forest
• Worst methods: Kernel-based OCSVM and SVDD
[Figure: bar chart of Mean AUC Effect (axis from 0.62 to 0.78) for knn, iforest, egmm, rkde, lof, abod, loda, svdd, tmd, ocsvm]
Incorporating User Feedback: Initial Work
• Show top-ranked candidate to the user
• User labels candidate
• Label is used to update the anomaly detector
• Two methods
  • AAD [Das, et al, ICDM 2016]
  • GLAD-OMD (modified version of iForest) [Siddiqui, et al., KDD 2018]
[Diagram: Data → Anomaly Detection → Best Candidate → User (Anomaly Analysis) → yes / no]
User Feedback Yields Big Improvements in
Anomaly Discovery
APT Engagement 3 Results
Deep Versions of the Four Basic Methods
Deep Anomaly Detection in Image Classification
• Input image $x$
• Network backbone, also called the “encoder”: $z = E(x)$
• Latent representation $z$
• “Logits” $\ell_k = w_k \cdot z$
• Predicted probabilities $\hat{p}(y = k \mid x) = \frac{\exp \ell_k(z)}{\sum_{k'} \exp \ell_{k'}(z)}$
[Diagram: Convolutional Neural Network Classifier. Image $x$ → “Backbone” encoder $E$ → Penultimate Layer $z$ → Logits $\ell_k = w_k^\top z$ → Probabilities $\hat{p}(y = k \mid x)$]
Distance-Based Methods
• K-nearest neighbor in the latent space
• Issue: What distance metric to use?
• Cosine distance is the most popular: $d(z_1, z_2) = 1 - \frac{z_1 \cdot z_2}{\|z_1\| \|z_2\|}$ (one minus the cosine similarity)
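A minimal sketch of k-nearest-neighbor scoring in the latent space, taking cosine distance to be one minus the cosine similarity. The latent vectors, the choice `k=2`, and the function names are illustrative assumptions; a real system would use the encoder outputs $z = E(x)$.

```python
import numpy as np

def cosine_distance(z1, z2):
    """One minus the cosine similarity of two latent vectors."""
    return 1.0 - float(z1 @ z2) / float(np.linalg.norm(z1) * np.linalg.norm(z2))

def knn_cosine_score(z_q, latents, k=2):
    """Mean cosine distance from z_q to its k nearest training latents."""
    dists = sorted(cosine_distance(z_q, z) for z in latents)
    return sum(dists[:k]) / k

# Hypothetical training latents and two queries.
latents = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
aligned = knn_cosine_score(np.array([2.0, 0.0]), latents)
opposite = knn_cosine_score(np.array([-1.0, -1.0]), latents)
```

Cosine distance ignores vector length, so the query `[2.0, 0.0]` is as close to `[1.0, 0.0]` as a query of unit length would be.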
Density-Based Methods
• Mahalanobis Method
  • Fit a joint multivariate Gaussian
  • Each class $k$ has its own mean $\mu_k$
  • Shared covariance matrix $\Sigma$
• Given a new $x$, $-\log P(x) \propto \min_k (x - \mu_k)^\top \Sigma^{-1} (x - \mu_k)$
• This is known as the squared Mahalanobis distance
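The Mahalanobis score can be sketched as follows, assuming the latent vectors and labels are available as NumPy arrays. The function names and the synthetic two-class data are hypothetical; the score returned is the minimum squared Mahalanobis distance to any class mean under a shared covariance.

```python
import numpy as np

def fit_class_gaussians(z, y):
    """Per-class means and the inverse of the shared (pooled) covariance."""
    classes = np.unique(y)
    means = np.stack([z[y == k].mean(axis=0) for k in classes])
    centered = z - means[np.searchsorted(classes, y)]   # subtract each point's class mean
    cov = centered.T @ centered / len(z)
    return means, np.linalg.inv(cov)

def mahalanobis_score(z_q, means, cov_inv):
    """min over classes k of (z_q - mu_k)^T Sigma^{-1} (z_q - mu_k)."""
    diffs = means - z_q
    d2 = np.einsum("ki,ij,kj->k", diffs, cov_inv, diffs)
    return float(d2.min())

# Synthetic latents: class 0 around (0, 0), class 1 around (5, 5).
rng = np.random.default_rng(0)
z = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(5.0, 1.0, (100, 2))])
y = np.repeat([0, 1], 100)
means, cov_inv = fit_class_gaussians(z, y)
near = mahalanobis_score(np.array([0.2, -0.1]), means, cov_inv)
far = mahalanobis_score(np.array([2.5, 10.0]), means, cov_inv)
```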
Open Hybrid: Classification + Density Estimation (Tack, Li, Guo, Guo, 2020)
• Residual Flow Deep Density Estimator
  • (Chen, Behrmann, Duvenaud, et al. NeurIPS 2019)
• Standard Cross-Entropy Supervised Loss
  • Claim: This helps focus $P(x)$ on relevant aspects of the images
• Anomaly Score: $A(x_q) = -\log P(x_q)$
Quantile Method: Deep SVDD (Ruff, et al. ICML 2018)
• The method is somewhat tricky to work with
  • Set the hypersphere center $c$ as the mean of a small set of points passed through the untrained network
  • No bias weights
  • These help prevent “hypersphere collapse”
Reconstruction Methods: Deep Autoencoders
• Encoder: $z = E(x)$
• Decoder: $\hat{x} = D(z)$
• Challenge: How to constrain $E$ and $D$ so that the autoencoder fails on anomalies but succeeds on nominal images?
• Autoencoders often learn general-purpose image compression methods
[Diagram: $x$ → $E$ → $z$ → $D$ → $\hat{x}$]
Classifier-Based Anomaly Detection
using the Max Logit Score
Surprise: The Max Logit Score
• Garrepalli (2020)
  • Train classifier to optimize softmax likelihood (minimize “cross-entropy loss”)
  • Maximum logit score is better than two distance methods:
    • Isolation Forest
    • LOF (a nearest-neighbor method)
[Figure: Anomaly Measures on Latent Representations for CIFAR-100. AUROC by measure: H(y|x) 0.68, Max SoftMax-prob. 0.67, Max BCE-prob 0.63, Max-logit 0.72, Iforest 0.51, LOF 0.44]
More Evidence for Max Logit
• Vaze, Han, Vedaldi, Zisserman (2021): “Open Set Recognition: A Good Classifier is All You Need” (ICLR 2022; arXiv 2110.06207)
• Carefully train a classifier using the latest tricks
  • Standard cross-entropy combined with the following:
    • Cosine learning rate schedule
    • Learning rate warmup
    • RandAugment augmentations
    • Label Smoothing
• Anomaly score: max logit
  • $A(x) = -\max_k \ell_k$
Protocol from Lawrence Neal et al. (2018)
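The max logit score and the more familiar max softmax score can be compared in a short sketch. The logit vectors are made-up numbers, and the example only illustrates one mathematical property (softmax ignores a constant shift of all logits); it is offered as an observation, not as the explanation given in the cited papers.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-d array of logits."""
    e = np.exp(logits - logits.max())
    return e / e.sum()

def max_logit_score(logits):
    """Anomaly score A = -max_k l_k (higher means more anomalous)."""
    return -float(logits.max())

def max_softmax_score(logits):
    """Anomaly score based on the largest predicted probability."""
    return -float(softmax(logits).max())

familiar = np.array([9.0, 1.0, 0.5])
unfamiliar = familiar - 7.0        # same gaps between logits, weaker overall activation

same_softmax = abs(max_softmax_score(familiar) - max_softmax_score(unfamiliar)) < 1e-12
logit_gap = max_logit_score(unfamiliar) - max_logit_score(familiar)
```

Here the two inputs receive identical softmax probabilities, yet the max logit score still separates them by the size of the shift.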
Still More Evidence for Max Logit
• Novel class difficulty based on semantic distance
  • CUB: Bird species
  • Air: Aircraft
  • ImageNet
Why?
Let’s Examine the Learned Representations
How are open set images represented by deep learning?
• DenseNet with 384-dimensional latent space.
• CIFAR-10: 6 known classes, 4 novel classes
• UMAP visualization
  • Light green: novel classes
  • Darker greens: known classes
• Note that many novel classes stay toward the center of the space; others overlap with known classes
• Training was not required to “pull them out” so that they could be discriminated
[Figure: UMAP plots labeled “6 Known Classes” and “4 Novel Classes”. Credit: Alex Guyer]
Similar Results from Other Groups
[Tack, et al. NeurIPS 2020] [Vaze, et al. arXiv 2110.06207]
The Familiarity Hypothesis
• Convolutional neural network learns “features” that detect image patches relevant to the classification task
• The logit layer weights these features to make the classification decision
• Novel classes activate fewer of these features, so their activation vectors are smaller
• Hypothesis: The networks don’t detect that an elephant is novel because of trunk and tusks but because its head doesn’t activate known features
The network doesn’t detect novelty, it detects the absence of familiarity
Evidence: Number of Activated Features
Novel images strongly activate fewer features
• CIFAR 10: 6 known classes; 4 novel classes
• DenseNet ($z$ has 324 dimensions)
• Activation threshold $\theta$
• Count number of features whose activation exceeds $\theta$
• OOD images activate fewer features
Alex Guyer (unpublished)
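Counting the activated features amounts to a thresholded sum over the latent vector. The sketch below uses two made-up six-dimensional activation vectors and an arbitrary threshold; the real experiment uses the DenseNet latent space.

```python
import numpy as np

def count_activated(z, theta):
    """Number of latent features whose activation exceeds theta, per row of z."""
    return (np.atleast_2d(z) > theta).sum(axis=1)

# Hypothetical activations for one known-class image and one novel-class image.
z_known = np.array([0.9, 0.0, 1.4, 0.7, 0.0, 2.1])
z_novel = np.array([0.2, 0.0, 0.6, 0.1, 0.0, 0.3])
counts = count_activated(np.stack([z_known, z_novel]), theta=0.5)
```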
Which features are responsible for the drop in activation?
Are they features “on” the object vs. the background?
• Strategy: blur the object and see how the feature activations change
  • activations that change must be on the object
• Details:
  • PASCAL VOC Segmented Images
  • Blur the original image (31x31 kernel; sd=31)
  • Form composite image where blurred region replaces the segmented region
https://www.peko-step.com/en/tool/blur.html
Blurring Examples
Note: This does not remove all object-related information (e.g., object boundary), so we don’t detect all on-object features
Blurring Effect
• Define the “blurring effect” of feature $j$ on image $i$: $BE(i, j) = z_{ij} - \tilde{z}_{ij}$, where
  • $z_{ij}$ is the activation of latent feature $j$ on image $i$
  • $\tilde{z}_{ij}$ is the activation of latent feature $j$ on blurred image $i$
• “presence feature”
  • $BE(i, j) > 0$. Blurring decreases the activity of the feature. Its net effect is to measure the presence of one or more image patterns
  • Its activity is high when those patterns are present
• “absence feature”
  • $BE(i, j) < 0$. Blurring increases the activity of the feature. Its net effect is to measure the absence of one or more image patterns
  • Its activity is high when those patterns are absent
“On Object” score of feature $j$ for class $k$
• On average, the activation of a feature changes when the object (of class $k$) is blurred: $OO(j, k) = \frac{1}{N_k} \sum_{i : y_i = k} (z_{ij} - \tilde{z}_{ij})$
• Feature $j$ is a net presence feature for class $k$ if $OO(j, k) > 0.02$
• Feature $j$ is a net absence feature for class $k$ if $OO(j, k) < -0.02$
• Otherwise $j$ is net neutral for class $k$
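The blurring effect and the on-object score reduce to a difference of activation matrices and a per-class mean. In this sketch the activation matrices are tiny made-up arrays (rows are images, columns are features) and the names are hypothetical; the 0.02 threshold is the one stated above.

```python
import numpy as np

def blurring_effect(z, z_blur):
    """BE(i, j) = z_ij - z~_ij for every image i and feature j."""
    return z - z_blur

def on_object_score(z, z_blur, y, k):
    """OO(j, k): mean blurring effect of each feature over the images of class k."""
    return blurring_effect(z, z_blur)[y == k].mean(axis=0)

def feature_kind(oo, threshold=0.02):
    """Label each feature as net presence, net absence, or net neutral."""
    return np.where(oo > threshold, "presence",
                    np.where(oo < -threshold, "absence", "neutral"))

# Three images (two of class 0), three features; original and blurred activations.
z = np.array([[1.0, 0.1, 0.5], [0.8, 0.0, 0.5], [0.2, 0.9, 0.4]])
z_blur = np.array([[0.2, 0.6, 0.5], [0.4, 0.3, 0.49], [0.2, 0.9, 0.4]])
y = np.array([0, 0, 1])
oo = on_object_score(z, z_blur, y, k=0)
kinds = feature_kind(oo).tolist()
```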
Feature Taxonomy
• Logit score is $\ell_{ik} = \sum_j w_{jk} z_{ij}$
• Contribution of $j$ in image $i$ to class $k$:
  • $c_{ijk} = w_{jk} z_{ij}$ (in normal images)
  • $\tilde{c}_{ijk} = w_{jk} \tilde{z}_{ij}$ (in blurred images)
• Mean contribution
  • $\bar{c}_{jk} = \frac{1}{N_k} \sum_{i : y_i = k} c_{ijk}$
  • $\bar{\tilde{c}}_{jk} = \frac{1}{N_k} \sum_{i : y_i = k} \tilde{c}_{ijk}$

                     | $w_{jk} > 0$      | $w_{jk} < 0$
$OO(j, k) > 0.02$    | positive presence | negative presence
$OO(j, k) < -0.02$   | positive absence  | negative absence

Sun & Li: On the Effectiveness of Sparsification for Detecting the Deep Unknowns. arXiv 2111.09805
Mean feature types for class 3
[Figure: positive features and negative features, colored by On-Object Index from 0.00 to 1.00; red = presence, blue = absence]
Zoomed View: Blurring reduces $\bar{c}_{jk}$
• Blurring…
  • reduces the contribution of positive presence features (red dots)
  • reduces the contribution of negative absence features (blue dots)
[Figure: mean unblurred contribution plotted against mean blurred contribution; points colored by On-Object Index from 0.00 to 1.00 (presence and absence)]
Decomposing the Logit Score: Four Cases
Positive presence: $w_{jk} > 0$ and $OO(j, k) > 0$
Positive absence: $w_{jk} > 0$ and $OO(j, k) < 0$
Negative presence: $w_{jk} < 0$ and $OO(j, k) > 0$
Negative absence: $w_{jk} < 0$ and $OO(j, k) < 0$
Visualizing Individual Images: OOD Instance 838
OOD Instance 770
OOD Instance 432
Decomposing the Novelty Scores
• Note that the Positive Presence features dominate the max logit score
• The Negative Absence and Positive Absence features (purple and blue lines) make a small contribution
• Negative Presence features make no contribution
• Conclusion: Decreases in activations of positive presence features account for most of the max logit score
Decreases in Positive Presence Features Account for Novelty Detection Accuracy
• Red line: trend for Positive Presence contribution to max logit score
• Black line: smooth estimate of classification accuracy (“known” vs “novel”)
Can we expect computer vision systems to perceive things they have not been trained on?
• Blakemore, Colin, and Grahame F. Cooper. “Development of the brain depends on the visual environment.” (1970): 477-478.
  • Kittens raised in environments with only horizontal or only vertical lines
  • “They were virtually blind for contours perpendicular to the orientation they had experienced.”
• Chomsky: “Poverty of the stimulus”
Source: Li Yang Ku
https://computervisionblog.wordpress.com/2013/06/01/cats-and-vision-is-vision-acquired-or-innate/
Implications
• Familiarity-based anomaly detection advantages:
  • Easy to implement: Anomaly signal (max logit) can be extracted from the classifier. No separate anomaly detection model is needed
  • Training on additional, auxiliary classes improves both classification and anomaly detection performance
• Familiarity-based anomaly detection weaknesses:
  • Partially-occluded nominal objects will be flagged as anomalies
  • If an image contains both a novel object and a known object, the novel object will not be detected
  • Adversarial attacks can easily cause false anomalies and missed anomalies
Open Challenges
Challenges for Anomaly Detection
• Can we learn deep representations that can represent outliers?
• Nonstationarity
  • As the world changes, the anomaly detection model must also change
• Explanation
  • Users often want explanations of why something is labeled as anomalous in order to provide feedback or take other actions
• Setting alarm thresholds
  • How can we set a threshold to control the false alarm and missed alarm rates?
• Incremental (continual) learning in deep networks
  • How can we efficiently update a trained neural network to incorporate user feedback?
• Anomaly detection in temporal, spatial, and spatio-temporal data, in video data, etc.
• Anomaly detection at multiple scales
Summary
Shallow and Deep Methods for Anomaly Detection
• Four Basic Methods
  • Distances, densities, density quantiles, and reconstruction
  • Distances work best; Isolation Forest is very robust
• Anomaly Detection in Deep Learning
  • The four basic methods have been extended to deep learning
  • They often do not work well when applied to learned representations
• Classifier Max Logit Score Gives Very Competitive Performance
  • Computed as a side effect of standard deep classifiers
  • Measures familiarity rather than novelty, which makes it risky in many settings
• Advances in Deep Anomaly Detection Require Learning Better Representations
59
Co-organized by:
Companies Presenting:
60

Weitere รคhnliche Inhalte

ร„hnlich wie DutchMLSchool 2022 - History and Developments in ML

The Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkThe Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkIvo Andreev
ย 
Selection K in K-means Clustering
Selection K in K-means ClusteringSelection K in K-means Clustering
Selection K in K-means ClusteringJunghoon Kim
ย 
Intro to machine learning
Intro to machine learningIntro to machine learning
Intro to machine learningAkshay Kanchan
ย 
Data_Preparation.pptx
Data_Preparation.pptxData_Preparation.pptx
Data_Preparation.pptxImXaib
ย 
Wearable Computing - Part IV: Ensemble classifiers & Insight into ongoing res...
Wearable Computing - Part IV: Ensemble classifiers & Insight into ongoing res...Wearable Computing - Part IV: Ensemble classifiers & Insight into ongoing res...
Wearable Computing - Part IV: Ensemble classifiers & Insight into ongoing res...Daniel Roggen
ย 
Robust inference via generative classifiers for handling noisy labels
Robust inference via generative classifiers for handling noisy labelsRobust inference via generative classifiers for handling noisy labels
Robust inference via generative classifiers for handling noisy labelsKimin Lee
ย 
Outlier analysis for Temporal Datasets
Outlier analysis for Temporal DatasetsOutlier analysis for Temporal Datasets
Outlier analysis for Temporal DatasetsQuantUniversity
ย 
Leveraging Machine Learning Techniques Predictive Analytics for Knowledge Dis...
Leveraging Machine Learning Techniques Predictive Analytics for Knowledge Dis...Leveraging Machine Learning Techniques Predictive Analytics for Knowledge Dis...
Leveraging Machine Learning Techniques Predictive Analytics for Knowledge Dis...Kevin Mader
ย 
Machine learning and linear regression programming
Machine learning and linear regression programmingMachine learning and linear regression programming
Machine learning and linear regression programmingSoumya Mukherjee
ย 
Anomaly detection (Unsupervised Learning) in Machine Learning
Anomaly detection (Unsupervised Learning) in Machine LearningAnomaly detection (Unsupervised Learning) in Machine Learning
Anomaly detection (Unsupervised Learning) in Machine LearningKuppusamy P
ย 
07 learning
07 learning07 learning
07 learningankit_ppt
ย 
Applications of Search-based Software Testing to Trustworthy Artificial Intel...
Applications of Search-based Software Testing to Trustworthy Artificial Intel...Applications of Search-based Software Testing to Trustworthy Artificial Intel...
Applications of Search-based Software Testing to Trustworthy Artificial Intel...Lionel Briand
ย 
A data science observatory based on RAMP - rapid analytics and model prototyping
A data science observatory based on RAMP - rapid analytics and model prototypingA data science observatory based on RAMP - rapid analytics and model prototyping
A data science observatory based on RAMP - rapid analytics and model prototypingAkin Osman Kazakci
ย 
Mini datathon
Mini datathonMini datathon
Mini datathonKunal Jain
ย 
Search quality in practice
Search quality in practiceSearch quality in practice
Search quality in practiceAlexander Sibiryakov
ย 
A Combination of Simple Models by Forward Predictor Selection for Job Recomme...
A Combination of Simple Models by Forward Predictor Selection for Job Recomme...A Combination of Simple Models by Forward Predictor Selection for Job Recomme...
A Combination of Simple Models by Forward Predictor Selection for Job Recomme...David Zibriczky
ย 
5 Practical Steps to a Successful Deep Learning Research
5 Practical Steps to a Successful  Deep Learning Research5 Practical Steps to a Successful  Deep Learning Research
5 Practical Steps to a Successful Deep Learning ResearchBrodmann17
ย 

ร„hnlich wie DutchMLSchool 2022 - History and Developments in ML (20)

The Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkThe Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it Work
ย 
Selection K in K-means Clustering
Selection K in K-means ClusteringSelection K in K-means Clustering
Selection K in K-means Clustering
ย 
Intro to machine learning
Intro to machine learningIntro to machine learning
Intro to machine learning
ย 
Data_Preparation.pptx
Data_Preparation.pptxData_Preparation.pptx
Data_Preparation.pptx
ย 
Wearable Computing - Part IV: Ensemble classifiers & Insight into ongoing res...
Wearable Computing - Part IV: Ensemble classifiers & Insight into ongoing res...Wearable Computing - Part IV: Ensemble classifiers & Insight into ongoing res...
Wearable Computing - Part IV: Ensemble classifiers & Insight into ongoing res...
ย 
Robust inference via generative classifiers for handling noisy labels
Robust inference via generative classifiers for handling noisy labelsRobust inference via generative classifiers for handling noisy labels
Robust inference via generative classifiers for handling noisy labels
ย 
Outlier analysis for Temporal Datasets
Outlier analysis for Temporal DatasetsOutlier analysis for Temporal Datasets
Outlier analysis for Temporal Datasets
ย 
Leveraging Machine Learning Techniques Predictive Analytics for Knowledge Dis...
Leveraging Machine Learning Techniques Predictive Analytics for Knowledge Dis...Leveraging Machine Learning Techniques Predictive Analytics for Knowledge Dis...
Leveraging Machine Learning Techniques Predictive Analytics for Knowledge Dis...
ย 
Declarative data analysis
Declarative data analysisDeclarative data analysis
Declarative data analysis
ย 
Machine learning and linear regression programming
Machine learning and linear regression programmingMachine learning and linear regression programming
Machine learning and linear regression programming
ย 
Anomaly detection (Unsupervised Learning) in Machine Learning
Anomaly detection (Unsupervised Learning) in Machine LearningAnomaly detection (Unsupervised Learning) in Machine Learning
Anomaly detection (Unsupervised Learning) in Machine Learning
ย 
03 presentation-bothiesson
03 presentation-bothiesson03 presentation-bothiesson
03 presentation-bothiesson
ย 
07 learning
07 learning07 learning
07 learning
ย 
Applications of Search-based Software Testing to Trustworthy Artificial Intel...
Applications of Search-based Software Testing to Trustworthy Artificial Intel...Applications of Search-based Software Testing to Trustworthy Artificial Intel...
Applications of Search-based Software Testing to Trustworthy Artificial Intel...
ย 
A data science observatory based on RAMP - rapid analytics and model prototyping
A data science observatory based on RAMP - rapid analytics and model prototypingA data science observatory based on RAMP - rapid analytics and model prototyping
A data science observatory based on RAMP - rapid analytics and model prototyping
ย 
Mini datathon
Mini datathonMini datathon
Mini datathon
ย 
Search quality in practice
Search quality in practiceSearch quality in practice
Search quality in practice
ย 
LR2. Summary Day 2
LR2. Summary Day 2LR2. Summary Day 2
LR2. Summary Day 2
ย 
A Combination of Simple Models by Forward Predictor Selection for Job Recomme...
A Combination of Simple Models by Forward Predictor Selection for Job Recomme...A Combination of Simple Models by Forward Predictor Selection for Job Recomme...
A Combination of Simple Models by Forward Predictor Selection for Job Recomme...
ย 
5 Practical Steps to a Successful Deep Learning Research
5 Practical Steps to a Successful  Deep Learning Research5 Practical Steps to a Successful  Deep Learning Research
5 Practical Steps to a Successful Deep Learning Research
ย 

DutchMLSchool 2022 - History and Developments in ML

  • 1. 2nd edition | July 4-6, 2022
  • 2. BigML, Inc #DutchMLSchool. Shallow and Deep Methods for Anomaly Detection. Thomas G. Dietterich, Chief Scientist, BigML
  • 3. Outline
      • Anomaly Detection Use Cases
      • Four Basic Methods for Anomaly Detection with Engineered Features
      • Benchmarking Study
      • Incorporating Feedback
      • Deep Versions of the Four Basic Methods
      • Classifier-Based Anomaly Detection using the Max Logit Score
      • Familiarity Hypothesis
      • Challenges for the Future
  • 4. Anomaly Detection Use Cases
  • 5. Use Cases
      • Data Cleaning
          • Remove corrupted data from the training data
          • Example: Typos in feature values, feature values interchanged, test results from two patients combined
      • Fault Detection, Fraud Detection, Cyber Attack
          • At training or test time, faulty or illegal behavior creates anomalous data
      • Open Category Detection
          • At test time, the classifier is given an instance of a novel category
          • Example: Self-driving car (trained in Europe) encounters a kangaroo (in Australia)
      • Out-of-Distribution Detection
          • At test time, the classifier is given an instance collected in a different way
          • Example: Chest X-Ray classifier trained only on front views is shown a side view
          • Example: Self-driving car trained in clear conditions must operate during rainy conditions
  • 6. Protecting a Classifier
      • Claim: Every deployed ML classifier should include an anomaly detector to detect queries that lie outside the region of competence of the classifier
      • Also useful as a performance indicator to detect that you need to retrain the classifier
      • [Diagram: the classifier $f$ is trained on examples $(x_i, y_i)$. A query $x_q$ first goes to the anomaly detector, which tests $A(x_q) > \tau$. If no, the classifier outputs $\hat{y} = f(x_q)$; if yes, the query is rejected.]
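The reject-or-classify flow on this slide can be sketched in a few lines. This is only an illustration: `guarded_predict`, `toy_score`, and `toy_classifier` are hypothetical names, and the toy score (distance from the origin) stands in for whatever anomaly detector $A$ is actually used.

```python
from typing import Callable, Optional, Sequence


def guarded_predict(
    x_q: Sequence[float],
    anomaly_score: Callable[[Sequence[float]], float],
    classifier: Callable[[Sequence[float]], str],
    tau: float,
) -> Optional[str]:
    """Return f(x_q), or None (reject) when A(x_q) > tau."""
    if anomaly_score(x_q) > tau:
        return None  # query lies outside the classifier's region of competence
    return classifier(x_q)


# Hypothetical stand-ins: A(x) = distance from the origin, f(x) = sign of x[0].
def toy_score(x):
    return sum(v * v for v in x) ** 0.5


def toy_classifier(x):
    return "positive" if x[0] >= 0 else "negative"


print(guarded_predict([0.5, 0.2], toy_score, toy_classifier, tau=2.0))
print(guarded_predict([9.0, 9.0], toy_score, toy_classifier, tau=2.0))
```

Returning `None` for a rejected query is one possible design; a deployed system might instead raise an exception or route the query to a human.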
  • 7. Anomaly Detection Definitions
      • Definition: An "anomaly" is a data point generated by a process that is different from the process generating the "nominal" data
      • Let $D_0$ be the probability distribution of the nominal process
      • Let $D_a$ be the probability distribution of the anomaly process
      • Two formal settings
          • Clean training data
          • Contaminated training data
  • 8. Clean Training Data
      • Given:
          • Training data: $x_1, x_2, \ldots, x_N$
          • All data come from $D_0$, the "nominal" distribution
          • Test data: $x_{N+1}, \ldots, x_{N+M}$ from a mixture of $D_0$ and $D_a$ (the anomaly distribution)
      • Find:
          • The data points in the test data that belong to $D_a$
      • Examples:
          • Protecting a classifier
          • Detecting manufacturing defects / equipment failure
  • 9. Contaminated Training Data
      • Given:
          • Training data: $x_1, x_2, \ldots, x_N$ from a mixture of $D_0$ and $D_a$ (the anomaly distribution)
      • Find:
          • The data points in the training data that belong to $D_a$
      • Use Cases:
          • Data cleaning
          • Fraud detection, Insider Threat detection
      • These two cases can be combined
          • Contaminated training data + Separate contaminated test data
  • 10. Four Basic Methods for Anomaly Detection with Engineered Features
  • 11. Theoretical Approaches to Anomaly Detection
      • Distance-Based Methods
          • Anomaly score $A(x_q) = \min_{x \in D} \|x_q - x\|$
      • Density Estimation Methods
          • Surprise: $A(x_q) = -\log P_D(x_q)$
          • Model the joint distribution $P_D(x)$ of the input data points $x_1, \ldots \in D$
      • Quantile Methods
          • Find a smooth function $f$ such that $\{x : f(x) \ge 0\}$ contains $1 - \alpha$ of the training data
          • Anomaly score $A(x) = -f(x)$
      • Reconstruction Methods
          • Train an auto-encoder: $x \approx D(E(x))$, where $E$ is the encoder and $D$ is the decoder
          • Anomaly score $A(x_q) = \|x_q - D(E(x_q))\|$
  • 12. Approach 1: Distance-Based Methods
      • Define a distance $d(x_i, x_j)$
      • $A(x_q) = \min_{x \in D} d(x_q, x)$
      • Requires a good distance metric
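As a minimal sketch of the score $A(x_q) = \min_{x \in D} d(x_q, x)$, the snippet below uses Euclidean distance as the default metric. The function name `nn_anomaly_score` and the toy training set are illustrative assumptions; any other distance function can be passed in through `dist`.

```python
import math


def nn_anomaly_score(x_q, train, dist=math.dist):
    """A(x_q): distance from the query to its nearest training point."""
    return min(dist(x_q, x) for x in train)


# Toy training set: the four corners of the unit square.
train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

print(nn_anomaly_score((0.9, 0.1), train))  # close to a training point
print(nn_anomaly_score((5.0, 5.0), train))  # far from every training point
```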
  • 13. Isolation Forest [Liu, Ting, Zhou, 2011]
      • Approximates L1 (Manhattan) Distance (Guha, et al., ICML 2016)
      • Construct a fully random binary tree
          • choose attribute $j$ at random
          • choose splitting threshold $\theta$ uniformly from $[\min(x_{\cdot j}), \max(x_{\cdot j})]$
          • until every data point is in its own leaf
          • let $d(x_i)$ be the depth of point $x_i$
      • repeat $L$ times
          • let $\bar{d}(x_i)$ be the average depth of $x_i$
          • $A(x_i) = 2^{-\bar{d}(x_i) / r(x_i)}$
          • $r(x_i)$ is the expected depth
      • [Figure: example random tree with splits $x_{\cdot j} > \theta$, $x_{\cdot 2} > \theta_2$, $x_{\cdot 8} > \theta_3$, $x_{\cdot 3} > \theta_4$, $x_{\cdot 1} > \theta_5$ isolating the point $x_i$]
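The tree-building loop above can be sketched in plain Python. This is a simplified illustration, not the reference implementation. It assumes that trees are built on the full data set (no subsampling), that only attributes that still vary are eligible for a split, and that the normalizer $r$ is the average-depth formula from the Isolation Forest paper with a logarithmic approximation of the harmonic number. All function names are hypothetical.

```python
import math
import random


def expected_depth(n):
    """Normalizer r: average depth of a point in a random tree over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximates H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(points, rng):
    """Split on a random attribute and threshold until each point is isolated."""
    if len(points) <= 1:
        return {"size": len(points)}
    # Assumption: only attributes that still vary are eligible for a split.
    dims = [j for j in range(len(points[0]))
            if min(p[j] for p in points) < max(p[j] for p in points)]
    if not dims:
        return {"size": len(points)}  # exact duplicates cannot be separated
    j = rng.choice(dims)
    lo = min(p[j] for p in points)
    hi = max(p[j] for p in points)
    theta = rng.uniform(lo, hi)
    if theta >= hi:  # guard against rounding so both children are non-empty
        theta = lo
    left = [p for p in points if p[j] <= theta]
    right = [p for p in points if p[j] > theta]
    return {"attr": j, "theta": theta,
            "left": build_tree(left, rng), "right": build_tree(right, rng)}


def path_length(x, node, depth=0):
    """Depth at which x reaches a leaf (plus a correction for unsplit leaves)."""
    if "attr" not in node:
        return depth + expected_depth(node["size"])
    child = node["left"] if x[node["attr"]] <= node["theta"] else node["right"]
    return path_length(x, child, depth + 1)


def isolation_scores(points, n_trees=100, seed=0):
    """A(x_i) = 2 ** (-mean_depth(x_i) / r) for every point."""
    rng = random.Random(seed)
    trees = [build_tree(points, rng) for _ in range(n_trees)]
    r = expected_depth(len(points))
    scores = []
    for x in points:
        mean_depth = sum(path_length(x, t) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_depth / r))
    return scores


# Toy data: 50 points near the origin plus one obvious outlier (last index).
data_rng = random.Random(1)
points = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(50)]
points.append((10.0, 10.0))
scores = isolation_scores(points)
top = max(range(len(points)), key=scores.__getitem__)
print("index of highest-scoring point:", top)
```

Points that are isolated after only a few random splits get a short average depth and therefore a score close to 1.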
  • 14. Approach 2: Density Estimation
      • Given a data set $x_1, \ldots, x_N$ where $x_i \in \mathbb{R}^d$
      • We assume the data have been drawn iid from an unknown probability density: $x_i \sim P(x_i)$
      • Goal: Estimate $P$
      • Anomaly Score: $A(x_q) = -\log P(x_q)$
          • "surprisal" from information theory
      • Why density estimation?
          • Gives a more global view by combining distances to all data points
  • 15. Example: LODA (Pevny, 2015)
      • Introduce sparse random projections $\Pi_l$ into 1-dimensional space
      • Fit a density estimator $P_l(\Pi_l(x))$ in each 1-d space
      • $A(x_q) = \frac{1}{L} \sum_{l=1}^{L} -\log P_l(\Pi_l(x_q))$
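A rough NumPy sketch of the LODA idea follows: project onto sparse random directions, fit a histogram density in each 1-d space, and average the surprisals. Several details are assumptions made for illustration rather than taken from the slide: about $\sqrt{d}$ non-zero Gaussian weights per projection, equal-width histograms, and a pseudo-count so that empty or out-of-range bins do not yield $\log 0$. The resulting densities are therefore only approximately normalized.

```python
import numpy as np


def fit_loda(X, n_projections=50, n_bins=20, pseudo_count=0.5, seed=0):
    """Fit one smoothed histogram per sparse random projection."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    k = max(1, int(round(np.sqrt(d))))  # assumed number of non-zero weights
    models = []
    for _ in range(n_projections):
        w = np.zeros(d)
        nonzero = rng.choice(d, size=k, replace=False)
        w[nonzero] = rng.standard_normal(k)
        counts, edges = np.histogram(X @ w, bins=n_bins)
        width = edges[1] - edges[0]
        density = (counts + pseudo_count) / (n * width)
        floor = pseudo_count / (n * width)  # density used outside the histogram
        models.append((w, edges, density, floor))
    return models


def loda_score(models, x_q):
    """A(x_q): mean surprisal -log P_l(Pi_l(x_q)) over all projections."""
    surprisals = []
    for w, edges, density, floor in models:
        z = float(np.dot(x_q, w))
        b = int(np.searchsorted(edges, z, side="right")) - 1
        p = density[b] if 0 <= b < len(density) else floor
        surprisals.append(-np.log(p))
    return float(np.mean(surprisals))


rng = np.random.default_rng(0)
X = rng.standard_normal((500, 4))  # toy nominal data
models = fit_loda(X)
score_center = loda_score(models, np.zeros(4))
score_far = loda_score(models, np.full(4, 8.0))
print(score_center, score_far)
```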
  • 16. Approach 3: Quantile Methods
      • Vapnik's principle: We only need to estimate the "decision boundary" between nominal and anomalous
      • Surround the data by a function $f$ that captures $1 - \epsilon$ of the training data
      • One-Class Support Vector Machine (OCSVM)
          • $f$ is a hyperplane in "kernel space"
      • Support Vector Data Description (SVDD)
          • $f$ is a sphere in "kernel space"
      • Issue
          • Need to choose $\epsilon$ at learning time rather than run time
  • 17. Approach 4: Reconstruction Methods
      • NavLab self-driving van (Pomerleau, 1992)
          • Primary head: Predict steering angle from input image
          • Secondary head: Predict the input image ("auto-encoder")
          • $A(x_q) = \|x_q - \hat{x}_q\|$
          • If reconstruction is poor, this suggests that the steering angle should not be trusted
      • Principle: Anomaly Detection through Failure
          • Define a task on which the learned system should fail for anomalies
      • [Figure source: Pomerleau, NIPS 1992]
  • 18. Application: Finding Unusual Chemical Spectra
      • NASA Mars Science Laboratory ChemCam instrument
          • Collects 6144 spectral bands on rock samples from 7m distance using laser stimulation
          • Goal: active learning to find interesting spectra
      • DEMUD
          • Incremental PCA applied to samples one at a time
          • Fit only to the samples labeled as "uninteresting" by the user
          • Show the user the most un-uninteresting sample (sample with highest PCA reconstruction error)
          • Rapidly discovers interesting samples
      • Wagstaff, et al. (2013)
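The notion of a PCA reconstruction error can be illustrated with a small NumPy sketch. Note the substitution: this uses ordinary batch PCA computed from an SVD, not the incremental PCA that DEMUD applies one sample at a time, and the function names and toy data are hypothetical.

```python
import numpy as np


def fit_pca(X, n_components):
    """Return the data mean and the top principal directions (as rows)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def reconstruction_error(x_q, mean, components):
    """A(x_q) = || x_q - reconstruction of x_q from the PCA subspace ||."""
    centered = x_q - mean
    reconstructed = components.T @ (components @ centered)
    return float(np.linalg.norm(centered - reconstructed))


# Toy data: points scattered along one direction in 3-d, with small noise.
rng = np.random.default_rng(0)
direction = np.array([1.0, 2.0, 3.0]) / np.sqrt(14.0)
X = rng.standard_normal((200, 1)) * direction + 0.01 * rng.standard_normal((200, 3))

mean, components = fit_pca(X, n_components=1)
on_line = reconstruction_error(2.0 * direction, mean, components)
off_line = reconstruction_error(np.array([3.0, -1.0, 0.0]), mean, components)
print(on_line, off_line)
```

A sample that lies in the learned subspace reconstructs almost perfectly, while a sample off the subspace leaves a large residual, which is what makes the residual usable as an anomaly score.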
  • 19. Benchmarking Study [Andrew Emmott, 2015, 2020]
      • Distance-Based Methods
          • k-NN: Mean distance to $k$-nearest neighbors
          • LOF: Local Outlier Factor (Breunig, et al., 2000)
          • ABOD: kNN Angle-Based Outlier Detector (Kriegel, et al., 2008)
          • IFOR: Isolation Forest (Liu, et al., 2008)
      • Density-Based Approaches
          • RKDE: Robust Kernel Density Estimation (Kim & Scott, 2008)
          • EGMM: Ensemble Gaussian Mixture Model (our group)
          • LODA: Lightweight Online Detector of Anomalies (Pevny, 2016)
      • Quantile-Based Methods
          • OCSVM: One-class SVM (Schoelkopf, et al., 1999)
          • SVDD: Support Vector Data Description (Tax & Duin, 2004)
  • 20. Benchmarking Methodology
      • Select 19 data sets from UC Irvine repository
      • Choose one or more classes to be "anomalies"; the rest are "nominals"
      • Manipulate
          • Relative frequency
          • Point difficulty
          • Irrelevant features
          • Clusteredness
      • 20 replicates of each configuration
      • Result: 11,888 Non-trivial Benchmark Datasets
  • 21. Analysis of Variance
      • Linear ANOVA
          • $\log \frac{AUC}{1 - AUC} \sim rf + pd + cl + ir + pset + algo$
          • rf: relative frequency
          • pd: point difficulty
          • cl: normalized clusteredness
          • ir: irrelevant features
          • pset: "Parent" set
          • algo: anomaly detection algorithm
      • Assess the algo effect while controlling for all other factors
      • $AUC$: area under the ROC curve for the nominal vs. anomaly binary decision
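To make the response variable concrete, the sketch below computes the AUC of a set of anomaly scores and applies the logit transform $\log \frac{AUC}{1 - AUC}$ used on the left-hand side of the model. It does not fit the ANOVA itself. The scores and labels are made-up toy values, and the AUC is computed through its pairwise-ranking interpretation.

```python
import math


def auc(scores, is_anomaly):
    """Probability that a random anomaly outscores a random nominal point.

    Ties between an anomaly and a nominal point count as one half.
    """
    pos = [s for s, a in zip(scores, is_anomaly) if a]
    neg = [s for s, a in zip(scores, is_anomaly) if not a]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def logit(p):
    """Map a value in (0, 1) to the real line."""
    return math.log(p / (1.0 - p))


# Toy example: five points, two of which are true anomalies.
scores = [0.1, 0.4, 0.35, 0.8, 0.7]
labels = [False, False, True, True, False]

toy_auc = auc(scores, labels)
print(toy_auc, logit(toy_auc))
```

The logit transform maps the bounded AUC onto an unbounded scale, which suits a linear model better than the raw AUC.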
  • 22. Benchmarking Study Results
      • 19 UCI Datasets
      • 9 Leading "feature-based" algorithms
      • 11,888 non-trivial benchmark datasets
      • Mean AUC effect for "nominal" vs. "anomaly" decisions
      • Controlling for
          • Parent data set
          • Difficulty of individual queries
          • Fraction of anomalies
          • Irrelevant features
          • Clusteredness of anomalies
      • Baseline method: Distance to nominal mean ("tmd")
      • Best methods: K-nearest neighbors and Isolation Forest
      • Worst methods: Kernel-based OCSVM and SVDD
      • [Bar chart: Mean AUC Effect (axis from 0.62 to 0.78) for knn, iforest, egmm, rkde, lof, abod, loda, svdd, tmd, ocsvm]
  • 23. Incorporating User Feedback: Initial Work
      • Show top-ranked candidate to the user
      • User labels candidate
      • Label is used to update the anomaly detector
      • Two methods
          • AAD [Das, et al, ICDM 2016]
          • GLAD-OMD (modified version of iForest) [Siddiqui, et al., KDD 2018]
      • [Diagram: Data → Anomaly Detection → Best Candidate → User / Anomaly Analysis, with the yes/no label fed back to the detector]
  • 24. User Feedback Yields Big Improvements in Anomaly Discovery
      • [Figure: APT Engagement 3 Results]
  • 25. Deep Versions of the Four Basic Methods
  • 26. Deep Anomaly Detection in Image Classification
      • Input image $x$
      • Network backbone, also called the "encoder": $z = E(x)$
      • Latent representation $z$
      • "Logits" $\ell_k = w_k \cdot z$
      • Predicted probabilities $\hat{p}(y = k \mid x) = \frac{\exp \ell_k(z)}{\sum_{k'} \exp \ell_{k'}(z)}$
      • [Diagram: convolutional neural network classifier. Image $x$ → "backbone" encoder $E$ → penultimate layer $z$ → logits $\ell_k = w_k^\top z$ → probabilities $\hat{p}(y = k \mid x)$]
  • 27. Distance-Based Methods
      • K-nearest neighbor in the latent space
      • Issue: What distance metric to use?
      • Cosine distance is the most popular: $d(z_1, z_2) = 1 - \frac{z_1 \cdot z_2}{\|z_1\| \|z_2\|}$, i.e., one minus the cosine similarity
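A small pure-Python sketch of k-nearest-neighbor scoring in a latent space is shown below, treating cosine distance as one minus the cosine similarity. The latent vectors are invented toy values rather than the output of a real encoder, and averaging over the $k$ nearest neighbors is one common choice, not something the slide prescribes.

```python
import math


def cosine_similarity(z1, z2):
    """Dot product of z1 and z2 divided by the product of their norms."""
    dot = sum(a * b for a, b in zip(z1, z2))
    norm1 = math.sqrt(sum(a * a for a in z1))
    norm2 = math.sqrt(sum(b * b for b in z2))
    return dot / (norm1 * norm2)


def knn_cosine_score(z_q, train_latents, k=3):
    """Mean cosine distance from z_q to its k nearest training latents."""
    distances = sorted(1.0 - cosine_similarity(z_q, z) for z in train_latents)
    return sum(distances[:k]) / k


# Toy latents that all point roughly along the first axis.
train_latents = [(1.0, 0.1, 0.0), (2.0, 0.0, 0.1), (0.9, 0.0, 0.0), (1.5, 0.1, 0.1)]

score_near = knn_cosine_score((3.0, 0.0, 0.0), train_latents)
score_far = knn_cosine_score((0.0, 1.0, 0.0), train_latents)
print(score_near, score_far)
```

Because cosine distance ignores vector length, the query `(3.0, 0.0, 0.0)` scores as nominal even though it is longer than any training latent.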
  • 28. Density-Based Methods
      • Mahalanobis Method
          • Fit a joint multivariate Gaussian
          • Each class $k$ has its own mean $\mu_k$
          • Shared covariance matrix $\Sigma$
      • Given a new $x$, $\log P(x) \propto \min_k (x - \mu_k)^\top \Sigma^{-1} (x - \mu_k)$
      • This is known as the squared Mahalanobis distance
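The Mahalanobis method can be sketched with NumPy as follows: estimate one mean per class, pool the within-class scatter into a single shared covariance, and score a query by its smallest squared Mahalanobis distance to any class mean. The function names and toy data are hypothetical, and the pseudo-inverse is an added safeguard for nearly singular covariances. In a deep setting the inputs would be latent vectors from the encoder.

```python
import numpy as np


def fit_class_gaussians(Z, y):
    """Per-class means and the inverse of the shared covariance matrix."""
    classes = np.unique(y)
    means = np.stack([Z[y == k].mean(axis=0) for k in classes])
    # Center every point on the mean of its own class, then pool the scatter.
    centered = Z - means[np.searchsorted(classes, y)]
    cov = centered.T @ centered / len(Z)
    precision = np.linalg.pinv(cov)  # pseudo-inverse in case cov is singular
    return means, precision


def mahalanobis_score(z_q, means, precision):
    """min over classes k of (z_q - mu_k)^T Sigma^{-1} (z_q - mu_k)."""
    diffs = means - z_q
    d2 = np.einsum("kd,de,ke->k", diffs, precision, diffs)
    return float(d2.min())


# Toy data: two classes centered at (0, 0) and (5, 5) with unit noise.
rng = np.random.default_rng(0)
Z = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

means, precision = fit_class_gaussians(Z, y)
near = mahalanobis_score(np.array([5.0, 5.0]), means, precision)
far = mahalanobis_score(np.array([20.0, -20.0]), means, precision)
print(near, far)
```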
  • 29. Open Hybrid: Classification + Density Estimation (Tack, Li, Guo, Guo, 2020)
      • Residual Flow Deep Density Estimator (Chen, Behrmann, Duvenaud, et al. NeurIPS 2019)
      • Standard Cross-Entropy Supervised Loss
          • Claim: This helps focus $P(x)$ on relevant aspects of the images
      • Anomaly Score: $A(x_q) = -\log P(x_q)$
  • 30. Quantile Method: Deep SVDD (Ruff, et al. ICML 2018)
      • The method is somewhat tricky to work with
          • Set $c$ as the mean of a small set of points passed through the untrained network
          • No bias weights
          • These help prevent "hypersphere collapse"
  • 31. Reconstruction Methods: Deep Autoencoders
      • Encoder: $z = E(x)$
      • Decoder: $\hat{x} = D(z)$
      • Challenge: How to constrain $E$ and $D$ so that the autoencoder fails on anomalies but succeeds on nominal images?
      • Autoencoders often learn general-purpose image compression methods
      • [Diagram: $x$ → $E$ → $z$ → $D$ → $\hat{x}$]
  • 32. Classifier-Based Anomaly Detection using the Max Logit Score
  • 33. Surprise: The Max Logit Score
      • Garrepalli (2020)
          • Train classifier to optimize softmax likelihood (minimize "cross-entropy loss")
          • Maximum logit score is better than two distance methods:
              • Isolation Forest
              • LOF (a nearest-neighbor method)
      • [Bar chart: Anomaly Measures on Latent Representations for CIFAR-100, AUROC. H(y|x): 0.68; Max SoftMax-prob.: 0.67; Max BCE-prob: 0.63; Max-logit: 0.72; Iforest: 0.51; LOF: 0.44]
  • 34. More Evidence for Max Logit
      • Vaze, Han, Vedaldi, Zisserman (2021): "Open Set Recognition: A Good Classifier is All You Need" (ICLR 2022; arXiv 2110.06207)
      • Carefully train a classifier using the latest tricks
      • Standard cross-entropy combined with the following:
          • Cosine learning rate schedule
          • Learning rate warmup
          • RandAugment augmentations
          • Label Smoothing
      • Anomaly score: max logit
          • $-\max_k \ell_k$
      • [Figure: results under the protocol from Lawrence Neal et al. (2018)]
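The difference between the max-logit score $-\max_k \ell_k$ and a max-softmax score can be seen in a toy example. The logit vectors below are invented for illustration: the second is the first shifted down by a constant, so the softmax probabilities are identical while the max logit differs.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def max_logit_score(logits):
    """Anomaly score -max_k l_k: a larger value means more anomalous."""
    return -max(logits)


def max_softmax_score(logits):
    """Anomaly score based on the largest predicted probability."""
    return -max(softmax(logits))


strong = [8.0, 0.0, 0.0]   # hypothetical logits with large activations
weak = [2.0, -6.0, -6.0]   # the same logits shifted down by 6

print(max_softmax_score(strong), max_softmax_score(weak))
print(max_logit_score(strong), max_logit_score(weak))
```

Softmax is invariant to adding a constant to every logit, so it discards the overall activation level that the max-logit score retains. That may be one reason the two scores behave differently, although the slides do not make this specific argument.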
  • 35. Still More Evidence for Max Logit • Novel class difficulty based on semantic distance • CUB: Bird species • Air: Aircraft • ImageNet
  • 36. Why? Let’s Examine the Learned Representations
  • 37. How are open set images represented by deep learning? • DenseNet with 384-dimensional latent space • CIFAR-10: 6 known classes, 4 novel classes • UMAP visualization • Light green: novel classes • Darker greens: known classes • Note that many novel-class images stay toward the center of the space; others overlap with known classes • Training was not required to “pull them out” so that they could be discriminated • Alex Guyer • [Figure panels: 6 Known Classes; 4 Novel Classes]
  • 38. Similar Results from Other Groups [Tack, et al. NeurIPS 2020] [Vaze, et al. arXiv 2110.06207]
  • 39. The Familiarity Hypothesis • A convolutional neural network learns “features” that detect image patches relevant to the classification task • The logit layer weights these features to make the classification decision • Novel classes activate fewer of these features, so their activation vectors are smaller • Hypothesis: The networks don’t detect that an elephant is novel because of its trunk and tusks but because its head doesn’t activate known features • The network doesn’t detect novelty, it detects the absence of familiarity
  • 40. Evidence: Number of Activated Features • Novel images strongly activate fewer features • CIFAR-10: 6 known classes; 4 novel classes • DenseNet ($z$ has 324 dimensions) • Activation threshold $\theta$ • Count the number of features whose activation exceeds $\theta$ • OOD images activate fewer features • Alex Guyer (unpublished)
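The counting step described on this slide is simple to state in code. In this sketch the activation matrix `z` (one row per image, one column per latent feature) and the threshold value are illustrative, not taken from the experiment.

```python
import numpy as np

def count_active_features(z, theta):
    """Number of latent features per image whose activation exceeds theta."""
    return np.sum(z > theta, axis=1)

z = np.array([[0.9, 0.1, 0.7],    # e.g. a known-class image
              [0.2, 0.0, 0.6]])   # e.g. a novel-class image
counts = count_active_features(z, theta=0.5)   # [2 1]
```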
  • 41. Which features are responsible for the drop in activation? • Are they features “on” the object vs. the background? • Strategy: blur the object and see how the feature activations change • Activations that change must be on the object • Details: • PASCAL VOC segmented images • Blur the original image (31x31 kernel; sd=31) • Form a composite image where the blurred region replaces the segmented region • https://www.peko-step.com/en/tool/blur.html
  • 42. Blurring Examples • Note: This does not remove all object-related information (e.g., object boundary), so we don’t detect all on-object features
  • 43. Blurring Effect • Define the “blurring effect” of feature $j$ on image $i$: $BE(i, j) = z_{ij} - \tilde{z}_{ij}$, where $z_{ij}$ is the activation of latent feature $j$ on image $i$ and $\tilde{z}_{ij}$ is the activation of latent feature $j$ on blurred image $i$ • “Presence feature”: $BE(i, j) > 0$. Blurring decreases the activity of the feature. Its net effect is to measure the presence of one or more image patterns; its activity is high when those patterns are present • “Absence feature”: $BE(i, j) < 0$. Blurring increases the activity of the feature. Its net effect is to measure the absence of one or more image patterns; its activity is high when those patterns are absent
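A short sketch of the blurring effect as an elementwise difference of two activation matrices. The arrays are invented, and the "neutral" label for an exactly zero effect is an assumption added here, since the slide only defines the two signed cases.

```python
import numpy as np

def blurring_effect(z, z_blur):
    """BE(i, j) = z_ij - z~_ij for every image i and feature j."""
    return z - z_blur

def feature_kind(be):
    """Label each entry 'presence' (BE > 0) or 'absence' (BE < 0); zero is called 'neutral' here."""
    return np.where(be > 0, "presence", np.where(be < 0, "absence", "neutral"))

z = np.array([[1.0, 0.2, 0.5]])        # activations on the original image
z_blur = np.array([[0.4, 0.6, 0.5]])   # activations with the object blurred
be = blurring_effect(z, z_blur)
kinds = feature_kind(be)
```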
  • 44. “On Object” score of feature $j$ for class $k$ • The average change in the activation of feature $j$ when the object (of class $k$) is blurred: $OO(j, k) = \frac{1}{N_k} \sum_{i: y_i = k} (z_{ij} - \tilde{z}_{ij})$, where $N_k$ is the number of images of class $k$ • Feature $j$ is a net presence feature for class $k$ if $OO(j, k) > 0.02$ • Feature $j$ is a net absence feature for class $k$ if $OO(j, k) < -0.02$ • Otherwise $j$ is net neutral for class $k$
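The On-Object score is the class-wise mean of the per-image blurring effect, followed by a thresholding step. The sketch below uses the 0.02 threshold from the slide; the arrays, labels, and function names are illustrative assumptions.

```python
import numpy as np

def on_object_score(z, z_blur, y, k):
    """OO(j, k): mean over images of class k of (z_ij - z~_ij), one value per feature j."""
    mask = (y == k)
    return (z[mask] - z_blur[mask]).mean(axis=0)

def net_feature_type(oo, eps=0.02):
    """'presence' if OO > eps, 'absence' if OO < -eps, otherwise 'neutral'."""
    return np.where(oo > eps, "presence", np.where(oo < -eps, "absence", "neutral"))

z = np.array([[1.0, 0.2, 0.50],
              [0.8, 0.1, 0.51],
              [0.0, 0.9, 0.30]])
z_blur = np.array([[0.4, 0.6, 0.50],
                   [0.6, 0.5, 0.50],
                   [0.0, 0.9, 0.30]])
y = np.array([3, 3, 1])                       # class label of each image

oo = on_object_score(z, z_blur, y, k=3)       # approximately [0.4, -0.4, 0.005]
types = net_feature_type(oo)
```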
  • 45. Feature Taxonomy • Logit score is $\ell_{ik} = \sum_j w_{jk} z_{ij}$ • Contribution of feature $j$ in image $i$ to class $k$: $c_{ijk} = w_{jk} z_{ij}$ (in normal images); $\tilde{c}_{ijk} = w_{jk} \tilde{z}_{ij}$ (in blurred images) • Mean contribution: $\bar{c}_{jk} = \frac{1}{N_k} \sum_{i: y_i = k} c_{ijk}$; $\bar{\tilde{c}}_{jk} = \frac{1}{N_k} \sum_{i: y_i = k} \tilde{c}_{ijk}$ • Taxonomy: positive presence ($w_{jk} > 0$, $OO(j, k) > 0.02$); negative presence ($w_{jk} < 0$, $OO(j, k) > 0.02$); positive absence ($w_{jk} > 0$, $OO(j, k) < -0.02$); negative absence ($w_{jk} < 0$, $OO(j, k) < -0.02$) • Sun & Li: On the Effectiveness of Sparsification for Detecting the Deep Unknowns. arXiv 2111.09805
  • 46. Mean feature types for class 3 • [Figure: panels for positive features and negative features; legend: On-Object Index (presence) and On-Object Index (absence), 0.00 to 1.00; red = presence, blue = absence]
  • 47. Zoomed View: Blurring reduces $\bar{c}_{jk}$ • Blurring reduces the contribution of positive presence features (red dots) • Blurring reduces the contribution of negative absence features (blue dots) • [Figure: axes are mean unblurred contribution and mean blurred contribution; legend: On-Object Index (presence) and On-Object Index (absence), 0.00 to 1.00]
  • 48. Decomposing the Logit Score: Four Cases • Positive presence: $w_{jk} > 0$ and $OO(j, k) > 0$ • Positive absence: $w_{jk} > 0$ and $OO(j, k) < 0$ • Negative presence: $w_{jk} < 0$ and $OO(j, k) > 0$ • Negative absence: $w_{jk} < 0$ and $OO(j, k) < 0$
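The four cases partition the per-feature contributions $c_{ijk} = w_{jk} z_{ij}$, so their group sums add up to the logit. The following sketch assumes the bias-free logit from the feature taxonomy, where positive and negative refer to the sign of $w_{jk}$; the numbers are invented for illustration.

```python
import numpy as np

def decompose_logit(z_i, w_k, oo_k):
    """Split the class-k logit of one image into the four feature-type contributions."""
    c = w_k * z_i                                   # c_ijk = w_jk * z_ij for each feature j
    groups = {
        "positive presence": (w_k > 0) & (oo_k > 0),
        "positive absence": (w_k > 0) & (oo_k < 0),
        "negative presence": (w_k < 0) & (oo_k > 0),
        "negative absence": (w_k < 0) & (oo_k < 0),
    }
    return {name: float(c[mask].sum()) for name, mask in groups.items()}

z_i = np.array([2.0, 1.0, 0.5, 1.5])      # activations of one image
w_k = np.array([1.0, 0.5, -1.0, -2.0])    # logit-layer weights for class k
oo_k = np.array([0.3, -0.1, 0.2, -0.4])   # On-Object scores for class k

parts = decompose_logit(z_i, w_k, oo_k)
logit = float(z_i @ w_k)                  # equals the sum of the four parts here
```

Features with a zero weight or a zero On-Object score fall outside all four groups, so the sum matches the logit exactly only when no such features exist, as in this toy example.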
  • 49. Visualizing Individual Images: OOD Instance 838
  • 50. OOD Instance 770
  • 51. OOD Instance 432
  • 52. Decomposing the Novelty Scores • Note that the Positive Presence features dominate the max logit score • The Negative Absence and Positive Absence features (purple and blue lines) make a small contribution • Negative Presence features make no contribution • Conclusion: Decreases in activations of positive presence features account for most of the max logit score
  • 53. Decreases in Positive Presence Features Account for Novelty Detection Accuracy • Red line: trend for Positive Presence contribution to max logit score • Black line: smooth estimate of classification accuracy (“known” vs “novel”)
  • 54. Can we expect computer vision systems to perceive things they have not been trained on? • Blakemore, Colin, and Grahame F. Cooper. “Development of the brain depends on the visual environment.” Nature (1970): 477-478. • Kittens raised in environments with only horizontal or only vertical lines • “They were virtually blind for contours perpendicular to the orientation they had experienced.” • Chomsky: “Poverty of the stimulus” • Source: Li Yang Ku https://computervisionblog.wordpress.com/2013/06/01/cats-and-vision-is-vision-acquired-or-innate/
  • 55. Implications • Familiarity-based anomaly detection advantages: • Easy to implement: the anomaly signal (max logit) can be extracted from the classifier; no separate anomaly detection model is needed • Training on additional, auxiliary classes improves both classification and anomaly detection performance • Familiarity-based anomaly detection weaknesses: • Partially occluded nominal objects will be flagged as anomalies • If an image contains both a novel object and a known object, the novel object will not be detected • Adversarial attacks can easily cause false anomalies and missed anomalies
  • 57. Challenges for Anomaly Detection • Can we learn deep representations that can represent outliers? • Nonstationarity: As the world changes, the anomaly detection model must also change • Explanation: Users often want explanations of why something is labeled as anomalous in order to provide feedback or take other actions • Setting alarm thresholds: How can we set a threshold to control the false alarm and missed alarm rates? • Incremental (continual) learning in deep networks: How can we efficiently update a trained neural network to incorporate user feedback? • Anomaly detection in temporal, spatial, and spatio-temporal data, in video data, etc. • Anomaly detection at multiple scales
  • 59. Shallow and Deep Methods for Anomaly Detection • Four Basic Methods • Distances, densities, density quantiles, and reconstruction • Distances work best; Isolation Forest is very robust • Anomaly Detection in Deep Learning • The four basic methods have been extended to deep learning • They often do not work well when applied to learned representations • Classifier Max Logit Score Gives Very Competitive Performance • Computed as a side effect of standard deep classifiers • Measures familiarity rather than novelty, which makes it risky in many settings • Advances in Deep Anomaly Detection Require Learning Better Representations