Adversarial Attacks and Defense
1. Machine Learning models, adversarial
attacks and defense strategies
Center for Information Assurance Lab, University of Memphis
Speaker 1: Dr. Dipankar Dasgupta
Speaker 2: Dr. Kishor Datta Gupta
2. Speakers:
Dipankar Dasgupta, Ph.D., IEEE Fellow
Hill Professor of Computer Science
The University of Memphis
Memphis, TN 38152
Email: dasgupta@memphis.edu
Kishor Datta Gupta, Ph.D.
Postdoc, North Carolina A&T State University
Greensboro, NC, 27405
Email: gkishordatta@ncat.edu
3. Lecture Plan
What is an adversarial attack?
Variations of adversarial attacks (including GAN-based attacks).
Defenses against adversarial attacks.
Adversarial attack points in a deployed ML system.
AI is not Magic – it is Computational Logic!
Filter-based adaptive defense systems.
Outlier-based defense systems.
6. AI/ML TECHNIQUES:
• Mathematics and statistics are the underlying building blocks of all computer algorithms. There are more than fifty so-called AI/ML algorithms and heuristics; some are model-based, but most are data-driven and rely on data quantity, quality, and reliability.
• Learning models include supervised, unsupervised, semi-supervised, reinforcement, real-time, incremental, online, etc.
• All models require parameter tuning to be used efficiently for decision support in detection, recognition, classification, and prediction tasks.
• Performance typically varies with training and with the focus on generalization versus specialization.
Figure: relationship of techniques - Algorithms, AI, ML, ANN, DNN, CNN, …
7. USING AI IN COMPETITIONS
Figure: Deep Blue defeated the world chess champion.
Figure: IBM Watson defeated two human champions in Jeopardy.
8. WHO WILL WIN?
This man vs. machine game demonstrates that an AI machine is faster and better at winning competitions if it is trained with a significant amount of relevant information and if it is "a well-defined game—the rules, the moves, the goals".
In 2011, IBM’s Watson®, an AI-based
question-answering supercomputer played
Jeopardy and demonstrated the power of AI
by defeating two human champions.
Man vs. AI machine
9. WHO WILL WIN?
• If two thinking, intelligent machines compete in playing a game (such as chess, cards, or Jeopardy), and the environment changes unpredictably in terms of game rules, board size, etc., who will win?
• It may depend on which AI technique can make correct dynamic decisions (based on the current state of the board, the best moves, and future predictions) quickly; however, if the competing AI techniques are equally efficient, the game may end in draws.
AI vs. AI
12. Examples of AI-Powered Businesses
BBC News: https://www.bbc.com/news/av/technology-52994075/the-ai-powered-app-for-buying-clothes-online:
13. APPLICABILITY ISSUES OF AI IN BUSINESS:
AI learns from data and performs well at finding correlations, recognition, and discovering patterns, but it lacks generalization and the ability to handle dynamic situations.
Biased results: outputs may be biased because of input data or tainted algorithmic decisions.
AI as a black box is possibly desirable in many well-defined applications (as trade secrets) but not in health care, criminal justice, etc.
Reproducibility: many factors, such as initialization, configuration, stochastic behavior, and parameter tuning, are required for efficiency and performance improvement.
Interpretability: black-box AI making false predictions or wrong decisions can have catastrophic effects, which calls for explainable AI.
Accountability: verifiable processes are needed that can hold businesses accountable for biased outcomes and accommodate application-specific definitions of fairness.
14. AI BIAS: MANIFOLDS AND FACETS
• Data bias: bias in data can exist in many shapes and forms, some of which can lead to unfairness in different downstream learning tasks.
• Historical, unbalanced data; the way data are collected/sampled, processed, and presented.
• Measurement and evaluation: the way we choose, aggregate, utilize, and measure a particular feature, and how the model is evaluated.
• Behavioral and observational bias.
• Algorithmic bias: when the bias is not present in the input data but is added by the algorithm:
• Framing the problem.
• Collecting and preparing the data.
• Interpretations of data and decision manipulation.
• We need verification methods that can help detect and mitigate hidden biases within training data, or that mitigate the biases learned by the model regardless of data quality, and that provide explanations for decisions.
15. AI BIAS: MANIFOLDS AND FACET
• Example: biased data and biased use taint the algorithms behind Web-based applications, delivering equally biased results.
• Biases are created through our interaction with websites; that content and usage recycles back to the Web or to Web-based systems, creating various types of second-order bias.
Ref: "Bias on the Web" by Ricardo Baeza-Yates, Communications of the ACM, Vol. 61, No. 6, June 2018.
16. REDUCING MARKETPLACE COMPETITION
Online search software agents
Software agents recommending a product or service without searching all possible options.
Collaborative multi-agents engaging in price fixing or establishing trade terms.
17. EXAMPLES OF BIAS:
• Amazon’s internal hiring tool that penalized
female candidates;
• Commercial face analysis and recognition
platforms that are much less accurate for
darker-skinned women than lighter-skinned
men;
• Recently, a Facebook ad recommendation
algorithm that likely perpetuates employment
and housing discrimination regardless of the
advertiser’s specified target audience.
19. POLICE TESTING AMAZON'S FACIAL-RECOGNITION TOOL: REKOGNITION; BUT WHAT IF IT GETS IT WRONG?
Defense attorneys, artificial-intelligence researchers
and civil rights experts argue that the technology
could lead to the wrongful arrest of innocent people
who bear only a resemblance to a video image.
Rekognition’s accuracy is also hotly disputed, and
some experts worry that a case of mistaken identity
by armed deputies could have dangerous implications,
threatening privacy and people’s lives.
Technology: Washington Post, April 30,
2019. https://www.washingtonpost.com/technology/2019/04/30/amazons-facial-recognition-technology-is-
supercharging-local-police/?noredirect=on
Another Story: Unproven facial-recognition companies target schools, promising an end to shootings
20. DUAL ROLE OF AI
AI-based image morphing apps
Figure 10: Modifying someone's face
Figure 11: The use of AI to clone someone's voice
21. DUAL ROLE OF AI
Offensive AI vs. Defensive AI
Offensive: AI-based tools are used for interruption of service, taking control, breaches, cyber attacks, malware, hacking, misinformation, etc.
Defensive: AI techniques are providing defense in depth, proactive vs. reactive patches, authentication and access control, testing, surveillance, etc.
Security of AI: Need Trustworthy AI to prevent Data Manipulation & Trojan AI
22. AI FOR GREATER GOOD
The European Union has crafted seven principles for guiding AI development.
Figure 12: Interrelationship of the seven requirements: all are of equal
importance, support each other, and should be implemented and evaluated
throughout the AI system’s lifecycle
23. APPLICABILITY ISSUES OF AI IN BUSINESS:
❑ AI learns from data and performs well at finding correlations, recognition, and discovering patterns, but it lacks generalization and the ability to handle dynamic situations.
❑ Biased results: outputs may be biased because of input data or tainted algorithmic decisions.
❑ AI as a black box is possibly desirable in many well-defined applications (as trade secrets) but not in health care, criminal justice, etc.
❑ Reproducibility: many factors, such as initialization, configuration, stochastic behavior, and parameter tuning, are required for efficiency and performance improvement.
❑ Interpretability: black-box AI making false predictions or wrong decisions can have catastrophic effects, which calls for explainable AI.
❑ Accountability: verifiable processes are needed that can hold businesses accountable for biased outcomes and accommodate application-specific definitions of fairness.
24. REGULATION OF AI BASED TECHNOLOGIES
• In April 2019, US lawmakers introduced a bill, the "Algorithmic Accountability Act", to protect against biased algorithms, deepfakes, and other harmful AI.
The Federal Trade Commission would create rules requiring entities that use, store, or share personal information to conduct automated decision system impact assessments and data protection impact assessments for evaluating "highly sensitive" automated systems.
Companies would have to assess whether the algorithms powering these tools are
biased or discriminatory, as well as whether they pose a privacy or security risk to
consumers.
• Mariya Gabriel, Europe's top official on the digital economy, said to companies using AI
"People need to be informed when they are in contact with an algorithm and not
another human being. Any decision made by an algorithm must be verifiable and
explained." European Union crafted seven principles for guiding AI development. 24
25. NOTES
• AI is a powerful tool for a wide variety of applications, in pattern recognition among others.
• AI challenges in practice: dealing with bias, the unknown, uncertainty, unpredictability, adversarial (AML) manipulation, etc.
• AI is a double-edged technology and can play multiple roles (defensive, offensive, ...); it can be judge, jury, and executioner in decision-making.
• AI should be explainable and be used with full disclosure, transparency, accountability, and auditability.
• The impacts of AI technologies on humans will be far-reaching, and industry leaders need to use ethical AI for the greater good.
Ref: AI vs. AI: Viewpoints. By Dipankar Dasgupta, Technical Report, 2019.
26. Section 2
What is an adversarial attack?
Variations of adversarial attacks.
27. Adversarial Attacks
A typical ML pipeline:
Data (train/test) → Data preprocessing → Model training and evaluation (training and validation accuracies) → Inferencing
Adversarial attacks can be mounted at each of these stages.
28. Adversarial Attacks - cont
Adversaries - Fooling agents
i) Noise
ii) Anomaly or perturbations
Source: Breaking things is easy - Nicolas Papernot and Ian Goodfellow, 2016
29. Few more examples
Adversarial Attacks - cont
Source: Developing Future Human-Centered Smart Cities… Ahmad et al.
Source: CAMOU: Learning Physical Vehicle Camouflages to Adversarially Attack Detectors in the Wild - Zhang et al., ICLR 2019
30. Few more examples
Adversarial Attacks - cont
Source: Robustifying Machine Perception for Image Recognition Systems:
Defense Against the Dark Arts
31. Adversarial Attacks - cont
Threat model
Data
Train - Test
Data
Preprocessing
Model Training
and Evaluations
Inferencing
Confidentiality, Integrity, Availability - the CIA model of security [1]
1. Breaking things is easy - CleverHans blog
32. Adversarial Attacks - cont
Fig: Adversarial attack taxonomy
Adversarial attacks:
- Model-based: model inversion, model extraction
- Evasion: white box, black box
- Poisoning
33. Poisoning Attacks - Adversarial Contamination
- Training Data is compromised.
3 ways in which poisoning can take place:
1. Dataset poisoning.
2. Algorithm poisoning.
3. Model poisoning.
Adversarial Attacks - cont
Source: Viso.Ai - Adversarial Machine learning
34. Model-based attacks:
1. Model Inversion Attack: Sensitive data extraction from outputs and ML model.
2. Model Extraction Attack: Model parameter extraction (model stealing).
Two notable papers on model inversion attacks:
1. “The Secret Sharer: Measuring Unintended Neural
Network Memorization & Extracting Secrets”
- Carlini et. al, 2018
2. “Model Inversion Attacks that Exploit Confidence
Information and Basic Countermeasures”
- Fredrikson et. al, 2015
Adversarial Attacks - cont
Source: Toronto - CS
35. Evasion Attacks
Categorization:
1. White box attack.
2. Black box attack.
3. Gray box attack.
Adversarial Attacks - cont
Source: Introduction to adversarial machine learning - Floyd Hub
37. Adversarial Attacks - cont
White box adversarial attack
Source: Adversarial Machine Learning - An Introduction by Binghui Wang
38. Adversarial Attacks - cont
White box adversarial attack - intuition
Given a function approximation F : X -> Y, where X is an input feature vector and Y is an output vector, the attacker knows which model is used and the training data.
The attacker constructs an adversarial sample X* = X + v by adding a perturbation (noise) vector v chosen as
v = argmin_v { ||v|| : F(X + v) = Y* },
i.e., the smallest perturbation that makes the model output the target Y*.
This is done in two steps:
1. Direct sensitivity mapping: evaluate the sensitivity of the model.
2. Perturbation selection: select a perturbation that affects the classification.
Source: Adversarial Machine Learning - An Introduction by Binghui Wang
39. Adversarial Attacks - cont
Black box adversarial attack - intuition
Common trend:
1. Train a local substitute model.
2. Generate adversarial samples against the substitute.
3. Apply the adversarial samples to the target model (exploiting transferability).
Source: Adversarial Machine Learning - An Introduction by Binghui Wang
40. Adversarial Attacks - cont
Targeted Attacks
● Mislead the classifier into predicting a specific target label.
Non-targeted Attacks
● Mislead the classifier into predicting any label other than the correct one.
41. Adversarial Attacks - end
Adversarial Origins:
● Intriguing properties of neural networks - Szegedy et. al (2013, ICLR) (Highly
recommended!)
● https://nicholas.carlini.com/writing/2019/all-adversarial-example-papers.html
42. Several adversarial attack algorithms:
1. L-BFGS
2. Fast gradient sign method (FGSM)
3. BIM and PGD
4. Momentum iterative attack
5. Distributionally adversarial attack
6. Carlini and Wagner attack
7. Jacobian-based saliency map approach
8. DeepFool
9. Elastic-net attack to DNNs
10. Universal adversarial attack
11. Adversarial patch
12. GAN - based attacks
13. Obfuscated-gradient circumvention attacks
14. List goes on..
Variations of adversarial attack
43. L-BFGS
● Limited-memory Broyden-Fletcher-Goldfarb-Shanno.
● Non-linear gradient-based numerical optimization algorithm.
● Finds the adversarial perturbations.
FGSM
● Uses the gradient of the neural network to create an adversarial example.
● Minimizes the maximum amount of perturbation added to any pixel (for images) needed to cause misclassification (see the sketch after the source below).
Variations of adversarial attack - cont
Source: Element AI
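To make the single-step idea concrete, here is a minimal FGSM sketch in PyTorch; the model, labels, and epsilon are placeholders, and inputs are assumed to be scaled to [0, 1], so this is an illustration rather than any author's implementation.

import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.03):
    # One-step FGSM: move each input element by epsilon in the sign of the loss gradient.
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    # keep the perturbed input inside the valid pixel range
    return x_adv.clamp(0.0, 1.0).detach()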
44. BIM (Basic Iterative Method)
● Improves FGSM by applying the gradient-sign step iteratively with a finer optimizer over multiple iterations.
● Uses a smaller step size and clips the updated adversarial sample into a valid range.
● Non-linear gradient-based numerical optimization algorithm.
PGD (Projected Gradient Descent)
● A generalized version of BIM (see the sketch below).
● White-box attack.
Source: Know your enemy-
Oscar Knagg,
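The following is a hedged PGD sketch, again in PyTorch with placeholder parameters: it repeats the FGSM step with a small step size alpha and projects the result back into the epsilon-ball around the original input after every iteration.

import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, epsilon=0.03, alpha=0.007, steps=10):
    # Iterative FGSM with projection back into the L-infinity ball around x.
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        # project onto the epsilon-ball around the original input, then clip to valid pixel range
        x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon).clamp(0.0, 1.0)
    return x_adv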
45. Momentum iterative attack
● Inspired by the momentum optimizer.
● Integrates a momentum term into the iterative process of BIM, yielding a new algorithm called MI-FGSM (Momentum Iterative FGSM).
Variations of adversarial attack - cont
Source: Boosting Adversarial
Attacks with momentum - Dong
et. al, CVF 2017
46. Carlini & Wagner (C&W) Attack
● Based on L-BFFS attack without constraints and different objective functions.
● Achieves 100% attack success rate on naturally trained DNNs for MNIST, CIFAR-10 and
ImageNet.
● Finds the adversarial instance by finding the smallest noise added to an image that will change
the classification to a different class.
Variations of adversarial attack - cont
Source: Learn the Carlini and
Wagner's adversarial attack -
MNIST - Yumi, 2019
47. JSMA (Jacobian-based Saliency Map Attack)
● Fools DNNs with small perturbations.
● Computes the Jacobian matrix of the logit outputs before the softmax layer.
● Uses feature selection to minimize the number of features modified while causing misclassification.
● Flat perturbations are added to features iteratively, in decreasing order of saliency value.
● More computationally expensive than FGSM.
Variations of adversarial attack - cont
48. DeepFool
● Fools DNNs with minimal perturbations, derived first for binary classification tasks. Equation to calculate the minimal perturbation:
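The equation on the original slide was an image; as a reconstruction from the cited paper, for an affine binary classifier f(x) = w^T x + b the minimal perturbation that moves a point x_0 onto the decision boundary has the closed form
r_*(x_0) = -( f(x_0) / ||w||_2^2 ) w,
i.e., the orthogonal projection of x_0 onto the hyperplane f(x) = 0. DeepFool iterates this linearized step to handle general non-linear (and multiclass) classifiers.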
Variations of adversarial attack - cont
Source: DeepFool: A Simple and Accurate Method to Fool Deep Neural Networks - Moosavi-Dezfooli et al.
50. A comparison of the differences in perturbations.
Variations of adversarial attack - cont
Source: Determining Sequence of Image Processing Technique (IPT) to Detect Adversarial Attacks - Gupta et al.
53. A classifier tasked with differentiating real data from generated data (fake data).
● Real data : 1
● Fake data: 0
During training a discriminator:
1. The discriminator classifies real and fake data.
2. The loss function penalizes the discriminator for misclassifying real data as fake data, and vice
versa.
3. Weights are updated through backpropagation.
Discriminator Model
54. The generator model is tasked with producing fake data that resembles the real data, adjusting to feedback from the discriminator model. The goal of the generator is to make the discriminator label generated examples as real data (i.e., to misclassify them).
The generator requires:
● A random input: random noise (introduced to existing or new data).
● Generator network: transforms the random input into a meaningful data instance.
● Discriminator network: classifies the generated output.
● Generator loss: penalizes the generator for failing to fool the discriminator.
Generator Model
55. During training a generator model:
1. Sample the random input and feed it to generator model.
2. The generated output is passed to discriminator for classification.
3. Compute loss from discriminator classification.
4. Backpropagate through discriminator and generator to adjust weights.
Generator Model
56. Handshaking between the Discriminator and the Generator
The discriminator and generator models are trained alternately, each for one or more epochs at a time. While training the discriminator, the generator's weights are kept constant; while training the generator, the discriminator's weights are kept constant. This is crucial for convergence (see the sketch below).
The point of convergence is usually when the discriminator maintains about 50% accuracy. Below this, the generator trains on garbage feedback, degrading the efficiency of the overall GAN.
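As an illustration of this alternating schedule, here is a minimal, hedged PyTorch sketch of one training step; G, D, their optimizers, the batch, and z_dim are placeholders, and D is assumed to output a probability of shape (batch, 1).

import torch
import torch.nn.functional as F

def gan_step(G, D, opt_G, opt_D, real_batch, z_dim=100):
    bsz = real_batch.size(0)
    ones, zeros = torch.ones(bsz, 1), torch.zeros(bsz, 1)
    # --- discriminator step: generator output is detached so G receives no gradient ---
    fake = G(torch.randn(bsz, z_dim)).detach()
    d_loss = F.binary_cross_entropy(D(real_batch), ones) + \
             F.binary_cross_entropy(D(fake), zeros)
    opt_D.zero_grad(); d_loss.backward(); opt_D.step()
    # --- generator step: only opt_G updates weights, so D is effectively held constant ---
    g_loss = F.binary_cross_entropy(D(G(torch.randn(bsz, z_dim))), ones)
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()
    return d_loss.item(), g_loss.item()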
57. The Losses in Discriminator and Generator
A GAN requires a loss function for each of the discriminator and generator models. To date, the minimax loss and the Wasserstein loss are the most commonly used.
The losses are computed from the distances between probability distributions; the generator model can only affect the distribution of the fake data.
58. Minimax Loss
L = E_x[log(D(x))] + E_z[log(1 - D(G(z)))]
E_x = expected value over all real data instances.
E_z = expected value over all random inputs fed to the generator.
D(x) = the discriminator's estimate of the probability that real instance x is real.
G(z) = the generator's output for random input (noise) z.
D(G(z)) = the discriminator's estimate of the probability that a fake instance is real.
The generator can only influence the log(1 - D(G(z))) term, and minimizing it directly causes the GAN to get stuck in the early stages of training. Therefore, a modified minimax loss has the generator maximize log(D(G(z))) instead (see the sketch below).
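A hedged sketch of these losses in PyTorch, with d_real and d_fake standing in for D(x) and D(G(z)); both the original (saturating) and the modified generator objective are shown for comparison.

import torch

def minimax_losses(d_real, d_fake, eps=1e-8):
    # d_real = D(x), d_fake = D(G(z)); both are probabilities in (0, 1)
    d_loss = -(torch.log(d_real + eps) + torch.log(1.0 - d_fake + eps)).mean()
    # original generator objective: minimize log(1 - D(G(z))) -- saturates early in training
    g_loss_original = torch.log(1.0 - d_fake + eps).mean()
    # modified objective: maximize log(D(G(z))), i.e. minimize -log(D(G(z)))
    g_loss_modified = -torch.log(d_fake + eps).mean()
    return d_loss, g_loss_original, g_loss_modified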
59. Wasserstein Loss
A loss function used by WGAN (Wasserstein GAN), where the discriminator is instead tasked with making its output larger for real examples than for fake examples. The discriminator in WGAN is termed the 'critic'.
● The critic loss is built from D(x) - D(G(z)): the critic tries to maximize the difference between its output on real and fake instances.
● The generator loss for WGAN is built from D(G(z)): the generator tries to maximize the critic's output for its fake instances.
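The corresponding loss computations, again as a hedged sketch with placeholder score tensors (a full WGAN also needs weight clipping or a gradient penalty on the critic, omitted here):

import torch

def wgan_losses(critic_real, critic_fake):
    # critic_real = D(x), critic_fake = D(G(z)); raw, unbounded critic scores
    critic_loss = -(critic_real.mean() - critic_fake.mean())  # critic maximizes D(x) - D(G(z))
    gen_loss = -critic_fake.mean()                            # generator maximizes D(G(z))
    return critic_loss, gen_loss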
60. Wasserstein GAN solves the
problem of vanishing gradients
that persist in traditional
GANs.
61. Applications of GANs
➔ Generate data samples (images, human faces, etc)
➔ Image-to-Image translation
➔ Text-to-image translation
➔ Face Frontal View Generation
➔ Photo Blending and Inpainting
…….
…….
…….
➔ Deep Fake
65. Different Adversarial Attack Points in a Deployed System
Effective operating system and communication channel security is a prerequisite.
Adversarial Attacks in Deployed System
67. Adversarial Attack Points..
First, when the input is in the physical world, an adversary can change the input itself, as in the famous adversarial example of converting a stop sign into a speed-limit sign. Note that this attack needs a high degree of physical modification; hence, if the noise amount is low, there is a high chance that environmental factors will nullify the attack, based on our literature review.
Different noise detection methods can be employed to defend against this.
68. Adversarial Attack Points..
Second, someone may alter one of the sensors so that it adds adversarial noise while converting physical input data into digital form. One prominent example is putting an adversarial patch sticker on a CCTV camera lens to manipulate the ML output (Li et al., 2019). This attack can also happen when an output action is being presented or carried out.
Preprocessing methods can protect against most standard AAs and can be employed while input sensors convert data to digital format.
69. Adversarial Attack Points..
The third point is where data are turned into a digital format (binary, floating point, etc.) and sent to the ML system, or where output is received from it; this is essentially a communication channel. If an adversary can control this channel, the intruder can modify the data going to or coming from the ML system. Low-perturbation attacks can occur here, as no environmental factors will diminish the noise. However, if someone can control this network, the intruder might not need to add adversarial noise at all: the intruder can directly inject the input class that yields the desired output, and if the intruder can modify the output result, there is no need for adversarial perturbation.
Computer network-layer security, data-layer security, etc., are the concerns here.
70. Adversarial Attack Points..
The attack can also happen if the system (for example, the server) is compromised. This scenario is similar to the third point; it is more of an intrusion into the system. Low-perturbation attacks can happen here, but if the attacker has already compromised the server, the attacker does not need to compromise the input; it is easier to change the output label directly. One can argue that when the attacker's ability is limited, the attacker will prefer AAs.
This falls under operating-system security or cloud security, depending on the ML implementation.
71. Adversarial Attack Points..
Trojans or backdoors can be added to the ML model before deployment, activated only for specific inputs.
These are hard to address and may need extensive testing on a regular basis.
72. Adversarial Attack Points..
An additional attack point, not shown in the figure, is the adaptive attack. An attacker can run different test inputs to determine the decision boundary of the ML model. Also, since AAs are transferable, an attack generated on another system will work here.
This can be handled with a dynamic ML model configuration (with regular updates) or by using an active/reinforcement learning approach. It also helps to employ a defense system that identifies attack query patterns, which relates to DDoS detection.
74. Adversarial Attacks in Deployed System
• Adversarial defenses are interconnected with other cybersecurity and network-infrastructure domains.
• There has been a lack of research on adversarial defenses in deployed environments; they need higher priority before AI-based products are released to market.
• Adaptive attacks and TrojAI attacks are the most challenging and will require end-to-end protection to defend against.
• Defense against AAs needs a collaborative approach across network, data, firmware, hardware, OS, and cloud security.
Adversarial attack (AA) formulations are effective for studying and understanding how deep learning methods work, and they have enormous potential as a way to provide explainability in AI.
As a cybersecurity threat, AAs are not yet a considerable threat that would require significant cyber resources beyond standard cybersecurity practices.
But it cannot be ignored that, as research progresses, they will become a substantial threat that could leave AI-based systems vulnerable.
76. Defences Against Adversarial Attacks
Real-world ML challenges:
● Defense against diverse attack types.
● Defense for machine learning models.
● Less computational time.
● Fewer modifications.
● Controlled updates.
● Defense against model changes.
● Cross-domain defense capabilities.
● Defenses should not require any information about the learning models.
● Defenses should not require the complete training dataset.
77. Defences Against Adversarial Attacks - cont
● Adversarial training
○ Reduces classification errors
● Gradient masking
○ Denies the attacker access to the useful gradient
● Defence distillation
○ Adversarial training technique where the target model is used to train a
smaller model that exhibits a smoother output surface.
● Ensemble adversarial learning
○ Multiple classifiers are trained together and combined to improve robustness
● Feature squeezing
○ Reduces the search space available to an adversary
78. Adversarial Training
● Includes adversarial images in the training stage.
● Test examples that are relatively far away from the training data manifold are more likely to be vulnerable to adversarial attacks.
● Susceptible to the "blind-spot attack".
● If trained on only adversarial examples, accuracy drops significantly. (Why?)
Defences Against Adversarial Attacks - cont
79. Defences Against Adversarial Attacks - cont
Dataset image Source: Park et. al (2020) - On the Effectiveness of Adversarial Training in Defending against Adversarial Example Attacks for Image Classification
Figure: adversarial training pipeline (input data → deep learning training → predictive model → correct label)
80. Strategies
● Train from scratch with a mixed dataset of original and adversarial images (a sketch of this strategy follows below).
● Train the model on original examples and fine-tune it on adversarial examples afterwards.
Defences Against Adversarial Attacks - cont
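A hedged sketch of the first strategy in PyTorch: each batch is augmented with FGSM-perturbed copies before the optimizer step. The model, optimizer, data loader, and epsilon are placeholders, and inputs are assumed to lie in [0, 1].

import torch
import torch.nn.functional as F

def adversarial_training_epoch(model, optimizer, loader, epsilon=0.03):
    model.train()
    for x, y in loader:
        # craft an adversarial copy of the batch with one-step FGSM
        x_req = x.clone().detach().requires_grad_(True)
        grad, = torch.autograd.grad(F.cross_entropy(model(x_req), y), x_req)
        x_adv = (x + epsilon * grad.sign()).clamp(0.0, 1.0)
        # train on the mixed batch of original and adversarial examples
        loss = F.cross_entropy(model(torch.cat([x, x_adv])), torch.cat([y, y]))
        optimizer.zero_grad(); loss.backward(); optimizer.step()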
81. Random Resizing and Padding
● Randomly resizing the input image and applying random padding on all four sides has been shown to improve robustness (Xie et al., 2018); a sketch follows the source below.
Defences Against Adversarial Attacks - cont
Source: Xie et. al (2018) - Mitigating Adversarial Effects Through Randomization
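An illustrative version of this randomization layer, assuming square image batches in NCHW format; the output size and interpolation mode below are placeholders rather than the paper's exact settings.

import random
import torch.nn.functional as F

def random_resize_pad(x, out_size=331):
    # x: image batch of shape (N, C, H, W) with square H == W smaller than out_size
    new_size = random.randint(x.shape[-1], out_size - 1)        # random target resolution
    x = F.interpolate(x, size=new_size, mode="nearest")         # random resizing
    pad_total = out_size - new_size
    left, top = random.randint(0, pad_total), random.randint(0, pad_total)
    # F.pad order for 4-D input: (left, right, top, bottom); zero padding
    return F.pad(x, (left, pad_total - left, top, pad_total - top), value=0.0)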
82. Gradient Masking
● Most adversarial example construction techniques use the gradient of the model to
make an attack.
● But what if there were no gradient?
Defences Against Adversarial Attacks - cont
83. Defense Distillation
● Adds flexibility to an algorithm’s classification process
● Makes the model less susceptible to exploitation.
● One model is trained to predict the output probabilities of another model that was
trained on an earlier, baseline standard
Defences Against Adversarial Attacks - cont
84. ● Pros
○ Adaptable to unknown threats
○ More dynamic
○ Requires less human intervention
● Cons
○ 2nd model is bound by the general rules of the first model
○ Can be reverse-engineered to discover fundamental exploits
○ Vulnerable to so-called poisoning attacks
Defences Against Adversarial Attacks - cont
85. Ensemble adversarial learning
● Ensemble methods are a machine learning technique that combines several base models to produce one optimal predictive model.
● Ensemble adversarial training is a technique that augments training data with perturbations transferred from other models.
● Multiple classifiers are trained together and combined to improve robustness.
Defences Against Adversarial Attacks - cont
86. Feature Squeezing
Feature squeezing reduces the search space available to an adversary by coalescing samples
that correspond to many different feature vectors in the original space into a single sample.
Defences Against Adversarial Attacks - end
Source: Xu et. al - Detecting Adversarial Examples in Deep Neural Networks
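A minimal sketch of one squeezer (color bit-depth reduction) and of how a squeezing-based detector is typically used; the bit width and disagreement threshold are placeholders, not values from the cited paper.

import torch

def squeeze_bit_depth(x, bits=4):
    # reduce color bit depth: many nearby inputs collapse to the same squeezed sample
    # (inputs assumed scaled to [0, 1]; the bit width is a placeholder)
    levels = 2 ** bits - 1
    return torch.round(x * levels) / levels

# A squeezing-based detector compares the model's prediction on the original and the
# squeezed input; a large disagreement (e.g. L1 distance between the two softmax
# outputs above some threshold) flags a likely adversarial example.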
87. Detect Adversarial Examples
● If an adversarial input is detected, the defensive mechanism makes the classifier refuse to predict a class for it.
● Methods:
○ Kernel density detector
■ Feinman et al. (2017) - Detecting Adversarial Samples from Artifacts
○ Local Intrinsic Dimensionality
■ Ma et al. (2018) - Characterizing Adversarial Subspaces Using Local Intrinsic Dimensionality
○ Adversary detection networks
■ Metzen et al. (2017) - On Detecting Adversarial Perturbations
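In the spirit of the kernel-density detector (not the authors' exact code), a hedged scikit-learn sketch: fit a Gaussian KDE on hidden-layer features of clean training data and flag test inputs whose log-density falls below a threshold; the features, bandwidth, and threshold are placeholders.

from sklearn.neighbors import KernelDensity

def fit_density(clean_features, bandwidth=1.0):
    # fit a Gaussian kernel density estimate on features of clean (benign) inputs
    return KernelDensity(kernel="gaussian", bandwidth=bandwidth).fit(clean_features)

def looks_adversarial(kde, test_features, threshold):
    # inputs with unusually low density under the clean-data model are flagged
    return kde.score_samples(test_features) < threshold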
89. Low-Noise AAs Are Not Effective in the Physical World
Table (not reproduced): percentage of adversarial samples becoming ineffective due to environmental factors, by attack type and minimum adversarial noise (print/screen).
103. Experimental Results
Detection accuracy for different attack types on different classes of the CIFAR and MNIST datasets.
Detection accuracy for binary classification of clean vs. adversarial input (all attacks) on the MNIST dataset.
104. Negative Selection Algorithm for Outlier Detection
• Define Self as a normal pattern of activity or stable behavior of a system/process: a collection of logically split, equal-size segments of the pattern sequence, represented as a multiset S of strings of length l over a finite alphabet.
• Generate a set R of detectors, each of which fails to match any string in S.
• Monitor new observations (of S) for changes by continually testing the detectors against representatives of S. If any detector ever matches, a change (or deviation) must have occurred in the system behavior.
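To make the generate-then-monitor idea concrete, here is a toy sketch over binary strings using an r-contiguous-bits matching rule; the alphabet, matching rule, and parameters are illustrative choices, not the algorithm exactly as used in the cited work.

import random

def matches(detector, s, r):
    # r-contiguous-bits rule: detector matches s if they agree on r consecutive positions
    return any(detector[i:i + r] == s[i:i + r] for i in range(len(s) - r + 1))

def generate_detectors(self_set, n_detectors, length, r):
    detectors = []
    while len(detectors) < n_detectors:
        candidate = "".join(random.choice("01") for _ in range(length))
        # negative selection: keep a candidate only if it matches no string in Self
        if not any(matches(candidate, s, r) for s in self_set):
            detectors.append(candidate)
    return detectors

def deviation_detected(sample, detectors, r):
    # any detector match on a new observation signals a change from normal behavior
    return any(matches(d, sample, r) for d in detectors)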
105. V-detector Negative Selection Algorithm
Main idea of V-detector: by allowing the detectors to have some variable properties, V-detector enhances the negative selection algorithm in several aspects. It takes fewer large detectors to cover the non-self region, saving time and space; small detectors cover holes better; coverage is estimated while the detector set is generated; and the shapes of the detectors, or even the types of matching rules, can also be made variable.
(Reference: JI and Dasgupta 2005)
110. Outlier Detection models
Type Abbr Algorithm
Linear Model
MCD Minimum Covariance Determinant (uses the Mahalanobis distances as the outlier scores)
OCSVM One-Class Support Vector Machines
LMDD Deviation-based Outlier Detection (LMDD)
Proximity-Based
LOF Local Outlier Factor
COF Connectivity-Based Outlier Factor
CBLOF Clustering-Based Local Outlier Factor
LOCI LOCI: Fast outlier detection using the local correlation integral
HBOS Histogram-based Outlier Score
SOD Subspace Outlier Detection
ROD Rotation-based Outlier Detection
Probabilistic
ABOD Angle-Based Outlier Detection
COPOD COPOD: Copula-Based Outlier Detection
FastABOD Fast Angle-Based Outlier Detection using approximation
MAD Median Absolute Deviation (MAD)
SOS Stochastic Outlier Selection
Outlier
Ensembles
IForest Isolation Forest
FB Feature Bagging
LSCP LSCP: Locally Selective Combination of Parallel Outlier Ensembles
XGBOD Extreme Boosting Based Outlier Detection (Supervised)
LODA Lightweight On-line Detector of Anomalies
Neural Networks
AutoEncoder Fully connected AutoEncoder (use reconstruction error as the outlier score)
VAE Variational AutoEncoder (use reconstruction error as the outlier score)
Beta-VAE Variational AutoEncoder (all customized loss term by varying gamma and capacity)
SO_GAAL Single-Objective Generative Adversarial Active Learning
MO_GAAL Multiple-Objective Generative Adversarial Active Learning
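Any of these one-class/outlier detectors can serve as the adversarial-input filter. As a hedged illustration (using scikit-learn's Isolation Forest rather than any specific model from the table), one fits the detector on features of clean inputs only and treats flagged outliers as suspicious; the feature dimensions and contamination rate below are placeholders.

import numpy as np
from sklearn.ensemble import IsolationForest

X_clean = np.random.rand(1000, 32)           # stand-in for features of clean inputs
X_test = np.random.rand(10, 32)              # stand-in for features of incoming inputs

detector = IsolationForest(n_estimators=100, contamination=0.05, random_state=0)
detector.fit(X_clean)
labels = detector.predict(X_test)            # +1 = inlier (clean), -1 = outlier (suspicious)
scores = detector.score_samples(X_test)      # higher score = more normal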
111. Comparison with different outlier methods
Comparison of results from different outlier detection models, evaluating V-detector NSA performance against other OCC methods.
112. Notes
● AI challenges in practice: dealing with bias, the unknown, uncertainty, unpredictability, adversarial (AML) manipulation, etc.
● AI should be explainable and be used with full disclosure, transparency, accountability, and auditability.
● The impacts of AI technologies on humans will be far-reaching, and industry leaders need to use ethical AI for the greater good.
● Diverse computational methods are explored to create and defend against adversarial attacks.
● Adversarial inputs have detectable traits that can be used to detect common adversarial attacks.
● Evasion-based adversarial attacks are hard to produce and easy to counter in a deployed ML system.
● Input preprocessing-based filtering techniques are more practical as defenses against adversarial attacks.
● Outlier-based defense techniques show promise as a strong defense.
● Adaptive attacks and TrojAI attacks are the most challenging and will require end-to-end protection to defend against.
● Defense against AAs needs a collaborative approach across network, data, firmware, hardware, OS, and cloud security.
113. List of Papers (used in this tutorial)
● Smooth Adversarial Training - Xie et. al (2020, ArXiv)
● Feature Denoising for Improving Adversarial Robustness - Xie et. al (2019, CVPR)
● Learnable Boundary Guided Adversarial Training - Cui et. al (2021, ICCV)
● An Orthogonal Classifier for Improving the Adversarial Robustness of Neural Networks - Xu et.
al (2021, ArXiv)
● An integrated Auto Encoder-Block Switching defense approach to prevent adversarial attacks -
Upadhyay et. al (2021, gjstx-e)
114. References:
1. Machine learning in cyber security: Survey, D Dasgupta, Z Akhtar and Sajib Sen
2. https://medium.com/onfido-tech/adversarial-attacks-and-defences-for-convolutional-neural-networks-66915ece52e7
3. “Poisoning Attacks against Support Vector Machines”, Biggio et al. 2013.[https://arxiv.org/abs/1206.6389]
4. “Intriguing properties of neural networks”, Szegedy et al. 2014. [https://arxiv.org/abs/1312.6199]
5. “Explaining and Harnessing Adversarial Examples”, Goodfellow et al. 2014. [https://arxiv.org/abs/1412.6572]
6. “Towards Evaluating the Robustness of Neural Networks”, Carlini and Wagner 2017b. [https://arxiv.org/abs/1608.04644]
7. “Practical Black-Box Attacks against Machine Learning”, Papernot et al. 2017. [https://arxiv.org/abs/1602.02697]
8. “Attacking Machine Learning with Adversarial Examples”, Goodfellow, 2017. [https://openai.com/blog/adversarial-example-research/]
9. https://medium.com/@ODSC/adversarial-attacks-on-deep-neural-networks-ca847ab1063
10. Tianyu Gu, Brendan Dolan-Gavitt, and Siddharth Garg. Badnets: Identifying vulnerabilities in the machine learning model supply chain. arXiv
preprint arXiv:1708.06733, 2017.
11. A Brief Survey of Adversarial Machine Learning and Defense Strategies. Z Akhtar, D Dasgupta. Technical Report, The University of Memphis.
12. Determining Sequence of Image Processing Technique (IPT) to Detect Adversarial Attacks. KD Gupta, D Dasgupta, Z Akhtar. arXiv preprint arXiv:2007.00337.
13. Aaditya Prakash, Nick Moran, Solomon Garber, Antonella DiLillo, and James Storer. Deflecting adversarial attacks with pixel deflection. In
Proceedings of the IEEE conference on computer vision and pattern recognition, pages 8571–8580, 2018.
14. Nicholas Carlini. Lessons learned from evaluating the robustness of defenses to adversarial examples. 2019.
15. Nicholas Carlini, Anish Athalye, Nicolas Papernot, Wieland Brendel, Jonas Rauber, Dimitris Tsipras, Ian Goodfellow, Aleksander Madry, and Alexey
Kurakin. On evaluating adversarial robustness. arXiv preprint arXiv:1902.06705, 2019.
16. Nicholas Carlini and David Wagner. Defensive distillation is not robust to adversarial examples. arXiv preprint arXiv:1607.04311, 2016.
17. Nicholas Carlini and David Wagner. Adversarial examples are not easily detected: Bypassing ten detection methods. In Proceedings of the 10th
ACM Workshop on Artificial Intelligence and Security, pages 3–14, 2017.
18. Nicholas Carlini and David Wagner. MagNet and "Efficient defenses against adversarial attacks" are not robust to adversarial examples. arXiv preprint arXiv:1711.08478, 2017.