Endgame data scientists present the first known deep learning architecture to pseudo-randomly generate domain names. They demonstrate that adversarially crafted domain names targeting a DL model are also adversarial for an independent external classifier.
3. Motivations
§ Can we Red team vs. Blue team a
known infosec problem (DGAs)
leveraging Generative Adversarial
Networks (GANs)?
§ Offensive: Leverage GANs to
construct a deep-learning DGA
designed to bypass an
independent classifier
§ Defensive: Can adversarially
generated domains
augment/improve training data to
harden an independent classifier?
www.endgame.com
www.endgame-2016.com
www.emdgane.com
4. Related Work
§ Recent work in adversarial examples
• Explaining and harnessing
adversarial examples
(Goodfellow, 2015)
• Adversarial perturbations
against DNN for Malware
Classification (Papernot, 2016)
§ Key differences between other
domains and INFOSEC:
• Other domains – Make my model
robust to occasional blind spot
examples that it might come
across in the wild
• Information Security – Discover
and plug holes in my model that
the adversary is actively trying to
discover and exploit (Red vs. Blue)
Fast gradient sign method (Goodfellow):
“What is the cost of changing X’s
label to a different y?”
5. Background → Domain Generation Algorithm (DGA)
§ Employed by malware families to
bypass common C2 defenses
§ DGAs take a seed input and generate
large numbers of pseudo-random
domain names
§ A subset of the generated domains is
registered as command and control
(C2) servers
§ Botnets and malware iterate through
the generated domains until they find
one that is registered, then connect
and establish a C2 channel
§ Asymmetric attack, since the defender
must know all possible domains in
order to blacklist them
5
[Figure: an infected host issues DNS queries for generated domains; wedcf.com and asdfg.com return NXDomain, while bjgkre.com resolves to 212.211.123.01, the C2 server]
6. Domain Generation Algorithm → Cryptolocker Example
btbpurnkbqidxxclfdfrdqjasjphyrtn.org
sehccrlyfadifehntnomqgpfyunqqfft.org
konsbolyfadifehntnomqgpfyunqqfft.org
cytfiobnkjxomkmhimxhcfvtogyaiqaa.org
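The seed-to-domain-list idea behind examples like these can be sketched in a few lines of Python. `simple_dga` is a hypothetical stand-in, not Cryptolocker's actual algorithm (real families use their own arithmetic, often mixing a date into the seed so bot and attacker derive the same daily list):

```python
import random

def simple_dga(seed, count, length=12, tld=".org"):
    """Generic DGA sketch (not any real family's algorithm): expand a
    shared seed into a reproducible list of pseudo-random domains.
    Attacker and bot run the same code, so both derive the same list."""
    rng = random.Random(seed)
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    return ["".join(rng.choice(alphabet) for _ in range(length)) + tld
            for _ in range(count)]

# The defender must blacklist every possible output;
# the attacker need only register one of them.
domains = simple_dga(seed=20160804, count=3)
```

Determinism is the point: given the seed, both sides regenerate an identical list, which is what makes exhaustive blacklisting so lopsided for the defender.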
7. Domain Generation Algorithm → Character Distributions
§ DGA char distributions + ML == robust
defense?
§ Cryptolocker and Ramnit are both
nearly uniform over the same range
• expected, since both are
calculated from a single seed
§ Suppobox concatenates random
words from an English dictionary, so it
reflects the distribution of the Alexa 1M
§ Much more difficult for prior DGA
detection models to correctly
classify
§ Our goal is to build a character-based
generator that mimics the Alexa
domain name distribution
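The comparison above boils down to character-frequency histograms. A minimal sketch of that computation (the toy domain lists stand in for the real Cryptolocker/Suppobox/Alexa corpora):

```python
from collections import Counter

def char_distribution(domains):
    """Normalized character-frequency distribution over the
    registered-name portion of a set of domains."""
    counts = Counter(ch for d in domains for ch in d.split(".")[0])
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}

# Nearly-uniform DGA output vs. English-like names: the former yields a
# flat histogram, the latter skews toward common English letters.
dga_like = ["btbpurnkbqid.org", "sehccrlyfadi.org"]
english_like = ["clearspending.com", "templateism.com"]
```

Plotting `char_distribution(dga_like)` against `char_distribution(english_like)` reproduces the contrast described on this slide.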
8. AUTOENCODERS
§ Data compression algorithm
§ Models consist of an encoder, a
decoder, and a loss function
• encoder — transforms the input into a
low-dimensional embedding (lossy
compression)
• decoder — reconstructs the original
input from the encoder's embedding
(decompression)
§ Goal: minimize distortion
between reconstructed output
and original input
§ Easy to train; doesn't need labels
(unsupervised)
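The encode/decode/minimize-distortion loop can be illustrated with a toy linear autoencoder in NumPy. This is an assumed illustration on generic vectors, not the paper's character-level Keras model; note the input serves as its own training target, which is why no labels are needed:

```python
import numpy as np

def train_linear_autoencoder(X, dim=3, lr=0.05, steps=500, seed=0):
    """Toy linear autoencoder: learn encoder/decoder matrices that
    minimize reconstruction MSE (the 'distortion' above)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = rng.normal(scale=0.1, size=(d, dim))   # d -> dim (lossy)
    W_dec = rng.normal(scale=0.1, size=(dim, d))   # dim -> d
    for _ in range(steps):
        Z = X @ W_enc              # encode: low-dimensional embedding
        X_hat = Z @ W_dec          # decode: reconstruction
        err = X_hat - X            # distortion to minimize
        g_dec = Z.T @ err / n      # MSE gradient w.r.t. decoder weights
        g_enc = X.T @ (err @ W_dec.T) / n  # ... and encoder weights
        W_dec -= lr * g_dec
        W_enc -= lr * g_enc
    return W_enc, W_dec
```

With `dim < d` the compression is lossy by construction; training drives the reconstruction toward the best low-dimensional approximation of the data.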
GENERATIVE ADVERSARIAL NETWORKS
§ Adversarial game between two
models
• generator — seeks to create
synthetic data resembling samples
from the true data distribution (w/
added noise)
• discriminator — receives a sample
and must determine whether it is
synthetic (from the generator) or a
true data sample
§ Goal: find an equilibrium (akin to a
Nash equilibrium) by pitting the models
against one another
§ Harder to train; unsupervised
• lots of failure modes
9. Background → Frameworks
10. DeepDGA → Autoencoder → Encoder
§ Encoder architecture taken from [Kim et
al., 2015], found useful in character-level
language modeling
§ Embedding learns a linear mapping for
each valid domain character (20-
dimensional space)
§ Convolutional filters applied to capture
character combinations (bi/trigrams)
§ Max-pooling over time & over filters
• gathers a fixed-size representation
§ Highway Network → LSTM
Learn the right representation of Alexa domains
11. DeepDGA → Autoencoder → Decoder
§ Decoder is roughly the reverse of the
encoder, minus the max-pooling step
§ Domain embedding is repeated over the
maximum domain length (time-steps)
§ Sequence is passed to LSTM →
Highway Network → Convolutional
Filters
§ Softmax activation on the final layer
produces a multinomial distribution
over domain characters
§ Sampled to generate a new domain
name modeled after the input domain
name
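The final softmax-and-sample step can be sketched with plain Python. The logit rows here are made-up stand-ins for the decoder's per-time-step output scores:

```python
import math
import random

def sample_domain(logit_rows, alphabet, seed=0):
    """Turn one row of scores per time-step into a multinomial
    distribution over the alphabet (softmax), then sample one
    character per time-step to build a domain name."""
    rng = random.Random(seed)
    chars = []
    for row in logit_rows:
        m = max(row)
        exps = [math.exp(v - m) for v in row]   # numerically stable softmax
        total = sum(exps)
        weights = [e / total for e in exps]
        chars.append(rng.choices(alphabet, weights=weights)[0])
    return "".join(chars)
```

Because each character is sampled rather than argmax-decoded, repeated calls with different seeds yield different domains modeled on the same input.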
12. DeepDGA → GAN
§ Simply rewire the autoencoder
framework as the base of our GAN
• accepts a random seed as input
• outputs domains much like
valid domain names
§ Box Layer — restricts output to lie
in an axis-aligned box defined by the
embedding vectors of the training data
• parameterizes the manifold coords
of legit domains
• box layer used in the generator to
ensure it only learns domains on
the legit (Alexa-like) domain
manifold
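The box constraint is simple to sketch as a clipping operation. This is an assumed reading of the slide (clip each coordinate to the range observed over legitimate embeddings), not the paper's exact layer:

```python
import numpy as np

def box_layer(z, legit_embeddings):
    """Clip each coordinate of a generated embedding z to the
    axis-aligned range spanned by embeddings of legitimate
    (Alexa) training domains."""
    lo = legit_embeddings.min(axis=0)   # per-dimension lower bound
    hi = legit_embeddings.max(axis=0)   # per-dimension upper bound
    return np.clip(z, lo, hi)
```

Anything the generator proposes outside the box is snapped back onto its boundary, keeping generated embeddings near the region where real domains live.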
13. DeepDGA → History Regularization
§ Regularize the discriminator model
by training it not only on recently
generated samples, but also on
domains sampled from prior
adversarial rounds
§ Helps the discriminator “remember”
any deficiencies in model coverage
AND forces the discriminator to learn
novel domain embeddings
§ Reduces the likelihood of the generator
collapsing (i.e. generating the same
domain every batch)
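The replay mechanism above can be sketched as a small pool. `HistoryPool` and its parameters are illustrative, not the paper's implementation:

```python
import random

class HistoryPool:
    """History-regularization sketch: each round, the discriminator
    trains on fresh generator output mixed with samples retained
    from earlier adversarial rounds."""
    def __init__(self, keep=1000, seed=0):
        self.pool = []
        self.keep = keep
        self.rng = random.Random(seed)

    def batch(self, fresh, n_old=32):
        # draw some older adversarial samples, then remember this round's
        old = self.rng.sample(self.pool, min(n_old, len(self.pool)))
        self.pool.extend(fresh)
        self.pool = self.pool[-self.keep:]   # bound memory use
        return list(fresh) + old
```

Because old samples keep reappearing, the discriminator cannot "forget" a weakness the generator exploited in an earlier round, and the generator gains nothing by emitting the same domain every batch.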
14. DeepDGA → Walkthrough
[Figure: a random seed feeds the generator, whose impostor output (e.g. www.emdgane.com) is passed to the detector for scoring]
Move 1 (Red Team): train the generator to randomly create impostors that trick the detector
19. AUTOENCODED DOMAINS
<input domain> → <output domain>
clearspending → clearspending
synodos → synodos
3walq → 3walq
kayak → kayak
sportpitvl → sportpitvl
7resume → 7resume
templateism → templateism
spielefuerdich → spielefueddrch
firebaseapp → firepareapp
gilliananderson → gilliadandelson
tonwebmarketing → torwetmarketing
thetubestore → thebubestore
infusion → infunion
akorpasaji → akorpajaji
hargonis → harnonis
GAN-GENERATED DOMAINS
firiaps.com
qiurdeees.com
gyldles.com
lirneret.com
vietips.com
mivognit.com
shtrunoa.com
gilrr.com
yhujq.com
sirgivrv.com
tisehl.com
thellehm.com
sztneetkop.com
chdareet.com
statpottxy.com
laner.com
spienienitne.com
DeepDGA → Generated Domains
21. Experiment Setup
§ Datasets
• Alexa Top 1M
• DGA family datasets
• all open source
§ Training Time
• DeepDGA (autoencoder & GAN)
implemented in Keras (Python DL library)
• autoencoder pretrained for 300 epochs
· each epoch: 256K domains randomly
sampled
· batch size of 128
· 14 hours on an NVIDIA Titan X GPU
• each adversarial round generated 12.8K
samples against the detector
· ~7 min on GPU per round
22. Experiment Setup → Offensive
§ Red Team: DeepDGA vs. External Classifier
§ Random Forest model (sklearn – Python)
• ensemble classifier, more resistant to
adversarial attacks due to low
variance
§ Handcrafted feature extraction
• domain length
• entropy of character distribution
• vowel-to-consonant ratio
• n-grams
§ Model trained on Alexa Top 10K vs.
DeepDGA
• results averaged over 10-fold CV
Trained explicitly to catch DeepDGA and only DeepDGA
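Three of the handcrafted features named above (length, entropy, vowel-to-consonant ratio) can be sketched in plain Python; the exact formulas used by the external classifier are not given on the slide, so this is an assumed reading:

```python
import math
from collections import Counter

VOWELS = set("aeiou")

def domain_features(domain):
    """Handcrafted features computed on the registered-name portion
    of the domain: length, character entropy, vowel/consonant ratio."""
    name = domain.split(".")[0]
    n = len(name)
    counts = Counter(name)
    # Shannon entropy of the character distribution, in bits
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    vowels = sum(1 for ch in name if ch in VOWELS)
    consonants = sum(1 for ch in name if ch.isalpha() and ch not in VOWELS)
    return {
        "length": n,
        "entropy": entropy,
        "vowel_consonant_ratio": vowels / max(consonants, 1),
    }
```

Near-uniform DGA output scores high on entropy and low on vowel ratio relative to Alexa-like names, which is exactly the signal DeepDGA learns to erode in later rounds.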
23. DeepDGA vs. External Classifier
[Chart: external classifier accuracy (%), 0–100% scale, trained to catch 11 DGA families equally represented in the training set]
24. DeepDGA → Character Distributions
§ Earlier we compared DGA family and
Alexa 1M character distributions.
• Anomalous distributions were easy to
identify
§ DeepDGA character distributions before
the adversarial rounds also appear
anomalous.
§ But… after the adversarial rounds they
begin to resemble the Alexa 1M (still
not perfect)
§ The character distribution would confound
previously important features
• entropy
• vowel-to-consonant ratio
• n-grams
25. Experiment Setup → Defensive
§ The core of this research was to
determine whether adversarial examples
could harden an independent
classifier.
§ Augmented the training dataset w/
adversarial domains generated by the
GAN.
§ In theory, the model can be hardened
against families previously unobserved
(in the training set)
§ Employed a leave-one-out (LOO)
strategy in which an entire DGA family
was held out for validation
• Baseline – model trained on the other
9 families + Alexa Top 10K
• Hardened – repeated the process with
DeepDGA samples added as malicious
Binary classification before/after adversarial hardening (TPR @ a fixed 1% FPR)
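The LOO evaluation above can be sketched as a split generator; the family names and dict shape are illustrative, not the paper's data layout:

```python
def leave_one_family_out(family_samples):
    """LOO evaluation sketch: hold each DGA family out in turn and
    train on the rest (baseline), or on the rest plus adversarial
    DeepDGA samples (hardened)."""
    for held_out in family_samples:
        train = {f: s for f, s in family_samples.items() if f != held_out}
        yield held_out, train
```

Comparing baseline vs. hardened TPR on each held-out family then measures whether the GAN-generated domains generalized to families the detector never saw in training.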
26. Summary
§ Contributions
• Present the first known deep learning architecture to pseudo-randomly
generate domain names
• Demonstrate that adversarially crafted domain names targeting a DL model
are also adversarial for an independent external classifier
• At least experimentally, those same adversarial samples can be used to
augment a training set and harden an independent classifier
§ Hard problems
• GANs are hard! → Adversarial game construction
• Carefully watch the FP rate
· a dataset overloaded w/ augmented DGAs can increase the FP rate
· the model tries to learn that these “realistic” domain names are possibly
malicious
§ Future Work
• Network – improving domain name generation (DGA) and detection
• Strengthen malware classification models
· malicious WinAPI sequences
· adversarially-tuned static feature vectors