November 4, 2016
DEEPDGA: ADVERSARIALLY-TUNED
DOMAIN GENERATION AND DETECTION
Bobby Filar -> @filar
Hyrum Anderson
Jonathan Woodbridge
AISec2016
Outline
§ Motivation
§ Background
§ DeepDGA Architecture
§ Experiment(s) Setup
§ Results
§ Future Work
2
Motivations
§ Can we red team vs. blue team a known infosec problem (DGAs) leveraging Generative Adversarial Networks (GANs)?
§ Offensive: Leverage GANs to construct a deep-learning DGA designed to bypass an independent classifier.
§ Defensive: Can adversarially generated domains augment/improve training data to harden an independent classifier?
3
www.endgame.com
www.endgame-2016.com
www.emdgane.com
Related Work
§ Recent work in adversarial examples
• Explaining and harnessing
adversarial examples
(Goodfellow, 2015)
• Adversarial perturbations
against DNN for Malware
Classification (Papernot, 2016)
§ Key differences between other
domains and INFOSEC:
• Other domains – Make my model
robust to occasional blind spot
examples that it might come
across in the wild
• Information Security – Discover
and plug holes in my model that
the adversary is actively trying to
discover and exploit (Red vs. Blue)
4
Fast gradient sign method (Goodfellow)
“What is the cost of changing X’s
label to a different y?”
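For reference, the fast gradient sign method (a standard formulation from Goodfellow et al., 2015, not specific to these slides) perturbs an input x in the direction that most increases the loss for its current label y:

    x_adv = x + ε · sign(∇_x J(θ, x, y))

where J is the training loss, θ the model parameters, and ε a small step size.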
Background → Domain Generation Algorithms (DGAs)
§ Employed by malware families to bypass common C2 defenses
§ DGAs take a seed input and generate large numbers of pseudo-random domain names
§ A subset of the domains is registered as command and control (C2) servers
§ Botnets and malware iterate through the generated domains until they find one that is registered, then connect and establish a C2 channel
§ Asymmetric attack, since the defender must know all possible domains in order to blacklist them
5
[Diagram: infected host queries generated domains via DNS; wedcf.com and asdfg.com return NXDomain, while bjgkre.com resolves to 212.211.123.01, the C2 server]
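As a concrete illustration, here is a toy DGA sketch in Python. It is not any real family's algorithm; the seed format, hash function, and domain length are arbitrary assumptions.

```python
import hashlib

def toy_dga(seed, count=5, length=12, tld='.org'):
    """Deterministically derive `count` pseudo-random domains from a shared seed."""
    domains = []
    state = seed.encode()
    for _ in range(count):
        state = hashlib.md5(state).digest()                     # advance the PRNG-like state
        name = ''.join(chr(ord('a') + b % 26) for b in state[:length])
        domains.append(name + tld)
    return domains

# Bot and operator share the seed (e.g., today's date), so both derive the same
# candidate list; the bot tries each domain until one resolves to the C2 server.
print(toy_dga('2016-11-04'))
```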
Domain Generation Algorithm → Cryptolocker Example
btbpurnkbqidxxclfdfrdqjasjphyrtn.org
sehccrlyfadifehntnomqgpfyunqqfft.org
konsbolyfadifehntnomqgpfyunqqfft.org
cytfiobnkjxomkmhimxhcfvtogyaiqaa.org
6
Domain Generation Algorithm → Character Distributions
7
§ DGA char dist + ML == robust defense?
§ Cryptolocker and Ramnit are both nearly uniform over the same range
• Expected; both are computed over domains generated from a single seed
§ Suppobox concatenates random words from an English dictionary and thus reflects the distribution of Alexa 1M
§ Much more difficult for prior DGA detection models to correctly classify
§ Our goal is to build a character-based generator that mimics the Alexa domain name distribution
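A minimal sketch of this character-distribution comparison, assuming domains are available as plain strings (the sample lists are placeholders; a real comparison would use the Alexa 1M and full DGA feeds):

```python
from collections import Counter

def char_distribution(domains):
    """Relative frequency of each character across a list of domain strings."""
    counts = Counter(ch for d in domains for ch in d)
    total = sum(counts.values())
    return {ch: n / total for ch, n in sorted(counts.items())}

# Placeholder samples: a few Alexa-style names vs. Cryptolocker domains from the
# previous slide.
alexa_sample = ['google', 'youtube', 'facebook', 'wikipedia', 'amazon']
cryptolocker_sample = ['btbpurnkbqidxxclfdfrdqjasjphyrtn',
                       'sehccrlyfadifehntnomqgpfyunqqfft']

print(char_distribution(alexa_sample))          # skewed toward common letters
print(char_distribution(cryptolocker_sample))   # much flatter, closer to uniform
```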
AUTOENCODERS
§ Data compression algorithm
§ Models consist of encoder,
decoder, and loss function
• encoder — transforms the input to a low-dimensional embedding (lossy compression)
• decoder — reconstructs the original input from the encoder's embedding (decompression)
§ Goal: minimize distortion
between reconstructed output
and original input
§ Easy to train; Don’t need labels
(unsupervised)
GENERATIVE ADVERSARIAL NETWORKS
§ Adversarial game between two
models
• generator — seeks to create
synthetic data based on samples
from the true data distribution (w/
added noise)
• discriminator — receives a sample and must determine whether it is synthetic (from the generator) or a true data sample
§ Goal: Find an equilibrium similar to
Nash Equilibrium by pitting models
against one another
§ Harder to train; Unsupervised
• lots of failure modes
8
Background → Frameworks
DeepDGA Architecture
DeepDGA → Autoencoder → Encoder
§ Encoder architecture taken from [Kim et al., 2015], found useful in character-level language modeling
§ Embedding learns a linear mapping for each valid domain character (20-dimensional space)
§ Convolutional filters applied to capture character combinations (bigrams/trigrams)
§ Max-pooling over time & over filters
• Gathers a fixed-length representation
§ Highway Network → LSTM
10
Learn the right representation of Alexa domains
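A compressed, Keras 2-style sketch of an encoder along these lines. It is a simplification under assumptions, not the authors' exact network: vocabulary size, filter counts, and maximum length are illustrative, a plain Dense layer stands in for the highway block, and a strided max-pool approximates the pooling step.

```python
from keras.layers import (Input, Embedding, Conv1D, Concatenate,
                          MaxPooling1D, Dense, LSTM)
from keras.models import Model

MAXLEN, VOCAB, EMB_DIM, LATENT = 63, 40, 20, 128     # illustrative sizes

chars = Input(shape=(MAXLEN,), dtype='int32')         # domain as character indices
x = Embedding(VOCAB, EMB_DIM)(chars)                  # learned 20-d character embedding

# Convolutional filters over character bigrams and trigrams
bi = Conv1D(64, 2, padding='same', activation='relu')(x)
tri = Conv1D(64, 3, padding='same', activation='relu')(x)
x = Concatenate()([bi, tri])

x = MaxPooling1D(pool_size=2)(x)                      # max-pooling over time
x = Dense(128, activation='relu')(x)                  # stand-in for the highway block
embedding = LSTM(LATENT)(x)                           # final state = domain embedding

encoder = Model(chars, embedding)
encoder.summary()
```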
DeepDGA → Autoencoder → Decoder
§ Decoder is roughly the reverse of the encoder, minus the max-pooling step
§ Domain embedding is repeated over the maximum domain length (time steps)
§ Sequence is passed to LSTM → Highway Network → Convolutional Filters
§ Softmax activation on the final layer produces a multinomial distribution over domain characters
§ Sampled to generate a new domain name modeled after the input domain name
11
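A matching, equally simplified sketch of the decoder (same illustrative constants as the encoder sketch; a Dense layer again stands in for the highway block), including how the softmax output can be sampled per position:

```python
import numpy as np
from keras.layers import Input, RepeatVector, LSTM, TimeDistributed, Dense, Conv1D
from keras.models import Model

MAXLEN, VOCAB, LATENT = 63, 40, 128                   # same illustrative sizes

z = Input(shape=(LATENT,))
d = RepeatVector(MAXLEN)(z)                           # repeat embedding over max domain length
d = LSTM(128, return_sequences=True)(d)
d = TimeDistributed(Dense(128, activation='relu'))(d)     # stand-in for the highway block
d = Conv1D(64, 3, padding='same', activation='relu')(d)   # convolutional filters
probs = TimeDistributed(Dense(VOCAB, activation='softmax'))(d)  # per-position char distribution

decoder = Model(z, probs)

# Sample a new domain: draw one character index per position from the softmax output.
p = decoder.predict(np.random.randn(1, LATENT))[0]
chars = [np.random.choice(VOCAB, p=row / row.sum()) for row in p.astype('float64')]
```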
DeepDGA → GAN
§ Simply rewire the autoencoder framework as the base of our GAN
• Accepts a random seed as input
• Outputs domains that look much like valid domain names
§ Box Layer — restricts output to live in an axis-aligned box defined by the embedding vectors of the training data
• Parameterizes the manifold coordinates of legitimate domains
• Used in the generator to ensure it only learns domains on the legitimate-domain (Alexa-like) manifold
12
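A minimal sketch of the box constraint, assuming `alexa_embeddings` holds encoder outputs for the legitimate training domains (random placeholders below) and that per-coordinate clamping is an acceptable stand-in for the layer:

```python
import numpy as np
from keras import backend as K
from keras.layers import Lambda

# Placeholder for encoder outputs of the Alexa training domains; in practice this
# would be encoder.predict(...) over the legitimate training set.
alexa_embeddings = np.random.randn(10000, 128).astype('float32')

lo = K.constant(alexa_embeddings.min(axis=0))   # per-dimension box minimum
hi = K.constant(alexa_embeddings.max(axis=0))   # per-dimension box maximum

# "Box layer": clamp each coordinate of the generator's latent output into [lo, hi],
# so generated points stay inside the box spanned by legitimate-domain embeddings.
box_layer = Lambda(lambda t: K.minimum(K.maximum(t, lo), hi))
```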
DeepDGA → History Regularization
§ Regularize the discriminator model by training on both recently generated samples and domains sampled from prior adversarial rounds
§ Helps the discriminator “remember” any deficiencies in model coverage AND forces it to learn novel domain embeddings
§ Reduces the likelihood of the generator collapsing (i.e., generating the same domain every batch)
13
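A small sketch of this replay idea, assuming generated domains are handled as plain strings and that each round produces enough fresh samples; the batch size and replay fraction are arbitrary choices:

```python
import random

history = []   # generated domains retained from prior adversarial rounds

def discriminator_fakes(new_fakes, n_fake=64, replay_fraction=0.5):
    """Mix freshly generated fakes with fakes replayed from earlier rounds."""
    n_replay = min(int(n_fake * replay_fraction), len(history))
    batch = random.sample(history, n_replay) if n_replay else []
    batch += random.sample(new_fakes, n_fake - n_replay)   # assumes len(new_fakes) >= n_fake
    history.extend(new_fakes)   # remember this round's fakes for future rounds
    return batch
```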
DeepDGA → Walkthrough
14
[Diagram: random seed → generator → impostor domain (www.emdgane.com) → detector]
Move 1: Red Team – train the generator to randomly create impostors that trick the detector
DeepDGA → Walkthrough
15
[Diagram: real domains and the generator's impostors → detector]
Move 2: Blue Team – train the detector to distinguish real domains from the generator's impostors
DeepDGA → Walkthrough
16
[Diagram: random seed → generator → impostor domain (www.emdgane.com) → detector]
DeepDGA → Walkthrough
17
[Diagram: a real domain (www.endgame.com) passes through the detector (encoder) and generator (decoder), reconstructing a similar-looking domain (www.emdgane.com)]
DeepDGA → Walkthrough
[Diagram: random seed → generator → impostor domain (www.emdgane.com) → detector]
18
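Putting the two moves together, a hedged sketch of the alternating rounds using tiny stand-in Keras models (the real generator and detector are the character-level networks sketched earlier; sizes, optimizers, and round counts are illustrative):

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

latent_dim, feat_dim, batch = 20, 64, 128             # illustrative sizes

# Stand-in models so the loop runs end to end.
generator = Sequential([Dense(feat_dim, activation='tanh', input_dim=latent_dim)])
detector = Sequential([Dense(1, activation='sigmoid', input_dim=feat_dim)])
detector.compile('adam', 'binary_crossentropy')

detector.trainable = False                             # freeze detector inside the stacked model
combined = Sequential([generator, detector])
combined.compile('adam', 'binary_crossentropy')

def real_batch():
    """Placeholder for a batch of encoded legitimate (Alexa) domains."""
    return np.random.randn(batch, feat_dim)

for rnd in range(10):                                  # adversarial rounds
    # Move 1 (red team): train the generator, through the frozen detector, to get
    # its impostors labeled as legitimate (1).
    seeds = np.random.uniform(-1, 1, (batch, latent_dim))
    combined.train_on_batch(seeds, np.ones(batch))

    # Move 2 (blue team): retrain the detector to separate real domains (1)
    # from the generator's impostors (0).
    fakes = generator.predict(np.random.uniform(-1, 1, (batch, latent_dim)))
    x = np.concatenate([real_batch(), fakes])
    y = np.concatenate([np.ones(batch), np.zeros(batch)])
    detector.train_on_batch(x, y)
```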
AUTOENCODED DOMAINS
<input domain> → <output domain>
clearspending → clearspending
synodos → synodos
3walq → 3walq
kayak → kayak
sportpitvl → sportpitvl
7resume → 7resume
templateism → templateism
spielefuerdich → spielefueddrch
firebaseapp → firepareapp
gilliananderson → gilliadandelson
tonwebmarketing → torwetmarketing
thetubestore → thebubestore
infusion → infunion
akorpasaji → akorpajaji
hargonis → harnonis
GAN-GENERATED DOMAINS
firiaps.com
qiurdeees.com
gyldles.com
lirneret.com
vietips.com
mivognit.com
shtrunoa.com
gilrr.com
yhujq.com
sirgivrv.com
tisehl.com
thellehm.com
sztneetkop.com
chdareet.com
statpottxy.com
laner.com
spienienitne.com
19
DeepDGA → Generated Domains
Experiment Setup &
Results
Experiment Setup
§ Datasets
• Alexa Top 1M
• DGA Family datasets
• All Open Source
§ Training Time
• DeepDGA (Autoencoder & GAN) implemented in Keras (Python DL library)
• Autoencoder pretrained for 300 epochs
· each epoch: 256K domains randomly sampled
· batch size of 128
· 14 hours on an NVIDIA Titan X GPU
• Each adversarial round generated 12.8K samples against the detector
· about 7 minutes on GPU per round
21
Experiment Setup → Offensive
§ Red Team: DeepDGA vs. External Classifier
§ Random Forest model (sklearn – python)
• ensemble classifier more resistant to
adversarial attacks due to low
variance
§ Handcrafted feature extraction
• domain length
• entropy of character distribution
• vowel-to-consonant ratio
• n-grams
§ Model trained on Alexa top 10K vs.
DeepDGA
• Results averaged over 10-fold CV
22
Trained explicitly to catch DeepDGA and only DeepDGA
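A hedged sklearn sketch of such an external classifier with the handcrafted features listed above. Feature details and hyperparameters are illustrative assumptions, and the domain lists below are placeholders for the Alexa top 10K and DeepDGA output:

```python
import math
from collections import Counter

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer

def entropy(domain):
    """Shannon entropy of the domain's character distribution."""
    counts = Counter(domain)
    return -sum((c / len(domain)) * math.log2(c / len(domain)) for c in counts.values())

def vowel_consonant_ratio(domain):
    vowels = sum(ch in 'aeiou' for ch in domain)
    consonants = sum(ch.isalpha() and ch not in 'aeiou' for ch in domain)
    return vowels / max(consonants, 1)

def featurize(domains, ngram_vec):
    basic = np.array([[len(d), entropy(d), vowel_consonant_ratio(d)] for d in domains])
    ngrams = ngram_vec.transform(domains).toarray()          # character n-gram counts
    return np.hstack([basic, ngrams])

# Placeholder samples; the experiment used the Alexa top 10K vs. DeepDGA output,
# with results averaged over 10-fold cross-validation.
benign = ['google', 'youtube', 'facebook', 'wikipedia']
dga = ['firiaps', 'qiurdeees', 'gyldles', 'lirneret']

ngram_vec = CountVectorizer(analyzer='char', ngram_range=(2, 3)).fit(benign + dga)
X = featurize(benign + dga, ngram_vec)
y = np.array([0] * len(benign) + [1] * len(dga))              # 1 = DGA

clf = RandomForestClassifier(n_estimators=100).fit(X, y)
```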
DeepDGA vs. External Classifier
23
[Chart: accuracy (%), axis 0%–100%; caption: “trained to catch 11 DGA families, equally represented in training set”]
DeepDGA → Character Distributions
§ Earlier we compared DGA families and
Alexa 1M character distributions.
• Anomalous distributions were easy to
identify
§ DeepDGA character distributions pre-adversarial rounds also appear anomalous.
§ But… post-adversarial rounds begin to
resemble Alexa 1M (still not perfect)
§ Character distribution would confound
previously important features
• Entropy
• Vowel-to-consonant ratio
• n-grams
24
Experiment Setup → Defensive
§ The core of this research was to determine if adversarial examples could harden an independent classifier.
§ Augmented the training dataset with adversarial domains generated by the GAN.
§ In theory, the model can be hardened against families previously unobserved in the training set
§ Employed a leave-one-out (LOO) strategy in which an entire DGA family was held out for validation
• Baseline – Model trained on the other 9 families + Alexa Top 10k
• Hardened – Same process, with DeepDGA-generated domains added as malicious samples
25
Binary Classification Before/After Adversarial Hardening
TPR @ a fixed 1% FPR
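The headline metric, TPR at a fixed 1% FPR, can be read off the ROC curve; a small helper sketch (the scores below are random placeholders standing in for the baseline and hardened models' predicted probabilities on a held-out family):

```python
import numpy as np
from sklearn.metrics import roc_curve

def tpr_at_fpr(y_true, scores, target_fpr=0.01):
    """Highest TPR attainable while keeping FPR at or below target_fpr."""
    fpr, tpr, _ = roc_curve(y_true, scores)
    return tpr[np.searchsorted(fpr, target_fpr, side='right') - 1]

# Placeholder scores for a held-out DGA family; in the experiment these would be
# predicted probabilities from the baseline and hardened random forests.
y_test = np.array([0] * 900 + [1] * 100)
baseline_scores = np.random.rand(1000)
hardened_scores = np.random.rand(1000)
print(tpr_at_fpr(y_test, baseline_scores), tpr_at_fpr(y_test, hardened_scores))
```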
Summary
§ Contributions
• Present the first known Deep Learning architecture to pseudo-randomly
generate domain names
• Demonstrate that adversarially-crafted domain names targeting a DL model
are also adversarial for an independent external classifier
• At least experimentally, those same adversarial samples can be used to
augment a training set and harden an independent classifier
§ Hard problems
• GANs are hard! → Adversarial game construction
• Carefully watch FP rate
· A dataset overloaded w/ augmented DGAs can increase the FP rate
· Model tries to learn that these “realistic” domain names are possibly
malicious
§ Future Work
• Network – improving domain name generation (DGA) and detection
• Strengthen Malware Classification models
· Malicious WinAPI sequences
· Adversarially-tuned static feature vectors
26
Questions?