1. Avast @ Machine Learning
Prague - HANDOUT
S in IoT stands for Security
2. Our team: IoT and ML Research department
2
Galina Adam Tomáš Marek
Martin Vláďa Martin
3. Avast in numbers
Around 2000 employees
435 million users
Users in over 150 countries
Protecting from 3.5 billion attacks per month
Blocked 128 million ransomware attacks in 2016
Our engines check 200 billion URLs and 300 million new files monthly
3
5. Future: The IoT world
• The number of IoT devices is on the rise: expected to reach 75 billion connected things by 2025
• IP cameras
• Network attached storages
• Thermostats
• Smart speakers
• Digital personal assistants
• … you name it, the IoT world will have it
5
6. Future: Securing the IoT world
• IoT products prioritize:
• convenience
• usability
• not necessarily security
• They can be compromised in many ways
• Spy on users
• Blackmail users
• Gain physical access to the home
• Misuse of devices for
• Attacking third-party services
• Misuse of computational power
6
7. Avast Smart Life - a new product coming
• Avast is developing an AI-based protection for IoT
• The key to protecting IoT devices: cloud-based security that monitors for threats at the network level
7
8. Workshop structure
09:00 - 09:30 Yin and Yang of IoT
09:30 - 10:00 A case study: Mirai attack vector
10:00 - 10:30 Machine learning algorithms and feature engineering
10:30 - 11:00 Coffee Break
11:00 - 11:30 Neural networks for classification of binary files
11:30 - 12:00 Identifying devices within the network
12:00 - 12:30 Phishing prevention and blocking of malicious URLs
8
10. Botnet
Set of enslaved devices (usually IoT) that can be
controlled by a cybercriminal
OR
Malware that enslaves the devices
10
11. But why?
Can be used to
• Gain computational power
• Perform distributed denial-of-service attack (DDoS attack)
• Send spam
• Mine cryptocurrencies
• Steal data
DDoS business model
• Harm & destroy your competitors and opponents
• Business competitors
• Political opponents, Independent journalism
• Sell DDoS as a service
• Blackmail companies (Money or DDoS!)
11
12. Botnet components
• Zombie computer
• A compromised node infected by the botnet malware
• Command and Control (CnC) server
• A server that remotely controls the zombie computers
• Botmaster
• A person who operates the CnC server
• Hides their identity (via Tor, proxies, …)
12
14. A case study: Mirai
• Attacks vulnerable IoT devices with factory-default credentials
• IP cameras
• Network storages
• Client-server architecture
• Two main components: the Mirai malware itself and a C&C server
• Both available on GitHub
• Spreads like a worm over the internet: each infected node scans the whole IPv4 address space
• Does not attack the US Department of Defense (DoD)
• Not persistent: a device restart removes Mirai (but it soon comes back - the C&C server remembers vulnerable devices)
14
15. A case study: Mirai
• October 2016: one of the most impactful cyber attacks ever, against the online infrastructure firm Dyn, impacting Twitter, Spotify, Reddit, Airbnb, Netflix, …
• Also hit and disabled krebsonsecurity.com just hours after Krebs presented a talk on Mirai at a conference
• Solution: Google’s Project Shield protection
• Attacks with a power of 600 - 1500 Gbps
• 150 000 enslaved devices make for 1 Tbps of DDoS capability
15
16. Mirai: Vulnerable device setup
• Telnet running (a big mistake, but still very common)
• Port 23 forwarded on the router (i.e. port 23 is visible from outside the network)
16
17. Mirai: Dictionary Attack
• Knock knock, will you let me in?
• Mirai has a predefined dictionary of factory-setting credentials, tries them randomly
• Sends the commands via telnet (plaintext)
17
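The guessing step above can be sketched as follows. This is a minimal illustration, not Mirai's actual code: `try_login` is a hypothetical callback standing in for the plaintext telnet exchange, and the credential pairs are a small sample of the kind of factory defaults in Mirai's published dictionary.

```python
import random

# Small sample of factory-default (user, password) pairs of the kind
# Mirai ships with; the real dictionary has a few dozen entries.
FACTORY_CREDENTIALS = [
    ("root", "xc3511"),
    ("root", "vizxv"),
    ("admin", "admin"),
    ("root", "default"),
    ("support", "support"),
]

def guess_credentials(try_login, creds=FACTORY_CREDENTIALS, seed=0):
    """Try factory-default credentials in random order, as Mirai does.

    `try_login(user, password)` is a hypothetical callback that returns
    True when the device accepts the credentials.
    """
    order = list(creds)
    random.Random(seed).shuffle(order)      # Mirai tries them randomly
    for user, password in order:
        if try_login(user, password):
            return user, password           # device is vulnerable
    return None                             # dictionary exhausted
```

The defensive reading of the same loop: run it against your own device, and if it ever returns a pair, change the password.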
19. Mirai: Password guessed
Once a password is guessed, Mirai notifies the C&C server, which sends a telnet command to download the Mirai binary (via wget or tftp)
19
20. Mirai: Mass Scan
• Scans the internet and tries to infect vulnerable devices in other networks
• Mirai is quite a crude parasite: it sometimes even kills its host
20
21. Mirai: DDoS
When the command comes, this is the result:
21
A map of internet outages in Europe and North America caused by the Dyn
cyberattack (as of 21 October 2016 1:45pm Pacific Time).
23. A case study: Mirai attack vector
Summary of the Mirai attack:
• Uses vulnerable telnet
• Has a list of factory-default credentials (CHANGE YOUR PASSWORD)
• Scans the whole address range of the internet
• Is actually very simple; nonetheless, it caused a lot of trouble in 2016
IoT malware is very simple compared to PC malware; it will evolve and gain complexity
23
27. Our data domain
• Stream of packets
• Large amounts of data
• Many protocols (TCP, UDP, IP, ICMP, UPnP, Telnet, FTP, HTTP, HTTPS, …)
• Deep vs. stateful packet inspection
27
31. Challenges
• Detect devices which are present in the network
• Find communication patterns
• Detect malicious behavior
• Make all of the above robust
31
32. Data from an ML point of view
• Features
• Are they available at any time? (e.g. total amount of data)
• How cheap are they to compute? (e.g. packet interarrival time: mean- and variance-based)
• Is the information relevant? (e.g. port)
• Time series… but the beginning and end are possibly more important
• A single packet/flow is never important (or is it?)
• Traffic is not deterministic (e.g. connection issues; ports and IPs change a lot)
• All mixed together
• A lot of legitimate scenarios (e.g. torrent vs. mass scan of the network)
• Are we always sure what we are modeling?
32
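As an example of a cheap, always-available feature from the list above, interarrival-time statistics for one flow can be computed directly from packet timestamps. A minimal sketch (function and field names are illustrative, not from the production feature set):

```python
import statistics

def interarrival_features(timestamps):
    """Cheap per-flow features from packet arrival times (seconds).

    Returns mean and population variance of the interarrival gaps,
    plus the packet count - all computable at any point in the flow.
    """
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return {
        "iat_mean": statistics.mean(gaps),
        "iat_var": statistics.pvariance(gaps),
        "packets": len(timestamps),
    }
```

For a streaming setting the same quantities can be kept as running sums, so the cost per packet stays constant.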
37. Why this task is important
● “Security in IoT” presupposes the ability to distinguish IoT devices from other devices => device identification
● Device type and device model are important features in malware detection
37
38. Existing approach
● Expert-created rules based on device features and regexps
● Advantages:
○ Utilize expert knowledge and many different features
○ Accurate upon exact match
● Disadvantages:
○ Misclassifications for overly broad rules
○ Conflicting rules
○ Unknown accuracy
38
39. How we want to improve it
● More accurate model
● Can solve conflicts between “rules” automatically
● Ability to generalise
● Ability to tune every source of properties and measure the accuracy
● A level of confidence (probability) together with each answer
39
40. Device identification as classification
● Classification task where classes = different device types (~20 classes like phone,
security camera, printer, computer, bulb, fridge)
● More detailed task: classes = different models of the device class (thousands of
classes)
● Features:
○ Scan features (MAC-address, open ports, text body of specific responses)
○ Behavioral features (patterns in traffic consumption)
40
42. Challenges
● Imbalanced dataset (could change in the future)
● It’s hard to obtain ground-truth labels (requires expert knowledge)
● Different categories of features (numerical, categorical, text)
● Missing values - some devices don’t have specific features at all, only a subset
○ Empty ports
○ Empty response strings
○ Randomized MAC-address
42
44.
One fixed dataset with all features: ports, MAC, DHCP response, etc.
↓
Preprocessing for Classifier #1, Preprocessing for Classifier #2, …, Preprocessing for Classifier #N
↓
Classifier #1, Classifier #2, …, Classifier #N
↓
p_1, …, p_n from each classifier (probabilities for device_class_1, …, device_class_n)
↓
Ensemble classifier
↓
Label
44
45. Advantages of ensembling
● More accurate than individual classifiers
● Individual classifier is responsible for specific features
● You can tune an individual classifier and see the change in accuracy
● Explainable
● Able to backtrack
45
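The combination step of such an ensemble can be as simple as a weighted average of the per-classifier probability vectors p_1, …, p_n. A minimal sketch; uniform weights are an assumption, and a trained meta-classifier could replace the average:

```python
def ensemble_predict(prob_vectors, weights=None):
    """Combine per-classifier probability vectors into one label.

    prob_vectors: list of [p_1, ..., p_n] vectors, one per classifier.
    Returns (label_index, combined_probabilities).
    """
    n = len(prob_vectors[0])
    weights = weights or [1.0] * len(prob_vectors)
    total = sum(weights)
    combined = [
        sum(w * p[i] for w, p in zip(weights, prob_vectors)) / total
        for i in range(n)
    ]
    label = max(range(n), key=lambda i: combined[i])
    return label, combined
```

Because each component only sees its own features, a per-classifier weight gives a direct knob for the tuning and backtracking advantages listed above.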
47. Silver classifier on MAC and open ports, without labels
● If you have an unlabelled dataset, how do you quickly obtain a labelled one from it?
47
48. Semi-supervised iterative learning
1. Cluster one-hot encoded ports and vendor features
2. An expert labels some of the clusters
3. Collect the labelled devices and call them silver labels
4. Train a classifier on the silver labels
5. Run the classifier on the whole dataset
6. Repeat from step 1 for the unclassified examples (unclassified = low probability)
48
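The loop above can be sketched as a skeleton. `cluster`, `expert_label`, and `train` are hypothetical callbacks standing in for the real clustering, expert review, and classifier-training steps; the 0.8 confidence threshold is an assumption.

```python
def silver_label_loop(devices, cluster, expert_label, train,
                      threshold=0.8, rounds=3):
    """Semi-supervised silver-labelling loop (illustrative skeleton).

    cluster(devices)        -> list of clusters (lists of devices)
    expert_label(clusters)  -> list of (cluster_members, label) pairs
    train(labelled_dict)    -> model: device -> (label, probability)
    """
    labelled = {}                            # device -> silver label
    remaining = list(devices)
    model = None
    for _ in range(rounds):
        if not remaining:
            break
        clusters = cluster(remaining)                    # step 1
        for members, label in expert_label(clusters):    # steps 2-3
            labelled.update((d, label) for d in members)
        model = train(labelled)                          # step 4
        remaining = [d for d in devices                  # steps 5-6
                     if model(d)[1] < threshold]
    return model, labelled
```

Each pass shrinks the unclassified set, so expert effort is spent only where the current model is unsure.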
51. Results of Silver classifier
accuracy on test set: 99%
able to classify: 77%
agreement with rule-based labels: 55%
contains initial silver labels: 30%
51
53. Components based on protocol responses
● DHCP - info about the operating system
● zeroconf and UPnP - info about available services
● HTTP - info about user agent, server, location, admin interface
● General scheme:
○ Prepare xml/html/text documents
○ Text extraction based on heuristics specific to each protocol
○ Estimate probabilities of a class given a word: P(security camera | “cgi”) = 0.81
○ Cluster words that often occur in the same documents
○ For a new device, find the closest cluster
53
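The probability-estimation step can be sketched with simple counting over labelled documents. The example documents and the resulting probability below are illustrative, not real data:

```python
from collections import Counter, defaultdict

def class_given_word(documents):
    """Estimate P(class | word) from labelled protocol-response docs.

    documents: list of (device_class, set_of_tokens).
    Returns a function p(cls, word) = count(word in cls) / count(word).
    """
    word_total = Counter()
    word_class = defaultdict(Counter)
    for cls, tokens in documents:
        for tok in tokens:
            word_total[tok] += 1
            word_class[tok][cls] += 1
    return lambda cls, word: word_class[word][cls] / word_total[word]

# Toy corpus: tokens extracted from three devices' responses.
p = class_given_word([
    ("security camera", {"cgi", "rtsp"}),
    ("security camera", {"cgi", "onvif"}),
    ("printer", {"ipp", "cgi"}),
])
# p("security camera", "cgi") == 2/3
```

In practice these estimates would be smoothed and restricted to sufficiently frequent tokens before the clustering step.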
59. Time-series based learning
● Number of events in time
● Any continuous or discrete value
○ E.g. number of flows or number of unique destinations
● Any time interval
● Challenges
○ Robust to different behaviors
○ Robust to legitimate anomalies
○ Robust to devices being turned on/off
60. Dynamic Time Warping
● Algorithm for measuring the similarity of two time series
● Calculates a distance between the series
● Output ∈ [0, +∞)
○ More similar when the output is closer to zero
● O(N²) complexity
● The time series can have different lengths
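The recurrence behind DTW can be sketched in a few lines. This is the textbook O(N·M) dynamic program with an absolute-difference cost, not an optimized implementation; the quadratic table makes the O(N²) complexity explicit:

```python
def dtw(a, b):
    """Dynamic-time-warping distance between two series of possibly
    different lengths; smaller output means more similar."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j],      # insertion
                                 d[i][j - 1],      # deletion
                                 d[i - 1][j - 1])  # match
    return d[n][m]
```

Because time steps can be stretched, `dtw([0, 0, 1], [0, 1])` is 0: the two series are the same behavior at different speeds.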
62. One component: Time based classifier
● Time-series for each device
● Similarities of behaviors
● Classification of devices
62
63. Time for classifier’s decision
● Each component needs some time to produce a
decision about a device
● The time is different for each classifier
● Precision vs. time
● Silver classifier
○ seconds - minutes
● Protocol responses classifiers
○ minutes - hours
● Time-series classifier
○ minutes - hours
63
68. Catching the phishers
- Challenge
- decide as early as possible
- we’re able to crawl and screenshot only a fraction of URLs
- Solution
- discard most of the URLs with reasonable confidence
- whitelist + whitelist exceptions
- string matches on fishy keywords (approximate)
- prefer likely distribution channels (email, Facebook)
68
69. Catching the phishers
- Whitelist
- We use the 1 million most popular top private domains over several months
- Whitelist exceptions
- but some popular sites are evil
- additionally, we collected popular TPDs which can easily host user content, and we ‘subtract’ them from the list
- e.g. google.com but not sites.google.com
69
70. Catching the phishers
Approximate matching
- tokens from target sites
- adobe, alibaba, aliexpress, coinsbank, credit-suisse
Exact match (strings shorter than 5)
- often misused constructs (paypal.com-blabla.gd)
- .com-, .org., cgi-bin
70
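The approximate-matching step can be sketched with a standard-library similarity measure. The target list echoes the slide, but the 0.8 cutoff and the tokenization are assumptions for illustration, not Avast's actual settings:

```python
import difflib
import re

# Illustrative sample of target-site tokens, as on the slide.
TARGET_TOKENS = ["adobe", "alibaba", "aliexpress", "coinsbank", "paypal"]

def fishy_tokens(url, cutoff=0.8):
    """Return (url_token, matched_target) pairs for near-miss tokens,
    so 'paypa1' still matches 'paypal'."""
    tokens = [t for t in re.split(r"[^a-z0-9]+", url.lower()) if t]
    hits = []
    for tok in tokens:
        close = difflib.get_close_matches(tok, TARGET_TOKENS,
                                          n=1, cutoff=cutoff)
        if close:
            hits.append((tok, close[0]))
    return hits
```

Short constructs like `.com-` stay on the exact-match path, since approximate matching on strings under 5 characters would fire on almost anything.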
74. Detecting the points
• Interesting points: points, edges, blobs
• Many methods exist to detect interesting points in a picture: corner detectors, edge detectors
• Examples of the points
74
75. Describing the points
• Abstract (mathematical) representation of the detected points
• Extract the patches
• FAST, SIFT, ORB, ...
75
77. Matching and verification
- Find the matrix transforming the points from one picture to the other
- A transformation hypothesis is determined by 4 points; we sample them and determine the transformation with RANdom SAmple Consensus (RANSAC)
Credits: Scikit-learn
78. Matching and verification
- Evaluate how reasonable the transformation is
- valid perspective transformation
- thresholds on scale, rotation, shear, translation
78
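The sample-fit-count loop of RANSAC can be sketched on a toy problem. The real pipeline fits a perspective transform from 4 point pairs; to keep the sketch short, this illustration fits a 1D affine map y = a·x + b from 2 pairs, but the loop structure is the same:

```python
import random

def ransac_affine1d(pairs, iters=200, tol=0.5, seed=0):
    """RANSAC sketch: sample a minimal set, fit a model, keep the
    hypothesis with the most inliers.

    pairs: list of (x, y) correspondences; returns ((a, b), inliers).
    """
    rng = random.Random(seed)
    best, best_inliers = None, []
    for _ in range(iters):
        (x1, y1), (x2, y2) = rng.sample(pairs, 2)   # minimal sample
        if x1 == x2:
            continue                                # degenerate sample
        a = (y2 - y1) / (x2 - x1)                   # fit y = a*x + b
        b = y1 - a * x1
        inliers = [(x, y) for x, y in pairs
                   if abs(a * x + b - y) <= tol]    # count support
        if len(inliers) > len(best_inliers):
            best, best_inliers = (a, b), inliers
    return best, best_inliers
```

The image-matching version additionally rejects hypotheses whose transform fails the sanity thresholds above (scale, rotation, shear, translation).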
80. Problems
- Until now, the only remaining problem is (small) text blocks
- Text generates a lot of keypoints (high local contrast)
- A text block matches any other text block (the descriptors are all similar)
- Possible solutions
- ignore images with many keypoints
- detect text through high density in a single SIFT octave
- use an external text detector to mask areas
80
81. Conclusion & Future work
- Persuade the phishers that we are accessing their sites from different
locations
- More precise but still fast text detection mechanism
- Make it even faster ;)
81
82. Deep Convolutional Malware Classifiers Can Learn from Raw Executables and Labels Only
Marek Krčál, Avast fellow at the Institute of Computer Science
Joint work with Ondřej Švec, Martin Bálek, and Otakar Jašek
• Can the success of convnets (incl. end-to-end learning) be transferred to malware detection?
• Applied to Windows executables, but the approach is rather domain-agnostic: other file formats and blocking their content in network traffic
93. Detection of Malicious Portable Executables
• Input is 1D – a sequence of bytes (static malware analysis)
– Consists of header, sections, relocation tables, strings
– No global semantics of byte symbols
– Each can appear at an almost arbitrary place – high translational variance
• We use no domain expertise aside from labels
• Only two classes, clean and malware (for simplicity)
97. Dataset – 20 million executables
• Fewer files with compressed/encrypted machine code
• Between 12 kB and 1/2 MB
• Temporal split: training 1.1.2016 – 1.1.2017, validation 1.1.2017 – week 8 of 2017, test week 8 – week 16 of 2017
• Roughly balanced clean and malware classes
102. Architecture
Executable as sequence of N bytes
Fixed Embedding → 8 × N
Conv 32 (stride 4) → 48 × N/4
Conv 32 (stride 4) → 96 × N/16
Max pooling 4 → 96 × N/64
Conv 16 (stride 8) → 128 × N/512
Conv 16 (stride 8) → 192 × N/4096 (= 192 × N/(4·4·4·8·8))
Global Average → 192 (projects the variably-wide matrix to a fixed-sized vector)
Fully Connected layers: 192 → 160 → 128 → 2
Fixed embedding: byte → (±1/16, …, ±1/16) ∈ R⁸ according to the byte’s bits
Power-of-two strides: improve speed and performance; strides 3, 5, 7, 9 instead of 4, 4, 8, 8 harm performance by 6-10%
Training: 7-fold influence of clean files on the loss, mild weight decay, Adam, …
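The fixed embedding at the bottom of the network can be sketched directly. The bit order (least-significant bit first) is an assumption, since the slide only specifies ±1/16 entries according to the byte's bits:

```python
def embed_byte(b):
    """Map a byte to a vector in R^8 with entries ±1/16:
    bit 1 -> +1/16, bit 0 -> -1/16 (LSB-first order is an assumption)."""
    return [(1/16 if (b >> i) & 1 else -1/16) for i in range(8)]

def embed_executable(data: bytes):
    """8 x N embedding of a raw executable: one 8-vector per byte."""
    return [embed_byte(b) for b in data]
```

Because the embedding is fixed rather than learned, the network starts from a lossless, scale-bounded view of the raw bytes, and the convolutions above it do all the learning.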
110. Evaluation
• Evaluation in the regime of low false positives: formally, the area under the receiver operating characteristic (ROC) curve – True Positive Rate vs. False Positive Rate – restricted to [0, 0.001]: AUC|<0.001
• Evaluation score matters: Global Average (instead of Max), strong emphasis on clean files, …
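AUC|<0.001 can be sketched from a list of ROC points by trapezoidal integration clipped at FPR = 0.001. Normalising by the interval length, so a perfect classifier scores 1, is an assumption about the exact convention:

```python
def auc_restricted(fprs, tprs, max_fpr=1e-3):
    """Area under the ROC curve restricted to [0, max_fpr].

    fprs, tprs: ROC points sorted by increasing false-positive rate.
    Returns the clipped trapezoidal area, normalised to [0, 1].
    """
    area = 0.0
    for (f0, t0), (f1, t1) in zip(zip(fprs, tprs),
                                  zip(fprs[1:], tprs[1:])):
        if f0 >= max_fpr:
            break                       # past the region of interest
        f1c = min(f1, max_fpr)          # clip segment at max_fpr
        # interpolate the TPR at the clipped right edge
        t1c = t0 + (t1 - t0) * ((f1c - f0) / (f1 - f0)) if f1 > f0 else t1
        area += (f1c - f0) * (t0 + t1c) / 2
    return area / max_fpr
```

A random classifier scores about max_fpr/2 under this normalisation, which is why restricted AUC separates low-false-positive detectors far better than plain AUC.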
117. Automatic vs. hand-crafted features
550 hand-crafted in-house features (last year @MLP) fed to a 5-layer feedforward net (same set of samples):

AUC|<0.001
convolution features   70.4 ± 0.5%
hand-crafted features  73.2 ± 2.3%  (similar in accuracy and cross-entropy)
ensembled features     76.1 ± 1.0%  (much better accuracy and cross-entropy)

Convnets are slightly below Avast’s know-how, but already good at feature enrichment
– Dataset easier for convnets
+ Improvement potential, transferable to other domains
130. Byte-level explanations: Guided Backprop
• Unusual imported functions
• Header of an embedded PE
• “VERSION_INFO” with a fake vendor and software name
132. Future work
• Improve speed (separable convolutions, mixture of experts)
• More diverse and larger dataset
• Apply to (other file types relevant for) network traffic
Questions?