1. Avast @ Machine Learning
Prague - HANDOUT
S in IoT stands for Security
2. Our team: IoT and ML Research department
2
Galina Adam Tomáš Marek
Martin Vláďa Martin
3. Avast in numbers
Around 2000 employees
435 million users
Users in over 150 countries
Protecting from 3.5 billion attacks per month
Blocked 128 million ransomware attacks in 2016
Our engines check 200 billion URLs and 300 million new files monthly
3
5. Future: The IoT world
• The number of IoT devices is on the rise: expected to reach 75 billion connected things by 2025
• IP cameras
• Network attached storages
• Thermostats
• Smart speakers
• Digital personal assistants
• … you name it, the IoT world will have it
5
6. Future: Securing the IoT world
• IoT products prioritize:
• convenience
• usability
• not necessarily security
• They can be compromised in many ways
• Spy on users
• Blackmail users
• Gain physical access to the home
• Misuse of devices for
• Attacking third-party services
• Misuse of computational power
6
7. Avast Smart Life - a new product coming
• Avast is developing an AI-based protection for IoT
• The key to protecting IoT devices: cloud-based security that monitors for threats at the network level
7
8. Workshop structure
09:00 - 09:30 Yin and Yang of IoT
09:30 - 10:00 A case study: Mirai attack vector
10:00 - 10:30 Machine learning algorithms and feature engineering
10:30 - 11:00 Coffee Break
11:00 - 11:30 Neural networks for classification of binary files
11:30 - 12:00 Identifying devices within the network
12:00 - 12:30 Phishing prevention and blocking of malicious URLs
8
10. Botnet
Set of enslaved devices (usually IoT) that can be
controlled by a cybercriminal
OR
Malware that enslaves the devices
10
11. But why?
Can be used to
• Gain computational power
• Perform distributed denial-of-service attack (DDoS attack)
• Send spam
• Mine cryptocurrencies
• Steal data
DDoS business model
• Harm & destroy your competitors and opponents
• Business competitors
• Political opponents, Independent journalism
• Sell DDoS as a service
• Blackmail companies (Money or DDoS!)
11
12. Botnet components
• Zombie computer
• A compromised node infected by the botnet malware
• Command and Control (CnC) server
• A server that remotely controls the zombie computers
• Botmaster
• A person who operates the CnC server
• Hides their identity (via Tor, proxies, …)
12
14. A case study: Mirai
• Attacks vulnerable IoT devices with factory-default credentials
• IP cameras
• Network storages
• Client-server architecture
• Two main components: the Mirai malware itself and a C&C server
• Both available on GitHub
• Spreads like a worm over the internet: each infected node scans the whole IPv4 address space
• Does not attack the US Department of Defense (DoD)
• Not persistent: a device restart removes Mirai (but it soon comes back - the C&C server remembers vulnerable devices)
14
15. A case study: Mirai
• October 2016: one of the most impactful cyber attacks ever, against the online infrastructure firm Dyn, impacting Twitter, Spotify, Reddit, Airbnb, Netflix, …
• Also hit and disabled krebsonsecurity.com just hours after Krebs presented a talk on Mirai at a conference
• Solution: Google’s Project Shield protection
• Attacks with a power of 600 - 1500 Gbps
• 150 000 enslaved devices make for 1 Tbps of DDoS capability
15
16. Mirai: Vulnerable device setup
• Telnet running (a big mistake, but still very common)
• Port 23 forwarded on the router (i.e. port 23 is visible from outside the network)
16
17. Mirai: Dictionary Attack
• Knock knock, will you let me in?
• Mirai has a predefined dictionary of factory-setting credentials, tries them randomly
• Sends the commands via telnet (plaintext)
17
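The guessing step above can be sketched as follows. This is a minimal illustration, not Mirai's actual code: `try_login` is a hypothetical callback standing in for the plaintext telnet exchange, and the credential pairs are a small sample of the kind of factory defaults in Mirai's published dictionary.

```python
import random

# Small sample of factory-default (user, password) pairs of the kind
# Mirai ships with; the real dictionary has a few dozen entries.
FACTORY_CREDENTIALS = [
    ("root", "xc3511"),
    ("root", "vizxv"),
    ("admin", "admin"),
    ("root", "default"),
    ("support", "support"),
]

def guess_credentials(try_login, creds=FACTORY_CREDENTIALS, seed=0):
    """Try factory-default credentials in random order, as Mirai does.

    `try_login(user, password)` is a hypothetical callback that returns
    True when the device accepts the credentials.
    """
    order = list(creds)
    random.Random(seed).shuffle(order)      # Mirai tries them randomly
    for user, password in order:
        if try_login(user, password):
            return user, password           # device is vulnerable
    return None                             # dictionary exhausted
```

The defensive reading of the same loop: run it against your own device, and if it ever returns a pair, change the password.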
19. Mirai: Password guessed
Once a password is guessed, Mirai notifies the C&C server, which sends a telnet command to download the Mirai binary (via wget or tftp)
19
20. Mirai: Mass Scan
• Scans the internet and tries to infect vulnerable devices in other networks
• Mirai is quite a crude parasite: it sometimes even kills its host
20
21. Mirai: DDoS
When the command comes, this is the result:
21
A map of internet outages in Europe and North America caused by the Dyn
cyberattack (as of 21 October 2016 1:45pm Pacific Time).
23. A case study: Mirai attack vector
Summary of the Mirai attack:
• Uses vulnerable telnet
• Has a list of factory-default credentials (CHANGE YOUR PASSWORD)
• Scans the whole address range of the internet
• Is actually very simple; nonetheless, it caused a lot of trouble in 2016
IoT malware is very simple compared to PC malware; it will evolve and gain complexity
23
27. Our data domain
• Stream of packets
• Large amounts of data
• Many protocols (TCP, UDP, IP, ICMP, UPnP, Telnet, FTP, HTTP, HTTPS, …)
• Deep vs. stateful packet inspection
27
31. Challenges
• Detect devices which are present in the network
• Find communication patterns
• Detect malicious behavior
• Make all of the above robust
31
32. Data from an ML point of view
• Features
• Are they available at any time? (e.g. total amount of data)
• How cheap are they to compute? (e.g. packet interarrival time: mean- and variance-based)
• Is the information relevant? (e.g. port)
• Time series… but the beginning and end are possibly more important
• A single packet/flow is never important (or is it?)
• Traffic is not deterministic (e.g. connection issues; ports and IPs change a lot)
• All mixed together
• A lot of legitimate scenarios (e.g. torrent vs. mass scan of the network)
• Are we always sure what we are modeling?
32
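As an example of a cheap, always-available feature from the list above, interarrival-time statistics for one flow can be computed directly from packet timestamps. A minimal sketch (function and field names are illustrative, not from the production feature set):

```python
import statistics

def interarrival_features(timestamps):
    """Cheap per-flow features from packet arrival times (seconds).

    Returns mean and population variance of the interarrival gaps,
    plus the packet count - all computable at any point in the flow.
    """
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return {
        "iat_mean": statistics.mean(gaps),
        "iat_var": statistics.pvariance(gaps),
        "packets": len(timestamps),
    }
```

For a streaming setting the same quantities can be kept as running sums, so the cost per packet stays constant.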
37. Why this task is important
● “Security in IoT” presupposes the ability to distinguish IoT devices from other devices => device identification
● Device type and device model are important features in malware detection
37
38. Existing approach
● Expert-created rules based on device features and regexps
● Advantages:
○ Utilize expert knowledge and many different features
○ Accurate upon exact match
● Disadvantages:
○ Misclassifications for overly broad rules
○ Conflicting rules
○ Unknown accuracy
38
39. How we want to improve it
● More accurate model
● Can solve conflicts between “rules” automatically
● Ability to generalise
● Ability to tune every source of properties and measure the accuracy
● A level of confidence (probability) together with each answer
39
40. Device identification as classification
● Classification task where classes = different device types (~20 classes like phone,
security camera, printer, computer, bulb, fridge)
● More detailed task: classes = different models of the device class (thousands of
classes)
● Features:
○ Scan features (MAC-address, open ports, text body of specific responses)
○ Behavioral features (patterns in traffic consumption)
40
42. Challenges
● Imbalanced dataset (could change in the future)
● It’s hard to obtain ground-truth labels (requires expert knowledge)
● Different categories of features (numerical, categorical, text)
● Missing values - some devices don’t have specific features at all, only a subset
○ Empty ports
○ Empty response strings
○ Randomized MAC-address
42
44.
One fixed dataset with all features: ports, MAC, DHCP response, etc.
↓
Preprocessing for Classifier #1, Preprocessing for Classifier #2, …, Preprocessing for Classifier #N
↓
Classifier #1, Classifier #2, …, Classifier #N
↓
p_1, …, p_n from each classifier (probabilities for device_class_1, …, device_class_n)
↓
Ensemble classifier
↓
Label
44
45. Advantages of ensembling
● More accurate than individual classifiers
● Individual classifier is responsible for specific features
● You can tune an individual classifier and see the change in accuracy
● Explainable
● Able to backtrack
45
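The combination step of such an ensemble can be as simple as a weighted average of the per-classifier probability vectors p_1, …, p_n. A minimal sketch; uniform weights are an assumption, and a trained meta-classifier could replace the average:

```python
def ensemble_predict(prob_vectors, weights=None):
    """Combine per-classifier probability vectors into one label.

    prob_vectors: list of [p_1, ..., p_n] vectors, one per classifier.
    Returns (label_index, combined_probabilities).
    """
    n = len(prob_vectors[0])
    weights = weights or [1.0] * len(prob_vectors)
    total = sum(weights)
    combined = [
        sum(w * p[i] for w, p in zip(weights, prob_vectors)) / total
        for i in range(n)
    ]
    label = max(range(n), key=lambda i: combined[i])
    return label, combined
```

Because each component only sees its own features, a per-classifier weight gives a direct knob for the tuning and backtracking advantages listed above.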
47. Silver classifier on MAC and open ports, without labels
● If you have an unlabelled dataset, how do you quickly obtain a labelled one from it?
47
48. Semi-supervised iterative learning
1. Cluster one-hot encoded ports and vendor features
2. An expert labels some of the clusters
3. Collect the labelled devices and call them silver labels
4. Train a classifier on the silver labels
5. Run the classifier on the whole dataset
6. Repeat from step 1 for the unclassified examples (unclassified = low probability)
48
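The loop above can be sketched as a skeleton. `cluster`, `expert_label`, and `train` are hypothetical callbacks standing in for the real clustering, expert review, and classifier-training steps; the 0.8 confidence threshold is an assumption.

```python
def silver_label_loop(devices, cluster, expert_label, train,
                      threshold=0.8, rounds=3):
    """Semi-supervised silver-labelling loop (illustrative skeleton).

    cluster(devices)        -> list of clusters (lists of devices)
    expert_label(clusters)  -> list of (cluster_members, label) pairs
    train(labelled_dict)    -> model: device -> (label, probability)
    """
    labelled = {}                            # device -> silver label
    remaining = list(devices)
    model = None
    for _ in range(rounds):
        if not remaining:
            break
        clusters = cluster(remaining)                    # step 1
        for members, label in expert_label(clusters):    # steps 2-3
            labelled.update((d, label) for d in members)
        model = train(labelled)                          # step 4
        remaining = [d for d in devices                  # steps 5-6
                     if model(d)[1] < threshold]
    return model, labelled
```

Each pass shrinks the unclassified set, so expert effort is spent only where the current model is unsure.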
51. Results of Silver classifier
accuracy on test set: 99%
able to classify: 77%
agreement with rule-based labels: 55%
contains initial silver labels: 30%
51
53. Components based on protocol responses
● DHCP - info about the operating system
● zeroconf and UPnP - info about available services
● HTTP - info about user agent, server, location, admin interface
● General scheme:
○ Prepare xml/html/text documents
○ Text extraction based on heuristics specific to each protocol
○ Estimate probabilities of a class given a word: P(security camera | “cgi”) = 0.81
○ Cluster words that often occur in the same documents
○ For a new device, find the closest cluster
53
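The probability-estimation step can be sketched with simple counting over labelled documents. The example documents and the resulting probability below are illustrative, not real data:

```python
from collections import Counter, defaultdict

def class_given_word(documents):
    """Estimate P(class | word) from labelled protocol-response docs.

    documents: list of (device_class, set_of_tokens).
    Returns a function p(cls, word) = count(word in cls) / count(word).
    """
    word_total = Counter()
    word_class = defaultdict(Counter)
    for cls, tokens in documents:
        for tok in tokens:
            word_total[tok] += 1
            word_class[tok][cls] += 1
    return lambda cls, word: word_class[word][cls] / word_total[word]

# Toy corpus: tokens extracted from three devices' responses.
p = class_given_word([
    ("security camera", {"cgi", "rtsp"}),
    ("security camera", {"cgi", "onvif"}),
    ("printer", {"ipp", "cgi"}),
])
# p("security camera", "cgi") == 2/3
```

In practice these estimates would be smoothed and restricted to sufficiently frequent tokens before the clustering step.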
59. Time-series based learning
● Number of events in time
● Any continuous or discrete value
○ E.g. number of flows or number of unique destinations
● Any time interval
● Challenges
○ Robust to different behaviors
○ Robust to legitimate anomalies
○ Robust to devices being turned on/off
60. Dynamic Time Warping
● Algorithm for measuring the similarity of two time series
● Calculates a distance between the series
● Output ∈ [0, +∞)
○ More similar when the output is closer to zero
● O(N²) complexity
● The time series can have different lengths
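The recurrence behind DTW can be sketched in a few lines. This is the textbook O(N·M) dynamic program with an absolute-difference cost, not an optimized implementation; the quadratic table makes the O(N²) complexity explicit:

```python
def dtw(a, b):
    """Dynamic-time-warping distance between two series of possibly
    different lengths; smaller output means more similar."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j],      # insertion
                                 d[i][j - 1],      # deletion
                                 d[i - 1][j - 1])  # match
    return d[n][m]
```

Because time steps can be stretched, `dtw([0, 0, 1], [0, 1])` is 0: the two series are the same behavior at different speeds.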
62. One component: Time based classifier
● Time-series for each device
● Similarities of behaviors
● Classification of devices
62
63. Time for classifier’s decision
● Each component needs some time to produce a
decision about a device
● The time is different for each classifier
● Precision vs. time
● Silver classifier
○ seconds - minutes
● Protocol responses classifiers
○ minutes - hours
● Time-series classifier
○ minutes - hours
63
68. Catching the phishers
- Challenge
- decide as early as possible
- we’re able to crawl and screenshot only a fraction of URLs
- Solution
- discard most of the URLs with reasonable confidence
- whitelist + whitelist exceptions
- string matches on fishy keywords (approximate)
- prefer likely distribution channels (email, Facebook)
68
69. Catching the phishers
- Whitelist
- We use the 1 million most popular top private domains over several months
- Whitelist exceptions
- but some popular sites are evil
- additionally, we collected popular TPDs which can easily host user content, and we ‘subtract’ them from the list
- e.g. google.com but not sites.google.com
69
70. Catching the phishers
Approximate matching
- tokens from target sites
- adobe, alibaba, aliexpress, coinsbank, credit-suisse
Exact match (strings shorter than 5)
- often misused constructs (paypal.com-blabla.gd)
- .com-, .org., cgi-bin
70
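The approximate-matching step can be sketched with a standard-library similarity measure. The target list echoes the slide, but the 0.8 cutoff and the tokenization are assumptions for illustration, not Avast's actual settings:

```python
import difflib
import re

# Illustrative sample of target-site tokens, as on the slide.
TARGET_TOKENS = ["adobe", "alibaba", "aliexpress", "coinsbank", "paypal"]

def fishy_tokens(url, cutoff=0.8):
    """Return (url_token, matched_target) pairs for near-miss tokens,
    so 'paypa1' still matches 'paypal'."""
    tokens = [t for t in re.split(r"[^a-z0-9]+", url.lower()) if t]
    hits = []
    for tok in tokens:
        close = difflib.get_close_matches(tok, TARGET_TOKENS,
                                          n=1, cutoff=cutoff)
        if close:
            hits.append((tok, close[0]))
    return hits
```

Short constructs like `.com-` stay on the exact-match path, since approximate matching on strings under 5 characters would fire on almost anything.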
74. Detecting the points
• Interesting points: points, edges, blobs
• Many methods exist to detect interesting points in a picture: corner detectors, edge detectors
• Examples of the points
74
75. Describing the points
• Abstract (mathematical) representation of the detected points
• Extract the patches
• FAST, SIFT, ORB, ...
75
77. Matching and verification
- Find the matrix transforming the points from one picture to the other
- A transformation hypothesis is determined by 4 points; we sample them and determine the transformation with RANdom SAmple Consensus (RANSAC)
Credits: Scikit-learn
78. Matching and verification
- Evaluate how reasonable the transformation is
- valid perspective transformation
- thresholds on scale, rotation, shear, translation
78
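The sample-fit-count loop of RANSAC can be sketched on a toy problem. The real pipeline fits a perspective transform from 4 point pairs; to keep the sketch short, this illustration fits a 1D affine map y = a·x + b from 2 pairs, but the loop structure is the same:

```python
import random

def ransac_affine1d(pairs, iters=200, tol=0.5, seed=0):
    """RANSAC sketch: sample a minimal set, fit a model, keep the
    hypothesis with the most inliers.

    pairs: list of (x, y) correspondences; returns ((a, b), inliers).
    """
    rng = random.Random(seed)
    best, best_inliers = None, []
    for _ in range(iters):
        (x1, y1), (x2, y2) = rng.sample(pairs, 2)   # minimal sample
        if x1 == x2:
            continue                                # degenerate sample
        a = (y2 - y1) / (x2 - x1)                   # fit y = a*x + b
        b = y1 - a * x1
        inliers = [(x, y) for x, y in pairs
                   if abs(a * x + b - y) <= tol]    # count support
        if len(inliers) > len(best_inliers):
            best, best_inliers = (a, b), inliers
    return best, best_inliers
```

The image-matching version additionally rejects hypotheses whose transform fails the sanity thresholds above (scale, rotation, shear, translation).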
80. Problems
- Until now, the only remaining problem is (small) text blocks
- Text generates a lot of keypoints (high local contrast)
- A text block matches any other text block (the descriptors are all similar)
- Possible solutions
- ignore images with many keypoints
- detect text through high density in a single SIFT octave
- use an external text detector to mask areas
80
81. Conclusion & Future work
- Persuade the phishers that we are accessing their sites from different
locations
- More precise but still fast text detection mechanism
- Make it even faster ;)
81
82. Deep Convolutional Malware Classifiers Can Learn from Raw Executables and Labels Only
Marek Krčál, Avast fellow at the Institute of Computer Science
Joint work with Ondřej Švec, Martin Bálek, and Otakar Jašek
• Can the success of convnets (incl. end-to-end learning) be transferred to malware detection?
• Applied to Windows executables, but the approach is rather domain-agnostic: other file formats and blocking their content in network traffic
93. Detection of Malicious Portable Executables
• Input is 1D – a sequence of bytes (static malware analysis)
– Consists of header, sections, relocation tables, strings
– No global semantics of byte symbols
– Each can appear at an almost arbitrary place – high translational variance
• We use no domain expertise aside from labels
• Only two classes, clean and malware (for simplicity)
97. Dataset – 20 million executables
• Fewer files with compressed/encrypted machine code
• Between 12 kB and 1/2 MB
• Temporal split: training 1.1.2016 – 1.1.2017, validation 1.1.2017 – week 8 of 2017, test week 8 – week 16 of 2017
• Roughly balanced clean and malware classes
102. Architecture
Executable as sequence of N bytes
Fixed Embedding → 8 × N
Conv 32 (stride 4) → 48 × N/4
Conv 32 (stride 4) → 96 × N/16
Max pooling 4 → 96 × N/64
Conv 16 (stride 8) → 128 × N/512
Conv 16 (stride 8) → 192 × N/4096 (= 192 × N/(4·4·4·8·8))
Global Average → 192 (projects the variably-wide matrix to a fixed-sized vector)
Fully Connected layers: 192 → 160 → 128 → 2
Fixed embedding: byte → (±1/16, …, ±1/16) ∈ R⁸ according to the byte’s bits
Power-of-two strides: improve speed and performance; strides 3, 5, 7, 9 instead of 4, 4, 8, 8 harm performance by 6-10%
Training: 7-fold influence of clean files on the loss, mild weight decay, Adam, …
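The fixed embedding at the bottom of the network can be sketched directly. The bit order (least-significant bit first) is an assumption, since the slide only specifies ±1/16 entries according to the byte's bits:

```python
def embed_byte(b):
    """Map a byte to a vector in R^8 with entries ±1/16:
    bit 1 -> +1/16, bit 0 -> -1/16 (LSB-first order is an assumption)."""
    return [(1/16 if (b >> i) & 1 else -1/16) for i in range(8)]

def embed_executable(data: bytes):
    """8 x N embedding of a raw executable: one 8-vector per byte."""
    return [embed_byte(b) for b in data]
```

Because the embedding is fixed rather than learned, the network starts from a lossless, scale-bounded view of the raw bytes, and the convolutions above it do all the learning.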
110. Evaluation
• Evaluation in the regime of low false positives: formally, the area under the receiver operating characteristic (ROC) curve – True Positive Rate vs. False Positive Rate – restricted to [0, 0.001]: AUC|<0.001
• Evaluation score matters: Global Average (instead of Max), strong emphasis on clean files, …
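AUC|<0.001 can be sketched from a list of ROC points by trapezoidal integration clipped at FPR = 0.001. Normalising by the interval length, so a perfect classifier scores 1, is an assumption about the exact convention:

```python
def auc_restricted(fprs, tprs, max_fpr=1e-3):
    """Area under the ROC curve restricted to [0, max_fpr].

    fprs, tprs: ROC points sorted by increasing false-positive rate.
    Returns the clipped trapezoidal area, normalised to [0, 1].
    """
    area = 0.0
    for (f0, t0), (f1, t1) in zip(zip(fprs, tprs),
                                  zip(fprs[1:], tprs[1:])):
        if f0 >= max_fpr:
            break                       # past the region of interest
        f1c = min(f1, max_fpr)          # clip segment at max_fpr
        # interpolate the TPR at the clipped right edge
        t1c = t0 + (t1 - t0) * ((f1c - f0) / (f1 - f0)) if f1 > f0 else t1
        area += (f1c - f0) * (t0 + t1c) / 2
    return area / max_fpr
```

A random classifier scores about max_fpr/2 under this normalisation, which is why restricted AUC separates low-false-positive detectors far better than plain AUC.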
117. Automatic vs. hand-crafted features
550 hand-crafted in-house features (last year @MLP) fed to a 5-layer feedforward net (same set of samples):

AUC|<0.001
convolution features   70.4 ± 0.5%
hand-crafted features  73.2 ± 2.3%  (similar in accuracy and cross-entropy)
ensembled features     76.1 ± 1.0%  (much better accuracy and cross-entropy)

Convnets are slightly below Avast’s know-how, but already good at feature enrichment
– Dataset easier for convnets
+ Improvement potential, transferable to other domains
130. Byte-level explanations: Guided Backprop
• Unusual imported functions
• Header of an embedded PE
• “VERSION_INFO” with a fake vendor and software name
132. Future work
• Improve speed (separable convolutions, mixture of experts)
• More diverse and larger dataset
• Apply to (other file types relevant for) network traffic
Questions?