A Scalable Approach for Malware Detection through Bounded Feature Space Behavior Modeling

Mahinthan Chandramohan, Tan Hee Beng Kuan, Lionel Briand, Shar Lwin Khin, and Bindu Madhavi Padmanabhuni

Interdisciplinary Centre for ICT Security, Reliability, and Trust, University of Luxembourg, Luxembourg

School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore

What is malware?

Malware (malicious + software) is simply software that does malicious things without the victim's knowledge.

Motivation

• More than 403 million new malware variants were created in 2011, a 41% increase over 2010.
• On average, around 55,000 new malware samples were reported per day.
• Exponential growth of malware is a major threat in the software industry.

Problem Definition 1/2

• New malware has become very sophisticated.
• Malware evades traditional anti-virus signatures using various obfuscation techniques.
• Malware authors change the syntactic characteristics (i.e., structure) of a malicious program without changing its semantics (i.e., behavior).

Problem Definition 2/2

• Scalability is a major problem in existing behavior-based malware detection techniques:
  • The malware feature space grows in proportion to the number of samples under examination.
  • They are computationally very intensive.

Related Work 1/2

• The practicality and efficiency of behavior-based malware detection depend on:
  • size of the feature space,
  • computational complexity,
  • overheads (e.g., pre-processing),
  • detection accuracy.
• Simple malware behavior models (e.g., n-gram, m-bag, and k-tuple) generate huge feature spaces and require various pruning and parameter-tuning mechanisms.

Related Work 2/2

• Complex malware behavior models (e.g., system call dependency graphs) are highly computationally intensive.

Behavior Modeling – An Overview

• Software programs perform actions on various operating system resources.
• An action corresponds to a higher-level operation (e.g., reading a file) composed of a set of related system calls (e.g., NtReadFile).
• The advantage of using actions over system calls is that the OS may use different names for system calls that in fact serve the same purpose.
• For example, NtCreateProcess and NtCreateProcessEx both map to the CreateProcess action.
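
The system-call-to-action mapping described on this slide can be sketched as a plain lookup table. This is a minimal illustration, not the authors' implementation: the table holds only the system calls named in the deck, the pairing of NtReadFile with a ReadFile action is an assumption based on the action names used elsewhere in the slides, and the choice to skip unmapped calls is hypothetical.

```python
# Hypothetical lookup table: only the system calls named in the slides are listed.
# A real table would cover every monitored call.
SYSCALL_TO_ACTION = {
    "NtCreateProcess": "CreateProcess",
    "NtCreateProcessEx": "CreateProcess",
    "NtReadFile": "ReadFile",  # assumed pairing
}


def to_actions(syscall_trace):
    """Translate a list of system call names into action names.

    Calls that are missing from the table are skipped.
    """
    return [SYSCALL_TO_ACTION[name] for name in syscall_trace if name in SYSCALL_TO_ACTION]


trace = ["NtCreateProcess", "NtReadFile", "NtCreateProcessEx", "NtClose"]
actions = to_actions(trace)
print(actions)  # ['CreateProcess', 'ReadFile', 'CreateProcess']
```

Both process-creation calls collapse to the same action name, which is the point of working at the action level.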

Operating System Resource Types

• File System
• Registry
• Process and Thread
• Network
• Synchronization
• Section

Bounded Feature Space Behavior Modeling (BOFM)

Malware feature

For each type of OS resource, the set of actions performed by malware on an instance of that resource type constitutes a feature of the malware.

• Example: A malware sample performs the CreateFile and DeleteFile actions on a file instance C:\foo.exe, and the DeleteFile action on another file instance C:\abc.dll. This malware has two features, {CreateFile, DeleteFile} and {DeleteFile}, with respect to the file resource instances C:\foo.exe and C:\abc.dll, respectively.
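
The example above can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the input format (a list of action and resource-instance pairs) and the function name `features_by_instance` are hypothetical, and the file paths are written with backslashes.

```python
from collections import defaultdict


def features_by_instance(events):
    """Group observed actions by the resource instance they touch.

    `events` is an iterable of (action, resource_instance) pairs.
    Returns {resource_instance: frozenset of actions}.
    """
    grouped = defaultdict(set)
    for action, instance in events:
        grouped[instance].add(action)
    return {instance: frozenset(actions) for instance, actions in grouped.items()}


events = [
    ("CreateFile", r"C:\foo.exe"),
    ("DeleteFile", r"C:\foo.exe"),
    ("DeleteFile", r"C:\abc.dll"),
]
per_instance = features_by_instance(events)
for instance, feature in per_instance.items():
    print(instance, sorted(feature))
```

The two printed action sets correspond to the two features in the example, one per file instance.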

Properties of BOFM features 1/2

Goal: To be more resilient to commonly used obfuscation techniques.

• Property 1: Regardless of the number of times an action is performed on an OS resource instance, it is considered only once in the final feature set.
  E.g., if the ReadFile action is performed several times on a file instance C:\Windows\...\sysfile2.dll, this behavior is modeled by the BOFM feature {ReadFile}.
• Property 2: The sequence in which the actions are performed by malware is ignored in feature construction.
  E.g., the malware features {ReadFile, QueryFileInformation} and {QueryFileInformation, ReadFile} are considered identical.

Properties of BOFM features 2/2

• Property 3: Identical action sets performed on two different OS resource instances of the same type are modeled as a single feature.
  E.g., the actions CreateFile and DeleteFile performed on two different file resource instances, C:\Windows\abc.dll and D:\Personel\foo.exe, are modeled as a single BOFM feature {CreateFile, DeleteFile}.
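
All three properties fall out naturally if a sample's features are stored as a set of action sets. The sketch below is one possible encoding, not the authors' code: the function name `bofm_features`, the event format, and the shortened file names are assumptions made for illustration.

```python
def bofm_features(events):
    """Return the set of distinct action sets seen in one sample.

    `events` is an iterable of (action, resource_instance) pairs, all for one
    resource type. Resource instance names are used only for grouping and are
    dropped from the result.
    """
    per_instance = {}
    for action, instance in events:
        per_instance.setdefault(instance, set()).add(action)
    return {frozenset(actions) for actions in per_instance.values()}


# Property 1: repeated actions on the same instance count once.
repeated = [("ReadFile", "a.dll")] * 3
assert bofm_features(repeated) == {frozenset({"ReadFile"})}

# Property 2: the order of actions does not matter.
forward = [("ReadFile", "a.dll"), ("QueryFileInformation", "a.dll")]
assert bofm_features(forward) == bofm_features(forward[::-1])

# Property 3: identical action sets on different instances give one feature.
two_files = [
    ("CreateFile", "abc.dll"), ("DeleteFile", "abc.dll"),
    ("CreateFile", "foo.exe"), ("DeleteFile", "foo.exe"),
]
assert bofm_features(two_files) == {frozenset({"CreateFile", "DeleteFile"})}
print(len(bofm_features(two_files)))  # 1
```

Because instance names never reach the output, renaming or multiplying files cannot create new features.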

Bounded Feature Space

Goal: Avoid malware feature space growth proportional to the number of samples under examination.

• Let j denote an OS resource type.
• The total number k_j of possible actions that malware may perform on an OS resource instance of type j is a constant.
• The maximum number m_j of possible features with regard to OS resource type j is therefore also a constant.
• The maximum number N of possible features over all resource types is therefore always a constant.
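
The bound can be made explicit under one assumption that the text here does not spell out: that any non-empty subset of the k_j actions of resource type j can occur as a feature. With n resource types (six in this deck), that reading gives:

```latex
m_j = 2^{k_j} - 1, \qquad N = \sum_{j=1}^{n} m_j = \sum_{j=1}^{n} \left( 2^{k_j} - 1 \right)
```

Both quantities depend only on the action counts k_j, not on the number of samples examined.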

OS Resource Types and Corresponding Actions

The total number of malware features (N) extracted from these six OS resource types is 16,652.

Model Construction Work Flow

Example feature vector
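
One way to turn a sample's features into a fixed-length vector is to give every possible feature a fixed position and mark the ones the sample exhibits. The sketch below assumes a binary encoding and an index built from all non-empty action subsets per resource type. Both are assumptions, as are the shortened action lists in `ACTIONS`, which stand in for the real per-type action tables.

```python
from itertools import combinations

# Hypothetical action lists; the real per-type action tables are not reproduced here.
ACTIONS = {
    "file": ["CreateFile", "DeleteFile", "ReadFile"],
    "registry": ["DeleteKey", "NotifyChangeKey"],
}


def build_index(actions_by_type):
    """Assign a fixed position to every non-empty action set of every type."""
    index = {}
    for rtype in sorted(actions_by_type):
        actions = sorted(actions_by_type[rtype])
        for size in range(1, len(actions) + 1):
            for subset in combinations(actions, size):
                index[(rtype, frozenset(subset))] = len(index)
    return index


def to_vector(sample_features, index):
    """Encode a sample as a 0/1 list: 1 where the sample exhibits that feature."""
    vector = [0] * len(index)
    for key in sample_features:
        vector[index[key]] = 1
    return vector


INDEX = build_index(ACTIONS)
sample = {
    ("file", frozenset({"CreateFile", "DeleteFile"})),
    ("file", frozenset({"DeleteFile"})),
    ("registry", frozenset({"NotifyChangeKey"})),
}
vector = to_vector(sample, INDEX)
print(len(vector), sum(vector))  # 10 3
```

The vector length is fixed by the action lists alone (7 file features plus 3 registry features here), so adding more samples never changes it.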

Detection Method

• Machine learning (ML) classification techniques are used to build malware detection models.
• Logistic Regression (LR) and Support Vector Machine (SVM) are used in our experiments.
• The malware detection process involves two phases:
  • Phase 1: model building
  • Phase 2: model evaluation
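
The two phases can be illustrated with a small, dependency-free logistic regression. The deck does not name a toolkit or any hyperparameters, so everything below is an assumption made for illustration: the hand-written gradient descent, the learning rate, the epoch count, and the invented toy feature vectors. SVM is not shown.

```python
import math


def train_logistic_regression(X, y, learning_rate=0.5, epochs=500):
    """Phase 1: fit weights and a bias with batch gradient descent."""
    n_features = len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for row, label in zip(X, y):
            z = bias + sum(w * v for w, v in zip(weights, row))
            error = 1.0 / (1.0 + math.exp(-z)) - label
            for i, v in enumerate(row):
                grad_w[i] += error * v
            grad_b += error
        weights = [w - learning_rate * g / len(X) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(X)
    return weights, bias


def predict(model, row):
    """Return 1 (malware) if the linear score is positive, else 0 (benign)."""
    weights, bias = model
    return 1 if bias + sum(w * v for w, v in zip(weights, row)) > 0 else 0


def evaluate(model, X, y):
    """Phase 2: fraction of held-out samples classified correctly."""
    return sum(predict(model, row) == label for row, label in zip(X, y)) / len(y)


# Toy data, invented for illustration: feature 0 occurs only in malware (label 1),
# feature 2 only in benign samples (label 0), feature 1 in both.
X_train = [[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1]]
y_train = [1, 1, 0, 0]
X_test = [[1, 1, 0], [0, 1, 1]]
y_test = [1, 0]

model = train_logistic_regression(X_train, y_train)
print(evaluate(model, X_test, y_test))
```

Keeping training and evaluation in separate functions mirrors the two-phase split: the model is fitted once on the training set and then scored on samples it has not seen.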

Experimental Dataset

• Training set of 5000 malware and 80 benign samples, and a test set of 300 malware and 20 benign samples.

Experimental Results

• SVM achieved 99.4% detection accuracy with no false positives, and LR achieved 99.6% detection accuracy with a 1% false positive (FP) rate.
• Balanced test sets consist of 20 malware samples randomly selected from the pool of 300 and the 20 benign samples.
• On the balanced test sets, SVM yielded a perfect accuracy of 100% with a 0% FP rate, and LR achieved 99.5% detection accuracy with a 1% FP rate.
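
For readers who want the two reported metrics pinned down, the sketch below computes accuracy and FP rate from confusion-matrix counts using their standard definitions. The counts in the usage lines are illustrative assumptions, chosen because 318/320 = 0.99375 is close to the reported 99.4%; the slides do not give the underlying counts.

```python
def detection_metrics(tp, fn, tn, fp):
    """Return (accuracy, false positive rate) from confusion-matrix counts.

    tp/fn: malware samples flagged / missed; tn/fp: benign samples passed / flagged.
    """
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    fp_rate = fp / (fp + tn)
    return accuracy, fp_rate


# Illustrative counts for a 300-malware / 20-benign test set: 2 missed malware
# samples and no false positives. These counts are an assumption, not reported data.
accuracy, fp_rate = detection_metrics(tp=298, fn=2, tn=20, fp=0)
print(accuracy, fp_rate)  # 0.99375 0.0
```

Note that the FP rate is computed over benign samples only, so with 20 benign samples a single false alarm already moves it by five percentage points.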

Comparison with Canali et al. (ISSTA 2012)

• Both achieve 99% detection accuracy.
• However:
  • BOFM generated only 569 active features, whereas Canali et al. generated several million.
  • Extracting malware features took 1.67 hours with BOFM, versus around 48 hours for Canali et al.
  • Training the SVM classifier took 26 seconds and consumed only 200 MB of RAM, whereas Canali et al.'s approach consumed more than 1 GB of RAM to perform signature matching.
  • BOFM is much more efficient and scalable.

Conclusion

• Malware evades traditional anti-virus signatures using various obfuscation techniques.
• Behavior-based malware detection is an increasingly common solution.
• Scalability is a major problem in existing behavior-based malware detection techniques.
• We proposed a bounded feature space behavior modeling (BOFM) technique to address the scalability issue.
• BOFM entails a fixed number of features that does not grow in proportion to the number of malware samples under examination.
• Benchmark: BOFM combined with SVM achieved 100% detection accuracy in less than a minute, using 200 MB of memory.

Feature Space Analysis

• Comparison of malware and benign feature spaces
• The 57% share of unique malware features suggests that BOFM is a promising technique for modeling malware behavior.

Brief Analysis of Interesting Features

• The ‘NotifyChangeKey’ action is used far more widely by malware samples than by benign samples (86% vs. 15%).
• The ‘DeleteKey’ action is also widely used by malware samples.
• Actions such as ‘OpenFile’, ‘GetFileAttributes’, ‘CreateMutex’, and ‘ReleaseMutex’ appear widely in both malware and benign samples.
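
Per-action comparisons like the 86% vs. 15% figure can be computed as the fraction of samples in which an action appears at least once. The sketch below assumes each sample is represented as a list of action sets. The function name `action_prevalence` and the toy samples are invented for illustration and are not the experimental data.

```python
def action_prevalence(samples, action):
    """Fraction of samples in which `action` occurs in at least one feature.

    Each sample is a collection of features; each feature is a set of action names.
    """
    hits = sum(any(action in feature for feature in sample) for sample in samples)
    return hits / len(samples)


# Invented toy samples, not the experimental data.
malware = [
    [{"NotifyChangeKey", "DeleteKey"}, {"OpenFile"}],
    [{"NotifyChangeKey"}],
    [{"OpenFile", "GetFileAttributes"}],
    [{"NotifyChangeKey"}, {"CreateMutex", "ReleaseMutex"}],
]
benign = [
    [{"OpenFile"}, {"CreateMutex", "ReleaseMutex"}],
    [{"OpenFile", "GetFileAttributes"}],
    [{"NotifyChangeKey"}],
    [{"OpenFile"}],
]

print(action_prevalence(malware, "NotifyChangeKey"))  # 0.75
print(action_prevalence(benign, "NotifyChangeKey"))   # 0.25
```

An action with a large gap between the two fractions is a candidate discriminating signal, while one that is common in both groups carries little information on its own.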
