A Scalable Approach for Malware Detection through Bounded Feature Space Behavior Modeling

Mahinthan Chandramohan, Tan Hee Beng Kuan, Lionel Briand, Shar Lwin Khin, and Bindu Madhavi Padmanabhuni

Interdisciplinary Centre for ICT Security, Reliability, and Trust, University of Luxembourg, Luxembourg
School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore
  
What is malware?

Malware (malicious + software) is software that performs malicious actions without the victim’s knowledge.
  
Motivation

• More than 403 million new malware variants were created in 2011, a 41% increase over 2010.
• On average, around 55,000 new malware samples were reported per day.
• The exponential growth of malware is a major threat to the software industry.
  
Problem Definition 1/2

• New malware has become very sophisticated.
• Malware evades traditional anti-virus signatures by using various obfuscation techniques.
• Malware authors change the syntactic characteristics (i.e., structure) of a malicious program without changing its semantics (i.e., behavior).
  
Problem Definition 2/2

• Scalability is a major problem in existing behavior-based malware detection techniques:
  • The malware feature space grows in proportion to the number of samples under examination.
  • The techniques are computationally very intensive.
  
Related Work 1/2

• The practicality and efficiency of behavior-based malware detection depend on:
  • size of the feature space,
  • computational complexity,
  • overheads (e.g., pre-processing),
  • detection accuracy.
• Simple malware behavior models (e.g., n-gram, m-bag, and k-tuple) generate huge feature spaces and require various pruning and parameter-tuning mechanisms.
  
Related Work 2/2

• Complex malware behavior models (e.g., system call dependency graphs) are highly computationally intensive.
  
Behavior Modeling – An Overview

• Software programs perform actions on various operating system resources.
• An action corresponds to a higher-level operation (e.g., reading a file) composed of a set of related system calls (e.g., NtReadFile).
• The advantage of using actions over system calls is that the OS may use different names for system calls that in fact serve the same purpose.
• For example, NtCreateProcess and NtCreateProcessEx both map to the CreateProcess action.
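The mapping from system calls to actions can be pictured as a simple lookup. The sketch below is only an illustration: the table name `SYSCALL_TO_ACTION` and the helper `to_actions` are hypothetical, and the `NtReadFile` to `ReadFile` entry is an assumption (the slides only say that reading a file involves calls such as NtReadFile). Only the two NtCreateProcess entries are taken directly from the slide.

```python
# Hypothetical lookup table: only the two NtCreateProcess* entries come from
# the slides; the NtReadFile -> ReadFile entry is an assumed illustration.
SYSCALL_TO_ACTION = {
    "NtCreateProcess": "CreateProcess",
    "NtCreateProcessEx": "CreateProcess",
    "NtReadFile": "ReadFile",
}


def to_actions(syscalls):
    """Translate a system call trace into actions, skipping unmapped calls."""
    return [SYSCALL_TO_ACTION[name] for name in syscalls if name in SYSCALL_TO_ACTION]


trace = ["NtCreateProcess", "NtReadFile", "NtCreateProcessEx", "NtClose"]
actions = to_actions(trace)
print(actions)
```

Both process-creation calls collapse to the same action name, so a model built on actions does not distinguish between them.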
  
Operating System Resource Types

• File System
• Registry
• Process and Thread
• Network
• Synchronization
• Section
  
	
  
Bounded Feature Space Behavior Modeling (BOFM)

Malware feature

For each type of OS resource, the set of actions performed by the malware on an instance of that resource type constitutes a feature of the malware.

• Example: the malware performs the CreateFile and DeleteFile actions on the file instance C:\foo.exe, and the DeleteFile action on another file instance C:\abc.dll. This malware has two features, {CreateFile, DeleteFile} and {DeleteFile}, with respect to the file resource instances C:\foo.exe and C:\abc.dll, respectively.
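The file example above can be traced in a few lines of Python. This is a minimal sketch, not the authors' implementation: the event format `(resource_type, instance, action)` and the function name `extract_features` are assumptions made for illustration.

```python
from collections import defaultdict


def extract_features(events):
    """Group actions by (resource type, instance); each group's action set is a feature."""
    actions_by_instance = defaultdict(set)
    for resource_type, instance, action in events:
        actions_by_instance[(resource_type, instance)].add(action)
    return {key: frozenset(actions) for key, actions in actions_by_instance.items()}


# The trace from the slide's example (event tuple layout is an assumption).
events = [
    ("file", r"C:\foo.exe", "CreateFile"),
    ("file", r"C:\foo.exe", "DeleteFile"),
    ("file", r"C:\abc.dll", "DeleteFile"),
]
features = extract_features(events)
for (resource_type, instance), feature in features.items():
    print(resource_type, instance, sorted(feature))
```

The two file instances yield the two features named on the slide, {CreateFile, DeleteFile} and {DeleteFile}.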
  
Properties of BOFM features 1/2

Goal: To be more resilient to commonly used obfuscation techniques

• Property 1: Regardless of the number of times an action is performed on an OS resource instance, it is considered only once in the final feature set.
  E.g., if the ReadFile action is performed several times on the file instance C:\Windows\...\sysfile2.dll, this behavior is modeled by the BOFM feature {ReadFile}.
• Property 2: The sequence in which the malware performs the actions is ignored in feature construction.
  E.g., the malware features {ReadFile, QueryFileInformation} and {QueryFileInformation, ReadFile} are considered identical.
  
Properties of BOFM features 2/2

• Property 3: Identical action sets performed on two different OS resource instances of the same type are modeled as a single feature.
  E.g., the actions CreateFile and DeleteFile performed on two different file resource instances, C:\Windows\abc.dll and D:\Personel\foo.exe, are modeled as a single BOFM feature {CreateFile, DeleteFile}.
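The three properties fall out naturally if a feature is stored as an unordered set of action names and the instance name is discarded once the set is built. The sketch below assumes that representation; the function name `bofm_features` and the event tuple layout are hypothetical, not taken from the slides.

```python
def bofm_features(events):
    """Return {resource_type: set of features}, a feature being a frozenset of actions."""
    per_instance = {}
    for resource_type, instance, action in events:
        # A set absorbs repeats (Property 1) and has no order (Property 2).
        per_instance.setdefault((resource_type, instance), set()).add(action)
    features = {}
    for (resource_type, _instance), actions in per_instance.items():
        # Instance names are dropped, so equal action sets merge (Property 3).
        features.setdefault(resource_type, set()).add(frozenset(actions))
    return features


events = [
    ("file", r"C:\Windows\abc.dll", "CreateFile"),
    ("file", r"C:\Windows\abc.dll", "DeleteFile"),
    ("file", r"D:\Personel\foo.exe", "DeleteFile"),   # different order
    ("file", r"D:\Personel\foo.exe", "CreateFile"),
    ("file", r"D:\Personel\foo.exe", "CreateFile"),   # repeated action
]
result = bofm_features(events)
print(result)
```

Despite the repeated action, the different ordering, and the two distinct file instances, the sample ends up with a single file feature, {CreateFile, DeleteFile}.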
  
Bounded Feature Space

Goal: Avoid malware feature space growth proportional to the number of samples under examination

• Let j be an OS resource type, where [definition image not preserved].
• The total number k_j of possible actions that malware may perform on an OS resource instance of type j is a constant.
• The maximum number m_j of possible features with regard to OS resource type j is also a constant, where [formula image not preserved].
• The maximum number N of possible features over all resource types is always the following constant: [formula image not preserved].
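The formula images on this slide did not survive extraction. A plausible reconstruction, offered as an assumption rather than as the authors' exact notation, follows from the definition of a feature as a set of actions: if a feature is any non-empty subset of the k_j actions available for resource type j, then the number of distinct features per type and in total would be:

```latex
% Assumed reconstruction: a feature is any non-empty subset of the k_j actions.
m_j = 2^{k_j} - 1
\qquad\text{and}\qquad
N = \sum_{j} m_j = \sum_{j} \left( 2^{k_j} - 1 \right)
```

Under this reading, N depends only on the per-type action counts k_j and not on how many samples are analyzed, which is what makes the feature space bounded.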
  
OS Resource Types and Corresponding Actions

The total number of malware features (N) extracted from these six OS resource types is 16,652.
  
Model Construction Work Flow
Example feature vector
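The figure for this slide is not preserved in the text. As a rough idea of what such a vector could look like, the sketch below assumes a binary presence encoding over a fixed enumeration of all non-empty action subsets. The helper names `all_features` and `to_vector`, the three-action list, and the encoding itself are assumptions for illustration only.

```python
from itertools import combinations


def all_features(actions):
    """Enumerate every non-empty subset of actions in a fixed, repeatable order."""
    ordered = sorted(actions)
    return [
        frozenset(subset)
        for size in range(1, len(ordered) + 1)
        for subset in combinations(ordered, size)
    ]


def to_vector(sample_features, feature_index):
    """Binary vector: 1 where the sample exhibits the indexed feature, else 0."""
    return [1 if feature in sample_features else 0 for feature in feature_index]


# Hypothetical, shortened action list for the file resource type.
file_actions = ["CreateFile", "DeleteFile", "ReadFile"]
feature_index = all_features(file_actions)

sample = {frozenset({"CreateFile", "DeleteFile"}), frozenset({"DeleteFile"})}
vector = to_vector(sample, feature_index)
print(len(feature_index), vector)
```

With three actions the index has seven entries, and every sample maps to a vector of that same length no matter how many samples are processed.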
  
	
  
	
  
	
  
	
  
	
  
	
  
Detection Method

• Machine learning (ML) classification techniques are used to build the malware detection models.
• Logistic Regression (LR) and Support Vector Machine (SVM) classifiers are used in our experiments.
• The malware detection process involves two phases:
  • Phase 1: model building
  • Phase 2: model evaluation
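The two phases can be sketched with a tiny logistic regression written in plain Python. This is a stand-in for illustration only: the slides do not say which toolkit or training procedure was used, the SVM is not shown, and the feature vectors below are made up rather than taken from the experiments.

```python
import math


def predict_probability(weights, bias, x):
    """Estimated probability that feature vector x belongs to malware (label 1)."""
    z = bias + sum(w * value for w, value in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(vectors, labels, epochs=500, learning_rate=0.5):
    """Fit weights and bias with plain batch gradient descent on the log loss."""
    n_features = len(vectors[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(vectors, labels):
            error = predict_probability(weights, bias, x) - y
            for i, value in enumerate(x):
                grad_w[i] += error * value
            grad_b += error
        weights = [w - learning_rate * g / len(vectors) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(vectors)
    return weights, bias


# Phase 1 (model building) on made-up binary feature vectors; 1 = malware.
train_x = [[1, 1, 0, 0], [1, 0, 1, 0], [0, 0, 0, 1], [0, 0, 1, 1]]
train_y = [1, 1, 0, 0]
weights, bias = train_logistic_regression(train_x, train_y)

# Phase 2 (model evaluation) on held-out made-up vectors.
test_x = [[1, 1, 1, 0], [0, 0, 0, 1]]
predictions = [int(predict_probability(weights, bias, x) >= 0.5) for x in test_x]
print(predictions)
```

Because the input dimension is fixed by the bounded feature space, the model size stays the same as more samples are added to the training set.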
  	
  
	
  
Experimental Dataset

• A training set of 5000 malware and 80 benign samples, and a test set of 300 malware and 20 benign samples.
  
Experimental Results

• SVM achieved 99.4% detection accuracy with no false positives, and LR achieved 99.6% detection accuracy with a 1% false positive (FP) rate.
• Each balanced test set consists of 20 malware samples randomly selected from a pool of 300 samples, plus the 20 benign samples.
• On the balanced test sets, SVM yielded a perfect accuracy of 100% with a 0% FP rate, and LR achieved 99.5% detection accuracy with a 1% FP rate.
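For readers who want to see how such figures are computed, here is a small sketch under stated assumptions: "detection accuracy" is taken to mean the fraction of all samples classified correctly, and the FP rate the fraction of benign samples flagged as malware. The slides do not define either metric, and the labels below are made up, not the reported results.

```python
def detection_metrics(true_labels, predicted_labels):
    """Return (accuracy, false positive rate); label 1 = malware, 0 = benign."""
    pairs = list(zip(true_labels, predicted_labels))
    correct = sum(1 for truth, guess in pairs if truth == guess)
    false_positives = sum(1 for truth, guess in pairs if truth == 0 and guess == 1)
    benign_total = sum(1 for truth, _ in pairs if truth == 0)
    accuracy = correct / len(pairs)
    fp_rate = false_positives / benign_total if benign_total else 0.0
    return accuracy, fp_rate


# Made-up outcome for one balanced test set: 20 malware followed by 20 benign,
# with a single benign sample wrongly flagged as malware.
truth = [1] * 20 + [0] * 20
guess = [1] * 20 + [1] + [0] * 19
accuracy, fp_rate = detection_metrics(truth, guess)
print(accuracy, fp_rate)
```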
  
Comparison with Canali et al. (ISSTA 2012)

• Both achieve 99% detection accuracy.
• However:
  • BOFM generated only 569 active features, whereas Canali et al. generated several million.
  • Extracting malware features took 1.67 hours with BOFM, whereas Canali et al. took around 48 hours.
  • Training the SVM classifier took 26 seconds and consumed only 200 MB of RAM, whereas Canali et al.’s approach consumed more than 1 GB of RAM to perform signature matching.
  • BOFM is much more efficient and scalable.
  
Conclusion

• Malware evades traditional anti-virus signatures by using various obfuscation techniques.
• Behavior-based malware detection is an increasingly common solution.
• Scalability is a major problem in existing behavior-based malware detection techniques.
• We proposed a bounded feature space behavior modeling (BOFM) technique to address the scalability issue.
• BOFM entails a fixed number of features that does not grow in proportion to the number of malware samples under examination.
• Benchmark: BOFM combined with SVM achieved 100% detection accuracy in less than a minute, using 200 MB of memory.
  
Feature Space Analysis

• Comparison of malware and benign feature spaces
• The 57% share of unique malware features suggests that BOFM is a promising technique for modeling malware behavior.
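One way to read the 57% figure is as the share of malware features that never appear in the benign feature space; the slides do not spell out the definition, so treat this as an assumption. The sketch below computes that share for two made-up feature spaces; the function name `unique_share` is hypothetical.

```python
def unique_share(malware_features, benign_features):
    """Fraction of malware features that never occur in the benign feature space."""
    if not malware_features:
        return 0.0
    unique = malware_features - benign_features
    return len(unique) / len(malware_features)


# Made-up feature spaces; each feature is a frozenset of action names.
malware_space = {
    frozenset({"DeleteKey"}),
    frozenset({"NotifyChangeKey", "DeleteKey"}),
    frozenset({"OpenFile"}),
    frozenset({"CreateMutex", "ReleaseMutex"}),
}
benign_space = {
    frozenset({"OpenFile"}),
    frozenset({"CreateMutex", "ReleaseMutex"}),
    frozenset({"GetFileAttributes"}),
}
share = unique_share(malware_space, benign_space)
print(share)
```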
  	
  
Brief Analysis of Interesting Features

• The ‘NotifyChangeKey’ action is used far more widely by malware samples than by benign samples (86% vs. 15%).
• The ‘DeleteKey’ action is also widely used by malware samples.
• Actions such as ‘OpenFile’, ‘GetFileAttributes’, ‘CreateMutex’, and ‘ReleaseMutex’ appeared widely in both malware and benign samples.
  

Weitere ähnliche Inhalte

Andere mochten auch

Andere mochten auch (20)

Qué es el marketing con artículos
Qué es el marketing con artículosQué es el marketing con artículos
Qué es el marketing con artículos
 
Company Credit Reports Europe
Company Credit Reports EuropeCompany Credit Reports Europe
Company Credit Reports Europe
 
Chapter 2
Chapter 2Chapter 2
Chapter 2
 
Posicionadores de botellas GMS
Posicionadores de botellas GMSPosicionadores de botellas GMS
Posicionadores de botellas GMS
 
Qué es la serigrafía dany
Qué es la serigrafía danyQué es la serigrafía dany
Qué es la serigrafía dany
 
Daves Resume Jan 2014
Daves Resume Jan 2014Daves Resume Jan 2014
Daves Resume Jan 2014
 
SCR Heatime HR System
SCR Heatime HR SystemSCR Heatime HR System
SCR Heatime HR System
 
Social Media_bei_SWISS
Social Media_bei_SWISSSocial Media_bei_SWISS
Social Media_bei_SWISS
 
Folleto pueblos originarios
Folleto pueblos originariosFolleto pueblos originarios
Folleto pueblos originarios
 
BCEngagement survey summary
BCEngagement survey summaryBCEngagement survey summary
BCEngagement survey summary
 
Mias internet
Mias internetMias internet
Mias internet
 
Tie - Presentacion Agencieros
Tie - Presentacion AgencierosTie - Presentacion Agencieros
Tie - Presentacion Agencieros
 
Commercial catalogue
Commercial catalogue Commercial catalogue
Commercial catalogue
 
Quédate en Silencio
Quédate en SilencioQuédate en Silencio
Quédate en Silencio
 
Elvis Collection Compilation
Elvis Collection CompilationElvis Collection Compilation
Elvis Collection Compilation
 
Hoy me amo más
Hoy me amo másHoy me amo más
Hoy me amo más
 
Dossier lcl vodkas and mixers
Dossier lcl vodkas and mixersDossier lcl vodkas and mixers
Dossier lcl vodkas and mixers
 
El agua, fuente de vida
El agua, fuente de vidaEl agua, fuente de vida
El agua, fuente de vida
 
What every attorney needs to know about their clients doing business in Europe
What every attorney needs to know about their clients doing business in EuropeWhat every attorney needs to know about their clients doing business in Europe
What every attorney needs to know about their clients doing business in Europe
 
The polling problem
The polling problemThe polling problem
The polling problem
 

Ähnlich wie A Scalable Approach for Malware Detec2on through Bounded Feature Space Behavior Modeling

CISC 879 - Machine Learning for Solving Systems Problems
CISC 879 - Machine Learning for Solving Systems Problems CISC 879 - Machine Learning for Solving Systems Problems
CISC 879 - Machine Learning for Solving Systems Problems butest
 
Design and Development of an Efficient Malware Detection Using ML
Design and Development of an Efficient Malware Detection Using MLDesign and Development of an Efficient Malware Detection Using ML
Design and Development of an Efficient Malware Detection Using MLSiva krishnam raju Patsamatla
 
First Principles Vulnerability Assessment
First Principles Vulnerability AssessmentFirst Principles Vulnerability Assessment
First Principles Vulnerability AssessmentManuel Brugnoli
 
Symbolic Execution (introduction and hands-on)
Symbolic Execution (introduction and hands-on)Symbolic Execution (introduction and hands-on)
Symbolic Execution (introduction and hands-on)Emilio Coppa
 
Detecting Web Browser Heap Corruption Attacks - Stephan Chenette, Moti Joseph...
Detecting Web Browser Heap Corruption Attacks - Stephan Chenette, Moti Joseph...Detecting Web Browser Heap Corruption Attacks - Stephan Chenette, Moti Joseph...
Detecting Web Browser Heap Corruption Attacks - Stephan Chenette, Moti Joseph...Stephan Chenette
 
Automatic reverse engineering of malware emulators
Automatic reverse engineering of malware emulatorsAutomatic reverse engineering of malware emulators
Automatic reverse engineering of malware emulatorsUltraUploader
 
The Future of Automated Malware Generation
The Future of Automated Malware GenerationThe Future of Automated Malware Generation
The Future of Automated Malware GenerationStephan Chenette
 
Algebraic specification of computer viruses and their environments
Algebraic specification of computer viruses and their environmentsAlgebraic specification of computer viruses and their environments
Algebraic specification of computer viruses and their environmentsUltraUploader
 
A malware detection method for health sensor data based on machine learning
A malware detection method for health sensor data based on machine learningA malware detection method for health sensor data based on machine learning
A malware detection method for health sensor data based on machine learningjaigera
 
Selecting Prominent API Calls and Labeling Malicious Samples for Effective Ma...
Selecting Prominent API Calls and Labeling Malicious Samples for Effective Ma...Selecting Prominent API Calls and Labeling Malicious Samples for Effective Ma...
Selecting Prominent API Calls and Labeling Malicious Samples for Effective Ma...IJCSIS Research Publications
 
Good Security Starts with Software Assurance - Software Assurance Market Plac...
Good Security Starts with Software Assurance - Software Assurance Market Plac...Good Security Starts with Software Assurance - Software Assurance Market Plac...
Good Security Starts with Software Assurance - Software Assurance Market Plac...Phil Agcaoili
 
Full-System Emulation Achieving Successful Automated Dynamic Analysis of Evas...
Full-System Emulation Achieving Successful Automated Dynamic Analysis of Evas...Full-System Emulation Achieving Successful Automated Dynamic Analysis of Evas...
Full-System Emulation Achieving Successful Automated Dynamic Analysis of Evas...Lastline, Inc.
 
DEFCON 21: EDS: Exploitation Detection System WP
DEFCON 21: EDS: Exploitation Detection System WPDEFCON 21: EDS: Exploitation Detection System WP
DEFCON 21: EDS: Exploitation Detection System WPAmr Thabet
 
Obfuscation and Mutation in Malware
Obfuscation and Mutation in Malware Obfuscation and Mutation in Malware
Obfuscation and Mutation in Malware KADARI SHIVRAJ
 
Real-World WebAppSec Flaws - Examples and Countermeasues
Real-World WebAppSec Flaws - Examples and CountermeasuesReal-World WebAppSec Flaws - Examples and Countermeasues
Real-World WebAppSec Flaws - Examples and Countermeasuesvolvent
 
Frankenstein. stitching malware from benign binaries
Frankenstein. stitching malware from benign binariesFrankenstein. stitching malware from benign binaries
Frankenstein. stitching malware from benign binariesYury Chemerkin
 
A hybrid model to detect malicious executables
A hybrid model to detect malicious executablesA hybrid model to detect malicious executables
A hybrid model to detect malicious executablesUltraUploader
 
Inside the Matrix,How to Build Transparent Sandbox for Malware Analysis
Inside the Matrix,How to Build Transparent Sandbox for Malware AnalysisInside the Matrix,How to Build Transparent Sandbox for Malware Analysis
Inside the Matrix,How to Build Transparent Sandbox for Malware AnalysisChong-Kuan Chen
 

Ähnlich wie A Scalable Approach for Malware Detec2on through Bounded Feature Space Behavior Modeling (20)

Model-checking for efficient malware detection
Model-checking for efficient malware detectionModel-checking for efficient malware detection
Model-checking for efficient malware detection
 
CISC 879 - Machine Learning for Solving Systems Problems
CISC 879 - Machine Learning for Solving Systems Problems CISC 879 - Machine Learning for Solving Systems Problems
CISC 879 - Machine Learning for Solving Systems Problems
 
Design and Development of an Efficient Malware Detection Using ML
Design and Development of an Efficient Malware Detection Using MLDesign and Development of an Efficient Malware Detection Using ML
Design and Development of an Efficient Malware Detection Using ML
 
First Principles Vulnerability Assessment
First Principles Vulnerability AssessmentFirst Principles Vulnerability Assessment
First Principles Vulnerability Assessment
 
Symbolic Execution (introduction and hands-on)
Symbolic Execution (introduction and hands-on)Symbolic Execution (introduction and hands-on)
Symbolic Execution (introduction and hands-on)
 
Detecting Web Browser Heap Corruption Attacks - Stephan Chenette, Moti Joseph...
Detecting Web Browser Heap Corruption Attacks - Stephan Chenette, Moti Joseph...Detecting Web Browser Heap Corruption Attacks - Stephan Chenette, Moti Joseph...
Detecting Web Browser Heap Corruption Attacks - Stephan Chenette, Moti Joseph...
 
Automatic reverse engineering of malware emulators
Automatic reverse engineering of malware emulatorsAutomatic reverse engineering of malware emulators
Automatic reverse engineering of malware emulators
 
The Future of Automated Malware Generation
The Future of Automated Malware GenerationThe Future of Automated Malware Generation
The Future of Automated Malware Generation
 
Algebraic specification of computer viruses and their environments
Algebraic specification of computer viruses and their environmentsAlgebraic specification of computer viruses and their environments
Algebraic specification of computer viruses and their environments
 
A malware detection method for health sensor data based on machine learning
A malware detection method for health sensor data based on machine learningA malware detection method for health sensor data based on machine learning
A malware detection method for health sensor data based on machine learning
 
Selecting Prominent API Calls and Labeling Malicious Samples for Effective Ma...
Selecting Prominent API Calls and Labeling Malicious Samples for Effective Ma...Selecting Prominent API Calls and Labeling Malicious Samples for Effective Ma...
Selecting Prominent API Calls and Labeling Malicious Samples for Effective Ma...
 
Good Security Starts with Software Assurance - Software Assurance Market Plac...
Good Security Starts with Software Assurance - Software Assurance Market Plac...Good Security Starts with Software Assurance - Software Assurance Market Plac...
Good Security Starts with Software Assurance - Software Assurance Market Plac...
 
Full-System Emulation Achieving Successful Automated Dynamic Analysis of Evas...
Full-System Emulation Achieving Successful Automated Dynamic Analysis of Evas...Full-System Emulation Achieving Successful Automated Dynamic Analysis of Evas...
Full-System Emulation Achieving Successful Automated Dynamic Analysis of Evas...
 
DEFCON 21: EDS: Exploitation Detection System WP
DEFCON 21: EDS: Exploitation Detection System WPDEFCON 21: EDS: Exploitation Detection System WP
DEFCON 21: EDS: Exploitation Detection System WP
 
Obfuscation and Mutation in Malware
Obfuscation and Mutation in Malware Obfuscation and Mutation in Malware
Obfuscation and Mutation in Malware
 
Real-World WebAppSec Flaws - Examples and Countermeasues
Real-World WebAppSec Flaws - Examples and CountermeasuesReal-World WebAppSec Flaws - Examples and Countermeasues
Real-World WebAppSec Flaws - Examples and Countermeasues
 
MINI PROJECT s.pptx
MINI PROJECT s.pptxMINI PROJECT s.pptx
MINI PROJECT s.pptx
 
Frankenstein. stitching malware from benign binaries
Frankenstein. stitching malware from benign binariesFrankenstein. stitching malware from benign binaries
Frankenstein. stitching malware from benign binaries
 
A hybrid model to detect malicious executables
A hybrid model to detect malicious executablesA hybrid model to detect malicious executables
A hybrid model to detect malicious executables
 
Inside the Matrix,How to Build Transparent Sandbox for Malware Analysis
Inside the Matrix,How to Build Transparent Sandbox for Malware AnalysisInside the Matrix,How to Build Transparent Sandbox for Malware Analysis
Inside the Matrix,How to Build Transparent Sandbox for Malware Analysis
 

Mehr von Lionel Briand

Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalLionel Briand
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLionel Briand
 
Metamorphic Testing for Web System Security
Metamorphic Testing for Web System SecurityMetamorphic Testing for Web System Security
Metamorphic Testing for Web System SecurityLionel Briand
 
Simulator-based Explanation and Debugging of Hazard-triggering Events in DNN-...
Simulator-based Explanation and Debugging of Hazard-triggering Events in DNN-...Simulator-based Explanation and Debugging of Hazard-triggering Events in DNN-...
Simulator-based Explanation and Debugging of Hazard-triggering Events in DNN-...Lionel Briand
 
Fuzzing for CPS Mutation Testing
Fuzzing for CPS Mutation TestingFuzzing for CPS Mutation Testing
Fuzzing for CPS Mutation TestingLionel Briand
 
Data-driven Mutation Analysis for Cyber-Physical Systems
Data-driven Mutation Analysis for Cyber-Physical SystemsData-driven Mutation Analysis for Cyber-Physical Systems
Data-driven Mutation Analysis for Cyber-Physical SystemsLionel Briand
 
Many-Objective Reinforcement Learning for Online Testing of DNN-Enabled Systems
Many-Objective Reinforcement Learning for Online Testing of DNN-Enabled SystemsMany-Objective Reinforcement Learning for Online Testing of DNN-Enabled Systems
Many-Objective Reinforcement Learning for Online Testing of DNN-Enabled SystemsLionel Briand
 
ATM: Black-box Test Case Minimization based on Test Code Similarity and Evolu...
ATM: Black-box Test Case Minimization based on Test Code Similarity and Evolu...ATM: Black-box Test Case Minimization based on Test Code Similarity and Evolu...
ATM: Black-box Test Case Minimization based on Test Code Similarity and Evolu...Lionel Briand
 
Black-box Safety Analysis and Retraining of DNNs based on Feature Extraction ...
Black-box Safety Analysis and Retraining of DNNs based on Feature Extraction ...Black-box Safety Analysis and Retraining of DNNs based on Feature Extraction ...
Black-box Safety Analysis and Retraining of DNNs based on Feature Extraction ...Lionel Briand
 
PRINS: Scalable Model Inference for Component-based System Logs
PRINS: Scalable Model Inference for Component-based System LogsPRINS: Scalable Model Inference for Component-based System Logs
PRINS: Scalable Model Inference for Component-based System LogsLionel Briand
 
Revisiting the Notion of Diversity in Software Testing
Revisiting the Notion of Diversity in Software TestingRevisiting the Notion of Diversity in Software Testing
Revisiting the Notion of Diversity in Software TestingLionel Briand
 
Applications of Search-based Software Testing to Trustworthy Artificial Intel...
Applications of Search-based Software Testing to Trustworthy Artificial Intel...Applications of Search-based Software Testing to Trustworthy Artificial Intel...
Applications of Search-based Software Testing to Trustworthy Artificial Intel...Lionel Briand
 
Autonomous Systems: How to Address the Dilemma between Autonomy and Safety
Autonomous Systems: How to Address the Dilemma between Autonomy and SafetyAutonomous Systems: How to Address the Dilemma between Autonomy and Safety
Autonomous Systems: How to Address the Dilemma between Autonomy and SafetyLionel Briand
 
Mathematicians, Social Scientists, or Engineers? The Split Minds of Software ...
Mathematicians, Social Scientists, or Engineers? The Split Minds of Software ...Mathematicians, Social Scientists, or Engineers? The Split Minds of Software ...
Mathematicians, Social Scientists, or Engineers? The Split Minds of Software ...Lionel Briand
 
Reinforcement Learning for Test Case Prioritization
Reinforcement Learning for Test Case PrioritizationReinforcement Learning for Test Case Prioritization
Reinforcement Learning for Test Case PrioritizationLionel Briand
 
Mutation Analysis for Cyber-Physical Systems: Scalable Solutions and Results ...
Mutation Analysis for Cyber-Physical Systems: Scalable Solutions and Results ...Mutation Analysis for Cyber-Physical Systems: Scalable Solutions and Results ...
Mutation Analysis for Cyber-Physical Systems: Scalable Solutions and Results ...Lionel Briand
 
On Systematically Building a Controlled Natural Language for Functional Requi...
On Systematically Building a Controlled Natural Language for Functional Requi...On Systematically Building a Controlled Natural Language for Functional Requi...
On Systematically Building a Controlled Natural Language for Functional Requi...Lionel Briand
 
Efficient Online Testing for DNN-Enabled Systems using Surrogate-Assisted and...
Efficient Online Testing for DNN-Enabled Systems using Surrogate-Assisted and...Efficient Online Testing for DNN-Enabled Systems using Surrogate-Assisted and...
Efficient Online Testing for DNN-Enabled Systems using Surrogate-Assisted and...Lionel Briand
 
Guidelines for Assessing the Accuracy of Log Message Template Identification ...
Guidelines for Assessing the Accuracy of Log Message Template Identification ...Guidelines for Assessing the Accuracy of Log Message Template Identification ...
Guidelines for Assessing the Accuracy of Log Message Template Identification ...Lionel Briand
 
A Theoretical Framework for Understanding the Relationship between Log Parsin...
A Theoretical Framework for Understanding the Relationship between Log Parsin...A Theoretical Framework for Understanding the Relationship between Log Parsin...
A Theoretical Framework for Understanding the Relationship between Log Parsin...Lionel Briand
 

Mehr von Lionel Briand (20)

Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive Goal
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and Repair
 
Metamorphic Testing for Web System Security
Metamorphic Testing for Web System SecurityMetamorphic Testing for Web System Security
Metamorphic Testing for Web System Security
 
Simulator-based Explanation and Debugging of Hazard-triggering Events in DNN-...
Simulator-based Explanation and Debugging of Hazard-triggering Events in DNN-...Simulator-based Explanation and Debugging of Hazard-triggering Events in DNN-...
Simulator-based Explanation and Debugging of Hazard-triggering Events in DNN-...
 
Fuzzing for CPS Mutation Testing
Fuzzing for CPS Mutation TestingFuzzing for CPS Mutation Testing
Fuzzing for CPS Mutation Testing
 
Data-driven Mutation Analysis for Cyber-Physical Systems
Data-driven Mutation Analysis for Cyber-Physical SystemsData-driven Mutation Analysis for Cyber-Physical Systems
Data-driven Mutation Analysis for Cyber-Physical Systems
 
Many-Objective Reinforcement Learning for Online Testing of DNN-Enabled Systems
Many-Objective Reinforcement Learning for Online Testing of DNN-Enabled SystemsMany-Objective Reinforcement Learning for Online Testing of DNN-Enabled Systems
Many-Objective Reinforcement Learning for Online Testing of DNN-Enabled Systems
 
ATM: Black-box Test Case Minimization based on Test Code Similarity and Evolu...
ATM: Black-box Test Case Minimization based on Test Code Similarity and Evolu...ATM: Black-box Test Case Minimization based on Test Code Similarity and Evolu...
ATM: Black-box Test Case Minimization based on Test Code Similarity and Evolu...
 
Black-box Safety Analysis and Retraining of DNNs based on Feature Extraction ...
A Scalable Approach for Malware Detection through Bounded Feature Space Behavior Modeling

  • 1. A Scalable Approach for Malware Detection through Bounded Feature Space Behavior Modeling
      Mahinthan Chandramohan, Tan Hee Beng Kuan, Lionel Briand, Shar Lwin Khin, and Bindu Madhavi Padmanabhuni
      Interdisciplinary Centre for ICT Security, Reliability, and Trust, University of Luxembourg, Luxembourg
      School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore
  • 2. What is malware?
      Malware (malicious + software) is software that does malicious things without the victim's knowledge.
  • 3. Motivation
      - More than 403 million new malware variants were created in 2011, a 41% increase over 2010.
      - On average, around 55,000 new malware samples were reported per day.
      - Exponential growth of malware is a major threat in the software industry.
  • 4. Problem Definition 1/2
      - New malware has become very sophisticated.
      - Malware evades traditional anti-virus signatures by using various obfuscation techniques.
      - Malware authors change the syntactic characteristics (i.e., structure) of a malicious program without changing its semantics (i.e., behavior).
  • 5. Problem Definition 2/2
      - Scalability is a major problem in existing behavior-based malware detection techniques:
          - The malware feature space grows in proportion to the number of samples under examination.
          - They are computationally very intensive.
  • 6. Related Work 1/2
      - The practicality and efficiency of behavior-based malware detection depend on:
          - size of the feature space,
          - computational complexity,
          - overheads (e.g., pre-processing),
          - detection accuracy.
      - Simple malware behavior models (e.g., n-gram, m-bag and k-tuple) generate huge feature spaces and require various pruning and parameter tuning mechanisms.
  • 7. Related Work 2/2
      - Complex malware behavior models (e.g., system call dependency graphs) are highly computationally intensive.
  • 8. Behavior Modeling: An Overview
      - Software programs perform actions on various operating system resources.
      - An action corresponds to a higher-level operation (e.g., reading a file) composed of a set of related system calls (e.g., NtReadFile).
      - The advantage of using actions over system calls is that the OS may use different names for system calls that in fact serve the same purpose.
      - For example, NtCreateProcess and NtCreateProcessEx both map to the CreateProcess action.
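The system-call-to-action mapping on this slide can be sketched as a lookup table. The two `NtCreateProcess*` entries come from the slide; the `ReadFile` action name for `NtReadFile`, the table name `SYSCALL_TO_ACTION`, and the helper `to_actions` are assumptions made for illustration only.

```python
# Hypothetical lookup table from Windows native system calls to higher-level actions.
SYSCALL_TO_ACTION = {
    "NtCreateProcess": "CreateProcess",    # mapping stated on the slide
    "NtCreateProcessEx": "CreateProcess",  # mapping stated on the slide
    "NtReadFile": "ReadFile",              # assumed action name
}


def to_actions(syscall_trace):
    """Translate system call names into action names, skipping unmapped calls."""
    return [SYSCALL_TO_ACTION[name] for name in syscall_trace if name in SYSCALL_TO_ACTION]


trace = ["NtCreateProcessEx", "NtReadFile", "NtClose", "NtCreateProcess"]
print(to_actions(trace))
```

Both process-creation calls collapse to the same action, so a sample that switches from one call to the other produces the same action trace.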
  • 9. Operating System Resource Types
      - File System
      - Registry
      - Process and Thread
      - Network
      - Synchronization
      - Section
  • 10. Bounded Feature Space Behavior Modeling (BOFM)
      Malware feature: for each type of OS resource, the set of actions performed by malware on an instance of that OS resource type constitutes a feature of the malware.
      Example: malware performs CreateFile and DeleteFile actions on a file instance C:\foo.exe, and a DeleteFile action on another file instance C:\abc.dll.
      This malware has two features, {CreateFile, DeleteFile} and {DeleteFile}, with respect to file resource instances C:\foo.exe and C:\abc.dll, respectively.
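The feature definition above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the `(resource_type, instance, action)` event format and the function name `extract_features` are assumptions.

```python
from collections import defaultdict


def extract_features(events):
    """Return, per resource type, the set of distinct action sets seen on its instances.

    `events` is an iterable of (resource_type, instance, action) tuples
    (an assumed input format, not one specified in the slides).
    """
    per_instance = defaultdict(set)
    for resource_type, instance, action in events:
        per_instance[(resource_type, instance)].add(action)

    features = defaultdict(set)
    for (resource_type, _instance), actions in per_instance.items():
        features[resource_type].add(frozenset(actions))
    return dict(features)


# The example from the slide: two file instances, two features.
events = [
    ("file", r"C:\foo.exe", "CreateFile"),
    ("file", r"C:\foo.exe", "DeleteFile"),
    ("file", r"C:\abc.dll", "DeleteFile"),
]
features = extract_features(events)
print(len(features["file"]))
```

Using `frozenset` for each feature makes the action sets hashable, so identical sets from different instances are stored only once.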
  • 11. Properties of BOFM features 1/2
      Goal: to be more resilient to commonly used obfuscation techniques.
      - Property 1: Regardless of the number of times an action is performed on an OS resource instance, it is considered only once in the final feature set.
        E.g., a ReadFile action is performed several times on a file instance C:\Windows\...\sysfile2.dll; this behavior is modeled by the BOFM feature {ReadFile}.
      - Property 2: The sequence in which the actions are performed by malware is ignored in feature construction.
        E.g., malware features {ReadFile, QueryFileInformation} and {QueryFileInformation, ReadFile} are considered identical.
  • 12. Properties of BOFM features 2/2
      - Property 3: Identical action sets that are performed on two different OS resource instances of the same type are modeled as a single feature.
        E.g., actions CreateFile and DeleteFile performed on two different file resource instances C:\Windows\abc.dll and D:\Personel\foo.exe are modeled as a single BOFM feature {CreateFile, DeleteFile}.
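The three properties follow directly from treating a feature as a set of actions and a sample as a set of features. The sketch below checks each property on a toy trace; the `(instance, action)` trace format, the file names, and the helper `file_features` are hypothetical.

```python
def file_features(events):
    """Return the set of action sets, one per file instance, from (instance, action) pairs."""
    by_instance = {}
    for instance, action in events:
        by_instance.setdefault(instance, set()).add(action)
    return {frozenset(actions) for actions in by_instance.values()}


base = [("a.dll", "ReadFile"), ("a.dll", "QueryFileInformation")]

# Property 1: repeating an action on the same instance changes nothing.
repeated = base + [("a.dll", "ReadFile")] * 3

# Property 2: the order of actions is ignored.
reordered = list(reversed(base))

# Property 3: the same action set on a second instance is still one feature.
second_instance = base + [("b.dll", "QueryFileInformation"), ("b.dll", "ReadFile")]

for variant in (repeated, reordered, second_instance):
    print(file_features(variant) == file_features(base))
```

All three comparisons hold because sets discard duplicates and ordering, which is what makes the features insensitive to these simple obfuscations.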
  • 13. Bounded Feature Space
      Goal: avoid malware feature space growth proportional to the number of samples under examination.
      - Let j be an OS resource type, where [formula not preserved in the transcript].
      - The total number kj of possible actions that malware may perform on an OS resource instance of type j is a constant.
      - The maximum number mj of possible features with regard to OS resource type j is also a constant, where [formula not preserved in the transcript].
      - The maximum number of possible features N for all resource types is always the following constant: [formula not preserved in the transcript].
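The formulas on this slide did not survive extraction. One reading that is consistent with the feature definition on slide 10 (a feature is a non-empty set of actions) is that a resource type with k_j possible actions admits at most m_j = 2^(k_j) - 1 features, and that N is the sum of m_j over all resource types. This is a reconstruction, not a quotation from the slides. The sketch below computes that bound; the action counts are hypothetical and are not the values from the authors' table.

```python
from itertools import combinations


def max_features(k):
    """Number of non-empty subsets of k actions (assumed form of m_j): 2**k - 1."""
    return 2 ** k - 1


def enumerate_features(actions):
    """List every non-empty subset of `actions` to cross-check the closed form."""
    ordered = sorted(actions)
    return [
        frozenset(combo)
        for size in range(1, len(ordered) + 1)
        for combo in combinations(ordered, size)
    ]


# Hypothetical action counts per resource type (not taken from the slides).
action_counts = {"file": 3, "registry": 2}
total = sum(max_features(k) for k in action_counts.values())
print(total)  # 7 + 3 = 10
```

Under this reading the bound depends only on the number of actions per resource type, never on how many samples are analyzed.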
  • 14. OS Resource Types and Corresponding Actions
      The total number of malware features (N) extracted from these six OS resource types is 16,652.
  • 15. Model Construction Work Flow
      [Figure: example feature vector]
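The figure for this slide is not available in the transcript. A plausible encoding, assuming a binary vector with one fixed position per possible feature, is sketched below. The binary encoding, the action vocabulary, and the names `feature_index` and `to_vector` are all assumptions.

```python
from itertools import combinations


def feature_index(actions_by_type):
    """Assign a fixed position to every possible (resource type, action set) feature."""
    index = {}
    for resource_type in sorted(actions_by_type):
        actions = sorted(actions_by_type[resource_type])
        for size in range(1, len(actions) + 1):
            for combo in combinations(actions, size):
                index[(resource_type, frozenset(combo))] = len(index)
    return index


def to_vector(sample_features, index):
    """Binary vector: 1 where the sample exhibits the feature, 0 elsewhere."""
    vector = [0] * len(index)
    for feature in sample_features:
        vector[index[feature]] = 1
    return vector


# Hypothetical vocabulary: two file actions and one registry action.
vocabulary = {"file": ["CreateFile", "DeleteFile"], "registry": ["DeleteKey"]}
index = feature_index(vocabulary)

sample = {
    ("file", frozenset({"CreateFile", "DeleteFile"})),
    ("file", frozenset({"DeleteFile"})),
}
print(to_vector(sample, index))  # [0, 1, 1, 0]
```

Because the index is built from the action vocabulary alone, every sample maps to a vector of the same length regardless of how many samples are processed.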
  • 16. Detection Method
      - Machine Learning (ML) classification techniques are used for building malware detection models.
      - Logistic Regression (LR) and Support Vector Machine (SVM) are used in our experiments.
      - The malware detection process involves two phases:
          - Phase 1: model building phase
          - Phase 2: model evaluation phase
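The two phases can be illustrated with a small logistic regression written from scratch. This is a toy stand-in for whatever LR and SVM implementations the authors used, which the slides do not name; the training data, hyperparameters, and function names are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(weights, bias, x):
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def train(vectors, labels, epochs=300, lr=1.0):
    """Phase 1 (model building): batch gradient descent on the logistic loss."""
    n, d = len(vectors), len(vectors[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(vectors, labels):
            error = sigmoid(score(weights, bias, x)) - y
            grad_b += error
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def evaluate(model, vectors, labels):
    """Phase 2 (model evaluation): fraction of held-out samples classified correctly."""
    weights, bias = model
    predictions = [1 if score(weights, bias, x) > 0 else 0 for x in vectors]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Toy binary feature vectors; label 1 = malware, 0 = benign (hypothetical data).
train_x = [[1, 0, 1], [1, 1, 0], [0, 1, 0], [0, 0, 1]]
train_y = [1, 1, 0, 0]
test_x = [[1, 0, 0], [0, 1, 1]]
test_y = [1, 0]

model = train(train_x, train_y)
print(evaluate(model, test_x, test_y))
```

In this toy data only the first feature separates the classes, so the learned weight on it dominates the other two.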
  • 17. Experimental Dataset
      - Training set of 5000 malware and 80 benign samples, and a test set of 300 malware and 20 benign samples.
  • 18. Experimental Results
      - SVM achieved 99.4% detection accuracy with no false positives, and LR achieved 99.6% detection accuracy with a 1% false positive (FP) rate.
      - Each balanced test set consists of 20 malware samples randomly selected from the pool of 300, plus the 20 benign samples.
      - For the balanced test sets, SVM yielded a perfect accuracy of 100% with a 0% FP rate, and LR achieved 99.5% detection accuracy with a 1% FP rate.
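The two reported metrics can be computed from confusion-matrix counts as sketched below. The counts are hypothetical: they assume a 300-malware, 20-benign test set with two missed malware samples and no false positives, which gives 318 of 320 correct, or about 99.4%. The slides do not give the actual counts.

```python
def detection_metrics(tp, fn, tn, fp):
    """Return (accuracy, false positive rate) from confusion-matrix counts.

    tp/fn: malware detected/missed; tn/fp: benign passed/flagged.
    """
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    fp_rate = fp / (fp + tn)
    return accuracy, fp_rate


# Hypothetical counts on a test set of 300 malware and 20 benign samples.
accuracy, fp_rate = detection_metrics(tp=298, fn=2, tn=20, fp=0)
print(accuracy, fp_rate)
```

With only 20 benign samples, a single false positive would already move the FP rate by five percentage points, which is worth keeping in mind when reading the reported rates.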
  • 19. Comparison with Canali et al. (ISSTA 2012)
      - Both achieve 99% detection accuracy.
      - However:
          - BOFM generated only 569 active features, whereas Canali et al. generated several million.
          - It took 1.67 hours to extract malware features using BOFM, while Canali et al. took around 48 hours.
          - It took 26 seconds to train the SVM classifier, consuming only 200 MB of RAM, whereas Canali et al.'s approach consumed more than 1 GB of RAM to perform signature matching.
          - BOFM is much more efficient and scalable.
  • 20. Conclusion
      - Malware evades traditional anti-virus signatures by using various obfuscation techniques.
      - Behavior-based malware detection is an increasingly common solution.
      - Scalability is a major problem in existing behavior-based malware detection techniques.
      - We proposed a bounded feature space malware behavior modeling (BOFM) technique to address the scalability issue.
      - BOFM entails a fixed number of features that does not grow in proportion to the number of malware samples under examination.
      - Benchmark: BOFM combined with SVM achieved 100% detection accuracy, in less than a minute and with 200 MB of memory.
  • 21. Feature Space Analysis
      - Comparison of malware and benign feature spaces.
      - 57% of the features are unique to malware, which suggests that BOFM is a promising technique for modeling malware behavior.
  • 22. Brief Analysis of Interesting Features
      - The 'NotifyChangeKey' action is used far more widely by malware samples than by benign samples (86% vs. 15%).
      - The 'DeleteKey' action is also widely used by malware samples.
      - Actions such as 'OpenFile', 'GetFileAttributes', 'CreateMutex' and 'ReleaseMutex' appeared widely in both malware and benign samples.