"Fast detection of Android malware using machine learning". Yury Leonychev, Yandex

2
Fast detection of Android malware
Yury Leonychev

4
Android application (APK)
!  Manifest (AndroidManifest.xml)
!  Code (classes.dex and native code)
!  Meta information (META-INF)
!  Resources (files and resources.arsc)

5
Brief list of tools for APK analysis
!  Androguard (the ultimate tool, by @adesnos and others), used by VirusTotal, APKInspector, etc.
!  SCanDroid (Adam P. Fuchs, Avik Chaudhuri, and Jeffrey S. Foster)
!  TaintDroid (researchers from Intel, Penn State University, and Duke University)
!  DroidBox (dynamic analysis, by Patrik Lantz), used by ApkScan

6
Is this all? Really?
!  http://www.apk-analyzer.net
!  http://anubis.iseclab.org
!  http://apkscan.nviso.be

7
Our task is more complex
Malware detector

8
Methods of malware detection
Static analysis
!  Advantages
–  An APK has predictable content. Application behavior can be learned by simply reading the file
–  Checks are safe
!  Limitations
–  Can be ineffective against sophisticated malware and obfuscation techniques
–  Behavior cannot be confirmed, because the app is never executed

9
Dynamic analysis
!  Advantages
–  Clear results and interpretation
–  Open source solutions available
!  Limitations
–  Not fast (enough)
–  Can be detected and bypassed
–  Big ecosystem requires big infrastructure

10
Signature analysis
!  Advantages
–  Effective for known malware
–  Commercial solutions available
!  Limitations
–  Signature databases require regular (and frequent) updates
–  Not effective for new malware
–  Do you have a team of virus analysts?

11
The most efficient approach seems to be a hybrid solution

12
MatrixNet
What is The Matrix?

13
Why can we use machine learning?
Abstract task description:
!  We have a set of objects (APK files). We should divide this set into two subsets (malware and normal)
!  For every element of the main set we can compute a fixed number of features
!  Dividing into subsets is a simple classification task, so we can try to choose effective features

14
What is MatrixNet?
MatrixNet is an implementation of the gradient boosted decision trees algorithm
MatrixNet differs slightly from the standard algorithm:
!  Using Oblivious Trees
!  Accounting for sample count in each leaf
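
To make the oblivious-tree idea concrete: in an oblivious tree every node at a given depth applies the same feature test, so a tree of depth d reduces to d comparisons that index into a table of 2^d leaf values. The sketch below follows that textbook definition only. It is not MatrixNet's actual code, and the function names, the example splits and the leaf values are hypothetical.

```python
from typing import Sequence, Tuple

Split = Tuple[int, float]  # (feature index, threshold), shared by a whole tree level


def oblivious_tree_predict(
    features: Sequence[float],
    splits: Sequence[Split],
    leaf_values: Sequence[float],
) -> float:
    """Evaluate one oblivious tree by building the leaf index bit by bit."""
    if len(leaf_values) != 2 ** len(splits):
        raise ValueError("a tree of depth d needs exactly 2**d leaf values")
    index = 0
    for feature_index, threshold in splits:
        bit = 1 if features[feature_index] > threshold else 0
        index = (index << 1) | bit
    return leaf_values[index]


def ensemble_score(features, trees, learning_rate=1.0):
    """Boosted ensemble: the score is the scaled sum of all tree outputs."""
    return learning_rate * sum(
        oblivious_tree_predict(features, splits, leaves) for splits, leaves in trees
    )


# Hypothetical depth-2 tree over two features.
example_tree = ([(0, 0.5), (1, 2.0)], [-1.0, -0.5, 0.5, 1.0])
score = ensemble_score([1.0, 3.0], [example_tree])
```

With a score like this, a sign threshold is enough to separate the two classes.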

15
Why is MatrixNet powerful?
!  It is a machine learning algorithm for classification tasks
!  A key feature of this method is its resistance to overfitting

16
MatrixNet post-learning optimization

17
MatrixNet post-learning optimization
Copyright © 2013 by Sidney Harris.

18
How does it work?
Offline learning process:
!  Choosing features
!  Choosing samples
!  Manual classification (malware or not)
!  Training on the combined set of apps
!  Measuring errors

19
Features
What kind of features to use:
!  Permissions
!  URI in strings and other resources
!  Adware library usage
!  Obfuscation methods
!  …
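
A feature list like this has to end up as a fixed-length numeric vector per APK. The following is a minimal sketch of that step, assuming the parsers have already produced a list of permission names and a list of strings. The function name, the choice of "sensitive" permissions and the feature names are all hypothetical, not the feature set used in the talk.

```python
import re
from typing import Dict, Iterable

# Simple URL matcher for strings pulled out of code and resources (an assumption).
URL_RE = re.compile(r"https?://[^\s\"'<>]+")

# Hypothetical shortlist; a real feature set would be chosen during training.
SENSITIVE_PERMISSIONS = (
    "android.permission.SEND_SMS",
    "android.permission.RECEIVE_SMS",
    "android.permission.CALL_PHONE",
    "android.permission.READ_LOGS",
)


def extract_features(permissions: Iterable[str], strings: Iterable[str]) -> Dict[str, float]:
    """Turn parser output into named numeric features."""
    unique_permissions = set(permissions)
    urls = set()
    for text in strings:
        urls.update(URL_RE.findall(text))
    features = {
        "permission_count": float(len(unique_permissions)),
        "url_count": float(len(urls)),
    }
    for name in SENSITIVE_PERMISSIONS:
        key = "has_" + name.rsplit(".", 1)[-1].lower()
        features[key] = 1.0 if name in unique_permissions else 0.0
    return features
```

Every APK yields the same keys, so the values can be fed to a classifier in a fixed order.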

20
Samples and classification
Malware applications:
! VirusTotal feed
!  Samples from malicious sites
Normal applications:
!  Manual testing
!  Trusted developers
!  Yandex applications

21
Learning
[Diagram labels: Normal, Malware, Features, MatrixNet, Formula, Features weight, Features cost]

22
Measuring errors
[Diagram labels: Formula 1 to Formula N, Features cost 1 to Features cost N, Normal, Malware]
Result: the formula with a good confusion matrix and low cost
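
One way to read this slide is as a selection step: each candidate formula is scored by its errors on labelled samples plus the cost of its features, and the cheapest one wins. The sketch below assumes a simple weighted sum as the criterion. The `Candidate` type, the weights and the example numbers are hypothetical, and the talk does not state the actual criterion.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Candidate:
    name: str
    false_negatives: int   # malware classified as normal
    false_positives: int   # normal apps classified as malware
    feature_cost: float    # assumed aggregate cost of computing the features


def total_penalty(c: Candidate, fn_weight: float = 5.0,
                  fp_weight: float = 1.0, cost_weight: float = 0.1) -> float:
    """Weighted sum of errors and feature cost (weights are illustrative)."""
    return (fn_weight * c.false_negatives
            + fp_weight * c.false_positives
            + cost_weight * c.feature_cost)


def pick_formula(candidates: Iterable[Candidate], **weights: float) -> Candidate:
    """Return the candidate with the lowest penalty."""
    return min(candidates, key=lambda c: total_penalty(c, **weights))


candidates = [
    Candidate("A", false_negatives=15, false_positives=25, feature_cost=100.0),
    Candidate("B", false_negatives=10, false_positives=60, feature_cost=400.0),
    Candidate("C", false_negatives=30, false_positives=10, feature_cost=20.0),
]
best = pick_formula(candidates)
```

Changing the weights changes the winner, which is the point: the trade-off between missed malware, false alarms and feature cost is a policy decision.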

23
Analyzer architecture
Fine! I'll go build my own casino, with blackjack and
big data

24
Main parts
!  Parsers
!  Analyzers
!  Oracle
!  Report

25
In depth
APK
Parsers: ManifestParser, ResourceParser, MetaInfoParser, ClassesParser
Analyzers: PermissionAnalyzer, PackageAnalyzer, URLAnalyzer, ReflectionAnalyzer
Reports: XHTMLReporter, JSONReporter
Oracle: MatrixNet

26
ManifestParser
Copes with some obfuscation methods, for example:
! HEUR:Backdoor.AndroidOS.Obad.a

27
ManifestParser
<?xml version="1.0" encoding="utf-8"?>
<manifest ="singleTop" android:versionCode="2" ="2.0"
android:installLocation="internalOnly" package="com.android.system.admin"
xmlns:android="http://schemas.android.com/apk/res/android">
<uses-permission ="android.permission.READ_LOGS" />
<uses-permission ="android.permission.WAKE_LOCK" />
…
<uses-permission ="android.permission.RECEIVE_SMS" />
<uses-permission ="android.permission.SEND_SMS" />
<uses-permission ="android.permission.CALL_PHONE" />
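
The listing appears to show a manifest whose attribute names are missing, which a strict XML parser would reject. A tolerant parser can still recover the permissions by looking at attribute values instead of names. The sketch below assumes the binary manifest has already been decoded to text. The function name and the ".permission." heuristic are hypothetical, not the ManifestParser implementation.

```python
import re
from typing import List

# Match the body of each uses-permission tag, whatever its attributes look like.
TAG_RE = re.compile(r"<uses-permission\b([^>]*)>")
# Any double-quoted attribute value inside that tag.
VALUE_RE = re.compile(r'"([^"]*)"')


def extract_permissions(manifest_text: str) -> List[str]:
    """Collect permission names even when attribute names are absent."""
    permissions: List[str] = []
    for tag_body in TAG_RE.findall(manifest_text):
        for value in VALUE_RE.findall(tag_body):
            # Heuristic (assumption): permission names contain ".permission."
            if ".permission." in value and value not in permissions:
                permissions.append(value)
    return permissions
```

Because the function never relies on `android:name`, it treats well-formed and stripped tags the same way.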

28
ClassesParser
!  Parser for DEX files
!  Internal DEX disassembler
!  Call graph builder
!  Embeds “real” function and variable names into the disassembly listing
!  Builds a list of used procedures and functions

29
ClassesParser
Disassembler
https://github.com/tracer0tong/de
Example:
./de.py test1.dex.dat
[[0, 'sget-object v0, {type} [{class}].{field} // field@2225'],
[2, 'invoke-virtual v0 @13970 // {class}->{method}'],
[5, 'move-result-object v0'],
[6, 'check-cast v0, [{type_name}] // type@0958'],
[8, 'return-object v0']]
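
Before any disassembly, a DEX parser has to read the file header. The sketch below reads the first fields of the header as laid out in the public Dalvik executable format (magic, checksum, SHA-1 signature, file size, header size, endian tag). It is an independent illustration, not code from the `de` tool, and the function name and returned keys are hypothetical.

```python
import struct
from typing import Dict

DEX_MAGIC_PREFIX = b"dex\n"
ENDIAN_CONSTANT = 0x12345678  # value of endian_tag in a little-endian DEX file

# magic[8], checksum, signature[20], file_size, header_size, endian_tag
HEADER_PREFIX = struct.Struct("<8sI20sIII")


def parse_dex_header(data: bytes) -> Dict[str, object]:
    """Parse the leading 44 bytes of a DEX header and validate the magic."""
    if len(data) < HEADER_PREFIX.size:
        raise ValueError("data is too short to hold a DEX header")
    magic, checksum, signature, file_size, header_size, endian_tag = (
        HEADER_PREFIX.unpack_from(data, 0)
    )
    if not magic.startswith(DEX_MAGIC_PREFIX) or magic[7:8] != b"\x00":
        raise ValueError("not a DEX file: bad magic")
    return {
        "version": magic[4:7].decode("ascii"),  # e.g. "035"
        "checksum": checksum,
        "sha1": signature.hex(),
        "file_size": file_size,
        "header_size": header_size,
        "little_endian": endian_tag == ENDIAN_CONSTANT,
    }
```

A mismatch between `file_size` and the real length of the file is itself a cheap signal that the file was tampered with or truncated.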

30
ReflectionAnalyzer
java.lang.reflect.*
!  Classes: Field, Method, etc.
!  Functions: getClass(), getDeclaredField(), etc.

31
ReflectionAnalyzer
Output:
!  Report:
There is some reflections usage:
1@android.app.Activity->getContentResolver calls:
598@java.lang.Class->forName
2@android.app.Activity->onActivityResult calls:
598@java.lang.Class->forName
!  The number of reflection calls is used as a feature.
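
Counting reflection calls can be sketched as a pass over call graph edges. The example below assumes the call graph is available as (caller, callee) pairs in the `Class->method` notation used in the report. The function names and the list of `java.lang.Class` methods treated as reflective are assumptions for illustration.

```python
from typing import Iterable, Tuple

# Everything in this package counts as reflection.
REFLECTION_PREFIXES = ("java.lang.reflect.",)

# Reflective entry points on java.lang.Class (an illustrative, incomplete set).
REFLECTION_METHODS = frozenset({
    "java.lang.Class->forName",
    "java.lang.Class->getMethod",
    "java.lang.Class->getDeclaredMethod",
    "java.lang.Class->getField",
    "java.lang.Class->getDeclaredField",
})


def is_reflection_call(callee: str) -> bool:
    """True if the called method belongs to the reflection API."""
    return callee in REFLECTION_METHODS or callee.startswith(REFLECTION_PREFIXES)


def count_reflection_calls(call_edges: Iterable[Tuple[str, str]]) -> int:
    """Number of call graph edges whose target is a reflection method."""
    return sum(1 for _caller, callee in call_edges if is_reflection_call(callee))


edges = [
    ("android.app.Activity->getContentResolver", "java.lang.Class->forName"),
    ("android.app.Activity->onActivityResult", "java.lang.Class->forName"),
]
reflection_feature = float(count_reflection_calls(edges))
```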

32
Service architecture
Nginx
Gunicorn
Flask
Celery
MongoDB

34
Let's try it on...
Yandex.Store application feed:
!  More than 50K Android applications
!  More than 200 new/updated apps per week
!  Open for developers (no strict manual verification)

35
Performance. Check timing
~2 ms
~0.25 s
~4.5 min

36
Performance. Amount of checks
!  More than 16,000 applications checked in 1 hour on 1 cluster node

37
Confusion matrix
                  Classified as malware (Score > 0)   Classified as normal (Score < 0)
Actual malware    485 (97%)                           15 (3%)
Actual normal     25 (5%)                             475 (95%)
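
The usual summary metrics follow directly from these four counts. The short sketch below computes them from the numbers on this slide; the function name and dictionary keys are hypothetical, and "positive" is taken to mean "malware".

```python
from typing import Dict


def confusion_metrics(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Standard metrics from a 2x2 confusion matrix (positive = malware)."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "false_positive_rate": fp / (fp + tn),
    }


# Counts from the slide: 485 and 15 for actual malware, 25 and 475 for actual normal.
metrics = confusion_metrics(tp=485, fn=15, fp=25, tn=475)
# accuracy 0.96, recall 0.97, false positive rate 0.05, precision about 0.951
```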

38
(Un)predictable results
!  Applications with the malicious adware library AirPush were classified as malware
!  But the first version had no special features for adware

39
Conclusion
It’s alive… alive!

40
It works!
!  Analytic methods work well for detecting Android malware
!  Machine learning is not “rocket science”, but it is a cool and effective tool
!  Open API coming soon.
