This talk covers the use of machine learning algorithms to detect malicious Android applications. I will describe how a high-performance tool for this task was built at Yandex on top of MatrixNet. I will also show the cases in which analytical malware detection methods help block many simple malware samples. We will then discuss how these methods can be improved to detect more sophisticated malicious programs.
5. 5
Brief list of tools for APK analysis
! Androguard (ultimate tool by @adesnos and others) – used
by VirusTotal, APKInspector, etc.
! SCanDroid (Adam P. Fuchs, Avik Chaudhuri, and Jeffrey S. Foster)
! TaintDroid (guys from Intel, Penn State University, Duke
University)
! DroidBox (dynamic analysis by Patrik Lantz) – used by ApkScan
6. 6
Is this all? Really?
! http://www.apk-analyzer.net
! http://anubis.iseclab.org
! http://apkscan.nviso.be
8. 8
Methods of malware detection
Static analysis
! Advantages
– APK has predictable content. Application behavior can be learned by simply
reading the file
– Checks are safe
! Limitations
– Can be ineffective for sophisticated malware and obfuscation techniques
– We cannot be sure of the actual behavior, as we don't execute the app
9. 9
Methods of malware detection
Dynamic analysis
! Advantages
– Clear results and interpretation
– Open source solutions available
! Limitations
– Not fast (enough)
– Can be detected and bypassed
– Big ecosystem requires big infrastructure
10. 10
Methods of malware detection
Signature analysis
! Advantages
– Effective for known malware
– Commercial solutions available
! Limitations
– Signature databases require regular (and frequent) updates
– Not effective for new malware
– Do you have a team of virus analysts?
13. 13
Why can we use machine learning?
Abstract task description:
! We have a set of objects (APK-files). We should divide this set into two
subsets (malware and normal)
! For every element of the main set we can compute a fixed, known set of features
! The subsets are just the result of a simple classification task, so we can try to
choose effective features
14. 14
What is the MatrixNet?
MatrixNet is an implementation of the gradient boosted decision trees algorithm
MatrixNet differs slightly from the standard algorithm:
! Using Oblivious Trees
! Accounting for sample count in each leaf
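The oblivious-tree idea can be sketched as follows. This is a minimal illustration of the general technique, not MatrixNet's actual code: all function names are hypothetical, and the per-leaf sample-count adjustment mentioned above is not shown. In an oblivious tree every level applies the same (feature, threshold) test, so a leaf is addressed by the bit string of test outcomes.

```python
from typing import List, Sequence, Tuple

# One oblivious tree: a list of (feature index, threshold) tests, one per level,
# plus 2**depth leaf values. All names here are illustrative.
Split = Tuple[int, float]


def oblivious_leaf_index(x: Sequence[float], splits: List[Split]) -> int:
    """Return the leaf index: each level contributes one bit of the index."""
    index = 0
    for feature, threshold in splits:
        bit = 1 if x[feature] > threshold else 0
        index = (index << 1) | bit
    return index


def predict_tree(x: Sequence[float], splits: List[Split],
                 leaf_values: Sequence[float]) -> float:
    """Look up the value stored in the leaf that x falls into."""
    return leaf_values[oblivious_leaf_index(x, splits)]


def boosted_score(x: Sequence[float],
                  trees: List[Tuple[List[Split], Sequence[float]]],
                  learning_rate: float = 0.1) -> float:
    """Gradient boosting prediction: a scaled sum of the trees' outputs."""
    return sum(learning_rate * predict_tree(x, s, v) for s, v in trees)
```

Because every level shares one test, a tree of depth d needs only d comparisons and a table lookup, which is one reason such trees are cheap to evaluate.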
15. 15
Why is MatrixNet powerful?
! It is a machine learning algorithm suited to classification tasks
! A key feature of this method is its resistance to overfitting
18. 18
How does it work?
Offline learning process:
! Choosing features
! Choosing samples
! Manual classification (malware or not)
! Training on the combined set of apps
! Measuring classification errors
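The last step can be sketched as follows, assuming the manual labels and the classifier's predictions are available as parallel lists of booleans (True = malware). The function name and the choice of metrics are illustrative, not taken from the talk.

```python
from typing import Dict, Sequence


def count_errors(labels: Sequence[bool],
                 predictions: Sequence[bool]) -> Dict[str, float]:
    """Compare predictions with manual labels (True means malware)."""
    pairs = list(zip(labels, predictions))
    # A normal app flagged as malware.
    false_positives = sum(1 for y, p in pairs if p and not y)
    # A malware sample that was missed.
    false_negatives = sum(1 for y, p in pairs if y and not p)
    total = len(pairs)
    error_rate = (false_positives + false_negatives) / total if total else 0.0
    return {
        "false_positives": false_positives,
        "false_negatives": false_negatives,
        "error_rate": error_rate,
    }
```

Keeping the two error types separate matters here: a false positive blocks a legitimate app, while a false negative lets malware through.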
19. 19
Features
What kind of features to use:
! Permissions
! URIs in strings and other resources
! Adware library usage
! Obfuscation methods
! …
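A minimal sketch of turning such features into a numeric vector, assuming the permissions, string constants and class names have already been extracted from the APK. The watched permission list, the ad-library prefix (a placeholder) and all function names are assumptions made for illustration; obfuscation features are omitted.

```python
import re
from typing import Dict, Iterable

# Real Android permission names, but the selection is arbitrary.
WATCHED_PERMISSIONS = (
    "android.permission.SEND_SMS",
    "android.permission.READ_CONTACTS",
    "android.permission.INTERNET",
)
# Placeholder prefix: a real list of adware packages would go here.
AD_LIBRARY_PREFIXES = ("com.example.ads.",)
URI_PATTERN = re.compile(r"https?://[^\s\"'<>]+")


def extract_features(permissions: Iterable[str],
                     strings: Iterable[str],
                     class_names: Iterable[str]) -> Dict[str, float]:
    """Build a flat name -> value feature map for one application."""
    requested = set(permissions)
    # One 0/1 indicator per watched permission.
    features = {"perm:" + p: float(p in requested) for p in WATCHED_PERMISSIONS}
    # Number of http(s) URIs found in the string constants.
    features["uri_count"] = float(
        sum(len(URI_PATTERN.findall(s)) for s in strings))
    # 1.0 if any class belongs to a known ad-library package.
    features["uses_ad_library"] = float(
        any(name.startswith(AD_LIBRARY_PREFIXES) for name in class_names))
    return features
```

Every application yields the same keys, so the maps can be converted to fixed-length vectors for training.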
20. 20
Samples and classification
Malware applications:
! VirusTotal feed
! Samples from malicious sites
Normal applications:
! Manual testing
! Trusted developers
! Yandex applications
28. 28
ClassesParser
! Parser for DEX files
! Internal DEX disassembler
! Callgraph builder
! Embeds “real” function/variable names into the disassembly listing
! Builds a list of used procedures and functions
31. 31
ReflectionAnalyzer
Output:
! Report:
There is some reflections usage:
1@android.app.Activity->getContentResolver calls:
598@java.lang.Class->forName
2@android.app.Activity->onActivityResult calls:
598@java.lang.Class->forName
! The number of reflection calls is used as a feature.
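A rough sketch of deriving that feature from a report in the format shown above. The parsing rules are an assumption based only on this sample: a line ending in "calls:" is treated as a caller header, and each following "N@name" line as one reflective call made by it. The function name is hypothetical.

```python
from collections import Counter


def count_reflection_calls(report: str) -> Counter:
    """Count reflective callee entries listed under caller headers."""
    counts: Counter = Counter()
    in_call_list = False
    for raw in report.splitlines():
        line = raw.strip()
        if "@" not in line:
            # Free text such as the report title ends any call list.
            in_call_list = False
            continue
        _, _, name = line.partition("@")
        if name.endswith(" calls:"):
            in_call_list = True  # caller header; callees follow
        elif in_call_list:
            counts[name] += 1
    return counts
```

The total, `sum(counts.values())`, can then be used as a single numeric feature, while the per-method counts remain available for finer-grained features.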
34. 34
Let's try it on...
Yandex.Store application feed:
! More than 50K Android applications
! More than 200 new/updated apps per week
! Open for developers (no strict manual verification)
40. 40
It works!
! Analytic methods work well for detecting Android malware
! Machine learning is not “rocket science”, but it is a cool and effective tool
! Open API coming soon.