The document discusses how machine learning can be used to combat unknown malware. It walks through machine learning techniques such as feature selection, dimensionality reduction, and classification, first on military personnel data and then on malware samples. Machine learning is effective against most malware but is not sufficient on its own, because many attacks do not use malware at all. A comprehensive prevention approach is needed to address the full spectrum of threats.
8. Some Data to Get Started:
1988 ANTHROPOMETRIC SURVEY OF ARMY PERSONNEL
Source: http://mreed.umtri.umich.edu/mreed/downloads.html#anthro
2016 CROWDSTRIKE, INC. ALL RIGHTS RESERVED.
9. Data
• Over 4000 soldiers surveyed
• Over 100 measurements
• Reported by gender
13. LET’S CLASSIFY
[Scatter plot; axes: “Buttock Circumference” [mm] and Weight [10⁻¹ kg]]
• Let’s assume we want to detect males (blue)
• I.e. “blue” is our positive class
• TP: classify blue as blue
• Note some misclassifications
• FP: classify red as blue
• FN: classify blue as red
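The TP/FP/FN definitions above can be made concrete by counting outcomes for a handful of labeled points. The following is a minimal sketch, not the presenters' code: the function name `confusion_counts` and the six labels are invented for illustration and are not taken from the survey data.

```python
from collections import Counter

def confusion_counts(y_true, y_pred, positive="blue"):
    """Count TP, FP, FN, TN, treating `positive` as the positive class."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            # Predicted blue: correct (TP) if truly blue, otherwise FP.
            counts["TP" if truth == positive else "FP"] += 1
        else:
            # Predicted red: a missed blue is FN, a correct red is TN.
            counts["FN" if truth == positive else "TN"] += 1
    return {key: counts[key] for key in ("TP", "FP", "FN", "TN")}

# Hypothetical labels, invented for illustration (not survey data).
y_true = ["blue", "blue", "blue", "red", "red", "red"]
y_pred = ["blue", "blue", "red", "blue", "red", "red"]
print(confusion_counts(y_true, y_pred))
```

Here one blue point is classified as red (a false negative) and one red point is classified as blue (a false positive).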
18. LET’S CLASSIFY
[Scatter plot; axes: “Buttock Circumference” [mm] and Weight [10⁻¹ kg]]
• Get more “blue” right (true positives)
• Get more “red” wrong (false positives)
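The trade-off in the two bullets above shows up as soon as a decision threshold is moved. The sketch below assumes a classifier that reduces each sample to a single score and calls everything at or above a threshold "blue"; the scores, labels, and the helper name `rates_at_threshold` are hypothetical and only serve to illustrate the effect.

```python
def rates_at_threshold(scores, labels, threshold, positive="blue"):
    """Return (true positive rate, false positive rate) when every sample
    scoring at or above `threshold` is classified as `positive`."""
    tp = fp = pos = neg = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if label == positive:
            pos += 1
            tp += predicted_positive
        else:
            neg += 1
            fp += predicted_positive
    return tp / pos, fp / neg

# Hypothetical one-dimensional scores, invented for illustration.
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.5, 0.3, 0.1]
labels = ["blue"] * 4 + ["red"] * 4

# Lowering the threshold catches more blue, but also mislabels more red.
for threshold in (0.75, 0.55, 0.35):
    tpr, fpr = rates_at_threshold(scores, labels, threshold)
    print(f"threshold={threshold:.2f}  TPR={tpr:.2f}  FPR={fpr:.2f}")
```

On this toy data, each step down in threshold raises the true positive rate and the false positive rate together, which is the trade-off the slide describes.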
22. CURSE OF DIMENSIONALITY
• REDUCED predictive performance
• INCREASED training time
• SLOWER classification
• LARGER memory footprint
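One commonly cited reason for the reduced predictive performance listed above is that distances between random points become nearly indistinguishable as the number of dimensions grows. The sketch below illustrates this with uniformly random points; the function name `relative_contrast`, the point count, and the seed are arbitrary choices for illustration and are not from the presentation.

```python
import math
import random

def relative_contrast(dim, n_points=200, seed=0):
    """(farthest - nearest) / nearest distance from one random query point
    to `n_points` uniformly random points in the unit hypercube."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)

# The contrast between nearest and farthest neighbor shrinks with dimension.
for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 2))
```

When the nearest and farthest neighbors are almost equally far away, distance-based classifiers have little signal to work with, which is one motivation for feature selection and dimensionality reduction.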
28. LET’S APPLY THIS TO SECURITY
29. FILE ANALYSIS
AKA Static Analysis
• THE GOOD
– Relatively fast
– Scalable
– No need to detonate
– Platform independent, can be done at gateway
• THE BAD
– Limited insight due to narrow view
– Different file types require different techniques
– Different subtypes need special consideration: packed files, .NET, installers, EXEs vs DLLs
– Obfuscations (yet good if detectable)
– Ineffective against exploitation and malware-less attacks
– Asymmetry: a fraction of a second to decide for the defender, months to craft for the attacker
30. FILE CONTENT
31. EXAMPLE FEATURES
• 32/64 BIT EXECUTABLE
• GUI SUBSYSTEM
• COMMAND LINE SUBSYSTEM
• FILE SIZE
• TIMESTAMP
• DEBUG INFORMATION PRESENT
• PACKER TYPE
• FILE ENTROPY
• NUMBER OF SECTIONS
• NUMBER WRITABLE
• NUMBER READABLE
• NUMBER EXECUTABLE
• DISTRIBUTION OF SECTION ENTROPY
• IMPORTED DLL NAMES
• IMPORTED FUNCTION NAMES
• COMPILER ARTIFACTS
• LINKER ARTIFACTS
• RESOURCE DATA
• EMBEDDED PROTOCOL STRINGS
• EMBEDDED IPS/DOMAINS
• EMBEDDED PATHS
• EMBEDDED PRODUCT META DATA
• DIGITAL SIGNATURE
• ICON CONTENT
• …
36. APTS & 99% OF MALWARE DETECTED…
[Chart: chance of at least one success for adversary (y-axis; marked 1% and >99%) vs. number of attempts (x-axis; marked 500)]
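The shape of this curve follows from basic probability. Assuming each attempt is an independent sample that the detector catches with probability 0.99 (the independence assumption is ours, not stated on the slide), the chance that at least one of n attempts gets through is 1 − 0.99ⁿ. A minimal sketch, with a hypothetical function name:

```python
def chance_of_at_least_one_success(detection_rate, attempts):
    """Probability that at least one of `attempts` independent samples evades
    a detector that catches each sample with probability `detection_rate`."""
    return 1.0 - detection_rate ** attempts

# Assumes independent attempts; real campaigns may violate this.
for attempts in (1, 10, 100, 500):
    p = chance_of_at_least_one_success(0.99, attempts)
    print(f"{attempts:>3} attempts: {p:.1%}")
```

Under this assumption a single attempt succeeds about 1% of the time, while 500 attempts push the adversary's chance of at least one success above 99%, matching the values marked on the chart.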
39. Next-Generation Endpoint Protection
Cloud Delivered. Enriched by Threat Intelligence
• MANAGED HUNTING
• ENDPOINT DETECTION AND RESPONSE
• NEXT-GEN ANTIVIRUS
40. ML SETTINGS WITHIN FALCON HOST
41. ML PREVENTION IN ACTION
42. KEY POINTS
• Machine Learning is an effective tool against unknown malware
• Try it out on VirusTotal
• Trading off true positives and false positives
• Detecting 99% of malware still gives a persistent APT a near-100% chance of getting malware into your environment, given enough attempts
• The majority of intrusions are not malware-based
• Avoid silent failure
• Use a comprehensive array of techniques