ML for Malware Classification & Clustering Using Boosted Decision Trees

1) Machine learning can replace antivirus software by using statistical techniques to learn patterns from large malware datasets.
2) Boosted decision trees are well suited to malware classification because they work like a game of 20 questions, choosing questions that maximize discrimination between the malware and benign classes.
3) Feature design is a tradeoff: complicated features give the model more information but are harder to explain, while explainable features give analysts insight but may not help classification.

Machine Learning for Malware
Classification and Clustering
Phil Roth, Data Scientist

• PhD in particle astrophysics
• Switched to making images from radar data
• Switched to solving security problems with data
Phil Roth
Data Scientist

Outline
• Malware Detection
• Boosted Decision Trees
• Malware Features
• Evaluating Performance
• Bringing a Human into the Loop

The Problem: Antivirus
The security industry has declared antivirus dead, but
there is no widely accepted replacement.
Machine Learning can be that replacement.

The Problem: Antivirus
• Antivirus uses signatures, heuristics, and hand-crafted rules
that do not scale well
• Using polymorphism and obfuscation, malware authors can
circumvent rule-based detection techniques

The Solution: Machine Learning
Machine Learning uses statistical techniques to learn
patterns from large datasets
Two steps:
• Feature Extraction
• Boundary Learning

Machine Learning Advantages
• Automation
• Deep Insights
• Scalability
• Generalization

Machine Learning Challenges
• Requires labels
• Requires large data sets
• The security field has a very low tolerance for errors

Boosted Decision Trees
Basically, it’s a game of 20 questions.
[Figure: a decision tree showing survival of passengers on the Titanic ("sibsp" is the number of spouses or siblings aboard). The figures under the leaves show the probability of survival and the percentage of observations in the leaf. Source: https://en.wikipedia.org/wiki/Decision_tree_learning]

Boosted Decision Trees
• The trees are built by choosing “questions” that
maximize the discrimination between two classes
• The model is called “boosted” because misclassified
samples are given higher weight in future tree building
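
The two bullets above can be sketched in a few lines of Python. This is a minimal AdaBoost-style illustration with one-question trees ("stumps"), not the model used in the talk: the single numeric feature, the +1 (malicious) / -1 (benign) label encoding, and all function names are assumptions made for the example.

```python
import math

def best_stump(X, y, w):
    """Pick the (feature, threshold, polarity) question with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                # Predict `polarity` when x[j] > t, otherwise the opposite label.
                preds = [polarity if row[j] > t else -polarity for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, polarity)
    return best

def stump_predict(stump, row):
    _, j, t, polarity = stump
    return polarity if row[j] > t else -polarity

def fit_adaboost(X, y, rounds=3):
    """Labels are +1 (malicious) or -1 (benign); returns a list of (alpha, stump)."""
    n = len(X)
    w = [1.0 / n] * n  # start with equal sample weights
    model = []
    for _ in range(rounds):
        stump = best_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)     # vote strength of this stump
        model.append((alpha, stump))
        # Misclassified samples get heavier, correctly classified ones lighter.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, yi, row in zip(w, y, X)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def predict(model, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in model)
    return 1 if score > 0 else -1

# Toy data: no single threshold separates the classes, but three stumps do.
X = [[1], [2], [3], [4], [5], [6]]
y = [1, 1, -1, -1, 1, 1]
model = fit_adaboost(X, y, rounds=3)
print([predict(model, row) for row in X])
```

Each round asks the single question that best separates the classes under the current weights, then shifts weight toward the samples it got wrong so that the next question focuses on them.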

Why Boosted Decision Trees?
Proven results in security and physics
References:
https://www.kaggle.com/c/malware-classification/
http://arxiv.org/pdf/1511.04317.pdf
http://jmlr.org/proceedings/papers/v42/chen14.pdf

Malware Features
The extracted features determine your
model’s performance, but there is a tradeoff
[Figure: a spectrum of features running from "Complicated" to "Explainable"]

Complicated Features
Byte frequency and byte entropy features form a binary
fingerprint that informs the model.
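
As a rough illustration of these features, the sketch below computes a normalized byte-frequency histogram and the Shannon entropy of a byte string using only the standard library. The function names are hypothetical, and whole-file entropy is a simplification chosen for brevity; the slides do not say exactly how the features were computed.

```python
import math
from collections import Counter

def byte_histogram(data: bytes) -> list[float]:
    """Return 256 values: the fraction of the input made up of each byte value."""
    if not data:
        return [0.0] * 256
    counts = Counter(data)
    return [counts.get(b, 0) / len(data) for b in range(256)]

def byte_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0 = one repeated byte, 8 = uniform)."""
    return -sum((p * math.log2(p) for p in byte_histogram(data) if p > 0), 0.0)

# Packed or encrypted regions tend toward high entropy; padding toward low.
print(byte_entropy(b"AAAAAAAA"))
print(byte_entropy(bytes(range(256))))
```

The 256 histogram values plus the entropy could then be concatenated into one feature vector per sample for the classifier.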

Explainable Features
Lists of capabilities don’t greatly help the model classify a
sample, but they can provide more insight to an analyst.
This sample can:
• Record keystrokes
• Send/receive network traffic
• Modify registry

Evaluating Performance
We must be careful not to learn from “future” information.
[Figure: two timelines showing train data, test data, and model train times, annotated "Patterns learned here... should not inform classifications here."]
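
One way to respect this constraint is to split the data by time rather than at random. The sketch below is an assumption-laden example: it supposes each sample carries a first-seen date (the tuple layout and the name `temporal_split` are hypothetical) and puts everything before a cutoff into training and everything from the cutoff onward into testing.

```python
from datetime import date

def temporal_split(samples, cutoff):
    """Split (first_seen, features, label) tuples at a cutoff date.

    Samples first seen before the cutoff go to training; samples seen on or
    after it go to testing, so the model never trains on "future" data.
    """
    train = [s for s in samples if s[0] < cutoff]
    test = [s for s in samples if s[0] >= cutoff]
    return train, test

# Hypothetical samples: (first_seen, feature vector, label)
samples = [
    (date(2016, 1, 5), [0.1, 0.9], 1),
    (date(2016, 2, 11), [0.4, 0.2], 0),
    (date(2016, 3, 20), [0.8, 0.7], 1),
    (date(2016, 4, 2), [0.3, 0.1], 0),
]
train, test = temporal_split(samples, cutoff=date(2016, 3, 1))
print(len(train), len(test))
```

A random shuffle would mix later samples into the training set and could make the measured performance look better than it would be in deployment.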

Bringing Humans into the Loop
Amazon built an entire tool (Mechanical Turk) to cheaply
generate labels from human intuition:
Are these products related?

Bringing Humans into the Loop
Our labels are more expensive to obtain, so choosing
which samples to label is even more important.
Is this binary malicious?
Active Learning can help!

Bringing Humans into the Loop
When new data arrives, Active Learning tells analysts
which labels would be most helpful.
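
A common active learning strategy is uncertainty sampling: ask analysts to label the samples the current model is least sure about. The slides do not say which strategy was used, so the sketch below is only one plausible instance; the function name, the sample IDs, and the scores are all hypothetical.

```python
def most_uncertain(scores, k):
    """Return the k sample IDs whose malicious-probability is closest to 0.5.

    `scores` maps a sample ID to the model's predicted probability that the
    sample is malicious. Scores near 0 or 1 are confident; scores near 0.5
    are where a human label is likely to teach the model the most.
    """
    return sorted(scores, key=lambda sid: abs(scores[sid] - 0.5))[:k]

# Hypothetical model outputs for newly arrived, unlabeled binaries.
scores = {"a": 0.99, "b": 0.52, "c": 0.03, "d": 0.40, "e": 0.75}
print(most_uncertain(scores, k=2))  # ['b', 'd']
```

After the analysts label the selected samples, the model would be retrained and the selection repeated on the next batch of data.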

Integration
• Our malware classifier model has been integrated into
our stealthy sensor and Hunt Platform
• Ask the other friendly Endgamers here for a demo!

Thanks!
proth@endgame.com
@mrphilroth

ML for Malware Classification & Clustering Using Boosted Decision Trees

1. Machine Learning for Malware Classification and Clustering
Phil Roth, Data Scientist

2. Phil Roth, Data Scientist
• PhD in particle astrophysics
• Switched to making images from radar data
• Switched to solving security problems with data

3. Outline
• Malware Detection
• Boosted Decision Trees
• Malware Features
• Evaluating Performance
• Bringing a Human into the Loop

4. The Problem: Antivirus
The security industry has declared antivirus dead, but there is no widely accepted replacement. Machine Learning can be that replacement.

5. The Problem: Antivirus
• Antivirus uses signatures, heuristics, and hand-crafted rules that do not scale well
• Using polymorphism and obfuscation, malware authors can circumvent rule-based detection techniques

6. The Solution: Machine Learning
Machine Learning uses statistical techniques to learn patterns from large datasets.
Two Steps:
• Feature Extraction
• Boundary Learning

7. Machine Learning Advantages
• Automation
• Deep Insights
• Scalability
• Generalization

8. Machine Learning Challenges
• Requires labels
• Requires large data sets
• The security field has very low tolerance for errors

9. Boosted Decision Trees
Basically, it’s a game of 20 questions.
[Figure: a tree showing survival of passengers on the Titanic ("sibsp" is the number of spouses or siblings aboard). The figures under the leaves show the probability of survival and the percentage of observations in the leaf.]
Source: https://en.wikipedia.org/wiki/Decision_tree_learning

10. Boosted Decision Trees
• The trees are built by choosing “questions” that maximize the discrimination between two classes
• The model is called “boosted” because misclassified samples are given higher weight in future tree building
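The reweighting described on this slide can be sketched with AdaBoost over one-question decision stumps. This is only an illustration under stated assumptions: the slides do not say which boosting algorithm the author used, AdaBoost is chosen here because it makes the sample weights explicit, and the toy data and every function name below are hypothetical.

```python
import math

def best_stump(X, y, w):
    """Pick the (feature, threshold, polarity) "question" with the lowest weighted error."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for polarity in (1, -1):
                preds = [polarity if row[f] <= t else -polarity for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, f, t, polarity)
    return best

def stump_predict(stump, row):
    _, f, t, polarity = stump
    return polarity if row[f] <= t else -polarity

def adaboost(X, y, rounds):
    n = len(X)
    w = [1.0 / n] * n                      # start with uniform sample weights
    ensemble = []
    for _ in range(rounds):
        stump = best_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)   # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)      # vote strength of this stump
        ensemble.append((alpha, stump))
        # Misclassified samples are scaled up by e^alpha, correct ones down by e^-alpha.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, yi, row in zip(w, y, X)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, row):
    score = sum(alpha * stump_predict(s, row) for alpha, s in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical toy data: one feature, labels +1 (malicious) / -1 (benign).
# No single threshold separates the classes, so one stump is not enough.
X = [[1], [2], [3], [4], [5], [6]]
y = [1, 1, -1, -1, 1, 1]
model = adaboost(X, y, rounds=3)
```

On this toy set a single stump misclassifies two samples, while three weighted stumps classify all six correctly, because each later stump concentrates on the samples the earlier ones got wrong.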

11. Why Boosted Decision Trees?
Proven results in security and physics
References:
https://www.kaggle.com/c/malware-classification/
http://arxiv.org/pdf/1511.04317.pdf
http://jmlr.org/proceedings/papers/v42/chen14.pdf

12. Malware Features
The extracted features determine your model’s performance, but there is a tradeoff between complicated features and explainable features.

13. Complicated Features
Byte frequency and byte entropy features form a binary fingerprint that informs the model.
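As a rough illustration of these two feature types, the sketch below computes a normalized byte histogram and the Shannon entropy of a byte string. These are the simplest whole-file versions; the slides do not give the exact computation used, and the function names and sample inputs are hypothetical.

```python
import math
from collections import Counter

def byte_histogram(data: bytes) -> list:
    """Normalized frequency of each of the 256 possible byte values."""
    counts = Counter(data)
    n = len(data)
    return [counts.get(b, 0) / n if n else 0.0 for b in range(256)]

def byte_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte, between 0.0 and 8.0."""
    return sum(-p * math.log2(p) for p in byte_histogram(data) if p > 0)

# Hypothetical inputs standing in for file contents.
uniform = bytes(range(256)) * 4    # every byte value equally likely: high entropy
padding = b"\x00" * 1024           # one repeated byte: zero entropy

# A fixed-length feature vector: 256 frequencies plus one entropy value.
features = byte_histogram(uniform) + [byte_entropy(uniform)]
```

Packed or encrypted regions tend toward high entropy and padding toward low entropy, which is why such values can help a model even though an analyst cannot easily read meaning into them.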

14. Explainable Features
Lists of capabilities don’t greatly help the model classify a sample, but they can provide more insight to an analyst.
This sample can:
• Record keystrokes
• Send/receive network traffic
• Modify registry

15. Evaluating Performance
We must be careful not to learn from “future” information.
[Figure: two time axes showing train data, test data, and model train times, annotated “Patterns learned here ... should not inform classifications here”.]
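One way to respect this constraint is to split by time rather than at random. The sketch below assumes each sample carries a first-seen date; the field names, sample records, and cutoff are hypothetical and not taken from the slides.

```python
from datetime import date

def temporal_split(samples, cutoff):
    """Train on samples first seen before `cutoff`, test on everything after."""
    train = [s for s in samples if s["first_seen"] < cutoff]
    test = [s for s in samples if s["first_seen"] >= cutoff]
    return train, test

# Hypothetical records; label 1 = malicious, 0 = benign.
samples = [
    {"id": "sample-a", "first_seen": date(2015, 3, 1), "label": 1},
    {"id": "sample-b", "first_seen": date(2015, 7, 15), "label": 0},
    {"id": "sample-c", "first_seen": date(2016, 1, 10), "label": 1},
    {"id": "sample-d", "first_seen": date(2016, 2, 2), "label": 0},
]

train, test = temporal_split(samples, cutoff=date(2016, 1, 1))
```

A random shuffle would let samples from 2016 shape a model that is then scored on 2015 data, which overstates how well the model would have done in deployment.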

16. Bringing Humans in the Loop
Amazon built an entire tool (Mechanical Turk) to cheaply generate labels from human intuition.
Example question: Are these products related?

17. Bringing Humans in the Loop
Our labels are more expensive to obtain, so choosing which samples to label is even more important.
Example question: Is this binary malicious?
Active Learning can help!

18. Bringing Humans in the Loop
When new data arrives, Active Learning tells analysts which labels would be most helpful.
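A common way to decide which labels would be most helpful is uncertainty sampling: ask analysts about the samples the model is least sure of. The slides do not say which active learning strategy was used, so the sketch below is an assumption, and the sample ids and scores are made up.

```python
def most_uncertain(scores, k):
    """Return the ids of the k samples whose malware probability is closest to 0.5."""
    ranked = sorted(scores, key=lambda sample_id: abs(scores[sample_id] - 0.5))
    return ranked[:k]

# Hypothetical model outputs: probability that each new sample is malicious.
scores = {"a": 0.98, "b": 0.52, "c": 0.03, "d": 0.41, "e": 0.75}

label_queue = most_uncertain(scores, k=2)   # ['b', 'd']
```

Samples scored near 0 or 1 are skipped because a label for them would rarely change the model, so analyst time goes to the borderline cases.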

19. Integration
• Our malware classifier model has been integrated into our stealthy sensor and Hunt Platform
• Ask the other friendly Endgamers here for a demo!

20. Thanks!
proth@endgame.com
@mrphilroth

Editor's notes

Dive right into train versus test data.