Security Applications for Malicious
Code Detection using Data Mining
Under the guidance of
Prof. Pawade A.S.
Presented By
1. Kulkarni Eshwari
2. Annaldas Namrata
3. Yalameli Pravin
4. Shingade Karishma
5. Jena Suvendu
SAMCD
OVERVIEW
1. Introduction
   What is Malicious Code?
   Harmful effects of Malicious Code
   How Data Mining is useful for detecting Malicious Code
3. Problem Statement
4. Objective and Scope
5. Methodology
6. Vision of System Architecture
7. Algorithm
8. Flow of Project
9. Advantages
10. Conclusion
11. References
INTRODUCTION
What is Malicious Code?
• Malicious code is any code in any part of a software system or script that is intended to cause undesired effects, security breaches or damage to a system.
• Example: a rather simple virus that searches for "*.exe" files.
• It exploits a software vulnerability on the victim's system.
Harmful effects of Malicious Code
• It can harm the confidentiality, integrity or availability of your computer data or network, and can cause further harm by stealing your personal information.
• It may remotely infect other victims.
How Data Mining is useful for detecting Malicious Code
• The goal is to automatically design and build a scanner that accurately detects malicious executables before they have been given a chance to run.
• Data mining methods detect patterns in large amounts of data, such as byte code, and use these patterns to detect future instances in similar data.
• The framework uses classifiers to detect new malicious executables.
• A classifier is a rule set, or detection model, generated by a data mining algorithm that was trained over a given set of training data.
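As a rough illustration of "patterns in byte code", the sketch below counts overlapping byte n-grams, one common way to turn raw bytes into features. It is not the deck's own feature extractor; the function name `byte_ngrams` and the toy byte string are assumptions made for the example.

```python
from collections import Counter


def byte_ngrams(data: bytes, n: int = 2) -> Counter:
    """Count every overlapping n-byte sequence in `data`."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


# Toy bytes standing in for an executable (hypothetical, not a real sample).
sample = bytes.fromhex("4d5a90004d5a")
counts = byte_ngrams(sample, n=2)

# The most frequent n-grams across many labelled files could then serve
# as the columns of a feature table for a classifier.
print(counts.most_common(3))
```

Counting n-grams needs no knowledge of the file format, which is why it is often used as a baseline feature representation.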
PROBLEM STATEMENT
Traditional signature-based malware detection is ineffective because malware constantly changes its nature and shape through obfuscation techniques. Some feature representations are effective for detecting malicious code in large volumes of historical data when combined with classifiers and learning algorithms such as RIPPER, giving a higher detection rate.
OBJECTIVE
• To perform data pre-processing that puts the data into an appropriate format for input to machine learning classifiers.
• To develop two representative supervised machine learning models, such as a decision tree.
• To evaluate the performance of a support vector machine and an artificial neural network in classifying new malicious executable programs.
SCOPE
• The focus is on malicious programs that target Microsoft Windows, which serves as the experiment platform, with VMware providing the virtual machine.
• This project focuses on supervised machine learning techniques, because they allow statistical comparisons on specific datasets to examine the accuracy of the trained classifiers.
METHODOLOGY
DECISION TREES AND RULES:
• They work over a single table only, and over a single attribute at a time.
• They are useful when the outcomes are uncertain.
• They allow different possible decisions to be compared.
• They are easily understandable: they build a model made up of rules (split points).
• They are one of the most widely used data mining techniques.
Classification Example
Suppose there are two predictor attributes: Age and Car type (Sport, Minivan and Truck).
• Age is an ordered attribute; Car type is a categorical attribute.
• The class label indicates whether the person bought the product.
• The dependent attribute is categorical.

Age | Car | Class
20  | M   | Yes
30  | M   | Yes
25  | T   | No
30  | S   | Yes
40  | S   | Yes
20  | T   | No
30  | M   | Yes
25  | M   | Yes
40  | M   | Yes
20  | S   | No
(M = Minivan, S = Sport, T = Truck)
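What is a Decision Tree?
[Figure: decision tree for the classification example. The root splits on Age: the >=30 branch is a YES leaf, and the <30 branch splits on Car type, with Minivan leading to YES and Sport, Truck leading to NO. A companion plot shows the same partition along the Age axis (0, 30, 60).]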
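One reading of the tree in the figure (split on Age at 30, then on Car type) can be written as a small function and checked against the ten rows of the classification example. This is a sketch of that reading only; the function name `predict` is an assumption.

```python
# (age, car_type, class) rows from the classification example;
# M = Minivan, S = Sport, T = Truck.
ROWS = [
    (20, "M", "Yes"), (30, "M", "Yes"), (25, "T", "No"), (30, "S", "Yes"),
    (40, "S", "Yes"), (20, "T", "No"), (30, "M", "Yes"), (25, "M", "Yes"),
    (40, "M", "Yes"), (20, "S", "No"),
]


def predict(age: int, car_type: str) -> str:
    """Walk the two-level tree: first Age, then Car type."""
    if age >= 30:
        return "Yes"                      # right branch is a YES leaf
    return "Yes" if car_type == "M" else "No"


correct = sum(predict(age, car) == label for age, car, label in ROWS)
print(f"{correct}/{len(ROWS)} rows classified correctly")
```

Each path from the root to a leaf corresponds to one if-then rule, which is what makes the model easy to read.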
A decision tree is built top-down from a root node and involves partitioning the data into subsets that contain instances with similar values (homogeneous subsets).
Entropy
The ID3 algorithm uses entropy to calculate the homogeneity of a sample. If the sample is completely homogeneous, the entropy is zero; if the sample is equally divided between two classes, the entropy is one.
Information Gain
The information gain is based on the decrease in entropy after a dataset is split on an attribute. Constructing a decision tree is all about finding the attribute that returns the highest information gain (i.e., the most homogeneous branches).
Step 1: Calculate the entropy of the target.
Step 2: Split the dataset on the different attributes and calculate the entropy of each branch.
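The two steps can be sketched on the classification example as follows. This is a minimal illustration of Shannon entropy and information gain, not a full ID3 implementation. The helper names are assumptions, and the threshold `age < 30` is taken from the tree figure rather than learned.

```python
from collections import Counter
from math import log2

# (age, car_type, class) rows from the classification example.
ROWS = [
    (20, "M", "Yes"), (30, "M", "Yes"), (25, "T", "No"), (30, "S", "Yes"),
    (40, "S", "Yes"), (20, "T", "No"), (30, "M", "Yes"), (25, "M", "Yes"),
    (40, "M", "Yes"), (20, "S", "No"),
]


def entropy(labels) -> float:
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total)
                for c in Counter(labels).values())


def information_gain(rows, attribute_index: int) -> float:
    """Entropy of the target minus the weighted entropy of each branch."""
    branches = {}
    for row in rows:
        branches.setdefault(row[attribute_index], []).append(row[-1])
    weighted = sum(len(b) / len(rows) * entropy(b)
                   for b in branches.values())
    return entropy([row[-1] for row in rows]) - weighted


# Step 1: entropy of the target column.
target_entropy = entropy([label for _, _, label in ROWS])

# Step 2: split on each attribute (Age is binarised at 30 here).
split_rows = [(age < 30, car, label) for age, car, label in ROWS]
gain_age = information_gain(split_rows, 0)
gain_car = information_gain(split_rows, 1)
print(target_entropy, gain_age, gain_car)
```

ID3 would repeat the same calculation inside each branch until the branches are homogeneous or no attributes remain.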
Advantages of Decision Tree
• The use of probability allows flexibility.
• It brings objective analysis to decision making.
• It encourages clear thinking and planning.
VISION OF SYSTEM ARCHITECTURE
The architecture of our malware detection system consists of three main modules:
1. PE-Miner
2. Feature selection and data transformation
3. Learning algorithms such as RIPPER
[Figure: system architecture. PE-Miner extracts the PE header and the DLLs and DLL function calls into a feature database; feature selection and transformation produces a training set and a testing set; the learning algorithms produce the classification result.]
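To give a feel for the first module, the sketch below reads a few fields from the COFF file header of a PE file using only the Python standard library. It is not the PE-Miner implementation and covers only the header part; DLL and function-call extraction would require parsing the import table. The function names and the synthetic header are assumptions for illustration.

```python
import struct


def parse_pe_header(data: bytes) -> dict:
    """Return a few COFF file-header fields from the raw bytes of a PE file."""
    if data[:2] != b"MZ":
        raise ValueError("not a PE file: missing MZ signature")
    # e_lfanew (offset 0x3C in the DOS header) points at the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("not a PE file: missing PE signature")
    # The 20-byte COFF file header follows the 4-byte signature.
    (machine, n_sections, timestamp, _symtab, _n_symbols,
     optional_size, characteristics) = struct.unpack_from(
        "<HHIIIHH", data, pe_offset + 4)
    return {
        "machine": machine,
        "number_of_sections": n_sections,
        "timestamp": timestamp,
        "size_of_optional_header": optional_size,
        "characteristics": characteristics,
    }


def synthetic_pe(sections: int = 3) -> bytes:
    """Build a tiny fake header (not a runnable executable) for the demo."""
    dos = bytearray(0x40)
    dos[:2] = b"MZ"
    struct.pack_into("<I", dos, 0x3C, 0x40)      # PE signature right after
    coff = struct.pack("<HHIIIHH", 0x14C, sections, 0, 0, 0, 0, 0x0102)
    return bytes(dos) + b"PE\x00\x00" + coff


features = parse_pe_header(synthetic_pe())
print(features)
```

Each returned field could become one column of the feature database before feature selection and transformation.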
ALGORITHM
We propose a data mining algorithm to produce new classifiers with separate features.
RIPPER algorithm
• The RIPPER algorithm is an inductive rule learner.
• It is applied here to build rules that detect examples of malicious executables.
• The algorithm uses LibBFD data as features.
• It builds a set of rules that can determine the classes while reducing ambiguities.
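The output of a rule learner such as RIPPER is an ordered rule set with a default class. The sketch below shows only how such a rule set is applied to a file's features; it does not implement rule learning. The rules and feature names are hypothetical and are not taken from the deck or from LibBFD.

```python
# Each rule: (conditions that must all be present, predicted class).
# Rules and feature names are hypothetical, for illustration only.
RULES = [
    ({"imports_wsock32", "no_gui_calls"}, "malicious"),
    ({"writes_exe_files"}, "malicious"),
]
DEFAULT_CLASS = "benign"


def classify(features: set) -> str:
    """Return the class of the first rule whose conditions all hold."""
    for conditions, label in RULES:
        if conditions <= features:        # subset test: every condition met
            return label
    return DEFAULT_CLASS                  # no rule fired


print(classify({"imports_wsock32", "no_gui_calls", "has_icon"}))
print(classify({"imports_wsock32"}))
```

Because the rules are checked in order and the first match wins, each file receives exactly one class, which is one way a rule set reduces ambiguity.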
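FLOW OF PROJECT
[Flowchart; the boxes are listed in reading order, and the exact branch targets are not recoverable from the extracted text.]
1. Start
2. Prepare the dataset of .exe files
3. Read the file attributes from a particular file
4. Separate the call header and call code of the file
5. Prepare the testing set and the training set
6. Decision: is the information gain count of the file = 0?
7. Decision: is the accuracy of the prediction correct? If no, change the prediction attribute.
8. Result: the file is dirty (malicious) or clean (non-malicious)
9. Stop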
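The "prepare for testing / prepare for the training set" and "is accuracy of prediction correct?" boxes can be sketched as a hold-out split followed by an accuracy check. This is an assumed, minimal version of those steps; the function names, the 30% test fraction and the fixed seed are illustrative choices, not values from the deck.

```python
import random


def train_test_split(samples, test_fraction=0.3, seed=0):
    """Shuffle a copy of `samples` and cut it into (train, test) lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(predict, labelled) -> float:
    """Fraction of (features, label) pairs that `predict` gets right."""
    return sum(predict(x) == y for x, y in labelled) / len(labelled)


train, test = train_test_split(range(10))
print(len(train), len(test))
```

A classifier would be fitted on `train` only, and `accuracy` would then be measured on `test` to decide whether the chosen prediction attribute needs to change.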
ADVANTAGES
• Fast testing
• Low overhead
• Robust against many kinds of confusion
CONCLUSION
• There is a need for a technique that detects malicious patterns in executable code sequences more efficiently.
• This procedure is expected to lead to the development of better algorithms for identifying malicious code that has infected a system.
Any Questions?
Thank You!
