2. 01/18/15 Data Mining in Intrusion Detection 2
Intrusion detection and computer security
Current intrusion detection approaches
Data mining
Data mining tool: Weka
3. 01/18/15 Data Mining in Intrusion Detection 3
Computer security goals: confidentiality,
integrity, and availability
An intrusion is a set of actions aimed at
compromising these security goals
Intrusion prevention (authentication,
encryption, etc.) alone is not sufficient
Intrusion detection is needed
4. 01/18/15 Data Mining in Intrusion Detection 4
Primary assumption: user and program
activities can be monitored and modeled
Key elements:
Resources to be protected
Models of the “normal” or “legitimate”
behavior on the resources
Efficient methods that compare real-time
activities against the models and report
probable “intrusive” activities
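The key elements above can be illustrated with a minimal sketch. It assumes a single monitored quantity (failed logins per hour, a hypothetical choice) and models "normal" behavior as a mean and standard deviation; the function names and the threshold of 3 are illustrative assumptions, not something the slides prescribe.

```python
from statistics import mean, stdev

def build_profile(normal_values):
    """Summarize normal behavior as (mean, standard deviation)."""
    return mean(normal_values), stdev(normal_values)

def is_suspicious(value, profile, threshold=3.0):
    """Flag a value lying more than `threshold` standard deviations from the mean."""
    mu, sigma = profile
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold

# Hypothetical training data: failed logins per hour during normal use.
normal_failed_logins = [2, 3, 1, 4, 2, 3, 2, 1, 3, 2]
profile = build_profile(normal_failed_logins)

# New activity is compared against the model; only the last value stands out.
observed = [3, 2, 40]
flags = [is_suspicious(v, profile) for v in observed]
print(flags)
```

A real system would track many such quantities per resource and would need a way to keep the profile current as legitimate usage changes.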
5. 01/18/15 Data Mining in Intrusion Detection 5
Two categories of techniques:
Misuse detection: use patterns of well-known
attacks to identify intrusions
Anomaly detection: use deviation from normal
usage patterns to identify intrusions
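As a rough illustration of misuse detection, the sketch below matches events against a small table of attack patterns. The two signatures and the sample request lines are hypothetical, chosen only to show the mechanism; real signature sets are far larger and more precise.

```python
import re

# Hypothetical signatures: name -> regular expression over a request line.
SIGNATURES = {
    "path_traversal": re.compile(r"\.\./"),
    "sql_injection": re.compile(r"(?i)union\s+select"),
}

def match_signatures(event):
    """Return the names of all known attack patterns found in the event."""
    return [name for name, pattern in SIGNATURES.items() if pattern.search(event)]

events = [
    "GET /index.html",
    "GET /../../etc/passwd",
    "GET /items?id=1 UNION SELECT password FROM users",
]
alerts = {event: match_signatures(event) for event in events}
for event, names in alerts.items():
    print(event, "->", names)
```

An event that matches no pattern raises no alert, so this style of detection only finds attacks whose patterns are already known.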
6. Knowledge Discovery in Databases (KDD)
“Process of extracting useful information from large databases”
KDD basic steps
1. Understanding the application domain
2. Data integration and selection
3. Data mining
4. Pattern Evaluation
5. Knowledge representation
Related Fields
Machine learning, statistics, others
7. Data mining is “concerned with uncovering patterns, associations, changes, anomalies,
and statistically significant structures and events in data”
Why Data Mining?
Understand existing data
Predict new data
Components
Representation
▪ Decide what kind of model we can build.
▪ Model is a compact summary of examples.
Learning Element
▪ Builds a model from a set of examples
Performance Element
▪ Applies the model to new observations
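The three components can be sketched in a few lines. This is an assumed toy setup, not a method from the slides: the representation is a table mapping each service to its majority label, `learn` plays the role of the learning element, and `apply_model` plays the role of the performance element. The service names and labels are hypothetical.

```python
from collections import Counter, defaultdict

def learn(examples):
    """Learning element: build a model from (service, label) examples."""
    counts = defaultdict(Counter)
    for service, label in examples:
        counts[service][label] += 1
    # Representation: a compact table mapping each service to its majority label.
    return {service: c.most_common(1)[0][0] for service, c in counts.items()}

def apply_model(model, service, default="unknown"):
    """Performance element: apply the model to a new observation."""
    return model.get(service, default)

examples = [
    ("http", "normal"), ("http", "normal"), ("http", "attack"),
    ("telnet", "attack"), ("telnet", "attack"), ("smtp", "normal"),
]
model = learn(examples)
print(model)
print(apply_model(model, "telnet"), apply_model(model, "ftp"))
```

Six examples collapse into a three-entry table, which is the sense in which a model is a compact summary of examples.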
8. 01/18/15 Data Mining in Intrusion Detection 8
Why is it applicable to intrusion detection?
Normal and intrusive activities leave evidence
in audit data
From a data-centric point of view, intrusion
detection is a data analysis process
Successful applications in related domains,
e.g., fraud detection, fault/alarm
management
9. Well-known techniques used in intrusion detection
Association Rules [Descriptive]
Classification [Predictive]
Clustering [Descriptive]
Preliminary step
Raw data -> database table (training set)
Columns – Attributes
Rows – Records
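The preliminary step might look like the sketch below, which turns raw comma-separated lines into a table of records. The attribute names and the sample lines are hypothetical; actual audit data would need its own parsing rules.

```python
import csv
import io

# Hypothetical raw audit data: one connection per line.
RAW = """\
0.2,tcp,http,SF,normal
1.5,tcp,telnet,REJ,attack
0.0,udp,dns,SF,normal
"""

# Columns of the table (assumed names).
ATTRIBUTES = ["duration", "protocol", "service", "flag", "label"]

def to_table(raw_text):
    """Turn raw lines into records (rows) keyed by attribute (column) names."""
    reader = csv.DictReader(io.StringIO(raw_text), fieldnames=ATTRIBUTES)
    table = []
    for row in reader:
        row["duration"] = float(row["duration"])  # numeric attribute
        table.append(dict(row))
    return table

training_set = to_table(RAW)
for record in training_set:
    print(record)
```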
10. Motivated by market-basket analysis
Generate Rules that capture implications between
attribute values
Rule Example
Lettuce & Tomato -> Salad Dressing [0.4, 0.9]
Parameters [s, c]
Support (s) = fraction of records that satisfy both LHS and RHS
Confidence (c) = P(satisfies RHS | satisfies LHS)
Mining Problem
“Find all association rules that have support and
confidence > user-defined minimum values”
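The two parameters can be computed directly from their definitions. The sketch below uses five invented market baskets, so the resulting numbers differ from the [0.4, 0.9] in the rule example; it only evaluates one given rule and does not search for all rules.

```python
def support(records, items):
    """Fraction of records that contain every item in `items`."""
    items = set(items)
    return sum(items <= set(r) for r in records) / len(records)

def confidence(records, lhs, rhs):
    """Estimate P(RHS | LHS) as support(LHS and RHS) / support(LHS).

    Assumes the LHS occurs in at least one record."""
    return support(records, set(lhs) | set(rhs)) / support(records, lhs)

# Hypothetical transactions.
baskets = [
    {"lettuce", "tomato", "dressing"},
    {"lettuce", "tomato", "dressing"},
    {"lettuce", "tomato"},
    {"bread"},
    {"milk", "bread"},
]
s = support(baskets, {"lettuce", "tomato", "dressing"})
c = confidence(baskets, {"lettuce", "tomato"}, {"dressing"})
print(f"support={s:.2f} confidence={c:.2f}")
```

Here two of five baskets satisfy both sides of the rule, and two of the three baskets containing lettuce and tomato also contain dressing.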
11. Predefined set of classes
Training set has Class as one of the attributes
Supervised Learning
Mining Problem
“Find a model for the class attribute as a function of the values of other
attributes”
Use model to predict class
for new records
Classifier representation
If-then Rules
Decision Trees
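One very small instance of the classification problem is a learner that builds if-then rules on a single attribute, in the spirit of one-rule learners. The sketch below is an assumed simplification with hypothetical attributes and records, not the classifier the slides have in mind.

```python
from collections import Counter, defaultdict

def learn_one_rule(records, class_attr):
    """Pick the single attribute whose value -> majority-class rules
    make the fewest errors on the training set."""
    best = None
    attributes = [a for a in records[0] if a != class_attr]
    for attr in attributes:
        counts = defaultdict(Counter)
        for r in records:
            counts[r[attr]][r[class_attr]] += 1
        rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(rules[r[attr]] != r[class_attr] for r in records)
        if best is None or errors < best[0]:
            best = (errors, attr, rules)
    return best[1], best[2]

def predict(model, record, default=None):
    """Use the learned if-then rules to predict the class of a new record."""
    attr, rules = model
    return rules.get(record[attr], default)

# Hypothetical training set; "label" is the class attribute.
training = [
    {"protocol": "tcp", "flag": "SF", "label": "normal"},
    {"protocol": "tcp", "flag": "REJ", "label": "attack"},
    {"protocol": "udp", "flag": "SF", "label": "normal"},
    {"protocol": "tcp", "flag": "REJ", "label": "attack"},
    {"protocol": "udp", "flag": "SF", "label": "normal"},
]
model = learn_one_rule(training, "label")
prediction = predict(model, {"protocol": "tcp", "flag": "REJ"})
print(model, prediction)
```

On this data the rules on `flag` fit the training set without error, while the rules on `protocol` misclassify one record, so `flag` is chosen.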
12. Given Data Set and Similarity Measure
Unsupervised Learning
Mining Problem
“Group records into clusters such that records within a cluster are more similar to one
another, and records in separate clusters are less similar to one another”
Similarity Measures:
Euclidean Distance if attributes are continuous.
Other Problem-specific Measures.
Clustering Methods
Partitioning
▪ Divide data into disjoint partitions
Hierarchical
▪ Root is the complete data set, leaves are individual records, and intermediate layers are partitions
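A partitioning method with Euclidean distance can be sketched as a basic k-means loop. The points, the initial centroids, and the fixed iteration count are assumptions made for illustration; the slides do not name a specific algorithm.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two points with continuous attributes."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, centroids, iterations=10):
    """Partition points into disjoint clusters around the given initial centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: euclidean(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return clusters, centroids

# Hypothetical records with two continuous attributes.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
clusters, centroids = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(clusters)
```

Every point ends up in exactly one cluster, which is what makes the result a set of disjoint partitions.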
13. Detection Approach
Misuse Detection
▪ Based on known malicious patterns
(signatures)
Anomaly Detection
▪ Based on deviations from established
normal patterns (profiles)
Data Source
Network-based (NIDS)
▪ Network traffic
Host-based (HIDS)
▪ Audit trails
14. Signature extraction
Rule matching
Alarm data analysis
Reduce false alarms
Eliminate redundant alarms
Feature selection
Training data cleaning
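Eliminating redundant alarms can be approximated with a simple time-window heuristic, sketched below. This is an assumption for illustration rather than a mining technique named in the slides: alarms are hypothetical (timestamp, source, signature) tuples, and a repeat of the same source and signature within 60 seconds of the last kept alarm is dropped.

```python
def deduplicate(alarms, window=60):
    """Keep an alarm only if no alarm with the same (source, signature)
    was kept within the previous `window` seconds."""
    last_kept = {}
    kept = []
    for ts, source, signature in sorted(alarms):
        key = (source, signature)
        if key not in last_kept or ts - last_kept[key] > window:
            kept.append((ts, source, signature))
            last_kept[key] = ts
    return kept

# Hypothetical alarm stream: (seconds, source address, signature name).
alarms = [
    (0, "10.0.0.5", "port_scan"),
    (10, "10.0.0.5", "port_scan"),
    (20, "10.0.0.5", "port_scan"),
    (30, "10.0.0.9", "port_scan"),
    (200, "10.0.0.5", "port_scan"),
]
unique_alarms = deduplicate(alarms)
print(unique_alarms)
```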
15. Behavioral Features for Network Anomaly Detection
Raw attribute values cannot be used directly as features
Features require interpretation of the protocol specifications
Transform attributes into behavioral features
by aggregating the attribute values
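One way to read "aggregation of the attribute values" is to count, for each connection, how many recent connections went to the same destination. The sketch below assumes hypothetical connection records and a two-second window; both are illustrative choices.

```python
def add_count_feature(connections, window=2.0):
    """For each connection, count earlier connections to the same
    destination host within the past `window` seconds."""
    connections = sorted(connections, key=lambda c: c["time"])
    enriched = []
    for i, conn in enumerate(connections):
        count = sum(
            1
            for prev in connections[:i]
            if prev["dst"] == conn["dst"] and conn["time"] - prev["time"] <= window
        )
        enriched.append({**conn, "same_host_count": count})
    return enriched

# Hypothetical connection records with raw attributes only.
connections = [
    {"time": 0.0, "dst": "10.0.0.1", "port": 22},
    {"time": 0.5, "dst": "10.0.0.1", "port": 23},
    {"time": 1.0, "dst": "10.0.0.1", "port": 80},
    {"time": 1.2, "dst": "10.0.0.2", "port": 80},
    {"time": 5.0, "dst": "10.0.0.1", "port": 443},
]
features = add_count_feature(connections)
counts = [f["same_host_count"] for f in features]
print(counts)
```

A single destination address says little on its own, whereas a rising count of connections to one host within a short window describes behavior.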
Data Mining Challenges
Self-tuning data mining techniques
Pattern-finding and prior knowledge
Modeling of temporal data
Scalability
Incremental mining
17. 01/18/15 Data Mining in Intrusion Detection 17
Waikato Environment for Knowledge Analysis
It is a data mining/machine learning tool
developed by the Department of Computer
Science at the University of Waikato, New Zealand.
Weka is also a bird found only on the islands of
New Zealand.
18. 01/18/15 Data Mining in Intrusion Detection 18
49 data preprocessing tools
76 classification/regression algorithms
8 clustering algorithms
3 algorithms for finding association rules
15 attribute/subset evaluators + 10 search
algorithms for feature selection
19. 01/18/15 Data Mining in Intrusion Detection 19
Three graphical user interfaces
“The Explorer” (exploratory data analysis)
“The Experimenter” (experimental
environment)
“The KnowledgeFlow” (new interface
inspired by a process model)
20. 01/18/15 Data Mining in Intrusion Detection 20
Data can be imported from a file in various
formats: ARFF, CSV, C4.5, binary
Data can also be read from a URL or from
an SQL database (using JDBC)
Pre-processing tools in WEKA are called
“filters”
WEKA contains filters for:
Discretization, normalization, resampling,
attribute selection, transforming and
combining attributes, …
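To make two of these filter types concrete, the sketch below shows min-max normalization and equal-width discretization in plain Python. It illustrates the general idea only and is not WEKA's implementation; the function names, the bin count, and the sample values are assumptions.

```python
def normalize(values):
    """Min-max scale numeric values into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def discretize(values, bins=3):
    """Equal-width binning: map each value to a bin index 0..bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / bins
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), bins - 1) for v in values]

# Hypothetical numeric attribute, e.g. connection durations in seconds.
durations = [0.0, 2.0, 5.0, 10.0]
scaled = normalize(durations)
binned = discretize(durations)
print(scaled, binned)
```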
28. 01/18/15 Data Mining in Intrusion Detection 28
Classifiers in WEKA are models for
predicting nominal or numeric quantities
Implemented learning schemes include:
Decision trees and lists, instance-based
classifiers, support vector machines,
multilayer perceptrons, logistic regression,
Bayesian networks, …