Slides for the paper presented at the 4th IEEE International Advance Computing Conference
Bhavesh Kasliwal, Shraey Bhatia, Shubham Saini, I.Sumaiya Thaseen, Ch.Aswani Kumar. "A Hybrid Anomaly Detection Model using G-LDA" at 4th IEEE International Advance Computing Conference. (21-22 Feb, 2014)
1. A Hybrid Anomaly Detection Model using G-LDA
Bhavesh Kasliwal, Shraey Bhatia, Shubham Saini, I. Sumaiya Thaseen, Ch. Aswani Kumar
VIT University – Chennai
4. Attribute Selection
“With more data, the simpler solution can be more accurate than the sophisticated solution.”
Selection process based on the means and modes of the numeric attributes
An attribute is kept when its mode differs between anomaly and normal patterns and the mean of each class leans towards that class's mode
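One possible reading of this selection rule is sketched below. It assumes the values of one numeric attribute are available as a separate list per class. The function names, the `tolerance` parameter, and the toy numbers are hypothetical and are not taken from the paper.

```python
from statistics import mean, mode

def class_summary(values):
    """Return (mean, mode) of one numeric attribute within one class."""
    return mean(values), mode(values)

def is_discriminative(anomaly_values, normal_values, tolerance=0.5):
    """Assumed rule: keep the attribute when the two class modes differ and
    each class mean lies within tolerance * (gap between the modes) of its
    own mode."""
    a_mean, a_mode = class_summary(anomaly_values)
    n_mean, n_mode = class_summary(normal_values)
    if a_mode == n_mode:
        return False  # no contrast between the classes
    gap = abs(a_mode - n_mode)
    return (abs(a_mean - a_mode) <= tolerance * gap
            and abs(n_mean - n_mode) <= tolerance * gap)

# Toy values (made up): the modes contrast and the means stay near them.
print(is_discriminative([511, 511, 511, 480, 500], [1, 1, 2, 1, 5]))
# Toy values (made up): identical modes, so the attribute is dropped.
print(is_discriminative([0, 0, 0, 1], [0, 0, 0, 0]))
```

The `tolerance` threshold is only one way to express "means inclined towards the modes"; the slides do not give a numeric criterion.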
6. Training Set Selection (using LDA)
Latent Dirichlet Allocation (LDA) is a generative model that explains sets of observations through unobserved groups, which account for why some parts of the data are similar.
LDA is applied separately to the anomaly and normal packets to obtain 200 sets of 10 packets each, with each set dominated by a particular packet type.
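The grouping step can be pictured as taking, for each LDA topic, the packets with the highest weight under that topic. The sketch below assumes a per-topic weight table has already been produced by some LDA implementation. The `topic_weights` structure and the function name are hypothetical, and the slides do not show the tooling actually used.

```python
import heapq

def top_packets_per_topic(topic_weights, set_size=10):
    """topic_weights: one dict per topic, mapping a packet record (string)
    to its weight under that topic. Returns one list per topic holding the
    `set_size` highest-weighted packets, heaviest first."""
    return [
        heapq.nlargest(set_size, weights, key=weights.get)
        for weights in topic_weights
    ]

# Toy weights (made up), two topics, two packets kept per topic.
toy_weights = [
    {"icmp,eco_i": 0.6, "tcp,telnet": 0.3, "tcp,uucp": 0.1},
    {"tcp,finger": 0.5, "icmp,ecr_i": 0.4, "tcp,telnet": 0.1},
]
packet_sets = top_packets_per_topic(toy_weights, set_size=2)
print(packet_sets)
```

Under this assumption, 200 topics with `set_size=10` would give 200 sets of 10 packets.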
7. Sample LDA Output
Topic 0th:
0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.25,0,0,0,0,anomaly
0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.25,0,0,0,0,anomaly
0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.25,0,0,0,0,anomaly
0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.25,0,0,0,0,anomaly
0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.25,0,0,0,0,anomaly
0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.26,0,0,0,0,anomaly
0,icmp,eco_i,SF,18,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,1,0,1,0.26,0,0,0,0,anomaly
0,tcp,telnet,S0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,125,13,1,1,0,0,0.1,0.06,0,255,0.03,0.07,0,0,1,1,0,0,anomaly
0,tcp,uucp,S0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,135,9,1,1,0,0,0.07,0.06,0,255,0.04,0.07,0,0,1,1,0,0,anomaly
0,tcp,vmnet,S0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,258,10,1,1,0,0,0.04,0.05,0,255,0.04,0.05,0,0,1,1,0,0,anomaly
Topic 1th:
0,tcp,finger,S0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,14,3,1,1,0,0,0.21,0.29,0,255,0.25,0.02,0.01,0,1,1,0,0,anomaly
0,tcp,finger,S0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,246,20,1,1,0,0,0.08,0.06,0,255,0.08,0.07,0,0,1,1,0,0,anomaly
0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0,0,0,0,1,0,0,255,0.55,0.01,0.55,0,0,0,0,0,anomaly
0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0,0,0,0,1,0,0,255,0.56,0.02,0.56,0,0.01,0,0,0,anomaly
0,icmp,ecr_i,SF,1032,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,511,511,0,0,0,0,1,0,0,255,0.6,0.01,0.6,0,0,0,0,0,anomaly
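Output in this shape can be regrouped by topic for later processing. The following is a minimal sketch, assuming each topic starts with a `Topic Nth:` header line and every following line holds one complete comma-separated record whose last field is the class label. `parse_lda_output` is a hypothetical helper, not part of any LDA tool.

```python
import re

TOPIC_HEADER = re.compile(r"^Topic (\d+)th:$")

def parse_lda_output(text):
    """Group records by topic index. Each record becomes a (fields, label)
    pair, where label is the last comma-separated field."""
    topics = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        header = TOPIC_HEADER.match(line)
        if header:
            current = int(header.group(1))
            topics[current] = []
        elif current is not None:
            *fields, label = line.split(",")
            topics[current].append((fields, label))
    return topics

# Records shortened to a few fields for readability (illustrative only).
sample = """Topic 0th:
0,icmp,eco_i,SF,18,anomaly
0,tcp,telnet,S0,0,anomaly
Topic 1th:
0,tcp,finger,S0,0,anomaly
"""
parsed = parse_lda_output(sample)
print({topic: len(records) for topic, records in parsed.items()})
```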
9. Genetic Algorithm
Applied to normal and anomaly packets separately
A threshold value decides when a negative weight is assigned
Run for 3 generations
The top 3 values for anomaly packets and for normal packets are used
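The slides do not specify the encoding, fitness function, or genetic operators, so the sketch below is only an illustration of how these four points could fit together. Everything in it is an assumption: an individual is a set of 3 candidate values for one attribute, a value scores its relative frequency in its own class, and that score turns negative when the value is more frequent than `threshold` in the other class. All names and default parameters are hypothetical.

```python
import random
from collections import Counter

def relative_freq(values):
    """Map each distinct value to its share of the list."""
    counts = Counter(values)
    return {v: c / len(values) for v, c in counts.items()}

def value_score(value, own_freq, other_freq, threshold):
    """Assumed fitness of one value: own-class frequency, replaced by a
    negative weight when the other class exceeds the threshold."""
    if other_freq.get(value, 0.0) > threshold:
        return -other_freq[value]
    return own_freq.get(value, 0.0)

def evolve_top_values(own_values, other_values, threshold=0.2,
                      generations=3, top_k=3, pop_size=20, seed=0):
    """Tiny GA (assumed design) returning `top_k` values, best first."""
    rng = random.Random(seed)
    own_freq = relative_freq(own_values)
    other_freq = relative_freq(other_values)
    pool = sorted(own_freq)
    k = min(top_k, len(pool))

    def score(v):
        return value_score(v, own_freq, other_freq, threshold)

    def fitness(individual):
        return sum(score(v) for v in individual)

    population = [tuple(rng.sample(pool, k)) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]       # selection (elitist)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            genes = list(dict.fromkeys(a + b))      # crossover: merge, dedupe
            rng.shuffle(genes)
            child = genes[:k]
            if rng.random() < 0.3:                  # mutation
                child[rng.randrange(k)] = rng.choice(pool)
            child = list(dict.fromkeys(child))      # repair duplicates
            while len(child) < k:
                extra = rng.choice(pool)
                if extra not in child:
                    child.append(extra)
            children.append(tuple(child))
        population = parents + children
    return sorted(max(population, key=fitness), key=score, reverse=True)

# Toy service values (made up) for the anomaly class and the normal class.
anomaly = ["eco_i"] * 5 + ["telnet"] * 3 + ["finger"] * 2 + ["http"] * 4
normal = ["http"] * 8 + ["smtp"] * 2
top_values = evolve_top_values(anomaly, normal)
print(top_values)
```

Running the same function with the two lists swapped would give the top values for the normal class.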
10. Identifying the nature of an incoming packet
For each selected attribute value Fi in the incoming packet
◦ If Fi ∈ Vi
Si = (A × frequency of Fi in anomaly) - (frequency of Fi in normal)
◦ Else
Si = 0
C = Σ Si
If C > 0
◦ Then anomaly
Else normal
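The scoring rule above can be sketched as follows. The data layout is an assumption: the packet and the value sets Vi are dictionaries keyed by attribute name, and the frequencies are keyed by (attribute, value) pairs. The function name and the toy frequencies are hypothetical.

```python
def classify_packet(packet, selected_values, anomaly_freq, normal_freq,
                    weight=1.0):
    """packet: dict attribute -> value (Fi).
    selected_values: dict attribute -> set of values (Vi).
    anomaly_freq / normal_freq: dict (attribute, value) -> frequency.
    weight: the multiplier A applied to the anomaly frequency.
    Returns (label, C)."""
    total = 0.0
    for attr, allowed in selected_values.items():
        value = packet.get(attr)
        if value in allowed:
            key = (attr, value)
            total += (weight * anomaly_freq.get(key, 0.0)
                      - normal_freq.get(key, 0.0))
        # values outside Vi contribute Si = 0
    return ("anomaly" if total > 0 else "normal"), total

# Toy frequencies (made up) for two attributes.
selected = {"service": {"eco_i", "http"}, "flag": {"S0", "SF"}}
anomaly_freq = {("service", "eco_i"): 0.40, ("service", "http"): 0.05,
                ("flag", "S0"): 0.50, ("flag", "SF"): 0.20}
normal_freq = {("service", "eco_i"): 0.01, ("service", "http"): 0.60,
               ("flag", "S0"): 0.02, ("flag", "SF"): 0.80}

label, c = classify_packet({"service": "ftp", "flag": "S0"},
                           selected, anomaly_freq, normal_freq)
print(label, c)
```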
11. Additional Weight
Multiplied by the anomaly frequency
Why?
Generic anomalies take diverse values, unlike normal packets, whose values fall within a particular range
• A trade-off between accuracy and the false positive rate is required
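The effect of the weight can be illustrated on a handful of made-up packets. In the sketch below each packet is reduced to the (anomaly frequency, normal frequency) pairs of its matched attribute values. The helper names and all numbers are hypothetical and are not results from the paper.

```python
def score(pairs, weight):
    """Sum of (weight * anomaly frequency - normal frequency) over the
    matched attribute values of one packet."""
    return sum(weight * a - n for a, n in pairs)

def rates(labelled_packets, weight):
    """Return (detection rate, false positive rate) for one weight."""
    tp = fp = pos = neg = 0
    for pairs, is_anomaly in labelled_packets:
        flagged = score(pairs, weight) > 0
        if is_anomaly:
            pos += 1
            tp += flagged
        else:
            neg += 1
            fp += flagged
    return tp / pos, fp / neg

# Made-up packets: (matched frequency pairs, true label is anomaly?).
packets = [
    ([(0.40, 0.01), (0.20, 0.80)], True),   # anomaly with one normal-looking value
    ([(0.10, 0.30)], True),                 # anomaly with a rare value
    ([(0.05, 0.60), (0.20, 0.80)], False),  # typical normal packet
    ([(0.15, 0.40)], False),                # normal packet with a less common value
]
for a in (1.0, 2.0, 4.0):
    print(a, rates(packets, a))
```

On this toy data a larger weight catches more anomalies but eventually flags a normal packet as well, which is the trade-off the slide refers to.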