SlideShare ist ein Scribd-Unternehmen logo
1 von 7
Downloaden Sie, um offline zu lesen
Proc. of Int. Conf. on Computational Intelligence and Information Technology

A Heuristic Approach for Network Data Clustering
Preeti Sharma1, and Thaksen J. Parvat2
1

Department of Computer Engineering, SKN SITS, Lonavala, India
Email: preeti5588@gmail.com
2
Department of Computer Engineering, SIT, Lonavala, India
Email: pthaksen.sit@sinhgad.edu

Abstract— In this growing world of technology there are lots of security threats received by
each and every area of computer networks. Most of the time the network security threats
produce high false positive and negative ratios, this creates an obstacle for any security
system to work improperly. The overwhelming threats make it challenging to understand
and manage the network data.
To address this problem we present a novel approach which eventually understand the
network data by clustering them without background knowledge of any threats according to
various parameters like source IP, Destination IP etc. And this approach saves
administrator’s time and energy in processing of large amount threats.
Index Terms— Threats, Alerts, Clusters, K- means, Email data, Enron, network security,
email users, data mining.

I. INTRODUCTION
Intrusions pose a serious security risk in a network environment. Network intrusion detection systems aim to
identify attacks or malicious activity in a network with a high detection rate while maintaining a low false
alarm rate. New emerging threats or attacks are the most difficult to detect. Signature based methods and
misuse detection methods, which rely on labelled patterns, can detect previously known attacks with good
accuracy but are unable to detect new types of attacks. In addition, maintaining the signature data base and
labelling the patterns is time consuming and expensive.
Anomaly detection techniques can make use of un-supervised learning methods to identify new emerging
threats with no need of labelled patterns, but, with a potential false alarm rate. We reviewed the different network intrusion detection methods and present here a comparative study with more emphasis on the
unsupervised learning methods for anomaly detection.
The K-means algorithm was chosen to evaluate the performance of an unsupervised learning method for
anomaly detection using the STES campus data set captured in the MS excel format at gateway. And also we
used Enron Email data set to cluster emails. The results of the evaluation confirm that a high detection rate
can be achieved while maintaining a low false alarm rate.
A network intrusion is any type of attack or malicious activity that can compromise the stability or security of
a network environment. Intruders can be classified in two groups. External intruders don't have authorized
access to the system or network they attack, while internal intruders have some authority access to the system. Network intrusions keep increasing over the years with new emerging and complex threats. This new
emerging threats are the most difficult to identify.
A network intrusion detection system (NIDS) scans the network activities in a computer environment and attempt to detect the intrusions or attacks. Then, the sys-tem administrator may be alerted to the take the
corrective actions.
DOI: 03.LSCS.2013.6.53
© Association of Computer Electronics and Electrical Engineers, 2013
There are generally three types of approaches taken toward network intrusion detection. Signaturebased,misuse detection and anomaly detection. The signature-based method is the oldest method in practice
and depends on a signature database of previously known at-tacks. Misuse detection is a model-based
supervised method which trains a classifier with labelled patterns to classify new unlabeled patterns.
Anomaly detection approaches can make use of supervised or unsupervised methods to detect abnormal
behaviours in patterns. A main objective of this study is to confirm the advantages of anomaly detection for
intrusion detection using a simple clustering algorithm.
The rest of the paper is organized as follows: section 2 will introduce about related work done so far. Section
3 will give proposed work, Section 4 give result analysis process, section 5 will give future work and
conclusion, Section 6 will give idea about implementation, and Section 7 References that we used in our
proposed model.
II. RELATED WORK
TABLE I: LITERATURE SURVEY

Data clustering has been used in network area for different purposes like for intrusion detection, for analysis
of support services etc which is depicted in table 1 [13].
Early work [2] utilized email traffic to infer social net-works for the purpose of discovering communities of
shared interest. Email classification can be applied to several different applications, including filtering
messages based on priority, assigning messages to user-created folders [5], or identifying SPAM.
21
One major consideration in the classification is that of how to represent the messages. Specifically, one must
decide which features to use, and how to apply those features to the classification. [6] defined three types of
features to consider in email: unstructured text, categorical text, and numeric data. Relationship data is
another type of information that could be useful for Classification. Unstructured text in email consists of
fields like the subject and body, which allow for natural language text of any kind. Generally, these fields
have been used in classification using a bag-of-words approach, the same as with other kinds of text
classification [3, 4 and 5].
A technique to classification the email is proposed by Martin, Sewani, Nelson, Chen, and Joseph [12]. The
proposed classification is used for identifying spam messages. Categorical text includes fields such as “to”
and “from”. These differ from unstructured text fields in that the type of data which can be used in them very
well defined. However, these fields have typically been treated the same as the unstructured text fields, with
the components added to the bag of words [10, 11]. These fields have been found to be very useful in
automatic email classification, although not as useful as the unstructured data [10, 12].
The characteristics of the links have been analyzed. Recently there are many analysis of Social Network [7].
In above-mentioned communication analysis, social network analysis techniques are often used. Social network analysis techniques are the methods to calculate the degree distribution of each node in networks, network density, and centrality.
Just as data mining can be loosely described as looking for patterns in data, text mining is about looking for
patterns in text. However, the superficial similarity between the two conceals real differences. Data mining
can be more fully characterized as the extraction of implicit, previously unknown, and potentially useful information from data [Witten and Frank, 2000].
The in-formation is implicit in the input data: it is hidden, un-known, and could hardly be extracted without
recourse to automatic techniques of data mining. With text mining, however, the information to be extracted
is clearly and explicitly stated in the text [4].
The Enron Email Dataset Database Schema and Brief Statistical Report [9] which shows how is the
distribution of emails for every user and showing the network, how the employees are connected. Use of
interpersonal communication for network inference has been of interest to researchers for several decades
[1]. Early work [11] utilized email traffic to infer social networks for the purpose of discovering communities
of shared interest. Email classification can be applied to several different applications, including filtering
messages based on priority, assigning messages to user-created folders, or identifying SPAM.
III. PROPOSED METHOD
In this section, we describe our approach of clustering of network dataset and email dataset according to the
steps shown in figure 1 and 3.
A. Network data clustering [13 and 14]
As shown in figure 1 there are 6 main steps in our approach of network data clustering that we describe the
steps below
Step 1: In this step we have taken spread sheet of net-work data collected over the network of the STES campus by the main server at its end. This dataset is given as the input to our implemented system. Then our
application with the help of jxl archive file it reads the spread sheet and it will take the input in the form of
matrix.

Fig 1: Network Data Clustering Approach

22
Step 2: In this steps user or administrator have to pro-vide a network parameter on the basis of which we
need to cluster the network data. The parameters like Port number, Source IP, Destination IP etc.
Step 3: Here in this step the matrix of network data is been searched in its column for provided parameter
name. If the parameter name is present in the column then for the column we search for the given parameter
value. Then if we find the parameter value then we keep collecting the complete row value in a list. This
operation keep iterating till the spread sheet completes. And that makes a cluster for the given value.
We propose the above Discussed Single parameter clustering using following Algorithm:
// input: Parameter name
1. for j=0 to columns
2. if (cell(i, j)=parameter name) then
3. for i=0 to rows
4. if( cell(i, j)=parameter value) then
5. Collect the row value into a vector to make a cluster
Repeat till the end of spread sheet to form cluster
Step 4: In this step collected cluster vector is written into a new spread sheet and saved in a given path.
Step 5: In this step the user or administrator has to pro-vide two network parameters on the basis of which we
need to cluster the network data. The parameters like Port number and Source IP, time and destination IP etc.
Step 6: In this step the cluster is been formed using K-means algorithm for the multiple values.
K-means clustering is one of the unsupervised computational methods used to group similar objects in to
smaller partitions called clusters so that similar objects are grouped together,. The algorithm aims to
minimize the within cluster variance and maximize the intra cluster’s variance. The technique involves
determining the number of clusters at first and randomly assigning cluster centroids to each cluster from the
whole datasets; this step is called initialization of cluster centroids [8].
The distance between each point in the whole dataset and every cluster centroid is then calculated using a
distance metric. Then, for every data point, the mini-mum distance is determined and that point is assigned to
the closest cluster. This step is called cluster assignment, and is repeated until all of the data points have been
assigned to one of the clusters [8].
Finally, the mean for each cluster is calculated based on the accumulated values of points in each cluster and
the number of points in that cluster. Those means are then assigned as new cluster centroids, and the process
of finding distances between each point and the new centroids is repeated, where points are re-assigned to the
new closest clusters. The process iterates for a fixed number of times, or until points in each cluster stop
moving across to different clusters. This is called convergence. The steps of the K-means can be summarized
in Figure 2 below.
The Algorithmic steps of the k-means can be summarized as below
 Initially, the number of clusters must be known, or chosen, to be K say.
 The initial step is to choose a set of K instances as centres of the clusters. Often chosen such that the
points are mutually “farthest apart”, in some way.
 Next, the algorithm considers each instance and assigns it to the cluster which is closest
 The cluster centroids are recalculated either after each instance assignment, or after the whole cycle
of re-assignments.
 This process is iterated.
B. Email Data Clustering [13 and 14]
As shown in figure 3 there are 4 main steps in our approach. To cluster Email Data that we describe each and
every step below
Step 1: In this step the collected Enron Email set folder is given as input to our system. Where Enron Email
set is having folder with name of users and again within that it is having folder for inbox, outbox and sentbox
emails.
Step 2: In this steps user or administrator have to pro-vide a parameter on the basis of which we need to
cluster the Email data. The parameters are any word/s that we have to search within the emails. Again
administrator have to provide searching folders like inbox, outbox and sentbox etc.
Step 3: Here in this step the given email folder item is scanned within each folder, the emails are read
continuously and keep finding for the given string or word. If the given String is found then the user name is

23
collected in a vector. This operation keeps iterating till all users emails are scanned. And that makes a cluster
for the given String or parameter.
We propose the above Discussed Email Clustering following Algorithm:
// input: Searching String
// input: Searching Email Folder like inbox, outbox, sentbox

Fig 2: Computational steps of the k-means algorithm

1.
2.
3.
4.
5.
6.
7.
8.

Fig 3: E-mail Data Clustering Approach

for i =0 to n //where n is number of User folders
for j =0 to m // where m is number of Email Folders
if ( EmailFolder=Given Email Folder) then
for k=0 to P // where p is number of email files
Read complete Email and get into a String
if ( Email string Contains searching word) then
get the User folder name into a vector
Repeat this for all User folders to get Cluster of users for specific word

IV. RESULTS AND ANALYSIS
As we know Data clustering greatly boost the concept of grouping of similar objects. In our proposed and
implemented system we apply this to cluster network data captured at server’s end in the form of spread
sheets, Where the spread sheet is having many columns like Source IP, Destination IP, Source Port etc,. The
Result of these Clustering can shown and analyze below.
A. Result of Multiple Parameter Clustering
Here in the below table 2 we can see a resulted cluster ,which is for input of a multiple parameter for service
and destination port Number value 27388 and 80 respectively, written in a spread sheet by our application in
the same format as of given input spread sheet. This makes easy to analyze the data. The same kind of result
we are getting in single parameter also, where we are giving only one parameter as input.

24
B. Performance Evaluation of Time For data clustering
Here in the below figure 5 we can see a performance of our system for formation of number clusters .If we
ob-server the graph properly we can find that our system becomes lenient as the number of clusters increases.
This means our system takes almost same time to form more number of clusters. This Indicates we will
perform more accurately for huge amount data. This adds boosting point in our project.
C. K-Means Iteration Performance Evaluation
Here in the below figure 6 we can see that the number of iterations taken by the K-Means Algorithm to form
clusters. If we observer Graph Properly we can see that K- means approximately takes almost same amount
TABLE II: OUTPUT OF MULTI-PARAMETER CLUSTERING

-------Cluster 0-----------For source and Destination Port [27388.0, 50983.0]
For source and Destination Port [27388.0, 41876.0]
For source and Destination Port [4771.0, 80.0]
-------Cluster 1-----------For source and Destination Port [27388.0, 443.0]
For source and Destination Port [56561.0, 8140.0]
For source and Destination Port [59201.0, 80.0]
For source and Destination Port [27388.0, 443.0]
For source and Destination Port [33885.0, 80.0]

Figure 4: Cluster Formation Performance

Figure 5: K-Means Iterations performance

25
of iteration to produce clusters. This shows that K-Means is having good performance ratio for greater
amount of data to form cluster.
V. IMPLEMENTATION
The proposed algorithm is implemented using open source technologies and algorithm is applied over network data collected by the STES Lonavala campus server in the form of Microsoft excel sheets using
Fortigate firewall and Enron email database respectively.
Java is selected as the programming languages and the other open source API’s (Application Programming
Interfaces) to Support the other functionalities. Net-Beans 6.9.1 is used as a development IDE (Integrated
Development Environment) for Java and library of other technologies are added as external jar (Java
Archives) in the Netbeans. NetBeans IDE is having feature of integrating the external libraries with it so that
it can make easy to build any applications on its platform .jxl library available for Java that allows users to
read and write the Microsoft spread sheets easily which eventually helps in the system to read data and to
present clustered data.
CONCLUSION AND FUTURE WORK
In this paper network data clustering approach is pro-posed & implemented to show the clustered data which
we can use for further studies of attacks & threats in a network. These clustered data can be used further to
create an attack graph without the knowledge of previous attacks or threats. This eventually boosts the
analysis of threats & alerts in network situation awareness. This kind of clustering systems can help the
administrator to understand the network threats.
Here we also proposed & implemented a model for e-mail clustering to show text similarities. The proposed
techniques shows the e-mail attributes & how the text similarities is use to cluster the users. The Enron
dataset is used for the experiment in our project where the users are clustered based on many parameters in
inbox, sentbox, drafts etc. folders of an email application.
The future scope of this work could be implementation of similar system for the e-mail attachments and real
time e-mails of an organization. This kind of e-mail clustering helps in operations like summarization,
automatic answering machines and conducting surveys.
REFERENCES
[1] Yingjie Zhou, Kenneth R. Fleischmann “Automatic Text Analysis of Values in the Enron Email Dataset”
Proceedings of the 43rd Hawaii International Conference on System Sciences 2010
[2] Minoru Sasaki, Hiroyuki Shinnou” Spam Detection Using Text Clustering” Proceedings of the 2005 Inter-national
Conference on Cyberworlds (CW’05) IEEE 2005
[3] Hui Huang, and Wei Jiang, Jason Cong, “Pattern-Mining for Behavioral Synthesis” IEEE, 2011
[4] Taiwo Ayodele, Shikun Zhou, Rinat Khusainov “Evolving Email Clustering Method for Email Group-ing: A
Machine Learning Approach”, IEEE 2009
[5] Jayadev Gyani “A Novel Approach for Clustering E-mail Users Using Pattern Matching” Syeda Farha Shazmeen
IEEE 2011
[6] Shady Shehata Fakhri Karray “Enhancing Text Clustering using Concept-based Mining Model” IEEE 2006
[7] Longbing Cao Huaifeng Zhang, Yanchang Zhao,Dan Luo, and Chengqi Zhang “Combined Mining: Discovering
Informative Knowledge in Complex Data” IEEE 2011
[8] Data Clustering: A Review (1999) Jain/Murty/Flynn “http://citeseer.ist.psu.edu/jain99data.html”
[9] Brief Statistical Report, /"Technical Report, Information Sciences Institute, 2004. Available at:
http://www.isi.edu/~adibi/Enron/Enron_Dataset_Report .pdf.
[10] W. W. Cohen: Learning Rules that classify E-mail. In Proc. of the 1996 AAAI Spring Symposium in In-formation
Access, 1996.
[11] Y. Diao, H. Lu, and D. Wu “A comparative study of classification based personal email Filtering” In Proc. 4th
Pacific-Asia Conf. Knowledge Discovery and Data Mining (PAKDD'00), pages 408-419, Kyoto, JP, 2000.
[12] Steve Martin, Anil Sewani, Blaine Nelson, Karl Chen, Anthony D. Joseph, "Analyzing Behaviorial Features for
Email Classification", Second Conference on Email and Anti-Spam CEAS 2005 , The International Association for
Cryptologic Research and The IEEE Technical Committee on Security and Privacy 2005.
[13] Sharma P., Parvat T. J. “Network log Clustering Using K-means Algorithm” pp 97-105, ITC 2012, Springer-Verlag
Berlin Heidelberg 2012 Jiawei Han, Micheline kamber.: Data Mining Concepts and techniques.: Second Edition,
Morgan Kaufmann Publishers (2006)

26

Weitere ähnliche Inhalte

Was ist angesagt?

Ista presentation-malicious url
Ista presentation-malicious urlIsta presentation-malicious url
Ista presentation-malicious urlvinaykumar R
 
Context Sensitive Relatedness Measure of Word Pairs
Context Sensitive Relatedness Measure of Word PairsContext Sensitive Relatedness Measure of Word Pairs
Context Sensitive Relatedness Measure of Word PairsIJCSIS Research Publications
 
Hybrid ga svm for efficient feature selection in e-mail classification
Hybrid ga svm for efficient feature selection in e-mail classificationHybrid ga svm for efficient feature selection in e-mail classification
Hybrid ga svm for efficient feature selection in e-mail classificationAlexander Decker
 
11.hybrid ga svm for efficient feature selection in e-mail classification
11.hybrid ga svm for efficient feature selection in e-mail classification11.hybrid ga svm for efficient feature selection in e-mail classification
11.hybrid ga svm for efficient feature selection in e-mail classificationAlexander Decker
 
An Approach for Malicious Spam Detection in Email with Comparison of Differen...
An Approach for Malicious Spam Detection in Email with Comparison of Differen...An Approach for Malicious Spam Detection in Email with Comparison of Differen...
An Approach for Malicious Spam Detection in Email with Comparison of Differen...IRJET Journal
 
Probabilistic models for anomaly detection based on usage of network traffic
Probabilistic models for anomaly detection based on usage of network trafficProbabilistic models for anomaly detection based on usage of network traffic
Probabilistic models for anomaly detection based on usage of network trafficAlexander Decker
 
A multi classifier prediction model for phishing detection
A multi classifier prediction model for phishing detectionA multi classifier prediction model for phishing detection
A multi classifier prediction model for phishing detectioneSAT Journals
 
A multi classifier prediction model for phishing detection
A multi classifier prediction model for phishing detectionA multi classifier prediction model for phishing detection
A multi classifier prediction model for phishing detectioneSAT Publishing House
 
Application of genetic algorithm in intrusion detection system
Application of genetic algorithm in intrusion detection systemApplication of genetic algorithm in intrusion detection system
Application of genetic algorithm in intrusion detection systemAlexander Decker
 
DOMINANT FEATURES IDENTIFICATION FOR COVERT NODES IN 9/11 ATTACK USING THEIR ...
DOMINANT FEATURES IDENTIFICATION FOR COVERT NODES IN 9/11 ATTACK USING THEIR ...DOMINANT FEATURES IDENTIFICATION FOR COVERT NODES IN 9/11 ATTACK USING THEIR ...
DOMINANT FEATURES IDENTIFICATION FOR COVERT NODES IN 9/11 ATTACK USING THEIR ...IJNSA Journal
 
Predicting_new_friendships_in_social_networks
Predicting_new_friendships_in_social_networksPredicting_new_friendships_in_social_networks
Predicting_new_friendships_in_social_networksAnvardh Nanduri
 
IRJET-Impact of Manual VS Automatic Transfer Switching on Reliability of Powe...
IRJET-Impact of Manual VS Automatic Transfer Switching on Reliability of Powe...IRJET-Impact of Manual VS Automatic Transfer Switching on Reliability of Powe...
IRJET-Impact of Manual VS Automatic Transfer Switching on Reliability of Powe...IRJET Journal
 
Ensure Security and Scalable Performance in Multiple Relay Networks
Ensure Security and Scalable Performance in Multiple Relay NetworksEnsure Security and Scalable Performance in Multiple Relay Networks
Ensure Security and Scalable Performance in Multiple Relay NetworksEditor IJCATR
 
Detecting root of the rumor in social network using GSSS
Detecting root of the rumor in social network using GSSSDetecting root of the rumor in social network using GSSS
Detecting root of the rumor in social network using GSSSIRJET Journal
 
WORKLOAD CHARACTERIZATION OF SPAM EMAIL FILTERING SYSTEMS
WORKLOAD CHARACTERIZATION OF SPAM EMAIL FILTERING SYSTEMSWORKLOAD CHARACTERIZATION OF SPAM EMAIL FILTERING SYSTEMS
WORKLOAD CHARACTERIZATION OF SPAM EMAIL FILTERING SYSTEMSIJNSA Journal
 

Was ist angesagt? (19)

Ista presentation-malicious url
Ista presentation-malicious urlIsta presentation-malicious url
Ista presentation-malicious url
 
Context Sensitive Relatedness Measure of Word Pairs
Context Sensitive Relatedness Measure of Word PairsContext Sensitive Relatedness Measure of Word Pairs
Context Sensitive Relatedness Measure of Word Pairs
 
Hybrid ga svm for efficient feature selection in e-mail classification
Hybrid ga svm for efficient feature selection in e-mail classificationHybrid ga svm for efficient feature selection in e-mail classification
Hybrid ga svm for efficient feature selection in e-mail classification
 
11.hybrid ga svm for efficient feature selection in e-mail classification
11.hybrid ga svm for efficient feature selection in e-mail classification11.hybrid ga svm for efficient feature selection in e-mail classification
11.hybrid ga svm for efficient feature selection in e-mail classification
 
An Approach for Malicious Spam Detection in Email with Comparison of Differen...
An Approach for Malicious Spam Detection in Email with Comparison of Differen...An Approach for Malicious Spam Detection in Email with Comparison of Differen...
An Approach for Malicious Spam Detection in Email with Comparison of Differen...
 
Probabilistic models for anomaly detection based on usage of network traffic
Probabilistic models for anomaly detection based on usage of network trafficProbabilistic models for anomaly detection based on usage of network traffic
Probabilistic models for anomaly detection based on usage of network traffic
 
A multi classifier prediction model for phishing detection
A multi classifier prediction model for phishing detectionA multi classifier prediction model for phishing detection
A multi classifier prediction model for phishing detection
 
A multi classifier prediction model for phishing detection
A multi classifier prediction model for phishing detectionA multi classifier prediction model for phishing detection
A multi classifier prediction model for phishing detection
 
Application of genetic algorithm in intrusion detection system
Application of genetic algorithm in intrusion detection systemApplication of genetic algorithm in intrusion detection system
Application of genetic algorithm in intrusion detection system
 
Aj35198205
Aj35198205Aj35198205
Aj35198205
 
DOMINANT FEATURES IDENTIFICATION FOR COVERT NODES IN 9/11 ATTACK USING THEIR ...
DOMINANT FEATURES IDENTIFICATION FOR COVERT NODES IN 9/11 ATTACK USING THEIR ...DOMINANT FEATURES IDENTIFICATION FOR COVERT NODES IN 9/11 ATTACK USING THEIR ...
DOMINANT FEATURES IDENTIFICATION FOR COVERT NODES IN 9/11 ATTACK USING THEIR ...
 
Predicting_new_friendships_in_social_networks
Predicting_new_friendships_in_social_networksPredicting_new_friendships_in_social_networks
Predicting_new_friendships_in_social_networks
 
IRJET-Impact of Manual VS Automatic Transfer Switching on Reliability of Powe...
IRJET-Impact of Manual VS Automatic Transfer Switching on Reliability of Powe...IRJET-Impact of Manual VS Automatic Transfer Switching on Reliability of Powe...
IRJET-Impact of Manual VS Automatic Transfer Switching on Reliability of Powe...
 
40120130406014 2
40120130406014 240120130406014 2
40120130406014 2
 
paper5
paper5paper5
paper5
 
Ensure Security and Scalable Performance in Multiple Relay Networks
Ensure Security and Scalable Performance in Multiple Relay NetworksEnsure Security and Scalable Performance in Multiple Relay Networks
Ensure Security and Scalable Performance in Multiple Relay Networks
 
Final Report(SuddhasatwaSatpathy)
Final Report(SuddhasatwaSatpathy)Final Report(SuddhasatwaSatpathy)
Final Report(SuddhasatwaSatpathy)
 
Detecting root of the rumor in social network using GSSS
Detecting root of the rumor in social network using GSSSDetecting root of the rumor in social network using GSSS
Detecting root of the rumor in social network using GSSS
 
WORKLOAD CHARACTERIZATION OF SPAM EMAIL FILTERING SYSTEMS
WORKLOAD CHARACTERIZATION OF SPAM EMAIL FILTERING SYSTEMSWORKLOAD CHARACTERIZATION OF SPAM EMAIL FILTERING SYSTEMS
WORKLOAD CHARACTERIZATION OF SPAM EMAIL FILTERING SYSTEMS
 

Andere mochten auch

Heuristics for Network Design with budget constraint
Heuristics for Network Design with budget constraintHeuristics for Network Design with budget constraint
Heuristics for Network Design with budget constraintlucpiz
 
Heuristics slides 321
Heuristics slides 321Heuristics slides 321
Heuristics slides 321Moe Tmz
 
Cloud auto-scaling with deadline and budget constraints
Cloud auto-scaling with deadline and budget constraintsCloud auto-scaling with deadline and budget constraints
Cloud auto-scaling with deadline and budget constraintsmingtemp
 
Introduction To Heuristically accelerated Hierarchical RL in RTS Games
Introduction To Heuristically accelerated Hierarchical RL in RTS GamesIntroduction To Heuristically accelerated Hierarchical RL in RTS Games
Introduction To Heuristically accelerated Hierarchical RL in RTS GamesOmar Enayet
 
Heuristic Search
Heuristic SearchHeuristic Search
Heuristic Searchbutest
 

Andere mochten auch (6)

Heuristics for Network Design with budget constraint
Heuristics for Network Design with budget constraintHeuristics for Network Design with budget constraint
Heuristics for Network Design with budget constraint
 
Heuristics slides 321
Heuristics slides 321Heuristics slides 321
Heuristics slides 321
 
Cloud auto-scaling with deadline and budget constraints
Cloud auto-scaling with deadline and budget constraintsCloud auto-scaling with deadline and budget constraints
Cloud auto-scaling with deadline and budget constraints
 
Introduction To Heuristically accelerated Hierarchical RL in RTS Games
Introduction To Heuristically accelerated Hierarchical RL in RTS GamesIntroduction To Heuristically accelerated Hierarchical RL in RTS Games
Introduction To Heuristically accelerated Hierarchical RL in RTS Games
 
Heuristic Search
Heuristic SearchHeuristic Search
Heuristic Search
 
Functional strategies
Functional strategiesFunctional strategies
Functional strategies
 

Ähnlich wie A Heuristic Approach for Network Data Clustering

The Detection of Suspicious Email Based on Decision Tree ...
The Detection of Suspicious Email Based on Decision Tree                     ...The Detection of Suspicious Email Based on Decision Tree                     ...
The Detection of Suspicious Email Based on Decision Tree ...IRJET Journal
 
OPTIMIZING HYPERPARAMETERS FOR ENHANCED EMAIL CLASSIFICATION AND FORENSIC ANA...
OPTIMIZING HYPERPARAMETERS FOR ENHANCED EMAIL CLASSIFICATION AND FORENSIC ANA...OPTIMIZING HYPERPARAMETERS FOR ENHANCED EMAIL CLASSIFICATION AND FORENSIC ANA...
OPTIMIZING HYPERPARAMETERS FOR ENHANCED EMAIL CLASSIFICATION AND FORENSIC ANA...IJNSA Journal
 
Study of Various Techniques to Filter Spam Emails
Study of Various Techniques to Filter Spam EmailsStudy of Various Techniques to Filter Spam Emails
Study of Various Techniques to Filter Spam EmailsIRJET Journal
 
Online stream mining approach for clustering network traffic
Online stream mining approach for clustering network trafficOnline stream mining approach for clustering network traffic
Online stream mining approach for clustering network trafficeSAT Journals
 
Online stream mining approach for clustering network traffic
Online stream mining approach for clustering network trafficOnline stream mining approach for clustering network traffic
Online stream mining approach for clustering network trafficeSAT Publishing House
 
A web application detecting dos attack using mca and tam
A web application detecting dos attack using mca and tamA web application detecting dos attack using mca and tam
A web application detecting dos attack using mca and tameSAT Journals
 
Text Based Fuzzy Clustering Algorithm to Filter Spam E-mail
Text Based Fuzzy Clustering Algorithm to Filter Spam E-mailText Based Fuzzy Clustering Algorithm to Filter Spam E-mail
Text Based Fuzzy Clustering Algorithm to Filter Spam E-mailijsrd.com
 
ISSUES RELATED TO SAMPLING TECHNIQUES FOR NETWORK TRAFFIC DATASET
ISSUES RELATED TO SAMPLING TECHNIQUES FOR NETWORK TRAFFIC DATASETISSUES RELATED TO SAMPLING TECHNIQUES FOR NETWORK TRAFFIC DATASET
ISSUES RELATED TO SAMPLING TECHNIQUES FOR NETWORK TRAFFIC DATASETijmnct
 
SAMPLING BASED APPROACHES TO HANDLE IMBALANCES IN NETWORK TRAFFIC DATASET FOR...
SAMPLING BASED APPROACHES TO HANDLE IMBALANCES IN NETWORK TRAFFIC DATASET FOR...SAMPLING BASED APPROACHES TO HANDLE IMBALANCES IN NETWORK TRAFFIC DATASET FOR...
SAMPLING BASED APPROACHES TO HANDLE IMBALANCES IN NETWORK TRAFFIC DATASET FOR...cscpconf
 
Log management siem 5651 sayılı yasa
Log management siem 5651 sayılı yasaLog management siem 5651 sayılı yasa
Log management siem 5651 sayılı yasaErtugrul Akbas
 
Enhancing SIEM Correlation Rules Through Baselining
Enhancing SIEM Correlation Rules Through BaseliningEnhancing SIEM Correlation Rules Through Baselining
Enhancing SIEM Correlation Rules Through BaseliningErtugrul Akbas
 
Integration of feature sets with machine learning techniques
Integration of feature sets with machine learning techniquesIntegration of feature sets with machine learning techniques
Integration of feature sets with machine learning techniquesiaemedu
 
A new approach of detecting network anomalies using improved id3 with horizon...
A new approach of detecting network anomalies using improved id3 with horizon...A new approach of detecting network anomalies using improved id3 with horizon...
A new approach of detecting network anomalies using improved id3 with horizon...Alexander Decker
 
Indonesian language email spam detection using N-gram and Naïve Bayes algorithm
Indonesian language email spam detection using N-gram and Naïve Bayes algorithmIndonesian language email spam detection using N-gram and Naïve Bayes algorithm
Indonesian language email spam detection using N-gram and Naïve Bayes algorithmjournalBEEI
 
Analysis on different Data mining Techniques and algorithms used in IOT
Analysis on different Data mining Techniques and algorithms used in IOTAnalysis on different Data mining Techniques and algorithms used in IOT
Analysis on different Data mining Techniques and algorithms used in IOTIJERA Editor
 
Data Mining Techniques for Providing Network Security through Intrusion Detec...
Data Mining Techniques for Providing Network Security through Intrusion Detec...Data Mining Techniques for Providing Network Security through Intrusion Detec...
Data Mining Techniques for Providing Network Security through Intrusion Detec...IJAAS Team
 
Evaluation of network intrusion detection using markov chain
Evaluation of network intrusion detection using markov chainEvaluation of network intrusion detection using markov chain
Evaluation of network intrusion detection using markov chainIJCI JOURNAL
 
Internet ttraffic monitering anomalous behiviour detection
Internet ttraffic monitering anomalous behiviour detectionInternet ttraffic monitering anomalous behiviour detection
Internet ttraffic monitering anomalous behiviour detectionGyan Prakash
 
Improving the performance of Intrusion detection systems
Improving the performance of Intrusion detection systemsImproving the performance of Intrusion detection systems
Improving the performance of Intrusion detection systemsyasmen essam
 
IRJET- Attack Detection Strategies in Wireless Sensor Network
IRJET- Attack Detection Strategies in Wireless Sensor NetworkIRJET- Attack Detection Strategies in Wireless Sensor Network
IRJET- Attack Detection Strategies in Wireless Sensor NetworkIRJET Journal
 

Ähnlich wie A Heuristic Approach for Network Data Clustering (20)

The Detection of Suspicious Email Based on Decision Tree ...
The Detection of Suspicious Email Based on Decision Tree                     ...The Detection of Suspicious Email Based on Decision Tree                     ...
The Detection of Suspicious Email Based on Decision Tree ...
 
OPTIMIZING HYPERPARAMETERS FOR ENHANCED EMAIL CLASSIFICATION AND FORENSIC ANA...
OPTIMIZING HYPERPARAMETERS FOR ENHANCED EMAIL CLASSIFICATION AND FORENSIC ANA...OPTIMIZING HYPERPARAMETERS FOR ENHANCED EMAIL CLASSIFICATION AND FORENSIC ANA...
OPTIMIZING HYPERPARAMETERS FOR ENHANCED EMAIL CLASSIFICATION AND FORENSIC ANA...
 
Study of Various Techniques to Filter Spam Emails
Study of Various Techniques to Filter Spam EmailsStudy of Various Techniques to Filter Spam Emails
Study of Various Techniques to Filter Spam Emails
 
Online stream mining approach for clustering network traffic
Online stream mining approach for clustering network trafficOnline stream mining approach for clustering network traffic
Online stream mining approach for clustering network traffic
 
Online stream mining approach for clustering network traffic
Online stream mining approach for clustering network trafficOnline stream mining approach for clustering network traffic
Online stream mining approach for clustering network traffic
 
A web application detecting dos attack using mca and tam
A web application detecting dos attack using mca and tamA web application detecting dos attack using mca and tam
A web application detecting dos attack using mca and tam
 
Text Based Fuzzy Clustering Algorithm to Filter Spam E-mail
Text Based Fuzzy Clustering Algorithm to Filter Spam E-mailText Based Fuzzy Clustering Algorithm to Filter Spam E-mail
Text Based Fuzzy Clustering Algorithm to Filter Spam E-mail
 
ISSUES RELATED TO SAMPLING TECHNIQUES FOR NETWORK TRAFFIC DATASET
ISSUES RELATED TO SAMPLING TECHNIQUES FOR NETWORK TRAFFIC DATASETISSUES RELATED TO SAMPLING TECHNIQUES FOR NETWORK TRAFFIC DATASET
ISSUES RELATED TO SAMPLING TECHNIQUES FOR NETWORK TRAFFIC DATASET
 
SAMPLING BASED APPROACHES TO HANDLE IMBALANCES IN NETWORK TRAFFIC DATASET FOR...
SAMPLING BASED APPROACHES TO HANDLE IMBALANCES IN NETWORK TRAFFIC DATASET FOR...SAMPLING BASED APPROACHES TO HANDLE IMBALANCES IN NETWORK TRAFFIC DATASET FOR...
SAMPLING BASED APPROACHES TO HANDLE IMBALANCES IN NETWORK TRAFFIC DATASET FOR...
 
Log management siem 5651 sayılı yasa
Log management siem 5651 sayılı yasaLog management siem 5651 sayılı yasa
Log management siem 5651 sayılı yasa
 
Enhancing SIEM Correlation Rules Through Baselining
Enhancing SIEM Correlation Rules Through BaseliningEnhancing SIEM Correlation Rules Through Baselining
Enhancing SIEM Correlation Rules Through Baselining
 
Integration of feature sets with machine learning techniques
Integration of feature sets with machine learning techniquesIntegration of feature sets with machine learning techniques
Integration of feature sets with machine learning techniques
 
A new approach of detecting network anomalies using improved id3 with horizon...
A new approach of detecting network anomalies using improved id3 with horizon...A new approach of detecting network anomalies using improved id3 with horizon...
A new approach of detecting network anomalies using improved id3 with horizon...
 
Indonesian language email spam detection using N-gram and Naïve Bayes algorithm
Indonesian language email spam detection using N-gram and Naïve Bayes algorithmIndonesian language email spam detection using N-gram and Naïve Bayes algorithm
Indonesian language email spam detection using N-gram and Naïve Bayes algorithm
 
Analysis on different Data mining Techniques and algorithms used in IOT
Analysis on different Data mining Techniques and algorithms used in IOTAnalysis on different Data mining Techniques and algorithms used in IOT
Analysis on different Data mining Techniques and algorithms used in IOT
 
Data Mining Techniques for Providing Network Security through Intrusion Detec...
Data Mining Techniques for Providing Network Security through Intrusion Detec...Data Mining Techniques for Providing Network Security through Intrusion Detec...
Data Mining Techniques for Providing Network Security through Intrusion Detec...
 
Evaluation of network intrusion detection using markov chain
Evaluation of network intrusion detection using markov chainEvaluation of network intrusion detection using markov chain
Evaluation of network intrusion detection using markov chain
 
Internet ttraffic monitering anomalous behiviour detection
Internet ttraffic monitering anomalous behiviour detectionInternet ttraffic monitering anomalous behiviour detection
Internet ttraffic monitering anomalous behiviour detection
 
Improving the performance of Intrusion detection systems
Improving the performance of Intrusion detection systemsImproving the performance of Intrusion detection systems
Improving the performance of Intrusion detection systems
 
IRJET- Attack Detection Strategies in Wireless Sensor Network
IRJET- Attack Detection Strategies in Wireless Sensor NetworkIRJET- Attack Detection Strategies in Wireless Sensor Network
IRJET- Attack Detection Strategies in Wireless Sensor Network
 

Mehr von idescitation (20)

65 113-121
65 113-12165 113-121
65 113-121
 
69 122-128
69 122-12869 122-128
69 122-128
 
71 338-347
71 338-34771 338-347
71 338-347
 
72 129-135
72 129-13572 129-135
72 129-135
 
74 136-143
74 136-14374 136-143
74 136-143
 
80 152-157
80 152-15780 152-157
80 152-157
 
82 348-355
82 348-35582 348-355
82 348-355
 
84 11-21
84 11-2184 11-21
84 11-21
 
62 328-337
62 328-33762 328-337
62 328-337
 
46 102-112
46 102-11246 102-112
46 102-112
 
47 292-298
47 292-29847 292-298
47 292-298
 
49 299-305
49 299-30549 299-305
49 299-305
 
57 306-311
57 306-31157 306-311
57 306-311
 
60 312-318
60 312-31860 312-318
60 312-318
 
5 1-10
5 1-105 1-10
5 1-10
 
11 69-81
11 69-8111 69-81
11 69-81
 
14 284-291
14 284-29114 284-291
14 284-291
 
15 82-87
15 82-8715 82-87
15 82-87
 
29 88-96
29 88-9629 88-96
29 88-96
 
43 97-101
43 97-10143 97-101
43 97-101
 

Kürzlich hochgeladen

SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajanpragatimahajan3
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Celine George
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhikauryashika82
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...Sapna Thakur
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 

Kürzlich hochgeladen (20)

SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 

A Heuristic Approach for Network Data Clustering

  • 1. Proc. of Int. Conf. on Computational Intelligence and Information Technology A Heuristic Approach for Network Data Clustering Preeti Sharma1, and Thaksen J. Parvat2 1 Department of Computer Engineering, SKN SITS, Lonavala, India Email: preeti5588@gmail.com 2 Department of Computer Engineering, SIT, Lonavala, India Email: pthaksen.sit@sinhgad.edu Abstract— In this growing world of technology there are lots of security threats received by each and every area of computer networks. Most of the time the network security threats produce high false positive and negative ratios, this creates an obstacle for any security system to work improperly. The overwhelming threats make it challenging to understand and manage the network data. To address this problem we present a novel approach which eventually understand the network data by clustering them without background knowledge of any threats according to various parameters like source IP, Destination IP etc. And this approach saves administrator’s time and energy in processing of large amount threats. Index Terms— Threats, Alerts, Clusters, K- means, Email data, Enron, network security, email users, data mining. I. INTRODUCTION Intrusions pose a serious security risk in a network environment. Network intrusion detection systems aim to identify attacks or malicious activity in a network with a high detection rate while maintaining a low false alarm rate. New emerging threats or attacks are the most difficult to detect. Signature based methods and misuse detection methods, which rely on labelled patterns, can detect previously known attacks with good accuracy but are unable to detect new types of attacks. In addition, maintaining the signature data base and labelling the patterns is time consuming and expensive. Anomaly detection techniques can make use of un-supervised learning methods to identify new emerging threats with no need of labelled patterns, but, with a potential false alarm rate. We reviewed the different network intrusion detection methods and present here a comparative study with more emphasis on the unsupervised learning methods for anomaly detection. The K-means algorithm was chosen to evaluate the performance of an unsupervised learning method for anomaly detection using the STES campus data set captured in the MS excel format at gateway. And also we used Enron Email data set to cluster emails. The results of the evaluation confirm that a high detection rate can be achieved while maintaining a low false alarm rate. A network intrusion is any type of attack or malicious activity that can compromise the stability or security of a network environment. Intruders can be classified in two groups. External intruders don't have authorized access to the system or network they attack, while internal intruders have some authority access to the system. Network intrusions keep increasing over the years with new emerging and complex threats. This new emerging threats are the most difficult to identify. A network intrusion detection system (NIDS) scans the network activities in a computer environment and attempt to detect the intrusions or attacks. Then, the sys-tem administrator may be alerted to the take the corrective actions. DOI: 03.LSCS.2013.6.53 © Association of Computer Electronics and Electrical Engineers, 2013
  • 2. There are generally three types of approaches taken toward network intrusion detection. Signaturebased,misuse detection and anomaly detection. The signature-based method is the oldest method in practice and depends on a signature database of previously known at-tacks. Misuse detection is a model-based supervised method which trains a classifier with labelled patterns to classify new unlabeled patterns. Anomaly detection approaches can make use of supervised or unsupervised methods to detect abnormal behaviours in patterns. A main objective of this study is to confirm the advantages of anomaly detection for intrusion detection using a simple clustering algorithm. The rest of the paper is organized as follows: section 2 will introduce about related work done so far. Section 3 will give proposed work, Section 4 give result analysis process, section 5 will give future work and conclusion, Section 6 will give idea about implementation, and Section 7 References that we used in our proposed model. II. RELATED WORK TABLE I: LITERATURE SURVEY Data clustering has been used in network area for different purposes like for intrusion detection, for analysis of support services etc which is depicted in table 1 [13]. Early work [2] utilized email traffic to infer social net-works for the purpose of discovering communities of shared interest. Email classification can be applied to several different applications, including filtering messages based on priority, assigning messages to user-created folders [5], or identifying SPAM. 21
  • 3. One major consideration in the classification is that of how to represent the messages. Specifically, one must decide which features to use, and how to apply those features to the classification. [6] defined three types of features to consider in email: unstructured text, categorical text, and numeric data. Relationship data is another type of information that could be useful for Classification. Unstructured text in email consists of fields like the subject and body, which allow for natural language text of any kind. Generally, these fields have been used in classification using a bag-of-words approach, the same as with other kinds of text classification [3, 4 and 5]. A technique to classification the email is proposed by Martin, Sewani, Nelson, Chen, and Joseph [12]. The proposed classification is used for identifying spam messages. Categorical text includes fields such as “to” and “from”. These differ from unstructured text fields in that the type of data which can be used in them very well defined. However, these fields have typically been treated the same as the unstructured text fields, with the components added to the bag of words [10, 11]. These fields have been found to be very useful in automatic email classification, although not as useful as the unstructured data [10, 12]. The characteristics of the links have been analyzed. Recently there are many analysis of Social Network [7]. In above-mentioned communication analysis, social network analysis techniques are often used. Social network analysis techniques are the methods to calculate the degree distribution of each node in networks, network density, and centrality. Just as data mining can be loosely described as looking for patterns in data, text mining is about looking for patterns in text. However, the superficial similarity between the two conceals real differences. Data mining can be more fully characterized as the extraction of implicit, previously unknown, and potentially useful information from data [Witten and Frank, 2000]. The in-formation is implicit in the input data: it is hidden, un-known, and could hardly be extracted without recourse to automatic techniques of data mining. With text mining, however, the information to be extracted is clearly and explicitly stated in the text [4]. The Enron Email Dataset Database Schema and Brief Statistical Report [9] which shows how is the distribution of emails for every user and showing the network, how the employees are connected. Use of interpersonal communication for network inference has been of interest to researchers for several decades [1]. Early work [11] utilized email traffic to infer social networks for the purpose of discovering communities of shared interest. Email classification can be applied to several different applications, including filtering messages based on priority, assigning messages to user-created folders, or identifying SPAM. III. PROPOSED METHOD In this section, we describe our approach of clustering of network dataset and email dataset according to the steps shown in figure 1 and 3. A. Network data clustering [13 and 14] As shown in figure 1 there are 6 main steps in our approach of network data clustering that we describe the steps below Step 1: In this step we have taken spread sheet of net-work data collected over the network of the STES campus by the main server at its end. This dataset is given as the input to our implemented system. Then our application with the help of jxl archive file it reads the spread sheet and it will take the input in the form of matrix. Fig 1: Network Data Clustering Approach 22
  • 4. Step 2: In this steps user or administrator have to pro-vide a network parameter on the basis of which we need to cluster the network data. The parameters like Port number, Source IP, Destination IP etc. Step 3: Here in this step the matrix of network data is been searched in its column for provided parameter name. If the parameter name is present in the column then for the column we search for the given parameter value. Then if we find the parameter value then we keep collecting the complete row value in a list. This operation keep iterating till the spread sheet completes. And that makes a cluster for the given value. We propose the above Discussed Single parameter clustering using following Algorithm: // input: Parameter name 1. for j=0 to columns 2. if (cell(i, j)=parameter name) then 3. for i=0 to rows 4. if( cell(i, j)=parameter value) then 5. Collect the row value into a vector to make a cluster Repeat till the end of spread sheet to form cluster Step 4: In this step collected cluster vector is written into a new spread sheet and saved in a given path. Step 5: In this step the user or administrator has to pro-vide two network parameters on the basis of which we need to cluster the network data. The parameters like Port number and Source IP, time and destination IP etc. Step 6: In this step the cluster is been formed using K-means algorithm for the multiple values. K-means clustering is one of the unsupervised computational methods used to group similar objects in to smaller partitions called clusters so that similar objects are grouped together,. The algorithm aims to minimize the within cluster variance and maximize the intra cluster’s variance. The technique involves determining the number of clusters at first and randomly assigning cluster centroids to each cluster from the whole datasets; this step is called initialization of cluster centroids [8]. The distance between each point in the whole dataset and every cluster centroid is then calculated using a distance metric. Then, for every data point, the mini-mum distance is determined and that point is assigned to the closest cluster. This step is called cluster assignment, and is repeated until all of the data points have been assigned to one of the clusters [8]. Finally, the mean for each cluster is calculated based on the accumulated values of points in each cluster and the number of points in that cluster. Those means are then assigned as new cluster centroids, and the process of finding distances between each point and the new centroids is repeated, where points are re-assigned to the new closest clusters. The process iterates for a fixed number of times, or until points in each cluster stop moving across to different clusters. This is called convergence. The steps of the K-means can be summarized in Figure 2 below. The Algorithmic steps of the k-means can be summarized as below  Initially, the number of clusters must be known, or chosen, to be K say.  The initial step is to choose a set of K instances as centres of the clusters. Often chosen such that the points are mutually “farthest apart”, in some way.  Next, the algorithm considers each instance and assigns it to the cluster which is closest  The cluster centroids are recalculated either after each instance assignment, or after the whole cycle of re-assignments.  This process is iterated. B. Email Data Clustering [13 and 14] As shown in figure 3 there are 4 main steps in our approach. To cluster Email Data that we describe each and every step below Step 1: In this step the collected Enron Email set folder is given as input to our system. Where Enron Email set is having folder with name of users and again within that it is having folder for inbox, outbox and sentbox emails. Step 2: In this steps user or administrator have to pro-vide a parameter on the basis of which we need to cluster the Email data. The parameters are any word/s that we have to search within the emails. Again administrator have to provide searching folders like inbox, outbox and sentbox etc. Step 3: Here in this step the given email folder item is scanned within each folder, the emails are read continuously and keep finding for the given string or word. If the given String is found then the user name is 23
  • 5. collected in a vector. This operation keeps iterating till all users emails are scanned. And that makes a cluster for the given String or parameter. We propose the above Discussed Email Clustering following Algorithm: // input: Searching String // input: Searching Email Folder like inbox, outbox, sentbox Fig 2: Computational steps of the k-means algorithm 1. 2. 3. 4. 5. 6. 7. 8. Fig 3: E-mail Data Clustering Approach for i =0 to n //where n is number of User folders for j =0 to m // where m is number of Email Folders if ( EmailFolder=Given Email Folder) then for k=0 to P // where p is number of email files Read complete Email and get into a String if ( Email string Contains searching word) then get the User folder name into a vector Repeat this for all User folders to get Cluster of users for specific word IV. RESULTS AND ANALYSIS As we know Data clustering greatly boost the concept of grouping of similar objects. In our proposed and implemented system we apply this to cluster network data captured at server’s end in the form of spread sheets, Where the spread sheet is having many columns like Source IP, Destination IP, Source Port etc,. The Result of these Clustering can shown and analyze below. A. Result of Multiple Parameter Clustering Here in the below table 2 we can see a resulted cluster ,which is for input of a multiple parameter for service and destination port Number value 27388 and 80 respectively, written in a spread sheet by our application in the same format as of given input spread sheet. This makes easy to analyze the data. The same kind of result we are getting in single parameter also, where we are giving only one parameter as input. 24
  • 6. B. Performance Evaluation of Time For data clustering Here in the below figure 5 we can see a performance of our system for formation of number clusters .If we ob-server the graph properly we can find that our system becomes lenient as the number of clusters increases. This means our system takes almost same time to form more number of clusters. This Indicates we will perform more accurately for huge amount data. This adds boosting point in our project. C. K-Means Iteration Performance Evaluation Here in the below figure 6 we can see that the number of iterations taken by the K-Means Algorithm to form clusters. If we observer Graph Properly we can see that K- means approximately takes almost same amount TABLE II: OUTPUT OF MULTI-PARAMETER CLUSTERING -------Cluster 0-----------For source and Destination Port [27388.0, 50983.0] For source and Destination Port [27388.0, 41876.0] For source and Destination Port [4771.0, 80.0] -------Cluster 1-----------For source and Destination Port [27388.0, 443.0] For source and Destination Port [56561.0, 8140.0] For source and Destination Port [59201.0, 80.0] For source and Destination Port [27388.0, 443.0] For source and Destination Port [33885.0, 80.0] Figure 4: Cluster Formation Performance Figure 5: K-Means Iterations performance 25
  • 7. of iteration to produce clusters. This shows that K-Means is having good performance ratio for greater amount of data to form cluster. V. IMPLEMENTATION The proposed algorithm is implemented using open source technologies and algorithm is applied over network data collected by the STES Lonavala campus server in the form of Microsoft excel sheets using Fortigate firewall and Enron email database respectively. Java is selected as the programming languages and the other open source API’s (Application Programming Interfaces) to Support the other functionalities. Net-Beans 6.9.1 is used as a development IDE (Integrated Development Environment) for Java and library of other technologies are added as external jar (Java Archives) in the Netbeans. NetBeans IDE is having feature of integrating the external libraries with it so that it can make easy to build any applications on its platform .jxl library available for Java that allows users to read and write the Microsoft spread sheets easily which eventually helps in the system to read data and to present clustered data. CONCLUSION AND FUTURE WORK In this paper network data clustering approach is pro-posed & implemented to show the clustered data which we can use for further studies of attacks & threats in a network. These clustered data can be used further to create an attack graph without the knowledge of previous attacks or threats. This eventually boosts the analysis of threats & alerts in network situation awareness. This kind of clustering systems can help the administrator to understand the network threats. Here we also proposed & implemented a model for e-mail clustering to show text similarities. The proposed techniques shows the e-mail attributes & how the text similarities is use to cluster the users. The Enron dataset is used for the experiment in our project where the users are clustered based on many parameters in inbox, sentbox, drafts etc. folders of an email application. The future scope of this work could be implementation of similar system for the e-mail attachments and real time e-mails of an organization. This kind of e-mail clustering helps in operations like summarization, automatic answering machines and conducting surveys. REFERENCES [1] Yingjie Zhou, Kenneth R. Fleischmann “Automatic Text Analysis of Values in the Enron Email Dataset” Proceedings of the 43rd Hawaii International Conference on System Sciences 2010 [2] Minoru Sasaki, Hiroyuki Shinnou” Spam Detection Using Text Clustering” Proceedings of the 2005 Inter-national Conference on Cyberworlds (CW’05) IEEE 2005 [3] Hui Huang, and Wei Jiang, Jason Cong, “Pattern-Mining for Behavioral Synthesis” IEEE, 2011 [4] Taiwo Ayodele, Shikun Zhou, Rinat Khusainov “Evolving Email Clustering Method for Email Group-ing: A Machine Learning Approach”, IEEE 2009 [5] Jayadev Gyani “A Novel Approach for Clustering E-mail Users Using Pattern Matching” Syeda Farha Shazmeen IEEE 2011 [6] Shady Shehata Fakhri Karray “Enhancing Text Clustering using Concept-based Mining Model” IEEE 2006 [7] Longbing Cao Huaifeng Zhang, Yanchang Zhao,Dan Luo, and Chengqi Zhang “Combined Mining: Discovering Informative Knowledge in Complex Data” IEEE 2011 [8] Data Clustering: A Review (1999) Jain/Murty/Flynn “http://citeseer.ist.psu.edu/jain99data.html” [9] Brief Statistical Report, /"Technical Report, Information Sciences Institute, 2004. Available at: http://www.isi.edu/~adibi/Enron/Enron_Dataset_Report .pdf. [10] W. W. Cohen: Learning Rules that classify E-mail. In Proc. of the 1996 AAAI Spring Symposium in In-formation Access, 1996. [11] Y. Diao, H. Lu, and D. Wu “A comparative study of classification based personal email Filtering” In Proc. 4th Pacific-Asia Conf. Knowledge Discovery and Data Mining (PAKDD'00), pages 408-419, Kyoto, JP, 2000. [12] Steve Martin, Anil Sewani, Blaine Nelson, Karl Chen, Anthony D. Joseph, "Analyzing Behaviorial Features for Email Classification", Second Conference on Email and Anti-Spam CEAS 2005 , The International Association for Cryptologic Research and The IEEE Technical Committee on Security and Privacy 2005. [13] Sharma P., Parvat T. J. “Network log Clustering Using K-means Algorithm” pp 97-105, ITC 2012, Springer-Verlag Berlin Heidelberg 2012 Jiawei Han, Micheline kamber.: Data Mining Concepts and techniques.: Second Edition, Morgan Kaufmann Publishers (2006) 26