International Journal of Network Security & Its Applications (IJNSA), Vol.5, No.4, July 2013
DOI: 10.5121/ijnsa.2013.5414
AN IMPROVED MULTI-SOM ALGORITHM
Imen Khanchouch¹, Khaddouja Boujenfa² and Mohamed Limam³
¹ LARODEC ISG, University of Tunis
kh.imen88@gmail.com
² LARODEC ISG, University of Tunis
khadouja.Boujenfa@isg.rnu.tn
³ LARODEC, ISG University of Tunis, Dhofar University, Oman
Mohamed.limam@isg.rnu.tn
ABSTRACT
This paper proposes a clustering algorithm based on the Self-Organizing Map (SOM) method. To find the optimal number of clusters, our algorithm uses the Davies-Bouldin (DB) index, which has not previously been used in the multi-SOM. The proposed algorithm is compared with three clustering methods on five datasets. Results show that our algorithm performs as well as the competing methods.
KEYWORDS
Clustering, SOM, multi-SOM, DB index.
1. INTRODUCTION
Clustering is an unsupervised learning technique that aims to obtain homogeneous partitions of objects while promoting heterogeneity between partitions. The literature describes many categories of clustering methods, such as hierarchical [13], partition-based [5], density-based [1] and neural network (NN) [6] methods.
Hierarchical methods build a hierarchy of clusters with many levels. There are two types of hierarchical clustering approaches: agglomerative (bottom-up) and divisive (top-down) methods. Agglomerative methods start with each data object as its own cluster and successively merge clusters two by two until a single partition containing all the objects is obtained. Divisive methods, by contrast, begin with the whole data sample as one cluster and successively split it until each of the N objects forms its own cluster. Hierarchical methods are time consuming on large amounts of data. Moreover, the resulting dendrogram is then very large and may include incorrect information.
Partitioning methods divide the data set into disjoint partitions, where each partition represents a cluster. Clusters are formed to optimize an objective partitioning criterion, often called a similarity function, such as distance. Each cluster is represented by a centroid or a representative object. Partitioning methods are sensitive to initialization, so an inappropriate initialization may lead to poor results. However, they are faster than hierarchical methods.
Density-based clustering methods aim to discover clusters of arbitrary shape. They assume that regions of high density constitute clusters, which are separated by regions of low density. Density is defined through the neighborhood of a point, which is determined by a distance threshold or a number of nearest neighbors.
NN are complex systems of interconnected neurons. Unlike hierarchical and partitioning clustering methods, NN can handle large amounts of high-dimensional data. Neural Gas is an artificial NN proposed by [11] that uses feature vectors to find optimal representations of the input data. The algorithm's name refers to the dynamics of the feature vectors during the adaptation process.
Based on competitive learning, SOM [6] is the most commonly used NN method. In the training process, the nodes compete to be the most similar to the input vector. Euclidean distance is commonly used to measure distances between input vectors and the weights of the output nodes. The node with the minimum distance is the winner, also known as the Best Matching Unit (BMU): the SOM unit whose weight vector is closest to the current input vector. The neighbors of the BMU on the map are then determined and their weights are adjusted. The main function of SOM is to map the input data from a high-dimensional space to a lower-dimensional one. It is appropriate for the visualization of high-dimensional data, allowing a reduction of the data and its complexity. However, the SOM map is insufficient to define the boundaries of each cluster, since there is no clear separation of data items. Thus, extracting partitions from the SOM grid is a crucial task. Also, SOM requires the topology and the size of the grid to be fixed at initialization, and the results are very sensitive to the choice of the size. Hence, we look for an extension of the multi-SOM that overcomes these shortcomings and gives the optimal number of clusters without any initialization.
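The competitive step described above can be illustrated with a minimal sketch of one online SOM update. The function names, the Gaussian neighborhood kernel, the learning rate `lr` and the radius `sigma` are assumptions made for illustration; they are not taken from [6].

```python
import numpy as np

def best_matching_unit(weights, x):
    """Return (row, col) of the node whose weight vector is closest to x."""
    # weights has shape (height, width, dim); x has shape (dim,)
    dists = np.linalg.norm(weights - x, axis=2)  # Euclidean distance per node
    return tuple(int(i) for i in np.unravel_index(np.argmin(dists), dists.shape))

def update_neighbourhood(weights, x, bmu, lr=0.5, sigma=1.0):
    """Move the BMU and its grid neighbours toward x (one online step)."""
    h, w, _ = weights.shape
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    grid_dist2 = (rows - bmu[0]) ** 2 + (cols - bmu[1]) ** 2
    # Gaussian neighbourhood kernel (an assumed choice): 1 at the BMU,
    # decaying with distance on the grid
    influence = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
    return weights + lr * influence[..., None] * (x - weights)

# Toy 2x2 map with 2-dimensional weights (illustrative values only)
weights = np.zeros((2, 2, 2))
weights[0, 1] = [1.0, 1.0]
weights[1, 0] = [5.0, 5.0]
weights[1, 1] = [9.0, 9.0]
x = np.array([0.9, 1.2])

bmu = best_matching_unit(weights, x)              # node (0, 1) is the winner
new_weights = update_neighbourhood(weights, x, bmu)
```

In this sketch every node moves toward the input, but the BMU moves the most, which is what preserves the topology of the map.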
This paper is structured as follows. Section 2 describes the multi-SOM approach and related work. Section 3 details the proposed algorithm. Experimental results on real datasets are given in Section 4. Finally, a conclusion and some future work are given in Section 5.
2. THE MULTI-SOM APPROACH
The multi-SOM method was first introduced by [8] for scientific and technical information analysis, specifically for patents on transgenic plants that improve the resistance of plants to pathogen agents. The authors proposed an extension of SOM, called multi-SOM, to introduce the notion of viewpoints into information analysis, with multiple map visualization and dynamicity. A viewpoint is defined as a partition of the analyst's reasoning. The objects in a partition may be homogeneous or heterogeneous and are not necessarily similar. However, objects in a cluster are similar and homogeneous, since a similarity criterion is necessarily used. Each map in multi-SOM represents a viewpoint, and the information in each map is represented by nodes (classes) and logical areas (groups of classes).
[7] applied multi-SOM to an iconographic database. Iconography is the collected representation illustrating a subject, which can be an image or a text document. The multi-SOM model was then applied to patent analysis in [10] and [9], where a patent is an official document conferring a right. The experiments use a database of one thousand patents on oil engineering technology and indicate the efficiency of viewpoint-oriented analysis, where the selected viewpoints correspond to the uses, advantages, patentees and titles subfields of patents.
[12] applied multi-SOM to the zoo data set from the UCI repository to illustrate a technique that combines multiple SOMs and visualizes the different feature maps of the zoo data with color-coded clusters superimposed. The multi-SOM algorithm provides good map coverage with minimal topological defects, but it does not facilitate the integration of new data dimensions.
[4] applied the multi-SOM algorithm to macrophage gene expression analysis. Their algorithm overcomes two weaknesses of clustering methods: the estimation of the cluster number in partitioning methods and the delimitation of partitions from the output grid of the SOM algorithm. The idea of [3] consists in obtaining compact and well-separated clusters using an evaluation criterion, namely the Dynamic Validity Index (DVI). The DVI metric is derived from compactness and separation properties, the two criteria used to evaluate clustering quality and to select the optimal clustering layer. Compactness is assessed by the intra-cluster distance variability, which should be minimized, and separation is assessed by the inter-cluster distance between two clusters, which should be maximized. The DVI metric is given by
$$\mathrm{DVI} = \min_{k=1,\dots,K} \left\{ \mathrm{IntraRatio}(k) + \gamma \, \mathrm{InterRatio}(k) \right\}$$
where k denotes the number of activated nodes on the layer and γ is a modulating parameter.
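As a small illustration of how such an index selects a layer, the sketch below picks the k that minimizes the sum of the intra ratio and the weighted inter ratio. The function name, the argument name `gamma` for the modulating parameter, and the ratio values are hypothetical; the definitions of IntraRatio and InterRatio are not given here, so the ratios are assumed to be precomputed.

```python
def select_layer_by_dvi(intra_ratio, inter_ratio, gamma=1.0):
    """Return (k, score) minimizing IntraRatio(k) + gamma * InterRatio(k)."""
    scores = {k: intra_ratio[k] + gamma * inter_ratio[k] for k in intra_ratio}
    best_k = min(scores, key=scores.get)
    return best_k, scores[best_k]

# Made-up ratios for k = 2..5 activated nodes, for illustration only
intra = {2: 0.9, 3: 0.4, 4: 0.35, 5: 0.3}
inter = {2: 0.1, 3: 0.2, 4: 0.5, 5: 0.8}

best_k, dvi = select_layer_by_dvi(intra, inter, gamma=1.0)
```

With these made-up values the minimum is reached at k = 3, where the two ratios are best balanced.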
Figure 1. Architecture of the proposed multi-SOM algorithm
Other evaluation criteria exist besides the DVI. [2] used the DB index to measure cluster quality in the Multi-Layer Growing SOM (GSOM) algorithm for expression data analysis. The DB index measures how compact and how well separated the clusters are. GSOM is a NN algorithm that belongs to the hierarchical agglomerative clustering (HAC) algorithms. It is based on the SOM approach, but it starts with a minimum number of nodes, usually four, and grows with a new node at every iteration.
We have chosen the DB index because it is an internal criterion based on the compactness and separation of the clusters, and because it is widely used in many works but has not yet been used in the multi-SOM algorithm. The DB index is given by
$$\mathrm{DB} = \frac{1}{c} \sum_{i=1}^{c} \max_{j \neq i} \left\{ \frac{d(X_i) + d(X_j)}{d(c_i, c_j)} \right\}$$
where c is the number of clusters, i and j denote clusters, d(Xi) and d(Xj) are the distances of all objects in clusters i and j to their respective cluster centroids, and d(ci, cj) is the distance between the centroids. Smaller values of the DB index indicate better clustering quality.
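The DB index can be sketched in a few lines of NumPy. This is a minimal sketch, assuming that d(Xi) is the mean Euclidean distance of the members of cluster i to their centroid and that no two centroids coincide; the function name `davies_bouldin` and the toy data are illustrative.

```python
import numpy as np

def davies_bouldin(X, labels):
    """DB index: mean over clusters of the worst (scatter sum / separation) ratio."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    ids = np.unique(labels)
    if len(ids) < 2:
        raise ValueError("the DB index needs at least two clusters")
    centroids = np.array([X[labels == k].mean(axis=0) for k in ids])
    # d(X_i): mean distance of the members of cluster i to its centroid (assumed)
    scatter = np.array([
        np.linalg.norm(X[labels == k] - c, axis=1).mean()
        for k, c in zip(ids, centroids)
    ])
    worst = []
    for i in range(len(ids)):
        ratios = [
            (scatter[i] + scatter[j]) / np.linalg.norm(centroids[i] - centroids[j])
            for j in range(len(ids)) if j != i
        ]
        worst.append(max(ratios))  # max over j != i
    return float(np.mean(worst))

# Four points forming two tight, well-separated pairs
X = [[0, 0], [0, 2], [10, 0], [10, 2]]
good = davies_bouldin(X, [0, 0, 1, 1])  # compact, separated clusters: about 0.2
bad = davies_bouldin(X, [0, 1, 0, 1])   # stretched, close clusters: about 5.0
```

The compact and well-separated labelling gets the smaller value, in line with the rule that smaller DB values indicate better clustering.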
3. THE PROPOSED MULTI-SOM ALGORITHM
The proposed algorithm (Algorithm1) aims to find the optimal number of clusters using the DB index as an evaluation criterion.
Given a dataset, the multi-SOM algorithm first uses the SOM approach to cluster the dataset and generate the first SOM grid, namely SOM1. This first grid is then iteratively clustered, as shown in Figure 1. The grid height (Hs) and the grid width (Ws) of SOM1 are given by the user. The multi-SOM algorithm uses the Batch map function proposed by [3], a version of Kohonen's SOM algorithm that is faster in the training process. The SOM map is then clustered iteratively from one level to another, and the DB index is computed at each level. The size of the grid decreases at each level until the optimal number of clusters is reached.
In [4], the training process stops when a single neuron is reached. Our proposed algorithm, however, stops when the DB index reaches its minimum value, which gives the optimal number of clusters. Consequently, the proposed algorithm uses less computation time than the one proposed by [4].
From the formula of the DB index, the time complexity of DB is O(DB) = O(C), where C is the number of clusters. This is lower than the time complexity of the DVI, where O(DVI) = O(InterRatio + IntraRatio), which involves many operations to compute the intra- and inter-distances. Thus, computing the DVI values at each grid requires more memory space and time than computing the DB index.
The steps of the algorithm are as follows, where:
s: the current SOM layer number
Hs: the grid height of SOMs
Ws: the grid width of SOMs
Is: the input of SOMs
max_it: the maximum number of iterations for training SOM grids
Algorithm1: multi-SOM
Input: W1, H1, I1, max_it
Output: Optimal cluster number, data partitions
Begin
• Step1: Clustering data by SOM
s = 1;
Batch SOM (W1, H1, I1, max_it);
Compute DB index DB1;
• Step2: Clustering of the SOM and cluster delimitation
repeat
s = s + 1;
Hs = H(s-1) - 1;
Ws = W(s-1) - 1;
Batch SOM (Ws, Hs, Is, max_it);
Compute DB index DBs on the grid SOMs;
until (DBs > DB(s-1));
Return (Data partitions and cluster number of layer s-1);
End
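The loop of Algorithm1 can be sketched in Python as follows. This is a rough sketch under stated assumptions, not the implementation evaluated in this paper: the batch SOM is a simplified version with a fixed Gaussian neighborhood (`sigma`), each layer is trained on the node weights of the previous layer, the DB index is computed on the original data with the composed labels, and the search stops as soon as the DB index increases. All function names are hypothetical.

```python
import numpy as np

def assign(data, weights):
    """Index of the closest node (BMU) for every row of data."""
    d = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
    return d.argmin(axis=1)

def batch_som(data, height, width, max_it, sigma=1.0, seed=0):
    """Simplified batch SOM; returns weights of shape (height * width, dim)."""
    rng = np.random.default_rng(seed)
    n_nodes = height * width
    # initialise the nodes on randomly chosen input vectors (an assumption)
    idx = rng.choice(len(data), size=n_nodes, replace=len(data) < n_nodes)
    weights = data[idx].astype(float)
    coords = np.array([(r, c) for r in range(height) for c in range(width)], dtype=float)
    grid_d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=2)
    kernel = np.exp(-grid_d2 / (2.0 * sigma ** 2))  # fixed Gaussian neighbourhood
    for _ in range(max_it):
        influence = kernel[:, assign(data, weights)]  # (n_nodes, n_samples)
        # batch update: each node becomes a neighbourhood-weighted mean of the data
        weights = influence @ data / influence.sum(axis=1, keepdims=True)
    return weights

def db_index(data, labels):
    """DB index with mean distance to centroid as scatter; inf if < 2 clusters."""
    ids = np.unique(labels)
    if len(ids) < 2:
        return float("inf")
    cents = np.array([data[labels == k].mean(axis=0) for k in ids])
    scat = np.array([np.linalg.norm(data[labels == k] - c, axis=1).mean()
                     for k, c in zip(ids, cents)])
    sep = np.linalg.norm(cents[:, None, :] - cents[None, :, :], axis=2)
    np.fill_diagonal(sep, np.inf)  # exclude j == i from the max
    return float(((scat[:, None] + scat[None, :]) / sep).max(axis=1).mean())

def multi_som(data, height, width, max_it=20):
    """Shrink the grid layer by layer; keep the last layer before DB increases."""
    data = np.asarray(data, dtype=float)
    weights = batch_som(data, height, width, max_it)      # Step 1: SOM1 on the data
    labels = assign(data, weights)
    best_db, best_labels = db_index(data, labels), labels
    while min(height, width) > 1:                         # Step 2: cluster the SOM
        height, width = height - 1, width - 1
        new_weights = batch_som(weights, height, width, max_it)
        labels = assign(weights, new_weights)[labels]     # compose sample -> node maps
        db = db_index(data, labels)
        if db > best_db:                                  # DB index started to increase
            break
        best_db, best_labels, weights = db, labels, new_weights
    return len(np.unique(best_labels)), best_labels

# Synthetic data: three Gaussian blobs (illustrative only)
rng = np.random.default_rng(1)
centers = np.array([[0, 0], [5, 5], [0, 5]])
data = np.vstack([c + 0.3 * rng.standard_normal((20, 2)) for c in centers])
k, labels = multi_som(data, height=3, width=3, max_it=20)
```

Because both grid dimensions shrink by one at each layer, the candidate cluster counts are limited to the activated nodes of a few grid sizes. The number of clusters returned therefore depends on the initial grid chosen by the user.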
4. EXPERIMENTAL RESULTS
This section gives the results of the proposed multi-SOM algorithm on five public datasets. Our algorithm is compared with a partitioning clustering method, namely k-means [5], a hierarchical method, namely BIRCH [13], and the algorithm proposed by [4]. The datasets used in this work are taken from the UCI machine learning repository: Iris, Pima Indians diabetes, Wine, Breast cancer and Seeds.
Figure 2. Variation of the number of partitions and the values of DB index with Iris
Figure 3. Variation of the number of partitions and the values of DB index with Wine
Figures 2 and 3 illustrate the variation of the DB index and the number of partitions for the Iris and Wine datasets. The DB index decreases gradually as the number of partitions decreases from one grid to the next, until it reaches its lowest value of 0.161 for Iris and 0.202 for Wine. These values correspond to the optimal number of clusters, which is 3 for both the Iris and Wine datasets.
TABLE 1. Evaluation of the proposed multi-SOM algorithm
Dataset         Class number   k-means   Birch   Ghouila et al. (2008)   Multi-SOM
Iris            3              2         3       3                       3
Pima            2              2         3       2                       2
Breast cancer   2              2         2       2                       2
Wine            3              5         4       3                       3
Seeds           3              4         2       3                       3
For all datasets, the results in Table 1 show that the proposed algorithm performs as well as that of [4]. On each dataset, the two algorithms succeed in yielding the real number of clusters. The k-means method, however, determines the exact number of clusters only on the Pima and Breast cancer datasets. On the Iris, Breast cancer and Seeds datasets, the Birch algorithm generates the real number of clusters. On the Wine dataset, the number of clusters given by the proposed multi-SOM algorithm is closer to the real number than those given by the k-means and Birch methods.
5. CONCLUSIONS
In this paper, a multi-SOM algorithm based on the DB index is proposed to determine the optimal number of clusters. The minimum value of the DB index corresponds to the optimal cluster number. We have shown that our algorithm takes fewer iteration steps than that of [4]. Thus, the complexity of the proposed multi-SOM algorithm is lower than that of the algorithm of [4]. As future work, we will investigate other validity indices and adapt our algorithm to fuzzy clustering.
REFERENCES
[1] Ester, M., Kriegel, H.-P., Sander, J., and Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of 2nd International Conference on KDD, Portland, Oregon, pp 226–231.
[2] Fonseka, A. and Alahakoon, D. (2010). Exploratory data analysis with multi-layer growing self-organizing maps. IEEE Xplore, pp 132–137.
[3] Fort, J.C., Letrémy, P., and Cottrell, M. (2002). Advantages and drawbacks of the batch Kohonen algorithm. Proc. ESANN 2002, Bruges, M. Verleysen Ed., Editions D Facto, Bruxelles, pp 223–230.
[4] Ghouila, A., Yahia, S. B., Malouche, D., Jmel, H., Laouini, D., Z. Guerfali, F., and Abdelhak, S. (2008). Application of multisom clustering approach to macrophage gene expression analysis. Infection, Genetics and Evolution, Vol 9, pp 328–329.
[5] MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Vol 1, pp 281–289.
[6] Kohonen, T. (1981). Automatic formation of topological maps of patterns in a self-organising system. In Proceedings of the 2SCIA, Scand. Conference on Image Analysis, pp 214–220.
[7] Lamirel, J. C. (2002). Multisom: a multimap extension of the som model. Application to information discovery in an iconographic context. 3: pp 1790–1795.
[8] Lamirel, J.-C., Polanco, X., and Francois, C. (2001). Using artificial neural networks for mapping of science and technology: A multi self-organizing maps approach. Scientometrics, 51: pp 267–292.
[9] Lamirel, J. C. and Shehabi, S. (2006). Multisom: a multimap extension of the som model. Application to information discovery in an iconographic context. IEEE Conference Publications, pp 42–54.
[10] Lamirel, J. C., Shehabi, S., Hoffmann, and Francois, C. (2003). Intelligent patent analysis through the use of a neural network: experiment of multi-viewpoint analysis with the multisom model. pp 7–23.
[11] Martinetz, T. and Schulten, K. (1991). A neural-gas network learns topologies. In Artificial Neural Networks. North-Holland, Amsterdam, pp 397–402.
[12] Smith, T. (2009). Adapting to increasing data availability using multi-layered self-organizing maps. IEEE Computer Society, pp 108–113.
[13] Zhang, T., Ramakrishnan, R., and Livny, M. (1996). Birch: An efficient data clustering method for very large databases. pp 103–114.

A fuzzy clustering algorithm for high dimensional streaming data
 
A GENERIC ALGORITHM TO DETERMINE CONNECTED DOMINATING SETS FOR MOBILE AD HOC ...
A GENERIC ALGORITHM TO DETERMINE CONNECTED DOMINATING SETS FOR MOBILE AD HOC ...A GENERIC ALGORITHM TO DETERMINE CONNECTED DOMINATING SETS FOR MOBILE AD HOC ...
A GENERIC ALGORITHM TO DETERMINE CONNECTED DOMINATING SETS FOR MOBILE AD HOC ...
 

Kürzlich hochgeladen

Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKJago de Vreede
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 

Kürzlich hochgeladen (20)

Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 

AN IMPROVED MULTI-SOM ALGORITHM

International Journal of Network Security & Its Applications (IJNSA), Vol.5, No.4, July 2013
DOI: 10.5121/ijnsa.2013.5414

Imen Khanchouch (1), Khaddouja Boujenfa (2) and Mohamed Limam (3)
(1) LARODEC ISG, University of Tunis, kh.imen88@gmail.com
(2) LARODEC ISG, University of Tunis, khadouja.Boujenfa@isg.rnu.tn
(3) LARODEC, ISG University of Tunis, Dhofar University, Oman, Mohamed.limam@isg.rnu.tn

ABSTRACT
This paper proposes a clustering algorithm based on the Self Organizing Map (SOM) method. To find the optimal number of clusters, our algorithm uses the Davies Bouldin (DB) index, which has not previously been used in the multi-SOM. The proposed algorithm is compared to three clustering methods on five databases. Results show that our algorithm performs as well as the competing methods.

KEYWORDS
Clustering, SOM, multi-SOM, DB index.

1. INTRODUCTION
Clustering is an unsupervised learning technique that aims to obtain homogeneous partitions of objects while promoting heterogeneity between partitions. The literature describes many categories of clustering methods, such as hierarchical [13], partition-based [5], density-based [1] and neural network (NN) [6] methods.

Hierarchical methods build a hierarchy of clusters with many levels. There are two types of hierarchical clustering approaches: agglomerative (bottom-up) and divisive (top-down). Agglomerative methods start with every data object taken as a cluster and join clusters two by two until a single partition containing all the objects is obtained. Divisive methods, in contrast, start with all the data in one cluster and split it successively until each of the N objects forms its own cluster. Hierarchical methods are time consuming on large amounts of data, and the resulting dendrogram is very large and may include incorrect information.

Partitioning methods divide the data set into disjoint partitions, where each partition represents a cluster. Clusters are formed to optimize an objective partitioning criterion, often called a similarity function, such as distance. Each cluster is represented by a centroid or a representative object. Partitioning methods are sensitive to initialization: an inappropriate initialization may lead to bad results. However, they are faster than hierarchical methods.

Density-based clustering methods aim to discover clusters of different shapes. They assume that regions of high density constitute clusters, which are separated by regions of low density. The neighborhood of a point is defined by a distance threshold or by a number of nearest neighbors.
NN are complex systems of interconnected neurons. Unlike hierarchical and partitioning clustering methods, NN can handle a large amount of high-dimensional data. Neural Gas is an artificial NN proposed by [11] that uses feature vectors to find optimal representations of the input data. The algorithm's name refers to the dynamics of the feature vectors during the adaptation process.

SOM [6], which is based on competitive learning, is the most commonly used NN method. During training, the nodes compete to be the most similar to the current input vector. Euclidean distance is commonly used to measure the distance between an input vector and the weight vector of each output node. The node with the minimum distance is the winner, also known as the Best Matching Unit (BMU). The neighbors of the BMU on the map are then determined and their weights are adjusted.

The main function of SOM is to map the input data from a high-dimensional space to a lower-dimensional one. It is appropriate for visualizing high-dimensional data, since it reduces the data and its complexity. However, a SOM map is insufficient to define the boundaries of each cluster, since there is no clear separation of data items. Thus, extracting partitions from the SOM grid is a crucial task. Also, SOM requires the topology and the size of the grid to be fixed in advance, and the generalization of the method is very sensitive to the choice of the size. Hence, we look for an extension of the multi-SOM that overcomes these shortcomings and gives the optimal number of clusters without any initialization.

This paper is structured as follows. Section 2 describes the multi-SOM approach. Section 3 details the proposed algorithm. Experimental results on real datasets are given in Section 4. Finally, a conclusion and some future work are given in Section 5.

2. THE MULTI-SOM APPROACH
The multi-SOM method was first introduced by [8] for scientific and technical information analysis, specifically for patents on transgenic plants designed to improve plant resistance to pathogen agents. The authors proposed an extension of SOM, called multi-SOM, that introduces the notion of viewpoints into information analysis, with multiple map visualization and dynamicity. A viewpoint is defined as a partition of the analyst's reasoning. The objects in a partition may be homogeneous or heterogeneous and are not necessarily similar. Objects in a cluster, however, are similar and homogeneous, since a similarity criterion is necessarily used. Each map in multi-SOM represents a viewpoint, and the information in each map is represented by nodes (classes) and logical areas (groups of classes).

[7] applied multi-SOM to an iconographic database. Iconography is the collected representation illustrating a subject, which can be an image or a text document. The multi-SOM model was then applied to patent analysis in [10] and [9], where a patent is an official document conferring a right. The experiments use a database of one thousand patents on oil engineering technology and indicate the efficiency of viewpoint-oriented analysis, where the selected viewpoints correspond to the uses, advantages, patentees and titles subfields of the patents.

[12] applied multi-SOM to the zoo data set from the UCI repository to illustrate a technique that combines multiple SOMs and visualizes the different feature maps of the zoo data with color-coded clusters superimposed. The multi-SOM algorithm supplies good map coverage with minimal topological defects, but it does not facilitate the integration of new data dimensions. [4] applied the multi-SOM algorithm to macrophage gene expression analysis.
Their proposed algorithm overcomes two weaknesses of clustering methods: the estimation of the number of clusters in partitioning methods and the delimitation of partitions from the output grid of the SOM algorithm. The idea of [3] consists in obtaining compact and well separated clusters using an evaluation criterion, namely the Dynamic Validity Index (DVI). The DVI metric is derived from the compactness and separation properties, which are the two criteria used to evaluate clustering quality and to select the optimal clustering layer. Compactness is assessed by the intra-distance variability, which should be minimized, and separation is assessed by the inter-distance between two clusters, which should be maximized. The DVI metric is given by

$$ DVI = \min_{k=1..K} \left\{ IntraRatio(k) + \gamma \, InterRatio(k) \right\} $$

where k denotes the number of activated nodes on the layer and $\gamma$ is a modulating parameter.

Figure 1. Architecture of the proposed multi-SOM algorithm

There are evaluation criteria other than the DVI. [2] used the DB index to measure cluster quality in the Multi-Layer Growing SOM algorithm (GSOM) for expression data analysis. The DB index measures how compact and how well separated the clusters are. GSOM is an NN algorithm that belongs to the hierarchical agglomerative clustering (HAC) algorithms. It is based on the SOM approach, but it starts with a minimum number of nodes, usually four, and grows by one node at every iteration.

We have chosen the DB index because it is an internal criterion based on the compactness and separation of the clusters, and because it is widely used in many works but not in the multi-SOM algorithm.
The DB index is given by

$$ DB = \frac{1}{c} \sum_{i=1}^{c} \max_{j \neq i} \left\{ \frac{d(X_i) + d(X_j)}{d(c_i, c_j)} \right\} $$

where c is the number of clusters, i and j denote clusters, d(X_i) and d(X_j) are the distances between all objects in clusters i and j and their respective cluster centroids, and d(c_i, c_j) is the distance between the two centroids. Smaller values of the DB index indicate better clustering quality.
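As an illustration of the DB formula above, the following minimal Python sketch computes the index for a partition given as a list of clusters. It assumes that d(X_i) is the mean Euclidean distance of the objects of cluster i to their centroid, which is one common reading of the definition; the function names `euclidean`, `centroid` and `davies_bouldin` are hypothetical and do not come from the paper.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def davies_bouldin(clusters):
    """DB index of a partition given as a list of non-empty clusters.

    Assumption: d(X_i) is the mean distance of the objects of cluster i
    to their centroid. Centroids are assumed to be pairwise distinct,
    otherwise d(c_i, c_j) = 0 and the ratio is undefined.
    """
    c = len(clusters)
    if c < 2:
        raise ValueError("the DB index needs at least two clusters")
    centers = [centroid(pts) for pts in clusters]
    scatter = [sum(euclidean(p, ctr) for p in pts) / len(pts)
               for pts, ctr in zip(clusters, centers)]
    total = 0.0
    for i in range(c):
        # Worst (largest) similarity ratio of cluster i with any other cluster.
        total += max((scatter[i] + scatter[j]) / euclidean(centers[i], centers[j])
                     for j in range(c) if j != i)
    return total / c


near = davies_bouldin([[(0, 0), (0, 2)], [(10, 0), (10, 2)]])
far = davies_bouldin([[(0, 0), (0, 2)], [(20, 0), (20, 2)]])
print(near, far)
```

With identical within-cluster scatter, moving the second cluster farther away lowers the index, which matches the statement that smaller values indicate better separated clusters.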
3. THE PROPOSED MULTI-SOM ALGORITHM
The proposed algorithm (Algorithm 1) aims to find the optimal clusters using the DB index as an evaluation criterion. Given a dataset, the multi-SOM algorithm first uses the SOM approach to cluster the dataset and generate the first SOM grid, namely SOM1. The first SOM grid is then clustered iteratively, as shown in Figure 1. The grid height (Hs) and the grid width (Ws) of SOM1 are given by the user. The multi-SOM algorithm uses the Batch map function proposed by [3], a version of the Kohonen SOM algorithm that is faster in the training process. The SOM map is then clustered iteratively from one level to the next, and the DB index is computed at each level. The size of the grid decreases at each level until the optimal number of clusters is reached.

In [4], the training process stops when a single neuron is reached. Our proposed algorithm, in contrast, stops when the DB index reaches its minimum value, which gives the optimal number of clusters. Consequently, the proposed algorithm uses less computation time than the one proposed by [4]. From the formula of the DB index, the time complexity of DB is O(DB) = O(C), where C is the number of clusters. This is lower than the time complexity of the DVI, O(DVI) = O(InterRatio + IntraRatio), which involves many operations to compute the intra- and inter-distances. Thus, computing the DVI values at each grid requires more memory space and time than computing the DB index.
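The SOM step on which every level relies starts with the search for the Best Matching Unit. The sketch below shows that search and one classic online Kohonen update with a Gaussian neighborhood. It is only an illustration: the Batch map used by the algorithm replaces this per-sample update with a weighted average over all inputs, and the names `best_matching_unit` and `online_update` as well as the default `rate` and `radius` values are assumptions, not taken from the paper.

```python
import math


def best_matching_unit(x, weights):
    """Index of the node whose weight vector is closest to x (squared Euclidean)."""
    def sq_dist(w):
        return sum((a - b) ** 2 for a, b in zip(x, w))
    return min(range(len(weights)), key=lambda k: sq_dist(weights[k]))


def online_update(x, weights, grid, rate=0.5, radius=1.0):
    """Move the BMU and its grid neighbors towards x; returns the BMU index.

    grid[k] holds the (column, row) position of node k on the map.
    """
    b = best_matching_unit(x, weights)
    for k, w in enumerate(weights):
        grid_d2 = sum((a - c) ** 2 for a, c in zip(grid[k], grid[b]))
        h = math.exp(-grid_d2 / (2.0 * radius ** 2))  # 1.0 for the BMU itself
        weights[k] = [wi + rate * h * (xi - wi) for wi, xi in zip(w, x)]
    return b


weights = [[0, 0], [5, 5], [10, 10]]   # three nodes on a 3 x 1 map
grid = [(0, 0), (1, 0), (2, 0)]
b = online_update([4, 6], weights, grid)
print(b, weights[1])
```

Here the middle node wins and moves halfway towards the input, while its two neighbors move by a smaller amount that depends on their grid distance to the winner.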
The different steps of the algorithm are as follows, with the notation:

s: the current SOM layer number
Hs: the height of the SOM grid at layer s
Ws: the width of the SOM grid at layer s
Is: the input of the SOM at layer s
max_it: the maximum number of iterations for training the SOM grids

Algorithm 1: multi-SOM
Input: W1, H1, I1, max_it
Output: Optimal cluster number, data partitions
Begin
  Step 1: Clustering data by SOM
    s = 1;
    Batch SOM (W1, H1, I1, max_it);
    Compute DB index;
    s = s + 1;
  Step 2: Clustering of the SOM and cluster delimitation
    Hs = Hs - 1; Ws = Ws - 1;
    repeat
      Batch SOM (Ws, Hs, Is, max_it);
      Compute DB index on each SOM grid;
      s = s + 1;
    until (DBs < DBs+1);
  Return (Data partitions, Optimal cluster number);
End
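The steps above can be sketched in Python as follows. This is a minimal illustration under several assumptions that the listing does not fix: the grid shrinks by one row and one column at every level, the input of each level is the set of node weights of the previous grid, the DB index is computed on the partition of the original data with the mean distance to the centroid as scatter, the weights are initialized deterministically from evenly spaced inputs, and the loop keeps the last level before the DB index rises. All function names are hypothetical.

```python
import math


def dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def nearest(x, weights):
    """Index of the best matching unit for x."""
    return min(range(len(weights)), key=lambda k: dist(x, weights[k]))


def batch_som(inputs, width, height, max_it):
    """Minimal batch SOM on a width x height grid; returns the node weights."""
    grid = [(col, row) for row in range(height) for col in range(width)]
    # Assumption: deterministic initialization from evenly spaced inputs.
    weights = [list(inputs[(k * len(inputs)) // len(grid)])
               for k in range(len(grid))]
    dim = len(inputs[0])
    for it in range(max_it):
        # Gaussian neighborhood whose radius shrinks over the iterations.
        radius = max(0.5, 0.5 * max(width, height) * (1.0 - it / max_it))
        bmus = [nearest(x, weights) for x in inputs]
        for k, node in enumerate(grid):
            h = [math.exp(-dist(node, grid[b]) ** 2 / (2 * radius ** 2))
                 for b in bmus]
            total = sum(h)
            if total > 0.0:  # batch rule: neighborhood-weighted mean of inputs
                weights[k] = [sum(hi * x[d] for hi, x in zip(h, inputs)) / total
                              for d in range(dim)]
    return weights


def db_index(data, labels):
    """DB index of a labelled partition (assumes distinct cluster centroids)."""
    groups = {}
    for x, lab in zip(data, labels):
        groups.setdefault(lab, []).append(x)
    pts = list(groups.values())
    ctrs = [[sum(col) / len(g) for col in zip(*g)] for g in pts]
    scat = [sum(dist(x, c) for x in g) / len(g) for g, c in zip(pts, ctrs)]
    n = len(pts)
    return sum(max((scat[i] + scat[j]) / dist(ctrs[i], ctrs[j])
                   for j in range(n) if j != i) for i in range(n)) / n


def multi_som(data, width, height, max_it=20):
    """Return (DB value, labels) of the best level, or None if none is valid."""
    inputs, labels = data, list(range(len(data)))
    best = None
    while width >= 1 and height >= 1 and width * height >= 2:
        weights = batch_som(inputs, width, height, max_it)
        assign = [nearest(x, weights) for x in inputs]
        new_labels = [assign[lab] for lab in labels]  # chain through the levels
        if len(set(new_labels)) < 2:
            break  # the DB index is undefined for a single cluster
        db = db_index(data, new_labels)
        if best is not None and db > best[0]:
            break  # the DB index started to rise: keep the previous level
        best = (db, new_labels)
        inputs, labels = weights, new_labels
        width, height = width - 1, height - 1
    return best


data = [(0.0, 0.0), (0.5, 0.2), (0.1, 0.6), (0.4, 0.4),
        (10.0, 0.0), (10.5, 0.3), (10.2, 0.7), (9.8, 0.4),
        (0.0, 10.0), (0.3, 10.4), (0.6, 9.9), (0.2, 10.6)]
result = multi_som(data, 3, 3)
print(result)
```

The number of distinct labels in the returned partition plays the role of the optimal cluster number. Nodes that attract no data keep their weights and are still passed to the next level, which is a simplification of this sketch.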
4. EXPERIMENTAL RESULTS
This section gives the results of the proposed multi-SOM algorithm on five public datasets. Our algorithm is compared with a partitioning clustering method, namely k-means ([5]), a hierarchical method, namely BIRCH ([13]), and the algorithm proposed by [4]. The datasets used in this work are taken from the UCI machine learning repository: Iris, Pima Indians diabetes, Wine, Breast cancer and Seeds.

Figure 2. Variation of the number of partitions and the values of DB index with Iris

Figure 3. Variation of the number of partitions and the values of DB index with Wine

Figures 2 and 3 illustrate the variation of the DB index and of the number of partitions for the Iris and Wine datasets. The DB index decreases gradually as the number of partitions decreases from one grid to the next, giving better clustering results, until it reaches its lowest value of 0.161 for Iris and 0.202 for Wine. These values correspond to the optimal number of clusters, which is 3 for both the Iris and Wine datasets.
TABLE 1. Evaluation of the proposed multi-SOM algorithm

Dataset        Class number  k-means  Birch  Ghouila et al. (2008)  Multi-SOM
Iris           3             2        3      3                      3
Pima           2             2        3      2                      2
Breast cancer  2             2        2      2                      2
Wine           3             5        4      3                      3
Seeds          3             4        2      3                      3

For all datasets, the results given in Table 1 show that the proposed algorithm performs as well as that of [4]. On each dataset, the two algorithms succeeded in yielding the real number of clusters. However, the k-means method determines the exact number of clusters only on the Pima and Breast cancer datasets. On Iris, Breast and Seeds datasets, Birch algorithm generates the real number of clusters. On the Wine dataset, the number of clusters given by the proposed multi-SOM algorithm is better than those given by the k-means and Birch methods.

5. CONCLUSIONS
In this paper, a multi-SOM algorithm that uses the DB index to determine the optimal number of clusters is proposed. The minimum value of the DB index indicates the optimal cluster number. We have shown that our algorithm takes fewer iteration steps than that of [4]. Thus, the complexity of the proposed multi-SOM algorithm is better than that of the algorithm of [4]. As future work, we will investigate other validity indices and adapt our algorithm to fuzzy clustering.

REFERENCES
[1] Ester, M., Kriegel, H.-P., Sander, J., and Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the 2nd International Conference on KDD, Portland, Oregon, pp 226–231.
[2] Fonseka, A. and Alahakoon, D. (2010). Exploratory data analysis with multi-layer growing self-organizing maps. IEEE Xplore, pp 132–137.
[3] Fort, J. C., Letrémy, P., and Cottrell, M. (2002). Advantages and drawbacks of the batch Kohonen algorithm. In Proceedings of ESANN 2002, Bruges, M. Verleysen Ed., Editions D Facto, Bruxelles, pp 223–230.
[4] Ghouila, A., Yahia, S. B., Malouche, D., Jmel, H., Laouini, D., Z. Guerfali, F., and Abdelhak, S. (2008). Application of multisom clustering approach to macrophage gene expression analysis. Infection, Genetics and Evolution, Vol 9, pp 328–329.
[5] MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Vol 1, pp 281–289.
[6] Kohonen, T. (1981). Automatic formation of topological maps of patterns in a self-organising system. In Proceedings of the 2SCIA, Scand. Conference on Image Analysis, pp 214–220.
[7] Lamirel, J. C. (2002). Multisom: a multimap extension of the som model. Application to information discovery in an iconographic context. 3: pp 1790–1795.
[8] Lamirel, J.-C., Polanco, X., and Francois, C. (2001). Using artificial neural networks for mapping of science and technology: A multi self-organizing maps approach. Scientometrics, 51: pp 267–292.
[9] Lamirel, J. C. and Shehabi, S. (2006). Multisom: a multimap extension of the som model. Application to information discovery in an iconographic context. IEEE Conference Publications, pp 42–54.
[10] Lamirel, J. C., Shehabi, S., Hoffmann, and Francois, C. (2003). Intelligent patent analysis through the use of a neural network: experiment of multi-viewpoint analysis with the multisom model. pp 7–23.
[11] Martinetz, T. and Schulten, K. (1991). A neural-gas network learns topologies. In Artificial Neural Networks. North-Holland, Amsterdam, pp 397–402.
[12] Smith, T. (2009). Adapting to increasing data availability using multi-layered self-organizing maps. IEEE Computer Society, pp 108–113.
[13] Zhang, T., Ramakrishna, R., and Livny, M. (1996). Birch: An efficient data clustering method for very large databases. pp 103–114.