Robust Fuzzy n-Means Clustering
A Research Paper
Presented to
the Faculty of the Division of Mathematical Sciences
Midwestern State University
In Partial Fulfillment
of the Requirements of the Degree
Master of Science
by
Thomas G. Aranda
October 2000
Abstract
Clustering is a data segmentation method with a wide range of applications including
pattern recognition, document classification and data mining. This paper focuses on the problem
of unsupervised clustering when the optimal number of clusters is not known. This paper
presents an algorithm that can determine the ideal number of clusters and be robust to the
influence of outliers. A modification of the Robust Fuzzy c-Means Clustering Algorithm
(RFCM) was developed. This modification retains the robustness (ability to ignore outliers) of
RFCM, yet it does not increase the complexity of the algorithm. A Robust Fuzzy n-Means
Clustering Algorithm (RFNM) is presented. This method produces a good partition without a
priori knowledge of the optimal number of clusters.
Introduction
This research is motivated by the requirement to segment large images in real time
without prior knowledge of the image’s structure. The ultimate goal is to identify and classify
sections of data into categories. For example, given an aerial photograph, the computer should
be able to distinguish between grass, concrete, water and asphalt. One technique for segmenting
data in this way is called clustering.
Many methods for clustering are in use, including Validity-Guided Clustering [1] and c-
Means Clustering [4]. However, these algorithms have two drawbacks. First, they are very
susceptible to the presence of outliers in some data sets. Consequently, they do not identify the
clusters properly. Some algorithms solve this problem by using robust centering statistics.
Second, these algorithms require the user to input the desired number of clusters. Oftentimes,
the correct number of clusters is not known prior to execution. Therefore, it would be beneficial
to develop an algorithm that does not require such knowledge. This paper presents such an
algorithm.
Data Segmentation via Clustering
Background on Clustering
The classification of objects into categories is the subject of cluster analysis. It plays a
large role in pattern recognition. However, it has many other applications such as the
classification of documents in a database, the development of social demographics, data mining
and the construction of taxonomies in biology.
Ultimately, clustering attempts to identify groups of similar data. Given a set of data X,
the problem of clustering is to find several cluster centers that properly characterize relevant
classes of X [10]. For example, a good clustering of an image by color would identify the
various shades of red as one cluster, the blues as another cluster, etc. A clustering of a 3D set of
points over a Euclidean space would find groups (clusters) of points that are close together.
After the cluster centers are identified, the data set X is partitioned by labeling each data element
with the exemplar (cluster center) closest to it.
In 1967 Ball and Hall introduced the ISODATA process [2]. This technique, which is
also called Hard c-Means Clustering (HCM), is one of the most popular clustering methods [4].
However, the user is required to input the desired number of clusters. It uses an alternating
optimization (AO) technique to minimize an objective function. The definition of the objective
function and the AO technique can be found in [7]. One problem with HCM is that it tends to
get caught in local minima [7]. In other words, it does not find the global minimum of the
objective function and therefore does not properly identify the cluster centers.
Zadeh introduced fuzzy set theory in 1965 as a way to represent the vagueness of
everyday life [3]. In a nutshell, fuzzy set theory allows data elements to belong to a set in
varying degrees. Each element has a membership value $u \in [0, 1]$ that represents the degree to
which the data element belongs to that set. In other words, data elements can have a partial
membership in a set. This fuzziness allows one to mathematically represent vague concepts such
as “pretty soon” or “very far.”
Dunn applied fuzzy set theory to the ISODATA clustering process in 1973 [7]. His
method, called Fuzzy c-Means Clustering (FCM), allows data elements to belong to several
clusters in varying degrees. For example, a data element can have a 30% membership in one
cluster and a 70% membership in a second cluster, instead of discretely belonging to one cluster
or the other. Consider the clustering by color example: a dark violet could partially belong to the
red cluster and partially belong to the blue cluster.
Fuzzy c-Means Clustering (FCM) uses an alternating optimization (AO) technique that is
very similar to HCM. After the algorithm finishes execution and the cluster centers are
identified, the clusters are “defuzzified” by discretely assigning each data element to the cluster
in which it has the highest membership. If a light orange color had a 45% membership in red, a
52% membership in yellow and a 3% membership in blue, then the color would be assigned to
the yellow cluster. Experiments have shown that the fuzzy clustering method is less likely to be
trapped in a local minimum [7] and, therefore, avoids one disadvantage of HCM.
FCM typically produces better results than HCM, but it is susceptible to the influence of
outliers—extraneous data elements that are very far away from the cluster centers. Outliers may
be the result of errors in the data, or they could be real information: such as a highly reflective
piece of aluminum foil appearing in a radar image of a grass field. Regardless of what the
outliers are, their presence often disrupts the clustering process.
Kersten’s Fuzzy c-Medians Clustering Algorithm (FCMED), which uses the fuzzy
median as its centering statistic, is more robust than FCM [8]. In other words, it is more resistant
to the influence of outliers. However, its time complexity of $O(cpN \lg N)$ and space complexity
of $O(N)$ make it very slow [8]. Conversely, Choi and Krishnapuram's Robust Fuzzy c-Means
Algorithm (RFCM) solves the outlier problem in linear time [6]. Kersten's implementation of
RFCM uses Huber’s weighting functions to reduce the influence of outliers [9]. Experiments
have shown RFCM to be very robust.
One disadvantage of RFCM is that it requires the user to input the correct number of
clusters. Oftentimes the user does not know enough about the structure of the data to provide
such information. This is especially true in data mining applications. The research described in
this paper developed a new algorithm, Robust Fuzzy n-Means (RFNM), which is robust to
outliers and capable of determining the proper number of clusters. This algorithm is a
modification of FCM and RFCM. In order to provide the reader with a complete understanding
of the new RFNM algorithm, this paper will describe its parent algorithms in detail.
Fuzzy c-Means Clustering
Fuzzy c-means clustering (FCM) is defined well by [4]. Consider N data samples
forming the data set denoted by $X = \{x_1, x_2, \ldots, x_N\}$. Assume there are c clusters and
$u_{ik} = u_i(x_k) \in [0, 1]$ is the membership of the k-th sample $x_k$ in the i-th cluster $v_i$, where
$v = \{v_1, v_2, \ldots, v_c\}$ is the set of exemplars (cluster centers) and U is the membership matrix.
Normally, a cluster center refers to an actual pattern in the data and an exemplar refers to a
pattern identified by the algorithm. However, these terms will be used interchangeably in this
paper. The membership value of each data element $x_k$ satisfies the requirement that

$$\sum_{i=1}^{c} u_{ik} = 1 \qquad (1)$$

for all $k \in \aleph_N$. In other words, all of a particular data element's membership values must add up
to one. In addition, each cluster must contain some, but not all, of the data points' membership.
Defined mathematically, this means that for every $i \in \aleph_c$

$$0 < \sum_{k=1}^{N} u_{ik} < N. \qquad (2)$$

The goal of the FCM algorithm is to minimize the objective function

$$J(U, v) = \sum_{k=1}^{N} \sum_{i=1}^{c} u_{ik}^{m_c}\, d_{ik}^{2}, \qquad (3)$$

where $d_{ik} = \|v_i - x_k\|_2$ (the Euclidean distance between the exemplar and the data element). The
power $m_c$ of the membership function is called the weighting exponent. It expresses the
“fuzziness” of the algorithm. Setting $m_c = 1$ and only allowing discrete membership values will
convert the fuzzy algorithm into traditional HCM [9].
The objective function (3) is the weighted square error of the exemplars. The closer data
elements are to their respective cluster centers, the lower the value of the function will be.
Furthermore, the number of exemplars c will have an effect on the value of $J(U, v)$. Increasing
the number of exemplars will lower the value of the objective function. In an extreme case, when
the number of clusters equals the number of data elements $(c = N)$, the objective function will go
to zero. Although using a large number of clusters will reduce the value of $J(U, v)$, it is more
important to choose a value of c that represents the actual number of clusters in the data.
Fuzzy c-Means Clustering is more effective than Hard c-Means because the objective
function is less likely to get caught in a local minimum [7]. Furthermore, it runs in $O(cN)$ time
and $O(c)$ space. However, it is susceptible to outliers [9]. The robust algorithm presented in the
next section addresses this problem.
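To make the alternating optimization concrete, the sketch below is a minimal NumPy rendering of plain FCM: the membership update of the form in equation (6) with $\rho$ replaced by the squared distance, followed by the membership-weighted center update. The function name, the random initialization, and the stopping test are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def fcm(X, c, m=2.0, eps=1e-4, max_iter=100, seed=0):
    """Minimal Fuzzy c-Means sketch: alternate membership and center updates."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    centers = X[rng.choice(N, size=c, replace=False)]   # initial exemplars drawn from the data
    for _ in range(max_iter):
        # Distances from every sample to every exemplar, shape (N, c).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                            # avoid division by zero
        # Membership update: u_ik = [sum_j (d_ik / d_jk)^(2/(m-1))]^(-1), rows sum to one.
        ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
        U = 1.0 / ratio.sum(axis=2)
        # Center update: membership-weighted mean of the samples.
        W = U ** m
        new_centers = (W.T @ X) / W.sum(axis=0)[:, None]
        if np.max(np.linalg.norm(new_centers - centers, axis=1)) <= eps:
            centers = new_centers
            break
        centers = new_centers
    return centers, U
```

Defuzzification then amounts to assigning each sample to the cluster given by `U.argmax(axis=1)`.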
Robust Fuzzy c-Means Clustering
Real world data sets often contain outliers. These extraneous data elements are usually
very far away from the larger cluster centers. Consider a data set with two large well-defined
clusters and one small outlying cluster that is very far away from the other two. Due to the $d_{ik}^{2}$
term in $J(U, v)$ (3), the distance of a data point from its exemplar will have a quadratic effect on
the value of the objective function. Since FCM attempts to minimize $J(U, v)$, it will attempt to
reduce the impact of the outliers' large $d_{ik}^{2}$ values by placing an exemplar over the outlying
cluster. This minimizes the objective function, but does not correctly identify the larger cluster
centers.
Kersten’s implementation of Choi and Krishnapuram’s Robust Fuzzy c-Means Clustering
Algorithm (RFCM) takes steps to solve this problem [9]. Huber’s m-estimator is used to reduce
the influence of outliers. Huber’s function ρ is defined as:
$$\rho(x) = \begin{cases} \frac{1}{2}x^{2} & \text{if } |x| \le 1 \\ |x| - \frac{1}{2} & \text{if } |x| > 1 \end{cases} \qquad (4)$$

The $d_{ik}^{2}$ term is replaced with $\rho(d_{ik}/\gamma)$ where $\gamma$ is a scaling constant. As a result, the influence
of the distance between cluster centers and data elements is quadratic when the data element is
close to the exemplar and linear when the data element is far away from the exemplar. The
objective function to be minimized becomes:

$$J(U, v) = \sum_{k=1}^{N} \sum_{i=1}^{c} u_{ik}^{m_c}\, \rho(d_{ik}/\gamma). \qquad (5)$$

The membership values of each element are given by:

$$u_{ik} = \left[ \sum_{j=1}^{c} \left( \frac{\rho(d_{ik}/\gamma)}{\rho(d_{jk}/\gamma)} \right)^{\frac{1}{m_c - 1}} \right]^{-1}. \qquad (6)$$
Using this function, the membership of a data element $x_k$ in cluster $v_i$ is assigned in inverse
proportion to the distance between $x_k$ and $v_i$. In other words, the data element will have a larger
membership in clusters that are closer to it.
The center of a cluster is computed by determining the average value of all the points in
that cluster. Since a point’s membership in a cluster is fuzzy, the mean must be adjusted by the
membership values $u_{ik}$. Therefore, the locations of the exemplars are computed by using the
weighted mean given by:
$$v_i = \frac{\sum_{k=1}^{N} u_{ik}^{m_c}\, w(d_{ik}/\gamma)\, x_k}{\sum_{k=1}^{N} u_{ik}^{m_c}\, w(d_{ik}/\gamma)} \qquad (7)$$

where Huber's weighting function $w(x) = \rho'(x)/x$. In this case

$$w(x) = \begin{cases} 1 & \text{if } |x| \le 1 \\ 1/|x| & \text{if } |x| > 1 \end{cases}. \qquad (8)$$
Huber’s w function has the effect of reducing the influence of data points that are far away from
the cluster centers thereby making the algorithm robust to outliers.
In order for the ρ and w functions to work properly, all distances must be adjusted by a
scaling constant γ [9]. The experiments in this paper use the median absolute deviation about
the median (MAD) [11] to compute γ . The MAD is a robust estimator similar to the standard
deviation. All distances are divided by three times the MAD before Huber’s functions are
applied, i.e. $\gamma = 3 \cdot \mathrm{MAD}$. As a result, when $\rho$ is applied, data points have a quadratic influence
when they are $3 \cdot \mathrm{MAD}$ or less from the exemplar and linear influence when they are greater than
$3 \cdot \mathrm{MAD}$ away. One should note that computing the MAD takes $O(N \lg N)$ time (on average)
and $O(N)$ space. The normalization of the data using an estimator like the MAD is crucial to
making the algorithm run properly.
Except for the calculation of the scaling constant and the application of Huber’s
functions, RFCM is identical to FCM. However, RFCM is not as susceptible to the influence of
outliers [9].
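As a sketch of the robust machinery above, assuming NumPy, the helpers below implement Huber's $\rho$ and $w$ functions (equations (4) and (8)), one reading of the $3 \cdot \mathrm{MAD}$ scaling constant, and the robust weighted-mean center update of equation (7). The function names and the choice to take the MAD over the current distance matrix are assumptions made for illustration.

```python
import numpy as np

def huber_rho(x):
    """Huber's rho, equation (4): quadratic near zero, linear in the tails."""
    ax = np.abs(x)
    return np.where(ax <= 1.0, 0.5 * x ** 2, ax - 0.5)

def huber_w(x):
    """Huber's weight w(x) = rho'(x)/x, equation (8)."""
    ax = np.abs(x)
    return np.where(ax <= 1.0, 1.0, 1.0 / np.fmax(ax, 1e-12))

def mad_scale(d):
    """Scaling constant gamma = 3 * MAD, here taken over the distance values."""
    med = np.median(d)
    return 3.0 * np.median(np.abs(d - med))

def rfcm_centers(X, U, d, gamma, m=1.75):
    """Robust weighted-mean center update, equation (7); U and d have shape (N, c)."""
    W = (U ** m) * huber_w(d / gamma)
    return (W.T @ X) / W.sum(axis=0)[:, None]
```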
Determining the Number of Clusters
Robust Fuzzy n-Means Clustering
One problem with RFCM is that the user must input the desired number of clusters.
Quite often the optimal number of clusters is not known prior to execution. The Robust Fuzzy n-
Means Algorithm (RFNM) presented in this paper retains the robustness of RFCM, yet does not
require a priori knowledge of the proper number of clusters.
RFNM requires the user to provide a maximum number $c_m$ of clusters. The algorithm
begins by executing the RFCM algorithm with $c_m$ clusters. During every iteration cluster
centers that are close together are considered for merging. Several methods for merging have
been explored, including Validity-Guided Clustering described in [1] and Competitive Clustering
described in [5]. However, the merging criteria should be robust and efficient.
Merging Criterion
If two clusters are “close” together they should be merged. Two clusters are close if the
distance between their centers is small compared to their compactness. The notion of
compactness [12] is the weighted mean square deviation of the cluster. It can be thought of as
the average “radius” squared. The compactness of a cluster is defined in terms of its variation
and cardinality.
The variation of a cluster is a measure of the cluster’s dispersion. One can think of it as
the fuzzy variance. Formally, the variation is defined by [12]:
$$\sigma_i = \sum_{k=1}^{N} u_{ik}^{m_c}\, d_{ik}^{2}. \qquad (9)$$
The fuzzy cardinality of a cluster is a measure of the cluster’s size. The more data
elements that belong to the cluster the larger the cluster’s cardinality will be. Often, the fuzzy
cardinality is used as a divisor when calculating the fuzzy mean. Formally the fuzzy cardinality
is defined by [12]:
$$n_i = \sum_{k=1}^{N} u_{ik}. \qquad (10)$$
The compactness of a cluster is the ratio of its variation and cardinality [12]:

$$\pi_i = \frac{\sigma_i}{n_i}. \qquad (11)$$

To make the compactness formula robust to outliers Huber's $\rho$ function (4) is inserted into the
equation. Finally, the cardinality of the cluster must take the weighting exponent $m_c$ into
account. Therefore, the robust compactness of a cluster $v_i$ is defined as

$$\pi_i = \frac{\sum_{k=1}^{N} u_{ik}^{m_c}\, \rho(d_{ik}/\gamma)}{\sum_{k=1}^{N} u_{ik}^{m_c}}. \qquad (12)$$
RFNM uses a modified version of separation [12] to measure how far clusters are apart.
Formally, the separation between two clusters $v_q$ and $v_r$ is defined as the Euclidean distance
between the clusters' centers:

$$s_{qr} = \|v_q - v_r\|_2. \qquad (13)$$

The merging criterion uses a merge ratio, which is similar to the validity index defined in
[12]. The merge ratio will be small when exemplars are close together relative to their
compactness. Formally, it is the ratio of the separation squared over the compactness:

$$\omega_{qr} = \frac{s_{qr}^{2}}{\pi_q}. \qquad (14)$$

Once again, to make the formula robust Huber's function is substituted:

$$\omega_{qr} = \frac{\rho(s_{qr}/\gamma)}{\pi_q}. \qquad (15)$$
During every iteration of RFCM, the merge ratio $\omega_{qr}$ is calculated for every cluster
$v_q \in v$ and $v_r \in v$. If $\omega_{qr} \le \alpha$, where $\alpha$ is some constant, then the clusters centered at $v_q$ and
$v_r$ are merged. Choosing a value of $\alpha < 1$ means that in order for two clusters to be merged, the
distance between the clusters' centers must be less than the compactness (radius) of the clusters.
Experimentally, values of $\alpha \in [0.1, 0.3]$ work well.
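The fragment below shows one way to evaluate the robust compactness of equation (12) and the merge ratio of equation (15) for a pair of exemplars, reusing the hypothetical `huber_rho` helper from the earlier sketch; `U` and `d` are the membership and distance arrays of shape (N, c) and the variable names are illustrative.

```python
import numpy as np

def robust_compactness(U, d, gamma, i, m=1.75):
    """Robust compactness pi_i of cluster i, equation (12)."""
    w = U[:, i] ** m
    return np.sum(w * huber_rho(d[:, i] / gamma)) / np.sum(w)

def merge_ratio(centers, U, d, gamma, q, r, m=1.75):
    """Robust merge ratio omega_qr, equation (15)."""
    s_qr = np.linalg.norm(centers[q] - centers[r])      # separation, equation (13)
    return huber_rho(s_qr / gamma) / robust_compactness(U, d, gamma, q, m)

# Clusters q and r are merged when merge_ratio(...) <= alpha
# (the paper reports values of alpha between 0.1 and 0.3 working well).
```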
Merging Mechanics
Once the decision is made to join two clusters, they must be combined in a meaningful
way. The new exemplar should exist on a line segment that runs between the two old exemplars.
The new center will be placed closer to the cluster with the larger fuzzy cardinality.
The placement of the new exemplar is accomplished by using a parameter p:
$$p = \frac{n_q}{n_q + n_r} \qquad (16)$$

where $n_q$ and $n_r$ are the fuzzy cardinalities (10) of the two clusters to be merged and $v_q$ and $v_r$
are their centers. The location of the new exemplar is calculated using a combination formula:

$$v_n = p\,v_q + (1 - p)\,v_r \qquad (17)$$

where $v_n$ is the center of the new cluster. The old exemplars are removed from v and replaced
with the new center $v_n$. The next iteration of the algorithm will compute the membership values
of X in the new cluster.
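A one-function sketch of the merge itself, following equations (16) and (17); the fuzzy cardinalities `n_q` and `n_r` are assumed to have been computed from equation (10), and the name is hypothetical.

```python
def merge_exemplars(v_q, v_r, n_q, n_r):
    """Combine two exemplars into one, equations (16) and (17)."""
    p = n_q / (n_q + n_r)               # equation (16): weight toward the larger cluster
    return p * v_q + (1.0 - p) * v_r    # equation (17): new exemplar on the connecting segment
```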
The RFNM Algorithm
The Robust Fuzzy n-Means algorithm is based on the FCM algorithm described in [10].
It uses an alternating optimization (AO) technique for minimizing the objective function (5).
FCM has been modified to be robust and unsupervised. The $c_m$ exemplars begin at locations
determined by the user. During execution, these exemplars gravitate toward the data set's "true"
cluster centers. Some exemplars may merge along the way. Ideally, the algorithm will terminate
with exactly one exemplar positioned near the center of each cluster. The user provides the
following input:

$c_m \in \aleph_\infty$: initial (maximum) number of clusters
$m_c \in [1, \infty)$: weighting exponent
$\alpha \in (0, 1)$: merging criterion constant
$\epsilon \in (0, \infty)$: stopping constant (small positive number)
$\gamma \in (0, \infty)$: scaling constant (example: three times the MAD)
$v = \{v_1, v_2, \ldots, v_{c_m}\}$: initial placement of the exemplars (cluster centers)

Algorithm: $\mathrm{RFNM}(c_m, m_c, \alpha, \epsilon, \gamma, v)$

Step 1. Let $c = c_m$.

Step 2. Let $s_1, s_2, \ldots, s_c$ equal $v_1, v_2, \ldots, v_c$ respectively.

Step 3. Calculate the new membership matrix U by the following procedure: for each
$x_k \in X$, if $\|x_k - v_i\|^2 > 0$ for all $i \in \aleph_c$, then compute $u_{ik}$ using equation (6). If
$\|x_k - v_i\|^2 = 0$ for some $i \in I \subseteq \aleph_c$, then define $u_{ik}$ for $i \in I$ by any nonnegative
real numbers satisfying $\sum_{i \in I} u_{ik} = 1$ and define $u_{ik} = 0$ for $i \in \aleph_c - I$.

Step 4. Merge clusters that are close together. For every $v_q \in v$ and $v_r \in v$ and $q \ne r$
do the following: calculate $\omega_{qr}$ using equation (15); if $\omega_{qr} \le \alpha$ then compute
$v_n$ using equations (16) and (17); let $v_q = v_n$; remove $v_r$ from v and decrement
c by 1. NOTE: Any cluster can only be merged once per iteration.

Step 5. Calculate the c cluster centers $v_1, v_2, \ldots, v_c$ using equation (7) and the given value
of $m_c$.

Step 6. If a merge took place in Step 4 then return to Step 2. Otherwise if
$\max_{i \in \aleph_c} \|v_i - s_i\| \le \epsilon$ then stop. Otherwise, return to Step 2.
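Putting the steps together, the fragment below is an illustrative, simplified Python rendering of this loop, reusing the hypothetical helpers from the earlier sketches (`mad_scale`, `huber_rho`, `rfcm_centers`, `merge_ratio`, `merge_exemplars`). It omits the zero-distance special case of Step 3 and the once-per-iteration merge restriction of Step 4, so it is a reading aid rather than a faithful implementation of the algorithm as specified.

```python
import numpy as np

def rfnm(X, centers, m=1.75, alpha=0.3, eps=1e-4, gamma=None, max_iter=100):
    centers = np.array(centers, dtype=float)
    for _ in range(max_iter):
        prev = centers.copy()                                    # Step 2: remember old exemplars
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        if gamma is None:
            gamma = mad_scale(d)                                 # 3 * MAD scaling, fixed once here
        rho = np.fmax(huber_rho(d / gamma), 1e-12)
        # Step 3: memberships from equation (6).
        U = 1.0 / ((rho[:, :, None] / rho[:, None, :]) ** (1.0 / (m - 1.0))).sum(axis=2)
        merged = False                                           # Step 4: pairwise merge check
        q = 0
        while q < len(centers):
            r = q + 1
            while r < len(centers):
                if merge_ratio(centers, U, d, gamma, q, r, m) <= alpha:
                    n = U.sum(axis=0)                            # fuzzy cardinalities, eq. (10)
                    centers[q] = merge_exemplars(centers[q], centers[r], n[q], n[r])
                    centers = np.delete(centers, r, axis=0)
                    U = np.delete(U, r, axis=1)
                    d = np.delete(d, r, axis=1)
                    merged = True
                else:
                    r += 1
            q += 1
        centers = rfcm_centers(X, U, d, gamma, m)                # Step 5: robust center update, eq. (7)
        if not merged and len(centers) == len(prev):             # Step 6: stop when exemplars settle
            if np.max(np.linalg.norm(centers - prev, axis=1)) <= eps:
                break
    return centers
```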
On average, this algorithm has linear time complexity. Steps 2, 5 and 6 have a total
maximum running time of $c_m(a + b + c)$ where a, b and c are constants. The maximum running
time of Step 3 is $k \cdot c_m \cdot N$ and Step 4 will run in $l \cdot c_m^{2}$ time (worst case), where k and l are
constants. Thus, the total running time of this algorithm has an upper bound of
$t\left[l \cdot c_m^{2} + k \cdot c_m \cdot N + c_m(a + b + c)\right]$ where t is the number of iterations of the algorithm. In most
cases the size of the data set N will be significantly larger than $c_m$ and t. Therefore, the N term
will overwhelm the $c_m^{2}$ and t terms yielding a running time complexity of $O(c_m N)$.
The memory overhead of this algorithm is also linear. Storing the exemplar vectors v and
s requires $O(c_m)$ space in the worst case. The membership matrix U requires $O(c_m \cdot N)$ space.
The memory required for the data set X is not considered because it is not overhead. The total
memory overhead is $O(c_m N)$. Oftentimes, the data set to be clustered is very large. Therefore,
storing something of size N will cost much memory. However, clever coding and some slight
modifications will allow the algorithm to run in $O(c_m)$ space. The following modifications
compute the membership matrix U, exemplars v and the merge ratio $\omega$ on the fly without storing
U in memory:
Algorithm: $\mathrm{FastRFNM}(c_m, m_c, \alpha, \epsilon, \gamma, v)$

Step 1. Let $c = c_m$.

Step 2. Let $s_1, s_2, \ldots, s_c$ equal $v_1, v_2, \ldots, v_c$ respectively.

Step 3. Calculate, but do not store, the new membership matrix U by the following
procedure: for each $x_k \in X$, if $\|x_k - v_i\|^2 > 0$ for all $i \in \aleph_c$, then compute $u_{ik}$
using equation (6). If $\|x_k - v_i\|^2 = 0$ for some $i \in I \subseteq \aleph_c$, then define $u_{ik}$ for
$i \in I$ by any nonnegative real numbers satisfying $\sum_{i \in I} u_{ik} = 1$ and define $u_{ik} = 0$
for $i \in \aleph_c - I$. Simultaneously, calculate and keep the running sums used in
equations (7), (15) and (16).

Step 4. For every $v_q \in v$ and $v_r \in v$ and $q \ne r$ do the following: calculate $\omega_{qr}$ using
equation (15) and the running sums from Step 3; if $\omega_{qr} \le \alpha$ then compute $v_n$
using equations (16) and (17) and the running sum from Step 3; let $v_q = v_n$;
remove $v_r$ from v and decrement c by 1. If any two clusters were merged, then
return to Step 3.

Step 5. Calculate the c cluster centers $v_1, v_2, \ldots, v_c$ using equation (7), the given value of
$m_c$ and the running sums from Step 3.

Step 6. If a merge took place in Step 4 then return to Step 2. Otherwise if
$\max_{i \in \aleph_c} \|v_i - s_i\| \le \epsilon$ then stop. Otherwise, return to Step 2.
This “fast” version of RFNM actually has the same time complexity as the normal
algorithm. However, it has a much lower memory complexity. Since it does not store the
membership matrix U, the N term can be dropped from the space complexity. Consequently, the
memory overhead is $O(c_m)$, which represents only storing the exemplar vectors. On average,
this “fast” algorithm will execute quicker because its lower memory overhead reduces the risk of
page faults. RFNM provides robust unsupervised learning with a linear running time and low
memory overhead. This makes it ideally suited for real-time data processing applications.
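To illustrate the memory-saving idea, the fragment below accumulates the running sums needed for the center update of equation (7) in a single pass over the data, so the membership matrix U never has to be stored and only per-exemplar state is kept. The streaming structure is the point of the sketch; the helper names (`huber_rho`, `huber_w`) are the same assumed ones as before.

```python
import numpy as np

def streaming_center_sums(X, centers, gamma, m=1.75):
    """Accumulate the numerator and denominator of equation (7) without storing U."""
    c, dims = centers.shape
    num = np.zeros((c, dims))
    den = np.zeros(c)
    for x in X:                                  # one pass over the data, O(c_m) working memory
        d = np.linalg.norm(x - centers, axis=1)
        rho = np.fmax(huber_rho(d / gamma), 1e-12)
        u = 1.0 / ((rho[:, None] / rho[None, :]) ** (1.0 / (m - 1.0))).sum(axis=1)  # eq. (6)
        w = (u ** m) * huber_w(d / gamma)
        num += w[:, None] * x
        den += w
    return num, den              # the new centers are num / den[:, None]
```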
Testing
Exemplar Placement (Gaussian Tests 1 and 2)
The RFNM algorithm described in the previous section was tested using a five-
dimensional Gaussian scatter of random data. The test data has two cluster centers equidistant
from the origin. The first two tests start with six exemplars $(c_m = 6)$, $m_c = 1.75$ and $\alpha = 0.3$.
The positioning of the initial exemplars is critical. Figure 1 shows a 2D plot of the movement of
the exemplars.
Figure 1. Gaussian Test 1, $c_m = 6$.
The "true" cluster centers, which are computed using the sample mean, exist at
$(-1.0, 0, 0, 0, 0)$ and $(1.0, 0, 0, 0, 0)$. In Figure 1, two exemplars (labeled A and B) merge together at
point C and then converge to approximately $(-1.3, -0.1, 0, 0, 0)$. Two more exemplars (D and E)
merge together at F and converge to approximately $(1.4, 0.1, 0, 0, 0)$. The last two exemplars (G and H) were
initialized on the y-axis. They merge and converge very close to the origin (point I). The test
data is almost symmetric. Since the middle exemplars (G and H) started equidistant from two
nearly symmetric clusters, they were never drawn to one cluster or the other. In this case the
algorithm does not converge properly.
Figure 2. Gaussian Test 2, $c_m = 6$.
The second test uses the same data set and starting parameters except two of the initial
exemplars (Figure 2: D and G) are offset $+0.5$ along the x-axis. Figure 2 plots the movement of
the exemplars. Notice the algorithm converges to the desired two cluster centers $(\pm 1.0, 0, 0, 0, 0)$.
Exemplars A and B merge together into exemplar C, which then converges to the cluster center
at $(-1, 0, 0, 0, 0)$.
The exemplar trace on the right side of the y-axis is more interesting. Exemplars D and E
merge together and become F. Exemplar G then merges with F to become H. Finally, H and I
merge into exemplar J, which then converges to the cluster center at $(1, 0, 0, 0, 0)$. Since the
middle exemplars (D and G) were initialized slightly closer to the right-hand cluster, they were
drawn toward that cluster’s center. By comparing the results of test 1 and test 2, one can see that
the initial placement of the exemplars can change the results dramatically.
Robustness Testing (Cauchy Test 1)
A second set of two-dimensional test data was randomly generated using a Cauchy
distribution. This data set has two well-defined main clusters, but also has several outliers that
are very far away from the main clusters. The presence of the outliers obviously increases the
compactness of the clusters (makes them less compact). Consequently, the exemplars tend to
merge very quickly. To compensate for this, lower values of $m_c = 1.25$ and $\alpha = 0.2$ were
chosen. Additionally, the initial exemplars were started a little further away from the origin. To
reduce the influence of outliers, the sample median is used to determine the "true" cluster
centers: approximately $(\pm 1.3, 0)$. Figure 3 shows a trace of the six exemplars.
The merging sequence in Figure 3 is very similar to the previous test. Exemplars A and
B merge into C and converge to $(-1.8, 0)$. On the right side of the y-axis, exemplars D and E
merge into F. Next, G and F merge into H. Finally, H and I merge into J and converge to
$(1.5, 0)$. Notice, the algorithm converges near the two desired cluster centers of $(\pm 1.3, 0)$.
However, it does not converge exactly because the outliers still have some influence on the
exemplars. The total error is 0.7.
Figure 3. Cauchy Test 1 (RFNM), $c_m = 6$.
Additionally, the proper choice of α is very important. Choosing a merge ratio that is
too low $(\alpha < 0.1)$ will cause the exemplars to not merge. Conversely, setting the merge ratio too
high $(\alpha > 0.4)$ will cause all of the exemplars to merge together into one cluster center. In both
cases, the true cluster centers are never found. Thus, choosing a good merge ratio is crucial.
Robust n-Means vs. c-Means Clustering (Cauchy Test 1)
For comparison purposes, a standard RFCM algorithm $(\alpha = 0)$ was run on the same set of
data with the same parameters. Figure 4 shows the trace. Exemplar A converges to $(1.5, 0)$, and
exemplar B converges to $(-1.8, 0)$. In other words, the two exemplars in this example converge
to the same cluster centers as the exemplars in the previous test. Clearly, the RFNM algorithm
performs as well as RFCM.
Figure 4. Cauchy Test 1 (RFCM), $c = c_m = 2$.
The values of the objective functions (5) for both methods were plotted against time (see
Figure 5). Both methods converge to the same value $(\approx 830)$ within the same number of
iterations. Notice the RFNM algorithm (left), with an initial $c_m = 6$ and final $c = 2$, yields an
increasing value of $J(U, v)$. This is because reducing the number of clusters actually causes an
increase in the objective function. However, RFNM still reaches the optimal solution without
requiring the user to input the desired number of clusters.
Catching Outliers (Cauchy Test 2)
The previous examples start with the exemplars near the cluster centers. In the next
example, eight exemplars $(c_m = 8)$ begin far away from the origin. Once again, the Cauchy data
set is used. Figure 6 shows a close-up of the exemplar trace. Exemplars A and B merge into C.
At the same time exemplars D and E merge into F. Finally, C and F merge into G and converge
to $(-1.5, 0)$. Exemplar H converges to $(1.5, 0)$ without merging. The total error is 0.4, and the
final number of clusters is $c = 5$.
Figure 5. RFNM vs. RFCM (Cauchy Test 1).
Figure 7 shows a trace of the same test on a larger scale. One can see exemplars A, B, D,
E and H move toward the main clusters, merge and converge to the cluster centers (see Figure 6).
Furthermore, exemplars I, J and K converge near clusters of outliers: approximately located at
$(-10, 1)$, $(0, 25)$ and $(-1, -17)$ respectively. These outlying clusters have very low fuzzy
cardinalities (less than 10% of the main clusters). In the final analysis, one could classify
exemplars with low cardinalities as clusters of outliers. Depending on the application, it may be
useful to discover outlying clusters. Otherwise, they can be ignored and removed from the final
partition.
The use of Huber's functions in RFNM reduces the influence of outliers, but it does not
eliminate their influence. However, notice that the exemplars in Cauchy test 2 (Figure 6) are
closer to the desired centers of $(\pm 1.3, 0)$ than the exemplars in Cauchy test 1 (Figure 3). In fact,
test 2 yields an improvement of 0.3 in total error over test 1. This is because the second test placed
exemplars near the outliers (see Figure 7), which has the effect of reducing the influence of those
outliers on the two main clusters. As a result, the true centers of the main clusters are identified
with greater accuracy. Furthermore, the final value of the objective function is lower:
approximately 611 as opposed to 830 from test 1. Of course, the larger number of exemplars
$(c = 5)$ in test 2 accounts for much of this decrease. Figure 8 shows a plot of the objective
function. A good initial placement of the exemplars will improve the robustness of the algorithm
and yield better results.

Figure 6. Cauchy Test 2 (Zoomed-In), $c_m = 8$.

Figure 7. Cauchy Test 2 (Zoomed Out), $c_m = 8$.
Conclusion
The goal of this paper is to provide a robust algorithm that will find an optimal partition
without knowing the proper number of clusters. Ideally, the user should be able to partition a
data set without any a priori knowledge of the data’s structure. Robust Fuzzy n-Means
Clustering provides a good start toward this goal.
Experiments with Gaussian data have demonstrated that RFNM can accurately find the
desired number of clusters and their centers. Furthermore, the first Cauchy test has shown that
RFNM provides results which are identical to the results reached by RFCM. Thus, RFNM is as
accurate and robust as RFCM, yet it does not increase the time complexity. Finally, clever
initialization of the exemplars allows RFNM to identify outlying clusters (Cauchy test 2). This
in turn improves the accuracy of the final results. Clearly, RFNM provides robust accurate
results without requiring prior knowledge of the data’s structure.
Figure 8. Cauchy Test 2 (Objective Function).
Although RFNM is an improvement over other algorithms, it does have some
shortcomings. First, it is not completely unsupervised because the user's choices of $m_c$ and $\alpha$
will have significant effects on the results. Data sets with outliers, for example, require lower
values of $m_c$ and $\alpha$ than sets with compact well-separated clusters. Future research should
examine ways of preprocessing the target data in order to determine the ideal clustering
parameters so that the entire process can be fully automated.
Second, the initial positioning of the exemplars is crucial to getting optimal results.
Placing the exemplars exactly between two cluster centers, for example, may cause those
exemplars to not converge. One possible solution is to place the initial exemplars very far away
from the cluster centers. This will allow the exemplars to compete equally for cardinality. In
other words, one exemplar will not have an advantage simply because it was initially placed
close to a cluster of points. However, if the exemplars are initialized too far away from the
cluster centers, then the main clusters and the outliers will have equal influence. As a result, the
exemplars may skip over the outliers altogether. Determining an automated, yet reliable way of
initializing the exemplars would be very beneficial. Future research in this area should be
considered.
Third, the preprocessing requirements of RFNM can be costly. The experiments in this
paper use the MAD to compute the scaling constant $\gamma$. This operation takes $O(N \lg N)$ time and
uses $O(N)$ space. Research into more efficient preprocessing techniques may be useful.
Robust Fuzzy n-Means Clustering has a wide range of applications in image and data
processing. It requires less user supervision than many other algorithms, but it is not completely
unsupervised. However, in several situations the RFNM algorithm provides a good solution in
linear time.
References
1. Bensaid, A., Hall, L., Bezdek, J., Clarke L., Silbiger, M., Arrington, J. and Murtagh, R.,
Validity-guided (re)clustering with applications to image segmentation, IEEE Transactions
on Fuzzy Systems, vol. 4, no. 2 (May 1996), 112-123.
2. Bezdek, J., A convergence theorem for the fuzzy ISODATA clustering algorithms, IEEE
Transactions on Pattern Analysis and Machine Intelligence, vol. 2, no. 1 (January 1980), 1-8.
3. Bezdek, J., Fuzzy models—what are they and why? IEEE Transactions on Fuzzy Systems,
vol. 1, no. 1 (February 1993), 1-5.
4. Bezdek, J. and Pal, S. Fuzzy Models for Pattern Recognition: Methods That Search for
Structures in Data, IEEE Press, New York, NY, 1992.
5. Boujemaa, N., Generalized competitive clustering for image segmentation, In Proceedings of
the 19th
International Conference of the North American Fuzzy Information Processing
Society – NAFIPS (July 13-15, 2000, Atlanta, GA), NAFIPS/IEEE, 2000, 133-137.
6. Choi, Y. and Krishnapuram, R., Fuzzy and robust formulations of maximum-likelihood-
based Gaussian mixture decomposition, In Proceedings of the Fifth IEEE International
Conference on Fuzzy Systems (September 8-11, 1996, New Orleans, LA), IEEE Neural
Networks Council, 1996, 1899-1905.
7. Dunn, J., A fuzzy relative of the ISODATA process and its use in detecting compact well-
separated clusters, Journal of Cybernetics, vol. 3, no. 3 (1973) 32-57.
8. Kersten, P., Fuzzy order statistics and their application to fuzzy clustering, IEEE
Transactions on Fuzzy Systems, vol. 7, no. 6 (December 1999) 708-712.
9. Kersten, P., Lee, R., Verdi, J., Carvalho, R. and Yankovich, S., Segmenting SAR images using
fuzzy clustering, In Proceedings of the 19th International Conference of the North American
Fuzzy Information Processing Society – NAFIPS (July 13-15, 2000, Atlanta, GA),
NAFIPS/IEEE, 2000, 105-108.
10. Klir, G. and Yuan, B. Fuzzy Sets and Fuzzy Logic: Theory and Applications, Prentice Hall
PTR, Upper Saddle River, NJ, 1995.
11. Randles, R. and Wolfe, D., Introduction to The Theory of Nonparametric Statistics, John
Wiley & Sons, Inc., New York, NY, 1979.
12. Xie, X.L. and Beni, G., A validity measure for fuzzy clustering, IEEE Transactions on
Pattern Analysis and Machine Intelligence, vol. 13, no. 8 (August 1991), 841-847.
Weitere ähnliche Inhalte

Was ist angesagt?

A Novel Dencos Model For High Dimensional Data Using Genetic Algorithms
A Novel Dencos Model For High Dimensional Data Using Genetic Algorithms A Novel Dencos Model For High Dimensional Data Using Genetic Algorithms
A Novel Dencos Model For High Dimensional Data Using Genetic Algorithms
ijcseit
 
NBDT : Neural-backed Decision Tree 2021 ICLR
 NBDT : Neural-backed Decision Tree 2021 ICLR NBDT : Neural-backed Decision Tree 2021 ICLR
NBDT : Neural-backed Decision Tree 2021 ICLR
taeseon ryu
 
CLIQUE Automatic subspace clustering of high dimensional data for data mining...
CLIQUE Automatic subspace clustering of high dimensional data for data mining...CLIQUE Automatic subspace clustering of high dimensional data for data mining...
CLIQUE Automatic subspace clustering of high dimensional data for data mining...
Raed Aldahdooh
 

Was ist angesagt? (18)

Chapter 11 cluster advanced : web and text mining
Chapter 11 cluster advanced : web and text miningChapter 11 cluster advanced : web and text mining
Chapter 11 cluster advanced : web and text mining
 
Cure, Clustering Algorithm
Cure, Clustering AlgorithmCure, Clustering Algorithm
Cure, Clustering Algorithm
 
Locally consistent concept factorization for
Locally consistent concept factorization forLocally consistent concept factorization for
Locally consistent concept factorization for
 
Capter10 cluster basic
Capter10 cluster basicCapter10 cluster basic
Capter10 cluster basic
 
Machine Learning Algorithms for Image Classification of Hand Digits and Face ...
Machine Learning Algorithms for Image Classification of Hand Digits and Face ...Machine Learning Algorithms for Image Classification of Hand Digits and Face ...
Machine Learning Algorithms for Image Classification of Hand Digits and Face ...
 
A Novel Dencos Model For High Dimensional Data Using Genetic Algorithms
A Novel Dencos Model For High Dimensional Data Using Genetic Algorithms A Novel Dencos Model For High Dimensional Data Using Genetic Algorithms
A Novel Dencos Model For High Dimensional Data Using Genetic Algorithms
 
50120140505013
5012014050501350120140505013
50120140505013
 
New Approach for K-mean and K-medoids Algorithm
New Approach for K-mean and K-medoids AlgorithmNew Approach for K-mean and K-medoids Algorithm
New Approach for K-mean and K-medoids Algorithm
 
Comparison Between Clustering Algorithms for Microarray Data Analysis
Comparison Between Clustering Algorithms for Microarray Data AnalysisComparison Between Clustering Algorithms for Microarray Data Analysis
Comparison Between Clustering Algorithms for Microarray Data Analysis
 
NBDT : Neural-backed Decision Tree 2021 ICLR
 NBDT : Neural-backed Decision Tree 2021 ICLR NBDT : Neural-backed Decision Tree 2021 ICLR
NBDT : Neural-backed Decision Tree 2021 ICLR
 
ON DISTRIBUTED FUZZY DECISION TREES FOR BIG DATA
 ON DISTRIBUTED FUZZY DECISION TREES FOR BIG DATA ON DISTRIBUTED FUZZY DECISION TREES FOR BIG DATA
ON DISTRIBUTED FUZZY DECISION TREES FOR BIG DATA
 
CLIQUE Automatic subspace clustering of high dimensional data for data mining...
CLIQUE Automatic subspace clustering of high dimensional data for data mining...CLIQUE Automatic subspace clustering of high dimensional data for data mining...
CLIQUE Automatic subspace clustering of high dimensional data for data mining...
 
K-MEDOIDS CLUSTERING USING PARTITIONING AROUND MEDOIDS FOR PERFORMING FACE R...
K-MEDOIDS CLUSTERING  USING PARTITIONING AROUND MEDOIDS FOR PERFORMING FACE R...K-MEDOIDS CLUSTERING  USING PARTITIONING AROUND MEDOIDS FOR PERFORMING FACE R...
K-MEDOIDS CLUSTERING USING PARTITIONING AROUND MEDOIDS FOR PERFORMING FACE R...
 
Introduction to Clustering algorithm
Introduction to Clustering algorithmIntroduction to Clustering algorithm
Introduction to Clustering algorithm
 
A COMPREHENSIVE ANALYSIS OF QUANTUM CLUSTERING : FINDING ALL THE POTENTIAL MI...
A COMPREHENSIVE ANALYSIS OF QUANTUM CLUSTERING : FINDING ALL THE POTENTIAL MI...A COMPREHENSIVE ANALYSIS OF QUANTUM CLUSTERING : FINDING ALL THE POTENTIAL MI...
A COMPREHENSIVE ANALYSIS OF QUANTUM CLUSTERING : FINDING ALL THE POTENTIAL MI...
 
A Novel Approach to Mathematical Concepts in Data Mining
A Novel Approach to Mathematical Concepts in Data MiningA Novel Approach to Mathematical Concepts in Data Mining
A Novel Approach to Mathematical Concepts in Data Mining
 
Cluster analysis
Cluster analysisCluster analysis
Cluster analysis
 
11 clusadvanced
11 clusadvanced11 clusadvanced
11 clusadvanced
 

Andere mochten auch (14)

How to Create a Mobile Strategy
How to Create a Mobile StrategyHow to Create a Mobile Strategy
How to Create a Mobile Strategy
 
SUMMER15UVC12
SUMMER15UVC12SUMMER15UVC12
SUMMER15UVC12
 
1990 2010
1990 20101990 2010
1990 2010
 
Historical Events on Christmas
Historical Events on ChristmasHistorical Events on Christmas
Historical Events on Christmas
 
1990's
1990's1990's
1990's
 
Movies during the 1980’s
Movies during the 1980’sMovies during the 1980’s
Movies during the 1980’s
 
Artifacts to use for interior design
Artifacts to use for interior designArtifacts to use for interior design
Artifacts to use for interior design
 
Session 1 - Introduction to iOS 7 and SDK
Session 1 -  Introduction to iOS 7 and SDKSession 1 -  Introduction to iOS 7 and SDK
Session 1 - Introduction to iOS 7 and SDK
 
Interior Design ppt.
Interior Design ppt.Interior Design ppt.
Interior Design ppt.
 
The 80s
The 80sThe 80s
The 80s
 
1970s britain
1970s britain1970s britain
1970s britain
 
THE 1980's POWERPOINT
THE 1980's POWERPOINTTHE 1980's POWERPOINT
THE 1980's POWERPOINT
 
Principles of Interior Design
Principles of Interior DesignPrinciples of Interior Design
Principles of Interior Design
 
Fashion
FashionFashion
Fashion
 

Ähnlich wie RFNM-Aranda-Final.PDF

Geometric Correction for Braille Document Images
Geometric Correction for Braille Document Images  Geometric Correction for Braille Document Images
Geometric Correction for Braille Document Images
csandit
 
GAUSSIAN KERNEL BASED FUZZY C-MEANS CLUSTERING ALGORITHM FOR IMAGE SEGMENTATION
GAUSSIAN KERNEL BASED FUZZY C-MEANS CLUSTERING ALGORITHM FOR IMAGE SEGMENTATIONGAUSSIAN KERNEL BASED FUZZY C-MEANS CLUSTERING ALGORITHM FOR IMAGE SEGMENTATION
GAUSSIAN KERNEL BASED FUZZY C-MEANS CLUSTERING ALGORITHM FOR IMAGE SEGMENTATION
cscpconf
 
GAUSSIAN KERNEL BASED FUZZY C-MEANS CLUSTERING ALGORITHM FOR IMAGE SEGMENTATION
GAUSSIAN KERNEL BASED FUZZY C-MEANS CLUSTERING ALGORITHM FOR IMAGE SEGMENTATIONGAUSSIAN KERNEL BASED FUZZY C-MEANS CLUSTERING ALGORITHM FOR IMAGE SEGMENTATION
GAUSSIAN KERNEL BASED FUZZY C-MEANS CLUSTERING ALGORITHM FOR IMAGE SEGMENTATION
csandit
 
A Novel Penalized and Compensated Constraints Based Modified Fuzzy Possibilis...
A Novel Penalized and Compensated Constraints Based Modified Fuzzy Possibilis...A Novel Penalized and Compensated Constraints Based Modified Fuzzy Possibilis...
A Novel Penalized and Compensated Constraints Based Modified Fuzzy Possibilis...
ijsrd.com
 
Volume 2-issue-6-2143-2147
Volume 2-issue-6-2143-2147Volume 2-issue-6-2143-2147
Volume 2-issue-6-2143-2147
Editor IJARCET
 
4 image segmentation through clustering
4 image segmentation through clustering4 image segmentation through clustering
4 image segmentation through clustering
IAEME Publication
 
4 image segmentation through clustering
4 image segmentation through clustering4 image segmentation through clustering
4 image segmentation through clustering
prjpublications
 
Fuzzy c means_realestate_application
Fuzzy c means_realestate_applicationFuzzy c means_realestate_application
Fuzzy c means_realestate_application
Cemal Ardil
 
Juha vesanto esa alhoniemi 2000:clustering of the som
Juha vesanto esa alhoniemi 2000:clustering of the somJuha vesanto esa alhoniemi 2000:clustering of the som
Juha vesanto esa alhoniemi 2000:clustering of the som
ArchiLab 7
 

Ähnlich wie RFNM-Aranda-Final.PDF (20)

Geometric Correction for Braille Document Images
Geometric Correction for Braille Document Images  Geometric Correction for Braille Document Images
Geometric Correction for Braille Document Images
 
GAUSSIAN KERNEL BASED FUZZY C-MEANS CLUSTERING ALGORITHM FOR IMAGE SEGMENTATION
GAUSSIAN KERNEL BASED FUZZY C-MEANS CLUSTERING ALGORITHM FOR IMAGE SEGMENTATIONGAUSSIAN KERNEL BASED FUZZY C-MEANS CLUSTERING ALGORITHM FOR IMAGE SEGMENTATION
GAUSSIAN KERNEL BASED FUZZY C-MEANS CLUSTERING ALGORITHM FOR IMAGE SEGMENTATION
 
GAUSSIAN KERNEL BASED FUZZY C-MEANS CLUSTERING ALGORITHM FOR IMAGE SEGMENTATION
GAUSSIAN KERNEL BASED FUZZY C-MEANS CLUSTERING ALGORITHM FOR IMAGE SEGMENTATIONGAUSSIAN KERNEL BASED FUZZY C-MEANS CLUSTERING ALGORITHM FOR IMAGE SEGMENTATION
GAUSSIAN KERNEL BASED FUZZY C-MEANS CLUSTERING ALGORITHM FOR IMAGE SEGMENTATION
 
A Density Based Clustering Technique For Large Spatial Data Using Polygon App...
A Density Based Clustering Technique For Large Spatial Data Using Polygon App...A Density Based Clustering Technique For Large Spatial Data Using Polygon App...
A Density Based Clustering Technique For Large Spatial Data Using Polygon App...
 
Clustering Using Shared Reference Points Algorithm Based On a Sound Data Model
Clustering Using Shared Reference Points Algorithm Based On a Sound Data ModelClustering Using Shared Reference Points Algorithm Based On a Sound Data Model
Clustering Using Shared Reference Points Algorithm Based On a Sound Data Model
 
FUAT – A Fuzzy Clustering Analysis Tool
FUAT – A Fuzzy Clustering Analysis ToolFUAT – A Fuzzy Clustering Analysis Tool
FUAT – A Fuzzy Clustering Analysis Tool
 
An Efficient Clustering Method for Aggregation on Data Fragments
An Efficient Clustering Method for Aggregation on Data FragmentsAn Efficient Clustering Method for Aggregation on Data Fragments
An Efficient Clustering Method for Aggregation on Data Fragments
 
Data Science - Part VII - Cluster Analysis
Data Science - Part VII -  Cluster AnalysisData Science - Part VII -  Cluster Analysis
Data Science - Part VII - Cluster Analysis
 
A Novel Penalized and Compensated Constraints Based Modified Fuzzy Possibilis...
A Novel Penalized and Compensated Constraints Based Modified Fuzzy Possibilis...A Novel Penalized and Compensated Constraints Based Modified Fuzzy Possibilis...
A Novel Penalized and Compensated Constraints Based Modified Fuzzy Possibilis...
 
Volume 2-issue-6-2143-2147
Volume 2-issue-6-2143-2147Volume 2-issue-6-2143-2147
Volume 2-issue-6-2143-2147
 
4 image segmentation through clustering
4 image segmentation through clustering4 image segmentation through clustering
4 image segmentation through clustering
 
4 image segmentation through clustering
4 image segmentation through clustering4 image segmentation through clustering
4 image segmentation through clustering
 
Unsupervised Clustering Classify theCancer Data withtheHelp of FCM Algorithm
Unsupervised Clustering Classify theCancer Data withtheHelp of FCM AlgorithmUnsupervised Clustering Classify theCancer Data withtheHelp of FCM Algorithm
Unsupervised Clustering Classify theCancer Data withtheHelp of FCM Algorithm
 
Fuzzy c means_realestate_application
Fuzzy c means_realestate_applicationFuzzy c means_realestate_application
Fuzzy c means_realestate_application
 
E XTENDED F AST S EARCH C LUSTERING A LGORITHM : W IDELY D ENSITY C LUSTERS ,...
E XTENDED F AST S EARCH C LUSTERING A LGORITHM : W IDELY D ENSITY C LUSTERS ,...E XTENDED F AST S EARCH C LUSTERING A LGORITHM : W IDELY D ENSITY C LUSTERS ,...
E XTENDED F AST S EARCH C LUSTERING A LGORITHM : W IDELY D ENSITY C LUSTERS ,...
 
Paper id 26201478
Paper id 26201478Paper id 26201478
Paper id 26201478
 
Az36311316
Az36311316Az36311316
Az36311316
 
Literature Survey On Clustering Techniques
Literature Survey On Clustering TechniquesLiterature Survey On Clustering Techniques
Literature Survey On Clustering Techniques
 
A0310112
A0310112A0310112
A0310112
 
Juha vesanto esa alhoniemi 2000:clustering of the som
Juha vesanto esa alhoniemi 2000:clustering of the somJuha vesanto esa alhoniemi 2000:clustering of the som
Juha vesanto esa alhoniemi 2000:clustering of the som
 

RFNM-Aranda-Final.PDF

  • 1. Robust Fuzzy n-Means Clustering A Research Paper Presented to the Faculty of the Division of Mathematical Sciences Midwestern State University In Partial Fulfillment of the Requirements of the Degree Master of Science by Thomas G. Aranda October 2000
  • 2. Abstract Clustering is a data segmentation method with a wide range of applications including pattern recognition, document classification and data mining. This paper focuses on the problem of unsupervised clustering when the optimal number of clusters is not known. This paper presents an algorithm that can determine the ideal number of clusters and be robust to the influence of outliers. A modification of the Robust Fuzzy c-Means Clustering Algorithm (RFCM) was developed. This modification retains the robustness (ability to ignore outliers) of RFCM, yet it does not increase the complexity of the algorithm. A Robust Fuzzy n-Means Clustering Algorithm (RFNM) is presented. This method produces a good partition without a priori knowledge of the optimal number of clusters.
  • 3. 1 Introduction This research is motivated by the requirement to segment large images in real time without prior knowledge of the image’s structure. The ultimate goal is to identify and classify sections of data into categories. For example, given an aerial photograph, the computer should be able to distinguish between grass, concrete, water and asphalt. One technique for segmenting data in this way is called clustering. Many methods for clustering are in use, including Validity-Guided Clustering [1] and c- Means Clustering [4]. However, these algorithms have two drawbacks. First, they are very susceptible to the presence of outliers in some data sets. Consequently, they do not identify the clusters properly. Some algorithms solve this problem by using robust centering statistics. Second, these algorithms require the user to input the desired number of clusters. Often times, the correct number of clusters is not known prior to execution. Therefore, it would be beneficial to develop an algorithm that does not require such knowledge. This paper presents such an algorithm. Data Segmentation via Clustering Background on Clustering The classification of objects into categories is the subject of cluster analysis. It plays a large role in pattern recognition. However, it has many other applications such as the classification of documents in a database, the development of social demographics, data mining and the construction of taxonomies in biology. Ultimately, clustering attempts to identify groups of similar data. Given a set of data X, the problem of clustering is to find several cluster centers that properly characterize relevant classes of X [10]. For example, a good clustering of an image by color would identify the
  • 4. 2 various shades of red as one cluster, the blues as another cluster, etc. A clustering of a 3D set of points over a Euclidean space would find groups (clusters) of points that are close together. After the cluster centers are identified, the data set X is partitioned by labeling each data element with the exemplar (cluster center) closest to it. In 1967 Ball and Hall introduced the ISODATA process [2]. This technique, which is also called Hard c-Means Clustering (HCM), is one of the most popular clustering methods [4]. However, the user is required to input the desired number N of clusters. It uses an alternating optimization (AO) technique to minimize an objective function. The definition of the objective function and the AO technique can be found in [7]. One problem with HCM is that it tends to get caught in local minima [7]. In other words, it does not find the global minimum of the objective function and therefore does not properly identify the cluster centers. Zadeh introduced fuzzy set theory in 1965 as a way to represent the vagueness of everyday life [3]. In a nutshell, fuzzy set theory allows data elements to belong to a set in varying degrees. Each element has a membership value [ ]1,0∈u that represents the degree to which the data element belongs to that set. In other words, data elements can have a partial membership in a set. This fuzziness allows one to mathematically represent vague concepts such as “pretty soon” or “very far.” Dunn applied fuzzy set theory to the ISODATA clustering process in 1973 [7]. His method, called Fuzzy c-Means Clustering (FCM), allows data elements to belong to several clusters in varying degrees. For example, a data element can have a 30% membership in one cluster and a 70% membership in a second cluster, instead of discretely belonging to one cluster or the other. Consider the clustering by color example: a dark violet could partially belong to the red cluster and partially belong to the blue cluster.
  • 5. 3 Fuzzy c-Means Clustering (FCM) uses an alternating optimization (AO) technique that is very similar to HCM. After the algorithm finishes execution and the cluster centers are identified, the clusters are “defuzzified” by discretely assigning each data element to the cluster in which it has the highest membership. If a light orange color had a 45% membership in red, a 52% membership in yellow and a 3% membership in blue, then the color would be assigned to the yellow cluster. Experiments have shown that the fuzzy clustering method is less likely to be trapped in a local minimum [7] and, therefore, avoids one disadvantage of HCM. FCM typically produces better results than HCM, but it is susceptible to the influence of outliers—extraneous data elements that are very far away from the cluster centers. Outliers may be the result of errors in the data, or they could be real information: such as a highly reflective piece of aluminum foil appearing in a radar image of a grass field. Regardless of what the outliers are, their presence often disrupts the clustering process. Kersten’s Fuzzy c-Medians Clustering Algorithm (FCMED), which uses the fuzzy median as its centering statistic, is more robust than FCM [8]. In other words, it is more resistant to the influence of outliers. However, its time complexity of ( )NcpNO lg and space complexity of ( )NO make it very slow [8]. Conversely, Choi’s and Krishnapuram’s Robust Fuzzy c-Means Algorithm (RFCM) solves the outlier problem in linear time. [6]. Kersten’s implementation of RFCM uses Huber’s weighting functions to reduce the influence of outliers [9]. Experiments have shown RFCM to be very robust. One disadvantage of RFCM is that it requires the user to input the correct number of clusters. Often times the user does not know enough about the structure of the data to provide such information. This is especially true in data mining applications. The research described in this paper developed a new algorithm, Robust Fuzzy n-Means (RFNM), which is robust to
  • 6. 4 outliers and capable of determining the proper number of clusters. This algorithm is a modification of FCM and RFCM. In order to provide the reader with a complete understanding of the new RFNM algorithm, this paper will describe its parent algorithms in detail. Fuzzy c-Means Clustering Fuzzy c-means clustering (FCM) is defined well by [4]. Consider N data samples forming the data set denoted by { }NxxxX ,,, 21 K= . Assume there are c clusters and ( ) [ ]1,0∈= kiik xuu is the membership of the k-th sample kx in the i-th cluster iv , where { }cvvvv ,,, 21 K= is the set of exemplars (cluster centers) and U is the membership matrix. Normally, a cluster center refers to an actual pattern in the data and an exemplar refers to a pattern identified by the algorithm. However, these terms will be used interchangeably in this paper. The membership value of each data element kx satisfies the requirement that ∑= = c i iku 1 1 (1) for all Nk ℵ∈ . In other words, all of a particular data element’s membership values must add up to one. In addition, each cluster must contain some, but not all of the data points’ membership. Defined mathematically, this means that for every ci ℵ∈ Nu N k ik << ∑=1 0 . (2) The goal of the FCM algorithm is to minimize the objective function ( ) ∑∑= = = N k c i ik m ik duvUJ c 1 1 2 , (3) where 2kiik xvd −= (the Euclidean distance between the exemplar and the data element). The power cm of the membership function is called the weighting exponent. It expresses the
  • 7. 5 “fuzziness” of the algorithm. Setting 1=cm and only allowing discrete membership values will convert the fuzzy algorithm into traditional HCM [9]. The objective function (3) is the weighted square error of the exemplars. The closer data elements are to their respective cluster centers, the lower the value of the function will be. Furthermore, the number of exemplars c will have an effect on the value of ( )vUJ , . Increasing the number of exemplars will lower the value the objective function. In an extreme case, when the number of clusters equals the number of data elements ( )Nc = , the objective function will go to zero. Although using a large number of clusters will reduce the value of ( )vUJ , , it is more important to choose a value of c that represents the actual number of clusters in the data. Fuzzy c-Means Clustering is more effective than Hard c-Means because the objective function is less likely to get caught in a local minimum [7]. Furthermore, it runs in ( )cNO time and ( )cO space. However, it is susceptible to outliers [9]. The robust algorithm presented in the next section addresses this problem. Robust Fuzzy c-Means Clustering Real world data sets often contain outliers. These extraneous data elements are usually very far away from the larger cluster centers. Consider a data set with two large well-defined clusters and one small outlying cluster that is very far away from the other two. Due to the 2 ikd term in ( )vUJ , (3), the distance of a data point from its exemplar will have a quadratic effect on the value of objective function. Since FCM attempts to minimize ( )vUJ , , it will attempt to reduce the impact of the outliers’ large 2 ikd values by placing an exemplar over the outlying cluster. This minimizes the objective function, but does not correctly identify the larger cluster centers.
  • 8. 6 Kersten’s implementation of Choi and Krishnapuram’s Robust Fuzzy c-Means Clustering Algorithm (RFCM) takes steps to solve this problem [9]. Huber’s m-estimator is used to reduce the influence of outliers. Huber’s function ρ is defined as: ( )    >− ≤ = 1, 1, 2 1 2 2 1 xifx xifx xρ . (4) The 2 ikd term is replaced with ( )γρ ikd where γ is a scaling constant. As a result, the influence of the distance between cluster centers and data elements is quadratic when the data element is close to the exemplar and linear when the data element is far away from the exemplar. The objective function to be minimized becomes: ( ) ( )∑∑= = = N k c i ik m ik duvUJ c 1 1 , γρ . (5) The membership values of each element are given by: ( ) ( ) 1 1 1 1 − = −                   = ∑ c j m jk ik ik c d d u γρ γρ . (6) Using this function, the membership of a data element kx in cluster iv is assigned in proportion to the distance between kx and iv . In other words, the data element will have a larger membership in clusters that are closer to it. The center of a cluster is computed by determining the average value of all the points in that cluster. Since a point’s membership in a cluster is fuzzy, the mean must be adjusted by the membership values iku . Therefore, the locations of the exemplars are computed by using the weighted mean given by:
The center of a cluster is computed by determining the average value of all the points in that cluster. Since a point's membership in a cluster is fuzzy, the mean must be adjusted by the membership values $u_{ik}$. Therefore, the locations of the exemplars are computed by using the weighted mean given by

$v_i = \frac{\sum_{k=1}^{N} u_{ik}^{m_c}\, w(d_{ik}/\gamma)\, x_k}{\sum_{k=1}^{N} u_{ik}^{m_c}\, w(d_{ik}/\gamma)}$    (7)

where Huber's weighting function is $w(x) = \rho'(x)/x$. In this case

$w(x) = \begin{cases} 1, & x \le 1 \\ 1/x, & x > 1 \end{cases}$    (8)

Huber's w function has the effect of reducing the influence of data points that are far away from the cluster centers, thereby making the algorithm robust to outliers.

In order for the $\rho$ and $w$ functions to work properly, all distances must be adjusted by a scaling constant $\gamma$ [9]. The experiments in this paper use the median absolute deviation about the median (MAD) [11] to compute $\gamma$. The MAD is a robust estimator similar to the standard deviation. All distances are divided by three times the MAD before Huber's functions are applied, i.e. $\gamma = 3 \cdot \mathrm{MAD}$. As a result, when $\rho$ is applied, data points have a quadratic influence when they are $3 \cdot \mathrm{MAD}$ or less from the exemplar and a linear influence when they are more than $3 \cdot \mathrm{MAD}$ away. One should note that computing the MAD takes $O(N \lg N)$ time (on average) and $O(N)$ space. The normalization of the data using an estimator like the MAD is crucial to making the algorithm run properly.

Except for the calculation of the scaling constant and the application of Huber's functions, RFCM is identical to FCM. However, RFCM is not as susceptible to the influence of outliers [9].
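A companion sketch of Huber's weight function (8), a MAD-based scaling constant, and the weighted-mean exemplar update (7) follows. It is a hedged illustration under the same assumptions as the previous sketches; in particular, taking the MAD over all of the distances $d_{ik}$ is one reasonable reading of the scaling described above, not a detail fixed by this paper.

```python
import numpy as np

def huber_w(x):
    """Huber's weight function, equation (8): w(x) = rho'(x) / x."""
    x = np.abs(x)
    return np.where(x <= 1.0, 1.0, 1.0 / np.maximum(x, 1e-12))

def mad_gamma(d, scale=3.0):
    """Scaling constant gamma = 3 * MAD (median absolute deviation about the
    median) of the distances, following the experiments in this paper."""
    med = np.median(d)
    return scale * np.median(np.abs(d - med))

def rfcm_exemplars(X, v, U, m_c, gamma):
    """Weighted-mean exemplar update of equation (7)."""
    d = np.linalg.norm(v[:, None, :] - X[None, :, :], axis=2)   # (c, N) distances
    wt = (U ** m_c) * huber_w(d / gamma)                        # combined robust weights
    return (wt @ X) / wt.sum(axis=1, keepdims=True)             # (c, d) updated exemplars
```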
Determining the Number of Clusters

Robust Fuzzy n-Means Clustering

One problem with RFCM is that the user must input the desired number of clusters. Quite often the optimal number of clusters is not known prior to execution. The Robust Fuzzy n-Means Algorithm (RFNM) presented in this paper retains the robustness of RFCM, yet does not require a priori knowledge of the proper number of clusters. RFNM requires the user to provide a maximum number $c_m$ of clusters. The algorithm begins by executing the RFCM algorithm with $c_m$ clusters. During every iteration, cluster centers that are close together are considered for merging. Several methods for merging have been explored, including Validity-Guided Clustering described in [1] and Competitive Clustering described in [5]. However, the merging criterion should be robust and efficient.

Merging Criterion

If two clusters are "close" together they should be merged. Two clusters are close if the distance between their centers is small compared to their compactness. The notion of compactness [12] is the weighted mean square deviation of the cluster. It can be thought of as the average "radius" squared. The compactness of a cluster is defined in terms of its variation and cardinality.

The variation of a cluster is a measure of the cluster's dispersion. One can think of it as the fuzzy variance. Formally, the variation is defined by [12]:

$\sigma_i = \sum_{k=1}^{N} u_{ik}^{m_c}\, d_{ik}^{2}$    (9)

The fuzzy cardinality of a cluster is a measure of the cluster's size. The more data elements that belong to the cluster, the larger the cluster's cardinality will be. Often, the fuzzy cardinality is used as a divisor when calculating the fuzzy mean. Formally, the fuzzy cardinality is defined by [12]:

$n_i = \sum_{k=1}^{N} u_{ik}$.    (10)
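Both quantities can be read directly off the membership matrix. The short sketch below (illustrative names, same array conventions as the earlier sketches) computes the variation (9) and the fuzzy cardinality (10) of every cluster at once.

```python
import numpy as np

def variations(U, d, m_c):
    """Fuzzy variation of every cluster, equation (9).

    U : (c, N) membership matrix, d : (c, N) distances d_ik.
    Returns a length-c vector of variations.
    """
    return np.sum((U ** m_c) * d ** 2, axis=1)

def cardinalities(U):
    """Fuzzy cardinality of every cluster, equation (10); returns a length-c vector."""
    return U.sum(axis=1)
```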
The compactness of a cluster is the ratio of its variation and cardinality [12]:

$\pi_i = \frac{\sigma_i}{n_i}$.    (11)

To make the compactness formula robust to outliers, Huber's $\rho$ function (4) is inserted into the equation. Finally, the cardinality of the cluster must take the weighting exponent $m_c$ into account. Therefore, the robust compactness of a cluster $v_i$ is defined as

$\pi_i = \frac{\sum_{k=1}^{N} u_{ik}^{m_c}\, \rho(d_{ik}/\gamma)}{\sum_{k=1}^{N} u_{ik}^{m_c}}$.    (12)

RFNM uses a modified version of separation [12] to measure how far apart clusters are. Formally, the separation between two clusters $v_q$ and $v_r$ is defined as the Euclidean distance between the clusters' centers:

$s_{qr} = \|v_q - v_r\|_2$.    (13)

The merging criterion uses a merge ratio, which is similar to the validity index defined in [12]. The merge ratio will be small when exemplars are close together relative to their compactness. Formally, it is the ratio of the separation squared over the compactness:

$\omega_{qr} = \frac{s_{qr}^{2}}{\pi_q}$.    (14)

Once again, to make the formula robust, Huber's function is substituted:

$\omega_{qr} = \frac{\rho(s_{qr}/\gamma)}{\pi_q}$.    (15)

During every iteration of RFCM, the merge ratio $\omega_{qr}$ is calculated for every pair of clusters $v_q \in v$ and $v_r \in v$. If $\omega_{qr} \le \alpha$, where $\alpha$ is some constant, then the clusters centered at $v_q$ and $v_r$ are merged.
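The sketch below, using the same illustrative conventions as before, turns equations (12), (13) and (15) into code: the robust compactness of every cluster and the robust merge ratio for a single pair of exemplars. It is a sketch of the formulas above, not the implementation used for the experiments.

```python
import numpy as np

def huber_rho(x):
    """Huber's rho, equation (4)."""
    x = np.abs(x)
    return np.where(x <= 1.0, 0.5 * x ** 2, x - 0.5)

def robust_compactness(U, d, m_c, gamma):
    """Robust compactness of every cluster, equation (12).

    U : (c, N) membership matrix, d : (c, N) distances d_ik.
    Returns a length-c vector pi.
    """
    num = np.sum((U ** m_c) * huber_rho(d / gamma), axis=1)
    den = np.sum(U ** m_c, axis=1)
    return num / den

def merge_ratio(v_q, v_r, pi_q, gamma):
    """Robust merge ratio of equation (15) for the pair (q, r)."""
    s_qr = np.linalg.norm(v_q - v_r)        # separation, equation (13)
    return huber_rho(s_qr / gamma) / pi_q
```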
Choosing a value of $\alpha < 1$ means that, in order for two clusters to be merged, the distance between the clusters' centers must be less than the compactness (radius) of the clusters. Experimentally, values of $\alpha \in [0.1, 0.3]$ work well.

Merging Mechanics

Once the decision is made to join two clusters, they must be combined in a meaningful way. The new exemplar should exist on a line segment that runs between the two old exemplars. The new center will be placed closer to the cluster with the larger fuzzy cardinality. The placement of the new exemplar is accomplished by using a parameter p:

$p = \frac{n_q}{n_q + n_r}$    (16)

where $v_q$ and $v_r$ are the centers of the two clusters to be merged. The location of the new exemplar is calculated using a combination formula:

$v_n = p\, v_q + (1 - p)\, v_r$    (17)

where $v_n$ is the center of the new cluster. The old exemplars are removed from v and replaced with the new center $v_n$. The next iteration of the algorithm will compute the membership values of X in the new cluster.
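As a small illustration of equations (16) and (17), the hypothetical helper below combines two exemplars. For example, merging centers at 0.0 and 1.0 with fuzzy cardinalities 3 and 1 gives p = 0.75 and places the new center at 0.25, closer to the heavier cluster.

```python
import numpy as np

def merge_exemplars(v_q, v_r, n_q, n_r):
    """Combine two exemplars into one, equations (16) and (17).

    The new center lies on the segment between v_q and v_r and is pulled
    toward the cluster with the larger fuzzy cardinality.
    """
    p = n_q / (n_q + n_r)               # equation (16)
    return p * v_q + (1.0 - p) * v_r    # equation (17)
```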
The RFNM Algorithm

The Robust Fuzzy n-Means algorithm is based on the FCM algorithm described in [10]. It uses an alternating optimization (AO) technique for minimizing the objective function (5). FCM has been modified to be robust and unsupervised. The $c_m$ exemplars begin at locations determined by the user. During execution, these exemplars gravitate toward the data set's "true" cluster centers. Some exemplars may merge along the way. Ideally, the algorithm will terminate with exactly one exemplar positioned near the center of each cluster. The user provides the following input:

$c_m \in \aleph_\infty$   initial (maximum) number of clusters
$m_c \in [1, \infty)$   weighting exponent
$\alpha \in (0, 1)$   merging criterion constant
$\varepsilon \in (0, \infty)$   stopping constant (small positive number)
$\gamma \in (0, \infty)$   scaling constant (example: three times the MAD)
$v = \{v_1, v_2, \ldots, v_{c_m}\}$   initial placement of the exemplars (cluster centers)

Algorithm: $\mathrm{RFNM}(c_m, m_c, \alpha, \varepsilon, \gamma, v)$

Step 1. Let $c = c_m$.
Step 2. Let $s_1, s_2, \ldots, s_c$ equal $v_1, v_2, \ldots, v_c$ respectively.
Step 3. Calculate the new membership matrix U by the following procedure: for each $x_k \in X$, if $\|x_k - v_i\|^2 > 0$ for all $i \in \aleph_c$, then compute $u_{ik}$ using equation (6). If $\|x_k - v_i\|^2 = 0$ for some $i \in I \subseteq \aleph_c$, then define $u_{ik}$ for $i \in I$ by any nonnegative real numbers satisfying $\sum_{i \in I} u_{ik} = 1$ and define $u_{ik} = 0$ for $i \in \aleph_c - I$.
Step 4. Merge clusters that are close together. For every $v_q \in v$ and $v_r \in v$ with $q \ne r$, do the following: calculate $\omega_{qr}$ using equation (15); if $\omega_{qr} \le \alpha$, then compute $v_n$ using equations (16) and (17); let $v_q = v_n$; remove $v_r$ from v and decrement c by 1. NOTE: Any cluster can only be merged once per iteration.
Step 5. Calculate the c cluster centers $v_1, v_2, \ldots, v_c$ using equation (7) and the given value of $m_c$.
Step 6. If a merge took place in Step 4 then return to Step 2. Otherwise, if $\max_{i \in \aleph_c} \|v_i - s_i\| \le \varepsilon$ then stop. Otherwise, return to Step 2.

On average, this algorithm has linear time complexity. Steps 2, 5 and 6 have a total maximum running time of $c_m(a + b + c)$, where a, b and c are constants. The maximum running time of Step 3 is $k \cdot c_m \cdot N$, and Step 4 will run in $l \cdot c_m^2$ time (worst case), where k and l are constants. Thus, the total running time of this algorithm has an upper bound of $t\left[\, l \cdot c_m^2 + k \cdot c_m \cdot N + c_m(a + b + c) \,\right]$, where t is the number of iterations of the algorithm. In most cases the size of the data set N will be significantly larger than $c_m$ and t. Therefore, the N term will overwhelm the $c_m^2$ and t terms, yielding a running time complexity of $O(c_m N)$.

The memory overhead of this algorithm is also linear. Storing the exemplar vectors v and s requires $O(c_m)$ space in the worst case. The membership matrix U requires $O(c_m \cdot N)$ space. The memory required for the data set X is not considered because it is not overhead. The total memory overhead is $O(c_m N)$.

Oftentimes, the data set to be clustered is very large, so storing anything of size N is expensive. However, clever coding and some slight modifications will allow the algorithm to run in $O(c_m)$ space. The following modifications compute the membership matrix U, the exemplars v and the merge ratio $\omega$ on the fly without storing U in memory:

Algorithm: $\mathrm{FastRFNM}(c_m, m_c, \alpha, \varepsilon, \gamma, v)$

Step 1. Let $c = c_m$.
Step 2. Let $s_1, s_2, \ldots, s_c$ equal $v_1, v_2, \ldots, v_c$ respectively.
Step 3. Calculate, but do not store, the new membership matrix U by the following procedure: for each $x_k \in X$, if $\|x_k - v_i\|^2 > 0$ for all $i \in \aleph_c$, then compute $u_{ik}$ using equation (6). If $\|x_k - v_i\|^2 = 0$ for some $i \in I \subseteq \aleph_c$, then define $u_{ik}$ for $i \in I$ by any nonnegative real numbers satisfying $\sum_{i \in I} u_{ik} = 1$ and define $u_{ik} = 0$ for $i \in \aleph_c - I$. Simultaneously, calculate and keep the running sums used in equations (7), (15) and (16).
Step 4. For every $v_q \in v$ and $v_r \in v$ with $q \ne r$, do the following: calculate $\omega_{qr}$ using equation (15) and the running sums from Step 3; if $\omega_{qr} \le \alpha$, then compute $v_n$ using equations (16) and (17) and the running sums from Step 3; let $v_q = v_n$; remove $v_r$ from v and decrement c by 1. If any two clusters were merged, then return to Step 3.
Step 5. Calculate the c cluster centers $v_1, v_2, \ldots, v_c$ using equation (7), the given value of $m_c$ and the running sums from Step 3.
Step 6. If a merge took place in Step 4 then return to Step 2. Otherwise, if $\max_{i \in \aleph_c} \|v_i - s_i\| \le \varepsilon$ then stop. Otherwise, return to Step 2.

This "fast" version of RFNM has the same time complexity as the normal algorithm, but it has a much lower memory complexity. Since it does not store the membership matrix U, the N term can be dropped from the space complexity. Consequently, the memory overhead is $O(c_m)$, which represents only the storage of the exemplar vectors. On average, the "fast" algorithm will also execute more quickly because its lower memory overhead reduces the risk of page faults.

RFNM provides robust unsupervised learning with a linear running time and low memory overhead. This makes it ideally suited for real-time data processing applications.
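To tie the pieces together, here is a compact, unoptimized sketch of the RFNM iteration (Steps 1 through 6 of the first listing). It stores U explicitly, so it corresponds to the $O(c_m \cdot N)$-space version rather than FastRFNM, and it simplifies one detail: when a merge occurs it returns directly to the membership step instead of re-estimating the merged center from the stale memberships. It is a sketch under those assumptions, with hypothetical names, not the implementation used for the experiments in this paper.

```python
import numpy as np

def huber_rho(x):
    """Huber's rho, equation (4)."""
    x = np.abs(x)
    return np.where(x <= 1.0, 0.5 * x ** 2, x - 0.5)

def huber_w(x):
    """Huber's weight, equation (8)."""
    x = np.abs(x)
    return np.where(x <= 1.0, 1.0, 1.0 / np.maximum(x, 1e-12))

def rfnm(X, v0, m_c=1.75, alpha=0.3, eps=1e-4, gamma=None, max_iter=100):
    """Unoptimized sketch of the RFNM iteration (Steps 1-6 of the first listing)."""
    v = np.array(v0, dtype=float)                     # Step 1: start with c = c_m exemplars
    for _ in range(max_iter):
        s = v.copy()                                  # Step 2: remember the old centers
        d = np.linalg.norm(v[:, None, :] - X[None, :, :], axis=2)   # (c, N) distances
        if gamma is None:
            gamma = 3.0 * np.median(np.abs(d - np.median(d)))       # 3 * MAD, set once
        rho_d = np.maximum(huber_rho(d / gamma), 1e-12)

        # Step 3: membership matrix, equation (6)
        inv = rho_d ** (-1.0 / (m_c - 1.0))
        U = inv / inv.sum(axis=0, keepdims=True)

        # Step 4: merge clusters whose robust merge ratio (15) is at most alpha
        pi = ((U ** m_c) * rho_d).sum(axis=1) / (U ** m_c).sum(axis=1)   # compactness (12)
        n = U.sum(axis=1)                                                # cardinality (10)
        keep = np.ones(len(v), dtype=bool)
        merged = False
        for q in range(len(v)):
            for r in range(q + 1, len(v)):
                if keep[q] and keep[r]:
                    s_qr = np.linalg.norm(v[q] - v[r])                   # separation (13)
                    if huber_rho(s_qr / gamma) / pi[q] <= alpha:
                        p = n[q] / (n[q] + n[r])                         # equation (16)
                        v[q] = p * v[q] + (1.0 - p) * v[r]               # equation (17)
                        keep[r] = False
                        merged = True
        if merged:
            v = v[keep]
            continue        # re-estimate memberships for the merged centers first

        # Step 5: exemplar update, equation (7)
        wt = (U ** m_c) * huber_w(d / gamma)
        v = (wt @ X) / wt.sum(axis=1, keepdims=True)

        # Step 6: stop once no exemplar has moved more than eps
        if np.max(np.linalg.norm(v - s, axis=1)) <= eps:
            break
    return v
```

A call such as rfnm(X, v0, m_c=1.75, alpha=0.3) mirrors the parameter choices used in the Gaussian tests that follow.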
Testing

Exemplar Placement (Gaussian Tests 1 and 2)

The RFNM algorithm described in the previous section was tested using a five-dimensional Gaussian scatter of random data. The test data has two cluster centers equidistant from the origin. The first two tests start with six exemplars ($c_m = 6$), $m_c = 1.75$ and $\alpha = 0.3$. The positioning of the initial exemplars is critical. Figure 1 shows a 2D plot of the movement of the exemplars.

Figure 1. Gaussian Test 1, $c_m = 6$.

The "true" cluster centers, which are computed using the sample mean, exist at $(-1.0, 0, 0, 0, 0)$ and $(1.0, 0, 0, 0, 0)$. In Figure 1, two exemplars (labeled A and B) merge together at
point C and then converge to approximately $(-1.3, -0.1, 0, 0, 0)$. Two more exemplars (D and E) merge together at F and converge to approximately $(1.4, 0.1, 0, 0, 0)$. The last two exemplars (G and H) were initialized on the y-axis. They merge and converge very close to the origin (point I). The test data is almost symmetric. Since the middle exemplars (G and H) started equidistant from two nearly symmetric clusters, they were never drawn to one cluster or the other. In this case the algorithm does not converge properly.

Figure 2. Gaussian Test 2, $c_m = 6$.

The second test uses the same data set and starting parameters, except two of the initial exemplars (Figure 2: D and G) are offset by $+0.5$ along the x-axis. Figure 2 plots the movement of the exemplars. Notice the algorithm converges to the desired two cluster centers $(\pm 1.0, 0, 0, 0, 0)$.
Exemplars A and B merge together into exemplar C, which then converges to the cluster center at $(-1, 0, 0, 0, 0)$. The exemplar trace on the right side of the y-axis is more interesting. Exemplars D and E merge together and become F. Exemplar G then merges with F to become H. Finally, H and I merge into exemplar J, which then converges to the cluster center at $(1, 0, 0, 0, 0)$. Since the middle exemplars (D and G) were initialized slightly closer to the right-hand cluster, they were drawn toward that cluster's center. By comparing the results of test 1 and test 2, one can see that the initial placement of the exemplars can change the results dramatically.

Robustness Testing (Cauchy Test 1)

A second set of two-dimensional test data was randomly generated using a Cauchy distribution. This data set has two well-defined main clusters, but it also has several outliers that are very far away from the main clusters. The presence of the outliers obviously increases the compactness value of the clusters (makes them less compact). Consequently, the exemplars tend to merge very quickly. To compensate for this, lower values of $m_c = 1.25$ and $\alpha = 0.2$ were chosen. Additionally, the initial exemplars were started a little further away from the origin. To reduce the influence of outliers, the sample median is used to determine the "true" cluster centers: approximately $(\pm 1.3, 0)$. Figure 3 shows a trace of the six exemplars.

The merging sequence in Figure 3 is very similar to the previous test. Exemplars A and B merge into C and converge to $(-1.8, 0)$. On the right side of the y-axis, exemplars D and E merge into F. Next, G and F merge into H. Finally, H and I merge into J and converge to $(1.5, 0)$. Notice that the algorithm converges near the two desired cluster centers of $(\pm 1.3, 0)$. However, it does not converge exactly because the outliers still have some influence on the exemplars. The total error is 0.7.
Figure 3. Cauchy Test 1 (RFNM), $c_m = 6$.

Additionally, the proper choice of $\alpha$ is very important. Choosing a merge ratio threshold that is too low ($\alpha < 0.1$) will prevent the exemplars from merging. Conversely, setting it too high ($\alpha > 0.4$) will cause all of the exemplars to merge together into one cluster center. In both cases, the true cluster centers are never found. Thus, choosing a good merge ratio threshold is crucial.

Robust n-Means vs. c-Means Clustering (Cauchy Test 1)

For comparison purposes, a standard RFCM algorithm ($\alpha = 0$) was run on the same set of data with the same parameters. Figure 4 shows the trace. Exemplar A converges to $(1.5, 0)$, and exemplar B converges to $(-1.8, 0)$. In other words, the two exemplars in this example converge to the same cluster centers as the exemplars in the previous test. Clearly, the RFNM algorithm
performs as well as RFCM.

Figure 4. Cauchy Test 1 (RFCM), $c_m = c = 2$.

The values of the objective functions (5) for both methods were plotted against time (see Figure 5). Both methods converge to the same value ($\approx 830$) within the same number of iterations. Notice the RFNM algorithm (left), with an initial $c_m = 6$ and final $c = 2$, yields an increasing value of $J(U, v)$. This is because reducing the number of clusters actually causes an increase in the objective function. However, RFNM still reaches the optimal solution without requiring the user to input the desired number of clusters.

Catching Outliers (Cauchy Test 2)

The previous examples start with the exemplars near the cluster centers. In the next
example, eight exemplars ($c_m = 8$) begin far away from the origin. Once again, the Cauchy data set is used. Figure 6 shows a close-up of the exemplar trace. Exemplars A and B merge into C. At the same time, exemplars D and E merge into F. Finally, C and F merge into G and converge to $(-1.5, 0)$. Exemplar H converges to $(1.5, 0)$ without merging. The total error is 0.4, and the final number of clusters is $c = 5$.

Figure 5. RFNM vs. RFCM (Cauchy Test 1).

Figure 7 shows a trace of the same test on a larger scale. One can see exemplars A, B, D, E and H move toward the main clusters, merge and converge to the cluster centers (see Figure 6). Furthermore, exemplars I, J and K converge near clusters of outliers located at approximately $(-10, 1)$, $(0, 25)$ and $(-1, -17)$ respectively. These outlying clusters have very low fuzzy cardinalities (less than 10% of the main clusters'). In the final analysis, one could classify exemplars with low cardinalities as clusters of outliers. Depending on the application, it may be useful to discover outlying clusters. Otherwise, they can be ignored and removed from the final partition.

The use of Huber's functions in RFNM reduces the influence of outliers, but it does not
eliminate their influence. However, notice that the exemplars in Cauchy test 2 (Figure 6) are closer to the desired centers of $(\pm 1.3, 0)$ than the exemplars in Cauchy test 1 (Figure 3). In fact, test 2 yields an improvement of 0.3 in total error over test 1. This is because the second test placed exemplars near the outliers (see Figure 7), which has the effect of reducing the influence of those outliers on the two main clusters. As a result, the true centers of the main clusters are identified with greater accuracy. Furthermore, the final value of the objective function is lower: approximately 611 as opposed to 830 in test 1. Of course, the larger number of exemplars ($c = 5$) in test 2 accounts for much of this decrease. Figure 8 shows a plot of the objective function. A good initial placement of the exemplars will improve the robustness of the algorithm and yield better results.

Figure 6. Cauchy Test 2 (Zoomed-In), $c_m = 8$.
Figure 7. Cauchy Test 2 (Zoomed Out), $c_m = 8$.

Conclusion

The goal of this paper is to provide a robust algorithm that will find an optimal partition without knowing the proper number of clusters. Ideally, the user should be able to partition a data set without any a priori knowledge of the data's structure. Robust Fuzzy n-Means Clustering provides a good start toward this goal.

Experiments with Gaussian data have demonstrated that RFNM can accurately find the desired number of clusters and their centers. Furthermore, the first Cauchy test has shown that RFNM produces results identical to those reached by RFCM. Thus, RFNM is as
accurate and robust as RFCM, yet it does not increase the time complexity. Finally, clever initialization of the exemplars allows RFNM to identify outlying clusters (Cauchy test 2). This in turn improves the accuracy of the final results. Clearly, RFNM provides robust, accurate results without requiring prior knowledge of the data's structure.

Figure 8. Cauchy Test 2 (Objective Function).

Although RFNM is an improvement over other algorithms, it does have some shortcomings. First, it is not completely unsupervised because the user's choices of $m_c$ and $\alpha$ have significant effects on the results. Data sets with outliers, for example, require lower values of $m_c$ and $\alpha$ than sets with compact, well-separated clusters. Future research should examine ways of preprocessing the target data in order to determine the ideal clustering parameters so that the entire process can be fully automated.

Second, the initial positioning of the exemplars is crucial to getting optimal results. Placing the exemplars exactly between two cluster centers, for example, may cause those exemplars to not converge. One possible solution is to place the initial exemplars very far away
from the cluster centers. This will allow the exemplars to compete equally for cardinality. In other words, one exemplar will not have an advantage simply because it was initially placed close to a cluster of points. However, if the exemplars are initialized too far away from the cluster centers, then the main clusters and the outliers will have equal influence. As a result, the exemplars may skip over the outliers altogether. Determining an automated, yet reliable, way of initializing the exemplars would be very beneficial. Future research in this area should be considered.

Third, the preprocessing requirements of RFNM can be costly. The experiments in this paper use the MAD to compute the scaling constant $\gamma$. This operation takes $O(N \lg N)$ time and uses $O(N)$ space. Research into more efficient preprocessing techniques may be useful.

Robust Fuzzy n-Means Clustering has a wide range of applications in image and data processing. It requires less user supervision than many other algorithms, but it is not completely unsupervised. However, in several situations the RFNM algorithm provides a good solution in linear time.
References

1. Bensaid, A., Hall, L., Bezdek, J., Clarke, L., Silbiger, M., Arrington, J. and Murtagh, R., Validity-guided (re)clustering with applications to image segmentation, IEEE Transactions on Fuzzy Systems, vol. 4, no. 2 (May 1996), 112-123.

2. Bezdek, J., A convergence theorem for the fuzzy ISODATA clustering algorithms, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 2, no. 1 (January 1980), 1-8.

3. Bezdek, J., Fuzzy models—what are they and why? IEEE Transactions on Fuzzy Systems, vol. 1, no. 1 (February 1993), 1-5.

4. Bezdek, J. and Pal, S., Fuzzy Models for Pattern Recognition: Methods That Search for Structures in Data, IEEE Press, New York, NY, 1992.

5. Boujemaa, N., Generalized competitive clustering for image segmentation, in Proceedings of the 19th International Conference of the North American Fuzzy Information Processing Society (NAFIPS), July 13-15, 2000, Atlanta, GA, NAFIPS/IEEE, 2000, 133-137.

6. Choi, Y. and Krishnapuram, R., Fuzzy and robust formulations of maximum-likelihood-based Gaussian mixture decomposition, in Proceedings of the Fifth IEEE International Conference on Fuzzy Systems, September 8-11, 1996, New Orleans, LA, IEEE Neural Networks Council, 1996, 1899-1905.

7. Dunn, J., A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters, Journal of Cybernetics, vol. 3, no. 3 (1973), 32-57.

8. Kersten, P., Fuzzy order statistics and their application to fuzzy clustering, IEEE Transactions on Fuzzy Systems, vol. 7, no. 6 (December 1999), 708-712.

9. Kersten, P., Lee, R., Verdi, J., Carvalho, R. and Yankovich, S., Segmenting SAR images using fuzzy clustering, in Proceedings of the 19th International Conference of the North American Fuzzy Information Processing Society (NAFIPS), July 13-15, 2000, Atlanta, GA, NAFIPS/IEEE, 2000, 105-108.

10. Klir, G. and Yuan, B., Fuzzy Sets and Fuzzy Logic: Theory and Applications, Prentice Hall PTR, Upper Saddle River, NJ, 1995.

11. Randles, R. and Wolfe, D., Introduction to the Theory of Nonparametric Statistics, John Wiley & Sons, Inc., New York, NY, 1979.

12. Xie, X. L. and Beni, G., A validity measure for fuzzy clustering, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 8 (August 1991), 841-847.