Knowledge-Based Clustering
1. Knowledge-Based Clustering
An intelligent way to find groups in your data
2. Contents
Knowledge-Based Clustering (KBC)
Fuzzy Clustering and FCM
Conditional Fuzzy Clustering and CFCM
Clustering With Partial Supervision
Collaborative Clustering
Directional Clustering
Fuzzy Relational Clustering
Christos N. Zigkolis Aristotle University of Thessaloniki 2
3. Some reasonable questions…
What type of clustering is KBC?
“Partitional Clustering”
What are the differences from “conventional” clustering?
“Data-Centric VS Human-Centric”
What are the basic concepts of KBC?
“Information Granules, Fuzzy Clustering, Objective Function-Based Techniques”
4. Data Clustering
Data-Centric Approaches
• Partitional Clustering – PC
  – Hard Clustering (K-Means)
  – Soft Clustering: Fuzzy Clustering – FC (Fuzzy C-Means)
• Hierarchical Clustering – HC (Agglomerative HC)
------------------------------------------------------------------
Human-Centric Approaches
• Knowledge-Based Clustering
5. Objective Function-Based Clustering Techniques
min or max(obj_function) => better clustering
Our GOAL is: to formulate an objective function capable of reflecting the nature of the problem, so that its min() or max() reveals a meaningful structure in the dataset.
6. Fuzzy Clustering
“The Big Bang for KBC”
Binary Character of Partitions
0 || 1
VS
Fuzzy Logic – Partial Membership
[0, 1]
7. Fuzzy Clustering (2)
“The Big Bang for KBC”
K-Means + Fuzzy Logic = Fuzzy C-Means
“Yet another clustering procedure… what is so special about it?”
It can deal with patterns of borderline character, contrary to K-Means.
[prototypes, U] = fcm( X_data, C)
9. Fuzzy Clustering (3)
“The Big Bang for KBC”
Input
• X_data [N x p]
• m : fuzzification coefficient (> 1)
• C : number of clusters
• initialized U [C x N] matrix
Iterative Process
1. Compute the prototypes
2. Compute the U matrix
3. Compute the value of the objective function and stop the process if this value is lower than a criterion e
Output
• prototypes [C x p]
• U matrix [C x N]
10. “Stop talking and show us the maths”

(1) Prototypes:
$$\mathrm{prot}_i = \frac{\sum_{j=1}^{N} u_{ij}^{m} X_j}{\sum_{j=1}^{N} u_{ij}^{m}}$$

(2) Membership values, under the restriction $\sum_{i=1}^{C} u_{ij} = 1$:
$$u_{ij} = \frac{1}{\sum_{l=1}^{C} \left( \frac{\| X_j - \mathrm{prot}_i \|}{\| X_j - \mathrm{prot}_l \|} \right)^{2/(m-1)}}$$

(3) Objective function, with stopping criterion $Q < e$:
$$Q = \sum_{i=1}^{C} \sum_{j=1}^{N} u_{ij}^{m} \, \| X_j - \mathrm{prot}_i \|^{2}$$
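The three update formulas above can be sketched as a short program. The following is a minimal illustration, not the reference implementation: the signature mirrors the slides' `[prototypes, U] = fcm(X_data, C)` pseudocode, and it assumes the common stopping variant of halting when the objective stops changing by more than e.

```python
import numpy as np

def fcm(X, C, m=2.0, e=1e-5, max_iter=100, seed=0):
    """Minimal Fuzzy C-Means sketch: X is [N x p]; returns (prototypes [C x p], U [C x N])."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    U = rng.random((C, N))
    U /= U.sum(axis=0, keepdims=True)              # restriction: columns sum to 1
    q_prev = np.inf
    for _ in range(max_iter):
        Um = U ** m
        prototypes = (Um @ X) / Um.sum(axis=1, keepdims=True)          # eq. (1)
        d = np.linalg.norm(X[None, :, :] - prototypes[:, None, :], axis=2)
        d = np.fmax(d, 1e-12)                      # guard against zero distances
        U = d ** (-2.0 / (m - 1.0))                # eq. (2), then renormalize columns
        U /= U.sum(axis=0, keepdims=True)
        q = np.sum((U ** m) * d ** 2)              # eq. (3)
        if abs(q_prev - q) < e:                    # stop when Q stops changing
            break
        q_prev = q
    return prototypes, U
```

Contrary to K-Means, each column of U spreads a pattern's membership across all C clusters, so borderline patterns receive intermediate grades instead of a hard 0/1 assignment.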
11. Fuzzy Clustering (4)
“The Big Bang for KBC”
Examples
• Fuzzy c-Means Clustering of Incomplete Data
“Modified versions of standard FCM are applied for dealing
with data with missing feature values”
• FCM-Based Model Selection Algorithms for Determining the
Number of Clusters
“Determining the number of clusters in a given data set and a
new validity index for measuring the “goodness” of clustering”
12. Conditional Fuzzy Clustering
“The presence of side information”
FROM
UNSUPERVISED LEARNING
TO
SEMI-SUPERVISED LEARNING
We mark our patterns according to a condition; these marks are the side information that can guide the clustering process toward more meaningful results.
13. Conditional Fuzzy Clustering
“The presence of side information”
(1) Xdata [N x p]
(2) Condition(s)
(3) Zk [1 x N] (patterns’ marks)
(4) Scaling Function
(5) Fk [1 x N] (scaled patterns’ marks)
[prototypes, U] = CFCM(Xdata, Fk, C)
14. Conditional Fuzzy Clustering(2)
“The presence of side information”
Formulation Differences from FCM

Restriction: $\sum_{i=1}^{C} u_{ij} = F_j$, which leads to
$$u_{ij} = \frac{F_j}{\sum_{l=1}^{C} \left( \frac{\| X_j - \mathrm{prot}_i \|}{\| X_j - \mathrm{prot}_l \|} \right)^{2/(m-1)}}$$
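The changed restriction can be seen directly in code. Below is a sketch of just the CFCM membership update (the names `d`, `F`, and `cfcm_membership` are assumptions for illustration; `F` holds the scaled condition values F_j):

```python
import numpy as np

def cfcm_membership(d, F, m=2.0):
    """CFCM membership update: d is [C x N] distances ||X_j - prot_i||,
    F is [N] scaled condition values; each column of U sums to F_j instead of 1."""
    d = np.fmax(d, 1e-12)                  # guard against zero distances
    W = d ** (-2.0 / (m - 1.0))
    return F[None, :] * W / W.sum(axis=0, keepdims=True)
```

The prototype update is unchanged from FCM; only the normalization target of each column moves from 1 to F_j.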
15. Conditional Fuzzy Clustering(3)
“The presence of side information”
Example
“Using CFCM to mine event-related brain dynamics”
by C.N. Zigkolis and N.A. Laskaris
“…a framework for mining event related dynamics based on Conditional
FCM (CFCM). CFCM enables prototyping in a principled manner. User-
defined constraints, which are imposed by the nature of experimental data
and/or dictated by the neuroscientist’s intuition, direct the process of
knowledge extraction and can robustify single-trial analysis…“
16. Clustering with Partial Supervision
“Label some, cluster all”
X = [X1, X2, ..., XN]
---------------------------------------------------------------------------
Labeled patterns: Y = [Y1, ..., YM]    Unlabeled patterns: Z = [Z1, ..., Z(N−M)]
---------------------------------------------------------------------------
X′ = Y ∪ Z
After labeling some patterns, we start the clustering process.
17. Clustering with Partial Supervision(2)
“Label some, cluster all”
How is this labeling going to help us?
• Labeling = Knowledge
• This Knowledge will guide the whole process
• The labeled patterns can be considered as a grid of anchor points with
which we get to the entire structure of the data set
What algorithmic changes do we need to include this partial supervision in the clustering process?
• The knowledge has to be included in the objective function
• The formulation of prototypes and U matrix takes another form
18. Clustering with Partial Supervision(3)
“Label some, cluster all”
Problem Formulation
Extra Structures :
• b = [b1, b2, …, bN] the vector of labels, bi=0|1 indicates if a
pattern is labeled or not.
• F[CxN] = [fij] a partition matrix which contains the membership
values for labeled patterns. The columns that correspond to
unlabeled data have zero values.
• α : a nonnegative weight factor for setting up a suitable balance between the supervised and unsupervised modes of learning
19. Clustering with Partial Supervision(4)
“Label some, cluster all”
Problem Formulation (cont..)
$$Q = \sum_{i=1}^{C} \sum_{j=1}^{N} u_{ij}^{m} \, \| X_j - \mathrm{prot}_i \|^{2} + \alpha \sum_{i=1}^{C} \sum_{j=1}^{N} \left( u_{ij} - f_{ij} b_j \right)^{2} \| X_j - \mathrm{prot}_i \|^{2}$$

The extra term is the augmentation we need: it addresses the effect of partial supervision.
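As a sanity check on the augmented objective, here is a small illustrative function (names are assumptions; `F` is the [C x N] reference-membership matrix and `b` the 0/1 label-indicator vector from the previous slide):

```python
import numpy as np

def partial_supervision_Q(X, prototypes, U, F, b, alpha, m=2.0):
    """Q = sum u_ij^m d_ij^2  +  alpha * sum (u_ij - f_ij b_j)^2 d_ij^2."""
    d2 = ((X[None, :, :] - prototypes[:, None, :]) ** 2).sum(axis=2)  # [C x N] squared distances
    unsupervised = np.sum((U ** m) * d2)
    supervised = np.sum(((U - F * b[None, :]) ** 2) * d2)
    return unsupervised + alpha * supervised
```

With alpha = 0, or when U agrees with F on every labeled pattern, the expression collapses to the plain FCM objective, which is exactly the balancing role described above.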
20. Clustering with Partial Supervision(5)
“Label some, cluster all”
Examples
• Handwritten digits • Reliance on a training set
21. Clustering with Partial Supervision(6)
“Label some, cluster all”
Real Example
• Partially Supervised Clustering for Image Segmentation
“This paper describes a new method (ssFCM) for
classification. The method is well suited to problems such as
the segmentation of Magnetic Resonance Images (MRI). A
small set of labeled pixels provides a clustering algorithm
with a form of partial supervision”
22. Collaborative Clustering
“All for one and one for all”
What if we have to deal with several data sets and we
are interested in revealing a global structure?
“The concept of collaboration : We process each data set separately
and we have a collaboration by exchanging information about the
individual results”
Why don’t we put everything in one data set and do our job?
“The paradigm of different organizations with different databases: we don’t have access to the others’ sources, but we appreciate any external assisting information”
23. Collaborative Clustering(2)
“All for one and one for all”
Horizontal Collaborative Clustering
X[1],X[2],..,X[p] data sets
Same objects but in
different feature spaces
ex. Same patients in
different institute database
The collaboration / communication platform is built upon the individual partition matrices.
24. Collaborative Clustering(3)
“All for one and one for all”
Horizontal Collaborative Clustering
• matrix of connections : α[ii, jj] >= 0
• the higher the value, the stronger the collaboration between subsets
• matrix α is not necessarily symmetric: α[ii, jj] ≠ α[jj, ii]
25. Collaborative Clustering(4)
“All for one and one for all”
Horizontal Collaborative Clustering
Problem Formulation

$$Q[ii] = \sum_{j=1}^{N} \sum_{i=1}^{C} u_{ij}^{m}[ii] \, \| X_j[ii] - \mathrm{prot}_i[ii] \|^{2} + \sum_{\substack{jj=1 \\ jj \neq ii}}^{p} \alpha[ii, jj] \sum_{j=1}^{N} \sum_{i=1}^{C} \left\{ u_{ij}[ii] - u_{ij}[jj] \right\}^{m} \| X_j[ii] - \mathrm{prot}_i[ii] \|^{2}$$

The second term makes the clustering based on the iith subset “aware” of the other partitions. If the structures in the data sets are similar, then the differences between the U matrices tend to be lower and the resulting structures become more similar.
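A compact illustration of Q[ii] above (assumed names: `U_list` holds the partition matrices of all p data sets, `alpha` is the connection matrix; `np.abs` keeps the power well defined for non-integer m):

```python
import numpy as np

def horizontal_Q(ii, X_list, prot_list, U_list, alpha, m=2.0):
    """Objective for subset ii: its own FCM term plus collaboration terms
    weighted by alpha[ii, jj] for every other subset jj."""
    d2 = ((X_list[ii][None, :, :] - prot_list[ii][:, None, :]) ** 2).sum(axis=2)
    q = np.sum((U_list[ii] ** m) * d2)
    for jj in range(len(X_list)):
        if jj != ii:
            diff = np.abs(U_list[ii] - U_list[jj]) ** m
            q += alpha[ii, jj] * np.sum(diff * d2)
    return q
```

When two partitions coincide, the collaboration term vanishes, matching the remark above that similar structures drive the differences between the U matrices toward zero.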
26. Collaborative Clustering(5)
“All for one and one for all”
Vertical Collaborative Clustering
X[1],X[2],..,X[p] different data sets
Same feature space, different objects
ex. Auditory evoked responses
3 conditions/datasets (attentive,
stimulation, spontaneous activity)
We have the collaboration /
communication at the level of the
prototypes
27. Collaborative Clustering(6)
“All for one and one for all”
Vertical Collaborative Clustering
Problem Formulation

$$Q[ii] = \sum_{j=1}^{N} \sum_{i=1}^{C} u_{ij}^{m}[ii] \, \| X_j[ii] - \mathrm{prot}_i[ii] \|^{2} + \sum_{\substack{jj=1 \\ jj \neq ii}}^{p} \beta[ii, jj] \sum_{j=1}^{N} \sum_{i=1}^{C} u_{ij}^{m}[ii] \, \| \mathrm{prot}_i[ii] - \mathrm{prot}_i[jj] \|^{2}$$

The second term articulates the differences between the prototypes.
28. Collaborative Clustering(7)
“All for one and one for all”
The 2 algorithmic Phases of Collaborative clustering
PHASE 1
Apply FCM to each data set; the number of clusters has to be the same for all data sets.
// compute prot_i[ii], i = 1, …, C and U[ii] for all subsets //
PHASE 2
Set up the collaboration level and run the optimization.
// compute α[ii, jj] (Horizontal Clust.) or β[ii, jj] (Vertical Clust.) and optimize the partition matrices //
29. Collaborative Clustering(8)
“All for one and one for all”
A combination of Horizontal and Vertical clustering
The Objective Function will be
a combination of the objective
functions from Horizontal and
Vertical Clustering
30. Collaborative Clustering(9)
“All for one and one for all”
Consensus Clustering
• Different objects – Same feature space – Lack of interaction
• Clustering of the prototypes produced from each data set = Meta-Clustering
• Different number of clusters C[1], C[2], …, C[p]
• Building meta-structure – A partition matrix in a higher level
• U at the higher level is formed on the basis of the
prototypes of the data sets
31. Collaborative Clustering(10)
“All for one and one for all”
Examples
• Semantic Content Analysis : A Study in Proximity-Based
Collaborative Clustering “clustering semantic web documents
under the collaboration of semantic and data view”
• Clustering in the framework of
collaborative agents
“…a model of collaborative clustering
(horizontal and vertical) realized over a
collection of data sets in which a
computing agent carries out an individual
clustering process”
32. Directional Clustering
“Direction except from relation”
X[1] and X[2] different data sets
• Our goal is to form a map
between the information
granules developed for these
two data sets.
• Clustering the data set X[1] is the first step. Then cluster the data set X[2] under 2 criteria:
1) reveal its granular structure; 2) this structure can be reached through a logic mapping of granules from data set X[1].
33. Directional Clustering(2)
“Direction except from relation”
Problem Formulation
X[1] data set: standard FCM objective function.
X[2] data set: we need an obj_func that addresses the two main objectives, relational and directional:

$$Q = \sum_{i=1}^{C[2]} \sum_{j=1}^{N} u_{ij}^{m}[2] \, \| X_j[2] - \mathrm{prot}_i[2] \|^{2} + \beta \sum_{i=1}^{C[2]} \sum_{j=1}^{N} \left( u_{ij}[2] - \phi_i(U[1]) \right)^{2} \| X_j[2] - \mathrm{prot}_i[2] \|^{2}$$
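The two facets of this objective can be made concrete in a few lines (assumed names; `phiU1` stands for the already-computed mapping φ(U[1]) of X[1]'s structure, brought into [C[2] x N] form):

```python
import numpy as np

def directional_Q(X2, prot2, U2, phiU1, beta, m=2.0):
    """Relational term (plain FCM on X[2]) plus the beta-weighted directional
    term that pulls U[2] toward the mapped structure phi(U[1])."""
    d2 = ((X2[None, :, :] - prot2[:, None, :]) ** 2).sum(axis=2)  # [C2 x N]
    relational = np.sum((U2 ** m) * d2)
    directional = np.sum(((U2 - phiU1) ** 2) * d2)
    return relational + beta * directional
```

When U[2] already matches the mapped granules of X[1], the directional term vanishes and only the relational facet remains, which is how β balances the two.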
34. Directional Clustering(2)
“Direction except from relation”
Problem Formulation (cont…)
• The first term of Q equation is for revealing structure in X[2]
(relational).
• The second term captures the differences between U[2] and
the mapping φ(.) of the structure detected in X[1] (directional).
• The factor β is for keeping a balance between the relational
and directional facets of the optimization
35. Directional Clustering(3)
“Direction except from relation”
Logic Transformations Between A and B Information Granules
How do we formulate THE Mapping? TWO APPROACHES
1. OR-Based Aggregation
Bi = (A1 t wi1) s (A2 t wi2) s … s (AC[1] t wiC[1])
t- and s-norms can be compared to the ∩ and ∪ operators, respectively.
The most commonly used t-norm is min(), and given the t-norm we can compute the s-norm via
a s b = 1 − (1 − a) t (1 − b)
36. Directional Clustering(4)
“Direction except from relation”
Logic Transformations Between A and B Information Granules
How do we formulate THE Mapping? TWO APPROACHES
2. AND-Based Aggregation
Bi = (A1 s wi1) t (A2 s wi2) t … t (AC[1] s wiC[1])
Which approach is best to use?
Empirically, OR-based when C[1] > C[2] and AND-based when C[1] < C[2].
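Both aggregations above can be sketched with the min t-norm and its dual s-norm (which for min works out to max); the function names are assumptions for illustration:

```python
import numpy as np

def t_norm(a, b):
    """min t-norm (plays the role of intersection / AND)."""
    return np.minimum(a, b)

def s_norm(a, b):
    """Dual s-norm via a s b = 1 - (1 - a) t (1 - b); for min this is max."""
    return 1.0 - t_norm(1.0 - a, 1.0 - b)

def or_aggregation(A, w):
    """B_i = (A_1 t w_i1) s (A_2 t w_i2) s ... s (A_C1 t w_iC1)."""
    terms = t_norm(A, w)
    out = terms[0]
    for term in terms[1:]:
        out = s_norm(out, term)
    return out

def and_aggregation(A, w):
    """B_i = (A_1 s w_i1) t (A_2 s w_i2) t ... t (A_C1 s w_iC1)."""
    terms = s_norm(A, w)
    out = terms[0]
    for term in terms[1:]:
        out = t_norm(out, term)
    return out
```

Here `A` is the vector of C[1] membership values and `w` the row of weights w_i; swapping the roles of the t- and s-norms is the only difference between the two approaches.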
37. Directional Clustering(5)
“Direction except from relation”
Examples
• Directional fuzzy clustering and its application to fuzzy modelling
“presentation of the technique and its role in a two-phase fuzzy
identification scheme”
38. Fuzzy Relational Clustering
“Focusing on pairs of patterns”
FROM
patterns with vector features
TO
relational patterns with degrees of dissimilarity
• N cities, with distances dij between pairs of them:
the matrix of distances contains the relational patterns
• Compare faces in a pair-wise manner and compute proximity degrees (relational patterns)
39. Fuzzy Relational Clustering(2)
“Focusing on pairs of patterns”
FCM for relational data
The input of the algorithm is the dissimilarity matrix R = [Rij], which contains the degrees of dissimilarity between pairs of patterns instead of the original patterns.
Similarity matrix: Dij = 1 − Rij
40. Fuzzy Relational Clustering(3)
“Focusing on pairs of patterns”
Examples
• Low-complexity fuzzy relational clustering algorithms for Web
mining
“new Fuzzy Relational Clustering techniques in Web Mining*:
(1)FCMdd (Fuzzy C Medoids) and (2)RFCMdd (Robust Fuzzy
C Medoids)
Comparison tests with standard RFCM”
*Web document clustering, snippet clustering and Web access log analysis
41. References
W. Pedrycz, “Knowledge-Based Clustering: From Data to Information Granules”
Fuzzy c-Means Clustering of Incomplete Data
FCM-Based Model Selection Algorithms for Determining the
Number of Clusters
Using CFCM to mine event-related brain dynamics
Partially Supervised Clustering for Image Segmentation
42. References
Semantic Content Analysis : A Study in Proximity-Based
Collaborative Clustering
Clustering in the framework of collaborative agents
Directional fuzzy clustering and its application to fuzzy modeling
Low-complexity fuzzy relational clustering algorithms for Web
mining