5 - Unsupervised Learning

• Unsupervised Learning
  • Introduction
  • Statistical Clustering
  • Conceptual Clustering
  • UNIMEM
  • COBWEB

Introduction

• Learner receives no explicit information about classification of input examples.
• Information is implicit.
• Aim of learning process - to discover regularities in the input data.
• Typically consists of partitioning instances into classes (based on some similarity metric).
  • i.e. finding clusters of instances in the instance space.
• Not surprising that unsupervised learning systems sometimes closely resemble statistical clustering systems.
What is Clustering?

• Common problem - construction of meaningful classifications of observed objects or situations.
• Often known as numerical taxonomy - since it involves production of a class hierarchy (classification scheme) using a mathematical measure of similarity over the instances.

Simple Clustering Algorithm

• Initialize
  • Set D to be the set of singleton sets such that each set contains a unique instance.
• Until D contains only 1 element, do the following:
  • Form a matrix of similarity values for all elements of D
    • Using some given similarity function
  • Merge those elements of D which have a maximum similarity value.
• Often known as agglomerative clustering.
  • Works bottom-up - trying to build larger clusters.
• Alternative - divisive clustering.
  • Works top-down (cf ID3)
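The agglomerative loop above can be sketched in Python. The `similarity` function here (negative distance between cluster means of 1-D points) is an assumed placeholder - any similarity function over clusters could be substituted.

```python
from itertools import combinations

def similarity(c1, c2):
    """Assumed placeholder metric: negative distance between the
    means of two clusters of 1-D points."""
    mean = lambda c: sum(c) / len(c)
    return -abs(mean(c1) - mean(c2))

def agglomerate(instances):
    """Bottom-up clustering: start with singleton sets, repeatedly
    merge the most similar pair until one cluster remains.
    Returns the merge history (most recent merge last)."""
    D = [[x] for x in instances]          # D = set of singleton sets
    history = []
    while len(D) > 1:
        # "matrix" of similarity values over all pairs of elements of D
        i, j = max(combinations(range(len(D)), 2),
                   key=lambda p: similarity(D[p[0]], D[p[1]]))
        merged = D[i] + D[j]
        D = [c for k, c in enumerate(D) if k not in (i, j)] + [merged]
        history.append(merged)
    return history

history = agglomerate([1.0, 1.1, 5.0, 5.2])
print(history[-1])   # → [1.0, 1.1, 5.0, 5.2]
```

The nearby points are merged first, illustrating the bottom-up construction of larger clusters.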
Clustering

• Traditional techniques
  • Often inadequate - as they arrange objects into classes solely on the basis of a numerical measure of object similarity.
  • Only information used is that contained in the instances themselves.
  • Algorithms unable to take account of semantic relationships among instance attributes or global concepts that might be of relevance in forming a classification scheme.
• Conceptual Clustering
  • Idea first introduced by R S Michalski - 1980
  • Defined as the process of constructing a concept network characterizing a collection of objects, with nodes marked by concepts describing object classes & links marked by the relationships between the classes.

Clustering

• Consider this example:

  [Figure: points arranged in two diamond shapes; A and B lie close together but in different diamonds.]

• We would not cluster A and B together - but would cluster them into the 2 diamonds.
• Partitioning using concept membership rather than distance.
• Points are placed in the same cluster if collectively they represent the same concept.
• This is the basis of conceptual clustering.
Conceptual Clustering

• Can be regarded as:
• Given:
  • A set of objects
  • A set of attributes to be used to characterise objects
  • A body of background knowledge - includes problem constraints, properties of attributes, criteria for evaluating quality of constructed classifications.
• Find:
  • A hierarchy of object classes
  • Each node should form a coherent concept
    • Compact
    • Easily represented in terms of a definition or rule that has a natural interpretation for humans

Conceptual Clustering

• Given animal descriptors:

  name       body-cover      heart-chamber   body-temp    fertilisation
  mammal     hair            four            regulated    internal
  bird       feathers        four            regulated    internal
  reptile    cornified-skin  imperfect-four  unregulated  internal
  amphibian  moist-skin      three           unregulated  external
  fish       scales          two             unregulated  external

• Classification hierarchy produced:

  animals
  ├── mammals/birds
  │   ├── mammal
  │   └── bird
  ├── reptile
  └── amphibians/fish
      ├── amphibian
      └── fish
Conceptual Clustering

• Michalski - 1980
• Conjunctive conceptual clustering
  • Concept class consists of conjunctive statements involving relations on selected object attributes.
  • Method arranges objects into a hierarchy of classes.
• CLUSTER/2
  • Used to construct a classification hierarchy of a large collection of Spanish folk songs.

UNIMEM

• Lebowitz - 1987
• Essentially a divisive clustering algorithm
• Uses a decision tree structure as its basic representation.
• If asked to classify an instance - searches down through the tree, testing attributes, & returns a classification based on the relevant leaf nodes.
• If asked to update the tree so as to represent a new instance - searches down through the tree looking for a suitable place to add in new structure.
UNIMEM

• Basic clustering principle:
  • Add new nodes into the tree as & when they appear to be warranted by the presented instances.
• UNIMEM actually stores each presented instance at all nodes which cover it.
• If two instances stored at a node are particularly similar - then create an extra child node whose definition covers the two instances in question.
  • The two instances are then relocated to this node.
• As new instances are processed - new nodes are created & the hierarchy grows downwards.

UNIMEM

• An instance matches a node if it is covered by that node (concept).
  • Matching determined by testing to see what proportion of the instance's attributes are associated with the node.
• Search process returns all the most specific nodes that explain (cover) the new instance.
• UNIMEM then generalizes each node in this set as necessary in order to account for the new instance.
• The new instance is then classified with all other instances stored at the node.
UNIMEM Algorithm

• Initialize decision tree to be an empty root node.
• Apply the following steps to each instance:
  • Search the tree depth-first for the most specific concept nodes that the instance matches.
  • Add the new instance to the tree at or below these nodes.
    • Involves comparing the new instance to ones already stored there & creating new subnodes if appropriate.

UNIMEM as Memory

• UNIMEM actually stores new instances inside the tree.
• Can thus be viewed as a type of memory.
  • GBM - Generalisation-Based Memory
• Structure of the hierarchy enables classes of instances to be accessed much more efficiently than would be the case if all instances were stored in a linear memory structure.
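The matching test described above - the proportion of an instance's attributes associated with a node - can be sketched as follows. The attribute names and node concept are hypothetical, and real UNIMEM nodes also store instances and per-feature confidence counts.

```python
def match_score(instance, concept):
    """Proportion of the instance's attribute-value pairs that the
    node's concept also records (the matching test above)."""
    shared = sum(1 for a, v in instance.items() if concept.get(a) == v)
    return shared / len(instance)

# Hypothetical node concept and new instance
node = {"body-temp": "regulated", "fertilisation": "internal"}
instance = {"body-cover": "hair", "body-temp": "regulated",
            "fertilisation": "internal"}

score = match_score(instance, node)
print(round(score, 2))   # → 0.67, i.e. 2 of 3 attributes match the node
```

A threshold on this score would then decide whether the node covers the instance; the threshold value is an implementation choice not fixed here.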
COBWEB

• Fisher - 1987
• Based on the principle that a good clustering should minimize distance between two points within a cluster & maximize distance between points in different clusters.
• Good clustering defined as:
  • One which maximizes intra-cluster similarity & minimizes inter-cluster similarity.
• Goal of COBWEB - to find the optimum tradeoff between these two!

COBWEB

• Incremental system for hierarchical conceptual clustering.
• Carries out hill-climbing search through a space of hierarchical classification schemes, using operators which enable bidirectional travel through this space.
• Features of COBWEB:
  • Heuristic evaluation function to guide search.
  • State representation - structure of hierarchies & representation of concepts.
  • Operators used to build classification schemes.
  • Control strategy.
Category Utility

• Can be viewed as a function which rewards similarity of objects within the same class & dissimilarity of objects in different classes.
• Gluck & Corter - 1985
• Category utility function:

  CU = (1/n) ∑_{k=1}^{n} P(C_k) [ ∑_i ∑_j P(A_i = V_ij | C_k)² − ∑_i ∑_j P(A_i = V_ij)² ]

Representation

• Choice of category utility as heuristic measure dictates a concept representation different to the logical, typically conjunctive representations used in AI.
• Probabilistic representation of {fish, amphibian, mammal}:

  Attribute       Values & Probabilities
  body-cover      scales (0.33), moist-skin (0.33), hair (0.33)
  heart-chamber   two (0.33), three (0.33), four (0.33)
  body-temp       unregulated (0.67), regulated (0.33)
  fertilisation   external (0.67), internal (0.33)

• Each node in the classification tree is a probabilistic concept which represents an object class & summarises the objects classified under the node.
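The category utility function can be evaluated directly from probability tables like the one above. A minimal sketch, assuming a hypothetical two-class partition of the five animals ({mammal, bird} vs the rest) described by the body-temp attribute only:

```python
def category_utility(partition):
    """Category utility of a partition given as a list of
    (P(Ck), {attribute: {value: P(A=v | Ck)}}) pairs, following
    CU = (1/n) * sum_k P(Ck) [ sum P(A=v|Ck)^2 - sum P(A=v)^2 ]."""
    n = len(partition)
    # Unconditional P(A = v) by the law of total probability
    uncond = {}
    for p_ck, attrs in partition:
        for a, values in attrs.items():
            for v, p in values.items():
                uncond[(a, v)] = uncond.get((a, v), 0.0) + p_ck * p
    base = sum(p * p for p in uncond.values())
    total = 0.0
    for p_ck, attrs in partition:
        within = sum(p * p
                     for values in attrs.values()
                     for p in values.values())
        total += p_ck * (within - base)
    return total / n

# Hypothetical partition: {mammal, bird} (2/5) vs the rest (3/5),
# using only the body-temp attribute from the table above.
partition = [
    (0.4, {"body-temp": {"regulated": 1.0}}),
    (0.6, {"body-temp": {"unregulated": 1.0}}),
]
print(round(category_utility(partition), 4))   # → 0.24
```

The positive score reflects that body-temp is perfectly predictable within each class, so this split increases the expected number of correctly guessed attribute values.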
Operators

• Incorporation of a new object into the tree is a process of classifying the object by descending the tree along an appropriate path & performing one of several operations at each level.
• Operators include:
  • Classifying the object with respect to an existing class.
  • Creating a new class.
  • Combining two classes into a single class.
  • Dividing a class into several classes.

Operators contd ...

• Classifying the object in an existing class
  • To determine which category best "hosts" a new object, COBWEB tentatively places the object in each category.
  • The partition which results from adding the object to a given node is evaluated using the category utility function.
  • The node which results in the best partition (highest CU) is identified as the best existing host for the new object.
• Creating a new class
  • Quality of the partition resulting from placing the object in the best existing host is compared to the partition resulting from creation of a new singleton class containing the object.
  • Depending on which partition is best - the object is placed in the best existing class or a new class is created.
Example

• Existing classification structure (fish & amphibian):

  C0: P(C0) = 1.0, P(scales|C0) = 0.5, ...
  ├── C1: P(C1) = 0.5, P(scales|C1) = 1.0, ...
  └── C2: P(C2) = 0.5, P(moist|C2) = 1.0, ...

• Add "mammal":

  C0: P(C0) = 1.0, P(scales|C0) = 0.33, ...
  ├── C1: P(C1) = 0.33, P(scales|C1) = 1.0, ...
  ├── C2: P(C2) = 0.33, P(moist|C2) = 1.0, ...
  └── C3: P(C3) = 0.33, P(hair|C3) = 1.0, ...

• Add "bird":

  C0: P(C0) = 1.0, P(scales|C0) = 0.25, ...
  ├── C1: P(C1) = 0.25, P(scales|C1) = 1.0, ...
  ├── C2: P(C2) = 0.25, P(moist|C2) = 1.0, ...
  └── C3: P(C3) = 0.5, P(hair|C3) = 0.5, ...
      ├── C4: P(C4) = 0.5, P(hair|C4) = 1.0, ...
      └── C5: P(C5) = 0.5, P(feath|C5) = 1.0, ...

Operators contd ...

• While the first two operators are effective in many ways - by themselves they are very sensitive to the ordering of the input data.
• Merging & splitting operators implemented to guard against these effects.
• Merging
  • Two nodes of a level are combined in the hope that the resultant partition is of better quality.
  • Involves creating a new node.
  • The two original nodes are made children of the newly created node.
• Splitting
  • A node may be deleted and its children promoted.
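The merging and splitting operators can be sketched as simple tree manipulations. This is a structural sketch only - real COBWEB nodes also carry probability counts, and the operators are applied only when they improve category utility.

```python
class Node:
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []

def merge(parent, a, b):
    """Node merging: replace children a & b of parent with a newly
    created node whose children are a & b."""
    merged = Node(a.name + "+" + b.name, [a, b])
    parent.children = [c for c in parent.children if c not in (a, b)]
    parent.children.append(merged)
    return merged

def split(parent, node):
    """Node splitting: delete the node & promote its children
    into the parent at the same position."""
    i = parent.children.index(node)
    parent.children[i:i + 1] = node.children

a, b, c = Node("A"), Node("B"), Node("C")
p = Node("P", [a, b, c])
m = merge(p, a, b)
print([n.name for n in p.children])   # → ['C', 'A+B']
split(p, m)
print([n.name for n in p.children])   # → ['C', 'A', 'B']
```

Note that split exactly undoes merge, which is what makes the search through classification schemes bidirectional.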
Merging & Splitting Operators

• Node Merging

  [Diagram: children A & B of parent P are replaced by a new node whose children are A & B.]

• Node Splitting

  [Diagram: the reverse - a child node of P is deleted & its children are promoted to P.]

COBWEB Control Structure

COBWEB ( Object, Root of classification tree )

1. Update the counts of the Root.
2. IF Root is a leaf
   THEN Return the expanded leaf to accommodate Object
   ELSE Find the child of Root which best hosts Object & perform one of the following:
     a. Consider creating a new class & do so if appropriate.
     b. Consider node merging & do so if appropriate, call COBWEB ( Object, Merged node ).
     c. Consider node splitting & do so if appropriate, call COBWEB ( Object, Root ).
     d. IF none of the above were performed
        THEN call COBWEB ( Object, Best child of Root ).
AutoClass
• Cheeseman et al - 1988
• Bayesian statistical technique
• Bayes' theorem - formula for combining probabilities
• Technique determines:
• Most probable number of classes
• Their probabilistic descriptions
• Probability that each object is a member of each class
• AutoClass does not do absolute partitioning of data into
classes.
• Calculates the probability of each object's membership in
each class.
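The soft membership calculation follows from Bayes' theorem: P(Ck | x) ∝ P(Ck) · P(x | Ck), normalised over all classes. A minimal sketch with two hypothetical one-dimensional Gaussian classes - not AutoClass's actual model, which also searches over the number of classes:

```python
import math

def gaussian(mu, sigma):
    """Return a 1-D Gaussian density function (assumed class model)."""
    return lambda x: (math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))
                      / (sigma * math.sqrt(2 * math.pi)))

def membership(x, classes):
    """Posterior P(Ck | x) for each class via Bayes' theorem:
    normalise prior * likelihood over all classes."""
    joint = {k: prior * density(x) for k, (prior, density) in classes.items()}
    total = sum(joint.values())
    return {k: v / total for k, v in joint.items()}

# Two hypothetical classes with equal priors
classes = {"A": (0.5, gaussian(0.0, 1.0)),
           "B": (0.5, gaussian(4.0, 1.0))}

post = membership(1.0, classes)
print(round(post["A"], 3))   # → 0.982: x = 1.0 mostly belongs to class A
```

Every object gets a probability under every class, so nearby objects receive graded memberships rather than an absolute partition.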